[ https://issues.apache.org/jira/browse/KAFKA-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497392#comment-14497392 ]
Tim Brooks commented on KAFKA-2102:
-----------------------------------

I added an updated patch. This patch includes a few things:

1. I moved to a finer-grained locking strategy, as opposed to attempting to use only atomic instructions. None of the methods are synchronized.
2. I delegated the synchronization code, and the data about when the last update happened, etc., to a new MetadataBookkeeper. When I first read the old code, I had some trouble parsing the mixture of cluster state, topic state, state about when to do the next update, and state about when the last update completed. Maybe my changes make this easier to parse. Maybe they don't.
3. I moved lastNoNodeAvailableMs from the NetworkClient state into the MetadataBookkeeper. This variable essentially recorded a failed attempt to update metadata, and it was not used separately by any metric, so it seemed nicer to keep all the state about when the next metadata update should happen in one place.
4. No one has responded to KAFKA-2101, but it is highly relevant to what I was working on, so it is affected by this patch. I created a distinction between a successful metadata update and a metadata update attempt. The metadata-age metric reports only the time since the last successful update. This seemed like the correct approach given the metric's name, since a failed update does not make the metadata any younger.

The performance improvements are primarily at the 90th percentile and above. I ran a producer test with both five and eight threads pushing 10,000 messages to Kafka, repeated it ten times, and recorded the latencies with HdrHistogram. Latency at the 90th percentile and above was reduced by roughly 4% to 30%. For example, in the five-thread test at the 0.990625 percentile (the 99.0625th percentile), latency dropped from 14.223 microseconds to 9.775 microseconds (31%). At the 0.9 percentile (the 90th percentile), it dropped from 2.947 to 2.837 microseconds (about 3.7%). So the gain is certainly not large.
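The split described in points 2 to 4 (refresh timing kept in its own bookkeeper, with update attempts tracked separately from successful updates) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the attached patch: the class `MetadataBookkeeperSketch`, its method names, and the refresh rule (an update is due once the metadata expires or one is requested, but never sooner than a backoff after the last attempt) are all hypothetical. A single private `ReentrantLock` guards only this timing state, so code reading cluster or topic state elsewhere would not contend on it.

```java
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical sketch, not the MetadataBookkeeper from the KAFKA-2102 patch:
 * refresh-timing state guarded by its own small lock instead of
 * synchronizing every method of the metadata class.
 */
public class MetadataBookkeeperSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final long refreshBackoffMs;       // assumed: minimum gap between attempts
    private final long metadataExpireMs;       // assumed: max age before a forced refresh

    private long lastAttemptMs = 0L;           // last attempt, successful or not
    private long lastSuccessfulUpdateMs = 0L;  // last attempt that produced metadata
    private boolean updateRequested = false;

    public MetadataBookkeeperSketch(long refreshBackoffMs, long metadataExpireMs) {
        this.refreshBackoffMs = refreshBackoffMs;
        this.metadataExpireMs = metadataExpireMs;
    }

    /** Ask for a refresh as soon as the backoff allows. */
    public void requestUpdate() {
        lock.lock();
        try {
            updateRequested = true;
        } finally {
            lock.unlock();
        }
    }

    /** A failed attempt (e.g. no node available) only moves the backoff clock. */
    public void recordFailedAttempt(long nowMs) {
        lock.lock();
        try {
            lastAttemptMs = nowMs;
        } finally {
            lock.unlock();
        }
    }

    /** A successful update moves both clocks and clears any pending request. */
    public void recordSuccessfulUpdate(long nowMs) {
        lock.lock();
        try {
            lastAttemptMs = nowMs;
            lastSuccessfulUpdateMs = nowMs;
            updateRequested = false;
        } finally {
            lock.unlock();
        }
    }

    /** Milliseconds until the next refresh should be attempted (0 means now). */
    public long timeToNextUpdate(long nowMs) {
        lock.lock();
        try {
            long timeToExpire = updateRequested
                    ? 0L
                    : Math.max(lastSuccessfulUpdateMs + metadataExpireMs - nowMs, 0L);
            long timeToAllowAttempt = Math.max(lastAttemptMs + refreshBackoffMs - nowMs, 0L);
            return Math.max(timeToExpire, timeToAllowAttempt);
        } finally {
            lock.unlock();
        }
    }

    /** Age for a metadata-age style metric: failed attempts do not reset it. */
    public long metadataAgeMs(long nowMs) {
        lock.lock();
        try {
            return nowMs - lastSuccessfulUpdateMs;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        MetadataBookkeeperSketch b = new MetadataBookkeeperSketch(100L, 300_000L);
        b.recordSuccessfulUpdate(1_000L);
        b.recordFailedAttempt(5_000L);
        System.out.println(b.metadataAgeMs(6_000L));    // 5000: the failure did not reset the age
        b.requestUpdate();
        System.out.println(b.timeToNextUpdate(5_050L)); // 50: still inside the backoff window
    }
}
```

The `main` method shows the metadata-age behaviour from point 4: the failed attempt at 5,000 ms delays the next attempt by the backoff, but the age is still measured from the successful update at 1,000 ms.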
Still, latency improved fairly consistently across the higher percentiles. In the five-thread test the mean decreased by 4.8%; in the eight-thread test it decreased by 7.8%.

The code for the latency test can be found here: https://github.com/tbrooks8/kafka-latency-test

> Remove unnecessary synchronization when managing metadata
> ---------------------------------------------------------
>
>                 Key: KAFKA-2102
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2102
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Tim Brooks
>            Assignee: Tim Brooks
>         Attachments: KAFKA-2102.patch, KAFKA-2102_2015-04-08_00:20:33.patch
>
>
> Usage of the org.apache.kafka.clients.Metadata class is synchronized. It
> seems like the current functionality could be maintained without
> synchronizing the whole class.
> I have been working on improving this by moving to finer grained locks and
> using atomic operations. My initial benchmarking of the producer is that this
> will improve latency (using HDRHistogram) on submitting messages.
> I have produced an initial patch. I do not necessarily believe this is
> complete. And I want to definitely produce some more benchmarks. However, I
> wanted to get early feedback because this change could be deceptively tricky.
> I am interested in knowing if this is:
> 1. Something that is of interest to the maintainers/community.
> 2. Along the right track
> 3. If there are any gotchas that make my current approach naive.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)