[ 
https://issues.apache.org/jira/browse/KAFKA-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497392#comment-14497392
 ] 

Tim Brooks commented on KAFKA-2102:
-----------------------------------

I added an updated patch. This patch includes a few things:

1. I moved to a finer-grained locking strategy, as opposed to attempting to
use only atomic instructions. None of the methods are synchronized.
2. I delegated the synchronization code and the data about when the last
update happened, etc., to a new MetadataBookkeeper. When I first read the old
code, I had trouble parsing the mixture of cluster state, topic state, state
about when to do the next update, and state about when the last update had
completed. Maybe my changes make this easier to parse. Maybe they don't.
3. I moved lastNoNodeAvailableMs from the NetworkClient state into the
MetadataBookkeeper. Since this variable essentially recorded a failed attempt
to update metadata, and it was not accessed any differently for distinct
metrics, it seemed nicer to keep all the state about when the next metadata
update should happen together.
4. No one has responded to KAFKA-2101, but it was highly relevant to what I
was working on, so it is affected by this patch. I created a distinction
between a successful metadata update and a metadata update attempt. The
metadata-age metric only uses the last successful update in its reports. This
seemed like the correct approach based on the name of that metric, since a
failed update does not make the metadata any younger.
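
To illustrate points 1, 2, and 4, here is a minimal sketch of the idea. It is
not the code from the attached patch: the class name, method names, and the
two timing parameters (refreshBackoffMs, metadataExpireMs) are assumptions
made for the example. It guards only the timing fields with a small lock and
keeps the last attempt separate from the last successful update.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch of a bookkeeper; not the class from the patch. */
final class MetadataBookkeeperSketch {
    // Guards only the two timestamps below, not cluster or topic state.
    private final ReentrantLock lock = new ReentrantLock();
    private final long refreshBackoffMs;  // assumed: minimum wait after any attempt
    private final long metadataExpireMs;  // assumed: maximum age before a refresh
    private long lastAttemptMs = 0L;          // any attempt, failed or successful
    private long lastSuccessfulUpdateMs = 0L; // successful updates only

    MetadataBookkeeperSketch(long refreshBackoffMs, long metadataExpireMs) {
        this.refreshBackoffMs = refreshBackoffMs;
        this.metadataExpireMs = metadataExpireMs;
    }

    /** A failed attempt moves the backoff window but not the metadata age. */
    void recordFailedAttempt(long nowMs) {
        lock.lock();
        try {
            lastAttemptMs = nowMs;
        } finally {
            lock.unlock();
        }
    }

    /** A successful update counts as an attempt and resets the age. */
    void recordSuccessfulUpdate(long nowMs) {
        lock.lock();
        try {
            lastAttemptMs = nowMs;
            lastSuccessfulUpdateMs = nowMs;
        } finally {
            lock.unlock();
        }
    }

    /** Age as a metadata-age style metric would report it. */
    long metadataAgeMs(long nowMs) {
        lock.lock();
        try {
            return nowMs - lastSuccessfulUpdateMs;
        } finally {
            lock.unlock();
        }
    }

    /** Time until the next update is due, honoring the backoff. */
    long timeToNextUpdateMs(long nowMs) {
        lock.lock();
        try {
            long untilExpiry =
                Math.max(lastSuccessfulUpdateMs + metadataExpireMs - nowMs, 0L);
            long untilBackoffEnds =
                Math.max(lastAttemptMs + refreshBackoffMs - nowMs, 0L);
            return Math.max(untilExpiry, untilBackoffEnds);
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        MetadataBookkeeperSketch b = new MetadataBookkeeperSketch(100L, 1000L);
        b.recordSuccessfulUpdate(1000L);
        b.recordFailedAttempt(1500L);
        // The failed attempt at 1500 does not make the metadata younger.
        System.out.println("age at 1600 ms: " + b.metadataAgeMs(1600L));
    }
}
```

In this sketch a failed attempt only delays the next attempt through the
backoff; the reported age keeps growing until an update actually succeeds.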

The performance improvements are primarily at the 90th percentile and above.
I ran a producer test with both five and eight threads pushing 10,000 messages
to Kafka, and I repeated it ten times. I recorded the latencies with
HDRHistogram.

The improvements were somewhere between 4% and 30% reduced latency at the 90th
percentile and above. For example, at the 0.990625000000 percentile on the
five-thread test the latency was reduced from 14.223 microseconds to 9.775
(31%). At the 0.900000000000 percentile the latency was reduced from 2.947 to
2.837 (3.9%). So certainly not a lot, but the latency is improved fairly
consistently across the higher percentiles.

In the five-thread test the mean decreased 4.8%. In the eight-thread test the
mean decreased 7.8%.
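
For readers who want to reproduce this kind of comparison, the following is a
rough sketch of how a percentile can be read from raw latency samples and how
a relative reduction can be computed. It is a stand-in, not the benchmark
code: it uses a plain nearest-rank percentile over a sorted array rather than
HDRHistogram, and the class and method names are made up for the example.

```java
import java.util.Arrays;

/** Hypothetical helper; not part of the latency test linked in this comment. */
final class LatencyComparison {
    /** Nearest-rank percentile; quantile must be in (0, 1]. */
    static double percentile(double[] samples, double quantile) {
        if (samples.length == 0 || quantile <= 0.0 || quantile > 1.0) {
            throw new IllegalArgumentException("need samples and 0 < quantile <= 1");
        }
        double[] sorted = samples.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(quantile * sorted.length);
        return sorted[rank - 1];
    }

    /** Reduction relative to the "before" value, in percent. */
    static double percentReduction(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        // The five-thread figures quoted above, in microseconds.
        System.out.printf("%.1f%%%n", percentReduction(14.223, 9.775));
    }
}
```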

The code for the latency test can be found here:

https://github.com/tbrooks8/kafka-latency-test

> Remove unnecessary synchronization when managing metadata
> ---------------------------------------------------------
>
>                 Key: KAFKA-2102
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2102
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Tim Brooks
>            Assignee: Tim Brooks
>         Attachments: KAFKA-2102.patch, KAFKA-2102_2015-04-08_00:20:33.patch
>
>
> Usage of the org.apache.kafka.clients.Metadata class is synchronized. It 
> seems like the current functionality could be maintained without 
> synchronizing the whole class.
> I have been working on improving this by moving to finer grained locks and 
> using atomic operations. My initial benchmarking of the producer is that this 
> will improve latency (using HDRHistogram) on submitting messages.
> I have produced an initial patch. I do not necessarily believe this is 
> complete. And I want to definitely produce some more benchmarks. However, I 
> wanted to get early feedback because this change could be deceptively tricky.
> I am interested in knowing if this is:
> 1. Something that is of interest to the maintainers/community.
> 2. Along the right track
> 3. If there are any gotchas that make my current approach naive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
