[ https://issues.apache.org/jira/browse/KAFKA-543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Kreps updated KAFKA-543:
----------------------------

    Attachment: KAFKA-543-v2.patch

Hey Neha, I actually kind of agree with you. We need to dedupe on the server 
since we can't trust the client. There are three ways to do the client-side 
dedupe: in the request object, in the call, or in AdminUtils. I don't like 
changing the request object because I think the list should map directly to the 
serialized data (i.e., same order). I agree that fixing AdminUtils is preferable 
to doing it at the call site.
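For reference, the client-side dedupe being discussed amounts to collapsing the topic list to a set before the metadata request goes out. A minimal sketch (the topic names are illustrative, not from the actual perf run):

```scala
// Outstanding produce requests commonly repeat the same topic many times,
// e.g. 200 entries for "test" with the default batch size.
val outstandingTopics: List[String] = List("test", "test", "test", "other")

// Converting to a Set drops the duplicates before the remote metadata call,
// so the server does one lookup per distinct topic instead of one per entry.
val uniqueTopics: Set[String] = outstandingTopics.toSet
```

With the list above, `uniqueTopics` contains just two topics instead of four entries.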

Here is a second patch. It does the following:
1. Make AdminUtils.getTopicMetaDataFromZK() and BrokerPartitionInfo.updateInfo 
take a set instead of a list. Also dedupe on the server side just in case.
2. Cleanup: Rename AdminUtils.getTopicMetaDataFromZK to 
AdminUtils.fetchTopicMetadataFromZk. This is consistent with our capitalization 
of TopicMetadata. I also dislike the naming of methods as getX since it sounds 
like they are a getter when in fact they do remote requests.
3. Cleanup: add a single-topic version of the API since the vast majority of 
uses were fetching only a single topic. Reimplement the multi-topic API in 
terms of the single-topic API.
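The shape of changes 1 and 3 can be sketched as follows. This is a hypothetical stand-in, not the patch itself: the real `fetchTopicMetadataFromZk` reads partition and replica assignments from ZooKeeper, and its actual signatures may differ.

```scala
object AdminUtilsSketch {
  // Placeholder for the real metadata type read from ZooKeeper.
  case class TopicMetadata(topic: String)

  // Single-topic version: the common case across the codebase.
  def fetchTopicMetadataFromZk(topic: String): TopicMetadata =
    TopicMetadata(topic) // real code would do the ZK reads here

  // Multi-topic version reimplemented in terms of the single-topic one.
  // Taking a Set rather than a List means duplicates sent by a misbehaving
  // client are collapsed on the server side as well.
  def fetchTopicMetadataFromZk(topics: Set[String]): Set[TopicMetadata] =
    topics.map(fetchTopicMetadataFromZk)
}
```

Because the parameter is a `Set`, even a caller that builds it from a repetitive list pays for each distinct topic only once.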
                
> Metadata request from DefaultEventHandler.handle repeats same topic over and 
> over
> ---------------------------------------------------------------------------------
>
>                 Key: KAFKA-543
>                 URL: https://issues.apache.org/jira/browse/KAFKA-543
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8
>            Reporter: Jay Kreps
>            Priority: Blocker
>              Labels: bugs
>         Attachments: KAFKA-543.patch, KAFKA-543-v2.patch
>
>
> It looks like we are calling BrokerPartitionInfo.updateInfo() with a list of 
> the same topic repeated many times:
> Here is the line:
> Utils.swallowError(brokerPartitionInfo.updateInfo(outstandingProduceRequests.map(_.getTopic)))
> The outstandingProduceRequests can (and generally would) have many entries 
> for the same topic.
> For example if I use the producer performance test with the default batch 
> size on a topic "test" my metadata request will have the topic "test" 
> repeated 200 times. On the server side we do several zk reads for each of 
> these repetitions.
> This is causing the metadata api to timeout in my perf test periodically.
> I think the fix is simply to de-duplicate prior to the call (and perhaps 
> again on the server in case of a misbehaving client).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
