[ https://issues.apache.org/jira/browse/KAFKA-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13932577#comment-13932577 ]
Jun Rao commented on KAFKA-1303:
--------------------------------

Guozhang,

The new producer doesn't have to wait for a response before sending out a new produce request. However, the broker processes only one request from a given client at a time. So, if you have 100 produce requests from the same client queued up on the broker, each 100 bytes in size, each replica fetch request is still going to take 500ms.

Jay,

Yes, I thought about the smarter heuristic as well. It wouldn't work for this particular case, though. All existing connections are experiencing the same problem, i.e., each has a large number of in-flight requests. Those existing connections are to the leaders of different partitions.

> metadata request in the new producer can be delayed
> ---------------------------------------------------
>
>                 Key: KAFKA-1303
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1303
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8.2
>            Reporter: Jun Rao
>
> While debugging a system test, I observed the following.
> 1. A broker side configuration
> (replica.fetch.wait.max.ms=500,replica.fetch.min.bytes=4096) made the time to
> complete a produce request long (each taking about 500ms with ack=-1).
> 2. The producer client has a bunch of outstanding produce requests queued up
> on the brokers.
> 3. One of the brokers fails and we force updating the metadata.
> 4. The metadata request is queued up behind those outstanding producer
> requests.
> 5. By the time the metadata response comes back, some messages have failed
> all retries because of stale metadata.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
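
As a rough illustration of the queuing effect described in the issue, the sketch below estimates how long a metadata request would wait behind produce requests that the broker handles one at a time per client. The function names, the retry count of 3, and the 100ms backoff are hypothetical values chosen for the example, not figures from the report, and the retry model (failed attempts return immediately, so only backoff time counts) is a deliberate simplification.

```python
def metadata_wait_ms(queued_produce_requests: int, produce_latency_ms: float) -> float:
    """Estimate the delay before a metadata request is served.

    Assumes the broker processes requests from one client strictly in order,
    so the metadata request waits for every produce request ahead of it.
    """
    return queued_produce_requests * produce_latency_ms


def retries_exhausted_before_refresh(wait_ms: float, retries: int,
                                     retry_backoff_ms: float) -> bool:
    """Return True if all retries would be used up before metadata arrives.

    Simplified model (an assumption): each retry against the stale leader
    fails immediately, so the retry budget is retries * retry_backoff_ms.
    """
    return retries * retry_backoff_ms < wait_ms


if __name__ == "__main__":
    # Figures from the comment: 100 queued produce requests, ~500ms each.
    wait = metadata_wait_ms(100, 500)
    print(f"metadata request waits about {wait / 1000:.0f}s")
    # Hypothetical retry settings: 3 retries, 100ms backoff.
    print("retries exhausted first:",
          retries_exhausted_before_refresh(wait, retries=3, retry_backoff_ms=100))
```

Under these assumptions the metadata request waits on the order of tens of seconds, far longer than any plausible retry budget, which is consistent with step 5 of the report.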