[ https://issues.apache.org/jira/browse/KAFKA-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553528#comment-14553528 ]

Jun Rao commented on KAFKA-2147:
--------------------------------

Yes, one workaround is probably to set a relatively low max.wait (say 100ms) 
and a relatively high min.bytes (say 100K). This way, the fetch request will 
always expire and the latency will still be reasonable. The cleaner will then 
keep getting a chance to purge the watcher list.
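
For the replica fetch requests described in this ticket, I believe the 
broker-side knobs that correspond to max.wait and min.bytes are 
replica.fetch.wait.max.ms and replica.fetch.min.bytes; please double-check the 
names against your version. As a sketch (values illustrative only):

{noformat}
# server.properties on the fetching (follower) brokers
# each replica fetch request expires after at most 100ms in the leader's purgatory
replica.fetch.wait.max.ms=100
# the leader only answers early once ~100KB of new data is available
replica.fetch.min.bytes=102400
{noformat}

With something like this, every fetch request either returns quickly with a 
large batch or expires within ~100ms, so the expiration reaper keeps getting a 
chance to purge the watcher lists.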

> Unbalanced replication can cause extreme purgatory growth
> ---------------------------------------------------------
>
>                 Key: KAFKA-2147
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2147
>             Project: Kafka
>          Issue Type: Bug
>          Components: purgatory, replication
>    Affects Versions: 0.8.2.1
>            Reporter: Evan Huus
>            Assignee: Jun Rao
>         Attachments: KAFKA-2147.patch, KAFKA-2147_2015-05-14_09:41:56.patch, 
> KAFKA-2147_2015-05-15_16:14:44.patch, 
> craig-kafka-purgatory-queue-size-issue.png, purgatory.log, purgatory.log.gz, 
> watch-lists.log
>
>
> Apologies in advance, this is going to be a bit of a complex description, 
> mainly because we've seen this issue manifest in several different ways and 
> we're still tying them together in terms of root cause and analysis.
> It is worth noting now that we have all our producers set up to send 
> RequiredAcks==-1, and that this includes all our MirrorMakers.
> I understand the purgatory is being rewritten (again) for 0.8.3. Hopefully 
> that will incidentally fix this issue, or at least render it moot.
> h4. Symptoms
> Fetch request purgatory on a broker or brokers grows rapidly and steadily at 
> a rate of roughly 1-5K requests per second. Heap memory used also grows to 
> keep pace. When 4-5 million requests have accumulated in purgatory, the 
> purgatory is drained, causing a substantial latency spike. The node will tend 
> to drop leadership, replicate, and recover.
> h5. Case 1 - MirrorMaker
> We first noticed this case when enabling MirrorMaker. We had one primary 
> cluster already, with many producers and consumers. We created a second, 
> identical cluster and enabled replication from the original to the new 
> cluster on some topics using MirrorMaker. This caused all six nodes in the 
> new cluster to exhibit the symptom in lockstep - their purgatories would all 
> grow together, and get drained within about 20 seconds of each other. The 
> cluster-wide latency spikes at this time caused several problems for us.
> Turning MM on and off turned the problem on and off very precisely. When we 
> stopped MM, the purgatories would all drop to normal levels immediately, and 
> would start climbing again when we restarted it.
> Note that these are the *fetch* purgatories on the brokers that MM was 
> *producing* to, which indicates fairly strongly that this is a replication 
> issue, not a MM issue.
> This particular cluster and MM setup was abandoned for other reasons before 
> we could make much progress debugging.
> h5. Case 2 - Broker 6
> The second time we saw this issue was on the newest broker (broker 6) in the 
> original cluster. For a long time we were running with five nodes, and 
> eventually added a sixth to handle the increased load. At first, we moved 
> only a handful of higher-volume partitions to this broker. Later, we created 
> a group of new topics (totalling around 100 partitions) for testing purposes 
> that were spread automatically across all six nodes. These topics saw 
> occasional traffic, but were generally unused. At this point broker 6 had 
> leadership for about an equal number of high-volume and unused partitions, 
> about 15-20 of each.
> Around this time (we don't have detailed enough data to prove real 
> correlation unfortunately), the issue started appearing on this broker as 
> well, but not on any of the other brokers in the cluster.
> h4. Debugging
> The first thing we tried was to reduce the 
> `fetch.purgatory.purge.interval.requests` from the default of 1000 to a much 
> lower value of 200. This had no noticeable effect at all.
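> For reference, this is just a broker-side setting in server.properties; the 
> value shown is the one we tried, not a recommendation:
> {noformat}
> # purge interval (in number of requests) of the fetch request purgatory;
> # the default is 1000 - we lowered it to 200 with no noticeable effect
> fetch.purgatory.purge.interval.requests=200
> {noformat}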
> We then enabled debug logging on broker06 and started looking through that. I 
> can attach complete log samples if necessary, but the thing that stood out 
> for us was a substantial number of the following lines:
> {noformat}
> [2015-04-23 20:05:15,196] DEBUG [KafkaApi-6] Putting fetch request with 
> correlation id 49939 from client ReplicaFetcherThread-0-6 into purgatory 
> (kafka.server.KafkaApis)
> {noformat}
> The volume of these lines seemed to match (approximately) the fetch purgatory 
> growth on that broker.
> At this point we developed a hypothesis (detailed below) which guided our 
> subsequent debugging tests:
> - Setting a daemon up to produce regular random data to all of the topics led 
> by kafka06 (specifically the ones which otherwise would receive no data) 
> substantially alleviated the problem (a rough sketch of such a daemon is 
> included after this list).
> - Doing an additional rebalance of the cluster in order to move a number of 
> other topics with regular data to kafka06 appears to have solved the problem 
> completely.
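> For completeness, here is a rough sketch of the kind of keep-alive daemon we 
> mean; the broker address, topic names, and message rate are illustrative, not 
> the exact script we ran:
> {noformat}
> import java.util.Properties;
> import java.util.Random;
> import org.apache.kafka.clients.producer.KafkaProducer;
> import org.apache.kafka.clients.producer.ProducerRecord;
>
> public class KeepAliveProducer {
>     public static void main(String[] args) throws InterruptedException {
>         Properties props = new Properties();
>         props.put("bootstrap.servers", "kafka06:9092"); // illustrative
>         props.put("acks", "-1");                        // matches RequiredAcks==-1
>         props.put("key.serializer",
>             "org.apache.kafka.common.serialization.ByteArraySerializer");
>         props.put("value.serializer",
>             "org.apache.kafka.common.serialization.ByteArraySerializer");
>
>         // the otherwise-idle topics led by kafka06 (names made up)
>         String[] idleTopics = {"test-topic-1", "test-topic-2"};
>         Random rand = new Random();
>         KafkaProducer<byte[], byte[]> producer = new KafkaProducer<byte[], byte[]>(props);
>         try {
>             while (true) {
>                 byte[] payload = new byte[64];
>                 rand.nextBytes(payload);
>                 for (String topic : idleTopics) {
>                     // one small random message per topic per second
>                     producer.send(new ProducerRecord<byte[], byte[]>(topic, payload));
>                 }
>                 Thread.sleep(1000);
>             }
>         } finally {
>             producer.close();
>         }
>     }
> }
> {noformat}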
> h4. Hypothesis
> Current versions (0.8.2.1 and earlier) have issues with the replica fetcher 
> not backing off correctly (KAFKA-1461, KAFKA-2082 and others). I believe that 
> in a very specific situation, the replica fetcher thread of one broker can 
> spam another broker with requests that fill up its purgatory and do not get 
> properly flushed. My best guess is that the necessary conditions are:
> - broker A leads some partitions which receive regular traffic, and some 
> partitions which do not
> - broker B replicates some of each type of partition from broker A
> - some producers are producing with RequiredAcks=-1 (wait for all ISR)
> - broker B happens to divide its replicated partitions such that one of its 
> replica threads consists *only* of partitions which receive no regular traffic
> When the above conditions are met, and broker A receives a produce request 
> (frequently, since it leads some high-volume partitions), it triggers broker 
> B's replica manager, which causes *all* of broker B's replica fetcher threads 
> to send fetch requests. This includes the thread which owns *only* the empty 
> partitions, so fetch requests for those partitions build up quickly in broker 
> A's purgatory, causing the issue.
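> To illustrate the last condition, here is a toy sketch (not Kafka's actual 
> code) of how a hash-based assignment of partitions to a fixed pool of fetcher 
> threads can, by bad luck, put all of the idle partitions on the same thread; 
> the topic names and thread count are made up:
> {noformat}
> public class FetcherAssignmentSketch {
>     // toy stand-in for mapping a partition onto one of N fetcher threads
>     static int fetcherId(String topic, int partition, int numFetchers) {
>         return Math.abs(31 * topic.hashCode() + partition) % numFetchers;
>     }
>
>     public static void main(String[] args) {
>         int numFetchers = 2; // e.g. num.replica.fetchers=2 on broker B (illustrative)
>         String[] busyTopics = {"clicks", "logs"};   // high-volume partitions led by broker A
>         String[] idleTopics = {"test-a", "test-b"}; // idle test partitions led by broker A
>         for (String topic : busyTopics) {
>             System.out.println(topic + "-0 -> fetcher thread " + fetcherId(topic, 0, numFetchers));
>         }
>         for (String topic : idleTopics) {
>             System.out.println(topic + "-0 -> fetcher thread " + fetcherId(topic, 0, numFetchers));
>         }
>         // if one thread happens to get only the idle partitions, every fetch it sends
>         // returns no data and sits in broker A's purgatory until it expires
>     }
> }
> {noformat}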
> Hopefully somebody with more Kafka experience will be able to validate or 
> disprove my hypothesis. The issue has been resolved for us, for now, by the 
> rebalancing of broker 6, but I would like to fully understand and fix it 
> before we run into it again in another context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
