Jay Kreps created KAFKA-1441:
--------------------------------
Summary: Purgatory purge causes latency spikes
Key: KAFKA-1441
URL: https://issues.apache.org/jira/browse/KAFKA-1441
Project: Kafka
Issue Type: Bug
Reporter: Jay Kreps
The request purgatory has an odd behavior: it periodically loops over all
watches and purges them. With a fair number of partitions, lots of watches can
accumulate, and purging them can take a long time (about 337 ms in the log
below). During this time all expiry is halted, because the same
ExpiredRequestReaper thread does both the purging and the expiring.
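The stall can be modelled with a toy, single-threaded reaper. This is a hypothetical Python sketch, not Kafka's RequestPurgatory code: `simulate`, `purge_at_ms`, `ns_per_entry` and `poll_ms` are invented for illustration, time is virtual so the run is deterministic, and only one purge is modelled rather than a periodic one. The cost of roughly 1 µs per watched entry used in the example is an assumption that happens to match the log below (340,809 entries purged in about 337 ms).

```python
import heapq


def simulate(deadlines_ms, purge_at_ms, watched_entries, ns_per_entry, poll_ms=1):
    """Toy model (hypothetical, not Kafka's code) of one reaper thread that both
    expires delayed requests and, once, purges every watcher list.

    deadlines_ms: {request_id: deadline in virtual ms}
    Returns {request_id: how many ms after its deadline it was actually expired}.
    """
    # Min-heap ordered by deadline, standing in for the purgatory's delay queue.
    queue = [(deadline, rid) for rid, deadline in deadlines_ms.items()]
    heapq.heapify(queue)
    # Cost of scanning every watched entry; the thread can do nothing else meanwhile.
    purge_cost_ms = watched_entries * ns_per_entry // 1_000_000
    now, purged, lateness = 0, False, {}
    while queue:
        if not purged and now >= purge_at_ms:
            now += purge_cost_ms  # expiry is halted for the whole scan
            purged = True
        # Expire everything whose deadline has passed by the time the thread looks.
        while queue and queue[0][0] <= now:
            deadline, rid = heapq.heappop(queue)
            lateness[rid] = now - deadline
        now += poll_ms
    return lateness


late = simulate({"a": 5, "b": 20, "c": 400}, purge_at_ms=10,
                watched_entries=340_809, ns_per_entry=1_000)
print(late)  # {'a': 0, 'b': 330, 'c': 0}
```

Request `b`, due 10 ms into the purge, is expired 330 ms late, while `a` and `c`, whose deadlines fall outside the scan, are unaffected. Because the scan cost grows linearly with the number of watched entries, the stall grows with the partition count.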
Here is an example log:
[2014-05-08 21:07:41,950] INFO ExpiredRequestReaper-2 Expired request after 10ms: 5829 (kafka.server.RequestPurgatory$ExpiredRequestReaper)
[2014-05-08 21:07:41,952] INFO ExpiredRequestReaper-2 Expired request after 10ms: 5882 (kafka.server.RequestPurgatory$ExpiredRequestReaper)
[2014-05-08 21:07:41,967] INFO ExpiredRequestReaper-2 Expired request after 11ms: 5884 (kafka.server.RequestPurgatory$ExpiredRequestReaper)
[2014-05-08 21:07:41,968] INFO ExpiredRequestReaper-2 Purging purgatory (kafka.server.RequestPurgatory$ExpiredRequestReaper)
[2014-05-08 21:07:41,969] INFO ExpiredRequestReaper-2 Purged 0 requests from delay queue. (kafka.server.RequestPurgatory$ExpiredRequestReaper)
[2014-05-08 21:07:42,305] INFO ExpiredRequestReaper-2 Purged 340809 (watcher) requests. (kafka.server.RequestPurgatory$ExpiredRequestReaper)
[2014-05-08 21:07:42,305] INFO ExpiredRequestReaper-2 Expired request after 106ms: 5847 (kafka.server.RequestPurgatory$ExpiredRequestReaper)
[2014-05-08 21:07:42,305] INFO ExpiredRequestReaper-2 Expired request after 106ms: 5904 (kafka.server.RequestPurgatory$ExpiredRequestReaper)
[2014-05-08 21:07:42,328] INFO ExpiredRequestReaper-2 Expired request after 10ms: 5908 (kafka.server.RequestPurgatory$ExpiredRequestReaper)
[2014-05-08 21:07:42,329] INFO ExpiredRequestReaper-2 Expired request after 10ms: 5852 (kafka.server.RequestPurgatory$ExpiredRequestReaper)
[2014-05-08 21:07:42,343] INFO ExpiredRequestReaper-2 Expired request after 11ms: 5854 (kafka.server.RequestPurgatory$ExpiredRequestReaper)
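The stall can be measured directly from the log timestamps. The Python sketch below is illustrative only: the helper names `purge_stall_ms` and `expiry_delays_ms` are hypothetical, and it assumes one log entry per line, so the email's line wrapping is undone in the embedded `LOG` string (the entries themselves are copied verbatim).

```python
import re
from datetime import datetime, timedelta

# The log entries above, with the email's line wrapping undone (one entry per line).
LOG = """\
[2014-05-08 21:07:41,950] INFO ExpiredRequestReaper-2 Expired request after 10ms: 5829 (kafka.server.RequestPurgatory$ExpiredRequestReaper)
[2014-05-08 21:07:41,952] INFO ExpiredRequestReaper-2 Expired request after 10ms: 5882 (kafka.server.RequestPurgatory$ExpiredRequestReaper)
[2014-05-08 21:07:41,967] INFO ExpiredRequestReaper-2 Expired request after 11ms: 5884 (kafka.server.RequestPurgatory$ExpiredRequestReaper)
[2014-05-08 21:07:41,968] INFO ExpiredRequestReaper-2 Purging purgatory (kafka.server.RequestPurgatory$ExpiredRequestReaper)
[2014-05-08 21:07:41,969] INFO ExpiredRequestReaper-2 Purged 0 requests from delay queue. (kafka.server.RequestPurgatory$ExpiredRequestReaper)
[2014-05-08 21:07:42,305] INFO ExpiredRequestReaper-2 Purged 340809 (watcher) requests. (kafka.server.RequestPurgatory$ExpiredRequestReaper)
[2014-05-08 21:07:42,305] INFO ExpiredRequestReaper-2 Expired request after 106ms: 5847 (kafka.server.RequestPurgatory$ExpiredRequestReaper)
[2014-05-08 21:07:42,305] INFO ExpiredRequestReaper-2 Expired request after 106ms: 5904 (kafka.server.RequestPurgatory$ExpiredRequestReaper)
[2014-05-08 21:07:42,328] INFO ExpiredRequestReaper-2 Expired request after 10ms: 5908 (kafka.server.RequestPurgatory$ExpiredRequestReaper)
[2014-05-08 21:07:42,329] INFO ExpiredRequestReaper-2 Expired request after 10ms: 5852 (kafka.server.RequestPurgatory$ExpiredRequestReaper)
[2014-05-08 21:07:42,343] INFO ExpiredRequestReaper-2 Expired request after 11ms: 5854 (kafka.server.RequestPurgatory$ExpiredRequestReaper)
"""

# Timestamp prefix of each entry, e.g. "[2014-05-08 21:07:41,968]".
TS = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\]")
EXPIRED = re.compile(r"Expired request after (\d+)ms")


def parse_ts(line):
    """Return the entry's timestamp as a datetime, or None if the line has none."""
    m = TS.match(line)
    return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f") if m else None


def purge_stall_ms(lines):
    """Whole ms from the 'Purging purgatory' entry to the following '(watcher)' entry."""
    start = None
    for line in lines:
        if "Purging purgatory" in line:
            start = parse_ts(line)
        elif start is not None and "(watcher)" in line:
            return (parse_ts(line) - start) // timedelta(milliseconds=1)
    return None


def expiry_delays_ms(lines):
    """The 'Expired request after Nms' values, in log order."""
    delays = []
    for line in lines:
        m = EXPIRED.search(line)
        if m:
            delays.append(int(m.group(1)))
    return delays


lines = LOG.splitlines()
print(purge_stall_ms(lines))    # 337
print(expiry_delays_ms(lines))  # [10, 10, 11, 106, 106, 10, 10, 11]
```

Both numbers come straight from the log; the helpers only do the arithmetic. The purge blocks the reaper for 337 ms, and the first expirations after it are logged at 106 ms instead of the usual 10 to 11 ms.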
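Combined with our buggy purgatory request implementations, which can sometimes
hit their expiration, this can lead to huge latency spikes. In the log above,
requests that normally expire after 10 to 11 ms are expired after 106 ms right
after the purge completes.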