Steven Schlansker created KAFKA-13593:
-----------------------------------------

             Summary: ThrottledChannelReaper slows broker shutdown by multiple 
seconds
                 Key: KAFKA-13593
                 URL: https://issues.apache.org/jira/browse/KAFKA-13593
             Project: Kafka
          Issue Type: Improvement
          Components: core
    Affects Versions: 3.0.0
            Reporter: Steven Schlansker


We run an embedded KRaft broker in integration tests, to test that our Producer 
/ Consumers are all hooked up and verify our overall pipeline.

While trying to deliver CI speed improvements, we noticed that the majority of 
time for a small test is actually spent in Kafka broker shutdown. From the 
logs, it looks like the ClientQuotaManager has multiple ThrottledChannelReaper 
threads and each of them takes up to a second to shutdown.
{code:java}
2022-01-12T15:26:31.932Z [main] INFO  kafka.log.LogManager - Shutdown complete.
2022-01-12T15:26:31.934Z [main] INFO  
k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Fetch]: 
Shutting down
2022-01-12T15:26:32.311Z [main] INFO  
k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Fetch]: 
Shutdown completed
2022-01-12T15:26:32.311Z [ThrottledChannelReaper-Fetch] INFO  
k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Fetch]: 
Stopped
2022-01-12T15:26:32.311Z [main] INFO  
k.s.ClientQuotaManager$ThrottledChannelReaper - 
[ThrottledChannelReaper-Produce]: Shutting down
2022-01-12T15:26:33.312Z [ThrottledChannelReaper-Produce] INFO  
k.s.ClientQuotaManager$ThrottledChannelReaper - 
[ThrottledChannelReaper-Produce]: Stopped
2022-01-12T15:26:33.312Z [main] INFO  
k.s.ClientQuotaManager$ThrottledChannelReaper - 
[ThrottledChannelReaper-Produce]: Shutdown completed
2022-01-12T15:26:33.312Z [main] INFO  
k.s.ClientQuotaManager$ThrottledChannelReaper - 
[ThrottledChannelReaper-Request]: Shutting down
2022-01-12T15:26:34.315Z [ThrottledChannelReaper-Request] INFO  
k.s.ClientQuotaManager$ThrottledChannelReaper - 
[ThrottledChannelReaper-Request]: Stopped
2022-01-12T15:26:34.315Z [main] INFO  
k.s.ClientQuotaManager$ThrottledChannelReaper - 
[ThrottledChannelReaper-Request]: Shutdown completed
2022-01-12T15:26:34.315Z [main] INFO  
k.s.ClientQuotaManager$ThrottledChannelReaper - 
[ThrottledChannelReaper-ControllerMutation]: Shutting down
2022-01-12T15:26:35.317Z [main] INFO  
k.s.ClientQuotaManager$ThrottledChannelReaper - 
[ThrottledChannelReaper-ControllerMutation]: Shutdown completed
2022-01-12T15:26:35.317Z [ThrottledChannelReaper-ControllerMutation] INFO  
k.s.ClientQuotaManager$ThrottledChannelReaper - 
[ThrottledChannelReaper-ControllerMutation]: Stopped{code}
Inspecting the code, the ThrottledChannelReaper threads are marked as not 
interruptible, so ShutdownableThread does not interrupt the worker on shutdown. 
Unfortunately, the doWork method polls with a 1 second timeout. So you wait up 
to 1s for every type of quota, in this case for a total of almost 4s.

 

While this is not a problem for production Kafka brokers, where instances are 
expected to stay up for months, in tests that expect to spin up and down it is 
easily noticed and adds real overhead to CI.

 

Suggested possible remediations:
 * Allow ThrottledChannelReaper to be interrupted by ShutdownableThread
 * ThrottledChannelReaper overrides initiateShutdown and after calling 
{{super.initiateShutdown}} then enqueues a {{null}} element on the delayQueue 
to force worker thread to notice shutdown state
 * Reduce 1s poll timeout to something small (carries overhead penalty for all 
users though, so this is less desirable), or make it configurable so we can set 
it to e.g. 10ms in unit tests



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to