Steven Schlansker created KAFKA-13593: -----------------------------------------
Summary: ThrottledChannelReaper slows broker shutdown by multiple seconds Key: KAFKA-13593 URL: https://issues.apache.org/jira/browse/KAFKA-13593 Project: Kafka Issue Type: Improvement Components: core Affects Versions: 3.0.0 Reporter: Steven Schlansker We run an embedded KRaft broker in integration tests, to test that our Producer / Consumers are all hooked up and verify our overall pipeline. While trying to deliver CI speed improvements, we noticed that the majority of time for a small test is actually spent in Kafka broker shutdown. From the logs, it looks like the ClientQuotaManager has multiple ThrottledChannelReaper threads and each of them takes up to a second to shutdown. {code:java} 2022-01-12T15:26:31.932Z [main] INFO kafka.log.LogManager - Shutdown complete. 2022-01-12T15:26:31.934Z [main] INFO k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Fetch]: Shutting down 2022-01-12T15:26:32.311Z [main] INFO k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Fetch]: Shutdown completed 2022-01-12T15:26:32.311Z [ThrottledChannelReaper-Fetch] INFO k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Fetch]: Stopped 2022-01-12T15:26:32.311Z [main] INFO k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Produce]: Shutting down 2022-01-12T15:26:33.312Z [ThrottledChannelReaper-Produce] INFO k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Produce]: Stopped 2022-01-12T15:26:33.312Z [main] INFO k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Produce]: Shutdown completed 2022-01-12T15:26:33.312Z [main] INFO k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Request]: Shutting down 2022-01-12T15:26:34.315Z [ThrottledChannelReaper-Request] INFO k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Request]: Stopped 2022-01-12T15:26:34.315Z [main] INFO k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Request]: Shutdown completed 2022-01-12T15:26:34.315Z [main] INFO k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-ControllerMutation]: Shutting down 2022-01-12T15:26:35.317Z [main] INFO k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-ControllerMutation]: Shutdown completed 2022-01-12T15:26:35.317Z [ThrottledChannelReaper-ControllerMutation] INFO k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-ControllerMutation]: Stopped{code} Inspecting the code, the ThrottledChannelReaper threads are marked as not interruptible, so ShutdownableThread does not interrupt the worker on shutdown. Unfortunately, the doWork method polls with a 1 second timeout. So you wait up to 1s for every type of quota, in this case for a total of almost 4s. While this is not a problem for production Kafka brokers, where instances are expected to stay up for months, in tests that expect to spin up and down it is easily noticed and adds real overhead to CI. Suggested possible remediations: * Allow ThrottledChannelReaper to be interrupted by ShutdownableThread * ThrottledChannelReaper overrides initiateShutdown and after calling {{super.initiateShutdown}} then enqueues a {{null}} element on the delayQueue to force worker thread to notice shutdown state * Reduce 1s poll timeout to something small (carries overhead penalty for all users though, so this is less desirable), or make it configurable so we can set it to e.g. 10ms in unit tests -- This message was sent by Atlassian Jira (v8.20.1#820001)