[ https://issues.apache.org/jira/browse/KAFKA-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthias J. Sax updated KAFKA-14713:
------------------------------------
    Fix Version/s: 3.2.0
                       (was: 3.4.0)

> Kafka Streams global table startup takes too long
> -------------------------------------------------
>
>                 Key: KAFKA-14713
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14713
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 3.0.2
>            Reporter: Tamas
>            Priority: Critical
>             Fix For: 3.2.0
>
>
> *Some context first*
> We have a Spring-based Kafka Streams application. This application is
> listening to two topics. Let's call them apartment and visitor. The
> apartments are stored in a global table, while the visitors are in the stream
> we are processing, and at one point we join the visitor stream with the
> apartment table. In our test environment, both topics contain 10 partitions.
> *Issue*
> At first deployment, everything goes fine: the global table is built and all
> entries in the stream are processed.
> After everything is finished, we shut down the application, restart it and
> send out a new set of visitors. The application seemingly does not respond.
> After some more debugging it turned out that it simply takes 5 minutes to
> start up, because the global table takes 30 seconds (default value for the
> global request timeout) to accept that there are no messages in the apartment
> topic, for each and every partition. If we send out the list of apartments
> as new messages, the application starts up immediately.
> To make matters worse, we have clients with 96 partitions, where the startup
> time would be 48 minutes. Not having messages in the topics between
> application shutdown and restart is a valid use case, so this is quite a big
> problem.
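
The startup times quoted in the report follow from the partitions being waited on one after another, each for the full timeout. A minimal sketch of that arithmetic is below; the class and method names are hypothetical and are not part of Kafka.

```java
import java.time.Duration;

public class StartupDelayEstimate {

    // Worst case when every partition is idle and partitions are handled
    // sequentially: each one waits out the full timeout before the next starts.
    public static Duration worstCaseDelay(int partitions, Duration perPartitionTimeout) {
        return perPartitionTimeout.multipliedBy(partitions);
    }

    public static void main(String[] args) {
        Duration timeout = Duration.ofSeconds(30); // default value cited in the report
        System.out.println(worstCaseDelay(10, timeout).toMinutes()); // 5
        System.out.println(worstCaseDelay(96, timeout).toMinutes()); // 48
    }
}
```
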
> *Possible workarounds*
> We could reduce the request timeout, but since this value is not specific to
> the global table initialization, but a global request timeout for a lot of
> things, we do not know what else it will affect, so we are not very keen on
> doing that. Even then, it would mean a 1.5 minute delay for this particular
> client (more if we have other use cases in the future where we need to use
> more global tables), which is far too much, considering that the application
> would otherwise be able to start in about 20 seconds.
> *Potential solutions we see*
> # Introduce a specific global table initialization timeout in
> GlobalStateManagerImpl. Then we would be able to safely modify that value
> without fear of making some other part of Kafka unstable.
> # Parallelize the initialization of the global table partitions in
> GlobalStateManagerImpl: knowing that the delay at startup is constant instead
> of linear with the number of partitions would be a huge help.
> # As long as we receive a response, accept the empty map in the
> KafkaConsumer, and continue instead of going into a busy-waiting loop.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
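
To illustrate the second proposal (parallel initialization), here is a minimal sketch using only the Java standard library. It is not the GlobalStateManagerImpl code: `ParallelRestoreSketch`, `restoreAll`, and the sleep that stands in for an idle partition waiting out its timeout are all hypothetical. The point is only that, with one task per partition, the total wait is bounded by roughly one timeout rather than one timeout per partition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntConsumer;

public class ParallelRestoreSketch {

    // Runs the (hypothetical) per-partition restore step for all partitions
    // concurrently and returns once every partition has finished.
    public static void restoreAll(int partitions, IntConsumer restoreOnePartition) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions));
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int p = 0; p < partitions; p++) {
                final int partition = p;
                pending.add(pool.submit(() -> restoreOnePartition.accept(partition)));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for completion and surface any failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while restoring", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("partition restore failed", e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }

    // Stand-in for an idle partition that blocks until its timeout expires.
    static void simulateIdlePartition(long timeoutMillis) {
        try {
            Thread.sleep(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        restoreAll(10, partition -> simulateIdlePartition(200)); // 200 ms simulated timeout
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        // Expected to be close to one simulated timeout, not ten of them.
        System.out.println("elapsed ms: " + elapsedMillis);
    }
}
```

A real change would also have to consider whether the underlying consumer can be used from several threads, which this sketch does not address.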