[ 
https://issues.apache.org/jira/browse/KAFKA-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689356#comment-17689356
 ] 

Matthias J. Sax edited comment on KAFKA-14713 at 2/15/23 9:44 PM:
------------------------------------------------------------------

What version are you using? Also, can you point me to the code where it 
actually waits/hangs? (Since you have already looked into it, that would be 
quicker.) I am not sure yet whether both issues are actually the same. (Maybe 
the "eos" config on the other ticket is a red herring.) But I guess we can dig 
into it a little bit.


was (Author: mjsax):
What version are you using? – Also, can you point me to the code where is 
actually waits/hangs (as you did already looked into it, it would be quicker 
this way). – I am not sure yet, if the issue is still the same though or not. 
But I guess we can dig into it a little bit.

> Kafka Streams global table startup takes too long
> -------------------------------------------------
>
>                 Key: KAFKA-14713
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14713
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: Tamas
>            Priority: Critical
>
> *Some context first*
> We have a Spring-based Kafka Streams application. This application 
> listens to two topics. Let's call them apartment and visitor. The 
> apartments are stored in a global table, while the visitors are in the stream 
> we are processing, and at one point we join the visitor stream with the 
> apartment table. In our test environment, both topics contain 10 partitions.
> *Issue*
> At first deployment, everything goes fine, the global table is built and all 
> entries in the stream are processed.
> After everything is finished, we shut down the application, restart it and 
> send out a new set of visitors. The application seemingly does not respond.
> After some more debugging it turned out that it simply takes 5 minutes to 
> start up, because for each and every partition the global table takes 30 
> seconds (the default value for the global request timeout) to accept that 
> there are no messages in the apartment topic. If we send out the list of 
> apartments as new messages, the application starts up immediately.
> To make matters worse, we have clients with 96 partitions, where the startup 
> time would be 48 minutes. Not having messages in the topics between 
> application shutdown and restart is a valid use case, so this is quite a big 
> problem.
> *Possible workarounds*
> We could reduce the request timeout, but since this value is not specific to 
> the global table initialization, but a global request timeout for a lot of 
> things, we do not know what else it will affect, so we are not very keen on 
> doing that. Even then, it would mean a 1.5-minute delay for this particular 
> client (more if other use cases in the future need more global tables), 
> which is far too much, considering that the application would otherwise be 
> able to start in about 20 seconds.
> *Potential solutions we see*
>  # Introduce a specific global table initialization timeout in 
> GlobalStateManagerImpl. Then we would be able to safely modify that value 
> without fear of making some other part of Kafka unstable.
>  # Parallelize the initialization of the global table partitions in 
> GlobalStateManagerImpl: knowing that the delay at startup is constant instead 
> of linear with the number of partitions would be a huge help.
>  # As long as we receive a response, accept the empty map in the 
> KafkaConsumer, and continue instead of going into a busy-waiting loop.
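
To make the numbers in the report concrete, here is a rough sketch of the 
arithmetic only. It is not Kafka code: the class and method names are 
hypothetical, and it assumes that every empty partition waits out the full 
30-second timeout, as the reporter describes. It compares the sequential wait 
with the constant wait that the parallelization proposal (solution 2) is hoping 
for.

```java
import java.time.Duration;

// Hypothetical helper for illustration; not part of Kafka or Kafka Streams.
public class GlobalTableStartupEstimate {

    // Sequential case described in the report: each empty partition waits
    // for the full timeout, one partition after another.
    static Duration sequentialDelay(int partitions, Duration timeoutPerPartition) {
        return timeoutPerPartition.multipliedBy(partitions);
    }

    // Idealized parallel case (assumption): all partition waits overlap,
    // so the total delay is a single timeout regardless of partition count.
    static Duration parallelDelay(int partitions, Duration timeoutPerPartition) {
        return partitions == 0 ? Duration.ZERO : timeoutPerPartition;
    }

    public static void main(String[] args) {
        Duration timeout = Duration.ofSeconds(30); // timeout cited in the report

        // 10 partitions (test environment): prints 5
        System.out.println(sequentialDelay(10, timeout).toMinutes());
        // 96 partitions (largest client): prints 48
        System.out.println(sequentialDelay(96, timeout).toMinutes());
        // Idealized parallel initialization with 96 partitions: prints 30
        System.out.println(parallelDelay(96, timeout).getSeconds());
    }
}
```

The sequential figures match the 5 and 48 minutes quoted in the report. The 
parallel figure is a best case and ignores any per-partition overhead.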



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
