[ 
https://issues.apache.org/jira/browse/KAFKA-7657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Crowley updated KAFKA-7657:
----------------------------------
    Description: 
We have a streams application with 3 instances running, two of which are 
reporting the state of REBALANCING even after they have been running for days. 
Restarting the application has no effect on the stream state.

This seems suspect because each instance appears to be processing messages, and 
the kafka-consumer-groups CLI tool reports hardly any offset lag in any of the 
partitions assigned to the REBALANCING consumers. Each partition seems to be 
processing an equal amount of records too.

Inspecting the state.dir on disk, it looks like the RocksDB state has been 
built and hovers at the expected size on disk.

This problem has persisted for us after we rebuilt our Kafka cluster and reset 
topics + consumer groups in our dev environment.

There is nothing in the logs (with level set to DEBUG) in both the broker or 
the application that suggests something exceptional has happened causing the 
application to be stuck REBALANCING.

We are also running multiple streaming applications where this problem does not 
exist.

Two differences between this application and our other streaming applications 
are:
 * We have processing.guarantee set to exactly_once
 * We are using a ValueTransformer which fetches from and puts data on a 
windowed state store

The REBALANCING state is returned from both polling the state method of our 
KafkaStreams instance, and our custom metric which is derived from some logic 
in a KafkaStreams.StateListener class attached via the setStateListener method.

 

While I have provided a bit of context, before I reply with some reproducible 
code - is there a simple way in which I can determine that my streams 
application is in a RUNNING state without relying on the same mechanisms as 
used above?

Further, given that it seems like my application is actually running - could 
this perhaps be a bug to do with how the stream state is being reported (in the 
context of a transactional stream using the processor API)?

 

 

 

 

  was:
We have a streams application with 3 instances running, two of which are 
reporting the state of REBALANCING even after they have been running for days. 
Restarting the application has no effect on the stream state.

This seems suspect because each instance appears to be processing messages, and 
the kafka-consumer-groups CLI tool reports no offset lag in any of the 
partitions assigned to the REBALANCING consumers. Each partition seems to be 
processing an equal amount of records too.

Inspecting the state.dir on disk, it looks like the RocksDB state has been 
built and hovers at the expected size on disk.

This problem has persisted for us after we rebuilt our Kafka cluster and reset 
topics + consumer groups in our dev environment.

There is nothing in the logs (with level set to DEBUG) in both the broker or 
the application that suggests something exceptional has happened causing the 
application to be stuck REBALANCING.

We are also running multiple streaming applications where this problem does not 
exist.

Two differences between this application and our other streaming applications 
are:
 * We have processing.guarantee set to exactly_once
 * We are using a ValueTransformer which fetches from and puts data on a 
windowed state store

The REBALANCING state is returned from both polling the state method of our 
KafkaStreams instance, and our custom metric which is derived from some logic 
in a KafkaStreams.StateListener class attached via the setStateListener method.

 

While I have provided a bit of context, before I reply with some reproducible 
code - is there a simple way in which I can determine that my streams 
application is in a RUNNING state without relying on the same mechanisms as 
used above?

Further, given that it seems like my application is actually running - could 
this perhaps be a bug to do with how the stream state is being reported (in the 
context of a transactional stream using the processor API)?

 

 

 

 


> Invalid reporting of stream state in Kafka streams application
> --------------------------------------------------------------
>
>                 Key: KAFKA-7657
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7657
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.0.1
>            Reporter: Thomas Crowley
>            Priority: Minor
>
> We have a streams application with 3 instances running, two of which are 
> reporting the state of REBALANCING even after they have been running for 
> days. Restarting the application has no effect on the stream state.
> This seems suspect because each instance appears to be processing messages, 
> and the kafka-consumer-groups CLI tool reports hardly any offset lag in any 
> of the partitions assigned to the REBALANCING consumers. Each partition seems 
> to be processing an equal amount of records too.
> Inspecting the state.dir on disk, it looks like the RocksDB state has been 
> built and hovers at the expected size on disk.
> This problem has persisted for us after we rebuilt our Kafka cluster and 
> reset topics + consumer groups in our dev environment.
> There is nothing in the logs (with level set to DEBUG) in both the broker or 
> the application that suggests something exceptional has happened causing the 
> application to be stuck REBALANCING.
> We are also running multiple streaming applications where this problem does 
> not exist.
> Two differences between this application and our other streaming applications 
> are:
>  * We have processing.guarantee set to exactly_once
>  * We are using a ValueTransformer which fetches from and puts data on a 
> windowed state store
> The REBALANCING state is returned from both polling the state method of our 
> KafkaStreams instance, and our custom metric which is derived from some logic 
> in a KafkaStreams.StateListener class attached via the setStateListener 
> method.
>  
> While I have provided a bit of context, before I reply with some reproducible 
> code - is there a simple way in which I can determine that my streams 
> application is in a RUNNING state without relying on the same mechanisms as 
> used above?
> Further, given that it seems like my application is actually running - could 
> this perhaps be a bug to do with how the stream state is being reported (in 
> the context of a transactional stream using the processor API)?
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to