[ 
https://issues.apache.org/jira/browse/KAFKA-7657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711247#comment-16711247
 ] 

Patrik Kleindl commented on KAFKA-7657:
---------------------------------------

[~guozhang]

 
{code:java}
2018-11-30 08:58:25,560 INFO 
[org.apache.kafka.streams.processor.internals.StreamThread] 
(application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-11) - 
stream-thread 
[application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-11] State 
transition from RUNNING to PARTITIONS_REVOKED
2018-11-30 08:58:25,561 INFO [org.apache.kafka.streams.KafkaStreams] 
(application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-11) - 
stream-client [application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e] State 
transition from RUNNING to REBALANCING
2018-11-30 08:58:25,630 INFO 
[org.apache.kafka.streams.processor.internals.StreamThread] 
(application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-10) - 
stream-thread 
[application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-10] State 
transition from RUNNING to PARTITIONS_REVOKED
2018-11-30 08:58:25,638 INFO 
[org.apache.kafka.streams.processor.internals.StreamThread] 
(application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-9) - 
stream-thread [application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-9] 
State transition from RUNNING to PARTITIONS_REVOKED
2018-11-30 08:58:25,885 INFO 
[org.apache.kafka.streams.processor.internals.StreamThread] 
(application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-12) - 
stream-thread 
[application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-12] State 
transition from RUNNING to PARTITIONS_REVOKED
2018-11-30 08:58:28,238 INFO 
[org.apache.kafka.streams.processor.internals.StreamThread] 
(application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-11) - 
stream-thread 
[application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-11] State 
transition from PARTITIONS_REVOKED to PARTITIONS_ASSIGNED
2018-11-30 08:58:28,238 INFO 
[org.apache.kafka.streams.processor.internals.StreamThread] 
(application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-9) - 
stream-thread [application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-9] 
State transition from PARTITIONS_REVOKED to PARTITIONS_ASSIGNED
2018-11-30 08:58:28,249 INFO 
[org.apache.kafka.streams.processor.internals.StreamThread] 
(application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-10) - 
stream-thread 
[application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-10] State 
transition from PARTITIONS_REVOKED to PARTITIONS_ASSIGNED
2018-11-30 08:58:28,279 INFO 
[org.apache.kafka.streams.processor.internals.StreamThread] 
(application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-12) - 
stream-thread 
[application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-12] State 
transition from PARTITIONS_REVOKED to PARTITIONS_ASSIGNED
2018-11-30 08:58:28,505 INFO 
[org.apache.kafka.streams.processor.internals.StreamThread] 
(application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-11) - 
stream-thread 
[application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-11] State 
transition from PARTITIONS_ASSIGNED to RUNNING
2018-11-30 08:58:29,664 INFO 
[org.apache.kafka.streams.processor.internals.StreamThread] 
(application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-10) - 
stream-thread 
[application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-10] State 
transition from PARTITIONS_ASSIGNED to RUNNING
2018-11-30 08:58:33,300 INFO 
[org.apache.kafka.streams.processor.internals.StreamThread] 
(application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-9) - 
stream-thread [application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-9] 
State transition from PARTITIONS_ASSIGNED to RUNNING
2018-11-30 08:58:34,490 INFO 
[org.apache.kafka.streams.processor.internals.StreamThread] 
(application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-12) - 
stream-thread 
[application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-12] State 
transition from PARTITIONS_ASSIGNED to RUNNING
2018-11-30 08:58:34,491 INFO [org.apache.kafka.streams.KafkaStreams] 
(application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-12) - 
stream-client [application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e] State 
transition from REBALANCING to RUNNING
2018-11-30 09:08:14,273 INFO 
[org.apache.kafka.streams.processor.internals.StreamThread] 
(application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - 
stream-thread 
[application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15] State 
transition from RUNNING to PENDING_SHUTDOWN
2018-11-30 09:08:14,715 INFO 
[org.apache.kafka.streams.processor.internals.StreamThread] 
(application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - 
stream-thread 
[application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15] State 
transition from PENDING_SHUTDOWN to DEAD
2018-11-30 09:08:45,750 INFO 
[org.apache.kafka.streams.processor.internals.StreamThread] 
(application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-16) - 
stream-thread 
[application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-16] State 
transition from RUNNING to PARTITIONS_REVOKED
2018-11-30 09:08:45,750 INFO [org.apache.kafka.streams.KafkaStreams] 
(application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-16) - 
stream-client [application-610151c7-8769-4cc5-9254-969a831e4a4d] State 
transition from RUNNING to REBALANCING
2018-11-30 09:08:46,928 INFO 
[org.apache.kafka.streams.processor.internals.StreamThread] 
(application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-14) - 
stream-thread 
[application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-14] State 
transition from RUNNING to PARTITIONS_REVOKED
2018-11-30 09:08:47,301 INFO 
[org.apache.kafka.streams.processor.internals.StreamThread] 
(application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-13) - 
stream-thread 
[application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-13] State 
transition from RUNNING to PARTITIONS_REVOKED
2018-11-30 09:08:47,546 INFO 
[org.apache.kafka.streams.processor.internals.StreamThread] 
(application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-14) - 
stream-thread 
[application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-14] State 
transition from PARTITIONS_REVOKED to PARTITIONS_ASSIGNED
2018-11-30 09:08:47,547 INFO 
[org.apache.kafka.streams.processor.internals.StreamThread] 
(application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-16) - 
stream-thread 
[application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-16] State 
transition from PARTITIONS_REVOKED to PARTITIONS_ASSIGNED
2018-11-30 09:08:47,549 INFO 
[org.apache.kafka.streams.processor.internals.StreamThread] 
(application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-13) - 
stream-thread 
[application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-13] State 
transition from PARTITIONS_REVOKED to PARTITIONS_ASSIGNED
2018-11-30 09:08:47,776 INFO 
[org.apache.kafka.streams.processor.internals.StreamThread] 
(application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-13) - 
stream-thread 
[application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-13] State 
transition from PARTITIONS_ASSIGNED to RUNNING
2018-11-30 09:08:47,779 INFO 
[org.apache.kafka.streams.processor.internals.StreamThread] 
(application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-14) - 
stream-thread 
[application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-14] State 
transition from PARTITIONS_ASSIGNED to RUNNING
2018-11-30 09:08:47,874 INFO 
[org.apache.kafka.streams.processor.internals.StreamThread] 
(application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-16) - 
stream-thread 
[application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-16] State 
transition from PARTITIONS_ASSIGNED to RUNNING

{code}
StreamThread-15 died because of this:
{code:java}
2018-11-30 09:08:14,716 ERROR [stderr] 
(application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - You can 
increase producer parameter `retries` and `retry.backoff.ms` to avoid this 
error.
2018-11-30 09:08:14,716 ERROR [stderr] 
(application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - at 
org.apache.kafka.streams.processor.internals.RecordCollectorImpl.recordSendError(RecordCollectorImpl.java:130)
2018-11-30 09:08:14,716 ERROR [stderr] 
(application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - at 
org.apache.kafka.streams.processor.internals.RecordCollectorImpl.access$500(RecordCollectorImpl.java:50)
2018-11-30 09:08:14,716 ERROR [stderr] 
(application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - at 
org.apache.kafka.streams.processor.internals.RecordCollectorImpl$1.onCompletion(RecordCollectorImpl.java:189)
2018-11-30 09:08:14,716 ERROR [stderr] 
(application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - at 
org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion(KafkaProducer.java:1253)
2018-11-30 09:08:14,716 ERROR [stderr] 
(application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - at 
org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:204)
2018-11-30 09:08:14,716 ERROR [stderr] 
(application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - at 
org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:187)
2018-11-30 09:08:14,716 ERROR [stderr] 
(application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - at 
org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:640)
2018-11-30 09:08:14,716 ERROR [stderr] 
(application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - at 
org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:609)
2018-11-30 09:08:14,716 ERROR [stderr] 
(application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - at 
org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:566)
2018-11-30 09:08:14,716 ERROR [stderr] 
(application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - at 
org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:490)
2018-11-30 09:08:14,716 ERROR [stderr] 
(application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - at 
org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:74)
2018-11-30 09:08:14,716 ERROR [stderr] 
(application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - at 
org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:705)
2018-11-30 09:08:14,716 ERROR [stderr] 
(application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - at 
org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)
2018-11-30 09:08:14,716 ERROR [stderr] 
(application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - at 
org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:532)
2018-11-30 09:08:14,716 ERROR [stderr] 
(application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - at 
org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:524)
2018-11-30 09:08:14,716 ERROR [stderr] 
(application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:244)
2018-11-30 09:08:14,717 ERROR [stderr] 
(application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:168)
2018-11-30 09:08:14,717 ERROR [stderr] 
(application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - at 
java.lang.Thread.run(Thread.java:748)
2018-11-30 09:08:14,717 ERROR [stderr] 
(application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - Caused by: 
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
not the leader for that topic-partition.
{code}
But as the application kept running it should be fine anyway, shouldn't it? And 
a rebalance should be visible on the other instance too?

As far as I can see, for this streams application there are no global state 
stores.

 

> Invalid reporting of stream state in Kafka streams application
> --------------------------------------------------------------
>
>                 Key: KAFKA-7657
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7657
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.0.1
>            Reporter: Thomas Crowley
>            Priority: Major
>              Labels: bug
>
> We have a streams application with 3 instances running, two of which are 
> reporting the state of REBALANCING even after they have been running for 
> days. Restarting the application has no effect on the stream state.
> This seems suspect because each instance appears to be processing messages, 
> and the kafka-consumer-groups CLI tool reports hardly any offset lag in any 
> of the partitions assigned to the REBALANCING consumers. Each partition seems 
> to be processing an equal amount of records too.
> Inspecting the state.dir on disk, it looks like the RocksDB state has been 
> built and hovers at the expected size on disk.
> This problem has persisted for us after we rebuilt our Kafka cluster and 
> reset topics + consumer groups in our dev environment.
> There is nothing in the logs (with level set to DEBUG) in both the broker or 
> the application that suggests something exceptional has happened causing the 
> application to be stuck REBALANCING.
> We are also running multiple streaming applications where this problem does 
> not exist.
> Two differences between this application and our other streaming applications 
> are:
>  * We have processing.guarantee set to exactly_once
>  * We are using a ValueTransformer which fetches from and puts data on a 
> windowed state store
> The REBALANCING state is returned from both polling the state method of our 
> KafkaStreams instance, and our custom metric which is derived from some logic 
> in a KafkaStreams.StateListener class attached via the setStateListener 
> method.
>  
> While I have provided a bit of context, before I reply with some reproducible 
> code - is there a simple way in which I can determine that my streams 
> application is in a RUNNING state without relying on the same mechanisms as 
> used above?
> Further, given that it seems like my application is actually running - could 
> this perhaps be a bug to do with how the stream state is being reported (in 
> the context of a transactional stream using the processor API)?
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to