[ https://issues.apache.org/jira/browse/KAFKA-7657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711247#comment-16711247 ]
Patrik Kleindl commented on KAFKA-7657: --------------------------------------- [~guozhang] {code:java} 2018-11-30 08:58:25,560 INFO [org.apache.kafka.streams.processor.internals.StreamThread] (application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-11) - stream-thread [application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-11] State transition from RUNNING to PARTITIONS_REVOKED 2018-11-30 08:58:25,561 INFO [org.apache.kafka.streams.KafkaStreams] (application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-11) - stream-client [application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e] State transition from RUNNING to REBALANCING 2018-11-30 08:58:25,630 INFO [org.apache.kafka.streams.processor.internals.StreamThread] (application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-10) - stream-thread [application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-10] State transition from RUNNING to PARTITIONS_REVOKED 2018-11-30 08:58:25,638 INFO [org.apache.kafka.streams.processor.internals.StreamThread] (application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-9) - stream-thread [application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-9] State transition from RUNNING to PARTITIONS_REVOKED 2018-11-30 08:58:25,885 INFO [org.apache.kafka.streams.processor.internals.StreamThread] (application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-12) - stream-thread [application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-12] State transition from RUNNING to PARTITIONS_REVOKED 2018-11-30 08:58:28,238 INFO [org.apache.kafka.streams.processor.internals.StreamThread] (application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-11) - stream-thread [application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-11] State transition from PARTITIONS_REVOKED to PARTITIONS_ASSIGNED 2018-11-30 08:58:28,238 INFO [org.apache.kafka.streams.processor.internals.StreamThread] (application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-9) - stream-thread [application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-9] State transition from PARTITIONS_REVOKED to PARTITIONS_ASSIGNED 2018-11-30 08:58:28,249 INFO [org.apache.kafka.streams.processor.internals.StreamThread] (application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-10) - stream-thread [application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-10] State transition from PARTITIONS_REVOKED to PARTITIONS_ASSIGNED 2018-11-30 08:58:28,279 INFO [org.apache.kafka.streams.processor.internals.StreamThread] (application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-12) - stream-thread [application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-12] State transition from PARTITIONS_REVOKED to PARTITIONS_ASSIGNED 2018-11-30 08:58:28,505 INFO [org.apache.kafka.streams.processor.internals.StreamThread] (application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-11) - stream-thread [application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-11] State transition from PARTITIONS_ASSIGNED to RUNNING 2018-11-30 08:58:29,664 INFO [org.apache.kafka.streams.processor.internals.StreamThread] (application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-10) - stream-thread [application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-10] State transition from PARTITIONS_ASSIGNED to RUNNING 2018-11-30 08:58:33,300 INFO [org.apache.kafka.streams.processor.internals.StreamThread] (application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-9) - stream-thread [application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-9] State transition from PARTITIONS_ASSIGNED to RUNNING 2018-11-30 08:58:34,490 INFO [org.apache.kafka.streams.processor.internals.StreamThread] (application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-12) - stream-thread [application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-12] State transition from PARTITIONS_ASSIGNED to RUNNING 2018-11-30 08:58:34,491 INFO [org.apache.kafka.streams.KafkaStreams] (application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e-StreamThread-12) - stream-client [application-ab20ef7d-6de2-4335-a8ce-4cb81d01fb7e] State transition from REBALANCING to RUNNING 2018-11-30 09:08:14,273 INFO [org.apache.kafka.streams.processor.internals.StreamThread] (application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - stream-thread [application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15] State transition from RUNNING to PENDING_SHUTDOWN 2018-11-30 09:08:14,715 INFO [org.apache.kafka.streams.processor.internals.StreamThread] (application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - stream-thread [application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15] State transition from PENDING_SHUTDOWN to DEAD 2018-11-30 09:08:45,750 INFO [org.apache.kafka.streams.processor.internals.StreamThread] (application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-16) - stream-thread [application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-16] State transition from RUNNING to PARTITIONS_REVOKED 2018-11-30 09:08:45,750 INFO [org.apache.kafka.streams.KafkaStreams] (application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-16) - stream-client [application-610151c7-8769-4cc5-9254-969a831e4a4d] State transition from RUNNING to REBALANCING 2018-11-30 09:08:46,928 INFO [org.apache.kafka.streams.processor.internals.StreamThread] (application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-14) - stream-thread [application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-14] State transition from RUNNING to PARTITIONS_REVOKED 2018-11-30 09:08:47,301 INFO [org.apache.kafka.streams.processor.internals.StreamThread] (application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-13) - stream-thread [application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-13] State transition from RUNNING to PARTITIONS_REVOKED 2018-11-30 09:08:47,546 INFO [org.apache.kafka.streams.processor.internals.StreamThread] (application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-14) - stream-thread [application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-14] State transition from PARTITIONS_REVOKED to PARTITIONS_ASSIGNED 2018-11-30 09:08:47,547 INFO [org.apache.kafka.streams.processor.internals.StreamThread] (application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-16) - stream-thread [application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-16] State transition from PARTITIONS_REVOKED to PARTITIONS_ASSIGNED 2018-11-30 09:08:47,549 INFO [org.apache.kafka.streams.processor.internals.StreamThread] (application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-13) - stream-thread [application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-13] State transition from PARTITIONS_REVOKED to PARTITIONS_ASSIGNED 2018-11-30 09:08:47,776 INFO [org.apache.kafka.streams.processor.internals.StreamThread] (application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-13) - stream-thread [application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-13] State transition from PARTITIONS_ASSIGNED to RUNNING 2018-11-30 09:08:47,779 INFO [org.apache.kafka.streams.processor.internals.StreamThread] (application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-14) - stream-thread [application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-14] State transition from PARTITIONS_ASSIGNED to RUNNING 2018-11-30 09:08:47,874 INFO [org.apache.kafka.streams.processor.internals.StreamThread] (application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-16) - stream-thread [application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-16] State transition from PARTITIONS_ASSIGNED to RUNNING {code} StreamThread-15 died because of this: {code:java} 2018-11-30 09:08:14,716 ERROR [stderr] (application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - You can increase producer parameter `retries` and `retry.backoff.ms` to avoid this error. 2018-11-30 09:08:14,716 ERROR [stderr] (application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.recordSendError(RecordCollectorImpl.java:130) 2018-11-30 09:08:14,716 ERROR [stderr] (application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.access$500(RecordCollectorImpl.java:50) 2018-11-30 09:08:14,716 ERROR [stderr] (application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - at org.apache.kafka.streams.processor.internals.RecordCollectorImpl$1.onCompletion(RecordCollectorImpl.java:189) 2018-11-30 09:08:14,716 ERROR [stderr] (application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - at org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion(KafkaProducer.java:1253) 2018-11-30 09:08:14,716 ERROR [stderr] (application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - at org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:204) 2018-11-30 09:08:14,716 ERROR [stderr] (application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - at org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:187) 2018-11-30 09:08:14,716 ERROR [stderr] (application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:640) 2018-11-30 09:08:14,716 ERROR [stderr] (application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:609) 2018-11-30 09:08:14,716 ERROR [stderr] (application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:566) 2018-11-30 09:08:14,716 ERROR [stderr] (application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - at org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:490) 2018-11-30 09:08:14,716 ERROR [stderr] (application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - at org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:74) 2018-11-30 09:08:14,716 ERROR [stderr] (application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - at org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:705) 2018-11-30 09:08:14,716 ERROR [stderr] (application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - at org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109) 2018-11-30 09:08:14,716 ERROR [stderr] (application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - at org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:532) 2018-11-30 09:08:14,716 ERROR [stderr] (application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:524) 2018-11-30 09:08:14,716 ERROR [stderr] (application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:244) 2018-11-30 09:08:14,717 ERROR [stderr] (application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:168) 2018-11-30 09:08:14,717 ERROR [stderr] (application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - at java.lang.Thread.run(Thread.java:748) 2018-11-30 09:08:14,717 ERROR [stderr] (application-610151c7-8769-4cc5-9254-969a831e4a4d-StreamThread-15) - Caused by: org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition. {code} But as the application kept running it should be fine anyway, shouldn't it? And a rebalance should be visible on the other instance too? As far as I can see, for this streams application there are no global state stores. > Invalid reporting of stream state in Kafka streams application > -------------------------------------------------------------- > > Key: KAFKA-7657 > URL: https://issues.apache.org/jira/browse/KAFKA-7657 > Project: Kafka > Issue Type: Bug > Components: streams > Affects Versions: 2.0.1 > Reporter: Thomas Crowley > Priority: Major > Labels: bug > > We have a streams application with 3 instances running, two of which are > reporting the state of REBALANCING even after they have been running for > days. Restarting the application has no effect on the stream state. > This seems suspect because each instance appears to be processing messages, > and the kafka-consumer-groups CLI tool reports hardly any offset lag in any > of the partitions assigned to the REBALANCING consumers. Each partition seems > to be processing an equal amount of records too. > Inspecting the state.dir on disk, it looks like the RocksDB state has been > built and hovers at the expected size on disk. > This problem has persisted for us after we rebuilt our Kafka cluster and > reset topics + consumer groups in our dev environment. > There is nothing in the logs (with level set to DEBUG) in both the broker or > the application that suggests something exceptional has happened causing the > application to be stuck REBALANCING. > We are also running multiple streaming applications where this problem does > not exist. > Two differences between this application and our other streaming applications > are: > * We have processing.guarantee set to exactly_once > * We are using a ValueTransformer which fetches from and puts data on a > windowed state store > The REBALANCING state is returned from both polling the state method of our > KafkaStreams instance, and our custom metric which is derived from some logic > in a KafkaStreams.StateListener class attached via the setStateListener > method. > > While I have provided a bit of context, before I reply with some reproducible > code - is there a simple way in which I can determine that my streams > application is in a RUNNING state without relying on the same mechanisms as > used above? > Further, given that it seems like my application is actually running - could > this perhaps be a bug to do with how the stream state is being reported (in > the context of a transactional stream using the processor API)? > > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)