[ https://issues.apache.org/jira/browse/KAFKA-5397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Guozhang Wang resolved KAFKA-5397.
----------------------------------
Resolution: Fixed
> streams are not recovering from LockException during rebalancing
> ----------------------------------------------------------------
>
> Key: KAFKA-5397
> URL: https://issues.apache.org/jira/browse/KAFKA-5397
> Project: Kafka
> Issue Type: Sub-task
> Components: streams
> Affects Versions: 0.10.2.1, 0.11.0.0
> Environment: single-node setup, Confluent Kafka broker v3.2.0,
> kafka-clients 0.11.0.0-SNAPSHOT, 5 threads for kafka-streams
> Reporter: Jozef Koval
> Fix For: 1.0.0
>
>
> Probably a continuation of KAFKA-5167. Portions of the log:
> {code}
> 2017-06-07 01:17:52,435 WARN
> [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-5] StreamTask
> - task [2_0] Failed offset commits
> {browser-aggregation-KSTREAM-MAP-0000000039-repartition-0=OffsetAndMetadata{offset=4725597,
> metadata=''},
> browser-aggregation-KSTREAM-MAP-0000000052-repartition-0=OffsetAndMetadata{offset=4968164,
> metadata=''},
> browser-aggregation-KSTREAM-MAP-0000000026-repartition-0=OffsetAndMetadata{offset=2490506,
> metadata=''},
> browser-aggregation-KSTREAM-MAP-0000000065-repartition-0=OffsetAndMetadata{offset=7457795,
> metadata=''},
> browser-aggregation-KSTREAM-MAP-0000000013-repartition-0=OffsetAndMetadata{offset=530888,
> metadata=''}} due to Commit cannot be completed since the group has already
> rebalanced and assigned the partitions to another member. This means that the
> time between subsequent calls to poll() was longer than the configured
> max.poll.interval.ms, which typically implies that the poll loop is spending
> too much time message processing. You can address this either by increasing
> the session timeout or by reducing the maximum size of batches returned in
> poll() with max.poll.records.
> 2017-06-07 01:17:52,436 WARN
> [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamTask
> - task [7_0] Failed offset commits
> {browser-aggregation-Aggregate-Counts-repartition-0=OffsetAndMetadata{offset=13275085,
> metadata=''}} due to Commit cannot be completed since the group has already
> rebalanced and assigned the partitions to another member. This means that the
> time between subsequent calls to poll() was longer than the configured
> max.poll.interval.ms, which typically implies that the poll loop is spending
> too much time message processing. You can address this either by increasing
> the session timeout or by reducing the maximum size of batches returned in
> poll() with max.poll.records.
> 2017-06-07 01:17:52,488 WARN
> [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread
> - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2]
> Failed to commit StreamTask 7_0 state:
> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be
> completed since the group has already rebalanced and assigned the partitions
> to another member. This means that the time between subsequent calls to
> poll() was longer than the configured max.poll.interval.ms, which typically
> implies that the poll loop is spending too much time message processing. You
> can address this either by increasing the session timeout or by reducing the
> maximum size of batches returned in poll() with max.poll.records.
> at
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:792)
> at
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:738)
> at
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:798)
> at
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:778)
> at
> org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:204)
> at
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:167)
> at
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:127)
> at
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:488)
> at
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:348)
> at
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:262)
> at
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:208)
> at
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:184)
> at
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:605)
> at
> org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1146)
> at
> org.apache.kafka.streams.processor.internals.StreamTask.commitOffsets(StreamTask.java:307)
> at
> org.apache.kafka.streams.processor.internals.StreamTask.access$000(StreamTask.java:49)
> at
> org.apache.kafka.streams.processor.internals.StreamTask$1.run(StreamTask.java:268)
> at
> org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:187)
> at
> org.apache.kafka.streams.processor.internals.StreamTask.commitImpl(StreamTask.java:259)
> at
> org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:253)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.commitOne(StreamThread.java:813)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.access$2800(StreamThread.java:73)
> at
> org.apache.kafka.streams.processor.internals.StreamThread$2.apply(StreamThread.java:795)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.performOnStreamTasks(StreamThread.java:1442)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.commitAll(StreamThread.java:787)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:776)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:565)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:525)
> 2017-06-07 01:17:52,747 WARN
> [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamTask
> - task [7_0] Failed offset commits
> {browser-aggregation-Aggregate-Counts-repartition-0=OffsetAndMetadata{offset=13275085,
> metadata=''}} due to Commit cannot be completed since the group has already
> rebalanced and assigned the partitions to another member. This means that the
> time between subsequent calls to poll() was longer than the configured
> max.poll.interval.ms, which typically implies that the poll loop is spending
> too much time message processing. You can address this either by increasing
> the session timeout or by reducing the maximum size of batches returned in
> poll() with max.poll.records.
> 2017-06-07 01:17:52,776 ERROR
> [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread
> - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2]
> Failed to suspend stream task 7_0 due to:
> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be
> completed since the group has already rebalanced and assigned the partitions
> to another member. This means that the time between subsequent calls to
> poll() was longer than the configured max.poll.interval.ms, which typically
> implies that the poll loop is spending too much time message processing. You
> can address this either by increasing the session timeout or by reducing the
> maximum size of batches returned in poll() with max.poll.records.
> 2017-06-07 01:17:52,781 WARN
> [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamTask
> - task [6_3] Failed offset commits
> {browser-aggregation-Aggregate-Texts-repartition-3=OffsetAndMetadata{offset=13489738,
> metadata=''}} due to Commit cannot be completed since the group has already
> rebalanced and assigned the partitions to another member. This means that the
> time between subsequent calls to poll() was longer than the configured
> max.poll.interval.ms, which typically implies that the poll loop is spending
> too much time message processing. You can address this either by increasing
> the session timeout or by reducing the maximum size of batches returned in
> poll() with max.poll.records.
> 2017-06-07 01:17:52,781 ERROR
> [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread
> - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2]
> Failed to suspend stream task 6_3 due to:
> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be
> completed since the group has already rebalanced and assigned the partitions
> to another member. This means that the time between subsequent calls to
> poll() was longer than the configured max.poll.interval.ms, which typically
> implies that the poll loop is spending too much time message processing. You
> can address this either by increasing the session timeout or by reducing the
> maximum size of batches returned in poll() with max.poll.records.
> 2017-06-07 01:17:52,782 ERROR
> [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] ConsumerCoordinator
> - User provided listener
> org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener
> for group browser-aggregation failed on partition revocation
> org.apache.kafka.streams.errors.StreamsException: stream-thread
> [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] failed to suspend
> stream tasks
> at
> org.apache.kafka.streams.processor.internals.StreamThread.suspendTasksAndState(StreamThread.java:1134)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.access$1800(StreamThread.java:73)
> at
> org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener.onPartitionsRevoked(StreamThread.java:218)
> at
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinPrepare(ConsumerCoordinator.java:422)
> at
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:353)
> at
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:310)
> at
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:297)
> at
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1051)
> at
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1016)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:580)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:551)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:525)
> //
> 2017-06-07 01:18:15,739 WARN
> [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread
> - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2]
> Could not create task 6_2. Will retry.
> org.apache.kafka.streams.errors.LockException: task [6_2] Failed to lock the
> state directory for task 6_2
> 2017-06-07 01:18:16,741 WARN
> [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread
> - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2]
> Could not create task 7_2. Will retry.
> org.apache.kafka.streams.errors.LockException: task [7_2] Failed to lock the
> state directory for task 7_2
> 2017-06-07 01:18:17,745 WARN
> [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread
> - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2]
> Could not create task 7_3. Will retry.
> org.apache.kafka.streams.errors.LockException: task [7_3] Failed to lock the
> state directory for task 7_3
> 2017-06-07 01:18:17,795 WARN
> [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread
> - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2]
> Still retrying to create tasks: [0_0, 1_0, 2_0, 0_2, 3_0, 0_3, 4_0, 3_1, 2_2,
> 5_0, 4_1, 3_2, 5_1, 6_0, 3_3, 5_2, 4_3, 6_1, 7_1, 5_3, 6_2, 7_2, 7_3]
> {code}
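The CommitFailedException text in the log points at max.poll.interval.ms and max.poll.records. Its advice to increase "the session timeout" is loosely worded. Since Kafka 0.10.1 the poll-loop limit is max.poll.interval.ms, which is separate from session.timeout.ms. If slow processing were the trigger, a minimal sketch of passing those consumer settings through a Kafka Streams configuration could look like the following. The helper class `StreamsPollTuning` and the numeric values are illustrative assumptions, not tuned recommendations and not the fix applied for this issue. Plain string keys keep the sketch compilable without the Kafka jars; the "consumer." prefix is the one `StreamsConfig.consumerPrefix(...)` produces.

```java
import java.util.Properties;

// Hypothetical helper (not part of Kafka): builds Kafka Streams properties that
// give each StreamThread more headroom between poll() calls. String keys are
// used so this compiles without kafka-streams on the classpath.
final class StreamsPollTuning {
    // Kafka Streams forwards properties carrying this prefix to its consumers.
    static final String CONSUMER_PREFIX = "consumer.";

    static Properties build(String applicationId, String bootstrapServers,
                            int maxPollIntervalMs, int maxPollRecords) {
        if (maxPollIntervalMs <= 0 || maxPollRecords <= 0) {
            throw new IllegalArgumentException("limits must be positive");
        }
        Properties props = new Properties();
        props.put("application.id", applicationId);
        props.put("bootstrap.servers", bootstrapServers);
        // Same thread count as the reported environment.
        props.put("num.stream.threads", "5");
        // Upper bound on the time between poll() calls before the member is
        // considered failed and its partitions are reassigned.
        props.put(CONSUMER_PREFIX + "max.poll.interval.ms", Integer.toString(maxPollIntervalMs));
        // Fewer records per poll() means less processing per loop iteration.
        props.put(CONSUMER_PREFIX + "max.poll.records", Integer.toString(maxPollRecords));
        return props;
    }

    public static void main(String[] args) {
        // Illustrative values only.
        Properties props = build("browser-aggregation", "localhost:9092", 600_000, 100);
        props.stringPropertyNames().stream().sorted()
             .forEach(k -> System.out.println(k + "=" + props.getProperty(k)));
    }
}
```

Raising the interval only hides slow iterations, while lowering max.poll.records shortens each one; which lever fits depends on where the time is actually spent.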
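The later warnings show StreamThread-2 failing to lock the state directories for tasks 6_2, 7_2 and 7_3 and retrying roughly once per second. One plausible reading, consistent with the issue title, is that those locks are still held by a thread whose tasks were never fully released after the failed revocation, so every retry fails. The sketch below is a deliberately simplified, hypothetical model of per-task locks shared by the threads of one instance. `TaskLockModel`, its methods and the thread names are illustrative, not Kafka's `StateDirectory` API. It shows that a pending task can only be created once the previous owner releases its lock.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical, simplified model of per-task state-directory locks. It is NOT
// Kafka's StateDirectory; it only shows why task creation keeps retrying while
// another thread still holds the lock.
final class TaskLockModel {
    // taskId -> name of the thread that currently owns its state directory
    private final ConcurrentMap<String, String> owners = new ConcurrentHashMap<>();

    /** Returns true if the caller now owns (or already owned) the lock. */
    boolean tryLock(String taskId, String thread) {
        String current = owners.putIfAbsent(taskId, thread);
        return current == null || current.equals(thread);
    }

    /** Releases the lock only if the caller is its owner. */
    void unlock(String taskId, String thread) {
        owners.remove(taskId, thread);
    }

    /**
     * One retry pass: tries to lock every pending task and returns the ones
     * that could be created. The rest stay in pending for a later pass.
     */
    List<String> retryPass(Set<String> pending, String thread) {
        List<String> created = new ArrayList<>();
        Iterator<String> it = pending.iterator();
        while (it.hasNext()) {
            String taskId = it.next();
            if (tryLock(taskId, thread)) {
                created.add(taskId);
                it.remove();
            }
        }
        return created;
    }

    public static void main(String[] args) {
        TaskLockModel locks = new TaskLockModel();
        // Hypothetical scenario: some other thread never released 6_2.
        locks.tryLock("6_2", "StreamThread-X");
        Set<String> pending = new LinkedHashSet<>(Arrays.asList("6_2", "7_2", "7_3"));
        System.out.println("created: " + locks.retryPass(pending, "StreamThread-2")); // created: [7_2, 7_3]
        System.out.println("still retrying: " + pending);                            // still retrying: [6_2]
        // Once the stale owner releases the lock, the next pass succeeds.
        locks.unlock("6_2", "StreamThread-X");
        System.out.println("created: " + locks.retryPass(pending, "StreamThread-2")); // created: [6_2]
    }
}
```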
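The final warning lists 23 tasks that StreamThread-2 is still trying to create. Kafka Streams prints a TaskId as `<topicGroupId>_<partition>`, so 6_2 is partition 2 of sub-topology 6. A small, hypothetical log-reading helper (`PendingTasks` is an illustrative name, not a Kafka class) that groups such a list by sub-topology:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical log-reading helper: groups the task IDs in the last bracketed
// list of a log line by sub-topology (the part before '_').
final class PendingTasks {
    static Map<Integer, List<Integer>> bySubtopology(String logLine) {
        Map<Integer, List<Integer>> grouped = new TreeMap<>();
        // Use the LAST bracket pair so a leading "[...-StreamThread-2]" is skipped.
        int open = logLine.lastIndexOf('[');
        int close = logLine.lastIndexOf(']');
        if (open < 0 || close <= open) {
            return grouped; // no task list on this line
        }
        for (String raw : logLine.substring(open + 1, close).split(",")) {
            String id = raw.trim();
            if (id.isEmpty()) {
                continue;
            }
            int sep = id.indexOf('_');
            if (sep <= 0) {
                throw new IllegalArgumentException("not a task id: " + id);
            }
            int group = Integer.parseInt(id.substring(0, sep));
            int partition = Integer.parseInt(id.substring(sep + 1));
            grouped.computeIfAbsent(group, g -> new ArrayList<>()).add(partition);
        }
        return grouped;
    }

    public static void main(String[] args) {
        String line = "Still retrying to create tasks: [0_0, 1_0, 2_0, 0_2, 3_0, 0_3, 4_0, 3_1, 2_2, "
            + "5_0, 4_1, 3_2, 5_1, 6_0, 3_3, 5_2, 4_3, 6_1, 7_1, 5_3, 6_2, 7_2, 7_3]";
        bySubtopology(line).forEach((subtopology, parts) ->
            System.out.println("sub-topology " + subtopology + ": partitions " + parts));
    }
}
```

For the log line above, every sub-topology from 0 to 7 still has at least one pending partition.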
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)