Jozef Koval created KAFKA-5397:
----------------------------------

             Summary: streams are not recovering from LockException during rebalancing
                 Key: KAFKA-5397
                 URL: https://issues.apache.org/jira/browse/KAFKA-5397
             Project: Kafka
          Issue Type: Bug
          Components: streams
    Affects Versions: 0.10.2.1, 0.11.0.0
         Environment: one-node setup, Confluent Kafka broker v3.2.0, kafka-clients 0.11.0.0-SNAPSHOT, 5 threads for kafka-streams
            Reporter: Jozef Koval


Probably a continuation of KAFKA-5167. Relevant portions of the log:

{code}
2017-06-07 01:17:52,435 WARN  
[73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-5] StreamTask                
 - task [2_0] Failed offset commits 
{browser-aggregation-KSTREAM-MAP-0000000039-repartition-0=OffsetAndMetadata{offset=4725597,
 metadata=''}, 
browser-aggregation-KSTREAM-MAP-0000000052-repartition-0=OffsetAndMetadata{offset=4968164,
 metadata=''}, 
browser-aggregation-KSTREAM-MAP-0000000026-repartition-0=OffsetAndMetadata{offset=2490506,
 metadata=''}, 
browser-aggregation-KSTREAM-MAP-0000000065-repartition-0=OffsetAndMetadata{offset=7457795,
 metadata=''}, 
browser-aggregation-KSTREAM-MAP-0000000013-repartition-0=OffsetAndMetadata{offset=530888,
 metadata=''}} due to Commit cannot be completed since the group has already 
rebalanced and assigned the partitions to another member. This means that the 
time between subsequent calls to poll() was longer than the configured 
max.poll.interval.ms, which typically implies that the poll loop is spending 
too much time message processing. You can address this either by increasing the 
session timeout or by reducing the maximum size of batches returned in poll() 
with max.poll.records.
2017-06-07 01:17:52,436 WARN  
[73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamTask                
 - task [7_0] Failed offset commits 
{browser-aggregation-Aggregate-Counts-repartition-0=OffsetAndMetadata{offset=13275085,
 metadata=''}} due to Commit cannot be completed since the group has already 
rebalanced and assigned the partitions to another member. This means that the 
time between subsequent calls to poll() was longer than the configured 
max.poll.interval.ms, which typically implies that the poll loop is spending 
too much time message processing. You can address this either by increasing the 
session timeout or by reducing the maximum size of batches returned in poll() 
with max.poll.records.
2017-06-07 01:17:52,488 WARN  
[73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread              
 - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] Failed 
to commit StreamTask 7_0 state: 
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be 
completed since the group has already rebalanced and assigned the partitions to 
another member. This means that the time between subsequent calls to poll() was 
longer than the configured max.poll.interval.ms, which typically implies that 
the poll loop is spending too much time message processing. You can address 
this either by increasing the session timeout or by reducing the maximum size 
of batches returned in poll() with max.poll.records.
        at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:792)
        at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:738)
        at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:798)
        at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:778)
        at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:204)
        at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:167)
        at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:127)
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:488)
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:348)
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:262)
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:208)
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:184)
        at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:605)
        at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1146)
        at org.apache.kafka.streams.processor.internals.StreamTask.commitOffsets(StreamTask.java:307)
        at org.apache.kafka.streams.processor.internals.StreamTask.access$000(StreamTask.java:49)
        at org.apache.kafka.streams.processor.internals.StreamTask$1.run(StreamTask.java:268)
        at org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:187)
        at org.apache.kafka.streams.processor.internals.StreamTask.commitImpl(StreamTask.java:259)
        at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:253)
        at org.apache.kafka.streams.processor.internals.StreamThread.commitOne(StreamThread.java:813)
        at org.apache.kafka.streams.processor.internals.StreamThread.access$2800(StreamThread.java:73)
        at org.apache.kafka.streams.processor.internals.StreamThread$2.apply(StreamThread.java:795)
        at org.apache.kafka.streams.processor.internals.StreamThread.performOnStreamTasks(StreamThread.java:1442)
        at org.apache.kafka.streams.processor.internals.StreamThread.commitAll(StreamThread.java:787)
        at org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:776)
        at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:565)
        at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:525)

2017-06-07 01:17:52,747 WARN  
[73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamTask                
 - task [7_0] Failed offset commits 
{browser-aggregation-Aggregate-Counts-repartition-0=OffsetAndMetadata{offset=13275085,
 metadata=''}} due to Commit cannot be completed since the group has already 
rebalanced and assigned the partitions to another member. This means that the 
time between subsequent calls to poll() was longer than the configured 
max.poll.interval.ms, which typically implies that the poll loop is spending 
too much time message processing. You can address this either by increasing the 
session timeout or by reducing the maximum size of batches returned in poll() 
with max.poll.records.
2017-06-07 01:17:52,776 ERROR 
[73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread              
 - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] Failed 
to suspend stream task 7_0 due to: 
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be 
completed since the group has already rebalanced and assigned the partitions to 
another member. This means that the time between subsequent calls to poll() was 
longer than the configured max.poll.interval.ms, which typically implies that 
the poll loop is spending too much time message processing. You can address 
this either by increasing the session timeout or by reducing the maximum size 
of batches returned in poll() with max.poll.records.
2017-06-07 01:17:52,781 WARN  
[73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamTask                
- task [6_3] Failed offset commits 
{browser-aggregation-Aggregate-Texts-repartition-3=OffsetAndMetadata{offset=13489738,
 metadata=''}} due to Commit cannot be completed since the group has already 
rebalanced and assigned the partitions to another member. This means that the 
time between subsequent calls to poll() was longer than the configured 
max.poll.interval.ms, which typically implies that the poll loop is spending 
too much time message processing. You can address this either by increasing the 
session timeout or by reducing the maximum size of batches returned in poll() 
with max.poll.records.
2017-06-07 01:17:52,781 ERROR 
[73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread              
 - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] Failed 
to suspend stream task 6_3 due to: 
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be 
completed since the group has already rebalanced and assigned the partitions to 
another member. This means that the time between subsequent calls to poll() was 
longer than the configured max.poll.interval.ms, which typically implies that 
the poll loop is spending too much time message processing. You can address 
this either by increasing the session timeout or by reducing the maximum size 
of batches returned in poll() with max.poll.records.
2017-06-07 01:17:52,782 ERROR 
[73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] ConsumerCoordinator       
 - User provided listener 
org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener for 
group browser-aggregation failed on partition revocation
org.apache.kafka.streams.errors.StreamsException: stream-thread 
[73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] failed to suspend stream 
tasks
        at org.apache.kafka.streams.processor.internals.StreamThread.suspendTasksAndState(StreamThread.java:1134)
        at org.apache.kafka.streams.processor.internals.StreamThread.access$1800(StreamThread.java:73)
        at org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener.onPartitionsRevoked(StreamThread.java:218)
        at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinPrepare(ConsumerCoordinator.java:422)
        at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:353)
        at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:310)
        at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:297)
        at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1051)
        at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1016)
        at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:580)
        at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:551)
        at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:525)

//

2017-06-07 01:18:15,739 WARN  
[73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread              
 - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] Could 
not create task 6_2. Will retry. org.apache.kafka.streams.errors.LockException: 
task [6_2] Failed to lock the state directory for task 6_2
2017-06-07 01:18:16,741 WARN  
[73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread              
 - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] Could 
not create task 7_2. Will retry. org.apache.kafka.streams.errors.LockException: 
task [7_2] Failed to lock the state directory for task 7_2
2017-06-07 01:18:17,745 WARN  
[73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread              
 - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] Could 
not create task 7_3. Will retry. org.apache.kafka.streams.errors.LockException: 
task [7_3] Failed to lock the state directory for task 7_3
2017-06-07 01:18:17,795 WARN  
[73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread              
 - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] Still 
retrying to create tasks: [0_0, 1_0, 2_0, 0_2, 3_0, 0_3, 4_0, 3_1, 2_2, 5_0, 
4_1, 3_2, 5_1, 6_0, 3_3, 5_2, 4_3, 6_1, 7_1, 5_3, 6_2, 7_2, 7_3]
{code}
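
For context, the CommitFailedException above points at the consumer's max.poll.interval.ms / max.poll.records settings, and the subsequent LockException comes from the per-task state directories under state.dir still being locked when the thread retries creating the tasks after the rebalance. Below is a minimal sketch of where those knobs are set via StreamsConfig; the application id and thread count are taken from the log and environment above, while the broker address and the numeric values are illustrative only, not a recommendation for this deployment:

{code:java}
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.streams.StreamsConfig;

public class BrowserAggregationConfig {

    // Builds the Streams configuration touching the settings named in the log above.
    public static Properties streamsProperties() {
        Properties props = new Properties();

        // Application id matches the consumer group seen in the log ("browser-aggregation").
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "browser-aggregation");
        // Broker address is illustrative; the report is against a one-node Confluent 3.2.0 broker.
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Five stream threads, as in the environment description.
        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 5);
        // Directory whose per-task subdirectories the LockException refers to (default location shown).
        props.put(StreamsConfig.STATE_DIR_CONFIG, "/tmp/kafka-streams");

        // Consumer settings named in the CommitFailedException message; values are illustrative.
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 600_000);
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 500);

        return props;
    }
}
{code}

Whether tuning max.poll.interval.ms / max.poll.records actually lets the threads release and re-acquire the state directory locks is exactly what this ticket questions; the sketch only shows which settings the log messages refer to.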




