[ 
https://issues.apache.org/jira/browse/KAFKA-9701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063689#comment-17063689
 ] 

ASF GitHub Bot commented on KAFKA-9701:
---------------------------------------

abbccdda commented on pull request #8324: KAFKA-9701 (fix): Only check protocol 
name when generation is valid
URL: https://github.com/apache/kafka/pull/8324
 
 
   This bug was introduced by https://github.com/apache/kafka/pull/7994, which 
added a too-strong consistency check. A generation reset can occur in the 
middle of the `JoinGroupRequest` -> `JoinGroupResponse` -> 
`SyncGroupRequest` -> `SyncGroupResponse` sequence of events if the user calls 
`unsubscribe` in the middle of consumer#poll().
   
   The proper fix is to skip the protocol name check when the generation is 
invalid.
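
The guard described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the pull request: the class `ProtocolNameCheckSketch`, the nested `Generation` type, and the helpers `hasValidGeneration` and `isProtocolNameInconsistent` are hypothetical names chosen for this example. It assumes a reset generation is represented by a sentinel with a null protocol name, which matches the "received stream but expected null" log line in the issue.

```java
// Hypothetical sketch: none of these names are taken from the Kafka code base.
final class ProtocolNameCheckSketch {

    /** Stand-in for the consumer's locally cached group generation. */
    static final class Generation {
        static final int NO_GENERATION_ID = -1;
        // Assumed sentinel used after unsubscribe/leave group wipes the local state.
        static final Generation NO_GENERATION = new Generation(NO_GENERATION_ID, null);

        final int generationId;
        final String protocolName;

        Generation(int generationId, String protocolName) {
            this.generationId = generationId;
            this.protocolName = protocolName;
        }

        boolean hasValidGeneration() {
            return generationId != NO_GENERATION_ID;
        }
    }

    /**
     * Returns true only when the local generation is valid AND the protocol
     * name in the response differs from the locally recorded one.
     */
    static boolean isProtocolNameInconsistent(Generation local, String receivedProtocolName) {
        if (!local.hasValidGeneration()) {
            // The generation was reset mid-rebalance: nothing meaningful to compare against.
            return false;
        }
        return receivedProtocolName != null
                && !receivedProtocolName.equals(local.protocolName);
    }

    public static void main(String[] args) {
        Generation valid = new Generation(5, "stream");

        // Reset generation: the check is skipped instead of failing on null.
        System.out.println(isProtocolNameInconsistent(Generation.NO_GENERATION, "stream"));
        // Valid generation, matching name: consistent.
        System.out.println(isProtocolNameInconsistent(valid, "stream"));
        // Valid generation, different name: a genuine mismatch.
        System.out.println(isProtocolNameInconsistent(valid, "range"));
    }
}
```

With a guard of this shape, a late SyncGroupResponse that arrives after the generation has been reset is no longer treated as a fatal mismatch, while a real mismatch against a valid generation is still reported.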
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Consumer could catch InconsistentGroupProtocolException during rebalance
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-9701
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9701
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.5.0
>            Reporter: Boyang Chen
>            Assignee: Boyang Chen
>            Priority: Major
>         Attachments: cluster.log
>
>
> The bug was due to out-of-order handling of the SyncGroupRequest after the 
> LeaveGroupRequest.
> The sequence of events is:
>  # The stream thread tries to rejoin the group during runOnce#poll
>  # The join group call succeeds and the group waits for the sync group 
> result
>  # Outside the poll, the task producer hits a FencedException, triggering a 
> partition-lost event
>  # The stream thread unsubscribes and sends out a leave group request, which 
> wipes out the local generation
>  # The sync group response is processed. Although it is legitimate, the 
> local protocol type has become null by this point
>  # The sync group response hits the fatal protocol type mismatch exception
>  
> [2020-03-20T*10:40:08-07:00*] 
> (streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) [2020-03-20 
> 17:40:08,754] INFO 
> [stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] 
> [Consumer 
> clientId=stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1-consumer,
>  groupId=stream-soak-test] (Re-)joining group 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2020-03-20T*10:40:11-07:00*] 
> (streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) [2020-03-20 
> 17:40:11,152] ERROR [kafka-producer-network-thread | 
> stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1-0_1-producer]
>  stream-thread 
> [stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] task 
> [0_1] Error encountered sending record to topic network-id-repartition for 
> task 0_1 due to:
> [2020-03-20T10:40:11-07:00] 
> (streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) 
> org.apache.kafka.common.errors.ProducerFencedException: Producer attempted an 
> operation with an old epoch. Either there is a newer producer with the same 
> transactionalId, or the producer's transaction has been expired by the broker.
> [2020-03-20T10:40:12-07:00] 
> (streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) [2020-03-20 
> 17:40:12,048] INFO 
> [stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] 
> stream-thread 
> [stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] at 
> state RUNNING: partitions [logs.json.kafka-1, node-name-repartition-1, 
> logs.json.zookeeper-1, logs.kubernetes-1, windowed-node-counts-1, 
> logs.operator-1, logs.syslog-1] lost due to missed rebalance.
>         lost active tasks: []
>         lost assigned standby tasks: []
>  (org.apache.kafka.streams.processor.internals.StreamThread)
>  
> [2020-03-20T*10:40:12-07:00*] 
> (streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) [2020-03-20 
> 17:40:12,048] INFO 
> [stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] 
> [Consumer 
> clientId=stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1-consumer,
>  groupId=stream-soak-test] Member 
> stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1-consumer-34c2198b-5bdd-470b-ae50-30a39873edab
>  sending LeaveGroup request to coordinator 
> ip-172-31-18-29.us-west-2.compute.internal:9092 (id: 2147482644 rack: null) 
> due to the consumer *unsubscribed from all topics* 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2020-03-20T10:40:12-07:00] 
> (streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) [2020-03-20 
> 17:40:12,048] INFO 
> [stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] 
> [Consumer 
> clientId=stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1-consumer,
>  groupId=stream-soak-test] Unsubscribed all topics or patterns and assigned 
> partitions (org.apache.kafka.clients.consumer.KafkaConsumer)
> [2020-03-20T10:40:17-07:00] 
> (streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) [2020-03-20 
> 17:40:16,972] ERROR 
> [stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] 
> [Consumer 
> clientId=stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1-consumer,
>  groupId=stream-soak-test] SyncGroup failed due to inconsistent Protocol 
> Name, received stream but expected null 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2020-03-20T10:40:17-07:00] 
> (streams-soak-trunk-eos_soak_i-01629239fa39901b4_streamslog) [2020-03-20 
> 17:40:16,973] ERROR 
> [stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] 
> stream-thread 
> [stream-soak-test-f7392d33-55d7-484f-8b72-578e22fead96-StreamThread-1] 
> Encountered the following exception during processing and the thread is going 
> to shut down:  (org.apache.kafka.streams.processor.internals.StreamThread)
>  
> ----------- Original Exception -------------
> The INFO log shows that we unexpectedly hit an inconsistent group 
> protocol exception:
> [2020-03-10T17:16:53-07:00] 
> (streams-soak-2-5-eos-broker-2-5_soak_i-00067445452c82fe8_streamslog) 
> [2020-03-11 *00:16:53,382*] INFO 
> [stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1] 
> stream-client [stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949] State 
> transition from REBALANCING to RUNNING (org.apache.kafka.streams.KafkaStreams)
>  
> [2020-03-10T17:16:53-07:00] 
> (streams-soak-2-5-eos-broker-2-5_soak_i-00067445452c82fe8_streamslog) 
> [2020-03-11 *00:16:53,384*] WARN [kafka-producer-network-thread | 
> stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1-0_1-producer]
>  stream-thread 
> [stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1] task 
> [0_1] Error sending record to topic node-name-repartition due to Producer 
> attempted an operation with an old epoch. Either there is a newer producer 
> with the same transactionalId, or the producer's transaction has been expired 
> by the broker.; No more records will be sent and no more offsets will be 
> recorded for this task.
>  
>  
> [2020-03-10T17:16:53-07:00] 
> (streams-soak-2-5-eos-broker-2-5_soak_i-00067445452c82fe8_streamslog) 
> [2020-03-11 *00:16:53,521*] INFO 
> [stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1] 
> [Consumer 
> clientId=stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1-consumer,
>  groupId=stream-soak-test] Member 
> stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1-consumer-d1c3c796-0bfb-4c1c-9fb4-5a807d8b53a2
>  sending LeaveGroup request to coordinator 
> ip-172-31-20-215.us-west-2.compute.internal:9092 (id: 2147482646 rack: null) 
> due to the consumer unsubscribed from all topics 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
>  
> [2020-03-10T17:16:54-07:00] 
> (streams-soak-2-5-eos-broker-2-5_soak_i-00067445452c82fe8_streamslog) 
> [2020-03-11 *00:16:53,798*] ERROR 
> [stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1] 
> stream-thread 
> [stream-soak-test-d3da8597-c371-450e-81d9-72aea6a26949-StreamThread-1] 
> Encountered the following unexpected Kafka exception during processing, this 
> usually indicate Streams internal errors: 
> (org.apache.kafka.streams.processor.internals.StreamThread)
> [2020-03-10T17:16:54-07:00] 
> (streams-soak-2-5-eos-broker-2-5_soak_i-00067445452c82fe8_streamslog) 
> org.apache.kafka.common.errors.InconsistentGroupProtocolException: The group 
> member's supported protocols are incompatible with those of existing members 
> or first group member tried to join with empty protocol type or empty 
> protocol list.
>  
> Further logs may be needed to understand this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
