[ 
https://issues.apache.org/jira/browse/KAFKA-5430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16049582#comment-16049582
 ] 

Vahid Hashemian commented on KAFKA-5430:
----------------------------------------

I could simulate a network failure and get the warning messages below on the 
disconnected consumer after the network recovered:
bq. \[2017-06-14 12:34:39,446\] WARN Auto-commit of offsets 
{test-2=OffsetAndMetadata{offset=6, metadata=''}} failed for group group1: 
Commit cannot be completed since the group has already rebalanced and assigned 
the partitions to another member. This means that the time between subsequent 
calls to poll() was longer than the configured max.poll.interval.ms, which 
typically implies that the poll loop is spending too much time message 
processing. You can address this either by increasing the session timeout or by 
reducing the maximum size of batches returned in poll() with max.poll.records. 
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)

However, this goes on for only a fraction of a second and then the group 
rebalances and distributes the partitions among all clients (different from 
what you experienced above). Not sure if there is anything else other than the 
network failure that contributed to the behavior you described.

> new consumers getting data for revoked partitions
> -------------------------------------------------
>
>                 Key: KAFKA-5430
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5430
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, consumer
>    Affects Versions: 0.10.2.0
>            Reporter: Lior Chaga
>         Attachments: kafka_trace.log.gz
>
>
> Due to bad configuration applied to network components, we experienced issues 
> with communication between kafka brokers (causing under replication) as well 
> as producers/consumers not being able to work against kafka.
> The symptoms on the consumer were many errors of the following form:
> {code}
> 2017-06-04 04:27:35,200 ERROR [Kafka Topics Cosumer 
> requestlegacy.consumer-11_session_parser_02] TaboolaKafkaConsumer [] - Failed 
> committing to kafka topicPartitions 
> [requestlegacy-2,requestlegacy-0,requestlegacy-1] 
> org.apache.kafka.clients.consumer.RetriableCommitFailedException: Offset 
> commit failed with a retriable exception. You should retry committing offsets.
> Caused by: org.apache.kafka.common.errors.DisconnectException
> {code}
> So far so good. However, upon network recovery, there were several rebalance 
> operations, which eventually resulted in only one consumer (#14) being 
> assigned with all topic partitions (at this case we're talking about a 
> consumer groups for which all consumers are running in same process):
> {code}
> 2017-06-04 04:27:02,168 INFO  [Kafka Topics Cosumer 
> requestlegacy.consumer-14_session_parser_02] ConsumerCoordinator [] - 
> Revoking previously assigned partitions [requestlegacy-8, requestlegacy-9] 
> for group session_parser_02
> 2017-06-04 04:27:04,208 INFO  [Kafka Topics Cosumer 
> requestlegacy.consumer-15_session_parser_02] ConsumerCoordinator [] - 
> Revoking previously assigned partitions [requestlegacy-10, requestlegacy-11] 
> for group session_parser_02
> 2017-06-04 04:27:18,167 INFO  [Kafka Topics Cosumer 
> requestlegacy.consumer-12_session_parser_02] ConsumerCoordinator [] - 
> Revoking previously assigned partitions [requestlegacy-3, requestlegacy-4, 
> requestlegacy-5] for group session_parser_02
> 2017-06-04 04:27:20,232 INFO  [Kafka Topics Cosumer 
> requestlegacy.consumer-11_session_parser_02] ConsumerCoordinator [] - 
> Revoking previously assigned partitions [requestlegacy-2, requestlegacy-0, 
> requestlegacy-1] for group session_parser_02
> 2017-06-04 04:27:20,236 INFO  [Kafka Topics Cosumer 
> requestlegacy.consumer-15_session_parser_02] ConsumerCoordinator [] - Setting 
> newly assigned partitions [requestlegacy-9, requestlegacy-10, 
> requestlegacy-11] for group session_parser_02
> 2017-06-04 04:27:20,237 INFO  [Kafka Topics Cosumer 
> requestlegacy.consumer-12_session_parser_02] ConsumerCoordinator [] - Setting 
> newly assigned partitions [requestlegacy-3, requestlegacy-4, requestlegacy-5] 
> for group session_parser_02
> 2017-06-04 04:27:20,237 INFO  [Kafka Topics Cosumer 
> requestlegacy.consumer-14_session_parser_02] ConsumerCoordinator [] - Setting 
> newly assigned partitions [requestlegacy-6, requestlegacy-7, requestlegacy-8] 
> for group session_parser_02
> 2017-06-04 04:27:20,332 INFO  [Kafka Topics Cosumer 
> requestlegacy.consumer-11_session_parser_02] ConsumerCoordinator [] - Setting 
> newly assigned partitions [requestlegacy-2, requestlegacy-0, requestlegacy-1] 
> for group session_parser_02
> 2017-06-04 04:28:52,368 INFO  [Kafka Topics Cosumer 
> requestlegacy.consumer-13_session_parser_02] ConsumerCoordinator [] - 
> Revoking previously assigned partitions [requestlegacy-6, requestlegacy-7] 
> for group session_parser_02
> 2017-06-04 04:29:15,201 INFO  [Kafka Topics Cosumer 
> requestlegacy.consumer-11_session_parser_02] ConsumerCoordinator [] - 
> Revoking previously assigned partitions [requestlegacy-2, requestlegacy-0, 
> requestlegacy-1] for group session_parser_02
> 2017-06-04 04:30:22,379 INFO  [Kafka Topics Cosumer 
> requestlegacy.consumer-14_session_parser_02] ConsumerCoordinator [] - 
> Revoking previously assigned partitions [requestlegacy-6, requestlegacy-7, 
> requestlegacy-8] for group session_parser_02
> 2017-06-04 04:30:24,431 INFO  [Kafka Topics Cosumer 
> requestlegacy.consumer-15_session_parser_02] ConsumerCoordinator [] - 
> Revoking previously assigned partitions [requestlegacy-9, requestlegacy-10, 
> requestlegacy-11] for group session_parser_02
> 2017-06-04 04:30:38,229 INFO  [Kafka Topics Cosumer 
> requestlegacy.consumer-12_session_parser_02] ConsumerCoordinator [] - 
> Revoking previously assigned partitions [requestlegacy-3, requestlegacy-4, 
> requestlegacy-5] for group session_parser_02
> 2017-06-04 04:30:39,156 INFO  [Kafka Topics Cosumer 
> requestlegacy.consumer-13_session_parser_02] ConsumerCoordinator [] - Setting 
> newly assigned partitions [requestlegacy-6, requestlegacy-7] for group 
> session_parser_02
> 2017-06-04 04:30:39,156 INFO  [Kafka Topics Cosumer 
> requestlegacy.consumer-15_session_parser_02] ConsumerCoordinator [] - Setting 
> newly assigned partitions [requestlegacy-10, requestlegacy-11] for group 
> session_parser_02
> 2017-06-04 04:30:39,156 INFO  [Kafka Topics Cosumer 
> requestlegacy.consumer-11_session_parser_02] ConsumerCoordinator [] - Setting 
> newly assigned partitions [requestlegacy-2, requestlegacy-0, requestlegacy-1] 
> for group session_parser_02
> 2017-06-04 04:30:39,156 INFO  [Kafka Topics Cosumer 
> requestlegacy.consumer-12_session_parser_02] ConsumerCoordinator [] - Setting 
> newly assigned partitions [requestlegacy-3, requestlegacy-4, requestlegacy-5] 
> for group session_parser_02
> 2017-06-04 04:30:39,157 INFO  [Kafka Topics Cosumer 
> requestlegacy.consumer-14_session_parser_02] ConsumerCoordinator [] - Setting 
> newly assigned partitions [requestlegacy-8, requestlegacy-9] for group 
> session_parser_02
> 2017-06-04 04:49:51,118 INFO  [Kafka Topics Cosumer 
> requestlegacy.consumer-13_session_parser_02] ConsumerCoordinator [] - 
> Revoking previously assigned partitions [requestlegacy-6, requestlegacy-7] 
> for group session_parser_02
> 2017-06-04 04:50:40,847 INFO  [Kafka Topics Cosumer 
> requestlegacy.consumer-12_session_parser_02] ConsumerCoordinator [] - 
> Revoking previously assigned partitions [requestlegacy-3, requestlegacy-4, 
> requestlegacy-5] for group session_parser_02
> 2017-06-04 04:50:58,898 INFO  [Kafka Topics Cosumer 
> requestlegacy.consumer-15_session_parser_02] ConsumerCoordinator [] - 
> Revoking previously assigned partitions [requestlegacy-10, requestlegacy-11] 
> for group session_parser_02
> 2017-06-04 04:50:59,010 INFO  [Kafka Topics Cosumer 
> requestlegacy.consumer-11_session_parser_02] ConsumerCoordinator [] - 
> Revoking previously assigned partitions [requestlegacy-2, requestlegacy-0, 
> requestlegacy-1] for group session_parser_02
> 2017-06-04 04:52:39,732 INFO  [Kafka Topics Cosumer 
> requestlegacy.consumer-14_session_parser_02] ConsumerCoordinator [] - 
> Revoking previously assigned partitions [requestlegacy-8, requestlegacy-9] 
> for group session_parser_02
> 2017-06-04 04:55:57,633 INFO  [Kafka Topics Cosumer 
> requestlegacy.consumer-14_session_parser_02] ConsumerCoordinator [] - Setting 
> newly assigned partitions [requestlegacy-8, requestlegacy-9] for group 
> session_parser_02
> 2017-06-04 04:57:37,662 INFO  [Kafka Topics Cosumer 
> requestlegacy.consumer-14_session_parser_02] ConsumerCoordinator [] - 
> Revoking previously assigned partitions [requestlegacy-8, requestlegacy-9] 
> for group session_parser_02
> 2017-06-04 05:00:57,601 INFO  [Kafka Topics Cosumer 
> requestlegacy.consumer-14_session_parser_02] ConsumerCoordinator [] - Setting 
> newly assigned partitions [requestlegacy-2, requestlegacy-3, requestlegacy-4, 
> requestlegacy-5, requestlegacy-6, requestlegacy-7, requestlegacy-8, 
> requestlegacy-9, requestlegacy-0, requestlegacy-1, requestlegacy-10, 
> requestlegacy-11] for group session_parser_02
> 2017-06-04 05:02:37,605 INFO  [Kafka Topics Cosumer 
> requestlegacy.consumer-14_session_parser_02] ConsumerCoordinator [] - 
> Revoking previously assigned partitions [requestlegacy-2, requestlegacy-3, 
> requestlegacy-4, requestlegacy-5, requestlegacy-6, requestlegacy-7, 
> requestlegacy-8, requestlegacy-9, requestlegacy-0, requestlegacy-1, 
> requestlegacy-10, requestlegacy-11] for group session_parser_02
> 2017-06-04 05:02:37,666 INFO  [Kafka Topics Cosumer 
> requestlegacy.consumer-14_session_parser_02] ConsumerCoordinator [] - Setting 
> newly assigned partitions [requestlegacy-2, requestlegacy-3, requestlegacy-4, 
> requestlegacy-5, requestlegacy-6, requestlegacy-7, requestlegacy-8, 
> requestlegacy-9, requestlegacy-0, requestlegacy-1, requestlegacy-10, 
> requestlegacy-11] for group session_parser_02
> {code}
> The interesting part is that after the last rebalance, other consumers were 
> still polling messages for hours, and failed upon commit (notice the revoke 
> message on consumer-12, and the commit failure below on same consumer, 2 
> minutes later):
> {code}
> 2017-06-04 04:52:41,396 ERROR [Kafka Topics Cosumer 
> requestlegacy.consumer-12_session_parser_02] TaboolaKafkaConsumer [] - Failed 
> committing to kafka topicPartitions 
> [requestlegacy-3,requestlegacy-4,requestlegacy-5] 
> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be 
> completed since the group has already rebalanced and assigned the partitions 
> to another member. This means that the time between subsequent calls to 
> poll() was longer than the configured max.poll.interval.ms, which typically 
> implies that the poll loop is spending too much time message processing. You 
> can address this either by increasing the session timeout or by reducing the 
> maximum size of batches returned in poll() with max.poll.records.
> {code}
> To complete the picture from kafka side, kafka-consumer-group reports all 
> partitions assigned to consumer-14:
> {code}
> requestlegacy                  0          9092336         9092337         1   
>        consumer-14-aafabf03-2e94-42ef-922a-ba699c5a8517  /10.105.137.28       
>           consumer-14
> requestlegacy                  1          9094923         9094923         0   
>        consumer-14-aafabf03-2e94-42ef-922a-ba699c5a8517  /10.105.137.28       
>           consumer-14
> requestlegacy                  2          9091382         9091382         0   
>        consumer-14-aafabf03-2e94-42ef-922a-ba699c5a8517  /10.105.137.28       
>           consumer-14
> requestlegacy                  3          9097052         9097052         0   
>        consumer-14-aafabf03-2e94-42ef-922a-ba699c5a8517  /10.105.137.28       
>           consumer-14
> requestlegacy                  4          9097542         9097542         0   
>        consumer-14-aafabf03-2e94-42ef-922a-ba699c5a8517  /10.105.137.28       
>           consumer-14
> requestlegacy                  5          9098915         9098915         0   
>        consumer-14-aafabf03-2e94-42ef-922a-ba699c5a8517  /10.105.137.28       
>           consumer-14
> requestlegacy                  6          9098308         9098309         1   
>        consumer-14-aafabf03-2e94-42ef-922a-ba699c5a8517  /10.105.137.28       
>           consumer-14
> requestlegacy                  7          9096984         9096984         0   
>        consumer-14-aafabf03-2e94-42ef-922a-ba699c5a8517  /10.105.137.28       
>           consumer-14
> requestlegacy                  8          9100548         9100548         0   
>        consumer-14-aafabf03-2e94-42ef-922a-ba699c5a8517  /10.105.137.28       
>           consumer-14
> requestlegacy                  9          9097615         9097615         0   
>        consumer-14-aafabf03-2e94-42ef-922a-ba699c5a8517  /10.105.137.28       
>           consumer-14
> requestlegacy                  10         9095553         9095553         0   
>        consumer-14-aafabf03-2e94-42ef-922a-ba699c5a8517  /10.105.137.28       
>           consumer-14
> requestlegacy                  11         9089520         9089520         0   
>        consumer-14-aafabf03-2e94-42ef-922a-ba699c5a8517  /10.105.137.28       
>           consumer-14
> {code}
> We took a trace log (attached) of org.apache.kafka for our consuming service 
> when we discovered the issue, prior to restarting the service (which resulted 
> in full recovery). 
> Also, this happened for multiple consumer groups, this is just one example.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to