[ 
https://issues.apache.org/jira/browse/KAFKA-8900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Stephenson updated KAFKA-8900:
-----------------------------------
    Summary: Stalled partition for a consumer group  (was: Stalled partitions)

> Stalled partition for a consumer group
> --------------------------------------
>
>                 Key: KAFKA-8900
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8900
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 2.1.1
>            Reporter: Luke Stephenson
>            Priority: Major
>
> I'm seeing behaviour where a Scala KafkaConsumer has stalled for 1 partition 
> for a topic.  All other partitions for that topic are successfully being 
> consumed.
> Restarting the consumer process does not resolve the issue.  The consumer is 
> using version 2.3.0 ("org.apache.kafka" % "kafka-clients" % "2.3.0").
> When the consumer starts, I see that it is assigned the partition.  However 
> it then logs:
> {code}
> [Consumer 
> clientId=kafka-bus-router-64c88855cf-hxck7.event-bus-router-consumer.1d1ed7ee-5038-4441-84eb-8080ac130e9a,
>  groupId=event-bus-router] Setting offset for partition 
> maxwell.transactions-22 to the committed offset 
> FetchPosition{offset=275413397, offsetEpoch=Optional[271], 
> currentLeader=LeaderAndEpoch{leader=:-1 (id: -1 rack: null), epoch=271}}
> {code}
> Note that the leader is logged as "-1".  If I search through my application 
> logs for the past couple of days, the only time I ever see this logged on the 
> consumer is for this partition.
> The kafka broker is running version 2.1.1.  On the broker side the logs show:
> {code}
> {"timeMillis":1568087844876,"thread":"kafka-request-handler-1","level":"WARN","loggerName":"state.change.logger","message":"[Broker
>  id=5] Ignoring LeaderAndIsr request from controller 4 with correlation id 
> 15943 epoch 155 for partition maxwell.transactions-22 since its associated 
> leader epoch 270 is not higher than the current leader epoch 
> 270","endOfBatch":false,"loggerFqcn":"org.slf4j.impl.Log4jLoggerAdapter","threadId":72,"threadPriority":5}
> {"timeMillis":1568087844880,"thread":"kafka-request-handler-1","level":"INFO","loggerName":"kafka.server.ReplicaFetcherManager","message":"[ReplicaFetcherManager
>  on broker 5] Removed fetcher for partitions 
> Set(maxwell.transactions-22)","endOfBatch":false,"loggerFqcn":"org.slf4j.impl.Log4jLoggerAdapter","threadId":72,"threadPriority":5}
> {"timeMillis":1568087844880,"thread":"kafka-request-handler-1","level":"INFO","loggerName":"kafka.cluster.Partition","message":"[Partition
>  maxwell.transactions-22 broker=5] maxwell.transactions-22 starts at Leader 
> Epoch 271 from offset 275403423. Previous Leader Epoch was: 
> 270","endOfBatch":false,"loggerFqcn":"org.slf4j.impl.Log4jLoggerAdapter","threadId":72,"threadPriority":5}
> {"timeMillis":1568087844891,"thread":"kafka-request-handler-1","level":"INFO","loggerName":"state.change.logger","message":"[Broker
>  id=5] Skipped the become-leader state change after marking its partition as 
> leader with correlation id 15945 from controller 4 epoch 155 for partition 
> maxwell.transactions-22 (last update controller epoch 155) since it is 
> already the leader for the 
> partition.","endOfBatch":false,"loggerFqcn":"org.slf4j.impl.Log4jLoggerAdapter","threadId":72,"threadPriority":5}
> {code}
> As soon as I restart the broker which is the leader for that partition, the 
> messages flow through to the consumer.
> Given restarts of the consumer don't help, but restarting the broker allows 
> the stalled partition to resume, I'm inclined to think this is an issue with 
> the broker.  Please let me know if I can assist further with investigating or 
> resolving this.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

Reply via email to