[ https://issues.apache.org/jira/browse/KAFKA-8242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Gustafson resolved KAFKA-8242.
------------------------------------
    Resolution: Fixed

Marking this as resolved. We believe we have fixed the cause of the coordinator fenced error, and we have also improved the replica fetcher behavior in KIP-461, as Dhruvil notes.

> Exception in ReplicaFetcher blocks replication of all other partitions
> ----------------------------------------------------------------------
>
>                 Key: KAFKA-8242
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8242
>             Project: Kafka
>          Issue Type: Bug
>          Components: replication
>    Affects Versions: 1.1.1
>            Reporter: Nevins Bartolomeo
>            Priority: Major
>
> We're seeing the following exception in our replication threads.
> {code:java}
> [2019-04-16 14:14:39,724] ERROR [ReplicaFetcher replicaId=15, leaderId=8, fetcherId=0] Error due to (kafka.server.ReplicaFetcherThread)
> kafka.common.KafkaException: Error processing data for partition testtopic-123 offset 9880379
>     at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:204)
>     at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:169)
>     at scala.Option.foreach(Option.scala:257)
>     at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:169)
>     at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:166)
>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>     at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply$mcV$sp(AbstractFetcherThread.scala:166)
>     at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:166)
>     at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:166)
>     at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:250)
>     at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:164)
>     at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:111)
>     at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
> Caused by: org.apache.kafka.common.errors.TransactionCoordinatorFencedException: Invalid coordinator epoch: 27 (zombie), 31 (current)
> {code}
> While this is an issue in itself, the larger issue is that this exception kills the replication threads, so no other partitions get replicated to this broker. That a single corrupt partition can affect the availability of multiple topics is a great concern to us.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
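
For readers of this thread, the failure mode described in the issue (one partition's exception ending the whole fetcher thread) can be illustrated with a minimal Java sketch of per-partition error isolation. This is not Kafka's actual code and is not taken from KIP-461; the class `FetcherIsolationSketch`, the `PartitionProcessor` interface, the `processAll` method, and the partition name `othertopic-0` are all hypothetical names introduced only for illustration. The idea shown is simply to catch the error inside the per-partition loop, record the failed partition, and keep going.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these names are real Kafka classes or methods.
class FetcherIsolationSketch {

    /** Hypothetical stand-in for the per-partition processing step. */
    interface PartitionProcessor {
        void process(String topicPartition, long fetchOffset) throws Exception;
    }

    /**
     * Processes every fetched partition in order. A failure on one partition is
     * recorded and that partition is skipped, so the exception never escapes the
     * loop and the remaining partitions are still processed.
     * Returns the partitions that failed during this pass.
     */
    static Set<String> processAll(Map<String, Long> fetched, PartitionProcessor processor) {
        Set<String> failed = new LinkedHashSet<>();
        for (Map.Entry<String, Long> entry : fetched.entrySet()) {
            try {
                processor.process(entry.getKey(), entry.getValue());
            } catch (Exception ex) {
                // Isolate the failure to this partition instead of rethrowing.
                failed.add(entry.getKey());
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        Map<String, Long> fetched = new LinkedHashMap<>();
        fetched.put("testtopic-123", 9880379L); // partition and offset from the report
        fetched.put("othertopic-0", 42L);       // hypothetical healthy partition

        Set<String> replicated = new LinkedHashSet<>();
        Set<String> failed = processAll(fetched, (tp, offset) -> {
            if (tp.equals("testtopic-123")) {
                throw new IllegalStateException("simulated processing error");
            }
            replicated.add(tp);
        });

        System.out.println("failed=" + failed + " replicated=" + replicated);
    }
}
```

In this sketch the simulated error on `testtopic-123` only adds that partition to the failed set, while `othertopic-0` is still processed. How a real broker should retry or report a failed partition is outside what this thread states, so the sketch leaves that open.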