Nevins Bartolomeo created KAFKA-8242:
----------------------------------------
             Summary: Exception in ReplicaFetcher blocks replication of all other partitions
                 Key: KAFKA-8242
                 URL: https://issues.apache.org/jira/browse/KAFKA-8242
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 1.1.1
            Reporter: Nevins Bartolomeo


We're seeing the following exception in our replication threads.

{code:java}
[2019-04-16 14:14:39,724] ERROR [ReplicaFetcher replicaId=15, leaderId=8, fetcherId=0] Error due to (kafka.server.ReplicaFetcherThread)
kafka.common.KafkaException: Error processing data for partition testtopic-123 offset 9880379
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:204)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:169)
    at scala.Option.foreach(Option.scala:257)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:169)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:166)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply$mcV$sp(AbstractFetcherThread.scala:166)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:166)
    at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:166)
    at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:250)
    at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:164)
    at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:111)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
Caused by: org.apache.kafka.common.errors.TransactionCoordinatorFencedException: Invalid coordinator epoch: 27 (zombie), 31 (current)
{code}

While this is an issue in itself, the larger problem is that this exception kills the replication threads, so no other partitions get replicated to this broker. That a single corrupt partition can affect the availability of multiple topics is of great concern to us.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
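
To illustrate the isolation the report above asks for, here is a minimal Java sketch of a fetch loop that records a failing partition and carries on with the rest, instead of letting the exception escape and end the thread. This is not Kafka's actual code: `FetcherIsolationSketch`, `PartitionProcessor`, and `processAll` are hypothetical names, and `othertopic-0` with offset 42 is made-up sample data (only `testtopic-123` and offset 9880379 come from the log above).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only; none of these types exist in Kafka.
class FetcherIsolationSketch {

    /** Stand-in for the per-partition work done once fetched data arrives. */
    interface PartitionProcessor {
        void process(String topicPartition, long fetchOffset) throws Exception;
    }

    /**
     * Processes every partition in iteration order. A partition whose
     * processing throws is recorded in the returned map; the loop then
     * continues with the remaining partitions instead of aborting.
     */
    static Map<String, Exception> processAll(Map<String, Long> fetchOffsets,
                                             PartitionProcessor processor) {
        Map<String, Exception> failed = new LinkedHashMap<>();
        for (Map.Entry<String, Long> entry : fetchOffsets.entrySet()) {
            try {
                processor.process(entry.getKey(), entry.getValue());
            } catch (Exception ex) {
                // Isolate the failure to this one partition.
                failed.put(entry.getKey(), ex);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        Map<String, Long> offsets = new LinkedHashMap<>();
        offsets.put("testtopic-123", 9880379L); // partition from the report
        offsets.put("othertopic-0", 42L);       // made-up healthy partition

        List<String> replicated = new ArrayList<>();
        Map<String, Exception> failed = processAll(offsets, (tp, offset) -> {
            if (tp.equals("testtopic-123")) {
                // Simulates the per-partition error seen in the stack trace.
                throw new IllegalStateException("Invalid coordinator epoch");
            }
            replicated.add(tp);
        });

        // prints: replicated=[othertopic-0] failed=[testtopic-123]
        System.out.println("replicated=" + replicated + " failed=" + failed.keySet());
    }
}
```

Whether a failed partition should then be retried, dropped from the fetcher, or surfaced through a metric is a separate design question that this sketch leaves open.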