[ https://issues.apache.org/jira/browse/KAFKA-10398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ming Liu updated KAFKA-10398:
-----------------------------
    Description:
When I tried an intra-broker disk move on 2.5.0, it always failed quickly with an onPartitionFenced() failure. This is all of the log output for ReplicaAlterLogDirsManager:
{code:java}
[2020-06-03 04:52:17,541] INFO [ReplicaAlterLogDirsManager on broker 5] Added fetcher to broker BrokerEndPoint(id=5, host=localhost:-1) for partitions Map(author_id_enrichment_changelog_staging-302 -> (offset=0, leaderEpoch=45)) (kafka.server.ReplicaAlterLogDirsManager)
[2020-06-03 04:52:17,546] INFO [ReplicaAlterLogDirsThread-5]: Starting (kafka.server.ReplicaAlterLogDirsThread)
[2020-06-03 04:52:17,546] INFO [ReplicaAlterLogDirsThread-5]: Truncating partition author_id_enrichment_changelog_staging-302 to local high watermark 0 (kafka.server.ReplicaAlterLogDirsThread)
[2020-06-03 04:52:17,547] INFO [ReplicaAlterLogDirsThread-5]: Beginning/resuming copy of partition author_id_enrichment_changelog_staging-302 from offset 0. Including this partition, there are 1 remaining partitions to copy by this thread. (kafka.server.ReplicaAlterLogDirsThread)
[2020-06-03 04:52:17,547] WARN [ReplicaAlterLogDirsThread-5]: Reset fetch offset for partition author_id_enrichment_changelog_staging-302 from 0 to current leader's start offset 1656927679 (kafka.server.ReplicaAlterLogDirsThread)
[2020-06-03 04:52:17,550] INFO [ReplicaAlterLogDirsThread-5]: Current offset 0 for partition author_id_enrichment_changelog_staging-302 is out of range, which typically implies a leader change. Reset fetch offset to 1656927679 (kafka.server.ReplicaAlterLogDirsThread)
[2020-06-03 04:52:17,653] INFO [ReplicaAlterLogDirsThread-5]: Partition author_id_enrichment_changelog_staging-302 has an older epoch (45) than the current leader. Will await the new LeaderAndIsr state before resuming fetching. (kafka.server.ReplicaAlterLogDirsThread)
[2020-06-03 04:52:17,653] WARN [ReplicaAlterLogDirsThread-5]: Partition author_id_enrichment_changelog_staging-302 marked as failed (kafka.server.ReplicaAlterLogDirsThread)
[2020-06-03 04:52:17,657] INFO [ReplicaAlterLogDirsThread-5]: Shutting down (kafka.server.ReplicaAlterLogDirsThread)
[2020-06-03 04:52:17,661] INFO [ReplicaAlterLogDirsThread-5]: Stopped (kafka.server.ReplicaAlterLogDirsThread)
[2020-06-03 04:52:17,661] INFO [ReplicaAlterLogDirsThread-5]: Shutdown completed (kafka.server.ReplicaAlterLogDirsThread)
{code}
Only after restarting the broker did the disk move succeed. The offset and epoch numbers look better:
{code:java}
[2020-06-03 05:20:12,597] INFO [ReplicaAlterLogDirsManager on broker 5] Added fetcher to broker BrokerEndPoint(id=5, host=localhost:-1) for partitions Map(author_id_enrichment_changelog_staging-302 -> (offset=1663111146, leaderEpoch=47)) (kafka.server.ReplicaAlterLogDirsManager)
[2020-06-03 05:20:12,606] INFO [ReplicaAlterLogDirsThread-5]: Starting (kafka.server.ReplicaAlterLogDirsThread)
[2020-06-03 05:20:12,618] INFO [ReplicaAlterLogDirsThread-5]: Beginning/resuming copy of partition author_id_enrichment_changelog_staging-302 from offset 1657605964. Including this partition, there are 1 remaining partitions to copy by this thread. (kafka.server.ReplicaAlterLogDirsThread)
[2020-06-03 05:20:20,992] INFO [ReplicaAlterLogDirsThread-5]: Shutting down (kafka.server.ReplicaAlterLogDirsThread)
[2020-06-03 05:20:20,994] INFO [ReplicaAlterLogDirsThread-5]: Shutdown completed (kafka.server.ReplicaAlterLogDirsThread)
[2020-06-03 05:20:20,994] INFO [ReplicaAlterLogDirsThread-5]: Stopped (kafka.server.ReplicaAlterLogDirsThread)
{code}

> Intra-broker disk move failed with onPartitionFenced()
> ------------------------------------------------------
>
>                 Key: KAFKA-10398
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10398
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.5.0
>            Reporter: Ming Liu
>            Priority: Major

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
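
To compare the two attempts side by side, the starting offset and leader epoch can be pulled out of the "Added fetcher" lines with a short script. This is only a sketch: the function name `initial_fetch_state` and the regular expression are illustrative assumptions, not part of Kafka, and it assumes each log entry sits on a single line and names a single partition.

```python
import re

# Hypothetical helper pattern (not a Kafka API): captures the partition name,
# starting offset and leader epoch from an "Added fetcher" log line.
ADDED_FETCHER = re.compile(
    r"Added fetcher to broker .*? for partitions "
    r"Map\((?P<partition>\S+) -> "
    r"\(offset=(?P<offset>\d+), leaderEpoch=(?P<epoch>\d+)\)\)"
)


def initial_fetch_state(log_text):
    """Return (partition, offset, leader_epoch) from the first match, or None."""
    match = ADDED_FETCHER.search(log_text)
    if match is None:
        return None
    return (
        match.group("partition"),
        int(match.group("offset")),
        int(match.group("epoch")),
    )


# The two "Added fetcher" lines quoted in this report.
failed_line = (
    "[2020-06-03 04:52:17,541] INFO [ReplicaAlterLogDirsManager on broker 5] "
    "Added fetcher to broker BrokerEndPoint(id=5, host=localhost:-1) for "
    "partitions Map(author_id_enrichment_changelog_staging-302 -> (offset=0, "
    "leaderEpoch=45)) (kafka.server.ReplicaAlterLogDirsManager)"
)
succeeded_line = (
    "[2020-06-03 05:20:12,597] INFO [ReplicaAlterLogDirsManager on broker 5] "
    "Added fetcher to broker BrokerEndPoint(id=5, host=localhost:-1) for "
    "partitions Map(author_id_enrichment_changelog_staging-302 -> "
    "(offset=1663111146, leaderEpoch=47)) (kafka.server.ReplicaAlterLogDirsManager)"
)

for label, line in (("failed", failed_line), ("after restart", succeeded_line)):
    print(label, initial_fetch_state(line))
```

On the two lines from this report it yields offset 0 with leader epoch 45 for the failed attempt, and offset 1663111146 with leader epoch 47 for the attempt after the restart.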