[ https://issues.apache.org/jira/browse/KAFKA-10398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ming Liu updated KAFKA-10398:
-----------------------------
Description:
When I tried an intra-broker disk move on 2.5.0, it always failed quickly with an onPartitionFenced() failure. This is the complete log from ReplicaAlterLogDirsManager and ReplicaAlterLogDirsThread:
{code:java}
[2020-06-03 04:52:17,541] INFO [ReplicaAlterLogDirsManager on broker 5] Added fetcher to broker BrokerEndPoint(id=5, host=localhost:-1) for partitions Map(author_id_enrichment_changelog_staging-302 -> (offset=0, leaderEpoch=45)) (kafka.server.ReplicaAlterLogDirsManager)
[2020-06-03 04:52:17,546] INFO [ReplicaAlterLogDirsThread-5]: Starting (kafka.server.ReplicaAlterLogDirsThread)
[2020-06-03 04:52:17,546] INFO [ReplicaAlterLogDirsThread-5]: Truncating partition author_id_enrichment_changelog_staging-302 to local high watermark 0 (kafka.server.ReplicaAlterLogDirsThread)
[2020-06-03 04:52:17,547] INFO [ReplicaAlterLogDirsThread-5]: Beginning/resuming copy of partition author_id_enrichment_changelog_staging-302 from offset 0. Including this partition, there are 1 remaining partitions to copy by this thread. (kafka.server.ReplicaAlterLogDirsThread)
[2020-06-03 04:52:17,547] WARN [ReplicaAlterLogDirsThread-5]: Reset fetch offset for partition author_id_enrichment_changelog_staging-302 from 0 to current leader's start offset 1656927679 (kafka.server.ReplicaAlterLogDirsThread)
[2020-06-03 04:52:17,550] INFO [ReplicaAlterLogDirsThread-5]: Current offset 0 for partition author_id_enrichment_changelog_staging-302 is out of range, which typically implies a leader change. Reset fetch offset to 1656927679 (kafka.server.ReplicaAlterLogDirsThread)
[2020-06-03 04:52:17,653] INFO [ReplicaAlterLogDirsThread-5]: Partition author_id_enrichment_changelog_staging-302 has an older epoch (45) than the current leader. Will await the new LeaderAndIsr state before resuming fetching. (kafka.server.ReplicaAlterLogDirsThread)
[2020-06-03 04:52:17,653] WARN [ReplicaAlterLogDirsThread-5]: Partition author_id_enrichment_changelog_staging-302 marked as failed (kafka.server.ReplicaAlterLogDirsThread)
[2020-06-03 04:52:17,657] INFO [ReplicaAlterLogDirsThread-5]: Shutting down (kafka.server.ReplicaAlterLogDirsThread)
[2020-06-03 04:52:17,661] INFO [ReplicaAlterLogDirsThread-5]: Stopped (kafka.server.ReplicaAlterLogDirsThread)
[2020-06-03 04:52:17,661] INFO [ReplicaAlterLogDirsThread-5]: Shutdown completed (kafka.server.ReplicaAlterLogDirsThread)
{code}
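The disk move succeeded only after I restarted the broker. This time the initial offset and leader epoch look more reasonable (offset=1663111146, leaderEpoch=47 instead of offset=0, leaderEpoch=45):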
{code:java}
[2020-06-03 05:20:12,597] INFO [ReplicaAlterLogDirsManager on broker 5] Added fetcher to broker BrokerEndPoint(id=5, host=localhost:-1) for partitions Map(author_id_enrichment_changelog_staging-302 -> (offset=1663111146, leaderEpoch=47)) (kafka.server.ReplicaAlterLogDirsManager)
[2020-06-03 05:20:12,606] INFO [ReplicaAlterLogDirsThread-5]: Starting (kafka.server.ReplicaAlterLogDirsThread)
[2020-06-03 05:20:12,618] INFO [ReplicaAlterLogDirsThread-5]: Beginning/resuming copy of partition author_id_enrichment_changelog_staging-302 from offset 1657605964. Including this partition, there are 1 remaining partitions to copy by this thread. (kafka.server.ReplicaAlterLogDirsThread)
[2020-06-03 05:20:20,992] INFO [ReplicaAlterLogDirsThread-5]: Shutting down (kafka.server.ReplicaAlterLogDirsThread)
[2020-06-03 05:20:20,994] INFO [ReplicaAlterLogDirsThread-5]: Shutdown completed (kafka.server.ReplicaAlterLogDirsThread)
[2020-06-03 05:20:20,994] INFO [ReplicaAlterLogDirsThread-5]: Stopped (kafka.server.ReplicaAlterLogDirsThread)
{code}
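To make the two logs easier to compare, here is a simplified model of the two checks the failed run hits in order: the fetch offset 0 is out of range and gets reset to the start offset 1656927679, and then the fetch carries leader epoch 45, which is older than the partition's current epoch, so it is fenced and the partition is marked as failed. This is a hypothetical sketch, not Kafka's fetcher code: the class and method names are made up, the current epoch of 46 for the first run is an assumption (the log only says it is newer than 45), and for the second run treating 47 as the current epoch and 1656927679 as the log start offset are assumptions as well.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of the two checks visible in the logs above.
 * This is NOT Kafka's fetcher code; class and method names are illustrative only.
 */
public class AlterLogDirsFetchModel {

    /** Result of one simulated copy attempt. */
    enum Outcome { COPYING, FAILED }

    /**
     * @param fetchOffset        offset the fetcher was added with ("offset=" in the log)
     * @param fetchEpoch         epoch the fetcher was added with ("leaderEpoch=" in the log)
     * @param logStartOffset     start offset of the source replica's log (assumed input)
     * @param currentLeaderEpoch epoch the partition currently has (assumed input)
     * @param trace              collects one line per decision, loosely mirroring the log
     */
    static Outcome simulate(long fetchOffset, int fetchEpoch, long logStartOffset,
                            int currentLeaderEpoch, List<String> trace) {
        // Check 1 (simplified: only the below-start case is modeled): an offset below
        // the log start is out of range, so the model records a reset to the start
        // offset, like the WARN "Reset fetch offset" line.
        if (fetchOffset < logStartOffset) {
            trace.add("reset fetch offset " + fetchOffset + " -> " + logStartOffset);
        }
        // Check 2: an epoch older than the current one is fenced; in the first log
        // this is immediately followed by "marked as failed".
        if (fetchEpoch < currentLeaderEpoch) {
            trace.add("epoch " + fetchEpoch + " older than " + currentLeaderEpoch + ": fenced");
            return Outcome.FAILED;
        }
        trace.add("epoch " + fetchEpoch + " is current: copy proceeds");
        return Outcome.COPYING;
    }

    public static void main(String[] args) {
        // Before restart: offset=0, leaderEpoch=45 come from the log; current epoch 46 is assumed.
        List<String> before = new ArrayList<>();
        System.out.println(simulate(0L, 45, 1656927679L, 46, before) + " " + before);

        // After restart: offset=1663111146, leaderEpoch=47 come from the log;
        // treating 47 as current and 1656927679 as the log start is assumed.
        List<String> after = new ArrayList<>();
        System.out.println(simulate(1663111146L, 47, 1656927679L, 47, after) + " " + after);
    }
}
```

Under these assumptions the offset reset on its own does not stop the copy; the outcome flips only on the epoch. The first run fails because the fetcher was added with a stale epoch (45), while the run after the restart was added with epoch 47 and proceeds.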
> Intra-broker disk move failed with onPartitionFenced()
> ------------------------------------------------------
>
> Key: KAFKA-10398
> URL: https://issues.apache.org/jira/browse/KAFKA-10398
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 2.5.0
> Reporter: Ming Liu
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.3.4#803005)