[ https://issues.apache.org/jira/browse/KAFKA-9087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654133#comment-17654133 ]

Chia-Ping Tsai commented on KAFKA-9087:
---------------------------------------

[~junrao] Sorry for the late response.
{quote}So, ReplicaAlterLogDirsThread is supposed to ignore the old fetched data 
and fetch again using the new fetch offset. I am wondering why that didn't 
happen.
{quote}
You are right. The true root cause is as follows:
 # tp-0 is located at broker-0:/tmp/data0.
 # Move tp-0 from /tmp/data0 to /tmp/data1. This creates a new future log 
([https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/ReplicaManager.scala#L765])
 and a ReplicaAlterLogDirsThread. The new future log has no leader epoch 
cache until it has synced data.
 # File a partition reassignment to trigger a LeaderAndIsrRequest. The 
request updates the partition state of the ReplicaAlterLogDirsThread 
([https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/ReplicaManager.scala#L1565]),
 and the new fetch offset in the partition state is set to the high watermark of the log.
 # ReplicaAlterLogDirsThread uses that high watermark instead of the 
OffsetsForLeaderEpoch API when there is no epoch cache.
 # The future log is brand new, so its end offset is 0. The offset mismatch (0 
vs. the high watermark of the log) causes the error; a sketch of the failing check follows this list.
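
A minimal sketch (simplified names, not the exact Kafka source) of the check in ReplicaAlterLogDirsThread.processPartitionData that throws the IllegalStateException shown in the issue description:
{code:scala}
// Sketch only: the fetched offset comes from the fetcher's partition state,
// which the LeaderAndIsrRequest handler seeded with the current log's high
// watermark, while a freshly created future log still has end offset 0.
def validateFutureLogOffset(topicPartition: String,
                            fetchOffset: Long,
                            futureLogEndOffset: Long): Unit = {
  if (fetchOffset != futureLogEndOffset)
    throw new IllegalStateException(s"Offset mismatch for the future replica " +
      s"$topicPartition: fetched offset = $fetchOffset, " +
      s"log end offset = $futureLogEndOffset.")
}

// In the race above: fetchOffset = 4224887 (the log's high watermark) but the
// future log's end offset is 0, so this throws and the thread stops.
// validateFutureLogOffset("metrics_timers-35", 4224887L, 0L)
{code}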

In short, a race between processing LeaderAndIsrRequest and 
AlterReplicaLogDirsRequest causes this error (on the V2 message format). The 
error can also be reproduced easily on V1, since V1 has no epoch cache at all. 
I’m not sure why log.highWatermark is used there 
([https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/ReplicaManager.scala#L1559]).
 The ReplicaAlterLogDirsThread checks the offset of the “future log” rather 
than the “log”. Hence, my two cents: we can replace log.highWatermark with 
futureLog.highWatermark to resolve this issue. I tested it on our cluster and 
it works well (on both V1 and V2). A sketch of the change is below.
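
To illustrate the proposed swap around ReplicaManager.scala#L1559, a rough sketch with stand-in types (these are not the real kafka.cluster.Partition / kafka.log.Log classes, just placeholders for the idea):
{code:scala}
// Stand-in types; the real classes carry much more state.
final case class Log(highWatermark: Long)
final case class Partition(log: Log, futureLog: Option[Log])

// Proposed: seed the ReplicaAlterLogDirsThread's initial fetch offset from
// the future log's high watermark (0 for a brand-new future log) instead of
// the current log's high watermark, so the first fetch offset matches the
// future log's end offset.
def initialFetchOffset(partition: Partition): Long =
  partition.futureLog match {
    case Some(futureLog) => futureLog.highWatermark // 0 for a new future log
    case None            => partition.log.highWatermark // no move in progress
  }

// With the race above: Partition(Log(4224887), Some(Log(0))) now yields 0,
// so the offset-mismatch check no longer fires.
{code}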

> ReplicaAlterLogDirs stuck and restart fails with 
> java.lang.IllegalStateException: Offset mismatch for the future replica
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-9087
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9087
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 2.2.0
>            Reporter: Gregory Koshelev
>            Priority: Major
>
> I've started multiple replica movements between log directories and some 
> partitions were stuck. After the restart of the broker I've got exception in 
> server.log:
> {noformat}
> [2019-06-11 17:58:46,304] ERROR [ReplicaAlterLogDirsThread-1]: Error due to 
> (kafka.server.ReplicaAlterLogDirsThread)
>  org.apache.kafka.common.KafkaException: Error processing data for partition 
> metrics_timers-35 offset 4224887
>  at 
> kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$7(AbstractFetcherThread.scala:342)
>  at scala.Option.foreach(Option.scala:274)
>  at 
> kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$6(AbstractFetcherThread.scala:300)
>  at 
> kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$6$adapted(AbstractFetcherThread.scala:299)
>  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>  at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>  at 
> kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$5(AbstractFetcherThread.scala:299)
>  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>  at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
>  at 
> kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:299)
>  at 
> kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3(AbstractFetcherThread.scala:132)
>  at 
> kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3$adapted(AbstractFetcherThread.scala:131)
>  at scala.Option.foreach(Option.scala:274)
>  at 
> kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:131)
>  at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:113)
>  at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
>  Caused by: java.lang.IllegalStateException: Offset mismatch for the future 
> replica metrics_timers-35: fetched offset = 4224887, log end offset = 0.
>  at 
> kafka.server.ReplicaAlterLogDirsThread.processPartitionData(ReplicaAlterLogDirsThread.scala:107)
>  at 
> kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$7(AbstractFetcherThread.scala:311)
>  ... 16 more
>  [2019-06-11 17:58:46,305] INFO [ReplicaAlterLogDirsThread-1]: Stopped 
> (kafka.server.ReplicaAlterLogDirsThread)
> {noformat}
> Also, ReplicaAlterLogDirsThread has been stopped. Further restarts do not fix 
> the problem. To fix it I've stopped the broker and remove all the stuck 
> future partitions.
> Detailed log below
> {noformat}
> [2019-06-11 12:09:52,833] INFO [Log partition=metrics_timers-35, 
> dir=/storage2/kafka/data] Truncating to 4224887 has no effect as the largest 
> offset in the log is 4224886 (kafka.log.Log)
> [2019-06-11 12:21:34,979] INFO [Log partition=metrics_timers-35, 
> dir=/storage2/kafka/data] Loading producer state till offset 4224887 with 
> message format version 2 (kafka.log.Log)
> [2019-06-11 12:21:34,980] INFO [ProducerStateManager 
> partition=metrics_timers-35] Loading producer state from snapshot file 
> '/storage2/kafka/data/metrics_timers-35/00000000000004224887.snapshot' 
> (kafka.log.ProducerStateManager)
> [2019-06-11 12:21:34,980] INFO [Log partition=metrics_timers-35, 
> dir=/storage2/kafka/data] Completed load of log with 1 segments, log start 
> offset 4120720 and log end offset 4224887 in 70 ms (kafka.log.Log)
> [2019-06-11 12:21:45,307] INFO Replica loaded for partition metrics_timers-35 
> with initial high watermark 0 (kafka.cluster.Replica)
> [2019-06-11 12:21:45,307] INFO Replica loaded for partition metrics_timers-35 
> with initial high watermark 0 (kafka.cluster.Replica)
> [2019-06-11 12:21:45,307] INFO Replica loaded for partition metrics_timers-35 
> with initial high watermark 4224887 (kafka.cluster.Replica)
> [2019-06-11 12:21:47,090] INFO [Log partition=metrics_timers-35, 
> dir=/storage2/kafka/data] Truncating to 4224887 has no effect as the largest 
> offset in the log is 4224886 (kafka.log.Log)
> [2019-06-11 12:30:04,757] INFO [ReplicaFetcher replicaId=1, leaderId=2, 
> fetcherId=0] Retrying leaderEpoch request for partition metrics_timers-35 as 
> the leader reported an error: UNKNOWN_LEADER_EPOCH 
> (kafka.server.ReplicaFetcherThread)
> [2019-06-11 12:30:06,157] INFO [ReplicaFetcher replicaId=1, leaderId=2, 
> fetcherId=0] Retrying leaderEpoch request for partition metrics_timers-35 as 
> the leader reported an error: UNKNOWN_LEADER_EPOCH 
> (kafka.server.ReplicaFetcherThread)
> [2019-06-11 12:30:07,238] INFO [ReplicaFetcher replicaId=1, leaderId=2, 
> fetcherId=0] Retrying leaderEpoch request for partition metrics_timers-35 as 
> the leader reported an error: UNKNOWN_LEADER_EPOCH 
> (kafka.server.ReplicaFetcherThread)
> [2019-06-11 12:30:08,251] INFO [Log partition=metrics_timers-35, 
> dir=/storage2/kafka/data] Truncating to 4224887 has no effect as the largest 
> offset in the log is 4224886 (kafka.log.Log)
> {noformat}
> I've started replica movement at this moment.
> {noformat}
> [2019-06-11 12:47:32,502] INFO [Log partition=metrics_timers-35, 
> dir=/storage5/kafka/data] Loading producer state till offset 0 with message 
> format version 2 (kafka.log.Log)
> [2019-06-11 12:47:32,502] INFO [Log partition=metrics_timers-35, 
> dir=/storage5/kafka/data] Completed load of log with 1 segments, log start 
> offset 0 and log end offset 0 in 1 ms (kafka.log.Log)
> [2019-06-11 12:47:32,502] INFO Created log for partition metrics_timers-35 in 
> /storage5/kafka/data with properties {compression.type -> producer, 
> message.format.version -> 2.2-IV1, file.delete.delay.ms -> 60000, 
> max.message.bytes -> 1000012, min.compaction.lag.ms -> 0, 
> message.timestamp.type -> CreateTime, message.downconversion.enable -> true, 
> min.insync.replicas -> 2, segment.jitter.ms -> 0, preallocate -> false, 
> min.cleanable.dirty.ratio -> 0.5, index.interval.bytes -> 4096, 
> unclean.leader.election.enable -> false, retention.bytes -> 137438953472, 
> delete.retention.ms -> 86400000, cleanup.policy -> [delete], flush.ms -> 
> 9223372036854775807, segment.ms -> 604800000, segment.bytes -> 1073741824, 
> retention.ms -> 259200000, message.timestamp.difference.max.ms -> 
> 9223372036854775807, segment.index.bytes -> 10485760, flush.messages -> 
> 9223372036854775807}. (kafka.log.LogManager)
> [2019-06-11 12:47:32,502] INFO [Partition metrics_timers-35 broker=1] No 
> checkpointed highwatermark is found for partition metrics_timers-35 
> (kafka.cluster.Partition)
> [2019-06-11 12:47:32,502] INFO Replica loaded for partition metrics_timers-35 
> with initial high watermark 0 (kafka.cluster.Replica)
> [2019-06-11 12:47:33,083] INFO [ReplicaAlterLogDirsManager on broker 1] Added 
> fetcher to broker BrokerEndPoint(id=1, host=localhost:-1) for partitions 
> Map(metrics_timers-35 -> (offset=0, leaderEpoch=27)) 
> (kafka.server.ReplicaAlterLogDirsManager)
> [2019-06-11 12:47:33,309] INFO [ReplicaAlterLogDirsThread-1]: Truncating 
> partition metrics_timers-35 to local high watermark 0 
> (kafka.server.ReplicaAlterLogDirsThread)
> [2019-06-11 12:47:33,309] INFO [Log partition=metrics_timers-35, 
> dir=/storage5/kafka/data] Truncating to 0 has no effect as the largest offset 
> in the log is -1 (kafka.log.Log)
> [2019-06-11 14:02:25,937] INFO [ReplicaAlterLogDirsThread-1]: Partition 
> metrics_timers-35 has an older epoch (27) than the current leader. Will await 
> the new LeaderAndIsr state before resuming fetching. 
> (kafka.server.ReplicaAlterLogDirsThread)
> [2019-06-11 14:02:25,952] INFO [ReplicaFetcherManager on broker 1] Removed 
> fetcher for partitions Set(metrics_timer-35, …
> [2019-06-11 14:02:25,980] INFO [ReplicaFetcherManager on broker 1] Added 
> fetcher to broker BrokerEndPoint(id=2, host=vostok09:9092) for partitions 
> Map(metrics_timers-35 -> (offset=4224887, leaderEpoch=28),…
> [2019-06-11 14:02:25,998] INFO [ReplicaAlterLogDirsThread-1]: Shutting down 
> (kafka.server.ReplicaAlterLogDirsThread)
> [2019-06-11 14:02:25,998] INFO [ReplicaAlterLogDirsThread-1]: Stopped 
> (kafka.server.ReplicaAlterLogDirsThread)
> [2019-06-11 14:02:25,998] INFO [ReplicaAlterLogDirsThread-1]: Shutdown 
> completed (kafka.server.ReplicaAlterLogDirsThread)
> [2019-06-11 14:02:26,803] INFO [ReplicaFetcher replicaId=1, leaderId=2, 
> fetcherId=0] Retrying leaderEpoch request for partition metrics_timers-35 as 
> the leader reported an error: UNKNOWN_LEADER_EPOCH 
> (kafka.server.ReplicaFetcherThread)
> [2019-06-11 14:02:43,406] INFO [Log partition=metrics_timers-35, 
> dir=/storage2/kafka/data] Truncating to 4224887 has no effect as the largest 
> offset in the log is 4224886 (kafka.log.Log)
> {noformat}
> The broker has been restarted at 17:35
> {noformat}
> [2019-06-11 17:35:32,176] INFO [ReplicaFetcherManager on broker 1] Removed 
> fetcher for partitions Set(metrics_timers-35) 
> (kafka.server.ReplicaFetcherManager)
> [2019-06-11 17:37:48,265] WARN [Log partition=metrics_timers-35, 
> dir=/storage2/kafka/data] Found a corrupted index file corresponding to log 
> file /storage2/kafka/data/metrics_timers-35/00000000000004120720.log due to 
> Corrupt time index found, time index file 
> (/storage2/kafka/data/metrics_timers-35/00000000000004120720.timeindex) has 
> non-zero size but the last timestamp is 0 which is less than the first 
> timestamp 1560154787249}, recovering segment and rebuilding index files... 
> (kafka.log.Log)
> [2019-06-11 17:37:48,265] INFO [Log partition=metrics_timers-35, 
> dir=/storage2/kafka/data] Loading producer state till offset 4120720 with 
> message format version 2 (kafka.log.Log)
> [2019-06-11 17:37:48,266] INFO [ProducerStateManager 
> partition=metrics_timers-35] Writing producer snapshot at offset 4120720 
> (kafka.log.ProducerStateManager)
> [2019-06-11 17:37:48,522] INFO [ProducerStateManager 
> partition=metrics_timers-35] Writing producer snapshot at offset 4224887 
> (kafka.log.ProducerStateManager)
> [2019-06-11 17:37:48,524] INFO [Log partition=metrics_timers-35, 
> dir=/storage2/kafka/data] Loading producer state till offset 4224887 with 
> message format version 2 (kafka.log.Log)
> [2019-06-11 17:37:48,525] INFO [ProducerStateManager 
> partition=metrics_timers-35] Loading producer state from snapshot file 
> '/storage2/kafka/data/metrics_timers-35/00000000000004224887.snapshot' 
> (kafka.log.ProducerStateManager)
> [2019-06-11 17:37:48,525] INFO [Log partition=metrics_timers-35, 
> dir=/storage2/kafka/data] Completed load of log with 1 segments, log start 
> offset 4120720 and log end offset 4224887 in 298 ms (kafka.log.Log)
> [2019-06-11 17:38:01,954] INFO Replica loaded for partition metrics_timers-35 
> with initial high watermark 0 (kafka.cluster.Replica)
> [2019-06-11 17:38:01,954] INFO Replica loaded for partition metrics_timers-35 
> with initial high watermark 0 (kafka.cluster.Replica)
> [2019-06-11 17:38:01,955] INFO Replica loaded for partition metrics_timers-35 
> with initial high watermark 4224887 (kafka.cluster.Replica)
> [2019-06-11 17:38:02,582] INFO [Partition metrics_timers-35 broker=1] No 
> checkpointed highwatermark is found for partition metrics_timers-35 
> (kafka.cluster.Partition)
> [2019-06-11 17:38:02,582] INFO Replica loaded for partition metrics_timers-35 
> with initial high watermark 0 (kafka.cluster.Replica)
> [2019-06-11 17:38:02,588] INFO [ReplicaAlterLogDirsManager on broker 1] Added 
> fetcher to broker BrokerEndPoint(id=1, host=localhost:-1) for partitions 
> Map(metrics_timers-35 -> (offset=4224887, leaderEpoch=29), traces_cloud-4 -> 
> (offset=4381208630, leaderEpoch=27), metrics_histograms-11 -> (offset=0, 
> leaderEpoch=28), metrics_histograms-25 -> (offset=0, leaderEpoch=34), 
> metrics_histograms-39 -> (offset=0, leaderEpoch=29), metrics_counters-15 -> 
> (offset=0, leaderEpoch=34), metrics_final-21 -> (offset=1852, 
> leaderEpoch=28), metrics_final-7 -> (offset=1926, leaderEpoch=29), 
> metrics_any-17 -> (offset=0, leaderEpoch=28), metrics_timers-14 -> (offset=0, 
> leaderEpoch=29), metrics_counters-1 -> (offset=0, leaderEpoch=28)) 
> (kafka.server.ReplicaAlterLogDirsManager)
> [2019-06-11 17:38:02,596] INFO [ReplicaAlterLogDirsThread-1]: Truncating 
> partition metrics_timers-35 to local high watermark 4224887 
> (kafka.server.ReplicaAlterLogDirsThread)
> [2019-06-11 17:38:02,596] INFO [Log partition=metrics_timers-35, 
> dir=/storage5/kafka/data] Truncating to 4224887 has no effect as the largest 
> offset in the log is -1 (kafka.log.Log)
> [2019-06-11 17:38:06,005] INFO [Log partition=metrics_timers-35, 
> dir=/storage2/kafka/data] Truncating to 4224887 has no effect as the largest 
> offset in the log is 4224886 (kafka.log.Log)
> [2019-06-11 17:38:06,080] INFO [Log partition=metrics_timers-35, 
> dir=/storage5/kafka/data] Truncating to 4224887 has no effect as the largest 
> offset in the log is -1 (kafka.log.Log)
> [2019-06-11 17:38:06,080] INFO [ReplicaAlterLogDirsThread-1]: Truncating 
> partition metrics_timers-35 to local high watermark 4224887 
> (kafka.server.ReplicaAlterLogDirsThread)
> [2019-06-11 17:38:06,080] INFO [Log partition=metrics_timers-35, 
> dir=/storage5/kafka/data] Truncating to 4224887 has no effect as the largest 
> offset in the log is -1 (kafka.log.Log)
> [2019-06-11 17:58:46,304] ERROR [ReplicaAlterLogDirsThread-1]: Error due to 
> (kafka.server.ReplicaAlterLogDirsThread)
> org.apache.kafka.common.KafkaException: Error processing data for partition 
> metrics_timers-35 offset 4224887
>         at 
> kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$7(AbstractFetcherThread.scala:342)
>         at scala.Option.foreach(Option.scala:274)
>         at 
> kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$6(AbstractFetcherThread.scala:300)
>         at 
> kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$6$adapted(AbstractFetcherThread.scala:299)
>         at 
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>         at 
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>         at 
> kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$5(AbstractFetcherThread.scala:299)
>         at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
>         at 
> kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:299)
>         at 
> kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3(AbstractFetcherThread.scala:132)
>         at 
> kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3$adapted(AbstractFetcherThread.scala:131)
>         at scala.Option.foreach(Option.scala:274)
>         at 
> kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:131)
>         at 
> kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:113)
>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
> Caused by: java.lang.IllegalStateException: Offset mismatch for the future 
> replica metrics_timers-35: fetched offset = 4224887, log end offset = 0.
>         at 
> kafka.server.ReplicaAlterLogDirsThread.processPartitionData(ReplicaAlterLogDirsThread.scala:107)
>         at 
> kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$7(AbstractFetcherThread.scala:311)
>         ... 16 more
> [2019-06-11 17:58:46,305] INFO [ReplicaAlterLogDirsThread-1]: Stopped 
> (kafka.server.ReplicaAlterLogDirsThread)
> {noformat}
> The broker has been restarted at 18:21
> {noformat}
> [2019-06-11 18:21:26,422] INFO [Log partition=metrics_timers-35, 
> dir=/storage5/kafka/data] Loading producer state till offset 0 with message 
> format version 2 (kafka.log.Log)
> [2019-06-11 18:21:26,423] INFO [Log partition=metrics_timers-35, 
> dir=/storage5/kafka/data] Completed load of log with 1 segments, log start 
> offset 0 and log end offset 0 in 2 ms (kafka.log.Log)
> [2019-06-11 18:23:21,300] WARN [Log partition=metrics_timers-35, 
> dir=/storage2/kafka/data] Found a corrupted index file corresponding to log 
> file /storage2/kafka/data/metrics_timers-35/00000000000004120720.log due to 
> Corrupt time index found, time index file 
> (/storage2/kafka/data/metrics_timers-35/00000000000004120720.timeindex) has 
> non-zero size but the last timestamp is 0 which is less than the first 
> timestamp 1560154787249}, recovering segment and rebuilding index files... 
> (kafka.log.Log)
> [2019-06-11 18:23:21,300] INFO [Log partition=metrics_timers-35, 
> dir=/storage2/kafka/data] Loading producer state till offset 4120720 with 
> message format version 2 (kafka.log.Log)
> [2019-06-11 18:23:21,301] INFO [ProducerStateManager 
> partition=metrics_timers-35] Writing producer snapshot at offset 4120720 
> (kafka.log.ProducerStateManager)
> [2019-06-11 18:23:21,559] INFO [ProducerStateManager 
> partition=metrics_timers-35] Writing producer snapshot at offset 4224887 
> (kafka.log.ProducerStateManager)
> [2019-06-11 18:23:21,561] INFO [Log partition=metrics_timers-35, 
> dir=/storage2/kafka/data] Loading producer state till offset 4224887 with 
> message format version 2 (kafka.log.Log)
> [2019-06-11 18:23:21,562] INFO [ProducerStateManager 
> partition=metrics_timers-35] Loading producer state from snapshot file 
> '/storage2/kafka/data/metrics_timers-35/00000000000004224887.snapshot' 
> (kafka.log.ProducerStateManager)
> [2019-06-11 18:23:21,563] INFO [Log partition=metrics_timers-35, 
> dir=/storage2/kafka/data] Completed load of log with 1 segments, log start 
> offset 4120720 and log end offset 4224887 in 353 ms (kafka.log.Log)
> [2019-06-11 18:23:35,928] INFO Replica loaded for partition metrics_timers-35 
> with initial high watermark 0 (kafka.cluster.Replica)
> [2019-06-11 18:23:35,928] INFO Replica loaded for partition metrics_timers-35 
> with initial high watermark 0 (kafka.cluster.Replica)
> [2019-06-11 18:23:35,929] INFO Replica loaded for partition metrics_timers-35 
> with initial high watermark 4224887 (kafka.cluster.Replica)
> [2019-06-11 18:23:36,516] INFO [Partition metrics_timers-35 broker=1] No 
> checkpointed highwatermark is found for partition metrics_timers-35 
> (kafka.cluster.Partition)
> [2019-06-11 18:23:36,516] INFO Replica loaded for partition metrics_timers-35 
> with initial high watermark 0 (kafka.cluster.Replica)
> [2019-06-11 18:23:36,521] INFO [ReplicaAlterLogDirsManager on broker 1] Added 
> fetcher to broker BrokerEndPoint(id=1, host=localhost:-1) for partitions 
> Map(metrics_timers-35 -> (offset=4224887, leaderEpoch=30), 
> metrics_histograms-11 -> (offset=0, leaderEpoch=29), metrics_histograms-25 -> 
> (offset=0, leaderEpoch=36), metrics_histograms-39 -> (offset=0, 
> leaderEpoch=30), metrics_counters-15 -> (offset=0, leaderEpoch=36), 
> metrics_final-21 -> (offset=1861, leaderEpoch=29), metrics_final-7 -> 
> (offset=1931, leaderEpoch=30), metrics_any-17 -> (offset=0, leaderEpoch=29), 
> metrics_timers-14 -> (offset=0, leaderEpoch=30), metrics_counters-1 -> 
> (offset=0, leaderEpoch=29)) (kafka.server.ReplicaAlterLogDirsManager)
> [2019-06-11 18:23:36,522] INFO [ReplicaAlterLogDirsThread-1]: Truncating 
> partition metrics_timers-35 to local high watermark 4224887 
> (kafka.server.ReplicaAlterLogDirsThread)
> [2019-06-11 18:23:36,523] INFO [Log partition=metrics_timers-35, 
> dir=/storage5/kafka/data] Truncating to 4224887 has no effect as the largest 
> offset in the log is -1 (kafka.log.Log)
> [2019-06-11 18:23:36,523] INFO [ReplicaAlterLogDirsThread-1]: Truncating 
> partition metrics_final-7 to local high watermark 1931 
> (kafka.server.ReplicaAlterLogDirsThread)
> [2019-06-11 18:23:36,563] ERROR [ReplicaAlterLogDirsThread-1]: Error due to 
> (kafka.server.ReplicaAlterLogDirsThread)
> org.apache.kafka.common.KafkaException: Error processing data for partition 
> metrics_timers-35 offset 4224887
>         at 
> kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$7(AbstractFetcherThread.scala:342)
>         at scala.Option.foreach(Option.scala:274)
>         at 
> kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$6(AbstractFetcherThread.scala:300)
>         at 
> kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$6$adapted(AbstractFetcherThread.scala:299)
>         at 
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>         at 
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>         at 
> kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$5(AbstractFetcherThread.scala:299)
>         at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
>         at 
> kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:299)
>         at 
> kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3(AbstractFetcherThread.scala:132)
>         at 
> kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3$adapted(AbstractFetcherThread.scala:131)
>         at scala.Option.foreach(Option.scala:274)
>         at 
> kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:131)
>         at 
> kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:113)
>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
> Caused by: java.lang.IllegalStateException: Offset mismatch for the future 
> replica metrics_timers-35: fetched offset = 4224887, log end offset = 0.
>         at 
> kafka.server.ReplicaAlterLogDirsThread.processPartitionData(ReplicaAlterLogDirsThread.scala:107)
>         at 
> kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$7(AbstractFetcherThread.scala:311)
>         ... 16 more
> [2019-06-11 18:23:36,564] INFO [GroupMetadataManager brokerId=1] Scheduling 
> unloading of offsets and group metadata from __consumer_offsets-19 
> (kafka.coordinator.group.GroupMetadataManager)
> [2019-06-11 18:23:36,572] INFO [ReplicaAlterLogDirsThread-1]: Stopped 
> (kafka.server.ReplicaAlterLogDirsThread)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
