[
https://issues.apache.org/jira/browse/KAFKA-8733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17220139#comment-17220139
]
Ming Liu commented on KAFKA-8733:
---------------------------------
One observation: after moving to 2.5 (where the default
replica.lag.time.max.ms changed from 10 seconds to 30 seconds), offline
partitions caused by the leader's disk going bad occur much less frequently.
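
For brokers still on a pre-2.5 release, a rough sketch of setting the same
value explicitly in server.properties. The value simply mirrors the 2.5
default mentioned above; it is an illustration, not a tuning recommendation,
and it only widens the window rather than fixing the underlying issue.

```properties
# Broker config (server.properties).
# 30000 ms matches the default from Kafka 2.5 onward; earlier releases default to 10000 ms.
replica.lag.time.max.ms=30000
```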
> Offline partitions occur when leader's disk is slow in reads while responding
> to follower fetch requests.
> ---------------------------------------------------------------------------------------------------------
>
> Key: KAFKA-8733
> URL: https://issues.apache.org/jira/browse/KAFKA-8733
> Project: Kafka
> Issue Type: Bug
> Components: core
> Affects Versions: 1.1.2, 2.4.0
> Reporter: Satish Duggana
> Assignee: Satish Duggana
> Priority: Critical
> Attachments: weighted-io-time-2.png, wio-time.png
>
>
> We hit the offline partitions issue multiple times on some of the hosts in
> our clusters. After going through the broker logs and the hosts' disk stats,
> it looks like this issue occurs whenever read/write operations on that disk
> take longer than usual. In particular, when a read takes longer than
> replica.lag.time.max.ms, follower replicas fall out of sync because their
> earlier fetch requests are stuck reading the local log and their fetch
> status is not yet updated, as shown in the `ReplicaManager` code below.
> If reading data from the log takes longer than replica.lag.time.max.ms, all
> the replicas fall out of sync and the partition goes offline if
> min.insync.replicas > 1 and unclean.leader.election.enable is false.
>
> {code:java}
> def readFromLog(): Seq[(TopicPartition, LogReadResult)] = {
>   // this call took more than `replica.lag.time.max.ms`
>   val result = readFromLocalLog(
>     replicaId = replicaId,
>     fetchOnlyFromLeader = fetchOnlyFromLeader,
>     readOnlyCommitted = fetchOnlyCommitted,
>     fetchMaxBytes = fetchMaxBytes,
>     hardMaxBytesLimit = hardMaxBytesLimit,
>     readPartitionInfo = fetchInfos,
>     quota = quota,
>     isolationLevel = isolationLevel)
>   // fetch time gets updated here, but maybeShrinkIsr should have already
>   // been called and the replica removed from the ISR
>   if (isFromFollower) updateFollowerLogReadResults(replicaId, result)
>   else result
> }
> val logReadResults = readFromLog()
> {code}
> Attached are graphs of the disk's weighted I/O time stats from when this
> issue occurred.
> I will raise [KIP-501|https://s.apache.org/jhbpn] describing options on how
> to handle this scenario.
>
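
The timing problem described in the issue can be illustrated with a small,
self-contained Java sketch. This is a toy model under stated assumptions, not
Kafka's actual implementation: the class and method names are hypothetical,
the 15-second stall is an invented figure, and the check is reduced to "time
since the follower was last caught up exceeds the limit" (it ignores log end
offsets). It shows why a read stall longer than replica.lag.time.max.ms drops
every follower at once, and why a 30-second limit tolerates the same stall.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the leader-side ISR lag check; names are hypothetical. */
final class IsrShrinkSketch {

    /** A follower counts as lagging once its last caught-up time is older than maxLagMs. */
    static boolean isOutOfSync(long nowMs, long lastCaughtUpMs, long maxLagMs) {
        return nowMs - lastCaughtUpMs > maxLagMs;
    }

    /** Returns the ids of followers that would be dropped from the ISR at time nowMs. */
    static List<Integer> followersToDrop(Map<Integer, Long> lastCaughtUpMs,
                                         long nowMs, long maxLagMs) {
        List<Integer> dropped = new ArrayList<>();
        for (Map.Entry<Integer, Long> e : lastCaughtUpMs.entrySet()) {
            if (isOutOfSync(nowMs, e.getValue(), maxLagMs)) {
                dropped.add(e.getKey());
            }
        }
        return dropped;
    }

    public static void main(String[] args) {
        // Both followers were caught up at t=0. Their next fetches then block on a
        // slow disk read, so their caught-up timestamps are not refreshed meanwhile.
        Map<Integer, Long> lastCaughtUp = new LinkedHashMap<>();
        lastCaughtUp.put(1, 0L);
        lastCaughtUp.put(2, 0L);

        long stallMs = 15_000L; // hypothetical duration of the slow read

        // The periodic ISR check fires while the reads are still stuck.
        System.out.println("10s limit: drop " + followersToDrop(lastCaughtUp, stallMs, 10_000L));
        System.out.println("30s limit: drop " + followersToDrop(lastCaughtUp, stallMs, 30_000L));
    }
}
```

Running `main` prints `10s limit: drop [1, 2]` followed by `30s limit: drop []`.
A larger limit only widens the window; a stall longer than 30 seconds would
still empty the ISR in this model.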