[ https://issues.apache.org/jira/browse/KAFKA-8733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Satish Duggana updated KAFKA-8733:
----------------------------------
    Description: 
We have seen the offline-partitions issue multiple times on some of the hosts in 
our clusters. After going through the broker logs and the hosts' disk stats, it 
looks like this issue occurs whenever read/write operations on that disk take too 
long. In particular, when a read takes longer than replica.lag.time.max.ms, 
follower replicas fall out of sync because their earlier fetch requests are stuck 
reading the local log and their fetch status has not yet been updated, as shown in 
the `ReplicaManager` code below. If reads from the log are delayed for longer than 
replica.lag.time.max.ms, all the replicas fall out of sync and the partition 
becomes offline when min.insync.replicas > 1 and unclean.leader.election.enable 
is false.
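
For reference, this is roughly the configuration under which the scenario above 
plays out (illustrative values, not taken from the affected clusters):

{code}
# Leader treats a follower as out of sync if it has not caught up within this window.
replica.lag.time.max.ms=10000

# Producers using acks=all require at least this many in-sync replicas.
min.insync.replicas=2

# With unclean leader election disabled, an out-of-sync replica can never be
# elected leader, so the partition stays offline until an in-sync replica returns.
unclean.leader.election.enable=false
{code}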

 
{code:java}
def readFromLog(): Seq[(TopicPartition, LogReadResult)] = {
  // This call took longer than `replica.lag.time.max.ms` when the disk was slow.
  val result = readFromLocalLog(
    replicaId = replicaId,
    fetchOnlyFromLeader = fetchOnlyFromLeader,
    readOnlyCommitted = fetchOnlyCommitted,
    fetchMaxBytes = fetchMaxBytes,
    hardMaxBytesLimit = hardMaxBytesLimit,
    readPartitionInfo = fetchInfos,
    quota = quota,
    isolationLevel = isolationLevel)
  // The follower's fetch time is only updated here, but maybeShrinkIsr may
  // already have run by then and removed the replica from the ISR.
  if (isFromFollower) updateFollowerLogReadResults(replicaId, result)
  else result
}

val logReadResults = readFromLog()
{code}
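
To make the timing issue clearer, here is a simplified sketch of the out-of-sync 
check that maybeShrinkIsr relies on. This is my own illustration, not the actual 
ReplicaManager/Partition code; the FollowerState type and field names are made up, 
but the idea matches the broker behaviour: a follower's caught-up time is only 
advanced by updateFollowerLogReadResults, so a fetch stuck in readFromLocalLog 
leaves it stale and the periodic ISR-shrink check drops the replica.

{code:java}
object IsrShrinkSketch {

  // Stands in for the broker config replica.lag.time.max.ms (illustrative value).
  val replicaLagTimeMaxMs: Long = 10000L

  // Minimal stand-in for the per-follower state the leader tracks.
  final case class FollowerState(brokerId: Int, lastCaughtUpTimeMs: Long)

  // A follower is considered out of sync once it has not caught up within
  // replica.lag.time.max.ms. lastCaughtUpTimeMs only advances when the
  // follower's fetch completes, i.e. after readFromLocalLog returns.
  def isOutOfSync(follower: FollowerState, nowMs: Long): Boolean =
    nowMs - follower.lastCaughtUpTimeMs > replicaLagTimeMaxMs

  def main(args: Array[String]): Unit = {
    val nowMs = System.currentTimeMillis()

    // Follower whose fetch has been blocked for 15s in a slow local-log read:
    // its caught-up time was never refreshed, so it is judged out of sync.
    val stuckFollower = FollowerState(brokerId = 2, lastCaughtUpTimeMs = nowMs - 15000L)

    // Follower whose last fetch completed one second ago.
    val healthyFollower = FollowerState(brokerId = 3, lastCaughtUpTimeMs = nowMs - 1000L)

    println(s"broker 2 out of sync: ${isOutOfSync(stuckFollower, nowMs)}")   // true
    println(s"broker 3 out of sync: ${isOutOfSync(healthyFollower, nowMs)}") // false
  }
}
{code}

Once every follower trips this check, the ISR shrinks to the leader alone, which 
is what makes a subsequent leader failure unrecoverable when unclean leader 
election is disabled.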
Graphs of the disk weighted IO time stats from when this issue occurred are 
attached.

I will raise a 
[KIP|https://cwiki.apache.org/confluence/display/KAFKA/KIP-501+Avoid+offline+partitions+in+the+edgcase+scenario+of+follower+fetch+requests+not+processed+in+time]
 describing options for handling this scenario.

 

> Offline partitions occur when leader's disk is slow in reads while responding 
> to follower fetch requests.
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-8733
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8733
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.1.2, 2.4.0
>            Reporter: Satish Duggana
>            Assignee: Satish Duggana
>            Priority: Critical
>         Attachments: weighted-io-time-2.png, wio-time.png
>


