This proposal seems to sit between clean and unclean leader election. Maybe it needs a new policy of its own?
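
To make the "in between" concrete, here is a rough Python sketch of how the three tiers in the proposal quoted below could be ordered. It is not Kafka's controller code. It assumes the controller knows each live replica's HW and the failed leader's last HW, which it may not have today. The names ReplicaState and elect_offline_partition_leader are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Set


@dataclass(frozen=True)
class ReplicaState:
    """Hypothetical view of one replica: its broker id and last known HW."""
    broker_id: int
    high_watermark: int


def elect_offline_partition_leader(
    assignment: Sequence[ReplicaState],
    isr: Set[int],
    live_brokers: Set[int],
    failed_leader_hw: int,
    unclean_enabled: bool,
) -> Optional[int]:
    """Return the broker id of the new leader, or None to stay offline."""
    live = [r for r in assignment if r.broker_id in live_brokers]

    # Tier 1 (clean): any live replica that is still in the ISR.
    for replica in live:
        if replica.broker_id in isr:
            return replica.broker_id

    # Tier 2 (proposed): a live replica outside the ISR whose HW matches
    # the failed leader's HW. Assumes the HW values are known here.
    for replica in live:
        if replica.high_watermark == failed_leader_hw:
            return replica.broker_id

    # Tier 3 (unclean): only if enabled, prefer the highest HW.
    if unclean_enabled and live:
        return max(live, key=lambda r: r.high_watermark).broker_id

    return None
```

With RF = 3, ISR = {1}, and broker 1 dead, tier 2 picks a surviving replica only if its HW equals the failed leader's. Otherwise the partition stays offline unless unclean election is enabled.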
--
hackerwin7
hackersw...@gmail.com

> On Feb 15, 2019, at 07:50, Ming Liu <minga...@gmail.com> wrote:
>
> Hi Kafka community,
>
> I would like to propose a small change related to
> OfflinePartitionLeaderElectionStrategy.
>
> In our system, we usually have RF = 3, Min_ISR = 2,
> unclean.leader.election = false, and clients usually set acks=all when
> publishing. We have observed that occasionally, when a disk goes bad, a
> partition goes offline and stays offline, which of course causes an
> availability issue, and we have to manually set
> unclean.leader.election = true to bring the partition back online.
> Partitions going offline due to disk failure have become a huge
> operational pain for us.
>
> Looking into it, the sequence of events is:
> 1. First, the ISR for that partition drops to 1 (maybe the bad disk
> causes the broker to respond to fetches more slowly. Note that a dead
> disk doesn't cause this every time, only occasionally).
> 2. Then the disk gives up completely, and the failure takes the leader
> replica offline.
> 3. Because the ISR is 1, OfflinePartitionLeaderElectionStrategy won't
> choose a leader if unclean.leader.election = false.
>
> The observation here is that, in this case, even though the last failed
> replica is not in the ISR, it should still have the same HW as the
> failed leader replica. So OfflinePartitionLeaderElectionStrategy should
> select the last failed replica as the leader, especially if it has the
> same HW.
>
> So the proposal is:
> 1. Choose a replica as the leader if it has the same HW (even if it is
> not in the ISR).
> 2. Further, when unclean.leader.election = true, choose the replica with
> the highest HW as the leader.
>
> Let me know if this makes sense, or if you have any suggestions. If yes,
> I will create a JIRA and work on it.
>
> Thanks!
> Ming