[ https://issues.apache.org/jira/browse/KAFKA-3924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383107#comment-15383107 ]
Andrew Olson commented on KAFKA-3924: ------------------------------------- Reviewed, looks good to me. > Data loss due to halting when LEO is larger than leader's LEO > ------------------------------------------------------------- > > Key: KAFKA-3924 > URL: https://issues.apache.org/jira/browse/KAFKA-3924 > Project: Kafka > Issue Type: Bug > Components: core > Affects Versions: 0.10.0.0 > Reporter: Maysam Yabandeh > > Currently the follower broker panics when its LEO is larger than its leader's > LEO, and assuming that this is an impossible state to reach, halts the > process to prevent any further damage. > {code} > if (leaderEndOffset < replica.logEndOffset.messageOffset) { > // Prior to truncating the follower's log, ensure that doing so is not > disallowed by the configuration for unclean leader election. > // This situation could only happen if the unclean election > configuration for a topic changes while a replica is down. Otherwise, > // we should never encounter this situation since a non-ISR leader > cannot be elected if disallowed by the broker configuration. > if (!LogConfig.fromProps(brokerConfig.originals, > AdminUtils.fetchEntityConfig(replicaMgr.zkUtils, > ConfigType.Topic, > topicAndPartition.topic)).uncleanLeaderElectionEnable) { > // Log a fatal error and shutdown the broker to ensure that data loss > does not unexpectedly occur. > fatal("...") > Runtime.getRuntime.halt(1) > } > {code} > Firstly this assumption is invalid and there are legitimate cases (examples > below) that this case could actually occur. Secondly halt results into the > broker losing its un-flushed data, and if multiple brokers halt > simultaneously there is a chance that both leader and followers of a > partition are among the halted brokers, which would result into permanent > data loss. > Given that this is a legit case, we suggest to replace it with a graceful > shutdown to avoid propagating data loss to the entire cluster. > Details: > One legit case that this could actually occur is when a troubled broker > shrinks its partitions right before crashing (KAFKA-3410 and KAFKA-3861). In > this case the broker has lost some data but the controller cannot still > elects the others as the leader. If the crashed broker comes back up, the > controller elects it as the leader, and as a result all other brokers who are > now following it halt since they have LEOs larger than that of shrunk topics > in the restarted broker. We actually had a case that bringing up a crashed > broker simultaneously took down the entire cluster and as explained above > this could result into data loss. > The other legit case is when multiple brokers ungracefully shutdown at the > same time. In this case both of the leader and the followers lose their > un-flushed data but one of them has HW larger than the other. Controller > elects the one who comes back up sooner as the leader and if its LEO is less > than its future follower, the follower will halt (and probably lose more > data). Simultaneous ungrateful shutdown could happen due to hardware issue > (e.g., rack power failure), operator errors, or software issue (e.g., the > case above that is further explained in KAFKA-3410 and KAFKA-3861 and causes > simultaneous halts in multiple brokers) -- This message was sent by Atlassian JIRA (v6.3.4#6332)