[ https://issues.apache.org/jira/browse/KAFKA-7151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anna Povzner updated KAFKA-7151:
--------------------------------
    Summary: IOException on broker may result in state where unclean leader election is required  (was: Broker running out of disk space may result in state where unclean leader election is required)

> IOException on broker may result in state where unclean leader election is
> required
> -----------------------------------------------------------------------------------
>
>                 Key: KAFKA-7151
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7151
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Anna Povzner
>            Priority: Major
>
> We have seen situations like the following:
> 1) Broker A is the leader for a topic partition, and brokers B and C are the
> followers.
> 2) Broker A starts running out of disk space, shrinks the ISR to only itself,
> and then some time later gets disk errors, etc.
> 3) Broker A is stopped, disk space is reclaimed, and broker A is restarted.
> Result: Broker A becomes the leader, but the followers cannot fetch because
> their logs are ahead of the leader's. The only way to continue is to enable
> unclean leader election.
>
> There are several issues here:
> -- If the machine is running out of disk space, we do not reliably get an
> error from the file system as soon as that happens. The broker could be in a
> state where some writes succeed (possibly because the write has not yet been
> flushed to disk) and some writes fail, or fail only later. This may cause
> fetchers to fetch records that are still in the leader's file system cache;
> if the flush to disk then fails on the leader, the followers end up ahead of
> the leader.
> -- I am not sure exactly why, but it seems like the leader broker (the one
> that is running out of disk space) may also stop servicing fetch requests,
> making followers fall behind and get kicked out of the ISR.
> Ideally, the broker should stop being the leader for any topic partition
> before accepting any records that may fail to be flushed to disk. One option
> is to automatically detect disk space usage and make a broker read-only for
> topic partitions if disk usage gets to 80% or something. Maybe there is a
> better option.
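
To make the "detect disk space usage" option above a bit more concrete, here is a rough sketch of what the detection half could look like. This is not existing Kafka code or configuration: the class name DiskUsageGuard, the method names, and the fixed 0.80 threshold are all hypothetical, and the sketch only uses the standard java.nio.file.FileStore API to read total and usable space for a log directory. It does not address the harder part of the report, which is that writes may already be sitting unflushed in the file system cache when the disk fills up.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Kafka: decides whether a broker should
// stop accepting writes for partitions stored under a given log directory.
final class DiskUsageGuard {
    // Assumed threshold; the report only says "80% or something".
    public static final double READ_ONLY_THRESHOLD = 0.80;

    // Fraction of the volume that is in use, in the range [0.0, 1.0].
    public static double usedFraction(long totalBytes, long usableBytes) {
        if (totalBytes <= 0 || usableBytes < 0 || usableBytes > totalBytes) {
            throw new IllegalArgumentException("invalid disk sizes");
        }
        return 1.0 - (double) usableBytes / (double) totalBytes;
    }

    // Pure decision function, kept separate so it can be tested without a disk.
    public static boolean shouldRejectWrites(long totalBytes, long usableBytes) {
        return usedFraction(totalBytes, usableBytes) >= READ_ONLY_THRESHOLD;
    }

    // Reads the current numbers for the volume that holds logDir.
    public static boolean shouldRejectWrites(Path logDir) throws IOException {
        FileStore store = Files.getFileStore(logDir);
        return shouldRejectWrites(store.getTotalSpace(), store.getUsableSpace());
    }

    public static void main(String[] args) throws IOException {
        Path logDir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("reject writes for " + logDir.toAbsolutePath()
                + ": " + shouldRejectWrites(logDir));
    }
}
```

A broker would presumably have to run such a check periodically per log directory and give up leadership (or reject produce requests) before the threshold is crossed; how that would plug into the broker is left open here, as it is in the report.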