Igor Soarez created KAFKA-15649:
-----------------------------------

             Summary: Handle directory failure timeout 
                 Key: KAFKA-15649
                 URL: https://issues.apache.org/jira/browse/KAFKA-15649
             Project: Kafka
          Issue Type: Sub-task
            Reporter: Igor Soarez


If a broker with an offline log directory continues to fail to notify the 
controller of either:
 * the fact that the directory is offline; or
 * any replica assignment into a failed directory

then the controller will not check if a leadership change is required, and this 
may lead to partitions remaining indefinitely offline.

KIP-858 proposes that the broker should shut down after a configurable timeout 
to force a leadership change. Alternatively, the broker could request to be 
fenced, as long as there is a path for it to become unfenced later.
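
To make the timeout idea concrete, here is a minimal sketch of how such a 
watchdog might look. It is not Kafka code and not necessarily how KIP-858 
would be implemented: the class name, method names, injected clock and 
shutdown callback are all hypothetical, chosen only for illustration.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch: tracks how long a directory failure has gone
 * unacknowledged by the controller and runs a callback (for example a
 * broker shutdown) once a configurable timeout has elapsed.
 */
final class DirectoryFailureWatchdog {
    private final long timeoutMs;        // assumed configurable timeout
    private final LongSupplier clockMs;  // injected clock, in milliseconds
    private final Runnable onTimeout;    // e.g. initiate broker shutdown
    private long pendingSinceMs = -1L;   // -1 means nothing is pending
    private boolean fired = false;

    DirectoryFailureWatchdog(long timeoutMs, LongSupplier clockMs, Runnable onTimeout) {
        this.timeoutMs = timeoutMs;
        this.clockMs = clockMs;
        this.onTimeout = onTimeout;
    }

    /** A log directory went offline and the controller has not been told yet. */
    synchronized void onDirectoryFailure() {
        if (pendingSinceMs < 0) {
            pendingSinceMs = clockMs.getAsLong(); // remember the first failure only
        }
    }

    /** The controller acknowledged the failure, so the timer is cleared. */
    synchronized void onControllerAcknowledged() {
        pendingSinceMs = -1L;
    }

    /** Called periodically; returns true once the timeout action has run. */
    synchronized boolean poll() {
        if (!fired && pendingSinceMs >= 0
                && clockMs.getAsLong() - pendingSinceMs >= timeoutMs) {
            fired = true;
            onTimeout.run();
        }
        return fired;
    }
}
```

In this sketch the timer starts at the first unreported failure and is cleared 
only when the controller acknowledges it, so later failures do not extend the 
deadline. The fencing alternative could reuse the same structure by passing a 
different callback.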

While this unavailability is possible in theory, in practice it is hard to 
conceive of a scenario where a broker continues to appear healthy to the 
controller but fails to send this information. So it is not clear that this is 
a real problem.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
