[ https://issues.apache.org/jira/browse/KAFKA-16431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chia-Ping Tsai resolved KAFKA-16431.
------------------------------------
    Fix Version/s:     (was: 3.7.1)
       Resolution: Duplicate

> Handle log dir failure in hybrid mode
> -------------------------------------
>
>                 Key: KAFKA-16431
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16431
>             Project: Kafka
>          Issue Type: Bug
>          Components: jbod
>    Affects Versions: 3.7.0
>            Reporter: Igor Soarez
>            Assignee: Igor Soarez
>            Priority: Critical
>
> As part of the KRaft migration, the KRaft Controller implements some of the
> ZK-mode controller functionality that is used during the migration, in what
> is known as "hybrid mode".
> In hybrid mode, some brokers may still be running in ZK mode while others
> have already been restarted into KRaft mode.
> The ZK-mode controller implementation in KRaft does not include the ZK-based
> logic for handling directory failures, so it is unable to re-elect leaders
> for partitions whose leader replicas reside on failed directories.
> This leaves a gap for JBOD during the ZK-KRaft migration. There are two main
> ways it can be addressed:
>  # Implement the ZK-mode functionality for handling failed directories. As in
> ZK mode, the controller needs to subscribe to events on the
> `/log_dir_event_notification` ZNode and rely on per-partition errors in full
> LeaderAndIsr responses to detect directory failures.
>  # A simpler alternative would be to have a migrating ZK broker stop upon any
> directory failure. This would sacrifice some availability and operational
> flexibility, but it may be much more straightforward to implement.
> Without a solution, a directory failure during the migration may lead to 
> indefinite partition unavailability.
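
A minimal sketch of the detection step in option 1, under stated assumptions: the types below (TopicPartition, PartitionState, PartitionError) and the method reelectLeaders are hypothetical stand-ins written for illustration, not Kafka's classes; only the constant name KAFKA_STORAGE_ERROR mirrors a real Kafka error code. It assumes the controller has already sent a full LeaderAndIsr request to the broker that reported a directory failure, and shows how per-partition storage errors in the response could be turned into leader re-elections for the partitions that broker was leading.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the option 1 detection step. None of these types are
// Kafka classes; they are simplified stand-ins for illustration only.
public class LogDirFailureSketch {

    // Stand-in for a per-partition error code in a full LeaderAndIsr response.
    // The constant name mirrors Kafka's KAFKA_STORAGE_ERROR.
    enum PartitionError { NONE, KAFKA_STORAGE_ERROR }

    record TopicPartition(String topic, int partition) {}

    // Controller-side view of one partition: current leader and ISR.
    record PartitionState(int leader, List<Integer> isr) {}

    /**
     * For each partition that brokerId reported with a storage error and that
     * brokerId currently leads, pick the first other ISR member as the new
     * leader. -1 means no eligible replica is left, so the partition stays
     * offline (this sketch assumes unclean leader election is disabled).
     */
    static Map<TopicPartition, Integer> reelectLeaders(
            int brokerId,
            Map<TopicPartition, PartitionError> responseErrors,
            Map<TopicPartition, PartitionState> state) {
        Map<TopicPartition, Integer> newLeaders = new HashMap<>();
        for (Map.Entry<TopicPartition, PartitionError> entry : responseErrors.entrySet()) {
            if (entry.getValue() != PartitionError.KAFKA_STORAGE_ERROR) {
                continue; // replica is healthy on this broker
            }
            PartitionState current = state.get(entry.getKey());
            if (current == null || current.leader() != brokerId) {
                continue; // failed replica was a follower, or partition is unknown
            }
            int candidate = -1;
            for (int replica : current.isr()) {
                if (replica != brokerId) {
                    candidate = replica;
                    break;
                }
            }
            newLeaders.put(entry.getKey(), candidate);
        }
        return newLeaders;
    }

    public static void main(String[] args) {
        TopicPartition p0 = new TopicPartition("orders", 0);
        TopicPartition p1 = new TopicPartition("orders", 1);
        TopicPartition p2 = new TopicPartition("audit", 0);
        Map<TopicPartition, PartitionState> state = Map.of(
                p0, new PartitionState(1, List.of(1, 2, 3)),
                p1, new PartitionState(2, List.of(2, 1)),
                p2, new PartitionState(1, List.of(1)));
        // Broker 1 answers the full LeaderAndIsr request with storage errors
        // for all three partitions, because their replicas sit on a failed dir.
        Map<TopicPartition, PartitionError> errors = Map.of(
                p0, PartitionError.KAFKA_STORAGE_ERROR,
                p1, PartitionError.KAFKA_STORAGE_ERROR,
                p2, PartitionError.KAFKA_STORAGE_ERROR);
        Map<TopicPartition, Integer> result = reelectLeaders(1, errors, state);
        System.out.println(p0 + " -> " + result.get(p0)); // leader moves to broker 2
        System.out.println(p1 + " re-elected? " + result.containsKey(p1)); // false: broker 2 still leads
        System.out.println(p2 + " -> " + result.get(p2)); // -1: no other ISR member
    }
}
```

The sketch deliberately covers only leader re-election, which is the part the description says is missing. Partitions where the failed replica was a follower are skipped here; handling those (for example, removing the replica from the ISR) is left out of scope. The -1 case illustrates how a single-replica partition on a failed directory can remain unavailable.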



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
