[ https://issues.apache.org/jira/browse/KAFKA-16606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17840125#comment-17840125 ]

Igor Soarez commented on KAFKA-16606:
-------------------------------------

Thanks for bringing this to my attention [~mimaison].

Hi [~scholzj], thanks for pointing this out. I think there's some confusion 
here between a JBOD configuration being +allowed+ and being +supported+ in KRaft.

In terms of just reading and writing data to multiple log directories, as long 
as those directories are always available, there's nothing special about KRaft 
that would require changes compared with ZK mode. What 3.7 enables is the 
handling of failed log directories. Without it, you'll find that if the log 
directory holding the leader replica fails while the broker stays alive, no new 
leader is elected for the affected partitions, which then remain unavailable 
indefinitely.

If a single directory is configured and it becomes unavailable, the broker shuts 
down, as there is no point in continuing to run without access to storage. When 
it shuts down, the controller becomes aware of that (via an ephemeral znode in 
ZK mode, or via missing heartbeats in KRaft) and re-elects leaders for the 
partitions that were led by that broker. When multiple directories are 
configured, it is critical to have a separate mechanism to let the controller 
know about a partial failure: the broker is still alive and operational on the 
remaining log dirs, but any partitions on the failed directory need a 
leadership and ISR update.
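
A minimal sketch of the broker-side decision described above, in Java 17. It is not Kafka's implementation: {{LogDirFailureSketch}}, {{onLogDirFailure}} and the {{Action}} values are hypothetical names, and directories are plain strings. The example data mirrors broker 3000 from the report, whose {{data-1}} directory holds {{kafka-test-apps-0}} and {{__consumer_offsets-44}}.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of a broker reacting to a failed log directory.
// Not Kafka's code; all names here are illustrative.
public class LogDirFailureSketch {

    enum Action { SHUT_DOWN, REPORT_PARTIAL_FAILURE }

    record Outcome(Action action, Set<String> affectedPartitions) {}

    // assignments: partition -> directory holding this broker's replica of it
    // liveDirs:    directories that were online before the failure
    // failedDir:   directory that just became unavailable
    static Outcome onLogDirFailure(Map<String, String> assignments, Set<String> liveDirs, String failedDir) {
        Set<String> remaining = new TreeSet<>(liveDirs);
        remaining.remove(failedDir);

        // Every replica stored in the failed directory is now offline on this broker.
        Set<String> affected = new TreeSet<>();
        for (Map.Entry<String, String> e : assignments.entrySet()) {
            if (e.getValue().equals(failedDir)) {
                affected.add(e.getKey());
            }
        }

        if (remaining.isEmpty()) {
            // No usable storage left: stop the broker. The controller notices it is
            // gone (missing heartbeats in KRaft) and re-elects leaders by itself.
            return new Outcome(Action.SHUT_DOWN, affected);
        }
        // The broker stays up for the other directories, so the controller has to be
        // told explicitly that these partitions need a leadership and ISR update.
        return new Outcome(Action.REPORT_PARTIAL_FAILURE, affected);
    }

    public static void main(String[] args) {
        // Broker 3000 in the report: data-1 holds kafka-test-apps-0 and __consumer_offsets-44.
        Map<String, String> jbod = Map.of(
                "kafka-test-apps-0", "data-1",
                "__consumer_offsets-44", "data-1",
                "kafka-test-apps-1", "data-0");
        System.out.println(onLogDirFailure(jbod, Set.of("data-0", "data-1"), "data-1"));

        // Single log directory: losing it leaves nothing to serve from.
        Map<String, String> single = Map.of("kafka-test-apps-1", "data-0");
        System.out.println(onLogDirFailure(single, Set.of("data-0"), "data-0"));
    }
}
```

The split matters because {{SHUT_DOWN}} needs no extra signalling, while {{REPORT_PARTIAL_FAILURE}} only helps if something actually delivers the report to the controller.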

In ZK mode that was handled by notifying the controller via a znode, so an 
alternative solution was required for KRaft. You can find the details in 
[KIP-858|https://cwiki.apache.org/confluence/display/KAFKA/KIP-858%3A+Handle+JBOD+broker+disk+failure+in+KRaft].
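
On the controller side, a partial-failure report has to turn into ISR and leader changes. The sketch below assumes a simplified model in which the controller knows which directory holds each replica and a live broker reports one offline directory; {{PartitionState}}, {{handleOfflineDir}}, and the leaders and ISRs in {{main}} are hypothetical, and the new leader is simply the first remaining ISR member. The actual records and RPCs are specified in KIP-858, not here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical controller-side model; not Kafka's QuorumController.
public class OfflineDirHandlingSketch {

    // The controller's view of one partition.
    static final class PartitionState {
        int leader;                             // -1 means no leader: partition offline
        final List<Integer> isr;                // in-sync replicas, in preference order
        final Map<Integer, String> replicaDirs; // broker id -> directory holding its replica

        PartitionState(int leader, List<Integer> isr, Map<Integer, String> replicaDirs) {
            this.leader = leader;
            this.isr = new ArrayList<>(isr);
            this.replicaDirs = replicaDirs;
        }
    }

    // Applies "broker brokerId is alive but lost directory dirId": the broker leaves the
    // ISR of every partition whose replica lived there, and if it was the leader, the
    // first remaining ISR member takes over (a simplification of real leader election).
    // Returns the affected partition names in sorted order.
    static List<String> handleOfflineDir(Map<String, PartitionState> partitions, int brokerId, String dirId) {
        List<String> changed = new ArrayList<>();
        for (Map.Entry<String, PartitionState> e : new TreeMap<>(partitions).entrySet()) {
            PartitionState p = e.getValue();
            if (!dirId.equals(p.replicaDirs.get(brokerId))) {
                continue; // this broker's replica is on a healthy directory (or absent)
            }
            p.isr.remove(Integer.valueOf(brokerId));
            if (p.leader == brokerId) {
                p.leader = p.isr.isEmpty() ? -1 : p.isr.get(0);
            }
            changed.add(e.getKey());
        }
        return changed;
    }

    public static void main(String[] args) {
        // Directory placement for broker 3000 follows the report; leaders and ISRs are made up.
        Map<String, PartitionState> partitions = new HashMap<>();
        partitions.put("kafka-test-apps-0", new PartitionState(3000, List.of(3000, 1000, 2000),
                Map.of(3000, "data-1", 1000, "data-0", 2000, "data-0")));
        partitions.put("kafka-test-apps-1", new PartitionState(3000, List.of(3000, 1000, 2000),
                Map.of(3000, "data-0", 1000, "data-0", 2000, "data-0")));

        // Broker 3000 reports that data-1 failed while it keeps running.
        System.out.println(handleOfflineDir(partitions, 3000, "data-1"));
        System.out.println(partitions.get("kafka-test-apps-0").leader);
    }
}
```

If no such report reaches the controller, its view keeps broker 3000 as leader of {{kafka-test-apps-0}}, which is the indefinite unavailability described earlier in this comment.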

Let me know if that makes sense.

 

> JBOD support in KRaft does not seem to be gated by the metadata version
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-16606
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16606
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 3.7.0
>            Reporter: Jakub Scholz
>            Priority: Major
>
> JBOD in KRaft should be supported since Kafka 3.7.0. The Kafka 
> [source code|https://github.com/apache/kafka/blob/1b301b30207ed8fca9f0aea5cf940b0353a1abca/server-common/src/main/java/org/apache/kafka/server/common/MetadataVersion.java#L194-L195]
> suggests that it is supported with the metadata version {{3.7-IV2}}. 
> However, it seems to be possible to run a KRaft cluster with JBOD even with 
> older metadata versions such as {{3.6}}. For example, I have a cluster 
> using the {{3.6}} metadata version:
> {code:java}
> bin/kafka-features.sh --bootstrap-server localhost:9092 describe
> Feature: metadata.version       SupportedMinVersion: 3.0-IV1    SupportedMaxVersion: 3.7-IV4    FinalizedVersionLevel: 3.6-IV2  Epoch: 1375 
> {code}
> Yet a KRaft cluster with JBOD seems to run fine:
> {code:java}
> bin/kafka-log-dirs.sh --bootstrap-server localhost:9092 --describe
> Querying brokers for log directories information
> Received log directory information from brokers 2000,3000,1000
> {"brokers":[{"broker":2000,"logDirs":[{"partitions":[{"partition":"__consumer_offsets-13","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-46","size":0,"offsetLag":0,"isFuture":false},{"partition":"kafka-test-apps-0","size":28560,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-9","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-42","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-21","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-17","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-30","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-26","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-5","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-38","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-1","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-34","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-16","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-45","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-12","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-41","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-24","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-20","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-49","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-0","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-29","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-25","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-8","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-37","size":0,"offsetLag":0,"isFuture":false},
{"partition":"__consumer_offsets-4","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-33","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-15","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-48","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-11","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-44","size":407136,"offsetLag":0,"isFuture":false},{"partition":"kafka-test-apps-2","size":28560,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-23","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-19","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-32","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-28","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-7","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-40","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-3","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-36","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-47","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-14","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-43","size":0,"offsetLag":0,"isFuture":false},{"partition":"kafka-test-apps-1","size":114240,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-10","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-22","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-18","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-31","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-27","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-39","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-6","size":0,"o
ffsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-35","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-2","size":0,"offsetLag":0,"isFuture":false}],"error":null,"logDir":"/var/lib/kafka/data-0/kafka-log2000"}]},{"broker":3000,"logDirs":[{"partitions":[{"partition":"__consumer_offsets-48","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-13","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-11","size":0,"offsetLag":0,"isFuture":false},{"partition":"kafka-test-apps-2","size":28560,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-42","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-21","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-17","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-30","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-26","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-40","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-5","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-3","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-36","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-47","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-14","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-12","size":0,"offsetLag":0,"isFuture":false},{"partition":"kafka-test-apps-1","size":114240,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-41","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-10","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-20","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-18","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-0","size":0,"offsetLag":0,"isFuture":fal
se},{"partition":"__consumer_offsets-27","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-39","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-37","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-4","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-2","size":0,"offsetLag":0,"isFuture":false}],"error":null,"logDir":"/var/lib/kafka/data-0/kafka-log3000"},{"partitions":[{"partition":"__consumer_offsets-15","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-46","size":0,"offsetLag":0,"isFuture":false},{"partition":"kafka-test-apps-0","size":28560,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-44","size":407136,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-9","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-23","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-19","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-32","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-28","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-7","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-38","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-1","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-34","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-16","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-45","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-43","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-24","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-22","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-49","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-31","size":0,"
offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-29","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-25","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-8","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-6","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-35","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-33","size":0,"offsetLag":0,"isFuture":false}],"error":null,"logDir":"/var/lib/kafka/data-1/kafka-log3000"}]},{"broker":1000,"logDirs":[{"partitions":[{"partition":"__consumer_offsets-13","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-46","size":0,"offsetLag":0,"isFuture":false},{"partition":"kafka-test-apps-0","size":28560,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-9","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-42","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-21","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-17","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-30","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-26","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-5","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-38","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-1","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-34","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-16","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-45","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-12","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-41","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-24","size":0,"offsetLag":0,"isFuture":fals
e},{"partition":"__consumer_offsets-20","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-49","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-0","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-29","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-25","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-8","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-37","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-4","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-33","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-15","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-48","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-11","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-44","size":407136,"offsetLag":0,"isFuture":false},{"partition":"kafka-test-apps-2","size":28560,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-23","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-19","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-32","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-28","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-7","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-40","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-3","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-36","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-47","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-14","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-43","size":0,"offsetLag":0,"isFuture":false},{"partition":"kafka-test-apps-1","size":114240,
"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-10","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-22","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-18","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-31","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-27","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-39","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-6","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-35","size":0,"offsetLag":0,"isFuture":false},{"partition":"__consumer_offsets-2","size":0,"offsetLag":0,"isFuture":false}],"error":null,"logDir":"/var/lib/kafka/data-0/kafka-log1000"}]}],"version":1}
>  {code}
> Is this expected, or is it a bug? If it is a bug, can it still be fixed, given 
> that there might already be users running Kafka 3.7.0 with JBOD, and fixing the 
> gating of JBOD in a later version would actually break their clusters?
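
To check a cluster for the situation described in the report, the two CLI outputs can be compared directly. A rough sketch, assuming the output formats shown above: {{JbodMetadataVersionCheck}} and its methods are hypothetical helpers, the JSON is scanned with regular expressions rather than a real JSON parser, and {{3.7-IV2}} is used as the threshold because that is the version the report points to in {{MetadataVersion.java}}. A flagged broker is one where JBOD is +allowed+ but, per the comment above, failed-directory handling is not yet enabled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical diagnostic: flags brokers with several log directories while the finalized
// metadata.version is below 3.7-IV2. Regex scanning only; not a general JSON parser.
public class JbodMetadataVersionCheck {

    private static final Pattern MV =
            Pattern.compile("FinalizedVersionLevel:\\s*(\\d+)\\.(\\d+)-IV(\\d+)");
    // Matches either a broker id or a log directory path, in document order.
    private static final Pattern TOKEN =
            Pattern.compile("\"broker\":(\\d+)|\"logDir\":\"([^\"]+)\"");

    // Returns {major, minor, iv} of the finalized metadata.version in kafka-features.sh output.
    static int[] finalizedVersion(String featuresOutput) {
        Matcher m = MV.matcher(featuresOutput);
        if (!m.find()) {
            throw new IllegalArgumentException("no FinalizedVersionLevel found");
        }
        return new int[] {Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)),
                Integer.parseInt(m.group(3))};
    }

    // Lexicographic comparison of {major, minor, iv}: true if v >= 3.7-IV2.
    static boolean atLeast37IV2(int[] v) {
        return Arrays.compare(v, new int[] {3, 7, 2}) >= 0;
    }

    // Broker id -> log directories; relies on each "logDir" following its broker's "broker" key.
    static Map<Integer, List<String>> logDirsPerBroker(String json) {
        Map<Integer, List<String>> result = new TreeMap<>();
        Integer current = null;
        Matcher m = TOKEN.matcher(json);
        while (m.find()) {
            if (m.group(1) != null) {
                current = Integer.parseInt(m.group(1));
                result.putIfAbsent(current, new ArrayList<>());
            } else if (current != null) {
                result.get(current).add(m.group(2));
            }
        }
        return result;
    }

    // Brokers using more than one log dir while metadata.version is below 3.7-IV2.
    static List<Integer> jbodBelowGate(String featuresOutput, String logDirsJson) {
        List<Integer> flagged = new ArrayList<>();
        if (atLeast37IV2(finalizedVersion(featuresOutput))) {
            return flagged;
        }
        logDirsPerBroker(logDirsJson).forEach((broker, dirs) -> {
            if (dirs.size() > 1) {
                flagged.add(broker);
            }
        });
        return flagged;
    }

    public static void main(String[] args) {
        String features = "Feature: metadata.version SupportedMinVersion: 3.0-IV1 "
                + "SupportedMaxVersion: 3.7-IV4 FinalizedVersionLevel: 3.6-IV2 Epoch: 1375";
        // Abbreviated from the kafka-log-dirs.sh output in the report (partition lists emptied).
        String logDirs = "{\"brokers\":[{\"broker\":2000,\"logDirs\":[{\"partitions\":[],\"error\":null,"
                + "\"logDir\":\"/var/lib/kafka/data-0/kafka-log2000\"}]},"
                + "{\"broker\":3000,\"logDirs\":[{\"partitions\":[],\"error\":null,"
                + "\"logDir\":\"/var/lib/kafka/data-0/kafka-log3000\"},{\"partitions\":[],\"error\":null,"
                + "\"logDir\":\"/var/lib/kafka/data-1/kafka-log3000\"}]}],\"version\":1}";
        System.out.println(jbodBelowGate(features, logDirs));
    }
}
```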



--
This message was sent by Atlassian Jira
(v8.20.10#820010)