[jira] [Commented] (KAFKA-15495) KRaft partition truncated when the only ISR member restarts with an empty disk

2023-12-26 Thread Jira


[ 
https://issues.apache.org/jira/browse/KAFKA-15495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800601#comment-17800601
 ] 

José Armando García Sancio commented on KAFKA-15495:


[~rndgstn] , this issue makes it look like this only applies to KRaft clusters. 
This is not accurate. This has been an issue since ISR leader election and 
replication was introduced to Apache Kafka.

> KRaft partition truncated when the only ISR member restarts with an empty disk
> --
>
> Key: KAFKA-15495
> URL: https://issues.apache.org/jira/browse/KAFKA-15495
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.3.2, 3.4.1, 3.6.0, 3.5.1
>Reporter: Ron Dagostino
>Priority: Critical
>
> Assume a topic-partition in KRaft has just a single leader replica in the 
> ISR.  Assume next that this replica goes offline.  This replica's log will 
> define the contents of that partition when the replica restarts, which is 
> correct behavior.  However, assume now that the replica has a disk failure, 
> and we then replace the failed disk with a new, empty disk that we also 
> format with the storage tool so it has the correct cluster ID.  If we then 
> restart the broker, the topic-partition will have no data in it, and any 
> other replicas that might exist will truncate their logs to match, which 
> results in data loss.  See below for a step-by-step demo of how to reproduce 
> this.
> [KIP-858: Handle JBOD broker disk failure in 
> KRaft|https://cwiki.apache.org/confluence/display/KAFKA/KIP-858%3A+Handle+JBOD+broker+disk+failure+in+KRaft]
>  introduces the concept of a Disk UUID that we can use to solve this problem. 
>  Specifically, when the leader restarts with an empty (but 
> correctly-formatted) disk, the actual UUID associated with the disk will be 
> different.  The controller will notice upon broker re-registration that its 
> disk UUID differs from what was previously registered.  Right now we have no 
> way of detecting this situation, but the disk UUID gives us that capability.
> STEPS TO REPRODUCE:
> Create a single broker cluster with single controller.  The standard files 
> under config/kraft work well:
> bin/kafka-storage.sh random-uuid
> J8qXRwI-Qyi2G0guFTiuYw
> #ensure we start clean
> /bin/rm -rf /tmp/kraft-broker-logs /tmp/kraft-controller-logs
> bin/kafka-storage.sh format --cluster-id J8qXRwI-Qyi2G0guFTiuYw --config 
> config/kraft/controller.properties
> bin/kafka-storage.sh format --cluster-id J8qXRwI-Qyi2G0guFTiuYw --config 
> config/kraft/broker.properties
> bin/kafka-server-start.sh config/kraft/controller.properties
> bin/kafka-server-start.sh config/kraft/broker.properties
> bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic foo1 
> --partitions 1 --replication-factor 1
> #create __consumer-offsets topics
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic foo1 
> --from-beginning
> ^C
> #confirm that __consumer_offsets topic partitions are all created and on 
> broker with node id 2
> bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe
> Now create 2 more brokers, with node IDs 11 and 12
> cat config/kraft/broker.properties | sed 's/node.id=2/node.id=11/' | sed 
> 's/localhost:9092/localhost:9011/g' |  sed 
> 's#log.dirs=/tmp/kraft-broker-logs#log.dirs=/tmp/kraft-broker-logs11#' > 
> config/kraft/broker11.properties
> cat config/kraft/broker.properties | sed 's/node.id=2/node.id=12/' | sed 
> 's/localhost:9092/localhost:9012/g' |  sed 
> 's#log.dirs=/tmp/kraft-broker-logs#log.dirs=/tmp/kraft-broker-logs12#' > 
> config/kraft/broker12.properties
> #ensure we start clean
> /bin/rm -rf /tmp/kraft-broker-logs11 /tmp/kraft-broker-logs12
> bin/kafka-storage.sh format --cluster-id J8qXRwI-Qyi2G0guFTiuYw --config 
> config/kraft/broker11.properties
> bin/kafka-storage.sh format --cluster-id J8qXRwI-Qyi2G0guFTiuYw --config 
> config/kraft/broker12.properties
> bin/kafka-server-start.sh config/kraft/broker11.properties
> bin/kafka-server-start.sh config/kraft/broker12.properties
> #create a topic with a single partition replicated on two brokers
> bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic foo2 
> --partitions 1 --replication-factor 2
> #reassign partitions onto brokers with Node IDs 11 and 12
> echo '{"partitions":[{"topic": "foo2","partition": 0,"replicas": [11,12]}], 
> "version":1}' > /tmp/reassign.json
> bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9092 
> --reassignment-json-file /tmp/reassign.json --execute
> bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9092 
> --reassignment-json-file /tmp/reassign.json --verify
> #make preferred leader 11 the actual leader if it not
> bin/kafka-leader-election.sh --bootstrap-server localhost:9092 
> 

[jira] [Commented] (KAFKA-15495) KRaft partition truncated when the only ISR member restarts with an empty disk

2023-09-25 Thread Ron Dagostino (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768787#comment-17768787
 ] 

Ron Dagostino commented on KAFKA-15495:
---

Yes.  The Disk UUID from JBOD is not needed for this because ELR takes care of 
it as described above.  It feels right to have the Disk UUID anyway since it is 
a simple concept that is easy to understand and reason about, and we can 
consider it as part of "defense in depth" -- it doesn't hurt to get multiple 
signals.  Also we will probably need the Disk UUID for Raft, which doesn't use 
ISR, and ELR won't help there.

This ticket will likely get resolved when [Eligible Leader 
Replicas|https://issues.apache.org/jira/browse/KAFKA-15332] gets resolved.

> KRaft partition truncated when the only ISR member restarts with an empty disk
> --
>
> Key: KAFKA-15495
> URL: https://issues.apache.org/jira/browse/KAFKA-15495
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.3.2, 3.4.1, 3.6.0, 3.5.1
>Reporter: Ron Dagostino
>Priority: Critical
>
> Assume a topic-partition in KRaft has just a single leader replica in the 
> ISR.  Assume next that this replica goes offline.  This replica's log will 
> define the contents of that partition when the replica restarts, which is 
> correct behavior.  However, assume now that the replica has a disk failure, 
> and we then replace the failed disk with a new, empty disk that we also 
> format with the storage tool so it has the correct cluster ID.  If we then 
> restart the broker, the topic-partition will have no data in it, and any 
> other replicas that might exist will truncate their logs to match, which 
> results in data loss.  See below for a step-by-step demo of how to reproduce 
> this.
> [KIP-858: Handle JBOD broker disk failure in 
> KRaft|https://cwiki.apache.org/confluence/display/KAFKA/KIP-858%3A+Handle+JBOD+broker+disk+failure+in+KRaft]
>  introduces the concept of a Disk UUID that we can use to solve this problem. 
>  Specifically, when the leader restarts with an empty (but 
> correctly-formatted) disk, the actual UUID associated with the disk will be 
> different.  The controller will notice upon broker re-registration that its 
> disk UUID differs from what was previously registered.  Right now we have no 
> way of detecting this situation, but the disk UUID gives us that capability.
> STEPS TO REPRODUCE:
> Create a single broker cluster with single controller.  The standard files 
> under config/kraft work well:
> bin/kafka-storage.sh random-uuid
> J8qXRwI-Qyi2G0guFTiuYw
> #ensure we start clean
> /bin/rm -rf /tmp/kraft-broker-logs /tmp/kraft-controller-logs
> bin/kafka-storage.sh format --cluster-id J8qXRwI-Qyi2G0guFTiuYw --config 
> config/kraft/controller.properties
> bin/kafka-storage.sh format --cluster-id J8qXRwI-Qyi2G0guFTiuYw --config 
> config/kraft/broker.properties
> bin/kafka-server-start.sh config/kraft/controller.properties
> bin/kafka-server-start.sh config/kraft/broker.properties
> bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic foo1 
> --partitions 1 --replication-factor 1
> #create __consumer-offsets topics
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic foo1 
> --from-beginning
> ^C
> #confirm that __consumer_offsets topic partitions are all created and on 
> broker with node id 2
> bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe
> Now create 2 more brokers, with node IDs 11 and 12
> cat config/kraft/broker.properties | sed 's/node.id=2/node.id=11/' | sed 
> 's/localhost:9092/localhost:9011/g' |  sed 
> 's#log.dirs=/tmp/kraft-broker-logs#log.dirs=/tmp/kraft-broker-logs11#' > 
> config/kraft/broker11.properties
> cat config/kraft/broker.properties | sed 's/node.id=2/node.id=12/' | sed 
> 's/localhost:9092/localhost:9012/g' |  sed 
> 's#log.dirs=/tmp/kraft-broker-logs#log.dirs=/tmp/kraft-broker-logs12#' > 
> config/kraft/broker12.properties
> #ensure we start clean
> /bin/rm -rf /tmp/kraft-broker-logs11 /tmp/kraft-broker-logs12
> bin/kafka-storage.sh format --cluster-id J8qXRwI-Qyi2G0guFTiuYw --config 
> config/kraft/broker11.properties
> bin/kafka-storage.sh format --cluster-id J8qXRwI-Qyi2G0guFTiuYw --config 
> config/kraft/broker12.properties
> bin/kafka-server-start.sh config/kraft/broker11.properties
> bin/kafka-server-start.sh config/kraft/broker12.properties
> #create a topic with a single partition replicated on two brokers
> bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic foo2 
> --partitions 1 --replication-factor 2
> #reassign partitions onto brokers with Node IDs 11 and 12
> echo '{"partitions":[{"topic": "foo2","partition": 0,"replicas": [11,12]}], 
> "version":1}' > /tmp/reassign.json
> bin/kafka-reassign-partitions.sh --bootstrap-server 

[jira] [Commented] (KAFKA-15495) KRaft partition truncated when the only ISR member restarts with an empty disk

2023-09-25 Thread Ron Dagostino (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768674#comment-17768674
 ] 

Ron Dagostino commented on KAFKA-15495:
---

Thanks, Ismael.  Yes, I read [the 
KIP|https://cwiki.apache.org/confluence/display/KAFKA/KIP-966%3A+Eligible+Leader+Replicas]
 and [Jack's blog post about 
it|https://jack-vanlightly.com/blog/2023/8/17/kafka-kip-966-fixing-the-last-replica-standing-issue|]
 last night after I posted this, and I have asked some folks who have been 
involved in that discussion if the broker epoch communicated in the broker 
registration request via the clean shutdown file might also serve to give the 
controller a signal that the disk is new.  I'll comment more here soon.

> KRaft partition truncated when the only ISR member restarts with an empty disk
> --
>
> Key: KAFKA-15495
> URL: https://issues.apache.org/jira/browse/KAFKA-15495
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.3.2, 3.4.1, 3.6.0, 3.5.1
>Reporter: Ron Dagostino
>Priority: Critical
>
> Assume a topic-partition in KRaft has just a single leader replica in the 
> ISR.  Assume next that this replica goes offline.  This replica's log will 
> define the contents of that partition when the replica restarts, which is 
> correct behavior.  However, assume now that the replica has a disk failure, 
> and we then replace the failed disk with a new, empty disk that we also 
> format with the storage tool so it has the correct cluster ID.  If we then 
> restart the broker, the topic-partition will have no data in it, and any 
> other replicas that might exist will truncate their logs to match, which 
> results in data loss.  See below for a step-by-step demo of how to reproduce 
> this.
> [KIP-858: Handle JBOD broker disk failure in 
> KRaft|https://cwiki.apache.org/confluence/display/KAFKA/KIP-858%3A+Handle+JBOD+broker+disk+failure+in+KRaft]
>  introduces the concept of a Disk UUID that we can use to solve this problem. 
>  Specifically, when the leader restarts with an empty (but 
> correctly-formatted) disk, the actual UUID associated with the disk will be 
> different.  The controller will notice upon broker re-registration that its 
> disk UUID differs from what was previously registered.  Right now we have no 
> way of detecting this situation, but the disk UUID gives us that capability.
> STEPS TO REPRODUCE:
> Create a single broker cluster with single controller.  The standard files 
> under config/kraft work well:
> bin/kafka-storage.sh random-uuid
> J8qXRwI-Qyi2G0guFTiuYw
> #ensure we start clean
> /bin/rm -rf /tmp/kraft-broker-logs /tmp/kraft-controller-logs
> bin/kafka-storage.sh format --cluster-id J8qXRwI-Qyi2G0guFTiuYw --config 
> config/kraft/controller.properties
> bin/kafka-storage.sh format --cluster-id J8qXRwI-Qyi2G0guFTiuYw --config 
> config/kraft/broker.properties
> bin/kafka-server-start.sh config/kraft/controller.properties
> bin/kafka-server-start.sh config/kraft/broker.properties
> bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic foo1 
> --partitions 1 --replication-factor 1
> #create __consumer-offsets topics
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic foo1 
> --from-beginning
> ^C
> #confirm that __consumer_offsets topic partitions are all created and on 
> broker with node id 2
> bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe
> Now create 2 more brokers, with node IDs 11 and 12
> cat config/kraft/broker.properties | sed 's/node.id=2/node.id=11/' | sed 
> 's/localhost:9092/localhost:9011/g' |  sed 
> 's#log.dirs=/tmp/kraft-broker-logs#log.dirs=/tmp/kraft-broker-logs11#' > 
> config/kraft/broker11.properties
> cat config/kraft/broker.properties | sed 's/node.id=2/node.id=12/' | sed 
> 's/localhost:9092/localhost:9012/g' |  sed 
> 's#log.dirs=/tmp/kraft-broker-logs#log.dirs=/tmp/kraft-broker-logs12#' > 
> config/kraft/broker12.properties
> #ensure we start clean
> /bin/rm -rf /tmp/kraft-broker-logs11 /tmp/kraft-broker-logs12
> bin/kafka-storage.sh format --cluster-id J8qXRwI-Qyi2G0guFTiuYw --config 
> config/kraft/broker11.properties
> bin/kafka-storage.sh format --cluster-id J8qXRwI-Qyi2G0guFTiuYw --config 
> config/kraft/broker12.properties
> bin/kafka-server-start.sh config/kraft/broker11.properties
> bin/kafka-server-start.sh config/kraft/broker12.properties
> #create a topic with a single partition replicated on two brokers
> bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic foo2 
> --partitions 1 --replication-factor 2
> #reassign partitions onto brokers with Node IDs 11 and 12
> echo '{"partitions":[{"topic": "foo2","partition": 0,"replicas": [11,12]}], 
> "version":1}' > /tmp/reassign.json
> bin/kafka-reassign-partitions.sh --bootstrap-server 

[jira] [Commented] (KAFKA-15495) KRaft partition truncated when the only ISR member restarts with an empty disk

2023-09-24 Thread Ismael Juma (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768472#comment-17768472
 ] 

Ismael Juma commented on KAFKA-15495:
-

Please check KIP-966 - it talks about this problem.

> KRaft partition truncated when the only ISR member restarts with an empty disk
> --
>
> Key: KAFKA-15495
> URL: https://issues.apache.org/jira/browse/KAFKA-15495
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.3.2, 3.4.1, 3.6.0, 3.5.1
>Reporter: Ron Dagostino
>Priority: Critical
>
> Assume a topic-partition in KRaft has just a single leader replica in the 
> ISR.  Assume next that this replica goes offline.  This replica's log will 
> define the contents of that partition when the replica restarts, which is 
> correct behavior.  However, assume now that the replica has a disk failure, 
> and we then replace the failed disk with a new, empty disk that we also 
> format with the storage tool so it has the correct cluster ID.  If we then 
> restart the broker, the topic-partition will have no data in it, and any 
> other replicas that might exist will truncate their logs to match, which 
> results in data loss.  See below for a step-by-step demo of how to reproduce 
> this.
> [KIP-858: Handle JBOD broker disk failure in 
> KRaft|https://cwiki.apache.org/confluence/display/KAFKA/KIP-858%3A+Handle+JBOD+broker+disk+failure+in+KRaft]
>  introduces the concept of a Disk UUID that we can use to solve this problem. 
>  Specifically, when the leader restarts with an empty (but 
> correctly-formatted) disk, the actual UUID associated with the disk will be 
> different.  The controller will notice upon broker re-registration that its 
> disk UUID differs from what was previously registered.  Right now we have no 
> way of detecting this situation, but the disk UUID gives us that capability.
> STEPS TO REPRODUCE:
> Create a single broker cluster with single controller.  The standard files 
> under config/kraft work well:
> bin/kafka-storage.sh random-uuid
> J8qXRwI-Qyi2G0guFTiuYw
> #ensure we start clean
> /bin/rm -rf /tmp/kraft-broker-logs /tmp/kraft-controller-logs
> bin/kafka-storage.sh format --cluster-id J8qXRwI-Qyi2G0guFTiuYw --config 
> config/kraft/controller.properties
> bin/kafka-storage.sh format --cluster-id J8qXRwI-Qyi2G0guFTiuYw --config 
> config/kraft/broker.properties
> bin/kafka-server-start.sh config/kraft/controller.properties
> bin/kafka-server-start.sh config/kraft/broker.properties
> bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic foo1 
> --partitions 1 --replication-factor 1
> #create __consumer-offsets topics
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic foo1 
> --from-beginning
> ^C
> #confirm that __consumer_offsets topic partitions are all created and on 
> broker with node id 2
> bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe
> Now create 2 more brokers, with node IDs 11 and 12
> cat config/kraft/broker.properties | sed 's/node.id=2/node.id=11/' | sed 
> 's/localhost:9092/localhost:9011/g' |  sed 
> 's#log.dirs=/tmp/kraft-broker-logs#log.dirs=/tmp/kraft-broker-logs11#' > 
> config/kraft/broker11.properties
> cat config/kraft/broker.properties | sed 's/node.id=2/node.id=12/' | sed 
> 's/localhost:9092/localhost:9012/g' |  sed 
> 's#log.dirs=/tmp/kraft-broker-logs#log.dirs=/tmp/kraft-broker-logs12#' > 
> config/kraft/broker12.properties
> #ensure we start clean
> /bin/rm -rf /tmp/kraft-broker-logs11 /tmp/kraft-broker-logs12
> bin/kafka-storage.sh format --cluster-id J8qXRwI-Qyi2G0guFTiuYw --config 
> config/kraft/broker11.properties
> bin/kafka-storage.sh format --cluster-id J8qXRwI-Qyi2G0guFTiuYw --config 
> config/kraft/broker12.properties
> bin/kafka-server-start.sh config/kraft/broker11.properties
> bin/kafka-server-start.sh config/kraft/broker12.properties
> #create a topic with a single partition replicated on two brokers
> bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic foo2 
> --partitions 1 --replication-factor 2
> #reassign partitions onto brokers with Node IDs 11 and 12
> echo '{"partitions":[{"topic": "foo2","partition": 0,"replicas": [11,12]}], 
> "version":1}' > /tmp/reassign.json
> bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9092 
> --reassignment-json-file /tmp/reassign.json --execute
> bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9092 
> --reassignment-json-file /tmp/reassign.json --verify
> #make preferred leader 11 the actual leader if it not
> bin/kafka-leader-election.sh --bootstrap-server localhost:9092 
> --all-topic-partitions --election-type preferred
> #Confirm both brokers are in ISR and 11 is the leader
> bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic foo2
> Topic: