[jira] [Updated] (KAFKA-16281) Possible IllegalState with KIP-996

2024-02-20 Thread Jack Vanlightly (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jack Vanlightly updated KAFKA-16281:

Description: 
I have a TLA+ model of KIP-996 (pre-vote) and I have identified an IllegalState 
exception that would occur with the existing MaybeHandleCommonResponse behavior.

The issue stems from the fact that a leader, let's call it r1, can resign 
(either due to a restart or check quorum) and then later initiate a pre-vote 
where it ends up in the same epoch as before. When r1 receives a response from 
r2 who believes that r1 is still the leader, the logic in 
MaybeHandleCommonResponse tries to transition r1 to follower of itself, causing 
an IllegalState exception to be raised.

This is an example history:
 # r1 is the leader in epoch 1.
 # r1 resigns because of a failed check quorum, or restarts and resigns.
 # r1 experiences an election timeout and transitions to Prospective.
 # r1 sends a pre-vote request to its peers.
 # r2 thinks r1 is still the leader, sends a vote response, not granting its 
vote and setting leaderId=r1 and epoch=1.
 # r1 receives the vote response and executes MaybeHandleCommonResponse which 
tries to transition r1 to Follower of itself, and an IllegalState exception is 
raised.

The relevant else if statement in MaybeHandleCommonResponse is here: 
[https://github.com/apache/kafka/blob/a26a1d847f1884a519561e7a4fb4cd13e051c824/raft/src/main/java/org/apache/kafka/raft/KafkaRaftClient.java#L1538]

In the TLA+ specification, I fixed this issue by adding a fourth condition to 
this statement: the replica must not be in the Prospective state. 
[https://github.com/Vanlightly/kafka-tlaplus/blob/9b2600d1cd5c65930d666b12792d47362b64c015/kraft/kip_996/kraft_kip_996_functions.tla#L336|https://github.com/Vanlightly/kafka-tlaplus/blob/421f170ba4bd8c5eceb36b88b47901ee3d9c3d2a/kraft/kip_996/kraft_kip_996_functions.tla#L336]
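
The history and the extra condition can be sketched as a toy Java model. This is not the KafkaRaftClient code: the class, field and method names below are hypothetical, and the three pre-existing conditions (same epoch, response names a leader, no locally known leader) are an assumption about what the linked else-if checks.

```java
import java.util.OptionalInt;

/** Toy model of the branch discussed above; all names here are hypothetical. */
final class PreVoteGuardSketch {
    enum State { RESIGNED, PROSPECTIVE, FOLLOWER }

    final int localId;
    int epoch;
    State state;
    OptionalInt leaderId = OptionalInt.empty(); // no locally known leader

    PreVoteGuardSketch(int localId, int epoch, State state) {
        this.localId = localId;
        this.epoch = epoch;
        this.state = state;
    }

    /** A replica can never follow itself, so this mirrors the IllegalState case. */
    void transitionToFollower(int newEpoch, int newLeaderId) {
        if (newLeaderId == localId) {
            throw new IllegalStateException(
                "Replica " + localId + " cannot become a follower of itself");
        }
        epoch = newEpoch;
        leaderId = OptionalInt.of(newLeaderId);
        state = State.FOLLOWER;
    }

    /**
     * Assumed shape of the else-if: same epoch, response carries a leader,
     * and no leader is known locally. guardProspective switches on the
     * fourth condition (replica must not be Prospective).
     * Returns true if a transition to Follower happened.
     */
    boolean maybeHandleCommonResponse(int responseEpoch,
                                      OptionalInt responseLeaderId,
                                      boolean guardProspective) {
        if (responseEpoch == epoch
                && responseLeaderId.isPresent()
                && !leaderId.isPresent()
                && (!guardProspective || state != State.PROSPECTIVE)) {
            transitionToFollower(responseEpoch, responseLeaderId.getAsInt());
            return true;
        }
        return false;
    }

    /** Replays the last step of the history; returns true if the exception is raised. */
    static boolean replayHistory(boolean guardProspective) {
        // r1 (id 1) is Prospective in epoch 1 with no local leader id.
        PreVoteGuardSketch r1 = new PreVoteGuardSketch(1, 1, State.PROSPECTIVE);
        try {
            // r2 rejects the pre-vote and reports leaderId=r1, epoch=1.
            r1.maybeHandleCommonResponse(1, OptionalInt.of(1), guardProspective);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("without guard, exception raised: " + replayHistory(false));
        System.out.println("with guard, exception raised: " + replayHistory(true));
    }
}
```

With the guard enabled, r1 simply ignores the stale leader hint and stays Prospective, which is the behavior the TLA+ fix models.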

 

Note that I also had to implement the sending of the BeginQuorumEpoch request 
by the leader to prevent a replica getting stuck in Prospective. If replica r2 
experiences an election timeout due to a transient connectivity issue with the 
leader, and has also fallen behind slightly, then r2 will remain stuck as a 
Prospective because none of its peers, who have connectivity to the leader, 
will grant it a pre-vote. To enable r2 to become a functional member again, the 
leader must give it a nudge with a BeginQuorumEpoch request. The alternative 
(which I have also modeled) is for a Prospective to transition to Follower when 
it receives a negative pre-vote response with a non-null leaderId. This comes 
with a separate liveness issue which I can discuss if this "transition to 
Follower" approach is interesting. Either way, a stuck Prospective needs a way 
to transition to Follower eventually if all other members have a stable leader.
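
A minimal Java sketch of the stuck Prospective and the BeginQuorumEpoch nudge follows. All names are hypothetical, and the peer-side grant rule (reject when the peer has an active leader or the requester's log is behind) is a simplifying assumption drawn from the paragraph above, not the KIP-996 text or the TLA+ specification.

```java
/** Toy model of a Prospective that only recovers via BeginQuorumEpoch. */
final class StuckProspectiveSketch {
    enum State { PROSPECTIVE, FOLLOWER }

    final int localId;
    int epoch;
    Integer leaderId; // null while no leader is known
    State state = State.PROSPECTIVE;

    StuckProspectiveSketch(int localId, int epoch) {
        this.localId = localId;
        this.epoch = epoch;
    }

    /** Assumed peer-side rule: no grant while the peer has a leader or is ahead. */
    static boolean grantsPreVote(boolean peerHasActiveLeader,
                                 long prospectiveLogEnd,
                                 long peerLogEnd) {
        return !peerHasActiveLeader && prospectiveLogEnd >= peerLogEnd;
    }

    /** The leader's nudge: a Prospective accepts it and becomes a Follower. */
    void onBeginQuorumEpoch(int leaderEpoch, int leader) {
        if (leaderEpoch >= epoch && leader != localId) {
            epoch = leaderEpoch;
            leaderId = leader;
            state = State.FOLLOWER;
        }
    }

    /** r2 retries pre-votes; only the optional nudge from r1 gets it unstuck. */
    static State simulate(int preVoteRounds, boolean leaderSendsBeginQuorumEpoch) {
        StuckProspectiveSketch r2 = new StuckProspectiveSketch(2, 1);
        for (int round = 0; round < preVoteRounds; round++) {
            // The peer still has leader r1 and is slightly ahead of r2 (100 vs 90).
            if (grantsPreVote(true, 90L, 100L)) {
                throw new IllegalStateException("unexpected grant in this scenario");
            }
        }
        if (leaderSendsBeginQuorumEpoch) {
            r2.onBeginQuorumEpoch(1, 1);
        }
        return r2.state;
    }

    public static void main(String[] args) {
        System.out.println("no nudge: " + simulate(5, false));
        System.out.println("with nudge: " + simulate(5, true));
    }
}
```

However many pre-vote rounds are run, r2 stays Prospective in this model unless the leader sends BeginQuorumEpoch.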

 

  was:
I have a TLA+ model of KIP-966 and I have identified an IllegalState exception 
that would occur with the existing MaybeHandleCommonResponse behavior.

The issue stems from the fact that a leader, let's call it r1, can resign 
(either due to a restart or check quorum) and then later initiate a pre-vote 
where it ends up in the same epoch as before, but a cleared local leader id. 
When r1 transitions to Prospective it clears its local leader id. When r1 
receives a response from r2 who believes that r1 is still the leader, the logic 
in MaybeHandleCommonResponse tries to transition r1 to follower of itself, 
causing an IllegalState exception to be raised.

This is an example history:
 # r1 is the leader in epoch 1.
 # r1 quorum resigns, or restarts and resigns.
 # r1 experiences an election timeout and transitions to Prospective clearing 
its local leader id.
 # r1 sends a pre vote request to its peers.
 # r2 thinks r1 is still the leader, sends a vote response, not granting its 
vote and setting leaderId=r1 and epoch=1.
 # r1 receives the vote response and executes MaybeHandleCommonResponse which 
tries to transition r1 to Follower of itself and an illegal state occurs.

The relevant else if statement in MaybeHandleCommonResponse is here: 
https://github.com/apache/kafka/blob/a26a1d847f1884a519561e7a4fb4cd13e051c824/raft/src/main/java/org/apache/kafka/raft/KafkaRaftClient.java#L1538

In the TLA+ specification, I fixed this issue by adding a fourth condition to 
this statement, that the leaderId also does not equal this server's id. 
[https://github.com/Vanlightly/kafka-tlaplus/blob/9b2600d1cd5c65930d666b12792d47362b64c015/kraft/kip_996/kraft_kip_996_functions.tla#L336]

We should probably create a test to confirm the issue first and then look at 
using the fix I made in the TLA+, though there may be other options.


> Possible IllegalState with KIP-996
> --
>
> Key: KAFKA-16281
> URL: https://issues.apache.org/jira/browse/KAFKA-16281
> Project: Kafka
>  Issue Type: Task
>

[jira] [Updated] (KAFKA-16281) Possible IllegalState with KIP-996

2024-02-20 Thread Jack Vanlightly (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jack Vanlightly updated KAFKA-16281:

Summary: Possible IllegalState with KIP-996  (was: Probable IllegalState 
possible with KIP-966)

> Possible IllegalState with KIP-996
> --
>
> Key: KAFKA-16281
> URL: https://issues.apache.org/jira/browse/KAFKA-16281
> Project: Kafka
>  Issue Type: Task
>  Components: kraft
>Reporter: Jack Vanlightly
>Priority: Major
>
> I have a TLA+ model of KIP-966 and I have identified an IllegalState 
> exception that would occur with the existing MaybeHandleCommonResponse 
> behavior.
> The issue stems from the fact that a leader, let's call it r1, can resign 
> (either due to a restart or check quorum) and then later initiate a pre-vote 
> where it ends up in the same epoch as before, but a cleared local leader id. 
> When r1 transitions to Prospective it clears its local leader id. When r1 
> receives a response from r2 who believes that r1 is still the leader, the 
> logic in MaybeHandleCommonResponse tries to transition r1 to follower of 
> itself, causing an IllegalState exception to be raised.
> This is an example history:
>  # r1 is the leader in epoch 1.
>  # r1 quorum resigns, or restarts and resigns.
>  # r1 experiences an election timeout and transitions to Prospective clearing 
> its local leader id.
>  # r1 sends a pre vote request to its peers.
>  # r2 thinks r1 is still the leader, sends a vote response, not granting its 
> vote and setting leaderId=r1 and epoch=1.
>  # r1 receives the vote response and executes MaybeHandleCommonResponse which 
> tries to transition r1 to Follower of itself and an illegal state occurs.
> The relevant else if statement in MaybeHandleCommonResponse is here: 
> https://github.com/apache/kafka/blob/a26a1d847f1884a519561e7a4fb4cd13e051c824/raft/src/main/java/org/apache/kafka/raft/KafkaRaftClient.java#L1538
> In the TLA+ specification, I fixed this issue by adding a fourth condition to 
> this statement, that the leaderId also does not equal this server's id. 
> [https://github.com/Vanlightly/kafka-tlaplus/blob/9b2600d1cd5c65930d666b12792d47362b64c015/kraft/kip_996/kraft_kip_996_functions.tla#L336]
> We should probably create a test to confirm the issue first and then look at 
> using the fix I made in the TLA+, though there may be other options.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16281) Probable IllegalState possible with KIP-966

2024-02-20 Thread Jack Vanlightly (Jira)
Jack Vanlightly created KAFKA-16281:
---

 Summary: Probable IllegalState possible with KIP-966
 Key: KAFKA-16281
 URL: https://issues.apache.org/jira/browse/KAFKA-16281
 Project: Kafka
  Issue Type: Task
  Components: kraft
Reporter: Jack Vanlightly


I have a TLA+ model of KIP-966 and I have identified an IllegalState exception 
that would occur with the existing MaybeHandleCommonResponse behavior.

The issue stems from the fact that a leader, let's call it r1, can resign 
(either due to a restart or check quorum) and then later initiate a pre-vote 
where it ends up in the same epoch as before, but a cleared local leader id. 
When r1 transitions to Prospective it clears its local leader id. When r1 
receives a response from r2 who believes that r1 is still the leader, the logic 
in MaybeHandleCommonResponse tries to transition r1 to follower of itself, 
causing an IllegalState exception to be raised.

This is an example history:
 # r1 is the leader in epoch 1.
 # r1 quorum resigns, or restarts and resigns.
 # r1 experiences an election timeout and transitions to Prospective clearing 
its local leader id.
 # r1 sends a pre vote request to its peers.
 # r2 thinks r1 is still the leader, sends a vote response, not granting its 
vote and setting leaderId=r1 and epoch=1.
 # r1 receives the vote response and executes MaybeHandleCommonResponse which 
tries to transition r1 to Follower of itself and an illegal state occurs.

The relevant else if statement in MaybeHandleCommonResponse is here: 
https://github.com/apache/kafka/blob/a26a1d847f1884a519561e7a4fb4cd13e051c824/raft/src/main/java/org/apache/kafka/raft/KafkaRaftClient.java#L1538

In the TLA+ specification, I fixed this issue by adding a fourth condition to 
this statement, that the leaderId also does not equal this server's id. 
[https://github.com/Vanlightly/kafka-tlaplus/blob/9b2600d1cd5c65930d666b12792d47362b64c015/kraft/kip_996/kraft_kip_996_functions.tla#L336]

We should probably create a test to confirm the issue first and then look at 
using the fix I made in the TLA+, though there may be other options.





[jira] [Commented] (KAFKA-13872) Partitions are truncated when leader is replaced

2023-08-31 Thread Jack Vanlightly (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17760945#comment-17760945
 ] 

Jack Vanlightly commented on KAFKA-13872:
-

This will be fixed by KIP-966. 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-966%3A+Eligible+Leader+Replicas]

> Partitions are truncated when leader is replaced
> 
>
> Key: KAFKA-13872
> URL: https://issues.apache.org/jira/browse/KAFKA-13872
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Francois Visconte
>Priority: Major
> Attachments: extract-2022-05-04T15_50_34.110Z.csv
>
>
> Sample setup:
>  * a topic with one partition and RF=3
>  * a producer using acks=1
>  * min.insync.replicas set to 1
>  * 3 brokers 0,1,2
>  * Preferred leader of the partition is brokerId 0
>  
> Steps to reproduce the issue
>  * Producer keeps producing to the partition, leader is brokerId=0
>  * At some point, replicas 1 and 2 fall behind and are removed from the ISR
>  * The leader broker 0 has a hardware failure
>  * Partition transitions to offline
>  * This leader is replaced with a new broker with an empty disk and the same 
> broker id 0
>  * Partition transitions from offline to online with leader 0, ISR = 0
>  * Followers see the leader offset is 0 and decide to truncate their 
> partitions to 0, ISR=0,1,2
>  * At this point all the topic data has been removed from all replicas and 
> partition size drops to 0 on all replicas
> Attached some of the relevant logs. I can provide more logs if necessary
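
The last reproduction steps can be illustrated with a deliberately simplified Java sketch. It assumes a follower truncates to the leader's log end offset whenever its own log is longer; real brokers reconcile logs using leader epochs, so this only models the outcome reported above. The class name, method names and offsets are hypothetical.

```java
import java.util.Arrays;

/** Toy model of followers truncating to an empty replacement leader. */
final class TruncationSketch {

    /** Assumed rule: a follower never keeps more log than the leader has. */
    static long truncateToLeader(long followerLogEnd, long leaderLogEnd) {
        return Math.min(followerLogEnd, leaderLogEnd);
    }

    /** The replacement leader has an empty disk, so its log end offset is 0. */
    static long[] replaceLeaderWithEmptyDisk(long[] followerLogEnds) {
        long newLeaderLogEnd = 0L;
        long[] result = new long[followerLogEnds.length];
        for (int i = 0; i < followerLogEnds.length; i++) {
            result[i] = truncateToLeader(followerLogEnds[i], newLeaderLogEnd);
        }
        return result;
    }

    public static void main(String[] args) {
        // Replicas 1 and 2 had fallen behind but still held data.
        long[] after = replaceLeaderWithEmptyDisk(new long[] {900L, 950L});
        System.out.println(Arrays.toString(after));
    }
}
```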





[jira] [Commented] (KAFKA-13872) Partitions are truncated when leader is replaced

2022-05-05 Thread Jack Vanlightly (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17532122#comment-17532122
 ] 

Jack Vanlightly commented on KAFKA-13872:
-

I presume you would need to perform a broker decommissioning process to remove 
that broker from the cluster before adding a new empty broker with the same id?

Is there documentation for how to decommission a dead broker safely?




--
This message was sent by Atlassian Jira
(v8.20.7#820007)