Thanks for the feedback, Jason. See my comments below.

On Mon, Feb 5, 2024 at 3:11 PM Jason Gustafson
<ja...@confluent.io.invalid> wrote:
> 1. When adding a voter, the KIP proposes to return a timeout error if the
> voter cannot catch up in time. It might be more useful to return a more
> specific error so that an operator can understand why the timeout occurred.
> Alternatively, perhaps we could keep the generic error but add an
> ErrorMessage field in the response to explain the issue. (At the same time,
> perhaps we can add ErrorMessage to the other new RPCs. I think this is
> becoming more or less standard in Kafka RPCs.)

I added the ErrorMessage field to the 3 admin client RPCs: AddVoter,
RemoveVoter and DescribeQuorum. I didn't add it to the other, internal
RPCs since I didn't think it would be useful in practice. Let me know
if you think we should add that field to the internal RPCs as well.
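
To illustrate, here is a rough sketch of how the timeout case from
your question 2 could populate the new field. Kafka generates these
response classes from the RPC JSON schemas; the exact names here are
my assumption:

    // Rough sketch; assumes a generated RemoveVoterResponseData class
    // and the existing Errors enum. Exact names may differ.
    RemoveVoterResponseData data = new RemoveVoterResponseData()
        .setErrorCode(Errors.REQUEST_TIMED_OUT.code())
        .setErrorMessage(
            "Timed out waiting for the new voter set to commit " +
            "using the majority of the new configuration");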

> 2. Say that the voters are A, B, and C, and that C is offline. What error
> would be returned from RemoveVoters if we try to remove B before C is back
> online?

In this example, this step of the RemoveVoter handling would never succeed:
"5. Wait for the VotersRecord to commit using the majority of new
configuration."

I am planning to return a REQUEST_TIMED_OUT error for this case. A
future RemoveVoter request for the same replica id and uuid would also
return REQUEST_TIMED_OUT if the VotersRecord stays uncommitted,
because of:
"1. Wait until there are no uncommitted VotersRecord. Note that the
implementation may just return a REQUEST_TIMED_OUT error if there are
pending operations."

Note that this happens after the leader (A or B) wrote the
VotersRecord that removed B, so the cluster is unavailable. If C
doesn't come back online, eventually "check quorum" will fail and the
leader (A or B) will resign.
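
To make the arithmetic concrete, here is a small, self-contained
sketch (the names are illustrative, not from the implementation):

    import java.util.Set;

    public class RemoveVoterExample {
        public static void main(String[] args) {
            Set<String> newVoters = Set.of("A", "C"); // record removed B
            Set<String> online = Set.of("A", "B");    // C is offline

            // Committing the VotersRecord requires a majority of the
            // NEW configuration: 2 of {A, C}. Only A is online, so the
            // record never commits and RemoveVoter eventually returns
            // REQUEST_TIMED_OUT.
            int majority = newVoters.size() / 2 + 1;
            long reachable = newVoters.stream()
                .filter(online::contains)
                .count();
            System.out.printf("need %d acks, can get %d%n",
                majority, reachable);
        }
    }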

This is also an example of the HWM decreasing as a result of removing a voter.

We could add a heuristic to lower the impact of this example, but I am
not sure it is worth the implementation cost at the moment. For
example, before accepting the removal, the leader could check the LEO
and lastFetchTimestamp of the replicas that would remain in the voter
set (C in this example), along the lines of the sketch below. It is
also not obvious which replicas such a check should cover in a bigger
voter set.
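
A rough sketch of that heuristic, assuming a hypothetical VoterState
that tracks each replica's last fetch time:

    // Hypothetical sketch: reject the removal up front if any voter
    // that would remain in the new configuration has not fetched
    // within the fetch timeout (C in this example).
    record VoterState(String id, long lastFetchTimestampMs) {}

    static boolean safeToRemove(java.util.List<VoterState> remaining,
                                long nowMs, long fetchTimeoutMs) {
        return remaining.stream().allMatch(
            v -> nowMs - v.lastFetchTimestampMs() <= fetchTimeoutMs);
    }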

> 3. There is a note about a voter accepting a Vote request from a voter that
> is not in their voter set. I guess the same comment applies to
> BeginQuorumEpoch?

Yes. The other extension to the BeginQuorumEpoch request is that the
leader will resend the request (even if it was previously
acknowledged) if a voter fails to send a Fetch request within the
"check quorum" timeout. I had to change this because BeginQuorumEpoch
also includes the leader's endpoint information, yet the receiving
replica doesn't persist the leader endpoint information in the
quorum-state file like it does for the leader id, uuid and epoch.
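
In pseudo-Java, the resend condition is roughly this (illustrative
names, not the actual implementation):

    // On the leader: resend BeginQuorumEpoch (which carries the
    // leader's endpoints) to any voter that has not fetched within the
    // check quorum timeout, even if that voter already acknowledged an
    // earlier BeginQuorumEpoch for this epoch.
    for (VoterState voter : voters) {
        if (nowMs - voter.lastFetchTimestampMs() >= checkQuorumTimeoutMs) {
            sendBeginQuorumEpoch(voter);
        }
    }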

> In other words, a voter should accept a new leader even
> if it is not in its local voter set.

Yes. There are two cases for this:
1. A voter has become leader before another replica (observer or
voter) has replicated the voter set.
2. The active leader is the replica that is getting removed from the
voter set.

> The note in the reference says: "Thus,
> servers process incoming RPC requests without consulting their current
> configurations." On the other hand, when it comes to election, voters will
> become candidates based on the latest voter set from the log (regardless
> whether it is committed), and they will seek out votes only from the same
> voter set. Is that about right?

Yes, that's correct. When transitioning to prospective (pre-vote KIP),
when transitioning to candidate, or when deciding to send RPCs like
Vote (and BeginQuorumEpoch), the replica will do so based on the
latest voter set (VotersRecord) in the log.
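
Sketched with illustrative names (the accessor is my assumption, not
the real API), the decision looks like:

    // The replica reads the latest VotersRecord in its log, committed
    // or not, and uses that set both to decide whether it can become a
    // candidate and to pick the targets for Vote requests.
    Set<Integer> voters = log.latestVoterSet(); // illustrative accessor
    if (voters.contains(localId)) {
        transitionToCandidate();
        voters.forEach(this::sendVoteRequest);
    }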

Thanks
--
-José
