From Jun -

> 10. "If a server happens to receive multiple VoteResponses from another
> server for a particular VoteRequest, it can take the first and ignore the
> rest.": Could you explain why a server would receive multiple responses for
> the same request?
>
This was meant to be a catch-all for network flakiness and weirdness; it
wouldn't be expected in the general case.
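
A minimal Python sketch of the "take the first and ignore the rest" rule.
This is purely illustrative and not the actual KRaft code; `VoteTally` and
its methods are hypothetical names:

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class VoteTally:
    """Responses collected for one outstanding VoteRequest round (hypothetical helper)."""

    responses: Dict[int, bool] = field(default_factory=dict)

    def record(self, voter_id: int, vote_granted: bool) -> bool:
        """Record a response; return False if this voter already answered."""
        if voter_id in self.responses:
            # Duplicate (e.g. a network retry): keep the first response, drop this one.
            return False
        self.responses[voter_id] = vote_granted
        return True

    def granted_count(self) -> int:
        """Number of distinct voters whose first response granted the vote."""
        return sum(1 for granted in self.responses.values() if granted)
```

Keying the tally by voter id means a late or repeated response cannot flip
a vote that has already been counted.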


> 11. "e.g. S1 in the below diagram pg. 41)": What is pg. 41?

Of the Raft paper. I've made the language clearer now:

(e.g. S1 in the below diagram, pg. 41 of Raft paper
<https://purl.stanford.edu/qr033xr6097>)

> 12. "if a server attempts to send out a Pre-Vote request while any other
> server in the quorum does not understand it, it will get back an
> UnsupportedVersionException from the network client and knows to default
> back to the old behavior."

> 12.1 Based on ApiVersion, a server knows whether a peer supports PreVote or
> not. If it doesn't, there is no need for the server to send a PreVote
> request only to be rejected, right?

Correct. The server won't actually send the PreVote request; its network
client will skip/abort the request when `latestUsableVersion` throws an
UnsupportedVersionException because the peer does not support PreVote.
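
Roughly, the client-side check behaves like the Python sketch below. It is
a simplified model rather than the real network client: the function
signatures, the `UnsupportedVersionError` stand-in, and the version numbers
are all assumptions made for illustration.

```python
PRE_VOTE_MIN_VERSION = 1  # assumed: lowest Vote API version carrying the PreVote field
CLIENT_MAX_VERSION = 1    # assumed: highest Vote API version this server can speak


class UnsupportedVersionError(Exception):
    """Stand-in for the UnsupportedVersionException mentioned above."""


def latest_usable_version(peer_max_version: int, min_required: int, client_max: int) -> int:
    """Pick the highest version both sides support, or raise if the peer is too old."""
    usable = min(peer_max_version, client_max)
    if usable < min_required:
        raise UnsupportedVersionError(
            f"peer supports up to v{peer_max_version}, need at least v{min_required}"
        )
    return usable


def try_send_pre_vote(peer_max_version: int, sent: list) -> bool:
    """Append the chosen version to `sent` and return True, or return False
    if the request is aborted locally before anything goes on the wire."""
    try:
        version = latest_usable_version(
            peer_max_version, PRE_VOTE_MIN_VERSION, CLIENT_MAX_VERSION
        )
    except UnsupportedVersionError:
        return False
    sent.append(version)
    return True
```

The point is that the failure is raised locally from the peer's advertised
versions, so the older peer never sees a request it cannot parse.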

> 12.2 What happens when some servers understand PreVote while some others
> don't?
>
We would default to the original standard vote behavior. I can be more
explicit about this in the Compatibility section (modified section pasted
below):

We currently use ApiVersions to gate new/newer versions of Raft APIs from
being used before all servers can support them. This is useful in the
upgrade scenario for Pre-Vote: if a server attempts to send out a Pre-Vote
request while any other server in the quorum does not understand it, it
will get back an UnsupportedVersionException from the network client and
knows to default back to the old behavior. Specifically, the server will
transition immediately from Prospective to Candidate state and will send
standard vote requests instead, which can be understood by servers on older
software versions.

Let's take a look at an edge case. Because the network client only checks
the supported version of the peer that we intend to send a request to, we
can imagine a scenario where a server first sends PreVotes to peers which
understand PreVote, and then attempts to send PreVote to a peer which does
not. If the server receives and processes a majority of granted PreVote
responses prior to hitting the UnsupportedVersionException, it can
transition to Candidate state. Otherwise, it will also transition to
Candidate state once it hits the exception, and send standard vote requests
to all servers. Any PreVote responses received while in Candidate state
would be ignored.
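
The edge case can be modelled with a small Python sketch. This is a toy
state machine under stated assumptions, not the KRaft implementation:
`ProspectiveServer` and its method names are hypothetical, and the server is
assumed to count its own vote toward the majority (matching the
"[majority - 1] of VoteResponse" wording elsewhere in the KIP).

```python
from enum import Enum


class State(Enum):
    PROSPECTIVE = "prospective"
    CANDIDATE = "candidate"


class ProspectiveServer:
    def __init__(self, voters, local_id):
        self.voters = set(voters)
        self.local_id = local_id
        self.state = State.PROSPECTIVE
        self.granted = {local_id}  # assumption: the server counts its own vote
        self.use_pre_vote = True

    def majority(self):
        return len(self.voters) // 2 + 1

    def on_unsupported_version(self):
        """A peer cannot understand PreVote: fall back to standard votes."""
        if self.state is State.PROSPECTIVE:
            self.use_pre_vote = False
            self._become_candidate()

    def on_pre_vote_response(self, voter_id, vote_granted):
        if self.state is not State.PROSPECTIVE:
            return  # PreVote responses arriving in Candidate state are ignored
        if vote_granted and voter_id in self.voters:
            self.granted.add(voter_id)
        if len(self.granted) >= self.majority():
            self._become_candidate()

    def _become_candidate(self):
        self.state = State.CANDIDATE
        self.granted = {self.local_id}  # the standard vote starts a fresh tally
```

Either path ends in Candidate state; the only difference is whether the
server got there by winning the Pre-Vote or by giving up on it.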


On Tue, Dec 5, 2023 at 10:10 AM Alyssa Huang <ahu...@confluent.io> wrote:

> Hey folks, thanks for the reviews!
> Addressing them one by one. From Luke -
>
> Some comments:
>> 1. Follower transitions to: Prospective: After expiration of the election
>> timeout
>> -> Is this the fetch timeout, not election timeout?
>>
> Yes, thanks for this catch!
>
>
>> 2. I also agree we don't bump the epoch in prospective state.
>>  A candidate will now send a VoteRequest with the PreVote field set to
>> true
>> and CandidateEpoch set to its [epoch + 1] when its election timeout
>> expires.
>> -> What is "CandidateEpoch"? And I thought you've agreed to not set [epoch
>> + 1] ?
>>
> Forgot to update this section, it now reads
>
> A follower will now transition to Prospective when its fetch timeout
> expires. The Prospective server will send a VoteRequest with the PreVote
> field set to true and ReplicaEpoch set to its current, unbumped epoch.
> If [majority - 1] of VoteResponse grant the vote, the server will
> transition to Candidate and will then bump its epoch up and send a
> VoteRequest with PreVote set to false (which is the original behavior).
>
>
> On Wed, Nov 29, 2023 at 4:53 PM José Armando García Sancio
> <jsan...@confluent.io.invalid> wrote:
>
>> Hi Alyssa,
>>
>> 1. In the schema for VoteRequest and VoteResponse, you are using
>> "boolean" as the type keyword. The correct keyword should be "bool"
>> instead.
>>
>> 2. In the states and state transition table you have the following entry:
>> >  * Candidate transitions to:
>> > *    ...
>> > *    Prospective: After expiration of the election timeout
>>
>> Can you explain the reason a candidate would transition back to
>> prospective? If a voter transitions to the candidate state it is
>> because the voters don't support KIP-996 or the replica was able to
>> win the majority of the votes at some point in the past. Are we
>> concerned that the network partition might have occurred after the
>> replica has become a candidate? If so, I think we should state this
>> explicitly in the KIP.
>>
>> 3. In the proposed section and state transition section, I think it
>> would be helpful to explicitly state that we have an invariant that
>> only the prospective state can transition to the candidate state. This
>> transition to the candidate state from the prospective state can only
>> happen because the replica won the majority of the votes or there is
>> at least one remote voter that doesn't support pre-vote.
>>
>> 4. I am a bit confused by this paragraph
>> > A candidate will now send a VoteRequest with the PreVote field set to
>> > true and CandidateEpoch set to its [epoch + 1] when its election timeout
>> > expires. If [majority - 1] of VoteResponse grant the vote, the candidate
>> > will then bump its epoch up and send a VoteRequest with PreVote set to
>> > false which is our standard vote that will cause state changes for servers
>> > receiving the request.
>>
>> I am assuming that "candidate" refers to the states enumerated on the
>> table above this quote. If so, I think you mean "prospective" for the
>> first candidate.
>>
>> CandidateEpoch should be ReplicaEpoch.
>>
>> [epoch + 1] should just be epoch. I thought we agreed that replicas
>> will always send their current epoch to the remote replicas.
>>
>> 5. I am a bit confused by this bullet section
>> > true if the server receives less than [majority] VoteResponse with
>> > VoteGranted set to false within [election.timeout.ms + a little
>> > randomness] and the first bullet point does not apply
>> >      Explanation for why we don't send a standard vote at this point
>> > is explained in rejected alternatives.
>>
>> Can we explain this case in plain English? I assume that this case is
>> trying to cover the scenario where the election timer expired but the
>> prospective candidate hasn't received enough votes (granted or
>> rejected) to make a decision if it could win an election.
>>
>> 6.
>> > Yes. If a leader is unable to receive fetch responses from a majority
>> > of servers, it can impede followers that are able to communicate with it
>> > from voting in an eligible leader that can communicate with a majority of
>> > the cluster.
>>
>> In general, leaders don't receive fetch responses. They receive FETCH
>> requests. Did you mean "if a leader is able to send FETCH responses to
>> the majority - 1 of the voters, it can impede fetching voters
>> (followers) from granting their vote to prospective candidates. This
>> should stop prospective candidates from getting enough votes to
>> transition to the candidate state and increase their epoch".
>>
>> 7.
>> > Check Quorum ensures a leader steps down if it is unable to receive
>> > fetch responses from a majority of servers.
>>
>> I think you mean "... if it is unable to receive FETCH requests from
>> the majority - 1 of the voters".
>>
>> 8. At the end of the Proposed changes section you have the following:
>> > The logic now looks like the following for servers receiving
>> > VoteRequests with PreVote set to true:
>> >
>> > When servers receive VoteRequests with the PreVote field set to true,
>> > they will respond with VoteGranted set to
>> >
>> > * true if they are not a Follower and the epoch and offsets in the
>> > Pre-Vote request satisfy the same requirements as a standard vote
>> > * false if they are a Follower or the epoch and end offsets in the
>> > Pre-Vote request do not satisfy the requirements
>>
>> This seems to duplicate the same algorithm that was stated earlier in
>> the section.
>>
>> 9. I don't understand this rejected idea: Sending Standard Votes after
>> failure to win Pre-Vote
>>
>> In your example in the "Disruptive server scenarios" voters 4 and 5
>> are partitioned from the majority of the voters. We don't want voters
>> 4 and 5 increasing their epoch and transitioning to the candidate
>> state else they would disrupt the quorum established by voters 1, 2
>> and 3.
>>
>>
>> Thanks,
>> --
>> -José
>>
>
