Hi Justine
1. With the new proposal, in order to let the consumer consume a message
when only 1 replica has committed it to its log, the min ISR has to be set
to 1 (see the sketch after this list).
2. Yes, we will need an unclean recovery if the leader has an unclean
shutdown.
3. If the min ISR config is changed to a larger value, the ISR and ELR will
not be updated. ISR members are always valid no matter how the min ISR
changes. If the ELR is not empty, the HWM can't advance after the min ISR
increase either, so the ELR members are safe to stay.
4. I will highlight the explanation. Thanks.
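
For reference, here is a rough sketch of how a client could opt into that
lower-durability mode (the topic name, bootstrap address, and wrapper class
are placeholders, not part of the KIP):

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    public class MinIsrOneExample {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");

            // Lower the topic's min ISR to 1 so the HWM can advance with a
            // single ISR member, keeping today's acks=1 behavior where a
            // record is consumable once one replica has committed it.
            try (Admin admin = Admin.create(props)) {
                ConfigResource topic =
                    new ConfigResource(ConfigResource.Type.TOPIC, "my-topic");
                AlterConfigOp op = new AlterConfigOp(
                    new ConfigEntry("min.insync.replicas", "1"),
                    AlterConfigOp.OpType.SET);
                admin.incrementalAlterConfigs(Collections.singletonMap(
                    topic, Collections.singletonList(op))).all().get();
            }
            // Producers would then keep acks=1, e.g. props.put("acks", "1").
        }
    }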

On Thu, Aug 31, 2023 at 4:35 PM Justine Olshan <jols...@confluent.io.invalid>
wrote:

> Hey Calvin,
>
> Thanks for the responses. I think I understood most of it, but had a few
> follow-up questions:
>
> 1. For the acks=1 case, I was wondering if there is any way to continue
> with the current behavior (i.e., we only need one ack to produce to the log
> and consider the request complete). My understanding is that we can also
> consume from such topics at that point.
> If users wanted this lower durability, could they set min.insync.replicas
> to 1?
>
> 2. For the case where we elect a leader that was unknowingly offline. Say
> this replica was the only one in ELR. My understanding is that we would
> promote it to ISR and remove it from ELR when it is the leader, but then we
> would remove it from ISR and have no brokers in ISR or ELR. From there we
> would need to do unclean recovery, right?
>
> 3. Did we address the case where the min ISR is dynamically increased?
>
> 4. I think my comment was more about confusion on the KIP. It was not
> clear to me that the section was describing what happens if one part is
> delivered before the other. But I now see the sentence explaining that. I
> think I skipped from "delivery plan" to the bullet points.
>
> Justine
>
> On Thu, Aug 31, 2023 at 4:04 PM Calvin Liu <ca...@confluent.io.invalid>
> wrote:
>
> > Hi Justine
> > Thanks for the questions!
> >   *a. For my understanding, will we block replication? Or just the high
> > watermark advancement?*
> >   - The replication will not be blocked. The followers are free to
> > replicate messages above the HWM. Only HWM advancement is blocked.
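> >
> >   A rough sketch of the new HWM rule (illustrative Java with made-up
> > names, not the actual broker code):
> >
> >     // The HWM may only advance when every ISR member has replicated up
> >     // to the candidate offset AND the ISR meets min.insync.replicas.
> >     static boolean canAdvanceHwm(java.util.Set<Integer> isr,
> >                                  java.util.Map<Integer, Long> logEndOffsets,
> >                                  long candidateHwm, int minIsr) {
> >         if (isr.size() < minIsr) return false; // new requirement
> >         for (int replica : isr) {
> >             if (logEndOffsets.getOrDefault(replica, -1L) < candidateHwm) {
> >                 return false; // this ISR member is not caught up yet
> >             }
> >         }
> >         return true;
> >     }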
> >
> >   b. *Also in the acks=1 case, if folks want to continue the previous
> > behavior, they also need to set min.insync.replicas to 1, correct?*
> >   - If the clients only send acks=1 messages and the min ISR is 2, the
> > HWM behavior will only differ when there is 1 replica in the ISR. In that
> > case, the min ISR does not do much in the current system. It is kind of a
> > trade-off, but we think it is ok.
> >
> >   c. *The KIP seems to suggest that we remove from ELR when we start up
> > again and notice we do not have the clean shutdown file. Is there a
> > chance we have an offline broker in ELR that had an unclean shutdown that
> > we elect as a leader before we get the chance to realize the shutdown was
> > unclean?*
> >   - The controller will only elect an unfenced (online) replica as the
> > leader. If a broker has an unclean shutdown, it has to register with the
> > controller first (where it must declare whether the shutdown was clean or
> > unclean) and then start to serve broker requests. So:
> >      1. If the broker has an unclean shutdown before the controller is
> > aware that the replica is offline, then the broker can become the leader
> > temporarily. But it can't serve any Fetch requests before it registers
> > again, and that's when the controller will re-elect a leader.
> >      2. If the controller knows the replica is offline (missing
> > heartbeats from the broker for a while) before the broker re-registers,
> > the broker can't be elected as a leader.
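> >
> >   To make the ordering concrete, an illustrative sketch of the election
> > filter (made-up names, not the actual controller code):
> >
> >     // Only unfenced (online, registered) replicas are eligible. A broker
> >     // that re-registered after an unclean shutdown has already been
> >     // dropped from the ELR by the time it can serve requests.
> >     static java.util.Optional<Integer> pickLeader(
> >             java.util.List<Integer> candidates,
> >             java.util.Set<Integer> fenced) {
> >         return candidates.stream()
> >                          .filter(b -> !fenced.contains(b))
> >                          .findFirst();
> >     }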
> >
> > d. *Would this be the case for strictly a smaller min ISR?*
> > - Yes, only when we have a smaller min ISR. Once the leader is aware of
> > the min ISR change, the HWM can advance and make the current ELR
> > obsolete. So the controller should clear the ELR if the ISR size >= the
> > new min ISR.
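> >
> >   A rough sketch of that cleanup (illustrative only, made-up names):
> >
> >     // Controller-side cleanup: once the ISR meets the new, smaller min
> >     // ISR, the HWM can advance again, so remaining ELR members are stale.
> >     static void onMinIsrDecrease(java.util.Set<Integer> isr,
> >                                  java.util.Set<Integer> elr,
> >                                  int newMinIsr) {
> >         if (isr.size() >= newMinIsr) {
> >             elr.clear();
> >         }
> >     }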
> >
> > e. *I thought we said the above "Last Leader" behavior can't be
> > maintained with an empty ISR and it should be removed.*
> > - As the KIP is a big one, we have to consider delivering it in phases.
> > If only the Unclean Recovery part is delivered, we do not touch the ISR,
> > so the ISR behavior will be the same as it is today. I am open to the
> > proposal of directly starting unclean recovery if the last leader fails.
> > Let's see if other folks hope to have more if Unclean Recovery is
> > delivered first.
> >
> > On Tue, Aug 29, 2023 at 4:53 PM Justine Olshan
> > <jols...@confluent.io.invalid>
> > wrote:
> >
> > > Hey Calvin,
> > >
> > > Thanks for the KIP. This will close some of the gaps in leader
> > > election! I had a few questions:
> > >
> > > *> High Watermark can only advance if the ISR size is larger than or
> > > equal to min.insync.replicas.*
> > >
> > > For my understanding, will we block replication? Or just the high
> > > watermark advancement?
> > > Also in the acks=1 case, if folks want to continue the previous
> > > behavior, they also need to set min.insync.replicas to 1, correct? It
> > > seems like this change takes some control away from clients when it
> > > comes to durability vs availability.
> > >
> > > *> ELR + ISR size will not be dropped below the min ISR unless the
> > > controller discovers an ELR member has an unclean shutdown.*
> > >
> > > The KIP seems to suggest that we remove from the ELR when we start up
> > > again and notice we do not have the clean shutdown file. Is there a
> > > chance we have an offline broker in the ELR that had an unclean
> > > shutdown that we elect as a leader before we get the chance to realize
> > > the shutdown was unclean?
> > > This seems like it could cause some problems. I may have missed how we
> > > avoid this scenario though.
> > >
> > > *> When updating the config min.insync.replicas, if the new min ISR <=
> > > current ISR, the ELR will be removed.*
> > >
> > > Would this be the case for strictly a smaller min ISR? I suppose if we
> > > increase the min ISR, we can't reason about the ELR. Can we reason
> > > about the high watermark in this case? It seems like we will have a
> > > broker that has fallen out of the ISR and is in neither the ISR nor the
> > > ELR.
> > > (Forgive me if we can't increase the min ISR when the increase would
> > > put us under it.)
> > >
> > > *> Unclean recovery. *
> > >
> > >    - *The unclean leader election will be replaced by the unclean
> > > recovery.*
> > >    - *unclean.leader.election.enable will only be replaced by
> > >    the unclean.recovery.strategy after ELR is delivered.*
> > >    - *As there is no change to the ISR, the "last known leader"
> > >    behavior is maintained.*
> > >
> > > What does "last known leader behavior maintained" mean here? I thought
> > > we said *"The above "Last Leader" behavior can't be maintained with an
> > > empty ISR and it should be removed."* My understanding is that once the
> > > metadata version is updated, we will always take the more thoughtful
> > > unclean election process (i.e., inspect the logs).
> > >
> > > Overall though, the general KIP is pretty solid. Looking at the
> > > rejected alternatives, it looks like a lot was considered, so it's nice
> > > to see the final proposal.
> > >
> > > Justine
> > >
> > > On Mon, Aug 14, 2023 at 8:50 AM Calvin Liu <ca...@confluent.io.invalid>
> > > wrote:
> > >
> > > >    1. Yes, the new protocol requires 2 things to advance the HWM: a)
> > > >    the messages have been replicated to the controller-committed ISR
> > > >    members, and b) the number of ISR members is at least the min ISR.
> > > >    2. With the current protocol, we are not able to select broker 1
> > > >    as the leader. If we first assume the new HWM requirement is in
> > > >    place, then broker 1 is a good candidate to choose. The following
> > > >    part of the KIP (the ELR part) explains a new mechanism that
> > > >    enables us to choose broker 1 (rough sketch after this list).
> > > >    Note: if both the HWM and ELR changes are in place, broker 1 will
> > > >    actually be elected at T3.
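> > > >
> > > >    A rough sketch of the ELR idea (made-up names; the KIP defines
> > > >    the real mechanism):
> > > >
> > > >        // When the ISR would shrink below the min ISR, the departing
> > > >        // member is parked in the ELR. It remains a safe election
> > > >        // candidate because the HWM cannot have advanced past the
> > > >        // records it holds.
> > > >        static void shrinkIsr(java.util.Set<Integer> isr,
> > > >                              java.util.Set<Integer> elr,
> > > >                              int departing, int minIsr) {
> > > >            isr.remove(departing);
> > > >            if (isr.size() < minIsr) {
> > > >                elr.add(departing);
> > > >            }
> > > >        }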
> > > >
> > > >
> > > > On Fri, Aug 11, 2023 at 10:05 AM Jeff Kim <jeff....@confluent.io.invalid>
> > > > wrote:
> > > >
> > > > > Hi Calvin,
> > > > >
> > > > > Thanks for the KIP! I'm still digesting it but I have two
> questions:
> > > > >
> > > > > > In the scenario raised in the motivation section, the server may
> > > > > > receive ack=1 messages during T1 and advance the High Watermark
> > > > > > when the leader is the only one in the ISR.
> > > > >
> > > > > To confirm, the current protocol allows advancing the HWM if all
> > > > > brokers in the ISR append to their logs (in this case only the
> > > > > leader). And we're proposing to advance the HWM only when at least
> > > > > min.insync.replicas brokers replicate. Is this correct?
> > > > >
> > > > > > Then, if we elect broker 1 as the leader at T4, though we can
> > > > > > guarantee the safety of ack=all messages, the High Watermark may
> > > > > > move backward, which causes further impacts on the consumers.
> > > > >
> > > > > How can broker 1 become the leader if it was ineligible in T3? Or
> > > > > are you referring to broker 2?
> > > > >
> > > > > Thanks,
> > > > > Jeff
> > > > >
> > > > > On Thu, Aug 10, 2023 at 6:48 PM Calvin Liu <ca...@confluent.io.invalid>
> > > > > wrote:
> > > > >
> > > > > > Hi everyone,
> > > > > > I'd like to discuss a series of enhancements to the replication
> > > > > > protocol.
> > > > > >
> > > > > > A partition replica can experience local data loss in unclean
> > > > > > shutdown scenarios where unflushed data in the OS page cache is
> > > > > > lost, such as during an availability zone power outage or a
> > > > > > server error. The Kafka replication protocol is designed to
> > > > > > handle these situations by removing such replicas from the ISR
> > > > > > and only re-adding them once they have caught up and therefore
> > > > > > recovered any lost data. This prevents replicas that lost an
> > > > > > arbitrary log suffix, which included committed data, from being
> > > > > > elected leader.
> > > > > > However, there is a "last replica standing" state which, when
> > > > > > combined with a data-loss unclean shutdown event, can turn a
> > > > > > local data loss scenario into a global data loss scenario, i.e.,
> > > > > > committed data can be removed from all replicas. When the last
> > > > > > replica in the ISR experiences an unclean shutdown and loses
> > > > > > committed data, it will be re-elected leader after starting up
> > > > > > again, causing rejoining followers to truncate their logs and
> > > > > > thereby removing the last copies of the committed records which
> > > > > > the leader lost initially.
> > > > > >
> > > > > > The new KIP maximizes this protection and provides MinISR-1
> > > > > > tolerance to data-loss unclean shutdown events.
> > > > > >
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-966%3A+Eligible+Leader+Replicas
