On Mon, Sep 25, 2023, at 15:48, Jun Rao wrote:
> Hi, Calvin,
>
> Thanks for the updated KIP. Made another pass. A few more comments below.
>
> 40. unclean.leader.election.enable.false ->
> unclean.recovery.strategy.Balanced: The Balanced mode could still lead to
> data loss. So, I am wondering if unclean.leader.election.enable.false
> should map to None?
>

Hi Jun,

Thanks for the re-review.

The issue with mapping unclean.leader.election.enable=false to 
unclean.recovery.strategy=None is that it will be a large behavior change for 
existing users. For example, in the scenario where 3 or more brokers lose power 
at the same time, if unclean.recovery.strategy=None, the cluster will never 
recover without manual intervention. I don't think that is really viable for us, 
since users who upgrade will see it as just "broken." In contrast, 
unclean.recovery.strategy=Balanced implements the current behavior (basically).
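
To make the mapping concrete, here is a rough Java-ish sketch of the 
translation I have in mind (the enum and method names are illustrative, not 
from the KIP):

    // Sketch only: illustrative names, not final KIP code.
    enum UncleanRecoveryStrategy { NONE, BALANCED, AGGRESSIVE }

    final class UncleanRecoveryConfig {
        // Translate the deprecated boolean into the new strategy.
        static UncleanRecoveryStrategy fromLegacyConfig(boolean uncleanLeaderElectionEnable) {
            // true keeps the old "always elect an unclean leader" behavior;
            // false maps to BALANCED rather than NONE, so recovery can still
            // happen once both the ISR and the ELR are empty, preserving
            // today's last-resort recovery path for upgrading users.
            return uncleanLeaderElectionEnable
                    ? UncleanRecoveryStrategy.AGGRESSIVE
                    : UncleanRecoveryStrategy.BALANCED;
        }
    }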

We all agree that it's awkward to keep around unclean.leader.election.enable, 
which is why this KIP deprecates it. We should be able to remove it in 4.0 and 
just have unclean.recovery.strategy, which will make things clearer.
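
While both keys exist (pre-4.0), the resolution order we discussed earlier in 
this thread would look roughly like this (again a sketch with illustrative 
names):

    // Sketch only: pre-4.0 resolution of the two keys.
    static UncleanRecoveryStrategy effectiveStrategy(
            UncleanRecoveryStrategy strategyIfSet,  // null when unclean.recovery.strategy is unset
            boolean uncleanLeaderElectionEnable) {
        // An explicitly set unclean.recovery.strategy wins; otherwise we fall
        // back to translating the deprecated boolean (fromLegacyConfig above).
        return strategyIfSet != null
                ? strategyIfSet
                : UncleanRecoveryConfig.fromLegacyConfig(uncleanLeaderElectionEnable);
    }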

Agreed on your other comments -- thanks again for reviewing.

best,
Colin


> 41. unclean.recovery.manager.enabled: I am not sure why we introduce this
> additional config. Is it the same as unclean.recovery.strategy=None?
>
> 42. DescribeTopicResponse.TopicAuthorizedOperations: Should this be at the
> topic level?
>
> 43. "Limit: 20 topics max per request": Could we describe what happens if
> the request includes more than 20 topics?
>
> 44. ElectLeadersRequest.DesiredLeaders: Could we describe whether the
> ordering matters?
>
> 45. GetReplicaLogInfo.TopicPartitions: "about": "The topic partitions to
> elect leaders.": The description in "about" is incorrect.
>
> 46. GetReplicaLogInfoResponse: Should we nest partitions under topicId to
> be consistent with other types of responses?
>
> 47. kafka-leader-election.sh:
> 47.1 Could we explain DESIGNATION?
> 47.2 desiredLeader: Should it be a list to match the field in
> ElectLeadersRequest?
>
> 48. We could add a section on downgrade?
>
> 49. LastKnownLeader: This seems only needed in the first phase of
> delivering ELR. Will it be removed when the complete KIP is delivered?
>
> Thanks,
>
> Jun
>
> On Tue, Sep 19, 2023 at 1:30 PM Colin McCabe <cmcc...@apache.org> wrote:
>
>> Hi Calvin,
>>
>> Thanks for the explanations. I like the idea of using none, balanced, and
>> aggressive. We also had an offline discussion about why it is good to use a
>> new config key (basically, so that in 4.0 we can deprecate the old one,
>> which had only false/true values). With these changes, I am +1.
>>
>> best,
>> Colin
>>
>> On Mon, Sep 18, 2023, at 15:54, Calvin Liu wrote:
>> > Hi Colin,
>> > Also, can we deprecate unclean.leader.election.enable in 4.0? Before
>> > that, we can have both the config unclean.recovery.strategy and
>> > unclean.leader.election.enable, and use unclean.recovery.Enabled to
>> > determine which config to use during the unclean leader election.
>> >
>> > On Mon, Sep 18, 2023 at 3:51 PM Calvin Liu <ca...@confluent.io> wrote:
>> >
>> >> Hi Colin,
>> >> For the unclean.recovery.strategy config name, how about we use the
>> >> following:
>> >> None. It basically means no unclean recovery will be performed.
>> >> Aggressive. It means availability goes first. Whenever the partition
>> >> can't elect a durable replica, the controller will try the unclean
>> >> recovery.
>> >> Balanced. It is the balance point between availability first (Aggressive)
>> >> and least availability (None). The controller performs unclean recovery
>> >> when both ISR and ELR are empty.
>> >>
>> >>
>> >> On Fri, Sep 15, 2023 at 11:42 AM Calvin Liu <ca...@confluent.io> wrote:
>> >>
>> >>> Hi Colin,
>> >>>
>> >>> > So, the proposal is that if someone sets
>> >>> > "unclean.leader.election.enable = true"...
>> >>>
>> >>>
>> >>> The idea is to use one of unclean.leader.election.enable and
>> >>> unclean.recovery.strategy based on unclean.recovery.Enabled. A possible
>> >>> version can be:
>> >>>
>> >>> if unclean.recovery.Enabled {
>> >>>     Check unclean.recovery.strategy. If set, use it. Otherwise, check
>> >>>     unclean.leader.election.enable and translate it to
>> >>>     unclean.recovery.strategy.
>> >>> } else {
>> >>>     Use unclean.leader.election.enable.
>> >>> }
>> >>>
>> >>>
>> >>> --------
>> >>>
>> >>> >The configuration key should be "unclean.recovery.manager.enabled",
>> >>> right?
>> >>>
>> >>>
>> >>> I think we have two ways of choosing a leader uncleanly: unclean leader
>> >>> election and unclean recovery (log inspection), and we try to switch
>> >>> between them.
>> >>>
>> >>> Do you mean we want to develop two ways of performing the unclean
>> >>> recovery and one of them is using “unclean recovery manager”? I guess
>> we
>> >>> haven’t discussed the second way.
>> >>>
>> >>>
>> >>> --------
>> >>>
>> >>> >How do these 4 levels of overrides interact with your new
>> >>> configurations?
>> >>>
>> >>>
>> >>> I do notice that in the KRaft controller code, the method that checks
>> >>> whether to perform unclean leader election has been hard-coded to false
>> >>> since 2021 (uncleanLeaderElectionEnabledForTopic). Isn't this a good
>> >>> chance to completely deprecate unclean.leader.election.enable? We
>> >>> wouldn't even have to worry about the config conversion.
>> >>>
>> >>> On the other hand, whatever the override is, as long as the controller
>> >>> has the final effective values of unclean.leader.election.enable, the
>> >>> topic-level config unclean.recovery.strategy, and the cluster-level
>> >>> config unclean.recovery.Enabled, it can calculate the correct method to
>> >>> use, right?
>> >>>
>> >>>
>> >>> On Fri, Sep 15, 2023 at 10:02 AM Colin McCabe <cmcc...@apache.org> wrote:
>> >>>
>> >>>> On Thu, Sep 14, 2023, at 22:23, Calvin Liu wrote:
>> >>>> > Hi Colin
>> >>>> > 1. I think using the new config name is more clear.
>> >>>> >        a. The unclean leader election is actually removed if unclean
>> >>>> > recovery is in use.
>> >>>> >        b. Using multiple values in unclean.leader.election.enable is
>> >>>> > confusing and it will be more confusing after people forget about
>> this
>> >>>> > discussion.
>> >>>>
>> >>>> Hi Calvin,
>> >>>>
>> >>>> So, the proposal is that if someone sets
>> >>>> "unclean.leader.election.enable = true" but then sets one of your new
>> >>>> configurations, the value of unclean.leader.election.enable is ignored?
>> >>>> That seems less clear to me, not more. Just in general, having multiple
>> >>>> configuration keys to control the same thing confuses users. Basically,
>> >>>> they are sitting at a giant control panel, and some of the levers do
>> >>>> nothing.
>> >>>>
>> >>>> > 2. Sorry I forgot to mention in the response that I did add the
>> >>>> > unclean.recovery.Enabled flag.
>> >>>>
>> >>>> The configuration key should be "unclean.recovery.manager.enabled",
>> >>>> right? Because we can do "unclean recovery" without the manager.
>> >>>> Disabling the manager just means we use a different mechanism for
>> >>>> recovery.
>> >>>>
>> >>>> >        c. Maybe I underestimated the challenge of replacing the
>> >>>> config. Any
>> >>>> > implementation problems ahead?
>> >>>>
>> >>>> There are four levels of overrides for unclean.leader.election.enable:
>> >>>>
>> >>>> 1. static configuration for node
>> >>>>    This goes in the configuration file, typically named server.properties
>> >>>>
>> >>>> 2. dynamic configuration for node default
>> >>>>    ConfigResource(type=BROKER, name="")
>> >>>>
>> >>>> 3. dynamic configuration for node
>> >>>>    ConfigResource(type=BROKER, name=<controller id>)
>> >>>>
>> >>>> 4. dynamic configuration for topic
>> >>>>    ConfigResource(type=TOPIC, name=<topic-name>)
>> >>>>
>> >>>> How do these 4 levels of overrides interact with your new
>> >>>> configurations? If the new configurations dominate over the old ones,
>> it
>> >>>> seems like this will get a lot more confusing to implement (and also
>> to
>> >>>> use.)
>> >>>>
>> >>>> Again, I'd recommend just adding some new values to
>> >>>> unclean.leader.election.enable. It's simple and will prevent user
>> confusion
>> >>>> (as well as developer confusion.)
>> >>>>
>> >>>> best,
>> >>>> Colin
>> >>>>
>> >>>>
>> >>>> > 3. About the admin client, I mentioned 3 changes in the client.
>> >>>> Anything
>> >>>> > else I missed in the KIP?
>> >>>> >       a. The client will switch to using the new RPC instead of
>> >>>> > MetadataRequest for the topics.
>> >>>> >       b. The TopicPartitionInfo used in TopicDescription needs to
>> add
>> >>>> new
>> >>>> > fields related to the ELR.
>> >>>> >       c. The outputs will add the ELR related fields.
>> >>>> >
>> >>>> > On Thu, Sep 14, 2023 at 9:19 PM Colin McCabe <cmcc...@apache.org> wrote:
>> >>>> >
>> >>>> >> Hi Calvin,
>> >>>> >>
>> >>>> >> Thanks for the changes.
>> >>>> >>
>> >>>> >> 1. Earlier I commented that creating "unclean.recovery.strategy"
>> is
>> >>>> not
>> >>>> >> necessary, and we can just reuse the existing
>> >>>> >> "unclean.leader.election.enable" configuration key. Let's discuss
>> >>>> that.
>> >>>> >>
>> >>>> >> 2. I also don't understand why you didn't add a configuration to
>> >>>> enable or
>> >>>> >> disable the Unclean Recovery Manager. This seems like a very simple
>> >>>> way to
>> >>>> >> handle the staging issue which we discussed. The URM can just be
>> >>>> turned off
>> >>>> >> until it is production ready. Let's discuss this.
>> >>>> >>
>> >>>> >> 3. You still need to describe the changes to AdminClient that are
>> >>>> needed
>> >>>> >> to use DescribeTopicRequest.
>> >>>> >>
>> >>>> >> Keep at it. It's looking better. :)
>> >>>> >>
>> >>>> >> best,
>> >>>> >> Colin
>> >>>> >>
>> >>>> >>
>> >>>> >> On Thu, Sep 14, 2023, at 11:03, Calvin Liu wrote:
>> >>>> >> > Hi Colin
>> >>>> >> > Thanks for the comments!
>> >>>> >> >
>> >>>> >> > I did the following changes:
>> >>>> >> >
>> >>>> >> >    1. Simplified the API spec section to only include the diff.
>> >>>> >> >    2. Reordered the HWM requirement section.
>> >>>> >> >    3. Removed the URM implementation details, keeping the necessary
>> >>>> >> >       characteristics to perform the unclean recovery:
>> >>>> >> >       1. When to perform the unclean recovery.
>> >>>> >> >       2. Under different configs, how the unclean recovery finds
>> >>>> >> >          the leader.
>> >>>> >> >       3. How the configs unclean.leader.election.enable and
>> >>>> >> >          unclean.recovery.strategy are converted when users
>> >>>> >> >          enable/disable the unclean recovery.
>> >>>> >> >    4. More details about how we change the admin client.
>> >>>> >> >    5. API limits on the GetReplicaLogInfoRequest and
>> >>>> >> >       DescribeTopicRequest.
>> >>>> >> >    6. Two metrics added:
>> >>>> >> >       1. kafka.controller.global_under_min_isr_partition_count
>> >>>> >> >       2. kafka.controller.unclean_recovery_finished_count
>> >>>> >> >
>> >>>> >> >
>> >>>> >> > On Wed, Sep 13, 2023 at 10:46 AM Colin McCabe <cmcc...@apache.org> wrote:
>> >>>> >> >
>> >>>> >> >> On Tue, Sep 12, 2023, at 17:21, Calvin Liu wrote:
>> >>>> >> >> > Hi Colin
>> >>>> >> >> > Thanks for the comments!
>> >>>> >> >> >
>> >>>> >> >>
>> >>>> >> >> Hi Calvin,
>> >>>> >> >>
>> >>>> >> >> Thanks again for the KIP.
>> >>>> >> >>
>> >>>> >> >> One meta-comment: it's usually better to just do a diff on a
>> >>>> message
>> >>>> >> spec
>> >>>> >> >> file or java file if you're including changes to it in the KIP.
>> >>>> This is
>> >>>> >> >> easier to read than looking for "new fields begin" etc. in the
>> >>>> text, and
>> >>>> >> >> gracefully handles the case where existing fields were changed.
>> >>>> >> >>
>> >>>> >> >> > Rewrite the Additional High Watermark advancement requirement
>> >>>> >> >> > There was feedback on this section that some readers may not
>> be
>> >>>> >> familiar
>> >>>> >> >> > with HWM and Ack=0,1,all requests. This can help them
>> understand
>> >>>> the
>> >>>> >> >> > proposal. I will rewrite this part for more readability.
>> >>>> >> >> >
>> >>>> >> >>
>> >>>> >> >> To be clear, I wasn't suggesting dropping either section. I
>> agree
>> >>>> that
>> >>>> >> >> they add useful background. I was just suggesting that we should
>> >>>> discuss
>> >>>> >> >> the "acks" setting AFTER discussing the new high watermark
>> >>>> advancement
>> >>>> >> >> conditions. We also should discuss acks=0. While it isn't
>> >>>> conceptually
>> >>>> >> much
>> >>>> >> >> different than acks=1 here, its omission from this section is
>> >>>> confusing.
>> >>>> >> >>
>> >>>> >> >> > Unclean recovery
>> >>>> >> >> >
>> >>>> >> >> > The plan is to replace the unclean.leader.election.enable with
>> >>>> >> >> > unclean.recovery.strategy. If the Unclean Recovery is enabled
>> >>>> then it
>> >>>> >> >> deals
>> >>>> >> >> > with the three options in the unclean.recovery.strategy.
>> >>>> >> >> >
>> >>>> >> >> >
>> >>>> >> >> > Let’s refine the Unclean Recovery. We have already taken a
>> lot of
>> >>>> >> >> > suggestions and I hope to enhance the durability of Kafka to
>> the
>> >>>> next
>> >>>> >> >> level
>> >>>> >> >> > with this KIP.
>> >>>> >> >>
>> >>>> >> >> I am OK with doing the unclean leader recovery improvements in
>> >>>> this KIP.
>> >>>> >> >> However, I think we need to really work on the configuration
>> >>>> settings.
>> >>>> >> >>
>> >>>> >> >> Configuration overrides are often quite messy. For example, in the
>> >>>> >> >> case where we have log.roll.hours and log.roll.ms, the user has to
>> >>>> >> >> remember which one takes precedence, and it is not obvious. So,
>> >>>> >> >> rather than
>> >>>> >> >> creating a new configuration, why not add additional values to
>> >>>> >> >> "unclean.leader.election.enable"? I think this will be simpler
>> for
>> >>>> >> people
>> >>>> >> >> to understand, and simpler in the code as well.
>> >>>> >> >>
>> >>>> >> >> What if we continued to use "unclean.leader.election.enable" but
>> >>>> >> extended
>> >>>> >> >> it so that it took a string? Then the string could have these
>> >>>> values:
>> >>>> >> >>
>> >>>> >> >> never
>> >>>> >> >>     never automatically do an unclean leader election under any
>> >>>> >> >>     conditions
>> >>>> >> >>
>> >>>> >> >> false / default
>> >>>> >> >>     only do an unclean leader election if there may be possible
>> >>>> >> >>     data loss
>> >>>> >> >>
>> >>>> >> >> true / always
>> >>>> >> >>     always do an unclean leader election if we can't immediately
>> >>>> >> >>     elect a leader
>> >>>> >> >>
>> >>>> >> >> It's a bit awkward that false maps to default rather than to
>> >>>> never. But
>> >>>> >> >> this awkwardness exists if we use two different configuration
>> keys
>> >>>> as
>> >>>> >> well.
>> >>>> >> >> The reason for the awkwardness is that we simply don't want most
>> >>>> of the
>> >>>> >> >> people currently setting unclean.leader.election.enable=false to
>> >>>> get the
>> >>>> >> >> "never" behavior. We have to bite that bullet. Better to be
>> clear
>> >>>> and
>> >>>> >> >> explicit than hide it.
>> >>>> >> >>
>> >>>> >> >> Another thing that's a bit awkward is having two different ways
>> to
>> >>>> do
>> >>>> >> >> unclean leader election specified in the KIP. You describe two
>> >>>> methods:
>> >>>> >> the
>> >>>> >> >> simple "choose the last leader" method, and the "unclean
>> recovery
>> >>>> >> manager"
>> >>>> >> >> method. I understand why you did it this way -- "choose the last
>> >>>> >> leader" is
>> >>>> >> >> simple, and will help us deliver an implementation quickly,
>> while
>> >>>> the
>> >>>> >> URM
>> >>>> >> >> is preferable in the long term. My suggestion here is to
>> separate
>> >>>> the
>> >>>> >> >> decision of HOW to do unclean leader election from the decision
>> of
>> >>>> WHEN
>> >>>> >> to
>> >>>> >> >> do it.
>> >>>> >> >>
>> >>>> >> >> So in other words, have "unclean.leader.election.enable" specify
>> >>>> when we
>> >>>> >> >> do unclean leader election, and have a new configuration like
>> >>>> >> >> "unclean.recovery.manager.enable" to determine if we use the
>> URM.
>> >>>> >> >> Presumably the URM will take some time to get fully stable, so
>> >>>> this can
>> >>>> >> >> default to false for a while, and we can flip the default to
>> true
>> >>>> when
>> >>>> >> we
>> >>>> >> >> feel ready.
>> >>>> >> >>
>> >>>> >> >> The URM is somewhat under-described here. I think we need a few
>> >>>> >> >> configurations here for it. For example, we need a
>> configuration to
>> >>>> >> specify
>> >>>> >> >> how long it should wait for a broker to respond to its RPCs
>> before
>> >>>> >> moving
>> >>>> >> >> on. We also need to understand how the URM interacts with
>> >>>> >> >> unclean.leader.election.enable=always. I assume that with
>> "always"
>> >>>> we
>> >>>> >> will
>> >>>> >> >> just unconditionally use the URM rather than choosing randomly.
>> >>>> But this
>> >>>> >> >> should be spelled out in the KIP.
>> >>>> >> >>
>> >>>> >> >> >
>> >>>> >> >> > DescribeTopicRequest
>> >>>> >> >> >
>> >>>> >> >> >    1.
>> >>>> >> >> >    Yes, the plan is to replace the MetadataRequest with the
>> >>>> >> >> >    DescribeTopicRequest for the admin clients. Will check the
>> >>>> details.
>> >>>> >> >>
>> >>>> >> >> Sounds good. But as I said, you need to specify how AdminClient
>> >>>> >> interacts
>> >>>> >> >> with the new request. This will involve adding some fields to
>> >>>> >> >> TopicDescription.java. And you need to specify the changes to
>> the
>> >>>> >> >> kafka-topics.sh command line tool. Otherwise we cannot use the
>> >>>> tool to
>> >>>> >> see
>> >>>> >> >> the new information.
>> >>>> >> >>
>> >>>> >> >> The new requests, DescribeTopicRequest and
>> >>>> GetReplicaLogInfoRequest,
>> >>>> >> need
>> >>>> >> >> to have limits placed on them so that their size can't be
>> >>>> infinite. We
>> >>>> >> >> don't want to propagate the current problems of MetadataRequest,
>> >>>> where
>> >>>> >> >> clients can request massive responses that can mess up the JVM
>> when
>> >>>> >> handled.
>> >>>> >> >>
>> >>>> >> >> Adding limits is simple for GetReplicaLogInfoRequest -- we can
>> >>>> just say
>> >>>> >> >> that only 2000 partitions at a time can be requested. For
>> >>>> >> >> DescribeTopicRequest we can probably just limit to 20 topics or
>> >>>> >> something
>> >>>> >> >> like that, to avoid the complexity of doing pagination in this
>> KIP.
>> >>>> >> >>
>> >>>> >> >> >    2.
>> >>>> >> >> >    I can let the broker load the ELR info so that they can
>> serve
>> >>>> the
>> >>>> >> >> >    DescribeTopicRequest as well.
>> >>>> >> >>
>> >>>> >> >> Yes, it's fine to add to MetadataCache. In fact, you'll be
>> loading
>> >>>> it
>> >>>> >> >> anyway once it's added to PartitionImage.
>> >>>> >> >>
>> >>>> >> >> >    3.
>> >>>> >> >> >    Yeah, it does not make sense to have the topic id if
>> >>>> >> >> >    DescribeTopicRequest is only used by the admin client.
>> >>>> >> >>
>> >>>> >> >> OK. That makes things simpler. We can always create a new API
>> later
>> >>>> >> >> (hopefully not in this KIP!) to query by topic ID.
>> >>>> >> >>
>> >>>> >> >> >
>> >>>> >> >> >
>> >>>> >> >> > Metrics
>> >>>> >> >> >
>> >>>> >> >> > As for overall cluster health metrics, I think under-min-ISR
>> is
>> >>>> still
>> >>>> >> a
>> >>>> >> >> > useful one. ELR is more like a safety belt. When the ELR is
>> >>>> used, the
>> >>>> >> >> > cluster availability has already been impacted.
>> >>>> >> >> >
>> >>>> >> >> > Maybe we can have a metric to count the partitions that
>> sum(ISR,
>> >>>> ELR)
>> >>>> >> <
>> >>>> >> >> min
>> >>>> >> >> > ISR. What do you think?
>> >>>> >> >>
>> >>>> >> >> How about:
>> >>>> >> >>
>> >>>> >> >> A.  a metric for the total number of under-min-isr partitions?
>> We
>> >>>> don't
>> >>>> >> >> have that in Apache Kafka at the moment.
>> >>>> >> >>
>> >>>> >> >> B. a metric for the number of unclean leader elections we did
>> (for
>> >>>> >> >> simplicity, it can reset to 0 on controller restart: we expect
>> >>>> people to
>> >>>> >> >> monitor the change over time anyway)
>> >>>> >> >>
>> >>>> >> >> best,
>> >>>> >> >> Colin
>> >>>> >> >>
>> >>>> >> >>
>> >>>> >> >> >
>> >>>> >> >> > Yeah, for the ongoing unclean recoveries, the controller can
>> >>>> keep an
>> >>>> >> >> > accurate count through failover because partition registration
>> >>>> can
>> >>>> >> >> indicate
>> >>>> >> >> > whether a recovery is needed. However, for the ones that have
>> >>>> >> >> > already happened, unless we want to persist the number somewhere,
>> >>>> >> >> > we can only figure it out from the log.
>> >>>> >> >> >
>> >>>> >> >> > On Tue, Sep 12, 2023 at 3:16 PM Colin McCabe <cmcc...@apache.org> wrote:
>> >>>> >> >> >
>> >>>> >> >> >> Also, we should have metrics that show what is going on with
>> >>>> regard
>> >>>> >> to
>> >>>> >> >> the
>> >>>> >> >> >> eligible replica set. I'm not sure exactly what to suggest,
>> but
>> >>>> >> >> something
>> >>>> >> >> >> that could identify when things are going wrong in the
>> cluster.
>> >>>> >> >> >>
>> >>>> >> >> >> For example, maybe a metric for partitions containing
>> replicas
>> >>>> that
>> >>>> >> are
>> >>>> >> >> >> ineligible to be leader? That would show a spike when a
>> broker
>> >>>> had an
>> >>>> >> >> >> unclean restart.
>> >>>> >> >> >>
>> >>>> >> >> >> Ideally, we'd also have a metric that indicates when an
>> unclean
>> >>>> >> leader
>> >>>> >> >> >> election or a recovery happened. It's a bit tricky because
>> the
>> >>>> simple
>> >>>> >> >> >> thing, of tracking it per controller, may be a bit confusing
>> >>>> during
>> >>>> >> >> >> failovers.
>> >>>> >> >> >>
>> >>>> >> >> >> best,
>> >>>> >> >> >> Colin
>> >>>> >> >> >>
>> >>>> >> >> >>
>> >>>> >> >> >> On Tue, Sep 12, 2023, at 14:25, Colin McCabe wrote:
>> >>>> >> >> >> > Hi Calvin,
>> >>>> >> >> >> >
>> >>>> >> >> >> > Thanks for the KIP. I think this is a great improvement.
>> >>>> >> >> >> >
>> >>>> >> >> >> >> Additional High Watermark advance requirement
>> >>>> >> >> >> >
>> >>>> >> >> >> > Typo: change "advance" to "advancement"
>> >>>> >> >> >> >
>> >>>> >> >> >> >> A bit recap of some key concepts.
>> >>>> >> >> >> >
>> >>>> >> >> >> > Typo: change "bit" to "quick"
>> >>>> >> >> >> >
>> >>>> >> >> >> >> Ack=1/all produce request. It defines when the Kafka
>> server
>> >>>> should
>> >>>> >> >> >> respond to the produce request
>> >>>> >> >> >> >
>> >>>> >> >> >> > I think this section would be clearer if we talked about
>> the
>> >>>> new
>> >>>> >> high
>> >>>> >> >> >> > watermark advancement requirement first, and THEN talked
>> >>>> about its
>> >>>> >> >> >> > impact on acks=0, acks=1, and acks=all. acks=all is of
>> >>>> course
>> >>>> >> the
>> >>>> >> >> >> > main case we care about here, so it would be good to lead
>> with
>> >>>> >> that,
>> >>>> >> >> >> > rather than delving into the technicalities of acks=0/1
>> first.
>> >>>> >> >> >> >
>> >>>> >> >> >> >> Unclean recovery
>> >>>> >> >> >> >
>> >>>> >> >> >> > So, here you are introducing a new configuration,
>> >>>> >> >> >> > unclean.recovery.strategy. The difficult thing here is that
>> >>>> there
>> >>>> >> is a
>> >>>> >> >> >> > lot of overlap with unclean.leader.election.enable. So we
>> >>>> have 3
>> >>>> >> >> >> > different settings for unclean.recovery.strategy, plus 2
>> >>>> different
>> >>>> >> >> >> > settings for unclean.leader.election.enable, giving a cross
>> >>>> >> product of
>> >>>> >> >> >> > 6 different options. The following "unclean recovery
>> manager"
>> >>>> >> section
>> >>>> >> >> >> > only applies to one of those 6 different possibilities (I
>> >>>> think?)
>> >>>> >> >> >> >
>> >>>> >> >> >> > I simply don't think we need so many different election
>> types.
>> >>>> >> Really
>> >>>> >> >> >> > the use-cases we need are people who want NO unclean
>> >>>> elections,
>> >>>> >> people
>> >>>> >> >> >> > who want "the reasonable thing" and people who want
>> >>>> availability at
>> >>>> >> >> all
>> >>>> >> >> >> > costs.
>> >>>> >> >> >> >
>> >>>> >> >> >> > Overall, I feel like the first half of the KIP is about the
>> >>>> ELR,
>> >>>> >> and
>> >>>> >> >> >> > the second half is about reworking unclean leader
>> election. It
>> >>>> >> might
>> >>>> >> >> be
>> >>>> >> >> >> > better to move that second half to a separate KIP so that
>> we
>> >>>> can
>> >>>> >> >> figure
>> >>>> >> >> >> > it out fully. It should be fine to punt this until later
>> and
>> >>>> just
>> >>>> >> have
>> >>>> >> >> >> > the current behavior on empty ELR be waiting for the last
>> >>>> known
>> >>>> >> leader
>> >>>> >> >> >> > to return. After all, that's what we do today.
>> >>>> >> >> >> >
>> >>>> >> >> >> >> DescribeTopicRequest
>> >>>> >> >> >> >
>> >>>> >> >> >> > Is the intention for AdminClient to use this RPC for
>> >>>> >> >> >> > Admin#describeTopics ? If so, we need to describe all of
>> the
>> >>>> >> changes
>> >>>> >> >> to
>> >>>> >> >> >> > the admin client API, as well as changes to command-line
>> >>>> tools like
>> >>>> >> >> >> > kafka-topics.sh (if there are any). For example, you will
>> >>>> probably
>> >>>> >> >> need
>> >>>> >> >> >> > changes to TopicDescription.java. You will also need to
>> >>>> provide
>> >>>> >> all of
>> >>>> >> >> >> > the things that admin client needs -- for example,
>> >>>> >> >> >> > TopicAuthorizedOperations.
>> >>>> >> >> >> >
>> >>>> >> >> >> > I also don't think the controller should serve this
>> request.
>> >>>> We
>> >>>> >> want
>> >>>> >> >> to
>> >>>> >> >> >> > minimize load on the controller. Just like with the other
>> >>>> metadata
>> >>>> >> >> >> > requests like MetadataRequest, this should be served by
>> >>>> brokers.
>> >>>> >> >> >> >
>> >>>> >> >> >> > It's a bit confusing why both topic ID and topic name are
>> >>>> provided
>> >>>> >> to
>> >>>> >> >> >> > this API. Is the intention that callers should set one but
>> >>>> not the
>> >>>> >> >> >> > other? Or both? This needs to be clarified. Also, if we do
>> >>>> want to
>> >>>> >> >> >> > support lookups by UUID, that is another thing that needs
>> to
>> >>>> be
>> >>>> >> added
>> >>>> >> >> >> > to adminclient.
>> >>>> >> >> >> >
>> >>>> >> >> >> > In general, I feel like this should also probably be its
>> own
>> >>>> KIP
>> >>>> >> since
>> >>>> >> >> >> > it's fairly complex
>> >>>> >> >> >> >
>> >>>> >> >> >> > best,
>> >>>> >> >> >> > Colin
>> >>>> >> >> >> >
>> >>>> >> >> >> >
>> >>>> >> >> >> > On Thu, Aug 10, 2023, at 15:46, Calvin Liu wrote:
>> >>>> >> >> >> >> Hi everyone,
>> >>>> >> >> >> >> I'd like to discuss a series of enhancements to the
>> >>>> >> >> >> >> replication protocol.
>> >>>> >> >> >> >>
>> >>>> >> >> >> >> A partition replica can experience local data loss in
>> unclean
>> >>>> >> >> shutdown
>> >>>> >> >> >> >> scenarios where unflushed data in the OS page cache is
>> lost
>> >>>> - such
>> >>>> >> >> as an
>> >>>> >> >> >> >> availability zone power outage or a server error. The
>> Kafka
>> >>>> >> >> replication
>> >>>> >> >> >> >> protocol is designed to handle these situations by
>> removing
>> >>>> such
>> >>>> >> >> >> replicas
>> >>>> >> >> >> >> from the ISR and only re-adding them once they have caught
>> >>>> up and
>> >>>> >> >> >> therefore
>> >>>> >> >> >> >> recovered any lost data. This prevents replicas that lost
>> an
>> >>>> >> >> arbitrary
>> >>>> >> >> >> log
>> >>>> >> >> >> >> suffix, which included committed data, from being elected
>> >>>> leader.
>> >>>> >> >> >> >> However, there is a "last replica standing" state which
>> when
>> >>>> >> combined
>> >>>> >> >> >> with
>> >>>> >> >> >> >> a data loss unclean shutdown event can turn a local data
>> loss
>> >>>> >> >> scenario
>> >>>> >> >> >> into
>> >>>> >> >> >> >> a global data loss scenario, i.e., committed data can be
>> >>>> removed
>> >>>> >> from
>> >>>> >> >> >> all
>> >>>> >> >> >> >> replicas. When the last replica in the ISR experiences an
>> >>>> unclean
>> >>>> >> >> >> shutdown
>> >>>> >> >> >> >> and loses committed data, it will be reelected leader
>> after
>> >>>> >> starting
>> >>>> >> >> up
>> >>>> >> >> >> >> again, causing rejoining followers to truncate their logs
>> and
>> >>>> >> thereby
>> >>>> >> >> >> >> removing the last copies of the committed records which
>> the
>> >>>> leader
>> >>>> >> >> lost
>> >>>> >> >> >> >> initially.
>> >>>> >> >> >> >>
>> >>>> >> >> >> >> The new KIP will maximize the protection and provide
>> >>>> >> >> >> >> MinISR-1 tolerance to data loss unclean shutdown events.
>> >>>> >> >> >> >>
>> >>>> >> >> >> >>
>> >>>> >> >> >> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-966%3A+Eligible+Leader+Replicas
