Hi Calvin,

Thanks for the updated KIP. Made another pass. A few more comments below.
40. unclean.leader.election.enable.false -> unclean.recovery.strategy.Balanced: The Balanced mode could still lead to data loss. So, I am wondering if unclean.leader.election.enable.false should map to None?

41. unclean.recovery.manager.enabled: I am not sure why we introduce this additional config. Is it the same as unclean.recovery.strategy=None?

42. DescribeTopicResponse.TopicAuthorizedOperations: Should this be at the topic level?

43. "Limit: 20 topics max per request": Could we describe what happens if the request includes more than 20 topics?

44. ElectLeadersRequest.DesiredLeaders: Could we describe whether the ordering matters?

45. GetReplicaLogInfo.TopicPartitions: "about": "The topic partitions to elect leaders.": The description in "about" is incorrect.

46. GetReplicaLogInfoResponse: Should we nest partitions under topicId to be consistent with other types of responses?

47. kafka-leader-election.sh:
    47.1 Could we explain DESIGNATION?
    47.2 desiredLeader: Should it be a list to match the field in ElectLeadersRequest?

48. Could we add a section on downgrade?

49. LastKnownLeader: This seems only needed in the first phase of delivering ELR. Will it be removed when the complete KIP is delivered?

Thanks,

Jun

On Tue, Sep 19, 2023 at 1:30 PM Colin McCabe <cmcc...@apache.org> wrote:

> Hi Calvin,
>
> Thanks for the explanations. I like the idea of using none, balanced, aggressive. We also had an offline discussion about why it is good to use a new config key (basically, so that we can deprecate the old one, which had only false/true values, in 4.0). With these changes, I am +1.
>
> best,
> Colin
>
> On Mon, Sep 18, 2023, at 15:54, Calvin Liu wrote:
> > Hi Colin,
> > Also, can we deprecate unclean.leader.election.enable in 4.0? Before that, we can have both the config unclean.recovery.strategy and unclean.leader.election.enable, and use unclean.recovery.Enabled to determine which config to use during the unclean leader election.
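Aside for readers following along: the gating Calvin describes above (use unclean.recovery.Enabled to decide which config drives the election) can be sketched as below. This is a hedged illustration in Python, not KIP text; the function name is made up, and the false -> "Balanced" translation mirrors the mapping under discussion, which Jun's comment 40 questions.

```python
from typing import Optional

def effective_strategy(unclean_recovery_enabled: bool,
                       unclean_recovery_strategy: Optional[str],
                       unclean_leader_election_enable: bool) -> str:
    """Hypothetical resolution of the two configs into one of
    "None", "Balanced", "Aggressive" (per the thread's description)."""
    if unclean_recovery_enabled:
        if unclean_recovery_strategy is not None:
            # The new config, if set, wins.
            return unclean_recovery_strategy
        # Otherwise translate the legacy boolean; false -> "Balanced" is the
        # mapping Jun's comment 40 asks about (should it be "None"?).
        return "Aggressive" if unclean_leader_election_enable else "Balanced"
    # Recovery disabled: behavior is driven by the legacy config alone.
    return "Aggressive" if unclean_leader_election_enable else "None"

print(effective_strategy(True, None, False))   # Balanced
print(effective_strategy(True, "None", True))  # None
```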
> >
> > On Mon, Sep 18, 2023 at 3:51 PM Calvin Liu <ca...@confluent.io> wrote:
> >
> >> Hi Colin,
> >> For the unclean.recovery.strategy config name, how about we use the following?
> >> None. It basically means no unclean recovery will be performed.
> >> Aggressive. It means availability goes first. Whenever the partition can't elect a durable replica, the controller will try the unclean recovery.
> >> Balanced. It is the balance point between availability first (Aggressive) and least availability (None). The controller performs unclean recovery when both ISR and ELR are empty.
> >>
> >> On Fri, Sep 15, 2023 at 11:42 AM Calvin Liu <ca...@confluent.io> wrote:
> >>
> >>> Hi Colin,
> >>>
> >>> > So, the proposal is that if someone sets "unclean.leader.election.enable = true"...
> >>>
> >>> The idea is to use one of unclean.leader.election.enable and unclean.recovery.strategy based on unclean.recovery.Enabled. A possible version can be:
> >>>
> >>> If unclean.recovery.Enabled:
> >>> {
> >>>   Check unclean.recovery.strategy. If set, use it. Otherwise, check unclean.leader.election.enable and translate it to unclean.recovery.strategy.
> >>> } else {
> >>>   Use unclean.leader.election.enable
> >>> }
> >>>
> >>> —--------
> >>>
> >>> > The configuration key should be "unclean.recovery.manager.enabled", right?
> >>>
> >>> I think we have two ways of choosing a leader uncleanly, unclean leader election and unclean recovery (log inspection), and we try to switch between them.
> >>>
> >>> Do you mean we want to develop two ways of performing the unclean recovery, and one of them is using the “unclean recovery manager”? I guess we haven’t discussed the second way.
> >>>
> >>> —-------
> >>>
> >>> > How do these 4 levels of overrides interact with your new configurations?
> >>>
> >>> I do notice in the KRaft controller code, the method to check whether to perform unclean leader election has been hard-coded to false since 2021 (uncleanLeaderElectionEnabledForTopic). Isn’t it a good chance to completely deprecate unclean.leader.election.enable? We don’t even have to worry about the config conversion.
> >>>
> >>> On the other hand, whatever the override is, as long as the controller can have the final effective unclean.leader.election.enable, the topic-level config unclean.recovery.strategy, and the cluster-level config unclean.recovery.Enabled, the controller can calculate the correct method to use, right?
> >>>
> >>> On Fri, Sep 15, 2023 at 10:02 AM Colin McCabe <cmcc...@apache.org> wrote:
> >>>
> >>>> On Thu, Sep 14, 2023, at 22:23, Calvin Liu wrote:
> >>>> > Hi Colin
> >>>> > 1. I think using the new config name is more clear.
> >>>> >    a. The unclean leader election is actually removed if unclean recovery is in use.
> >>>> >    b. Using multiple values in unclean.leader.election.enable is confusing, and it will be more confusing after people forget about this discussion.
> >>>>
> >>>> Hi Calvin,
> >>>>
> >>>> So, the proposal is that if someone sets "unclean.leader.election.enable = true" but then sets one of your new configurations, the value of unclean.leader.election.enable is ignored? That seems less clear to me, not more. Just in general, having multiple configuration keys to control the same thing confuses users. Basically, they are sitting at a giant control panel, and some of the levers do nothing.
> >>>>
> >>>> > 2. Sorry I forgot to mention in the response that I did add the unclean.recovery.Enabled flag.
> >>>>
> >>>> The configuration key should be "unclean.recovery.manager.enabled", right? Because we can do "unclean recovery" without the manager.
> >>>> Disabling the manager just means we use a different mechanism for recovery.
> >>>>
> >>>> > c. Maybe I underestimated the challenge of replacing the config. Any implementation problems ahead?
> >>>>
> >>>> There are four levels of overrides for unclean.leader.election.enable.
> >>>>
> >>>> 1. static configuration for node
> >>>>    This goes in the configuration file, typically named server.properties
> >>>>
> >>>> 2. dynamic configuration for node default
> >>>>    ConfigResource(type=BROKER, name="")
> >>>>
> >>>> 3. dynamic configuration for node
> >>>>    ConfigResource(type=BROKER, name=<controller id>)
> >>>>
> >>>> 4. dynamic configuration for topic
> >>>>    ConfigResource(type=TOPIC, name=<topic-name>)
> >>>>
> >>>> How do these 4 levels of overrides interact with your new configurations? If the new configurations dominate over the old ones, it seems like this will get a lot more confusing to implement (and also to use.)
> >>>>
> >>>> Again, I'd recommend just adding some new values to unclean.leader.election.enable. It's simple and will prevent user confusion (as well as developer confusion.)
> >>>>
> >>>> best,
> >>>> Colin
> >>>>
> >>>> > 3. About the admin client, I mentioned 3 changes in the client. Anything else I missed in the KIP?
> >>>> >    a. The client will switch to using the new RPC instead of MetadataRequest for the topics.
> >>>> >    b. The TopicPartitionInfo used in TopicDescription needs to add new fields related to the ELR.
> >>>> >    c. The outputs will add the ELR related fields.
> >>>> >
> >>>> > On Thu, Sep 14, 2023 at 9:19 PM Colin McCabe <cmcc...@apache.org> wrote:
> >>>> >
> >>>> >> Hi Calvin,
> >>>> >>
> >>>> >> Thanks for the changes.
> >>>> >>
> >>>> >> 1.
> >>>> >> Earlier I commented that creating "unclean.recovery.strategy" is not necessary, and we can just reuse the existing "unclean.leader.election.enable" configuration key. Let's discuss that.
> >>>> >>
> >>>> >> 2. I also don't understand why you didn't add a configuration to enable or disable the Unclean Recovery Manager. This seems like a very simple way to handle the staging issue which we discussed. The URM can just be turned off until it is production ready. Let's discuss this.
> >>>> >>
> >>>> >> 3. You still need to describe the changes to AdminClient that are needed to use DescribeTopicRequest.
> >>>> >>
> >>>> >> Keep at it. It's looking better. :)
> >>>> >>
> >>>> >> best,
> >>>> >> Colin
> >>>> >>
> >>>> >> On Thu, Sep 14, 2023, at 11:03, Calvin Liu wrote:
> >>>> >> > Hi Colin
> >>>> >> > Thanks for the comments!
> >>>> >> >
> >>>> >> > I made the following changes:
> >>>> >> >
> >>>> >> > 1. Simplified the API spec section to only include the diff.
> >>>> >> > 2. Reordered the HWM requirement section.
> >>>> >> > 3. Removed the URM implementation details to keep the necessary characteristics to perform the unclean recovery:
> >>>> >> >    1. When to perform the unclean recovery
> >>>> >> >    2. Under different configs, how the unclean recovery finds the leader
> >>>> >> >    3. How the configs unclean.leader.election.enable and unclean.recovery.strategy are converted when users enable/disable the unclean recovery
> >>>> >> > 4. More details about how we change the admin client.
> >>>> >> > 5. API limits on the GetReplicaLogInfoRequest and DescribeTopicRequest.
> >>>> >> > 6.
> >>>> >> > Two metrics added:
> >>>> >> >    1. kafka.controller.global_under_min_isr_partition_count
> >>>> >> >    2. kafka.controller.unclean_recovery_finished_count
> >>>> >> >
> >>>> >> > On Wed, Sep 13, 2023 at 10:46 AM Colin McCabe <cmcc...@apache.org> wrote:
> >>>> >> >
> >>>> >> >> On Tue, Sep 12, 2023, at 17:21, Calvin Liu wrote:
> >>>> >> >> > Hi Colin
> >>>> >> >> > Thanks for the comments!
> >>>> >> >>
> >>>> >> >> Hi Calvin,
> >>>> >> >>
> >>>> >> >> Thanks again for the KIP.
> >>>> >> >>
> >>>> >> >> One meta-comment: it's usually better to just do a diff on a message spec file or java file if you're including changes to it in the KIP. This is easier to read than looking for "new fields begin" etc. in the text, and gracefully handles the case where existing fields were changed.
> >>>> >> >>
> >>>> >> >> > Rewrite the Additional High Watermark advancement requirement
> >>>> >> >> > There was feedback on this section that some readers may not be familiar with HWM and Ack=0,1,all requests. This can help them understand the proposal. I will rewrite this part for more readability.
> >>>> >> >>
> >>>> >> >> To be clear, I wasn't suggesting dropping either section. I agree that they add useful background. I was just suggesting that we should discuss the "acks" setting AFTER discussing the new high watermark advancement conditions. We also should discuss acks=0. While it isn't conceptually much different than acks=1 here, its omission from this section is confusing.
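As background for the high-watermark discussion above, the extra advancement condition under review can be illustrated roughly as follows. This is a hedged sketch, assuming the new rule is that the HWM may only advance once at least min.insync.replicas replicas have replicated up to the candidate offset; the function is illustrative, not the KIP's actual algorithm.

```python
def can_advance_hwm(isr_log_end_offsets, candidate_hwm, min_insync_replicas):
    """Illustrative check: count ISR replicas that have caught up to the
    candidate HWM, and require at least min.insync.replicas of them.
    (Assumed semantics for illustration; see the KIP for the real rule.)"""
    caught_up = sum(1 for end in isr_log_end_offsets if end >= candidate_hwm)
    return caught_up >= min_insync_replicas

# With min.insync.replicas=2, one caught-up replica is not enough,
# regardless of whether the producer used acks=0, 1, or all:
print(can_advance_hwm([10, 7, 7], 10, 2))   # False
print(can_advance_hwm([10, 10, 7], 10, 2))  # True
```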
> >>>> >> >>
> >>>> >> >> > Unclean recovery
> >>>> >> >> >
> >>>> >> >> > The plan is to replace the unclean.leader.election.enable with unclean.recovery.strategy. If the Unclean Recovery is enabled then it deals with the three options in the unclean.recovery.strategy.
> >>>> >> >> >
> >>>> >> >> > Let’s refine the Unclean Recovery. We have already taken a lot of suggestions and I hope to enhance the durability of Kafka to the next level with this KIP.
> >>>> >> >>
> >>>> >> >> I am OK with doing the unclean leader recovery improvements in this KIP. However, I think we need to really work on the configuration settings.
> >>>> >> >>
> >>>> >> >> Configuration overrides are often quite messy. For example, in the case where we have log.roll.hours and log.roll.ms, the user has to remember which one takes precedence, and it is not obvious. So, rather than creating a new configuration, why not add additional values to "unclean.leader.election.enable"? I think this will be simpler for people to understand, and simpler in the code as well.
> >>>> >> >>
> >>>> >> >> What if we continued to use "unclean.leader.election.enable" but extended it so that it took a string?
> >>>> >> >> Then the string could have these values:
> >>>> >> >>
> >>>> >> >> never
> >>>> >> >>     never automatically do an unclean leader election under any conditions
> >>>> >> >>
> >>>> >> >> false / default
> >>>> >> >>     only do an unclean leader election if there may be possible data loss
> >>>> >> >>
> >>>> >> >> true / always
> >>>> >> >>     always do an unclean leader election if we can't immediately elect a leader
> >>>> >> >>
> >>>> >> >> It's a bit awkward that false maps to default rather than to never. But this awkwardness exists if we use two different configuration keys as well. The reason for the awkwardness is that we simply don't want most of the people currently setting unclean.leader.election.enable=false to get the "never" behavior. We have to bite that bullet. Better to be clear and explicit than hide it.
> >>>> >> >>
> >>>> >> >> Another thing that's a bit awkward is having two different ways to do unclean leader election specified in the KIP. You describe two methods: the simple "choose the last leader" method, and the "unclean recovery manager" method. I understand why you did it this way -- "choose the last leader" is simple, and will help us deliver an implementation quickly, while the URM is preferable in the long term. My suggestion here is to separate the decision of HOW to do unclean leader election from the decision of WHEN to do it.
> >>>> >> >>
> >>>> >> >> So in other words, have "unclean.leader.election.enable" specify when we do unclean leader election, and have a new configuration like "unclean.recovery.manager.enable" to determine if we use the URM.
> >>>> >> >> Presumably the URM will take some time to get fully stable, so this can default to false for a while, and we can flip the default to true when we feel ready.
> >>>> >> >>
> >>>> >> >> The URM is somewhat under-described here. I think we need a few configurations here for it. For example, we need a configuration to specify how long it should wait for a broker to respond to its RPCs before moving on. We also need to understand how the URM interacts with unclean.leader.election.enable=always. I assume that with "always" we will just unconditionally use the URM rather than choosing randomly. But this should be spelled out in the KIP.
> >>>> >> >>
> >>>> >> >> > DescribeTopicRequest
> >>>> >> >> >
> >>>> >> >> > 1. Yes, the plan is to replace the MetadataRequest with the DescribeTopicRequest for the admin clients. Will check the details.
> >>>> >> >>
> >>>> >> >> Sounds good. But as I said, you need to specify how AdminClient interacts with the new request. This will involve adding some fields to TopicDescription.java. And you need to specify the changes to the kafka-topics.sh command line tool. Otherwise we cannot use the tool to see the new information.
> >>>> >> >>
> >>>> >> >> The new requests, DescribeTopicRequest and GetReplicaLogInfoRequest, need to have limits placed on them so that their size can't be infinite. We don't want to propagate the current problems of MetadataRequest, where clients can request massive responses that can mess up the JVM when handled.
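A cap of the sort being discussed might be enforced along these lines. This is an illustrative sketch: the 20-topic and 2000-partition limits echo the numbers proposed in the thread, and the rejection behavior is an assumption (Jun's comment 43 asks exactly what should happen when a request exceeds the limit).

```python
# Hypothetical server-side request caps; names and rejection behavior
# are placeholders, not the KIP's final specification.
MAX_TOPICS_PER_DESCRIBE = 20
MAX_PARTITIONS_PER_LOG_INFO = 2000

def within_describe_limit(topic_names):
    return len(topic_names) <= MAX_TOPICS_PER_DESCRIBE

def validate_describe_topic(topic_names):
    """One possible answer to 'what happens above the limit':
    reject the whole request with an error."""
    if not within_describe_limit(topic_names):
        raise ValueError(
            "DescribeTopicRequest names %d topics; limit is %d"
            % (len(topic_names), MAX_TOPICS_PER_DESCRIBE))
    return topic_names

print(within_describe_limit(["t%d" % i for i in range(21)]))  # False
```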
> >>>> >> >> Adding limits is simple for GetReplicaLogInfoRequest -- we can just say that only 2000 partitions at a time can be requested. For DescribeTopicRequest we can probably just limit to 20 topics or something like that, to avoid the complexity of doing pagination in this KIP.
> >>>> >> >>
> >>>> >> >> > 2. I can let the broker load the ELR info so that they can serve the DescribeTopicRequest as well.
> >>>> >> >>
> >>>> >> >> Yes, it's fine to add to MetadataCache. In fact, you'll be loading it anyway once it's added to PartitionImage.
> >>>> >> >>
> >>>> >> >> > 3. Yeah, it does not make sense to have the topic id if DescribeTopicRequest is only used by the admin client.
> >>>> >> >>
> >>>> >> >> OK. That makes things simpler. We can always create a new API later (hopefully not in this KIP!) to query by topic ID.
> >>>> >> >>
> >>>> >> >> > Metrics
> >>>> >> >> >
> >>>> >> >> > As for overall cluster health metrics, I think under-min-ISR is still a useful one. ELR is more like a safety belt. When the ELR is used, the cluster availability has already been impacted.
> >>>> >> >> >
> >>>> >> >> > Maybe we can have a metric to count the partitions where sum(ISR, ELR) < min ISR. What do you think?
> >>>> >> >>
> >>>> >> >> How about:
> >>>> >> >>
> >>>> >> >> A. a metric for the total number of under-min-isr partitions? We don't have that in Apache Kafka at the moment.
> >>>> >> >>
> >>>> >> >> B.
> >>>> >> >> a metric for the number of unclean leader elections we did (for simplicity, it can reset to 0 on controller restart: we expect people to monitor the change over time anyway)
> >>>> >> >>
> >>>> >> >> best,
> >>>> >> >> Colin
> >>>> >> >>
> >>>> >> >> >
> >>>> >> >> > Yeah, for the ongoing unclean recoveries, the controller can keep an accurate count through failover because the partition registration can indicate whether a recovery is needed. However, for the completed ones, unless we want to persist the number somewhere, we can only figure it out from the log.
> >>>> >> >> >
> >>>> >> >> > On Tue, Sep 12, 2023 at 3:16 PM Colin McCabe <cmcc...@apache.org> wrote:
> >>>> >> >> >
> >>>> >> >> >> Also, we should have metrics that show what is going on with regard to the eligible replica set. I'm not sure exactly what to suggest, but something that could identify when things are going wrong in the cluster.
> >>>> >> >> >>
> >>>> >> >> >> For example, maybe a metric for partitions containing replicas that are ineligible to be leader? That would show a spike when a broker had an unclean restart.
> >>>> >> >> >>
> >>>> >> >> >> Ideally, we'd also have a metric that indicates when an unclean leader election or a recovery happened. It's a bit tricky because the simple thing, of tracking it per controller, may be a bit confusing during failovers.
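The metrics being discussed could be tracked in-memory roughly like this. Class and method names are hypothetical; the recovery counter simply resets when the controller process restarts, per the simplification suggested above.

```python
class ControllerMetrics:
    """Hypothetical in-memory tracker for the two proposed metrics."""

    def __init__(self):
        # Gauge: partitions currently below min.insync.replicas.
        self.under_min_isr_partition_count = 0
        # Counter: unclean recoveries finished since this controller started
        # (resets to 0 on restart; operators watch the change over time).
        self.unclean_recovery_finished_count = 0

    def record_isr_change(self, was_under_min_isr, is_under_min_isr):
        # Adjust the gauge only on transitions across the min-ISR boundary.
        if is_under_min_isr and not was_under_min_isr:
            self.under_min_isr_partition_count += 1
        elif was_under_min_isr and not is_under_min_isr:
            self.under_min_isr_partition_count -= 1

    def record_unclean_recovery_finished(self):
        self.unclean_recovery_finished_count += 1
```

In a real broker these would be registered as JMX gauges; the sketch only shows the bookkeeping.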
> >>>> >> >> >>
> >>>> >> >> >> best,
> >>>> >> >> >> Colin
> >>>> >> >> >>
> >>>> >> >> >> On Tue, Sep 12, 2023, at 14:25, Colin McCabe wrote:
> >>>> >> >> >> > Hi Calvin,
> >>>> >> >> >> >
> >>>> >> >> >> > Thanks for the KIP. I think this is a great improvement.
> >>>> >> >> >> >
> >>>> >> >> >> >> Additional High Watermark advance requirement
> >>>> >> >> >> >
> >>>> >> >> >> > Typo: change "advance" to "advancement"
> >>>> >> >> >> >
> >>>> >> >> >> >> A bit recap of some key concepts.
> >>>> >> >> >> >
> >>>> >> >> >> > Typo: change "bit" to "quick"
> >>>> >> >> >> >
> >>>> >> >> >> >> Ack=1/all produce request. It defines when the Kafka server should respond to the produce request
> >>>> >> >> >> >
> >>>> >> >> >> > I think this section would be clearer if we talked about the new high watermark advancement requirement first, and THEN talked about its impact on acks=0, acks=1, and acks=all. acks=all is of course the main case we care about here, so it would be good to lead with that, rather than delving into the technicalities of acks=0/1 first.
> >>>> >> >> >> >
> >>>> >> >> >> >> Unclean recovery
> >>>> >> >> >> >
> >>>> >> >> >> > So, here you are introducing a new configuration, unclean.recovery.strategy. The difficult thing here is that there is a lot of overlap with unclean.leader.election.enable. So we have 3 different settings for unclean.recovery.strategy, plus 2 different settings for unclean.leader.election.enable, giving a cross product of 6 different options. The following "unclean recovery manager" section only applies to one of those 6 different possibilities (I think?)
> >>>> >> >> >> >
> >>>> >> >> >> > I simply don't think we need so many different election types. Really the use-cases we need are people who want NO unclean elections, people who want "the reasonable thing", and people who want availability at all costs.
> >>>> >> >> >> >
> >>>> >> >> >> > Overall, I feel like the first half of the KIP is about the ELR, and the second half is about reworking unclean leader election. It might be better to move that second half to a separate KIP so that we can figure it out fully. It should be fine to punt this until later and just have the current behavior on empty ELR be waiting for the last known leader to return. After all, that's what we do today.
> >>>> >> >> >> >
> >>>> >> >> >> >> DescribeTopicRequest
> >>>> >> >> >> >
> >>>> >> >> >> > Is the intention for AdminClient to use this RPC for Admin#describeTopics? If so, we need to describe all of the changes to the admin client API, as well as changes to command-line tools like kafka-topics.sh (if there are any). For example, you will probably need changes to TopicDescription.java. You will also need to provide all of the things that admin client needs -- for example, TopicAuthorizedOperations.
> >>>> >> >> >> >
> >>>> >> >> >> > I also don't think the controller should serve this request. We want to minimize load on the controller. Just like with the other metadata requests like MetadataRequest, this should be served by brokers.
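To make the AdminClient impact concrete, the per-partition description might grow ELR fields along these lines. This is a hypothetical Python mock-up of the shape of the change to TopicDescription/TopicPartitionInfo, not the KIP's actual Java API; all field names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TopicPartitionInfo:
    """Sketch of a partition description extended with ELR information.
    eligible_leader_replicas / last_known_elr are hypothetical names."""
    partition: int
    leader: Optional[int]
    replicas: List[int]
    isr: List[int]
    eligible_leader_replicas: List[int] = field(default_factory=list)
    last_known_elr: List[int] = field(default_factory=list)

info = TopicPartitionInfo(partition=0, leader=1, replicas=[1, 2, 3],
                          isr=[1], eligible_leader_replicas=[2])
print(info.eligible_leader_replicas)  # [2]
```

A kafka-topics.sh --describe style tool would then print these extra lists alongside the ISR.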
> >>>> >> >> >> >
> >>>> >> >> >> > It's a bit confusing why both topic ID and topic name are provided to this API. Is the intention that callers should set one but not the other? Or both? This needs to be clarified. Also, if we do want to support lookups by UUID, that is another thing that needs to be added to AdminClient.
> >>>> >> >> >> >
> >>>> >> >> >> > In general, I feel like this should also probably be its own KIP, since it's fairly complex.
> >>>> >> >> >> >
> >>>> >> >> >> > best,
> >>>> >> >> >> > Colin
> >>>> >> >> >> >
> >>>> >> >> >> > On Thu, Aug 10, 2023, at 15:46, Calvin Liu wrote:
> >>>> >> >> >> >> Hi everyone,
> >>>> >> >> >> >> I'd like to discuss a series of enhancements to the replication protocol.
> >>>> >> >> >> >>
> >>>> >> >> >> >> A partition replica can experience local data loss in unclean shutdown scenarios where unflushed data in the OS page cache is lost - such as an availability zone power outage or a server error. The Kafka replication protocol is designed to handle these situations by removing such replicas from the ISR and only re-adding them once they have caught up and therefore recovered any lost data. This prevents replicas that lost an arbitrary log suffix, which included committed data, from being elected leader.
> >>>> >> >> >> >> However, there is a "last replica standing" state which > when > >>>> >> combined > >>>> >> >> >> with > >>>> >> >> >> >> a data loss unclean shutdown event can turn a local data > loss > >>>> >> >> scenario > >>>> >> >> >> into > >>>> >> >> >> >> a global data loss scenario, i.e., committed data can be > >>>> removed > >>>> >> from > >>>> >> >> >> all > >>>> >> >> >> >> replicas. When the last replica in the ISR experiences an > >>>> unclean > >>>> >> >> >> shutdown > >>>> >> >> >> >> and loses committed data, it will be reelected leader > after > >>>> >> starting > >>>> >> >> up > >>>> >> >> >> >> again, causing rejoining followers to truncate their logs > and > >>>> >> thereby > >>>> >> >> >> >> removing the last copies of the committed records which > the > >>>> leader > >>>> >> >> lost > >>>> >> >> >> >> initially. > >>>> >> >> >> >> > >>>> >> >> >> >> The new KIP will maximize the protection and provides > >>>> MinISR-1 > >>>> >> >> >> tolerance to > >>>> >> >> >> >> data loss unclean shutdown events. > >>>> >> >> >> >> > >>>> >> >> >> >> > >>>> >> >> >> > >>>> >> >> > >>>> >> > >>>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-966%3A+Eligible+Leader+Replicas > >>>> >> >> >> > >>>> >> >> > >>>> >> > >>>> > >>> >