Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2024-02-29 Thread Andrew Schofield
Hi David, Thanks for the summary. It seems to me that the Flow-like option is best because it can easily handle cancellations and exceptions, returning the topic partition information and signalling when the last of the results have been returned. I think it’s also equally applicable to any of

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2024-02-28 Thread David Arthur
Andrew/Jose, I like the suggested Flow API. It's also similar to the stream observers in GPRC. I'm not sure we should expose something as complex as the Flow API directly in KafkaAdminClient, but certainly we can provide a similar interface. --- Cancellations: Another thing not yet discussed is

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2024-02-23 Thread José Armando García Sancio
Hi Calvin On Fri, Feb 23, 2024 at 9:23 AM Calvin Liu wrote: > As we agreed to implement the pagination for the new API > DescribeTopicPartitions, the client side must also add a proper interface > to handle the pagination. > The current KafkaAdminClient.describeTopics returns > the

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2024-02-23 Thread Andrew Schofield
This is an interesting problem. While it would be possible to use a Consumer, it can’t handle exceptions really. A java.util.Stream has a similar problem. I wonder whether an interface which looks like java.util.concurrent.Flow.Subscriber would be good. Something like: public interface

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2024-02-23 Thread David Arthur
Thanks for raising this here, Calvin. Since this is the first "streaming results" type API in KafkaAdminClient (as far as I know), we're treading new ground here. As you mentioned, we can either accept a consumer or return some iterable result. Returning a java.util.Stream is also an option, and

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2024-02-23 Thread Calvin Liu
Hey, As we agreed to implement the pagination for the new API DescribeTopicPartitions, the client side must also add a proper interface to handle the pagination. The current KafkaAdminClient.describeTopics returns the DescribeTopicsResult which is the future for querying all the topics. It is

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-10-11 Thread Calvin Liu
Hi David, Thanks for the comment. Yes, we can separate the ELR enablement from the metadata version. It is also helpful to avoid blocking the following MV releases if the user is not ready for ELR. One thing to correct is that, the Unclean recovery is controlled by

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-10-11 Thread David Arthur
One thing we should consider is a static config to totally enable/disable the ELR feature. If I understand the KIP correctly, we can effectively disable the unclean recovery by setting the recovery strategy config to "none". This would make development and rollout of this feature a bit smoother.

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-10-11 Thread Calvin Liu
Hi Jun -Good catch, yes, we don't need the -1 in the DescribeTopicRequest. -No new value is added. The LeaderRecoveryState will still be set to 1 if we have an unclean leader election. The unclean leader election includes the old random way and the unclean recovery. During the unclean recovery,

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-10-11 Thread Jun Rao
Hi, Calvin, Another thing. Currently, when there is an unclean leader election, we set the LeaderRecoveryState in PartitionRecord and PartitionChangeRecord to 1. With the KIP, will there be new values for LeaderRecoveryState? If not, when will LeaderRecoveryState be set to 1? Thanks, Jun On

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-10-10 Thread Jun Rao
Hi, Calvin, One more comment. "The first partition to fetch details for. -1 means to fetch all partitions." It seems that FirstPartitionId of 0 naturally means fetching all partitions? Thanks, Jun On Tue, Oct 10, 2023 at 12:40 PM Calvin Liu wrote: > Hi Jun, > Yeah, with the current Metadata

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-10-10 Thread Calvin Liu
Hi Jun, Yeah, with the current Metadata request handling, we only return errors on the Topic level, like topic not found. It seems that querying a specific partition is not a valid use case. Will update. Thanks On Tue, Oct 10, 2023 at 11:55 AM Jun Rao wrote: > Hi, Calvin, > > 60. If the range

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-10-10 Thread Jun Rao
Hi, Calvin, 60. If the range query has errors for some of the partitions, do we expect different responses when querying particular partitions? Thanks, Jun On Tue, Oct 10, 2023 at 10:50 AM Calvin Liu wrote: > Hi Jun > 60. Yes, it is a good question. I was thinking the API could be flexible

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-10-10 Thread Calvin Liu
Hi Jun 60. Yes, it is a good question. I was thinking the API could be flexible to query the particular partitions if the range query has errors for some of the partitions. Not sure whether it is a valid assumption, what do you think? 61. Good point, I will update them to partition level with the

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-10-10 Thread Jun Rao
Hi, Calvin, A few more minor comments on your latest update. 60. DescribeTopicRequest: When will the Partitions field be used? It seems that the FirstPartitionId field is enough for AdminClient usage. 61. Could we make the limit for DescribeTopicRequest, ElectLeadersRequest, GetReplicaLogInfo

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-10-04 Thread Calvin Liu
Hi David, Thanks for the comments. I thought that a new snapshot with the downgraded MV is created in this case. Isn’t it the case? Yes, you are right, a metadata delta will be generated after the MV downgrade. Then the user can start the software downgrade. - Could you also elaborate a

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-10-04 Thread David Jacot
Hi Calvin, I thought that a new snapshot with the downgraded MV is created in this case. Isn’t it the case? Could you also elaborate a bit more on the reasoning behind adding the limits to the admin RPCs? This is a new pattern in Kafka so it would be good to clear on the motivation. Could you

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-10-04 Thread Calvin Liu
Hi Jun, After the MV downgrade, the controller will write in the old version of the PartitionRecord/PartitionChangeRecord. If I understand correctly, it is possible to downgrade the software version if the controller only has to handle old version records. However, the controller will not

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-10-04 Thread Jun Rao
Hi, Calvin and Justine, Historically, when we change the record format in the log, we don't support software version downgrading. For the record format change in the metadata log, have we thought about forcing the write of the latest metadata records with the old version during MV downgrading?

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-10-04 Thread Justine Olshan
Sorry -- not MV but software version. On Wed, Oct 4, 2023 at 9:51 AM Justine Olshan wrote: > Catching up with this discussion. > > I was just curious -- have we had other instances where downgrading MV is > not supported? I think Kafka typically tries to support downgrades, and I > couldn't

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-10-04 Thread Justine Olshan
Catching up with this discussion. I was just curious -- have we had other instances where downgrading MV is not supported? I think Kafka typically tries to support downgrades, and I couldn't think of other examples. Thanks, Justine On Wed, Oct 4, 2023 at 9:40 AM Calvin Liu wrote: > Hi Jun, >

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-10-04 Thread Calvin Liu
Hi Jun, 54. Marked the software downgrading is not supported. As the old controller will not understand the new PartitionRecord and PartitionChangeRecord. Thanks! On Wed, Oct 4, 2023 at 9:12 AM Jun Rao wrote: > Hi, Calvin, > > Thanks for the reply. Just one more comment. > > 54. It seems that

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-10-04 Thread Jun Rao
Hi, Calvin, Thanks for the reply. Just one more comment. 54. It seems that downgrading MV is supported. Is downgrading the software version supported? It would be useful to document that. Thanks, Jun On Tue, Oct 3, 2023 at 4:55 PM Artem Livshits wrote: > Hi Colin, > > I think in your

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-10-03 Thread Artem Livshits
Hi Colin, I think in your example "do_unclean_recovery" would need to do different things depending on the strategy. do_unclean_recovery() { if (unclean.recovery.manager.enabled) { if (strategy == Aggressive) use UncleanRecoveryManager(waitLastKnownERL=false) // just inspect logs

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-10-03 Thread Calvin Liu
Hi Jun, Thanks for the comments. And thanks to Colin's explanation. 41. The unclean recovery manager may need time to be ready for production. So adding a flag to disable the new feature is in case it has any problems. The config may be deprecated or removed when we are ready. 50. Sorry for the

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-10-03 Thread David Arthur
Calvin, thanks for the KIP! I'm getting up to speed on the discussion. I had a few questions 57. When is the CleanShutdownFile removed? I think it probably happens after registering with the controller, but it would be good to clarify this. 58. Since the broker epoch comes from the controller,

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-10-03 Thread Colin McCabe
On Tue, Oct 3, 2023, at 10:49, Jun Rao wrote: > Hi, Calvin, > > Thanks for the update KIP. A few more comments. > > 41. Why would a user choose the option to select a random replica as the > leader instead of using unclean.recovery.strateg=Aggressive? It seems that > the latter is strictly better?

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-10-03 Thread Jun Rao
Hi, Calvin, Thanks for the update KIP. A few more comments. 41. Why would a user choose the option to select a random replica as the leader instead of using unclean.recovery.strateg=Aggressive? It seems that the latter is strictly better? If that's not the case, could we fold this option under

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-09-25 Thread Calvin Liu
Hi Jun Thanks for the comments. 40. If we change to None, it is not guaranteed for no data loss. For users who are not able to validate the data with external resources, manual intervention does not give a better result but a loss of availability. So practically speaking, the Balance mode would

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-09-25 Thread Colin McCabe
On Mon, Sep 25, 2023, at 15:48, Jun Rao wrote: > Hi, Calvin, > > Thanks for the updated KIP. Made another pass. A few more comments below. > > 40. unclean.leader.election.enable.false -> > unclean.recovery.strategy.Balanced: The Balanced mode could still lead to > data loss. So, I am wondering if

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-09-25 Thread Jun Rao
Hi, Calvin, Thanks for the updated KIP. Made another pass. A few more comments below. 40. unclean.leader.election.enable.false -> unclean.recovery.strategy.Balanced: The Balanced mode could still lead to data loss. So, I am wondering if unclean.leader.election.enable.false should map to None?

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-09-19 Thread Colin McCabe
Hi Calvin, Thanks for the explanations. I like the idea of using none, balanced, aggressive. We also had an offline discussion about why it is good to use a new config key (basically, so that we can deprecate the old one which had only false/true values in 4.0) With these changes, I am +1.

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-09-18 Thread Calvin Liu
Hi Colin, Also, can we deprecate unclean.leader.election.enable in 4.0? Before that, we can have both the config unclean.recovery.strategy and unclean.leader.election.enable and using the unclean.recovery.Enabled to determine which config to use during the unclean leader election. On Mon, Sep 18,

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-09-18 Thread Calvin Liu
Hi Colin, For the unclean.recovery.strategy config name, how about we use the following None. It basically means no unclean recovery will be performed. Aggressive. It means availability goes first. Whenever the partition can't elect a durable replica, the controller will try the unclean recovery.

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-09-15 Thread Calvin Liu
Hi Colin, > So, the proposal is that if someone sets "unclean.leader.election.enable = true"... The idea is to use one of the unclean.leader.election.enable and unclean.recovery.strategy based on the unclean.recovery.Enabled. A possible version can be If unclean.recovery.Enabled: { Check

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-09-15 Thread Colin McCabe
On Thu, Sep 14, 2023, at 22:23, Calvin Liu wrote: > Hi Colin > 1. I think using the new config name is more clear. >a. The unclean leader election is actually removed if unclean > recovery is in use. >b. Using multiple values in unclean.leader.election.enable is > confusing and it

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-09-14 Thread Calvin Liu
Hi Colin 1. I think using the new config name is more clear. a. The unclean leader election is actually removed if unclean recovery is in use. b. Using multiple values in unclean.leader.election.enable is confusing and it will be more confusing after people forget about this

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-09-14 Thread Colin McCabe
Hi Calvin, Thanks for the changes. 1. Earlier I commented that creating "unclean.recovery.strategy " is not necessary, and we can just reuse the existing "unclean.leader.election.enable" configuration key. Let's discuss that. 2.I also don't understand why you didn't add a configuration to

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-09-14 Thread Calvin Liu
Hi Colin Thanks for the comments! I did the following changes 1. Simplified the API spec section to only include the diff. 2. Reordered the HWM requirement section. 3. Removed the URM implementation details to keep the necessary characteristics to perform the unclean

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-09-13 Thread Colin McCabe
On Tue, Sep 12, 2023, at 17:21, Calvin Liu wrote: > Hi Colin > Thanks for the comments! > Hi Calvin, Thanks again for the KIP. One meta-comment: it's usually better to just do a diff on a message spec file or java file if you're including changes to it in the KIP. This is easier to read than

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-09-12 Thread Calvin Liu
Hi Colin Thanks for the comments! Rewrite the Additional High Watermark advancement requirement There was feedback on this section that some readers may not be familiar with HWM and Ack=0,1,all requests. This can help them understand the proposal. I will rewrite this part for more readability.

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-09-12 Thread Calvin Liu
Hi Jun, I have a second thought on the kafka.replication.electable_replicas_count. I don't think it is a useful metric and it seems wasteful to have a per partition metric to track the sum(ISR, ELR). I will remove this metric. On Tue, Sep 12, 2023 at 3:35 PM Calvin Liu wrote: > Hi Jun, > Thanks

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-09-12 Thread Calvin Liu
Hi Jun, Thanks for the follow-up. Adjusted the KIP as following. 16. The controller does not know whether a broker is live or not when it is fenced. Updated the behavior. In Balanced mode, the URM will start recovery when all the members in the LastKnownELR are all unfenced. In the Proactive

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-09-12 Thread Colin McCabe
Also, we should have metrics that show what is going on with regard to the eligible replica set. I'm not sure exactly what to suggest, but something that could identify when things are going wrong in the clsuter. For example, maybe a metric for partitions containing replicas that are

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-09-12 Thread Colin McCabe
Hi Calvin, Thanks for the KIP. I think this is a great improvement. > Additional High Watermark advance requirement Typo: change "advance" to "advancement" > A bit recap of some key concepts. Typo: change "bit" to "quick" > Ack=1/all produce request. It defines when the Kafka server should

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-09-12 Thread Jun Rao
Hi, Calvin, Thanks for the reply and the updated KIP. A few more comments. 16. "The URM will query all the replicas including the fenced replicas." Currently, could we tell whether a fenced broker is alive or not? Ideally, we only want to query the live brokers. Otherwise URM will be forced to

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-09-08 Thread Calvin Liu
Hi Artem Thanks so much for the comments! 1. Yes, you are right, when the leader gets fenced, it will be put into ELR. The unclean recovery can only be triggered if the mode is Proactive. Let me clarify the trigger requirement in the KIP. 2. Good point, the controller should wait for all the

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-09-08 Thread Calvin Liu
Hi Jun Thanks for the comments! 10. Updated 11. It is mentioned in the ELR invariants section. It is mainly to save metadata space. 12. Good catch, let me update the graph. 13. The thing we did not change is the original HWM advance requirement “replicate to maximum ISR”. The proposal adds

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-09-07 Thread Artem Livshits
Hi Calvin, Thanks for the KIP. The new ELR protocol looks good to me. I have some questions about unclean recovery, specifically in "balanced" mode: 1. The KIP mentions that the controller would trigger unclear recovery when the leader is fenced, but my understanding is that when a leader is

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-09-07 Thread Jun Rao
Hi, Calvin, Thanks for the KIP. A few comments below. 10. "The High Watermark forwarding still requires a quorum within the ISR." Should it say that it requires the full ISR? 11. "remove the duplicate member in both ISR and ELR from ELR." Hmm, not sure that I follow since the KIP says that ELR

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-09-06 Thread Calvin Liu
Hi Jack Thanks for the comment. I have updated the reassignment part. Now the reassignment can only be completed/canceled if the final ISR size is larger than min ISR. Thanks to your efforts of the TLA+! It has been a great help to the KIP! On Wed, Sep 6, 2023 at 6:32 AM Jack Vanlightly wrote:

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-09-06 Thread Jack Vanlightly
Hi Calvin, Regarding partition reassignment, I have two comments. I notice the KIP says "The AlterPartitionReassignments will not change the ELR" however, when a reassignment completes (or reverts) any replicas removed from the replica set would be removed from the ELR. Sounds obvious but I

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-09-01 Thread Calvin Liu
Hi Justine 1. With the new proposal, in order to let the consumer consume a message when only 1 replica commits it to its log, the min ISR has to be set to 1. 2. Yes, we will need an unclean recovery if the leader has an unclean shutdown. 3. If the min ISR config is changed to a larger value, the

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-08-31 Thread Justine Olshan
Hey Calvin, Thanks for the responses. I think I understood most of it, but had a few follow up questions 1. For the acks=1 case, I was wondering if there is any way to continue with the current behavior (ie -- we only need one ack to produce to the log and consider the request complete.) My

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-08-31 Thread Calvin Liu
Hi Justine Thanks for the questions! *a. For my understanding, will we block replication? Or just the high watermark advancement?* - The replication will not be blocked. The followers are free to replicate messages above the HWM. Only HWM advancement is blocked. b. *Also in the acks=1 case,

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-08-29 Thread Justine Olshan
Hey Calvin, Thanks for the KIP. This will close some of the gaps in leader election! I has a few questions: *>* *High Watermark can only advance if the ISR size is larger or equal to min.insync.replicas*. For my understanding, will we block replication? Or just the high watermark advancement?

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-08-14 Thread Calvin Liu
1. Yes, the new protocol requires 2 things to advance the HWM. a) The messages have been replicated to the controller-committed ISR members. b) The number of ISR members should be at least the min ISR. 2. With the current protocol, we are not able to select broker 1 as the leader.

Re: [DISCUSS] KIP-966: Eligible Leader Replicas

2023-08-11 Thread Jeff Kim
Hi Calvin, Thanks for the KIP! I'm still digesting it but I have two questions: > In the scenario raised in the motivation section, the server may receive ack=1 messages during T1 and advance High Watermark when the leader is the only one in ISR. To confirm, the current protocol allows

[DISCUSS] KIP-966: Eligible Leader Replicas

2023-08-10 Thread Calvin Liu
Hi everyone, I'd like to discuss a series of enhancement to the replication protocol. A partition replica can experience local data loss in unclean shutdown scenarios where unflushed data in the OS page cache is lost - such as an availability zone power outage or a server error. The Kafka