Re: [VOTE] KIP-858: Handle JBOD broker disk failure in KRaft

2023-12-04 Thread Igor Soarez
Hi Viktor, Thanks for pointing this out. I forgot to make this clear in the KIP. I'll update it. ClusterAction on Cluster resource is exactly right, see `ControllerApis.handleAssignReplicasToDirs`. [1] -- Igor [1]:

Re: [VOTE] KIP-858: Handle JBOD broker disk failure in KRaft

2023-12-04 Thread Colin McCabe
Yes, this should be CLUSTER_ACTION on CLUSTER, to be consistent with our other internal RPCs. best, Colin On Mon, Dec 4, 2023, at 04:28, Viktor Somogyi-Vass wrote: > Hi Igor, > > I'm just reading through your KIP and noticed that the new protocol you > created doesn't say anything about ACLs

Re: [VOTE] KIP-858: Handle JBOD broker disk failure in KRaft

2023-12-04 Thread Viktor Somogyi-Vass
Hi Igor, I'm just reading through your KIP and noticed that the new protocol you created doesn't say anything about ACLs of the new AssignReplicasToDirs API. Would it make sense to authorize these requests as other inter-broker protocol calls are usually authorized, that is ClusterAction on

Re: [VOTE] KIP-858: Handle JBOD broker disk failure in KRaft

2023-11-28 Thread Igor Soarez
Hi everyone, There have been a number of further changes. I have updated the KIP to reflect them, but for reference, I'd also like to update this thread with a summary. 1. The reserved Uuids and their names for directories have changed. The first 100 Uuids are reserved for future use. 2.
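The reservation rule in point 1 above can be sketched roughly as follows. This is an illustrative Python sketch only, not the Kafka implementation: it assumes "the first 100 Uuids" means the 100 numerically lowest 128-bit values, and the names `RESERVED_COUNT`, `is_reserved` and `random_directory_id` are hypothetical.

```python
import uuid

# From the thread: "The first 100 Uuids are reserved for future use."
# Assumption: "first" means the 100 numerically lowest 128-bit values.
RESERVED_COUNT = 100


def is_reserved(directory_id: uuid.UUID) -> bool:
    """Return True if the ID falls inside the assumed reserved range."""
    return directory_id.int < RESERVED_COUNT


def random_directory_id() -> uuid.UUID:
    """Generate a random directory ID, retrying if it lands in the reserved range."""
    while True:
        candidate = uuid.uuid4()
        if not is_reserved(candidate):
            return candidate
```

A randomly generated ID essentially never collides with the reserved range, so the retry loop is a guard rather than a hot path.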

Re: [VOTE] KIP-858: Handle JBOD broker disk failure in KRaft

2023-10-09 Thread Colin McCabe
On Fri, Oct 6, 2023, at 18:30, Igor Soarez wrote: > Hi Colin, > >> I would call #2 LOST. It was assigned in the past, but we don't know where. >> I see that you called this OFFLINE. This is not really normal... >> it should happen only when we're migrating from ZK mode to KRaft mode, >> or going

Re: [VOTE] KIP-858: Handle JBOD broker disk failure in KRaft

2023-10-06 Thread Igor Soarez
Hi Colin, > I would call #2 LOST. It was assigned in the past, but we don't know where. > I see that you called this OFFLINE. This is not really normal... > it should happen only when we're migrating from ZK mode to KRaft mode, > or going from an older KRaft release with multiple directories to

Re: [VOTE] KIP-858: Handle JBOD broker disk failure in KRaft

2023-10-06 Thread Igor Soarez
Hi David, Thanks for shedding light on migration goals, makes sense. Your preference for option a) makes it even more attractive. We'll keep that as the preferred approach, thanks for the advice. > One question with this approach is how the KRaft controller learns about > the multiple log

Re: [VOTE] KIP-858: Handle JBOD broker disk failure in KRaft

2023-10-06 Thread Colin McCabe
Thanks for posting these notes, Igor. I think we should definitely distinguish between these two cases: 1. this replica hasn't been assigned to a storage directory 2. we don't know which storage directory this replica was assigned to in the past I would call #1 UNASSIGNED. I see that you

Re: [VOTE] KIP-858: Handle JBOD broker disk failure in KRaft

2023-10-05 Thread David Arthur
Hey, just chiming in regarding the ZK migration piece. Generally speaking, one of the design goals of the migration was to have minimal changes on the ZK brokers and especially the ZK controller. Since ZK mode is our safe/well-known fallback mode, we wanted to reduce the chances of introducing

Re: [VOTE] KIP-858: Handle JBOD broker disk failure in KRaft

2023-10-04 Thread Igor Soarez
Hi everyone, Earlier today Colin, Ron, Proven and I had a chat about this work. We discussed several aspects which I’d like to share here. ## A new reserved UUID We'll reserve a third UUID to indicate an unspecified dir, but one that is known to be selected. As opposed to the default

Re: [VOTE] KIP-858: Handle JBOD broker disk failure in KRaft

2023-09-27 Thread Igor Soarez
Hi everyone, After a conversation with Colin McCabe and Proven Provenzano yesterday, we decided that the benefits outweigh the concerns with the overhead of associating a directory UUID to every replica in the metadata partition records. i.e. We prefer to always associate the log dir UUID even

Re: [VOTE] KIP-858: Handle JBOD broker disk failure in KRaft

2023-09-25 Thread Igor Soarez
Hi Ron, I think we can generalize the deconfigured directory scenario in your last question to address this situation too. When handling a broker registration request, the controller can check if OfflineLogDirs=false and any UUIDs are missing in OnlineLogDirs, compared with the previous
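The registration check described above can be sketched roughly as follows. This is an illustrative Python sketch, not the controller's code: the function name `missing_log_dirs` and its parameters are hypothetical, and it only computes which directory UUIDs disappeared between registrations. What the controller then does with that result is not shown in this excerpt and is deliberately left out.

```python
from __future__ import annotations

from uuid import UUID


def missing_log_dirs(
    previous_online: set[UUID],
    current_online: set[UUID],
    offline_log_dirs: bool,
) -> set[UUID]:
    """Return directory UUIDs present in the previous registration but absent now.

    Mirrors the condition in the message: the check only applies when the
    broker reports OfflineLogDirs=false.
    """
    if offline_log_dirs:
        # The broker itself reports offline directories, so the check in the
        # message does not apply.
        return set()
    return previous_online - current_online
```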

Re: [VOTE] KIP-858: Handle JBOD broker disk failure in KRaft

2023-09-24 Thread Ron Dagostino
Hi Igor. I've opened https://issues.apache.org/jira/browse/KAFKA-15495 to identify the data loss scenario that I was referring to, including steps on how to reproduce it. I agree this is a separate issue from JBOD per se. The disk UUID that this KIP introduces does give us the ability to avoid

Re: [VOTE] KIP-858: Handle JBOD broker disk failure in KRaft

2023-09-23 Thread Igor Soarez
Hi everyone, I believe we can close this voting thread now, as there were three +1 binding votes from Ziming, Mickael and Ron. With that, this vote passes. Thanks to everyone who participated in reviewing, and/or taking the time to vote on this KIP! Best, -- Igor

Re: [VOTE] KIP-858: Handle JBOD broker disk failure in KRaft

2023-09-23 Thread Igor Soarez
Hi Ron, Thanks for pointing this out. I was assuming this case was already handled in current KRaft operation with a single log directory — I wouldn't expect a broker restarting with an empty disk to cause data loss in a current KRaft system. But your question made me go look again, so here's

Re: [VOTE] KIP-858: Handle JBOD broker disk failure in KRaft

2023-09-22 Thread Ron Dagostino
Hi Igor. Someone just asked about the case where a broker with a single log directory restarts with a blank disk. I looked at the "Metadata caching" section, and I don't think it covers it as currently written. The PartitionRecord will not have an explicit UUID for brokers that have just a

Re: [VOTE] KIP-858: Handle JBOD broker disk failure in KRaft

2023-09-19 Thread Ron Dagostino
Ok, great, that makes sense, Igor. Thanks. +1 (binding) on the KIP from me. Ron > On Sep 13, 2023, at 11:58 AM, Igor Soarez wrote: > > Hi Ron, > > Thanks for drilling down on this. I think the KIP isn't really clear here, > and the metadata caching section you quoted needs clarification. >

Re: [VOTE] KIP-858: Handle JBOD broker disk failure in KRaft

2023-09-13 Thread Igor Soarez
Hi Ron, Thanks for drilling down on this. I think the KIP isn't really clear here, and the metadata caching section you quoted needs clarification. The "hosting broker's latest registration" refers to the previous, not the current registration. The registrations are only compared by the

Re: [VOTE] KIP-858: Handle JBOD broker disk failure in KRaft

2023-09-12 Thread Ron Dagostino
Thanks, Igor. Regarding the last point, I want to drill in on something. The KIP says this: "Replicas will also be considered offline if the replica references a log directory UUID (in the new field partitionRecord.Assignment.Directory) that is not present in the hosting Broker's latest
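The rule quoted from the KIP can be sketched as below. This is a simplified Python illustration under stated assumptions: `offline_replicas` and its arguments are hypothetical names, partitions are represented as plain strings, and the reserved directory IDs discussed elsewhere in this thread are ignored.

```python
from __future__ import annotations

from uuid import UUID


def offline_replicas(
    assignments: dict[str, UUID],
    registered_dirs: set[UUID],
) -> list[str]:
    """List partitions whose assigned log directory UUID is absent from the
    hosting broker's latest registration, in sorted order."""
    return sorted(
        partition
        for partition, directory in assignments.items()
        if directory not in registered_dirs
    )
```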

Re: [VOTE] KIP-858: Handle JBOD broker disk failure in KRaft

2023-09-12 Thread Igor Soarez
Hi Ron, Thank you for having a look at this KIP. Indeed, the log directory UUID should always be generated and loaded. I've corrected the wording in the KIP to clarify. It is a bit of a pain to replace the field, but I agree that is the best approach for the same reason you pointed out. I

Re: [VOTE] KIP-858: Handle JBOD broker disk failure in KRaft

2023-09-12 Thread Igor Soarez
Hi Ziming, Thank you for having a look and taking the time to vote. I have already opened some PRs, see: https://issues.apache.org/jira/browse/KAFKA-14127 Best, -- Igor

Re: [VOTE] KIP-858: Handle JBOD broker disk failure in KRaft

2023-09-10 Thread Ron Dagostino
Hi Igor. Thanks for all your work here. Before I can vote, I have the following questions/comments about the KIP: > When multiple log.dirs are configured, a new property will be included > in meta.properties — directory.id — which will identify each log directory > with a UUID. The UUID is

Re: [VOTE] KIP-858: Handle JBOD broker disk failure in KRaft

2023-09-05 Thread ziming deng
Hi Igor, I’m +1 (binding) for this, looking forward to the PR. -- Best, Ziming > On Jul 26, 2023, at 01:13, Igor Soarez wrote: > > Hi everyone, > > Following a face-to-face discussion with Ron and Colin, > I have just made further improvements to this KIP: > > > 1. Every log directory gets a

Re: [VOTE] KIP-858: Handle JBOD broker disk failure in KRaft

2023-08-10 Thread Igor Soarez
Hi Mickael, Thanks for voting, and for pointing out the mistake. I've corrected it in the KIP now. The proposed name is "QueuedReplicaToDirAssignments". Best, -- Igor

Re: [VOTE] KIP-858: Handle JBOD broker disk failure in KRaft

2023-08-09 Thread Mickael Maison
Hi Igor, Thanks for the KIP, adding JBOD support to KRaft is really important. I see 2 different metric names mentioned "QueuedReplicaToDirAssignments" and "NumMismatchingReplicaToLogDirAssignments". From the descriptions it seems it's the same metric, can you clarify which name you propose

Re: [VOTE] KIP-858: Handle JBOD broker disk failure in KRaft

2023-07-25 Thread Igor Soarez
Hi Ismael, I believe I have addressed all concerns. Please have a look, and consider a vote on this KIP. Thank you, -- Igor

Re: [VOTE] KIP-858: Handle JBOD broker disk failure in KRaft

2023-07-25 Thread Ismael Juma
Thanks Igor. Are there unaddressed concerns or is this ready for a vote again? Ismael On Tue, Jul 25, 2023 at 6:14 PM Igor Soarez wrote: > Hi everyone, > > Following a face-to-face discussion with Ron and Colin, > I have just made further improvements to this KIP: > > > 1. Every log directory

Re: [VOTE] KIP-858: Handle JBOD broker disk failure in KRaft

2023-07-25 Thread Igor Soarez
Hi everyone, Following a face-to-face discussion with Ron and Colin, I have just made further improvements to this KIP: 1. Every log directory gets a random UUID assigned, even if just one log dir is configured in the Broker. 2. All online log directories are registered, even if just one is

Re: [VOTE] KIP-858: Handle JBOD broker disk failure in KRaft

2023-07-10 Thread Igor Soarez
Hi Colin, Thanks for your questions. Please have a look at my answers below. > In the previous email I asked, "who is responsible for assigning replicas to > broker directories?" Can you clarify what the answer is to that? If the > answer is the controller, there is no need for an "unknown"

Re: [VOTE] KIP-858: Handle JBOD broker disk failure in KRaft

2023-07-03 Thread Colin McCabe
On Mon, Jun 26, 2023, at 05:08, Igor Soarez wrote: > Hi Colin, > > Thanks for your support with getting this over the line and that’s > great re the preliminary pass! Thanks also for sharing your > thoughts, I've had a careful look at each of these and sharing my > comments below. > > I agree, it

Re: [VOTE] KIP-858: Handle JBOD broker disk failure in KRaft

2023-06-26 Thread Igor Soarez
Hi Colin, Thanks for your support with getting this over the line and that’s great re the preliminary pass! Thanks also for sharing your thoughts, I've had a careful look at each of these and sharing my comments below. I agree, it is important to avoid a perf hit on non-JBOD. I've opted for

Re: [VOTE] KIP-858: Handle JBOD broker disk failure in KRaft

2023-06-14 Thread Colin McCabe
Hi Igor, Thanks for all your work on this! You've really done a lot to push it over the finish line. It will be awesome to see this coming to a 3.x release soon. I did a preliminary pass today. Can I please ask that you give us two weeks before you close the vote, since it's such a big change?

Re: [VOTE] KIP-858: Handle JBOD broker disk failure in KRaft

2023-06-13 Thread ziming deng
Hi Igor, Thanks for this work. +1(binding) from me -- Best, Ziming > On Jun 12, 2023, at 18:07, Igor Soarez wrote: > > Hi everyone, > > We're getting closer to dropping ZooKeeper support, and JBOD > in KRaft mode is one of the outstanding big missing features. > > It's been a while since

[VOTE] KIP-858: Handle JBOD broker disk failure in KRaft

2023-06-12 Thread Igor Soarez
Hi everyone, We're getting closer to dropping ZooKeeper support, and JBOD in KRaft mode is one of the outstanding big missing features. It's been a while since there was new feedback on KIP-858 [1] which aims to address this gap, so I'm calling for a vote. A huge thank you to everyone who has