Hi, Not sure what happened, I thought I had saved my changes but upon reopening the page on the wiki, my changes appeared as draft. I saved again and verified the page has updated now.
Thanks, Mickael On Fri, Jul 18, 2025 at 5:52 PM Jun Rao <j...@confluent.io.invalid> wrote: > > Hi, Michael, > > Thanks for the reply. > > It seems that the migration section hasn't changed. Have you updated the > wiki? > > Jun > > On Fri, Jul 18, 2025 at 6:43 AM Mickael Maison <mickael.mai...@gmail.com> > wrote: > > > Hi Jun, > > > > Thanks for taking another look. > > > > JR10: We already have anoother metrics > > kafka.log:type=LogManager,name=LogDirectoryOffline,logDirectory="<PATH>" > > that contains the path in the MBean name. > > The path is between quotes which should avoid issues. So I kept the > > same naming for consistency. > > > > JR11.1: I updated the Compatibility, Deprecation, and Migration Plan > > section to explicitly note that this features requires updating to a > > metadata version that supports it to function. > > The Configurations section already stated that the new configuration > > can only be set with a compatible metadata version. > > JR11.2: Ditto I've explicitly mentioned it in the KIP now. > > > > JR12: Right, as the description says it's a list! I've adjusted the type. > > > > Thanks, > > Mickael > > > > > > On Wed, Jul 16, 2025 at 8:43 PM Jun Rao <j...@confluent.io.invalid> wrote: > > > > > > Hi, Mickael, > > > > > > Thanks for the updated KIP. A few more comments. > > > > > > JR10. > > kafka.log:type=LogManager,name=LogDirectoryCordoned,logDirectory=<PATH>: > > > Path contains /, which has special meaning in mbean. Could we use > > > kafka.log:type=LogManager,name=LogDirectoryCordoned and show the cordoned > > > paths as the value? > > > > > > JR11. Migration plan > > > JR11.1 We need to mention that this feature is gated by a newly > > introduced > > > MV. > > > JR11.2 Since this changes metadata records and we don't support > > downgrading > > > MV yet, we need to add that once the feature is enabled, it can't be > > > downgraded. > > > > > > JR12. cordoned.log.dirs: Should it be of type list and default to empty > > > list? > > > > > > Jun > > > > > > On Mon, Apr 7, 2025 at 7:29 AM Mickael Maison <mickael.mai...@gmail.com> > > > wrote: > > > > > > > Hi David, > > > > > > > > DA1: Done > > > > > > > > DA2.1: Yes because we only have the directory ids in the metadata log. > > > > At the moment only brokers have the path -> id mappings. > > > > > > > > DA2.2: I proposed this mechanism because of DA2.1. Also this is > > > > consistent with other operations like updating log levels. If people > > > > would prefer to have dedicated APIs, I can adjust the proposal. > > > > > > > > DA4: Good idea, I added metrics to the KIP. > > > > > > > > Thanks, > > > > Mickael > > > > > > > > On Mon, Apr 7, 2025 at 4:23 PM Mickael Maison < > > mickael.mai...@gmail.com> > > > > wrote: > > > > > > > > > > Hi Jun, > > > > > > > > > > 10: Yes that's good idea. I've added that to the KIP. > > > > > > > > > > 11: Yes that was my plan. We need to check the metadata version and > > > > > reject updates to the configuration is the metadata version is too > > > > > low. I clarified that in the KIP. > > > > > Did you have anything else in mind (if you can remember what you > > > > > thought over 8 months ago!)? I don't think we need to expose a > > > > > feature. > > > > > > > > > > Thanks, > > > > > Mickael > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Jul 30, 2024 at 4:22 PM David Arthur <mum...@gmail.com> > > wrote: > > > > > > > > > > > > DA1: I like Jun's suggestion of using a wildcard, this would also > > help > > > > with > > > > > > the case I mentioned (cordon a whole broker, regardless of how > > many log > > > > > > dirs). > > > > > > > > > > > > DA2.1: Re: log dir names to UUID mapping -- do you mean a new > > > > CordonLogDirs > > > > > > RPC would need to send the UUID instead of the log dir name? > > > > > > > > > > > > DA2.2: In general, I don't really like the idea of using dynamic > > > > configs as > > > > > > an API for modifying state. I know we do it a lot, but I think > > it's an > > > > > > anti-pattern. It makes it difficult to evolve an API since you're > > stuck > > > > > > with the key/value config system. Also, the dynamic config code is > > a > > > > lot > > > > > > harder to navigate and understand than KafkaApis and MetadataImage > > (my > > > > > > opinion is probably biased here). > > > > > > > > > > > > DA3: I was thinking along the lines of this being a new feature, > > so we > > > > may > > > > > > want to just disable the code paths entirely with a config. For > > > > example, > > > > > > maybe we introduce a rare bug that breaks assignments, or broker > > > > > > registration, or whatever -- it's nicer to have a config to turn > > off a > > > > > > feature rather than having to downgrade. I guess this really > > depends > > > > on how > > > > > > tightly integrated the new feature is with existing code. Let's > > leave > > > > this > > > > > > out of the KIP for now and maybe revisit after the feature is done. > > > > > > > > > > > > DA4: Ok, thanks. So eventually the operator would un-cordon the > > > > brokers to > > > > > > allow normal placements to occur. Since we expect the operator to > > > > > > eventually come back and un-cordon the brokers, this means it will > > > > > > occasionally be forgotten (since it's a human task). WDYT about a > > new > > > > > > broker metric to show the number of cordoned log dirs? This would > > help > > > > > > operators set up monitors like "alert if cordoned count > 0 for 7 > > > > days" or > > > > > > similar. > > > > > > > > > > > > -David > > > > > > > > > > > > On Wed, Jul 24, 2024 at 1:21 PM Jun Rao <j...@confluent.io.invalid> > > > > wrote: > > > > > > > > > > > > > Hi, Mickael, > > > > > > > > > > > > > > Thanks for the KIP. > > > > > > > > > > > > > > 10. A common case is only one log dir per broker. Could we > > support > > > > sth > > > > > > > like --add-config cordoned.log.dirs=* to make it more convenient > > for > > > > this > > > > > > > case? > > > > > > > > > > > > > > 11. Since we changed the metadata record format, should we gate > > the > > > > new > > > > > > > configuration based on a new metadata version? > > > > > > > > > > > > > > Jun > > > > > > > > > > > > > > On Wed, Jul 17, 2024 at 9:18 AM Mickael Maison < > > > > mickael.mai...@gmail.com> > > > > > > > wrote: > > > > > > > > > > > > > > > Hi Kamal, > > > > > > > > > > > > > > > > Good spot, yes this is a typo. The flexibleVersions stays as > > "0+". > > > > Fixed > > > > > > > > > > > > > > > > Thanks, > > > > > > > > Mickael > > > > > > > > > > > > > > > > On Wed, Jul 17, 2024 at 6:14 PM Mickael Maison < > > > > mickael.mai...@gmail.com > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > Hi David, > > > > > > > > > > > > > > > > > > DA1: It's a point I considered, especially being able to > > cordon a > > > > > > > > > whole broker. With the current proposal you just need to set > > > > > > > > > cordoned.log.dirs to the same value as log.dirs. That does > > not > > > > seem > > > > > > > > > excessively difficult. > > > > > > > > > > > > > > > > > > DA2: I did consider using a new metadata record like we do > > for > > > > > > > > > fence/unfence. Since I don't expect cordoned log dirs to > > change > > > > very > > > > > > > > > frequently, and the size should be small, I opted to reuse > > the > > > > > > > > > BrokerRegistrationRecord metadata record. At the moment I > > guess > > > > it was > > > > > > > > > mostly for the convenience while prototyping. Semantically it > > > > probably > > > > > > > > > makes sense to have separate records. Your further point > > suggest > > > > > > > > > design changes in the mechanism as a whole, so let's discuss > > > > these > > > > > > > > > first and we can return to the exact metadata records after. > > > > > > > > > I find the idea of having dedicated RPCs interesting. One of > > the > > > > > > > > > reasons I piggybacked on the heartbeating process is for the > > > > mapping > > > > > > > > > between log directory names and their UUIDs. Currently the > > > > mapping > > > > > > > > > only exists on each broker instance. So if we wanted a > > dedicated > > > > RPC, > > > > > > > > > we would first need to change how we manage the log > > directory to > > > > UUID > > > > > > > > > mappings. I guess this could be done via the > > BrokerRegistration > > > > API. > > > > > > > > > I'm not sure about storing additional metadata (reason, > > > > timestamp). We > > > > > > > > > currently don't do that for any operations > > > > > > > > > (AlterPartitionReassignments, UnregisterBroker). Typically > > these > > > > are > > > > > > > > > stored in the tools used by the operators to drive these > > > > operations. > > > > > > > > > You bring another interesting point about the ability to > > cordon > > > > > > > > > brokers/log directories while they are offline. It's not > > > > something the > > > > > > > > > current proposal supports. I'm not sure how useful this would > > > > turn out > > > > > > > > > it practice. In experience is that brokers are mostly > > online, so > > > > I'd > > > > > > > > > expect the need to do so relatively rare. This also kind of > > > > loops back > > > > > > > > > to KAFKA-17094 [0] and the discussion Gantigmaa started on > > the > > > > dev > > > > > > > > > list [1] about being able to identify the ids of offline (but > > > > still > > > > > > > > > registered) brokers. > > > > > > > > > > > > > > > > > > DA3: With the current proposal, I don't see a reason why you > > > > would > > > > > > > > > want to disable the new behavior. If you don't want to use > > it, > > > > you > > > > > > > > > have nothing to do. It's opt-in as you need to set > > > > cordoned.log.dirs > > > > > > > > > on some brokers to get the new behavior. If you don't want it > > > > anymore, > > > > > > > > > you should unset cordoned.log.dirs. Can you explain why this > > > > would not > > > > > > > > > work? > > > > > > > > > > > > > > > > > > DA4: Yes > > > > > > > > > > > > > > > > > > 0: https://issues.apache.org/jira/browse/KAFKA-17094 > > > > > > > > > 1: > > > > https://lists.apache.org/thread/1rrgbhk43d85wobcp0dqz6mhpn93j9yo > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > Mickael > > > > > > > > > > > > > > > > > > > > > > > > > > > On Sun, Jul 14, 2024 at 10:37 AM Kamal Chandraprakash > > > > > > > > > <kamal.chandraprak...@gmail.com> wrote: > > > > > > > > > > > > > > > > > > > > Hi Mickael, > > > > > > > > > > > > > > > > > > > > In the BrokerHearbeatRequest.json, the flexibleVersions are > > > > bumped > > > > > > > from > > > > > > > > > > "0+" to "1+". Is it a typo? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Jul 12, 2024 at 11:42 PM David Arthur < > > > > mum...@gmail.com> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > Mickael, thanks for the KIP! I think this could be quite > > a > > > > useful > > > > > > > > feature. > > > > > > > > > > > > > > > > > > > > > > DA1: Having to know each of the log dirs for a broker > > seems > > > > a bit > > > > > > > > > > > inconvenient for cases where we want to cordon off a > > whole > > > > broker. > > > > > > > I > > > > > > > > do > > > > > > > > > > > think having the ability to cordon off a specific log > > dir is > > > > useful > > > > > > > > for > > > > > > > > > > > JBOD, but I imagine a common case might be to cordon off > > the > > > > whole > > > > > > > > broker. > > > > > > > > > > > > > > > > > > > > > > DA2: Looks like the new "cordoned.log.dirs" can be > > configured > > > > > > > > statically > > > > > > > > > > > and updated dynamically per-broker. What do you think > > about > > > > a new > > > > > > > > metadata > > > > > > > > > > > record and RPC instead of using a config? From my > > > > understanding, > > > > > > > the > > > > > > > > > > > BrokerRegistration/Heartbeat is more about the lifecycle > > of a > > > > > > > broker > > > > > > > > > > > whereas cordoning a broker is an operator driven action. > > It > > > > might > > > > > > > > make > > > > > > > > > > > sense to have a separate record for this. We could > > include > > > > > > > additional > > > > > > > > > > > fields like a timestamp, a reason/comment field (e.g., > > > > > > > > "decommissioning", > > > > > > > > > > > "disk failure", "new broker" etc), stuff like that. > > > > > > > > > > > > > > > > > > > > > > This would also allow cordoning to be done while a > > broker is > > > > > > > offline > > > > > > > > or > > > > > > > > > > > before it has been provisioned. Not sure how likely that > > is, > > > > but > > > > > > > > might be > > > > > > > > > > > useful? > > > > > > > > > > > > > > > > > > > > > > DA3: Can we consider having a configuration to > > > > enable/disable the > > > > > > > new > > > > > > > > > > > replica placer behavior? This would be separate from the > > new > > > > > > > > > > > MetadataVersion for the RPC/record changes. > > > > > > > > > > > > > > > > > > > > > > DA4: In the Motivation section, you mention the cluster > > > > expansion > > > > > > > > scenario. > > > > > > > > > > > For this scenario, is the expectation that the operator > > will > > > > cordon > > > > > > > > off the > > > > > > > > > > > existing full brokers so placements only happen on the > > new > > > > brokers? > > > > > > > > > > > > > > > > > > > > > > Cheers, > > > > > > > > > > > David > > > > > > > > > > > > > > > > > > > > > > On Fri, Jul 12, 2024 at 8:53 AM Mickael Maison < > > > > > > > > mickael.mai...@gmail.com> > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > Hi Kamal, > > > > > > > > > > > > > > > > > > > > > > > > Thanks for taking a look at the KIP! > > > > > > > > > > > > > > > > > > > > > > > > I briefly considered that option initially but I found > > it > > > > not > > > > > > > very > > > > > > > > > > > > practical once you have more than a few cordoned log > > > > directories. > > > > > > > > > > > > I find your example is already not very easy to read, > > and > > > > it only > > > > > > > > has > > > > > > > > > > > > 2 entries. Also if the configuration is at the cluster > > > > level > > > > > > > it'sis > > > > > > > > > > > > not easy to see if a broker has all its log directories > > > > cordoned, > > > > > > > > and > > > > > > > > > > > > you still need to describe a specific broker's > > > > configuration to > > > > > > > > find > > > > > > > > > > > > the "name" of a log directory you want to cordon. > > > > > > > > > > > > > > > > > > > > > > > > I think an easy way to get an overall view of the > > cordoned > > > > log > > > > > > > > > > > > directories/brokers will be via the kafka-log-dirs.sh > > > > tool. I am > > > > > > > > also > > > > > > > > > > > > considering adding metrics like we have today for > > > > > > > > LogDirectoryOffline > > > > > > > > > > > > to ease monitoring. > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > Mickael > > > > > > > > > > > > > > > > > > > > > > > > On Thu, Jul 11, 2024 at 8:41 PM Kamal Chandraprakash > > > > > > > > > > > > <kamal.chandraprak...@gmail.com> wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Mickael, > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for the KIP! > > > > > > > > > > > > > > > > > > > > > > > > > > This is a useful feature which helps to decommission > > the > > > > nodes > > > > > > > by > > > > > > > > > > > > > essentially > > > > > > > > > > > > > creating a new replica exclude broker list. > > > > > > > > > > > > > > > > > > > > > > > > > > To cordon a list of brokers, we have to apply the > > config > > > > on > > > > > > > each > > > > > > > > of the > > > > > > > > > > > > > broker nodes > > > > > > > > > > > > > and similarly to see the list of cordoned brokers, we > > > > have to > > > > > > > > either > > > > > > > > > > > > query > > > > > > > > > > > > > individual broker > > > > > > > > > > > > > config in the cluster or query each of the broker log > > > > > > > directory. > > > > > > > > > > > > > > > > > > > > > > > > > > Can we move the configuration from the broker to > > cluster > > > > level? > > > > > > > > (eg) > > > > > > > > > > > > > > > > > > > > > > > > > > bin/kafka-configs.sh --bootstrap-server > > localhost:9092 > > > > > > > > > > > --broker-defaults > > > > > > > > > > > > > --alter --add-config > > > > > > > > > > > > > cordoned.log.dirs=[broker.id > > > > > > > > .0]:[log.dir.0],[broker.id.1]:[log.dir.1] > > > > > > > > > > > > > > > > > > > > > > > > > > This will provide a consistent view of the list of > > > > cordoned > > > > > > > > brokers in > > > > > > > > > > > > the > > > > > > > > > > > > > cluster. > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > Kamal > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Jul 10, 2024 at 7:53 PM Mickael Maison < > > > > > > > > > > > mickael.mai...@gmail.com > > > > > > > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Luke, > > > > > > > > > > > > > > > > > > > > > > > > > > > > 4. You're right this scenario can happen. In this > > case > > > > I > > > > > > > think > > > > > > > > the > > > > > > > > > > > > > > broker should enforce its new state and not create > > the > > > > > > > replica > > > > > > > > as all > > > > > > > > > > > > > > its log directories are now cordoned. The replica > > will > > > > be > > > > > > > > offline and > > > > > > > > > > > > > > an administrator would need to reassign it to > > another > > > > > > > broker. I > > > > > > > > > > > expect > > > > > > > > > > > > > > most users will rely on kafka-configs.sh to manage > > the > > > > > > > > cordoned log > > > > > > > > > > > > > > directories instead of updating the broker > > properties, > > > > so it > > > > > > > > should > > > > > > > > > > > > > > not be a common issue. > > > > > > > > > > > > > > > > > > > > > > > > > > > > 6. To resolve this issue, the user can uncordon a > > log > > > > > > > > directory and > > > > > > > > > > > > > > retry an operation that triggers the creation of > > the > > > > internal > > > > > > > > topics. > > > > > > > > > > > > > > For example start a consumer using a group should > > make > > > > the > > > > > > > > broker > > > > > > > > > > > > > > retry creating the __consumer_offsets topic. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > Mickael > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Jul 10, 2024 at 4:14 PM Mickael Maison < > > > > > > > > > > > > mickael.mai...@gmail.com> > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Chia-Ping, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Question 1) Yes that's a good summary. I'd also > > add > > > > that > > > > > > > > managing > > > > > > > > > > > > > > > cordoned log directories is intended to be done > > by > > > > cluster > > > > > > > > > > > > > > > administrators who also know about operations > > > > in-progress > > > > > > > or > > > > > > > > > > > planned > > > > > > > > > > > > > > > such as scaling or adding/removing log > > directories. > > > > In > > > > > > > > practice you > > > > > > > > > > > > > > > can't expect users to provide "good" explicit > > > > partition > > > > > > > > assignments > > > > > > > > > > > > as > > > > > > > > > > > > > > > they are not aware of the cluster operations. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Question 2) I don't see an actual question :) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Question 3) My current proposal is that in that > > case, > > > > > > > > brokers will > > > > > > > > > > > > > > > ignore the preferred log directory and place the > > new > > > > > > > replica > > > > > > > > on a > > > > > > > > > > > log > > > > > > > > > > > > > > > directory that is not cordoned. To use > > > > AlterReplicaLogDirs > > > > > > > > you need > > > > > > > > > > > > > > > ALTER permission on the CLUSTER resource. In most > > > > > > > > environments > > > > > > > > > > > users > > > > > > > > > > > > > > > don't have permissions to use that API. Or if > > they > > > > do, I'd > > > > > > > > expect > > > > > > > > > > > > them > > > > > > > > > > > > > > > to also have ALTER_CONFIGS on the BROKER resource > > > > and be > > > > > > > > able to > > > > > > > > > > > > > > > change the cordoned log directories > > configuration. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Question 4) If we don't include the > > CordonedLogDirs > > > > field > > > > > > > in > > > > > > > > > > > > > > > BrokerRegistrationRequest as you said there's a > > small > > > > > > > window > > > > > > > > when > > > > > > > > > > > the > > > > > > > > > > > > > > > controller would not know if a broker can be > > used for > > > > > > > > assignments. > > > > > > > > > > > If > > > > > > > > > > > > > > > all log directories are cordoned the controller > > > > should not > > > > > > > > use that > > > > > > > > > > > > > > > broker for assignments. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Question 5) In KRaft mode offline brokers are > > still > > > > > > > > registered, and > > > > > > > > > > > > > > > can be used for partition assignment. So we need > > to > > > > persist > > > > > > > > the > > > > > > > > > > > > > > > cordoned log directories too to exclude brokers > > that > > > > don't > > > > > > > > have > > > > > > > > > > > > > > > uncordoned log directories. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > Mickael > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Jul 10, 2024 at 12:25 PM Luke Chen < > > > > > > > > show...@gmail.com> > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Mickael, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for the response. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 4. Cordoned log directories are persisted to > > the > > > > > > > > metadata log > > > > > > > > > > > > via the > > > > > > > > > > > > > > > > RegisterBrokerRecord, > > > > BrokerRegistrationChangeRecord > > > > > > > > records. If > > > > > > > > > > > a > > > > > > > > > > > > > > > > broker is offline, the controller will use the > > > > latest > > > > > > > > known state > > > > > > > > > > > > of > > > > > > > > > > > > > > > > the broker to determine the broker's cordoned > > log > > > > > > > > directories. > > > > > > > > > > > I've > > > > > > > > > > > > > > > > added a sentence clarifying this point. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > OK, so if the broker A goes offline, and the > > > > controller > > > > > > > is > > > > > > > > in > > > > > > > > > > > > "fenced" > > > > > > > > > > > > > > > > state, without any cordoned log dirs, then some > > > > topic > > > > > > > > created and > > > > > > > > > > > > > > assigned > > > > > > > > > > > > > > > > to broker A. Later, broker A starts up with all > > > > its log > > > > > > > > dirs > > > > > > > > > > > > cordoned > > > > > > > > > > > > > > > > configured. At this situation, will the broker > > A > > > > create > > > > > > > the > > > > > > > > > > > > partitions? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 6. I'm leaning towards considering that > > scenario > > > > a > > > > > > > > > > > configuration > > > > > > > > > > > > > > > > error. If all log directories are cordoned > > before > > > > the > > > > > > > > internal > > > > > > > > > > > > topics > > > > > > > > > > > > > > > > are created, then the broker will not be able > > to > > > > create > > > > > > > > them. > > > > > > > > > > > This > > > > > > > > > > > > > > > > seems like a pretty strange scenario, where > > it's > > > > the > > > > > > > first > > > > > > > > time > > > > > > > > > > > you > > > > > > > > > > > > > > > > start a broker, you've cordoned all its log > > > > directory, > > > > > > > and > > > > > > > > the > > > > > > > > > > > > > > > > internal topics (offsets and transactions) have > > > > not yet > > > > > > > > been > > > > > > > > > > > > created > > > > > > > > > > > > > > > > in the rest of the cluster. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Yes, I agree that this should be a > > configuration > > > > error. > > > > > > > > > > > > > > > > So the follow-up question is: Suppose users > > > > encounter > > > > > > > this > > > > > > > > issue, > > > > > > > > > > > > and > > > > > > > > > > > > > > how > > > > > > > > > > > > > > > > could they resolve it? > > > > > > > > > > > > > > > > Uncordon the log dir dynamically using > > > > kafka-configs.sh? > > > > > > > > Will the > > > > > > > > > > > > > > > > uncordoning config change recreate the > > partitions > > > > we > > > > > > > didn't > > > > > > > > > > > create > > > > > > > > > > > > > > earlier > > > > > > > > > > > > > > > > because of log dir cordoned? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The metadata log is different (not managed by > > > > > > > > LogManager), so I > > > > > > > > > > > > think > > > > > > > > > > > > > > > > it should always be created regardless if its > > log > > > > > > > > directory is > > > > > > > > > > > > > > > > cordoned or not. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I agree we should treat "__cluster_metadata" > > > > differently. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks. > > > > > > > > > > > > > > > > Luke > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Jul 10, 2024 at 12:42 AM Mickael > > Maison < > > > > > > > > > > > > > > mickael.mai...@gmail.com> > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Luke, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2. isCordoned() is a new method on > > > > LogDirDescription. > > > > > > > It > > > > > > > > does > > > > > > > > > > > not > > > > > > > > > > > > > > take > > > > > > > > > > > > > > > > > any arguments. It just returns true if this > > log > > > > > > > > directory the > > > > > > > > > > > > > > > > > LogDirDescription represents is cordoned. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 3. Sorry that was a typo. This method will > > only > > > > return > > > > > > > a > > > > > > > > log > > > > > > > > > > > > > > directory > > > > > > > > > > > > > > > > > that is not cordoned. Fixed > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 4. Cordoned log directories are persisted to > > the > > > > > > > > metadata log > > > > > > > > > > > > via the > > > > > > > > > > > > > > > > > RegisterBrokerRecord, > > > > BrokerRegistrationChangeRecord > > > > > > > > records. > > > > > > > > > > > If > > > > > > > > > > > > a > > > > > > > > > > > > > > > > > broker is offline, the controller will use > > the > > > > latest > > > > > > > > known > > > > > > > > > > > > state of > > > > > > > > > > > > > > > > > the broker to determine the broker's > > cordoned log > > > > > > > > directories. > > > > > > > > > > > > I've > > > > > > > > > > > > > > > > > added a sentence clarifying this point. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 5. Yes a log directory can be uncordoned. > > You can > > > > > > > either > > > > > > > > update > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > properties file and restart the broker or > > > > dynamically > > > > > > > > change > > > > > > > > > > > the > > > > > > > > > > > > > > value > > > > > > > > > > > > > > > > > at runtime using kafka-configs. I've added a > > > > paragraph > > > > > > > > about it > > > > > > > > > > > > in > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > KIP. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 6. I'm leaning towards considering that > > scenario > > > > a > > > > > > > > > > > configuration > > > > > > > > > > > > > > > > > error. If all log directories are cordoned > > > > before the > > > > > > > > internal > > > > > > > > > > > > topics > > > > > > > > > > > > > > > > > are created, then the broker will not be > > able to > > > > create > > > > > > > > them. > > > > > > > > > > > > This > > > > > > > > > > > > > > > > > seems like a pretty strange scenario, where > > it's > > > > the > > > > > > > > first time > > > > > > > > > > > > you > > > > > > > > > > > > > > > > > start a broker, you've cordoned all its log > > > > directory, > > > > > > > > and the > > > > > > > > > > > > > > > > > internal topics (offsets and transactions) > > have > > > > not yet > > > > > > > > been > > > > > > > > > > > > created > > > > > > > > > > > > > > > > > in the rest of the cluster. > > > > > > > > > > > > > > > > > The metadata log is different (not managed by > > > > > > > > LogManager), so I > > > > > > > > > > > > think > > > > > > > > > > > > > > > > > it should always be created regardless if > > its log > > > > > > > > directory is > > > > > > > > > > > > > > > > > cordoned or not. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > Mickael > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Jul 9, 2024 at 3:48 PM Chia-Ping > > Tsai < > > > > > > > > > > > > chia7...@gmail.com> > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > hi Mickael > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > That is totally a good idea, but I have a > > > > question > > > > > > > > about the > > > > > > > > > > > > > > > > > implementation > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Do we consider making pluggable > > ReplicaPlacer > > > > > > > > (KIP-660) first > > > > > > > > > > > > and > > > > > > > > > > > > > > then > > > > > > > > > > > > > > > > > add > > > > > > > > > > > > > > > > > > another impl of ReplicaPlacer to offer > > cordon > > > > > > > > mechanism? > > > > > > > > > > > Noted > > > > > > > > > > > > that > > > > > > > > > > > > > > > > > > `ReplicaPlacer` can implement > > Reconfigurable > > > > to get > > > > > > > > updated > > > > > > > > > > > at > > > > > > > > > > > > > > runtime. > > > > > > > > > > > > > > > > > > That is similar to KIP-1066 - change > > > > > > > cordoned.log.dirs > > > > > > > > > > > through > > > > > > > > > > > > > > configs > > > > > > > > > > > > > > > > > > update. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The benefit is to let users have their > > > > optimized > > > > > > > > policy for > > > > > > > > > > > > > > specific > > > > > > > > > > > > > > > > > > scenario. Also, it can avoid that we add > > more > > > > and > > > > > > > more > > > > > > > > > > > > mechanism > > > > > > > > > > > > > > to our > > > > > > > > > > > > > > > > > > code base. Of course we can merge the > > > > mechanism which > > > > > > > > can be > > > > > > > > > > > > used > > > > > > > > > > > > > > by 99% > > > > > > > > > > > > > > > > > > users :smile > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Best, > > > > > > > > > > > > > > > > > > Chia-Ping > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Luke Chen <show...@gmail.com> 於 2024年7月9日 > > 週二 > > > > > > > 下午9:07寫道: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Mickael, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for the KIP! > > > > > > > > > > > > > > > > > > > This is a long waiting feature for many > > > > users! > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Questions: > > > > > > > > > > > > > > > > > > > 1. I think piggyback the > > > > "BrokerHeartbeatRequest" > > > > > > > to > > > > > > > > > > > forward > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > corden log > > > > > > > > > > > > > > > > > > > dir to controller makes sense to me. > > > > > > > > > > > > > > > > > > > We already did similar things for fence, > > > > controller > > > > > > > > > > > shutdown, > > > > > > > > > > > > > > failed > > > > > > > > > > > > > > > > > log > > > > > > > > > > > > > > > > > > > dir...etc. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2. In the admin API, what parameters will > > > > the new > > > > > > > > added > > > > > > > > > > > > > > isCordoned() > > > > > > > > > > > > > > > > > method > > > > > > > > > > > > > > > > > > > take? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 3. In the KIP, we said: > > > > > > > > > > > > > > > > > > > "defaultDir(): This method will not > > return > > > > the Uuid > > > > > > > > of a > > > > > > > > > > > log > > > > > > > > > > > > > > directory > > > > > > > > > > > > > > > > > that > > > > > > > > > > > > > > > > > > > is not cordoned." > > > > > > > > > > > > > > > > > > > --> It's hard to understand. Does that > > mean > > > > we will > > > > > > > > only > > > > > > > > > > > > return > > > > > > > > > > > > > > > > > cordoned > > > > > > > > > > > > > > > > > > > log dir? > > > > > > > > > > > > > > > > > > > From the current java doc of the > > interface, > > > > it > > > > > > > > doesn't look > > > > > > > > > > > > > > right: > > > > > > > > > > > > > > > > > > > "Get the default directory for new > > partitions > > > > > > > placed > > > > > > > > in a > > > > > > > > > > > > given > > > > > > > > > > > > > > > > > broker." > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 4. Currently, if a broker is registered > > and > > > > then go > > > > > > > > > > > offline. > > > > > > > > > > > > In > > > > > > > > > > > > > > this > > > > > > > > > > > > > > > > > state, > > > > > > > > > > > > > > > > > > > the controller will still distribute > > > > partitions to > > > > > > > > this > > > > > > > > > > > > broker. > > > > > > > > > > > > > > > > > > > So, if now, the broker get startup with > > > > > > > > "cordoned.log.dirs" > > > > > > > > > > > > set, > > > > > > > > > > > > > > what > > > > > > > > > > > > > > > > > will > > > > > > > > > > > > > > > > > > > happen? > > > > > > > > > > > > > > > > > > > Will the newly assigned partitions be > > created > > > > > > > > successfully > > > > > > > > > > > or > > > > > > > > > > > > > > not? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 5. I think after a log dir get cordoned, > > we > > > > can > > > > > > > > always > > > > > > > > > > > > uncordon > > > > > > > > > > > > > > it, > > > > > > > > > > > > > > > > > right? > > > > > > > > > > > > > > > > > > > I think we should mention it in the KIP. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 6. If a broker is startup with > > > > "cordoned.log.dirs" > > > > > > > > set, and > > > > > > > > > > > > does > > > > > > > > > > > > > > that > > > > > > > > > > > > > > > > > mean > > > > > > > > > > > > > > > > > > > the internal topics partitions (ex: > > > > > > > > __consumer_offsets) > > > > > > > > > > > > cannot be > > > > > > > > > > > > > > > > > created, > > > > > > > > > > > > > > > > > > > either? > > > > > > > > > > > > > > > > > > > Also, if this log dir is happen to be the > > > > metadata > > > > > > > > log dir, > > > > > > > > > > > > what > > > > > > > > > > > > > > will > > > > > > > > > > > > > > > > > > > happen to the metadata topic creation? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks. > > > > > > > > > > > > > > > > > > > Luke > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Jul 9, 2024 at 12:12 AM Mickael > > > > Maison < > > > > > > > > > > > > > > > > > mickael.mai...@gmail.com> > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for taking a look. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - Yes you're right, I meant > > > > > > > > AlterPartitionReassignments. > > > > > > > > > > > > Fixed. > > > > > > > > > > > > > > > > > > > > - That's a good idea. I was expecting > > > > users to > > > > > > > > discover > > > > > > > > > > > > > > cordoned log > > > > > > > > > > > > > > > > > > > > directories by describing broker > > > > configurations. > > > > > > > > But > > > > > > > > > > > being > > > > > > > > > > > > > > able to > > > > > > > > > > > > > > > > > > > > also get this information when > > describing > > > > log > > > > > > > > directories > > > > > > > > > > > > makes > > > > > > > > > > > > > > > > > sense. > > > > > > > > > > > > > > > > > > > > I've added that to the KIP. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > Mickael > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Jul 5, 2024 at 8:05 AM Haruki > > > > Okada < > > > > > > > > > > > > > > ocadar...@gmail.com> > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thank you for the KIP. > > > > > > > > > > > > > > > > > > > > > The motivation sounds make sense to > > me. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I have a few questions: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - [nits] "AlterPartitions request" in > > > > Error > > > > > > > > handling > > > > > > > > > > > > section > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > > > > > > > "AlterPartitionReassignments request" > > > > actually, > > > > > > > > right? > > > > > > > > > > > > > > > > > > > > > - Don't we need to include cordoned > > > > information > > > > > > > > in > > > > > > > > > > > > > > DescribeLogDirs > > > > > > > > > > > > > > > > > > > > response > > > > > > > > > > > > > > > > > > > > > too? Some tools (e.g. CruiseControl) > > > > need to > > > > > > > > have a way > > > > > > > > > > > > to > > > > > > > > > > > > > > know > > > > > > > > > > > > > > > > > which > > > > > > > > > > > > > > > > > > > > > broker/log-dirs are cordoned to > > generate > > > > > > > > partition > > > > > > > > > > > > > > reassignment > > > > > > > > > > > > > > > > > > > proposal. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2024年7月4日(木) 22:57 Mickael Maison < > > > > > > > > > > > > mickael.mai...@gmail.com > > > > > > > > > > > > > > >: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I'd like to start a discussion on > > > > KIP-1066 > > > > > > > that > > > > > > > > > > > > introduces > > > > > > > > > > > > > > a > > > > > > > > > > > > > > > > > > > mechanism > > > > > > > > > > > > > > > > > > > > > > to cordon log directories and > > brokers. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1066%3A+Mechanism+to+cordon+brokers+and+log+directories > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > > Mickael > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > > > > > > > > > ======================== > > > > > > > > > > > > > > > > > > > > > Okada Haruki > > > > > > > > > > > > > > > > > > > > > ocadar...@gmail.com > > > > > > > > > > > > > > > > > > > > > ======================== > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > David Arthur > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > David Arthur > > > > > > > >