Hi, Mickael, Thanks for the updated KIP. A few more comments.
JR10. kafka.log:type=LogManager,name=LogDirectoryCordoned,logDirectory=<PATH>: Path contains /, which has special meaning in mbean. Could we use kafka.log:type=LogManager,name=LogDirectoryCordoned and show the cordoned paths as the value? JR11. Migration plan JR11.1 We need to mention that this feature is gated by a newly introduced MV. JR11.2 Since this changes metadata records and we don't support downgrading MV yet, we need to add that once the feature is enabled, it can't be downgraded. JR12. cordoned.log.dirs: Should it be of type list and default to empty list? Jun On Mon, Apr 7, 2025 at 7:29 AM Mickael Maison <mickael.mai...@gmail.com> wrote: > Hi David, > > DA1: Done > > DA2.1: Yes because we only have the directory ids in the metadata log. > At the moment only brokers have the path -> id mappings. > > DA2.2: I proposed this mechanism because of DA2.1. Also this is > consistent with other operations like updating log levels. If people > would prefer to have dedicated APIs, I can adjust the proposal. > > DA4: Good idea, I added metrics to the KIP. > > Thanks, > Mickael > > On Mon, Apr 7, 2025 at 4:23 PM Mickael Maison <mickael.mai...@gmail.com> > wrote: > > > > Hi Jun, > > > > 10: Yes that's good idea. I've added that to the KIP. > > > > 11: Yes that was my plan. We need to check the metadata version and > > reject updates to the configuration is the metadata version is too > > low. I clarified that in the KIP. > > Did you have anything else in mind (if you can remember what you > > thought over 8 months ago!)? I don't think we need to expose a > > feature. > > > > Thanks, > > Mickael > > > > > > > > > > > > > > > > On Tue, Jul 30, 2024 at 4:22 PM David Arthur <mum...@gmail.com> wrote: > > > > > > DA1: I like Jun's suggestion of using a wildcard, this would also help > with > > > the case I mentioned (cordon a whole broker, regardless of how many log > > > dirs). > > > > > > DA2.1: Re: log dir names to UUID mapping -- do you mean a new > CordonLogDirs > > > RPC would need to send the UUID instead of the log dir name? > > > > > > DA2.2: In general, I don't really like the idea of using dynamic > configs as > > > an API for modifying state. I know we do it a lot, but I think it's an > > > anti-pattern. It makes it difficult to evolve an API since you're stuck > > > with the key/value config system. Also, the dynamic config code is a > lot > > > harder to navigate and understand than KafkaApis and MetadataImage (my > > > opinion is probably biased here). > > > > > > DA3: I was thinking along the lines of this being a new feature, so we > may > > > want to just disable the code paths entirely with a config. For > example, > > > maybe we introduce a rare bug that breaks assignments, or broker > > > registration, or whatever -- it's nicer to have a config to turn off a > > > feature rather than having to downgrade. I guess this really depends > on how > > > tightly integrated the new feature is with existing code. Let's leave > this > > > out of the KIP for now and maybe revisit after the feature is done. > > > > > > DA4: Ok, thanks. So eventually the operator would un-cordon the > brokers to > > > allow normal placements to occur. Since we expect the operator to > > > eventually come back and un-cordon the brokers, this means it will > > > occasionally be forgotten (since it's a human task). WDYT about a new > > > broker metric to show the number of cordoned log dirs? This would help > > > operators set up monitors like "alert if cordoned count > 0 for 7 > days" or > > > similar. > > > > > > -David > > > > > > On Wed, Jul 24, 2024 at 1:21 PM Jun Rao <j...@confluent.io.invalid> > wrote: > > > > > > > Hi, Mickael, > > > > > > > > Thanks for the KIP. > > > > > > > > 10. A common case is only one log dir per broker. Could we support > sth > > > > like --add-config cordoned.log.dirs=* to make it more convenient for > this > > > > case? > > > > > > > > 11. Since we changed the metadata record format, should we gate the > new > > > > configuration based on a new metadata version? > > > > > > > > Jun > > > > > > > > On Wed, Jul 17, 2024 at 9:18 AM Mickael Maison < > mickael.mai...@gmail.com> > > > > wrote: > > > > > > > > > Hi Kamal, > > > > > > > > > > Good spot, yes this is a typo. The flexibleVersions stays as "0+". > Fixed > > > > > > > > > > Thanks, > > > > > Mickael > > > > > > > > > > On Wed, Jul 17, 2024 at 6:14 PM Mickael Maison < > mickael.mai...@gmail.com > > > > > > > > > > wrote: > > > > > > > > > > > > Hi David, > > > > > > > > > > > > DA1: It's a point I considered, especially being able to cordon a > > > > > > whole broker. With the current proposal you just need to set > > > > > > cordoned.log.dirs to the same value as log.dirs. That does not > seem > > > > > > excessively difficult. > > > > > > > > > > > > DA2: I did consider using a new metadata record like we do for > > > > > > fence/unfence. Since I don't expect cordoned log dirs to change > very > > > > > > frequently, and the size should be small, I opted to reuse the > > > > > > BrokerRegistrationRecord metadata record. At the moment I guess > it was > > > > > > mostly for the convenience while prototyping. Semantically it > probably > > > > > > makes sense to have separate records. Your further point suggest > > > > > > design changes in the mechanism as a whole, so let's discuss > these > > > > > > first and we can return to the exact metadata records after. > > > > > > I find the idea of having dedicated RPCs interesting. One of the > > > > > > reasons I piggybacked on the heartbeating process is for the > mapping > > > > > > between log directory names and their UUIDs. Currently the > mapping > > > > > > only exists on each broker instance. So if we wanted a dedicated > RPC, > > > > > > we would first need to change how we manage the log directory to > UUID > > > > > > mappings. I guess this could be done via the BrokerRegistration > API. > > > > > > I'm not sure about storing additional metadata (reason, > timestamp). We > > > > > > currently don't do that for any operations > > > > > > (AlterPartitionReassignments, UnregisterBroker). Typically these > are > > > > > > stored in the tools used by the operators to drive these > operations. > > > > > > You bring another interesting point about the ability to cordon > > > > > > brokers/log directories while they are offline. It's not > something the > > > > > > current proposal supports. I'm not sure how useful this would > turn out > > > > > > it practice. In experience is that brokers are mostly online, so > I'd > > > > > > expect the need to do so relatively rare. This also kind of > loops back > > > > > > to KAFKA-17094 [0] and the discussion Gantigmaa started on the > dev > > > > > > list [1] about being able to identify the ids of offline (but > still > > > > > > registered) brokers. > > > > > > > > > > > > DA3: With the current proposal, I don't see a reason why you > would > > > > > > want to disable the new behavior. If you don't want to use it, > you > > > > > > have nothing to do. It's opt-in as you need to set > cordoned.log.dirs > > > > > > on some brokers to get the new behavior. If you don't want it > anymore, > > > > > > you should unset cordoned.log.dirs. Can you explain why this > would not > > > > > > work? > > > > > > > > > > > > DA4: Yes > > > > > > > > > > > > 0: https://issues.apache.org/jira/browse/KAFKA-17094 > > > > > > 1: > https://lists.apache.org/thread/1rrgbhk43d85wobcp0dqz6mhpn93j9yo > > > > > > > > > > > > Thanks, > > > > > > Mickael > > > > > > > > > > > > > > > > > > On Sun, Jul 14, 2024 at 10:37 AM Kamal Chandraprakash > > > > > > <kamal.chandraprak...@gmail.com> wrote: > > > > > > > > > > > > > > Hi Mickael, > > > > > > > > > > > > > > In the BrokerHearbeatRequest.json, the flexibleVersions are > bumped > > > > from > > > > > > > "0+" to "1+". Is it a typo? > > > > > > > > > > > > > > > > > > > > > On Fri, Jul 12, 2024 at 11:42 PM David Arthur < > mum...@gmail.com> > > > > > wrote: > > > > > > > > > > > > > > > Mickael, thanks for the KIP! I think this could be quite a > useful > > > > > feature. > > > > > > > > > > > > > > > > DA1: Having to know each of the log dirs for a broker seems > a bit > > > > > > > > inconvenient for cases where we want to cordon off a whole > broker. > > > > I > > > > > do > > > > > > > > think having the ability to cordon off a specific log dir is > useful > > > > > for > > > > > > > > JBOD, but I imagine a common case might be to cordon off the > whole > > > > > broker. > > > > > > > > > > > > > > > > DA2: Looks like the new "cordoned.log.dirs" can be configured > > > > > statically > > > > > > > > and updated dynamically per-broker. What do you think about > a new > > > > > metadata > > > > > > > > record and RPC instead of using a config? From my > understanding, > > > > the > > > > > > > > BrokerRegistration/Heartbeat is more about the lifecycle of a > > > > broker > > > > > > > > whereas cordoning a broker is an operator driven action. It > might > > > > > make > > > > > > > > sense to have a separate record for this. We could include > > > > additional > > > > > > > > fields like a timestamp, a reason/comment field (e.g., > > > > > "decommissioning", > > > > > > > > "disk failure", "new broker" etc), stuff like that. > > > > > > > > > > > > > > > > This would also allow cordoning to be done while a broker is > > > > offline > > > > > or > > > > > > > > before it has been provisioned. Not sure how likely that is, > but > > > > > might be > > > > > > > > useful? > > > > > > > > > > > > > > > > DA3: Can we consider having a configuration to > enable/disable the > > > > new > > > > > > > > replica placer behavior? This would be separate from the new > > > > > > > > MetadataVersion for the RPC/record changes. > > > > > > > > > > > > > > > > DA4: In the Motivation section, you mention the cluster > expansion > > > > > scenario. > > > > > > > > For this scenario, is the expectation that the operator will > cordon > > > > > off the > > > > > > > > existing full brokers so placements only happen on the new > brokers? > > > > > > > > > > > > > > > > Cheers, > > > > > > > > David > > > > > > > > > > > > > > > > On Fri, Jul 12, 2024 at 8:53 AM Mickael Maison < > > > > > mickael.mai...@gmail.com> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > Hi Kamal, > > > > > > > > > > > > > > > > > > Thanks for taking a look at the KIP! > > > > > > > > > > > > > > > > > > I briefly considered that option initially but I found it > not > > > > very > > > > > > > > > practical once you have more than a few cordoned log > directories. > > > > > > > > > I find your example is already not very easy to read, and > it only > > > > > has > > > > > > > > > 2 entries. Also if the configuration is at the cluster > level > > > > it'sis > > > > > > > > > not easy to see if a broker has all its log directories > cordoned, > > > > > and > > > > > > > > > you still need to describe a specific broker's > configuration to > > > > > find > > > > > > > > > the "name" of a log directory you want to cordon. > > > > > > > > > > > > > > > > > > I think an easy way to get an overall view of the cordoned > log > > > > > > > > > directories/brokers will be via the kafka-log-dirs.sh > tool. I am > > > > > also > > > > > > > > > considering adding metrics like we have today for > > > > > LogDirectoryOffline > > > > > > > > > to ease monitoring. > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > Mickael > > > > > > > > > > > > > > > > > > On Thu, Jul 11, 2024 at 8:41 PM Kamal Chandraprakash > > > > > > > > > <kamal.chandraprak...@gmail.com> wrote: > > > > > > > > > > > > > > > > > > > > Hi Mickael, > > > > > > > > > > > > > > > > > > > > Thanks for the KIP! > > > > > > > > > > > > > > > > > > > > This is a useful feature which helps to decommission the > nodes > > > > by > > > > > > > > > > essentially > > > > > > > > > > creating a new replica exclude broker list. > > > > > > > > > > > > > > > > > > > > To cordon a list of brokers, we have to apply the config > on > > > > each > > > > > of the > > > > > > > > > > broker nodes > > > > > > > > > > and similarly to see the list of cordoned brokers, we > have to > > > > > either > > > > > > > > > query > > > > > > > > > > individual broker > > > > > > > > > > config in the cluster or query each of the broker log > > > > directory. > > > > > > > > > > > > > > > > > > > > Can we move the configuration from the broker to cluster > level? > > > > > (eg) > > > > > > > > > > > > > > > > > > > > bin/kafka-configs.sh --bootstrap-server localhost:9092 > > > > > > > > --broker-defaults > > > > > > > > > > --alter --add-config > > > > > > > > > > cordoned.log.dirs=[broker.id > > > > > .0]:[log.dir.0],[broker.id.1]:[log.dir.1] > > > > > > > > > > > > > > > > > > > > This will provide a consistent view of the list of > cordoned > > > > > brokers in > > > > > > > > > the > > > > > > > > > > cluster. > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > Kamal > > > > > > > > > > > > > > > > > > > > On Wed, Jul 10, 2024 at 7:53 PM Mickael Maison < > > > > > > > > mickael.mai...@gmail.com > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > Hi Luke, > > > > > > > > > > > > > > > > > > > > > > 4. You're right this scenario can happen. In this case > I > > > > think > > > > > the > > > > > > > > > > > broker should enforce its new state and not create the > > > > replica > > > > > as all > > > > > > > > > > > its log directories are now cordoned. The replica will > be > > > > > offline and > > > > > > > > > > > an administrator would need to reassign it to another > > > > broker. I > > > > > > > > expect > > > > > > > > > > > most users will rely on kafka-configs.sh to manage the > > > > > cordoned log > > > > > > > > > > > directories instead of updating the broker properties, > so it > > > > > should > > > > > > > > > > > not be a common issue. > > > > > > > > > > > > > > > > > > > > > > 6. To resolve this issue, the user can uncordon a log > > > > > directory and > > > > > > > > > > > retry an operation that triggers the creation of the > internal > > > > > topics. > > > > > > > > > > > For example start a consumer using a group should make > the > > > > > broker > > > > > > > > > > > retry creating the __consumer_offsets topic. > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > Mickael > > > > > > > > > > > > > > > > > > > > > > On Wed, Jul 10, 2024 at 4:14 PM Mickael Maison < > > > > > > > > > mickael.mai...@gmail.com> > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > Hi Chia-Ping, > > > > > > > > > > > > > > > > > > > > > > > > Question 1) Yes that's a good summary. I'd also add > that > > > > > managing > > > > > > > > > > > > cordoned log directories is intended to be done by > cluster > > > > > > > > > > > > administrators who also know about operations > in-progress > > > > or > > > > > > > > planned > > > > > > > > > > > > such as scaling or adding/removing log directories. > In > > > > > practice you > > > > > > > > > > > > can't expect users to provide "good" explicit > partition > > > > > assignments > > > > > > > > > as > > > > > > > > > > > > they are not aware of the cluster operations. > > > > > > > > > > > > > > > > > > > > > > > > Question 2) I don't see an actual question :) > > > > > > > > > > > > > > > > > > > > > > > > Question 3) My current proposal is that in that case, > > > > > brokers will > > > > > > > > > > > > ignore the preferred log directory and place the new > > > > replica > > > > > on a > > > > > > > > log > > > > > > > > > > > > directory that is not cordoned. To use > AlterReplicaLogDirs > > > > > you need > > > > > > > > > > > > ALTER permission on the CLUSTER resource. In most > > > > > environments > > > > > > > > users > > > > > > > > > > > > don't have permissions to use that API. Or if they > do, I'd > > > > > expect > > > > > > > > > them > > > > > > > > > > > > to also have ALTER_CONFIGS on the BROKER resource > and be > > > > > able to > > > > > > > > > > > > change the cordoned log directories configuration. > > > > > > > > > > > > > > > > > > > > > > > > Question 4) If we don't include the CordonedLogDirs > field > > > > in > > > > > > > > > > > > BrokerRegistrationRequest as you said there's a small > > > > window > > > > > when > > > > > > > > the > > > > > > > > > > > > controller would not know if a broker can be used for > > > > > assignments. > > > > > > > > If > > > > > > > > > > > > all log directories are cordoned the controller > should not > > > > > use that > > > > > > > > > > > > broker for assignments. > > > > > > > > > > > > > > > > > > > > > > > > Question 5) In KRaft mode offline brokers are still > > > > > registered, and > > > > > > > > > > > > can be used for partition assignment. So we need to > persist > > > > > the > > > > > > > > > > > > cordoned log directories too to exclude brokers that > don't > > > > > have > > > > > > > > > > > > uncordoned log directories. > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > Mickael > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Jul 10, 2024 at 12:25 PM Luke Chen < > > > > > show...@gmail.com> > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Mickael, > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for the response. > > > > > > > > > > > > > > > > > > > > > > > > > > > 4. Cordoned log directories are persisted to the > > > > > metadata log > > > > > > > > > via the > > > > > > > > > > > > > RegisterBrokerRecord, > BrokerRegistrationChangeRecord > > > > > records. If > > > > > > > > a > > > > > > > > > > > > > broker is offline, the controller will use the > latest > > > > > known state > > > > > > > > > of > > > > > > > > > > > > > the broker to determine the broker's cordoned log > > > > > directories. > > > > > > > > I've > > > > > > > > > > > > > added a sentence clarifying this point. > > > > > > > > > > > > > > > > > > > > > > > > > > OK, so if the broker A goes offline, and the > controller > > > > is > > > > > in > > > > > > > > > "fenced" > > > > > > > > > > > > > state, without any cordoned log dirs, then some > topic > > > > > created and > > > > > > > > > > > assigned > > > > > > > > > > > > > to broker A. Later, broker A starts up with all > its log > > > > > dirs > > > > > > > > > cordoned > > > > > > > > > > > > > configured. At this situation, will the broker A > create > > > > the > > > > > > > > > partitions? > > > > > > > > > > > > > > > > > > > > > > > > > > > 6. I'm leaning towards considering that scenario > a > > > > > > > > configuration > > > > > > > > > > > > > error. If all log directories are cordoned before > the > > > > > internal > > > > > > > > > topics > > > > > > > > > > > > > are created, then the broker will not be able to > create > > > > > them. > > > > > > > > This > > > > > > > > > > > > > seems like a pretty strange scenario, where it's > the > > > > first > > > > > time > > > > > > > > you > > > > > > > > > > > > > start a broker, you've cordoned all its log > directory, > > > > and > > > > > the > > > > > > > > > > > > > internal topics (offsets and transactions) have > not yet > > > > > been > > > > > > > > > created > > > > > > > > > > > > > in the rest of the cluster. > > > > > > > > > > > > > > > > > > > > > > > > > > Yes, I agree that this should be a configuration > error. > > > > > > > > > > > > > So the follow-up question is: Suppose users > encounter > > > > this > > > > > issue, > > > > > > > > > and > > > > > > > > > > > how > > > > > > > > > > > > > could they resolve it? > > > > > > > > > > > > > Uncordon the log dir dynamically using > kafka-configs.sh? > > > > > Will the > > > > > > > > > > > > > uncordoning config change recreate the partitions > we > > > > didn't > > > > > > > > create > > > > > > > > > > > earlier > > > > > > > > > > > > > because of log dir cordoned? > > > > > > > > > > > > > > > > > > > > > > > > > > > The metadata log is different (not managed by > > > > > LogManager), so I > > > > > > > > > think > > > > > > > > > > > > > it should always be created regardless if its log > > > > > directory is > > > > > > > > > > > > > cordoned or not. > > > > > > > > > > > > > > > > > > > > > > > > > > I agree we should treat "__cluster_metadata" > differently. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks. > > > > > > > > > > > > > Luke > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Jul 10, 2024 at 12:42 AM Mickael Maison < > > > > > > > > > > > mickael.mai...@gmail.com> > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Luke, > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2. isCordoned() is a new method on > LogDirDescription. > > > > It > > > > > does > > > > > > > > not > > > > > > > > > > > take > > > > > > > > > > > > > > any arguments. It just returns true if this log > > > > > directory the > > > > > > > > > > > > > > LogDirDescription represents is cordoned. > > > > > > > > > > > > > > > > > > > > > > > > > > > > 3. Sorry that was a typo. This method will only > return > > > > a > > > > > log > > > > > > > > > > > directory > > > > > > > > > > > > > > that is not cordoned. Fixed > > > > > > > > > > > > > > > > > > > > > > > > > > > > 4. Cordoned log directories are persisted to the > > > > > metadata log > > > > > > > > > via the > > > > > > > > > > > > > > RegisterBrokerRecord, > BrokerRegistrationChangeRecord > > > > > records. > > > > > > > > If > > > > > > > > > a > > > > > > > > > > > > > > broker is offline, the controller will use the > latest > > > > > known > > > > > > > > > state of > > > > > > > > > > > > > > the broker to determine the broker's cordoned log > > > > > directories. > > > > > > > > > I've > > > > > > > > > > > > > > added a sentence clarifying this point. > > > > > > > > > > > > > > > > > > > > > > > > > > > > 5. Yes a log directory can be uncordoned. You can > > > > either > > > > > update > > > > > > > > > the > > > > > > > > > > > > > > properties file and restart the broker or > dynamically > > > > > change > > > > > > > > the > > > > > > > > > > > value > > > > > > > > > > > > > > at runtime using kafka-configs. I've added a > paragraph > > > > > about it > > > > > > > > > in > > > > > > > > > > > the > > > > > > > > > > > > > > KIP. > > > > > > > > > > > > > > > > > > > > > > > > > > > > 6. I'm leaning towards considering that scenario > a > > > > > > > > configuration > > > > > > > > > > > > > > error. If all log directories are cordoned > before the > > > > > internal > > > > > > > > > topics > > > > > > > > > > > > > > are created, then the broker will not be able to > create > > > > > them. > > > > > > > > > This > > > > > > > > > > > > > > seems like a pretty strange scenario, where it's > the > > > > > first time > > > > > > > > > you > > > > > > > > > > > > > > start a broker, you've cordoned all its log > directory, > > > > > and the > > > > > > > > > > > > > > internal topics (offsets and transactions) have > not yet > > > > > been > > > > > > > > > created > > > > > > > > > > > > > > in the rest of the cluster. > > > > > > > > > > > > > > The metadata log is different (not managed by > > > > > LogManager), so I > > > > > > > > > think > > > > > > > > > > > > > > it should always be created regardless if its log > > > > > directory is > > > > > > > > > > > > > > cordoned or not. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > Mickael > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Jul 9, 2024 at 3:48 PM Chia-Ping Tsai < > > > > > > > > > chia7...@gmail.com> > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > hi Mickael > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > That is totally a good idea, but I have a > question > > > > > about the > > > > > > > > > > > > > > implementation > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Do we consider making pluggable ReplicaPlacer > > > > > (KIP-660) first > > > > > > > > > and > > > > > > > > > > > then > > > > > > > > > > > > > > add > > > > > > > > > > > > > > > another impl of ReplicaPlacer to offer cordon > > > > > mechanism? > > > > > > > > Noted > > > > > > > > > that > > > > > > > > > > > > > > > `ReplicaPlacer` can implement Reconfigurable > to get > > > > > updated > > > > > > > > at > > > > > > > > > > > runtime. > > > > > > > > > > > > > > > That is similar to KIP-1066 - change > > > > cordoned.log.dirs > > > > > > > > through > > > > > > > > > > > configs > > > > > > > > > > > > > > > update. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The benefit is to let users have their > optimized > > > > > policy for > > > > > > > > > > > specific > > > > > > > > > > > > > > > scenario. Also, it can avoid that we add more > and > > > > more > > > > > > > > > mechanism > > > > > > > > > > > to our > > > > > > > > > > > > > > > code base. Of course we can merge the > mechanism which > > > > > can be > > > > > > > > > used > > > > > > > > > > > by 99% > > > > > > > > > > > > > > > users :smile > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Best, > > > > > > > > > > > > > > > Chia-Ping > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Luke Chen <show...@gmail.com> 於 2024年7月9日 週二 > > > > 下午9:07寫道: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Mickael, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for the KIP! > > > > > > > > > > > > > > > > This is a long waiting feature for many > users! > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Questions: > > > > > > > > > > > > > > > > 1. I think piggyback the > "BrokerHeartbeatRequest" > > > > to > > > > > > > > forward > > > > > > > > > the > > > > > > > > > > > > > > corden log > > > > > > > > > > > > > > > > dir to controller makes sense to me. > > > > > > > > > > > > > > > > We already did similar things for fence, > controller > > > > > > > > shutdown, > > > > > > > > > > > failed > > > > > > > > > > > > > > log > > > > > > > > > > > > > > > > dir...etc. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2. In the admin API, what parameters will > the new > > > > > added > > > > > > > > > > > isCordoned() > > > > > > > > > > > > > > method > > > > > > > > > > > > > > > > take? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 3. In the KIP, we said: > > > > > > > > > > > > > > > > "defaultDir(): This method will not return > the Uuid > > > > > of a > > > > > > > > log > > > > > > > > > > > directory > > > > > > > > > > > > > > that > > > > > > > > > > > > > > > > is not cordoned." > > > > > > > > > > > > > > > > --> It's hard to understand. Does that mean > we will > > > > > only > > > > > > > > > return > > > > > > > > > > > > > > cordoned > > > > > > > > > > > > > > > > log dir? > > > > > > > > > > > > > > > > From the current java doc of the interface, > it > > > > > doesn't look > > > > > > > > > > > right: > > > > > > > > > > > > > > > > "Get the default directory for new partitions > > > > placed > > > > > in a > > > > > > > > > given > > > > > > > > > > > > > > broker." > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 4. Currently, if a broker is registered and > then go > > > > > > > > offline. > > > > > > > > > In > > > > > > > > > > > this > > > > > > > > > > > > > > state, > > > > > > > > > > > > > > > > the controller will still distribute > partitions to > > > > > this > > > > > > > > > broker. > > > > > > > > > > > > > > > > So, if now, the broker get startup with > > > > > "cordoned.log.dirs" > > > > > > > > > set, > > > > > > > > > > > what > > > > > > > > > > > > > > will > > > > > > > > > > > > > > > > happen? > > > > > > > > > > > > > > > > Will the newly assigned partitions be created > > > > > successfully > > > > > > > > or > > > > > > > > > > > not? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 5. I think after a log dir get cordoned, we > can > > > > > always > > > > > > > > > uncordon > > > > > > > > > > > it, > > > > > > > > > > > > > > right? > > > > > > > > > > > > > > > > I think we should mention it in the KIP. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 6. If a broker is startup with > "cordoned.log.dirs" > > > > > set, and > > > > > > > > > does > > > > > > > > > > > that > > > > > > > > > > > > > > mean > > > > > > > > > > > > > > > > the internal topics partitions (ex: > > > > > __consumer_offsets) > > > > > > > > > cannot be > > > > > > > > > > > > > > created, > > > > > > > > > > > > > > > > either? > > > > > > > > > > > > > > > > Also, if this log dir is happen to be the > metadata > > > > > log dir, > > > > > > > > > what > > > > > > > > > > > will > > > > > > > > > > > > > > > > happen to the metadata topic creation? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks. > > > > > > > > > > > > > > > > Luke > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Jul 9, 2024 at 12:12 AM Mickael > Maison < > > > > > > > > > > > > > > mickael.mai...@gmail.com> > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for taking a look. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - Yes you're right, I meant > > > > > AlterPartitionReassignments. > > > > > > > > > Fixed. > > > > > > > > > > > > > > > > > - That's a good idea. I was expecting > users to > > > > > discover > > > > > > > > > > > cordoned log > > > > > > > > > > > > > > > > > directories by describing broker > configurations. > > > > > But > > > > > > > > being > > > > > > > > > > > able to > > > > > > > > > > > > > > > > > also get this information when describing > log > > > > > directories > > > > > > > > > makes > > > > > > > > > > > > > > sense. > > > > > > > > > > > > > > > > > I've added that to the KIP. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > Mickael > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Jul 5, 2024 at 8:05 AM Haruki > Okada < > > > > > > > > > > > ocadar...@gmail.com> > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thank you for the KIP. > > > > > > > > > > > > > > > > > > The motivation sounds make sense to me. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I have a few questions: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - [nits] "AlterPartitions request" in > Error > > > > > handling > > > > > > > > > section > > > > > > > > > > > is > > > > > > > > > > > > > > > > > > "AlterPartitionReassignments request" > actually, > > > > > right? > > > > > > > > > > > > > > > > > > - Don't we need to include cordoned > information > > > > > in > > > > > > > > > > > DescribeLogDirs > > > > > > > > > > > > > > > > > response > > > > > > > > > > > > > > > > > > too? Some tools (e.g. CruiseControl) > need to > > > > > have a way > > > > > > > > > to > > > > > > > > > > > know > > > > > > > > > > > > > > which > > > > > > > > > > > > > > > > > > broker/log-dirs are cordoned to generate > > > > > partition > > > > > > > > > > > reassignment > > > > > > > > > > > > > > > > proposal. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2024年7月4日(木) 22:57 Mickael Maison < > > > > > > > > > mickael.mai...@gmail.com > > > > > > > > > > > >: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I'd like to start a discussion on > KIP-1066 > > > > that > > > > > > > > > introduces > > > > > > > > > > > a > > > > > > > > > > > > > > > > mechanism > > > > > > > > > > > > > > > > > > > to cordon log directories and brokers. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1066%3A+Mechanism+to+cordon+brokers+and+log+directories > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > Mickael > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > > > > > > ======================== > > > > > > > > > > > > > > > > > > Okada Haruki > > > > > > > > > > > > > > > > > > ocadar...@gmail.com > > > > > > > > > > > > > > > > > > ======================== > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > David Arthur > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > David Arthur >