Hi Igor,

Thanks for the KIP. I've finally managed to take an initial look.

0. You mention the command line tools (which one?) in the public interfaces section, but don't spell out how it changes, i.e. what options are added. Reading the proposed changes suggests that there are no changes to the supported options and that this is done automatically during initial formatting, based on the broker config. But what about the case where we're upgrading an existing non-JBOD KRaft cluster where the meta.properties already exists? Do we just run `./bin/kafka-storage.sh format -c /tmp/server.properties` again? How would an operator remove an existing log dir?

1. In the example for the storage format command I think it's worth explaining it in a little more detail, i.e. that the `directory.ids` has three items: two for the configured log.dirs and one for the configured metadata.log.dir.

2. In 'Broker lifecycle management' you presumably want to check that the directory ids for each log dir are actually unique.

3. I don't understand the motivation for having the controller decide the log dir for new replicas. I think there are two cases we want to support:
   a) Where the user specifies the log dir (likely based on some external information). This is out of scope for this KIP.
   b) If the user didn't specify, isn't the broker in a better position to decide (for example, based on free storage), and then communicate back to the controller the log dir that was chosen using the ASSIGN_REPLICAS_TO_DIRECTORIES RPC?

4. Broker registration. I don't understand the intent behind the optimization for the single log dir case (last bullet). "Brokers whose registration indicate that multiple log directories are configured remain FENCED until all log directory assignments for that broker are learnt by the active controller and persisted into metadata." is something you've committed to anyway.

5. AFAICS there's no way for the user to determine via the Kafka protocol which directory id corresponds to which log dir path, i.e. you've not changed any of the Admin APIs.
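
To make the alternative in (3b) a bit more concrete, here is a rough sketch of what I have in mind. It is only an illustration: `LogDir`, `choose_log_dir` and `build_assignment` are names I made up, the "most usable bytes" policy is just one possible heuristic, and the dictionary returned at the end is a stand-in for whatever the real ASSIGN_REPLICAS_TO_DIRECTORIES request would carry, not the schema from the KIP.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class LogDir:
    """Hypothetical broker-side view of one configured log directory."""
    directory_id: str   # the directory's ID as recorded in meta.properties
    path: str
    usable_bytes: int   # free space as last measured by the broker
    online: bool = True


def choose_log_dir(log_dirs: List[LogDir]) -> LogDir:
    """Pick the online log dir with the most free space (assumed policy)."""
    candidates = [d for d in log_dirs if d.online]
    if not candidates:
        raise RuntimeError("no online log directories available")
    return max(candidates, key=lambda d: d.usable_bytes)


def build_assignment(partition: Tuple[str, int], log_dirs: List[LogDir]) -> Dict:
    """Choose a dir for a new replica and describe it for the controller.

    The returned dict is a placeholder for the assignment the broker
    would report back; its shape is an assumption, not the RPC schema.
    """
    chosen = choose_log_dir(log_dirs)
    return {"directory_id": chosen.directory_id, "partitions": [partition]}
```

The point is only that the broker already has the information needed to make this choice locally (which dirs are online, how full they are), and the controller just needs to be told the outcome.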

Perhaps adding a Future Work section to spell out the pieces we know are missing would be a good idea?

I would second Jason's idea for piggybacking on- and off-line state changes on the BrokerHeartbeat RPC. I suspect the complexity of making this incremental isn't so great, given that both broker and controller need to keep track of the on- and off-line directories anyway. i.e. we could add LogDirsOfflined and LogDirsOnlined fields to both request and response, and have the broker keep including a log dir in requests until it is acknowledged in the response; otherwise the fields would be empty.

Thanks again,

Tom

On Tue, 25 Oct 2022 at 19:59, Igor Soarez <soa...@apple.com.invalid> wrote:

> Hello,
>
> There’s now a proposal to address ZK to KRaft migration (KIP-866), but
> currently it doesn't address JBOD, so I've decided to update this proposal
> to address that migration scenario.
>
> So given that:
>
> - When migrating from a ZK cluster running JBOD to KRaft, brokers
> registering in KRaft mode will need to be able to register all configured
> log directories.
> - As part of the migration, the mapping of partition to log directory will
> have to be learnt by the active controller and persisted into the cluster
> metadata.
> - It isn’t safe to allow for leadership from replicas without this
> mapping, as if the hosting log directory fails there will be no failover
> mechanism.
>
> I have updated the proposal to reflect that:
>
> - Multiple log directories may be indicated in the first broker
> registration referencing log directory UUIDs. We no longer require a single
> log directory to start with.
> - The controller must never assign leadership to a replica in a broker
> registered with multiple log directories, unless the partition to log
> directory assignment is already in the cluster metadata.
> - The broker should not be unfenced until all of its partition to log
> directory mapping is persisted into cluster metadata
>
> I've also added details as to how the ZK to KRaft migration can work in a
> cluster that is already operating with JBOD.
>
> Please have a look and share your thoughts.
>
> --
> Igor