Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

Tom Bentley Tue, 01 Nov 2022 11:00:33 -0700

Hi Igor,

Thanks for the KIP, I've finally managed to take an initial look.

0. You mention the command line tools (which one?) in the public interfaces
section, but don't spell out how it changes -- what options are added.
Reading the proposed changes suggests that there are no changes to the
supported options and that it is done automatically during initial
formatting based on the broker config. But what about the case where we're
upgrading an existing non-JBOD KRaft cluster where the meta.properties
already exists? Do we just run `./bin/kafka-storage.sh format -c
/tmp/server.properties` again? How would an operator remove an existing log
dir?

1. In the example for the storage format command I think it's worth
explaining it in a little more detail. i.e. that the `directory.ids` has
three items: two for the configured log.dirs and one for the configured
metadata.log.dir.

2. In 'Broker lifecycle management' you presumably want to check that the
directory ids for each log dir are actually unique.
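
To make that concrete, here is a rough sketch in Python of the kind of check I have in mind. The function name and the path-to-id mapping are made up for illustration and are not from the KIP; I'm assuming the broker has already read one directory id per configured log dir.

```python
from collections import Counter


def find_duplicate_directory_ids(dir_ids_by_path):
    """Return {directory_id: sorted paths} for every id shared by more than one log dir.

    dir_ids_by_path is a hypothetical mapping of log dir path -> directory id,
    e.g. as read from each directory's meta.properties.
    """
    counts = Counter(dir_ids_by_path.values())
    return {
        dir_id: sorted(path for path, d in dir_ids_by_path.items() if d == dir_id)
        for dir_id, n in counts.items()
        if n > 1
    }


# Illustrative values only: two directories accidentally share an id.
log_dirs = {"/data/d1": "id-a", "/data/d2": "id-b", "/data/d3": "id-a"}
duplicates = find_duplicate_directory_ids(log_dirs)
```

A non-empty result would presumably be grounds for refusing to start (or register) the broker, but the exact handling is for the KIP to decide.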

3. I don't understand the motivation for having the controller decide the
log dir for new replicas. I think there are two cases we want to support:
a) Where the user specifies the log dir (likely based on some external
information). This is out of scope for this KIP.
b) If the user didn't specify, isn't the broker in a better position to
decide (for example, based on free storage), and then communicate back to
the controller the log dir that was chosen using the
ASSIGN_REPLICAS_TO_DIRECTORIES RPC?
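
As an illustration of case (b), a broker-side choice could be as simple as the Python sketch below. This is only one possible policy (most free space wins); the function name and the injectable `free_bytes` lookup are my own assumptions, not anything the KIP or the broker defines.

```python
import shutil


def pick_log_dir(online_log_dirs, free_bytes=lambda path: shutil.disk_usage(path).free):
    """Pick the online log dir with the most free space.

    free_bytes is injectable so the policy can be exercised without touching
    real disks; by default it asks the OS via shutil.disk_usage.
    """
    if not online_log_dirs:
        raise ValueError("no online log directories to choose from")
    return max(online_log_dirs, key=free_bytes)


# Illustrative free-space figures in bytes, not real measurements.
fake_free = {"/data/d1": 10_000, "/data/d2": 30_000, "/data/d3": 20_000}
chosen = pick_log_dir(list(fake_free), free_bytes=fake_free.get)
```

The broker would then report the chosen directory back to the controller, which only needs to record the result rather than make the decision.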

4. Broker registration. I don't understand the intent behind the
optimization for the single log dir case (last bullet). "Brokers whose
registration indicate that multiple log directories are configured remain
FENCED until all log directory assignments for that broker are learnt by
the active controller and persisted into metadata." is something you've
committed to anyway.

5. AFAICS there's no way for the user to determine via the Kafka protocol
which directory id corresponds to which log dir path. I.e. you've not
changed any of the Admin APIs. Perhaps adding a Future Work section to
spell out the pieces we know are missing would be a good idea?

I would second Jason's idea for piggybacking on- and off-line state changes
on the BrokerHeartbeat RPC. I suspect the complexity of making this
incremental isn't so great, given that both broker and controller need to
keep track of the on- and off-line directories anyway. i.e. We could add
LogDirsOfflined and LogDirsOnlined fields to both request and response and
have the broker keep including a log dir in requests until acknowledged in
the response, but otherwise they'd be empty.
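
Roughly, the broker-side bookkeeping could look like the Python sketch below. Only the LogDirsOfflined and LogDirsOnlined field names come from the suggestion above; the class, its methods and the dict-shaped request/response are hypothetical stand-ins for the real RPC schema.

```python
class BrokerDirState:
    """Tracks log dir state changes the controller has not yet acknowledged."""

    def __init__(self):
        self.pending_offlined = set()
        self.pending_onlined = set()

    def mark_offline(self, dir_id):
        # A newer offline event supersedes any unacknowledged online event.
        self.pending_onlined.discard(dir_id)
        self.pending_offlined.add(dir_id)

    def mark_online(self, dir_id):
        self.pending_offlined.discard(dir_id)
        self.pending_onlined.add(dir_id)

    def build_request(self):
        # Every heartbeat repeats whatever is still unacknowledged;
        # in the steady state both lists are empty.
        return {
            "LogDirsOfflined": sorted(self.pending_offlined),
            "LogDirsOnlined": sorted(self.pending_onlined),
        }

    def handle_response(self, response):
        # The controller echoes the ids it has processed; stop resending those.
        self.pending_offlined -= set(response.get("LogDirsOfflined", []))
        self.pending_onlined -= set(response.get("LogDirsOnlined", []))


state = BrokerDirState()
state.mark_offline("dir-1")
first = state.build_request()
retry = state.build_request()  # response was lost, so the same ids are resent
state.handle_response({"LogDirsOfflined": ["dir-1"], "LogDirsOnlined": []})
after_ack = state.build_request()
```

The controller side would be the mirror image: apply the change, then echo the ids back in the response so the broker can drop them.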

Thanks again,

Tom

On Tue, 25 Oct 2022 at 19:59, Igor Soarez <soa...@apple.com.invalid> wrote:

> Hello,
>
> There’s now a proposal to address ZK to KRaft migration — KIP-866 — but
> currently it doesn't address JBOD so I've decided to update this proposal
> to address that migration scenario.
>
> So given that:
>
> - When migrating from a ZK cluster running JBOD to KRaft, brokers
> registering in KRaft mode will need to be able to register all configured
> log directories.
> - As part of the migration, the mapping of partition to log directory will
> have to be learnt by the active controller and persisted into the cluster
> metadata.
> - It isn’t safe to allow for leadership from replicas without this
> mapping, as if the hosting log directory fails there will be no failover
> mechanism.
>
> I have updated the proposal to reflect that:
>
> - Multiple log directories may be indicated in the first broker
> registration referencing log directory UUIDs. We no longer require a single
> log directory to start with.
> - The controller must never assign leadership to a replica in a broker
> registered with multiple log directories, unless the partition to log
> directory assignment is already in the cluster metadata.
> - The broker should not be unfenced until all of its partition to log
> directory mapping is persisted into cluster metadata
>
> I've also added details as to how the ZK to KRaft migration can work in a
> cluster that is already operating with JBOD.
>
> Please have a look and share your thoughts.
>
> --
> Igor
>
>
>
