Thanks, Igor.  Regarding the last point, I want to drill in on
something.  The KIP says this:

"Replicas will also be considered offline if the replica references a
log directory UUID (in the new field
partitionRecord.Assignment.Directory) that is not present in the
hosting Broker's latest registration under OnlineLogDirs and either:

  - the hosting broker's latest registration indicates multiple online log
    directories. i.e. brokerRegistration.OnlineLogDirs.length > 1
  - the hosting broker's latest registration indicates that there are
    offline directories. i.e. brokerRegistration.OfflineLogDirs == true

If neither of the above conditions are true, we assume that there is
only one log directory configured, the broker is not configured with
multiple log directories, replicas all live in the same directory and
neither log directory assignments nor log directory failures shall be
communicated to the Controller."


What does "the hosting broker's latest registration" refer to?  Is it
the prior registration that it had before it registered this time?  Or
is it this registration that it made when it came back up with a
single log directory?  If it is this registration, then neither of
those conditions hold and the replica is not considered offline.  If
it is the prior registration, then I don't know how we deal with the
broker restarting several times in quick succession -- the "prior"
registration will not remain the same.
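
To make the question concrete, here is a minimal Python sketch of the
quoted rule as I read it. It is only an illustration: the class and
function names are hypothetical and do not come from the KIP or the
Kafka code base, and the directory IDs are made up. Only OnlineLogDirs
and OfflineLogDirs are taken from the quoted text.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class BrokerRegistration:
    """Hypothetical stand-in for the two registration fields the KIP names."""
    online_log_dirs: FrozenSet[str]  # models OnlineLogDirs (directory UUIDs)
    offline_log_dirs: bool           # models OfflineLogDirs


def replica_is_offline(assigned_dir: str, reg: BrokerRegistration) -> bool:
    """Literal reading of the quoted rule, evaluated against one registration."""
    if assigned_dir in reg.online_log_dirs:
        # The referenced directory is registered as online: rule does not apply.
        return False
    # Directory is not registered: offline only if one of the two conditions holds.
    return len(reg.online_log_dirs) > 1 or reg.offline_log_dirs


# Assumed scenario: the replica was assigned to "dir-b", and the broker has
# now re-registered with a single log directory and OfflineLogDirs == false.
this_registration = BrokerRegistration(frozenset({"dir-a"}), False)
print(replica_is_offline("dir-b", this_registration))
```

Under this reading, evaluating the rule against the new single-directory
registration gives False, which is the case I am asking about.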

Ron


On Tue, Sep 12, 2023 at 10:58 AM Igor Soarez <soa...@apple.com.invalid> wrote:
>
> Hi Ron,
>
> Thank you for having a look at this KIP.
>
> Indeed, the log directory UUID should always be generated
> and loaded. I have corrected the wording in the KIP to clarify.
>
> It is a bit of a pain to replace the field, but I agree that is
> the best approach for the same reason you pointed out.
>
> I have updated the log.dir.failure.timeout.ms config
> documentation to make it clear that it only applies when
> there are partitions being led from the failed directory.
>
> Your understanding is correct regarding the snapshot result
> after the logical update when a broker transitions to multiple
> log directories. I have updated the KIP to clarify that.
>
> > I wonder about the corner case where a broker that previously
> > had multiple log dirs is restarted with a new config that specifies
> > just a single log directory.  What would happen here?  If the broker
> > were not the leader then perhaps it would replicate the data into the
> > single log directory.  What would happen if it were the leader of a
> > partition that had been marked as offline?  Would we have data loss
> > even if other replicas still had data?
>
> There would be no data loss. After the configuration change,
> the broker would register indicating a single log directory
> and OfflineLogDirs==false. This indicates to the controller
> that any replicas in this broker that referenced a different
> (and non null / default) log directory require a leadership
> update that would prevent this broker from becoming a
> leader for those partitions. Those partitions are then created
> by the broker into the single configured log directory, and
> streamed from the new leaders.
> Does this make sense?
>
> Thanks,
>
> --
> Igor
>
>
