Hi Ron,

Thanks for pointing this out.

I was assuming this case was already handled in current KRaft operation
with a single log directory — I wouldn't expect a broker restarting
with an empty disk to cause data loss in a current KRaft system.

But your question made me go look again, so here's my understanding:

When unclean leader election (ULE) is disabled (by default),
a broker can only be elected as a leader if both of these are true:
* It is part of the ISR - see PartitionChangeBuilder.isValidNewLeader(r)
* It is either
  + Not fenced and not in a controlled shutdown; or
  + Currently being unfenced

Unless of course unclean leader election is enabled, in which case
yes it looks like you could have full data loss. But I don't think
we want to address this case.

If the broker is coming back with an empty disk, it would still
need to join the ISR to become the leader.
So as far as I can tell, this doesn't seem to be a problem.

If we do always add the assignment, even for single log dir uses,
there could potentially be some smarter and safer choices we can do
in the broker, but AFAICT this doesn't seem to be a hard requirement
right now.

And as Colin pointed out before, addding the assignment even when
only a single dir is configured may unecessarily penalize non JBOD
usecases.

Does this make sense? Did I understand your point correctly? WDYT?

--
Igor

On Fri, Sep 22, 2023, at 2:44 PM, Ron Dagostino wrote:
> Hi Igor.  Someone just asked about the case where a broker with a
> single log directory restarts with a blank disk.  I looked at the
> "Metadata caching" section, and I don't think it covers it as
> currently written.  The PartitionRecord will not have an explicit UUID
> for brokers that have just a single log directory (the idea was to
> save space), but the UUID is inferred to be the UUID associated with
> the single log directory that the broker had registered.  Assume a
> broker restarts with an empty disk.  That disk will get a new UUID,
> and upon broker registration the controller will see the UUID mismatch
> between what the broker is presenting now and what it had presented
> the last time it registered.  So we need to deal with this possibility
> even for the case where a broker has a single log directory.  WDYT?
> 
> Ron
> 
> On Tue, Sep 19, 2023 at 10:04 AM Ron Dagostino <rndg...@gmail.com> wrote:
> >
> > Ok, great, that makes sense, Igor.  Thanks.  +1 (binding) on the KIP from 
> > me.
> >
> > Ron
> >
> > > On Sep 13, 2023, at 11:58 AM, Igor Soarez <soa...@apple.com.invalid> 
> > > wrote:
> > >
> > > Hi Ron,
> > >
> > > Thanks for drilling down on this. I think the KIP isn't really clear here,
> > > and the metadata caching section you quoted needs clarification.
> > >
> > > The "hosting broker's latest registration" refers to the previous,
> > > not the current registration. The registrations are only compared by
> > > the controller, when handling the broker registration request.
> > >
> > > Suppose broker b1 hosts two partitions, t-1 and t-2, in two
> > > directories, d1 and d2. The broker is registered, and the
> > > metadata correlates the replicas to their respective directories.
> > > i.e. OnlineLogDirs=[d1,d2] and OfflineLogDirs=false
> > >
> > > The broker is then reconfigured to remove t-2 from log.dirs, and at 
> > > startup,
> > > the registration request shows OnlineLogDirs=[d1] and 
> > > OfflineLogDirs=false.
> > > The previous registration will only be replaced after a new successful
> > > registration, regardless of how quickly or how often b1 restarts.
> > > The controller compares the previous registration, and notices
> > > that one of the directories has been removed.
> > > So for any replica hosted in the broker that is assigned to that
> > > missing log directory, a logical metadata update takes place
> > > that assigned them to Uuid.OfflineDir, so Assignment.Directory
> > > is updated for t-2. This value is indicates that the replica
> > > is offline — I have updated the section you quoted to address this.
> > >
> > > Once the broker catches up with metadata, it will select the only
> > > configured log directory — d1 — for any partitions assigned to
> > > Uuid.OfflineDir, and update the assignment.
> > >
> > > Best,
> > >
> > > --
> > > Igor
> > >
> > >
> > >
> 

Reply via email to