Hi folks,

Perhaps one option is to only rename stray partitions to
"whatever-topic-x.stray" when processing the LeaderAndIsrRequest, and
delete them with a periodic task (so not a fixed delay, but a thread that
periodically scans for and deletes them). I think this has the advantage
of being similar to the approach already used for deletion and compaction,
and it won't cause immediate mass deletion.

Viktor

On Thu, Jan 16, 2020 at 11:35 PM Colin McCabe <cmcc...@apache.org> wrote:

> On Thu, Jan 16, 2020, at 10:29, Dhruvil Shah wrote:
> > Hi Colin,
> >
> > That’s fair though I am unsure if a delay + metric + log message would
> > really serve our purpose. There would be no action required from the
> > operator in almost all cases. A signal that is not actionable in 99% of
> > cases may not be very useful, in my opinion.
>
> As I understand it, the case we're trying to solve is where a broker has
> gone away for a while and then comes back, but some of its partitions have
> been moved to a different broker.  Because this case is already relatively
> rare, I don't think we need to worry too much about adding non-actionable
> signals.
>
> Maybe more importantly, broker downtime will also independently trigger
> alerts in a well-managed cluster.  So what we are adding is a metric that
> indicates that "something bad is happening" that is highly correlated with
> other "something bad is happening" metrics.  This is similar to URPs, or
> even under-min-isr partitions, which are all worth monitoring and possibly
> alerting on, and which will all tend to show activity at the same time.
>
> >
> > Additionally, if we add a delay, we would need to reason about the
> > behavior when the same topic is recreated while a stray partition has
> > been queued for deletion.
> >
>
> This is a good question, but I think the current code already handles a
> very similar case.  The broker currently handles topic deletions in a
> two-step process.  The first step is renaming the topic directory.  The
> directory's new name will contain a UUID and end with .deleted.  The second
> step is actually deleting the directory.  (It was done this way so that
> deletion could happen asynchronously.)  I would expect the proposed delay
> mechanism to do something like this, such that a new topic created with the
> same name would not have a name collision.
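A sketch of that two-step pattern, under the assumption that step one is a rename to a UUID-suffixed `.deleted` directory and step two runs later (illustrative Python, not Kafka's actual code):

```python
import os
import shutil
import uuid

def schedule_topic_deletion(log_dir: str, topic_dir: str) -> str:
    """Step 1: move the directory aside so its name is free immediately.

    A topic recreated under the old name sees no collision because the
    doomed directory now has a unique UUID-suffixed name."""
    src = os.path.join(log_dir, topic_dir)
    doomed = f"{src}.{uuid.uuid4().hex}.deleted"
    os.rename(src, doomed)
    return doomed

def purge_scheduled(log_dir: str) -> int:
    """Step 2 (can run asynchronously): delete everything renamed in step 1."""
    count = 0
    for name in os.listdir(log_dir):
        if name.endswith(".deleted"):
            shutil.rmtree(os.path.join(log_dir, name))
            count += 1
    return count
```

The key property is that step 1 is a cheap, atomic rename on the same filesystem, so the window in which the old name is occupied is tiny compared to the time a recursive delete can take.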
>
> > I would be in support of adding a configuration to disable stray
> > partition deletion. This way, if users find abnormal behavior when
> > testing / upgrading development environments, they could choose to
> > disable the feature altogether.
> >
> > Let me know what you think. It would be good to hear what others think as
> > well.
>
> I feel strongly that this should come with a delay period and advance
> warning.  We just had too much pain with lost data as a result of bugs in
> HDFS leading to rapid deletion.  These bugs didn't manifest in testing or
> routine upgrades.
>
> best,
> Colin
>
>
> >
> > Thanks,
> > Dhruvil
> >
> > On Thu, Jan 16, 2020 at 3:24 AM Colin McCabe <cmcc...@apache.org> wrote:
> >
> > > On Wed, Jan 15, 2020, at 03:54, Dhruvil Shah wrote:
> > > > Hi Colin,
> > > >
> > > > We could add a configuration to disable stray partition deletion if
> > > > needed, but I wasn't sure if an operator would really want to disable
> > > > it. Perhaps if the implementation were buggy, the configuration could
> > > > be used to disable the feature until a bug fix is made. Is that the
> > > > kind of use case you were thinking of?
> > > >
> > > > I was thinking that there would not be any delay between detection
> > > > and deletion of stray logs. We would schedule an async task to do the
> > > > actual deletion though.
> > >
> > > Based on my experience in HDFS, immediately deleting data that looks
> > > out of place can cause severe issues when a bug occurs.  See
> > > https://issues.apache.org/jira/browse/HDFS-6186 for details.  So I
> > > really do think there should be a delay, and a metric + log message in
> > > the meantime to alert the operators to what is about to happen.
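Colin's delay-plus-metric-plus-log-message idea could be sketched roughly like this (illustrative Python only; the class name, the reclaim hook, and the metric are assumptions made for the example, not anything in Kafka):

```python
import logging
import time

log = logging.getLogger("stray-partitions")

class DelayedDeleter:
    """Mark suspect logs, warn loudly, and delete only after a grace period."""

    def __init__(self, delay_s: float, clock=time.monotonic):
        self.delay_s = delay_s
        self.clock = clock
        self.pending = {}  # partition -> time it was first marked

    def mark(self, partition: str):
        # setdefault so re-marking does not reset the clock
        self.pending.setdefault(partition, self.clock())
        log.warning("partition %s looks stray; deleting in %.0fs unless "
                    "reclaimed", partition, self.delay_s)

    def reclaim(self, partition: str):
        """Called if the partition turns out to be legitimate after all."""
        self.pending.pop(partition, None)

    def sweep(self) -> list:
        """Return (and forget) partitions whose grace period has expired."""
        now = self.clock()
        expired = [p for p, t in self.pending.items()
                   if now - t >= self.delay_s]
        for p in expired:
            del self.pending[p]
        return expired

    def stray_count(self) -> int:
        """Metric: number of partitions currently awaiting deletion."""
        return len(self.pending)
```

The `stray_count()` gauge plus the warning log give operators the advance signal, and `reclaim()` is the escape hatch if a bug (as in HDFS-6186) mislabels healthy data during the grace period.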
> > >
> > > best,
> > > Colin
> > >
> > > >
> > > > Thanks,
> > > > Dhruvil
> > > >
> > > > On Tue, Jan 14, 2020 at 11:04 PM Colin McCabe <cmcc...@apache.org>
> > > > wrote:
> > > >
> > > > > Hi Dhruvil,
> > > > >
> > > > > Thanks for the KIP.  I think there should be some way to turn this
> > > > > off, in case that becomes necessary.  I'm also curious how long we
> > > > > intend to wait between detecting the duplication and deleting the
> > > > > extra logs.  The KIP says "scheduled for deletion" but doesn't give
> > > > > a time frame -- is it assumed to be immediate?
> > > > >
> > > > > best,
> > > > > Colin
> > > > >
> > > > >
> > > > > On Tue, Jan 14, 2020, at 05:56, Dhruvil Shah wrote:
> > > > > > If there are no more questions or concerns, I will start a vote
> > > > > > thread tomorrow.
> > > > > >
> > > > > > Thanks,
> > > > > > Dhruvil
> > > > > >
> > > > > > On Mon, Jan 13, 2020 at 6:59 PM Dhruvil Shah <dhru...@confluent.io>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Nikhil,
> > > > > > >
> > > > > > > Thanks for looking at the KIP. The kind of race condition you
> > > > > > > mention is not possible as stray partition detection is done
> > > > > > > synchronously while handling the LeaderAndIsrRequest. In other
> > > > > > > words, we atomically evaluate the partitions the broker must
> > > > > > > host and the extra partitions it is hosting, and schedule
> > > > > > > deletions based on that.
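The atomic evaluation described here boils down to a set difference computed in one pass while the request is being handled; a minimal sketch (illustrative Python; the function and parameter names are assumptions):

```python
def find_stray_partitions(hosted, assigned):
    """Partitions present on disk that the LeaderAndIsrRequest no longer
    assigns to this broker. Computing both sides in the same pass keeps
    detection atomic with respect to the request being handled."""
    return set(hosted) - set(assigned)
```

Since the hosted set and the assigned set are compared in the same handler invocation, a concurrently created topic cannot slip between detection and scheduling.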
> > > > > > >
> > > > > > > One possible shortcoming of the KIP is that we do not have the
> > > > > > > ability to detect a stray partition if the topic has been
> > > > > > > recreated since. We will have the ability to disambiguate
> > > > > > > between different generations of a partition with KIP-516.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Dhruvil
> > > > > > >
> > > > > > > On Sat, Jan 11, 2020 at 11:40 AM Nikhil Bhatia <nik...@confluent.io>
> > > > > > > wrote:
> > > > > > >
> > > > > > >> Thanks Dhruvil, the proposal looks reasonable to me.
> > > > > > >>
> > > > > > >> Is there a potential for a race between a new topic being
> > > > > > >> assigned to the same node that is still performing a cleanup of
> > > > > > >> the stray partition? Topic ID will definitely solve this issue.
> > > > > > >>
> > > > > > >> Thanks
> > > > > > >> Nikhil
> > > > > > >>
> > > > > > >> On 2020/01/06 04:30:20, Dhruvil Shah <d...@confluent.io> wrote:
> > > > > > >> > Here is the link to the KIP:
> > > > > > >> >
> > > > > > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-550%3A+Mechanism+to+Delete+Stray+Partitions+on+Broker
> > > > > > >> >
> > > > > > >> > On Mon, Jan 6, 2020 at 9:59 AM Dhruvil Shah <dh...@confluent.io>
> > > > > > >> > wrote:
> > > > > > >> >
> > > > > > >> > > Hi all, I would like to kick off discussion for KIP-550, which
> > > > > > >> > > proposes a mechanism to detect and delete stray partitions on
> > > > > > >> > > a broker. Suggestions and feedback are welcome.
> > > > > > >> > >
> > > > > > >> > > - Dhruvil
> > > > > > >> >
> > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
