Re: [DISCUSS] KIP-346 - Limit blast radius of log compaction failure

Stanislav Kozlovski Tue, 24 Jul 2018 17:23:17 -0700

Hey James, Ted,

@James - Thanks for showing me some of the changes, that was informative.

* *Log Cleaner Thread Revival* - I also acknowledge that could be useful.
My concern is that if the thread has died, there is most likely something
wrong with either the disk or the software and since both are deterministic
(correct me if I'm wrong), we will most likely hit it very soon again. I am
not sure that scenario would be any good, but I am also not sure if it
would hurt. Could repeatedly dying and restarting waste a significant
amount of CPU?

* *Partition Re-clean* - Hmm, maybe some sort of retry mechanism could be
worth exploring. I'd like to hear other people's opinions on this, and
whether or not they've seen such scenarios, before diving into a possible
implementation.

* *Metric* - Could you point me to some resources showing how the JMX
metrics should be structured? I could not find any and am sadly not too
knowledgeable on the topic.

* *uncleanable-partitions* *metric* - Yes, that might be problematic. Maybe
the format Ted suggested would be best - "topic1-0,1,2". Then again, I fear
we might still run out of characters. I am not sure how to best approach
this yet.

* *Disk Problems* - I am aware that the 4 JIRAs are not related to disk
problems. I think this KIP brings the most value to exactly such scenarios
- ones where the disk is OK. But then again, I thought I'd suggest failing
the disk after a certain number of errors on it since it makes sense to me.
I do not have a strong opinion about this, though. Now that you mentioned
that this actually increases the blast radius - I tend to agree. Maybe we
should scrap this behavior.
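
To make the *uncleanable-partitions* format above concrete, here is a
minimal sketch of the grouping Ted suggested, where each topic name appears
only once. It is written in Python purely for illustration and is not taken
from the KIP; the function name `format_uncleanable` and the ";" separator
between topics are assumptions of mine.

```python
from collections import defaultdict


def format_uncleanable(partitions):
    """Render (topic, partition) pairs as 'topic-{p1,p2,...}' groups.

    Hypothetical helper: the name and the ';' separator are illustrative
    assumptions, not part of the KIP.
    """
    by_topic = defaultdict(list)
    for topic, partition in partitions:
        by_topic[topic].append(partition)

    groups = []
    # Sort topics and partitions so the rendered value is stable.
    for topic, parts in sorted(by_topic.items()):
        joined = ",".join(str(p) for p in sorted(set(parts)))
        groups.append(topic + "-{" + joined + "}")
    return ";".join(groups)


print(format_uncleanable([("topic1", 0), ("topic1", 2), ("topic1", 1), ("topic2", 5)]))
```

This only shortens the string by dropping repeated topic names. It does not
by itself address the concern about running out of characters when many
partitions are uncleanable.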

Best,
Stanislav

On Tue, Jul 24, 2018 at 6:13 AM Ted Yu <yuzhih...@gmail.com> wrote:

> As James pointed out in his reply, topic-partition name can be long.
> It is not necessary to repeat the topic name for each of its partitions.
> How about the following format:
>
> topic-name1-{partition1, partition2, etc}
>
> That is, topic name only appears once.
>
> Cheers
>
> On Mon, Jul 23, 2018 at 9:08 PM Stanislav Kozlovski <
> stanis...@confluent.io>
> wrote:
>
> > Hi Ted,
> >
> > Yes, absolutely. Thanks for pointing that out!
> >
> > On Mon, Jul 23, 2018 at 6:12 PM Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > > For `uncleanable-partitions`, should the example include topic name(s)?
> > >
> > > Cheers
> > >
> > > On Mon, Jul 23, 2018 at 5:46 PM Stanislav Kozlovski <
> > > stanis...@confluent.io>
> > > wrote:
> > >
> > > > I renamed the KIP and that changed the link. Sorry about that. Here is
> > > > the new link:
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-346+-+Improve+LogCleaner+behavior+on+error
> > > >
> > > > On Mon, Jul 23, 2018 at 5:11 PM Stanislav Kozlovski <
> > > > stanis...@confluent.io>
> > > > wrote:
> > > >
> > > > > Hey group,
> > > > >
> > > > > I created a new KIP about making log compaction more fault-tolerant.
> > > > > Please give it a look here and please share what you think, especially
> > > > > in regards to the points in the "Needs Discussion" paragraph.
> > > > >
> > > > > KIP: KIP-346
> > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-346+-+Limit+blast+radius+of+log+compaction+failure>
> > > > >
> > > > > --
> > > > > Best,
> > > > > Stanislav
> > > > >
> > > >
> > > >
> > > > --
> > > > Best,
> > > > Stanislav
> > > >
> > >
> >
> >
> > --
> > Best,
> > Stanislav
> >
>

-- 
Best,
Stanislav
