Hi Stanislav, Just a couple quick questions:
1. I may have missed it, but what will be the default value for `max.uncleanable.partitions`?

2. It seems there will be some impact for users that monitor "time-since-last-run-ms" in order to detect cleaner failures. Not sure it's a major concern, but it's probably worth mentioning in the compatibility section. Also, is this still a useful metric after this KIP?

Also, maybe the "Needs Discussion" item can be moved to rejected alternatives since you've moved to a vote? I think leaving this for potential future work is reasonable.

Thanks,
Jason

On Mon, Aug 6, 2018 at 12:29 PM, Ray Chiang <rchi...@apache.org> wrote:
> I'm okay with that.
> -Ray
> On 8/6/18 10:59 AM, Colin McCabe wrote:
>> Perhaps we could start with max.uncleanable.partitions and then implement max.uncleanable.partitions.per.logdir in a follow-up change if it seemed to be necessary? What do you think?
>> regards,
>> Colin
>> On Sat, Aug 4, 2018, at 10:53, Stanislav Kozlovski wrote:
>>> Hey Ray,
>>> Thanks for the explanation. In regards to the configuration property - I'm not sure. As long as it has sufficient documentation, I find "max.uncleanable.partitions" to be okay. If we were to add the distinction explicitly, maybe it should be `max.uncleanable.partitions.per.logdir`?
>>> On Thu, Aug 2, 2018 at 7:32 PM Ray Chiang <rchi...@apache.org> wrote:
>>>> One more thing occurred to me. Should the configuration property be named "max.uncleanable.partitions.per.disk" instead?
>>>> -Ray
>>>> On 8/1/18 9:11 AM, Stanislav Kozlovski wrote:
>>>>> Yes, good catch. Thank you, James!
>>>>> Best,
>>>>> Stanislav
>>>>> On Wed, Aug 1, 2018 at 5:05 PM James Cheng <wushuja...@gmail.com> wrote:
>>>>>> Can you update the KIP to say what the default is for max.uncleanable.partitions?
>>>>>> -James
>>>>>> Sent from my iPhone
>>>>>> On Jul 31, 2018, at 9:56 AM, Stanislav Kozlovski <stanis...@confluent.io> wrote:
>>>>>>> Hey group,
>>>>>>> I am planning on starting a voting thread tomorrow. Please do reply if you feel there is anything left to discuss.
>>>>>>> Best,
>>>>>>> Stanislav
>>>>>>> On Fri, Jul 27, 2018 at 11:05 PM Stanislav Kozlovski <stanis...@confluent.io> wrote:
>>>>>>>> Hey, Ray
>>>>>>>> Thanks for pointing that out, it's fixed now.
>>>>>>>> Best,
>>>>>>>> Stanislav
>>>>>>>> On Fri, Jul 27, 2018 at 9:43 PM Ray Chiang <rchi...@apache.org> wrote:
>>>>>>>>> Thanks. Can you fix the link in the "KIPs under discussion" table on the main KIP landing page <https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals#>? I tried, but the Wiki won't let me.
>>>>>>>>> -Ray
>>>>>>>>> On 7/26/18 2:01 PM, Stanislav Kozlovski wrote:
>>>>>>>>>> Hey guys,
>>>>>>>>>> @Colin - good point. I added some sentences mentioning recent improvements in the introductory section.
>>>>>>>>>> *Disk Failure* - I tend to agree with what Colin said - once a disk fails, you don't want to work with it again. As such, I've changed my mind and believe that we should mark the LogDir (assume it's a disk) as offline on the first `IOException` encountered. This is the LogCleaner's current behavior. We shouldn't change that.
>>>>>>>>>> *Respawning Threads* - I believe we should never re-spawn a thread. The correct approach in my mind is to either have it stay dead or never let it die in the first place.
>>>>>>>>>> *Uncleanable-partition-names metric* - Colin is right, this metric is unneeded. Users can monitor the `uncleanable-partitions-count` metric and inspect the logs.
>>>>>>>>>> Hey Ray,
>>>>>>>>>>> 2) I'm 100% with James in agreement with setting up the LogCleaner to skip over problematic partitions instead of dying.
>>>>>>>>>> I think we can do this for every exception that isn't `IOException`. This will future-proof us against bugs in the system and potential other errors. Protecting yourself against unexpected failures is always a good thing in my mind, but I also think that protecting yourself against bugs in the software is sort of clunky. What does everybody think about this?
>>>>>>>>>>> 4) The only improvement I can think of is that if such an error occurs, then have the option (configuration setting?) to create a <log_segment>.skip file (or something similar).
>>>>>>>>>> This is a good suggestion. Have others also seen corruption be generally tied to the same segment?
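To make the "skip the problematic partition, keep the thread alive" idea above concrete, here is a minimal sketch of the control flow being discussed. It is illustrative only, not the KIP's implementation (the real cleaner lives in the broker's Scala code); the class, method, and field names are hypothetical.

    import java.io.IOException;
    import java.io.UncheckedIOException;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    // Sketch: one cleaning pass that quarantines partitions which throw
    // unexpected (non-I/O) errors instead of letting the whole thread die.
    public class CleanerLoopSketch {
        private final Set<String> uncleanablePartitions = ConcurrentHashMap.newKeySet();

        void cleanOnePass(Iterable<String> dirtyPartitions) {
            for (String topicPartition : dirtyPartitions) {
                if (uncleanablePartitions.contains(topicPartition)) {
                    continue;  // already quarantined; skip it on every subsequent pass
                }
                try {
                    cleanPartition(topicPartition);
                } catch (IOException e) {
                    // Disk-level failure: rethrow so the log directory is marked
                    // offline, matching the current behavior (and KIP-112).
                    throw new UncheckedIOException(e);
                } catch (Exception e) {
                    // Anything else is most likely a logic bug affecting only this
                    // partition, so quarantine it and keep cleaning the rest.
                    uncleanablePartitions.add(topicPartition);
                }
            }
        }

        private void cleanPartition(String topicPartition) throws IOException {
            // placeholder for the actual compaction work
        }
    }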
>>>>>>>>>> On Wed, Jul 25, 2018 at 11:55 AM Dhruvil Shah <dhru...@confluent.io> wrote:
>>>>>>>>>>> For the cleaner thread specifically, I do not think respawning will help at all because we are more than likely to run into the same issue again, which would end up crashing the cleaner. Retrying makes sense for transient errors or when you believe some part of the system could have healed itself, both of which I think are not true for the log cleaner.
>>>>>>>>>>> On Wed, Jul 25, 2018 at 11:08 AM Ron Dagostino <rndg...@gmail.com> wrote:
>>>>>>>>>>>> <<<respawning threads is likely to make things worse, by putting you in an infinite loop which consumes resources and fires off continuous log messages.
>>>>>>>>>>>> Hi Colin. In case it could be relevant, one way to mitigate this effect is to implement a backoff mechanism (if a second respawn is to occur, then wait for 1 minute before doing it; then if a third respawn is to occur, wait for 2 minutes before doing it; then 4 minutes, 8 minutes, etc., up to some max wait time).
>>>>>>>>>>>> I have no opinion on whether respawn is appropriate or not in this context, but a mitigation like the increasing backoff described above may be relevant in weighing the pros and cons.
>>>>>>>>>>>> Ron
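For what it's worth, the capped exponential backoff Ron describes could look roughly like the sketch below. This is purely an illustration of the mitigation being weighed, not something the KIP proposes; the thread name and the 30-minute cap are arbitrary assumptions.

    import java.time.Duration;

    // Sketch: supervise a cleaner task and, if its thread ever dies, respawn it
    // after an increasing delay (1 min, 2 min, 4 min, ... up to a cap).
    public class RespawnWithBackoffSketch {
        public static void supervise(Runnable cleanerTask) throws InterruptedException {
            Duration backoff = Duration.ofMinutes(1);
            final Duration maxBackoff = Duration.ofMinutes(30);  // assumed cap
            while (true) {
                Thread cleaner = new Thread(cleanerTask, "log-cleaner");
                cleaner.start();
                cleaner.join();                    // returns only once the thread has died
                Thread.sleep(backoff.toMillis());  // wait before the next respawn
                backoff = backoff.multipliedBy(2);
                if (backoff.compareTo(maxBackoff) > 0) {
                    backoff = maxBackoff;          // never wait longer than the cap
                }
            }
        }
    }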
>>>>>>>>>>>> On Wed, Jul 25, 2018 at 1:26 PM Colin McCabe <cmcc...@apache.org> wrote:
>>>>>>>>>>>>> On Mon, Jul 23, 2018, at 23:20, James Cheng wrote:
>>>>>>>>>>>>>> Hi Stanislav! Thanks for this KIP!
>>>>>>>>>>>>>> I agree that it would be good if the LogCleaner were more tolerant of errors. Currently, as you said, once it dies, it stays dead.
>>>>>>>>>>>>>> Things are better now than they used to be. We have the metric kafka.log:type=LogCleanerManager,name=time-since-last-run-ms which we can use to tell us if the threads are dead. And as of 1.1.0, we have KIP-226, which allows you to restart the log cleaner thread without requiring a broker restart: https://cwiki.apache.org/confluence/display/KAFKA/KIP-226+-+Dynamic+Broker+Configuration I've only read about this, I haven't personally tried it.
>>>>>>>>>>>>> Thanks for pointing this out, James! Stanislav, we should probably add a sentence or two mentioning the KIP-226 changes somewhere in the KIP. Maybe in the intro section?
>>>>>>>>>>>>> I think it's clear that requiring the users to manually restart the log cleaner is not a very good solution. But it's good to know that it's a possibility on some older releases.
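As an aside for operators reading along, the existing time-since-last-run-ms gauge mentioned above can already be polled over JMX to alert on a dead cleaner. A rough example follows; the JMX service URL and port, the "Value" attribute name, and the 15-minute alert threshold are assumptions about a typical deployment, so adjust them to your own setup.

    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    // Example: poll the broker's LogCleanerManager gauge and warn if the cleaner
    // has not completed a run recently.
    public class CleanerLagCheck {
        public static void main(String[] args) throws Exception {
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://broker-host:9999/jmxrmi");  // assumed JMX endpoint
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                Object value = connector.getMBeanServerConnection().getAttribute(
                        new ObjectName("kafka.log:type=LogCleanerManager,name=time-since-last-run-ms"),
                        "Value");  // Kafka gauges are typically exposed under a "Value" attribute
                long msSinceLastRun = ((Number) value).longValue();
                if (msSinceLastRun > 15 * 60 * 1000L) {  // example threshold: 15 minutes
                    System.err.println("Log cleaner may be dead: last run " + msSinceLastRun + " ms ago");
                }
            }
        }
    }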
>>>>>>>>>>>>>> Some comments:
>>>>>>>>>>>>>> * I like the idea of having the log cleaner continue to clean as many partitions as it can, skipping over the problematic ones if possible.
>>>>>>>>>>>>>> * If the log cleaner thread dies, I think it should automatically be revived. Your KIP attempts to do that by catching exceptions during execution, but I think we should go all the way and make sure that a new one gets created, if the thread ever dies.
>>>>>>>>>>>>> This is inconsistent with the way the rest of Kafka works. We don't automatically re-create other threads in the broker if they terminate. In general, if there is a serious bug in the code, respawning threads is likely to make things worse, by putting you in an infinite loop which consumes resources and fires off continuous log messages.
>>>>>>>>>>>>>> * It might be worth trying to re-clean the uncleanable partitions. I've seen cases where an uncleanable partition later became cleanable. I unfortunately don't remember how that happened, but I remember being surprised when I discovered it. It might have been something like a follower was uncleanable but after a leader election happened, the log truncated and it was then cleanable again. I'm not sure.
>>>>>>>>>>>>> James, I disagree. We had this behavior in the Hadoop Distributed File System (HDFS) and it was a constant source of user problems.
>>>>>>>>>>>>> What would happen is disks would just go bad over time. The DataNode would notice this and take them offline. But then, due to some "optimistic" code, the DataNode would periodically try to re-add them to the system. Then one of two things would happen: the disk would just fail immediately again, or it would appear to work and then fail after a short amount of time.
>>>>>>>>>>>>> The way the disk failed was normally having an I/O request take a really long time and time out. So a bunch of request handler threads would basically slam into a brick wall when they tried to access the bad disk, slowing the DataNode to a crawl. It was even worse in the second scenario: if the disk appeared to work for a while but then failed, any data that had been written on that DataNode to that disk would be lost, and we would need to re-replicate it.
>>>>>>>>>>>>> Disks aren't biological systems -- they don't heal over time. Once they're bad, they stay bad. The log cleaner needs to be robust against cases where the disk really is failing, and really is returning bad data or timing out.
>>>>>>>>>>>>>> * For your metrics, can you spell out the full metric in JMX-style format, such as: kafka.log:type=LogCleanerManager,name=uncleanable-partitions-count value=4
>>>>>>>>>>>>>> * For "uncleanable-partitions": topic-partition names can be very long. I think the current max size is 210 characters (or maybe 240-ish?). Having "uncleanable-partitions" be a list could make for a very large metric. Also, having the metric come out as a csv might be difficult to work with for monitoring systems. If we *did* want the topic names to be accessible, what do you think of having kafka.log:type=LogCleanerManager,topic=topic1,partition=2? I'm not sure if LogCleanerManager is the right type, but my example was that the topic and partition can be tags in the metric. That will allow monitoring systems to more easily slice and dice the metric. I'm not sure what the attribute for that metric would be. Maybe something like "uncleaned bytes" for that topic-partition? Or time-since-last-clean? Or maybe even just "Value=1".
>>>>>>>>>>>>> I haven't thought about this that hard, but do we really need the uncleanable topic names to be accessible through a metric? It seems like the admin should notice that uncleanable partitions are present, and then check the logs?
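If per-partition MBeans with topic and partition tags (as James sketches above) were ever exposed, a monitoring tool could enumerate them with an ObjectName pattern instead of parsing a CSV attribute. The snippet below is hypothetical, since no such MBeans exist in the current proposal:

    import java.lang.management.ManagementFactory;
    import java.util.Set;
    import javax.management.MBeanServer;
    import javax.management.ObjectName;

    // Hypothetical query: enumerate tagged uncleanable-partition MBeans so a
    // monitoring system can slice and dice by topic and partition.
    public class UncleanableQuerySketch {
        public static Set<ObjectName> findUncleanable() throws Exception {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName pattern = new ObjectName(
                    "kafka.log:type=LogCleanerManager,topic=*,partition=*");  // wildcards on the tags
            return server.queryNames(pattern, null);
        }
    }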
>>>>>>>>>>>>>> * About `max.uncleanable.partitions`, you said that this likely indicates that the disk is having problems. I'm not sure that is the case. For the 4 JIRAs that you mentioned about log cleaner problems, all of them are partition-level scenarios that happened during normal operation. None of them were indicative of disk problems.
>>>>>>>>>>>>> I don't think this is a meaningful comparison. In general, we don't accept JIRAs for hard disk problems that happen on a particular cluster. If someone opened a JIRA that said "my hard disk is having problems" we could close that as "not a Kafka bug." This doesn't prove that disk problems don't happen, but just that JIRA isn't the right place for them.
>>>>>>>>>>>>> I do agree that the log cleaner has had a significant number of logic bugs, and that we need to be careful to limit their impact. That's one reason why I think that a threshold of "number of uncleanable logs" is a good idea, rather than just failing after one IOException. In all the cases I've seen where a user hit a logic bug in the log cleaner, it was just one partition that had the issue. We also should increase test coverage for the log cleaner.
>>>>>>>>>>>>>> * About marking disks as offline when exceeding a certain threshold, that actually increases the blast radius of log compaction failures. Currently, the uncleaned partitions are still readable and writable. Taking the disks offline would impact availability of the uncleanable partitions, as well as impact all other partitions that are on the disk.
>>>>>>>>>>>>> In general, when we encounter I/O errors, we take the disk partition offline. This is spelled out in KIP-112 (https://cwiki.apache.org/confluence/display/KAFKA/KIP-112%3A+Handle+disk+failure+for+JBOD):
>>>>>>>>>>>>>> - Broker assumes a log directory to be good after it starts, and mark log directory as bad once there is IOException when broker attempts to access (i.e. read or write) the log directory.
>>>>>>>>>>>>>> - Broker will be offline if all log directories are bad.
>>>>>>>>>>>>>> - Broker will stop serving replicas in any bad log directory.
>>>>>>>>>>>>> New
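Pulling the threshold discussion above together, the behavior being debated amounts to something like the following sketch: tolerate individual uncleanable partitions, but once a single log directory accumulates more than max.uncleanable.partitions of them, hand the directory to the KIP-112 failure path. All of the names here are hypothetical; this is a reading of the discussion, not the KIP's actual code.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Sketch: count uncleanable partitions per log directory and take the
    // directory offline only once a configured threshold is exceeded.
    public class UncleanableThresholdSketch {
        private final int maxUncleanablePartitions;  // would come from broker config
        private final Map<String, Integer> uncleanableByLogDir = new ConcurrentHashMap<>();

        public UncleanableThresholdSketch(int maxUncleanablePartitions) {
            this.maxUncleanablePartitions = maxUncleanablePartitions;
        }

        void onUncleanablePartition(String logDir) {
            int count = uncleanableByLogDir.merge(logDir, 1, Integer::sum);
            if (count > maxUncleanablePartitions) {
                markLogDirOffline(logDir);  // hypothetical hook into the log-dir failure handling
            }
        }

        private void markLogDirOffline(String logDir) {
            // placeholder: the broker would stop serving replicas in this directory,
            // consistent with the KIP-112 behavior quoted above
        }
    }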