Hi James and all,

I checked again, and I can see that when creating a UnifiedLog, we expect the logs/indexes/snapshots to be in a good state. So, I don't think we should break the current design to expose the `RemainingBytesToRecovery` metric.
If there are no other comments, I'll start a vote within this week.

Thank you.
Luke

On Fri, May 6, 2022 at 6:00 PM Luke Chen <show...@gmail.com> wrote:

> Hi James,
>
> Thanks for your input.
>
> For the `RemainingBytesToRecovery` metric proposal, I think there's one
> thing I didn't make clear.
> Currently, when the log manager starts up, we try to load all logs
> (segments), and during log loading, we recover logs if necessary.
> The log loading does use a thread pool, as you thought.
>
> So, here's the problem:
> The segments in each log folder (partition) are loaded by a single log
> recovery thread, and only after they are loaded can we know how many
> segments (or how many bytes) need to be recovered.
> That means, if we have 10 partition logs on one broker and 2 log
> recovery threads (num.recovery.threads.per.data.dir=2), then before the
> threads load the segments in each log, we only know how many logs
> (partitions) we have on the broker (i.e., the RemainingLogsToRecover
> metric). We cannot know how many segments/bytes need to be recovered
> until a thread starts to load the segments under one log (partition).
>
> So, the example in the KIP shows:
> Currently, there are still 5 logs (partitions) to recover under the
> /tmp/log1 dir, and there are 2 threads doing the job, where one thread
> has 10000 segments to recover and the other has 3 segments to recover.
>
> - kafka.log
>   - LogManager
>     - RemainingLogsToRecover
>       - /tmp/log1 => 5    ← there are 5 logs under /tmp/log1 to be recovered
>       - /tmp/log2 => 0
>     - RemainingSegmentsToRecover
>       - /tmp/log1         ← 2 threads are doing log recovery for /tmp/log1
>         - 0 => 10000      ← there are 10000 segments to be recovered by thread 0
>         - 1 => 3
>       - /tmp/log2
>         - 0 => 0
>         - 1 => 0
>
> So, after a while, the metrics might look like this:
> This says that now there are only 3 logs left to recover in /tmp/log1,
> thread 0 has 9000 segments left, and thread 1 has 5 segments left (which
> should imply that the threads already completed 2 logs' recovery in that
> period):
>
> - kafka.log
>   - LogManager
>     - RemainingLogsToRecover
>       - /tmp/log1 => 3    ← there are 3 logs under /tmp/log1 to be recovered
>       - /tmp/log2 => 0
>     - RemainingSegmentsToRecover
>       - /tmp/log1         ← 2 threads are doing log recovery for /tmp/log1
>         - 0 => 9000       ← there are 9000 segments to be recovered by thread 0
>         - 1 => 5
>       - /tmp/log2
>         - 0 => 0
>         - 1 => 0
>
> That said, the `RemainingBytesToRecovery` metric is difficult to achieve
> as you expected. I think the current proposal with RemainingLogsToRecover
> and RemainingSegmentsToRecover should already provide enough info about
> the log recovery progress.
>
> I've also updated the KIP example to make it clear.
>
> Thank you.
> Luke
>
> On Thu, May 5, 2022 at 3:31 AM James Cheng <wushuja...@gmail.com> wrote:
>
>> Hi Luke,
>>
>> Thanks for adding RemainingSegmentsToRecovery.
>>
>> Another thought: different topics can have different segment sizes. I
>> don't know how common it is, but it is possible. Some topics might want
>> small segment sizes for more granular expiration of data.
>>
>> The downside of RemainingLogsToRecovery and RemainingSegmentsToRecovery
>> is that the rate at which they decrement depends on the configuration
>> and patterns of the topics, partitions, and segment sizes.
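The per-thread accounting behind the `RemainingSegmentsToRecover` example above can be sketched in a small simulation. This is illustrative only, not Kafka code: the thread/gauge structure and names are assumptions modeled on the KIP example, where each recovery thread pulls whole logs from a shared set and learns the segment count only once it opens a log.

```python
# Minimal simulation of per-thread RemainingSegmentsToRecover gauges.
# Illustrative only -- not the actual Kafka LogManager implementation.
from queue import Queue, Empty
from threading import Thread

def recover(thread_id, logs, gauges):
    """Each recovery thread takes whole logs (partitions) from a shared
    queue; the segment count is known only once the log is opened."""
    while True:
        try:
            partition, num_segments = logs.get_nowait()
        except Empty:
            return
        gauges[thread_id] = num_segments   # segment count known only now
        for _ in range(num_segments):      # "recover" one segment at a time
            gauges[thread_id] -= 1
        logs.task_done()

logs = Queue()
for partition, segments in [("t-0", 10000), ("t-1", 3), ("t-2", 7)]:
    logs.put((partition, segments))

gauges = {0: 0, 1: 0}                      # num.recovery.threads.per.data.dir=2
threads = [Thread(target=recover, args=(i, logs, gauges)) for i in gauges]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(gauges)  # both per-thread gauges drop back to 0 once recovery completes
```

Until a thread dequeues a log, its gauge says nothing about that log's size, which is the constraint Luke describes above.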
>> If someone is monitoring those metrics, they might see times when the
>> metric decrements slowly, followed by a burst where it decrements
>> quickly.
>>
>> What about RemainingBytesToRecovery? This would not depend on the
>> configuration of the topic or of the data. It would actually be a pretty
>> good metric, because I think this metric would change at a constant rate
>> (based on the disk I/O speed that the broker allocates to recovery).
>> Because it changes at a constant rate, you would be able to use the
>> rate of change to predict when it hits zero, which would let you know
>> when the broker is going to start up. I would imagine that if we graphed
>> RemainingBytesToRecovery, we'd see a fairly straight line decrementing
>> at a steady rate towards zero.
>>
>> What do you think about adding RemainingBytesToRecovery?
>>
>> Or, what would you think about making the primary metric
>> RemainingBytesToRecovery and getting rid of the others?
>>
>> I don't know if I personally would rather have all 3 metrics or would
>> just use RemainingBytesToRecovery. I too would like more community input
>> on which of those metrics would be useful to people.
>>
>> About the JMX metrics, you said that if
>> num.recovery.threads.per.data.dir=2, there might be a separate
>> RemainingSegmentsToRecovery counter for each thread. Is that actually
>> how the data is structured within the Kafka recovery threads? Does each
>> thread get a fixed set of partitions, or is there just one big pool of
>> partitions that the threads all work on?
>>
>> As a more concrete example:
>> * If I have 9 small partitions and 1 big partition, and
>> num.recovery.threads.per.data.dir=2:
>> Does each thread get 5 partitions, which means one thread will finish
>> much sooner than the other?
>> OR
>> Do both threads just work on the set of 10 partitions, which means
>> likely 1 thread will be busy with the big partition while the other one
>> ends up plowing through the 9 small partitions?
>>
>> If each thread gets assigned 5 partitions, then it would make sense
>> that each thread has its own counter.
>> If the threads work on a single pool of 10 partitions, then it would
>> probably mean that the counter is on the pool of partitions itself, and
>> not on each thread.
>>
>> -James
>>
>> > On May 4, 2022, at 5:55 AM, Luke Chen <show...@gmail.com> wrote:
>> >
>> > Hi devs,
>> >
>> > If there are no other comments, I'll start a vote tomorrow.
>> >
>> > Thank you.
>> > Luke
>> >
>> > On Sun, May 1, 2022 at 5:08 PM Luke Chen <show...@gmail.com> wrote:
>> >
>> >> Hi James,
>> >>
>> >> Sorry for the late reply.
>> >>
>> >> Yes, this is a good point: to know how many segments need to be
>> >> recovered when there are some large partitions.
>> >> I've updated the KIP to add a `RemainingSegmentsToRecover` metric for
>> >> each log recovery thread, to show the value.
>> >> The example in the Proposed Changes section here
>> >> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress#KIP831:Addmetricforlogrecoveryprogress-ProposedChanges>
>> >> shows what it will look like.
>> >>
>> >> Thanks for the suggestion.
>> >>
>> >> Thank you.
>> >> Luke
>> >>
>> >> On Sat, Apr 23, 2022 at 8:54 AM James Cheng <wushuja...@gmail.com> wrote:
>> >>
>> >>> The KIP describes RemainingLogsToRecovery, which seems to be the
>> >>> number of partitions in each log.dir.
>> >>>
>> >>> We have some partitions which are much, much larger than others.
>> >>> Those large partitions have many, many more segments than others.
>> >>>
>> >>> Is there a way the metric can reflect partition size? Could it be
>> >>> RemainingSegmentsToRecover? Or even RemainingBytesToRecover?
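James's earlier argument for a bytes-based metric is that, if it falls at a roughly constant rate, two samples are enough to project when recovery finishes. A minimal sketch of that arithmetic (the metric name comes from the proposal; the sampling function itself is a hypothetical helper, not a Kafka API):

```python
# Project the time remaining from two samples of a steadily decreasing
# RemainingBytesToRecovery metric. Illustrative helper, not a Kafka API.
def eta_seconds(sample1, sample2):
    """Each sample is (timestamp_sec, remaining_bytes)."""
    t1, b1 = sample1
    t2, b2 = sample2
    rate = (b1 - b2) / (t2 - t1)   # bytes recovered per second
    if rate <= 0:
        return None                # no forward progress observed yet
    return b2 / rate

# e.g. samples 60 s apart: 90 GB remaining, then 84 GB remaining
print(eta_seconds((0, 90e9), (60, 84e9)))  # 840.0 seconds, i.e. ~14 minutes
```

This prediction only works if the metric really does decrease linearly; per-log or per-segment counters decrement in the uneven bursts James describes, which is exactly why he prefers bytes.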
>> >>>
>> >>> -James
>> >>>
>> >>> Sent from my iPhone
>> >>>
>> >>>> On Apr 20, 2022, at 2:01 AM, Luke Chen <show...@gmail.com> wrote:
>> >>>>
>> >>>> Hi all,
>> >>>>
>> >>>> I'd like to propose a KIP to expose a metric for log recovery progress.
>> >>>> This metric would let the admins have a way to monitor the log recovery
>> >>>> progress.
>> >>>> Details can be found here:
>> >>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress
>> >>>>
>> >>>> Any feedback is appreciated.
>> >>>>
>> >>>> Thank you.
>> >>>> Luke
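James's 9-small-plus-1-big scenario can be contrasted in a small model. This is a sketch under stated assumptions, not how Kafka's LogManager is actually implemented: "fixed split" pre-assigns logs round-robin, while "shared pool" lets whichever thread is free take the next log; recovery cost is measured in abstract segment units.

```python
# Contrast the two scheduling strategies from the thread: fixed per-thread
# assignment vs. one shared pool of logs. Model only, not Kafka code.
import heapq

logs = [100] + [1] * 9   # recovery cost per log: 1 big partition, 9 small

def fixed_split(logs, num_threads=2):
    """Pre-assign logs round-robin; total time is the slowest thread."""
    work = [0] * num_threads
    for i, cost in enumerate(logs):
        work[i % num_threads] += cost
    return max(work)

def shared_pool(logs, num_threads=2):
    """The next free thread takes the next log (greedy list scheduling)."""
    finish = [0] * num_threads        # each thread's current finish time
    heapq.heapify(finish)
    for cost in logs:
        t = heapq.heappop(finish)     # earliest-free thread takes the log
        heapq.heappush(finish, t + cost)
    return max(finish)

print(fixed_split(logs))  # 104: one thread gets the big log plus 4 small ones
print(shared_pool(logs))  # 100: the other thread absorbs all 9 small logs
```

With a fixed split, one thread idles after finishing its 5 small logs, which also means a per-thread counter makes sense; with a shared pool, both threads stay busy and the remaining work is more naturally counted on the pool itself, matching James's observation.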