Hi Jun, Thanks for the comment.
I've updated the KIP as: 1. remainingLogsToRecover*y* -> remainingLogsToRecover 2. remainingSegmentsToRecover*y* -> remainingSegmentsToRecover 3. The description of remainingSegmentsToRecover: The remaining segments for the current log assigned to the recovery thread. Thank you. Luke On Sat, Jun 4, 2022 at 12:54 AM Jun Rao <j...@confluent.io.invalid> wrote: > Hi, Luke, > > Thanks for the explanation. > > 10. It makes sense to me now. Instead of using a longer name, perhaps we > could keep the current name, but make the description clear that it's the > remaining segments for the current log assigned to a thread. Also, would it > be better to use ToRecover instead of ToRecovery? > > Thanks, > > Jun > > On Fri, Jun 3, 2022 at 1:18 AM Luke Chen <show...@gmail.com> wrote: > > > Hi Jun, > > > > > how do we implement kafka.log > > > > > :type=LogManager,name=remainingSegmentsToRecovery,dir=([-._\/\w\d\s]+),threadNum=([0-9]+), > > which tracks at the segment level? > > > > It looks like the name is misleading. > > Suppose we have 2 log recovery threads > > (num.recovery.threads.per.data.dir=2), > > and 10 logs to iterate through > > As mentioned before, we don't know how many segments in each log until > the > > log is iterated(loaded) > > So, when thread-1 iterates logA, it gets the segments to recover, and > > expose the number to `remainingSegmentsToRecovery` metric. > > And the same for thread-2 iterating logB. > > That is, the metric showed in `remainingSegmentsToRecovery` is actually > the > > remaining segments to recover "in a specific log". > > > > Maybe I should rename it: remainingSegmentsToRecovery -> > > remainingSegmentsToRecoverInCurrentLog > > WDYT? > > > > Thank you. > > Luke > > > > On Fri, Jun 3, 2022 at 1:27 AM Jun Rao <j...@confluent.io.invalid> wrote: > > > > > Hi, Luke, > > > > > > Thanks for the reply. > > > > > > 10. You are saying it's difficult to track the number of segments to > > > recover. But how do we > > > implement > > > > > > kafka.log:type=LogManager,name=remainingSegmentsToRecovery,dir=([-._\/\w\d\s]+),threadNum=([0-9]+), > > > which tracks at the segment level? > > > > > > Jun > > > > > > On Thu, Jun 2, 2022 at 3:39 AM Luke Chen <show...@gmail.com> wrote: > > > > > > > Hi Jun, > > > > > > > > Thanks for the comment. > > > > > > > > Yes, I've tried to work on this way to track the number of remaining > > > > segments, but it will change the design in UnifiedLog, so I only > track > > > the > > > > logs number. > > > > Currently, we will load all segments and recover those segments if > > needed > > > > "during creating UnifiedLog instance". And also get the log offsets > > here > > > > < > > > > > > > > > > https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/UnifiedLog.scala#L1819-L1842 > > > > > > > > > . > > > > That is, if we want to get all segments to be recovered before > running > > > log > > > > recovery, we need to break the logic in UnifiedLog, to create a > partial > > > > UnifiedLog instance, and add more info to it later, which I think is > > just > > > > making the codes more complicated. > > > > > > > > > > > > Thank you. > > > > Luke > > > > > > > > > > > > > > > > On Thu, May 26, 2022 at 2:57 AM Jun Rao <j...@confluent.io.invalid> > > > wrote: > > > > > > > > > Hi, Luke, > > > > > > > > > > Thanks for the KIP. Just one comment. > > > > > > > > > > 10. For kafka.log:type=LogManager,name=remainingLogsToRecovery, > could > > > we > > > > > instead track the number of remaining segments? This monitors the > > > > progress > > > > > at a finer granularity and is also consistent with the thread level > > > > metric. > > > > > > > > > > Thanks, > > > > > > > > > > Jun > > > > > > > > > > On Wed, May 25, 2022 at 7:47 AM Tom Bentley <tbent...@redhat.com> > > > wrote: > > > > > > > > > > > Thanks Luke! LGTM. > > > > > > > > > > > > On Sun, 22 May 2022 at 05:18, Luke Chen <show...@gmail.com> > wrote: > > > > > > > > > > > > > Hi Tom and Raman, > > > > > > > > > > > > > > Thanks for your comments. > > > > > > > > > > > > > > > 1. There's not a JIRA for this KIP (or the JIRA link needs > > > > updating). > > > > > > > 2. Similarly the link to this discussion thread needs updating. > > > > > > > > Please update the links to JIRA and the discussion thread. > > > > > > > > > > > > > > Yes, thanks for the reminder. I've updated the KIP. > > > > > > > > > > > > > > > 3. I wonder whether we need to keep these metrics (with value > > 0) > > > > once > > > > > > the > > > > > > > broker enters the running state. Do you see it as valuable? A > > > benefit > > > > > of > > > > > > > removing the metrics would be a reduction on storage required > for > > > > > metric > > > > > > > stores which are recording these metrics. > > > > > > > > > > > > > > Yes, removing the metrics after log recovery completed is a > good > > > > idea. > > > > > > > Updated the KIP. > > > > > > > > > > > > > > > 4. I think the KIP's public interfaces section could be a bit > > > > > clearer. > > > > > > > Previous KIPs which added metrics usually used a table, with > the > > > > MBean > > > > > > > name, metric type and description. SeeKIP-551 for example (or > > > > KIP-748, > > > > > > > KIP-608). Similarly you could use a table in the proposed > changes > > > > > section > > > > > > > rather than describing the tree you'd see in an MBean console. > > > > > > > > > > > > > > Good point! Updated the KIP to use a table to list the MBean > > name, > > > > > metric > > > > > > > type and descriptions. > > > > > > > > > > > > > > > > > > > > > Thank you. > > > > > > > Luke > > > > > > > > > > > > > > On Fri, May 20, 2022 at 9:13 AM Raman Verma > > > > > <rve...@confluent.io.invalid > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > Hi Luke, > > > > > > > > > > > > > > > > The change is useful and simple. Thanks. > > > > > > > > Please update the links to JIRA and the discussion thread. > > > > > > > > > > > > > > > > Best Regards, > > > > > > > > Raman Verma > > > > > > > > > > > > > > > > On Thu, May 19, 2022 at 8:57 AM Tom Bentley < > > tbent...@redhat.com > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > Hi Luke, > > > > > > > > > > > > > > > > > > Thanks for the KIP. I think the idea makes sense and would > > > > provide > > > > > > > useful > > > > > > > > > observability of log recovery. I have a few comments. > > > > > > > > > > > > > > > > > > 1. There's not a JIRA for this KIP (or the JIRA link needs > > > > > updating). > > > > > > > > > 2. Similarly the link to this discussion thread needs > > updating. > > > > > > > > > 3. I wonder whether we need to keep these metrics (with > value > > > 0) > > > > > once > > > > > > > the > > > > > > > > > broker enters the running state. Do you see it as > valuable? A > > > > > benefit > > > > > > > of > > > > > > > > > removing the metrics would be a reduction on storage > required > > > for > > > > > > > metric > > > > > > > > > stores which are recording these metrics. > > > > > > > > > 4. I think the KIP's public interfaces section could be a > bit > > > > > > clearer. > > > > > > > > > Previous KIPs which added metrics usually used a table, > with > > > the > > > > > > MBean > > > > > > > > > name, metric type and description. SeeKIP-551 for example > (or > > > > > > KIP-748, > > > > > > > > > KIP-608). Similarly you could use a table in the proposed > > > changes > > > > > > > section > > > > > > > > > rather than describing the tree you'd see in an MBean > > console. > > > > > > > > > > > > > > > > > > Kind regards, > > > > > > > > > > > > > > > > > > Tom > > > > > > > > > > > > > > > > > > On Wed, 11 May 2022 at 09:08, Luke Chen <show...@gmail.com > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > And if people start using RemainingLogs and > > > RemainingSegments > > > > > and > > > > > > > > then > > > > > > > > > > REALLY FEEL like they need RemainingBytes, then we can > > always > > > > add > > > > > > it > > > > > > > > in the > > > > > > > > > > future. > > > > > > > > > > > > > > > > > > > > +1 > > > > > > > > > > > > > > > > > > > > Thanks James! > > > > > > > > > > Luke > > > > > > > > > > > > > > > > > > > > On Wed, May 11, 2022 at 3:57 PM James Cheng < > > > > > wushuja...@gmail.com> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > Hi Luke, > > > > > > > > > > > > > > > > > > > > > > Thanks for the detailed explanation. I agree that the > > > current > > > > > > > > proposal of > > > > > > > > > > > RemainingLogs and RemainingSegments will greatly > improve > > > the > > > > > > > > situation, > > > > > > > > > > and > > > > > > > > > > > that we can go ahead with the KIP as is. > > > > > > > > > > > > > > > > > > > > > > If RemainingBytes were straight-forward to implement, > > then > > > > I’d > > > > > > like > > > > > > > > to > > > > > > > > > > > have it. But we can live without it for now. And if > > people > > > > > start > > > > > > > > using > > > > > > > > > > > RemainingLogs and RemainingSegments and then REALLY > FEEL > > > like > > > > > > they > > > > > > > > need > > > > > > > > > > > RemainingBytes, then we can always add it in the > future. > > > > > > > > > > > > > > > > > > > > > > Thanks Luke, for the detailed explanation, and for > > > responding > > > > > to > > > > > > my > > > > > > > > > > > feedback! > > > > > > > > > > > > > > > > > > > > > > -James > > > > > > > > > > > > > > > > > > > > > > Sent from my iPhone > > > > > > > > > > > > > > > > > > > > > > > On May 10, 2022, at 6:48 AM, Luke Chen < > > > show...@gmail.com> > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > Hi James and all, > > > > > > > > > > > > > > > > > > > > > > > > I checked again and I can see when creating > UnifiedLog, > > > we > > > > > > > > expected the > > > > > > > > > > > > logs/indexes/snapshots are in good state. > > > > > > > > > > > > So, I don't think we should break the current design > to > > > > > expose > > > > > > > the > > > > > > > > > > > > `RemainingBytesToRecovery` > > > > > > > > > > > > metric. > > > > > > > > > > > > > > > > > > > > > > > > If there is no other comments, I'll start a vote > within > > > > this > > > > > > > week. > > > > > > > > > > > > > > > > > > > > > > > > Thank you. > > > > > > > > > > > > Luke > > > > > > > > > > > > > > > > > > > > > > > >> On Fri, May 6, 2022 at 6:00 PM Luke Chen < > > > > show...@gmail.com > > > > > > > > > > > > > > wrote: > > > > > > > > > > > >> > > > > > > > > > > > >> Hi James, > > > > > > > > > > > >> > > > > > > > > > > > >> Thanks for your input. > > > > > > > > > > > >> > > > > > > > > > > > >> For the `RemainingBytesToRecovery` metric proposal, > I > > > > think > > > > > > > > there's > > > > > > > > > > one > > > > > > > > > > > >> thing I didn't make it clear. > > > > > > > > > > > >> Currently, when log manager start up, we'll try to > > load > > > > all > > > > > > logs > > > > > > > > > > > >> (segments), and during the log loading, we'll try to > > > > recover > > > > > > > logs > > > > > > > > if > > > > > > > > > > > >> necessary. > > > > > > > > > > > >> And the logs loading is using "thread pool" as you > > > > thought. > > > > > > > > > > > >> > > > > > > > > > > > >> So, here's the problem: > > > > > > > > > > > >> All segments in each log folder (partition) will be > > > loaded > > > > > in > > > > > > > > each log > > > > > > > > > > > >> recovery thread, and until it's loaded, we can know > > how > > > > many > > > > > > > > segments > > > > > > > > > > > (or > > > > > > > > > > > >> how many Bytes) needed to recover. > > > > > > > > > > > >> That means, if we have 10 partition logs in one > > broker, > > > > and > > > > > we > > > > > > > > have 2 > > > > > > > > > > > log > > > > > > > > > > > >> recovery threads > > (num.recovery.threads.per.data.dir=2), > > > > > before > > > > > > > the > > > > > > > > > > > >> threads load the segments in each log, we only know > > how > > > > many > > > > > > > logs > > > > > > > > > > > >> (partitions) we have in the broker (i.e. > > > > > > RemainingLogsToRecover > > > > > > > > > > metric). > > > > > > > > > > > >> We cannot know how many segments/Bytes needed to > > recover > > > > > until > > > > > > > > each > > > > > > > > > > > thread > > > > > > > > > > > >> starts to load the segments under one log > (partition). > > > > > > > > > > > >> > > > > > > > > > > > >> So, the example in the KIP, it shows: > > > > > > > > > > > >> Currently, there are still 5 logs (partitions) > needed > > to > > > > > > recover > > > > > > > > under > > > > > > > > > > > >> /tmp/log1 dir. And there are 2 threads doing the > jobs, > > > > where > > > > > > one > > > > > > > > > > thread > > > > > > > > > > > has > > > > > > > > > > > >> 10000 segments needed to recover, and the other one > > has > > > 3 > > > > > > > segments > > > > > > > > > > > needed > > > > > > > > > > > >> to recover. > > > > > > > > > > > >> > > > > > > > > > > > >> - kafka.log > > > > > > > > > > > >> - LogManager > > > > > > > > > > > >> - RemainingLogsToRecover > > > > > > > > > > > >> - /tmp/log1 => 5 ← there are 5 > > > logs > > > > > > under > > > > > > > > > > > >> /tmp/log1 needed to be recovered > > > > > > > > > > > >> - /tmp/log2 => 0 > > > > > > > > > > > >> - RemainingSegmentsToRecover > > > > > > > > > > > >> - /tmp/log1 ← 2 > threads > > > are > > > > > > doing > > > > > > > > log > > > > > > > > > > > >> recovery for /tmp/log1 > > > > > > > > > > > >> - 0 => 10000 ← there are 10000 > > > segments > > > > > > > needed > > > > > > > > to > > > > > > > > > > be > > > > > > > > > > > >> recovered for thread 0 > > > > > > > > > > > >> - 1 => 3 > > > > > > > > > > > >> - /tmp/log2 > > > > > > > > > > > >> - 0 => 0 > > > > > > > > > > > >> - 1 => 0 > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> So, after a while, the metrics might look like this: > > > > > > > > > > > >> It said, now, there are only 4 logs needed to > recover > > in > > > > > > > > /tmp/log1, > > > > > > > > > > and > > > > > > > > > > > >> the thread 0 has 9000 segments left, and thread 1 > has > > 5 > > > > > > segments > > > > > > > > left > > > > > > > > > > > >> (which should imply the thread already completed 2 > > logs > > > > > > recovery > > > > > > > > in > > > > > > > > > > the > > > > > > > > > > > >> period) > > > > > > > > > > > >> > > > > > > > > > > > >> - kafka.log > > > > > > > > > > > >> - LogManager > > > > > > > > > > > >> - RemainingLogsToRecover > > > > > > > > > > > >> - /tmp/log1 => 3 ← there are 3 > > > logs > > > > > > under > > > > > > > > > > > >> /tmp/log1 needed to be recovered > > > > > > > > > > > >> - /tmp/log2 => 0 > > > > > > > > > > > >> - RemainingSegmentsToRecover > > > > > > > > > > > >> - /tmp/log1 ← 2 > threads > > > are > > > > > > doing > > > > > > > > log > > > > > > > > > > > >> recovery for /tmp/log1 > > > > > > > > > > > >> - 0 => 9000 ← there are 9000 > > segments > > > > > > needed > > > > > > > > to be > > > > > > > > > > > >> recovered for thread 0 > > > > > > > > > > > >> - 1 => 5 > > > > > > > > > > > >> - /tmp/log2 > > > > > > > > > > > >> - 0 => 0 > > > > > > > > > > > >> - 1 => 0 > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> That said, the `RemainingBytesToRecovery` metric is > > > > > difficult > > > > > > to > > > > > > > > > > achieve > > > > > > > > > > > >> as you expected. I think the current proposal with > > > > > > > > > > > `RemainingLogsToRecover` > > > > > > > > > > > >> and `RemainingSegmentsToRecover` should already > > provide > > > > > enough > > > > > > > > info > > > > > > > > > > for > > > > > > > > > > > >> the log recovery progress. > > > > > > > > > > > >> > > > > > > > > > > > >> I've also updated the KIP example to make it clear. > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> Thank you. > > > > > > > > > > > >> Luke > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > >>> On Thu, May 5, 2022 at 3:31 AM James Cheng < > > > > > > > wushuja...@gmail.com > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > >>> > > > > > > > > > > > >>> Hi Luke, > > > > > > > > > > > >>> > > > > > > > > > > > >>> Thanks for adding RemainingSegmentsToRecovery. > > > > > > > > > > > >>> > > > > > > > > > > > >>> Another thought: different topics can have > different > > > > > segment > > > > > > > > sizes. I > > > > > > > > > > > >>> don't know how common it is, but it is possible. > Some > > > > > topics > > > > > > > > might > > > > > > > > > > want > > > > > > > > > > > >>> small segment sizes to more granular expiration of > > > data. > > > > > > > > > > > >>> > > > > > > > > > > > >>> The downside of RemainingLogsToRecovery and > > > > > > > > > > RemainingSegmentsToRecovery > > > > > > > > > > > >>> is that the rate that they will decrement depends > on > > > the > > > > > > > > > > configuration > > > > > > > > > > > and > > > > > > > > > > > >>> patterns of the topics and partitions and segment > > > sizes. > > > > If > > > > > > > > someone > > > > > > > > > > is > > > > > > > > > > > >>> monitoring those metrics, they might see times > where > > > the > > > > > > metric > > > > > > > > > > > decrements > > > > > > > > > > > >>> slowly, followed by a burst where it decrements > > > quickly. > > > > > > > > > > > >>> > > > > > > > > > > > >>> What about RemainingBytesToRecovery? This would not > > > > depend > > > > > on > > > > > > > the > > > > > > > > > > > >>> configuration of the topic or of the data. It would > > > > > actually > > > > > > > be a > > > > > > > > > > > pretty > > > > > > > > > > > >>> good metric, because I think that this metric would > > > > change > > > > > > at a > > > > > > > > > > > constant > > > > > > > > > > > >>> rate (based on the disk I/O speed that the broker > > > > allocates > > > > > > to > > > > > > > > > > > recovery). > > > > > > > > > > > >>> Because it changes at a constant rate, you would be > > > able > > > > to > > > > > > use > > > > > > > > the > > > > > > > > > > > >>> rate-of-change to predict when it hits zero, which > > will > > > > let > > > > > > you > > > > > > > > know > > > > > > > > > > > when > > > > > > > > > > > >>> the broker is going to start up. Like, I would > > imagine > > > if > > > > > we > > > > > > > > graphed > > > > > > > > > > > >>> RemainingBytesToRecovery that we'd see a fairly > > > straight > > > > > line > > > > > > > > that is > > > > > > > > > > > >>> decrementing at a steady rate towards zero. > > > > > > > > > > > >>> > > > > > > > > > > > >>> What do you think about adding > > > RemainingBytesToRecovery? > > > > > > > > > > > >>> > > > > > > > > > > > >>> Or, what would you think about making the primary > > > metric > > > > be > > > > > > > > > > > >>> RemainingBytesToRecovery, and getting rid of the > > > others? > > > > > > > > > > > >>> > > > > > > > > > > > >>> I don't know if I personally would rather have all > 3 > > > > > metrics, > > > > > > > or > > > > > > > > > > would > > > > > > > > > > > >>> just use RemainingBytesToRecovery. I'd too would > like > > > > more > > > > > > > > community > > > > > > > > > > > input > > > > > > > > > > > >>> on which of those metrics would be useful to > people. > > > > > > > > > > > >>> > > > > > > > > > > > >>> About the JMX metrics, you said that if > > > > > > > > > > > >>> num.recovery.threads.per.data.dir=2, that there > might > > > be > > > > a > > > > > > > > separate > > > > > > > > > > > >>> RemainingSegmentsToRecovery counter for each > thread. > > Is > > > > > that > > > > > > > > actually > > > > > > > > > > > how > > > > > > > > > > > >>> the data is structured within the Kafka recovery > > > threads? > > > > > > Does > > > > > > > > each > > > > > > > > > > > thread > > > > > > > > > > > >>> get a fixed set of partitions, or is there just one > > big > > > > > pool > > > > > > of > > > > > > > > > > > partitions > > > > > > > > > > > >>> that the threads all work on? > > > > > > > > > > > >>> > > > > > > > > > > > >>> As a more concrete example: > > > > > > > > > > > >>> * If I have 9 small partitions and 1 big partition, > > and > > > > > > > > > > > >>> num.recovery.threads.per.data.dir=2 > > > > > > > > > > > >>> Does each thread get 5 partitions, which means one > > > thread > > > > > > will > > > > > > > > finish > > > > > > > > > > > >>> much sooner than the other? > > > > > > > > > > > >>> OR > > > > > > > > > > > >>> Do both threads just work on the set of 10 > > partitions, > > > > > which > > > > > > > > means > > > > > > > > > > > likely > > > > > > > > > > > >>> 1 thread will be busy with the big partition, while > > the > > > > > other > > > > > > > one > > > > > > > > > > ends > > > > > > > > > > > up > > > > > > > > > > > >>> plowing through the 9 small partitions? > > > > > > > > > > > >>> > > > > > > > > > > > >>> If each thread gets assigned 5 partitions, then it > > > would > > > > > make > > > > > > > > sense > > > > > > > > > > > that > > > > > > > > > > > >>> each thread has its own counter. > > > > > > > > > > > >>> If the threads works on a single pool of 10 > > partitions, > > > > > then > > > > > > it > > > > > > > > would > > > > > > > > > > > >>> probably mean that the counter is on the pool of > > > > partitions > > > > > > > > itself, > > > > > > > > > > > and not > > > > > > > > > > > >>> on each thread. > > > > > > > > > > > >>> > > > > > > > > > > > >>> -James > > > > > > > > > > > >>> > > > > > > > > > > > >>>> On May 4, 2022, at 5:55 AM, Luke Chen < > > > > show...@gmail.com> > > > > > > > > wrote: > > > > > > > > > > > >>>> > > > > > > > > > > > >>>> Hi devs, > > > > > > > > > > > >>>> > > > > > > > > > > > >>>> If there are no other comments, I'll start a vote > > > > > tomorrow. > > > > > > > > > > > >>>> > > > > > > > > > > > >>>> Thank you. > > > > > > > > > > > >>>> Luke > > > > > > > > > > > >>>> > > > > > > > > > > > >>>> On Sun, May 1, 2022 at 5:08 PM Luke Chen < > > > > > show...@gmail.com > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > >>>> > > > > > > > > > > > >>>>> Hi James, > > > > > > > > > > > >>>>> > > > > > > > > > > > >>>>> Sorry for the late reply. > > > > > > > > > > > >>>>> > > > > > > > > > > > >>>>> Yes, this is a good point, to know how many > > segments > > > to > > > > > be > > > > > > > > > > recovered > > > > > > > > > > > if > > > > > > > > > > > >>>>> there are some large partitions. > > > > > > > > > > > >>>>> I've updated the KIP, to add a > > > > > > `*RemainingSegmentsToRecover*` > > > > > > > > > > metric > > > > > > > > > > > >>> for > > > > > > > > > > > >>>>> each log recovery thread, to show the value. > > > > > > > > > > > >>>>> The example in the Proposed section here > > > > > > > > > > > >>>>> < > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress#KIP831:Addmetricforlogrecoveryprogress-ProposedChanges > > > > > > > > > > > >>>> > > > > > > > > > > > >>>>> shows what it will look like. > > > > > > > > > > > >>>>> > > > > > > > > > > > >>>>> Thanks for the suggestion. > > > > > > > > > > > >>>>> > > > > > > > > > > > >>>>> Thank you. > > > > > > > > > > > >>>>> Luke > > > > > > > > > > > >>>>> > > > > > > > > > > > >>>>> > > > > > > > > > > > >>>>> > > > > > > > > > > > >>>>> On Sat, Apr 23, 2022 at 8:54 AM James Cheng < > > > > > > > > wushuja...@gmail.com> > > > > > > > > > > > >>> wrote: > > > > > > > > > > > >>>>> > > > > > > > > > > > >>>>>> The KIP describes RemainingLogsToRecovery, which > > > seems > > > > > to > > > > > > be > > > > > > > > the > > > > > > > > > > > >>> number > > > > > > > > > > > >>>>>> of partitions in each log.dir. > > > > > > > > > > > >>>>>> > > > > > > > > > > > >>>>>> We have some partitions which are much much > larger > > > > than > > > > > > > > others. > > > > > > > > > > > Those > > > > > > > > > > > >>>>>> large partitions have many many more segments > than > > > > > others. > > > > > > > > > > > >>>>>> > > > > > > > > > > > >>>>>> Is there a way the metric can reflect partition > > > size? > > > > > > Could > > > > > > > > it be > > > > > > > > > > > >>>>>> RemainingSegmentsToRecover? Or even > > > > > > RemainingBytesToRecover? > > > > > > > > > > > >>>>>> > > > > > > > > > > > >>>>>> -James > > > > > > > > > > > >>>>>> > > > > > > > > > > > >>>>>> Sent from my iPhone > > > > > > > > > > > >>>>>> > > > > > > > > > > > >>>>>>> On Apr 20, 2022, at 2:01 AM, Luke Chen < > > > > > > show...@gmail.com> > > > > > > > > > > wrote: > > > > > > > > > > > >>>>>>> > > > > > > > > > > > >>>>>>> Hi all, > > > > > > > > > > > >>>>>>> > > > > > > > > > > > >>>>>>> I'd like to propose a KIP to expose a metric > for > > > log > > > > > > > recovery > > > > > > > > > > > >>> progress. > > > > > > > > > > > >>>>>>> This metric would let the admins have a way to > > > > monitor > > > > > > the > > > > > > > > log > > > > > > > > > > > >>> recovery > > > > > > > > > > > >>>>>>> progress. > > > > > > > > > > > >>>>>>> Details can be found here: > > > > > > > > > > > >>>>>>> > > > > > > > > > > > >>>>>> > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress > > > > > > > > > > > >>>>>>> > > > > > > > > > > > >>>>>>> Any feedback is appreciated. > > > > > > > > > > > >>>>>>> > > > > > > > > > > > >>>>>>> Thank you. > > > > > > > > > > > >>>>>>> Luke > > > > > > > > > > > >>>>>> > > > > > > > > > > > >>>>> > > > > > > > > > > > >>> > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > Best Regards, > > > > > > > > Raman Verma > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >