Thanks for the updates Sagar! On Thu, Feb 10, 2022 at 5:59 PM Sagar <sagarmeansoc...@gmail.com> wrote:
> Hi All, > > As discussed above, this KIP would be discarded and the new metric proposed > here would be added to KIP-770 as the need to add a new metric was > discovered when working on it. > > Thanks! > Sagar. > > On Thu, Feb 10, 2022 at 9:54 AM Sagar <sagarmeansoc...@gmail.com> wrote: > > > Hi Guozhang, > > > > Sure. I will add it to the KIP. > > > > Thanks! > > Sagar. > > > > On Mon, Feb 7, 2022 at 6:22 AM Guozhang Wang <wangg...@gmail.com> wrote: > > > >> Since the PR is reopened and we are going to re-merged the fixed PRs, > what > >> about just adding that as part of the KIP as the addendum? > >> > >> On Fri, Feb 4, 2022 at 2:13 AM Sagar <sagarmeansoc...@gmail.com> wrote: > >> > >> > Thanks Sophie/Guozhang. > >> > > >> > Yeah I could have amended the KIP but it slipped my mind when Guozhang > >> > proposed this in the PR. Later on, the PR was merged and KIP was > marked > >> as > >> > adopted so I thought I will write a new one. I know the PR had been > >> > reopened now :p . I dont have much preference on a new KIP v/s the > >> original > >> > one so anything is ok with me as well. > >> > > >> > I agree with the INFO part. I will make that change. > >> > > >> > Regarding task level, from my understanding, since every task's > >> > buffer/cache size might be different so if a certain task might be > >> > overshooting the limits then the task level metric might help people > to > >> > infer this. Also, thanks for the explanation Guozhang on why this > >> should be > >> > a task level metric. What are your thoughts on this @Sophie? > >> > > >> > Thanks! > >> > Sagar. > >> > > >> > > >> > On Fri, Feb 4, 2022 at 4:47 AM Guozhang Wang <wangg...@gmail.com> > >> wrote: > >> > > >> > > Thanks Sagar for proposing the KIP, and Sophie for sharing your > >> thoughts. > >> > > Here're my 2c: > >> > > > >> > > I think I agree with Sophie for making the two metrics (both the > added > >> > and > >> > > the newly proposed) on INFO level since we are always calculating > them > >> > > anyways. Regarding the level of the cache-size though, I'm thinking > a > >> bit > >> > > different with you two: today we do not actually keep that caches on > >> the > >> > > per-store level, but rather on the per-thread level, i.e. when the > >> cache > >> > is > >> > > full we would flush not only on the triggering state store but also > >> > > potentially on other state stores as well of the task that thread > >> owns. > >> > > This mechanism, in hindsight, is a bit weird and we have some > >> discussions > >> > > about refactoring that in the future already. Personally I'd like to > >> make > >> > > this new metric to be aligned with whatever our future design will > be. > >> > > > >> > > In the long run if we would not have a static assignment from tasks > to > >> > > threads, it may not make sense to keep a dedicated cache pool per > >> thread. > >> > > Instead all tasks will be dynamically sharing the globally > configured > >> max > >> > > cache size (dynamically here means, we would not just divide the > total > >> > size > >> > > by the num.tasks and then assign that to each task), and when a > cache > >> put > >> > > triggers the flushing because the sum now exceeds the global > >> configured > >> > > value, we would potentially flush all the cached records for that > >> task. > >> > If > >> > > this is the end stage, then I think keeping this metric at the task > >> level > >> > > is good. > >> > > > >> > > > >> > > > >> > > Guozhang > >> > > > >> > > > >> > > > >> > > > >> > > On Thu, Feb 3, 2022 at 10:15 AM Sophie Blee-Goldman > >> > > <sop...@confluent.io.invalid> wrote: > >> > > > >> > > > Hey Sagar, thanks for the KIP! > >> > > > > >> > > > And yes, all metrics are considered part of the public API and > thus > >> > > require > >> > > > a KIP to add (or modify, etc...) Although in this particular case, > >> you > >> > > > could probably make a good case for just considering it as an > >> update to > >> > > the > >> > > > original KIP which added the analogous metric > >> > `input-buffer-bytes-total`. > >> > > > For things like this that weren't considered during the KIP > >> proposal > >> > but > >> > > > came up during the implementation or review, and are small changes > >> that > >> > > > would have made sense to include in that KIP had they been thought > >> of, > >> > > you > >> > > > can just send an update to the existing KIP's discussion and.or > >> voting > >> > > > thread that explains what you want to add or modify and maybe a > >> brief > >> > > > description why. > >> > > > > >> > > > It's always ok to make a new KIP when in doubt, but there are some > >> > cases > >> > > > where an update email is sufficient. If there are any concerns or > >> > > > suggestions that significantly expand the scope of the update, you > >> can > >> > > > always go create a new KIP and move the discussion there. > >> > > > > >> > > > I'd say you can feel free to proceed in whichever way you'd prefer > >> for > >> > > this > >> > > > new proposal -- it just needs to appear in some KIP somewhere, and > >> have > >> > > > given the community thew opportunity to discuss and provide > feedback > >> > on. > >> > > > > >> > > > On that note, I do have two suggestions: > >> > > > > >> > > > 1) since we need to measure the size of the cache (and the input > >> > > buffer(s) > >> > > > anyways, we may as well make `cache-size-bytes-total` -- and also > >> the > >> > new > >> > > > input-buffer-bytes-total -- an INFO level metric. In general the > >> more > >> > > > metrics the merrier, the only real reason for disabling some are > if > >> > they > >> > > > have a performance impact or other cost that not everyone will > want > >> to > >> > > pay. > >> > > > In this case we're already computing the value of these metrics, > so > >> why > >> > > not > >> > > > expose it to the user as an INFO metric > >> > > > 2) I think it would be both more natural and easier to implement > if > >> > this > >> > > > was a store-level metric. A single task could in theory contain > >> > multiple > >> > > > physical state store caches and we would have to roll them up to > >> report > >> > > the > >> > > > size for the task as a whole. It's additional work just to lose > some > >> > > > information that the user may want to have > >> > > > > >> > > > Let me know if anything here doesn't make sense or needs > >> clarification. > >> > > And > >> > > > thanks for the quick followup to get this 2nd metric! > >> > > > > >> > > > -Sophie > >> > > > > >> > > > On Sat, Jan 29, 2022 at 4:27 AM Sagar <sagarmeansoc...@gmail.com> > >> > wrote: > >> > > > > >> > > > > Hi All, > >> > > > > > >> > > > > I would like to open a discussion thread on the following KIP: > >> > > > > > >> > > > > >> > > > >> > > >> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=186878390 > >> > > > > > >> > > > > PS: This is about introducing a new metric and I am assuming > that > >> it > >> > > > > requires a KIP. If that isn't the case, I can close it. > >> > > > > > >> > > > > Thanks! > >> > > > > Sagar. > >> > > > > > >> > > > > >> > > > >> > > > >> > > -- > >> > > -- Guozhang > >> > > > >> > > >> > >> > >> -- > >> -- Guozhang > >> > > > -- -- Guozhang