Re: [DISCUSS] KIP-818: Introduce cache-size-bytes-total Task Level Metric

2022-02-10 Thread Guozhang Wang
Thanks for the updates Sagar!

On Thu, Feb 10, 2022 at 5:59 PM Sagar  wrote:

> Hi All,
>
> As discussed above, this KIP would be discarded and the new metric proposed
> here would be added to KIP-770 as the need to add a new metric was
> discovered when working on it.
>
> Thanks!
> Sagar.
>
> On Thu, Feb 10, 2022 at 9:54 AM Sagar  wrote:
>
> > Hi Guozhang,
> >
> > Sure. I will add it to the KIP.
> >
> > Thanks!
> > Sagar.
> >
> > On Mon, Feb 7, 2022 at 6:22 AM Guozhang Wang  wrote:
> >
> >> Since the PR is reopened and we are going to re-merged the fixed PRs,
> what
> >> about just adding that as part of the KIP as the addendum?
> >>
> >> On Fri, Feb 4, 2022 at 2:13 AM Sagar  wrote:
> >>
> >> > Thanks Sophie/Guozhang.
> >> >
> >> > Yeah I could have amended the KIP but it slipped my mind when Guozhang
> >> > proposed this in the PR. Later on, the PR was merged and KIP was
> marked
> >> as
> >> > adopted so I thought I will write a new one. I know the PR had been
> >> > reopened now :p . I dont have much preference on a new KIP v/s the
> >> original
> >> > one so anything is ok with me as well.
> >> >
> >> > I agree with the INFO part. I will make that change.
> >> >
> >> > Regarding task level, from my understanding, since every task's
> >> > buffer/cache size might be different so if a certain task might be
> >> > overshooting the limits then the task level metric might help people
> to
> >> > infer this. Also, thanks for the explanation Guozhang on why this
> >> should be
> >> > a task level metric. What are your thoughts on this @Sophie?
> >> >
> >> > Thanks!
> >> > Sagar.
> >> >
> >> >
> >> > On Fri, Feb 4, 2022 at 4:47 AM Guozhang Wang 
> >> wrote:
> >> >
> >> > > Thanks Sagar for proposing the KIP, and Sophie for sharing your
> >> thoughts.
> >> > > Here're my 2c:
> >> > >
> >> > > I think I agree with Sophie for making the two metrics (both the
> added
> >> > and
> >> > > the newly proposed) on INFO level since we are always calculating
> them
> >> > > anyways. Regarding the level of the cache-size though, I'm thinking
> a
> >> bit
> >> > > different with you two: today we do not actually keep that caches on
> >> the
> >> > > per-store level, but rather on the per-thread level, i.e. when the
> >> cache
> >> > is
> >> > > full we would flush not only on the triggering state store but also
> >> > > potentially on other state stores as well of the task that thread
> >> owns.
> >> > > This mechanism, in hindsight, is a bit weird and we have some
> >> discussions
> >> > > about refactoring that in the future already. Personally I'd like to
> >> make
> >> > > this new metric to be aligned with whatever our future design will
> be.
> >> > >
> >> > > In the long run if we would not have a static assignment from tasks
> to
> >> > > threads, it may not make sense to keep a dedicated cache pool per
> >> thread.
> >> > > Instead all tasks will be dynamically sharing the globally
> configured
> >> max
> >> > > cache size (dynamically here means, we would not just divide the
> total
> >> > size
> >> > > by the num.tasks and then assign that to each task), and when a
> cache
> >> put
> >> > > triggers the flushing because the sum now exceeds the global
> >> configured
> >> > > value, we would potentially flush all the cached records for that
> >> task.
> >> > If
> >> > > this is the end stage, then I think keeping this metric at the task
> >> level
> >> > > is good.
> >> > >
> >> > >
> >> > >
> >> > > Guozhang
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > On Thu, Feb 3, 2022 at 10:15 AM Sophie Blee-Goldman
> >> > >  wrote:
> >> > >
> >> > > > Hey Sagar,  thanks for the KIP!
> >> > > >
> >> > > > And yes, all metrics are considered part of the public API and
> thus
> >> > > require
> >> > > > a KIP to add (or modify, etc...) Although in this particular case,
> >> you
> >> > > > could probably make a good case for just considering it as an
> >> update to
> >> > > the
> >> > > > original KIP which added the analogous metric
> >> > `input-buffer-bytes-total`.
> >> > > > For  things like this that weren't considered during the KIP
> >> proposal
> >> > but
> >> > > > came up during the implementation or review, and are small changes
> >> that
> >> > > > would have made sense to include in that KIP had they been thought
> >> of,
> >> > > you
> >> > > > can just send an update to the existing KIP's discussion and.or
> >> voting
> >> > > > thread that explains what you want to add or modify and maybe a
> >> brief
> >> > > > description why.
> >> > > >
> >> > > > It's always ok to make a new KIP when in doubt, but there are some
> >> > cases
> >> > > > where an update email is sufficient. If there are any concerns or
> >> > > > suggestions that significantly expand the scope of the update, you
> >> can
> >> > > > always go create a new KIP and move the discussion there.
> >> > > >
> >> > > > I'd say you can feel free to proceed in whichever way you'd prefer
> >> for
> >> > > this
> >> > > > new proposal -- it just needs to 

Re: [DISCUSS] KIP-818: Introduce cache-size-bytes-total Task Level Metric

2022-02-10 Thread Sagar
Hi All,

As discussed above, this KIP would be discarded and the new metric proposed
here would be added to KIP-770 as the need to add a new metric was
discovered when working on it.

Thanks!
Sagar.

On Thu, Feb 10, 2022 at 9:54 AM Sagar  wrote:

> Hi Guozhang,
>
> Sure. I will add it to the KIP.
>
> Thanks!
> Sagar.
>
> On Mon, Feb 7, 2022 at 6:22 AM Guozhang Wang  wrote:
>
>> Since the PR is reopened and we are going to re-merged the fixed PRs, what
>> about just adding that as part of the KIP as the addendum?
>>
>> On Fri, Feb 4, 2022 at 2:13 AM Sagar  wrote:
>>
>> > Thanks Sophie/Guozhang.
>> >
>> > Yeah I could have amended the KIP but it slipped my mind when Guozhang
>> > proposed this in the PR. Later on, the PR was merged and KIP was marked
>> as
>> > adopted so I thought I will write a new one. I know the PR had been
>> > reopened now :p . I dont have much preference on a new KIP v/s the
>> original
>> > one so anything is ok with me as well.
>> >
>> > I agree with the INFO part. I will make that change.
>> >
>> > Regarding task level, from my understanding, since every task's
>> > buffer/cache size might be different so if a certain task might be
>> > overshooting the limits then the task level metric might help people to
>> > infer this. Also, thanks for the explanation Guozhang on why this
>> should be
>> > a task level metric. What are your thoughts on this @Sophie?
>> >
>> > Thanks!
>> > Sagar.
>> >
>> >
>> > On Fri, Feb 4, 2022 at 4:47 AM Guozhang Wang 
>> wrote:
>> >
>> > > Thanks Sagar for proposing the KIP, and Sophie for sharing your
>> thoughts.
>> > > Here're my 2c:
>> > >
>> > > I think I agree with Sophie for making the two metrics (both the added
>> > and
>> > > the newly proposed) on INFO level since we are always calculating them
>> > > anyways. Regarding the level of the cache-size though, I'm thinking a
>> bit
>> > > different with you two: today we do not actually keep that caches on
>> the
>> > > per-store level, but rather on the per-thread level, i.e. when the
>> cache
>> > is
>> > > full we would flush not only on the triggering state store but also
>> > > potentially on other state stores as well of the task that thread
>> owns.
>> > > This mechanism, in hindsight, is a bit weird and we have some
>> discussions
>> > > about refactoring that in the future already. Personally I'd like to
>> make
>> > > this new metric to be aligned with whatever our future design will be.
>> > >
>> > > In the long run if we would not have a static assignment from tasks to
>> > > threads, it may not make sense to keep a dedicated cache pool per
>> thread.
>> > > Instead all tasks will be dynamically sharing the globally configured
>> max
>> > > cache size (dynamically here means, we would not just divide the total
>> > size
>> > > by the num.tasks and then assign that to each task), and when a cache
>> put
>> > > triggers the flushing because the sum now exceeds the global
>> configured
>> > > value, we would potentially flush all the cached records for that
>> task.
>> > If
>> > > this is the end stage, then I think keeping this metric at the task
>> level
>> > > is good.
>> > >
>> > >
>> > >
>> > > Guozhang
>> > >
>> > >
>> > >
>> > >
>> > > On Thu, Feb 3, 2022 at 10:15 AM Sophie Blee-Goldman
>> > >  wrote:
>> > >
>> > > > Hey Sagar,  thanks for the KIP!
>> > > >
>> > > > And yes, all metrics are considered part of the public API and thus
>> > > require
>> > > > a KIP to add (or modify, etc...) Although in this particular case,
>> you
>> > > > could probably make a good case for just considering it as an
>> update to
>> > > the
>> > > > original KIP which added the analogous metric
>> > `input-buffer-bytes-total`.
>> > > > For  things like this that weren't considered during the KIP
>> proposal
>> > but
>> > > > came up during the implementation or review, and are small changes
>> that
>> > > > would have made sense to include in that KIP had they been thought
>> of,
>> > > you
>> > > > can just send an update to the existing KIP's discussion and.or
>> voting
>> > > > thread that explains what you want to add or modify and maybe a
>> brief
>> > > > description why.
>> > > >
>> > > > It's always ok to make a new KIP when in doubt, but there are some
>> > cases
>> > > > where an update email is sufficient. If there are any concerns or
>> > > > suggestions that significantly expand the scope of the update, you
>> can
>> > > > always go create a new KIP and move the discussion there.
>> > > >
>> > > > I'd say you can feel free to proceed in whichever way you'd prefer
>> for
>> > > this
>> > > > new proposal -- it just needs to appear in some KIP somewhere, and
>> have
>> > > > given the community thew opportunity to discuss and provide feedback
>> > on.
>> > > >
>> > > > On that note, I do have two suggestions:
>> > > >
>> > > > 1)  since we need to measure the size of the cache (and the input
>> > > buffer(s)
>> > > > anyways, we may as well make `cache-size-bytes-total` -- and also
>> the
>> 

Re: [DISCUSS] KIP-818: Introduce cache-size-bytes-total Task Level Metric

2022-02-09 Thread Sagar
Hi Guozhang,

Sure. I will add it to the KIP.

Thanks!
Sagar.

On Mon, Feb 7, 2022 at 6:22 AM Guozhang Wang  wrote:

> Since the PR is reopened and we are going to re-merged the fixed PRs, what
> about just adding that as part of the KIP as the addendum?
>
> On Fri, Feb 4, 2022 at 2:13 AM Sagar  wrote:
>
> > Thanks Sophie/Guozhang.
> >
> > Yeah I could have amended the KIP but it slipped my mind when Guozhang
> > proposed this in the PR. Later on, the PR was merged and KIP was marked
> as
> > adopted so I thought I will write a new one. I know the PR had been
> > reopened now :p . I dont have much preference on a new KIP v/s the
> original
> > one so anything is ok with me as well.
> >
> > I agree with the INFO part. I will make that change.
> >
> > Regarding task level, from my understanding, since every task's
> > buffer/cache size might be different so if a certain task might be
> > overshooting the limits then the task level metric might help people to
> > infer this. Also, thanks for the explanation Guozhang on why this should
> be
> > a task level metric. What are your thoughts on this @Sophie?
> >
> > Thanks!
> > Sagar.
> >
> >
> > On Fri, Feb 4, 2022 at 4:47 AM Guozhang Wang  wrote:
> >
> > > Thanks Sagar for proposing the KIP, and Sophie for sharing your
> thoughts.
> > > Here're my 2c:
> > >
> > > I think I agree with Sophie for making the two metrics (both the added
> > and
> > > the newly proposed) on INFO level since we are always calculating them
> > > anyways. Regarding the level of the cache-size though, I'm thinking a
> bit
> > > different with you two: today we do not actually keep that caches on
> the
> > > per-store level, but rather on the per-thread level, i.e. when the
> cache
> > is
> > > full we would flush not only on the triggering state store but also
> > > potentially on other state stores as well of the task that thread owns.
> > > This mechanism, in hindsight, is a bit weird and we have some
> discussions
> > > about refactoring that in the future already. Personally I'd like to
> make
> > > this new metric to be aligned with whatever our future design will be.
> > >
> > > In the long run if we would not have a static assignment from tasks to
> > > threads, it may not make sense to keep a dedicated cache pool per
> thread.
> > > Instead all tasks will be dynamically sharing the globally configured
> max
> > > cache size (dynamically here means, we would not just divide the total
> > size
> > > by the num.tasks and then assign that to each task), and when a cache
> put
> > > triggers the flushing because the sum now exceeds the global configured
> > > value, we would potentially flush all the cached records for that task.
> > If
> > > this is the end stage, then I think keeping this metric at the task
> level
> > > is good.
> > >
> > >
> > >
> > > Guozhang
> > >
> > >
> > >
> > >
> > > On Thu, Feb 3, 2022 at 10:15 AM Sophie Blee-Goldman
> > >  wrote:
> > >
> > > > Hey Sagar,  thanks for the KIP!
> > > >
> > > > And yes, all metrics are considered part of the public API and thus
> > > require
> > > > a KIP to add (or modify, etc...) Although in this particular case,
> you
> > > > could probably make a good case for just considering it as an update
> to
> > > the
> > > > original KIP which added the analogous metric
> > `input-buffer-bytes-total`.
> > > > For  things like this that weren't considered during the KIP proposal
> > but
> > > > came up during the implementation or review, and are small changes
> that
> > > > would have made sense to include in that KIP had they been thought
> of,
> > > you
> > > > can just send an update to the existing KIP's discussion and.or
> voting
> > > > thread that explains what you want to add or modify and maybe a brief
> > > > description why.
> > > >
> > > > It's always ok to make a new KIP when in doubt, but there are some
> > cases
> > > > where an update email is sufficient. If there are any concerns or
> > > > suggestions that significantly expand the scope of the update, you
> can
> > > > always go create a new KIP and move the discussion there.
> > > >
> > > > I'd say you can feel free to proceed in whichever way you'd prefer
> for
> > > this
> > > > new proposal -- it just needs to appear in some KIP somewhere, and
> have
> > > > given the community thew opportunity to discuss and provide feedback
> > on.
> > > >
> > > > On that note, I do have two suggestions:
> > > >
> > > > 1)  since we need to measure the size of the cache (and the input
> > > buffer(s)
> > > > anyways, we may as well make `cache-size-bytes-total` -- and also the
> > new
> > > > input-buffer-bytes-total -- an INFO level metric. In general the more
> > > > metrics the merrier, the only real reason for disabling some are if
> > they
> > > > have a performance impact or other cost that not everyone will want
> to
> > > pay.
> > > > In this case we're already computing the value of these metrics, so
> why
> > > not
> > > > expose it to the user as an INFO metric
> 

Re: [DISCUSS] KIP-818: Introduce cache-size-bytes-total Task Level Metric

2022-02-06 Thread Guozhang Wang
Since the PR is reopened and we are going to re-merged the fixed PRs, what
about just adding that as part of the KIP as the addendum?

On Fri, Feb 4, 2022 at 2:13 AM Sagar  wrote:

> Thanks Sophie/Guozhang.
>
> Yeah I could have amended the KIP but it slipped my mind when Guozhang
> proposed this in the PR. Later on, the PR was merged and KIP was marked as
> adopted so I thought I will write a new one. I know the PR had been
> reopened now :p . I dont have much preference on a new KIP v/s the original
> one so anything is ok with me as well.
>
> I agree with the INFO part. I will make that change.
>
> Regarding task level, from my understanding, since every task's
> buffer/cache size might be different so if a certain task might be
> overshooting the limits then the task level metric might help people to
> infer this. Also, thanks for the explanation Guozhang on why this should be
> a task level metric. What are your thoughts on this @Sophie?
>
> Thanks!
> Sagar.
>
>
> On Fri, Feb 4, 2022 at 4:47 AM Guozhang Wang  wrote:
>
> > Thanks Sagar for proposing the KIP, and Sophie for sharing your thoughts.
> > Here're my 2c:
> >
> > I think I agree with Sophie for making the two metrics (both the added
> and
> > the newly proposed) on INFO level since we are always calculating them
> > anyways. Regarding the level of the cache-size though, I'm thinking a bit
> > different with you two: today we do not actually keep that caches on the
> > per-store level, but rather on the per-thread level, i.e. when the cache
> is
> > full we would flush not only on the triggering state store but also
> > potentially on other state stores as well of the task that thread owns.
> > This mechanism, in hindsight, is a bit weird and we have some discussions
> > about refactoring that in the future already. Personally I'd like to make
> > this new metric to be aligned with whatever our future design will be.
> >
> > In the long run if we would not have a static assignment from tasks to
> > threads, it may not make sense to keep a dedicated cache pool per thread.
> > Instead all tasks will be dynamically sharing the globally configured max
> > cache size (dynamically here means, we would not just divide the total
> size
> > by the num.tasks and then assign that to each task), and when a cache put
> > triggers the flushing because the sum now exceeds the global configured
> > value, we would potentially flush all the cached records for that task.
> If
> > this is the end stage, then I think keeping this metric at the task level
> > is good.
> >
> >
> >
> > Guozhang
> >
> >
> >
> >
> > On Thu, Feb 3, 2022 at 10:15 AM Sophie Blee-Goldman
> >  wrote:
> >
> > > Hey Sagar,  thanks for the KIP!
> > >
> > > And yes, all metrics are considered part of the public API and thus
> > require
> > > a KIP to add (or modify, etc...) Although in this particular case, you
> > > could probably make a good case for just considering it as an update to
> > the
> > > original KIP which added the analogous metric
> `input-buffer-bytes-total`.
> > > For  things like this that weren't considered during the KIP proposal
> but
> > > came up during the implementation or review, and are small changes that
> > > would have made sense to include in that KIP had they been thought of,
> > you
> > > can just send an update to the existing KIP's discussion and.or voting
> > > thread that explains what you want to add or modify and maybe a brief
> > > description why.
> > >
> > > It's always ok to make a new KIP when in doubt, but there are some
> cases
> > > where an update email is sufficient. If there are any concerns or
> > > suggestions that significantly expand the scope of the update, you can
> > > always go create a new KIP and move the discussion there.
> > >
> > > I'd say you can feel free to proceed in whichever way you'd prefer for
> > this
> > > new proposal -- it just needs to appear in some KIP somewhere, and have
> > > given the community thew opportunity to discuss and provide feedback
> on.
> > >
> > > On that note, I do have two suggestions:
> > >
> > > 1)  since we need to measure the size of the cache (and the input
> > buffer(s)
> > > anyways, we may as well make `cache-size-bytes-total` -- and also the
> new
> > > input-buffer-bytes-total -- an INFO level metric. In general the more
> > > metrics the merrier, the only real reason for disabling some are if
> they
> > > have a performance impact or other cost that not everyone will want to
> > pay.
> > > In this case we're already computing the value of these metrics, so why
> > not
> > > expose it to the user as an INFO metric
> > > 2) I think it would be both more natural and easier to implement if
> this
> > > was a store-level metric. A single task could in theory contain
> multiple
> > > physical state store caches and we would have to roll them up to report
> > the
> > > size for the task as a whole. It's additional work just to lose some
> > > information that the user may want to have
> > >
> > 

Re: [DISCUSS] KIP-818: Introduce cache-size-bytes-total Task Level Metric

2022-02-04 Thread Sagar
Thanks Sophie/Guozhang.

Yeah I could have amended the KIP but it slipped my mind when Guozhang
proposed this in the PR. Later on, the PR was merged and KIP was marked as
adopted so I thought I will write a new one. I know the PR had been
reopened now :p . I dont have much preference on a new KIP v/s the original
one so anything is ok with me as well.

I agree with the INFO part. I will make that change.

Regarding task level, from my understanding, since every task's
buffer/cache size might be different so if a certain task might be
overshooting the limits then the task level metric might help people to
infer this. Also, thanks for the explanation Guozhang on why this should be
a task level metric. What are your thoughts on this @Sophie?

Thanks!
Sagar.


On Fri, Feb 4, 2022 at 4:47 AM Guozhang Wang  wrote:

> Thanks Sagar for proposing the KIP, and Sophie for sharing your thoughts.
> Here're my 2c:
>
> I think I agree with Sophie for making the two metrics (both the added and
> the newly proposed) on INFO level since we are always calculating them
> anyways. Regarding the level of the cache-size though, I'm thinking a bit
> different with you two: today we do not actually keep that caches on the
> per-store level, but rather on the per-thread level, i.e. when the cache is
> full we would flush not only on the triggering state store but also
> potentially on other state stores as well of the task that thread owns.
> This mechanism, in hindsight, is a bit weird and we have some discussions
> about refactoring that in the future already. Personally I'd like to make
> this new metric to be aligned with whatever our future design will be.
>
> In the long run if we would not have a static assignment from tasks to
> threads, it may not make sense to keep a dedicated cache pool per thread.
> Instead all tasks will be dynamically sharing the globally configured max
> cache size (dynamically here means, we would not just divide the total size
> by the num.tasks and then assign that to each task), and when a cache put
> triggers the flushing because the sum now exceeds the global configured
> value, we would potentially flush all the cached records for that task. If
> this is the end stage, then I think keeping this metric at the task level
> is good.
>
>
>
> Guozhang
>
>
>
>
> On Thu, Feb 3, 2022 at 10:15 AM Sophie Blee-Goldman
>  wrote:
>
> > Hey Sagar,  thanks for the KIP!
> >
> > And yes, all metrics are considered part of the public API and thus
> require
> > a KIP to add (or modify, etc...) Although in this particular case, you
> > could probably make a good case for just considering it as an update to
> the
> > original KIP which added the analogous metric `input-buffer-bytes-total`.
> > For  things like this that weren't considered during the KIP proposal but
> > came up during the implementation or review, and are small changes that
> > would have made sense to include in that KIP had they been thought of,
> you
> > can just send an update to the existing KIP's discussion and.or voting
> > thread that explains what you want to add or modify and maybe a brief
> > description why.
> >
> > It's always ok to make a new KIP when in doubt, but there are some cases
> > where an update email is sufficient. If there are any concerns or
> > suggestions that significantly expand the scope of the update, you can
> > always go create a new KIP and move the discussion there.
> >
> > I'd say you can feel free to proceed in whichever way you'd prefer for
> this
> > new proposal -- it just needs to appear in some KIP somewhere, and have
> > given the community thew opportunity to discuss and provide feedback on.
> >
> > On that note, I do have two suggestions:
> >
> > 1)  since we need to measure the size of the cache (and the input
> buffer(s)
> > anyways, we may as well make `cache-size-bytes-total` -- and also the new
> > input-buffer-bytes-total -- an INFO level metric. In general the more
> > metrics the merrier, the only real reason for disabling some are if they
> > have a performance impact or other cost that not everyone will want to
> pay.
> > In this case we're already computing the value of these metrics, so why
> not
> > expose it to the user as an INFO metric
> > 2) I think it would be both more natural and easier to implement if this
> > was a store-level metric. A single task could in theory contain multiple
> > physical state store caches and we would have to roll them up to report
> the
> > size for the task as a whole. It's additional work just to lose some
> > information that the user may want to have
> >
> > Let me know if anything here doesn't make sense or needs clarification.
> And
> > thanks for the quick followup to get this 2nd metric!
> >
> > -Sophie
> >
> > On Sat, Jan 29, 2022 at 4:27 AM Sagar  wrote:
> >
> > > Hi All,
> > >
> > > I would like to open a discussion thread on the following KIP:
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=186878390
> > >
> > > 

Re: [DISCUSS] KIP-818: Introduce cache-size-bytes-total Task Level Metric

2022-02-03 Thread Guozhang Wang
Thanks Sagar for proposing the KIP, and Sophie for sharing your thoughts.
Here're my 2c:

I think I agree with Sophie for making the two metrics (both the added and
the newly proposed) on INFO level since we are always calculating them
anyways. Regarding the level of the cache-size though, I'm thinking a bit
different with you two: today we do not actually keep that caches on the
per-store level, but rather on the per-thread level, i.e. when the cache is
full we would flush not only on the triggering state store but also
potentially on other state stores as well of the task that thread owns.
This mechanism, in hindsight, is a bit weird and we have some discussions
about refactoring that in the future already. Personally I'd like to make
this new metric to be aligned with whatever our future design will be.

In the long run if we would not have a static assignment from tasks to
threads, it may not make sense to keep a dedicated cache pool per thread.
Instead all tasks will be dynamically sharing the globally configured max
cache size (dynamically here means, we would not just divide the total size
by the num.tasks and then assign that to each task), and when a cache put
triggers the flushing because the sum now exceeds the global configured
value, we would potentially flush all the cached records for that task. If
this is the end stage, then I think keeping this metric at the task level
is good.



Guozhang




On Thu, Feb 3, 2022 at 10:15 AM Sophie Blee-Goldman
 wrote:

> Hey Sagar,  thanks for the KIP!
>
> And yes, all metrics are considered part of the public API and thus require
> a KIP to add (or modify, etc...) Although in this particular case, you
> could probably make a good case for just considering it as an update to the
> original KIP which added the analogous metric `input-buffer-bytes-total`.
> For  things like this that weren't considered during the KIP proposal but
> came up during the implementation or review, and are small changes that
> would have made sense to include in that KIP had they been thought of, you
> can just send an update to the existing KIP's discussion and.or voting
> thread that explains what you want to add or modify and maybe a brief
> description why.
>
> It's always ok to make a new KIP when in doubt, but there are some cases
> where an update email is sufficient. If there are any concerns or
> suggestions that significantly expand the scope of the update, you can
> always go create a new KIP and move the discussion there.
>
> I'd say you can feel free to proceed in whichever way you'd prefer for this
> new proposal -- it just needs to appear in some KIP somewhere, and have
> given the community thew opportunity to discuss and provide feedback on.
>
> On that note, I do have two suggestions:
>
> 1)  since we need to measure the size of the cache (and the input buffer(s)
> anyways, we may as well make `cache-size-bytes-total` -- and also the new
> input-buffer-bytes-total -- an INFO level metric. In general the more
> metrics the merrier, the only real reason for disabling some are if they
> have a performance impact or other cost that not everyone will want to pay.
> In this case we're already computing the value of these metrics, so why not
> expose it to the user as an INFO metric
> 2) I think it would be both more natural and easier to implement if this
> was a store-level metric. A single task could in theory contain multiple
> physical state store caches and we would have to roll them up to report the
> size for the task as a whole. It's additional work just to lose some
> information that the user may want to have
>
> Let me know if anything here doesn't make sense or needs clarification. And
> thanks for the quick followup to get this 2nd metric!
>
> -Sophie
>
> On Sat, Jan 29, 2022 at 4:27 AM Sagar  wrote:
>
> > Hi All,
> >
> > I would like to open a discussion thread on the following KIP:
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=186878390
> >
> > PS: This is about introducing a new metric and I am assuming that it
> > requires a KIP. If that isn't the case, I can close it.
> >
> > Thanks!
> > Sagar.
> >
>


-- 
-- Guozhang


Re: [DISCUSS] KIP-818: Introduce cache-size-bytes-total Task Level Metric

2022-02-03 Thread Sophie Blee-Goldman
Hey Sagar,  thanks for the KIP!

And yes, all metrics are considered part of the public API and thus require
a KIP to add (or modify, etc...) Although in this particular case, you
could probably make a good case for just considering it as an update to the
original KIP which added the analogous metric `input-buffer-bytes-total`.
For  things like this that weren't considered during the KIP proposal but
came up during the implementation or review, and are small changes that
would have made sense to include in that KIP had they been thought of, you
can just send an update to the existing KIP's discussion and.or voting
thread that explains what you want to add or modify and maybe a brief
description why.

It's always ok to make a new KIP when in doubt, but there are some cases
where an update email is sufficient. If there are any concerns or
suggestions that significantly expand the scope of the update, you can
always go create a new KIP and move the discussion there.

I'd say you can feel free to proceed in whichever way you'd prefer for this
new proposal -- it just needs to appear in some KIP somewhere, and have
given the community thew opportunity to discuss and provide feedback on.

On that note, I do have two suggestions:

1)  since we need to measure the size of the cache (and the input buffer(s)
anyways, we may as well make `cache-size-bytes-total` -- and also the new
input-buffer-bytes-total -- an INFO level metric. In general the more
metrics the merrier, the only real reason for disabling some are if they
have a performance impact or other cost that not everyone will want to pay.
In this case we're already computing the value of these metrics, so why not
expose it to the user as an INFO metric
2) I think it would be both more natural and easier to implement if this
was a store-level metric. A single task could in theory contain multiple
physical state store caches and we would have to roll them up to report the
size for the task as a whole. It's additional work just to lose some
information that the user may want to have

Let me know if anything here doesn't make sense or needs clarification. And
thanks for the quick followup to get this 2nd metric!

-Sophie

On Sat, Jan 29, 2022 at 4:27 AM Sagar  wrote:

> Hi All,
>
> I would like to open a discussion thread on the following KIP:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=186878390
>
> PS: This is about introducing a new metric and I am assuming that it
> requires a KIP. If that isn't the case, I can close it.
>
> Thanks!
> Sagar.
>


[DISCUSS] KIP-818: Introduce cache-size-bytes-total Task Level Metric

2022-01-29 Thread Sagar
Hi All,

I would like to open a discussion thread on the following KIP:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=186878390

PS: This is about introducing a new metric and I am assuming that it
requires a KIP. If that isn't the case, I can close it.

Thanks!
Sagar.