Thanks for the clarification!

By default the meterview measures for 1 minute sounds good to me!

+1 for this proposal.

Best,
Rui

On Wed, Sep 6, 2023 at 1:27 PM Gyula Fóra <gyula.f...@gmail.com> wrote:

> Thanks for the feedback Rui,
>
> The rates would be computed using the MeterView class (like for any other
> rate metric), just because we report the value per second it doesn't mean
> that we measure in a second granularity.
> By default the meterview measures for 1 minute and then we calculate the
> per second rates, but we can increase the timespan if necessary.
>
> So I don't think we run into this problem in practice and we can keep the
> metric aligned with other time rate metrics like busyTimeMsPerSec etc.
>
> Cheers,
> Gyula
>
> On Wed, Sep 6, 2023 at 4:55 AM Rui Fan <1996fan...@gmail.com> wrote:
>
> > Hi Gyula,
> >
> > +1 for this proposal. The current GC metric is really unfriendly.
> >
> > I have a concern with your proposed rate metric: the rate is perSecond
> > instead of per minute. I'm unsure whether it's suitable for GC metric.
> >
> > There are two reasons why I suspect perSecond may not be well
> > compatible with GC metric:
> >
> > 1. GCs are usually infrequent and may only occur for a small number
> > of time periods within a minute.
> >
> > Metrics are collected periodically, for example, reported every minute.
> > If the result reported by the GC metric is 1s/perSecond, it does not
> > mean that the GC of the TM is serious, because there may be no GC
> > in the remaining 59s.
> >
> > On the contrary, the GC metric reports 0s/perSecond, which does not
> > mean that the GC of the TM is not serious, and the GC may be very
> > serious in the remaining 59s.
> >
> > 2. Stop-the-world may cause the metric to fail(delay) to report
> >
> > The TM will stop the world during GC, especially full GC. It means
> > the metric cannot be collected or reported during full GC.
> >
> > So the collected GC metric may never be 1s/perSecond. This metric
> > may always be good because the metric will only be reported when
> > the GC is not severe.
> >
> >
> > If these concerns make sense, how about updating the GC rate
> > at minute level?
> >
> > We can define the type to Gauge for TimeMsPerMiunte, and updating
> > this Gauge every second, it is:
> > GC Total.Time of current time - GC total time of one miunte ago.
> >
> > Best,
> > Rui
> >
> > On Tue, Sep 5, 2023 at 11:05 PM Maximilian Michels <m...@apache.org>
> wrote:
> >
> > > Hi Gyula,
> > >
> > > +1 The proposed changes make sense and are in line with what is
> > > available for other metrics, e.g. number of records processed.
> > >
> > > -Max
> > >
> > > On Tue, Sep 5, 2023 at 2:43 PM Gyula Fóra <gyula.f...@gmail.com>
> wrote:
> > > >
> > > > Hi Devs,
> > > >
> > > > I would like to start a discussion on FLIP-361: Improve GC Metrics
> [1].
> > > >
> > > > The current Flink GC metrics [2] are not very useful for monitoring
> > > > purposes as they require post processing logic that is also dependent
> > on
> > > > the current runtime environment.
> > > >
> > > > Problems:
> > > >  - Total time is not very relevant for long running applications,
> only
> > > the
> > > > rate of change (msPerSec)
> > > >  - In most cases it's best to simply aggregate the time/count across
> > the
> > > > different GabrageCollectors, however the specific collectors are
> > > dependent
> > > > on the current Java runtime
> > > >
> > > > We propose to improve the current situation by:
> > > >  - Exposing rate metrics per GarbageCollector
> > > >  - Exposing aggregated Total time/count/rate metrics
> > > >
> > > > These new metrics are all derived from the existing ones with minimal
> > > > overhead.
> > > >
> > > > Looking forward to your feedback.
> > > >
> > > > Cheers,
> > > > Gyula
> > > >
> > > > [1]
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-361%3A+Improve+GC+Metrics
> > > > [2]
> > > >
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/#garbagecollection
> > >
> >
>

Reply via email to