Hi, devs,

Thank you for your feedback on PIP-3[1], I'd like to start a vote on it
since the discussion on this thread[2] came to an agreement.

The vote will last for at least 72 hours (Jun 29th, 2:00 GMT, excluding
weekend days) unless there is an objection or insufficient vote.

Best,
Guojun

[1]
https://cwiki.apache.org/confluence/display/PAIMON/PIP-3%3A+Introduce+metrics+for+Paimon
[2]https://lists.apache.org/thread/h2lgqgrvcptzj3c8q4cjzv5jopgtwx9o

On Wed, Jun 21, 2023 at 5:53 PM Caizhi Weng <[email protected]> wrote:

> Hi Guojun!
>
> I've checked the latest PIP-3 and it now looks fine to me overall. Although
> there are still some minor issues, for example it is unclear to me how
> MetricGroup is created and is tagged. But this detail can be further
> discussed during the implementation. From my point of view this PIP is now
> fine enough to go on a vote.
>
> Guojun Li <[email protected]> 于2023年6月21日周三 15:23写道:
>
> > Hi caizhi,
> >
> > I've updated APIs in PIP3:
> > 1. Instance of Metrics class is a singleton which maintains the reporters
> > and Metric groups.
> > 2. Instantiation of MetricGroup will add the current group to Metrics
> > instance.
> > 3. The metrics list instance, like CommitMetrics instance, will maintain
> > the corresponding metricGroup, and can register metrics by the gauge(),
> > counter() ... method.
> >
> > Best,
> > Guojun
> >
> > On Mon, Jun 19, 2023 at 11:30 AM Caizhi Weng <[email protected]>
> wrote:
> >
> > > Hi!
> > >
> > > 1. Could you add some demonstration code about how to create and use a
> > > metric? You CommitMetrics code example uses Metrics#gauge method, but
> > > Metrics class does not have such method.
> > >
> > > 2. You've added Metrics#registerMetrics(String name, Metric metric, int
> > > groupIndex), which seems really weird to me. When registering a metric
> we
> > > have to remember the index of the group? Why not providing the group
> > > directly, or just register the metric in the group?
> > >
> > >
> > > Caizhi Weng <[email protected]> 于2023年6月16日周五 16:01写道:
> > >
> > > > Hi Guojun!
> > > >
> > > > I see that you've submitted the 3rd edition of PIP-3. Most problems
> > have
> > > > been resolved, however I still have these doubts:
> > > >
> > > > 1. Is `Metrics` class a singleton? If yes, it should contain multiple
> > > > `MetricGroup` as its member. Currently each `Metrics` class has only
> > one
> > > > `MetricGroup`. If it is not a singleton, how many instances will
> there
> > > be?
> > > > When and where are the instances created?
> > > > 2. In the `MetricGroup` class you support adding tags to one single
> > > > metric. In my opinion, tags should be added to groups, not metrics.
> > > Metrics
> > > > created from the same group should have the same set of tags, so the
> > > > creator of metrics do not need to concern about what tags to add.
> > > >
> > > > Guojun Li <[email protected]> 于2023年6月14日周三 07:15写道:
> > > >
> > > >> Yes, thanks Caizhi, it looks very nice.
> > > >>
> > > >> I'm going to make the metrics name more reasonable. Metrics group is
> > > also
> > > >> a
> > > >> good idea to take into account.
> > > >>
> > > >> On Thu, Jun 8, 2023 at 4:50 PM Caizhi Weng <[email protected]>
> > > wrote:
> > > >>
> > > >> > Hi Guojun!
> > > >> >
> > > >> > I've read the modified document. I still have the following
> > concerns:
> > > >> >
> > > >> > 1. Some metric names are still misleading. For example,
> > > >> "totalTablesFiles"
> > > >> > is the "number of total data files currently maintained on
> storage",
> > > >> > however "numTotalManifests" is the "number of scanned manifests
> > files
> > > in
> > > >> > the last scan planning", not the total number of manifest files on
> > > >> storage.
> > > >> > Please keep metric names consistent, even across different metric
> > > types.
> > > >> >
> > > >> > 2. There is no grouping in the current design. Let's consider all
> > > >> metrics
> > > >> > related to compaction. As one parallelism can hold multiple
> writers,
> > > it
> > > >> is
> > > >> > very likely that compactions for different buckets may happen in
> the
> > > >> same
> > > >> > parallelism. How are you going to distinguish the same compaction
> > > metric
> > > >> > for different buckets? Flink has metric groups which separate
> > > different
> > > >> > subtasks of the same operator.
> > > >> >
> > > >> > Guojun Li <[email protected]> 于2023年6月2日周五 18:25写道:
> > > >> >
> > > >> > > Thanks caizhi for the detailed review.
> > > >> > >
> > > >> > > I've updated the pip-3 based on your comments, they're useful to
> > > >> improve
> > > >> > > this proposal and make it clear.
> > > >> > > 1. I added the Histogram and HistogramStatistic interfaces.
> > > >> > > 2. Added histogram metrics to measure the distribution of
> commit /
> > > >> scan /
> > > >> > > compaction duration.
> > > >> > > 3. Update the descriptions for some metrics, make them more
> clear.
> > > >> > >
> > > >> > > And any other ideas from devs? Looking forward to your feedback.
> > > >> > >
> > > >> > > Best,
> > > >> > > Guojun
> > > >> > >
> > > >> > > On Mon, May 29, 2023 at 1:51 PM Guojun Li <
> > [email protected]>
> > > >> > wrote:
> > > >> > >
> > > >> > > > Thanks Caizhi for your comments. I'll think about introducing
> > > >> Histogram
> > > >> > > > metric type and making the metrics description more clear.
> > > >> > > >
> > > >> > > > Best,
> > > >> > > > Guojun
> > > >> > > >
> > > >> > > > On Thu, May 25, 2023 at 7:13 PM Caizhi Weng <
> > [email protected]
> > > >
> > > >> > > wrote:
> > > >> > > >
> > > >> > > >> I also forgot to mention that Flink metrics [1] is a good
> > > >> reference.
> > > >> > It
> > > >> > > >> clearly describes what Gauge, Counter and Histogram should do
> > and
> > > >> has
> > > >> > a
> > > >> > > >> list of metrics with clear definition. For example they have
> a
> > > >> metric
> > > >> > > >> named
> > > >> > > >> "lastCheckpointDuration" (which is a Gauge) and not just
> > > >> > > >> "checkpointDuration".
> > > >> > > >>
> > > >> > > >> [1]
> > > >> > > >>
> > > >> >
> > > https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/
> > > >> > > >>
> > > >> > > >> Caizhi Weng <[email protected]> 于2023年5月25日周四 19:10写道:
> > > >> > > >>
> > > >> > > >> > Hi Guojun!
> > > >> > > >> >
> > > >> > > >> > Thanks for raising this discussion. After reading the
> > document,
> > > >> I'd
> > > >> > > like
> > > >> > > >> > to raise a few opinions.
> > > >> > > >> >
> > > >> > > >> >
> > > >> > > >> >
> > > >> > > >>
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> --------------------------------------------------------------------------------
> > > >> > > >> >
> > > >> > > >> > 1. About the overall types of metrics.
> > > >> > > >> >
> > > >> > > >> > You suggested two types of metrics, namely Gauge and
> > Counter. A
> > > >> > gauge
> > > >> > > >> > indicates the value of a metric at that instance, for
> example
> > > >> "the
> > > >> > > >> duration
> > > >> > > >> > of the last compaction". A counter indicates an accumulated
> > > value
> > > >> > of a
> > > >> > > >> > metric overtime, for example "the total number of records
> > > >> written".
> > > >> > > >> >
> > > >> > > >> > However there are times when we want to study the history
> and
> > > see
> > > >> > the
> > > >> > > >> > trend. For example if we notice that the last compaction is
> > > >> slower
> > > >> > > than
> > > >> > > >> > normal, we might want to know "the average duration of the
> > last
> > > >> few
> > > >> > > >> > compactions" to make sure that this slow down is just by
> > chance
> > > >> or
> > > >> > is
> > > >> > > a
> > > >> > > >> > common case. Both gauge and counter cannot meet this need.
> > What
> > > >> we
> > > >> > > need
> > > >> > > >> is
> > > >> > > >> > another type of metrics called Histogram. This metrics type
> > > >> should
> > > >> > be
> > > >> > > >> able
> > > >> > > >> > to record the last few values of a metric and calculate
> their
> > > >> > > statistics
> > > >> > > >> > such as average, min/max and more.
> > > >> > > >> >
> > > >> > > >> >
> > > >> > > >> >
> > > >> > > >>
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> --------------------------------------------------------------------------------
> > > >> > > >> >
> > > >> > > >> > 2. About the types of some specific metrics.
> > > >> > > >> >
> > > >> > > >> > It seems to me that in your proposal, you cannot clearly
> tell
> > > the
> > > >> > > >> > difference between Gauges and Counters, and sometimes it is
> > not
> > > >> > clear
> > > >> > > >> that
> > > >> > > >> > what a specific metric means. Let me give you some example.
> > > >> > > >> >
> > > >> > > >> > Metric Name: commitDuration
> > > >> > > >> >
> > > >> > > >> > Description:  Commit
> > > >> > > >> >
> > > >> > > >> > Type: Gauge
> > > >> > > >> >
> > > >> > > >> > Update at: Timer starts before commit starting, update
> commit
> > > >> > duration
> > > >> > > >> >> after commit finished
> > > >> > > >> >
> > > >> > > >> >
> > > >> > > >> > At first glance, "Commit" is not a valid description. From
> > the
> > > >> > metric
> > > >> > > >> name
> > > >> > > >> > I guess you'd like to record the length of duration for
> > > >> committing
> > > >> > > >> > snapshots. As its type is a Gauge, I suppose you want to
> > record
> > > >> the
> > > >> > > >> *last*
> > > >> > > >> > commit. Please make things clear what exactly you want to
> > > record.
> > > >> > And
> > > >> > > >> as I
> > > >> > > >> > said in the last section, it might be better to introduce a
> > > >> > Histogram
> > > >> > > >> type
> > > >> > > >> > so that we can study the average duration of the last few
> > > >> commits.
> > > >> > > >> >
> > > >> > > >> > Metric Name: numTableFilesAdded
> > > >> > > >> >
> > > >> > > >> > Description: Number of added table files in this commit
> > > >> > > >> >
> > > >> > > >> > Type: Counter
> > > >> > > >> >
> > > >> > > >> > Update at: Collecting changes from committables
> > > >> > > >> >
> > > >> > > >> >
> > > >> > > >> > By "this commit" I suppose you mean *last* commit. If you
> the
> > > >> type
> > > >> > > >> should
> > > >> > > >> > be a Gauge, not a Counter. By using Counter you may want to
> > > >> record
> > > >> > the
> > > >> > > >> *total
> > > >> > > >> > number* of files added during all commits.
> > > >> > > >> >
> > > >> > > >> > Metric Name: numFilesCompactedBefore
> > > >> > > >> >
> > > >> > > >> > Description: Number of deleted files in compaction
> > > >> > > >> >
> > > >> > > >> > Type: Counter
> > > >> > > >> >
> > > >> > > >> > Update at: Triggering compaction
> > > >> > > >> >
> > > >> > > >> >
> > > >> > > >> > It is also unclear to me what you're going to record. Are
> you
> > > >> going
> > > >> > > the
> > > >> > > >> > record the number of deleted files during the *last*
> > > compaction?
> > > >> Or
> > > >> > > >> you'd
> > > >> > > >> > like to record the *total number* of deleted files during
> all
> > > >> > > >> compaction?
> > > >> > > >> >
> > > >> > > >> > I'm afraid most of your proposed metrics seem unclear to
> me.
> > > >> Please
> > > >> > > >> > rewrite the proposal to make them clear.
> > > >> > > >> >
> > > >> > > >> > Guojun Li <[email protected]> 于2023年5月25日周四 16:59写道:
> > > >> > > >> >
> > > >> > > >> >> Hi, Paimon Devs,
> > > >> > > >> >>      I’d like to start a discussion about PIP-3[1].
> > > >> > > >> >>      In this PIP, I'm talking about how to support metrics
> > for
> > > >> > > paimon,
> > > >> > > >> >> what
> > > >> > > >> >> metrics paimon needs. Look forward to your question and
> > > >> > suggestions.
> > > >> > > >> >>
> > > >> > > >> >> Best,
> > > >> > > >> >> Guojun
> > > >> > > >> >>
> > > >> > > >> >> [1]
> > > >> > > >> >>
> > > >> > > >> >>
> > > >> > > >>
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/PAIMON/PIP-3%3A+Introduce+metrics+for+Paimon
> > > >> > > >> >>
> > > >> > > >> >
> > > >> > > >>
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > > >
> > >
> >
>

Reply via email to