Hi Guojun,

Sorry for the late reply and thanks for the proposal. The PIP looks good to
me. I just have two questions to make sure we are on the same page.

1. What is the relationship between Metrics and ***Metrics? It seems that
e.g. CommitMetrics is only responsible for built-in metrics write/init
while Metrics is used for read and update access centrally for all
metricGroups and metrics, i.e. getter methods. Did I understand correctly?
2. Current group design is flat. There is no group hierarchy, i.e.
sub-groups. Will it be supported later or will we stick to the flat group
design in the future?

Best Regards
Jing


On Sun, Jun 25, 2023 at 10:33 AM Guojun Li <[email protected]> wrote:

> Hi, devs,
>
> Thank you for your feedback on PIP-3[1], I'd like to start a vote on it
> since the discussion on this thread[2] came to an agreement.
>
> The vote will last for at least 72 hours (Jun 29th, 2:00 GMT, excluding
> weekend days) unless there is an objection or insufficient vote.
>
> Best,
> Guojun
>
> [1]
>
> https://cwiki.apache.org/confluence/display/PAIMON/PIP-3%3A+Introduce+metrics+for+Paimon
> [2]https://lists.apache.org/thread/h2lgqgrvcptzj3c8q4cjzv5jopgtwx9o
>
> On Wed, Jun 21, 2023 at 5:53 PM Caizhi Weng <[email protected]> wrote:
>
> > Hi Guojun!
> >
> > I've checked the latest PIP-3 and it now looks fine to me overall.
> Although
> > there are still some minor issues, for example it is unclear to me how
> > MetricGroup is created and is tagged. But this detail can be further
> > discussed during the implementation. From my point of view this PIP is
> now
> > fine enough to go on a vote.
> >
> > Guojun Li <[email protected]> 于2023年6月21日周三 15:23写道:
> >
> > > Hi caizhi,
> > >
> > > I've updated APIs in PIP3:
> > > 1. Instance of Metrics class is a singleton which maintains the
> reporters
> > > and Metric groups.
> > > 2. Instantiation of MetricGroup will add the current group to Metrics
> > > instance.
> > > 3. The metrics list instance, like CommitMetrics instance, will
> maintain
> > > the corresponding metricGroup, and can register metrics by the gauge(),
> > > counter() ... method.
> > >
> > > Best,
> > > Guojun
> > >
> > > On Mon, Jun 19, 2023 at 11:30 AM Caizhi Weng <[email protected]>
> > wrote:
> > >
> > > > Hi!
> > > >
> > > > 1. Could you add some demonstration code about how to create and use
> a
> > > > metric? You CommitMetrics code example uses Metrics#gauge method, but
> > > > Metrics class does not have such method.
> > > >
> > > > 2. You've added Metrics#registerMetrics(String name, Metric metric,
> int
> > > > groupIndex), which seems really weird to me. When registering a
> metric
> > we
> > > > have to remember the index of the group? Why not providing the group
> > > > directly, or just register the metric in the group?
> > > >
> > > >
> > > > Caizhi Weng <[email protected]> 于2023年6月16日周五 16:01写道:
> > > >
> > > > > Hi Guojun!
> > > > >
> > > > > I see that you've submitted the 3rd edition of PIP-3. Most problems
> > > have
> > > > > been resolved, however I still have these doubts:
> > > > >
> > > > > 1. Is `Metrics` class a singleton? If yes, it should contain
> multiple
> > > > > `MetricGroup` as its member. Currently each `Metrics` class has
> only
> > > one
> > > > > `MetricGroup`. If it is not a singleton, how many instances will
> > there
> > > > be?
> > > > > When and where are the instances created?
> > > > > 2. In the `MetricGroup` class you support adding tags to one single
> > > > > metric. In my opinion, tags should be added to groups, not metrics.
> > > > Metrics
> > > > > created from the same group should have the same set of tags, so
> the
> > > > > creator of metrics do not need to concern about what tags to add.
> > > > >
> > > > > Guojun Li <[email protected]> 于2023年6月14日周三 07:15写道:
> > > > >
> > > > >> Yes, thanks Caizhi, it looks very nice.
> > > > >>
> > > > >> I'm going to make the metrics name more reasonable. Metrics group
> is
> > > > also
> > > > >> a
> > > > >> good idea to take into account.
> > > > >>
> > > > >> On Thu, Jun 8, 2023 at 4:50 PM Caizhi Weng <[email protected]>
> > > > wrote:
> > > > >>
> > > > >> > Hi Guojun!
> > > > >> >
> > > > >> > I've read the modified document. I still have the following
> > > concerns:
> > > > >> >
> > > > >> > 1. Some metric names are still misleading. For example,
> > > > >> "totalTablesFiles"
> > > > >> > is the "number of total data files currently maintained on
> > storage",
> > > > >> > however "numTotalManifests" is the "number of scanned manifests
> > > files
> > > > in
> > > > >> > the last scan planning", not the total number of manifest files
> on
> > > > >> storage.
> > > > >> > Please keep metric names consistent, even across different
> metric
> > > > types.
> > > > >> >
> > > > >> > 2. There is no grouping in the current design. Let's consider
> all
> > > > >> metrics
> > > > >> > related to compaction. As one parallelism can hold multiple
> > writers,
> > > > it
> > > > >> is
> > > > >> > very likely that compactions for different buckets may happen in
> > the
> > > > >> same
> > > > >> > parallelism. How are you going to distinguish the same
> compaction
> > > > metric
> > > > >> > for different buckets? Flink has metric groups which separate
> > > > different
> > > > >> > subtasks of the same operator.
> > > > >> >
> > > > >> > Guojun Li <[email protected]> 于2023年6月2日周五 18:25写道:
> > > > >> >
> > > > >> > > Thanks caizhi for the detailed review.
> > > > >> > >
> > > > >> > > I've updated the pip-3 based on your comments, they're useful
> to
> > > > >> improve
> > > > >> > > this proposal and make it clear.
> > > > >> > > 1. I added the Histogram and HistogramStatistic interfaces.
> > > > >> > > 2. Added histogram metrics to measure the distribution of
> > commit /
> > > > >> scan /
> > > > >> > > compaction duration.
> > > > >> > > 3. Update the descriptions for some metrics, make them more
> > clear.
> > > > >> > >
> > > > >> > > And any other ideas from devs? Looking forward to your
> feedback.
> > > > >> > >
> > > > >> > > Best,
> > > > >> > > Guojun
> > > > >> > >
> > > > >> > > On Mon, May 29, 2023 at 1:51 PM Guojun Li <
> > > [email protected]>
> > > > >> > wrote:
> > > > >> > >
> > > > >> > > > Thanks Caizhi for your comments. I'll think about
> introducing
> > > > >> Histogram
> > > > >> > > > metric type and making the metrics description more clear.
> > > > >> > > >
> > > > >> > > > Best,
> > > > >> > > > Guojun
> > > > >> > > >
> > > > >> > > > On Thu, May 25, 2023 at 7:13 PM Caizhi Weng <
> > > [email protected]
> > > > >
> > > > >> > > wrote:
> > > > >> > > >
> > > > >> > > >> I also forgot to mention that Flink metrics [1] is a good
> > > > >> reference.
> > > > >> > It
> > > > >> > > >> clearly describes what Gauge, Counter and Histogram should
> do
> > > and
> > > > >> has
> > > > >> > a
> > > > >> > > >> list of metrics with clear definition. For example they
> have
> > a
> > > > >> metric
> > > > >> > > >> named
> > > > >> > > >> "lastCheckpointDuration" (which is a Gauge) and not just
> > > > >> > > >> "checkpointDuration".
> > > > >> > > >>
> > > > >> > > >> [1]
> > > > >> > > >>
> > > > >> >
> > > >
> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/
> > > > >> > > >>
> > > > >> > > >> Caizhi Weng <[email protected]> 于2023年5月25日周四 19:10写道:
> > > > >> > > >>
> > > > >> > > >> > Hi Guojun!
> > > > >> > > >> >
> > > > >> > > >> > Thanks for raising this discussion. After reading the
> > > document,
> > > > >> I'd
> > > > >> > > like
> > > > >> > > >> > to raise a few opinions.
> > > > >> > > >> >
> > > > >> > > >> >
> > > > >> > > >> >
> > > > >> > > >>
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> --------------------------------------------------------------------------------
> > > > >> > > >> >
> > > > >> > > >> > 1. About the overall types of metrics.
> > > > >> > > >> >
> > > > >> > > >> > You suggested two types of metrics, namely Gauge and
> > > Counter. A
> > > > >> > gauge
> > > > >> > > >> > indicates the value of a metric at that instance, for
> > example
> > > > >> "the
> > > > >> > > >> duration
> > > > >> > > >> > of the last compaction". A counter indicates an
> accumulated
> > > > value
> > > > >> > of a
> > > > >> > > >> > metric overtime, for example "the total number of records
> > > > >> written".
> > > > >> > > >> >
> > > > >> > > >> > However there are times when we want to study the history
> > and
> > > > see
> > > > >> > the
> > > > >> > > >> > trend. For example if we notice that the last compaction
> is
> > > > >> slower
> > > > >> > > than
> > > > >> > > >> > normal, we might want to know "the average duration of
> the
> > > last
> > > > >> few
> > > > >> > > >> > compactions" to make sure that this slow down is just by
> > > chance
> > > > >> or
> > > > >> > is
> > > > >> > > a
> > > > >> > > >> > common case. Both gauge and counter cannot meet this
> need.
> > > What
> > > > >> we
> > > > >> > > need
> > > > >> > > >> is
> > > > >> > > >> > another type of metrics called Histogram. This metrics
> type
> > > > >> should
> > > > >> > be
> > > > >> > > >> able
> > > > >> > > >> > to record the last few values of a metric and calculate
> > their
> > > > >> > > statistics
> > > > >> > > >> > such as average, min/max and more.
> > > > >> > > >> >
> > > > >> > > >> >
> > > > >> > > >> >
> > > > >> > > >>
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> --------------------------------------------------------------------------------
> > > > >> > > >> >
> > > > >> > > >> > 2. About the types of some specific metrics.
> > > > >> > > >> >
> > > > >> > > >> > It seems to me that in your proposal, you cannot clearly
> > tell
> > > > the
> > > > >> > > >> > difference between Gauges and Counters, and sometimes it
> is
> > > not
> > > > >> > clear
> > > > >> > > >> that
> > > > >> > > >> > what a specific metric means. Let me give you some
> example.
> > > > >> > > >> >
> > > > >> > > >> > Metric Name: commitDuration
> > > > >> > > >> >
> > > > >> > > >> > Description:  Commit
> > > > >> > > >> >
> > > > >> > > >> > Type: Gauge
> > > > >> > > >> >
> > > > >> > > >> > Update at: Timer starts before commit starting, update
> > commit
> > > > >> > duration
> > > > >> > > >> >> after commit finished
> > > > >> > > >> >
> > > > >> > > >> >
> > > > >> > > >> > At first glance, "Commit" is not a valid description.
> From
> > > the
> > > > >> > metric
> > > > >> > > >> name
> > > > >> > > >> > I guess you'd like to record the length of duration for
> > > > >> committing
> > > > >> > > >> > snapshots. As its type is a Gauge, I suppose you want to
> > > record
> > > > >> the
> > > > >> > > >> *last*
> > > > >> > > >> > commit. Please make things clear what exactly you want to
> > > > record.
> > > > >> > And
> > > > >> > > >> as I
> > > > >> > > >> > said in the last section, it might be better to
> introduce a
> > > > >> > Histogram
> > > > >> > > >> type
> > > > >> > > >> > so that we can study the average duration of the last few
> > > > >> commits.
> > > > >> > > >> >
> > > > >> > > >> > Metric Name: numTableFilesAdded
> > > > >> > > >> >
> > > > >> > > >> > Description: Number of added table files in this commit
> > > > >> > > >> >
> > > > >> > > >> > Type: Counter
> > > > >> > > >> >
> > > > >> > > >> > Update at: Collecting changes from committables
> > > > >> > > >> >
> > > > >> > > >> >
> > > > >> > > >> > By "this commit" I suppose you mean *last* commit. If you
> > the
> > > > >> type
> > > > >> > > >> should
> > > > >> > > >> > be a Gauge, not a Counter. By using Counter you may want
> to
> > > > >> record
> > > > >> > the
> > > > >> > > >> *total
> > > > >> > > >> > number* of files added during all commits.
> > > > >> > > >> >
> > > > >> > > >> > Metric Name: numFilesCompactedBefore
> > > > >> > > >> >
> > > > >> > > >> > Description: Number of deleted files in compaction
> > > > >> > > >> >
> > > > >> > > >> > Type: Counter
> > > > >> > > >> >
> > > > >> > > >> > Update at: Triggering compaction
> > > > >> > > >> >
> > > > >> > > >> >
> > > > >> > > >> > It is also unclear to me what you're going to record. Are
> > you
> > > > >> going
> > > > >> > > the
> > > > >> > > >> > record the number of deleted files during the *last*
> > > > compaction?
> > > > >> Or
> > > > >> > > >> you'd
> > > > >> > > >> > like to record the *total number* of deleted files during
> > all
> > > > >> > > >> compaction?
> > > > >> > > >> >
> > > > >> > > >> > I'm afraid most of your proposed metrics seem unclear to
> > me.
> > > > >> Please
> > > > >> > > >> > rewrite the proposal to make them clear.
> > > > >> > > >> >
> > > > >> > > >> > Guojun Li <[email protected]> 于2023年5月25日周四
> 16:59写道:
> > > > >> > > >> >
> > > > >> > > >> >> Hi, Paimon Devs,
> > > > >> > > >> >>      I’d like to start a discussion about PIP-3[1].
> > > > >> > > >> >>      In this PIP, I'm talking about how to support
> metrics
> > > for
> > > > >> > > paimon,
> > > > >> > > >> >> what
> > > > >> > > >> >> metrics paimon needs. Look forward to your question and
> > > > >> > suggestions.
> > > > >> > > >> >>
> > > > >> > > >> >> Best,
> > > > >> > > >> >> Guojun
> > > > >> > > >> >>
> > > > >> > > >> >> [1]
> > > > >> > > >> >>
> > > > >> > > >> >>
> > > > >> > > >>
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/PAIMON/PIP-3%3A+Introduce+metrics+for+Paimon
> > > > >> > > >> >>
> > > > >> > > >> >
> > > > >> > > >>
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > >
> > >
> >
>

Reply via email to