Hi Guojun!

I've checked the latest PIP-3 and it now looks fine to me overall. Although
there are still some minor issues, for example it is unclear to me how
MetricGroup is created and is tagged. But this detail can be further
discussed during the implementation. From my point of view this PIP is now
fine enough to go on a vote.

Guojun Li <[email protected]> 于2023年6月21日周三 15:23写道:

> Hi caizhi,
>
> I've updated APIs in PIP3:
> 1. Instance of Metrics class is a singleton which maintains the reporters
> and Metric groups.
> 2. Instantiation of MetricGroup will add the current group to Metrics
> instance.
> 3. The metrics list instance, like CommitMetrics instance, will maintain
> the corresponding metricGroup, and can register metrics by the gauge(),
> counter() ... method.
>
> Best,
> Guojun
>
> On Mon, Jun 19, 2023 at 11:30 AM Caizhi Weng <[email protected]> wrote:
>
> > Hi!
> >
> > 1. Could you add some demonstration code about how to create and use a
> > metric? You CommitMetrics code example uses Metrics#gauge method, but
> > Metrics class does not have such method.
> >
> > 2. You've added Metrics#registerMetrics(String name, Metric metric, int
> > groupIndex), which seems really weird to me. When registering a metric we
> > have to remember the index of the group? Why not providing the group
> > directly, or just register the metric in the group?
> >
> >
> > Caizhi Weng <[email protected]> 于2023年6月16日周五 16:01写道:
> >
> > > Hi Guojun!
> > >
> > > I see that you've submitted the 3rd edition of PIP-3. Most problems
> have
> > > been resolved, however I still have these doubts:
> > >
> > > 1. Is `Metrics` class a singleton? If yes, it should contain multiple
> > > `MetricGroup` as its member. Currently each `Metrics` class has only
> one
> > > `MetricGroup`. If it is not a singleton, how many instances will there
> > be?
> > > When and where are the instances created?
> > > 2. In the `MetricGroup` class you support adding tags to one single
> > > metric. In my opinion, tags should be added to groups, not metrics.
> > Metrics
> > > created from the same group should have the same set of tags, so the
> > > creator of metrics do not need to concern about what tags to add.
> > >
> > > Guojun Li <[email protected]> 于2023年6月14日周三 07:15写道:
> > >
> > >> Yes, thanks Caizhi, it looks very nice.
> > >>
> > >> I'm going to make the metrics name more reasonable. Metrics group is
> > also
> > >> a
> > >> good idea to take into account.
> > >>
> > >> On Thu, Jun 8, 2023 at 4:50 PM Caizhi Weng <[email protected]>
> > wrote:
> > >>
> > >> > Hi Guojun!
> > >> >
> > >> > I've read the modified document. I still have the following
> concerns:
> > >> >
> > >> > 1. Some metric names are still misleading. For example,
> > >> "totalTablesFiles"
> > >> > is the "number of total data files currently maintained on storage",
> > >> > however "numTotalManifests" is the "number of scanned manifests
> files
> > in
> > >> > the last scan planning", not the total number of manifest files on
> > >> storage.
> > >> > Please keep metric names consistent, even across different metric
> > types.
> > >> >
> > >> > 2. There is no grouping in the current design. Let's consider all
> > >> metrics
> > >> > related to compaction. As one parallelism can hold multiple writers,
> > it
> > >> is
> > >> > very likely that compactions for different buckets may happen in the
> > >> same
> > >> > parallelism. How are you going to distinguish the same compaction
> > metric
> > >> > for different buckets? Flink has metric groups which separate
> > different
> > >> > subtasks of the same operator.
> > >> >
> > >> > Guojun Li <[email protected]> 于2023年6月2日周五 18:25写道:
> > >> >
> > >> > > Thanks caizhi for the detailed review.
> > >> > >
> > >> > > I've updated the pip-3 based on your comments, they're useful to
> > >> improve
> > >> > > this proposal and make it clear.
> > >> > > 1. I added the Histogram and HistogramStatistic interfaces.
> > >> > > 2. Added histogram metrics to measure the distribution of commit /
> > >> scan /
> > >> > > compaction duration.
> > >> > > 3. Update the descriptions for some metrics, make them more clear.
> > >> > >
> > >> > > And any other ideas from devs? Looking forward to your feedback.
> > >> > >
> > >> > > Best,
> > >> > > Guojun
> > >> > >
> > >> > > On Mon, May 29, 2023 at 1:51 PM Guojun Li <
> [email protected]>
> > >> > wrote:
> > >> > >
> > >> > > > Thanks Caizhi for your comments. I'll think about introducing
> > >> Histogram
> > >> > > > metric type and making the metrics description more clear.
> > >> > > >
> > >> > > > Best,
> > >> > > > Guojun
> > >> > > >
> > >> > > > On Thu, May 25, 2023 at 7:13 PM Caizhi Weng <
> [email protected]
> > >
> > >> > > wrote:
> > >> > > >
> > >> > > >> I also forgot to mention that Flink metrics [1] is a good
> > >> reference.
> > >> > It
> > >> > > >> clearly describes what Gauge, Counter and Histogram should do
> and
> > >> has
> > >> > a
> > >> > > >> list of metrics with clear definition. For example they have a
> > >> metric
> > >> > > >> named
> > >> > > >> "lastCheckpointDuration" (which is a Gauge) and not just
> > >> > > >> "checkpointDuration".
> > >> > > >>
> > >> > > >> [1]
> > >> > > >>
> > >> >
> > https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/
> > >> > > >>
> > >> > > >> Caizhi Weng <[email protected]> 于2023年5月25日周四 19:10写道:
> > >> > > >>
> > >> > > >> > Hi Guojun!
> > >> > > >> >
> > >> > > >> > Thanks for raising this discussion. After reading the
> document,
> > >> I'd
> > >> > > like
> > >> > > >> > to raise a few opinions.
> > >> > > >> >
> > >> > > >> >
> > >> > > >> >
> > >> > > >>
> > >> > >
> > >> >
> > >>
> >
> --------------------------------------------------------------------------------
> > >> > > >> >
> > >> > > >> > 1. About the overall types of metrics.
> > >> > > >> >
> > >> > > >> > You suggested two types of metrics, namely Gauge and
> Counter. A
> > >> > gauge
> > >> > > >> > indicates the value of a metric at that instance, for example
> > >> "the
> > >> > > >> duration
> > >> > > >> > of the last compaction". A counter indicates an accumulated
> > value
> > >> > of a
> > >> > > >> > metric overtime, for example "the total number of records
> > >> written".
> > >> > > >> >
> > >> > > >> > However there are times when we want to study the history and
> > see
> > >> > the
> > >> > > >> > trend. For example if we notice that the last compaction is
> > >> slower
> > >> > > than
> > >> > > >> > normal, we might want to know "the average duration of the
> last
> > >> few
> > >> > > >> > compactions" to make sure that this slow down is just by
> chance
> > >> or
> > >> > is
> > >> > > a
> > >> > > >> > common case. Both gauge and counter cannot meet this need.
> What
> > >> we
> > >> > > need
> > >> > > >> is
> > >> > > >> > another type of metrics called Histogram. This metrics type
> > >> should
> > >> > be
> > >> > > >> able
> > >> > > >> > to record the last few values of a metric and calculate their
> > >> > > statistics
> > >> > > >> > such as average, min/max and more.
> > >> > > >> >
> > >> > > >> >
> > >> > > >> >
> > >> > > >>
> > >> > >
> > >> >
> > >>
> >
> --------------------------------------------------------------------------------
> > >> > > >> >
> > >> > > >> > 2. About the types of some specific metrics.
> > >> > > >> >
> > >> > > >> > It seems to me that in your proposal, you cannot clearly tell
> > the
> > >> > > >> > difference between Gauges and Counters, and sometimes it is
> not
> > >> > clear
> > >> > > >> that
> > >> > > >> > what a specific metric means. Let me give you some example.
> > >> > > >> >
> > >> > > >> > Metric Name: commitDuration
> > >> > > >> >
> > >> > > >> > Description:  Commit
> > >> > > >> >
> > >> > > >> > Type: Gauge
> > >> > > >> >
> > >> > > >> > Update at: Timer starts before commit starting, update commit
> > >> > duration
> > >> > > >> >> after commit finished
> > >> > > >> >
> > >> > > >> >
> > >> > > >> > At first glance, "Commit" is not a valid description. From
> the
> > >> > metric
> > >> > > >> name
> > >> > > >> > I guess you'd like to record the length of duration for
> > >> committing
> > >> > > >> > snapshots. As its type is a Gauge, I suppose you want to
> record
> > >> the
> > >> > > >> *last*
> > >> > > >> > commit. Please make things clear what exactly you want to
> > record.
> > >> > And
> > >> > > >> as I
> > >> > > >> > said in the last section, it might be better to introduce a
> > >> > Histogram
> > >> > > >> type
> > >> > > >> > so that we can study the average duration of the last few
> > >> commits.
> > >> > > >> >
> > >> > > >> > Metric Name: numTableFilesAdded
> > >> > > >> >
> > >> > > >> > Description: Number of added table files in this commit
> > >> > > >> >
> > >> > > >> > Type: Counter
> > >> > > >> >
> > >> > > >> > Update at: Collecting changes from committables
> > >> > > >> >
> > >> > > >> >
> > >> > > >> > By "this commit" I suppose you mean *last* commit. If you the
> > >> type
> > >> > > >> should
> > >> > > >> > be a Gauge, not a Counter. By using Counter you may want to
> > >> record
> > >> > the
> > >> > > >> *total
> > >> > > >> > number* of files added during all commits.
> > >> > > >> >
> > >> > > >> > Metric Name: numFilesCompactedBefore
> > >> > > >> >
> > >> > > >> > Description: Number of deleted files in compaction
> > >> > > >> >
> > >> > > >> > Type: Counter
> > >> > > >> >
> > >> > > >> > Update at: Triggering compaction
> > >> > > >> >
> > >> > > >> >
> > >> > > >> > It is also unclear to me what you're going to record. Are you
> > >> going
> > >> > > the
> > >> > > >> > record the number of deleted files during the *last*
> > compaction?
> > >> Or
> > >> > > >> you'd
> > >> > > >> > like to record the *total number* of deleted files during all
> > >> > > >> compaction?
> > >> > > >> >
> > >> > > >> > I'm afraid most of your proposed metrics seem unclear to me.
> > >> Please
> > >> > > >> > rewrite the proposal to make them clear.
> > >> > > >> >
> > >> > > >> > Guojun Li <[email protected]> 于2023年5月25日周四 16:59写道:
> > >> > > >> >
> > >> > > >> >> Hi, Paimon Devs,
> > >> > > >> >>      I’d like to start a discussion about PIP-3[1].
> > >> > > >> >>      In this PIP, I'm talking about how to support metrics
> for
> > >> > > paimon,
> > >> > > >> >> what
> > >> > > >> >> metrics paimon needs. Look forward to your question and
> > >> > suggestions.
> > >> > > >> >>
> > >> > > >> >> Best,
> > >> > > >> >> Guojun
> > >> > > >> >>
> > >> > > >> >> [1]
> > >> > > >> >>
> > >> > > >> >>
> > >> > > >>
> > >> > >
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/display/PAIMON/PIP-3%3A+Introduce+metrics+for+Paimon
> > >> > > >> >>
> > >> > > >> >
> > >> > > >>
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> >
>

Reply via email to