Hi!

1. Could you add some demonstration code about how to create and use a
metric? You CommitMetrics code example uses Metrics#gauge method, but
Metrics class does not have such method.

2. You've added Metrics#registerMetrics(String name, Metric metric, int
groupIndex), which seems really weird to me. When registering a metric we
have to remember the index of the group? Why not providing the group
directly, or just register the metric in the group?


Caizhi Weng <[email protected]> 于2023年6月16日周五 16:01写道:

> Hi Guojun!
>
> I see that you've submitted the 3rd edition of PIP-3. Most problems have
> been resolved, however I still have these doubts:
>
> 1. Is `Metrics` class a singleton? If yes, it should contain multiple
> `MetricGroup` as its member. Currently each `Metrics` class has only one
> `MetricGroup`. If it is not a singleton, how many instances will there be?
> When and where are the instances created?
> 2. In the `MetricGroup` class you support adding tags to one single
> metric. In my opinion, tags should be added to groups, not metrics. Metrics
> created from the same group should have the same set of tags, so the
> creator of metrics do not need to concern about what tags to add.
>
> Guojun Li <[email protected]> 于2023年6月14日周三 07:15写道:
>
>> Yes, thanks Caizhi, it looks very nice.
>>
>> I'm going to make the metrics name more reasonable. Metrics group is also
>> a
>> good idea to take into account.
>>
>> On Thu, Jun 8, 2023 at 4:50 PM Caizhi Weng <[email protected]> wrote:
>>
>> > Hi Guojun!
>> >
>> > I've read the modified document. I still have the following concerns:
>> >
>> > 1. Some metric names are still misleading. For example,
>> "totalTablesFiles"
>> > is the "number of total data files currently maintained on storage",
>> > however "numTotalManifests" is the "number of scanned manifests files in
>> > the last scan planning", not the total number of manifest files on
>> storage.
>> > Please keep metric names consistent, even across different metric types.
>> >
>> > 2. There is no grouping in the current design. Let's consider all
>> metrics
>> > related to compaction. As one parallelism can hold multiple writers, it
>> is
>> > very likely that compactions for different buckets may happen in the
>> same
>> > parallelism. How are you going to distinguish the same compaction metric
>> > for different buckets? Flink has metric groups which separate different
>> > subtasks of the same operator.
>> >
>> > Guojun Li <[email protected]> 于2023年6月2日周五 18:25写道:
>> >
>> > > Thanks caizhi for the detailed review.
>> > >
>> > > I've updated the pip-3 based on your comments, they're useful to
>> improve
>> > > this proposal and make it clear.
>> > > 1. I added the Histogram and HistogramStatistic interfaces.
>> > > 2. Added histogram metrics to measure the distribution of commit /
>> scan /
>> > > compaction duration.
>> > > 3. Update the descriptions for some metrics, make them more clear.
>> > >
>> > > And any other ideas from devs? Looking forward to your feedback.
>> > >
>> > > Best,
>> > > Guojun
>> > >
>> > > On Mon, May 29, 2023 at 1:51 PM Guojun Li <[email protected]>
>> > wrote:
>> > >
>> > > > Thanks Caizhi for your comments. I'll think about introducing
>> Histogram
>> > > > metric type and making the metrics description more clear.
>> > > >
>> > > > Best,
>> > > > Guojun
>> > > >
>> > > > On Thu, May 25, 2023 at 7:13 PM Caizhi Weng <[email protected]>
>> > > wrote:
>> > > >
>> > > >> I also forgot to mention that Flink metrics [1] is a good
>> reference.
>> > It
>> > > >> clearly describes what Gauge, Counter and Histogram should do and
>> has
>> > a
>> > > >> list of metrics with clear definition. For example they have a
>> metric
>> > > >> named
>> > > >> "lastCheckpointDuration" (which is a Gauge) and not just
>> > > >> "checkpointDuration".
>> > > >>
>> > > >> [1]
>> > > >>
>> > https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/
>> > > >>
>> > > >> Caizhi Weng <[email protected]> 于2023年5月25日周四 19:10写道:
>> > > >>
>> > > >> > Hi Guojun!
>> > > >> >
>> > > >> > Thanks for raising this discussion. After reading the document,
>> I'd
>> > > like
>> > > >> > to raise a few opinions.
>> > > >> >
>> > > >> >
>> > > >> >
>> > > >>
>> > >
>> >
>> --------------------------------------------------------------------------------
>> > > >> >
>> > > >> > 1. About the overall types of metrics.
>> > > >> >
>> > > >> > You suggested two types of metrics, namely Gauge and Counter. A
>> > gauge
>> > > >> > indicates the value of a metric at that instance, for example
>> "the
>> > > >> duration
>> > > >> > of the last compaction". A counter indicates an accumulated value
>> > of a
>> > > >> > metric overtime, for example "the total number of records
>> written".
>> > > >> >
>> > > >> > However there are times when we want to study the history and see
>> > the
>> > > >> > trend. For example if we notice that the last compaction is
>> slower
>> > > than
>> > > >> > normal, we might want to know "the average duration of the last
>> few
>> > > >> > compactions" to make sure that this slow down is just by chance
>> or
>> > is
>> > > a
>> > > >> > common case. Both gauge and counter cannot meet this need. What
>> we
>> > > need
>> > > >> is
>> > > >> > another type of metrics called Histogram. This metrics type
>> should
>> > be
>> > > >> able
>> > > >> > to record the last few values of a metric and calculate their
>> > > statistics
>> > > >> > such as average, min/max and more.
>> > > >> >
>> > > >> >
>> > > >> >
>> > > >>
>> > >
>> >
>> --------------------------------------------------------------------------------
>> > > >> >
>> > > >> > 2. About the types of some specific metrics.
>> > > >> >
>> > > >> > It seems to me that in your proposal, you cannot clearly tell the
>> > > >> > difference between Gauges and Counters, and sometimes it is not
>> > clear
>> > > >> that
>> > > >> > what a specific metric means. Let me give you some example.
>> > > >> >
>> > > >> > Metric Name: commitDuration
>> > > >> >
>> > > >> > Description:  Commit
>> > > >> >
>> > > >> > Type: Gauge
>> > > >> >
>> > > >> > Update at: Timer starts before commit starting, update commit
>> > duration
>> > > >> >> after commit finished
>> > > >> >
>> > > >> >
>> > > >> > At first glance, "Commit" is not a valid description. From the
>> > metric
>> > > >> name
>> > > >> > I guess you'd like to record the length of duration for
>> committing
>> > > >> > snapshots. As its type is a Gauge, I suppose you want to record
>> the
>> > > >> *last*
>> > > >> > commit. Please make things clear what exactly you want to record.
>> > And
>> > > >> as I
>> > > >> > said in the last section, it might be better to introduce a
>> > Histogram
>> > > >> type
>> > > >> > so that we can study the average duration of the last few
>> commits.
>> > > >> >
>> > > >> > Metric Name: numTableFilesAdded
>> > > >> >
>> > > >> > Description: Number of added table files in this commit
>> > > >> >
>> > > >> > Type: Counter
>> > > >> >
>> > > >> > Update at: Collecting changes from committables
>> > > >> >
>> > > >> >
>> > > >> > By "this commit" I suppose you mean *last* commit. If you the
>> type
>> > > >> should
>> > > >> > be a Gauge, not a Counter. By using Counter you may want to
>> record
>> > the
>> > > >> *total
>> > > >> > number* of files added during all commits.
>> > > >> >
>> > > >> > Metric Name: numFilesCompactedBefore
>> > > >> >
>> > > >> > Description: Number of deleted files in compaction
>> > > >> >
>> > > >> > Type: Counter
>> > > >> >
>> > > >> > Update at: Triggering compaction
>> > > >> >
>> > > >> >
>> > > >> > It is also unclear to me what you're going to record. Are you
>> going
>> > > the
>> > > >> > record the number of deleted files during the *last* compaction?
>> Or
>> > > >> you'd
>> > > >> > like to record the *total number* of deleted files during all
>> > > >> compaction?
>> > > >> >
>> > > >> > I'm afraid most of your proposed metrics seem unclear to me.
>> Please
>> > > >> > rewrite the proposal to make them clear.
>> > > >> >
>> > > >> > Guojun Li <[email protected]> 于2023年5月25日周四 16:59写道:
>> > > >> >
>> > > >> >> Hi, Paimon Devs,
>> > > >> >>      I’d like to start a discussion about PIP-3[1].
>> > > >> >>      In this PIP, I'm talking about how to support metrics for
>> > > paimon,
>> > > >> >> what
>> > > >> >> metrics paimon needs. Look forward to your question and
>> > suggestions.
>> > > >> >>
>> > > >> >> Best,
>> > > >> >> Guojun
>> > > >> >>
>> > > >> >> [1]
>> > > >> >>
>> > > >> >>
>> > > >>
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/PAIMON/PIP-3%3A+Introduce+metrics+for+Paimon
>> > > >> >>
>> > > >> >
>> > > >>
>> > > >
>> > >
>> >
>>
>

Reply via email to