Hi, devs, Thank you for your feedback on PIP-3[1], I'd like to start a vote on it since the discussion on this thread[2] came to an agreement.
The vote will last for at least 72 hours (Jun 29th, 2:00 GMT, excluding weekend days) unless there is an objection or insufficient vote. Best, Guojun [1] https://cwiki.apache.org/confluence/display/PAIMON/PIP-3%3A+Introduce+metrics+for+Paimon [2]https://lists.apache.org/thread/h2lgqgrvcptzj3c8q4cjzv5jopgtwx9o On Wed, Jun 21, 2023 at 5:53 PM Caizhi Weng <[email protected]> wrote: > Hi Guojun! > > I've checked the latest PIP-3 and it now looks fine to me overall. Although > there are still some minor issues, for example it is unclear to me how > MetricGroup is created and is tagged. But this detail can be further > discussed during the implementation. From my point of view this PIP is now > fine enough to go on a vote. > > Guojun Li <[email protected]> 于2023年6月21日周三 15:23写道: > > > Hi caizhi, > > > > I've updated APIs in PIP3: > > 1. Instance of Metrics class is a singleton which maintains the reporters > > and Metric groups. > > 2. Instantiation of MetricGroup will add the current group to Metrics > > instance. > > 3. The metrics list instance, like CommitMetrics instance, will maintain > > the corresponding metricGroup, and can register metrics by the gauge(), > > counter() ... method. > > > > Best, > > Guojun > > > > On Mon, Jun 19, 2023 at 11:30 AM Caizhi Weng <[email protected]> > wrote: > > > > > Hi! > > > > > > 1. Could you add some demonstration code about how to create and use a > > > metric? You CommitMetrics code example uses Metrics#gauge method, but > > > Metrics class does not have such method. > > > > > > 2. You've added Metrics#registerMetrics(String name, Metric metric, int > > > groupIndex), which seems really weird to me. When registering a metric > we > > > have to remember the index of the group? Why not providing the group > > > directly, or just register the metric in the group? > > > > > > > > > Caizhi Weng <[email protected]> 于2023年6月16日周五 16:01写道: > > > > > > > Hi Guojun! > > > > > > > > I see that you've submitted the 3rd edition of PIP-3. Most problems > > have > > > > been resolved, however I still have these doubts: > > > > > > > > 1. Is `Metrics` class a singleton? If yes, it should contain multiple > > > > `MetricGroup` as its member. Currently each `Metrics` class has only > > one > > > > `MetricGroup`. If it is not a singleton, how many instances will > there > > > be? > > > > When and where are the instances created? > > > > 2. In the `MetricGroup` class you support adding tags to one single > > > > metric. In my opinion, tags should be added to groups, not metrics. > > > Metrics > > > > created from the same group should have the same set of tags, so the > > > > creator of metrics do not need to concern about what tags to add. > > > > > > > > Guojun Li <[email protected]> 于2023年6月14日周三 07:15写道: > > > > > > > >> Yes, thanks Caizhi, it looks very nice. > > > >> > > > >> I'm going to make the metrics name more reasonable. Metrics group is > > > also > > > >> a > > > >> good idea to take into account. > > > >> > > > >> On Thu, Jun 8, 2023 at 4:50 PM Caizhi Weng <[email protected]> > > > wrote: > > > >> > > > >> > Hi Guojun! > > > >> > > > > >> > I've read the modified document. I still have the following > > concerns: > > > >> > > > > >> > 1. Some metric names are still misleading. For example, > > > >> "totalTablesFiles" > > > >> > is the "number of total data files currently maintained on > storage", > > > >> > however "numTotalManifests" is the "number of scanned manifests > > files > > > in > > > >> > the last scan planning", not the total number of manifest files on > > > >> storage. > > > >> > Please keep metric names consistent, even across different metric > > > types. > > > >> > > > > >> > 2. There is no grouping in the current design. Let's consider all > > > >> metrics > > > >> > related to compaction. As one parallelism can hold multiple > writers, > > > it > > > >> is > > > >> > very likely that compactions for different buckets may happen in > the > > > >> same > > > >> > parallelism. How are you going to distinguish the same compaction > > > metric > > > >> > for different buckets? Flink has metric groups which separate > > > different > > > >> > subtasks of the same operator. > > > >> > > > > >> > Guojun Li <[email protected]> 于2023年6月2日周五 18:25写道: > > > >> > > > > >> > > Thanks caizhi for the detailed review. > > > >> > > > > > >> > > I've updated the pip-3 based on your comments, they're useful to > > > >> improve > > > >> > > this proposal and make it clear. > > > >> > > 1. I added the Histogram and HistogramStatistic interfaces. > > > >> > > 2. Added histogram metrics to measure the distribution of > commit / > > > >> scan / > > > >> > > compaction duration. > > > >> > > 3. Update the descriptions for some metrics, make them more > clear. > > > >> > > > > > >> > > And any other ideas from devs? Looking forward to your feedback. > > > >> > > > > > >> > > Best, > > > >> > > Guojun > > > >> > > > > > >> > > On Mon, May 29, 2023 at 1:51 PM Guojun Li < > > [email protected]> > > > >> > wrote: > > > >> > > > > > >> > > > Thanks Caizhi for your comments. I'll think about introducing > > > >> Histogram > > > >> > > > metric type and making the metrics description more clear. > > > >> > > > > > > >> > > > Best, > > > >> > > > Guojun > > > >> > > > > > > >> > > > On Thu, May 25, 2023 at 7:13 PM Caizhi Weng < > > [email protected] > > > > > > > >> > > wrote: > > > >> > > > > > > >> > > >> I also forgot to mention that Flink metrics [1] is a good > > > >> reference. > > > >> > It > > > >> > > >> clearly describes what Gauge, Counter and Histogram should do > > and > > > >> has > > > >> > a > > > >> > > >> list of metrics with clear definition. For example they have > a > > > >> metric > > > >> > > >> named > > > >> > > >> "lastCheckpointDuration" (which is a Gauge) and not just > > > >> > > >> "checkpointDuration". > > > >> > > >> > > > >> > > >> [1] > > > >> > > >> > > > >> > > > > https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/ > > > >> > > >> > > > >> > > >> Caizhi Weng <[email protected]> 于2023年5月25日周四 19:10写道: > > > >> > > >> > > > >> > > >> > Hi Guojun! > > > >> > > >> > > > > >> > > >> > Thanks for raising this discussion. After reading the > > document, > > > >> I'd > > > >> > > like > > > >> > > >> > to raise a few opinions. > > > >> > > >> > > > > >> > > >> > > > > >> > > >> > > > > >> > > >> > > > >> > > > > > >> > > > > >> > > > > > > -------------------------------------------------------------------------------- > > > >> > > >> > > > > >> > > >> > 1. About the overall types of metrics. > > > >> > > >> > > > > >> > > >> > You suggested two types of metrics, namely Gauge and > > Counter. A > > > >> > gauge > > > >> > > >> > indicates the value of a metric at that instance, for > example > > > >> "the > > > >> > > >> duration > > > >> > > >> > of the last compaction". A counter indicates an accumulated > > > value > > > >> > of a > > > >> > > >> > metric overtime, for example "the total number of records > > > >> written". > > > >> > > >> > > > > >> > > >> > However there are times when we want to study the history > and > > > see > > > >> > the > > > >> > > >> > trend. For example if we notice that the last compaction is > > > >> slower > > > >> > > than > > > >> > > >> > normal, we might want to know "the average duration of the > > last > > > >> few > > > >> > > >> > compactions" to make sure that this slow down is just by > > chance > > > >> or > > > >> > is > > > >> > > a > > > >> > > >> > common case. Both gauge and counter cannot meet this need. > > What > > > >> we > > > >> > > need > > > >> > > >> is > > > >> > > >> > another type of metrics called Histogram. This metrics type > > > >> should > > > >> > be > > > >> > > >> able > > > >> > > >> > to record the last few values of a metric and calculate > their > > > >> > > statistics > > > >> > > >> > such as average, min/max and more. > > > >> > > >> > > > > >> > > >> > > > > >> > > >> > > > > >> > > >> > > > >> > > > > > >> > > > > >> > > > > > > -------------------------------------------------------------------------------- > > > >> > > >> > > > > >> > > >> > 2. About the types of some specific metrics. > > > >> > > >> > > > > >> > > >> > It seems to me that in your proposal, you cannot clearly > tell > > > the > > > >> > > >> > difference between Gauges and Counters, and sometimes it is > > not > > > >> > clear > > > >> > > >> that > > > >> > > >> > what a specific metric means. Let me give you some example. > > > >> > > >> > > > > >> > > >> > Metric Name: commitDuration > > > >> > > >> > > > > >> > > >> > Description: Commit > > > >> > > >> > > > > >> > > >> > Type: Gauge > > > >> > > >> > > > > >> > > >> > Update at: Timer starts before commit starting, update > commit > > > >> > duration > > > >> > > >> >> after commit finished > > > >> > > >> > > > > >> > > >> > > > > >> > > >> > At first glance, "Commit" is not a valid description. From > > the > > > >> > metric > > > >> > > >> name > > > >> > > >> > I guess you'd like to record the length of duration for > > > >> committing > > > >> > > >> > snapshots. As its type is a Gauge, I suppose you want to > > record > > > >> the > > > >> > > >> *last* > > > >> > > >> > commit. Please make things clear what exactly you want to > > > record. > > > >> > And > > > >> > > >> as I > > > >> > > >> > said in the last section, it might be better to introduce a > > > >> > Histogram > > > >> > > >> type > > > >> > > >> > so that we can study the average duration of the last few > > > >> commits. > > > >> > > >> > > > > >> > > >> > Metric Name: numTableFilesAdded > > > >> > > >> > > > > >> > > >> > Description: Number of added table files in this commit > > > >> > > >> > > > > >> > > >> > Type: Counter > > > >> > > >> > > > > >> > > >> > Update at: Collecting changes from committables > > > >> > > >> > > > > >> > > >> > > > > >> > > >> > By "this commit" I suppose you mean *last* commit. If you > the > > > >> type > > > >> > > >> should > > > >> > > >> > be a Gauge, not a Counter. By using Counter you may want to > > > >> record > > > >> > the > > > >> > > >> *total > > > >> > > >> > number* of files added during all commits. > > > >> > > >> > > > > >> > > >> > Metric Name: numFilesCompactedBefore > > > >> > > >> > > > > >> > > >> > Description: Number of deleted files in compaction > > > >> > > >> > > > > >> > > >> > Type: Counter > > > >> > > >> > > > > >> > > >> > Update at: Triggering compaction > > > >> > > >> > > > > >> > > >> > > > > >> > > >> > It is also unclear to me what you're going to record. Are > you > > > >> going > > > >> > > the > > > >> > > >> > record the number of deleted files during the *last* > > > compaction? > > > >> Or > > > >> > > >> you'd > > > >> > > >> > like to record the *total number* of deleted files during > all > > > >> > > >> compaction? > > > >> > > >> > > > > >> > > >> > I'm afraid most of your proposed metrics seem unclear to > me. > > > >> Please > > > >> > > >> > rewrite the proposal to make them clear. > > > >> > > >> > > > > >> > > >> > Guojun Li <[email protected]> 于2023年5月25日周四 16:59写道: > > > >> > > >> > > > > >> > > >> >> Hi, Paimon Devs, > > > >> > > >> >> I’d like to start a discussion about PIP-3[1]. > > > >> > > >> >> In this PIP, I'm talking about how to support metrics > > for > > > >> > > paimon, > > > >> > > >> >> what > > > >> > > >> >> metrics paimon needs. Look forward to your question and > > > >> > suggestions. > > > >> > > >> >> > > > >> > > >> >> Best, > > > >> > > >> >> Guojun > > > >> > > >> >> > > > >> > > >> >> [1] > > > >> > > >> >> > > > >> > > >> >> > > > >> > > >> > > > >> > > > > > >> > > > > >> > > > > > > https://cwiki.apache.org/confluence/display/PAIMON/PIP-3%3A+Introduce+metrics+for+Paimon > > > >> > > >> >> > > > >> > > >> > > > > >> > > >> > > > >> > > > > > > >> > > > > > >> > > > > >> > > > > > > > > > >
