Thanks for writing this up. I left some comments in the doc, but at a high level I am in favor of the "more deeply overhaul SDKs' metrics/querying structures to use MonitoringInfos / URNs" option, at least over the Jobs API, for consistency and completeness. The SDK can provide whatever convenience wrappers over these it wants.
On Tue, Jan 29, 2019 at 6:30 PM Ryan Williams <r...@runsascoded.com> wrote: > > Alex and I have PRs out related to supporting metrics in portable-runner > code-paths: > > #7624 associates metrics in the SDK harness with the (pre-fusion) PTransforms > the user defined them in. > #7641 sends metrics over the "Job API" (between job server and portable > runner): > > Flink portable-VR metrics tests pass (Java) > metrics print()s work in portable wordcount (Python) > > Open Questions: > > What to do with type-specific protos (e.g. IntDistributionData vs. > DoubleDistributionData)? > > I think Alex and I were leaning toward only supporting the "int"-cases for now > That's what Java does in its existing metrics > > "MetricKey" and "MetricName" semantics: > > These exist in Java and Python, and I added proto versions in #7641. > MetricName wraps "namespace" and "name" strings, and MetricKey wraps a "step > (ptransform) name" and a MetricName. > PCollection-scoped metrics (e.g. element count) are identified by a null > "step name" in #7624 and #7641. > Alex and I discussed using URNs as the source of this information instead: > > "step name" can instead come from a MonitoringInfo's PTRANSFORM label, while > "namespace" and "name" can be parsed from its URN. > URNs could encode these over the wire, then SDKs could convert to existing > MetricKey/MetricNames for use in querying / MetricResults > or: we could more deeply overhaul SDKs' metrics/querying structures to use > MonitoringInfos / URNs. > > at the least, SDKs should get helpers for querying for Alex's new "system > metrics" (e.g. element count, various timings) that are associated with > specific URNs > > Gauges: the protos have a nod to sending gauges over the wire as counters > > are there problems with that? > #7641 should support this, for now. > > ExtremaData: the protos contain these, but SDKs don't support them (afaik). > > Alex likely has more to add, and we plan to make a doc about these changes, > but I wanted to post here first in case others have thoughts or we are > overlooking anything. > > Thanks!