Thanks for writing this up. I left some comments in the doc, but at a
high level I am in favor of the "more deeply overhaul SDKs'
metrics/querying structures to use MonitoringInfos / URNs" option, at
least over the Jobs API, for consistency and completeness. The SDK can
provide whatever convenience wrappers over these it wants.

On Tue, Jan 29, 2019 at 6:30 PM Ryan Williams <r...@runsascoded.com> wrote:
>
> Alex and I have PRs out related to supporting metrics in portable-runner 
> code-paths:
>
> #7624 associates metrics in the SDK harness with the (pre-fusion) PTransforms 
> the user defined them in.
> #7641 sends metrics over the "Job API" (between job server and portable 
> runner):
>
> Flink portable-VR metrics tests pass (Java)
> metrics print()s work in portable wordcount (Python)
>
> Open Questions:
>
> What to do with type-specific protos (e.g. IntDistributionData vs. 
> DoubleDistributionData)?
>
> I think Alex and I were leaning toward only supporting the "int"-cases for now
> That's what Java does in its existing metrics
>
> "MetricKey" and "MetricName" semantics:
>
> These exist in Java and Python, and I added proto versions in #7641.
> MetricName wraps "namespace" and "name" strings, and MetricKey wraps a "step 
> (ptransform) name" and a MetricName.
> PCollection-scoped metrics (e.g. element count) are identified by a null 
> "step name" in #7624 and #7641.
> Alex and I discussed using URNs as the source of this information instead:
>
> "step name" can instead come from a MonitoringInfo's PTRANSFORM label, while 
> "namespace" and "name" can be parsed from its URN.
> URNs could encode these over the wire, then SDKs could convert to existing 
> MetricKey/MetricNames for use in querying / MetricResults
> or: we could more deeply overhaul SDKs' metrics/querying structures to use 
> MonitoringInfos / URNs.
>
> at the least, SDKs should get helpers for querying for Alex's new "system 
> metrics" (e.g. element count, various timings) that are associated with 
> specific URNs
>
> Gauges: the protos have a nod to sending gauges over the wire as counters
>
> are there problems with that?
> #7641 should support this, for now.
>
> ExtremaData: the protos contain these, but SDKs don't support them (afaik).
>
> Alex likely has more to add, and we plan to make a doc about these changes, 
> but I wanted to post here first in case others have thoughts or we are 
> overlooking anything.
>
> Thanks!

Reply via email to