hi!

(Ben just sent his mail and covered some topics similar to mine, but I'll
keep my comments intact since they are slightly different.)

* I think there are a lot of metrics that should be exposed for all
transforms - everything from JB's list (like number of splits, throughput,
reading/writing rate, etc.) also applies to SplittableDoFns.
* I also think there are data-source-specific metrics that a given IO will
want to expose (e.g., Kafka backlog for a topic). No one on this thread has
specifically addressed this, but Beam Sources & Sinks do not presently have
the ability to report metrics even if a given IO writer wanted to. Depending
on the timeline for SplittableDoFn and the move to that infrastructure, I
don't think we need that support in Sources/Sinks, but I do think we should
make sure SplittableDoFn has the necessary support.
* I think there are ways to do many metrics such that they are not too
expensive to calculate all the time (e.g., reporting per bundle rather than
per item). I think we should ask whether we want/need metrics that are
expensive to calculate before going to the effort of adding enable/disable.
* I disagree with Ben about showing the amount of splitting - I think,
especially with IOs, it's useful for understanding/diagnosing reading
problems, since splitting is one potential source of them, especially given
that the user can now write transforms that split in SplittableDoFn. But I
look forward to discussing that further.
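
To make the per-bundle point above a bit more concrete, here is a rough
sketch in plain Java. It deliberately does not use the Beam Metrics API:
SketchCounter and PerBundleReporter are hypothetical names I made up for
illustration, and the startBundle/processElement/finishBundle methods only
mimic the shape of a DoFn lifecycle. The idea is just that the per-element
path touches a local field, and the shared counter is updated once per
bundle.

```java
import java.util.Arrays;
import java.util.List;

class PerBundleMetricsSketch {

    /** Hypothetical stand-in for a runner-backed counter (NOT the Beam Metrics API). */
    static final class SketchCounter {
        private long value;
        private long updates; // how many times the counter itself was touched

        void inc(long n) {
            value += n;
            updates++;
        }

        long value() { return value; }

        long updates() { return updates; }
    }

    /** Hypothetical reader that tallies locally and reports once per bundle. */
    static final class PerBundleReporter {
        private final SketchCounter recordsRead;
        private long pending; // cheap local tally for the current bundle

        PerBundleReporter(SketchCounter recordsRead) {
            this.recordsRead = recordsRead;
        }

        void startBundle() {
            pending = 0;
        }

        void processElement(String record) {
            pending++; // per-item cost is a single local increment
        }

        void finishBundle() {
            recordsRead.inc(pending); // one counter update per bundle
            pending = 0;
        }
    }

    public static void main(String[] args) {
        SketchCounter counter = new SketchCounter();
        PerBundleReporter reporter = new PerBundleReporter(counter);

        // Two made-up bundles of 3 and 2 records.
        List<List<String>> bundles = Arrays.asList(
                Arrays.asList("a", "b", "c"),
                Arrays.asList("d", "e"));

        for (List<String> bundle : bundles) {
            reporter.startBundle();
            for (String record : bundle) {
                reporter.processElement(record);
            }
            reporter.finishBundle();
        }

        // Prints: records=5, updates=2
        System.out.println("records=" + counter.value() + ", updates=" + counter.updates());
    }
}
```

The trade-off is that the reported value lags by up to one bundle, which
seems acceptable for throughput-style metrics but is an assumption on my
part.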

+1 on talking about specific examples

S

On Tue, Feb 14, 2017 at 8:29 AM Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

> Hi Aviem
>
> Agree with your comments, it's pretty close to my previous ones.
>
> Regards
> JB
>
> On Feb 14, 2017, 12:04, at 12:04, Aviem Zur <aviem...@gmail.com> wrote:
> >Hi Ismaël,
> >
> >You've raised some great points.
> >Please see my comments inline.
> >
> >On Tue, Feb 14, 2017 at 3:37 PM Ismaël Mejía <ieme...@gmail.com> wrote:
> >
> >> Hello,
> >>
> >> The new metrics API allows us to integrate some basic metrics into
> >the Beam
> >> IOs. I have been following some discussions about this on JIRAs/PRs,
> >and I
> >> think it is important to discuss the subject here so we can have more
> >> awareness and obtain ideas from the community.
> >>
> >> First I want to thank Ben for his work on the metrics API, and Aviem
> >for
> >> his ongoing work on metrics for IOs (e.g. KafkaIO) that made me aware
> >of
> >> this subject.
> >>
> >> There are some basic ideas to discuss e.g.
> >>
> >> - What are the responsibilities of Beam IOs in terms of Metrics
> >> (considering the fact that the actual IOs, server + client, usually
> >provide
> >> their own)?
> >>
> >
> >While it is true that many IOs provide their own metrics, I think that
> >Beam
> >should expose IO metrics because:
> >
> >1. Metrics which help understand the performance of a pipeline which
> >   uses an IO may not be covered by the IO.
> >2. Users may not be able to set up integrations with the IO's metrics to
> >   view them effectively (and correlate them to a specific Beam pipeline),
> >   but still want to investigate their pipeline's performance.
> >
> >
> >> - What metrics are relevant to the pipeline (or some particular IOs)?
> >> Kafka backlog, for one, could indicate that a pipeline is behind the
> >> ingestion rate.
> >
> >
> >I think it depends on the IO, but there is probably overlap in some of
> >the
> >metrics so a guideline might be written for this.
> >I listed what I thought should be reported for KafkaIO in the following
> >JIRA: https://issues.apache.org/jira/browse/BEAM-1398
> >Feel free to add more metrics you think are important to report.
> >
> >
> >>
> >>
> >> - Should metrics be calculated on IOs by default or not?
> >> - If metrics are defined by default does it make sense to allow users
> >to
> >> disable them?
> >>
> >
> >IIUC, your concern is that metrics will add overhead to the pipeline,
> >and
> >pipelines which are highly sensitive to this will be hampered?
> >In any case I think that yes, metrics calculation should be
> >configurable
> >(Enable/disable).
> >In the Spark runner, for example, the metrics sink feature (not the
> >metrics calculation itself, but the sinks to send them to) is
> >configurable in the pipeline options.
> >
> >
> >> Well these are just some questions around the subject so we can
> >create a
> >> common set of practices to include metrics in the IOs and eventually
> >> improve the transform guide with this. What do you think about this?
> >Do you
> >> have other questions/ideas?
> >>
> >> Thanks,
> >> Ismaël
> >>
>
