Perhaps no metric at all should be returned, instead of 0, which is an incorrect value.
Also, is there a reason to have state_sampler_slow at all then, if its not intended to be implemented? On Mon, Jul 15, 2019 at 5:03 PM Kyle Weaver <kcwea...@google.com> wrote: > Pablo, what about setting a lower sampling rate? Or would that lead to > poor results? > > Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com > | +16502035555 > > > On Mon, Jul 15, 2019 at 4:44 PM Pablo Estrada <pabl...@google.com> wrote: > >> @Thomas do you think this is a problem of documentation, or a missing >> feature? >> >> We did not add support for it without cython because the cost of locking >> and checking every 200ms in Python would be too high - that's why this is >> only implemented in the optimized Cython codepath. I think it makes sense >> to document this, rather than adding the support, as it would be really >> expensive. What are your thoughts? >> >> Best >> -P. >> >> On Mon, Jul 15, 2019, 1:48 PM Thomas Weise <t...@apache.org> wrote: >> >>> That's great, but I think the JIRA needs to remain open since w/o Cython >>> the metric still doesn't work. >>> >>> It would however be helpful to add a comment regarding your findings. >>> >>> >>> On Mon, Jul 15, 2019 at 1:46 PM Rakesh Kumar <rakeshku...@lyft.com> >>> wrote: >>> >>>> >>>> Installing cython in the application environment fixed the issue. Now I >>>> am able to see the operator metrics ({organization_specific_prefix} >>>> .operator.beam-metric-pardo_execution_time-process_bundle_ >>>> msecs-v1.gauge.mean) >>>> >>>> Thanks Ankur for looking into it and providing support. >>>> >>>> I am going to close https://issues.apache.org/jira/browse/BEAM-7058 if >>>> no one has any objection? >>>> >>>> >>>> On Thu, Apr 11, 2019 at 7:13 AM Thomas Weise <t...@apache.org> wrote: >>>> >>>>> Tracked as https://issues.apache.org/jira/browse/BEAM-7058 >>>>> >>>>> >>>>> On Wed, Apr 10, 2019 at 11:38 AM Pablo Estrada <pabl...@google.com> >>>>> wrote: >>>>> >>>>>> This sounds like a bug then? +Alex Amato <ajam...@google.com> >>>>>> >>>>>> On Wed, Apr 10, 2019 at 3:59 AM Maximilian Michels <m...@apache.org> >>>>>> wrote: >>>>>> >>>>>>> Hi @all, >>>>>>> >>>>>>> From a quick debugging session, I conclude that the wiring is in >>>>>>> place >>>>>>> for the Flink Runner. There is a ProgressReporter that reports >>>>>>> MonitoringInfos to Flink, in a similar fashion as the "legacy" >>>>>>> Runner. >>>>>>> >>>>>>> The bundle duration metrics are 0, but the element count gets >>>>>>> reported >>>>>>> correctly. It appears to be an issue of the Python/Java harness >>>>>>> because >>>>>>> "ProcessBundleProgressResponse" contains only 0 values for the >>>>>>> bundle >>>>>>> duration. >>>>>>> >>>>>>> Thanks, >>>>>>> Max >>>>>>> >>>>>>> On 04.04.19 19:54, Mikhail Gryzykhin wrote: >>>>>>> > Hi everyone, >>>>>>> > >>>>>>> > Quick summary on python and Dataflow Runner: >>>>>>> > Python SDK already reports: >>>>>>> > - MSec >>>>>>> > - User metrics (int64 and distribution) >>>>>>> > - PCollection Element Count >>>>>>> > - Work on MeanByteCount for pcollection is ongoing here >>>>>>> > <https://github.com/apache/beam/pull/8062>. >>>>>>> > >>>>>>> > Dataflow Runner: >>>>>>> > - all metrics listed above are passed through to Dataflow. >>>>>>> > >>>>>>> > Ryan can give more information on Flink Runner. I also see >>>>>>> Maximilian on >>>>>>> > some of relevant PRs, so he might comment on this as well. >>>>>>> > >>>>>>> > Regards, >>>>>>> > Mikhail. >>>>>>> > >>>>>>> > >>>>>>> > On Thu, Apr 4, 2019 at 10:43 AM Pablo Estrada <pabl...@google.com >>>>>>> > <mailto:pabl...@google.com>> wrote: >>>>>>> > >>>>>>> > Hello guys! >>>>>>> > Alex, Mikhail and Ryan are working on support for metrics in >>>>>>> the >>>>>>> > portability framework. The support on the SDK is pretty >>>>>>> advanced >>>>>>> > AFAIK*, and the next step is to get the metrics back into the >>>>>>> > runner. Lukazs and myself are working on a project that >>>>>>> depends on >>>>>>> > this too, so I'm adding everyone so we can get an idea of >>>>>>> what's >>>>>>> > missing. >>>>>>> > >>>>>>> > I believe: >>>>>>> > - User metrics are fully wired up in the SDK >>>>>>> > - State sampler (timing) metrics are wired up as well (is that >>>>>>> > right, +Alex Amato <mailto:ajam...@google.com>?) >>>>>>> > - Work is ongoing to send the updates back to Flink. >>>>>>> > - What is the plan for making metrics queriable from Flink? >>>>>>> +Ryan >>>>>>> > Williams <mailto:r...@runsascoded.com> >>>>>>> > >>>>>>> > Thanks! >>>>>>> > -P. >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > On Wed, Apr 3, 2019 at 12:02 PM Thomas Weise <t...@apache.org >>>>>>> > <mailto:t...@apache.org>> wrote: >>>>>>> > >>>>>>> > I believe this is where the metrics are supplied: >>>>>>> > >>>>>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/worker/operations.py >>>>>>> > >>>>>>> > git grep process_bundle_msecs yields results for dataflow >>>>>>> > worker only >>>>>>> > >>>>>>> > There isn't any test coverage for the Flink runner: >>>>>>> > >>>>>>> > >>>>>>> https://github.com/apache/beam/blob/d38645ae8758d834c3e819b715a66dd82c78f6d4/sdks/python/apache_beam/runners/portability/flink_runner_test.py#L181 >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > On Wed, Apr 3, 2019 at 10:45 AM Akshay Balwally >>>>>>> > <abalwa...@lyft.com <mailto:abalwa...@lyft.com>> wrote: >>>>>>> > >>>>>>> > Should have added- I'm using Python sdk, Flink runner >>>>>>> > >>>>>>> > On Wed, Apr 3, 2019 at 10:32 AM Akshay Balwally >>>>>>> > <abalwa...@lyft.com <mailto:abalwa...@lyft.com>> >>>>>>> wrote: >>>>>>> > >>>>>>> > Hi, >>>>>>> > I'm hoping to get metrics on the amount of time >>>>>>> spent on >>>>>>> > each operator, so it seams like the stat >>>>>>> > >>>>>>> > >>>>>>> >>>>>>> {organization_specific_prefix}.operator.beam-metric-pardo_execution_time-process_bundle_msecs-v1.gauge.mean >>>>>>> > >>>>>>> > would be pretty helpful. But in practice, this stat >>>>>>> > always shows 0, which I interpret as 0 milliseconds >>>>>>> > spent per bundle, which can't be correct (other >>>>>>> stats >>>>>>> > show that the operators are running, and timers >>>>>>> within >>>>>>> > the operators show more reasonable times). Is this >>>>>>> a >>>>>>> > known bug? >>>>>>> > >>>>>>> > >>>>>>> > -- >>>>>>> > *Akshay Balwally* >>>>>>> > Software Engineer >>>>>>> > 937.271.6469 <tel:+19372716469> >>>>>>> > Lyft <http://www.lyft.com/> >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > -- >>>>>>> > *Akshay Balwally* >>>>>>> > Software Engineer >>>>>>> > 937.271.6469 <tel:+19372716469> >>>>>>> > Lyft <http://www.lyft.com/> >>>>>>> > >>>>>>> >>>>>>