Re: ParDo Execution Time stat is always 0

Alex Amato Mon, 15 Jul 2019 17:05:24 -0700

Perhaps no metric at all should be returned, instead of 0, which is an
incorrect value.


Also, is there a reason to have state_sampler_slow at all then, if it's not
intended to be implemented?
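
To make the first suggestion concrete, here is a minimal sketch (not Beam's
actual code) of reporting no value, rather than 0, when the sampler cannot
measure time. The names NoOpSampler, RecordingSampler and
collect_timing_metrics are hypothetical and exist only for illustration:

```python
from typing import Dict, Iterable, Optional


class NoOpSampler:
    """Hypothetical stand-in for a sampler that cannot measure time."""

    def msecs(self, state_name: str) -> Optional[int]:
        # None means "not measured", which is different from "0 ms".
        return None


class RecordingSampler:
    """Hypothetical sampler that accumulates measured milliseconds."""

    def __init__(self) -> None:
        self._msecs: Dict[str, int] = {}

    def record(self, state_name: str, elapsed_msecs: int) -> None:
        # Add the newly measured time to the running total for this state.
        self._msecs[state_name] = self._msecs.get(state_name, 0) + elapsed_msecs

    def msecs(self, state_name: str) -> Optional[int]:
        # A measuring sampler may legitimately report 0 for an idle state.
        return self._msecs.get(state_name, 0)


def collect_timing_metrics(sampler, state_names: Iterable[str]) -> Dict[str, int]:
    """Return only the timings that were actually measured."""
    metrics: Dict[str, int] = {}
    for name in state_names:
        value = sampler.msecs(name)
        if value is not None:  # skip unmeasured states instead of emitting 0
            metrics[name] = value
    return metrics
```

With this shape, a runner would see the metric as absent, rather than as a
misleading 0, whenever the non-measuring sampler is in use.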

On Mon, Jul 15, 2019 at 5:03 PM Kyle Weaver <kcwea...@google.com> wrote:

> Pablo, what about setting a lower sampling rate? Or would that lead to
> poor results?
>
> Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com
> | +16502035555
>
>
> On Mon, Jul 15, 2019 at 4:44 PM Pablo Estrada <pabl...@google.com> wrote:
>
>> @Thomas do you think this is a problem of documentation, or a missing
>> feature?
>>
>> We did not add support for it without Cython because the cost of locking
>> and checking every 200ms in Python would be too high. That's why this is
>> only implemented in the optimized Cython codepath. I think it makes sense
>> to document this, rather than adding the support, as it would be really
>> expensive. What are your thoughts?
>>
>> Best
>> -P.
>>
>> On Mon, Jul 15, 2019, 1:48 PM Thomas Weise <t...@apache.org> wrote:
>>
>>> That's great, but I think the JIRA needs to remain open since w/o Cython
>>> the metric still doesn't work.
>>>
>>> It would however be helpful to add a comment regarding your findings.
>>>
>>>
>>> On Mon, Jul 15, 2019 at 1:46 PM Rakesh Kumar <rakeshku...@lyft.com>
>>> wrote:
>>>
>>>>
>>>> Installing Cython in the application environment fixed the issue. Now I
>>>> am able to see the operator metrics
>>>> ({organization_specific_prefix}.operator.beam-metric-pardo_execution_time-process_bundle_msecs-v1.gauge.mean).
>>>>
>>>> Thanks Ankur for looking into it and providing support.
>>>>
>>>> I am going to close https://issues.apache.org/jira/browse/BEAM-7058 if
>>>> no one has any objections.
>>>>
>>>>
>>>> On Thu, Apr 11, 2019 at 7:13 AM Thomas Weise <t...@apache.org> wrote:
>>>>
>>>>> Tracked as https://issues.apache.org/jira/browse/BEAM-7058
>>>>>
>>>>>
>>>>> On Wed, Apr 10, 2019 at 11:38 AM Pablo Estrada <pabl...@google.com>
>>>>> wrote:
>>>>>
>>>>>> This sounds like a bug then? +Alex Amato <ajam...@google.com>
>>>>>>
>>>>>> On Wed, Apr 10, 2019 at 3:59 AM Maximilian Michels <m...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi @all,
>>>>>>>
>>>>>>> From a quick debugging session, I conclude that the wiring is in place
>>>>>>> for the Flink Runner. There is a ProgressReporter that reports
>>>>>>> MonitoringInfos to Flink, in a similar fashion as the "legacy" Runner.
>>>>>>>
>>>>>>> The bundle duration metrics are 0, but the element count gets reported
>>>>>>> correctly. It appears to be an issue of the Python/Java harness because
>>>>>>> "ProcessBundleProgressResponse" contains only 0 values for the bundle
>>>>>>> duration.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Max
>>>>>>>
>>>>>>> On 04.04.19 19:54, Mikhail Gryzykhin wrote:
>>>>>>> > Hi everyone,
>>>>>>> >
>>>>>>> > Quick summary on Python and Dataflow Runner:
>>>>>>> > Python SDK already reports:
>>>>>>> > - MSec
>>>>>>> > - User metrics (int64 and distribution)
>>>>>>> > - PCollection Element Count
>>>>>>> > - Work on MeanByteCount for pcollection is ongoing here
>>>>>>> > <https://github.com/apache/beam/pull/8062>.
>>>>>>> >
>>>>>>> > Dataflow Runner:
>>>>>>> > - all metrics listed above are passed through to Dataflow.
>>>>>>> >
>>>>>>> > Ryan can give more information on Flink Runner. I also see Maximilian
>>>>>>> > on some of the relevant PRs, so he might comment on this as well.
>>>>>>> >
>>>>>>> > Regards,
>>>>>>> > Mikhail.
>>>>>>> >
>>>>>>> >
>>>>>>> > On Thu, Apr 4, 2019 at 10:43 AM Pablo Estrada <pabl...@google.com
>>>>>>> > <mailto:pabl...@google.com>> wrote:
>>>>>>> >
>>>>>>> >     Hello guys!
>>>>>>> >     Alex, Mikhail and Ryan are working on support for metrics in the
>>>>>>> >     portability framework. The support on the SDK is pretty advanced
>>>>>>> >     AFAIK*, and the next step is to get the metrics back into the
>>>>>>> >     runner. Lukasz and I are working on a project that depends on
>>>>>>> >     this too, so I'm adding everyone so we can get an idea of what's
>>>>>>> >     missing.
>>>>>>> >
>>>>>>> >     I believe:
>>>>>>> >     - User metrics are fully wired up in the SDK
>>>>>>> >     - State sampler (timing) metrics are wired up as well (is that
>>>>>>> >     right, +Alex Amato <mailto:ajam...@google.com>?)
>>>>>>> >     - Work is ongoing to send the updates back to Flink.
>>>>>>> >     - What is the plan for making metrics queryable from Flink? +Ryan
>>>>>>> >     Williams <mailto:r...@runsascoded.com>
>>>>>>> >
>>>>>>> >     Thanks!
>>>>>>> >     -P.
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> >     On Wed, Apr 3, 2019 at 12:02 PM Thomas Weise <t...@apache.org
>>>>>>> >     <mailto:t...@apache.org>> wrote:
>>>>>>> >
>>>>>>> >         I believe this is where the metrics are supplied:
>>>>>>> >         https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/worker/operations.py
>>>>>>> >
>>>>>>> >         git grep process_bundle_msecs yields results for the Dataflow
>>>>>>> >         worker only.
>>>>>>> >
>>>>>>> >         There isn't any test coverage for the Flink runner:
>>>>>>> >         https://github.com/apache/beam/blob/d38645ae8758d834c3e819b715a66dd82c78f6d4/sdks/python/apache_beam/runners/portability/flink_runner_test.py#L181
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> >         On Wed, Apr 3, 2019 at 10:45 AM Akshay Balwally
>>>>>>> >         <abalwa...@lyft.com <mailto:abalwa...@lyft.com>> wrote:
>>>>>>> >
>>>>>>> >             Should have added: I'm using the Python SDK with the Flink runner.
>>>>>>> >
>>>>>>> >             On Wed, Apr 3, 2019 at 10:32 AM Akshay Balwally
>>>>>>> >             <abalwa...@lyft.com <mailto:abalwa...@lyft.com>> wrote:
>>>>>>> >
>>>>>>> >                 Hi,
>>>>>>> >                 I'm hoping to get metrics on the amount of time spent on
>>>>>>> >                 each operator, so it seems like the stat
>>>>>>> >
>>>>>>> >                 {organization_specific_prefix}.operator.beam-metric-pardo_execution_time-process_bundle_msecs-v1.gauge.mean
>>>>>>> >
>>>>>>> >                 would be pretty helpful. But in practice, this stat
>>>>>>> >                 always shows 0, which I interpret as 0 milliseconds
>>>>>>> >                 spent per bundle, which can't be correct (other stats
>>>>>>> >                 show that the operators are running, and timers within
>>>>>>> >                 the operators show more reasonable times). Is this a
>>>>>>> >                 known bug?
>>>>>>> >
>>>>>>> >
>>>>>>> >                 --
>>>>>>> >                 *Akshay Balwally*
>>>>>>> >                 Software Engineer
>>>>>>> >                 937.271.6469 <tel:+19372716469>
>>>>>>> >                 Lyft <http://www.lyft.com/>
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> >             --
>>>>>>> >             *Akshay Balwally*
>>>>>>> >             Software Engineer
>>>>>>> >             937.271.6469 <tel:+19372716469>
>>>>>>> >             Lyft <http://www.lyft.com/>
>>>>>>> >
>>>>>>>
>>>>>>
