On Fri, Sep 6, 2019 at 2:24 PM Valentyn Tymofieiev <valen...@google.com>
wrote:

> +Mark Liu <mark...@google.com> has added some benchmarks running across
> multiple Python versions. Specifically, we run a 1 GB wordcount job on the
> Dataflow runner on Python 2.7 and 3.5-3.7. The benchmarks do not have
> alerting configured and, to my knowledge, are not actively monitored yet.
>
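For anyone who wants to reproduce something similar, a wordcount job of that
shape is small. A minimal sketch follows; the input/output paths and pipeline
options are placeholders, not the actual benchmark configuration:

    # Minimal wordcount-style benchmark sketch; run with e.g.
    # --runner=DataflowRunner --project=... --region=... --temp_location=...
    import re

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    def run(argv=None):
        options = PipelineOptions(argv)
        with beam.Pipeline(options=options) as p:
            (p
             | 'Read' >> beam.io.ReadFromText('gs://<bucket>/input-1gb/*')   # placeholder path
             | 'Split' >> beam.FlatMap(lambda line: re.findall(r"[A-Za-z']+", line))
             | 'Count' >> beam.combiners.Count.PerElement()
             | 'Format' >> beam.Map(lambda kv: '%s: %d' % kv)
             | 'Write' >> beam.io.WriteToText('gs://<bucket>/output/counts'))  # placeholder path

    if __name__ == '__main__':
        run()

Running the same script under each interpreter (2.7, 3.5-3.7) is what gives
the cross-version comparison.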

Are there any benchmarks for streaming? Streaming and batch are quite
different runtime paths, and some issues can only be identified by watching
metrics over longer-running processes. It would be good to verify memory and
CPU utilization, etc.

I additionally discovered that our 2.16 upgrade exhibits a memory leak in
the Python worker (Py 2.7).
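In case it is useful for such checks, a small probe DoFn can export the
worker's RSS through a Beam metric, so a long-running streaming job can be
watched for leaks via the runner's metrics. This is only a sketch; the metric
name is arbitrary, and ru_maxrss is KB on Linux, bytes on macOS:

    import resource

    import apache_beam as beam
    from apache_beam.metrics import Metrics

    class MemoryProbe(beam.DoFn):
        """Passes elements through and records the process max RSS."""

        def __init__(self):
            self.max_rss = Metrics.distribution(self.__class__, 'max_rss')

        def process(self, element):
            self.max_rss.update(
                resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
            yield element

Dropping beam.ParDo(MemoryProbe()) anywhere into the pipeline surfaces the
values alongside the other job metrics, assuming the runner reports user
metrics.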


> Thomas, is it possible for you to do the bisection using SDK code from
> master at various commits to narrow down the regression on your end?
>

I don't know how soon I will get to it. It's of course possible, but
expensive, since each iteration means rebasing the fork and building and
deploying an entire stack of stuff. The pipeline itself is super simple. We
need this testbed as part of Beam. It would be nice to be able to pick up an
update with more confidence that the baseline has not slipped.
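One way to keep the manual work down, whenever someone does get to it, is to
let git bisect run drive the search with a small driver that fails a commit
when throughput drops below the 2.14 baseline. A rough sketch; the benchmark
command, result file, and numbers are placeholders for whatever setup is used:

    # bisect_check.py: exit 0 if the commit looks good, 1 if it shows the
    # regression; "git bisect run python bisect_check.py" drives the search.
    import subprocess
    import sys

    BASELINE_MSGS_PER_SEC = 1000.0   # throughput observed on 2.14 (placeholder)
    TOLERANCE = 0.9                  # flag the commit if we lose more than 10%

    def main():
        # Placeholder: build the SDK at the current commit, deploy, run the
        # pipeline, and write the measured throughput to throughput.txt.
        subprocess.check_call(['./run_benchmark.sh'])
        throughput = float(open('throughput.txt').read())
        sys.exit(0 if throughput >= BASELINE_MSGS_PER_SEC * TOLERANCE else 1)

    if __name__ == '__main__':
        main()

Started with "git bisect start <bad-commit> <good-commit>", this narrows the
regression down in log2(#commits) runs, though each run is still a full
build/deploy cycle, which remains the expensive part.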


>
> [1]
> https://apache-beam-testing.appspot.com/explore?dashboard=5691127080419328
> [2] https://drive.google.com/file/d/1ERlnN8bA2fKCUPBHTnid1l__81qpQe2W/view
> [3]
> https://github.com/apache/beam/commit/2d5e493abf39ee6fc89831bb0b7ec9fee592b9c5
>
>
>
> On Fri, Sep 6, 2019 at 8:38 AM Ahmet Altay <al...@google.com> wrote:
>
>> +Valentyn Tymofieiev <valen...@google.com> do we have benchmarks on
>> different Python versions? Was there a recent change that is specific to
>> Python 3.x?
>>
>> On Fri, Sep 6, 2019 at 8:36 AM Thomas Weise <t...@apache.org> wrote:
>>
>>> The issue is only visible with Python 3.6, not 2.7.
>>>
>>> If there is a framework in place to add a streaming test, that would be
>>> great. We would use what we have internally as a starting point.
>>>
>>> On Thu, Sep 5, 2019 at 5:00 PM Ahmet Altay <al...@google.com> wrote:
>>>
>>>>
>>>>
>>>> On Thu, Sep 5, 2019 at 4:15 PM Thomas Weise <t...@apache.org> wrote:
>>>>
>>>>> The workload is quite different. What I have is streaming with state
>>>>> and timers.
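(For context, that kind of workload is typically a keyed stateful DoFn that
buffers values and flushes them on a timer. The toy sketch below is only a
generic stand-in under that assumption, not the actual in-house pipeline:)

    import apache_beam as beam
    from apache_beam.coders import VarIntCoder
    from apache_beam.transforms.timeutil import TimeDomain
    from apache_beam.transforms.userstate import (
        BagStateSpec, TimerSpec, on_timer)

    class BufferAndFlush(beam.DoFn):
        """Buffers values per key and emits their sum when the timer fires."""

        BUFFER = BagStateSpec('buffer', VarIntCoder())
        FLUSH = TimerSpec('flush', TimeDomain.WATERMARK)

        def process(self, element,
                    timestamp=beam.DoFn.TimestampParam,
                    buffer=beam.DoFn.StateParam(BUFFER),
                    flush=beam.DoFn.TimerParam(FLUSH)):
            key, value = element          # stateful DoFns require keyed input
            buffer.add(value)
            flush.set(timestamp + 10)     # flush 10s past the element timestamp

        @on_timer(FLUSH)
        def on_flush(self, buffer=beam.DoFn.StateParam(BUFFER)):
            yield sum(buffer.read())
            buffer.clear()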
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Sep 5, 2019 at 3:47 PM Pablo Estrada <pabl...@google.com>
>>>>> wrote:
>>>>>
>>>>>> We only recently started running the Chicago Taxi Example. +MichaƂ
>>>>>> Walenia <michal.wale...@polidea.com> I don't see it in the
>>>>>> dashboards. Do you know if it's possible to see any trends in the data?
>>>>>>
>>>>>> We have a few tests running now:
>>>>>> - Combine tests:
>>>>>> https://apache-beam-testing.appspot.com/explore?dashboard=5763764733345792&widget=201943890&container=1334074373
>>>>>> - GBK tests:
>>>>>> https://apache-beam-testing.appspot.com/explore?dashboard=5763764733345792&widget=201943890&container=1334074373
>>>>>>
>>>>>> They don't seem to show a very drastic jump either, but they haven't
>>>>>> been running for very long.
>>>>>>
>>>>>> There is also ongoing work by Kasia and Kamil (added) to add alerting
>>>>>> for this sort of regression. That work is not there yet (it's still in
>>>>>> progress).
>>>>>> Best
>>>>>> -P.
>>>>>>
>>>>>> On Thu, Sep 5, 2019 at 3:35 PM Thomas Weise <t...@apache.org> wrote:
>>>>>>
>>>>>>> It probably won't be practical to do a bisect due to the high cost
>>>>>>> of each iteration with our fork/deploy setup.
>>>>>>>
>>>>>>> Perhaps it is time to set up something with the synthetic source that
>>>>>>> works with just Beam as a dependency.
>>>>>>>
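(A core-only sketch of that idea: generate random key/value pairs and push
them through a GroupByKey, with nothing but Beam as a dependency. The counts
and sizes below are arbitrary placeholders, and apache_beam.testing.
synthetic_pipeline may already cover much of this.)

    import os
    import random

    import apache_beam as beam

    NUM_RECORDS = 1000000      # placeholder load size
    NUM_KEYS = 1000            # placeholder key space
    VALUE_SIZE_BYTES = 1000    # placeholder value size

    def make_record(i):
        # One random key/value pair per input index.
        yield (random.randrange(NUM_KEYS), os.urandom(VALUE_SIZE_BYTES))

    with beam.Pipeline() as p:
        (p
         | beam.Create(range(NUM_RECORDS))
         | beam.FlatMap(make_record)
         | beam.GroupByKey()
         | beam.Map(lambda kv: (kv[0], sum(len(v) for v in kv[1]))))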
>>>>>>
>>>> I agree with this.
>>>>
>>>> Pablo, Kasia, Kamil, do the new benchmarks give us an easy-to-use
>>>> framework for using the synthetic source in benchmarks?
>>>>
>>>>
>>>>>
>>>>>>> On Thu, Sep 5, 2019 at 3:23 PM Ahmet Altay <al...@google.com> wrote:
>>>>>>>
>>>>>>>> There are a few in this dashboard [1], but they are not very useful in
>>>>>>>> this case because they do not go back more than a month and are not
>>>>>>>> very comprehensive. I do not see a jump there. Thomas, would it be
>>>>>>>> possible to bisect to find which commit caused the regression?
>>>>>>>>
>>>>>>>> +Pablo Estrada <pabl...@google.com> do we have any Python-on-Flink
>>>>>>>> benchmarks for the Chicago example?
>>>>>>>> +Alan Myrvold <amyrv...@google.com> +Yifan Zou
>>>>>>>> <yifan...@google.com> It would be good to have alerting on
>>>>>>>> benchmarks. Do we have such an ability today?
>>>>>>>>
>>>>>>>> [1] https://apache-beam-testing.appspot.com/dashboard-admin
>>>>>>>>
>>>>>>>> On Thu, Sep 5, 2019 at 3:15 PM Thomas Weise <t...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Are there any performance tests run for the Python SDK as part of
>>>>>>>>> release verification (or otherwise as well)?
>>>>>>>>>
>>>>>>>>> I see what appears to be a regression in master (compared to 2.14)
>>>>>>>>> with our in-house application (~25% jump in CPU utilization and a
>>>>>>>>> corresponding drop in throughput).
>>>>>>>>>
>>>>>>>>> I wanted to see if there is anything available to verify that
>>>>>>>>> within Beam.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Thomas
>>>>>>>>>
>>>>>>>>>
