+Mark Liu <mark...@google.com> has added some benchmarks running across
multiple Python versions. Specifically, we run a 1 GB wordcount job on
the Dataflow runner on Python 2.7 and 3.5-3.7. The benchmarks do not have
alerting configured and, to my knowledge, are not actively monitored yet.

The zoom buttons on the dashboard [1] seem to be malfunctioning, as it is
not readily possible to extend the date range. However, the data is
available in BigQuery, and by adjusting the SQL query we can see a
regression in benchmark performance on July 4 [2].

I looked at merge commits on July 4 and only saw changes to the load test
infrastructure [3]. AFAIK that change affects a different set of
performance tests than the 1 GB wordcount benchmark, but I may be wrong;
Lukasz, Kamil, or Mark can correct me. Either way, it is not clear why
only the py35-py37 benchmarks were affected, but not py27. It is also
possible that a new version of some Beam dependency was released that
affected benchmark performance.
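
One way to check the dependency hypothesis would be to diff `pip freeze` snapshots of the benchmark environment from before and after July 4. A minimal sketch; the package names and version numbers below are made up for illustration:

```python
def parse_freeze(text):
    """Parse `pip freeze` output ("name==version" lines) into a dict."""
    pairs = (line.split("==", 1) for line in text.splitlines() if "==" in line)
    return {name: version for name, version in pairs}

def changed_packages(before, after):
    """Packages pinned in both snapshots whose version differs."""
    b, a = parse_freeze(before), parse_freeze(after)
    return {n: (b[n], a[n]) for n in b.keys() & a.keys() if b[n] != a[n]}

# Hypothetical snapshots from the two sides of the regression.
before = "apache-beam==2.15.0\nprotobuf==3.8.0\ndill==0.2.9"
after = "apache-beam==2.15.0\nprotobuf==3.9.1\ndill==0.2.9"
print(changed_packages(before, after))  # -> {'protobuf': ('3.8.0', '3.9.1')}
```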

Was there a recent change that is specific to python 3.x ?


We had some changes related to type inference that are specific to the
Python version, for example https://github.com/apache/beam/pull/8893.

Thomas, would it be possible for you to bisect using SDK code from
master at various commits to narrow down the regression on your end?
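
Since each iteration is expensive with the fork/deploy setup, it may help that the bisection only needs O(log n) benchmark runs: it is just a binary search over the commit range. A sketch, where `is_regressed` stands in for the actual deploy-and-measure step and the commit ids are hypothetical:

```python
def bisect_first_bad(commits, is_regressed):
    """Binary-search `commits` (oldest to newest) for the first commit at
    which `is_regressed(commit)` becomes True. Assumes a single clean
    transition from good to bad, as with a genuine regression."""
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_regressed(commits[mid]):
            hi = mid       # regression is at mid or earlier
        else:
            lo = mid + 1   # regression is after mid
    return commits[lo]

# 16 candidate commits, regression introduced at c11: found in 4 runs.
commits = [f"c{i}" for i in range(16)]
first_bad = bisect_first_bad(commits, lambda c: int(c[1:]) >= 11)
print(first_bad)  # -> c11
```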

[1]
https://apache-beam-testing.appspot.com/explore?dashboard=5691127080419328
[2] https://drive.google.com/file/d/1ERlnN8bA2fKCUPBHTnid1l__81qpQe2W/view
[3]
https://github.com/apache/beam/commit/2d5e493abf39ee6fc89831bb0b7ec9fee592b9c5



On Fri, Sep 6, 2019 at 8:38 AM Ahmet Altay <al...@google.com> wrote:

> +Valentyn Tymofieiev <valen...@google.com> do we have benchmarks in
> different python versions? Was there a recent change that is specific to
> python 3.x ?
>
> On Fri, Sep 6, 2019 at 8:36 AM Thomas Weise <t...@apache.org> wrote:
>
>> The issue is only visible with Python 3.6, not 2.7.
>>
>> If there is a framework in place to add a streaming test, that would be
>> great. We would use what we have internally as starting point.
>>
>> On Thu, Sep 5, 2019 at 5:00 PM Ahmet Altay <al...@google.com> wrote:
>>
>>>
>>>
>>> On Thu, Sep 5, 2019 at 4:15 PM Thomas Weise <t...@apache.org> wrote:
>>>
>>>> The workload is quite different. What I have is streaming with state
>>>> and timers.
>>>>
>>>>
>>>>
>>>> On Thu, Sep 5, 2019 at 3:47 PM Pablo Estrada <pabl...@google.com>
>>>> wrote:
>>>>
>>>>> We only recently started running the Chicago Taxi Example. +MichaƂ
>>>>> Walenia <michal.wale...@polidea.com> I don't see it in the dashboards.
>>>>> Do you know if it's possible to see any trends in the data?
>>>>>
>>>>> We have a few tests running now:
>>>>> - Combine tests:
>>>>> https://apache-beam-testing.appspot.com/explore?dashboard=5763764733345792&widget=201943890&container=1334074373
>>>>> - GBK tests:
>>>>> https://apache-beam-testing.appspot.com/explore?dashboard=5763764733345792&widget=201943890&container=1334074373
>>>>>
>>>>> They don't seem to show a very drastic jump either, but they aren't
>>>>> very old.
>>>>>
>>>>> There is also work ongoing by Kasia and Kamil (added) to add alerting
>>>>> for this sort of regression. It is not there yet (it's in progress).
>>>>> Best
>>>>> -P.
>>>>>
>>>>> On Thu, Sep 5, 2019 at 3:35 PM Thomas Weise <t...@apache.org> wrote:
>>>>>
>>>>>> It probably won't be practical to do a bisect due to the high cost of
>>>>>> each iteration with our fork/deploy setup.
>>>>>>
>>>>>> Perhaps it is time to setup something with the synthetic source that
>>>>>> works just with Beam as dependency.
>>>>>>
>>>>>
>>> I agree with this.
>>>
>>> Pablo, Kasia, Kamil, do the new benchmarks give us an easy-to-use
>>> framework for using the synthetic source in benchmarks?
>>>
>>>
>>>>
>>>>>> On Thu, Sep 5, 2019 at 3:23 PM Ahmet Altay <al...@google.com> wrote:
>>>>>>
>>>>>>> There are a few in this dashboard [1], but they are not very useful
>>>>>>> in this case because they do not go back more than a month and are
>>>>>>> not very comprehensive. I do not see a jump there. Thomas, would it
>>>>>>> be possible to bisect to find which commit caused the regression?
>>>>>>>
>>>>>>> +Pablo Estrada <pabl...@google.com> do we have any python on flink
>>>>>>> benchmarks for chicago example?
>>>>>>> +Alan Myrvold <amyrv...@google.com> +Yifan Zou <yifan...@google.com> It
>>>>>>> would be good to have alerts on benchmarks. Do we have such an ability
>>>>>>> today?
>>>>>>>
>>>>>>> [1] https://apache-beam-testing.appspot.com/dashboard-admin
>>>>>>>
>>>>>>> On Thu, Sep 5, 2019 at 3:15 PM Thomas Weise <t...@apache.org> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Are there any performance tests run for the Python SDK as part of
>>>>>>>> release verification (or otherwise as well)?
>>>>>>>>
>>>>>>>> I see what appears to be a regression in master (compared to 2.14)
>>>>>>>> with our in-house application (~25% jump in CPU utilization and a
>>>>>>>> corresponding drop in throughput).
>>>>>>>>
>>>>>>>> I wanted to see if there is anything available to verify that
>>>>>>>> within Beam.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Thomas
>>>>>>>>
>>>>>>>>
