On Fri, Sep 6, 2019 at 6:23 PM Ahmet Altay <al...@google.com> wrote:
>
> On Fri, Sep 6, 2019 at 6:17 PM Thomas Weise <t...@apache.org> wrote:
>>
>> On Fri, Sep 6, 2019 at 2:24 PM Valentyn Tymofieiev <valen...@google.com> wrote:
>>>
>>> +Mark Liu <mark...@google.com> has added some benchmarks running across
>>> multiple Python versions. Specifically, we run a 1 GB wordcount job on the
>>> Dataflow runner on Python 2.7 and 3.5-3.7. The benchmarks do not have
>>> alerting configured and, to my knowledge, are not actively monitored yet.
>>
>> Are there any benchmarks for streaming? Streaming and batch are quite
>> different runtime paths, and some of the issues can only be identified
>> with longer-running processes through metrics. It would be good to verify
>> utilization of memory, CPU, etc.
>>
>> I additionally discovered that our 2.16 upgrade exhibits a memory leak in
>> the Python worker (Py 2.7).
>
> Do you have more details on this one?
Unfortunately, only that at the moment. The workers eat up all memory and
eventually crash. Reverted back to 2.14 / Py 3.6 and the issue is gone.

>>> Thomas, is it possible for you to do the bisection using SDK code from
>>> master at various commits to narrow down the regression on your end?
>>
>> I don't know how soon I will get to it. It's of course possible, but
>> expensive due to having to rebase the fork, build, and deploy an entire
>> stack of stuff for each iteration. The pipeline itself is super simple. We
>> need this testbed as part of Beam. It would be nice to be able to pick an
>> update and have more confidence that the baseline has not slipped.
>>
>>> [1] https://apache-beam-testing.appspot.com/explore?dashboard=5691127080419328
>>> [2] https://drive.google.com/file/d/1ERlnN8bA2fKCUPBHTnid1l__81qpQe2W/view
>>> [3] https://github.com/apache/beam/commit/2d5e493abf39ee6fc89831bb0b7ec9fee592b9c5
>>>
>>> On Fri, Sep 6, 2019 at 8:38 AM Ahmet Altay <al...@google.com> wrote:
>>>>
>>>> +Valentyn Tymofieiev <valen...@google.com>, do we have benchmarks on
>>>> different Python versions? Was there a recent change specific to
>>>> Python 3.x?
>>>>
>>>> On Fri, Sep 6, 2019 at 8:36 AM Thomas Weise <t...@apache.org> wrote:
>>>>>
>>>>> The issue is only visible with Python 3.6, not 2.7.
>>>>>
>>>>> If there is a framework in place to add a streaming test, that would
>>>>> be great. We would use what we have internally as a starting point.
>>>>>
>>>>> On Thu, Sep 5, 2019 at 5:00 PM Ahmet Altay <al...@google.com> wrote:
>>>>>>
>>>>>> On Thu, Sep 5, 2019 at 4:15 PM Thomas Weise <t...@apache.org> wrote:
>>>>>>>
>>>>>>> The workload is quite different. What I have is streaming with state
>>>>>>> and timers.
>>>>>>>
>>>>>>> On Thu, Sep 5, 2019 at 3:47 PM Pablo Estrada <pabl...@google.com> wrote:
>>>>>>>>
>>>>>>>> We only recently started running the Chicago Taxi Example.
>>>>>>>> +Michał Walenia <michal.wale...@polidea.com>, I don't see it in the
>>>>>>>> dashboards. Do you know if it's possible to see any trends in the data?
>>>>>>>>
>>>>>>>> We have a few tests running now:
>>>>>>>> - Combine tests:
>>>>>>>> https://apache-beam-testing.appspot.com/explore?dashboard=5763764733345792&widget=201943890&container=1334074373
>>>>>>>> - GBK tests:
>>>>>>>> https://apache-beam-testing.appspot.com/explore?dashboard=5763764733345792&widget=201943890&container=1334074373
>>>>>>>>
>>>>>>>> They don't seem to show a very drastic jump either, but they aren't
>>>>>>>> very old.
>>>>>>>>
>>>>>>>> There is also ongoing work by Kasia and Kamil (added) to add alerting
>>>>>>>> for this sort of regression. It is not there yet (it's in progress).
>>>>>>>> Best,
>>>>>>>> -P.
>>>>>>>>
>>>>>>>> On Thu, Sep 5, 2019 at 3:35 PM Thomas Weise <t...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>> It probably won't be practical to do a bisect due to the high cost
>>>>>>>>> of each iteration with our fork/deploy setup.
>>>>>>>>>
>>>>>>>>> Perhaps it is time to set up something with the synthetic source
>>>>>>>>> that works with just Beam as a dependency.
>>>>>>
>>>>>> I agree with this.
>>>>>>
>>>>>> Pablo, Kasia, Kamil, do the new benchmarks give us an easy-to-use
>>>>>> framework for using the synthetic source in benchmarks?
>>>>>>
>>>>>>>>> On Thu, Sep 5, 2019 at 3:23 PM Ahmet Altay <al...@google.com> wrote:
>>>>>>>>>>
>>>>>>>>>> There are a few in this dashboard [1], but they are not very useful
>>>>>>>>>> in this case because they do not go back more than a month and are
>>>>>>>>>> not very comprehensive. I do not see a jump there. Thomas, would it
>>>>>>>>>> be possible to bisect to find which commit caused the regression?
>>>>>>>>>>
>>>>>>>>>> +Pablo Estrada <pabl...@google.com>, do we have any Python-on-Flink
>>>>>>>>>> benchmarks for the Chicago example?
>>>>>>>>>> +Alan Myrvold <amyrv...@google.com> +Yifan Zou <yifan...@google.com>
>>>>>>>>>> It would be good to have alerts on benchmarks. Do we have such an
>>>>>>>>>> ability today?
>>>>>>>>>>
>>>>>>>>>> [1] https://apache-beam-testing.appspot.com/dashboard-admin
>>>>>>>>>>
>>>>>>>>>> On Thu, Sep 5, 2019 at 3:15 PM Thomas Weise <t...@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Are there any performance tests run for the Python SDK as part
>>>>>>>>>>> of release verification (or otherwise)?
>>>>>>>>>>>
>>>>>>>>>>> I see what appears to be a regression in master (compared to
>>>>>>>>>>> 2.14) with our in-house application (~25% jump in CPU utilization
>>>>>>>>>>> and a corresponding drop in throughput).
>>>>>>>>>>>
>>>>>>>>>>> I wanted to see if there is anything available to verify that
>>>>>>>>>>> within Beam.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Thomas
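The alerting the thread keeps asking for ultimately comes down to comparing a candidate run's benchmark samples against a baseline and flagging drops beyond some tolerance, as in the ~25% throughput regression Thomas describes. Below is a minimal sketch of such a threshold check in plain Python; the function name, the 10% threshold, and the throughput numbers are all illustrative assumptions, not Beam's actual alerting code or real benchmark data.

```python
from statistics import mean

def detect_regression(baseline, candidate, threshold=0.10):
    """Return True if the candidate run regressed versus the baseline.

    baseline, candidate: sequences of throughput samples (elements/sec).
    threshold: fractional drop that counts as a regression (10% here);
    real alerting would likely also account for variance across runs.
    """
    base_avg = mean(baseline)
    cand_avg = mean(candidate)
    drop = (base_avg - cand_avg) / base_avg
    return drop > threshold

# Hypothetical throughput samples (elements/sec), for illustration only:
baseline_runs = [1000, 1020, 990]   # e.g. runs on Beam 2.14
candidate_runs = [760, 750, 770]    # e.g. runs on master, ~25% slower

print(detect_regression(baseline_runs, candidate_runs))  # True: drop > 10%
```

A check like this only becomes meaningful once benchmark history goes back further than a month, which is exactly the gap noted in the dashboard discussion above.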