After running through the entire bisect based on the 2.16 release branch, I found that the regression was caused by our own Cython setup. So, green light for the 2.16.0 release.
Thomas

On Tue, Sep 17, 2019 at 1:21 PM Thomas Weise <t...@apache.org> wrote: > Hi Valentyn, > > Thanks for the reminder. The bisect is on my TODO list. > > Hopefully this week. > > I saw the discussion about declaring 2.16 LTS. We probably need to sort > these performance concerns out prior to doing so. > > Thomas > > > On Tue, Sep 17, 2019 at 12:02 PM Valentyn Tymofieiev <valen...@google.com> > wrote: > >> Hi Thomas, >> >> Just a reminder that 2.16.0 was cut and soon the voting may start, so to >> avoid the regression that you reported blocking the vote, it would be great >> to start investigating it if it is reproducible. >> >> Thanks, >> Valentyn >> >> On Tue, Sep 10, 2019 at 1:53 PM Valentyn Tymofieiev <valen...@google.com> >> wrote: >> >>> Thomas, did you have a chance to open a Jira for the streaming >>> regression you observe? If not, could you please do so and cc +Ankur >>> Goenka <goe...@google.com>? I talked with Ankur offline and he is also >>> interested in this regression. >>> >>> I opened: >>> - https://issues.apache.org/jira/browse/BEAM-8198 for the batch regression. >>> - https://issues.apache.org/jira/browse/BEAM-8199 to improve tooling >>> around performance monitoring. >>> - https://issues.apache.org/jira/browse/BEAM-8200 to add benchmarks for >>> streaming. >>> >>> I cc'ed some folks, but not everyone. Manisha, I could not find your >>> username in Jira; feel free to cc or assign BEAM-8199 >>> <https://issues.apache.org/jira/browse/BEAM-8199> to yourself if that >>> is something you are actively working on. >>> >>> Thanks, >>> Valentyn >>> >>> On Mon, Sep 9, 2019 at 9:59 AM Mark Liu <mark...@google.com> wrote: >>> >>>> +Alan Myrvold <amyrv...@google.com> +Yifan Zou <yifan...@google.com> It >>>>> would be good to have alerts on benchmarks. Do we have such an ability >>>>> today?
>>>>> >>>> >>>> As for regression detection, we have a Jenkins job >>>> beam_PerformanceTests_Analysis >>>> <https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PerformanceTests_Analysis/> >>>> which >>>> analyzes metrics in BigQuery and reports a summary to the job console output. >>>> However, not all jobs are registered with this analyzer, and currently no >>>> further alerting (e.g. email / Slack) is integrated with it. >>>> >>>> There is ongoing work to add alerting to benchmarks. Kasia and Kamil >>>> are investigating Prometheus + Grafana, and Manisha and I are looking >>>> into mako.dev. >>>> >>>> Mark >>>> >>>> On Fri, Sep 6, 2019 at 7:21 PM Ahmet Altay <al...@google.com> wrote: >>>> >>>>> I agree, let's investigate. Thomas, could you file JIRAs once you have >>>>> additional information? >>>>> >>>>> Valentyn, I think the performance regression could be investigated >>>>> now, by running whatever benchmarks are available against 2.14, 2.15 >>>>> and HEAD to see if the same regression can be reproduced. >>>>> >>>>> On Fri, Sep 6, 2019 at 7:11 PM Valentyn Tymofieiev < >>>>> valen...@google.com> wrote: >>>>> >>>>>> Sounds like these regressions need to be investigated ahead of the 2.16.0 >>>>>> release. >>>>>> >>>>>> On Fri, Sep 6, 2019 at 6:44 PM Thomas Weise <t...@apache.org> wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> On Fri, Sep 6, 2019 at 6:23 PM Ahmet Altay <al...@google.com> wrote: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Sep 6, 2019 at 6:17 PM Thomas Weise <t...@apache.org> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, Sep 6, 2019 at 2:24 PM Valentyn Tymofieiev <valentyn@ >>>>>>>>> google.com> wrote: >>>>>>>>> >>>>>>>>>> +Mark Liu <mark...@google.com> has added some benchmarks running >>>>>>>>>> across multiple Python versions. Specifically, we run a 1 GB wordcount >>>>>>>>>> job on >>>>>>>>>> the Dataflow runner on Python 2.7, 3.5-3.7.
The benchmarks do not have >>>>>>>>>> configured alerting and to my knowledge are not actively monitored >>>>>>>>>> yet. >>>>>>>>>> >>>>>>>>> >>>>>>>>> Are there any benchmarks for streaming? Streaming and batch are >>>>>>>>> quite different runtime paths. And some of the issues can only be >>>>>>>>> identified with longer-running processes through metrics. It would be >>>>>>>>> good >>>>>>>>> to verify utilization of memory, CPU, etc. >>>>>>>>> >>>>>>>>> I additionally discovered that our 2.16 upgrade exhibits a memory >>>>>>>>> leak in the Python worker (Py 2.7). >>>>>>>>> >>>>>>>> >>>>>>>> Do you have more details on this one? >>>>>>>> >>>>>>> >>>>>>> Unfortunately only that at the moment. The workers eat up all memory >>>>>>> and eventually crash. Reverted back to 2.14 / Py 3.6 and the issue is >>>>>>> gone. >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> Thomas, is it possible for you to do the bisection using SDK code >>>>>>>>>> from master at various commits to narrow down the regression on your >>>>>>>>>> end? >>>>>>>>>> >>>>>>>>> >>>>>>>>> I don't know how soon I will get to it. It's of course possible, >>>>>>>>> but expensive due to having to rebase the fork, build and deploy >>>>>>>>> an entire stack of stuff for each iteration. The pipeline itself is >>>>>>>>> super >>>>>>>>> simple. We need this testbed as part of Beam. It would be nice to be >>>>>>>>> able >>>>>>>>> to pick an update and have more confidence that the baseline has not >>>>>>>>> slipped.
>>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> [1] >>>>>>>>>> https://apache-beam-testing.appspot.com/explore?dashboard=5691127080419328 >>>>>>>>>> [2] >>>>>>>>>> https://drive.google.com/file/d/1ERlnN8bA2fKCUPBHTnid1l__81qpQe2W/view >>>>>>>>>> [3] >>>>>>>>>> https://github.com/apache/beam/commit/2d5e493abf39ee6fc89831bb0b7ec9fee592b9c5 >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Fri, Sep 6, 2019 at 8:38 AM Ahmet Altay <al...@google.com> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> +Valentyn Tymofieiev <valen...@google.com> do we have >>>>>>>>>>> benchmarks in different Python versions? Was there a recent change >>>>>>>>>>> that is >>>>>>>>>>> specific to Python 3.x? >>>>>>>>>>> >>>>>>>>>>> On Fri, Sep 6, 2019 at 8:36 AM Thomas Weise <t...@apache.org> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> The issue is only visible with Python 3.6, not 2.7. >>>>>>>>>>>> >>>>>>>>>>>> If there is a framework in place to add a streaming test, that >>>>>>>>>>>> would be great. We would use what we have internally as a starting >>>>>>>>>>>> point. >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Sep 5, 2019 at 5:00 PM Ahmet Altay <al...@google.com> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Sep 5, 2019 at 4:15 PM Thomas Weise <t...@apache.org> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> The workload is quite different. What I have is streaming >>>>>>>>>>>>>> with state and timers. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Thu, Sep 5, 2019 at 3:47 PM Pablo Estrada < >>>>>>>>>>>>>> pabl...@google.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> We only recently started running the Chicago Taxi example. +Michał >>>>>>>>>>>>>>> Walenia <michal.wale...@polidea.com> I don't see it in the >>>>>>>>>>>>>>> dashboards. Do you know if it's possible to see any trends in >>>>>>>>>>>>>>> the data?
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> We have a few tests running now: >>>>>>>>>>>>>>> - Combine tests: >>>>>>>>>>>>>>> https://apache-beam-testing.appspot.com/explore?dashboard=5763764733345792&widget=201943890&container=1334074373 >>>>>>>>>>>>>>> - GBK tests: >>>>>>>>>>>>>>> https://apache-beam-testing.appspot.com/explore?dashboard=5763764733345792&widget=201943890&container=1334074373 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> They don't seem to show a very drastic jump either, but they >>>>>>>>>>>>>>> aren't very old. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> There is also work ongoing to add alerting for this sort of >>>>>>>>>>>>>>> regression by Kasia and Kamil (added). The work is not there >>>>>>>>>>>>>>> yet (it's in >>>>>>>>>>>>>>> progress). >>>>>>>>>>>>>>> Best >>>>>>>>>>>>>>> -P. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Thu, Sep 5, 2019 at 3:35 PM Thomas Weise <t...@apache.org> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> It probably won't be practical to do a bisect due to the >>>>>>>>>>>>>>>> high cost of each iteration with our fork/deploy setup. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Perhaps it is time to set up something with the synthetic >>>>>>>>>>>>>>>> source that works just with Beam as a dependency. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>> I agree with this. >>>>>>>>>>>>> >>>>>>>>>>>>> Pablo, Kasia, Kamil, do the new benchmarks give us an easy-to-use >>>>>>>>>>>>> framework for using the synthetic source in benchmarks? >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Thu, Sep 5, 2019 at 3:23 PM Ahmet Altay < >>>>>>>>>>>>>>>> al...@google.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> There are a few in this dashboard [1], but they are not very useful >>>>>>>>>>>>>>>>> in this case because they do not go back more than a month >>>>>>>>>>>>>>>>> and are not very >>>>>>>>>>>>>>>>> comprehensive. I do not see a jump there. Thomas, would it be >>>>>>>>>>>>>>>>> possible to >>>>>>>>>>>>>>>>> bisect to find which commit caused the regression?
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> +Pablo Estrada <pabl...@google.com> do we have any Python >>>>>>>>>>>>>>>>> on Flink benchmarks for the Chicago example? >>>>>>>>>>>>>>>>> +Alan Myrvold <amyrv...@google.com> +Yifan Zou >>>>>>>>>>>>>>>>> <yifan...@google.com> It would be good to have alerts on >>>>>>>>>>>>>>>>> benchmarks. Do we have such an ability today? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>>>> https://apache-beam-testing.appspot.com/dashboard-admin >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Thu, Sep 5, 2019 at 3:15 PM Thomas Weise < >>>>>>>>>>>>>>>>> t...@apache.org> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Are there any performance tests run for the Python SDK as >>>>>>>>>>>>>>>>>> part of release verification (or otherwise as well)? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I see what appears to be a regression in master (compared >>>>>>>>>>>>>>>>>> to 2.14) with our in-house application (~25% jump in CPU >>>>>>>>>>>>>>>>>> utilization and >>>>>>>>>>>>>>>>>> a corresponding drop in throughput). >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I wanted to see if there is anything available to verify >>>>>>>>>>>>>>>>>> that within Beam. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>> Thomas >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>