Thank you. If there are details that would help others in the community
avoid similar regressions, feel free to share them. We also have Cython
experts here who may be able to advise.


On Wed, Sep 25, 2019 at 6:58 AM Thomas Weise <[email protected]> wrote:

> After running through the entire bisect based on the 2.16 release branch, I
> found that the regression was caused by our own Cython setup. So green
> light for the 2.16.0 release.
>
> Thomas
>
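For anyone running a similar bisection, the search itself reduces to a binary search over an ordered commit list. A minimal sketch follows; the `is_bad` predicate is a hypothetical stand-in for a build-and-benchmark step (an assumption here, not part of the thread):

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad() is True.

    Assumes commits are ordered oldest -> newest and that the
    regression, once introduced, persists in all later commits.
    """
    lo, hi = 0, len(commits) - 1
    if not is_bad(commits[hi]):
        return None  # regression not present at the newest commit
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # regression already present; look earlier
        else:
            lo = mid + 1  # still good; regression came later
    return commits[lo]
```

With git, this is the loop that `git bisect` automates; the expensive part, as noted below in the thread, is rebuilding and redeploying the stack for each probe, not the search itself.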
> On Tue, Sep 17, 2019 at 1:21 PM Thomas Weise <[email protected]> wrote:
>
>> Hi Valentyn,
>>
>> Thanks for the reminder. The bisect is on my TODO list.
>>
>> Hopefully this week.
>>
>> I saw the discussion about declaring 2.16 LTS. We probably need to sort
>> these performance concerns out prior to doing so.
>>
>> Thomas
>>
>>
>> On Tue, Sep 17, 2019 at 12:02 PM Valentyn Tymofieiev <[email protected]>
>> wrote:
>>
>>> Hi Thomas,
>>>
>>> Just a reminder that 2.16.0 was cut and voting may start soon. To avoid
>>> having the regression you reported block the vote, it would be great to
>>> start investigating whether it is reproducible.
>>>
>>> Thanks,
>>> Valentyn
>>>
>>> On Tue, Sep 10, 2019 at 1:53 PM Valentyn Tymofieiev <[email protected]>
>>> wrote:
>>>
>>>> Thomas, did you have a chance to open a Jira for the streaming
>>>> regression you observe? If not, could you please do so and cc +Ankur
>>>> Goenka <[email protected]>? I talked with Ankur offline and he is
>>>> also interested in this regression.
>>>>
>>>> I opened:
>>>> - https://issues.apache.org/jira/browse/BEAM-8198 for the batch regression.
>>>> - https://issues.apache.org/jira/browse/BEAM-8199 to improve tooling
>>>> around performance monitoring.
>>>> - https://issues.apache.org/jira/browse/BEAM-8200 to add benchmarks
>>>> for streaming.
>>>>
>>>> I cc'ed some folks, though not everyone. Manisha, I could not find
>>>> your username in Jira; feel free to cc or assign BEAM-8199
>>>> <https://issues.apache.org/jira/browse/BEAM-8199> to yourself if that
>>>> is something you are actively working on.
>>>>
>>>> Thanks,
>>>> Valentyn
>>>>
>>>> On Mon, Sep 9, 2019 at 9:59 AM Mark Liu <[email protected]> wrote:
>>>>
>>>>> +Alan Myrvold <[email protected]> +Yifan Zou <[email protected]> It
>>>>>> would be good to have alerts on benchmarks. Do we have such an ability
>>>>>> today?
>>>>>>
>>>>>
>>>>> As for regression detection, we have a Jenkins job
>>>>> beam_PerformanceTests_Analysis
>>>>> <https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PerformanceTests_Analysis/>
>>>>> which analyzes metrics on BigQuery and reports a summary to the job
>>>>> console output. However, not all jobs are registered with this
>>>>> analyzer, and no further alerting (e.g. email/Slack) is currently
>>>>> integrated with it.
>>>>>
>>>>> There is ongoing work to add alerting to benchmarks. Kasia and Kamil
>>>>> are investigating Prometheus + Grafana, and Manisha and I are looking
>>>>> into mako.dev.
>>>>>
>>>>> Mark
>>>>>
>>>>> On Fri, Sep 6, 2019 at 7:21 PM Ahmet Altay <[email protected]> wrote:
>>>>>
>>>>>> I agree, let's investigate. Thomas, could you file JIRAs once you
>>>>>> have additional information?
>>>>>>
>>>>>> Valentyn, I think the performance regression could be investigated
>>>>>> now, by running whatever benchmarks are available against 2.14, 2.15,
>>>>>> and head and seeing if the same regression can be reproduced.
>>>>>>
>>>>>> On Fri, Sep 6, 2019 at 7:11 PM Valentyn Tymofieiev <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Sounds like these regressions need to be investigated ahead of
>>>>>>> 2.16.0 release.
>>>>>>>
>>>>>>> On Fri, Sep 6, 2019 at 6:44 PM Thomas Weise <[email protected]> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Sep 6, 2019 at 6:23 PM Ahmet Altay <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Sep 6, 2019 at 6:17 PM Thomas Weise <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Sep 6, 2019 at 2:24 PM Valentyn Tymofieiev
>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> +Mark Liu <[email protected]> has added some benchmarks
>>>>>>>>>>> running across multiple Python versions. Specifically, we run a
>>>>>>>>>>> 1 GB wordcount job on the Dataflow runner on Python 2.7 and
>>>>>>>>>>> 3.5-3.7. The benchmarks do not have alerting configured and, to
>>>>>>>>>>> my knowledge, are not actively monitored yet.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Are there any benchmarks for streaming? Streaming and batch are
>>>>>>>>>> quite different runtime paths, and some issues can only be
>>>>>>>>>> identified in longer-running processes through metrics. It would
>>>>>>>>>> be good to verify utilization of memory, CPU, etc.
>>>>>>>>>>
>>>>>>>>>> I additionally discovered that our 2.16 upgrade exhibits a memory
>>>>>>>>>> leak in the Python worker (Py 2.7).
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Do you have more details on this one?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Unfortunately only that at the moment. The workers eat up all
>>>>>>>> memory and eventually crash. We reverted to 2.14 / Py 3.6 and the
>>>>>>>> issue is gone.
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Thomas, is it possible for you to do the bisection using SDK
>>>>>>>>>>> code from master at various commits to narrow down the regression 
>>>>>>>>>>> on your
>>>>>>>>>>> end?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I don't know how soon I will get to it. It's of course possible,
>>>>>>>>>> but expensive, since each iteration requires rebasing the fork and
>>>>>>>>>> building and deploying an entire stack. The pipeline itself is
>>>>>>>>>> super simple. We need this testbed as part of Beam. It would be
>>>>>>>>>> nice to be able to pick up an update and have more confidence that
>>>>>>>>>> the baseline has not slipped.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> [1]
>>>>>>>>>>> https://apache-beam-testing.appspot.com/explore?dashboard=5691127080419328
>>>>>>>>>>> [2]
>>>>>>>>>>> https://drive.google.com/file/d/1ERlnN8bA2fKCUPBHTnid1l__81qpQe2W/view
>>>>>>>>>>> [3]
>>>>>>>>>>> https://github.com/apache/beam/commit/2d5e493abf39ee6fc89831bb0b7ec9fee592b9c5
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Sep 6, 2019 at 8:38 AM Ahmet Altay <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> +Valentyn Tymofieiev <[email protected]> do we have
>>>>>>>>>>>> benchmarks on different Python versions? Was there a recent
>>>>>>>>>>>> change that is specific to Python 3.x?
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Sep 6, 2019 at 8:36 AM Thomas Weise <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> The issue is only visible with Python 3.6, not 2.7.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If there is a framework in place to add a streaming test,
>>>>>>>>>>>>> that would be great. We would use what we have internally as a
>>>>>>>>>>>>> starting point.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Sep 5, 2019 at 5:00 PM Ahmet Altay <[email protected]>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Sep 5, 2019 at 4:15 PM Thomas Weise <[email protected]>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The workload is quite different. What I have is streaming
>>>>>>>>>>>>>>> with state and timers.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Sep 5, 2019 at 3:47 PM Pablo Estrada <
>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We only recently started running the Chicago Taxi Example.
>>>>>>>>>>>>>>>> +Michał Walenia <[email protected]> I don't see it
>>>>>>>>>>>>>>>> in the dashboards. Do you know if it's possible to see any
>>>>>>>>>>>>>>>> trends in the data?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We have a few tests running now:
>>>>>>>>>>>>>>>> - Combine tests:
>>>>>>>>>>>>>>>> https://apache-beam-testing.appspot.com/explore?dashboard=5763764733345792&widget=201943890&container=1334074373
>>>>>>>>>>>>>>>> - GBK tests:
>>>>>>>>>>>>>>>> https://apache-beam-testing.appspot.com/explore?dashboard=5763764733345792&widget=201943890&container=1334074373
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> They don't seem to show a very drastic jump either, but
>>>>>>>>>>>>>>>> they aren't very old.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> There is also ongoing work by Kasia and Kamil (added) to
>>>>>>>>>>>>>>>> add alerting for this sort of regression. It is not there
>>>>>>>>>>>>>>>> yet (it's in progress).
>>>>>>>>>>>>>>>> Best
>>>>>>>>>>>>>>>> -P.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Sep 5, 2019 at 3:35 PM Thomas Weise <[email protected]>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It probably won't be practical to do a bisect due to the
>>>>>>>>>>>>>>>>> high cost of each iteration with our fork/deploy setup.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Perhaps it is time to set up something with the synthetic
>>>>>>>>>>>>>>>>> source that works with just Beam as a dependency.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I agree with this.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Pablo, Kasia, Kamil, do the new benchmarks give us an
>>>>>>>>>>>>>> easy-to-use framework for using the synthetic source in
>>>>>>>>>>>>>> benchmarks?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
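The synthetic-source idea can be sketched in plain Python (illustrative only; the function name and record sizes below are assumptions, and this is not the Beam synthetic source API itself). The key property is determinism: two SDK versions processing byte-identical input make throughput numbers directly comparable.

```python
import hashlib


def synthetic_records(n, key_space=1000, value_size=100):
    """Yield n deterministic (key, value) pairs.

    The same arguments always produce the same records, so a
    benchmark run against one SDK version sees exactly the same
    input as a run against another.
    """
    for i in range(n):
        # Hash the index so keys and values look random but are
        # fully reproducible across runs and machines.
        digest = hashlib.sha256(str(i).encode()).digest()
        key = int.from_bytes(digest[:4], "big") % key_space
        value = (digest * (value_size // len(digest) + 1))[:value_size]
        yield key, value
```

Feeding such a generator into a streaming pipeline with state and timers would give the "testbed as part of Beam" discussed above, without depending on any in-house application.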
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Sep 5, 2019 at 3:23 PM Ahmet Altay <
>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> There are a few in this dashboard [1], but they are not
>>>>>>>>>>>>>>>>>> very useful in this case because they do not go back more
>>>>>>>>>>>>>>>>>> than a month and are not very comprehensive. I do not see
>>>>>>>>>>>>>>>>>> a jump there. Thomas, would it be possible to bisect to
>>>>>>>>>>>>>>>>>> find which commit caused the regression?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> +Pablo Estrada <[email protected]> do we have any
>>>>>>>>>>>>>>>>>> Python-on-Flink benchmarks for the Chicago example?
>>>>>>>>>>>>>>>>>> +Alan Myrvold <[email protected]> +Yifan Zou
>>>>>>>>>>>>>>>>>> <[email protected]> It would be good to have alerts on
>>>>>>>>>>>>>>>>>> benchmarks. Do we have such an ability today?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>> https://apache-beam-testing.appspot.com/dashboard-admin
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Sep 5, 2019 at 3:15 PM Thomas Weise <
>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Are there any performance tests run for the Python SDK
>>>>>>>>>>>>>>>>>>> as part of release verification (or otherwise as well)?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I see what appears to be a regression in master
>>>>>>>>>>>>>>>>>>> (compared to 2.14) with our in-house application (~25%
>>>>>>>>>>>>>>>>>>> jump in CPU utilization and a corresponding drop in
>>>>>>>>>>>>>>>>>>> throughput).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I wanted to see if there is anything available to verify
>>>>>>>>>>>>>>>>>>> that within Beam.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Thomas
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
