Thank you. If there are details that would help others in the community
avoid similar regressions, feel free to share them. We also have Cython
experts here who may be able to advise.


On Wed, Sep 25, 2019 at 6:58 AM Thomas Weise <[email protected]> wrote:

> After running through the entire bisect based on the 2.16 release branch, I
> found that the regression was caused by our own Cython setup. So green
> light for the 2.16.0 release.
>
> Thomas
>
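For anyone running a similar bisection, the search itself reduces to a binary search over an ordered commit list. A minimal sketch follows; the `is_bad` predicate is a hypothetical stand-in for a build-and-benchmark step (an assumption here, not part of the thread):

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad() is True.

    Assumes commits are ordered oldest -> newest and that the
    regression, once introduced, persists in all later commits.
    """
    lo, hi = 0, len(commits) - 1
    if not is_bad(commits[hi]):
        return None  # regression not present at the newest commit
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # regression already present; look earlier
        else:
            lo = mid + 1  # still good; regression came later
    return commits[lo]
```

With git, this is the loop that `git bisect` automates; the expensive part, as noted below in the thread, is rebuilding and redeploying the stack for each probe, not the search itself.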
> On Tue, Sep 17, 2019 at 1:21 PM Thomas Weise <[email protected]> wrote:
>
>> Hi Valentyn,
>>
>> Thanks for the reminder. The bisect is on my TODO list.
>>
>> Hopefully this week.
>>
>> I saw the discussion about declaring 2.16 LTS. We probably need to sort
>> these performance concerns out prior to doing so.
>>
>> Thomas
>>
>>
>> On Tue, Sep 17, 2019 at 12:02 PM Valentyn Tymofieiev <[email protected]>
>> wrote:
>>
>>> Hi Thomas,
>>>
>>> Just a reminder that 2.16.0 was cut and voting may start soon. To avoid
>>> having the regression you reported block the vote, it would be great to
>>> start investigating whether it is reproducible.
>>>
>>> Thanks,
>>> Valentyn
>>>
>>> On Tue, Sep 10, 2019 at 1:53 PM Valentyn Tymofieiev <[email protected]>
>>> wrote:
>>>
>>>> Thomas, did you have a chance to open a Jira for the streaming
>>>> regression you observe? If not, could you please do so and cc +Ankur
>>>> Goenka <[email protected]>? I talked with Ankur offline and he is
>>>> also interested in this regression.
>>>>
>>>> I opened:
>>>> - https://issues.apache.org/jira/browse/BEAM-8198 for the batch regression.
>>>> - https://issues.apache.org/jira/browse/BEAM-8199 to improve tooling
>>>> around performance monitoring.
>>>> - https://issues.apache.org/jira/browse/BEAM-8200 to add benchmarks
>>>> for streaming.
>>>>
>>>> I cc'ed some folks, though not everyone. Manisha, I could not find
>>>> your username in Jira; feel free to cc or assign BEAM-8199
>>>> <https://issues.apache.org/jira/browse/BEAM-8199> to yourself if that
>>>> is something you are actively working on.
>>>>
>>>> Thanks,
>>>> Valentyn
>>>>
>>>> On Mon, Sep 9, 2019 at 9:59 AM Mark Liu <[email protected]> wrote:
>>>>
>>>>> +Alan Myrvold <[email protected]> +Yifan Zou <[email protected]> It
>>>>>> would be good to have alerts on benchmarks. Do we have such an ability
>>>>>> today?
>>>>>>
>>>>>
>>>>> As for regression detection, we have a Jenkins job
>>>>> beam_PerformanceTests_Analysis
>>>>> <https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PerformanceTests_Analysis/>
>>>>> which analyzes metrics on BigQuery and reports a summary to the job
>>>>> console output. However, not all jobs are registered with this
>>>>> analyzer, and no further alerting (e.g. email/Slack) is currently
>>>>> integrated with it.
>>>>>
>>>>> There is ongoing work to add alerting to benchmarks. Kasia and Kamil
>>>>> are investigating Prometheus + Grafana, and Manisha and I are looking
>>>>> into mako.dev.
>>>>>
>>>>> Mark
>>>>>
>>>>> On Fri, Sep 6, 2019 at 7:21 PM Ahmet Altay <[email protected]> wrote:
>>>>>
>>>>>> I agree, let's investigate. Thomas, could you file JIRAs once you
>>>>>> have additional information?
>>>>>>
>>>>>> Valentyn, I think the performance regression could be investigated
>>>>>> now, by running whatever benchmarks are available against 2.14, 2.15,
>>>>>> and head and seeing if the same regression can be reproduced.
>>>>>>
>>>>>> On Fri, Sep 6, 2019 at 7:11 PM Valentyn Tymofieiev <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Sounds like these regressions need to be investigated ahead of
>>>>>>> 2.16.0 release.
>>>>>>>
>>>>>>> On Fri, Sep 6, 2019 at 6:44 PM Thomas Weise <[email protected]> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Sep 6, 2019 at 6:23 PM Ahmet Altay <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Sep 6, 2019 at 6:17 PM Thomas Weise <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Sep 6, 2019 at 2:24 PM Valentyn Tymofieiev
>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> +Mark Liu <[email protected]> has added some benchmarks
>>>>>>>>>>> running across multiple Python versions. Specifically, we run a
>>>>>>>>>>> 1 GB wordcount job on the Dataflow runner on Python 2.7 and
>>>>>>>>>>> 3.5-3.7. The benchmarks do not have alerting configured and, to
>>>>>>>>>>> my knowledge, are not actively monitored yet.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Are there any benchmarks for streaming? Streaming and batch are
>>>>>>>>>> quite different runtime paths, and some issues can only be
>>>>>>>>>> identified in longer-running processes through metrics. It would
>>>>>>>>>> be good to verify utilization of memory, CPU, etc.
>>>>>>>>>>
>>>>>>>>>> I additionally discovered that our 2.16 upgrade exhibits a memory
>>>>>>>>>> leak in the Python worker (Py 2.7).
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Do you have more details on this one?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Unfortunately only that at the moment. The workers eat up all
>>>>>>>> memory and eventually crash. We reverted to 2.14 / Py 3.6 and the
>>>>>>>> issue is gone.
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Thomas, is it possible for you to do the bisection using SDK
>>>>>>>>>>> code from master at various commits to narrow down the regression 
>>>>>>>>>>> on your
>>>>>>>>>>> end?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I don't know how soon I will get to it. It's of course possible,
>>>>>>>>>> but expensive, since each iteration requires rebasing the fork and
>>>>>>>>>> building and deploying an entire stack. The pipeline itself is
>>>>>>>>>> super simple. We need this testbed as part of Beam. It would be
>>>>>>>>>> nice to be able to pick up an update and have more confidence that
>>>>>>>>>> the baseline has not slipped.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> [1]
>>>>>>>>>>> https://apache-beam-testing.appspot.com/explore?dashboard=5691127080419328
>>>>>>>>>>> [2]
>>>>>>>>>>> https://drive.google.com/file/d/1ERlnN8bA2fKCUPBHTnid1l__81qpQe2W/view
>>>>>>>>>>> [3]
>>>>>>>>>>> https://github.com/apache/beam/commit/2d5e493abf39ee6fc89831bb0b7ec9fee592b9c5
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Sep 6, 2019 at 8:38 AM Ahmet Altay <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> +Valentyn Tymofieiev <[email protected]> do we have
>>>>>>>>>>>> benchmarks on different Python versions? Was there a recent
>>>>>>>>>>>> change that is specific to Python 3.x?
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Sep 6, 2019 at 8:36 AM Thomas Weise <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> The issue is only visible with Python 3.6, not 2.7.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If there is a framework in place to add a streaming test,
>>>>>>>>>>>>> that would be great. We would use what we have internally as a
>>>>>>>>>>>>> starting point.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Sep 5, 2019 at 5:00 PM Ahmet Altay <[email protected]>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Sep 5, 2019 at 4:15 PM Thomas Weise <[email protected]>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The workload is quite different. What I have is streaming
>>>>>>>>>>>>>>> with state and timers.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Sep 5, 2019 at 3:47 PM Pablo Estrada <
>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We only recently started running the Chicago Taxi Example.
>>>>>>>>>>>>>>>> +Michał Walenia <[email protected]> I don't see it
>>>>>>>>>>>>>>>> in the dashboards. Do you know if it's possible to see any
>>>>>>>>>>>>>>>> trends in the data?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We have a few tests running now:
>>>>>>>>>>>>>>>> - Combine tests:
>>>>>>>>>>>>>>>> https://apache-beam-testing.appspot.com/explore?dashboard=5763764733345792&widget=201943890&container=1334074373
>>>>>>>>>>>>>>>> - GBK tests:
>>>>>>>>>>>>>>>> https://apache-beam-testing.appspot.com/explore?dashboard=5763764733345792&widget=201943890&container=1334074373
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> They don't seem to show a very drastic jump either, but
>>>>>>>>>>>>>>>> they aren't very old.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> There is also ongoing work by Kasia and Kamil (added) to
>>>>>>>>>>>>>>>> add alerting for this sort of regression. It is not there
>>>>>>>>>>>>>>>> yet (it's in progress).
>>>>>>>>>>>>>>>> Best
>>>>>>>>>>>>>>>> -P.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Sep 5, 2019 at 3:35 PM Thomas Weise <[email protected]>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It probably won't be practical to do a bisect due to the
>>>>>>>>>>>>>>>>> high cost of each iteration with our fork/deploy setup.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Perhaps it is time to set up something with the synthetic
>>>>>>>>>>>>>>>>> source that works with just Beam as a dependency.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I agree with this.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Pablo, Kasia, Kamil, do the new benchmarks give us an
>>>>>>>>>>>>>> easy-to-use framework for using the synthetic source in
>>>>>>>>>>>>>> benchmarks?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
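The synthetic-source idea can be sketched in plain Python (illustrative only; the function name and record sizes below are assumptions, and this is not the Beam synthetic source API itself). The key property is determinism: two SDK versions processing byte-identical input make throughput numbers directly comparable.

```python
import hashlib


def synthetic_records(n, key_space=1000, value_size=100):
    """Yield n deterministic (key, value) pairs.

    The same arguments always produce the same records, so a
    benchmark run against one SDK version sees exactly the same
    input as a run against another.
    """
    for i in range(n):
        # Hash the index so keys and values look random but are
        # fully reproducible across runs and machines.
        digest = hashlib.sha256(str(i).encode()).digest()
        key = int.from_bytes(digest[:4], "big") % key_space
        value = (digest * (value_size // len(digest) + 1))[:value_size]
        yield key, value
```

Feeding such a generator into a streaming pipeline with state and timers would give the "testbed as part of Beam" discussed above, without depending on any in-house application.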
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Sep 5, 2019 at 3:23 PM Ahmet Altay <
>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> There are a few in this dashboard [1], but they are not
>>>>>>>>>>>>>>>>>> very useful in this case because they do not go back more
>>>>>>>>>>>>>>>>>> than a month and are not very comprehensive. I do not see
>>>>>>>>>>>>>>>>>> a jump there. Thomas, would it be possible to bisect to
>>>>>>>>>>>>>>>>>> find which commit caused the regression?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> +Pablo Estrada <[email protected]> do we have any
>>>>>>>>>>>>>>>>>> Python-on-Flink benchmarks for the Chicago example?
>>>>>>>>>>>>>>>>>> +Alan Myrvold <[email protected]> +Yifan Zou
>>>>>>>>>>>>>>>>>> <[email protected]> It would be good to have alerts on
>>>>>>>>>>>>>>>>>> benchmarks. Do we have such an ability today?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>> https://apache-beam-testing.appspot.com/dashboard-admin
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Sep 5, 2019 at 3:15 PM Thomas Weise <
>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Are there any performance tests run for the Python SDK
>>>>>>>>>>>>>>>>>>> as part of release verification (or otherwise as well)?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I see what appears to be a regression in master
>>>>>>>>>>>>>>>>>>> (compared to 2.14) with our in-house application (~25%
>>>>>>>>>>>>>>>>>>> jump in CPU utilization and a corresponding drop in
>>>>>>>>>>>>>>>>>>> throughput).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I wanted to see if there is anything available to verify
>>>>>>>>>>>>>>>>>>> that within Beam.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Thomas
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
