Yes, it is possible that this had an influence: Reads are now all
implemented as SDFs (splittable DoFns), and Create now involves a
reshuffle to better redistribute the data. Still, a regression of this
magnitude is quite surprising. Where does the pipeline for, say,
"Python | ParDo | 2GB, 100 byte records, 10 iterations | Batch" live,
and how does one run it?

On Fri, Dec 20, 2019 at 6:50 AM Kamil Wasilewski
<[email protected]> wrote:
>
> Hi all,
>
> We have a set of Python load tests running on Flink that test the 
> performance of the ParDo, GroupByKey, CoGroupByKey and Combine 
> operations.
>
> Recently, I've discovered that the runtime of all those tests rose 
> significantly. The change happened between the 6th and 7th of December (the 
> tests run daily). Here are the dashboards where you can see the results:
>
> https://apache-beam-testing.appspot.com/explore?dashboard=5649695233802240
> https://apache-beam-testing.appspot.com/explore?dashboard=5763764733345792
> https://apache-beam-testing.appspot.com/explore?dashboard=5698549949923328
> https://apache-beam-testing.appspot.com/explore?dashboard=5678187241537536
>
> I've seen that in that period we submitted some changes to the core, 
> including changes to the Read transform. Do you think these might have 
> influenced the results?
>
> Thanks,
> Kamil
