Hi,
   We started a Beam application on the Flink runner with a parallelism of 50. It
is a stateless application. With the initial parallelism of 50, the application
is able to process up to 50,000 records per second. After a week, we took a
savepoint and restarted from it with a parallelism of 18. We are now seeing the
application process only about 7,000 records per second, whereas we expected
almost 18,000 records per second. Records processed per task manager dropped to
almost half of what each task manager used to process with 50 task managers.

 When we started a fresh application with 18 pods, without any savepoint, it was
able to process ~18,500 records per second. The problem occurs only when we
downscale after taking a savepoint. We ported the same application to a plain
Flink application without Apache Beam, and there it scales well without any
issues after restarting from a savepoint with lower parallelism. So the problem
seems to be with Apache Beam or with some config we are passing to Beam/Flink.
We are using the following config:

numberOfExecutionRetries=2
externalizedCheckpointsEnabled=true
retainExternalizedCheckpointsOnCancellation=true
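
For reference, a minimal sketch of how these options are set through Beam's
FlinkPipelineOptions (simplified, not our exact launcher code):

    import org.apache.beam.runners.flink.FlinkPipelineOptions;
    import org.apache.beam.runners.flink.FlinkRunner;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class LaunchSketch {
      public static void main(String[] args) {
        FlinkPipelineOptions options =
            PipelineOptionsFactory.fromArgs(args).as(FlinkPipelineOptions.class);
        options.setRunner(FlinkRunner.class);
        options.setParallelism(18);  // was 50 on the first submission
        options.setNumberOfExecutionRetries(2);
        options.setExternalizedCheckpointsEnabled(true);
        options.setRetainExternalizedCheckpointsOnCancellation(true);

        Pipeline pipeline = Pipeline.create(options);
        // ... build the (stateless) pipeline here ...
        pipeline.run();
      }
    }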


We didn't set maxParallelism in our Beam application; we only specify
parallelism.
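
If it matters, pinning it explicitly would just be one more line in the sketch
above (hypothetical value; we do not currently do this):

    // Hypothetical: pin Flink's maxParallelism (key-group count) explicitly;
    // today we leave it unset and rely on Flink's default.
    options.setMaxParallelism(128);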

Beam version - 2.19
Flink version - 1.9

Any suggestions/help would be appreciated.


Thanks
Sandeep Kathula
