Re: Dynamic resource allocation for structured streaming [SPARK-24815]

Adam Hobbs Tue, 16 Jan 2024 21:23:28 -0800

Hi,

This is my first time using the dev mailing list so I hope this is the correct 
way to do it.


I would like to lend my support to this proposal and offer my experiences as a 
consumer of Spark, and specifically Spark Structured Streaming (SSS). I am more 
of a cloud infrastructure DevOps engineer than a Spark/Scala coder.

Over the last couple of years I have been a member of a team that has built a 
banking application on top of SSS, Kafka and microservices.  We currently run 
about 40 SSS apps that run 24x7.  The load on the jobs fluctuates throughout 
the day based on customer activity, and overnight there is a large amount of 
data that comes from core banking batch runs.

We have been down the path of trying to make dynamic resource allocation (DRA) 
work within our Spark infrastructure, and it has taken a long time to properly 
understand that the existing DRA mechanisms in Spark are mostly useless for 
SSS.  We chased dynamic allocation for some time until we finally realised it 
is focussed on batch jobs and would not work properly with our SSS jobs.  
Documentation relating to SSS and DRA is sparse to non-existent, and it was not 
clear at first that the DRA material that is well documented isn't relevant to 
SSS.  Most of our jobs have enough data flow that they never hit the idle 
timeout that governs standard DRA.  Those that do have low data flow tend to 
cause cluster flapping, as scaling takes longer than processing the data.

Eventually we landed on the best stability and performance compromise by 
completely disabling all DRA and deploying our SSS apps at a static size whose 
resourcing can cope with daily peaks and the overnight batch load.  Obviously 
this means that for much of the day the deployed apps are heavily 
over-provisioned.

Proper DRA that is built to work with SSS would be a massive money saver for us.

To me it seems that Pavan has a very good understanding of the same sort of 
issues that we have found and seems to have a working solution (I'm sure I read 
that he has his code in place and working successfully for his organisation).

I think it would be a great thing to get some form of DRA in place for SSS, 
even if it is rudimentary, as it would be a definite step up from what is 
essentially zero support for 24x7-style SSS apps.

If there is more that I can do to support this initiative and get this code 
included in an official Spark release, please let me know.


Regards,

Adam Hobbs

