Hi,
This is my first time using the dev mailing list, so I hope this is the correct way to do it.
I would like to lend my support to this proposal and offer my experience as a consumer of Spark, and specifically of Spark Structured Streaming (SSS). I am more of a cloud infrastructure DevOps engineer than a Spark/Scala coder. Over the last couple of years I have been a member of a team that has built a banking application on top of SSS, Kafka and microservices. We currently run about 40 SSS apps, 24x7. The load on the jobs fluctuates throughout the day based on customer activity, and overnight a large amount of data arrives from core banking batch runs.

We have been down the path of trying to make dynamic resource allocation (DRA) work within our Spark infrastructure, and it took us a long time to properly understand that the existing DRA mechanisms in Spark are mostly useless for SSS. We chased dynamic allocation for some time until we finally realised that it is focussed on batch jobs and would not work properly with our SSS jobs. Documentation relating to SSS and DRA is sparse to non-existent, and it was not at first clear that the DRA material that is well documented isn't relevant to SSS. Most of our jobs have enough data flow that they never hit the idle timeout that governs standard DRA. Those that do have low data flow would tend to cause cluster flapping, as scaling would take longer than it would take to process the data.

Eventually we landed on the best compromise between stability and performance by disabling DRA completely and deploying our SSS apps at a static size that can cope with both the daily peaks and the overnight batch load. Obviously this means that for much of the day the deployed apps are heavily over-provisioned. Proper DRA that is built to work with SSS would be a massive money saver for us.
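To make the idle-timeout point concrete, here is a toy Python model. It is my own simplification and not Spark's allocation code: the function names and the busy-window representation are hypothetical, and the only real Spark detail is the 60 second default of spark.dynamicAllocation.executorIdleTimeout. It just checks whether an executor ever sits idle long enough between micro-batches to be released.

```python
# Toy model only: a simplification for illustration, not Spark's allocation logic.
# 60s is the documented default of spark.dynamicAllocation.executorIdleTimeout.
DEFAULT_IDLE_TIMEOUT_S = 60.0


def longest_idle_gap(busy_windows):
    """Return the longest idle gap (seconds) between consecutive busy windows.

    busy_windows is a list of (start, end) times in seconds, sorted by start,
    during which a hypothetical executor is running micro-batch tasks.
    """
    gaps = [
        next_start - prev_end
        for (_, prev_end), (next_start, _) in zip(busy_windows, busy_windows[1:])
    ]
    return max(gaps, default=0.0)


def would_release_executor(busy_windows, idle_timeout_s=DEFAULT_IDLE_TIMEOUT_S):
    """True if the executor is ever idle for at least the idle timeout."""
    return longest_idle_gap(busy_windows) >= idle_timeout_s


# A steady stream: a 4s micro-batch every 10s for ten minutes (made-up numbers).
steady = [(float(t), float(t) + 4.0) for t in range(0, 600, 10)]

# A quiet stream: two 4s micro-batches two minutes apart (made-up numbers).
quiet = [(0.0, 4.0), (120.0, 124.0)]

print(would_release_executor(steady))  # idle gaps are 6s, well under the timeout
print(would_release_executor(quiet))   # idle gap is 116s, over the timeout
```

In this model the steady job never gives an executor back, no matter how light each micro-batch is, which matches what we saw. The quiet job does release executors, which is the case where we ran into flapping.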
To me it seems that Pavan has a very good understanding of the same sort of issues that we have found, and he seems to have a working solution (I'm sure I read that he has his code in place and working successfully for his organisation). I think it would be a great thing to get some form of DRA in place for SSS, even if it is rudimentary, as it would be a definite step up from what is essentially zero support for 24x7-style SSS apps.

If there is more that I can do to support this initiative and get this code included in an official Spark release, please let me know.

Regards,
Adam Hobbs
Bendigo and Adelaide Bank Limited