Hi team, I'm the founder of kube-batch/Volcano, and I'm excited to hear that the Spark community also has such requirements :)
Volcano provides several features for batch workloads, e.g. fair-share scheduling, queues, reservation, and preemption/reclaim. It has been used with Spark in several production environments; if useful, I can give an overview of Volcano's features and those use cases :)

-- Klaus

On Wed, Jun 23, 2021 at 11:26 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Please allow me to express a different point of view on this roadmap.
>
> I believe that, from a technical point of view, spending time, effort and
> talent on batch scheduling on Kubernetes could be rewarding. However, I
> doubt whether such an approach, and the so-called democratization of Spark
> on whatever platform, really should be a major focus.
>
> Having worked on Google Dataproc <https://cloud.google.com/dataproc> (a fully
> managed and highly scalable service for running Apache Spark, Hadoop and,
> more recently, other artefacts) for the past two years, and on Spark on
> Kubernetes on-premise, I have come to the conclusion that Spark is not a
> beast that one can fully commoditize the way one can with
> Zookeeper, Kafka etc. There is always a struggle to make some niche areas
> of Spark, like Spark Structured Streaming (SSS), work seamlessly and
> effortlessly on these commercial platforms with whatever as a Service.
>
> Moreover, Spark (and I stand corrected) already has a lot of resiliency
> and redundancy built in from the ground up. It is truly an enterprise-class
> product (requiring enterprise-class support) that will be difficult to
> commoditize with Kubernetes while expecting the same performance. After all,
> Kubernetes is aimed at efficient resource sharing and potential cost savings
> for the mass market.
> In short, I can see that commercial enterprises will work on
> these platforms, but maybe the great talents on the dev team should focus on
> things like the perceived limitation of SSS in dealing with chains of
> aggregations (if I am correct, this is not yet supported on streaming datasets).
>
> These are my opinions, not facts :)
>
> view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
> On Fri, 18 Jun 2021 at 23:18, Holden Karau <hol...@pigscanfly.ca> wrote:
>
>> I think these approaches are good, but there are limitations (e.g. dynamic
>> scaling) without us making changes inside of the Spark Kube scheduler.
>>
>> Certainly, whichever scheduler extensions we add support for, we should
>> collaborate with the people developing those extensions insofar as they are
>> interested. The first place I checked was #sig-scheduling, which is
>> fairly quiet on the Kubernetes Slack, but if there are more places to look
>> for folks interested in batch scheduling on Kubernetes we should definitely
>> give it a shot :)
>>
>> On Fri, Jun 18, 2021 at 1:41 AM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Regarding your point, and I quote:
>>>
>>> ".. I know that one of the Spark on Kube operators
>>> supports volcano/kube-batch so I was thinking that might be a place I would
>>> start exploring..."
>>>
>>> There seems to be ongoing work on, say, Volcano as part of the Cloud Native
>>> Computing Foundation <https://cncf.io/> (CNCF).
>>> For example through
>>> https://github.com/volcano-sh/volcano
>>>
>>> There may be value in collaborating with such groups through CNCF in
>>> order to have a collective approach to such work. There also seems to be
>>> some work on Integration of Spark with Volcano for Batch Scheduling
>>> <https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/volcano-integration.md>.
>>>
>>> What is not very clear is the degree of progress of these projects. Could
>>> you elaborate on the KPIs for each of these projects and on where
>>> you think your contributions are going to be?
>>>
>>> HTH,
>>>
>>> Mich
>>>
>>> On Fri, 18 Jun 2021 at 00:44, Holden Karau <hol...@pigscanfly.ca> wrote:
>>>
>>>> Hi Folks,
>>>>
>>>> I'm continuing my adventures to make Spark on containers party, and I
>>>> was wondering whether folks have experience with the different batch
>>>> scheduler options and which they prefer. I was thinking that, so we can
>>>> better support dynamic allocation, it might make sense for us to
>>>> support using different schedulers, and I wanted to see if there are
>>>> any that the community is more interested in.
>>>>
>>>> I know that one of the Spark on Kube operators supports
>>>> volcano/kube-batch, so I was thinking that might be a place I start
>>>> exploring, but I also want to be open to other schedulers that folks
>>>> might be interested in.
>>>>
>>>> Cheers,
>>>>
>>>> Holden :)
>>>>
>>>> --
>>>> Twitter: https://twitter.com/holdenkarau
>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>> https://amzn.to/2MaRAG9
>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau