> Is this the correct link for integrating Volcano with Spark?

Yes, that is the Kubernetes operator style of integrating Volcano. If you just want to use spark-submit to submit a job with native Volcano support, see [2] for reference.
[1] https://github.com/huawei-cloudnative/spark/commit/6c1f37525f026353eaead34216d47dad653f13a4

Regards,
Yikun

On Mon, 28 Jun 2021 at 18:03, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Hi Yikun,
>
> Is this the correct link for integrating Volcano with Spark?
>
> spark-on-k8s-operator/volcano-integration.md at master ·
> GoogleCloudPlatform/spark-on-k8s-operator · GitHub
> <https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/volcano-integration.md>
>
> Thanks
>
> Mich
>
> view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
> On Fri, 25 Jun 2021 at 09:45, Yikun Jiang <yikunk...@gmail.com> wrote:
>
>> Oops, sorry for the wrong link, it should be:
>>
>> We will also prepare to propose an initial design and POC[3] on a shared
>> branch (based on spark master branch) where we can collaborate on it, so I
>> created the spark-volcano[1] org in github to make it happen.
>>
>> [3]
>> https://github.com/huawei-cloudnative/spark/commit/6c1f37525f026353eaead34216d47dad653f13a4
>>
>> Regards,
>> Yikun
>>
>> On Fri, 25 Jun 2021 at 11:53, Yikun Jiang <yikunk...@gmail.com> wrote:
>>
>>> Hi, folks.
>>>
>>> As @Klaus mentioned, we have done some work on Spark on k8s with native
>>> Volcano support. There has also been some production deployment validation
>>> from our partners in China, like JingDong, XiaoHongShu, VIPshop.
>>>
>>> We will also prepare to propose an initial design and POC[3] on a shared
>>> branch (based on spark master branch) where we can collaborate on it, so I
>>> created the spark-volcano[1] org in github to make it happen.
>>>
>>> Please feel free to comment on it [2] if you have any questions or
>>> concerns.
>>>
>>> [1] https://github.com/spark-volcano
>>> [2] https://github.com/spark-volcano/spark/issues/1
>>> [3] https://github.com/huawei-cloudnative/spark/commit/6c1f37525f026353eaead34216d47dad653f13a4
>>>
>>> Regards,
>>> Yikun
>>>
>>> On Fri, 25 Jun 2021 at 00:00, Holden Karau <hol...@pigscanfly.ca> wrote:
>>>
>>>> Hi Mich,
>>>>
>>>> I certainly think making Spark on Kubernetes run well is going to be a
>>>> challenge. However I think, and I could be wrong about this as well, that
>>>> in terms of cluster managers Kubernetes is likely to be our future. Talking
>>>> with people, I don't hear about new standalone, YARN or Mesos deployments of
>>>> Spark, but I do hear about people trying to migrate to Kubernetes.
>>>>
>>>> To be clear, I certainly agree that we need more work on structured
>>>> streaming, but it's important to remember that the Spark developers are not
>>>> all fully interchangeable. We work on the things that we're interested in
>>>> pursuing, so even if structured streaming needs more love, if I'm not super
>>>> interested in structured streaming I'm less likely to work on it. That
>>>> being said, I am certainly spinning up a bit more in the Spark SQL area,
>>>> especially around our data source/connectors, because I can see the need
>>>> there too.
>>>>
>>>> On Wed, Jun 23, 2021 at 8:26 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Please allow me to express a different point of view on this roadmap.
>>>>>
>>>>> I believe that, from a technical point of view, spending time, effort and
>>>>> talent on batch scheduling on Kubernetes could be rewarding. However, if I
>>>>> may say so, I doubt whether such an approach, and the so-called
>>>>> democratization of Spark on whatever platform, should really be a major focus.
>>>>>
>>>>> Having worked on Google Dataproc <https://cloud.google.com/dataproc>
>>>>> (a fully managed and highly scalable service for running Apache
>>>>> Spark, Hadoop and more recently other artefacts) for the past two
>>>>> years, and on Spark on Kubernetes on-premise, I have come to the conclusion
>>>>> that Spark is not a beast that one can fully commoditize the way
>>>>> one can with Zookeeper, Kafka etc. There is always a struggle to make
>>>>> some niche areas of Spark, like Spark Structured Streaming (SSS), work
>>>>> seamlessly and effortlessly on these commercial platforms with whatever as
>>>>> a Service.
>>>>>
>>>>> Moreover, Spark (and I stand to be corrected) already has, from the ground
>>>>> up, a lot of resiliency and redundancy built in. It is truly an enterprise
>>>>> class product (requiring enterprise class support) that will be difficult to
>>>>> commoditize with Kubernetes while expecting the same performance. After all,
>>>>> Kubernetes is aimed at efficient resource sharing and potential cost saving
>>>>> for the mass market. In short, I can see that commercial enterprises will work
>>>>> on these platforms, but maybe the great talent on the dev team should focus on
>>>>> things like the perceived limitation of SSS in dealing with chains of
>>>>> aggregations (if I am correct, this is not yet supported on streaming
>>>>> datasets).
>>>>>
>>>>> These are my opinions and they are not facts, just opinions so to
>>>>> speak :)
>>>>>
>>>>> On Fri, 18 Jun 2021 at 23:18, Holden Karau <hol...@pigscanfly.ca> wrote:
>>>>>
>>>>>> I think these approaches are good, but there are limitations (e.g.
>>>>>> dynamic scaling) without us making changes inside of the Spark Kube
>>>>>> scheduler.
>>>>>>
>>>>>> Certainly, whichever scheduler extensions we add support for, we should
>>>>>> collaborate with the people developing those extensions insofar as they are
>>>>>> interested. The first place I checked was #sig-scheduling, which is
>>>>>> fairly quiet on the Kubernetes Slack, but if there are more places to look
>>>>>> for folks interested in batch scheduling on Kubernetes we should definitely
>>>>>> give it a shot :)
>>>>>>
>>>>>> On Fri, Jun 18, 2021 at 1:41 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Regarding your point and I quote
>>>>>>>
>>>>>>> ".. I know that one of the Spark on Kube operators
>>>>>>> supports volcano/kube-batch so I was thinking that might be a place I would
>>>>>>> start exploring..."
>>>>>>>
>>>>>>> There seems to be ongoing work on, say, Volcano as part of the Cloud
>>>>>>> Native Computing Foundation <https://cncf.io/> (CNCF). For example
>>>>>>> through https://github.com/volcano-sh/volcano
>>>>>>>
>>>>>>> There may be value-add in collaborating with such groups through
>>>>>>> CNCF in order to have a collective approach to such work. There also seems
>>>>>>> to be some work on Integration of Spark with Volcano for Batch Scheduling.
>>>>>>> <https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/volcano-integration.md>
>>>>>>>
>>>>>>> What is not very clear is the degree of progress of these projects.
>>>>>>> You may be kind enough to elaborate on the KPIs for each of these projects
>>>>>>> and where you think your contributions are going to be.
>>>>>>>
>>>>>>> HTH,
>>>>>>>
>>>>>>> Mich
>>>>>>>
>>>>>>> On Fri, 18 Jun 2021 at 00:44, Holden Karau <hol...@pigscanfly.ca> wrote:
>>>>>>>
>>>>>>>> Hi Folks,
>>>>>>>>
>>>>>>>> I'm continuing my adventures to make Spark on containers party and I
>>>>>>>> was wondering if folks have experience with the different batch
>>>>>>>> scheduler options that they prefer? I was thinking that, so that we can
>>>>>>>> better support dynamic allocation, it might make sense for us to
>>>>>>>> support using different schedulers, and I wanted to see if there are
>>>>>>>> any that the community is more interested in.
>>>>>>>>
>>>>>>>> I know that one of the Spark on Kube operators supports
>>>>>>>> volcano/kube-batch so I was thinking that might be a place I start
>>>>>>>> exploring but also want to be open to other schedulers that folks
>>>>>>>> might be interested in.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> Holden :)
>>>>>>>>
>>>>>>>> --
>>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>>>>>> https://amzn.to/2MaRAG9
>>>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org