Re: Spark on Kubernetes scheduler variety

Yikun Jiang Mon, 28 Jun 2021 23:33:15 -0700

> Is this the correct link for integrating Volcano with Spark?

Yes, that is the Kubernetes operator style of integrating Volcano. If you
just want to use the spark-submit style to submit a job with native Volcano
support, see [1] for reference.


[1]
https://github.com/huawei-cloudnative/spark/commit/6c1f37525f026353eaead34216d47dad653f13a4

Regards,
Yikun


On Mon, 28 Jun 2021 at 18:03, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Hi Yikun,
>
> Is this the correct link for integrating Volcano with Spark?
>
> spark-on-k8s-operator/volcano-integration.md at master ·
> GoogleCloudPlatform/spark-on-k8s-operator · GitHub
> <https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/volcano-integration.md>
>
> Thanks
>
>
> Mich
>
>
>    view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Fri, 25 Jun 2021 at 09:45, Yikun Jiang <yikunk...@gmail.com> wrote:
>
>> Oops, sorry for the wrong link; it should be:
>>
>> We are also preparing to propose an initial design and POC [3] on a shared
>> branch (based on the Spark master branch) where we can collaborate, so I
>> created the spark-volcano [1] org on GitHub to make that happen.
>>
>> [3]
>> https://github.com/huawei-cloudnative/spark/commit/6c1f37525f026353eaead34216d47dad653f13a4
>>
>>
>> Regards,
>> Yikun
>>
>>
>> On Fri, 25 Jun 2021 at 11:53, Yikun Jiang <yikunk...@gmail.com> wrote:
>>
>>> Hi, folks.
>>>
>>> As @Klaus mentioned, we have done some work on Spark on k8s with native
>>> Volcano support. There has also been some production deployment
>>> validation from our partners in China, such as JingDong, XiaoHongShu and
>>> VIPshop.
>>>
>>> We are also preparing to propose an initial design and POC [3] on a shared
>>> branch (based on the Spark master branch) where we can collaborate, so I
>>> created the spark-volcano [1] org on GitHub to make that happen.
>>>
>>> Please feel free to comment on it [2] if you have any questions or
>>> concerns.
>>>
>>> [1] https://github.com/spark-volcano
>>> [2] https://github.com/spark-volcano/spark/issues/1
>>> [3]
>>> https://github.com/huawei-cloudnative/spark/commit/6c1f37525f026353eaead34216d47dad653f13a4
>>>
>>>
>>> Regards,
>>> Yikun
>>>
>>> On Fri, 25 Jun 2021 at 00:00, Holden Karau <hol...@pigscanfly.ca> wrote:
>>>
>>>> Hi Mich,
>>>>
>>>> I certainly think making Spark on Kubernetes run well is going to be a
>>>> challenge. However, I think (and I could be wrong about this as well)
>>>> that, in terms of cluster managers, Kubernetes is likely to be our future.
>>>> Talking with people, I don't hear about new standalone, YARN or Mesos
>>>> deployments of Spark, but I do hear about people trying to migrate to
>>>> Kubernetes.
>>>>
>>>> To be clear, I certainly agree that we need more work on Structured
>>>> Streaming, but it's important to remember that the Spark developers are
>>>> not all fully interchangeable. We work on the things we're interested in
>>>> pursuing, so even if Structured Streaming needs more love, if I'm not
>>>> super interested in it I'm less likely to work on it. That being said, I
>>>> am certainly spinning up a bit more in the Spark SQL area, especially
>>>> around our data sources/connectors, because I can see the need there too.
>>>>
>>>> On Wed, Jun 23, 2021 at 8:26 AM Mich Talebzadeh <
>>>> mich.talebza...@gmail.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> Please allow me to diverge and express a different point of view on
>>>>> this roadmap.
>>>>>
>>>>>
>>>>> I believe that, from a technical point of view, spending time, effort
>>>>> and talent on batch scheduling on Kubernetes could be rewarding.
>>>>> However, if I may say so, I doubt whether such an approach, and the
>>>>> so-called democratization of Spark on whatever platform, should really
>>>>> be a major focus.
>>>>>
>>>>> Having worked on Google Dataproc <https://cloud.google.com/dataproc>
>>>>> (a fully managed and highly scalable service for running Apache Spark,
>>>>> Hadoop and, more recently, other artefacts) for the past two years, and
>>>>> on Spark on Kubernetes on-premise, I have come to the conclusion that
>>>>> Spark is not a beast that one can fully commoditize, as one can with
>>>>> Zookeeper, Kafka, etc. There is always a struggle to make some niche
>>>>> areas of Spark, like Spark Structured Streaming (SSS), work seamlessly
>>>>> and effortlessly on these commercial whatever-as-a-Service platforms.
>>>>>
>>>>>
>>>>> Moreover, Spark (and I stand to be corrected) already has a lot of
>>>>> resiliency and redundancy built in from the ground up. It is truly an
>>>>> enterprise-class product (requiring enterprise-class support) that will
>>>>> be difficult to commoditize with Kubernetes while expecting the same
>>>>> performance. After all, Kubernetes is aimed at efficient resource
>>>>> sharing and potential cost saving for the mass market. In short, I can
>>>>> see that commercial enterprises will work on these platforms, but maybe
>>>>> the great talents on the dev team should focus on things like the
>>>>> perceived limitation of SSS in dealing with chains of aggregations (if I
>>>>> am correct, these are not yet supported on streaming datasets).
>>>>>
>>>>>
>>>>> These are my opinions and they are not facts, just opinions so to
>>>>> speak :)
>>>>>
>>>>>
>>>>>
>>>>> On Fri, 18 Jun 2021 at 23:18, Holden Karau <hol...@pigscanfly.ca>
>>>>> wrote:
>>>>>
>>>>>> I think these approaches are good, but there are limitations (e.g.
>>>>>> dynamic scaling) without us making changes inside the Spark Kube
>>>>>> scheduler.
>>>>>>
>>>>>> Certainly, whichever scheduler extensions we add support for, we should
>>>>>> collaborate with the people developing those extensions insofar as they
>>>>>> are interested. The first place I checked was #sig-scheduling, which is
>>>>>> fairly quiet on the Kubernetes Slack, but if there are more places to
>>>>>> look for folks interested in batch scheduling on Kubernetes, we should
>>>>>> definitely give them a shot :)
>>>>>>
>>>>>> On Fri, Jun 18, 2021 at 1:41 AM Mich Talebzadeh <
>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Regarding your point, and I quote:
>>>>>>>
>>>>>>> "... I know that one of the Spark on Kube operators supports
>>>>>>> volcano/kube-batch so I was thinking that might be a place I would
>>>>>>> start exploring..."
>>>>>>>
>>>>>>> There seems to be ongoing work on, say, Volcano as part of the Cloud
>>>>>>> Native Computing Foundation <https://cncf.io/> (CNCF), for example
>>>>>>> through https://github.com/volcano-sh/volcano
>>>>>>>
>>>>>>> There may be value in collaborating with such groups through CNCF in
>>>>>>> order to take a collective approach to such work. There also seems to
>>>>>>> be some work on Integration of Spark with Volcano for Batch Scheduling.
>>>>>>> <https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/volcano-integration.md>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> What is not very clear is the degree of progress of these projects.
>>>>>>> Perhaps you could elaborate on the KPIs for each of these projects and
>>>>>>> where you think your contributions are going to be.
>>>>>>>
>>>>>>>
>>>>>>> HTH,
>>>>>>>
>>>>>>>
>>>>>>> Mich
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, 18 Jun 2021 at 00:44, Holden Karau <hol...@pigscanfly.ca>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Folks,
>>>>>>>>
>>>>>>>> I'm continuing my adventures to make Spark on containers party, and I
>>>>>>>> was wondering if folks have experience with the different batch
>>>>>>>> scheduler options and which they prefer. I was thinking that, to better
>>>>>>>> support dynamic allocation, it might make sense for us to support using
>>>>>>>> different schedulers, and I wanted to see if there are any that the
>>>>>>>> community is more interested in.
>>>>>>>>
>>>>>>>> I know that one of the Spark on Kube operators supports
>>>>>>>> volcano/kube-batch, so I was thinking that might be a place I would
>>>>>>>> start exploring, but I also want to be open to other schedulers that
>>>>>>>> folks might be interested in.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> Holden :)
>>>>>>>>
>>>>>>>> --
>>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>>>>>> https://amzn.to/2MaRAG9
>>>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
