Re: Spark on Kubernetes scheduler variety

2021-07-08 Thread Mich Talebzadeh
Splendid.

Please invite me to the next meeting

mich.talebza...@gmail.com

Timezone London, UK  *GMT+1*

Thanks,


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Thu, 8 Jul 2021 at 19:04, Holden Karau  wrote:

> Hi Y'all,
>
> We had an initial meeting, which went well; we got some more context around
> Volcano and its near-term roadmap. We talked about the impact of scheduler
> deadlocking and some ways that we could potentially improve integration
> from the Spark and Volcano sides, respectively. I'm going to start
> creating some sub-issues under
> https://issues.apache.org/jira/browse/SPARK-36057
>
> If anyone is interested in joining the next meeting, please reach out and
> I'll send an e-mail around to try to schedule a recurring sync that works
> for folks.
>
> Cheers,
>
> Holden
>

Re: Spark on Kubernetes scheduler variety

2021-07-08 Thread Holden Karau
Hi Y'all,

We had an initial meeting, which went well; we got some more context around
Volcano and its near-term roadmap. We talked about the impact of scheduler
deadlocking and some ways that we could potentially improve integration
from the Spark and Volcano sides, respectively. I'm going to start
creating some sub-issues under
https://issues.apache.org/jira/browse/SPARK-36057

If anyone is interested in joining the next meeting, please reach out and
I'll send an e-mail around to try to schedule a recurring sync that works
for folks.

Cheers,

Holden

On Thu, Jun 24, 2021 at 8:56 AM Holden Karau  wrote:

> That's awesome, I'm just starting to get context around Volcano but maybe
> we can schedule an initial meeting for all of us interested in pursuing
> this to get on the same page.
>
> On Wed, Jun 23, 2021 at 6:54 PM Klaus Ma  wrote:
>
>> Hi team,
>>
>> I'm the kube-batch/Volcano founder, and I'm excited to hear that the Spark
>> community also has such requirements :)
>>
>> Volcano provides several features for batch workloads, e.g. fair-share,
>> queue, reservation, preemption/reclaim and so on.
>> It has been used in several production environments with Spark; if
>> necessary, I can give an overall introduction to Volcano's features and
>> those use cases :)
>>
>> -- Klaus
>>
>> On Wed, Jun 23, 2021 at 11:26 PM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>>
>>>
>>> Please allow me to express a different point of view on
>>> this roadmap.
>>>
>>>
>>> I believe that, from a technical point of view, spending time, effort and
>>> talent on batch scheduling on Kubernetes could be rewarding. However, if I
>>> may say so, I doubt whether such an approach, and the so-called
>>> democratization of Spark on whatever platform, really should be of great focus.
>>>
>>> Having worked on Google Dataproc (a fully managed and highly scalable
>>> service for running Apache Spark, Hadoop and more recently other
>>> artefacts) for the past two years, and on Spark on Kubernetes
>>> on-premise, I have come to the conclusion that Spark is not a
>>> beast that one can fully commoditize in the way one can with
>>> Zookeeper, Kafka etc. There is always a struggle to make some niche areas
>>> of Spark, like Spark Structured Streaming (SSS), work seamlessly and
>>> effortlessly on these commercial platforms with whatever-as-a-Service.
>>>
>>>
>>> Moreover, Spark (and I stand to be corrected) already has a lot of
>>> resiliency and redundancy built in from the ground up. It is truly an
>>> enterprise-class product (requiring enterprise-class support) that will be
>>> difficult to commoditize with Kubernetes while expecting the same
>>> performance. After all, Kubernetes is aimed at efficient resource sharing
>>> and potential cost saving for the mass market. In short, I can see that
>>> commercial enterprises will work on these platforms, but maybe the great
>>> talents on the dev team should focus on stuff like the perceived limitation
>>> of SSS in dealing with chains of aggregations (if I am correct, these are
>>> not yet supported on streaming datasets).
>>>
>>>
>>> These are my opinions and they are not facts, just opinions so to speak
>>> :)
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Fri, 18 Jun 2021 at 23:18, Holden Karau  wrote:
>>>
>>>> I think these approaches are good, but there are limitations (e.g.
>>>> dynamic scaling) without us making changes inside of the Spark Kube
>>>> scheduler.
>>>>
>>>> Certainly, whichever scheduler extensions we add support for, we should
>>>> collaborate with the people developing those extensions insofar as they are
>>>> interested. The first place I checked was #sig-scheduling, which is
>>>> fairly quiet on the Kubernetes Slack, but if there are more places to look
>>>> for folks interested in batch scheduling on Kubernetes we should definitely
>>>> give it a shot :)
>>>>
>>>> On Fri, Jun 18, 2021 at 1:41 AM Mich Talebzadeh <
>>>> mich.talebza...@gmail.com> wrote:

>>>>> Hi,
>>>>>
>>>>> Regarding your point, and I quote:
>>>>>
>>>>> ".. I know that one of the Spark on Kube operators
>>>>> supports volcano/kube-batch so I was thinking that might be a place I would
>>>>> start exploring..."
>>>>>
>>>>> There seems to be ongoing work on, say, Volcano as part of the Cloud
>>>>> Native Computing Foundation (CNCF), for example
>>>>> through https://github.com/volcano-sh/volcano
>>>>>
>>>>> There may be value-add in collaborating with such groups th

Re: Spark on Kubernetes scheduler variety

2021-07-01 Thread Mich Talebzadeh
Thanks. I also have a three-node cluster in my lab running Red Hat 7.6 with
64GB of RAM etc. However, I doubt whether minikube will be useful.

If we can get a Google Kubernetes Engine (GKE) cluster (which is a fully
managed service) on loan from Google, that would be great. It would
take out the hassle of setting up the K8s cluster manually and dealing with
compatibility issues further down the line.

To take this further, I would like to suggest having a discussion here with
Klaus Ma and the other colleagues who represent the Volcano project on the
best way of progressing this. I am a Google Advantage partner, so I can
put such an agreed proposal to the account manager and ask whether Google
will agree to support this R&D work (which, BTW, I think would be beneficial
to both parties), as Google started Kubernetes themselves.


HTH





On Thu, 1 Jul 2021 at 21:04, Holden Karau  wrote:

> I do my own dev work on a personal cluster I have down in Fremont, which
> I’ve got set up using k3sup. I know some devs use minikube (and our
> integration tests can). But yeah, if there was a vendor willing to hand out
> Kube resources, that could simplify our dev cycles.
>

Re: Spark on Kubernetes scheduler variety

2021-07-01 Thread Holden Karau
I do my own dev work on a personal cluster I have down in Fremont, which
I’ve got set up using k3sup. I know some devs use minikube (and our
integration tests can). But yeah, if there was a vendor willing to hand out
Kube resources, that could simplify our dev cycles.

On Thu, Jul 1, 2021 at 12:52 PM Mich Talebzadeh 
wrote:

> Hi,
>
> A rather simple question.
>
> As Kubernetes is a specialist area that takes some effort to set up
> properly, do we have a dev/test bed to conduct development work?
>
> What I am trying to get at is whether, if there is official support for the
> Volcano work, a vendor could provide free cluster usage in exchange for R&D.
> Google themselves, for example?
>
> Thanks,
>
> Mich
>
>
>
>

Re: Spark on Kubernetes scheduler variety

2021-07-01 Thread Mich Talebzadeh
Hi,

A rather simple question.

As Kubernetes is a specialist area that takes some effort to set up
properly, do we have a dev/test bed to conduct development work?

What I am trying to get at is whether, if there is official support for the
Volcano work, a vendor could provide free cluster usage in exchange for R&D.
Google themselves, for example?

Thanks,

Mich








On Thu, 1 Jul 2021 at 05:00, Mich Talebzadeh 
wrote:

> Hi Klaus,
>
> Thanks
>
> https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/issues/1289
>
>
>
>

Re: Spark on Kubernetes scheduler variety

2021-06-30 Thread Mich Talebzadeh
Hi Klaus,

Thanks

https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/issues/1289








On Thu, 1 Jul 2021 at 03:16, Klaus Ma  wrote:

> Hi Mich,
>
> Would you help by opening an issue in the spark-on-k8s-operator repo? We're going
> to submit a PR to update the install steps :)
>
> -- Klaus
>

Re: Spark on Kubernetes scheduler variety

2021-06-30 Thread Klaus Ma
Hi Mich,

Would you help by opening an issue in the spark-on-k8s-operator repo? We're going
to submit a PR to update the install steps :)

-- Klaus

On Wed, Jun 30, 2021 at 12:24 AM Mich Talebzadeh 
wrote:

> Hi Yikun
>
> With reference to
>
>
> https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/volcano-integration.md
>
> While trying to install Volcano, I am getting this error:
>
> helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator
> Error: looks like "http://storage.googleapis.com/kubernetes-charts-incubator" is not a valid
> chart repository or cannot be reached: failed to fetch
> http://storage.googleapis.com/kubernetes-charts-incubator/index.yaml :
> 404 Not Found
>
> Any ideas will be appreciated.
>
> Thanks,
>
> Mich
>
>
>
>
>
>
>
> On Tue, 29 Jun 2021 at 09:14, Mich Talebzadeh 
> wrote:
>
>> Cool, thanks!
>>
>>
>>
>>
>>
>>
>>
>> On Tue, 29 Jun 2021 at 07:33, Yikun Jiang  wrote:
>>
>>> > Is this the correct link for integrating Volcano with Spark?
>>>
>>> Yes, that is the Kubernetes operator style of integrating Volcano. If you
>>> just want to use the spark-submit style to submit a job with native
>>> support, you can see [2] as a reference.
>>>
>>> [1]
>>> https://github.com/huawei-cloudnative/spark/commit/6c1f37525f026353eaead34216d47dad653f13a4
>>>
>>> Regards,
>>> Yikun
>>>
>>>
>>> On Mon, Jun 28, 2021 at 6:03 PM Mich Talebzadeh wrote:
>>>
>>>> Hi Yikun,
>>>>
>>>> Is this the correct link for integrating Volcano with Spark?
>>>>
>>>> spark-on-k8s-operator/volcano-integration.md at master ·
>>>> GoogleCloudPlatform/spark-on-k8s-operator · GitHub
>>>>
>>>>
>>>> Thanks
>>>>
>>>>
>>>> Mich






>>>> On Fri, 25 Jun 2021 at 09:45, Yikun Jiang wrote:
>>>>
>>>>> Oops, sorry for the wrong link, it should be:
>>>>>
>>>>> We are also preparing to propose an initial design and POC [3] on a
>>>>> shared branch (based on the Spark master branch) where we can
>>>>> collaborate, so I created the spark-volcano [1] org on GitHub to make it happen.
>>>>>
>>>>> [3]
>>>>> https://github.com/huawei-cloudnative/spark/commit/6c1f37525f026353eaead34216d47dad653f13a4
>>>>>
>>>>>
>>>>> And
>>>>> Regards,
>>>>> Yikun
>>>>>
>>>>>
>>>>> On Fri, Jun 25, 2021 at 11:53 AM Yikun Jiang wrote:
>
>>>>>> Hi, folks.
>>>>>>
>>>>>> As @Klaus mentioned, we have done some work on Spark on k8s with native
>>>>>> Volcano support. There has also been some production deployment
>>>>>> validation by our partners in China, like JingDong, XiaoHongShu and VIPshop.
>>>>>>
>>>>>> We are also preparing to propose an initial design and POC [3] on a
>>>>>> shared branch (based on the Spark master branch) where we can
>>>>>> collaborate, so I created the spark-volcano [1] org on GitHub to make it happen.
>>>>>>
>>>>>> Please feel free to comment on it [2] if you have any questions or
>>>>>> concerns.
>>>>>>
>>>>>> [1] https://github.com/spark-volcano
>>>>>> [2] https://github.com/spark-volcano/spark/issues/1
>>>>>> [3]
>>>>>> https://github.com/huawei-cloudnative/spark/commit/6c1f37525f026353eaead34216d47dad653f13a4
>>>>>>
>>>>>> Regards,
>>>>>> Yikun
>>>>>>
>>>>>> On Fri, Jun 25, 2021 at 12:00 AM Holden Karau wrote:
>>
>>>>>>> Hi Mich,
>>>>>>>
>>>>>>> I certainly think making Spark on Kubernetes run well is going to be
>>>>>>> a challenge. However I think, and I could be wrong about this as well,
>>>>>>> that in terms of cluster managers Kubernetes is likely to be our future.
>>>>>>> Talking with people I don't hear about new standalone, YARN or mesos
>>>>>>> deploy

Re: Spark on Kubernetes scheduler variety

2021-06-30 Thread Mich Talebzadeh
Hi Michel,

Thanks for the link.

I am familiar with G-Research as I met them in my presentation in London
back in October 2019.

The amanda project sems to create super-scheduling on top of Kubernetes
clusters and I quote:

"Armada is an application to achieve high throughput of run-to-completion
jobs on multiple Kubernetes clusters. It stores queues for users/projects
with pod specifications and creates these pods once there is available
resource in one of the connected Kubernetes clusters."

I believe Volcano is a slightly different beast, as it defines itself as a
"batch system built on Kubernetes":

"Volcano is a batch system built on
Kubernetes. It provides a suite of mechanisms currently missing from
Kubernetes that are commonly required by many classes of batch & elastic
workloads. With the integration with Volcano, Spark application pods can be
scheduled for better scheduling efficiency."

Currently we need to find a valid *link* for installing the Kubernetes
Operator for Apache Spark with Volcano enabled, as I intend to use it.

Any help will be appreciated.

Regards,

Mich







On Tue, 29 Jun 2021 at 19:15, Michel Sumbul  wrote:

> Hi,
>
> Just for info, there is also the scheduler Armada (G-Research/armada).
> It's not well known, but it brings the interesting idea of being able to
> schedule jobs on top of multiple k8s clusters.
> I know that they plan to add more advanced features on top of what they
> have now.
>
> Michel
>
>
> On Tuesday, 29 June 2021, 17:18:25 UTC+1, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>
> Hi Yikun
>
> In reference
>
>
> https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/volcano-integration.md
>
> Trying to install Volcano I am getting this error
>
> helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator
> Error: looks like "http://storage.googleapis.com/kubernetes-charts-incubator"
> is not a valid chart repository or cannot be reached: failed to fetch
> http://storage.googleapis.com/kubernetes-charts-incubator/index.yaml :
> 404 Not Found
>
> Any ideas will be appreciated.
>
> Thanks,
>
> Mich
>
>
>
>
>
>
>
> On Tue, 29 Jun 2021 at 09:14, Mich Talebzadeh 
> wrote:
>
> Cool, thanks!
>
>
>
>
>
>
>
> On Tue, 29 Jun 2021 at 07:33, Yikun Jiang  wrote:
>
> > Is this the correct link for integrating Volcano with Spark?
>
> Yes, it is the Kubernetes operator style of integrating Volcano. And if you
> just want to use the spark-submit style to submit a native support job, you
> can see [1] for reference.
>
> [1]
> https://github.com/huawei-cloudnative/spark/commit/6c1f37525f026353eaead34216d47dad653f13a4
>
> Regards,
> Yikun
>
>
> On Mon, 28 Jun 2021 at 6:03 PM, Mich Talebzadeh wrote:
>
> Hi Yikun,
>
> Is this the correct link for integrating Volcano with Spark?
>
> spark-on-k8s-operator/volcano-integration.md at master ·
> GoogleCloudPlatform/spark-on-k8s-operator · GitHub
> 
>
> Thanks
>
>
> Mich
>
>

Re: Spark on Kubernetes scheduler variety

2021-06-29 Thread Mich Talebzadeh
Hi Yikun

In reference

https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/volcano-integration.md

Trying to install Volcano I am getting this error

helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator
Error: looks like "http://storage.googleapis.com/kubernetes-charts-incubator"
is not a valid chart repository or cannot be reached: failed to fetch
http://storage.googleapis.com/kubernetes-charts-incubator/index.yaml : 404
Not Found

Any ideas will be appreciated.

Thanks,

Mich







On Tue, 29 Jun 2021 at 09:14, Mich Talebzadeh 
wrote:

> Cool, thanks!
>
>
>
>
>
>
>
> On Tue, 29 Jun 2021 at 07:33, Yikun Jiang  wrote:
>
>> > Is this the correct link for integrating Volcano with Spark?
>>
>> Yes, it is the Kubernetes operator style of integrating Volcano. And if you
>> just want to use the spark-submit style to submit a native support job, you
>> can see [1] for reference.
>>
>> [1]
>> https://github.com/huawei-cloudnative/spark/commit/6c1f37525f026353eaead34216d47dad653f13a4
>>
>> Regards,
>> Yikun
>>
>>
>> On Mon, 28 Jun 2021 at 6:03 PM, Mich Talebzadeh wrote:
>>
>>> Hi Yikun,
>>>
>>> Is this the correct link for integrating Volcano with Spark?
>>>
>>> spark-on-k8s-operator/volcano-integration.md at master ·
>>> GoogleCloudPlatform/spark-on-k8s-operator · GitHub
>>> 
>>>
>>> Thanks
>>>
>>>
>>> Mich
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Fri, 25 Jun 2021 at 09:45, Yikun Jiang  wrote:
>>>
 Oops, sorry for the error link, it should be:

 We will also prepare to propose an initial design and POC[3] on a
 shared branch (based on spark master branch) where we can collaborate on
 it, so I created the spark-volcano[1] org in github to make it happen.

 [3]
 https://github.com/huawei-cloudnative/spark/commit/6c1f37525f026353eaead34216d47dad653f13a4


 And
 Regards,
 Yikun


 On Fri, 25 Jun 2021 at 11:53 AM, Yikun Jiang wrote:

> Hi, folks.
>
> As @Klaus mentioned, we have done some work on Spark on k8s with Volcano
> native support. There has also been some production deployment
> validation from our partners in China, like JingDong, XiaoHongShu, VIPshop.
>
> We will also prepare to propose an initial design and POC[3] on a
> shared branch (based on spark master branch) where we can collaborate on
> it, so I created the spark-volcano[1] org in github to make it happen.
>
> Pls feel free to comment on it [2] if you guys have any questions or
> concerns.
>
> [1] https://github.com/spark-volcano
> [2] https://github.com/spark-volcano/spark/issues/1
> [3]
> https://github.com/huawei-cloudnative/spark/commit/6c1f37525f026353eaead34216d47dad653f13a4
>
>


> Regards,
> Yikun
>
> On Fri, 25 Jun 2021 at 12:00 AM, Holden Karau wrote:
>
>> Hi Mich,
>>
>> I certainly think making Spark on Kubernetes run well is going to be
>> a challenge. However I think, and I could be wrong about this as well,
>> that in terms of cluster managers Kubernetes is likely to be our future.
>> Talking with people I don't hear about new standalone, YARN or mesos
>> deployments of Spark, but I do hear about people trying to migrate to
>> Kubernetes.
>>
>> To be clear I certainly agree that we need more work on structured
>> streaming, but it's important to remember that the Spark developers are
>> not all fully interchangeable, we work on the things that we're interested in
>> pursuing so even if structured streaming ne

Re: Spark on Kubernetes scheduler variety

2021-06-29 Thread Mich Talebzadeh
Cool, thanks!







On Tue, 29 Jun 2021 at 07:33, Yikun Jiang  wrote:

> > Is this the correct link for integrating Volcano with Spark?
>
> Yes, it is the Kubernetes operator style of integrating Volcano. And if you
> just want to use the spark-submit style to submit a native support job, you
> can see [1] for reference.
>
> [1]
> https://github.com/huawei-cloudnative/spark/commit/6c1f37525f026353eaead34216d47dad653f13a4
>
> Regards,
> Yikun
>
>
> On Mon, 28 Jun 2021 at 6:03 PM, Mich Talebzadeh wrote:
>
>> Hi Yikun,
>>
>> Is this the correct link for integrating Volcano with Spark?
>>
>> spark-on-k8s-operator/volcano-integration.md at master ·
>> GoogleCloudPlatform/spark-on-k8s-operator · GitHub
>> 
>>
>> Thanks
>>
>>
>> Mich
>>
>>
>>
>>
>>
>>
>> On Fri, 25 Jun 2021 at 09:45, Yikun Jiang  wrote:
>>
>>> Oops, sorry for the error link, it should be:
>>>
>>> We will also prepare to propose an initial design and POC[3] on a shared
>>> branch (based on spark master branch) where we can collaborate on it, so I
>>> created the spark-volcano[1] org in github to make it happen.
>>>
>>> [3]
>>> https://github.com/huawei-cloudnative/spark/commit/6c1f37525f026353eaead34216d47dad653f13a4
>>>
>>>
>>> And
>>> Regards,
>>> Yikun
>>>
>>>
>>> On Fri, 25 Jun 2021 at 11:53 AM, Yikun Jiang wrote:
>>>
 Hi, folks.

 As @Klaus mentioned, we have done some work on Spark on k8s with Volcano
 native support. There has also been some production deployment validation
 from our partners in China, like JingDong, XiaoHongShu, VIPshop.

 We will also prepare to propose an initial design and POC[3] on a
 shared branch (based on spark master branch) where we can collaborate on
 it, so I created the spark-volcano[1] org in github to make it happen.

 Pls feel free to comment on it [2] if you guys have any questions or
 concerns.

 [1] https://github.com/spark-volcano
 [2] https://github.com/spark-volcano/spark/issues/1
 [3]
 https://github.com/huawei-cloudnative/spark/commit/6c1f37525f026353eaead34216d47dad653f13a4


>>>
>>>
 Regards,
 Yikun

 On Fri, 25 Jun 2021 at 12:00 AM, Holden Karau wrote:

> Hi Mich,
>
> I certainly think making Spark on Kubernetes run well is going to be a
> challenge. However I think, and I could be wrong about this as well, that
> in terms of cluster managers Kubernetes is likely to be our future.
> Talking with people I don't hear about new standalone, YARN or mesos
> deployments of Spark, but I do hear about people trying to migrate to
> Kubernetes.
>
> To be clear I certainly agree that we need more work on structured
> streaming, but it's important to remember that the Spark developers are not
> all fully interchangeable, we work on the things that we're interested in
> pursuing so even if structured streaming needs more love if I'm not super
> interested in structured streaming I'm less likely to work on it. That
> being said I am certainly spinning up a bit more in the Spark SQL area
> especially around our data source/connectors because I can see the need
> there too.
>
> On Wed, Jun 23, 2021 at 8:26 AM Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>>
>>
>> Please allow me to be diverse and express a different point of view
>> on this roadmap.
>>
>>
>> I believe from a technical point of view spending time and effort
>> plus talent on batch scheduling on Kubernetes could be rewarding.
>> However, if I may say I doubt whether such an approach and the so-called
>> democratization of Spark on whatever platform really should be of
>> great focus.
>>
>> Having worked on Google Dataproc (a fully managed and highly scalable
>> service for running Apache Spark, Hadoop and more recently other
>> artefacts) for the past two years, and Spark on Kubernetes on-premise,
>> I have come to the conclusion that Sp

Re: Spark on Kubernetes scheduler variety

2021-06-28 Thread Yikun Jiang
> Is this the correct link for integrating Volcano with Spark?

Yes, it is the Kubernetes operator style of integrating Volcano. And if you
just want to use the spark-submit style to submit a native support job, you
can see [1] for reference.

[1]
https://github.com/huawei-cloudnative/spark/commit/6c1f37525f026353eaead34216d47dad653f13a4

Regards,
Yikun


On Mon, 28 Jun 2021 at 6:03 PM, Mich Talebzadeh wrote:

> Hi Yikun,
>
> Is this the correct link for integrating Volcano with Spark?
>
> spark-on-k8s-operator/volcano-integration.md at master ·
> GoogleCloudPlatform/spark-on-k8s-operator · GitHub
> 
>
> Thanks
>
>
> Mich
>
>
>
>
>
>
> On Fri, 25 Jun 2021 at 09:45, Yikun Jiang  wrote:
>
>> Oops, sorry for the error link, it should be:
>>
>> We will also prepare to propose an initial design and POC[3] on a shared
>> branch (based on spark master branch) where we can collaborate on it, so I
>> created the spark-volcano[1] org in github to make it happen.
>>
>> [3]
>> https://github.com/huawei-cloudnative/spark/commit/6c1f37525f026353eaead34216d47dad653f13a4
>>
>>
>> And
>> Regards,
>> Yikun
>>
>>
>> On Fri, 25 Jun 2021 at 11:53 AM, Yikun Jiang wrote:
>>
>>> Hi, folks.
>>>
>>> As @Klaus mentioned, we have done some work on Spark on k8s with Volcano
>>> native support. There has also been some production deployment validation
>>> from our partners in China, like JingDong, XiaoHongShu, VIPshop.
>>>
>>> We will also prepare to propose an initial design and POC[3] on a shared
>>> branch (based on spark master branch) where we can collaborate on it, so I
>>> created the spark-volcano[1] org in github to make it happen.
>>>
>>> Pls feel free to comment on it [2] if you guys have any questions or
>>> concerns.
>>>
>>> [1] https://github.com/spark-volcano
>>> [2] https://github.com/spark-volcano/spark/issues/1
>>> [3]
>>> https://github.com/huawei-cloudnative/spark/commit/6c1f37525f026353eaead34216d47dad653f13a4
>>>
>>>
>>
>>
>>> Regards,
>>> Yikun
>>>
>>> On Fri, 25 Jun 2021 at 12:00 AM, Holden Karau wrote:
>>>
 Hi Mich,

 I certainly think making Spark on Kubernetes run well is going to be a
 challenge. However I think, and I could be wrong about this as well, that
 in terms of cluster managers Kubernetes is likely to be our future. Talking
 with people I don't hear about new standalone, YARN or mesos deployments of
 Spark, but I do hear about people trying to migrate to Kubernetes.

 To be clear I certainly agree that we need more work on structured
 streaming, but it's important to remember that the Spark developers are not
 all fully interchangeable, we work on the things that we're interested in
 pursuing so even if structured streaming needs more love if I'm not super
 interested in structured streaming I'm less likely to work on it. That
 being said I am certainly spinning up a bit more in the Spark SQL area
 especially around our data source/connectors because I can see the need
 there too.

 On Wed, Jun 23, 2021 at 8:26 AM Mich Talebzadeh <
 mich.talebza...@gmail.com> wrote:

>
>
> Please allow me to be diverse and express a different point of view on
> this roadmap.
>
>
> I believe from a technical point of view spending time and effort plus
> talent on batch scheduling on Kubernetes could be rewarding. However, if I
> may say I doubt whether such an approach and the so-called democratization
> of Spark on whatever platform really should be of great focus.
>
> Having worked on Google Dataproc (a fully managed and highly scalable
> service for running Apache Spark, Hadoop and more recently other
> artefacts) for the past two years, and Spark on Kubernetes on-premise,
> I have come to the conclusion that Spark is not a beast that one can
> fully commoditize the way one can with Zookeeper, Kafka etc. There is
> always a struggle to make some niche areas of Spark like Spark
> Structured Streaming (SSS) work seamlessly and effortlessly on these
> commercial platforms with whatever as a Service.
>
>
> Moreover, Spark (and I stand corrected) from the ground up has already
> a lot of resiliency and redundancy built in. It is truly an enterprise
> class product (requires enterprise class support) that will be difficult
> to commoditize with Kubernetes and expect the same performance. After all,
> Ku

Re: Spark on Kubernetes scheduler variety

2021-06-28 Thread Mich Talebzadeh
Hi Yikun,

Is this the correct link for integrating Volcano with Spark?

spark-on-k8s-operator/volcano-integration.md at master ·
GoogleCloudPlatform/spark-on-k8s-operator · GitHub


Thanks


Mich






On Fri, 25 Jun 2021 at 09:45, Yikun Jiang  wrote:

> Oops, sorry for the error link, it should be:
>
> We will also prepare to propose an initial design and POC[3] on a shared
> branch (based on spark master branch) where we can collaborate on it, so I
> created the spark-volcano[1] org in github to make it happen.
>
> [3]
> https://github.com/huawei-cloudnative/spark/commit/6c1f37525f026353eaead34216d47dad653f13a4
>
>
> And
> Regards,
> Yikun
>
>
> On Fri, 25 Jun 2021 at 11:53 AM, Yikun Jiang wrote:
>
>> Hi, folks.
>>
>> As @Klaus mentioned, we have done some work on Spark on k8s with Volcano
>> native support. There has also been some production deployment validation
>> from our partners in China, like JingDong, XiaoHongShu, VIPshop.
>>
>> We will also prepare to propose an initial design and POC[3] on a shared
>> branch (based on spark master branch) where we can collaborate on it, so I
>> created the spark-volcano[1] org in github to make it happen.
>>
>> Pls feel free to comment on it [2] if you guys have any questions or
>> concerns.
>>
>> [1] https://github.com/spark-volcano
>> [2] https://github.com/spark-volcano/spark/issues/1
>> [3]
>> https://github.com/huawei-cloudnative/spark/commit/6c1f37525f026353eaead34216d47dad653f13a4
>>
>>
>
>
>> Regards,
>> Yikun
>>
>> On Fri, 25 Jun 2021 at 12:00 AM, Holden Karau wrote:
>>
>>> Hi Mich,
>>>
>>> I certainly think making Spark on Kubernetes run well is going to be a
>>> challenge. However I think, and I could be wrong about this as well, that
>>> in terms of cluster managers Kubernetes is likely to be our future. Talking
>>> with people I don't hear about new standalone, YARN or mesos deployments of
>>> Spark, but I do hear about people trying to migrate to Kubernetes.
>>>
>>> To be clear I certainly agree that we need more work on structured
>>> streaming, but it's important to remember that the Spark developers are not
>>> all fully interchangeable, we work on the things that we're interested in
>>> pursuing so even if structured streaming needs more love if I'm not super
>>> interested in structured streaming I'm less likely to work on it. That
>>> being said I am certainly spinning up a bit more in the Spark SQL area
>>> especially around our data source/connectors because I can see the need
>>> there too.
>>>
>>> On Wed, Jun 23, 2021 at 8:26 AM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>


 Please allow me to be diverse and express a different point of view on
 this roadmap.


 I believe from a technical point of view spending time and effort plus
 talent on batch scheduling on Kubernetes could be rewarding. However, if I
 may say I doubt whether such an approach and the so-called democratization
 of Spark on whatever platform really should be of great focus.

 Having worked on Google Dataproc (a fully managed and highly scalable
 service for running Apache Spark, Hadoop and more recently other
 artefacts) for the past two years, and Spark on Kubernetes on-premise,
 I have come to the conclusion that Spark is not a beast that one can
 fully commoditize the way one can with Zookeeper, Kafka etc. There is
 always a struggle to make some niche areas of Spark like Spark
 Structured Streaming (SSS) work seamlessly and effortlessly on these
 commercial platforms with whatever as a Service.


 Moreover, Spark (and I stand corrected) from the ground up has already
 a lot of resiliency and redundancy built in. It is truly an enterprise
 class product (requires enterprise class support) that will be difficult to
 commoditize with Kubernetes and expect the same performance. After all,
 Kubernetes is aimed at efficient resource sharing and potential cost saving
 for the mass market. In short I can see commercial enterprises will work on
 these platforms, but maybe the great talents on the dev team should focus on
 stuff like the perceived limitation of SSS in dealing with chains of
 aggregation (if I am correct it is not yet supported on streaming datasets).


 These are my opinions and they are not facts, just opinions so to speak
 :)



Re: Spark on Kubernetes scheduler variety

2021-06-25 Thread Yikun Jiang
Oops, sorry for the wrong link, it should be:

We will also prepare to propose an initial design and POC[3] on a shared
branch (based on spark master branch) where we can collaborate on it, so I
created the spark-volcano[1] org in github to make it happen.

[3]
https://github.com/huawei-cloudnative/spark/commit/6c1f37525f026353eaead34216d47dad653f13a4


And
Regards,
Yikun


On Fri, 25 Jun 2021 at 11:53 AM, Yikun Jiang wrote:

> Hi, folks.
>
> As @Klaus mentioned, we have done some work on Spark on k8s with Volcano
> native support. There has also been some production deployment validation
> from our partners in China, like JingDong, XiaoHongShu, VIPshop.
>
> We will also prepare to propose an initial design and POC[3] on a shared
> branch (based on spark master branch) where we can collaborate on it, so I
> created the spark-volcano[1] org in github to make it happen.
>
> Pls feel free to comment on it [2] if you guys have any questions or
> concerns.
>
> [1] https://github.com/spark-volcano
> [2] https://github.com/spark-volcano/spark/issues/1
> [3]
> https://github.com/huawei-cloudnative/spark/commit/6c1f37525f026353eaead34216d47dad653f13a4
>
>


> Regards,
> Yikun
>
> On Fri, 25 Jun 2021 at 12:00 AM, Holden Karau wrote:
>
>> Hi Mich,
>>
>> I certainly think making Spark on Kubernetes run well is going to be a
>> challenge. However I think, and I could be wrong about this as well, that
>> in terms of cluster managers Kubernetes is likely to be our future. Talking
>> with people I don't hear about new standalone, YARN or mesos deployments of
>> Spark, but I do hear about people trying to migrate to Kubernetes.
>>
>> To be clear I certainly agree that we need more work on structured
>> streaming, but it's important to remember that the Spark developers are not
>> all fully interchangeable, we work on the things that we're interested in
>> pursuing so even if structured streaming needs more love if I'm not super
>> interested in structured streaming I'm less likely to work on it. That
>> being said I am certainly spinning up a bit more in the Spark SQL area
>> especially around our data source/connectors because I can see the need
>> there too.
>>
>> On Wed, Jun 23, 2021 at 8:26 AM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>>
>>>
>>> Please allow me to be diverse and express a different point of view on
>>> this roadmap.
>>>
>>>
>>> I believe from a technical point of view spending time and effort plus
>>> talent on batch scheduling on Kubernetes could be rewarding. However, if I
>>> may say I doubt whether such an approach and the so-called democratization
>>> of Spark on whatever platform really should be of great focus.
>>>
>>> Having worked on Google Dataproc (a fully managed and highly scalable
>>> service for running Apache Spark, Hadoop and more recently other
>>> artefacts) for the past two years, and Spark on Kubernetes on-premise,
>>> I have come to the conclusion that Spark is not a beast that one can
>>> fully commoditize the way one can with Zookeeper, Kafka etc. There is
>>> always a struggle to make some niche areas of Spark like Spark
>>> Structured Streaming (SSS) work seamlessly and effortlessly on these
>>> commercial platforms with whatever as a Service.
>>>
>>>
>>> Moreover, Spark (and I stand corrected) from the ground up has already a
>>> lot of resiliency and redundancy built in. It is truly an enterprise class
>>> product (requires enterprise class support) that will be difficult to
>>> commoditize with Kubernetes and expect the same performance. After all,
>>> Kubernetes is aimed at efficient resource sharing and potential cost saving
>>> for the mass market. In short I can see commercial enterprises will work on
>>> these platforms, but maybe the great talents on the dev team should focus on
>>> stuff like the perceived limitation of SSS in dealing with chains of
>>> aggregation (if I am correct it is not yet supported on streaming datasets).
>>>
>>>
>>> These are my opinions and they are not facts, just opinions so to speak
>>> :)
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Fri, 18 Jun 2021 at 23:18, Holden Karau  wrote:
>>>
 I think these approaches are good, but there are limitations (eg
 dynamic scaling) without us making changes inside of the Spark Kube
 scheduler.

 Certainly whichever scheduler extensions we add support for we should
 collaborate with the people developing those extensions insofar as they are
 interested. My first place that I checked was #sig-scheduling wh

Re: Spark on Kubernetes scheduler variety

2021-06-24 Thread John Zhuge
Thanks Yikun!

On Thu, Jun 24, 2021 at 8:54 PM Yikun Jiang  wrote:

> Hi, folks.
>
> As @Klaus mentioned, we have done some work on Spark on k8s with Volcano
> native support. There has also been some production deployment validation
> from our partners in China, like JingDong, XiaoHongShu, VIPshop.
>
> We will also prepare to propose an initial design and POC[3] on a shared
> branch (based on spark master branch) where we can collaborate on it, so I
> created the spark-volcano[1] org in github to make it happen.
>
> Pls feel free to comment on it [2] if you guys have any questions or
> concerns.
>
> [1] https://github.com/spark-volcano
> [2] https://github.com/spark-volcano/spark/issues/1
> [3] https://github.com/spark-volcano-wip/spark-3-volcano
>
> Regards,
> Yikun
>
> On Fri, 25 Jun 2021 at 12:00 AM, Holden Karau wrote:
>
>> Hi Mich,
>>
>> I certainly think making Spark on Kubernetes run well is going to be a
>> challenge. However I think, and I could be wrong about this as well, that
>> in terms of cluster managers Kubernetes is likely to be our future. Talking
>> with people I don't hear about new standalone, YARN or mesos deployments of
>> Spark, but I do hear about people trying to migrate to Kubernetes.
>>
>> To be clear I certainly agree that we need more work on structured
>> streaming, but it's important to remember that the Spark developers are not
>> all fully interchangeable, we work on the things that we're interested in
>> pursuing so even if structured streaming needs more love if I'm not super
>> interested in structured streaming I'm less likely to work on it. That
>> being said I am certainly spinning up a bit more in the Spark SQL area
>> especially around our data source/connectors because I can see the need
>> there too.
>>
>> On Wed, Jun 23, 2021 at 8:26 AM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>>
>>>
>>> Please allow me to be diverse and express a different point of view on
>>> this roadmap.
>>>
>>>
>>> I believe from a technical point of view spending time and effort plus
>>> talent on batch scheduling on Kubernetes could be rewarding. However, if I
>>> may say I doubt whether such an approach and the so-called democratization
>>> of Spark on whatever platform really should be of great focus.
>>>
>>> Having worked on Google Dataproc (a fully managed and highly scalable
>>> service for running Apache Spark, Hadoop and more recently other
>>> artefacts) for the past two years, and Spark on Kubernetes on-premise,
>>> I have come to the conclusion that Spark is not a beast that one can
>>> fully commoditize the way one can with Zookeeper, Kafka etc. There is
>>> always a struggle to make some niche areas of Spark like Spark
>>> Structured Streaming (SSS) work seamlessly and effortlessly on these
>>> commercial platforms with whatever as a Service.
>>>
>>>
>>> Moreover, Spark (and I stand corrected) from the ground up has already a
>>> lot of resiliency and redundancy built in. It is truly an enterprise class
>>> product (requires enterprise class support) that will be difficult to
>>> commoditize with Kubernetes and expect the same performance. After all,
>>> Kubernetes is aimed at efficient resource sharing and potential cost saving
>>> for the mass market. In short I can see commercial enterprises will work on
>>> these platforms, but maybe the great talents on the dev team should focus on
>>> stuff like the perceived limitation of SSS in dealing with chains of
>>> aggregation (if I am correct it is not yet supported on streaming datasets).
>>>
>>>
>>> These are my opinions and they are not facts, just opinions so to speak
>>> :)
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Fri, 18 Jun 2021 at 23:18, Holden Karau  wrote:
>>>
 I think these approaches are good, but there are limitations (eg
 dynamic scaling) without us making changes inside of the Spark Kube
 scheduler.

 Certainly whichever scheduler extensions we add support for we should
 collaborate with the people developing those extensions insofar as they are
 interested. The first place that I checked was #sig-scheduling, which is
 fairly quiet on the Kubernetes slack but if there are more places to look
 for folks interested in batch scheduling on Kubernetes we should definitely
 give it a shot :)

 On Fri, Jun 18, 2021 at 1:41 AM Mich Talebzadeh <
 mich.talebza...@gmail.com> wrote:

> Hi,
>
> Regarding your point and I quote
>
> "..  I know that one of the Spar

Re: Spark on Kubernetes scheduler variety

2021-06-24 Thread Yikun Jiang
Hi, folks.

As @Klaus mentioned, we have done some work on Spark on k8s with native
Volcano support. There has also been some production deployment validation
by our partners in China, such as JingDong, XiaoHongShu and VIPshop.

We are also preparing to propose an initial design and POC[3] on a shared
branch (based on the Spark master branch) where we can collaborate, so I
created the spark-volcano[1] org on GitHub to make it happen.

Please feel free to comment on it [2] if you have any questions or
concerns.

[1] https://github.com/spark-volcano
[2] https://github.com/spark-volcano/spark/issues/1
[3] https://github.com/spark-volcano-wip/spark-3-volcano

Regards,
Yikun

On Fri, Jun 25, 2021 at 12:00 AM, Holden Karau wrote:

> Hi Mich,
>
> I certainly think making Spark on Kubernetes run well is going to be a
> challenge. However I think, and I could be wrong about this as well, that
> in terms of cluster managers Kubernetes is likely to be our future. Talking
> with people I don't hear about new standalone, YARN or Mesos deployments of
> Spark, but I do hear about people trying to migrate to Kubernetes.
>
> To be clear I certainly agree that we need more work on structured
> streaming, but it's important to remember that the Spark developers are not
> all fully interchangeable. We work on the things that we're interested in
> pursuing, so even if structured streaming needs more love, if I'm not super
> interested in structured streaming I'm less likely to work on it. That
> being said, I am certainly spinning up a bit more in the Spark SQL area,
> especially around our data source/connectors, because I can see the need
> there too.
>
> On Wed, Jun 23, 2021 at 8:26 AM Mich Talebzadeh 
> wrote:
>
>>
>>
>> Please allow me to be diverse and express a different point of view on
>> this roadmap.
>>
>>
>> I believe from a technical point of view spending time and effort plus
>> talent on batch scheduling on Kubernetes could be rewarding. However, if I
>> may say so, I doubt whether such an approach, and the so-called
>> democratization of Spark on whatever platform, should really be a major
>> focus.
>>
>> Having worked on Google Dataproc (a fully
>> managed and highly scalable service for running Apache Spark, Hadoop and
>> more recently other artefacts) for the past two years, and Spark on
>> Kubernetes on-premise, I have come to the conclusion that Spark is not a
>> beast that one can fully commoditize in the way one can with
>> Zookeeper, Kafka etc. There is always a struggle to make some niche areas
>> of Spark like Spark Structured Streaming (SSS) work seamlessly and
>> effortlessly on these commercial platforms with whatever as a Service.
>>
>>
>> Moreover, Spark (and I stand corrected) from the ground up has already a
>> lot of resiliency and redundancy built in. It is truly an enterprise class
>> product (requires enterprise class support) that will be difficult to
>> commoditize with Kubernetes and expect the same performance. After all,
>> Kubernetes is aimed at efficient resource sharing and potential cost saving
>> for the mass market. In short, I can see that commercial enterprises will
>> work on these platforms, but maybe the great talent on the dev team should
>> focus on things like the perceived limitation of SSS in dealing with
>> chained aggregations (if I am correct, they are not yet supported on
>> streaming datasets).
>>
>>
>> These are my opinions and they are not facts, just opinions so to speak :)
>>
>> On Fri, 18 Jun 2021 at 23:18, Holden Karau  wrote:
>>
>>> I think these approaches are good, but there are limitations (eg dynamic
>>> scaling) without us making changes inside of the Spark Kube scheduler.
>>>
>>> Certainly whichever scheduler extensions we add support for we should
>>> collaborate with the people developing those extensions insofar as they are
>>> interested. The first place I checked was #sig-scheduling, which is
>>> fairly quiet on the Kubernetes Slack, but if there are more places to look
>>> for folks interested in batch scheduling on Kubernetes we should definitely
>>> give it a shot :)
>>>
>>> On Fri, Jun 18, 2021 at 1:41 AM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Regarding your point and I quote
>>>>
>>>> "..  I know that one of the Spark on Kube operators
>>>> supports volcano/kube-batch so I was thinking that might be a place I would
>>>> start exploring..."
>>>>
>>>> There seems to be ongoing work on, say, Volcano as part of the Cloud Native
>>>> Computing Foundation

Re: Spark on Kubernetes scheduler variety

2021-06-24 Thread Mich Talebzadeh
Hi Holden,

Thank you for your points. I guess, coming from the corporate world, I had
overlooked how an open source project like Spark draws on its contributors'
resources and interests :).

As @KlausMa kindly volunteered, it would be good to hear scheduling ideas
for Spark on Kubernetes. I am sure you have some inroads/ideas
on this subject as well, so truly I guess love would be in the air for
Kubernetes.

HTH







On Thu, 24 Jun 2021 at 16:59, Holden Karau  wrote:

> Hi Mich,
>
> I certainly think making Spark on Kubernetes run well is going to be a
> challenge. However I think, and I could be wrong about this as well, that
> in terms of cluster managers Kubernetes is likely to be our future. Talking
> with people I don't hear about new standalone, YARN or Mesos deployments of
> Spark, but I do hear about people trying to migrate to Kubernetes.
>
> To be clear I certainly agree that we need more work on structured
> streaming, but it's important to remember that the Spark developers are not
> all fully interchangeable. We work on the things that we're interested in
> pursuing, so even if structured streaming needs more love, if I'm not super
> interested in structured streaming I'm less likely to work on it. That
> being said, I am certainly spinning up a bit more in the Spark SQL area,
> especially around our data source/connectors, because I can see the need
> there too.
>
> On Wed, Jun 23, 2021 at 8:26 AM Mich Talebzadeh 
> wrote:
>
>>
>>
>> Please allow me to be diverse and express a different point of view on
>> this roadmap.
>>
>>
>> I believe from a technical point of view spending time and effort plus
>> talent on batch scheduling on Kubernetes could be rewarding. However, if I
>> may say so, I doubt whether such an approach, and the so-called
>> democratization of Spark on whatever platform, should really be a major
>> focus.
>>
>> Having worked on Google Dataproc (a fully
>> managed and highly scalable service for running Apache Spark, Hadoop and
>> more recently other artefacts) for the past two years, and Spark on
>> Kubernetes on-premise, I have come to the conclusion that Spark is not a
>> beast that one can fully commoditize in the way one can with
>> Zookeeper, Kafka etc. There is always a struggle to make some niche areas
>> of Spark like Spark Structured Streaming (SSS) work seamlessly and
>> effortlessly on these commercial platforms with whatever as a Service.
>>
>>
>> Moreover, Spark (and I stand corrected) from the ground up has already a
>> lot of resiliency and redundancy built in. It is truly an enterprise class
>> product (requires enterprise class support) that will be difficult to
>> commoditize with Kubernetes and expect the same performance. After all,
>> Kubernetes is aimed at efficient resource sharing and potential cost saving
>> for the mass market. In short, I can see that commercial enterprises will
>> work on these platforms, but maybe the great talent on the dev team should
>> focus on things like the perceived limitation of SSS in dealing with
>> chained aggregations (if I am correct, they are not yet supported on
>> streaming datasets).
>>
>>
>> These are my opinions and they are not facts, just opinions so to speak :)
>>
>> On Fri, 18 Jun 2021 at 23:18, Holden Karau  wrote:
>>
>>> I think these approaches are good, but there are limitations (eg dynamic
>>> scaling) without us making changes inside of the Spark Kube scheduler.
>>>
>>> Certainly whichever scheduler extensions we add support for we should
>>> collaborate with the people developing those extensions insofar as they are
>>> interested. The first place I checked was #sig-scheduling, which is
>>> fairly quiet on the Kubernetes Slack, but if there are more places to look
>>> for folks interested in batch scheduling on Kubernetes we should definitely
>>> give it a shot :)
>>>
>>> On Fri, Jun 18, 2021 at 1:41 AM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Regarding your point and I quote
>>>>
>>>> "..  I know that one of the Spark on Kub

Re: Spark on Kubernetes scheduler variety

2021-06-24 Thread Holden Karau
Hi Mich,

I certainly think making Spark on Kubernetes run well is going to be a
challenge. However I think, and I could be wrong about this as well, that
in terms of cluster managers Kubernetes is likely to be our future. Talking
with people I don't hear about new standalone, YARN or Mesos deployments of
Spark, but I do hear about people trying to migrate to Kubernetes.

To be clear I certainly agree that we need more work on structured
streaming, but it's important to remember that the Spark developers are not
all fully interchangeable. We work on the things that we're interested in
pursuing, so even if structured streaming needs more love, if I'm not super
interested in structured streaming I'm less likely to work on it. That
being said, I am certainly spinning up a bit more in the Spark SQL area,
especially around our data source/connectors, because I can see the need
there too.

On Wed, Jun 23, 2021 at 8:26 AM Mich Talebzadeh 
wrote:

>
>
> Please allow me to be diverse and express a different point of view on
> this roadmap.
>
>
> I believe from a technical point of view spending time and effort plus
> talent on batch scheduling on Kubernetes could be rewarding. However, if I
> may say so, I doubt whether such an approach, and the so-called
> democratization of Spark on whatever platform, should really be a major
> focus.
>
> Having worked on Google Dataproc (a fully
> managed and highly scalable service for running Apache Spark, Hadoop and
> more recently other artefacts) for the past two years, and Spark on
> Kubernetes on-premise, I have come to the conclusion that Spark is not a
> beast that one can fully commoditize in the way one can with
> Zookeeper, Kafka etc. There is always a struggle to make some niche areas
> of Spark like Spark Structured Streaming (SSS) work seamlessly and
> effortlessly on these commercial platforms with whatever as a Service.
>
>
> Moreover, Spark (and I stand corrected) from the ground up has already a
> lot of resiliency and redundancy built in. It is truly an enterprise class
> product (requires enterprise class support) that will be difficult to
> commoditize with Kubernetes and expect the same performance. After all,
> Kubernetes is aimed at efficient resource sharing and potential cost saving
> for the mass market. In short, I can see that commercial enterprises will
> work on these platforms, but maybe the great talent on the dev team should
> focus on things like the perceived limitation of SSS in dealing with
> chained aggregations (if I am correct, they are not yet supported on
> streaming datasets).
>
>
> These are my opinions and they are not facts, just opinions so to speak :)
>
> On Fri, 18 Jun 2021 at 23:18, Holden Karau  wrote:
>
>> I think these approaches are good, but there are limitations (eg dynamic
>> scaling) without us making changes inside of the Spark Kube scheduler.
>>
>> Certainly whichever scheduler extensions we add support for we should
>> collaborate with the people developing those extensions insofar as they are
>> interested. The first place I checked was #sig-scheduling, which is
>> fairly quiet on the Kubernetes Slack, but if there are more places to look
>> for folks interested in batch scheduling on Kubernetes we should definitely
>> give it a shot :)
>>
>> On Fri, Jun 18, 2021 at 1:41 AM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Regarding your point and I quote
>>>
>>> "..  I know that one of the Spark on Kube operators
>>> supports volcano/kube-batch so I was thinking that might be a place I would
>>> start exploring..."
>>>
>>> There seems to be ongoing work on, say, Volcano as part of the Cloud Native
>>> Computing Foundation (CNCF). For example through
>>> https://github.com/volcano-sh/volcano
>>>
>>> There may be value-add in collaborating with such groups through CNCF in
>>> order to have a collective approach to such work. There also seems to be
>>> some work on Integration of Spark with Volcano for Batch Scheduling.
>>>
>>> What is not very clear is the degree of progress of these projects. You
>>> may be kind enough to elaborate on the KPIs for each of these projects and
>>> where you think your contributions are going to be.
>>>
>>>
>>> HTH,
>>>
>>>
>>> Mich
>>>

Re: Spark on Kubernetes scheduler variety

2021-06-24 Thread Holden Karau
That's awesome, I'm just starting to get context around Volcano but maybe
we can schedule an initial meeting for all of us interested in pursuing
this to get on the same page.

On Wed, Jun 23, 2021 at 6:54 PM Klaus Ma  wrote:

> Hi team,
>
> I'm the founder of kube-batch/Volcano, and I'm excited to hear that the Spark
> community also has such requirements :)
>
> Volcano provides several features for batch workloads, e.g. fair-share,
> queues, reservation, preemption/reclaim and so on.
> It has been used in several production environments with Spark; if necessary,
> I can give an overall introduction to Volcano's features and those use
> cases :)
>
> -- Klaus
>
> On Wed, Jun 23, 2021 at 11:26 PM Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>>
>>
>> Please allow me to be diverse and express a different point of view on
>> this roadmap.
>>
>>
>> I believe from a technical point of view spending time and effort plus
>> talent on batch scheduling on Kubernetes could be rewarding. However, if I
>> may say so, I doubt whether such an approach, and the so-called
>> democratization of Spark on whatever platform, should really be a major
>> focus.
>>
>> Having worked on Google Dataproc (a fully
>> managed and highly scalable service for running Apache Spark, Hadoop and
>> more recently other artefacts) for the past two years, and Spark on
>> Kubernetes on-premise, I have come to the conclusion that Spark is not a
>> beast that one can fully commoditize in the way one can with
>> Zookeeper, Kafka etc. There is always a struggle to make some niche areas
>> of Spark like Spark Structured Streaming (SSS) work seamlessly and
>> effortlessly on these commercial platforms with whatever as a Service.
>>
>>
>> Moreover, Spark (and I stand corrected) from the ground up has already a
>> lot of resiliency and redundancy built in. It is truly an enterprise class
>> product (requires enterprise class support) that will be difficult to
>> commoditize with Kubernetes and expect the same performance. After all,
>> Kubernetes is aimed at efficient resource sharing and potential cost saving
>> for the mass market. In short, I can see that commercial enterprises will
>> work on these platforms, but maybe the great talent on the dev team should
>> focus on things like the perceived limitation of SSS in dealing with
>> chained aggregations (if I am correct, they are not yet supported on
>> streaming datasets).
>>
>>
>> These are my opinions and they are not facts, just opinions so to speak :)
>>
>> On Fri, 18 Jun 2021 at 23:18, Holden Karau  wrote:
>>
>>> I think these approaches are good, but there are limitations (eg dynamic
>>> scaling) without us making changes inside of the Spark Kube scheduler.
>>>
>>> Certainly whichever scheduler extensions we add support for we should
>>> collaborate with the people developing those extensions insofar as they are
>>> interested. The first place I checked was #sig-scheduling, which is
>>> fairly quiet on the Kubernetes Slack, but if there are more places to look
>>> for folks interested in batch scheduling on Kubernetes we should definitely
>>> give it a shot :)
>>>
>>> On Fri, Jun 18, 2021 at 1:41 AM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Regarding your point and I quote
>>>>
>>>> "..  I know that one of the Spark on Kube operators
>>>> supports volcano/kube-batch so I was thinking that might be a place I would
>>>> start exploring..."
>>>>
>>>> There seems to be ongoing work on, say, Volcano as part of the Cloud Native
>>>> Computing Foundation (CNCF). For example through
>>>> https://github.com/volcano-sh/volcano
>>>>
>>>> There may be value-add in collaborating with such groups through CNCF
>>>> in order to have a collective approach to such work. There also seems to be
>>>> some work on Integration of Spark with Volcano for Batch Scheduling.
>>>>
>>>> What is not very clear is the degree of progress of these projects. You
>>>> may be kind enough to elaborate on the KPIs for each of these projects and
>>>> where you think your contributions are going to be.
>>>>
>>>>
>>>> HTH,
>>>>
>>>>
>>>> Mich



Re: Spark on Kubernetes scheduler variety

2021-06-24 Thread John Zhuge
Thanks Klaus! I am interested in more details.

On Wed, Jun 23, 2021 at 6:54 PM Klaus Ma  wrote:

> Hi team,
>
> I'm the founder of kube-batch/Volcano, and I'm excited to hear that the Spark
> community also has such requirements :)
>
> Volcano provides several features for batch workloads, e.g. fair-share,
> queues, reservation, preemption/reclaim and so on.
> It has been used in several production environments with Spark; if necessary,
> I can give an overall introduction to Volcano's features and those use
> cases :)
>
> -- Klaus
>
> On Wed, Jun 23, 2021 at 11:26 PM Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>>
>>
>> Please allow me to be diverse and express a different point of view on
>> this roadmap.
>>
>>
>> I believe from a technical point of view spending time and effort plus
>> talent on batch scheduling on Kubernetes could be rewarding. However, if I
>> may say so, I doubt whether such an approach, and the so-called
>> democratization of Spark on whatever platform, should really be a major
>> focus.
>>
>> Having worked on Google Dataproc (a fully
>> managed and highly scalable service for running Apache Spark, Hadoop and
>> more recently other artefacts) for the past two years, and Spark on
>> Kubernetes on-premise, I have come to the conclusion that Spark is not a
>> beast that one can fully commoditize in the way one can with
>> Zookeeper, Kafka etc. There is always a struggle to make some niche areas
>> of Spark like Spark Structured Streaming (SSS) work seamlessly and
>> effortlessly on these commercial platforms with whatever as a Service.
>>
>>
>> Moreover, Spark (and I stand corrected) from the ground up has already a
>> lot of resiliency and redundancy built in. It is truly an enterprise class
>> product (requires enterprise class support) that will be difficult to
>> commoditize with Kubernetes and expect the same performance. After all,
>> Kubernetes is aimed at efficient resource sharing and potential cost saving
>> for the mass market. In short, I can see that commercial enterprises will
>> work on these platforms, but maybe the great talent on the dev team should
>> focus on things like the perceived limitation of SSS in dealing with
>> chained aggregations (if I am correct, they are not yet supported on
>> streaming datasets).
>>
>>
>> These are my opinions and they are not facts, just opinions so to speak :)
>>
>> On Fri, 18 Jun 2021 at 23:18, Holden Karau  wrote:
>>
>>> I think these approaches are good, but there are limitations (eg dynamic
>>> scaling) without us making changes inside of the Spark Kube scheduler.
>>>
>>> Certainly whichever scheduler extensions we add support for we should
>>> collaborate with the people developing those extensions insofar as they are
>>> interested. The first place I checked was #sig-scheduling, which is
>>> fairly quiet on the Kubernetes Slack, but if there are more places to look
>>> for folks interested in batch scheduling on Kubernetes we should definitely
>>> give it a shot :)
>>>
>>> On Fri, Jun 18, 2021 at 1:41 AM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Regarding your point and I quote
>>>>
>>>> "..  I know that one of the Spark on Kube operators
>>>> supports volcano/kube-batch so I was thinking that might be a place I would
>>>> start exploring..."
>>>>
>>>> There seems to be ongoing work on, say, Volcano as part of the Cloud Native
>>>> Computing Foundation (CNCF). For example through
>>>> https://github.com/volcano-sh/volcano
>>>>
>>>> There may be value-add in collaborating with such groups through CNCF
>>>> in order to have a collective approach to such work. There also seems to be
>>>> some work on Integration of Spark with Volcano for Batch Scheduling.
>>>>
>>>> What is not very clear is the degree of progress of these projects. You
>>>> may be kind enough to elaborate on the KPIs for each of these projects and
>>>> where you think your contributions are going to be.
>>>>
>>>>
>>>> HTH,
>>>>
>>>>
>>>> Mich



Re: Spark on Kubernetes scheduler variety

2021-06-24 Thread Mich Talebzadeh
Thanks Klaus. That will be great.

It would also be helpful if you could elaborate on the need for these
features in light of the limitations of current batch workloads.

Regards,

Mich







On Thu, 24 Jun 2021 at 02:53, Klaus Ma  wrote:

> Hi team,
>
> I'm the founder of kube-batch/Volcano, and I'm excited to hear that the Spark
> community also has such requirements :)
>
> Volcano provides several features for batch workloads, e.g. fair-share,
> queues, reservation, preemption/reclaim and so on.
> It has been used in several production environments with Spark; if necessary,
> I can give an overall introduction to Volcano's features and those use
> cases :)
>
> -- Klaus
>
> On Wed, Jun 23, 2021 at 11:26 PM Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>>
>>
>> Please allow me to be diverse and express a different point of view on
>> this roadmap.
>>
>>
>> I believe from a technical point of view spending time and effort plus
>> talent on batch scheduling on Kubernetes could be rewarding. However, if I
>> may say so, I doubt whether such an approach, and the so-called
>> democratization of Spark on whatever platform, should really be a major
>> focus.
>>
>> Having worked on Google Dataproc (a fully
>> managed and highly scalable service for running Apache Spark, Hadoop and
>> more recently other artefacts) for the past two years, and Spark on
>> Kubernetes on-premise, I have come to the conclusion that Spark is not a
>> beast that one can fully commoditize in the way one can with
>> Zookeeper, Kafka etc. There is always a struggle to make some niche areas
>> of Spark like Spark Structured Streaming (SSS) work seamlessly and
>> effortlessly on these commercial platforms with whatever as a Service.
>>
>>
>> Moreover, Spark (and I stand corrected) from the ground up has already a
>> lot of resiliency and redundancy built in. It is truly an enterprise class
>> product (requires enterprise class support) that will be difficult to
>> commoditize with Kubernetes and expect the same performance. After all,
>> Kubernetes is aimed at efficient resource sharing and potential cost saving
>> for the mass market. In short, I can see that commercial enterprises will
>> work on these platforms, but maybe the great talent on the dev team should
>> focus on things like the perceived limitation of SSS in dealing with
>> chained aggregations (if I am correct, they are not yet supported on
>> streaming datasets).
>>
>>
>> These are my opinions and they are not facts, just opinions so to speak :)
>>
>> On Fri, 18 Jun 2021 at 23:18, Holden Karau  wrote:
>>
>>> I think these approaches are good, but there are limitations (eg dynamic
>>> scaling) without us making changes inside of the Spark Kube scheduler.
>>>
>>> Certainly whichever scheduler extensions we add support for we should
>>> collaborate with the people developing those extensions insofar as they are
>>> interested. The first place I checked was #sig-scheduling, which is
>>> fairly quiet on the Kubernetes Slack, but if there are more places to look
>>> for folks interested in batch scheduling on Kubernetes we should definitely
>>> give it a shot :)
>>>
>>> On Fri, Jun 18, 2021 at 1:41 AM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Regarding your point and I quote
>>>>
>>>> "..  I know that one of the Spark on Kube operators
>>>> supports volcano/kube-batch so I was thinking that might be a place I would
>>>> start exploring..."
>>>>
>>>> There seems to be ongoing work on, say, Volcano as part of the Cloud Native
>>>> Computing Foundation (CNCF). For example through
>>>> https://github.com/volcano-sh/volcano
>>>>
>>>> There may be value-add in collaborating with such groups through CNCF
>>>> in order to have a collective approach to such work. There also seems to be
>>>> some work on Integration of Spark with Volcano for Batch Scheduling.
>>>>
>>>> What is not very clear

Re: Spark on Kubernetes scheduler variety

2021-06-23 Thread Klaus Ma
Hi team,

I'm the founder of kube-batch/Volcano, and I'm excited to hear that the Spark
community also has such requirements :)

Volcano provides several features for batch workloads, e.g. fair-share,
queues, reservation, preemption/reclaim and so on.
It has been used in several production environments with Spark; if necessary,
I can give an overall introduction to Volcano's features and those use
cases :)

-- Klaus

On Wed, Jun 23, 2021 at 11:26 PM Mich Talebzadeh 
wrote:

>
>
> Please allow me to be diverse and express a different point of view on
> this roadmap.
>
>
> I believe from a technical point of view spending time and effort plus
> talent on batch scheduling on Kubernetes could be rewarding. However, if I
> may say so, I doubt whether such an approach, and the so-called
> democratization of Spark on whatever platform, should really be a major
> focus.
>
> Having worked on Google Dataproc (a fully
> managed and highly scalable service for running Apache Spark, Hadoop and
> more recently other artefacts) for the past two years, and Spark on
> Kubernetes on-premise, I have come to the conclusion that Spark is not a
> beast that one can fully commoditize in the way one can with
> Zookeeper, Kafka etc. There is always a struggle to make some niche areas
> of Spark like Spark Structured Streaming (SSS) work seamlessly and
> effortlessly on these commercial platforms with whatever as a Service.
>
>
> Moreover, Spark (and I stand corrected) from the ground up has already a
> lot of resiliency and redundancy built in. It is truly an enterprise class
> product (requires enterprise class support) that will be difficult to
> commoditize with Kubernetes and expect the same performance. After all,
> Kubernetes is aimed at efficient resource sharing and potential cost saving
> for the mass market. In short, I can see that commercial enterprises will
> work on these platforms, but maybe the great talent on the dev team should
> focus on things like the perceived limitation of SSS in dealing with
> chained aggregations (if I am correct, they are not yet supported on
> streaming datasets).
>
>
> These are my opinions and they are not facts, just opinions so to speak :)
>
> On Fri, 18 Jun 2021 at 23:18, Holden Karau  wrote:
>
>> I think these approaches are good, but there are limitations (eg dynamic
>> scaling) without us making changes inside of the Spark Kube scheduler.
>>
>> Certainly whichever scheduler extensions we add support for we should
>> collaborate with the people developing those extensions insofar as they are
>> interested. The first place I checked was #sig-scheduling, which is
>> fairly quiet on the Kubernetes Slack, but if there are more places to look
>> for folks interested in batch scheduling on Kubernetes we should definitely
>> give it a shot :)
>>
>> On Fri, Jun 18, 2021 at 1:41 AM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Regarding your point and I quote
>>>
>>> "..  I know that one of the Spark on Kube operators
>>> supports volcano/kube-batch so I was thinking that might be a place I would
>>> start exploring..."
>>>
>>> There seems to be ongoing work on, say, Volcano as part of the Cloud Native
>>> Computing Foundation (CNCF), for example through
>>> https://github.com/volcano-sh/volcano
>>>
>>>
>>> There may be value-add in collaborating with such groups through CNCF in
>>> order to have a collective approach to such work. There also seems to be
>>> some work on Integration of Spark with Volcano for Batch Scheduling.
>>> 
>>>
>>>
>>>
>>> What is not very clear is the degree of progress of these projects. Perhaps
>>> you could elaborate on the KPIs for each of these projects and where
>>> you think your contribution is going to be.
>>>
>>>
>>> HTH,
>>>
>>>
>>> Mich
>>>
>>>
>>>
>>> On Fri, 18 Jun 2021 at 00:44, Holden Karau  wrote:
>>>
 Hi Folks,

 I'm continuing my

Re: Spark on Kubernetes scheduler variety

2021-06-23 Thread Mich Talebzadeh
Please allow me to express a different point of view on
this roadmap.


I believe from a technical point of view spending time and effort plus
talent on batch scheduling on Kubernetes could be rewarding. However, if I
may say so, I doubt whether such an approach and the so-called democratization
of Spark on whatever platform really should be a major focus.

Having worked on Google Dataproc (a fully
managed and highly scalable service for running Apache Spark, Hadoop and
more recently other artefacts) for the past two years, and Spark on
Kubernetes on-premise, I have come to the conclusion that Spark is not a
beast that one can fully commoditize in the way one can with
ZooKeeper, Kafka etc. There is always a struggle to make some niche areas
of Spark like Spark Structured Streaming (SSS) work seamlessly and
effortlessly on these commercial platforms with whatever as a Service.


Moreover, Spark (and I stand to be corrected) from the ground up already has a
lot of resiliency and redundancy built in. It is truly an enterprise-class
product (requiring enterprise-class support) that will be difficult to
commoditize with Kubernetes while expecting the same performance. After all,
Kubernetes is aimed at efficient resource sharing and potential cost saving
for the mass market. In short, I can see commercial enterprises will work on
these platforms, but maybe the great talent on the dev team should focus on
things like the perceived limitation of SSS in dealing with chained
aggregations (if I am correct, these are not yet supported on streaming datasets).


These are my opinions and they are not facts, just opinions so to speak :)



On Fri, 18 Jun 2021 at 23:18, Holden Karau  wrote:

> I think these approaches are good, but there are limitations (eg dynamic
> scaling) without us making changes inside of the Spark Kube scheduler.
>
> Certainly whichever scheduler extensions we add support for we should
> collaborate with the people developing those extensions insofar as they are
> interested. The first place I checked was #sig-scheduling, which is
> fairly quiet on the Kubernetes Slack, but if there are more places to look
> for folks interested in batch scheduling on Kubernetes we should definitely
> give it a shot :)
>
> On Fri, Jun 18, 2021 at 1:41 AM Mich Talebzadeh 
> wrote:
>
>> Hi,
>>
>> Regarding your point and I quote
>>
>> "..  I know that one of the Spark on Kube operators
>> supports volcano/kube-batch so I was thinking that might be a place I would
>> start exploring..."
>>
>> There seems to be ongoing work on, say, Volcano as part of the Cloud Native
>> Computing Foundation (CNCF), for example through
>> https://github.com/volcano-sh/volcano
>>
>>
>> There may be value-add in collaborating with such groups through CNCF in
>> order to have a collective approach to such work. There also seems to be
>> some work on Integration of Spark with Volcano for Batch Scheduling.
>> 
>>
>>
>>
>> What is not very clear is the degree of progress of these projects. Perhaps
>> you could elaborate on the KPIs for each of these projects and where
>> you think your contribution is going to be.
>>
>>
>> HTH,
>>
>>
>> Mich
>>
>>
>>
>> On Fri, 18 Jun 2021 at 00:44, Holden Karau  wrote:
>>
>>> Hi Folks,
>>>
>>> I'm continuing my adventures to make Spark on containers party and I
>>> was wondering if folks have experience with the different batch
>>> scheduler options that they prefer? I was thinking so that we can
>>> better support dynamic allocation it might make sense for us to
>>> support using different schedulers and I wanted to see if there are
>>> any that the community is more interested in?
>>>
>>> I know that one of the Spark on Kube operators supports
>>> volcano/kube-batch so I was thinking that might be a place I start
>>> exploring but also want to be open to other schedulers that folks
>>> might be interested in.
>>>
>>> Cheers,
>>>
>>> Holden :)

Re: Spark on Kubernetes scheduler variety

2021-06-18 Thread Holden Karau
I think these approaches are good, but there are limitations (e.g. dynamic
scaling) without us making changes inside of the Spark Kube scheduler.

Certainly whichever scheduler extensions we add support for we should
collaborate with the people developing those extensions insofar as they are
interested. The first place I checked was #sig-scheduling, which is
fairly quiet on the Kubernetes Slack, but if there are more places to look
for folks interested in batch scheduling on Kubernetes we should definitely
give it a shot :)

On Fri, Jun 18, 2021 at 1:41 AM Mich Talebzadeh 
wrote:

> Hi,
>
> Regarding your point and I quote
>
> "..  I know that one of the Spark on Kube operators
> supports volcano/kube-batch so I was thinking that might be a place I would
> start exploring..."
>
> There seems to be ongoing work on, say, Volcano as part of the Cloud Native
> Computing Foundation (CNCF), for example through
> https://github.com/volcano-sh/volcano
>

>
> There may be value-add in collaborating with such groups through CNCF in
> order to have a collective approach to such work. There also seems to be
> some work on Integration of Spark with Volcano for Batch Scheduling.
> 
>
>
>
> What is not very clear is the degree of progress of these projects. Perhaps
> you could elaborate on the KPIs for each of these projects and where
> you think your contribution is going to be.
>
>
> HTH,
>
>
> Mich
>
>
>
> On Fri, 18 Jun 2021 at 00:44, Holden Karau  wrote:
>
>> Hi Folks,
>>
>> I'm continuing my adventures to make Spark on containers party and I
>> was wondering if folks have experience with the different batch
>> scheduler options that they prefer? I was thinking so that we can
>> better support dynamic allocation it might make sense for us to
>> support using different schedulers and I wanted to see if there are
>> any that the community is more interested in?
>>
>> I know that one of the Spark on Kube operators supports
>> volcano/kube-batch so I was thinking that might be a place I start
>> exploring but also want to be open to other schedulers that folks
>> might be interested in.
>>
>> Cheers,
>>
>> Holden :)
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>>
>> --
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


Re: Spark on Kubernetes scheduler variety

2021-06-18 Thread Mich Talebzadeh
Hi,

Regarding your point and I quote

"..  I know that one of the Spark on Kube operators
supports volcano/kube-batch so I was thinking that might be a place I would
start exploring..."

There seems to be ongoing work on, say, Volcano as part of the Cloud Native
Computing Foundation (CNCF), for example through
https://github.com/volcano-sh/volcano

There may be value-add in collaborating with such groups through CNCF in
order to have a collective approach to such work. There also seems to be
some work on Integration of Spark with Volcano for Batch Scheduling.




What is not very clear is the degree of progress of these projects. Perhaps
you could elaborate on the KPIs for each of these projects and where you
think your contribution is going to be.


HTH,


Mich



On Fri, 18 Jun 2021 at 00:44, Holden Karau  wrote:

> Hi Folks,
>
> I'm continuing my adventures to make Spark on containers party and I
> was wondering if folks have experience with the different batch
> scheduler options that they prefer? I was thinking so that we can
> better support dynamic allocation it might make sense for us to
> support using different schedulers and I wanted to see if there are
> any that the community is more interested in?
>
> I know that one of the Spark on Kube operators supports
> volcano/kube-batch so I was thinking that might be a place I start
> exploring but also want to be open to other schedulers that folks
> might be interested in.
>
> Cheers,
>
> Holden :)
>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
>
>