Re: push-based external shuffle service on K8S - Spark 4.0? Earlier versions?

2024-06-06 Thread Ye Zhou
Hi Ofir.
Right now, push-based shuffle in Spark is supported only for Spark on YARN,
with the external shuffle service running as an auxiliary service in the
NodeManager; it is not supported natively on K8s.
As far as I know, there are no current plans to add native support for
Spark on K8s.

For question 2, are you asking how to set up push-based shuffle for
Spark on YARN or for Spark on K8s? For Spark on YARN, as described in the
Spark configuration documentation, you need to enable the merged shuffle
file manager in the shuffle service and also enable push-based shuffle in
the client.
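
A minimal sketch of the two sides described above, assuming the property
names from the Spark configuration documentation
(`spark.shuffle.push.server.mergedShuffleFileManagerImpl`,
`spark.shuffle.service.enabled`, `spark.shuffle.push.enabled`). The helper
`to_submit_args` is hypothetical and only assembles `spark-submit`
arguments; where the server-side property is placed depends on how the
shuffle service is deployed on each NodeManager.

```python
# Server side (assumption: set in the YARN external shuffle service's own
# configuration, not on the application). The default manager is a no-op,
# so block merging stays off until this property is overridden.
SHUFFLE_SERVICE_CONF = {
    "spark.shuffle.push.server.mergedShuffleFileManagerImpl":
        "org.apache.spark.network.shuffle.RemoteBlockPushResolver",
}

# Client side: push-based shuffle builds on the external shuffle service,
# so both flags are set on the application.
CLIENT_CONF = {
    "spark.shuffle.service.enabled": "true",
    "spark.shuffle.push.enabled": "true",
}


def to_submit_args(conf):
    """Hypothetical helper: flatten a conf dict into spark-submit --conf arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Print an example command line for a YARN submission.
    command = ["spark-submit", "--master", "yarn"] + to_submit_args(CLIENT_CONF)
    print(" ".join(command))
```

Keeping the server and client settings in separate dictionaries mirrors the
split described above: one is applied by whoever operates the NodeManagers,
the other by each job.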

Thanks.
Ye.

On Thu, Jun 6, 2024 at 7:28 AM Keyong Zhou wrote:

> Hi Ofir,
>
> I can provide some information about use cases for Apache Celeborn.
>
> Apache Celeborn can be deployed on K8s or standalone; both modes are widely
> used in production. The largest cluster I know of contains more than 1,000
> Celeborn workers.
>
> Celeborn is especially beneficial for large-scale shuffles with high
> parallelism, which usually cause long fetch wait times or even fetch
> failures. We have seen speedups of several times for jobs with large-scale
> shuffles.
>
> Besides, with Celeborn, Spark on K8s can achieve better Dynamic Resource
> Allocation because executors don't need to store shuffle data locally, and
> the pods don't need a large amount of disk space.
>
> Celeborn is relatively easy to operate, especially thanks to its graceful
> rolling upgrades and backward compatibility (across two successive versions).
>
> You can find more information, including user feedback, here [1]. I
> recommend you try it out, and the community is happy to help :)
>
> Regards,
> Keyong Zhou
>
> [1]
> https://news.apache.org/foundation/entry/apache-software-foundation-announces-new-top-level-project-apache-celeborn
>
> On 2024/06/06 09:08:31 Ofir Manor wrote:
> > Hi,
> > Regarding the external shuffle service on K8S and especially the
> > push-based variant that was merged in 3.2:
> >
> >   1. Are there plans to make it supported and work out-of-the-box in 4.0?
> >   2. Did anyone make it work for themselves in 3.5 or earlier? If so, can
> >      you share your experience and what was needed to make it work?
> >
> > As a fallback, is anyone using one of the new shuffle projects with K8S,
> > such as Apache Uniffle or Apache Celeborn, who can share some feedback?
> > Performance, stability, added complexity, etc.?
> > Thanks,
> > Ofir
> >
>

-- 

Zhou, Ye 周晔


Re: push-based external shuffle service on K8S - Spark 4.0? Earlier versions?

2024-06-06 Thread Keyong Zhou
Hi Ofir,

I can provide some information about use cases for Apache Celeborn.

Apache Celeborn can be deployed on K8s or standalone; both modes are widely
used in production. The largest cluster I know of contains more than 1,000
Celeborn workers.

Celeborn is especially beneficial for large-scale shuffles with high
parallelism, which usually cause long fetch wait times or even fetch
failures. We have seen speedups of several times for jobs with large-scale
shuffles.

Besides, with Celeborn, Spark on K8s can achieve better Dynamic Resource
Allocation because executors don't need to store shuffle data locally, and
the pods don't need a large amount of disk space.
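
As a rough sketch of what pointing a Spark job at Celeborn can look like,
assuming the property names from the Celeborn documentation (they are not
stated in this thread and should be checked against the Celeborn version in
use). The endpoint host names are placeholders, and the helper
`celeborn_spark_conf` is hypothetical.

```python
def celeborn_spark_conf(master_endpoints):
    """Hypothetical helper: Spark properties that route shuffle data to Celeborn.

    master_endpoints is a list of "host:port" strings for the Celeborn
    masters (placeholder values in the example below).
    """
    return {
        # Assumption: shuffle manager class name as given in the Celeborn docs.
        "spark.shuffle.manager":
            "org.apache.spark.shuffle.celeborn.SparkShuffleManager",
        # Assumption: comma-separated list of Celeborn master endpoints.
        "spark.celeborn.master.endpoints": ",".join(master_endpoints),
        # Shuffle data lives on Celeborn workers, so Spark's own external
        # shuffle service is not used.
        "spark.shuffle.service.enabled": "false",
    }


if __name__ == "__main__":
    conf = celeborn_spark_conf(["celeborn-master-0:9097", "celeborn-master-1:9097"])
    for key, value in conf.items():
        print(f"--conf {key}={value}")
```

Turning on dynamic allocation on top of this needs additional settings that
depend on the Spark version, so they are left out of the sketch.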

Celeborn is relatively easy to operate, especially thanks to its graceful
rolling upgrades and backward compatibility (across two successive versions).

You can find more information, including user feedback, here [1]. I
recommend you try it out, and the community is happy to help :)

Regards,
Keyong Zhou

[1] 
https://news.apache.org/foundation/entry/apache-software-foundation-announces-new-top-level-project-apache-celeborn

On 2024/06/06 09:08:31 Ofir Manor wrote:
> Hi,
> Regarding the external shuffle service on K8S and especially the push-based
> variant that was merged in 3.2:
>
>   1. Are there plans to make it supported and work out-of-the-box in 4.0?
>   2. Did anyone make it work for themselves in 3.5 or earlier? If so, can
>      you share your experience and what was needed to make it work?
>
> As a fallback, is anyone using one of the new shuffle projects with K8S,
> such as Apache Uniffle or Apache Celeborn, who can share some feedback?
> Performance, stability, added complexity, etc.?
> Thanks,
> Ofir
> 




push-based external shuffle service on K8S - Spark 4.0? Earlier versions?

2024-06-06 Thread Ofir Manor
Hi,
Regarding the external shuffle service on K8S and especially the push-based
variant that was merged in 3.2:

  1. Are there plans to make it supported and work out-of-the-box in 4.0?
  2. Did anyone make it work for themselves in 3.5 or earlier? If so, can you
     share your experience and what was needed to make it work?

As a fallback, is anyone using one of the new shuffle projects with K8S, such
as Apache Uniffle or Apache Celeborn, who can share some feedback?
Performance, stability, added complexity, etc.?
Thanks,
   Ofir