Thanks Cheng for the heads up. I will have a look.

Cheers

Mich Talebzadeh,
Technologist | Solutions Architect | Data Engineer  | Generative AI
London
United Kingdom


   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>


 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* The information provided is correct to the best of my
knowledge but of course cannot be guaranteed . It is essential to note
that, as with any advice, quote "one test result is worth one-thousand
expert opinions (Werner  <https://en.wikipedia.org/wiki/Wernher_von_Braun>Von
Braun <https://en.wikipedia.org/wiki/Wernher_von_Braun>)".


On Sun, 7 Apr 2024 at 15:08, Cheng Pan <pan3...@gmail.com> wrote:

> Instead of External Shuffle Shufle, Apache Celeborn might be a good option
> as a Remote Shuffle Service for Spark on K8s.
>
> There are some useful resources you might be interested in.
>
> [1] https://celeborn.apache.org/
> [2] https://www.youtube.com/watch?v=s5xOtG6Venw
> [3] https://github.com/aws-samples/emr-remote-shuffle-service
> [4] https://github.com/apache/celeborn/issues/2140
>
> Thanks,
> Cheng Pan
>
>
> > On Apr 6, 2024, at 21:41, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
> >
> > I have seen some older references for shuffle service for k8s,
> > although it is not clear they are talking about a generic shuffle
> > service for k8s.
> >
> > Anyhow with the advent of genai and the need to allow for a larger
> > volume of data, I was wondering if there has been any more work on
> > this matter. Specifically larger and scalable file systems like HDFS,
> > GCS , S3 etc, offer significantly larger storage capacity than local
> > disks on individual worker nodes in a k8s cluster, thus allowing
> > handling much larger datasets more efficiently. Also the degree of
> > parallelism and fault tolerance  with these files systems come into
> > it. I will be interested in hearing more about any progress on this.
> >
> > Thanks
> > .
> >
> > Mich Talebzadeh,
> >
> > Technologist | Solutions Architect | Data Engineer  | Generative AI
> >
> > London
> > United Kingdom
> >
> >
> >   view my Linkedin profile
> >
> >
> > https://en.everybodywiki.com/Mich_Talebzadeh
> >
> >
> >
> > Disclaimer: The information provided is correct to the best of my
> > knowledge but of course cannot be guaranteed . It is essential to note
> > that, as with any advice, quote "one test result is worth one-thousand
> > expert opinions (Werner Von Braun)".
> >
> > ---------------------------------------------------------------------
> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >
>
>

Reply via email to