Thanks Cheng for the heads up. I will have a look. Cheers
Mich Talebzadeh,
Technologist | Solutions Architect | Data Engineer | Generative AI
London
United Kingdom

view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

https://en.everybodywiki.com/Mich_Talebzadeh

*Disclaimer:* The information provided is correct to the best of my
knowledge but of course cannot be guaranteed. It is essential to note
that, as with any advice, quote "one test result is worth one-thousand
expert opinions (Wernher von Braun
<https://en.wikipedia.org/wiki/Wernher_von_Braun>)".

On Sun, 7 Apr 2024 at 15:08, Cheng Pan <pan3...@gmail.com> wrote:

> Instead of External Shuffle Service, Apache Celeborn might be a good option
> as a Remote Shuffle Service for Spark on K8s.
>
> There are some useful resources you might be interested in.
>
> [1] https://celeborn.apache.org/
> [2] https://www.youtube.com/watch?v=s5xOtG6Venw
> [3] https://github.com/aws-samples/emr-remote-shuffle-service
> [4] https://github.com/apache/celeborn/issues/2140
>
> Thanks,
> Cheng Pan
>
> > On Apr 6, 2024, at 21:41, Mich Talebzadeh <mich.talebza...@gmail.com>
> > wrote:
> >
> > I have seen some older references for shuffle service for k8s,
> > although it is not clear they are talking about a generic shuffle
> > service for k8s.
> >
> > Anyhow, with the advent of GenAI and the need to allow for a larger
> > volume of data, I was wondering if there has been any more work on
> > this matter. Specifically, larger and scalable file systems like HDFS,
> > GCS, S3 etc. offer significantly larger storage capacity than local
> > disks on individual worker nodes in a k8s cluster, thus allowing
> > much larger datasets to be handled more efficiently. Also, the degree
> > of parallelism and fault tolerance of these file systems comes into
> > it. I will be interested in hearing more about any progress on this.
> >
> > Thanks