There is the Apache Incubator project Uniffle: https://github.com/apache/incubator-uniffle
It stores shuffle data on remote servers: in memory, on local disk, and in HDFS.

Cheers,
Enrico

On 06.04.24 at 15:41, Mich Talebzadeh wrote:
I have seen some older references to a shuffle service for k8s, although it is not clear whether they are talking about a generic shuffle service for k8s. Anyhow, with the advent of GenAI and the need to handle larger volumes of data, I was wondering whether there has been any more work on this matter. Specifically, larger and scalable file systems like HDFS, GCS, S3, etc. offer significantly more storage capacity than the local disks on individual worker nodes in a k8s cluster, thus allowing much larger datasets to be handled more efficiently. The degree of parallelism and fault tolerance of these file systems also comes into it. I would be interested in hearing more about any progress on this.

Thanks.

Mich Talebzadeh,
Technologist | Solutions Architect | Data Engineer | Generative AI
London
United Kingdom

https://en.everybodywiki.com/Mich_Talebzadeh

Disclaimer: The information provided is correct to the best of my knowledge but of course cannot be guaranteed. It is essential to note that, as with any advice, quote "one test result is worth one-thousand expert opinions (Wernher von Braun)".

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
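Following up on the Uniffle suggestion: a Spark job is pointed at a remote shuffle service through Spark configuration rather than application code. The sketch below is a minimal illustration, assuming the client-side keys shown in the Uniffle README at the time of writing (`spark.shuffle.manager` set to `org.apache.spark.shuffle.RssShuffleManager`, plus `spark.rss.coordinator.quorum`). The coordinator hostnames and the helper `to_submit_args` are hypothetical, and the key names should be checked against the Uniffle release you actually deploy.

```python
import shlex

# Client-side settings that point a Spark job at a Uniffle cluster.
# ASSUMPTION: key names follow the Uniffle README at the time of writing;
# verify them for your Uniffle version. Coordinator hosts are hypothetical.
UNIFFLE_CLIENT_CONF = {
    # Replace Spark's default sort-based shuffle manager with Uniffle's client.
    "spark.shuffle.manager": "org.apache.spark.shuffle.RssShuffleManager",
    # Coordinators assign remote shuffle servers to the application.
    "spark.rss.coordinator.quorum": "uniffle-coordinator-0:19999,uniffle-coordinator-1:19999",
}


def to_submit_args(conf: dict[str, str]) -> list[str]:
    """Hypothetical helper: turn a settings dict into spark-submit --conf arguments."""
    args: list[str] = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Print the flags so they can be appended to a spark-submit command line.
    print(" ".join(shlex.quote(a) for a in to_submit_args(UNIFFLE_CLIENT_CONF)))
```

The storage tiers Enrico mentions (memory, local disk, HDFS) are selected on the Uniffle shuffle servers rather than in the job itself; the README uses a server setting along the lines of `rss.storage.type MEMORY_LOCALFILE_HDFS`, but treat that as something to confirm in the documentation for your version.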