Hello,

I'm trying to host Spark applications on a Kubernetes cluster and want to provide localized persistent storage to the Spark workers for a small research project I'm currently doing. I googled around a bit and found that HDFS seems to be well supported with Spark, but some problems arise with data locality if I go this route, as outlined in this talk [1]. As far as I understand it, most of the configuration for such a deployment is in their git repo [2]. However, the Spark driver apparently needs a patch to map the workers and the HDFS datanodes correctly onto the Kubernetes nodes. Is something like this already part of the Spark codebase as of Spark 2.4.0? I had a look at the code but couldn't find anything related to HDFS locality (I'm pretty sure I just didn't look in the right place).
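To make the mapping problem concrete, here is a small Scala sketch I used to poke at this (the namenode address and file path are placeholders for my setup). It compares the hostnames the namenode reports for a file's blocks against the hosts the executors are registered under; on Kubernetes the executors show up as pod IPs, so as far as I can tell the driver can't match them to datanodes without the kind of node mapping described in the talk:

    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.sql.SparkSession

    object LocalityCheck {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("locality-check").getOrCreate()
        val sc = spark.sparkContext

        // Placeholder: any file that actually lives on the HDFS datanodes.
        val path = new Path("hdfs://my-namenode:8020/data/sample.csv")
        val fs = FileSystem.get(path.toUri, sc.hadoopConfiguration)
        val status = fs.getFileStatus(path)

        // Hostnames the namenode reports as holding the file's blocks.
        val blockHosts = fs.getFileBlockLocations(status, 0, status.getLen)
          .flatMap(_.getHosts).toSet

        // Hosts the executors (and the driver) are registered under; on
        // Kubernetes these are pod IPs rather than node hostnames.
        val executorHosts = sc.getExecutorMemoryStatus.keySet.map(_.split(":")(0))

        println(s"HDFS block hosts: ${blockHosts.mkString(", ")}")
        println(s"Executor hosts:   ${executorHosts.mkString(", ")}")
        spark.stop()
      }
    }

In my runs the two sets never overlap, which matches what the talk describes.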
So, my question is: is this even a viable option in the current state of the project(s)? And what storage solution would be recommended instead, given that Spark runs on Kubernetes (so no YARN/Mesos)?

Looking forward to your input.

Arne

[1] https://databricks.com/session/hdfs-on-kubernetes-lessons-learned
[2] https://github.com/apache-spark-on-k8s/kubernetes-HDFS