Do Ignite and Alluxio offer reasonable means of transferring data, in memory, from Spark to MPI? A straightforward way to transfer data is to use piping, but unless the MPI processes map one-to-one onto the Spark partitions, this requires some complicated logic to get working (you have to handle multiple tasks sending their data to one process).
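To make the many-to-one case concrete, here is a minimal sketch in Python of the bookkeeping it implies: deciding which MPI rank should receive each Spark partition when there are more partitions than ranks. The function names and the block-assignment scheme are hypothetical, chosen only for illustration; they are not part of Spark or of any MPI library, and the actual data transfer is left out.

```python
from collections import defaultdict


def rank_for_partition(partition_id, num_partitions, num_ranks):
    """Return the MPI rank that should receive a given Spark partition.

    Hypothetical block assignment: contiguous partition ids are grouped
    onto the same rank, spread as evenly as integer division allows.
    """
    if num_partitions <= 0 or num_ranks <= 0:
        raise ValueError("num_partitions and num_ranks must be positive")
    if not 0 <= partition_id < num_partitions:
        raise ValueError("partition_id out of range")
    return partition_id * num_ranks // num_partitions


def partitions_by_rank(num_partitions, num_ranks):
    """Group partition ids by receiving rank (the many-to-one mapping)."""
    groups = defaultdict(list)
    for pid in range(num_partitions):
        groups[rank_for_partition(pid, num_partitions, num_ranks)].append(pid)
    return dict(groups)


if __name__ == "__main__":
    # 8 Spark partitions feeding 3 MPI ranks: each rank has several senders.
    print(partitions_by_rank(8, 3))
```

With equal counts the mapping degenerates to one partition per rank, which is the easy case for piping. Otherwise each rank has to accept and merge input from every partition in its group, and that is the part that gets complicated.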
It seems that Ignite and Alluxio might let each MPI process pull just the data it needs, without requiring such a one-to-one mapping, but it's not clear to me from the high-level descriptions of these systems whether this can be readily realized. Is this the case?

Another issue is that with the piping solution you only need to store two copies of the data: one on the Spark side and one on the MPI side. With Ignite and Alluxio, would you need three? It seems that they let you replace the standard RDDs with RDDs backed by their memory stores, but do those perform as efficiently as standard Spark RDDs that are persisted in memory?

More generally, I'd be interested to know if there are existing solutions to this problem of transferring data between MPI and Spark.

Thanks for any insight you can offer!