Hey, I am about to implement a Spark app that will need both PySpark and Spark on Scala.
Data should be read from AWS S3 (compressed CSV files) and must be pre-processed by an existing Python codebase. However, our final goal is to make those datasets available to Spark apps written in either Python or Scala, e.g. through Tachyon:

S3 => PySpark => Tachyon => {Py, Scala}Spark

Is there a recommended way to pass data between Spark applications implemented in different languages? I thought about using a serialisation framework like Thrift or Avro, but maybe there are other ways to do this (preferably without writing CSV files). I am open to any kind of input!
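To make the idea concrete, here is a minimal sketch of what I have in mind, using Parquet instead of Thrift/Avro as the intermediate format, since Parquet is self-describing and language-neutral, so both the Python and Scala DataFrame APIs can read it natively. The bucket name, Tachyon master address, and the preprocess function are placeholders standing in for our real codebase, and the snippet assumes Spark 1.4+ (SQLContext with df.write.parquet):

    # Sketch: PySpark reads compressed CSVs from S3, pre-processes them,
    # and writes the result as Parquet to Tachyon, where both Python and
    # Scala Spark apps can pick it up. Paths and host names are placeholders.
    from pyspark import SparkContext
    from pyspark.sql import SQLContext, Row

    sc = SparkContext(appName="csv-to-tachyon")
    sqlContext = SQLContext(sc)

    # textFile transparently decompresses .gz input.
    lines = sc.textFile("s3n://my-bucket/input/*.csv.gz")

    # Hypothetical stand-in for the existing Python pre-processing code.
    def preprocess(line):
        fields = line.split(",")
        return Row(id=int(fields[0]), value=fields[1])

    df = sqlContext.createDataFrame(lines.map(preprocess))

    # The Scala side could then load the same data with
    # sqlContext.read.parquet("tachyon://master:19998/shared/dataset").
    df.write.parquet("tachyon://master:19998/shared/dataset")

Does something along these lines make sense, or is there a better-established pattern for this?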