Hey, I am about to implement a Spark app that will need both PySpark and Spark on Scala.
Data should be read from AWS S3 (compressed CSV files) and must be pre-processed by an existing Python codebase. However, our final goal is to make those datasets available to Spark apps written in either Python or Scala, e.g. through Tachyon:

S3 => PySpark => Tachyon => {Py, Scala}Spark

Is there a recommended way to pass data between Spark applications implemented in different languages? I thought about using a serialisation framework like Thrift or Avro, but maybe there are other ways to do this (preferably without writing CSV files). I am open to any kind of input!
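To make the idea concrete, here is a minimal sketch of what I have in mind, using Parquet instead of Thrift/Avro as the intermediate format, since Parquet is self-describing and language-neutral, so both the Python and Scala DataFrame APIs can read it natively. The bucket name, Tachyon master address, and the preprocess function are placeholders standing in for our real codebase, and the snippet assumes Spark 1.4+ (SQLContext with df.write.parquet):

    # Sketch: PySpark reads compressed CSVs from S3, pre-processes them,
    # and writes the result as Parquet to Tachyon, where both Python and
    # Scala Spark apps can pick it up. Paths and host names are placeholders.
    from pyspark import SparkContext
    from pyspark.sql import SQLContext, Row

    sc = SparkContext(appName="csv-to-tachyon")
    sqlContext = SQLContext(sc)

    # textFile transparently decompresses .gz input.
    lines = sc.textFile("s3n://my-bucket/input/*.csv.gz")

    # Hypothetical stand-in for the existing Python pre-processing code.
    def preprocess(line):
        fields = line.split(",")
        return Row(id=int(fields[0]), value=fields[1])

    df = sqlContext.createDataFrame(lines.map(preprocess))

    # The Scala side could then load the same data with
    # sqlContext.read.parquet("tachyon://master:19998/shared/dataset").
    df.write.parquet("tachyon://master:19998/shared/dataset")

Does something along these lines make sense, or is there a better-established pattern for this?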