I would just like to be able to put a Spark DataFrame into a manager.dict() and
get it back out (manager.dict() pickles any object being stored).  Ideally, I
would store only a pointer to the DataFrame object, so that the data remains
distributed within Spark (i.e., it is not materialized and then stored).  Here
is an example:

from multiprocessing import Manager

data = sqlContext.jsonFile(data_file)  # load file (jsonFile is a SQLContext method, not SparkContext)
cache = Manager().dict()               # process-safe container; pickles values on insert
cache['id'] = data                     # store a reference to data, not a materialized result
new_data = cache['id']                 # get the reference to the distributed Spark DataFrame
new_data.show()
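The snag is that a Manager().dict() lives in a separate server process, so every value is pickled on assignment, and a PySpark DataFrame holds a handle to the SparkContext, which refuses to pickle. A minimal stdlib-only sketch of that failure mode (using threading.Lock() as a stand-in for the unpicklable context handle; the names `mgr` and `cache` are just illustrative):

```python
from multiprocessing import Manager
import threading

# manager.dict() is backed by a server process: every value stored in it
# is pickled on the way in and unpickled on the way out.
mgr = Manager()
cache = mgr.dict()

# Plain data pickles fine and round-trips through the manager.
cache['ok'] = {'rows': [1, 2, 3]}
print(cache['ok'])  # {'rows': [1, 2, 3]}

# A threading.Lock() stands in for an unpicklable handle such as the
# SparkContext inside a DataFrame: the assignment fails at pickle time.
try:
    cache['df'] = threading.Lock()
except TypeError as exc:
    print('cannot store:', exc)

mgr.shutdown()
```

A plain in-process dict would hold the DataFrame by reference without pickling it, which is closer to the "store a pointer" behavior described above, but of course it is not shared across processes the way a manager dict is.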

--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Pickle-Spark-DataFrame-tp14803p14825.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
