Is the dictionary read-only?
Did you look at http://spark.apache.org/docs/latest/programming-guide.html#broadcast-variables ?


-----Original Message-----
From: dgoldenberg [mailto:dgoldenberg...@gmail.com] 
Sent: Thursday, June 04, 2015 4:50 PM
To: user@spark.apache.org
Subject: How to share large resources like dictionaries while processing data with Spark?

We have some pipelines defined where sometimes we need to load potentially 
large resources such as dictionaries.

What would be the best strategy for sharing such resources among the 
transformations/actions within a consumer? Can they be shared somehow across 
the RDDs?

I'm looking for a way to load such a resource once into the cluster memory and 
have it be available throughout the lifecycle of a consumer...

Thanks.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-share-large-resources-like-dictionaries-while-processing-data-with-Spark-tp23162.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

