Spark, in general, is good for iterating through an entire dataset again
and again. All operations are expressed in terms of iteration through all
the records of at least one partition. You may want to look at IndexedRDD
(https://issues.apache.org/jira/browse/SPARK-2365), which aims to improve
point lookups and fine-grained updates on RDDs.
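
As a toy sketch of what "iteration through all the records of a partition" means in practice (plain Python mimicking the mapPartitions-style contract, not Spark's actual API): even a single-key lookup has to scan every record of every partition, because the only primitive available is an iterator over a whole partition.

```python
# Toy illustration (plain Python, not Spark code) of the
# mapPartitions-style contract: each operation receives an iterator
# over ALL records of a partition, so a point lookup still scans
# the entire partition.

def map_partitions(partitions, f):
    """Apply f to each partition's record iterator, like RDD.mapPartitions."""
    return [list(f(iter(part))) for part in partitions]

# A "dataset" split into two partitions of (key, value) records.
partitions = [
    [("a", 1), ("b", 2)],
    [("c", 3), ("d", 4)],
]

# Looking up key "c" still iterates every record of every partition.
def lookup_c(records):
    return (v for k, v in records if k == "c")

result = map_partitions(partitions, lookup_c)
print(result)  # [[], [3]]
```

This is exactly the access pattern IndexedRDD is meant to avoid, by maintaining an index so lookups don't require a full partition scan.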
Thanks!
On Thu, Oct 23, 2014 at 10:56 AM, Akshat Aranya aara...@gmail.com wrote:
Yes, that is a downside of Spark's design in general. The only way to
share data across consumers is to have a separate entity that owns the
SparkContext. That's the idea behind Ooyala's job server.