If you are working on large structures, you probably want to look at the GraphX
extension to Spark:
https://spark.apache.org/docs/latest/graphx-programming-guide.html
On June 14, 2015, at 10:50 AM, lisp lispra...@gmail.com wrote:
Hi there,
I have a large number of objects that I have to partition into chunks with
the help of a binary tree: after each object has been run through the tree,
the leaves of the tree contain the chunks. Next I have to process each of
those chunks in the same way with a function f(chunk). So I thought that if
I could turn the list of chunks into an RDD listOfChunks, I could use Spark
to do the processing in parallel by calling listOfChunks.map(f).
How would you recommend creating the RDD? Is it possible to start with an
RDD that is a list of empty chunks and then add my objects one by one to
the chunks they belong to? Or would you recommend something else?
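A minimal sketch of the approach described above, in plain Python standing in for the Spark job. The names here (Node, leaf_of, build_chunks, f) are illustrative, not from the original message, and the sc.parallelize lines assume a live SparkContext, so they are shown commented out. Note that RDDs are immutable, so objects cannot be added to an RDD one by one; a common pattern is to build the chunks first (on the driver, as here, or distributed via keyBy on the leaf id plus groupByKey) and then parallelize them:

```python
# Sketch: partition objects into chunks via a binary tree, then
# process the chunks with a function f. Names are illustrative.

class Node:
    def __init__(self, split=None, left=None, right=None, leaf_id=None):
        self.split = split      # predicate: object -> bool (go left if True)
        self.left, self.right = left, right
        self.leaf_id = leaf_id  # set only on leaf nodes

def leaf_of(node, obj):
    """Run one object down the tree; return the id of the leaf it lands in."""
    while node.leaf_id is None:
        node = node.left if node.split(obj) else node.right
    return node.leaf_id

def build_chunks(tree, objects):
    """Group the objects by the leaf each one reaches."""
    chunks = {}
    for obj in objects:
        chunks.setdefault(leaf_of(tree, obj), []).append(obj)
    return list(chunks.values())

def f(chunk):
    """Stand-in for the per-chunk processing."""
    return sum(chunk)

# Toy tree with two leaves: objects < 10 go left, the rest go right.
tree = Node(split=lambda x: x < 10,
            left=Node(leaf_id=0),
            right=Node(leaf_id=1))

chunks = build_chunks(tree, [1, 5, 20, 30, 7])
results = list(map(f, chunks))   # local equivalent of the Spark map

# With a live SparkContext sc, the parallel version would be:
# rdd = sc.parallelize(chunks)
# results = rdd.map(f).collect()
```

Each element of the RDD is then one whole chunk, so f runs once per chunk across the cluster, which matches the listOfChunks.map(f) idea in the question.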
Thanks!
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/creation-of-RDD-from-a-Tree-tp23310.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.