I am wondering about the same concept as the OP. Did anyone find an
answer to this question? As far as I can tell, Spark has no loops built
in, except looping over a dataset of existing/known size. So I often
create a "dummy" ArrayList and pass it to parallelize to control how
many times Spark will run the function I supply.

                // build a "dummy" ArrayList with HOW_MANY_DARTS elements -- how many
                // darts to throw. Note: new ArrayList<Integer>(HOW_MANY_DARTS) only
                // sets the initial capacity and leaves the list empty, so it must be
                // filled (here via java.util.Collections.nCopies) or parallelize
                // would get zero elements to work on.
                List<Integer> throwsList =
                        new ArrayList<Integer>(Collections.nCopies(HOW_MANY_DARTS, 0));

                JavaRDD<Integer> dataSet = jsc.parallelize(throwsList, SLICES);

Then, if I want to do the nested loop the OP was asking about, it seems
I would have to nest parallelize calls, with the outer parallelize being
passed an ArrayList of length i and the inner parallelize getting a
different ArrayList of length j?
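One way to avoid nesting parallelize calls is to flatten the i-by-j loop
space into a single list of i * j elements before handing it to parallelize,
then group results by outer index afterward. A minimal plain-Java sketch of
just the index bookkeeping (no Spark here; i and j are only illustrative
counts):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class FlattenLoops {
    // Build one list entry per (outer, inner) pair: i * j elements in total,
    // each recording which outer iteration it belongs to. Handing this single
    // list to parallelize replaces the nested loop; results can be grouped
    // by the outer index afterward.
    static List<Integer> outerIndexPerTask(int i, int j) {
        return IntStream.range(0, i * j)
                .map(k -> k / j)   // task k belongs to outer iteration k / j
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // 3 outer iterations x 4 inner iterations = 12 tasks
        System.out.println(outerIndexPerTask(3, 4));
        // prints [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
    }
}
```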

Is there not a more direct way to tell Spark to do something a fixed
number of times? It seems geared toward datasets of existing length,
like "iterate over the complete works of Shakespeare," where the content
already has a specific size.

In my case, for the simplest example, I am trying to run the PiAverage
Spark example, which "throws darts" at a circle j times. Then I want to
run that PiAverage i times and average the results to see whether they
really converge on pi.
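For what it's worth, here is the overall structure I mean, sketched in
plain Java rather than Spark (the method name, counts, and the fixed seed
are only illustrative): an inner "throw j darts" estimate, wrapped in an
outer driver-side loop that runs it i times and averages the results.

```java
import java.util.Random;

public class PiAverageSketch {
    // One "PiAverage" run: throw j darts at the unit square and estimate
    // pi from the fraction that land inside the quarter unit circle.
    static double estimatePi(int j, Random rng) {
        int inCircle = 0;
        for (int t = 0; t < j; t++) {
            double x = rng.nextDouble();
            double y = rng.nextDouble();
            if (x * x + y * y <= 1.0) inCircle++;
        }
        return 4.0 * inCircle / j;
    }

    public static void main(String[] args) {
        int i = 100;      // how many PiAverage runs to average
        int j = 10_000;   // darts per run
        Random rng = new Random(42); // fixed seed so the sketch is repeatable

        // Outer loop: run the estimate i times and average.
        double sum = 0.0;
        for (int run = 0; run < i; run++) {
            sum += estimatePi(j, rng);
        }
        System.out.println(sum / i); // should land close to 3.14159...
    }
}
```

In Spark terms, the inner method would become the parallelized job and
the outer loop would stay as ordinary driver code calling it i times.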

I would be happy to post more code, but what I am asking seems like a
more conceptual question. Thanks!



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-meet-nested-loop-on-pairRdd-tp21121p25718.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
