Iterative changes to RDD and broadcast variables

Shannon Quinn Sun, 16 Nov 2014 18:35:51 -0800

Hi all,

I'm iterating over an RDD (representing a distributed matrix; I have to roll my own in Python) and making changes to different submatrices at each iteration. The loop structure looks something like:


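for i in range(x):
    VAR = sc.broadcast(i)  # func1 and func2 read this via VAR.value
    # reassign so the next iteration builds on this iteration's result
    rdd = rdd.map(func1).reduceByKey(func2)
M = rdd.collect()  # the only action; nothing runs before this line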

where "func1" and "func2" use the current value of VAR for that iteration.

Because there aren't any "actions" in the main loop, nothing actually happens until the "collect" method is called. I'm running into problems I can't diagnose (*extremely* long execution time for no obvious reason, among others). Is this code even valid? If not, how should I make in-place iterative edits to different portions of a matrix, where each subsequent edit depends on the edits from the previous iteration?
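To make the lazy-evaluation concern concrete, here is a minimal pure-Python sketch that needs no Spark. `LazyPipeline`, `late_bound`, `early_bound` and `make_step` are hypothetical names for illustration only; they are not PySpark APIs. The toy class records `map` steps and only runs them when `collect()` is called, so the whole chain is replayed at the end. In plain Python, a deferred function that reads a loop variable sees that variable's final value, not its value in the iteration that created it. Whether PySpark's closure serialization captures `VAR` per iteration or later is an assumption this sketch does not settle; passing the per-iteration value in explicitly (here via a small factory function) avoids relying on either behavior.

```python
class LazyPipeline:
    """Hypothetical stand-in for an RDD: transformations are recorded,
    and nothing runs until the collect() "action" is called."""

    def __init__(self, data, steps=None):
        self.data = list(data)
        self.steps = steps or []

    def map(self, f):
        # Record the transformation instead of running it.
        return LazyPipeline(self.data, self.steps + [f])

    def collect(self):
        # The "action": replay every recorded step over the data.
        out = self.data
        for f in self.steps:
            out = [f(x) for x in out]
        return out


def late_bound(n):
    # Risky pattern: each lambda reads the loop variable `var` only when
    # collect() runs, so every step sees the final value of `var`.
    p = LazyPipeline([0, 0])
    for var in range(n):
        p = p.map(lambda x: x + var)
    return p.collect()


def make_step(value):
    # Factory function: `value` is fixed at the moment the step is created.
    return lambda x: x + value


def early_bound(n):
    # Safer pattern: each step carries its own iteration value.
    p = LazyPipeline([0, 0])
    for var in range(n):
        p = p.map(make_step(var))
    return p.collect()


print(late_bound(3))   # every step adds the last value, 2: [6, 6]
print(early_bound(3))  # steps add 0, 1, 2 in turn: [3, 3]
```

The same sketch also shows why a long chain of untriggered transformations can be slow: `collect()` replays every step recorded across all iterations in one go.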


Thanks in advance!
