Hi all,

I'm iterating over an RDD that represents a distributed matrix (I have to roll my own in Python), making changes to a different submatrix at each iteration. The loop looks something like this:

for i in range(x):
  VAR = sc.broadcast(i)
  rdd.map(func1).reduceByKey(func2)
M = rdd.collect()

where "func1" and "func2" use the current value of VAR for that iteration.

Because there aren't any "actions" in the main loop, nothing actually happens until "collect" is called. I'm running into problems I can't diagnose, among them *extremely* long execution times with no obvious cause. Is this code even valid? If not, how should I make in-place iterative edits to different portions of a matrix, where each edit depends on the results of the previous iteration?

Thanks in advance!
