Hi all,
I'm iterating over an RDD that represents a distributed matrix (I have to
roll my own in Python) and modifying different submatrices at each
iteration. The loop structure looks something like this:
    for i in range(x):
        VAR = sc.broadcast(i)
        rdd.map(func1).reduceByKey(func2)
    M = rdd.collect()
where "func1" and "func2" use the current value of VAR for that iteration.
Because there aren't any "actions" in the main loop, nothing actually
happens until the "collect" method is called. I'm running into problems
I can't diagnose (among others, *extremely* long execution time for no
apparent reason). Is this code even valid? If not, how should I make
in-place iterative edits to different portions of a matrix, where each
edit depends on the result of the previous iteration?
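
For illustration, here is a minimal plain-Python sketch of the deferred-evaluation pattern described above. It does not use Spark at all: `LazySeq`, `shift`, `run_deferred` and `run_eager` are hypothetical names, with `LazySeq` standing in for an RDD and `shift` for "func1". It assumes the mapped function reads VAR only when the pipeline finally runs, and that each step's result is assigned back to the variable (which the loop above does not do). Whether real Spark behaves the same way for this job is exactly what I'm unsure about.

```python
class LazySeq:
    """Hypothetical stand-in for an RDD: map() records work, collect() runs it."""

    def __init__(self, data, funcs=()):
        self._data = list(data)
        self._funcs = tuple(funcs)

    def map(self, func):
        # Return a new LazySeq with func appended; nothing is computed yet.
        return LazySeq(self._data, self._funcs + (func,))

    def collect(self):
        # Apply every recorded function, in order, to every element.
        out = []
        for item in self._data:
            for func in self._funcs:
                item = func(item)
            out.append(item)
        return out


VAR = None  # plays the role of the per-iteration value


def shift(x):
    # Reads the module-level VAR when called, not when map() was invoked.
    return x + VAR


def run_deferred(data, steps):
    """Build the whole chain first, evaluate once at the end."""
    global VAR
    seq = LazySeq(data)
    for i in range(steps):
        VAR = i
        seq = seq.map(shift)  # only recorded, not executed
    return seq.collect()      # every step runs now and sees the last VAR


def run_eager(data, steps):
    """Force evaluation inside the loop so each step sees its own VAR."""
    global VAR
    current = list(data)
    for i in range(steps):
        VAR = i
        current = LazySeq(current).map(shift).collect()
    return current
```

With three steps, `run_deferred` applies the last value of VAR three times, while `run_eager` applies 0, 1 and 2 in turn, so the two give different results for the same input. In Spark terms the second variant would roughly correspond to forcing an action in every iteration; this sketch does not establish whether that is the right fix for the job above.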
Thanks in advance!