To clarify what, precisely, seems impossible: the crash happens with
INDEX == 1 inside func2, but func2 is only invoked by the reduceByKey
transformation, which runs while INDEX == 0. And judging by the output
of the foreach() on line 4, that reduceByKey(func2) completes just
fine. How, then, is func2 invoked again with INDEX == 1 when there
clearly isn't another reduce call at line 7?
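For concreteness, here's a toy, Spark-free sketch of how a lazy pipeline can re-invoke an "earlier" function on a later action when nothing is cached (the LazyPipeline class and func_a are made up for illustration; they are not Spark APIs):

```python
# Toy lazy pipeline: transformations only record functions; each
# "action" (collect) replays the entire recorded chain from scratch,
# the way an uncached RDD lineage is recomputed per action.
class LazyPipeline:
    def __init__(self, data, ops=()):
        self.data = data
        self.ops = ops

    def map(self, f):
        # Lazy: nothing runs yet, we just extend the lineage.
        return LazyPipeline(self.data, self.ops + (f,))

    def collect(self):
        out = self.data
        for f in self.ops:
            out = [f(x) for x in out]
        return out

calls = []

def func_a(x):
    calls.append("func_a")
    return x + 1

p = LazyPipeline([1, 2, 3]).map(func_a)
p.collect()             # func_a runs 3 times
p = p.map(lambda x: x)  # a later transformation on the same pipeline
p.collect()             # func_a runs 3 MORE times: the old stage replays
```

If Spark behaves analogously here, the second foreach would re-run the whole lineage, including the reduceByKey(func2) stage, even though that stage already "worked fine" during the first foreach.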
On 11/18/14 1:58 PM, Shannon Quinn wrote:
Hi all,
This is somewhat related to my previous question (
http://apache-spark-user-list.1001560.n3.nabble.com/Iterative-changes-to-RDD-and-broadcast-variables-tt19042.html
, for additional context) but for all practical purposes this is its
own issue.
As in my previous question, I'm making iterative changes to an RDD,
where each iteration depends on the results of the previous one. I've
stripped what was previously a loop down to just two sequential edits,
to try to pin down where the problem is. It looks like this:
index = 0
INDEX = sc.broadcast(index)
M = M.flatMap(func1).reduceByKey(func2)
M.foreach(debug_output)
index = 1
INDEX = sc.broadcast(index)
M = M.flatMap(func1)
M.foreach(debug_output)
M is basically a row-indexed matrix, where each index points to a
dictionary (more or less a sparse matrix, with some domain-specific
modifications). This program crashes on the second-to-last (7th) line;
the creepy part is that the traceback says the crash happens in "func2"
with the broadcast variable "INDEX" == 1 (it attempts to access a key
that doesn't exist in the dictionary for one of the rows).
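One plain-Python behavior that might be relevant here (a guess, not a diagnosis): a function body looks up global names when it runs, not when it is defined, so a function that refers to INDEX by name sees whatever INDEX is bound to at call time. A minimal, Spark-free sketch (func2_like is a hypothetical stand-in for func2):

```python
INDEX = 0

def func2_like(a, b):
    # Global names are resolved each time the function runs,
    # not captured once at definition time.
    return (a + b, INDEX)

first = func2_like(1, 2)   # INDEX is 0 at this call
INDEX = 1                  # rebinding, as in INDEX = sc.broadcast(1)
second = func2_like(1, 2)  # the same function now sees INDEX == 1
```

If func2 is re-serialized and re-executed after the second sc.broadcast() rebinds INDEX, it would observe the new value the same way.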
How is that even possible? Am I missing something fundamental about
how Spark works under the hood?
Thanks for your help!
Shannon