Iterative transformations over RDD crashes in phantom reduce

2014-11-18 Thread Shannon Quinn
Hi all, This is somewhat related to my previous question ( http://apache-spark-user-list.1001560.n3.nabble.com/Iterative-changes-to-RDD-and-broadcast-variables-tt19042.html , for additional context) but for all practical purposes this is its own issue. As in my previous question, I'm making

Re: Iterative transformations over RDD crashes in phantom reduce

2014-11-18 Thread Shannon Quinn
To clarify about what, precisely, is impossible: the crash happens with INDEX == 1 in func2, but func2 is only called in the reduceByKey transformation when INDEX == 0. And according to the output of the foreach() in line 4, that reduceByKey(func2) works just fine. How is it then invoked again

Re: Iterative transformations over RDD crashes in phantom reduce

2014-11-18 Thread Shannon Quinn
Sorry everyone--turns out an oft-forgotten single line of code was required to make this work: index = 0 INDEX = sc.broadcast(index) M = M.flatMap(func1).reduceByKey(func2) M.foreach(debug_output) *M.cache()* index = 1 INDEX = sc.broadcast(index) M = M.flatMap(func1) M.foreach(debug_output)