Hi all,
This is somewhat related to my previous question (see
http://apache-spark-user-list.1001560.n3.nabble.com/Iterative-changes-to-RDD-and-broadcast-variables-tt19042.html
for additional context), but for all practical purposes it is a separate
issue.
As in my previous question, I'm making
To clarify what, precisely, seems impossible: the crash happens with
INDEX == 1 in func2, but func2 is only called in the reduceByKey
transformation when INDEX == 0. And according to the output of the
foreach() in line 4, that reduceByKey(func2) works just fine. How is it
then invoked again
Sorry everyone--turns out an oft-forgotten single line of code was
required to make this work:
index = 0
INDEX = sc.broadcast(index)
M = M.flatMap(func1).reduceByKey(func2)
M.foreach(debug_output)
M.cache()  # <-- the line that was missing
index = 1
INDEX = sc.broadcast(index)
M = M.flatMap(func1)
M.foreach(debug_output)
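
A plain-Python analogy (not Spark code) of one plausible reading of the crash. Spark transformations are lazy, and an RDD that is not persisted is recomputed from its lineage by each action. The sketch assumes the functions read the global INDEX when they run rather than when the pipeline is built. Here `func1` is a hypothetical stand-in for the real function, `build` is an illustrative helper, and `list(...)` loosely plays the role of persisted results; in Spark, cache() only marks the RDD, and the data is stored when an action next computes it.

```python
# Analogy only: a generator pipeline is lazy, much as RDD transformations are.
INDEX = 0


def func1(x):
    # Hypothetical stand-in: reads the global INDEX when called,
    # not when the pipeline is defined.
    return (INDEX, x)


def build(data):
    # Nothing is computed here; func1 runs only when the result is consumed.
    return (func1(x) for x in data)


# Case 1: no materialization. The pipeline is defined while INDEX == 0,
# but it is only evaluated after INDEX has been rebound to 1.
lazy = build([1, 2])
INDEX = 1
result_lazy = list(lazy)

# Case 2: materialize first (the role persisted results play), then rebind.
INDEX = 0
materialized = list(build([1, 2]))
INDEX = 1
result_materialized = materialized

print(result_lazy)          # values computed with INDEX == 1
print(result_materialized)  # values computed with INDEX == 0
```

In the first case the earlier step silently runs again under the new INDEX, which would match func2 being invoked with INDEX == 1 even though it was only ever meant to run with INDEX == 0.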