I'm not sure, but I wonder whether, because you are using the Spark REPL, you are not seeing what a normal runtime execution would look like. The REPL may be eagerly running a partial DAG as soon as you define an operation that would cause a shuffle.
What happens if you set up your same set of commands [a-e] in a file and use the Spark REPL's `:load` or `:paste` command to load them all at once?

On Wed, Apr 29, 2015 at 2:55 PM, Tom Hubregtsen <thubregt...@gmail.com> wrote:

> Thanks for the responses.
>
> "Try removing toDebugString and see what happens. "
>
> The toDebugString is performed after [d] (the action), as [e]. By then all
> stages are already executed.
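
To make the load-from-file suggestion concrete, here is a rough sketch of what such a file could look like. I don't know your actual commands, so the file name `commands.scala`, the sample data, and the particular transformations are all placeholders for your steps [a] to [e]; the only thing taken from your description is that the action comes at [d] and `toDebugString` at [e].

```scala
// commands.scala: hypothetical stand-in for steps [a]-[e].
// Load it in spark-shell with `:load commands.scala`.
// `sc` is the SparkContext that spark-shell creates for you.

// [a] source RDD (placeholder data, not your input)
val lines = sc.parallelize(Seq("a b", "b c", "c a"))

// [b] narrow transformations, no shuffle needed
val pairs = lines.flatMap(_.split(" ")).map(word => (word, 1))

// [c] a transformation that requires a shuffle
val counts = pairs.reduceByKey(_ + _)

// [d] the action that should trigger the computation
val total = counts.count()

// [e] print the lineage after the action has run
println(counts.toDebugString)
```

If the extra stage still shows up in the web UI before [d] when everything is loaded at once, then it is probably not the REPL's line-by-line evaluation that is causing it.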