"I'm not sure, but I wonder if because you are using the Spark REPL that it
may not be representing what a normal runtime execution would look like and
is possibly eagerly running a partial DAG once you define an operation that
would cause a shuffle.

What happens if you set up your same set of commands [a-e] in a file and use
the Spark REPL's `:load` or `:paste` command to load them all at once?" -- From
Richard

I have also packaged it in a jar file (without [e], the debug string), and I
still see the extra stage run before the two stages I would expect. Even
when I remove [d], the action, I still see stage 0 being executed (and do
not see stages 1 and 2).

Again, a shortened log of Stage 0:
INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[4] at
sortByKey), which has no missing parents
INFO DAGScheduler: ResultStage 0 (sortByKey) finished in 0.192 s
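A likely explanation, though not confirmed in this thread: `sortByKey` builds a RangePartitioner, and the partitioner samples the keys up front to choose partition boundaries. That sampling runs as a small job as soon as the transformation is defined, which is why an extra stage appears even without an action. Below is a minimal Python sketch of that sampling idea only; the helper names (`range_bounds`, `partition_index`) are hypothetical and this is not Spark's actual implementation.

```python
import random

def range_bounds(keys, num_partitions, sample_size=20):
    """Pick num_partitions - 1 boundary keys from a random sample.

    This mimics the eager step: the data must be *read and sampled*
    before any sorting happens, which in Spark surfaces as a stage.
    """
    sample = sorted(random.sample(keys, min(sample_size, len(keys))))
    step = len(sample) / num_partitions
    return [sample[int(step * (i + 1))] for i in range(num_partitions - 1)]

def partition_index(key, bounds):
    """Route a key to the partition whose key range contains it."""
    for i, bound in enumerate(bounds):
        if key <= bound:
            return i
    return len(bounds)  # key is larger than every boundary

keys = list(range(100))
bounds = range_bounds(keys, num_partitions=4)
# Every key lands in one of the 4 partitions defined by 3 boundaries.
assert len(bounds) == 3
assert all(0 <= partition_index(k, bounds) < 4 for k in keys)
```

If that is what is happening here, the extra stage 0 is the boundary-sampling job, and stages 1 and 2 (the actual shuffle and sort) only run once an action like [d] is invoked.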




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Extra-stage-that-executes-before-triggering-computation-with-an-action-tp22707p22713.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
