Re: previous stage results are not saved?

2016-10-17 Thread Mark Hamstra
There is no need to do that if 1) the stage that you are concerned with either made use of or produced MapOutputs/shuffle files; 2) reuse of those shuffle files (which may very well be in the OS buffer cache of the worker nodes) is sufficient for your needs; 3) the relevant Stage objects haven't

Re: previous stage results are not saved?

2016-10-17 Thread ayan guha
You can use cache or persist.
On Tue, Oct 18, 2016 at 10:11 AM, Yang wrote:
> I'm trying out 2.0, and ran a long job with 10 stages, in spark-shell
>
> it seems that after all 10 finished successfully, if I run the last, or
> the 9th again,
> spark reruns all the previous
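
To illustrate the cache/persist suggestion, here is a toy model in plain Python of a lazily evaluated pipeline. It is only a sketch: `LazyDataset` and its methods are hypothetical names invented for this example, not Spark's API, and it does not model the shuffle-file reuse described in the other reply. In Spark itself the corresponding calls are `cache()` or `persist()` on the RDD or Dataset you want to keep.

```python
class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not Spark)."""

    def __init__(self, name, compute, log):
        self.name = name
        self._compute = compute    # zero-argument function that builds the data
        self._log = log            # shared list recording every stage execution
        self._use_cache = False
        self._cached = None

    def map(self, name, fn):
        # Defines a new stage; nothing is computed until collect() is called.
        parent = self
        return LazyDataset(
            name, lambda: [fn(x) for x in parent._materialize()], self._log
        )

    def cache(self):
        # Marks this stage so its result is kept after the first computation.
        self._use_cache = True
        return self

    def _materialize(self):
        if self._use_cache and self._cached is not None:
            return self._cached            # reuse the stored result
        result = self._compute()           # recompute, including all parents
        self._log.append(self.name)
        if self._use_cache:
            self._cached = result
        return result

    def collect(self):
        return list(self._materialize())


log = []
stage1 = LazyDataset("stage1", lambda: [1, 2, 3], log)
stage2 = stage1.map("stage2", lambda x: x * 10)
stage3 = stage2.map("stage3", lambda x: x + 1)

# Without caching, running the last stage twice recomputes every stage twice.
stage3.collect()
stage3.collect()
uncached_runs = list(log)

# With stage2 marked as cached, the second run only recomputes stage3.
log.clear()
stage2.cache()
stage3.collect()
stage3.collect()
cached_runs = list(log)
```

In this model `uncached_runs` records all three stages twice, while `cached_runs` records the three stages once and then only `stage3`. The design point is that caching is opt-in per dataset: you mark the intermediate result you expect to reuse before the action that first computes it.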

previous stage results are not saved?

2016-10-17 Thread Yang
I'm trying out 2.0 and ran a long job with 10 stages in spark-shell. It seems that after all 10 finished successfully, if I run the last or the 9th again, Spark reruns all the previous stages from scratch instead of using the partial results. This is quite serious since I can't experiment