There is no need to do that if: 1) the stage you are concerned with
either made use of or produced MapOutputs/shuffle files; 2) reuse of those
shuffle files (which may very well be in the OS buffer cache of the worker
nodes) is sufficient for your needs; and 3) the relevant Stage objects haven't
You can use cache() or persist().
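A minimal spark-shell sketch of that approach, assuming a hypothetical multi-stage computation (the DataFrame names and the grouping expression are made up for illustration; `spark` is the SparkSession the shell provides):

```scala
import org.apache.spark.storage.StorageLevel
import spark.implicits._

// A stand-in for an expensive multi-stage job.
val base = spark.range(0L, 1000000L).toDF("id")

// Mark the intermediate result for persistence. Nothing is computed yet;
// the first action below materializes it and keeps the partitions around.
val intermediate = base
  .groupBy(($"id" % 100).as("bucket"))
  .count()
  .persist(StorageLevel.MEMORY_AND_DISK)  // cache() is shorthand for MEMORY_ONLY

intermediate.count()  // first action: runs all upstream stages, then caches
intermediate.show(5)  // later actions read the cached partitions instead
                      // of rerunning the earlier stages from scratch
```

When you are done with the intermediate result, `intermediate.unpersist()` frees the cached partitions.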
On Tue, Oct 18, 2016 at 10:11 AM, Yang wrote:
> I'm trying out 2.0, and ran a long job with 10 stages, in spark-shell
>
> it seems that after all 10 finished successfully, if I run the last, or
> the 9th again,
> spark reruns all the previous stages from scratch, instead of utilizing
> the partial results.
I'm trying out 2.0, and ran a long job with 10 stages in spark-shell.

It seems that after all 10 finished successfully, if I run the last stage,
or the 9th, again, Spark reruns all the previous stages from scratch,
instead of utilizing the partial results.

This is quite serious since I can't experiment