Re: Custom persist or cache of RDD?

2014-11-11 Thread Daniel Siegmann
But that requires an (unnecessary) load from disk. I have run into this same issue, where we want to save intermediate results but continue processing. The cache / persist feature of Spark doesn't seem designed for this case. Unfortunately I'm not aware of a better solution with the current

Re: Custom persist or cache of RDD?

2014-11-10 Thread Sean Owen
Well, you can always create C by loading B from disk, and likewise for E / D. No need for any custom procedure.

On Mon, Nov 10, 2014 at 7:33 PM, Benyi Wang bewang.t...@gmail.com wrote:
When I have a multi-step process flow like this: A - B - C - D - E - F I need to store B and D's results