You can always recreate C by loading B's Parquet file from disk, and likewise E from D. No custom persist/restore procedure is needed.
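A minimal sketch of that pattern, assuming a Spark 1.x SQLContext; `stepB`/`stepC`/`stepE`/`stepF` stand in for the user's own transformations, and the Parquet paths are hypothetical:

```scala
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)  // assumes an existing SparkContext `sc`

// First run: materialize the intermediate results as Parquet.
val b = stepB(a)
b.saveAsParquetFile("/data/B.parquet")
val d = stepD(stepC(b))
d.saveAsParquetFile("/data/D.parquet")

// Later (or after a failure in F): reload from disk instead of
// recomputing the lineage from A.
val bRestored = sqlContext.parquetFile("/data/B.parquet")
val dRestored = sqlContext.parquetFile("/data/D.parquet")
val f = stepF(stepE(dRestored))
```

Reading the files back gives you a new SchemaRDD whose lineage starts at the Parquet files, so a failure in F only triggers recomputation from that point, not from A.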
On Mon, Nov 10, 2014 at 7:33 PM, Benyi Wang <bewang.t...@gmail.com> wrote:
> When I have a multi-step process flow like this:
>
> A -> B -> C -> D -> E -> F
>
> I need to store B and D's results in Parquet files:
>
> B.saveAsParquetFile
> D.saveAsParquetFile
>
> If I don't cache/persist any step, Spark might recompute A, B, C, D, and E
> if something goes wrong in F.
>
> Of course, I could cache all steps to avoid this recomputation if I have
> enough memory, or persist the results to disk. But persisting B and D
> seems redundant with saving B and D as Parquet files.
>
> I'm wondering if Spark can restore B and D from the Parquet files using a
> customized persist and restore procedure?