Hi Jeff, that looks sane to me. Do you have additional details?

On 1 August 2017 at 11:05, jeff saremi <jeffsar...@hotmail.com> wrote:
> Calling cache/persist fails all our jobs (i have posted 2 threads on
> this).
>
> And we're giving up hope in finding a solution.
> So I'd like to find a workaround for that:
>
> If I save an RDD to hdfs and read it back, can I use it in more than one
> operation?
>
> Example: (using cache)
> // do a whole bunch of transformations on an RDD
>
> myrdd.cache()
>
> val result1 = myrdd.map(op1(_))
>
> val result2 = myrdd.map(op2(_))
>
> // in the above I am assuming that a call to cache will prevent all
> previous transformation from being calculated twice
>
> I'd like to somehow get result1 and result2 without duplicating work. How
> can I do that?
>
> thanks
>
> Jeff
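
For reference, a minimal sketch of the save-and-read-back workaround asked about above, not a tested fix for the cache/persist failures. It assumes Int elements, a local master so it can run standalone, and a temporary directory standing in for the HDFS path; op1, op2 and the object name SaveAndReload are hypothetical placeholders. saveAsObjectFile is an action, so the upstream transformations run once when the data is written. The reloaded RDD's lineage starts at the saved files, so result1 and result2 each re-read those files but do not recompute the earlier transformations.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SaveAndReload {
  // Hypothetical stand-ins for op1 and op2 from the quoted example.
  def op1(x: Int): Int = x + 1
  def op2(x: Int): Int = x * 2

  // Writes myrdd to `path`, reads it back, and derives both results from
  // the reloaded copy. `path` must not exist yet, and the element type
  // must be Java-serializable for saveAsObjectFile to work.
  def run(sc: SparkContext, myrdd: RDD[Int], path: String): (RDD[Int], RDD[Int]) = {
    myrdd.saveAsObjectFile(path)            // action: upstream work runs once here
    val reloaded = sc.objectFile[Int](path) // lineage now starts at the saved files
    (reloaded.map(op1), reloaded.map(op2))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local master for a standalone run; drop setMaster when
    // submitting to a cluster and pass an hdfs:// path as the first argument.
    val conf = new SparkConf().setAppName("save-and-reload").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val path = args.headOption.getOrElse(
        Files.createTempDirectory("myrdd").resolve("data").toString)
      // Placeholder for "a whole bunch of transformations on an RDD".
      val myrdd = sc.parallelize(1 to 5).map(_ * 10)
      val (result1, result2) = run(sc, myrdd, path)
      println(result1.collect().sorted.mkString(","))
      println(result2.collect().sorted.mkString(","))
    } finally {
      sc.stop()
    }
  }
}
```

The trade-off against cache() is an extra write to and read from storage, in exchange for not depending on executor memory holding the cached blocks.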