Hi,

the 4th step should contain "transformrdd2", right?

Since transformations are chained and executed only when an action is
called (lazy evaluation), I would say that adding persist() to step 1
would not do any good. It may even be harmful, as you could lose the
optimisation of pipelining the 3 steps into one operation.

If a second action is executed on any of the transformations,
persisting the farthest common transformation would be a good idea
(e.g. rdd1, if both RDDs derived from it end up with an action).
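
To make the reasoning concrete, here is a small toy model in plain
Python. It is NOT Spark's API: the LazyRDD class, its methods and the
run() helper are hypothetical names invented for this sketch. It only
counts how often the shared parent is computed, with and without
persist(), for one action versus two.

```python
# Toy model of lazy evaluation. This is an illustration only, not Spark.
class LazyRDD:
    def __init__(self, compute):
        self._compute = compute   # zero-argument function producing the data
        self._persist = False
        self._cache = None
        self.evaluations = 0      # how many times this node was computed

    def map(self, fn):
        # Transformation: only records lineage, nothing is computed yet.
        return LazyRDD(lambda: [fn(x) for x in self._materialize()])

    def persist(self):
        # Mark this node so its data is kept after the first computation.
        self._persist = True
        return self

    def _materialize(self):
        if self._persist and self._cache is not None:
            return self._cache
        self.evaluations += 1
        data = self._compute()
        if self._persist:
            self._cache = data
        return data

    def collect(self):
        # Action: triggers evaluation of the whole lineage.
        return self._materialize()


def run(persist_rdd1, actions_on_both):
    """Return how many times rdd1 was computed for the given scenario."""
    rdd1 = LazyRDD(lambda: [1, 2, 3])
    if persist_rdd1:
        rdd1.persist()
    transformed1 = rdd1.map(lambda x: x * 2)
    transformed2 = rdd1.map(lambda x: x + 1)
    transformed1.collect()
    if actions_on_both:
        transformed2.collect()
    return rdd1.evaluations
```

With a single action, run(False, False) and run(True, False) both
compute rdd1 once, so persist() buys nothing. With an action on both
children, rdd1 is computed twice without persist() and once with it.
The real cost model in Spark (storage level, memory pressure, stage
pipelining) is more involved than this sketch.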

Regards,
--
  Bedrytski Aliaksandr
  sp...@bedryt.ski



On Thu, Sep 29, 2016, at 07:09, Shushant Arora wrote:
> Hi
>
> I have a flow like below
>
> 1.rdd1=some source.transform();
> 2.tranformedrdd1 = rdd1.transform(..);
> 3.transformrdd2 = rdd1.transform(..);
>
> 4.tranformrdd1.action();
>
> Does I need to persist rdd1 to optimise step 2 and 3 ? or since there
> is no lineage breakage so it will work without persist ?
>
> Thanks
>
