Hi, the 4th step should contain "transformrdd2", right?
Considering that transformations are lined up and executed only when there is an action (also known as lazy evaluation), I would say that adding persist() to step 1 would not do any good (and may even be harmful, as you may lose the optimisations gained by lining up the 3 steps into one operation). If a second action is executed on any of the transformations, persisting the farthest common transformation would be a good idea.

Regards,
--
  Bedrytski Aliaksandr
  sp...@bedryt.ski

On Thu, Sep 29, 2016, at 07:09, Shushant Arora wrote:
> Hi
>
> I have a flow like below
>
> 1.rdd1=some source.transform();
> 2.tranformedrdd1 = rdd1.transform(..);
> 3.transformrdd2 = rdd1.transform(..);
>
> 4.tranformrdd1.action();
>
> Do I need to persist rdd1 to optimise steps 2 and 3? Or, since there
> is no lineage breakage, will it work without persist?
>
> Thanks
>
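
To make the reasoning in the reply concrete, here is a minimal sketch in plain Python. It is a toy model of lazy lineage, not the Spark API: the `LazyDataset` class and the `run_flow` helper are hypothetical names invented for illustration, and the model ignores partitions, executors and storage levels. It only counts how many times the step-1 computation behind `rdd1` runs, depending on whether `rdd1` is persisted and whether one or both branches get an action.

```python
# Toy model of lazy lineage in plain Python. This is NOT the Spark API;
# LazyDataset and run_flow are hypothetical names used only for illustration.

class LazyDataset:
    def __init__(self, compute):
        self._compute = compute    # zero-argument function that builds the data
        self._persisted = False
        self._cache = None

    def transform(self, fn):
        # Lazy: only records how to derive the child from this parent.
        return LazyDataset(lambda: [fn(x) for x in self._materialize()])

    def persist(self):
        # Marks the dataset for caching; nothing is computed yet.
        self._persisted = True
        return self

    def _materialize(self):
        if self._persisted and self._cache is not None:
            return self._cache     # reuse the result of an earlier action
        data = self._compute()
        if self._persisted:
            self._cache = data
        return data

    def action(self):
        # Only an action triggers evaluation of the recorded lineage.
        return self._materialize()


def run_flow(persist_rdd1, action_on_both):
    """Return how many times the step-1 computation was executed."""
    evaluations = {"rdd1": 0}

    def source_transform():
        evaluations["rdd1"] += 1
        return [x * 2 for x in range(5)]

    rdd1 = LazyDataset(source_transform)                # step 1
    if persist_rdd1:
        rdd1.persist()
    tranformedrdd1 = rdd1.transform(lambda x: x + 1)    # step 2
    transformrdd2 = rdd1.transform(lambda x: x * 10)    # step 3

    tranformedrdd1.action()                             # step 4
    if action_on_both:
        transformrdd2.action()                          # a second action
    return evaluations["rdd1"]


if __name__ == "__main__":
    for persist in (False, True):
        for both in (False, True):
            print(persist, both, run_flow(persist, both))
```

In this toy model, a single action evaluates step 1 once whether or not `rdd1` is persisted, so persisting buys nothing in the flow as posted. Only when a second action is run on the other branch does the unpersisted version evaluate step 1 twice, which is the case where persisting the common parent pays off.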