Because you're passing lambdas through the Dataset or RDD API, the Catalyst optimizer won't fuse the two filters into one; that happens only with the DataFrame API. Either way, both filters execute within a single stage, so the overhead of having two separate filters instead of one combined filter is very small.
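
To illustrate why the two forms give the same result with little extra cost, here is a minimal sketch using plain java.util.stream rather than Spark. It is only an analogy: the class FilterDemo, the Point type, and the SAMPLE data are hypothetical names made up for this example, and Java streams are not how Spark executes a stage. The shared idea is that chained filters are applied element by element in a single pass, not as two separate passes over the data.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FilterDemo {
    // Hypothetical record type standing in for the "t" in the question.
    static final class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Hypothetical sample data; only the first element has x == 1 and y == 2.
    static final List<Point> SAMPLE = Arrays.asList(
            new Point(1, 2), new Point(1, 3), new Point(2, 2), new Point(0, 0));

    // Approach 1: one filter with a combined predicate.
    static List<Point> combined(List<Point> in) {
        return in.stream()
                .filter(t -> t.x == 1 && t.y == 2)
                .collect(Collectors.toList());
    }

    // Approach 2: two chained filters. The stream is lazy, so each element
    // flows through both predicates in the same pass over the data.
    static List<Point> chained(List<Point> in) {
        return in.stream()
                .filter(t -> t.x == 1)
                .filter(t -> t.y == 2)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("combined matches: " + combined(SAMPLE).size());
        System.out.println("chained matches:  " + chained(SAMPLE).size());
    }
}
```

Note that the comparison is written with ==, since t.x=1 in a Java lambda would be an assignment and would not compile as a predicate.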

On Tue, Oct 3, 2017 at 8:14 AM, Ahmed Mahmoud <don1...@gmail.com> wrote:
> Hi All,
>
> Just a quick question from an optimisation point of view:
>
> Approach 1:
> .filter (t-> t.x=1 && t.y=2)
>
> Approach 2:
> .filter (t-> t.x=1)
> .filter (t-> t.y=2)
>
> Is there a difference or one is better than the other or both are same?
>
> Thanks!
> Ahmed Mahmoud