Re: Multiple filters vs multiple conditions

2017-10-03 Thread Vadim Semenov
Since you're using the Dataset API or the RDD API, the two filters won't be fused together by the Catalyst optimizer; that only happens if you use the DF API. Both filters will still get executed within one stage, so there'll be only a very small overhead from having two separate filters vs. having only one. On Tue, Oct 3, 2017 at 8:14 AM, Ahmed

Re: Multiple filters vs multiple conditions

2017-10-03 Thread Michael Artz
Hi Ahmed, Depending on which version you have, it could matter. We received an email about multiple conditions in the filter not being picked up. I copied the email below that was sent out to the Spark user list. The user never tried multiple single-condition filters, which might have worked. Hi

Re: Multiple filters vs multiple conditions

2017-10-03 Thread ayan guha
Remember, transformations are lazy, so nothing happens until you call an action. At that point both are the same. On Tue, Oct 3, 2017 at 11:19 PM, Femi Anthony wrote: > I would assume that the optimizer would end up transforming both to the > same expression. > > Femi > >

Re: Multiple filters vs multiple conditions

2017-10-03 Thread Femi Anthony
I would assume that the optimizer would end up transforming both to the same expression. Femi Sent from my iPhone > On Oct 3, 2017, at 8:14 AM, Ahmed Mahmoud wrote: > > Hi All, > > Just a quick question from an optimisation point of view: > > Approach 1: > .filter (t->

Multiple filters vs multiple conditions

2017-10-03 Thread Ahmed Mahmoud
Hi All, Just a quick question from an optimisation point of view: Approach 1: .filter (t -> t.x == 1 && t.y == 2) Approach 2: .filter (t -> t.x == 1) .filter (t -> t.y == 2) Is there a difference, is one better than the other, or are both the same? Thanks! Ahmed Mahmoud
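
For readers who want to see the two approaches side by side, here is a minimal sketch in plain Java. It uses `java.util.stream` as a stand-in, not Spark, so it only illustrates that both forms select the same elements; it says nothing about how Spark plans or optimizes them. The `Point` class, its fields `x` and `y`, and the `FilterDemo` name are hypothetical, chosen to mirror the `t.x` and `t.y` in the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FilterDemo {
    // Hypothetical element type standing in for `t` in the question.
    static final class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
        @Override public String toString() { return "(" + x + "," + y + ")"; }
    }

    // Approach 1: a single filter with a compound condition.
    static List<Point> combined(List<Point> in) {
        return in.stream()
                 .filter(t -> t.x == 1 && t.y == 2)
                 .collect(Collectors.toList());
    }

    // Approach 2: two chained single-condition filters.
    static List<Point> chained(List<Point> in) {
        return in.stream()
                 .filter(t -> t.x == 1)
                 .filter(t -> t.y == 2)
                 .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Point> data = Arrays.asList(
            new Point(1, 2), new Point(1, 3), new Point(0, 2));
        // Both calls keep only the element with x == 1 and y == 2.
        System.out.println(combined(data));
        System.out.println(chained(data));
    }
}
```

In this stand-in, each element passes through both chained predicates in a single traversal, which is loosely analogous to the point made in the replies that two filters run within one stage. Whether Spark merges them into one expression depends on the API in use, as discussed in the thread.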