Re: Multiple filters vs multiple conditions

Vadim Semenov Tue, 03 Oct 2017 09:06:07 -0700

Since you're using the Dataset API or the RDD API, the two filters won't be
fused together by the Catalyst optimizer; for that you'd have to use the
DataFrame (DF) API. Still, the two filters will get executed within one
stage, so there'll be very small overhead from having two separate filters
versus having only one.
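
To illustrate the point, here's a rough sketch using plain Java streams
rather than Spark, so treat it only as an analogy: chained filters are
applied element by element in one pass, much like narrow transformations
inside a single stage. The names FilterChaining and Row are made up for
the example and aren't from your code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FilterChaining {
    // Hypothetical stand-in for the element type `t` in the question.
    static final class Row {
        final int x;
        final int y;

        Row(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Approach 1: one filter carrying both conditions.
    static List<Row> singleFilter(List<Row> rows) {
        return rows.stream()
                .filter(t -> t.x == 1 && t.y == 2)
                .collect(Collectors.toList());
    }

    // Approach 2: two chained filters. Each element still flows through
    // both predicates in a single pass; no intermediate list is built.
    static List<Row> chainedFilters(List<Row> rows) {
        return rows.stream()
                .filter(t -> t.x == 1)
                .filter(t -> t.y == 2)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(new Row(1, 2), new Row(1, 3), new Row(0, 2));
        // Both approaches keep the same single row.
        System.out.println(singleFilter(rows).size());
        System.out.println(chainedFilters(rows).size());
    }
}
```

Both approaches return the same rows; the only extra cost in the chained
version is one more function call per element that passes the first filter.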


On Tue, Oct 3, 2017 at 8:14 AM, Ahmed Mahmoud <don1...@gmail.com> wrote:

> Hi All,
>
> Just a quick question from an optimisation point of view:
>
> Approach 1:
> .filter(t -> t.x == 1 && t.y == 2)
>
> Approach 2:
> .filter(t -> t.x == 1)
> .filter(t -> t.y == 2)
>
> Is there a difference, is one better than the other, or are both the same?
>
> Thanks!
> Ahmed Mahmoud
>
>
