Since you're using the Dataset API or the RDD API, the two filters won't be
fused by the Catalyst optimizer; lambda predicates are opaque to it, so that
fusion only happens if you use the DataFrame API.
Both filters will still be executed within one stage, so the overhead of
having two separate filters versus a single combined one is very small.
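To illustrate why the overhead is small, here is a sketch using plain java.util.stream rather than Spark (the class name, sample data, and tuple representation are made up for the example): both forms make a single pass over the elements and produce identical results; the chained form merely adds one extra predicate call per element that survives the first filter.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FilterChainDemo {
    public static void main(String[] args) {
        // Each int[] stands in for a tuple t with fields t.x and t.y.
        List<int[]> points = Arrays.asList(
                new int[]{1, 2}, new int[]{1, 3}, new int[]{2, 2});

        // Approach 1: one filter with a combined predicate.
        List<int[]> combined = points.stream()
                .filter(t -> t[0] == 1 && t[1] == 2)
                .collect(Collectors.toList());

        // Approach 2: two chained single-condition filters.
        List<int[]> chained = points.stream()
                .filter(t -> t[0] == 1)
                .filter(t -> t[1] == 2)
                .collect(Collectors.toList());

        // Both pipelines keep only the {1, 2} element.
        System.out.println(combined.size() + " " + chained.size());
    }
}
```

The same shape applies to Spark's RDD/Dataset `filter`: chained narrow transformations run back-to-back on each partition within one stage, so no extra pass over the data is made.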
On Tue, Oct 3, 2017 at 8:14 AM, Ahmed Mahmoud wrote:
Hi Ahmed,
Depending on which version you have, it could matter. We received an email
about multiple conditions in a filter not being picked up. I copied the
email below that was sent out to the Spark user list. The user never tried
multiple single-condition filters, which might have worked.
Hi
Remember, transformations are lazy, so nothing happens until you call an
action. At that point both are the same.
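That laziness can be seen with plain java.util.stream, used here only as an analogy for Spark transformations (which are similarly deferred until an action runs): the filter predicate below does not execute while the pipeline is being built, only when a terminal operation is invoked.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class LazyFilterDemo {
    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();

        // Building the pipeline runs nothing: filter is a lazy
        // intermediate operation, like a Spark transformation.
        Stream<Integer> pipeline = IntStream.rangeClosed(1, 5).boxed()
                .filter(t -> { calls.incrementAndGet(); return t % 2 == 0; });
        System.out.println("predicate calls before terminal op: " + calls.get());

        // Only the terminal operation (the "action") triggers evaluation.
        long evens = pipeline.count();
        System.out.println("calls=" + calls.get() + " evens=" + evens);
    }
}
```

Before `count()` runs, the call counter is still zero; afterwards the predicate has run once per element.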
On Tue, Oct 3, 2017 at 11:19 PM, Femi Anthony wrote:
I would assume that the optimizer would end up transforming both to the same
expression.
Femi
Sent from my iPhone
> On Oct 3, 2017, at 8:14 AM, Ahmed Mahmoud wrote:
Hi All,
Just a quick question from an optimisation point of view:
Approach 1:
.filter(t -> t.x == 1 && t.y == 2)
Approach 2:
.filter(t -> t.x == 1)
.filter(t -> t.y == 2)
Is there a difference, is one better than the other, or are they the same?
Thanks!
Ahmed Mahmoud