[ https://issues.apache.org/jira/browse/PIG-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Olga Natkovich reassigned PIG-1494:
-----------------------------------
Assignee: Swati Jain
> PIG Logical Optimization: Use CNF in PushUpFilter
> -------------------------------------------------
>
> Key: PIG-1494
> URL: https://issues.apache.org/jira/browse/PIG-1494
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Affects Versions: 0.7.0
> Reporter: Swati Jain
> Assignee: Swati Jain
> Priority: Minor
> Fix For: 0.8.0
>
>
> The PushUpFilter rule cannot handle complicated boolean expressions. The
> SplitFilter rule splits one LOFilter into two at a top-level "AND", but it
> cannot split an LOFilter whose top-level operator is "OR".
> *Example script:*
> A = load 'file_a' USING PigStorage(',') as (a1:int,a2:int,a3:int);
> B = load 'file_b' USING PigStorage(',') as (b1:int,b2:int,b3:int);
> C = load 'file_c' USING PigStorage(',') as (c1:int,c2:int,c3:int);
> J1 = JOIN B by b1, C by c1;
> J2 = JOIN J1 by $0, A by a1;
> D = *Filter J2 by ( (c1 < 10) AND (a3+b3 > 10) ) OR (c2 == 5);*
> explain D;
> In the above example, PushUpFilter cannot push any part of the filter
> condition across either join, because the condition as a whole references
> columns from all branches (inputs). If we convert the expression into
> "Conjunctive Normal Form" (CNF), the conjunct (c1 < 10) OR (c2 == 5)
> references only columns of C and can be pushed below both joins. Here is
> the CNF expression for the highlighted line:
> ( (c1 < 10) OR (c2 == 5) ) AND ( (a3+b3 > 10) OR (c2 == 5) )
> *Suggestion:* Convert LOFilter's boolean expression into CNF; it would then
> be easy to push individual conjuncts of the expression selectively. The
> SplitFilter rule would also no longer be needed if this utility were added
> to the PushUpFilter rule itself.
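
To make the suggestion concrete, here is a minimal sketch of the idea, written in Python purely for illustration (it is not Pig code and does not use Pig's logical plan classes). The names Atom, And, Or, to_cnf, conjuncts and inputs_of are hypothetical. It assumes the expression contains only AND, OR and leaf predicates (no NOT), and that each leaf is tagged with the load aliases whose columns it reads.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Union


@dataclass(frozen=True)
class Atom:
    text: str                 # predicate as written in the script
    inputs: FrozenSet[str]    # aliases whose columns the predicate reads


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


Expr = Union[Atom, And, Or]


def to_cnf(e: Expr) -> Expr:
    """Rewrite an AND/OR tree into CNF by distributing OR over AND."""
    if isinstance(e, Atom):
        return e
    left, right = to_cnf(e.left), to_cnf(e.right)
    if isinstance(e, And):
        return And(left, right)
    # e is an OR: (x AND y) OR z  ==>  (x OR z) AND (y OR z)
    if isinstance(left, And):
        return And(to_cnf(Or(left.left, right)), to_cnf(Or(left.right, right)))
    if isinstance(right, And):
        return And(to_cnf(Or(left, right.left)), to_cnf(Or(left, right.right)))
    return Or(left, right)


def conjuncts(e: Expr) -> List[Expr]:
    """Flatten the top-level ANDs of a CNF expression into a list."""
    if isinstance(e, And):
        return conjuncts(e.left) + conjuncts(e.right)
    return [e]


def inputs_of(e: Expr) -> FrozenSet[str]:
    """Collect every alias referenced anywhere inside the expression."""
    if isinstance(e, Atom):
        return e.inputs
    return inputs_of(e.left) | inputs_of(e.right)


def show(e: Expr) -> str:
    if isinstance(e, Atom):
        return f"({e.text})"
    op = "AND" if isinstance(e, And) else "OR"
    return f"({show(e.left)} {op} {show(e.right)})"


# The filter condition from the example script.
c1 = Atom("c1 < 10", frozenset({"C"}))
ab = Atom("a3+b3 > 10", frozenset({"A", "B"}))
c2 = Atom("c2 == 5", frozenset({"C"}))
cnf = to_cnf(Or(And(c1, ab), c2))

# A conjunct that reads a single input is a candidate for pushing to that input.
pushable = [c for c in conjuncts(cnf) if len(inputs_of(c)) == 1]
for c in conjuncts(cnf):
    print(show(c), sorted(inputs_of(c)))
```

In this sketch only the first conjunct reads a single input (C), so it is the only candidate for pushing below both joins; the second conjunct reads A, B and C and has to stay above J2. Note that distributing OR over AND can grow the expression exponentially in the worst case, so a real rule would probably want a size limit.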