[ https://issues.apache.org/jira/browse/PIG-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Olga Natkovich updated PIG-1494:
--------------------------------

Unlinking from 0.8 since we are about to branch for release

> PIG Logical Optimization: Use CNF in PushUpFilter
> -------------------------------------------------
>
> Key: PIG-1494
> URL: https://issues.apache.org/jira/browse/PIG-1494
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Affects Versions: 0.7.0
> Reporter: Swati Jain
> Assignee: Swati Jain
> Priority: Minor
>
> The PushUpFilter rule is not able to handle complicated boolean expressions.
> For example, SplitFilter rule is splitting one LOFilter into two by "AND".
> However it will not be able to split LOFilter if the top level operator is
> "OR". For example:
>
> *ex script:*
> A = load 'file_a' USING PigStorage(',') as (a1:int,a2:int,a3:int);
> B = load 'file_b' USING PigStorage(',') as (b1:int,b2:int,b3:int);
> C = load 'file_c' USING PigStorage(',') as (c1:int,c2:int,c3:int);
> J1 = JOIN B by b1, C by c1;
> J2 = JOIN J1 by $0, A by a1;
> D = *Filter J2 by ( (c1 < 10) AND (a3+b3 > 10) ) OR (c2 == 5);*
> explain D;
>
> In the above example, the PushUpFilter is not able to push any filter
> condition across any join as it contains columns from all branches (inputs).
> But if we convert this expression into "Conjunctive Normal Form" (CNF) then
> we would be able to push filter condition c1 < 10 and c2 == 5 below both join
> conditions. Here is the CNF expression for highlighted line:
>
> ( (c1 < 10) OR (c2 == 5) ) AND ( (a3+b3 > 10) OR (c2 == 5) )
>
> *Suggestion:* It would be a good idea to convert LOFilter's boolean
> expression into CNF, it would then be easy to push parts (conjuncts) of the
> LOFilter boolean expression selectively. We would also not require rule
> SplitFilter anymore if we were to add this utility to rule PushUpFilter
> itself.
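To make the suggestion concrete, here is a minimal Python sketch of the CNF idea applied to the filter from the example script. It is not Pig's implementation: the tuple-based expression format and the helper names (`to_cnf`, `conjuncts`, `columns`, `render`) are all hypothetical, NOT is left out, and "pushable" is simplified to mean "references only columns of C", identified here by the `c` prefix.

```python
import re

# Hypothetical expression format (not Pig's LogicalExpression classes):
# ("and", left, right), ("or", left, right), or a leaf string such as "c1 < 10".

def to_cnf(expr):
    """Return an equivalent expression in conjunctive normal form."""
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    left, right = to_cnf(left), to_cnf(right)
    if op == "and":
        return ("and", left, right)
    # op == "or": distribute OR over an AND found on either side.
    if isinstance(left, tuple) and left[0] == "and":
        return ("and", to_cnf(("or", left[1], right)), to_cnf(("or", left[2], right)))
    if isinstance(right, tuple) and right[0] == "and":
        return ("and", to_cnf(("or", left, right[1])), to_cnf(("or", left, right[2])))
    return ("or", left, right)

def conjuncts(expr):
    """Split a CNF expression into its top-level AND-ed parts."""
    if isinstance(expr, tuple) and expr[0] == "and":
        return conjuncts(expr[1]) + conjuncts(expr[2])
    return [expr]

def columns(expr):
    """Collect column names such as a3, b3, c1 referenced by an expression."""
    if isinstance(expr, str):
        return set(re.findall(r"[a-z]\d+", expr))
    return columns(expr[1]) | columns(expr[2])

def render(expr):
    """Format an expression with explicit parentheses."""
    if isinstance(expr, str):
        return "(" + expr + ")"
    return "(" + render(expr[1]) + " " + expr[0].upper() + " " + render(expr[2]) + ")"

# ( (c1 < 10) AND (a3+b3 > 10) ) OR (c2 == 5), from the example script
filter_expr = ("or", ("and", "c1 < 10", "a3+b3 > 10"), "c2 == 5")
cnf = to_cnf(filter_expr)
parts = conjuncts(cnf)

# Simplifying assumption: a conjunct can move below the joins only if every
# column it references belongs to C.
pushable = [p for p in parts if all(c.startswith("c") for c in columns(p))]

for p in parts:
    print(render(p), "-> pushable" if p in pushable else "-> stays above the joins")
```

Under these assumptions the sketch yields the two conjuncts given in the issue, and only `(c1 < 10) OR (c2 == 5)` qualifies for pushing toward the load of C. The conjunct containing `a3+b3 > 10` references all three inputs and has to stay above `J2`. Note that distributing OR over AND can grow an expression exponentially in the worst case, so a real rule would likely need a size limit.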