[ https://issues.apache.org/jira/browse/PIG-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cheolsoo Park updated PIG-3510:
-------------------------------
    Description:
This is a regression from PIG-3461 - rewrite of partition filter optimizer. Here is an example that demonstrates the problem:

{code:title=two filters}
b = FILTER a BY (dateint >= 20130901 AND dateint <= 20131001);
c = FILTER b BY (event_id == 419 OR event_id == 418);
{code}
{code:title=one filter}
b = FILTER a BY (dateint >= 20130901 AND dateint <= 20131001) AND (event_id == 419 OR event_id == 418);
{code}

Both dateint and event_id are partition columns. In the one-filter case, the whole expression is pushed down, whereas in the two-filter case, only (event_id == 419 OR event_id == 418) is pushed down.

  was:
This is a regression from PIG-3461 - rewrite of partition filter optimizer. Here is an example that demonstrates the problem:

{code:title=two filters}
b = FILTER a BY (dateint >= 20130901 AND dateint <= 20131001);
c = FILTER b BY (event_id == 419 OR event_id == 418);
{code}
{code:title=one filter}
b = FILTER a BY (dateint >= 20130901 AND dateint <= 20131001) AND (event_id == 419 OR event_id == 418);
{code}

Both dateint and event_id are partition columns. In the one-filter case, the whole expression is pushed down, whereas in the two-filter case, only (event_id == 419 OR event_id == 418) is pushed down.

The reason is that the filter extractor overwrites the pushdown expression it extracted from the 1st statement while visiting the 2nd statement.
{code}
private Expression pushdownExpr = null;
{code}

The old filter extractor used to keep pushdown expressions in an array and assemble them with AND at the end.
{code}
private ArrayList<Expression> pColConditions = new ArrayList<Expression>();
{code}

After debugging further, I found that the real problem is that PIG-3461 changed the order in which optimization rules are applied. In particular, MergeFilter is now applied *after* NewPartitionFilterOptimizer, so only partial expressions are pushed down.
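The interaction between the rule order and the single-field extractor can be illustrated with a toy model. This is only a sketch under stated assumptions, not Pig's actual code: filter conditions are plain strings rather than Expression objects, and the names RuleOrderSketch, mergeFilters and extractPushdown are hypothetical. The "last visited filter wins" behavior is taken from the description of the overwritten pushdownExpr field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RuleOrderSketch {

    // Hypothetical stand-in for MergeFilter: collapses consecutive
    // filter conditions into a single condition joined with AND.
    public static List<String> mergeFilters(List<String> filters) {
        List<String> merged = new ArrayList<>();
        if (!filters.isEmpty()) {
            merged.add(String.join(" AND ", filters));
        }
        return merged;
    }

    // Hypothetical stand-in for the new extractor: it keeps one
    // pushdown expression and overwrites it on every visited filter.
    public static String extractPushdown(List<String> filters) {
        String pushdownExpr = null;
        for (String condition : filters) {
            pushdownExpr = condition; // earlier result is lost here
        }
        return pushdownExpr;
    }

    public static void main(String[] args) {
        List<String> filters = Arrays.asList(
            "(dateint >= 20130901 AND dateint <= 20131001)",
            "(event_id == 419 OR event_id == 418)");

        // Old order: merge the filters first, then extract.
        System.out.println("merge first:   "
            + extractPushdown(mergeFilters(filters)));

        // New order: extract while the two filters are still separate.
        System.out.println("extract first: "
            + extractPushdown(filters));
    }
}
```

In this model, merging first yields the full conjunction of both conditions, while extracting first yields only the event_id condition, which matches the behavior reported for the two-filter script.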
Old order:
# MergeFilter
# PartitionFilterOptimizer

New order:
# NewPartitionFilterOptimizer
# MergeFilter


> New filter extractor fails with more than one filter statement
> --------------------------------------------------------------
>
>                 Key: PIG-3510
>                 URL: https://issues.apache.org/jira/browse/PIG-3510
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.12.0
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: 0.12.1
>
>
> This is a regression from PIG-3461 - rewrite of partition filter optimizer.
> Here is an example that demonstrates the problem:
> {code:title=two filters}
> b = FILTER a BY (dateint >= 20130901 AND dateint <= 20131001);
> c = FILTER b BY (event_id == 419 OR event_id == 418);
> {code}
> {code:title=one filter}
> b = FILTER a BY (dateint >= 20130901 AND dateint <= 20131001) AND (event_id == 419 OR event_id == 418);
> {code}
> Both dateint and event_id are partition columns. For the 1 filter case, the
> whole expression is pushed down whereas for the 2 filter case, only
> (event_id == 419 OR event_id == 418) is pushed down.

--
This message was sent by Atlassian JIRA
(v6.1#6144)