Yu Xu created PIG-2747:
--------------------------

             Summary: Support more predicate pushdown to a data source by 
pulling up multiple predicates from branches using the same data source
                 Key: PIG-2747
                 URL: https://issues.apache.org/jira/browse/PIG-2747
             Project: Pig
          Issue Type: Improvement
            Reporter: Yu Xu
            Priority: Minor


consider the following example:

T = load ... ;
T1 = filter T by col == 'hello';
T2 = filter T by col =='world';

currently Pig optimizer does not combine the two predicates and cannot push 
down the predicates to the data sources (via LoadMetadata).  Thus the data 
source cannot do any filtering. A full table/file scan is required.

A current more efficient workaround (by hand) is to rewrite the above script to 
the following equivalent one:

T = load ...;
T = filter T by col == 'hello' or col == 'world' ;
T1 = filter T by col == 'hello';
T2 = filter T by col == 'world';

the above script enables Pig to push down the predicate (col == 'hello' or col 
== 'world') to the data source to use available partitions/indexes for 
potentially much more efficient processing. 

This JIRA is created to request PIG optimizer to perform the above type of 
optimization automatically. 


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to