[jira] [Commented] (PIG-2747) Support more predicate pushdown to a data source by pulling up multiple predicates from branches using the same data source

Daniel Dai (JIRA) Mon, 11 Jun 2012 11:58:45 -0700

    [ 
https://issues.apache.org/jira/browse/PIG-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292983#comment-13292983
 ]


Daniel Dai commented on PIG-2747:
---------------------------------

My understanding is there is a union after T1, T2, right?

Yes, we only merge consecutive filters into a single "and" condition. We don't 
merge filters into an "or" condition. So you want:

filter cond1, filter cond2 -> union ==> filter cond1 or cond2
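
To make the intended rewrite concrete, here is a minimal sketch in Python rather than Pig, assuming rows are plain dicts and predicates are plain functions. The names pull_up_or and run_branches are hypothetical and are not part of Pig's optimizer. It only illustrates that pre-filtering by (cond1 or cond2) leaves each branch's result unchanged:

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]
Predicate = Callable[[Row], bool]


def pull_up_or(preds: List[Predicate]) -> Predicate:
    """Combine the branch predicates into one disjunction (hypothetical helper)."""
    return lambda row: any(p(row) for p in preds)


def run_branches(rows: List[Row], preds: List[Predicate]) -> Tuple[List[Row], List[List[Row]]]:
    """Apply the combined pre-filter once, then each original branch filter."""
    pre = pull_up_or(preds)
    # 'kept' stands in for what the data source would return after pushdown.
    kept = [r for r in rows if pre(r)]
    branches = [[r for r in kept if p(r)] for p in preds]
    return kept, branches


rows = [{"col": "hello"}, {"col": "world"}, {"col": "other"}]
cond1 = lambda r: r["col"] == "hello"
cond2 = lambda r: r["col"] == "world"

kept, (t1, t2) = run_branches(rows, [cond1, cond2])
```

The original per-branch filters are kept after the pre-filter, so the rewrite is safe even if the data source applies the pushed-down predicate only partially.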
                
> Support more predicate pushdown to a data source by pulling up multiple 
> predicates from branches using the same data source
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-2747
>                 URL: https://issues.apache.org/jira/browse/PIG-2747
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Yu Xu
>            Priority: Minor
>
> Consider the following example:
> T = load ... ;
> T1 = filter T by col == 'hello';
> T2 = filter T by col == 'world';
> Currently the Pig optimizer does not combine the two predicates and cannot 
> push them down to the data source (via LoadMetadata). Thus the data source 
> cannot do any filtering, and a full table/file scan is required.
> A more efficient manual workaround is to rewrite the above script into the 
> following equivalent one:
> T = load ...;
> T = filter T by col == 'hello' or col == 'world' ;
> T1 = filter T by col == 'hello';
> T2 = filter T by col == 'world';
> The above script enables Pig to push the predicate (col == 'hello' or 
> col == 'world') down to the data source, which can then use available 
> partitions/indexes for potentially much more efficient processing. 
> This JIRA requests that the Pig optimizer perform this type of optimization 
> automatically. 

