[ https://issues.apache.org/jira/browse/PIG-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292983#comment-13292983 ]
Daniel Dai commented on PIG-2747:
---------------------------------

My understanding is that there is a union after T1 and T2, right? Yes, we only merge consecutive filters into an "and" condition. We don't merge them into an "or" condition. So you want:

    filter cond1, filter cond2 -> union ==> filter cond1 or cond2

> Support more predicate pushdown to a data source by pulling up multiple predicates from branches using the same data source
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-2747
>                 URL: https://issues.apache.org/jira/browse/PIG-2747
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Yu Xu
>            Priority: Minor
>
> Consider the following example:
>
> T = load ... ;
> T1 = filter T by col == 'hello';
> T2 = filter T by col == 'world';
>
> Currently the Pig optimizer does not combine the two predicates and cannot push
> them down to the data source (via LoadMetadata). Thus the data source cannot do
> any filtering, and a full table/file scan is required.
>
> A more efficient workaround for now (by hand) is to rewrite the above script
> as the following equivalent one:
>
> T = load ...;
> T = filter T by col == 'hello' or col == 'world';
> T1 = filter T by col == 'hello';
> T2 = filter T by col == 'world';
>
> This script enables Pig to push down the predicate (col == 'hello' or
> col == 'world') to the data source, which can then use available
> partitions/indexes for potentially much more efficient processing.
>
> This JIRA requests that the Pig optimizer perform this type of
> optimization automatically.
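
The rewrite requested in the issue can be illustrated outside Pig with a small Python sketch. This is not Pig's optimizer code: `pull_up_or`, `load`, `run_branches`, and the sample table are hypothetical names invented for illustration, and the `pushed` argument only stands in for whatever filtering a real data source could do. The sketch shows that pre-filtering with the disjunction of the branch predicates leaves each branch's output unchanged while the source hands back fewer rows.

```python
from typing import Callable, Dict, List, Optional, Sequence

Row = Dict[str, str]
Predicate = Callable[[Row], bool]


def pull_up_or(branch_predicates: Sequence[Predicate]) -> Predicate:
    """Combine the per-branch filter predicates into a single disjunction."""
    return lambda row: any(pred(row) for pred in branch_predicates)


def load(table: List[Row], pushed: Optional[Predicate] = None) -> List[Row]:
    """Hypothetical data source; `pushed` models a predicate it can apply itself."""
    if pushed is None:
        return list(table)  # no predicate available: full scan
    return [row for row in table if pushed(row)]


def run_branches(rows: List[Row], branch_predicates: Sequence[Predicate]) -> List[List[Row]]:
    """Apply each branch's own filter to the rows returned by the source."""
    return [[row for row in rows if pred(row)] for pred in branch_predicates]


def is_hello(row: Row) -> bool:
    return row["col"] == "hello"


def is_world(row: Row) -> bool:
    return row["col"] == "world"


# Made-up sample data standing in for the loaded relation T.
TABLE: List[Row] = [
    {"col": "hello"},
    {"col": "world"},
    {"col": "foo"},
    {"col": "hello"},
    {"col": "bar"},
]
branches = [is_hello, is_world]

# Original plan: the source returns everything, each branch filters afterwards.
full_scan = load(TABLE)
t1, t2 = run_branches(full_scan, branches)

# Rewritten plan: the disjunction is pushed to the source, branch filters stay.
pre_filtered = load(TABLE, pushed=pull_up_or(branches))
t1_opt, t2_opt = run_branches(pre_filtered, branches)
```

The per-branch filters are kept after the pushed-down disjunction, as in the hand-written workaround, because the disjunction alone does not separate the 'hello' rows from the 'world' rows.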