I don't think a union is required for this to make sense. 
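
A toy sketch of why no union is needed, assuming plain Python lists stand in for Pig relations. The names `load`, `original_plan`, and `rewritten_plan` are made up for illustration and are not Pig internals. Each branch still applies its own filter, so pushing the OR of both predicates into the load changes only how many rows are read, not what T1 and T2 contain:

```python
# Toy model only: lists of dicts stand in for relations, and `load` is a
# hypothetical loader that can apply a pushed-down predicate at the source.
rows = [{"col": "hello"}, {"col": "world"}, {"col": "other"}, {"col": "hello"}]

def is_hello(r):
    return r["col"] == "hello"

def is_world(r):
    return r["col"] == "world"

def load(source, pushdown=None):
    # Without a pushed-down predicate, every row is returned (full scan).
    return [r for r in source if pushdown is None or pushdown(r)]

def original_plan(source):
    t = load(source)  # nothing pushed down
    t1 = [r for r in t if is_hello(r)]
    t2 = [r for r in t if is_world(r)]
    return t1, t2, len(t)

def rewritten_plan(source):
    # Pull the two branch predicates up as a disjunction and push it down.
    t = load(source, pushdown=lambda r: is_hello(r) or is_world(r))
    t1 = [r for r in t if is_hello(r)]  # branch filters are kept as-is
    t2 = [r for r in t if is_world(r)]
    return t1, t2, len(t)

t1_a, t2_a, read_a = original_plan(rows)
t1_b, t2_b, read_b = rewritten_plan(rows)
assert t1_a == t1_b and t2_a == t2_b  # same branch outputs, no union involved
print("rows read:", read_a, "vs", read_b)
```

The branch outputs match whatever is done with T1 and T2 afterwards (union, join, separate stores), which is why the rewrite does not depend on a union.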

On Jun 11, 2012, at 11:58 AM, "Daniel Dai (JIRA)" <j...@apache.org> wrote:

> 
>    [ https://issues.apache.org/jira/browse/PIG-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292983#comment-13292983 ]
> 
> Daniel Dai commented on PIG-2747:
> ---------------------------------
> 
> My understanding is that there is a union after T1 and T2, right?
> 
> Yes, we only merge consecutive filters into an "and" condition. We don't merge 
> "or" conditions. So you want
> 
> filter cond1, filter cond2 -> union ==> filter cond1 or cond2
> 
>> Support more predicate pushdown to a data source by pulling up multiple 
>> predicates from branches using the same data source
>> ---------------------------------------------------------------------------------------------------------------------------
>> 
>>                Key: PIG-2747
>>                URL: https://issues.apache.org/jira/browse/PIG-2747
>>            Project: Pig
>>         Issue Type: Improvement
>>           Reporter: Yu Xu
>>           Priority: Minor
>> 
>> Consider the following example:
>> T = load ... ;
>> T1 = filter T by col == 'hello';
>> T2 = filter T by col == 'world';
>> Currently the Pig optimizer does not combine the two predicates and cannot push 
>> them down to the data source (via LoadMetadata). Thus the data 
>> source cannot do any filtering, and a full table/file scan is required.
>> A more efficient workaround today (by hand) is to rewrite the above script 
>> as the following equivalent one:
>> T = load ...;
>> T = filter T by col == 'hello' or col == 'world';
>> T1 = filter T by col == 'hello';
>> T2 = filter T by col == 'world';
>> The above script enables Pig to push down the predicate (col == 'hello' or 
>> col == 'world') to the data source, which can then use available 
>> partitions/indexes for potentially much more efficient processing. 
>> This JIRA requests that the Pig optimizer perform this type of 
>> optimization automatically. 
> 
