[ 
https://issues.apache.org/jira/browse/PIG-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1660:
--------------------------------

    Fix Version/s:     (was: 0.10)
    
> Consider passing result of COUNT/COUNT_STAR to LIMIT 
> -----------------------------------------------------
>
>                 Key: PIG-1660
>                 URL: https://issues.apache.org/jira/browse/PIG-1660
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.7.0
>            Reporter: Viraj Bhat
>
> In realistic scenarios we need to split a dataset into segments by using 
> LIMIT, and would like to do that within the same Pig script. Here is a 
> case:
> {code}
> A = load '$DATA' using PigStorage(',') as (id, pvs);
> B = group A all;
> C = foreach B generate COUNT_STAR(A) as row_cnt;
> -- get the low 20% segment
> D = order A by pvs;
> E = limit D (C.row_cnt * 0.2);
> store E into '$Eoutput';
> -- get the high 20% segment
> F = order A by pvs DESC;
> G = limit F (C.row_cnt * 0.2);
> store G into '$Goutput';
> {code}
> Since LIMIT only accepts constants, we have to split the operation into two 
> steps in order to pass in the constants for the LIMIT statements. Please 
> consider adding this feature so that the processing can be more efficient.
> Viraj
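
The two-step workaround described above can be sketched roughly as follows, in Python for illustration. It assumes the row count has already been produced by a first Pig run, and that a second script (hypothetically named segments.pig) contains a line such as `E = limit D $N;` so the constant arrives through Pig's -param substitution. The helper names, the script name, and the parameter name N are illustrative assumptions and are not part of the issue.

```python
def segment_limit(row_count: int, percent: int) -> int:
    """Turn a row count and a whole-number percentage into a LIMIT constant.

    Integer arithmetic is used so the result is never affected by
    floating-point rounding; the value is rounded down.
    """
    if row_count < 0:
        raise ValueError("row_count must be non-negative")
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return row_count * percent // 100


def pig_command(script: str, params: dict) -> list:
    """Build the argument list for the second Pig run.

    Each entry in params becomes a '-param name=value' pair, which Pig
    substitutes for $name in the script before it is executed.
    """
    cmd = ["pig"]
    for name, value in params.items():
        cmd += ["-param", f"{name}={value}"]
    cmd += ["-f", script]
    return cmd


if __name__ == "__main__":
    # Step 1 (assumed to have happened already): a first Pig job computed
    # COUNT_STAR over the input and the result was read back as an integer.
    row_count = 1000  # hypothetical value

    # Step 2: pass the computed constant to the script that calls LIMIT.
    n = segment_limit(row_count, 20)
    print(" ".join(pig_command("segments.pig", {"N": n})))
```

The extra job launch and the round trip through the client are the cost that the requested feature would remove.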

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
