Consider passing result of COUNT/COUNT_STAR to LIMIT
-----------------------------------------------------
Key: PIG-1660
URL: https://issues.apache.org/jira/browse/PIG-1660
Project: Pig
Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Viraj Bhat
Fix For: 0.9.0
In realistic scenarios we need to split a dataset into segments using LIMIT,
and we would like to do so within a single Pig script. Here is an example:
{code}
A = load '$DATA' using PigStorage(',') as (id:chararray, pvs:long);
B = group A all;
C = foreach B generate COUNT_STAR(A) as row_cnt;
-- get the low 50% segment
D = order A by pvs;
E = limit D (C.row_cnt * 0.5);
store E into '$Eoutput';
-- get the high 20% segment
F = order A by pvs DESC;
G = limit F (C.row_cnt * 0.2);
store G into '$Goutput';
{code}
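Since LIMIT only accepts constants, we currently have to split the job into two
steps: one run that computes the row count, and a second run that receives the
derived values as constants for the LIMIT statements. Please consider supporting
scalar expressions in LIMIT so the whole computation can run in a single script,
which would make the processing more efficient.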
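For reference, the two-step workaround can be sketched as a small Python driver. This is a hypothetical sketch, not part of Pig: it assumes a `pig` executable on the PATH running in local mode, and the embedded script texts, the `COUNT_OUT`, `LOW_N` and `HIGH_N` parameter names, and the helper functions are illustrative names introduced here. Pass 1 stores the row count. The driver then reads the count back and converts the fractions from the example (50% and 20%) into integers. Pass 2 hands those integers to LIMIT through Pig's `-param` substitution, so by the time the script is parsed they are plain literal constants.

```python
import glob
import os
import subprocess

# Pass 1 (hypothetical count.pig): the counting part of the script above.
COUNT_SCRIPT = """
A = load '$DATA' using PigStorage(',') as (id:chararray, pvs:long);
B = group A all;
C = foreach B generate COUNT_STAR(A) as row_cnt;
store C into '$COUNT_OUT';
"""

# Pass 2 (hypothetical segment.pig): LIMIT only sees literal constants,
# which Pig's parameter substitution pastes in before the script is parsed.
SEGMENT_SCRIPT = """
A = load '$DATA' using PigStorage(',') as (id:chararray, pvs:long);
D = order A by pvs;
E = limit D $LOW_N;
store E into '$Eoutput';
F = order A by pvs DESC;
G = limit F $HIGH_N;
store G into '$Goutput';
"""


def segment_limits(row_cnt, low_frac=0.5, high_frac=0.2):
    """Convert the row count into integer row limits (rounded down) for both segments."""
    return int(row_cnt * low_frac), int(row_cnt * high_frac)


def run_pig(script_text, name, params, workdir):
    """Write a script into workdir and run it with `pig -x local -param K=V ...`."""
    path = os.path.join(workdir, name)
    with open(path, "w") as f:
        f.write(script_text)
    cmd = ["pig", "-x", "local"]
    for key, value in params.items():
        cmd += ["-param", f"{key}={value}"]
    cmd.append(path)
    subprocess.run(cmd, check=True)


def read_count(count_dir):
    """Read the single row_cnt value that pass 1 stored with PigStorage."""
    for part in sorted(glob.glob(os.path.join(count_dir, "part-*"))):
        with open(part) as f:
            line = f.readline().strip()
        if line:
            return int(line)
    raise ValueError(f"no row count found under {count_dir}")


def segment(data, e_output, g_output, workdir):
    """Run both passes; output paths must not exist yet (Pig refuses to overwrite)."""
    os.makedirs(workdir, exist_ok=True)
    count_out = os.path.join(workdir, "count")
    run_pig(COUNT_SCRIPT, "count.pig", {"DATA": data, "COUNT_OUT": count_out}, workdir)
    low_n, high_n = segment_limits(read_count(count_out))
    run_pig(
        SEGMENT_SCRIPT,
        "segment.pig",
        {"DATA": data, "Eoutput": e_output, "Goutput": g_output,
         "LOW_N": low_n, "HIGH_N": high_n},
        workdir,
    )
```

A call might look like `segment('/data/pvs.csv', '/out/low', '/out/high', '/tmp/segwork')`. The extra pass over the input and the round trip through the file system are the overhead that accepting a scalar expression in LIMIT would remove.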
Viraj