Add LIMIT as a statement that works in nested FOREACH
-----------------------------------------------------

Key: PIG-741
URL: https://issues.apache.org/jira/browse/PIG-741
Project: Pig
Issue Type: New Feature
Reporter: David Ciemiewicz

I'd like to compute the top 10 results in each group.

The natural way to express this in Pig would be:

{code}
A = load '...' using PigStorage() as ( date: int, count: int, url: chararray );
B = group A by ( date );
C = foreach B {
        D = order A by count desc;
        E = limit D 10;
        generate FLATTEN(E);
};
dump C;
{code}

Yeah, I could write a UDF / PiggyBank function to take the top n results. But since LIMIT already exists as a statement, it seems like it should also work in the nested foreach context.

Example workaround code.

{code}
C = foreach B {
        D = order A by count desc;
        E = util.TOP(D, 10);
        generate FLATTEN(E);
};
dump C;
{code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.