[ https://issues.apache.org/jira/browse/PIG-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alan Gates updated PIG-741:
---------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Patch checked in.

> Add LIMIT as a statement that works in nested FOREACH
> -----------------------------------------------------
>
>                 Key: PIG-741
>                 URL: https://issues.apache.org/jira/browse/PIG-741
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: David Ciemiewicz
>            Assignee: Alan Gates
>             Fix For: 0.3.0
>
>         Attachments: PIG-741.patch
>
>
> I'd like to compute the top 10 results in each group.
> The natural way to express this in Pig would be:
> {code}
> A = load '...' using PigStorage() as (
>     date: int,
>     count: int,
>     url: chararray
>     );
> B = group A by ( date );
> C = foreach B {
>     D = order A by count desc;
>     E = limit D 10;
>     generate
>         FLATTEN(E);
>     };
> dump C;
> {code}
> Yeah, I could write a UDF / PiggyBank function to take the top n results. But
> since LIMIT already exists as a statement, it seems like it should also work
> in the nested foreach context.
> Example workaround code.
> {code}
> C = foreach B {
>     D = order A by count desc;
>     E = util.TOP(D, 10);
>     generate
>         FLATTEN(E);
>     };
> dump C;
> {code}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.