[ https://issues.apache.org/jira/browse/PIG-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1437:
----------------------------
    Assignee: Xuefu Zhang
    Fix Version/s: 0.9.0

> [Optimization] Rewrite GroupBy-Foreach-flatten(group) to Distinct
> -----------------------------------------------------------------
>
>                 Key: PIG-1437
>                 URL: https://issues.apache.org/jira/browse/PIG-1437
>             Project: Pig
>          Issue Type: Sub-task
>          Components: impl
>    Affects Versions: 0.7.0
>            Reporter: Ashutosh Chauhan
>            Assignee: Xuefu Zhang
>            Priority: Minor
>             Fix For: 0.9.0
>
>
> It's possible to rewrite queries like this
> {code}
> A = load 'data' as (name,age);
> B = group A by (name,age);
> C = foreach B generate group.name, group.age;
> dump C;
> {code}
> or
> {code}
> A = load 'data' as (name,age);
> B = group A by (name,age);
> C = foreach B generate flatten(group);
> dump C;
> {code}
> to
> {code}
> A = load 'data' as (name,age);
> B = distinct A;
> dump B;
> {code}
> This can only be done if no columns within the bags are referenced
> subsequently in the script. Since in the Pig-Hadoop world DISTINCT is
> executed more efficiently than group-by, this will be a huge win.
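
To illustrate the condition under which the rewrite is safe, here is a minimal sketch in Python. It is a toy in-memory model of the scripts above, not Pig's implementation: the function names and sample rows are hypothetical, and relations are modelled as Python sets because the results being compared are unordered and duplicate-free. The `COUNT(A)` case is an assumed example of a script that references the bag after grouping.

```python
# Toy in-memory model of the Pig scripts in this issue; not Pig's implementation.
# All function names and the sample data are hypothetical.
from collections import defaultdict


def group_by_all_columns(rows):
    """Model `B = group A by (name,age)`: map each key tuple to its bag of rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[row].append(row)
    return groups


def group_then_flatten(rows):
    """Model `C = foreach B generate flatten(group)`: emit only the group keys."""
    return set(group_by_all_columns(rows).keys())


def distinct(rows):
    """Model `B = distinct A`: drop duplicate rows."""
    return set(rows)


def group_then_count(rows):
    """Model a script that also reads the bag, e.g. `generate flatten(group), COUNT(A)`."""
    return {
        (name, age, len(bag))
        for (name, age), bag in group_by_all_columns(rows).items()
    }


rows = [("alice", 30), ("bob", 25), ("alice", 30)]

# Only the group keys are used, so grouping and distinct give the same rows.
same_as_distinct = group_then_flatten(rows) == distinct(rows)

# The bag size is used here, which distinct alone cannot reproduce.
counts = group_then_count(rows)
```

In this model `same_as_distinct` is true because only the keys survive the foreach. `counts` carries a per-group row count that is lost once duplicates are removed, which is why the rewrite has to be skipped when the bag is referenced later.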