[ https://issues.apache.org/jira/browse/PIG-4709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15031385#comment-15031385 ]
Pallavi Rao commented on PIG-4709:
----------------------------------

[~mohitsabharwal], [~xuefuz], review please?

> Improve performance of GROUPBY operator on Spark
> ------------------------------------------------
>
>                 Key: PIG-4709
>                 URL: https://issues.apache.org/jira/browse/PIG-4709
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: Pallavi Rao
>            Assignee: Pallavi Rao
>              Labels: spork
>             Fix For: spark-branch
>
>         Attachments: PIG-4709.patch
>
>
> Currently, the GROUPBY operator of Pig is mapped to Spark's CoGroup. When the
> grouped data is consumed by subsequent operations to perform algebraic
> operations, this is sub-optimal, as there is a lot of shuffle traffic.
> The Spark plan must be optimized to use reduceBy, where possible, so that a
> combiner is used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
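
For illustration only, the shuffle-traffic argument in the description can be sketched in plain Python. This is not the attached patch and uses no Spark or Pig API; the partition contents and the function names (shuffle_without_combiner, shuffle_with_combiner) are hypothetical, and SUM stands in for any algebraic operation. The sketch counts how many records would cross the shuffle when values are grouped first versus when each partition pre-aggregates, which is the role a combiner plays.

```python
from collections import defaultdict


def shuffle_without_combiner(partitions):
    """Group-then-aggregate: every (key, value) record crosses the shuffle."""
    shuffled = [record for part in partitions for record in part]
    groups = defaultdict(list)
    for key, value in shuffled:
        groups[key].append(value)
    totals = {key: sum(values) for key, values in groups.items()}
    return totals, len(shuffled)


def shuffle_with_combiner(partitions):
    """Reduce-by-key style: each partition pre-aggregates before the shuffle."""
    shuffled = []
    for part in partitions:
        partial = defaultdict(int)  # hypothetical map-side combiner state
        for key, value in part:
            partial[key] += value
        # Only one record per key per partition is shuffled.
        shuffled.extend(partial.items())
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)


# Hypothetical input: two partitions of (key, value) records.
partitions = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("a", 4), ("b", 5), ("b", 6)],
]

grouped_totals, grouped_records = shuffle_without_combiner(partitions)
combined_totals, combined_records = shuffle_with_combiner(partitions)

print(grouped_totals, grouped_records)
print(combined_totals, combined_records)
```

Both paths produce the same per-key sums, but the combiner path shuffles at most one record per key per partition, so the saving grows with the number of repeated keys in each partition. This only applies when the downstream operation is algebraic, which is why the description says "where possible".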