[ https://issues.apache.org/jira/browse/PIG-4709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039661#comment-15039661 ]
liyunzhang_intel commented on PIG-4709: --------------------------------------- [~pallavi.rao]: thanks for your work. i posted a few comments on RB. Can you give more detailed document for this optimization? > Improve performance of GROUPBY operator on Spark > ------------------------------------------------ > > Key: PIG-4709 > URL: https://issues.apache.org/jira/browse/PIG-4709 > Project: Pig > Issue Type: Sub-task > Components: spark > Reporter: Pallavi Rao > Assignee: Pallavi Rao > Labels: spork > Fix For: spark-branch > > Attachments: PIG-4709.patch > > > Currently, the GROUPBY operator of PIG is mapped by Spark's CoGroup. When the > grouped data is consumed by subsequent operations to perform algebraic > operations, this is sub-optimal as there is lot of shuffle traffic. > The Spark Plan must be optimized to use reduceBy, where possible, so that a > combiner is used. -- This message was sent by Atlassian JIRA (v6.3.4#6332)