[ https://issues.apache.org/jira/browse/PIG-4709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pallavi Rao updated PIG-4709:
-----------------------------
    Attachment: PIG-4709.patch

Initial patch. It handles algebraic operations on grouped data. There are certain cases the patch does not handle, for example when constant expressions are used. I plan to handle those in a separate JIRA.

> Improve performance of GROUPBY operator on Spark
> ------------------------------------------------
>
>                 Key: PIG-4709
>                 URL: https://issues.apache.org/jira/browse/PIG-4709
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: Pallavi Rao
>            Assignee: Pallavi Rao
>              Labels: spork
>             Fix For: spark-branch
>
>         Attachments: PIG-4709.patch
>
>
> Currently, the GROUPBY operator of Pig is mapped to Spark's CoGroup. When the
> grouped data is consumed by subsequent operations to perform algebraic
> operations, this is sub-optimal, as there is a lot of shuffle traffic.
> The Spark plan must be optimized to use reduceBy, where possible, so that a
> combiner is used.
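
To illustrate why a combiner reduces shuffle traffic, here is a minimal sketch in plain Python. It is not Pig or Spark code and is not taken from the patch. The function names, the sample partitions, and the choice of SUM as the algebraic operation are all assumptions made for illustration. It counts how many records would cross the shuffle when every record is grouped first, versus when each partition pre-aggregates its own records before the shuffle.

```python
from collections import defaultdict


def shuffle_group_then_sum(partitions):
    """Group-first strategy: every (key, value) record crosses the shuffle."""
    # Hypothetical model: the shuffle is just the concatenation of all records.
    shuffled = [record for part in partitions for record in part]
    grouped = defaultdict(list)
    for key, value in shuffled:
        grouped[key].append(value)
    totals = {key: sum(values) for key, values in grouped.items()}
    return totals, len(shuffled)


def shuffle_with_combiner(partitions):
    """Combiner strategy: each partition sums per key before the shuffle."""
    shuffled = []
    for part in partitions:
        partial = defaultdict(int)
        for key, value in part:
            partial[key] += value  # map-side partial aggregation
        shuffled.extend(partial.items())
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value  # reduce-side merge of the partial sums
    return dict(totals), len(shuffled)


# Two made-up input partitions of (key, value) records.
partitions = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("a", 4), ("b", 5), ("b", 6)],
]

grouped_totals, grouped_shuffled = shuffle_group_then_sum(partitions)
combined_totals, combined_shuffled = shuffle_with_combiner(partitions)

print(grouped_totals, grouped_shuffled)
print(combined_totals, combined_shuffled)
```

Both strategies produce the same per-key sums, but the combiner version sends at most one record per key per partition through the shuffle. This only works because SUM can be computed from partial results, which is the property that algebraic functions share.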