[ https://issues.apache.org/jira/browse/PIG-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alex Bain updated PIG-3562:
---------------------------
    Attachment: PIG-3562-0.patch

> Implement combiner optimizations for DISTINCT
> ---------------------------------------------
>
>                 Key: PIG-3562
>                 URL: https://issues.apache.org/jira/browse/PIG-3562
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>    Affects Versions: tez-branch
>            Reporter: Alex Bain
>            Assignee: Alex Bain
>             Fix For: tez-branch
>
>         Attachments: PIG-3562-0.patch
>
>
> Currently, DISTINCT is implemented in a straightforward manner per
> https://issues.apache.org/jira/browse/PIG-3538.
> However, we can implement two types of combiner optimizations for DISTINCT,
> just as the MRCompiler does for map-reduce:
> 1. A simple DistinctCombiner that throws away the duplicate tuples
> 2. An optimizer that transforms certain uses of DISTINCT into an algebraic
> udf form



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
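
For readers unfamiliar with the first optimization named in the issue description, the following is a minimal sketch of what a duplicate-dropping combiner does conceptually. It is plain Java and is not the code in PIG-3562-0.patch or Pig's actual DistinctCombiner; the class name `DistinctCombinerSketch`, its methods, and the modelling of tuples as `List<String>` are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: not Pig's or Tez's API.
public class DistinctCombinerSketch {

    // Combine step (assumed to run per map-side task): drop duplicate
    // tuples before they are shuffled, keeping first-seen order.
    static List<List<String>> combine(List<List<String>> taskOutput) {
        return new ArrayList<>(new LinkedHashSet<>(taskOutput));
    }

    // Reduce step: duplicates can still exist across tasks, so the
    // partial results are merged and deduplicated once more.
    static Set<List<String>> reduce(List<List<List<String>>> partials) {
        Set<List<String>> distinct = new LinkedHashSet<>();
        for (List<List<String>> partial : partials) {
            distinct.addAll(partial);
        }
        return distinct;
    }
}
```

The combine step does not change the final result, because deduplicating twice gives the same set as deduplicating once; it only reduces how many tuples have to be transferred to the reduce side.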