[ https://issues.apache.org/jira/browse/HIVE-17037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez updated HIVE-17037:
-------------------------------------------
    Attachment: HIVE-17073.03.patch

> Extend join algorithm selection to avoid unnecessary input data shuffle
> -----------------------------------------------------------------------
>
> Key: HIVE-17037
> URL: https://issues.apache.org/jira/browse/HIVE-17037
> Project: Hive
> Issue Type: Improvement
> Components: Physical Optimizer
> Affects Versions: 3.0.0
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17037.01.patch, HIVE-17037.02.patch,
> HIVE-17037.patch, HIVE-17073.03.patch
>
>
> As an example, consider the following query:
> {code:sql}
> SELECT *
> FROM (
>   SELECT a.value
>   FROM src1 a
>   JOIN src1 b
>   ON (a.value = b.value)
>   GROUP BY a.value
> ) a
> JOIN src
> ON (a.value = src.value);
> {code}
> Currently, the plan generated for Tez will contain an unnecessary shuffle
> operation between the subquery and the join, since the records produced by
> the subquery are already sorted by the value.
> This issue is to extend join algorithm selection to be able to shuffle only
> some of the inputs for a given join and avoid unnecessary shuffle operations.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
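
One way to observe the shuffle described in the issue is to print the Tez plan for the example query with EXPLAIN. The following is a minimal sketch, not part of the patch. It assumes the src and src1 tables from the description exist and that Tez is available as an execution engine. The exact plan text varies by Hive version, so no output is shown.

{code:sql}
-- Assumption: Tez is installed and can be selected for this session.
SET hive.execution.engine=tez;

-- Print the plan for the query from the description. Per the issue, the
-- vertex that produces the subquery result is connected to the join through
-- a shuffle edge, even though its output is already organized by a.value.
EXPLAIN
SELECT *
FROM (
  SELECT a.value
  FROM src1 a
  JOIN src1 b
  ON (a.value = b.value)
  GROUP BY a.value
) a
JOIN src
ON (a.value = src.value);
{code}

Comparing the edges listed in this plan before and after applying a patch would be one way to check whether the subquery input is still being shuffled.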