Jesus Camacho Rodriguez created HIVE-13750:
----------------------------------------------
Summary: Avoid additional shuffle stage created by Sorted Dynamic
Partition Optimizer when possible
Key: HIVE-13750
URL: https://issues.apache.org/jira/browse/HIVE-13750
Project: Hive
Issue Type: Improvement
Components: Physical Optimizer
Affects Versions: 2.1.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
Extend ReduceDedup to remove the additional shuffle stage created by the Sorted
Dynamic Partition Optimizer when possible, thus avoiding unnecessary work.
By [~ashutoshc]:
{quote}
Currently, if the config is on, the Sorted Dynamic Partition Optimizer (SDPO)
unconditionally adds an extra shuffle stage. If the sort columns of the previous
shuffle and the partitioning columns of the table match, the reduce sink
deduplication optimizer removes the extra shuffle stage, bringing the overhead
down to zero. However, if they don’t match, we end up doing an extra shuffle.
This can be improved: we can add the table partition columns as sort columns on
the earlier shuffle and avoid the extra shuffle. This ensures that when a query
already has a shuffle stage, we do not shuffle the data again.
{quote}
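The difference between the current and the proposed behavior can be sketched with a toy model. This is only an illustration, not Hive code: the function names current_plan and proposed_plan are hypothetical, a shuffle is reduced to the tuple of columns it sorts by, "match" is modeled as exact equality, and the partition columns are appended after the existing sort columns purely for illustration. The issue does not state where the columns are placed or which conditions make the merge safe.

```python
from typing import List, Optional, Tuple

Columns = Tuple[str, ...]


def current_plan(previous_sort: Optional[Columns], partition_cols: Columns) -> List[Columns]:
    """Toy model of the behavior described in the quote (not Hive code).

    Returns the sort columns of each shuffle stage that remains in the plan.
    """
    if previous_sort is None:
        # No earlier shuffle: the one added by SDPO is the only shuffle.
        return [partition_cols]
    if previous_sort == partition_cols:
        # Columns match: deduplication removes the extra shuffle.
        return [previous_sort]
    # Columns differ: the extra shuffle stays in the plan.
    return [previous_sort, partition_cols]


def proposed_plan(previous_sort: Optional[Columns], partition_cols: Columns) -> List[Columns]:
    """Toy model of the proposal: fold the partition columns into the earlier shuffle."""
    if previous_sort is None:
        return [partition_cols]
    # Assumption: missing partition columns are appended to the existing sort columns.
    missing = tuple(c for c in partition_cols if c not in previous_sort)
    return [previous_sort + missing]


if __name__ == "__main__":
    # Hypothetical query that already sorts by "key" and writes a table partitioned by "ds".
    print(len(current_plan(("key",), ("ds",))), "shuffle stage(s) today")
    print(len(proposed_plan(("key",), ("ds",))), "shuffle stage(s) with the proposal")
```

In this model a query that already shuffles on columns other than the partition columns goes from two shuffle stages to one, which is the saving the description refers to.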