[
https://issues.apache.org/jira/browse/HIVE-6883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135176#comment-14135176
]
Zhichun Wu commented on HIVE-6883:
----------------------------------
[~prasanth_j], this fix causes some problems when dynamic partitioning is
combined with GROUP BY. Consider the following case:
{code}
CREATE TABLE `t1`( `a` int,`b` string) PARTITIONED BY (`dt` string);
create table src1 (
`key` string,
`val` string
);
explain insert overwrite table t1 partition(dt) select 1, "hello", "20140901"
from src1 group by key;
{code}
The key expressions of the ReduceSink (RS) in Stage-2 are wrong. The part of
the patch that uses the parent RS's keyCols needs more changes.
{code}
if (parentRSOpOrder != null && !parentRSOpOrder.isEmpty() &&
    sortPositions.isEmpty()) {
  newKeyCols.addAll(parentRSOp.getConf().getKeyCols());
  orderStr += parentRSOpOrder;
}
{code}
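As a rough illustration only (this is not the Hive code), the condition in the snippet above can be modelled with plain strings in place of Hive's expression nodes. The class `ParentKeyMergeSketch`, the method `mergeKeys`, and the column names passed to it are all hypothetical. The sketch only shows when the branch fires: the parent RS has a non-empty sort order and the target table declares no sort columns, which appears to match the GROUP BY case above. It does not model why the resulting key expressions are wrong.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ParentKeyMergeSketch {

    // Hypothetical simplification: key columns are plain strings, not Hive expression nodes.
    static List<String> mergeKeys(List<String> newKeyCols,
                                  List<String> parentKeyCols,
                                  String parentOrder,
                                  List<Integer> sortPositions) {
        List<String> merged = new ArrayList<>(newKeyCols);
        // Same shape as the condition quoted from the patch: the parent's keys
        // are appended only when the parent has a sort order and the target
        // table has no sort columns of its own.
        if (parentOrder != null && !parentOrder.isEmpty() && sortPositions.isEmpty()) {
            merged.addAll(parentKeyCols);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Assumed inputs for the GROUP BY case: partition column "dt" is already
        // a key, the parent RS (from the group by) is keyed on "key" ascending.
        List<String> keys = mergeKeys(
                Arrays.asList("dt"),
                Arrays.asList("key"),
                "+",
                Collections.<Integer>emptyList());
        System.out.println(keys);
    }
}
```

In this model the parent's key is copied as-is into the new key list, with no check that it is still a valid column at that point in the plan.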
> Dynamic partitioning optimization does not honor sort order or order by
> -----------------------------------------------------------------------
>
> Key: HIVE-6883
> URL: https://issues.apache.org/jira/browse/HIVE-6883
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.13.0
> Reporter: Prasanth J
> Assignee: Prasanth J
> Priority: Critical
> Fix For: 0.14.0, 0.13.1
>
> Attachments: HIVE-6883-branch-0.13.3.patch, HIVE-6883.1.patch,
> HIVE-6883.2.patch, HIVE-6883.3.patch
>
>
> HIVE-6455 patch does not honor sort order of the output table or order by of
> select statement. The reason for the former is numDistributionKey in
> ReduceSinkDesc is set wrongly. It doesn't take into account the sort columns,
> because of this RSOp sets the sort columns to null in Key. Since nulls are
> set in place of sort columns in Key, the sort columns in Value are not
> sorted.
> The other issue is ORDER BY columns are not honored during insertion. For
> example
> {code}
> insert overwrite table over1k_part_orc partition(ds="foo", t) select
> si,i,b,f,t from over1k_orc where t is null or t=27 order by si;
> {code}
> the select query performs order by on column 'si' in the first MR job. The
> following MR job (inserted by HIVE-6455), sorts the input data on dynamic
> partition column 't' without taking into account the already sorted 'si'
> column. This results in out of order insertion for 'si' column.
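
For the ORDER BY issue in the quoted description above, the following is a minimal sketch of the idea, not Hive's implementation. If the second sort uses a composite key of the dynamic partition column followed by the ORDER BY column, rows stay ordered on 'si' within each partition. The `Row` type, the class name, and the use of a non-null int for 't' are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PartitionSortSketch {

    // Hypothetical row with only the two columns that matter here.
    static final class Row {
        final int si; // ORDER BY column
        final int t;  // dynamic partition column (assumed non-null in this sketch)

        Row(int si, int t) {
            this.si = si;
            this.t = t;
        }
    }

    // Sort on the partition column first, then on the ORDER BY column, so the
    // earlier ordering on 'si' survives inside every partition.
    static List<Row> sortByPartitionThenSi(List<Row> rows) {
        List<Row> out = new ArrayList<>(rows);
        out.sort(Comparator.comparingInt((Row r) -> r.t)
                           .thenComparingInt(r -> r.si));
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row(3, 27));
        rows.add(new Row(1, 27));
        rows.add(new Row(2, 5));
        for (Row r : sortByPartitionThenSi(rows)) {
            System.out.println("t=" + r.t + " si=" + r.si);
        }
    }
}
```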