Krisztian Kasa created HIVE-28732:
-------------------------------------
Summary: Sorted dynamic partition optimization does not apply hive.default.nulls.last
Key: HIVE-28732
URL: https://issues.apache.org/jira/browse/HIVE-28732
Project: Hive
Issue Type: Bug
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
The default value of {{hive.default.nulls.last}} is {{true}}, but the sorted
dynamic partition optimization generates reduce sink operators whose keys are
sorted in ascending order with nulls first.
{code}
POSTHOOK: query: explain insert overwrite table over1k_part partition(ds="foo",
t) select si,i,b,f,t from over1k_n3 where t is null or t=27
POSTHOOK: type: QUERY
POSTHOOK: Input: default@over1k_n3
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-2 depends on stages: Stage-1
Stage-0 depends on stages: Stage-2
Stage-3 depends on stages: Stage-0
STAGE PLANS:
Stage: Stage-1
Tez
#### A masked pattern was here ####
Edges:
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Reducer 3 <- Map 1 (SIMPLE_EDGE)
#### A masked pattern was here ####
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: over1k_n3
filterExpr: (t is null or (t = 27Y)) (type: boolean)
Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE
Column stats: NONE
Filter Operator
predicate: (t is null or (t = 27Y)) (type: boolean)
Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE
Column stats: NONE
Select Operator
expressions: si (type: smallint), i (type: int), b (type:
bigint), f (type: float), t (type: tinyint)
outputColumnNames: _col0, _col1, _col2, _col3, _col4
Statistics: Num rows: 1 Data size: 24 Basic stats:
COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col4 (type: tinyint)
null sort order: a
sort order: +
Map-reduce partition columns: _col4 (type: tinyint)
Statistics: Num rows: 1 Data size: 24 Basic stats:
COMPLETE Column stats: NONE
value expressions: _col0 (type: smallint), _col1 (type:
int), _col2 (type: bigint), _col3 (type: float)
Select Operator
expressions: _col0 (type: smallint), _col1 (type: int),
_col2 (type: bigint), _col3 (type: float), 'foo' (type: string), _col4 (type:
tinyint)
outputColumnNames: si, i, b, f, ds, t
Statistics: Num rows: 1 Data size: 24 Basic stats:
COMPLETE Column stats: NONE
Group By Operator
aggregations: min(si), max(si), count(1), count(si),
compute_bit_vector_hll(si), min(i), max(i), count(i),
compute_bit_vector_hll(i), min(b), max(b), count(b), compute_bit_vector_hll(b),
min(f), max(f), count(f), compute_bit_vector_hll(f)
keys: ds (type: string), t (type: tinyint)
minReductionHashAggr: 0.99
mode: hash
outputColumnNames: _col0, _col1, _col2, _col3, _col4,
_col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14,
_col15, _col16, _col17, _col18
Statistics: Num rows: 1 Data size: 24 Basic stats:
COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string), _col1 (type:
tinyint)
null sort order: zz
sort order: ++
Map-reduce partition columns: _col0 (type: string),
_col1 (type: tinyint)
Statistics: Num rows: 1 Data size: 24 Basic stats:
COMPLETE Column stats: NONE
value expressions: _col2 (type: smallint), _col3
(type: smallint), _col4 (type: bigint), _col5 (type: bigint), _col6 (type:
binary), _col7 (type: int), _col8 (type: int), _col9 (type: bigint), _col10
(type: binary), _col11 (type: bigint), _col12 (type: bigint), _col13 (type:
bigint), _col14 (type: binary), _col15 (type: float), _col16 (type: float),
_col17 (type: bigint), _col18 (type: binary)
Execution mode: llap
LLAP IO: all inputs
Reducer 2
Execution mode: llap
Reduce Operator Tree:
Select Operator
expressions: VALUE._col0 (type: smallint), VALUE._col1 (type:
int), VALUE._col2 (type: bigint), VALUE._col3 (type: float), KEY._col4 (type:
tinyint)
outputColumnNames: _col0, _col1, _col2, _col3, _col4
File Output Operator
compressed: false
Dp Sort State: PARTITION_SORTED
Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE
Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
name: default.over1k_part
Reducer 3
Execution mode: llap
Reduce Operator Tree:
Group By Operator
aggregations: min(VALUE._col0), max(VALUE._col1),
count(VALUE._col2), count(VALUE._col3), compute_bit_vector_hll(VALUE._col4),
min(VALUE._col5), max(VALUE._col6), count(VALUE._col7),
compute_bit_vector_hll(VALUE._col8), min(VALUE._col9), max(VALUE._col10),
count(VALUE._col11), compute_bit_vector_hll(VALUE._col12), min(VALUE._col13),
max(VALUE._col14), count(VALUE._col15), compute_bit_vector_hll(VALUE._col16)
keys: KEY._col0 (type: string), KEY._col1 (type: tinyint)
mode: mergepartial
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5,
_col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15,
_col16, _col17, _col18
Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE
Column stats: NONE
Select Operator
expressions: 'LONG' (type: string), UDFToLong(_col2) (type:
bigint), UDFToLong(_col3) (type: bigint), (_col4 - _col5) (type: bigint),
COALESCE(ndv_compute_bit_vector(_col6),0) (type: bigint), _col6 (type: binary),
'LONG' (type: string), UDFToLong(_col7) (type: bigint), UDFToLong(_col8) (type:
bigint), (_col4 - _col9) (type: bigint),
COALESCE(ndv_compute_bit_vector(_col10),0) (type: bigint), _col10 (type:
binary), 'LONG' (type: string), _col11 (type: bigint), _col12 (type: bigint),
(_col4 - _col13) (type: bigint), COALESCE(ndv_compute_bit_vector(_col14),0)
(type: bigint), _col14 (type: binary), 'DOUBLE' (type: string),
UDFToDouble(_col15) (type: double), UDFToDouble(_col16) (type: double), (_col4
- _col17) (type: bigint), COALESCE(ndv_compute_bit_vector(_col18),0) (type:
bigint), _col18 (type: binary), _col0 (type: string), _col1 (type: tinyint)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5,
_col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15,
_col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25
Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE
Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE
Column stats: NONE
table:
input format:
org.apache.hadoop.mapred.SequenceFileInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde:
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-2
Dependency Collection
Stage: Stage-0
Move Operator
tables:
partition:
ds foo
t
replace: true
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
name: default.over1k_part
Stage: Stage-3
Stats Work
Basic Stats Work:
Column Stats Desc:
Columns: si, i, b, f
Column Types: smallint, int, bigint, float
Table: default.over1k_part
{code}
Focus on the first Reduce Output Operator (the reduce sink, RS) in Map 1:
{code}
Reduce Output Operator
  key expressions: _col4 (type: tinyint)
  null sort order: a
  sort order: +
{code}
The {{null sort order}} should be {{"z"}} (nulls last), as it is for the second Reduce Output Operator in Map 1, which shows {{zz}}.
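To illustrate the expected behavior, here is a minimal, self-contained sketch of how the null sort order string for ascending reduce sink keys could follow the configured default. It is not Hive's actual implementation: the class {{NullSortOrderSketch}} and the method {{nullSortOrder}} are hypothetical names introduced only for this example, and the boolean parameter stands in for the value of {{hive.default.nulls.last}}. It covers only ascending keys, as in the plan above.

```java
import java.util.Collections;

/**
 * Hypothetical helper, not Hive code: derives the "null sort order" string
 * printed in EXPLAIN output for a reduce sink whose keys are all ascending.
 */
final class NullSortOrderSketch {

    /** Marker printed in the plan for NULLS FIRST. */
    static final char NULLS_FIRST = 'a';
    /** Marker printed in the plan for NULLS LAST. */
    static final char NULLS_LAST = 'z';

    /**
     * @param keyCount         number of ascending key columns in the reduce sink
     * @param defaultNullsLast assumed to mirror hive.default.nulls.last
     * @return one marker character per key column
     */
    static String nullSortOrder(int keyCount, boolean defaultNullsLast) {
        char marker = defaultNullsLast ? NULLS_LAST : NULLS_FIRST;
        return String.join("", Collections.nCopies(keyCount, String.valueOf(marker)));
    }

    public static void main(String[] args) {
        // One key (_col4) with the default setting: expected by this issue.
        System.out.println(nullSortOrder(1, true));   // prints "z"
        // Two keys (_col0, _col1), as in the second reduce sink of Map 1.
        System.out.println(nullSortOrder(2, true));   // prints "zz"
        // What the sorted dynamic partition optimization currently emits.
        System.out.println(nullSortOrder(1, false));  // prints "a"
    }
}
```

With the default setting, the single-key reduce sink would report {{z}} instead of the hard-coded {{a}} seen in the plan.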