[
https://issues.apache.org/jira/browse/HIVE-28732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HIVE-28732:
----------------------------------
Labels: pull-request-available (was: )
> Sorted dynamic partition optimization does not apply hive.default.nulls.last
> ----------------------------------------------------------------------------
>
> Key: HIVE-28732
> URL: https://issues.apache.org/jira/browse/HIVE-28732
> Project: Hive
> Issue Type: Bug
> Reporter: Krisztian Kasa
> Assignee: Krisztian Kasa
> Priority: Major
> Labels: pull-request-available
>
> The default value of {{hive.default.nulls.last}} is {{true}}, but the sorted
> dynamic partition optimization generates Reduce Sink operators whose keys are
> sorted in ascending order with nulls first.
> {code}
> POSTHOOK: query: explain insert overwrite table over1k_part
> partition(ds="foo", t) select si,i,b,f,t from over1k_n3 where t is null or
> t=27
> POSTHOOK: type: QUERY
> POSTHOOK: Input: default@over1k_n3
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-2 depends on stages: Stage-1
> Stage-0 depends on stages: Stage-2
> Stage-3 depends on stages: Stage-0
> STAGE PLANS:
> Stage: Stage-1
> Tez
> #### A masked pattern was here ####
> Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Reducer 3 <- Map 1 (SIMPLE_EDGE)
> #### A masked pattern was here ####
> Vertices:
> Map 1
> Map Operator Tree:
> TableScan
> alias: over1k_n3
> filterExpr: (t is null or (t = 27Y)) (type: boolean)
> Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE
> Column stats: NONE
> Filter Operator
> predicate: (t is null or (t = 27Y)) (type: boolean)
> Statistics: Num rows: 1 Data size: 24 Basic stats:
> COMPLETE Column stats: NONE
> Select Operator
> expressions: si (type: smallint), i (type: int), b
> (type: bigint), f (type: float), t (type: tinyint)
> outputColumnNames: _col0, _col1, _col2, _col3, _col4
> Statistics: Num rows: 1 Data size: 24 Basic stats:
> COMPLETE Column stats: NONE
> Reduce Output Operator
> key expressions: _col4 (type: tinyint)
> null sort order: a
> sort order: +
> Map-reduce partition columns: _col4 (type: tinyint)
> Statistics: Num rows: 1 Data size: 24 Basic stats:
> COMPLETE Column stats: NONE
> value expressions: _col0 (type: smallint), _col1
> (type: int), _col2 (type: bigint), _col3 (type: float)
> Select Operator
> expressions: _col0 (type: smallint), _col1 (type:
> int), _col2 (type: bigint), _col3 (type: float), 'foo' (type: string), _col4
> (type: tinyint)
> outputColumnNames: si, i, b, f, ds, t
> Statistics: Num rows: 1 Data size: 24 Basic stats:
> COMPLETE Column stats: NONE
> Group By Operator
> aggregations: min(si), max(si), count(1),
> count(si), compute_bit_vector_hll(si), min(i), max(i), count(i),
> compute_bit_vector_hll(i), min(b), max(b), count(b),
> compute_bit_vector_hll(b), min(f), max(f), count(f), compute_bit_vector_hll(f)
> keys: ds (type: string), t (type: tinyint)
> minReductionHashAggr: 0.99
> mode: hash
> outputColumnNames: _col0, _col1, _col2, _col3,
> _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13,
> _col14, _col15, _col16, _col17, _col18
> Statistics: Num rows: 1 Data size: 24 Basic stats:
> COMPLETE Column stats: NONE
> Reduce Output Operator
> key expressions: _col0 (type: string), _col1
> (type: tinyint)
> null sort order: zz
> sort order: ++
> Map-reduce partition columns: _col0 (type:
> string), _col1 (type: tinyint)
> Statistics: Num rows: 1 Data size: 24 Basic
> stats: COMPLETE Column stats: NONE
> value expressions: _col2 (type: smallint), _col3
> (type: smallint), _col4 (type: bigint), _col5 (type: bigint), _col6 (type:
> binary), _col7 (type: int), _col8 (type: int), _col9 (type: bigint), _col10
> (type: binary), _col11 (type: bigint), _col12 (type: bigint), _col13 (type:
> bigint), _col14 (type: binary), _col15 (type: float), _col16 (type: float),
> _col17 (type: bigint), _col18 (type: binary)
> Execution mode: llap
> LLAP IO: all inputs
> Reducer 2
> Execution mode: llap
> Reduce Operator Tree:
> Select Operator
> expressions: VALUE._col0 (type: smallint), VALUE._col1 (type:
> int), VALUE._col2 (type: bigint), VALUE._col3 (type: float), KEY._col4 (type:
> tinyint)
> outputColumnNames: _col0, _col1, _col2, _col3, _col4
> File Output Operator
> compressed: false
> Dp Sort State: PARTITION_SORTED
> Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE
> Column stats: NONE
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format:
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> serde:
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> name: default.over1k_part
> Reducer 3
> Execution mode: llap
> Reduce Operator Tree:
> Group By Operator
> aggregations: min(VALUE._col0), max(VALUE._col1),
> count(VALUE._col2), count(VALUE._col3), compute_bit_vector_hll(VALUE._col4),
> min(VALUE._col5), max(VALUE._col6), count(VALUE._col7),
> compute_bit_vector_hll(VALUE._col8), min(VALUE._col9), max(VALUE._col10),
> count(VALUE._col11), compute_bit_vector_hll(VALUE._col12), min(VALUE._col13),
> max(VALUE._col14), count(VALUE._col15), compute_bit_vector_hll(VALUE._col16)
> keys: KEY._col0 (type: string), KEY._col1 (type: tinyint)
> mode: mergepartial
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5,
> _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15,
> _col16, _col17, _col18
> Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE
> Column stats: NONE
> Select Operator
> expressions: 'LONG' (type: string), UDFToLong(_col2) (type:
> bigint), UDFToLong(_col3) (type: bigint), (_col4 - _col5) (type: bigint),
> COALESCE(ndv_compute_bit_vector(_col6),0) (type: bigint), _col6 (type:
> binary), 'LONG' (type: string), UDFToLong(_col7) (type: bigint),
> UDFToLong(_col8) (type: bigint), (_col4 - _col9) (type: bigint),
> COALESCE(ndv_compute_bit_vector(_col10),0) (type: bigint), _col10 (type:
> binary), 'LONG' (type: string), _col11 (type: bigint), _col12 (type: bigint),
> (_col4 - _col13) (type: bigint), COALESCE(ndv_compute_bit_vector(_col14),0)
> (type: bigint), _col14 (type: binary), 'DOUBLE' (type: string),
> UDFToDouble(_col15) (type: double), UDFToDouble(_col16) (type: double),
> (_col4 - _col17) (type: bigint), COALESCE(ndv_compute_bit_vector(_col18),0)
> (type: bigint), _col18 (type: binary), _col0 (type: string), _col1 (type:
> tinyint)
> outputColumnNames: _col0, _col1, _col2, _col3, _col4,
> _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14,
> _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23,
> _col24, _col25
> Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE
> Column stats: NONE
> File Output Operator
> compressed: false
> Statistics: Num rows: 1 Data size: 24 Basic stats:
> COMPLETE Column stats: NONE
> table:
> input format:
> org.apache.hadoop.mapred.SequenceFileInputFormat
> output format:
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
> serde:
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Stage: Stage-2
> Dependency Collection
> Stage: Stage-0
> Move Operator
> tables:
> partition:
> ds foo
> t
> replace: true
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format:
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> name: default.over1k_part
> Stage: Stage-3
> Stats Work
> Basic Stats Work:
> Column Stats Desc:
> Columns: si, i, b, f
> Column Types: smallint, int, bigint, float
> Table: default.over1k_part
> {code}
> Focus on the first Reduce Sink (RS) operator in Map 1:
> {code}
> Reduce Output Operator
> key expressions: _col4 (type: tinyint)
> null sort order: a
> sort order: +
> {code}
> The {{null sort order}} should be {{"z"}} (nulls last), not {{"a"}} (nulls first).
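
A minimal way to check this locally might look like the HiveQL sketch below. It assumes the over1k_part and over1k_n3 tables from the plan above already exist. It also assumes that setting hive.optimize.sort.dynamic.partition.threshold to 1 forces the sorted dynamic partition optimization; neither SET line appears in the report, so verify both against your Hive version.

```sql
-- Sketch only: the two SET lines are assumptions, not taken from the report.
set hive.default.nulls.last=true;                      -- the default value, made explicit
set hive.optimize.sort.dynamic.partition.threshold=1;  -- assumed to force the optimization

-- Query from the report; inspect the first Reduce Output Operator in Map 1.
explain insert overwrite table over1k_part partition(ds="foo", t)
select si, i, b, f, t from over1k_n3 where t is null or t = 27;
```

In the plan quoted above, that operator reports "null sort order: a"; per the report, it should be "z".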