[jira] [Created] (HIVE-28732) Sorted dynamic partition optimization does not apply hive.default.nulls.last

Krisztian Kasa (Jira) Fri, 31 Jan 2025 04:05:47 -0800

Krisztian Kasa created HIVE-28732:
-------------------------------------

             Summary: Sorted dynamic partition optimization does not apply hive.default.nulls.last
                 Key: HIVE-28732
                 URL: https://issues.apache.org/jira/browse/HIVE-28732
             Project: Hive
          Issue Type: Bug
            Reporter: Krisztian Kasa
            Assignee: Krisztian Kasa



The default value of {{hive.default.nulls.last}} is {{true}}, but the sorted 
dynamic partition optimization generates Reduce Sink operators whose keys are 
sorted in ascending order with nulls first.
{code}
POSTHOOK: query: explain insert overwrite table over1k_part partition(ds="foo", t) select si,i,b,f,t from over1k_n3 where t is null or t=27
POSTHOOK: type: QUERY
POSTHOOK: Input: default@over1k_n3
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-0 depends on stages: Stage-2
  Stage-3 depends on stages: Stage-0

STAGE PLANS:
  Stage: Stage-1
    Tez
#### A masked pattern was here ####
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
        Reducer 3 <- Map 1 (SIMPLE_EDGE)
#### A masked pattern was here ####
      Vertices:
        Map 1 
            Map Operator Tree:
                TableScan
                  alias: over1k_n3
                  filterExpr: (t is null or (t = 27Y)) (type: boolean)
                  Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE 
Column stats: NONE
                  Filter Operator
                    predicate: (t is null or (t = 27Y)) (type: boolean)
                    Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE 
Column stats: NONE
                    Select Operator
                      expressions: si (type: smallint), i (type: int), b (type: 
bigint), f (type: float), t (type: tinyint)
                      outputColumnNames: _col0, _col1, _col2, _col3, _col4
                      Statistics: Num rows: 1 Data size: 24 Basic stats: 
COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col4 (type: tinyint)
                        null sort order: a
                        sort order: +
                        Map-reduce partition columns: _col4 (type: tinyint)
                        Statistics: Num rows: 1 Data size: 24 Basic stats: 
COMPLETE Column stats: NONE
                        value expressions: _col0 (type: smallint), _col1 (type: 
int), _col2 (type: bigint), _col3 (type: float)
                      Select Operator
                        expressions: _col0 (type: smallint), _col1 (type: int), 
_col2 (type: bigint), _col3 (type: float), 'foo' (type: string), _col4 (type: 
tinyint)
                        outputColumnNames: si, i, b, f, ds, t
                        Statistics: Num rows: 1 Data size: 24 Basic stats: 
COMPLETE Column stats: NONE
                        Group By Operator
                          aggregations: min(si), max(si), count(1), count(si), 
compute_bit_vector_hll(si), min(i), max(i), count(i), 
compute_bit_vector_hll(i), min(b), max(b), count(b), compute_bit_vector_hll(b), 
min(f), max(f), count(f), compute_bit_vector_hll(f)
                          keys: ds (type: string), t (type: tinyint)
                          minReductionHashAggr: 0.99
                          mode: hash
                          outputColumnNames: _col0, _col1, _col2, _col3, _col4, 
_col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, 
_col15, _col16, _col17, _col18
                          Statistics: Num rows: 1 Data size: 24 Basic stats: 
COMPLETE Column stats: NONE
                          Reduce Output Operator
                            key expressions: _col0 (type: string), _col1 (type: 
tinyint)
                            null sort order: zz
                            sort order: ++
                            Map-reduce partition columns: _col0 (type: string), 
_col1 (type: tinyint)
                            Statistics: Num rows: 1 Data size: 24 Basic stats: 
COMPLETE Column stats: NONE
                            value expressions: _col2 (type: smallint), _col3 
(type: smallint), _col4 (type: bigint), _col5 (type: bigint), _col6 (type: 
binary), _col7 (type: int), _col8 (type: int), _col9 (type: bigint), _col10 
(type: binary), _col11 (type: bigint), _col12 (type: bigint), _col13 (type: 
bigint), _col14 (type: binary), _col15 (type: float), _col16 (type: float), 
_col17 (type: bigint), _col18 (type: binary)
            Execution mode: llap
            LLAP IO: all inputs
        Reducer 2 
            Execution mode: llap
            Reduce Operator Tree:
              Select Operator
                expressions: VALUE._col0 (type: smallint), VALUE._col1 (type: 
int), VALUE._col2 (type: bigint), VALUE._col3 (type: float), KEY._col4 (type: 
tinyint)
                outputColumnNames: _col0, _col1, _col2, _col3, _col4
                File Output Operator
                  compressed: false
                  Dp Sort State: PARTITION_SORTED
                  Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE 
Column stats: NONE
                  table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                      name: default.over1k_part
        Reducer 3 
            Execution mode: llap
            Reduce Operator Tree:
              Group By Operator
                aggregations: min(VALUE._col0), max(VALUE._col1), 
count(VALUE._col2), count(VALUE._col3), compute_bit_vector_hll(VALUE._col4), 
min(VALUE._col5), max(VALUE._col6), count(VALUE._col7), 
compute_bit_vector_hll(VALUE._col8), min(VALUE._col9), max(VALUE._col10), 
count(VALUE._col11), compute_bit_vector_hll(VALUE._col12), min(VALUE._col13), 
max(VALUE._col14), count(VALUE._col15), compute_bit_vector_hll(VALUE._col16)
                keys: KEY._col0 (type: string), KEY._col1 (type: tinyint)
                mode: mergepartial
                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, 
_col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, 
_col16, _col17, _col18
                Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE 
Column stats: NONE
                Select Operator
                  expressions: 'LONG' (type: string), UDFToLong(_col2) (type: 
bigint), UDFToLong(_col3) (type: bigint), (_col4 - _col5) (type: bigint), 
COALESCE(ndv_compute_bit_vector(_col6),0) (type: bigint), _col6 (type: binary), 
'LONG' (type: string), UDFToLong(_col7) (type: bigint), UDFToLong(_col8) (type: 
bigint), (_col4 - _col9) (type: bigint), 
COALESCE(ndv_compute_bit_vector(_col10),0) (type: bigint), _col10 (type: 
binary), 'LONG' (type: string), _col11 (type: bigint), _col12 (type: bigint), 
(_col4 - _col13) (type: bigint), COALESCE(ndv_compute_bit_vector(_col14),0) 
(type: bigint), _col14 (type: binary), 'DOUBLE' (type: string), 
UDFToDouble(_col15) (type: double), UDFToDouble(_col16) (type: double), (_col4 
- _col17) (type: bigint), COALESCE(ndv_compute_bit_vector(_col18),0) (type: 
bigint), _col18 (type: binary), _col0 (type: string), _col1 (type: tinyint)
                  outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, 
_col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, 
_col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25
                  Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE 
Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE 
Column stats: NONE
                    table:
                        input format: 
org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde: 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-2
    Dependency Collection

  Stage: Stage-0
    Move Operator
      tables:
          partition:
            ds foo
            t 
          replace: true
          table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: default.over1k_part

  Stage: Stage-3
    Stats Work
      Basic Stats Work:
      Column Stats Desc:
          Columns: si, i, b, f
          Column Types: smallint, int, bigint, float
          Table: default.over1k_part
{code}
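Focus on the first Reduce Sink (RS) operator in Map 1, which the plan prints as 
{{Reduce Output Operator}}. This is the operator added by the sorted dynamic 
partition optimization: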
{code}
                      Reduce Output Operator
                        key expressions: _col4 (type: tinyint)
                        null sort order: a
                        sort order: +
{code}
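Since {{hive.default.nulls.last}} is {{true}}, the {{null sort order}} should be 
{{"z"}} (nulls last) rather than {{"a"}} (nulls first). The second Reduce Output 
Operator in the same vertex already uses {{zz}} for its two keys.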
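
The expected behaviour can be illustrated with a minimal sketch. This is not Hive's actual code: {{NullOrderSketch}} and {{nullOrderCodes}} are hypothetical names invented for illustration, and the boolean parameter stands in for the value of {{hive.default.nulls.last}}. It only shows how the per-key null order string seen in the plan ({{a}} = nulls first, {{z}} = nulls last) would follow from that setting.

```java
// Hypothetical illustration only; this class does not exist in Hive.
final class NullOrderSketch {
    // Single-character codes as they appear in the "null sort order" line of the plan.
    static final char NULLS_FIRST = 'a';
    static final char NULLS_LAST = 'z';

    /**
     * Builds the null sort order string for a Reduce Sink with keyCount keys.
     * defaultNullsLast stands in for the value of hive.default.nulls.last.
     */
    static String nullOrderCodes(int keyCount, boolean defaultNullsLast) {
        char code = defaultNullsLast ? NULLS_LAST : NULLS_FIRST;
        StringBuilder sb = new StringBuilder(keyCount);
        for (int i = 0; i < keyCount; i++) {
            sb.append(code); // one code per key expression
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // One key (_col4) with the default setting: expected "z", the plan shows "a".
        System.out.println(nullOrderCodes(1, true));
        // Two keys (_col0, _col1): matches the "zz" of the second Reduce Output Operator.
        System.out.println(nullOrderCodes(2, true));
    }
}
```

With the setting at its default of {{true}}, the sketch yields {{z}} for the single partition key, which is the value the report expects in place of {{a}}.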



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
