[jira] [Updated] (HIVE-28732) Sorted dynamic partition optimization does not apply hive.default.nulls.last

ASF GitHub Bot (Jira) Fri, 31 Jan 2025 04:11:09 -0800


     [ https://issues.apache.org/jira/browse/HIVE-28732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


ASF GitHub Bot updated HIVE-28732:
----------------------------------
    Labels: pull-request-available  (was: )

> Sorted dynamic partition optimization does not apply hive.default.nulls.last
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-28732
>                 URL: https://issues.apache.org/jira/browse/HIVE-28732
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>              Labels: pull-request-available
>
> The default value of {{hive.default.nulls.last}} is {{true}}, but the sorted 
> dynamic partition optimization generates reduce sink operators whose keys 
> are sorted in ascending order with nulls first.
> {code}
> POSTHOOK: query: explain insert overwrite table over1k_part partition(ds="foo", t) select si,i,b,f,t from over1k_n3 where t is null or t=27
> POSTHOOK: type: QUERY
> POSTHOOK: Input: default@over1k_n3
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-2 depends on stages: Stage-1
>   Stage-0 depends on stages: Stage-2
>   Stage-3 depends on stages: Stage-0
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
> #### A masked pattern was here ####
>       Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE)
>         Reducer 3 <- Map 1 (SIMPLE_EDGE)
> #### A masked pattern was here ####
>       Vertices:
>         Map 1 
>             Map Operator Tree:
>                 TableScan
>                   alias: over1k_n3
>                   filterExpr: (t is null or (t = 27Y)) (type: boolean)
>                   Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE Column stats: NONE
>                   Filter Operator
>                     predicate: (t is null or (t = 27Y)) (type: boolean)
>                     Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE Column stats: NONE
>                     Select Operator
>                       expressions: si (type: smallint), i (type: int), b (type: bigint), f (type: float), t (type: tinyint)
>                       outputColumnNames: _col0, _col1, _col2, _col3, _col4
>                       Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE Column stats: NONE
>                       Reduce Output Operator
>                         key expressions: _col4 (type: tinyint)
>                         null sort order: a
>                         sort order: +
>                         Map-reduce partition columns: _col4 (type: tinyint)
>                         Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE Column stats: NONE
>                         value expressions: _col0 (type: smallint), _col1 (type: int), _col2 (type: bigint), _col3 (type: float)
>                       Select Operator
>                         expressions: _col0 (type: smallint), _col1 (type: int), _col2 (type: bigint), _col3 (type: float), 'foo' (type: string), _col4 (type: tinyint)
>                         outputColumnNames: si, i, b, f, ds, t
>                         Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE Column stats: NONE
>                         Group By Operator
>                           aggregations: min(si), max(si), count(1), count(si), compute_bit_vector_hll(si), min(i), max(i), count(i), compute_bit_vector_hll(i), min(b), max(b), count(b), compute_bit_vector_hll(b), min(f), max(f), count(f), compute_bit_vector_hll(f)
>                           keys: ds (type: string), t (type: tinyint)
>                           minReductionHashAggr: 0.99
>                           mode: hash
>                           outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18
>                           Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE Column stats: NONE
>                           Reduce Output Operator
>                             key expressions: _col0 (type: string), _col1 (type: tinyint)
>                             null sort order: zz
>                             sort order: ++
>                             Map-reduce partition columns: _col0 (type: string), _col1 (type: tinyint)
>                             Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE Column stats: NONE
>                             value expressions: _col2 (type: smallint), _col3 (type: smallint), _col4 (type: bigint), _col5 (type: bigint), _col6 (type: binary), _col7 (type: int), _col8 (type: int), _col9 (type: bigint), _col10 (type: binary), _col11 (type: bigint), _col12 (type: bigint), _col13 (type: bigint), _col14 (type: binary), _col15 (type: float), _col16 (type: float), _col17 (type: bigint), _col18 (type: binary)
>             Execution mode: llap
>             LLAP IO: all inputs
>         Reducer 2 
>             Execution mode: llap
>             Reduce Operator Tree:
>               Select Operator
>                 expressions: VALUE._col0 (type: smallint), VALUE._col1 (type: int), VALUE._col2 (type: bigint), VALUE._col3 (type: float), KEY._col4 (type: tinyint)
>                 outputColumnNames: _col0, _col1, _col2, _col3, _col4
>                 File Output Operator
>                   compressed: false
>                   Dp Sort State: PARTITION_SORTED
>                   Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE Column stats: NONE
>                   table:
>                       input format: org.apache.hadoop.mapred.TextInputFormat
>                       output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                       serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                       name: default.over1k_part
>         Reducer 3 
>             Execution mode: llap
>             Reduce Operator Tree:
>               Group By Operator
>                 aggregations: min(VALUE._col0), max(VALUE._col1), count(VALUE._col2), count(VALUE._col3), compute_bit_vector_hll(VALUE._col4), min(VALUE._col5), max(VALUE._col6), count(VALUE._col7), compute_bit_vector_hll(VALUE._col8), min(VALUE._col9), max(VALUE._col10), count(VALUE._col11), compute_bit_vector_hll(VALUE._col12), min(VALUE._col13), max(VALUE._col14), count(VALUE._col15), compute_bit_vector_hll(VALUE._col16)
>                 keys: KEY._col0 (type: string), KEY._col1 (type: tinyint)
>                 mode: mergepartial
>                 outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18
>                 Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE Column stats: NONE
>                 Select Operator
>                   expressions: 'LONG' (type: string), UDFToLong(_col2) (type: bigint), UDFToLong(_col3) (type: bigint), (_col4 - _col5) (type: bigint), COALESCE(ndv_compute_bit_vector(_col6),0) (type: bigint), _col6 (type: binary), 'LONG' (type: string), UDFToLong(_col7) (type: bigint), UDFToLong(_col8) (type: bigint), (_col4 - _col9) (type: bigint), COALESCE(ndv_compute_bit_vector(_col10),0) (type: bigint), _col10 (type: binary), 'LONG' (type: string), _col11 (type: bigint), _col12 (type: bigint), (_col4 - _col13) (type: bigint), COALESCE(ndv_compute_bit_vector(_col14),0) (type: bigint), _col14 (type: binary), 'DOUBLE' (type: string), UDFToDouble(_col15) (type: double), UDFToDouble(_col16) (type: double), (_col4 - _col17) (type: bigint), COALESCE(ndv_compute_bit_vector(_col18),0) (type: bigint), _col18 (type: binary), _col0 (type: string), _col1 (type: tinyint)
>                   outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25
>                   Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE Column stats: NONE
>                   File Output Operator
>                     compressed: false
>                     Statistics: Num rows: 1 Data size: 24 Basic stats: COMPLETE Column stats: NONE
>                     table:
>                         input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                         output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                         serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-2
>     Dependency Collection
>   Stage: Stage-0
>     Move Operator
>       tables:
>           partition:
>             ds foo
>             t 
>           replace: true
>           table:
>               input format: org.apache.hadoop.mapred.TextInputFormat
>               output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>               serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>               name: default.over1k_part
>   Stage: Stage-3
>     Stats Work
>       Basic Stats Work:
>       Column Stats Desc:
>           Columns: si, i, b, f
>           Column Types: smallint, int, bigint, float
>           Table: default.over1k_part
> {code}
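A reading aid for the two Reduce Output Operators in the plan above: in Hive EXPLAIN output, {{sort order}} and {{null sort order}} carry one character per key expression, where {{+}}/{{-}} mean ascending/descending and {{a}}/{{z}} mean nulls first/nulls last. The Java sketch below decodes those strings. The class and method names ({{RsKeyOrderDecoder}}, {{decode}}) are hypothetical illustrations, not part of Hive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reading aid, not a Hive class: decodes the "sort order" and
// "null sort order" strings printed for a Reduce Output Operator.
public class RsKeyOrderDecoder {

    // '+' = ascending, '-' = descending
    static String direction(char c) {
        switch (c) {
            case '+': return "ASC";
            case '-': return "DESC";
            default: throw new IllegalArgumentException("unknown sort order character: " + c);
        }
    }

    // 'a' = nulls first, 'z' = nulls last
    static String nullPlacement(char c) {
        switch (c) {
            case 'a': return "NULLS FIRST";
            case 'z': return "NULLS LAST";
            default: throw new IllegalArgumentException("unknown null sort order character: " + c);
        }
    }

    // Returns one "ASC|DESC NULLS FIRST|LAST" entry per key expression.
    static List<String> decode(String sortOrder, String nullSortOrder) {
        if (sortOrder.length() != nullSortOrder.length()) {
            throw new IllegalArgumentException("expected one null sort order character per key: '"
                    + sortOrder + "' vs '" + nullSortOrder + "'");
        }
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < sortOrder.length(); i++) {
            keys.add(direction(sortOrder.charAt(i)) + " " + nullPlacement(nullSortOrder.charAt(i)));
        }
        return keys;
    }

    public static void main(String[] args) {
        // Map 1 RS feeding Reducer 2 (key: _col4, i.e. t)
        System.out.println(decode("+", "a"));   // prints [ASC NULLS FIRST]
        // Map 1 RS feeding Reducer 3 (keys: _col0 = ds, _col1 = t)
        System.out.println(decode("++", "zz")); // prints [ASC NULLS LAST, ASC NULLS LAST]
    }
}
```

Decoded this way, the statistics shuffle already follows the nulls-last default, while the sorted dynamic partition shuffle does not.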
> Focus on the Reduce Output (RS) operator in Map 1 that feeds Reducer 2:
> {code}
>                       Reduce Output Operator
>                         key expressions: _col4 (type: tinyint)
>                         null sort order: a
>                         sort order: +
> {code}
> Since {{hive.default.nulls.last}} is {{true}}, the {{null sort order}} should be {{z}} (nulls last) instead of {{a}} (nulls first).
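A minimal sketch of the expected behavior. It assumes that a reduce sink key without an explicit NULLS FIRST/LAST clause should take its null placement from {{hive.default.nulls.last}}. Under the default {{true}}, the single key on {{t}} should then carry {{z}}, and rows passing {{t is null or t=27}} would reach the reducer with the null partition value last. The helper names are hypothetical, the input values are illustrative, and {{java.util.Comparator.nullsFirst}}/{{nullsLast}} stand in for Hive's key comparison. This is not Hive's implementation or the change in the pull request.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not Hive code: derives the expected null sort order
// from hive.default.nulls.last and shows the resulting ascending key order.
public class NullsLastSketch {

    // One null sort order character per key: 'z' (nulls last) or 'a' (nulls first).
    static String expectedNullSortOrder(boolean defaultNullsLast, int keyCount) {
        return (defaultNullsLast ? "z" : "a").repeat(keyCount);
    }

    // Sorts tinyint partition values ascending with the requested null placement.
    static List<Byte> sortAscending(List<Byte> values, boolean nullsLast) {
        Comparator<Byte> asc = Comparator.naturalOrder();
        Comparator<Byte> order = nullsLast ? Comparator.nullsLast(asc) : Comparator.nullsFirst(asc);
        List<Byte> sorted = new ArrayList<>(values);
        sorted.sort(order);
        return sorted;
    }

    public static void main(String[] args) {
        boolean defaultNullsLast = true; // default value of hive.default.nulls.last
        // Illustrative values of t that pass the filter "t is null or t=27"
        List<Byte> t = Arrays.asList((byte) 27, null, (byte) 27);

        System.out.println(expectedNullSortOrder(defaultNullsLast, 1)); // prints z (the plan shows a)
        System.out.println(sortAscending(t, defaultNullsLast));         // prints [27, 27, null]
        System.out.println(sortAscending(t, false));                    // prints [null, 27, 27]
    }
}
```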



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
