Vineet Garg created HIVE-20757:
----------------------------------
Summary: Autogather stats doesn't work when SDPO (sort dynamic partition optimization) is ON
Key: HIVE-20757
URL: https://issues.apache.org/jira/browse/HIVE-20757
Project: Hive
Issue Type: Bug
Components: Statistics
Affects Versions: 4.0.0
Reporter: Vineet Garg
*Reproducer*
{code:sql}
set hive.optimize.sort.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.stats.autogather=true;
create table t11(i int, j int) partitioned by (s string);
insert into t11 partition(s) values(3,4, 'p1'),(4,5, 'p2'),(6,9,'p3');
hive> desc formatted t11 j;
OK
col_name j
data_type int
min
max
num_nulls
distinct_count
avg_col_len
max_col_len
num_trues
num_falses
bitVector
comment from deserializer
COLUMN_STATS_ACCURATE {}
{code}
{code:sql}
hive> explain insert into t11 partition(s) values(3,4, 'p1'),(4,5, 'p2'),(6,9,'p3');
STAGE PLANS:
  Stage: Stage-1
    Tez
      DagId: vgarg_20181016113701_f3aa9f8f-b38b-47a8-8149-b5521bf072f6:13
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
      DagName: vgarg_20181016113701_f3aa9f8f-b38b-47a8-8149-b5521bf072f6:13
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: _dummy_table
                  Row Limit Per Split: 1
                  Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE Column stats: COMPLETE
                  Select Operator
                    expressions: array(const struct(3,4,'p1'),const struct(4,5,'p2'),const struct(6,9,'p3')) (type: array<struct<col1:int,col2:int,col3:string>>)
                    outputColumnNames: _col0
                    Statistics: Num rows: 1 Data size: 64 Basic stats: COMPLETE Column stats: COMPLETE
                    UDTF Operator
                      Statistics: Num rows: 1 Data size: 64 Basic stats: COMPLETE Column stats: COMPLETE
                      function name: inline
                      Select Operator
                        expressions: col1 (type: int), col2 (type: int), col3 (type: string)
                        outputColumnNames: _col0, _col1, _col2
                        Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                        Reduce Output Operator
                          key expressions: _col2 (type: string)
                          sort order: +
                          Map-reduce partition columns: _col2 (type: string)
                          Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                          value expressions: _col0 (type: int), _col1 (type: int)
        Reducer 2
            Execution mode: vectorized
            Reduce Operator Tree:
              Select Operator
                expressions: VALUE._col0 (type: int), VALUE._col1 (type: int), KEY._col2 (type: string)
                outputColumnNames: _col0, _col1, _col2
                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                File Output Operator
                  compressed: false
                  Dp Sort State: PARTITION_SORTED
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                  table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                      name: default.t11
  Stage: Stage-2
    Dependency Collection
  Stage: Stage-0
    Move Operator
      tables:
          partition:
            s
          replace: false
          table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: default.t11
  Stage: Stage-3
    Stats Work
      Basic Stats Work:
      Column Stats Desc:
          Columns: i, j
          Column Types: int, int
          Table: default.t11
{code}
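Notice that the explain plan is missing the autogather stats branch: Stage-3 lists a Column Stats Desc for i and j, but no operator in Stage-1 computes the column statistics. This matches the empty column stats in the desc formatted output above.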
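A sketch for narrowing this down, assuming the missing branch is tied to SDPO as the summary suggests: run the same explain with the optimization turned off and compare the two plans, then gather column stats explicitly as a possible stopgap. The table name t12 is hypothetical (a fresh table with the same schema as t11, so earlier inserts do not affect the comparison). The settings and the ANALYZE statement are standard Hive, but the outcome of this comparison has not been verified here.

{code:sql}
-- Same setup as the reproducer, but with SDPO disabled for comparison.
set hive.optimize.sort.dynamic.partition=false;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.stats.autogather=true;

-- t12 is a hypothetical fresh table with the same schema as t11.
create table t12(i int, j int) partitioned by (s string);

-- Compare this plan with the SDPO=true plan above and look for
-- an additional stats-gathering branch in Stage-1.
explain insert into t12 partition(s) values(3,4, 'p1'),(4,5, 'p2'),(6,9,'p3');

-- Possible stopgap while SDPO stays on: compute column stats explicitly
-- for all partitions of t11, then re-check column j.
analyze table t11 partition(s) compute statistics for columns;
desc formatted t11 j;
{code}

If the plan with SDPO disabled contains a stats-gathering branch while the plan above does not, that would point at the sort dynamic partition optimization as the place where the branch is dropped.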