Looking at the failed Jenkins runs for HIVE-5998, I see diffs in the
statistics in the EXPLAIN output of vectorized_parquet.q:
Running: diff -a
/root/hive/itests/qtest/../../itests/qtest/target/qfile-results/clientpositive/vectorized_parquet.q.out
/root/hive/itests/qtest/../../ql/src/test/results/clientpositive/vectorized_parquet.q.out
72c72
< Statistics: Num rows: 12288 Data size: 73728 Basic stats: COMPLETE Column stats: NONE
---
> Statistics: Num rows: 2072 Data size: 257046 Basic stats: COMPLETE Column stats: NONE
75c75
< Statistics: Num rows: 6144 Data size: 36864 Basic stats: COMPLETE Column stats: NONE
---
> Statistics: Num rows: 1036 Data size: 128523 Basic stats: COMPLETE Column stats: NONE
79c79
< Statistics: Num rows: 6144 Data size: 36864 Basic stats: COMPLETE Column stats: NONE
---
> Statistics: Num rows: 1036 Data size: 128523 Basic stats: COMPLETE Column stats: NONE
82c82
< Statistics: Num rows: 10 Data size: 60 Basic stats: COMPLETE Column stats: NONE
---
> Statistics: Num rows: 10 Data size: 1240 Basic stats: COMPLETE Column stats: NONE
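For what it's worth, dividing Data size by Num rows on each Statistics line in the diff above gives a roughly constant per-row size on each side. Below is a small sketch of that arithmetic in Python. The numbers are copied from the diff; the names actual, expected and bytes_per_row are made up for illustration, and I am assuming "<" is the freshly generated output and ">" is the checked-in .q.out, as the order of the diff arguments suggests.

```python
# (num_rows, data_size) pairs copied from the Statistics lines in the diff.
# Assumption: "<" lines are the generated output, ">" lines the golden file.
actual = [(12288, 73728), (6144, 36864), (6144, 36864), (10, 60)]
expected = [(2072, 257046), (1036, 128523), (1036, 128523), (10, 1240)]


def bytes_per_row(num_rows, data_size):
    """Average data size per row implied by one Statistics line."""
    return data_size / num_rows


for (a_rows, a_size), (e_rows, e_size) in zip(actual, expected):
    print(
        f"actual: {bytes_per_row(a_rows, a_size):.2f} bytes/row, "
        f"expected: {bytes_per_row(e_rows, e_size):.2f} bytes/row"
    )
```

Every "<" line works out to exactly 6 bytes per row, while every ">" line works out to about 124 bytes per row, so the two runs disagree on both the row count and the per-row size estimate.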
What would cause such statistics diffs? The Parquet table is created as:
create table if not exists alltypes_parquet (
  cint int,
  ctinyint tinyint,
  csmallint smallint,
  cfloat float,
  cdouble double,
  cstring1 string) stored as parquet;
insert overwrite table alltypes_parquet
  select cint,
    ctinyint,
    csmallint,
    cfloat,
    cdouble,
    cstring1
  from alltypesorc;
Note that there are no diffs in the actual query results.
Thanks,
~Remus