Looking at the failed Jenkins runs for HIVE-5998, I see diffs in the
statistics in the EXPLAIN output:

Running: diff -a /root/hive/itests/qtest/../../itests/qtest/target/qfile-results/clientpositive/vectorized_parquet.q.out /root/hive/itests/qtest/../../ql/src/test/results/clientpositive/vectorized_parquet.q.out
72c72
<             Statistics: Num rows: 12288 Data size: 73728 Basic stats: COMPLETE Column stats: NONE
---
>             Statistics: Num rows: 2072 Data size: 257046 Basic stats: COMPLETE Column stats: NONE
75c75
<               Statistics: Num rows: 6144 Data size: 36864 Basic stats: COMPLETE Column stats: NONE
---
>               Statistics: Num rows: 1036 Data size: 128523 Basic stats: COMPLETE Column stats: NONE
79c79
<                 Statistics: Num rows: 6144 Data size: 36864 Basic stats: COMPLETE Column stats: NONE
---
>                 Statistics: Num rows: 1036 Data size: 128523 Basic stats: COMPLETE Column stats: NONE
82c82
<                   Statistics: Num rows: 10 Data size: 60 Basic stats: COMPLETE Column stats: NONE
---
>                   Statistics: Num rows: 10 Data size: 1240 Basic stats: COMPLETE Column stats: NONE

What would cause such statistics diffs? The Parquet table is created and
populated as:

create table if not exists alltypes_parquet (
  cint int,
  ctinyint tinyint,
  csmallint smallint,
  cfloat float,
  cdouble double,
  cstring1 string) stored as parquet;

insert overwrite table alltypes_parquet
  select cint,
    ctinyint,
    csmallint,
    cfloat,
    cdouble,
    cstring1
  from alltypesorc;

Note that there are no diffs in the actual query results.
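
For anyone trying to reproduce this, the stored basic stats can be inspected
with something like the following. This is only a sketch: it assumes the
alltypes_parquet table created above, and it assumes (unconfirmed for this
test) that the numbers shown in EXPLAIN are derived from the table parameters
numRows, rawDataSize and totalSize.

```sql
-- Show the table parameters, including numRows, rawDataSize and totalSize
-- as they were recorded by the insert overwrite.
describe formatted alltypes_parquet;

-- Recompute basic stats explicitly, then check whether the values change.
analyze table alltypes_parquet compute statistics;
describe formatted alltypes_parquet;
```

If the values differ between the two describe outputs, or between machines,
that would at least narrow down where the two sets of numbers come from.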

Thanks,
~Remus
