[ https://issues.apache.org/jira/browse/HIVE-6449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904009#comment-13904009 ]
Remus Rusanu commented on HIVE-6449:
------------------------------------
[~prasanth_j] thanks for the guidance. Since the difference reproduces on ORC
files, I am focusing on them for now to rule out any Parquet-related problem.
For my test ORC file, created as
{code}
CREATE TABLE decimal_mapjoin STORED AS ORC AS
SELECT cdouble, CAST (((cdouble*22.1)/37) AS DECIMAL(20,10)) AS cdecimal1,
CAST (((cdouble*9.3)/13) AS DECIMAL(23,14)) AS cdecimal2,
cint
FROM alltypesorc;
{code}
I get the following stats in describe extended:
{code}
describe extended decimal_mapjoin;
...
Windows: {numFiles=1, COLUMN_STATS_ACCURATE=true, transient_lastDdlTime=1392727196, numRows=0, totalSize=126087, rawDataSize=0}
Linux: {numFiles=1, transient_lastDdlTime=1392722507, COLUMN_STATS_ACCURATE=true, totalSize=126087, numRows=12288, rawDataSize=2165060}
...
{code}
So the problem is that on Windows neither ROW_COUNT nor RAW_DATA_SIZE is
initialized properly: numRows and rawDataSize are both 0, while numFiles and
totalSize match the Linux values. I'm investigating.
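For anyone who wants to repeat the comparison on other tables, here is a minimal sketch that diffs two table-parameter maps copied from describe extended. The helper names (parse_params, diff_params) are hypothetical and not part of Hive, and skipping transient_lastDdlTime assumes it is only a timestamp that legitimately differs between runs.

```python
import re

def parse_params(text):
    """Parse a '{k=v, k=v}' table-parameters string into a dict of strings."""
    body = text.strip().strip("{}")
    return dict(item.split("=", 1) for item in re.split(r",\s*", body) if item)

# Parameter maps copied from the describe extended output above.
windows = parse_params(
    "{numFiles=1, COLUMN_STATS_ACCURATE=true, transient_lastDdlTime=1392727196, "
    "numRows=0, totalSize=126087, rawDataSize=0}"
)
linux = parse_params(
    "{numFiles=1, transient_lastDdlTime=1392722507, COLUMN_STATS_ACCURATE=true, "
    "totalSize=126087, numRows=12288, rawDataSize=2165060}"
)

# Assumption: transient_lastDdlTime is a timestamp, so a difference is expected.
IGNORED = {"transient_lastDdlTime"}

def diff_params(a, b, ignored=IGNORED):
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ."""
    keys = sorted((set(a) | set(b)) - ignored)
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}

mismatches = diff_params(windows, linux)
for key, (win, lin) in mismatches.items():
    print(f"{key}: Windows={win} Linux={lin}")
```

With the two maps above, only numRows and rawDataSize are reported, which matches the observation that the row count and raw data size are the values missing on Windows.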
> EXPLAIN has diffs in Statistics in tests generated on Windows vs. test
> generated on Linux
> -----------------------------------------------------------------------------------------
>
> Key: HIVE-6449
> URL: https://issues.apache.org/jira/browse/HIVE-6449
> Project: Hive
> Issue Type: Bug
> Components: Tests
> Reporter: Remus Rusanu
> Assignee: Remus Rusanu
> Priority: Critical
>
> When .q.out files are generated on Windows, the statistics in EXPLAIN differ
> from the ones generated on Linux. For example:
> {code}
> Running: diff -a
> /root/hive/itests/qtest/../../itests/qtest/target/qfile-results/clientpositive/vectorized_parquet.q.out
>
> /root/hive/itests/qtest/../../ql/src/test/results/clientpositive/vectorized_parquet.q.out
> 72c72
> < Statistics: Num rows: 12288 Data size: 73728 Basic stats:
> COMPLETE Column stats: NONE
> ---
> > Statistics: Num rows: 2072 Data size: 257046 Basic stats:
> > COMPLETE Column stats: NONE
> 75c75
> < Statistics: Num rows: 6144 Data size: 36864 Basic stats:
> COMPLETE Column stats: NONE
> ---
> > Statistics: Num rows: 1036 Data size: 128523 Basic stats:
> > COMPLETE Column stats: NONE
> {code}