[
https://issues.apache.org/jira/browse/HIVE-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14304497#comment-14304497
]
Ashutosh Chauhan commented on HIVE-9560:
----------------------------------------
[~prasanth_j] what you explained makes sense, but I think it is fair for a
user to run the analyze command without noscan/partialscan and expect it to
work. I would consider resetting rawDataSize to 0, as Hive does today, a bug.
I think the analyze statement should check whether it is running against an
ORC table and, in that case, automatically do a noscan analyze (which will
also be more performant).
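
For reference, a rough sketch of the sequence in question. The table name
orc_demo and the source table src are placeholders of mine, not from the
report, and the comments only restate what is described in this issue rather
than output I have captured:

```sql
-- Enable raw data size collection, as in the report.
SET hive.stats.collect.rawdatasize=true;

-- Create an ORC table (placeholder names; any existing source table will do).
CREATE TABLE orc_demo STORED AS ORC AS SELECT * FROM src;

-- Per the report, rawDataSize is non-zero at this point.
DESCRIBE EXTENDED orc_demo;

-- Plain analyze: per the report, this resets rawDataSize to 0.
ANALYZE TABLE orc_demo COMPUTE STATISTICS;
DESCRIBE EXTENDED orc_demo;

-- The variant I am suggesting Hive pick automatically for ORC tables.
ANALYZE TABLE orc_demo COMPUTE STATISTICS NOSCAN;
DESCRIBE EXTENDED orc_demo;
```

The last statement is what a user has to type by hand today; the proposal is
only that the plain form behave the same way when the table is ORC.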
Thoughts?
> When hive.stats.collect.rawdatasize=true, 'rawDataSize' for an ORC table will
> result in value '0' after running 'analyze table TABLE_NAME compute
> statistics;'
> --------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-9560
> URL: https://issues.apache.org/jira/browse/HIVE-9560
> Project: Hive
> Issue Type: Bug
> Reporter: Xin Hao
>
> When hive.stats.collect.rawdatasize=true, 'rawDataSize' for an ORC table will
> result in value '0' after running 'analyze table TABLE_NAME compute
> statistics;'
> Steps to reproduce:
> (1) set hive.stats.collect.rawdatasize=true;
> (2) Generate an ORC table in Hive whose 'rawDataSize' is NOT zero.
> (3) Confirm the non-zero value of 'rawDataSize' by executing 'describe
> extended TABLE_NAME;'
> (4) Execute 'analyze table TABLE_NAME compute statistics;'
> (5) Execute 'describe extended TABLE_NAME;' again; the value of
> 'rawDataSize' will have changed to '0'.