[ 
https://issues.apache.org/jira/browse/HIVE-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14304497#comment-14304497
 ] 

Ashutosh Chauhan commented on HIVE-9560:
----------------------------------------

[~prasanth_j] what you explained makes sense, but I think it is fair for a 
user to run the analyze command without noscan/partialscan and expect it to 
work. I would consider resetting rawDataSize to 0, as Hive does today, a bug. 
I think we should check in the analyze statement whether it is for an ORC 
table and, in such cases, automatically do a noscan analyze (which will also 
be more performant). 
Thoughts?
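
To make the suggestion concrete, here is a minimal sketch of the rule in 
Python. It is not Hive code: the function name and the mode strings are 
hypothetical and only illustrate the proposed decision, and matching on the 
ORC input format class name is an assumption about how the table would be 
recognized.

```python
# Hypothetical sketch of the proposed rule; these names do not exist in Hive.
# Assumption: an ORC table is recognized by its input format class name.
ORC_INPUT_FORMAT = "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"

VALID_MODES = ("full", "noscan", "partialscan")


def effective_analyze_mode(input_format, requested_mode="full"):
    """Return the scan mode that ANALYZE would use under the proposal.

    requested_mode is "full" for a plain 'compute statistics', or
    "noscan" / "partialscan" when the user asked for one explicitly.
    """
    if requested_mode not in VALID_MODES:
        raise ValueError("unknown analyze mode: %r" % (requested_mode,))
    # Only a plain analyze on an ORC table is rewritten to noscan.
    # Explicit noscan/partialscan requests and non-ORC tables are unchanged.
    if requested_mode == "full" and input_format == ORC_INPUT_FORMAT:
        return "noscan"
    return requested_mode
```

With this rule a plain analyze on an ORC table would be treated as noscan, 
while other tables and explicitly requested modes keep their behavior.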

> When hive.stats.collect.rawdatasize=true, 'rawDataSize' for an ORC table will 
> result in value '0' after running 'analyze table TABLE_NAME compute 
> statistics;'
> --------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-9560
>                 URL: https://issues.apache.org/jira/browse/HIVE-9560
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Xin Hao
>
> When hive.stats.collect.rawdatasize=true, 'rawDataSize' for an ORC table will 
> result in value '0' after running 'analyze table TABLE_NAME compute 
> statistics;'
> Steps to reproduce:
> (1) set hive.stats.collect.rawdatasize=true;
> (2) Generate an ORC table in Hive whose 'rawDataSize' is NOT zero.
> (3) Confirm the value of 'rawDataSize' (NOT zero) by executing 'describe 
> extended TABLE_NAME;'
> (4) Execute 'analyze table TABLE_NAME compute statistics;'
> (5) Execute 'describe extended TABLE_NAME;' again; the value of 
> 'rawDataSize' will have changed to '0'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
