[ https://issues.apache.org/jira/browse/HIVE-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14304497#comment-14304497 ]
Ashutosh Chauhan commented on HIVE-9560:
----------------------------------------

[~prasanth_j] What you explained makes sense, but I think it is fair for a user to run the analyze command without noscan/partialscan and expect it to work. I would consider resetting rawDataSize to 0, as Hive does today, a bug. I think the analyze statement should check whether its target is an ORC table and, in that case, automatically do a noscan analyze (which will also be more performant). Thoughts?

> When hive.stats.collect.rawdatasize=true, 'rawDataSize' for an ORC table will
> result in value '0' after running 'analyze table TABLE_NAME compute
> statistics;'
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-9560
>                 URL: https://issues.apache.org/jira/browse/HIVE-9560
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Xin Hao
>
> When hive.stats.collect.rawdatasize=true, 'rawDataSize' for an ORC table will
> result in value '0' after running 'analyze table TABLE_NAME compute
> statistics;'
> Reproduce steps:
> (1) set hive.stats.collect.rawdatasize=true;
> (2) Generate an ORC table in Hive whose 'rawDataSize' is NOT zero.
> (3) Confirm the value of 'rawDataSize' (NOT zero) by executing 'describe
> extended TABLE_NAME;'
> (4) Execute 'analyze table TABLE_NAME compute statistics;'
> (5) Execute 'describe extended TABLE_NAME;' again, and you will find that
> the value of 'rawDataSize' has changed to '0'.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
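
The behavior proposed in the comment above (detect an ORC target in the analyze statement and fall back to a noscan analyze) could be sketched roughly as follows. This is only an illustration of the decision rule, not Hive's actual code: the class AnalyzeModeSketch, the enum ScanMode, and the method effectiveMode are all hypothetical names, and the storage format is assumed to be available as a plain string.

```java
import java.util.Locale;

// Hypothetical sketch of the proposal; none of these names exist in Hive.
final class AnalyzeModeSketch {

    // The three ways the analyze statement can be issued in this thread:
    // plain (full scan), with noscan, or with partialscan.
    public enum ScanMode { FULL_SCAN, NOSCAN, PARTIALSCAN }

    // Returns the mode that would actually be run under the proposal.
    // Assumption: storageFormat is a simple name such as "ORC" or "TEXTFILE".
    public static ScanMode effectiveMode(String storageFormat, ScanMode requested) {
        boolean isOrc = storageFormat != null
                && storageFormat.toUpperCase(Locale.ROOT).equals("ORC");
        // Only a plain analyze on an ORC table is rewritten; an explicit
        // noscan/partialscan request and all non-ORC tables are left alone.
        if (isOrc && requested == ScanMode.FULL_SCAN) {
            return ScanMode.NOSCAN;
        }
        return requested;
    }

    public static void main(String[] args) {
        System.out.println(effectiveMode("ORC", ScanMode.FULL_SCAN));
        System.out.println(effectiveMode("TEXTFILE", ScanMode.FULL_SCAN));
    }
}
```

The point of the rule is that a user who runs the plain analyze command on an ORC table never reaches the code path that resets rawDataSize to 0; whether noscan is the right fallback is exactly the question the comment leaves open.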