[ 
https://issues.apache.org/jira/browse/HIVE-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14302891#comment-14302891
 ] 

Xin Hao commented on HIVE-9560:
-------------------------------

For example, we have an ORC table named 'item'.

(a) Before running 'analyze table item compute statistics;',
the 'rawDataSize' was '884720592'.

The result of 'describe extended item':
Detailed Table Information      Table(tableName:item, dbName:bigbenchorc, 
owner:root, createTime:1421984899, lastAccessTime:0, retention:0, 
sd:StorageDescriptor(cols:[FieldSchema(name:i_item_sk, type:bigint, 
comment:null), FieldSchema(name:i_item_id, type:string, comment:null), 
FieldSchema(name:i_rec_start_date, type:string, comment:null), 
FieldSchema(name:i_rec_end_date, type:string, comment:null), 
FieldSchema(name:i_item_desc, type:string, comment:null), 
FieldSchema(name:i_current_price, type:double, comment:null), 
FieldSchema(name:i_wholesale_cost, type:double, comment:null), 
FieldSchema(name:i_brand_id, type:int, comment:null), FieldSchema(name:i_brand, 
type:string, comment:null), FieldSchema(name:i_class_id, type:int, 
comment:null), FieldSchema(name:i_class, type:string, comment:null), 
FieldSchema(name:i_category_id, type:int, comment:null), 
FieldSchema(name:i_category, type:string, comment:null), 
FieldSchema(name:i_manufact_id, type:int, comment:null), 
FieldSchema(name:i_manufact, type:string, comment:null), 
FieldSchema(name:i_size, type:string, comment:null), 
FieldSchema(name:i_formulation, type:string, comment:null), 
FieldSchema(name:i_color, type:string, comment:null), FieldSchema(name:i_units, 
type:string, comment:null), FieldSchema(name:i_container, type:string, 
comment:null), FieldSchema(name:i_manager_id, type:int, comment:null), 
FieldSchema(name:i_product_name, type:string, comment:null)], 
location:hdfs://bhx1:8020/user/hive/warehouse/bigbenchorc.db/item, 
inputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat, 
outputFormat:org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat, 
compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
serializationLib:org.apache.hadoop.hive.ql.io.orc.OrcSerde, 
parameters:{serialization.format=1}), bucketCols:[], sortCols:[], 
parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], 
skewedColValueLocationMaps:{}), storedAsSubDirectories:false), 
partitionKeys:[], parameters:{numFiles=4, transient_lastDdlTime=1421984899, 
COLUMN_STATS_ACCURATE=true, totalSize=83267548, numRows=563518, 
rawDataSize=884720592}, viewOriginalText:null, viewExpandedText:null, 
tableType:MANAGED_TABLE)
Time taken: 0.527 seconds, Fetched: 24 row(s)

(b) After running 'analyze table item compute statistics;',
the 'rawDataSize' changes to '0'.

The result of 'describe extended item':
Detailed Table Information      Table(tableName:item, dbName:bigbenchorc, 
owner:root, createTime:1421984899, lastAccessTime:0, retention:0, 
sd:StorageDescriptor(cols:[FieldSchema(name:i_item_sk, type:bigint, 
comment:null), FieldSchema(name:i_item_id, type:string, comment:null), 
FieldSchema(name:i_rec_start_date, type:string, comment:null), 
FieldSchema(name:i_rec_end_date, type:string, comment:null), 
FieldSchema(name:i_item_desc, type:string, comment:null), 
FieldSchema(name:i_current_price, type:double, comment:null), 
FieldSchema(name:i_wholesale_cost, type:double, comment:null), 
FieldSchema(name:i_brand_id, type:int, comment:null), FieldSchema(name:i_brand, 
type:string, comment:null), FieldSchema(name:i_class_id, type:int, 
comment:null), FieldSchema(name:i_class, type:string, comment:null), 
FieldSchema(name:i_category_id, type:int, comment:null), 
FieldSchema(name:i_category, type:string, comment:null), 
FieldSchema(name:i_manufact_id, type:int, comment:null), 
FieldSchema(name:i_manufact, type:string, comment:null), 
FieldSchema(name:i_size, type:string, comment:null), 
FieldSchema(name:i_formulation, type:string, comment:null), 
FieldSchema(name:i_color, type:string, comment:null), FieldSchema(name:i_units, 
type:string, comment:null), FieldSchema(name:i_container, type:string, 
comment:null), FieldSchema(name:i_manager_id, type:int, comment:null), 
FieldSchema(name:i_product_name, type:string, comment:null)], 
location:hdfs://bhx1:8020/user/hive/warehouse/bigbenchorc.db/item, 
inputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat, 
outputFormat:org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat, 
compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
serializationLib:org.apache.hadoop.hive.ql.io.orc.OrcSerde, 
parameters:{serialization.format=1}), bucketCols:[], sortCols:[], 
parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], 
skewedColValueLocationMaps:{}), storedAsSubDirectories:false), 
partitionKeys:[], parameters:{numFiles=4, transient_lastDdlTime=1421984899, 
COLUMN_STATS_ACCURATE=true, totalSize=83267548, numRows=563518, 
rawDataSize=884720592}, viewOriginalText:null, viewExpandedText:null, 
tableType:MANAGED_TABLE)
Time taken: 0.527 seconds, Fetched: 24 row(s)

> When hive.stats.collect.rawdatasize=true, 'rawDataSize' for an ORC table will 
> result in value '0' after running 'analyze table TABLE_NAME compute 
> statistics;'
> --------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-9560
>                 URL: https://issues.apache.org/jira/browse/HIVE-9560
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Xin Hao
>
> When hive.stats.collect.rawdatasize=true, 'rawDataSize' for an ORC table will 
> result in value '0' after running 'analyze table TABLE_NAME compute 
> statistics;'
> Steps to reproduce:
> (1) set hive.stats.collect.rawdatasize=true;
> (2) Generate an ORC table in Hive whose 'rawDataSize' is NOT zero.
> (3) Check the value of 'rawDataSize' (NOT zero) by executing 'describe 
> extended TABLE_NAME;'
> (4) Execute 'analyze table TABLE_NAME compute statistics;'
> (5) Execute 'describe extended TABLE_NAME;' again; the value of 
> 'rawDataSize' will have changed to '0'.
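
The reproduction steps quoted above can be sketched as a single Hive session. This is only an illustration under stated assumptions: the table names 'item_text' (an existing non-ORC source table) and 'item_orc' are hypothetical and do not come from the report, and creating the ORC table via CREATE TABLE ... AS SELECT is just one possible way to get a table whose 'rawDataSize' is non-zero.

```sql
-- (1) Enable raw data size collection, as in the report.
set hive.stats.collect.rawdatasize=true;

-- (2) Create an ORC table. 'item_text' and 'item_orc' are hypothetical names.
create table item_orc stored as orc as select * from item_text;

-- (3) Note the value of rawDataSize in the table parameters (expected: NOT zero).
describe extended item_orc;

-- (4) Recompute table statistics.
analyze table item_orc compute statistics;

-- (5) Check rawDataSize again; per the report it is now 0.
describe extended item_orc;
```

Comparing the 'rawDataSize' entry in the 'parameters:{...}' section of the two 'describe extended' outputs shows whether the issue is present.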



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
