[ 
https://issues.apache.org/jira/browse/HIVE-20246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alice Fan updated HIVE-20246:
-----------------------------
    Description: 
By default, Hive collects stats when running operations such as alter 
partitioned table, alter unpartitioned table, create table, and create 
external table. However, collecting stats requires the Metastore to list all 
files under the table directory, and this file listing can be very expensive, 
particularly on filesystems like S3.

This Jira aims to introduce the DO_NOT_UPDATE_STATS table property into the 
above operations, giving users a configurable option to stop collecting stats 
at the table level. For example, after running 'Alter Table S3_Table set 
tblproperties('DO_NOT_UPDATE_STATS'='TRUE');' the Metastore should stop 
collecting stats for the specified S3_Table.
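
The intended behavior can be sketched roughly as follows. This is a minimal 
illustration in Python, not the Metastore's actual code: the function names 
and the list_files callback are hypothetical, and treating the property value 
as case-insensitive is an assumption. Only the DO_NOT_UPDATE_STATS property 
name and the basic stats (file count and total size) come from the 
description above.

```python
from typing import Callable, Dict, List, Optional

# Table property named in this issue.
DO_NOT_UPDATE_STATS = "DO_NOT_UPDATE_STATS"


def stats_collection_disabled(tbl_params: Optional[Dict[str, str]]) -> bool:
    """Return True when the table opts out of stats collection.

    Assumption: the value is compared case-insensitively, so 'TRUE' and
    'true' both disable collection.
    """
    if not tbl_params:
        return False
    return tbl_params.get(DO_NOT_UPDATE_STATS, "").strip().lower() == "true"


def maybe_collect_stats(
    tbl_params: Optional[Dict[str, str]],
    list_files: Callable[[], List[int]],
) -> Optional[Dict[str, int]]:
    """Collect basic stats unless the table property disables it.

    list_files is a hypothetical stand-in for the expensive directory
    listing; it returns the size in bytes of each file under the table
    directory. It is never called when collection is disabled.
    """
    if stats_collection_disabled(tbl_params):
        return None  # skip the file listing entirely
    sizes = list_files()
    return {"numFiles": len(sizes), "totalSize": sum(sizes)}
```

The point of checking the property first is that the listing callback is never 
invoked for a table that has opted out, which is where the cost on S3 would 
otherwise be paid.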



  was:
When hive.stats.autogather=true, the Metastore lists all files under the 
table directory to populate basic stats like file counts and sizes. This file 
listing operation can be very expensive, particularly on filesystems like S3.
One way to address this issue is to set hive.stats.autogather=false.
However, running 'set metaconf:hive.stats.autogather=false' in a user session 
is not picked up by HiveMetaStore.


        Summary: Configurable collecting stats by using DO_NOT_UPDATE_STATS 
table property  (was: Make some collect stats flags be user configurable)

> Configurable collecting stats by using DO_NOT_UPDATE_STATS table property
> -------------------------------------------------------------------------
>
>                 Key: HIVE-20246
>                 URL: https://issues.apache.org/jira/browse/HIVE-20246
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>            Reporter: Alice Fan
>            Assignee: Alice Fan
>            Priority: Minor
>             Fix For: 4.0.0
>
>
> By default, Hive collects stats when running operations such as alter 
> partitioned table, alter unpartitioned table, create table, and create 
> external table. However, collecting stats requires the Metastore to list all 
> files under the table directory, and this file listing can be very 
> expensive, particularly on filesystems like S3.
> This Jira aims to introduce the DO_NOT_UPDATE_STATS table property into the 
> above operations, giving users a configurable option to stop collecting 
> stats at the table level. For example, after running 'Alter Table S3_Table 
> set tblproperties('DO_NOT_UPDATE_STATS'='TRUE');' the Metastore should stop 
> collecting stats for the specified S3_Table.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
