[ 
https://issues.apache.org/jira/browse/HIVE-24870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-24870:
--------------------------------
    Summary: Metastore: cleanup unused column descriptors asynchronously in 
batches  (was: Metastore: cleanup unused column descriptors asynchronously)

> Metastore: cleanup unused column descriptors asynchronously in batches
> ----------------------------------------------------------------------
>
>                 Key: HIVE-24870
>                 URL: https://issues.apache.org/jira/browse/HIVE-24870
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>
> HIVE-2246 introduced CD_ID to optimize the metastore db (see that issue for 
> details). ObjectStore.removeUnusedColumnDescriptor is a maintenance task 
> that is called in every alter-partition type of operation. During 
> replication, alterPartition can be a heavy code path, and there is no direct 
> benefit to running removeUnusedColumnDescriptor immediately. Moreover, it 
> issues a query of this kind:
> {code}
> select count(*) from "SDS" where "CD_ID"=12345;
> {code}
> which can take a relatively long time compared to the alter partition 
> itself. The query as it appears in the code:
> {code}
>       query = pm.newQuery("select count(1) from " +
>         "org.apache.hadoop.hive.metastore.model.MStorageDescriptor " +
>         "where (this.cd == inCD)");
>       query.declareParameters("MColumnDescriptor inCD");
> {code}
> My proposal is to run this cleanup in batches, at a configurable interval 
> (seconds, minutes, or whatever unit fits).
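
To make the proposal concrete, here is a minimal sketch of what a batched, periodic cleanup could look like. It is not the actual patch: the class BatchedCdCleaner, its methods markCandidate and runOnce, and the two callbacks findReferenced and deleteUnused are hypothetical names used only for illustration. The callbacks stand in for the metastore queries (one lookup per batch instead of one count per CD_ID). A real implementation would also have to run the check and the delete in one transaction, so that a column descriptor that gets referenced again in between is not removed.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical sketch of a batched cleaner; not part of ObjectStore. */
public class BatchedCdCleaner {
  // CD_IDs that may have become unused. A set, so repeated alters of the
  // same partition do not queue the same id twice.
  private final Set<Long> pending = ConcurrentHashMap.newKeySet();
  private final int batchSize;
  // Assumption: given a batch of ids, returns those still referenced by a
  // storage descriptor (one query per batch).
  private final Function<List<Long>, Set<Long>> findReferenced;
  // Assumption: deletes the given unused column descriptors.
  private final Consumer<List<Long>> deleteUnused;

  public BatchedCdCleaner(int batchSize,
                          Function<List<Long>, Set<Long>> findReferenced,
                          Consumer<List<Long>> deleteUnused) {
    this.batchSize = batchSize;
    this.findReferenced = findReferenced;
    this.deleteUnused = deleteUnused;
  }

  /** Called from the alter-partition path instead of cleaning up inline. */
  public void markCandidate(long cdId) {
    pending.add(cdId);
  }

  /** Processes at most one batch; returns how many descriptors were deleted. */
  public int runOnce() {
    List<Long> batch = new ArrayList<>();
    Iterator<Long> it = pending.iterator();
    while (it.hasNext() && batch.size() < batchSize) {
      batch.add(it.next());
      it.remove();
    }
    if (batch.isEmpty()) {
      return 0;
    }
    Set<Long> referenced = findReferenced.apply(batch);
    List<Long> unused = new ArrayList<>();
    for (Long id : batch) {
      if (!referenced.contains(id)) {
        unused.add(id);
      }
    }
    if (!unused.isEmpty()) {
      deleteUnused.accept(unused);
    }
    return unused.size();
  }

  /** Runs runOnce() periodically; the period would come from configuration. */
  public ScheduledFuture<?> start(ScheduledExecutorService executor,
                                  long period, TimeUnit unit) {
    return executor.scheduleWithFixedDelay(() -> {
      try {
        runOnce();
      } catch (RuntimeException e) {
        // Swallow so one failed batch does not cancel later runs;
        // a real implementation would log this.
      }
    }, period, period, unit);
  }
}
```

With this shape, the alter-partition path only pays for an in-memory set insert, and the cost of the count-style lookup is moved to the background task and amortized over a batch.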



