[ 
https://issues.apache.org/jira/browse/HIVE-9447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941569#comment-14941569
 ] 

Ashutosh Chauhan commented on HIVE-9447:
----------------------------------------

This patch needs more work. First, it needs to be rebased on master. Second, it 
executes the query outside of Transaction, which needs to be fixed.

> Metastore: inefficient Oracle query for removing unused column descriptors 
> when add/drop table/partition
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-9447
>                 URL: https://issues.apache.org/jira/browse/HIVE-9447
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>    Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0
>            Reporter: Selina Zhang
>            Assignee: Selina Zhang
>         Attachments: HIVE-9447.1.patch, HIVE-9447.2.patch
>
>   Original Estimate: 3h
>  Remaining Estimate: 3h
>
> Metastore needs removing unused column descriptors when drop/add partitions 
> or tables. For query the unused column descriptor, the current implementation 
> utilizes datanuleus' range function, which basically equals LIMIT syntax. 
> However, Oracle does not support LIMIT, the query is converted as  
> {quote}
> SQL> SELECT * FROM (SELECT subq.*,ROWNUM rn FROM (SELECT
> 'org.apache.hadoop.hive.metastore.model.MStorageDescriptor' AS
> NUCLEUS_TYPE,A0.INPUT_FORMAT,A0.IS_COMPRESSED,A0.IS_STOREDASSUBDIRECTORIES,A0.LOCATION,
> A0.NUM_BUCKETS,A0.OUTPUT_FORMAT,A0.SD_ID FROM drhcat.SDS A0 
> WHERE A0.CD_ID = ? ) subq ) WHERE  rn <= 1;
> {quote}
> Given that CD_ID is not very selective, this query may have to access large 
> amount of rows (depends how many partitions the table has, millions of rows 
> in our case). Metastore may become unresponsive because of this. 
> Since Metastore only needs to know if the specific CD_ID is referenced in SDS 
> table and does not need access the whole row. We can use 
> {quote}
> select count(1) from SDS where SDS.CD_ID=?
> {quote}
> CD_ID is index column, the above query will do range scan for index, which is 
> faster. 
> For other DBs support LIMIT syntax such as MySQL, this problem does not 
> exist. However, the new query does not hurt.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to