[ https://issues.apache.org/jira/browse/HIVE-9447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292449#comment-14292449 ]
Selina Zhang commented on HIVE-9447: ------------------------------------ The unit test failures seem irrelevant to this patch. > Metastore: inefficient Oracle query for removing unused column descriptors > when add/drop table/partition > -------------------------------------------------------------------------------------------------------- > > Key: HIVE-9447 > URL: https://issues.apache.org/jira/browse/HIVE-9447 > Project: Hive > Issue Type: Improvement > Components: Metastore > Affects Versions: 0.14.0 > Reporter: Selina Zhang > Assignee: Selina Zhang > Attachments: HIVE-9447.1.patch > > Original Estimate: 3h > Remaining Estimate: 3h > > Metastore needs removing unused column descriptors when drop/add partitions > or tables. For query the unused column descriptor, the current implementation > utilizes datanuleus' range function, which basically equals LIMIT syntax. > However, Oracle does not support LIMIT, the query is converted as > {quote} > SQL> SELECT * FROM (SELECT subq.*,ROWNUM rn FROM (SELECT > 'org.apache.hadoop.hive.metastore.model.MStorageDescriptor' AS > NUCLEUS_TYPE,A0.INPUT_FORMAT,A0.IS_COMPRESSED,A0.IS_STOREDASSUBDIRECTORIES,A0.LOCATION, > A0.NUM_BUCKETS,A0.OUTPUT_FORMAT,A0.SD_ID FROM drhcat.SDS A0 > WHERE A0.CD_ID = ? ) subq ) WHERE rn <= 1; > {quote} > Given that CD_ID is not very selective, this query may have to access large > amount of rows (depends how many partitions the table has, millions of rows > in our case). Metastore may become unresponsive because of this. > Since Metastore only needs to know if the specific CD_ID is referenced in SDS > table and does not need access the whole row. We can use > {quote} > select count(1) from SDS where SDS.CD_ID=? > {quote} > CD_ID is index column, the above query will do range scan for index, which is > faster. > For other DBs support LIMIT syntax such as MySQL, this problem does not > exist. However, the new query does not hurt. -- This message was sent by Atlassian JIRA (v6.3.4#6332)