Selina Zhang created HIVE-9447:
----------------------------------
Summary: Metastore: inefficient Oracle query for removing unused
column descriptors when add/drop table/partition
Key: HIVE-9447
URL: https://issues.apache.org/jira/browse/HIVE-9447
Project: Hive
Issue Type: Improvement
Components: Metastore
Reporter: Selina Zhang
Assignee: Selina Zhang
Metastore needs removing unused column descriptors when drop/add partitions or
tables. For query the unused column descriptor, the current implementation
utilizes datanuleus' range function, which basically equals LIMIT syntax.
However, Oracle does not support LIMIT, the query is converted as
{quote}
SQL> SELECT * FROM (SELECT subq.*,ROWNUM rn FROM (SELECT
'org.apache.hadoop.hive.metastore.model.MStorageDescriptor' AS
NUCLEUS_TYPE,A0.INPUT_FORMAT,A0.IS_COMPRESSED,A0.IS_STOREDASSUBDIRECTORIES,A0.LOCATION,
A0.NUM_BUCKETS,A0.OUTPUT_FORMAT,A0.SD_ID FROM drhcat.SDS A0
WHERE A0.CD_ID = ? ) subq ) WHERE rn <= 1;
{quote}
Given that CD_ID is not very selective, this query may have to access large
amount of rows (depends how many partitions the table has, millions of rows in
our case). Metastore may become unresponsive because of this.
Since Metastore only needs to know if the specific CD_ID is referenced in SDS
table and does not need access the whole row. We can use
{quote}
select count(1) from SDS where SDS.CD_ID=?
{quote}
CD_ID is index column, the above query will do range scan for index, which is
faster.
For other DBs support LIMIT syntax such as MySQL, this problem does not exist.
However, the new query does not hurt.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)