Gopal V created HIVE-19790:
------------------------------

             Summary: Metastore upgrade: 3.1.0 upgrade script is slow and 
non-idempotent
                 Key: HIVE-19790
                 URL: https://issues.apache.org/jira/browse/HIVE-19790
             Project: Hive
          Issue Type: Bug
          Components: Standalone Metastore
            Reporter: Gopal V


Because of the giant bit-vectors stored on mysql, the update of PART_COL_STATS 
is very slow and also is not idempotent.

{code}
--------------
UPDATE `PART_COL_STATS`
  SET `CAT_NAME` = 'hive'
--------------

Query OK, 0 rows affected (4 min 1.57 sec)
Rows matched: 778025  Changed: 0  Warnings: 0
{code}

Adding a filter speeds it up because it will no longer overwrite 

{code}
mysql> explain UPDATE `PART_COL_STATS` SET `CAT_NAME` = 'hive' where `CAT_NAME` 
='';
--------------
explain UPDATE `PART_COL_STATS` SET `CAT_NAME` = 'hive' where `CAT_NAME` =''
--------------

+----+-------------+----------------+-------+---------------+---------------+---------+-------+------+------------------------------+
| id | select_type | table          | type  | possible_keys | key           | 
key_len | ref   | rows | Extra                        |
+----+-------------+----------------+-------+---------------+---------------+---------+-------+------+------------------------------+
|  1 | SIMPLE      | PART_COL_STATS | range | PCS_STATS_IDX | PCS_STATS_IDX | 
258     | const |    1 | Using where; Using temporary |
+----+-------------+----------------+-------+---------------+---------------+---------+-------+------+------------------------------+
1 row in set (0.00 sec)
{code}

this would be much faster to re-run and would not accidentally overwrite any 
existing CAT_NAMEs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to