Speeding up a pretty simple correlated update query

Hank Wed, 02 Sep 2009 16:35:30 -0700

Hello All,
  I have a legacy application which was written using a compound primary key
of an item number (non unique) along with a category ID. The combination of
the item number and category ID make the records unique.


  I am in the process of replacing the compound (VARCHAR) keys with an
unique integer key in these tables.

So I have created an item_seq table and assigned a unique sequence number to
each compound key -- it looks like this (all tables are myisam tables, and
mysql version 5.0)

desc item_seq;
+-----------+------------------+------+-----+---------+----------------+
| Field     | Type             | Null | Key | Default | Extra          |
+-----------+------------------+------+-----+---------+----------------+
| seq       | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| itemid    | char(11)         | NO   | MUL |         |                |
| category  | char(4)          | NO   |     |         |                |
+-----------+------------------+------+-----+---------+----------------+

I also have my main transactional table with about 180,000,000 rows -- it
looks like this:

desc item_trans;
+-------------+-----------------------+------+-----+---------------------+-------+
| Field       | Type                  | Null | Key | Default             |
Extra |
+-------------+-----------------------+------+-----+---------------------+-------+
| seq         | int(10) unsigned      | NO   | MUL |                     |
    |
| itemid      | char(11)              | NO   | PRI |                     |
    |
| category    | char(4)               | NO   | PRI |                     |
    |
| transid     | int(10)               | NO   | PRI |                     |
    |

Currently the "seq" field is null for the entire table.  So of course, I
want to update the main transaction table with the new sequence number.

So I've disabled all the keys on the "item_trans" table -- since I am
updating every row, it wouldn't (shouldn't) be using the index anyway.

Here is my correlated update query:

 update item_trans i, item_seq is
 set i.seq=is.seq
 where is.itemid=i.itemid and is.category=i.category;

  If I run an explain on the select version of the update, this is what I
get:

+----+-------------+----------+--------+---------------+--------+---------+------------------------------------------------+-----------+-------+
| id | select_type | table    | type   | possible_keys | key    | key_len |
ref                                            | rows      | Extra |
+----+-------------+----------+--------+---------------+--------+---------+------------------------------------------------+-----------+-------+
|  1 | SIMPLE      | item_trans    | ALL    | PRIMARY       | NULL   | NULL
   | NULL                                           | 178948797 |       |
|  1 | SIMPLE      | item_seq | eq_ref | itemid        | itemid | 20      |
g.item_trans.itemid,g.item_trans.category                |         1 |
|
+----+-------------+----------+--------+---------------+--------+---------+------------------------------------------------+-----------+-------+

... which is exactly what I would expect it to do.  Update every record of
the item_trans table, and do a full index lookup on the items_seq table.

SO... I've been running this query to update item_trans, and it's been
running for 5 days now.

I've also tried running this with the primary key index on the item_trans
table (but not the seq index), and that ran slower in my initial tests.

Are there any faster ways to update 180 million records with a correlated
update query?  And I'm fairly certain that trying to do this in PHP
one-record at a time would take much longer than a SQL solution.

Thanks,

-Hank

Speeding up a pretty simple correlated update query

Reply via email to