I routinely see people requesting information on how to efficiently
handle *large* tables in mysql. Hopefully, the following will be helpful to those individuals.


I  have several  tables that  are 500M to 4B rows in size and
require  me  to  drop  and  re-create indices  at  regular  intervals.
Usually,  I'm removing  about 10%  of  the existing  records and  then
adding about 15% new records  back in.  Doing this with indices turned
on  would  take  days if  not  weeks.   With  the proper  sequence  of
operations, however, I can accomplish this in about 24 hours.


For reference, my database server consists of a dual processor machine
with a 3Ware 7xxx series controller. For what we are trying to do
here, processor speed is almost irrelevant when compared to disk I/O
speed. Tests with bonnie++ indicate we get ~130MB/s when using 8 disks configured for RAID 0+1. Note: RAID 5 write performance is 10x
slower than RAID 0+1 on these cards



For convenience, I will refer to my table as "Data" throughout this explanation.

Deleting Data
-------------
It turns out that copying only the desired portion of the data can be much more efficient than running a series of "DELETE FROM TABLE"
type of statements. In addition, it has the same effect as running "OPTIMIZE TABLE" which should be run anyway in this situation.


Removing unwanted records:
1) mysql> ALTER TABLE Data RENAME TO OldData;
*This can require up to 2x the size of Data.MYD in free disk space
2) mysql> CREATE TABLE Data <table spec>;
*Do not add indices at this point, we will add them later
3) mysql> INSERT INTO Data SELECT * FROM OldData WHERE <criteria for selection>;


At this point you can choose to drop "OldData" or keep it around until
you are confident that you did everything correctly so far.


Loading Data ------------- When possible, we load data using mysqlimport. This is much faster than connecting to the server and executing a series of INSERT statements.


Creating Indices ----------------- The following process works for adding or removing indices from large tables. It does, however, require direct access to the MYD/MYI files used by mysql which likely means you need root privileges on your database server.

This   process  was   given  to   me  by   an  old   and   wise  mysql
administrator/developer.  Hopefully, it will  someday make it into the
FAQ/HOW-TO section of the docs.

1) Flush the tables (force mysql to close all open file handles) by executing:

   % mysql -e "flush tables
   or
   % mysqladmin flush-tables

   If  possible, you  should  also consider  shutting  down the  mysql
   server at this point

2) Rename the data file (.MYD) by changing to the directory where your
   database tables are stored and executing the following command:

% mv Data.MYD Data.MYD.save

3) Truncate the data and index files so that mysql thinks the table
contains no data: (Restart the mysql server if you stopped it in step #1)


mysql> TRUNCATE TABLE Data;

   When  you execute  'TRUNCATE TABLE'  (or equivalent),  mysql simply
   deletes and recreates  the MYD and MYI files  for the table.  Since
   you've already removed  the MYD file as far  as mysql is concerned,
   this is very fast.

4) Now add your index(es) to the empty table:

mysql> ALTER TABLE ADD INDEX <index spec>;

5) This is the same as step  #1.  Either flush the tables or shut down
   mysql.   At this  point, you  should  have 4  files like  "Data.*":
   Data.MYD, Data.MYD.save, Data.MYI, and Data.frm

6) Restore the data file you renamed in step #2:

% mv Data.MYD.save Data.MYD

   This will  result in  a mismatch between  the data and  index files
   similar to what you might find if the table crashed.

7) Restore/repair the index file based on the data file:

% myisamchk -q -r <options> Data.MYI

Typical options I use:
-O key_buffer=128M -O sort_buffer=128M -O read_buffer=2M -O write_buffer=2M


   The above command will recreate  the index without copying the data
   file.  This means that the  index creation process does roughly 50%
   less work --  and it is the expensive part  of the process, writing
   the data to disk, that is skipped.

If you have any questions regarding this process, please feel free to contact me.

Kyle

***************************************************************************
Kyle J. Munn                            Email: [EMAIL PROTECTED]
EraGen Biosciences                      Phone: 608.662.9000 x351
http://www.EraGen.com                   Fax:   608.662.9008


-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]



Reply via email to