handle *large* tables in mysql. Hopefully, the following will be helpful to those individuals.
I have several tables that are 500M to 4B rows in size and require me to drop and re-create indices at regular intervals. Usually, I'm removing about 10% of the existing records and then adding about 15% new records back in. Doing this with indices turned on would take days if not weeks. With the proper sequence of operations, however, I can accomplish this in about 24 hours.
For reference, my database server consists of a dual processor machine
with a 3Ware 7xxx series controller. For what we are trying to do
here, processor speed is almost irrelevant when compared to disk I/O
speed. Tests with bonnie++ indicate we get ~130MB/s when using 8 disks configured for RAID 0+1. Note: RAID 5 write performance is 10x
slower than RAID 0+1 on these cards
For convenience, I will refer to my table as "Data" throughout this explanation.
Deleting Data
-------------
It turns out that copying only the desired portion of the data can be much more efficient than running a series of "DELETE FROM TABLE"
type of statements. In addition, it has the same effect as running "OPTIMIZE TABLE" which should be run anyway in this situation.
Removing unwanted records:
1) mysql> ALTER TABLE Data RENAME TO OldData;
*This can require up to 2x the size of Data.MYD in free disk space
2) mysql> CREATE TABLE Data <table spec>;
*Do not add indices at this point, we will add them later
3) mysql> INSERT INTO Data SELECT * FROM OldData WHERE <criteria for selection>;
At this point you can choose to drop "OldData" or keep it around until you are confident that you did everything correctly so far.
Loading Data ------------- When possible, we load data using mysqlimport. This is much faster than connecting to the server and executing a series of INSERT statements.
Creating Indices ----------------- The following process works for adding or removing indices from large tables. It does, however, require direct access to the MYD/MYI files used by mysql which likely means you need root privileges on your database server.
This process was given to me by an old and wise mysql administrator/developer. Hopefully, it will someday make it into the FAQ/HOW-TO section of the docs.
1) Flush the tables (force mysql to close all open file handles) by executing:
% mysql -e "flush tables or % mysqladmin flush-tables
If possible, you should also consider shutting down the mysql server at this point
2) Rename the data file (.MYD) by changing to the directory where your database tables are stored and executing the following command:
% mv Data.MYD Data.MYD.save
3) Truncate the data and index files so that mysql thinks the table
contains no data: (Restart the mysql server if you stopped it in step #1)
mysql> TRUNCATE TABLE Data;
When you execute 'TRUNCATE TABLE' (or equivalent), mysql simply deletes and recreates the MYD and MYI files for the table. Since you've already removed the MYD file as far as mysql is concerned, this is very fast.
4) Now add your index(es) to the empty table:
mysql> ALTER TABLE ADD INDEX <index spec>;
5) This is the same as step #1. Either flush the tables or shut down mysql. At this point, you should have 4 files like "Data.*": Data.MYD, Data.MYD.save, Data.MYI, and Data.frm
6) Restore the data file you renamed in step #2:
% mv Data.MYD.save Data.MYD
This will result in a mismatch between the data and index files similar to what you might find if the table crashed.
7) Restore/repair the index file based on the data file:
% myisamchk -q -r <options> Data.MYI
Typical options I use:
-O key_buffer=128M -O sort_buffer=128M -O read_buffer=2M -O write_buffer=2M
The above command will recreate the index without copying the data file. This means that the index creation process does roughly 50% less work -- and it is the expensive part of the process, writing the data to disk, that is skipped.
If you have any questions regarding this process, please feel free to contact me.
Kyle
*************************************************************************** Kyle J. Munn Email: [EMAIL PROTECTED] EraGen Biosciences Phone: 608.662.9000 x351 http://www.EraGen.com Fax: 608.662.9008
-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]