Try analyze your table(s).
What information will this yield to make indexing faster?
"Me fail English? That's unpossible" ###___Archon___###
----- Original Message ----- From: "electroteque" <[EMAIL PROTECTED]> To: "Paul DuBois" <[EMAIL PROTECTED]>; "Florian Weimer" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> Sent: Wednesday, July 09, 2003 10:23 AM Subject: RE: Faster reindexing
> Are you MySQL certified? http://www.mysql.com/certification/when reimporting or reinserting or whatever from a huge db i usually drop all the indexes reimport then create them again much quicker
-----Original Message----- From: Paul DuBois [mailto:[EMAIL PROTECTED] Sent: Wednesday, July 09, 2003 1:09 PM To: Florian Weimer; [EMAIL PROTECTED] Subject: Re: Faster reindexing
At 9:39 +0200 7/7/03, Florian Weimer wrote: >I've got a table with 100 million rows and need some indexes on it >(one row is 126 bytes). > >I'm currently using MyISAM and the indexing proceeds at an >astonishingly low rate: about 200 MB per hour. This is rate is far >too low; if we had to recover the database for some reason, we'd have >to wait for days. > >The table looks like this: > >CREATE TABLE flows ( > version CHAR NOT NULL, > router CHAR(15) NOT NULL, > src_ip CHAR(15) NOT NULL, > dst_ip CHAR(15) NOT NULL, > protocol TINYINT UNSIGNED NOT NULL, > src_port MEDIUMINT UNSIGNED NOT NULL, > dst_port MEDIUMINT UNSIGNED NOT NULL, > packets INTEGER UNSIGNED NOT NULL, > bytes INTEGER UNSIGNED NOT NULL, > src_if MEDIUMINT UNSIGNED NOT NULL, > dst_if MEDIUMINT UNSIGNED NOT NULL, > src_as MEDIUMINT UNSIGNED NOT NULL, > dst_as MEDIUMINT UNSIGNED NOT NULL, > src_net CHAR(1) NOT NULL, > dst_net CHAR(1) NOT NULL, > direction CHAR(1) NOT NULL, > class CHAR(1) NOT NULL, > start_time CHAR(24), > end_time CHAR(24) >); > >Indexes are created using this statement: > >mysql> ALTER TABLE flows > -> ADD INDEX dst_ip (dst_ip, src_ip), > -> ADD INDEX dst_port (dst_port, start_time), > -> ADD INDEX src_ip (src_ip, start_time), > -> ADD INDEX time (start_time); > >In theory, we could represent the columns router, src_ip, dst_ip, >start_time, end_time using integers of the appropriate size, but this >would make ad-hoc queries harder to type (and porting our applications >would be even more difficult).
Perhaps, but as a test, you might add a couple of extra columns to the table, then populate them like this after loading the table:
UPDATE flows SET int_src_ip = INET_ATON(src_ip), int_dst_ip = INET_ATON(dst_ip);
Then try creating the indexes using int_src_ip and int_dst_ip rather than src_ip and dst_ip.
If it's significantly faster, you may want to reconsider whether it might not be worth using INET_ATON(X) in your queries rather than X.
> >Should I switch to another table type?
It's easy enough to convert the table to, e.g., InnoDB and then create the indexes, so an empirical test should not be difficult.
-- Paul DuBois, Senior Technical Writer Madison, Wisconsin, USA MySQL AB, www.mysql.com
-- Paul DuBois, Senior Technical Writer Madison, Wisconsin, USA MySQL AB, www.mysql.com
Are you MySQL certified? http://www.mysql.com/certification/
-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]