Re: Faster reindexing
On Wed, Jul 09, 2003 at 11:51:01PM -0500, Paul DuBois wrote: At 14:14 +0700 7/9/03, Dominicus Donny wrote: Sorry, my mistake. It's reindexing then. Once i should add 2 key indexes on a huge table. But instead of alter the table in 1 single query, I build each index 1 by 1. Generally, it is faster to build all your indexes with a single ALTER TABLE statement than to build them one by one with separate ALTER TABLE statements. Single. -- Jeremy D. Zawodny | Perl, Web, MySQL, Linux Magazine, Yahoo! [EMAIL PROTECTED] | http://jeremy.zawodny.com/ MySQL 4.0.13: up 8 days, processed 247,399,591 queries (350/sec. avg) -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Faster reindexing
umm say i reindex after a reimport i usually drop all the indexes first , then add indexes again one by one , seems to be quicker for me especially for fulltext indexes , doing it at once tends to be slower, must crunch the cpu a bit dont know On Wed, Jul 09, 2003 at 11:51:01PM -0500, Paul DuBois wrote: At 14:14 +0700 7/9/03, Dominicus Donny wrote: Sorry, my mistake. It's reindexing then. Once i should add 2 key indexes on a huge table. But instead of alter the table in 1 single query, I build each index 1 by 1. Generally, it is faster to build all your indexes with a single ALTER TABLE statement than to build them one by one with separate ALTER TABLE statements. Single. -- Jeremy D. Zawodny | Perl, Web, MySQL, Linux Magazine, Yahoo! [EMAIL PROTECTED] | http://jeremy.zawodny.com/ MySQL 4.0.13: up 8 days, processed 247,399,591 queries (350/sec. avg) -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED] -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Faster reindexing
Yep... im using fulltext indexes That is, multiple indexing would be the best. Nut now i add more memory and do some fine-tuning on my.cnf, perhaps the result won't be the same. Me fail English? That's unpossible ###___Archon___### - Original Message - From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Friday, July 18, 2003 11:42 AM Subject: Re: Faster reindexing umm say i reindex after a reimport i usually drop all the indexes first , then add indexes again one by one , seems to be quicker for me especially for fulltext indexes , doing it at once tends to be slower, must crunch the cpu a bit dont know On Wed, Jul 09, 2003 at 11:51:01PM -0500, Paul DuBois wrote: At 14:14 +0700 7/9/03, Dominicus Donny wrote: Sorry, my mistake. It's reindexing then. Once i should add 2 key indexes on a huge table. But instead of alter the table in 1 single query, I build each index 1 by 1. Generally, it is faster to build all your indexes with a single ALTER TABLE statement than to build them one by one with separate ALTER TABLE statements. Single. -- Jeremy D. Zawodny | Perl, Web, MySQL, Linux Magazine, Yahoo! [EMAIL PROTECTED] | http://jeremy.zawodny.com/ MySQL 4.0.13: up 8 days, processed 247,399,591 queries (350/sec. avg) -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED] -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Faster reindexing
Dathan Vance Pattishall [EMAIL PROTECTED] writes: Maybe increasing #use for when mysql is doing a check or repair set-variable= myisam_sort_buffer_size=64M to a higher value will make the index happen faster on the fly. MySQL doesn't seem to honour this variable. I've set it to '1024M', but the process size stays below 512 MB (which are occupied by other buffers). The indexing rate drops to a less than 6 MB per *hour* once 10 million (or so) records have been processed. It appears as if the only solution is splitting the table into 20 or 30 smaller ones and using a MERGE table. But if this is required, I might also do this using the existing RDBMS. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
RE: Faster reindexing
You should also check myisam_max_extra_sort_file_size and myisam_max_sort_file_size. NOTE: I heard/know that these variables are dependant on each other, however I cannot remember anymore how. It was something that if a is not set, b is not considered. Also check out myisam_repair_threads, which could give you even more speed. Best check all the variables starting with myisam (SHOW VARIABLES LIKE 'myisam%') and look up what they are doing. Then put all to a max and try again. Maybe keep myisam_repair_threads to 1 though... Cheers /rudy -Original Message- From: Florian Weimer [mailto:[EMAIL PROTECTED] Sent: donderdag 10 juli 2003 10:06 To: Dathan Vance Pattishall Cc: 'Paul DuBois'; [EMAIL PROTECTED] Subject: Re: Faster reindexing Dathan Vance Pattishall [EMAIL PROTECTED] writes: Maybe increasing #use for when mysql is doing a check or repair set-variable= myisam_sort_buffer_size=64M to a higher value will make the index happen faster on the fly. MySQL doesn't seem to honour this variable. I've set it to '1024M', but the process size stays below 512 MB (which are occupied by other buffers). The indexing rate drops to a less than 6 MB per *hour* once 10 million (or so) records have been processed. It appears as if the only solution is splitting the table into 20 or 30 smaller ones and using a MERGE table. But if this is required, I might also do this using the existing RDBMS. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED] -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Faster reindexing
Sorry, my mistake. It's reindexing then. Once i should add 2 key indexes on a huge table. But instead of alter the table in 1 single query, I build each index 1 by 1. And the responses of the slaves also great, too. Anyway, im using the standard my-medium.cnf setup. The huge table/db also located on another drive/partition. Me fail English? That's unpossible ###___Archon___### - Original Message - From: Paul DuBois [EMAIL PROTECTED] To: Dominicus Donny [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Wednesday, July 09, 2003 11:22 AM Subject: Re: Faster reindexing At 11:23 +0700 7/9/03, Dominicus Donny wrote: Try analyze your table(s). What information will this yield to make indexing faster? Me fail English? That's unpossible ###___Archon___### - Original Message - From: electroteque [EMAIL PROTECTED] To: Paul DuBois [EMAIL PROTECTED]; Florian Weimer [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Wednesday, July 09, 2003 10:23 AM Subject: RE: Faster reindexing when reimporting or reinserting or whatever from a huge db i usually drop all the indexes reimport then create them again much quicker -Original Message- From: Paul DuBois [mailto:[EMAIL PROTECTED] Sent: Wednesday, July 09, 2003 1:09 PM To: Florian Weimer; [EMAIL PROTECTED] Subject: Re: Faster reindexing At 9:39 +0200 7/7/03, Florian Weimer wrote: I've got a table with 100 million rows and need some indexes on it (one row is 126 bytes). I'm currently using MyISAM and the indexing proceeds at an astonishingly low rate: about 200 MB per hour. This is rate is far too low; if we had to recover the database for some reason, we'd have to wait for days. The table looks like this: CREATE TABLE flows ( versionCHAR NOT NULL, router CHAR(15) NOT NULL, src_ip CHAR(15) NOT NULL, dst_ip CHAR(15) NOT NULL, protocol TINYINT UNSIGNED NOT NULL, src_port MEDIUMINT UNSIGNED NOT NULL, dst_port MEDIUMINT UNSIGNED NOT NULL, packetsINTEGER UNSIGNED NOT NULL, bytes INTEGER UNSIGNED NOT NULL, src_if MEDIUMINT UNSIGNED NOT NULL, dst_if MEDIUMINT UNSIGNED NOT NULL, src_as MEDIUMINT UNSIGNED NOT NULL, dst_as MEDIUMINT UNSIGNED NOT NULL, src_netCHAR(1) NOT NULL, dst_netCHAR(1) NOT NULL, direction CHAR(1) NOT NULL, class CHAR(1) NOT NULL, start_time CHAR(24), end_time CHAR(24) ); Indexes are created using this statement: mysql ALTER TABLE flows - ADD INDEX dst_ip (dst_ip, src_ip), - ADD INDEX dst_port (dst_port, start_time), - ADD INDEX src_ip (src_ip, start_time), - ADD INDEX time (start_time); In theory, we could represent the columns router, src_ip, dst_ip, start_time, end_time using integers of the appropriate size, but this would make ad-hoc queries harder to type (and porting our applications would be even more difficult). Perhaps, but as a test, you might add a couple of extra columns to the table, then populate them like this after loading the table: UPDATE flows SET int_src_ip = INET_ATON(src_ip), int_dst_ip = INET_ATON(dst_ip); Then try creating the indexes using int_src_ip and int_dst_ip rather than src_ip and dst_ip. If it's significantly faster, you may want to reconsider whether it might not be worth using INET_ATON(X) in your queries rather than X. Should I switch to another table type? It's easy enough to convert the table to, e.g., InnoDB and then create the indexes, so an empirical test should not be difficult. -- Paul DuBois, Senior Technical Writer Madison, Wisconsin, USA MySQL AB, www.mysql.com Are you MySQL certified? http://www.mysql.com/certification/ -- Paul DuBois, Senior Technical Writer Madison, Wisconsin, USA MySQL AB, www.mysql.com Are you MySQL certified? http://www.mysql.com/certification/ -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED] -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
RE: Faster reindexing
Check out the EXPLAIN command EXPLAIN tbl_name is a synonym for DESCRIBE tbl_name or SHOW COLUMNS FROM tbl_name. When you precede a SELECT statement with the keyword EXPLAIN, MySQL explains how it would process the SELECT, providing information about how tables are joined and in which order. With the help of EXPLAIN, you can see when you must add indexes to tables to get a faster SELECT that uses indexes to find the records. http://www.mysql.com/doc/en/EXPLAIN.html Terry Spencer Haigh Consultancy Services +44 (0)2073007329 www.haigh-cs.co.uk -Original Message- From: Paul DuBois [mailto:[EMAIL PROTECTED] Sent: Wednesday, July 09, 2003 5:23 AM To: Dominicus Donny; [EMAIL PROTECTED] Subject: Re: Faster reindexing At 11:23 +0700 7/9/03, Dominicus Donny wrote: Try analyze your table(s). What information will this yield to make indexing faster? Me fail English? That's unpossible ###___Archon___### - Original Message - From: electroteque [EMAIL PROTECTED] To: Paul DuBois [EMAIL PROTECTED]; Florian Weimer [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Wednesday, July 09, 2003 10:23 AM Subject: RE: Faster reindexing when reimporting or reinserting or whatever from a huge db i usually drop all the indexes reimport then create them again much quicker -Original Message- From: Paul DuBois [mailto:[EMAIL PROTECTED] Sent: Wednesday, July 09, 2003 1:09 PM To: Florian Weimer; [EMAIL PROTECTED] Subject: Re: Faster reindexing At 9:39 +0200 7/7/03, Florian Weimer wrote: I've got a table with 100 million rows and need some indexes on it (one row is 126 bytes). I'm currently using MyISAM and the indexing proceeds at an astonishingly low rate: about 200 MB per hour. This is rate is far too low; if we had to recover the database for some reason, we'd have to wait for days. The table looks like this: CREATE TABLE flows ( versionCHAR NOT NULL, router CHAR(15) NOT NULL, src_ip CHAR(15) NOT NULL, dst_ip CHAR(15) NOT NULL, protocol TINYINT UNSIGNED NOT NULL, src_port MEDIUMINT UNSIGNED NOT NULL, dst_port MEDIUMINT UNSIGNED NOT NULL, packetsINTEGER UNSIGNED NOT NULL, bytes INTEGER UNSIGNED NOT NULL, src_if MEDIUMINT UNSIGNED NOT NULL, dst_if MEDIUMINT UNSIGNED NOT NULL, src_as MEDIUMINT UNSIGNED NOT NULL, dst_as MEDIUMINT UNSIGNED NOT NULL, src_netCHAR(1) NOT NULL, dst_netCHAR(1) NOT NULL, direction CHAR(1) NOT NULL, class CHAR(1) NOT NULL, start_time CHAR(24), end_time CHAR(24) ); Indexes are created using this statement: mysql ALTER TABLE flows - ADD INDEX dst_ip (dst_ip, src_ip), - ADD INDEX dst_port (dst_port, start_time), - ADD INDEX src_ip (src_ip, start_time), - ADD INDEX time (start_time); In theory, we could represent the columns router, src_ip, dst_ip, start_time, end_time using integers of the appropriate size, but this would make ad-hoc queries harder to type (and porting our applications would be even more difficult). Perhaps, but as a test, you might add a couple of extra columns to the table, then populate them like this after loading the table: UPDATE flows SET int_src_ip = INET_ATON(src_ip), int_dst_ip = INET_ATON(dst_ip); Then try creating the indexes using int_src_ip and int_dst_ip rather than src_ip and dst_ip. If it's significantly faster, you may want to reconsider whether it might not be worth using INET_ATON(X) in your queries rather than X. Should I switch to another table type? It's easy enough to convert the table to, e.g., InnoDB and then create the indexes, so an empirical test should not be difficult. -- Paul DuBois, Senior Technical Writer Madison, Wisconsin, USA MySQL AB, www.mysql.com Are you MySQL certified? http://www.mysql.com/certification/ -- Paul DuBois, Senior Technical Writer Madison, Wisconsin, USA MySQL AB, www.mysql.com Are you MySQL certified? http://www.mysql.com/certification/ -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED] -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
RE: Faster reindexing
Maybe increasing #use for when mysql is doing a check or repair set-variable= myisam_sort_buffer_size=64M to a higher value will make the index happen faster on the fly. But, for a 100 million row table doing a dump and adding that dump back to the db might be your fastest method. Building the index at insertion for a self balancing tree is a faster in some cases (I believe this is the case) then building one on the fly. My 2 cents. 100 million rows WOW. ---Original Message- --From: Paul DuBois [mailto:[EMAIL PROTECTED] --Sent: Tuesday, July 08, 2003 8:09 PM --To: Florian Weimer; [EMAIL PROTECTED] --Subject: Re: Faster reindexing -- --At 9:39 +0200 7/7/03, Florian Weimer wrote: --I've got a table with 100 million rows and need some indexes on it --(one row is 126 bytes). -- --I'm currently using MyISAM and the indexing proceeds at an --astonishingly low rate: about 200 MB per hour. This is rate is far --too low; if we had to recover the database for some reason, we'd have --to wait for days. -- --The table looks like this: -- --CREATE TABLE flows ( --versionCHAR NOT NULL, --router CHAR(15) NOT NULL, --src_ip CHAR(15) NOT NULL, --dst_ip CHAR(15) NOT NULL, --protocol TINYINT UNSIGNED NOT NULL, --src_port MEDIUMINT UNSIGNED NOT NULL, --dst_port MEDIUMINT UNSIGNED NOT NULL, --packetsINTEGER UNSIGNED NOT NULL, --bytes INTEGER UNSIGNED NOT NULL, --src_if MEDIUMINT UNSIGNED NOT NULL, --dst_if MEDIUMINT UNSIGNED NOT NULL, --src_as MEDIUMINT UNSIGNED NOT NULL, --dst_as MEDIUMINT UNSIGNED NOT NULL, --src_netCHAR(1) NOT NULL, --dst_netCHAR(1) NOT NULL, --direction CHAR(1) NOT NULL, --class CHAR(1) NOT NULL, --start_time CHAR(24), --end_time CHAR(24) --); -- --Indexes are created using this statement: -- --mysql ALTER TABLE flows -- - ADD INDEX dst_ip (dst_ip, src_ip), -- - ADD INDEX dst_port (dst_port, start_time), -- - ADD INDEX src_ip (src_ip, start_time), -- - ADD INDEX time (start_time); -- --In theory, we could represent the columns router, src_ip, dst_ip, --start_time, end_time using integers of the appropriate size, but this --would make ad-hoc queries harder to type (and porting our applications --would be even more difficult). -- --Perhaps, but as a test, you might add a couple of extra columns to --the table, then populate them like this after loading the table: -- --UPDATE flows SET int_src_ip = INET_ATON(src_ip), int_dst_ip = --INET_ATON(dst_ip); -- --Then try creating the indexes using int_src_ip and int_dst_ip rather --than src_ip and dst_ip. -- --If it's significantly faster, you may want to reconsider whether it might --not be worth using INET_ATON(X) in your queries rather than X. -- -- --Should I switch to another table type? -- --It's easy enough to convert the table to, e.g., InnoDB and then --create the indexes, so an empirical test should not be difficult. -- --Paul DuBois, Senior Technical Writer --Madison, Wisconsin, USA --MySQL AB, www.mysql.com -- --Are you MySQL certified? http://www.mysql.com/certification/ -- -- --MySQL General Mailing List --For list archives: http://lists.mysql.com/mysql --To unsubscribe: --http://lists.mysql.com/[EMAIL PROTECTED] -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Faster reindexing
Dathan Vance Pattishall [EMAIL PROTECTED] writes: #use for when mysql is doing a check or repair set-variable= myisam_sort_buffer_size=64M to a higher value will make the index happen faster on the fly. Oops. I only adjusted the key_buffer value. Probably I should set myisam_sort_buffer_size to several hundred megabytes. If InnoDB indexing doesn't finish either, I'll give it a try (I still hope that MyISAM tables are more light-weight than InnoDB tables and result in higher throughput in a many reads/rare bulk updates scenario). However, the indexes must be maintenance-free once created (no creeping index syndrome). Can it occur that index pages get lost during deletion? But, for a 100 million row table doing a dump and adding that dump back to the db might be your fastest method. Building the index at insertion for a self balancing tree is a faster in some cases (I believe this is the case) then building one on the fly. I don't think this matters much, as reindexing seems to reload the database anyway. My 2 cents. 100 million rows WOW. I initially hoped to store even a bit more. 8-/ -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Faster reindexing
At 14:14 +0700 7/9/03, Dominicus Donny wrote: Sorry, my mistake. It's reindexing then. Once i should add 2 key indexes on a huge table. But instead of alter the table in 1 single query, I build each index 1 by 1. Generally, it is faster to build all your indexes with a single ALTER TABLE statement than to build them one by one with separate ALTER TABLE statements. And the responses of the slaves also great, too. Anyway, im using the standard my-medium.cnf setup. The huge table/db also located on another drive/partition. Me fail English? That's unpossible ###___Archon___### - Original Message - From: Paul DuBois [EMAIL PROTECTED] To: Dominicus Donny [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Wednesday, July 09, 2003 11:22 AM Subject: Re: Faster reindexing At 11:23 +0700 7/9/03, Dominicus Donny wrote: Try analyze your table(s). What information will this yield to make indexing faster? Me fail English? That's unpossible ###___Archon___### - Original Message - From: electroteque [EMAIL PROTECTED] To: Paul DuBois [EMAIL PROTECTED]; Florian Weimer [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Wednesday, July 09, 2003 10:23 AM Subject: RE: Faster reindexing when reimporting or reinserting or whatever from a huge db i usually drop all the indexes reimport then create them again much quicker -Original Message- From: Paul DuBois [mailto:[EMAIL PROTECTED] Sent: Wednesday, July 09, 2003 1:09 PM To: Florian Weimer; [EMAIL PROTECTED] Subject: Re: Faster reindexing At 9:39 +0200 7/7/03, Florian Weimer wrote: I've got a table with 100 million rows and need some indexes on it (one row is 126 bytes). I'm currently using MyISAM and the indexing proceeds at an astonishingly low rate: about 200 MB per hour. This is rate is far too low; if we had to recover the database for some reason, we'd have to wait for days. The table looks like this: CREATE TABLE flows ( versionCHAR NOT NULL, router CHAR(15) NOT NULL, src_ip CHAR(15) NOT NULL, dst_ip CHAR(15) NOT NULL, protocol TINYINT UNSIGNED NOT NULL, src_port MEDIUMINT UNSIGNED NOT NULL, dst_port MEDIUMINT UNSIGNED NOT NULL, packetsINTEGER UNSIGNED NOT NULL, bytes INTEGER UNSIGNED NOT NULL, src_if MEDIUMINT UNSIGNED NOT NULL, dst_if MEDIUMINT UNSIGNED NOT NULL, src_as MEDIUMINT UNSIGNED NOT NULL, dst_as MEDIUMINT UNSIGNED NOT NULL, src_netCHAR(1) NOT NULL, dst_netCHAR(1) NOT NULL, direction CHAR(1) NOT NULL, class CHAR(1) NOT NULL, start_time CHAR(24), end_time CHAR(24) ); Indexes are created using this statement: mysql ALTER TABLE flows - ADD INDEX dst_ip (dst_ip, src_ip), - ADD INDEX dst_port (dst_port, start_time), - ADD INDEX src_ip (src_ip, start_time), - ADD INDEX time (start_time); In theory, we could represent the columns router, src_ip, dst_ip, start_time, end_time using integers of the appropriate size, but this would make ad-hoc queries harder to type (and porting our applications would be even more difficult). Perhaps, but as a test, you might add a couple of extra columns to the table, then populate them like this after loading the table: UPDATE flows SET int_src_ip = INET_ATON(src_ip), int_dst_ip = INET_ATON(dst_ip); Then try creating the indexes using int_src_ip and int_dst_ip rather than src_ip and dst_ip. If it's significantly faster, you may want to reconsider whether it might not be worth using INET_ATON(X) in your queries rather than X. Should I switch to another table type? It's easy enough to convert the table to, e.g., InnoDB and then create the indexes, so an empirical test should not be difficult. -- Paul DuBois, Senior Technical Writer Madison, Wisconsin, USA MySQL AB, www.mysql.com Are you MySQL certified? http://www.mysql.com/certification/ -- Paul DuBois, Senior Technical Writer Madison, Wisconsin, USA MySQL AB, www.mysql.com Are you MySQL certified? http://www.mysql.com/certification/ -- Paul DuBois, Senior Technical Writer Madison, Wisconsin, USA MySQL AB, www.mysql.com Are you MySQL certified? http://www.mysql.com/certification/ -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Faster reindexing
At 9:39 +0200 7/7/03, Florian Weimer wrote: I've got a table with 100 million rows and need some indexes on it (one row is 126 bytes). I'm currently using MyISAM and the indexing proceeds at an astonishingly low rate: about 200 MB per hour. This is rate is far too low; if we had to recover the database for some reason, we'd have to wait for days. The table looks like this: CREATE TABLE flows ( versionCHAR NOT NULL, router CHAR(15) NOT NULL, src_ip CHAR(15) NOT NULL, dst_ip CHAR(15) NOT NULL, protocol TINYINT UNSIGNED NOT NULL, src_port MEDIUMINT UNSIGNED NOT NULL, dst_port MEDIUMINT UNSIGNED NOT NULL, packetsINTEGER UNSIGNED NOT NULL, bytes INTEGER UNSIGNED NOT NULL, src_if MEDIUMINT UNSIGNED NOT NULL, dst_if MEDIUMINT UNSIGNED NOT NULL, src_as MEDIUMINT UNSIGNED NOT NULL, dst_as MEDIUMINT UNSIGNED NOT NULL, src_netCHAR(1) NOT NULL, dst_netCHAR(1) NOT NULL, direction CHAR(1) NOT NULL, class CHAR(1) NOT NULL, start_time CHAR(24), end_time CHAR(24) ); Indexes are created using this statement: mysql ALTER TABLE flows - ADD INDEX dst_ip (dst_ip, src_ip), - ADD INDEX dst_port (dst_port, start_time), - ADD INDEX src_ip (src_ip, start_time), - ADD INDEX time (start_time); In theory, we could represent the columns router, src_ip, dst_ip, start_time, end_time using integers of the appropriate size, but this would make ad-hoc queries harder to type (and porting our applications would be even more difficult). Perhaps, but as a test, you might add a couple of extra columns to the table, then populate them like this after loading the table: UPDATE flows SET int_src_ip = INET_ATON(src_ip), int_dst_ip = INET_ATON(dst_ip); Then try creating the indexes using int_src_ip and int_dst_ip rather than src_ip and dst_ip. If it's significantly faster, you may want to reconsider whether it might not be worth using INET_ATON(X) in your queries rather than X. Should I switch to another table type? It's easy enough to convert the table to, e.g., InnoDB and then create the indexes, so an empirical test should not be difficult. -- Paul DuBois, Senior Technical Writer Madison, Wisconsin, USA MySQL AB, www.mysql.com Are you MySQL certified? http://www.mysql.com/certification/ -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
RE: Faster reindexing
when reimporting or reinserting or whatever from a huge db i usually drop all the indexes reimport then create them again much quicker -Original Message- From: Paul DuBois [mailto:[EMAIL PROTECTED] Sent: Wednesday, July 09, 2003 1:09 PM To: Florian Weimer; [EMAIL PROTECTED] Subject: Re: Faster reindexing At 9:39 +0200 7/7/03, Florian Weimer wrote: I've got a table with 100 million rows and need some indexes on it (one row is 126 bytes). I'm currently using MyISAM and the indexing proceeds at an astonishingly low rate: about 200 MB per hour. This is rate is far too low; if we had to recover the database for some reason, we'd have to wait for days. The table looks like this: CREATE TABLE flows ( versionCHAR NOT NULL, router CHAR(15) NOT NULL, src_ip CHAR(15) NOT NULL, dst_ip CHAR(15) NOT NULL, protocol TINYINT UNSIGNED NOT NULL, src_port MEDIUMINT UNSIGNED NOT NULL, dst_port MEDIUMINT UNSIGNED NOT NULL, packetsINTEGER UNSIGNED NOT NULL, bytes INTEGER UNSIGNED NOT NULL, src_if MEDIUMINT UNSIGNED NOT NULL, dst_if MEDIUMINT UNSIGNED NOT NULL, src_as MEDIUMINT UNSIGNED NOT NULL, dst_as MEDIUMINT UNSIGNED NOT NULL, src_netCHAR(1) NOT NULL, dst_netCHAR(1) NOT NULL, direction CHAR(1) NOT NULL, class CHAR(1) NOT NULL, start_time CHAR(24), end_time CHAR(24) ); Indexes are created using this statement: mysql ALTER TABLE flows - ADD INDEX dst_ip (dst_ip, src_ip), - ADD INDEX dst_port (dst_port, start_time), - ADD INDEX src_ip (src_ip, start_time), - ADD INDEX time (start_time); In theory, we could represent the columns router, src_ip, dst_ip, start_time, end_time using integers of the appropriate size, but this would make ad-hoc queries harder to type (and porting our applications would be even more difficult). Perhaps, but as a test, you might add a couple of extra columns to the table, then populate them like this after loading the table: UPDATE flows SET int_src_ip = INET_ATON(src_ip), int_dst_ip = INET_ATON(dst_ip); Then try creating the indexes using int_src_ip and int_dst_ip rather than src_ip and dst_ip. If it's significantly faster, you may want to reconsider whether it might not be worth using INET_ATON(X) in your queries rather than X. Should I switch to another table type? It's easy enough to convert the table to, e.g., InnoDB and then create the indexes, so an empirical test should not be difficult. -- Paul DuBois, Senior Technical Writer Madison, Wisconsin, USA MySQL AB, www.mysql.com Are you MySQL certified? http://www.mysql.com/certification/ -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED] -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Faster reindexing
Try analyze your table(s). Me fail English? That's unpossible ###___Archon___### - Original Message - From: electroteque [EMAIL PROTECTED] To: Paul DuBois [EMAIL PROTECTED]; Florian Weimer [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Wednesday, July 09, 2003 10:23 AM Subject: RE: Faster reindexing when reimporting or reinserting or whatever from a huge db i usually drop all the indexes reimport then create them again much quicker -Original Message- From: Paul DuBois [mailto:[EMAIL PROTECTED] Sent: Wednesday, July 09, 2003 1:09 PM To: Florian Weimer; [EMAIL PROTECTED] Subject: Re: Faster reindexing At 9:39 +0200 7/7/03, Florian Weimer wrote: I've got a table with 100 million rows and need some indexes on it (one row is 126 bytes). I'm currently using MyISAM and the indexing proceeds at an astonishingly low rate: about 200 MB per hour. This is rate is far too low; if we had to recover the database for some reason, we'd have to wait for days. The table looks like this: CREATE TABLE flows ( versionCHAR NOT NULL, router CHAR(15) NOT NULL, src_ip CHAR(15) NOT NULL, dst_ip CHAR(15) NOT NULL, protocol TINYINT UNSIGNED NOT NULL, src_port MEDIUMINT UNSIGNED NOT NULL, dst_port MEDIUMINT UNSIGNED NOT NULL, packetsINTEGER UNSIGNED NOT NULL, bytes INTEGER UNSIGNED NOT NULL, src_if MEDIUMINT UNSIGNED NOT NULL, dst_if MEDIUMINT UNSIGNED NOT NULL, src_as MEDIUMINT UNSIGNED NOT NULL, dst_as MEDIUMINT UNSIGNED NOT NULL, src_netCHAR(1) NOT NULL, dst_netCHAR(1) NOT NULL, direction CHAR(1) NOT NULL, class CHAR(1) NOT NULL, start_time CHAR(24), end_time CHAR(24) ); Indexes are created using this statement: mysql ALTER TABLE flows - ADD INDEX dst_ip (dst_ip, src_ip), - ADD INDEX dst_port (dst_port, start_time), - ADD INDEX src_ip (src_ip, start_time), - ADD INDEX time (start_time); In theory, we could represent the columns router, src_ip, dst_ip, start_time, end_time using integers of the appropriate size, but this would make ad-hoc queries harder to type (and porting our applications would be even more difficult). Perhaps, but as a test, you might add a couple of extra columns to the table, then populate them like this after loading the table: UPDATE flows SET int_src_ip = INET_ATON(src_ip), int_dst_ip = INET_ATON(dst_ip); Then try creating the indexes using int_src_ip and int_dst_ip rather than src_ip and dst_ip. If it's significantly faster, you may want to reconsider whether it might not be worth using INET_ATON(X) in your queries rather than X. Should I switch to another table type? It's easy enough to convert the table to, e.g., InnoDB and then create the indexes, so an empirical test should not be difficult. -- Paul DuBois, Senior Technical Writer Madison, Wisconsin, USA MySQL AB, www.mysql.com Are you MySQL certified? http://www.mysql.com/certification/ -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED] -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED] -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Faster reindexing
At 11:23 +0700 7/9/03, Dominicus Donny wrote: Try analyze your table(s). What information will this yield to make indexing faster? Me fail English? That's unpossible ###___Archon___### - Original Message - From: electroteque [EMAIL PROTECTED] To: Paul DuBois [EMAIL PROTECTED]; Florian Weimer [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Wednesday, July 09, 2003 10:23 AM Subject: RE: Faster reindexing when reimporting or reinserting or whatever from a huge db i usually drop all the indexes reimport then create them again much quicker -Original Message- From: Paul DuBois [mailto:[EMAIL PROTECTED] Sent: Wednesday, July 09, 2003 1:09 PM To: Florian Weimer; [EMAIL PROTECTED] Subject: Re: Faster reindexing At 9:39 +0200 7/7/03, Florian Weimer wrote: I've got a table with 100 million rows and need some indexes on it (one row is 126 bytes). I'm currently using MyISAM and the indexing proceeds at an astonishingly low rate: about 200 MB per hour. This is rate is far too low; if we had to recover the database for some reason, we'd have to wait for days. The table looks like this: CREATE TABLE flows ( versionCHAR NOT NULL, router CHAR(15) NOT NULL, src_ip CHAR(15) NOT NULL, dst_ip CHAR(15) NOT NULL, protocol TINYINT UNSIGNED NOT NULL, src_port MEDIUMINT UNSIGNED NOT NULL, dst_port MEDIUMINT UNSIGNED NOT NULL, packetsINTEGER UNSIGNED NOT NULL, bytes INTEGER UNSIGNED NOT NULL, src_if MEDIUMINT UNSIGNED NOT NULL, dst_if MEDIUMINT UNSIGNED NOT NULL, src_as MEDIUMINT UNSIGNED NOT NULL, dst_as MEDIUMINT UNSIGNED NOT NULL, src_netCHAR(1) NOT NULL, dst_netCHAR(1) NOT NULL, direction CHAR(1) NOT NULL, class CHAR(1) NOT NULL, start_time CHAR(24), end_time CHAR(24) ); Indexes are created using this statement: mysql ALTER TABLE flows - ADD INDEX dst_ip (dst_ip, src_ip), - ADD INDEX dst_port (dst_port, start_time), - ADD INDEX src_ip (src_ip, start_time), - ADD INDEX time (start_time); In theory, we could represent the columns router, src_ip, dst_ip, start_time, end_time using integers of the appropriate size, but this would make ad-hoc queries harder to type (and porting our applications would be even more difficult). Perhaps, but as a test, you might add a couple of extra columns to the table, then populate them like this after loading the table: UPDATE flows SET int_src_ip = INET_ATON(src_ip), int_dst_ip = INET_ATON(dst_ip); Then try creating the indexes using int_src_ip and int_dst_ip rather than src_ip and dst_ip. If it's significantly faster, you may want to reconsider whether it might not be worth using INET_ATON(X) in your queries rather than X. Should I switch to another table type? It's easy enough to convert the table to, e.g., InnoDB and then create the indexes, so an empirical test should not be difficult. -- Paul DuBois, Senior Technical Writer Madison, Wisconsin, USA MySQL AB, www.mysql.com Are you MySQL certified? http://www.mysql.com/certification/ -- Paul DuBois, Senior Technical Writer Madison, Wisconsin, USA MySQL AB, www.mysql.com Are you MySQL certified? http://www.mysql.com/certification/ -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]