Re: Faster reindexing

2003-07-17 Thread Jeremy Zawodny
On Wed, Jul 09, 2003 at 11:51:01PM -0500, Paul DuBois wrote:
 At 14:14 +0700 7/9/03, Dominicus Donny wrote:
 Sorry, my mistake. It's reindexing then.
 Once i should add 2 key indexes on a huge table.
 But instead of alter the table in 1 single query,
 I build each index 1 by 1.
 
 Generally, it is faster to build all your indexes with a single
 ALTER TABLE statement than to build them one by one with separate
 ALTER TABLE statements.

Single.
-- 
Jeremy D. Zawodny |  Perl, Web, MySQL, Linux Magazine, Yahoo!
[EMAIL PROTECTED]  |  http://jeremy.zawodny.com/

MySQL 4.0.13: up 8 days, processed 247,399,591 queries (350/sec. avg)

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Faster reindexing

2003-07-17 Thread daniel
umm say i reindex after a reimport i usually drop all the indexes first ,
then add indexes again one by one , seems to be quicker for me especially
for fulltext indexes , doing it at once tends to be slower, must crunch the
cpu a bit dont know

 On Wed, Jul 09, 2003 at 11:51:01PM -0500, Paul DuBois wrote:
 At 14:14 +0700 7/9/03, Dominicus Donny wrote:
 Sorry, my mistake. It's reindexing then.
 Once i should add 2 key indexes on a huge table.
 But instead of alter the table in 1 single query,
 I build each index 1 by 1.

 Generally, it is faster to build all your indexes with a single
 ALTER TABLE statement than to build them one by one with separate
 ALTER TABLE statements.

 Single.
 --
 Jeremy D. Zawodny |  Perl, Web, MySQL, Linux Magazine, Yahoo!
 [EMAIL PROTECTED]  |  http://jeremy.zawodny.com/

 MySQL 4.0.13: up 8 days, processed 247,399,591 queries (350/sec. avg)

 --
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:
 http://lists.mysql.com/[EMAIL PROTECTED]




-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Faster reindexing

2003-07-17 Thread Dominicus Donny
Yep...
im using fulltext indexes
That is, multiple indexing would be the best.

Nut now i add more memory and do some
fine-tuning on my.cnf, perhaps the result won't be
the same.

Me fail English? That's unpossible
###___Archon___###


- Original Message -
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Friday, July 18, 2003 11:42 AM
Subject: Re: Faster reindexing


 umm say i reindex after a reimport i usually drop all the indexes first ,
 then add indexes again one by one , seems to be quicker for me especially
 for fulltext indexes , doing it at once tends to be slower, must crunch
the
 cpu a bit dont know

  On Wed, Jul 09, 2003 at 11:51:01PM -0500, Paul DuBois wrote:
  At 14:14 +0700 7/9/03, Dominicus Donny wrote:
  Sorry, my mistake. It's reindexing then.
  Once i should add 2 key indexes on a huge table.
  But instead of alter the table in 1 single query,
  I build each index 1 by 1.
 
  Generally, it is faster to build all your indexes with a single
  ALTER TABLE statement than to build them one by one with separate
  ALTER TABLE statements.
 
  Single.
  --
  Jeremy D. Zawodny |  Perl, Web, MySQL, Linux Magazine, Yahoo!
  [EMAIL PROTECTED]  |  http://jeremy.zawodny.com/
 
  MySQL 4.0.13: up 8 days, processed 247,399,591 queries (350/sec. avg)
 
  --
  MySQL General Mailing List
  For list archives: http://lists.mysql.com/mysql
  To unsubscribe:
  http://lists.mysql.com/[EMAIL PROTECTED]





-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Faster reindexing

2003-07-10 Thread Florian Weimer
Dathan Vance Pattishall [EMAIL PROTECTED] writes:

 Maybe increasing

 #use for when mysql is doing a check or repair
 set-variable= myisam_sort_buffer_size=64M

 to a higher value will make the index happen faster on the fly.

MySQL doesn't seem to honour this variable.  I've set it to '1024M',
but the process size stays below 512 MB (which are occupied by other
buffers).

The indexing rate drops to a less than 6 MB per *hour* once 10 million
(or so) records have been processed.

It appears as if the only solution is splitting the table into 20 or
30 smaller ones and using a MERGE table.  But if this is required, I
might also do this using the existing RDBMS.

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



RE: Faster reindexing

2003-07-10 Thread Rudy Metzger
You should also check myisam_max_extra_sort_file_size and
myisam_max_sort_file_size.

NOTE: I heard/know that these variables are dependant on each other,
however I cannot remember anymore how. It was something that if a is not
set, b is not considered. 

Also check out myisam_repair_threads, which could give you even more
speed.

Best check all the variables starting with myisam (SHOW VARIABLES LIKE
'myisam%') and look up what they are doing. Then put all to a max and
try again. Maybe keep myisam_repair_threads to 1 though...

Cheers
/rudy

-Original Message-
From: Florian Weimer [mailto:[EMAIL PROTECTED] 
Sent: donderdag 10 juli 2003 10:06
To: Dathan Vance Pattishall
Cc: 'Paul DuBois'; [EMAIL PROTECTED]
Subject: Re: Faster reindexing

Dathan Vance Pattishall [EMAIL PROTECTED] writes:

 Maybe increasing

 #use for when mysql is doing a check or repair
 set-variable= myisam_sort_buffer_size=64M

 to a higher value will make the index happen faster on the fly.

MySQL doesn't seem to honour this variable.  I've set it to '1024M',
but the process size stays below 512 MB (which are occupied by other
buffers).

The indexing rate drops to a less than 6 MB per *hour* once 10 million
(or so) records have been processed.

It appears as if the only solution is splitting the table into 20 or
30 smaller ones and using a MERGE table.  But if this is required, I
might also do this using the existing RDBMS.

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:
http://lists.mysql.com/[EMAIL PROTECTED]


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Faster reindexing

2003-07-09 Thread Dominicus Donny
Sorry, my mistake. It's reindexing then.
Once i should add 2 key indexes on a huge table.
But instead of alter the table in 1 single query,
I build each index 1 by 1.
And the responses of the slaves also great, too.
Anyway, im using the standard my-medium.cnf setup.
The huge table/db also located on another drive/partition.

Me fail English? That's unpossible
###___Archon___###

- Original Message -
From: Paul DuBois [EMAIL PROTECTED]
To: Dominicus Donny [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Wednesday, July 09, 2003 11:22 AM
Subject: Re: Faster reindexing


 At 11:23 +0700 7/9/03, Dominicus Donny wrote:
 Try analyze your table(s).

 What information will this yield to make indexing faster?

 
 Me fail English? That's unpossible
 ###___Archon___###
 
 - Original Message -
 From: electroteque [EMAIL PROTECTED]
 To: Paul DuBois [EMAIL PROTECTED]; Florian Weimer [EMAIL PROTECTED];
 [EMAIL PROTECTED]
 Sent: Wednesday, July 09, 2003 10:23 AM
 Subject: RE: Faster reindexing
 
 
   when reimporting or reinserting or whatever from a huge db i usually
drop
   all the indexes reimport then create them again much quicker
 
   -Original Message-
   From: Paul DuBois [mailto:[EMAIL PROTECTED]
   Sent: Wednesday, July 09, 2003 1:09 PM
   To: Florian Weimer; [EMAIL PROTECTED]
   Subject: Re: Faster reindexing
 
 
   At 9:39 +0200 7/7/03, Florian Weimer wrote:
   I've got a table with 100 million rows and need some indexes on it
   (one row is 126 bytes).
   
   I'm currently using MyISAM and the indexing proceeds at an
   astonishingly low rate: about 200 MB per hour.  This is rate is far
   too low; if we had to recover the database for some reason, we'd have
   to wait for days.
   
   The table looks like this:
   
   CREATE TABLE flows (
versionCHAR NOT NULL,
router CHAR(15) NOT NULL,
src_ip CHAR(15) NOT NULL,
dst_ip CHAR(15) NOT NULL,
protocol   TINYINT UNSIGNED NOT NULL,
src_port   MEDIUMINT UNSIGNED NOT NULL,
dst_port   MEDIUMINT UNSIGNED NOT NULL,
packetsINTEGER UNSIGNED NOT NULL,
bytes  INTEGER UNSIGNED NOT NULL,
src_if MEDIUMINT UNSIGNED NOT NULL,
dst_if MEDIUMINT UNSIGNED NOT NULL,
src_as MEDIUMINT UNSIGNED NOT NULL,
dst_as MEDIUMINT UNSIGNED NOT NULL,
src_netCHAR(1) NOT NULL,
dst_netCHAR(1) NOT NULL,
direction  CHAR(1) NOT NULL,
class  CHAR(1) NOT NULL,
start_time CHAR(24),
end_time   CHAR(24)
   );
   
   Indexes are created using this statement:
   
   mysql ALTER TABLE flows
- ADD INDEX dst_ip (dst_ip, src_ip),
- ADD INDEX dst_port (dst_port, start_time),
- ADD INDEX src_ip (src_ip, start_time),
- ADD INDEX time (start_time);
   
   In theory, we could represent the columns router, src_ip, dst_ip,
   start_time, end_time using integers of the appropriate size, but this
   would make ad-hoc queries harder to type (and porting our
applications
   would be even more difficult).
 
   Perhaps, but as a test, you might add a couple of extra columns to
   the table, then populate them like this after loading the table:
 
   UPDATE flows SET int_src_ip = INET_ATON(src_ip), int_dst_ip =
   INET_ATON(dst_ip);
 
   Then try creating the indexes using int_src_ip and int_dst_ip rather
   than src_ip and dst_ip.
 
   If it's significantly faster, you may want to reconsider whether it
might
   not be worth using INET_ATON(X) in your queries rather than X.
 
   
   Should I switch to another table type?
 
   It's easy enough to convert the table to, e.g., InnoDB and then
   create the indexes, so an empirical test should not be difficult.
 
   --
   Paul DuBois, Senior Technical Writer
   Madison, Wisconsin, USA
   MySQL AB, www.mysql.com
 
Are you MySQL certified?  http://www.mysql.com/certification/



 --
 Paul DuBois, Senior Technical Writer
 Madison, Wisconsin, USA
 MySQL AB, www.mysql.com

 Are you MySQL certified?  http://www.mysql.com/certification/


 --
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:
http://lists.mysql.com/[EMAIL PROTECTED]


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



RE: Faster reindexing

2003-07-09 Thread Terry Spencer
Check out the EXPLAIN command

EXPLAIN tbl_name is a synonym for DESCRIBE tbl_name or SHOW COLUMNS FROM
tbl_name. 

When you precede a SELECT statement with the keyword EXPLAIN, MySQL explains
how it would process the SELECT, providing information about how tables are
joined and in which order. 

With the help of EXPLAIN, you can see when you must add indexes to tables to
get a faster SELECT that uses indexes to find the records.

http://www.mysql.com/doc/en/EXPLAIN.html



Terry Spencer
Haigh Consultancy Services
+44 (0)2073007329
www.haigh-cs.co.uk


-Original Message-
From: Paul DuBois [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 09, 2003 5:23 AM
To: Dominicus Donny; [EMAIL PROTECTED]
Subject: Re: Faster reindexing


At 11:23 +0700 7/9/03, Dominicus Donny wrote:
Try analyze your table(s).

What information will this yield to make indexing faster?


Me fail English? That's unpossible
###___Archon___###

- Original Message -
From: electroteque [EMAIL PROTECTED]
To: Paul DuBois [EMAIL PROTECTED]; Florian Weimer [EMAIL PROTECTED];
[EMAIL PROTECTED]
Sent: Wednesday, July 09, 2003 10:23 AM
Subject: RE: Faster reindexing


  when reimporting or reinserting or whatever from a huge db i usually
drop
  all the indexes reimport then create them again much quicker

  -Original Message-
  From: Paul DuBois [mailto:[EMAIL PROTECTED]
  Sent: Wednesday, July 09, 2003 1:09 PM
  To: Florian Weimer; [EMAIL PROTECTED]
  Subject: Re: Faster reindexing


  At 9:39 +0200 7/7/03, Florian Weimer wrote:
  I've got a table with 100 million rows and need some indexes on it
  (one row is 126 bytes).
  
  I'm currently using MyISAM and the indexing proceeds at an
  astonishingly low rate: about 200 MB per hour.  This is rate is far
  too low; if we had to recover the database for some reason, we'd have
  to wait for days.
  
  The table looks like this:
  
  CREATE TABLE flows (
   versionCHAR NOT NULL,
   router CHAR(15) NOT NULL,
   src_ip CHAR(15) NOT NULL,
   dst_ip CHAR(15) NOT NULL,
   protocol   TINYINT UNSIGNED NOT NULL,
   src_port   MEDIUMINT UNSIGNED NOT NULL,
   dst_port   MEDIUMINT UNSIGNED NOT NULL,
   packetsINTEGER UNSIGNED NOT NULL,
   bytes  INTEGER UNSIGNED NOT NULL,
   src_if MEDIUMINT UNSIGNED NOT NULL,
   dst_if MEDIUMINT UNSIGNED NOT NULL,
   src_as MEDIUMINT UNSIGNED NOT NULL,
   dst_as MEDIUMINT UNSIGNED NOT NULL,
   src_netCHAR(1) NOT NULL,
   dst_netCHAR(1) NOT NULL,
   direction  CHAR(1) NOT NULL,
   class  CHAR(1) NOT NULL,
   start_time CHAR(24),
   end_time   CHAR(24)
  );
  
  Indexes are created using this statement:
  
  mysql ALTER TABLE flows
   - ADD INDEX dst_ip (dst_ip, src_ip),
   - ADD INDEX dst_port (dst_port, start_time),
   - ADD INDEX src_ip (src_ip, start_time),
   - ADD INDEX time (start_time);
  
  In theory, we could represent the columns router, src_ip, dst_ip,
  start_time, end_time using integers of the appropriate size, but this
  would make ad-hoc queries harder to type (and porting our applications
  would be even more difficult).

  Perhaps, but as a test, you might add a couple of extra columns to
  the table, then populate them like this after loading the table:

  UPDATE flows SET int_src_ip = INET_ATON(src_ip), int_dst_ip =
  INET_ATON(dst_ip);

  Then try creating the indexes using int_src_ip and int_dst_ip rather
  than src_ip and dst_ip.

  If it's significantly faster, you may want to reconsider whether it
might
  not be worth using INET_ATON(X) in your queries rather than X.

  
  Should I switch to another table type?

  It's easy enough to convert the table to, e.g., InnoDB and then
  create the indexes, so an empirical test should not be difficult.

  --
  Paul DuBois, Senior Technical Writer
  Madison, Wisconsin, USA
  MySQL AB, www.mysql.com

   Are you MySQL certified?  http://www.mysql.com/certification/



-- 
Paul DuBois, Senior Technical Writer
Madison, Wisconsin, USA
MySQL AB, www.mysql.com

Are you MySQL certified?  http://www.mysql.com/certification/


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:
http://lists.mysql.com/[EMAIL PROTECTED]


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



RE: Faster reindexing

2003-07-09 Thread Dathan Vance Pattishall
Maybe increasing

#use for when mysql is doing a check or repair
set-variable= myisam_sort_buffer_size=64M

to a higher value will make the index happen faster on the fly.

But, for a 100 million row table doing a dump and adding that dump back
to the db might be your fastest method. Building the index at insertion
for a self balancing tree is a faster in some cases (I believe this is
the case) then building one on the fly. 

My 2 cents. 100 million rows WOW.

---Original Message-
--From: Paul DuBois [mailto:[EMAIL PROTECTED]
--Sent: Tuesday, July 08, 2003 8:09 PM
--To: Florian Weimer; [EMAIL PROTECTED]
--Subject: Re: Faster reindexing
--
--At 9:39 +0200 7/7/03, Florian Weimer wrote:
--I've got a table with 100 million rows and need some indexes on it
--(one row is 126 bytes).
--
--I'm currently using MyISAM and the indexing proceeds at an
--astonishingly low rate: about 200 MB per hour.  This is rate is far
--too low; if we had to recover the database for some reason, we'd
have
--to wait for days.
--
--The table looks like this:
--
--CREATE TABLE flows (
--versionCHAR NOT NULL,
--router CHAR(15) NOT NULL,
--src_ip CHAR(15) NOT NULL,
--dst_ip CHAR(15) NOT NULL,
--protocol   TINYINT UNSIGNED NOT NULL,
--src_port   MEDIUMINT UNSIGNED NOT NULL,
--dst_port   MEDIUMINT UNSIGNED NOT NULL,
--packetsINTEGER UNSIGNED NOT NULL,
--bytes  INTEGER UNSIGNED NOT NULL,
--src_if MEDIUMINT UNSIGNED NOT NULL,
--dst_if MEDIUMINT UNSIGNED NOT NULL,
--src_as MEDIUMINT UNSIGNED NOT NULL,
--dst_as MEDIUMINT UNSIGNED NOT NULL,
--src_netCHAR(1) NOT NULL,
--dst_netCHAR(1) NOT NULL,
--direction  CHAR(1) NOT NULL,
--class  CHAR(1) NOT NULL,
--start_time CHAR(24),
--end_time   CHAR(24)
--);
--
--Indexes are created using this statement:
--
--mysql ALTER TABLE flows
-- - ADD INDEX dst_ip (dst_ip, src_ip),
-- - ADD INDEX dst_port (dst_port, start_time),
-- - ADD INDEX src_ip (src_ip, start_time),
-- - ADD INDEX time (start_time);
--
--In theory, we could represent the columns router, src_ip, dst_ip,
--start_time, end_time using integers of the appropriate size, but
this
--would make ad-hoc queries harder to type (and porting our
applications
--would be even more difficult).
--
--Perhaps, but as a test, you might add a couple of extra columns to
--the table, then populate them like this after loading the table:
--
--UPDATE flows SET int_src_ip = INET_ATON(src_ip), int_dst_ip =
--INET_ATON(dst_ip);
--
--Then try creating the indexes using int_src_ip and int_dst_ip rather
--than src_ip and dst_ip.
--
--If it's significantly faster, you may want to reconsider whether it
might
--not be worth using INET_ATON(X) in your queries rather than X.
--
--
--Should I switch to another table type?
--
--It's easy enough to convert the table to, e.g., InnoDB and then
--create the indexes, so an empirical test should not be difficult.
--

--Paul DuBois, Senior Technical Writer
--Madison, Wisconsin, USA
--MySQL AB, www.mysql.com
--
--Are you MySQL certified?  http://www.mysql.com/certification/
--
--

--MySQL General Mailing List
--For list archives: http://lists.mysql.com/mysql
--To unsubscribe:
--http://lists.mysql.com/[EMAIL PROTECTED]




-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Faster reindexing

2003-07-09 Thread Florian Weimer
Dathan Vance Pattishall [EMAIL PROTECTED] writes:

 #use for when mysql is doing a check or repair
 set-variable= myisam_sort_buffer_size=64M

 to a higher value will make the index happen faster on the fly.

Oops.  I only adjusted the key_buffer value.  Probably I should set
myisam_sort_buffer_size to several hundred megabytes.

If InnoDB indexing doesn't finish either, I'll give it a try (I still
hope that MyISAM tables are more light-weight than InnoDB tables and
result in higher throughput in a many reads/rare bulk updates
scenario).

However, the indexes must be maintenance-free once created (no
creeping index syndrome).  Can it occur that index pages get lost
during deletion?

 But, for a 100 million row table doing a dump and adding that dump back
 to the db might be your fastest method. Building the index at insertion
 for a self balancing tree is a faster in some cases (I believe this is
 the case) then building one on the fly. 

I don't think this matters much, as reindexing seems to reload the
database anyway.

 My 2 cents. 100 million rows WOW.

I initially hoped to store even a bit more. 8-/

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Faster reindexing

2003-07-09 Thread Paul DuBois
At 14:14 +0700 7/9/03, Dominicus Donny wrote:
Sorry, my mistake. It's reindexing then.
Once i should add 2 key indexes on a huge table.
But instead of alter the table in 1 single query,
I build each index 1 by 1.
Generally, it is faster to build all your indexes with a single
ALTER TABLE statement than to build them one by one with separate
ALTER TABLE statements.
And the responses of the slaves also great, too.
Anyway, im using the standard my-medium.cnf setup.
The huge table/db also located on another drive/partition.
Me fail English? That's unpossible
###___Archon___###
- Original Message -
From: Paul DuBois [EMAIL PROTECTED]
To: Dominicus Donny [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Wednesday, July 09, 2003 11:22 AM
Subject: Re: Faster reindexing

 At 11:23 +0700 7/9/03, Dominicus Donny wrote:
 Try analyze your table(s).
 What information will this yield to make indexing faster?

 
 Me fail English? That's unpossible
 ###___Archon___###
 
 - Original Message -
 From: electroteque [EMAIL PROTECTED]
 To: Paul DuBois [EMAIL PROTECTED]; Florian Weimer [EMAIL PROTECTED];
 [EMAIL PROTECTED]
 Sent: Wednesday, July 09, 2003 10:23 AM
 Subject: RE: Faster reindexing
 
 
   when reimporting or reinserting or whatever from a huge db i usually
drop
   all the indexes reimport then create them again much quicker
 
   -Original Message-
   From: Paul DuBois [mailto:[EMAIL PROTECTED]
   Sent: Wednesday, July 09, 2003 1:09 PM
   To: Florian Weimer; [EMAIL PROTECTED]
   Subject: Re: Faster reindexing
 
 
   At 9:39 +0200 7/7/03, Florian Weimer wrote:
   I've got a table with 100 million rows and need some indexes on it
   (one row is 126 bytes).
   
   I'm currently using MyISAM and the indexing proceeds at an
   astonishingly low rate: about 200 MB per hour.  This is rate is far
   too low; if we had to recover the database for some reason, we'd have
   to wait for days.
   
   The table looks like this:
   
   CREATE TABLE flows (
versionCHAR NOT NULL,
router CHAR(15) NOT NULL,
src_ip CHAR(15) NOT NULL,
dst_ip CHAR(15) NOT NULL,
protocol   TINYINT UNSIGNED NOT NULL,
src_port   MEDIUMINT UNSIGNED NOT NULL,
dst_port   MEDIUMINT UNSIGNED NOT NULL,
packetsINTEGER UNSIGNED NOT NULL,
bytes  INTEGER UNSIGNED NOT NULL,
src_if MEDIUMINT UNSIGNED NOT NULL,
dst_if MEDIUMINT UNSIGNED NOT NULL,
src_as MEDIUMINT UNSIGNED NOT NULL,
dst_as MEDIUMINT UNSIGNED NOT NULL,
src_netCHAR(1) NOT NULL,
dst_netCHAR(1) NOT NULL,
direction  CHAR(1) NOT NULL,
class  CHAR(1) NOT NULL,
start_time CHAR(24),
end_time   CHAR(24)
   );
   
   Indexes are created using this statement:
   
   mysql ALTER TABLE flows
- ADD INDEX dst_ip (dst_ip, src_ip),
- ADD INDEX dst_port (dst_port, start_time),
- ADD INDEX src_ip (src_ip, start_time),
- ADD INDEX time (start_time);
   
   In theory, we could represent the columns router, src_ip, dst_ip,
   start_time, end_time using integers of the appropriate size, but this
   would make ad-hoc queries harder to type (and porting our
applications
   would be even more difficult).
 
   Perhaps, but as a test, you might add a couple of extra columns to
   the table, then populate them like this after loading the table:
 
   UPDATE flows SET int_src_ip = INET_ATON(src_ip), int_dst_ip =
   INET_ATON(dst_ip);
 
   Then try creating the indexes using int_src_ip and int_dst_ip rather
   than src_ip and dst_ip.
 
   If it's significantly faster, you may want to reconsider whether it
might
   not be worth using INET_ATON(X) in your queries rather than X.
 
   
   Should I switch to another table type?
 
   It's easy enough to convert the table to, e.g., InnoDB and then
   create the indexes, so an empirical test should not be difficult.
 
   --
   Paul DuBois, Senior Technical Writer
   Madison, Wisconsin, USA
   MySQL AB, www.mysql.com
 
Are you MySQL certified?  http://www.mysql.com/certification/


 --
 Paul DuBois, Senior Technical Writer
 Madison, Wisconsin, USA
 MySQL AB, www.mysql.com
  Are you MySQL certified?  http://www.mysql.com/certification/


--
Paul DuBois, Senior Technical Writer
Madison, Wisconsin, USA
MySQL AB, www.mysql.com
Are you MySQL certified?  http://www.mysql.com/certification/

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]


Re: Faster reindexing

2003-07-08 Thread Paul DuBois
At 9:39 +0200 7/7/03, Florian Weimer wrote:
I've got a table with 100 million rows and need some indexes on it
(one row is 126 bytes).
I'm currently using MyISAM and the indexing proceeds at an
astonishingly low rate: about 200 MB per hour.  This is rate is far
too low; if we had to recover the database for some reason, we'd have
to wait for days.
The table looks like this:

CREATE TABLE flows (
versionCHAR NOT NULL,
router CHAR(15) NOT NULL,
src_ip CHAR(15) NOT NULL,
dst_ip CHAR(15) NOT NULL,
protocol   TINYINT UNSIGNED NOT NULL,
src_port   MEDIUMINT UNSIGNED NOT NULL,
dst_port   MEDIUMINT UNSIGNED NOT NULL,
packetsINTEGER UNSIGNED NOT NULL,
bytes  INTEGER UNSIGNED NOT NULL,
src_if MEDIUMINT UNSIGNED NOT NULL,
dst_if MEDIUMINT UNSIGNED NOT NULL,
src_as MEDIUMINT UNSIGNED NOT NULL,
dst_as MEDIUMINT UNSIGNED NOT NULL,
src_netCHAR(1) NOT NULL,
dst_netCHAR(1) NOT NULL,
direction  CHAR(1) NOT NULL,
class  CHAR(1) NOT NULL,
start_time CHAR(24),
end_time   CHAR(24)
);
Indexes are created using this statement:

mysql ALTER TABLE flows
- ADD INDEX dst_ip (dst_ip, src_ip),
- ADD INDEX dst_port (dst_port, start_time),
- ADD INDEX src_ip (src_ip, start_time),
- ADD INDEX time (start_time);
In theory, we could represent the columns router, src_ip, dst_ip,
start_time, end_time using integers of the appropriate size, but this
would make ad-hoc queries harder to type (and porting our applications
would be even more difficult).
Perhaps, but as a test, you might add a couple of extra columns to
the table, then populate them like this after loading the table:
UPDATE flows SET int_src_ip = INET_ATON(src_ip), int_dst_ip = 
INET_ATON(dst_ip);

Then try creating the indexes using int_src_ip and int_dst_ip rather
than src_ip and dst_ip.
If it's significantly faster, you may want to reconsider whether it might
not be worth using INET_ATON(X) in your queries rather than X.
Should I switch to another table type?
It's easy enough to convert the table to, e.g., InnoDB and then
create the indexes, so an empirical test should not be difficult.
--
Paul DuBois, Senior Technical Writer
Madison, Wisconsin, USA
MySQL AB, www.mysql.com
Are you MySQL certified?  http://www.mysql.com/certification/

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]


RE: Faster reindexing

2003-07-08 Thread electroteque
when reimporting or reinserting or whatever from a huge db i usually drop
all the indexes reimport then create them again much quicker

-Original Message-
From: Paul DuBois [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 09, 2003 1:09 PM
To: Florian Weimer; [EMAIL PROTECTED]
Subject: Re: Faster reindexing


At 9:39 +0200 7/7/03, Florian Weimer wrote:
I've got a table with 100 million rows and need some indexes on it
(one row is 126 bytes).

I'm currently using MyISAM and the indexing proceeds at an
astonishingly low rate: about 200 MB per hour.  This is rate is far
too low; if we had to recover the database for some reason, we'd have
to wait for days.

The table looks like this:

CREATE TABLE flows (
   versionCHAR NOT NULL,
   router CHAR(15) NOT NULL,
   src_ip CHAR(15) NOT NULL,
   dst_ip CHAR(15) NOT NULL,
   protocol   TINYINT UNSIGNED NOT NULL,
   src_port   MEDIUMINT UNSIGNED NOT NULL,
   dst_port   MEDIUMINT UNSIGNED NOT NULL,
   packetsINTEGER UNSIGNED NOT NULL,
   bytes  INTEGER UNSIGNED NOT NULL,
   src_if MEDIUMINT UNSIGNED NOT NULL,
   dst_if MEDIUMINT UNSIGNED NOT NULL,
   src_as MEDIUMINT UNSIGNED NOT NULL,
   dst_as MEDIUMINT UNSIGNED NOT NULL,
   src_netCHAR(1) NOT NULL,
   dst_netCHAR(1) NOT NULL,
   direction  CHAR(1) NOT NULL,
   class  CHAR(1) NOT NULL,
   start_time CHAR(24),
   end_time   CHAR(24)
);

Indexes are created using this statement:

mysql ALTER TABLE flows
 - ADD INDEX dst_ip (dst_ip, src_ip),
 - ADD INDEX dst_port (dst_port, start_time),
 - ADD INDEX src_ip (src_ip, start_time),
 - ADD INDEX time (start_time);

In theory, we could represent the columns router, src_ip, dst_ip,
start_time, end_time using integers of the appropriate size, but this
would make ad-hoc queries harder to type (and porting our applications
would be even more difficult).

Perhaps, but as a test, you might add a couple of extra columns to
the table, then populate them like this after loading the table:

UPDATE flows SET int_src_ip = INET_ATON(src_ip), int_dst_ip =
INET_ATON(dst_ip);

Then try creating the indexes using int_src_ip and int_dst_ip rather
than src_ip and dst_ip.

If it's significantly faster, you may want to reconsider whether it might
not be worth using INET_ATON(X) in your queries rather than X.


Should I switch to another table type?

It's easy enough to convert the table to, e.g., InnoDB and then
create the indexes, so an empirical test should not be difficult.

--
Paul DuBois, Senior Technical Writer
Madison, Wisconsin, USA
MySQL AB, www.mysql.com

Are you MySQL certified?  http://www.mysql.com/certification/


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:
http://lists.mysql.com/[EMAIL PROTECTED]


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Faster reindexing

2003-07-08 Thread Dominicus Donny
Try analyze your table(s).

Me fail English? That's unpossible
###___Archon___###

- Original Message -
From: electroteque [EMAIL PROTECTED]
To: Paul DuBois [EMAIL PROTECTED]; Florian Weimer [EMAIL PROTECTED];
[EMAIL PROTECTED]
Sent: Wednesday, July 09, 2003 10:23 AM
Subject: RE: Faster reindexing


 when reimporting or reinserting or whatever from a huge db i usually drop
 all the indexes reimport then create them again much quicker

 -Original Message-
 From: Paul DuBois [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, July 09, 2003 1:09 PM
 To: Florian Weimer; [EMAIL PROTECTED]
 Subject: Re: Faster reindexing


 At 9:39 +0200 7/7/03, Florian Weimer wrote:
 I've got a table with 100 million rows and need some indexes on it
 (one row is 126 bytes).
 
 I'm currently using MyISAM and the indexing proceeds at an
 astonishingly low rate: about 200 MB per hour.  This is rate is far
 too low; if we had to recover the database for some reason, we'd have
 to wait for days.
 
 The table looks like this:
 
 CREATE TABLE flows (
  versionCHAR NOT NULL,
  router CHAR(15) NOT NULL,
  src_ip CHAR(15) NOT NULL,
  dst_ip CHAR(15) NOT NULL,
  protocol   TINYINT UNSIGNED NOT NULL,
  src_port   MEDIUMINT UNSIGNED NOT NULL,
  dst_port   MEDIUMINT UNSIGNED NOT NULL,
  packetsINTEGER UNSIGNED NOT NULL,
  bytes  INTEGER UNSIGNED NOT NULL,
  src_if MEDIUMINT UNSIGNED NOT NULL,
  dst_if MEDIUMINT UNSIGNED NOT NULL,
  src_as MEDIUMINT UNSIGNED NOT NULL,
  dst_as MEDIUMINT UNSIGNED NOT NULL,
  src_netCHAR(1) NOT NULL,
  dst_netCHAR(1) NOT NULL,
  direction  CHAR(1) NOT NULL,
  class  CHAR(1) NOT NULL,
  start_time CHAR(24),
  end_time   CHAR(24)
 );
 
 Indexes are created using this statement:
 
 mysql ALTER TABLE flows
  - ADD INDEX dst_ip (dst_ip, src_ip),
  - ADD INDEX dst_port (dst_port, start_time),
  - ADD INDEX src_ip (src_ip, start_time),
  - ADD INDEX time (start_time);
 
 In theory, we could represent the columns router, src_ip, dst_ip,
 start_time, end_time using integers of the appropriate size, but this
 would make ad-hoc queries harder to type (and porting our applications
 would be even more difficult).

 Perhaps, but as a test, you might add a couple of extra columns to
 the table, then populate them like this after loading the table:

 UPDATE flows SET int_src_ip = INET_ATON(src_ip), int_dst_ip =
 INET_ATON(dst_ip);

 Then try creating the indexes using int_src_ip and int_dst_ip rather
 than src_ip and dst_ip.

 If it's significantly faster, you may want to reconsider whether it might
 not be worth using INET_ATON(X) in your queries rather than X.

 
 Should I switch to another table type?

 It's easy enough to convert the table to, e.g., InnoDB and then
 create the indexes, so an empirical test should not be difficult.

 --
 Paul DuBois, Senior Technical Writer
 Madison, Wisconsin, USA
 MySQL AB, www.mysql.com

 Are you MySQL certified?  http://www.mysql.com/certification/


 --
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:
 http://lists.mysql.com/[EMAIL PROTECTED]


 --
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:
http://lists.mysql.com/[EMAIL PROTECTED]


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Faster reindexing

2003-07-08 Thread Paul DuBois
At 11:23 +0700 7/9/03, Dominicus Donny wrote:
Try analyze your table(s).
What information will this yield to make indexing faster?

Me fail English? That's unpossible
###___Archon___###
- Original Message -
From: electroteque [EMAIL PROTECTED]
To: Paul DuBois [EMAIL PROTECTED]; Florian Weimer [EMAIL PROTECTED];
[EMAIL PROTECTED]
Sent: Wednesday, July 09, 2003 10:23 AM
Subject: RE: Faster reindexing

 when reimporting or reinserting or whatever from a huge db i usually drop
 all the indexes reimport then create them again much quicker
 -Original Message-
 From: Paul DuBois [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, July 09, 2003 1:09 PM
 To: Florian Weimer; [EMAIL PROTECTED]
 Subject: Re: Faster reindexing
 At 9:39 +0200 7/7/03, Florian Weimer wrote:
 I've got a table with 100 million rows and need some indexes on it
 (one row is 126 bytes).
 
 I'm currently using MyISAM and the indexing proceeds at an
 astonishingly low rate: about 200 MB per hour.  This is rate is far
 too low; if we had to recover the database for some reason, we'd have
 to wait for days.
 
 The table looks like this:
 
 CREATE TABLE flows (
  versionCHAR NOT NULL,
  router CHAR(15) NOT NULL,
  src_ip CHAR(15) NOT NULL,
  dst_ip CHAR(15) NOT NULL,
  protocol   TINYINT UNSIGNED NOT NULL,
  src_port   MEDIUMINT UNSIGNED NOT NULL,
  dst_port   MEDIUMINT UNSIGNED NOT NULL,
  packetsINTEGER UNSIGNED NOT NULL,
  bytes  INTEGER UNSIGNED NOT NULL,
  src_if MEDIUMINT UNSIGNED NOT NULL,
  dst_if MEDIUMINT UNSIGNED NOT NULL,
  src_as MEDIUMINT UNSIGNED NOT NULL,
  dst_as MEDIUMINT UNSIGNED NOT NULL,
  src_netCHAR(1) NOT NULL,
  dst_netCHAR(1) NOT NULL,
  direction  CHAR(1) NOT NULL,
  class  CHAR(1) NOT NULL,
  start_time CHAR(24),
  end_time   CHAR(24)
 );
 
 Indexes are created using this statement:
 
 mysql ALTER TABLE flows
  - ADD INDEX dst_ip (dst_ip, src_ip),
  - ADD INDEX dst_port (dst_port, start_time),
  - ADD INDEX src_ip (src_ip, start_time),
  - ADD INDEX time (start_time);
 
 In theory, we could represent the columns router, src_ip, dst_ip,
 start_time, end_time using integers of the appropriate size, but this
 would make ad-hoc queries harder to type (and porting our applications
 would be even more difficult).
 Perhaps, but as a test, you might add a couple of extra columns to
 the table, then populate them like this after loading the table:
 UPDATE flows SET int_src_ip = INET_ATON(src_ip), int_dst_ip =
 INET_ATON(dst_ip);
 Then try creating the indexes using int_src_ip and int_dst_ip rather
 than src_ip and dst_ip.
 If it's significantly faster, you may want to reconsider whether it might
 not be worth using INET_ATON(X) in your queries rather than X.
 
 Should I switch to another table type?
 It's easy enough to convert the table to, e.g., InnoDB and then
 create the indexes, so an empirical test should not be difficult.
 --
 Paul DuBois, Senior Technical Writer
 Madison, Wisconsin, USA
 MySQL AB, www.mysql.com
  Are you MySQL certified?  http://www.mysql.com/certification/


--
Paul DuBois, Senior Technical Writer
Madison, Wisconsin, USA
MySQL AB, www.mysql.com
Are you MySQL certified?  http://www.mysql.com/certification/

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]