"Only" 272235???  I enter on average about 75,000 to 80,000 records a day 
(and some times, I break 100,000 records a day.  I do monthly rotations so 
it's easy to calculate how big my table gets).  Granted, I don't know what 
your table structure is but mine is very simple.  All I do is run imports 
every morning of CACI.txt file generated by EchoScope (network traffic 
monitoring tool).  I have about 15 fields per records (as far as I can 
remember) and most of them are indexed.  I wish only, that I could only do 
unique index on more than just ID field but due to the uniqueness of what I'm 
doing, I have to live with regular indexes.  Some of the fields are longer 
than 100 characters so I made sure that my indexes are no longer than 20 
characters.  I was playing a lot before I got performance I have right now.  
One thing I've noticed was that if I indexed more than 20 characters (some 
times, I was crazy trying to index all 255 characters), performance was 
actually dropping down drastically.

Now, things I want to share with:

1.  I use my database as a back-end for my PERL scripts and web interface for 
easy data querying by end users.  Having said that, I could not really afford 
long delays between queries and data display to the browser.

2.  In my queries I use =, LIKE and used to use REGEXP within MySQL.  Out of 
these three, REGEXP was the WORST in performance (and rightly so).  I decided 
to drop it altogether.  An example:

I created a keyword list so clients would not have to type individual words 
to find what they want.  So rather than type: red, yellow, green, gray, blue 
and so on, they would only type "colors" in the query field.  It was my task, 
then, to go and fetch all those.  It's, of course a simple example but you 
can imagine the implications (sex sites, e.g.).  Initially, I used REGEXP and 
it took FOREVER to return a query.  It took on average 5-7 minutes to return 
a query to a browser searching through 1,000,000+ database.  It was totally 
unacceptable so I had to re-write it.  I have decided to write a routine that 
would simply write a query for me.  So, if I had 30 words I am interested it, 
my routine would build a query for me like:

AND (field LIKE %something% OR field LIKE %something1% OR field LIKE 
%something2%) AND NOT (field LIKE %thisthing% OR field LIKE %thisthing1%) and 
so on (it can get quite long).  As you can see, I use %word% yet still, I get 
a great performance out of it.  After re-writing my code, query of that type 
dropped to about 60 seconds running over 1,000,000+ records.  It's sevenfold+ 
improvement over MySQL's builtin REGEXP.

Doing '=' returns a query almost in less than a second to a second on the 
server.  Very rarely it's more than a few seconds.  Doing 'LIKE "something%"' 
query is also very fast but doing 'LIKE %something%' is not much slower 
either.  The longest it took for any of my queries to run was 90 seconds.  
Overall, I am very pleased with the way things go.  I was comparing how MS 
SQL would hold up against MySQL.  I tried to import 500,000 records from 
ASCII file to MS SQL and it took half a day only to completely die (I guess, 
machine ran out of resources).  That was done on a comparable machine running 
Windows NT 4.0.  The same task on MySQL took me between 1 to 5 minutes (if 
table was indexed).  Running some test against commercial databases (in my 
case, Progress.  I don't know if anybody knows it here), MySQL also came up a 
winner.  It was about 10 times faster in read queries.

I guess, what I am trying to say is that it's up to you to optimize it as 
much as possible.  Run tests and see what's best for you.

P.S.  If you let people run queries like 's%', expect long delays.  It's only 
natural.  I always tell my users that if they want speed up their queries, 
type in as much as possible.  Unless it's only absolutely necessary, use 
query of that type.

Now the hardware (don't laugh, please):

IBM Pentium II, 450MHz with 324Mb of RAM (so I have less than you do) :-)  
One thing that really bugs me, though, is the disks.  On that particular 
machine they are VERY VERY slow.  It's a simple off the assembly line machine 
so I guess, I can't really complain.  If I switched them to something better, 
I'm sure I would get even a better performance.



On Tuesday 06 February 2001 17:24, Ryan Hadley wrote:
->  I'm not much of a db admin so I really don't know how to get better
->  performance out of our database... but it seriously needs some speeding
 up. ->
->  We have this huge dictionary... It's 272235 rows.  We have to be able t
->  search through it.
->
->  We run about 800,000 visits a day.
->
->  Right now we're doing a "WHERE keyWord LIKE 'word%'"... but the "LIKE"'s
 are ->  killing the machine.  We had to upgrade our memory to 512M.  That
 helped a ->  bit, now our machine doesn't just die.  But it's still way to
 slow.  The CPU ->  is maxing out and we're hitting like 10-15% idle during
 slow periods and ->  0.0% idle during rush periods.
->
->  What can we do?  Besides of course firing me and getting a real db admin.
 :) ->
->
->  ---------------------------------------------------------------------
->  Before posting, please check:
->     http://www.mysql.com/manual.php   (the manual)
->     http://lists.mysql.com/           (the list archive)
->
->  To request this thread, e-mail <[EMAIL PROTECTED]>
->  To unsubscribe, e-mail <[EMAIL PROTECTED]>
->  Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

-- 
Bolek,

URL: http://www.bolek.com
URL: http://slash.bolek.com
e-mail: [EMAIL PROTECTED]
ICQ: 4086197, Address: 402905326

---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

Reply via email to