"Only" 272235??? I enter on average about 75,000 to 80,000 records a day
(and sometimes I break 100,000 records a day; I do monthly rotations, so
it's easy to calculate how big my table gets). Granted, I don't know what
your table structure is, but mine is very simple. All I do is run imports
every morning of a CACI.txt file generated by EchoScope (a network traffic
monitoring tool). I have about 15 fields per record (as far as I can
remember) and most of them are indexed. I only wish I could put a unique
index on more than just the ID field, but due to the nature of what I'm
doing, I have to live with regular indexes. Some of the fields are longer
than 100 characters, so I made sure that my indexes are no longer than 20
characters. I played around a lot before I got the performance I have now.
One thing I noticed was that if I indexed more than 20 characters (at times
I was crazy enough to try indexing all 255 characters), performance
actually dropped drastically.
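The prefix-index idea above can be sketched like this. The table and column names below are hypothetical (the post doesn't give the real schema); the point is only that MySQL lets you declare an index over the first N characters of a long column, and the DDL is easy to generate:

```python
# Hypothetical schema -- the real table/column names aren't in the post.
TABLE = "traffic_log"
LONG_COLUMNS = ["src_host", "dst_host", "url"]  # fields over 100 chars
PREFIX = 20  # index only the first 20 characters, per the advice above

def prefix_index_ddl(table, columns, prefix):
    """Build MySQL ALTER TABLE statements that index only a column prefix."""
    return [
        f"ALTER TABLE {table} ADD INDEX idx_{col} ({col}({prefix}));"
        for col in columns
    ]

for stmt in prefix_index_ddl(TABLE, LONG_COLUMNS, PREFIX):
    print(stmt)
```

A shorter prefix keeps the index small enough to stay cache-friendly, which is presumably why indexing all 255 characters hurt so badly.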
Now, a few things I want to share:
1. I use my database as a back-end for my Perl scripts and a web interface
for easy data querying by end users. That said, I can't really afford long
delays between a query and the data showing up in the browser.
2. In my queries I use =, LIKE, and I used to use REGEXP within MySQL. Of
these three, REGEXP was by far the WORST performer (and rightly so), so I
decided to drop it altogether. An example:
I created a keyword list so clients would not have to type individual words
to find what they want. So rather than typing red, yellow, green, gray, blue
and so on, they would only type "colors" in the query field. It was then my
task to go and fetch all of those. It's a simple example, of course, but you
can imagine the implications (sex sites, e.g.). Initially I used REGEXP, and
it took FOREVER to return a query: on average 5-7 minutes to return results
to a browser searching through a 1,000,000+ row database. That was totally
unacceptable, so I had to rewrite it. I decided to write a routine that
simply builds the query for me. So, if I had 30 words I was interested in,
my routine would build a query like:
AND (field LIKE '%something%' OR field LIKE '%something1%' OR field LIKE
'%something2%') AND NOT (field LIKE '%thisthing%' OR field LIKE
'%thisthing1%') and so on (it can get quite long). As you can see, I use
'%word%' and still get great performance out of it. After rewriting my code,
a query of that type dropped to about 60 seconds running over 1,000,000+
records. That's more than a sevenfold improvement over MySQL's built-in
REGEXP.
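The query-building routine described above was written in Perl; here is a minimal sketch of the same idea in Python (function and field names are mine, not from the post). It expands an include list and an exclude list into the AND/OR chain of LIKE clauses shown above:

```python
def build_where(field, include, exclude):
    """Expand keyword lists into a chain of LIKE clauses, mirroring the
    generated-query approach described above."""
    def esc(word):
        # Escape LIKE wildcards and quotes in user-supplied words.
        return (word.replace("\\", "\\\\").replace("%", "\\%")
                    .replace("_", "\\_").replace("'", "''"))
    def likes(words):
        return " OR ".join(f"{field} LIKE '%{esc(w)}%'" for w in words)
    clause = f"({likes(include)})"
    if exclude:
        clause += f" AND NOT ({likes(exclude)})"
    return clause

print(build_where("field", ["red", "yellow"], ["maroon"]))
# (field LIKE '%red%' OR field LIKE '%yellow%') AND NOT (field LIKE '%maroon%')
```

In a real script you would look the word lists up from the keyword table first, then append the generated clause to the base query.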
Doing '=' returns a query in under a second, almost always; very rarely is
it more than a few seconds. A 'LIKE "something%"' query is also very fast,
and even 'LIKE "%something%"' is not much slower. The longest any of my
queries has ever taken to run was 90 seconds.
Overall, I am very pleased with the way things are going. I compared how MS
SQL would hold up against MySQL: I tried to import 500,000 records from an
ASCII file into MS SQL, and it took half a day only to die completely (I
guess the machine ran out of resources). That was done on a comparable
machine running Windows NT 4.0. The same task on MySQL took me between 1
and 5 minutes (if the table was indexed). Running some tests against
commercial databases (in my case Progress; I don't know if anybody here
knows it), MySQL also came out the winner. It was about 10 times faster on
read queries.
I guess what I'm trying to say is that it's up to you to optimize it as
much as possible. Run tests and see what's best for you.
P.S. If you let people run queries like 's%', expect long delays. It's only
natural. I always tell my users that if they want to speed up their
queries, they should type in as much as possible. Use a query of that type
only when it's absolutely necessary.
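One way to enforce that advice at the web interface is to reject patterns with too short a literal prefix before the first wildcard. This is a hypothetical helper of my own devising, not something from the post; the threshold is arbitrary:

```python
def acceptable_pattern(term, min_prefix=3):
    """Require at least min_prefix literal leading characters before any
    SQL LIKE wildcard, so the query can use an index prefix instead of
    scanning the whole table."""
    literal = term.split("%", 1)[0].split("_", 1)[0]
    return len(literal) >= min_prefix

assert acceptable_pattern("something%")      # anchored, index-friendly
assert not acceptable_pattern("s%")          # too short a prefix
assert not acceptable_pattern("%something")  # leading wildcard: full scan
```

Rejected terms can be bounced back to the user with a "please type a few more characters" message instead of tying up the server.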
Now the hardware (don't laugh, please):
IBM Pentium II, 450MHz with 324Mb of RAM (so I have less than you do) :-)
One thing that really bugs me, though, is the disks. On that particular
machine they are VERY, VERY slow. It's a simple off-the-assembly-line
machine, so I guess I can't really complain. If I switched them for
something better, I'm sure I would get even better performance.
On Tuesday 06 February 2001 17:24, Ryan Hadley wrote:
-> I'm not much of a db admin so I really don't know how to get better
-> performance out of our database... but it seriously needs some speeding
-> up.
->
-> We have this huge dictionary... It's 272235 rows. We have to be able to
-> search through it.
->
-> We run about 800,000 visits a day.
->
-> Right now we're doing a "WHERE keyWord LIKE 'word%'"... but the "LIKE"'s
-> are killing the machine. We had to upgrade our memory to 512M. That
-> helped a bit, now our machine doesn't just die. But it's still way too
-> slow. The CPU is maxing out and we're hitting like 10-15% idle during
-> slow periods and 0.0% idle during rush periods.
->
-> What can we do? Besides of course firing me and getting a real db admin.
-> :)
--
Bolek,
URL: http://www.bolek.com
URL: http://slash.bolek.com
e-mail: [EMAIL PROTECTED]
ICQ: 4086197, Address: 402905326
---------------------------------------------------------------------
Before posting, please check:
http://www.mysql.com/manual.php (the manual)
http://lists.mysql.com/ (the list archive)
To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php