Hi all,

I have used PHP and MySQL for years but have never tackled the amount of
data that I now have (and it's expected to grow considerably). My queries
run OK when the server is not busy but they just about kill the DB when
traffic picks up.

My apologies if this is common knowledge...I've had trouble searching on
custom full text indexing because it generally brings up hits regarding the
built-in full text indexing for various DB servers. MySQL's built-in
fulltext doesn't quite do what we want.

We are currently running MySQL 4.0.16 compiled with LinuxThreads on FreeBSD.


Basically, I'm trying to optimize a search involving three tables.

Table A: (content table...currently nearly 40,000 rows and 62 MB)
 id UNSIGNED INT PRIMARY
 status VARCHAR 10 INDEXED
 category VARCHAR 20 INDEXED
 content LONGTEXT
 + other fields

Table B: (stem word index...instead of indexing the exact word, I just keep
the stem, so 'car' and 'cars' are both stored as 'car'. Table currently has
about 180,000 rows and is 9 MB)
 id UNSIGNED INT PRIMARY
 stem_word VARCHAR 30 INDEXED

Table C: (full text index...currently about 4.5 million rows and 186 MB)
 id UNSIGNED INT PRIMARY
 stem_word_id (references id in table B) UNSIGNED INT INDEXED
 content_id (references id in table A) UNSIGNED INT INDEXED


Here's how I perform the search right now.

The user enters keywords. I turn those words into a list of unique stems. I
then search for the stem IDs from Table B using the following query:

SELECT id FROM B WHERE stem_word IN ('truck','piano','move');

Using the IDs from that query (say 10, 20, 30), I run the following query:

SELECT C.content_id, COUNT(C.content_id), A.category FROM A, C WHERE
C.content_id=A.id AND A.status='Active' AND (C.stem_word_id IN (10,20,30))
GROUP BY C.content_id HAVING Count(C.content_id)=3;

I have recently also tried this query, which is a little cleaner without the
count/having stuff, but it seems about the same speed-wise:

SELECT DISTINCT C0.content_id, A.category FROM A, C C0, C C1, C C2 WHERE
C0.content_id=A.id AND A.status='Active' AND C0.stem_word_id=10 AND
C1.stem_word_id=20 AND C0.content_id=C1.content_id AND C2.stem_word=30 AND
C0.content_id=C1.content_id;

When running the EXPLAIN on both queries, both are doing 'using where; using
temporary; using filesort' on table C. I'm not sure how to avoid that.

This system has 512MB and I'm basically using the my-large.cnf file as-is.
Running mytop shows that the key efficiency is 100%, and (cache?) "ratio"
around 36%. All my tables are MyISAM right now. I tried switching to InnoDB
but it was much slower and I figured there were enough variables to
troubleshoot already without playing around with the InnoDB parameters.

So my questions:

1. Is there anything blatantly wrong with my queries?
2. Should I have designed my index table differently?
3. Any parameter in my.cnf I should modify to be different from the
my-large.cnf settings?
4. Any web resources with instructions for building customized full text
indexing...not using built-in stuff?
5. Since the content field of table A is only used for display (since the
words have been indexed), I was considering compressing the text in that
field so save DB disk space. Is that worth the effort?

Any input is appreciated. Thanks for your help.

Mike Boone
(reply to the list or contact me directly at:
http://boonedocks.net/mailmike.php3)


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to