Hi all, I have used PHP and MySQL for years but have never tackled the amount of data that I now have (and it's expected to grow considerably). My queries run OK when the server is not busy but they just about kill the DB when traffic picks up.
My apologies if this is common knowledge...I've had trouble searching on custom full text indexing because it generally brings up hits regarding the built-in full text indexing for various DB servers. MySQL's built-in fulltext doesn't quite do what we want. We are currently running MySQL 4.0.16 compiled with LinuxThreads on FreeBSD. Basically, I'm trying to optimize a search involving three tables. Table A: (content table...currently nearly 40,000 rows and 62 MB) id UNSIGNED INT PRIMARY status VARCHAR 10 INDEXED category VARCHAR 20 INDEXED content LONGTEXT + other fields Table B: (stem word index...instead of indexing the exact word, I just keep the stem, so 'car' and 'cars' are both stored as 'car'. Table currently has about 180,000 rows and is 9 MB) id UNSIGNED INT PRIMARY stem_word VARCHAR 30 INDEXED Table C: (full text index...currently about 4.5 million rows and 186 MB) id UNSIGNED INT PRIMARY stem_word_id (references id in table B) UNSIGNED INT INDEXED content_id (references id in table A) UNSIGNED INT INDEXED Here's how I perform the search right now. The user enters keywords. I turn those words into a list of unique stems. I then search for the stem IDs from Table B using the following query: SELECT id FROM B WHERE stem_word IN ('truck','piano','move'); Using the IDs from that query (say 10, 20, 30), I run the following query: SELECT C.content_id, COUNT(C.content_id), A.category FROM A, C WHERE C.content_id=A.id AND A.status='Active' AND (C.stem_word_id IN (10,20,30)) GROUP BY C.content_id HAVING Count(C.content_id)=3; I have recently also tried this query, which is a little cleaner without the count/having stuff, but it seems about the same speed-wise: SELECT DISTINCT C0.content_id, A.category FROM A, C C0, C C1, C C2 WHERE C0.content_id=A.id AND A.status='Active' AND C0.stem_word_id=10 AND C1.stem_word_id=20 AND C0.content_id=C1.content_id AND C2.stem_word=30 AND C0.content_id=C1.content_id; When running the EXPLAIN on both queries, both are doing 'using where; using temporary; using filesort' on table C. I'm not sure how to avoid that. This system has 512MB and I'm basically using the my-large.cnf file as-is. Running mytop shows that the key efficiency is 100%, and (cache?) "ratio" around 36%. All my tables are MyISAM right now. I tried switching to InnoDB but it was much slower and I figured there were enough variables to troubleshoot already without playing around with the InnoDB parameters. So my questions: 1. Is there anything blatantly wrong with my queries? 2. Should I have designed my index table differently? 3. Any parameter in my.cnf I should modify to be different from the my-large.cnf settings? 4. Any web resources with instructions for building customized full text indexing...not using built-in stuff? 5. Since the content field of table A is only used for display (since the words have been indexed), I was considering compressing the text in that field so save DB disk space. Is that worth the effort? Any input is appreciated. Thanks for your help. Mike Boone (reply to the list or contact me directly at: http://boonedocks.net/mailmike.php3) -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]