Jasper Bryant-Greene wrote:

Hi



I'm running a small search engine that indexes the web, but have been having
trouble with optimising it to handle the load.

There are two tables involved in searching - `pages`, which stores the ID,
URL, title and crawl date, and `words` which has two rows - `word` and
`page` where `word` is a word found on the page and `page` is the ID from
the `pages` table.



When we crawl an URL we rip all the words from the page and add them to the
`words` table with the ID of the page.



The query we use for searches is:



SELECT COUNT(words.word) AS score, words.page AS id, pages.title, pages.url
FROM words,pages WHERE pages.id=words.page AND words.word IN($words) GROUP
BY words.page ORDER BY score DESC LIMIT 10



Not sure which columns are indexed, but the main problem is almost certainly
"words.word IN($words)" - this will yield a set of rows from the 'words' table
assuming that the string "$words" contains the query terms(s), this requires an
expensive "serial walk" of the "words" table and an expensive string matching
operation associated with each row - unless the IN() operator is a lot cleverer
than I suspect.


A better approach would, perhaps, be to parse the query into an array of
words in the application and then construct suitable SQL along the lines of
.... words.word = qword1 OR words.word = qword2 OR words.word = qword3
   ...... etc with as many or few terms as required.

Index the words table on word for a further really big performance boost .



I've put the LIMIT 10 in there because it's been going slow as hell. not
only that but it's still going rather slow since we're getting rather high
load on the search engine at the moment.



If anyone could suggest ways to make it run faster that'd be great, bearing
in mind that:



a) I can't change MySQL server parameters since the host won't allow it

b)      I'd rather not start crawling again with a different method - the
words table has over 1,700,000 rows.



Thanks



Jasper Bryant-Greene
Cabbage Promotions
<mailto:[EMAIL PROTECTED]> [EMAIL PROTECTED]
<http://fatalnetwork.com/> http://fatalnetwork.com/
US: +1 (509) 691 3287
NZ: +64 (21) 232 3303









-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]



Reply via email to