The MATCH ... AGAINST syntax returns a number indicating how relevant a
particular record is to the specified criteria. I would like to generate
similar numbers for phrase searches; please read on.

I have written code that generates an SQL query allowing phrase
searching without IN BOOLEAN MODE. (I do not wish to use the 4.0.1
alpha in production, and it doesn't seem to rank documents well at all
anyway.) Currently, however, I rank the results only by the number of
occurrences of the phrase. Here is an example SQL query for the
phrase "phrase search":

SELECT id, title,
       (LENGTH(body) - LENGTH(REPLACE(LOWER(body), LOWER('phrase search'), '')))
         / LENGTH('phrase search') AS relevance
FROM articles
WHERE MATCH (body) AGAINST ('phrase search')
  AND LOCATE('phrase search', body) > 0
ORDER BY relevance DESC
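The relevance expression in the query above counts occurrences indirectly: it deletes every copy of the phrase and divides the drop in length by the phrase length. A minimal Python mirror of that arithmetic follows; `phrase_relevance` is an illustrative name, and the sketch assumes ASCII text, so that Python's character-based `len()` matches MySQL's byte-based `LENGTH()` and lower-casing never changes the string length.

```python
def phrase_relevance(body: str, phrase: str) -> float:
    """Python mirror of the SQL relevance expression:

    (LENGTH(body) - LENGTH(REPLACE(LOWER(body), LOWER(phrase), ''))) / LENGTH(phrase)

    Assumes ASCII text, so lower() never changes the string length and
    len() (characters) matches MySQL's LENGTH() (bytes).
    """
    if not phrase:
        raise ValueError("phrase must be non-empty")
    # Delete every non-overlapping, case-insensitive copy of the phrase.
    stripped = body.lower().replace(phrase.lower(), "")
    # Each deleted copy shortened the text by len(phrase) characters.
    return (len(body) - len(stripped)) / len(phrase)


text = "Phrase search is fun. A phrase search in MySQL beats no PHRASE SEARCH."
print(phrase_relevance(text, "phrase search"))  # 3.0
# Equivalent to a direct case-insensitive count of non-overlapping matches:
print(text.lower().count("phrase search"))      # 3
```

Like SQL `REPLACE`, Python's `str.replace` removes non-overlapping matches from left to right, so the result is always a whole number of occurrences.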

This might return something like this:

+----+-----------------------------+-----------+
| id | title                       | relevance |
+----+-----------------------------+-----------+
|  1 | Phrase searching with MySQL |       9.0 |
|  2 | MySQL Full-Text Search      |       7.0 |
|  3 | Searching Web Sites         |       2.0 |
+----+-----------------------------+-----------+

This ranks documents reasonably well, but not as accurately as MATCH
ranks a single word. However, when a phrase search is combined with a
single-word search, it becomes difficult to produce one sensible ranking.

From the MySQL documentation (for the MATCH syntax): "Relevance is
computed based on the number of words in the row, the number of unique
words in that row, the total number of words in the collection, and the
number of documents (rows) that contain a particular word." How can I
apply a similar method to my phrase ranking?
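One way to borrow ingredients from that description is a TF-IDF-style weight that treats the whole phrase as if it were a single term. The sketch below is hypothetical: `count_phrase` and `phrase_weight` are illustrative names, the log damping and the +1 smoothing are assumptions, and this is not the formula MySQL uses internally. It combines the occurrences of the phrase in a row, the number of words in that row, the number of rows in the collection, and the number of rows containing the phrase.

```python
import math
from typing import Sequence


def count_phrase(body: str, phrase: str) -> int:
    """Case-insensitive count of non-overlapping phrase occurrences."""
    return body.lower().count(phrase.lower())


def phrase_weight(body: str, phrase: str, corpus: Sequence[str]) -> float:
    """Hypothetical TF-IDF-style weight for a phrase in one document.

    NOT MySQL's internal formula; it only reuses similar ingredients.
    """
    words = body.split()
    tf = count_phrase(body, phrase)
    if tf == 0 or not words:
        return 0.0
    n_docs = len(corpus)
    # Rows containing the phrase; at least 1 in case `body` is not in `corpus`.
    df = max(1, sum(1 for doc in corpus if count_phrase(doc, phrase) > 0))
    # Log-damped occurrence count, normalised by document length so that
    # long rows are not favoured merely for being long.
    tf_part = (1.0 + math.log(tf)) / len(words)
    # Smoothed inverse document frequency: rarer phrases weigh more, and the
    # +1 keeps the weight positive even if every row contains the phrase.
    idf = math.log((n_docs + 1) / df)
    return tf_part * idf


corpus = [
    "phrase search phrase search with mysql",
    "mysql full text search supports phrase search",
    "searching web sites",
]
weights = [phrase_weight(doc, "phrase search", corpus) for doc in corpus]
```

With this made-up corpus the first row (two occurrences in six words) outranks the second (one occurrence in seven words), and the third row scores 0.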

My phrase ranking is just a whole number counting occurrences. Simply
adding it to the MATCH value will not produce a usable ranking, since
the two numbers are on unrelated scales. How can I best combine the
number MATCH generates for a single word with my ranking for a phrase?
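Because the two numbers live on different scales, one common workaround is to rescale each score across the result set before blending them. The sketch below is hypothetical: `scale_by_max`, `combined_relevance` and the weighting parameter `alpha` are illustrative names, and dividing by the maximum (rather than some other normalisation) is an assumption chosen so that zero scores stay zero.

```python
from typing import List, Sequence


def scale_by_max(scores: Sequence[float]) -> List[float]:
    """Divide by the largest score so the best row gets 1.0 and zeros stay 0."""
    top = max(scores, default=0.0)
    if top <= 0:
        return [0.0 for _ in scores]
    return [s / top for s in scores]


def combined_relevance(match_scores: Sequence[float],
                       phrase_scores: Sequence[float],
                       alpha: float = 0.5) -> List[float]:
    """Hypothetical blend: alpha * scaled MATCH + (1 - alpha) * scaled phrase."""
    if len(match_scores) != len(phrase_scores):
        raise ValueError("score lists must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    m = scale_by_max(match_scores)
    p = scale_by_max(phrase_scores)
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(m, p)]


# Made-up scores for three rows: MATCH values and phrase occurrence counts.
print(combined_relevance([10.2, 5.1, 0.0], [2, 9, 0]))
```

Raising `alpha` favours the word-level MATCH weighting; lowering it favours exact-phrase frequency. The right value would have to be tuned against real queries.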

I suppose I could multiply my phrase ranking by the MATCH ranking of all
the words in that phrase, like this:

SELECT id, title,
       MATCH (body) AGAINST ('phrase search')
       * (LENGTH(body) - LENGTH(REPLACE(LOWER(body), LOWER('phrase search'), '')))
       / LENGTH('phrase search') AS relevance
FROM ...

However, I'm not sure whether this is the most accurate method.
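To see how the multiplication behaves, here is a small sketch comparing it with a log-damped variant. `product_relevance` mirrors the proposed query; `damped_relevance` is a hypothetical alternative of my own, not something MySQL provides, and the scores in `rows` are made up for illustration.

```python
import math


def product_relevance(match_score: float, occurrences: int) -> float:
    """The proposed ranking: MATCH(...) * number of phrase occurrences."""
    return match_score * occurrences


def damped_relevance(match_score: float, occurrences: int) -> float:
    """Hypothetical variant: MATCH(...) * (1 + ln(occurrences)).

    Log-damping stops a row that merely repeats the phrase many times
    from overwhelming the word-level weighting MATCH already provides.
    """
    if occurrences <= 0:
        return 0.0
    return match_score * (1.0 + math.log(occurrences))


# Made-up rows: id -> (MATCH value, phrase occurrences).
rows = {1: (1.2, 9), 2: (3.0, 2)}

by_product = sorted(rows, key=lambda i: product_relevance(*rows[i]), reverse=True)
by_damped = sorted(rows, key=lambda i: damped_relevance(*rows[i]), reverse=True)
print(by_product)  # [1, 2]
print(by_damped)   # [2, 1]
```

With the plain product, the row that repeats the phrase nine times wins despite a weaker MATCH value; damping the count reverses that order. Both versions share one property of multiplication: a row whose MATCH value is 0 scores 0 however often the phrase appears.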

Any other thoughts would be greatly appreciated!

Regards,
Duncan Maitland