The MATCH syntax will return a number indicating the relevance of a particular record to the specified criteria. I would like to generate similar numbers for phrase searches... please read on.
I have written code which will generate an SQL query that allows phrase searching (without using IN BOOLEAN MODE - I do not wish to use 4.0.1 alpha in production, and it doesn't seem to rank documents well anyway), but currently I only rank the results by the number of occurrences of the phrase. This is an example of an SQL query for the phrase "phrase search":

    SELECT id, title,
           (LENGTH(body) - LENGTH(REPLACE(LOWER(body), LOWER('phrase search'), '')))
             / LENGTH('phrase search') AS relevance
    FROM articles
    WHERE MATCH (body) AGAINST ('phrase search')
      AND (LOCATE('phrase search', body))
    ORDER BY relevance DESC

This might return something like this:

+----+-----------------------------+-----------+
| id | title                       | relevance |
+----+-----------------------------+-----------+
|  1 | Phrase searching with MySQL |       9.0 |
|  2 | MySQL Full-Text Search      |       7.0 |
|  3 | Searching Web Sites         |       2.0 |
+----+-----------------------------+-----------+

This ranks documents in a reasonable way, but not as accurately as a single word with the MATCH syntax. However, when combining the phrase search with a single word, things become difficult.

From the MySQL documentation (for the MATCH syntax):

"Relevance is computed based on the number of words in the row, the number of unique words in that row, the total number of words in the collection, and the number of documents (rows) that contain a particular word."

How can I apply a similar method to my phrase ranking? My phrase ranking is only a whole number indicating the number of occurrences. Simply adding the two together will not produce any usable ranking. How can I best combine the number generated by MATCH for a single word with my ranking for a phrase?

I suppose I could multiply my phrase ranking by the MATCH ranking of all the words in that phrase, like this:

    SELECT id, title,
           MATCH (body) AGAINST ('phrase search')
             * (LENGTH(body) - LENGTH(REPLACE(LOWER(body), LOWER('phrase search'), '')))
             / LENGTH('phrase search') AS relevance
    FROM ...
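To make the arithmetic in these queries concrete, here is a minimal Python sketch of the same occurrence count and of the multiplied ranking. The function names and the sample score of 1.5 are made up for illustration; in the real query that score would come from MATCH ... AGAINST. The sketch assumes plain ASCII text, where lowercasing does not change the length of the string.

```python
def phrase_occurrences(body: str, phrase: str) -> float:
    """Mirror of the SQL expression:
    (LENGTH(body) - LENGTH(REPLACE(LOWER(body), LOWER(phrase), ''))) / LENGTH(phrase)
    """
    # Remove every case-insensitive occurrence of the phrase, then see how
    # many characters disappeared and divide by the phrase length.
    stripped = body.lower().replace(phrase.lower(), "")
    return (len(body) - len(stripped)) / len(phrase)


def combined_relevance(match_score: float, body: str, phrase: str) -> float:
    """Phrase count scaled by a MATCH-style score (a hypothetical input here)."""
    return match_score * phrase_occurrences(body, phrase)


body = "Phrase search is fun. phrase search again."
print(phrase_occurrences(body, "phrase search"))       # 2.0
print(combined_relevance(1.5, body, "phrase search"))  # 3.0
```

Under the multiplied form, a row that does not contain the phrase at all scores 0 whatever its MATCH score is, which is consistent with the LOCATE condition in the WHERE clause of the first query.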
However, I'm not sure if this is the most accurate method. Any other thoughts would be greatly appreciated!

Regards,
Duncan Maitland