Hi!

On Dec 15, Bob Sidebotham wrote:
> Thanks for the note, Sergei. I admit to being confused by the
> documentation on boolean search. In particular, exactly what the problem
> domain for the two algorithms is supposed to be (and how relevancy is
> computed with boolean search). I don't understand why boolean search has
> (it seems) a totally different approach to relevancy.
>
> Or to ask it a different way, which applications would use non-boolean
> search, and why?

A boolean search engine answers the question: which documents contain these words, or these words, and do not contain those words? It is a _boolean_ query. Every document either matches (value = TRUE) or does not (value = FALSE). The "relevance" that MySQL returns for it does not mean much - it is a simple estimate based on the number of words matched.

A boolean query - by its nature - should not be subject to stopword filtering. It is now, but that will be changed soon.

A natural language query engine is designed to find documents "about this and that". It can do it any way it wants. It is _not_ guaranteed that it will do it by comparing the query and the documents word by word. It is _not_ guaranteed that a document found will have any words in common with the query at all. It does now, but that can change. It's not a query about _words_, it's a query about their _meaning_. A document cannot simply match or mismatch - it can be partially relevant. In fact, different people will have different notions of how relevant a particular document is.

To train an nl search engine, so-called test collections are used - a set of documents and queries, where for each query a set of relevant documents is specified by a group of human experts. A natural language search engine can do anything that helps to approximate those human judgements as closely as possible. It can apply stopword filtering (to remove noise words), stemming, thesaurus expansion, complex statistics, whatever.

Thanks for the good question. The issue really requires some clarification. We'll update the manual.

> Also my understanding from the documentation and release notes is that
> boolean search is only available in 4.0.1, which is not yet available.
> Is this correct?

Almost. Some "boolean search" is available in 4.0.0 as well, but it's built on top of the existing nl-search code. In 4.0.1 it is completely rewritten from scratch, the syntax is changed in an incompatible way, and it is documented in the manual.
("it is documented" means that there should be no syntax changes - or at
least backward-compatibility will be maintained)

Regards,
Sergei

--
MySQL Development Team
Sergei Golubchik <[EMAIL PROTECTED]>
MySQL AB, http://www.mysql.com/
Osnabrueck, Germany
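
P.S. To make the contrast between the two kinds of query concrete, here is a toy sketch in Python. It is not how MySQL implements either mode. The sample documents, the function names boolean_match and nl_scores, and the TF-IDF-style weighting are all assumptions chosen only for illustration: the first function gives a plain TRUE/FALSE answer, the second gives every document a graded score.

```python
import math
from collections import Counter

# Hypothetical sample collection, for illustration only.
DOCS = {
    1: "mysql full text search tutorial",
    2: "boolean search operators in mysql",
    3: "cooking pasta at home",
}


def tokenize(text):
    """Split on whitespace and lowercase; no stemming, no stopwords."""
    return text.lower().split()


def boolean_match(doc, required=(), excluded=()):
    """Boolean query: the document either matches (True) or not (False)."""
    words = set(tokenize(doc))
    has_all_required = all(w in words for w in required)
    has_any_excluded = any(w in words for w in excluded)
    return has_all_required and not has_any_excluded


def nl_scores(query, docs):
    """Toy natural-language ranking: a TF-IDF-like score per document.

    This is an assumed weighting, not MySQL's formula. Rarer words
    contribute more; a document can be partially relevant.
    """
    n = len(docs)
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    scores = {}
    for doc_id, doc_counts in counts.items():
        score = 0.0
        for word in tokenize(query):
            # Number of documents that contain this query word.
            df = sum(1 for c in counts.values() if word in c)
            if df:
                score += doc_counts[word] * math.log((n + 1) / df)
        scores[doc_id] = score
    return scores


if __name__ == "__main__":
    for doc_id, text in DOCS.items():
        print(doc_id, boolean_match(text, required=("mysql",), excluded=("boolean",)))
    print(nl_scores("mysql boolean search", DOCS))
```

With the boolean function a document is simply in or out of the result. With the scoring function the same collection comes back ordered by how much of the query each document covers, and that ordering is the part a real engine is free to compute in any way that better approximates human judgements.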