On Tuesday 27 Nov 2001 20:17, Mike Wexler wrote: > > The query > > > > SELECT * FROM table > > WHERE MATCH (description) AGAINST ("+A +really +nice +toy" IN BOOLEAN > > MODE) ORDER BY MATCH (description) AGAINST ("+A +really +nice +toy" IN > > BOOLEAN MODE) LIMIT 10 > > > > will abort the search when it retrieves the 500th document with all the > > three words. It's impossible to say, how many index entries it will look > > at.
I take it the "IN BOOLEAN MODE" part of the AGAINST() is going to be new to 4.0.1. Incidentally, how are the WHERE clauses handled when MATCH/AGAINST is used for FTS? Given that I am seeing a fairly linear increase in query time with the increase in number of matched terms, I would guess that the FTS is performed first. Especially since limiting other constraints in the WHERE clause produces no noticeable reduction in query time. This seems to be wasteful. Considering that FTS is likely the slowest part of the query, it would probably be beneficial in terms of performance to have it execute last, with all other "simpler" constraints being satisfied first, so fewer records need to be searched. Another question - is there a way to acquire a list of words in the FTS index? Someting like SELECT Word, count(*) AS Frequency FROM FTSIndex GROUP BY Word ORDER BY FREQUENCY ASC LIMIT 100; This would allow for easier overview of what "dead" words are being indexed, and therefore allow for easier isolation of new "stop words", and reduction in unnecessary searching that FTS would have to perform, thus increasing performance. Considering that I'm really after SELECT speed, would more careful tuning of stop words be likeky to yield signifficant performance improvements? It would also be REALLY nice to have a "dynamic" list of stop words. I know you said that this is definitely planned, but it would be nice to know how soon... Another thing - it would probably be useful to gather some statistics about FTS queries performed. For example, say the terms 'perl' or 'linux' appear 10000 times in the queries (since the last checkpoint interval, or whatever). But the words 'nt' and 'vb' appear 0 times. It might be beneficial to tune the FTS in a slightly dynamic way, so that the frequently searched terms have a sort of "keep-alive", while the words that are never searched for (but are indexed) get added to the stop-word list. OTOH, if a stop word does end up being searched for a number of times that exceeds some threshold (say, 0.1%), it could be removed from the stop word list. The FTS index would then need to be updated, but if the database structure is dynamic (for example a search engine of some sort) where the records get deleted and inserted all the time, the database would be able to "learn" what are "start words" and what are "stop words". These statistics could be analyzed and acted upon when, say, ANALYZE is issued. It would make the FTS "self-tuning". Of course, the FTS feature in the TODO that allows specifiying explicitly strings that get indexed "as they are" becomes much more useful, then, as it can prevent things from being automatically added to the "stop word" list during ANALYZE/OPTIMIZE stage. Has any of this been at least thought about? I've just checked the TODO, and it doesn't appear to be there... Looking forward to 4.0.1. BTW, will the file formats be compatible? Or will it require a dump + restore of the database, when going from 4.0.0 to 4.0.1? Regards. Gordan --------------------------------------------------------------------- Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail <[EMAIL PROTECTED]> To unsubscribe, e-mail <[EMAIL PROTECTED]> Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php