Hi!

On Nov 27, Gordan Bobic wrote:
> 
> I take it the "IN BOOLEAN MODE" part of the AGAINST() is going to be new to 
> 4.0.1.

Yes. And as it's in the manual now, changes of the syntax are unlikely.

> Incidentally, how are the WHERE clauses handled when MATCH/AGAINST is used 
> for FTS? Given that I am seeing a fairly linear increase in query time with 
> the increase in number of matched terms, I would guess that the FTS is 
> performed first. Especially since limiting other constraints in the WHERE 
> clause produces no noticeable reduction in query time. This seems to be 
> wasteful.

Yes, it is performed first.
By design, the natural language FTS engine has to build a list of all the
matched documents (as the relevance value depends on some global
statistics). So it cannot take any other constraints into account.
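
To illustrate the point about global statistics, here is a rough sketch in
Python. It is an assumption for illustration only: it uses a generic
inverse-document-frequency weight, not the actual MySQL relevance formula,
and the function name and sample rows are made up. It shows that a word's
weight changes if the set of rows is narrowed by another constraint first.

```python
import math

def idf_weights(docs):
    """Weight each word by its rarity across ALL docs (generic IDF, not MySQL's formula)."""
    n = len(docs)
    df = {}  # word -> number of docs containing it
    for text in docs:
        for word in set(text.lower().split()):
            df[word] = df.get(word, 0) + 1
    return {word: math.log(n / count) for word, count in df.items()}

# Hypothetical rows of an indexed text column.
docs = ["mysql fulltext search", "mysql boolean search", "mysql index"]

full = idf_weights(docs)        # statistics over the whole table
subset = idf_weights(docs[:2])  # as if another WHERE constraint were applied first

# The same word gets a different weight depending on which rows were counted.
print(full["search"], subset["search"])
```

With weights like these, the relevance of a row cannot be computed from a
pre-filtered subset alone, which is why the full match list is built first.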

The boolean FTS engine does not need these statistics, and it will
benefit from other constraints where possible.
I'm talking about 4.0.1 here; in 4.0.0 boolean search was built on top
of the nl fulltext search code.

> Considering that FTS is likely the slowest part of the query, it would 
> probably be beneficial in terms of performance to have it execute last, with 
> all other "simpler" constraints being satisfied first, so fewer records need 
> to be searched.
> 
> Another question - is there a way to acquire a list of words in the FTS 
> index? Something like
> 
> SELECT                Word,
>               count(*) AS Frequency
> FROM          FTSIndex
> GROUP BY              Word
> ORDER BY              FREQUENCY ASC
> LIMIT         100;

There's the myisam/ft_dump utility, which can dump the fulltext index out
of an MYI file.
Adding such functionality to mysqld would not be impossible.

> This would allow for easier overview of what "dead" words are being indexed, 
> and therefore allow for easier isolation of new "stop words", and reduction 
> in unnecessary searching that FTS would have to perform, thus increasing 
> performance. Considering that I'm really after SELECT speed, would more 
> careful tuning of stop words be likely to yield significant performance 
> improvements?

Most probably, yes. But
 - strictly speaking, it would not be in line with the boolean search
   approach.  In 4.0.1 boolean FTS is still subject to stopword
   filtering, but I hope that will change soon.
 - it would often require different stopword lists for different indexes,
   even on the same table! We do not plan to add such a feature in the
   near future.
 - it would often require a lot of manual work - periodic updating of
   the stopword list according to the recent table data.

So, there are some ideas on how to avoid all these drawbacks by automated
stopword list creation based on live data (and making the lists applicable
only to nl-fts queries). This is most probably the way fulltext search in
MySQL will be developed. It can result in a _huge_ speedup, if properly
implemented.
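
As a rough sketch of what "automated stopword list creation based on live
data" could look like, here is a small Python example. It is only an
assumption for illustration, not how mysqld does or will do it: the
function name, the sample rows and the 0.5 threshold are all made up. It
picks the words that occur in more than a given fraction of the rows.

```python
def stopword_candidates(docs, threshold=0.5):
    """Return words occurring in more than `threshold` of the docs (hypothetical helper)."""
    n = len(docs)
    df = {}  # word -> number of docs containing it
    for text in docs:
        for word in set(text.lower().split()):
            df[word] = df.get(word, 0) + 1
    return sorted(w for w, count in df.items() if count / n > threshold)

# Hypothetical rows of an indexed text column.
rows = [
    "the mysql manual",
    "the fulltext index",
    "the boolean search in mysql",
    "the stopword list",
]

candidates = stopword_candidates(rows)
print(candidates)
```

Re-running something like this periodically against the recent table data
would replace the manual updating mentioned above.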

> It would also be REALLY nice to have a "dynamic" list of stop words. I know 
> you said that this is definitely planned, but it would be nice to know how 
> soon...

Well, we plan to have plain-text .frm files this year.
Making the stopword list "dynamic" relies on this feature.

> Another thing - it would probably be useful to gather some statistics about 
> FTS queries performed.
... 
> Has any of this been at least thought about? I've just checked the TODO, and 
> it doesn't appear to be there...

These are nice ideas. They weren't thought about, but they definitely
will be - I promise :-)

> Looking forward to 4.0.1.

It should be out in a few days.

> BTW, will the file formats be compatible? Or will it require a dump + restore 
> of the database, when going from 4.0.0 to 4.0.1?

For now - there's one bit changed - and one has to rebuild the table.
The easiest way is 'ALTER TABLE ... TYPE=MYISAM', though dump+restore
will work too, of course.

Still, I'd like to make file formats fully compatible - so,
you'd better take a look at the ChangeLog section of the
manual included in the 4.0.1 distribution.

Regards,
Sergei

-- 
MySQL Development Team
   __  ___     ___ ____  __
  /  |/  /_ __/ __/ __ \/ /   Sergei Golubchik <[EMAIL PROTECTED]>
 / /|_/ / // /\ \/ /_/ / /__  MySQL AB, http://www.mysql.com/
/_/  /_/\_, /___/\___\_\___/  Osnabrueck, Germany
       <___/
