Re: Speed difference between boolean full-text searches and full-text searches

Sergei Golubchik Fri, 02 Jan 2004 10:34:27 -0800

Hi!

Sorry for delayed answer - I was on vacations...


On Dec 08, Uros Kotnik wrote:
> OK, I will give you more details.
> 
> Table CDS, have 1,053,794 rows, FT index on title, 
> Data 67,646 KB, Index 70,401 KB
> 
> Table ARTISTS, Rows 292,330, FT on name,
> Data 8,096 KB 
> Index 17,218 KB
> 
> Table TRACKS, rows 13,841,930, FT on title
> Data 625,360 KB 
> Index 646,672 KB
> 
> ft_min_word_len = 3
> key_buffer_size 786432000
> 
> Explain for both SQLs gives same info :
> 
> table  type  possible_keys  key  key_len  ref  rows  Extra
> artists fulltext PRIMARY,ft_name ft_name 0   1 Using where 
> cds fulltext PRIMARY,artistIndex,ft_title ft_title 0   1 Using where 
> tracks ref PRIMARY,artistIndex PRIMARY 4 cds.cdId 13 Using where
> 
> Time for first SQL : 21 sec.
> SELECT artists.name, cds.title, tracks.title FROM artists, cds, tracks
> WHERE artists.artistid = cds.artistid AND artists.artistid =
> tracks.artistid AND cds.cdid = tracks.cdid AND MATCH (artists.name)
> AGAINST ('madonna' IN BOOLEAN MODE) AND 
> MATCH (cds.title) AGAINST ('"music mix 2001"' IN BOOLEAN MODE)
> 
> Time for second SQL : < 1 sec.
> SELECT artists.name, cds.title, tracks.title
> FROM artists, cds, tracks
> WHERE artists.artistid = cds.artistid AND artists.artistid =
> tracks.artistid AND cds.cdid = tracks.cdid AND 
> MATCH ( artists.name ) AGAINST (  'madonna' ) AND 
> MATCH ( cds.title ) AGAINST (  'music' ) AND 
> MATCH ( cds.title ) AGAINST (  'mix' ) AND 
> MATCH ( cds.title ) AGAINST (  '2001' )

Assuming,

SELECT @N=COUNT(*) FROM cds WHERE MATCH title AGAINST ('music');
SELECT @M=COUNT(*) FROM cds WHERE MATCH title AGAINST ('mix');
SELECT @K=COUNT(*) FROM cds WHERE MATCH title AGAINST ('2001');
SELECT @L=COUNT(*) FROM cds WHERE MATCH title AGAINST ('+music +mix +2001' IN BOOLEAN 
MODE);

The first query will do N+M+K index lookups and read L rows from the
disk.

The second query will do N index lookups and read N rows from the disk.

Thus, if (N+M+K) is much greater than N (it usually is)
and L is close to N (it is usually not), than first query should be much
slower. Typically L is less than min(N,M,K), thus the goal is to reduce
the number of row reads.

I agree this optimization is not the best for all situations,
optimizing this type of queries is in the todo.

btw, it could be that my guess about the slowness was wrong :)
Compare these N,M,K,L numbers yourself.

Regards,
Sergei

-- 
   __  ___     ___ ____  __
  /  |/  /_ __/ __/ __ \/ /   Sergei Golubchik <[EMAIL PROTECTED]>
 / /|_/ / // /\ \/ /_/ / /__  MySQL AB, Senior Software Developer
/_/  /_/\_, /___/\___\_\___/  Osnabrueck, Germany
       <___/  www.mysql.com

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Re: Speed difference between boolean full-text searches and full-text searches

Reply via email to