Hi! I'm running a large web application with MySQL 4.0.14. The application is performing searches in a large table (>500.000 rows). The WHERE-clause for these searches sometimes is a combination of different columns in the table, and I have noticed some very sub-optimal index-usage when the search includes a FULLTEXT-indexed column. MySQL seems to ALWAYS use the fulltext-index in these searches - even if, for instance the PRIMARY-index would be far more optimal.
How to repeat: CREATE TABLE testTable ( usrName varchar(40) NOT NULL default '', age tinyint(4) default NULL, music text, film text, otherInterests text, PRIMARY KEY (usrName), KEY ageIndex (age), FULLTEXT KEY music (music), FULLTEXT KEY film (film), FULLTEXT KEY otherInterests (otherInterests) ) TYPE=MyISAM; INSERT INTO testTable VALUES ('kalle',14,'Eminem, hiphop, heavy metal','matrix, lord of the rings, the ring','football'); INSERT INTO testTable VALUES ('pelle',16,'Jazz, blues','Steve McQueen','cars'); INSERT INTO testTable VALUES ('olle',14,'Slipknot, Eminem','The Matrix','nothing'); INSERT INTO testTable VALUES ('lasse',15,'Reggae, Eminem','Rockers','Reading, football'); INSERT INTO testTable VALUES ('nisse',15,'Reggae, Eminem','The Matrix','Reading, football'); Examples: mysql> EXPLAIN SELECT * FROM testTable WHERE usrName='olle' AND MATCH (film) AGAINST ('"matrix"' IN BOOLEAN MODE); +-----------+--------+---------------+---------+---------+-------+------+--- ----------+ | table | type | possible_keys | key | key_len | ref | rows | Extra | +-----------+--------+---------------+---------+---------+-------+------+--- ----------+ | testTable | eq_ref | PRIMARY,film | PRIMARY | 40 | const | 1 | Using where | +-----------+--------+---------------+---------+---------+-------+------+--- ----------+ 1 row in set (0.00 sec) This is ok - the PRIMARY index is selected... mysql> EXPLAIN SELECT * FROM testTable WHERE usrName IN ('olle','pelle') AND MATCH (film) AGAINST ('"matrix"' IN BOOLEAN MODE); +-----------+----------+---------------+------+---------+------+------+----- --------+ | table | type | possible_keys | key | key_len | ref | rows | Extra | +-----------+----------+---------------+------+---------+------+------+----- --------+ | testTable | fulltext | PRIMARY,film | film | 0 | | 1 | Using where | +-----------+----------+---------------+------+---------+------+------+----- --------+ 1 row in set (0.00 sec) But here the fulltext index "film" is selected! I think the optimal selection would be to always use PRIMARY-index if possible! In this small table it's ok of course, but in my application I have over 500.000 rows. When searching for very common films (that matches a lot of rows) this leads to very poor performance. Using PRIMARY-index gives A LOT better performance in all cases I have tested. mysql> EXPLAIN SELECT * FROM testTable WHERE usrName IN ('olle','pelle') AND age=14 AND MATCH (film) AGAINST ('"matrix"' IN BOOLEAN MODE); +-----------+----------+-----------------------+------+---------+------+---- --+-------------+ | table | type | possible_keys | key | key_len | ref | rows | Extra | +-----------+----------+-----------------------+------+---------+------+---- --+-------------+ | testTable | fulltext | PRIMARY,ageIndex,film | film | 0 | | 1 | Using where | +-----------+----------+-----------------------+------+---------+------+---- --+-------------+ 1 row in set (0.00 sec) Here the fulltext index "film" is also selected. PRIMARY or ageIndex would be better... mysql> EXPLAIN SELECT * FROM testTable ignore index (film) WHERE usrName IN ('olle','pelle') AND age=14 AND MATCH (film) AGAINST ('"matrix"' IN BOOLEAN MODE); +-----------+------+------------------+----------+---------+-------+------+- ------------+ | table | type | possible_keys | key | key_len | ref | rows | Extra | +-----------+------+------------------+----------+---------+-------+------+- ------------+ | testTable | ref | PRIMARY,ageIndex | ageIndex | 2 | const | 2 | Using where | +-----------+------+------------------+----------+---------+-------+------+- ------------+ 1 row in set (0.00 sec) If I use IGNORE INDEX (film), it works better. This is actuelly how I deal with it in my application - if the search contains very limiting where-clauses (other than a FULLTEXT-columns), I add "IGNORE INDEX (music,film,otherInterests)" to the query... The main problem with all this seems to be that when a fulltext-indexed column is used in the where-clause, The MySQL optimizer will ALWAYS choose the FULLTEXT-index since it is assumed that this will match only 1 row (always 1 row). I understand that it is very hard to do a correct analysis and estimation of rows for a BOOLEAN FULLTEXT-search, but to assume that it will always return 1 row is not at all optimal in my opinion! If MySQL instead assumed - say 1000 rows - we would get far better results in most situations. Then we would use another index if it would result in less rows than 1000 (not so many in my large table). If that assumption would be incorrect, the worst thing that could happen would be that another index (with less that 1000 possible rows) would be used instead even though the fulltext-index would have been better. But that's not so bad - we would scan maximum 999 rows with the use of another index - no problem! ...not compared to my 100.000 rows scans that's the result of picking a fulltext-index when it gives a lot of matches... The best solution would of course be that MySQL could estimate the number of possible rows from the boolean fulltest-search. I guess that this is impossible. The next best thing would be to be able to manually configure the minimum estimated number of rows from a boolean search as a hint to the MySQL optimizer... Right now this estimation is always 1, I'd like to set it higher!! Best regards, Tobias Lind -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]