Hi!
I'm running a large web application on MySQL 4.0.14.
The application searches a large table (500,000 rows).
The WHERE clause for these searches is sometimes a combination of different
columns in the table, and I have noticed some very sub-optimal index usage
when the search includes a FULLTEXT-indexed column. MySQL seems to ALWAYS
use the fulltext index in these searches, even if, for instance, the
PRIMARY index would be far better.
How to repeat:
CREATE TABLE testTable (
usrName varchar(40) NOT NULL default '',
age tinyint(4) default NULL,
music text,
film text,
otherInterests text,
PRIMARY KEY (usrName),
KEY ageIndex (age),
FULLTEXT KEY music (music),
FULLTEXT KEY film (film),
FULLTEXT KEY otherInterests (otherInterests)
) TYPE=MyISAM;
INSERT INTO testTable VALUES ('kalle',14,'Eminem, hiphop, heavy metal','matrix, lord of the rings, the ring','football');
INSERT INTO testTable VALUES ('pelle',16,'Jazz, blues','Steve McQueen','cars');
INSERT INTO testTable VALUES ('olle',14,'Slipknot, Eminem','The Matrix','nothing');
INSERT INTO testTable VALUES ('lasse',15,'Reggae, Eminem','Rockers','Reading, football');
INSERT INTO testTable VALUES ('nisse',15,'Reggae, Eminem','The Matrix','Reading, football');
Examples:
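mysql> EXPLAIN SELECT * FROM testTable WHERE usrName='olle'
    -> AND MATCH (film) AGAINST ('matrix' IN BOOLEAN MODE);
+-----------+--------+---------------+---------+---------+-------+------+-------------+
| table     | type   | possible_keys | key     | key_len | ref   | rows | Extra       |
+-----------+--------+---------------+---------+---------+-------+------+-------------+
| testTable | eq_ref | PRIMARY,film  | PRIMARY |      40 | const |    1 | Using where |
+-----------+--------+---------------+---------+---------+-------+------+-------------+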
1 row in set (0.00 sec)
This is ok - the PRIMARY index is selected...
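mysql> EXPLAIN SELECT * FROM testTable WHERE usrName IN ('olle','pelle')
    -> AND MATCH (film) AGAINST ('matrix' IN BOOLEAN MODE);
+-----------+----------+---------------+------+---------+-----+------+-------------+
| table     | type     | possible_keys | key  | key_len | ref | rows | Extra       |
+-----------+----------+---------------+------+---------+-----+------+-------------+
| testTable | fulltext | PRIMARY,film  | film |       0 |     |    1 | Using where |
+-----------+----------+---------------+------+---------+-----+------+-------------+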
1 row in set (0.00 sec)
But here the fulltext index film is selected!
I think the best choice would be to always use the PRIMARY index if
possible!
In this small table it makes no difference, of course, but in my application
I have over 500,000 rows. When searching for very common films (that match a
lot of rows) this leads to very poor performance. Using the PRIMARY index
gives A LOT better performance in all cases I have tested.
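mysql> EXPLAIN SELECT * FROM testTable WHERE usrName IN ('olle','pelle')
    -> AND age=14 AND MATCH (film) AGAINST ('matrix' IN BOOLEAN MODE);
+-----------+----------+-----------------------+------+---------+-----+------+-------------+
| table     | type     | possible_keys         | key  | key_len | ref | rows | Extra       |
+-----------+----------+-----------------------+------+---------+-----+------+-------------+
| testTable | fulltext | PRIMARY,ageIndex,film | film |       0 |     |    1 | Using where |
+-----------+----------+-----------------------+------+---------+-----+------+-------------+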
1 row in set (0.00 sec)
Here the fulltext index film is also selected.
PRIMARY or ageIndex would be better...
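mysql> EXPLAIN SELECT * FROM testTable ignore index (film)
    -> WHERE usrName IN ('olle','pelle') AND age=14
    -> AND MATCH (film) AGAINST ('matrix' IN BOOLEAN MODE);
+-----------+------+------------------+----------+---------+-------+------+-------------+
| table     | type | possible_keys    | key      | key_len | ref   | rows | Extra       |
+-----------+------+------------------+----------+---------+-------+------+-------------+
| testTable | ref  | PRIMARY,ageIndex | ageIndex |       2 | const |    2 | Using where |
+-----------+------+------------------+----------+---------+-------+------+-------------+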
1 row in set (0.00 sec)
If I use IGNORE INDEX (film), it works better.
This is actually how I deal with it in my application: if the search
contains very limiting WHERE conditions (on columns other than the FULLTEXT
ones), I add IGNORE INDEX (music,film,otherInterests) to the query...
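As an illustration only, that workaround could be sketched as below. This assumes a Python application layer and a DB-API driver with %s placeholders (for example MySQLdb). The function name build_search_query and its parameters are hypothetical, and treating any usrName or age filter as "very limiting" is a simplification.

```python
# Hypothetical helper: names and parameters are illustrative only.
FULLTEXT_INDEXES = ("music", "film", "otherInterests")  # index names from testTable


def build_search_query(usr_names=None, age=None, film_terms=None):
    """Return (sql, params) for a search on testTable.

    If a non-fulltext filter (usrName or age) is combined with a fulltext
    search, an IGNORE INDEX hint for the FULLTEXT indexes is added.
    """
    conditions, params = [], []
    if usr_names:
        placeholders = ", ".join(["%s"] * len(usr_names))
        conditions.append("usrName IN (" + placeholders + ")")
        params.extend(usr_names)
    if age is not None:
        conditions.append("age = %s")
        params.append(age)
    # Simplifying assumption: any usrName/age filter counts as "very limiting".
    restrictive = bool(conditions)
    if film_terms:
        conditions.append("MATCH (film) AGAINST (%s IN BOOLEAN MODE)")
        params.append(film_terms)
    hint = ""
    if restrictive and film_terms:
        hint = " IGNORE INDEX (" + ", ".join(FULLTEXT_INDEXES) + ")"
    sql = "SELECT * FROM testTable" + hint
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params
```

Calling build_search_query(usr_names=["olle", "pelle"], age=14, film_terms="matrix") yields a query with the hint, while a pure fulltext search (film_terms only) is left without it so the FULLTEXT index can still be used.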
The main problem with all this seems to be that when a fulltext-indexed
column is used in the WHERE clause, the MySQL optimizer will ALWAYS choose
the FULLTEXT index, since it assumes that this will match only 1 row
(always 1 row).
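To make the effect concrete, here is a toy model of that selection rule, written in Python. It is not MySQL's optimizer code; it only assumes that the candidate index with the lowest estimated row count wins. The ageIndex estimate of 2 comes from the EXPLAIN output above, while the PRIMARY estimate of 2 for the two-name IN list is an assumption.

```python
# Toy model only: this is NOT MySQL's optimizer code, just the rule
# "the index with the lowest estimated row count wins".

def pick_index(estimates):
    """Return the index name with the smallest estimated row count.

    On a tie, the candidate listed first wins (dicts keep insertion order).
    """
    return min(estimates, key=estimates.get)


def candidates(fulltext_estimate):
    # ageIndex = 2 is the estimate EXPLAIN reported for age=14;
    # PRIMARY = 2 for usrName IN ('olle','pelle') is an assumption.
    return {"film": fulltext_estimate, "PRIMARY": 2, "ageIndex": 2}


print(pick_index(candidates(1)))     # fixed estimate of 1 row: prints "film"
print(pick_index(candidates(1000)))  # larger assumed estimate: prints "PRIMARY"
```

With the fixed estimate of 1 the fulltext index can never lose to any other candidate, which matches the EXPLAIN results above.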
I understand that it is very hard to do a correct analysis and estimation of
rows for a BOOLEAN FULLTEXT-search, but to assume that it will always return
1 row is not at all optimal in my opinion!
If MySQL instead assumed - say 1000 rows - we would get far better results