Re: Suboptimal index-usage with FULLTEXT-search

2003-09-17 Thread Steven Roussey
  The WHERE-clause for these searches sometimes is a combination 
  of different columns in the table, and I have noticed some 
  very sub-optimal index-usage

 Adding correct estimations for MATCH is in todo.
 Until it's done you can use USE_INDEX/IGNORE_INDEX as a simple 
 workaround.

FORCE_INDEX works in this case. I don't think USE_INDEX does since the FTS
index always reports one match and that overrides the USE_INDEX hint.

However, at this point the FTS index is not used at all. If the original
poster was always doing a query where the other columns narrowed down the
matches significantly, then it would be decided to never really use the full
text index.

Say in the fictional case of a table of messages that belong to a million
different forums. When searching inside one forum, the fulltext index would
likely always have more matches than the key on the forumid.

This won't really get fixed unless you can do a composite index of normal
and fulltext indexes.

Even with tsearch2 which is coming in Postgres 7.4, it doesn't let you do a
composite index. :( It does have some nice language configuration stuff (for
example, you can tell it that it is to be parsed as English and it will then
send it to an English stemmer and an English stopword list -- I don't know
if you can preprocess it ).

There is a very ugly workaround on this. You could fake a composite index.
Assuming that you create a separate copied table for FTS, in that table scan
all the words in the document and prefix them with the other column. First
drop all the one and two character words (and the stopwords) yourself. Then
insert something like this as the message f31334wanted f31334say
f31334hello instead of I wanted to say hello. Do the same with the search
and suddenly things work a lot faster. And uglier...

Sincerely,
Steven Roussey
http://Network54.com/ 




--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Suboptimal index-usage with FULLTEXT-search

2003-09-10 Thread Tobias Lind
Hi!
I'm running a large web application with MySQL 4.0.14.
The application is performing searches in a large table (500.000 rows).
The WHERE-clause for these searches sometimes is a combination of different
columns in the table, and I have noticed some very sub-optimal index-usage
when the search includes a FULLTEXT-indexed column. MySQL seems to ALWAYS
use the fulltext-index in these searches - even if, for instance the
PRIMARY-index would be far more optimal.

How to repeat:

CREATE TABLE testTable (
  usrName varchar(40) NOT NULL default '',
  age tinyint(4) default NULL,
  music text,
  film text,
  otherInterests text,
  PRIMARY KEY  (usrName),
  KEY ageIndex (age),
  FULLTEXT KEY music (music),
  FULLTEXT KEY film (film),
  FULLTEXT KEY otherInterests (otherInterests)
) TYPE=MyISAM;

INSERT INTO testTable VALUES ('kalle',14,'Eminem, hiphop, heavy
metal','matrix, lord of the rings, the ring','football');
INSERT INTO testTable VALUES ('pelle',16,'Jazz, blues','Steve
McQueen','cars');
INSERT INTO testTable VALUES ('olle',14,'Slipknot, Eminem','The
Matrix','nothing');
INSERT INTO testTable VALUES ('lasse',15,'Reggae,
Eminem','Rockers','Reading, football');
INSERT INTO testTable VALUES ('nisse',15,'Reggae, Eminem','The
Matrix','Reading, football');

Examples:

mysql EXPLAIN SELECT * FROM testTable WHERE usrName='olle' AND MATCH (film)
AGAINST ('matrix' IN BOOLEAN MODE);
+---++---+-+-+---+--+---
--+
| table | type   | possible_keys | key | key_len | ref   | rows |
Extra   |
+---++---+-+-+---+--+---
--+
| testTable | eq_ref | PRIMARY,film  | PRIMARY |  40 | const |1 |
Using where |
+---++---+-+-+---+--+---
--+
1 row in set (0.00 sec)
This is ok - the PRIMARY index is selected...



mysql EXPLAIN SELECT * FROM testTable WHERE usrName IN ('olle','pelle') AND
MATCH (film) AGAINST ('matrix' IN BOOLEAN MODE);
+---+--+---+--+-+--+--+-
+
| table | type | possible_keys | key  | key_len | ref  | rows |
Extra   |
+---+--+---+--+-+--+--+-
+
| testTable | fulltext | PRIMARY,film  | film |   0 |  |1 |
Using where |
+---+--+---+--+-+--+--+-
+
1 row in set (0.00 sec)
But here the fulltext index film is selected!
I think the optimal selection would be to always use PRIMARY-index if
possible!
In this small table it's ok of course, but in my application I have over
500.000 rows. When searching for very common films (that matches a lot of
rows) this leads to very poor performance. Using PRIMARY-index gives A LOT
better performance in all cases I have tested.


mysql EXPLAIN SELECT * FROM testTable WHERE usrName IN ('olle','pelle') AND
age=14 AND MATCH (film) AGAINST ('matrix' IN BOOLEAN MODE);
+---+--+---+--+-+--+
--+-+
| table | type | possible_keys | key  | key_len | ref  |
rows | Extra   |
+---+--+---+--+-+--+
--+-+
| testTable | fulltext | PRIMARY,ageIndex,film | film |   0 |  |
1 | Using where |
+---+--+---+--+-+--+
--+-+
1 row in set (0.00 sec)
Here the fulltext index film is also selected.
PRIMARY or ageIndex would be better...



mysql EXPLAIN SELECT * FROM testTable ignore index (film) WHERE usrName IN
('olle','pelle') AND age=14 AND MATCH (film) AGAINST ('matrix' IN BOOLEAN
MODE);
+---+--+--+--+-+---+--+-
+
| table | type | possible_keys| key  | key_len | ref   | rows |
Extra   |
+---+--+--+--+-+---+--+-
+
| testTable | ref  | PRIMARY,ageIndex | ageIndex |   2 | const |2 |
Using where |
+---+--+--+--+-+---+--+-
+
1 row in set (0.00 sec)
If I use IGNORE INDEX (film), it works better.
This is actuelly how I deal with it in my application - if the search
contains very limiting where-clauses (other than a FULLTEXT-columns), I add
IGNORE INDEX (music,film,otherInterests) to the query...


The main problem with all this seems to be that when a fulltext-indexed
column is used in the where-clause, The MySQL optimizer will ALWAYS choose
the FULLTEXT-index since it is assumed that this will match only 1 row
(always 1 row).
I understand that it is very hard to do a correct analysis and estimation of
rows for a BOOLEAN FULLTEXT-search, but to assume that it will always return
1 row is not at all optimal in my opinion!
If MySQL instead assumed - say 1000 rows - we would get far better results

Re: Suboptimal index-usage with FULLTEXT-search

2003-09-10 Thread Sergei Golubchik
Hi!

On Sep 10, Tobias Lind wrote:
 Hi!
 I'm running a large web application with MySQL 4.0.14.
 The application is performing searches in a large table (500.000 rows).
 The WHERE-clause for these searches sometimes is a combination of different
 columns in the table, and I have noticed some very sub-optimal index-usage
 when the search includes a FULLTEXT-indexed column. MySQL seems to ALWAYS
 use the fulltext-index in these searches - even if, for instance the
 PRIMARY-index would be far more optimal.

Adding correct estimations for MATCH is in todo.
Until it's done you can use USE_INDEX/IGNORE_INDEX as a simple workaround.
 
Regards,
Sergei

-- 
   __  ___ ___   __
  /  |/  /_ __/ __/ __ \/ /   Sergei Golubchik [EMAIL PROTECTED]
 / /|_/ / // /\ \/ /_/ / /__  MySQL AB, Senior Software Developer
/_/  /_/\_, /___/\___\_\___/  Osnabrueck, Germany
   ___/  www.mysql.com

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]