Hi,
I am trying to speed up the full text searching on my MySQL database. I currently have
one table that has approximately 90,000 rows added to it each 24 hour period in
batches of up to 200 rows constantly. These rows are will be kept for approximately 60
days, however I am currently working with test data consisting of about 7 days worth
of data (630,000 rows).
The fulltext index is over two fields and I have 7 other indexes, each over a single
field. Queries are always done over at least the fulltext fields and one other indexed
field. I am using MySQL verison 4.0.17-standard-log (binary from mysql.com) on Red Hat
9.0 Linux 2.4.20-16.9 #1 on an Intel 2.6GHz with 2GB RAM. I have read a considerable
amount of the MySQL book and searched on the FAQ lists and on google for some ideas
and here's what I have tried to speed up the searches so far....
Initially test searches would return a matching result set of 7,000 rows in
approximately 12 secs (longer if inserts are being made on the table). The test query
looks like:
SELECT DISTINCT indexed_field_1 FROM table WHERE (MATCH
(fulltext_field_one,fulltext_field_two) AGAINST ('test words' IN BOOLEAN MODE)) AND
indexed_field_2 = 'aword' ORDER BY indexed_field_3 DESC
Firstly I tried allocating more memory to the MySQL server according to the advice in
the MySQL book, however this did not improve the response times. I then tried putting
a LIMIT 10 clause in the statement as the user only wants to look at a small range of
the results (in my testing 10 rows). This improved the query time to about 6 secs.
I then talked to another person using a similar sized database with similar indexes
and queries and they were managing to get results back in 2 - 3 secs. I looked at
their queries and saw that they queries lack an ORDER BY clause, i.e.
SELECT DISTINCT field_1 FROM table WHERE (MATCH
(fulltext_field_one,fulltext_field_two) AGAINST ('test words' IN BOOLEAN MODE)) AND
field_2 = 'aword' LIMIT 10
so I tried this on my database.... and my queries came back in under 1 second. This
was great, and I am guessing is down to the lack of sorting required by the ORDER BY
clause. I then tried to think of a way to store the data in such a way as an ORDER BY
would not be needed and so I tried storing the data with the PRIMARY KEY as the field
I always order by. (btw.. the field is a date field). I had read on the MySQL site
that the PRIMARY KEY is a clustered key and therefore the data rows are stored in the
order of the PRIMARY KEY.
After recreating the table with the ORDER BY field now as the PRIMARY KEY I tried the
query again with out the ORDER BY clause but the rows seem to come out in the order
that they were inserted into the table. This means that I am always getting the
earliest inserted 10 rows that match the query rather than the latest inserted 10 rows
which is what my users want. Maybe I misread/misunderstood the text I found on the
MySQL site about the clustered index/primary key field.
If anyone has any ideas on how I can try to avoid the need of the ORDER BY clause, yet
always get the data results ordered in a certain manner ( I know that this seems like
a contradiction! ) by doing something with the way the data set is stored, or in fact
any other way possible that would be great. Other comments/ideas on where I may be
going wrong or other options I have not appeared to have considered would be great as
well.
Thanks in advance for all of you who take the time to read this and those who are kind
enough to reply.
Tom