I have spent quite some time trying to tune up fulltext indexing
(coincidentally I am also working with the DMOZ data in my database).
I too noticed how slow it goes when you order by relevancy... but there is
no reason to order by the relevancy because it returns the results in
descending order anyway. I ran the "gossamer threads" search on my table and
got results in .88 seconds. A bit slower than you but that could be because
I have only 512 megs of RAM.
One solution I came up with was to build a "cache" table that caches results
of using the fulltext search. I preload the cache table by using top 500
search terms that you can get from various search engines that publish that
information.
I also have tried doing custom search routines but they were dreadfully slow
when a search term occurred more than a few thousand times. I have custom
search routines for the category searching in the DMOZ though because it
only has a few hundred thousand records.
Paul Rydell
SearchShots.com - See your search.
-----Original Message-----
From: Alex Krohn [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, January 23, 2001 1:12 PM
To: [EMAIL PROTECTED]
Subject: fulltext search performance
Hi,
Has anyone used the fulltext indexes in 3.23 and can share their
experiences? I've only done limited testing, but was quite disapointed
with performance.
We have an import of DMoz data into a table with Title, URL and
Description fields as a fulltext index. There are 2.5 million rows that
takes up about 600 megs of data.
A search ordered by relevancy takes about 10 seconds which isn't
acceptable:
mysql> select ID, Title, match(Title,URL,Description) AGAINST ('gossamer
threads') as x
mysql> from Links having x order by x desc limit 5;
+--------+-----------------------------+-----------------+
| ID | Title | x |
+--------+-----------------------------+-----------------+
| 534518 | Gossamer Threads | 26.504848142061 |
| 529654 | Gossamer Threads | 25.362285817498 |
| 347215 | Gossamer Threads | 24.619930445728 |
| 303514 | Gossamer Threads Scripts | 20.181361130953 |
| 223327 | The Gossamer Project- Fluky | 17.176138340563 |
+--------+-----------------------------+-----------------+
5 rows in set (10.05 sec)
Worse still, it doesn't cache well, and it always takes 10 seconds.
If I turn off the order by it works well, the first hit is slow, but
future searches are < 1 second. However, as you can see the results are
pretty poor.
mysql> select ID, Title, match(Title,URL,Description) AGAINST ('gossamer
threads') as x
mysql> from Links having x limit 5;
+-------+-------------------------------------------+-----------------+
| ID | Title | x |
+-------+-------------------------------------------+-----------------+
| 78206 | Mussel Man | 6.4472288734643 |
| 81368 | The Blacksmiths Compendium | 5.7507258841122 |
| 81519 | Jill Cater Nixon's The World of Blackwork | 6.5658374638893 |
| 81544 | Caron Collection Threads | 13.784135416084 |
| 81646 | Salt & Pepper Cross Stitch | 6.1446464319279 |
+-------+-------------------------------------------+-----------------+
5 rows in set (0.44 sec)
Is there anything I can be doing to improve this? I'm running 3.23.28
on a dual piii-800 with 1gig of ram and a u160 scsi disk. We have built
our own custom search routines that do full text altavista style
searches using two internal tables:
mysql> describe Links_Word_List;
+-----------+------------------+------+-----+---------+----------------+----
-----------------------------+
| Field | Type | Null | Key | Default | Extra |
Privileges |
+-----------+------------------+------+-----+---------+----------------+----
-----------------------------+
| Word_ID | int(10) unsigned | | PRI | NULL | auto_increment |
select,insert,update,references |
| Word | varchar(50) | | UNI | | |
select,insert,update,references |
| Frequency | int(11) | | | 0 | |
select,insert,update,references |
+-----------+------------------+------+-----+---------+----------------+----
-----------------------------+
3 rows in set (0.00 sec)
mysql> describe Links_Score_List;
+----------+------------------+------+-----+---------+-------+--------------
-------------------+
| Field | Type | Null | Key | Default | Extra | Privileges
|
+----------+------------------+------+-----+---------+-------+--------------
-------------------+
| Word_ID | int(10) unsigned | | MUL | 0 | |
select,insert,update,references |
| Item_ID | int(10) unsigned | | MUL | 0 | |
select,insert,update,references |
| Score | int(11) | | | 0 | |
select,insert,update,references |
| Position | int(11) | | | 0 | |
select,insert,update,references |
+----------+------------------+------+-----+---------+-------+--------------
-------------------+
4 rows in set (0.00 sec)
and the same searches come back in less then 3 seconds the first hit and
under 1 second thereafter (ordered by relevancy).
I'd love to hear any comments about this..
Cheers,
Alex
---------------------------------------------------------------------
Before posting, please check:
http://www.mysql.com/manual.php (the manual)
http://lists.mysql.com/ (the list archive)
To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
---------------------------------------------------------------------
Before posting, please check:
http://www.mysql.com/manual.php (the manual)
http://lists.mysql.com/ (the list archive)
To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php