I am still stuck with my full text search engine. I have experemented with different approaches to seleting search results and figure out that having temporary tables is about 300 times faster than doing 'inner joins'
The table: +-------+------------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+------------------+------+-----+---------+-------+ | l_id | int(10) unsigned | | PRI | 0 | | | w_id | int(10) unsigned | | PRI | 0 | | +-------+------------------+------+-----+---------+-------+ That the only used table in the query. It contains law id and word id. For each law there are many words. There is a vocabulary which hold the word, but that does not matter, since only law_words table is used. The query is: SELECT DISTINCT w0.l_id FROM law_words as w0 inner join law_words as w1 on w0.l_id=w1.l_id inner join law_words as w2 on w0.l_id=w2.l_id inner join law_words as w3 on w0.l_id=w3.l_id WHERE w0.w_id IN (258,282,287, 615, 1101, 1949, 1968, 3417, 3574, 3578, 3643, 4345,4768, 5297, 5976,6133, 7243,7245, 9271, 9348, 11146, 11150, 11172, 11232,11847, 12542, 12859, 14811, 24839, 26653,27662) AND w1.w_id IN (405, 2017,2192, 2592, 2595, 2603, 2981, 4055, 4068, 4346,5755, 6480, 9384, 9408, 11513, 11514, 12126, 12134, 12638, 13052, 13643, 13769, 13836, 13945, 14154, 14693, 14867, 14980, 15518, 15557, 17830, 19005, 19051, 19247, 20176, 20926, 22364, 22365, 22366, 22732, 24668, 24793, 24956, 25286, 26242, 26665, 26847, 27144, 27348, 27815, 28494, 30910, 31878, 32161, 33586, 34396) AND w2.w_id IN (258,282,287, 615, 1101, 1949, 1968, 3417, 3574, 3578, 3643, 4345,4768, 5297, 5976,6133, 7243,7245, 9271, 9348, 11146, 11150, 11172, 11232,11847, 12542, 12859, 14811, 24839, 26653,27662) AND w3.w_id IN (258,282,287, 615, 1101, 1949, 1968, 3417, 3574, 3578, 3643, 4345,4768, 5297, 5976,6133, 7243,7245, 9271, 9348, 11146, 11150, 11172, 11232,11847, 12542, 12859, 14811, 24839, 26653,27662) ; It selects all laws which have all 4 words which user entered in the search form. For example, if user entered 'look', then get ids from the vocabulary of words: look, looks, looked, looky, etc, so we get a list of ids for the first word. The same goes for all other words. Then, since the search is done using AND (all words) we need to join the table with itself, so only the laws which have ALL the words are left at the end. The above query takes about 30 seconds to complete. Explain shows: +-------+-------+---------------+---------+---------+------+------+--- ---------------------------------------+ | table | type | possible_keys | key | key_len | ref | rows | Extra | +-------+-------+---------------+---------+---------+------+------+--- ---------------------------------------+ | w0 | range | PRIMARY | PRIMARY | 4 | NULL | 474 | where used; Using index; Using temporary | | w2 | range | PRIMARY | PRIMARY | 4 | NULL | 474 | where used; Using index; Distinct | | w3 | range | PRIMARY | PRIMARY | 4 | NULL | 474 | where used; Using index; Distinct | | w1 | range | PRIMARY | PRIMARY | 4 | NULL | 477 | where used; Using index; Distinct | +-------+-------+---------------+---------+---------+------+------+--- ---------------------------------------+ 4 rows in set (0.02 sec) Show index: +-----------+------------+----------+--------------+-------------+---- -------+-------------+----------+--------+---------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Comment | +-----------+------------+----------+--------------+-------------+---- -------+-------------+----------+--------+---------+ | law_words | 0 | PRIMARY | 1 | w_id | A | NULL | NULL | NULL | | | law_words | 0 | PRIMARY | 2 | l_id | A | 228208 | NULL | NULL | | +-----------+------------+----------+--------------+-------------+---- -------+-------------+----------+--------+---------+ 2 rows in set (0.00 sec) There are 228208 records in the table. If more words specified in a search query search takes longer than TCP/IP timeout, so user never gets the result. I don't understand what's wrong with the query and why it takes so long, so today i tried a different approach. I have created 4 temporary tables: create temporary table t1 (id int unsigned not null, primary key (id)); create temporary table t2 (id int unsigned not null, primary key (id)); create temporary table t3 (id int unsigned not null, primary key (id)); create temporary table t4 (id int unsigned not null, primary key (id)); Done 4 separate queries: INSERT INTO t1 SELECT DISTINCT w0.l_id FROM law_words as w0 WHERE w0.w_id IN (258,282,287, 615, 1101, 1949, 1968, 3417, 3574, 3578, 3643, 4345,4768, 5297, 5976,6133, 7243,7245, 9271, 9348, 11146, 11150, 11172, 11232,11847, 12542, 12859, 14811, 24839, 26653,27662); INSERT INTO t2 SELECT DISTINCT w1.l_id FROM law_words as w1 WHERE w1.w_id IN (405, 2017,2192, 2592, 2595, 2603, 2981, 4055, 4068, 4346,5755, 6480, 9384, 9408, 11513, 11514, 12126, 12134, 12638, 13052, 13643, 13769, 13836, 13945, 14154, 14693, 14867, 14980, 15518, 15557, 17830, 19005, 19051, 19247, 20176, 20926, 22364, 22365, 22366, 22732, 24668, 24793, 24956, 25286, 26242, 26665, 26847, 27144, 27348, 27815, 28494, 30910, 31878, 32161, 33586, 34396); INSERT INTO t3 SELECT DISTINCT w3.l_id FROM law_words as w3 WHERE w3.w_id IN (258,282,287, 615, 1101, 1949, 1968, 3417, 3574, 3578, 3643, 4345,4768, 5297, 5976,6133, 7243,7245, 9271, 9348, 11146, 11150, 11172, 11232,11847, 12542, 12859, 14811, 24839, 26653,27662); INSERT INTO t4 SELECT DISTINCT w4.l_id FROM law_words as w4 WHERE w4.w_id IN (405, 2017,2192, 2592, 2595, 2603, 2981, 4055, 4068, 4346,5755, 6480, 9384, 9408, 11513, 11514, 12126, 12134, 12638, 13052, 13643, 13769, 13836, 13945, 14154, 14693, 14867, 14980, 15518, 15557, 17830, 19005, 19051, 19247, 20176, 20926, 22364, 22365, 22366, 22732, 24668, 24793, 24956, 25286, 26242, 26665, 26847, 27144, 27348, 27815, 28494, 30910, 31878, 32161, 33586, 34396); And then done: select t1.id from t1 inner join t2 on t1.id=t2.id inner join t3 on t3.id=t1.id inner join t4 on t4.id=t1.id; It returned the same result as the full query, but! it took only 0.1 second! Each 'insert into t4' takes 0.02, creation takes 0.01 and the last select with join take from 0 to 0.01 seconds. Now, why on earth this would be 300 times faster than the above query? It seems as mysql just can't optimize the query properly. Also, maybe someone got a link to articles describe effective technique for building full-text search engine? Regards, Artem --------------------------------------------------------------------- Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail <[EMAIL PROTECTED]> To unsubscribe, e-mail <[EMAIL PROTECTED]> Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php