Are temporary tables more effective than joins?
I am still stuck with my full text search engine. I have experemented with different approaches to seleting search results and figure out that having temporary tables is about 300 times faster than doing 'inner joins' The table: +---+--+--+-+-+---+ | Field | Type | Null | Key | Default | Extra | +---+--+--+-+-+---+ | l_id | int(10) unsigned | | PRI | 0 | | | w_id | int(10) unsigned | | PRI | 0 | | +---+--+--+-+-+---+ That the only used table in the query. It contains law id and word id. For each law there are many words. There is a vocabulary which hold the word, but that does not matter, since only law_words table is used. The query is: SELECT DISTINCT w0.l_id FROM law_words as w0 inner join law_words as w1 on w0.l_id=w1.l_id inner join law_words as w2 on w0.l_id=w2.l_id inner join law_words as w3 on w0.l_id=w3.l_id WHERE w0.w_id IN (258,282,287, 615, 1101, 1949, 1968, 3417, 3574, 3578, 3643, 4345,4768, 5297, 5976,6133, 7243,7245, 9271, 9348, 11146, 11150, 11172, 11232,11847, 12542, 12859, 14811, 24839, 26653,27662) AND w1.w_id IN (405, 2017,2192, 2592, 2595, 2603, 2981, 4055, 4068, 4346,5755, 6480, 9384, 9408, 11513, 11514, 12126, 12134, 12638, 13052, 13643, 13769, 13836, 13945, 14154, 14693, 14867, 14980, 15518, 15557, 17830, 19005, 19051, 19247, 20176, 20926, 22364, 22365, 22366, 22732, 24668, 24793, 24956, 25286, 26242, 26665, 26847, 27144, 27348, 27815, 28494, 30910, 31878, 32161, 33586, 34396) AND w2.w_id IN (258,282,287, 615, 1101, 1949, 1968, 3417, 3574, 3578, 3643, 4345,4768, 5297, 5976,6133, 7243,7245, 9271, 9348, 11146, 11150, 11172, 11232,11847, 12542, 12859, 14811, 24839, 26653,27662) AND w3.w_id IN (258,282,287, 615, 1101, 1949, 1968, 3417, 3574, 3578, 3643, 4345,4768, 5297, 5976,6133, 7243,7245, 9271, 9348, 11146, 11150, 11172, 11232,11847, 12542, 12859, 14811, 24839, 26653,27662) ; It selects all laws which have all 4 words which user entered in the search form. For example, if user entered 'look', then get ids from the vocabulary of words: look, looks, looked, looky, etc, so we get a list of ids for the first word. The same goes for all other words. Then, since the search is done using AND (all words) we need to join the table with itself, so only the laws which have ALL the words are left at the end. The above query takes about 30 seconds to complete. Explain shows: +---+---+---+-+-+--+--+--- ---+ | table | type | possible_keys | key | key_len | ref | rows | Extra| +---+---+---+-+-+--+--+--- ---+ | w0| range | PRIMARY | PRIMARY | 4 | NULL | 474 | where used; Using index; Using temporary | | w2| range | PRIMARY | PRIMARY | 4 | NULL | 474 | where used; Using index; Distinct| | w3| range | PRIMARY | PRIMARY | 4 | NULL | 474 | where used; Using index; Distinct| | w1| range | PRIMARY | PRIMARY | 4 | NULL | 477 | where used; Using index; Distinct| +---+---+---+-+-+--+--+--- ---+ 4 rows in set (0.02 sec) Show index: +---++--+--+-+ ---+-+--++-+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Comment | +---++--+--+-+ ---+-+--++-+ | law_words | 0 | PRIMARY |1 | w_id| A |NULL | NULL | NULL | | | law_words | 0 | PRIMARY |2 | l_id| A | 228208 | NULL | NULL | | +---++--+--+-+ ---+-+--++-+ 2 rows in set (0.00 sec) There are 228208 records in the table. If more words specified in a search query search takes longer than TCP/IP timeout, so user never gets the result. I don't understand what's wrong with the query and why it takes so long, so today i tried a different approach. I have created 4 temporary tables: create temporary table t1 (id int unsigned not null, primary key (id)); create temporary table t2 (id int unsigned not null, primary key (id)); create temporary table t3 (id int unsigned not null, primary key (id)); create temporary table t4 (id int unsigned not null, primary key (id)); Done 4 separate queries: INSERT INTO t1 SELECT DISTINCT w0.l_id FROM law_words as w0 WHERE w0.w_id IN (258,282,287, 615, 1101, 1949, 1968, 3417, 3574, 3578, 3643, 4345,4768, 5297, 5976,6133, 7243,7245, 9271,
Re: Are temporary tables more effective than joins?
On Thu, Feb 13, 2003 at 03:04:08PM +0300, Artem Koutchine wrote: homemade fulltext searchsystem The query is: SELECT DISTINCT w0.l_id FROM law_words as w0 inner join law_words as w1 on w0.l_id=w1.l_id inner join law_words as w2 on w0.l_id=w2.l_id inner join law_words as w3 on w0.l_id=w3.l_id WHERE w0.w_id IN (258,282,287, 615, 1101, 1949, 1968, 3417, 3574, 3578, 3643, 4345,4768, 5297, 5976,6133, 7243,7245, 9271, 9348, 11146, 11150, 11172, 11232,11847, 12542, 12859, 14811, 24839, 26653,27662) AND w1.w_id IN (405, 2017,2192, 2592, 2595, 2603, 2981, 4055, 4068, 4346,5755, 6480, 9384, 9408, 11513, 11514, 12126, 12134, 12638, 13052, 13643, 13769, 13836, 13945, 14154, 14693, 14867, 14980, 15518, 15557, 17830, 19005, 19051, 19247, 20176, 20926, 22364, 22365, 22366, 22732, 24668, 24793, 24956, 25286, 26242, 26665, 26847, 27144, 27348, 27815, 28494, 30910, 31878, 32161, 33586, 34396) AND w2.w_id IN (258,282,287, 615, 1101, 1949, 1968, 3417, 3574, 3578, 3643, 4345,4768, 5297, 5976,6133, 7243,7245, 9271, 9348, 11146, 11150, 11172, 11232,11847, 12542, 12859, 14811, 24839, 26653,27662) AND w3.w_id IN (258,282,287, 615, 1101, 1949, 1968, 3417, 3574, 3578, 3643, 4345,4768, 5297, 5976,6133, 7243,7245, 9271, 9348, 11146, 11150, 11172, 11232,11847, 12542, 12859, 14811, 24839, 26653,27662) ; It selects all laws which have all 4 words which user entered in the search form. For example, if user entered 'look', then get ids from I used to use something similair. Nowadays I use the following: The table to be searchd has many columns which all should be used in the fulltext search. For the fulltext search there is an extra 'text' field which holds all the words of the record without punctuation. (this is needed anyway because some categorie columns are represented by numbers but I do want to be able to search by category name, so this text column has the name of the cateroy in stead of the categorynumber) For the fulltext there are 2 tables: The dictionary: id int, word varchar, used_count int. The link between the dict and the table I search in: dictionary_id, table_I_search_in_id When somebody searches for 1 word it is easy, just a simple join between the 3 tables: select table_I_search_in.id from table_i_search_in, link_table, the_dictionary where table_i_search_in_.id=link_table.table_I_search_in_id and link_table.dictionary_id = dictionary.id and dictionary.word like 'look%'; When one searches on more than one word I check which of the words is the most rare (via the 'used_count' column in the dictionary). I use that word to join the tables as above. The other words are put in a long 'like' structure and search in the extra text column in the main table. The effect is you limit the possible rows to a small amount really quick with a simple join. The details of the search are handled bij a like which can get as complicated as you like using OR`s, NOT`s etc; it won`t need more joins or anything anyway. The need for temporary tables is gone as well! Good luck, Harmen (jep, for me the above system is faster (and more flexibel) then mysql 4.0 fulltext searches) ... Also, maybe someone got a link to articles describe effective technique for building full-text search engine? Regards, Artem -- The Moon is Waxing Gibbous (86% of Full) tty.nl - 2dehands.nl: 69721 - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: Are temporary tables more effective than joins?
On Thu, Feb 13, 2003 at 05:57:54PM +0300, Artem Koutchine wrote: The effect is you limit the possible rows to a small amount really quick with a simple join. The details of the search are handled bij a like which can get as complicated as you like using OR`s, NOT`s etc; it won`t need more joins or anything anyway. The need for temporary tables is gone as well! I wonder if there is something bad in creating 5-20 temporary tables every minute or so? AFAIK not. They can make life really easy every now and then :) Threads: 83 Questions: 253527481 Slow queries: 659 Opens: 735739 Flush tables: 1 Open tables: 1231 Queries per second avg: 227.646 Most of these 'Opens' are temp tables. I use them often in cases where one big join would be much slower. The thing you have to be careful for (in the fulltext system you are working on) is to handle the cases when somebody enters a keyword which will return tens of thousends of results. The joins using the tmp tables will get really slow (And might block the rest f your users if you use myisam) -- The Moon is Waxing Gibbous (87% of Full) tty.nl - 2dehands.nl: 69830 - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php