Are temporary tables more effective than joins?

2003-02-13 Thread Artem Koutchine
I am still stuck with my full text search engine. I have experemented
with
different approaches to seleting search results and figure out that
having temporary tables is about 300 times faster than doing 'inner
joins'


The table:

+---+--+--+-+-+---+
| Field | Type | Null | Key | Default | Extra |
+---+--+--+-+-+---+
| l_id  | int(10) unsigned |  | PRI | 0   |   |
| w_id  | int(10) unsigned |  | PRI | 0   |   |
+---+--+--+-+-+---+

That the only used table in the query. It contains
law id and word id. For each law there are many words.
There is a vocabulary which hold the word, but that does not
matter, since only law_words table is used.

The query is:

SELECT DISTINCT  w0.l_id FROM   law_words as w0
inner join law_words as w1 on w0.l_id=w1.l_id
inner join law_words as w2 on w0.l_id=w2.l_id
inner join law_words as w3 on w0.l_id=w3.l_id
WHERE
w0.w_id  IN (258,282,287, 615, 1101, 1949, 1968, 3417, 3574, 3578,
3643,
4345,4768, 5297, 5976,6133, 7243,7245, 9271, 9348, 11146, 11150,
11172,
11232,11847, 12542, 12859, 14811, 24839, 26653,27662)
AND
w1.w_id IN (405, 2017,2192, 2592, 2595, 2603, 2981, 4055, 4068,
4346,5755,
6480, 9384, 9408, 11513, 11514, 12126, 12134, 12638, 13052, 13643,
13769,
13836, 13945, 14154, 14693, 14867, 14980, 15518, 15557, 17830, 19005,
19051,
19247, 20176, 20926, 22364, 22365, 22366, 22732, 24668, 24793, 24956,
25286,
26242, 26665, 26847, 27144, 27348, 27815, 28494, 30910, 31878, 32161,
33586,  34396)
AND
w2.w_id  IN (258,282,287, 615, 1101, 1949, 1968, 3417, 3574, 3578,
3643,
4345,4768, 5297, 5976,6133, 7243,7245, 9271, 9348, 11146, 11150,
11172,
11232,11847, 12542, 12859, 14811, 24839, 26653,27662)
AND
w3.w_id  IN (258,282,287, 615, 1101, 1949, 1968, 3417, 3574, 3578,
3643,
4345,4768, 5297, 5976,6133, 7243,7245, 9271, 9348, 11146, 11150,
11172,
11232,11847, 12542, 12859, 14811, 24839, 26653,27662)
;

It selects all laws which have all 4 words which user entered in the
search form. For example, if user entered 'look', then get ids from
the vocabulary of words: look, looks, looked, looky, etc, so we
get a list of ids for the first word. The same goes for all other
words. Then, since the search is done using AND (all words) we
need to join the table with itself, so only the laws which have
ALL the words are left at the end.

The above query takes about 30 seconds to complete.

Explain shows:
+---+---+---+-+-+--+--+---
---+
| table | type  | possible_keys | key | key_len | ref  | rows |
Extra|
+---+---+---+-+-+--+--+---
---+
| w0| range | PRIMARY   | PRIMARY |   4 | NULL |  474 |
where used; Using index; Using temporary |
| w2| range | PRIMARY   | PRIMARY |   4 | NULL |  474 |
where used; Using index; Distinct|
| w3| range | PRIMARY   | PRIMARY |   4 | NULL |  474 |
where used; Using index; Distinct|
| w1| range | PRIMARY   | PRIMARY |   4 | NULL |  477 |
where used; Using index; Distinct|
+---+---+---+-+-+--+--+---
---+
4 rows in set (0.02 sec)

Show index:
+---++--+--+-+
---+-+--++-+
| Table | Non_unique | Key_name | Seq_in_index | Column_name |
Collation | Cardinality | Sub_part | Packed | Comment |
+---++--+--+-+
---+-+--++-+
| law_words |  0 | PRIMARY  |1 | w_id| A
|NULL | NULL | NULL   | |
| law_words |  0 | PRIMARY  |2 | l_id| A
|  228208 | NULL | NULL   | |
+---++--+--+-+
---+-+--++-+
2 rows in set (0.00 sec)


There are 228208 records in the table.

If more words specified in a search query
search takes longer than TCP/IP timeout,
so user never gets the result.

I don't understand what's wrong with the query and why it takes
so long, so today i tried a different approach.

I have created 4 temporary tables:
create temporary table t1 (id int unsigned not null, primary key
(id));
create temporary table t2 (id int unsigned not null, primary key
(id));
create temporary table t3 (id int unsigned not null, primary key
(id));
create temporary table t4 (id int unsigned not null, primary key
(id));

Done 4 separate queries:

INSERT INTO t1 SELECT DISTINCT  w0.l_id FROM   law_words as w0
WHERE
w0.w_id  IN (258,282,287, 615, 1101, 1949, 1968, 3417, 3574, 3578,
3643,
4345,4768, 5297, 5976,6133, 7243,7245, 9271, 

Re: Are temporary tables more effective than joins?

2003-02-13 Thread harm
On Thu, Feb 13, 2003 at 03:04:08PM +0300, Artem Koutchine wrote:

homemade fulltext searchsystem

 The query is:
 
 SELECT DISTINCT  w0.l_id FROM   law_words as w0
 inner join law_words as w1 on w0.l_id=w1.l_id
 inner join law_words as w2 on w0.l_id=w2.l_id
 inner join law_words as w3 on w0.l_id=w3.l_id
 WHERE
 w0.w_id  IN (258,282,287, 615, 1101, 1949, 1968, 3417, 3574, 3578,
 3643,
 4345,4768, 5297, 5976,6133, 7243,7245, 9271, 9348, 11146, 11150,
 11172,
 11232,11847, 12542, 12859, 14811, 24839, 26653,27662)
 AND
 w1.w_id IN (405, 2017,2192, 2592, 2595, 2603, 2981, 4055, 4068,
 4346,5755,
 6480, 9384, 9408, 11513, 11514, 12126, 12134, 12638, 13052, 13643,
 13769,
 13836, 13945, 14154, 14693, 14867, 14980, 15518, 15557, 17830, 19005,
 19051,
 19247, 20176, 20926, 22364, 22365, 22366, 22732, 24668, 24793, 24956,
 25286,
 26242, 26665, 26847, 27144, 27348, 27815, 28494, 30910, 31878, 32161,
 33586,  34396)
 AND
 w2.w_id  IN (258,282,287, 615, 1101, 1949, 1968, 3417, 3574, 3578,
 3643,
 4345,4768, 5297, 5976,6133, 7243,7245, 9271, 9348, 11146, 11150,
 11172,
 11232,11847, 12542, 12859, 14811, 24839, 26653,27662)
 AND
 w3.w_id  IN (258,282,287, 615, 1101, 1949, 1968, 3417, 3574, 3578,
 3643,
 4345,4768, 5297, 5976,6133, 7243,7245, 9271, 9348, 11146, 11150,
 11172,
 11232,11847, 12542, 12859, 14811, 24839, 26653,27662)
 ;
 
 It selects all laws which have all 4 words which user entered in the
 search form. For example, if user entered 'look', then get ids from

I used to use something similair. Nowadays I use the following:

The table to be searchd has many columns which all should be used in the
fulltext search. For the fulltext search there is an extra 'text' field which holds 
all the words
of the record without punctuation.
(this is needed anyway because some categorie columns are represented by numbers but I 
do want to
be able to search by category name, so this text column has the name of the cateroy
in stead of the categorynumber) 


For the fulltext there are 2 tables:
The dictionary: id int, word varchar, used_count int.
The link between the dict and the table I search in: dictionary_id,
table_I_search_in_id

When somebody searches for 1 word it is easy, just a simple join between
the 3 tables:
select table_I_search_in.id from table_i_search_in, link_table,
the_dictionary where  table_i_search_in_.id=link_table.table_I_search_in_id
and link_table.dictionary_id = dictionary.id and dictionary.word like
'look%';

When one searches on more than one word I check which of the words is the
most rare (via the 'used_count' column in the dictionary). I use that word
to join the tables as above. The other words are put in a long 'like' structure and
search in the extra text column in the main table.

The effect is you limit the possible rows to a small amount really quick
with a simple join. The details of the search are handled bij a like which
can get as complicated as you like using OR`s, NOT`s etc; it won`t need more joins or 
anything anyway.  The need for temporary tables is gone as well!


Good luck,
Harmen
(jep, for me the above system is faster (and more flexibel) then mysql 4.0
fulltext searches)


...

 Also, maybe someone got a link to articles describe effective
 technique for building full-text search engine?
 
 Regards, Artem
 

-- 
The Moon is Waxing Gibbous (86% of Full)
 tty.nl - 2dehands.nl: 69721

-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: Are temporary tables more effective than joins?

2003-02-13 Thread harm
On Thu, Feb 13, 2003 at 05:57:54PM +0300, Artem Koutchine wrote:
 
  The effect is you limit the possible rows to a small amount really
 quick
  with a simple join. The details of the search are handled bij a like
 which
  can get as complicated as you like using OR`s, NOT`s etc; it won`t
 need more joins or anything anyway.  The need for temporary tables is
 gone as well!
 
 I wonder if there is something bad in creating 5-20 temporary tables
 every
 minute or so?

AFAIK not. They can make life really easy every now and then :)

Threads: 83  Questions: 253527481  Slow queries: 659  Opens: 735739  Flush
tables: 1  Open tables: 1231  Queries per second avg: 227.646

Most of these 'Opens' are temp tables. I use them often in cases where one 
big join would be much slower.

The thing you have to be careful for (in the fulltext system you are working on)
is to handle the cases when somebody enters a keyword which will return
tens of thousends of results. The joins using the tmp tables will 
get really slow (And might block the rest f your users if you use myisam)




-- 
The Moon is Waxing Gibbous (87% of Full)
 tty.nl - 2dehands.nl: 69830

-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php