Hello world,

I'm pretty new at SQL.  I'm just trying to build a simple document 
index.  I'm hoping there is a better way to do things than what I've 
got here.

Here's what I've got for a table definition:

>mysql> create table main_index(
>     sequence int unsigned primary key auto_increment not null,
>     doc_id int unsigned not null,
>     word_id int unsigned not null,
>     index idx_word (word_id),
>     index idx_doc (doc_id) );
>mysql> describe main_index;
>+----------+------------------+------+-----+---------+----------------+
>| Field    | Type             | Null | Key | Default | Extra          |
>+----------+------------------+------+-----+---------+----------------+
>| sequence | int(10) unsigned |      | PRI | NULL    | auto_increment |
>| doc_id   | int(10) unsigned |      | MUL | 0       |                |
>| word_id  | int(10) unsigned |      | MUL | 0       |                |
>+----------+------------------+------+-----+---------+----------------+
>3 rows in set (0.00 sec)

In this table are about five million rows (at the moment but it will 
grow).  There are a several thousand possible unique values for 
doc_id's and word_id's.

I'm trying to find the unique doc_id's which have matching word_id's. 
Simple queries run very quickly of course.  More complex queries are 
very unpredictable.  They vary tremendously depending on the size of 
the (intermediate) result set.  I expected that a certain amount but 
the variability is much larger than I expected.  Do you have any 
suggestions about a better way to design/index my table or do the 
queries?  Do I need to break my queries up and use temporary tables?

If it matters, I'm running MySQL 3.23.49 on a AMD 1.3ghz Linux machine.

Here are some typical queries and results:

>mysql> select count(distinct m0.doc_id) from main_index as m0 where 
>m0.word_id=10;
>+---------------------------+
>| count(distinct m0.doc_id) |
>+---------------------------+
>|                       405 |
>+---------------------------+
>1 row in set (0.01 sec)
>
>mysql> select count(distinct m0.doc_id) from main_index as m0 where 
>m0.word_id=9;
>+---------------------------+
>| count(distinct m0.doc_id) |
>+---------------------------+
>|                       221 |
>+---------------------------+
>1 row in set (0.00 sec)
>
>mysql> select count(distinct m0.doc_id) from main_index as m0 where 
>m0.word_id=2327;
>+---------------------------+
>| count(distinct m0.doc_id) |
>+---------------------------+
>|                        17 |
>+---------------------------+
>1 row in set (0.00 sec)
>
>mysql> select count(distinct m0.doc_id) from main_index as m0, 
>main_index as m1 where m0.word_id=10 and m1.word_id=2327 and 
>m0.doc_id=m1.doc_id;
>+---------------------------+
>| count(distinct m0.doc_id) |
>+---------------------------+
>|                        17 |
>+---------------------------+
>1 row in set (0.59 sec)
>
>mysql> select count(distinct m0.doc_id) from main_index as m0, 
>main_index as m1 where m0.word_id=9 and m1.word_id=2327 and 
>m0.doc_id=m1.doc_id;
>+---------------------------+
>| count(distinct m0.doc_id) |
>+---------------------------+
>|                        15 |
>+---------------------------+
>1 row in set (0.55 sec)
>
>mysql> select count(distinct m0.doc_id) from main_index as m0, 
>main_index as m1 where m0.word_id=9 and m1.word_id=10 and 
>m0.doc_id=m1.doc_id;
>+---------------------------+
>| count(distinct m0.doc_id) |
>+---------------------------+
>|                       181 |
>+---------------------------+
>1 row in set (59.56 sec)
>
>mysql> select count(distinct m0.doc_id) from main_index as m0, 
>main_index as m1, main_index as m2 where m0.word_id=10 and 
>m1.word_id=2327 and m2.word_id=9 and m0.doc_id=m1.doc_id and 
>m0.doc_id=m2.doc_id;
>+---------------------------+
>| count(distinct m0.doc_id) |
>+---------------------------+
>|                        15 |
>+---------------------------+
>1 row in set (8.62 sec)

Thank you in advance,

Bill



-- 
Bill Rausch, Software Development, UNIX, Mac, Windows
Numerical Applications, Richland, WA  509-943-0861

---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

Reply via email to