On 8/17/06, Chris <[EMAIL PROTECTED]> wrote:
Jon Molin wrote:
> Hi list
>
> I have 5 tables:
>
> words (word_id int auto_increment, word varbinary(40)) (has ~3.5M
> rows) with the keys:
> PRIMARY KEY  (`word_id`),UNIQUE KEY `word_ind` (`word`)
>
> phrases (phrase_id int auto_increment, phrase varbinary(100)) (has
> ~11M rows) with the keys:
> PRIMARY KEY  (`phrase_id`),UNIQUE KEY `phrase_ind` (`phrase`)
>
> phrase_words (phrase_id, word_id) (has ~31M rows) with:
> UNIQUE KEY `phrase_ind` (`phrase_id`,`word_id`)
> KEY `word` (`word_id`), KEY `phrase` (`phrase_id`)
>
> word_searches (word_id, search_date date, search hour char(2), amount
> smallint, type char(8), location char(2)) with:
> KEY `word_search` (`word_id`),
> KEY `id_search` (`search_date`),
> KEY `word_date` (`word_id`,`search_date`)
>
> (and a similar for phrase_searches, these two tables are merge tables
> with one table for each month, each table having 15-30M rows)
>
> phrases are built of  "words" identified by phrase_words (these are
> not human language words and phrases but rather random bytes where
> some are human readable).
>
> Now, I'm trying to find out "how many times has word 1..n been
> searched for and how many times has phrases containing 1..n been
> searched for?"
>
> These queries take a really long time to execute, first I select for the
> words:
> explain sELECT w.word as word, w.word_id as word_id, sum(ws.amount) as
> amount FROM words w, word_searches ws WHERE
> ws.word_id=w.word_id AND w.word IN ("ppppp", "xyz", "zzz", "abc") AND
>              ws.search_date >= '2006-07-17' AND ws.search_date <=
> '2006-08-16' group by ws.word_id;
> 
+----+-------------+-------+-------+---------------------------------+-------------+---------+----------------------+------+----------------------------------------------+
>
> | id | select_type | table | type  | possible_keys                   |
> key         | key_len | ref                  | rows | Extra
>                            |
> 
+----+-------------+-------+-------+---------------------------------+-------------+---------+----------------------+------+----------------------------------------------+
>
> |  1 | SIMPLE      | w     | range | PRIMARY,word_ind                |
> word_ind    | 42      | NULL                 |    4 | Using where;
> Using temporary; Using filesort |
> |  1 | SIMPLE      | ws    | ref   | word_search,id_search,word_date |
> word_search | 4       | statistics.w.word_id |   15 | Using where
>                            |
> 
+----+-------------+-------+-------+---------------------------------+-------------+---------+----------------------+------+----------------------------------------------+
>
>
> and then for phrases:
> explain SELECT pw.word_id as word_id, sum(ps.amount) as amount FROM
> phrase_words pw, phrase_searches ps WHERE
>                pw.word_id IN (966,1,1250,1741) AND
>                pw.phrase_id = ps.phrase_id AND ps.search_date >=
> '2006-07-17' AND ps.search_date <= '2006-08-16'
>                GROUP by pw.word_id;
> 
+----+-------------+-------+-------+-------------------------------------+---------------+---------+-------------------------+--------+-------------+
>
> | id | select_type | table | type  | possible_keys
>  | key           | key_len | ref                     | rows   | Extra
>      |
> 
+----+-------------+-------+-------+-------------------------------------+---------------+---------+-------------------------+--------+-------------+
>
> |  1 | SIMPLE      | pw    | range | phrase_ind,word,phrase
>  | word          | 4       | NULL                    | 226847 | Using
> where |
> |  1 | SIMPLE      | ps    | ref   |
> phrase_search,id_search,phrase_date | phrase_search | 4       |
> statistics.pw.phrase_id |     15 | Using where |
> 
+----+-------------+-------+-------+-------------------------------------+---------------+---------+-------------------------+--------+-------------+

The problem is it's picking the "word" index which apparently is
returning 226,000+ areas.

Test this:

EXPLAIN SELECT pw.word_id as word_id, sum(ps.amount) as amount FROM
phrase_words pw, phrase_searches ps IGNORE INDEX (word) WHERE
pw.word_id IN (966,1,1250,1741) AND
pw.phrase_id = ps.phrase_id AND ps.search_date >=
'2006-07-17' AND ps.search_date <= '2006-08-16'
GROUP by pw.word_id;

(that should ignore the 'word' index and instead use the 'phrase' index).


Unfortunately didn't that help, it leads to:
+----+-------------+-------+-------+-------------------------------------+-----------+---------+-------------------------+---------+----------------------------------------------+
| id | select_type | table | type  | possible_keys
 | key       | key_len | ref                     | rows    | Extra
                                   |
+----+-------------+-------+-------+-------------------------------------+-----------+---------+-------------------------+---------+----------------------------------------------+
|  1 | SIMPLE      | ps    | range |
phrase_search,id_search,phrase_date | id_search | 3       | NULL
            | 3836930 | Using where; Using temporary; Using filesort
|
|  1 | SIMPLE      | pw    | ref   | phrase_ind,phrase
 | phrase    | 4       | statistics.ps.phrase_id |       3 | Using
where                                  |
+----+-------------+-------+-------+-------------------------------------+-----------+---------+-------------------------+---------+----------------------------------------------+

And executing the queries once each (yes unscientific but there's
nothing else running there so it should be fairly ok) takes the skip
index query almost twice as much time as with the index (13 vs 7
minutes)


Thanks for the quick reply!

/jon

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to