On Fri, Jan 21, 2011 at 2:01 PM, Shawn Green (MySQL) <
shawn.l.gr...@oracle.com> wrote:

> On 1/21/2011 14:21, Kendall Gifford wrote:
>
>> Hello everyone, I've got a database on an old Fedora Core 4 server running
>> MySQL 4 (mysql-server.x86_64 4.1.12-2.FC4.1). The database in question has
>> just two (InnoDB) tables:
>>
>> messages (approx 2.5 million records)
>> recipients (approx 6.5 million records)
>>
>> These track information about email messages. Each message "has many"
>> recipient records. The structure of the two tables (omitting irrelevant
>> data
>> fields) are as follows:
>>
>>
>> +-------------+------------------+------+-----+---------------------+----------------+
>> | Field       | Type             | Null | Key | Default             |
>> Extra          |
>>
>> +-------------+------------------+------+-----+---------------------+----------------+
>> | id          | int(10) unsigned |      | PRI | NULL                |
>> auto_increment |
>> | sent_at     | datetime         |      | MUL | 0000-00-00 00:00:00
>> |                |
>> | ......................... OTHER FIELDS OMITTED FOR BREVITY
>> ....................... |
>>
>> +-------------+------------------+------+-----+---------------------+----------------+
>>
>>
>> +-------------+------------------+------+-----+---------------------+----------------+
>> | Field       | Type             | Null | Key | Default             |
>> Extra          |
>>
>> +-------------+------------------+------+-----+---------------------+----------------+
>> | id          | int(10) unsigned |      | PRI | NULL                |
>> auto_increment |
>> | message_id  | int(10) unsigned |      | MUL | 0
>> |                |
>> | employee_id | int(10) unsigned | YES  | MUL | NULL
>> |                |
>> | ......................... OTHER FIELDS OMITTED FOR BREVITY
>> ....................... |
>>
>> +-------------+------------------+------+-----+---------------------+----------------+
>>
>> I have the following query that is just too slow:
>>
>>  SELECT messages.* FROM messages
>>> INNER JOIN recipients ON recipients.message_id = messages.id
>>> WHERE recipients.employee_id = X
>>> GROUP BY messages.id
>>> ORDER BY sent_at DESC
>>> LIMIT 0, 25;
>>>
>>
>> This takes about 44 seconds on average. The query explanation is as
>> follows:
>>
>>
>> +----+-------------+------------+--------+--------------------------+--------------+---------+-------------------------------------+--------+----------------------------------------------+
>> | id | select_type | table      | type   | possible_keys            |
>> key          | key_len | ref                                 | rows   |
>> Extra                                        |
>>
>> +----+-------------+------------+--------+--------------------------+--------------+---------+-------------------------------------+--------+----------------------------------------------+
>> |  1 | SIMPLE      | recipients | ref    | messages_fk,employee_idx |
>> employee_idx |       5 | const                               | 222640 |
>> Using where; Using temporary; Using filesort |
>> |  1 | SIMPLE      | messages   | eq_ref | PRIMARY                  |
>> PRIMARY      |       4 | email_archive.recipients.message_id |      1
>> |                                              |
>>
>> +----+-------------+------------+--------+--------------------------+--------------+---------+-------------------------------------+--------+----------------------------------------------+
>>
>> I've been doing some searching on the web and have no idea if/how this can
>> be sped up. Most searches these days reference MySQL 5.x which I'm just
>> not
>> sure how much applies. I'm hoping that there is something obvious that I'm
>> missing, or that one of you experts knows what I might be able to change
>> to
>> speed this query up.
>>
>> Anyhow, thanks in advance for even so much as reading my message, let
>> alone
>> replying :).
>>
>>
> You need to get rid of the GROUP BY to make this go faster. You can do that
> by running two queries, one to pick the list of unique recipients.message_id
> values that match your where condition then another to actually retrieve the
> message data. Something like this
>
> CREATE TEMPORARY TABLE tmpMessages (message_id INT UNSIGNED, PRIMARY KEY
> (message_id)) ENGINE=MEMORY;
>
> INSERT IGNORE tmpMessages
> SELECT message_id
> FROM recipients
> WHERE employee_id = X;
>
> SELECT messages.* FROM messages
> INNER JOIN tmpMessages ON tmpMessages.message_id = messages.id
>
> ORDER BY sent_at DESC
> LIMIT 0, 25;
>
> By pre-selecting a limited set of message_id values from the recipients
> table, you seriously reduce the number of rows that need to be scanned.
> Also, the INSERT IGNORE technique is faster than the GROUP BY because it
> uses an index to identify any duplicates instead of a scan of all previous
> unique values.
>
> Please let us all know if this is faster enough. (and don't forget to drop
> the temp table once you are through using it)
>
> --
> Shawn Green
> MySQL Principal Technical Support Engineer
> Oracle USA, Inc.
> Office: Blountville, TN
>

Thanks Shawn, I'm in the process of trying you're suggestion now. I'll let
you know how it goes.

-- 
Kendall Gifford
zettab...@gmail.com

Reply via email to