Hi Guys - 

We've just taken the plunge to go to mysql 4.1 on our dbmail database; and
all went extremely well except for one little glitch I've discovered.

This query, in mysql/dbmysql.c:

        "select message_idnr from messages where unique_id != '' order by
message_idnr desc limit 0,1"
        (about line 2782)

Seems to take an absolute age to return a result.

mysql> explain select message_idnr from messages where unique_id != '' order
by message_idnr desc limit 0,1;
+----+-------------+----------+-------+---------------+-----------+---------
+------+----------+------------------------------------------+
| id | select_type | table    | type  | possible_keys | key       | key_len
| ref  | rows     | Extra                                    |
+----+-------------+----------+-------+---------------+-----------+---------
+------+----------+------------------------------------------+
|  1 | SIMPLE      | messages | range | unique_id     | unique_id |      70
| NULL | 11446443 | Using where; Using index; Using filesort |
+----+-------------+----------+-------+---------------+-----------+---------
+------+----------+------------------------------------------+
1 row in set (0.00 sec)

I've done some reading, and the "using filesort" behaviour seems to have
changed in MySQL 4.1.

--- begin quote:
http://dev.mysql.com/doc/mysql/en/ORDER_BY_optimization.html --

In MySQL 4.1 and up, a filesort optimization is used that records not only
the sort key value and row position, but also the columns required for the
query. This avoids reading the rows twice. The modified filesort algorithm
works like this: 

Read the rows that match the WHERE clause, as before. 
For each row, record a tuple of values consisting of the sort key value and
row position, and also the columns required for the query. 
Sort the tuples by sort key value 
Retrieve the rows in sorted order, but read the required columns directly
from the sorted tuples rather than by accessing the table a second time. 
Using the modified filesort algorithm, the tuples are longer than the pairs
used in the original method, and fewer of them fit in the sort buffer (the
size of which is given by sort_buffer_size). As a result, it is possible for
the extra I/O to make the modified approach slower, not faster. To avoid a
slowdown, the optimization is used only if the total size of the extra
columns in the sort tuple does not exceed the value of the
max_length_for_sort_data system variable. (A symptom of setting the value of
this variable too high is that you will see high disk activity and low CPU
activity.) 
--- end quote ---

This query is only affecting the imap server for me, and if I remove the
"where unique_id = ''" clause the problem disappears.

I'm wondering is the condition actually required? I only have 146 messages
in the DB without a uniqueId (probably legacy cruft never cleaned up); and
does it matter whether the uniqueid is present -- as we're simply using the
messageblk_idnr as the basis for the next uniqueId.

====

Also: This query did the same thing (really long execution time), from the
db_get_quotum_used function.

  snprintf(query, DEF_QUERYSIZE, "SELECT SUM(m.messagesize) FROM messages m,
mailboxes mb "
           "WHERE m.mailbox_idnr = mb.mailbox_idnr AND mb.owner_idnr = %llu
AND m.status < %d",
           userid, STATUS_DELETE);

I'm temporarily just returning "0" which seems to allow things to work (or
so it seems).

====

In general 4.1 seems to be working much faster. 

So the questions:

1) is anything I've done above really bad and gonna break users' mail in the
long run?
2) do we need the "where unique_id = ''" qualifier?
3) Anyone else had similar problems with MySQL 4.1 and Dbmail 1.3.21 and
worked out better ways around it?


--
Regards,
Mark Mackay

Reply via email to