Hi Guys - We've just taken the plunge to go to mysql 4.1 on our dbmail database; and all went extremely well except for one little glitch I've discovered.
This query, in mysql/dbmysql.c: "select message_idnr from messages where unique_id != '' order by message_idnr desc limit 0,1" (about line 2782) Seems to take an absolute age to return a result. mysql> explain select message_idnr from messages where unique_id != '' order by message_idnr desc limit 0,1; +----+-------------+----------+-------+---------------+-----------+--------- +------+----------+------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+----------+-------+---------------+-----------+--------- +------+----------+------------------------------------------+ | 1 | SIMPLE | messages | range | unique_id | unique_id | 70 | NULL | 11446443 | Using where; Using index; Using filesort | +----+-------------+----------+-------+---------------+-----------+--------- +------+----------+------------------------------------------+ 1 row in set (0.00 sec) I've done some reading, and the "using filesort" behaviour seems to have changed in MySQL 4.1. --- begin quote: http://dev.mysql.com/doc/mysql/en/ORDER_BY_optimization.html -- In MySQL 4.1 and up, a filesort optimization is used that records not only the sort key value and row position, but also the columns required for the query. This avoids reading the rows twice. The modified filesort algorithm works like this: Read the rows that match the WHERE clause, as before. For each row, record a tuple of values consisting of the sort key value and row position, and also the columns required for the query. Sort the tuples by sort key value Retrieve the rows in sorted order, but read the required columns directly from the sorted tuples rather than by accessing the table a second time. Using the modified filesort algorithm, the tuples are longer than the pairs used in the original method, and fewer of them fit in the sort buffer (the size of which is given by sort_buffer_size). As a result, it is possible for the extra I/O to make the modified approach slower, not faster. To avoid a slowdown, the optimization is used only if the total size of the extra columns in the sort tuple does not exceed the value of the max_length_for_sort_data system variable. (A symptom of setting the value of this variable too high is that you will see high disk activity and low CPU activity.) --- end quote --- This query is only affecting the imap server for me, and if I remove the "where unique_id = ''" clause the problem disappears. I'm wondering is the condition actually required? I only have 146 messages in the DB without a uniqueId (probably legacy cruft never cleaned up); and does it matter whether the uniqueid is present -- as we're simply using the messageblk_idnr as the basis for the next uniqueId. ==== Also: This query did the same thing (really long execution time), from the db_get_quotum_used function. snprintf(query, DEF_QUERYSIZE, "SELECT SUM(m.messagesize) FROM messages m, mailboxes mb " "WHERE m.mailbox_idnr = mb.mailbox_idnr AND mb.owner_idnr = %llu AND m.status < %d", userid, STATUS_DELETE); I'm temporarily just returning "0" which seems to allow things to work (or so it seems). ==== In general 4.1 seems to be working much faster. So the questions: 1) is anything I've done above really bad and gonna break users' mail in the long run? 2) do we need the "where unique_id = ''" qualifier? 3) Anyone else had similar problems with MySQL 4.1 and Dbmail 1.3.21 and worked out better ways around it? -- Regards, Mark Mackay