[Dbmail-dev] STORE is SLOOOOOW on folder with many, many messages

Kevin Brown Mon, 12 Dec 2005 19:03:21 +0100 (CET)

In experimenting with DBMail 2.0.7 (even with the changes I submitted
regarding the headername query, I can't get 2.1-trunk to run well
enough to work properly with both Mutt and Thunderbird), I discovered
(as perhaps some of you have) that storing changes to the attributes
is very slow when the folder has lots of messages (tens of thousands).


2.1.x may have fixed this, though the query in question is the same.
But in 2.0.7, the following query is responsible for the bulk of the
time spent in the update:

    SELECT 'a',COUNT(*) FROM dbmail_messages WHERE mailbox_idnr='14'
        AND (status='0' OR status='1') UNION SELECT 'b',COUNT(*) FROM
        dbmail_messages WHERE mailbox_idnr='14' AND (status='0' OR
        status='1') AND seen_flag=1 UNION SELECT 'c',COUNT(*) FROM
        dbmail_messages WHERE mailbox_idnr='14' AND (status='0' OR
        status='1') AND recent_flag=1

(substitute any given mailbox id for '14', of course).


This query originates from this bit of code here in function
db_getmailbox in db.c:

    snprintf(query, DEF_QUERYSIZE,
        "SELECT 'a',COUNT(*) FROM dbmail_messages WHERE mailbox_idnr='%llu' "
        "AND (status='%d' OR status='%d') UNION "
        "SELECT 'b',COUNT(*) FROM dbmail_messages WHERE mailbox_idnr='%llu' "
        "AND (status='%d' OR status='%d') AND seen_flag=1 UNION "
        "SELECT 'c',COUNT(*) FROM dbmail_messages WHERE mailbox_idnr='%llu' "
        "AND (status='%d' OR status='%d') AND recent_flag=1", 
        mb->uid, MESSAGE_STATUS_NEW, MESSAGE_STATUS_SEEN,
        mb->uid, MESSAGE_STATUS_NEW, MESSAGE_STATUS_SEEN,
        mb->uid, MESSAGE_STATUS_NEW, MESSAGE_STATUS_SEEN);
              


That query is horribly inefficient on PostgreSQL.  The following is
much more efficient (by at least a factor of two if not more) and
accomplishes the same thing, though it will require a bit more
interpretation by our code to use:

    SELECT seen_flag*2 + recent_flag, count(*) FROM dbmail_messages
        WHERE mailbox_idnr = '14' AND status IN ('0', '1') GROUP BY
        seen_flag*2 + recent_flag;

We can generate it thusly:

    snprintf(query, DEF_QUERYSIZE,
        "SELECT seen_flag*2 + recent_flag, count(*) FROM "
        "dbmail_messages WHERE mailbox_idnr = '%d' AND status IN "
        "('%d', '%d') GROUP BY seen_flag*2 + recent_flag",
        mb->uid, MESSAGE_STATUS_NEW, MESSAGE_STATUS_SEEN);

The query will return up to 4 rows with the following values as the
first column:

    0: neither seen_flag nor recent_flag is set.
    1: recent_flag is set, seen_flag is not.
    2: seen_flag is set, recent_flag is not.
    3: both seen_flag and recent_flag are set


The additional interpretation is in reading and interpreting the
results.  The query won't return rows for flag combinations that don't
have any corresponding entries, so we have to infer those.  I think
the most efficient way to do that is to create a simple 4-element
integer array, initialize its elements to zero, and use the first
column results as the index to that array.  Place the results in the
array.  The array will have zero for entries that weren't returned by
the query.  The array definition will be something like:

    unsigned int results[4];


The values we're interested in (exists, seen, recent) will then be:

    exists = results[0] + results[1] + results[2] + results[3];
    seen = results[2] + results[3];
    recent = results[1] + results[3]



Now, the above is more efficient, but the real win would be if we
could avoid calling db_getmailbox (or perhaps a modified version of it
that doesn't have to do any of this) altogether after doing a STORE.


I don't know how much more (if any) efficient the above will be on
MySQL or SQLite, but I wouldn't expect it to be less efficient.


Thoughts?




-- 
Kevin Brown                                           [EMAIL PROTECTED]

[Dbmail-dev] STORE is SLOOOOOW on folder with many, many messages

Reply via email to