I had exact same slowness problem after upgrading to 2.0 via imap last night. I sent a report to the users list before I saw this thread.

If the following new query is to be used to optimize the imap:

select
       count(message_idnr),
       count(message_idnr) - sum(seen_flag),
       sum(recent_flag)
   from %smessages
       where mailbox_idnr = '%llu' and status < '%d';

dbmail should have an index, at least for mysql that index both column (mailbox_idnr and status). I was looking at some of the debug queries and after add a (mailbox_idnr, status, uniqueid) combo index, mysql was able to trim the records need to file sort from 21000 to 17000 from of the queries which use all of these 3 fields. 4000/21000 = 19% decrease in the records mysql has to manually sort through.

Xing

Paul J Stevens wrote:




Mikhail Ramendik wrote:

Hello,

I have imported a large (about 50,000 messages) folder into dbmail - and
got really SLOW performance. Opening the folder with a IMAP client
(Evolution, on the same local machine) takes ages! The promised 250
messages/second are just not there.


Painfully true.

I have analyzed the logs and looked at the code. And I can clearly see
where the bottlenecks are. (This is why I am writing to the developers
list.)

However, I'm not an expert in database programming. So while finding the
problems was easy, I'd prefer it if someone else could fix them :) I
have some ideas how to do this; but this would be a "last resort".


Please do share your ideas. Anything that will boost perfomance will be seriously considered.



So, when the folder gets opened, first the system spends some minutes
with dbmail-imapd hogging the CPU (a Celeron 2400). And here are the log
entriesbetween which this happens:

Oct 23 11:42:28 localhost dbmail/imap4d[2762]: dbmysql.c,db_query:
executing query [SELECT message_idnr, seen_flag, recent_flag FROM
dbmail_messages WHERE mailbox_idnr = '9' AND status < '2' AND unique_id
!= '' ORDER BY message_idnr ASC]
Oct 23 11:44:19 localhost dbmail/imap4d[2762]: dbmysql.c,db_query:
executing query [SELECT MAX(message_idnr) FROM dbmail_
messages WHERE unique_id != '']

I have found the corresponding place in the code and can see the CPU hog
there:

(db.c line 2534)


Mmm, for me this is around 2340 more like.


        /* alloc mem */
        mb->seq_list = (u64_t *) my_malloc(sizeof(u64_t) * mb->exists);
        if (!mb->seq_list) {
                /* out of mem */
                db_free_result();
                return -1;
        }

        for (i = 0; i < db_num_rows(); i++) {
                if (db_get_result(i, 1)[0] == '0')
                        mb->unseen++;
                if (db_get_result(i, 2)[0] == '1')
                        mb->recent++;

                mb->seq_list[i] = db_get_result_u64(i, 0);
        }

        db_free_result();

Well, with db_num_rows() in the tens of thousands one would expect a CPU
hog here! Three calls to db_get_result for every message - and
db_get_result performs seeking every single time, too.


And only to count the number of recent_flags, and seen_flags for a certain subset of message_idnrs all those message rows are actually selected !! Clearly usage of COUNT() comes to mind as an optimization.

therefore, where in db_getmailbox() the following code is used to obtain the total number of messages, the number of unseen, and number of recent messages:

        /* select messages */
        snprintf(query, DEF_QUERYSIZE,
                 "SELECT message_idnr, seen_flag, recent_flag "
                 "FROM %smessages WHERE mailbox_idnr = '%llu' "
                 "AND status < '%d' "
"ORDER BY message_idnr ASC",DBPFX, mb->uid, MESSAGE_STATUS_DELETE);

we would be better of to use:

     select
        count(message_idnr),
        count(message_idnr) - sum(seen_flag),
        sum(recent_flag)
    from %smessages
        where mailbox_idnr = '%llu' and status < '%d';

and defer fetching the list of message_idnr to a separate query. This will reduce the number of db_get_result calls in this function by approx. 66%, which must be good for large folders.

Thanks for pointing this out. I have this implemented already, if this works out, I'll commit this change next week.

[snip a typical message-parsing fetch run]


And so it goes on, doing four queries per message! No wonder it can only
process about 20 messages per second, instead of 250 as promised in the
README.


Yep. You hit dbmail's sorest spot called _ic_fetch. Not so easy to fix. And something I've been working on for some months now. Using a better mime-parser will give us better model-view separation. We also need a server-side list-view of messages as presented to the controller (imapclient). I was working on a separate branch of the code to investigate some of these issues but have now abbandoned the branch in favor of using smaller atomic change-sets in separate patches. Debian's dpatch tool really rules here for me, though the upcoming switch to subversion may well have similar advantages.


This actually makes dbmail unusable for me as a local storage - and eats
away performance in other cases too, though it may not be that
noticeable.


Try searching large folders for specific body content.... and wait... and wait.... Something that makes dbmail currently all but unusable for client-side filtering.




Reply via email to