Re: [Dbmail-dev] Two bottlenecks in db_getmailbox

[EMAIL PROTECTED] Sun, 24 Oct 2004 00:13:10 +0200 (CEST)

I had exact same slowness problem after upgrading to 2.0 via imap lastnight. I sent a report to the users list before I saw this thread.


If the following new query is to be used to optimize the imap:


select
       count(message_idnr),
       count(message_idnr) - sum(seen_flag),
       sum(recent_flag)
   from %smessages
       where mailbox_idnr = '%llu' and status < '%d';

dbmail should have an index, at least for mysql that index both column(mailbox_idnr and status). I was looking at some of the debug queriesand after add a (mailbox_idnr, status, uniqueid) combo index, mysql wasable to trim the records need to file sort from 21000 to 17000 from ofthe queries which use all of these 3 fields. 4000/21000 = 19% decreasein the records mysql has to manually sort through.


Xing

Paul J Stevens wrote:

Mikhail Ramendik wrote:
Hello,

I have imported a large (about 50,000 messages) folder into dbmail - and
got really SLOW performance. Opening the folder with a IMAP client
(Evolution, on the same local machine) takes ages! The promised 250
messages/second are just not there.
Painfully true.
I have analyzed the logs and looked at the code. And I can clearly see
where the bottlenecks are. (This is why I am writing to the developers
list.)

However, I'm not an expert in database programming. So while finding the
problems was easy, I'd prefer it if someone else could fix them :) I
have some ideas how to do this; but this would be a "last resort".
Please do share your ideas. Anything that will boost perfomance willbe seriously considered.
So, when the folder gets opened, first the system spends some minutes
with dbmail-imapd hogging the CPU (a Celeron 2400). And here are the log
entriesbetween which this happens:

Oct 23 11:42:28 localhost dbmail/imap4d[2762]: dbmysql.c,db_query:
executing query [SELECT message_idnr, seen_flag, recent_flag FROM
dbmail_messages WHERE mailbox_idnr = '9' AND status < '2' AND unique_id
!= '' ORDER BY message_idnr ASC]
Oct 23 11:44:19 localhost dbmail/imap4d[2762]: dbmysql.c,db_query:
executing query [SELECT MAX(message_idnr) FROM dbmail_
messages WHERE unique_id != '']

I have found the corresponding place in the code and can see the CPU hog
there:

(db.c line 2534)
Mmm, for me this is around 2340 more like.
        /* alloc mem */
        mb->seq_list = (u64_t *) my_malloc(sizeof(u64_t) * mb->exists);
        if (!mb->seq_list) {
                /* out of mem */
                db_free_result();
                return -1;
        }

        for (i = 0; i < db_num_rows(); i++) {
                if (db_get_result(i, 1)[0] == '0')
                        mb->unseen++;
                if (db_get_result(i, 2)[0] == '1')
                        mb->recent++;

                mb->seq_list[i] = db_get_result_u64(i, 0);
        }

        db_free_result();

Well, with db_num_rows() in the tens of thousands one would expect a CPU
hog here! Three calls to db_get_result for every message - and
db_get_result performs seeking every single time, too.
And only to count the number of recent_flags, and seen_flags for acertain subset of message_idnrs all those message rows are actuallyselected !! Clearly usage of COUNT() comes to mind as an optimization.
therefore, where in db_getmailbox() the following code is used toobtain the total number of messages, the number of unseen, and numberof recent messages:
        /* select messages */
        snprintf(query, DEF_QUERYSIZE,
                 "SELECT message_idnr, seen_flag, recent_flag "
                 "FROM %smessages WHERE mailbox_idnr = '%llu' "
                 "AND status < '%d' "
"ORDER BY message_idnr ASC",DBPFX, mb->uid,MESSAGE_STATUS_DELETE);
we would be better of to use:

     select
        count(message_idnr),
        count(message_idnr) - sum(seen_flag),
        sum(recent_flag)
    from %smessages
        where mailbox_idnr = '%llu' and status < '%d';
and defer fetching the list of message_idnr to a separate query. Thiswill reduce the number of db_get_result calls in this function byapprox. 66%, which must be good for large folders.
Thanks for pointing this out. I have this implemented already, if thisworks out, I'll commit this change next week.
[snip a typical message-parsing fetch run]
And so it goes on, doing four queries per message! No wonder it can only
process about 20 messages per second, instead of 250 as promised in the
README.
Yep. You hit dbmail's sorest spot called _ic_fetch. Not so easy tofix. And something I've been working on for some months now. Using abetter mime-parser will give us better model-view separation. We alsoneed a server-side list-view of messages as presented to thecontroller (imapclient). I was working on a separate branch of thecode to investigate some of these issues but have now abbandoned thebranch in favor of using smaller atomic change-sets in separatepatches. Debian's dpatch tool really rules here for me, though theupcoming switch to subversion may well have similar advantages.
This actually makes dbmail unusable for me as a local storage - and eats
away performance in other cases too, though it may not be that
noticeable.
Try searching large folders for specific body content.... and wait...and wait....Something that makes dbmail currently all but unusable for client-sidefiltering.

Re: [Dbmail-dev] Two bottlenecks in db_getmailbox

Reply via email to