I just tested Paul's latest patch (minus the is_header code updates) and
Mikhail's patch with the non-cvs 2.0 package and both coexist nicely. =)

Since you guys seem to want to have is_header header used asap to optimize
processing time, the following mysql query could help in the
updade/upgrade process:

update `dbmail_messageblks`
set is_header = 1 where where  left( messageblk, 9 ) = 'Received:';

I'm assuming all the header blocks start with 'Received:'.

Xing





> Xing,
>
>
> I have fixed my previously untested patch. Now it works nicely.
>
>
>
>
> [EMAIL PROTECTED] wrote:
>> I had exact same slowness problem after upgrading to 2.0 via imap last
>> night. I sent a report to the users list before I saw this thread.
>>
>> If the following new query is to be used to optimize the imap:
>>
>>
>> select count(message_idnr), count(message_idnr) - sum(seen_flag),
>> sum(recent_flag) from %smessages where mailbox_idnr = '%llu' and status <
>> '%d';
>>
>>
>> dbmail should have an index, at least for mysql that index both column
>> (mailbox_idnr and status). I was looking at some of the debug queries
>> and after add a (mailbox_idnr, status, uniqueid) combo index, mysql was
>> able  to trim the records need to file sort from 21000 to 17000 from of
>>  the queries which use all of these 3 fields. 4000/21000 =  19%
>> decrease in the records mysql has to manually sort through.
>>
>> Xing
>>
>>
>> Paul J Stevens wrote:
>>
>>
>>>
>>>
>>>
>>> Mikhail Ramendik wrote:
>>>
>>>
>>>> Hello,
>>>>
>>>>
>>>> I have imported a large (about 50,000 messages) folder into dbmail
>>>> - and
>>>> got really SLOW performance. Opening the folder with a IMAP client
>>>> (Evolution, on the same local machine) takes ages! The promised 250
>>>>  messages/second are just not there.
>>>
>>>
>>>
>>> Painfully true.
>>>
>>>
>>>> I have analyzed the logs and looked at the code. And I can clearly
>>>> see where the bottlenecks are. (This is why I am writing to the
>>>> developers list.)
>>>>
>>>> However, I'm not an expert in database programming. So while
>>>> finding the problems was easy, I'd prefer it if someone else could
>>>> fix them :) I have some ideas how to do this; but this would be a
>>>> "last resort".
>>>>
>>>
>>>
>>>
>>> Please do share your ideas. Anything that will boost perfomance will
>>> be seriously considered.
>>>
>>>
>>>>
>>>> So, when the folder gets opened, first the system spends some
>>>> minutes with dbmail-imapd hogging the CPU (a Celeron 2400). And here
>>>> are the log entriesbetween which this happens:
>>>>
>>>> Oct 23 11:42:28 localhost dbmail/imap4d[2762]: dbmysql.c,db_query:
>>>> executing query [SELECT message_idnr, seen_flag, recent_flag FROM
>>>> dbmail_messages WHERE mailbox_idnr = '9' AND status < '2' AND
>>>> unique_id != '' ORDER BY message_idnr ASC]
>>>> Oct 23 11:44:19 localhost dbmail/imap4d[2762]: dbmysql.c,db_query:
>>>> executing query [SELECT MAX(message_idnr) FROM dbmail_ messages WHERE
>>>> unique_id != '']
>>>>
>>>> I have found the corresponding place in the code and can see the
>>>> CPU hog
>>>> there:
>>>>
>>>>
>>>> (db.c line 2534)
>>>>
>>>
>>>
>>>
>>> Mmm, for me this is around 2340 more like.
>>>
>>>
>>>>
>>>> /* alloc mem */
>>>> mb->seq_list = (u64_t *) my_malloc(sizeof(u64_t) * mb->exists); if
>>>> (!mb->seq_list) {
>>>> /* out of mem */
>>>> db_free_result(); return -1; }
>>>>
>>>>
>>>> for (i = 0; i < db_num_rows(); i++) { if (db_get_result(i, 1)[0] ==
>>>> '0')
>>>> mb->unseen++; if (db_get_result(i, 2)[0] == '1') mb->recent++;
>>>>
>>>> mb->seq_list[i] = db_get_result_u64(i, 0); }
>>>>
>>>>
>>>> db_free_result();
>>>>
>>>> Well, with db_num_rows() in the tens of thousands one would expect
>>>> a CPU hog here! Three calls to db_get_result for every message - and
>>>>  db_get_result performs seeking every single time, too.
>>>
>>>
>>>
>>> And only to count the number of recent_flags, and seen_flags for a
>>> certain subset of message_idnrs all those message rows are actually
>>> selected !! Clearly usage of COUNT() comes to mind as an
>>> optimization.
>>>
>>> therefore, where in db_getmailbox() the following code is used to
>>> obtain the total number of messages, the number of unseen, and number
>>>  of recent messages:
>>>
>>> /* select messages */
>>> snprintf(query, DEF_QUERYSIZE, "SELECT message_idnr, seen_flag,
>>> recent_flag " "FROM %smessages WHERE mailbox_idnr = '%llu' "
>>> "AND status < '%d' "
>>> "ORDER BY message_idnr ASC",DBPFX, mb->uid,
>>> MESSAGE_STATUS_DELETE);
>>>
>>>
>>> we would be better of to use:
>>>
>>> select count(message_idnr), count(message_idnr) - sum(seen_flag),
>>> sum(recent_flag) from %smessages where mailbox_idnr = '%llu' and status
>>> < '%d';
>>>
>>>
>>> and defer fetching the list of message_idnr to a separate query. This
>>>  will reduce the number of db_get_result calls in this function by
>>> approx. 66%, which must be good for large folders.
>>>
>>> Thanks for pointing this out. I have this implemented already, if
>>> this works out, I'll commit this change next week.
>>>
>>> [snip a typical message-parsing fetch run]
>>>
>>>
>>>>
>>>> And so it goes on, doing four queries per message! No wonder it can
>>>> only process about 20 messages per second, instead of 250 as
>>>> promised in the README.
>>>>
>>>
>>>
>>>
>>> Yep. You hit dbmail's sorest spot called _ic_fetch. Not so easy to
>>> fix. And something I've been working on for some months now. Using a
>>> better mime-parser will give us better model-view separation. We also
>>>  need a server-side list-view of messages as presented to the
>>> controller (imapclient). I was working on a separate branch of the
>>> code to investigate some of these issues but have now abbandoned the
>>> branch in favor of using smaller atomic change-sets in separate
>>> patches. Debian's dpatch tool really rules here for me, though the
>>> upcoming switch to subversion may well have similar advantages.
>>>
>>>
>>>> This actually makes dbmail unusable for me as a local storage - and
>>>> eats away performance in other cases too, though it may not be that
>>>> noticeable.
>>>
>>>
>>>
>>> Try searching large folders for specific body content.... and wait...
>>>  and wait.... Something that makes dbmail currently all but unusable
>>> for client-side filtering.
>>>
>>>
>>>
>>
>> _______________________________________________
>> Dbmail-dev mailing list
>> Dbmail-dev@dbmail.org
>> http://twister.fastxs.net/mailman/listinfo/dbmail-dev
>>
>>
>
> --
> ________________________________________________________________
> Paul Stevens                                         [EMAIL PROTECTED]
> NET FACILITIES GROUP                     GPG/PGP: 1024D/11F8CD31
> The Netherlands_______________________________________www.nfg.nl
>
>
> _______________________________________________
> Dbmail-dev mailing list
> Dbmail-dev@dbmail.org
> http://twister.fastxs.net/mailman/listinfo/dbmail-dev
>
>
>


Reply via email to