A NOTE has been added to this issue.
======================================================================
http://dbmail.org/mantis/view.php?id=139
======================================================================
Reported By: aaron
Assigned To: paul
======================================================================
Project: DBMail
Issue ID: 139
Category: IMAP daemon
Reproducibility: always
Severity: feature
Priority: normal
Status: acknowledged
======================================================================
Date Submitted: 12-Dec-04 00:27 CET
Last Modified: 24-Mar-06 23:43 CET
======================================================================
Summary: dbmail-imapd doesn't scale nicely with large message ranges
Description:
Thomas Mueller wrote:
For a few minutes at a time, my server sometimes uses much more memory
than it should - and I guess it's dbmail.
I hope I'll find some time soon to use a profiler, but meanwhile I guess
the following happens: someone marks a mailbox for offline use and
dbmail-imapd does the following:
- fetch all mails from database
- keep the result set in memory
- deliver them
The third step can take a while, so the process eats lots of memory for
quite some time - no bug, it's a design problem.
This only happens for a few minutes at a time, which is why I'm quite sure
it's not a memory leak.
The way to go would be to use a server side cursor so only one mail has to
be kept in memory - but AFAIK there's a storage system with SQL interface
(sorry couldn't resist) that doesn't support cursors.
======================================================================
----------------------------------------------------------------------
aaron - 12-Dec-04 00:30
----------------------------------------------------------------------
Paul, you probably know this code best: is there someplace in _ic_foo()
that goes through a result list from the database, builds some huge thing
in memory, and only then begins to send it back to the client?
The fix might be as simple as placing some ic_write()'s in the middle of
that loop, rather than building up the whole structure first.
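
A minimal sketch of that idea, not dbmail code: each row is handed to
the client as soon as it is read, so the most held at once is one row
rather than the whole result. sink_t, sink_write() and stream_rows() are
hypothetical names made up for this illustration; sink_write() only
stands in for an ic_write()-style call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sink standing in for an ic_write()-style call. */
typedef struct {
    size_t bytes_sent;  /* total bytes handed to the client so far */
    size_t peak_held;   /* largest single chunk held at one time */
} sink_t;

/* Record one chunk as written; a real sink would write to the socket. */
static void sink_write(sink_t *s, const char *data, size_t len)
{
    (void)data;
    s->bytes_sent += len;
    if (len > s->peak_held)
        s->peak_held = len;
}

/* Write every row immediately instead of appending it to one big
 * buffer. Returns the total number of bytes sent. */
size_t stream_rows(const char *const *rows, size_t nrows, sink_t *out)
{
    assert(rows != NULL && out != NULL);
    for (size_t i = 0; i < nrows; i++)
        sink_write(out, rows[i], strlen(rows[i]));
    return out->bytes_sent;
}
```

With this shape the amount held at once is bounded by the largest row,
not by the sum of all rows.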
----------------------------------------------------------------------
paul - 12-Dec-04 09:34
----------------------------------------------------------------------
There is basically only one candidate: _ic_fetch
Thomas was referring to the use-case where users mark a mailbox for
offline usage. This probably triggers something like:
C: A001 UID FETCH 1:* (FULL)
This will first retrieve the full range of message_idnr with their flags
using db_get_msginfo_range, and after that start retrieving the full
messages one-by-one and dumping them to the client.
There is no place in the code where messageblks for more than one message
at a time are selected. There should be, once we can support cursors, but
for now dbmail is on a one-message-at-a-time paradigm.
So, db_get_msginfo_range will build a large result-set that should scale
well since it's holding only the message_idnr and the flags, and
afterwards messageblks for these messages are selected, one message at a
time.
It could be that this long loop that retrieves the full messages is
'leaking' memory during its run: growing the memory allocated for the
cache to fit the largest message retrieved, and not releasing that memory
until the end of the loop.
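
To illustrate the suspected pattern, here is a small sketch of a reusable
buffer that only ever grows. It is not the actual imap cache; msg_cache,
cache_load(), cache_trim() and cache_free() are hypothetical names. After
one large message the capacity stays at that high-water mark unless
something trims it explicitly.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical one-message cache that is reused across a fetch loop. */
typedef struct {
    char  *data;
    size_t used;      /* bytes of the current message */
    size_t capacity;  /* bytes actually allocated */
} msg_cache;

/* Load one message, growing the buffer only when it is too small.
 * Returns 0 on success, -1 if the allocation fails. */
int cache_load(msg_cache *c, const char *msg, size_t len)
{
    assert(c != NULL && (msg != NULL || len == 0));
    if (len > c->capacity) {
        char *p = realloc(c->data, len);
        if (p == NULL)
            return -1;
        c->data = p;
        c->capacity = len;
    }
    if (len > 0)
        memcpy(c->data, msg, len);
    c->used = len;
    return 0;
}

/* Give back the slack so capacity matches the current message. */
void cache_trim(msg_cache *c)
{
    assert(c != NULL);
    if (c->used == 0) {
        free(c->data);
        c->data = NULL;
        c->capacity = 0;
        return;
    }
    char *p = realloc(c->data, c->used);
    if (p != NULL) {
        c->data = p;
        c->capacity = c->used;
    }
}

/* Release everything at the end of the loop. */
void cache_free(msg_cache *c)
{
    assert(c != NULL);
    free(c->data);
    c->data = NULL;
    c->used = 0;
    c->capacity = 0;
}
```

Growth of this kind is bounded by the largest message, so on its own it
would show up as a plateau rather than as steadily rising usage.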
----------------------------------------------------------------------
aaron - 24-Mar-06 12:25
----------------------------------------------------------------------
How does this look in SVN trunk? I haven't tried it with any huge mailboxes
myself...
----------------------------------------------------------------------
sayler - 24-Mar-06 16:59
----------------------------------------------------------------------
I'll see if I can produce a test case against my 90k-message mbox.
Generally, I've not noticed dbmail using lots of memory (but I'm running
it on a server with 2GB so it's possible I'm just not noticing).
Thomas -- if you know which mail client you're using it would help. I can
try Thunderbird here easily.
----------------------------------------------------------------------
sayler - 24-Mar-06 17:58
----------------------------------------------------------------------
Paul,
Heap usage seems to grow pretty badly, and somewhat linearly, when doing
a FETCH 1:* (FULL). The core size reported by top grows from 2 MB on
program load to around 30-35 MB for FETCH 1:1000 (FULL) on my mailbox. I
had it up to about 105 MB on a FETCH 1:* before I killed it (I think it
was between 2000 and 3000 messages into the run).
I'm learning to play with valgrind and friends now.. I'll let you know if
I see anything interesting.
----------------------------------------------------------------------
sayler - 24-Mar-06 19:59
----------------------------------------------------------------------
OK, if I understand this right, memory is being allocated by
_set_content_from_stream (in g_mime_stream_write_string) and then never
freed. I'm not sure if this is a bug in GMime, in the way we use it, or
a misunderstanding on my part.
The attached valgrind/massif plots are from me doing:
1 LOGIN XXX YYY
2 SELECT INBOX
3 FETCH 1:700 (FULL)
4 LOGOUT
against a 90k-message INBOX. After fetching 700 messages, we have around
10 MB of heap allocated by g_mime_stream_write_string. I *think* the rest
of the usage is legit.
Anyone (Paul, Aaron?) care to comment? I don't understand GMime and our
usage of it well enough to dig very deep (yet).
----------------------------------------------------------------------
aaron - 24-Mar-06 22:09
----------------------------------------------------------------------
I don't understand how your suggested callchain works. Here's what I see in
the code:
lmtp.c/main.c
  dbmail_message_new_from_stream
  dbmail_message_init_with_stream
  _set_content_from_stream
OK, must not be this one, this is delivery...
dbmail-imapsession.c, dbmail-mailbox.c, dbmail-message.c call:
  db_init_fetch
  dbmail_message_retrieve
  _fetch_head/_fetch_full
  _retrieve
  dbmail_message_init_with_string
  _set_content
  _set_content_from_stream
OK, here we go. Something is going wrong here... Found it: char *buf =
g_new0... buf only gets freed in case DBMAIL_STREAM_LMTP and case
DBMAIL_STREAM_PIPE, but not in default or case DBMAIL_STREAM_RAW.
OK, try the latest SVN. I just wrapped buf inside an anonymous block, only
in the part of the switch where it gets used.
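
For reference, the shape of that change as a standalone sketch. This is
not the code from SVN: stream_type, buf_new(), buf_free(),
consume_scoped() and live_buffer_count() are hypothetical names, and
plain calloc/free stand in for g_new0/g_free so the example compiles
without GLib. The buffer is declared inside a braced block in the only
cases that use it, so no other case can return with it still allocated.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the stream type enum discussed above. */
typedef enum { STREAM_PIPE, STREAM_LMTP, STREAM_RAW } stream_type;

/* Count of buffers allocated but not yet freed, for illustration only. */
static int live_buffers = 0;

static char *buf_new(size_t n)
{
    char *p = calloc(n, 1);  /* zero-filled, like g_new0 */
    if (p != NULL)
        live_buffers++;
    return p;
}

static void buf_free(char *p)
{
    if (p != NULL) {
        assert(live_buffers > 0);
        free(p);
        live_buffers--;
    }
}

/* The earlier shape allocated buf before the switch and freed it only
 * in the LMTP/PIPE cases, so RAW and default left it allocated. Here
 * buf lives in a block scoped to the cases that need it. */
int consume_scoped(stream_type t)
{
    switch (t) {
    case STREAM_LMTP:
    case STREAM_PIPE: {
        char *buf = buf_new(64);
        if (buf == NULL)
            return -1;
        buf[0] = 'x';  /* placeholder for the real work on buf */
        buf_free(buf);
        break;
    }
    case STREAM_RAW:
    default:
        /* no buffer needed, so nothing to free */
        break;
    }
    return 0;
}

int live_buffer_count(void)
{
    return live_buffers;
}
```

Every path through the switch now ends with the same number of live
buffers it started with.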
----------------------------------------------------------------------
sayler - 24-Mar-06 23:43
----------------------------------------------------------------------
No dice. I still get a big chunk of stuff allocated.
Here's my (partial) theory after poring over the code today:
The problem doesn't happen (i.e. memory usage is nice and flat) if I do a
FETCH (INTERNALDATE)
or even
FETCH (RFC822.HEADER)
We get massive bloat because we're retaining the whole body after parsing
the BODYSTRUCTURE (which is needed by FULL).
As far as I can tell, the imap_cache is correctly freeing all its
resources after the new message is swapped in on every iteration of the
fetch loop (when the message id of self and the cache mismatch). However,
valgrind seems to think that the memory from g_mime_stream_write_string is
never reclaimed. g_mime_stream_write_string is used to convert the content
from a GString into a GMime object. The weird thing is that, as far as I
can tell, everything is being cleaned up properly.
If I add up the outputs of all the
"dbmail-imapsession.c,_imap_cache_update: cache size [XXX]" lines in my
log file, the total seems to be about the amount of memory leaked in
g_mime_stream_write_string.
Any thoughts?
Issue History
Date Modified    Username  Field                          Change
======================================================================
12-Dec-04 00:27  aaron     New Issue
12-Dec-04 00:30  aaron     Note Added: 0000439
12-Dec-04 09:34  paul      Note Added: 0000441
22-Aug-05 10:29  paul      Assigned To                    => paul
22-Aug-05 10:29  paul      Status                         new => acknowledged
22-Aug-05 10:29  paul      Projection                     none => redesign
22-Aug-05 10:29  paul      ETA                            none => > 1 month
24-Mar-06 12:25  aaron     Note Added: 0001051
24-Mar-06 16:59  sayler    Note Added: 0001055
24-Mar-06 17:58  sayler    Note Added: 0001057
24-Mar-06 19:52  sayler    File Added: massif.27855.pdf
24-Mar-06 19:52  sayler    File Added: massif.27855.txt
24-Mar-06 19:59  sayler    Note Added: 0001058
24-Mar-06 22:09  aaron     Note Added: 0001059
24-Mar-06 23:43  sayler    Note Added: 0001060
======================================================================