[Dbmail-dev] Ideas on further _ic_fetch() speedup

Mikhail Ramendik Fri, 29 Oct 2004 11:13:22 +0200 (CEST)

Hello,

I have some further ideas on _ic_fetch() speedup, both for the time when
we have is_header and, as I spent some more time thinking, for the
current situation as well.


But I'm afraid I can't code them :( Complicated work in C turned out to
be too much for me. I'm a casual programmer only (a tech writer by
profession); I have done a fair amount of coding in my life, but never in C.

I'll keep my coding to simple things; for example, I'll try to fix the
UID SEARCH UID 1:* failure (this prevents communication with Sylpheed).
But I can't handle serious FETCH and SEARCH speedups. The one I did was
relatively simple; for example, it never touched anything involving
allocation and freeing of buffers...

The SEARCH speedup idea (regexp search) was already discussed; FETCH was
only mentioned in broad terms. So I decided to describe the FETCH ideas
here; I really hope they will be of use.

The main cause of FETCH relative slowness is, I think, the fact that it
uses many queries. This can also cause system load problems when many
people do FETCHing at once. While other things like the header parse
loop also take some time, the bulk is spent in the queries. So we need
to reduce the number of queries.

An ideal solution would join the entire FETCH into one query. But this
would require additions on the database layer, because queries like that
can produce *big* result sets. At least for MySQL, we currently use
mysql_store_result, which loads the entire result set at once; we'd have
to add a way to use mysql_use_result. 

But, one can also do much without going that far - at least when
FETCHing only the headers. This solution would involve a load of several
sets of headers into a buffer, and then using up the buffered data
before the next query.

While we don't have is_header, I would be wary of queries that involve
more than 5 or 6 messages. It's easy to query for just the header of
*one* message (just add LIMIT 1), but a query for several messages will
have to include the bodies, which can be up to several megabytes each
sometimes - and we still use mysql_store_result. A large query would
result in a RAM hog every time we hit a series of big messages.

But querying for 5 messages and storing the headers would work even
without is_header. This is the query I'm thinking of: (line wrapped for
clarity; warning: not tested)

SELECT msg.message_idnr, blk.messageblk_idnr, blk.messageblk
FROM dbmail_messages msg, dbmail_messageblks blk
WHERE msg.physmessage_id = blk.physmessage_id
AND msg.message_idnr BETWEEN '%llu' AND '%llu'
AND msg.mailbox_idnr='%llu'
ORDER BY msg.message_idnr ASC, blk.messageblk_idnr ASC;
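
As a rough illustration of how the '%llu' placeholders above might be
filled in, here is a minimal sketch in C. The helper name
build_fetch_query() and its return convention are hypothetical and not
part of the dbmail code; only the query text comes from this message.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from dbmail): formats the batched query for
 * message_idnr values lo..hi inside one mailbox.
 * Returns the query length, or -1 if the range is empty or the buffer
 * is too small to hold the whole query. */
int build_fetch_query(char *buf, size_t size,
                      unsigned long long mailbox_idnr,
                      unsigned long long lo, unsigned long long hi)
{
    int n;

    assert(buf != NULL);
    if (lo > hi)
        return -1;

    n = snprintf(buf, size,
        "SELECT msg.message_idnr, blk.messageblk_idnr, blk.messageblk "
        "FROM dbmail_messages msg, dbmail_messageblks blk "
        "WHERE msg.physmessage_id = blk.physmessage_id "
        "AND msg.message_idnr BETWEEN '%llu' AND '%llu' "
        "AND msg.mailbox_idnr='%llu' "
        "ORDER BY msg.message_idnr ASC, blk.messageblk_idnr ASC",
        lo, hi, mailbox_idnr);

    /* snprintf reports the length it wanted; reject truncated output. */
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```

Checking the snprintf() return value against the buffer size means a
truncated query is never sent to the database.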

Then we loop through the result set, only use the first messageblk for
each message_idnr, parse the headers, and fetch them to the client. This
can be implemented in the current code, and I suspect it can speed
things up (and make them scale much better) even at this stage.
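
The loop described above could look roughly like the sketch below. It
works on a plain in-memory array instead of a real result set; struct
blk_row and find_header_rows() are made-up names for illustration, and
the sketch assumes the rows arrive in the order given by the ORDER BY
clause and that the first block of each message holds its header.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one row of the result set. */
struct blk_row {
    unsigned long long message_idnr;
    unsigned long long messageblk_idnr;
    const char *messageblk;
};

/* Scans rows sorted by (message_idnr, messageblk_idnr) and stores in
 * out[] the index of the first row of each message, up to max_out
 * entries. All later blocks of the same message are skipped.
 * Returns the number of indexes stored. */
size_t find_header_rows(const struct blk_row *rows, size_t nrows,
                        size_t *out, size_t max_out)
{
    size_t i, found = 0;

    assert(rows != NULL || nrows == 0);
    assert(out != NULL || max_out == 0);

    for (i = 0; i < nrows && found < max_out; i++) {
        /* A new message starts at row 0 or where message_idnr changes. */
        if (i == 0 || rows[i].message_idnr != rows[i - 1].message_idnr)
            out[found++] = i;
    }
    return found;
}
```

The caller would then parse the headers from rows[out[k]].messageblk
for each k; no buffers are allocated or freed inside the loop itself.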

When we finally have is_header, we can query for headers only. Then we
can query for, and buffer, something like 50 or 100 headers at one time,
with no changes to the database layer in the code. A header will not take
more than several kilobytes of RAM, so both the querying and the
buffering become easy on RAM.
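
One possible way to cut a FETCH into such batches is sketched below in
C. It assumes the message_idnr values to fetch are already available as
a sorted array; struct id_batch and next_batch() are hypothetical names,
not existing dbmail code. Each batch gives the lower and upper bound for
one BETWEEN query.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one batch: bounds for a BETWEEN query
 * and the number of requested messages it covers. */
struct id_batch {
    unsigned long long lo;
    unsigned long long hi;
    size_t count;
};

/* Takes up to batch_size ids starting at ids[*pos] (ids sorted
 * ascending), fills *b and advances *pos.
 * Returns 1 if a batch was produced, 0 when all ids are used up. */
int next_batch(const unsigned long long *ids, size_t n, size_t batch_size,
               size_t *pos, struct id_batch *b)
{
    size_t left;

    assert(ids != NULL && pos != NULL && b != NULL);
    assert(batch_size > 0);

    if (*pos >= n)
        return 0;

    left = n - *pos;
    b->count = left < batch_size ? left : batch_size;
    b->lo = ids[*pos];
    b->hi = ids[*pos + b->count - 1];
    *pos += b->count;
    return 1;
}
```

With batch_size set to 50, a FETCH over 1000 messages would need 20
queries. If the requested ids are sparse, the BETWEEN range may also
return messages that were not asked for, so the caller would have to
skip those rows.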

These ideas are for the current code base. I just won't be able to
handle those buffers the right way, tracking each pointer to a free() in
all the code; so I won't be able to code them. I hope someone here is
better at C than me :)

Yours, Mikhail Ramendik
