Re: [Dovecot] SQL indexes [was: How to get rid of locks]

Timo Sirainen Sun, 08 Apr 2007 01:30:26 -0700

On 8.4.2007, at 10.29, Daniel L. Miller wrote:

This is probably a scary thought, but . . . what would it take for the indexing part of Dovecot to be implemented via an API/plug-in model? I'm curious about the effect of using an external SQL engine (my vote would be Firebird) for processing these, and using an open plug-in method would allow for that without binding Dovecot to a particular implementation.

Well.. It would be possible to make the lib-index API completely virtualized, but I don't think there's much point. The lib-index API actually doesn't have all that much to do with reading/writing index files. It's much more about easily manipulating mailbox metadata in memory.

For example the way I was planning on implementing SQL mail storage was to create an in-memory index and keep it updated by reading the data from SQL. The same metadata is in SQL, but it still needs to be stored into Dovecot's internal structures (== the indexes).

Then there is, however, the dovecot.index.cache file. It's a pretty simple database, so replacing it with SQL would make more sense. The cache file API isn't virtualizable yet either, but I was planning on doing that if I ever got around to making the SQL mail storage plugin really usable.

A SQL cache replacement would need to be a bit tricky to work well, however. Currently the lib-storage API works like this:

 - mail_alloc() is done first. It tells which fields the caller will most likely want to fetch.
 - mail_set_seq() can be used to switch to any message in the mailbox.
 - mail_get_*() functions can be used to fetch the message data.

The simplest SQL implementation would just do a SQL query for each mail_get_*() call, but this would also be the slowest implementation. A bit better would be to use one SQL query to fetch all the data specified by mail_alloc() in the first mail_get_*() function call. Then if something extra is fetched that would generate extra SQL queries.
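
To make the "a bit better" variant concrete, here is a rough sketch in Python with sqlite3. It is not Dovecot code: the LazyMail class, its method names and the query counter are made up for illustration, and the table and column names are just the ones from the example query further down. The first get() for a hinted field loads all hinted fields in one query; a field outside the hint costs one more query.

```python
import sqlite3

# Column whitelist, so field names are never interpolated unchecked.
ALLOWED_FIELDS = ("envelope", "body", "internaldate")


class LazyMail:
    """Hypothetical stand-in for a mail object; not Dovecot's actual API."""

    def __init__(self, conn, wanted_fields):
        for name in wanted_fields:
            if name not in ALLOWED_FIELDS:
                raise ValueError("unknown field: %s" % name)
        self.conn = conn
        self.wanted = list(wanted_fields)  # hint given at allocation time
        self.uid = None
        self.loaded = {}
        self.queries = 0  # number of SQL round trips so far

    def set_uid(self, uid):
        # Switching to another message drops whatever was loaded before.
        self.uid = uid
        self.loaded = {}

    def get(self, field):
        if field not in ALLOWED_FIELDS:
            raise ValueError("unknown field: %s" % field)
        if field not in self.loaded:
            # A hinted field pulls in all hinted fields with one query;
            # an unhinted field is fetched on its own.
            fields = self.wanted if field in self.wanted else [field]
            sql = "SELECT %s FROM message_cache WHERE uid = ?" % ", ".join(fields)
            row = self.conn.execute(sql, (self.uid,)).fetchone()
            self.queries += 1
            if row is None:
                raise LookupError("no such uid: %r" % (self.uid,))
            self.loaded.update(zip(fields, row))
        return self.loaded[field]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE message_cache "
    "(uid INTEGER PRIMARY KEY, envelope TEXT, body TEXT, internaldate TEXT)"
)
conn.execute(
    "INSERT INTO message_cache VALUES (1, 'env-1', 'body-1', '2007-04-08')"
)

mail = LazyMail(conn, ["envelope", "internaldate"])
mail.set_uid(1)
mail.get("envelope")       # first access: one query for both hinted fields
mail.get("internaldate")   # already loaded, no query
mail.get("body")           # not hinted: one extra query
print(mail.queries)        # prints 2
```

This is still one query per message, which is the limitation the search-based approach below gets around.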

However, most of the time mail_set_seq() isn't used randomly. That mostly happens only when building a reply for the THREAD command. Usually searching is used:

 - mailbox_search_init() specifies search arguments. For FETCH commands this is simply "sequences 1-10".
 - mailbox_search_next() finds the next match and calls mail_set_seq() for that mail.

So by hooking into these functions you could figure out in _init() that you want to do the SQL query for messages 1-10, and the first call to _next() gives you the mail structure from which you can get the list of wanted fields. So the IMAP command:


UID FETCH 1:5,10:20 (ENVELOPE BODY INTERNALDATE)

Could be done with a single SQL query, something like:

select envelope, body, internaldate from message_cache where uid between 1 and 5 or uid between 10 and 20;
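
As a rough illustration of how such a query could be generated from the UID set, here is a small Python/sqlite3 sketch. The helper names parse_uid_set() and build_query() are invented for this example and are not part of Dovecot; only plain numbers and "a:b" ranges are handled ("*" is not), and the table layout is assumed from the query above.

```python
import sqlite3

# Column whitelist, so field names are never interpolated unchecked.
ALLOWED_FIELDS = ("envelope", "body", "internaldate")


def parse_uid_set(uid_set):
    """Turn a set such as "1:5,10:20" into a list of (low, high) pairs."""
    ranges = []
    for part in uid_set.split(","):
        first, _, second = part.partition(":")
        lo = int(first)
        hi = int(second) if second else lo
        ranges.append((min(lo, hi), max(lo, hi)))
    return ranges


def build_query(ranges, fields):
    """Build one parameterized SELECT covering all the given UID ranges."""
    if not ranges:
        raise ValueError("empty UID set")
    for name in fields:
        if name not in ALLOWED_FIELDS:
            raise ValueError("unknown field: %s" % name)
    where = " OR ".join("uid BETWEEN ? AND ?" for _ in ranges)
    sql = "SELECT uid, %s FROM message_cache WHERE %s ORDER BY uid" % (
        ", ".join(fields),
        where,
    )
    params = [n for pair in ranges for n in pair]
    return sql, params


# Demo data: 25 messages in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE message_cache "
    "(uid INTEGER PRIMARY KEY, envelope TEXT, body TEXT, internaldate TEXT)"
)
conn.executemany(
    "INSERT INTO message_cache VALUES (?, ?, ?, ?)",
    [(u, "env-%d" % u, "body-%d" % u, "2007-04-08") for u in range(1, 26)],
)

sql, params = build_query(
    parse_uid_set("1:5,10:20"), ["envelope", "body", "internaldate"]
)
rows = conn.execute(sql, params).fetchall()
print(len(rows))  # prints 16 (UIDs 1-5 and 10-20)
```

The rows could then be handed out one at a time as the search moves from match to match, so the whole FETCH costs a single round trip.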

