Re: [Dovecot] dbox redesign

2009-02-17 Thread Steffen Kaiser

On Thu, 12 Feb 2009, Allen Belletti wrote:


> I would add that having fewer, larger files should make backups much
> more feasible.  There's a certain amount of overhead for each file

That's true for full backups. I don't defend Maildir, especially because it
changes the filename, which looks like a new file to any backup software
(which usually isn't Maildir-aware, in my case).

> operation (especially for us GFS people!) and reducing the number of
> files will reduce that overhead.

That also makes partial recoveries problematic.

> Right now our backups (done via rsync) take a pretty scary amount of

Yep.

Bye,

--
Steffen Kaiser



Re: [Dovecot] dbox redesign

2009-02-12 Thread Timo Sirainen
On Thu, 2009-02-12 at 11:29 +0100, Mikkel wrote:
> Hi Timo
> 
> I have a few comments. Please just disregard them if I have 
> misunderstood your design.
> 
> Regarding your storage plan
> I find it very important that users can be stored in different locations 
> because:

This you misunderstood. The mails of a single user are stored in one
dbox directory, not all users.

> Regarding 7.
> I am very much for all the self-healing you describe.
> There is nothing worse than huge complex systems that fail just because 
> of some minor error that could easily be fixed without manual intervention.
> But also I'm a little worried in this regard.
> 
> Maildir is so robust that nothing can really go wrong.

Yes. If you don't care that much about performance Maildir is going to
be more reliable, especially when recovering from filesystem corruption.

> It should be very resilient to temporarily losing access to all files in 
> this operation (could happen very often on NFS mounts).

I/O errors and such are treated differently than corrupted/missing
files. So as long as reading gives an error it doesn't try to repair
anything.

> Also I imagine the self-healing going into loops if it doesn't 
> understand what’s going on.
> If the data changes due to manual intervention, or part of the file
> system can't be accessed, you could imagine the self-healing process trying
> again and again to fix something that isn't its job to fix.
> In that case it would be better if it just skipped the apparent failures.

I'm not really sure what you're thinking about here. Assuming there
aren't bugs in the fixup code, it should be able to fix things. If
someone manually goes and breaks things again, then sure it fixes them
again later, but there's really no automatic looping. Also Dovecot
already does index file fixing if it notices corruption, so this won't
be all that much different.

> If there is serious data corruption and you have only one file then all 
> operations are paused while the self healing is trying to figure out 
> what went wrong

There will be multiple files even per user, but yes, if corruption is
noticed then the user is blocked until the corruption is fixed.

>  (and what happens if different servers decide to do 
> self-healing on this one file at the same time?).

The same as if two processes in one server decide to self-heal: Locking
prevents it from happening.





Re: [Dovecot] dbox redesign

2009-02-12 Thread Allen Belletti

I would add that having fewer, larger files should make backups much
more feasible.  There's a certain amount of overhead for each file
operation (especially for us GFS people!) and reducing the number of
files will reduce that overhead.

Right now our backups (done via rsync) take a pretty scary amount of
time, only to get worse as the size of the mailstore (currently 200G) grows.

Personally I'm pretty excited about dbox.

Allen

Timo Sirainen wrote:
> On Wed, 2009-02-11 at 14:32 -0800, Seth Mattinen wrote:
> > Timo Sirainen wrote:
> > > This is about how to implement multiple msgs/file dbox format. The
> > > current v1.1's one msg/file design would stay pretty much the same and
> > > it would be compatible with this new design.
> >
> > Out of curiosity, what's the advantage to going to multiple messages per
> > file? Wouldn't this have the same problems as mbox?
>
> Multiple per file, not everything in one file. As long as the file size
> is set "right", it's probably faster than one per file. We'll see :)

--
Allen Belletti
al...@isye.gatech.edu                 404-894-6221 Phone
Industrial and Systems Engineering    404-385-2988 Fax
Georgia Institute of Technology


Re: [Dovecot] dbox redesign

2009-02-12 Thread Mikkel

Hi Timo

I have a few comments. Please just disregard them if I have 
misunderstood your design.


Regarding your storage plan
I find it very important that users can be stored in different locations 
because:
1. Discount users could be placed on cheap storage while others are 
offered premium service on expensive hardware
2. It's easy to scale if you just add another LUN from your SAN or mount 
from NAS
3. In order to avoid huge directories you can put users into subdirs,
with each subdir containing only, say, 1000 users
All this is very easy to achieve in 1.1 because you can return 
individual storage dirs for indexes and data from the user db.
I'm not sure from reading your post whether this will still be possible 
but I believe it’s a very important thing.



Regarding 7.
I am very much for all the self-healing you describe.
There is nothing worse than huge complex systems that fail just because 
of some minor error that could easily be fixed without manual intervention.

But also I'm a little worried in this regard.

Maildir is so robust that nothing can really go wrong. But here you have 
index files and data files located in different places.
Imagine the index file being on one NFS mount whilst the data resides on 
another.
Or if the administrator is purposely loading a different index file or 
data file from a backup.


Worst case scenario is that the self-healing mistakes a manual operation
for a failure and breaks something.
It should be very resilient to temporarily losing access to all files in 
this operation (could happen very often on NFS mounts).


Also I imagine the self-healing going into loops if it doesn't 
understand what’s going on.
If the data changes due to manual intervention, or part of the file
system can't be accessed, you could imagine the self-healing process trying
again and again to fix something that isn't its job to fix.

In that case it would be better if it just skipped the apparent failures.


Timo wrote:
>I'm also wondering if it's better for each mailbox to have its separate
>dovecot.index.cache file or if there should be one cache file for the
>map index.
I think you should consider more files as the general choice (not only 
regarding cache files).
Imagine many dovecot servers accessing the same storage simultaneously. 
I figure it would be a lot easier if they weren’t all trying to 
read/update one essential file at the same time (with only one file, 
load can’t be spread across multiple mounts and everything goes down if 
the mount with the essential file is inaccessible).
If there is serious data corruption and you have only one file then all 
operations are paused while the self healing is trying to figure out 
what went wrong (and what happens if different servers decide to do 
self-healing on this one file at the same time?).
With one file per maildir only a small portion of the users are 
affected, the load is spread and really bad file corruption doesn’t 
break everything for thousands of users.


Other than that I’m just really glad that dbox is progressing. I 
consider it the feature.
Dbox is the email administrator’s wet dream. I’m already dreaming of 
completely avoiding the scalability issues of large Maildirs (which is 
the biggest challenge today in my opinion) and reducing the IO. Buying 
more IO is an order of magnitude more expensive than getting more RAM or 
CPU power (and dovecot barely needs any RAM and CPU anyway).


Best wishes, Mikkel



Re: [Dovecot] dbox redesign

2009-02-11 Thread Timo Sirainen
On Wed, 2009-02-11 at 17:35 -0500, Timo Sirainen wrote:
> On Wed, 2009-02-11 at 14:32 -0800, Seth Mattinen wrote:
> > Timo Sirainen wrote:
> > > This is about how to implement multiple msgs/file dbox format. The
> > > current v1.1's one msg/file design would stay pretty much the same and
> > > it would be compatible with this new design.
> > > 
> > 
> > Out of curiosity, what's the advantage to going to multiple messages per
> > file? Wouldn't this have the same problems as mbox?
> 
> Multiple per file, not everything in one file. As long as the file size
> is set "right", it's probably faster than one per file. We'll see :)

Also there are no locking issues since reading doesn't require locking
and write locks are very short lived. Corruption isn't possible because
data is never copied within a file. A crash can happen at any point and
Dovecot will be able to recover from it 100%. The worst that can happen
is that some extra garbage is left lying around for some time wasting
disk space.





Re: [Dovecot] dbox redesign

2009-02-11 Thread Timo Sirainen
On Wed, 2009-02-11 at 14:32 -0800, Seth Mattinen wrote:
> Timo Sirainen wrote:
> > This is about how to implement multiple msgs/file dbox format. The
> > current v1.1's one msg/file design would stay pretty much the same and
> > it would be compatible with this new design.
> > 
> 
> Out of curiosity, what's the advantage to going to multiple messages per
> file? Wouldn't this have the same problems as mbox?

Multiple per file, not everything in one file. As long as the file size
is set "right", it's probably faster than one per file. We'll see :)





Re: [Dovecot] dbox redesign

2009-02-11 Thread Seth Mattinen
Timo Sirainen wrote:
> This is about how to implement multiple msgs/file dbox format. The
> current v1.1's one msg/file design would stay pretty much the same and
> it would be compatible with this new design.
> 

Out of curiosity, what's the advantage to going to multiple messages per
file? Wouldn't this have the same problems as mbox?

~Seth


[Dovecot] dbox redesign

2009-02-11 Thread Timo Sirainen
This is about how to implement multiple msgs/file dbox format. The
current v1.1's one msg/file design would stay pretty much the same and
it would be compatible with this new design.

dbox directories with multiple msgs/file would be like:

~/dbox/storage/ has the actual mail data for all mailboxes
~/dbox/mailboxes/ has subdirectories containing mailboxes and their
indexes

Also, since dbox already supports single msg per file, those files
would be stored in the mailboxes/ directory. So the idea would be that
either you use multiple msgs per file with a global storage, or you use
single msg per file without a global storage (a mixed setup with some
mails in storage/ and some in mailboxes/ is also possible, mainly to
allow migration between those configurations).

The storage/ directory would have a new "map index" which is a regular
dovecot index (dovecot.index and dovecot.index.log). So the mailbox
index would point to mails using an intermediary "map UID". This way if
mails are moved to another file only the map index needs to be updated.

GUID would be a globally unique 128 bit ID for messages. So if the map
index gets corrupted for any reason, it's possible to rebuild it by
finding the mails using GUIDs.

v1.1 dbox has this "dbox.index" file which I was originally planning on
using with multiple msgs/file. It had complex file range locking stuff.
Now I'm thinking that it's pretty much useless. The only reason for its
existence with the new design is for listing metadata for files
converted from Maildir.

Map index record would contain:
 - 32 bit map UID
 - 8 bit flags (MAIL_DELETED flag = message marked as expunged)
 - 8 bit unused wasted space
 - 16 bit refcount
 - 32 bit file sequence
 - 32 bit file offset
 --> total 128 bits/msg
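
A minimal sketch of how the six fields listed above add up to 128 bits, using Python's struct module. The little-endian byte order, the field order on disk, and the numeric value chosen for MAIL_DELETED are assumptions for illustration only; the post fixes just the field widths.

```python
import struct

# Hypothetical layout for one map index record, following the field list:
# 32-bit map UID, 8-bit flags, 8 unused bits, 16-bit refcount,
# 32-bit file sequence, 32-bit file offset.
# "<" (little-endian, no padding) is an assumption made for this sketch.
MAP_RECORD = struct.Struct("<IBxHII")

MAIL_DELETED = 0x01  # assumed bit value; the post only names the flag


def pack_map_record(map_uid, flags, refcount, file_seq, file_offset):
    """Serialize one map index record into 16 bytes (128 bits)."""
    return MAP_RECORD.pack(map_uid, flags, refcount, file_seq, file_offset)


def unpack_map_record(data):
    """Return (map_uid, flags, refcount, file_seq, file_offset)."""
    return MAP_RECORD.unpack(data)


record = pack_map_record(map_uid=7, flags=MAIL_DELETED, refcount=1,
                         file_seq=3, file_offset=4096)
print(len(record) * 8)
print(unpack_map_record(record))
```

The "x" in the format string is the unused byte, so it takes no argument when packing and yields no value when unpacking.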

Mailbox index:
 - IMAP UID, flags, keywords, etc.
 - 32 bit map UID
 - 128 bit GUID

dbox file metadata:
 - 128 bit GUID
 - size, vsize, received time, saved time, etc.
 - initial mailbox name (if all indexes get trashed, we can still figure
out at least one mailbox where to put the mail. copies would get lost
though.)
 (- no map UID, no imap UID)

How to save a message with multiple msgs/file:

1. Find dbox file where to append to:
1.1. Look up the last message from map index
1.2. Is the file "too old"? (or doesn't exist at all)
 - Yes -> Create new dbox file
1.3. Is the file "too large"?
 - Yes -> Look at the previous file (one sequence less) and goto 1.2.
1.4. Try to lock the file.
 - Fail -> Look at prev file and goto 1.2.

Now we have a locked/new dbox file where we can write to. Because 1.4.
step only tries to lock the file, there's no waiting on locks. This also
means that if e.g. two processes are writing new messages rapidly they
may be appending actively to two different files. I don't think that's a
problem, better than waiting for locks.
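
Steps 1.1 to 1.4 above can be sketched roughly as follows. This is only an in-memory illustration: the DboxFile class, the try_lock helper and both thresholds are hypothetical stand-ins, since the post leaves "too old" and "too large" open.

```python
from dataclasses import dataclass


@dataclass
class DboxFile:
    # Hypothetical in-memory stand-in for one dbox file under storage/.
    seq: int
    age_days: float
    size: int
    locked: bool = False  # True while some other process holds the lock


MAX_AGE_DAYS = 1            # assumed "too old" threshold
MAX_SIZE = 2 * 1024 ** 2    # assumed "too large" threshold


def try_lock(f):
    """Non-blocking lock attempt: never waits, only reports success."""
    if f.locked:
        return False
    f.locked = True
    return True


def find_append_file(files, last_seq):
    """Walk steps 1.2-1.4, starting from the file of the last message.

    `files` maps file sequence -> DboxFile. Returns a locked existing
    file, or None when a new dbox file should be created instead.
    """
    seq = last_seq
    while True:
        f = files.get(seq)
        if f is None or f.age_days > MAX_AGE_DAYS:   # step 1.2
            return None
        if f.size < MAX_SIZE and try_lock(f):        # steps 1.3 and 1.4
            return f
        seq -= 1                                     # look at the previous file


files = {
    1: DboxFile(seq=1, age_days=0.5, size=100),
    2: DboxFile(seq=2, age_days=0.2, size=MAX_SIZE),         # too large
    3: DboxFile(seq=3, age_days=0.1, size=10, locked=True),  # lock is taken
}
chosen = find_append_file(files, last_seq=3)
print(chosen.seq if chosen else "create a new file")
```

Because the lock attempt never blocks, a busy file just pushes the writer to an older file or to a new one, which matches the "no waiting on locks" point above.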

2a) We're using an existing file and we need to find the append offset.
Since we found the file by finding the last msg in the file, we also
know the last message's offset. I wasn't really planning on saving the
message sizes in the index file, so to get the append offset I guess it
needs to do an extra read on the last msg's header to find the size and
skip over it. Hmm. Or would it be less disk I/O to store the size on the
index so it could be found directly? I'm not really sure..

In any case, after we find the append offset, check to see if it's at
EOF. If not, that means that either another process just saved a new
message there or a process crashed previously and left garbage lying
around. Refresh the map index to see if this file+offset exists in it.
If not, truncate the file and just continue writing there. If it exists,
figure out the new append offset and see again if the file limit would
be reached. If the file would become too large, unlock the file and goto
step 1.
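
The EOF check in 2a) could look roughly like this. The map_index dict below is a hypothetical stand-in for a freshly refreshed map index, and it stores message sizes only to keep the sketch short (the post leaves open whether sizes go in the index or are read from the message header). The file size limit re-check is left out.

```python
import os
import tempfile


def resolve_append_offset(path, file_seq, offset, map_index):
    """Return the offset to append at, truncating crash garbage if needed.

    `map_index` is a hypothetical dict {(file_seq, offset): message_size}.
    """
    while offset != os.path.getsize(path):
        size = map_index.get((file_seq, offset))
        if size is None:
            # Nothing in the map index points here: leftover garbage.
            os.truncate(path, offset)
        else:
            # Another process saved a message here; skip over it.
            offset += size
    return offset


# Demo: a 100-byte file where the map index knows one 30-byte message
# at offset 40; whatever follows it was left behind by a crash.
with tempfile.TemporaryDirectory() as tmpdir:
    demo_path = os.path.join(tmpdir, "demo-file")
    with open(demo_path, "wb") as f:
        f.write(b"x" * 100)
    append_at = resolve_append_offset(demo_path, 1, 40, {(1, 40): 30})
    size_after = os.path.getsize(demo_path)
print(append_at, size_after)
```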

2b) We're writing to a new file. No need to worry about anything in 2a)

3. Write the message and its metadata to dbox file (including generated
128 bit GUID).

4. Assign map UIDs for the written mails and write APPEND records to map
index's transaction log. The record would contain the map UID, file seq,
offset, refcount=1. The transaction is saved with a "weak" flag (wonder
if there's a better name for this) and its offset is remembered.
 - If we're creating a new dbox file, it's assigned the file seq and
rename()d to the final file name while the map index is locked.

5. Write APPEND record to mailbox index's transaction log with IMAP UID,
map UID and GUID (and flags, keywords, etc).

6. Write "commit offset=x" record where x is the offset remembered in
step 4. This marks the 4's weak transaction as being fully finished.

7. dbox file is unlocked (if we weren't creating a new file).
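
Steps 4 and 6 pair each weak transaction with a later "commit offset=x" record. A minimal sketch of how a reader of the log could spot weak transactions that never got their commit, assuming a hypothetical in-memory log of (offset, kind, value) tuples rather than Dovecot's real transaction log format:

```python
def find_unresolved_weak(log):
    """Return offsets of weak transactions that have no commit record.

    `log` is a hypothetical list of (offset, kind, value) tuples: kind is
    "weak_append" or "commit", and for "commit" records `value` is the
    offset of the weak transaction being finished.
    """
    weak = set()
    for offset, kind, value in log:
        if kind == "weak_append":
            weak.add(offset)
        elif kind == "commit":
            weak.discard(value)
    return sorted(weak)


log = [
    (0, "weak_append", None),
    (64, "commit", 0),          # finishes the transaction at offset 0
    (80, "weak_append", None),  # writer crashed or is still running
]
unresolved = find_unresolved_weak(log)
print(unresolved)
```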

When reading the index and we see a weak transaction without a commit
record, call a resolve() function in dbox code. It finds the dbox file
in the weak transaction and tries to lock it. If it can't lock it, it
(probably) means that there's still a process 

Re: [Dovecot] dbox redesign

2007-05-18 Thread Gunter Ohrner
On Saturday, 19 May 2007, Timo Sirainen wrote:
> 1) Have another human readable mailbox ID <-> name mapping file which
> is used if the binary index is corrupted. If mailboxes are
> created/deleted/renamed often, this would just slow things down. Might
> be a good idea optionally though.

The Mailbox structure usually is not changed that often. Maybe just 
provide a way to dump/export the current mapping to a specially formatted 
text file and a way to manually load/import a provided dump file.

This way, administrators can configure daily cron jobs to dump the current 
mailbox state and if a mapping really gets lost, a "pretty good" mapping 
could be reconstructed without any runtime penalty.
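
A small sketch of such a dump/import pair, assuming a hypothetical one-line-per-mailbox text format ("id<TAB>name") and mailbox names that contain no tabs or newlines. None of these names come from Dovecot itself.

```python
def dump_mapping(mapping):
    """Render {mailbox_id: name} as one "id<TAB>name" line per mailbox."""
    return "".join(f"{mid}\t{name}\n" for mid, name in sorted(mapping.items()))


def load_mapping(text):
    """Parse the text produced by dump_mapping back into a dict."""
    mapping = {}
    for line in text.splitlines():
        if not line:
            continue
        mid, name = line.split("\t", 1)
        mapping[int(mid)] = name
    return mapping


dumped = dump_mapping({2: "Trash", 1: "INBOX"})
restored = load_mapping(dumped)
print(dumped, end="")
```

A cron job would write the dumped text to a file, and the import side would only be run by hand after a mapping was lost.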

> 2) If the ID <-> name mapping is lost, the mailboxes could be created
> using those IDs as their names.

Yes, for example, with the option to overwrite this synthesized mapping 
with the latest dump.

Greetings,

  Gunter

-- 
*** Powered by AudioScrobbler --> http://www.last.fm/user/Interneci/ ***
15:30 | Within Temptation - The Promise
15:24 | Within Temptation - Mother Earth
15:19 | Within Temptation - Ice Queen
14:21 | Within Temptation - What Have You Done (Rock Mix)
*** PGP encryption for emails welcome :-) *** PGP: 0x1128F25F ***




Re: [Dovecot] dbox redesign

2007-05-18 Thread Timo Sirainen
On Wed, 2007-05-16 at 20:27 +0200, Gunter Ohrner wrote:
> On Wednesday, 16 May 2007, Timo Sirainen wrote:
> > > Yes, I think treating mailboxes similarly to keywords is ideal.  There
> > Except if you want to handle some mailboxes in a special way it's
> > easier if they're separated on disk. Such as renaming or deleting
> > mailboxes is a lot easier.They're based on filtering rules. I don't 
> think they support "copying"
> messages. So the virtual folders are easily rebuilt by just re-applying
> the filters into all the messages.
> 
> Not necessarily, if you add one level of indirection: simply number the
> mailboxes by index numbers internally and provide a number/name mapping
> somewhere. This way, a mailbox can be renamed simply by updating
> the map, and can be deleted by removing the map entry. Stale index
> numbers may be left in the messages and can be cleaned up the next time a
> message's folder list is updated or messages are expunged.

Right. This would also make it use less space inside the dbox files.
There already exists a mailbox list index in v1.1 which contains mailbox
ID <-> name mappings. But I'm still a bit concerned about its stability.
There are two things that could be done:

1) Have another human readable mailbox ID <-> name mapping file which is
used if the binary index is corrupted. If mailboxes are
created/deleted/renamed often, this would just slow things down. Might
be a good idea optionally though.

2) If the ID <-> name mapping is lost, the mailboxes could be created
using those IDs as their names. That would be a lot better than just
having all the mails merged into a single mailbox. As additional help,
there could be a couple of built-in mailbox IDs for INBOX, Trash and
Drafts. Perhaps that could be admin-configurable, but then again adding
new IDs could make it conflict with existing ones. Perhaps just a single
1=INBOX would be enough..

The mailbox IDs could have a validity number as well, similar to
UIDVALIDITY for message UIDs. That would make sure that it's safe to use
the validity+ID combination to uniquely and permanently identify a
mailbox, even if the mailbox list mapping was completely rebuilt (in
that case it would get a new validity).
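
A rough sketch of option 2 combined with the validity idea: after the mapping is lost, mailboxes are recreated with their IDs as names (except the built-in 1=INBOX), and the rebuilt mapping gets a new validity. Bumping the old validity by one is an assumption made here; the post only says the rebuilt mapping "would get a new validity".

```python
def rebuild_mapping(found_ids, old_validity, builtin=None):
    """Rebuild a lost ID <-> name mapping from the mailbox IDs still found.

    Unknown mailboxes are named after their ID. Returns (validity, names).
    """
    builtin = {1: "INBOX"} if builtin is None else builtin
    names = {mid: builtin.get(mid, str(mid)) for mid in sorted(found_ids)}
    return old_validity + 1, names


def mailbox_key(validity, mailbox_id):
    """Identify a mailbox by the validity+ID combination."""
    return (validity, mailbox_id)


validity, names = rebuild_mapping({9, 1, 5}, old_validity=3)
print(validity, names)
print(mailbox_key(validity, 5) != mailbox_key(3, 5))
```

Keys taken before the rebuild no longer match afterwards, which is the same effect UIDVALIDITY has for message UIDs.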




Re: [Dovecot] dbox redesign

2007-05-16 Thread Gunter Ohrner
On Wednesday, 16 May 2007, Gunter Ohrner wrote:
> > mailboxes is a lot easier.They're based on filtering rules. I don't
> think they support "copying"
> messages. So the virtual folders are easily rebuilt by just re-applying
> the filters into all the messages.

Whoops, this junk should not have been in the message... Looks as if I
accidentally middle-clicked somehow... :-/

Greetings,

  Gunter

-- 
*** Powered by AudioScrobbler --> http://www.last.fm/user/Interneci/ ***
21:54 | The Retrosic - Silence
21:49 | The Retrosic - Deathdealer
21:44 | The Retrosic - Bloodsport
21:40 | The Retrosic - Desperate Youth
*** PGP encryption for emails welcome :-) *** PGP: 0x1128F25F ***




Re: [Dovecot] dbox redesign

2007-05-16 Thread Gunter Ohrner
On Wednesday, 16 May 2007, Timo Sirainen wrote:
> > Yes, I think treating mailboxes similarly to keywords is ideal.  There
> Except if you want to handle some mailboxes in a special way it's
> easier if they're separated on disk. Such as renaming or deleting
> mailboxes is a lot easier.They're based on filtering rules. I don't 
think they support "copying"
messages. So the virtual folders are easily rebuilt by just re-applying
the filters into all the messages.

Not necessarily, if you add one level of indirection: simply number the
mailboxes by index numbers internally and provide a number/name mapping
somewhere. This way, a mailbox can be renamed simply by updating
the map, and can be deleted by removing the map entry. Stale index
numbers may be left in the messages and can be cleaned up the next time a
message's folder list is updated or messages are expunged.

Greetings,

  Gunter

-- 
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
PEOPLE'S WHOLE LIVES *DO* PASS IN FRONT OF THEIR EYES BEFORE THEY DIE. 
THE PROCESS IS CALLED 'LIVING'.-- (Terry Pratchett, The Last 
Continent)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   PGP-encrypted mail preferred!       +
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+




Re: [Dovecot] dbox redesign

2007-05-16 Thread Timo Sirainen
On Wed, 2007-05-16 at 07:47 -0400, Charles Marcus wrote:
> >> Although one possibility would be treat mailboxes a bit similarly 
> >> than keywords. So that when a message is copied to another mailbox,
> >> the message in dbox file is updated to contain information that it 
> >> exists in such and such mailboxes. Hmm. Perhaps that would be good
> >> enough, yes.
> 
> > Yes, I think treating mailboxes similarly to keywords is ideal.  There
> > really is no reason to physically separate mailboxes on disk.  All 
> > that is needed is this logical separation if it can be done in a 
> > reliable way.
> > 
> > Or maybe track this in mailbox-specific index files, and also have a 
> > corresponding text file that stores a list of messages that are
> > contained in that mailbox... similar to maildir's dovecot-uidlist 
> > file.  Then if you lose the index you can rebuild the index from the 
> > text file.
> 
> This sounds suspiciously like 'virtual folders', that are supported by 
> both Evolution and Thunderbird... how do they do it?

They're based on filtering rules. I don't think they support "copying"
messages. So the virtual folders are easily rebuilt by just re-applying
the filters into all the messages.





Re: [Dovecot] dbox redesign

2007-05-16 Thread Charles Marcus
> > Would be nice if copying a message from one mailbox to another
> > wouldn't require actually reading+writing the whole message
> > contents. But I can't really figure out how to implement this
> > without requiring that there is only a single dbox storage which
> > contains the mails for all the mailboxes, and the mailboxes
> > themselves are just Dovecot's index files containing pointers to
> > the dbox storage.
> >
> > The problem with having everything in one storage is that if the
> > index files are broken, the messages can't be placed into correct
> > mailboxes anymore.
> >
> > Although one possibility would be to treat mailboxes a bit similarly
> > to keywords. So that when a message is copied to another mailbox,
> > the message in dbox file is updated to contain information that it
> > exists in such and such mailboxes. Hmm. Perhaps that would be good
> > enough, yes.
>
> Yes, I think treating mailboxes similarly to keywords is ideal.  There
> really is no reason to physically separate mailboxes on disk.  All
> that is needed is this logical separation if it can be done in a
> reliable way.
>
> Or maybe track this in mailbox-specific index files, and also have a
> corresponding text file that stores a list of messages that are
> contained in that mailbox... similar to maildir's dovecot-uidlist
> file.  Then if you lose the index you can rebuild the index from the
> text file.

This sounds suspiciously like 'virtual folders', which are supported by
both Evolution and Thunderbird... how do they do it?


--

Best regards,

Charles


Re: [Dovecot] dbox redesign

2007-05-16 Thread Timo Sirainen
On Wed, 2007-05-16 at 06:40 -0400, Bill Boebel wrote:
> > Although one possibility would be treat mailboxes a bit similarly than
> > keywords. So that when a message is copied to another mailbox, the
> > message in dbox file is updated to contain information that it exists in
> > such and such mailboxes. Hmm. Perhaps that would be good enough, yes.
> > 
> 
> Yes, I think treating mailboxes similarly to keywords is ideal.  There
> really is no reason to physically separate mailboxes on disk.  All
> that is needed is this logical separation if it can be done in a
> reliable way.

Except if you want to handle some mailboxes in a special way it's easier
if they're separated on disk. Such as renaming or deleting mailboxes is
a lot easier.

> Or maybe track this in mailbox-specific index files, and also have a
> corresponding text file that stores a list of messages that are
> contained in that mailbox... similar to maildir's dovecot-uidlist
> file.  Then if you lose the index you can rebuild the index from the
> text file.

Except that such mailbox-messagelist file could also be counted as
"index file", and losing it again loses the messages :) That's why I
thought saving the mailbox name in the message file's headers would be
better. If you then lose the mailbox name, you most likely have lost the
message itself as well. Also it makes it easier to restore individual
messages from backups.





Re: [Dovecot] dbox redesign

2007-05-16 Thread Bill Boebel
On Sat, May 12, 2007 9:10 am, Timo Sirainen <[EMAIL PROTECTED]> said:

> Fast copying
> 
> 
> Would be nice if copying a message from one mailbox to another wouldn't
> require actually reading+writing the whole message contents. But I can't
> really figure out how to implement this without requiring that there is
> only a single dbox storage which contains the mails for all the
> mailboxes, and the mailboxes themselves are just Dovecot's index files
> containing pointers to the dbox storage.
> 
> The problem with having everything in one storage is that if the index
> files are broken, the messages can't be placed into correct mailboxes
> anymore.
> 
> Although one possibility would be treat mailboxes a bit similarly than
> keywords. So that when a message is copied to another mailbox, the
> message in dbox file is updated to contain information that it exists in
> such and such mailboxes. Hmm. Perhaps that would be good enough, yes.
> 

Yes, I think treating mailboxes similarly to keywords is ideal.  There really is
no reason to physically separate mailboxes on disk.  All that is needed is this
logical separation if it can be done in a reliable way.

Or maybe track this in mailbox-specific index files, and also have a
corresponding text file that stores a list of messages that are contained in
that mailbox... similar to maildir's dovecot-uidlist file.  Then if you lose
the index you can rebuild the index from the text file.

Bill



[Dovecot] dbox redesign

2007-05-12 Thread Timo Sirainen
I don't think anyone uses dbox currently, so the whole format could
still be redesigned. So I was thinking about doing two major changes:

1. Rely on index files a lot more. The flags are already stored in index
files, so there's no need to waste I/O updating them to dbox files all
the time. They could still be updated (if indexes get deleted, the flags
aren't all gone), but less often.

2. Require fcntl() locking. Currently dbox uses dotlocks which is slow.

Cydir could be a good alternative also once index file code is made a
bit more robust. Perhaps I could implement single instance attachments
for cydir too..

Locking
=======

The current dbox "index" file would be gone. It's pretty useless.
Replace it with a whole new index file. Or perhaps it should be called
"locks" file or something.

The locks file would contain records:  
. So something like:


1 4645a60f 0
2  N
3  D

That would mean that the first file is locked by some process, either
for appending or expunging. The 2nd and 3rd files aren't locked. New
messages can't be appended to 2nd file anymore. 3rd file is already
deleted and this record needs to be removed when rewriting the file.

Locking a dbox file for either appends or expunges is done like:

1. See if timestamp is zero
 - If not, see if it's older than .. let's say a day or so ..
   a) Yes: Continue to 2.
   b) No: Assume the file is locked
2. Do fcntl() byte range lock over the record line in the locks file.
 - If it failed, the record is locked
3. Write the timestamp.
4. Compare stat() and fstat() inodes to see if the file was rebuilt
  - If yes, reopen the file and goto 1
5. File is now locked. Do the append/expunge.
6. Write timestamp to zero.
7. Unlock the byte range.

If a file is locked, append will try another file and expunge will mark
the message as expunged instead of actually expunging it yet.
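
A minimal sketch of steps 1-3 and 6-7 using POSIX byte range locks through Python's fcntl module. The fixed-width record line ("%08x %08x %c\n": file ID, lock timestamp, flag) is purely an assumption so that a record's byte range can be computed; the locks file header and the step 4 inode check are left out.

```python
import fcntl
import os
import tempfile
import time

RECORD_LEN = 20          # assumed fixed-width line: "%08x %08x %c\n"
TS_OFFSET = 9            # the timestamp field starts after "file_id "
STALE_AFTER = 24 * 3600  # "a day or so"


def lock_record(fd, index, now=None):
    """Try to lock record `index`; return True on success (steps 1-3)."""
    now = int(time.time()) if now is None else now
    start = index * RECORD_LEN
    line = os.pread(fd, RECORD_LEN, start).decode("ascii")
    timestamp = int(line.split()[1], 16)
    if timestamp != 0 and now - timestamp < STALE_AFTER:
        return False                     # step 1b: assume it is locked
    try:
        # Step 2: non-blocking byte range lock over this record line only.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, RECORD_LEN, start)
    except OSError:
        return False
    os.pwrite(fd, b"%08x" % now, start + TS_OFFSET)   # step 3
    return True


def unlock_record(fd, index):
    """Steps 6 and 7: zero the timestamp, then release the byte range."""
    start = index * RECORD_LEN
    os.pwrite(fd, b"%08x" % 0, start + TS_OFFSET)
    fcntl.lockf(fd, fcntl.LOCK_UN, RECORD_LEN, start)


with tempfile.TemporaryDirectory() as tmpdir:
    locks_path = os.path.join(tmpdir, "locks")
    with open(locks_path, "wb") as f:
        f.write(b"00000001 00000000 0\n00000002 00000000 N\n")
    fd = os.open(locks_path, os.O_RDWR)
    first = lock_record(fd, 1, now=1_000_000)    # free record
    second = lock_record(fd, 1, now=1_000_010)   # fresh timestamp seen
    unlock_record(fd, 1)
    final = os.pread(fd, RECORD_LEN, RECORD_LEN)
    os.close(fd)
print(first, second, final)
```

Within a single process the second attempt is refused by the timestamp check alone; fcntl locks only conflict between different processes, which is where step 2 does the real work.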

Note that the locks file is read without locking. This is safe because
data is never moved within the file, and it doesn't matter if the
timestamp isn't read correctly always. The timestamp check is only an
optimization. Actually I'm not sure if it would be better not to have
the timestamp at all.

Deleted records will stay in the file until the file is rebuilt. If a
deleted record is noticed in the file, the process tries to lock the
whole locks file. If it succeeds, it proceeds with writing the
non-deleted records to a temporary file and rename()ing it over the
locks file.
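
The rebuild described above (write the surviving records to a temporary file, then rename() it over the locks file) might be sketched like this. It assumes whitespace-separated record lines whose last field is "D" for deleted records, as in the example earlier in the post, and it skips taking the whole-file lock first.

```python
import os
import tempfile


def rebuild_locks_file(path):
    """Rewrite `path` without records whose last field is "D" (deleted)."""
    with open(path) as src:
        kept = [line for line in src if line.split()[-1:] != ["D"]]
    # Create the temporary file in the same directory so that the final
    # rename stays on one filesystem and is atomic.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "w") as dst:
        dst.writelines(kept)
    os.replace(tmp, path)
    return len(kept)


with tempfile.TemporaryDirectory() as tmpdir:
    demo = os.path.join(tmpdir, "locks")
    with open(demo, "w") as f:
        f.write("1 4645a60f 0\n2  N\n3  D\n")
    kept_count = rebuild_locks_file(demo)
    with open(demo) as f:
        rebuilt = f.read()
print(kept_count)
```

Because the old file is replaced rather than edited, a process that still has it open keeps reading the old inode, which is what the stat()/fstat() comparison in the locking steps is meant to detect.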

Appending
=========

1. Find the first file in the locks file that has appendable=0
 - If no such file was found, go to "create a new file" logic as
described below
2. Lock the file record
3. Verify from the file's headers that this file can actually be
appended to
 - If messages have been expunged from a dbox file, it can't be safely
appended to anymore.
 - Other reasons include eg. configurable max. file size and daily
rotations
4. Write the mail
5. Lock locks file's header
6. Get the UID from "next uid" field and update it
7. Unlock the header
8. Unlock the file record
9. Update index file

Create a new file logic:

1. Create a temporary file
2. Write the messages there
3. Lock locks file's header (including fstat() / stat() rebuild check)
4. See what the latest file ID is in the file
5. rename() temp file to msg.
6. Lock locks file for the range of the to-be-written record below
7. Write the new record to locks file
8. Go to step 6 in the original append logic

Syncing / expunging
===================

If locks file's header's "next uid" doesn't match the one currently in
index file, the appending crashed between steps 6 and 9. Find the new
message(s) and append them to index file.

If the locks file is completely gone, rebuild it by going through all
the msg.* files in the directory.

If "expunge counter" (see below) doesn't match in locks file's header
vs. index file header, go through all the msg.* files to see if a
message exists in multiple files. If found, remove the duplicates.

Typically neither of the above happens, so the only thing to do here is
to write changes from index file to dbox files. This may mean flag
changes once in a while, but most importantly expunges will always be
synced.

Initially figure out what files require expunging. Try to create a
single lock range that includes all of them. If there are non-zero lock
timestamps in that range, create multiple ranges.

If a file couldn't be locked, the expunge is done by updating
expunge-flag in the file. This can be done without locking (see below
for flag updates).

If a file was successfully locked, the expunging is done by:

1. Update "expunge offset" in file header to the offset of the first
expunged message. If expunge offset is non-zero, the file is treated as
non-appendable. Also when rebuilding and finding the same message from
multiple files, this field is used to figure out which file should be
truncated.

2. Copy the rest of the non-expunged messages to a new temporary file
and add the file to locks file using