On 24.3.2012, at 9.19, Attila Nagy wrote:

> Well, dsync is a very useful tool, but with continuous replication it tries 
> to solve a problem which should be handled -at least partially- elsewhere. 
> Storing stuff in plain file systems and duplicating them to another one just 
> doesn't scale.

dsync solves several other problems besides replication. Even if Dovecot had a 
super efficient replicated storage, dsync would still exist for doing things 
like:

 - migrating between mailbox formats
 - migrating from other imap/pop3 servers
 - creating (incremental) backups
 - the redesign works great for super-high latency replication (USB sticks, 
cross-planet replication :)
 - and when you really just don't want any kind of a complex replicated 
database, just something simple

So I'll need to get this working well in any case. And with the redesign the 
replication should be efficient enough to scale pretty well.

> I personally think that Dovecot could gain much more if the work going into
> fixing or improving dsync went into making Dovecot able to use a high-scale,
> distributed storage backend.
> I know it's much harder, because there are several major differences compared
> to the "low latency" and consistency-problem-free local file systems, but its
> fruits are also sweeter in the long run. :)

Yes, I'm also planning on implementing that, but not yet.

> It would bring Dovecot into the class of open source mail servers where there 
> are currently no contenders.
> 
> BTW, for the previous question in this topic (are there any nosql dbs 
> supporting application-level conflict resolution?), there are similar 
> solutions (like CouchDB, but having some experience with it, I wouldn't
> recommend it for massive mail storage, at least not the plain CouchDB product),
> but I guess you would be better off designing a schema which doesn't
> need it in the first place.
> For example, messages are immutable, so you won't face this issue in this 
> area.
> And for metadata, maybe the solution is not to store "digested" snapshots of 
> the current metadata (folders, flags, message links for folders etc), but to 
> store the changes happening on the user's mailbox and occasionally aggregate 
> them into a last known good and consistent state.

My plan was to create index files similar to the ones that currently exist in the
filesystem. It would work pretty much as you described: there's a "log" where
changes are appended, and once in a while the changes are written into an
"index" snapshot. When reading, you first read the snapshot and then apply the
new changes from the log. Conflict resolution, if the DB supports it, would work
by reading the two logs in parallel and figuring out a way to merge them
consistently, much like what dsync already does. Hmm. Perhaps the metadata log
could use exactly the dsync data format and have the dsync code do the merging?..
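
A minimal Python sketch of the log + snapshot idea, for illustration only. The
record format and the names `Change`, `apply_log`, `merge_logs` and `compact`
are hypothetical; this is not Dovecot's index format or dsync's API. The merge
step swaps in a simple deterministic ordering by (timestamp, replica, seq).

```python
from dataclasses import dataclass
from heapq import merge
from operator import attrgetter


@dataclass(frozen=True)
class Change:
    """One appended log record (hypothetical format, not Dovecot's)."""
    timestamp: float  # orders records written by different replicas
    replica: str      # which replica appended the record
    seq: int          # position in that replica's own log
    uid: int          # message the change applies to
    op: str           # "append", "expunge", "add_flag" or "remove_flag"
    flag: str = ""


State = dict[int, set[str]]  # uid -> set of flags


def apply_log(snapshot: State, log) -> State:
    """Read the snapshot first, then replay the newer log records on top."""
    state = {uid: set(flags) for uid, flags in snapshot.items()}
    for c in log:
        if c.op == "append":
            state.setdefault(c.uid, set())
        elif c.op == "expunge":
            state.pop(c.uid, None)
        elif c.uid not in state:
            continue  # flag change for a message that no longer exists
        elif c.op == "add_flag":
            state[c.uid].add(c.flag)
        elif c.op == "remove_flag":
            state[c.uid].discard(c.flag)
    return state


def merge_logs(a, b) -> list:
    """Read two replicas' logs in parallel into one deterministic order.

    Each log must already be sorted by (timestamp, replica, seq). Because
    (replica, seq) is unique, both replicas get the same merged order no
    matter which log is passed first.
    """
    return list(merge(a, b, key=attrgetter("timestamp", "replica", "seq")))


def compact(snapshot: State, log):
    """Once in a while, fold the log into a new snapshot and start a new log."""
    return apply_log(snapshot, log), []
```

Ordering purely by timestamp amounts to last-writer-wins; with skewed clocks or
conflicting UIDs a real merge would likely need more care than this.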

> Also, there are other interesting ideas, maybe with real single instance
> store (splitting MIME parts? Storing attachments in plain binary form? This
> always brings up the question of whether the mail server should modify the
> mails, which can be pretty bad for encrypted/signed stuff).

This is already optionally done in v2.0+dbox. MIME attachments can be stored in 
plain binary form if they can be reconstructed back into their original form. 
It doesn't break any signed stuff.
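
The "only if it can be reconstructed" rule can be sketched like this. It is not
Dovecot's dbox code: the 76-column wrapping, the SHA-256 key and the function
names are assumptions for illustration. A base64 part is moved out as binary
only when re-encoding it yields the exact original bytes, so the restored mail
is byte-identical and signatures over it still verify.

```python
import base64
import hashlib


def reencode_base64(data: bytes, line_len: int = 76, eol: bytes = b"\r\n") -> bytes:
    """Re-encode bytes as base64 wrapped at line_len chars (assumed layout)."""
    encoded = base64.b64encode(data)
    lines = [encoded[i:i + line_len] for i in range(0, len(encoded), line_len)]
    return eol.join(lines) + eol


def extract_attachment(body: bytes, store: dict):
    """Store a base64 MIME part body as binary only if it round-trips exactly.

    Returns ("ref", digest, eol) when the part was moved into `store`,
    or ("inline", body, None) when it must stay in the mail untouched.
    """
    eol = b"\r\n" if b"\r\n" in body else b"\n"
    try:
        data = base64.b64decode(body)  # non-alphabet chars (line breaks) are skipped
    except ValueError:  # binascii.Error, e.g. bad padding
        return ("inline", body, None)
    if reencode_base64(data, eol=eol) != body:
        return ("inline", body, None)  # cannot reconstruct: leave as-is
    digest = hashlib.sha256(data).hexdigest()
    store.setdefault(digest, data)  # identical attachments are stored once
    return ("ref", digest, eol)


def restore_attachment(ref, store: dict) -> bytes:
    """Rebuild the original MIME part body from a ref or inline record."""
    kind, value, eol = ref
    if kind == "inline":
        return value
    return reencode_base64(store[value], eol=eol)
```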
