Hello,

Currently Dovecot (and any other application that cares about e-mail delivery) does at least one fsync per mail delivery. Given that hard disk drives have very limited IOPS, this effectively caps the maximum mail delivery rate at a very low value, underutilizing the available storage I/O capacity. Assuming an average mail size of 50 kB and an average consumer HDD with 120 IOPS, the theoretical mail delivery throughput is 50 kB * 120 IOPS = 6000 kB/s, roughly 5.86 MB/s. But if we could write 500 kB with every transaction, the delivery speed would be nearly ten times higher.
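
For reference, a back-of-the-envelope sketch of that calculation (the 120 IOPS and 50/500 kB figures are just the assumptions above, and the model assumes the disk is limited purely by IOPS, one flush per delivery):

    /* Throughput if every flush carries per_mail_kb vs. per_batch_kb of data. */
    #include <stdio.h>

    int main(void)
    {
        const double iops = 120.0;          /* consumer HDD */
        const double per_mail_kb = 50.0;    /* one fsync per 50 kB mail */
        const double per_batch_kb = 500.0;  /* one sync per 500 kB batch */

        printf("per-mail fsync: %.2f MB/s\n", iops * per_mail_kb / 1024.0);
        printf("batched sync:   %.2f MB/s\n", iops * per_batch_kb / 1024.0);
        return 0;
    }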

Dovecot has two process models: separate processes for each client connection, and an async in-process multiplexing model. The idea works with either, although the timing differs somewhat. So here is the idea: instead of fsyncing immediately in the LDA (lmtpd) every time the client sends "\r\n.\r\n" after the DATA phase, let's introduce a user-settable timer (let's call it sync_delay from now on) and only sync every sync_delay seconds. This introduces a delay of up to sync_delay seconds before lmtpd returns "250 Ok" to the client, but that is generally not a problem, because high-traffic setups have a great deal of concurrency, so you can easily use a lot of client connections.
Take an example setting of sync_delay = 100 ms.
With this, Dovecot LDA processes would sync 10 times per second: a client that finishes its delivery at t=0 (exactly at a sync point) gets the 250 response immediately, a client that finishes at t=0.05 gets it 50 ms later (in an ideal world where syncing itself takes no time), and pending blocks can accumulate for at most 100 ms. In a busy system (where this setting would make sense), this means more data can be written per flush, with fewer IOPS needed. A rough sketch of the flow follows below.
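
As a minimal standalone sketch of what lmtpd could do (this is not Dovecot's actual ioloop/timeout API; delivery_done() and sync_timeout() are hypothetical names, just illustrating the flow):

    /* Deliveries append the mail to disk and queue the fd here instead of
     * fsyncing right away; a periodic sync_delay timer flushes everything
     * queued so far and only then lets the "250 Ok" responses go out. */
    #include <unistd.h>

    #define MAX_PENDING 1024

    static int pending_fds[MAX_PENDING];
    static int pending_count;

    /* Called when the client has finished the DATA phase ("\r\n.\r\n"):
     * the message is already written, but not yet on stable storage. */
    static void delivery_done(int mail_fd)
    {
        if (pending_count < MAX_PENDING)
            pending_fds[pending_count++] = mail_fd;
        /* else: fall back to an immediate fsync(mail_fd) */
    }

    /* Fired once per sync_delay interval. */
    static void sync_timeout(void)
    {
        for (int i = 0; i < pending_count; i++) {
            if (fsync(pending_fds[i]) == 0) {
                /* safe now to send "250 Ok" for this delivery */
            }
        }
        pending_count = 0;
    }

In the multi-process model the same batching has to happen across processes, which leads to problem 2 below.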
I can see two problems:
1. There is no system call for committing many file descriptors in one transaction, so instead of an fsync() per modified FD, a sync() would be needed. sync() writes all dirty buffers to stable storage, which is bad if you have a mixed workload with a lot of data nobody asked to have fsynced, or other heavy fsync users. But modern file systems like ZFS write those back on fsync anyway, so there an fsync(fd) is -AFAIK- mostly equivalent to syncing the whole pool the fd lives on. sync() is of course system-wide, so any other file systems get synced as well (this setting isn't for everybody).
2. In a multi-process environment this needs coordination: instead of each process doing its own fsyncs, a single process would perform the sync and report OK to the others, so they can tell their clients the data has been committed to stable storage (a rough sketch of such a coordinator follows below).
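
To make point 2 concrete, here is a hypothetical coordinator sketch (this is not how Dovecot's master process actually works; the socketpair protocol and names are made up). Each LDA worker writes one byte to its coordinator socket once the mail data is written, then blocks reading the reply; the coordinator gathers requests for one sync_delay window, calls sync() once, and acknowledges every waiter:

    #include <poll.h>
    #include <unistd.h>

    #define SYNC_DELAY_MS 100

    /* workers[i].fd is the coordinator's end of a socketpair to one LDA
     * process; workers[i].events is set to POLLIN by the caller. */
    void coordinator_loop(struct pollfd *workers, int n_workers)
    {
        int waiters[n_workers];

        for (;;) {
            /* Block until at least one LDA process asks for a sync. */
            if (poll(workers, n_workers, -1) <= 0)
                continue;

            /* Let further requests pile up for one sync_delay window,
             * so a single flush covers as many deliveries as possible. */
            usleep(SYNC_DELAY_MS * 1000);
            poll(workers, n_workers, 0);

            int n_waiting = 0;
            for (int i = 0; i < n_workers; i++) {
                char c;
                if ((workers[i].revents & POLLIN) &&
                    read(workers[i].fd, &c, 1) == 1)
                    waiters[n_waiting++] = workers[i].fd;
            }
            if (n_waiting == 0)
                continue;

            sync();  /* one system-wide flush instead of one fsync() per FD */

            /* Unblock every waiting LDA process: its mail is on stable
             * storage, so it may answer "250 Ok". */
            for (int i = 0; i < n_waiting; i++)
                (void)write(waiters[i], "K", 1);
        }
    }

On Linux, syncfs(2) could be used in place of sync() to limit the flush to the file system the mail spool lives on, which would soften the system-wide effect mentioned in point 1.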

Any opinions on this?
