On 01/05/2011 11:38 PM, Timo Sirainen wrote:
On 6.1.2011, at 0.27, Attila Nagy wrote:

With this, 10 syncs would happen every second from the Dovecot LDA processes, 
meaning that if a client connects at t=0 it immediately gets the 250 response; 
if a client connects at t=0.05, it gets the response in 50 ms (in an ideal 
world, where syncing takes no time); and the committed blocks could accumulate 
for at most 100 ms.
In a busy system (where this setting would make sense), this means more data 
could be written with fewer IOPS.
I guess this could work, although earlier I thought about delaying fsyncs 
differently: when saving/copying multiple mails in a single transaction with 
maildir, delay about 10 or so close()s and then fsync() them all at the same 
time at the end. This ended up being slower (but I only tested it with a 
single user; maybe in real-world setups it might have worked better).
What filesystem was used for this test? If it writes only the given FD's data on an fsync, the effect is pretty much the same whether you issue the fsyncs in real time or serialize them into nearly the same instant: the file system will write small amounts of data and issue a flush after each fsync. On a file system which writes all the dirty data on an fsync (like ZFS does), it may work better, although only the first fsync would be necessary; the later ones only risk that other data has entered the cache in the meantime, which makes the batching useless. That's why I wrote that in this case you would need sync() instead of fsync(), which would make the approach file system independent.
Many sync() man pages warn about this:
FreeBSD:
BUGS
     The sync() system call may return before the buffers are completely
     flushed.
Linux:
BUGS
       According to the standard specification (e.g., POSIX.1-2001), sync()
       schedules the writes, but may return before the actual writing is
       done. However, since version 1.3.20 Linux does actually wait. (This
       still does not guarantee data integrity: modern disks have large
       caches.)

But I think the same warning applies to fsync() too.
Otherwise, I guess a little experimenting and reading would be needed here. For this setting I think it would be OK to assume some technical knowledge on the part of the users and say: only turn this on if you have a file system which flushes all dirty buffers for the entire file system on a single fsync. Then you would delay the fsyncs for a list of FDs and issue only one for the whole list, instead of one for each element.
Or just issue a single sync().

I can see two problems:
1. There is no system call for committing a lot of file descriptors in one 
operation, so instead of an fsync() for each of the modified FDs, a sync() 
would be needed. sync() writes all dirty buffers to stable storage, which is 
bad if you have a mixed workload with a lot of non-fsynced data or other 
heavy fsync users. But modern file systems like ZFS will write those back 
too, so there an fsync(fd) is -AFAIK- mostly equivalent to a sync() of the 
pool the fd lives on. sync() is of course system-wide, so if you have other 
file systems, those will be synced as well. (This setting isn't for everybody.)
2. In a multiprocess environment this would need coordination: instead of 
doing fsyncs in distinct processes, one process would be needed which does 
the sync and returns OK to the others, so that they can notify their clients 
that the commit reached stable storage.
You could send the fds via a UNIX socket to another process that does 
fsync() on them. I was also hoping to use lib-fs for at least some mailbox 
formats at some point (either some modified dbox, or a new one), and there 
it would be even easier to add an fsync plugin that does this kind of 
fsync transfer to another process.
Yes, but as stated above, I don't think it will help: on a file system which writes only the given FD's data, it's the same and nothing is gained; and on a file system which flushes all dirty buffers, there will likely be some dirty buffers between those fsyncs, so it ends up the same. IOPS will be the limiting factor either way.
