Hi, I'm sure this has been discussed here several times already, but I haven't been able to find a comprehensive answer. My concern is the large difference in performance between using ZFS locally and using it via NFS when creating lots of small files.
When creating files locally, these operations are not synchronous on ZFS. Even after a successful close(), the file or its data might be lost if the machine crashes immediately afterwards. If I want to commit the file, I need to fsync() it before closing, or open it with O_DSYNC. These are local filesystem semantics.

When creating files over NFS, I know after a close() that the data is committed to stable storage, even without issuing an fsync() before the close(). This leads to a large performance penalty compared to local storage. The answer I have read is "NFS is conservative", but I don't see how this is manifested in the protocol. The client has the choice to write data to the server committed or uncommitted. Normally, while writing large files, all data is written uncommitted, but on close() all data gets flushed out in a committed fashion. I can alter this behavior by mounting the filesystem without close-to-open semantics, but then some data might not have been written to the server at all after a close(), not even uncommitted. I would have to ensure that only one client works on the data, otherwise the data might get corrupted.

Now to my point: I would like to find a way to configure client and server so that using a filesystem over NFS has the same semantics as using the filesystem locally. This would need at least two modifications.

First, the client should only issue committed writes if the application explicitly requests it via fsync() or O_DSYNC. As far as I can see, the server currently has no way to distinguish between close() and fsync(); close(). Data would obviously still need to be flushed out to the server on close(), otherwise it wouldn't be possible to work with several clients on the same data. But it would _not_ be necessary to flush the data as committed in order to preserve close-to-open semantics. As with a local filesystem, the server can commit the data later on.
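To make the comparison concrete, here is a rough sketch of the kind of small-file test I have in mind. It is only an illustration, not a proper benchmark: the function names, the 512-byte file size and the `base` parameter are my own choices, and on platforms without O_DSYNC it falls back to O_SYNC (or to no flag at all).

```python
import os
import tempfile
import time


def create_files(directory, count, mode):
    """Create `count` small files in `directory` and return the elapsed seconds.

    mode is one of:
      'async' - write and close, no explicit commit
      'fsync' - fsync() each file before close()
      'dsync' - open each file with O_DSYNC
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if mode == "dsync":
        # Assumption: fall back to O_SYNC (or nothing) where O_DSYNC is missing.
        flags |= getattr(os, "O_DSYNC", getattr(os, "O_SYNC", 0))

    start = time.perf_counter()
    for i in range(count):
        path = os.path.join(directory, f"{mode}-{i:05d}")
        fd = os.open(path, flags, 0o644)
        try:
            os.write(fd, b"x" * 512)
            if mode == "fsync":
                # Explicitly ask for the data to be committed before close().
                os.fsync(fd)
        finally:
            os.close(fd)
    return time.perf_counter() - start


def run(count=200, base=None):
    """Time all three modes in a temporary directory created under `base`."""
    results = {}
    with tempfile.TemporaryDirectory(dir=base) as directory:
        for mode in ("async", "fsync", "dsync"):
            results[mode] = create_files(directory, count, mode)
    return results


if __name__ == "__main__":
    for mode, seconds in run().items():
        print(f"{mode:6s} {seconds:.3f}s")
```

Running it once with `base` pointing at a local ZFS dataset and once at the same dataset mounted over NFS should show the effect I describe: what I would like is for the 'async' case over NFS to behave like the local 'async' case, while 'fsync' and 'dsync' keep their guarantees.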
As for the second modification, I'm not sure whether it is necessary: if metadata/attribute updates are currently synchronous, they could be made asynchronous as well.

With ZFS, I can decide to just turn off the ZIL completely, but that would lead to a situation where every write is uncommitted, even if the application explicitly requested a commit via fsync(). I am also aware that with separate ZIL devices I can speed up synchronous operations to the point where they are nearly as fast as asynchronous ones, but such ZIL devices are quite costly and add an unnecessary layer of complexity in this case.

Any thoughts on this are welcome!

Thanks,
Arne