Hi,

I'm sure this has been discussed here several times already, but I
haven't been able to find a comprehensive answer. My concern is the
very differing performance of using ZFS locally and using it via
NFS when creating lots of small files.

When creating files locally, these operations are not synchronous
on ZFS. Even after a successful close(), the file/data might get
lost if the machine crashes immediately afterwards. If I want
to commit the file, I need to fsync() it before closing, or open
it with O_DSYNC. These are standard local filesystem semantics.

When creating files over NFS, after a close() I know the data is committed
to stable storage, even without issuing an fsync() before the close. This
leads to a large performance penalty compared to local storage. The answer
I have read is "NFS is conservative", but I don't see how this is manifested
in the protocol. The client has the choice of writing data committed or
uncommitted to the server. Normally, while writing large files, all
data will be written uncommitted, but on close, all data gets
flushed out in a committed fashion.
I can alter this behavior by mounting the filesystem without
close-to-open semantics, but this leads to the situation where some data
might not have been written to the server after a close, not even
uncommitted. I then have to ensure that only one client works on the
data; otherwise the data might get corrupted.
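On a Linux client, for example, the relevant knob is the nocto mount
option (other clients spell this differently, if they offer it at all);
a sketch of the trade-off:

```shell
# Relax close-to-open consistency on the Linux NFS client.
# With nocto, the client need not flush dirty data on close(),
# so this is only safe when a single client accesses the files.
mount -t nfs -o nocto,vers=3 server:/export/data /mnt/data
```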

Now to my point: I would like to find a way to configure client and
server so that using a filesystem over NFS has the same semantics as
using the filesystem locally. This would need at least two
modifications: the client should only issue committed writes if the
application explicitly requests it via fsync() or O_DSYNC. As far as I
can see, the server currently has no way to distinguish between close()
and fsync(); close(). Data would obviously still need to be flushed
out to the server on close(), otherwise it wouldn't be possible
to work with several clients on the same data. But it would _not_ be
necessary to flush the data committed in order to preserve close-to-open
semantics. As in the case of a local filesystem, the server could commit
the data later on.
Regarding the second change, I'm not sure whether it is necessary: if
metadata/attribute operations are currently synchronous, they might be
changed to asynchronous, too.

With ZFS, I could decide to just turn off the ZIL completely, but that
would lead to a situation where every write is uncommitted, even if
the application explicitly requested otherwise via fsync().
I am also aware that by using separate ZIL (log) devices I can speed up
synchronous operations to the point where they are nearly as fast as
asynchronous operations, but such devices are quite costly and
add an unnecessary layer of complexity in this case.
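For completeness, on ZFS releases that support the per-dataset sync
property, the two options look roughly like this (pool, dataset, and
device names are made up; older releases only had the global zil_disable
tunable):

```shell
# Disable synchronous semantics for one dataset: every write becomes
# asynchronous, and fsync()/O_DSYNC guarantees are silently lost.
zfs set sync=disabled tank/nfsdata

# Alternatively, add a dedicated log (slog) device to speed up
# synchronous writes while keeping fsync() semantics intact.
zpool add tank log c1t2d0
```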

Any thoughts on this are welcome!

Thanks,
Arne
