Sage Weil wrote:
> On Mon, 4 Dec 2006, Peter Staubach wrote:
>> I think that there are several points which are missing here.
>>
>> First, readdirplus(), without any sort of caching, is going to be _very_
>> expensive, performance-wise, for _any_ size directory.  You can see this
>> by instrumenting any NFS server which already supports the NFSv3
>> READDIRPLUS semantics.

> Are you referring to the work the server must do to gather stat
> information for each inode?


Yes, and the fact that the client will be forced to go over the wire for
each readdirplus() call, whereas it can use cached information today.
An application actually waiting on the response to a READDIRPLUS will
not be pleased at the resulting performance.

>> Second, the NFS client side readdirplus() implementation is going to be
>> _very_ expensive as well.  The NFS client does write-behind, and all this
>> data _must_ be flushed to the server _before_ the over-the-wire
>> READDIRPLUS can be issued.  This means that the client will have to step
>> through every inode which is associated with the directory inode being
>> readdirplus()'d and ensure that all modified data has been successfully
>> written out.  This part of the operation, for a sufficiently large
>> directory and a sufficiently large page cache, could take significant
>> time in itself.

> Why can't the client send the over-the-wire READDIRPLUS without flushing
> inode data, and then simply ignore the stat portion of the server's
> response in instances where its locally cached (and dirty) inode data is
> newer than the server's?


This would seem to minimize the value as far as I understand the
requirements here.

>> These overheads may make this new operation expensive enough that no
>> applications will end up using it.

> If the application calls readdirplus() only when it would otherwise do
> readdir()+stat(), the flushing you mention would happen anyway (from the
> stat()).  Wouldn't this at least allow that to happen in parallel for the
> whole directory?

I don't see where the parallelism comes from.  Before issuing the
READDIRPLUS over the wire, the client would have to ensure that each
and every one of those flushes was completed.  I suppose that a
sufficiently clever and complex implementation could figure out how
to schedule all those flushes asynchronously and then wait for all
of them to complete, but there will be a performance cost.  Walking
the caches for all of those inodes, perhaps using several or all of
the CPUs in the system, and smacking the server with all of those
WRITE operations simultaneously, with all of the associated network
bandwidth usage, adds up to other applications on the client, and
potentially elsewhere on the network, not getting much done at the
same time.

All of this cost to the system and to the network for the benefit of
a single application?  That seems like a tough sell to me.

This is an easy problem to look at from the application viewpoint.
The solution seems obvious.  Give it the fastest possible way to
read the directory and retrieve stat information about every entry
in the directory.  However, when viewed from a systemic level, this
becomes a very different problem with many more aspects.  Perhaps
flow-controlling this one application, in favor of the many other
applications running network-wide, may be the better thing to
continue to do.  I dunno.

      ps
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
