Re: [patch] [possible race in ext2] Re: how to write get_block?

1999-10-11 Thread Eric W. Biederman

"Stephen C. Tweedie" <[EMAIL PROTECTED]> writes:

> Hi,
> 
> On Sun, 10 Oct 1999 16:57:18 +0200 (CEST), Andrea Arcangeli
> <[EMAIL PROTECTED]> said:
> 
> > My point was that even being forced to do a lookup before creating
> > each empty buffer, will be still faster than 2.2.x as in 2.3.x the hash
> > will contain only metadata. Less elements means faster lookups.
> 
> The _fast_ quick fix is to maintain a per-inode list of dirty buffers
> and to invalidate that list when we do a delete.  This works for
> directories if we only support truncate back to zero --- it obviously
> gets things wrong if we allow partial truncates of directories (but why
> would anyone want to allow that?!)
> 
> This would have minimal performance implication and would also allow
> fast fsync() of indirect block metadata for regular files.

What about adding to the end of ext2_alloc_block:

bh = get_hash_table(inode->i_dev, result, inode->i_sb->s_blocksize);
/* something is playing with our fresh block, make them stop.  ;-) */
if (bh) {
if (buffer_dirty(bh)) { 
mark_buffer_clean(bh);
wait_on_buffer(bh);
}
bforget(bh);
}

This is in a relatively slow/uncommon path and also catches
races if indirect blocks are allocated to a file, instead
of just directory syncs.

Ultimately we really want to have indirect blocks, and
the directory in the page cache as it should result in
more uniform code, and faster partial truncates (as well as faster
syncs).

Eric



Re: [RFC] truncate() (generic stuff)

1999-10-11 Thread Alexander Viro



On Mon, 11 Oct 1999, Mikulas Patocka wrote:

> > I mean the following: assume that call of hpfs_truncate() (via
> > ->truncate()) had been moved into the inode_setattr() (i.e. happens
> > immediately before the hpfs_write_inode() in hpfs_notify_change()). Will
> > it have any bad consequences? Right now the sequence looks so:
> > inode->i_size = newsize;
> > hpfs_write_inode(inode);
> > hpfs_truncate(inode);
> > Could we swap the last steps?
> 
> Probably yes, the first hpfs_write_inode is not needed at all.

Aha. OK, since with this patch ->truncate() is called only from
inode_setattr() I'm leaving the call of hpfs_write_inode() in
hpfs_notify_change() and removing it from hpfs_truncate().

> You can do the same just by examining content of dnode (dirents have fnode
> numbers). 

They do, but lookup in icache is going to be more expensive. Besides, we
will have to do it for every dirent when we are splitting dnode. And we
do not care for inodes not in core.

> > ->clear_inode() would exclude the sucker from the lists, all inodes marked
> > with I_FREEING (i.e. those that gave NULL upon igrab()) are skipped during
> > the cyclic list searches.
> 
> There is another clash if you use async write_inode. Somebody is modifying
> directory, btree operation is in progress and now - system decides to call
> write_inode. You can't wait in write_inode until directory is in
> consistent state, you just must patch dirent somwhere in memory and
> return. That means - inode must hold pointer to dirent on disk or in
> memory.
No problems - in-core inode holds a location on disk, any dirent
moving starts from setting it to 0 (with spinlock held - see above) and
ends with setting it according to new location.
write_inode():
write fnode part;
retry:
loc = location;
if (!loc)
return;
get the block corresponding to loc;
grab the spinlock;
if (loc still points to the same block) {
update dirent;
mark it dirty;
release block;
release spinlock;
return;
}
/* we had been moved */
release block;
release spinlock;
goto retry;

That's it. This is what I've done in FAT and it doesn't take much code.

> It is possible to write it, but HPFS is currently very stable
and
> race-free (the last race I had was bug in kernel) and such modifications
> would surely make a lot of problems.

OK, I definitely agree with that. It's worth doing someday, but not now.

Anyway, the new variant of truncate patch follows. Please, check whether
you have problems with the hpfs part.
Cheers,
Al
diff -urN linux-2.3.20/fs/affs/inode.c linux-bird.truncate/fs/affs/inode.c
--- linux-2.3.20/fs/affs/inode.cMon Jun 21 12:36:43 1999
+++ linux-bird.truncate/fs/affs/inode.c Mon Oct 11 10:55:11 1999
@@ -246,7 +246,6 @@
inode->u.affs_i.i_protect = mode_to_prot(attr->ia_mode);
 
inode_setattr(inode, attr);
-   mark_inode_dirty(inode);
error = 0;
 out:
return error;
diff -urN linux-2.3.20/fs/attr.c linux-bird.truncate/fs/attr.c
--- linux-2.3.20/fs/attr.c  Mon Jun 21 12:36:42 1999
+++ linux-bird.truncate/fs/attr.c   Mon Oct 11 10:54:44 1999
@@ -62,8 +62,6 @@
inode->i_uid = attr->ia_uid;
if (ia_valid & ATTR_GID)
inode->i_gid = attr->ia_gid;
-   if (ia_valid & ATTR_SIZE)
-   inode->i_size = attr->ia_size;
if (ia_valid & ATTR_ATIME)
inode->i_atime = attr->ia_atime;
if (ia_valid & ATTR_MTIME)
@@ -75,6 +73,12 @@
if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
inode->i_mode &= ~S_ISGID;
}
+   if (ia_valid & ATTR_SIZE) {
+   inode->i_size = attr->ia_size;
+   vmtruncate(inode, attr->ia_size);
+   if (inode->i_op && inode->i_op->truncate)
+   inode->i_op->truncate(inode);
+   }
mark_inode_dirty(inode);
 }
 
diff -urN linux-2.3.20/fs/coda/inode.c linux-bird.truncate/fs/coda/inode.c
--- linux-2.3.20/fs/coda/inode.cSun Sep 12 12:45:49 1999
+++ linux-bird.truncate/fs/coda/inode.c Mon Oct 11 10:59:23 1999
@@ -229,6 +229,8 @@
 if ( !error ) {
coda_vattr_to_iattr(inode, &vattr); 
coda_cache_clear_inode(inode);
+   if (iattr->ia_valid && ATTR_SIZE)
+   vmtruncate(inode, iattr->ia_size);
 }
CDEBUG(D_SUPER, "inode.i_mode %o, error %d\n", 
   inode->i_mode, error);
diff -urN linux-2.3.20/fs/hpfs/file.c linux-bird.truncate/fs/hpfs/file.c
--- linux-2.3.20/fs/hpfs/file.c Sun Sep 12 06:00:35 1999
+++ linux-bird.truncate/fs/hpfs/file.c  Mon Oct 11 17:44:44 1999
@@ -55,7 +55,6 @@
i->i_hpfs_n_secs = 0;
   

Re: HFS, QNXFS

1999-10-11 Thread Alexander Viro



On Mon, 11 Oct 1999, Jedi/Sector One wrote:

> Jeff Garzik wrote:
> > IMHO, even if an fs was bug-free in 2.2.x, lack of testing under the new
> > 2.3.x VFS implies an experimental and possibly non-working nature.
> 
>   The QNXFS is not bug-free in 2.2.x . It is useable to share some files
I'll say...
> between Linux and QNX, and when QNX is screwed up, this is really a good
> help : things can be fixed under Linux a lot easier than with a QNX
> rescue system.
>   I am a coder and maintainer of the QNX FS. The QSSL team gave me a
> valid QNX license in order to do some work like that project. But what
> they didn't tell me is that this license was limited and a few days
> after the QNX FS was in the Kernel, it expired. I tried to got in touch
> with QSSL without any success, maybe they don't want to help us in
> working on their FS for Linux.

OK, to bloody hell with QSSL. Do you have a documentation? If the
license had expired, how can you use your code at all (with what)? Do you
have valid fs images?

>   I will try to fix it for 2.3.x (in fact, I am waiting for the ReiserFS
> port for that series because my system heavily depends on ReiserFS)
???

> asap. Some features are still missing to fully support the FS (like the
> extents list) . I'd be glad to implement them as soon as I will have a
> copy of QNX to work with.
== when your contacts with QSSL will give some results, that is?
I suspect that those guys are less than excited by the interesting trick
played by Amiga idi^H^H^Hmanagement.
Does their license allow remote logins? Is it per-user or per-CPU
or what? If it does you might consider comp.os.linux.announce - somebody
might help you.



Re: [RFC] truncate() (generic stuff)

1999-10-11 Thread Mikulas Patocka

> I mean the following: assume that call of hpfs_truncate() (via
> ->truncate()) had been moved into the inode_setattr() (i.e. happens
> immediately before the hpfs_write_inode() in hpfs_notify_change()). Will
> it have any bad consequences? Right now the sequence looks so:
>   inode->i_size = newsize;
>   hpfs_write_inode(inode);
>   hpfs_truncate(inode);
> Could we swap the last steps?

Probably yes, the first hpfs_write_inode is not needed at all.

> > > I can toss the code for that in - preserving the current behaviour, but
> > > allowing to get rid of locking. Would you mind such a variant? It's not
> > > like the thing was going to take a lot of time - straightforward copy of
> > > FAT version would go.
> > 
> > Do you see any better solution than to go through all btree operations and
> > update pointer on every dirent move?
> 
>   Just put a cyclic list through the inodes referenced from the same
> dnode + hash the lists by dnode block number (hash chain going through one
> of the cyclic list members, shifted to the neighbor if we are excluding
> the thing from cyclic list; the whole mess protected by spinlock). Then
> you can quickly find all in-core inodes that belong to any given dnode.

You can do the same just by examining content of dnode (dirents have fnode
numbers). 

> ->clear_inode() would exclude the sucker from the lists, all inodes marked
> with I_FREEING (i.e. those that gave NULL upon igrab()) are skipped during
> the cyclic list searches.

There is another clash if you use async write_inode. Somebody is modifying
directory, btree operation is in progress and now - system decides to call
write_inode. You can't wait in write_inode until directory is in
consistent state, you just must patch dirent somwhere in memory and
return. That means - inode must hold pointer to dirent on disk or in
memory.
It is possible to write it, but HPFS is currently very stable and
race-free (the last race I had was bug in kernel) and such modifications
would surely make a lot of problems.

Mikulas Patocka



Re: HFS, QNXFS

1999-10-11 Thread Jedi/Sector One

Jeff Garzik wrote:
> IMHO, even if an fs was bug-free in 2.2.x, lack of testing under the new
> 2.3.x VFS implies an experimental and possibly non-working nature.

  The QNXFS is not bug-free in 2.2.x . It is useable to share some files
between Linux and QNX, and when QNX is screwed up, this is really a good
help : things can be fixed under Linux a lot easier than with a QNX
rescue system.
  I am a coder and maintainer of the QNX FS. The QSSL team gave me a
valid QNX license in order to do some work like that project. But what
they didn't tell me is that this license was limited and a few days
after the QNX FS was in the Kernel, it expired. I tried to got in touch
with QSSL without any success, maybe they don't want to help us in
working on their FS for Linux.
  I will try to fix it for 2.3.x (in fact, I am waiting for the ReiserFS
port for that series because my system heavily depends on ReiserFS)
asap. Some features are still missing to fully support the FS (like the
extents list) . I'd be glad to implement them as soon as I will have a
copy of QNX to work with.
  Scuba made a very good starting point for that FS implementation, but
I've not heard about him a lot of time ago. Still alive Scuba ?

  Best regards,
-- 
 Frank DENIS aka Jedi/Sector One aka DJ Chrysalis <[EMAIL PROTECTED]>
-> Software : http://www.jedi.claranet.fr <-
 -> Music : http://www.mp3.com/chrysalis <-



Re: [RFC] truncate() (generic stuff)

1999-10-11 Thread Alexander Viro



On Mon, 11 Oct 1999, Mikulas Patocka wrote:

> > > > hpfs - no problems, but there is something rather odd. Why on the
> > > > Earth does it forcibly write inode to disk in the end of notify_change()?
> > > > Immediately before that inode is dirtified by inode_setattr(). Mikulas, is
> > > > there something special that makes you write it ASAP?
> > > 
> > > HPFS doesn't use write_inode. On HPFS, write_inode needs to lock parent
> > > directory to update dirent. write_inode is called asynchronously and
> > > locking directory makes deadlocks. So, write_inode is NULL and inode is
> > > explicitly written on notify_change and file_release. 
> > 
> > I see. OK, will it hurt if would do hpfs_truncate() prior to the call of
> > hpfs_file_write()?
> 
> What do you mean? truncate and write together will surely cause bad
> things.

I mean the following: assume that call of hpfs_truncate() (via
->truncate()) had been moved into the inode_setattr() (i.e. happens
immediately before the hpfs_write_inode() in hpfs_notify_change()). Will
it have any bad consequences? Right now the sequence looks so:
inode->i_size = newsize;
hpfs_write_inode(inode);
hpfs_truncate(inode);
Could we swap the last steps?

> > I can toss the code for that in - preserving the current behaviour, but
> > allowing to get rid of locking. Would you mind such a variant? It's not
> > like the thing was going to take a lot of time - straightforward copy of
> > FAT version would go.
> 
> Do you see any better solution than to go through all btree operations and
> update pointer on every dirent move?

Just put a cyclic list through the inodes referenced from the same
dnode + hash the lists by dnode block number (hash chain going through one
of the cyclic list members, shifted to the neighbor if we are excluding
the thing from cyclic list; the whole mess protected by spinlock). Then
you can quickly find all in-core inodes that belong to any given dnode.
->clear_inode() would exclude the sucker from the lists, all inodes marked
with I_FREEING (i.e. those that gave NULL upon igrab()) are skipped during
the cyclic list searches.



Re: [RFC] truncate() (generic stuff)

1999-10-11 Thread Mikulas Patocka

> > >   hpfs - no problems, but there is something rather odd. Why on the
> > > Earth does it forcibly write inode to disk in the end of notify_change()?
> > > Immediately before that inode is dirtified by inode_setattr(). Mikulas, is
> > > there something special that makes you write it ASAP?
> > 
> > HPFS doesn't use write_inode. On HPFS, write_inode needs to lock parent
> > directory to update dirent. write_inode is called asynchronously and
> > locking directory makes deadlocks. So, write_inode is NULL and inode is
> > explicitly written on notify_change and file_release. 
> 
> I see. OK, will it hurt if would do hpfs_truncate() prior to the call of
> hpfs_file_write()?

What do you mean? truncate and write together will surely cause bad
things.

> > [ Once you suggested a solution for this - keep in inode pointer to
> > directory data, but I'm too lazy to implement it. HPFS is not performance
> > critical ]
> 
> I can toss the code for that in - preserving the current behaviour, but
> allowing to get rid of locking. Would you mind such a variant? It's not
> like the thing was going to take a lot of time - straightforward copy of
> FAT version would go.

Do you see any better solution than to go through all btree operations and
update pointer on every dirent move?

Mikulas Patocka



Re: [patch] [possible race in ext2] Re: how to write get_block?

1999-10-11 Thread Stephen C. Tweedie

Hi,

On Sun, 10 Oct 1999 16:57:18 +0200 (CEST), Andrea Arcangeli
<[EMAIL PROTECTED]> said:

> My point was that even being forced to do a lookup before creating
> each empty buffer, will be still faster than 2.2.x as in 2.3.x the hash
> will contain only metadata. Less elements means faster lookups.

The _fast_ quick fix is to maintain a per-inode list of dirty buffers
and to invalidate that list when we do a delete.  This works for
directories if we only support truncate back to zero --- it obviously
gets things wrong if we allow partial truncates of directories (but why
would anyone want to allow that?!)

This would have minimal performance implication and would also allow
fast fsync() of indirect block metadata for regular files.

--Stephen



Re: [patch] [possible race in ext2] Re: how to write get_block?

1999-10-11 Thread Alexander Viro



On Mon, 11 Oct 1999, Stephen C. Tweedie wrote:

> Hi,
> 
> On Sun, 10 Oct 1999 16:57:18 +0200 (CEST), Andrea Arcangeli
> <[EMAIL PROTECTED]> said:
> 
> > My point was that even being forced to do a lookup before creating
> > each empty buffer, will be still faster than 2.2.x as in 2.3.x the hash
> > will contain only metadata. Less elements means faster lookups.
> 
> The _fast_ quick fix is to maintain a per-inode list of dirty buffers
> and to invalidate that list when we do a delete.  This works for
> directories if we only support truncate back to zero --- it obviously
> gets things wrong if we allow partial truncates of directories (but why
> would anyone want to allow that?!)

AFFS. OTOH there we are dropping the known bh every time.



Re: [RFC] truncate() (generic stuff)

1999-10-11 Thread Alexander Viro



On Mon, 11 Oct 1999, Mikulas Patocka wrote:

> > hpfs - no problems, but there is something rather odd. Why on the
> > Earth does it forcibly write inode to disk in the end of notify_change()?
> > Immediately before that inode is dirtified by inode_setattr(). Mikulas, is
> > there something special that makes you write it ASAP?
> 
> HPFS doesn't use write_inode. On HPFS, write_inode needs to lock parent
> directory to update dirent. write_inode is called asynchronously and
> locking directory makes deadlocks. So, write_inode is NULL and inode is
> explicitly written on notify_change and file_release. 

I see. OK, will it hurt if would do hpfs_truncate() prior to the call of
hpfs_file_write()?

> [ Once you suggested a solution for this - keep in inode pointer to
> directory data, but I'm too lazy to implement it. HPFS is not performance
> critical ]

I can toss the code for that in - preserving the current behaviour, but
allowing to get rid of locking. Would you mind such a variant? It's not
like the thing was going to take a lot of time - straightforward copy of
FAT version would go.



Re: [RFC] truncate() (generic stuff)

1999-10-11 Thread Mikulas Patocka

>   hpfs - no problems, but there is something rather odd. Why on the
> Earth does it forcibly write inode to disk in the end of notify_change()?
> Immediately before that inode is dirtified by inode_setattr(). Mikulas, is
> there something special that makes you write it ASAP?

HPFS doesn't use write_inode. On HPFS, write_inode needs to lock parent
directory to update dirent. write_inode is called asynchronously and
locking directory makes deadlocks. So, write_inode is NULL and inode is
explicitly written on notify_change and file_release. 

[ Once you suggested a solution for this - keep in inode pointer to
directory data, but I'm too lazy to implement it. HPFS is not performance
critical ]

Mikulas Patocka



[RFC] truncate() (generic stuff)

1999-10-11 Thread Alexander Viro

I began screwing around the truncate() stuff and the following is
a status report/request for comments:
a) call of ->truncate() method (and vmtruncate()) had been moved
into the notify_change(). It is triggered if ATTR_SIZE is set. Modified
places: do_truncate(), nfsd_truncate(), nfsd_setattr() and hpfs_unlink().
Semantics had been changed (fixed, AFAICS) for nfsd_* - truncation happens
with capabilities dropped. BTW, nfsd_setattr() didn't call vmtruncate().
Fixed.
b) I'ld rather pushed the call of ->truncate() deeper - into the
inode_setattr() if possible. Situation being:
adfs, efs, isofs, qnx4 and romfs - readonly, no problems at all.
devpts, autofs - no regular files, no problems.
ext2, ufs, minix, sysv, msdos, vfat, udf, umsdos - no problems.
ntfs - very odd. No ->truncate(), but there is ->write(). IOW,
call of sys_truncate() will happily pass and just change ->i_size. Steve,
could you comment on that? No problems wrt this patch, but something is
very wrong.
procfs - there may be drago^Wbugs. That is, sys_truncate()
silently sets i_size to zero. sys_chmod() is also, erm, interesting.
Unrelated to the patch, though.
hpfs - no problems, but there is something rather odd. Why on the
Earth does it forcibly write inode to disk in the end of notify_change()?
Immediately before that inode is dirtified by inode_setattr(). Mikulas, is
there something special that makes you write it ASAP?
affs - no problems, except the call of make_inode_dirty()
immediately after inode_setattr() (which dirtifies inode immediately
before exit). I'm removing the call.
hfs - probably OK, but I'ld really like to hear the confirmation -
is there any chance of troubles (from the HFS side, VFS is OK) if the call
of ->truncate() will happen before the hfs_cat_mark_dirty()? (in
hfs_notify_change(), that is).
coda,nfs,ncpfs,smbfs - I'm bringing the call of vmtruncate() into
the end of ->notify_change() (if ATTR_SIZE is set, indeed). It preserves
the current behaviour, but I'm not sure that it's OK. Those filesystems
don't have ->truncate() and do the equivalent theselves (from
->notify_change()). Maybe vmtruncate() should be called earlier.

c) ext2-related parts (and general messing with pagecache for
directories) - still too unstable. IOW, crashes the kernel _fast_.

The following is the relatively stable part of the patch (no ext2
stuff). Please, look if you have any problems with it.
Cheers,
Al

diff -urN linux-2.3.20/fs/affs/inode.c linux-bird.truncate/fs/affs/inode.c
--- linux-2.3.20/fs/affs/inode.cMon Jun 21 12:36:43 1999
+++ linux-bird.truncate/fs/affs/inode.c Mon Oct 11 10:55:11 1999
@@ -246,7 +246,6 @@
inode->u.affs_i.i_protect = mode_to_prot(attr->ia_mode);
 
inode_setattr(inode, attr);
-   mark_inode_dirty(inode);
error = 0;
 out:
return error;
diff -urN linux-2.3.20/fs/attr.c linux-bird.truncate/fs/attr.c
--- linux-2.3.20/fs/attr.c  Mon Jun 21 12:36:42 1999
+++ linux-bird.truncate/fs/attr.c   Mon Oct 11 10:54:44 1999
@@ -62,8 +62,6 @@
inode->i_uid = attr->ia_uid;
if (ia_valid & ATTR_GID)
inode->i_gid = attr->ia_gid;
-   if (ia_valid & ATTR_SIZE)
-   inode->i_size = attr->ia_size;
if (ia_valid & ATTR_ATIME)
inode->i_atime = attr->ia_atime;
if (ia_valid & ATTR_MTIME)
@@ -75,6 +73,12 @@
if (!in_group_p(inode->i_gid) && !capable(CAP_FSETID))
inode->i_mode &= ~S_ISGID;
}
+   if (ia_valid & ATTR_SIZE) {
+   inode->i_size = attr->ia_size;
+   vmtruncate(inode, attr->ia_size);
+   if (inode->i_op && inode->i_op->truncate)
+   inode->i_op->truncate(inode);
+   }
mark_inode_dirty(inode);
 }
 
diff -urN linux-2.3.20/fs/coda/inode.c linux-bird.truncate/fs/coda/inode.c
--- linux-2.3.20/fs/coda/inode.cSun Sep 12 12:45:49 1999
+++ linux-bird.truncate/fs/coda/inode.c Mon Oct 11 10:59:23 1999
@@ -229,6 +229,8 @@
 if ( !error ) {
coda_vattr_to_iattr(inode, &vattr); 
coda_cache_clear_inode(inode);
+   if (iattr->ia_valid && ATTR_SIZE)
+   vmtruncate(inode, iattr->ia_size);
 }
CDEBUG(D_SUPER, "inode.i_mode %o, error %d\n", 
   inode->i_mode, error);
diff -urN linux-2.3.20/fs/hpfs/inode.c linux-bird.truncate/fs/hpfs/inode.c
--- linux-2.3.20/fs/hpfs/inode.cSun Sep 12 06:00:35 1999
+++ linux-bird.truncate/fs/hpfs/inode.c Mon Oct 11 10:55:25 1999
@@ -364,7 +364,6 @@
if (inode->i_sb->s_hpfs_root == inode->i_ino) return -EINVAL;
if ((error = inode_change_ok(inode, attr))) return error;
inode_setattr(inode, attr);
-  

Re: (reiserfs) Re: RE: journal requirements for buffer.c (was: Roma progress report)

1999-10-11 Thread Eric W. Biederman

Hans Reiser <[EMAIL PROTECTED]> writes:

> I feel we should encourage Linus to allow the following:
> 
> * unions in struct buffer_head and struct page containing filesystem specific
> fields comparable to the union in struct inode.

No.  

In struct buffer_head I don't have problems.

In struct page (you don't want to pay the overhead for every page of memory)...
There is the buffer_head *bh pointer that could be made more generic.

> 
> * a filesystem operation which flushes to disk a buffer or page, and leaves it
> to the filesystem whether to flush a whole commit along with it, or a whole
> bunch of semantically adjacent buffers, or to repack the buffers before writing
> them, or to write a whole bunch of semantically nearby buffers, or to fill the
> rest of the RAID stripe, or assign a block number to the page, or mark the page
> copy_on_write, or whatever else it wants.

Or allocate bounce buffers, for high memory pages. . .

Though I don't think copy_on_write and (whatever else it wants)
are necessarily hot ideas.  There need to be some constraints.

> Do the rest of you agree?

I agree that it is a good idea. Exactly how it should, and if it
should be implemented for 2.3 is a second question.

There is also the related issue of the page lock not currently working
for NFS (it needs some I/O locking).

Since I've been pushing for this for a while.  I'll see if there
is anything that could be considered a ``bug fix'' doable yet in 2.3

The recent patch to improve fdatasync performance is also somewhat
related.

Eric