Re: OOPS in nfsd, affects all 2.2 and 2.4 kernels

2000-10-28 Thread Michael Eisler


> This problem that you are addressing is caused when solaris sends a
> zero length write (I assume to implement the "access" system call, but
> I haven't checked).

More likely a long-standing bug in Solaris that hasn't been stomped.

Tony, you might let Sun know that you have a way to reproduce it at
will, though there are Sun people on this alias who I'm sure will
make it a high priority to stomp this one. :-)
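
For anyone chasing this, the defensive fix on the Linux side is
conceptually tiny. A hypothetical sketch (the handler name and arguments
are illustrative, not the real nfsd entry point):

    /* A zero-length NFS WRITE is legal on the wire and must simply
     * succeed; the OOPS comes from taking the normal path and touching
     * page state for a request that covers no bytes. */
    static int nfsd_handle_write(void *buf, unsigned long count)
    {
        if (count == 0)
            return 0;   /* nothing to write: reply success, touch nothing */

        /* ... normal path: map pages, copy data, schedule writeback ... */
        return 0;
    }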

-mre 



Re: NFS locking bug -- limited mtime resolution means nfs_lock() does not provide coherency guarantee

2000-09-18 Thread Michael Eisler

> >>>>> " " == Michael Eisler <[EMAIL PROTECTED]> writes:
> 
> >> I'm not clear on why you want to enforce page alignedness
> >> though? As long as writes respect the lock boundaries (and not
> >> page boundaries) why would use of a page cache change matters?
> 
>  > For the reason that was pointed out earlier by someone else as to
>  > why your fix is inadequate. Since the I/O is page-based, if the
>  > locks are not, then two threads on two different clients will
>  > step over each other's locked regions.
> 
> No they don't.
> 
> As I've repeatedly stated, our cache does not require us to respect
> page boundaries when writing. We do make sure that all writes pending
> on the entire file are flushed to disk before we lock/unlock a
> region. If somebody has held a lock on 2 bytes lying across a page
> boundary, and has only written within that 2 byte region, a write is
> sent for 2 bytes.

What if someone has written to multiple, non-contiguous regions of a page?

> There is no difference here between cached and uncached operation. The
> only difference is that we trust the lock to prevent other machines
> writing within the locked region.

So, assume a page is 4096 bytes long, and there are 2048 processes, each of
which has write-locked one of the even-numbered bytes of the first page of a
file. Once all the locks have been acquired, the page has been flushed to
disk.

Now we have a clean page. Each process then writes only the one-byte region
that it has locked.

Now they each unlock their region.

What happens?

(BTW, on another NFS client, 2048 processes have locked the odd-numbered
bytes of the first page of the same file.)

If you claim that the first client doesn't overwrite the updates from the
second client, then I'll shut up.
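
To make the scenario concrete, here is a minimal userland sketch of what
each process does (the file path and offsets are made up for illustration;
the question is whether the client's page-granular writeback clobbers the
bytes locked by processes on the other client):

    #include <fcntl.h>
    #include <unistd.h>

    /* Each process write-locks exactly one byte, dirties exactly that
     * byte, and unlocks. If the client then writes back the whole
     * 4096-byte page rather than the single locked byte, it overwrites
     * bytes locked by processes on the other client. */
    static void lock_write_unlock(int fd, off_t off, char c)
    {
        struct flock fl;

        fl.l_type = F_WRLCK;
        fl.l_whence = SEEK_SET;
        fl.l_start = off;
        fl.l_len = 1;
        fcntl(fd, F_SETLKW, &fl);    /* block until the one-byte lock is granted */

        pwrite(fd, &c, 1, off);      /* dirty only the locked byte */

        fl.l_type = F_UNLCK;
        fcntl(fd, F_SETLK, &fl);     /* unlock; dirty data must reach the server */
    }

    int main(void)
    {
        int fd = open("/mnt/nfs/shared", O_RDWR);  /* hypothetical path */

        /* client A's process i would use offset 2*i; client B's, 2*i + 1 */
        lock_write_unlock(fd, 0, 'A');
        close(fd);
        return 0;
    }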

-mre



Re: NFS locking bug -- limited mtime resolution means nfs_lock() does not provide coherency guarantee

2000-09-18 Thread Michael Eisler


> Yes. fs/read_write calls the NFS subsystem. The problem then is that
> NFS uses the generic_file_{read,write,mmap}() interfaces. These are
> what enforce use of the page cache.

So, don't use generic*() when locking is active. That's what most other
UNIX-based NFS clients do. Even if it is "stupid beyond belief", it works.

> You could drop these functions, but that would mean designing an
> entire VFS for NFS's use alone. Such a decision would have to be very
> well motivated in order to convince Linus.

Avoiding corruption.

> >> As far as I can see, the current use of the page cache should
> >> be safe as long as applications respect the locking boundaries,
> >> and don't expect consistency outside locked areas.
> 
>  > Then the code ought to enforce page aligned locks. Of course,
>  > while that will produce correctness, it will violate the
>  > principle of least surprise. It might be better to simply
> 
> AFAICS that would be a bad idea, since it will lead to programs having
> to know about the hardware granularity. You could easily imagine
> deadlock situations that could arise if one program is unwittingly
> locking an area that is meant to be claimed by another.

I can't imagine any deadlock scenarios. If the app locks on a page
boundary, then accept it; otherwise return an error. But that does violate
the principle of least surprise, so I think bypassing the page cache when
locking is active is better.
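
For what it's worth, the rejected alternative is only a few lines. A
hypothetical sketch (the names, constant, and error choice are mine, not
existing kernel code):

    #include <errno.h>

    #define NFS_PAGE_SIZE 4096UL  /* illustrative; the real value is per-arch */

    /* Reject any byte-range lock that is not page-aligned. Correct for a
     * page-granular cache, but surprising to POSIX applications, which is
     * why bypassing the cache is the better option. */
    static int nfs_check_lock_alignment(unsigned long start, unsigned long len)
    {
        if ((start & (NFS_PAGE_SIZE - 1)) != 0 ||
            (len & (NFS_PAGE_SIZE - 1)) != 0)
            return -EINVAL;
        return 0;
    }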

> I'm not clear on why you want to enforce page alignedness though? As
> long as writes respect the lock boundaries (and not page boundaries)
> why would use of a page cache change matters?

For the reason that was pointed out earlier by someone else as to why your
fix is inadequate. Since the I/O is page-based, if the locks are not, then
two threads on two different clients will step over each other's locked
regions.

Folks might think that NLM locking is brain dead, and they wouldn't get an
argument from me. But if you are going to document that you support it, then
please get it right.

-mre



Re: NFS locking bug -- limited mtime resolution means nfs_lock() does not provide coherency guarantee

2000-09-16 Thread Michael Eisler

> >>>>> " " == Michael Eisler <[EMAIL PROTECTED]> writes:
> 
>  > Focus on correctness and do the expedient thing first, which
>  > is:
>  > - The first time a file is locked, flush dirty pages
>  >   to the server, and then invalidate the page cache
> 
> This would be implemented with the last patch I proposed.
> 
>  > - While the file is locked, bypass the page cache for all
>  >   I/O.
> 
> This is not possible given the current design of the Linux VFS. The
> design is such that all reads/writes go through the page cache. I'm

I'm not Linux-kernel literate. However, I find your
assertion surprising. Does procfs do page I/O as well?

file.c in fs/nfs suggests that the Linux VFS has non-page interfaces
in addition to page interfaces. fs/read_write.c suggests that the
read and write system calls use the non-page interface.

I cannot speak for Linux, but System V Release 4-derived systems use
the page cache primarily as a tool for each file system, yet still hide
the page interface from the code path leading from the read/write
system calls to the VFS.
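
For readers following along, this is the indirection I mean, simplified
from the 2.4-era headers (most methods elided; not a complete definition):

    /* sys_read()/sys_write() dispatch through the filesystem's
     * file_operations table, so NFS is free to point these methods at
     * something other than generic_file_read()/generic_file_write(). */
    struct file_operations {
        ssize_t (*read) (struct file *, char *, size_t, loff_t *);
        ssize_t (*write) (struct file *, const char *, size_t, loff_t *);
        int (*mmap) (struct file *, struct vm_area_struct *);
        /* ... other methods elided ... */
    };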

> not sure that it is possible to get round this without some major
> changes in VFS philosophy. Hacks such as invalidating the cache after
> each read/write would definitely give rise to races.
> 
> As far as I can see, the current use of the page cache should be safe
> as long as applications respect the locking boundaries, and don't
> expect consistency outside locked areas.

Then the code ought to enforce page-aligned locks. Of course, while
that will produce correctness, it will violate the principle of
least surprise. It might be better to simply return an error when
a lock operation is attempted on an NFS file. That assumes, of course,
that the Linux kernel isn't capable of honoring a read() or write()
system call whenever the file system doesn't support page-based I/O,
which, again, I'd be surprised by.

-mre



Re: NFS locking bug -- limited mtime resolution means nfs_lock() does not provide coherency guarantee

2000-09-15 Thread Michael Eisler

> > " " == James Yarbrough <[EMAIL PROTECTED]> writes:
> 
>  > What is done for bypassing the cache when the size of a file
>  > lock held by the reading/writing process is not a multiple of
>  > the caching granularity?  Consider two different clients with
>  > processes sharing a file and locking 2k byte regions of the
>  > file and possibly updating these regions.  Suppose that each
>  > system caches 4k byte blocks.  If system A locks the first 2k
>  > of a block and system B locks the second 2k, the updates from
>  > one of the systems may be lost if these systems cache the
>  > writes.  This is because each system will write back the 4k
>  > block it cached, not the 2k block that was locked.
> 
> Under Linux writebacks have byte-sized granularity. If a page has been
> partially dirtied, we save that information, and only write back the
> dirty areas. As long as each system has restricted its updates to
> within the 2k block that it has locked, there should be no conflict.
> 
> If however one system has been writing over the full 4k block, then
> there will indeed be a race.

Using a page cache when locking is turned on is tricky at best. The
only working optimization I know of in this area is allowing
the use of the page cache when the entire file is locked.

My two cents ...

Focus on correctness and do the expedient thing first, which is:
- The first time a file is locked, flush dirty pages to the server,
  and then invalidate the page cache.
- While the file is locked, bypass the page cache for all I/O.

Once that works, the gaping wound in the Linux NFS/NLM client will be
closed. That will give you the breathing room to experiment with
something that works more optimally (yet still correctly) under some
conditions. E.g., one possible optimization is to allow page I/O as long
as the locks are page-aligned or cover the whole file.
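
A rough sketch of the first step, with entirely hypothetical helper and
flag names (none of these are existing Linux symbols):

    /* On the first lock against a file: flush, invalidate, and mark the
     * inode so that subsequent reads/writes go straight to the server
     * instead of through the page cache. */
    static int nfs_lock_prepare(struct inode *inode)
    {
        int err;

        err = nfs_flush_all_dirty_pages(inode);   /* hypothetical helper */
        if (err)
            return err;

        nfs_invalidate_page_cache(inode);         /* hypothetical helper */
        inode->u.nfs_i.flags |= NFS_INO_BYPASS_CACHE;  /* hypothetical flag */
        return 0;
    }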

-mre 



Re: NFS locking bug -- limited mtime resolution means nfs_lock() does not provide coherency guarantee

2000-09-14 Thread Michael Eisler

> > " " == Jeff Epler <[EMAIL PROTECTED]> writes:
> 
>  > Is there a solution that would allow the kind of guarantee our
>  > software wants with non-linux nfsds without the cache-blowing
>  > that the change I'm suggesting causes?
> 
> How about something like the following compromise?
> 
> I haven't tried it out yet (and I've no idea whether or not Linus
> would accept this) but it compiles, and it should definitely be better
> behaved with respect to slowly-changing files.
> 
> As you can see, the idea is to look at whether or not the file has
> changed recently (I arbitrarily chose a full minute as a concession
> towards clusters with lousy clock synchronization). If it has, then
> the page cache is zapped. If not, we force ordinary attribute cache
> consistency checking.

The fix still does not provide coherency guarantees in all situations, and
at minimum, there ought to be a way to force the client to provide a
coherency guarantee.

> Cheers,
>   Trond
> 
> --- linux-2.4.0-test8/fs/nfs/file.c   Fri Jun 30 01:02:40 2000
> +++ linux-2.4.0-test8-fix_lock/fs/nfs/file.c  Thu Sep 14 09:18:50 2000
> @@ -240,6 +240,20 @@
>  }
>  
>  /*
> + * Ensure more conservative data cache consistency than NFS_CACHEINV()
> + * for files that change frequently. Avoids problems with sub-second
> + * changes that don't register on i_mtime.
> + */
> +static inline void
> +nfs_lock_cacheinv(struct inode *inode)
> +{
> + if ((long)CURRENT_TIME - (long)(inode->i_mtime + 60) < 0)
> + nfs_zap_caches(inode);
> + else
> + NFS_CACHEINV(inode);
> +}
> +
> +/*
>   * Lock a (portion of) a file
>   */
>  int
> @@ -295,6 +309,6 @@
>* This makes locking act as a cache coherency point.
>*/
>   out_ok:
> - NFS_CACHEINV(inode);
> + nfs_lock_cacheinv(inode);
>   return status;
>  }
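
One note on the guard in nfs_lock_cacheinv(): the signed subtraction asks
"is the current time still within 60 seconds of i_mtime?" in a way that
survives counter wraparound. A standalone illustration:

    #include <stdio.h>

    /* Mirrors the patch's test: true when `now` is earlier than
     * mtime + 60, i.e. the file changed within the last minute. */
    static int modified_within_minute(long now, long mtime)
    {
        return now - (mtime + 60) < 0;
    }

    int main(void)
    {
        printf("%d\n", modified_within_minute(1000, 990)); /* 1 -> zap caches */
        printf("%d\n", modified_within_minute(1000, 900)); /* 0 -> normal check */
        return 0;
    }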




Re: NFS client option to force 16-bit ugid.

2000-09-04 Thread Michael Eisler

> My home directory lives on a SunOS 4.1.4 server, which helpfully expands 
> 16-bit UIDs to 32 bits as signed quantities, not unsigned. So any uid above 
> 32768 gets 0xffff0000 added to it. 

Doesn't

http://sunsolve.Sun.COM/pub-cgi/retrieve.pl?type=0&doc=fpatches/102394

fix this on the 4.1.4 server?
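
The quoted misbehavior is ordinary C sign extension; a standalone
demonstration (the uid value is arbitrary):

    #include <stdio.h>
    #include <stdint.h>

    /* Widening a 16-bit uid as a *signed* quantity sets the top 16 bits
     * (adds 0xffff0000) for any uid with the high bit set. */
    int main(void)
    {
        uint16_t uid16 = 40000;  /* 0x9c40, high bit set */
        uint32_t wrong = (uint32_t)(int32_t)(int16_t)uid16;  /* signed widening */
        uint32_t right = (uint32_t)uid16;                    /* unsigned widening */

        printf("signed:   0x%08x\n", wrong);  /* 0xffff9c40 */
        printf("unsigned: 0x%08x\n", right);  /* 0x00009c40 */
        return 0;
    }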
