Re: [RFC/PATCH] revokeat/frevoke system calls V5

2007-02-25 Thread Alan
What's the status on this? I was surprised to see something so important
just go dead.


Alan


Re: linux-cifs-client Digest, Vol 39, Issue 19

2007-02-25 Thread Steve French (smfltc)

[EMAIL PROTECTED] wrote:




Today's Topics:

  1. i_mutex and deadlock (Steve French (smfltc))
  2. Re: i_mutex and deadlock (Dave Kleikamp)


--

Message: 1
Date: Fri, 23 Feb 2007 10:02:16 -0600
From: Steve French (smfltc) [EMAIL PROTECTED]
Subject: [linux-cifs-client] i_mutex and deadlock
To: linux-fsdevel@vger.kernel.org
Cc: [EMAIL PROTECTED]
Message-ID: [EMAIL PROTECTED]
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

A field in i_size_write (i_size_seqcount) must be protected against
simultaneous update; otherwise we risk looping in i_size_read.


The suggestion in fs.h is to use i_mutex, which seems too dangerous due
to the possibility of deadlock.
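
For reference, the read side in fs.h (paraphrasing the 32-bit SMP case
from memory) is a seqcount retry loop, which is why unserialized
writers are dangerous:

    static inline loff_t i_size_read(struct inode *inode)
    {
            /* 32-bit SMP: i_size can't be read atomically, so a
             * seqcount guards against seeing a torn 64-bit value */
            loff_t i_size;
            unsigned int seq;

            do {
                    seq = read_seqcount_begin(&inode->i_size_seqcount);
                    i_size = inode->i_size;
            } while (read_seqcount_retry(&inode->i_size_seqcount, seq));
            /* two racing i_size_write calls can interleave their
             * seqcount bumps and leave the count odd, in which case
             * this loop never terminates */
            return i_size;
    }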


There are 65 places in the fs directory that lock an i_mutex, and seven
more in the mm directory.  The vfs clearly does lock file inodes on some
paths before calling into a particular filesystem (nfs, ext3, cifs,
etc.), in particular for fsync, but probably on others that are harder
to trace.  This seems to introduce the possibility of deadlock if a
filesystem also uses i_mutex to protect file size updates.


Documentation/filesystems/Locking describes the use of i_mutex
(previously i_sem) and indicates that the vfs holds it across three
additional calls on file inodes that concern me (for deadlock
possibility): setattr, truncate, and unlink.


nfs seems to limit its use of i_mutex to llseek and invalidate_mapping,
and does not appear to take i_mutex (or any semaphore, for that matter)
to protect i_size_write (nfs calls i_size_write in nfs_grow_file); nor
does it seem to take i_mutex in nfs_fhget, where it bypasses
i_size_write and sets i_size directly.


ext3 also does not use i_mutex for this purpose (protecting
i_size_write), only to protect a journalling ioctl.


I am concerned about using i_mutex to protect the cifs calls to 
i_size_write (although it seems to fix a problem reported in i_size_read 
under stress) because of the following:


1) no one else calls i_size_write on our file inodes, AFAIK;
2) we don't block inside i_size_write, do we? (so why in the world take
a slow mutex instead of a fast spinlock?);
3) we don't really know what happens inside fsync (the paths through the
page cache code seem complex, and we don't want to reenter writepage
under low memory and deadlock while updating the file size), and there
is some concern that the vfs takes i_mutex on other paths on file
inodes before entering our code and could deadlock.


Any reason why an fs shouldn't simply use something else (a spinlock)
rather than i_mutex to protect the i_size_write call?



--

Message: 2
Date: Fri, 23 Feb 2007 10:29:53 -0600
From: Dave Kleikamp [EMAIL PROTECTED]
Subject: Re: [linux-cifs-client] i_mutex and deadlock
To: Steve French (smfltc) [EMAIL PROTECTED]
Cc: linux-fsdevel@vger.kernel.org, [EMAIL PROTECTED]
Message-ID: [EMAIL PROTECTED]
Content-Type: text/plain

On Fri, 2007-02-23 at 10:02 -0600, Steve French (smfltc) wrote:
> A field in i_size_write (i_size_seqcount) must be protected against
> simultaneous update; otherwise we risk looping in i_size_read.
>
> The suggestion in fs.h is to use i_mutex, which seems too dangerous due
> to the possibility of deadlock.



I'm not sure if it's as much a suggestion as a way of documenting the
locking that exists (or existed when the comment was written).

... i_size_write() does need locking around it (normally i_mutex) ...

 

> There are 65 places in the fs directory that lock an i_mutex, and seven
> more in the mm directory.  The vfs clearly does lock file inodes on
> some paths before calling into a particular filesystem (nfs, ext3,
> cifs, etc.), in particular for fsync, but probably on others that are
> harder to trace.  This seems to introduce the possibility of deadlock
> if a filesystem also uses i_mutex to protect file size updates.
>
> I am concerned about using i_mutex to protect the cifs calls to
> i_size_write (although it seems to fix a problem reported in
> i_size_read under stress) because of the following:
>
> 1) no one else calls i_size_write on our file inodes, AFAIK;



I think you're right.

 

I produced a patch (attached to
http://bugzilla.kernel.org/show_bug.cgi?id=7903) which fixes this, has
tested out fine, has no side effects or changes outside of cifs, and
simply uses inode->i_lock (a spinlock) to protect i_size_write calls on
cifs inodes, which seemed
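
For illustration, the locking pattern that patch uses amounts to
something like the following (new_size here stands in for whatever
end-of-file value cifs has computed):

    /* serialize writers of i_size with the inode spinlock; readers
     * go through the i_size_read seqcount loop and need no lock */
    spin_lock(&inode->i_lock);
    i_size_write(inode, new_size);
    spin_unlock(&inode->i_lock);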

Re: [RFC/PATCH] revokeat/frevoke system calls V5

2007-02-25 Thread Pekka Enberg

Hi Alan,

On 2/26/07, Alan [EMAIL PROTECTED] wrote:

> What's the status on this? I was surprised to see something so important
> just go dead.


It's not dead. You can find the latest patches here:

http://www.cs.helsinki.fi/u/penberg/linux/revoke/patches/

and user-space tests here:

http://www.cs.helsinki.fi/u/penberg/linux/revoke/utils/

What they are lacking is review, so I am not sure how to proceed with
the patches.

Pekka


Re: end to end error recovery musings

2007-02-25 Thread Neil Brown
On Friday February 23, [EMAIL PROTECTED] wrote:
> On Fri, Feb 23, 2007 at 05:37:23PM -0700, Andreas Dilger wrote:
>> Probably the only sane thing to do is to remember the bad sectors and
>> avoid attempting to read them; that would mean marking automatic
>> versus explicitly requested requests to determine whether or not to
>> filter them against a list of discovered bad blocks.
>
> And clearing this list when the sector is overwritten, as it will almost
> certainly be relocated at the disk level.  For that matter, a huge win
> would be to have the MD RAID layer rewrite only the bad sector (in hopes
> of the disk relocating it) instead of failing the whole disk.  Otherwise,
> a few read errors on different disks in a RAID set can take the whole
> system offline.  Apologies if this is already done in recent kernels...

Yes, current md does this.

> And having a way of making this list available to both the filesystem
> and to a userspace utility, so they can more easily deal with doing a
> forced rewrite of the bad sector, after determining which file is
> involved and perhaps doing something intelligent (up to and including
> automatically requesting a backup system to fetch a backup version of
> the file, and, if it can be determined that the file shouldn't have
> been changed since the last backup, automatically fixing up the
> corrupted data block :-).
>
>    - Ted

So we want a clear path for media read errors from the device up to
user-space.  Stacked devices (like md) would do appropriate mappings,
maybe (for raid0/linear at least; other levels wouldn't tolerate
errors).  There would need to be a limit on the number of 'bad blocks'
that are recorded, and maybe a mechanism to clear old bad blocks from
the list.

Maybe if generic_make_request gets a request for a block which overlaps
a 'bad block', it should return an error immediately.
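
Very roughly (sketch only; the per-queue bad-block list and its
bb_overlaps() lookup helper are invented here for illustration):

    /* in generic_make_request: fail a read bio that overlaps a
     * recorded bad block without touching the media at all;
     * q->bad_list and bb_overlaps() are hypothetical */
    struct request_queue *q = bdev_get_queue(bio->bi_bdev);

    if (bio_data_dir(bio) == READ &&
        bb_overlaps(q->bad_list, bio->bi_sector, bio_sectors(bio))) {
            bio_endio(bio, bio->bi_size, -EIO);
            return;
    }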

Do we want a path in the other direction to handle write errors?  The
filesystem could say, "Don't worry too much if this block cannot be
written; just return an error and I will write it somewhere else."
This might allow md not to fail a whole drive if there is a single
write error.
Or is that completely unnecessary, as all modern devices do bad-block
relocation for us?
Is there any need for a bad-block-relocating layer in md or dm?
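
One conceivable shape for that hint (the flag and its handling are
invented; nothing like it exists today): the filesystem marks the bio,
and md's write-error path fails just that bio instead of the member
device:

    /* hypothetical: fs marks the write as relocatable */
    bio->bi_rw |= (1 << BIO_RW_RELOCATABLE);        /* invented bit */

    /* ... and md's write-error path could then do: */
    if (bio->bi_rw & (1 << BIO_RW_RELOCATABLE))
            bio_endio(bio, bio->bi_size, -EIO);     /* fs rewrites elsewhere */
    else
            md_error(mddev, rdev);                  /* fail the member as today */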

What about corrected-error counts?  Drives provide them via SMART.
The SCSI layer could provide some as well, and md can do something
similar to some extent.  Whether these are actually useful predictors
of pending failure is unclear, but there could be some value:
e.g. after a certain number of corrected errors, raid5 could trigger a
background consistency check, or a filesystem could trigger a
background fsck, should it support that.
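
As a sketch (the threshold and the helper are invented; md does
already keep an atomic corrected_errors count per rdev, if memory
serves):

    /* hypothetical policy: too many corrected reads on one member
     * triggers a background consistency check of the array */
    if (atomic_inc_return(&rdev->corrected_errors) > MAX_CORRECTED_ERRORS)
            md_schedule_background_check(mddev);    /* invented helper */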


Lots of interesting questions... not so many answers.

NeilBrown


Re: end to end error recovery musings

2007-02-25 Thread Douglas Gilbert
H. Peter Anvin wrote:
> Ric Wheeler wrote:
>> We still have the following challenges:
>>
>> (1) read-ahead often means that we will retry every bad sector at
>> least twice from the file system level.  The first time, the fs
>> read-ahead request triggers a speculative read that includes the bad
>> sector (triggering the error handling mechanisms), right before the
>> real application read does the same thing.  Not sure what the answer
>> is here, since read-ahead is obviously a huge win in the normal case.
>
> Probably the only sane thing to do is to remember the bad sectors and
> avoid attempting to read them; that would mean marking automatic
> versus explicitly requested requests to determine whether or not to
> filter them against a list of discovered bad blocks.

Some disks do their own read-ahead in the form of a background media
scan.  Scans are done on request or periodically (e.g. once per day or
once per week), and we have tools that can fetch the scan results from
a disk (e.g. a list of unreadable sectors).  What we don't have is any
way to feed such information to a file system that may be impacted.
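
Purely as a strawman, such an interface could be as simple as an ioctl
that hands the scan results to the block layer (the structure and
ioctl number below are invented for illustration):

    #include <linux/ioctl.h>
    #include <linux/types.h>

    /* hypothetical: push a disk's media-scan results down to whoever
     * sits on top of the device; all names here are invented */
    struct bad_sector_list {
            __u32 count;        /* number of LBAs that follow */
            __u64 lba[0];       /* sectors found unreadable by the scan */
    };

    #define BLKSETBADLIST _IOW(0x12, 200, struct bad_sector_list)

A filesystem (or fsck) could then query the same list and map the
sectors back to files before an application ever trips over them.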

Doug Gilbert

