Re: [RFC/PATCH] revokeat/frevoke system calls V5
What's the status on this? I was surprised to see something so important just go dead.

Alan
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: linux-cifs-client Digest, Vol 39, Issue 19
[EMAIL PROTECTED] wrote:

Today's Topics:
  1. i_mutex and deadlock (Steve French (smfltc))
  2. Re: i_mutex and deadlock (Dave Kleikamp)

----------------------------------------------------------------------

Message: 1
Date: Fri, 23 Feb 2007 10:02:16 -0600
From: Steve French (smfltc) [EMAIL PROTECTED]
Subject: [linux-cifs-client] i_mutex and deadlock
To: linux-fsdevel@vger.kernel.org
Cc: [EMAIL PROTECTED]

A field in i_size_write (i_size_seqcount) must be protected against simultaneous update; otherwise we risk looping in i_size_read. The suggestion in fs.h is to use i_mutex, which seems too dangerous due to the possibility of deadlock.

There are 65 places in the fs directory which lock an i_mutex, and seven more in the mm directory. The vfs clearly does lock file inodes in some paths before calling into a particular filesystem (nfs, ext3, cifs, etc.) - in particular for fsync, but probably in others that are harder to trace. This seems to introduce the possibility of deadlock if a filesystem also uses i_mutex to protect file size updates.

Documentation/filesystems/Locking describes the use of i_mutex (previously i_sem) and indicates that it is held by the vfs on three additional calls on file inodes which concern me (for deadlock possibility): setattr, truncate, and unlink.
nfs seems to limit its use of i_mutex to llseek and invalidate_mapping, and does not appear to grab i_mutex (or any semaphore, for that matter) to protect i_size_write (nfs calls i_size_write in nfs_grow_file); in the case of nfs_fhget (in which they bypass i_size_write and set i_size directly) it does not seem to grab i_mutex either. ext3 also does not use i_mutex for this purpose (protecting i_size_write) - only to protect a journalling ioctl.

I am concerned about using i_mutex to protect the cifs calls to i_size_write (although it seems to fix a problem reported in i_size_read under stress) because of the following:
1) no one else calls i_size_write AFAIK (on our file inodes)
2) we don't block inside i_size_write, do we? (so why in the world take a slow mutex instead of a fast spinlock)
3) we don't really know what happens inside fsync (the paths through the page cache code seem complex, and we don't want to reenter writepage in low memory conditions and deadlock updating the file size), and there is some concern that the vfs takes i_mutex in other paths on file inodes before entering our code and could deadlock.

Any reason why an fs shouldn't simply use something else (a spinlock) rather than i_mutex to protect the i_size_write call?

------------------------------

Message: 2
Date: Fri, 23 Feb 2007 10:29:53 -0600
From: Dave Kleikamp [EMAIL PROTECTED]
Subject: Re: [linux-cifs-client] i_mutex and deadlock
To: Steve French (smfltc) [EMAIL PROTECTED]
Cc: linux-fsdevel@vger.kernel.org, [EMAIL PROTECTED]

On Fri, 2007-02-23 at 10:02 -0600, Steve French (smfltc) wrote:
> A field in i_size_write (i_size_seqcount) must be protected against
> simultaneous update; otherwise we risk looping in i_size_read. The
> suggestion in fs.h is to use i_mutex, which seems too dangerous due
> to the possibility of deadlock.

I'm not sure if it's as much a suggestion as a way of documenting the locking that exists (or existed when the comment was written). ...
i_size_write() does need locking around it (normally i_mutex) ...

> There are 65 places in the fs directory which lock an i_mutex, and
> seven more in the mm directory. The vfs clearly does lock file inodes
> in some paths before calling into a particular filesystem (nfs, ext3,
> cifs, etc.) - in particular for fsync, but probably in others that are
> harder to trace. This seems to introduce the possibility of deadlock
> if a filesystem also uses i_mutex to protect file size updates.
>
> I am concerned about using i_mutex to protect the cifs calls to
> i_size_write (although it seems to fix a problem reported in
> i_size_read under stress) because of the following:
> 1) no one else calls i_size_write AFAIK (on our file inodes)

I think you're right.

I produced a patch (attached to http://bugzilla.kernel.org/show_bug.cgi?id=7903) which fixes this, has tested out fine, has no side effects or changes outside of cifs, and simply uses inode->i_lock (a spinlock) to protect i_size_write calls on cifs inodes, which seemed
Re: [RFC/PATCH] revokeat/frevoke system calls V5
Hi Alan,

On 2/26/07, Alan [EMAIL PROTECTED] wrote:
> What's the status on this? I was surprised to see something so
> important just go dead.

It's not dead. You can find the latest patches here:

  http://www.cs.helsinki.fi/u/penberg/linux/revoke/patches/

and user-space tests here:

  http://www.cs.helsinki.fi/u/penberg/linux/revoke/utils/

What they are lacking is review, so I am not sure how to proceed with the patches.

Pekka
Re: end to end error recovery musings
On Friday February 23, [EMAIL PROTECTED] wrote:
> On Fri, Feb 23, 2007 at 05:37:23PM -0700, Andreas Dilger wrote:
> > Probably the only sane thing to do is to remember the bad sectors and
> > avoid attempting to read them; that would mean marking automatic versus
> > explicitly requested requests to determine whether or not to filter them
> > against a list of discovered bad blocks.
> >
> > And clearing this list when the sector is overwritten, as it will
> > almost certainly be relocated at the disk level.
>
> For that matter, a huge win would be to have the MD RAID layer rewrite
> only the bad sector (in hopes of the disk relocating it) instead of
> failing the whole disk. Otherwise, a few read errors on different disks
> in a RAID set can take the whole system offline. Apologies if this is
> already done in recent kernels...

Yes, current md does this.

> And having a way of making this list available to both the filesystem
> and to a userspace utility, so they can more easily deal with doing a
> forced rewrite of the bad sector, after determining which file is
> involved and perhaps doing something intelligent (up to and including
> automatically requesting a backup system to fetch a backup version of
> the file, and if it can be determined that the file shouldn't have been
> changed since the last backup, automatically fixing up the corrupted
> data block :-).
>
> - Ted

So we want a clear path for media read errors from the device up to user space. Stacked devices (like md) would do appropriate mappings, maybe (for raid0/linear at least; other levels wouldn't tolerate errors). There would need to be a limit on the number of 'bad blocks' that is recorded, and maybe a mechanism to clear old bad blocks from the list. Maybe if generic_make_request gets a request for a block which overlaps a 'bad block' it returns an error immediately.

Do we want a path in the other direction to handle write errors?
The file system could say "Don't worry too much if this block cannot be written, just return an error and I will write it somewhere else." This might allow md not to fail a whole drive if there is a single write error. Or is that completely unnecessary, as all modern devices do bad-block relocation for us? Is there any need for a bad-block-relocating layer in md or dm?

What about corrected-error counts? Drives provide them with SMART. The SCSI layer could provide some as well, and md can do a similar thing to some extent. Whether these are actually useful predictors of pending failure is unclear, but there could be some value: e.g. after a certain number of recovered errors, raid5 could trigger a background consistency check, or a filesystem could trigger a background fsck, should it support that.

Lots of interesting questions... not so many answers.

NeilBrown
Re: end to end error recovery musings
H. Peter Anvin wrote:
> Ric Wheeler wrote:
> > We still have the following challenges:
> >
> > (1) read-ahead often means that we will retry every bad sector at
> > least twice from the file system level. The first time, the fs
> > read-ahead request triggers a speculative read that includes the bad
> > sector (triggering the error handling mechanisms) right before the
> > real application read does the same thing. Not sure what the answer
> > is here, since read-ahead is obviously a huge win in the normal case.
>
> Probably the only sane thing to do is to remember the bad sectors and
> avoid attempting to read them; that would mean marking automatic versus
> explicitly requested requests to determine whether or not to filter them
> against a list of discovered bad blocks.

Some disks are doing their own read-ahead in the form of a background media scan. Scans are done on request or periodically (e.g. once per day or once per week), and we have tools that can fetch the scan results from a disk (e.g. a list of unreadable sectors). What we don't have is any way to feed such information to a file system that may be impacted.

Doug Gilbert