Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek
On Mon, 2008-01-28 at 12:33 -0600, Steve French wrote: > On Jan 28, 2008 2:17 AM, Andi Kleen <[EMAIL PROTECTED]> wrote: > > > I completely agree. If one thread writes A and another writes B then the > > > kernel should record either A or B, not ((A & 0x) | (B & > > > 0x)) > > > > The problem is pretty nasty unfortunately. To solve it properly I think > > the file_operations->read/write prototypes would need to be changed > > because otherwise it is not possible to do atomic relative updates > > of f_pos. Right now the actual update is burrowed deeply in the low level > > read/write implementation. But that would be a huge impact all over > > the tree :/ > > If there were a wrapper around reads and writes of f_pos as there is > for i_size e.g. it would hit a lot of code, but not as many as I had > originally thought. the most important ones are in the vfs itself, where > there are only 59 uses of the field (not all need to be changed). ext3 > has fewer (25), and cifs only 12 uses. Most of the uses in ext3 and cifs deal with a directory's f_pos in readdir, which is protected by i_mutex, so I don't think we need to worry about them at all. -- David Kleikamp IBM Linux Technology Center - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek
On Jan 28, 2008 2:17 AM, Andi Kleen <[EMAIL PROTECTED]> wrote: > > I completely agree. If one thread writes A and another writes B then the > > kernel should record either A or B, not ((A & 0x) | (B & > > 0x)) > > The problem is pretty nasty unfortunately. To solve it properly I think > the file_operations->read/write prototypes would need to be changed > because otherwise it is not possible to do atomic relative updates > of f_pos. Right now the actual update is burrowed deeply in the low level > read/write implementation. But that would be a huge impact all over > the tree :/ If there were a wrapper around reads and writes of f_pos as there is for i_size e.g. it would hit a lot of code, but not as many as I had originally thought. the most important ones are in the vfs itself, where there are only 59 uses of the field (not all need to be changed). ext3 has fewer (25), and cifs only 12 uses. -- Thanks, Steve - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek
El Mon, 28 Jan 2008 15:10:34 +0100, Andi Kleen <[EMAIL PROTECTED]> escribió: > So you get overlapping reads. Probably not good. This was discussed in the past i think -> http://lkml.org/lkml/2006/4/13/124 http://lkml.org/lkml/2006/4/13/130 - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek
On Mon, 28 Jan 2008 15:10:34 +0100 Andi Kleen <[EMAIL PROTECTED]> wrote: > On Monday 28 January 2008 14:38:57 Alan Cox wrote: > > > Also worse really fixing it would be a major change to the VFS > > > because of the way ->read/write are defined :/ > > > > I don't see a problem there. ->read and ->write update the passed pointer > > which is not the real f_pos anyway. Just the copies need fixing. > > They are effectually doing a decoupled read/modify/write cycle. e.g.: > > A B > > read fpos > > read fpos > > fpos += A fpos += B > write fpos > > > write fpos > > So you get overlapping reads. Probably not good. No unix system I'm aware of cares about the read/write positioning during parallel simultaneous reads or writes, with the exception of O_APPEND which is strictly defined. The problem case is getting fpos != either valid value. - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek
On Monday 28 January 2008 14:38:57 Alan Cox wrote: > > Also worse really fixing it would be a major change to the VFS > > because of the way ->read/write are defined :/ > > I don't see a problem there. ->read and ->write update the passed pointer > which is not the real f_pos anyway. Just the copies need fixing. They are effectually doing a decoupled read/modify/write cycle. e.g.: A B read fpos read fpos fpos += A fpos += B write fpos write fpos So you get overlapping reads. Probably not good. -Andi - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek
> Also worse really fixing it would be a major change to the VFS > because of the way ->read/write are defined :/ I don't see a problem there. ->read and ->write update the passed pointer which is not the real f_pos anyway. Just the copies need fixing. Alan - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek
On Monday 28 January 2008 13:56:05 Alan Cox wrote: > > > No specific spec, just general quality of implementation. > > > > I completely agree. If one thread writes A and another writes B then the > > kernel should record either A or B, not ((A & 0x) | (B & > > 0x)) > > Agree entirely: the spec doesn't allow for random scribbling in the wrong > place. It doesn't cover which goes first or who "wins" the race but > provides pwrite/pread for that situation. Writing somewhere unrelated is > definitely not to spec Actually it would probably -- i guess it's undefined and in undefined country such things can happen. Also to be fair I think it's only a problem for the 4GB wrapping case which is presumably rare (otherwise we would have heard about it) Also worse really fixing it would be a major change to the VFS because of the way ->read/write are defined :/ -Andi - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek
> > No specific spec, just general quality of implementation. > > I completely agree. If one thread writes A and another writes B then the > kernel should record either A or B, not ((A & 0x) | (B & > 0x)) Agree entirely: the spec doesn't allow for random scribbling in the wrong place. It doesn't cover which goes first or who "wins" the race but provides pwrite/pread for that situation. Writing somewhere unrelated is definitely not to spec and not good. - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek
Trond Myklebust <[EMAIL PROTECTED]> wrote: > On Mon, 2008-01-28 at 05:38 +0100, Andi Kleen wrote: >> On Monday 28 January 2008 05:13:09 Trond Myklebust wrote: >> > On Mon, 2008-01-28 at 03:58 +0100, Andi Kleen wrote: >> > > The problem is that it's not a race in who gets to do its thing first, >> > > but a parallel reader can actually see a corrupted value from the two >> > > independent words on 32bit (e.g. during a 4GB). And this could actually >> > > completely corrupt f_pos when it happens with two racing relative seeks >> > > or read/write()s >> > > >> > > I would consider that a bug. >> > >> > I disagree. The corruption occurs because this isn't a situation that is >> > allowed by either POSIX or SUSv2/v3. Exactly what spec are you referring >> > to here? >> >> No specific spec, just general quality of implementation. We normally don't >> have non thread safe system calls even if it was in theory allowed by some >> specification. > > We've had the existing implementation for quite some time. The arguments > against changing it have been the same all along: if your application > wants to share files between threads, the portability argument implies > that you should either use pread/pwrite or use a mutex or some other > form of synchronisation primitive in order to ensure that > lseek()/read()/write() do not overlap. Does anything in the kernel depend on f_pos being valid? E.g. is it possible to read beyond the EOF using this race, or to have files larger than the ulimit? If not, update the manpage and be done. ¢¢ - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek
> I completely agree. If one thread writes A and another writes B then the > kernel should record either A or B, not ((A & 0x) | (B & > 0x)) The problem is pretty nasty unfortunately. To solve it properly I think the file_operations->read/write prototypes would need to be changed because otherwise it is not possible to do atomic relative updates of f_pos. Right now the actual update is burrowed deeply in the low level read/write implementation. But that would be a huge impact all over the tree :/ Or maybe define a new read/write64 and keep the default as 32bit only-- i suppose most users don't really need 64bit. Still would be a nasty API change. -Andi - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek
On Mon, 28 Jan 2008 05:38:25 +0100 Andi Kleen <[EMAIL PROTECTED]> wrote: > On Monday 28 January 2008 05:13:09 Trond Myklebust wrote: > > > > On Mon, 2008-01-28 at 03:58 +0100, Andi Kleen wrote: > > > The problem is that it's not a race in who gets to do its thing first, > > > but a > > > parallel reader can actually see a corrupted value from the two > > > independent > > > words on 32bit (e.g. during a 4GB). And this could actually completely > > > corrupt > > > f_pos when it happens with two racing relative seeks or read/write()s > > > > > > I would consider that a bug. > > > > I disagree. The corruption occurs because this isn't a situation that is > > allowed by either POSIX or SUSv2/v3. Exactly what spec are you referring > > to here? > > No specific spec, just general quality of implementation. I completely agree. If one thread writes A and another writes B then the kernel should record either A or B, not ((A & 0x) | (B & 0x)) - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek
On Mon, 2008-01-28 at 05:38 +0100, Andi Kleen wrote: > On Monday 28 January 2008 05:13:09 Trond Myklebust wrote: > > > > On Mon, 2008-01-28 at 03:58 +0100, Andi Kleen wrote: > > > The problem is that it's not a race in who gets to do its thing first, > > > but a > > > parallel reader can actually see a corrupted value from the two > > > independent > > > words on 32bit (e.g. during a 4GB). And this could actually completely > > > corrupt > > > f_pos when it happens with two racing relative seeks or read/write()s > > > > > > I would consider that a bug. > > > > I disagree. The corruption occurs because this isn't a situation that is > > allowed by either POSIX or SUSv2/v3. Exactly what spec are you referring > > to here? > > No specific spec, just general quality of implementation. We normally don't > have > non thread safe system calls even if it was in theory allowed by some > specification. We've had the existing implementation for quite some time. The arguments against changing it have been the same all along: if your application wants to share files between threads, the portability argument implies that you should either use pread/pwrite or use a mutex or some other form of synchronisation primitive in order to ensure that lseek()/read()/write() do not overlap. Cheers Trond - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek
On Monday 28 January 2008 05:13:09 Trond Myklebust wrote: > > On Mon, 2008-01-28 at 03:58 +0100, Andi Kleen wrote: > > The problem is that it's not a race in who gets to do its thing first, but > > a > > parallel reader can actually see a corrupted value from the two independent > > words on 32bit (e.g. during a 4GB). And this could actually completely > > corrupt > > f_pos when it happens with two racing relative seeks or read/write()s > > > > I would consider that a bug. > > I disagree. The corruption occurs because this isn't a situation that is > allowed by either POSIX or SUSv2/v3. Exactly what spec are you referring > to here? No specific spec, just general quality of implementation. We normally don't have non thread safe system calls even if it was in theory allowed by some specification. -Andi - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek
On Mon, 2008-01-28 at 03:58 +0100, Andi Kleen wrote: > The problem is that it's not a race in who gets to do its thing first, but a > parallel reader can actually see a corrupted value from the two independent > words on 32bit (e.g. during a 4GB). And this could actually completely > corrupt > f_pos when it happens with two racing relative seeks or read/write()s > > I would consider that a bug. I disagree. The corruption occurs because this isn't a situation that is allowed by either POSIX or SUSv2/v3. Exactly what spec are you referring to here? Trond - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek
On Monday 28 January 2008 00:08:56 Trond Myklebust wrote: > > On Sun, 2008-01-27 at 16:18 -0600, Steve French wrote: > > If two seeks overlap, can't you end up with an f_pos value that is > > different than what either thread seeked to? or if you have a seek and > > a read overlap can't you end up with the read occurring in the midst > > of an update of f_pos (which takes more than one instruction on > > various architectures), e.g. reading an f_pos, which has only the > > lower half of a 64 bit field updated? I agree that you shouldn't > > have seeks racing in parallel but I think it is preferable to get > > either the updated f_pos or the earlier f_pos not something 1/2 > > updated. > > Why? The threads are doing something inherently liable to corrupt data > anyway. If they can race over the seek, why wouldn't they race over the > read or write too? > The race in lseek() should probably be the least of your worries in this > case. The problem is that it's not a race in who gets to do its thing first, but a parallel reader can actually see a corrupted value from the two independent words on 32bit (e.g. during a 4GB). And this could actually completely corrupt f_pos when it happens with two racing relative seeks or read/write()s I would consider that a bug. Fixes would be either to always take a spinlock to update this (nasty on platforms where spinlocks are expensive like P4) or define some architecture specific way to read/write 64bit values consistently. In theory also some lazy locking seqlock like mechanism could be used, but that had the disadvantage of being theoretically starvable. -Andi - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek
On Sunday 27 January 2008 23:18:26 Steve French wrote: > If two seeks overlap, can't you end up with an f_pos value that is > different than what either thread seeked to? Yes you can on 32bit. Especially during the 4GB wrap -Andi - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek
On Sunday 27 January 2008 17:57:14 Steve French wrote: > Don't you need to a spinlock/spinunlock(i_lock) or something similar > (there isn't a spinlock in the file struct unfortunately) around the > reads and writes from f_pos in fs/read_write.c in remote_llseek with > your patch since the reads/writes from that field are not necessarily > atomic and threads could be racing in seek on the same file struct? Funny that you mention it. I actually noticed this too while working on this, but noticed that it is wrong everywhere (as in even plain sys_write/read gets it wrong). So I decided to not address it because it is already broken. I did actually send email to a few people about this, but no answer yet. I agree it's probably all broken on 32bit platforms, but I'm not sure how to best address this. When it is comprehensively addressed remote_llseek can use that new method too. -Andi - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek
On Sun, 2008-01-27 at 16:18 -0600, Steve French wrote: > If two seeks overlap, can't you end up with an f_pos value that is > different than what either thread seeked to? or if you have a seek and > a read overlap can't you end up with the read occurring in the midst > of an update of f_pos (which takes more than one instruction on > various architectures), e.g. reading an f_pos, which has only the > lower half of a 64 bit field updated? I agree that you shouldn't > have seeks racing in parallel but I think it is preferable to get > either the updated f_pos or the earlier f_pos not something 1/2 > updated. Why? The threads are doing something inherently liable to corrupt data anyway. If they can race over the seek, why wouldn't they race over the read or write too? The race in lseek() should probably be the least of your worries in this case. Trond - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek
If two seeks overlap, can't you end up with an f_pos value that is different than what either thread seeked to? or if you have a seek and a read overlap can't you end up with the read occurring in the midst of an update of f_pos (which takes more than one instruction on various architectures), e.g. reading an f_pos, which has only the lower half of a 64 bit field updated? I agree that you shouldn't have seeks racing in parallel but I think it is preferable to get either the updated f_pos or the earlier f_pos not something 1/2 updated. On Jan 27, 2008 11:56 AM, Trond Myklebust <[EMAIL PROTECTED]> wrote: > > On Sun, 2008-01-27 at 10:57 -0600, Steve French wrote: > > Don't you need to a spinlock/spinunlock(i_lock) or something similar > > (there isn't a spinlock in the file struct unfortunately) around the > > reads and writes from f_pos in fs/read_write.c in remote_llseek with > > your patch since the reads/writes from that field are not necessarily > > atomic and threads could be racing in seek on the same file struct? > > Where does is state in POSIX or SUS that we need to cater to that kind > of application? > In any case, the current behaviour of f_pos if two threads are sharing > the file struct is undefined no matter whether you spinlock or not, > since there is no special locking around sys_read() or sys_write(). > > Trond > -- Thanks, Steve - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek
On Sun, 2008-01-27 at 10:57 -0600, Steve French wrote: > Don't you need to a spinlock/spinunlock(i_lock) or something similar > (there isn't a spinlock in the file struct unfortunately) around the > reads and writes from f_pos in fs/read_write.c in remote_llseek with > your patch since the reads/writes from that field are not necessarily > atomic and threads could be racing in seek on the same file struct? Where does is state in POSIX or SUS that we need to cater to that kind of application? In any case, the current behaviour of f_pos if two threads are sharing the file struct is undefined no matter whether you spinlock or not, since there is no special locking around sys_read() or sys_write(). Trond - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek
Don't you need to a spinlock/spinunlock(i_lock) or something similar (there isn't a spinlock in the file struct unfortunately) around the reads and writes from f_pos in fs/read_write.c in remote_llseek with your patch since the reads/writes from that field are not necessarily atomic and threads could be racing in seek on the same file struct? On Jan 26, 2008 8:17 PM, Andi Kleen <[EMAIL PROTECTED]> wrote: > > - Replace remote_llseek with remote_llseek_unlocked (to force compilation > failures in all users) > - Change all users to either use remote_llseek directly or take the > BKL around. I changed the file systems who don't use the BKL > for anything (CIFS, GFS) to call it directly. NCPFS and SMBFS and NFS > take the BKL, but explicitely in their own source now. > > I moved them all over in a single patch to avoid unbisectable sections. > > Cc: [EMAIL PROTECTED] > Cc: [EMAIL PROTECTED] > Cc: [EMAIL PROTECTED] > Cc: [EMAIL PROTECTED] > > Signed-off-by: Andi Kleen <[EMAIL PROTECTED]> > > --- > fs/cifs/cifsfs.c |2 +- > fs/gfs2/ops_file.c |4 ++-- > fs/ncpfs/file.c| 12 +++- > fs/nfs/file.c |6 +- > fs/read_write.c|7 +++ > fs/smbfs/file.c| 11 ++- > include/linux/fs.h |3 ++- > 7 files changed, 34 insertions(+), 11 deletions(-) > > Index: linux/fs/cifs/cifsfs.c > === > --- linux.orig/fs/cifs/cifsfs.c > +++ linux/fs/cifs/cifsfs.c > @@ -549,7 +549,7 @@ static loff_t cifs_llseek(struct file *f > if (retval < 0) > return (loff_t)retval; > } > - return remote_llseek(file, offset, origin); > + return remote_llseek_unlocked(file, offset, origin); > } > > static struct file_system_type cifs_fs_type = { > Index: linux/fs/gfs2/ops_file.c > === > --- linux.orig/fs/gfs2/ops_file.c > +++ linux/fs/gfs2/ops_file.c > @@ -107,11 +107,11 @@ static loff_t gfs2_llseek(struct file *f > error = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED, > LM_FLAG_ANY, >&i_gh); > if (!error) { > - error = remote_llseek(file, offset, origin); > + error = remote_llseek_unlocked(file, offset, origin); > gfs2_glock_dq_uninit(&i_gh); > } > } else > - error = remote_llseek(file, offset, origin); > + error = remote_llseek_unlocked(file, offset, origin); > > return error; > } > Index: linux/fs/ncpfs/file.c > === > --- linux.orig/fs/ncpfs/file.c > +++ linux/fs/ncpfs/file.c > @@ -18,6 +18,7 @@ > #include > #include > #include > +#include > > #include > #include "ncplib_kernel.h" > @@ -281,9 +282,18 @@ static int ncp_release(struct inode *ino > return 0; > } > > +static loff_t ncp_remote_llseek(struct file *file, loff_t offset, int origin) > +{ > + loff_t ret; > + lock_kernel(); > + ret = remote_llseek_unlocked(file, offset, origin); > + unlock_kernel(); > + return ret; > +} > + > const struct file_operations ncp_file_operations = > { > - .llseek = remote_llseek, > + .llseek = ncp_remote_llseek, > .read = ncp_file_read, > .write = ncp_file_write, > .ioctl = ncp_ioctl, > Index: linux/fs/read_write.c > === > --- linux.orig/fs/read_write.c > +++ linux/fs/read_write.c > @@ -58,11 +58,10 @@ loff_t generic_file_llseek(struct file * > > EXPORT_SYMBOL(generic_file_llseek); > > -loff_t remote_llseek(struct file *file, loff_t offset, int origin) > +loff_t remote_llseek_unlocked(struct file *file, loff_t offset, int origin) > { > long long retval; > > - lock_kernel(); > switch (origin) { > case SEEK_END: > offset += i_size_read(file->f_path.dentry->d_inode); > @@ -73,15 +72,15 @@ loff_t remote_llseek(struct file *file, > retval = -EINVAL; > if (offset>=0 && > offset<=file->f_path.dentry->d_inode->i_sb->s_maxbytes) { > if (offset != file->f_pos) { > + /* AK: do we need a lock for those? */ > file->f_pos = offset; > file->f_version = 0; > } > retval = offset; > } > - unlock_kernel(); > return retval; > } > -EXPORT_SYMBOL(remote_llseek); > +EXPORT_SYMBOL(remote_llseek_unlocked); > > loff_t no_llseek(struct file *file, loff_t offset, int origin) > { > Index: linux/fs/smbfs/file.c > === > --- linux.orig/fs/smbfs/file.c > +++ linux/fs/smbfs/file.c > @@ -422,9 +422,18 @@ smb_file_per
[PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek
- Replace remote_llseek with remote_llseek_unlocked (to force compilation failures in all users) - Change all users to either use remote_llseek directly or take the BKL around. I changed the file systems who don't use the BKL for anything (CIFS, GFS) to call it directly. NCPFS and SMBFS and NFS take the BKL, but explicitely in their own source now. I moved them all over in a single patch to avoid unbisectable sections. Cc: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Signed-off-by: Andi Kleen <[EMAIL PROTECTED]> --- fs/cifs/cifsfs.c |2 +- fs/gfs2/ops_file.c |4 ++-- fs/ncpfs/file.c| 12 +++- fs/nfs/file.c |6 +- fs/read_write.c|7 +++ fs/smbfs/file.c| 11 ++- include/linux/fs.h |3 ++- 7 files changed, 34 insertions(+), 11 deletions(-) Index: linux/fs/cifs/cifsfs.c === --- linux.orig/fs/cifs/cifsfs.c +++ linux/fs/cifs/cifsfs.c @@ -549,7 +549,7 @@ static loff_t cifs_llseek(struct file *f if (retval < 0) return (loff_t)retval; } - return remote_llseek(file, offset, origin); + return remote_llseek_unlocked(file, offset, origin); } static struct file_system_type cifs_fs_type = { Index: linux/fs/gfs2/ops_file.c === --- linux.orig/fs/gfs2/ops_file.c +++ linux/fs/gfs2/ops_file.c @@ -107,11 +107,11 @@ static loff_t gfs2_llseek(struct file *f error = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED, LM_FLAG_ANY, &i_gh); if (!error) { - error = remote_llseek(file, offset, origin); + error = remote_llseek_unlocked(file, offset, origin); gfs2_glock_dq_uninit(&i_gh); } } else - error = remote_llseek(file, offset, origin); + error = remote_llseek_unlocked(file, offset, origin); return error; } Index: linux/fs/ncpfs/file.c === --- linux.orig/fs/ncpfs/file.c +++ linux/fs/ncpfs/file.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include "ncplib_kernel.h" @@ -281,9 +282,18 @@ static int ncp_release(struct inode *ino return 0; } +static loff_t ncp_remote_llseek(struct file *file, loff_t offset, int origin) +{ + loff_t ret; + lock_kernel(); + ret = remote_llseek_unlocked(file, offset, origin); + unlock_kernel(); + return ret; +} + const struct file_operations ncp_file_operations = { - .llseek = remote_llseek, + .llseek = ncp_remote_llseek, .read = ncp_file_read, .write = ncp_file_write, .ioctl = ncp_ioctl, Index: linux/fs/read_write.c === --- linux.orig/fs/read_write.c +++ linux/fs/read_write.c @@ -58,11 +58,10 @@ loff_t generic_file_llseek(struct file * EXPORT_SYMBOL(generic_file_llseek); -loff_t remote_llseek(struct file *file, loff_t offset, int origin) +loff_t remote_llseek_unlocked(struct file *file, loff_t offset, int origin) { long long retval; - lock_kernel(); switch (origin) { case SEEK_END: offset += i_size_read(file->f_path.dentry->d_inode); @@ -73,15 +72,15 @@ loff_t remote_llseek(struct file *file, retval = -EINVAL; if (offset>=0 && offset<=file->f_path.dentry->d_inode->i_sb->s_maxbytes) { if (offset != file->f_pos) { + /* AK: do we need a lock for those? */ file->f_pos = offset; file->f_version = 0; } retval = offset; } - unlock_kernel(); return retval; } -EXPORT_SYMBOL(remote_llseek); +EXPORT_SYMBOL(remote_llseek_unlocked); loff_t no_llseek(struct file *file, loff_t offset, int origin) { Index: linux/fs/smbfs/file.c === --- linux.orig/fs/smbfs/file.c +++ linux/fs/smbfs/file.c @@ -422,9 +422,18 @@ smb_file_permission(struct inode *inode, return error; } +static loff_t smb_remote_llseek(struct file *file, loff_t offset, int origin) +{ + loff_t ret; + lock_kernel(); + ret = remote_llseek_unlocked(file, offset, origin); + unlock_kernel(); + return ret; +} + const struct file_operations smb_file_operations = { - .llseek = remote_llseek, + .llseek = smb_remote_llseek, .read = do_sync_read, .aio_read = smb_file_aio_read, .write = do_sync_write, Index: linux/include/linux/fs.h