Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek

2008-01-28 Thread Dave Kleikamp

On Mon, 2008-01-28 at 12:33 -0600, Steve French wrote:
> On Jan 28, 2008 2:17 AM, Andi Kleen <[EMAIL PROTECTED]> wrote:
> > > I completely agree.  If one thread writes A and another writes B then the
> > > kernel should record either A or B, not ((A & 0x) | (B &
> > > 0x))
> >
> > The problem is pretty nasty unfortunately. To solve it properly I think
> > the file_operations->read/write prototypes would need to be changed
> > because otherwise it is not possible to do atomic relative updates
> > of f_pos. Right now the actual update is burrowed deeply in the low level
> > read/write implementation. But that would be a huge impact all over
> > the tree :/
> 
> If there were a wrapper around reads and writes of f_pos as there is
> for i_size e.g. it would hit a lot of code, but not as many as I had
> originally thought.  the most important ones are in the vfs itself, where
> there are only 59 uses of the field (not all need to be changed).   ext3
> has fewer (25), and cifs only 12 uses.

Most of the uses in ext3 and cifs deal with a directory's f_pos in
readdir, which is protected by i_mutex, so I don't think we need to
worry about them at all.
-- 
David Kleikamp
IBM Linux Technology Center

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek

2008-01-28 Thread Steve French
On Jan 28, 2008 2:17 AM, Andi Kleen <[EMAIL PROTECTED]> wrote:
> > I completely agree.  If one thread writes A and another writes B then the
> > kernel should record either A or B, not ((A & 0x) | (B &
> > 0x))
>
> The problem is pretty nasty unfortunately. To solve it properly I think
> the file_operations->read/write prototypes would need to be changed
> because otherwise it is not possible to do atomic relative updates
> of f_pos. Right now the actual update is burrowed deeply in the low level
> read/write implementation. But that would be a huge impact all over
> the tree :/

If there were a wrapper around reads and writes of f_pos as there is
for i_size e.g. it would hit a lot of code, but not as many as I had
originally thought.  the most important ones are in the vfs itself, where
there are only 59 uses of the field (not all need to be changed).   ext3
has fewer (25), and cifs only 12 uses.


-- 
Thanks,

Steve
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek

2008-01-28 Thread Diego Calleja
El Mon, 28 Jan 2008 15:10:34 +0100, Andi Kleen <[EMAIL PROTECTED]> escribió:

> So you get overlapping reads. Probably not good.

This was discussed in the past i think ->

http://lkml.org/lkml/2006/4/13/124
http://lkml.org/lkml/2006/4/13/130
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek

2008-01-28 Thread Alan Cox
On Mon, 28 Jan 2008 15:10:34 +0100
Andi Kleen <[EMAIL PROTECTED]> wrote:

> On Monday 28 January 2008 14:38:57 Alan Cox wrote:
> > > Also worse really fixing it would be a major change to the VFS 
> > > because of the way ->read/write are defined :/
> > 
> > I don't see a problem there. ->read and ->write update the passed pointer
> > which is not the real f_pos anyway. Just the copies need fixing. 
> 
> They are effectually doing a decoupled read/modify/write cycle. e.g.:
> 
> A   B
> 
> read fpos   
> 
> read fpos
> 
> fpos += A   fpos += B
> write fpos
> 
> 
> write fpos
> 
> So you get overlapping reads. Probably not good.

No unix system I'm aware of cares about the read/write positioning during
parallel simultaneous reads or writes, with the exception of O_APPEND
which is strictly defined. The problem case is getting fpos != either
valid value.

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek

2008-01-28 Thread Andi Kleen
On Monday 28 January 2008 14:38:57 Alan Cox wrote:
> > Also worse really fixing it would be a major change to the VFS 
> > because of the way ->read/write are defined :/
> 
> I don't see a problem there. ->read and ->write update the passed pointer
> which is not the real f_pos anyway. Just the copies need fixing. 

They are effectually doing a decoupled read/modify/write cycle. e.g.:

A   B

read fpos   

read fpos

fpos += A   fpos += B
write fpos


write fpos

So you get overlapping reads. Probably not good.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek

2008-01-28 Thread Alan Cox
> Also worse really fixing it would be a major change to the VFS 
> because of the way ->read/write are defined :/

I don't see a problem there. ->read and ->write update the passed pointer
which is not the real f_pos anyway. Just the copies need fixing.

Alan
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek

2008-01-28 Thread Andi Kleen
On Monday 28 January 2008 13:56:05 Alan Cox wrote:
> > > No specific spec, just general quality of implementation.
> > 
> > I completely agree.  If one thread writes A and another writes B then the
> > kernel should record either A or B, not ((A & 0x) | (B &
> > 0x))
> 
> Agree entirely: the spec doesn't allow for random scribbling in the wrong
> place. It doesn't cover which goes first or who "wins" the race but
> provides pwrite/pread for that situation. Writing somewhere unrelated is
> definitely not to spec 

Actually it would probably -- i guess it's undefined and in undefined
country such things can happen.

Also to be fair I think it's only a problem for the 4GB wrapping case
which is presumably rare (otherwise we would have heard about it)

Also worse really fixing it would be a major change to the VFS 
because of the way ->read/write are defined :/

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek

2008-01-28 Thread Alan Cox
> > No specific spec, just general quality of implementation.
> 
> I completely agree.  If one thread writes A and another writes B then the
> kernel should record either A or B, not ((A & 0x) | (B &
> 0x))

Agree entirely: the spec doesn't allow for random scribbling in the wrong
place. It doesn't cover which goes first or who "wins" the race but
provides pwrite/pread for that situation. Writing somewhere unrelated is
definitely not to spec and not good.
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek

2008-01-28 Thread Bodo Eggert
Trond Myklebust <[EMAIL PROTECTED]> wrote:
> On Mon, 2008-01-28 at 05:38 +0100, Andi Kleen wrote:
>> On Monday 28 January 2008 05:13:09 Trond Myklebust wrote:
>> > On Mon, 2008-01-28 at 03:58 +0100, Andi Kleen wrote:

>> > > The problem is that it's not a race in who gets to do its thing first,
>> > > but a parallel reader can actually see a corrupted value from the two
>> > > independent words on 32bit (e.g. during a 4GB). And this could actually
>> > > completely corrupt f_pos when it happens with two racing relative seeks
>> > > or read/write()s
>> > > 
>> > > I would consider that a bug.
>> > 
>> > I disagree. The corruption occurs because this isn't a situation that is
>> > allowed by either POSIX or SUSv2/v3. Exactly what spec are you referring
>> > to here?
>> 
>> No specific spec, just general quality of implementation. We normally don't
>> have non thread safe system calls even if it was in theory allowed by some
>> specification.
> 
> We've had the existing implementation for quite some time. The arguments
> against changing it have been the same all along: if your application
> wants to share files between threads, the portability argument implies
> that you should either use pread/pwrite or use a mutex or some other
> form of synchronisation primitive in order to ensure that
> lseek()/read()/write() do not overlap.

Does anything in the kernel depend on f_pos being valid?
E.g. is it possible to read beyond the EOF using this race, or to have files
larger than the ulimit?

If not, update the manpage and be done. ¢¢

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek

2008-01-28 Thread Andi Kleen

> I completely agree.  If one thread writes A and another writes B then the
> kernel should record either A or B, not ((A & 0x) | (B &
> 0x))

The problem is pretty nasty unfortunately. To solve it properly I think
the file_operations->read/write prototypes would need to be changed
because otherwise it is not possible to do atomic relative updates
of f_pos. Right now the actual update is burrowed deeply in the low level 
read/write implementation. But that would be a huge impact all over
the tree :/

Or maybe define a new read/write64 and keep the default as 32bit only-- i 
suppose most users don't really need 64bit. Still would be a nasty API 
change.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek

2008-01-27 Thread Andrew Morton
On Mon, 28 Jan 2008 05:38:25 +0100 Andi Kleen <[EMAIL PROTECTED]> wrote:

> On Monday 28 January 2008 05:13:09 Trond Myklebust wrote:
> > 
> > On Mon, 2008-01-28 at 03:58 +0100, Andi Kleen wrote:
> > > The problem is that it's not a race in who gets to do its thing first, 
> > > but a 
> > > parallel reader can actually see a corrupted value from the two 
> > > independent 
> > > words on 32bit (e.g. during a 4GB). And this could actually completely 
> > > corrupt 
> > > f_pos when it happens with two racing relative seeks or read/write()s
> > > 
> > > I would consider that a bug.
> > 
> > I disagree. The corruption occurs because this isn't a situation that is
> > allowed by either POSIX or SUSv2/v3. Exactly what spec are you referring
> > to here?
> 
> No specific spec, just general quality of implementation.

I completely agree.  If one thread writes A and another writes B then the
kernel should record either A or B, not ((A & 0x) | (B &
0x))
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek

2008-01-27 Thread Trond Myklebust

On Mon, 2008-01-28 at 05:38 +0100, Andi Kleen wrote:
> On Monday 28 January 2008 05:13:09 Trond Myklebust wrote:
> > 
> > On Mon, 2008-01-28 at 03:58 +0100, Andi Kleen wrote:
> > > The problem is that it's not a race in who gets to do its thing first, 
> > > but a 
> > > parallel reader can actually see a corrupted value from the two 
> > > independent 
> > > words on 32bit (e.g. during a 4GB). And this could actually completely 
> > > corrupt 
> > > f_pos when it happens with two racing relative seeks or read/write()s
> > > 
> > > I would consider that a bug.
> > 
> > I disagree. The corruption occurs because this isn't a situation that is
> > allowed by either POSIX or SUSv2/v3. Exactly what spec are you referring
> > to here?
> 
> No specific spec, just general quality of implementation. We normally don't 
> have
> non thread safe system calls even if it was in theory allowed by some 
> specification.

We've had the existing implementation for quite some time. The arguments
against changing it have been the same all along: if your application
wants to share files between threads, the portability argument implies
that you should either use pread/pwrite or use a mutex or some other
form of synchronisation primitive in order to ensure that
lseek()/read()/write() do not overlap.

Cheers
  Trond
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek

2008-01-27 Thread Andi Kleen
On Monday 28 January 2008 05:13:09 Trond Myklebust wrote:
> 
> On Mon, 2008-01-28 at 03:58 +0100, Andi Kleen wrote:
> > The problem is that it's not a race in who gets to do its thing first, but 
> > a 
> > parallel reader can actually see a corrupted value from the two independent 
> > words on 32bit (e.g. during a 4GB). And this could actually completely 
> > corrupt 
> > f_pos when it happens with two racing relative seeks or read/write()s
> > 
> > I would consider that a bug.
> 
> I disagree. The corruption occurs because this isn't a situation that is
> allowed by either POSIX or SUSv2/v3. Exactly what spec are you referring
> to here?

No specific spec, just general quality of implementation. We normally don't have
non thread safe system calls even if it was in theory allowed by some 
specification.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek

2008-01-27 Thread Trond Myklebust

On Mon, 2008-01-28 at 03:58 +0100, Andi Kleen wrote:
> The problem is that it's not a race in who gets to do its thing first, but a 
> parallel reader can actually see a corrupted value from the two independent 
> words on 32bit (e.g. during a 4GB). And this could actually completely 
> corrupt 
> f_pos when it happens with two racing relative seeks or read/write()s
> 
> I would consider that a bug.

I disagree. The corruption occurs because this isn't a situation that is
allowed by either POSIX or SUSv2/v3. Exactly what spec are you referring
to here?

Trond
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek

2008-01-27 Thread Andi Kleen
On Monday 28 January 2008 00:08:56 Trond Myklebust wrote:
> 
> On Sun, 2008-01-27 at 16:18 -0600, Steve French wrote:
> > If two seeks overlap, can't you end up with an f_pos value that is
> > different than what either thread seeked to? or if you have a seek and
> > a read overlap can't you end up with the read occurring in the midst
> > of an update of f_pos (which takes more than one instruction on
> > various architectures), e.g. reading an f_pos, which has only the
> > lower half of a 64 bit field updated?   I agree that you shouldn't
> > have seeks racing in parallel but I think it is preferable to get
> > either the updated f_pos or the earlier f_pos not something 1/2
> > updated.
> 
> Why? The threads are doing something inherently liable to corrupt data
> anyway. If they can race over the seek, why wouldn't they race over the
> read or write too?
> The race in lseek() should probably be the least of your worries in this
> case.

The problem is that it's not a race in who gets to do its thing first, but a 
parallel reader can actually see a corrupted value from the two independent 
words on 32bit (e.g. during a 4GB). And this could actually completely corrupt 
f_pos when it happens with two racing relative seeks or read/write()s

I would consider that a bug.

Fixes would be either to always take a spinlock to update this (nasty on 
platforms where spinlocks are expensive like P4) or define some architecture
specific way to read/write 64bit values consistently. In theory also some
lazy locking seqlock like mechanism could be used, but that had the disadvantage
of being theoretically starvable.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek

2008-01-27 Thread Andi Kleen
On Sunday 27 January 2008 23:18:26 Steve French wrote:
> If two seeks overlap, can't you end up with an f_pos value that is
> different than what either thread seeked to?

Yes you can on 32bit. Especially during the 4GB wrap

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek

2008-01-27 Thread Andi Kleen
On Sunday 27 January 2008 17:57:14 Steve French wrote:
> Don't you need to a spinlock/spinunlock(i_lock) or something similar
> (there isn't a spinlock in the file struct unfortunately) around the
> reads and writes from f_pos in fs/read_write.c in remote_llseek with
> your patch since the reads/writes from that field are not necessarily
> atomic and threads could be racing in seek on the same file struct?

Funny that you mention it. I actually noticed this too while working on this, 
but noticed that it is wrong everywhere (as in even plain sys_write/read gets 
it wrong). So I decided to not address it because it is already
broken. 

I did actually send email to a few people about this, but no answer
yet.

I agree it's probably all broken on 32bit platforms, but I'm not
sure how to best address this. When it is comprehensively addressed 
remote_llseek 
can use that new method too.

-Andi

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek

2008-01-27 Thread Trond Myklebust

On Sun, 2008-01-27 at 16:18 -0600, Steve French wrote:
> If two seeks overlap, can't you end up with an f_pos value that is
> different than what either thread seeked to? or if you have a seek and
> a read overlap can't you end up with the read occurring in the midst
> of an update of f_pos (which takes more than one instruction on
> various architectures), e.g. reading an f_pos, which has only the
> lower half of a 64 bit field updated?   I agree that you shouldn't
> have seeks racing in parallel but I think it is preferable to get
> either the updated f_pos or the earlier f_pos not something 1/2
> updated.

Why? The threads are doing something inherently liable to corrupt data
anyway. If they can race over the seek, why wouldn't they race over the
read or write too?
The race in lseek() should probably be the least of your worries in this
case.

Trond
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek

2008-01-27 Thread Steve French
If two seeks overlap, can't you end up with an f_pos value that is
different than what either thread seeked to? or if you have a seek and
a read overlap can't you end up with the read occurring in the midst
of an update of f_pos (which takes more than one instruction on
various architectures), e.g. reading an f_pos, which has only the
lower half of a 64 bit field updated?   I agree that you shouldn't
have seeks racing in parallel but I think it is preferable to get
either the updated f_pos or the earlier f_pos not something 1/2
updated.

On Jan 27, 2008 11:56 AM, Trond Myklebust <[EMAIL PROTECTED]> wrote:
>
> On Sun, 2008-01-27 at 10:57 -0600, Steve French wrote:
> > Don't you need to a spinlock/spinunlock(i_lock) or something similar
> > (there isn't a spinlock in the file struct unfortunately) around the
> > reads and writes from f_pos in fs/read_write.c in remote_llseek with
> > your patch since the reads/writes from that field are not necessarily
> > atomic and threads could be racing in seek on the same file struct?
>
> Where does is state in POSIX or SUS that we need to cater to that kind
> of application?
> In any case, the current behaviour of f_pos if two threads are sharing
> the file struct is undefined no matter whether you spinlock or not,
> since there is no special locking around sys_read() or sys_write().
>
> Trond
>



-- 
Thanks,

Steve
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek

2008-01-27 Thread Trond Myklebust

On Sun, 2008-01-27 at 10:57 -0600, Steve French wrote:
> Don't you need to a spinlock/spinunlock(i_lock) or something similar
> (there isn't a spinlock in the file struct unfortunately) around the
> reads and writes from f_pos in fs/read_write.c in remote_llseek with
> your patch since the reads/writes from that field are not necessarily
> atomic and threads could be racing in seek on the same file struct?

Where does is state in POSIX or SUS that we need to cater to that kind
of application?
In any case, the current behaviour of f_pos if two threads are sharing
the file struct is undefined no matter whether you spinlock or not,
since there is no special locking around sys_read() or sys_write().

Trond
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek

2008-01-27 Thread Steve French
Don't you need to a spinlock/spinunlock(i_lock) or something similar
(there isn't a spinlock in the file struct unfortunately) around the
reads and writes from f_pos in fs/read_write.c in remote_llseek with
your patch since the reads/writes from that field are not necessarily
atomic and threads could be racing in seek on the same file struct?

On Jan 26, 2008 8:17 PM, Andi Kleen <[EMAIL PROTECTED]> wrote:
>
> - Replace remote_llseek with remote_llseek_unlocked (to force compilation
> failures in all users)
> - Change all users to either use remote_llseek directly or take the
> BKL around. I changed the file systems who don't use the BKL
> for anything (CIFS, GFS) to call it directly. NCPFS and SMBFS and NFS
> take the BKL, but explicitely in their own source now.
>
> I moved them all over in a single patch to avoid unbisectable sections.
>
> Cc: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
>
> Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>
>
> ---
>  fs/cifs/cifsfs.c   |2 +-
>  fs/gfs2/ops_file.c |4 ++--
>  fs/ncpfs/file.c|   12 +++-
>  fs/nfs/file.c  |6 +-
>  fs/read_write.c|7 +++
>  fs/smbfs/file.c|   11 ++-
>  include/linux/fs.h |3 ++-
>  7 files changed, 34 insertions(+), 11 deletions(-)
>
> Index: linux/fs/cifs/cifsfs.c
> ===
> --- linux.orig/fs/cifs/cifsfs.c
> +++ linux/fs/cifs/cifsfs.c
> @@ -549,7 +549,7 @@ static loff_t cifs_llseek(struct file *f
> if (retval < 0)
> return (loff_t)retval;
> }
> -   return remote_llseek(file, offset, origin);
> +   return remote_llseek_unlocked(file, offset, origin);
>  }
>
>  static struct file_system_type cifs_fs_type = {
> Index: linux/fs/gfs2/ops_file.c
> ===
> --- linux.orig/fs/gfs2/ops_file.c
> +++ linux/fs/gfs2/ops_file.c
> @@ -107,11 +107,11 @@ static loff_t gfs2_llseek(struct file *f
> error = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED, 
> LM_FLAG_ANY,
>&i_gh);
> if (!error) {
> -   error = remote_llseek(file, offset, origin);
> +   error = remote_llseek_unlocked(file, offset, origin);
> gfs2_glock_dq_uninit(&i_gh);
> }
> } else
> -   error = remote_llseek(file, offset, origin);
> +   error = remote_llseek_unlocked(file, offset, origin);
>
> return error;
>  }
> Index: linux/fs/ncpfs/file.c
> ===
> --- linux.orig/fs/ncpfs/file.c
> +++ linux/fs/ncpfs/file.c
> @@ -18,6 +18,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #include 
>  #include "ncplib_kernel.h"
> @@ -281,9 +282,18 @@ static int ncp_release(struct inode *ino
> return 0;
>  }
>
> +static loff_t ncp_remote_llseek(struct file *file, loff_t offset, int origin)
> +{
> +   loff_t ret;
> +   lock_kernel();
> +   ret = remote_llseek_unlocked(file, offset, origin);
> +   unlock_kernel();
> +   return ret;
> +}
> +
>  const struct file_operations ncp_file_operations =
>  {
> -   .llseek = remote_llseek,
> +   .llseek = ncp_remote_llseek,
> .read   = ncp_file_read,
> .write  = ncp_file_write,
> .ioctl  = ncp_ioctl,
> Index: linux/fs/read_write.c
> ===
> --- linux.orig/fs/read_write.c
> +++ linux/fs/read_write.c
> @@ -58,11 +58,10 @@ loff_t generic_file_llseek(struct file *
>
>  EXPORT_SYMBOL(generic_file_llseek);
>
> -loff_t remote_llseek(struct file *file, loff_t offset, int origin)
> +loff_t remote_llseek_unlocked(struct file *file, loff_t offset, int origin)
>  {
> long long retval;
>
> -   lock_kernel();
> switch (origin) {
> case SEEK_END:
> offset += i_size_read(file->f_path.dentry->d_inode);
> @@ -73,15 +72,15 @@ loff_t remote_llseek(struct file *file,
> retval = -EINVAL;
> if (offset>=0 && 
> offset<=file->f_path.dentry->d_inode->i_sb->s_maxbytes) {
> if (offset != file->f_pos) {
> +   /* AK: do we need a lock for those? */
> file->f_pos = offset;
> file->f_version = 0;
> }
> retval = offset;
> }
> -   unlock_kernel();
> return retval;
>  }
> -EXPORT_SYMBOL(remote_llseek);
> +EXPORT_SYMBOL(remote_llseek_unlocked);
>
>  loff_t no_llseek(struct file *file, loff_t offset, int origin)
>  {
> Index: linux/fs/smbfs/file.c
> ===
> --- linux.orig/fs/smbfs/file.c
> +++ linux/fs/smbfs/file.c
> @@ -422,9 +422,18 @@ smb_file_per

[PATCH] [8/18] BKL-removal: Remove BKL from remote_llseek

2008-01-26 Thread Andi Kleen

- Replace remote_llseek with remote_llseek_unlocked (to force compilation 
failures in all users)
- Change all users to either use remote_llseek directly or take the
BKL around. I changed the file systems who don't use the BKL
for anything (CIFS, GFS) to call it directly. NCPFS and SMBFS and NFS
take the BKL, but explicitely in their own source now.

I moved them all over in a single patch to avoid unbisectable sections.

Cc: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]

Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---
 fs/cifs/cifsfs.c   |2 +-
 fs/gfs2/ops_file.c |4 ++--
 fs/ncpfs/file.c|   12 +++-
 fs/nfs/file.c  |6 +-
 fs/read_write.c|7 +++
 fs/smbfs/file.c|   11 ++-
 include/linux/fs.h |3 ++-
 7 files changed, 34 insertions(+), 11 deletions(-)

Index: linux/fs/cifs/cifsfs.c
===
--- linux.orig/fs/cifs/cifsfs.c
+++ linux/fs/cifs/cifsfs.c
@@ -549,7 +549,7 @@ static loff_t cifs_llseek(struct file *f
if (retval < 0)
return (loff_t)retval;
}
-   return remote_llseek(file, offset, origin);
+   return remote_llseek_unlocked(file, offset, origin);
 }
 
 static struct file_system_type cifs_fs_type = {
Index: linux/fs/gfs2/ops_file.c
===
--- linux.orig/fs/gfs2/ops_file.c
+++ linux/fs/gfs2/ops_file.c
@@ -107,11 +107,11 @@ static loff_t gfs2_llseek(struct file *f
error = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED, LM_FLAG_ANY,
   &i_gh);
if (!error) {
-   error = remote_llseek(file, offset, origin);
+   error = remote_llseek_unlocked(file, offset, origin);
gfs2_glock_dq_uninit(&i_gh);
}
} else
-   error = remote_llseek(file, offset, origin);
+   error = remote_llseek_unlocked(file, offset, origin);
 
return error;
 }
Index: linux/fs/ncpfs/file.c
===
--- linux.orig/fs/ncpfs/file.c
+++ linux/fs/ncpfs/file.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include "ncplib_kernel.h"
@@ -281,9 +282,18 @@ static int ncp_release(struct inode *ino
return 0;
 }
 
+static loff_t ncp_remote_llseek(struct file *file, loff_t offset, int origin)
+{
+   loff_t ret;
+   lock_kernel();
+   ret = remote_llseek_unlocked(file, offset, origin);
+   unlock_kernel();
+   return ret;
+}
+
 const struct file_operations ncp_file_operations =
 {
-   .llseek = remote_llseek,
+   .llseek = ncp_remote_llseek,
.read   = ncp_file_read,
.write  = ncp_file_write,
.ioctl  = ncp_ioctl,
Index: linux/fs/read_write.c
===
--- linux.orig/fs/read_write.c
+++ linux/fs/read_write.c
@@ -58,11 +58,10 @@ loff_t generic_file_llseek(struct file *
 
 EXPORT_SYMBOL(generic_file_llseek);
 
-loff_t remote_llseek(struct file *file, loff_t offset, int origin)
+loff_t remote_llseek_unlocked(struct file *file, loff_t offset, int origin)
 {
long long retval;
 
-   lock_kernel();
switch (origin) {
case SEEK_END:
offset += i_size_read(file->f_path.dentry->d_inode);
@@ -73,15 +72,15 @@ loff_t remote_llseek(struct file *file, 
retval = -EINVAL;
if (offset>=0 && 
offset<=file->f_path.dentry->d_inode->i_sb->s_maxbytes) {
if (offset != file->f_pos) {
+   /* AK: do we need a lock for those? */
file->f_pos = offset;
file->f_version = 0;
}
retval = offset;
}
-   unlock_kernel();
return retval;
 }
-EXPORT_SYMBOL(remote_llseek);
+EXPORT_SYMBOL(remote_llseek_unlocked);
 
 loff_t no_llseek(struct file *file, loff_t offset, int origin)
 {
Index: linux/fs/smbfs/file.c
===
--- linux.orig/fs/smbfs/file.c
+++ linux/fs/smbfs/file.c
@@ -422,9 +422,18 @@ smb_file_permission(struct inode *inode,
return error;
 }
 
+static loff_t smb_remote_llseek(struct file *file, loff_t offset, int origin)
+{
+   loff_t ret;
+   lock_kernel();
+   ret = remote_llseek_unlocked(file, offset, origin);
+   unlock_kernel();
+   return ret;
+}
+
 const struct file_operations smb_file_operations =
 {
-   .llseek = remote_llseek,
+   .llseek = smb_remote_llseek,
.read   = do_sync_read,
.aio_read   = smb_file_aio_read,
.write  = do_sync_write,
Index: linux/include/linux/fs.h