Re: [fuse-devel] [PATCH 01/25] VFS: move attr_kill logic from notify_change into helper function

2007-08-07 Thread Miklos Szeredi
There's another way to deal with this in NFS and fuse, without having
to change the API:

 - remove suid/sgid bits from i_mode, when refreshing the inode attributes
 - store the removed bits (or the original mode)  in the fs' inode strucure
 - in ->getattr() restore the original mode into the returned stat

This way the VFS believes, that the inode does not in fact have the
suid/sgid bits and so won't call ->setattr() unnecessarily.

I've tested a similar change in fuse for working around unneeded check
for the directory sticky bit.

Yes, this is cheating the interface, but there's a deeper level where
it makes sense: the VFS should not be checking _any_ inode attribute
besides the file type, since they can change at any moment, and the
VFS might be using stale data without having first properly
revalidated it.

Miklos
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [fuse-devel] [PATCH 01/25] VFS: move attr_kill logic from notify_change into helper function

2007-08-07 Thread Miklos Szeredi
> There's another way to deal with this in NFS and fuse, without having
> to change the API:
> 
>  - remove suid/sgid bits from i_mode, when refreshing the inode attributes
>  - store the removed bits (or the original mode)  in the fs' inode strucure
>  - in ->getattr() restore the original mode into the returned stat
> 
> This way the VFS believes, that the inode does not in fact have the
> suid/sgid bits and so won't call ->setattr() unnecessarily.

Of course this would break exec().  Oh, well.

Miklos
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [fuse-devel] [PATCH 01/25] VFS: move attr_kill logic from notify_change into helper function

2007-08-07 Thread Jeff Layton
On Tue, 07 Aug 2007 08:00:40 +0200
Miklos Szeredi <[EMAIL PROTECTED]> wrote:

> (cutting out lists from CC)
> 
> > > > > Your patch is changing the API in a very unsafe way, since there will
> > > > > be no error or warning on an unconverted fs.  And that could lead to
> > > > > security holes.
> > > > > 
> > > > > If we would rename the setattr method to setattr_new as well as
> > > > > changing it's behavior, that would be fine.  But I guess we do not
> > > > > want to do that.
> > > > 
> > > > Which "unconverted fses"? If we're talking out of tree stuff, then too
> > > > bad: it is _their_ responsibility to keep up with kernel changes.
> > > 
> > > It is usually a good idea to not change the semantics of an API in a
> > > backward incompatible way without changing the syntax as well.
> > 
> > We're taking two setattr flags ATTR_KILL_SGID, and ATTR_KILL_SUID which
> > have existed for several years in the VFS, and making them visible to
> > the filesystems. Out-of-tree filesystems that care can check for them in
> > a completely backward compatible way: you don't even need to add a
> > #define.
> 
> Making flags visible is not the problem.
> 
> Making another flag invisible (ATTR_MODE) at the same time _is_.
> 
> > > This is true regardless of whether we care about out-of-tree code or
> > > not (and we should care to some degree).  And especially true if the
> > > change in question is security sensitive.
> > 
> > It is not true "regardless": the in-tree code is being converted.
> > Out-of-tree code is the only "problem" here, and their only problem is
> > that they may have to add support for the new flags if they also support
> > suid/sgid mode bits.
> 
> Yes.  And there would be no problem with that, as long as it would be
> breaking the API in a visible way.  It does not do that and that is
> unsafe.
> 
> The other thing with this change is, that it's generally a good idea
> to let the VFS do as much as possible, because then filesystem writers
> won't get it wrong.
> 
> In this case the suid/sgid check needs to be omitted for _very_ few
> filesystems (NFS and fuse).  So it makes sense to leave the check
> outside filesystem code, and move it inside only when necessary.
> 

On the other hand, the filesystem writers here are declaring their own
setattr operation. Is it unreasonable for them to take responsibility
for handling this too? 

There are other in-tree filesystems that likely need to have this check
omitted (other networked filesystems in particular), but this
patchset tries to make this change transparent for now. Those authors
can go back and fix this up later.

As Trond said, in-tree filesystems will be converted so they won't
be an issue. The only danger is someone who is running unconverted
out-of-tree filesystem code on a kernel with this patch. Is that enough
of an issue to warrant us taking extra steps to deal with it?

Another alternative might be to rename notify_change(). I don't really
like gratuitously breaking the API, though, and that function is
referenced in a lot of documentation. Changing it might be confusing...

-- 
Jeff Layton <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [fuse-devel] [PATCH 01/25] VFS: move attr_kill logic from notify_change into helper function

2007-08-07 Thread Miklos Szeredi
> On the other hand, the filesystem writers here are declaring their own
> setattr operation. Is it unreasonable for them to take responsibility
> for handling this too? 

We have about half of all the in-tree filesystems declaring
->setattr(), and out of those only two that we know would use this.
Others haven't commented, which proably means, they just don't care
about this issue.

And even if most of them would make use of this feature, a inode flag
or filesystem flag for a smooth and backward compatible migration is
just so much better than risking to break filesystems.

And yes, I'm thinking about in-tree filesystems as well.  I'm sure you
did a thorough audit of all filesystems, but we are all human and can
make mistakes.  It is _always_ safer to not change an API which could
cause silent breakage.  And IMO, that's far more important than the
beauty of an interface (and ->setattr() is not beautiful with or
without an inode flag).

> Another alternative might be to rename notify_change(). I don't really
> like gratuitously breaking the API, though, and that function is
> referenced in a lot of documentation. Changing it might be confusing...

Oh no.  I'd like less breakage not more.

Miklos
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Distributed storage.

2007-08-07 Thread Jens Axboe
On Sun, Aug 05 2007, Daniel Phillips wrote:
> A simple way to solve the stable accounting field issue is to add a new 
> pointer to struct bio that is owned by the top level submitter 
> (normally generic_make_request but not always) and is not affected by 
> any recursive resubmission.  Then getting rid of that field later 
> becomes somebody's summer project, which is not all that urgent because 
> struct bio is already bloated up with a bunch of dubious fields and is 
> a transient structure anyway.

Thanks for your insights. Care to detail what bloat and dubious fields
struct bio has?

And we don't add temporary fields out of laziness, hoping that "someone"
will later kill it again and rewrite it in a nicer fashion. Hint: that
never happens, bloat sticks.

-- 
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Distributed storage.

2007-08-07 Thread Daniel Phillips
On Tuesday 07 August 2007 05:05, Jens Axboe wrote:
> On Sun, Aug 05 2007, Daniel Phillips wrote:
> > A simple way to solve the stable accounting field issue is to add a
> > new pointer to struct bio that is owned by the top level submitter
> > (normally generic_make_request but not always) and is not affected
> > by any recursive resubmission.  Then getting rid of that field
> > later becomes somebody's summer project, which is not all that
> > urgent because struct bio is already bloated up with a bunch of
> > dubious fields and is a transient structure anyway.
>
> Thanks for your insights. Care to detail what bloat and dubious
> fields struct bio has?

First obvious one I see is bi_rw separate from bi_flags.  Front_size and 
back_size smell dubious.  Is max_vecs really necessary?  You could 
reasonably assume bi_vcnt rounded up to a power of two and bury the 
details of making that work behind wrapper functions to change the 
number of bvecs, if anybody actually needs that.  Bi_endio and 
bi_destructor could be combined.  I don't see a lot of users of bi_idx, 
that looks like a soft target.  See what happened to struct page when a 
couple of folks got serious about attacking it, some really deep hacks 
were done to pare off a few bytes here and there.  But struct bio as a 
space waster is not nearly in the same ballpark.

It would be interesting to see if bi_bdev could be made read only.  
Generally, each stage in the block device stack knows what the next 
stage is going to be, so why do we have to write that in the bio?  For 
error reporting from interrupt context?  Anyway, if Evgeniy wants to do 
the patch, I will happily unload the task of convincing you that random 
fields are/are not needed in struct bio :-)

Regards,

Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] cramfs: Error message about endianess

2007-08-07 Thread Andi Drebes
The README file in the cramfs subdirectory says:

"All data is currently in host-endian format; neither mkcramfs nor the
kernel ever do swabbing."

If somebody tries to mount a cramfs with the wrong endianess,
cramfs only complains about a wrong magic but doesn't inform
the user that only the endianess isn't right.

The following patch adds an error message to the cramfs sources.
If a user tries to mount a cramfs with the wrong endianess using the
patched sources, cramfs will display the message "cramfs: wrong
endianess".

I'm currently not subscribed to the list, so please CC me for any comments.

Tested on an i386 box. Diffed against Linus' git-tree.

Signed-off-by: Andi Drebes <[EMAIL PROTECTED]>
---
diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
index 3d194a2..34c9b91 100644
--- a/fs/cramfs/inode.c
+++ b/fs/cramfs/inode.c
@@ -258,12 +258,21 @@ static int cramfs_fill_super(struct super_block *sb, void 
*data, int silent)
 
/* Do sanity checks on the superblock */
if (super.magic != CRAMFS_MAGIC) {
+   /* check for wrong endianess */
+   if(super.magic == CRAMFS_MAGIC_WEND) {
+   if (!silent)
+   printk(KERN_ERR "cramfs: wrong endianess\n");
+   goto out;
+   }
+   
/* check at 512 byte offset */
mutex_lock(&read_mutex);
memcpy(&super, cramfs_read(sb, 512, sizeof(super)), 
sizeof(super));
mutex_unlock(&read_mutex);
if (super.magic != CRAMFS_MAGIC) {
-   if (!silent)
+   if(super.magic == CRAMFS_MAGIC_WEND && !silent)
+   printk(KERN_ERR "cramfs: wrong endianess\n");
+   else if (!silent)
printk(KERN_ERR "cramfs: wrong magic\n");
goto out;
}
diff --git a/include/linux/cramfs_fs.h b/include/linux/cramfs_fs.h
index 1dba681..3be4e5a 100644
--- a/include/linux/cramfs_fs.h
+++ b/include/linux/cramfs_fs.h
@@ -4,6 +4,7 @@
 #include 
 
 #define CRAMFS_MAGIC   0x28cd3d45  /* some random number */
+#define CRAMFS_MAGIC_WEND  0x453dcd28  /* magic number with the wrong 
endianess */
 #define CRAMFS_SIGNATURE   "Compressed ROMFS"
 
 /*
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/25] move handling of setuid/gid bits from VFS into individual setattr functions (RESEND)

2007-08-07 Thread Christoph Hellwig
First thanks a lot for doing this work, it's been long needed.

Second please don't send out that many patches.  We encourage people
to split things into small patches when the changes are logially
separated.  Which these are not - it's a flag day change (which btw
is fine despite the rants soe people spewed in reply to this), so it
should be one single patch. (Or one for all mainline filesystems +
one per fs only in -mm to make Andrew's life a little easier if you
really care.)
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 01/25] VFS: move attr_kill logic from notify_change into helper function

2007-08-07 Thread Christoph Hellwig
> +void attr_kill_to_mode(struct inode *inode, struct iattr *attr)

This function badly needs a kerneldoc description.  Also I can't say
I like the name a lot, but without a clearly better idea I should
probably not complain :)

We should at least add a generic_ prefix to indicate it's a generic
helper valid for most filesystem (and the kerneldoc comment can explain
the details)

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Distributed storage.

2007-08-07 Thread Jens Axboe
On Tue, Aug 07 2007, Daniel Phillips wrote:
> On Tuesday 07 August 2007 05:05, Jens Axboe wrote:
> > On Sun, Aug 05 2007, Daniel Phillips wrote:
> > > A simple way to solve the stable accounting field issue is to add a
> > > new pointer to struct bio that is owned by the top level submitter
> > > (normally generic_make_request but not always) and is not affected
> > > by any recursive resubmission.  Then getting rid of that field
> > > later becomes somebody's summer project, which is not all that
> > > urgent because struct bio is already bloated up with a bunch of
> > > dubious fields and is a transient structure anyway.
> >
> > Thanks for your insights. Care to detail what bloat and dubious
> > fields struct bio has?
> 
> First obvious one I see is bi_rw separate from bi_flags.  Front_size and 
> back_size smell dubious.  Is max_vecs really necessary?  You could 

I don't like structure bloat, but I do like nice design. Overloading is
a necessary evil sometimes, though. Even today, there isn't enough room
to hold bi_rw and bi_flags in the same variable on 32-bit archs, so that
concern can be scratched. If you read bio.h, that much is obvious.

If you check up on the iommu virtual merging, you'll understand the
front and back size members. They may smell dubious to you, but please
take the time to understand why it looks the way it does.

> reasonably assume bi_vcnt rounded up to a power of two and bury the 
> details of making that work behind wrapper functions to change the 
> number of bvecs, if anybody actually needs that.  Bi_endio and 

Changing the number of bvecs is integral to how bio buildup current
works.

> bi_destructor could be combined.  I don't see a lot of users of bi_idx, 

bi_idx is integral to partial io completions.

> that looks like a soft target.  See what happened to struct page when a 
> couple of folks got serious about attacking it, some really deep hacks 
> were done to pare off a few bytes here and there.  But struct bio as a 
> space waster is not nearly in the same ballpark.

So show some concrete patches and examples, hand waving and assumptions
is just a waste of everyones time.

> It would be interesting to see if bi_bdev could be made read only.  
> Generally, each stage in the block device stack knows what the next 
> stage is going to be, so why do we have to write that in the bio?  For 
> error reporting from interrupt context?  Anyway, if Evgeniy wants to do 
> the patch, I will happily unload the task of convincing you that random 
> fields are/are not needed in struct bio :-)

It's a trade off, otherwise you'd have to pass the block device around a
lot. And it's, again, a design issue. A bio contains destination
information, that means device/offset/size information. I'm all for
shaving structure bytes where it matters, but not for the sake of
sacrificing code stability or design. I consider struct bio quite lean
and have worked hard to keep it that way. In fact, iirc, the only
addition to struct bio since 2001 is the iommu front/back size members.
And I resisted those for quite a while.

-- 
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Jfs-discussion] [PATCH 15/25] JFS: call attr_kill_to_mode from jfs_setattr

2007-08-07 Thread Dave Kleikamp
On Mon, 2007-08-06 at 09:54 -0400, Jeff Layton wrote:
> Signed-off-by: Jeff Layton <[EMAIL PROTECTED]>
Acked-by: Dave Kleikamp <[EMAIL PROTECTED]>

> ---
>  fs/jfs/acl.c |2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/jfs/acl.c b/fs/jfs/acl.c
> index 4d84bdc..34ca314 100644
> --- a/fs/jfs/acl.c
> +++ b/fs/jfs/acl.c
> @@ -227,6 +227,8 @@ int jfs_setattr(struct dentry *dentry, struct iattr 
> *iattr)
>   struct inode *inode = dentry->d_inode;
>   int rc;
> 
> + attr_kill_to_mode(inode, iattr);
> +
>   rc = inode_change_ok(inode, iattr);
>   if (rc)
>   return rc;
-- 
David Kleikamp
IBM Linux Technology Center

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Problem with getting signals delivered to a Samba server

2007-08-07 Thread J. Bruce Fields
On Tue, Jun 26, 2007 at 04:48:42PM -0400, Robert Rappaport wrote:
> A Samba server running on Linux, supporting Oplocks for its clients,
> will establish a lease for each OpLock that it grants to a client.
> Then when some other activity in the file system occurs, such as
> another application opening a file with an OpLock (and therefore a
> lease), a call is made to Linux routine, __break_lease() and this is
> supposed to result in a signal being delivered to the process which
> established the lease.  Receipt of such a signal should cause the
> process to release the lease.
>
> What I see is that the delivery of such signals appears to be
> unreliable.  The problem occurs in routine, sigio_perm(), which often
> returns a value which then leads to the signal not being delivered.
> The entire sequence of calls leading to this failure is as follows:
>
>__break_lease() => lease_break_callback() => kill_fasync() =>
> __kill_fasync() => send_sigio() => send_sigio_to_task() =>
> sigio_perm()
>
>  Routine, sigio_perm() is very simple:
>
>  static inline int sigio_perm(struct task_struct *p,
>  struct fown_struct *fown, int sig)
>  {
>  return (((fown->euid == 0) ||
>   (fown->euid == p->suid) || (fown->euid == p->uid) ||
>   (fown->uid == p->suid) || (fown->uid == p->uid)) &&
>  !security_file_send_sigiotask(p, fown, sig));
> }

Hm.  I don't understand this code well either.  However, looking at the
F_SETOWN description in the man page for fcntl(2):

"Sending a signal to  the  owner  process  (group)  specified by
F_SETOWN  is  subject  to  the  same  permissions checks as are
described for kill(2), where the sending process is the one that
employs F_SETOWN (but see BUGS below)."

where the relevant language from kill(2) is:

"For  a  process  to  have permission to send a signal it must
either be privileged (under Linux: have the CAP_KILL
capability), or the real  or effective  user  ID of the sending
process must equal the real or saved set-user-ID of the target
process."

it appears that the above logic is enforcing this requirement.

> And the reason that this is failing to send the signal is that the
> values for fown->euid and fown->uid are both 500, consistent with a
> user mode client, and the values of p->uid and p->suid are both zero,
> consistent with a root process, i.e. the smbd.

So it looks to me like the kernel may be correct here, and that Samba
should be calling F_SETOWN as root to ensure that this permission check
will pass.  (From a quick check of the F_SETOWN implementation in
fs/fcntl.c, it does appear to set the uid and euid to the that of the
calling process, as documented in the man pages.)

--b.
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/25] move handling of setuid/gid bits from VFS into individual setattr functions (RESEND)

2007-08-07 Thread Jeff Layton
On Tue, 7 Aug 2007 21:49:09 +0100
Christoph Hellwig <[EMAIL PROTECTED]> wrote:

> First thanks a lot for doing this work, it's been long needed.
> 
> Second please don't send out that many patches.  We encourage people
> to split things into small patches when the changes are logially
> separated.  Which these are not - it's a flag day change (which btw
> is fine despite the rants soe people spewed in reply to this), so it
> should be one single patch. (Or one for all mainline filesystems +
> one per fs only in -mm to make Andrew's life a little easier if you
> really care.)

Thanks. I debated about how best to split these up. A coworker
mentioned that Andrew had tossed him back a single patch that
touched several mainline filesystems and asked him to break it 
up. I took that to mean that the patches should generally be split
out, but I guess I took that too far ;-)

-- 
Jeff Layton <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 01/25] VFS: move attr_kill logic from notify_change into helper function

2007-08-07 Thread Jeff Layton
On Tue, 7 Aug 2007 21:51:49 +0100
Christoph Hellwig <[EMAIL PROTECTED]> wrote:

> > +void attr_kill_to_mode(struct inode *inode, struct iattr *attr)
> 
> This function badly needs a kerneldoc description.  Also I can't say
> I like the name a lot, but without a clearly better idea I should
> probably not complain :)
> 

Thanks for the comments.

I'm not thrilled with the name either, but kill_suid and *remove_suid
were already taken, and I really didn't want to name this something too
similar since there are already so many similarly named functions that
don't do the same thing. I'm definitely open to suggestions for
something different.

> We should at least add a generic_ prefix to indicate it's a generic
> helper valid for most filesystem (and the kerneldoc comment can explain
> the details)
> 

Both good suggestions. I'll plan to incorporate them in the next
respin of the set.

Thanks,
--
Jeff Layton <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/25] move handling of setuid/gid bits from VFS into individual setattr functions (RESEND)

2007-08-07 Thread Andrew Morton
On Mon, 6 Aug 2007 09:54:03 -0400
Jeff Layton <[EMAIL PROTECTED]> wrote:

> Apologies for the resend, but the original sending had the date in the
> email header and it caused some of these to bounce...
> 
> ( Please consider trimming the Cc list if discussing some aspect of this
> that doesn't concern everyone.)
> 
> When an unprivileged process attempts to modify a file that has the
> setuid or setgid bits set, the VFS will attempt to clear these bits. The
> VFS will set the ATTR_KILL_SUID or ATTR_KILL_SGID bits in the ia_valid
> mask, and then call notify_change to clear these bits and set the mode
> accordingly.
> 
> With a networked filesystem (NFS in particular but most likely others),
> the client machine may not have credentials that allow for the clearing
> of these bits. In some situations, this can lead to file corruption, or
> to an operation failing outright because the setattr fails.
> 
> In this situation, we'd like to just leave the handling of this to
> the server and ignore these bits. The problem is that by the time
> nfs_setattr is called, the VFS has already reinterpreted the ATTR_KILL_*
> bits into a mode change. We can't fix this in the filesystems where
> this is a problem, as doing so would leave us having to second-guess
> what the VFS wants us to do. So we need to change it so that filesystems
> have more flexibility in how to interpret the ATTR_KILL_* bits.
> 
> The first patch in the following patchset moves this logic into a helper
> function, and then only calls this helper function for inodes that do
> not have a setattr operation defined. The subsequent patches fix up
> individual filesystem setattr functions to call this helper function.
> 
> The upshot of this is that with this change, filesystems that define
> a setattr inode operation are now responsible for handling the ATTR_KILL
> bits as well. They can trivially do so by calling the helper, but they
> must do so.
> 
> Some of the follow-on patches may not be strictly necessary, but I
> decided that it was better to take the conservative approach and call
> the helper when I wasn't sure. I've tried to CC the maintainers
> for the individual filesystems as well where I could find them,
> please let me know if there are others who should be informed.
> 
> Comments and suggestions appreciated...
> 

>From a purely practical standpoint: it's a concern that all filesytems need
patching to continue to correctly function after this change.  There might
be filesystems which you missed, and there are out-of-tree filesystems
which won't be updated.

And I think the impact upon the out-of-tree filesystems would be fairly
severe: they quietly and subtly get their secutiry guarantees broken (I
think?)

Is there any way in which we can prevent these problems?  Say

- rename something so that unconverted filesystems will reliably fail to
  compile?

- leave existing filesystems alone, but add a new
  inode_operations.setattr_jeff, which the networked filesytems can
  implement, and teach core vfs to call setattr_jeff in preference to
  setattr?

Something else?
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/25] move handling of setuid/gid bits from VFS into individual setattr functions (RESEND)

2007-08-07 Thread Trond Myklebust
On Tue, 2007-08-07 at 17:15 -0700, Andrew Morton wrote:

> Is there any way in which we can prevent these problems?  Say

The problem here is that we occasionally DO need to add new flags, and
yes, they MAY be security related. The whole reason why we're now having
to change the semantics of setattr is because somebody tried to hack
their way around the write+suid issue.

I suspect we will see the exact same thing will happen again in a couple
of years with Serge's ATTR_KILL_PRIV flag.

> - rename something so that unconverted filesystems will reliably fail to
>   compile?
> 
> - leave existing filesystems alone, but add a new
>   inode_operations.setattr_jeff, which the networked filesytems can
>   implement, and teach core vfs to call setattr_jeff in preference to
>   setattr?

If you really need to know that the filesystem is handling the flags,
then how about instead having ->setattr() return something which
indicates which flags it actually handled? That is likely to be a far
more intrusive change, but it is one which is future-proof.

Trond

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/25] move handling of setuid/gid bits from VFS into individual setattr functions (RESEND)

2007-08-07 Thread Andrew Morton
On Tue, 07 Aug 2007 20:45:34 -0400
Trond Myklebust <[EMAIL PROTECTED]> wrote:

> > - rename something so that unconverted filesystems will reliably fail to
> >   compile?
> > 
> > - leave existing filesystems alone, but add a new
> >   inode_operations.setattr_jeff, which the networked filesytems can
> >   implement, and teach core vfs to call setattr_jeff in preference to
> >   setattr?
> 
> If you really need to know that the filesystem is handling the flags,
> then how about instead having ->setattr() return something which
> indicates which flags it actually handled? That is likely to be a far
> more intrusive change, but it is one which is future-proof.

If we change ->setattr so that it will return a positive, non-zero value
which the caller can then check and reliably do printk("that filesystem
needs updating") then that addresses my concern, sure.
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch][rfc] fs: fix nobh error handling

2007-08-07 Thread Andrew Morton
On Tue, 7 Aug 2007 07:51:29 +0200
Nick Piggin <[EMAIL PROTECTED]> wrote:

> nobh mode error handling is not just pretty slack, it's wrong.
> 
> One cannot zero out the whole page to ensure new blocks are zeroed,
> because it just brings the whole page "uptodate" with zeroes even if
> that may not be the correct uptodate data. Also, other parts of the
> page may already contain dirty data which would get lost by zeroing
> it out. Thirdly, the writeback of zeroes to the new blocks will also
> erase existing blocks. All these conditions are pagecache and/or
> filesystem corruption.
> 
> The problem comes about because we didn't keep track of which buffers
> actually are new or old. However it is not enough just to keep only
> this state, because at the point we start dirtying parts of the page
> (new blocks, with zeroes), the handling of IO errors becomes impossible
> without buffers because the page may only be partially uptodate, in
> which case the page flags allone cannot capture the state of the parts
> of the page.
> 
> So allocate all buffers for the page upfront, but leave them unattached
> so that they don't pick up any other references and can be freed when
> we're done. If the error path is hit, then zero the new buffers as the
> regular buffer path does, then attach the buffers to the page so that
> it can actually be written out correctly and be subject to the normal
> IO error handling paths.
> 
> As an upshot, we save 1K of kernel stack on ia64 or powerpc 64K page
> systems.
> 
> Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
> 

With this change, nobh_prepare_write() can magically attach buffers to the
page.  But a filesystem which is running in nobh mode wouldn't expect that,
and could quite legitimately go BUG, or leak data or, more seriously and
much less fixably, just go and overwrite page->private, because it "knows"
that nobody else is using ->private.

I'd have thought that it would be better to not attach the buffers and to
go ahead and do whatever synchronous IO is needed in the error recovery
code, then free those buffers again.

Also, you have a couple of (cheerily uncommented) PagePrivate() tests in
there which should be page_has_buffers().

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch][rfc] fs: fix nobh error handling

2007-08-07 Thread Nick Piggin
On Tue, Aug 07, 2007 at 06:09:03PM -0700, Andrew Morton wrote:
> On Tue, 7 Aug 2007 07:51:29 +0200
> Nick Piggin <[EMAIL PROTECTED]> wrote:
> 
> > nobh mode error handling is not just pretty slack, it's wrong.
> > 
> > One cannot zero out the whole page to ensure new blocks are zeroed,
> > because it just brings the whole page "uptodate" with zeroes even if
> > that may not be the correct uptodate data. Also, other parts of the
> > page may already contain dirty data which would get lost by zeroing
> > it out. Thirdly, the writeback of zeroes to the new blocks will also
> > erase existing blocks. All these conditions are pagecache and/or
> > filesystem corruption.
> > 
> > The problem comes about because we didn't keep track of which buffers
> > actually are new or old. However it is not enough just to keep only
> > this state, because at the point we start dirtying parts of the page
> > (new blocks, with zeroes), the handling of IO errors becomes impossible
> > without buffers because the page may only be partially uptodate, in
> > which case the page flags allone cannot capture the state of the parts
> > of the page.
> > 
> > So allocate all buffers for the page upfront, but leave them unattached
> > so that they don't pick up any other references and can be freed when
> > we're done. If the error path is hit, then zero the new buffers as the
> > regular buffer path does, then attach the buffers to the page so that
> > it can actually be written out correctly and be subject to the normal
> > IO error handling paths.
> > 
> > As an upshot, we save 1K of kernel stack on ia64 or powerpc 64K page
> > systems.
> > 
> > Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
> > 
> 
> With this change, nobh_prepare_write() can magically attach buffers to the
> page.  But a filesystem which is running in nobh mode wouldn't expect that,
> and could quite legitimately go BUG, or leak data or, more seriously and
> much less fixably, just go and overwrite page->private, because it "knows"
> that nobody else is using ->private.

I was fairly sure that a filesystem can not assume buffers won't be
attached, because there are various error case paths thta do exactly
the same thing (eg. nobh_writepage can call __block_write_full_page
which will attach buffers). 

Does any filesystem assume this? Is it not already broken?


> I'd have thought that it would be better to not attach the buffers and to
> go ahead and do whatever synchronous IO is needed in the error recovery
> code, then free those buffers again.

It is hard because if the synchronous IO fails, then what do you do?
You could try making it up as you go along, but of course if we _can_
attach the buffers here then it would be preferable to do that. IMO.
 

> Also, you have a couple of (cheerily uncommented) PagePrivate() tests in
> there which should be page_has_buffers().

Yeah, I guess the whole thing needs more commenting :P
page_has_buffers... right, I'll change that.


-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch][rfc] fs: fix nobh error handling

2007-08-07 Thread Andrew Morton
On Wed, 8 Aug 2007 04:18:38 +0200 Nick Piggin <[EMAIL PROTECTED]> wrote:

> On Tue, Aug 07, 2007 at 06:09:03PM -0700, Andrew Morton wrote:
> > On Tue, 7 Aug 2007 07:51:29 +0200
> > Nick Piggin <[EMAIL PROTECTED]> wrote:
> > 
> > > nobh mode error handling is not just pretty slack, it's wrong.
> > > 
> > > One cannot zero out the whole page to ensure new blocks are zeroed,
> > > because it just brings the whole page "uptodate" with zeroes even if
> > > that may not be the correct uptodate data. Also, other parts of the
> > > page may already contain dirty data which would get lost by zeroing
> > > it out. Thirdly, the writeback of zeroes to the new blocks will also
> > > erase existing blocks. All these conditions are pagecache and/or
> > > filesystem corruption.
> > > 
> > > The problem comes about because we didn't keep track of which buffers
> > > actually are new or old. However it is not enough just to keep only
> > > this state, because at the point we start dirtying parts of the page
> > > (new blocks, with zeroes), the handling of IO errors becomes impossible
> > > without buffers because the page may only be partially uptodate, in
> > > which case the page flags allone cannot capture the state of the parts
> > > of the page.
> > > 
> > > So allocate all buffers for the page upfront, but leave them unattached
> > > so that they don't pick up any other references and can be freed when
> > > we're done. If the error path is hit, then zero the new buffers as the
> > > regular buffer path does, then attach the buffers to the page so that
> > > it can actually be written out correctly and be subject to the normal
> > > IO error handling paths.
> > > 
> > > As an upshot, we save 1K of kernel stack on ia64 or powerpc 64K page
> > > systems.
> > > 
> > > Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
> > > 
> > 
> > With this change, nobh_prepare_write() can magically attach buffers to the
> > page.  But a filesystem which is running in nobh mode wouldn't expect that,
> > and could quite legitimately go BUG, or leak data or, more seriously and
> > much less fixably, just go and overwrite page->private, because it "knows"
> > that nobody else is using ->private.
> 
> I was fairly sure that a filesystem can not assume buffers won't be
> attached, because there are various error case paths thta do exactly
> the same thing (eg. nobh_writepage can call __block_write_full_page
> which will attach buffers). 

oh crap, that's sad.  Either we broke it later on or I misremembered.

> Does any filesystem assume this? Is it not already broken?

Yes, it would be broken.

> 
> > I'd have thought that it would be better to not attach the buffers and to
> > go ahead and do whatever synchronous IO is needed in the error recovery
> > code, then free those buffers again.
> 
> It is hard because if the synchronous IO fails, then what do you do?

Do what we usually do when an IO error happens: crash the kernel?  (Sorry,
have been spending too long at bugzilla.kernel.org)

> You could try making it up as you go along, but of course if we _can_
> attach the buffers here then it would be preferable to do that. IMO.
>  
> 
> > Also, you have a couple of (cheerily uncommented) PagePrivate() tests in
> > there which should be page_has_buffers().
> 
> Yeah, I guess the whole thing needs more commenting :P
> page_has_buffers... right, I'll change that.

Did it get much testing?

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch][rfc] fs: fix nobh error handling

2007-08-07 Thread Nick Piggin
On Tue, Aug 07, 2007 at 07:33:47PM -0700, Andrew Morton wrote:
> On Wed, 8 Aug 2007 04:18:38 +0200 Nick Piggin <[EMAIL PROTECTED]> wrote:
> 
> > On Tue, Aug 07, 2007 at 06:09:03PM -0700, Andrew Morton wrote:
> > > 
> > > With this change, nobh_prepare_write() can magically attach buffers to the
> > > page.  But a filesystem which is running in nobh mode wouldn't expect 
> > > that,
> > > and could quite legitimately go BUG, or leak data or, more seriously and
> > > much less fixably, just go and overwrite page->private, because it "knows"
> > > that nobody else is using ->private.
> > 
> > I was fairly sure that a filesystem can not assume buffers won't be
> > attached, because there are various error case paths thta do exactly
> > the same thing (eg. nobh_writepage can call __block_write_full_page
> > which will attach buffers). 
> 
> oh crap, that's sad.  Either we broke it later on or I misremembered.
> 
> > Does any filesystem assume this? Is it not already broken?
> 
> Yes, it would be broken.
> 
> > 
> > > I'd have thought that it would be better to not attach the buffers and to
> > > go ahead and do whatever synchronous IO is needed in the error recovery
> > > code, then free those buffers again.
> > 
> > It is hard because if the synchronous IO fails, then what do you do?
> 
> Do what we usually do when an IO error happens: crash the kernel?  (Sorry,
> have been spending too long at bugzilla.kernel.org)

Heh.. I guess there is still a chance to retry the IO with sync or
fsync. I'mt not surprised if the "normal" pagecache error handling
paths doesn't work so well either, but at least if we can duplicate
as little code as possible it might get fixed up one day.


 
> > You could try making it up as you go along, but of course if we _can_
> > attach the buffers here then it would be preferable to do that. IMO.
> >  
> > 
> > > Also, you have a couple of (cheerily uncommented) PagePrivate() tests in
> > > there which should be page_has_buffers().
> > 
> > Yeah, I guess the whole thing needs more commenting :P
> > page_has_buffers... right, I'll change that.
> 
> Did it get much testing?

A little. Obviously it only really changes anything when an IO error hits,
and I found that ext3/jbd gives up and goes readonly pretty quickly when I
inject IO errors into the block device. What I really want to do is just
inject faults at nobh_prepare_write and do some longer tests.

I'll do that today. 
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [fuse-devel] [PATCH 00/25] move handling of setuid/gid bits from VFS into individual setattr functions (RESEND)

2007-08-07 Thread Miklos Szeredi
> >From a purely practical standpoint: it's a concern that all filesytems need
> patching to continue to correctly function after this change.  There might
> be filesystems which you missed, and there are out-of-tree filesystems
> which won't be updated.
> 
> And I think the impact upon the out-of-tree filesystems would be fairly
> severe: they quietly and subtly get their secutiry guarantees broken (I
> think?)
> 
> Is there any way in which we can prevent these problems?  Say
> 
> - rename something so that unconverted filesystems will reliably fail to
>   compile?

Maybe renaming ATTR_MODE to ATTR_MODE_SET (changing it's value as
well, so that binary stuff breaks visibly as well)?

This would make sense, because we are changing what this attribute
acually means.  In the new code attr->ia_mode only contains the
originally set mode, not the ones we've added to change the suid bits.

Miklos
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html