[patch 2/6][RFC] Allow FIBMAP to return EFBIG on large filesystems.

2007-10-26 Thread Mike Waychison
Propagate an error (EFBIG) to userspace if the physical block is too large to 
return in a 32bit int instead of truncating it.

Signed-off-by: Mike Waychison <[EMAIL PROTECTED]>

 fs/ioctl.c |   10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

Index: linux-2.6.23/fs/ioctl.c
===
--- linux-2.6.23.orig/fs/ioctl.c2007-10-26 15:26:10.0 -0700
+++ linux-2.6.23/fs/ioctl.c 2007-10-26 16:16:28.0 -0700
@@ -52,6 +52,7 @@ static int file_ioctl(struct file *filp,
case FIBMAP:
{
struct address_space *mapping = filp->f_mapping;
+   sector_t phys_block;
int res;
/* do we support this mess? */
if (!mapping->a_ops->bmap)
@@ -64,8 +65,15 @@ static int file_ioctl(struct file *filp,
return -EINVAL;
 
lock_kernel();
-   res = mapping->a_ops->bmap(mapping, block);
+   phys_block = mapping->a_ops->bmap(mapping, block);
unlock_kernel();
+
+   /* Make sure that the return value fits in the
+* user's buffer. */
+   if ((u32)phys_block < phys_block)
+   return -EFBIG;
+
+   res = phys_block;
return put_user(res, p);
}
case FIGETBSZ:

--
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch 4/6][RFC] Attempt to plug race with truncate

2007-10-26 Thread Mike Waychison
Attempt to deal with races with truncate paths.

I'm not really sure on the locking here, but these seem to be taken by the 
truncate path.  BKL is left as some filesystem may(?) still require it.

Signed-off-by: Mike Waychison <[EMAIL PROTECTED]>
 fs/ioctl.c |8 
 1 file changed, 8 insertions(+)

Index: linux-2.6.23/fs/ioctl.c
===
--- linux-2.6.23.orig/fs/ioctl.c2007-10-26 15:27:29.0 -0700
+++ linux-2.6.23/fs/ioctl.c 2007-10-26 16:16:28.0 -0700
@@ -43,13 +43,21 @@ static long do_ioctl(struct file *filp, 
 static int do_fibmap(struct address_space *mapping, sector_t block,
 sector_t *phys_block)
 {
+   struct inode *inode = mapping->host;
+
if (!capable(CAP_SYS_RAWIO))
return -EPERM;
if (!mapping->a_ops->bmap)
return -EINVAL;
 
lock_kernel();
+   /* Avoid races with truncate */
+   mutex_lock(&inode->i_mutex);
+   /* FIXME: Do we really need i_alloc_sem? */
+   down_read(&inode->i_alloc_sem);
*phys_block = mapping->a_ops->bmap(mapping, block);
+   up_read(&inode->i_alloc_sem);
+   mutex_unlock(&inode->i_mutex);
unlock_kernel();
 
return 0;

--
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch 5/6][RFC] Introduce FIBMAP64

2007-10-26 Thread Mike Waychison
Introduce FIBMAP64.  This is the same as FIBMAP, but takes a u64.

This new routine requires filesystems to implement bmap64 if they want to 
support > 2^31 blocks in a logical file.  We do this to give time for 
filesystems to be properly audited.

Signed-off-by: Mike Waychison <[EMAIL PROTECTED]>

 fs/compat_ioctl.c  |2 ++
 fs/ioctl.c |   34 +-
 include/linux/fs.h |7 +++
 3 files changed, 42 insertions(+), 1 deletion(-)

Index: linux-2.6.23/fs/ioctl.c
===
--- linux-2.6.23.orig/fs/ioctl.c2007-10-26 15:28:28.0 -0700
+++ linux-2.6.23/fs/ioctl.c 2007-10-26 16:16:28.0 -0700
@@ -44,11 +44,24 @@ static int do_fibmap(struct address_spac
 sector_t *phys_block)
 {
struct inode *inode = mapping->host;
+   sector_t (*bmap)(struct address_space *, sector_t);
 
if (!capable(CAP_SYS_RAWIO))
return -EPERM;
-   if (!mapping->a_ops->bmap)
+
+   if (mapping->a_ops->bmap64) {
+   /* Filesystem has bmap path audited for 64bit. */
+   bmap = mapping->a_ops->bmap64;
+   } else if (mapping->a_ops->bmap) {
+   bmap = mapping->a_ops->bmap;
+   /* Need to verify that the input looks sane when sector_t is
+* 64bit and the filesystem has only been audited for 32bit. */
+   if ((s32)block < 0)
+   return -EFBIG;
+   } else {
+   /* no FIBMAP support */
return -EINVAL;
+   }
 
lock_kernel();
/* Avoid races with truncate */
@@ -95,6 +108,25 @@ static int file_ioctl(struct file *filp,
res = phys_block;
return put_user(res, p);
}
+   case FIBMAP64:
+   {
+   struct address_space *mapping = filp->f_mapping;
+   sector_t phys_block;
+   u64 block;
+
+   if ((error = get_user(block, p)) != 0)
+   return error;
+
+   /* Make sure that the logical block fits in sector_t */
+   if ((sector_t)-1 < block)
+   return -EFBIG;
+
+   error = do_fibmap(mapping, block, &phys_block);
+   if (error)
+   return error;
+
+   return put_user(block, p);
+   }
case FIGETBSZ:
return put_user(inode->i_sb->s_blocksize, p);
case FIONREAD:
Index: linux-2.6.23/include/linux/fs.h
===
--- linux-2.6.23.orig/include/linux/fs.h2007-10-26 15:25:47.0 
-0700
+++ linux-2.6.23/include/linux/fs.h 2007-10-26 15:29:00.0 -0700
@@ -220,6 +220,7 @@ extern int dir_notify_enable;
 #define BMAP_IOCTL 1   /* obsolete - kept for compatibility */
 #define FIBMAP_IO(0x00,1)  /* bmap access */
 #define FIGETBSZ   _IO(0x00,2) /* get the block size used for bmap */
+#define FIBMAP64   _IO(0x00,3) /* bmap access - 64 bit sector numbers */
 
 #defineFS_IOC_GETFLAGS _IOR('f', 1, long)
 #defineFS_IOC_SETFLAGS _IOW('f', 2, long)
@@ -423,6 +424,12 @@ struct address_space_operations {
int (*commit_write)(struct file *, struct page *, unsigned, unsigned);
/* Unfortunately this kludge is needed for FIBMAP. Don't use it */
sector_t (*bmap)(struct address_space *, sector_t);
+   /*
+* Version of bmap for filesystems that can safely handle 64bit
+* sector_t.  Once all filesystems are converted to this, we can drop
+* bmap().
+*/
+   sector_t (*bmap64)(struct address_space *, sector_t);
void (*invalidatepage) (struct page *, unsigned long);
int (*releasepage) (struct page *, gfp_t);
ssize_t (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
Index: linux-2.6.23/fs/compat_ioctl.c
===
--- linux-2.6.23.orig/fs/compat_ioctl.c 2007-10-26 15:25:47.0 -0700
+++ linux-2.6.23/fs/compat_ioctl.c  2007-10-26 15:29:00.0 -0700
@@ -2506,6 +2506,7 @@ COMPATIBLE_IOCTL(FIONREAD)  /* This is a
 /* 0x00 */
 COMPATIBLE_IOCTL(FIBMAP)
 COMPATIBLE_IOCTL(FIGETBSZ)
+COMPATIBLE_IOCTL(FIBMAP64)
 /* 0x03 -- HD/IDE ioctl's used by hdparm and friends.
  * Some need translations, these do not.
  */
@@ -3597,6 +3598,7 @@ asmlinkage long compat_sys_ioctl(unsigne
 
case FIBMAP:
case FIGETBSZ:
+   case FIBMAP64:
case FIONREAD:
if (S_ISREG(filp->f_path.dentry->d_inode->i_mode))
break;

--
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body o

[patch 3/6][RFC] Move FIBMAP logic

2007-10-26 Thread Mike Waychison
Move FIBMAP logic out of file_ioctl() in preparation for introducing FIBMAP64.

Signed-off-by: Mike Waychison <[EMAIL PROTECTED]>
 fs/ioctl.c |   25 ++---
 1 file changed, 18 insertions(+), 7 deletions(-)

Index: linux-2.6.23/fs/ioctl.c
===
--- linux-2.6.23.orig/fs/ioctl.c2007-10-26 15:26:11.0 -0700
+++ linux-2.6.23/fs/ioctl.c 2007-10-26 16:16:28.0 -0700
@@ -40,6 +40,21 @@ static long do_ioctl(struct file *filp, 
return error;
 }
 
+static int do_fibmap(struct address_space *mapping, sector_t block,
+sector_t *phys_block)
+{
+   if (!capable(CAP_SYS_RAWIO))
+   return -EPERM;
+   if (!mapping->a_ops->bmap)
+   return -EINVAL;
+
+   lock_kernel();
+   *phys_block = mapping->a_ops->bmap(mapping, block);
+   unlock_kernel();
+
+   return 0;
+}
+
 static int file_ioctl(struct file *filp, unsigned int cmd,
unsigned long arg)
 {
@@ -55,18 +70,14 @@ static int file_ioctl(struct file *filp,
sector_t phys_block;
int res;
/* do we support this mess? */
-   if (!mapping->a_ops->bmap)
-   return -EINVAL;
-   if (!capable(CAP_SYS_RAWIO))
-   return -EPERM;
if ((error = get_user(block, p)) != 0)
return error;
if (block < 0)
return -EINVAL;
 
-   lock_kernel();
-   phys_block = mapping->a_ops->bmap(mapping, block);
-   unlock_kernel();
+   error = do_fibmap(mapping, block, &phys_block);
+   if (error)
+   return error;
 
/* Make sure that the return value fits in the
 * user's buffer. */

--
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch 6/6][RFC] Drop CAP_SYS_RAWIO requirement on FIBMAP

2007-10-26 Thread Mike Waychison
Remove the need for having CAP_SYS_RAWIO when doing a FIBMAP call on an open 
file descriptor.

It would be nice to allow users to have permission to see where their data is 
landing on disk, and there really isn't a good reason to keep them from getting 
at this information.

Signed-off-by: Mike Waychison <[EMAIL PROTECTED]>
 fs/ioctl.c |3 ---
 1 file changed, 3 deletions(-)

Index: linux-2.6.23/fs/ioctl.c
===
--- linux-2.6.23.orig/fs/ioctl.c2007-10-26 15:30:05.0 -0700
+++ linux-2.6.23/fs/ioctl.c 2007-10-26 15:31:43.0 -0700
@@ -46,9 +46,6 @@ static int do_fibmap(struct address_spac
struct inode *inode = mapping->host;
sector_t (*bmap)(struct address_space *, sector_t);
 
-   if (!capable(CAP_SYS_RAWIO))
-   return -EPERM;
-
if (mapping->a_ops->bmap64) {
/* Filesystem has bmap path audited for 64bit. */
bmap = mapping->a_ops->bmap64;

--
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch 0/6][RFC] Cleanup FIBMAP

2007-10-26 Thread Mike Waychison
The following series is meant to clean up FIBMAP paths with the eventual goal 
of allowing users to be able to FIBMAP their data.

I'm sending this as an RFC as I've only tested this on a x86_64 kernel with a 
32bit binary on ext2 and I've noticed a couple ext2_warnings already.

I'm unsure of the locking in [4/6] fix_race_with_truncate.patch.  Any help here 
would greatly be appreciated.

The last patch, [6/6] drop_cap_sys_rawio_for_fibmap.patch, is of course, not to 
be applied until any remaining issues are fixed :)

Thanks,

Mike Waychison
--
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch 1/6][RFC] Keep FIBMAP from looking at negative block nrs

2007-10-26 Thread Mike Waychison
Return an error if the user requests a negative logical block in a file.

Signed-off-by: Mike Waychison <[EMAIL PROTECTED]>
 fs/ioctl.c |2 ++
 1 file changed, 2 insertions(+)

Index: linux-2.6.23/fs/ioctl.c
===
--- linux-2.6.23.orig/fs/ioctl.c2007-10-26 15:25:48.0 -0700
+++ linux-2.6.23/fs/ioctl.c 2007-10-26 16:16:29.0 -0700
@@ -60,6 +60,8 @@ static int file_ioctl(struct file *filp,
return -EPERM;
if ((error = get_user(block, p)) != 0)
return error;
+   if (block < 0)
+   return -EINVAL;
 
lock_kernel();
res = mapping->a_ops->bmap(mapping, block);

--
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] 9p: add missing end-of-options record for trans_fd

2007-10-26 Thread Latchesar Ionkov
The list of options that the fd transport accepts is missing end-of-options
marker. This patch adds it.

Signed-off-by: Latchesar Ionkov <[EMAIL PROTECTED]>

---
commit 70ec0c7936c2516d128fdb1a32d749071b8846ec
tree 8de5495f83b94096825f8bfe0767e4fcbaf36ef3
parent 25ed88ed319e40fb6426e2aaefcc5f136aadd4ee
author Latchesar Ionkov <[EMAIL PROTECTED]> 1193438452 -0600
committer Latchesar Ionkov <[EMAIL PROTECTED]> 1193438452 -0600

 net/9p/trans_fd.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 30269a4..62332ed 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -62,13 +62,14 @@ struct p9_trans_fd {
 
 enum {
/* Options that take integer arguments */
-   Opt_port, Opt_rfdno, Opt_wfdno,
+   Opt_port, Opt_rfdno, Opt_wfdno, Opt_err,
 };
 
 static match_table_t tokens = {
{Opt_port, "port=%u"},
{Opt_rfdno, "rfdno=%u"},
{Opt_wfdno, "wfdno=%u"},
+   {Opt_err, NULL},
 };
 
 /**
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] 9p: return NULL when trans not found

2007-10-26 Thread Latchesar Ionkov
v9fs_match_trans function returns arbitrary transport module instead of NULL
when the requested transport is not registered. This patch modifies the
function to return NULL in that case.

Signed-off-by: Latchesar Ionkov <[EMAIL PROTECTED]>

---
commit 25ed88ed319e40fb6426e2aaefcc5f136aadd4ee
tree 281feda9127a85178d44faca59a13645d08e314d
parent 0c0b7fa3d4e80ab3e4e69b8a55661f649d1b41ff
author Latchesar Ionkov <[EMAIL PROTECTED]> 1193438318 -0600
committer Latchesar Ionkov <[EMAIL PROTECTED]> 1193438318 -0600

 net/9p/mod.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/9p/mod.c b/net/9p/mod.c
index 41d70f4..8f9763a 100644
--- a/net/9p/mod.c
+++ b/net/9p/mod.c
@@ -76,9 +76,9 @@ struct p9_trans_module *v9fs_match_trans(const substring_t 
*name)
list_for_each(p, &v9fs_trans_list) {
t = list_entry(p, struct p9_trans_module, list);
if (strncmp(t->name, name->from, name->to-name->from) == 0)
-   break;
+   return t;
}
-   return t;
+   return NULL;
 }
 EXPORT_SYMBOL(v9fs_match_trans);
 
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] 9p: use copy of the options value instead of original

2007-10-26 Thread Latchesar Ionkov
v9fs_parse_options function uses strsep which modifies the value of the
v9ses->options field. That modified value is later passed to the function
that creates the transport potentially making the transport creation
function to fail.

This patch creates a copy of v9ses->option field that v9fs_parse_options
function uses instead of the original value.

Signed-off-by: Latchesar Ionkov <[EMAIL PROTECTED]>

---
commit 0c0b7fa3d4e80ab3e4e69b8a55661f649d1b41ff
tree eeb4ac6ea85387c153e3f7f6cad370e1c5dc8534
parent c9927c2bf4f45bb85e8b502ab3fb79ad6483c244
author Latchesar Ionkov <[EMAIL PROTECTED]> 1193438132 -0600
committer Latchesar Ionkov <[EMAIL PROTECTED]> 1193438132 -0600

 fs/9p/v9fs.c |6 --
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/9p/v9fs.c b/fs/9p/v9fs.c
index 756f7e9..fbb12da 100644
--- a/fs/9p/v9fs.c
+++ b/fs/9p/v9fs.c
@@ -82,7 +82,7 @@ static match_table_t tokens = {
 
 static void v9fs_parse_options(struct v9fs_session_info *v9ses)
 {
-   char *options = v9ses->options;
+   char *options;
substring_t args[MAX_OPT_ARGS];
char *p;
int option;
@@ -96,9 +96,10 @@ static void v9fs_parse_options(struct v9fs_session_info 
*v9ses)
v9ses->cache = 0;
v9ses->trans = v9fs_default_trans();
 
-   if (!options)
+   if (!v9ses->options)
return;
 
+   options = kstrdup(v9ses->options, GFP_KERNEL);
while ((p = strsep(&options, ",")) != NULL) {
int token;
if (!*p)
@@ -169,6 +170,7 @@ static void v9fs_parse_options(struct v9fs_session_info 
*v9ses)
continue;
}
}
+   kfree(options);
 }
 
 /**
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 1/1] Drop CAP_SYS_RAWIO requirement for FIBMAP

2007-10-26 Thread Mike Waychison

Jason Uhlenkott wrote:

On Fri, Oct 26, 2007 at 14:59:57 -0700, Mike Waychison wrote:

Jason Uhlenkott wrote:

Additionally, ext3_bmap() has this to say about it:

   if (EXT3_I(inode)->i_state & EXT3_STATE_JDATA) {
   /*
* This is a REALLY heavyweight approach, but the use of
* bmap on dirty files is expected to be extremely rare:
* only if we run lilo or swapon on a freshly made file
* do we expect this to happen.
*
* (bmap requires CAP_SYS_RAWIO so this does not
* represent an unprivileged user DOS attack --- we'd be
* in trouble if mortal users could trigger this path at
* will.)
Hmm.  I don't know what the right approach to this is.  This seems to be 
the same situation as the delayed allocation problem, no?


Yup.

What if we just returned 0?  Tools like lilo are already doing sync(), 
would that cause the journal to get flushed explicitly anyway?


Not sure, but I'd be pretty nervous about breaking any existing users
which aren't explicitly syncing.


True.   We can probably get away with an implicit flush when 
CAP_SYS_RAWIO is set, but that's pretty gross :(




Are you envisioning users who want to see where their data is landing
for performance reasons?  It seems like such users are going to have
sufficiently different desires from existing FIBMAP users (who need to
know where everything is because they intend to fiddle with the raw
device) that a different interface might be warranted.


A little of both ;)

We could introduce a new API, though either way, the same fundamental 
problems apply wrt auditing.


I see three reasons that new APIs are warranted:

a) to deal with block numbers > 2^31 --> FIBMAP64
b) to have a path where no syncing is required due to worries about user 
DoS (delayed allocation / data in journal).
c) possibly some way to FIBMAP a range so that userspace doesn't need to 
syscall for each block, something like how mincore() does it?


I have a patchset ready that I'll send out shortly that introduces 
FIBMAP64.  The last patch in that set drops the CAP_SYS_RAWIO, but it's 
probably not what we want given DoS case.  I'd like to send it out 
anyway to get some comments on some of the sanity checks and locking I'm 
adding.


Handling (c) above is just extra sugar and isn't something I'm too 
worried about implementing.


Mike Waychison
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 1/1] Drop CAP_SYS_RAWIO requirement for FIBMAP

2007-10-26 Thread Jason Uhlenkott
On Fri, Oct 26, 2007 at 14:59:57 -0700, Mike Waychison wrote:
> Jason Uhlenkott wrote:
> >Additionally, ext3_bmap() has this to say about it:
> >
> >if (EXT3_I(inode)->i_state & EXT3_STATE_JDATA) {
> >/*
> > * This is a REALLY heavyweight approach, but the use of
> > * bmap on dirty files is expected to be extremely rare:
> > * only if we run lilo or swapon on a freshly made file
> > * do we expect this to happen.
> > *
> > * (bmap requires CAP_SYS_RAWIO so this does not
> > * represent an unprivileged user DOS attack --- we'd be
> > * in trouble if mortal users could trigger this path at
> > * will.)
> 
> Hmm.  I don't know what the right approach to this is.  This seems to be 
> the same situation as the delayed allocation problem, no?

Yup.

> What if we just returned 0?  Tools like lilo are already doing sync(), 
> would that cause the journal to get flushed explicitly anyway?

Not sure, but I'd be pretty nervous about breaking any existing users
which aren't explicitly syncing.

Are you envisioning users who want to see where their data is landing
for performance reasons?  It seems like such users are going to have
sufficiently different desires from existing FIBMAP users (who need to
know where everything is because they intend to fiddle with the raw
device) that a different interface might be warranted.
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] VFS/Security: Rework inode_getsecurity and callers to return resulting buffer

2007-10-26 Thread James Morris
On Fri, 26 Oct 2007, Serge E. Hallyn wrote:

> > It wouldn't be much effort to rebase this patch against Linus's latest
> > tree. I am assuming that the static lsm patch is in there based on the
> > recent discussion on LKML?
> 
> Oh, sorry for the two emails.
> 
> Yeah it's in 2.6.24.  So a rebase will be necessary anyway. 

That code may still change -- Arjan's update, for example.

-- 
James Morris
<[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 1/1] Drop CAP_SYS_RAWIO requirement for FIBMAP

2007-10-26 Thread Jason Uhlenkott
On Fri, Oct 26, 2007 at 01:22:17 +0100, Alan Cox wrote:
> On Thu, 25 Oct 2007 16:06:40 -0700
> Mike Waychison <[EMAIL PROTECTED]> wrote:
> 
> > Remove the need for having CAP_SYS_RAWIO when doing a FIBMAP call on an 
> > open file descriptor.
> > 
> > It would be nice to allow users to have permission to see where their data 
> > is landing on disk, and there really isn't a good reason to keep them from 
> > getting at this information.
> 
> Historically this was done because people felt it was more secure. It
> also allows you to make some deductions about other activities on the
> disk but thats probably only a concern for very very security crazed
> compartmentalised boxes
> 
> Also historically at least FIBMAP could be abused to crash the system.
> Now if you can verify that has been fixed I have no problem, but given
> that I can find no record of that being fixed it would be wise to audit
> it first and review Chris Evans and other reports about what occurs when
> FIBMAP is passed random block numbers.
> 
> FIBMAP has another problem for this general use as well - it takes an int
> but the block number can now be bigger for very large files on 32bit.

Additionally, ext3_bmap() has this to say about it:

if (EXT3_I(inode)->i_state & EXT3_STATE_JDATA) {
/*
 * This is a REALLY heavyweight approach, but the use of
 * bmap on dirty files is expected to be extremely rare:
 * only if we run lilo or swapon on a freshly made file
 * do we expect this to happen.
 *
 * (bmap requires CAP_SYS_RAWIO so this does not
 * represent an unprivileged user DOS attack --- we'd be
 * in trouble if mortal users could trigger this path at
 * will.)
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 1/1] Drop CAP_SYS_RAWIO requirement for FIBMAP

2007-10-26 Thread Mike Waychison

Jason Uhlenkott wrote:

On Fri, Oct 26, 2007 at 01:22:17 +0100, Alan Cox wrote:

On Thu, 25 Oct 2007 16:06:40 -0700
Mike Waychison <[EMAIL PROTECTED]> wrote:


Remove the need for having CAP_SYS_RAWIO when doing a FIBMAP call on an open 
file descriptor.

It would be nice to allow users to have permission to see where their data is 
landing on disk, and there really isn't a good reason to keep them from getting 
at this information.

Historically this was done because people felt it was more secure. It
also allows you to make some deductions about other activities on the
disk but thats probably only a concern for very very security crazed
compartmentalised boxes

Also historically at least FIBMAP could be abused to crash the system.
Now if you can verify that has been fixed I have no problem, but given
that I can find no record of that being fixed it would be wise to audit
it first and review Chris Evans and other reports about what occurs when
FIBMAP is passed random block numbers.

FIBMAP has another problem for this general use as well - it takes an int
but the block number can now be bigger for very large files on 32bit.


Additionally, ext3_bmap() has this to say about it:

if (EXT3_I(inode)->i_state & EXT3_STATE_JDATA) {
/*
 * This is a REALLY heavyweight approach, but the use of
 * bmap on dirty files is expected to be extremely rare:
 * only if we run lilo or swapon on a freshly made file
 * do we expect this to happen.
 *
 * (bmap requires CAP_SYS_RAWIO so this does not
 * represent an unprivileged user DOS attack --- we'd be
 * in trouble if mortal users could trigger this path at
 * will.)


Hmm.  I don't know what the right approach to this is.  This seems to be 
the same situation as the delayed allocation problem, no?


What if we just returned 0?  Tools like lilo are already doing sync(), 
would that cause the journal to get flushed explicitly anyway?


Mike Waychison
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/2] Fix linux/swap.h build wart

2007-10-26 Thread Jeff Dike
On Fri, Oct 26, 2007 at 01:38:12PM -0700, Luck, Tony wrote:
> Not fundamental reasons, but these patches break the ia64 build for most
> configs with:
> 
> In file included from kernel/futex.c:59:
> include/asm/futex.h: In function `futex_atomic_op_inuser':
> include/asm/futex.h:62: error: implicit declaration of function 
> `pagefault_disable'
> include/asm/futex.h:86: error: implicit declaration of function 
> `pagefault_enable'
> make[1]: *** [kernel/futex.o] Error 1
> 
> Adding an include of  to kernel/futex.c is sufficient (but
> not necessarily right) to unbreak this.

Yeah, I've been changing  to  for
these.  Looks like I'll need to make a pass over all the futex.h's.

> A few configs also fail with:
> mm/migrate.c: In function `migrate_page_copy':
> mm/migrate.c:359: error: implicit declaration of function `copy_highpage'
> make[1]: *** [mm/migrate.o] Error 1
> 
> Making mm/migrate.c include  fixes this one.

Excellent, I hadn't yet figured out who used migrate.c.

Jeff

-- 
Work email - jdike at linux dot intel dot com
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [0/3] Distributed storage. Mirror algo extension for automatic recovery.

2007-10-26 Thread Andrew Morton
On Thu, 18 Oct 2007 23:17:41 +0400
Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:

> I'm pleased to announce sixth release of the distributed storage
> subsystem, which allows to form a storage on top of remote and local
> nodes, which in turn can be exported to another storage as a node to
> form tree-like storages.

I went back and re-read last month's discussion and I'm not seeing any
reason why we shouldn't start thinking about merging this.

How close is it to that stage?  A peek at your development blog indicates
that things are still changing at a moderate rate?

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [RFC PATCH 0/2] Fix linux/swap.h build wart

2007-10-26 Thread Luck, Tony
> These patches propose to break the recursion instead by removing the
> linux/pagemap.h -> linux/highmem.h inclusion.  I'd like to know
> whether there are any fundamental reasons that this include should be
> preserved.

Not fundamental reasons, but these patches break the ia64 build for most
configs with:

In file included from kernel/futex.c:59:
include/asm/futex.h: In function `futex_atomic_op_inuser':
include/asm/futex.h:62: error: implicit declaration of function 
`pagefault_disable'
include/asm/futex.h:86: error: implicit declaration of function 
`pagefault_enable'
make[1]: *** [kernel/futex.o] Error 1

Adding an include of  to kernel/futex.c is sufficient (but
not necessarily right) to unbreak this.

A few configs also fail with:
mm/migrate.c: In function `migrate_page_copy':
mm/migrate.c:359: error: implicit declaration of function `copy_highpage'
make[1]: *** [mm/migrate.o] Error 1

Making mm/migrate.c include  fixes this one.

-Tony
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 1/2] Remove highmem.h include from pagemap.h

2007-10-26 Thread Jeff Dike
There has been a long-standing wart in linux/swap.h where it uses
page_cache_release and release_pages without declaring them by
including linux/pagemap.h.  There is this scary comment to warn off
anyone foolish enough to try to fix this:

   /* only sparc can not include linux/pagemap.h in this file
* so leave page_cache_release and release_pages undeclared... */

This has caused problems for a number of people including:
 akpm - http://www.ussg.iu.edu/hypermail/linux/kernel/0708.3/0446.html
"Oh that damn thing again.  It's a regular source of
include breakage."
 ppc - http://patchwork.ozlabs.org/linuxppc/patch?id=12288
 UML - http://www.mail-archive.com/[EMAIL PROTECTED]/msg05095.html

What happens with sparc, when this include is added, is a linux/mm.h ->
linux/mm.h recursion: 
 linux/mm.h -> asm/pgtable.h -> linux/swap.h -> linux/pagemap.h ->
linux/highmem.h -> linux/mm.h
leaving various things that live in mm.h undefined in highmem.h.

sparc is unique in the asm/pgtable.h -> linux/swap.h inclusion.  This
looks correct, as pgtable.h uses swp_entry_t, which is defined in
swap.h, so maybe the other arches should do the same.  Adding this to
the UML pgtable.h causes the same build breakage.

All of the links in the recursion above look reasonable, except for
linux/pagemap.h -> linux/highmem.h, which is the point of attack for
this fix.

linux/pagemap.h uses nothing from linux/highmem.h, so doesn't need to
include it.  However, there are a number of files which use stuff from
highmem.h, don't include it, but do include pagemap.h.  This patch
removes the linux/pagemap.h -> linux/highmem.h inclusion and adds a
linux/highmem.h include to every file in the tree which now needs one.

---
 arch/parisc/kernel/cache.c   |1 +
 drivers/block/rd.c   |1 +
 drivers/dma/iovlock.c|1 +
 drivers/media/video/ivtv/ivtv-udma.c |1 +
 drivers/net/e1000/e1000_main.c   |1 +
 drivers/net/e1000e/netdev.c  |1 +
 fs/9p/vfs_addr.c |1 +
 fs/affs/file.c   |1 +
 fs/affs/symlink.c|1 +
 fs/afs/dir.c |1 +
 fs/afs/fsclient.c|1 +
 fs/afs/mntpt.c   |1 +
 fs/afs/rxrpc.c   |1 +
 fs/afs/write.c   |1 +
 fs/cifs/file.c   |1 +
 fs/cifs/inode.c  |1 +
 fs/coda/symlink.c|1 +
 fs/cramfs/inode.c|1 +
 fs/dlm/lowcomms.c|1 +
 fs/ecryptfs/mmap.c   |1 +
 fs/ecryptfs/read_write.c |1 +
 fs/efs/symlink.c |1 +
 fs/ext2/dir.c|1 +
 fs/ext2/namei.c  |1 +
 fs/ext3/inode.c  |1 +
 fs/ext4/extents.c|1 +
 fs/ext4/inode.c  |1 +
 fs/freevxfs/vxfs_immed.c |1 +
 fs/freevxfs/vxfs_subr.c  |1 +
 fs/fuse/dev.c|1 +
 fs/gfs2/bmap.c   |1 +
 fs/gfs2/lops.c   |1 +
 fs/gfs2/ops_address.c|1 +
 fs/gfs2/ops_file.c   |1 +
 fs/hfs/bnode.c   |1 +
 fs/hfs/btree.c   |1 +
 fs/hfsplus/bitmap.c  |1 +
 fs/hfsplus/bnode.c   |1 +
 fs/hfsplus/btree.c   |1 +
 fs/hostfs/hostfs_kern.c  |1 +
 fs/hpfs/namei.c  |1 +
 fs/isofs/compress.c  |1 +
 fs/isofs/rock.c  |1 +
 fs/jbd/journal.c |1 +
 fs/jbd2/journal.c|1 +
 fs/jffs2/fs.c|1 +
 fs/jfs/super.c   |1 +
 fs/libfs.c   |1 +
 fs/minix/namei.c |1 +
 fs/namei.c   |1 +
 fs/ncpfs/dir.c   |1 +
 fs/ncpfs/mmap.c  |1 +
 fs/ncpfs/symlink.c   |1 +
 fs/nfs/dir.c |1 +
 fs/nfs/nfs2xdr.c |1 +
 fs/nfs/nfs3xdr.c |1 +
 fs/nfs/nfs4proc.c|1 +
 fs/nfs/nfs4xdr.c |1 +
 fs/nfs/read.c|1 +
 fs/nfs/symlink.c |1 +
 fs/nfs/write.c   |1 +
 fs/ntfs/aops.c   |1 +
 fs/ntfs/file.c   |1 +
 fs/ocfs2/symlink.c   |1 +
 fs/reiserfs/ioctl.c  |1 +
 fs/reiserfs/stree.c  |1 +
 fs/reiserfs/tail_conversion.c|1 +
 fs/reiserfs/xattr.c  |1 +
 fs/romfs/inode.c |1 +

[RFC PATCH 0/2] Fix linux/swap.h build wart

2007-10-26 Thread Jeff Dike
For some time, there has been a problem of linux/swap.h using
page_cache_release and release_pages without declaring them by
including linux/pagemap.h.  pagemap.h isn't included because that
breaks the sparc build.

The full details are in the next post, but the short story is that
sparc's pgtable.h includes linux/swap.h, creating a mm.h -> mm.h
recursion.  The current tree breaks that recursion by removing the
swap.h -> pagemap.h inclusion.  This breaks builds on a regular basis
(references also in the next post).

These patches propose to break the recursion instead by removing the
linux/pagemap.h -> linux/highmem.h inclusion.  I'd like to know
whether there are any fundamental reasons that this include should be
preserved.

The reason for choosing this include is that pagemap.h doesn't use
anything exported by highmem.h.  The downside is that a bunch of
things now need to include highmem.h.  These are mostly filesystems,
plus a good amount of arch code and a few drivers.  I would argue that
if you use something from a header, you should include it, regardless
of whether you already get the header indirectly, so this isn't too
much of a downside as far as I'm concerned.

The one thing I'm uncertain about is whether pagemap.h and highmem.h
have some basic relationship that implies that you can assume you get
highmem when you include pagemap.

The two patches that follow do the following:
remove highmem.h from pagemap.h and add includes of highmem.h
to a number of files that now need them
include pagemap.h in swap.h

These are for comment only, not for mainline, so there's no S-o-b
right yet.

At this point, I haven't confirmed that the following files, which use
stuff from highmem.h but don't include it, still build:

./arch/x86/kernel/crash_dump_64.c
./arch/frv/mm/fault.c
./arch/frv/mm/kmap.c
./arch/h8300/mm/kmap.c
./arch/ia64/kernel/crash_dump.c
./arch/m68k/mm/kmap.c
./arch/m68knommu/mm/kmap.c
./arch/powerpc/kernel/crash_dump.c
./arch/sh/kernel/crash_dump.c
./arch/xtensa/mm/pgtable.c
./drivers/block/ps3disk.c
./drivers/infiniband/hw/ipath/ipath_verbs.h
./drivers/mmc/host/at91_mci.c
./drivers/mmc/host/mmci.h
./drivers/mmc/host/mmc_spi.c
./drivers/s390/block/xpram.c
./drivers/s390/char/tape_34xx.c
./drivers/kvm/paging_tmpl.h
./fs/hfsplus/hfsplus_fs.h
./include/asm-arm/cacheflush.h
./include/asm-frv/pgtable.h
./include/asm-x86/kexec_32.h
./include/asm-x86/kexec_64.h
./include/asm-x86/pgtable_32.h
./include/asm-m68k/motorola_pgalloc.h
./include/asm-m68k/sun3_pgalloc.h
./include/asm-m68k/sun3_pgtable.h
./include/asm-mips/cacheflush.h
./include/asm-mips/page.h
./include/asm-parisc/cacheflush.h
./include/asm-powerpc/pgtable-ppc32.h
./include/asm-ppc/pgtable.h
./include/asm-s390/kexec.h
./include/asm-sh/kexec.h
./mm/migrate.c

Jeff

-- 
Work email - jdike at linux dot intel dot com
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 2/2] Add pagemap.h include to swap.h

2007-10-26 Thread Jeff Dike
Include linux/pagemap.h in linux/swap.h and remove the comment saying
that it's not possible.

---
 include/linux/swap.h |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6.17/include/linux/swap.h
===
--- linux-2.6.17.orig/include/linux/swap.h  2007-10-24 10:05:09.0 
-0400
+++ linux-2.6.17/include/linux/swap.h   2007-10-25 22:59:29.0 -0400
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -282,8 +283,7 @@ static inline void disable_swap_token(vo
 
 #define si_swapinfo(val) \
do { (val)->freeswap = (val)->totalswap = 0; } while (0)
-/* only sparc can not include linux/pagemap.h in this file
- * so leave page_cache_release and release_pages undeclared... */
+
 #define free_page_and_swap_cache(page) \
page_cache_release(page)
 #define free_pages_and_swap_cache(pages, nr) \
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] VFS/Security: Rework inode_getsecurity and callers to return resulting buffer

2007-10-26 Thread David P. Quigley
On Fri, 2007-10-26 at 11:36 -0500, Serge E. Hallyn wrote:
> Quoting David P. Quigley ([EMAIL PROTECTED]):
> > On Fri, 2007-10-26 at 10:02 -0500, Serge E. Hallyn wrote:
> > > Quoting David P. Quigley ([EMAIL PROTECTED]):
> > > > On Thu, 2007-10-25 at 19:02 -0500, Serge E. Hallyn wrote:
> > > > > Quoting David P. Quigley ([EMAIL PROTECTED]):
> > > > > >  static int task_alloc_security(struct task_struct *task)
> > > > > > @@ -2423,14 +2397,22 @@ static const char 
> > > > > > *selinux_inode_xattr_getsuffix(void)
> > > > > >   *
> > > > > >   * Permission check is handled by selinux_inode_getxattr hook.
> > > > > >   */
> > > > > > -static int selinux_inode_getsecurity(const struct inode *inode, 
> > > > > > const char *name, void *buffer, size_t size, int err)
> > > > > > +static int selinux_inode_getsecurity(const struct inode *inode,
> > > > > > +   const char *name,
> > > > > > +   void **buffer)
> > > > > >  {
> > > > > > +   u32 size;
> > > > > > +   int error;
> > > > > > struct inode_security_struct *isec = inode->i_security;
> > > > > >  
> > > > > > if (strcmp(name, XATTR_SELINUX_SUFFIX))
> > > > > > return -EOPNOTSUPP;
> > > > > >  
> > > > > > -   return selinux_getsecurity(isec->sid, buffer, size);
> > > > > > +   error = security_sid_to_context(isec->sid, (char **)buffer, 
> > > > > > &size);
> > > > > 
> > > > > The only other downside I see here is that when the user just passes 
> > > > > in
> > > > > NULL for a buffer, security_sid_to_context() will still
> > > > > kmalloc the buffer only to have it immediately freed by
> > > > > xattr_getsecurity() through release_secctx().  I trust that isn't seen
> > > > > as any major performance impact?
> > > > 
> > > > There is no way to avoid this in the SELinux case. SELinux doesn't store
> > > > the sid to string mapping directly. Rather it takes the sid and then
> > > > builds the string from fields in the related structure. So regardless
> > > > this data is being allocated internally. The only issue I potentially
> > > > see is that if someone passes in null expecting just to get the length
> > > > we are actually returning a value. However we are changing the semantics
> > > > of the function so the old semantics are no longer valid.
> > > 
> > > Hmm?  Which semantics are no longer valid?
> > > 
> > > You're changing the semantincs of the in-kernel API, but userspace can
> > > still send in NULL to query the length of the buffer needed.  So if
> > > userspace does two getxattrs, one to get the length, then another to get
> > > the value, selinux will be kmallocing twice.
> > > 
> > > For a file manager doing a listing on a huge directory and wanting to
> > > list the selinux type, i could see that being a performance issue.  Of
> > > course they could get around that by sending in a 'reasonably large'
> > > buffer for a first try.
> > > 
> > 
> > Ok lets start this line of thought over again since it has been a while
> > since I wrote the patches and got almost no sleep last night. 
> > 
> > Your concerns are that we are double allocating buffers one of which we
> > are just going to immediately free after a copy. So inside the SELinux
> > helper function there was what I saw as generic code for handling
> > xattrs. This can be seen in the new function xattr_getsecurity which use
> > to be internal to SELinux (selinux_getsecurity). What we are doing is
> > grabbing the string which internally is being allocated anyway and if
> > our buffer passed in for the copy is null we just goto out returning the
> > length and freeing the buffer. So here is our standard null handling
> > that we had before. In LSMs where there is no internal allocation to
> > handle the getsecurity call this should introduce almost no overhead.
> 
> Ah, thanks, you reminded me of what I was trying to point out.
> 
> SMACK won't do allocations so it's ok.  SELinux will do allocations
> in any case so it's ok.  So in terms of current users it's fine, so I
> don't want to complaint too loudly.
> 
> But the now-generic xattr_getsecurity() call passes in 'buffer' from its
> stack, with no indication to the LSM of whether userspace passed in NULL
> or a buffer.  So if there *were* an lsm which had to allocate space to
> return data, but didn't want to do so when the user just asked for the
> length of the data, then that LSM would be out of luck.
> 
> So would you object to passing in a boolean telling the LSM whether the
> user had passed in a buffer in which to return data or not?

I don't see why we couldn't add a bool. I am just wondering are there
really use-cases where the developer is going to need this though.
Unfortunately I'm not all knowing so I can't foresee what us "Insane"
security people will do so I think it's a reasonable addition. I'll
rebase the patch and make these changes and hopefully have a new
revision to the list before the end of the day.

Dave

> 
> thanks,
> -serge
> 
>

Re: [PATCH 1/2] VFS/Security: Rework inode_getsecurity and callers to return resulting buffer

2007-10-26 Thread Serge E. Hallyn
Quoting David P. Quigley ([EMAIL PROTECTED]):
> On Fri, 2007-10-26 at 10:02 -0500, Serge E. Hallyn wrote:
> > Quoting David P. Quigley ([EMAIL PROTECTED]):
> > > On Thu, 2007-10-25 at 19:02 -0500, Serge E. Hallyn wrote:
> > > > Quoting David P. Quigley ([EMAIL PROTECTED]):
> > > > >  static int task_alloc_security(struct task_struct *task)
> > > > > @@ -2423,14 +2397,22 @@ static const char 
> > > > > *selinux_inode_xattr_getsuffix(void)
> > > > >   *
> > > > >   * Permission check is handled by selinux_inode_getxattr hook.
> > > > >   */
> > > > > -static int selinux_inode_getsecurity(const struct inode *inode, 
> > > > > const char *name, void *buffer, size_t size, int err)
> > > > > +static int selinux_inode_getsecurity(const struct inode *inode,
> > > > > + const char *name,
> > > > > + void **buffer)
> > > > >  {
> > > > > + u32 size;
> > > > > + int error;
> > > > >   struct inode_security_struct *isec = inode->i_security;
> > > > >  
> > > > >   if (strcmp(name, XATTR_SELINUX_SUFFIX))
> > > > >   return -EOPNOTSUPP;
> > > > >  
> > > > > - return selinux_getsecurity(isec->sid, buffer, size);
> > > > > + error = security_sid_to_context(isec->sid, (char **)buffer, 
> > > > > &size);
> > > > 
> > > > The only other downside I see here is that when the user just passes in
> > > > NULL for a buffer, security_sid_to_context() will still
> > > > kmalloc the buffer only to have it immediately freed by
> > > > xattr_getsecurity() through release_secctx().  I trust that isn't seen
> > > > as any major performance impact?
> > > 
> > > There is no way to avoid this in the SELinux case. SELinux doesn't store
> > > the sid to string mapping directly. Rather it takes the sid and then
> > > builds the string from fields in the related structure. So regardless
> > > this data is being allocated internally. The only issue I potentially
> > > see is that if someone passes in null expecting just to get the length
> > > we are actually returning a value. However we are changing the semantics
> > > of the function so the old semantics are no longer valid.
> > 
> > Hmm?  Which semantics are no longer valid?
> > 
> > You're changing the semantincs of the in-kernel API, but userspace can
> > still send in NULL to query the length of the buffer needed.  So if
> > userspace does two getxattrs, one to get the length, then another to get
> > the value, selinux will be kmallocing twice.
> > 
> > For a file manager doing a listing on a huge directory and wanting to
> > list the selinux type, i could see that being a performance issue.  Of
> > course they could get around that by sending in a 'reasonably large'
> > buffer for a first try.
> > 
> 
> Ok lets start this line of thought over again since it has been a while
> since I wrote the patches and got almost no sleep last night. 
> 
> Your concerns are that we are double allocating buffers one of which we
> are just going to immediately free after a copy. So inside the SELinux
> helper function there was what I saw as generic code for handling
> xattrs. This can be seen in the new function xattr_getsecurity which use
> to be internal to SELinux (selinux_getsecurity). What we are doing is
> grabbing the string which internally is being allocated anyway and if
> our buffer passed in for the copy is null we just goto out returning the
> length and freeing the buffer. So here is our standard null handling
> that we had before. In LSMs where there is no internal allocation to
> handle the getsecurity call this should introduce almost no overhead.

Ah, thanks, you reminded me of what I was trying to point out.

SMACK won't do allocations so it's ok.  SELinux will do allocations
in any case so it's ok.  So in terms of current users it's fine, so I
don't want to complaint too loudly.

But the now-generic xattr_getsecurity() call passes in 'buffer' from its
stack, with no indication to the LSM of whether userspace passed in NULL
or a buffer.  So if there *were* an lsm which had to allocate space to
return data, but didn't want to do so when the user just asked for the
length of the data, then that LSM would be out of luck.

So would you object to passing in a boolean telling the LSM whether the
user had passed in a buffer in which to return data or not?

thanks,
-serge

> For example in Casey's latest SMACK patch he has a table of the label
> strings and he can pass a pointer directly into that table back via the
> security_inode_getsecurity hook. 
>   In addition to this completes the lifecycle management that
> security_release_secctx attempts to perform. It doesn't seem right that
> if we require security_release_secctx to free the data we obtained from
> security_inode_getsecurity that the data buffer should be allocated
> outside of security_inode_getsecurity.
> 
> I hope this clears up any questions/concerns.
> 
> > -serge
> > -
> > To unsubscribe from this list: send th

Re: [patch] fs: restore nobh

2007-10-26 Thread Jan Kara
On Fri 26-10-07 00:49:41, Nick Piggin wrote:
> On Thu, Oct 25, 2007 at 09:07:36PM +0200, Jan Kara wrote:
> >   Hi,
> > 
> > > This is overdue, sorry. Got a little complicated, and I've been away from
> > > my filesystem test setup so I didn't want ot send it (lucky, coz I found
> > > a bug after more substantial testing).
> > > 
> > > Anyway, RFC?
> >   Hmm, maybe one comment/question:
> > 
> > > Index: linux-2.6/fs/buffer.c
> > > ===
> > > --- linux-2.6.orig/fs/buffer.c2007-10-08 14:09:35.0 +1000
> > > +++ linux-2.6/fs/buffer.c 2007-10-08 16:45:28.0 +1000
> > ...
> > 
> > > -/*
> > > - * Make sure any changes to nobh_commit_write() are reflected in
> > > - * nobh_truncate_page(), since it doesn't call commit_write().
> > > - */
> > > -int nobh_commit_write(struct file *file, struct page *page,
> > > - unsigned from, unsigned to)
> > > +int nobh_write_end(struct file *file, struct address_space *mapping,
> > > + loff_t pos, unsigned len, unsigned copied,
> > > + struct page *page, void *fsdata)
> > >  {
> > >   struct inode *inode = page->mapping->host;
> > > - loff_t pos = ((loff_t)page->index << PAGE_CACHE_SHIFT) + to;
> > > + struct buffer_head *head = NULL;
> > > + struct buffer_head *bh;
> > >  
> > > - if (page_has_buffers(page))
> > > - return generic_commit_write(file, page, from, to);
> > > + if (!PageMappedToDisk(page)) {
> > > + if (unlikely(copied < len) && !page_has_buffers(page))
> > > + attach_nobh_buffers(page, head);
> > > + if (page_has_buffers(page))
> > > + return generic_write_end(file, mapping, pos, len, 
> > > copied, page, fsdata);
> > > + }
> >   So we can have a page with attached buffers but we decide to not call
> > generic_write_end()...
> > 
> > >   SetPageUptodate(page);
> > >   set_page_dirty(page);
> >   And we just set the page dirty but we leave buffers clean. So cannot
> > subsequent try_to_free_buffers() come, mark page as clean (as buffers
> > are clean) and discard the data?
> 
> Hmm, we might just be OK here because set_page_dirty should dirty the
> buffers. However, I think you have spotted a mistake here and it would be
> better if the generic_write_end test was outside the PageMapedToDisk
> check.
  But set_page_dirty() does not dirty buffers. That is only done by
set_page_dirty_buffers(). Or am I missing something?

Honza
-- 
Jan Kara <[EMAIL PROTECTED]>
SUSE Labs, CR
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] VFS/Security: Rework inode_getsecurity and callers to return resulting buffer

2007-10-26 Thread David P. Quigley
On Fri, 2007-10-26 at 10:02 -0500, Serge E. Hallyn wrote:
> Quoting David P. Quigley ([EMAIL PROTECTED]):
> > On Thu, 2007-10-25 at 19:02 -0500, Serge E. Hallyn wrote:
> > > Quoting David P. Quigley ([EMAIL PROTECTED]):
> > > >  static int task_alloc_security(struct task_struct *task)
> > > > @@ -2423,14 +2397,22 @@ static const char 
> > > > *selinux_inode_xattr_getsuffix(void)
> > > >   *
> > > >   * Permission check is handled by selinux_inode_getxattr hook.
> > > >   */
> > > > -static int selinux_inode_getsecurity(const struct inode *inode, const 
> > > > char *name, void *buffer, size_t size, int err)
> > > > +static int selinux_inode_getsecurity(const struct inode *inode,
> > > > +   const char *name,
> > > > +   void **buffer)
> > > >  {
> > > > +   u32 size;
> > > > +   int error;
> > > > struct inode_security_struct *isec = inode->i_security;
> > > >  
> > > > if (strcmp(name, XATTR_SELINUX_SUFFIX))
> > > > return -EOPNOTSUPP;
> > > >  
> > > > -   return selinux_getsecurity(isec->sid, buffer, size);
> > > > +   error = security_sid_to_context(isec->sid, (char **)buffer, 
> > > > &size);
> > > 
> > > The only other downside I see here is that when the user just passes in
> > > NULL for a buffer, security_sid_to_context() will still
> > > kmalloc the buffer only to have it immediately freed by
> > > xattr_getsecurity() through release_secctx().  I trust that isn't seen
> > > as any major performance impact?
> > 
> > There is no way to avoid this in the SELinux case. SELinux doesn't store
> > the sid to string mapping directly. Rather it takes the sid and then
> > builds the string from fields in the related structure. So regardless
> > this data is being allocated internally. The only issue I potentially
> > see is that if someone passes in null expecting just to get the length
> > we are actually returning a value. However we are changing the semantics
> > of the function so the old semantics are no longer valid.
> 
> Hmm?  Which semantics are no longer valid?
> 
> You're changing the semantincs of the in-kernel API, but userspace can
> still send in NULL to query the length of the buffer needed.  So if
> userspace does two getxattrs, one to get the length, then another to get
> the value, selinux will be kmallocing twice.
> 
> For a file manager doing a listing on a huge directory and wanting to
> list the selinux type, i could see that being a performance issue.  Of
> course they could get around that by sending in a 'reasonably large'
> buffer for a first try.
> 

Ok lets start this line of thought over again since it has been a while
since I wrote the patches and got almost no sleep last night. 

Your concerns are that we are double allocating buffers one of which we
are just going to immediately free after a copy. So inside the SELinux
helper function there was what I saw as generic code for handling
xattrs. This can be seen in the new function xattr_getsecurity which use
to be internal to SELinux (selinux_getsecurity). What we are doing is
grabbing the string which internally is being allocated anyway and if
our buffer passed in for the copy is null we just goto out returning the
length and freeing the buffer. So here is our standard null handling
that we had before. In LSMs where there is no internal allocation to
handle the getsecurity call this should introduce almost no overhead.
For example in Casey's latest SMACK patch he has a table of the label
strings and he can pass a pointer directly into that table back via the
security_inode_getsecurity hook. 
In addition to this completes the lifecycle management that
security_release_secctx attempts to perform. It doesn't seem right that
if we require security_release_secctx to free the data we obtained from
security_inode_getsecurity that the data buffer should be allocated
outside of security_inode_getsecurity.

I hope this clears up any questions/concerns.

> -serge
> -
> To unsubscribe from this list: send the line "unsubscribe 
> linux-security-module" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] VFS/Security: Rework inode_getsecurity and callers to return resulting buffer

2007-10-26 Thread Serge E. Hallyn
Quoting Stephen Smalley ([EMAIL PROTECTED]):
> On Fri, 2007-10-26 at 10:02 -0500, Serge E. Hallyn wrote:
> > Quoting David P. Quigley ([EMAIL PROTECTED]):
> > > On Thu, 2007-10-25 at 19:02 -0500, Serge E. Hallyn wrote:
> > > > Quoting David P. Quigley ([EMAIL PROTECTED]):
> > > > >  static int task_alloc_security(struct task_struct *task)
> > > > > @@ -2423,14 +2397,22 @@ static const char 
> > > > > *selinux_inode_xattr_getsuffix(void)
> > > > >   *
> > > > >   * Permission check is handled by selinux_inode_getxattr hook.
> > > > >   */
> > > > > -static int selinux_inode_getsecurity(const struct inode *inode, 
> > > > > const char *name, void *buffer, size_t size, int err)
> > > > > +static int selinux_inode_getsecurity(const struct inode *inode,
> > > > > + const char *name,
> > > > > + void **buffer)
> > > > >  {
> > > > > + u32 size;
> > > > > + int error;
> > > > >   struct inode_security_struct *isec = inode->i_security;
> > > > >  
> > > > >   if (strcmp(name, XATTR_SELINUX_SUFFIX))
> > > > >   return -EOPNOTSUPP;
> > > > >  
> > > > > - return selinux_getsecurity(isec->sid, buffer, size);
> > > > > + error = security_sid_to_context(isec->sid, (char **)buffer, 
> > > > > &size);
> > > > 
> > > > The only other downside I see here is that when the user just passes in
> > > > NULL for a buffer, security_sid_to_context() will still
> > > > kmalloc the buffer only to have it immediately freed by
> > > > xattr_getsecurity() through release_secctx().  I trust that isn't seen
> > > > as any major performance impact?
> > > 
> > > There is no way to avoid this in the SELinux case. SELinux doesn't store
> > > the sid to string mapping directly. Rather it takes the sid and then
> > > builds the string from fields in the related structure. So regardless
> > > this data is being allocated internally. The only issue I potentially
> > > see is that if someone passes in null expecting just to get the length
> > > we are actually returning a value. However we are changing the semantics
> > > of the function so the old semantics are no longer valid.
> > 
> > Hmm?  Which semantics are no longer valid?
> > 
> > You're changing the semantincs of the in-kernel API, but userspace can
> > still send in NULL to query the length of the buffer needed.  So if
> > userspace does two getxattrs, one to get the length, then another to get
> > the value, selinux will be kmallocing twice.
> > 
> > For a file manager doing a listing on a huge directory and wanting to
> > list the selinux type, i could see that being a performance issue.  Of
> > course they could get around that by sending in a 'reasonably large'
> > buffer for a first try.
> 
> That's what current userland does. libselinux always tries with an
> initial buffer first (and usually succeeds), thereby avoiding the second
> call to getxattr in the common case.

Ok - I figured for doing thousands of these in one directory listing
that could waste quite a bit of memory, but since (as i check) selinux
has always done a kmalloc for every getsecurity call, I guess it's a
fair tradeoff

thanks,
-serge
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] VFS/Security: Rework inode_getsecurity and callers to return resulting buffer

2007-10-26 Thread Stephen Smalley
On Fri, 2007-10-26 at 10:02 -0500, Serge E. Hallyn wrote:
> Quoting David P. Quigley ([EMAIL PROTECTED]):
> > On Thu, 2007-10-25 at 19:02 -0500, Serge E. Hallyn wrote:
> > > Quoting David P. Quigley ([EMAIL PROTECTED]):
> > > >  static int task_alloc_security(struct task_struct *task)
> > > > @@ -2423,14 +2397,22 @@ static const char 
> > > > *selinux_inode_xattr_getsuffix(void)
> > > >   *
> > > >   * Permission check is handled by selinux_inode_getxattr hook.
> > > >   */
> > > > -static int selinux_inode_getsecurity(const struct inode *inode, const 
> > > > char *name, void *buffer, size_t size, int err)
> > > > +static int selinux_inode_getsecurity(const struct inode *inode,
> > > > +   const char *name,
> > > > +   void **buffer)
> > > >  {
> > > > +   u32 size;
> > > > +   int error;
> > > > struct inode_security_struct *isec = inode->i_security;
> > > >  
> > > > if (strcmp(name, XATTR_SELINUX_SUFFIX))
> > > > return -EOPNOTSUPP;
> > > >  
> > > > -   return selinux_getsecurity(isec->sid, buffer, size);
> > > > +   error = security_sid_to_context(isec->sid, (char **)buffer, 
> > > > &size);
> > > 
> > > The only other downside I see here is that when the user just passes in
> > > NULL for a buffer, security_sid_to_context() will still
> > > kmalloc the buffer only to have it immediately freed by
> > > xattr_getsecurity() through release_secctx().  I trust that isn't seen
> > > as any major performance impact?
> > 
> > There is no way to avoid this in the SELinux case. SELinux doesn't store
> > the sid to string mapping directly. Rather it takes the sid and then
> > builds the string from fields in the related structure. So regardless
> > this data is being allocated internally. The only issue I potentially
> > see is that if someone passes in null expecting just to get the length
> > we are actually returning a value. However we are changing the semantics
> > of the function so the old semantics are no longer valid.
> 
> Hmm?  Which semantics are no longer valid?
> 
> You're changing the semantincs of the in-kernel API, but userspace can
> still send in NULL to query the length of the buffer needed.  So if
> userspace does two getxattrs, one to get the length, then another to get
> the value, selinux will be kmallocing twice.
> 
> For a file manager doing a listing on a huge directory and wanting to
> list the selinux type, i could see that being a performance issue.  Of
> course they could get around that by sending in a 'reasonably large'
> buffer for a first try.

That's what current userland does. libselinux always tries with an
initial buffer first (and usually succeeds), thereby avoiding the second
call to getxattr in the common case.

-- 
Stephen Smalley
National Security Agency

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] VFS/Security: Rework inode_getsecurity and callers to return resulting buffer

2007-10-26 Thread David P. Quigley
On Fri, 2007-10-26 at 11:13 -0400, David P. Quigley wrote:
> On Fri, 2007-10-26 at 10:02 -0500, Serge E. Hallyn wrote:
> > Quoting David P. Quigley ([EMAIL PROTECTED]):
> > > On Thu, 2007-10-25 at 19:02 -0500, Serge E. Hallyn wrote:
> > > > Quoting David P. Quigley ([EMAIL PROTECTED]):
> > > > >  static int task_alloc_security(struct task_struct *task)
> > > > > @@ -2423,14 +2397,22 @@ static const char 
> > > > > *selinux_inode_xattr_getsuffix(void)
> > > > >   *
> > > > >   * Permission check is handled by selinux_inode_getxattr hook.
> > > > >   */
> > > > > -static int selinux_inode_getsecurity(const struct inode *inode, 
> > > > > const char *name, void *buffer, size_t size, int err)
> > > > > +static int selinux_inode_getsecurity(const struct inode *inode,
> > > > > + const char *name,
> > > > > + void **buffer)
> > > > >  {
> > > > > + u32 size;
> > > > > + int error;
> > > > >   struct inode_security_struct *isec = inode->i_security;
> > > > >  
> > > > >   if (strcmp(name, XATTR_SELINUX_SUFFIX))
> > > > >   return -EOPNOTSUPP;
> > > > >  
> > > > > - return selinux_getsecurity(isec->sid, buffer, size);
> > > > > + error = security_sid_to_context(isec->sid, (char **)buffer, 
> > > > > &size);
> > > > 
> > > > The only other downside I see here is that when the user just passes in
> > > > NULL for a buffer, security_sid_to_context() will still
> > > > kmalloc the buffer only to have it immediately freed by
> > > > xattr_getsecurity() through release_secctx().  I trust that isn't seen
> > > > as any major performance impact?
> > > 
> > > There is no way to avoid this in the SELinux case. SELinux doesn't store
> > > the sid to string mapping directly. Rather it takes the sid and then
> > > builds the string from fields in the related structure. So regardless
> > > this data is being allocated internally. The only issue I potentially
> > > see is that if someone passes in null expecting just to get the length
> > > we are actually returning a value. However we are changing the semantics
> > > of the function so the old semantics are no longer valid.
> > 
> > Hmm?  Which semantics are no longer valid?
> > 
> > You're changing the semantincs of the in-kernel API, but userspace can
> > still send in NULL to query the length of the buffer needed.  So if
> > userspace does two getxattrs, one to get the length, then another to get
> > the value, selinux will be kmallocing twice.
> 
> 
> We are changing the semantics of the LSM hook. From a memory allocation
> stand point the only difference is who allocates the buffer. Before this
> patch we would allocate one outside of the LSM hook and then fill it
> (even if null were passed). Since the LSM hook is in a better position
> to accurately create this buffer we pushed the allocation to it.
> 
> > For a file manager doing a listing on a huge directory and wanting to
> > list the selinux type, i could see that being a performance issue.  Of
> > course they could get around that by sending in a 'reasonably large'
> > buffer for a first try.
> 
> I'd agree but these allocations have always been happening and no one
> has seen a problem with it. There should be the same number of
> allocations as before the question is where are they done.

This statement applies to SELinux. In actuality it is possible for an
LSM to have less allocation than before with this model.

> 
> > 
> > -serge
> 
> -
> To unsubscribe from this list: send the line "unsubscribe 
> linux-security-module" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] VFS/Security: Rework inode_getsecurity and callers to return resulting buffer

2007-10-26 Thread David P. Quigley
On Fri, 2007-10-26 at 10:07 -0500, Serge E. Hallyn wrote:
> Quoting David P. Quigley ([EMAIL PROTECTED]):
> > On Thu, 2007-10-25 at 19:02 -0500, Serge E. Hallyn wrote:
> > > Quoting David P. Quigley ([EMAIL PROTECTED]):
> > > > This patch modifies the interface to inode_getsecurity to have 
> > > > the
> > > > function return a buffer containing the security blob and its length via
> > > > parameters instead of relying on the calling function to give it an
> > > > appropriately sized buffer. Security blobs obtained with this function
> > > > should be freed using the release_secctx LSM hook. This alleviates the
> > > > problem of the caller having to guess a length and preallocate a buffer
> > > > for this function allowing it to be used elsewhere for Labeled NFS. The
> > > > patch also removed the unused err parameter. The conversion is similar
> > > > to the one performed by Al Viro for the security_getprocattr hook.
> > > > 
> > > > Signed-off-by: David P. Quigley <[EMAIL PROTECTED]>
> > > > ---
> > > >  fs/xattr.c   |   26 --
> > > >  include/linux/security.h |   27 ++-
> > > >  include/linux/xattr.h|1 +
> > > >  mm/shmem.c   |3 +--
> > > >  security/dummy.c |4 +++-
> > > >  security/selinux/hooks.c |   38 ++
> > > 
> > > (Hmm, I was about to ask if this diffstat could be complete, as it
> > > doesn't have for instance security/security.c, but I guess this predates
> > > the staticlsm patch...)
> > 
> > It wouldn't be much effort to rebase this patch against Linus's latest
> > tree. I am assuming that the static lsm patch is in there based on the
> > recent discussion on LKML?
> 
> Oh, sorry for the two emails.
> 
> Yeah it's in 2.6.24.  So a rebase will be necessary anyway.  I was just
> saying I was too lazy to find another tree against which to check that
> you didn't miss any getsecurity calls (hidden under some exotic .config)
> to change their arguments  :)

I used the LXR to get all uses of getsecurity so I am pretty sure I have
them all.

> 
> -serge
> -
> To unsubscribe from this list: send the line "unsubscribe 
> linux-security-module" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] VFS/Security: Rework inode_getsecurity and callers to return resulting buffer

2007-10-26 Thread David P. Quigley
On Fri, 2007-10-26 at 10:02 -0500, Serge E. Hallyn wrote:
> Quoting David P. Quigley ([EMAIL PROTECTED]):
> > On Thu, 2007-10-25 at 19:02 -0500, Serge E. Hallyn wrote:
> > > Quoting David P. Quigley ([EMAIL PROTECTED]):
> > > >  static int task_alloc_security(struct task_struct *task)
> > > > @@ -2423,14 +2397,22 @@ static const char 
> > > > *selinux_inode_xattr_getsuffix(void)
> > > >   *
> > > >   * Permission check is handled by selinux_inode_getxattr hook.
> > > >   */
> > > > -static int selinux_inode_getsecurity(const struct inode *inode, const 
> > > > char *name, void *buffer, size_t size, int err)
> > > > +static int selinux_inode_getsecurity(const struct inode *inode,
> > > > +   const char *name,
> > > > +   void **buffer)
> > > >  {
> > > > +   u32 size;
> > > > +   int error;
> > > > struct inode_security_struct *isec = inode->i_security;
> > > >  
> > > > if (strcmp(name, XATTR_SELINUX_SUFFIX))
> > > > return -EOPNOTSUPP;
> > > >  
> > > > -   return selinux_getsecurity(isec->sid, buffer, size);
> > > > +   error = security_sid_to_context(isec->sid, (char **)buffer, 
> > > > &size);
> > > 
> > > The only other downside I see here is that when the user just passes in
> > > NULL for a buffer, security_sid_to_context() will still
> > > kmalloc the buffer only to have it immediately freed by
> > > xattr_getsecurity() through release_secctx().  I trust that isn't seen
> > > as any major performance impact?
> > 
> > There is no way to avoid this in the SELinux case. SELinux doesn't store
> > the sid to string mapping directly. Rather it takes the sid and then
> > builds the string from fields in the related structure. So regardless
> > this data is being allocated internally. The only issue I potentially
> > see is that if someone passes in null expecting just to get the length
> > we are actually returning a value. However we are changing the semantics
> > of the function so the old semantics are no longer valid.
> 
> Hmm?  Which semantics are no longer valid?
> 
> You're changing the semantincs of the in-kernel API, but userspace can
> still send in NULL to query the length of the buffer needed.  So if
> userspace does two getxattrs, one to get the length, then another to get
> the value, selinux will be kmallocing twice.


We are changing the semantics of the LSM hook. From a memory allocation
stand point the only difference is who allocates the buffer. Before this
patch we would allocate one outside of the LSM hook and then fill it
(even if null were passed). Since the LSM hook is in a better position
to accurately create this buffer we pushed the allocation to it.

> For a file manager doing a listing on a huge directory and wanting to
> list the selinux type, i could see that being a performance issue.  Of
> course they could get around that by sending in a 'reasonably large'
> buffer for a first try.

I'd agree but these allocations have always been happening and no one
has seen a problem with it. There should be the same number of
allocations as before the question is where are they done.

> 
> -serge

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] VFS/Security: Rework inode_getsecurity and callers to return resulting buffer

2007-10-26 Thread Serge E. Hallyn
Quoting David P. Quigley ([EMAIL PROTECTED]):
> On Thu, 2007-10-25 at 19:02 -0500, Serge E. Hallyn wrote:
> > Quoting David P. Quigley ([EMAIL PROTECTED]):
> > >   This patch modifies the interface to inode_getsecurity to have the
> > > function return a buffer containing the security blob and its length via
> > > parameters instead of relying on the calling function to give it an
> > > appropriately sized buffer. Security blobs obtained with this function
> > > should be freed using the release_secctx LSM hook. This alleviates the
> > > problem of the caller having to guess a length and preallocate a buffer
> > > for this function allowing it to be used elsewhere for Labeled NFS. The
> > > patch also removed the unused err parameter. The conversion is similar
> > > to the one performed by Al Viro for the security_getprocattr hook.
> > > 
> > > Signed-off-by: David P. Quigley <[EMAIL PROTECTED]>
> > > ---
> > >  fs/xattr.c   |   26 --
> > >  include/linux/security.h |   27 ++-
> > >  include/linux/xattr.h|1 +
> > >  mm/shmem.c   |3 +--
> > >  security/dummy.c |4 +++-
> > >  security/selinux/hooks.c |   38 ++
> > 
> > (Hmm, I was about to ask if this diffstat could be complete, as it
> > doesn't have for instance security/security.c, but I guess this predates
> > the staticlsm patch...)
> 
> It wouldn't be much effort to rebase this patch against Linus's latest
> tree. I am assuming that the static lsm patch is in there based on the
> recent discussion on LKML?

Oh, sorry for the two emails.

Yeah it's in 2.6.24.  So a rebase will be necessary anyway.  I was just
saying I was too lazy to find another tree against which to check that
you didn't miss any getsecurity calls (hidden under some exotic .config)
to change their arguments  :)

-serge
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] VFS/Security: Rework inode_getsecurity and callers to return resulting buffer

2007-10-26 Thread Serge E. Hallyn
Quoting David P. Quigley ([EMAIL PROTECTED]):
> On Thu, 2007-10-25 at 19:02 -0500, Serge E. Hallyn wrote:
> > Quoting David P. Quigley ([EMAIL PROTECTED]):
> > >  static int task_alloc_security(struct task_struct *task)
> > > @@ -2423,14 +2397,22 @@ static const char 
> > > *selinux_inode_xattr_getsuffix(void)
> > >   *
> > >   * Permission check is handled by selinux_inode_getxattr hook.
> > >   */
> > > -static int selinux_inode_getsecurity(const struct inode *inode, const 
> > > char *name, void *buffer, size_t size, int err)
> > > +static int selinux_inode_getsecurity(const struct inode *inode,
> > > + const char *name,
> > > + void **buffer)
> > >  {
> > > + u32 size;
> > > + int error;
> > >   struct inode_security_struct *isec = inode->i_security;
> > >  
> > >   if (strcmp(name, XATTR_SELINUX_SUFFIX))
> > >   return -EOPNOTSUPP;
> > >  
> > > - return selinux_getsecurity(isec->sid, buffer, size);
> > > + error = security_sid_to_context(isec->sid, (char **)buffer, &size);
> > 
> > The only other downside I see here is that when the user just passes in
> > NULL for a buffer, security_sid_to_context() will still
> > kmalloc the buffer only to have it immediately freed by
> > xattr_getsecurity() through release_secctx().  I trust that isn't seen
> > as any major performance impact?
> 
> There is no way to avoid this in the SELinux case. SELinux doesn't store
> the sid to string mapping directly. Rather it takes the sid and then
> builds the string from fields in the related structure. So regardless
> this data is being allocated internally. The only issue I potentially
> see is that if someone passes in null expecting just to get the length
> we are actually returning a value. However we are changing the semantics
> of the function so the old semantics are no longer valid.

Hmm?  Which semantics are no longer valid?

You're changing the semantincs of the in-kernel API, but userspace can
still send in NULL to query the length of the buffer needed.  So if
userspace does two getxattrs, one to get the length, then another to get
the value, selinux will be kmallocing twice.

For a file manager doing a listing on a huge directory and wanting to
list the selinux type, i could see that being a performance issue.  Of
course they could get around that by sending in a 'reasonably large'
buffer for a first try.

-serge
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] VFS/Security: Rework inode_getsecurity and callers to return resulting buffer

2007-10-26 Thread David P. Quigley
On Thu, 2007-10-25 at 19:02 -0500, Serge E. Hallyn wrote:
> Quoting David P. Quigley ([EMAIL PROTECTED]):
> > This patch modifies the interface to inode_getsecurity to have the
> > function return a buffer containing the security blob and its length via
> > parameters instead of relying on the calling function to give it an
> > appropriately sized buffer. Security blobs obtained with this function
> > should be freed using the release_secctx LSM hook. This alleviates the
> > problem of the caller having to guess a length and preallocate a buffer
> > for this function allowing it to be used elsewhere for Labeled NFS. The
> > patch also removed the unused err parameter. The conversion is similar
> > to the one performed by Al Viro for the security_getprocattr hook.
> > 
> > Signed-off-by: David P. Quigley <[EMAIL PROTECTED]>
> > ---
> >  fs/xattr.c   |   26 --
> >  include/linux/security.h |   27 ++-
> >  include/linux/xattr.h|1 +
> >  mm/shmem.c   |3 +--
> >  security/dummy.c |4 +++-
> >  security/selinux/hooks.c |   38 ++
> 
> (Hmm, I was about to ask if this diffstat could be complete, as it
> doesn't have for instance security/security.c, but I guess this predates
> the staticlsm patch...)

It wouldn't be much effort to rebase this patch against Linus's latest
tree. I am assuming that the static lsm patch is in there based on the
recent discussion on LKML?

> 
> >  6 files changed, 53 insertions(+), 46 deletions(-)
> > 
> > diff --git a/fs/xattr.c b/fs/xattr.c
> > index a44fd92..d45c7ef 100644
> > --- a/fs/xattr.c
> > +++ b/fs/xattr.c
> > @@ -105,6 +105,29 @@ out:
> >  EXPORT_SYMBOL_GPL(vfs_setxattr);
> >  
> >  ssize_t
> > +xattr_getsecurity(struct inode *inode, const char *name, void *value,
> > +   size_t size)
> > +{
> > +   void *buffer = NULL;
> > +   ssize_t len;
> > +
> > +   len = security_inode_getsecurity(inode, name, &buffer);
> > +   if (len < 0)
> > +   return len;
> > +   if (!value || !size)
> > +   goto out;
> > +   if (size < len) {
> > +   len = -ERANGE;
> > +   goto out;
> > +   }
> > +   memcpy(value, buffer, len);
> > +out:
> > +   security_release_secctx(buffer, len);
> 
> This is mighty misleading in -ERANGE case :)  I realize that
> selinux_release_secctx() ignores len anyway.  But given the description
> in security.h, I'd say either you need to keep the actual length
> allocated and pass that in here, or (probably better) have another patch
> remove the second argument from security_release_secctx().

I would consider this a bug on my part. I don't think we can get rid of
the len argument to security_release_secctx because it is supposed to be
a generic destructor for security blobs. If the module always returned a
fixed length blob we could remove it but there is no telling what one of
the many new LSMs to come will do. 

> 
> > +   return len;
> > +}
> > +EXPORT_SYMBOL_GPL(xattr_getsecurity);
> > +
> > +ssize_t
> >  vfs_getxattr(struct dentry *dentry, char *name, void *value, size_t size)
> >  {
> > struct inode *inode = dentry->d_inode;
> > @@ -126,8 +149,7 @@ vfs_getxattr(struct dentry *dentry, char *name, void 
> > *value, size_t size)
> > if (!strncmp(name, XATTR_SECURITY_PREFIX,
> > XATTR_SECURITY_PREFIX_LEN)) {
> > const char *suffix = name + XATTR_SECURITY_PREFIX_LEN;
> > -   int ret = security_inode_getsecurity(inode, suffix, value,
> > -size, error);
> > +   int ret = xattr_getsecurity(inode, suffix, value, size);
> > /*
> >  * Only overwrite the return value if a security module
> >  * is actually active.
> > diff --git a/include/linux/security.h b/include/linux/security.h
> > index 1a15526..8658929 100644
> > --- a/include/linux/security.h
> > +++ b/include/linux/security.h
> > @@ -391,15 +391,11 @@ struct request_sock;
> >   * identified by @name for @dentry.
> >   * Return 0 if permission is granted.
> >   * @inode_getsecurity:
> > - * Copy the extended attribute representation of the security label 
> > - * associated with @name for @inode into @buffer.  @buffer may be
> > - * NULL to request the size of the buffer required.  @size indicates
> > - * the size of @buffer in bytes.  Note that @name is the remainder
> > - * of the attribute name after the security. prefix has been removed.
> > - * @err is the return value from the preceding fs getxattr call,
> > - * and can be used by the security module to determine whether it
> > - * should try and canonicalize the attribute value.
> > - * Return number of bytes used/required on success.
> > + * Retrieve a copy of the extended attribute representation of the
> > + * security label associated with @name for @inode via @buffer.  Note that
> > + * @name is the remainder of th

[PATCH] Add an ERR_CAST() function to complement ERR_PTR and co. [try #5.2]

2007-10-26 Thread David Howells
Add an ERR_CAST() function to complement ERR_PTR and co. for the purposes of
casting an error entyped as one pointer type to an error of another pointer
type whilst making it explicit as to what is going on.

This provides a replacement for the ERR_PTR(PTR_ERR(p)) construct.

Signed-off-by: David Howells <[EMAIL PROTECTED]>
---

 include/linux/err.h |   13 +
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/include/linux/err.h b/include/linux/err.h
index 1ab1d44..ec87f31 100644
--- a/include/linux/err.h
+++ b/include/linux/err.h
@@ -34,6 +34,19 @@ static inline long IS_ERR(const void *ptr)
return IS_ERR_VALUE((unsigned long)ptr);
 }
 
+/**
+ * ERR_CAST - Explicitly cast an error-valued pointer to another pointer type
+ * @ptr: The pointer to cast.
+ *
+ * Explicitly cast an error-valued pointer to another pointer type in such a
+ * way as to make it clear that's what's going on.
+ */
+static inline void *ERR_CAST(const void *ptr)
+{
+   /* cast away the const */
+   return (void *) ptr;
+}
+
 #endif
 
 #endif /* _LINUX_ERR_H */

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: msync(2) bug(?), returns AOP_WRITEPAGE_ACTIVATE to userland

2007-10-26 Thread Hugh Dickins
On Fri, 26 Oct 2007, Neil Brown wrote:
> On Thursday October 25, [EMAIL PROTECTED] wrote:
> 
> The patch looks like it makes perfect sense to me.

Great, thanks a lot for looking at it, Neil and Pekka.

> Before the change, ->writepage could return AOP_WRITEPAGE_ACTIVATE
> without unlocking the page, and this has precisely the effect of:
>ClearPageReclaim();  (if the call path was through pageout)
>SetPageActive();  (if the call was through shrink_page_list)
>unlock_page();
> 
> With the patch, the ->writepage method does the SetPageActive and the
> unlock_page, which on the whole seems cleaner.
> 
> We seem to have lost a call to ClearPageReclaim - I don't know if that
> is significant.

It doesn't show up in the diff at all, but pageout() already has
if (!PageWriteback(page)) {
/* synchronous write or broken a_ops? */
ClearPageReclaim(page);
}
which will clear it since we've never set PageWriteback.

I think no harm would come from leaving it set there, since it only
takes effect in end_page_writeback (its effect being to let the just
written page be moved to the end of the LRU, so that it will then
be soon reclaimed), and clear_page_dirty_for_io clears it before
coming down this way.  But I'd never argue for that: I hate having
leftover flags hanging around outside the scope of their relevance.

> > Special, hidden, undocumented, secret hack!  Then in 2.6.7 Andrew
> > stole his own secret and used it when concocting ramdisk_writepage.
> > Oh, and NFS made some kind of use of it in 2.6.6 only.  Then Neil
> > revealed the secret to the uninitiated in 2.6.17: now, what's the
> > appropriate punishment for that?
> 
> Surely the punishment should be for writing hidden undocumented hacks
> in the first place!  I vote we just make him maintainer for the whole
> kernel - that will keep him so busy that he will never have a chance
> to do it again :-)

That is a splendid retort, which has won you absolution.
But it makes me a little sad: that smiley should be a weepy.

> > --- 2.6.24-rc1/Documentation/filesystems/vfs.txt2007-10-24 
> > 07:15:11.0 +0100
> > +++ linux/Documentation/filesystems/vfs.txt 2007-10-24 08:42:07.0 
> > +0100
> > @@ -567,9 +567,7 @@ struct address_space_operations {
> >If wbc->sync_mode is WB_SYNC_NONE, ->writepage doesn't have to
> >try too hard if there are problems, and may choose to write out
> >other pages from the mapping if that is easier (e.g. due to
> > -  internal dependencies).  If it chooses not to start writeout, it
> > -  should return AOP_WRITEPAGE_ACTIVATE so that the VM will not keep
> > -  calling ->writepage on that page.
> > +  internal dependencies).
> >  
> 
> It seems that the new requirement is that if the address_space
> chooses not to write out the page, it should now call SetPageActive().
> If that is the case, I think it should be explicit in the
> documentation - please?

No, it's not the case; but you're right that I should add something
there, to put an end to the idea.  It'll be something along the lines
of "You may notice shmem setting PageActive there, but please don't do
that; or if you insist, be sure never to do so in the !wbc->for_reclaim
case".

The PageActive thing is for when a filesystem regrets that it even
had a ->writepage (it replicates the behaviour of the writepage == NULL
case or the VM_LOCKED SWAP_FAIL case or the !add_to_swap case, delaying
the return of this page to writepage for as long as it can).  It's done
in shmem_writepage because shm_lock (equivalent to VM_LOCKED) is only
discovered within that writepage, and no-swap is discovered there too.

ramdisk does it too: I've not tried to understand ramdisk as Nick and
Eric have, but it used to have no writepage, and would prefer to have
no writepage, but appears to need one for some PageUptodate reasons.

It's fairly normal for a filesystem to find that for some reason it
cannot carry out a writepage on this page right now (in the reclaim
case: the sync case demands action, IIRC); so it then simply does
set_page_dirty and unlock_page and returns "success".

I'll try to condense this down for the Doc when finalizing the patch;
which I've still not yet tested properly - thanks for the eyes, but
I can't submit it until I've checked in detail that it really gets
to do what we think it does.

Hugh
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Distributed storage. Move away from char device ioctls.

2007-10-26 Thread Evgeniy Polyakov
Returning back to this, since block based storage, which can act as a
shared storage/transport layer, is ready with 5'th release of the DST.

My couple of notes on proposed data distribution algorithm in FS.

On Sun, Sep 16, 2007 at 03:07:11AM -0400, Kyle Moffett ([EMAIL PROTECTED]) 
wrote:
> >I actually think there is a place for this - and improvements are  
> >definitely welcome.  Even Lustre needs block-device level  
> >redundancy currently, though we will be working to make Lustre- 
> >level redundancy available in the future (the problem is WAY harder  
> >than it seems at first glance, if you allow writeback caches at the  
> >clients and servers).
> 
> I really think that to get proper non-block-device-level filesystem  
> redundancy you need to base it on something similar to the GIT  
> model.  Data replication is done in specific-sized chunks indexed by  
> SHA-1 sum and you actually have a sort of "merge algorithm" for when  
> local and remote changes differ.  The OS would only implement a very  
> limited list of merge algorithms, IE one of:
> 
> (A)  Don't merge, each client gets its own branch and merges are manual
> (B)  Most recent changed version is made the master every X-seconds/ 
> open/close/write/other-event.
> (C)  The tree at X (usually a particular client/server) is always  
> used as the master when there are conflicts.

This looks like a good way to work with offline clients (where I recall
Coda), after offline node modified data, it should be merged back to the
cluster with different algorithms.

Data (supposed to be) written to the failed node during its offline time
will be resynced from other nodes when failed one is online, there are
no problems and/or special algorithms to be used here.

Filesystem replication is not a 100% 'git way' - git tree contains
already combined objects - i.e. the last blob for given path does not
contain its history, only ready to be used data, while filesystem,
especially that one which requires simultaneous write from different
threads/nodes, should implement copy-on-write semantics, essentially
putting all new data (git commit) to the new location and then collect
it from different extents to present a ready file.

At least that is how I see the filesystem I'm working on.

...

> There's a lot of other technical details which would need resolution  
> in an actual implementation, but this is enough of a summary to give  
> you the gist of the concept.  Most likely there will be some major  
> flaw which makes it impossible to produce reliably, but the concept  
> contains the things I would be interested in for a real "networked  
> filesystem".

Git semantics and copy-on-write has actually quite a lot in common (on
high enough level of abstraction), but SHA1 index is not a possible 
issue in filesystem - even besides amount of data to be hashed before
key is ready. Key should also contain enough information about what
underlying data is - git does not store that information (tree, blob or
whatever) in its keys, since it does not require it. At least that is
how I see it to be implemented.

Overall I see this new project as a true copy-on-write FS.

Thanks.

> Cheers,
> Kyle Moffett

-- 
Evgeniy Polyakov
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] VFS: new fgetattr() file operation

2007-10-26 Thread Miklos Szeredi
> On Fri, Oct 26, 2007 at 01:10:14AM +0200, Miklos Szeredi wrote:
> > > On Wed, Oct 24, 2007 at 05:27:04PM +0200, Miklos Szeredi wrote:
> > > > > >> Wouldn't you be better off by attempting to implement an "open
> > > > > >> by ino" operation and an operation to get the generation count
> > > > > >> for the file and then modifying the network protocol of interest
> > > > > >> to use these as identifiers for the file to be manipulated?
> > > > > >> 
> > > > > >
> > > > > > You mean an "open by inode" on the userspace API?  My guess, it
> > > > > > wouldn't get very far.
> > > > > 
> > > > > This isn't a new idea and has been implemented on a variety of
> > > > > different systems.
> > > > 
> > > > Like?
> > > 
> > > XFS.
> > > 
> > > 'man open_by_handle'
> > 
> > Doesn't seem widely used, with 600 something google hits.
> 
> from the man page:
> 
> "They are intended  for  use  by a limited set of system utilities such
> as backup programs."
> 
> It also gets used by HSMs and so it is current, tested and is not
> going away

Sure.

> > And in this
> > old thread Linus is not entirely enthusiastic about the concept:
> > 
> >   http://lkml.org/lkml/1999/1/11/244
> 
> That was "open by inode number", AFAICT. A handle is an opaque
> blob that can be an arbitrary length defined by the filesystem.
> You have to convert a fd or path to a handle first before you can
> use it later, so any filesystem can implement it...
> 
> i.e. it is exactly what this (unanswered) post suggested:
> 
> http://lkml.org/lkml/1999/1/13/186

Well, yeah, if a filesystem doesn't have a global index, the whole
path could just be stuffed into the handle.

But there's not much point in that, is there?  And because the
interface bypasses normal access checking on parent directories, it
has security implications, making it not generally useful.

Specifically, it is not useful for the "unprivileged file server" case
that Peter was suggesting.

Miklos
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: msync(2) bug(?), returns AOP_WRITEPAGE_ACTIVATE to userland

2007-10-26 Thread Pekka Enberg
Hi,

On 10/26/07, Neil Brown <[EMAIL PROTECTED]> wrote:
> It seems that the new requirement is that if the address_space
> chooses not to write out the page, it should now call SetPageActive().
> If that is the case, I think it should be explicit in the
> documentation - please?

Agreed.
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: msync(2) bug(?), returns AOP_WRITEPAGE_ACTIVATE to userland

2007-10-26 Thread Pekka Enberg
Hi Hugh,

On 10/25/07, Hugh Dickins <[EMAIL PROTECTED]> wrote:
> @@ -349,10 +349,6 @@ static pageout_t pageout(struct page *pa
> res = mapping->a_ops->writepage(page, &wbc);
> if (res < 0)
> handle_write_error(mapping, page, res);
> -   if (res == AOP_WRITEPAGE_ACTIVATE) {
> -   ClearPageReclaim(page);
> -   return PAGE_ACTIVATE;
> -   }

I don't see ClearPageReclaim added anywhere. Is that done on purpose?
Other than that, the patch looks good to me and I think we should
stick it into -mm to punish Andrew for his secret hack ;-).

  Pekka
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html