[PATCH 2/4] writeback: 3-queue based writeback schedule

2007-08-09 Thread Fengguang Wu
Properly manage the 3 queues of sb->s_dirty/s_io/s_more_io so that
- time-ordering of dirtied_when can be easily maintained
- writeback can continue from where the previous run left off

Most of the work has been done by Andrew Morton and Ken Chen;
this patch just clarifies the roles of the 3 queues:
- s_dirty   for io delay (up to dirty_expire_interval)
- s_io      for io run (a full scan of s_io may involve multiple runs)
- s_more_io for io continuation

The following diagram shows the data flow.

                             requeue on new scan (empty s_io)
                            +-------------------------------+
                            |                               |
 dirty           old enough V                               |
 inodes ==> s_dirty =====> s_io ----- requeue io -----> s_more_io
               ^            |
               |            +-----> disk write requests
               | hold back  |
               +------------+

sb->s_dirty: a FIFO queue
- s_dirty hosts not-yet-expired(recently dirtied) dirty inodes
- once expired, inodes will be moved out of s_dirty and *never put back*
  (unless for some reason we have to hold on to the inode for some time)

sb->s_io and sb->s_more_io: a cyclic queue scanned for io
- on each run of generic_sync_sb_inodes(), some more s_dirty inodes may be
  appended to s_io
- on each full scan of s_io, all s_more_io inodes will be moved back to s_io
- large files that cannot be synced in one run will be moved to s_more_io for
  retry on the next full scan

inode->dirtied_when
- inode->dirtied_when is updated to the *current* jiffies on pushing into
  s_dirty, and is never changed in other cases.
- time-ordering can thus be ensured simply by moving inodes between lists,
  since time order == enqueue order
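Below is a minimal userspace sketch of the scheme described above. It assumes a tiny stand-in for the kernel's circular list and plain integer ticks in place of jiffies. The helpers `queue_dirty_inode()`, `queue_for_more_io()` and `queue_inodes_for_io()` mirror the logic of the same-named functions in this series, but this is not the kernel code. `struct sb_queues`, `to_inode()` and `three_queue_demo()` are hypothetical names. As with the kernel lists, new entries go in at the head, so the `->prev` of a queue head is its oldest inode.

```c
#include <stdbool.h>
#include <stddef.h>

/* Tiny userspace stand-in for the kernel's circular doubly linked list. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

/* Insert right after the head, i.e. at the "newest" end of a queue. */
static void list_add(struct list_head *n, struct list_head *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	list_init(e);
}

static void list_move(struct list_head *e, struct list_head *h)
{
	list_del_init(e);
	list_add(e, h);
}

/* Move every entry of @from right after @to, leaving @from empty. */
static void list_splice_init(struct list_head *from, struct list_head *to)
{
	struct list_head *first = from->next, *last = from->prev, *at = to->next;

	if (list_empty(from))
		return;
	first->prev = to;
	to->next = first;
	last->next = at;
	at->prev = last;
	list_init(from);
}

/* Wrap-safe "a is later than b", like the kernel's time_after(). */
static bool time_after(unsigned long a, unsigned long b) { return (long)(b - a) < 0; }

struct inode { unsigned long dirtied_when; struct list_head i_list; };
struct sb_queues { struct list_head s_dirty, s_io, s_more_io; };
#define to_inode(p) ((struct inode *)((char *)(p) - offsetof(struct inode, i_list)))

/* dirtied_when is stamped only here, so enqueue order == time order. */
static void queue_dirty_inode(struct sb_queues *sb, struct inode *in, unsigned long now)
{
	in->dirtied_when = now;
	list_move(&in->i_list, &sb->s_dirty);
}

static void queue_for_more_io(struct sb_queues *sb, struct inode *in)
{
	list_move(&in->i_list, &sb->s_more_io);
}

static void queue_inodes_for_io(struct sb_queues *sb, unsigned long older_than_this)
{
	if (list_empty(&sb->s_io))	/* new full scan: s_more_io goes first */
		list_splice_init(&sb->s_more_io, &sb->s_io);
	while (!list_empty(&sb->s_dirty)) {
		struct inode *in = to_inode(sb->s_dirty.prev);	/* oldest */
		if (time_after(in->dirtied_when, older_than_this))
			break;			/* dirtied too recently */
		list_move(&in->i_list, &sb->s_io);
	}
}

/* Returns 1 if a two-run scenario behaves as the changelog describes. */
static int three_queue_demo(void)
{
	struct sb_queues sb;
	struct inode ino[4];
	const unsigned long when[4] = { 1, 2, 3, 10 };

	list_init(&sb.s_dirty);
	list_init(&sb.s_io);
	list_init(&sb.s_more_io);
	for (int i = 0; i < 4; i++) {
		list_init(&ino[i].i_list);
		queue_dirty_inode(&sb, &ino[i], when[i]);
	}
	queue_inodes_for_io(&sb, 5);	/* run 1: t=1..3 expire, t=10 held back */
	if (to_inode(sb.s_dirty.prev) != &ino[3] || to_inode(sb.s_io.prev) != &ino[0])
		return 0;
	list_del_init(&ino[0].i_list);		/* written out completely */
	queue_for_more_io(&sb, &ino[1]);	/* large file: needs more io */
	list_del_init(&ino[2].i_list);		/* written out completely */
	queue_inodes_for_io(&sb, 20);	/* run 2: s_io empty, new full scan */
	return to_inode(sb.s_io.prev) == &ino[1] && to_inode(sb.s_io.next) == &ino[3] &&
	       list_empty(&sb.s_dirty) && list_empty(&sb.s_more_io);
}
```

`three_queue_demo()` returns 1 when two conditions hold on the second run. The large file left on s_more_io must sit at the oldest end of s_io. It must also come ahead of the inode that has only just expired from s_dirty.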

Cc: Ken Chen <[EMAIL PROTECTED]>
Cc: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Fengguang Wu <[EMAIL PROTECTED]>
---
 fs/fs-writeback.c |  106 +---
 1 file changed, 52 insertions(+), 54 deletions(-)

--- linux-2.6.23-rc1-mm2.orig/fs/fs-writeback.c
+++ linux-2.6.23-rc1-mm2/fs/fs-writeback.c
@@ -93,6 +93,15 @@ static void __check_dirty_inode_list(str
__FILE__, __LINE__);\
} while (0)
 
+
+int sb_has_dirty_inodes(struct super_block *sb)
+{
+   return !list_empty(&sb->s_dirty) ||
+  !list_empty(&sb->s_io) ||
+  !list_empty(&sb->s_more_io);
+}
+EXPORT_SYMBOL(sb_has_dirty_inodes);
+
 /**
  * __mark_inode_dirty -internal function
  * @inode: inode to mark
@@ -187,7 +196,7 @@ void __mark_inode_dirty(struct inode *in
goto out;
 
/*
-* If the inode was already on s_dirty or s_io, don't
+* If the inode was already on s_dirty/s_io/s_more_io, don't
 * reposition it (that would break s_dirty time-ordering).
 */
if (!was_dirty) {
@@ -211,33 +220,20 @@ static int write_inode(struct inode *ino
 }
 
 /*
- * Redirty an inode: set its when-it-was dirtied timestamp and move it to the
- * furthest end of its superblock's dirty-inode list.
- *
- * Before stamping the inode's ->dirtied_when, we check to see whether it is
- * already the most-recently-dirtied inode on the s_dirty list.  If that is
- * the case then the inode must have been redirtied while it was being written
- * out and we don't reset its dirtied_when.
+ * Enqueue a newly dirtied inode:
+ * - set its when-it-was dirtied timestamp
+ * - move it to the furthest end of its superblock's dirty-inode list
  */
 static void redirty_tail(struct inode *inode)
 {
-   struct super_block *sb = inode->i_sb;
-
check_dirty_inode(inode);
-   if (!list_empty(&sb->s_dirty)) {
-   struct inode *tail_inode;
-
-   tail_inode = list_entry(sb->s_dirty.next, struct inode, i_list);
-   if (!time_after_eq(inode->dirtied_when,
-   tail_inode->dirtied_when))
-   inode->dirtied_when = jiffies;
-   }
-   list_move(&inode->i_list, &sb->s_dirty);
+   inode->dirtied_when = jiffies;
+   list_move(&inode->i_list, &inode->i_sb->s_dirty);
check_dirty_inode(inode);
 }
 
 /*
- * requeue inode for re-scanning after sb->s_io list is exhausted.
+ * Queue an inode for more io in the next full scan of s_io.
  */
 static void requeue_io(struct inode *inode)
 {
@@ -246,6 +242,32 @@ static void requeue_io(struct inode *ino
check_dirty_inode(inode);
 }
 
+/*
+ * Queue all possible inodes for a run of io.
+ * The resulting s_io is in order of:
+ * - inodes queued for more io from s_more_io(once for a full scan of s_io)
+ * - possible remaining inodes in s_io(was a partial scan)
+ * - dirty inodes (old enough) from s_dirty
+ */

[PATCH 3/4] writeback: function renames and cleanups

2007-08-09 Thread Fengguang Wu
Two function renames:
- rename redirty_tail() to queue_dirty_inode()
- rename requeue_io() to queue_for_more_io()

Also some code cleanups on fs-writeback.c. No behavior changes.

Cc: Ken Chen <[EMAIL PROTECTED]>
Cc: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Fengguang Wu <[EMAIL PROTECTED]>
---
 fs/fs-writeback.c |  133 
 1 file changed, 62 insertions(+), 71 deletions(-)

--- linux-2.6.23-rc1-mm2.orig/fs/fs-writeback.c
+++ linux-2.6.23-rc1-mm2/fs/fs-writeback.c
@@ -102,6 +102,55 @@ int sb_has_dirty_inodes(struct super_blo
 }
 EXPORT_SYMBOL(sb_has_dirty_inodes);
 
+/*
+ * Enqueue a newly dirtied inode:
+ * - set its when-it-was dirtied timestamp
+ * - move it to the furthest end of its superblock's dirty-inode list
+ */
+static void queue_dirty_inode(struct inode *inode)
+{
+   check_dirty_inode(inode);
+   inode->dirtied_when = jiffies;
+   list_move(&inode->i_list, &inode->i_sb->s_dirty);
+   check_dirty_inode(inode);
+}
+
+/*
+ * Queue an inode for more io in the next full scan of s_io.
+ */
+static void queue_for_more_io(struct inode *inode)
+{
+   check_dirty_inode(inode);
+   list_move(&inode->i_list, &inode->i_sb->s_more_io);
+   check_dirty_inode(inode);
+}
+
+/*
+ * Queue all possible inodes for a run of io.
+ * The resulting s_io is in order of:
+ * - inodes queued for more io from s_more_io(once for a full scan of s_io)
+ * - possible remaining inodes in s_io(was a partial scan)
+ * - dirty inodes (old enough) from s_dirty
+ */
+static void queue_inodes_for_io(struct super_block *sb,
+   unsigned long *older_than_this)
+{
+   check_dirty_inode_list(sb);
+   if (list_empty(&sb->s_io))
+   list_splice_init(&sb->s_more_io, &sb->s_io); /* eldest first */
+   check_dirty_inode_list(sb);
+   while (!list_empty(&sb->s_dirty)) {
+   struct inode *inode = list_entry(sb->s_dirty.prev,
+   struct inode, i_list);
+   /* Was this inode dirtied too recently? */
+   if (older_than_this &&
+   time_after(inode->dirtied_when, *older_than_this))
+   break;
+   list_move(&inode->i_list, &sb->s_io);
+   }
+   check_dirty_inode_list(sb);
+}
+
 /**
  * __mark_inode_dirty -internal function
  * @inode: inode to mark
@@ -199,12 +248,8 @@ void __mark_inode_dirty(struct inode *in
 * If the inode was already on s_dirty/s_io/s_more_io, don't
 * reposition it (that would break s_dirty time-ordering).
 */
-   if (!was_dirty) {
-   check_dirty_inode(inode);
-   inode->dirtied_when = jiffies;
-   list_move(&inode->i_list, &sb->s_dirty);
-   check_dirty_inode(inode);
-   }
+   if (!was_dirty)
+   queue_dirty_inode(inode);
}
 out:
spin_unlock(&inode_lock);
@@ -219,55 +264,6 @@ static int write_inode(struct inode *ino
return 0;
 }
 
-/*
- * Enqueue a newly dirtied inode:
- * - set its when-it-was dirtied timestamp
- * - move it to the furthest end of its superblock's dirty-inode list
- */
-static void redirty_tail(struct inode *inode)
-{
-   check_dirty_inode(inode);
-   inode->dirtied_when = jiffies;
-   list_move(&inode->i_list, &inode->i_sb->s_dirty);
-   check_dirty_inode(inode);
-}
-
-/*
- * Queue an inode for more io in the next full scan of s_io.
- */
-static void requeue_io(struct inode *inode)
-{
-   check_dirty_inode(inode);
-   list_move(&inode->i_list, &inode->i_sb->s_more_io);
-   check_dirty_inode(inode);
-}
-
-/*
- * Queue all possible inodes for a run of io.
- * The resulting s_io is in order of:
- * - inodes queued for more io from s_more_io(once for a full scan of s_io)
- * - possible remaining inodes in s_io(was a partial scan)
- * - dirty inodes (old enough) from s_dirty
- */
-static void queue_inodes_for_io(struct super_block *sb,
-   unsigned long *older_than_this)
-{
-   check_dirty_inode_list(sb);
-   if (list_empty(&sb->s_io))
-   list_splice_init(&sb->s_more_io, &sb->s_io); /* eldest first */
-   check_dirty_inode_list(sb);
-   while (!list_empty(&sb->s_dirty)) {
-   struct inode *inode = list_entry(sb->s_dirty.prev,
-   struct inode, i_list);
-   /* Was this inode dirtied too recently? */
-   if (older_than_this &&
-   time_after(inode->dirtied_when, *older_than_this))
-   break;
-   list_move(&inode->i_list, &sb->s_io);
-   }
-   check_dirty_inode_list(sb);
-}
-
 static void inode_sync_complete(struct inode *inode)
 {
/*
@@ -329,6 +325

[PATCH 1/4] writeback: check time-ordering of s_io and s_more_io

2007-08-09 Thread Fengguang Wu
It helps catch bugs like this:

[  738.645689] fs/fs-writeback.c:535: s_dirty got screwed up
[  738.646114] 8100028532b0:4295082249
[  738.646255] 810002856858:4295082259
[  738.646388] 810002831b58:4295082667
[  738.646520] 81000281b1b0:4295082671
[  738.646651] 81000281d798:4295083507
== s_dirty/s_io
[  738.646783] 81000287e998:4295081393
[  738.646916] 81000287e430:4295081403
[  738.647068] 8100028789d8:4295081409
[  738.647212] 810002878470:4295081415
[  738.647358] 810002884a18:4295081421
[  738.647503] 8100028844b0:4295081427
[  738.647648] 810002890a58:4295081433
[  738.647782] 8100028904f0:4295081441
[  738.647894] 81000288da98:4295081449
[  738.648011] 81000288d530:4295081455
[  738.648123] 810002897ad8:4295081461
[  738.648235] 810002897570:4295081469
[  738.648347] 810002894b18:4295081477
[  738.648459] 8100028945b0:4295081483

The buggy line 534 is
list_splice_init(&sb->s_io, sb->s_dirty.prev);

This is not the only time-ordering bug in linux-2.6.23-rc1-mm2.
Let's fix them all.
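The report above can be checked mechanically. The helper below is a hypothetical userspace sketch, not the kernel's `__check()`. It assumes that the printed values, in walk order, must never decrease, and it returns the index of the first value that does. The values are copied from the report and stored as `uint64_t`, since they exceed 32 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Wrap-safe "a is later than b" on 64-bit tick counts, like time_after(). */
static int later_than(uint64_t a, uint64_t b)
{
	return (int64_t)(b - a) < 0;
}

/*
 * Given dirtied_when values in the order the checker walks the list
 * (oldest end first), return the index of the first value that is older
 * than its predecessor, or -1 if the list is time-ordered.
 */
static long first_order_break(const uint64_t *when, size_t n)
{
	for (size_t i = 1; i < n; i++)
		if (later_than(when[i - 1], when[i]))
			return (long)i;
	return -1;
}

/* dirtied_when values from the report above, in printed order. */
static const uint64_t reported[] = {
	4295082249ULL, 4295082259ULL, 4295082667ULL, 4295082671ULL, 4295083507ULL,
	/* == s_dirty/s_io */
	4295081393ULL, 4295081403ULL, 4295081409ULL, 4295081415ULL, 4295081421ULL,
	4295081427ULL, 4295081433ULL, 4295081441ULL, 4295081449ULL, 4295081455ULL,
	4295081461ULL, 4295081469ULL, 4295081477ULL, 4295081483ULL,
};
```

Applied to all 19 values, it returns 5, which is the first entry after the "== s_dirty/s_io" marker. Each of the two groups is time-ordered on its own.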

Cc: Ken Chen <[EMAIL PROTECTED]>
Cc: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Fengguang Wu <[EMAIL PROTECTED]>
---
 fs/fs-writeback.c |   37 +++--
 1 file changed, 31 insertions(+), 6 deletions(-)

--- linux-2.6.23-rc1-mm2.orig/fs/fs-writeback.c
+++ linux-2.6.23-rc1-mm2/fs/fs-writeback.c
@@ -26,12 +26,12 @@
 
 int sysctl_inode_debug __read_mostly;
 
-static int __check(struct super_block *sb, int print_stuff)
+static int __check(struct list_head *head, int print_stuff)
 {
-   struct list_head *cursor = &sb->s_dirty;
+   struct list_head *cursor = head;
unsigned long dirtied_when = 0;
 
-   while ((cursor = cursor->prev) != &sb->s_dirty) {
+   while ((cursor = cursor->prev) != head) {
struct inode *inode = list_entry(cursor, struct inode, i_list);
if (print_stuff) {
printk("%p:%lu\n", inode, inode->dirtied_when);
@@ -51,14 +51,32 @@ static void __check_dirty_inode_list(str
if (!sysctl_inode_debug)
return;
 
-   if (__check(sb, 0)) {
+   if (__check(&sb->s_dirty, 0)) {
sysctl_inode_debug = 0;
if (inode)
printk("%s:%d: s_dirty got screwed up.  inode=%p:%lu\n",
file, line, inode, inode->dirtied_when);
else
printk("%s:%d: s_dirty got screwed up\n", file, line);
-   __check(sb, 1);
+   __check(&sb->s_dirty, 1);
+   }
+   if (__check(&sb->s_io, 0)) {
+   sysctl_inode_debug = 0;
+   if (inode)
+   printk("%s:%d: s_io got screwed up.  inode=%p:%lu\n",
+   file, line, inode, inode->dirtied_when);
+   else
+   printk("%s:%d: s_io got screwed up\n", file, line);
+   __check(&sb->s_io, 1);
+   }
+   if (__check(&sb->s_more_io, 0)) {
+   sysctl_inode_debug = 0;
+   if (inode)
+   printk("%s:%d: s_more_io got screwed up.  inode=%p:%lu\n",
+   file, line, inode, inode->dirtied_when);
+   else
+   printk("%s:%d: s_more_io got screwed up\n", file, line);
+   __check(&sb->s_more_io, 1);
}
 }
 
@@ -223,7 +241,9 @@ static void redirty_tail(struct inode *i
  */
 static void requeue_io(struct inode *inode)
 {
+   check_dirty_inode(inode);
list_move(&inode->i_list, &inode->i_sb->s_more_io);
+   check_dirty_inode(inode);
 }
 
 static void inode_sync_complete(struct inode *inode)
@@ -483,7 +503,9 @@ int generic_sync_sb_inodes(struct super_
/* Was this inode dirtied too recently? */
if (wbc->older_than_this && time_after(inode->dirtied_when,
*wbc->older_than_this)) {
+   check_dirty_inode_list(sb);
list_splice_init(&sb->s_io, sb->s_dirty.prev);
+   check_dirty_inode_list(sb);
break;
}
 
@@ -520,8 +542,11 @@ int generic_sync_sb_inodes(struct super_
break;
}
 
-   if (list_empty(&sb->s_io))
+   if (list_empty(&sb->s_io)) {
+   check_dirty_inode_list(sb);
list_splice_init(&sb->s_more_io, &sb->s_io);
+   check_dirty_inode_list(sb);
+   }
spin_unlock(&inode_lock);
return ret; /* Leave any unwritten inodes on s_io */
 }

-- 
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/4] writeback: fix ntfs with sb_has_dirty_inodes()

2007-08-09 Thread Fengguang Wu
NTFS's check for dirty inodes in ntfs_put_super() is incomplete: it only
looks at s_dirty. Fix it with sb_has_dirty_inodes().
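A toy model shows why the old test was incomplete. As patch 2 describes, an inode leaves s_dirty once it expires and may then sit on s_io or s_more_io, so a test that looks only at s_dirty misses it. `struct sb_counts` and both helpers are hypothetical; the kernel applies list_empty() to the real lists.

```c
#include <stdbool.h>

/* Hypothetical model: how many inodes sit on each per-superblock queue. */
struct sb_counts { int s_dirty, s_io, s_more_io; };

/* The test ntfs_put_super() used before this patch. */
static bool old_ntfs_test(const struct sb_counts *sb)
{
	return sb->s_dirty != 0;
}

/* Same logic as sb_has_dirty_inodes(): any of the three queues counts. */
static bool has_dirty_inodes(const struct sb_counts *sb)
{
	return sb->s_dirty || sb->s_io || sb->s_more_io;
}
```

Consider a superblock whose only dirty inode is parked on s_more_io. `old_ntfs_test()` reports it clean, while `has_dirty_inodes()` does not.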

Cc: Anton Altaparmakov <[EMAIL PROTECTED]>
Cc: Ken Chen <[EMAIL PROTECTED]>
Cc: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Fengguang Wu <[EMAIL PROTECTED]>
---
--- linux-2.6.23-rc1-mm2.orig/fs/ntfs/super.c
+++ linux-2.6.23-rc1-mm2/fs/ntfs/super.c
@@ -2381,14 +2381,14 @@ static void ntfs_put_super(struct super_
 */
ntfs_commit_inode(vol->mft_ino);
write_inode_now(vol->mft_ino, 1);
-   if (!list_empty(&sb->s_dirty)) {
+   if (sb_has_dirty_inodes(sb)) {
const char *s1, *s2;
 
mutex_lock(&vol->mft_ino->i_mutex);
truncate_inode_pages(vol->mft_ino->i_mapping, 0);
mutex_unlock(&vol->mft_ino->i_mutex);
write_inode_now(vol->mft_ino, 1);
-   if (!list_empty(&sb->s_dirty)) {
+   if (sb_has_dirty_inodes(sb)) {
static const char *_s1 = "inodes";
static const char *_s2 = "";
s1 = _s1;
--- linux-2.6.23-rc1-mm2.orig/include/linux/fs.h
+++ linux-2.6.23-rc1-mm2/include/linux/fs.h
@@ -1772,6 +1772,7 @@ extern int bdev_read_only(struct block_d
 extern int set_blocksize(struct block_device *, int);
 extern int sb_set_blocksize(struct super_block *, int);
 extern int sb_min_blocksize(struct super_block *, int);
+extern int sb_has_dirty_inodes(struct super_block *);
 
 extern int generic_file_mmap(struct file *, struct vm_area_struct *);
 extern int generic_file_readonly_mmap(struct file *, struct vm_area_struct *);



[PATCH 0/4] [RFC][PATCH] fs-writeback: redefining the dirty inode queues

2007-08-09 Thread Fengguang Wu
Andrew,

I'd like to propose a cleaner way of using the s_dirty, s_io, s_more_io
queues for the writeback of dirty inodes. The basic idea is to clearly
define the role of each queue, and especially to decouple s_dirty from
s_io/s_more_io.  The details are in the changelog of patch 2.

The patches are some cleanups on top of Andrew's s_dirty time-ordering
patches and Ken's s_more_io patch:

[PATCH 1/4] writeback: check time-ordering of s_io and s_more_io
[PATCH 2/4] writeback: 3-queue based writeback schedule
[PATCH 3/4] writeback: function renames and cleanups
[PATCH 4/4] writeback: fix ntfs with sb_has_dirty_inodes()

 fs/fs-writeback.c  |  196 +++
 fs/ntfs/super.c    |    4
 include/linux/fs.h |    1
 3 files changed, 108 insertions(+), 93 deletions(-)

Note that the patches need rework to be inserted at the right place in
your queue of -mm patches. I'll take care of that in the next take.

Thank you,
Fengguang


Re: [RFC PATCH 1/4] pass open file to ->setattr()

2007-08-09 Thread Miklos Szeredi
> >> > This is needed to be able to correctly implement open-unlink-fsetattr
> >> > semantics in some filesystem such as sshfs, without having to resort
> >> > to "silly-renaming".
> >> 
> >> How do you plan to do that?
> > 
> > Easy: the SFTP protocol has stateful opens and defines an FSTAT call.
> 
> Is it possible to reconnect without umounting?

Yes, but open files and in-progress requests are lost at reconnect.

> If yes, the unlinked files would be lost in spite of being opened,
> wouldn't they?

Sure.  Obviously one of the drawbacks of a stateful protocol is that
the server state can't survive a reconnect.

But that sort of reliability has never been the goal of sshfs.  And
even if that was needed, it could probably be much better handled in a
lower layer.

Miklos


Re: JFFS2/mtdsuper modprobe "unknown symbol" in 2.6.23-rc1

2007-08-09 Thread Erez Zadok
In message <[EMAIL PROTECTED]>, Adrian Bunk writes:
> On Thu, Aug 09, 2007 at 10:38:18PM -0400, Erez Zadok wrote:
> > I'm getting an error modprobing jffs2 due to mtdsuper failing to insmod:
> >...
> > Does anyone know what I am missing?
> 
> You miss that 2.6.23-rc2 with this bug fixed has already been released.

Great, I'll upgrade to rc2 (I've had this problem since .22-rc).  Thanks for
the quick response.

Erez.


Re: JFFS2/mtdsuper modprobe "unknown symbol" in 2.6.23-rc1

2007-08-09 Thread Adrian Bunk
On Thu, Aug 09, 2007 at 10:38:18PM -0400, Erez Zadok wrote:
> I'm getting an error modprobing jffs2 due to mtdsuper failing to insmod:
>...
> Does anyone know what I am missing?

You miss that 2.6.23-rc2 with this bug fixed has already been released.

> Thanks,
> Erez.

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



JFFS2/mtdsuper modprobe "unknown symbol" in 2.6.23-rc1

2007-08-09 Thread Erez Zadok
I'm getting an error modprobing jffs2 due to mtdsuper failing to insmod:

# modprobe jffs2
WARNING: Error inserting mtdsuper
(/lib/modules/2.6.23-rc1/kernel/drivers/mtd/mtdsuper.ko):
Unknown symbol in module, or unknown parameter (see dmesg)
FATAL: Error inserting jffs2
(/lib/modules/2.6.23-rc1/kernel/fs/jffs2/jffs2.ko): Unknown
symbol in module, or unknown parameter (see dmesg)

# dmesg | tail
mtdsuper: Unknown symbol get_mtd_device
mtdsuper: Unknown symbol put_mtd_device
jffs2: Unknown symbol get_sb_mtd
jffs2: Unknown symbol kill_mtd_super

My relevant .config is:

CONFIG_MTD=m
CONFIG_MTD_BLKDEVS=m
CONFIG_MTD_BLOCK=m
CONFIG_MTD_MAP_BANK_WIDTH_1=y
CONFIG_MTD_MAP_BANK_WIDTH_2=y
CONFIG_MTD_MAP_BANK_WIDTH_4=y
CONFIG_MTD_CFI_I1=y
CONFIG_MTD_CFI_I2=y
CONFIG_MTD_BLOCK2MTD=m
CONFIG_JFFS2_FS=m
CONFIG_JFFS2_FS_DEBUG=0
CONFIG_JFFS2_FS_WRITEBUFFER=y
CONFIG_JFFS2_SUMMARY=y
CONFIG_JFFS2_FS_XATTR=y
CONFIG_JFFS2_FS_POSIX_ACL=y
CONFIG_JFFS2_FS_SECURITY=y
CONFIG_JFFS2_COMPRESSION_OPTIONS=y
CONFIG_JFFS2_ZLIB=y
CONFIG_JFFS2_RTIME=y
CONFIG_JFFS2_CMODE_PRIORITY=y

A "quick hack" around this which I found is to add

  MODULE_LICENSE("GPL");

to the end of drivers/mtd/mtdsuper.c, but that doesn't sound like the right
fix.

Does anyone know what I am missing?

Thanks,
Erez.


Re: problems while mounting /boot partition

2007-08-09 Thread Jan Engelhardt

On Aug 8 2007 18:28, Michal Piotrowski wrote:
>
>Hi Brian,
>
>Brian J. Murrell pisze:
>> I am using Ubuntu Gutsy, which is the in-development branch heading for
>> their next stable release.
>
>You forgot the message subject, so no one has read this report.

Actually, given the volume on LKML, a message without a subject attracts
the most attention, since all the others have one. :)


Jan


Re: [PATCH V2] limit minixfs printks on corrupted dir i_size, CVE-2006-6058

2007-08-09 Thread Eric Sandeen
Bodo Eggert wrote:

> Warning: I'm only looking at the patch.
> 
> You are supposed to print an error message for a user, not to write in a
> chat window to a 1337 script kiddie. OK, you just matched the current style,
> and your patch is IMHO OK for a quick security fix, but:
> 
> - Security fixes should be CCed to the security mailing list, shouldn't they?
>   (It might be security@ or stable@, I'll remember tomorrow, but then I'd
>forget to comment)

ok.

> - Imagine you have three mounts containing a minix fs, how can you tell which
>   one is the defective one?

good point.

> - The message says "minix_bmap", while the patch suggests it's in
>   block_to_path. Therefore I asume "minix_bmap" to have only random
>   informational value.

Yup, you're right.

> - Does block < 0 or block > $size make a difference?

Well, block > size is likely to arise from a corrupt i_size combined with
the insistence on going ahead and checking the next page after
encountering an error on the previous one.  I don't have any scenario in
mind where we'd be repeatedly trying to check blocks < 0.

> - the printk lacks the loglevel.

As do all other printk's in minixfs... (hm and 11,619 other printk's in
the kernel :) )

> - Assuming minix supports error handling, shouldn't it do something?
> 
> I'd suggest a message saying something like "minix: Bad block address on
> device 08:15, needs fsck".

Fair enough, as you said I was just fixing up the issue, not rewriting
the code around it.  But yes, I should probably have considered at least
a better message here.  I can fix this up & resend.  But I'm not
promising to audit all other printk's in minixfs this time around.  ;-)

-Eric


Re: [RFC PATCH 1/4] pass open file to ->setattr()

2007-08-09 Thread Bodo Eggert
Miklos Szeredi <[EMAIL PROTECTED]> wrote:

>> > This is needed to be able to correctly implement open-unlink-fsetattr
>> > semantics in some filesystem such as sshfs, without having to resort
>> > to "silly-renaming".
>> 
>> How do you plan to do that?
> 
> Easy: the SFTP protocol has stateful opens and defines an FSTAT call.

Is it possible to reconnect without umounting? If yes, the unlinked files
would be lost in spite of being opened, wouldn't they?
-- 
Top 100 things you don't want the sysadmin to say:
11. Can you get VMS for this Sparc thingy?

Eat this, spammer: [EMAIL PROTECTED] [EMAIL PROTECTED]


Re: [PATCH V2] limit minixfs printks on corrupted dir i_size, CVE-2006-6058

2007-08-09 Thread Bodo Eggert
Eric Sandeen <[EMAIL PROTECTED]> wrote:

> This attempts to address CVE-2006-6058
> http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2006-6058
>  
> first reported at http://projects.info-pull.com/mokb/MOKB-17-11-2006.html
> 
> Essentially a corrupted minix dir inode reporting a very large
> i_size will loop for a very long time in minix_readdir, minix_find_entry,
> etc, because on EIO they just move on to try the next page.  This is
> under the BKL, printk-storming as well.  This can lock up the machine
> for a very long time.  Simply ratelimiting the printks gets things back
> under control.

> Index: linux-2.6.22-rc4/fs/minix/itree_v1.c
> ===
> --- linux-2.6.22-rc4.orig/fs/minix/itree_v1.c
> +++ linux-2.6.22-rc4/fs/minix/itree_v1.c
> @@ -27,7 +27,8 @@ static int block_to_path(struct inode *
>  if (block < 0) {
>  printk("minix_bmap: block<0\n");
>  } else if (block >= (minix_sb(inode->i_sb)->s_max_size/BLOCK_SIZE)) {
> - printk("minix_bmap: block>big\n");
> + if (printk_ratelimit())
> + printk("minix_bmap: block>big\n");

Warning: I'm only looking at the patch.

You are supposed to print an error message for a user, not to write in a
chat window to a 1337 script kiddie. OK, you just matched the current style,
and your patch is IMHO OK for a quick security fix, but:

- Security fixes should be CCed to the security mailing list, shouldn't they?
  (It might be security@ or stable@, I'll remember tomorrow, but then I'd
   forget to comment)
- Imagine you have three mounts containing a minix fs, how can you tell which
  one is the defective one?
- The message says "minix_bmap", while the patch suggests it's in
  block_to_path. Therefore I assume "minix_bmap" to have only random
  informational value.
- Does block < 0 or block > $size make a difference?
- the printk lacks the loglevel.
- Assuming minix supports error handling, shouldn't it do something?

I'd suggest a message saying something like "minix: Bad block address on
device 08:15, needs fsck".
-- 
Oops. My brain just hit a bad sector. 

Eat this, spammer: [EMAIL PROTECTED] [EMAIL PROTECTED]
 [EMAIL PROTECTED] [EMAIL PROTECTED]


[PATCH V2] limit minixfs printks on corrupted dir i_size, CVE-2006-6058

2007-08-09 Thread Eric Sandeen
Perhaps this is simpler, and preferable.  Thanks to adilger for
reminding me about printk_ratelimit.  :)



This attempts to address CVE-2006-6058 
http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2006-6058
 
first reported at http://projects.info-pull.com/mokb/MOKB-17-11-2006.html

Essentially a corrupted minix dir inode reporting a very large
i_size will loop for a very long time in minix_readdir, minix_find_entry,
etc, because on EIO they just move on to try the next page.  This is
under the BKL, printk-storming as well.  This can lock up the machine 
for a very long time.  Simply ratelimiting the printks gets things back
under control.
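The effect of rate limiting on such a storm can be sketched in userspace. This is a hypothetical token-bucket limiter in the spirit of printk_ratelimit(), not its actual code. The interval (5 ticks), the burst (10 messages) and the one-error-per-tick storm are all assumptions chosen for illustration.

```c
#include <stdbool.h>

/* Hypothetical token-bucket limiter; credit is measured in ticks. */
struct ratelimit {
	unsigned long interval;	/* credit one message costs */
	unsigned long burst;	/* messages allowed back to back */
	unsigned long toks;	/* current credit */
	unsigned long last;	/* tick of the previous call */
	unsigned long missed;	/* messages suppressed so far */
};

static bool ratelimit_ok(struct ratelimit *rl, unsigned long now)
{
	rl->toks += now - rl->last;	/* earn one tick of credit per tick */
	rl->last = now;
	if (rl->toks > rl->burst * rl->interval)
		rl->toks = rl->burst * rl->interval;	/* cap the burst */
	if (rl->toks >= rl->interval) {
		rl->toks -= rl->interval;
		return true;
	}
	rl->missed++;
	return false;
}

/* One "block>big" error per tick for @ticks ticks; count what gets printed. */
static unsigned long printed_during_storm(unsigned long ticks)
{
	struct ratelimit rl = { .interval = 5, .burst = 10, .toks = 50, .last = 0 };
	unsigned long printed = 0;

	for (unsigned long t = 0; t < ticks; t++)
		if (ratelimit_ok(&rl, t))
			printed++;
	return printed;
}
```

With these assumed parameters, 1000 consecutive errors produce 209 lines. The first 12 errors (ticks 0 to 11) are all printed. After that, the limiter settles to one message every 5 ticks.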

Signed-off-by: Eric Sandeen <[EMAIL PROTECTED]>

Index: linux-2.6.22-rc4/fs/minix/itree_v1.c
===
--- linux-2.6.22-rc4.orig/fs/minix/itree_v1.c
+++ linux-2.6.22-rc4/fs/minix/itree_v1.c
@@ -27,7 +27,8 @@ static int block_to_path(struct inode * 
if (block < 0) {
printk("minix_bmap: block<0\n");
} else if (block >= (minix_sb(inode->i_sb)->s_max_size/BLOCK_SIZE)) {
-   printk("minix_bmap: block>big\n");
+   if (printk_ratelimit())
+   printk("minix_bmap: block>big\n");
} else if (block < 7) {
offsets[n++] = block;
} else if ((block -= 7) < 512) {
Index: linux-2.6.22-rc4/fs/minix/itree_v2.c
===
--- linux-2.6.22-rc4.orig/fs/minix/itree_v2.c
+++ linux-2.6.22-rc4/fs/minix/itree_v2.c
@@ -28,7 +28,8 @@ static int block_to_path(struct inode * 
if (block < 0) {
printk("minix_bmap: block<0\n");
} else if (block >= (minix_sb(inode->i_sb)->s_max_size/sb->s_blocksize)) {
-   printk("minix_bmap: block>big\n");
+   if (printk_ratelimit())
+   printk("minix_bmap: block>big\n");
} else if (block < 7) {
offsets[n++] = block;
} else if ((block -= 7) < 256) {





Re: [PATCH 11/14] CacheFiles: Permit an inode's security ID to be obtained [try #2]

2007-08-09 Thread Casey Schaufler

--- James Morris <[EMAIL PROTECTED]> wrote:

> On Thu, 9 Aug 2007, David Howells wrote:
> 
> > James Morris <[EMAIL PROTECTED]> wrote:
> > 
> > > David, I've looked at the code and can't see that you need to access the 
> > > label itself outside the LSM.  Could you instead simply pass the inode 
> > > pointer around?
> > 
> > It's not quite that simple.  I need to impose *two* security labels in
> > cachefiles_begin_secure() when I'm about to act on behalf of a process that's
> > tried to access a netfs file:
> 
> Ah ok, we had a similar problem with NFS mount options.
> 
> While I'm concerned about encoding SELinux-optimized secid labels into 
> general kernel structures, moving to more generalized pointers introduces 
> lifecycle maintenance issues and complexity which is not needed in the 
> mainline kernel.  i.e. it'll be unused infrastructure maintained by 
> upstream, and used only by out-of-tree modules.
> 
> So, given that the kernel has no stable API, I suggest accepting the u32 
> secid as you propose, and if someone wants to merge a module which also 
> uses these hooks, but is entirely unable to use u32 labels, then they can 
> also justify making the interface more generalized and provide the code 
> for it.

Grumble. Yet another thing to undo in the near future. I still
hope to suggest what I would consider a viable alternative "soon".


Casey Schaufler
[EMAIL PROTECTED]


Re: [RFC PATCH 4/4] VFS: allow filesystem to override mknod capability checks

2007-08-09 Thread Miklos Szeredi
> > From: Miklos Szeredi <[EMAIL PROTECTED]>
> > 
> > Add a new filesystem flag, that results in the VFS not checking if the
> > current process has enough privileges to do an mknod().
> > 
> > This is needed on filesystems, where an unprivileged user may be able
> > to create a device node, without causing security problems.
> > 
> > One such example is "mountlo" a loopback mount utility implemented
> > with fuse and UML, which runs as an unprivileged userspace process.
> > In this case the user does in fact have the right to create device
> > nodes within the filesystem image, as long as the user has write
> > access to the image.  Since the filesystem is mounted with "nodev",
> > adding device nodes is not a security concern.
> 
> Could we enforce at do_new_mount() that if
> type->fs_flags&FS_MKNOD_CHECKS_PERM then mnt_flags |= MS_NODEV?

Well, the problem with that is that there will be fuse filesystems which
will want devices to work, and for those the capability checks will be
re-enabled inside ->mknod().  In fact, for backward compatibility all
filesystems will have the mknod checks, except ones which explicitly
request to turn them off.

Since unprivileged fuse mounts always have "nodev", the only way
security could be screwed up is if a filesystem running with
privileges disabled the mknod checks.

I will probably add some safety guards against that into the fuse
library, but of course there's no way to stop a privileged user from
screwing up security anyway.

If for example there's a loop mount, where the disk image file is
writable by a user, and root mounts it without "nodev", the user can
still create device nodes (by modifying the image) even if the mknod
checks are enabled.
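The argument above reduces to a small truth table. The sketch below is a hypothetical model and deliberately leaves out two things: any check a filesystem re-enables inside ->mknod(), and the image-modification case just described. The field and function names are made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical model of one mknod attempt; not kernel code. */
struct mknod_case {
	bool caller_privileged;		/* would pass the VFS capability check */
	bool fs_skips_vfs_check;	/* fs set the proposed "no VFS mknod check" flag */
	bool mount_nodev;		/* mounted with "nodev" */
};

/* The VFS lets the call through if the caller is privileged or the fs opted out. */
static bool vfs_allows_mknod(const struct mknod_case *c)
{
	return c->caller_privileged || c->fs_skips_vfs_check;
}

/* Risky: an unprivileged user gets a device node that the mount will honour. */
static bool is_risky(const struct mknod_case *c)
{
	return vfs_allows_mknod(c) && !c->caller_privileged && !c->mount_nodev;
}
```

The only risky row is the one where the filesystem opts out of the check on a mount without "nodev". In the model, that cannot happen for unprivileged fuse mounts, since they always have "nodev".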

Miklos


Re: [Patch 16/18] fs/Kconfig

2007-08-09 Thread Jörn Engel
On Thu, 9 August 2007 01:01:26 +0200, Arnd Bergmann wrote:
> On Wednesday 08 August 2007, Jörn Engel wrote:
> > +config LOGFS
> > +   bool "Log Filesystem (EXPERIMENTAL)"
> > +   depends on MTD && BLOCK && EXPERIMENTAL
> 
> The dependency on MTD _and_ BLOCK looks correct for your code, but
> not necessary. How about making it 
> 
>   depends on (MTD || BLOCK) && EXPERIMENTAL
> 
> and allowing to build without the mtd/bdev specific code?

Would be useful, yes.

Jörn

-- 
Data expands to fill the space available for storage.
-- Parkinson's Law


Re: [Patch 02/18] include/linux/logfs.h

2007-08-09 Thread Jörn Engel
On Thu, 9 August 2007 00:56:29 +0200, Arnd Bergmann wrote:
> On Wednesday 08 August 2007, Jörn Engel wrote:
> > +++ linux-2.6.21logfs/include/linux/logfs.h 2007-08-08 02:57:37.0 +0200
> > @@ -0,0 +1,500 @@
> > +/*
> > + * fs/logfs/logfs.h
> > + *
> 
> The comment does not match the file name. Better remove the file names
> entirely from introduction comments.

Maybe.  This file needs to get moved again anyway.

Jörn

-- 
Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it.
-- Brian W. Kernighan


Re: [PATCH 11/14] CacheFiles: Permit an inode's security ID to be obtained [try #2]

2007-08-09 Thread James Morris
On Thu, 9 Aug 2007, David Howells wrote:

> James Morris <[EMAIL PROTECTED]> wrote:
> 
> > David, I've looked at the code and can't see that you need to access the 
> > label itself outside the LSM.  Could you instead simply pass the inode 
> > pointer around?
> 
> It's not quite that simple.  I need to impose *two* security labels in
> cachefiles_begin_secure() when I'm about to act on behalf of a process that's
> tried to access a netfs file:

Ah ok, we had a similar problem with NFS mount options.

While I'm concerned about encoding SELinux-optimized secid labels into 
general kernel structures, moving to more generalized pointers introduces 
lifecycle maintenance issues and complexity which is not needed in the 
mainline kernel.  i.e. it'll be unused infrastructure maintained by 
upstream, and used only by out-of-tree modules.

So, given that the kernel has no stable API, I suggest accepting the u32 
secid as you propose, and if someone wants to merge a module which also 
uses these hooks, but is entirely unable to use u32 labels, then they can 
also justify making the interface more generalized and provide the code 
for it.



- James 
-- 
James Morris
<[EMAIL PROTECTED]>


Re: [RFC PATCH 4/4] VFS: allow filesystem to override mknod capability checks

2007-08-09 Thread Serge E. Hallyn
Quoting [EMAIL PROTECTED] ([EMAIL PROTECTED]):
> From: Miklos Szeredi <[EMAIL PROTECTED]>
> 
> Add a new filesystem flag, that results in the VFS not checking if the
> current process has enough privileges to do an mknod().
> 
> This is needed on filesystems where an unprivileged user may be able
> to create a device node without causing security problems.
> 
> One such example is "mountlo" a loopback mount utility implemented
> with fuse and UML, which runs as an unprivileged userspace process.
> In this case the user does in fact have the right to create device
> nodes within the filesystem image, as long as the user has write
> access to the image.  Since the filesystem is mounted with "nodev",
> adding device nodes is not a security concern.

Could we enforce at do_new_mount() that if
type->fs_flags&FS_MKNOD_CHECKS_PERM then mnt_flags |= MS_NODEV?

> This feature is basically "fuse-only", so it does not make sense to
> change the semantics of ->mknod().
> 
> Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]>
> ---
> 
> Index: linux/fs/namei.c
> ===
> --- linux.orig/fs/namei.c 2007-08-09 16:49:07.0 +0200
> +++ linux/fs/namei.c  2007-08-09 16:49:12.0 +0200
> @@ -1921,7 +1921,8 @@ int vfs_mknod(struct inode *dir, struct 
>   if (error)
>   return error;
>  
> - if ((S_ISCHR(mode) || S_ISBLK(mode)) && !capable(CAP_MKNOD))
> + if (!(dir->i_sb->s_type->fs_flags & FS_MKNOD_CHECKS_PERM) &&
> + (S_ISCHR(mode) || S_ISBLK(mode)) && !capable(CAP_MKNOD))
>   return -EPERM;
>  
>   if (!dir->i_op || !dir->i_op->mknod)
> Index: linux/include/linux/fs.h
> ===
> --- linux.orig/include/linux/fs.h 2007-08-09 16:49:07.0 +0200
> +++ linux/include/linux/fs.h  2007-08-09 16:49:12.0 +0200
> @@ -97,6 +97,7 @@ extern int dir_notify_enable;
>  #define FS_BINARY_MOUNTDATA 2
>  #define FS_HAS_SUBTYPE 4
>  #define FS_SAFE 8 /* Safe to mount by unprivileged users */
> +#define FS_MKNOD_CHECKS_PERM 16  /* FS checks if device creation is allowed */
>  #define FS_REVAL_DOT 16384   /* Check the paths ".", ".." for staleness */
>  #define FS_RENAME_DOES_D_MOVE 32768   /* FS will handle d_move()
>* during rename() internally.
> 
> --
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 14/14] NFS: Use local caching [try #2]

2007-08-09 Thread Trond Myklebust
On Thu, 2007-08-09 at 19:52 +0100, David Howells wrote:
> Trond Myklebust <[EMAIL PROTECTED]> wrote:
> 
> > Dang, that's a lot of inlines... AFAICS, approx half of fs/nfs/fscache.h
> > should really be moved into fscache.c.
> 
> If you wish.  It seems a shame since a lot of them have only one caller.

...however it also forces you to export a lot of stuff which is really
private to fscache.c (the atomics etc).

> Note that due to patch #2, PG_fscache causes releasepage() and
> invalidatepage() to be called in addition to PG_private.



> > > +
> > > + if (!nfs_fscache_release_page(page, gfp))
> > > + return 0;
> > 
> > This looks _very_ dubious. Why shouldn't I be able to discard a page
> > just because fscache isn't done writing it out? I may have very good
> > reasons to do so.
> 
> Hmmm...  Looking at the truncate routines, I suppose this ought to be okay,
> provided the cache retains a reference on the page whilst it's writing it out
> (put_page() won't can the page until we release it).
> 
> It also seems dubious, though, to release the page when the filesystem is
> doing stuff to it, even if it's by proxy in the cache.  I'll have to test
> that, but I'm slightly concerned that the netfs could end up releasing its
> cookie before the cache has finished with its pages.  On the other hand, with
> the new asynchronous stuff I've done, I'm not sure this'll be an actual
> problem.

Actually, as long as launder_page() and invalidate_page() are doing
their thing, then your current code might be OK. The important thing is
to ensure that invalidate_inode_pages2() and truncate_inode_pages() work
as expected.

> > >  static int nfs_launder_page(struct page *page)
> > >  {
> > > + nfs_fscache_invalidate_page(page, page->mapping->host, 0);
> > 
> > Why? The function of launder_page() is to clean the page, not to
> > truncate it.
> 
> Okay.

What you should be doing here is probably to make a call to
wait_on_page_fscache_write(page). That should suffice to ensure that the
page is clean w.r.t. fscache activity afaics.

> > > @@ -1000,11 +1007,13 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
> > >   if (data_stable) {
> > >   inode->i_size = new_isize;
> > >   invalid |= NFS_INO_INVALID_DATA;
> > > + nfs_fscache_attr_changed(inode);
> > 
> > Can't fscache_attr_changed() call kmalloc(GFP_KERNEL)? You are in a
> > spinlocked context here.
> 
> Hmmm...  How about I move the call to fscache_attr_changed() to the callers of
> nfs_update_inode(), to just after the spinlock is unlocked.  The operation is
> going to be asynchronous, so the delay shouldn't matter.

Right. You might also want to add a flag NFS_INO_INVALID_FSCACHE_ATTR or
something like that in order to trigger it.
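
Trond's suggestion (record the change in a flag while the spinlock is held, and act on it only after unlocking) can be sketched as follows. This is a userspace illustration only: the spinlock is modelled by a plain boolean and might_sleep() by an assertion. NFS_INO_INVALID_FSCACHE_ATTR is the name Trond floats, but its bit value here is an assumption, and the other names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NFS_INO_INVALID_FSCACHE_ATTR 0x100UL	/* bit value is an assumption */

struct fake_inode {
	bool lock_held;			/* models inode->i_lock being held */
	unsigned long cache_validity;	/* models nfsi->cache_validity */
	long long i_size;
	int fscache_updates;		/* counts deferred cache notifications */
};

/* Stand-in for a call that may allocate with GFP_KERNEL and so may sleep. */
static void fscache_attr_changed_may_sleep(struct fake_inode *inode)
{
	assert(!inode->lock_held);	/* sleeping under a spinlock is a bug */
	inode->fscache_updates++;
}

/* Runs "under the lock": records the change but does not call the cache. */
static void update_inode_locked(struct fake_inode *inode, long long new_isize)
{
	assert(inode->lock_held);
	if (inode->i_size != new_isize) {
		inode->i_size = new_isize;
		inode->cache_validity |= NFS_INO_INVALID_FSCACHE_ATTR;
	}
}

/* Caller: take the lock, update, drop the lock, then act on the flag. */
static void refresh_inode(struct fake_inode *inode, long long new_isize)
{
	bool notify;

	inode->lock_held = true;
	update_inode_locked(inode, new_isize);
	notify = inode->cache_validity & NFS_INO_INVALID_FSCACHE_ATTR;
	inode->cache_validity &= ~NFS_INO_INVALID_FSCACHE_ATTR;
	inode->lock_held = false;

	if (notify)
		fscache_attr_changed_may_sleep(inode);
}
```

Because David describes the cache operation as asynchronous anyway, deferring it until after the unlock should not change its observable timing in any meaningful way.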

> > I'm not too happy about the change to the binary mount interface. As far
> > as I'm concerned, the binary interface should really be considered
> > frozen, and should not be touched.
> 
> I hadn't come across that.  I'll have a look.
> 
> > Instead, feel free to update the text-based mount interface (which can
> > be found in 2.6.23-rc1 and later). Please put any such mount interface
> > changes in a separate patch together with the Kconfig changes.
> 
> If you wish, but doesn't that violate the rules of patch division laid down by
> Linus and Andrew?

Why? It should be possible to set this up so that the NFS+fscache code
compiles correctly without the mount patch.

Trond



Re: [PATCH 14/14] NFS: Use local caching [try #2]

2007-08-09 Thread David Howells

> > Instead, feel free to update the text-based mount interface (which can
> > be found in 2.6.23-rc1 and later).

I presume you're referring to nfs_mount_option_tokens[] and friends.  Is there
a mount program that can drive this?

David


Re: [PATCH 11/14] CacheFiles: Permit an inode's security ID to be obtained [try #2]

2007-08-09 Thread James Morris
On Thu, 9 Aug 2007, David Howells wrote:

> + u32 (*inode_get_secid)(struct inode *inode);

To maintain API consistency, please return an int which only acts as an 
error code, and return the secid via a u32 * function parameter.
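
The requested change turns `u32 (*inode_get_secid)(struct inode *inode)` into a hook whose return value is only an error code. A minimal sketch of that shape follows. The cut-down struct inode and its i_secid field are assumptions made for illustration, and example_inode_get_secid() is a hypothetical module implementation.

```c
#include <errno.h>
#include <stddef.h>

typedef unsigned int u32;

/* Minimal stand-in for the kernel's struct inode (assumption). */
struct inode {
	u32 i_secid;	/* hypothetical field holding the label handle */
};

/*
 * Return value is only an error code; the secid comes back through the
 * out-parameter, matching the convention of other LSM hooks.
 */
static int example_inode_get_secid(struct inode *inode, u32 *secid)
{
	if (inode == NULL || secid == NULL)
		return -EINVAL;
	*secid = inode->i_secid;
	return 0;
}
```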


- James
-- 
James Morris
<[EMAIL PROTECTED]>


Re: [PATCH 11/14] CacheFiles: Permit an inode's security ID to be obtained [try #2]

2007-08-09 Thread David Howells
James Morris <[EMAIL PROTECTED]> wrote:

> David, I've looked at the code and can't see that you need to access the 
> label itself outside the LSM.  Could you instead simply pass the inode 
> pointer around?

It's not quite that simple.  I need to impose *two* security labels in
cachefiles_begin_secure() when I'm about to act on behalf of a process that's
tried to access a netfs file:

 (1) The security label to act as.  This is the label attached to the
 cachefilesd process when it starts the cache.  This is obtained by
 cachefiles_get_security_ID().

 (2) The security label to create files as.  This is the label attached to
 root directory of the cache.  This is obtained by
 cachefiles_determine_cache_secid().

David


Re: [PATCH 14/14] NFS: Use local caching [try #2]

2007-08-09 Thread David Howells
Trond Myklebust <[EMAIL PROTECTED]> wrote:

> Dang, that's a lot of inlines... AFAICS, approx half of fs/nfs/fscache.h
> should really be moved into fscache.c.

If you wish.  It seems a shame since a lot of them have only one caller.

> > +   /* we can do this here as the bits are only set with the page lock
> > +* held, and our caller is holding that */
> > +   if (!page->private)
> > +   ClearPagePrivate(page);
> 
> Why would PG_private be set at this point?

Looks like I've got a bit more cleaning up to do.  PG_private isn't set by
FS-Cache, so this bit shouldn't be here.

> In any case, please send this and other PagePrivate changes as a
> separate patch. Any changes to the PagePrivate semantics must be made
> easy to debug.

There shouldn't be any.

Note that due to patch #2, PG_fscache causes releasepage() and
invalidatepage() to be called in addition to PG_private.

> > +
> > +   if (!nfs_fscache_release_page(page, gfp))
> > +   return 0;
> 
> This looks _very_ dubious. Why shouldn't I be able to discard a page
> just because fscache isn't done writing it out? I may have very good
> reasons to do so.

Hmmm...  Looking at the truncate routines, I suppose this ought to be okay,
provided the cache retains a reference on the page whilst it's writing it out
(put_page() won't can the page until we release it).

It also seems dubious, though, to release the page when the filesystem is
doing stuff to it, even if it's by proxy in the cache.  I'll have to test
that, but I'm slightly concerned that the netfs could end up releasing its
cookie before the cache has finished with its pages.  On the other hand, with
the new asynchronous stuff I've done, I'm not sure this'll be an actual
problem.
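
The argument above rests on reference counting. Truncation drops the page cache's reference, but the page is only freed when the last reference goes, so a cache that holds its own reference during write-out keeps the memory valid. The following is a minimal userspace model with hypothetical names. The kernel's get_page()/put_page() operate on struct page, not on this struct.

```c
#include <stdbool.h>

struct fake_page {
	int count;	/* reference count */
	bool freed;	/* set when the last reference is dropped */
};

static void fake_get_page(struct fake_page *page)
{
	page->count++;
}

static void fake_put_page(struct fake_page *page)
{
	if (--page->count == 0)
		page->freed = true;	/* the kernel would free the page here */
}
```

Starting from the page cache's single reference, the cache takes a second reference for write-out; truncation then drops one, and the page survives until the cache's own put. This covers only memory safety. The separate concern about the netfs cookie being released before the cache is done is not modelled.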

> >  static int nfs_launder_page(struct page *page)
> >  {
> > +   nfs_fscache_invalidate_page(page, page->mapping->host, 0);
> 
> Why? The function of launder_page() is to clean the page, not to
> truncate it.

Okay.

> > @@ -1000,11 +1007,13 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
> > if (data_stable) {
> > inode->i_size = new_isize;
> > invalid |= NFS_INO_INVALID_DATA;
> > +   nfs_fscache_attr_changed(inode);
> 
> Can't fscache_attr_changed() call kmalloc(GFP_KERNEL)? You are in a
> spinlocked context here.

Hmmm...  How about I move the call to fscache_attr_changed() to the callers of
nfs_update_inode(), to just after the spinlock is unlocked.  The operation is
going to be asynchronous, so the delay shouldn't matter.

> I'm not too happy about the change to the binary mount interface. As far
> as I'm concerned, the binary interface should really be considered
> frozen, and should not be touched.

I hadn't come across that.  I'll have a look.

> Instead, feel free to update the text-based mount interface (which can
> be found in 2.6.23-rc1 and later). Please put any such mount interface
> changes in a separate patch together with the Kconfig changes.

If you wish, but doesn't that violate the rules of patch division laid down by
Linus and Andrew?

David


Re: [PATCH 11/14] CacheFiles: Permit an inode's security ID to be obtained [try #2]

2007-08-09 Thread James Morris
On Thu, 9 Aug 2007, Casey Schaufler wrote:

> This is SELinux specific functionality. It should not be an LSM
> interface. 

As long as the security labels are themselves not being exported to the 
kernel to be used e.g. for display or transport, then I agree, and we 
should avoid passing them around outside the LSM entirely if possible.

Usually, they're attached to a significant kernel object, which you 
typically pass around as part of the interface anyway.

David, I've looked at the code and can't see that you need to access the 
label itself outside the LSM.  Could you instead simply pass the inode 
pointer around?

(I know it's not always possible, but much preferred).


- James
-- 
James Morris
<[EMAIL PROTECTED]>


Re: [PATCH 09/14] CacheFiles: Permit a process's create SID to be overridden [try #2]

2007-08-09 Thread Casey Schaufler

--- David Howells <[EMAIL PROTECTED]> wrote:

> Casey Schaufler <[EMAIL PROTECTED]> wrote:
> 
> > This is SELinux specific functionality and should be done in the
> > SELinux code. You should not be adding interfaces that are SELinux
> > specific, in this case using secids instead of the LSM blob interfaces.
> 
> Is using secids your only objection?  Or are you objecting to the whole
> 'act-as' concept?

My knee-jerk reaction is that that is likely to be SELinux specific
behavior as well. I'm going to have to look at the patch more carefully
before I can say for sure. I will try to make a constructive proposal
once I've had the chance to think on it a little. Sorry about the terse
and unhelpful initial reaction.


Casey Schaufler
[EMAIL PROTECTED]


[PATCH] limit minixfs dir_pages on corrupted dir i_size, CVE-2006-6058

2007-08-09 Thread Eric Sandeen
This attempts to address CVE-2006-6058 
http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2006-6058

First reported at http://projects.info-pull.com/mokb/MOKB-17-11-2006.html

Essentially, a corrupted minix directory inode reporting a very large
i_size will loop for a very long time in minix_readdir, minix_find_entry,
etc., because on EIO they just move on to try the next page.  This happens
under the BKL, with printk output as well, and can lock up the machine
for a very long time.  A simple approach is to at least limit the pages
attempted to those covering no more than s_max_size bytes.  (s_max_size is
about 256MB for V1 but 2GB for V2, so this could still result in a lot of
EIO reads in the V2 case.  Should the retry loops in minix_readdir & friends
be short-circuited somehow instead?  A simple "break" rather than "continue"
on error would certainly resolve it, too.)

Signed-off-by: Eric Sandeen <[EMAIL PROTECTED]>

--- linux-2.6.22-rc4.orig/fs/minix/dir.c
+++ linux-2.6.22-rc4/fs/minix/dir.c
@@ -42,7 +42,15 @@ minix_last_byte(struct inode *inode, uns
 
 static inline unsigned long dir_pages(struct inode *inode)
 {
-   return (inode->i_size+PAGE_CACHE_SIZE-1)>>PAGE_CACHE_SHIFT;
+   loff_t size = inode->i_size;
+
+   if (size > minix_sb(inode->i_sb)->s_max_size) {
+   printk("%s: inode %lld i_size > s_max_size\n",
+   __FUNCTION__, inode->i_size);
+   size = minix_sb(inode->i_sb)->s_max_size;
+   }
+
+   return (size+PAGE_CACHE_SIZE-1)>>PAGE_CACHE_SHIFT;
 }
 
 static int dir_commit_chunk(struct page *page, unsigned from, unsigned to)
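
The arithmetic of the clamp, and the "break" versus "continue" alternative mentioned above, can be checked with a small userspace model. The assumptions are 4 KiB pages (PAGE_CACHE_SHIFT of 12), an s_max_size of 256 MiB standing in for the V1 limit, and a hypothetical read_page callback in place of the real page lookup.

```c
#include <errno.h>

#define PAGE_CACHE_SHIFT 12	/* assumption: 4 KiB pages */
#define PAGE_CACHE_SIZE (1LL << PAGE_CACHE_SHIFT)

/* Same rounding as the patched dir_pages(), with the clamp applied first. */
static unsigned long dir_pages_clamped(long long i_size, long long s_max_size)
{
	long long size = i_size > s_max_size ? s_max_size : i_size;

	return (unsigned long)((size + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT);
}

/* Simulates a corrupted directory where every page read fails. */
static int read_page_always_eio(unsigned long n)
{
	(void)n;
	return -EIO;
}

/*
 * Models the readdir loop and returns how many pages were attempted.
 * With stop_on_error == 0 it keeps going after an error ("continue");
 * otherwise it gives up on the first error ("break").
 */
static unsigned long scan_dir(unsigned long npages,
			      int (*read_page)(unsigned long), int stop_on_error)
{
	unsigned long n, attempted = 0;

	for (n = 0; n < npages; n++) {
		attempted++;
		if (read_page(n) < 0 && stop_on_error)
			break;
	}
	return attempted;
}
```

Under these assumptions a 256 MiB clamp still allows 65536 failing page reads with "continue", and a 2 GiB V2 limit would allow 524288, which is why the "break" option is worth raising.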



Re: [PATCH 11/14] CacheFiles: Permit an inode's security ID to be obtained [try #2]

2007-08-09 Thread James Morris
On Thu, 9 Aug 2007, David Howells wrote:

> James Morris <[EMAIL PROTECTED]> wrote:
> 
> > > + u32 (*inode_get_secid)(struct inode *inode);
> > 
> > To maintain API consistency, please return an int which only acts as an 
> > error code, and return the secid via a u32 * function parameter.
> 
> Does that apply to *all* the functions, irrespective of whether or not they
> return an error?

LSM is theoretically an API, so we generally don't know if a security 
module will return an error or not.

If they were simply calls directly into SELinux, where we could always 
know the semantics, then that would be a different story.
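
The point about LSM being an API can be made concrete: callers reach a hook through an operations table without knowing which module is behind it, so the error path has to exist even if one particular module never fails. The following sketch uses a hypothetical cut-down ops structure and two hypothetical modules. It does not reproduce the real struct security_operations.

```c
#include <errno.h>
#include <stddef.h>

typedef unsigned int u32;

struct inode;	/* opaque here; only passed through */

/* Hypothetical cut-down version of struct security_operations. */
struct sec_ops {
	int (*inode_get_secid)(struct inode *inode, u32 *secid);
};

/* A module that always has a label to give. */
static int labelled_get_secid(struct inode *inode, u32 *secid)
{
	(void)inode;
	*secid = 7;	/* arbitrary example label */
	return 0;
}

/* A module with no notion of secids: it must be able to say so. */
static int unlabelled_get_secid(struct inode *inode, u32 *secid)
{
	(void)inode;
	(void)secid;
	return -EOPNOTSUPP;
}

static const struct sec_ops labelled_ops = { labelled_get_secid };
static const struct sec_ops unlabelled_ops = { unlabelled_get_secid };

/* Caller code is identical whichever module is loaded; *out is valid only on 0. */
static int cache_pick_secid(const struct sec_ops *ops, struct inode *dir, u32 *out)
{
	return ops->inode_get_secid(dir, out);
}
```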



- James
-- 
James Morris
<[EMAIL PROTECTED]>


Re: [PATCH 14/14] NFS: Use local caching [try #2]

2007-08-09 Thread Trond Myklebust
Dang, that's a lot of inlines... AFAICS, approx half of fs/nfs/fscache.h
should really be moved into fscache.c.

Otherwise, this looks a lot less intrusive than previous patches.

See inlined comments.

On Thu, 2007-08-09 at 17:05 +0100, David Howells wrote:
> The attached patch makes it possible for the NFS filesystem to make use of the
> network filesystem local caching service (FS-Cache).
> 
> To be able to use this, an updated mount program is required.  This can be
> obtained from:
> 
>   http://people.redhat.com/steved/cachefs/util-linux/
> 
> To mount an NFS filesystem to use caching, add an "fsc" option to the mount:
> 
>   mount warthog:/ /a -o fsc
> 
> Signed-Off-By: David Howells <[EMAIL PROTECTED]>
> ---
> 
>  fs/Kconfig |8 +
>  fs/nfs/Makefile|1 
>  fs/nfs/client.c|   11 +
>  fs/nfs/file.c  |   38 +++-
>  fs/nfs/fscache.c   |  303 +
>  fs/nfs/fscache.h   |  464
>  fs/nfs/inode.c |   16 ++
>  fs/nfs/internal.h  |8 +
>  fs/nfs/read.c  |   27 ++-
>  fs/nfs/super.c |1 
>  fs/nfs/sysctl.c|   43 
>  fs/nfs/write.c |3 
>  include/linux/nfs4_mount.h |3 
>  include/linux/nfs_fs.h |6 +
>  include/linux/nfs_fs_sb.h  |5 
>  include/linux/nfs_mount.h  |3 
>  16 files changed, 931 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/Kconfig b/fs/Kconfig
> index 7feb4cb..76d5d16 100644
> --- a/fs/Kconfig
> +++ b/fs/Kconfig
> @@ -1600,6 +1600,14 @@ config NFS_V4
>  
> If unsure, say N.
>  
> +config NFS_FSCACHE
> + bool "Provide NFS client caching support (EXPERIMENTAL)"
> + depends on EXPERIMENTAL
> + depends on NFS_FS=m && FSCACHE || NFS_FS=y && FSCACHE=y
> + help
> +   Say Y here if you want NFS data to be cached locally on disc through
> +   the general filesystem cache manager
> +
>  config NFS_DIRECTIO
>   bool "Allow direct I/O on NFS files"
>   depends on NFS_FS
> diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
> index b55cb23..c9e7c43 100644
> --- a/fs/nfs/Makefile
> +++ b/fs/nfs/Makefile
> @@ -16,4 +16,5 @@ nfs-$(CONFIG_NFS_V4)+= nfs4proc.o nfs4xdr.o nfs4state.o nfs4renewd.o \
>  nfs4namespace.o
>  nfs-$(CONFIG_NFS_DIRECTIO) += direct.o
>  nfs-$(CONFIG_SYSCTL) += sysctl.o
> +nfs-$(CONFIG_NFS_FSCACHE) += fscache.o
>  nfs-objs := $(nfs-y)
> diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> index a49f9fe..7be7807 100644
> --- a/fs/nfs/client.c
> +++ b/fs/nfs/client.c
> @@ -137,6 +137,8 @@ static struct nfs_client *nfs_alloc_client(const char *hostname,
>   clp->cl_state = 1 << NFS4CLNT_LEASE_EXPIRED;
>  #endif
>  
> + nfs_fscache_get_client_cookie(clp);
> +
>   return clp;
>  
>  error_3:
> @@ -168,6 +170,8 @@ static void nfs_free_client(struct nfs_client *clp)
>  
>   nfs4_shutdown_client(clp);
>  
> + nfs_fscache_release_client_cookie(clp);
> +
>   /* -EIO all pending I/O */
>   if (!IS_ERR(clp->cl_rpcclient))
>   rpc_shutdown_client(clp->cl_rpcclient);
> @@ -1308,7 +1312,7 @@ static int nfs_volume_list_show(struct seq_file *m, void *v)
>  
>   /* display header on line 1 */
>   if (v == &nfs_volume_list) {
> - seq_puts(m, "NV SERVER   PORT DEV FSID\n");
> + seq_puts(m, "NV SERVER   PORT DEV FSID  FSC\n");
>   return 0;
>   }
>   /* display one transport per line on subsequent lines */
> @@ -1322,12 +1326,13 @@ static int nfs_volume_list_show(struct seq_file *m, void *v)
>(unsigned long long) server->fsid.major,
>(unsigned long long) server->fsid.minor);
>  
> - seq_printf(m, "v%d %02x%02x%02x%02x %4hx %-7s %-17s\n",
> + seq_printf(m, "v%d %02x%02x%02x%02x %4hx %-7s %-17s %s\n",
>  clp->cl_nfsversion,
>  NIPQUAD(clp->cl_addr.sin_addr),
>  ntohs(clp->cl_addr.sin_port),
>  dev,
> -fsid);
> +fsid,
> +nfs_server_fscache_state(server));

Please send these changes as a separate patch: they change an existing
interface.

>  
>   return 0;
>  }
> diff --git a/fs/nfs/file.c b/fs/nfs/file.c
> index c87dc71..dfd36e0 100644
> --- a/fs/nfs/file.c
> +++ b/fs/nfs/file.c
> @@ -34,6 +34,7 @@
>  
>  #include "delegation.h"
>  #include "iostat.h"
> +#include "internal.h"
>  
>  #define NFSDBG_FACILITY  NFSDBG_FILE
>  
> @@ -259,6 +260,10 @@ nfs_file_mmap(struct file * file, struct vm_area_struct * vma)
>   status = nfs_revalidate_mapping(inode, file->f_mapping);
>   if (!status)
>   status = generic_file_mmap(file, vma);
> +
> + if (status == 0)
> + nfs_fscache_install_vm_ops(inode, vma);
> +
>   return status;

Please note that in 2.6.24 the NFS client

Re: [PATCH 11/14] CacheFiles: Permit an inode's security ID to be obtained [try #2]

2007-08-09 Thread David Howells
James Morris <[EMAIL PROTECTED]> wrote:

> > +   u32 (*inode_get_secid)(struct inode *inode);
> 
> To maintain API consistency, please return an int which only acts as an 
> error code, and return the secid via a u32 * function parameter.

Does that apply to *all* the functions, irrespective of whether or not they
return an error?

David


Re: [PATCH 09/14] CacheFiles: Permit a process's create SID to be overridden [try #2]

2007-08-09 Thread David Howells
Casey Schaufler <[EMAIL PROTECTED]> wrote:

> This is SELinux specific functionality and should be done in the
> SELinux code. You should not be adding interfaces that are SELinux
> specific, in this case using secids instead of the LSM blob interfaces.

Is using secids your only objection?  Or are you objecting to the whole
'act-as' concept?

David


Re: [PATCH 11/14] CacheFiles: Permit an inode's security ID to be obtained [try #2]

2007-08-09 Thread David Howells
Casey Schaufler <[EMAIL PROTECTED]> wrote:

> This is SELinux specific functionality. It should not be an LSM
> interface. 

This is what I worked out in conjunction with the denizens of the SELinux
mailing list.  What would you have me do differently?  Change things like:

u32 (*act_as_secid)(u32 secid);

to something like:

void (*act_as_secid)(const char *newsecdata, u32 newseclen,
 char *oldsecdata, u32 *oldseclen);

David


Re: [PATCH 11/14] CacheFiles: Permit an inode's security ID to be obtained [try #2]

2007-08-09 Thread Casey Schaufler

--- Stephen Smalley <[EMAIL PROTECTED]> wrote:

> On Thu, 2007-08-09 at 10:07 -0700, Casey Schaufler wrote:
> > --- David Howells <[EMAIL PROTECTED]> wrote:
> > 
> > > Permit an inode's security ID to be obtained by the CacheFiles module.  This
> > > is then used as the SID with which files and directories will be created in
> > > the cache.
> > 
> > This is SELinux specific functionality. It should not be an LSM
> > interface. 
> 
> Odd, you proposed exactly the same hook (aside from naming convention
> and secid as argument vs. as retval) in recent postings on linux-audit
> and selinux list for use by the audit system.

And that's exposing SELinux specific functionality too. And I don't
like the fact that the audit system already requires a secid interface.
The audit system, however, does not use the secid for anything other
than a handle that gets passed around and eventually used to get the
data that goes into the audit record. It's annoying, but harmless and
does not affect any access control decisions. The change proposed here
would use the secid in access control decisions. The LSM interface
ought not to be exposing module specific internal data structures. My
work on pulling selinux code out of audit left the secid interface
in place. You're right in that audit should get fixed. I had been
hoping to make that a phase II activity.


Casey Schaufler
[EMAIL PROTECTED]


Re: [PATCH 11/14] CacheFiles: Permit an inode's security ID to be obtained [try #2]

2007-08-09 Thread Stephen Smalley
On Thu, 2007-08-09 at 10:07 -0700, Casey Schaufler wrote:
> --- David Howells <[EMAIL PROTECTED]> wrote:
> 
> > Permit an inode's security ID to be obtained by the CacheFiles module.  This
> > is then used as the SID with which files and directories will be created in
> > the cache.
> 
> This is SELinux specific functionality. It should not be an LSM
> interface. 

Odd, you proposed exactly the same hook (aside from naming convention
and secid as argument vs. as retval) in recent postings on linux-audit
and selinux list for use by the audit system.

-- 
Stephen Smalley
National Security Agency



Re: [PATCH 11/14] CacheFiles: Permit an inode's security ID to be obtained [try #2]

2007-08-09 Thread Casey Schaufler

--- David Howells <[EMAIL PROTECTED]> wrote:

> Permit an inode's security ID to be obtained by the CacheFiles module.  This
> is then used as the SID with which files and directories will be created in
> the cache.

This is SELinux specific functionality. It should not be an LSM
interface. 

Casey Schaufler
[EMAIL PROTECTED]


Re: [PATCH 09/14] CacheFiles: Permit a process's create SID to be overridden [try #2]

2007-08-09 Thread Casey Schaufler

--- David Howells <[EMAIL PROTECTED]> wrote:

> Make it possible for a process's file creation SID to be temporarily
> overridden by CacheFiles so that files created in the cache have the right
> label attached.
> 
> Without this facility, files created in the cache will be given the current
> file creation SID of whatever process happens to have invoked CacheFiles
> indirectly by means of opening a netfs file at the time the cache file is
> created.

This is SELinux specific functionality and should be done in the
SELinux code. You should not be adding interfaces that are SELinux
specific, in this case using secids instead of the LSM blob interfaces.


Casey Schaufler
[EMAIL PROTECTED]


[PATCH 10/14] CacheFiles: Add an act-as SID override in task_security_struct [try #2]

2007-08-09 Thread David Howells
Add an act-as SID to task_security_struct that is equivalent to fsuid/fsgid in
task_struct.  This permits a task to perform operations as if it is the
overriding SID, without changing its own SID as that might be needed to control
access to the process by ptrace, signals, /proc, etc.

This is useful for CacheFiles in that it allows CacheFiles to access the cache
files and directories using the cache's security context rather than the
security context of the process on whose behalf it is working, and in the
context of which it is running.
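
The split between a task's own SID and its act-as SID mirrors the split between uid and fsuid. Below is a small model of the act_as_secid and act_as_self hooks this patch adds: the first installs an override and returns the previous actor SID, and the second resets the actor SID to the task's own SID. The two lookup helpers are hypothetical and show which SID each kind of check would consult. The patch's own example is the switch from tsec->sid to tsec->actor_sid in selinux_relabel_packet_permission().

```c
typedef unsigned int u32;

/* Cut-down task_security_struct: own SID plus the act-as override. */
struct task_security {
	u32 sid;	/* the task's identity: ptrace, signals, /proc */
	u32 actor_sid;	/* used when the task acts on objects such as files */
};

/* Like the act_as_secid hook: install an override, return the old one. */
static u32 act_as_secid(struct task_security *tsec, u32 secid)
{
	u32 old = tsec->actor_sid;

	tsec->actor_sid = secid;
	return old;
}

/* Like the act_as_self hook: drop the override, return the old one. */
static u32 act_as_self(struct task_security *tsec)
{
	return act_as_secid(tsec, tsec->sid);
}

/* Hypothetical helpers: which SID a given kind of check would use. */
static u32 subject_sid_for_file_ops(const struct task_security *tsec)
{
	return tsec->actor_sid;
}

static u32 object_sid_for_ptrace_or_signal(const struct task_security *tsec)
{
	return tsec->sid;
}
```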

Signed-Off-By: David Howells <[EMAIL PROTECTED]>
---

 include/linux/security.h  |   32 +++
 security/dummy.c  |   12 +++
 security/selinux/exports.c|2 
 security/selinux/hooks.c  |  160 +++--
 security/selinux/include/objsec.h |1 
 security/selinux/selinuxfs.c  |2 
 security/selinux/xfrm.c   |6 +
 7 files changed, 148 insertions(+), 67 deletions(-)

diff --git a/include/linux/security.h b/include/linux/security.h
index 92d3da0..422015d 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1154,6 +1154,16 @@ struct request_sock;
  * Set the current FS security ID.
  * @secid contains the security ID to set.
  *
+ * @act_as_secid:
+ * Set the security ID as which to act, returning the security ID as which
+ *  the process was previously acting.
+ * @secid contains the security ID to act as.
+ *
+ * @act_as_self:
+ * Reset the security ID as which to act to be the same as the process's
+ *  owning security ID, and return the security ID as which the process was
+ *  previously acting.
+ *
  * This is the main security structure.
  */
 struct security_operations {
@@ -1339,6 +1349,8 @@ struct security_operations {
void (*release_secctx)(char *secdata, u32 seclen);
u32 (*get_fscreate_secid)(void);
u32 (*set_fscreate_secid)(u32 secid);
+   u32 (*act_as_secid)(u32 secid);
+   u32 (*act_as_self)(void);
 
 #ifdef CONFIG_SECURITY_NETWORK
int (*unix_stream_connect) (struct socket * sock,
@@ -2146,6 +2158,16 @@ static inline u32 security_set_fscreate_secid(u32 secid)
return security_ops->set_fscreate_secid(secid);
 }
 
+static inline u32 security_act_as_secid(u32 secid)
+{
+   return security_ops->act_as_secid(secid);
+}
+
+static inline u32 security_act_as_self(void)
+{
+   return security_ops->act_as_self();
+}
+
 /* prototypes */
 extern int security_init   (void);
 extern int register_security   (struct security_operations *ops);
@@ -2838,6 +2860,16 @@ static inline u32 security_set_fscreate_secid(u32 secid)
return 0;
 }
 
+static inline u32 security_act_as_secid(u32 secid)
+{
+   return 0;
+}
+
+static inline u32 security_act_as_self(void)
+{
+   return 0;
+}
+
 #endif /* CONFIG_SECURITY */
 
 #ifdef CONFIG_SECURITY_NETWORK
diff --git a/security/dummy.c b/security/dummy.c
index d463e6f..77ec75d 100644
--- a/security/dummy.c
+++ b/security/dummy.c
@@ -940,6 +940,16 @@ static u32 dummy_set_fscreate_secid(u32 secid)
return 0;
 }
 
+static u32 dummy_act_as_secid(u32 secid)
+{
+   return 0;
+}
+
+static u32 dummy_act_as_self(void)
+{
+   return 0;
+}
+
 #ifdef CONFIG_KEYS
 static inline int dummy_key_alloc(struct key *key, struct task_struct *ctx,
  unsigned long flags)
@@ -1096,6 +1106,8 @@ void security_fixup_ops (struct security_operations *ops)
set_to_dummy_if_null(ops, release_secctx);
set_to_dummy_if_null(ops, get_fscreate_secid);
set_to_dummy_if_null(ops, set_fscreate_secid);
+   set_to_dummy_if_null(ops, act_as_secid);
+   set_to_dummy_if_null(ops, act_as_self);
 #ifdef CONFIG_SECURITY_NETWORK
set_to_dummy_if_null(ops, unix_stream_connect);
set_to_dummy_if_null(ops, unix_may_send);
diff --git a/security/selinux/exports.c b/security/selinux/exports.c
index b6f9694..b559699 100644
--- a/security/selinux/exports.c
+++ b/security/selinux/exports.c
@@ -79,7 +79,7 @@ int selinux_relabel_packet_permission(u32 sid)
if (selinux_enabled) {
struct task_security_struct *tsec = current->security;
 
-   return avc_has_perm(tsec->sid, sid, SECCLASS_PACKET,
+   return avc_has_perm(tsec->actor_sid, sid, SECCLASS_PACKET,
PACKET__RELABELTO, NULL);
}
return 0;
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index c5905b0..66af819 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -162,7 +162,8 @@ static int task_alloc_security(struct task_struct *task)
return -ENOMEM;
 
tsec->task = task;
-   tsec->osid = tsec->sid = tsec->ptrace_sid = SECINITSID_UNLABELED;
+   tsec->osid = tsec->actor_sid = tsec->sid = tsec->ptrace_sid =
+   SECINITSID_UNLABELED;
task->security = tsec;
 
return 

[PATCH 06/14] CacheFiles: Add a hook to write a single page of data to an inode [try #2]

2007-08-09 Thread David Howells
Add an address space operation to write a single page of data to an inode at
a page-aligned location (thus permitting the implementation to be highly
optimised).

This is used by CacheFiles to store the contents of netfs pages into their
backing file pages.

Supply a generic implementation for this that uses the prepare_write() and
commit_write() address_space operations to bracket a copy made directly into the
page cache.

Hook the Ext2 and Ext3 operations to the generic implementation.

Signed-Off-By: David Howells <[EMAIL PROTECTED]>
---

 fs/ext2/inode.c|2 +
 fs/ext3/inode.c|3 ++
 include/linux/fs.h |7 
 mm/filemap.c   |   95 
 4 files changed, 107 insertions(+), 0 deletions(-)

diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 0079b2c..b3e4b50 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -695,6 +695,7 @@ const struct address_space_operations ext2_aops = {
.direct_IO  = ext2_direct_IO,
.writepages = ext2_writepages,
.migratepage= buffer_migrate_page,
+   .write_one_page = generic_file_buffered_write_one_page,
 };
 
 const struct address_space_operations ext2_aops_xip = {
@@ -713,6 +714,7 @@ const struct address_space_operations ext2_nobh_aops = {
.direct_IO  = ext2_direct_IO,
.writepages = ext2_writepages,
.migratepage= buffer_migrate_page,
+   .write_one_page = generic_file_buffered_write_one_page,
 };
 
 /*
diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index de4e316..93809eb 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -1713,6 +1713,7 @@ static const struct address_space_operations ext3_ordered_aops = {
.releasepage= ext3_releasepage,
.direct_IO  = ext3_direct_IO,
.migratepage= buffer_migrate_page,
+   .write_one_page = generic_file_buffered_write_one_page,
 };
 
 static const struct address_space_operations ext3_writeback_aops = {
@@ -1727,6 +1728,7 @@ static const struct address_space_operations ext3_writeback_aops = {
.releasepage= ext3_releasepage,
.direct_IO  = ext3_direct_IO,
.migratepage= buffer_migrate_page,
+   .write_one_page = generic_file_buffered_write_one_page,
 };
 
 static const struct address_space_operations ext3_journalled_aops = {
@@ -1740,6 +1742,7 @@ static const struct address_space_operations ext3_journalled_aops = {
.bmap   = ext3_bmap,
.invalidatepage = ext3_invalidatepage,
.releasepage= ext3_releasepage,
+   .write_one_page = generic_file_buffered_write_one_page,
 };
 
 void ext3_set_aops(struct inode *inode)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6bf1395..1b1f288 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -433,6 +433,11 @@ struct address_space_operations {
int (*migratepage) (struct address_space *,
struct page *, struct page *);
int (*launder_page) (struct page *);
+   /* write the contents of the source page over the page at the specified
+* index in the target address space (the source page does not need to
+* be related to the target address space) */
+   int (*write_one_page)(struct address_space *, pgoff_t, struct page *);
+
 };
 
 struct backing_dev_info;
@@ -1669,6 +1674,8 @@ extern ssize_t generic_file_direct_write(struct kiocb *, const struct iovec *,
unsigned long *, loff_t, loff_t *, size_t, size_t);
 extern ssize_t generic_file_buffered_write(struct kiocb *, const struct iovec *,
unsigned long, loff_t, loff_t *, size_t, ssize_t);
+extern int generic_file_buffered_write_one_page(struct address_space *,
+   pgoff_t, struct page *);
 extern ssize_t do_sync_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos);
 extern ssize_t do_sync_write(struct file *filp, const char __user *buf, size_t len, loff_t *ppos);
 extern void do_generic_mapping_read(struct address_space *mapping,
diff --git a/mm/filemap.c b/mm/filemap.c
index 7b96487..5e419a2 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2032,6 +2032,101 @@ zero_length_segment:
 }
 EXPORT_SYMBOL(generic_file_buffered_write);
 
+/**
+ * generic_file_buffered_write_one_page - Write a single page of data to an
+ * inode
+ * @mapping: The address space of the target inode
+ * @index: The target page in the target inode to fill
+ * @source: The data to write into the target page
+ *
+ * Write the data from the source page to the page in the nominated address
+ * space at the @index specified.  Note that the file will not be extended if
+ * the page crosses the EOF marker, in which case only the first part of the
+ * page will be written.
+ *
+ * The @source page does not need to have any association with the file or the
+ * target page offset.
+ */
+int generic_fil

[PATCH 12/14] CacheFiles: Get the SID under which the CacheFiles module should operate [try #2]

2007-08-09 Thread David Howells
Get the SID under which the CacheFiles module should operate so that the
SELinux security system can control the accesses it makes.

Signed-Off-By: David Howells <[EMAIL PROTECTED]>
---

 include/linux/security.h |   20 
 security/dummy.c |7 +++
 security/selinux/hooks.c |7 +++
 3 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/include/linux/security.h b/include/linux/security.h
index 21cadea..9cb417e 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1164,6 +1164,14 @@ struct request_sock;
  *  owning security ID, and return the security ID as which the process was
  *  previously acting.
  *
+ * @cachefiles_get_secid:
+ * Determine the security ID for the CacheFiles module to use when
+ * accessing the filesystem containing the cache.
+ * @secid contains the security ID under which the cachefiles daemon is
+ *  running.
+ * @modsecid contains the pointer to where the security ID for the module
+ * is to be stored.
+ *
  * This is the main security structure.
  */
 struct security_operations {
@@ -1352,6 +1360,7 @@ struct security_operations {
u32 (*set_fscreate_secid)(u32 secid);
u32 (*act_as_secid)(u32 secid);
u32 (*act_as_self)(void);
+   int (*cachefiles_get_secid)(u32 secid, u32 *modsecid);
 
 #ifdef CONFIG_SECURITY_NETWORK
int (*unix_stream_connect) (struct socket * sock,
@@ -2176,6 +2185,11 @@ static inline u32 security_act_as_self(void)
return security_ops->act_as_self();
 }
 
+static inline int security_cachefiles_get_secid(u32 secid, u32 *modsecid)
+{
+   return security_ops->cachefiles_get_secid(secid, modsecid);
+}
+
 /* prototypes */
 extern int security_init   (void);
 extern int register_security   (struct security_operations *ops);
@@ -2883,6 +2897,12 @@ static inline u32 security_act_as_self(void)
return 0;
 }
 
+static inline int security_cachefiles_get_secid(u32 secid, u32 *modsecid)
+{
+   *modsecid = 0;
+   return 0;
+}
+
 #endif /* CONFIG_SECURITY */
 
 #ifdef CONFIG_SECURITY_NETWORK
diff --git a/security/dummy.c b/security/dummy.c
index 6a7a317..2c1fd16 100644
--- a/security/dummy.c
+++ b/security/dummy.c
@@ -955,6 +955,12 @@ static u32 dummy_act_as_self(void)
return 0;
 }
 
+static int dummy_cachefiles_get_secid(u32 secid, u32 *modsecid)
+{
+   *modsecid = 0;
+   return 0;
+}
+
 #ifdef CONFIG_KEYS
 static inline int dummy_key_alloc(struct key *key, struct task_struct *ctx,
  unsigned long flags)
@@ -1114,6 +1120,7 @@ void security_fixup_ops (struct security_operations *ops)
set_to_dummy_if_null(ops, set_fscreate_secid);
set_to_dummy_if_null(ops, act_as_secid);
set_to_dummy_if_null(ops, act_as_self);
+   set_to_dummy_if_null(ops, cachefiles_get_secid);
 #ifdef CONFIG_SECURITY_NETWORK
set_to_dummy_if_null(ops, unix_stream_connect);
set_to_dummy_if_null(ops, unix_may_send);
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index c05d662..725f657 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -4718,6 +4718,12 @@ static u32 selinux_act_as_self(void)
return oldactor_sid;
 }
 
+static int selinux_cachefiles_get_secid(u32 secid, u32 *modsecid)
+{
+   return security_transition_sid(secid, SECINITSID_KERNEL,
+  SECCLASS_PROCESS, modsecid);
+}
+
 #ifdef CONFIG_KEYS
 
 static int selinux_key_alloc(struct key *k, struct task_struct *tsk,
@@ -4905,6 +4911,7 @@ static struct security_operations selinux_ops = {
.set_fscreate_secid =   selinux_set_fscreate_secid,
.act_as_secid = selinux_act_as_secid,
.act_as_self =  selinux_act_as_self,
+   .cachefiles_get_secid = selinux_cachefiles_get_secid,
 
 .unix_stream_connect = selinux_socket_unix_stream_connect,
.unix_may_send =selinux_socket_unix_may_send,

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 09/14] CacheFiles: Permit a process's create SID to be overridden [try #2]

2007-08-09 Thread David Howells
Make it possible for a process's file creation SID to be temporarily overridden
by CacheFiles so that files created in the cache have the right label attached.

Without this facility, files created in the cache will be given the current
file creation SID of whatever process happens to have invoked CacheFiles
indirectly by means of opening a netfs file at the time the cache file is
created.

Signed-Off-By: David Howells <[EMAIL PROTECTED]>
---

 include/linux/security.h |   35 +++
 security/dummy.c |   12 
 security/selinux/hooks.c |   18 ++
 3 files changed, 65 insertions(+), 0 deletions(-)

diff --git a/include/linux/security.h b/include/linux/security.h
index c11dc8a..92d3da0 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1147,6 +1147,13 @@ struct request_sock;
  * @secdata contains the security context.
  * @seclen contains the length of the security context.
  *
+ * @get_fscreate_secid:
+ * Get the current FS security ID.
+ *
+ * @set_fscreate_secid:
+ * Set the current FS security ID.
+ * @secid contains the security ID to set.
+ *
  * This is the main security structure.
  */
 struct security_operations {
@@ -1330,6 +1337,8 @@ struct security_operations {
int (*setprocattr)(struct task_struct *p, char *name, void *value, size_t size);
int (*secid_to_secctx)(u32 secid, char **secdata, u32 *seclen);
void (*release_secctx)(char *secdata, u32 seclen);
+   u32 (*get_fscreate_secid)(void);
+   u32 (*set_fscreate_secid)(u32 secid);
 
 #ifdef CONFIG_SECURITY_NETWORK
int (*unix_stream_connect) (struct socket * sock,
@@ -2127,6 +2136,16 @@ static inline void security_release_secctx(char *secdata, u32 seclen)
return security_ops->release_secctx(secdata, seclen);
 }
 
+static inline u32 security_get_fscreate_secid(void)
+{
+   return security_ops->get_fscreate_secid();
+}
+
+static inline u32 security_set_fscreate_secid(u32 secid)
+{
+   return security_ops->set_fscreate_secid(secid);
+}
+
 /* prototypes */
 extern int security_init   (void);
 extern int register_security   (struct security_operations *ops);
@@ -2795,6 +2814,11 @@ static inline void securityfs_remove(struct dentry *dentry)
 {
 }
 
+static inline int security_to_secctx_secid(char *secdata, u32 seclen, u32 *secid)
+{
+   return -EOPNOTSUPP;
+}
+
 static inline int security_secid_to_secctx(u32 secid, char **secdata, u32 *seclen)
 {
return -EOPNOTSUPP;
@@ -2803,6 +2827,17 @@ static inline int security_secid_to_secctx(u32 secid, char **secdata, u32 *secle
 static inline void security_release_secctx(char *secdata, u32 seclen)
 {
 }
+
+static inline u32 security_get_fscreate_secid(void)
+{
+   return 0;
+}
+
+static inline u32 security_set_fscreate_secid(u32 secid)
+{
+   return 0;
+}
+
 #endif /* CONFIG_SECURITY */
 
 #ifdef CONFIG_SECURITY_NETWORK
diff --git a/security/dummy.c b/security/dummy.c
index 19d813d..d463e6f 100644
--- a/security/dummy.c
+++ b/security/dummy.c
@@ -930,6 +930,16 @@ static void dummy_release_secctx(char *secdata, u32 seclen)
 {
 }
 
+static u32 dummy_get_fscreate_secid(void)
+{
+   return 0;
+}
+
+static u32 dummy_set_fscreate_secid(u32 secid)
+{
+   return 0;
+}
+
 #ifdef CONFIG_KEYS
 static inline int dummy_key_alloc(struct key *key, struct task_struct *ctx,
  unsigned long flags)
@@ -1084,6 +1094,8 @@ void security_fixup_ops (struct security_operations *ops)
set_to_dummy_if_null(ops, setprocattr);
set_to_dummy_if_null(ops, secid_to_secctx);
set_to_dummy_if_null(ops, release_secctx);
+   set_to_dummy_if_null(ops, get_fscreate_secid);
+   set_to_dummy_if_null(ops, set_fscreate_secid);
 #ifdef CONFIG_SECURITY_NETWORK
set_to_dummy_if_null(ops, unix_stream_connect);
set_to_dummy_if_null(ops, unix_may_send);
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 6237933..c5905b0 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -4661,6 +4661,22 @@ static void selinux_release_secctx(char *secdata, u32 seclen)
kfree(secdata);
 }
 
+static u32 selinux_get_fscreate_secid(void)
+{
+   struct task_security_struct *tsec = current->security;
+
+   return tsec->create_sid;
+}
+
+static u32 selinux_set_fscreate_secid(u32 secid)
+{
+   struct task_security_struct *tsec = current->security;
+   u32 oldsid = tsec->create_sid;
+
+   tsec->create_sid = secid;
+   return oldsid;
+}
+
 #ifdef CONFIG_KEYS
 
 static int selinux_key_alloc(struct key *k, struct task_struct *tsk,
@@ -4843,6 +4859,8 @@ static struct security_operations selinux_ops = {
 
.secid_to_secctx =  selinux_secid_to_secctx,
.release_secctx =   selinux_release_secctx,
+   .get_fscreate_secid =   selinux_get_fscreate_secid,
+   .set_fscreate_secid =   selinux_set_f

[PATCH 07/14] CacheFiles: Permit the page lock state to be monitored [try #2]

2007-08-09 Thread David Howells
Add a function to install a monitor on the page lock waitqueue for a particular
page, thus allowing the unlocking of that page to be detected.

This is used by CacheFiles to detect read completion on a page in the backing
filesystem so that it can then copy the data to the waiting netfs page.

Signed-Off-By: David Howells <[EMAIL PROTECTED]>
---

 include/linux/pagemap.h |5 +
 mm/filemap.c|   19 +++
 2 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index d1049b6..452fdcf 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -220,6 +220,11 @@ static inline void wait_on_page_fscache_write(struct page *page)
 extern void end_page_fscache_write(struct page *page);
 
 /*
+ * Add an arbitrary waiter to a page's wait queue
+ */
+extern void add_page_wait_queue(struct page *page, wait_queue_t *waiter);
+
+/*
  * Fault a userspace page into pagetables.  Return non-zero on a fault.
  *
  * This assumes that two userspace pages are always sufficient.  That's
diff --git a/mm/filemap.c b/mm/filemap.c
index 5e419a2..c60c24e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -518,6 +518,25 @@ void fastcall wait_on_page_bit(struct page *page, int bit_nr)
 EXPORT_SYMBOL(wait_on_page_bit);
 
 /**
+ * add_page_wait_queue - Add an arbitrary waiter to a page's wait queue
+ * @page: Page defining the wait queue of interest
+ * @waiter: Waiter to add to the queue
+ *
+ * Add an arbitrary @waiter to the wait queue for the nominated @page.
+ */
+void add_page_wait_queue(struct page *page, wait_queue_t *waiter)
+{
+   wait_queue_head_t *q = page_waitqueue(page);
+   unsigned long flags;
+
+   spin_lock_irqsave(&q->lock, flags);
+   __add_wait_queue(q, waiter);
+   spin_unlock_irqrestore(&q->lock, flags);
+}
+
+EXPORT_SYMBOL_GPL(add_page_wait_queue);
+
+/**
  * unlock_page - unlock a locked page
  * @page: the page
  *



[PATCH 11/14] CacheFiles: Permit an inode's security ID to be obtained [try #2]

2007-08-09 Thread David Howells
Permit an inode's security ID to be obtained by the CacheFiles module.  This is
then used as the SID with which files and directories will be created in the
cache.

Signed-Off-By: David Howells <[EMAIL PROTECTED]>
---

 include/linux/security.h |   13 +
 security/dummy.c |6 ++
 security/selinux/hooks.c |8 
 3 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/include/linux/security.h b/include/linux/security.h
index 422015d..21cadea 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1252,6 +1252,7 @@ struct security_operations {
int (*inode_getsecurity)(const struct inode *inode, const char *name, void *buffer, size_t size, int err);
int (*inode_setsecurity)(struct inode *inode, const char *name, const void *value, size_t size, int flags);
int (*inode_listsecurity)(struct inode *inode, char *buffer, size_t buffer_size);
+   u32 (*inode_get_secid)(struct inode *inode);
 
int (*file_permission) (struct file * file, int mask);
int (*file_alloc_security) (struct file * file);
@@ -1814,6 +1815,13 @@ static inline int security_inode_listsecurity(struct inode *inode, char *buffer,
return security_ops->inode_listsecurity(inode, buffer, buffer_size);
 }
 
+static inline u32 security_inode_get_secid(struct inode *inode)
+{
+   if (unlikely(IS_PRIVATE(inode)))
+   return 0;
+   return security_ops->inode_get_secid(inode);
+}
+
 static inline int security_file_permission (struct file *file, int mask)
 {
return security_ops->file_permission (file, mask);
@@ -2514,6 +2522,11 @@ static inline int security_inode_listsecurity(struct inode *inode, char *buffer,
return 0;
 }
 
+static inline u32 security_inode_get_secid(struct inode *inode)
+{
+   return 0;
+}
+
 static inline int security_file_permission (struct file *file, int mask)
 {
return 0;
diff --git a/security/dummy.c b/security/dummy.c
index 77ec75d..6a7a317 100644
--- a/security/dummy.c
+++ b/security/dummy.c
@@ -392,6 +392,11 @@ static int dummy_inode_listsecurity(struct inode *inode, char *buffer, size_t bu
return 0;
 }
 
+static u32 dummy_inode_get_secid(struct inode *inode)
+{
+   return 0;
+}
+
 static const char *dummy_inode_xattr_getsuffix(void)
 {
return NULL;
@@ -1042,6 +1047,7 @@ void security_fixup_ops (struct security_operations *ops)
set_to_dummy_if_null(ops, inode_getsecurity);
set_to_dummy_if_null(ops, inode_setsecurity);
set_to_dummy_if_null(ops, inode_listsecurity);
+   set_to_dummy_if_null(ops, inode_get_secid);
set_to_dummy_if_null(ops, file_permission);
set_to_dummy_if_null(ops, file_alloc_security);
set_to_dummy_if_null(ops, file_free_security);
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 66af819..c05d662 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2464,6 +2464,13 @@ static int selinux_inode_listsecurity(struct inode *inode, char *buffer, size_t
return len;
 }
 
+static u32 selinux_inode_get_secid(struct inode *inode)
+{
+   struct inode_security_struct *isec = inode->i_security;
+
+   return isec->sid;
+}
+
 /* file security operations */
 
 static int selinux_file_permission(struct file *file, int mask)
@@ -4822,6 +4829,7 @@ static struct security_operations selinux_ops = {
.inode_getsecurity =selinux_inode_getsecurity,
.inode_setsecurity =selinux_inode_setsecurity,
.inode_listsecurity =   selinux_inode_listsecurity,
+   .inode_get_secid =  selinux_inode_get_secid,
 
.file_permission =  selinux_file_permission,
.file_alloc_security =  selinux_file_alloc_security,



[PATCH 08/14] CacheFiles: Export things for CacheFiles [try #2]

2007-08-09 Thread David Howells
Export fsync_super() and __audit_inode_child() for CacheFiles's use.

Signed-Off-By: David Howells <[EMAIL PROTECTED]>
---

 fs/super.c   |2 ++
 kernel/auditsc.c |2 ++
 2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index fc8ebed..c0d99dd 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -270,6 +270,8 @@ int fsync_super(struct super_block *sb)
return sync_blockdev(sb->s_bdev);
 }
 
+EXPORT_SYMBOL_GPL(fsync_super);
+
 /**
  * generic_shutdown_super  -   common helper for ->kill_sb()
  * @sb: superblock to kill
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index a777d37..1c068ec 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -1526,6 +1526,8 @@ add_names:
}
 }
 
+EXPORT_SYMBOL_GPL(__audit_inode_child);
+
 /**
  * auditsc_get_stamp - get local copies of audit_context values
  * @ctx: audit_context for the task



[PATCH 14/14] NFS: Use local caching [try #2]

2007-08-09 Thread David Howells
The attached patch makes it possible for the NFS filesystem to make use of the
network filesystem local caching service (FS-Cache).

To be able to use this, an updated mount program is required.  This can be
obtained from:

http://people.redhat.com/steved/cachefs/util-linux/

To mount an NFS filesystem to use caching, add an "fsc" option to the mount:

mount warthog:/ /a -o fsc

Signed-Off-By: David Howells <[EMAIL PROTECTED]>
---

 fs/Kconfig |8 +
 fs/nfs/Makefile|1 
 fs/nfs/client.c|   11 +
 fs/nfs/file.c  |   38 +++-
 fs/nfs/fscache.c   |  303 +
 fs/nfs/fscache.h   |  464 
 fs/nfs/inode.c |   16 ++
 fs/nfs/internal.h  |8 +
 fs/nfs/read.c  |   27 ++-
 fs/nfs/super.c |1 
 fs/nfs/sysctl.c|   43 
 fs/nfs/write.c |3 
 include/linux/nfs4_mount.h |3 
 include/linux/nfs_fs.h |6 +
 include/linux/nfs_fs_sb.h  |5 
 include/linux/nfs_mount.h  |3 
 16 files changed, 931 insertions(+), 9 deletions(-)

diff --git a/fs/Kconfig b/fs/Kconfig
index 7feb4cb..76d5d16 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -1600,6 +1600,14 @@ config NFS_V4
 
  If unsure, say N.
 
+config NFS_FSCACHE
+   bool "Provide NFS client caching support (EXPERIMENTAL)"
+   depends on EXPERIMENTAL
+   depends on NFS_FS=m && FSCACHE || NFS_FS=y && FSCACHE=y
+   help
+ Say Y here if you want NFS data to be cached locally on disc through
+ the general filesystem cache manager.
+
 config NFS_DIRECTIO
bool "Allow direct I/O on NFS files"
depends on NFS_FS
diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index b55cb23..c9e7c43 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -16,4 +16,5 @@ nfs-$(CONFIG_NFS_V4)  += nfs4proc.o nfs4xdr.o nfs4state.o nfs4renewd.o \
   nfs4namespace.o
 nfs-$(CONFIG_NFS_DIRECTIO) += direct.o
 nfs-$(CONFIG_SYSCTL) += sysctl.o
+nfs-$(CONFIG_NFS_FSCACHE) += fscache.o
 nfs-objs   := $(nfs-y)
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index a49f9fe..7be7807 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -137,6 +137,8 @@ static struct nfs_client *nfs_alloc_client(const char *hostname,
clp->cl_state = 1 << NFS4CLNT_LEASE_EXPIRED;
 #endif
 
+   nfs_fscache_get_client_cookie(clp);
+
return clp;
 
 error_3:
@@ -168,6 +170,8 @@ static void nfs_free_client(struct nfs_client *clp)
 
nfs4_shutdown_client(clp);
 
+   nfs_fscache_release_client_cookie(clp);
+
/* -EIO all pending I/O */
if (!IS_ERR(clp->cl_rpcclient))
rpc_shutdown_client(clp->cl_rpcclient);
@@ -1308,7 +1312,7 @@ static int nfs_volume_list_show(struct seq_file *m, void *v)
 
/* display header on line 1 */
if (v == &nfs_volume_list) {
-   seq_puts(m, "NV SERVER   PORT DEV FSID\n");
+   seq_puts(m, "NV SERVER   PORT DEV FSID  FSC\n");
return 0;
}
/* display one transport per line on subsequent lines */
@@ -1322,12 +1326,13 @@ static int nfs_volume_list_show(struct seq_file *m, void *v)
 (unsigned long long) server->fsid.major,
 (unsigned long long) server->fsid.minor);
 
-   seq_printf(m, "v%d %02x%02x%02x%02x %4hx %-7s %-17s\n",
+   seq_printf(m, "v%d %02x%02x%02x%02x %4hx %-7s %-17s %s\n",
   clp->cl_nfsversion,
   NIPQUAD(clp->cl_addr.sin_addr),
   ntohs(clp->cl_addr.sin_port),
   dev,
-  fsid);
+  fsid,
+  nfs_server_fscache_state(server));
 
return 0;
 }
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index c87dc71..dfd36e0 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -34,6 +34,7 @@
 
 #include "delegation.h"
 #include "iostat.h"
+#include "internal.h"
 
 #define NFSDBG_FACILITYNFSDBG_FILE
 
@@ -259,6 +260,10 @@ nfs_file_mmap(struct file * file, struct vm_area_struct * vma)
status = nfs_revalidate_mapping(inode, file->f_mapping);
if (!status)
status = generic_file_mmap(file, vma);
+
+   if (status == 0)
+   nfs_fscache_install_vm_ops(inode, vma);
+
return status;
 }
 
@@ -311,22 +316,51 @@ static int nfs_commit_write(struct file *file, struct page *page, unsigned offse
return status;
 }
 
+/*
+ * partially or wholly invalidate a page
+ * - release the private state associated with a page if undergoing complete
+ *   page invalidation
+ * - caller holds page lock
+ */
 static void nfs_invalidate_page(struct page *page, unsigned long offset)
 {
if (offset != 0)
return;
/* Cancel any unstarted writes on this page */
nfs_wb_page_priority(page->mapping->host, page, FLUSH_INVALIDA

[PATCH 01/14] FS-Cache: Release page->private after failed readahead [try #2]

2007-08-09 Thread David Howells
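The attached patch causes read_cache_pages() to release the page->private data
on pages that it discards, either because add_to_page_cache() fails for a page
or because the filler function fails (in which case all the pages remaining in
the list are discarded).  This permits pages that have caching references
associated with them to be cleaned up.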

The invalidatepage() address space op is called (indirectly) to do the honours.

Signed-Off-By: David Howells <[EMAIL PROTECTED]>
---

 mm/readahead.c |   40 ++--
 1 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index 39bf45d..12d1378 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 
 void default_unplug_io_fn(struct backing_dev_info *bdi, struct page *page)
 {
@@ -51,6 +52,41 @@ EXPORT_SYMBOL_GPL(file_ra_state_init);
 
 #define list_to_page(head) (list_entry((head)->prev, struct page, lru))
 
+/*
+ * see if a page needs releasing upon read_cache_pages() failure
+ * - the caller of read_cache_pages() may have set PG_private before calling,
+ *   such as the NFS fs marking pages that are cached locally on disk, thus we
+ *   need to give the fs a chance to clean up in the event of an error
+ */
+static void read_cache_pages_invalidate_page(struct address_space *mapping,
+struct page *page)
+{
+   if (PagePrivate(page)) {
+   if (TestSetPageLocked(page))
+   BUG();
+   page->mapping = mapping;
+   do_invalidatepage(page, 0);
+   page->mapping = NULL;
+   unlock_page(page);
+   }
+   page_cache_release(page);
+}
+
+/*
+ * release a list of pages, invalidating them first if need be
+ */
+static void read_cache_pages_invalidate_pages(struct address_space *mapping,
+ struct list_head *pages)
+{
+   struct page *victim;
+
+   while (!list_empty(pages)) {
+   victim = list_to_page(pages);
+   list_del(&victim->lru);
+   read_cache_pages_invalidate_page(mapping, victim);
+   }
+}
+
 /**
  * read_cache_pages - populate an address space with some pages & start reads against them
  * @mapping: the address_space
@@ -74,14 +110,14 @@ int read_cache_pages(struct address_space *mapping, struct list_head *pages,
page = list_to_page(pages);
list_del(&page->lru);
if (add_to_page_cache(page, mapping, page->index, GFP_KERNEL)) {
-   page_cache_release(page);
+   read_cache_pages_invalidate_page(mapping, page);
continue;
}
ret = filler(data, page);
if (!pagevec_add(&lru_pvec, page))
__pagevec_lru_add(&lru_pvec);
if (ret) {
-   put_pages_list(pages);
+   read_cache_pages_invalidate_pages(mapping, pages);
break;
}
task_io_account_read(PAGE_CACHE_SIZE);



[PATCH 03/14] FS-Cache: Provide an add_wait_queue_tail() function [try #2]

2007-08-09 Thread David Howells
Provide an add_wait_queue_tail() function to add a waiter to the back of a
wait queue instead of the front.

Signed-off-by: David Howells <[EMAIL PROTECTED]>
---

 include/linux/wait.h |1 +
 kernel/wait.c|   18 ++
 2 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/include/linux/wait.h b/include/linux/wait.h
index 0e68628..4cae7db 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -118,6 +118,7 @@ static inline int waitqueue_active(wait_queue_head_t *q)
 #define is_sync_wait(wait) (!(wait) || ((wait)->private))
 
 extern void FASTCALL(add_wait_queue(wait_queue_head_t *q, wait_queue_t * wait));
+extern void FASTCALL(add_wait_queue_tail(wait_queue_head_t *q, wait_queue_t * wait));
 extern void FASTCALL(add_wait_queue_exclusive(wait_queue_head_t *q, wait_queue_t * wait));
 extern void FASTCALL(remove_wait_queue(wait_queue_head_t *q, wait_queue_t * wait));
 
diff --git a/kernel/wait.c b/kernel/wait.c
index 444ddbf..7acc9cc 100644
--- a/kernel/wait.c
+++ b/kernel/wait.c
@@ -29,6 +29,24 @@ void fastcall add_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
 }
 EXPORT_SYMBOL(add_wait_queue);
 
+/**
+ * add_wait_queue_tail - Add a waiter to the back of a waitqueue
+ * @q: the wait queue to append the waiter to
+ * @wait: the waiter to be queued
+ *
+ * Add a waiter to the back of a waitqueue so that it gets woken up last.
+ */
+void fastcall add_wait_queue_tail(wait_queue_head_t *q, wait_queue_t *wait)
+{
+   unsigned long flags;
+
+   wait->flags &= ~WQ_FLAG_EXCLUSIVE;
+   spin_lock_irqsave(&q->lock, flags);
+   __add_wait_queue_tail(q, wait);
+   spin_unlock_irqrestore(&q->lock, flags);
+}
+EXPORT_SYMBOL(add_wait_queue_tail);
+
 void fastcall add_wait_queue_exclusive(wait_queue_head_t *q, wait_queue_t *wait)
 {
unsigned long flags;



[PATCH 00/14] Permit filesystem local caching [try #2]

2007-08-09 Thread David Howells


These patches add local caching for network filesystems such as NFS and AFS.

FS-Cache now runs fully asynchronously as required by Trond Myklebust for NFS.

--
Changes:

 (*) The CacheFiles module no longer accepts directory fds in its cull and
 inuse commands from cachefilesd.  Instead it uses the current working
 directory of the calling process as the basis for looking up the object.
 As a corollary, fget_light() no longer needs to be exported.

--
A tarball of the patches is available at:


http://people.redhat.com/~dhowells/fscache/patches/nfs+fscache-21.tar.bz2


To use this version of CacheFiles, cachefilesd-0.9 is also required.  It
is available as an SRPM:

http://people.redhat.com/~dhowells/fscache/cachefilesd-0.9-1.fc7.src.rpm

Or as individual bits:

http://people.redhat.com/~dhowells/fscache/cachefilesd-0.9.tar.bz2
http://people.redhat.com/~dhowells/fscache/cachefilesd.fc
http://people.redhat.com/~dhowells/fscache/cachefilesd.if
http://people.redhat.com/~dhowells/fscache/cachefilesd.te
http://people.redhat.com/~dhowells/fscache/cachefilesd.spec

The .fc, .if and .te files are for manipulating SELinux.

David


[PATCH 05/14] CacheFiles: Add missing copy_page export for ia64 [try #2]

2007-08-09 Thread David Howells
This one-line patch adds the missing ia64 export of copy_page, which the
CacheFiles patches depend on.  This patch is not yet upstream, but is
required for CacheFiles on ia64.  It will be pushed upstream when
CacheFiles goes upstream.

Signed-off-by: Prarit Bhargava <[EMAIL PROTECTED]>
Signed-Off-By: David Howells <[EMAIL PROTECTED]>
---

 arch/ia64/kernel/ia64_ksyms.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/ia64/kernel/ia64_ksyms.c b/arch/ia64/kernel/ia64_ksyms.c
index bd17190..20c3546 100644
--- a/arch/ia64/kernel/ia64_ksyms.c
+++ b/arch/ia64/kernel/ia64_ksyms.c
@@ -43,6 +43,7 @@ EXPORT_SYMBOL(__do_clear_user);
 EXPORT_SYMBOL(__strlen_user);
 EXPORT_SYMBOL(__strncpy_from_user);
 EXPORT_SYMBOL(__strnlen_user);
+EXPORT_SYMBOL(copy_page);
 
 /* from arch/ia64/lib */
 extern void __divsi3(void);



[PATCH 02/14] FS-Cache: Recruit a couple of page flags for cache management [try #2]

2007-08-09 Thread David Howells
Recruit a couple of page flags to aid in cache management.  The following extra
flags are defined:

 (1) PG_fscache (PG_owner_priv_2)

 The marked page is backed by a local cache and is pinning resources in the
 cache driver.

 (2) PG_fscache_write (PG_owner_priv_3)

 The marked page is being written to the local cache.  The page may not be
 modified whilst this is in progress.

If PG_fscache is set, then code that checked for PG_private will now also
check for PG_fscache.  This includes truncation and page invalidation.
The function page_has_private() has been added to test for either flag.
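As a sketch of the check, here is a userspace model that assumes only the bit numbers shown in the page-flags.h hunk (PG_private = 11, PG_owner_priv_2 = 13, PG_owner_priv_3 = 18). struct model_page, model_set() and model_should_release() are hypothetical; in the kernel the corresponding caller would be something like the try_to_release_page() call in fs/splice.c:

```c
/* Userspace model of page_has_private() (not kernel code). */
#include <assert.h>
#include <stdbool.h>

/* Bit numbers as shown in the page-flags.h hunk. */
#define PG_private        11
#define PG_owner_priv_2   13
#define PG_owner_priv_3   18
#define PG_fscache        PG_owner_priv_2   /* backed by local cache */
#define PG_fscache_write  PG_owner_priv_3   /* writing to local cache */

/* Hypothetical stand-in for struct page: only the flags word. */
struct model_page {
	unsigned long flags;
};

/* Same mask test as the patch's page_has_private(), using 1UL. */
#define page_has_private(page) \
	((page)->flags & ((1UL << PG_private) | (1UL << PG_fscache)))

/* Hypothetical helper: set one flag bit (non-atomic, unlike set_bit()). */
static void model_set(struct model_page *page, int bit)
{
	assert(bit >= 0 && bit < 32);	/* all bits used here fit in 32 */
	page->flags |= 1UL << bit;
}

/* Would a path such as truncation need to call the release routine? */
static bool model_should_release(const struct model_page *page)
{
	return page_has_private(page) != 0;
}
```

Setting only PG_fscache_write does not make page_has_private() true; only PG_private or PG_fscache does.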

Signed-off-by: David Howells <[EMAIL PROTECTED]>
---

 fs/splice.c                |    2 +-
 include/linux/page-flags.h |   30 +-
 include/linux/pagemap.h    |   11 +++
 mm/filemap.c               |   16 
 mm/migrate.c               |    2 +-
 mm/page_alloc.c            |    3 +++
 mm/readahead.c             |    9 +
 mm/swap.c                  |    4 ++--
 mm/swap_state.c            |    4 ++--
 mm/truncate.c              |   10 +-
 mm/vmscan.c                |    2 +-
 11 files changed, 76 insertions(+), 17 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index c010a72..ae4f5b7 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -58,7 +58,7 @@ static int page_cache_pipe_buf_steal(struct pipe_inode_info *pipe,
 */
wait_on_page_writeback(page);
 
-   if (PagePrivate(page))
+   if (page_has_private(page))
try_to_release_page(page, GFP_KERNEL);
 
/*
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 209d3a4..eaf9854 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -83,19 +83,24 @@
 #define PG_private 11  /* If pagecache, has fs-private data */
 
 #define PG_writeback   12  /* Page is under writeback */
+#define PG_owner_priv_2    13  /* Owner use. If pagecache, fs may use */
 #define PG_compound        14  /* Part of a compound page */
 #define PG_swapcache       15  /* Swap page: swp_entry_t in private */
 
 #define PG_mappedtodisk    16  /* Has blocks allocated on-disk */
 #define PG_reclaim         17  /* To be reclaimed asap */
+#define PG_owner_priv_3    18  /* Owner use. If pagecache, fs may use */
 #define PG_buddy           19  /* Page is free, on buddy lists */
 
 /* PG_readahead is only used for file reads; PG_reclaim is only for writes */
 #define PG_readahead   PG_reclaim /* Reminder to do async read-ahead */
 
-/* PG_owner_priv_1 users should have descriptive aliases */
+/* PG_owner_priv_1/2/3 users should have descriptive aliases */
 #define PG_checked PG_owner_priv_1 /* Used by some filesystems */
 #define PG_pinned  PG_owner_priv_1 /* Xen pinned pagetable */
+#define PG_fscache PG_owner_priv_2 /* Backed by local cache */
+#define PG_fscache_write   PG_owner_priv_3 /* Writing to local cache */
+
 
 #if (BITS_PER_LONG > 32)
 /*
@@ -199,6 +204,18 @@ static inline void SetPageUptodate(struct page *page)
 #define TestClearPageWriteback(page) test_and_clear_bit(PG_writeback,  \
&(page)->flags)
 
+#define PageFsCache(page)               test_bit(PG_fscache, &(page)->flags)
+#define SetPageFsCache(page)            set_bit(PG_fscache, &(page)->flags)
+#define ClearPageFsCache(page)          clear_bit(PG_fscache, &(page)->flags)
+#define TestSetPageFsCache(page)        test_and_set_bit(PG_fscache, &(page)->flags)
+#define TestClearPageFsCache(page)      test_and_clear_bit(PG_fscache, &(page)->flags)
+
+#define PageFsCacheWrite(page)          test_bit(PG_fscache_write, &(page)->flags)
+#define SetPageFsCacheWrite(page)       set_bit(PG_fscache_write, &(page)->flags)
+#define ClearPageFsCacheWrite(page)     clear_bit(PG_fscache_write, &(page)->flags)
+#define TestSetPageFsCacheWrite(page)   test_and_set_bit(PG_fscache_write, &(page)->flags)
+#define TestClearPageFsCacheWrite(page) test_and_clear_bit(PG_fscache_write, &(page)->flags)
+
 #define PageBuddy(page)                 test_bit(PG_buddy, &(page)->flags)
 #define __SetPageBuddy(page)            __set_bit(PG_buddy, &(page)->flags)
 #define __ClearPageBuddy(page)          __clear_bit(PG_buddy, &(page)->flags)
@@ -272,4 +289,15 @@ static inline void set_page_writeback(struct page *page)
test_set_page_writeback(page);
 }
 
+/**
+ * page_has_private - Determine if page has private stuff
+ * @page: The page to be checked
+ *
+ * Determine if a page has private stuff, indicating that release routines
+ * should be invoked upon it.
+ */
+#define page_has_private(page) \
+   ((page)->flags & ((1 << PG_private) |   \
+ (1 << PG_fscache)))
+
 #endif /* PAGE_FLAGS_H */
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
ind

Re: [RFC PATCH 1/4] pass open file to ->setattr()

2007-08-09 Thread Miklos Szeredi
> > This is needed to be able to correctly implement open-unlink-fsetattr
> > semantics in some filesystem such as sshfs, without having to resort
> > to "silly-renaming".
> 
> How do you plan to do that?

Easy: the SFTP protocol has stateful opens and defines an FSTAT call.

Miklos


Re: [RFC PATCH 1/4] pass open file to ->setattr()

2007-08-09 Thread J. Bruce Fields
On Thu, Aug 09, 2007 at 05:27:45PM +0200, [EMAIL PROTECTED] wrote:
> This is needed to be able to correctly implement open-unlink-fsetattr
> semantics in some filesystem such as sshfs, without having to resort
> to "silly-renaming".

How do you plan to do that?

--b.


[RFC PATCH 2/4] pass open file to ->getattr()

2007-08-09 Thread miklos
From: Miklos Szeredi <[EMAIL PROTECTED]>

Pass the open file into the filesystem's ->getattr() method for
fstat().

This is needed to be able to correctly implement open-unlink-fstat
semantics in some filesystems, such as sshfs, without having to resort
to "silly-renaming".

Do this by adding a 'struct file *' parameter to i_op->getattr().  For
fstat() pass the open file pointer; in other cases pass NULL.

This is safe from a compatibility standpoint: old out-of-tree code
will continue to work, but will get a warning at compile time.
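A minimal userspace model of the open-unlink-fstat case, assuming a server that, like SFTP with its stateful opens, keeps an object reachable through an open handle after its name is removed. struct remote_obj, struct model_file, lookup_path(), model_unlink() and model_getattr() are hypothetical; only the "file is non-NULL for fstat(), NULL otherwise" convention comes from the patch:

```c
/* Userspace model (not kernel or sshfs code); all names hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* A server-side object; name[0] == '\0' once it has been unlinked. */
struct remote_obj {
	char name[32];
	long size;
};

/* Open file: holds a server handle for as long as it is open. */
struct model_file {
	struct remote_obj *obj;
};

struct model_kstat {
	long size;
};

#define NOBJS 4
static struct remote_obj server[NOBJS];

/* Path-based lookup, like a STAT request by name. */
static struct remote_obj *lookup_path(const char *name)
{
	assert(name != NULL);
	for (size_t i = 0; i < NOBJS; i++)
		if (server[i].name[0] != '\0' && strcmp(server[i].name, name) == 0)
			return &server[i];
	return NULL;
}

/* Remove the name only; open handles keep the object reachable. */
static void model_unlink(const char *name)
{
	struct remote_obj *obj = lookup_path(name);

	if (obj)
		obj->name[0] = '\0';
}

/* Mirrors the new ->getattr() convention: file is non-NULL only for fstat(). */
static int model_getattr(const char *path, struct model_kstat *st,
			 struct model_file *file)
{
	/* With a file, use its handle (like SFTP FSTAT); else look up the path. */
	struct remote_obj *obj = file ? file->obj : lookup_path(path);

	if (!obj)
		return -ENOENT;
	st->size = obj->size;
	return 0;
}
```

After model_unlink("data"), a path-based model_getattr("data", &st, NULL) fails with -ENOENT, while the same call with the open handle still returns the size, without any silly-renaming.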

Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]>
---

Index: linux/fs/9p/vfs_inode.c
===
--- linux.orig/fs/9p/vfs_inode.c  2007-08-09 16:47:30.0 +0200
+++ linux/fs/9p/vfs_inode.c 2007-08-09 16:48:45.0 +0200
@@ -706,7 +706,7 @@ done:
 
 static int
 v9fs_vfs_getattr(struct vfsmount *mnt, struct dentry *dentry,
-struct kstat *stat)
+struct kstat *stat, struct file *file)
 {
int err;
struct v9fs_session_info *v9ses;
Index: linux/fs/afs/inode.c
===
--- linux.orig/fs/afs/inode.c   2007-08-09 16:47:30.0 +0200
+++ linux/fs/afs/inode.c  2007-08-09 16:48:45.0 +0200
@@ -295,7 +295,7 @@ error_unlock:
  * read the attributes of an inode
  */
 int afs_getattr(struct vfsmount *mnt, struct dentry *dentry,
- struct kstat *stat)
+ struct kstat *stat, struct file *file)
 {
struct inode *inode;
 
Index: linux/fs/afs/internal.h
===
--- linux.orig/fs/afs/internal.h  2007-08-09 16:47:30.0 +0200
+++ linux/fs/afs/internal.h 2007-08-09 16:48:45.0 +0200
@@ -548,7 +548,8 @@ extern struct inode *afs_iget(struct sup
  struct afs_callback *);
 extern void afs_zap_data(struct afs_vnode *);
 extern int afs_validate(struct afs_vnode *, struct key *);
-extern int afs_getattr(struct vfsmount *, struct dentry *, struct kstat *);
+extern int afs_getattr(struct vfsmount *, struct dentry *, struct kstat *,
+  struct file *);
 extern int afs_setattr(struct dentry *, struct iattr *);
 extern void afs_clear_inode(struct inode *);
 
Index: linux/fs/bad_inode.c
===
--- linux.orig/fs/bad_inode.c   2007-08-09 16:47:30.0 +0200
+++ linux/fs/bad_inode.c  2007-08-09 16:48:45.0 +0200
@@ -250,7 +250,7 @@ static int bad_inode_permission(struct i
 }
 
 static int bad_inode_getattr(struct vfsmount *mnt, struct dentry *dentry,
-   struct kstat *stat)
+   struct kstat *stat, struct file *file)
 {
return -EIO;
 }
Index: linux/fs/cifs/cifsfs.h
===
--- linux.orig/fs/cifs/cifsfs.h 2007-08-09 16:47:30.0 +0200
+++ linux/fs/cifs/cifsfs.h  2007-08-09 16:48:45.0 +0200
@@ -55,7 +55,8 @@ extern int cifs_rmdir(struct inode *, st
 extern int cifs_rename(struct inode *, struct dentry *, struct inode *,
   struct dentry *);
 extern int cifs_revalidate(struct dentry *);
-extern int cifs_getattr(struct vfsmount *, struct dentry *, struct kstat *);
+extern int cifs_getattr(struct vfsmount *, struct dentry *, struct kstat *,
+   struct file *);
 extern int cifs_setattr(struct dentry *, struct iattr *);
 
 extern const struct inode_operations cifs_file_inode_ops;
Index: linux/fs/cifs/inode.c
===
--- linux.orig/fs/cifs/inode.c  2007-08-09 16:47:30.0 +0200
+++ linux/fs/cifs/inode.c   2007-08-09 16:48:45.0 +0200
@@ -1332,7 +1332,7 @@ int cifs_revalidate(struct dentry *diren
 }
 
 int cifs_getattr(struct vfsmount *mnt, struct dentry *dentry,
-   struct kstat *stat)
+   struct kstat *stat, struct file *file)
 {
int err = cifs_revalidate(dentry);
if (!err) {
Index: linux/fs/coda/inode.c
===
--- linux.orig/fs/coda/inode.c  2007-08-09 16:47:30.0 +0200
+++ linux/fs/coda/inode.c   2007-08-09 16:48:45.0 +0200
@@ -220,7 +220,8 @@ static void coda_clear_inode(struct inod
coda_cache_clear_inode(inode);
 }
 
-int coda_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
+int coda_getattr(struct vfsmount *mnt, struct dentry *dentry,
+struct kstat *stat, struct file *file)
 {
int err = coda_revalidate_inode(dentry);
if (!err)
Index: linux/fs/fat/file.c
===
--- linux.orig/fs/fat/file.c  2007-08-09 16:47:30.0 +0200
+++ linux/fs/fat/file.c 2007-08-09 16:48:45.0 +0200
@@ -303,7 +303,8 @@ v

[RFC PATCH 1/4] pass open file to ->setattr()

2007-08-09 Thread miklos
From: Miklos Szeredi <[EMAIL PROTECTED]>

Pass the open file into the filesystem's ->setattr() method for
fchmod, fchown and some of the utimes variants.

This is needed to be able to correctly implement open-unlink-fsetattr
semantics in some filesystems, such as sshfs, without having to resort
to "silly-renaming".

The infrastructure is already there, so we just need to fill in
attrs.ia_file and set ATTR_FILE in attrs.ia_valid.
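On the VFS side the pattern, as in chown_common() in this patch, is to set ATTR_FILE and ia_file only when an open file is available. A userspace sketch, assuming the 2.6-era values ATTR_UID = 2, ATTR_GID = 4 and ATTR_CTIME = 64 (ATTR_FILE = 8192 appears in the fs.h context of patch 3/4); struct model_iattr, model_chown_attrs() and model_target_handle() are hypothetical:

```c
/* Userspace model of filling in ia_file (not kernel code). */
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* 2.6-era attribute bits (assumed), ATTR_FILE as in fs.h. */
#define ATTR_UID    2
#define ATTR_GID    4
#define ATTR_CTIME  64
#define ATTR_FILE   8192

struct model_file {
	int handle;		/* e.g. a remote file handle */
};

/* Cut-down, hypothetical stand-in for struct iattr. */
struct model_iattr {
	unsigned int ia_valid;
	uid_t ia_uid;
	gid_t ia_gid;
	struct model_file *ia_file;
};

/* Same shape as chown_common(): file is NULL for chown/lchown/fchownat. */
static struct model_iattr model_chown_attrs(uid_t user, gid_t group,
					    struct model_file *file)
{
	struct model_iattr a = { .ia_valid = ATTR_CTIME };

	if (user != (uid_t)-1) {
		a.ia_valid |= ATTR_UID;
		a.ia_uid = user;
	}
	if (group != (gid_t)-1) {
		a.ia_valid |= ATTR_GID;
		a.ia_gid = group;
	}
	if (file) {
		a.ia_file = file;
		a.ia_valid |= ATTR_FILE;
	}
	return a;
}

/* A filesystem may only rely on ia_file when ATTR_FILE is set. */
static int model_target_handle(const struct model_iattr *a)
{
	assert(!(a->ia_valid & ATTR_FILE) || a->ia_file != NULL);
	return (a->ia_valid & ATTR_FILE) ? a->ia_file->handle : -1;
}
```

A filesystem such as sshfs could then operate on the open handle when ATTR_FILE is present and fall back to the path otherwise.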

Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]>
---

Index: linux/fs/open.c
===
--- linux.orig/fs/open.c  2007-08-09 16:47:30.0 +0200
+++ linux/fs/open.c 2007-08-09 16:48:43.0 +0200
@@ -581,7 +581,8 @@ asmlinkage long sys_fchmod(unsigned int 
if (mode == (mode_t) -1)
mode = inode->i_mode;
newattrs.ia_mode = (mode & S_IALLUGO) | (inode->i_mode & ~S_IALLUGO);
-   newattrs.ia_valid = ATTR_MODE | ATTR_CTIME;
+   newattrs.ia_valid = ATTR_MODE | ATTR_CTIME | ATTR_FILE;
+   newattrs.ia_file = file;
err = notify_change(dentry, &newattrs);
mutex_unlock(&inode->i_mutex);
 
@@ -631,7 +632,8 @@ asmlinkage long sys_chmod(const char __u
return sys_fchmodat(AT_FDCWD, filename, mode);
 }
 
-static int chown_common(struct dentry * dentry, uid_t user, gid_t group)
+static int chown_common(struct dentry * dentry, uid_t user, gid_t group,
+   struct file *file)
 {
struct inode * inode;
int error;
@@ -659,6 +661,10 @@ static int chown_common(struct dentry * 
}
if (!S_ISDIR(inode->i_mode))
newattrs.ia_valid |= ATTR_KILL_SUID|ATTR_KILL_SGID;
+   if (file) {
+   newattrs.ia_file = file;
+   newattrs.ia_valid |= ATTR_FILE;
+   }
mutex_lock(&inode->i_mutex);
error = notify_change(dentry, &newattrs);
mutex_unlock(&inode->i_mutex);
@@ -674,7 +680,7 @@ asmlinkage long sys_chown(const char __u
error = user_path_walk(filename, &nd);
if (error)
goto out;
-   error = chown_common(nd.dentry, user, group);
+   error = chown_common(nd.dentry, user, group, NULL);
path_release(&nd);
 out:
return error;
@@ -694,7 +700,7 @@ asmlinkage long sys_fchownat(int dfd, co
error = __user_walk_fd(dfd, filename, follow, &nd);
if (error)
goto out;
-   error = chown_common(nd.dentry, user, group);
+   error = chown_common(nd.dentry, user, group, NULL);
path_release(&nd);
 out:
return error;
@@ -708,7 +714,7 @@ asmlinkage long sys_lchown(const char __
error = user_path_walk_link(filename, &nd);
if (error)
goto out;
-   error = chown_common(nd.dentry, user, group);
+   error = chown_common(nd.dentry, user, group, NULL);
path_release(&nd);
 out:
return error;
@@ -727,7 +733,7 @@ asmlinkage long sys_fchown(unsigned int 
 
dentry = file->f_path.dentry;
audit_inode(NULL, dentry);
-   error = chown_common(dentry, user, group);
+   error = chown_common(dentry, user, group, file);
fput(file);
 out:
return error;
Index: linux/fs/utimes.c
===
--- linux.orig/fs/utimes.c  2007-08-09 16:47:30.0 +0200
+++ linux/fs/utimes.c   2007-08-09 16:48:43.0 +0200
@@ -130,6 +130,10 @@ long do_utimes(int dfd, char __user *fil
}
}
}
+   if (f) {
+   newattrs.ia_file = f;
+   newattrs.ia_valid |= ATTR_FILE;
+   }
mutex_lock(&inode->i_mutex);
error = notify_change(dentry, &newattrs);
mutex_unlock(&inode->i_mutex);

--


[RFC PATCH 4/4] VFS: allow filesystem to override mknod capability checks

2007-08-09 Thread miklos
From: Miklos Szeredi <[EMAIL PROTECTED]>

Add a new filesystem flag that makes the VFS skip its check of whether
the current process has enough privileges to create a device node with
mknod().

This is needed on filesystems where an unprivileged user may be able
to create a device node without causing security problems.

One such example is "mountlo" a loopback mount utility implemented
with fuse and UML, which runs as an unprivileged userspace process.
In this case the user does in fact have the right to create device
nodes within the filesystem image, as long as the user has write
access to the image.  Since the filesystem is mounted with "nodev",
adding device nodes is not a security concern.

This feature is basically "fuse-only", so it does not make sense to
change the semantics of ->mknod().
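The changed condition in vfs_mknod() can be modeled in userspace as follows. FS_MKNOD_CHECKS_PERM = 16 is taken from the patch; the bool has_cap_mknod stands in for capable(CAP_MKNOD), and model_mknod_perm() is a hypothetical name:

```c
#define _XOPEN_SOURCE 700	/* for S_IFCHR etc. under strict C modes */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

#define FS_MKNOD_CHECKS_PERM 16	/* value from the patch */

/*
 * Hypothetical model of the changed test in vfs_mknod():
 * fs_flags plays the role of dir->i_sb->s_type->fs_flags and
 * has_cap_mknod plays the role of capable(CAP_MKNOD).
 */
static int model_mknod_perm(int fs_flags, mode_t mode, bool has_cap_mknod)
{
	assert((mode & S_IFMT) != 0);	/* this model expects a file type */
	if (!(fs_flags & FS_MKNOD_CHECKS_PERM) &&
	    (S_ISCHR(mode) || S_ISBLK(mode)) && !has_cap_mknod)
		return -EPERM;
	return 0;	/* not rejected here; ->mknod() decides the rest */
}
```

Without the flag, an unprivileged S_IFCHR request gets -EPERM; with it, the VFS no longer rejects the request at this point and the filesystem's own ->mknod() has to decide. Non-device nodes such as FIFOs are never affected by this check.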

Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]>
---

Index: linux/fs/namei.c
===
--- linux.orig/fs/namei.c   2007-08-09 16:49:07.0 +0200
+++ linux/fs/namei.c  2007-08-09 16:49:12.0 +0200
@@ -1921,7 +1921,8 @@ int vfs_mknod(struct inode *dir, struct 
if (error)
return error;
 
-   if ((S_ISCHR(mode) || S_ISBLK(mode)) && !capable(CAP_MKNOD))
+   if (!(dir->i_sb->s_type->fs_flags & FS_MKNOD_CHECKS_PERM) &&
+   (S_ISCHR(mode) || S_ISBLK(mode)) && !capable(CAP_MKNOD))
return -EPERM;
 
if (!dir->i_op || !dir->i_op->mknod)
Index: linux/include/linux/fs.h
===
--- linux.orig/include/linux/fs.h   2007-08-09 16:49:07.0 +0200
+++ linux/include/linux/fs.h  2007-08-09 16:49:12.0 +0200
@@ -97,6 +97,7 @@ extern int dir_notify_enable;
 #define FS_BINARY_MOUNTDATA 2
 #define FS_HAS_SUBTYPE 4
 #define FS_SAFE 8  /* Safe to mount by unprivileged users */
+#define FS_MKNOD_CHECKS_PERM 16  /* FS checks if device creation is allowed */
 #define FS_REVAL_DOT   16384   /* Check the paths ".", ".." for staleness */
 #define FS_RENAME_DOES_D_MOVE  32768   /* FS will handle d_move()
 * during rename() internally.

--


[RFC PATCH 0/4] VFS updates

2007-08-09 Thread miklos
VFS tweaks needed for some FUSE features, but possibly useful to other
filesystems as well.

Comments are welcome.
-- 


[RFC PATCH 3/4] allow filesystems to implement atomic open+truncate

2007-08-09 Thread miklos
From: Miklos Szeredi <[EMAIL PROTECTED]>

Add a new attribute flag ATTR_OPEN, with the meaning: "truncation was
initiated by open() due to the O_TRUNC flag".

This way filesystems wanting to implement truncation within their
->open() method can ignore such truncate requests.

This is a quick & dirty hack, but it comes for free.

When (if) we implement a proper low-level open+create+truncate inode
operation, this can go away.
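A sketch of how a filesystem might use the flag, assuming its ->open() performs the O_TRUNC truncation itself (for example in a single request to the server). ATTR_OPEN = 16384 is from the patch; ATTR_SIZE = 8, ATTR_MTIME = 32 and ATTR_CTIME = 64 are assumed 2.6-era values; struct model_inode, model_setattr() and model_open() are hypothetical, and only the size part of setattr is modeled:

```c
/* Userspace model of an ATTR_OPEN-aware filesystem (not kernel code). */
#include <assert.h>
#include <stdbool.h>

/* Assumed 2.6-era ATTR_* values; ATTR_OPEN is the new flag. */
#define ATTR_SIZE   8
#define ATTR_MTIME  32
#define ATTR_CTIME  64
#define ATTR_OPEN   16384

/* Hypothetical inode of a network filesystem. */
struct model_inode {
	long size;
	int remote_truncates;	/* truncate requests sent to the server */
};

/* Hypothetical ->setattr(): only the size handling is modeled. */
static int model_setattr(struct model_inode *inode, unsigned int ia_valid,
			 long ia_size)
{
	assert(ia_size >= 0);
	/* A truncation from open(O_TRUNC) is left to ->open(), so skip it. */
	if ((ia_valid & ATTR_SIZE) && !(ia_valid & ATTR_OPEN)) {
		inode->size = ia_size;
		inode->remote_truncates++;
	}
	return 0;
}

/* Hypothetical ->open() that performs open+truncate in one request. */
static int model_open(struct model_inode *inode, bool o_trunc)
{
	if (o_trunc) {
		inode->size = 0;
		inode->remote_truncates++;
	}
	return 0;
}
```

A truncation carrying ATTR_OPEN is skipped, so only one truncate request reaches the server per open(O_TRUNC), while an ordinary truncate(2), which does not set ATTR_OPEN, is still handled by ->setattr().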

Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]>
---

Index: linux/fs/namei.c
===
--- linux.orig/fs/namei.c   2007-08-09 16:47:30.0 +0200
+++ linux/fs/namei.c  2007-08-09 16:49:07.0 +0200
@@ -1655,8 +1655,10 @@ int may_open(struct nameidata *nd, int a
error = locks_verify_locked(inode);
if (!error) {
DQUOT_INIT(inode);
-   
-   error = do_truncate(dentry, 0, ATTR_MTIME|ATTR_CTIME, NULL);
+
+   error = do_truncate(dentry, 0,
+   ATTR_MTIME|ATTR_CTIME|ATTR_OPEN,
+   NULL);
}
put_write_access(inode);
if (error)
Index: linux/include/linux/fs.h
===
--- linux.orig/include/linux/fs.h   2007-08-09 16:48:45.0 +0200
+++ linux/include/linux/fs.h  2007-08-09 16:49:07.0 +0200
@@ -335,6 +335,7 @@ typedef void (dio_iodone_t)(struct kiocb
 #define ATTR_KILL_SUID 2048
 #define ATTR_KILL_SGID 4096
 #define ATTR_FILE  8192
+#define ATTR_OPEN  16384   /* Truncating from open(O_TRUNC) */
 
 /*
  * This is the Inode Attributes structure, used for notify_change().  It
@@ -1521,7 +1522,7 @@ static inline int break_lease(struct ino
 
 /* fs/open.c */
 
-extern int do_truncate(struct dentry *, loff_t start, unsigned int time_attrs,
+extern int do_truncate(struct dentry *, loff_t start, unsigned int attrs,
   struct file *filp);
 extern long do_sys_open(int fdf, const char __user *filename, int flags,
int mode);
Index: linux/fs/open.c
===
--- linux.orig/fs/open.c  2007-08-09 16:48:43.0 +0200
+++ linux/fs/open.c 2007-08-09 16:49:07.0 +0200
@@ -194,7 +194,7 @@ out:
return error;
 }
 
-int do_truncate(struct dentry *dentry, loff_t length, unsigned int time_attrs,
+int do_truncate(struct dentry *dentry, loff_t length, unsigned int attrs,
struct file *filp)
 {
int err;
@@ -205,7 +205,7 @@ int do_truncate(struct dentry *dentry, l
return -EINVAL;
 
newattrs.ia_size = length;
-   newattrs.ia_valid = ATTR_SIZE | time_attrs;
+   newattrs.ia_valid = ATTR_SIZE | attrs;
if (filp) {
newattrs.ia_file = filp;
newattrs.ia_valid |= ATTR_FILE;

--


Re: [fuse-devel] [PATCH 00/25] move handling of setuid/gid bits from VFS into individual setattr functions (RESEND)

2007-08-09 Thread Jeff Layton
On Wed, 8 Aug 2007 22:05:13 +0200 (CEST)
Jan Engelhardt <[EMAIL PROTECTED]> wrote:

> 
> On Aug 8 2007 09:48, Andrew Morton wrote:
> >> > On Mon, 6 Aug 2007 09:54:03 -0400
> >> > Jeff Layton <[EMAIL PROTECTED]> wrote:
> >> > 
> >> > Is there any way in which we can prevent these problems?  Say
> >> > 
> >> > - rename something so that unconverted filesystems will reliably fail to
> >> >   compile?
> >> > 
> >> 
> >> I suppose we could rename the .setattr inode operation to something
> >> else, but then we'll be stuck with it for at least a while. That seems
> >> sort of kludgey too...
> >
> >Sure.  We're changing the required behaviour of .setattr.  Changing its
> >name is a fine and reasonably reliable way to communicate that fact.
> 
> Maybe ->chattr/->chgattr?
> 
> 

That seems like a good replacement name. :-)

Now that I think on this further though, maybe Trond's suggestion to
change how the return code works is the best one. That would
(hopefully) catch this problem at runtime, so if someone is using a
precompiled but unconverted module then that would be detected too.

--
Jeff Layton <[EMAIL PROTECTED]>