[PATCH 2/4] writeback: 3-queue based writeback schedule
Properly manage the 3 queues of sb->s_dirty/s_io/s_more_io so that
	- time-ordering of dirtied_when can be easily maintained
	- writeback can continue from where previous run left out

The majority work has been done by Andrew Morton and Ken Chen,
this patch just clarifies the roles of the 3 queues:
	- s_dirty for io delay(up to dirty_expire_interval)
	- s_io for io run(a full scan of s_io may involve multiple runs)
	- s_more_io for io continuation

The following paradigm shows the data flow.

                         requeue on new scan(empty s_io)
                            +-----------------------------+
                            |                             |
   dirty                 old|                             |
   inodes            enough V                             |
   ======> s_dirty ======> s_io                           |
            ^               |        requeue io           |
            |               +--------------------> s_more_io
            |    hold back  |
            +---------------+----------> disk write requests

sb->s_dirty: a FIFO queue
	- s_dirty hosts not-yet-expired(recently dirtied) dirty inodes
	- once expired, inodes will be moved out of s_dirty and *never put back*
	  (unless for some reason we have to hold on the inode for some time)

sb->s_io and sb->s_more_io: a cyclic queue scanned for io
	- on each run of generic_sync_sb_inodes(), some more s_dirty inodes may be
	  appended to s_io
	- on each full scan of s_io, all s_more_io inodes will be moved back to s_io
	- large files that cannot be synced in one run will be moved to s_more_io
	  for retry on next full scan

inode->dirtied_when
	- inode->dirtied_when is updated to the *current* jiffies on pushing into
	  s_dirty, and is never changed in other cases.
	- time-ordering thus can be simply ensured while moving inodes between
	  lists, since (time order == enqueue order)

Cc: Ken Chen <[EMAIL PROTECTED]>
Cc: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Fengguang Wu <[EMAIL PROTECTED]>
---
 fs/fs-writeback.c |  106 +---
 1 file changed, 52 insertions(+), 54 deletions(-)

--- linux-2.6.23-rc1-mm2.orig/fs/fs-writeback.c
+++ linux-2.6.23-rc1-mm2/fs/fs-writeback.c
@@ -93,6 +93,15 @@ static void __check_dirty_inode_list(str
 				__FILE__, __LINE__);\
 	} while (0)
 
+
+int sb_has_dirty_inodes(struct super_block *sb)
+{
+	return !list_empty(&sb->s_dirty) ||
+	       !list_empty(&sb->s_io) ||
+	       !list_empty(&sb->s_more_io);
+}
+EXPORT_SYMBOL(sb_has_dirty_inodes);
+
 /**
  *	__mark_inode_dirty -	internal function
  *	@inode: inode to mark
@@ -187,7 +196,7 @@ void __mark_inode_dirty(struct inode *in
 			goto out;
 
 		/*
-		 * If the inode was already on s_dirty or s_io, don't
+		 * If the inode was already on s_dirty/s_io/s_more_io, don't
 		 * reposition it (that would break s_dirty time-ordering).
 		 */
 		if (!was_dirty) {
@@ -211,33 +220,20 @@ static int write_inode(struct inode *ino
 }
 
 /*
- * Redirty an inode: set its when-it-was dirtied timestamp and move it to the
- * furthest end of its superblock's dirty-inode list.
- *
- * Before stamping the inode's ->dirtied_when, we check to see whether it is
- * already the most-recently-dirtied inode on the s_dirty list. If that is
- * the case then the inode must have been redirtied while it was being written
- * out and we don't reset its dirtied_when.
+ * Enqueue a newly dirtied inode:
+ *  - set its when-it-was dirtied timestamp
+ *  - move it to the furthest end of its superblock's dirty-inode list
  */
 static void redirty_tail(struct inode *inode)
 {
-	struct super_block *sb = inode->i_sb;
-
 	check_dirty_inode(inode);
-	if (!list_empty(&sb->s_dirty)) {
-		struct inode *tail_inode;
-
-		tail_inode = list_entry(sb->s_dirty.next, struct inode, i_list);
-		if (!time_after_eq(inode->dirtied_when,
-				tail_inode->dirtied_when))
-			inode->dirtied_when = jiffies;
-	}
-	list_move(&inode->i_list, &sb->s_dirty);
+	inode->dirtied_when = jiffies;
+	list_move(&inode->i_list, &inode->i_sb->s_dirty);
 	check_dirty_inode(inode);
 }
 
 /*
- * requeue inode for re-scanning after sb->s_io list is exhausted.
+ * Queue an inode for more io in the next full scan of s_io.
  */
 static void requeue_io(struct inode *inode)
 {
@@ -246,6 +242,32 @@ static void requeue_io(struct inode *ino
 	check_dirty_inode(inode);
 }
 
+/*
+ * Queue all possible inodes for a run of io.
+ * The resulting s_io is in order of:
+ * - inodes queued for more io from s_more_io(once for a full scan of s_io)
+ * - possible remaining inodes in s_io(was a partial scan)
+ * - dirty inodes (old enough) from s_dirty
+ */
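The queue roles from the changelog above can be tried out in user space. What follows is only a minimal sketch under stated assumptions, not the kernel code: toy_inode, toy_queue, toy_sb and all toy_*/q_* helpers are hypothetical names, the queues are simple FIFOs with the oldest entry at the head (the kernel keeps the oldest at list.prev), and timestamps are compared with a plain <= instead of time_after().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; not the kernel's inode/super_block. */
struct toy_inode {
	unsigned long dirtied_when;	/* stamped only when entering s_dirty */
	struct toy_inode *next;
};

/* FIFO with the oldest entry at the head. */
struct toy_queue {
	struct toy_inode *head, *tail;
};

struct toy_sb {
	struct toy_queue s_dirty, s_io, s_more_io;
};

static void q_push(struct toy_queue *q, struct toy_inode *inode)
{
	inode->next = NULL;
	if (q->tail)
		q->tail->next = inode;
	else
		q->head = inode;
	q->tail = inode;
}

static struct toy_inode *q_pop(struct toy_queue *q)
{
	struct toy_inode *inode = q->head;

	if (inode) {
		q->head = inode->next;
		if (!q->head)
			q->tail = NULL;
		inode->next = NULL;
	}
	return inode;
}

static size_t q_len(const struct toy_queue *q)
{
	size_t n = 0;

	for (const struct toy_inode *i = q->head; i; i = i->next)
		n++;
	return n;
}

/* Newly dirtied inode: stamp it and append it to s_dirty. */
static void toy_queue_dirty_inode(struct toy_sb *sb, struct toy_inode *inode,
				  unsigned long now)
{
	inode->dirtied_when = now;
	q_push(&sb->s_dirty, inode);
}

/* Inode needs more io: park it until the next full scan of s_io. */
static void toy_queue_for_more_io(struct toy_sb *sb, struct toy_inode *inode)
{
	q_push(&sb->s_more_io, inode);
}

/* Refill s_io: s_more_io first (only if s_io is empty), then expired s_dirty. */
static void toy_queue_inodes_for_io(struct toy_sb *sb, unsigned long older_than)
{
	struct toy_inode *inode;

	if (!sb->s_io.head)
		while ((inode = q_pop(&sb->s_more_io)))
			q_push(&sb->s_io, inode);

	/* Plain <= is enough for a toy; jiffies would need time_after(). */
	while (sb->s_dirty.head && sb->s_dirty.head->dirtied_when <= older_than)
		q_push(&sb->s_io, q_pop(&sb->s_dirty));
}

static int toy_sb_has_dirty_inodes(const struct toy_sb *sb)
{
	return sb->s_dirty.head || sb->s_io.head || sb->s_more_io.head;
}
```

Since dirtied_when is written only in toy_queue_dirty_inode(), every queue stays time-ordered simply because entries are only ever appended in scan order, which is the "time order == enqueue order" point of the changelog.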
[PATCH 3/4] writeback: function renames and cleanups
Two function renames:
	- rename redirty_tail() to queue_dirty_inode()
	- rename requeue_io() to queue_for_more_io()

Also some code cleanups on fs-writeback.c.
No behavior changes.

Cc: Ken Chen <[EMAIL PROTECTED]>
Cc: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Fengguang Wu <[EMAIL PROTECTED]>
---
 fs/fs-writeback.c |  133
 1 file changed, 62 insertions(+), 71 deletions(-)

--- linux-2.6.23-rc1-mm2.orig/fs/fs-writeback.c
+++ linux-2.6.23-rc1-mm2/fs/fs-writeback.c
@@ -102,6 +102,55 @@ int sb_has_dirty_inodes(struct super_blo
 }
 EXPORT_SYMBOL(sb_has_dirty_inodes);
 
+/*
+ * Enqueue a newly dirtied inode:
+ *  - set its when-it-was dirtied timestamp
+ *  - move it to the furthest end of its superblock's dirty-inode list
+ */
+static void queue_dirty_inode(struct inode *inode)
+{
+	check_dirty_inode(inode);
+	inode->dirtied_when = jiffies;
+	list_move(&inode->i_list, &inode->i_sb->s_dirty);
+	check_dirty_inode(inode);
+}
+
+/*
+ * Queue an inode for more io in the next full scan of s_io.
+ */
+static void queue_for_more_io(struct inode *inode)
+{
+	check_dirty_inode(inode);
+	list_move(&inode->i_list, &inode->i_sb->s_more_io);
+	check_dirty_inode(inode);
+}
+
+/*
+ * Queue all possible inodes for a run of io.
+ * The resulting s_io is in order of:
+ * - inodes queued for more io from s_more_io(once for a full scan of s_io)
+ * - possible remaining inodes in s_io(was a partial scan)
+ * - dirty inodes (old enough) from s_dirty
+ */
+static void queue_inodes_for_io(struct super_block *sb,
+				unsigned long *older_than_this)
+{
+	check_dirty_inode_list(sb);
+	if (list_empty(&sb->s_io))
+		list_splice_init(&sb->s_more_io, &sb->s_io); /* eldest first */
+	check_dirty_inode_list(sb);
+	while (!list_empty(&sb->s_dirty)) {
+		struct inode *inode = list_entry(sb->s_dirty.prev,
+						struct inode, i_list);
+		/* Was this inode dirtied too recently? */
+		if (older_than_this &&
+		    time_after(inode->dirtied_when, *older_than_this))
+			break;
+		list_move(&inode->i_list, &sb->s_io);
+	}
+	check_dirty_inode_list(sb);
+}
+
 /**
  *	__mark_inode_dirty -	internal function
  *	@inode: inode to mark
@@ -199,12 +248,8 @@ void __mark_inode_dirty(struct inode *in
 		 * If the inode was already on s_dirty/s_io/s_more_io, don't
 		 * reposition it (that would break s_dirty time-ordering).
 		 */
-		if (!was_dirty) {
-			check_dirty_inode(inode);
-			inode->dirtied_when = jiffies;
-			list_move(&inode->i_list, &sb->s_dirty);
-			check_dirty_inode(inode);
-		}
+		if (!was_dirty)
+			queue_dirty_inode(inode);
 	}
 out:
 	spin_unlock(&inode_lock);
@@ -219,55 +264,6 @@ static int write_inode(struct inode *ino
 	return 0;
 }
 
-/*
- * Enqueue a newly dirtied inode:
- *  - set its when-it-was dirtied timestamp
- *  - move it to the furthest end of its superblock's dirty-inode list
- */
-static void redirty_tail(struct inode *inode)
-{
-	check_dirty_inode(inode);
-	inode->dirtied_when = jiffies;
-	list_move(&inode->i_list, &inode->i_sb->s_dirty);
-	check_dirty_inode(inode);
-}
-
-/*
- * Queue an inode for more io in the next full scan of s_io.
- */
-static void requeue_io(struct inode *inode)
-{
-	check_dirty_inode(inode);
-	list_move(&inode->i_list, &inode->i_sb->s_more_io);
-	check_dirty_inode(inode);
-}
-
-/*
- * Queue all possible inodes for a run of io.
- * The resulting s_io is in order of:
- * - inodes queued for more io from s_more_io(once for a full scan of s_io)
- * - possible remaining inodes in s_io(was a partial scan)
- * - dirty inodes (old enough) from s_dirty
- */
-static void queue_inodes_for_io(struct super_block *sb,
-				unsigned long *older_than_this)
-{
-	check_dirty_inode_list(sb);
-	if (list_empty(&sb->s_io))
-		list_splice_init(&sb->s_more_io, &sb->s_io); /* eldest first */
-	check_dirty_inode_list(sb);
-	while (!list_empty(&sb->s_dirty)) {
-		struct inode *inode = list_entry(sb->s_dirty.prev,
-						struct inode, i_list);
-		/* Was this inode dirtied too recently? */
-		if (older_than_this &&
-		    time_after(inode->dirtied_when, *older_than_this))
-			break;
-		list_move(&inode->i_list, &sb->s_io);
-	}
-	check_dirty_inode_list(sb);
-}
-
 static void inode_sync_complete(struct inode *inode)
 {
 	/*
@@ -329,6 +325
[PATCH 1/4] writeback: check time-ordering of s_io and s_more_io
It helps catch bugs like this:

[  738.645689] fs/fs-writeback.c:535: s_dirty got screwed up
[  738.646114] 8100028532b0:4295082249
[  738.646255] 810002856858:4295082259
[  738.646388] 810002831b58:4295082667
[  738.646520] 81000281b1b0:4295082671
[  738.646651] 81000281d798:4295083507    == s_dirty/s_io
[  738.646783] 81000287e998:4295081393
[  738.646916] 81000287e430:4295081403
[  738.647068] 8100028789d8:4295081409
[  738.647212] 810002878470:4295081415
[  738.647358] 810002884a18:4295081421
[  738.647503] 8100028844b0:4295081427
[  738.647648] 810002890a58:4295081433
[  738.647782] 8100028904f0:4295081441
[  738.647894] 81000288da98:4295081449
[  738.648011] 81000288d530:4295081455
[  738.648123] 810002897ad8:4295081461
[  738.648235] 810002897570:4295081469
[  738.648347] 810002894b18:4295081477
[  738.648459] 8100028945b0:4295081483

The buggy line 534 is
	list_splice_init(&sb->s_io, sb->s_dirty.prev);

This is not the only time-ordering bug in linux-2.6.23-rc1-mm2.
Let's fix them all.

Cc: Ken Chen <[EMAIL PROTECTED]>
Cc: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Fengguang Wu <[EMAIL PROTECTED]>
---
 fs/fs-writeback.c |   37 +++--
 1 file changed, 31 insertions(+), 6 deletions(-)

--- linux-2.6.23-rc1-mm2.orig/fs/fs-writeback.c
+++ linux-2.6.23-rc1-mm2/fs/fs-writeback.c
@@ -26,12 +26,12 @@
 
 int sysctl_inode_debug __read_mostly;
 
-static int __check(struct super_block *sb, int print_stuff)
+static int __check(struct list_head *head, int print_stuff)
 {
-	struct list_head *cursor = &sb->s_dirty;
+	struct list_head *cursor = head;
 	unsigned long dirtied_when = 0;
 
-	while ((cursor = cursor->prev) != &sb->s_dirty) {
+	while ((cursor = cursor->prev) != head) {
 		struct inode *inode = list_entry(cursor, struct inode, i_list);
 		if (print_stuff) {
 			printk("%p:%lu\n", inode, inode->dirtied_when);
@@ -51,14 +51,32 @@ static void __check_dirty_inode_list(str
 	if (!sysctl_inode_debug)
 		return;
 
-	if (__check(sb, 0)) {
+	if (__check(&sb->s_dirty, 0)) {
 		sysctl_inode_debug = 0;
 		if (inode)
 			printk("%s:%d: s_dirty got screwed up. inode=%p:%lu\n",
 					file, line, inode, inode->dirtied_when);
 		else
 			printk("%s:%d: s_dirty got screwed up\n", file, line);
-		__check(sb, 1);
+		__check(&sb->s_dirty, 1);
+	}
+	if (__check(&sb->s_io, 0)) {
+		sysctl_inode_debug = 0;
+		if (inode)
+			printk("%s:%d: s_io got screwed up. inode=%p:%lu\n",
+					file, line, inode, inode->dirtied_when);
+		else
+			printk("%s:%d: s_io got screwed up\n", file, line);
+		__check(&sb->s_io, 1);
+	}
+	if (__check(&sb->s_more_io, 0)) {
+		sysctl_inode_debug = 0;
+		if (inode)
+			printk("%s:%d: s_more_io got screwed up. inode=%p:%lu\n",
+					file, line, inode, inode->dirtied_when);
+		else
+			printk("%s:%d: s_more_io got screwed up\n", file, line);
+		__check(&sb->s_more_io, 1);
 	}
 }
 
@@ -223,7 +241,9 @@ static void redirty_tail(struct inode *i
  */
 static void requeue_io(struct inode *inode)
 {
+	check_dirty_inode(inode);
 	list_move(&inode->i_list, &inode->i_sb->s_more_io);
+	check_dirty_inode(inode);
 }
 
 static void inode_sync_complete(struct inode *inode)
@@ -483,7 +503,9 @@ int generic_sync_sb_inodes(struct super_
 		/* Was this inode dirtied too recently? */
 		if (wbc->older_than_this &&
 		    time_after(inode->dirtied_when, *wbc->older_than_this)) {
+			check_dirty_inode_list(sb);
 			list_splice_init(&sb->s_io, sb->s_dirty.prev);
+			check_dirty_inode_list(sb);
 			break;
 		}
 
@@ -520,8 +542,11 @@ int generic_sync_sb_inodes(struct super_
 			break;
 	}
 
-	if (list_empty(&sb->s_io))
+	if (list_empty(&sb->s_io)) {
+		check_dirty_inode_list(sb);
 		list_splice_init(&sb->s_more_io, &sb->s_io);
+		check_dirty_inode_list(sb);
+	}
 	spin_unlock(&inode_lock);
 	return ret;		/* Leave any unwritten inodes on s_io */
 }

-- 
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
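The kind of check this patch extends to s_io and s_more_io can be illustrated outside the kernel. The sketch below is an assumption-laden stand-in rather than the actual __check(): it takes the dirtied_when values as a plain array in scan order (oldest expected first), and toy_time_before() and toy_first_misordered() are hypothetical names. The comparison is done on the wrapped difference, in the spirit of the kernel's time_before(), so a counter wraparound is not reported as a misordering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Wrap-safe "a is earlier than b", in the spirit of time_before(). */
static int toy_time_before(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0;
}

/*
 * when[] holds dirtied_when values in scan order (oldest expected first).
 * Returns the index of the first out-of-order entry, or -1 if the
 * time-ordering holds for the whole array.
 */
static long toy_first_misordered(const uint64_t *when, size_t n)
{
	for (size_t i = 1; i < n; i++)
		if (toy_time_before(when[i], when[i - 1]))
			return (long)i;
	return -1;
}
```

Fed with the tail of the first group and the head of the second group from the log above (4295082671, 4295083507, 4295081393), the function flags the third entry, which is where the spliced s_io inodes begin.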
[PATCH 4/4] writeback: fix ntfs with sb_has_dirty_inodes()
NTFS's if-condition on dirty inodes is not complete.
Fix it with sb_has_dirty_inodes().

Cc: Anton Altaparmakov <[EMAIL PROTECTED]>
Cc: Ken Chen <[EMAIL PROTECTED]>
Cc: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Fengguang Wu <[EMAIL PROTECTED]>
---

--- linux-2.6.23-rc1-mm2.orig/fs/ntfs/super.c
+++ linux-2.6.23-rc1-mm2/fs/ntfs/super.c
@@ -2381,14 +2381,14 @@ static void ntfs_put_super(struct super_
 	 */
 	ntfs_commit_inode(vol->mft_ino);
 	write_inode_now(vol->mft_ino, 1);
-	if (!list_empty(&sb->s_dirty)) {
+	if (sb_has_dirty_inodes(sb)) {
 		const char *s1, *s2;
 
 		mutex_lock(&vol->mft_ino->i_mutex);
 		truncate_inode_pages(vol->mft_ino->i_mapping, 0);
 		mutex_unlock(&vol->mft_ino->i_mutex);
 		write_inode_now(vol->mft_ino, 1);
-		if (!list_empty(&sb->s_dirty)) {
+		if (sb_has_dirty_inodes(sb)) {
 			static const char *_s1 = "inodes";
 			static const char *_s2 = "";
 			s1 = _s1;
--- linux-2.6.23-rc1-mm2.orig/include/linux/fs.h
+++ linux-2.6.23-rc1-mm2/include/linux/fs.h
@@ -1772,6 +1772,7 @@ extern int bdev_read_only(struct block_d
 extern int set_blocksize(struct block_device *, int);
 extern int sb_set_blocksize(struct super_block *, int);
 extern int sb_min_blocksize(struct super_block *, int);
+extern int sb_has_dirty_inodes(struct super_block *);
 extern int generic_file_mmap(struct file *, struct vm_area_struct *);
 extern int generic_file_readonly_mmap(struct file *, struct vm_area_struct *);

-- 
[PATCH 0/4] [RFC][PATCH] fs-writeback: redefining the dirty inode queues
Andrew,

I'd like to propose a cleaner way of using the s_dirty, s_io, s_more_io
queues for the writeback of dirty inodes. The basic idea is to clearly
define the function of the queues, especially to decouple s_dirty from
s_io/s_more_io. The details are in the changelog of patch 2.

The patches are some cleanups on top of Andrew's s_dirty time-ordering
patches and Ken's s_more_io patch:

[PATCH 1/4] writeback: check time-ordering of s_io and s_more_io
[PATCH 2/4] writeback: 3-queue based writeback schedule
[PATCH 3/4] writeback: function renames and cleanups
[PATCH 4/4] writeback: fix ntfs with sb_has_dirty_inodes()

 fs/fs-writeback.c  |  196 +++
 fs/ntfs/super.c    |    4
 include/linux/fs.h |    1
 3 files changed, 108 insertions(+), 93 deletions(-)

Note that the patches need rework for inserting into the right place of
your queue of -mm patches. I'll take care of it in the next take.

Thank you,
Fengguang
-- 
Re: [RFC PATCH 1/4] pass open file to ->setattr()
> >> > This is needed to be able to correctly implement open-unlink-fsetattr
> >> > semantics in some filesystem such as sshfs, without having to resort
> >> > to "silly-renaming".
> >>
> >> How do you plan to do that?
> >
> > Easy: the SFTP protocol has stateful opens and defines an FSTAT call.
>
> Is it possible to reconnect without umounting?

Yes, but open files and in-progress requests are lost at reconnect.

> If yes, the unlinked files would be lost in spite of being opened,
> wouldn't they?

Sure. Obviously one of the drawbacks of a stateful protocol is that
the server state can't survive a reconnect.

But that sort of reliability has never been the goal of sshfs. And
even if that was needed, it could probably be much better handled in a
lower layer.

Miklos
Re: JFFS2/mtdsuper modprobe "unknown symbol" in 2.6.23-rc1
In message <[EMAIL PROTECTED]>, Adrian Bunk writes:
> On Thu, Aug 09, 2007 at 10:38:18PM -0400, Erez Zadok wrote:
> > I'm getting an error modprobing jffs2 due to mtdsuper failing to insmod:
> >...
> > Does anyone know what am I missing?
>
> You miss that 2.6.23-rc2 with this bug fixed has already been released.

Great, I'll upgrade to rc2 (I've had this problem since .22-rc).

Thanks for the quick response.

Erez.
Re: JFFS2/mtdsuper modprobe "unknown symbol" in 2.6.23-rc1
On Thu, Aug 09, 2007 at 10:38:18PM -0400, Erez Zadok wrote:
> I'm getting an error modprobing jffs2 due to mtdsuper failing to insmod:
>...
> Does anyone know what am I missing?

You miss that 2.6.23-rc2 with this bug fixed has already been released.

> Thanks,
> Erez.

cu
Adrian

-- 
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
                                Pearl S. Buck - Dragon Seed
JFFS2/mtdsuper modprobe "unknown symbol" in 2.6.23-rc1
I'm getting an error modprobing jffs2 due to mtdsuper failing to insmod:

# modprobe jffs2
WARNING: Error inserting mtdsuper (/lib/modules/2.6.23-rc1/kernel/drivers/mtd/mtdsuper.ko): Unknown symbol in module, or unknown parameter (see dmesg)
FATAL: Error inserting jffs2 (/lib/modules/2.6.23-rc1/kernel/fs/jffs2/jffs2.ko): Unknown symbol in module, or unknown parameter (see dmesg)
# dmesg | tail
mtdsuper: Unknown symbol get_mtd_device
mtdsuper: Unknown symbol put_mtd_device
jffs2: Unknown symbol get_sb_mtd
jffs2: Unknown symbol kill_mtd_super

My relevant .config is:

CONFIG_MTD=m
CONFIG_MTD_BLKDEVS=m
CONFIG_MTD_BLOCK=m
CONFIG_MTD_MAP_BANK_WIDTH_1=y
CONFIG_MTD_MAP_BANK_WIDTH_2=y
CONFIG_MTD_MAP_BANK_WIDTH_4=y
CONFIG_MTD_CFI_I1=y
CONFIG_MTD_CFI_I2=y
CONFIG_MTD_BLOCK2MTD=m
CONFIG_JFFS2_FS=m
CONFIG_JFFS2_FS_DEBUG=0
CONFIG_JFFS2_FS_WRITEBUFFER=y
CONFIG_JFFS2_SUMMARY=y
CONFIG_JFFS2_FS_XATTR=y
CONFIG_JFFS2_FS_POSIX_ACL=y
CONFIG_JFFS2_FS_SECURITY=y
CONFIG_JFFS2_COMPRESSION_OPTIONS=y
CONFIG_JFFS2_ZLIB=y
CONFIG_JFFS2_RTIME=y
CONFIG_JFFS2_CMODE_PRIORITY=y

A "quick hack" around this which I found is to add

	MODULE_LICENSE("GPL");

to the end of drivers/mtd/mtdsuper.c, but that doesn't sound like the
right fix.

Does anyone know what am I missing?

Thanks,
Erez.
Re: problems while mounting /boot partition
On Aug 8 2007 18:28, Michal Piotrowski wrote:
>
>Hi Brian,
>
>Brian J. Murrell wrote:
>> I am using Ubuntu Gutsy, which is the in-development branch heading for
>> their next stable release.
>
>You forgot about message subject, so no one has read this report.

Actually, given the volume on LKML, a message without a subject gets the
most attention, since all the others do have one. :)

	Jan
-- 
Re: [PATCH V2] limit minixfs printks on corrupted dir i_size, CVE-2006-6058
Bodo Eggert wrote:

> Warning: I'm only looking at the patch.
>
> You are supposed to print an error message for a user, not to write in a
> chat window to a 1337 script kiddie. OK, you just matched the current style,
> and your patch is IMHO OK for a quick security fix, but:
>
> - Security fixes should be CCed to the security mailing list, shouldn't they?
>   (It might be security@ or stable@, I'll remember tomorrow, but then I'd
>   forget to comment)

ok.

> - Imagine you have three mounts containing a minix fs, how can you tell which
>   one is the defective one?

good point.

> - The message says "minix_bmap", while the patch suggests it's in
>   block_to_path. Therefore I assume "minix_bmap" to have only random
>   informational value.

Yup, you're right.

> - Does block < 0 or block > $size make a difference?

well, block > size is likely to arrive from a corrupt i_size, and the
insistence upon going ahead and checking the next page after encountering an
error on the last one... I don't have any scenario in mind where we'd be
repeatedly trying to check blocks < 0.

> - the printk lacks the loglevel.

As do all other printk's in minixfs... (hm and 11,619 other printk's in the
kernel :) )

> - Assuming minix supports error handling, shouldn't it do something?
>
> I'd suggest a message saying something like "minix: Bad block address on
> device 08:15, needs fsck".

Fair enough, as you said I was just fixing up the issue, not rewriting the
code around it. But yes, I should probably have considered at least a better
message here. I can fix this up & resend. But I'm not promising to audit all
other printk's in minixfs this time around. ;-)

-Eric
Re: [RFC PATCH 1/4] pass open file to ->setattr()
Miklos Szeredi <[EMAIL PROTECTED]> wrote:

>> > This is needed to be able to correctly implement open-unlink-fsetattr
>> > semantics in some filesystem such as sshfs, without having to resort
>> > to "silly-renaming".
>>
>> How do you plan to do that?
>
> Easy: the SFTP protocol has stateful opens and defines an FSTAT call.

Is it possible to reconnect without umounting?

If yes, the unlinked files would be lost in spite of being opened,
wouldn't they?
-- 
Top 100 things you don't want the sysadmin to say:
11. Can you get VMS for this Sparc thingy?
Re: [PATCH V2] limit minixfs printks on corrupted dir i_size, CVE-2006-6058
Eric Sandeen <[EMAIL PROTECTED]> wrote:

> This attempts to address CVE-2006-6058
> http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2006-6058
>
> first reported at http://projects.info-pull.com/mokb/MOKB-17-11-2006.html
>
> Essentially a corrupted minix dir inode reporting a very large
> i_size will loop for a very long time in minix_readdir, minix_find_entry,
> etc, because on EIO they just move on to try the next page. This is
> under the BKL, printk-storming as well. This can lock up the machine
> for a very long time. Simply ratelimiting the printks gets things back
> under control.

> Index: linux-2.6.22-rc4/fs/minix/itree_v1.c
> ===================================================================
> --- linux-2.6.22-rc4.orig/fs/minix/itree_v1.c
> +++ linux-2.6.22-rc4/fs/minix/itree_v1.c
> @@ -27,7 +27,8 @@ static int block_to_path(struct inode *
>  	if (block < 0) {
>  		printk("minix_bmap: block<0\n");
>  	} else if (block >= (minix_sb(inode->i_sb)->s_max_size/BLOCK_SIZE)) {
> -		printk("minix_bmap: block>big\n");
> +		if (printk_ratelimit())
> +			printk("minix_bmap: block>big\n");

Warning: I'm only looking at the patch.

You are supposed to print an error message for a user, not to write in a
chat window to a 1337 script kiddie. OK, you just matched the current style,
and your patch is IMHO OK for a quick security fix, but:

- Security fixes should be CCed to the security mailing list, shouldn't they?
  (It might be security@ or stable@, I'll remember tomorrow, but then I'd
  forget to comment)
- Imagine you have three mounts containing a minix fs, how can you tell which
  one is the defective one?
- The message says "minix_bmap", while the patch suggests it's in
  block_to_path. Therefore I assume "minix_bmap" to have only random
  informational value.
- Does block < 0 or block > $size make a difference?
- the printk lacks the loglevel.
- Assuming minix supports error handling, shouldn't it do something?

I'd suggest a message saying something like "minix: Bad block address on
device 08:15, needs fsck".
-- 
Oops. My brain just hit a bad sector.
[PATCH V2] limit minixfs printks on corrupted dir i_size, CVE-2006-6058
Perhaps this is simpler, and preferable. Thanks to adilger for
reminding me about printk_ratelimit. :)

This attempts to address CVE-2006-6058
http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2006-6058

first reported at http://projects.info-pull.com/mokb/MOKB-17-11-2006.html

Essentially a corrupted minix dir inode reporting a very large
i_size will loop for a very long time in minix_readdir, minix_find_entry,
etc, because on EIO they just move on to try the next page. This is
under the BKL, printk-storming as well. This can lock up the machine
for a very long time. Simply ratelimiting the printks gets things back
under control.

Signed-off-by: Eric Sandeen <[EMAIL PROTECTED]>

Index: linux-2.6.22-rc4/fs/minix/itree_v1.c
===================================================================
--- linux-2.6.22-rc4.orig/fs/minix/itree_v1.c
+++ linux-2.6.22-rc4/fs/minix/itree_v1.c
@@ -27,7 +27,8 @@ static int block_to_path(struct inode *
 	if (block < 0) {
 		printk("minix_bmap: block<0\n");
 	} else if (block >= (minix_sb(inode->i_sb)->s_max_size/BLOCK_SIZE)) {
-		printk("minix_bmap: block>big\n");
+		if (printk_ratelimit())
+			printk("minix_bmap: block>big\n");
 	} else if (block < 7) {
 		offsets[n++] = block;
 	} else if ((block -= 7) < 512) {
Index: linux-2.6.22-rc4/fs/minix/itree_v2.c
===================================================================
--- linux-2.6.22-rc4.orig/fs/minix/itree_v2.c
+++ linux-2.6.22-rc4/fs/minix/itree_v2.c
@@ -28,7 +28,8 @@ static int block_to_path(struct inode *
 	if (block < 0) {
 		printk("minix_bmap: block<0\n");
 	} else if (block >= (minix_sb(inode->i_sb)->s_max_size/sb->s_blocksize)) {
-		printk("minix_bmap: block>big\n");
+		if (printk_ratelimit())
+			printk("minix_bmap: block>big\n");
 	} else if (block < 7) {
 		offsets[n++] = block;
 	} else if ((block -= 7) < 256) {
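To show why a rate limiter tames the printk storm described above, here is a small user-space sketch of a burst-limited message gate. It is written from the general token-bucket idea only and is not the kernel's printk_ratelimit() implementation; struct toy_ratelimit, toy_ratelimit() and the interval and burst values used with it are assumptions chosen for illustration, and the clock is passed in explicitly so the behaviour is deterministic.

```c
#include <assert.h>

/* Hypothetical limiter state; "ticks" stand in for jiffies. */
struct toy_ratelimit {
	unsigned long interval;	/* ticks that pay for one message */
	unsigned long burst;	/* max messages that can be saved up */
	unsigned long tokens;	/* accumulated ticks, capped at burst * interval */
	unsigned long last;	/* tick of the previous call */
	unsigned long missed;	/* messages suppressed so far */
};

/* Returns 1 if the caller may print now, 0 if the message should be dropped. */
static int toy_ratelimit(struct toy_ratelimit *rl, unsigned long now)
{
	rl->tokens += now - rl->last;
	rl->last = now;
	if (rl->tokens > rl->burst * rl->interval)
		rl->tokens = rl->burst * rl->interval;
	if (rl->tokens >= rl->interval) {
		rl->tokens -= rl->interval;
		return 1;
	}
	rl->missed++;
	return 0;
}
```

A loop that hits the error path many times within one tick only gets through the saved-up burst; everything else is counted in missed instead of being formatted and written to the console, which is the expensive part of the storm.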
Re: [PATCH 11/14] CacheFiles: Permit an inode's security ID to be obtained [try #2]
--- James Morris <[EMAIL PROTECTED]> wrote:

> On Thu, 9 Aug 2007, David Howells wrote:
>
> > James Morris <[EMAIL PROTECTED]> wrote:
> >
> > > David, I've looked at the code and can't see that you need to access the
> > > label itself outside the LSM. Could you instead simply pass the inode
> > > pointer around?
> >
> > It's not quite that simple. I need to impose *two* security labels in
> > cachefiles_begin_secure() when I'm about to act on behalf of a process that's
> > tried to access a netfs file:
>
> Ah ok, we had a similar problem with NFS mount options.
>
> While I'm concerned about encoding SELinux-optimized secid labels into
> general kernel structures, moving to more generalized pointers introduces
> lifecycle maintenance issues and complexity which is not needed in the
> mainline kernel. i.e. it'll be unused infrastructure maintained by
> upstream, and used only by out-of-tree modules.
>
> So, given that the kernel has no stable API, I suggest accepting the u32
> secid as you propose, and if someone wants to merge a module which also
> uses these hooks, but is entirely unable to use u32 labels, then they can
> also justify making the interface more generalized and provide the code
> for it.

Grumble. Yet another thing to undo in the near future. I still
hope to suggest what I would consider a viable alternative "soon".

Casey Schaufler
[EMAIL PROTECTED]
Re: [RFC PATCH 4/4] VFS: allow filesystem to override mknod capability checks
> > From: Miklos Szeredi <[EMAIL PROTECTED]>
> >
> > Add a new filesystem flag, that results in the VFS not checking if the
> > current process has enough privileges to do an mknod().
> >
> > This is needed on filesystems, where an unprivileged user may be able
> > to create a device node, without causing security problems.
> >
> > One such example is "mountlo" a loopback mount utility implemented
> > with fuse and UML, which runs as an unprivileged userspace process.
> > In this case the user does in fact have the right to create device
> > nodes within the filesystem image, as long as the user has write
> > access to the image. Since the filesystem is mounted with "nodev",
> > adding device nodes is not a security concern.
>
> Could we enforce at do_new_mount() that if
> type->fs_flags&FS_MKNOD_CHECKS_PERM then mnt_flags |= MS_NODEV?

Well, the problem with that is, there will be fuse filesystems which
will want devices to work and for those the capability checks will be
reenabled inside ->mknod(). In fact, for backward compatibility all
filesystems will have the mknod checks, except ones which explicitly
request to turn it off.

Since unprivileged fuse mounts always have "nodev", the only way
security could be screwed up, is if a filesystem running with
privileges disabled the mknod checks. I will probably add some safety
guards against that into the fuse library, but of course there's no
way to stop a privileged user from screwing up security anyway.

If for example there's a loop mount, where the disk image file is
writable by a user, and root mounts it without "nodev", the user can
still create device nodes (by modifying the image) even if the mknod
checks are enabled.

Miklos
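The condition under discussion can be written down as a small predicate. This is a hedged user-space sketch, not the vfs_mknod() code: enum toy_node_type, TOY_FS_MKNOD_CHECKS_PERM and toy_vfs_mknod_check() are hypothetical names, and the has_cap_mknod argument stands in for the result of capable(CAP_MKNOD).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Assumed flag value for illustration only. */
#define TOY_FS_MKNOD_CHECKS_PERM 16

/* Hypothetical node types standing in for the S_IF* mode bits. */
enum toy_node_type { TOY_REG, TOY_FIFO, TOY_CHR, TOY_BLK };

/*
 * Returns 0 if the generic layer lets the request through to the
 * filesystem's ->mknod(), or -EPERM if it is refused up front.
 */
static int toy_vfs_mknod_check(int fs_flags, enum toy_node_type type,
			       bool has_cap_mknod)
{
	bool is_device = (type == TOY_CHR || type == TOY_BLK);

	/* The generic check is skipped when the filesystem opts out of it. */
	if (!(fs_flags & TOY_FS_MKNOD_CHECKS_PERM) && is_device && !has_cap_mknod)
		return -EPERM;
	return 0;
}
```

With the flag clear, the behaviour is the existing one: device nodes need the capability, other node types do not. With the flag set, the decision is left entirely to the filesystem, which is why the "nodev" mount option matters for unprivileged mounts.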
Re: [Patch 16/18] fs/Kconfig
On Thu, 9 August 2007 01:01:26 +0200, Arnd Bergmann wrote:
> On Wednesday 08 August 2007, Jörn Engel wrote:
> > +config LOGFS
> > +	bool "Log Filesystem (EXPERIMENTAL)"
> > +	depends on MTD && BLOCK && EXPERIMENTAL
>
> The dependency on MTD _and_ BLOCK looks correct for your code, but
> not necessary. How about making it
>
> 	depends on (MTD || BLOCK) && EXPERIMENTAL
>
> and allowing to build without the mtd/bdev specific code?

Would be useful, yes.

Jörn

-- 
Data expands to fill the space available for storage.
-- Parkinson's Law
Re: [Patch 02/18] include/linux/logfs.h
On Thu, 9 August 2007 00:56:29 +0200, Arnd Bergmann wrote:
> On Wednesday 08 August 2007, Jörn Engel wrote:
> > +++ linux-2.6.21logfs/include/linux/logfs.h	2007-08-08 02:57:37.0 +0200
> > @@ -0,0 +1,500 @@
> > +/*
> > + * fs/logfs/logfs.h
> > + *
>
> The comment does not match the file name. Better remove the file names
> entirely from introduction comments.

Maybe. This file needs to get moved again anyway.

Jörn

-- 
Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it.
-- Brian W. Kernighan
Re: [PATCH 11/14] CacheFiles: Permit an inode's security ID to be obtained [try #2]
On Thu, 9 Aug 2007, David Howells wrote:

> James Morris <[EMAIL PROTECTED]> wrote:
>
> > David, I've looked at the code and can't see that you need to access the
> > label itself outside the LSM. Could you instead simply pass the inode
> > pointer around?
>
> It's not quite that simple. I need to impose *two* security labels in
> cachefiles_begin_secure() when I'm about to act on behalf of a process that's
> tried to access a netfs file:

Ah ok, we had a similar problem with NFS mount options.

While I'm concerned about encoding SELinux-optimized secid labels into
general kernel structures, moving to more generalized pointers introduces
lifecycle maintenance issues and complexity which is not needed in the
mainline kernel. i.e. it'll be unused infrastructure maintained by
upstream, and used only by out-of-tree modules.

So, given that the kernel has no stable API, I suggest accepting the u32
secid as you propose, and if someone wants to merge a module which also
uses these hooks, but is entirely unable to use u32 labels, then they can
also justify making the interface more generalized and provide the code
for it.

- James
-- 
James Morris <[EMAIL PROTECTED]>
Re: [RFC PATCH 4/4] VFS: allow filesystem to override mknod capability checks
Quoting [EMAIL PROTECTED] ([EMAIL PROTECTED]): > From: Miklos Szeredi <[EMAIL PROTECTED]> > > Add a new filesystem flag, that results in the VFS not checking if the > current process has enough privileges to do an mknod(). > > This is needed on filesystems, where an unprivileged user may be able > to create a device node, without causing security problems. > > One such example is "mountlo" a loopback mount utility implemented > with fuse and UML, which runs as an unprivileged userspace process. > In this case the user does in fact have the right to create device > nodes within the filesystem image, as long as the user has write > access to the image. Since the filesystem is mounted with "nodev", > adding device nodes is not a security concern. Could we enforce at do_new_mount() that if type->fs_flags&FS_MKNOD_CHECKS_PERM then mnt_flags |= MS_NODEV? > This feature is basically "fuse-only", so it does not make sense to > change the semantics of ->mknod(). > > Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]> > --- > > Index: linux/fs/namei.c > === > --- linux.orig/fs/namei.c 2007-08-09 16:49:07.0 +0200 > +++ linux/fs/namei.c 2007-08-09 16:49:12.0 +0200 > @@ -1921,7 +1921,8 @@ int vfs_mknod(struct inode *dir, struct > if (error) > return error; > > - if ((S_ISCHR(mode) || S_ISBLK(mode)) && !capable(CAP_MKNOD)) > + if (!(dir->i_sb->s_type->fs_flags & FS_MKNOD_CHECKS_PERM) && > + (S_ISCHR(mode) || S_ISBLK(mode)) && !capable(CAP_MKNOD)) > return -EPERM; > > if (!dir->i_op || !dir->i_op->mknod) > Index: linux/include/linux/fs.h > === > --- linux.orig/include/linux/fs.h 2007-08-09 16:49:07.0 +0200 > +++ linux/include/linux/fs.h 2007-08-09 16:49:12.0 +0200 > @@ -97,6 +97,7 @@ extern int dir_notify_enable; > #define FS_BINARY_MOUNTDATA 2 > #define FS_HAS_SUBTYPE 4 > #define FS_SAFE 8/* Safe to mount by unprivileged users */ > +#define FS_MKNOD_CHECKS_PERM 16 /* FS checks if device creation is > allowed */ > #define FS_REVAL_DOT 16384 /* Check the paths ".", ".." 
for staleness */ > #define FS_RENAME_DOES_D_MOVE32768 /* FS will handle d_move() >* during rename() internally. > > -- > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 14/14] NFS: Use local caching [try #2]
On Thu, 2007-08-09 at 19:52 +0100, David Howells wrote: > Trond Myklebust <[EMAIL PROTECTED]> wrote: > > > Dang, that's a lot of inlines... AFAICS, approx half of fs/nfs/fscache.h > > should really be moved into fscache.c. > > If you wish. It seems a shame since a lot of them have only one caller. ...however it also forces you to export a lot of stuff which is really private to fscache.c (the atomics etc). > Note that due to patch #2, PG_fscache causes releasepage() and > invalidatepage() to be called in addition to PG_private. > > > + > > > + if (!nfs_fscache_release_page(page, gfp)) > > > + return 0; > > > > This looks _very_ dubious. Why shouldn't I be able to discard a page > > just because fscache isn't done writing it out? I may have very good > > reasons to do so. > > Hmmm... Looking at the truncate routines, I suppose this ought to be okay, > provided the cache retains a reference on the page whilst it's writing it out > (put_page() won't can the page until we release it). > > It also seems dubious, though, to release the page when the filesystem is > doing stuff to it, even if it's by proxy in the cache. I'll have to test > that, but I'm slightly concerned that the netfs could end up releasing its > cookie before the cache has finished with its pages. On the other hand, with > the new asynchronous stuff I've done, I'm not sure this'll be an actual > problem. Actually, as long as launder_page() and invalidate_page() are doing their thing, then your current code might be OK. The important thing is to ensure that invalidate_inode_pages2() and truncate_inode_pages() work as expected. > > > static int nfs_launder_page(struct page *page) > > > { > > > + nfs_fscache_invalidate_page(page, page->mapping->host, 0); > > > > Why? The function of launder_page() is to clean the page, not to > > truncate it. > > Okay. What you should be doing here is probably to make a call to wait_on_page_fscache_write(page). That should suffice to ensure that the page is clean w.r.t. 
fscache activity afaics. > > > @@ -1000,11 +1007,13 @@ static int nfs_update_inode(struct inode *inode, > > > struct nfs_fattr *fattr) > > > if (data_stable) { > > > inode->i_size = new_isize; > > > invalid |= NFS_INO_INVALID_DATA; > > > + nfs_fscache_attr_changed(inode); > > > > Can't fscache_attr_changed() call kmalloc(GFP_KERNEL)? You are in a > > spinlocked context here. > > Hmmm... How about I move the call to fscache_attr_changed() to the callers of > nfs_update_inode(), to just after the spinlock is unlocked. The operation is > going to be asynchronous, so the delay shouldn't matter. Right. You might also want to add a flag NFS_INO_INVALID_FSCACHE_ATTR or something like that in order to trigger it. > > I'm not too happy about the change to the binary mount interface. As far > > as I'm concerned, the binary interface should really be considered > > frozen, and should not be touched. > > I hadn't come across that. I'll have a look. > > > Instead, feel free to update the text-based mount interface (which can > > be found in 2.6.23-rc1 and later). Please put any such mount interface > > changes in a separate patch together with the Kconfig changes. > > If you wish, but doesn't that violate the rules of patch division laid down by > Linus and Andrew? Why? It should be possible to set this up so that the NFS+fscache code compiles correctly without the mount patch. Trond - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
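The "set a flag under the lock, act on it after unlocking" idea discussed above could look roughly like the sketch below. Everything in it is a user-space model with made-up names (struct sketch_inode, refresh_inode(), the flag value); the lock is reduced to a boolean so that calling the possibly-sleeping helper while "locked" can be detected. It is not the NFS code.

```c
#include <stdbool.h>

/* Hypothetical bit; the reply only suggests a name along these lines. */
#define SKETCH_INVALID_FSCACHE_ATTR	0x1000UL

struct sketch_inode {
	bool locked;			/* models inode->i_lock being held */
	unsigned long flags;		/* models the cache validity flags */
	int cache_notifications;	/* counts calls into the cache */
};

/*
 * Stands in for fscache_attr_changed(): it may sleep (for example by
 * allocating with GFP_KERNEL), so it must never run with the lock held.
 */
static int notify_cache_attr_changed(struct sketch_inode *inode)
{
	if (inode->locked)
		return -1;	/* would be a sleep-in-atomic bug */
	inode->cache_notifications++;
	return 0;
}

/* Called with the lock held: only records that the attributes changed. */
static void update_inode_locked(struct sketch_inode *inode)
{
	inode->flags |= SKETCH_INVALID_FSCACHE_ATTR;
}

/* Caller of the update: consumes the flag, unlocks, then notifies. */
static int refresh_inode(struct sketch_inode *inode)
{
	bool notify;

	inode->locked = true;
	update_inode_locked(inode);
	notify = (inode->flags & SKETCH_INVALID_FSCACHE_ATTR) != 0;
	inode->flags &= ~SKETCH_INVALID_FSCACHE_ATTR;
	inode->locked = false;

	return notify ? notify_cache_attr_changed(inode) : 0;
}
```

Because the notification is asynchronous anyway, delaying it until after the unlock should not change what the cache eventually sees.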
Re: [PATCH 14/14] NFS: Use local caching [try #2]
> > Instead, feel free to update the text-based mount interface (which can
> > be found in 2.6.23-rc1 and later).

I presume you're referring to nfs_mount_option_tokens[] and friends. Is there
a mount program that can drive this?

David
Re: [PATCH 11/14] CacheFiles: Permit an inode's security ID to be obtained [try #2]
On Thu, 9 Aug 2007, David Howells wrote:

> +	u32 (*inode_get_secid)(struct inode *inode);

To maintain API consistency, please return an int that acts only as an
error code, and return the secid via a u32 * function parameter.

- James
--
James Morris <[EMAIL PROTECTED]>
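To make the two calling conventions concrete, here is a small user-space sketch. The struct inode below is a toy stand-in, and the _v1/_v2 function names, the has_label field and the particular error codes are assumptions made for illustration; only the u32-returning prototype comes from the patch under review.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;

/* Toy stand-in for the kernel's struct inode (hypothetical fields). */
struct inode {
	u32 secid;	/* label stored directly on the inode */
	int has_label;	/* whether any label is attached */
};

/*
 * Shape proposed in the patch: the secid is the return value, so the
 * hook has no way to report a failure.
 */
static u32 inode_get_secid_v1(struct inode *inode)
{
	return inode->secid;
}

/*
 * Shape requested in review: the int carries only success or an error
 * code, and the secid is handed back through an out-parameter that is
 * written only on success.
 */
static int inode_get_secid_v2(struct inode *inode, u32 *secid)
{
	if (!inode || !secid)
		return -EINVAL;
	if (!inode->has_label)
		return -EOPNOTSUPP;
	*secid = inode->secid;
	return 0;
}
```

A caller of the second form checks the return value first and only then trusts the secid, which is the pattern most other hooks in the table already follow.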
Re: [PATCH 11/14] CacheFiles: Permit an inode's security ID to be obtained [try #2]
James Morris <[EMAIL PROTECTED]> wrote:

> David, I've looked at the code and can't see that you need to access the
> label itself outside the LSM. Could you instead simply pass the inode
> pointer around?

It's not quite that simple. I need to impose *two* security labels in
cachefiles_begin_secure() when I'm about to act on behalf of a process that's
tried to access a netfs file:

 (1) The security label to act as. This is the label attached to the
     cachefilesd process when it starts the cache. This is obtained by
     cachefiles_get_security_ID().

 (2) The security label to create files as. This is the label attached to
     the root directory of the cache. This is obtained by
     cachefiles_determine_cache_secid().

David
Re: [PATCH 14/14] NFS: Use local caching [try #2]
Trond Myklebust <[EMAIL PROTECTED]> wrote: > Dang, that's a lot of inlines... AFAICS, approx half of fs/nfs/fscache.h > should really be moved into fscache.c. If you wish. It seems a shame since a lot of them have only one caller. > > + /* we can do this here as the bits are only set with the page lock > > +* held, and our caller is holding that */ > > + if (!page->private) > > + ClearPagePrivate(page); > > Why would PG_private be set at this point? Looks like I've got a bit more cleaning up to do. PG_private isn't set by FS-Cache, so this bit shouldn't be here. > In any case, please send this and other PagePrivate changes as a > separate patch. Any changes to the PagePrivate semantics must be made > easy to debug. There shouldn't be any. Note that due to patch #2, PG_fscache causes releasepage() and invalidatepage() to be called in addition to PG_private. > > + > > + if (!nfs_fscache_release_page(page, gfp)) > > + return 0; > > This looks _very_ dubious. Why shouldn't I be able to discard a page > just because fscache isn't done writing it out? I may have very good > reasons to do so. Hmmm... Looking at the truncate routines, I suppose this ought to be okay, provided the cache retains a reference on the page whilst it's writing it out (put_page() won't can the page until we release it). It also seems dubious, though, to release the page when the filesystem is doing stuff to it, even if it's by proxy in the cache. I'll have to test that, but I'm slightly concerned that the netfs could end up releasing its cookie before the cache has finished with its pages. On the other hand, with the new asynchronous stuff I've done, I'm not sure this'll be an actual problem. > > static int nfs_launder_page(struct page *page) > > { > > + nfs_fscache_invalidate_page(page, page->mapping->host, 0); > > Why? The function of launder_page() is to clean the page, not to > truncate it. Okay. 
> > @@ -1000,11 +1007,13 @@ static int nfs_update_inode(struct inode *inode, > > struct nfs_fattr *fattr) > > if (data_stable) { > > inode->i_size = new_isize; > > invalid |= NFS_INO_INVALID_DATA; > > + nfs_fscache_attr_changed(inode); > > Can't fscache_attr_changed() call kmalloc(GFP_KERNEL)? You are in a > spinlocked context here. Hmmm... How about I move the call to fscache_attr_changed() to the callers of nfs_update_inode(), to just after the spinlock is unlocked. The operation is going to be asynchronous, so the delay shouldn't matter. > I'm not too happy about the change to the binary mount interface. As far > as I'm concerned, the binary interface should really be considered > frozen, and should not be touched. I hadn't come across that. I'll have a look. > Instead, feel free to update the text-based mount interface (which can > be found in 2.6.23-rc1 and later). Please put any such mount interface > changes in a separate patch together with the Kconfig changes. If you wish, but doesn't that violate the rules of patch division laid down by Linus and Andrew? David - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 11/14] CacheFiles: Permit an inode's security ID to be obtained [try #2]
On Thu, 9 Aug 2007, Casey Schaufler wrote:

> This is SELinux specific functionality. It should not be an LSM
> interface.

As long as the security labels are themselves not being exported to the
kernel to be used e.g. for display or transport, then I agree, and we
should avoid passing them around outside the LSM entirely if possible.
Usually, they're attached to a significant kernel object, which you
typically pass around as part of the interface anyway.

David, I've looked at the code and can't see that you need to access the
label itself outside the LSM. Could you instead simply pass the inode
pointer around?

(I know it's not always possible, but much preferred).

- James
--
James Morris <[EMAIL PROTECTED]>
Re: [PATCH 09/14] CacheFiles: Permit a process's create SID to be overridden [try #2]
--- David Howells <[EMAIL PROTECTED]> wrote:

> Casey Schaufler <[EMAIL PROTECTED]> wrote:
>
> > This is SELinux specific functionality and should be done in the
> > SELinux code. You should not be adding interfaces that are SELinux
> > specific, in this case using secids instead of the LSM blob interfaces.
>
> Is using secids your only objection? Or are you objecting to the whole
> 'act-as' concept?

My knee-jerk reaction is that that is likely to be SELinux specific
behavior as well. I'm going to have to look at the patch more carefully
before I can say for sure. I will try to make a constructive proposal
once I've had the chance to think on it a little.

Sorry about the terse and unhelpful initial reaction.

Casey Schaufler
[EMAIL PROTECTED]
[PATCH] limit minixfs dir_pages on corrupted dir i_size, CVE-2006-6058
This attempts to address CVE-2006-6058
http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2006-6058

first reported at http://projects.info-pull.com/mokb/MOKB-17-11-2006.html

Essentially a corrupted minix dir inode reporting a very large
i_size will loop for a very long time in minix_readdir, minix_find_entry,
etc, because on EIO they just move on to try the next page. This is
under the BKL, printk'ing as well. This can lock up the machine
for a very long time. A simple approach is to at least limit the nr. of
pages attempted to no more than s_max_size.

(s_max_size is about 256MB for V1, but 2GB for V2; this could still result
in a lot of EIO reads in the V2 case, should the retry loops in
minix_readdir & friends be short-circuited somehow instead? A simple
"break" rather than "continue" on error would certainly resolve it, too...)

Signed-off-by: Eric Sandeen <[EMAIL PROTECTED]>

--- linux-2.6.22-rc4.orig/fs/minix/dir.c
+++ linux-2.6.22-rc4/fs/minix/dir.c
@@ -42,7 +42,15 @@ minix_last_byte(struct inode *inode, uns
 
 static inline unsigned long dir_pages(struct inode *inode)
 {
-	return (inode->i_size+PAGE_CACHE_SIZE-1)>>PAGE_CACHE_SHIFT;
+	loff_t size = inode->i_size;
+
+	if (size > minix_sb(inode->i_sb)->s_max_size) {
+		printk("%s: inode %lld i_size > s_max_size\n",
+			__FUNCTION__, inode->i_size);
+		size = minix_sb(inode->i_sb)->s_max_size;
+	}
+
+	return (size+PAGE_CACHE_SIZE-1)>>PAGE_CACHE_SHIFT;
 }
 
 static int dir_commit_chunk(struct page *page, unsigned from, unsigned to)
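The arithmetic behind the clamp can be tried out in user space with a sketch like the one below. It assumes 4 KiB pages, and the names dir_pages_clamped(), SKETCH_PAGE_SHIFT and SKETCH_PAGE_SIZE are made up for the illustration; the kernel version works on loff_t and reads s_max_size from the minix superblock.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT	12	/* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE	(1ULL << SKETCH_PAGE_SHIFT)

/*
 * Number of pages a directory of i_size bytes spans, after limiting
 * i_size to the largest size the filesystem can legitimately hold.
 */
static unsigned long dir_pages_clamped(uint64_t i_size, uint64_t s_max_size)
{
	uint64_t size = i_size;

	if (size > s_max_size)
		size = s_max_size;	/* corrupted i_size: clamp it */

	/* round up to whole pages, as in the original dir_pages() */
	return (unsigned long)((size + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT);
}
```

With a 256 MiB limit and 4 KiB pages the loop is bounded at 65536 pages, however large the corrupted i_size claims to be.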
Re: [PATCH 11/14] CacheFiles: Permit an inode's security ID to be obtained [try #2]
On Thu, 9 Aug 2007, David Howells wrote:

> James Morris <[EMAIL PROTECTED]> wrote:
>
> > > + u32 (*inode_get_secid)(struct inode *inode);
> >
> > To maintain API consistency, please return an int which only acts as an
> > error code, and returning the secid via a *u32 function parameter.
>
> Does that apply to *all* the functions, irrespective of whether or not they
> return an error?

LSM is theoretically an API, so we generally don't know if a security
module will return an error or not. If they were simply calls directly
into SELinux, where we could always know the semantics, then that would be
a different story.

- James
--
James Morris <[EMAIL PROTECTED]>
Re: [PATCH 14/14] NFS: Use local caching [try #2]
Dang, that's a lot of inlines... AFAICS, approx half of fs/nfs/fscache.h should really be moved into fscache.c. Otherwise, this looks a lot less intrusive than previous patches. See inlined comments. On Thu, 2007-08-09 at 17:05 +0100, David Howells wrote: > The attached patch makes it possible for the NFS filesystem to make use of the > network filesystem local caching service (FS-Cache). > > To be able to use this, an updated mount program is required. This can be > obtained from: > > http://people.redhat.com/steved/cachefs/util-linux/ > > To mount an NFS filesystem to use caching, add an "fsc" option to the mount: > > mount warthog:/ /a -o fsc > > Signed-Off-By: David Howells <[EMAIL PROTECTED]> > --- > > fs/Kconfig |8 + > fs/nfs/Makefile|1 > fs/nfs/client.c| 11 + > fs/nfs/file.c | 38 +++- > fs/nfs/fscache.c | 303 + > fs/nfs/fscache.h | 464 > > fs/nfs/inode.c | 16 ++ > fs/nfs/internal.h |8 + > fs/nfs/read.c | 27 ++- > fs/nfs/super.c |1 > fs/nfs/sysctl.c| 43 > fs/nfs/write.c |3 > include/linux/nfs4_mount.h |3 > include/linux/nfs_fs.h |6 + > include/linux/nfs_fs_sb.h |5 > include/linux/nfs_mount.h |3 > 16 files changed, 931 insertions(+), 9 deletions(-) > > diff --git a/fs/Kconfig b/fs/Kconfig > index 7feb4cb..76d5d16 100644 > --- a/fs/Kconfig > +++ b/fs/Kconfig > @@ -1600,6 +1600,14 @@ config NFS_V4 > > If unsure, say N. 
> > +config NFS_FSCACHE > + bool "Provide NFS client caching support (EXPERIMENTAL)" > + depends on EXPERIMENTAL > + depends on NFS_FS=m && FSCACHE || NFS_FS=y && FSCACHE=y > + help > + Say Y here if you want NFS data to be cached locally on disc through > + the general filesystem cache manager > + > config NFS_DIRECTIO > bool "Allow direct I/O on NFS files" > depends on NFS_FS > diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile > index b55cb23..c9e7c43 100644 > --- a/fs/nfs/Makefile > +++ b/fs/nfs/Makefile > @@ -16,4 +16,5 @@ nfs-$(CONFIG_NFS_V4)+= nfs4proc.o nfs4xdr.o > nfs4state.o nfs4renewd.o \ > nfs4namespace.o > nfs-$(CONFIG_NFS_DIRECTIO) += direct.o > nfs-$(CONFIG_SYSCTL) += sysctl.o > +nfs-$(CONFIG_NFS_FSCACHE) += fscache.o > nfs-objs := $(nfs-y) > diff --git a/fs/nfs/client.c b/fs/nfs/client.c > index a49f9fe..7be7807 100644 > --- a/fs/nfs/client.c > +++ b/fs/nfs/client.c > @@ -137,6 +137,8 @@ static struct nfs_client *nfs_alloc_client(const char > *hostname, > clp->cl_state = 1 << NFS4CLNT_LEASE_EXPIRED; > #endif > > + nfs_fscache_get_client_cookie(clp); > + > return clp; > > error_3: > @@ -168,6 +170,8 @@ static void nfs_free_client(struct nfs_client *clp) > > nfs4_shutdown_client(clp); > > + nfs_fscache_release_client_cookie(clp); > + > /* -EIO all pending I/O */ > if (!IS_ERR(clp->cl_rpcclient)) > rpc_shutdown_client(clp->cl_rpcclient); > @@ -1308,7 +1312,7 @@ static int nfs_volume_list_show(struct seq_file *m, > void *v) > > /* display header on line 1 */ > if (v == &nfs_volume_list) { > - seq_puts(m, "NV SERVER PORT DEV FSID\n"); > + seq_puts(m, "NV SERVER PORT DEV FSID FSC\n"); > return 0; > } > /* display one transport per line on subsequent lines */ > @@ -1322,12 +1326,13 @@ static int nfs_volume_list_show(struct seq_file *m, > void *v) >(unsigned long long) server->fsid.major, >(unsigned long long) server->fsid.minor); > > - seq_printf(m, "v%d %02x%02x%02x%02x %4hx %-7s %-17s\n", > + seq_printf(m, "v%d %02x%02x%02x%02x %4hx %-7s %-17s %s\n", > 
clp->cl_nfsversion, > NIPQUAD(clp->cl_addr.sin_addr), > ntohs(clp->cl_addr.sin_port), > dev, > -fsid); > +fsid, > +nfs_server_fscache_state(server)); Please send these changes as a separate patch: they change an existing interface. > > return 0; > } > diff --git a/fs/nfs/file.c b/fs/nfs/file.c > index c87dc71..dfd36e0 100644 > --- a/fs/nfs/file.c > +++ b/fs/nfs/file.c > @@ -34,6 +34,7 @@ > > #include "delegation.h" > #include "iostat.h" > +#include "internal.h" > > #define NFSDBG_FACILITY NFSDBG_FILE > > @@ -259,6 +260,10 @@ nfs_file_mmap(struct file * file, struct vm_area_struct > * vma) > status = nfs_revalidate_mapping(inode, file->f_mapping); > if (!status) > status = generic_file_mmap(file, vma); > + > + if (status == 0) > + nfs_fscache_install_vm_ops(inode, vma); > + > return status; Please note that in 2.6.24 the NFS client
Re: [PATCH 11/14] CacheFiles: Permit an inode's security ID to be obtained [try #2]
James Morris <[EMAIL PROTECTED]> wrote:

> > + u32 (*inode_get_secid)(struct inode *inode);
>
> To maintain API consistency, please return an int which only acts as an
> error code, and returning the secid via a *u32 function parameter.

Does that apply to *all* the functions, irrespective of whether or not they
return an error?

David
Re: [PATCH 09/14] CacheFiles: Permit a process's create SID to be overridden [try #2]
Casey Schaufler <[EMAIL PROTECTED]> wrote:

> This is SELinux specific functionality and should be done in the
> SELinux code. You should not be adding interfaces that are SELinux
> specific, in this case using secids instead of the LSM blob interfaces.

Is using secids your only objection? Or are you objecting to the whole
'act-as' concept?

David
Re: [PATCH 11/14] CacheFiles: Permit an inode's security ID to be obtained [try #2]
Casey Schaufler <[EMAIL PROTECTED]> wrote:

> This is SELinux specific functionality. It should not be an LSM
> interface.

This is what I worked out in conjunction with the denizens of the SELinux
mailing list. What would you have me do differently? Change things like:

	u32 (*act_as_secid)(u32 secid);

to something like:

	void (*act_as_secid)(const char *newsecdata, u32 newseclen,
			     char *oldsecdata, u32 *oldseclen);

David
Re: [PATCH 11/14] CacheFiles: Permit an inode's security ID to be obtained [try #2]
--- Stephen Smalley <[EMAIL PROTECTED]> wrote:

> On Thu, 2007-08-09 at 10:07 -0700, Casey Schaufler wrote:
> > --- David Howells <[EMAIL PROTECTED]> wrote:
> >
> > > Permit an inode's security ID to be obtained by the CacheFiles module.
> > > This is then used as the SID with which files and directories will be
> > > created in the cache.
> >
> > This is SELinux specific functionality. It should not be an LSM
> > interface.
>
> Odd, you proposed exactly the same hook (aside from naming convention
> and secid as argument vs. as retval) in recent postings on linux-audit
> and selinux list for use by the audit system.

And that's exposing SELinux specific functionality too. And I don't
like the fact that the audit system already requires a secid interface.
The audit system, however, does not use the secid for anything other
than a handle that gets passed around and eventually used to get the
data that goes into the audit record. It's annoying, but harmless and
does not affect any access control decisions.

The change proposed here would use the secid in access control decisions.
The LSM interface ought not to be exposing module specific internal
data structures.

My work on pulling selinux code out of audit left the secid interface
in place. You're right in that audit should get fixed. I had been
hoping to make that a phase II activity.

Casey Schaufler
[EMAIL PROTECTED]
Re: [PATCH 11/14] CacheFiles: Permit an inode's security ID to be obtained [try #2]
On Thu, 2007-08-09 at 10:07 -0700, Casey Schaufler wrote:
> --- David Howells <[EMAIL PROTECTED]> wrote:
>
> > Permit an inode's security ID to be obtained by the CacheFiles module.
> > This is then used as the SID with which files and directories will be
> > created in the cache.
>
> This is SELinux specific functionality. It should not be an LSM
> interface.

Odd, you proposed exactly the same hook (aside from naming convention
and secid as argument vs. as retval) in recent postings on linux-audit
and selinux list for use by the audit system.

--
Stephen Smalley
National Security Agency
Re: [PATCH 11/14] CacheFiles: Permit an inode's security ID to be obtained [try #2]
--- David Howells <[EMAIL PROTECTED]> wrote:

> Permit an inode's security ID to be obtained by the CacheFiles module.
> This is then used as the SID with which files and directories will be
> created in the cache.

This is SELinux specific functionality. It should not be an LSM
interface.

Casey Schaufler
[EMAIL PROTECTED]
Re: [PATCH 09/14] CacheFiles: Permit a process's create SID to be overridden [try #2]
--- David Howells <[EMAIL PROTECTED]> wrote:

> Make it possible for a process's file creation SID to be temporarily
> overridden by CacheFiles so that files created in the cache have the
> right label attached.
>
> Without this facility, files created in the cache will be given the current
> file creation SID of whatever process happens to have invoked CacheFiles
> indirectly by means of opening a netfs file at the time the cache file is
> created.

This is SELinux specific functionality and should be done in the
SELinux code. You should not be adding interfaces that are SELinux
specific, in this case using secids instead of the LSM blob interfaces.

Casey Schaufler
[EMAIL PROTECTED]
[PATCH 10/14] CacheFiles: Add an act-as SID override in task_security_struct [try #2]
Add an act-as SID to task_security_struct that is equivalent to fsuid/fsgid in task_struct. This permits a task to perform operations as if it is the overriding SID, without changing its own SID as that might be needed to control access to the process by ptrace, signals, /proc, etc. This is useful for CacheFiles in that it allows CacheFiles to access the cache files and directories using the cache's security context rather than the security context of the process on whose behalf it is working, and in the context of which it is running. Signed-Off-By: David Howells <[EMAIL PROTECTED]> --- include/linux/security.h | 32 +++ security/dummy.c | 12 +++ security/selinux/exports.c|2 security/selinux/hooks.c | 160 +++-- security/selinux/include/objsec.h |1 security/selinux/selinuxfs.c |2 security/selinux/xfrm.c |6 + 7 files changed, 148 insertions(+), 67 deletions(-) diff --git a/include/linux/security.h b/include/linux/security.h index 92d3da0..422015d 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -1154,6 +1154,16 @@ struct request_sock; * Set the current FS security ID. * @secid contains the security ID to set. * + * @act_as_secid: + * Set the security ID as which to act, returning the security ID as which + * the process was previously acting. + * @secid contains the security ID to act as. + * + * @act_as_self: + * Reset the security ID as which to act to be the same as the process's + * owning security ID, and return the security ID as which the process was + * previously acting. + * * This is the main security structure. 
*/ struct security_operations { @@ -1339,6 +1349,8 @@ struct security_operations { void (*release_secctx)(char *secdata, u32 seclen); u32 (*get_fscreate_secid)(void); u32 (*set_fscreate_secid)(u32 secid); + u32 (*act_as_secid)(u32 secid); + u32 (*act_as_self)(void); #ifdef CONFIG_SECURITY_NETWORK int (*unix_stream_connect) (struct socket * sock, @@ -2146,6 +2158,16 @@ static inline u32 security_set_fscreate_secid(u32 secid) return security_ops->set_fscreate_secid(secid); } +static inline u32 security_act_as_secid(u32 secid) +{ + return security_ops->act_as_secid(secid); +} + +static inline u32 security_act_as_self(void) +{ + return security_ops->act_as_self(); +} + /* prototypes */ extern int security_init (void); extern int register_security (struct security_operations *ops); @@ -2838,6 +2860,16 @@ static inline u32 security_set_fscreate_secid(u32 secid) return 0; } +static inline u32 security_act_as_secid(u32 secid) +{ + return 0; +} + +static inline u32 security_act_as_self(void) +{ + return 0; +} + #endif /* CONFIG_SECURITY */ #ifdef CONFIG_SECURITY_NETWORK diff --git a/security/dummy.c b/security/dummy.c index d463e6f..77ec75d 100644 --- a/security/dummy.c +++ b/security/dummy.c @@ -940,6 +940,16 @@ static u32 dummy_set_fscreate_secid(u32 secid) return 0; } +static u32 dummy_act_as_secid(u32 secid) +{ + return 0; +} + +static u32 dummy_act_as_self(void) +{ + return 0; +} + #ifdef CONFIG_KEYS static inline int dummy_key_alloc(struct key *key, struct task_struct *ctx, unsigned long flags) @@ -1096,6 +1106,8 @@ void security_fixup_ops (struct security_operations *ops) set_to_dummy_if_null(ops, release_secctx); set_to_dummy_if_null(ops, get_fscreate_secid); set_to_dummy_if_null(ops, set_fscreate_secid); + set_to_dummy_if_null(ops, act_as_secid); + set_to_dummy_if_null(ops, act_as_self); #ifdef CONFIG_SECURITY_NETWORK set_to_dummy_if_null(ops, unix_stream_connect); set_to_dummy_if_null(ops, unix_may_send); diff --git a/security/selinux/exports.c 
b/security/selinux/exports.c index b6f9694..b559699 100644 --- a/security/selinux/exports.c +++ b/security/selinux/exports.c @@ -79,7 +79,7 @@ int selinux_relabel_packet_permission(u32 sid) if (selinux_enabled) { struct task_security_struct *tsec = current->security; - return avc_has_perm(tsec->sid, sid, SECCLASS_PACKET, + return avc_has_perm(tsec->actor_sid, sid, SECCLASS_PACKET, PACKET__RELABELTO, NULL); } return 0; diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index c5905b0..66af819 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -162,7 +162,8 @@ static int task_alloc_security(struct task_struct *task) return -ENOMEM; tsec->task = task; - tsec->osid = tsec->sid = tsec->ptrace_sid = SECINITSID_UNLABELED; + tsec->osid = tsec->actor_sid = tsec->sid = tsec->ptrace_sid = + SECINITSID_UNLABELED; task->security = tsec; return
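The SELinux side of the patch is cut off in this excerpt, so the following is only a sketch of the override-and-restore pattern that the hook comments in security.h describe. It is a user-space model: struct task_sec_sketch and the explicit task argument are assumptions made so the example is self-contained, whereas the hooks in the patch take no task argument and act on the current task.

```c
#include <stdint.h>

typedef uint32_t u32;

/* Toy model of the two SIDs the patch description talks about. */
struct task_sec_sketch {
	u32 sid;	/* the task's own SID (used for ptrace, signals, ...) */
	u32 actor_sid;	/* the SID the task currently acts as */
};

/* Act as @secid and return the SID the task was previously acting as. */
static u32 act_as_secid(struct task_sec_sketch *tsec, u32 secid)
{
	u32 old = tsec->actor_sid;

	tsec->actor_sid = secid;
	return old;
}

/* Go back to acting as the task's own SID; return the previous actor SID. */
static u32 act_as_self(struct task_sec_sketch *tsec)
{
	u32 old = tsec->actor_sid;

	tsec->actor_sid = tsec->sid;
	return old;
}
```

A caller such as CacheFiles would presumably save the value returned by act_as_secid(), do its cache accesses, and then pass the saved value back to act_as_secid() to restore the previous state, much as fsuid/fsgid overrides are saved and restored.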
[PATCH 06/14] CacheFiles: Add a hook to write a single page of data to an inode [try #2]
Add an address space operation to write one single page of data to an inode at a page-aligned location (thus permitting the implementation to be highly optimised). This is used by CacheFiles to store the contents of netfs pages into their backing file pages. Supply a generic implementation for this that uses the prepare_write() and commit_write() address_space operations to bound a copy directly into the page cache. Hook the Ext2 and Ext3 operations to the generic implementation. Signed-Off-By: David Howells <[EMAIL PROTECTED]> --- fs/ext2/inode.c|2 + fs/ext3/inode.c|3 ++ include/linux/fs.h |7 mm/filemap.c | 95 4 files changed, 107 insertions(+), 0 deletions(-) diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c index 0079b2c..b3e4b50 100644 --- a/fs/ext2/inode.c +++ b/fs/ext2/inode.c @@ -695,6 +695,7 @@ const struct address_space_operations ext2_aops = { .direct_IO = ext2_direct_IO, .writepages = ext2_writepages, .migratepage= buffer_migrate_page, + .write_one_page = generic_file_buffered_write_one_page, }; const struct address_space_operations ext2_aops_xip = { @@ -713,6 +714,7 @@ const struct address_space_operations ext2_nobh_aops = { .direct_IO = ext2_direct_IO, .writepages = ext2_writepages, .migratepage= buffer_migrate_page, + .write_one_page = generic_file_buffered_write_one_page, }; /* diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c index de4e316..93809eb 100644 --- a/fs/ext3/inode.c +++ b/fs/ext3/inode.c @@ -1713,6 +1713,7 @@ static const struct address_space_operations ext3_ordered_aops = { .releasepage= ext3_releasepage, .direct_IO = ext3_direct_IO, .migratepage= buffer_migrate_page, + .write_one_page = generic_file_buffered_write_one_page, }; static const struct address_space_operations ext3_writeback_aops = { @@ -1727,6 +1728,7 @@ static const struct address_space_operations ext3_writeback_aops = { .releasepage= ext3_releasepage, .direct_IO = ext3_direct_IO, .migratepage= buffer_migrate_page, + .write_one_page = generic_file_buffered_write_one_page, }; 
static const struct address_space_operations ext3_journalled_aops = { @@ -1740,6 +1742,7 @@ static const struct address_space_operations ext3_journalled_aops = { .bmap = ext3_bmap, .invalidatepage = ext3_invalidatepage, .releasepage= ext3_releasepage, + .write_one_page = generic_file_buffered_write_one_page, }; void ext3_set_aops(struct inode *inode) diff --git a/include/linux/fs.h b/include/linux/fs.h index 6bf1395..1b1f288 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -433,6 +433,11 @@ struct address_space_operations { int (*migratepage) (struct address_space *, struct page *, struct page *); int (*launder_page) (struct page *); + /* write the contents of the source page over the page at the specified +* index in the target address space (the source page does not need to +* be related to the target address space) */ + int (*write_one_page)(struct address_space *, pgoff_t, struct page *); + }; struct backing_dev_info; @@ -1669,6 +1674,8 @@ extern ssize_t generic_file_direct_write(struct kiocb *, const struct iovec *, unsigned long *, loff_t, loff_t *, size_t, size_t); extern ssize_t generic_file_buffered_write(struct kiocb *, const struct iovec *, unsigned long, loff_t, loff_t *, size_t, ssize_t); +extern int generic_file_buffered_write_one_page(struct address_space *, + pgoff_t, struct page *); extern ssize_t do_sync_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos); extern ssize_t do_sync_write(struct file *filp, const char __user *buf, size_t len, loff_t *ppos); extern void do_generic_mapping_read(struct address_space *mapping, diff --git a/mm/filemap.c b/mm/filemap.c index 7b96487..5e419a2 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -2032,6 +2032,101 @@ zero_length_segment: } EXPORT_SYMBOL(generic_file_buffered_write); +/** + * generic_file_buffered_write_one_page - Write a single page of data to an + * inode + * @mapping - The address space of the target inode + * @index - The target page in the target inode to fill + * 
@source - The data to write into the target page + * + * Write the data from the source page to the page in the nominated address + * space at the @index specified. Note that the file will not be extended if + * the page crosses the EOF marker, in which case only the first part of the + * page will be written. + * + * The @source page does not need to have any association with the file or the + * target page offset. + */ +int generic_fil
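The EOF rule in the kernel-doc comment above (the file is never extended, so a page that straddles EOF is only partly written) can be sketched as a small userspace calculation. This is an illustration only: `sketch_bytes_to_write` and `SKETCH_PAGE_SIZE` are hypothetical names, and the behaviour for a page lying wholly beyond EOF (nothing written) is an assumption, since the function body is cut off in this archive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size, for illustration only */

/*
 * Hypothetical helper (not part of the patch): how many bytes of the source
 * page would land in the target page at @index for a file of @file_size
 * bytes, given that the file is never extended.
 */
static size_t sketch_bytes_to_write(uint64_t file_size, uint64_t index)
{
	uint64_t start = index * SKETCH_PAGE_SIZE;

	if (start >= file_size)
		return 0;	/* assumption: page wholly beyond EOF, nothing written */
	if (file_size - start < SKETCH_PAGE_SIZE)
		return (size_t)(file_size - start);	/* page straddles EOF */
	return SKETCH_PAGE_SIZE;	/* page lies wholly inside the file */
}
```

For a 10000-byte file, page 0 is written in full, page 2 receives only the 1808 bytes before EOF, and page 3 receives nothing under the stated assumption.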
[PATCH 12/14] CacheFiles: Get the SID under which the CacheFiles module should operate [try #2]
Get the SID under which the CacheFiles module should operate so that the SELinux security system can control the accesses it makes. Signed-Off-By: David Howells <[EMAIL PROTECTED]> --- include/linux/security.h | 20 security/dummy.c |7 +++ security/selinux/hooks.c |7 +++ 3 files changed, 34 insertions(+), 0 deletions(-) diff --git a/include/linux/security.h b/include/linux/security.h index 21cadea..9cb417e 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -1164,6 +1164,14 @@ struct request_sock; * owning security ID, and return the security ID as which the process was * previously acting. * + * @cachefiles_get_secid: + * Determine the security ID for the CacheFiles module to use when + * accessing the filesystem containing the cache. + * @secid contains the security ID under which cachefiles daemon is + * running. + * @modsecid contains the pointer to where the security ID for the module + * is to be stored. + * * This is the main security structure. */ struct security_operations { @@ -1352,6 +1360,7 @@ struct security_operations { u32 (*set_fscreate_secid)(u32 secid); u32 (*act_as_secid)(u32 secid); u32 (*act_as_self)(void); + int (*cachefiles_get_secid)(u32 secid, u32 *modsecid); #ifdef CONFIG_SECURITY_NETWORK int (*unix_stream_connect) (struct socket * sock, @@ -2176,6 +2185,11 @@ static inline u32 security_act_as_self(void) return security_ops->act_as_self(); } +static inline int security_cachefiles_get_secid(u32 secid, u32 *modsecid) +{ + return security_ops->cachefiles_get_secid(secid, modsecid); +} + /* prototypes */ extern int security_init (void); extern int register_security (struct security_operations *ops); @@ -2883,6 +2897,12 @@ static inline u32 security_act_as_self(void) return 0; } +static inline int security_cachefiles_get_secid(u32 secid, u32 *modsecid) +{ + *modsecid = 0; + return 0; +} + #endif /* CONFIG_SECURITY */ #ifdef CONFIG_SECURITY_NETWORK diff --git a/security/dummy.c b/security/dummy.c index 6a7a317..2c1fd16 100644 
--- a/security/dummy.c +++ b/security/dummy.c @@ -955,6 +955,12 @@ static u32 dummy_act_as_self(void) return 0; } +static int dummy_cachefiles_get_secid(u32 secid, u32 *modsecid) +{ + *modsecid = 0; + return 0; +} + #ifdef CONFIG_KEYS static inline int dummy_key_alloc(struct key *key, struct task_struct *ctx, unsigned long flags) @@ -1114,6 +1120,7 @@ void security_fixup_ops (struct security_operations *ops) set_to_dummy_if_null(ops, set_fscreate_secid); set_to_dummy_if_null(ops, act_as_secid); set_to_dummy_if_null(ops, act_as_self); + set_to_dummy_if_null(ops, cachefiles_get_secid); #ifdef CONFIG_SECURITY_NETWORK set_to_dummy_if_null(ops, unix_stream_connect); set_to_dummy_if_null(ops, unix_may_send); diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index c05d662..725f657 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -4718,6 +4718,12 @@ static u32 selinux_act_as_self(void) return oldactor_sid; } +static int selinux_cachefiles_get_secid(u32 secid, u32 *modsecid) +{ + return security_transition_sid(secid, SECINITSID_KERNEL, + SECCLASS_PROCESS, modsecid); +} + #ifdef CONFIG_KEYS static int selinux_key_alloc(struct key *k, struct task_struct *tsk, @@ -4905,6 +4911,7 @@ static struct security_operations selinux_ops = { .set_fscreate_secid = selinux_set_fscreate_secid, .act_as_secid = selinux_act_as_secid, .act_as_self = selinux_act_as_self, + .cachefiles_get_secid = selinux_cachefiles_get_secid, .unix_stream_connect = selinux_socket_unix_stream_connect, .unix_may_send =selinux_socket_unix_may_send, - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
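The hook wiring in this patch (an operations table, a dummy fallback installed when a security module leaves the slot empty, and an inline wrapper that calls through the table) can be sketched in userspace as follows. All `sketch_` names are hypothetical stand-ins; only the shape mirrors `set_to_dummy_if_null()` and `security_cachefiles_get_secid()` from the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct security_operations, reduced to the one new hook. */
struct sketch_security_ops {
	int (*cachefiles_get_secid)(uint32_t secid, uint32_t *modsecid);
};

/* Same behaviour as dummy_cachefiles_get_secid() in the patch. */
static int sketch_dummy_cachefiles_get_secid(uint32_t secid, uint32_t *modsecid)
{
	(void)secid;
	*modsecid = 0;
	return 0;
}

/* Mirrors set_to_dummy_if_null(): fill in any hook the module left unset. */
static void sketch_fixup_ops(struct sketch_security_ops *ops)
{
	if (ops->cachefiles_get_secid == NULL)
		ops->cachefiles_get_secid = sketch_dummy_cachefiles_get_secid;
}

/* Mirrors the inline wrapper: callers never touch the table directly. */
static int sketch_security_cachefiles_get_secid(const struct sketch_security_ops *ops,
						uint32_t secid, uint32_t *modsecid)
{
	return ops->cachefiles_get_secid(secid, modsecid);
}
```

The fallback means a caller such as CacheFiles can invoke the hook unconditionally, whichever security module is loaded.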
[PATCH 09/14] CacheFiles: Permit a process's create SID to be overridden [try #2]
Make it possible for a process's file creation SID to be temporarily overridden by CacheFiles so that files created in the cache have the right label attached. Without this facility, files created in the cache will be given the current file creation SID of whatever process happens to have invoked CacheFiles indirectly by means of opening a netfs file at the time the cache file is created. Signed-Off-By: David Howells <[EMAIL PROTECTED]> --- include/linux/security.h | 35 +++ security/dummy.c | 12 security/selinux/hooks.c | 18 ++ 3 files changed, 65 insertions(+), 0 deletions(-) diff --git a/include/linux/security.h b/include/linux/security.h index c11dc8a..92d3da0 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -1147,6 +1147,13 @@ struct request_sock; * @secdata contains the security context. * @seclen contains the length of the security context. * + * @get_fscreate_secid: + * Get the current FS security ID. + * + * @set_fscreate_secid: + * Set the current FS security ID. + * @secid contains the security ID to set. + * * This is the main security structure. 
*/ struct security_operations { @@ -1330,6 +1337,8 @@ struct security_operations { int (*setprocattr)(struct task_struct *p, char *name, void *value, size_t size); int (*secid_to_secctx)(u32 secid, char **secdata, u32 *seclen); void (*release_secctx)(char *secdata, u32 seclen); + u32 (*get_fscreate_secid)(void); + u32 (*set_fscreate_secid)(u32 secid); #ifdef CONFIG_SECURITY_NETWORK int (*unix_stream_connect) (struct socket * sock, @@ -2127,6 +2136,16 @@ static inline void security_release_secctx(char *secdata, u32 seclen) return security_ops->release_secctx(secdata, seclen); } +static inline u32 security_get_fscreate_secid(void) +{ + return security_ops->get_fscreate_secid(); +} + +static inline u32 security_set_fscreate_secid(u32 secid) +{ + return security_ops->set_fscreate_secid(secid); +} + /* prototypes */ extern int security_init (void); extern int register_security (struct security_operations *ops); @@ -2795,6 +2814,11 @@ static inline void securityfs_remove(struct dentry *dentry) { } +static inline int security_to_secctx_secid(char *secdata, u32 seclen, u32 *secid) +{ + return -EOPNOTSUPP; +} + static inline int security_secid_to_secctx(u32 secid, char **secdata, u32 *seclen) { return -EOPNOTSUPP; @@ -2803,6 +2827,17 @@ static inline int security_secid_to_secctx(u32 secid, char **secdata, u32 *secle static inline void security_release_secctx(char *secdata, u32 seclen) { } + +static inline u32 security_get_fscreate_secid(void) +{ + return 0; +} + +static inline u32 security_set_fscreate_secid(u32 secid) +{ + return 0; +} + #endif /* CONFIG_SECURITY */ #ifdef CONFIG_SECURITY_NETWORK diff --git a/security/dummy.c b/security/dummy.c index 19d813d..d463e6f 100644 --- a/security/dummy.c +++ b/security/dummy.c @@ -930,6 +930,16 @@ static void dummy_release_secctx(char *secdata, u32 seclen) { } +static u32 dummy_get_fscreate_secid(void) +{ + return 0; +} + +static u32 dummy_set_fscreate_secid(u32 secid) +{ + return 0; +} + #ifdef CONFIG_KEYS static inline int 
dummy_key_alloc(struct key *key, struct task_struct *ctx, unsigned long flags) @@ -1084,6 +1094,8 @@ void security_fixup_ops (struct security_operations *ops) set_to_dummy_if_null(ops, setprocattr); set_to_dummy_if_null(ops, secid_to_secctx); set_to_dummy_if_null(ops, release_secctx); + set_to_dummy_if_null(ops, get_fscreate_secid); + set_to_dummy_if_null(ops, set_fscreate_secid); #ifdef CONFIG_SECURITY_NETWORK set_to_dummy_if_null(ops, unix_stream_connect); set_to_dummy_if_null(ops, unix_may_send); diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index 6237933..c5905b0 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -4661,6 +4661,22 @@ static void selinux_release_secctx(char *secdata, u32 seclen) kfree(secdata); } +static u32 selinux_get_fscreate_secid(void) +{ + struct task_security_struct *tsec = current->security; + + return tsec->create_sid; +} + +static u32 selinux_set_fscreate_secid(u32 secid) +{ + struct task_security_struct *tsec = current->security; + u32 oldsid = tsec->create_sid; + + tsec->create_sid = secid; + return oldsid; +} + #ifdef CONFIG_KEYS static int selinux_key_alloc(struct key *k, struct task_struct *tsk, @@ -4843,6 +4859,8 @@ static struct security_operations selinux_ops = { .secid_to_secctx = selinux_secid_to_secctx, .release_secctx = selinux_release_secctx, + .get_fscreate_secid = selinux_get_fscreate_secid, + .set_fscreate_secid = selinux_set_f
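Because `set_fscreate_secid` returns the previous value, a caller can override the creation SID around a single operation and then put the old one back. A minimal userspace sketch of that save/override/restore pattern follows; the `sketch_` names and the `sketch_create_in_cache()` caller are hypothetical, and the single global stands in for the per-task `create_sid` field that the SELinux implementation in the diff manipulates.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the per-task security state (create_sid in the patch). */
struct sketch_task_sec {
	uint32_t create_sid;
};

static struct sketch_task_sec sketch_current = { 0 };

static uint32_t sketch_get_fscreate_secid(void)
{
	return sketch_current.create_sid;
}

/* Same shape as selinux_set_fscreate_secid(): swap in, return the old SID. */
static uint32_t sketch_set_fscreate_secid(uint32_t secid)
{
	uint32_t oldsid = sketch_current.create_sid;

	sketch_current.create_sid = secid;
	return oldsid;
}

/*
 * Hypothetical caller: override the creation SID, note which label a file
 * created at this point would receive, then restore the caller's own SID.
 */
static uint32_t sketch_create_in_cache(uint32_t cache_sid)
{
	uint32_t saved = sketch_set_fscreate_secid(cache_sid);
	uint32_t label_used = sketch_get_fscreate_secid();

	sketch_set_fscreate_secid(saved);
	return label_used;
}
```

The process that happened to trigger the cache operation sees its own creation SID unchanged once the call returns.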
[PATCH 07/14] CacheFiles: Permit the page lock state to be monitored [try #2]
Add a function to install a monitor on the page lock waitqueue for a particular
page, thus allowing the page being unlocked to be detected.

This is used by CacheFiles to detect read completion on a page in the backing
filesystem so that it can then copy the data to the waiting netfs page.

Signed-Off-By: David Howells <[EMAIL PROTECTED]>
---

 include/linux/pagemap.h |    5 +++++
 mm/filemap.c            |   19 +++++++++++++++++++
 2 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index d1049b6..452fdcf 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -220,6 +220,11 @@ static inline void wait_on_page_fscache_write(struct page *page)
 extern void end_page_fscache_write(struct page *page);
 
 /*
+ * Add an arbitrary waiter to a page's wait queue
+ */
+extern void add_page_wait_queue(struct page *page, wait_queue_t *waiter);
+
+/*
  * Fault a userspace page into pagetables. Return non-zero on a fault.
  *
  * This assumes that two userspace pages are always sufficient. That's
diff --git a/mm/filemap.c b/mm/filemap.c
index 5e419a2..c60c24e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -518,6 +518,25 @@ void fastcall wait_on_page_bit(struct page *page, int bit_nr)
 EXPORT_SYMBOL(wait_on_page_bit);
 
 /**
+ * add_page_wait_queue - Add an arbitrary waiter to a page's wait queue
+ * @page - Page defining the wait queue of interest
+ * @waiter - Waiter to add to the queue
+ *
+ * Add an arbitrary @waiter to the wait queue for the nominated @page.
+ */
+void add_page_wait_queue(struct page *page, wait_queue_t *waiter)
+{
+	wait_queue_head_t *q = page_waitqueue(page);
+	unsigned long flags;
+
+	spin_lock_irqsave(&q->lock, flags);
+	__add_wait_queue(q, waiter);
+	spin_unlock_irqrestore(&q->lock, flags);
+}
+
+EXPORT_SYMBOL_GPL(add_page_wait_queue);
+
+/**
  * unlock_page - unlock a locked page
  * @page: the page
  *
[PATCH 11/14] CacheFiles: Permit an inode's security ID to be obtained [try #2]
Permit an inode's security ID to be obtained by the CacheFiles module. This is then used as the SID with which files and directories will be created in the cache. Signed-Off-By: David Howells <[EMAIL PROTECTED]> --- include/linux/security.h | 13 + security/dummy.c |6 ++ security/selinux/hooks.c |8 3 files changed, 27 insertions(+), 0 deletions(-) diff --git a/include/linux/security.h b/include/linux/security.h index 422015d..21cadea 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -1252,6 +1252,7 @@ struct security_operations { int (*inode_getsecurity)(const struct inode *inode, const char *name, void *buffer, size_t size, int err); int (*inode_setsecurity)(struct inode *inode, const char *name, const void *value, size_t size, int flags); int (*inode_listsecurity)(struct inode *inode, char *buffer, size_t buffer_size); + u32 (*inode_get_secid)(struct inode *inode); int (*file_permission) (struct file * file, int mask); int (*file_alloc_security) (struct file * file); @@ -1814,6 +1815,13 @@ static inline int security_inode_listsecurity(struct inode *inode, char *buffer, return security_ops->inode_listsecurity(inode, buffer, buffer_size); } +static inline u32 security_inode_get_secid(struct inode *inode) +{ + if (unlikely(IS_PRIVATE(inode))) + return 0; + return security_ops->inode_get_secid(inode); +} + static inline int security_file_permission (struct file *file, int mask) { return security_ops->file_permission (file, mask); @@ -2514,6 +2522,11 @@ static inline int security_inode_listsecurity(struct inode *inode, char *buffer, return 0; } +static inline u32 security_inode_get_secid(struct inode *inode) +{ + return 0; +} + static inline int security_file_permission (struct file *file, int mask) { return 0; diff --git a/security/dummy.c b/security/dummy.c index 77ec75d..6a7a317 100644 --- a/security/dummy.c +++ b/security/dummy.c @@ -392,6 +392,11 @@ static int dummy_inode_listsecurity(struct inode *inode, char *buffer, size_t bu return 0; } 
+static u32 dummy_inode_get_secid(struct inode *inode) +{ + return 0; +} + static const char *dummy_inode_xattr_getsuffix(void) { return NULL; @@ -1042,6 +1047,7 @@ void security_fixup_ops (struct security_operations *ops) set_to_dummy_if_null(ops, inode_getsecurity); set_to_dummy_if_null(ops, inode_setsecurity); set_to_dummy_if_null(ops, inode_listsecurity); + set_to_dummy_if_null(ops, inode_get_secid); set_to_dummy_if_null(ops, file_permission); set_to_dummy_if_null(ops, file_alloc_security); set_to_dummy_if_null(ops, file_free_security); diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index 66af819..c05d662 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -2464,6 +2464,13 @@ static int selinux_inode_listsecurity(struct inode *inode, char *buffer, size_t return len; } +static u32 selinux_inode_get_secid(struct inode *inode) +{ + struct inode_security_struct *isec = inode->i_security; + + return isec->sid; +} + /* file security operations */ static int selinux_file_permission(struct file *file, int mask) @@ -4822,6 +4829,7 @@ static struct security_operations selinux_ops = { .inode_getsecurity =selinux_inode_getsecurity, .inode_setsecurity =selinux_inode_setsecurity, .inode_listsecurity = selinux_inode_listsecurity, + .inode_get_secid = selinux_inode_get_secid, .file_permission = selinux_file_permission, .file_alloc_security = selinux_file_alloc_security, - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 08/14] CacheFiles: Export things for CacheFiles [try #2]
Export a number of functions for CacheFiles's use.

Signed-Off-By: David Howells <[EMAIL PROTECTED]>
---

 fs/super.c       |    2 ++
 kernel/auditsc.c |    2 ++
 2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index fc8ebed..c0d99dd 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -270,6 +270,8 @@ int fsync_super(struct super_block *sb)
 	return sync_blockdev(sb->s_bdev);
 }
 
+EXPORT_SYMBOL_GPL(fsync_super);
+
 /**
  * generic_shutdown_super - common helper for ->kill_sb()
  * @sb: superblock to kill
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index a777d37..1c068ec 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -1526,6 +1526,8 @@ add_names:
 	}
 }
 
+EXPORT_SYMBOL_GPL(__audit_inode_child);
+
 /**
  * auditsc_get_stamp - get local copies of audit_context values
  * @ctx: audit_context for the task
[PATCH 14/14] NFS: Use local caching [try #2]
The attached patch makes it possible for the NFS filesystem to make use of the network filesystem local caching service (FS-Cache). To be able to use this, an updated mount program is required. This can be obtained from: http://people.redhat.com/steved/cachefs/util-linux/ To mount an NFS filesystem to use caching, add an "fsc" option to the mount: mount warthog:/ /a -o fsc Signed-Off-By: David Howells <[EMAIL PROTECTED]> --- fs/Kconfig |8 + fs/nfs/Makefile|1 fs/nfs/client.c| 11 + fs/nfs/file.c | 38 +++- fs/nfs/fscache.c | 303 + fs/nfs/fscache.h | 464 fs/nfs/inode.c | 16 ++ fs/nfs/internal.h |8 + fs/nfs/read.c | 27 ++- fs/nfs/super.c |1 fs/nfs/sysctl.c| 43 fs/nfs/write.c |3 include/linux/nfs4_mount.h |3 include/linux/nfs_fs.h |6 + include/linux/nfs_fs_sb.h |5 include/linux/nfs_mount.h |3 16 files changed, 931 insertions(+), 9 deletions(-) diff --git a/fs/Kconfig b/fs/Kconfig index 7feb4cb..76d5d16 100644 --- a/fs/Kconfig +++ b/fs/Kconfig @@ -1600,6 +1600,14 @@ config NFS_V4 If unsure, say N. 
+config NFS_FSCACHE + bool "Provide NFS client caching support (EXPERIMENTAL)" + depends on EXPERIMENTAL + depends on NFS_FS=m && FSCACHE || NFS_FS=y && FSCACHE=y + help + Say Y here if you want NFS data to be cached locally on disc through + the general filesystem cache manager + config NFS_DIRECTIO bool "Allow direct I/O on NFS files" depends on NFS_FS diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile index b55cb23..c9e7c43 100644 --- a/fs/nfs/Makefile +++ b/fs/nfs/Makefile @@ -16,4 +16,5 @@ nfs-$(CONFIG_NFS_V4) += nfs4proc.o nfs4xdr.o nfs4state.o nfs4renewd.o \ nfs4namespace.o nfs-$(CONFIG_NFS_DIRECTIO) += direct.o nfs-$(CONFIG_SYSCTL) += sysctl.o +nfs-$(CONFIG_NFS_FSCACHE) += fscache.o nfs-objs := $(nfs-y) diff --git a/fs/nfs/client.c b/fs/nfs/client.c index a49f9fe..7be7807 100644 --- a/fs/nfs/client.c +++ b/fs/nfs/client.c @@ -137,6 +137,8 @@ static struct nfs_client *nfs_alloc_client(const char *hostname, clp->cl_state = 1 << NFS4CLNT_LEASE_EXPIRED; #endif + nfs_fscache_get_client_cookie(clp); + return clp; error_3: @@ -168,6 +170,8 @@ static void nfs_free_client(struct nfs_client *clp) nfs4_shutdown_client(clp); + nfs_fscache_release_client_cookie(clp); + /* -EIO all pending I/O */ if (!IS_ERR(clp->cl_rpcclient)) rpc_shutdown_client(clp->cl_rpcclient); @@ -1308,7 +1312,7 @@ static int nfs_volume_list_show(struct seq_file *m, void *v) /* display header on line 1 */ if (v == &nfs_volume_list) { - seq_puts(m, "NV SERVER PORT DEV FSID\n"); + seq_puts(m, "NV SERVER PORT DEV FSID FSC\n"); return 0; } /* display one transport per line on subsequent lines */ @@ -1322,12 +1326,13 @@ static int nfs_volume_list_show(struct seq_file *m, void *v) (unsigned long long) server->fsid.major, (unsigned long long) server->fsid.minor); - seq_printf(m, "v%d %02x%02x%02x%02x %4hx %-7s %-17s\n", + seq_printf(m, "v%d %02x%02x%02x%02x %4hx %-7s %-17s %s\n", clp->cl_nfsversion, NIPQUAD(clp->cl_addr.sin_addr), ntohs(clp->cl_addr.sin_port), dev, - fsid); + fsid, + 
nfs_server_fscache_state(server)); return 0; } diff --git a/fs/nfs/file.c b/fs/nfs/file.c index c87dc71..dfd36e0 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -34,6 +34,7 @@ #include "delegation.h" #include "iostat.h" +#include "internal.h" #define NFSDBG_FACILITYNFSDBG_FILE @@ -259,6 +260,10 @@ nfs_file_mmap(struct file * file, struct vm_area_struct * vma) status = nfs_revalidate_mapping(inode, file->f_mapping); if (!status) status = generic_file_mmap(file, vma); + + if (status == 0) + nfs_fscache_install_vm_ops(inode, vma); + return status; } @@ -311,22 +316,51 @@ static int nfs_commit_write(struct file *file, struct page *page, unsigned offse return status; } +/* + * partially or wholly invalidate a page + * - release the private state associated with a page if undergoing complete + * page invalidation + * - caller holds page lock + */ static void nfs_invalidate_page(struct page *page, unsigned long offset) { if (offset != 0) return; /* Cancel any unstarted writes on this page */ nfs_wb_page_priority(page->mapping->host, page, FLUSH_INVALIDA
[PATCH 01/14] FS-Cache: Release page->private after failed readahead [try #2]
The attached patch causes read_cache_pages() to release page-private data on a page for which add_to_page_cache() fails or the filler function fails. This permits pages with caching references associated with them to be cleaned up. The invalidatepage() address space op is called (indirectly) to do the honours. Signed-Off-By: David Howells <[EMAIL PROTECTED]> --- mm/readahead.c | 40 ++-- 1 files changed, 38 insertions(+), 2 deletions(-) diff --git a/mm/readahead.c b/mm/readahead.c index 39bf45d..12d1378 100644 --- a/mm/readahead.c +++ b/mm/readahead.c @@ -15,6 +15,7 @@ #include #include #include +#include void default_unplug_io_fn(struct backing_dev_info *bdi, struct page *page) { @@ -51,6 +52,41 @@ EXPORT_SYMBOL_GPL(file_ra_state_init); #define list_to_page(head) (list_entry((head)->prev, struct page, lru)) +/* + * see if a page needs releasing upon read_cache_pages() failure + * - the caller of read_cache_pages() may have set PG_private before calling, + * such as the NFS fs marking pages that are cached locally on disk, thus we + * need to give the fs a chance to clean up in the event of an error + */ +static void read_cache_pages_invalidate_page(struct address_space *mapping, +struct page *page) +{ + if (PagePrivate(page)) { + if (TestSetPageLocked(page)) + BUG(); + page->mapping = mapping; + do_invalidatepage(page, 0); + page->mapping = NULL; + unlock_page(page); + } + page_cache_release(page); +} + +/* + * release a list of pages, invalidating them first if need be + */ +static void read_cache_pages_invalidate_pages(struct address_space *mapping, + struct list_head *pages) +{ + struct page *victim; + + while (!list_empty(pages)) { + victim = list_to_page(pages); + list_del(&victim->lru); + read_cache_pages_invalidate_page(mapping, victim); + } +} + /** * read_cache_pages - populate an address space with some pages & start reads against them * @mapping: the address_space @@ -74,14 +110,14 @@ int read_cache_pages(struct address_space *mapping, struct list_head 
*pages, page = list_to_page(pages); list_del(&page->lru); if (add_to_page_cache(page, mapping, page->index, GFP_KERNEL)) { - page_cache_release(page); + read_cache_pages_invalidate_page(mapping, page); continue; } ret = filler(data, page); if (!pagevec_add(&lru_pvec, page)) __pagevec_lru_add(&lru_pvec); if (ret) { - put_pages_list(pages); + read_cache_pages_invalidate_pages(mapping, pages); break; } task_io_account_read(PAGE_CACHE_SIZE); - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 03/14] FS-Cache: Provide an add_wait_queue_tail() function [try #2]
Provide an add_wait_queue_tail() function to add a waiter to the back of a
wait queue instead of the front.

Signed-off-by: David Howells <[EMAIL PROTECTED]>
---

 include/linux/wait.h |    1 +
 kernel/wait.c        |   18 ++++++++++++++++++
 2 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/include/linux/wait.h b/include/linux/wait.h
index 0e68628..4cae7db 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -118,6 +118,7 @@ static inline int waitqueue_active(wait_queue_head_t *q)
 #define is_sync_wait(wait) (!(wait) || ((wait)->private))
 
 extern void FASTCALL(add_wait_queue(wait_queue_head_t *q, wait_queue_t * wait));
+extern void FASTCALL(add_wait_queue_tail(wait_queue_head_t *q, wait_queue_t * wait));
 extern void FASTCALL(add_wait_queue_exclusive(wait_queue_head_t *q, wait_queue_t * wait));
 extern void FASTCALL(remove_wait_queue(wait_queue_head_t *q, wait_queue_t * wait));
 
diff --git a/kernel/wait.c b/kernel/wait.c
index 444ddbf..7acc9cc 100644
--- a/kernel/wait.c
+++ b/kernel/wait.c
@@ -29,6 +29,24 @@ void fastcall add_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
 }
 EXPORT_SYMBOL(add_wait_queue);
 
+/**
+ * add_wait_queue_tail - Add a waiter to the back of a waitqueue
+ * @q: the wait queue to append the waiter to
+ * @wait: the waiter to be queued
+ *
+ * Add a waiter to the back of a waitqueue so that it gets woken up last.
+ */
+void fastcall add_wait_queue_tail(wait_queue_head_t *q, wait_queue_t *wait)
+{
+	unsigned long flags;
+
+	wait->flags &= ~WQ_FLAG_EXCLUSIVE;
+	spin_lock_irqsave(&q->lock, flags);
+	__add_wait_queue_tail(q, wait);
+	spin_unlock_irqrestore(&q->lock, flags);
+}
+EXPORT_SYMBOL(add_wait_queue_tail);
+
 void fastcall add_wait_queue_exclusive(wait_queue_head_t *q, wait_queue_t *wait)
 {
 	unsigned long flags;
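The front-versus-back distinction this patch relies on can be illustrated with a small userspace list. This is a sketch only: the `sketch_` types and functions are hypothetical, locking is omitted, and it assumes (as the patch comment states) that wake-up walks the queue from the front, so a waiter added at the tail is visited last.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_waiter {
	int id;
	struct sketch_waiter *prev, *next;
};

struct sketch_queue {
	struct sketch_waiter head;	/* sentinel node of a circular list */
};

static void sketch_queue_init(struct sketch_queue *q)
{
	q->head.prev = q->head.next = &q->head;
}

static void sketch_insert_between(struct sketch_waiter *w,
				  struct sketch_waiter *prev,
				  struct sketch_waiter *next)
{
	w->prev = prev;
	w->next = next;
	prev->next = w;
	next->prev = w;
}

/* Front insertion: the new waiter is visited first. */
static void sketch_add_front(struct sketch_queue *q, struct sketch_waiter *w)
{
	sketch_insert_between(w, &q->head, q->head.next);
}

/* Tail insertion: the new waiter is visited last. */
static void sketch_add_tail(struct sketch_queue *q, struct sketch_waiter *w)
{
	sketch_insert_between(w, q->head.prev, &q->head);
}

/* Record waiter ids in the order a front-to-back wake-up would visit them. */
static size_t sketch_wake_order(const struct sketch_queue *q, int *out, size_t max)
{
	const struct sketch_waiter *w;
	size_t n = 0;

	for (w = q->head.next; w != &q->head && n < max; w = w->next)
		out[n++] = w->id;
	return n;
}
```

Adding waiter 1 at the front, waiter 2 at the tail and then waiter 3 at the front yields the visiting order 3, 1, 2: the tail waiter stays last however many front insertions follow.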
[PATCH 00/14] Permit filesystem local caching [try #2]
These patches add local caching for network filesystems such as NFS and AFS.

FS-Cache now runs fully asynchronously as required by Trond Myklebust for NFS.

--
Changes:

 (*) The CacheFiles module no longer accepts directory fds in its cull and
     inuse commands from cachefilesd. Instead it uses the current working
     directory of the calling process as the basis for looking up the object.

     As a corollary, fget_light() no longer needs to be exported.

--
A tarball of the patches is available at:

	http://people.redhat.com/~dhowells/fscache/patches/nfs+fscache-21.tar.bz2

To use this version of CacheFiles, cachefilesd-0.9 is also required. It is
available as an SRPM:

	http://people.redhat.com/~dhowells/fscache/cachefilesd-0.9-1.fc7.src.rpm

Or as individual bits:

	http://people.redhat.com/~dhowells/fscache/cachefilesd-0.9.tar.bz2
	http://people.redhat.com/~dhowells/fscache/cachefilesd.fc
	http://people.redhat.com/~dhowells/fscache/cachefilesd.if
	http://people.redhat.com/~dhowells/fscache/cachefilesd.te
	http://people.redhat.com/~dhowells/fscache/cachefilesd.spec

The .fc, .if and .te files are for manipulating SELinux.

David
[PATCH 05/14] CacheFiles: Add missing copy_page export for ia64 [try #2]
This one-line patch fixes the missing export of copy_page introduced by the
cachefile patches. This patch is not yet upstream, but is required for
cachefile on ia64. It will be pushed upstream when cachefile goes upstream.

Signed-off-by: Prarit Bhargava <[EMAIL PROTECTED]>
Signed-Off-By: David Howells <[EMAIL PROTECTED]>
---

 arch/ia64/kernel/ia64_ksyms.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/ia64/kernel/ia64_ksyms.c b/arch/ia64/kernel/ia64_ksyms.c
index bd17190..20c3546 100644
--- a/arch/ia64/kernel/ia64_ksyms.c
+++ b/arch/ia64/kernel/ia64_ksyms.c
@@ -43,6 +43,7 @@ EXPORT_SYMBOL(__do_clear_user);
 EXPORT_SYMBOL(__strlen_user);
 EXPORT_SYMBOL(__strncpy_from_user);
 EXPORT_SYMBOL(__strnlen_user);
+EXPORT_SYMBOL(copy_page);
 
 /* from arch/ia64/lib */
 extern void __divsi3(void);
[PATCH 02/14] FS-Cache: Recruit a couple of page flags for cache management [try #2]
Recruit a couple of page flags to aid in cache management. The following extra flags are defined: (1) PG_fscache (PG_owner_priv_2) The marked page is backed by a local cache and is pinning resources in the cache driver. (2) PG_fscache_write (PG_owner_priv_3) The marked page is being written to the local cache. The page may not be modified whilst this is in progress. If PG_fscache is set, then things that checked for PG_private will now also check for that. This includes things like truncation and page invalidation. The function page_has_private() had been added to detect this. Signed-off-by: David Howells <[EMAIL PROTECTED]> --- fs/splice.c|2 +- include/linux/page-flags.h | 30 +- include/linux/pagemap.h| 11 +++ mm/filemap.c | 16 mm/migrate.c |2 +- mm/page_alloc.c|3 +++ mm/readahead.c |9 + mm/swap.c |4 ++-- mm/swap_state.c|4 ++-- mm/truncate.c | 10 +- mm/vmscan.c|2 +- 11 files changed, 76 insertions(+), 17 deletions(-) diff --git a/fs/splice.c b/fs/splice.c index c010a72..ae4f5b7 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -58,7 +58,7 @@ static int page_cache_pipe_buf_steal(struct pipe_inode_info *pipe, */ wait_on_page_writeback(page); - if (PagePrivate(page)) + if (page_has_private(page)) try_to_release_page(page, GFP_KERNEL); /* diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 209d3a4..eaf9854 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -83,19 +83,24 @@ #define PG_private 11 /* If pagecache, has fs-private data */ #define PG_writeback 12 /* Page is under writeback */ +#define PG_owner_priv_213 /* Owner use. If pagecache, fs may use */ #define PG_compound14 /* Part of a compound page */ #define PG_swapcache 15 /* Swap page: swp_entry_t in private */ #define PG_mappedtodisk16 /* Has blocks allocated on-disk */ #define PG_reclaim 17 /* To be reclaimed asap */ +#define PG_owner_priv_318 /* Owner use. 
If pagecache, fs may use */
 #define PG_buddy		19	/* Page is free, on buddy lists */
 
 /* PG_readahead is only used for file reads; PG_reclaim is only for writes */
 #define PG_readahead		PG_reclaim /* Reminder to do async read-ahead */
 
-/* PG_owner_priv_1 users should have descriptive aliases */
+/* PG_owner_priv_1/2/3 users should have descriptive aliases */
 #define PG_checked		PG_owner_priv_1 /* Used by some filesystems */
 #define PG_pinned		PG_owner_priv_1	/* Xen pinned pagetable */
 
+#define PG_fscache		PG_owner_priv_2	/* Backed by local cache */
+#define PG_fscache_write	PG_owner_priv_3	/* Writing to local cache */
+
 #if (BITS_PER_LONG > 32)
 /*
@@ -199,6 +204,18 @@ static inline void SetPageUptodate(struct page *page)
 #define TestClearPageWriteback(page) test_and_clear_bit(PG_writeback,	\
 							&(page)->flags)
 
+#define PageFsCache(page)		test_bit(PG_fscache, &(page)->flags)
+#define SetPageFsCache(page)		set_bit(PG_fscache, &(page)->flags)
+#define ClearPageFsCache(page)		clear_bit(PG_fscache, &(page)->flags)
+#define TestSetPageFsCache(page)	test_and_set_bit(PG_fscache, &(page)->flags)
+#define TestClearPageFsCache(page)	test_and_clear_bit(PG_fscache, &(page)->flags)
+
+#define PageFsCacheWrite(page)		test_bit(PG_fscache_write, &(page)->flags)
+#define SetPageFsCacheWrite(page)	set_bit(PG_fscache_write, &(page)->flags)
+#define ClearPageFsCacheWrite(page)	clear_bit(PG_fscache_write, &(page)->flags)
+#define TestSetPageFsCacheWrite(page)	test_and_set_bit(PG_fscache_write, &(page)->flags)
+#define TestClearPageFsCacheWrite(page)	test_and_clear_bit(PG_fscache_write, &(page)->flags)
+
 #define PageBuddy(page)		test_bit(PG_buddy, &(page)->flags)
 #define __SetPageBuddy(page)	__set_bit(PG_buddy, &(page)->flags)
 #define __ClearPageBuddy(page)	__clear_bit(PG_buddy, &(page)->flags)
@@ -272,4 +289,15 @@ static inline void set_page_writeback(struct page *page)
 	test_set_page_writeback(page);
 }
 
+/**
+ * page_has_private - Determine if page has private stuff
+ * @page: The page to be checked
+ *
+ * Determine if a page has private stuff, indicating that release routines
+ * should be invoked upon it.
+ */
+#define page_has_private(page)			\
+	((page)->flags & ((1 << PG_private) |	\
+			  (1 << PG_fscache)))
+
 #endif /* PAGE_FLAGS_H */
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
ind
Re: [RFC PATCH 1/4] pass open file to ->setattr()
> > This is needed to be able to correctly implement open-unlink-fsetattr
> > semantics in some filesystem such as sshfs, without having to resort
> > to "silly-renaming".
>
> How do you plan to do that?

Easy: the SFTP protocol has stateful opens and defines an FSTAT call.

Miklos
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 1/4] pass open file to ->setattr()
On Thu, Aug 09, 2007 at 05:27:45PM +0200, [EMAIL PROTECTED] wrote:
> This is needed to be able to correctly implement open-unlink-fsetattr
> semantics in some filesystem such as sshfs, without having to resort
> to "silly-renaming".

How do you plan to do that?

--b.
[RFC PATCH 2/4] pass open file to ->getattr()
From: Miklos Szeredi <[EMAIL PROTECTED]>

Pass the open file into the filesystem's ->getattr() method for
fstat().

This is needed to be able to correctly implement open-unlink-fstat
semantics in some filesystem such as sshfs, without having to resort
to "silly-renaming".

Do this by adding a 'struct file *' parameter to i_op->getattr().  For
fstat() pass the open file pointer, in other cases pass NULL.

This is safe from a compatibility standpoint, out-of-tree old stuff
will continue to work, but will get a warning at compile time.

Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]>
---

Index: linux/fs/9p/vfs_inode.c
===
--- linux.orig/fs/9p/vfs_inode.c	2007-08-09 16:47:30.0 +0200
+++ linux/fs/9p/vfs_inode.c	2007-08-09 16:48:45.0 +0200
@@ -706,7 +706,7 @@ done:
 
 static int
 v9fs_vfs_getattr(struct vfsmount *mnt, struct dentry *dentry,
-		 struct kstat *stat)
+		 struct kstat *stat, struct file *file)
 {
 	int err;
 	struct v9fs_session_info *v9ses;
Index: linux/fs/afs/inode.c
===
--- linux.orig/fs/afs/inode.c	2007-08-09 16:47:30.0 +0200
+++ linux/fs/afs/inode.c	2007-08-09 16:48:45.0 +0200
@@ -295,7 +295,7 @@ error_unlock:
  * read the attributes of an inode
  */
 int afs_getattr(struct vfsmount *mnt, struct dentry *dentry,
-		      struct kstat *stat)
+		      struct kstat *stat, struct file *file)
 {
 	struct inode *inode;
 
Index: linux/fs/afs/internal.h
===
--- linux.orig/fs/afs/internal.h	2007-08-09 16:47:30.0 +0200
+++ linux/fs/afs/internal.h	2007-08-09 16:48:45.0 +0200
@@ -548,7 +548,8 @@ extern struct inode *afs_iget(struct sup
 				     struct afs_callback *);
 extern void afs_zap_data(struct afs_vnode *);
 extern int afs_validate(struct afs_vnode *, struct key *);
-extern int afs_getattr(struct vfsmount *, struct dentry *, struct kstat *);
+extern int afs_getattr(struct vfsmount *, struct dentry *, struct kstat *,
+		       struct file *);
 extern int afs_setattr(struct dentry *, struct iattr *);
 extern void afs_clear_inode(struct inode *);
 
Index: linux/fs/bad_inode.c
===
--- linux.orig/fs/bad_inode.c	2007-08-09 16:47:30.0 +0200
+++ linux/fs/bad_inode.c	2007-08-09 16:48:45.0 +0200
@@ -250,7 +250,7 @@ static int bad_inode_permission(struct i
 }
 
 static int bad_inode_getattr(struct vfsmount *mnt, struct dentry *dentry,
-			struct kstat *stat)
+			struct kstat *stat, struct file *file)
 {
 	return -EIO;
 }
Index: linux/fs/cifs/cifsfs.h
===
--- linux.orig/fs/cifs/cifsfs.h	2007-08-09 16:47:30.0 +0200
+++ linux/fs/cifs/cifsfs.h	2007-08-09 16:48:45.0 +0200
@@ -55,7 +55,8 @@ extern int cifs_rmdir(struct inode *, st
 extern int cifs_rename(struct inode *, struct dentry *, struct inode *,
 		       struct dentry *);
 extern int cifs_revalidate(struct dentry *);
-extern int cifs_getattr(struct vfsmount *, struct dentry *, struct kstat *);
+extern int cifs_getattr(struct vfsmount *, struct dentry *, struct kstat *,
+			struct file *);
 extern int cifs_setattr(struct dentry *, struct iattr *);
 
 extern const struct inode_operations cifs_file_inode_ops;
Index: linux/fs/cifs/inode.c
===
--- linux.orig/fs/cifs/inode.c	2007-08-09 16:47:30.0 +0200
+++ linux/fs/cifs/inode.c	2007-08-09 16:48:45.0 +0200
@@ -1332,7 +1332,7 @@ int cifs_revalidate(struct dentry *diren
 }
 
 int cifs_getattr(struct vfsmount *mnt, struct dentry *dentry,
-	struct kstat *stat)
+	struct kstat *stat, struct file *file)
 {
 	int err = cifs_revalidate(dentry);
 	if (!err) {
Index: linux/fs/coda/inode.c
===
--- linux.orig/fs/coda/inode.c	2007-08-09 16:47:30.0 +0200
+++ linux/fs/coda/inode.c	2007-08-09 16:48:45.0 +0200
@@ -220,7 +220,8 @@ static void coda_clear_inode(struct inod
 	coda_cache_clear_inode(inode);
 }
 
-int coda_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
+int coda_getattr(struct vfsmount *mnt, struct dentry *dentry,
+		 struct kstat *stat, struct file *file)
 {
 	int err = coda_revalidate_inode(dentry);
 	if (!err)
Index: linux/fs/fat/file.c
===
--- linux.orig/fs/fat/file.c	2007-08-09 16:47:30.0 +0200
+++ linux/fs/fat/file.c	2007-08-09 16:48:45.0 +0200
@@ -303,7 +303,8 @@ v
[RFC PATCH 1/4] pass open file to ->setattr()
From: Miklos Szeredi <[EMAIL PROTECTED]>

Pass the open file into the filesystem's ->setattr() method for
fchmod, fchown and some of the utimes variants.

This is needed to be able to correctly implement open-unlink-fsetattr
semantics in some filesystem such as sshfs, without having to resort
to "silly-renaming".

The infrastructure is already there, so just need to fill in
attrs.ia_file and set ATTR_FILE in attrs.ia_valid.

Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]>
---

Index: linux/fs/open.c
===
--- linux.orig/fs/open.c	2007-08-09 16:47:30.0 +0200
+++ linux/fs/open.c	2007-08-09 16:48:43.0 +0200
@@ -581,7 +581,8 @@ asmlinkage long sys_fchmod(unsigned int
 	if (mode == (mode_t) -1)
 		mode = inode->i_mode;
 	newattrs.ia_mode = (mode & S_IALLUGO) | (inode->i_mode & ~S_IALLUGO);
-	newattrs.ia_valid = ATTR_MODE | ATTR_CTIME;
+	newattrs.ia_valid = ATTR_MODE | ATTR_CTIME | ATTR_FILE;
+	newattrs.ia_file = file;
 	err = notify_change(dentry, &newattrs);
 	mutex_unlock(&inode->i_mutex);
 
@@ -631,7 +632,8 @@ asmlinkage long sys_chmod(const char __u
 	return sys_fchmodat(AT_FDCWD, filename, mode);
 }
 
-static int chown_common(struct dentry * dentry, uid_t user, gid_t group)
+static int chown_common(struct dentry * dentry, uid_t user, gid_t group,
+			struct file *file)
 {
 	struct inode * inode;
 	int error;
@@ -659,6 +661,10 @@ static int chown_common(struct dentry *
 	}
 	if (!S_ISDIR(inode->i_mode))
 		newattrs.ia_valid |= ATTR_KILL_SUID|ATTR_KILL_SGID;
+	if (file) {
+		newattrs.ia_file = file;
+		newattrs.ia_valid |= ATTR_FILE;
+	}
 	mutex_lock(&inode->i_mutex);
 	error = notify_change(dentry, &newattrs);
 	mutex_unlock(&inode->i_mutex);
@@ -674,7 +680,7 @@ asmlinkage long sys_chown(const char __u
 	error = user_path_walk(filename, &nd);
 	if (error)
 		goto out;
-	error = chown_common(nd.dentry, user, group);
+	error = chown_common(nd.dentry, user, group, NULL);
 	path_release(&nd);
 out:
 	return error;
@@ -694,7 +700,7 @@ asmlinkage long sys_fchownat(int dfd, co
 	error = __user_walk_fd(dfd, filename, follow, &nd);
 	if (error)
 		goto out;
-	error = chown_common(nd.dentry, user, group);
+	error = chown_common(nd.dentry, user, group, NULL);
 	path_release(&nd);
 out:
 	return error;
@@ -708,7 +714,7 @@ asmlinkage long sys_lchown(const char __
 	error = user_path_walk_link(filename, &nd);
 	if (error)
 		goto out;
-	error = chown_common(nd.dentry, user, group);
+	error = chown_common(nd.dentry, user, group, NULL);
 	path_release(&nd);
 out:
 	return error;
@@ -727,7 +733,7 @@ asmlinkage long sys_fchown(unsigned int
 
 	dentry = file->f_path.dentry;
 	audit_inode(NULL, dentry);
-	error = chown_common(dentry, user, group);
+	error = chown_common(dentry, user, group, file);
 	fput(file);
 out:
 	return error;
Index: linux/fs/utimes.c
===
--- linux.orig/fs/utimes.c	2007-08-09 16:47:30.0 +0200
+++ linux/fs/utimes.c	2007-08-09 16:48:43.0 +0200
@@ -130,6 +130,10 @@ long do_utimes(int dfd, char __user *fil
 			}
 		}
 	}
+	if (f) {
+		newattrs.ia_file = f;
+		newattrs.ia_valid |= ATTR_FILE;
+	}
 	mutex_lock(&inode->i_mutex);
 	error = notify_change(dentry, &newattrs);
 	mutex_unlock(&inode->i_mutex);

--
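
[Editor's illustration, not part of the posted patch.] A minimal userspace sketch of how a filesystem's ->setattr() might act on ATTR_FILE once the VFS fills in ia_file as described above. The demo_* names and the remote_handle field are hypothetical stand-ins, not kernel or sshfs API; only the ATTR_MODE and ATTR_FILE values are taken from the kernel headers.

```c
#include <assert.h>
#include <stddef.h>

/* Flag values as defined in include/linux/fs.h at the time of this thread. */
#define ATTR_MODE	1
#define ATTR_FILE	8192

/* Hypothetical stand-in for struct file; not the kernel structure. */
struct demo_file {
	int remote_handle;	/* e.g. a server-side handle from a stateful open */
};

/* Hypothetical stand-in for struct iattr, reduced to the two fields used. */
struct demo_iattr {
	unsigned int ia_valid;
	struct demo_file *ia_file;
};

enum demo_target { DEMO_BY_PATH, DEMO_BY_HANDLE };

/*
 * Decide how to reach the object: use the open file when the caller
 * passed one (fchmod, fchown, futimes), otherwise fall back to the path.
 * Going through the handle is what keeps working after unlink.
 */
static enum demo_target demo_setattr_target(const struct demo_iattr *attr)
{
	assert(attr != NULL);
	if ((attr->ia_valid & ATTR_FILE) && attr->ia_file != NULL)
		return DEMO_BY_HANDLE;
	return DEMO_BY_PATH;
}
```

With this shape, a chmod() by name still resolves by path, while an fchmod() on an already-unlinked file can be sent against the open handle, so no silly-rename is needed.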
[RFC PATCH 4/4] VFS: allow filesystem to override mknod capability checks
From: Miklos Szeredi <[EMAIL PROTECTED]>

Add a new filesystem flag, that results in the VFS not checking if the
current process has enough privileges to do an mknod().

This is needed on filesystems, where an unprivileged user may be able
to create a device node, without causing security problems.

One such example is "mountlo" a loopback mount utility implemented
with fuse and UML, which runs as an unprivileged userspace process.
In this case the user does in fact have the right to create device
nodes within the filesystem image, as long as the user has write
access to the image.  Since the filesystem is mounted with "nodev",
adding device nodes is not a security concern.

This feature is basically "fuse-only", so it does not make sense to
change the semantics of ->mknod().

Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]>
---

Index: linux/fs/namei.c
===
--- linux.orig/fs/namei.c	2007-08-09 16:49:07.0 +0200
+++ linux/fs/namei.c	2007-08-09 16:49:12.0 +0200
@@ -1921,7 +1921,8 @@ int vfs_mknod(struct inode *dir, struct
 	if (error)
 		return error;
 
-	if ((S_ISCHR(mode) || S_ISBLK(mode)) && !capable(CAP_MKNOD))
+	if (!(dir->i_sb->s_type->fs_flags & FS_MKNOD_CHECKS_PERM) &&
+	    (S_ISCHR(mode) || S_ISBLK(mode)) && !capable(CAP_MKNOD))
 		return -EPERM;
 
 	if (!dir->i_op || !dir->i_op->mknod)
Index: linux/include/linux/fs.h
===
--- linux.orig/include/linux/fs.h	2007-08-09 16:49:07.0 +0200
+++ linux/include/linux/fs.h	2007-08-09 16:49:12.0 +0200
@@ -97,6 +97,7 @@ extern int dir_notify_enable;
 #define FS_BINARY_MOUNTDATA 2
 #define FS_HAS_SUBTYPE 4
 #define FS_SAFE 8		/* Safe to mount by unprivileged users */
+#define FS_MKNOD_CHECKS_PERM 16	/* FS checks if device creation is allowed */
 #define FS_REVAL_DOT	16384	/* Check the paths ".", ".." for staleness */
 #define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move()
 					 * during rename() internally.
[RFC PATCH 0/4] VFS updates
VFS tweaks needed for some FUSE features, but possibly useful to other
filesystems as well.

Comments are welcome.
[RFC PATCH 3/4] allow filesystems to implement atomic open+truncate
From: Miklos Szeredi <[EMAIL PROTECTED]>

Add a new attribute flag ATTR_OPEN, with the meaning: "truncation was
initiated by open() due to the O_TRUNC flag".

This way filesystems wanting to implement truncation within their
->open() method can ignore such truncate requests.

This is a quick & dirty hack, but it comes for free.  When (if) we
implement a proper low-level open+create+truncate inode operation,
this can go away.

Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]>
---

Index: linux/fs/namei.c
===
--- linux.orig/fs/namei.c	2007-08-09 16:47:30.0 +0200
+++ linux/fs/namei.c	2007-08-09 16:49:07.0 +0200
@@ -1655,8 +1655,10 @@ int may_open(struct nameidata *nd, int a
 		error = locks_verify_locked(inode);
 		if (!error) {
 			DQUOT_INIT(inode);
-			
-			error = do_truncate(dentry, 0, ATTR_MTIME|ATTR_CTIME, NULL);
+
+			error = do_truncate(dentry, 0,
+					    ATTR_MTIME|ATTR_CTIME|ATTR_OPEN,
+					    NULL);
 		}
 		put_write_access(inode);
 		if (error)
Index: linux/include/linux/fs.h
===
--- linux.orig/include/linux/fs.h	2007-08-09 16:48:45.0 +0200
+++ linux/include/linux/fs.h	2007-08-09 16:49:07.0 +0200
@@ -335,6 +335,7 @@ typedef void (dio_iodone_t)(struct kiocb
 #define ATTR_KILL_SUID	2048
 #define ATTR_KILL_SGID	4096
 #define ATTR_FILE	8192
+#define ATTR_OPEN	16384	/* Truncating from open(O_TRUNC) */
 
 /*
  * This is the Inode Attributes structure, used for notify_change().  It
@@ -1521,7 +1522,7 @@ static inline int break_lease(struct ino
 
 /* fs/open.c */
 
-extern int do_truncate(struct dentry *, loff_t start, unsigned int time_attrs,
+extern int do_truncate(struct dentry *, loff_t start, unsigned int attrs,
 		       struct file *filp);
 extern long do_sys_open(int fdf, const char __user *filename, int flags,
 			int mode);
Index: linux/fs/open.c
===
--- linux.orig/fs/open.c	2007-08-09 16:48:43.0 +0200
+++ linux/fs/open.c	2007-08-09 16:49:07.0 +0200
@@ -194,7 +194,7 @@ out:
 	return error;
 }
 
-int do_truncate(struct dentry *dentry, loff_t length, unsigned int time_attrs,
+int do_truncate(struct dentry *dentry, loff_t length, unsigned int attrs,
 	struct file *filp)
 {
 	int err;
@@ -205,7 +205,7 @@ int do_truncate(struct dentry *dentry, l
 		return -EINVAL;
 
 	newattrs.ia_size = length;
-	newattrs.ia_valid = ATTR_SIZE | time_attrs;
+	newattrs.ia_valid = ATTR_SIZE | attrs;
 	if (filp) {
 		newattrs.ia_file = filp;
 		newattrs.ia_valid |= ATTR_FILE;

--
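
[Editor's illustration, not part of the posted patch.] A sketch of how a filesystem might use ATTR_OPEN on the receiving side, assuming it already truncates inside its own ->open(). The function name demo_skip_setattr() and the truncates_in_open argument are hypothetical; the ATTR_OPEN value is the one proposed in this patch, the other values are from include/linux/fs.h.

```c
#include <assert.h>

#define ATTR_SIZE	8
#define ATTR_MTIME	32
#define ATTR_CTIME	64
#define ATTR_OPEN	16384	/* value proposed in the patch above */

/*
 * Return 1 when the setattr request can be ignored: it was generated by
 * open(O_TRUNC) and this filesystem performs the truncation as part of
 * its own ->open(). truncates_in_open must be 0 or 1.
 */
static int demo_skip_setattr(unsigned int ia_valid, int truncates_in_open)
{
	assert(truncates_in_open == 0 || truncates_in_open == 1);
	return truncates_in_open && (ia_valid & ATTR_OPEN) != 0;
}
```

An explicit truncate() or ftruncate() never carries ATTR_OPEN, so those requests are still handled normally.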
Re: [fuse-devel] [PATCH 00/25] move handling of setuid/gid bits from VFS into individual setattr functions (RESEND)
On Wed, 8 Aug 2007 22:05:13 +0200 (CEST)
Jan Engelhardt <[EMAIL PROTECTED]> wrote:

>
> On Aug 8 2007 09:48, Andrew Morton wrote:
> >> > On Mon, 6 Aug 2007 09:54:03 -0400
> >> > Jeff Layton <[EMAIL PROTECTED]> wrote:
> >> >
> >> > Is there any way in which we can prevent these problems?  Say
> >> >
> >> > - rename something so that unconverted filesystems will reliably fail to
> >> >   compile?
> >> >
> >>
> >> I suppose we could rename the .setattr inode operation to something
> >> else, but then we'll be stuck with it for at least a while. That seems
> >> sort of kludgey too...
> >
> >Sure.  We're changing the required behaviour of .setattr.  Changing its
> >name is a fine and reasonably reliable way to communicate that fact.
>
> Maybe ->chattr/->chgattr?
>

That seems like a good replacement name. :-)

Now that I think on this further though, maybe Trond's suggestion to
change how the return code works is the best one. That would (hopefully)
catch this problem at runtime, so if someone is using a precompiled
but unconverted module then that would be detected too.

--
Jeff Layton <[EMAIL PROTECTED]>