2.6.21-rc6 new aops patchset
http://www.kernel.org/pub/linux/kernel/people/npiggin/patches/new-aops/
2.6.21-rc6-new-aops*

New aops patchset against 2.6.21-rc6.

Reworked the cont helpers to be better aligned with the old scheme. This unbroke reiserfs (hopefully the only showstopper), and made fat conversion simpler.

Converted most of the cont_prepare_write filesystems (about half a dozen). affs and hfsplus still call ->prepare_write in parts, which makes them slightly less than trivial.

Converted block_dev and jffs2 to new aops.

Converted hostfs and smbfs, optimised things along the way.

Converted rd to new aops. This was interesting because the new aops make it trivial to keep rd pagecache off the LRU, so I did that too.

Bugfixes for tmpfs and loop (loop was not using the new aops, so I didn't notice the tmpfs breakage).

Switched ext2's directory manipulation to the new pagecache accessors.

Did some performance testing of the fuse_perform_write implementation. With a passthrough filesystem onto a backing tmpfs directory, bulk (1MB) writes are nearly 4 times faster (256MB/s vs 71MB/s), because FUSE can send larger requests to userspace. Block-based filesystems will tend to see less dramatic gains, but these could still be significant if, for example, block allocation is batched.

Issues:
- perform_write is still here for the moment (conversion from the perform_write aop implementation to a write fop shouldn't be too hard anyway, but I'll sort this out before it gets into mainline).
- nobh is still unconverted (the old nobh ops still work; they'll just be using the slow usercopy path, and ext3 doesn't use nobh any more). I'm inclined to keep ignoring nobh for now, because we're already up to 40 patches. I'd like to try improving nobh, but this isn't the right patchset to do it in.
- Many of the trivial conversions are untested.
- Need to convert others (e.g. reiserfs).
- Need to think about how to merge this.
- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 6/17] hfs: remove redundant read_mapping_page error check
Now that read_mapping_page() does error checking internally, there is no need to check PageError here.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/hfs/bnode.c linux-2.6.21-rc6-mm1-test/fs/hfs/bnode.c
--- linux-2.6.21-rc6-mm1/fs/hfs/bnode.c	2007-04-09 17:20:13.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/hfs/bnode.c	2007-04-10 21:28:03.0 -0700
@@ -282,10 +282,6 @@ static struct hfs_bnode *__hfs_bnode_cre
 		page = read_mapping_page(mapping, block++, NULL);
 		if (IS_ERR(page))
 			goto fail;
-		if (PageError(page)) {
-			page_cache_release(page);
-			goto fail;
-		}
 		page_cache_release(page);
 		node->page[i] = page;
 	}
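The hfs change relies on read_mapping_page() reporting read failures through its return value alone. The sketch below is a userspace model of that contract, not kernel code: ERR_PTR(), IS_ERR() and PTR_ERR() are re-implemented here on the assumption that they behave like the helpers in include/linux/err.h (with MAX_ERRNO of 4095), and struct mock_page and mock_read_mapping_page() are hypothetical stand-ins for the page cache.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model of the kernel's error-pointer helpers
 * (assumption: MAX_ERRNO is 4095, as in include/linux/err.h). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	/* Error codes live in the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for struct page: only the two flags the
 * old call sites used to test. */
struct mock_page {
	bool uptodate;
	bool error;
};

/* Hypothetical model of the new read_mapping_page() contract: a page
 * that failed I/O is dropped inside the helper and reported as
 * ERR_PTR(-EIO), so a caller never sees PageError set. */
static struct mock_page *mock_read_mapping_page(struct mock_page *p)
{
	if (p->error || !p->uptodate)
		return ERR_PTR(-EIO);
	return p;
}

/* Caller pattern after the patch: a single IS_ERR() test suffices. */
static int mock_caller(struct mock_page *p)
{
	struct mock_page *page = mock_read_mapping_page(p);

	if (IS_ERR(page))
		return (int)PTR_ERR(page);
	assert(!page->error);	/* guaranteed by the helper's contract */
	return 0;
}
```

Because a failed read never escapes the helper in this model, IS_ERR() is the only check a caller needs, which is the reasoning behind dropping the PageError() test in the hunk above.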
[PATCH 11/17] ntfs: convert ntfs_map_page to read_kmap_page
Replace ntfs_map_page() and ntfs_unmap_page() using the new read_kmap_page() and put_kmapped_page() calls, and their locking variants, and remove unneeded PageError checking.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/ntfs/aops.h linux-2.6.21-rc5-mm4-test/fs/ntfs/aops.h
--- linux-2.6.21-rc5-mm4/fs/ntfs/aops.h	2007-04-05 17:14:25.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/ntfs/aops.h	2007-04-06 01:59:19.0 -0700
@@ -31,73 +31,6 @@
 
 #include "inode.h"
 
-/**
- * ntfs_unmap_page - release a page that was mapped using ntfs_map_page()
- * @page:	the page to release
- *
- * Unpin, unmap and release a page that was obtained from ntfs_map_page().
- */
-static inline void ntfs_unmap_page(struct page *page)
-{
-	kunmap(page);
-	page_cache_release(page);
-}
-
-/**
- * ntfs_map_page - map a page into accessible memory, reading it if necessary
- * @mapping:	address space for which to obtain the page
- * @index:	index into the page cache for @mapping of the page to map
- *
- * Read a page from the page cache of the address space @mapping at position
- * @index, where @index is in units of PAGE_CACHE_SIZE, and not in bytes.
- *
- * If the page is not in memory it is loaded from disk first using the readpage
- * method defined in the address space operations of @mapping and the page is
- * added to the page cache of @mapping in the process.
- *
- * If the page belongs to an mst protected attribute and it is marked as such
- * in its ntfs inode (NInoMstProtected()) the mst fixups are applied but no
- * error checking is performed. This means the caller has to verify whether
- * the ntfs record(s) contained in the page are valid or not using one of the
- * ntfs_is__record{,p}() macros, where  is the record type you are
- * expecting to see. (For details of the macros, see fs/ntfs/layout.h.)
- *
- * If the page is in high memory it is mapped into memory directly addressible
- * by the kernel.
- *
- * Finally the page count is incremented, thus pinning the page into place.
- *
- * The above means that page_address(page) can be used on all pages obtained
- * with ntfs_map_page() to get the kernel virtual address of the page.
- *
- * When finished with the page, the caller has to call ntfs_unmap_page() to
- * unpin, unmap and release the page.
- *
- * Note this does not grant exclusive access. If such is desired, the caller
- * must provide it independently of the ntfs_{un}map_page() calls by using
- * a {rw_}semaphore or other means of serialization. A spin lock cannot be
- * used as ntfs_map_page() can block.
- *
- * The unlocked and uptodate page is returned on success or an encoded error
- * on failure. Caller has to test for error using the IS_ERR() macro on the
- * return value. If that evaluates to 'true', the negative error code can be
- * obtained using PTR_ERR() on the return value of ntfs_map_page().
- */
-static inline struct page *ntfs_map_page(struct address_space *mapping,
-		unsigned long index)
-{
-	struct page *page = read_mapping_page(mapping, index, NULL);
-
-	if (!IS_ERR(page)) {
-		kmap(page);
-		if (!PageError(page))
-			return page;
-		ntfs_unmap_page(page);
-		return ERR_PTR(-EIO);
-	}
-	return page;
-}
-
 #ifdef NTFS_RW
 
 extern void mark_ntfs_record_dirty(struct page *page, const unsigned int ofs);
diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/ntfs/bitmap.c linux-2.6.21-rc5-mm4-test/fs/ntfs/bitmap.c
--- linux-2.6.21-rc5-mm4/fs/ntfs/bitmap.c	2006-11-29 13:57:37.0 -0800
+++ linux-2.6.21-rc5-mm4-test/fs/ntfs/bitmap.c	2007-04-06 12:40:53.0 -0700
@@ -72,7 +72,7 @@ int __ntfs_bitmap_set_bits_in_run(struct
 
 	/* Get the page containing the first bit (@start_bit). */
 	mapping = vi->i_mapping;
-	page = ntfs_map_page(mapping, index);
+	page = read_kmap_page(mapping, index);
 	if (IS_ERR(page)) {
 		if (!is_rollback)
 			ntfs_error(vi->i_sb, "Failed to map first page (error "
@@ -123,8 +123,8 @@ int __ntfs_bitmap_set_bits_in_run(struct
 		/* Update @index and get the next page. */
 		flush_dcache_page(page);
 		set_page_dirty(page);
-		ntfs_unmap_page(page);
-		page = ntfs_map_page(mapping, ++index);
+		put_kmapped_page(page);
+		page = read_kmap_page(mapping, ++index);
 		if (IS_ERR(page))
 			goto rollback;
 		kaddr = page_address(page);
@@ -159,7 +159,7 @@ done:
 	/* We are done. Unmap the page and return success. */
 	flush_dcache_page(page);
 	set_page_dirty(page);
-	ntfs_unmap_page(page);
+	put_kmapped_page(page);
 	ntfs_debug("Done.");
 	return 0;
 rollback:
diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/ntfs/dir.c linux
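The ntfs conversion swaps a filesystem-private map/unmap pair for the generic read_kmap_page()/put_kmapped_page() calls. As a rough userspace illustration of the invariant those calls have to keep, the sketch below counts references and kmaps on a hypothetical struct mock_page. mock_read_kmap_page() and mock_put_kmapped_page() are assumptions modelled on the removed ntfs_map_page()/ntfs_unmap_page() bodies (read and kmap; on release, kunmap plus page_cache_release), and failure is signalled with NULL rather than an ERR_PTR value to keep the example short.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace model: the counters stand in for the page
 * reference count and the kmap() count. */
struct mock_page {
	int refcount;	/* page_cache_get()/page_cache_release() */
	int kmapcount;	/* kmap()/kunmap() */
	int io_error;	/* nonzero: the read fails */
	char data[64];
};

/* Sketch of read_kmap_page(): read, pin and map, or fail without
 * leaving a reference or mapping behind. */
static struct mock_page *mock_read_kmap_page(struct mock_page *p)
{
	if (p->io_error)
		return NULL;	/* nothing pinned, nothing mapped */
	p->refcount++;
	p->kmapcount++;
	return p;
}

/* Sketch of put_kmapped_page(): kunmap, then drop the reference,
 * the same two steps the old ntfs_unmap_page() performed. */
static void mock_put_kmapped_page(struct mock_page *p)
{
	assert(p->kmapcount > 0 && p->refcount > 0);
	p->kmapcount--;
	p->refcount--;
}

/* Walk n pages roughly the way __ntfs_bitmap_set_bits_in_run() does:
 * map one, modify it, release it, then map the next. */
static int mock_walk(struct mock_page *pages, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct mock_page *page = mock_read_kmap_page(&pages[i]);

		if (!page)
			return -EIO;
		page->data[0] = 1;	/* "set bits" in the mapped page */
		mock_put_kmapped_page(page);
	}
	return 0;
}
```

After a full walk, or one that stops at a failing page, every counter in the model is back to zero, which is the property each converted call site has to preserve.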
[PATCH 14/17] reiserfs: convert reiserfs_get_page to read_kmap_page
Replace reiserfs_get_page() and reiserfs_put_page() using the new read_kmap_page() and put_kmapped_page() calls and their locking variants. Also, propagate the GFP_NOFS deadlock comment to the call sites.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/reiserfs/xattr.c linux-2.6.21-rc5-mm4-test/fs/reiserfs/xattr.c
--- linux-2.6.21-rc5-mm4/fs/reiserfs/xattr.c	2007-04-05 17:14:25.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/reiserfs/xattr.c	2007-04-06 14:41:34.0 -0700
@@ -438,33 +438,6 @@ int xattr_readdir(struct file *file, fil
 	return res;
 }
 
-/* Internal operations on file data */
-static inline void reiserfs_put_page(struct page *page)
-{
-	kunmap(page);
-	page_cache_release(page);
-}
-
-static struct page *reiserfs_get_page(struct inode *dir, unsigned long n)
-{
-	struct address_space *mapping = dir->i_mapping;
-	struct page *page;
-	/* We can deadlock if we try to free dentries,
-	   and an unlink/rmdir has just occured - GFP_NOFS avoids this */
-	mapping_set_gfp_mask(mapping, GFP_NOFS);
-	page = read_mapping_page(mapping, n, NULL);
-	if (!IS_ERR(page)) {
-		kmap(page);
-		if (PageError(page))
-			goto fail;
-	}
-	return page;
-
-      fail:
-	reiserfs_put_page(page);
-	return ERR_PTR(-EIO);
-}
-
 static inline __u32 xattr_hash(const char *msg, int len)
 {
 	return csum_partial(msg, len, 0);
@@ -537,13 +510,15 @@ reiserfs_xattr_set(struct inode *inode,
 		else
 			chunk = buffer_size - buffer_pos;
 
-		page = reiserfs_get_page(xinode, file_pos >> PAGE_CACHE_SHIFT);
+		/* We can deadlock if we try to free dentries,
+		   and an unlink/rmdir has just occured - GFP_NOFS avoids this */
+		mapping_set_gfp_mask(mapping, GFP_NOFS);
+		page = __read_kmap_page(mapping, file_pos >> PAGE_CACHE_SHIFT);
 		if (IS_ERR(page)) {
 			err = PTR_ERR(page);
 			goto out_filp;
 		}
 
-		lock_page(page);
 		data = page_address(page);
 
 		if (file_pos == 0) {
@@ -566,8 +541,7 @@ reiserfs_xattr_set(struct inode *inode,
 					 page_offset + chunk + skip);
 		}
 
-		unlock_page(page);
-		reiserfs_put_page(page);
+		put_locked_page(page);
 		buffer_pos += chunk;
 		file_pos += chunk;
 		skip = 0;
@@ -646,13 +620,15 @@ reiserfs_xattr_get(const struct inode *i
 		else
 			chunk = isize - file_pos;
 
-		page = reiserfs_get_page(xinode, file_pos >> PAGE_CACHE_SHIFT);
+		/* We can deadlock if we try to free dentries,
+		   and an unlink/rmdir has just occured - GFP_NOFS avoids this */
+		mapping_set_gfp_mask(xinode->i_mapping, GFP_NOFS);
+		page = __read_kmap_page(xinode->i_mapping, file_pos >> PAGE_CACHE_SHIFT);
 		if (IS_ERR(page)) {
 			err = PTR_ERR(page);
 			goto out_dput;
 		}
 
-		lock_page(page);
 		data = page_address(page);
 		if (file_pos == 0) {
 			struct reiserfs_xattr_header *rxh =
@@ -661,8 +637,7 @@ reiserfs_xattr_get(const struct inode *i
 			chunk -= skip;
 			/* Magic doesn't match up.. */
 			if (rxh->h_magic != cpu_to_le32(REISERFS_XATTR_MAGIC)) {
-				unlock_page(page);
-				reiserfs_put_page(page);
+				put_locked_page(page);
 				reiserfs_warning(inode->i_sb,
 						 "Invalid magic for xattr (%s) "
 						 "associated with %k", name,
@@ -673,8 +648,7 @@ reiserfs_xattr_get(const struct inode *i
 			hash = le32_to_cpu(rxh->h_hash);
 		}
 		memcpy(buffer + buffer_pos, data + skip, chunk);
-		unlock_page(page);
-		reiserfs_put_page(page);
+		put_locked_page(page);
 		file_pos += chunk;
 		buffer_pos += chunk;
 		skip = 0;
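The reiserfs hunks replace the sequence "get page, lock_page(), ..., unlock_page(), put page" with the locking variants. Below is a minimal userspace sketch of that pairing, assuming (from how the call sites use them) that __read_kmap_page() returns a page that is referenced, kmapped and locked, and that put_locked_page() undoes all three. The mock_* names are hypothetical, the loop only mimics the shape of reiserfs_xattr_get() including its early exit on bad magic, and -EIO there is a stand-in error code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for struct page: three pieces of state that
 * the get/put helpers must keep balanced. */
struct mock_page {
	int refcount;	/* page_cache_get()/page_cache_release() */
	int kmapcount;	/* kmap()/kunmap() */
	int locked;	/* lock_page()/unlock_page() */
};

/* Sketch of the locking variant: the page comes back pinned, mapped
 * and already locked, so callers no longer follow it with lock_page(). */
static struct mock_page *mock_read_kmap_page_locked(struct mock_page *p)
{
	assert(!p->locked);	/* a real lock_page() would sleep here */
	p->refcount++;
	p->kmapcount++;
	p->locked = 1;
	return p;
}

/* Sketch of put_locked_page(): one call replaces the old
 * unlock_page() + kunmap() + page_cache_release() sequence. */
static void mock_put_locked_page(struct mock_page *p)
{
	assert(p->locked && p->kmapcount > 0 && p->refcount > 0);
	p->locked = 0;
	p->kmapcount--;
	p->refcount--;
}

/* Every exit path, including the early one, drops the page with the
 * locked put. bad_magic_at < 0 means no page is rejected. */
static int mock_read_chunks(struct mock_page *pages, int n, int bad_magic_at)
{
	for (int i = 0; i < n; i++) {
		struct mock_page *page = mock_read_kmap_page_locked(&pages[i]);

		if (i == bad_magic_at) {
			mock_put_locked_page(page);
			return -EIO;	/* stand-in error code */
		}
		/* ... copy data out of the mapped page here ... */
		mock_put_locked_page(page);
	}
	return 0;
}
```

Folding the unlock into the put means an error path cannot release the reference while forgetting the lock, which is the mistake the old two-call sequence made easy.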
[PATCH 16/17] ufs: convert ufs_get_page to read_kmap_page
Replace ufs_get_page()/ufs_get_locked_page() and ufs_put_page()/ufs_put_locked_page() using the new read_kmap_page() and put_kmapped_page() calls and their locking variants. Also, change the ufs_check_page() call to return the page's error status, and update the call sites accordingly.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/ufs/balloc.c linux-2.6.21-rc5-mm4-test/fs/ufs/balloc.c
--- linux-2.6.21-rc5-mm4/fs/ufs/balloc.c	2007-04-05 17:13:29.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/ufs/balloc.c	2007-04-06 12:46:02.0 -0700
@@ -272,7 +272,7 @@ static void ufs_change_blocknr(struct in
 		index = i >> (PAGE_CACHE_SHIFT - inode->i_blkbits);
 
 		if (likely(cur_index != index)) {
-			page = ufs_get_locked_page(mapping, index);
+			page = __read_mapping_page(mapping, index, NULL);
 			if (!page)/* it was truncated */
 				continue;
 			if (IS_ERR(page)) {/* or EIO */
@@ -325,8 +325,10 @@ static void ufs_change_blocknr(struct in
 			bh = bh->b_this_page;
 		} while (bh != head);
 
-		if (likely(cur_index != index))
-			ufs_put_locked_page(page);
+		if (likely(cur_index != index)) {
+			unlock_page(page);
+			page_cache_release(page);
+		}
 	}
 	UFSD("EXIT\n");
 }
diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/ufs/truncate.c linux-2.6.21-rc5-mm4-test/fs/ufs/truncate.c
--- linux-2.6.21-rc5-mm4/fs/ufs/truncate.c	2007-04-05 17:13:29.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/ufs/truncate.c	2007-04-06 12:46:14.0 -0700
@@ -395,8 +395,9 @@ static int ufs_alloc_lastblock(struct in
 
 	lastfrag--;
 
-	lastpage = ufs_get_locked_page(mapping, lastfrag >>
-				       (PAGE_CACHE_SHIFT - inode->i_blkbits));
+	lastpage = __read_mapping_page(mapping, lastfrag >>
+				       (PAGE_CACHE_SHIFT - inode->i_blkbits),
+				       NULL);
 	if (IS_ERR(lastpage)) {
 		err = -EIO;
 		goto out;
@@ -441,7 +442,8 @@ static int ufs_alloc_lastblock(struct in
 		}
 	}
 out_unlock:
-	ufs_put_locked_page(lastpage);
+	unlock_page(lastpage);
+	page_cache_release(lastpage);
 out:
 	return err;
 }
diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/ufs/util.c linux-2.6.21-rc5-mm4-test/fs/ufs/util.c
--- linux-2.6.21-rc5-mm4/fs/ufs/util.c	2007-04-05 17:14:25.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/ufs/util.c	2007-04-06 12:40:53.0 -0700
@@ -232,55 +232,3 @@ ufs_set_inode_dev(struct super_block *sb
 		ufsi->i_u1.i_data[0] = cpu_to_fs32(sb, fs32);
 }
 
-/**
- * ufs_get_locked_page() - locate, pin and lock a pagecache page, if not exist
- * read it from disk.
- * @mapping: the address_space to search
- * @index: the page index
- *
- * Locates the desired pagecache page, if not exist we'll read it,
- * locks it, increments its reference
- * count and returns its address.
- *
- */
-
-struct page *ufs_get_locked_page(struct address_space *mapping,
-				 pgoff_t index)
-{
-	struct page *page;
-
-	page = find_lock_page(mapping, index);
-	if (!page) {
-		page = read_mapping_page(mapping, index, NULL);
-
-		if (IS_ERR(page)) {
-			printk(KERN_ERR "ufs_change_blocknr: "
-			       "read_mapping_page error: ino %lu, index: %lu\n",
-			       mapping->host->i_ino, index);
-			goto out;
-		}
-
-		lock_page(page);
-
-		if (unlikely(page->mapping == NULL)) {
-			/* Truncate got there first */
-			unlock_page(page);
-			page_cache_release(page);
-			page = NULL;
-			goto out;
-		}
-
-		if (!PageUptodate(page) || PageError(page)) {
-			unlock_page(page);
-			page_cache_release(page);
-
-			printk(KERN_ERR "ufs_change_blocknr: "
-			       "can not read page: ino %lu, index: %lu\n",
-			       mapping->host->i_ino, index);
-
-			page = ERR_PTR(-EIO);
-		}
-	}
-out:
-	return page;
-}
diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/ufs/util.h linux-2.6.21-rc5-mm4-test/fs/ufs/util.h
--- linux-2.6.21-rc5-mm4/fs/ufs/util.h	2007-04-05 17:13:29.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/ufs/util.h	2007-04-06 12:46:36.0 -0700
@@ -251,16 +251,6 @@ extern void _ubh_ubhcpymem_(struct ufs_s
 #define ubh_memcpyubh(ubh,mem,size) _ubh_
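In ufs_change_blocknr() a new page is fetched only when the page-cache index changes (cur_index != index), so several consecutive fragments are handled under one locked page. The index arithmetic can be sketched as below; the 4 KiB page size (a PAGE_CACHE_SHIFT of 12) and the helper names are assumptions for illustration only.

```c
#include <assert.h>

/* Assumption for illustration: 4 KiB pages, i.e. PAGE_CACHE_SHIFT == 12. */
#define MOCK_PAGE_CACHE_SHIFT 12

/* Page-cache index holding fragment i when fragments are
 * (1 << blkbits) bytes: the
 * index = i >> (PAGE_CACHE_SHIFT - inode->i_blkbits) step. */
static unsigned long frag_to_page_index(unsigned long i, unsigned int blkbits)
{
	assert(blkbits <= MOCK_PAGE_CACHE_SHIFT);
	return i >> (MOCK_PAGE_CACHE_SHIFT - blkbits);
}

/* Number of fragments that share one page-cache page. */
static unsigned long frags_per_page(unsigned int blkbits)
{
	assert(blkbits <= MOCK_PAGE_CACHE_SHIFT);
	return 1UL << (MOCK_PAGE_CACHE_SHIFT - blkbits);
}
```

With 1 KiB fragments (i_blkbits of 10) four fragments share a page, so fragments 4 to 7 all map to page 1 and only the first of them triggers a page fetch.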
[PATCH 17/17] vxfs: convert vxfs_get_page to read_kmap_page
Replace vxfs_get_page() with the new read_kmap_page().

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/freevxfs/vxfs_extern.h linux-2.6.21-rc5-mm4-test/fs/freevxfs/vxfs_extern.h
--- linux-2.6.21-rc5-mm4/fs/freevxfs/vxfs_extern.h	2007-04-05 17:13:29.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/freevxfs/vxfs_extern.h	2007-04-06 01:59:19.0 -0700
@@ -69,7 +69,6 @@ extern const struct file_operations vxfs
 extern int			vxfs_read_olt(struct super_block *, u_long);
 
 /* vxfs_subr.c */
-extern struct page *		vxfs_get_page(struct address_space *, u_long);
 extern void			vxfs_put_page(struct page *);
 extern struct buffer_head *	vxfs_bread(struct inode *, int);
 
diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/freevxfs/vxfs_inode.c linux-2.6.21-rc5-mm4-test/fs/freevxfs/vxfs_inode.c
--- linux-2.6.21-rc5-mm4/fs/freevxfs/vxfs_inode.c	2007-04-05 17:14:25.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/freevxfs/vxfs_inode.c	2007-04-06 01:59:19.0 -0700
@@ -138,7 +138,7 @@ __vxfs_iget(ino_t ino, struct inode *ili
 	u_long				offset;
 
 	offset = (ino % (PAGE_SIZE / VXFS_ISIZE)) * VXFS_ISIZE;
-	pp = vxfs_get_page(ilistp->i_mapping, ino * VXFS_ISIZE / PAGE_SIZE);
+	pp = read_kmap_page(ilistp->i_mapping, ino * VXFS_ISIZE / PAGE_SIZE);
 
 	if (!IS_ERR(pp)) {
 		struct vxfs_inode_info	*vip;
diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/freevxfs/vxfs_lookup.c linux-2.6.21-rc5-mm4-test/fs/freevxfs/vxfs_lookup.c
--- linux-2.6.21-rc5-mm4/fs/freevxfs/vxfs_lookup.c	2007-04-05 17:13:29.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/freevxfs/vxfs_lookup.c	2007-04-06 01:59:19.0 -0700
@@ -125,7 +125,7 @@ vxfs_find_entry(struct inode *ip, struct
 		caddr_t			kaddr;
 		struct page		*pp;
 
-		pp = vxfs_get_page(ip->i_mapping, page);
+		pp = read_kmap_page(ip->i_mapping, page);
 		if (IS_ERR(pp))
 			continue;
 		kaddr = (caddr_t)page_address(pp);
@@ -280,7 +280,7 @@ vxfs_readdir(struct file *fp, void *retp
 		caddr_t			kaddr;
 		struct page		*pp;
 
-		pp = vxfs_get_page(ip->i_mapping, page);
+		pp = read_kmap_page(ip->i_mapping, page);
 		if (IS_ERR(pp))
 			continue;
 		kaddr = (caddr_t)page_address(pp);
diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/freevxfs/vxfs_subr.c linux-2.6.21-rc5-mm4-test/fs/freevxfs/vxfs_subr.c
--- linux-2.6.21-rc5-mm4/fs/freevxfs/vxfs_subr.c	2007-04-05 17:14:25.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/freevxfs/vxfs_subr.c	2007-04-06 01:59:19.0 -0700
@@ -56,39 +56,6 @@ vxfs_put_page(struct page *pp)
 }
 
 /**
- * vxfs_get_page - read a page into memory.
- * @ip:		inode to read from
- * @n:		page number
- *
- * Description:
- *   vxfs_get_page reads the @n th page of @ip into the pagecache.
- *
- * Returns:
- *   The wanted page on success, else a NULL pointer.
- */
-struct page *
-vxfs_get_page(struct address_space *mapping, u_long n)
-{
-	struct page *			pp;
-
-	pp = read_mapping_page(mapping, n, NULL);
-
-	if (!IS_ERR(pp)) {
-		kmap(pp);
-		/** if (!PageChecked(pp)) **/
-		/** vxfs_check_page(pp); **/
-		if (PageError(pp))
-			goto fail;
-	}
-
-	return (pp);
-
-fail:
-	vxfs_put_page(pp);
-	return ERR_PTR(-EIO);
-}
-
-/**
  * vxfs_bread - read buffer for a give inode,block tuple
  * @ip:		inode
  * @block:	logical block
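__vxfs_iget() locates an on-disk inode in the inode list file with two pieces of arithmetic: which page to pass to read_kmap_page(), and the byte offset of the inode within that page. A small sketch, assuming a 4 KiB PAGE_SIZE and a 256-byte VXFS_ISIZE (both values are assumptions here; check vxfs.h for the real inode size). The helper names are hypothetical.

```c
#include <assert.h>

#define MOCK_PAGE_SIZE	 4096UL	/* assumption: 4 KiB pages */
#define MOCK_VXFS_ISIZE	 256UL	/* assumption: on-disk inode size */

/* Page index passed to read_kmap_page(): ino * VXFS_ISIZE / PAGE_SIZE. */
static unsigned long vxfs_ino_page(unsigned long ino)
{
	return ino * MOCK_VXFS_ISIZE / MOCK_PAGE_SIZE;
}

/* Byte offset inside that page: (ino % inodes_per_page) * VXFS_ISIZE. */
static unsigned long vxfs_ino_offset(unsigned long ino)
{
	return (ino % (MOCK_PAGE_SIZE / MOCK_VXFS_ISIZE)) * MOCK_VXFS_ISIZE;
}

/* The two values recombine into the inode's byte position in the list,
 * provided PAGE_SIZE is a multiple of VXFS_ISIZE. */
static unsigned long vxfs_ino_pos(unsigned long ino)
{
	unsigned long pos = vxfs_ino_page(ino) * MOCK_PAGE_SIZE + vxfs_ino_offset(ino);

	assert(pos == ino * MOCK_VXFS_ISIZE);
	return pos;
}
```

Under these assumptions 16 inodes fit in a page, so inode 37 is read from page 2 at byte offset 1280.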
[PATCH 13/17] reiser4: remove redundant read_mapping_page error checks
read_mapping_page() is now fully synchronous, so there's no need to wait for the page lock or check for I/O errors.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/reiser4/plugin/file/tail_conversion.c linux-2.6.21-rc6-mm1-test/fs/reiser4/plugin/file/tail_conversion.c
--- linux-2.6.21-rc6-mm1/fs/reiser4/plugin/file/tail_conversion.c	2007-04-09 17:24:03.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/reiser4/plugin/file/tail_conversion.c	2007-04-10 21:33:47.0 -0700
@@ -608,14 +608,6 @@ int extent2tail(unix_file_info_t *uf_inf
 			break;
 		}
 
-		wait_on_page_locked(page);
-
-		if (!PageUptodate(page)) {
-			page_cache_release(page);
-			result = RETERR(-EIO);
-			break;
-		}
-
 		/* cut part of file we have read */
 		start_byte = (__u64) (i << PAGE_CACHE_SHIFT);
 		set_key_offset(&from, start_byte);
diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/reiser4/plugin/item/extent_file_ops.c linux-2.6.21-rc6-mm1-test/fs/reiser4/plugin/item/extent_file_ops.c
--- linux-2.6.21-rc6-mm1/fs/reiser4/plugin/item/extent_file_ops.c	2007-04-10 19:41:14.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/reiser4/plugin/item/extent_file_ops.c	2007-04-10 21:38:41.0 -0700
@@ -1220,15 +1220,8 @@ int reiser4_read_extent(struct file *fil
 		page = read_mapping_page(mapping, cur_page, file);
 		if (IS_ERR(page))
 			return PTR_ERR(page);
-		lock_page(page);
-		if (!PageUptodate(page)) {
-			unlock_page(page);
-			page_cache_release(page);
-			warning("jmacd-97178", "extent_read: page is not up to date");
-			return RETERR(-EIO);
-		}
+
 		mark_page_accessed(page);
-		unlock_page(page);
 
 		/* If users can be writing to this page using arbitrary virtual
 		   addresses, take care about potential aliasing before reading
[PATCH 12/17] partition: remove redundant read_mapping_page error checks
Remove unneeded PageError checking in read_dev_sector(), and clean up the code a bit.

Can anyone point out why it's OK to use page_address() here on a page which has not been kmapped? If it's not OK, then a good number of callers need to be fixed.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/partitions/check.c linux-2.6.21-rc6-mm1-test/fs/partitions/check.c
--- linux-2.6.21-rc6-mm1/fs/partitions/check.c	2007-04-09 17:24:03.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/partitions/check.c	2007-04-10 21:59:01.0 -0700
@@ -568,16 +568,12 @@ unsigned char *read_dev_sector(struct bl
 
 	page = read_mapping_page(mapping, (pgoff_t)(n >> (PAGE_CACHE_SHIFT-9)),
 				 NULL);
-	if (!IS_ERR(page)) {
-		if (PageError(page))
-			goto fail;
-		p->v = page;
-		return (unsigned char *)page_address(page) + ((n & ((1 << (PAGE_CACHE_SHIFT - 9)) - 1)) << 9);
-fail:
-		page_cache_release(page);
+	if (IS_ERR(page)) {
+		p->v = NULL;
+		return NULL;
 	}
-	p->v = NULL;
-	return NULL;
+	p->v = page;
+	return (unsigned char *)page_address(page) + ((n & ((1 << (PAGE_CACHE_SHIFT - 9)) - 1)) << 9);
 }
 
 EXPORT_SYMBOL(read_dev_sector);
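read_dev_sector() turns a 512-byte sector number into a page-cache index plus a byte offset inside that page, using the shifts visible in the diff. The sketch reproduces that arithmetic in userspace, assuming 4 KiB pages (a PAGE_CACHE_SHIFT of 12, so eight sectors per page); the function names are illustrative, not kernel API.

```c
#include <assert.h>

/* Assumption for illustration: 4 KiB pages, i.e. PAGE_CACHE_SHIFT == 12,
 * so one page holds 1 << (12 - 9) = 8 sectors of 512 bytes. */
#define MOCK_PAGE_CACHE_SHIFT 12

/* Page-cache index of sector n: n >> (PAGE_CACHE_SHIFT - 9). */
static unsigned long sector_page_index(unsigned long long n)
{
	return (unsigned long)(n >> (MOCK_PAGE_CACHE_SHIFT - 9));
}

/* Byte offset of sector n inside that page: keep the low bits of n
 * (the sector's slot within the page), then multiply by 512 (<< 9). */
static unsigned int sector_page_offset(unsigned long long n)
{
	unsigned long long mask = (1ULL << (MOCK_PAGE_CACHE_SHIFT - 9)) - 1;
	unsigned int off = (unsigned int)((n & mask) << 9);

	assert(off < (1u << MOCK_PAGE_CACHE_SHIFT));
	return off;
}
```

For example, sector 13 lives in page 1 at byte offset 2560, because 13 & 7 = 5 and 5 * 512 = 2560.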
[PATCH 15/17] sysv: convert dir_get_page to read_kmap_page
Replace sysv dir_get_page() with the new read_kmap_page().

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/sysv/dir.c linux-2.6.21-rc5-mm4-test/fs/sysv/dir.c
--- linux-2.6.21-rc5-mm4/fs/sysv/dir.c	2007-04-05 17:14:25.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/sysv/dir.c	2007-04-06 01:59:19.0 -0700
@@ -50,15 +50,6 @@ static int dir_commit_chunk(struct page
 	return err;
 }
 
-static struct page * dir_get_page(struct inode *dir, unsigned long n)
-{
-	struct address_space *mapping = dir->i_mapping;
-	struct page *page = read_mapping_page(mapping, n, NULL);
-	if (!IS_ERR(page))
-		kmap(page);
-	return page;
-}
-
 static int sysv_readdir(struct file * filp, void * dirent, filldir_t filldir)
 {
 	unsigned long pos = filp->f_pos;
@@ -77,7 +68,7 @@ static int sysv_readdir(struct file * fi
 	for ( ; n < npages; n++, offset = 0) {
 		char *kaddr, *limit;
 		struct sysv_dir_entry *de;
-		struct page *page = dir_get_page(inode, n);
+		struct page *page = read_kmap_page(inode->i_mapping, n);
 
 		if (IS_ERR(page))
 			continue;
@@ -149,7 +140,7 @@ struct sysv_dir_entry *sysv_find_entry(s
 	do {
 		char *kaddr;
 
-		page = dir_get_page(dir, n);
+		page = read_kmap_page(dir->i_mapping, n);
 		if (!IS_ERR(page)) {
 			kaddr = (char*)page_address(page);
 			de = (struct sysv_dir_entry *) kaddr;
@@ -191,7 +182,7 @@ int sysv_add_link(struct dentry *dentry,
 
 	/* We take care of directory expansion in the same loop */
 	for (n = 0; n <= npages; n++) {
-		page = dir_get_page(dir, n);
+		page = read_kmap_page(dir->i_mapping, n);
 		err = PTR_ERR(page);
 		if (IS_ERR(page))
 			goto out;
@@ -299,7 +290,7 @@ int sysv_empty_dir(struct inode * inode)
 	for (i = 0; i < npages; i++) {
 		char *kaddr;
 		struct sysv_dir_entry * de;
-		page = dir_get_page(inode, i);
+		page = read_kmap_page(inode->i_mapping, i);
 
 		if (IS_ERR(page))
 			continue;
@@ -353,7 +344,7 @@ void sysv_set_link(struct sysv_dir_entry
 
 struct sysv_dir_entry * sysv_dotdot (struct inode *dir, struct page **p)
 {
-	struct page *page = dir_get_page(dir, 0);
+	struct page *page = read_kmap_page(dir->i_mapping, 0);
 	struct sysv_dir_entry *de = NULL;
 
 	if (!IS_ERR(page)) {
[PATCH 9/17] minix: convert dir_get_page to read_kmap_page
Replace minix dir_get_page() and dir_put_page() using the new read_kmap_page() and put_kmapped_page()/put_locked_page() calls. Also, use __read_kmap_page() instead of re-taking the page lock.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/minix/dir.c linux-2.6.21-rc5-mm4-test/fs/minix/dir.c
--- linux-2.6.21-rc5-mm4/fs/minix/dir.c	2007-04-05 17:14:25.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/minix/dir.c	2007-04-06 02:31:55.0 -0700
@@ -23,12 +23,6 @@ const struct file_operations minix_dir_o
 	.fsync		= minix_sync_file,
 };
 
-static inline void dir_put_page(struct page *page)
-{
-	kunmap(page);
-	page_cache_release(page);
-}
-
 /*
  * Return the offset into page `page_nr' of the last valid
  * byte in that page, plus one.
@@ -60,22 +54,6 @@ static int dir_commit_chunk(struct page
 	return err;
 }
 
-static struct page * dir_get_page(struct inode *dir, unsigned long n)
-{
-	struct address_space *mapping = dir->i_mapping;
-	struct page *page = read_mapping_page(mapping, n, NULL);
-	if (!IS_ERR(page)) {
-		kmap(page);
-		if (!PageUptodate(page))
-			goto fail;
-	}
-	return page;
-
-fail:
-	dir_put_page(page);
-	return ERR_PTR(-EIO);
-}
-
 static inline void *minix_next_entry(void *de, struct minix_sb_info *sbi)
 {
 	return (void*)((char*)de + sbi->s_dirsize);
@@ -102,7 +80,7 @@ static int minix_readdir(struct file * f
 
 	for ( ; n < npages; n++, offset = 0) {
 		char *p, *kaddr, *limit;
-		struct page *page = dir_get_page(inode, n);
+		struct page *page = read_kmap_page(inode->i_mapping, n);
 
 		if (IS_ERR(page))
 			continue;
@@ -128,12 +106,12 @@ static int minix_readdir(struct file * f
 					(n << PAGE_CACHE_SHIFT) | offset,
 					inumber, DT_UNKNOWN);
 				if (over) {
-					dir_put_page(page);
+					put_kmapped_page(page);
 					goto done;
 				}
 			}
 		}
-		dir_put_page(page);
+		put_kmapped_page(page);
 	}
 
 done:
@@ -177,7 +155,7 @@ minix_dirent *minix_find_entry(struct de
 	for (n = 0; n < npages; n++) {
 		char *kaddr, *limit;
 
-		page = dir_get_page(dir, n);
+		page = read_kmap_page(dir->i_mapping, n);
 		if (IS_ERR(page))
 			continue;
 
@@ -198,7 +176,7 @@ minix_dirent *minix_find_entry(struct de
 			if (namecompare(namelen, sbi->s_namelen, name, namx))
 				goto found;
 		}
-		dir_put_page(page);
+		put_kmapped_page(page);
 	}
 	return NULL;
 
@@ -233,11 +211,10 @@ int minix_add_link(struct dentry *dentry
 	for (n = 0; n <= npages; n++) {
 		char *limit, *dir_end;
 
-		page = dir_get_page(dir, n);
+		page = __read_kmap_page(dir->i_mapping, n);
 		err = PTR_ERR(page);
 		if (IS_ERR(page))
 			goto out;
-		lock_page(page);
 		kaddr = (char*)page_address(page);
 		dir_end = kaddr + minix_last_byte(dir, n);
 		limit = kaddr + PAGE_CACHE_SIZE - sbi->s_dirsize;
@@ -265,8 +242,7 @@ int minix_add_link(struct dentry *dentry
 			if (namecompare(namelen, sbi->s_namelen, name, namx))
 				goto out_unlock;
 		}
-		unlock_page(page);
-		dir_put_page(page);
+		put_locked_page(page);
 	}
 	BUG();
 	return -EINVAL;
@@ -288,13 +264,12 @@ got_it:
 	err = dir_commit_chunk(page, from, to);
 	dir->i_mtime = dir->i_ctime = CURRENT_TIME_SEC;
 	mark_inode_dirty(dir);
-out_put:
-	dir_put_page(page);
+	put_kmapped_page(page);
 out:
 	return err;
 out_unlock:
-	unlock_page(page);
-	goto out_put;
+	put_locked_page(page);
+	return err;
 }
 
 int minix_delete_entry(struct minix_dir_entry *de, struct page *page)
@@ -314,7 +289,7 @@ int minix_delete_entry(struct minix_dir_
 	} else {
 		unlock_page(page);
 	}
-	dir_put_page(page);
+	put_kmapped_page(page);
 	inode->i_ctime = inode->i_mtime = CURRENT_TIME_SEC;
 	mark_inode_dirty(inode);
 	return err;
@@ -378,7 +353,7 @@ int minix_empty_dir(struct inode * inode
 	for (i = 0; i < npages; i++) {
 		char *p, *kaddr, *limit;
 
-		page = dir_get_page(inode, i);
+		page = read_kmap_page(inode->i_mapping, i);
 		if (IS_ERR(page)
[PATCH 10/17] mtd: convert page_read to read_kmap_page
Replace page_read() with read_kmap_page()/__read_kmap_page(). This probably fixes behaviour on highmem systems, since page_address() was being used without kmap(). Also eliminate the need to re-take the page lock during writes to the page.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/drivers/mtd/devices/block2mtd.c linux-2.6.21-rc5-mm4-test/drivers/mtd/devices/block2mtd.c
--- linux-2.6.21-rc5-mm4/drivers/mtd/devices/block2mtd.c	2007-04-05 17:14:24.0 -0700
+++ linux-2.6.21-rc5-mm4-test/drivers/mtd/devices/block2mtd.c	2007-04-06 01:59:19.0 -0700
@@ -39,12 +39,6 @@ struct block2mtd_dev {
 /* Static info about the MTD, used in cleanup_module */
 static LIST_HEAD(blkmtd_device_list);
 
-
-static struct page *page_read(struct address_space *mapping, int index)
-{
-	return read_mapping_page(mapping, index, NULL);
-}
-
 /* erase a specified part of the device */
 static int _block2mtd_erase(struct block2mtd_dev *dev, loff_t to, size_t len)
 {
@@ -56,23 +50,19 @@ static int _block2mtd_erase(struct block
 	u_long *max;
 
 	while (pages) {
-		page = page_read(mapping, index);
-		if (!page)
-			return -ENOMEM;
+		page = __read_kmap_page(mapping, index);
 		if (IS_ERR(page))
 			return PTR_ERR(page);
 		max = page_address(page) + PAGE_SIZE;
 		for (p=page_address(page); pblkdev->bd_inode->i_mapping, index);
-		if (!page)
-			return -ENOMEM;
+		page = read_kmap_page(dev->blkdev->bd_inode->i_mapping, index);
 		if (IS_ERR(page))
 			return PTR_ERR(page);
 		memcpy(buf, page_address(page) + offset, cpylen);
-		page_cache_release(page);
+		put_kmapped_page(page);
 		if (retlen)
 			*retlen += cpylen;
@@ -163,19 +151,15 @@ static int _block2mtd_write(struct block
 			cpylen = len;			// this page
 		len = len - cpylen;
 
-		page = page_read(mapping, index);
-		if (!page)
-			return -ENOMEM;
+		page = __read_kmap_page(mapping, index);
 		if (IS_ERR(page))
 			return PTR_ERR(page);
 
 		if (memcmp(page_address(page)+offset, buf, cpylen)) {
-			lock_page(page);
 			memcpy(page_address(page) + offset, buf, cpylen);
 			set_page_dirty(page);
-			unlock_page(page);
 		}
-		page_cache_release(page);
+		put_locked_page(page);
 
 		if (retlen)
 			*retlen += cpylen;
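_block2mtd_write() splits a write into per-page chunks and only copies, and dirties, a page when its contents actually differ (the memcmp() before the memcpy()). The sketch below models that loop over a flat in-memory buffer. The tiny 16-byte page size, the function name and the "pages dirtied" return value are assumptions for illustration, and page locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Tiny pages so the split is easy to see; real pages are larger. */
#define MOCK_PAGE_SIZE 16

/* Hypothetical model of the _block2mtd_write() loop: write len bytes
 * of buf at byte position to, one page-sized chunk at a time, copying
 * a chunk only when it differs. Returns how many pages were dirtied.
 * dev must hold at least to + len bytes. */
static int mock_block2mtd_write(unsigned char *dev, size_t to, size_t len,
				const unsigned char *buf, size_t *retlen)
{
	size_t offset = to % MOCK_PAGE_SIZE;
	size_t index = to / MOCK_PAGE_SIZE;
	int dirtied = 0;

	assert(dev != NULL && buf != NULL && retlen != NULL);
	*retlen = 0;
	while (len) {
		size_t cpylen;
		unsigned char *page = dev + index * MOCK_PAGE_SIZE;

		if (offset + len > MOCK_PAGE_SIZE)
			cpylen = MOCK_PAGE_SIZE - offset;	/* spills into next page */
		else
			cpylen = len;				/* fits in this page */
		len -= cpylen;

		if (memcmp(page + offset, buf, cpylen)) {
			memcpy(page + offset, buf, cpylen);
			dirtied++;	/* set_page_dirty() in the real code */
		}
		*retlen += cpylen;
		buf += cpylen;
		index++;
		offset = 0;
	}
	return dirtied;
}
```

A 20-byte write at position 10 touches two pages (6 bytes, then 14); repeating the identical write afterwards dirties no pages, since every memcmp() finds the data already in place.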
[PATCH 5/17] hfsplus: remove redundant read_mapping_page error check
Now that read_mapping_page() does error checking internally, there is no need to check PageError here.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/hfsplus/bnode.c linux-2.6.21-rc6-mm1-test/fs/hfsplus/bnode.c
--- linux-2.6.21-rc6-mm1/fs/hfsplus/bnode.c	2007-04-09 17:20:13.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/hfsplus/bnode.c	2007-04-10 21:28:45.0 -0700
@@ -442,10 +442,6 @@ static struct hfs_bnode *__hfs_bnode_cre
 		page = read_mapping_page(mapping, block, NULL);
 		if (IS_ERR(page))
 			goto fail;
-		if (PageError(page)) {
-			page_cache_release(page);
-			goto fail;
-		}
 		page_cache_release(page);
 		node->page[i] = page;
 	}
[PATCH 8/17] jfs: use locking read_mapping_page
Use the new locking variant of read_mapping_page to avoid doing extra work.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/jfs/jfs_metapage.c linux-2.6.21-rc6-mm1-test/fs/jfs/jfs_metapage.c
--- linux-2.6.21-rc6-mm1/fs/jfs/jfs_metapage.c	2007-04-09 17:23:48.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/jfs/jfs_metapage.c	2007-04-09 21:37:09.0 -0700
@@ -632,12 +632,11 @@ struct metapage *__get_metapage(struct i
 		}
 		SetPageUptodate(page);
 	} else {
-		page = read_mapping_page(mapping, page_index, NULL);
-		if (IS_ERR(page) || !PageUptodate(page)) {
+		page = __read_mapping_page(mapping, page_index, NULL);
+		if (IS_ERR(page)) {
 			jfs_err("read_mapping_page failed!");
 			return NULL;
 		}
-		lock_page(page);
 	}
 
 	mp = page_to_mp(page, page_offset);
[PATCH 7/17] jffs2: convert jffs2_gc_fetch_page to read_cache_page
Replace jffs2_gc_fetch_page() and jffs2_gc_release_page() with direct calls to read_cache_page(), kmap(), kunmap() and page_cache_release(), and update the call site accordingly. Explicit calls to kmap()/kunmap() make the code clearer.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/jffs2/fs.c linux-2.6.21-rc5-mm4-test/fs/jffs2/fs.c
--- linux-2.6.21-rc5-mm4/fs/jffs2/fs.c	2007-04-05 17:14:25.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/jffs2/fs.c	2007-04-06 01:59:19.0 -0700
@@ -621,33 +621,6 @@ struct jffs2_inode_info *jffs2_gc_fetch_
 	return JFFS2_INODE_INFO(inode);
 }
 
-unsigned char *jffs2_gc_fetch_page(struct jffs2_sb_info *c,
-				   struct jffs2_inode_info *f,
-				   unsigned long offset,
-				   unsigned long *priv)
-{
-	struct inode *inode = OFNI_EDONI_2SFFJ(f);
-	struct page *pg;
-
-	pg = read_cache_page(inode->i_mapping, offset >> PAGE_CACHE_SHIFT,
-			     (void *)jffs2_do_readpage_unlock, inode);
-	if (IS_ERR(pg))
-		return (void *)pg;
-
-	*priv = (unsigned long)pg;
-	return kmap(pg);
-}
-
-void jffs2_gc_release_page(struct jffs2_sb_info *c,
-			   unsigned char *ptr,
-			   unsigned long *priv)
-{
-	struct page *pg = (void *)*priv;
-
-	kunmap(pg);
-	page_cache_release(pg);
-}
-
 static int jffs2_flash_setup(struct jffs2_sb_info *c) {
 	int ret = 0;
 
diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/jffs2/gc.c linux-2.6.21-rc5-mm4-test/fs/jffs2/gc.c
--- linux-2.6.21-rc5-mm4/fs/jffs2/gc.c	2007-04-05 17:13:10.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/jffs2/gc.c	2007-04-06 01:59:19.0 -0700
@@ -1078,7 +1078,7 @@ static int jffs2_garbage_collect_dnode(s
 	uint32_t alloclen, offset, orig_end, orig_start;
 	int ret = 0;
 	unsigned char *comprbuf = NULL, *writebuf;
-	unsigned long pg;
+	struct page *page;
 	unsigned char *pg_ptr;
 
 	memset(&ri, 0, sizeof(ri));
@@ -1219,12 +1219,16 @@ static int jffs2_garbage_collect_dnode(s
 	 *    page OK. We'll actually write it out again in commit_write, which is a little
 	 *    suboptimal, but at least we're correct.
 	 */
-	pg_ptr = jffs2_gc_fetch_page(c, f, start, &pg);
+	page = read_cache_page(OFNI_EDONI_2SFFJ(f)->i_mapping,
+			       start >> PAGE_CACHE_SHIFT,
+			       (void *)jffs2_do_readpage_unlock,
+			       OFNI_EDONI_2SFFJ(f));
 
-	if (IS_ERR(pg_ptr)) {
-		printk(KERN_WARNING "read_cache_page() returned error: %ld\n", PTR_ERR(pg_ptr));
-		return PTR_ERR(pg_ptr);
+	if (IS_ERR(page)) {
+		printk(KERN_WARNING "read_cache_page() returned error: %ld\n", PTR_ERR(page));
+		return PTR_ERR(page);
 	}
+	pg_ptr = kmap(page);
 
 	offset = start;
 	while(offset < orig_end) {
@@ -1287,6 +1291,7 @@ static int jffs2_garbage_collect_dnode(s
 		}
 	}
 
-	jffs2_gc_release_page(c, pg_ptr, &pg);
+	kunmap(page);
+	page_cache_release(page);
 	return ret;
 }
[PATCH 3/17] afs: convert afs_dir_get_page to read_kmap_page
Replace afs_dir_get_page() and afs_dir_put_page() with the new read_kmap_page() and put_kmapped_page() calls, and eliminate unnecessary PageError checks. Also, change afs_dir_check_page() to return the page's error status, and update the call site accordingly.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/afs/dir.c linux-2.6.21-rc5-mm4-test/fs/afs/dir.c
--- linux-2.6.21-rc5-mm4/fs/afs/dir.c	2007-04-06 12:27:03.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/afs/dir.c	2007-04-06 14:30:22.0 -0700
@@ -115,12 +115,15 @@ struct afs_dir_lookup_cookie {
 /*
  * check that a directory page is valid
  */
-static inline void afs_dir_check_page(struct inode *dir, struct page *page)
+static inline int afs_dir_check_page(struct inode *dir, struct page *page)
 {
 	struct afs_dir_page *dbuf;
 	loff_t latter;
 	int tmp, qty;
 
+	if (likely(PageChecked(page)))
+		return PageError(page);
+
 #if 0
 	/* check the page count */
 	qty = desc.size / sizeof(dbuf->blocks[0]);
@@ -154,52 +157,16 @@ static inline void afs_dir_check_page(st
 	}
 
 	SetPageChecked(page);
-	return;
+	return 0;
 
  error:
 	SetPageChecked(page);
 	SetPageError(page);
-
+	return 1;
 } /* end afs_dir_check_page() */
 
 /*****************************************************************************/
 /*
- * discard a page cached in the pagecache
- */
-static inline void afs_dir_put_page(struct page *page)
-{
-	kunmap(page);
-	page_cache_release(page);
-
-} /* end afs_dir_put_page() */
-
-/*****************************************************************************/
-/*
- * get a page into the pagecache
- */
-static struct page *afs_dir_get_page(struct inode *dir, unsigned long index)
-{
-	struct page *page;
-
-	_enter("{%lu},%lu", dir->i_ino, index);
-
-	page = read_mapping_page(dir->i_mapping, index, NULL);
-	if (!IS_ERR(page)) {
-		kmap(page);
-		if (!PageChecked(page))
-			afs_dir_check_page(dir, page);
-		if (PageError(page))
-			goto fail;
-	}
-	return page;
-
- fail:
-	afs_dir_put_page(page);
-	return ERR_PTR(-EIO);
-} /* end afs_dir_get_page() */
-
-/*****************************************************************************/
-/*
  * open an AFS directory file
  */
 static int afs_dir_open(struct inode *inode, struct file *file)
@@ -344,11 +311,16 @@ static int afs_dir_iterate(struct inode
 		blkoff = *fpos & ~(sizeof(union afs_dir_block) - 1);
 
 		/* fetch the appropriate page from the directory */
-		page = afs_dir_get_page(dir, blkoff / PAGE_SIZE);
+		page = read_kmap_page(dir->i_mapping, blkoff / PAGE_SIZE);
 		if (IS_ERR(page)) {
 			ret = PTR_ERR(page);
 			break;
 		}
+		if (afs_dir_check_page(dir, page)) {
+			ret = -EIO;
+			put_kmapped_page(page);
+			break;
+		}
 
 		limit = blkoff & ~(PAGE_SIZE - 1);
 
@@ -361,7 +333,7 @@ static int afs_dir_iterate(struct inode
 			ret = afs_dir_iterate_block(fpos, dblock, blkoff,
 						    cookie, filldir);
 			if (ret != 1) {
-				afs_dir_put_page(page);
+				put_kmapped_page(page);
 				goto out;
 			}
 
@@ -369,7 +341,7 @@ static int afs_dir_iterate(struct inode
 
 		} while (*fpos < dir->i_size && blkoff < limit);
 
-		afs_dir_put_page(page);
+		put_kmapped_page(page);
 		ret = 0;
 	}
 
diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/afs/mntpt.c linux-2.6.21-rc6-mm1-test/fs/afs/mntpt.c
--- linux-2.6.21-rc6-mm1/fs/afs/mntpt.c	2007-04-09 17:24:03.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/afs/mntpt.c	2007-04-10 21:22:07.0 -0700
@@ -74,11 +74,6 @@ int afs_mntpt_check_symlink(struct afs_v
 		ret = PTR_ERR(page);
 		goto out;
 	}
-
-	ret = -EIO;
-	if (PageError(page))
-		goto out_free;
-
 	buf = kmap(page);
 
 	/* examine the symlink's contents */
@@ -98,7 +93,6 @@ int afs_mntpt_check_symlink(struct afs_v
 
 	ret = 0;
 	kunmap(page);
- out_free:
 	page_cache_release(page);
  out:
 	_leave(" = %d", ret);
@@ -180,10 +174,6 @@ static struct vfsmount *afs_mntpt_do_aut
 		goto error;
 	}
 
-	ret = -EIO;
-	if (PageError(page))
-		goto error;
-
 	buf = kmap(page);
 	memcpy(devname, buf, size);
 	kunmap(page);
[PATCH 4/17] ext2: convert ext2_get_page to read_kmap_page
Replace ext2_get_page() and ext2_put_page() with the new read_kmap_page() and put_kmapped_page() calls. Also, change ext2_check_page() to return the page's error status (dropping the page on error), and update the call sites accordingly.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/ext2/dir.c linux-2.6.21-rc5-mm4-test/fs/ext2/dir.c
--- linux-2.6.21-rc5-mm4/fs/ext2/dir.c	2007-04-06 12:27:03.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/ext2/dir.c	2007-04-06 14:34:23.0 -0700
@@ -35,12 +35,6 @@ static inline unsigned ext2_chunk_size(s
 	return inode->i_sb->s_blocksize;
 }
 
-static inline void ext2_put_page(struct page *page)
-{
-	kunmap(page);
-	page_cache_release(page);
-}
-
 static inline unsigned long dir_pages(struct inode *inode)
 {
 	return (inode->i_size+PAGE_CACHE_SIZE-1)>>PAGE_CACHE_SHIFT;
@@ -74,7 +68,7 @@ static int ext2_commit_chunk(struct page
 	return err;
 }
 
-static void ext2_check_page(struct page *page)
+static int ext2_check_page(struct page *page)
 {
 	struct inode *dir = page->mapping->host;
 	struct super_block *sb = dir->i_sb;
@@ -86,6 +80,14 @@ static void ext2_check_page(struct page
 	ext2_dirent *p;
 	char *error;
 
+	if (likely(PageChecked(page))) {
+		if (likely(!PageError(page)))
+			return 0;
+
+		put_kmapped_page(page);
+		return -EIO;
+	}
+
 	if ((dir->i_size >> PAGE_CACHE_SHIFT) == page->index) {
 		limit = dir->i_size & ~PAGE_CACHE_MASK;
 		if (limit & (chunk_size - 1))
@@ -112,7 +114,7 @@ static void ext2_check_page(struct page
 		goto Eend;
 out:
 	SetPageChecked(page);
-	return;
+	return 0;
 
 	/* Too bad, we had an error */
 
@@ -153,24 +155,8 @@ Eend:
 fail:
 	SetPageChecked(page);
 	SetPageError(page);
-}
-
-static struct page * ext2_get_page(struct inode *dir, unsigned long n)
-{
-	struct address_space *mapping = dir->i_mapping;
-	struct page *page = read_mapping_page(mapping, n, NULL);
-	if (!IS_ERR(page)) {
-		kmap(page);
-		if (!PageChecked(page))
-			ext2_check_page(page);
-		if (PageError(page))
-			goto fail;
-	}
-	return page;
-
-fail:
-	ext2_put_page(page);
-	return ERR_PTR(-EIO);
+	put_kmapped_page(page);
+	return -EIO;
 }
 
 /*
@@ -262,9 +248,9 @@ ext2_readdir (struct file * filp, void *
 	for ( ; n < npages; n++, offset = 0) {
 		char *kaddr, *limit;
 		ext2_dirent *de;
-		struct page *page = ext2_get_page(inode, n);
+		struct page *page = read_kmap_page(inode->i_mapping, n);
 
-		if (IS_ERR(page)) {
+		if (IS_ERR(page) || ext2_check_page(page)) {
 			ext2_error(sb, __FUNCTION__,
 				   "bad page in #%lu",
 				   inode->i_ino);
@@ -286,7 +272,7 @@ ext2_readdir (struct file * filp, void *
 			if (de->rec_len == 0) {
 				ext2_error(sb, __FUNCTION__,
 					"zero-length directory entry");
-				ext2_put_page(page);
+				put_kmapped_page(page);
 				return -EIO;
 			}
 			if (de->inode) {
@@ -301,13 +287,13 @@ ext2_readdir (struct file * filp, void *
 (nf_pos += le16_to_cpu(de->rec_len);
 		}
-		ext2_put_page(page);
+		put_kmapped_page(page);
 	}
 	return 0;
 }
@@ -344,8 +330,8 @@ struct ext2_dir_entry_2 * ext2_find_entr
 	n = start;
 	do {
 		char *kaddr;
-		page = ext2_get_page(dir, n);
-		if (!IS_ERR(page)) {
+		page = read_kmap_page(dir->i_mapping, n);
+		if (!IS_ERR(page) && !ext2_check_page(page)) {
 			kaddr = page_address(page);
 			de = (ext2_dirent *) kaddr;
 			kaddr += ext2_last_byte(dir, n) - reclen;
@@ -353,14 +339,14 @@ struct ext2_dir_entry_2 * ext2_find_entr
 				if (de->rec_len == 0) {
 					ext2_error(dir->i_sb, __FUNCTION__,
 						"zero-length directory entry");
-					ext2_put_page(page);
+
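The check-once pattern behind the new ext2_check_page() and afs_dir_check_page() can be modelled in userspace. The expensive scan runs only on first use, the verdict is cached in two page flags, and every later caller gets a plain error code.

This is a minimal sketch, not kernel code. struct model_page and its checked/error fields are hypothetical stand-ins for struct page, PageChecked and PageError. The "no zero-length record" rule is only a placeholder for ext2's real directory checks. Unlike the patched ext2 function, the sketch does not drop the page reference on error.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page plus a directory payload. */
struct model_page {
	bool checked;			/* role of PageChecked */
	bool error;			/* role of PageError */
	const unsigned short *rec_len;	/* record lengths found in the page */
	size_t nrecs;
	int scans;			/* how often the full scan ran */
};

/* The expensive part.  "No zero-length record" is only a placeholder
 * for the real directory consistency checks. */
static bool model_scan_ok(struct model_page *p)
{
	p->scans++;
	for (size_t i = 0; i < p->nrecs; i++)
		if (p->rec_len[i] == 0)
			return false;
	return true;
}

/* Returns 0 if the page may be used, -EIO otherwise.  The verdict is
 * cached in the flags, so only the first call pays for the scan. */
static int model_check_page(struct model_page *p)
{
	if (!p->checked) {
		if (!model_scan_ok(p))
			p->error = true;
		p->checked = true;
	}
	return p->error ? -EIO : 0;
}
```

Because the result lives in the page flags rather than in a wrapper function, a caller can combine the read and the check in one condition, as ext2_readdir() now does with `IS_ERR(page) || ext2_check_page(page)`.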
[PATCH 2/17] fs: introduce new read_cache_page interface
Export a single version of read_cache_page, which returns a locked, uptodate page or a synchronous error, and use inline helper functions to replicate the old behavior. Also, introduce new helper functions for the most common file system uses: kmapping the page, and keeping the page locked.

These changes collectively eliminate a substantial amount of private fs logic in favor of generic code. The patch also simplifies filemap.c significantly by assuming that callers want synchronous behavior, which is true of every caller except one.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/include/linux/pagemap.h linux-2.6.21-rc6-mm1-test/include/linux/pagemap.h
--- linux-2.6.21-rc6-mm1/include/linux/pagemap.h	2007-04-11 14:22:19.0 -0700
+++ linux-2.6.21-rc6-mm1-test/include/linux/pagemap.h	2007-04-11 14:29:31.0 -0700
@@ -108,21 +108,30 @@ static inline struct page *grab_cache_pa
 
 extern struct page * grab_cache_page_nowait(struct address_space *mapping,
 				unsigned long index);
-extern struct page * read_cache_page_async(struct address_space *mapping,
-				unsigned long index, filler_t *filler,
-				void *data);
-extern struct page * read_cache_page(struct address_space *mapping,
+extern struct page *__read_cache_page(struct address_space *mapping,
 				unsigned long index, filler_t *filler,
 				void *data);
 extern int read_cache_pages(struct address_space *mapping,
 		struct list_head *pages, filler_t *filler, void *data);
 
-static inline struct page *read_mapping_page_async(
-						struct address_space *mapping,
+void fastcall unlock_page(struct page *page);
+static inline struct page *read_cache_page(struct address_space *mapping,
+				unsigned long index, filler_t *filler,
+				void *data)
+{
+	struct page *page;
+
+	page = __read_cache_page(mapping, index, filler, data);
+	if (!IS_ERR(page))
+		unlock_page(page);
+	return page;
+}
+
+static inline struct page *__read_mapping_page(struct address_space *mapping,
 						unsigned long index, void *data)
 {
 	filler_t *filler = (filler_t *)mapping->a_ops->readpage;
-	return read_cache_page_async(mapping, index, filler, data);
+	return __read_cache_page(mapping, index, filler, data);
 }
 
 static inline struct page *read_mapping_page(struct address_space *mapping,
@@ -132,6 +141,36 @@ static inline struct page *read_mapping_
 	return read_cache_page(mapping, index, filler, data);
 }
 
+static inline struct page *__read_kmap_page(struct address_space *mapping,
+						unsigned long index)
+{
+	struct page *page = __read_mapping_page(mapping, index, NULL);
+	if (!IS_ERR(page))
+		kmap(page);
+	return page;
+}
+
+static inline struct page *read_kmap_page(struct address_space *mapping,
+						unsigned long index)
+{
+	struct page *page = read_mapping_page(mapping, index, NULL);
+	if (!IS_ERR(page))
+		kmap(page);
+	return page;
+}
+
+static inline void put_kmapped_page(struct page *page)
+{
+	kunmap(page);
+	page_cache_release(page);
+}
+
+static inline void put_locked_page(struct page *page)
+{
+	unlock_page(page);
+	put_kmapped_page(page);
+}
+
 int add_to_page_cache(struct page *page, struct address_space *mapping,
 				unsigned long index, gfp_t gfp_mask);
 int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/mm/filemap.c linux-2.6.21-rc6-mm1-test/mm/filemap.c
--- linux-2.6.21-rc6-mm1/mm/filemap.c	2007-04-11 14:26:42.0 -0700
+++ linux-2.6.21-rc6-mm1-test/mm/filemap.c	2007-04-10 21:46:03.0 -0700
@@ -1600,115 +1600,53 @@ int generic_file_readonly_mmap(struct fi
 EXPORT_SYMBOL(generic_file_mmap);
 EXPORT_SYMBOL(generic_file_readonly_mmap);
 
-static struct page *__read_cache_page(struct address_space *mapping,
-				unsigned long index,
-				int (*filler)(void *,struct page*),
-				void *data)
-{
-	struct page *page, *cached_page = NULL;
-	int err;
-repeat:
-	page = find_get_page(mapping, index);
-	if (!page) {
-		if (!cached_page) {
-			cached_page = page_cache_alloc_cold(mapping);
-			if (!cached_page)
-				return ERR_PTR(-ENOMEM);
-		}
-		err = add_to_page_cache_lru(cached_page, mapping,
-					index,
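The layering this patch sets up can be sketched in userspace. There is one exported routine that returns a locked, uptodate page or an error pointer, and there are thin wrappers that unlock and/or kmap the result.

This is an illustrative model under stated assumptions, not kernel code:

- struct model_page stands in for struct page.
- err_ptr()/is_err()/ptr_err() are hypothetical local stand-ins for ERR_PTR()/IS_ERR()/PTR_ERR().
- backend_fails is a hypothetical switch that simulates a failing ->readpage.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins for ERR_PTR()/IS_ERR()/PTR_ERR(): small negative
 * errno values are encoded in the top page of the address space. */
#define MODEL_MAX_ERRNO 4095

static void *err_ptr(long err) { return (void *)(intptr_t)err; }
static bool is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}
static long ptr_err(const void *p) { return (long)(intptr_t)p; }

/* Hypothetical stand-in for struct page: only the state the helpers touch. */
struct model_page {
	bool locked;
	bool uptodate;
	int kmap_count;	/* outstanding kmap()s */
	int refcount;	/* references held by callers */
};

static bool backend_fails;	/* hypothetical: makes the "readpage" fail */

/* Bottom layer, like the new __read_cache_page(): on success the page is
 * uptodate, locked, and carries a reference for the caller. */
static struct model_page *model__read_cache_page(struct model_page *pg)
{
	if (backend_fails)
		return err_ptr(-EIO);	/* synchronous error, nothing held */
	pg->uptodate = true;
	pg->locked = true;
	pg->refcount++;
	return pg;
}

/* Like the inline read_cache_page(): same, but unlocked on return. */
static struct model_page *model_read_cache_page(struct model_page *pg)
{
	struct model_page *page = model__read_cache_page(pg);

	if (!is_err(page))
		page->locked = false;
	return page;
}

/* Like read_kmap_page(): unlocked and kmapped. */
static struct model_page *model_read_kmap_page(struct model_page *pg)
{
	struct model_page *page = model_read_cache_page(pg);

	if (!is_err(page))
		page->kmap_count++;
	return page;
}

/* Like put_kmapped_page(): undo the kmap, then drop the reference. */
static void model_put_kmapped_page(struct model_page *page)
{
	page->kmap_count--;
	page->refcount--;
}
```

Callers that need the page lock (such as the jfs conversion in patch 8/17) call the bottom layer directly. Everyone else uses a wrapper, so the lock is taken only once and never re-acquired after a read.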
[PATCH 1/17] cramfs: use read_mapping_page
read_mapping_page_async() is going away, so convert its only user to read_mapping_page(). This change has not been benchmarked. However, getting real parallelism here would need something completely different, such as __do_page_cache_readahead(), which is not currently exported.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/cramfs/inode.c linux-2.6.21-rc6-mm1-test/fs/cramfs/inode.c
--- linux-2.6.21-rc6-mm1/fs/cramfs/inode.c	2007-04-09 17:24:03.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/cramfs/inode.c	2007-04-09 21:37:09.0 -0700
@@ -180,8 +180,7 @@ static void *cramfs_read(struct super_bl
 		struct page *page = NULL;
 
 		if (blocknr + i < devsize) {
-			page = read_mapping_page_async(mapping, blocknr + i,
-									NULL);
+			page = read_mapping_page(mapping, blocknr + i, NULL);
 			/* synchronous error? */
 			if (IS_ERR(page))
 				page = NULL;
[PATCH 0/17] fs: cleanup single page synchronous read interface
Nick Piggin recently changed the read_cache_page interface to be synchronous, which is pretty much what the file systems want anyway. It turns out that they have more in common than that, though. Some of them want to be able to get an uptodate *locked* page, and many of them want a kmapped page that is uptodate and unlocked. Each file system has its own individual helper functions to achieve this.

Since the helper functions are so similar, this patchset combines them into a small number of simple library functions. These call read_cache_page, which is renamed to __read_cache_page because it now returns a locked page. The immediate result is a large reduction in the number of fs-specific helper functions.

The secondary goals are to reduce the number of places the page lock is taken, and to eliminate a lot of PageUptodate and PageError checks. The file systems that still use PageChecked now have checker functions that return an error if the page is corrupted or has some other error. This simplifies the logic, since the checker function is no longer part of any helper function.

Compile tested on x86_64.
Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

 drivers/mtd/devices/block2mtd.c          |   28 +--
 fs/afs/dir.c                             |   56 +++---
 fs/afs/mntpt.c                           |   10 --
 fs/cramfs/inode.c                        |    3
 fs/ext2/dir.c                            |   82 -
 fs/freevxfs/vxfs_extern.h                |    1
 fs/freevxfs/vxfs_inode.c                 |    2
 fs/freevxfs/vxfs_lookup.c                |    4 -
 fs/freevxfs/vxfs_subr.c                  |   33
 fs/hfs/bnode.c                           |    4 -
 fs/hfsplus/bnode.c                       |    4 -
 fs/jffs2/fs.c                            |   27 ---
 fs/jffs2/gc.c                            |   15 ++-
 fs/jfs/jfs_metapage.c                    |    5 -
 fs/minix/dir.c                           |   59 ---
 fs/ntfs/aops.h                           |   67 -
 fs/ntfs/bitmap.c                         |    8 +-
 fs/ntfs/dir.c                            |   65 ++---
 fs/ntfs/index.c                          |   12 +--
 fs/ntfs/lcnalloc.c                       |    6 -
 fs/ntfs/logfile.c                        |   12 +--
 fs/ntfs/mft.c                            |   53 +
 fs/ntfs/super.c                          |   38 -
 fs/ntfs/usnjrnl.c                        |    4 -
 fs/partitions/check.c                    |   14 +--
 fs/reiser4/plugin/file/tail_conversion.c |    8 --
 fs/reiser4/plugin/item/extent_file_ops.c |    9 --
 fs/reiserfs/xattr.c                      |   48 ++--
 fs/sysv/dir.c                            |   19 +---
 fs/ufs/balloc.c                          |    8 +-
 fs/ufs/dir.c                             |   90 +--
 fs/ufs/truncate.c                        |    8 +-
 fs/ufs/util.c                            |   52 -
 fs/ufs/util.h                            |   10 --
 include/linux/pagemap.h                  |   53 -
 mm/filemap.c                             |  118 +++
 36 files changed, 315 insertions(+), 720 deletions(-)
[PATCH] make iunique use a do/while loop rather than its obscure goto loop
A while back, Christoph mentioned that he thought that iunique ought to be cleaned up to use a more conventional loop construct. This patch does that, turning the strange goto loop into a do/while.

Signed-off-by: Jeff Layton <[EMAIL PROTECTED]>

diff --git a/fs/inode.c b/fs/inode.c
index 23fc1fd..90e7587 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -689,21 +689,18 @@ ino_t iunique(struct super_block *sb, ino_t max_reserved)
 	struct inode *inode;
 	struct hlist_head * head;
 	ino_t res;
+
 	spin_lock(&inode_lock);
-retry:
-	if (counter > max_reserved) {
-		head = inode_hashtable + hash(sb,counter);
+	do {
+		if (counter <= max_reserved)
+			counter = max_reserved + 1;
 		res = counter++;
+		head = inode_hashtable + hash(sb, res);
 		inode = find_inode_fast(sb, head, res);
-		if (!inode) {
-			spin_unlock(&inode_lock);
-			return res;
-		}
-	} else {
-		counter = max_reserved + 1;
-	}
-	goto retry;
-
+	} while (inode != NULL);
+	spin_unlock(&inode_lock);
+
+	return res;
 }
 EXPORT_SYMBOL(iunique);
 
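To check that the do/while version hands out numbers the same way as the old goto loop, here is a userspace sketch of its control flow. The fixed in_use[] list and ino_in_use() are hypothetical stand-ins for the inode hash table and find_inode_fast(). Locking is omitted, and counter is a file-scope variable here only so that a test can reset it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for find_inode_fast(): is this inode number
 * already in use?  A small fixed list replaces the inode hash table. */
static const unsigned long in_use[] = { 5, 6, 7, 9 };

static bool ino_in_use(unsigned long ino)
{
	for (size_t i = 0; i < sizeof(in_use) / sizeof(in_use[0]); i++)
		if (in_use[i] == ino)
			return true;
	return false;
}

static unsigned long counter;	/* persists across calls, like iunique's */

/* Same loop shape as the patched iunique(): bump the counter past the
 * reserved range if needed, take the next value, retry while it is taken. */
static unsigned long model_iunique(unsigned long max_reserved)
{
	unsigned long res;
	bool busy;

	do {
		if (counter <= max_reserved)
			counter = max_reserved + 1;
		res = counter++;
		busy = ino_in_use(res);
	} while (busy);

	return res;
}
```

With 5, 6, 7 and 9 taken and max_reserved = 4, successive calls return 8 and then 10. Raising max_reserved above the counter makes the next call jump straight to max_reserved + 1, which the old version reached via its else branch and another trip through `goto retry`.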
Re: [patch 0/8] unprivileged mount syscall
Quoting Miklos Szeredi ([EMAIL PROTECTED]):
> > Not objecting to prctl(), but two other options would be
> >
> > 1. add a CLONE_NEW_NS_USERMNT flag - kind of ugly, but that is
> >    the time at which the ns is created, so in that sense it
> >    makes sense.
>
> Yes, I thought about this, but there's no easy way to set the flag for
> the initial namespace, and a second flag CLONE_NEW_NS_NOUSERMNT would
> be needed to turn off the flag.

Not mentioning it would 'turn it off' for the cloned ns, but the
default value for the initial namespace is still a problem.

> > 2. use the nsproxy container subsystem (see Paul Menage's
> >    containers patchset) to set this using, e.g.,
> >
> >    echo 1 > /containers/vserver1/mounts/usermount
>
> That again would lose some flexibility: only namespaces which
> are part of a container could be manipulated.

In the nsproxy subsystem, every namespace gets a container so long as
the nsproxy subsystem is mounted.

> Does that exclude the
> initial namespace?

No, the initial namespace is tied to the root dentry. So if, as my
example assumed, you've done

	mount -t container -o ns none /containers

then to change the setting for the initial namespace you would do

	echo 0 > /containers/mounts/usermount

> Also how would a process find out which vserver it is running in?

	cat /proc/$$/container

-serge
Re: [patch 0/8] unprivileged mount syscall
> Not objecting to prctl(), but two other options would be
>
> 1. add a CLONE_NEW_NS_USERMNT flag - kind of ugly, but that is
>    the time at which the ns is created, so in that sense it
>    makes sense.

Yes, I thought about this, but there's no easy way to set the flag for
the initial namespace, and a second flag CLONE_NEW_NS_NOUSERMNT would
be needed to turn off the flag.

> 2. use the nsproxy container subsystem (see Paul Menage's
>    containers patchset) to set this using, e.g.,
>
>    echo 1 > /containers/vserver1/mounts/usermount

That again would lose some flexibility: only namespaces which
are part of a container could be manipulated. Does that exclude the
initial namespace? Also how would a process find out which vserver it
is running in?

Thanks,
Miklos
Re: [PATCH 8/8] AFS: Add security support
On Wed, Apr 11, 2007 at 09:10:32PM +0100, David Howells wrote:
> J. Bruce Fields <[EMAIL PROTECTED]> wrote:
>
> > Just curious--when is the actual crypto done? There doesn't seem to be
> > any in this patch.
>
> See AF_RXRPC patch:
>
> 	http://people.redhat.com/~dhowells/rxrpc/04-af_rxrpc.diff
>
> You turn on CONFIG_RXKAD and load the rxkad module thus built (assuming you
> haven't built it in) after loading the af_rxrpc module. I probably should've
> mentioned that in the cover.

Oh, I see--didn't think to check net/rxrpc. Thanks!

--b.
Re: [PATCH 8/8] AFS: Add security support
J. Bruce Fields <[EMAIL PROTECTED]> wrote:

> Just curious--when is the actual crypto done? There doesn't seem to be
> any in this patch.

See AF_RXRPC patch:

	http://people.redhat.com/~dhowells/rxrpc/04-af_rxrpc.diff

You turn on CONFIG_RXKAD and load the rxkad module thus built (assuming you
haven't built it in) after loading the af_rxrpc module. I probably should've
mentioned that in the cover.

So anyone using sockets of family AF_RXRPC can use it. See these test
programs:

 (1) The klog test program fetches a ticket from the kaserver and adds it as a
     key of type rxrpc:

	http://people.redhat.com/~dhowells/rxrpc/klog.c

 (2) The listen test program listens for potentially secured incoming calls:

	http://people.redhat.com/~dhowells/rxrpc/listen.c

 (3) The rxrpc test program can make secure calls:

	http://people.redhat.com/~dhowells/rxrpc/rxrpc.c

David
Re: [patch 0/8] unprivileged mount syscall
Quoting Miklos Szeredi ([EMAIL PROTECTED]):
> > It would be nice in general if we could avoid any sort of checks for
> > (mnt->mnt_ns == init_nsproxy.mnt_ns). Maybe that won't be possible,
> > but, taking the two listed examples:
> >
> [snip]
>
> It's probably worthwhile going after these problematic cases, and
> fixing them, OTOH it's not easy to audit a complete system for holes
> arising from user mounts in the global namespace.
>
> So why not move this decision out from the kernel? How about adding a
> boolean flag to namespaces, which specifies whether unprivileged
> mounts are allowed or not. This would give complete flexibility to
> distro builders and sysadmins.
>
> The biggest problem I see is how to set this flag. There's no easy
> way to represent namespaces in /proc or /sys, and this is sufficiently
> obscure not to warrant a new syscall. Adding a new flag to prctl()
> could do the trick.
>
> Does that sound OK?

Not objecting to prctl(), but two other options would be

1. add a CLONE_NEW_NS_USERMNT flag - kind of ugly, but that is
   the time at which the ns is created, so in that sense it
   makes sense.

2. use the nsproxy container subsystem (see Paul Menage's
   containers patchset) to set this using, e.g.,

   echo 1 > /containers/vserver1/mounts/usermount

The prctl() method has a huge advantage of being implementable right
now.

-serge
Re: [patch 0/8] unprivileged mount syscall
> It would be nice in general if we could avoid any sort of checks for
> (mnt->mnt_ns == init_nsproxy.mnt_ns). Maybe that won't be possible,
> but, taking the two listed examples:

[snip]

It's probably worthwhile going after these problematic cases, and
fixing them, OTOH it's not easy to audit a complete system for holes
arising from user mounts in the global namespace.

So why not move this decision out from the kernel? How about adding a
boolean flag to namespaces, which specifies whether unprivileged
mounts are allowed or not. This would give complete flexibility to
distro builders and sysadmins.

The biggest problem I see is how to set this flag. There's no easy
way to represent namespaces in /proc or /sys, and this is sufficiently
obscure not to warrant a new syscall. Adding a new flag to prctl()
could do the trick.

Does that sound OK?

Thanks,
Miklos
Re: [PATCH 8/8] AFS: Add security support
On Wed, Apr 11, 2007 at 08:10:37PM +0100, David Howells wrote:
> Add security support to the AFS filesystem. Kerberos IV tickets are
> added as RxRPC keys are added to the session keyring with the klog
> program. open() and other VFS operations then find this ticket with
> request_key() and either use it immediately (eg: mkdir, unlink) or
> attach it to a file descriptor (open).

Just curious--when is the actual crypto done? There doesn't seem to be
any in this patch.

--b.
[PATCH 4/8] AFS: Correctly alter relocation state after update and show state in /proc
Correctly alter the relocation state after update is complete by switching it from "Updating" to "Valid". Also display the record state in the vlocation database proc file.

Signed-Off-By: David Howells <[EMAIL PROTECTED]>
---

 fs/afs/proc.c      |   15 +--
 fs/afs/vlocation.c |    4 +++-
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/fs/afs/proc.c b/fs/afs/proc.c
index 9e6af40..d5601f6 100644
--- a/fs/afs/proc.c
+++ b/fs/afs/proc.c
@@ -553,6 +553,16 @@ static void afs_proc_cell_volumes_stop(struct seq_file *p, void *v)
 	up_read(&cell->vl_sem);
 }
 
+const char afs_vlocation_states[][4] = {
+	[AFS_VL_NEW]			= "New",
+	[AFS_VL_CREATING]		= "Crt",
+	[AFS_VL_VALID]			= "Val",
+	[AFS_VL_NO_VOLUME]		= "NoV",
+	[AFS_VL_UPDATING]		= "Upd",
+	[AFS_VL_VOLUME_DELETED]		= "Del",
+	[AFS_VL_UNCERTAIN]		= "Unc",
+};
+
 /*
  * display a header line followed by a load of volume lines
  */
@@ -563,13 +573,14 @@ static int afs_proc_cell_volumes_show(struct seq_file *m, void *v)
 
 	/* display header on line 1 */
 	if (v == (void *) 1) {
-		seq_puts(m, "USE VLID[0] VLID[1] VLID[2] NAME\n");
+		seq_puts(m, "USE STT VLID[0] VLID[1] VLID[2] NAME\n");
 		return 0;
 	}
 
 	/* display one cell per line on subsequent lines */
-	seq_printf(m, "%3d %08x %08x %08x %s\n",
+	seq_printf(m, "%3d %s %08x %08x %08x %s\n",
 		   atomic_read(&vlocation->usage),
+		   afs_vlocation_states[vlocation->state],
 		   vlocation->vldb.vid[0],
 		   vlocation->vldb.vid[1],
 		   vlocation->vldb.vid[2],
diff --git a/fs/afs/vlocation.c b/fs/afs/vlocation.c
index f0f4419..9af1fe8 100644
--- a/fs/afs/vlocation.c
+++ b/fs/afs/vlocation.c
@@ -657,7 +657,7 @@ static void afs_vlocation_updater(struct work_struct *work)
 	switch (ret) {
 	case 0:
 		afs_vlocation_apply_update(vl, &vldb);
-		vl->state = AFS_VL_UPDATING;
+		vl->state = AFS_VL_VALID;
 		break;
 	case -ENOMEDIUM:
 		vl->state = AFS_VL_VOLUME_DELETED;
@@ -691,6 +691,8 @@ static void afs_vlocation_updater(struct work_struct *work)
 		timeout = afs_vlocation_update_timeout;
 	}
 
+	ASSERT(list_empty(&vl->update));
+
 	list_add_tail(&vl->update, &afs_vlocation_updates);
 
 	_debug("timeout %ld", timeout);
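The new state column relies on a designated-initializer table indexed by the state enum, so each three-letter name stays attached to its enumerator even if the enum is reordered. A userspace sketch of the table and the per-line format follows.

The enumerator names, the table contents and the format string are copied from the patch. The enum's numeric order, the enum type name, the model_ prefix and format_vl_line() are assumptions for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Enumerator names are taken from the patch; their order is assumed. */
enum model_vl_state {
	AFS_VL_NEW,
	AFS_VL_CREATING,
	AFS_VL_VALID,
	AFS_VL_NO_VOLUME,
	AFS_VL_UPDATING,
	AFS_VL_VOLUME_DELETED,
	AFS_VL_UNCERTAIN,
};

/* Designated initializers tie each 3-letter name (+ NUL = 4 bytes) to its
 * enumerator, independent of the numeric values above. */
static const char model_vl_states[][4] = {
	[AFS_VL_NEW]		= "New",
	[AFS_VL_CREATING]	= "Crt",
	[AFS_VL_VALID]		= "Val",
	[AFS_VL_NO_VOLUME]	= "NoV",
	[AFS_VL_UPDATING]	= "Upd",
	[AFS_VL_VOLUME_DELETED]	= "Del",
	[AFS_VL_UNCERTAIN]	= "Unc",
};

/* Hypothetical helper: formats one line the way the patched
 * afs_proc_cell_volumes_show() does (usage, state, three VLIDs, name). */
static int format_vl_line(char *buf, size_t len, int usage,
			  enum model_vl_state state,
			  const unsigned int vid[3], const char *name)
{
	return snprintf(buf, len, "%3d %s %08x %08x %08x %s\n",
			usage, model_vl_states[state],
			vid[0], vid[1], vid[2], name);
}
```

Because every name is exactly three characters, the new "STT" header column lines up with the data without any extra width specifier.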
[PATCH 7/8] AFS: Permit key to be cached in nameidata
Permit a key to be cached in the nameidata struct so that it only needs to be looked up once when doing the sequence of d_revalidate(), permission(), follow_link() and lookup() calls involved in a pathwalk.

This is used by the AFS filesystem to avoid repeatedly having to call request_key(). Once looked up, the key is then available as the kernel walks the tree until such a time as the kernel crosses to a non-AFS mountpoint or an AFS mountpoint in a different cell.

The cache works like this:

 (1) The nameidata::key pointer is initialised to NULL at the start of the pathwalk (do_path_lookup()). path_release() and co. release the key it points to.

 (2) Any filesystem operation performed during the pathwalk that has access to the nameidata (lookup, permission, follow_link, d_revalidate) can look at the key - if non-NULL - and if it's what they're looking for they can use it.

     If there's a key there of potential interest, the key's type and description should be checked to make sure the key is permissible. If of interest, key_validate() should be called to make sure the key is still usable. If it isn't, the error should be passed back rather than the key lookup being redone, on the basis that some earlier step is now no longer valid.

 (3) Any operation that is not interested in the key can either ignore it or release it and clear the pointer.

 (4) If an operation wants to put its own key there, it should release the old key and set the pointer to point to its own key with the key's usage count incremented. This could be encapsulated in a function something like this:

	void set_nd_key(struct nameidata *nd, struct key *key)
	{
		key_put(nd->key);
		nd->key = key_get(key);
	}

Unfortunately there isn't currently a way to pass the key on to the inode operations for create(), link(), unlink(), and suchlike, nor is there a way to pass it to the open() file op without adding a struct key pointer argument to each of these.

This might also be useful for NFS and CIFS.
Signed-Off-By: David Howells <[EMAIL PROTECTED]>
---

 fs/namei.c            |    5 +
 fs/open.c             |    7 +--
 include/linux/namei.h |    1 +
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index ee60cc4..7a59d12 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -350,6 +350,8 @@ void path_release(struct nameidata *nd)
 {
 	dput(nd->dentry);
 	mntput(nd->mnt);
+	key_put(nd->key);
+	nd->key = NULL;
 }
 
 /*
@@ -360,6 +362,8 @@ void path_release_on_umount(struct nameidata *nd)
 {
 	dput(nd->dentry);
 	mntput_no_expire(nd->mnt);
+	key_put(nd->key);
+	nd->key = NULL;
 }
 
 /**
@@ -1108,6 +1112,7 @@ static int fastcall do_path_lookup(int dfd, const char *name,
 	struct file *file;
 	struct fs_struct *fs = current->fs;
 
+	nd->key = NULL;
 	nd->last_type = LAST_ROOT; /* if there are only slashes... */
 	nd->flags = flags;
 	nd->depth = 0;
diff --git a/fs/open.c b/fs/open.c
index c989fb4..77bd2a5 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -822,10 +822,13 @@ struct file *nameidata_to_filp(struct nameidata *nd, int flags)
 	/* Pick up the filp from the open intent */
 	filp = nd->intent.open.file;
 	/* Has the filesystem initialised the file for us? */
-	if (filp->f_path.dentry == NULL)
+	if (filp->f_path.dentry == NULL) {
 		filp = __dentry_open(nd->dentry, nd->mnt, flags, filp, NULL);
-	else
+		key_put(nd->key);
+		nd->key = NULL;
+	} else {
 		path_release(nd);
+	}
 	return filp;
 }
 
diff --git a/include/linux/namei.h b/include/linux/namei.h
index d39a5a6..d677408 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -17,6 +17,7 @@ enum { MAX_NESTED_LINKS = 8 };
 struct nameidata {
 	struct dentry	*dentry;
 	struct vfsmount *mnt;
+	struct key	*key;
 	struct qstr	last;
 	unsigned int	flags;
 	int		last_type;
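The reference counting behind points (1) to (4) of the PATCH 7/8 description can be modelled in userspace. In this minimal sketch, struct model_key, struct model_nameidata and the model_* functions are hypothetical stand-ins for struct key, struct nameidata, key_get()/key_put() and the proposed set_nd_key().

One deliberate difference from the set_nd_key() suggested in the message: the sketch takes the new reference before dropping the old one. That way, re-caching the key that is already cached cannot release it in between.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for a refcounted struct key. */
struct model_key {
	int usage;
	int freed;	/* set instead of freeing, so tests can observe it */
};

static struct model_key *model_key_get(struct model_key *key)
{
	if (key)
		key->usage++;
	return key;
}

/* Putting NULL is a no-op, which is what lets path_release() call
 * key_put(nd->key) unconditionally. */
static void model_key_put(struct model_key *key)
{
	if (!key)
		return;
	assert(key->usage > 0);
	if (--key->usage == 0)
		key->freed = 1;
}

struct model_nameidata {
	struct model_key *key;	/* NULL at the start of the pathwalk */
};

/* Install a key in the cache: get the new reference first, then put
 * the old one, so passing the already-cached key is safe. */
static void model_set_nd_key(struct model_nameidata *nd,
			     struct model_key *key)
{
	struct model_key *old = nd->key;

	nd->key = model_key_get(key);
	model_key_put(old);
}

/* Mirrors the key handling the patch adds to path_release(). */
static void model_path_release(struct model_nameidata *nd)
{
	model_key_put(nd->key);
	nd->key = NULL;
}
```

In the test scenario, the caller's own reference is dropped after caching, so the nameidata holds the only reference. Setting the same key again then leaves it alive, and path_release() finally frees it.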
[PATCH 6/8] AFS: AF_RXRPC key changes
Make two changes to the AF_RXRPC key handling to make it easier for AFS to use:

 (1) Export key_type_rxrpc so that AFS can request keys of this type.

 (2) Make it possible to have keys that represent "no security". These are created by instantiating the keys with no data.

Signed-Off-By: David Howells <[EMAIL PROTECTED]>
---

 include/keys/rxrpc-type.h |   22 ++
 net/rxrpc/af_rxrpc.c      |    2 ++
 net/rxrpc/ar-key.c        |   10 +-
 net/rxrpc/ar-output.c     |    6 +-
 4 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/include/keys/rxrpc-type.h b/include/keys/rxrpc-type.h
new file mode 100644
index 000..e2ee73a
--- /dev/null
+++ b/include/keys/rxrpc-type.h
@@ -0,0 +1,22 @@
+/* RxRPC key type
+ *
+ * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([EMAIL PROTECTED])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _KEYS_RXRPC_TYPE_H
+#define _KEYS_RXRPC_TYPE_H
+
+#include 
+
+/*
+ * key type for AF_RXRPC keys
+ */
+extern struct key_type key_type_rxrpc;
+
+#endif /* _KEYS_USER_TYPE_H */
diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index 115ad19..9e37e4f 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -299,6 +299,8 @@ struct rxrpc_call *rxrpc_kernel_begin_call(struct socket *sock,
 
 	if (!key)
 		key = rx->key;
+	if (key && !key->payload.data)
+		key = NULL; /* a no-security key */
 
 	bundle = rxrpc_get_bundle(rx, trans, key, service_id, gfp);
 	if (IS_ERR(bundle)) {
diff --git a/net/rxrpc/ar-key.c b/net/rxrpc/ar-key.c
index 869a96c..7e049ff 100644
--- a/net/rxrpc/ar-key.c
+++ b/net/rxrpc/ar-key.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "ar-internal.h"
 
@@ -40,6 +41,8 @@ struct key_type key_type_rxrpc = {
 	.describe	= rxrpc_describe,
 };
 
+EXPORT_SYMBOL(key_type_rxrpc);
+
 /*
  * rxrpc server defined keys take ":" as the
  * description and an 8-byte decryption key as the payload
@@ -63,6 +66,8 @@ struct key_type key_type_rxrpc_s = {
  *	12	4	kvno
  *	16	8	session key
  *	24	[len]	ticket
+ *
+ * if no data is provided, then a no-security key is made
  */
 static int rxrpc_instantiate(struct key *key, const void *data, size_t datalen)
 {
@@ -74,6 +79,10 @@ static int rxrpc_instantiate(struct key *key, const void *data, size_t datalen)
 
 	_enter("{%x},,%zu", key_serial(key), datalen);
 
+	/* handle a no-security key */
+	if (!data && datalen == 0)
+		return 0;
+
 	/* get the key interface version number */
 	ret = -EINVAL;
 	if (datalen <= 4 || !data)
@@ -287,7 +296,6 @@ int rxrpc_get_server_data_key(struct rxrpc_connection *conn,
 		struct rxkad_key tsec;
 	} data;
 
-	_enter("");
 
 	key = key_alloc(&key_type_rxrpc, "x", 0, 0, current, 0,
diff --git a/net/rxrpc/ar-output.c b/net/rxrpc/ar-output.c
index ed7f3f4..d2d0baa 100644
--- a/net/rxrpc/ar-output.c
+++ b/net/rxrpc/ar-output.c
@@ -132,6 +132,7 @@ int rxrpc_client_sendmsg(struct kiocb *iocb, struct rxrpc_sock *rx,
 	enum rxrpc_command cmd;
 	struct rxrpc_call *call;
 	unsigned long user_call_ID = 0;
+	struct key *key;
 	__be16 service_id;
 	u32 abort_code = 0;
 	int ret;
@@ -153,7 +154,10 @@ int rxrpc_client_sendmsg(struct kiocb *iocb, struct rxrpc_sock *rx,
 				(struct sockaddr_rxrpc *) msg->msg_name;
 			service_id = htons(srx->srx_service);
 		}
-		bundle = rxrpc_get_bundle(rx, trans, rx->key, service_id,
+		key = rx->key;
+		if (key && !rx->key->payload.data)
+			key = NULL;
+		bundle = rxrpc_get_bundle(rx, trans, key, service_id,
 					  GFP_KERNEL);
 		if (IS_ERR(bundle))
 			return PTR_ERR(bundle);
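The "no-security key" convention can be illustrated with a userspace sketch. It assumes a stand-in `struct key` reduced to its payload pointer and two hypothetical helpers: `rxrpc_effective_key()`, which captures the check the patch open-codes before each `rxrpc_get_bundle()` call, and `model_instantiate()`, a cut-down instantiate that skips the real payload parsing. Neither is part of AF_RXRPC.

```c
#include <stddef.h>

/* Stand-in for struct key: only the payload pointer matters here. */
struct key {
	const void *payload_data;	/* NULL marks a "no-security" key */
};

/* Hypothetical helper: a key instantiated with no data is treated as
 * no key at all, so the call proceeds unauthenticated. */
static struct key *rxrpc_effective_key(struct key *key)
{
	if (key && !key->payload_data)
		return NULL;
	return key;
}

/* Hypothetical cut-down instantiate: no data and zero length give a
 * no-security key; otherwise the data must be longer than the 4-byte
 * version field (the real code then parses it further). */
static int model_instantiate(struct key *key, const void *data, size_t datalen)
{
	if (!data && datalen == 0) {
		key->payload_data = NULL;
		return 0;
	}
	if (datalen <= 4 || !data)
		return -1;		/* the patch returns -EINVAL here */
	key->payload_data = data;
	return 0;
}
```

Mapping the empty key to NULL at the call sites keeps `rxrpc_get_bundle()` unaware of the new key flavour: it simply sees the same NULL it already handled for sockets without a key.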
[PATCH 5/8] AFS: Handle multiple mounts of an AFS superblock correctly
Handle multiple mounts of an AFS superblock correctly, checking to see whether the superblock is already initialised after calling sget() rather than just unconditionally stamping all over it.

Also delete the "silent" parameter to afs_fill_super() as it's not used and can, in any case, be obtained from sb->s_flags.

Signed-Off-By: David Howells <[EMAIL PROTECTED]>
---

 fs/afs/super.c |   26 --
 1 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/fs/afs/super.c b/fs/afs/super.c
index efc4fe6..77e6875 100644
--- a/fs/afs/super.c
+++ b/fs/afs/super.c
@@ -212,7 +212,7 @@ static int afs_test_super(struct super_block *sb, void *data)
 /*
  * fill in the superblock
  */
-static int afs_fill_super(struct super_block *sb, void *data, int silent)
+static int afs_fill_super(struct super_block *sb, void *data)
 {
 	struct afs_mount_params *params = data;
 	struct afs_super_info *as = NULL;
@@ -319,17 +319,23 @@ static int afs_get_sb(struct file_system_type *fs_type,
 		goto error;
 	}
 
-	sb->s_flags = flags;
-
-	ret = afs_fill_super(sb, &params, flags & MS_SILENT ? 1 : 0);
-	if (ret < 0) {
-		up_write(&sb->s_umount);
-		deactivate_super(sb);
-		goto error;
+	if (!sb->s_root) {
+		/* initial superblock/root creation */
+		_debug("create");
+		sb->s_flags = flags;
+		ret = afs_fill_super(sb, &params);
+		if (ret < 0) {
+			up_write(&sb->s_umount);
+			deactivate_super(sb);
+			goto error;
+		}
+		sb->s_flags |= MS_ACTIVE;
+	} else {
+		_debug("reuse");
+		ASSERTCMP(sb->s_flags, &, MS_ACTIVE);
 	}
-	sb->s_flags |= MS_ACTIVE;
 
-	simple_set_mnt(mnt, sb);
+	simple_set_mnt(mnt, sb);
 	afs_put_volume(params.volume);
 	afs_put_cell(params.default_cell);
 	_leave(" = 0 [%p]", sb);
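The reuse logic in `afs_get_sb()` boils down to "initialise only if `s_root` is still NULL". A userspace sketch under that assumption, with a hypothetical `struct model_sb` and `MODEL_MS_ACTIVE` standing in for the kernel's superblock and `MS_ACTIVE`; `sget()` itself is not modelled, the caller just passes in the superblock it would have returned.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MS_ACTIVE 0x1UL	/* hypothetical stand-in for MS_ACTIVE */

/* Stand-in superblock: s_root stays NULL until the first fill. */
struct model_sb {
	void *s_root;
	unsigned long s_flags;
	int fill_count;		/* how many times fill_super ran */
};

static int model_root;		/* placeholder for a root dentry */

static int model_fill_super(struct model_sb *sb)
{
	sb->s_root = &model_root;
	sb->fill_count++;
	return 0;
}

/* Post-sget() logic: only a fresh superblock (no root yet) gets its
 * flags set and is filled; an existing one is reused untouched. */
static int model_get_sb(struct model_sb *sb, unsigned long flags)
{
	if (!sb->s_root) {
		int ret;

		sb->s_flags = flags;
		ret = model_fill_super(sb);
		if (ret < 0)
			return ret;
		sb->s_flags |= MODEL_MS_ACTIVE;
	} else {
		/* reuse: mirrors ASSERTCMP(sb->s_flags, &, MS_ACTIVE) */
		assert(sb->s_flags & MODEL_MS_ACTIVE);
	}
	return 0;
}
```

A second mount of the same volume therefore neither re-runs `fill_super` nor overwrites the flags chosen by the first mount, which is exactly the "stamping all over it" the patch removes.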
[PATCH 3/8] AFS: Fix callback aggregator work item deadlock
Fix a deadlock in the give-up-callback aggregator dispatcher work item whereby the aggregator runs on keventd, as does timed autounmount, thus leading to the unmount blocking keventd whilst waiting for keventd to run the aggregator when the give-up-callback buffer is full.

Signed-Off-By: David Howells <[EMAIL PROTECTED]>
---

 fs/afs/callback.c |   14 +-
 fs/afs/fsclient.c |    6 --
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/fs/afs/callback.c b/fs/afs/callback.c
index fdad11c..1533b49 100644
--- a/fs/afs/callback.c
+++ b/fs/afs/callback.c
@@ -232,7 +232,8 @@ static void afs_do_give_up_callback(struct afs_server *server,
 	 * possible to ship in one operation */
 	switch (atomic_inc_return(&server->cb_break_n)) {
 	case 1 ... AFSCBMAX - 1:
-		schedule_delayed_work(&server->cb_break_work, HZ * 2);
+		queue_delayed_work(afs_callback_update_worker,
+				   &server->cb_break_work, HZ * 2);
 		break;
 	case AFSCBMAX:
 		afs_flush_callback_breaks(server);
@@ -271,9 +272,11 @@ void afs_give_up_callback(struct afs_vnode *vnode)
 		spin_lock(&server->cb_lock);
 		if (vnode->cb_promised && afs_breakring_space(server) == 0) {
 			add_wait_queue(&server->cb_break_waitq, &myself);
-			while (vnode->cb_promised &&
-			       afs_breakring_space(server) == 0) {
+			for (;;) {
 				set_current_state(TASK_UNINTERRUPTIBLE);
+				if (!vnode->cb_promised ||
+				    afs_breakring_space(server) != 0)
+					break;
 				spin_unlock(&server->cb_lock);
 				schedule();
 				spin_lock(&server->cb_lock);
@@ -315,7 +318,8 @@ void afs_dispatch_give_up_callbacks(struct work_struct *work)
 void afs_flush_callback_breaks(struct afs_server *server)
 {
 	if (try_to_cancel_delayed_work(&server->cb_break_work) >= 0)
-		schedule_delayed_work(&server->cb_break_work, 0);
+		queue_delayed_work(afs_callback_update_worker,
+				   &server->cb_break_work, 0);
 }
 
 #if 0
@@ -426,7 +430,7 @@ static void afs_callback_updater(struct work_struct *work)
 int __init afs_callback_update_init(void)
 {
 	afs_callback_update_worker =
-		create_singlethread_workqueue("kafs_cbupdated");
+		create_singlethread_workqueue("kafs_callbackd");
 	return afs_callback_update_worker ? 0 : -ENOMEM;
 }
 
diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index d955178..e2a36f8 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -355,10 +355,11 @@ int afs_fs_give_up_callbacks(struct afs_server *server,
 	__be32 *bp, *tp;
 	int loop;
 
-	_enter("");
-
 	ncallbacks = CIRC_CNT(server->cb_break_head, server->cb_break_tail,
 			      ARRAY_SIZE(server->cb_break));
+
+	_enter("{%zu},", ncallbacks);
+
 	if (ncallbacks == 0)
 		return 0;
 	if (ncallbacks > AFSCBMAX)
@@ -398,6 +399,7 @@ int afs_fs_give_up_callbacks(struct afs_server *server,
 			(ARRAY_SIZE(server->cb_break) - 1);
 	}
 
+	ASSERT(ncallbacks > 0);
 	wake_up_nr(&server->cb_break_waitq, ncallbacks);
 
 	return afs_make_call(&server->addr, call, GFP_NOFS, wait_mode);
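The deadlock is triggered when the give-up-callback ring is full, and `afs_fs_give_up_callbacks()` drains that ring with `CIRC_CNT()` in batches capped at `AFSCBMAX`. A userspace sketch of that bookkeeping follows. It assumes macros with the same arithmetic as `CIRC_CNT()`/`CIRC_SPACE()` from `<linux/circ_buf.h>` and hypothetical sizes `RING_SIZE` and `BATCH_MAX`; the real AFS ring stores callback records rather than ints.

```c
#include <stddef.h>

/* Same arithmetic as CIRC_CNT()/CIRC_SPACE(); size must be a power of 2. */
#define RING_CNT(head, tail, size)   (((head) - (tail)) & ((size) - 1))
#define RING_SPACE(head, tail, size) RING_CNT((tail), ((head) + 1), (size))

#define RING_SIZE 8	/* hypothetical; power of two */
#define BATCH_MAX 3	/* hypothetical stand-in for AFSCBMAX */

struct ring {
	unsigned int head, tail;
	int slot[RING_SIZE];
};

/* Producer side: fails when full (one slot is always left empty).
 * This is the point where afs_give_up_callback() sleeps instead. */
static int ring_push(struct ring *r, int v)
{
	if (RING_SPACE(r->head, r->tail, RING_SIZE) == 0)
		return -1;
	r->slot[r->head] = v;
	r->head = (r->head + 1) & (RING_SIZE - 1);
	return 0;
}

/* Consumer side: take at most BATCH_MAX entries, as one RPC would. */
static size_t ring_take_batch(struct ring *r, int *out)
{
	size_t n = RING_CNT(r->head, r->tail, RING_SIZE);
	size_t i;

	if (n > BATCH_MAX)
		n = BATCH_MAX;
	for (i = 0; i < n; i++) {
		out[i] = r->slot[r->tail];
		r->tail = (r->tail + 1) & (RING_SIZE - 1);
	}
	return n;
}
```

Because a blocked producer can only proceed once the consumer runs, the consumer must not share a single-threaded execution context with anything that may block on the producer; moving it onto `afs_callback_update_worker` is how the patch breaks that cycle.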
[PATCH 1/8] AF_RXRPC: Use own workqueues
Make the AF_RXRPC module use its own workqueues with their own per-CPU threads. Currently it uses keventd to do the following tasks, amongst others:

 (*) Security negotiation
 (*) Packet encryption and decryption
 (*) Packet resending
 (*) ACK, abort and busy packet generation
 (*) Timeout handling
 (*) Missing packet catchup
 (*) Parts of incoming call management
 (*) Destruction of structures we've finished with

Some of these conflict with AFS's use of keventd, however, and can lead to effective deadlock of resources.

Having discussed this, it has been suggested that encryption and decryption shouldn't be done in keventd (that's not unreasonable, as it is potentially quite slow), and so the AF_RXRPC service is given its own threads rather than AFS.

It might be useful to consider using the rpciod threads for this, if they were separated out from the SunRPC module.

Signed-Off-By: David Howells <[EMAIL PROTECTED]>
---

 net/rxrpc/af_rxrpc.c      |   17 ++---
 net/rxrpc/ar-accept.c     |   12 ++--
 net/rxrpc/ar-ack.c        |   10 +-
 net/rxrpc/ar-call.c       |   16 
 net/rxrpc/ar-connection.c |    8 
 net/rxrpc/ar-connevent.c  |   20 ++--
 net/rxrpc/ar-error.c      |    6 +++---
 net/rxrpc/ar-input.c      |   24 
 net/rxrpc/ar-internal.h   |   28 
 net/rxrpc/ar-local.c      |    2 +-
 net/rxrpc/ar-output.c     |    4 ++--
 net/rxrpc/ar-peer.c       |    2 +-
 net/rxrpc/ar-recvmsg.c    |    2 +-
 net/rxrpc/ar-skbuff.c     |    2 +-
 net/rxrpc/ar-transport.c  |    8 
 15 files changed, 92 insertions(+), 69 deletions(-)

diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index fb35998..115ad19 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -41,6 +41,8 @@ atomic_t rxrpc_debug_id;
 /* count of skbs currently in use */
 atomic_t rxrpc_n_skbs;
 
+struct workqueue_struct *rxrpc_workqueue;
+
 static void rxrpc_sock_destructor(struct sock *);
 
 /*
@@ -688,7 +690,7 @@ static int rxrpc_release_sock(struct sock *sk)
 
 	/* try to flush out this socket */
 	rxrpc_release_calls_on_socket(rx);
-	flush_scheduled_work();
+	flush_workqueue(rxrpc_workqueue);
 	rxrpc_purge_queue(&sk->sk_receive_queue);
 
 	if (rx->conn) {
@@ -785,15 +787,21 @@ static int __init af_rxrpc_init(void)
 
 	rxrpc_epoch = htonl(xtime.tv_sec);
 
+	ret = -ENOMEM;
 	rxrpc_call_jar = kmem_cache_create(
 		"rxrpc_call_jar", sizeof(struct rxrpc_call), 0,
 		SLAB_HWCACHE_ALIGN, NULL, NULL);
 	if (!rxrpc_call_jar) {
 		printk(KERN_NOTICE "RxRPC: Failed to allocate call jar\n");
-		ret = -ENOMEM;
 		goto error_call_jar;
 	}
 
+	rxrpc_workqueue = create_workqueue("krxrpcd");
+	if (!rxrpc_workqueue) {
+		printk(KERN_NOTICE "RxRPC: Failed to allocate work queue\n");
+		goto error_work_queue;
+	}
+
 	ret = proto_register(&rxrpc_proto, 1);
 	if (ret < 0) {
 		printk(KERN_CRIT "RxRPC: Cannot register protocol\n");
@@ -831,6 +839,8 @@ error_key_type:
 error_sock:
 	proto_unregister(&rxrpc_proto);
 error_proto:
+	destroy_workqueue(rxrpc_workqueue);
+error_work_queue:
 	kmem_cache_destroy(rxrpc_call_jar);
 error_call_jar:
 	return ret;
@@ -855,9 +865,10 @@ static void __exit af_rxrpc_exit(void)
 	ASSERTCMP(atomic_read(&rxrpc_n_skbs), ==, 0);
 
 	_debug("flush scheduled work");
-	flush_scheduled_work();
+	flush_workqueue(rxrpc_workqueue);
 	proc_net_remove("rxrpc_conns");
 	proc_net_remove("rxrpc_calls");
+	destroy_workqueue(rxrpc_workqueue);
 	kmem_cache_destroy(rxrpc_call_jar);
 	_leave("");
 }
diff --git a/net/rxrpc/ar-accept.c b/net/rxrpc/ar-accept.c
index 405092d..73243ab 100644
--- a/net/rxrpc/ar-accept.c
+++ b/net/rxrpc/ar-accept.c
@@ -139,7 +139,7 @@ static int rxrpc_accept_incoming_call(struct rxrpc_local *local,
 			call->conn->state = RXRPC_CONN_SERVER_CHALLENGING;
 			atomic_inc(&call->conn->usage);
 			set_bit(RXRPC_CONN_CHALLENGE, &call->conn->events);
-			schedule_work(&call->conn->processor);
+			rxrpc_queue_conn(call->conn);
 		} else {
 			_debug("conn ready");
 			call->state = RXRPC_CALL_SERVER_ACCEPTING;
@@ -183,7 +183,7 @@ invalid_service:
 	if (!test_bit(RXRPC_CALL_RELEASE, &call->flags) &&
 	    !test_and_set_bit(RXRPC_CALL_RELEASE, &call->events)) {
 		rxrpc_get_call(call);
-		schedule_work(&call->processor);
+		rxrpc_queue_call(call);
 	}
 	read_unlock_bh(&call->state_lock);
 	rxrpc_put_call(call);
@@ -375,7 +375,7 @@ struct rxrpc_call *rxrpc_accept_call(struct rxrpc_sock *rx,
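The error handling added to `af_rxrpc_init()` follows the usual goto-unwind pattern: each new resource gets a label placed so that a later failure tears down everything acquired before it, in reverse order. A userspace sketch of that shape, with hypothetical `setup_*()` steps standing in for the call jar, the new workqueue and the protocol registration, plus a `fail_at` knob for injecting failures:

```c
/* Hypothetical resources: 1 while "allocated", 0 once torn down. */
static int jar_live, wq_live, proto_live;
static int fail_at;	/* 1..3 selects the step that fails; 0 means none */

static int setup_jar(void)
{
	if (fail_at == 1)
		return -1;
	jar_live = 1;
	return 0;
}

static int setup_wq(void)
{
	if (fail_at == 2)
		return -1;
	wq_live = 1;
	return 0;
}

static int setup_proto(void)
{
	if (fail_at == 3)
		return -1;
	proto_live = 1;
	return 0;
}

/* Each failure jumps to the label that undoes only what already
 * exists; the new error_work_queue label sits between error_proto
 * and the call-jar teardown, as in the patch. */
static int model_init(void)
{
	int ret = -1;	/* the patch presets ret = -ENOMEM */

	if (setup_jar() < 0)
		goto error_call_jar;
	if (setup_wq() < 0)
		goto error_work_queue;
	if (setup_proto() < 0)
		goto error_proto;
	return 0;

error_proto:
	wq_live = 0;	/* destroy_workqueue() */
error_work_queue:
	jar_live = 0;	/* kmem_cache_destroy() */
error_call_jar:
	return ret;
}
```

Presetting `ret` once before the first allocation, as the patch does with `-ENOMEM`, lets the allocation failures share one return value without repeating the assignment on every path.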
[PATCH 2/8] AF_RXRPC: Lower dead call timeout and fix available call counting on connections
Make a couple of fixes to AF_RXRPC:

 (1) The dead call timeout is shortened to 2 seconds. Without this, each completed call sits around eating up resources for 10 seconds. The calls need to hang around for a little while in case duplicate packets appear, but 10 seconds is excessive.

 (2) The number of available calls on a connection (conn->avail_calls) wasn't being decremented when a new call was allocated for a connection that didn't have any calls in progress. This caused an occasional BUG when we tried to find an empty channel slot on a connection that was supposed to have one available and didn't.

     In association with this, more assertions have been added to check this.

Signed-Off-By: David Howells <[EMAIL PROTECTED]>
---

 net/rxrpc/ar-call.c       |   59 +
 net/rxrpc/ar-connection.c |   20 ++-
 2 files changed, 56 insertions(+), 23 deletions(-)

diff --git a/net/rxrpc/ar-call.c b/net/rxrpc/ar-call.c
index 1d7698a..4d92d88 100644
--- a/net/rxrpc/ar-call.c
+++ b/net/rxrpc/ar-call.c
@@ -19,7 +19,7 @@ struct kmem_cache *rxrpc_call_jar;
 LIST_HEAD(rxrpc_calls);
 DEFINE_RWLOCK(rxrpc_call_lock);
 static unsigned rxrpc_call_max_lifetime = 60;
-static unsigned rxrpc_dead_call_timeout = 10;
+static unsigned rxrpc_dead_call_timeout = 2;
 
 static void rxrpc_destroy_call(struct work_struct *work);
 static void rxrpc_call_life_expired(unsigned long _call);
@@ -398,6 +398,7 @@ found_extant_call:
  */
 void rxrpc_release_call(struct rxrpc_call *call)
 {
+	struct rxrpc_connection *conn = call->conn;
 	struct rxrpc_sock *rx = call->socket;
 
 	_enter("{%d,%d,%d,%d}",
@@ -413,8 +414,7 @@ void rxrpc_release_call(struct rxrpc_call *call)
 	/* dissociate from the socket
 	 * - the socket's ref on the call is passed to the death timer
 	 */
-	_debug("RELEASE CALL %p (%d CONN %p)",
-	       call, call->debug_id, call->conn);
+	_debug("RELEASE CALL %p (%d CONN %p)", call, call->debug_id, conn);
 
 	write_lock_bh(&rx->call_lock);
 	if (!list_empty(&call->accept_link)) {
@@ -430,24 +430,42 @@ void rxrpc_release_call(struct rxrpc_call *call)
 	}
 	write_unlock_bh(&rx->call_lock);
 
-	if (call->conn->out_clientflag)
-		spin_lock(&call->conn->trans->client_lock);
-	write_lock_bh(&call->conn->lock);
-	/* free up the channel for reuse */
-	if (call->conn->out_clientflag) {
-		call->conn->avail_calls++;
-		if (call->conn->avail_calls == RXRPC_MAXCALLS)
-			list_move_tail(&call->conn->bundle_link,
-				       &call->conn->bundle->unused_conns);
-		else if (call->conn->avail_calls == 1)
-			list_move_tail(&call->conn->bundle_link,
-				       &call->conn->bundle->avail_conns);
+	spin_lock(&conn->trans->client_lock);
+	write_lock_bh(&conn->lock);
+	write_lock(&call->state_lock);
+
+	if (conn->channels[call->channel] == call)
+		conn->channels[call->channel] = NULL;
+
+	if (conn->out_clientflag && conn->bundle) {
+		conn->avail_calls++;
+		switch (conn->avail_calls) {
+		case 1:
+			list_move_tail(&conn->bundle_link,
+				       &conn->bundle->avail_conns);
+		case 2 ... RXRPC_MAXCALLS - 1:
+			ASSERT(conn->channels[0] == NULL ||
+			       conn->channels[1] == NULL ||
+			       conn->channels[2] == NULL ||
+			       conn->channels[3] == NULL);
+			break;
+		case RXRPC_MAXCALLS:
+			list_move_tail(&conn->bundle_link,
+				       &conn->bundle->unused_conns);
+			ASSERT(conn->channels[0] == NULL &&
+			       conn->channels[1] == NULL &&
+			       conn->channels[2] == NULL &&
+			       conn->channels[3] == NULL);
+			break;
+		default:
+			printk(KERN_ERR "RxRPC: conn->avail_calls=%d\n",
+			       conn->avail_calls);
+			BUG();
+		}
 	}
-	write_lock(&call->state_lock);
-	if (call->conn->channels[call->channel] == call)
-		call->conn->channels[call->channel] = NULL;
+	spin_unlock(&conn->trans->client_lock);
 
 	if (call->state < RXRPC_CALL_COMPLETE &&
 	    call->state != RXRPC_CALL_CLIENT_FINAL_ACK) {
@@ -458,10 +476,9 @@ void rxrpc_release_call(struct rxrpc_call *call)
 		rxrpc_queue_call(call);
 	}
 	write_unlock(&call->state_lock);
-	write_unlock_bh(&call->conn->lock);
-	if (call->conn->out_clientflag)
-
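The accounting that fix (2) repairs can be checked with a small userspace model: a connection with four channel slots, where `avail_calls` must always equal the number of empty slots. This is a sketch under assumptions: `MAXCALLS`, `struct model_conn` and the list enum are hypothetical stand-ins, `model_alloc_call()` is not taken from the patch, and the invariant asserted here is stronger than the patch's own `ASSERT()`s, which only check that at least one (or every) channel is free.

```c
#include <assert.h>
#include <stddef.h>

#define MAXCALLS 4	/* hypothetical stand-in for RXRPC_MAXCALLS */

/* Which bundle list the connection sits on (LIST_BUSY: neither). */
enum conn_list { LIST_BUSY, LIST_AVAIL, LIST_UNUSED };

struct model_conn {
	void *channels[MAXCALLS];
	int avail_calls;
	enum conn_list list;
};

static int count_free(const struct model_conn *c)
{
	int i, n = 0;

	for (i = 0; i < MAXCALLS; i++)
		n += c->channels[i] == NULL;
	return n;
}

/* Allocation must decrement avail_calls on every path, including a
 * connection with no calls in progress (the case that was missed). */
static int model_alloc_call(struct model_conn *c, void *call)
{
	int i;

	for (i = 0; i < MAXCALLS; i++) {
		if (!c->channels[i]) {
			c->channels[i] = call;
			c->avail_calls--;
			c->list = c->avail_calls ? LIST_AVAIL : LIST_BUSY;
			return i;
		}
	}
	return -1;
}

/* Release in the order of the patched rxrpc_release_call(): clear the
 * channel, bump avail_calls, then pick the list by the new count. */
static void model_release_call(struct model_conn *c, int chan)
{
	c->channels[chan] = NULL;
	c->avail_calls++;
	c->list = (c->avail_calls == MAXCALLS) ? LIST_UNUSED : LIST_AVAIL;
	assert(c->avail_calls == count_free(c));
}
```

If the allocation path skipped the decrement, the release would push `avail_calls` above `MAXCALLS`, which is what the patch's `default:` case now reports with `BUG()`.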
[PATCH 0/8] AFS: Add security support and fix bugs
These patches build on the patchset labelled "AF_RXRPC socket family and AFS rewrite". The patches are also available for http download.

Firstly, the patches fix a number of bugs in AF_RXRPC:

	http://people.redhat.com/~dhowells/rxrpc/09-af_rxrpc-own-workqueues.diff
	http://people.redhat.com/~dhowells/rxrpc/10-af_rxrpc-fixes.diff

Secondly, they fix some bugs in the AFS filesystem:

	http://people.redhat.com/~dhowells/rxrpc/11-afs-callback-wq.diff
	http://people.redhat.com/~dhowells/rxrpc/12-afs-vlocation.diff
	http://people.redhat.com/~dhowells/rxrpc/13-afs-multimount.diff

And finally, they add security support to AFS:

	http://people.redhat.com/~dhowells/rxrpc/14-afs-rxrpc-key.diff
	http://people.redhat.com/~dhowells/rxrpc/15-afs-nameidata-key.diff
	http://people.redhat.com/~dhowells/rxrpc/16-afs-security.diff

A security key is acquired by running the klog program:

	http://people.redhat.com/~dhowells/rxrpc/klog.c

This is compiled by:

	make klog CFLAGS="-Wall -g" LDLIBS="-lcrypto -lcrypt -lkrb4 -lkeyutils"

And then run by:

	./klog

Note that at the moment this is a rough and ready test program that has the username, realm, password and proposed key timeout compiled in. Note also that it will only talk to the AFS kaserver.

If a security key is acquired, then all subsequent operations, including VL lookups and mounts, performed with that session keyring will be authenticated using that key. The key can be viewed like so:

	[EMAIL PROTECTED] ~]# keyctl show
	Session Keyring
	       -3 --alswrv      0     0  keyring: _ses.3268
	        2 --alswrv      0     0   \_ keyring: _uid.0
	111416553 --als--v      0     0   \_ rxrpc: [EMAIL PROTECTED]

David
Re: [patch 0/8] unprivileged mount syscall
On Wed, 2007-04-11 at 12:44 +0200, Miklos Szeredi wrote:
> > 1. clone the master namespace.
> >
> > 2. in the new namespace
> >
> > move the tree under /share/$me to /
> > for each ($user, $what, $how) {
> > move /share/$user/$what to /$what
> > if ($how == slave) {
> > make the mount tree under /$what as slave
> > }
> > }
> >
> > 3. in the new namespace make the tree under
> >/share as private and unmount /share
>
> Thanks. I get the basic idea now: the namespace itself need not be
> shared between the sessions, it is enough if "share" propagation is
> set up between the different namespaces of a user.
>
> I don't yet see either in your or Viro's description how the trees
> under /share/$USER are initialized. I guess they are recursively
> bound from /, and are made slaves.

yes. I suppose, when a userid is created one of the steps would be

	mount --rbind / /share/$USER
	mount --make-rslave /share/$USER
	mount --make-rshared /share/$USER

RP

> Miklos
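For reference, the three mount(8) commands above correspond roughly to the following mount(2) calls. This is a sketch, assuming glibc's `<sys/mount.h>` flag names; `setup_user_share()` and `struct mount_step` are hypothetical, and actually running it requires CAP_SYS_ADMIN. For the propagation changes the kernel looks only at the target and the flags, so `"none"` is merely a placeholder source.

```c
#include <stdio.h>
#include <sys/mount.h>

/* One step of the per-user setup, as mount(2) source + flags. */
struct mount_step {
	const char *source;
	unsigned long flags;
};

static const struct mount_step share_steps[] = {
	{ "/",    MS_BIND | MS_REC },	/* mount --rbind / DIR */
	{ "none", MS_SLAVE | MS_REC },	/* mount --make-rslave DIR */
	{ "none", MS_SHARED | MS_REC },	/* mount --make-rshared DIR */
};

/* Apply the steps to target (e.g. "/share/<user>").  Returns 0, or
 * -1 after reporting the errno of the failing mount(2). */
static int setup_user_share(const char *target)
{
	size_t i;

	for (i = 0; i < sizeof(share_steps) / sizeof(share_steps[0]); i++) {
		if (mount(share_steps[i].source, target, NULL,
			  share_steps[i].flags, NULL) < 0) {
			perror("mount");
			return -1;
		}
	}
	return 0;
}
```

The order matters: making the tree a slave first and then shared leaves it both receiving propagation from / and sharing events among the user's own namespaces.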
Re: [PATCH 1/13] fs: convert core functions to zero_user_page
On Tue, 10 April 2007 22:56:38 -0700, Andrew Morton wrote:
>
> And I'm surprised that this:
>
> +static inline void memclear_highpage_flush(struct page *page, unsigned int offset, unsigned int size)
> +{
> +	return zero_user_page(page, offset, size);
> +}
>
> compiled. zero_user_page() returns void...

As does memclear_highpage_flush().

Some of my code looks like:

void some_func(...)
{
	if (foo)
		return do_foo(...);
	if (bar)
		return do_bar(...);
	...
}

do_foo() and do_bar() also return void. Saves an extra line for the return statement and the brackets. Doesn't help in the code you quoted, of course.

Jörn

--
Measure. Don't tune for speed until you've measured, and even then
don't unless one part of the code overwhelms the rest.
-- Rob Pike
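Returning a void expression, as in `return do_foo(...);` inside a void function, is accepted by GCC, but ISO C does not allow a `return` with an expression in a function returning void (`gcc -pedantic` warns about it), whereas C++ permits it. For comparison, a portable sketch of the same early-return shape, with hypothetical `do_foo()`/`do_bar()` stubs that only count calls:

```c
/* Hypothetical stand-ins for do_foo()/do_bar(): they just count calls. */
static int foo_calls, bar_calls;

static void do_foo(void) { foo_calls++; }
static void do_bar(void) { bar_calls++; }

/* Portable ISO C spelling of the idiom: call, then return.
 * "return do_foo();" here would rely on a GCC extension. */
static void some_func(int foo, int bar)
{
	if (foo) {
		do_foo();
		return;
	}
	if (bar) {
		do_bar();
		return;
	}
	/* ... remaining work would go here ... */
}
```

The trade-off is exactly the one described in the mail: the portable form costs an extra line and a pair of braces per early exit.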
Re: [patch 0/8] unprivileged mount syscall
Quoting Ian Kent ([EMAIL PROTECTED]):
> On Wed, 2007-04-11 at 09:26 -0500, Serge E. Hallyn wrote:
> > Quoting Ian Kent ([EMAIL PROTECTED]):
> > > On Wed, 2007-04-11 at 12:48 +0200, Miklos Szeredi wrote:
> > > > > > > >>
> > > > > > > >> - users can use bind mounts without having to pre-configure them
> > > > > > > >> in
> > > > > > > >> /etc/fstab
> > > > > > > >>
> > > > > > >
> > > > > > > This is by far the biggest concern I see. I think the security
> > > > > > > implication of allowing anyone to do bind mounts are poorly
> > > > > > > understood.
> > > > > >
> > > > > > And especially so since there is no way for a filesystem module to
> > > > > > veto
> > > > > > such requests.
> > > > >
> > > > > The filesystem can't veto initial mounts based on destination either.
> > > > > I don't think it's up to the filesystem to police bind/move mounts in
> > > > > any way.
> > > >
> > > > But if a filesystem can't or the developer thinks that it shouldn't for
> > > > some reason, support bind/move mounts then there should be a way for the
> >
> > Can you list some valid reasons why an fs could care where it is
> > mounted? The only thing I could think of is a stackable fs, but it
> > shouldn't care whether it is overlay-mounted or not.
>
> For my part, autofs and autofs4.

Ah, thanks. I can see I'm going to have to start using autofs to get to know the implementation, because it seems clear we'll run into it in the containers work again (beyond the struct pid conv) at some point.

> Moving or binding isn't valid.
> I tried to design that limitation out of version 5 but wasn't able to.
> In time I probably can but couldn't continue to support older versions.

thanks,
-serge

> >
> > thanks,
> > -serge
> >
> > > filesystem to tell the kernel that.
> > >
> > > Surely a filesystem is in a good position to be able to decide if a
> > > mount request "for it" should be allowed to continue based on its "own
> > > situation and capabilities".
> > >
> > > Ian
Re: [patch 0/8] unprivileged mount syscall
On Wed, 2007-04-11 at 09:26 -0500, Serge E. Hallyn wrote:
> Quoting Ian Kent ([EMAIL PROTECTED]):
> > On Wed, 2007-04-11 at 12:48 +0200, Miklos Szeredi wrote:
> > > > > >>
> > > > > >> - users can use bind mounts without having to pre-configure them in
> > > > > >> /etc/fstab
> > > > > >>
> > > > >
> > > > > This is by far the biggest concern I see. I think the security
> > > > > implication of allowing anyone to do bind mounts are poorly
> > > > > understood.
> > > >
> > > > And especially so since there is no way for a filesystem module to veto
> > > > such requests.
> > >
> > > The filesystem can't veto initial mounts based on destination either.
> > > I don't think it's up to the filesystem to police bind/move mounts in
> > > any way.
> >
> > But if a filesystem can't or the developer thinks that it shouldn't for
> > some reason, support bind/move mounts then there should be a way for the
>
> Can you list some valid reasons why an fs could care where it is
> mounted? The only thing I could think of is a stackable fs, but it
> shouldn't care whether it is overlay-mounted or not.

For my part, autofs and autofs4.
Moving or binding isn't valid.
I tried to design that limitation out of version 5 but wasn't able to.
In time I probably can but couldn't continue to support older versions.

>
> thanks,
> -serge
>
> > filesystem to tell the kernel that.
> >
> > Surely a filesystem is in a good position to be able to decide if a
> > mount request "for it" should be allowed to continue based on its "own
> > situation and capabilities".
> >
> > Ian
Re: [patch 0/8] unprivileged mount syscall
Quoting Ian Kent ([EMAIL PROTECTED]):
> On Wed, 2007-04-11 at 12:48 +0200, Miklos Szeredi wrote:
> > > > >>
> > > > >> - users can use bind mounts without having to pre-configure them in
> > > > >> /etc/fstab
> > > > >>
> > > >
> > > > This is by far the biggest concern I see. I think the security
> > > > implication of allowing anyone to do bind mounts are poorly understood.
> > >
> > > And especially so since there is no way for a filesystem module to veto
> > > such requests.
> >
> > The filesystem can't veto initial mounts based on destination either.
> > I don't think it's up to the filesystem to police bind/move mounts in
> > any way.
>
> But if a filesystem can't or the developer thinks that it shouldn't for
> some reason, support bind/move mounts then there should be a way for the

Can you list some valid reasons why an fs could care where it is mounted? The only thing I could think of is a stackable fs, but it shouldn't care whether it is overlay-mounted or not.

thanks,
-serge

> filesystem to tell the kernel that.
>
> Surely a filesystem is in a good position to be able to decide if a
> mount request "for it" should be allowed to continue based on its "own
> situation and capabilities".
>
> Ian
Re: [patch 0/8] unprivileged mount syscall
On Wed, 2007-04-11 at 12:48 +0200, Miklos Szeredi wrote:
> > > >>
> > > >> - users can use bind mounts without having to pre-configure them in
> > > >> /etc/fstab
> > > >>
> > >
> > > This is by far the biggest concern I see. I think the security
> > > implication of allowing anyone to do bind mounts are poorly understood.
> >
> > And especially so since there is no way for a filesystem module to veto
> > such requests.
>
> The filesystem can't veto initial mounts based on destination either.
> I don't think it's up to the filesystem to police bind/move mounts in
> any way.

But if a filesystem can't or the developer thinks that it shouldn't for some reason, support bind/move mounts then there should be a way for the filesystem to tell the kernel that.

Surely a filesystem is in a good position to be able to decide if a mount request "for it" should be allowed to continue based on its "own situation and capabilities".

Ian
calling write() from interrupt context
Hello,

I have a question regarding calling write() from an interrupt context in the kernel: is it possible?

There is an article by GregKH about reading/writing files from the kernel; see: http://interactive.linuxjournal.com/article/8110

Everybody (including the author) admits that reading/writing files from the kernel is not recommended at all. Yet, because of my interest in this, I tried it and it works. However, calling write() from interrupt context does not work, because write() can sleep.

Is there a way to call write() from interrupt context, for example some special filesystem or patch? I found a dumpfs patch, but it has not been tested yet; moreover, it is a patch against 2.6.8-rc2 and it seems to have been abandoned. See http://lwn.net/Articles/94748/?format=printable

Any ideas?

Regards,
Ian
Re: [patch 0/8] unprivileged mount syscall
> > >>
> > >> - users can use bind mounts without having to pre-configure them in
> > >> /etc/fstab
> > >>
> >
> > This is by far the biggest concern I see. I think the security
> > implication of allowing anyone to do bind mounts are poorly understood.
>
> And especially so since there is no way for a filesystem module to veto
> such requests.

The filesystem can't veto initial mounts based on destination either. I don't think it's up to the filesystem to police bind/move mounts in any way.

Miklos
Re: [patch 0/8] unprivileged mount syscall
> 1. clone the master namespace.
>
> 2. in the new namespace
>
> move the tree under /share/$me to /
> for each ($user, $what, $how) {
> move /share/$user/$what to /$what
> if ($how == slave) {
> make the mount tree under /$what as slave
> }
> }
>
> 3. in the new namespace make the tree under
>/share as private and unmount /share

Thanks. I get the basic idea now: the namespace itself need not be shared between the sessions, it is enough if "share" propagation is set up between the different namespaces of a user.

I don't yet see either in your or Viro's description how the trees under /share/$USER are initialized. I guess they are recursively bound from /, and are made slaves.

Miklos
Re: [patch 0/8] unprivileged mount syscall
> > This patchset adds support for keeping mount ownership information in
> > the kernel, and allow unprivileged mount(2) and umount(2) in certain
> > cases.
>
> Well, I'd like to feel all smart and point out some bugs, but the code
> all reads very nicely, seems to work as advertised, and while I won't
> have ltp results until tomorrow, boot test results in so far are all
> successful.
>
> Looks good.

Thanks for the review and testing!

Miklos