date:20070411

2.6.21-rc6 new aops patchset

2007-04-11 Thread Nick Piggin


http://www.kernel.org/pub/linux/kernel/people/npiggin/patches/new-aops/

2.6.21-rc6-new-aops*

New aops patchset against 2.6.21-rc6.

Reworked the cont helpers to be better aligned with the old scheme.
This unbroke reiserfs (hopefully the only showstopper), and made fat
conversion simpler.

Converted most of the cont_prepare_write filesystems (about half a dozen).
affs and hfsplus still call ->prepare_write in parts, which makes them
slightly less than trivial.

Converted block_dev and jffs2 to new aops.

Converted hostfs and smbfs, optimising things along the way.

Converted rd to new aops. This was interesting because the new aops make
it trivial to keep rd pagecache off the LRU, so I did that too.

Bugfixes for tmpfs and loop (which was not using the new aops, so I
didn't notice the tmpfs breakage).

Switched ext2's directory manipulation to the new pagecache accessors.

Did some performance testing of the fuse_perform_write implementation.
Result with a passthrough filesystem onto a backing tmpfs directory is that
bulk (1MB) writes are nearly 4 times faster (256MB/s vs 71MB/s), because
FUSE can send larger requests to userspace. Block based filesystems will
tend to be less dramatic, but could still be significant if block allocation
is batched, for example.
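
As a rough illustration of why request size matters here, the sketch below
counts the userspace round trips needed for one 1MB write when each request
carries a single page versus a batch of pages. The 4KB page size, the
32-page batch and the function names are assumptions made for the example,
not values taken from the patchset; only the 256MB/s and 71MB/s figures come
from the measurement above.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size */

/* Number of requests needed to push `bytes` to userspace when each
 * request can carry at most `pages_per_req` pages (rounds up). */
static size_t requests_needed(size_t bytes, size_t pages_per_req)
{
	size_t req_bytes;

	assert(pages_per_req > 0);
	req_bytes = pages_per_req * SKETCH_PAGE_SIZE;
	return (bytes + req_bytes - 1) / req_bytes;
}

/* Speedup in tenths (36 means 3.6x), using integer arithmetic only. */
static unsigned speedup_tenths(unsigned new_mbps, unsigned old_mbps)
{
	assert(old_mbps > 0);
	return (new_mbps * 10u) / old_mbps;
}
```

Under these assumptions a 1MB write drops from 256 round trips to 8, and
256MB/s against 71MB/s works out to roughly 3.6x.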

Issues:
perform_write still here for the moment (conversion from perform_write aop
implementation to write fop shouldn't be too hard anyway, but I'll sort
this out before it gets into mainline).

nobh still unconverted (old nobh ops still work, they'll just be using the
slow usercopy path.  ext3 doesn't use nobh any more). I'm inclined to keep
ignoring nobh for now, because we're already up to 40 patches. I'd like to
try improving nobh, but this isn't the right patchset to do it in.

Many of the trivial conversions are untested. Need to convert others
(eg. reiserfs).

Need to think about how to merge this.
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 6/17] hfs: remove redundant read_mapping_page error check

2007-04-11 Thread Nate Diller

Now that read_mapping_page() does error checking internally, there is no
need to check PageError here.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/hfs/bnode.c linux-2.6.21-rc6-mm1-test/fs/hfs/bnode.c
--- linux-2.6.21-rc6-mm1/fs/hfs/bnode.c 2007-04-09 17:20:13.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/hfs/bnode.c 2007-04-10 21:28:03.0 -0700
@@ -282,10 +282,6 @@ static struct hfs_bnode *__hfs_bnode_cre
page = read_mapping_page(mapping, block++, NULL);
if (IS_ERR(page))
goto fail;
-   if (PageError(page)) {
-   page_cache_release(page);
-   goto fail;
-   }
page_cache_release(page);
node->page[i] = page;
}
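
The reasoning in this patch (the read helper reports I/O failure through its
return value, so callers only need IS_ERR()) can be modelled in userspace.
The sketch below is not kernel code: the sk_ names are hypothetical, and the
pointer encoding is only modelled on the kernel's ERR_PTR()/IS_ERR()/PTR_ERR()
convention.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SK_MAX_ERRNO 4095

struct sk_page { int error; };   /* stand-in for a page and its error flag */

/* Encode a negative errno in a pointer value, and decode it again. */
static void *sk_err_ptr(long err) { return (void *)(intptr_t)err; }
static long sk_ptr_err(const void *p) { return (long)(intptr_t)p; }

/* True if the pointer lies in the top SK_MAX_ERRNO addresses. */
static int sk_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-SK_MAX_ERRNO;
}

/* Model of a read helper that checks the error state itself, so the
 * caller never sees a valid pointer to a page that failed to read. */
static struct sk_page *sk_read_page(struct sk_page *backing)
{
	if (backing == NULL)
		return sk_err_ptr(-ENOMEM);
	if (backing->error)
		return sk_err_ptr(-EIO);
	return backing;
}
```

With a helper shaped like this, a second per-page error test at the call
site can never fire, which is the redundancy the hunk above removes.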

[PATCH 11/17] ntfs: convert ntfs_map_page to read_kmap_page

2007-04-11 Thread Nate Diller

Replace ntfs_map_page() and ntfs_unmap_page() with the new read_kmap_page()
and put_kmapped_page() calls and their locking variants, and remove the
now-unneeded PageError checking.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/ntfs/aops.h linux-2.6.21-rc5-mm4-test/fs/ntfs/aops.h
--- linux-2.6.21-rc5-mm4/fs/ntfs/aops.h 2007-04-05 17:14:25.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/ntfs/aops.h 2007-04-06 01:59:19.0 -0700
@@ -31,73 +31,6 @@
 
 #include "inode.h"
 
-/**
- * ntfs_unmap_page - release a page that was mapped using ntfs_map_page()
- * @page:  the page to release
- *
- * Unpin, unmap and release a page that was obtained from ntfs_map_page().
- */
-static inline void ntfs_unmap_page(struct page *page)
-{
-   kunmap(page);
-   page_cache_release(page);
-}
-
-/**
- * ntfs_map_page - map a page into accessible memory, reading it if necessary
- * @mapping:   address space for which to obtain the page
- * @index: index into the page cache for @mapping of the page to map
- *
- * Read a page from the page cache of the address space @mapping at position
- * @index, where @index is in units of PAGE_CACHE_SIZE, and not in bytes.
- *
- * If the page is not in memory it is loaded from disk first using the readpage
- * method defined in the address space operations of @mapping and the page is
- * added to the page cache of @mapping in the process.
- *
- * If the page belongs to an mst protected attribute and it is marked as such
- * in its ntfs inode (NInoMstProtected()) the mst fixups are applied but no
- * error checking is performed.  This means the caller has to verify whether
- * the ntfs record(s) contained in the page are valid or not using one of the
- * ntfs_is_<type>_record{,p}() macros, where <type> is the record type you are
- * expecting to see.  (For details of the macros, see fs/ntfs/layout.h.)
- *
- * If the page is in high memory it is mapped into memory directly addressible
- * by the kernel.
- *
- * Finally the page count is incremented, thus pinning the page into place.
- *
- * The above means that page_address(page) can be used on all pages obtained
- * with ntfs_map_page() to get the kernel virtual address of the page.
- *
- * When finished with the page, the caller has to call ntfs_unmap_page() to
- * unpin, unmap and release the page.
- *
- * Note this does not grant exclusive access. If such is desired, the caller
- * must provide it independently of the ntfs_{un}map_page() calls by using
- * a {rw_}semaphore or other means of serialization. A spin lock cannot be
- * used as ntfs_map_page() can block.
- *
- * The unlocked and uptodate page is returned on success or an encoded error
- * on failure. Caller has to test for error using the IS_ERR() macro on the
- * return value. If that evaluates to 'true', the negative error code can be
- * obtained using PTR_ERR() on the return value of ntfs_map_page().
- */
-static inline struct page *ntfs_map_page(struct address_space *mapping,
-   unsigned long index)
-{
-   struct page *page = read_mapping_page(mapping, index, NULL);
-
-   if (!IS_ERR(page)) {
-   kmap(page);
-   if (!PageError(page))
-   return page;
-   ntfs_unmap_page(page);
-   return ERR_PTR(-EIO);
-   }
-   return page;
-}
-
 #ifdef NTFS_RW
 
 extern void mark_ntfs_record_dirty(struct page *page, const unsigned int ofs);
diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/ntfs/bitmap.c linux-2.6.21-rc5-mm4-test/fs/ntfs/bitmap.c
--- linux-2.6.21-rc5-mm4/fs/ntfs/bitmap.c 2006-11-29 13:57:37.0 -0800
+++ linux-2.6.21-rc5-mm4-test/fs/ntfs/bitmap.c 2007-04-06 12:40:53.0 -0700
@@ -72,7 +72,7 @@ int __ntfs_bitmap_set_bits_in_run(struct
 
/* Get the page containing the first bit (@start_bit). */
mapping = vi->i_mapping;
-   page = ntfs_map_page(mapping, index);
+   page = read_kmap_page(mapping, index);
if (IS_ERR(page)) {
if (!is_rollback)
ntfs_error(vi->i_sb, "Failed to map first page (error "
@@ -123,8 +123,8 @@ int __ntfs_bitmap_set_bits_in_run(struct
/* Update @index and get the next page. */
flush_dcache_page(page);
set_page_dirty(page);
-   ntfs_unmap_page(page);
-   page = ntfs_map_page(mapping, ++index);
+   put_kmapped_page(page);
+   page = read_kmap_page(mapping, ++index);
if (IS_ERR(page))
goto rollback;
kaddr = page_address(page);
@@ -159,7 +159,7 @@ done:
/* We are done.  Unmap the page and return success. */
flush_dcache_page(page);
set_page_dirty(page);
-   ntfs_unmap_page(page);
+   put_kmapped_page(page);
ntfs_debug("Done.");
return 0;
 rollback:
diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/ntfs/dir.c 
linux
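
The definitions of read_kmap_page() and put_kmapped_page() are not part of
this excerpt. Judging only from the removed ntfs_map_page() and
ntfs_unmap_page() bodies, the pairing is "take a reference and map" against
"unmap and drop the reference". A minimal userspace model of that pairing,
with hypothetical sk_ names and plain counters standing in for the page
refcount and kmap state:

```c
#include <assert.h>
#include <stddef.h>

struct sk_page {
	int refcount;   /* models the page reference count */
	int kmapped;    /* models outstanding kmap() calls */
	char data[16];
};

/* Get side: take a reference, then map the page. */
static struct sk_page *sk_read_kmap_page(struct sk_page *p)
{
	p->refcount++;
	p->kmapped++;
	return p;
}

/* Put side: unmap, then drop the reference. */
static void sk_put_kmapped_page(struct sk_page *p)
{
	assert(p->kmapped > 0 && p->refcount > 0);
	p->kmapped--;
	p->refcount--;
}

/* Caller pattern as in the bitmap.c hunks: map, touch the data, put. */
static int sk_set_first_byte(struct sk_page *p, char v)
{
	struct sk_page *page = sk_read_kmap_page(p);

	page->data[0] = v;
	sk_put_kmapped_page(page);
	return 0;
}
```

Every successful get must be matched by exactly one put, so both counters
return to their starting values after sk_set_first_byte().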

[PATCH 14/17] reiserfs: convert reiserfs_get_page to read_kmap_page

2007-04-11 Thread Nate Diller

Replace reiserfs_get_page() and reiserfs_put_page() with the new
read_kmap_page() and put_kmapped_page() calls and their locking variants.
Also, propagate the mapping_set_gfp_mask() deadlock comment to the call sites.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/reiserfs/xattr.c linux-2.6.21-rc5-mm4-test/fs/reiserfs/xattr.c
--- linux-2.6.21-rc5-mm4/fs/reiserfs/xattr.c 2007-04-05 17:14:25.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/reiserfs/xattr.c 2007-04-06 14:41:34.0 -0700
@@ -438,33 +438,6 @@ int xattr_readdir(struct file *file, fil
return res;
 }
 
-/* Internal operations on file data */
-static inline void reiserfs_put_page(struct page *page)
-{
-   kunmap(page);
-   page_cache_release(page);
-}
-
-static struct page *reiserfs_get_page(struct inode *dir, unsigned long n)
-{
-   struct address_space *mapping = dir->i_mapping;
-   struct page *page;
-   /* We can deadlock if we try to free dentries,
-  and an unlink/rmdir has just occured - GFP_NOFS avoids this */
-   mapping_set_gfp_mask(mapping, GFP_NOFS);
-   page = read_mapping_page(mapping, n, NULL);
-   if (!IS_ERR(page)) {
-   kmap(page);
-   if (PageError(page))
-   goto fail;
-   }
-   return page;
-
-  fail:
-   reiserfs_put_page(page);
-   return ERR_PTR(-EIO);
-}
-
 static inline __u32 xattr_hash(const char *msg, int len)
 {
return csum_partial(msg, len, 0);
@@ -537,13 +510,15 @@ reiserfs_xattr_set(struct inode *inode, 
else
chunk = buffer_size - buffer_pos;
 
-   page = reiserfs_get_page(xinode, file_pos >> PAGE_CACHE_SHIFT);
+   /* We can deadlock if we try to free dentries,
+  and an unlink/rmdir has just occured - GFP_NOFS avoids this */
+   mapping_set_gfp_mask(mapping, GFP_NOFS);
+   page = __read_kmap_page(mapping, file_pos >> PAGE_CACHE_SHIFT);
if (IS_ERR(page)) {
err = PTR_ERR(page);
goto out_filp;
}
 
-   lock_page(page);
data = page_address(page);
 
if (file_pos == 0) {
@@ -566,8 +541,7 @@ reiserfs_xattr_set(struct inode *inode, 
 page_offset + chunk +
 skip);
}
-   unlock_page(page);
-   reiserfs_put_page(page);
+   put_locked_page(page);
buffer_pos += chunk;
file_pos += chunk;
skip = 0;
@@ -646,13 +620,15 @@ reiserfs_xattr_get(const struct inode *i
else
chunk = isize - file_pos;
 
-   page = reiserfs_get_page(xinode, file_pos >> PAGE_CACHE_SHIFT);
+   /* We can deadlock if we try to free dentries,
+  and an unlink/rmdir has just occured - GFP_NOFS avoids this */
+   mapping_set_gfp_mask(xinode->i_mapping, GFP_NOFS);
+   page = __read_kmap_page(xinode->i_mapping, file_pos >> PAGE_CACHE_SHIFT);
if (IS_ERR(page)) {
err = PTR_ERR(page);
goto out_dput;
}
 
-   lock_page(page);
data = page_address(page);
if (file_pos == 0) {
struct reiserfs_xattr_header *rxh =
@@ -661,8 +637,7 @@ reiserfs_xattr_get(const struct inode *i
chunk -= skip;
/* Magic doesn't match up.. */
if (rxh->h_magic != cpu_to_le32(REISERFS_XATTR_MAGIC)) {
-   unlock_page(page);
-   reiserfs_put_page(page);
+   put_locked_page(page);
reiserfs_warning(inode->i_sb,
 "Invalid magic for xattr (%s) "
 "associated with %k", name,
@@ -673,8 +648,7 @@ reiserfs_xattr_get(const struct inode *i
hash = le32_to_cpu(rxh->h_hash);
}
memcpy(buffer + buffer_pos, data + skip, chunk);
-   unlock_page(page);
-   reiserfs_put_page(page);
+   put_locked_page(page);
file_pos += chunk;
buffer_pos += chunk;
skip = 0;
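
The hunks above suggest that __read_kmap_page() hands back a page that is
already locked and that put_locked_page() unlocks and releases it in one
call; that reading is inferred from the removed lock_page()/unlock_page()
lines, not from the helper definitions, which are not shown here. A
userspace sketch of the chunked read loop under that assumption, using a
tiny page size and hypothetical sk_ names:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SK_PAGE_SIZE 8u   /* tiny page so the loop is easy to trace */
#define SK_HDR_SIZE  2u   /* hypothetical header at the start of the file */

struct sk_page { int locked; int refs; char data[SK_PAGE_SIZE]; };

/* Locking get: returns the page referenced and already locked. */
static struct sk_page *sk_read_locked(struct sk_page *pages, size_t idx)
{
	struct sk_page *p = &pages[idx];

	assert(!p->locked);
	p->refs++;
	p->locked = 1;
	return p;
}

/* Matching put: unlock and drop the reference in one call. */
static void sk_put_locked(struct sk_page *p)
{
	assert(p->locked && p->refs > 0);
	p->locked = 0;
	p->refs--;
}

/* Copy the payload (everything after the header) of a file of `isize`
 * bytes into `buf`, one page at a time. Returns the bytes copied. */
static size_t sk_copy_payload(struct sk_page *pages, size_t isize, char *buf)
{
	size_t file_pos = 0, buf_pos = 0, skip = 0;

	assert(isize == 0 || isize >= SK_HDR_SIZE);
	while (file_pos < isize) {
		size_t chunk = isize - file_pos;
		struct sk_page *p;

		if (chunk > SK_PAGE_SIZE)
			chunk = SK_PAGE_SIZE;
		p = sk_read_locked(pages, file_pos / SK_PAGE_SIZE);
		if (file_pos == 0) {   /* first page starts with the header */
			skip = file_pos = SK_HDR_SIZE;
			chunk -= skip;
		}
		memcpy(buf + buf_pos, p->data + skip, chunk);
		sk_put_locked(p);
		file_pos += chunk;
		buf_pos += chunk;
		skip = 0;
	}
	return buf_pos;
}
```

Folding the unlock into the put keeps every exit from the loop body down to
a single call, which is what the put_locked_page() replacements do in the
error and success paths above.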

[PATCH 16/17] ufs: convert ufs_get_page to read_kmap_page

2007-04-11 Thread Nate Diller

Replace ufs_get_page()/ufs_get_locked_page() and
ufs_put_page()/ufs_put_locked_page() with the new read_kmap_page() and
put_kmapped_page() calls and their locking variants.  Also, change the
ufs_check_page() call to return the page's error status, and update the
call sites accordingly.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/ufs/balloc.c linux-2.6.21-rc5-mm4-test/fs/ufs/balloc.c
--- linux-2.6.21-rc5-mm4/fs/ufs/balloc.c 2007-04-05 17:13:29.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/ufs/balloc.c 2007-04-06 12:46:02.0 -0700
@@ -272,7 +272,7 @@ static void ufs_change_blocknr(struct in
index = i >> (PAGE_CACHE_SHIFT - inode->i_blkbits);
 
if (likely(cur_index != index)) {
-   page = ufs_get_locked_page(mapping, index);
+   page = __read_mapping_page(mapping, index, NULL);
if (!page)/* it was truncated */
continue;
if (IS_ERR(page)) {/* or EIO */
@@ -325,8 +325,10 @@ static void ufs_change_blocknr(struct in
bh = bh->b_this_page;
} while (bh != head);
 
-   if (likely(cur_index != index))
-   ufs_put_locked_page(page);
+   if (likely(cur_index != index)) {
+   unlock_page(page);
+   page_cache_release(page);
+   }
}
UFSD("EXIT\n");
 }
diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/ufs/truncate.c linux-2.6.21-rc5-mm4-test/fs/ufs/truncate.c
--- linux-2.6.21-rc5-mm4/fs/ufs/truncate.c 2007-04-05 17:13:29.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/ufs/truncate.c 2007-04-06 12:46:14.0 -0700
@@ -395,8 +395,9 @@ static int ufs_alloc_lastblock(struct in
 
lastfrag--;
 
-   lastpage = ufs_get_locked_page(mapping, lastfrag >>
-  (PAGE_CACHE_SHIFT - inode->i_blkbits));
+   lastpage = __read_mapping_page(mapping, lastfrag >>
+  (PAGE_CACHE_SHIFT - inode->i_blkbits),
+  NULL);
if (IS_ERR(lastpage)) {
err = -EIO;
goto out;
@@ -441,7 +442,8 @@ static int ufs_alloc_lastblock(struct in
   }
}
 out_unlock:
-   ufs_put_locked_page(lastpage);
+   unlock_page(lastpage);
+   page_cache_release(lastpage);
 out:
return err;
 }
diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/ufs/util.c linux-2.6.21-rc5-mm4-test/fs/ufs/util.c
--- linux-2.6.21-rc5-mm4/fs/ufs/util.c 2007-04-05 17:14:25.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/ufs/util.c 2007-04-06 12:40:53.0 -0700
@@ -232,55 +232,3 @@ ufs_set_inode_dev(struct super_block *sb
ufsi->i_u1.i_data[0] = cpu_to_fs32(sb, fs32);
 }
 
-/**
- * ufs_get_locked_page() - locate, pin and lock a pagecache page, if not exist
- * read it from disk.
- * @mapping: the address_space to search
- * @index: the page index
- *
- * Locates the desired pagecache page, if not exist we'll read it,
- * locks it, increments its reference
- * count and returns its address.
- *
- */
-
-struct page *ufs_get_locked_page(struct address_space *mapping,
-pgoff_t index)
-{
-   struct page *page;
-
-   page = find_lock_page(mapping, index);
-   if (!page) {
-   page = read_mapping_page(mapping, index, NULL);
-
-   if (IS_ERR(page)) {
-   printk(KERN_ERR "ufs_change_blocknr: "
-  "read_mapping_page error: ino %lu, index: %lu\n",
-  mapping->host->i_ino, index);
-   goto out;
-   }
-
-   lock_page(page);
-
-   if (unlikely(page->mapping == NULL)) {
-   /* Truncate got there first */
-   unlock_page(page);
-   page_cache_release(page);
-   page = NULL;
-   goto out;
-   }
-
-   if (!PageUptodate(page) || PageError(page)) {
-   unlock_page(page);
-   page_cache_release(page);
-
-   printk(KERN_ERR "ufs_change_blocknr: "
-  "can not read page: ino %lu, index: %lu\n",
-  mapping->host->i_ino, index);
-
-   page = ERR_PTR(-EIO);
-   }
-   }
-out:
-   return page;
-}
diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/ufs/util.h linux-2.6.21-rc5-mm4-test/fs/ufs/util.h
--- linux-2.6.21-rc5-mm4/fs/ufs/util.h 2007-04-05 17:13:29.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/ufs/util.h 2007-04-06 12:46:36.0 -0700
@@ -251,16 +251,6 @@ extern void _ubh_ubhcpymem_(struct ufs_s
 #define ubh_memcpyubh(ubh,mem,size) _ubh_
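
The removed ufs_get_locked_page() had three possible results, and the
balloc.c call site still tests for all of them: NULL when truncate got to
the page first, an encoded error when the read failed, and a valid page
otherwise. A userspace sketch of that convention follows; the sk_ names and
the two-flag page model are assumptions made for the example, not the UFS
code itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct sk_page { int has_mapping; int uptodate; };

enum sk_outcome { SK_OK, SK_TRUNCATED, SK_IO_ERROR };

/* True if the pointer encodes a negative errno (top 4095 addresses). */
static int sk_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-4095;
}

/* Model of the three-way result convention. */
static struct sk_page *sk_get_locked_page(struct sk_page *p)
{
	if (!p->has_mapping)
		return NULL;                              /* truncated */
	if (!p->uptodate)
		return (struct sk_page *)(intptr_t)-EIO;  /* read failed */
	return p;
}

/* Caller-side checks in the order used by the ufs_change_blocknr()
 * hunk: NULL first, then an encoded error. */
static enum sk_outcome sk_classify(const struct sk_page *page)
{
	if (!page)
		return SK_TRUNCATED;
	if (sk_is_err(page))
		return SK_IO_ERROR;
	return SK_OK;
}
```

NULL is not an encoded error, so a caller that only tested for the error
case would go on to dereference a NULL page after a truncate race.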

[PATCH 17/17] vxfs: convert vxfs_get_page to read_kmap_page

2007-04-11 Thread Nate Diller

Replace vxfs_get_page() with the new read_kmap_page().

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/freevxfs/vxfs_extern.h linux-2.6.21-rc5-mm4-test/fs/freevxfs/vxfs_extern.h
--- linux-2.6.21-rc5-mm4/fs/freevxfs/vxfs_extern.h 2007-04-05 17:13:29.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/freevxfs/vxfs_extern.h 2007-04-06 01:59:19.0 -0700
@@ -69,7 +69,6 @@ extern const struct file_operations   vxfs
 extern int vxfs_read_olt(struct super_block *, u_long);
 
 /* vxfs_subr.c */
-extern struct page *   vxfs_get_page(struct address_space *, u_long);
 extern void vxfs_put_page(struct page *);
 extern struct buffer_head *vxfs_bread(struct inode *, int);
 
diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/freevxfs/vxfs_inode.c linux-2.6.21-rc5-mm4-test/fs/freevxfs/vxfs_inode.c
--- linux-2.6.21-rc5-mm4/fs/freevxfs/vxfs_inode.c 2007-04-05 17:14:25.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/freevxfs/vxfs_inode.c 2007-04-06 01:59:19.0 -0700
@@ -138,7 +138,7 @@ __vxfs_iget(ino_t ino, struct inode *ili
u_long  offset;
 
offset = (ino % (PAGE_SIZE / VXFS_ISIZE)) * VXFS_ISIZE;
-   pp = vxfs_get_page(ilistp->i_mapping, ino * VXFS_ISIZE / PAGE_SIZE);
+   pp = read_kmap_page(ilistp->i_mapping, ino * VXFS_ISIZE / PAGE_SIZE);
 
if (!IS_ERR(pp)) {
struct vxfs_inode_info  *vip;
diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/freevxfs/vxfs_lookup.c linux-2.6.21-rc5-mm4-test/fs/freevxfs/vxfs_lookup.c
--- linux-2.6.21-rc5-mm4/fs/freevxfs/vxfs_lookup.c 2007-04-05 17:13:29.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/freevxfs/vxfs_lookup.c 2007-04-06 01:59:19.0 -0700
@@ -125,7 +125,7 @@ vxfs_find_entry(struct inode *ip, struct
caddr_t kaddr;
struct page *pp;
 
-   pp = vxfs_get_page(ip->i_mapping, page);
+   pp = read_kmap_page(ip->i_mapping, page);
if (IS_ERR(pp))
continue;
kaddr = (caddr_t)page_address(pp);
@@ -280,7 +280,7 @@ vxfs_readdir(struct file *fp, void *retp
caddr_t kaddr;
struct page *pp;
 
-   pp = vxfs_get_page(ip->i_mapping, page);
+   pp = read_kmap_page(ip->i_mapping, page);
if (IS_ERR(pp))
continue;
kaddr = (caddr_t)page_address(pp);
diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/freevxfs/vxfs_subr.c linux-2.6.21-rc5-mm4-test/fs/freevxfs/vxfs_subr.c
--- linux-2.6.21-rc5-mm4/fs/freevxfs/vxfs_subr.c 2007-04-05 17:14:25.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/freevxfs/vxfs_subr.c 2007-04-06 01:59:19.0 -0700
@@ -56,39 +56,6 @@ vxfs_put_page(struct page *pp)
 }
 
 /**
- * vxfs_get_page - read a page into memory.
- * @ip:inode to read from
- * @n: page number
- *
- * Description:
- *   vxfs_get_page reads the @n th page of @ip into the pagecache.
- *
- * Returns:
- *   The wanted page on success, else a NULL pointer.
- */
-struct page *
-vxfs_get_page(struct address_space *mapping, u_long n)
-{
-   struct page *   pp;
-
-   pp = read_mapping_page(mapping, n, NULL);
-
-   if (!IS_ERR(pp)) {
-   kmap(pp);
-   /** if (!PageChecked(pp)) **/
-   /** vxfs_check_page(pp); **/
-   if (PageError(pp))
-   goto fail;
-   }
-   
-   return (pp);
-
-fail:
-   vxfs_put_page(pp);
-   return ERR_PTR(-EIO);
-}
-
-/**
  * vxfs_bread - read buffer for a give inode,block tuple
  * @ip:inode
  * @block: logical block
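
The __vxfs_iget() hunk above locates an inode in the inode list file by
computing a page index and a byte offset within that page. The arithmetic
can be checked in isolation; in this sketch the 4KB page size and the
256-byte inode size are assumptions, and the sk_ names are hypothetical.

```c
#include <assert.h>

#define SK_PAGE_SIZE 4096ul   /* assumed page size */
#define SK_ISIZE     256ul    /* assumed on-disk inode size */

struct sk_inode_loc {
	unsigned long page;     /* page index in the inode list file */
	unsigned long offset;   /* byte offset of the inode in that page */
};

/* Same two expressions as the hunk, with the constants named above. */
static struct sk_inode_loc sk_locate_inode(unsigned long ino)
{
	struct sk_inode_loc loc;

	loc.page = ino * SK_ISIZE / SK_PAGE_SIZE;
	loc.offset = (ino % (SK_PAGE_SIZE / SK_ISIZE)) * SK_ISIZE;
	return loc;
}
```

With these sizes 16 inodes fit in a page, so inode 15 is the last one in
page 0 and inode 16 is the first one in page 1.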

[PATCH 13/17] reiser4: remove redundant read_mapping_page error checks

2007-04-11 Thread Nate Diller

read_mapping_page() is now fully synchronous, so there's no need to wait
for the page lock or check for I/O errors.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/reiser4/plugin/file/tail_conversion.c linux-2.6.21-rc6-mm1-test/fs/reiser4/plugin/file/tail_conversion.c
--- linux-2.6.21-rc6-mm1/fs/reiser4/plugin/file/tail_conversion.c 2007-04-09 17:24:03.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/reiser4/plugin/file/tail_conversion.c 2007-04-10 21:33:47.0 -0700
@@ -608,14 +608,6 @@ int extent2tail(unix_file_info_t *uf_inf
break;
}
 
-   wait_on_page_locked(page);
-
-   if (!PageUptodate(page)) {
-   page_cache_release(page);
-   result = RETERR(-EIO);
-   break;
-   }
-
/* cut part of file we have read */
start_byte = (__u64) (i << PAGE_CACHE_SHIFT);
set_key_offset(&from, start_byte);
diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/reiser4/plugin/item/extent_file_ops.c linux-2.6.21-rc6-mm1-test/fs/reiser4/plugin/item/extent_file_ops.c
--- linux-2.6.21-rc6-mm1/fs/reiser4/plugin/item/extent_file_ops.c 2007-04-10 19:41:14.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/reiser4/plugin/item/extent_file_ops.c 2007-04-10 21:38:41.0 -0700
@@ -1220,15 +1220,8 @@ int reiser4_read_extent(struct file *fil
page = read_mapping_page(mapping, cur_page, file);
if (IS_ERR(page))
return PTR_ERR(page);
-   lock_page(page);
-   if (!PageUptodate(page)) {
-   unlock_page(page);
-   page_cache_release(page);
-   warning("jmacd-97178", "extent_read: page is not up to date");
-   return RETERR(-EIO);
-   }
+
mark_page_accessed(page);
-   unlock_page(page);
 
/* If users can be writing to this page using arbitrary virtual
   addresses, take care about potential aliasing before reading

[PATCH 12/17] partition: remove redundant read_mapping_page error checks

2007-04-11 Thread Nate Diller

Remove unneeded PageError checking in read_dev_sector(), and clean up the
code a bit.

Can anyone point out why it's OK to use page_address() here on a page which
has not been kmapped?  If it's not OK, then a good number of callers need to
be fixed.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/partitions/check.c linux-2.6.21-rc6-mm1-test/fs/partitions/check.c
--- linux-2.6.21-rc6-mm1/fs/partitions/check.c 2007-04-09 17:24:03.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/partitions/check.c 2007-04-10 21:59:01.0 -0700
@@ -568,16 +568,12 @@ unsigned char *read_dev_sector(struct bl
 
page = read_mapping_page(mapping, (pgoff_t)(n >> (PAGE_CACHE_SHIFT-9)),
 NULL);
-   if (!IS_ERR(page)) {
-   if (PageError(page))
-   goto fail;
-   p->v = page;
-   return (unsigned char *)page_address(page) +  ((n & ((1 << (PAGE_CACHE_SHIFT - 9)) - 1)) << 9);
-fail:
-   page_cache_release(page);
+   if (IS_ERR(page)) {
+   p->v = NULL;
+   return NULL;
}
-   p->v = NULL;
-   return NULL;
+   p->v = page;
+   return (unsigned char *)page_address(page) +  ((n & ((1 << (PAGE_CACHE_SHIFT - 9)) - 1)) << 9);
 }
 
 EXPORT_SYMBOL(read_dev_sector);
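
The two shift expressions in read_dev_sector() split a 512-byte sector
number into a page-cache index and a byte offset within that page. A small
userspace check of that arithmetic, assuming 4KB pages (a page shift of 12);
the sk_ names are hypothetical:

```c
#include <assert.h>

#define SK_PAGE_SHIFT   12u   /* assumed 4KB pages */
#define SK_SECTOR_SHIFT 9u    /* 512-byte sectors, the "9" in the patch */

/* Page-cache index that holds sector n. */
static unsigned long sk_sector_page(unsigned long n)
{
	return n >> (SK_PAGE_SHIFT - SK_SECTOR_SHIFT);
}

/* Byte offset of sector n inside that page. */
static unsigned long sk_sector_offset(unsigned long n)
{
	unsigned long per_page = 1ul << (SK_PAGE_SHIFT - SK_SECTOR_SHIFT);

	return (n & (per_page - 1)) << SK_SECTOR_SHIFT;
}
```

With these values eight sectors share a page, so sector 7 sits at offset
3584 of page 0 and sector 8 starts page 1.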

[PATCH 15/17] sysv: convert dir_get_page to read_kmap_page

2007-04-11 Thread Nate Diller

Replace sysv dir_get_page() with the new read_kmap_page().

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/sysv/dir.c linux-2.6.21-rc5-mm4-test/fs/sysv/dir.c
--- linux-2.6.21-rc5-mm4/fs/sysv/dir.c 2007-04-05 17:14:25.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/sysv/dir.c 2007-04-06 01:59:19.0 -0700
@@ -50,15 +50,6 @@ static int dir_commit_chunk(struct page 
return err;
 }
 
-static struct page * dir_get_page(struct inode *dir, unsigned long n)
-{
-   struct address_space *mapping = dir->i_mapping;
-   struct page *page = read_mapping_page(mapping, n, NULL);
-   if (!IS_ERR(page))
-   kmap(page);
-   return page;
-}
-
 static int sysv_readdir(struct file * filp, void * dirent, filldir_t filldir)
 {
unsigned long pos = filp->f_pos;
@@ -77,7 +68,7 @@ static int sysv_readdir(struct file * fi
for ( ; n < npages; n++, offset = 0) {
char *kaddr, *limit;
struct sysv_dir_entry *de;
-   struct page *page = dir_get_page(inode, n);
+   struct page *page = read_kmap_page(inode->i_mapping, n);
 
if (IS_ERR(page))
continue;
@@ -149,7 +140,7 @@ struct sysv_dir_entry *sysv_find_entry(s
 
do {
char *kaddr;
-   page = dir_get_page(dir, n);
+   page = read_kmap_page(dir->i_mapping, n);
if (!IS_ERR(page)) {
kaddr = (char*)page_address(page);
de = (struct sysv_dir_entry *) kaddr;
@@ -191,7 +182,7 @@ int sysv_add_link(struct dentry *dentry,
 
/* We take care of directory expansion in the same loop */
for (n = 0; n <= npages; n++) {
-   page = dir_get_page(dir, n);
+   page = read_kmap_page(dir->i_mapping, n);
err = PTR_ERR(page);
if (IS_ERR(page))
goto out;
@@ -299,7 +290,7 @@ int sysv_empty_dir(struct inode * inode)
for (i = 0; i < npages; i++) {
char *kaddr;
struct sysv_dir_entry * de;
-   page = dir_get_page(inode, i);
+   page = read_kmap_page(inode->i_mapping, i);
 
if (IS_ERR(page))
continue;
@@ -353,7 +344,7 @@ void sysv_set_link(struct sysv_dir_entry
 
 struct sysv_dir_entry * sysv_dotdot (struct inode *dir, struct page **p)
 {
-   struct page *page = dir_get_page(dir, 0);
+   struct page *page = read_kmap_page(dir->i_mapping, 0);
struct sysv_dir_entry *de = NULL;
 
if (!IS_ERR(page)) {

[PATCH 9/17] minix: convert dir_get_page to read_kmap_page

2007-04-11 Thread Nate Diller

Replace minix dir_get_page() and dir_put_page() with the new
read_kmap_page() and put_kmapped_page()/put_locked_page() calls.  Also, use
__read_kmap_page() instead of re-taking the page lock.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/minix/dir.c linux-2.6.21-rc5-mm4-test/fs/minix/dir.c
--- linux-2.6.21-rc5-mm4/fs/minix/dir.c 2007-04-05 17:14:25.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/minix/dir.c 2007-04-06 02:31:55.0 -0700
@@ -23,12 +23,6 @@ const struct file_operations minix_dir_o
.fsync  = minix_sync_file,
 };
 
-static inline void dir_put_page(struct page *page)
-{
-   kunmap(page);
-   page_cache_release(page);
-}
-
 /*
  * Return the offset into page `page_nr' of the last valid
  * byte in that page, plus one.
@@ -60,22 +54,6 @@ static int dir_commit_chunk(struct page 
return err;
 }
 
-static struct page * dir_get_page(struct inode *dir, unsigned long n)
-{
-   struct address_space *mapping = dir->i_mapping;
-   struct page *page = read_mapping_page(mapping, n, NULL);
-   if (!IS_ERR(page)) {
-   kmap(page);
-   if (!PageUptodate(page))
-   goto fail;
-   }
-   return page;
-
-fail:
-   dir_put_page(page);
-   return ERR_PTR(-EIO);
-}
-
 static inline void *minix_next_entry(void *de, struct minix_sb_info *sbi)
 {
return (void*)((char*)de + sbi->s_dirsize);
@@ -102,7 +80,7 @@ static int minix_readdir(struct file * f
 
for ( ; n < npages; n++, offset = 0) {
char *p, *kaddr, *limit;
-   struct page *page = dir_get_page(inode, n);
+   struct page *page = read_kmap_page(inode->i_mapping, n);
 
if (IS_ERR(page))
continue;
@@ -128,12 +106,12 @@ static int minix_readdir(struct file * f
(n << PAGE_CACHE_SHIFT) | offset,
inumber, DT_UNKNOWN);
if (over) {
-   dir_put_page(page);
+   put_kmapped_page(page);
goto done;
}
}
}
-   dir_put_page(page);
+   put_kmapped_page(page);
}
 
 done:
@@ -177,7 +155,7 @@ minix_dirent *minix_find_entry(struct de
for (n = 0; n < npages; n++) {
char *kaddr, *limit;
 
-   page = dir_get_page(dir, n);
+   page = read_kmap_page(dir->i_mapping, n);
if (IS_ERR(page))
continue;
 
@@ -198,7 +176,7 @@ minix_dirent *minix_find_entry(struct de
if (namecompare(namelen, sbi->s_namelen, name, namx))
goto found;
}
-   dir_put_page(page);
+   put_kmapped_page(page);
}
return NULL;
 
@@ -233,11 +211,10 @@ int minix_add_link(struct dentry *dentry
for (n = 0; n <= npages; n++) {
char *limit, *dir_end;
 
-   page = dir_get_page(dir, n);
+   page = __read_kmap_page(dir->i_mapping, n);
err = PTR_ERR(page);
if (IS_ERR(page))
goto out;
-   lock_page(page);
kaddr = (char*)page_address(page);
dir_end = kaddr + minix_last_byte(dir, n);
limit = kaddr + PAGE_CACHE_SIZE - sbi->s_dirsize;
@@ -265,8 +242,7 @@ int minix_add_link(struct dentry *dentry
if (namecompare(namelen, sbi->s_namelen, name, namx))
goto out_unlock;
}
-   unlock_page(page);
-   dir_put_page(page);
+   put_locked_page(page);
}
BUG();
return -EINVAL;
@@ -288,13 +264,12 @@ got_it:
err = dir_commit_chunk(page, from, to);
dir->i_mtime = dir->i_ctime = CURRENT_TIME_SEC;
mark_inode_dirty(dir);
-out_put:
-   dir_put_page(page);
+   put_kmapped_page(page);
 out:
return err;
 out_unlock:
-   unlock_page(page);
-   goto out_put;
+   put_locked_page(page);
+   return err;
 }
 
 int minix_delete_entry(struct minix_dir_entry *de, struct page *page)
@@ -314,7 +289,7 @@ int minix_delete_entry(struct minix_dir_
} else {
unlock_page(page);
}
-   dir_put_page(page);
+   put_kmapped_page(page);
inode->i_ctime = inode->i_mtime = CURRENT_TIME_SEC;
mark_inode_dirty(inode);
return err;
@@ -378,7 +353,7 @@ int minix_empty_dir(struct inode * inode
for (i = 0; i < npages; i++) {
char *p, *kaddr, *limit;
 
-   page = dir_get_page(inode, i);
+   page = read_kmap_page(inode->i_mapping, i);
if (IS_ERR(page)

[PATCH 10/17] mtd: convert page_read to read_kmap_page

2007-04-11 Thread Nate Diller

Replace page_read() with read_kmap_page()/__read_kmap_page().  This probably
fixes behaviour on highmem systems, since page_address() was being used
without kmap().  Also eliminate the need to re-take the page lock during
writes to the page.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/drivers/mtd/devices/block2mtd.c linux-2.6.21-rc5-mm4-test/drivers/mtd/devices/block2mtd.c
--- linux-2.6.21-rc5-mm4/drivers/mtd/devices/block2mtd.c 2007-04-05 17:14:24.0 -0700
+++ linux-2.6.21-rc5-mm4-test/drivers/mtd/devices/block2mtd.c 2007-04-06 01:59:19.0 -0700
@@ -39,12 +39,6 @@ struct block2mtd_dev {
 /* Static info about the MTD, used in cleanup_module */
 static LIST_HEAD(blkmtd_device_list);
 
-
-static struct page *page_read(struct address_space *mapping, int index)
-{
-   return read_mapping_page(mapping, index, NULL);
-}
-
 /* erase a specified part of the device */
 static int _block2mtd_erase(struct block2mtd_dev *dev, loff_t to, size_t len)
 {
@@ -56,23 +50,19 @@ static int _block2mtd_erase(struct block
u_long *max;
 
while (pages) {
-   page = page_read(mapping, index);
-   if (!page)
-   return -ENOMEM;
+   page = __read_kmap_page(mapping, index);
if (IS_ERR(page))
return PTR_ERR(page);
 
max = page_address(page) + PAGE_SIZE;
for (p=page_address(page); p<max; p++)
[...]
-   page = page_read(dev->blkdev->bd_inode->i_mapping, index);
-   if (!page)
-   return -ENOMEM;
+   page = read_kmap_page(dev->blkdev->bd_inode->i_mapping, index);
if (IS_ERR(page))
return PTR_ERR(page);
 
memcpy(buf, page_address(page) + offset, cpylen);
-   page_cache_release(page);
+   put_kmapped_page(page);
 
if (retlen)
*retlen += cpylen;
@@ -163,19 +151,15 @@ static int _block2mtd_write(struct block
cpylen = len;   // this page
len = len - cpylen;
 
-   page = page_read(mapping, index);
-   if (!page)
-   return -ENOMEM;
+   page = __read_kmap_page(mapping, index);
if (IS_ERR(page))
return PTR_ERR(page);
 
if (memcmp(page_address(page)+offset, buf, cpylen)) {
-   lock_page(page);
memcpy(page_address(page) + offset, buf, cpylen);
set_page_dirty(page);
-   unlock_page(page);
}
-   page_cache_release(page);
+   put_locked_page(page);
 
if (retlen)
*retlen += cpylen;
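
The write path above only copies into the page, and only marks it dirty,
when the new bytes differ from what the page already holds. A userspace
sketch of that compare-before-copy step, with a hypothetical sk_page
structure and a plain flag standing in for set_page_dirty():

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct sk_page {
	int dirty;                 /* stands in for the page dirty state */
	unsigned char data[64];
};

/* Copy `len` bytes into the page at `offset` only if they differ from
 * the current contents. Returns 1 if the page was modified, else 0. */
static int sk_write_if_changed(struct sk_page *p, size_t offset,
			       const void *buf, size_t len)
{
	assert(offset <= sizeof(p->data));
	assert(len <= sizeof(p->data) - offset);

	if (memcmp(p->data + offset, buf, len) == 0)
		return 0;          /* identical: leave the page clean */
	memcpy(p->data + offset, buf, len);
	p->dirty = 1;
	return 1;
}
```

Skipping the copy for identical data means an unchanged page is never
marked dirty and so never has to be written back.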

[PATCH 5/17] hfsplus: remove redundant read_mapping_page error check

2007-04-11 Thread Nate Diller

Now that read_mapping_page() does error checking internally, there is no
need to check PageError here.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]> 

--- 

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/hfsplus/bnode.c linux-2.6.21-rc6-mm1-test/fs/hfsplus/bnode.c
--- linux-2.6.21-rc6-mm1/fs/hfsplus/bnode.c 2007-04-09 17:20:13.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/hfsplus/bnode.c 2007-04-10 21:28:45.0 -0700
@@ -442,10 +442,6 @@ static struct hfs_bnode *__hfs_bnode_cre
page = read_mapping_page(mapping, block, NULL);
if (IS_ERR(page))
goto fail;
-   if (PageError(page)) {
-   page_cache_release(page);
-   goto fail;
-   }
page_cache_release(page);
node->page[i] = page;
}

[PATCH 8/17] jfs: use locking read_mapping_page

2007-04-11 Thread Nate Diller

Use the new locking variant of read_mapping_page to avoid doing extra work.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/jfs/jfs_metapage.c 
linux-2.6.21-rc6-mm1-test/fs/jfs/jfs_metapage.c
--- linux-2.6.21-rc6-mm1/fs/jfs/jfs_metapage.c  2007-04-09 17:23:48.0 
-0700
+++ linux-2.6.21-rc6-mm1-test/fs/jfs/jfs_metapage.c 2007-04-09 
21:37:09.0 -0700
@@ -632,12 +632,11 @@ struct metapage *__get_metapage(struct i
}
SetPageUptodate(page);
} else {
-   page = read_mapping_page(mapping, page_index, NULL);
-   if (IS_ERR(page) || !PageUptodate(page)) {
+   page = __read_mapping_page(mapping, page_index, NULL);
+   if (IS_ERR(page)) {
jfs_err("read_mapping_page failed!");
return NULL;
}
-   lock_page(page);
}
 
mp = page_to_mp(page, page_offset);

[PATCH 7/17] jffs2: convert jffs2_gc_fetch_page to read_cache_page

2007-04-11 Thread Nate Diller

Replace jffs2_gc_fetch_page() and jffs2_gc_release_page() with direct calls
to read_cache_page(), kmap(), kunmap() and page_cache_release(), and update
the call site accordingly.  Explicit calls to kmap()/kunmap() make the code
clearer.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/jffs2/fs.c 
linux-2.6.21-rc5-mm4-test/fs/jffs2/fs.c
--- linux-2.6.21-rc5-mm4/fs/jffs2/fs.c  2007-04-05 17:14:25.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/jffs2/fs.c 2007-04-06 01:59:19.0 
-0700
@@ -621,33 +621,6 @@ struct jffs2_inode_info *jffs2_gc_fetch_
return JFFS2_INODE_INFO(inode);
 }
 
-unsigned char *jffs2_gc_fetch_page(struct jffs2_sb_info *c,
-  struct jffs2_inode_info *f,
-  unsigned long offset,
-  unsigned long *priv)
-{
-   struct inode *inode = OFNI_EDONI_2SFFJ(f);
-   struct page *pg;
-
-   pg = read_cache_page(inode->i_mapping, offset >> PAGE_CACHE_SHIFT,
-(void *)jffs2_do_readpage_unlock, inode);
-   if (IS_ERR(pg))
-   return (void *)pg;
-
-   *priv = (unsigned long)pg;
-   return kmap(pg);
-}
-
-void jffs2_gc_release_page(struct jffs2_sb_info *c,
-  unsigned char *ptr,
-  unsigned long *priv)
-{
-   struct page *pg = (void *)*priv;
-
-   kunmap(pg);
-   page_cache_release(pg);
-}
-
 static int jffs2_flash_setup(struct jffs2_sb_info *c) {
int ret = 0;
 
diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/jffs2/gc.c 
linux-2.6.21-rc5-mm4-test/fs/jffs2/gc.c
--- linux-2.6.21-rc5-mm4/fs/jffs2/gc.c  2007-04-05 17:13:10.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/jffs2/gc.c 2007-04-06 01:59:19.0 
-0700
@@ -1078,7 +1078,7 @@ static int jffs2_garbage_collect_dnode(s
uint32_t alloclen, offset, orig_end, orig_start;
int ret = 0;
unsigned char *comprbuf = NULL, *writebuf;
-   unsigned long pg;
+   struct page *page;
unsigned char *pg_ptr;
 
memset(&ri, 0, sizeof(ri));
@@ -1219,12 +1219,16 @@ static int jffs2_garbage_collect_dnode(s
 *page OK. We'll actually write it out again in commit_write, which 
is a little
 *suboptimal, but at least we're correct.
 */
-   pg_ptr = jffs2_gc_fetch_page(c, f, start, &pg);
+   page = read_cache_page(OFNI_EDONI_2SFFJ(f)->i_mapping,
+   start >> PAGE_CACHE_SHIFT,
+   (void *)jffs2_do_readpage_unlock,
+   OFNI_EDONI_2SFFJ(f));
 
-   if (IS_ERR(pg_ptr)) {
+   if (IS_ERR(page)) {
-   printk(KERN_WARNING "read_cache_page() returned error: %ld\n", PTR_ERR(pg_ptr));
+   printk(KERN_WARNING "read_cache_page() returned error: %ld\n", PTR_ERR(page));
-   return PTR_ERR(pg_ptr);
+   return PTR_ERR(page);
}
+   pg_ptr = kmap(page);
 
offset = start;
while(offset < orig_end) {
@@ -1287,6 +1291,7 @@ static int jffs2_garbage_collect_dnode(s
}
}
 
-   jffs2_gc_release_page(c, pg_ptr, &pg);
+   kunmap(page);
+   page_cache_release(page);
return ret;
 }

[PATCH 3/17] afs: convert afs_dir_get_page to read_kmap_page

2007-04-11 Thread Nate Diller

Replace afs_dir_get_page() and afs_dir_put_page() using the new
read_kmap_page() and put_kmapped_page() calls, and eliminate unnecessary
PageError checks.  Also, change the afs_dir_check_page() call to return
the page's error status, and update the call site accordingly.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/afs/dir.c 
linux-2.6.21-rc5-mm4-test/fs/afs/dir.c
--- linux-2.6.21-rc5-mm4/fs/afs/dir.c   2007-04-06 12:27:03.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/afs/dir.c  2007-04-06 14:30:22.0 
-0700
@@ -115,12 +115,15 @@ struct afs_dir_lookup_cookie {
 /*
  * check that a directory page is valid
  */
-static inline void afs_dir_check_page(struct inode *dir, struct page *page)
+static inline int afs_dir_check_page(struct inode *dir, struct page *page)
 {
struct afs_dir_page *dbuf;
loff_t latter;
int tmp, qty;
 
+   if (likely(PageChecked(page)))
+   return PageError(page);
+
 #if 0
/* check the page count */
qty = desc.size / sizeof(dbuf->blocks[0]);
@@ -154,52 +157,16 @@ static inline void afs_dir_check_page(st
}
 
SetPageChecked(page);
-   return;
+   return 0;
 
  error:
SetPageChecked(page);
SetPageError(page);
-
+   return 1;
 } /* end afs_dir_check_page() */
 
 /*/
 /*
- * discard a page cached in the pagecache
- */
-static inline void afs_dir_put_page(struct page *page)
-{
-   kunmap(page);
-   page_cache_release(page);
-
-} /* end afs_dir_put_page() */
-
-/*/
-/*
- * get a page into the pagecache
- */
-static struct page *afs_dir_get_page(struct inode *dir, unsigned long index)
-{
-   struct page *page;
-
-   _enter("{%lu},%lu", dir->i_ino, index);
-
-   page = read_mapping_page(dir->i_mapping, index, NULL);
-   if (!IS_ERR(page)) {
-   kmap(page);
-   if (!PageChecked(page))
-   afs_dir_check_page(dir, page);
-   if (PageError(page))
-   goto fail;
-   }
-   return page;
-
- fail:
-   afs_dir_put_page(page);
-   return ERR_PTR(-EIO);
-} /* end afs_dir_get_page() */
-
-/*/
-/*
  * open an AFS directory file
  */
 static int afs_dir_open(struct inode *inode, struct file *file)
@@ -344,11 +311,16 @@ static int afs_dir_iterate(struct inode 
blkoff = *fpos & ~(sizeof(union afs_dir_block) - 1);
 
/* fetch the appropriate page from the directory */
-   page = afs_dir_get_page(dir, blkoff / PAGE_SIZE);
+   page = read_kmap_page(dir->i_mapping, blkoff / PAGE_SIZE);
if (IS_ERR(page)) {
ret = PTR_ERR(page);
break;
}
+   if (afs_dir_check_page(dir, page)) {
+   ret = -EIO;
+   put_kmapped_page(page);
+   break;
+   }
 
limit = blkoff & ~(PAGE_SIZE - 1);
 
@@ -361,7 +333,7 @@ static int afs_dir_iterate(struct inode 
ret = afs_dir_iterate_block(fpos, dblock, blkoff,
cookie, filldir);
if (ret != 1) {
-   afs_dir_put_page(page);
+   put_kmapped_page(page);
goto out;
}
 
@@ -369,7 +341,7 @@ static int afs_dir_iterate(struct inode 
 
} while (*fpos < dir->i_size && blkoff < limit);
 
-   afs_dir_put_page(page);
+   put_kmapped_page(page);
ret = 0;
}
 
diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/afs/mntpt.c 
linux-2.6.21-rc6-mm1-test/fs/afs/mntpt.c
--- linux-2.6.21-rc6-mm1/fs/afs/mntpt.c 2007-04-09 17:24:03.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/afs/mntpt.c2007-04-10 21:22:07.0 
-0700
@@ -74,11 +74,6 @@ int afs_mntpt_check_symlink(struct afs_v
ret = PTR_ERR(page);
goto out;
}
-
-   ret = -EIO;
-   if (PageError(page))
-   goto out_free;
-
buf = kmap(page);
 
/* examine the symlink's contents */
@@ -98,7 +93,6 @@ int afs_mntpt_check_symlink(struct afs_v
ret = 0;
 
kunmap(page);
- out_free:
page_cache_release(page);
  out:
_leave(" = %d", ret);
@@ -180,10 +174,6 @@ static struct vfsmount *afs_mntpt_do_aut
goto error;
}
 
-   ret = -EIO;
-   if (PageError(page))
-   goto error;
-
buf = kmap(page);
memcpy(devname, buf, size);
kunmap(page);

[PATCH 4/17] ext2: convert ext2_get_page to read_kmap_page

2007-04-11 Thread Nate Diller

Replace ext2_get_page() and ext2_put_page() using the new read_kmap_page()
and put_kmapped_page() calls.  Also, change the ext2_check_page() call to
return the page's error status, and update the call sites accordingly.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc5-mm4/fs/ext2/dir.c 
linux-2.6.21-rc5-mm4-test/fs/ext2/dir.c
--- linux-2.6.21-rc5-mm4/fs/ext2/dir.c  2007-04-06 12:27:03.0 -0700
+++ linux-2.6.21-rc5-mm4-test/fs/ext2/dir.c 2007-04-06 14:34:23.0 
-0700
@@ -35,12 +35,6 @@ static inline unsigned ext2_chunk_size(s
return inode->i_sb->s_blocksize;
 }
 
-static inline void ext2_put_page(struct page *page)
-{
-   kunmap(page);
-   page_cache_release(page);
-}
-
 static inline unsigned long dir_pages(struct inode *inode)
 {
return (inode->i_size+PAGE_CACHE_SIZE-1)>>PAGE_CACHE_SHIFT;
@@ -74,7 +68,7 @@ static int ext2_commit_chunk(struct page
return err;
 }
 
-static void ext2_check_page(struct page *page)
+static int ext2_check_page(struct page *page)
 {
struct inode *dir = page->mapping->host;
struct super_block *sb = dir->i_sb;
@@ -86,6 +80,14 @@ static void ext2_check_page(struct page 
ext2_dirent *p;
char *error;
 
+   if (likely(PageChecked(page))) {
+   if (likely(!PageError(page)))
+   return 0;
+
+   put_kmapped_page(page);
+   return -EIO;
+   }
+
if ((dir->i_size >> PAGE_CACHE_SHIFT) == page->index) {
limit = dir->i_size & ~PAGE_CACHE_MASK;
if (limit & (chunk_size - 1))
@@ -112,7 +114,7 @@ static void ext2_check_page(struct page 
goto Eend;
 out:
SetPageChecked(page);
-   return;
+   return 0;
 
/* Too bad, we had an error */
 
@@ -153,24 +155,8 @@ Eend:
 fail:
SetPageChecked(page);
SetPageError(page);
-}
-
-static struct page * ext2_get_page(struct inode *dir, unsigned long n)
-{
-   struct address_space *mapping = dir->i_mapping;
-   struct page *page = read_mapping_page(mapping, n, NULL);
-   if (!IS_ERR(page)) {
-   kmap(page);
-   if (!PageChecked(page))
-   ext2_check_page(page);
-   if (PageError(page))
-   goto fail;
-   }
-   return page;
-
-fail:
-   ext2_put_page(page);
-   return ERR_PTR(-EIO);
+   put_kmapped_page(page);
+   return -EIO;
 }
 
 /*
@@ -262,9 +248,9 @@ ext2_readdir (struct file * filp, void *
for ( ; n < npages; n++, offset = 0) {
char *kaddr, *limit;
ext2_dirent *de;
-   struct page *page = ext2_get_page(inode, n);
+   struct page *page = read_kmap_page(inode->i_mapping, n);
 
-   if (IS_ERR(page)) {
+   if (IS_ERR(page) || ext2_check_page(page)) {
ext2_error(sb, __FUNCTION__,
   "bad page in #%lu",
   inode->i_ino);
@@ -286,7 +272,7 @@ ext2_readdir (struct file * filp, void *
if (de->rec_len == 0) {
ext2_error(sb, __FUNCTION__,
"zero-length directory entry");
-   ext2_put_page(page);
+   put_kmapped_page(page);
return -EIO;
}
if (de->inode) {
@@ -301,13 +287,13 @@ ext2_readdir (struct file * filp, void *
(nf_pos += le16_to_cpu(de->rec_len);
}
-   ext2_put_page(page);
+   put_kmapped_page(page);
}
return 0;
 }
@@ -344,8 +330,8 @@ struct ext2_dir_entry_2 * ext2_find_entr
n = start;
do {
char *kaddr;
-   page = ext2_get_page(dir, n);
-   if (!IS_ERR(page)) {
+   page = read_kmap_page(dir->i_mapping, n);
+   if (!IS_ERR(page) && !ext2_check_page(page)) {
kaddr = page_address(page);
de = (ext2_dirent *) kaddr;
kaddr += ext2_last_byte(dir, n) - reclen;
@@ -353,14 +339,14 @@ struct ext2_dir_entry_2 * ext2_find_entr
if (de->rec_len == 0) {
ext2_error(dir->i_sb, __FUNCTION__,
"zero-length directory entry");
-   ext2_put_page(page);
+

[PATCH 2/17] fs: introduce new read_cache_page interface

2007-04-11 Thread Nate Diller

Export a single version of read_cache_page, which returns with a locked,
Uptodate page or a synchronous error, and use inline helper functions to
replicate the old behavior.  Also, introduce new helper functions for the
most common file system uses, which include kmapping the page, as well as
needing to keep the page locked.  These changes collectively eliminate a
substantial amount of private fs logic in favor of generic code.

It also simplifies filemap.c significantly, by assuming that callers want
synchronous behavior, which is true for all callers anyway except one.
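
The layering described above can be sketched in userspace as follows.  This
is only a rough model, not the kernel code: struct mock_page and all the
mock_* names are hypothetical stand-ins, and error handling (IS_ERR) is left
out.  It shows the one property that matters, namely that the locked variant
is the primitive and the other helpers are thin wrappers around it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for struct page; every field here is hypothetical. */
struct mock_page {
	int refcount;	/* page cache references held by callers */
	bool locked;	/* models the page lock */
	int kmap_count;	/* models kmap()/kunmap() nesting */
};

/* Models __read_cache_page(): returns a referenced, locked page. */
static struct mock_page *mock_read_locked(struct mock_page *page)
{
	page->refcount++;
	page->locked = true;
	return page;
}

static void mock_unlock(struct mock_page *page)
{
	assert(page->locked);
	page->locked = false;
}

/* Models read_cache_page(): same page, but unlocked on return. */
static struct mock_page *mock_read(struct mock_page *page)
{
	struct mock_page *p = mock_read_locked(page);

	mock_unlock(p);
	return p;
}

/* Models read_kmap_page(): unlocked and mapped. */
static struct mock_page *mock_read_kmap(struct mock_page *page)
{
	struct mock_page *p = mock_read(page);

	p->kmap_count++;
	return p;
}

/* Models __read_kmap_page(): locked and mapped. */
static struct mock_page *mock_read_kmap_locked(struct mock_page *page)
{
	struct mock_page *p = mock_read_locked(page);

	p->kmap_count++;
	return p;
}

/* Models put_kmapped_page(): unmap, then drop the reference. */
static void mock_put_kmapped(struct mock_page *page)
{
	assert(page->kmap_count > 0 && page->refcount > 0);
	page->kmap_count--;
	page->refcount--;
}

/* Models put_locked_page(): unlock, then put_kmapped_page(). */
static void mock_put_locked(struct mock_page *page)
{
	mock_unlock(page);
	mock_put_kmapped(page);
}
```

Each read helper pairs with exactly one put helper, so a caller that picks
the locked variant has to release through the locked put.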

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/include/linux/pagemap.h 
linux-2.6.21-rc6-mm1-test/include/linux/pagemap.h
--- linux-2.6.21-rc6-mm1/include/linux/pagemap.h2007-04-11 
14:22:19.0 -0700
+++ linux-2.6.21-rc6-mm1-test/include/linux/pagemap.h   2007-04-11 
14:29:31.0 -0700
@@ -108,21 +108,30 @@ static inline struct page *grab_cache_pa
 
 extern struct page * grab_cache_page_nowait(struct address_space *mapping,
unsigned long index);
-extern struct page * read_cache_page_async(struct address_space *mapping,
-   unsigned long index, filler_t *filler,
-   void *data);
-extern struct page * read_cache_page(struct address_space *mapping,
+extern struct page *__read_cache_page(struct address_space *mapping,
unsigned long index, filler_t *filler,
void *data);
 extern int read_cache_pages(struct address_space *mapping,
struct list_head *pages, filler_t *filler, void *data);
 
-static inline struct page *read_mapping_page_async(
-   struct address_space *mapping,
+void fastcall unlock_page(struct page *page);
+static inline struct page *read_cache_page(struct address_space *mapping,
+   unsigned long index, filler_t *filler,
+   void *data)
+{
+   struct page *page;
+
+   page = __read_cache_page(mapping, index, filler, data);
+   if (!IS_ERR(page))
+   unlock_page(page);
+   return page;
+}
+
+static inline struct page *__read_mapping_page(struct address_space *mapping,
 unsigned long index, void *data)
 {
filler_t *filler = (filler_t *)mapping->a_ops->readpage;
-   return read_cache_page_async(mapping, index, filler, data);
+   return __read_cache_page(mapping, index, filler, data);
 }
 
 static inline struct page *read_mapping_page(struct address_space *mapping,
@@ -132,6 +141,36 @@ static inline struct page *read_mapping_
return read_cache_page(mapping, index, filler, data);
 }
 
+static inline struct page *__read_kmap_page(struct address_space *mapping,
+ unsigned long index)
+{
+   struct page *page = __read_mapping_page(mapping, index, NULL);
+   if (!IS_ERR(page))
+   kmap(page);
+   return page;
+}
+
+static inline struct page *read_kmap_page(struct address_space *mapping,
+ unsigned long index)
+{
+   struct page *page = read_mapping_page(mapping, index, NULL);
+   if (!IS_ERR(page))
+   kmap(page);
+   return page;
+}
+
+static inline void put_kmapped_page(struct page *page)
+{
+   kunmap(page);
+   page_cache_release(page);
+}
+
+static inline void put_locked_page(struct page *page)
+{
+   unlock_page(page);
+   put_kmapped_page(page);
+}
+
 int add_to_page_cache(struct page *page, struct address_space *mapping,
unsigned long index, gfp_t gfp_mask);
 int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/mm/filemap.c 
linux-2.6.21-rc6-mm1-test/mm/filemap.c
--- linux-2.6.21-rc6-mm1/mm/filemap.c   2007-04-11 14:26:42.0 -0700
+++ linux-2.6.21-rc6-mm1-test/mm/filemap.c  2007-04-10 21:46:03.0 
-0700
@@ -1600,115 +1600,53 @@ int generic_file_readonly_mmap(struct fi
 EXPORT_SYMBOL(generic_file_mmap);
 EXPORT_SYMBOL(generic_file_readonly_mmap);
 
-static struct page *__read_cache_page(struct address_space *mapping,
-   unsigned long index,
-   int (*filler)(void *,struct page*),
-   void *data)
-{
-   struct page *page, *cached_page = NULL;
-   int err;
-repeat:
-   page = find_get_page(mapping, index);
-   if (!page) {
-   if (!cached_page) {
-   cached_page = page_cache_alloc_cold(mapping);
-   if (!cached_page)
-   return ERR_PTR(-ENOMEM);
-   }
-   err = add_to_page_cache_lru(cached_page, mapping,
-   index,

[PATCH 1/17] cramfs: use read_mapping_page

2007-04-11 Thread Nate Diller

read_mapping_page_async() is going away, so convert its only user to
read_mapping_page().  This change has not been benchmarked.  However, real
parallelism here would need something completely different, like
__do_page_cache_readahead(), which is not currently exported.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/cramfs/inode.c 
linux-2.6.21-rc6-mm1-test/fs/cramfs/inode.c
--- linux-2.6.21-rc6-mm1/fs/cramfs/inode.c  2007-04-09 17:24:03.0 
-0700
+++ linux-2.6.21-rc6-mm1-test/fs/cramfs/inode.c 2007-04-09 21:37:09.0 
-0700
@@ -180,8 +180,7 @@ static void *cramfs_read(struct super_bl
struct page *page = NULL;
 
if (blocknr + i < devsize) {
-   page = read_mapping_page_async(mapping, blocknr + i,
-   NULL);
+   page = read_mapping_page(mapping, blocknr + i, NULL);
/* synchronous error? */
if (IS_ERR(page))
page = NULL;

[PATCH 0/17] fs: cleanup single page synchronous read interface

2007-04-11 Thread Nate Diller

Nick Piggin recently changed the read_cache_page interface to be
synchronous, which is pretty much what the file systems want anyway.  Turns
out that they have more in common than that, though, and some of them want
to be able to get an uptodate *locked* page.  Many of them want a kmapped
page, which is uptodate and unlocked, and they all have their own individual
helper functions to achieve this.

Since the helper functions are so similar, this patch just combines them
into a small number of simple library functions, which call read_cache_page
(renamed to __read_cache_page because it now returns a locked page).  The
immediate result is a vast reduction in the number of fs-specific helper
functions.  The secondary goal is to reduce the number of places the page
lock is taken, and eliminate a lot of PageUptodate and PageError checks.

The file systems that still use PageChecked now have checker functions that
return an error if the page is corrupted or has some other error.  This
simplifies the logic since the checker function is not part of any helper
function anymore.
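
As a rough illustration of the checker pattern described above, here is a
userspace sketch.  The struct, its rec_len field and the mock_* names are
all made up for the example; only the shape follows the series: the checker
caches its verdict in the Checked/Error bits and returns an error code, and
the caller decides what to do with it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a directory page and its PageChecked/PageError bits. */
struct mock_dir_page {
	bool checked;		/* models PageChecked */
	bool error;		/* models PageError */
	unsigned int rec_len;	/* made-up field the checker validates */
};

/* Returns 0 if the page is sane, -EIO otherwise; the verdict is cached. */
static int mock_check_page(struct mock_dir_page *page)
{
	if (page->checked)
		return page->error ? -EIO : 0;

	page->checked = true;
	if (page->rec_len == 0) {
		page->error = true;
		return -EIO;
	}
	return 0;
}

/* Caller side: read, check, and bail out on error. */
static int mock_read_dir_page(struct mock_dir_page *page, unsigned int *rec_len)
{
	int err = mock_check_page(page);

	if (err)
		return err;	/* a real caller would also release the page */
	*rec_len = page->rec_len;
	return 0;
}
```

A second call on a bad page returns the cached -EIO without validating
again, which is the behaviour the PageChecked fast path is meant to give.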

Compile tested on x86_64.

Signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

 drivers/mtd/devices/block2mtd.c  |   28 +--
 fs/afs/dir.c |   56 +++---
 fs/afs/mntpt.c   |   10 --
 fs/cramfs/inode.c|3 
 fs/ext2/dir.c|   82 -
 fs/freevxfs/vxfs_extern.h|1 
 fs/freevxfs/vxfs_inode.c |2 
 fs/freevxfs/vxfs_lookup.c|4 -
 fs/freevxfs/vxfs_subr.c  |   33 
 fs/hfs/bnode.c   |4 -
 fs/hfsplus/bnode.c   |4 -
 fs/jffs2/fs.c|   27 ---
 fs/jffs2/gc.c|   15 ++-
 fs/jfs/jfs_metapage.c|5 -
 fs/minix/dir.c   |   59 ---
 fs/ntfs/aops.h   |   67 -
 fs/ntfs/bitmap.c |8 +-
 fs/ntfs/dir.c|   65 ++---
 fs/ntfs/index.c  |   12 +--
 fs/ntfs/lcnalloc.c   |6 -
 fs/ntfs/logfile.c|   12 +--
 fs/ntfs/mft.c|   53 +
 fs/ntfs/super.c  |   38 -
 fs/ntfs/usnjrnl.c|4 -
 fs/partitions/check.c|   14 +--
 fs/reiser4/plugin/file/tail_conversion.c |8 --
 fs/reiser4/plugin/item/extent_file_ops.c |9 --
 fs/reiserfs/xattr.c  |   48 ++--
 fs/sysv/dir.c|   19 +---
 fs/ufs/balloc.c  |8 +-
 fs/ufs/dir.c |   90 +--
 fs/ufs/truncate.c|8 +-
 fs/ufs/util.c|   52 -
 fs/ufs/util.h|   10 --
 include/linux/pagemap.h  |   53 -
 mm/filemap.c |  118 +++
 36 files changed, 315 insertions(+), 720 deletions(-)

[PATCH] make iunique use a do/while loop rather than its obscure goto loop

2007-04-11 Thread Jeffrey Layton

A while back, Christoph mentioned that he thought that iunique ought to be
cleaned up to use a more conventional loop construct. This patch does that,
turning the strange goto loop into a do/while.

Signed-off-by: Jeff Layton <[EMAIL PROTECTED]>

diff --git a/fs/inode.c b/fs/inode.c
index 23fc1fd..90e7587 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -689,21 +689,18 @@ ino_t iunique(struct super_block *sb, ino_t max_reserved)
struct inode *inode;
struct hlist_head * head;
ino_t res;
+
spin_lock(&inode_lock);
-retry:
-   if (counter > max_reserved) {
-   head = inode_hashtable + hash(sb,counter);
+   do {
+   if (counter <= max_reserved)
+   counter = max_reserved + 1;
res = counter++;
+   head = inode_hashtable + hash(sb, res);
inode = find_inode_fast(sb, head, res);
-   if (!inode) {
-   spin_unlock(&inode_lock);
-   return res;
-   }
-   } else {
-   counter = max_reserved + 1;
-   }
-   goto retry;
-   
+   } while (inode != NULL);
+   spin_unlock(&inode_lock);
+
+   return res;
 }
 
 EXPORT_SYMBOL(iunique);
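
For anyone who wants to poke at the loop outside the kernel, a minimal
userspace sketch of the same do/while follows.  The hash lookup is replaced
by a linear scan over an array (mock_ino_in_use is a hypothetical stand-in
for find_inode_fast), the counter is passed in rather than static, and
locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the inode hash lookup: true if ino is taken. */
static bool mock_ino_in_use(const unsigned long *used, size_t n,
			    unsigned long ino)
{
	for (size_t i = 0; i < n; i++)
		if (used[i] == ino)
			return true;
	return false;
}

/* Same loop shape as the patched iunique(), minus inode_lock. */
static unsigned long mock_iunique(unsigned long *counter,
				  unsigned long max_reserved,
				  const unsigned long *used, size_t n)
{
	unsigned long res;

	do {
		/* Skip the reserved range, also after a wraparound. */
		if (*counter <= max_reserved)
			*counter = max_reserved + 1;
		res = (*counter)++;
	} while (mock_ino_in_use(used, n, res));

	return res;
}
```

With max_reserved = 2 and inode numbers 3 and 4 already taken, the first
call returns 5 and the next one 6.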

Re: [patch 0/8] unprivileged mount syscall

2007-04-11 Thread Serge E. Hallyn

Quoting Miklos Szeredi ([EMAIL PROTECTED]):
> > Not objecting to prctl(), but two other options would be
> > 
> > 1. add a CLONE_NEW_NS_USERMNT flag - kind of ugly, but that is
> >the time at which the ns is created, so in that sense it
> >makes sense.
> 
> Yes, I thought about this, but there's no easy way to set the flag for
> the initial namespace, and a second flag CLONE_NEW_NS_NOUSERMNT would
> be needed to turn off the flag.

Not mentioning it would 'turn it off' for the cloned ns, but the default
value for the initial namespace is still a problem.

> > 2. use the nsproxy container subsystem (see Paul Menage's
> >containers patchset) to set this using, e.g.,
> > 
> > echo 1 > /containers/vserver1/mounts/usermount
> 
> That again would lose some flexibility: only namespaces which
> are part of a container could be manipulated.

In the nsproxy subsystem, every namespace gets a container so
long as the nsproxy subsystem is mounted.

> Does that exclude the
> initial namespace?

No, the initial namespace is tied to the root dentry - so if, as my
example was assuming, you've done

mount -t container -o ns none /containers

then to change the setting for the initial namespace you would

echo 0 > /containers/mounts/usermount

> Also how would a process find out which vserver it is running in?

cat /proc/$$/container

-serge

Re: [patch 0/8] unprivileged mount syscall

2007-04-11 Thread Miklos Szeredi

> Not objecting to prctl(), but two other options would be
> 
>   1. add a CLONE_NEW_NS_USERMNT flag - kind of ugly, but that is
>  the time at which the ns is created, so in that sense it
>  makes sense.

Yes, I thought about this, but there's no easy way to set the flag for
the initial namespace, and a second flag CLONE_NEW_NS_NOUSERMNT would
be needed to turn off the flag.

>   2. use the nsproxy container subsystem (see Paul Menage's
>  containers patchset) to set this using, e.g.,
> 
>   echo 1 > /containers/vserver1/mounts/usermount

That again would lose some flexibility: only namespaces which
are part of a container could be manipulated.  Does that exclude the
initial namespace?

Also how would a process find out which vserver it is running in?

Thanks,
Miklos

Re: [PATCH 8/8] AFS: Add security support

2007-04-11 Thread J. Bruce Fields

On Wed, Apr 11, 2007 at 09:10:32PM +0100, David Howells wrote:
> J. Bruce Fields <[EMAIL PROTECTED]> wrote:
> 
> > Just curious--when is the actual crypto done?  There doesn't seem to be
> > any in this patch.
> 
> See AF_RXRPC patch:
> 
>   http://people.redhat.com/~dhowells/rxrpc/04-af_rxrpc.diff
> 
> You turn on CONFIG_RXKAD and load the rxkad module thus built (assuming you
> haven't built it in) after loading the af_rxrpc module.  I probably should've
> mentioned that in the cover.

Oh, I see--didn't think to check net/rxrpc.  Thanks!

--b.

Re: [PATCH 8/8] AFS: Add security support

2007-04-11 Thread David Howells

J. Bruce Fields <[EMAIL PROTECTED]> wrote:

> Just curious--when is the actual crypto done?  There doesn't seem to be
> any in this patch.

See AF_RXRPC patch:

http://people.redhat.com/~dhowells/rxrpc/04-af_rxrpc.diff

You turn on CONFIG_RXKAD and load the rxkad module thus built (assuming you
haven't built it in) after loading the af_rxrpc module.  I probably should've
mentioned that in the cover.

So anyone using sockets of family AF_RXRPC can use it.  See these test
programs:

 (1) The klog test program fetches a ticket from the kaserver and adds it as a
 key of type rxrpc:

http://people.redhat.com/~dhowells/rxrpc/klog.c

 (2) The listen test program which listens for potentially secured incoming
 calls:

http://people.redhat.com/~dhowells/rxrpc/listen.c

 (3) The rxrpc test program which can make secure calls:

http://people.redhat.com/~dhowells/rxrpc/rxrpc.c

David

Re: [patch 0/8] unprivileged mount syscall

2007-04-11 Thread Serge E. Hallyn

Quoting Miklos Szeredi ([EMAIL PROTECTED]):
> > It would be nice in general if we could avoid any sort of checks for
> > (mnt->mnt_ns == init_nsproxy.mnt_ns).  Maybe that won't be possible,
> > but, taking the two listed examples:
> 
> [snip]
> 
> It's probably worthwhile going after these problematic cases, and
> fixing them, OTOH it's not easy to audit a complete system for holes
> arising from user mounts in the global namespace.
> 
> So why not move this decision out from the kernel?  How about adding a
> boolean flag to namespaces, which specifies whether unprivileged
> mounts are allowed or not.  This would give complete flexibility to
> distro builders and sysadmins.
> 
> The biggest problem I see is how to set this flag.  There's no easy
> way to represent namespaces in /proc or /sys, and this is sufficiently
> obscure not to warrant a new syscall.  Adding a new flag to prctl()
> could do the trick.  Does that sound OK?

Not objecting to prctl(), but two other options would be

1. add a CLONE_NEW_NS_USERMNT flag - kind of ugly, but that is
   the time at which the ns is created, so in that sense it
   makes sense.
2. use the nsproxy container subsystem (see Paul Menage's
   containers patchset) to set this using, e.g.,

echo 1 > /containers/vserver1/mounts/usermount

The prctl() method has a huge advantage of being implementable right
now.

-serge

Re: [patch 0/8] unprivileged mount syscall

2007-04-11 Thread Miklos Szeredi

> It would be nice in general if we could avoid any sort of checks for
> (mnt->mnt_ns == init_nsproxy.mnt_ns).  Maybe that won't be possible,
> but, taking the two listed examples:

[snip]

It's probably worthwhile going after these problematic cases, and
fixing them, OTOH it's not easy to audit a complete system for holes
arising from user mounts in the global namespace.

So why not move this decision out from the kernel?  How about adding a
boolean flag to namespaces, which specifies whether unprivileged
mounts are allowed or not.  This would give complete flexibility to
distro builders and sysadmins.

The biggest problem I see is how to set this flag.  There's no easy
way to represent namespaces in /proc or /sys, and this is sufficiently
obscure not to warrant a new syscall.  Adding a new flag to prctl()
could do the trick.  Does that sound OK?

Thanks,
Miklos

Re: [PATCH 8/8] AFS: Add security support

2007-04-11 Thread J. Bruce Fields

On Wed, Apr 11, 2007 at 08:10:37PM +0100, David Howells wrote:
> Add security support to the AFS filesystem.  Kerberos IV tickets are
> added as RxRPC keys to the session keyring with the klog program.
> open() and other VFS operations then find this ticket with
> request_key() and either use it immediately (eg: mkdir, unlink) or
> attach it to a file descriptor (open).

Just curious--when is the actual crypto done?  There doesn't seem to be
any in this patch.

--b.

[PATCH 4/8] AFS: Correctly alter relocation state after update and show state in /proc

2007-04-11 Thread David Howells

Correctly alter the relocation state after update is complete by switching it
from "Updating" to "Valid".

Also display the record state in the vlocation database proc file.
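
The state display boils down to a table of three-letter names indexed by
the state enum.  A standalone sketch of that idea is below; the abbreviations
are the ones in this patch, but the enum order, the MOCK_ names and the
trimmed-down line format are assumptions made for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* State names mirror the patch; the enum order is an assumption. */
enum mock_vl_state {
	MOCK_VL_NEW,
	MOCK_VL_CREATING,
	MOCK_VL_VALID,
	MOCK_VL_NO_VOLUME,
	MOCK_VL_UPDATING,
	MOCK_VL_VOLUME_DELETED,
	MOCK_VL_UNCERTAIN,
	MOCK_VL_NR_STATES
};

/* Three characters plus the terminating NUL fit exactly in [4]. */
static const char mock_vl_state_names[MOCK_VL_NR_STATES][4] = {
	[MOCK_VL_NEW]            = "New",
	[MOCK_VL_CREATING]       = "Crt",
	[MOCK_VL_VALID]          = "Val",
	[MOCK_VL_NO_VOLUME]      = "NoV",
	[MOCK_VL_UPDATING]       = "Upd",
	[MOCK_VL_VOLUME_DELETED] = "Del",
	[MOCK_VL_UNCERTAIN]      = "Unc",
};

/* Formats a shortened version of one volume line: usage, state, one ID, name. */
static int mock_format_line(char *buf, size_t len, int usage,
			    enum mock_vl_state state, unsigned int vid0,
			    const char *name)
{
	return snprintf(buf, len, "%3d %s %08x %s", usage,
			mock_vl_state_names[state], vid0, name);
}
```

Designated initialisers keep the table correct even if the enum is later
reordered, as long as every state gets an entry.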

Signed-Off-By: David Howells <[EMAIL PROTECTED]>
---

 fs/afs/proc.c  |   15 +--
 fs/afs/vlocation.c |4 +++-
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/fs/afs/proc.c b/fs/afs/proc.c
index 9e6af40..d5601f6 100644
--- a/fs/afs/proc.c
+++ b/fs/afs/proc.c
@@ -553,6 +553,16 @@ static void afs_proc_cell_volumes_stop(struct seq_file *p, 
void *v)
up_read(&cell->vl_sem);
 }
 
+const char afs_vlocation_states[][4] = {
+   [AFS_VL_NEW]= "New",
+   [AFS_VL_CREATING]   = "Crt",
+   [AFS_VL_VALID]  = "Val",
+   [AFS_VL_NO_VOLUME]  = "NoV",
+   [AFS_VL_UPDATING]   = "Upd",
+   [AFS_VL_VOLUME_DELETED] = "Del",
+   [AFS_VL_UNCERTAIN]  = "Unc",
+};
+
 /*
  * display a header line followed by a load of volume lines
  */
@@ -563,13 +573,14 @@ static int afs_proc_cell_volumes_show(struct seq_file *m, 
void *v)
 
/* display header on line 1 */
if (v == (void *) 1) {
-   seq_puts(m, "USE VLID[0]  VLID[1]  VLID[2]  NAME\n");
+   seq_puts(m, "USE STT VLID[0]  VLID[1]  VLID[2]  NAME\n");
return 0;
}
 
/* display one cell per line on subsequent lines */
-   seq_printf(m, "%3d %08x %08x %08x %s\n",
+   seq_printf(m, "%3d %s %08x %08x %08x %s\n",
   atomic_read(&vlocation->usage),
+  afs_vlocation_states[vlocation->state],
   vlocation->vldb.vid[0],
   vlocation->vldb.vid[1],
   vlocation->vldb.vid[2],
diff --git a/fs/afs/vlocation.c b/fs/afs/vlocation.c
index f0f4419..9af1fe8 100644
--- a/fs/afs/vlocation.c
+++ b/fs/afs/vlocation.c
@@ -657,7 +657,7 @@ static void afs_vlocation_updater(struct work_struct *work)
switch (ret) {
case 0:
afs_vlocation_apply_update(vl, &vldb);
-   vl->state = AFS_VL_UPDATING;
+   vl->state = AFS_VL_VALID;
break;
case -ENOMEDIUM:
vl->state = AFS_VL_VOLUME_DELETED;
@@ -691,6 +691,8 @@ static void afs_vlocation_updater(struct work_struct *work)
timeout = afs_vlocation_update_timeout;
}
 
+   ASSERT(list_empty(&vl->update));
+
list_add_tail(&vl->update, &afs_vlocation_updates);
 
_debug("timeout %ld", timeout);


[PATCH 7/8] AFS: Permit key to be cached in nameidata

2007-04-11 Thread David Howells

Permit a key to be cached in the nameidata struct so that it only needs to be
looked up once when doing the sequence of d_revalidate(), permission(),
follow_link() and lookup() calls involved in a pathwalk.

This is used by the AFS filesystem to avoid repeatedly having to call
request_key().  Once looked up, the key is then available as the kernel walks
the tree until such time as the kernel crosses to a non-AFS mountpoint or
an AFS mountpoint in a different cell.

The cache works like this:

 (1) The nameidata::key pointer is initialised to NULL at the start of the
 pathwalk (do_path_lookup()).  path_release() and co. release the key it
 points to.

 (2) Any filesystem operation performed during the pathwalk that has access to
 the nameidata (lookup, permission, follow_link, d_revalidate) can look at
 the key - if non-NULL - and if it's what they're looking for they can use
 it.

 If there's a key there of potential interest, the key's type and
 description should be checked to make sure the key is permissible.

 If of interest, key_validate() should be called to make sure the key is
 still usable.  If it isn't, the error should be passed back rather than
 the key lookup being redone on the basis that some earlier step is now no
 longer valid.

 (3) Any operation that is not interested in the key can either ignore it or
 release it and clear the pointer.

 (4) If an operation wants to put its own key there, it should release the old
 key and set the pointer to point to its own key with the key's usage count
 incremented.  This could be encapsulated in a function something like
 this:

	void set_nd_key(struct nameidata *nd, struct key *key)
	{
		key_put(nd->key);
		nd->key = key_get(key);
	}
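
The caching rules above can be modelled in user space.  What follows is a
minimal sketch, not kernel code: struct key, key_get(), key_put() and
key_validate() are simplified stand-ins for the kernel's versions, and
nd_find_key() and nd_release_key() are hypothetical helpers invented for
illustration, as is the "afs@example.org" description used to exercise them.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#ifndef EKEYREVOKED
#define EKEYREVOKED 128	/* fallback for non-Linux hosts (assumption) */
#endif

/* Simplified stand-in for the kernel's struct key (assumption: only the
 * usage count, type, description and a revoked flag matter here). */
struct key {
	int usage;
	const char *type;
	const char *description;
	int revoked;
};

struct nameidata {
	struct key *key;	/* cached key, NULL if none */
};

static struct key *key_get(struct key *key)
{
	if (key)
		key->usage++;
	return key;
}

static void key_put(struct key *key)
{
	if (key)
		key->usage--;
}

/* Model of key_validate(): 0 if usable, negative errno otherwise. */
static int key_validate(const struct key *key)
{
	return key->revoked ? -EKEYREVOKED : 0;
}

/* Rule (4): replace the cached key, taking a reference on the new one. */
static void set_nd_key(struct nameidata *nd, struct key *key)
{
	key_put(nd->key);
	nd->key = key_get(key);
}

/* Rule (1): what path_release() does to the cached key. */
static void nd_release_key(struct nameidata *nd)
{
	key_put(nd->key);
	nd->key = NULL;
}

/* Rule (2), hypothetical helper.  Returns 1 if the cached key is the one
 * wanted and still usable, 0 if a fresh lookup is needed, or a negative
 * errno if the key matches but is no longer valid (no lookup is redone). */
static int nd_find_key(const struct nameidata *nd, const char *type,
		       const char *description)
{
	const struct key *key = nd->key;
	int ret;

	if (!key)
		return 0;
	if (strcmp(key->type, type) != 0 ||
	    strcmp(key->description, description) != 0)
		return 0;
	ret = key_validate(key);
	return ret < 0 ? ret : 1;
}
```

A filesystem operation would call nd_find_key() first and only fall back to
a full key lookup, followed by set_nd_key(), when it returns 0.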

Unfortunately there isn't currently a way to pass the key onto the inode
operations for create(), link(), unlink(), and suchlike, nor is there a way to
pass it to the open() file op without adding a struct key pointer argument to
each of these.

This might also be useful for NFS and CIFS.

Signed-Off-By: David Howells <[EMAIL PROTECTED]>
---

 fs/namei.c|5 +
 fs/open.c |7 +--
 include/linux/namei.h |1 +
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index ee60cc4..7a59d12 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -350,6 +350,8 @@ void path_release(struct nameidata *nd)
 {
dput(nd->dentry);
mntput(nd->mnt);
+   key_put(nd->key);
+   nd->key = NULL;
 }
 
 /*
@@ -360,6 +362,8 @@ void path_release_on_umount(struct nameidata *nd)
 {
dput(nd->dentry);
mntput_no_expire(nd->mnt);
+   key_put(nd->key);
+   nd->key = NULL;
 }
 
 /**
@@ -1108,6 +1112,7 @@ static int fastcall do_path_lookup(int dfd, const char *name,
struct file *file;
struct fs_struct *fs = current->fs;
 
+   nd->key = NULL;
nd->last_type = LAST_ROOT; /* if there are only slashes... */
nd->flags = flags;
nd->depth = 0;
diff --git a/fs/open.c b/fs/open.c
index c989fb4..77bd2a5 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -822,10 +822,13 @@ struct file *nameidata_to_filp(struct nameidata *nd, int flags)
/* Pick up the filp from the open intent */
filp = nd->intent.open.file;
/* Has the filesystem initialised the file for us? */
-   if (filp->f_path.dentry == NULL)
+   if (filp->f_path.dentry == NULL) {
filp = __dentry_open(nd->dentry, nd->mnt, flags, filp, NULL);
-   else
+   key_put(nd->key);
+   nd->key = NULL;
+   } else {
path_release(nd);
+   }
return filp;
 }
 
diff --git a/include/linux/namei.h b/include/linux/namei.h
index d39a5a6..d677408 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -17,6 +17,7 @@ enum { MAX_NESTED_LINKS = 8 };
 struct nameidata {
struct dentry   *dentry;
struct vfsmount *mnt;
+   struct key  *key;
struct qstr last;
unsigned intflags;
int last_type;


[PATCH 6/8] AFS: AF_RXRPC key changes

2007-04-11 Thread David Howells

Make two changes to the AF_RXRPC key handling to make it easier for AFS to
use:

 (1) Export key_type_rxrpc so that AFS can request keys of this type.

 (2) Make it possible to have keys that represent "no security".  These are
 created by instantiating the keys with no data.
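
The "no security" convention in (2) can be sketched as a small user-space
model.  This is an illustration under stated assumptions rather than the
kernel code: struct key is reduced to a single payload pointer, and
model_instantiate() and model_effective_key() are hypothetical names.  The
model stops after the length check, whereas the real instantiate routine
goes on to parse the payload.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for struct key: payload_data models key->payload.data. */
struct key {
	const void *payload_data;
};

/* Model of the instantiation rule: no data makes a no-security key. */
static int model_instantiate(struct key *key, const void *data, size_t datalen)
{
	key->payload_data = NULL;

	/* handle a no-security key */
	if (!data && datalen == 0)
		return 0;

	/* anything else must at least hold the interface version number */
	if (datalen <= 4 || !data)
		return -EINVAL;

	key->payload_data = data;
	return 0;
}

/* Callers treat a key that has no payload as if no key were supplied. */
static const struct key *model_effective_key(const struct key *key)
{
	if (key && !key->payload_data)
		return NULL;	/* a no-security key */
	return key;
}
```

The second helper mirrors the checks the patch adds before the calls to
rxrpc_get_bundle().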

Signed-Off-By: David Howells <[EMAIL PROTECTED]>
---

 include/keys/rxrpc-type.h |   22 ++
 net/rxrpc/af_rxrpc.c  |2 ++
 net/rxrpc/ar-key.c|   10 +-
 net/rxrpc/ar-output.c |6 +-
 4 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/include/keys/rxrpc-type.h b/include/keys/rxrpc-type.h
new file mode 100644
index 000..e2ee73a
--- /dev/null
+++ b/include/keys/rxrpc-type.h
@@ -0,0 +1,22 @@
+/* RxRPC key type
+ *
+ * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([EMAIL PROTECTED])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _KEYS_RXRPC_TYPE_H
+#define _KEYS_RXRPC_TYPE_H
+
+#include 
+
+/*
+ * key type for AF_RXRPC keys
+ */
+extern struct key_type key_type_rxrpc;
+
+#endif /* _KEYS_RXRPC_TYPE_H */
diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index 115ad19..9e37e4f 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -299,6 +299,8 @@ struct rxrpc_call *rxrpc_kernel_begin_call(struct socket *sock,
 
if (!key)
key = rx->key;
+   if (key && !key->payload.data)
+   key = NULL; /* a no-security key */
 
bundle = rxrpc_get_bundle(rx, trans, key, service_id, gfp);
if (IS_ERR(bundle)) {
diff --git a/net/rxrpc/ar-key.c b/net/rxrpc/ar-key.c
index 869a96c..7e049ff 100644
--- a/net/rxrpc/ar-key.c
+++ b/net/rxrpc/ar-key.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "ar-internal.h"
 
@@ -40,6 +41,8 @@ struct key_type key_type_rxrpc = {
.describe   = rxrpc_describe,
 };
 
+EXPORT_SYMBOL(key_type_rxrpc);
+
 /*
  * rxrpc server defined keys take ":" as the
  * description and an 8-byte decryption key as the payload
@@ -63,6 +66,8 @@ struct key_type key_type_rxrpc_s = {
  * 12  4   kvno
  * 16  8   session key
  * 24  [len]   ticket
+ *
+ * if no data is provided, then a no-security key is made
  */
 static int rxrpc_instantiate(struct key *key, const void *data, size_t datalen)
 {
@@ -74,6 +79,10 @@ static int rxrpc_instantiate(struct key *key, const void *data, size_t datalen)
 
_enter("{%x},,%zu", key_serial(key), datalen);
 
+   /* handle a no-security key */
+   if (!data && datalen == 0)
+   return 0;
+
/* get the key interface version number */
ret = -EINVAL;
if (datalen <= 4 || !data)
@@ -287,7 +296,6 @@ int rxrpc_get_server_data_key(struct rxrpc_connection *conn,
struct rxkad_key tsec;
} data;
 
-
_enter("");
 
key = key_alloc(&key_type_rxrpc, "x", 0, 0, current, 0,
diff --git a/net/rxrpc/ar-output.c b/net/rxrpc/ar-output.c
index ed7f3f4..d2d0baa 100644
--- a/net/rxrpc/ar-output.c
+++ b/net/rxrpc/ar-output.c
@@ -132,6 +132,7 @@ int rxrpc_client_sendmsg(struct kiocb *iocb, struct rxrpc_sock *rx,
enum rxrpc_command cmd;
struct rxrpc_call *call;
unsigned long user_call_ID = 0;
+   struct key *key;
__be16 service_id;
u32 abort_code = 0;
int ret;
@@ -153,7 +154,10 @@ int rxrpc_client_sendmsg(struct kiocb *iocb, struct rxrpc_sock *rx,
(struct sockaddr_rxrpc *) msg->msg_name;
service_id = htons(srx->srx_service);
}
-   bundle = rxrpc_get_bundle(rx, trans, rx->key, service_id,
+   key = rx->key;
+   if (key && !rx->key->payload.data)
+   key = NULL;
+   bundle = rxrpc_get_bundle(rx, trans, key, service_id,
  GFP_KERNEL);
if (IS_ERR(bundle))
return PTR_ERR(bundle);


[PATCH 5/8] AFS: Handle multiple mounts of an AFS superblock correctly

2007-04-11 Thread David Howells

Handle multiple mounts of an AFS superblock correctly, checking to see whether
the superblock is already initialised after calling sget() rather than just
unconditionally stamping all over it.

Also delete the "silent" parameter to afs_fill_super() as it's not used and
can, in any case, be obtained from sb->s_flags.
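
The reuse check can be illustrated with a small user-space model.  This is a
sketch under stated assumptions, not the AFS code: struct model_sb,
model_sget() and model_get_sb() are hypothetical stand-ins for struct
super_block, sget() and afs_get_sb(), and a volume ID is assumed to be what
identifies a superblock.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MS_ACTIVE	0x1	/* stand-in for MS_ACTIVE */
#define MODEL_MAX_SB	4

struct model_sb {
	int volume_id;		/* what the test callback compares */
	int has_root;		/* stands in for sb->s_root != NULL */
	int flags;
	int fill_count;		/* how many times the fill step ran */
};

static struct model_sb model_sbs[MODEL_MAX_SB];
static int model_nr_sbs;

/* Model of sget(): return the matching superblock or a fresh one. */
static struct model_sb *model_sget(int volume_id)
{
	int i;

	for (i = 0; i < model_nr_sbs; i++)
		if (model_sbs[i].volume_id == volume_id)
			return &model_sbs[i];
	if (model_nr_sbs == MODEL_MAX_SB)
		return NULL;
	model_sbs[model_nr_sbs].volume_id = volume_id;
	return &model_sbs[model_nr_sbs++];
}

/* Model of the fixed get_sb path: only fill a superblock once. */
static struct model_sb *model_get_sb(int volume_id, int flags)
{
	struct model_sb *sb = model_sget(volume_id);

	if (!sb)
		return NULL;

	if (!sb->has_root) {
		/* initial superblock/root creation */
		sb->flags = flags;
		sb->has_root = 1;
		sb->fill_count++;
		sb->flags |= MODEL_MS_ACTIVE;
	} else {
		/* reuse: it must already be active, so leave it alone */
		assert(sb->flags & MODEL_MS_ACTIVE);
	}
	return sb;
}
```

Mounting the same volume twice then yields the same superblock with its
original flags intact, rather than one that has been filled in a second time.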

Signed-Off-By: David Howells <[EMAIL PROTECTED]>
---

 fs/afs/super.c |   26 --
 1 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/fs/afs/super.c b/fs/afs/super.c
index efc4fe6..77e6875 100644
--- a/fs/afs/super.c
+++ b/fs/afs/super.c
@@ -212,7 +212,7 @@ static int afs_test_super(struct super_block *sb, void *data)
 /*
  * fill in the superblock
  */
-static int afs_fill_super(struct super_block *sb, void *data, int silent)
+static int afs_fill_super(struct super_block *sb, void *data)
 {
struct afs_mount_params *params = data;
struct afs_super_info *as = NULL;
@@ -319,17 +319,23 @@ static int afs_get_sb(struct file_system_type *fs_type,
goto error;
}
 
-   sb->s_flags = flags;
-
-   ret = afs_fill_super(sb, &params, flags & MS_SILENT ? 1 : 0);
-   if (ret < 0) {
-   up_write(&sb->s_umount);
-   deactivate_super(sb);
-   goto error;
+   if (!sb->s_root) {
+   /* initial superblock/root creation */
+   _debug("create");
+   sb->s_flags = flags;
+   ret = afs_fill_super(sb, &params);
+   if (ret < 0) {
+   up_write(&sb->s_umount);
+   deactivate_super(sb);
+   goto error;
+   }
+   sb->s_flags |= MS_ACTIVE;
+   } else {
+   _debug("reuse");
+   ASSERTCMP(sb->s_flags, &, MS_ACTIVE);
}
-   sb->s_flags |= MS_ACTIVE;
-   simple_set_mnt(mnt, sb);
 
+   simple_set_mnt(mnt, sb);
afs_put_volume(params.volume);
afs_put_cell(params.default_cell);
_leave(" = 0 [%p]", sb);


[PATCH 3/8] AFS: Fix callback aggregator work item deadlock

2007-04-11 Thread David Howells

Fix a deadlock in the give-up-callback aggregator dispatcher work item whereby
the aggregator runs on keventd as does timed autounmount, thus leading to the
unmount blocking keventd whilst waiting for keventd to run the aggregator when
the give-up-callback buffer is full.

Signed-Off-By: David Howells <[EMAIL PROTECTED]>
---

 fs/afs/callback.c |   14 +-
 fs/afs/fsclient.c |6 --
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/fs/afs/callback.c b/fs/afs/callback.c
index fdad11c..1533b49 100644
--- a/fs/afs/callback.c
+++ b/fs/afs/callback.c
@@ -232,7 +232,8 @@ static void afs_do_give_up_callback(struct afs_server *server,
 * possible to ship in one operation */
switch (atomic_inc_return(&server->cb_break_n)) {
case 1 ... AFSCBMAX - 1:
-   schedule_delayed_work(&server->cb_break_work, HZ * 2);
+   queue_delayed_work(afs_callback_update_worker,
+  &server->cb_break_work, HZ * 2);
break;
case AFSCBMAX:
afs_flush_callback_breaks(server);
@@ -271,9 +272,11 @@ void afs_give_up_callback(struct afs_vnode *vnode)
spin_lock(&server->cb_lock);
if (vnode->cb_promised && afs_breakring_space(server) == 0) {
add_wait_queue(&server->cb_break_waitq, &myself);
-   while (vnode->cb_promised &&
-  afs_breakring_space(server) == 0) {
+   for (;;) {
set_current_state(TASK_UNINTERRUPTIBLE);
+   if (!vnode->cb_promised ||
+   afs_breakring_space(server) != 0)
+   break;
spin_unlock(&server->cb_lock);
schedule();
spin_lock(&server->cb_lock);
@@ -315,7 +318,8 @@ void afs_dispatch_give_up_callbacks(struct work_struct *work)
 void afs_flush_callback_breaks(struct afs_server *server)
 {
if (try_to_cancel_delayed_work(&server->cb_break_work) >= 0)
-   schedule_delayed_work(&server->cb_break_work, 0);
+   queue_delayed_work(afs_callback_update_worker,
+  &server->cb_break_work, 0);
 }
 
 #if 0
@@ -426,7 +430,7 @@ static void afs_callback_updater(struct work_struct *work)
 int __init afs_callback_update_init(void)
 {
afs_callback_update_worker =
-   create_singlethread_workqueue("kafs_cbupdated");
+   create_singlethread_workqueue("kafs_callbackd");
return afs_callback_update_worker ? 0 : -ENOMEM;
 }
 
diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index d955178..e2a36f8 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -355,10 +355,11 @@ int afs_fs_give_up_callbacks(struct afs_server *server,
__be32 *bp, *tp;
int loop;
 
-   _enter("");
-
ncallbacks = CIRC_CNT(server->cb_break_head, server->cb_break_tail,
  ARRAY_SIZE(server->cb_break));
+
+   _enter("{%zu},", ncallbacks);
+
if (ncallbacks == 0)
return 0;
if (ncallbacks > AFSCBMAX)
@@ -398,6 +399,7 @@ int afs_fs_give_up_callbacks(struct afs_server *server,
(ARRAY_SIZE(server->cb_break) - 1);
}
 
+   ASSERT(ncallbacks > 0);
wake_up_nr(&server->cb_break_waitq, ncallbacks);
 
return afs_make_call(&server->addr, call, GFP_NOFS, wait_mode);


[PATCH 1/8] AF_RXRPC: Use own workqueues

2007-04-11 Thread David Howells

Make the AF_RXRPC module use its own workqueues with their own per-CPU threads.
Currently it uses keventd to do the following tasks, amongst others:

 (*) Security negotiation

 (*) Packet encryption and decryption

 (*) Packet resending

 (*) ACK, abort and busy packet generation

 (*) Timeout handling

 (*) Missing packet catchup

 (*) Parts of incoming call management

 (*) Destruction of structures we've finished with

Some of these conflict with AFS's use of keventd, however, and can lead to
effective deadlock of resources.  Having discussed this, it has been suggested
that encryption and decryption shouldn't be done in keventd (that's not
unreasonable - it is potentially quite slow), and so the AF_RXRPC service is
given its own threads rather than AFS.

It might be useful to consider using the rpciod threads for this, if they were
separated out from the SunRPC module.

Signed-Off-By: David Howells <[EMAIL PROTECTED]>
---

 net/rxrpc/af_rxrpc.c  |   17 ++---
 net/rxrpc/ar-accept.c |   12 ++--
 net/rxrpc/ar-ack.c|   10 +-
 net/rxrpc/ar-call.c   |   16 
 net/rxrpc/ar-connection.c |8 
 net/rxrpc/ar-connevent.c  |   20 ++--
 net/rxrpc/ar-error.c  |6 +++---
 net/rxrpc/ar-input.c  |   24 
 net/rxrpc/ar-internal.h   |   28 
 net/rxrpc/ar-local.c  |2 +-
 net/rxrpc/ar-output.c |4 ++--
 net/rxrpc/ar-peer.c   |2 +-
 net/rxrpc/ar-recvmsg.c|2 +-
 net/rxrpc/ar-skbuff.c |2 +-
 net/rxrpc/ar-transport.c  |8 
 15 files changed, 92 insertions(+), 69 deletions(-)

diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index fb35998..115ad19 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -41,6 +41,8 @@ atomic_t rxrpc_debug_id;
 /* count of skbs currently in use */
 atomic_t rxrpc_n_skbs;
 
+struct workqueue_struct *rxrpc_workqueue;
+
 static void rxrpc_sock_destructor(struct sock *);
 
 /*
@@ -688,7 +690,7 @@ static int rxrpc_release_sock(struct sock *sk)
 
/* try to flush out this socket */
rxrpc_release_calls_on_socket(rx);
-   flush_scheduled_work();
+   flush_workqueue(rxrpc_workqueue);
rxrpc_purge_queue(&sk->sk_receive_queue);
 
if (rx->conn) {
@@ -785,15 +787,21 @@ static int __init af_rxrpc_init(void)
 
rxrpc_epoch = htonl(xtime.tv_sec);
 
+   ret = -ENOMEM;
rxrpc_call_jar = kmem_cache_create(
"rxrpc_call_jar", sizeof(struct rxrpc_call), 0,
SLAB_HWCACHE_ALIGN, NULL, NULL);
if (!rxrpc_call_jar) {
printk(KERN_NOTICE "RxRPC: Failed to allocate call jar\n");
-   ret = -ENOMEM;
goto error_call_jar;
}
 
+   rxrpc_workqueue = create_workqueue("krxrpcd");
+   if (!rxrpc_workqueue) {
+   printk(KERN_NOTICE "RxRPC: Failed to allocate work queue\n");
+   goto error_work_queue;
+   }
+
ret = proto_register(&rxrpc_proto, 1);
 if (ret < 0) {
 printk(KERN_CRIT "RxRPC: Cannot register protocol\n");
@@ -831,6 +839,8 @@ error_key_type:
 error_sock:
proto_unregister(&rxrpc_proto);
 error_proto:
+   destroy_workqueue(rxrpc_workqueue);
+error_work_queue:
kmem_cache_destroy(rxrpc_call_jar);
 error_call_jar:
return ret;
@@ -855,9 +865,10 @@ static void __exit af_rxrpc_exit(void)
ASSERTCMP(atomic_read(&rxrpc_n_skbs), ==, 0);
 
_debug("flush scheduled work");
-   flush_scheduled_work();
+   flush_workqueue(rxrpc_workqueue);
proc_net_remove("rxrpc_conns");
proc_net_remove("rxrpc_calls");
+   destroy_workqueue(rxrpc_workqueue);
kmem_cache_destroy(rxrpc_call_jar);
_leave("");
 }
diff --git a/net/rxrpc/ar-accept.c b/net/rxrpc/ar-accept.c
index 405092d..73243ab 100644
--- a/net/rxrpc/ar-accept.c
+++ b/net/rxrpc/ar-accept.c
@@ -139,7 +139,7 @@ static int rxrpc_accept_incoming_call(struct rxrpc_local *local,
call->conn->state = RXRPC_CONN_SERVER_CHALLENGING;
atomic_inc(&call->conn->usage);
set_bit(RXRPC_CONN_CHALLENGE, &call->conn->events);
-   schedule_work(&call->conn->processor);
+   rxrpc_queue_conn(call->conn);
} else {
_debug("conn ready");
call->state = RXRPC_CALL_SERVER_ACCEPTING;
@@ -183,7 +183,7 @@ invalid_service:
if (!test_bit(RXRPC_CALL_RELEASE, &call->flags) &&
!test_and_set_bit(RXRPC_CALL_RELEASE, &call->events)) {
rxrpc_get_call(call);
-   schedule_work(&call->processor);
+   rxrpc_queue_call(call);
}
read_unlock_bh(&call->state_lock);
rxrpc_put_call(call);
@@ -375,7 +375,7 @@ struct rxrpc_call *rxrpc_accept_call(struct rxrpc_sock *rx,

[PATCH 2/8] AF_RXRPC: Lower dead call timeout and fix available call counting on connections

2007-04-11 Thread David Howells

Make a couple of fixes to AF_RXRPC:

 (1) The dead call timeout is shortened to 2 seconds.  Without this, each
 completed call sits around eating up resources for 10 seconds.  The calls
 need to hang around for a little while in case duplicate packets appear,
 but 10 seconds is excessive.

 (2) The number of available calls on a connection (conn->avail_calls) wasn't
 being decremented when a new call was allocated for a connection that
 didn't have any calls in progress.  This caused an occasional BUG when
 we tried to find an empty channel slot on a connection that was supposed
 to have one available and didn't.

 In association with this, more assertions have been added to check this.
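
The invariant behind fix (2) is that the available-call count always equals
the number of empty channel slots.  The following user-space sketch models
that bookkeeping; the model_* names are hypothetical, and four channels per
connection is assumed because the assertions in the patch check channels[0]
to channels[3].

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAXCALLS 4	/* assumed channels per connection */

struct model_call {
	int id;
};

struct model_conn {
	struct model_call *channels[MODEL_MAXCALLS];
	int avail_calls;
};

static void model_conn_init(struct model_conn *conn)
{
	int i;

	for (i = 0; i < MODEL_MAXCALLS; i++)
		conn->channels[i] = NULL;
	conn->avail_calls = MODEL_MAXCALLS;
}

/* The invariant: avail_calls equals the number of empty slots. */
static int model_conn_consistent(const struct model_conn *conn)
{
	int i, empty = 0;

	for (i = 0; i < MODEL_MAXCALLS; i++)
		if (!conn->channels[i])
			empty++;
	return empty == conn->avail_calls;
}

/* Attach a call; returns the channel used, or -1 if none is available. */
static int model_attach_call(struct model_conn *conn, struct model_call *call)
{
	int chan;

	if (conn->avail_calls == 0)
		return -1;
	for (chan = 0; chan < MODEL_MAXCALLS; chan++)
		if (!conn->channels[chan])
			break;
	if (chan == MODEL_MAXCALLS)
		return -1;	/* count and slots disagree: the bug described */
	conn->channels[chan] = call;
	conn->avail_calls--;	/* must happen on every allocation path */
	return chan;
}

/* Release a channel, making it available again. */
static void model_release_call(struct model_conn *conn, int chan)
{
	conn->channels[chan] = NULL;
	conn->avail_calls++;
}
```

If the decrement in model_attach_call() is skipped on any path, the count
claims a free slot that the channel array does not have, which is the
mismatch the added assertions are meant to catch.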

Signed-Off-By: David Howells <[EMAIL PROTECTED]>
---

 net/rxrpc/ar-call.c   |   59 +
 net/rxrpc/ar-connection.c |   20 ++-
 2 files changed, 56 insertions(+), 23 deletions(-)

diff --git a/net/rxrpc/ar-call.c b/net/rxrpc/ar-call.c
index 1d7698a..4d92d88 100644
--- a/net/rxrpc/ar-call.c
+++ b/net/rxrpc/ar-call.c
@@ -19,7 +19,7 @@ struct kmem_cache *rxrpc_call_jar;
 LIST_HEAD(rxrpc_calls);
 DEFINE_RWLOCK(rxrpc_call_lock);
 static unsigned rxrpc_call_max_lifetime = 60;
-static unsigned rxrpc_dead_call_timeout = 10;
+static unsigned rxrpc_dead_call_timeout = 2;
 
 static void rxrpc_destroy_call(struct work_struct *work);
 static void rxrpc_call_life_expired(unsigned long _call);
@@ -398,6 +398,7 @@ found_extant_call:
  */
 void rxrpc_release_call(struct rxrpc_call *call)
 {
+   struct rxrpc_connection *conn = call->conn;
struct rxrpc_sock *rx = call->socket;
 
_enter("{%d,%d,%d,%d}",
@@ -413,8 +414,7 @@ void rxrpc_release_call(struct rxrpc_call *call)
/* dissociate from the socket
 * - the socket's ref on the call is passed to the death timer
 */
-   _debug("RELEASE CALL %p (%d CONN %p)",
-  call, call->debug_id, call->conn);
+   _debug("RELEASE CALL %p (%d CONN %p)", call, call->debug_id, conn);
 
write_lock_bh(&rx->call_lock);
if (!list_empty(&call->accept_link)) {
@@ -430,24 +430,42 @@ void rxrpc_release_call(struct rxrpc_call *call)
}
write_unlock_bh(&rx->call_lock);
 
-   if (call->conn->out_clientflag)
-   spin_lock(&call->conn->trans->client_lock);
-   write_lock_bh(&call->conn->lock);
-
/* free up the channel for reuse */
-   if (call->conn->out_clientflag) {
-   call->conn->avail_calls++;
-   if (call->conn->avail_calls == RXRPC_MAXCALLS)
-   list_move_tail(&call->conn->bundle_link,
-  &call->conn->bundle->unused_conns);
-   else if (call->conn->avail_calls == 1)
-   list_move_tail(&call->conn->bundle_link,
-  &call->conn->bundle->avail_conns);
+   spin_lock(&conn->trans->client_lock);
+   write_lock_bh(&conn->lock);
+   write_lock(&call->state_lock);
+
+   if (conn->channels[call->channel] == call)
+   conn->channels[call->channel] = NULL;
+
+   if (conn->out_clientflag && conn->bundle) {
+   conn->avail_calls++;
+   switch (conn->avail_calls) {
+   case 1:
+   list_move_tail(&conn->bundle_link,
+  &conn->bundle->avail_conns);
+   case 2 ... RXRPC_MAXCALLS - 1:
+   ASSERT(conn->channels[0] == NULL ||
+  conn->channels[1] == NULL ||
+  conn->channels[2] == NULL ||
+  conn->channels[3] == NULL);
+   break;
+   case RXRPC_MAXCALLS:
+   list_move_tail(&conn->bundle_link,
+  &conn->bundle->unused_conns);
+   ASSERT(conn->channels[0] == NULL &&
+  conn->channels[1] == NULL &&
+  conn->channels[2] == NULL &&
+  conn->channels[3] == NULL);
+   break;
+   default:
+   printk(KERN_ERR "RxRPC: conn->avail_calls=%d\n",
+  conn->avail_calls);
+   BUG();
+   }
}
 
-   write_lock(&call->state_lock);
-   if (call->conn->channels[call->channel] == call)
-   call->conn->channels[call->channel] = NULL;
+   spin_unlock(&conn->trans->client_lock);
 
if (call->state < RXRPC_CALL_COMPLETE &&
call->state != RXRPC_CALL_CLIENT_FINAL_ACK) {
@@ -458,10 +476,9 @@ void rxrpc_release_call(struct rxrpc_call *call)
rxrpc_queue_call(call);
}
write_unlock(&call->state_lock);
-   write_unlock_bh(&call->conn->lock);
-   if (call->conn->out_clientflag)
-

[PATCH 0/8] AFS: Add security support and fix bugs

2007-04-11 Thread David Howells


These patches build on the patchset labelled "AF_RXRPC socket family and AFS
rewrite".  The patches are also available for http download.

Firstly, the patches fix a number of bugs in AF_RXRPC:

http://people.redhat.com/~dhowells/rxrpc/09-af_rxrpc-own-workqueues.diff
http://people.redhat.com/~dhowells/rxrpc/10-af_rxrpc-fixes.diff

Secondly, they fix some bugs in the AFS filesystem:

http://people.redhat.com/~dhowells/rxrpc/11-afs-callback-wq.diff
http://people.redhat.com/~dhowells/rxrpc/12-afs-vlocation.diff
http://people.redhat.com/~dhowells/rxrpc/13-afs-multimount.diff

And finally, they add security support to AFS:

http://people.redhat.com/~dhowells/rxrpc/14-afs-rxrpc-key.diff
http://people.redhat.com/~dhowells/rxrpc/15-afs-nameidata-key.diff
http://people.redhat.com/~dhowells/rxrpc/16-afs-security.diff


A security key is acquired by running the klog program:

http://people.redhat.com/~dhowells/rxrpc/klog.c

This is compiled by:

make klog CFLAGS="-Wall -g" LDLIBS="-lcrypto -lcrypt -lkrb4 -lkeyutils"

And then run by:

./klog

Note that at the moment this is a rough and ready test program that has the
username, realm, password and proposed key timeout compiled in.  Note also that
it will only talk to the AFS kaserver.


If a security key is acquired, then all subsequent operations - including VL
lookups and mounts - performed with that session keyring will be authenticated
using that key.  The key can be viewed like so:

[EMAIL PROTECTED] ~]# keyctl show
Session Keyring
       -3 --alswrv      0     0  keyring: _ses.3268
        2 --alswrv      0     0   \_ keyring: _uid.0
111416553 --als--v      0     0       \_ rxrpc: [EMAIL PROTECTED]

David

Re: [patch 0/8] unprivileged mount syscall

2007-04-11 Thread Ram Pai

On Wed, 2007-04-11 at 12:44 +0200, Miklos Szeredi wrote:
> > 1. clone the master namespace.
> > 
> > 2. in the new namespace
> > 
> > move the tree under /share/$me to /
> > for each ($user, $what, $how) {
> > move /share/$user/$what to /$what
> > if ($how == slave) {
> >  make the mount tree under /$what as slave
> > }
> > }
> > 
> > 3. in the new namespace make the tree under 
> >/share as private and unmount /share
> 
> Thanks.  I get the basic idea now: the namespace itself need not be
> shared between the sessions, it is enough if "share" propagation is
> set up between the different namespaces of a user.
> 
> I don't yet see either in your or Viro's description how the trees
> under /share/$USER are initialized.  I guess they are recursively
> bound from /, and are made slaves.

yes. I suppose, when a userid is created one of the steps would be

mount --rbind / /share/$USER
mount --make-rslave /share/$USER
mount --make-rshared /share/$USER

RP

> Miklos


Re: [PATCH 1/13] fs: convert core functions to zero_user_page

2007-04-11 Thread Jörn Engel

On Tue, 10 April 2007 22:56:38 -0700, Andrew Morton wrote:
> 
> And I'm surprised that this:
> 
> +static inline void memclear_highpage_flush(struct page *page, unsigned int 
> offset, unsigned int size)
> +{
> + return zero_user_page(page, offset, size);
> +}
> 
> compiled.  zero_user_page() returns void...

As does memclear_highpage_flush().  Some of my code looks like:

	void some_func(...)
	{
		if (foo)
			return do_foo(...);
		if (bar)
			return do_bar(...);
		...
	}

do_foo() and do_bar() also return void.  Saves an extra line for the
return statement and the brackets.

Doesn't help in the code you quoted, of course.

Jörn

-- 
Measure. Don't tune for speed until you've measured, and even then
don't unless one part of the code overwhelms the rest.
-- Rob Pike

Re: [patch 0/8] unprivileged mount syscall

2007-04-11 Thread Serge E. Hallyn

Quoting Ian Kent ([EMAIL PROTECTED]):
> On Wed, 2007-04-11 at 09:26 -0500, Serge E. Hallyn wrote:
> > Quoting Ian Kent ([EMAIL PROTECTED]):
> > > On Wed, 2007-04-11 at 12:48 +0200, Miklos Szeredi wrote:
> > > > > > >>
> > > > > > >> - users can use bind mounts without having to pre-configure them 
> > > > > > >> in
> > > > > > >>   /etc/fstab
> > > > > > >>
> > > > > > 
> > > > > > This is by far the biggest concern I see.  I think the security 
> > > > > > implication of allowing anyone to do bind mounts are poorly 
> > > > > > understood.
> > > > > 
> > > > > And especially so since there is no way for a filesystem module to 
> > > > > veto
> > > > > such requests.
> > > > 
> > > > The filesystem can't veto initial mounts based on destination either.
> > > > I don't think it's up to the filesystem to police bind/move mounts in
> > > > any way.
> > > 
> > > But if a filesystem can't or the developer thinks that it shouldn't for
> > > some reason, support bind/move mounts then there should be a way for the
> > 
> > Can you list some valid reasons why an fs could care where it is
> > mounted?  The only thing I could think of is a stackable fs, but it
> > shouldn't care whether it is overlay-mounted or not.
> 
> For my part, autofs and autofs4.

Ah, thanks.

I can see I'm going to have to start using autofs to get to know the
implementation, because it seems clear we'll run into it in the
containers work again (beyond the struct pid conv) at some point.

> Moving or binding isn't valid.
> I tried to design that limitation out version 5 but wasn't able to.
> In time I probably can but couldn't continue to support older versions.

thanks,
-serge

> > 
> > thanks,
> > -serge
> > 
> > > filesystem to tell the kernel that.
> > > 
> > > Surely a filesystem is in a good position to be able to decide if a
> > > mount request "for it" should be allowed to continue based on it's "own
> > > situation and capabilities".
> > > 
> > > Ian
> > > 
> > > 
> > > 
> > > -
> > > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" 
> > > in
> > > the body of a message to [EMAIL PROTECTED]
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch 0/8] unprivileged mount syscall

2007-04-11 Thread Ian Kent

On Wed, 2007-04-11 at 09:26 -0500, Serge E. Hallyn wrote:
> Quoting Ian Kent ([EMAIL PROTECTED]):
> > On Wed, 2007-04-11 at 12:48 +0200, Miklos Szeredi wrote:
> > > > > >>
> > > > > >> - users can use bind mounts without having to pre-configure them in
> > > > > >>   /etc/fstab
> > > > > >>
> > > > > 
> > > > > This is by far the biggest concern I see.  I think the security 
> > > > > implication of allowing anyone to do bind mounts are poorly 
> > > > > understood.
> > > > 
> > > > And especially so since there is no way for a filesystem module to veto
> > > > such requests.
> > > 
> > > The filesystem can't veto initial mounts based on destination either.
> > > I don't think it's up to the filesystem to police bind/move mounts in
> > > any way.
> > 
> > But if a filesystem can't or the developer thinks that it shouldn't for
> > some reason, support bind/move mounts then there should be a way for the
> 
> Can you list some valid reasons why an fs could care where it is
> mounted?  The only thing I could think of is a stackable fs, but it
> shouldn't care whether it is overlay-mounted or not.

For my part, autofs and autofs4.
Moving or binding isn't valid.
I tried to design that limitation out of version 5 but wasn't able to.
In time I probably can but couldn't continue to support older versions.

> 
> thanks,
> -serge
> 
> > filesystem to tell the kernel that.
> > 
> > Surely a filesystem is in a good position to be able to decide if a
> > mount request "for it" should be allowed to continue based on it's "own
> > situation and capabilities".
> > 
> > Ian
> > 
> > 
> > 
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > the body of a message to [EMAIL PROTECTED]
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 0/8] unprivileged mount syscall

2007-04-11 Thread Serge E. Hallyn

Quoting Ian Kent ([EMAIL PROTECTED]):
> On Wed, 2007-04-11 at 12:48 +0200, Miklos Szeredi wrote:
> > > > >>
> > > > >> - users can use bind mounts without having to pre-configure them in
> > > > >>   /etc/fstab
> > > > >>
> > > > 
> > > > This is by far the biggest concern I see.  I think the security 
> > > > implication of allowing anyone to do bind mounts are poorly understood.
> > > 
> > > And especially so since there is no way for a filesystem module to veto
> > > such requests.
> > 
> > The filesystem can't veto initial mounts based on destination either.
> > I don't think it's up to the filesystem to police bind/move mounts in
> > any way.
> 
> But if a filesystem can't or the developer thinks that it shouldn't for
> some reason, support bind/move mounts then there should be a way for the

Can you list some valid reasons why an fs could care where it is
mounted?  The only thing I could think of is a stackable fs, but it
shouldn't care whether it is overlay-mounted or not.

thanks,
-serge

> filesystem to tell the kernel that.
> 
> Surely a filesystem is in a good position to be able to decide if a
> mount request "for it" should be allowed to continue based on it's "own
> situation and capabilities".
> 
> Ian
> 
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch 0/8] unprivileged mount syscall

2007-04-11 Thread Ian Kent

On Wed, 2007-04-11 at 12:48 +0200, Miklos Szeredi wrote:
> > > >>
> > > >> - users can use bind mounts without having to pre-configure them in
> > > >>   /etc/fstab
> > > >>
> > > 
> > > This is by far the biggest concern I see.  I think the security 
> > > implication of allowing anyone to do bind mounts are poorly understood.
> > 
> > And especially so since there is no way for a filesystem module to veto
> > such requests.
> 
> The filesystem can't veto initial mounts based on destination either.
> I don't think it's up to the filesystem to police bind/move mounts in
> any way.

But if a filesystem can't, or the developer thinks that it shouldn't for
some reason, support bind/move mounts, then there should be a way for the
filesystem to tell the kernel that.

Surely a filesystem is in a good position to be able to decide if a
mount request "for it" should be allowed to continue based on its "own
situation and capabilities".

Ian


calling write() from interrupt context

2007-04-11 Thread Ian Brown


Hello,
I have a question regarding calling write() from an interrupt context
in the kernel: is it possible?

There is an article about reading/writing files from the kernel
by GregKH; see: http://interactive.linuxjournal.com/article/8110

Everybody (including the author) admits that reading/writing files from
the kernel is not recommended at all.
Yet, because of my interest in this, I tried it and it works.
However, calling write() from interrupt context does not work,
because write() can sleep.

Is there a way to call write() from interrupt context? Some special
filesystem or patch?
I found a dumpfs patch, but it has not been tested yet; moreover, it is a
patch against 2.6.8-rc2 and it seems to have been abandoned. See
http://lwn.net/Articles/94748/?format=printable

Any ideas?
Regards,


Ian
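
The constraint described above (write() may sleep, so it cannot run where
sleeping is forbidden) is usually worked around by deferring the write to a
context that is allowed to block. What follows is only a userspace analogy in
Python, not kernel code: the class name DeferredWriter and its methods are
hypothetical, and a thread plus a bounded queue stand in for whatever deferral
mechanism a kernel driver would actually use. The "handler" side only enqueues
and never waits for I/O; a separate worker performs the blocking write.

```python
import queue
import threading


class DeferredWriter:
    """Hypothetical helper: hand data to a worker that may block on I/O."""

    def __init__(self, path, capacity=64):
        self._q = queue.Queue(maxsize=capacity)
        self._path = path
        self.dropped = 0  # records discarded because the queue was full
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def submit(self, data):
        """Called from the context that must not wait for I/O.

        Enqueues without waiting; if the queue is full the record is
        dropped and counted instead of blocking the caller.
        """
        try:
            self._q.put_nowait(data)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def _drain(self):
        # Runs in the worker thread, where blocking on write() is fine.
        with open(self._path, "ab") as f:
            while True:
                item = self._q.get()
                if item is None:  # sentinel: stop the worker
                    break
                f.write(item)

    def close(self):
        """Flush everything queued so far and stop the worker."""
        self._q.put(None)
        self._worker.join()
```

The design choice to illustrate is the drop-on-full policy in submit(): the
caller never waits, at the cost of possibly losing records, which is counted
in `dropped` so the loss is at least visible.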

Re: [patch 0/8] unprivileged mount syscall

2007-04-11 Thread Miklos Szeredi

> > >>
> > >> - users can use bind mounts without having to pre-configure them in
> > >>   /etc/fstab
> > >>
> > 
> > This is by far the biggest concern I see.  I think the security
> > implications of allowing anyone to do bind mounts are poorly understood.
> 
> And especially so since there is no way for a filesystem module to veto
> such requests.

The filesystem can't veto initial mounts based on destination either.
I don't think it's up to the filesystem to police bind/move mounts in
any way.

Miklos

Re: [patch 0/8] unprivileged mount syscall

2007-04-11 Thread Miklos Szeredi

> 1. clone the master namespace.
> 
> 2. in the new namespace
> 
>    move the tree under /share/$me to /
>    for each ($user, $what, $how) {
>        move /share/$user/$what to /$what
>        if ($how == slave) {
>            make the mount tree under /$what as slave
>        }
>    }
> 
> 3. in the new namespace make the tree under
>    /share as private and unmount /share

Thanks.  I get the basic idea now: the namespace itself need not be
shared between the sessions, it is enough if "share" propagation is
set up between the different namespaces of a user.

I don't yet see in either your or Viro's description how the trees
under /share/$USER are initialized.  I guess they are recursively
bound from /, and are made slaves.

Miklos
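
The three quoted steps can be sketched as a dry-run planner. This is a rough
illustration under assumptions, not the proposed implementation: the function
name plan_session_namespace and the tuple layout are hypothetical, /share is
assumed to be a mount point, and using a lazy unmount (MNT_DETACH) for the
final step is a guess, since the description only says "unmount /share".
Nothing is executed; the function only lists the unshare(2), mount(2) and
umount2(2) calls that the steps would translate to, mirroring them literally.

```python
# Flag values as defined in the Linux headers (<sys/mount.h>).
MS_MOVE = 8192
MS_REC = 16384
MS_PRIVATE = 1 << 18
MS_SLAVE = 1 << 19
MNT_DETACH = 2


def plan_session_namespace(me, shares):
    """Return the ordered operations for the quoted steps 1 to 3.

    `shares` is an iterable of (user, what, how) tuples. Only
    how == "slave" adds a propagation change; any other value
    leaves the moved tree as it is. Nothing is executed here.
    """
    # Step 1: clone the master namespace.
    plan = [("unshare", "CLONE_NEWNS")]

    # Step 2: move the user's own tree to /, then each shared tree.
    plan.append(("mount", f"/share/{me}", "/", MS_MOVE))
    for user, what, how in shares:
        plan.append(("mount", f"/share/{user}/{what}", f"/{what}", MS_MOVE))
        if how == "slave":
            # Make the whole tree under /$what a slave, recursively.
            plan.append(("mount", None, f"/{what}", MS_SLAVE | MS_REC))

    # Step 3: make /share private, then unmount it.
    plan.append(("mount", None, "/share", MS_PRIVATE | MS_REC))
    plan.append(("umount2", "/share", MNT_DETACH))  # MNT_DETACH is an assumption
    return plan
```

For example, plan_session_namespace("alice", [("bob", "proj", "slave")])
yields the unshare, two moves, one recursive slave change, the private change
on /share and the final unmount, in that order.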

Re: [patch 0/8] unprivileged mount syscall

2007-04-11 Thread Miklos Szeredi

> > This patchset adds support for keeping mount ownership information in
> > the kernel, and allows unprivileged mount(2) and umount(2) in certain
> > cases.
> 
> Well, I'd like to feel all smart and point out some bugs, but the code
> all reads very nicely, seems to work as advertised, and while I won't
> have ltp results until tomorrow, boot test results in so far are all
> successful.
> 
> Looks good.

Thanks for the review and testing!

Miklos
