Re: Mirror a file system on the fly

2005-08-19 Thread Dave Quigley

Dave Schwartz wrote:


Ram,

Your code snippet seems to work great as discussed. Thanks. :-)
However, my requirement is slightly different.  What I also want is
that any file created from the mirrored/cloned file-system must not be
available in the parent file system.

Gracias,
decebel


On 8/18/05, Ram Pai <[EMAIL PROTECTED]> wrote:
 


On Thu, 2005-08-18 at 13:27, Dave Schwartz wrote:
   


Hi Ram,
Thanks for the inputs. I was going over the man pages describing the
clone system call and its option of CLONE_NEWNS. Could understand the
description only in parts.

The man page suggests that this flag when set, the cloned child is
started in a new name space, initialized with a copy of the parent.
Now does that mean, a program like a shell when cloned with
CLONE_NEWNS set, will have a copy of file hierarchy of the underlying
parent process?
 


Yes the child process will see an exact copy of all the mounts of
various filesystems as that of the parent. However if you mount/unmount
any filesystems in the child, the same will not be mounted/unmounted in
the parent and vice-versa.  Each has its individual view of the
the filesystem heirarchy.

Try the following program that clones off a child process with a mirror
namespace and gives you a bash prompt. Try mounting and unmounting
in this bash prompt and see if the same is visible in a totally
different window.


#include  
#include  
#include  

char somemem[4096];

int myfunc(){
   system("bash");
}

int
main(int argc, char *argv[])
{
   if(clone(myfunc, somemem, CLONE_NEWNS|SIGCHLD, NULL)) {
   wait(NULL);
   } else {
   printf("clone failed\n");
   }
   printf("exit\n");
}


Hope this helps,
RP




   


Gracias,
decebel



On 8/19/05, Ram Pai <[EMAIL PROTECTED]> wrote:
 


On Thu, 2005-08-18 at 12:40, Dave Schwartz wrote:
   


Hi list,

Not too sure if this is the right forum to ask this question but since
my requirement is around linux filesystems, I shall take this liberty
to post my question.

My requirement is to develop a kernel/user space module to add an
extension to the shell program environment such that this shell forks
a mirror look-alike filesystem of the underlying OS to the programs
run in that particular shell.
 


u seem to be talking about namespaces, if I get you right.

there is a flag CLONE_NEWNS to the system call 'clone' which does what
u r talking about.

RP




   


Was trying to look thru the FAQ and a few list archives to look for
ideas around my requirement. The archives were overwhelming.


Any ideas/pointers will be a great help,
Gracias,
decebel
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
 

   


-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
 

   


-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


 

You might want to look into a project I work on called unionfs It 
contains code for sandboxing which is what your looking to do. The 
current code base is available for both 2.4 and 2.6 so you can do this 
with either kernel version.  You can find the source at 
http://www.filesystems.org/project-unionfs.html the split cache source 
is what you should be looking at. It will give you an example of how to 
create a small sandbox.  If you think unionfs itself can be used for 
your purposes Ide suggest asking on their mailing list..


Dave Quigley

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix cramfs making duplicate entries in inode cache

2005-08-19 Thread Dave Johnson
Dave Johnson writes:
> Phillip Lougher writes:
> > Doesn't iget_locked() assume inode numbers are unique?
> > 
> > In Cramfs inode numbers are set to 1 for non-data inodes (fifos, 
> > sockets, devices, empty directories), i.e
> > 
> > %stat device namedpipe
> >File: `device'
> >Size: 0   Blocks: 0  IO Block: 4096   character 
> > special file
> > Device: 700h/1792d  Inode: 1   Links: 1 Device type: 1,1
> > Access: (0644/crw-r--r--)  Uid: (0/root)   Gid: (0/root)
> > Access: 1970-01-01 01:00:00.0 +0100
> > Modify: 1970-01-01 01:00:00.0 +0100
> > Change: 1970-01-01 01:00:00.0 +0100
> >File: `namedpipe'
> >Size: 0   Blocks: 0  IO Block: 4096   fifo
> > Device: 700h/1792d  Inode: 1   Links: 1
> > Access: (0644/prw-r--r--)  Uid: (0/root)   Gid: (0/root)
> > Access: 1970-01-01 01:00:00.0 +0100
> > Modify: 1970-01-01 01:00:00.0 +0100
> > Change: 1970-01-01 01:00:00.0 +0100
> > 
> > Should iget5_locked() be used here?
> 
> Yep, that was busted.  Below patch should be better.

Doh, 2nd attempt wasn't verifying the inode number in the test
function to protect against hash collisions (iget_locked does this but
iget5_locked doesn't)

It should be good now.

-- 
Dave Johnson
Starent Networks

= fs/cramfs/inode.c 1.42 vs edited =
--- 1.42/fs/cramfs/inode.c  2005-07-14 12:24:48 -04:00
+++ edited/fs/cramfs/inode.c2005-08-19 22:06:44 -04:00
@@ -42,12 +42,46 @@
 #define CRAMINO(x) ((x)->offset?(x)->offset<<2:1)
 #define OFFSET(x)  ((x)->i_ino)
 
+
+static int cramfs_iget5_test(struct inode *inode, void *opaque)
+{
+   struct cramfs_inode * cramfs_inode = (struct cramfs_inode *)opaque;
+
+   if (inode->i_ino != CRAMINO(cramfs_inode))
+   return 0; /* does not match */
+
+   if (inode->i_ino != 1)
+   return 1;
+
+   /* all empty directories, char, block, pipe, and sock, share inode #1 */
+  
+   if ((inode->i_mode != cramfs_inode->mode) ||
+   (inode->i_gid != cramfs_inode->gid) ||
+   (inode->i_uid != cramfs_inode->uid))
+   return 0; /* does not match */
+  
+   if ((S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode)) &&
+   (inode->i_rdev != old_decode_dev(cramfs_inode->size)))
+   return 0; /* does not match */
+
+   return 1; /* matches */
+}
+
+static int cramfs_iget5_set(struct inode *inode, void *opaque)
+{
+   struct cramfs_inode * cramfs_inode = (struct cramfs_inode *)opaque;
+   inode->i_ino = CRAMINO(cramfs_inode);
+   return 0;
+}
+
 static struct inode *get_cramfs_inode(struct super_block *sb, struct 
cramfs_inode * cramfs_inode)
 {
-   struct inode * inode = new_inode(sb);
+   struct inode * inode = iget5_locked(sb, CRAMINO(cramfs_inode),
+   cramfs_iget5_test, cramfs_iget5_set,
+   cramfs_inode);
static struct timespec zerotime;
 
-   if (inode) {
+   if (inode && (inode->i_state & I_NEW)) {
inode->i_mode = cramfs_inode->mode;
inode->i_uid = cramfs_inode->uid;
inode->i_size = cramfs_inode->size;
@@ -63,7 +99,6 @@
   but it's the best we can do without reading the directory
   contents.  1 yields the right result in GNU find, even
   without -noleaf option. */
-   insert_inode_hash(inode);
if (S_ISREG(inode->i_mode)) {
inode->i_fop = &generic_ro_fops;
inode->i_data.a_ops = &cramfs_aops;
@@ -75,6 +110,7 @@
init_special_inode(inode, inode->i_mode,
old_decode_dev(cramfs_inode->size));
}
+   unlock_new_inode(inode);
}
return inode;
 }

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Al Viro
On Fri, Aug 19, 2005 at 06:08:12PM -0700, Linus Torvalds wrote:
> 
> 
> On Sat, 20 Aug 2005, Al Viro wrote:
> > 
> > That looks OK except for
> > * ncpfs fix is actually missing here
> 
> Well, the thing is, with the change to page_follow_link() and 
> page_put_link(), ncpfs should now work fine - it doesn't need any fixing 
> any more.

Ah - right, it's using normal methods...
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Linus Torvalds


On Sat, 20 Aug 2005, Al Viro wrote:
> 
> That looks OK except for
>   * ncpfs fix is actually missing here

Well, the thing is, with the change to page_follow_link() and 
page_put_link(), ncpfs should now work fine - it doesn't need any fixing 
any more.

It was makign an assumption that used to be incorrect, but that assumption
became ok with the calling convention change - now the page_follow_link()
and page_put_link() functions work fine even if the page cache might get 
invalidated in between the calls.

So with your patch, things should be fine (and external filesystems will 
generate warnings on compiles, but should work fine, since the change 
maintains the ABI even if the API changed - which is pretty unusual).

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix cramfs making duplicate entries in inode cache

2005-08-19 Thread Dave Johnson
Phillip Lougher writes:
> Doesn't iget_locked() assume inode numbers are unique?
> 
> In Cramfs inode numbers are set to 1 for non-data inodes (fifos, 
> sockets, devices, empty directories), i.e
> 
> %stat device namedpipe
>File: `device'
>Size: 0   Blocks: 0  IO Block: 4096   character 
> special file
> Device: 700h/1792d  Inode: 1   Links: 1 Device type: 1,1
> Access: (0644/crw-r--r--)  Uid: (0/root)   Gid: (0/root)
> Access: 1970-01-01 01:00:00.0 +0100
> Modify: 1970-01-01 01:00:00.0 +0100
> Change: 1970-01-01 01:00:00.0 +0100
>File: `namedpipe'
>Size: 0   Blocks: 0  IO Block: 4096   fifo
> Device: 700h/1792d  Inode: 1   Links: 1
> Access: (0644/prw-r--r--)  Uid: (0/root)   Gid: (0/root)
> Access: 1970-01-01 01:00:00.0 +0100
> Modify: 1970-01-01 01:00:00.0 +0100
> Change: 1970-01-01 01:00:00.0 +0100
> 
> Should iget5_locked() be used here?

Yep, that was busted.  Below patch should be better.

-- 
Dave Johnson
Starent Networks


= fs/cramfs/inode.c 1.42 vs edited =
--- 1.42/fs/cramfs/inode.c  2005-07-14 12:24:48 -04:00
+++ edited/fs/cramfs/inode.c2005-08-19 18:47:28 -04:00
@@ -42,12 +42,43 @@
 #define CRAMINO(x) ((x)->offset?(x)->offset<<2:1)
 #define OFFSET(x)  ((x)->i_ino)
 
+
+static int cramfs_iget5_test(struct inode *inode, void *opaque)
+{
+   struct cramfs_inode * cramfs_inode = (struct cramfs_inode *)opaque;
+
+   if (inode->i_ino != 1)
+   return 1;
+
+   /* all empty directories, char, block, pipe, and sock, share inode #1 */
+  
+   if ((inode->i_mode != cramfs_inode->mode) ||
+   (inode->i_gid != cramfs_inode->gid) ||
+   (inode->i_uid != cramfs_inode->uid))
+   return 0; /* does not match */
+  
+   if ((S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode)) &&
+   (inode->i_rdev != old_decode_dev(cramfs_inode->size)))
+   return 0; /* does not match */
+
+   return 1; /* matches */
+}
+
+static int cramfs_iget5_set(struct inode *inode, void *opaque)
+{
+   struct cramfs_inode * cramfs_inode = (struct cramfs_inode *)opaque;
+   inode->i_ino = CRAMINO(cramfs_inode);
+   return 0;
+}
+
 static struct inode *get_cramfs_inode(struct super_block *sb, struct 
cramfs_inode * cramfs_inode)
 {
-   struct inode * inode = new_inode(sb);
+   struct inode * inode = iget5_locked(sb, CRAMINO(cramfs_inode),
+   cramfs_iget5_test, cramfs_iget5_set,
+   cramfs_inode);
static struct timespec zerotime;
 
-   if (inode) {
+   if (inode && (inode->i_state & I_NEW)) {
inode->i_mode = cramfs_inode->mode;
inode->i_uid = cramfs_inode->uid;
inode->i_size = cramfs_inode->size;
@@ -59,11 +92,6 @@
/* Struct copy intentional */
inode->i_mtime = inode->i_atime = inode->i_ctime = zerotime;
inode->i_ino = CRAMINO(cramfs_inode);
-   /* inode->i_nlink is left 1 - arguably wrong for directories,
-  but it's the best we can do without reading the directory
-  contents.  1 yields the right result in GNU find, even
-  without -noleaf option. */
-   insert_inode_hash(inode);
if (S_ISREG(inode->i_mode)) {
inode->i_fop = &generic_ro_fops;
inode->i_data.a_ops = &cramfs_aops;
@@ -75,6 +104,7 @@
init_special_inode(inode, inode->i_mode,
old_decode_dev(cramfs_inode->size));
}
+   unlock_new_inode(inode);
}
return inode;
 }

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Al Viro
On Sat, Aug 20, 2005 at 12:15:42AM +0100, Al Viro wrote:
> 
> That looks OK except for
>   * jffs2 is b0rken (see patch in another mail)
>   * afs, autofs4, befs, devfs, freevxfs, jffs2, jfs, ncpfs, procfs,
> smbfs, sysvfs, ufs, xfs - prototype change for ->follow_link()
>   * befs, smbfs, xfs - same for ->put_link()
>   * ncpfs fix is actually missing here
> 
> Prototype changes are covered by patch below (incremental on top of your +
> jffs2 fix upthread).  No ncpfs changes - these will go separately, assuming
> you haven't done them yet; just a plain janitor stuff.


Gaack...  And here's the patch itself - sorry.

diff -urN RC13-rc6-git10-base/fs/afs/mntpt.c current/fs/afs/mntpt.c
--- RC13-rc6-git10-base/fs/afs/mntpt.c  2005-06-17 15:48:29.0 -0400
+++ current/fs/afs/mntpt.c  2005-08-19 19:02:48.0 -0400
@@ -30,7 +30,7 @@
   struct dentry *dentry,
   struct nameidata *nd);
 static int afs_mntpt_open(struct inode *inode, struct file *file);
-static int afs_mntpt_follow_link(struct dentry *dentry, struct nameidata *nd);
+static void *afs_mntpt_follow_link(struct dentry *dentry, struct nameidata 
*nd);
 
 struct file_operations afs_mntpt_file_operations = {
.open   = afs_mntpt_open,
@@ -233,7 +233,7 @@
 /*
  * follow a link from a mountpoint directory, thus causing it to be mounted
  */
-static int afs_mntpt_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *afs_mntpt_follow_link(struct dentry *dentry, struct nameidata *nd)
 {
struct vfsmount *newmnt;
struct dentry *old_dentry;
@@ -249,7 +249,7 @@
newmnt = afs_mntpt_do_automount(dentry);
if (IS_ERR(newmnt)) {
path_release(nd);
-   return PTR_ERR(newmnt);
+   return (void *)newmnt;
}
 
old_dentry = nd->dentry;
@@ -267,7 +267,7 @@
}
 
kleave(" = %d", err);
-   return err;
+   return ERR_PTR(err);
 } /* end afs_mntpt_follow_link() */
 
 /*/
diff -urN RC13-rc6-git10-base/fs/autofs4/symlink.c current/fs/autofs4/symlink.c
--- RC13-rc6-git10-base/fs/autofs4/symlink.c2005-06-17 15:48:29.0 
-0400
+++ current/fs/autofs4/symlink.c2005-08-19 19:03:11.0 -0400
@@ -12,11 +12,11 @@
 
 #include "autofs_i.h"
 
-static int autofs4_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *autofs4_follow_link(struct dentry *dentry, struct nameidata *nd)
 {
struct autofs_info *ino = autofs4_dentry_ino(dentry);
nd_set_link(nd, (char *)ino->u.symlink);
-   return 0;
+   return NULL;
 }
 
 struct inode_operations autofs4_symlink_inode_operations = {
diff -urN RC13-rc6-git10-base/fs/befs/linuxvfs.c current/fs/befs/linuxvfs.c
--- RC13-rc6-git10-base/fs/befs/linuxvfs.c  2005-06-17 15:48:29.0 
-0400
+++ current/fs/befs/linuxvfs.c  2005-08-19 19:03:52.0 -0400
@@ -41,8 +41,8 @@
 static void befs_destroy_inode(struct inode *inode);
 static int befs_init_inodecache(void);
 static void befs_destroy_inodecache(void);
-static int befs_follow_link(struct dentry *, struct nameidata *);
-static void befs_put_link(struct dentry *, struct nameidata *);
+static void *befs_follow_link(struct dentry *, struct nameidata *);
+static void befs_put_link(struct dentry *, struct nameidata *, void *);
 static int befs_utf2nls(struct super_block *sb, const char *in, int in_len,
char **out, int *out_len);
 static int befs_nls2utf(struct super_block *sb, const char *in, int in_len,
@@ -487,10 +487,10 @@
}
 
nd_set_link(nd, link);
-   return 0;
+   return NULL;
 }
 
-static void befs_put_link(struct dentry *dentry, struct nameidata *nd)
+static void befs_put_link(struct dentry *dentry, struct nameidata *nd, void *p)
 {
befs_inode_info *befs_ino = BEFS_I(dentry->d_inode);
if (befs_ino->i_flags & BEFS_LONG_SYMLINK) {
diff -urN RC13-rc6-git10-base/fs/devfs/base.c current/fs/devfs/base.c
--- RC13-rc6-git10-base/fs/devfs/base.c 2005-06-17 15:48:29.0 -0400
+++ current/fs/devfs/base.c 2005-08-19 19:04:17.0 -0400
@@ -2491,11 +2491,11 @@
return 0;
 }  /*  End Function devfs_mknod  */
 
-static int devfs_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *devfs_follow_link(struct dentry *dentry, struct nameidata *nd)
 {
struct devfs_entry *p = get_devfs_entry_from_vfs_inode(dentry->d_inode);
nd_set_link(nd, p ? p->u.symlink.linkname : ERR_PTR(-ENODEV));
-   return 0;
+   return NULL;
 }  /*  End Function devfs_follow_link  */
 
 static struct inode_operations devfs_iops = {
diff -urN RC13-rc6-git10-base/fs/freevxfs/vxfs_immed.c 
current/fs/freevxfs/vxfs_immed.c
--- RC13-rc6-git10-base/fs/freevxfs/vxfs_immed.c2005-06-1

Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Al Viro
On Fri, Aug 19, 2005 at 03:04:52PM -0700, Linus Torvalds wrote:
> 
> 
> On Fri, 19 Aug 2005, Anton Altaparmakov wrote:
> > 
> > Yes, sure.  I have applied your patch to our 2.6.11.4 tree (with the one 
> > liner change I emailed you just now) and have kicked off a compile.
> 
> Actually, hold on. The original patch had another problem: it returned an
> uninitialized "page" pointer when page_getlink() failed.
> 
> This one should have that fixed, and has converted a few other 
> filesystems. Most of them trivially, but I took the opportunity to just 
> simplify NFS while I was at it, since it now has no reason to need to save 
> off the "struct page *" any more.
> 
> It's still not tested, but at least I've looked at it a bit more ;)

That looks OK except for
* jffs2 is b0rken (see patch in another mail)
* afs, autofs4, befs, devfs, freevxfs, jffs2, jfs, ncpfs, procfs,
smbfs, sysvfs, ufs, xfs - prototype change for ->follow_link()
* befs, smbfs, xfs - same for ->put_link()
* ncpfs fix is actually missing here

Prototype changes are covered by patch below (incremental on top of your +
jffs2 fix upthread).  No ncpfs changes - these will go separately, assuming
you haven't done them yet; just a plain janitor stuff.
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Linus Torvalds


On Fri, 19 Aug 2005, Anton Altaparmakov wrote:
> 
> Yes, sure.  I have applied your patch to our 2.6.11.4 tree (with the one 
> liner change I emailed you just now) and have kicked off a compile.

Actually, hold on. The original patch had another problem: it returned an
uninitialized "page" pointer when page_getlink() failed.

This one should have that fixed, and has converted a few other 
filesystems. Most of them trivially, but I took the opportunity to just 
simplify NFS while I was at it, since it now has no reason to need to save 
off the "struct page *" any more.

It's still not tested, but at least I've looked at it a bit more ;)

Linus

---
diff --git a/fs/autofs/symlink.c b/fs/autofs/symlink.c
--- a/fs/autofs/symlink.c
+++ b/fs/autofs/symlink.c
@@ -12,11 +12,12 @@
 
 #include "autofs_i.h"
 
-static int autofs_follow_link(struct dentry *dentry, struct nameidata *nd)
+/* Nothing to release.. */
+static void *autofs_follow_link(struct dentry *dentry, struct nameidata *nd)
 {
char *s=((struct autofs_symlink *)dentry->d_inode->u.generic_ip)->data;
nd_set_link(nd, s);
-   return 0;
+   return NULL;
 }
 
 struct inode_operations autofs_symlink_inode_operations = {
diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
--- a/fs/cifs/cifsfs.h
+++ b/fs/cifs/cifsfs.h
@@ -83,8 +83,8 @@ extern int cifs_dir_notify(struct file *
 extern struct dentry_operations cifs_dentry_ops;
 
 /* Functions related to symlinks */
-extern int cifs_follow_link(struct dentry *direntry, struct nameidata *nd);
-extern void cifs_put_link(struct dentry *direntry, struct nameidata *nd);
+extern void *cifs_follow_link(struct dentry *direntry, struct nameidata *nd);
+extern void cifs_put_link(struct dentry *direntry, struct nameidata *nd, void 
*);
 extern int cifs_readlink(struct dentry *direntry, char __user *buffer, 
 int buflen);
 extern int cifs_symlink(struct inode *inode, struct dentry *direntry,
diff --git a/fs/cifs/link.c b/fs/cifs/link.c
--- a/fs/cifs/link.c
+++ b/fs/cifs/link.c
@@ -92,7 +92,7 @@ cifs_hl_exit:
return rc;
 }
 
-int
+void *
 cifs_follow_link(struct dentry *direntry, struct nameidata *nd)
 {
struct inode *inode = direntry->d_inode;
@@ -148,7 +148,7 @@ out:
 out_no_free:
FreeXid(xid);
nd_set_link(nd, target_path);
-   return 0;
+   return NULL;/* No cookie */
 }
 
 int
@@ -330,7 +330,7 @@ cifs_readlink(struct dentry *direntry, c
return rc;
 }
 
-void cifs_put_link(struct dentry *direntry, struct nameidata *nd)
+void cifs_put_link(struct dentry *direntry, struct nameidata *nd, void *cookie)
 {
char *p = nd_get_link(nd);
if (!IS_ERR(p))
diff --git a/fs/ext2/symlink.c b/fs/ext2/symlink.c
--- a/fs/ext2/symlink.c
+++ b/fs/ext2/symlink.c
@@ -21,11 +21,11 @@
 #include "xattr.h"
 #include 
 
-static int ext2_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void *ext2_follow_link(struct dentry *dentry, struct nameidata *nd)
 {
struct ext2_inode_info *ei = EXT2_I(dentry->d_inode);
nd_set_link(nd, (char *)ei->i_data);
-   return 0;
+   return NULL;
 }
 
 struct inode_operations ext2_symlink_inode_operations = {
diff --git a/fs/ext3/symlink.c b/fs/ext3/symlink.c
--- a/fs/ext3/symlink.c
+++ b/fs/ext3/symlink.c
@@ -23,11 +23,11 @@
 #include 
 #include "xattr.h"
 
-static int ext3_follow_link(struct dentry *dentry, struct nameidata *nd)
+static void * ext3_follow_link(struct dentry *dentry, struct nameidata *nd)
 {
struct ext3_inode_info *ei = EXT3_I(dentry->d_inode);
nd_set_link(nd, (char*)ei->i_data);
-   return 0;
+   return NULL;
 }
 
 struct inode_operations ext3_symlink_inode_operations = {
diff --git a/fs/namei.c b/fs/namei.c
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -501,6 +501,7 @@ struct path {
 static inline int __do_follow_link(struct path *path, struct nameidata *nd)
 {
int error;
+   void *cookie;
struct dentry *dentry = path->dentry;
 
touch_atime(path->mnt, dentry);
@@ -508,13 +509,15 @@ static inline int __do_follow_link(struc
 
if (path->mnt == nd->mnt)
mntget(path->mnt);
-   error = dentry->d_inode->i_op->follow_link(dentry, nd);
-   if (!error) {
+   cookie = dentry->d_inode->i_op->follow_link(dentry, nd);
+   error = PTR_ERR(cookie);
+   if (!IS_ERR(cookie)) {
char *s = nd_get_link(nd);
+   error = 0;
if (s)
error = __vfs_follow_link(nd, s);
if (dentry->d_inode->i_op->put_link)
-   dentry->d_inode->i_op->put_link(dentry, nd);
+   dentry->d_inode->i_op->put_link(dentry, nd, cookie);
}
dput(dentry);
mntput(path->mnt);
@@ -2344,15 +2347,17 @@ out:
 int generic_readlink(struct dentry *dentry, char __user *buffer, int buflen)
 {
struct nameidata nd;
-   int res;
+   void *cookie;
+
nd.dept

Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Al Viro
On Fri, Aug 19, 2005 at 01:35:49PM -0700, Linus Torvalds wrote:
> I'm not sure if this merits a new file or new organization (hey,
> fs/lib/xxx might be good in theory). In particular, I had actually been
> hoping to release 2.6.13 today, but this seems like a valid thing to hold 
> things up for - but not if we're going to re-organize things.
> 
> The one thing that strikes me is that we might actually have less pain if
> we just changed the calling convention for follow_link/put_link slightly
> instead of creating a new library function. The existing "page cache"  
> thing really _does_ work very well, and would work fine for NFS and ncpfs
> too, if we just allowed an extra cookie to be passed along from
> "follow_link()" to "put_link()".

Er...  We can do that, but current calling conventions for ->follow_link()
are already too complex (and we had bugs in that area).  What we have is
1) you can return -error; then you must release nameidata yourself
and ->put_link() will _not_ be called.
2) you can do nd_set_link(nd, ERR_PTR(-error)) and return 0
3) you can do nd_set_link(nd, path) and return 0
4) you can return 0 (after having moved nameidata yourself)
Note that in practice (2) is far more convenient than (1).  So pretty much
everything does it for errors and (1) turns out to be a source of bugs -
people forget to drop nameidata.  The only real case for (1) is failed
(4) - when we went too far and already dropped nameidata naturally.

Let me take a look at the current situation...
* a bunch of filesystems do only (2) and (3).  That's the normal
case.
* jffs2 follow_link() is broken - it has an exit where it returns
-EIO and leaks nameidata.
* procfs special symlinks - (1) or (4), correctly used.
* afs magic - (1) or (4), correctly used.
* hppfs - badly broken for unrelated reasons.

Adding "you can return a cookie as long as it's not ERR_PTR() and it will
be treated like (2), (3) or (4), except that you won't want it in (4)" will
add to confusion.  And of course, we will have to update every ->put_link()
and ->follow_link() out there...

_If_ you want 2.6.13 with minimal PITA right now, I'd suggest leaving API
as is until 2.6.13-git1 and using stray_page_... + patch below to deal with
jffs2 until then.  Then we can deal with API change - immediately after
2.6.13 release.  Removing stray_... in process.

diff -urN RC13-rc6-git10-base/fs/jffs2/symlink.c current/fs/jffs2/symlink.c
--- RC13-rc6-git10-base/fs/jffs2/symlink.c  2005-08-10 10:37:52.0 
-0400
+++ current/fs/jffs2/symlink.c  2005-08-19 17:35:52.0 -0400
@@ -30,6 +30,7 @@
 static int jffs2_follow_link(struct dentry *dentry, struct nameidata *nd)
 {
struct jffs2_inode_info *f = JFFS2_INODE_INFO(dentry->d_inode);
+   char *p = (char *)f->dents;

/*
 * We don't acquire the f->sem mutex here since the only data we
@@ -45,13 +46,14 @@
 * nd_set_link() call.
 */

-   if (!f->dents) {
+   if (!p) {
printk(KERN_ERR "jffs2_follow_link(): can't find symlink 
taerget\n");
-   return -EIO;
+   p = ERR_PTR(-EIO);
+   } else {
+   D1(printk(KERN_DEBUG "jffs2_follow_link(): target path is 
'%s'\n", (char *) f->dents));
}
-   D1(printk(KERN_DEBUG "jffs2_follow_link(): target path is '%s'\n", 
(char *) f->dents));
 
-   nd_set_link(nd, (char *)f->dents);
+   nd_set_link(nd, p);

/*
 * We unlock the f->sem mutex but VFS will use the f->dents string. 
This is safe
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Anton Altaparmakov
On Fri, 19 Aug 2005, Linus Torvalds wrote:
> On Fri, 19 Aug 2005, Anton Altaparmakov wrote:
> > It does disable link caching.  But I didn't make this up.  This is exactly 
> > what smbfs uses.  I just copied smbfs given ncpfs copies almost everything 
> > smbfs does anyway...
> 
> Can you test whether the untested test-patch I sent out seems to work for 
> your case that bugs out? You'll probably get a few new "initialization 
> from incompatible pointer type" warnings, but I do believe that they 
> should be harmless at least on normal architectures.

Yes, sure.  I have applied your patch to our 2.6.11.4 tree (with the one 
liner change I emailed you just now) and have kicked off a compile.  I am 
afraid the actual testing will have to wait for tomorrow as I was actually 
on the way to bed and the make modules is taking forever or at least 
longer than I can keep my eyes open...

Best regards,

Anton
-- 
Anton Altaparmakov  (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Linus Torvalds


On Fri, 19 Aug 2005, Anton Altaparmakov wrote:
> 
> You are throwing away the error return from vfs_readlink().  I suspect you 
> wanted:
> 
> + cookie = ERR_PTR(res);

Yes, thanks.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix cramfs making duplicate entries in inode cache

2005-08-19 Thread Phillip Lougher

Dave Johnson wrote:



Patch below fixes this by making get_cramfs_inode() use the inode
cache before blindly creating a new entry every time.  This eliminates
the duplicate inodes and duplicate buffer cache.


> +  struct inode * inode = iget_locked(sb, CRAMINO(cramfs_inode));

Doesn't iget_locked() assume inode numbers are unique?

In Cramfs inode numbers are set to 1 for non-data inodes (fifos, 
sockets, devices, empty directories), i.e


%stat device namedpipe
  File: `device'
  Size: 0   Blocks: 0  IO Block: 4096   character 
special file

Device: 700h/1792d  Inode: 1   Links: 1 Device type: 1,1
Access: (0644/crw-r--r--)  Uid: (0/root)   Gid: (0/root)
Access: 1970-01-01 01:00:00.0 +0100
Modify: 1970-01-01 01:00:00.0 +0100
Change: 1970-01-01 01:00:00.0 +0100
  File: `namedpipe'
  Size: 0   Blocks: 0  IO Block: 4096   fifo
Device: 700h/1792d  Inode: 1   Links: 1
Access: (0644/prw-r--r--)  Uid: (0/root)   Gid: (0/root)
Access: 1970-01-01 01:00:00.0 +0100
Modify: 1970-01-01 01:00:00.0 +0100
Change: 1970-01-01 01:00:00.0 +0100

Should iget5_locked() be used here?

Phillip
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Anton Altaparmakov
On Fri, 19 Aug 2005, Linus Torvalds wrote:
> The one thing that strikes me is that we might actually have less pain if
> we just changed the calling convention for follow_link/put_link slightly
> instead of creating a new library function. The existing "page cache"  
> thing really _does_ work very well, and would work fine for NFS and ncpfs
> too, if we just allowed an extra cookie to be passed along from
> "follow_link()" to "put_link()".
> 
> A patch like this (totally untested, and you'd need to update any
> filesystems that don't use the regular page_follow_link interface) would
> seem to clean up the mess nicely.. The basic change is that follow_link() 
> returns a error-pointer cookie instead of just zero or error, and that is 
> passed into put_link().
> 
> That simple calling convention change solves all problems. It so _happens_ 
> that any old binary code also continues to work (the cookie will be zero, 
> and put_link will ignore it), so it shouldn't even break any unconverted 
> stuff (unless they mix using their own functions _and_ using the helpher 
> functions, which is of course possible).
> 
> The "shouldn't break nonconverted filesystems" makes me think this is a 
> safe change. Comments?
> 
> NOTE NOTE NOTE! Let me say again that it's untested. It might not break 
> nonconverted filesystems, but it equally well migth break even the 
> converted ones ;)
> 
>   Linus
> 
> 
> diff --git a/fs/namei.c b/fs/namei.c
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -501,6 +501,7 @@ struct path {
>  static inline int __do_follow_link(struct path *path, struct nameidata *nd)
>  {
>   int error;
> + void *cookie;
>   struct dentry *dentry = path->dentry;
>  
>   touch_atime(path->mnt, dentry);
> @@ -508,13 +509,15 @@ static inline int __do_follow_link(struc
>  
>   if (path->mnt == nd->mnt)
>   mntget(path->mnt);
> - error = dentry->d_inode->i_op->follow_link(dentry, nd);
> - if (!error) {
> + cookie = dentry->d_inode->i_op->follow_link(dentry, nd);
> + error = PTR_ERR(cookie);
> + if (!IS_ERR(cookie)) {
>   char *s = nd_get_link(nd);
> + error = 0;
>   if (s)
>   error = __vfs_follow_link(nd, s);
>   if (dentry->d_inode->i_op->put_link)
> - dentry->d_inode->i_op->put_link(dentry, nd);
> + dentry->d_inode->i_op->put_link(dentry, nd, cookie);
>   }
>   dput(dentry);
>   mntput(path->mnt);
> @@ -2345,14 +2348,17 @@ int generic_readlink(struct dentry *dent
>  {
>   struct nameidata nd;
>   int res;
> + void *cookie;
> +
>   nd.depth = 0;
> - res = dentry->d_inode->i_op->follow_link(dentry, &nd);
> - if (!res) {
> + cookie = dentry->d_inode->i_op->follow_link(dentry, &nd);
> + if (!IS_ERR(cookie)) {
>   res = vfs_readlink(dentry, buffer, buflen, nd_get_link(&nd));
>   if (dentry->d_inode->i_op->put_link)
> - dentry->d_inode->i_op->put_link(dentry, &nd);
> + dentry->d_inode->i_op->put_link(dentry, &nd, cookie);
> + cookie = ERR_PTR(0);

You are throwing away the error return from vfs_readlink().  I suspect you 
wanted:

+   cookie = ERR_PTR(res);

>   }
> - return res;
> + return PTR_ERR(cookie);
>  }
>  
>  int vfs_follow_link(struct nameidata *nd, const char *link)
> @@ -2395,23 +2401,19 @@ int page_readlink(struct dentry *dentry,
>   return res;
>  }
>  
> -int page_follow_link_light(struct dentry *dentry, struct nameidata *nd)
> +void *page_follow_link_light(struct dentry *dentry, struct nameidata *nd)
>  {
>   struct page *page;
>   nd_set_link(nd, page_getlink(dentry, &page));
> - return 0;
> + return page;
>  }
>  
> -void page_put_link(struct dentry *dentry, struct nameidata *nd)
> +void page_put_link(struct dentry *dentry, struct nameidata *nd, void *cookie)
>  {
>   if (!IS_ERR(nd_get_link(nd))) {
> - struct page *page;
> - page = find_get_page(dentry->d_inode->i_mapping, 0);
> - if (!page)
> - BUG();
> + struct page *page = cookie;
>   kunmap(page);
>   page_cache_release(page);
> - page_cache_release(page);
>   }
>  }
>  
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -993,8 +993,8 @@ struct inode_operations {
>   int (*rename) (struct inode *, struct dentry *,
>   struct inode *, struct dentry *);
>   int (*readlink) (struct dentry *, char __user *,int);
> - int (*follow_link) (struct dentry *, struct nameidata *);
> - void (*put_link) (struct dentry *, struct nameidata *);
> + void * (*follow_link) (struct dentry *, struct nameidata *);
> + void (*put_link) (struct dentry *, struct nameidata *, void *);
>   void (*truncate) (struct inode 

Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Christoph Hellwig
On Fri, Aug 19, 2005 at 08:43:06PM +0100, Al Viro wrote:
> On Fri, Aug 19, 2005 at 08:41:29PM +0100, Matthew Wilcox wrote:
> > > is getting crowded.  Linus, do you have any objections to that or 
> > > suggestions
> > > on filename here?
> > 
> > fs/symlink.c?
> 
> Or fs/lib/symlink.c...

That's a very good idea.  We were in need of fs/lib/ for a long time
to unwind all the generic routines of the VFS logic.
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Linus Torvalds


On Fri, 19 Aug 2005, Anton Altaparmakov wrote:
> 
> It does disable link caching.  But I didn't make this up.  This is exactly 
> what smbfs uses.  I just copied smbfs given ncpfs copies almost everything 
> smbfs does anyway...

Can you test whether the untested test-patch I sent out seems to work for 
your case that bugs out? You'll probably get a few new "initialization 
from incompatible pointer type" warnings, but I do believe that they 
should be harmless at least on normal architectures.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] Generic fallback for security xattrs

2005-08-19 Thread Chris Wright
* Christoph Hellwig ([EMAIL PROTECTED]) wrote:
> On Fri, Aug 19, 2005 at 01:57:56PM -0400, Stephen Smalley wrote:
> > Note that this
> > approach may be controversial [1]; it has been suggested that we
> > should instead be modifying all filesystem types to support security
> > (and other) xattrs natively, but this seems questionable for legacy
> > filesystems like vfat and pseudo filesystems like proc, especially
> > when the resulting code will end up simply calling the security
> > framework to access the incore security label as with the current
> > devpts and tmpfs handlers.

Agreed, I think the counter points that were made were not reasonable.

> > The patch restructures the code flow slightly to reduce duplication
> > between the normal path and the fallback path, but this should only have
> > one user-visible side effect - a program may get -EACCES rather than
> > -EOPNOTSUPP if policy denied access but the filesystem didn't support
> > the operation anyway.  Note that the post_setxattr hook call is not
> > needed in the fallback case, as the inode_setsecurity hook call handles
> > the incore inode security state update directly.  In contrast, we do
> > call fsnotify in both cases.
> > 
> > Let me know what you think.  Please do NOT apply yet.
> 
> Very nice, and gets rid of lots of crap.  Now that we started parsing
> the attribute name in generic code we should deprecate the old
> ->{get,set,list,remove}xattr inode operations and make the helpers
> James added a while ago mandatory for the future.

I agree it's a nice cleanup.  The only thing I didn't care for was parsing
name in generic code (since it was special cased as the only name), but
you make a great point.  There are toplevel generic names which each fs
has to parse anyway.
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Anton Altaparmakov
On Fri, 19 Aug 2005, Linus Torvalds wrote:
> On Fri, 19 Aug 2005, Linus Torvalds wrote:
> > 
> >  - document this as a fundamental fact, and apply the ncpfs patch. Local 
> >filesystems can still continue to use the generic helper functions 
> >(all other users _are_ local filesystems).
> 
> Actually, looking at the ncpfs patch, I'd rather not apply that patch 
> as-is. It looks like it will totally disable symlink caching, which would 
> be kind of sad. Somebody willing to do the same thing NFS does?

It does disable link caching.  But I didn't make this up.  This is exactly 
what smbfs uses.  I just copied smbfs given ncpfs copies almost everything 
smbfs does anyway...

Best regards,

Anton
-- 
Anton Altaparmakov  (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] Generic fallback for security xattrs

2005-08-19 Thread Jeff Mahoney
Christoph Hellwig wrote:
> Very nice, and gets rid of lots of crap.  Now that we started parsing
> the attribute name in generic code we should deprecate the old
> ->{get,set,list,remove}xattr inode operations and make the helpers
> James added a while ago mandatory for the future.

I'd be fine with making the generic xattr routines the default with a
few exceptions:

1) Pass the entire name of the xattr (including prefix) to the
underlying filesystem. ReiserFS needs the entire name, including the
prefix, to look up an xattr. Reconstructing the name again, when it was
available in the caller, is wasteful. Once the determination of the
prefix is made, the filesystem code can just skip the prefix length
anyway if it doesn't need it.

2) Keep i_op->listxattr. For xattrs like system.posix_acl_access, sure,
it makes sense. But when selinux or user xattrs are introduced, the name
and number of such xattrs will be unknown to a generic routine.

I have a series of patches for reiserfs xattrs that I'm still working
the kinks out of, mainly performance-wise. With the patch set applied,
reiserfs_{set,get,remove}xattr look identical to the generic versions,
except that find_xattr_handler_prefix doesn't modify the name.

-Jeff

-- 
Jeff Mahoney
SuSE Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Linus Torvalds


On Fri, 19 Aug 2005, Al Viro wrote:
> 
> FWIW, I'd rather take page_symlink(), page_symlink_inode_operations,
> page_put_link(), page_follow_link_light(), page_readlink(), page_getlink(),
> generic_readlink() and vfs_readlink() to the same place where these guys
> would live.  They all belong together and none of them has any business
> in fs/namei.c.  Options: fs/libfs.c or separate library since fs/libfs.c
> is getting crowded.  Linus, do you have any objections to that or suggestions
> on filename here?

I'm not sure if this merits a new file or new organization (hey,
fs/lib/xxx might be good in theory). In particular, I had actually been
hoping to release 2.6.13 today, but this seems like a valid thing to hold 
things up for - but not if we're going to re-organize things.

The one thing that strikes me is that we might actually have less pain if
we just changed the calling convention for follow_link/put_link slightly
instead of creating a new library function. The existing "page cache"  
thing really _does_ work very well, and would work fine for NFS and ncpfs
too, if we just allowed an extra cookie to be passed along from
"follow_link()" to "put_link()".

A patch like this (totally untested, and you'd need to update any
filesystems that don't use the regular page_follow_link interface) would
seem to clean up the mess nicely.. The basic change is that follow_link() 
returns a error-pointer cookie instead of just zero or error, and that is 
passed into put_link().

That simple calling convention change solves all problems. It so _happens_ 
that any old binary code also continues to work (the cookie will be zero, 
and put_link will ignore it), so it shouldn't even break any unconverted 
stuff (unless they mix using their own functions _and_ using the helpher 
functions, which is of course possible).

The "shouldn't break nonconverted filesystems" makes me think this is a 
safe change. Comments?

NOTE NOTE NOTE! Let me say again that it's untested. It might not break 
nonconverted filesystems, but it equally well migth break even the 
converted ones ;)

Linus


diff --git a/fs/namei.c b/fs/namei.c
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -501,6 +501,7 @@ struct path {
 static inline int __do_follow_link(struct path *path, struct nameidata *nd)
 {
int error;
+   void *cookie;
struct dentry *dentry = path->dentry;
 
touch_atime(path->mnt, dentry);
@@ -508,13 +509,15 @@ static inline int __do_follow_link(struc
 
if (path->mnt == nd->mnt)
mntget(path->mnt);
-   error = dentry->d_inode->i_op->follow_link(dentry, nd);
-   if (!error) {
+   cookie = dentry->d_inode->i_op->follow_link(dentry, nd);
+   error = PTR_ERR(cookie);
+   if (!IS_ERR(cookie)) {
char *s = nd_get_link(nd);
+   error = 0;
if (s)
error = __vfs_follow_link(nd, s);
if (dentry->d_inode->i_op->put_link)
-   dentry->d_inode->i_op->put_link(dentry, nd);
+   dentry->d_inode->i_op->put_link(dentry, nd, cookie);
}
dput(dentry);
mntput(path->mnt);
@@ -2345,14 +2348,17 @@ int generic_readlink(struct dentry *dent
 {
struct nameidata nd;
int res;
+   void *cookie;
+
nd.depth = 0;
-   res = dentry->d_inode->i_op->follow_link(dentry, &nd);
-   if (!res) {
+   cookie = dentry->d_inode->i_op->follow_link(dentry, &nd);
+   if (!IS_ERR(cookie)) {
res = vfs_readlink(dentry, buffer, buflen, nd_get_link(&nd));
if (dentry->d_inode->i_op->put_link)
-   dentry->d_inode->i_op->put_link(dentry, &nd);
+   dentry->d_inode->i_op->put_link(dentry, &nd, cookie);
+   cookie = ERR_PTR(0);
}
-   return res;
+   return PTR_ERR(cookie);
 }
 
 int vfs_follow_link(struct nameidata *nd, const char *link)
@@ -2395,23 +2401,19 @@ int page_readlink(struct dentry *dentry,
return res;
 }
 
-int page_follow_link_light(struct dentry *dentry, struct nameidata *nd)
+void *page_follow_link_light(struct dentry *dentry, struct nameidata *nd)
 {
struct page *page;
nd_set_link(nd, page_getlink(dentry, &page));
-   return 0;
+   return page;
 }
 
-void page_put_link(struct dentry *dentry, struct nameidata *nd)
+void page_put_link(struct dentry *dentry, struct nameidata *nd, void *cookie)
 {
if (!IS_ERR(nd_get_link(nd))) {
-   struct page *page;
-   page = find_get_page(dentry->d_inode->i_mapping, 0);
-   if (!page)
-   BUG();
+   struct page *page = cookie;
kunmap(page);
page_cache_release(page);
-   page_cache_release(page);
}
 }
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -993,8 +993,8 @@ struct inod

[PATCH] fix cramfs making duplicate entries in inode cache

2005-08-19 Thread Dave Johnson

Every time cramfs_lookup() is called to lookup and inode for a dentry,
get_cramfs_inode() will allocate a new inode without checking to see
if that inode already exists in the inode cache.

This is fine the first time, but if the dentry cache entry(ies)
associated with that inode are aged out, but the inode entry is not
aged out (which can be quite common if the inode has buffer cache
linked to it), cramfs_lookup() will be called again and another inode
will be allocated and added to the inode cache creating a duplicate in
the inode cache.

The big issue here is that the buffers associated with each inode
cache entry are not shared between the duplicates!

The older inode entries are now orphaned as no dentry points to it and
won't be freed until the buffer cache assoicated with them are first
freed.  The newest entry will have to create all new buffer cache for
each part of its file as the old buffer cache is now orphaned as well.

Patch below fixes this by making get_cramfs_inode() use the inode
cache before blindly creating a new entry every time.  This eliminates
the duplicate inodes and duplicate buffer cache.

-- 
Dave Johnson
Starent Networks


= fs/cramfs/inode.c 1.42 vs edited =
--- 1.42/fs/cramfs/inode.c  2005-07-14 12:24:48 -04:00
+++ edited/fs/cramfs/inode.c2005-08-19 15:39:05 -04:00
@@ -44,10 +44,10 @@
 
 static struct inode *get_cramfs_inode(struct super_block *sb, struct 
cramfs_inode * cramfs_inode)
 {
-   struct inode * inode = new_inode(sb);
+   struct inode * inode = iget_locked(sb, CRAMINO(cramfs_inode));
static struct timespec zerotime;
 
-   if (inode) {
+   if (inode && (inode->i_state & I_NEW)) {
inode->i_mode = cramfs_inode->mode;
inode->i_uid = cramfs_inode->uid;
inode->i_size = cramfs_inode->size;
@@ -57,11 +57,6 @@
/* Struct copy intentional */
inode->i_mtime = inode->i_atime = inode->i_ctime = zerotime;
inode->i_ino = CRAMINO(cramfs_inode);
-   /* inode->i_nlink is left 1 - arguably wrong for directories,
-  but it's the best we can do without reading the directory
-  contents.  1 yields the right result in GNU find, even
-  without -noleaf option. */
-   insert_inode_hash(inode);
if (S_ISREG(inode->i_mode)) {
inode->i_fop = &generic_ro_fops;
inode->i_data.a_ops = &cramfs_aops;
@@ -76,6 +72,7 @@
init_special_inode(inode, inode->i_mode,
old_decode_dev(cramfs_inode->size));
}
+   unlock_new_inode(inode);
}
return inode;
 }

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC] [PATCH] Stacking support for inode_init_security

2005-08-19 Thread hallyn
The following patch changes the (new to -mm) inode_init_security
function to support multiple LSMs.  It does this by placing the
three passed arguments (name, value, len) into a structure, and
passing in a list_head, onto which the structure can be appended.
The callers (filesystems) call their _xattr_set functions
on each returned (name, value, len) set.

This is useful both for the stacker LSM, and for any two (or more)
LSMs which might want to cooperate even without stacker.

I've tested it under a plain selinux-enabled 2.6.13-rc6-mm1 using
Stephen Smalley's sample exploit originally motivating
inode_init_security, as well as with a simple 'touch ab; ls -Z ab'.

I've also tested it with a corresponding stacker patch, with
selinux stacked with two test LSMs which simply define
inode_init_security.  Again, this passed the sample exploit, and
manually inspecting the .security xattrs gave the expected results.


thanks,
-serge

--
 fs/ext2/xattr_security.c |   22 ++
 fs/ext3/xattr_security.c |   22 ++
 include/linux/security.h |   30 +++---
 mm/shmem.c   |6 ++
 security/dummy.c |2 +-
 security/selinux/hooks.c |   42 ++
 6 files changed, 76 insertions(+), 48 deletions(-)

Index: linux-2.6.13-rc6-mm1/include/linux/security.h
===
--- linux-2.6.13-rc6-mm1.orig/include/linux/security.h  2005-08-19 
17:39:44.0 -0500
+++ linux-2.6.13-rc6-mm1/include/linux/security.h   2005-08-19 
19:02:30.0 -0500
@@ -89,6 +89,14 @@ struct swap_info_struct;
 
 #ifdef CONFIG_SECURITY
 
+struct xattr_data {
+   struct list_head list;
+   char *name;
+   void *value;
+   int len;
+};
+
+
 /**
  * struct security_operations - main security structure
  *
@@ -263,9 +271,13 @@ struct swap_info_struct;
  * then it should return -EOPNOTSUPP to skip this processing.
  * @inode contains the inode structure of the newly created inode.
  * @dir contains the inode structure of the parent directory.
- * @name will be set to the allocated name suffix (e.g. selinux).
- * @value will be set to the allocated attribute value.
- * @len will be set to the length of the value.
+ * @head, if not null, points to a listhead to which to append a
+ * newly allocated struct xattr_data with the following data:
+ * @data->name will be set to the allocated name suffix
+ * (e.g. selinux).
+ * @data->value will be set to the allocated attribute value.
+ * @data->len will be set to the length of the value.
+ * @data->list is used to add the data to the list_head
  * Returns 0 if @name and @value have been successfully set,
  * -EOPNOTSUPP if no security attribute is needed, or
  * -ENOMEM on memory allocation failure.
@@ -1064,7 +1076,7 @@ struct security_operations {
int (*inode_alloc_security) (struct inode *inode);  
void (*inode_free_security) (struct inode *inode);
int (*inode_init_security) (struct inode *inode, struct inode *dir,
-   char **name, void **value, size_t *len);
+   struct list_head *head);
int (*inode_create) (struct inode *dir,
 struct dentry *dentry, int mode);
int (*inode_link) (struct dentry *old_dentry,
@@ -1415,13 +1427,11 @@ static inline void security_inode_free (
 
 static inline int security_inode_init_security (struct inode *inode,
struct inode *dir,
-   char **name,
-   void **value,
-   size_t *len)
+   struct list_head *head)
 {
if (unlikely (IS_PRIVATE (inode)))
return -EOPNOTSUPP;
-   return security_ops->inode_init_security (inode, dir, name, value, len);
+   return security_ops->inode_init_security (inode, dir, head);
 }

 static inline int security_inode_create (struct inode *dir,
@@ -2103,9 +2113,7 @@ static inline void security_inode_free (
 
 static inline int security_inode_init_security (struct inode *inode,
struct inode *dir,
-   char **name,
-   void **value,
-   size_t *len)
+   struct list_head *head)
 {
return -EOPNOTSUPP;
 }
Index: linux-2.6.13-rc6-mm1/mm/shmem.c
===
--- linux-2.6.13-rc6-mm1.orig/mm/shmem.c2005-08-19 17:39:44.0 
-0500
+++ linux-2.6.13-rc

Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Mika Penttilä

Al Viro wrote:


On Fri, Aug 19, 2005 at 10:16:47PM +0300, Mika Penttilä wrote:
 

Just out of curiosity - what protects even local filesystems against 
concurrent truncate and symlink resolving when using the page cache helpers?
   



How do you get truncate(2) or ftruncate(2) to do something with a symlink?
The former follows links, the latter takes an open file...
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

 

Yes that is right, there is no way to invalidate the symlink inode 
mapping page(s) from user space.


--Mika




-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Al Viro
On Fri, Aug 19, 2005 at 08:41:29PM +0100, Matthew Wilcox wrote:
> > is getting crowded.  Linus, do you have any objections to that or 
> > suggestions
> > on filename here?
> 
> fs/symlink.c?

Or fs/lib/symlink.c...
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Matthew Wilcox
On Fri, Aug 19, 2005 at 08:38:34PM +0100, Al Viro wrote:
> FWIW, I'd rather take page_symlink(), page_symlink_inode_operations,
> page_put_link(), page_follow_link_light(), page_readlink(), page_getlink(),
> generic_readlink() and vfs_readlink() to the same place where these guys
> would live.  They all belong together and none of them has any business
> in fs/namei.c.  Options: fs/libfs.c or separate library since fs/libfs.c
> is getting crowded.  Linus, do you have any objections to that or suggestions
> on filename here?

fs/symlink.c?

-- 
"Next the statesmen will invent cheap lies, putting the blame upon 
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince 
himself that the war is just, and will thank God for the better sleep 
he enjoys after this process of grotesque self-deception." -- Mark Twain
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Al Viro
On Fri, Aug 19, 2005 at 10:16:47PM +0300, Mika Penttilä wrote:
> Just out of curiosity - what protects even local filesystems against 
> concurrent truncate and symlink resolving when using the page cache helpers?

How do you get truncate(2) or ftruncate(2) to do something with a symlink?
The former follows links, the latter takes an open file...
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Al Viro
On Fri, Aug 19, 2005 at 07:00:38PM +0100, Christoph Hellwig wrote:
> On Fri, Aug 19, 2005 at 07:02:18PM +0100, Al Viro wrote:
> > On Fri, Aug 19, 2005 at 05:53:32PM +0100, Al Viro wrote:
> > > I'm taking NFS helpers to libfs.c and switching ncpfs to them.  IMO that's
> > > better than copying the damn thing and other network filesystems might 
> > > have
> > > the same needs eventually...
> > 
> > [something like this - completely untested]
> > 
> > * stray_page_get_link(inode, filler) - returns ERR_PTR(error) or pointer
> > to symlink body.  Said symlink body sits in a page at offset equal to
> > offsetof(page, struct stray_page_link).  filler() is expected to put it
> > at such offset. Page is cached.
> > 
> > * stray_page_put_link() - ->put_link() suitable for links obtained from
> > stray_page_get_link().  Unlike the usual pagecache-based variants, this
> > sucker does _not_ rely on page staying cached.
> > 
> > * nfs and ncpfs switched to the helpers above.
> 
> Can you add some kerneldoc comments to describe them?  Especially as
> the name is not very descriptive.

Hey, if anybody has suggestions on names - they are very welcome ;-)

FWIW, I'd rather take page_symlink(), page_symlink_inode_operations,
page_put_link(), page_follow_link_light(), page_readlink(), page_getlink(),
generic_readlink() and vfs_readlink() to the same place where these guys
would live.  They all belong together and none of them has any business
in fs/namei.c.  Options: fs/libfs.c or separate library since fs/libfs.c
is getting crowded.  Linus, do you have any objections to that or suggestions
on filename here?
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Mika Penttilä

Al Viro wrote:


On Fri, Aug 19, 2005 at 05:53:32PM +0100, Al Viro wrote:
 


I'm taking NFS helpers to libfs.c and switching ncpfs to them.  IMO that's
better than copying the damn thing and other network filesystems might have
the same needs eventually...
   



[something like this - completely untested]

* stray_page_get_link(inode, filler) - returns ERR_PTR(error) or pointer
to symlink body.  Said symlink body sits in a page at offset equal to
offsetof(page, struct stray_page_link).  filler() is expected to put it
at such offset. Page is cached.

* stray_page_put_link() - ->put_link() suitable for links obtained from
stray_page_get_link().  Unlike the usual pagecache-based variants, this
sucker does _not_ rely on page staying cached.

* nfs and ncpfs switched to the helpers above.

Signed-off-by: Al Viro <[EMAIL PROTECTED]>

 



Just out of curiosity - what protects even local filesystems against 
concurrent truncate and symlink resolving when using the page cache helpers?


--Mika

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC][PATCH] Generic fallback for security xattrs

2005-08-19 Thread Stephen Smalley
This is a request for comments (only) on the patch below that modifies
the VFS setxattr, getxattr, and listxattr code to fall back to the
security module for security xattrs if the filesystem does not support
xattrs natively.  This allows security modules to export the incore
inode security label information to userspace even if the filesystem
does not provide xattr storage, and eliminates the need to
individually patch various pseudo filesystem types to provide such
access (note that this patch removes the existing xattr code from
devpts and tmpfs as it is then no longer needed). Note that this
approach may be controversial [1]; it has been suggested that we
should instead be modifying all filesystem types to support security
(and other) xattrs natively, but this seems questionable for legacy
filesystems like vfat and pseudo filesystems like proc, especially
when the resulting code will end up simply calling the security
framework to access the incore security label as with the current
devpts and tmpfs handlers.

The patch restructures the code flow slightly to reduce duplication
between the normal path and the fallback path, but this should only have
one user-visible side effect - a program may get -EACCES rather than
-EOPNOTSUPP if policy denied access but the filesystem didn't support
the operation anyway.  Note that the post_setxattr hook call is not
needed in the fallback case, as the inode_setsecurity hook call handles
the incore inode security state update directly.  In contrast, we do
call fsnotify in both cases.

Let me know what you think.  Please do NOT apply yet.

A prior discussion thread on this patch on the linux-security-module
list is available at
[1] http://marc.theaimsgroup.com/?t=11214414723&r=1&w=2

---

 fs/Kconfig |   43 --
 fs/devpts/Makefile |1 
 fs/devpts/inode.c  |   21 ---
 fs/devpts/xattr_security.c |   47 
 fs/xattr.c |   80 +-
 mm/shmem.c |   85 -
 6 files changed, 49 insertions(+), 228 deletions(-)

diff -X /home/sds/dontdiff -Nrup linux-2.6.13-rc6-mm1/fs/devpts/inode.c 
linux-2.6.13-rc6-mm1-xattr/fs/devpts/inode.c
--- linux-2.6.13-rc6-mm1/fs/devpts/inode.c  2005-06-17 15:48:29.0 
-0400
+++ linux-2.6.13-rc6-mm1-xattr/fs/devpts/inode.c2005-08-19 
13:02:58.0 -0400
@@ -18,28 +18,9 @@
 #include 
 #include 
 #include 
-#include 
 
 #define DEVPTS_SUPER_MAGIC 0x1cd1
 
-extern struct xattr_handler devpts_xattr_security_handler;
-
-static struct xattr_handler *devpts_xattr_handlers[] = {
-#ifdef CONFIG_DEVPTS_FS_SECURITY
-   &devpts_xattr_security_handler,
-#endif
-   NULL
-};
-
-static struct inode_operations devpts_file_inode_operations = {
-#ifdef CONFIG_DEVPTS_FS_XATTR
-   .setxattr   = generic_setxattr,
-   .getxattr   = generic_getxattr,
-   .listxattr  = generic_listxattr,
-   .removexattr= generic_removexattr,
-#endif
-};
-
 static struct vfsmount *devpts_mnt;
 static struct dentry *devpts_root;
 
@@ -102,7 +83,6 @@ devpts_fill_super(struct super_block *s,
s->s_blocksize_bits = 10;
s->s_magic = DEVPTS_SUPER_MAGIC;
s->s_op = &devpts_sops;
-   s->s_xattr = devpts_xattr_handlers;
s->s_time_gran = 1;
 
inode = new_inode(s);
@@ -175,7 +155,6 @@ int devpts_pty_new(struct tty_struct *tt
inode->i_gid = config.setgid ? config.gid : current->fsgid;
inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME;
init_special_inode(inode, S_IFCHR|config.mode, device);
-   inode->i_op = &devpts_file_inode_operations;
inode->u.generic_ip = tty;
 
dentry = get_node(number);
diff -X /home/sds/dontdiff -Nrup linux-2.6.13-rc6-mm1/fs/devpts/Makefile 
linux-2.6.13-rc6-mm1-xattr/fs/devpts/Makefile
--- linux-2.6.13-rc6-mm1/fs/devpts/Makefile 2005-06-17 15:48:29.0 
-0400
+++ linux-2.6.13-rc6-mm1-xattr/fs/devpts/Makefile   2005-08-19 
13:03:07.0 -0400
@@ -5,4 +5,3 @@
 obj-$(CONFIG_UNIX98_PTYS)  += devpts.o
 
 devpts-$(CONFIG_UNIX98_PTYS)   := inode.o
-devpts-$(CONFIG_DEVPTS_FS_SECURITY)+= xattr_security.o
diff -X /home/sds/dontdiff -Nrup 
linux-2.6.13-rc6-mm1/fs/devpts/xattr_security.c 
linux-2.6.13-rc6-mm1-xattr/fs/devpts/xattr_security.c
--- linux-2.6.13-rc6-mm1/fs/devpts/xattr_security.c 2005-06-17 
15:48:29.0 -0400
+++ linux-2.6.13-rc6-mm1-xattr/fs/devpts/xattr_security.c   1969-12-31 
19:00:00.0 -0500
@@ -1,47 +0,0 @@
-/*
- * Security xattr support for devpts.
- *
- * Author: Stephen Smalley <[EMAIL PROTECTED]>
- * Copyright (c) 2004 Red Hat, Inc., James Morris <[EMAIL PROTECTED]>
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms of the GNU General Public License as published by the Free
- * Software Foundation; ei

Re: [RFC][PATCH] Generic fallback for security xattrs

2005-08-19 Thread Christoph Hellwig
On Fri, Aug 19, 2005 at 01:57:56PM -0400, Stephen Smalley wrote:
> This is a request for comments (only) on the patch below that modifies
> the VFS setxattr, getxattr, and listxattr code to fall back to the
> security module for security xattrs if the filesystem does not support
> xattrs natively.  This allows security modules to export the incore
> inode security label information to userspace even if the filesystem
> does not provide xattr storage, and eliminates the need to
> individually patch various pseudo filesystem types to provide such
> access (note that this patch removes the existing xattr code from
> devpts and tmpfs as it is then no longer needed). Note that this
> approach may be controversial [1]; it has been suggested that we
> should instead be modifying all filesystem types to support security
> (and other) xattrs natively, but this seems questionable for legacy
> filesystems like vfat and pseudo filesystems like proc, especially
> when the resulting code will end up simply calling the security
> framework to access the incore security label as with the current
> devpts and tmpfs handlers.
> 
> The patch restructures the code flow slightly to reduce duplication
> between the normal path and the fallback path, but this should only have
> one user-visible side effect - a program may get -EACCES rather than
> -EOPNOTSUPP if policy denied access but the filesystem didn't support
> the operation anyway.  Note that the post_setxattr hook call is not
> needed in the fallback case, as the inode_setsecurity hook call handles
> the incore inode security state update directly.  In contrast, we do
> call fsnotify in both cases.
> 
> Let me know what you think.  Please do NOT apply yet.

Very nice, and gets rid of lots of crap.  Now that we started parsing
the attribute name in generic code we should deprecate the old
->{get,set,list,remove}xattr inode operations and make the helpers
James added a while ago mandatory for the future.


-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Christoph Hellwig
On Fri, Aug 19, 2005 at 07:02:18PM +0100, Al Viro wrote:
> On Fri, Aug 19, 2005 at 05:53:32PM +0100, Al Viro wrote:
> > I'm taking NFS helpers to libfs.c and switching ncpfs to them.  IMO that's
> > better than copying the damn thing and other network filesystems might have
> > the same needs eventually...
> 
> [something like this - completely untested]
> 
> * stray_page_get_link(inode, filler) - returns ERR_PTR(error) or pointer
> to symlink body.  Said symlink body sits in a page at offset equal to
> offsetof(page, struct stray_page_link).  filler() is expected to put it
> at such offset. Page is cached.
> 
> * stray_page_put_link() - ->put_link() suitable for links obtained from
> stray_page_get_link().  Unlike the usual pagecache-based variants, this
> sucker does _not_ rely on page staying cached.
> 
> * nfs and ncpfs switched to the helpers above.

Can you add some kerneldoc comments to describe them?  Especially as
the name is not very descriptive.

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Al Viro
On Fri, Aug 19, 2005 at 05:53:32PM +0100, Al Viro wrote:
> I'm taking NFS helpers to libfs.c and switching ncpfs to them.  IMO that's
> better than copying the damn thing and other network filesystems might have
> the same needs eventually...

[something like this - completely untested]

* stray_page_get_link(inode, filler) - returns ERR_PTR(error) or pointer
to symlink body.  Said symlink body sits in a page at offset equal to
offsetof(page, struct stray_page_link).  filler() is expected to put it
at such offset. Page is cached.

* stray_page_put_link() - ->put_link() suitable for links obtained from
stray_page_get_link().  Unlike the usual pagecache-based variants, this
sucker does _not_ rely on page staying cached.

* nfs and ncpfs switched to the helpers above.

Signed-off-by: Al Viro <[EMAIL PROTECTED]>

diff -urN RC14-rc6-git10-base/fs/libfs.c current/fs/libfs.c
--- RC13-rc6-git10-base/fs/libfs.c  2005-08-10 10:37:52.0 -0400
+++ current/fs/libfs.c  2005-08-19 13:29:39.0 -0400
@@ -7,6 +7,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 int simple_getattr(struct vfsmount *mnt, struct dentry *dentry,
@@ -616,6 +617,42 @@
return ret;
 }
 
+char *stray_page_get_link(struct inode *inode, int (*filler)(struct inode *, 
struct page *))
+{
+   struct stray_page_symlink *p;
+   struct page *page;
+
+   page = read_cache_page(&inode->i_data, 0, (filler_t *)filler, inode);
+   if (IS_ERR(page))
+   goto read_failed;
+   if (!PageUptodate(page))
+   goto getlink_read_error;
+   p = kmap(page);
+   p->page = page;
+   return p->body;
+
+getlink_read_error:
+   page_cache_release(page);
+   page = ERR_PTR(-EIO);
+read_failed:
+   return (char *)page;/* error */
+}
+
+void stray_page_put_link(struct dentry *dentry, struct nameidata *nd)
+{
+   char *s = nd_get_link(nd);
+   if (!IS_ERR(s)) {
+   struct stray_page_symlink *p;
+   struct page *page;
+
+   p = container_of(s, struct stray_page_symlink, body[0]);
+   page = p->page;
+
+   kunmap(page);
+   page_cache_release(page);
+   }
+}
+
 EXPORT_SYMBOL(dcache_dir_close);
 EXPORT_SYMBOL(dcache_dir_lseek);
 EXPORT_SYMBOL(dcache_dir_open);
@@ -648,3 +685,5 @@
 EXPORT_SYMBOL_GPL(simple_attr_close);
 EXPORT_SYMBOL_GPL(simple_attr_read);
 EXPORT_SYMBOL_GPL(simple_attr_write);
+EXPORT_SYMBOL(stray_page_get_link);
+EXPORT_SYMBOL(stray_page_put_link);
diff -urN RC13-rc6-git10-base/fs/ncpfs/inode.c current/fs/ncpfs/inode.c
--- RC13-rc6-git10-base/fs/ncpfs/inode.c2005-06-17 15:48:29.0 
-0400
+++ current/fs/ncpfs/inode.c2005-08-19 13:12:01.0 -0400
@@ -104,7 +104,7 @@
 
 extern struct dentry_operations ncp_root_dentry_operations;
 #if defined(CONFIG_NCPFS_EXTRAS) || defined(CONFIG_NCPFS_NFS_NS)
-extern struct address_space_operations ncp_symlink_aops;
+extern int ncp_follow_link(struct dentry *dentry, struct nameidata *nd);
 extern int ncp_symlink(struct inode*, struct dentry*, const char*);
 #endif
 
@@ -233,8 +233,8 @@
 #if defined(CONFIG_NCPFS_EXTRAS) || defined(CONFIG_NCPFS_NFS_NS)
 static struct inode_operations ncp_symlink_inode_operations = {
.readlink   = generic_readlink,
-   .follow_link= page_follow_link_light,
-   .put_link   = page_put_link,
+   .follow_link= ncp_follow_link,
+   .put_link   = stray_page_put_link,
.setattr= ncp_notify_change,
 };
 #endif
@@ -272,7 +272,6 @@
 #if defined(CONFIG_NCPFS_EXTRAS) || defined(CONFIG_NCPFS_NFS_NS)
} else if (S_ISLNK(inode->i_mode)) {
inode->i_op = &ncp_symlink_inode_operations;
-   inode->i_data.a_ops = &ncp_symlink_aops;
 #endif
} else {
make_bad_inode(inode);
diff -urN RC13-rc6-git10-base/fs/ncpfs/symlink.c current/fs/ncpfs/symlink.c
--- RC13-rc6-git10-base/fs/ncpfs/symlink.c  2005-06-17 15:48:29.0 
-0400
+++ current/fs/ncpfs/symlink.c  2005-08-19 13:26:26.0 -0400
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "ncplib_kernel.h"
 
 
@@ -41,9 +42,8 @@
 
 /* - read a symbolic link -- */
 
-static int ncp_symlink_readpage(struct file *file, struct page *page)
+static int symlink_filler(struct inode *inode, struct page *page)
 {
-   struct inode *inode = page->mapping->host;
int error, length, len;
char *link, *rawlink;
char *buf = kmap(page);
@@ -77,6 +77,7 @@
}
 
len = NCP_MAX_SYMLINK_SIZE;
+   buf += offsetof(struct stray_page_symlink, body);
error = ncp_vol2io(NCP_SERVER(inode), buf, &len, link, length, 0);
kfree(rawlink);
if (error)
@@ -96,13 +97,12 @@
return error;
 }
 
-/*
- * symlinks can't do much...
- */
-struct address_space_operations ncp_symlink_aops = {
-   .re

Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Al Viro
On Fri, Aug 19, 2005 at 09:43:17AM -0700, Linus Torvalds wrote:
> Actually, looking at the ncpfs patch, I'd rather not apply that patch 
> as-is. It looks like it will totally disable symlink caching, which would 
> be kind of sad. Somebody willing to do the same thing NFS does?
> 
> NFS hides away the "struct page *" pointer at the end of the page data, so
> that when we pass the pointer to the virtual address around, we can 
> trivially look up the "struct page".
> 
> An alternative is to make the symlink address-space use a gfp_mask of 
> GFP_KERNEL (no highmem), at which point ncpfs_put_link() just becomes 
> something like
> 
>   void ncpfs_put_link(struct dentry *dentry, struct nameidata *nd)
>   {
>   char *addr = nd_get_link(nd);
> 
>   if (!IS_ERR(addr))
>   page_cache_release(virt_to_page(addr));
>   }
> 
> which is pretty ugly, but even simpler than the NFS trick.
> 
> Anybody?

I'm taking NFS helpers to libfs.c and switching ncpfs to them.  IMO that's
better than copying the damn thing and other network filesystems might have
the same needs eventually...
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Linus Torvalds


On Fri, 19 Aug 2005, Linus Torvalds wrote:
> 
>  - document this as a fundamental fact, and apply the ncpfs patch. Local 
>filesystems can still continue to use the generic helper functions 
>(all other users _are_ local filesystems).

Actually, looking at the ncpfs patch, I'd rather not apply that patch 
as-is. It looks like it will totally disable symlink caching, which would 
be kind of sad. Somebody willing to do the same thing NFS does?

NFS hides away the "struct page *" pointer at the end of the page data, so
that when we pass the pointer to the virtual address around, we can 
trivially look up the "struct page".

An alternative is to make the symlink address-space use a gfp_mask of 
GFP_KERNEL (no highmem), at which point ncpfs_put_link() just becomes 
something like

void ncpfs_put_link(struct dentry *dentry, struct nameidata *nd)
{
char *addr = nd_get_link(nd);

if (!IS_ERR(addr))
page_cache_release(virt_to_page(addr));
}

which is pretty ugly, but even simpler than the NFS trick.

Anybody?

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Linus Torvalds


On Fri, 19 Aug 2005, Linus Torvalds wrote:
> 
> The generic "page cache for symlinks" code does _not_ support invalidating 
> the cache while it's being used. A local filesystem will obviously never 
> invalidate the cache at all. 
> 
> Hmm.. NFS _does_ use the page cache for symlinks [..]

Looking more and more at this, I'm convinced this is it.

Basically, page_follow_link_light() and page_put_link() depend on the fact
that the page in the page cache is the same one the whole time:  
page_follow_link_light() will increment the page count of the page it
finds at offset 0, and page_put_link() will decrement it. If the page has 
changed, they increment/decrement different pages.

There's two ways to fix this:

 - document this as a fundamental fact, and apply the ncpfs patch. Local 
   filesystems can still continue to use the generic helper functions 
   (all other users _are_ local filesystems).

 - make "nameidata" contain not just the virtual addresses of the names, 
   but also have a "struct page *pages[MAX_NESTED_LINKS + 1]", and save 
   away the page there. That will fix ncpfs, and we could then make NFS 
   also use the generic routines.

I suspect that #1 is the prudent one. We have a patch already, and we
don't want to grow nameidata. I'll commit a comment at the head of
page_follow_link_light() too.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Al Viro
On Fri, Aug 19, 2005 at 09:07:35AM -0700, Linus Torvalds wrote:
> Hmm.. NFS _does_ use the page cache for symlinks, but uses it slightly 
> differently: instead of relying on the page cache entry being the same 
> when freeing the page, it just caches the page it looked up in the page 
> cache (ie "nfs_follow_link()" does look up the page cache, but then hides 
> the page pointer inside the page data itself (uglee), and thus does not 
> depend on the mapping staying the same (nfs_put_link() just takes the page 
> from the symlink data).

For NFS that was done exactly to deal with cache invalidation.  IIRC, I've
convinced myself that it wasn't going to happen on ncpfs and happily
abstained from duplicating the NFS variant.
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Linus Torvalds


On Fri, 19 Aug 2005, Anton Altaparmakov wrote:
>
>   struct page *page;
>   page = find_get_page(dentry->d_inode->i_mapping, 0);
>   if (!page)
> > BUG();

Something has truncated the mapping. 

My guess is that you had a cache invalidate event that removed the page
from the mapping while it was being used. That might also explain why this
only happens for ncpfs. I bet that in the other cases, the mapping was 
also invalidated, but re-filled immediately, and your strace slowed the 
other process down enough that it didn't get to re-fill the cache it 
invalidated or something like that.

The fact that it happens only for cross-volume things might also be 
explained that way - is there something ncpfs does when switching volumes 
that might trigger a cache invalidate for another volume (either on the 
client side or the server side - maybe the server tends to try to break 
the connection for the old volume when you start traversing a new one?)

The generic "page cache for symlinks" code does _not_ support invalidating 
the cache while it's being used. A local filesystem will obviously never 
invalidate the cache at all. 

Hmm.. NFS _does_ use the page cache for symlinks, but uses it slightly 
differently: instead of relying on the page cache entry being the same 
when freeing the page, it just caches the page it looked up in the page 
cache (ie "nfs_follow_link()" does look up the page cache, but then hides 
the page pointer inside the page data itself (uglee), and thus does not 
depend on the mapping staying the same (nfs_put_link() just takes the page 
from the symlink data).

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Al Viro
On Fri, Aug 19, 2005 at 04:44:06PM +0100, Anton Altaparmakov wrote:
> I tried stracing nautilus to answer your question.  And this time for
> the first time instead of a Bad page state I got a BUG() triggered in
> fs/namei.c, the arrow below marks the spot:
> 
> void page_put_link(struct dentry *dentry, struct nameidata *nd)
> {
>   if (!IS_ERR(nd_get_link(nd))) {
>   struct page *page;
>   page = find_get_page(dentry->d_inode->i_mapping, 0);
>   if (!page)
> > BUG();

Ergo, something had blown page cache for our inode.  Interesting...
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Anton Altaparmakov
On Fri, 2005-08-19 at 15:20 +0100, Al Viro wrote:
> On Fri, Aug 19, 2005 at 12:14:48PM +0100, Anton Altaparmakov wrote:
> > There is a bug somewhere in 2.6.11.4 and I can't figure out where it is.
> > I assume it is present in older and newer kernels, too as the related
> > code hasn't changed much AFAICS and googling for "Bad page state"
> > returns rather a lot of hits relating to both older (up to 2.5.70!) and
> > newer kernels...
> > 
> > Note: PLEASE do not stop reading because you read ncpfs below as I am
> > pretty sure it is not ncpfs related!  And looking at google a lot of
> > people have reported such similar problems since 2.5.70 or so and they
> > were all told to go away as they have bad ram.  That is impossible
> > because this happens on well over 600 workstations and several servers
> > 100% reproducible.  Many different types of hardware, different makes,
> > difference age, all running smp kernels even if single cpu.  You can't
> > tell me they all have bad ram.  Windows works fine and Linux works fine
> > except for that one specific problem which is 100% reproducible...
> > 
> > The bug only appears, but it appears 100% reproducibly when a cross
> > volume symlink on ncpfs is accessed using nautilus under gnome.  I.e.
> > double click on a cross volume symlink on ncpfs in nautilus and the
> > machine locks up solid.
> 
> Ugh...  Could you at least tell what does nautilus attempt to do at that
> point?  Something that wouldn't show up with simple ls -l  or
> cat  >/dev/null, judging by the above, but what?

One important thing I forgot to add is that the symlink must point to a
directory not a file.  If it points to a file all works fine.

I tried stracing nautilus to answer your question.  And this time for
the first time instead of a Bad page state I got a BUG() triggered in
fs/namei.c, the arrow below marks the spot:

void page_put_link(struct dentry *dentry, struct nameidata *nd)
{
if (!IS_ERR(nd_get_link(nd))) {
struct page *page;
page = find_get_page(dentry->d_inode->i_mapping, 0);
if (!page)
>   BUG();
kunmap(page);
page_cache_release(page);
page_cache_release(page);
}
}

Here is the BUG output:

kernel BUG at fs/namei.c:2329!
invalid operand:  [#1]
SMP
Modules linked in: autofs4 nls_cp850 nls_utf8 ncpfs subfs fuse snd
soundcore speedstep_lib freq_table thermal processor fan button battery
ac nvram ipt_REJECT iptable_filter ip_tables af_packet edd joydev evdev
st video1394 ohci1394 raw1394 ieee1394 capability sg sr_mod dm_mod e1000
uhci_hcd usbcore i2c_i801 i2c_core hw_random parport_pc lp parport
reiserfs ide_cd cdrom ide_disk aic7xxx aic79xx via82cxxx ata_piix libata
piix ide_core sd_mod scsi_mod
CPU:3
EIP:0060:[]Tainted: G U VLI
EFLAGS: 00010246   (2.6.11.4-21.8-smp)
EIP is at page_put_link+0x3e/0x50
eax:    ebx:    ecx: d9c75654   edx: 
esi:    edi: d9d3   ebp: ffac2000   esp: d9d31eac
ds: 007b   es: 007b   ss: 0068
Process nautilus (pid: 21910, threadinfo=d9d3 task=f5caf550)
Stack: d9c75654 c0171659  4000 c2025060 0001 d9d31f1c
dff3ec80
   d9c75654 001d981f 0003 d9eb2014  d9d3 d9d31f1c

   d9eb2000 c0171e96 d9eb2000 d9d31f1c 0001 d9eb2000 c017211f
08542878
Call Trace:
 [] link_path_walk+0x929/0xe80
 [] path_lookup+0xa6/0x1c0
 [] __user_walk+0x2f/0x70
 [] vfs_stat+0x1d/0x60
 [] sys_stat64+0xf/0x30
 [] do_syscall_trace+0xb1/0x175
 [] syscall_call+0x7/0xb
Code: 31 d2 8b 80 a8 00 00 00 e8 80 e6 fc ff 85 c0 89 c3 74 18 89 d8 e8
93 5e fa ff 89 d8 e8 2c 80 fd ff 89 d8 5b e9 24 80 fd ff 5b c3 <0f> 0b
19 09 76 bb 32 c0 eb de 90 8d b4 26 00 00 00 00 83 ec 28

The compressed strace output is attached.  It was tracing the main
nautilus process (strace -f -F -o /strace.log -p 21910) which is where
the bug was hit as well.  FWIW I attached strace to the process just
before double-clicking on the symlink in the nautilus window...

Looking at the bottom of the strace.log file it seems like two possibly
concurrent stat64() calls of the symlink were running, one of which did
not complete...  (note this was done on a dual-cpu machine with
hyperthreading enabled, i.e. it appears to have four cpus).

Does this help?

Let me know if I can provide any other details...  (I can provide an ssh
connection to a machine for testing purposes if required.)

Best regards,

Anton
-- 
Anton Altaparmakov  (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/


strace.log.bz2
Description: application/bzip


Re: Mirror a file system on the fly

2005-08-19 Thread Dave Schwartz
Ram,

Your code snippet seems to work great as discussed. Thanks. :-)
However, my requirement is slightly different.  What I also want is
that any file created from the mirrored/cloned file-system must not be
available in the parent file system.

Gracias,
decebel


On 8/18/05, Ram Pai <[EMAIL PROTECTED]> wrote:
> On Thu, 2005-08-18 at 13:27, Dave Schwartz wrote:
> > Hi Ram,
> > Thanks for the inputs. I was going over the man pages describing the
> > clone system call and its option of CLONE_NEWNS. Could understand the
> > description only in parts.
> >
> > The man page suggests that this flag when set, the cloned child is
> > started in a new name space, initialized with a copy of the parent.
> > Now does that mean, a program like a shell when cloned with
> > CLONE_NEWNS set, will have a copy of file hierarchy of the underlying
> > parent process?
> 
> Yes the child process will see an exact copy of all the mounts of
> various filesystems as that of the parent. However if you mount/unmount
> any filesystems in the child, the same will not be mounted/unmounted in
> the parent and vice-versa.  Each has its individual view of the
> the filesystem heirarchy.
> 
> Try the following program that clones off a child process with a mirror
> namespace and gives you a bash prompt. Try mounting and unmounting
> in this bash prompt and see if the same is visible in a totally
> different window.
> 
> 
> #include  
> #include  
> #include  
> 
> char somemem[4096];
> 
> int myfunc(){
> system("bash");
> }
> 
> int
> main(int argc, char *argv[])
> {
> if(clone(myfunc, somemem, CLONE_NEWNS|SIGCHLD, NULL)) {
> wait(NULL);
> } else {
> printf("clone failed\n");
> }
> printf("exit\n");
> }
> 
> 
> Hope this helps,
> RP
> 
> 
> 
> 
> >
> > Gracias,
> > decebel
> >
> >
> >
> > On 8/19/05, Ram Pai <[EMAIL PROTECTED]> wrote:
> > > On Thu, 2005-08-18 at 12:40, Dave Schwartz wrote:
> > > > Hi list,
> > > >
> > > > Not too sure if this is the right forum to ask this question but since
> > > > my requirement is around linux filesystems, I shall take this liberty
> > > > to post my question.
> > > >
> > > > My requirement is to develop a kernel/user space module to add an
> > > > extension to the shell program environment such that this shell forks
> > > > a mirror look-alike filesystem of the underlying OS to the programs
> > > > run in that particular shell.
> > >
> > > u seem to be talking about namespaces, if I get you right.
> > >
> > > there is a flag CLONE_NEWNS to the system call 'clone' which does what
> > > u r talking about.
> > >
> > > RP
> > >
> > >
> > >
> > >
> > > >
> > > >
> > > > Was trying to look thru the FAQ and a few list archives to look for
> > > > ideas around my requirement. The archives were overwhelming.
> > > >
> > > >
> > > > Any ideas/pointers will be a great help,
> > > > Gracias,
> > > > decebel
> > > > -
> > > > To unsubscribe from this list: send the line "unsubscribe 
> > > > linux-fsdevel" in
> > > > the body of a message to [EMAIL PROTECTED]
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > >
> > >
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > the body of a message to [EMAIL PROTECTED]
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
>
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Al Viro
On Fri, Aug 19, 2005 at 12:14:48PM +0100, Anton Altaparmakov wrote:
> Hi,
> 
> There is a bug somewhere in 2.6.11.4 and I can't figure out where it is.
> I assume it is present in older and newer kernels, too as the related
> code hasn't changed much AFAICS and googling for "Bad page state"
> returns rather a lot of hits relating to both older (up to 2.5.70!) and
> newer kernels...
> 
> Note: PLEASE do not stop reading because you read ncpfs below as I am
> pretty sure it is not ncpfs related!  And looking at google a lot of
> people have reported such similar problems since 2.5.70 or so and they
> were all told to go away as they have bad ram.  That is impossible
> because this happens on well over 600 workstations and several servers
> 100% reproducible.  Many different types of hardware, different makes,
> difference age, all running smp kernels even if single cpu.  You can't
> tell me they all have bad ram.  Windows works fine and Linux works fine
> except for that one specific problem which is 100% reproducible...
> 
> The bug only appears, but it appears 100% reproducibly when a cross
> volume symlink on ncpfs is accessed using nautilus under gnome.  I.e.
> double click on a cross volume symlink on ncpfs in nautilus and the
> machine locks up solid.

Ugh...  Could you at least tell what does nautilus attempt to do at that
point?  Something that wouldn't show up with simple ls -l  or
cat  >/dev/null, judging by the above, but what?
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Kernel bug: Bad page state: related to generic symlink code and mmap

2005-08-19 Thread Anton Altaparmakov
Hi,

There is a bug somewhere in 2.6.11.4 and I can't figure out where it is.
I assume it is present in older and newer kernels, too as the related
code hasn't changed much AFAICS and googling for "Bad page state"
returns rather a lot of hits relating to both older (up to 2.5.70!) and
newer kernels...

Note: PLEASE do not stop reading because you read ncpfs below as I am
pretty sure it is not ncpfs related!  And looking at google a lot of
people have reported such similar problems since 2.5.70 or so and they
were all told to go away as they have bad ram.  That is impossible
because this happens on well over 600 workstations and several servers
100% reproducible.  Many different types of hardware, different makes,
difference age, all running smp kernels even if single cpu.  You can't
tell me they all have bad ram.  Windows works fine and Linux works fine
except for that one specific problem which is 100% reproducible...

The bug only appears, but it appears 100% reproducibly when a cross
volume symlink on ncpfs is accessed using nautilus under gnome.  I.e.
double click on a cross volume symlink on ncpfs in nautilus and the
machine locks up solid.  Here is what is logged to syslog over the
network (this is an example, after the second message the machine is
locked up solid, I can send you other traces if you want to see more of
them):

Bad page state at free_hot_cold_page (in process 'nautilus', page c10d33c0)
flags:0x200c mapping:cd83f5d0 mapcount:0 count:0
Backtrace:
 [] bad_page+0x7c/0xb0
 [] free_hot_cold_page+0x7c/0x110
 [] __pagevec_free+0x1c/0x30
 [] release_pages+0x163/0x180
 [] __pagevec_lru_add+0xb2/0xc0
 [] lru_add_drain+0x45/0x50
 [] unmap_region+0x2f/0x130
 [] detach_vmas_to_be_unmapped+0x2a/0x60
 [] do_munmap+0x102/0x150
 [] sys_munmap+0x48/0x70
 [] sysenter_past_esp+0x52/0x79
Trying to fix it up, but a reboot is needed
Bad page state at free_hot_cold_page (in process 'nautilus', page c10d33c0)
flags:0x2004 mapping: mapcount:1 count:0
Backtrace:
 [] bad_page+0x7c/0xb0
 [] free_hot_cold_page+0x7c/0x110
 [] link_path_walk+0x929/0xe80
 [] ip_local_deliver+0x192/0x290
 [] ip_rcv+0x414/0x540
 [] path_lookup+0xa6/0x1c0
 [] open_namei+0x82/0x680
 [] recalc_task_prio+0xe7/0x200
 [] filp_open+0x28/0x50
 [] get_unused_fd+0x9a/0xc0
 [] getname+0x68/0xe0
 [] sys_open+0x3c/0xa0
 [] sysenter_past_esp+0x52/0x79
Trying to fix it up, but a reboot is needed

So somehow pages get into a bad state.

Note that using gnome-terminal or console to access the symlink works
fine.  It is only nautilus and looking at the backtrace it has something
to do with mmap.

Note this is not an ncpfs bug AFAICS because ncpfs uses the generic
symlink code provided by VFS.  From fs/ncpfs/inode.c:

static struct inode_operations ncp_symlink_inode_operations = {
.readlink   = generic_readlink,
.follow_link= page_follow_link_light,
.put_link   = page_put_link,
.setattr= ncp_notify_change,
};

And from fs/ncpfs/symlink.c:

struct address_space_operations ncp_symlink_aops = {
.readpage   = ncp_symlink_readpage,
};

And ncp_symlink_readpage() is:

#define NCP_SYMLINK_MAGIC0  cpu_to_le32(0x6c6d7973) /* "symlnk->" */
#define NCP_SYMLINK_MAGIC1  cpu_to_le32(0x3e2d6b6e)

/* - read a symbolic link -- */

static int ncp_symlink_readpage(struct file *file, struct page *page)
{
struct inode *inode = page->mapping->host;
int error, length, len;
char *link, *rawlink;
char *buf = kmap(page);

error = -ENOMEM;
rawlink=(char *)kmalloc(NCP_MAX_SYMLINK_SIZE, GFP_KERNEL);
if (!rawlink)
goto fail;

if (ncp_make_open(inode,O_RDONLY))
goto failEIO;

error=ncp_read_kernel(NCP_SERVER(inode),NCP_FINFO(inode)->file_handle,
 0,NCP_MAX_SYMLINK_SIZE,rawlink,&length);

ncp_inode_close(inode);
/* Close file handle if no other users... */
ncp_make_closed(inode);
if (error)
goto failEIO;

if (NCP_FINFO(inode)->flags & NCPI_KLUDGE_SYMLINK) {
if (length (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/

[ncpfs] Fix cross volume symlink lockups due to nautilus.

Signed-off-by: Anton Altaparmakov <[EMAIL PROTECTED]>

diff -urNp linux-2.6.git/fs/ncpfs.old/inode.c linux-2.6.git/fs/ncpfs/inode.c
--- linux-2.6.git/fs/ncpfs.old/inode.c  2005-07-17 07:19:56.0 +0100
+++ linux-2.6.git/fs/ncpfs/inode.c  2005-08-19 12:00:49.0 +0100
@@ -104,7 +104,8 @@ static struct super_operations ncp_sops 
 
 extern struct dentry_operations ncp_root_dentry_operations;
 #if defined(CONFIG_NCPFS_EXTRAS) || defined(CONFIG_NCPFS_NFS_NS)
-extern struct address_space_operations ncp_symlin