Re: [PATCH 02/18] exportfs: add new methods

2007-09-21 Thread Christoph Hellwig
On Thu, Sep 20, 2007 at 05:18:40PM -0700, Andrew Morton wrote:
 On Wed, 19 Sep 2007 18:30:25 +0200
 Christoph Hellwig [EMAIL PROTECTED] wrote:
 
  +   /*
  +* It's not a directory.  Life is a little more complicated.
  +*/
  +   struct dentry *target_dir, *nresult;
  +   char nbuf[NAME_MAX+1];
 
 Lots of stack usage there.

True.  But this is just copying the old code around to make use of the
new methods, so I didn't want to mess with underlying details.
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch 0/5] VFS changes

2007-09-21 Thread Miklos Szeredi
Hi,

Here's a bunch a VFS interface changes that are needed to implement
some FUSE features.

The biggest part is patches 1-3, which pass the open file to the
filesystem for syscalls passed an file descriptor, such as fstat(),
fchmod(), etc...  These patches touch a lot of filesystems, but
otherwise quite simple and actually source and binary backward
compatible.

Christoph, can you please take a quick look at the patches, if the
interface changes are OK?

Thanks,
Miklos
--
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch 3/5] VFS: pass open file to -xattr()

2007-09-21 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED]

Pass the open file into the filesystem's *xattr() methods.

This is needed to be able to correctly implement open-unlink-f*xattr
semantics, without having to resort to silly-renaming.

Do this by adding a 'struct file *' parameter to i_op-*xattr().  For
f... variants pass the open file pointer, in other cases pass NULL.

This is safe from a compatibility standpoint, out-of-tree old stuff
will continue to work, but will get a warning at compile time.

Signed-off-by: Miklos Szeredi [EMAIL PROTECTED]
---

Index: linux/fs/bad_inode.c
===
--- linux.orig/fs/bad_inode.c   2007-09-21 13:45:07.0 +0200
+++ linux/fs/bad_inode.c2007-09-21 13:45:11.0 +0200
@@ -261,24 +261,25 @@ static int bad_inode_setattr(struct dent
 }
 
 static int bad_inode_setxattr(struct dentry *dentry, const char *name,
-   const void *value, size_t size, int flags)
+   const void *value, size_t size, int flags, struct file *file)
 {
return -EIO;
 }
 
 static ssize_t bad_inode_getxattr(struct dentry *dentry, const char *name,
-   void *buffer, size_t size)
+   void *buffer, size_t size, struct file *file)
 {
return -EIO;
 }
 
 static ssize_t bad_inode_listxattr(struct dentry *dentry, char *buffer,
-   size_t buffer_size)
+   size_t buffer_size, struct file *file)
 {
return -EIO;
 }
 
-static int bad_inode_removexattr(struct dentry *dentry, const char *name)
+static int bad_inode_removexattr(struct dentry *dentry, const char *name,
+struct file *file)
 {
return -EIO;
 }
Index: linux/fs/ext2/xattr.c
===
--- linux.orig/fs/ext2/xattr.c  2007-09-21 13:44:54.0 +0200
+++ linux/fs/ext2/xattr.c   2007-09-21 13:45:11.0 +0200
@@ -328,7 +328,8 @@ cleanup:
  * dentry-d_inode-i_mutex: don't care
  */
 ssize_t
-ext2_listxattr(struct dentry *dentry, char *buffer, size_t size)
+ext2_listxattr(struct dentry *dentry, char *buffer, size_t size,
+  struct file *file)
 {
return ext2_xattr_list(dentry-d_inode, buffer, size);
 }
Index: linux/fs/ext2/xattr.h
===
--- linux.orig/fs/ext2/xattr.h  2007-09-21 13:44:54.0 +0200
+++ linux/fs/ext2/xattr.h   2007-09-21 13:45:11.0 +0200
@@ -61,7 +61,7 @@ extern struct xattr_handler ext2_xattr_a
 extern struct xattr_handler ext2_xattr_acl_default_handler;
 extern struct xattr_handler ext2_xattr_security_handler;
 
-extern ssize_t ext2_listxattr(struct dentry *, char *, size_t);
+extern ssize_t ext2_listxattr(struct dentry *, char *, size_t, struct file *);
 
 extern int ext2_xattr_get(struct inode *, int, const char *, void *, size_t);
 extern int ext2_xattr_set(struct inode *, int, const char *, const void *, 
size_t, int);
Index: linux/fs/ext3/xattr.c
===
--- linux.orig/fs/ext3/xattr.c  2007-09-21 13:44:54.0 +0200
+++ linux/fs/ext3/xattr.c   2007-09-21 13:45:11.0 +0200
@@ -143,7 +143,8 @@ ext3_xattr_handler(int name_index)
  * dentry-d_inode-i_mutex: don't care
  */
 ssize_t
-ext3_listxattr(struct dentry *dentry, char *buffer, size_t size)
+ext3_listxattr(struct dentry *dentry, char *buffer, size_t size,
+  struct file *file)
 {
return ext3_xattr_list(dentry-d_inode, buffer, size);
 }
Index: linux/fs/ext3/xattr.h
===
--- linux.orig/fs/ext3/xattr.h  2007-09-21 13:44:54.0 +0200
+++ linux/fs/ext3/xattr.h   2007-09-21 13:45:11.0 +0200
@@ -64,7 +64,7 @@ extern struct xattr_handler ext3_xattr_a
 extern struct xattr_handler ext3_xattr_acl_default_handler;
 extern struct xattr_handler ext3_xattr_security_handler;
 
-extern ssize_t ext3_listxattr(struct dentry *, char *, size_t);
+extern ssize_t ext3_listxattr(struct dentry *, char *, size_t, struct file *);
 
 extern int ext3_xattr_get(struct inode *, int, const char *, void *, size_t);
 extern int ext3_xattr_list(struct inode *, char *, size_t);
Index: linux/fs/fuse/dir.c
===
--- linux.orig/fs/fuse/dir.c2007-09-21 13:45:07.0 +0200
+++ linux/fs/fuse/dir.c 2007-09-21 13:45:11.0 +0200
@@ -1118,7 +1118,8 @@ static int fuse_getattr(struct vfsmount 
 }
 
 static int fuse_setxattr(struct dentry *entry, const char *name,
-const void *value, size_t size, int flags)
+const void *value, size_t size, int flags,
+struct file *file)
 {
struct inode *inode = entry-d_inode;
struct fuse_conn *fc = get_fuse_conn(inode);
@@ -1156,7 +1157,7 @@ static int fuse_setxattr(struct 

[patch 1/5] VFS: pass open file to -setattr()

2007-09-21 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED]

Pass the open file into the filesystem's -setattr() method for
fchmod, fchown and some of the utimes variants.

This is needed to be able to correctly implement open-unlink-fsetattr
semantics in some filesystem such as sshfs, without having to resort
to silly-renaming.

The infrastructure is already there, so just need to fill in
attrs.ia_file and set ATTR_FILE in attrs.ia_valid.

Signed-off-by: Miklos Szeredi [EMAIL PROTECTED]
---

Index: linux/fs/open.c
===
--- linux.orig/fs/open.c2007-09-21 13:44:57.0 +0200
+++ linux/fs/open.c 2007-09-21 13:45:04.0 +0200
@@ -581,7 +581,8 @@ asmlinkage long sys_fchmod(unsigned int 
if (mode == (mode_t) -1)
mode = inode-i_mode;
newattrs.ia_mode = (mode  S_IALLUGO) | (inode-i_mode  ~S_IALLUGO);
-   newattrs.ia_valid = ATTR_MODE | ATTR_CTIME;
+   newattrs.ia_valid = ATTR_MODE | ATTR_CTIME | ATTR_FILE;
+   newattrs.ia_file = file;
err = notify_change(dentry, newattrs);
mutex_unlock(inode-i_mutex);
 
@@ -631,7 +632,8 @@ asmlinkage long sys_chmod(const char __u
return sys_fchmodat(AT_FDCWD, filename, mode);
 }
 
-static int chown_common(struct dentry * dentry, uid_t user, gid_t group)
+static int chown_common(struct dentry * dentry, uid_t user, gid_t group,
+   struct file *file)
 {
struct inode * inode;
int error;
@@ -660,6 +662,10 @@ static int chown_common(struct dentry * 
if (!S_ISDIR(inode-i_mode))
newattrs.ia_valid |=
ATTR_KILL_SUID | ATTR_KILL_SGID | ATTR_KILL_PRIV;
+   if (file) {
+   newattrs.ia_file = file;
+   newattrs.ia_valid |= ATTR_FILE;
+   }
mutex_lock(inode-i_mutex);
error = notify_change(dentry, newattrs);
mutex_unlock(inode-i_mutex);
@@ -675,7 +681,7 @@ asmlinkage long sys_chown(const char __u
error = user_path_walk(filename, nd);
if (error)
goto out;
-   error = chown_common(nd.dentry, user, group);
+   error = chown_common(nd.dentry, user, group, NULL);
path_release(nd);
 out:
return error;
@@ -695,7 +701,7 @@ asmlinkage long sys_fchownat(int dfd, co
error = __user_walk_fd(dfd, filename, follow, nd);
if (error)
goto out;
-   error = chown_common(nd.dentry, user, group);
+   error = chown_common(nd.dentry, user, group, NULL);
path_release(nd);
 out:
return error;
@@ -709,7 +715,7 @@ asmlinkage long sys_lchown(const char __
error = user_path_walk_link(filename, nd);
if (error)
goto out;
-   error = chown_common(nd.dentry, user, group);
+   error = chown_common(nd.dentry, user, group, NULL);
path_release(nd);
 out:
return error;
@@ -728,7 +734,7 @@ asmlinkage long sys_fchown(unsigned int 
 
dentry = file-f_path.dentry;
audit_inode(NULL, dentry);
-   error = chown_common(dentry, user, group);
+   error = chown_common(dentry, user, group, file);
fput(file);
 out:
return error;
Index: linux/fs/utimes.c
===
--- linux.orig/fs/utimes.c  2007-09-21 13:44:57.0 +0200
+++ linux/fs/utimes.c   2007-09-21 13:45:04.0 +0200
@@ -130,6 +130,10 @@ long do_utimes(int dfd, char __user *fil
}
}
}
+   if (f) {
+   newattrs.ia_file = f;
+   newattrs.ia_valid |= ATTR_FILE;
+   }
mutex_lock(inode-i_mutex);
error = notify_change(dentry, newattrs);
mutex_unlock(inode-i_mutex);

--
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch 2/5] VFS: pass open file to -getattr()

2007-09-21 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED]

Pass the open file into the filesystem's -getattr() method for
fstat().

This is needed to be able to correctly implement open-unlink-fstat
semantics in some filesystem such as sshfs, without having to resort
to silly-renaming.

Do this by adding a 'struct file *' parameter to i_op-getattr().  For
fstat() pass the open file pointer, in other cases pass NULL.

This is safe from a compatibility standpoint, out-of-tree old stuff
will continue to work, but will get a warning at compile time.

Signed-off-by: Miklos Szeredi [EMAIL PROTECTED]
---

Index: linux/fs/9p/vfs_inode.c
===
--- linux.orig/fs/9p/vfs_inode.c2007-09-21 14:08:25.0 +0200
+++ linux/fs/9p/vfs_inode.c 2007-09-21 14:08:40.0 +0200
@@ -706,7 +706,7 @@ done:
 
 static int
 v9fs_vfs_getattr(struct vfsmount *mnt, struct dentry *dentry,
-struct kstat *stat)
+struct kstat *stat, struct file *file)
 {
int err;
struct v9fs_session_info *v9ses;
@@ -717,7 +717,7 @@ v9fs_vfs_getattr(struct vfsmount *mnt, s
err = -EPERM;
v9ses = v9fs_inode2v9ses(dentry-d_inode);
if (v9ses-cache == CACHE_LOOSE)
-   return simple_getattr(mnt, dentry, stat);
+   return simple_getattr(mnt, dentry, stat, file);
 
fid = v9fs_fid_lookup(dentry);
if (IS_ERR(fid))
Index: linux/fs/afs/inode.c
===
--- linux.orig/fs/afs/inode.c   2007-09-21 14:08:25.0 +0200
+++ linux/fs/afs/inode.c2007-09-21 14:08:40.0 +0200
@@ -295,7 +295,7 @@ error_unlock:
  * read the attributes of an inode
  */
 int afs_getattr(struct vfsmount *mnt, struct dentry *dentry,
- struct kstat *stat)
+ struct kstat *stat, struct file *file)
 {
struct inode *inode;
 
Index: linux/fs/afs/internal.h
===
--- linux.orig/fs/afs/internal.h2007-09-21 14:08:25.0 +0200
+++ linux/fs/afs/internal.h 2007-09-21 14:08:40.0 +0200
@@ -548,7 +548,8 @@ extern struct inode *afs_iget(struct sup
  struct afs_callback *);
 extern void afs_zap_data(struct afs_vnode *);
 extern int afs_validate(struct afs_vnode *, struct key *);
-extern int afs_getattr(struct vfsmount *, struct dentry *, struct kstat *);
+extern int afs_getattr(struct vfsmount *, struct dentry *, struct kstat *,
+  struct file *);
 extern int afs_setattr(struct dentry *, struct iattr *);
 extern void afs_clear_inode(struct inode *);
 
Index: linux/fs/bad_inode.c
===
--- linux.orig/fs/bad_inode.c   2007-09-21 14:08:25.0 +0200
+++ linux/fs/bad_inode.c2007-09-21 14:08:40.0 +0200
@@ -250,7 +250,7 @@ static int bad_inode_permission(struct i
 }
 
 static int bad_inode_getattr(struct vfsmount *mnt, struct dentry *dentry,
-   struct kstat *stat)
+   struct kstat *stat, struct file *file)
 {
return -EIO;
 }
Index: linux/fs/cifs/cifsfs.h
===
--- linux.orig/fs/cifs/cifsfs.h 2007-09-21 14:08:25.0 +0200
+++ linux/fs/cifs/cifsfs.h  2007-09-21 14:08:40.0 +0200
@@ -55,7 +55,8 @@ extern int cifs_rmdir(struct inode *, st
 extern int cifs_rename(struct inode *, struct dentry *, struct inode *,
   struct dentry *);
 extern int cifs_revalidate(struct dentry *);
-extern int cifs_getattr(struct vfsmount *, struct dentry *, struct kstat *);
+extern int cifs_getattr(struct vfsmount *, struct dentry *, struct kstat *,
+   struct file *);
 extern int cifs_setattr(struct dentry *, struct iattr *);
 
 extern const struct inode_operations cifs_file_inode_ops;
Index: linux/fs/cifs/inode.c
===
--- linux.orig/fs/cifs/inode.c  2007-09-21 14:08:25.0 +0200
+++ linux/fs/cifs/inode.c   2007-09-21 14:08:40.0 +0200
@@ -1340,7 +1340,7 @@ int cifs_revalidate(struct dentry *diren
 }
 
 int cifs_getattr(struct vfsmount *mnt, struct dentry *dentry,
-   struct kstat *stat)
+   struct kstat *stat, struct file *file)
 {
int err = cifs_revalidate(dentry);
if (!err) {
Index: linux/fs/coda/inode.c
===
--- linux.orig/fs/coda/inode.c  2007-09-21 14:08:25.0 +0200
+++ linux/fs/coda/inode.c   2007-09-21 14:08:40.0 +0200
@@ -220,7 +220,8 @@ static void coda_clear_inode(struct inod
coda_cache_clear_inode(inode);
 }
 
-int coda_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat 
*stat)
+int coda_getattr(struct vfsmount *mnt, struct dentry *dentry,
+

Re: [patch 3/5] VFS: pass open file to -xattr()

2007-09-21 Thread Christoph Hellwig
On Fri, Sep 21, 2007 at 02:23:46PM +0200, Miklos Szeredi wrote:
 From: Miklos Szeredi [EMAIL PROTECTED]
 
 Pass the open file into the filesystem's *xattr() methods.
 
 This is needed to be able to correctly implement open-unlink-f*xattr
 semantics, without having to resort to silly-renaming.
 
 Do this by adding a 'struct file *' parameter to i_op-*xattr().  For
 f... variants pass the open file pointer, in other cases pass NULL.
 
 This is safe from a compatibility standpoint, out-of-tree old stuff
 will continue to work, but will get a warning at compile time.

NACK, no more optional arguments, and passing file structs to xattr
stuff is silly.  If your filesystem doesn't get open but unliked
right you will have to resort to silly renaming, I'm sorry.

Same argument applies to all pass file down patches in the series,
I won't comment on the separately.

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 4/5] VFS: allow filesystems to implement atomic open+truncate

2007-09-21 Thread Christoph Hellwig
On Fri, Sep 21, 2007 at 02:23:47PM +0200, Miklos Szeredi wrote:
 From: Miklos Szeredi [EMAIL PROTECTED]
 
 Add a new attribute flag ATTR_OPEN, with the meaning: truncation was
 initiated by open() due to the O_TRUNC flag.
 
 This way filesystems wanting to implement truncation within their
 -open() method can ignore such truncate requests.
 
 This is a quick  dirty hack, but it comes for free.

Fine with me as it doesn't cause any active harm, but expect this to
go away once the nfs intent mess is cleaned up and we'll get a real
method for this kind of thing.

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch 4/5] VFS: allow filesystems to implement atomic open+truncate

2007-09-21 Thread Miklos Szeredi
From: Miklos Szeredi [EMAIL PROTECTED]

Add a new attribute flag ATTR_OPEN, with the meaning: truncation was
initiated by open() due to the O_TRUNC flag.

This way filesystems wanting to implement truncation within their
-open() method can ignore such truncate requests.

This is a quick  dirty hack, but it comes for free.

Signed-off-by: Miklos Szeredi [EMAIL PROTECTED]
---

Index: linux/fs/namei.c
===
--- linux.orig/fs/namei.c   2007-09-21 13:44:53.0 +0200
+++ linux/fs/namei.c2007-09-21 13:45:14.0 +0200
@@ -1656,8 +1656,10 @@ int may_open(struct nameidata *nd, int a
error = locks_verify_locked(inode);
if (!error) {
DQUOT_INIT(inode);
-   
-   error = do_truncate(dentry, 0, ATTR_MTIME|ATTR_CTIME, 
NULL);
+
+   error = do_truncate(dentry, 0,
+   ATTR_MTIME|ATTR_CTIME|ATTR_OPEN,
+   NULL);
}
put_write_access(inode);
if (error)
Index: linux/include/linux/fs.h
===
--- linux.orig/include/linux/fs.h   2007-09-21 13:45:11.0 +0200
+++ linux/include/linux/fs.h2007-09-21 13:45:14.0 +0200
@@ -337,6 +337,7 @@ typedef void (dio_iodone_t)(struct kiocb
 #define ATTR_KILL_SGID 4096
 #define ATTR_FILE  8192
 #define ATTR_KILL_PRIV 16384
+#define ATTR_OPEN  32768   /* Truncating from open(O_TRUNC) */
 
 /*
  * This is the Inode Attributes structure, used for notify_change().  It
@@ -1526,7 +1527,7 @@ static inline int break_lease(struct ino
 
 /* fs/open.c */
 
-extern int do_truncate(struct dentry *, loff_t start, unsigned int time_attrs,
+extern int do_truncate(struct dentry *, loff_t start, unsigned int attrs,
   struct file *filp);
 extern long do_sys_open(int fdf, const char __user *filename, int flags,
int mode);
Index: linux/fs/open.c
===
--- linux.orig/fs/open.c2007-09-21 13:45:04.0 +0200
+++ linux/fs/open.c 2007-09-21 13:45:14.0 +0200
@@ -194,7 +194,7 @@ out:
return error;
 }
 
-int do_truncate(struct dentry *dentry, loff_t length, unsigned int time_attrs,
+int do_truncate(struct dentry *dentry, loff_t length, unsigned int attrs,
struct file *filp)
 {
int err;
@@ -205,7 +205,7 @@ int do_truncate(struct dentry *dentry, l
return -EINVAL;
 
newattrs.ia_size = length;
-   newattrs.ia_valid = ATTR_SIZE | time_attrs;
+   newattrs.ia_valid = ATTR_SIZE | attrs;
if (filp) {
newattrs.ia_file = filp;
newattrs.ia_valid |= ATTR_FILE;

--
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 5/5] VFS: allow filesystem to override mknod capability checks

2007-09-21 Thread Christoph Hellwig
On Fri, Sep 21, 2007 at 02:23:48PM +0200, Miklos Szeredi wrote:
 From: Miklos Szeredi [EMAIL PROTECTED]
 
 Add a new filesystem flag, that results in the VFS not checking if the
 current process has enough privileges to do an mknod().
 
 This is needed on filesystems, where an unprivileged user may be able
 to create a device node, without causing security problems.

A user should never be able to create devices.  And no, I don't want to
see a filesystem that implements it's own file operations for device nodes.

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 3/5] VFS: pass open file to -xattr()

2007-09-21 Thread Miklos Szeredi
  I don't think it's silly.  Read/write get passed the file descriptor,
  and it makes a lot of sense, if the filesystem has stateful opens.
  
  Similarly for any fs operation that gets a file descriptor, it makes
  sense to pass the relevant open file down into the filesystem.
 
 read/write fundamentally operate on file descriptors.  None of these
 operations does, rather their normal forms get a path name and special
 forms operate on a file descriptor to avoid lookup races.  Still the
 underlying operation has nothing to do with the file descriptor at all.

I don't see why read/write are special, normal filesystems don't
need the file descriptor in that case either.

And other in BSD's, OS X, the read/write fs ops don't get the open
file, and their fuse implementations have to hack around this.

The exact same thing is true for the other operations on file
descriptors, just in those cases Linux does the same as the other
OS's.

  If you look carefully, the ftrunacate() already does this, becuse
  without that it's impossible to implement correct semantics in the
  filesystem in some cases.
 
 ftruncate is a special case due to O_TRUNC.

No, it's special, because it does not do permission checking, while
truncate() does.

  For other operations it's not impossible, but it would mean more hacks
  in the filesystem itself (such as sillyrenaming) that are entirely
  unneeded if the file info is available.
 
 It's not a problem at all for filesystem that implement normal unix
 semantics.  If you want to shoer-horn strange semantics that barely
 fit, you'll need some more hacks.

What about sshfs.  Basically it uses the sftp protocol, which more or
less implements the UNIX API in a remote protocol.

So it has OPEN, READ, WRITE, CLOSE, STAT, FSTAT, etc. operation.  Now
why is FSTAT different than READ or WRITE?

Miklos
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 5/5] VFS: allow filesystem to override mknod capability checks

2007-09-21 Thread Miklos Szeredi
  Take this example: I've loopback mounted an UML disk image using fuse
  (no privileges required), and want to create some device nodes.  I
  can't yet boot the UML because the device node is missing from the
  image.  So what should I do.  Currently I have to manipulate the
  mounted image as root.  But that's really shouldn't be needed.
 
 That's something that shouldn't be solved in the filesystem, but rather
 through exact semantics of unprivilegued mounts.  Given that an
 unprivilegued implies ignoring the device files we can easily allow
 users to create them, because they're nothing special anymore.

Exacly.  And we already have an API for that: mknod(2).  It would be
quite stupid to introduce _another_ API to do the same.  It would mean
that all the tools, like mknod(8) would not work with the new API.

Or am I misunderstanding your suggestion?

Miklos
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 5/5] VFS: allow filesystem to override mknod capability checks

2007-09-21 Thread Christoph Hellwig
On Fri, Sep 21, 2007 at 03:10:26PM +0200, Miklos Szeredi wrote:
 Take this example: I've loopback mounted an UML disk image using fuse
 (no privileges required), and want to create some device nodes.  I
 can't yet boot the UML because the device node is missing from the
 image.  So what should I do.  Currently I have to manipulate the
 mounted image as root.  But that's really shouldn't be needed.

That's something that shouldn't be solved in the filesystem, but rather
through exact semantics of unprivilegued mounts.  Given that an
unprivilegued implies ignoring the device files we can easily allow
users to create them, because they're nothing special anymore.

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 5/5] VFS: allow filesystem to override mknod capability checks

2007-09-21 Thread Miklos Szeredi
  From: Miklos Szeredi [EMAIL PROTECTED]
  
  Add a new filesystem flag, that results in the VFS not checking if the
  current process has enough privileges to do an mknod().
  
  This is needed on filesystems, where an unprivileged user may be able
  to create a device node, without causing security problems.
 
 A user should never be able to create devices.

A user can already create a device with fuse implicitly.  This patch
would just allow that explicitly.

Take this example: I've loopback mounted an UML disk image using fuse
(no privileges required), and want to create some device nodes.  I
can't yet boot the UML because the device node is missing from the
image.  So what should I do.  Currently I have to manipulate the
mounted image as root.  But that's really shouldn't be needed.

 And no, I don't want to see a filesystem that implements it's own
 file operations for device nodes.

I don't want that either, and it has nothing to do with this patch.

Miklos
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 3/5] VFS: pass open file to -xattr()

2007-09-21 Thread Miklos Szeredi
 On Fri, Sep 21, 2007 at 02:23:46PM +0200, Miklos Szeredi wrote:
  From: Miklos Szeredi [EMAIL PROTECTED]
  
  Pass the open file into the filesystem's *xattr() methods.
  
  This is needed to be able to correctly implement open-unlink-f*xattr
  semantics, without having to resort to silly-renaming.
  
  Do this by adding a 'struct file *' parameter to i_op-*xattr().  For
  f... variants pass the open file pointer, in other cases pass NULL.
  
  This is safe from a compatibility standpoint, out-of-tree old stuff
  will continue to work, but will get a warning at compile time.
 
 NACK, no more optional arguments, and passing file structs to xattr
 stuff is silly.  If your filesystem doesn't get open but unliked
 right you will have to resort to silly renaming, I'm sorry.
 
 Same argument applies to all pass file down patches in the series,
 I won't comment on the separately.

I don't think it's silly.  Read/write get passed the file descriptor,
and it makes a lot of sense, if the filesystem has stateful opens.

Similarly for any fs operation that gets a file descriptor, it makes
sense to pass the relevant open file down into the filesystem.

If you look carefully, the ftrunacate() already does this, becuse
without that it's impossible to implement correct semantics in the
filesystem in some cases.

For other operations it's not impossible, but it would mean more hacks
in the filesystem itself (such as sillyrenaming) that are entirely
unneeded if the file info is available.

I agree, that the xattr API is quite ugly already, but adding one more
argument won't make it all that much worse.

Miklos
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 4/5] VFS: allow filesystems to implement atomic open+truncate

2007-09-21 Thread Miklos Szeredi
  From: Miklos Szeredi [EMAIL PROTECTED]
  
  Add a new attribute flag ATTR_OPEN, with the meaning: truncation was
  initiated by open() due to the O_TRUNC flag.
  
  This way filesystems wanting to implement truncation within their
  -open() method can ignore such truncate requests.
  
  This is a quick  dirty hack, but it comes for free.
 
 Fine with me as it doesn't cause any active harm, but expect this to
 go away once the nfs intent mess is cleaned up and we'll get a real
 method for this kind of thing.

Sure.

Miklos
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 3/5] VFS: pass open file to -xattr()

2007-09-21 Thread Christoph Hellwig
On Fri, Sep 21, 2007 at 03:00:06PM +0200, Miklos Szeredi wrote:
 I don't think it's silly.  Read/write get passed the file descriptor,
 and it makes a lot of sense, if the filesystem has stateful opens.
 
 Similarly for any fs operation that gets a file descriptor, it makes
 sense to pass the relevant open file down into the filesystem.

read/write fundamentally operate on file descriptors.  None of these
operations does, rather their normal forms get a path name and special
forms operate on a file descriptor to avoid lookup races.  Still the
underlying operation has nothing to do with the file descriptor at all.

 If you look carefully, the ftrunacate() already does this, becuse
 without that it's impossible to implement correct semantics in the
 filesystem in some cases.

ftruncate is a special case due to O_TRUNC.  But I have plans to solve
this whole issue more elegant than the current hack.

 For other operations it's not impossible, but it would mean more hacks
 in the filesystem itself (such as sillyrenaming) that are entirely
 unneeded if the file info is available.

It's not a problem at all for filesystem that implement normal unix
semantics.  If you want to shoer-horn strange semantics that barely
fit, you'll need some more hacks.
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 3/5] VFS: pass open file to -xattr()

2007-09-21 Thread Trond Myklebust
On Fri, 2007-09-21 at 15:16 +0200, Miklos Szeredi wrote:

  ftruncate is a special case due to O_TRUNC.
 
 No, it's special, because it does not do permission checking, while
 truncate() does.

So why not just add file-f_op-ftruncate() and file-f_op-fstat()?
Most filesystems can trivially redirect these to do_truncate() and their
existing getattr() method. Those, like FUSE, that care can use the hook.
In fact, I think that NFSv4 could also benefit from an ftruncate():
currently we have to hunt around for an open file context for that
particular case.

Trond

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 5/5] VFS: allow filesystem to override mknod capability checks

2007-09-21 Thread Christoph Hellwig
On Fri, Sep 21, 2007 at 03:18:33PM +0200, Miklos Szeredi wrote:
  That's something that shouldn't be solved in the filesystem, but rather
  through exact semantics of unprivilegued mounts.  Given that an
  unprivilegued implies ignoring the device files we can easily allow
  users to create them, because they're nothing special anymore.
 
 Exacly.  And we already have an API for that: mknod(2).  It would be
 quite stupid to introduce _another_ API to do the same.  It would mean
 that all the tools, like mknod(8) would not work with the new API.
 
 Or am I misunderstanding your suggestion?

Yes :)

My suggestions is:

 - mknod for unprivilegued user is allowed in the following case

(1) mount point is mounted with MNT_NODEV
(2) mount point is owner by the user doing mknod

 - and maybe

(3) we have a special mount option to allow it if we don't want
to allow it for normal unprivilegued mounts for some reason

which implies we need to get in unprivilegued mounts first, but we'll
have to do that anyway.
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 3/5] VFS: pass open file to -xattr()

2007-09-21 Thread Christoph Hellwig
On Fri, Sep 21, 2007 at 10:32:31AM -0400, Trond Myklebust wrote:
 On Fri, 2007-09-21 at 15:16 +0200, Miklos Szeredi wrote:
 
   ftruncate is a special case due to O_TRUNC.
  
  No, it's special, because it does not do permission checking, while
  truncate() does.
 
 So why not just add file-f_op-ftruncate() and file-f_op-fstat()?
 Most filesystems can trivially redirect these to do_truncate() and their
 existing getattr() method. Those, like FUSE, that care can use the hook.
 In fact, I think that NFSv4 could also benefit from an ftruncate():
 currently we have to hunt around for an open file context for that
 particular case.

Havin the file for fruncate is fine and I'm planning to do something
along those lines.  Having it for (f)stat is dumb because the operation
is in no way related to the open file descriptor.
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 5/5] VFS: allow filesystem to override mknod capability checks

2007-09-21 Thread Miklos Szeredi
 On Fri, Sep 21, 2007 at 03:18:33PM +0200, Miklos Szeredi wrote:
   That's something that shouldn't be solved in the filesystem, but rather
   through exact semantics of unprivilegued mounts.  Given that an
   unprivilegued implies ignoring the device files we can easily allow
   users to create them, because they're nothing special anymore.
  
  Exacly.  And we already have an API for that: mknod(2).  It would be
  quite stupid to introduce _another_ API to do the same.  It would mean
  that all the tools, like mknod(8) would not work with the new API.
  
  Or am I misunderstanding your suggestion?
 
 Yes :)
 
 My suggestions is:
 
  - mknod for unprivilegued user is allowed in the following case
 
 (1) mount point is mounted with MNT_NODEV
 (2) mount point is owner by the user doing mknod

Ah, OK.  Well, that's what fuse would do with the above change.  So
you are basically saying, the change is OK, but we want proper
unprivileged mounts first.

  - and maybe
 
 (3) we have a special mount option to allow it if we don't want
 to allow it for normal unprivilegued mounts for some reason

I'm sure we don't want it by default.

For example if user bind mounts / onto /home/user/myroot (with 'nodev'
of couse), we still don't want mknod to work on that mount, for
obvious reasons.

Miklos
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 5/5] VFS: allow filesystem to override mknod capability checks

2007-09-21 Thread Christoph Hellwig
On Fri, Sep 21, 2007 at 04:48:58PM +0200, Miklos Szeredi wrote:
 Ah, OK.  Well, that's what fuse would do with the above change.  So
 you are basically saying, the change is OK, but we want proper
 unprivileged mounts first.

Yes, that and that it should be a mount flag, not a file_system_type
flag.

 I'm sure we don't want it by default.
 
 For example if user bind mounts / onto /home/user/myroot (with 'nodev'
 of couse), we still don't want mknod to work on that mount, for
 obvious reasons.

True, we'll have to deny it if there is any non-privilegued mount of
the backing device possible.   At this point it's getting rather nasty
and I wonder whether it's really worth it..
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 3/5] VFS: pass open file to -xattr()

2007-09-21 Thread Miklos Szeredi
 On Fri, Sep 21, 2007 at 10:32:31AM -0400, Trond Myklebust wrote:
  On Fri, 2007-09-21 at 15:16 +0200, Miklos Szeredi wrote:
  
ftruncate is a special case due to O_TRUNC.
   
   No, it's special, because it does not do permission checking, while
   truncate() does.
  
  So why not just add file-f_op-ftruncate() and file-f_op-fstat()?
  Most filesystems can trivially redirect these to do_truncate() and their
  existing getattr() method. Those, like FUSE, that care can use the hook.
  In fact, I think that NFSv4 could also benefit from an ftruncate():
  currently we have to hunt around for an open file context for that
  particular case.
 
 Havin the file for fruncate is fine and I'm planning to do something
 along those lines.  Having it for (f)stat is dumb because the operation
 is in no way related to the open file descriptor.

What I'm saying is that read and write are _no_more_ related to the
file than fstat.  Read/write operate on inode data, fstat operates on
inode metadata.

OK, read/write have a position state in the open file, but that is
something the filesystem should _never_ touch anyway, so it's
irrelevant to the discussion.

The fact is, if the filesystem uses a stateful open API, which defines
an fstat() operation, it can be useful to use that instead of the
plain stat().  But that is only possible if the VFS supplies the open
file, otherwise there will be just hunting around for suitable open
files, or sillyrenaming.  None of which is desirable.

Miklos
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 5/5] VFS: allow filesystem to override mknod capability checks

2007-09-21 Thread Miklos Szeredi
 On Fri, Sep 21, 2007 at 04:48:58PM +0200, Miklos Szeredi wrote:
  Ah, OK.  Well, that's what fuse would do with the above change.  So
  you are basically saying, the change is OK, but we want proper
  unprivileged mounts first.
 
 Yes, that and that it should be a mount flag, not a file_system_type
 flag.
 
  I'm sure we don't want it by default.
  
  For example if user bind mounts / onto /home/user/myroot (with 'nodev'
  of couse), we still don't want mknod to work on that mount, for
  obvious reasons.
 
 True, we'll have to deny it if there is any non-privilegued mount of
 the backing device possible.   At this point it's getting rather nasty
 and I wonder whether it's really worth it..

I think the assumption, that we want this as a generic service is
false.

We want this as a special service for a few filesystems, such as the
unprivileged userspace loopback mounting I was talking about.

So my thinking is: if an unprivileged filesystem explicitly asks for
this, then it should be allowed.  It could be a per-superblock flag
instead of a per fs-type flag, if that sounds better.

My fuse implementation would have been exactly the same: the -mknod()
implementation would check a per filesystem flag, and if it's not set,
check the permissions as normal mknod() would.  Here's the relevant
patch snippet:

Index: linux/fs/fuse/dir.c
===
--- linux.orig/fs/fuse/dir.c2007-09-21 13:45:23.0 +0200
+++ linux/fs/fuse/dir.c 2007-09-21 13:45:25.0 +0200
@@ -486,7 +486,13 @@ static int fuse_mknod(struct inode *dir,
 {
struct fuse_mknod_in inarg;
struct fuse_conn *fc = get_fuse_conn(dir);
-   struct fuse_req *req = fuse_get_req(fc);
+   struct fuse_req *req;
+
+   if (!fc-mknod_nocheck 
+   ((S_ISCHR(mode) || S_ISBLK(mode))  !capable(CAP_MKNOD)))
+   return -EPERM;
+
+   req = fuse_get_req(fc);
if (IS_ERR(req))
return PTR_ERR(req);
 

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 2/5] VFS: pass open file to -getattr()

2007-09-21 Thread Andreas Dilger
On Sep 21, 2007  14:23 +0200, Miklos Szeredi wrote:
 @@ -1212,7 +1212,8 @@ struct inode_operations {
 - int (*getattr) (struct vfsmount *mnt, struct dentry *, struct kstat *);
 + int (*getattr) (struct vfsmount *mnt, struct dentry *, struct kstat *,
 + struct file *file);

It's not much of an inode operation anymore if you need to pass a file
to it...  Since the attributes are really part of the inode and not
the file, this seems like a bit of a hack.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 3/5] VFS: pass open file to -xattr()

2007-09-21 Thread Andreas Dilger
On Sep 21, 2007  16:59 +0200, Miklos Szeredi wrote:
 What I'm saying is that read and write are _no_more_ related to the
 file than fstat.  Read/write operate on inode data, fstat operates on
 inode metadata.

The read and write operations are DEFINITELY related to the file descriptor
because of f_pos.  Each process opening the same file can have a different
f_pos so read/write will work in different locations of the file.

In contrast getattr and getxattr operate on the single inode and you don't
get e.g. a different i_size or i_uid or i_gid depending on who opened a
file, nor is the xattr different.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 3/5] VFS: pass open file to -xattr()

2007-09-21 Thread Andreas Dilger
On Sep 21, 2007  14:23 +0200, Miklos Szeredi wrote:
 @@ -1214,10 +1214,12 @@ struct inode_operations {
 + int (*setxattr) (struct dentry *, const char *,const void *,size_t,int,
 +  struct file *);
 + ssize_t (*getxattr) (struct dentry *, const char *, void *, size_t,
 +  struct file *);
 + ssize_t (*listxattr) (struct dentry *, char *, size_t, struct file *);
 + int (*removexattr) (struct dentry *, const char *, struct file *);

Likewise - these are no longer inode operations if you need a file.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 4/5] VFS: allow filesystems to implement atomic open+truncate

2007-09-21 Thread Andreas Dilger
On Sep 21, 2007  14:23 +0200, Miklos Szeredi wrote:
 Add a new attribute flag ATTR_OPEN, with the meaning: truncation was
 initiated by open() due to the O_TRUNC flag.
 
 This way filesystems wanting to implement truncation within their
 -open() method can ignore such truncate requests.

This is actually something we've needed to do in Lustre for a while also.
We called it ATTR_FROM_OPEN, but I don't really mind ATTR_OPEN either -
the less patching we need to do the better.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-21 Thread Hugh Dickins
On Thu, 20 Sep 2007, Christoph Lameter wrote:
 On Thu, 20 Sep 2007, David Chinner wrote:
   Disagree, the mmap side is not a little change.
  
  That's not in the filesystem, though. ;)
 
 And its really only a minimal change for some function to loop over all 
 4k pages and elsewhere index the right 4k subpage.

I agree with you on that: the changes you had to make to support mmap
were _much_ less bothersome than I'd been fearing, and I'm surprised
some people still see that side of it as a sticking point.

But I've kept very quiet because I remain quite ambivalent about the
patchset: I'm somewhere on the spectrum between you and Nick, shifting
my position from hour to hour.  Don't expect any decisiveness from me.

In some senses I'm even further off the scale away from you: I'm
dubious even of Nick and Andrew's belief in opportunistic contiguity.  
Just how hard should we trying for contiguity?  How far should we
go in sacrificing our earlier LRU principles?  It's easy to bump
up PAGE_ALLOC_COSTLY_ORDER, but what price do we pay when we do?

I agree with those who want to see how the competing approaches
work out in practice: which is frustrating for you, yes, because
you are so close to ready.  (I've not glanced at virtual compound,
but had been wondering in that direction before you suggested it.)

I do think your patchset is, for the time being at least, a nice
out-of-tree set, and it's grand to be able to bring a filesystem
from another arch with larger pagesize and get at the data from it.

I've found some fixes needed on top of your Large Blocksize Support
patches: I'll send those to you in a moment.  Looks like you didn't
try much swapping!

I only managed to get ext2 working with larger blocksizes:
reiserfs -b 8192 wouldn't mount (reiserfs_fill_super: can not find
reiserfs on /dev/sdb1); ext3 gave me mysterious errors (JBD: tar
wants too many credits, even after adding JBD patches that you
turned out to be depending on); and I didn't try ext4 or xfs
(I'm guessing the latter has been quite well tested ;)

Hugh
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 3/5] VFS: pass open file to -xattr()

2007-09-21 Thread Miklos Szeredi
  What I'm saying is that read and write are _no_more_ related to the
  file than fstat.  Read/write operate on inode data, fstat operates on
  inode metadata.
 
 The read and write operations are DEFINITELY related to the file descriptor
 because of f_pos.  Each process opening the same file can have a different
 f_pos so read/write will work in different locations of the file.

Ah yes, but f_pos is handled entirely within the VFS.  The filesystem
has nothing to do with f_pos, and the read/write methods are passed
the offset explicitly.

So with that the read/write calls (for regular files at least) really
are not that different from fstat.

 In contrast getattr and getxattr operate on the single inode and you don't
 get e.g. a different i_size or i_uid or i_gid depending on who opened a
 file, nor is the xattr different.

You won't get different data either (again for regular files).  Yet
passing file operations down to the filesystem implementation makes
sense even for regular files, even if for most filesystems we could
get away with a totally stateless read/write model (as some other OS's
apparently do).

What I'm arguing, is that if we pass the open file for read/write to
the filesystem for regular files, it makes _equally_ as much sense to
pass the open file for getattr/setattr/xattr operations.

Miklos
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Ecryptfs-devel] [PATCH 3/11] eCryptfs: read_write.c routines

2007-09-21 Thread Michael Halcrow
On Wed, Sep 19, 2007 at 10:38:50PM -0700, Andrew Morton wrote:
  +   virt = kmap(page_for_lower);
  +   rc = ecryptfs_write_lower(ecryptfs_inode, virt, offset, size);
  +   kunmap(page_for_lower);
  +   return rc;
  +}
 
 argh, kmap.  http://lkml.org/lkml/2007/9/15/55

Here is a patch that moves to kmap_atomic(), adding an intermediate
copy. Although I would really like to find a way to avoid having to do
this extra copy.

---

Replace kmap() with kmap_atomic() for read_write.c routines

kmap() can lead to deadlock when multiple tasks attempt to take more
than one simultaneously:

http://lkml.org/lkml/2007/9/15/55

In order to avoid this possibility, eCryptfs must allocate an
intermediate block of memory to use with vfs_read() and vfs_write(),
copying the data through this memory region, since kmap_atomic()
cannot be held during calls which may block.

Signed-off-by: Michael Halcrow [EMAIL PROTECTED]
---
diff --git a/fs/ecryptfs/ecryptfs_kernel.h b/fs/ecryptfs/ecryptfs_kernel.h
index bb92b74..ce7a5d4 100644
--- a/fs/ecryptfs/ecryptfs_kernel.h
+++ b/fs/ecryptfs/ecryptfs_kernel.h
@@ -648,6 +648,6 @@ int ecryptfs_read_lower_page_segment(struct page 
*page_for_ecryptfs,
 struct inode *ecryptfs_inode);
 int ecryptfs_read(char *data, loff_t offset, size_t size,
  struct file *ecryptfs_file);
-struct page *ecryptfs_get1page(struct file *file, loff_t index);
+struct page *ecryptfs_get_locked_page(struct file *file, loff_t index);
 
 #endif /* #ifndef ECRYPTFS_KERNEL_H */
diff --git a/fs/ecryptfs/mmap.c b/fs/ecryptfs/mmap.c
index c6a8a33..6abf805 100644
--- a/fs/ecryptfs/mmap.c
+++ b/fs/ecryptfs/mmap.c
@@ -37,23 +37,30 @@
 struct kmem_cache *ecryptfs_lower_page_cache;
 
 /**
- * ecryptfs_get1page
+ * ecryptfs_get_locked_page
  *
  * Get one page from cache or lower f/s, return error otherwise.
  *
- * Returns unlocked and up-to-date page (if ok), with increased
+ * Returns a locked and up-to-date page (if ok), with increased
  * refcnt.
  */
-struct page *ecryptfs_get1page(struct file *file, loff_t index)
+struct page *ecryptfs_get_locked_page(struct file *file, loff_t index)
 {
struct dentry *dentry;
struct inode *inode;
struct address_space *mapping;
+   struct page *page;
 
dentry = file-f_path.dentry;
inode = dentry-d_inode;
mapping = inode-i_mapping;
-   return read_mapping_page(mapping, index, (void *)file);
+   page = read_mapping_page(mapping, index, (void *)file);
+   if (!IS_ERR(page))
+   lock_page(page);
+   else
+   printk(KERN_ERR %s: Error from read_mapping_page()\n,
+  __FUNCTION__);
+   return page;
 }
 
 /**
diff --git a/fs/ecryptfs/read_write.c b/fs/ecryptfs/read_write.c
index ccd2599..6732a4c 100644
--- a/fs/ecryptfs/read_write.c
+++ b/fs/ecryptfs/read_write.c
@@ -83,14 +83,24 @@ int ecryptfs_write_lower_page_segment(struct inode 
*ecryptfs_inode,
  struct page *page_for_lower,
  size_t offset_in_page, size_t size)
 {
-   char *virt;
+   char *page_for_lower_virt;
+   char *tmp_virt;
loff_t offset;
int rc;
 
-   offset = (page_for_lower-index  PAGE_CACHE_SHIFT) + offset_in_page;
-   virt = kmap(page_for_lower);
-   rc = ecryptfs_write_lower(ecryptfs_inode, virt, offset, size);
-   kunmap(page_for_lower);
+   tmp_virt = kmalloc(PAGE_CACHE_SIZE, GFP_KERNEL);
+   if (!tmp_virt) {
+   rc = -ENOMEM;
+   goto out;
+   }
+   offset = loff_t)page_for_lower-index)  PAGE_CACHE_SHIFT)
+ + offset_in_page);
+   page_for_lower_virt = kmap_atomic(page_for_lower, KM_USER0);
+   memcpy(tmp_virt, page_for_lower_virt, PAGE_CACHE_SIZE);
+   kunmap_atomic(page_for_lower_virt, KM_USER0);
+   rc = ecryptfs_write_lower(ecryptfs_inode, tmp_virt, offset, size);
+   kfree(tmp_virt);
+out:
return rc;
 }
 
@@ -140,8 +150,8 @@ int ecryptfs_write(struct file *ecryptfs_file, char *data, 
loff_t offset,
if (num_bytes  total_remaining_zeros)
num_bytes = total_remaining_zeros;
}
-   ecryptfs_page = ecryptfs_get1page(ecryptfs_file,
- ecryptfs_page_idx);
+   ecryptfs_page = ecryptfs_get_locked_page(ecryptfs_file,
+ecryptfs_page_idx);
if (IS_ERR(ecryptfs_page)) {
rc = PTR_ERR(ecryptfs_page);
printk(KERN_ERR %s: Error getting page at 
@@ -159,6 +169,7 @@ int ecryptfs_write(struct file *ecryptfs_file, char *data, 
loff_t offset,
printk(KERN_ERR %s: Error decrypting 
   page; rc = [%d]\n,
   __FUNCTION__, 

Re: [Ecryptfs-devel] [PATCH 3/11] eCryptfs: read_write.c routines

2007-09-21 Thread Andrew Morton
On Fri, 21 Sep 2007 16:51:25 -0500
Michael Halcrow [EMAIL PROTECTED] wrote:

 On Wed, Sep 19, 2007 at 10:38:50PM -0700, Andrew Morton wrote:
   + virt = kmap(page_for_lower);
   + rc = ecryptfs_write_lower(ecryptfs_inode, virt, offset, size);
   + kunmap(page_for_lower);
   + return rc;
   +}
  
  argh, kmap.  http://lkml.org/lkml/2007/9/15/55
 
 Here is a patch that moves to kmap_atomic(), adding an intermediate
 copy. Although I would really like to find a way to avoid having to do
 this extra copy.

We might as well stick with kmap.  I was just having a whine - I don't know
what to do about this really, apart from perhaps giving in to reality and
making kmap work better.


btw, I'm not really a great admirer of the whole patchset: it does some
pretty nasty-looking things: allocating dynamic memory, grabbing the
underlying pageframes with virt_to_page(), passing them back into kernel
APIs which are supposed to be called from userspace, etc.  It's all rather
ugly and abusive-looking.

But given that you're trying to do things which the kernel just isn't set
up to do, it isn't immediately obvious what can be done to fix it.

Perhaps there are problems whcih I didn't have time to spot, and perhaps
there are things which could be done to improve it.  But I don't have time
to sit down and absorb it all to a sufficient level of detail to be able to
suggest anything, and nobody else seems to be interested in reading the
patches so whoop, in it all goes.

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html