Re: [RFC/PATCH 4/8] revoke: core code V7

2008-01-16 Thread Pekka J Enberg
Hi,

On Tue, 15 Jan 2008, Christoph Hellwig wrote:
 Something like the loop above is not going to go in for sure.  Once we
 get rid of the sb->s_files we can put the list_head in struct file to
 new use eventually if we don't want to get rid of it.  E.g. a
 per-inode list would be much better than the per-superblock one and
 would regularize what the tty driver is doing.

Sure, adding a list of struct files to struct inode obviously works for 
me. But it does increase the size of struct inode which I thought you're 
not allowed to do ;-)

Anyway, there are still some other problems with this patch if we want to 
use it to implement forced unmount (my bad, we can't go changing struct 
inode because it's shared by different mount points).
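To make Christoph's suggestion concrete, here is a small user-space sketch of the per-inode alternative: each inode keeps its own list of open files, so revoking walks only that inode's files instead of scanning every file on the superblock's s_files list. The structures and functions (`model_inode`, `inode_add_file()`, `revoke_inode_files()`) are hypothetical stand-ins, not kernel API, and the sketch ignores the locking and restart logic the real loop needs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: each inode keeps a list of its own open files. */
struct model_file {
    struct model_file *next_in_inode;  /* link in the owning inode's list */
    int revoked;
};

struct model_inode {
    struct model_file *open_files;     /* head of the per-inode list */
};

void inode_add_file(struct model_inode *inode, struct model_file *file)
{
    assert(inode != NULL && file != NULL);
    file->revoked = 0;
    file->next_in_inode = inode->open_files;
    inode->open_files = file;
}

/* Walk only this inode's files; returns how many were newly revoked. */
int revoke_inode_files(struct model_inode *inode)
{
    int n = 0;

    for (struct model_file *f = inode->open_files; f != NULL; f = f->next_in_inode) {
        if (!f->revoked) {
            /* The patch calls f_op->revoke() and make_revoked_file() at this point. */
            f->revoked = 1;
            n++;
        }
    }
    return n;
}
```

The trade-off Pekka raises still applies: the list head has to live in struct inode, so every inode grows even though revoke is rare.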

Pekka
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 2/8] revoke: inode revoke lock V7

2007-12-19 Thread Pekka J Enberg
Hi Jonathan,

(Thanks for the review!)

On Tue, 18 Dec 2007, Jonathan Corbet wrote:
 This is a relatively minor detail in the rather bigger context of this
 patch, but...
 
  @@ -642,6 +644,7 @@ struct inode {
  struct list_head inotify_watches; /* watches on this inode */
  struct mutex inotify_mutex;  /* protects the watches list */
   #endif
  +   wait_queue_head_t   i_revoke_wait;
 
 That seems like a relatively hefty addition to every inode in the system
 when revoke - I think - will be a fairly rare operation.  Would there be
 any significant cost to using a single, global revoke-wait queue instead
 of growing the inode structure?

No, that's a good idea. I'll change it for the next patchset. Thanks!
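A minimal user-space sketch of a single global wait queue, assuming pthreads: a bool stands in for S_REVOKE_LOCK, and one mutex/condition-variable pair stands in for the global queue. Because every revoke wakes every waiter, each waiter must recheck its own inode's flag in a loop; that is the usual cost of sharing one queue. The sketch also writes the inode back before clearing the lock and waking waiters, matching the write_inode_now() fix later in this thread. All names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of an inode: only the bits this protocol needs. */
struct model_inode {
    bool revoke_locked;   /* models S_REVOKE_LOCK in i_flags */
    bool written_back;    /* set once the write_inode_now() stand-in has run */
};

/* One global mutex/condvar pair stands in for a single global wait queue. */
static pthread_mutex_t revoke_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t revoke_wait = PTHREAD_COND_INITIALIZER;

/* Lookup side (cf. do_lookup): sleep while this inode is being revoked. */
void model_wait_for_revoke(struct model_inode *inode)
{
    pthread_mutex_lock(&revoke_mutex);
    /* Shared queue: a wakeup may be for another inode, so always recheck. */
    while (inode->revoke_locked)
        pthread_cond_wait(&revoke_wait, &revoke_mutex);
    pthread_mutex_unlock(&revoke_mutex);
}

/* Revoke side: set the lock, write back, then clear it and wake everyone. */
void model_revoke_inode(struct model_inode *inode)
{
    pthread_mutex_lock(&revoke_mutex);
    assert(!inode->revoke_locked);
    inode->revoke_locked = true;
    pthread_mutex_unlock(&revoke_mutex);

    /* ... unhash the inode and its dentries here ... */
    inode->written_back = true;   /* stand-in for write_inode_now(inode, 1) */

    pthread_mutex_lock(&revoke_mutex);
    inode->revoke_locked = false;
    pthread_cond_broadcast(&revoke_wait);
    pthread_mutex_unlock(&revoke_mutex);
}
```

If the spurious wakeups ever mattered, a small hashed table of wait queues (the idea behind the kernel's bit_waitqueue()) would be a possible middle ground.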

   Pekka


Re: [RFC/PATCH 2/8] revoke: inode revoke lock V7

2007-12-19 Thread Pekka J Enberg
Hi,

On Wed, 19 Dec 2007, Serge E. Hallyn wrote:
  I assume you mean S_REVOKE_LOCK and not ->i_mutex, right?
 
 No I did mean the i_mutex since you take the i_mutex when you set
 S_REVOKE_LOCK.  So between that and the comment above do_lookup(),
 I assumed you were trying to lock out concurrent do_lookups() returning
 an inode whose revoke is starting at the same time.

No, I only use ->i_mutex for synchronizing the write to ->i_flags.

On Wed, 19 Dec 2007, Serge E. Hallyn wrote:
  The caller is supposed to block open(2) with chmod(2)/chattr(2) so while 
  revoke is in progress, you can get references to the _revoked inode_, 
  which is fine (operations on it will fail with EBADF). The
  ->i_revoke_wait bits are there to make sure that while we revoke, you
  can't get a _new reference_ to the inode until we're done.
 
 And a new reference means through iget(), so if revoke starts
 between the IS_REVOKE_LOCKED() check in do_lookup and its return,
 it's ok bc we'll get a reference later on?

Yes, as soon as we unhash the dentries and the inode, do_lookup() will try
to find a new inode with iget(), but it must wait until writeback on the
revoked inode has finished.

On Wed, 19 Dec 2007, Serge E. Hallyn wrote:
 I'm a little confused but i'll keep looking.

I don't blame you. The patch is missing the following minor detail which 
is needed to avoid fs corruption...

Pekka

Index: 2.6/fs/revoke.c
===
--- 2.6.orig/fs/revoke.c  2007-12-16 19:57:40.0 +0200
+++ 2.6/fs/revoke.c 2007-12-19 18:03:13.0 +0200
@@ -426,6 +426,8 @@ int err = 0;
make_revoked_inode(inode);
remove_inode_hash(inode);
revoke_aliases(inode);
+
+   err = write_inode_now(inode, 1);
 failed:
revoke_unlock(inode);
wake_up(&inode->i_revoke_wait);


Re: [RFC/PATCH 2/8] revoke: inode revoke lock V7

2007-12-17 Thread Pekka J Enberg
Hi Serge,

(Thanks for looking at this. I appreciate the review!)

On Mon, 17 Dec 2007, [EMAIL PROTECTED] wrote:
  struct vfsmount *mnt = nd->mnt;
  -   struct dentry *dentry = __d_lookup(nd->dentry, name);
  +   struct dentry *dentry;
   
  +again:
  +   dentry = __d_lookup(nd->dentry, name);
  if (!dentry)
  goto need_lookup;
  +
  +   if (dentry->d_inode && IS_REVOKE_LOCKED(dentry->d_inode)) {
 
 not sure whether this is a problem or not, but dentry->d_inode isn't
 locked here, right?  So nothing is keeping do_lookup() returning
 with an inode which gets revoked between here and the return 0
 a few lines down?

I assume you mean S_REVOKE_LOCK and not ->i_mutex, right?

The caller is supposed to block open(2) with chmod(2)/chattr(2) so while 
revoke is in progress, you can get references to the _revoked inode_, 
which is fine (operations on it will fail with EBADF). The
->i_revoke_wait bits are there to make sure that while we revoke, you
can't get a _new reference_ to the inode until we're done.

Pekka


[RFC/PATCH 8/8] revoke: add to documentation V7

2007-12-14 Thread Pekka J Enberg
From: Pekka Enberg [EMAIL PROTECTED]

Add the ->revoke() file operation to VFS documentation.

Cc: Alan Cox [EMAIL PROTECTED]
Cc: Al Viro [EMAIL PROTECTED]
Cc: Christoph Hellwig [EMAIL PROTECTED]
Cc: Peter Zijlstra [EMAIL PROTECTED]
Signed-off-by: Pekka Enberg [EMAIL PROTECTED]
---

 Documentation/filesystems/vfs.txt |5 +
 1 file changed, 5 insertions(+)

Index: 2.6/Documentation/filesystems/vfs.txt
===
--- 2.6.orig/Documentation/filesystems/vfs.txt  2007-11-23 09:58:11.0 +0200
+++ 2.6/Documentation/filesystems/vfs.txt   2007-12-14 16:41:05.0 +0200
@@ -780,6 +780,7 @@ struct file_operations {
int (*flock) (struct file *, int, struct file_lock *);
ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, size_t, unsigned int);
ssize_t (*splice_read)(struct file *, struct pipe_inode_info *, size_t, unsigned int);
+   int (*revoke)(struct file *);
 };
 
 Again, all methods are called without any locks being held, unless
@@ -853,6 +854,10 @@ otherwise noted.
   splice_read: called by the VFS to splice data from file to a pipe. This
   method is used by the splice(2) system call
 
+  revoke: called by the revokeat(2) system call to revoke access to an open file.
+ This method must ensure that all currently blocked writes are flushed
+ and that all pending reads will fail.
+
 Note that the file operations are implemented by the specific
 filesystem in which the inode resides. When opening a device node
 (character or block special) most filesystems will call special


[RFC/PATCH 7/8] revoke: support for ext2 and ext3 V7

2007-12-14 Thread Pekka J Enberg
From: Pekka Enberg [EMAIL PROTECTED]

Add revoke support to ext2, ext3 and ext4 by wiring f_op->revoke to
generic_file_revoke.

Cc: Alan Cox [EMAIL PROTECTED]
Cc: Al Viro [EMAIL PROTECTED]
Cc: Christoph Hellwig [EMAIL PROTECTED]
Cc: Peter Zijlstra [EMAIL PROTECTED]
Signed-off-by: Pekka Enberg [EMAIL PROTECTED]
---

 fs/ext2/file.c |1 +
 fs/ext3/file.c |1 +
 fs/ext4/file.c |1 +
 3 files changed, 3 insertions(+)

Index: 2.6/fs/ext2/file.c
===
--- 2.6.orig/fs/ext2/file.c 2007-11-23 09:58:11.0 +0200
+++ 2.6/fs/ext2/file.c  2007-12-14 16:41:02.0 +0200
@@ -58,6 +58,7 @@ const struct file_operations ext2_file_o
.fsync  = ext2_sync_file,
.splice_read    = generic_file_splice_read,
.splice_write   = generic_file_splice_write,
+   .revoke = generic_file_revoke,
 };
 
 #ifdef CONFIG_EXT2_FS_XIP
Index: 2.6/fs/ext3/file.c
===
--- 2.6.orig/fs/ext3/file.c 2007-11-23 09:58:11.0 +0200
+++ 2.6/fs/ext3/file.c  2007-12-14 16:41:02.0 +0200
@@ -122,6 +122,7 @@ const struct file_operations ext3_file_o
.fsync  = ext3_sync_file,
.splice_read    = generic_file_splice_read,
.splice_write   = generic_file_splice_write,
+   .revoke = generic_file_revoke,
 };
 
 const struct inode_operations ext3_file_inode_operations = {
Index: 2.6/fs/ext4/file.c
===
--- 2.6.orig/fs/ext4/file.c 2007-11-23 09:58:11.0 +0200
+++ 2.6/fs/ext4/file.c  2007-12-14 16:41:02.0 +0200
@@ -122,6 +122,7 @@ const struct file_operations ext4_file_o
.fsync  = ext4_sync_file,
.splice_read    = generic_file_splice_read,
.splice_write   = generic_file_splice_write,
+   .revoke = generic_file_revoke,
 };
 
 const struct inode_operations ext4_file_inode_operations = {


[RFC/PATCH 6/8] revoke: wire up i386 system call V7

2007-12-14 Thread Pekka J Enberg
From: Pekka Enberg [EMAIL PROTECTED]

Make revokeat system call available to user-space on i386.

[EMAIL PROTECTED]: fix 32-bit userspace]
Cc: Alan Cox [EMAIL PROTECTED]
Cc: Al Viro [EMAIL PROTECTED]
Cc: Christoph Hellwig [EMAIL PROTECTED]
Cc: Peter Zijlstra [EMAIL PROTECTED]
Signed-off-by: Pekka Enberg [EMAIL PROTECTED]
---
 arch/x86/ia32/ia32entry.S  |1 +
 arch/x86/kernel/syscall_table_32.S |1 +
 include/asm-x86/unistd_32.h|3 ++-
 3 files changed, 4 insertions(+), 1 deletion(-)

Index: 2.6/arch/x86/ia32/ia32entry.S
===
--- 2.6.orig/arch/x86/ia32/ia32entry.S  2007-11-23 09:58:11.0 +0200
+++ 2.6/arch/x86/ia32/ia32entry.S   2007-12-14 16:40:59.0 +0200
@@ -726,4 +726,5 @@ .quad compat_sys_utimensat  /* 320 */
.quad compat_sys_timerfd
.quad sys_eventfd
.quad sys32_fallocate
+   .quad sys_revokeat  /* 325 */
 ia32_syscall_end:
Index: 2.6/arch/x86/kernel/syscall_table_32.S
===
--- 2.6.orig/arch/x86/kernel/syscall_table_32.S 2007-11-23 09:58:11.0 +0200
+++ 2.6/arch/x86/kernel/syscall_table_32.S  2007-12-14 16:40:59.0 +0200
@@ -324,3 +324,4 @@ .long sys_utimensat /* 320 */
.long sys_timerfd
.long sys_eventfd
.long sys_fallocate
+   .long sys_revokeat  /* 325 */
Index: 2.6/include/asm-x86/unistd_32.h
===
--- 2.6.orig/include/asm-x86/unistd_32.h  2007-11-23 09:58:11.0 +0200
+++ 2.6/include/asm-x86/unistd_32.h 2007-12-14 16:40:59.0 +0200
@@ -330,10 +330,11 @@ #define __NR_utimensat  320
 #define __NR_timerfd   322
 #define __NR_eventfd   323
 #define __NR_fallocate 324
+#define __NR_revokeat  325
 
 #ifdef __KERNEL__
 
-#define NR_syscalls 325
+#define NR_syscalls 326
 
 #define __ARCH_WANT_IPC_PARSE_VERSION
 #define __ARCH_WANT_OLD_READDIR


[RFC/PATCH 5/8] revoke: add to makefile V7

2007-12-14 Thread Pekka J Enberg
From: Pekka Enberg [EMAIL PROTECTED]

Add fs/revoke.c and fs/revoked_inode.c to build when CONFIG_MMU is enabled.

Cc: Alan Cox [EMAIL PROTECTED]
Cc: Al Viro [EMAIL PROTECTED]
Cc: Christoph Hellwig [EMAIL PROTECTED]
Cc: Peter Zijlstra [EMAIL PROTECTED]
Signed-off-by: Pekka Enberg [EMAIL PROTECTED]
---
 fs/Makefile |1 +
 1 file changed, 1 insertion(+)

Index: 2.6/fs/Makefile
===
--- 2.6.orig/fs/Makefile  2007-11-23 09:58:11.0 +0200
+++ 2.6/fs/Makefile 2007-12-14 16:40:57.0 +0200
@@ -19,6 +19,7 @@ else
 obj-y +=   no-block.o
 endif
 
+obj-$(CONFIG_MMU)  += revoke.o revoked_inode.o
 obj-$(CONFIG_INOTIFY)  += inotify.o
 obj-$(CONFIG_INOTIFY_USER) += inotify_user.o
 obj-$(CONFIG_EPOLL)        += eventpoll.o


[RFC/PATCH 4/8] revoke: core code V7

2007-12-14 Thread Pekka J Enberg
From: Pekka Enberg [EMAIL PROTECTED]

The revokeat(2) system call ensures that after successful revocation you can
only access an inode via a file descriptor that is obtained from a subsequent
open(2) call.  The open(2) system call can be blocked by the caller with
chmod(2) and chown(2) prior to calling revokeat(2) to gain exclusive access to
an inode.

After a successful revocation, operations on file descriptors fail with the
EBADF or ENXIO error code for regular and device files, respectively.
Attempting to read from or write to a revoked mapping causes SIGBUS.  The
revokeat(2) system call guarantees that:

  (1) open file descriptors are revoked,

  (2) file descriptors created by fork(2) and dup(2) during
  the operation are revoked,

  (3) file descriptors obtained via a SCM_RIGHTS datagram during or
  after the revoke operation are revoked,

  (4) in-flight read(2) and write(2) operations are either completed
  or aborted before revokeat(2) returns successfully,

  (5) attempting to read from or write to a shared memory mapping
  raises SIGBUS, and

  (6) copy-on-write to a private memory mapping after successful
  revokeat(2) call does not reveal any data written after the
  system call has returned.

TODO:

  - I/O requests that are in-flight
  - Breaking of private mapping COW races with fork

Cc: Alan Cox [EMAIL PROTECTED]
Cc: Al Viro [EMAIL PROTECTED]
Cc: Christoph Hellwig [EMAIL PROTECTED]
Cc: Peter Zijlstra [EMAIL PROTECTED]
Signed-off-by: Pekka Enberg [EMAIL PROTECTED]
---
 fs/revoke.c   |  450 ++
 include/linux/fs.h|   10 +
 include/linux/magic.h |2 
 include/linux/mm.h|1 
 mm/mmap.c |   11 +
 5 files changed, 473 insertions(+), 1 deletion(-)

Index: 2.6/fs/revoke.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ 2.6/fs/revoke.c 2007-12-14 16:40:55.0 +0200
@@ -0,0 +1,450 @@
+/*
+ * Invalidate all current open file descriptors of an inode.
+ *
+ * Copyright (C) 2006-2007  Pekka Enberg
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/preempt.h>
+#include <linux/bit_spinlock.h>
+#include <linux/buffer_head.h>
+#include <linux/dcache.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/magic.h>
+#include <linux/module.h>
+#include <linux/mount.h>
+#include <linux/namei.h>
+
+static void revoke_aliases(struct inode *inode)
+{
+   struct dentry *dentry;
+restart:
+   spin_lock(&dcache_lock);
+   list_for_each_entry(dentry, &inode->i_dentry, d_alias) {
+   spin_lock(&dentry->d_lock);
+   if (!d_unhashed(dentry)) {
+   dget_locked(dentry);
+   __d_drop(dentry);
+   spin_unlock(&dentry->d_lock);
+   spin_unlock(&dcache_lock);
+   dput(dentry);
+   goto restart;
+   }
+   spin_unlock(&dentry->d_lock);
+   }
+   spin_unlock(&dcache_lock);
+}
+
+static int revoke_files(struct inode *inode)
+{
+   struct super_block *sb;
+   struct file *file;
+   int err = 0;
+
+   sb = inode->i_sb;
+   if (!sb)
+   return -EINVAL;
+
+restart:
+   file_list_lock();
+   list_for_each_entry(file, &sb->s_files, f_u.fu_list) {
+   struct dentry *dentry = file->f_path.dentry;
+
+   if (dentry->d_inode != inode)
+   continue;
+
+   if (file->f_op != inode->i_fop)
+   continue;
+
+   get_file(file);
+
+   /*
+* inode->i_mutex cannot be acquired under files_lock
+*/
+   file_list_unlock();
+
+   err = file->f_op->revoke(file);
+   make_revoked_file(inode, file);
+   fput(file);
+
+   if (err)
+   goto out;
+
+   if (signal_pending(current)) {
+   err = -EINTR;
+   goto out;
+   }
+   cond_resched();
+   goto restart;
+   }
+   file_list_unlock();
+out:
+   return err;
+}
+
+static inline bool vma_matches(struct vm_area_struct *vma, struct inode *inode)
+{
+   struct file *file = vma->vm_file;
+
+   return file && file->f_path.dentry->d_inode == inode;
+}
+
+/*
+ * LOCKING: read_lock(&tasklist_lock)
+ */
+static unsigned long nr_tasks_with_mm(void)
+{
+   struct task_struct *g, *p;
+   int ret = 0;
+
+   do_each_thread(g, p) {
+   if (!p->mm)
+   continue;
+   ret++;
+   }
+   while_each_thread(g, p);
+   return ret;
+}
+
+static int task_break_cow(struct task_struct *tsk, struct inode *inode)
+{
+   struct vm_area_struct *vma;
+   struct mm_struct *mm;
+   int ret = 0;
+
+   mm = get_task_mm(tsk);
+   if (!mm)
+ 

[RFC/PATCH 3/8] revoke: file, inode, and address space operations V7

2007-12-14 Thread Pekka J Enberg
From: Pekka Enberg [EMAIL PROTECTED]

Add file, inode, and address space operations for inodes that represent revoked
files.
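The core idea can be sketched in user-space C, assuming a heavily reduced operations table: revocation does not close anything, it repoints the existing struct file at a table of stubs that fail, in the style of the revoked_file_*() stubs in this patch that return -EBADF. Everything here (`model_fops`, `model_make_revoked_file()` and the other model names) is hypothetical; the patch itself provides full file, inode, and address space operation tables.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical, heavily reduced stand-in for struct file_operations. */
struct model_fops {
    ssize_t (*read)(void *priv, char *buf, size_t len);
    int (*mmap)(void *priv);
};

struct model_file {
    const struct model_fops *f_op;
    void *private_data;
};

static ssize_t real_read(void *priv, char *buf, size_t len)
{
    (void)priv; (void)buf;
    return (ssize_t)len;               /* pretend the read succeeded in full */
}

static int real_mmap(void *priv) { (void)priv; return 0; }

/* Stubs in the style of fs/revoked_inode.c: every operation fails. */
static ssize_t revoked_read(void *priv, char *buf, size_t len)
{
    (void)priv; (void)buf; (void)len;
    return -EBADF;
}

static int revoked_mmap(void *priv) { (void)priv; return -EBADF; }

const struct model_fops real_fops = { .read = real_read, .mmap = real_mmap };
const struct model_fops revoked_fops = { .read = revoked_read, .mmap = revoked_mmap };

/* Revoking swaps the table; descriptors keep pointing at the same file. */
void model_make_revoked_file(struct model_file *file)
{
    assert(file != NULL);
    file->f_op = &revoked_fops;
}
```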

Cc: Alan Cox [EMAIL PROTECTED]
Cc: Al Viro [EMAIL PROTECTED]
Cc: Christoph Hellwig [EMAIL PROTECTED]
Cc: Peter Zijlstra [EMAIL PROTECTED]
Signed-off-by: Pekka Enberg [EMAIL PROTECTED]
---
 fs/revoked_inode.c |  416 +
 1 file changed, 416 insertions(+)

Index: 2.6/fs/revoked_inode.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ 2.6/fs/revoked_inode.c  2007-12-14 16:40:53.0 +0200
@@ -0,0 +1,416 @@
+/*
+ * fs/revoked_inode.c
+ *
+ * Copyright (C) 2007  Pekka Enberg
+ *
+ * Provide stub functions for revoked inodes. Based on fs/bad_inode.c which is
+ *
+ * Copyright (C) 1997  Stephen Tweedie
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/stat.h>
+#include <linux/time.h>
+#include <linux/smp_lock.h>
+#include <linux/namei.h>
+#include <linux/poll.h>
+
+static loff_t revoked_file_llseek(struct file *file, loff_t offset, int origin)
+{
+   return -EBADF;
+}
+
+static ssize_t revoked_file_read(struct file *filp, char __user *buf,
+size_t size, loff_t *ppos)
+{
+   return -EBADF;
+}
+
+static ssize_t revoked_special_file_read(struct file *filp, char __user *buf,
+size_t size, loff_t *ppos)
+{
+   return 0;
+}
+
+static ssize_t revoked_file_write(struct file *filp, const char __user *buf,
+ size_t siz, loff_t *ppos)
+{
+   return -EBADF;
+}
+
+static ssize_t revoked_file_aio_read(struct kiocb *iocb,
+const struct iovec *iov,
+unsigned long nr_segs, loff_t pos)
+{
+   return -EBADF;
+}
+
+static ssize_t revoked_file_aio_write(struct kiocb *iocb,
+ const struct iovec *iov,
+ unsigned long nr_segs, loff_t pos)
+{
+   return -EBADF;
+}
+
+static int revoked_file_readdir(struct file *filp, void *dirent,
+   filldir_t filldir)
+{
+   return -EBADF;
+}
+
+static unsigned int revoked_file_poll(struct file *filp, poll_table *wait)
+{
+   return POLLERR;
+}
+
+static int revoked_file_ioctl(struct inode *inode, struct file *filp,
+ unsigned int cmd, unsigned long arg)
+{
+   return -EBADF;
+}
+
+static long revoked_file_unlocked_ioctl(struct file *file, unsigned cmd,
+   unsigned long arg)
+{
+   return -EBADF;
+}
+
+static long revoked_file_compat_ioctl(struct file *file, unsigned int cmd,
+ unsigned long arg)
+{
+   return -EBADF;
+}
+
+static int revoked_file_mmap(struct file *file, struct vm_area_struct *vma)
+{
+   return -EBADF;
+}
+
+static int revoked_file_open(struct inode *inode, struct file *filp)
+{
+   return -EBADF;
+}
+
+static int revoked_file_flush(struct file *file, fl_owner_t id)
+{
+   return 0;
+}
+
+static int revoked_file_release(struct inode *inode, struct file *filp)
+{
+   return -EBADF;
+}
+
+static int revoked_file_fsync(struct file *file, struct dentry *dentry,
+ int datasync)
+{
+   return -EBADF;
+}
+
+static int revoked_file_aio_fsync(struct kiocb *iocb, int datasync)
+{
+   return -EBADF;
+}
+
+static int revoked_file_fasync(int fd, struct file *filp, int on)
+{
+   return -EBADF;
+}
+
+static int revoked_file_lock(struct file *file, int cmd, struct file_lock *fl)
+{
+   return -EBADF;
+}
+
+static ssize_t revoked_file_sendpage(struct file *file, struct page *page,
+int off, size_t len, loff_t *pos,
+int more)
+{
+   return -EBADF;
+}
+
+static unsigned long revoked_file_get_unmapped_area(struct file *file,
+   unsigned long addr,
+   unsigned long len,
+   unsigned long pgoff,
+   unsigned long flags)
+{
+   return -EBADF;
+}
+
+static int revoked_file_check_flags(int flags)
+{
+   return -EBADF;
+}
+
+static int revoked_file_dir_notify(struct file *file, unsigned long arg)
+{
+   return -EBADF;
+}
+
+static int revoked_file_flock(struct file *filp, int cmd, struct file_lock *fl)
+{
+   return -EBADF;
+}
+
+static ssize_t revoked_file_splice_write(struct pipe_inode_info *pipe,
+struct file *out, loff_t *ppos,
+size_t len, unsigned int flags)
+{
+   return -EBADF;
+}
+
+static ssize_t revoked_file_splice_read(struct file *in, loff_t *ppos,
+  

[RFC/PATCH 1/8] revoke: special mmap handling V7

2007-12-14 Thread Pekka J Enberg
From: Pekka Enberg [EMAIL PROTECTED]

This adds special handling for revoked shared memory mappings.  We want to
raise SIGBUS if someone accesses a revoked mapping and return ENODEV if
somebody tries to remap one with mmap(2).
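A user-space model of the two checks, assuming a reduced vma that carries only its range and flags: the fault path turns an access to a revoked mapping into SIGBUS, and the mmap path refuses with -ENODEV when the vma found for the requested range is revoked. `MODEL_VM_REVOKED` copies the 0x1000 value used in this version of the patch; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>

#define MODEL_VM_REVOKED 0x1000UL   /* same value as VM_REVOKED in this patch */

struct model_vma {
    unsigned long vm_start, vm_end, vm_flags;
};

/* Fault path (cf. handle_mm_fault): 0 means "handled", else the signal. */
int model_handle_fault(const struct model_vma *vma)
{
    assert(vma != NULL);
    if (vma->vm_flags & MODEL_VM_REVOKED)
        return SIGBUS;
    return 0;
}

/* mmap path (cf. mmap_region): vma is what find_vma_prepare() returned. */
int model_mmap_region(const struct model_vma *vma)
{
    if (vma && (vma->vm_flags & MODEL_VM_REVOKED))
        return -ENODEV;
    return 0;
}
```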

Cc: Alan Cox [EMAIL PROTECTED]
Cc: Al Viro [EMAIL PROTECTED]
Cc: Christoph Hellwig [EMAIL PROTECTED]
Cc: Peter Zijlstra [EMAIL PROTECTED]
Signed-off-by: Pekka Enberg [EMAIL PROTECTED]
---
 include/linux/mm.h |1 +
 mm/memory.c|3 +++
 mm/mmap.c  |   12 
 3 files changed, 12 insertions(+), 4 deletions(-)

Index: 2.6/include/linux/mm.h
===
--- 2.6.orig/include/linux/mm.h 2007-12-14 11:33:57.0 +0200
+++ 2.6/include/linux/mm.h  2007-12-14 16:40:48.0 +0200
@@ -106,6 +106,7 @@ #define VM_INSERTPAGE   0x0200  /* The 
 #define VM_ALWAYSDUMP  0x0400  /* Always include in core dumps */
 
 #define VM_CAN_NONLINEAR 0x0800  /* Has ->fault & does nonlinear pages */
+#define VM_REVOKED 0x1000  /* Mapping has been revoked */
 
 #ifndef VM_STACK_DEFAULT_FLAGS /* arch can override this */
 #define VM_STACK_DEFAULT_FLAGS VM_DATA_DEFAULT_FLAGS
Index: 2.6/mm/memory.c
===
--- 2.6.orig/mm/memory.c  2007-11-23 09:58:11.0 +0200
+++ 2.6/mm/memory.c 2007-12-14 16:40:49.0 +0200
@@ -2530,6 +2530,9 @@ int handle_mm_fault(struct mm_struct *mm
if (unlikely(is_vm_hugetlb_page(vma)))
return hugetlb_fault(mm, vma, address, write_access);
 
+   if (unlikely(vma->vm_flags & VM_REVOKED))
+   return VM_FAULT_SIGBUS;
+
pgd = pgd_offset(mm, address);
pud = pud_alloc(mm, pgd, address);
if (!pud)
Index: 2.6/mm/mmap.c
===
--- 2.6.orig/mm/mmap.c  2007-12-14 11:33:57.0 +0200
+++ 2.6/mm/mmap.c   2007-12-14 16:40:49.0 +0200
@@ -1081,10 +1081,14 @@ unsigned long charged = 0;
error = -ENOMEM;
 munmap_back:
vma = find_vma_prepare(mm, addr, &prev, &rb_link, &rb_parent);
-   if (vma && vma->vm_start < addr + len) {
-   if (do_munmap(mm, addr, len))
-   return -ENOMEM;
-   goto munmap_back;
+   if (vma) {
+   if (unlikely(vma->vm_flags & VM_REVOKED))
+   return -ENODEV;
+   if (vma->vm_start < addr + len) {
+   if (do_munmap(mm, addr, len))
+   return -ENOMEM;
+   goto munmap_back;
+   }
}
 
/* Check against address space limit. */


[RFC/PATCH 2/8] revoke: inode revoke lock V7

2007-12-14 Thread Pekka J Enberg
From: Pekka Enberg [EMAIL PROTECTED]

The revoke operation cannibalizes the revoked struct inode and removes it from
the inode cache thus forcing subsequent callers to look up the real inode.
Therefore we must make sure that while the revoke operation is in progress
(e.g. flushing dirty pages to disk) no one takes a new reference to the inode
and starts I/O on it.

Cc: Alan Cox [EMAIL PROTECTED]
Cc: Al Viro [EMAIL PROTECTED]
Cc: Christoph Hellwig [EMAIL PROTECTED]
Cc: Peter Zijlstra [EMAIL PROTECTED]
Signed-off-by: Pekka Enberg [EMAIL PROTECTED]
---
 fs/inode.c |1 +
 fs/namei.c |   15 ++-
 include/linux/fs.h |3 +++
 3 files changed, 18 insertions(+), 1 deletion(-)

Index: 2.6/include/linux/fs.h
===
--- 2.6.orig/include/linux/fs.h 2007-11-23 09:58:11.0 +0200
+++ 2.6/include/linux/fs.h  2007-12-14 16:40:50.0 +0200
@@ -150,6 +150,7 @@ #define MS_MGC_MSK 0x
 #define S_NOCMTIME 128 /* Do not update file c/mtime */
 #define S_SWAPFILE 256 /* Do not truncate: swapon got its bmaps */
 #define S_PRIVATE  512 /* Inode is fs-internal */
+#define S_REVOKE_LOCK  1024 /* Inode is being revoked */
 
 /*
  * Note that nosuid etc flags are inode-specific: setting some file-system
@@ -183,6 +184,7 @@ #define MS_MGC_MSK 0x
 #define IS_NOCMTIME(inode) ((inode)->i_flags & S_NOCMTIME)
 #define IS_SWAPFILE(inode) ((inode)->i_flags & S_SWAPFILE)
 #define IS_PRIVATE(inode)  ((inode)->i_flags & S_PRIVATE)
+#define IS_REVOKE_LOCKED(inode)  ((inode)->i_flags & S_REVOKE_LOCK)
 
 /* the read-only stuff doesn't really belong here, but any other place is
probably as bad and I don't want to create yet another include file. */
@@ -642,6 +644,7 @@ struct inode {
struct list_head inotify_watches; /* watches on this inode */
struct mutex inotify_mutex;  /* protects the watches list */
 #endif
+   wait_queue_head_t   i_revoke_wait;
 
unsigned long   i_state;
unsigned long   dirtied_when;   /* jiffies of first dirtying */
Index: 2.6/fs/inode.c
===
--- 2.6.orig/fs/inode.c 2007-10-26 09:36:45.0 +0300
+++ 2.6/fs/inode.c  2007-12-14 16:40:50.0 +0200
@@ -216,6 +216,7 @@ memset(inode, 0, sizeof(*inode));
INIT_RAW_PRIO_TREE_ROOT(&inode->i_data.i_mmap);
INIT_LIST_HEAD(&inode->i_data.i_mmap_nonlinear);
i_size_ordered_init(inode);
+   init_waitqueue_head(&inode->i_revoke_wait);
 #ifdef CONFIG_INOTIFY
INIT_LIST_HEAD(&inode->inotify_watches);
mutex_init(&inode->inotify_mutex);
Index: 2.6/fs/namei.c
===
--- 2.6.orig/fs/namei.c 2007-10-26 09:36:48.0 +0300
+++ 2.6/fs/namei.c  2007-12-14 16:40:50.0 +0200
@@ -785,10 +785,23 @@ static int do_lookup(struct nameidata *n
 struct path *path)
 {
struct vfsmount *mnt = nd->mnt;
-   struct dentry *dentry = __d_lookup(nd->dentry, name);
+   struct dentry *dentry;
 
+again:
+   dentry = __d_lookup(nd->dentry, name);
if (!dentry)
goto need_lookup;
+
+   if (dentry->d_inode && IS_REVOKE_LOCKED(dentry->d_inode)) {
+   int err;
+
+   err = wait_event_interruptible(dentry->d_inode->i_revoke_wait,
+   !IS_REVOKE_LOCKED(dentry->d_inode));
+   if (err)
+   return err;
+   goto again;
+   }
+
if (dentry->d_op && dentry->d_op->d_revalidate)
goto need_revalidate;
 done:


Re: msync(2) bug(?), returns AOP_WRITEPAGE_ACTIVATE to userland

2007-10-14 Thread Pekka J Enberg
Hi Erez,

On Sun, 14 Oct 2007, Erez Zadok wrote:
 In unionfs_writepage() I tried to emulate as best possible what the lower
 f/s will have returned to the VFS.  Since tmpfs's ->writepage can return
 AOP_WRITEPAGE_ACTIVATE and re-mark its page as dirty, I did the same in
 unionfs: mark again my page as dirty, and return AOP_WRITEPAGE_ACTIVATE.
 
 Should I be doing something different when unionfs stacks on top of tmpfs?
 (BTW, this is probably also relevant to ecryptfs.)

Look at mm/filemap.c:__filemap_fdatawrite_range(). You shouldn't be 
calling unionfs_writepage() _at all_ if the lower mapping has 
BDI_CAP_NO_WRITEBACK capability set. Perhaps something like the totally 
untested patch below?

Pekka

---
 fs/unionfs/mmap.c |   17 +
 1 file changed, 17 insertions(+)

Index: linux-2.6.23-rc8/fs/unionfs/mmap.c
===
--- linux-2.6.23-rc8.orig/fs/unionfs/mmap.c
+++ linux-2.6.23-rc8/fs/unionfs/mmap.c
@@ -17,6 +17,7 @@
  * published by the Free Software Foundation.
  */
 
+#include <linux/backing-dev.h>
 #include "union.h"
 
 /*
@@ -144,6 +145,21 @@ out:
return err;
 }
 
+static int unionfs_writepages(struct address_space *mapping,
+ struct writeback_control *wbc)
+{
+   struct inode *lower_inode;
+   struct inode *inode;
+
+   inode = mapping->host;
+   lower_inode = unionfs_lower_inode(inode);
+
+   if (!mapping_cap_writeback_dirty(lower_inode->i_mapping))
+   return 0;
+
+   return generic_writepages(mapping, wbc);
+}
+
 /*
  * readpage is called from generic_page_read and the fault handler.
  * If your file system uses generic_page_read for the read op, it
@@ -371,6 +387,7 @@ out:
 
 struct address_space_operations unionfs_aops = {
.writepage  = unionfs_writepage,
+   .writepages = unionfs_writepages,
.readpage   = unionfs_readpage,
.prepare_write  = unionfs_prepare_write,
.commit_write   = unionfs_commit_write,
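The effect of the proposed ->writepages can be modeled in user space as below. `cap_writeback_dirty` stands in for what mapping_cap_writeback_dirty() reports (false for a BDI_CAP_NO_WRITEBACK mapping such as tmpfs), and `model_generic_writepages()` is a hypothetical stub for generic_writepages(); all model names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an address_space: capability bit plus a counter. */
struct model_mapping {
    bool cap_writeback_dirty;  /* false models BDI_CAP_NO_WRITEBACK (tmpfs) */
    int pages_written;
};

struct model_inode {
    struct model_mapping mapping;
    struct model_inode *lower;     /* lower inode of a stacked (unionfs) inode */
};

/* Stand-in for generic_writepages(): pretend each dirty page gets written. */
static int model_generic_writepages(struct model_mapping *m, int nr_dirty)
{
    m->pages_written += nr_dirty;
    return 0;
}

int model_unionfs_writepages(struct model_inode *inode, int nr_dirty)
{
    assert(inode != NULL && inode->lower != NULL);
    /* Skip writeback entirely when the lower mapping never writes back. */
    if (!inode->lower->mapping.cap_writeback_dirty)
        return 0;
    return model_generic_writepages(&inode->mapping, nr_dirty);
}
```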


Re: msync(2) bug(?), returns AOP_WRITEPAGE_ACTIVATE to userland

2007-10-07 Thread Pekka J Enberg
Hi Erez,

On 10/7/07, Erez Zadok [EMAIL PROTECTED] wrote:
 Anyway, some Ubuntu users of Unionfs reported that msync(2) sometimes
 returns AOP_WRITEPAGE_ACTIVATE (decimal 524288) back to userland.
 Therefore, some user programs fail, esp. if they're written such as 
 this:

[snip]

On 10/7/07, Erez Zadok [EMAIL PROTECTED] wrote:
 Is this a bug indeed, or are user programs supposed to handle
 AOP_WRITEPAGE_ACTIVATE (I hope not the latter). If it's a kernel bug, 
 what should the kernel return: a zero, or an -errno (and which one)?

It's a kernel bug. AOP_WRITEPAGE_ACTIVATE is a hint to the VM to avoid 
writeback of the page in the near future. I wonder if it's enough that we
change the return value to zero in
mm/page-writeback.c:write_cache_pages() in case we hit AOP_WRITEPAGE_ACTIVATE...

Pekka 

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 63512a9..717f341 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -672,8 +672,10 @@ retry:
 
ret = (*writepage)(page, wbc, data);
 
-   if (unlikely(ret == AOP_WRITEPAGE_ACTIVATE))
+   if (unlikely(ret == AOP_WRITEPAGE_ACTIVATE)) {
unlock_page(page);
+   ret = 0;
+   }
if (ret || (--(wbc->nr_to_write) <= 0))
done = 1;
if (wbc->nonblocking && bdi_write_congested(bdi)) {
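A reduced user-space model of the proposed change, assuming a ->writepage that always returns the activate hint the way tmpfs can: the loop folds AOP_WRITEPAGE_ACTIVATE into 0 so the value never reaches msync(2). `MODEL_AOP_WRITEPAGE_ACTIVATE` mirrors the reported value 524288 (0x80000); the loop omits page locking, congestion checks, and the `done` flag of the real write_cache_pages(), and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_AOP_WRITEPAGE_ACTIVATE 0x80000   /* 524288, as reported above */

typedef int (*model_writepage_t)(int page);

/* Hypothetical tmpfs-like ->writepage: keeps the page dirty, returns the hint. */
int model_tmpfs_writepage(int page)
{
    (void)page;
    return MODEL_AOP_WRITEPAGE_ACTIVATE;
}

/* Reduced write_cache_pages() loop with the proposed fix applied. */
int model_write_cache_pages(model_writepage_t writepage, int nr_pages, long *nr_to_write)
{
    int ret = 0;

    assert(writepage != NULL && nr_to_write != NULL);
    for (int page = 0; page < nr_pages; page++) {
        ret = writepage(page);
        if (ret == MODEL_AOP_WRITEPAGE_ACTIVATE)
            ret = 0;   /* a hint to the VM, never an error for the caller */
        if (ret || --(*nr_to_write) <= 0)
            break;
    }
    return ret;
}
```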

[RFC/PATCH 1/5] revoke: special mmap handling

2007-07-11 Thread Pekka J Enberg
From: Pekka Enberg [EMAIL PROTECTED]

This adds special handling for revoked memory mappings.  We want to raise
SIGBUS when accessing revoked mappings and return ENODEV when trying to remap
with mmap(2).

Signed-off-by: Pekka Enberg [EMAIL PROTECTED]
---

 include/linux/mm.h |1 +
 mm/memory.c|3 +++
 mm/mmap.c  |   12 
 3 files changed, 12 insertions(+), 4 deletions(-)

Index: 2.6/include/linux/mm.h
===
--- 2.6.orig/include/linux/mm.h 2007-07-06 10:19:51.0 +0300
+++ 2.6/include/linux/mm.h  2007-07-11 11:48:28.0 +0300
@@ -169,6 +169,7 @@ #define VM_NONLINEAR0x0080  /* Is no
 #define VM_MAPPED_COPY 0x0100  /* T if mapped copy of data (nommu mmap) */
 #define VM_INSERTPAGE  0x0200  /* The vma has had vm_insert_page() done on it */
 #define VM_ALWAYSDUMP  0x0400  /* Always include in core dumps */
+#define VM_REVOKED 0x0800  /* Mapping has been revoked */
 
 #ifndef VM_STACK_DEFAULT_FLAGS /* arch can override this */
 #define VM_STACK_DEFAULT_FLAGS VM_DATA_DEFAULT_FLAGS
Index: 2.6/mm/memory.c
===
--- 2.6.orig/mm/memory.c  2007-07-06 10:19:53.0 +0300
+++ 2.6/mm/memory.c 2007-07-11 11:48:28.0 +0300
@@ -2597,6 +2597,9 @@ int __handle_mm_fault(struct mm_struct *
if (unlikely(is_vm_hugetlb_page(vma)))
return hugetlb_fault(mm, vma, address, write_access);
 
+   if (unlikely(vma->vm_flags & VM_REVOKED))
+   return VM_FAULT_SIGBUS;
+
pgd = pgd_offset(mm, address);
pud = pud_alloc(mm, pgd, address);
if (!pud)
Index: 2.6/mm/mmap.c
===
--- 2.6.orig/mm/mmap.c  2007-07-06 10:19:53.0 +0300
+++ 2.6/mm/mmap.c   2007-07-11 11:48:28.0 +0300
@@ -1031,10 +1031,14 @@ accountable = 0;
error = -ENOMEM;
 munmap_back:
vma = find_vma_prepare(mm, addr, &prev, &rb_link, &rb_parent);
-   if (vma && vma->vm_start < addr + len) {
-   if (do_munmap(mm, addr, len))
-   return -ENOMEM;
-   goto munmap_back;
+   if (vma) {
+   if (unlikely(vma->vm_flags & VM_REVOKED))
+   return -ENODEV;
+   if (vma->vm_start < addr + len) {
+   if (do_munmap(mm, addr, len))
+   return -ENOMEM;
+   goto munmap_back;
+   }
}
 
/* Check against address space limit. */


[RFC/PATCH 2/5] revoke: core code

2007-07-11 Thread Pekka J Enberg
From: Pekka Enberg [EMAIL PROTECTED]

The revokeat(2) and frevoke(2) system calls invalidate open file descriptors
and shared mappings of an inode.  After a successful revocation, operations
on file descriptors fail with the EBADF or ENXIO error code for regular and
device files, respectively.  Attempting to read from or write to a revoked
mapping causes SIGBUS.

The actual operation is done in two passes:

 1. Revoke all file descriptors that point to the given inode. We do
this under tasklist_lock so that after this pass, we don't need
to worry about racing with close(2) or dup(2).

 2. Take down shared memory mappings of the inode and close all file
pointers.

The file descriptors and memory mapping ranges are preserved until the
owning task does close(2) and munmap(2), respectively.
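The two passes can be illustrated with a small user-space model. This is a toy sketch, not the code in fs/revoke.c: struct model_file, struct model_task and revoked_file are hypothetical stand-ins for struct file, a task's descriptor table and the revokefs file, and both the locking (tasklist_lock) and the memory mappings are left out. Pass 1 points every matching descriptor at the revoked file and stashes the original file pointer; pass 2 drops the references the descriptor tables used to hold. The descriptor slots themselves stay in place until their owner closes them.

```c
#include <assert.h>
#include <stddef.h>

#define NR_FDS 4 /* descriptors per toy task */

/* Hypothetical stand-ins for struct inode and struct file. */
struct model_inode { int ino; };
struct model_file {
	struct model_inode *inode;
	int refcount; /* references held by descriptor tables */
};

/* Toy descriptor table of one task. */
struct model_task { struct model_file *fdt[NR_FDS]; };

/* Plays the role of the revokefs file that revoked descriptors point to. */
static struct model_file revoked_file = { NULL, 1 };

/*
 * Pass 1: point every descriptor that refers to @inode at revoked_file and
 * stash the original file pointer. @stash is preallocated by the caller
 * because, in the kernel, this pass runs under tasklist_lock.
 * Returns the number of descriptors revoked.
 */
static size_t revoke_fds(struct model_task *tasks, size_t nr_tasks,
			 const struct model_inode *inode,
			 struct model_file **stash, size_t stash_len)
{
	size_t n = 0;

	for (size_t t = 0; t < nr_tasks; t++) {
		for (size_t fd = 0; fd < NR_FDS; fd++) {
			struct model_file *f = tasks[t].fdt[fd];

			if (f == NULL || f->inode != inode)
				continue;
			assert(n < stash_len);
			stash[n++] = f;
			revoked_file.refcount++;
			tasks[t].fdt[fd] = &revoked_file;
		}
	}
	return n;
}

/* Pass 2: drop the references the descriptor tables used to hold. */
static void close_stashed(struct model_file **stash, size_t n)
{
	for (size_t i = 0; i < n; i++)
		stash[i]->refcount--;
}
```

Note that descriptors of unrelated inodes (fb in a typical test) are left untouched.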


You use revoke() (with chown, for example) to gain exclusive access to 
an inode that might be in use by other processes. This means that we must 
make sure that:

  - operations on opened file descriptors pointing to that inode fail
  - there are no shared mappings visible to other processes
  - in-progress system calls are either completed (writes) or aborted (reads)

After the revoke() system call returns, you are guaranteed to have revoked 
access to an inode for any processes that had access to it when you 
started the operation. The caller is responsible for blocking any new 
open(2) calls during the operation; revoke() itself takes care of fork(2) 
and dup(2).

Signed-off-by: Pekka Enberg [EMAIL PROTECTED]
---

 fs/Makefile  |1 
 fs/revoke.c  |  777 +++
 fs/revoked_inode.c   |  417 +++
 include/linux/fs.h   |8 
 include/linux/magic.h|1 
 include/linux/mm.h   |1 
 include/linux/revoked_fs_i.h |   18 
 include/linux/syscalls.h |3 
 mm/mmap.c|   11 
 9 files changed, 1237 insertions(+)

Index: 2.6/fs/Makefile
===
--- 2.6.orig/fs/Makefile    2007-05-21 15:38:14.0 +0300
+++ 2.6/fs/Makefile 2007-07-11 11:48:35.0 +0300
@@ -19,6 +19,7 @@ else
 obj-y +=   no-block.o
 endif
 
+obj-$(CONFIG_MMU)  += revoke.o revoked_inode.o
 obj-$(CONFIG_INOTIFY)  += inotify.o
 obj-$(CONFIG_INOTIFY_USER) += inotify_user.o
 obj-$(CONFIG_EPOLL)+= eventpoll.o
Index: 2.6/fs/revoke.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ 2.6/fs/revoke.c 2007-07-11 11:48:35.0 +0300
@@ -0,0 +1,777 @@
+/*
+ * fs/revoke.c - Invalidate all current open file descriptors of an inode.
+ *
+ * Copyright (C) 2006-2007  Pekka Enberg
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/namei.h>
+#include <linux/magic.h>
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/module.h>
+#include <linux/mount.h>
+#include <linux/sched.h>
+#include <linux/revoked_fs_i.h>
+#include <linux/syscalls.h>
+
+/**
+ * fileset - an array of file pointers.
+ * @files:the array of file pointers
+ * @nr:   number of elements in the array
+ * @end:  index to next unused file pointer
+ */
+struct fileset {
+   struct file **files;
+   unsigned long   nr;
+   unsigned long   end;
+};
+
+/**
+ * revoke_details - details of the revoke operation
+ * @fset: set of files that point to a revoked inode
+ * @restore_start:index to the first file pointer that is currently in
+ *use by a file descriptor but the real file has not
+ *been revoked
+ */
+struct revoke_details {
+   struct fileset  *fset;
+   unsigned long   restore_start;
+};
+
+static struct kmem_cache *revokefs_inode_cache;
+
+static inline bool fset_is_full(struct fileset *set)
+{
+   return set->nr == set->end;
+}
+
+static inline struct file *fset_get_filp(struct fileset *set)
+{
+   return set->files[set->end++];
+}
+
+static struct fileset *alloc_fset(unsigned long size)
+{
+   struct fileset *fset;
+
+   fset = kzalloc(sizeof *fset, GFP_KERNEL);
+   if (!fset)
+   return NULL;
+
+   fset->files = kcalloc(size, sizeof(struct file *), GFP_KERNEL);
+   if (!fset->files) {
+   kfree(fset);
+   return NULL;
+   }
+   fset->nr = size;
+   return fset;
+}
+
+static void free_fset(struct fileset *fset)
+{
+  unsigned long i;
+
+  for (i = fset->end; i < fset->nr; i++)
+  fput(fset->files[i]);
+
+  kfree(fset->files);
+  kfree(fset);
+}
+
+/*
+ * Revoked file descriptors point to inodes in the revokefs filesystem.
+ */
+static struct vfsmount *revokefs_mnt;
+
+static struct file *get_revoked_file(void)
+{
+   struct dentry 

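The fileset helpers above only ever hand out file pointers prepared in advance; the V5 version of this code states the goal explicitly (no kmalloc() under tasklist_lock). A rough user-space model of that ownership rule follows. It assumes calloc()/free() in place of kcalloc()/kfree(), a hypothetical model_fput() that merely counts final releases, and, because the code that fills the slots is cut off above, a caller-supplied pool of files. As in the patch, free_fset() releases only the slots that were never handed out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct file; refs mimics its reference count. */
struct mfile { int refs; };

struct fileset {
	struct mfile **files; /* preallocated file pointers */
	unsigned long nr;     /* number of elements in the array */
	unsigned long end;    /* index to next unused file pointer */
};

static int nr_released; /* counts files whose last reference was dropped */

/* Hypothetical fput(): drop one reference and count the final release. */
static void model_fput(struct mfile *f)
{
	if (--f->refs == 0)
		nr_released++;
}

static bool fset_is_full(const struct fileset *set)
{
	return set->nr == set->end;
}

/* Hand out the next prepared file; never allocates. */
static struct mfile *fset_get_filp(struct fileset *set)
{
	assert(!fset_is_full(set));
	return set->files[set->end++];
}

/* Allocate the array up front and fill it from a caller-supplied pool. */
static struct fileset *alloc_fset(unsigned long size, struct mfile *pool)
{
	struct fileset *set = calloc(1, sizeof(*set));

	if (!set)
		return NULL;
	set->files = calloc(size, sizeof(*set->files));
	if (!set->files) {
		free(set);
		return NULL;
	}
	for (unsigned long i = 0; i < size; i++)
		set->files[i] = &pool[i];
	set->nr = size;
	return set;
}

/* Release only the slots that were never handed out, then the array. */
static void free_fset(struct fileset *set)
{
	for (unsigned long i = set->end; i < set->nr; i++)
		model_fput(set->files[i]);
	free(set->files);
	free(set);
}
```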
[RFC/PATCH 3/5] revoke: wire up i386 system calls

2007-07-11 Thread Pekka J Enberg
From: Pekka Enberg [EMAIL PROTECTED]

Make revokeat and frevoke system calls available to user-space on i386.

Signed-off-by: Pekka Enberg [EMAIL PROTECTED]
---

 arch/i386/kernel/syscall_table.S |2 ++
 arch/x86_64/ia32/ia32entry.S |2 ++
 include/asm-i386/unistd.h|4 +++-
 3 files changed, 7 insertions(+), 1 deletion(-)

Index: 2.6/arch/i386/kernel/syscall_table.S
===
--- 2.6.orig/arch/i386/kernel/syscall_table.S   2007-05-21 15:38:00.0 +0300
+++ 2.6/arch/i386/kernel/syscall_table.S    2007-07-11 11:48:39.0 +0300
@@ -323,3 +323,5 @@ .long sys_utimensat /* 320 */
.long sys_signalfd
.long sys_timerfd
.long sys_eventfd
+   .long sys_revokeat
+   .long sys_frevoke   /* 325 */
Index: 2.6/include/asm-i386/unistd.h
===
--- 2.6.orig/include/asm-i386/unistd.h  2007-05-21 15:38:15.0 +0300
+++ 2.6/include/asm-i386/unistd.h   2007-07-11 11:48:39.0 +0300
@@ -329,10 +329,12 @@ #define __NR_utimensat320
 #define __NR_signalfd  321
 #define __NR_timerfd   322
 #define __NR_eventfd   323
+#define __NR_revokeat  324
+#define __NR_frevoke   325
 
 #ifdef __KERNEL__
 
-#define NR_syscalls 324
+#define NR_syscalls 326
 
 #define __ARCH_WANT_IPC_PARSE_VERSION
 #define __ARCH_WANT_OLD_READDIR
Index: 2.6/arch/x86_64/ia32/ia32entry.S
===
--- 2.6.orig/arch/x86_64/ia32/ia32entry.S   2007-07-06 10:19:44.0 +0300
+++ 2.6/arch/x86_64/ia32/ia32entry.S    2007-07-11 11:48:40.0 +0300
@@ -719,4 +719,6 @@ .quad compat_sys_utimensat  /* 320 */
.quad compat_sys_signalfd
.quad compat_sys_timerfd
.quad sys_eventfd
+   .quad sys_revokeat
+   .quad sys_frevoke   /* 325 */
 ia32_syscall_end:
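With no C library wrappers, a user-space program on a patched i386 kernel would presumably reach the new calls through syscall(2). The sketch below is hypothetical: the numbers come from the unistd.h hunk above, but the argument lists (a single descriptor for frevoke(), a directory descriptor plus path for revokeat(), following the usual *at() convention) are assumptions, since the syscalls.h hunk is not shown here. On a kernel without this series the same numbers may belong to unrelated system calls, so this must only be run on a patched kernel; on non-i386 builds the wrappers simply fail with ENOSYS.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>        /* AT_FDCWD, for revokeat() callers */
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Numbers from the include/asm-i386/unistd.h hunk of this series. They are
 * meaningful only on an i386 kernel with these patches applied.
 */
#define PATCH_NR_REVOKEAT 324
#define PATCH_NR_FREVOKE  325

/* Hypothetical wrapper; assumes frevoke() takes one file descriptor. */
static int frevoke(int fd)
{
#if defined(__i386__)
	return (int)syscall(PATCH_NR_FREVOKE, fd);
#else
	(void)fd;
	errno = ENOSYS; /* not wired up for this architecture */
	return -1;
#endif
}

/* Hypothetical wrapper; assumes the usual (dirfd, pathname) *at() arguments. */
static int revokeat(int dfd, const char *path)
{
	assert(path != NULL);
#if defined(__i386__)
	return (int)syscall(PATCH_NR_REVOKEAT, dfd, path);
#else
	(void)dfd;
	errno = ENOSYS; /* not wired up for this architecture */
	return -1;
#endif
}
```

Following the use case in patch 2/5, a program could chown() a device node and then call revokeat(AT_FDCWD, path) on it so that processes which opened it earlier lose access; keeping new open(2) calls out meanwhile remains the caller's job.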


[RFC/PATCH 4/5] revoke: support for ext2 and ext3

2007-07-11 Thread Pekka J Enberg
From: Pekka Enberg [EMAIL PROTECTED]

Add revoke support to ext2, ext3 and ext4 by wiring f_op->revoke to
generic_file_revoke.

Signed-off-by: Pekka Enberg [EMAIL PROTECTED]
---

 fs/ext2/file.c |1 +
 fs/ext3/file.c |1 +
 fs/ext4/file.c |1 +
 3 files changed, 3 insertions(+)

Index: 2.6/fs/ext2/file.c
===
--- 2.6.orig/fs/ext2/file.c 2007-05-04 13:49:05.0 +0300
+++ 2.6/fs/ext2/file.c  2007-07-11 11:48:43.0 +0300
@@ -56,6 +56,7 @@ const struct file_operations ext2_file_o
.sendfile   = generic_file_sendfile,
.splice_read= generic_file_splice_read,
.splice_write   = generic_file_splice_write,
+   .revoke = generic_file_revoke,
 };
 
 #ifdef CONFIG_EXT2_FS_XIP
Index: 2.6/fs/ext3/file.c
===
--- 2.6.orig/fs/ext3/file.c 2007-05-04 13:49:05.0 +0300
+++ 2.6/fs/ext3/file.c  2007-07-11 11:48:43.0 +0300
@@ -123,6 +123,7 @@ const struct file_operations ext3_file_o
.sendfile   = generic_file_sendfile,
.splice_read= generic_file_splice_read,
.splice_write   = generic_file_splice_write,
+   .revoke = generic_file_revoke,
 };
 
 const struct inode_operations ext3_file_inode_operations = {
Index: 2.6/fs/ext4/file.c
===
--- 2.6.orig/fs/ext4/file.c 2007-05-04 13:49:05.0 +0300
+++ 2.6/fs/ext4/file.c  2007-07-11 11:48:43.0 +0300
@@ -123,6 +123,7 @@ const struct file_operations ext4_file_o
.sendfile   = generic_file_sendfile,
.splice_read= generic_file_splice_read,
.splice_write   = generic_file_splice_write,
+   .revoke = generic_file_revoke,
 };
 
 const struct inode_operations ext4_file_inode_operations = {


[RFC/PATCH 5/5] revoke: add documentation

2007-07-11 Thread Pekka J Enberg
From: Pekka Enberg [EMAIL PROTECTED]

This documents the revoke file operation in Documentation/filesystems/vfs.txt.

Signed-off-by: Pekka Enberg [EMAIL PROTECTED]
---

 Documentation/filesystems/vfs.txt |5 +
 1 file changed, 5 insertions(+)

Index: 2.6/Documentation/filesystems/vfs.txt
===
--- 2.6.orig/Documentation/filesystems/vfs.txt  2007-05-21 15:37:59.0 +0300
+++ 2.6/Documentation/filesystems/vfs.txt   2007-07-11 11:48:46.0 +0300
@@ -732,6 +732,7 @@ struct file_operations {
 int);
ssize_t (*splice_read)(struct file *, struct pipe_inode_info *, size_t, unsigned
 int);
+   int (*revoke)(struct file *);
 };
 
 Again, all methods are called without any locks being held, unless
@@ -805,6 +806,10 @@ otherwise noted.
   splice_read: called by the VFS to splice data from file to a pipe. This
   method is used by the splice(2) system call
 
+  revoke: called by revokeat(2) and frevoke(2) system calls to revoke access
+ to an open file. This method must ensure that all currently blocked
+ writes are flushed and reads will fail.
+
 Note that the file operations are implemented by the specific
 filesystem in which the inode resides. When opening a device node
 (character or block special) most filesystems will call special
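The documented contract for ->revoke (flush blocked writes, make reads fail) is largely about ordering. Here is a toy sketch with hypothetical toy_* names rather than anything from the patch: the revoke method first flushes pending writes, mirroring the do_fsync() call Pekka mentions later in this thread, and only then marks the file revoked so that subsequent reads return -EBADF, the error the patch description gives for regular files.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical toy file: counters stand in for data in flight. */
struct toy_file {
	int pending_writes;  /* writes accepted but not yet written back */
	int flushed_writes;  /* writes that reached the backing store */
	bool revoked;
};

/* Stand-in for do_fsync(): push all pending writes to "disk". */
static void toy_fsync(struct toy_file *f)
{
	f->flushed_writes += f->pending_writes;
	f->pending_writes = 0;
}

/* Sketch of a ->revoke() method: complete writes first, then cut off reads. */
static int toy_revoke(struct toy_file *f)
{
	toy_fsync(f);
	f->revoked = true;
	return 0;
}

/* Reads on a revoked regular file fail with EBADF. */
static ssize_t toy_read(const struct toy_file *f, size_t len)
{
	if (f->revoked)
		return -EBADF;
	return (ssize_t)len; /* pretend the read succeeded in full */
}
```

Doing it the other way round (flag first, flush second) would risk failing writes that had already been accepted, which the documentation rules out.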


Re: [RFC/PATCH 2/5] revoke: core code

2007-07-11 Thread Pekka J Enberg
Hi Al,

On Wed, 11 Jul 2007, Al Viro wrote:
 Better: I have the only opened descriptor for foo.  I send it to myself
 as described above.  I close it.  revoke() is called, finds no opened
 instances of foo in any descriptor tables and cheerfully does nothing.
 I call recvmsg() and I have completely undamaged opened file back.

Uhm, nice. So, revoke() needs a proper inode -> struct file mapping 
somewhere. Can we add a list of files to struct inode? Are there other 
cases where a file can point to an inode but is not attached to any file 
descriptor?

Pekka
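Al's scenario relies on SCM_RIGHTS descriptor passing: while the message is in flight, the file is referenced only by the socket buffer, not by any descriptor table, so a revoke() that walks descriptor tables cannot find it. The plain POSIX/Linux code below (nothing revoke-specific; a pipe stands in for the file foo, and send_fd()/recv_fd() are illustrative helper names) reproduces the send, close, receive sequence and gets a fully working descriptor back.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send @fd over the Unix-domain socket @sock as SCM_RIGHTS ancillary data. */
static int send_fd(int sock, int fd)
{
	char byte = 0;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align; /* ensures proper alignment */
	} u;
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = u.buf, .msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cmsg;

	memset(u.buf, 0, sizeof(u.buf));
	cmsg = CMSG_FIRSTHDR(&msg);
	assert(cmsg != NULL);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a descriptor sent by send_fd(); returns it, or -1 on failure. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = u.buf, .msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cmsg;
	int fd;

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS)
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

Between send_fd() and recv_fd() the sender can close its only descriptor; the file survives in the socket queue, which is exactly the window a descriptor-table walk misses.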


Re: [RFC/PATCH 2/5] revoke: core code

2007-07-11 Thread Pekka J Enberg
On Wed, 11 Jul 2007, Al Viro wrote:
 BTW, read() or write() in progress might get rather unhappy if your
 live replacement of ->f_mapping races with them...

For writes, we (1) never start any new operations after we've cleaned up 
the file descriptor tables, so (2) after we're done with do_fsync() we 
never touch ->f_mapping again.

But for reads, I think there's a problem: if we're in 
do_generic_mapping_read(), doing invalidate_inode_pages2() is not enough 
because we're hanging on to the real mapping. Hmm.

Pekka


Re: [RFC/PATCH 2/5] revoke: core code

2007-07-11 Thread Pekka J Enberg
Hi,

On Wed, 11 Jul 2007, Al Viro wrote:
 The fundamental issue here is that even if you do find struct file,
 you can't blindly rip its ->f_mapping since it can be in the middle
 of ->read(), ->write(), pageout, etc.  And even if you do manage
 that, you still have the ability to do fchmod() later.

Then we would need to change the VFS and relevant parts so that we can 
take down ->f_mapping. I don't see how we could do that without affecting 
current hotpaths. Hmm. I suppose what we really need to do is cannibalize 
the actual inode (remove from inode cache, detach from dentry and take 
down the mapping) so that we don't have to touch existing struct file 
pointers at all.

Pekka


Re: [PATCH] LogFS take three

2007-05-16 Thread Pekka J Enberg
Hi Joern,

 +#define LOGFS_BUG(sb) do {   \
 + struct super_block *__sb = sb;  \
 + logfs_crash_dump(__sb); \
 + BUG();  \
 +} while(0)

Note that BUG() can be a no-op so dumping something on disk might not make 
sense there. This seems useful, but you probably need to make this bit 
more generic so that using BUG() proper in your filesystem code does the 
right thing. Inventing your own wrapper should be avoided.

 +static inline struct logfs_super *LOGFS_SUPER(struct super_block *sb)
 +{
 + return sb->s_fs_info;
 +}
 +
 +static inline struct logfs_inode *LOGFS_INODE(struct inode *inode)
 +{
 + return container_of(inode, struct logfs_inode, vfs_inode);
 +}

No need for upper case in function names.
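With lowercase names the accessors might look like the sketch below. It is user-space code with minimal hypothetical stand-ins for struct inode, struct super_block and the LogFS private structures (the li_flags and segshift fields are made up), plus a simplified container_of() that lacks the kernel version's type checking.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space container_of(); the kernel's also type-checks ptr. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Minimal hypothetical stand-ins for the kernel structures. */
struct inode { unsigned long i_ino; };
struct super_block { void *s_fs_info; };

struct logfs_super { int segshift; };   /* hypothetical field */
struct logfs_inode {
	int li_flags;                   /* hypothetical field */
	struct inode vfs_inode;         /* embedded VFS inode */
};

static inline struct logfs_super *logfs_super(struct super_block *sb)
{
	return sb->s_fs_info;
}

static inline struct logfs_inode *logfs_inode(struct inode *inode)
{
	return container_of(inode, struct logfs_inode, vfs_inode);
}
```

Struct tags and function names live in separate C namespaces, so logfs_inode() can share its name with struct logfs_inode.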

 +int logfs_memcpy(void *in, void *out, size_t inlen, size_t outlen)
 +{
 + if (outlen < inlen)
 + return -EIO;
 + memcpy(out, in, inlen);
 + return inlen;
 +}

Please drop this wrapper function. It's better to open-code the error
handling in the callers (there are a total of three of them).

 +/* FIXME: combine with per-sb journal variant */
 +static unsigned char compressor_buf[LOGFS_MAX_OBJECTSIZE];
 +static DEFINE_MUTEX(compr_mutex);

This looks fishy. All reads and writes are serialized by compr_mutex 
because they share a scratch buffer for compression and uncompression?

 +/* FIXME: all this mess should get replaced by using the page cache */
 +static void fixup_from_wbuf(struct super_block *sb, struct logfs_area *area,
 + void *read, u64 ofs, size_t readlen)
 +{

Indeed. And I think you're getting some more trouble because of this... 

 +int logfs_segment_read(struct super_block *sb, void *buf, u64 ofs)
 +{
 + struct logfs_object_header *h;
 + u16 len;
 + int err, bs = sb->s_blocksize;
 +
 + mutex_lock(&compr_mutex);
 + err = wbuf_read(sb, ofs, bs + LOGFS_HEADERSIZE, compressor_buf);
 + if (err)
 + goto out;
 + h = (void*)compressor_buf;
 + len = be16_to_cpu(h->len);
 +
 + switch (h->compr) {
 + case COMPR_NONE:
 + logfs_memcpy(compressor_buf + LOGFS_HEADERSIZE, buf, bs, bs);
 + break;

Seems wasteful to first read the data in a scratch buffer and then 
memcpy() it immediately for the COMPR_NONE case. Any reason why we can't 
read a full struct page, for example, and simply use that if we don't need 
to uncompress anything?

 + case COMPR_ZLIB:
 + err = logfs_uncompress(compressor_buf + LOGFS_HEADERSIZE, buf,
 + len, bs);
 + BUG_ON(err);
 + break;

Not claiming to understand your on-disk format, but wouldn't it make more 
sense if we knew whether a given segment is compressed or not _before_ we 
actually read it?


Re: forced umount?

2007-03-26 Thread Pekka J Enberg
On Mon, 26 Mar 2007, Phillip Susi wrote:
 Is this revoke system supported for the filesystem as a whole?  I thought it
 was just to force specific files closed, not the whole filesystem.  What if
 the filesystem itself has pending IO to say, update inodes or block bitmaps?
 Can these be aborted?

We never want to _abort_ pending updates, only pending reads. So, even with 
revoke(), we need to be careful, which is why we call do_fsync() in 
generic_revoke_file() to make sure pending updates are flushed before we 
declare the inode revoked.

But, I haven't looked at forced unmount that much so there may be other 
issues I am not aware of.

Pekka


Re: [RFC/PATCH] revokeat/frevoke system calls V5

2007-02-07 Thread Pekka J Enberg
Hi Honza,

On Wed, 7 Feb 2007, Jan Kara wrote:
   Have you considered using similar hack as bad_inode.c instead of
 revoked_inode.c?

I am not sure what you mean, revoked_inode.c looks pretty much the same as 
bad_inode.c in mainline...

Pekka


[RFC/PATCH] revokeat/frevoke system calls V5

2007-01-28 Thread Pekka J Enberg
From: Pekka Enberg [EMAIL PROTECTED]

The revokeat(2) and frevoke(2) system calls invalidate open file
descriptors and shared mappings of an inode. After a successful
revocation, operations on file descriptors fail with the EBADF or
ENXIO error code for regular and device files,
respectively. Attempting to read from or write to a revoked mapping
causes SIGBUS.

The actual operation is done in two passes:

 1. Revoke all file descriptors that point to the given inode. We do
this under tasklist_lock so that after this pass, we don't need
to worry about racing with close(2) or dup(2).
   
 2. Take down shared memory mappings of each revoked file and close
the file pointer.

The file descriptors are kept until the owning task does close(2), and
memory mapping ranges are preserved until the owning task does munmap(2).

Signed-off-by: Pekka Enberg [EMAIL PROTECTED]
---

 arch/i386/kernel/syscall_table.S |3 
 fs/Makefile  |2 
 fs/ext2/file.c   |1 
 fs/ext3/file.c   |1 
 fs/file_table.c  |1 
 fs/revoke.c  |  588 ++
 fs/revoked_inode.c   |  664 +++
 include/asm-i386/unistd.h|4 
 include/linux/file.h |   14 
 include/linux/fs.h   |6 
 include/linux/mm.h   |2 
 include/linux/syscalls.h |3 
 mm/memory.c  |3 
 mm/mmap.c|   11 
 14 files changed, 1298 insertions(+), 5 deletions(-)

Index: 2.6/arch/i386/kernel/syscall_table.S
===
--- 2.6.orig/arch/i386/kernel/syscall_table.S
+++ 2.6/arch/i386/kernel/syscall_table.S
@@ -319,3 +319,6 @@ ENTRY(sys_call_table)
.long sys_move_pages
.long sys_getcpu
.long sys_epoll_pwait
+   .long sys_revokeat  /* 320 */
+   .long sys_frevoke
+
Index: 2.6/fs/Makefile
===
--- 2.6.orig/fs/Makefile
+++ 2.6/fs/Makefile
@@ -11,7 +11,7 @@ obj-y :=  open.o read_write.o file_table.
attr.o bad_inode.o file.o filesystems.o namespace.o aio.o \
seq_file.o xattr.o libfs.o fs-writeback.o \
pnode.o drop_caches.o splice.o sync.o utimes.o \
-   stack.o
+   stack.o revoke.o revoked_inode.o
 
 ifeq ($(CONFIG_BLOCK),y)
 obj-y +=   buffer.o bio.o block_dev.o direct-io.o mpage.o ioprio.o
Index: 2.6/fs/revoke.c
===
--- /dev/null
+++ 2.6/fs/revoke.c
@@ -0,0 +1,588 @@
+/*
+ * fs/revoke.c - Invalidate all current open file descriptors of an inode.
+ *
+ * Copyright (C) 2006-2007  Pekka Enberg
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/namei.h>
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/module.h>
+#include <linux/mount.h>
+#include <linux/sched.h>
+
+/*
+ * We pre-allocate an array of file pointers (including dummy inodes)
+ * so that we do not need to do kmalloc() under tasklist_lock.  The
+ * revoke operation is done in two passes: first we revoke all fds
+ * pointing to an inode and then we do close/munmap in a second pass.
+ */
+struct revoke_table {
+   struct file **files;
+   unsigned long nr_files; /* capacity */
+   unsigned long nr_revoked;   /* used in first pass */
+   unsigned long nr_closed;/* used in second pass */
+};
+
+struct kmem_cache *revokefs_inode_cache;
+
+/*
+ * Revoked file descriptors point to inodes in the revokefs filesystem.
+ */
+static struct vfsmount *revokefs_mnt;
+
+struct revokefs_inode_info {
+   struct task_struct *owner;
+   struct file *file;
+   unsigned int fd;
+   struct inode vfs_inode;
+};
+
+static inline struct revokefs_inode_info *REVOKEFS_I(struct inode *inode)
+{
+   return container_of(inode, struct revokefs_inode_info, vfs_inode);
+}
+
+extern void make_revoked_inode(struct inode *, int);
+
+static struct file *get_revoked_file(void)
+{
+   struct dentry *dentry;
+   struct inode *inode;
+   struct file *filp;
+   struct qstr name;
+
+   filp = get_empty_filp();
+   if (!filp)
+   goto err;
+
+   inode = new_inode(revokefs_mnt->mnt_sb);
+   if (!inode)
+   goto err_inode;
+
+   name.name = "revoked_file";
+   name.len = strlen(name.name);
+   dentry = d_alloc(revokefs_mnt->mnt_sb->s_root, &name);
+   if (!dentry)
+   goto err_dentry;
+
+   d_instantiate(dentry, inode);
+
+   filp->f_mapping = inode->i_mapping;
+   filp->f_dentry = dget(dentry);
+   filp->f_vfsmnt = mntget(revokefs_mnt);
+   filp->f_op = fops_get(inode->i_fop);
+   filp-f_pos = 0;
+
+   return filp;
+
+  err_dentry:
+   iput(inode);
+  err_inode:
+  

Re: share/private/slave a subtree

2005-07-08 Thread Pekka J Enberg

On Fri, 8 Jul 2005, Pekka J Enberg wrote:

 Hey, I just review patches. I don't get to set requirements. There's a reason
 why enums are preferred though. They define a proper name for the constant.


Roman Zippel writes:

Who prefers that?


Well, me, at least. I can't speak for others. 


On Fri, 8 Jul 2005, Pekka J Enberg wrote:

 It's far too easy to mess up with #defines.


Roman Zippel writes:

Rather unlikely with such simple masks.


Redefining a constant with #define by an accident is easy. Introducing 
duplicate constants is equally easy (see radeon headers for an example). 


On Fri, 8 Jul 2005, Pekka J Enberg wrote:

 They also document the code intent
 much better as you can group related constants together.


Roman Zippel writes:

You can't do that with defines?


Sure you can, but have you ever tried to figure out where a group of #define 
enumerations ends? Enums are a natural language construct for grouping 
related constants so why not use it? 

Bottom line, there are a few advantages to using enums rather than #defines, 
which is why they are IMHO preferred for new code. 

   Pekka 
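To make the grouping argument concrete, here is a hypothetical enum for the three propagation types this thread is about (the MNT_PROP_* names are made up for illustration). The enum keeps the related bits and their mask in one named block with an obvious start and end. Defining the same enumerator twice is a compile error, whereas redefining a macro with GCC typically produces only a warning, or nothing at all if the replacement text is identical.

```c
#include <assert.h>

/*
 * Hypothetical names. One enum holds the related propagation bits and
 * their mask, so the start and end of the group are explicit.
 */
enum mnt_propagation {
	MNT_PROP_SHARED  = 1 << 0,
	MNT_PROP_PRIVATE = 1 << 1,
	MNT_PROP_SLAVE   = 1 << 2,
	MNT_PROP_MASK    = MNT_PROP_SHARED | MNT_PROP_PRIVATE | MNT_PROP_SLAVE,
};

/* Exactly one propagation type must be selected in @flags. */
static int prop_is_valid(unsigned int flags)
{
	unsigned int p = flags & MNT_PROP_MASK;

	return p != 0 && (p & (p - 1)) == 0;
}
```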

