Re: [RFC 01/11] add generic versions of debugfs file operations

2008-02-24 Thread Arnd Bergmann
On Saturday 23 February 2008, Al Viro wrote:

 Ewww - caps, \n...  BTW, \0 is pointless here - simple_read_from_buffer() will
 not access it with these arguments)...

 ... 

 Please, check the length; sloppy input grammar is a bad idea.  Hell, at the
 very least you want -EINVAL if input is not recognized...

Ok, these two come straight from debugfs, but of course I can fix them
anyway. Should I do a fix for debugfs in 2.6.25 first, or rather do a
patch on top of the libfs rework?

I'd rather not have any semantic changes during the conversion, so
it would be better to keep that separate imho.

Arnd 
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 02/11] introduce simple_fs_type

2008-02-24 Thread Arnd Bergmann
On Saturday 23 February 2008, Al Viro wrote:
 On Tue, Feb 19, 2008 at 05:04:37AM +0100, Arnd Bergmann wrote:
  The most interesting function here is the new struct dentry *
  simple_register_filesystem(struct simple_fs_type *type), which
  returns the root directory of a new file system that can then
  be passed to simple_create_file() and similar functions as a
  parent.
 
 Don't mix split the file with add new stuff, please. 

Ok, I'll introduce it in libfs.c first then.

 And frankly, I'm not convinced that embedding file_system_type
 into another struct is a good idea.

Right, that has been one of the areas I've been thinking about
myself without coming to a good solution. One alternative I
thought about is to have the news members as part of file_system_type
itself, but that would cost a bit of memory for every single 
file system, and might lead to unintended misuse of these members
by non-simple file systems.

Do you think it would be better to dynamically allocate the
file_system_type and have a pointer in struct simple_fs_type to
it?

Arnd 
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 03/11] slim down debugfs

2008-02-24 Thread Arnd Bergmann
On Saturday 23 February 2008, Al Viro wrote:
  +config LIBFS
  + tristate
  + default m
  + help
  +   libfs is a helper library used by many of the simpler file
  +   systems. Parts of libfs can be modular when all of its users
  +   are modules as well, and the users should select this symbol.
 
 NAK.  For one thing, you need dependencies or selects and none of the
 patches later in the series introduces them.  For another, neither
 dependencies nor selects work well if you have non-modular users of
 that sucker.  Seriously, try to add those and do allmodconfig.  Then
 look what a mess it produces.

Sorry about that hunk, I didn't mean to keep that in there. In
my testing, I used the newly introduced parts as a loadable
module, and at one point the intention was to keep the existing
libfs functions as builtin and add a new Kconfig dependency in every
user of the new stuff whenever a file system gets converted.

I figured out later that hardly anyone could even try to build
without them as soon as all the trival file systems are moved
over, but forgot to make it always builtin.

Arnd 
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC 02/11] introduce simple_fs_type

2008-02-18 Thread Arnd Bergmann
There is a number of pseudo file systems in the kernel
that are basically copies of debugfs, all implementing the
same boilerplate code, just with different bugs.

This adds yet another copy to the kernel in the libfs directory,
with generalized helpers that can be used by any of them.

The most interesting function here is the new struct dentry *
simple_register_filesystem(struct simple_fs_type *type), which
returns the root directory of a new file system that can then
be passed to simple_create_file() and similar functions as a
parent.

Signed-off-by: Arnd Bergman [EMAIL PROTECTED]
Index: linux-2.6/fs/libfs.c
===
--- linux-2.6.orig/fs/libfs.c
+++ linux-2.6/fs/libfs.c
@@ -263,11 +263,6 @@ int simple_link(struct dentry *old_dentr
return 0;
 }
 
-static inline int simple_positive(struct dentry *dentry)
-{
-   return dentry-d_inode  !d_unhashed(dentry);
-}
-
 int simple_empty(struct dentry *dentry)
 {
struct dentry *child;
@@ -409,109 +404,6 @@ int simple_write_end(struct file *file, 
return copied;
 }
 
-/*
- * the inodes created here are not hashed. If you use iunique to generate
- * unique inode values later for this filesystem, then you must take care
- * to pass it an appropriate max_reserved value to avoid collisions.
- */
-int simple_fill_super(struct super_block *s, int magic, struct tree_descr 
*files)
-{
-   struct inode *inode;
-   struct dentry *root;
-   struct dentry *dentry;
-   int i;
-
-   s-s_blocksize = PAGE_CACHE_SIZE;
-   s-s_blocksize_bits = PAGE_CACHE_SHIFT;
-   s-s_magic = magic;
-   s-s_op = simple_super_operations;
-   s-s_time_gran = 1;
-
-   inode = new_inode(s);
-   if (!inode)
-   return -ENOMEM;
-   /*
-* because the root inode is 1, the files array must not contain an
-* entry at index 1
-*/
-   inode-i_ino = 1;
-   inode-i_mode = S_IFDIR | 0755;
-   inode-i_uid = inode-i_gid = 0;
-   inode-i_blocks = 0;
-   inode-i_atime = inode-i_mtime = inode-i_ctime = CURRENT_TIME;
-   inode-i_op = simple_dir_inode_operations;
-   inode-i_fop = simple_dir_operations;
-   inode-i_nlink = 2;
-   root = d_alloc_root(inode);
-   if (!root) {
-   iput(inode);
-   return -ENOMEM;
-   }
-   for (i = 0; !files-name || files-name[0]; i++, files++) {
-   if (!files-name)
-   continue;
-
-   /* warn if it tries to conflict with the root inode */
-   if (unlikely(i == 1))
-   printk(KERN_WARNING %s: %s passed in a files array
-   with an index of 1!\n, __func__,
-   s-s_type-name);
-
-   dentry = d_alloc_name(root, files-name);
-   if (!dentry)
-   goto out;
-   inode = new_inode(s);
-   if (!inode)
-   goto out;
-   inode-i_mode = S_IFREG | files-mode;
-   inode-i_uid = inode-i_gid = 0;
-   inode-i_blocks = 0;
-   inode-i_atime = inode-i_mtime = inode-i_ctime = CURRENT_TIME;
-   inode-i_fop = files-ops;
-   inode-i_ino = i;
-   d_add(dentry, inode);
-   }
-   s-s_root = root;
-   return 0;
-out:
-   d_genocide(root);
-   dput(root);
-   return -ENOMEM;
-}
-
-static DEFINE_SPINLOCK(pin_fs_lock);
-
-int simple_pin_fs(struct file_system_type *type, struct vfsmount **mount, int 
*count)
-{
-   struct vfsmount *mnt = NULL;
-   spin_lock(pin_fs_lock);
-   if (unlikely(!*mount)) {
-   spin_unlock(pin_fs_lock);
-   mnt = vfs_kern_mount(type, 0, type-name, NULL);
-   if (IS_ERR(mnt))
-   return PTR_ERR(mnt);
-   spin_lock(pin_fs_lock);
-   if (!*mount)
-   *mount = mnt;
-   }
-   mntget(*mount);
-   ++*count;
-   spin_unlock(pin_fs_lock);
-   mntput(mnt);
-   return 0;
-}
-
-void simple_release_fs(struct vfsmount **mount, int *count)
-{
-   struct vfsmount *mnt;
-   spin_lock(pin_fs_lock);
-   mnt = *mount;
-   if (!--*count)
-   *mount = NULL;
-   spin_unlock(pin_fs_lock);
-   mntput(mnt);
-}
-
 ssize_t simple_read_from_buffer(void __user *to, size_t count, loff_t *ppos,
const void *from, size_t available)
 {
@@ -786,14 +678,11 @@ EXPORT_SYMBOL(simple_dir_inode_operation
 EXPORT_SYMBOL(simple_dir_operations);
 EXPORT_SYMBOL(simple_empty);
 EXPORT_SYMBOL(d_alloc_name);
-EXPORT_SYMBOL(simple_fill_super);
 EXPORT_SYMBOL(simple_getattr);
 EXPORT_SYMBOL(simple_link);
 EXPORT_SYMBOL(simple_lookup);
-EXPORT_SYMBOL(simple_pin_fs);
 EXPORT_SYMBOL(simple_prepare_write);
 EXPORT_SYMBOL(simple_readpage);
-EXPORT_SYMBOL(simple_release_fs);
 

[RFC 06/11] split out linux/libfs.h from linux/fs.h

2008-02-18 Thread Arnd Bergmann
With libfs turning into a larger subsystem, it makes
sense to have a separate header that is not included
by the low-level vfs code.

Signed-off-by: Arnd Bergmann [EMAIL PROTECTED]
Index: linux-2.6/fs/debugfs/inode.c
===
--- linux-2.6.orig/fs/debugfs/inode.c
+++ linux-2.6/fs/debugfs/inode.c
@@ -18,6 +18,7 @@
 
 #include linux/module.h
 #include linux/fs.h
+#include linux/libfs.h
 #include linux/mount.h
 #include linux/pagemap.h
 #include linux/init.h
Index: linux-2.6/fs/dcache.c
===
--- linux-2.6.orig/fs/dcache.c
+++ linux-2.6/fs/dcache.c
@@ -947,6 +947,7 @@ struct dentry *d_alloc_name(struct dentr
q.hash = full_name_hash(q.name, q.len);
return d_alloc(parent, q);
 }
+EXPORT_SYMBOL(d_alloc_name);
 
 /**
  * d_instantiate - fill in inode information for a dentry
Index: linux-2.6/drivers/usb/core/inode.c
===
--- linux-2.6.orig/drivers/usb/core/inode.c
+++ linux-2.6/drivers/usb/core/inode.c
@@ -27,6 +27,7 @@
 
 /*/
 
+#include linux/libfs.h
 #include linux/module.h
 #include linux/fs.h
 #include linux/mount.h
Index: linux-2.6/fs/binfmt_misc.c
===
--- linux-2.6.orig/fs/binfmt_misc.c
+++ linux-2.6/fs/binfmt_misc.c
@@ -16,6 +16,7 @@
  *  2001-02-28 AV: rewritten into something that resembles C. Original didn't.
  */
 
+#include linux/libfs.h
 #include linux/module.h
 #include linux/init.h
 #include linux/sched.h
Index: linux-2.6/fs/configfs/mount.c
===
--- linux-2.6.orig/fs/configfs/mount.c
+++ linux-2.6/fs/configfs/mount.c
@@ -25,6 +25,7 @@
  */
 
 #include linux/fs.h
+#include linux/libfs.h
 #include linux/module.h
 #include linux/mount.h
 #include linux/pagemap.h
Index: linux-2.6/fs/debugfs/file.c
===
--- linux-2.6.orig/fs/debugfs/file.c
+++ linux-2.6/fs/debugfs/file.c
@@ -15,6 +15,7 @@
 
 #include linux/module.h
 #include linux/fs.h
+#include linux/libfs.h
 #include linux/pagemap.h
 #include linux/namei.h
 #include linux/debugfs.h
Index: linux-2.6/fs/fuse/control.c
===
--- linux-2.6.orig/fs/fuse/control.c
+++ linux-2.6/fs/fuse/control.c
@@ -9,6 +9,7 @@
 #include fuse_i.h
 
 #include linux/init.h
+#include linux/libfs.h
 #include linux/module.h
 
 #define FUSE_CTL_SUPER_MAGIC 0x65735543
Index: linux-2.6/fs/nfsd/nfsctl.c
===
--- linux-2.6.orig/fs/nfsd/nfsctl.c
+++ linux-2.6/fs/nfsd/nfsctl.c
@@ -8,6 +8,7 @@
 
 #include linux/module.h
 
+#include linux/libfs.h
 #include linux/linkage.h
 #include linux/time.h
 #include linux/errno.h
Index: linux-2.6/net/sunrpc/rpc_pipe.c
===
--- linux-2.6.orig/net/sunrpc/rpc_pipe.c
+++ linux-2.6/net/sunrpc/rpc_pipe.c
@@ -8,6 +8,7 @@
  * Copyright (c) 2002, Trond Myklebust [EMAIL PROTECTED]
  *
  */
+#include linux/libfs.h
 #include linux/module.h
 #include linux/slab.h
 #include linux/string.h
Index: linux-2.6/security/inode.c
===
--- linux-2.6.orig/security/inode.c
+++ linux-2.6/security/inode.c
@@ -16,6 +16,7 @@
 
 #include linux/module.h
 #include linux/init.h
+#include linux/libfs.h
 #include linux/security.h
 
 #define SECURITYFS_MAGIC   0x73636673
Index: linux-2.6/security/selinux/selinuxfs.c
===
--- linux-2.6.orig/security/selinux/selinuxfs.c
+++ linux-2.6/security/selinux/selinuxfs.c
@@ -14,6 +14,7 @@
  * the Free Software Foundation, version 2.
  */
 
+#include linux/libfs.h
 #include linux/kernel.h
 #include linux/pagemap.h
 #include linux/slab.h
Index: linux-2.6/virt/kvm/kvm_main.c
===
--- linux-2.6.orig/virt/kvm/kvm_main.c
+++ linux-2.6/virt/kvm/kvm_main.c
@@ -40,6 +40,7 @@
 #include linux/kvm_para.h
 #include linux/pagemap.h
 #include linux/mman.h
+#include linux/libfs.h
 
 #include asm/processor.h
 #include asm/io.h
Index: linux-2.6/arch/powerpc/platforms/cell/spufs/spufs.h
===
--- linux-2.6.orig/arch/powerpc/platforms/cell/spufs/spufs.h
+++ linux-2.6/arch/powerpc/platforms/cell/spufs/spufs.h
@@ -27,6 +27,7 @@
 #include linux/spinlock.h
 #include linux/fs.h
 #include linux/cpumask.h
+#include linux/libfs.h
 
 #include asm/spu.h
 #include asm/spu_csa.h
Index: linux-2.6/fs/autofs4/autofs_i.h
===
--- linux-2.6.orig/fs/autofs4/autofs_i.h
+++ linux-2.6/fs/autofs4/autofs_i.h
@@ -22,6 +22,7

[RFC 04/11] slim down securityfs

2008-02-18 Thread Arnd Bergmann
With the new simple_fs_type in place, securityfs practically
becomes a nop and we just need to leave code around to manage
its mount point.

Signed-off-by: Arnd Bergmann [EMAIL PROTECTED]
Index: linux-2.6/security/inode.c
===
--- linux-2.6.orig/security/inode.c
+++ linux-2.6/security/inode.c
@@ -13,176 +13,14 @@
  */
 
 /* #define DEBUG */
+
 #include linux/module.h
-#include linux/fs.h
-#include linux/mount.h
-#include linux/pagemap.h
 #include linux/init.h
-#include linux/namei.h
 #include linux/security.h
 
 #define SECURITYFS_MAGIC   0x73636673
 
-static struct vfsmount *mount;
-static int mount_count;
-
-/*
- * TODO:
- *   I think I can get rid of these default_file_ops, but not quite sure...
- */
-static ssize_t default_read_file(struct file *file, char __user *buf,
-size_t count, loff_t *ppos)
-{
-   return 0;
-}
-
-static ssize_t default_write_file(struct file *file, const char __user *buf,
-  size_t count, loff_t *ppos)
-{
-   return count;
-}
-
-static int default_open(struct inode *inode, struct file *file)
-{
-   if (inode-i_private)
-   file-private_data = inode-i_private;
-
-   return 0;
-}
-
-static const struct file_operations default_file_ops = {
-   .read = default_read_file,
-   .write =default_write_file,
-   .open = default_open,
-};
-
-static struct inode *get_inode(struct super_block *sb, int mode, dev_t dev)
-{
-   struct inode *inode = new_inode(sb);
-
-   if (inode) {
-   inode-i_mode = mode;
-   inode-i_uid = 0;
-   inode-i_gid = 0;
-   inode-i_blocks = 0;
-   inode-i_atime = inode-i_mtime = inode-i_ctime = CURRENT_TIME;
-   switch (mode  S_IFMT) {
-   default:
-   init_special_inode(inode, mode, dev);
-   break;
-   case S_IFREG:
-   inode-i_fop = default_file_ops;
-   break;
-   case S_IFDIR:
-   inode-i_op = simple_dir_inode_operations;
-   inode-i_fop = simple_dir_operations;
-
-   /* directory inodes start off with i_nlink == 2 (for 
. entry) */
-   inc_nlink(inode);
-   break;
-   }
-   }
-   return inode;
-}
-
-/* SMP-safe */
-static int mknod(struct inode *dir, struct dentry *dentry,
-int mode, dev_t dev)
-{
-   struct inode *inode;
-   int error = -EPERM;
-
-   if (dentry-d_inode)
-   return -EEXIST;
-
-   inode = get_inode(dir-i_sb, mode, dev);
-   if (inode) {
-   d_instantiate(dentry, inode);
-   dget(dentry);
-   error = 0;
-   }
-   return error;
-}
-
-static int mkdir(struct inode *dir, struct dentry *dentry, int mode)
-{
-   int res;
-
-   mode = (mode  (S_IRWXUGO | S_ISVTX)) | S_IFDIR;
-   res = mknod(dir, dentry, mode, 0);
-   if (!res)
-   inc_nlink(dir);
-   return res;
-}
-
-static int create(struct inode *dir, struct dentry *dentry, int mode)
-{
-   mode = (mode  S_IALLUGO) | S_IFREG;
-   return mknod(dir, dentry, mode, 0);
-}
-
-static inline int positive(struct dentry *dentry)
-{
-   return dentry-d_inode  !d_unhashed(dentry);
-}
-
-static int fill_super(struct super_block *sb, void *data, int silent)
-{
-   static struct tree_descr files[] = {{}};
-
-   return simple_fill_super(sb, SECURITYFS_MAGIC, files);
-}
-
-static int get_sb(struct file_system_type *fs_type,
- int flags, const char *dev_name,
- void *data, struct vfsmount *mnt)
-{
-   return get_sb_single(fs_type, flags, data, fill_super, mnt);
-}
-
-static struct file_system_type fs_type = {
-   .owner =THIS_MODULE,
-   .name = securityfs,
-   .get_sb =   get_sb,
-   .kill_sb =  kill_litter_super,
-};
-
-static int create_by_name(const char *name, mode_t mode,
- struct dentry *parent,
- struct dentry **dentry)
-{
-   int error = 0;
-
-   *dentry = NULL;
-
-   /* If the parent is not specified, we create it in the root.
-* We need the root dentry to do this, which is in the super
-* block. A pointer to that is in the struct vfsmount that we
-* have around.
-*/
-   if (!parent ) {
-   if (mount  mount-mnt_sb) {
-   parent = mount-mnt_sb-s_root;
-   }
-   }
-   if (!parent) {
-   pr_debug(securityfs: Ah! can not find a parent!\n);
-   return -EFAULT;
-   }
-
-   mutex_lock(parent-d_inode-i_mutex);
-   *dentry = lookup_one_len(name, parent, strlen(name));
-   if (!IS_ERR(dentry

[RFC 05/11] slim down usbfs

2008-02-18 Thread Arnd Bergmann
Half of the usbfs code is the same as debugfs, so we can
replace it now with calls to the generic libfs versions.

Signed-off-by: Arnd Bergmann [EMAIL PROTECTED]
Index: linux-2.6/drivers/usb/core/inode.c
===
--- linux-2.6.orig/drivers/usb/core/inode.c
+++ linux-2.6/drivers/usb/core/inode.c
@@ -47,11 +47,10 @@
 #define USBFS_DEFAULT_BUSMODE (S_IXUGO | S_IRUGO)
 #define USBFS_DEFAULT_LISTMODE S_IRUGO
 
-static struct super_operations usbfs_ops;
-static const struct file_operations default_file_operations;
-static struct vfsmount *usbfs_mount;
-static int usbfs_mount_count;  /* = 0 */
-static int ignore_mount = 0;
+static DEFINE_SIMPLE_FS(usb_fs_type, usbfs, NULL, USBDEVICE_SUPER_MAGIC);
+static struct dentry *usbfs_root;
+
+static int ignore_mount = 1;
 
 static struct dentry *devices_usbfs_dentry;
 static int num_buses;  /* = 0 */
@@ -263,186 +262,11 @@ static int remount(struct super_block *s
return -EINVAL;
}
 
-   if (usbfs_mount  usbfs_mount-mnt_sb)
-   update_sb(usbfs_mount-mnt_sb);
-
-   return 0;
-}
-
-static struct inode *usbfs_get_inode (struct super_block *sb, int mode, dev_t 
dev)
-{
-   struct inode *inode = new_inode(sb);
-
-   if (inode) {
-   inode-i_mode = mode;
-   inode-i_uid = current-fsuid;
-   inode-i_gid = current-fsgid;
-   inode-i_blocks = 0;
-   inode-i_atime = inode-i_mtime = inode-i_ctime = CURRENT_TIME;
-   switch (mode  S_IFMT) {
-   default:
-   init_special_inode(inode, mode, dev);
-   break;
-   case S_IFREG:
-   inode-i_fop = default_file_operations;
-   break;
-   case S_IFDIR:
-   inode-i_op = simple_dir_inode_operations;
-   inode-i_fop = simple_dir_operations;
-
-   /* directory inodes start off with i_nlink == 2 (for 
. entry) */
-   inc_nlink(inode);
-   break;
-   }
-   }
-   return inode; 
-}
-
-/* SMP-safe */
-static int usbfs_mknod (struct inode *dir, struct dentry *dentry, int mode,
-   dev_t dev)
-{
-   struct inode *inode = usbfs_get_inode(dir-i_sb, mode, dev);
-   int error = -EPERM;
-
-   if (dentry-d_inode)
-   return -EEXIST;
-
-   if (inode) {
-   d_instantiate(dentry, inode);
-   dget(dentry);
-   error = 0;
-   }
-   return error;
-}
-
-static int usbfs_mkdir (struct inode *dir, struct dentry *dentry, int mode)
-{
-   int res;
-
-   mode = (mode  (S_IRWXUGO | S_ISVTX)) | S_IFDIR;
-   res = usbfs_mknod (dir, dentry, mode, 0);
-   if (!res)
-   inc_nlink(dir);
-   return res;
-}
-
-static int usbfs_create (struct inode *dir, struct dentry *dentry, int mode)
-{
-   mode = (mode  S_IALLUGO) | S_IFREG;
-   return usbfs_mknod (dir, dentry, mode, 0);
-}
-
-static inline int usbfs_positive (struct dentry *dentry)
-{
-   return dentry-d_inode  !d_unhashed(dentry);
-}
-
-static int usbfs_empty (struct dentry *dentry)
-{
-   struct list_head *list;
-
-   spin_lock(dcache_lock);
-
-   list_for_each(list, dentry-d_subdirs) {
-   struct dentry *de = list_entry(list, struct dentry, 
d_u.d_child);
-   if (usbfs_positive(de)) {
-   spin_unlock(dcache_lock);
-   return 0;
-   }
-   }
-
-   spin_unlock(dcache_lock);
-   return 1;
-}
-
-static int usbfs_unlink (struct inode *dir, struct dentry *dentry)
-{
-   struct inode *inode = dentry-d_inode;
-   mutex_lock(inode-i_mutex);
-   drop_nlink(dentry-d_inode);
-   dput(dentry);
-   mutex_unlock(inode-i_mutex);
-   d_delete(dentry);
-   return 0;
-}
-
-static int usbfs_rmdir(struct inode *dir, struct dentry *dentry)
-{
-   int error = -ENOTEMPTY;
-   struct inode * inode = dentry-d_inode;
-
-   mutex_lock(inode-i_mutex);
-   dentry_unhash(dentry);
-   if (usbfs_empty(dentry)) {
-   drop_nlink(dentry-d_inode);
-   drop_nlink(dentry-d_inode);
-   dput(dentry);
-   inode-i_flags |= S_DEAD;
-   drop_nlink(dir);
-   error = 0;
-   }
-   mutex_unlock(inode-i_mutex);
-   if (!error)
-   d_delete(dentry);
-   dput(dentry);
-   return error;
-}
-
-
-/* default file operations */
-static ssize_t default_read_file (struct file *file, char __user *buf,
- size_t count, loff_t *ppos)
-{
-   return 0;
-}
-
-static ssize_t default_write_file (struct file *file, const char __user *buf,
-  size_t count, loff_t *ppos)
-{
-   return count;
-}
-
-static loff_t default_file_lseek

[RFC 09/11] split out libfs/super.c from libfs.c

2008-02-18 Thread Arnd Bergmann
Consolidate all super block manipulation code in libfs in a single
source file.

Signed-off-by: Arnd Bergman [EMAIL PROTECTED]
Index: linux-2.6/fs/libfs.c
===
--- linux-2.6.orig/fs/libfs.c
+++ linux-2.6/fs/libfs.c
@@ -12,63 +12,6 @@
 
 #include asm/uaccess.h
 
-static const struct super_operations simple_super_operations = {
-   .statfs = simple_statfs,
-};
-
-/*
- * Common helper for pseudo-filesystems (sockfs, pipefs, bdev - stuff that
- * will never be mountable)
- */
-int get_sb_pseudo(struct file_system_type *fs_type, char *name,
-   const struct super_operations *ops, unsigned long magic,
-   struct vfsmount *mnt)
-{
-   struct super_block *s = sget(fs_type, NULL, set_anon_super, NULL);
-   struct dentry *dentry;
-   struct inode *root;
-   struct qstr d_name = {.name = name, .len = strlen(name)};
-
-   if (IS_ERR(s))
-   return PTR_ERR(s);
-
-   s-s_flags = MS_NOUSER;
-   s-s_maxbytes = ~0ULL;
-   s-s_blocksize = 1024;
-   s-s_blocksize_bits = 10;
-   s-s_magic = magic;
-   s-s_op = ops ? ops : simple_super_operations;
-   s-s_time_gran = 1;
-   root = new_inode(s);
-   if (!root)
-   goto Enomem;
-   /*
-* since this is the first inode, make it number 1. New inodes created
-* after this must take care not to collide with it (by passing
-* max_reserved of 1 to iunique).
-*/
-   root-i_ino = 1;
-   root-i_mode = S_IFDIR | S_IRUSR | S_IWUSR;
-   root-i_uid = root-i_gid = 0;
-   root-i_atime = root-i_mtime = root-i_ctime = CURRENT_TIME;
-   dentry = d_alloc(NULL, d_name);
-   if (!dentry) {
-   iput(root);
-   goto Enomem;
-   }
-   dentry-d_sb = s;
-   dentry-d_parent = dentry;
-   d_instantiate(dentry, root);
-   s-s_root = dentry;
-   s-s_flags |= MS_ACTIVE;
-   return simple_set_mnt(mnt, s);
-
-Enomem:
-   up_write(s-s_umount);
-   deactivate_super(s);
-   return -ENOMEM;
-}
-
 int simple_link(struct dentry *old_dentry, struct inode *dir, struct dentry 
*dentry)
 {
struct inode *inode = old_dentry-d_inode;
@@ -238,7 +181,6 @@ ssize_t simple_read_from_buffer(void __u
return count;
 }
 
-EXPORT_SYMBOL(get_sb_pseudo);
 EXPORT_SYMBOL(simple_write_begin);
 EXPORT_SYMBOL(simple_write_end);
 EXPORT_SYMBOL(simple_empty);
Index: linux-2.6/fs/libfs/super.c
===
--- linux-2.6.orig/fs/libfs/super.c
+++ linux-2.6/fs/libfs/super.c
@@ -54,6 +54,60 @@ static const struct super_operations sim
 };
 
 /*
+ * Common helper for pseudo-filesystems (sockfs, pipefs, bdev - stuff that
+ * will never be mountable)
+ */
+int get_sb_pseudo(struct file_system_type *fs_type, char *name,
+   const struct super_operations *ops, unsigned long magic,
+   struct vfsmount *mnt)
+{
+   struct super_block *s = sget(fs_type, NULL, set_anon_super, NULL);
+   struct dentry *dentry;
+   struct inode *root;
+   struct qstr d_name = {.name = name, .len = strlen(name)};
+
+   if (IS_ERR(s))
+   return PTR_ERR(s);
+
+   s-s_flags = MS_NOUSER;
+   s-s_maxbytes = ~0ULL;
+   s-s_blocksize = 1024;
+   s-s_blocksize_bits = 10;
+   s-s_magic = magic;
+   s-s_op = ops ? ops : simple_super_operations;
+   s-s_time_gran = 1;
+   root = new_inode(s);
+   if (!root)
+   goto Enomem;
+   /*
+* since this is the first inode, make it number 1. New inodes created
+* after this must take care not to collide with it (by passing
+* max_reserved of 1 to iunique).
+*/
+   root-i_ino = 1;
+   root-i_mode = S_IFDIR | S_IRUSR | S_IWUSR;
+   root-i_uid = root-i_gid = 0;
+   root-i_atime = root-i_mtime = root-i_ctime = CURRENT_TIME;
+   dentry = d_alloc(NULL, d_name);
+   if (!dentry) {
+   iput(root);
+   goto Enomem;
+   }
+   dentry-d_sb = s;
+   dentry-d_parent = dentry;
+   d_instantiate(dentry, root);
+   s-s_root = dentry;
+   s-s_flags |= MS_ACTIVE;
+   return simple_set_mnt(mnt, s);
+
+Enomem:
+   up_write(s-s_umount);
+   deactivate_super(s);
+   return -ENOMEM;
+}
+EXPORT_SYMBOL(get_sb_pseudo);
+
+/*
  * the inodes created here are not hashed. If you use iunique to generate
  * unique inode values later for this filesystem, then you must take care
  * to pass it an appropriate max_reserved value to avoid collisions.

-- 

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC 08/11] split out libfs/dentry.c from libfs.c

2008-02-18 Thread Arnd Bergmann
Consolidate all dentry manipulation code in libfs in a single
source file.

Signed-off-by: Arnd Bergmann [EMAIL PROTECTED]

Index: linux-2.6/fs/libfs.c
===
--- linux-2.6.orig/fs/libfs.c
+++ linux-2.6/fs/libfs.c
@@ -12,188 +12,6 @@
 
 #include asm/uaccess.h
 
-int simple_getattr(struct vfsmount *mnt, struct dentry *dentry,
-  struct kstat *stat)
-{
-   struct inode *inode = dentry-d_inode;
-   generic_fillattr(inode, stat);
-   stat-blocks = inode-i_mapping-nrpages  (PAGE_CACHE_SHIFT - 9);
-   return 0;
-}
-
-int simple_statfs(struct dentry *dentry, struct kstatfs *buf)
-{
-   buf-f_type = dentry-d_sb-s_magic;
-   buf-f_bsize = PAGE_CACHE_SIZE;
-   buf-f_namelen = NAME_MAX;
-   return 0;
-}
-
-/*
- * Retaining negative dentries for an in-memory filesystem just wastes
- * memory and lookup time: arrange for them to be deleted immediately.
- */
-static int simple_delete_dentry(struct dentry *dentry)
-{
-   return 1;
-}
-
-/*
- * Lookup the data. This is trivial - if the dentry didn't already
- * exist, we know it is negative.  Set d_op to delete negative dentries.
- */
-struct dentry *simple_lookup(struct inode *dir, struct dentry *dentry, struct 
nameidata *nd)
-{
-   static struct dentry_operations simple_dentry_operations = {
-   .d_delete = simple_delete_dentry,
-   };
-
-   if (dentry-d_name.len  NAME_MAX)
-   return ERR_PTR(-ENAMETOOLONG);
-   dentry-d_op = simple_dentry_operations;
-   d_add(dentry, NULL);
-   return NULL;
-}
-
-int simple_sync_file(struct file * file, struct dentry *dentry, int datasync)
-{
-   return 0;
-}
- 
-int dcache_dir_open(struct inode *inode, struct file *file)
-{
-   static struct qstr cursor_name = {.len = 1, .name = .};
-
-   file-private_data = d_alloc(file-f_path.dentry, cursor_name);
-
-   return file-private_data ? 0 : -ENOMEM;
-}
-
-int dcache_dir_close(struct inode *inode, struct file *file)
-{
-   dput(file-private_data);
-   return 0;
-}
-
-loff_t dcache_dir_lseek(struct file *file, loff_t offset, int origin)
-{
-   mutex_lock(file-f_path.dentry-d_inode-i_mutex);
-   switch (origin) {
-   case 1:
-   offset += file-f_pos;
-   case 0:
-   if (offset = 0)
-   break;
-   default:
-   mutex_unlock(file-f_path.dentry-d_inode-i_mutex);
-   return -EINVAL;
-   }
-   if (offset != file-f_pos) {
-   file-f_pos = offset;
-   if (file-f_pos = 2) {
-   struct list_head *p;
-   struct dentry *cursor = file-private_data;
-   loff_t n = file-f_pos - 2;
-
-   spin_lock(dcache_lock);
-   list_del(cursor-d_u.d_child);
-   p = file-f_path.dentry-d_subdirs.next;
-   while (n  p != file-f_path.dentry-d_subdirs) {
-   struct dentry *next;
-   next = list_entry(p, struct dentry, 
d_u.d_child);
-   if (!d_unhashed(next)  next-d_inode)
-   n--;
-   p = p-next;
-   }
-   list_add_tail(cursor-d_u.d_child, p);
-   spin_unlock(dcache_lock);
-   }
-   }
-   mutex_unlock(file-f_path.dentry-d_inode-i_mutex);
-   return offset;
-}
-
-/* Relationship between i_mode and the DT_xxx types */
-static inline unsigned char dt_type(struct inode *inode)
-{
-   return (inode-i_mode  12)  15;
-}
-
-/*
- * Directory is locked and all positive dentries in it are safe, since
- * for ramfs-type trees they can't go away without unlink() or rmdir(),
- * both impossible due to the lock on directory.
- */
-
-int dcache_readdir(struct file * filp, void * dirent, filldir_t filldir)
-{
-   struct dentry *dentry = filp-f_path.dentry;
-   struct dentry *cursor = filp-private_data;
-   struct list_head *p, *q = cursor-d_u.d_child;
-   ino_t ino;
-   int i = filp-f_pos;
-
-   switch (i) {
-   case 0:
-   ino = dentry-d_inode-i_ino;
-   if (filldir(dirent, ., 1, i, ino, DT_DIR)  0)
-   break;
-   filp-f_pos++;
-   i++;
-   /* fallthrough */
-   case 1:
-   ino = parent_ino(dentry);
-   if (filldir(dirent, .., 2, i, ino, DT_DIR)  0)
-   break;
-   filp-f_pos++;
-   i++;
-   /* fallthrough */
-   default:
-   spin_lock(dcache_lock

[RFC 01/11] add generic versions of debugfs file operations

2008-02-18 Thread Arnd Bergmann
The file operations in debugfs are rather generic and can
be used by other file systems, so it can be interesting to
include them in libfs, with more generic names, and exported
to modules.

This patch adds a new copy of these operations to libfs,
so that the debugfs version can later be cut down.

Signed-off-by: Arnd Bergmann [EMAIL PROTECTED]

Index: linux-2.6/fs/Makefile
===
--- linux-2.6.orig/fs/Makefile
+++ linux-2.6/fs/Makefile
@@ -13,6 +13,8 @@ obj-y :=  open.o read_write.o file_table.
pnode.o drop_caches.o splice.o sync.o utimes.o \
stack.o
 
+obj-$(CONFIG_LIBFS) += libfs/
+
 ifeq ($(CONFIG_BLOCK),y)
 obj-y +=   buffer.o bio.o block_dev.o direct-io.o mpage.o ioprio.o
 else
Index: linux-2.6/include/linux/libfs.h
===
--- /dev/null
+++ linux-2.6/include/linux/libfs.h
@@ -0,0 +1,21 @@
+#ifndef __LIBFS_H__
+#define __LIBFS_H__
+
+#include linux/fs.h
+
+extern const struct file_operations simple_fops_u8;
+extern const struct file_operations simple_fops_x8;
+extern const struct file_operations simple_fops_u16;
+extern const struct file_operations simple_fops_x16;
+extern const struct file_operations simple_fops_u32;
+extern const struct file_operations simple_fops_x32;
+extern const struct file_operations simple_fops_u64;
+extern const struct file_operations simple_fops_bool;
+extern const struct file_operations simple_fops_blob;
+
+struct simple_blob_wrapper {
+   void *data;
+   unsigned long size;
+};
+
+#endif /* __LIBFS_H__ */
Index: linux-2.6/fs/libfs/Makefile
===
--- /dev/null
+++ linux-2.6/fs/libfs/Makefile
@@ -0,0 +1,3 @@
+libfs-y += file.o
+
+obj-$(CONFIG_LIBFS) += libfs.o
Index: linux-2.6/fs/libfs/file.c
===
--- /dev/null
+++ linux-2.6/fs/libfs/file.c
@@ -0,0 +1,126 @@
+/*
+ * fs/libfs/file.c
+ * Library for filesystems writers.
+ */
+
+#include linux/fs.h
+#include linux/libfs.h
+#include linux/module.h
+
+#include asm/uaccess.h
+
+/* commonly used attribute file operations */
+static int simple_u8_set(void *data, u64 val)
+{
+   *(u8 *)data = val;
+   return 0;
+}
+static int simple_u8_get(void *data, u64 *val)
+{
+   *val = *(u8 *)data;
+   return 0;
+}
+DEFINE_SIMPLE_EXPORTED_ATTRIBUTE(simple_fops_u8, simple_u8_get, simple_u8_set, 
%llu\n);
+DEFINE_SIMPLE_EXPORTED_ATTRIBUTE(simple_fops_x8, simple_u8_get, simple_u8_set, 
0x%02llx\n);
+
+static int simple_u16_set(void *data, u64 val)
+{
+   *(u16 *)data = val;
+   return 0;
+}
+static int simple_u16_get(void *data, u64 *val)
+{
+   *val = *(u16 *)data;
+   return 0;
+}
+DEFINE_SIMPLE_EXPORTED_ATTRIBUTE(simple_fops_u16, simple_u16_get, 
simple_u16_set, %llu\n);
+DEFINE_SIMPLE_EXPORTED_ATTRIBUTE(simple_fops_x16, simple_u16_get, 
simple_u16_set, 0x%04llx\n);
+
+static int simple_u32_set(void *data, u64 val)
+{
+   *(u32 *)data = val;
+   return 0;
+}
+static int simple_u32_get(void *data, u64 *val)
+{
+   *val = *(u32 *)data;
+   return 0;
+}
+DEFINE_SIMPLE_EXPORTED_ATTRIBUTE(simple_fops_u32, simple_u32_get, 
simple_u32_set, %llu\n);
+DEFINE_SIMPLE_EXPORTED_ATTRIBUTE(simple_fops_x32, simple_u32_get, 
simple_u32_set, 0x%08llx\n);
+
+static int simple_u64_set(void *data, u64 val)
+{
+   *(u64 *)data = val;
+   return 0;
+}
+
+static int simple_u64_get(void *data, u64 *val)
+{
+   return *(u64 *)data;
+   return 0;
+}
+DEFINE_SIMPLE_EXPORTED_ATTRIBUTE(simple_fops_u64, simple_u64_get, 
simple_u64_set, %llu\n);
+
+static ssize_t read_file_bool(struct file *file, char __user *user_buf,
+ size_t count, loff_t *ppos)
+{
+   char buf[3];
+   u32 *val = file-private_data;
+
+   if (*val)
+   buf[0] = 'Y';
+   else
+   buf[0] = 'N';
+   buf[1] = '\n';
+   buf[2] = 0x00;
+   return simple_read_from_buffer(user_buf, count, ppos, buf, 2);
+}
+
+static ssize_t write_file_bool(struct file *file, const char __user *user_buf,
+  size_t count, loff_t *ppos)
+{
+   char buf[32];
+   int buf_size;
+   u32 *val = file-private_data;
+
+   buf_size = min(count, (sizeof(buf)-1));
+   if (copy_from_user(buf, user_buf, buf_size))
+   return -EFAULT;
+
+   switch (buf[0]) {
+   case 'y':
+   case 'Y':
+   case '1':
+   *val = 1;
+   break;
+   case 'n':
+   case 'N':
+   case '0':
+   *val = 0;
+   break;
+   }
+
+   return count;
+}
+
+const struct file_operations simple_fops_bool = {
+   .read = read_file_bool,
+   .write =write_file_bool,
+   .open = simple_open,
+};
+EXPORT_SYMBOL_GPL(simple_fops_bool);
+
+static ssize_t read_file_blob(struct

[RFC 00/11] possible debugfs/libfs consolidation

2008-02-18 Thread Arnd Bergmann
I noticed that there is a lot of duplication in pseudo
file systems, so I started looking into how to consolidate
them. I ended up with a largish rework of the structure
of libfs and moving almost all of debugfs in there as well.

As an example, I also have patches that reduce debugfs,
securityfs and usbfs to the point where they are mostly
thin wrappers around libfs, with large comment blocks.
Other file systems could be changed in the same way, but
I first like to see if people agree that I'm on the right
track.

These patches have seen practically no testing so far,
so don't expect them to work, but please tell me what
you think about the concept.

Arnd 

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC 03/11] slim down debugfs

2008-02-18 Thread Arnd Bergmann
With most of debugfs now copied to generic code in libfs,
we can remove the original copy and replace it with thin
wrappers around libfs.

Signed-off-by: Arnd Bergmann [EMAIL PROTECTED]
Index: linux-2.6/fs/Kconfig
===
--- linux-2.6.orig/fs/Kconfig
+++ linux-2.6/fs/Kconfig
@@ -1001,6 +1001,14 @@ config CONFIGFS_FS
  Both sysfs and configfs can and should exist together on the
  same system. One is not a replacement for the other.
 
+config LIBFS
+   tristate
+   default m
+   help
+ libfs is a helper library used by many of the simpler file
+ systems. Parts of libfs can be modular when all of its users
+ are modules as well, and the users should select this symbol.
+
 endmenu
 
 menu Miscellaneous filesystems
Index: linux-2.6/fs/debugfs/file.c
===
--- linux-2.6.orig/fs/debugfs/file.c
+++ linux-2.6/fs/debugfs/file.c
@@ -19,55 +19,6 @@
 #include linux/namei.h
 #include linux/debugfs.h
 
-static ssize_t default_read_file(struct file *file, char __user *buf,
-size_t count, loff_t *ppos)
-{
-   return 0;
-}
-
-static ssize_t default_write_file(struct file *file, const char __user *buf,
-  size_t count, loff_t *ppos)
-{
-   return count;
-}
-
-static int default_open(struct inode *inode, struct file *file)
-{
-   if (inode-i_private)
-   file-private_data = inode-i_private;
-
-   return 0;
-}
-
-const struct file_operations debugfs_file_operations = {
-   .read = default_read_file,
-   .write =default_write_file,
-   .open = default_open,
-};
-
-static void *debugfs_follow_link(struct dentry *dentry, struct nameidata *nd)
-{
-   nd_set_link(nd, dentry-d_inode-i_private);
-   return NULL;
-}
-
-const struct inode_operations debugfs_link_operations = {
-   .readlink   = generic_readlink,
-   .follow_link= debugfs_follow_link,
-};
-
-static int debugfs_u8_set(void *data, u64 val)
-{
-   *(u8 *)data = val;
-   return 0;
-}
-static int debugfs_u8_get(void *data, u64 *val)
-{
-   *val = *(u8 *)data;
-   return 0;
-}
-DEFINE_SIMPLE_ATTRIBUTE(fops_u8, debugfs_u8_get, debugfs_u8_set, %llu\n);
-
 /**
  * debugfs_create_u8 - create a debugfs file that is used to read and write an 
unsigned 8-bit value
  * @name: a pointer to a string containing the name of the file to create.
@@ -95,22 +46,10 @@ DEFINE_SIMPLE_ATTRIBUTE(fops_u8, debugfs
 struct dentry *debugfs_create_u8(const char *name, mode_t mode,
 struct dentry *parent, u8 *value)
 {
-   return debugfs_create_file(name, mode, parent, value, fops_u8);
+   return debugfs_create_file(name, mode, parent, value, simple_fops_u8);
 }
 EXPORT_SYMBOL_GPL(debugfs_create_u8);
 
-static int debugfs_u16_set(void *data, u64 val)
-{
-   *(u16 *)data = val;
-   return 0;
-}
-static int debugfs_u16_get(void *data, u64 *val)
-{
-   *val = *(u16 *)data;
-   return 0;
-}
-DEFINE_SIMPLE_ATTRIBUTE(fops_u16, debugfs_u16_get, debugfs_u16_set, %llu\n);
-
 /**
  * debugfs_create_u16 - create a debugfs file that is used to read and write 
an unsigned 16-bit value
  * @name: a pointer to a string containing the name of the file to create.
@@ -138,22 +77,10 @@ DEFINE_SIMPLE_ATTRIBUTE(fops_u16, debugf
 struct dentry *debugfs_create_u16(const char *name, mode_t mode,
  struct dentry *parent, u16 *value)
 {
-   return debugfs_create_file(name, mode, parent, value, fops_u16);
+   return debugfs_create_file(name, mode, parent, value, simple_fops_u16);
 }
 EXPORT_SYMBOL_GPL(debugfs_create_u16);
 
-static int debugfs_u32_set(void *data, u64 val)
-{
-   *(u32 *)data = val;
-   return 0;
-}
-static int debugfs_u32_get(void *data, u64 *val)
-{
-   *val = *(u32 *)data;
-   return 0;
-}
-DEFINE_SIMPLE_ATTRIBUTE(fops_u32, debugfs_u32_get, debugfs_u32_set, %llu\n);
-
 /**
  * debugfs_create_u32 - create a debugfs file that is used to read and write 
an unsigned 32-bit value
  * @name: a pointer to a string containing the name of the file to create.
@@ -181,23 +108,10 @@ DEFINE_SIMPLE_ATTRIBUTE(fops_u32, debugf
 struct dentry *debugfs_create_u32(const char *name, mode_t mode,
 struct dentry *parent, u32 *value)
 {
-   return debugfs_create_file(name, mode, parent, value, fops_u32);
+   return debugfs_create_file(name, mode, parent, value, simple_fops_u32);
 }
 EXPORT_SYMBOL_GPL(debugfs_create_u32);
 
-static int debugfs_u64_set(void *data, u64 val)
-{
-   *(u64 *)data = val;
-   return 0;
-}
-
-static int debugfs_u64_get(void *data, u64 *val)
-{
-   *val = *(u64 *)data;
-   return 0;
-}
-DEFINE_SIMPLE_ATTRIBUTE(fops_u64, debugfs_u64_get, debugfs_u64_set, %llu\n);
-
 /**
  * debugfs_create_u64 - create

[RFC 07/11] split out libfs/file.c from libfs.c

2008-02-18 Thread Arnd Bergmann
Consolidate all file manipulation code in libfs in a single
source file.

Signed-off-by: Arnd Bergmann [EMAIL PROTECTED]
Index: linux-2.6/fs/libfs.c
===
--- linux-2.6.orig/fs/libfs.c
+++ linux-2.6/fs/libfs.c
@@ -421,165 +421,6 @@ ssize_t simple_read_from_buffer(void __u
 }
 
 /*
- * Transaction based IO.
- * The file expects a single write which triggers the transaction, and then
- * possibly a read which collects the result - which is stored in a
- * file-local buffer.
- */
-char *simple_transaction_get(struct file *file, const char __user *buf, size_t 
size)
-{
-   struct simple_transaction_argresp *ar;
-   static DEFINE_SPINLOCK(simple_transaction_lock);
-
-   if (size  SIMPLE_TRANSACTION_LIMIT - 1)
-   return ERR_PTR(-EFBIG);
-
-   ar = (struct simple_transaction_argresp *)get_zeroed_page(GFP_KERNEL);
-   if (!ar)
-   return ERR_PTR(-ENOMEM);
-
-   spin_lock(simple_transaction_lock);
-
-   /* only one write allowed per open */
-   if (file-private_data) {
-   spin_unlock(simple_transaction_lock);
-   free_page((unsigned long)ar);
-   return ERR_PTR(-EBUSY);
-   }
-
-   file-private_data = ar;
-
-   spin_unlock(simple_transaction_lock);
-
-   if (copy_from_user(ar-data, buf, size))
-   return ERR_PTR(-EFAULT);
-
-   return ar-data;
-}
-
-ssize_t simple_transaction_read(struct file *file, char __user *buf, size_t 
size, loff_t *pos)
-{
-   struct simple_transaction_argresp *ar = file-private_data;
-
-   if (!ar)
-   return 0;
-   return simple_read_from_buffer(buf, size, pos, ar-data, ar-size);
-}
-
-int simple_transaction_release(struct inode *inode, struct file *file)
-{
-   free_page((unsigned long)file-private_data);
-   return 0;
-}
-
-/* Simple attribute files */
-
-struct simple_attr {
-   int (*get)(void *, u64 *);
-   int (*set)(void *, u64);
-   char get_buf[24];   /* enough to store a u64 and \n\0 */
-   char set_buf[24];
-   void *data;
-   const char *fmt;/* format for read operation */
-   struct mutex mutex; /* protects access to these buffers */
-};
-
-/* simple_attr_open is called by an actual attribute open file operation
- * to set the attribute specific access operations. */
-int simple_attr_open(struct inode *inode, struct file *file,
-int (*get)(void *, u64 *), int (*set)(void *, u64),
-const char *fmt)
-{
-   struct simple_attr *attr;
-
-   attr = kmalloc(sizeof(*attr), GFP_KERNEL);
-   if (!attr)
-   return -ENOMEM;
-
-   attr-get = get;
-   attr-set = set;
-   attr-data = inode-i_private;
-   attr-fmt = fmt;
-   mutex_init(attr-mutex);
-
-   file-private_data = attr;
-
-   return nonseekable_open(inode, file);
-}
-
-int simple_attr_release(struct inode *inode, struct file *file)
-{
-   kfree(file-private_data);
-   return 0;
-}
-
-/* read from the buffer that is filled with the get function */
-ssize_t simple_attr_read(struct file *file, char __user *buf,
-size_t len, loff_t *ppos)
-{
-   struct simple_attr *attr;
-   size_t size;
-   ssize_t ret;
-
-   attr = file-private_data;
-
-   if (!attr-get)
-   return -EACCES;
-
-   ret = mutex_lock_interruptible(attr-mutex);
-   if (ret)
-   return ret;
-
-   if (*ppos) {/* continued read */
-   size = strlen(attr-get_buf);
-   } else {/* first read */
-   u64 val;
-   ret = attr-get(attr-data, val);
-   if (ret)
-   goto out;
-
-   size = scnprintf(attr-get_buf, sizeof(attr-get_buf),
-attr-fmt, (unsigned long long)val);
-   }
-
-   ret = simple_read_from_buffer(buf, len, ppos, attr-get_buf, size);
-out:
-   mutex_unlock(attr-mutex);
-   return ret;
-}
-
-/* interpret the buffer as a number to call the set function with */
-ssize_t simple_attr_write(struct file *file, const char __user *buf,
- size_t len, loff_t *ppos)
-{
-   struct simple_attr *attr;
-   u64 val;
-   size_t size;
-   ssize_t ret;
-
-   attr = file-private_data;
-   if (!attr-set)
-   return -EACCES;
-
-   ret = mutex_lock_interruptible(attr-mutex);
-   if (ret)
-   return ret;
-
-   ret = -EFAULT;
-   size = min(sizeof(attr-set_buf) - 1, len);
-   if (copy_from_user(attr-set_buf, buf, size))
-   goto out;
-
-   ret = len; /* claim we got the whole input */
-   attr-set_buf[size] = '\0';
-   val = simple_strtol(attr-set_buf, NULL, 0);
-   attr-set(attr-data, val);
-out:
-   mutex_unlock(attr-mutex);
-   return ret;
-}
-
-/*
  * This is what

[RFC 10/11] split out libfs/inode.c from libfs.c

2008-02-18 Thread Arnd Bergmann
Consolidate all inode manipulation code in libfs in a single
source file.

Signed-off-by: Arnd Bergman [EMAIL PROTECTED]

Index: linux-2.6/fs/libfs.c
===
--- linux-2.6.orig/fs/libfs.c
+++ linux-2.6/fs/libfs.c
@@ -12,78 +12,6 @@
 
 #include asm/uaccess.h
 
-int simple_link(struct dentry *old_dentry, struct inode *dir, struct dentry 
*dentry)
-{
-   struct inode *inode = old_dentry-d_inode;
-
-   inode-i_ctime = dir-i_ctime = dir-i_mtime = CURRENT_TIME;
-   inc_nlink(inode);
-   atomic_inc(inode-i_count);
-   dget(dentry);
-   d_instantiate(dentry, inode);
-   return 0;
-}
-
-int simple_empty(struct dentry *dentry)
-{
-   struct dentry *child;
-   int ret = 0;
-
-   spin_lock(dcache_lock);
-   list_for_each_entry(child, dentry-d_subdirs, d_u.d_child)
-   if (simple_positive(child))
-   goto out;
-   ret = 1;
-out:
-   spin_unlock(dcache_lock);
-   return ret;
-}
-
-int simple_unlink(struct inode *dir, struct dentry *dentry)
-{
-   struct inode *inode = dentry-d_inode;
-
-   inode-i_ctime = dir-i_ctime = dir-i_mtime = CURRENT_TIME;
-   drop_nlink(inode);
-   dput(dentry);
-   return 0;
-}
-
-int simple_rmdir(struct inode *dir, struct dentry *dentry)
-{
-   if (!simple_empty(dentry))
-   return -ENOTEMPTY;
-
-   drop_nlink(dentry-d_inode);
-   simple_unlink(dir, dentry);
-   drop_nlink(dir);
-   return 0;
-}
-
-int simple_rename(struct inode *old_dir, struct dentry *old_dentry,
-   struct inode *new_dir, struct dentry *new_dentry)
-{
-   struct inode *inode = old_dentry-d_inode;
-   int they_are_dirs = S_ISDIR(old_dentry-d_inode-i_mode);
-
-   if (!simple_empty(new_dentry))
-   return -ENOTEMPTY;
-
-   if (new_dentry-d_inode) {
-   simple_unlink(new_dir, new_dentry);
-   if (they_are_dirs)
-   drop_nlink(old_dir);
-   } else if (they_are_dirs) {
-   drop_nlink(old_dir);
-   inc_nlink(new_dir);
-   }
-
-   old_dir-i_ctime = old_dir-i_mtime = new_dir-i_ctime =
-   new_dir-i_mtime = inode-i_ctime = CURRENT_TIME;
-
-   return 0;
-}
-
 int simple_readpage(struct file *file, struct page *page)
 {
clear_highpage(page);
@@ -183,11 +111,6 @@ ssize_t simple_read_from_buffer(void __u
 
 EXPORT_SYMBOL(simple_write_begin);
 EXPORT_SYMBOL(simple_write_end);
-EXPORT_SYMBOL(simple_empty);
-EXPORT_SYMBOL(simple_link);
 EXPORT_SYMBOL(simple_prepare_write);
 EXPORT_SYMBOL(simple_readpage);
-EXPORT_SYMBOL(simple_rename);
-EXPORT_SYMBOL(simple_rmdir);
-EXPORT_SYMBOL(simple_unlink);
 EXPORT_SYMBOL(simple_read_from_buffer);
Index: linux-2.6/fs/libfs/inode.c
===
--- linux-2.6.orig/fs/libfs/inode.c
+++ linux-2.6/fs/libfs/inode.c
@@ -417,4 +417,79 @@ exit:
 }
 EXPORT_SYMBOL_GPL(simple_rename_named);
 
+int simple_link(struct dentry *old_dentry, struct inode *dir, struct dentry 
*dentry)
+{
+   struct inode *inode = old_dentry-d_inode;
 
+   inode-i_ctime = dir-i_ctime = dir-i_mtime = CURRENT_TIME;
+   inc_nlink(inode);
+   atomic_inc(inode-i_count);
+   dget(dentry);
+   d_instantiate(dentry, inode);
+   return 0;
+}
+EXPORT_SYMBOL(simple_link);
+
+int simple_empty(struct dentry *dentry)
+{
+   struct dentry *child;
+   int ret = 0;
+
+   spin_lock(dcache_lock);
+   list_for_each_entry(child, dentry-d_subdirs, d_u.d_child)
+   if (simple_positive(child))
+   goto out;
+   ret = 1;
+out:
+   spin_unlock(dcache_lock);
+   return ret;
+}
+EXPORT_SYMBOL(simple_empty);
+
+int simple_unlink(struct inode *dir, struct dentry *dentry)
+{
+   struct inode *inode = dentry-d_inode;
+
+   inode-i_ctime = dir-i_ctime = dir-i_mtime = CURRENT_TIME;
+   drop_nlink(inode);
+   dput(dentry);
+   return 0;
+}
+EXPORT_SYMBOL(simple_unlink);
+
+int simple_rmdir(struct inode *dir, struct dentry *dentry)
+{
+   if (!simple_empty(dentry))
+   return -ENOTEMPTY;
+
+   drop_nlink(dentry-d_inode);
+   simple_unlink(dir, dentry);
+   drop_nlink(dir);
+   return 0;
+}
+EXPORT_SYMBOL(simple_rmdir);
+
+int simple_rename(struct inode *old_dir, struct dentry *old_dentry,
+   struct inode *new_dir, struct dentry *new_dentry)
+{
+   struct inode *inode = old_dentry-d_inode;
+   int they_are_dirs = S_ISDIR(old_dentry-d_inode-i_mode);
+
+   if (!simple_empty(new_dentry))
+   return -ENOTEMPTY;
+
+   if (new_dentry-d_inode) {
+   simple_unlink(new_dir, new_dentry);
+   if (they_are_dirs)
+   drop_nlink(old_dir);
+   } else if (they_are_dirs) {
+   drop_nlink(old_dir);
+   inc_nlink(new_dir);
+   }
+
+   

[RFC 11/11] split out libfs/aops.c from libfs.c

2008-02-18 Thread Arnd Bergmann
Consolidate all address space manipulation code in libfs in a single
source file.

Signed-off-by: Arnd Bergman [EMAIL PROTECTED]


Index: linux-2.6/fs/libfs.c
===
--- linux-2.6.orig/fs/libfs.c
+++ /dev/null
@@ -1,116 +0,0 @@
-/*
- * fs/libfs.c
- * Library for filesystems writers.
- */
-
-#include linux/module.h
-#include linux/pagemap.h
-#include linux/mount.h
-#include linux/vfs.h
-#include linux/mutex.h
-#include linux/exportfs.h
-
-#include asm/uaccess.h
-
-int simple_readpage(struct file *file, struct page *page)
-{
-   clear_highpage(page);
-   flush_dcache_page(page);
-   SetPageUptodate(page);
-   unlock_page(page);
-   return 0;
-}
-
-int simple_prepare_write(struct file *file, struct page *page,
-   unsigned from, unsigned to)
-{
-   if (!PageUptodate(page)) {
-   if (to - from != PAGE_CACHE_SIZE)
-   zero_user_segments(page,
-   0, from,
-   to, PAGE_CACHE_SIZE);
-   }
-   return 0;
-}
-
-int simple_write_begin(struct file *file, struct address_space *mapping,
-   loff_t pos, unsigned len, unsigned flags,
-   struct page **pagep, void **fsdata)
-{
-   struct page *page;
-   pgoff_t index;
-   unsigned from;
-
-   index = pos  PAGE_CACHE_SHIFT;
-   from = pos  (PAGE_CACHE_SIZE - 1);
-
-   page = __grab_cache_page(mapping, index);
-   if (!page)
-   return -ENOMEM;
-
-   *pagep = page;
-
-   return simple_prepare_write(file, page, from, from+len);
-}
-
-static int simple_commit_write(struct file *file, struct page *page,
-  unsigned from, unsigned to)
-{
-   struct inode *inode = page-mapping-host;
-   loff_t pos = ((loff_t)page-index  PAGE_CACHE_SHIFT) + to;
-
-   if (!PageUptodate(page))
-   SetPageUptodate(page);
-   /*
-* No need to use i_size_read() here, the i_size
-* cannot change under us because we hold the i_mutex.
-*/
-   if (pos  inode-i_size)
-   i_size_write(inode, pos);
-   set_page_dirty(page);
-   return 0;
-}
-
-int simple_write_end(struct file *file, struct address_space *mapping,
-   loff_t pos, unsigned len, unsigned copied,
-   struct page *page, void *fsdata)
-{
-   unsigned from = pos  (PAGE_CACHE_SIZE - 1);
-
-   /* zero the stale part of the page if we did a short copy */
-   if (copied  len) {
-   void *kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr + from + copied, 0, len - copied);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
-   }
-
-   simple_commit_write(file, page, from, from+copied);
-
-   unlock_page(page);
-   page_cache_release(page);
-
-   return copied;
-}
-
-ssize_t simple_read_from_buffer(void __user *to, size_t count, loff_t *ppos,
-   const void *from, size_t available)
-{
-   loff_t pos = *ppos;
-   if (pos  0)
-   return -EINVAL;
-   if (pos = available)
-   return 0;
-   if (count  available - pos)
-   count = available - pos;
-   if (copy_to_user(to, from + pos, count))
-   return -EFAULT;
-   *ppos = pos + count;
-   return count;
-}
-
-EXPORT_SYMBOL(simple_write_begin);
-EXPORT_SYMBOL(simple_write_end);
-EXPORT_SYMBOL(simple_prepare_write);
-EXPORT_SYMBOL(simple_readpage);
-EXPORT_SYMBOL(simple_read_from_buffer);
Index: linux-2.6/fs/libfs/Makefile
===
--- linux-2.6.orig/fs/libfs/Makefile
+++ linux-2.6/fs/libfs/Makefile
@@ -1,3 +1,3 @@
 libfs-y += file.o
 
-obj-$(CONFIG_LIBFS) += libfs.o inode.o super.o dentry.o
+obj-$(CONFIG_LIBFS) += libfs.o inode.o super.o dentry.o aops.o
Index: linux-2.6/fs/libfs/aops.c
===
--- /dev/null
+++ linux-2.6/fs/libfs/aops.c
@@ -0,0 +1,113 @@
+/*
+ * fs/libfs/aops.c
+ * Library for filesystems writers -- address space operations
+ */
+
+#include linux/module.h
+#include linux/pagemap.h
+#include linux/mount.h
+#include linux/vfs.h
+
+#include asm/uaccess.h
+
+int simple_readpage(struct file *file, struct page *page)
+{
+   clear_highpage(page);
+   flush_dcache_page(page);
+   SetPageUptodate(page);
+   unlock_page(page);
+   return 0;
+}
+EXPORT_SYMBOL(simple_readpage);
+
+int simple_prepare_write(struct file *file, struct page *page,
+   unsigned from, unsigned to)
+{
+   if (!PageUptodate(page)) {
+   if (to - from != PAGE_CACHE_SIZE)
+   zero_user_segments(page,
+   0, from,
+   to, PAGE_CACHE_SIZE);
+   }
+ 

Re: Proposal to improve filesystem/block snapshot interaction

2007-10-30 Thread Arnd Bergmann
On Tuesday 30 October 2007, Dongjun Shin wrote:
 There is an ongoing discussion about adding 'Trim' ATA command for notifying
 the drive about the deleted blocks.
 
 http://www.t13.org/Documents/UploadedDocuments/docs2007/e07154r3-Data_Set_Management_Proposal_for_ATA-ACS2.pdf
 
 This is especially useful for the storage device like Solid State Drive (SSD).
 
This make me curious, why would t13 want to invent a new command when
there is already the erase command from CFA? 

It's not exactly the same, but close enough that the proposed BIO_HINT_RELEASE
should probably be mapped to CFA_ERASE (0xc0) on drives that support it:
http://t13.org/Documents/UploadedDocuments/technical/d97116r1.pdf

Arnd 
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Proposal to improve filesystem/block snapshot interaction

2007-10-30 Thread Arnd Bergmann
On Tuesday 30 October 2007, Jörn Engel wrote:
 On Tue, 30 October 2007 23:19:48 +0900, Dongjun Shin wrote:
  On 10/30/07, Arnd Bergmann [EMAIL PROTECTED] wrote:
  
   Not sure. Why shouldn't you be able to reorder the hints provided that
   they don't overlap with read/write bios for the same block?
  
  You're right. The bios can be reordered if they don't overlap with hint.
 
 I would keep things simpler.  Bios can be reordered, full stop.  If an
 erase and a write overlap, the caller (filesystem?) has to add a
 barrier.

I thought bios were already ordered if they affect the same blocks.
Either way, I agree that an erase should not be treated special on
the bio layer, its ordering should be handled the same way we do it
for writes.

Arnd 
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Patch 02/18] include/linux/logfs.h

2007-08-08 Thread Arnd Bergmann
On Wednesday 08 August 2007, Jörn Engel wrote:
 +++ linux-2.6.21logfs/include/linux/logfs.h 2007-08-08 02:57:37.0 
 +0200
 @@ -0,0 +1,500 @@
 +/*
 + * fs/logfs/logfs.h
 + *

The comment does not match the file name. Better remove the file names
entirely from introduction comments.

Arnd 
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Patch 16/18] fs/Kconfig

2007-08-08 Thread Arnd Bergmann
On Wednesday 08 August 2007, Jörn Engel wrote:
 +config LOGFS
 +   bool Log Filesystem (EXPERIMENTAL)
 +   depends on MTD  BLOCK  EXPERIMENTAL

The dependency on MTD _and_ BLOCK looks correct for your code, but
not necessary. How about making it 

depends on (MTD || BLOCK)  EXPERIMENTAL

and allowing to build without the mtd/bdev specific code?

Arnd 
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG?]Set XIP mount option on ext2 bypass check.

2007-06-21 Thread Arnd Bergmann
On Thursday 21 June 2007, Carsten Otte wrote:
 
 This is an updated version of my bugfix patch. Yan Zheng pointed out,
 that ext2_remount lacks checking if -o xip should be enabled or not.
 This patch checks for presence of direct_access on the backing block
 device and if the blocksize meets the requirements.
 Andrew, please consider adding this patch to -mm.
 
 Signed-off-by: Carsten Otte [EMAIL PROTECTED]

It looks to me like a local denial of service attack in case of
user-mountable ext2 file systems in /etc/fstab.

Shouldn't that make it go into 2.6.22?

Arnd 
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Patch 10/18] fs/logfs/inode.c

2007-06-11 Thread Arnd Bergmann
On Tuesday 12 June 2007, Jörn Engel wrote:
 The initial storm of review comments has calmed down.  I get the
 impression that people either lose interest or run out of simple things
 to point out.  Maybe I should wait a bit before resending?

Your last series had a todo list of 11 items, plus more stuff
found in the review. How about you submit logfs for inclusion in -mm
when all easy stuff is gone from that list?

Arnd 
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Patch 15/18] fs/logfs/super.c

2007-06-10 Thread Arnd Bergmann
On Sunday 03 June 2007, Jörn Engel wrote:
 +static int mtdread(struct super_block *sb, loff_t ofs, size_t len, void *buf)
 +{
 + struct mtd_info *mtd = logfs_super(sb)-s_mtd;
 + size_t retlen;
 + int ret;
 +
 + ret = mtd-read(mtd, ofs, len, retlen, buf);
 + BUG_ON(ret == -EINVAL);
 + if (ret)
 + return ret;
 +
 + /* Not sure if we should loop instead. */
 + if (retlen != len)
 + return -EIO;
 +
 + return 0;
 +}
 +
 +static int mtdwrite(struct super_block *sb, loff_t ofs, size_t len, void 
 *buf)
 +{
 + struct logfs_super *super = logfs_super(sb);
 + struct mtd_info *mtd = super-s_mtd;
 + struct inode *inode = super-s_dev_inode;
 + size_t retlen;
 + loff_t page_start, page_end;
 + int ret;
 +
 + if (super-s_flags  LOGFS_SB_FLAG_RO)
 + return -EROFS;
 +
 + BUG_ON((ofs = mtd-size) || (len  mtd-size - ofs));
 + BUG_ON(ofs != (ofs  super-s_writeshift)  super-s_writeshift);
 + BUG_ON(len  PAGE_CACHE_SIZE);
 + page_start = ofs  PAGE_CACHE_MASK;
 + page_end = PAGE_CACHE_ALIGN(ofs + len) - 1;
 + truncate_inode_pages_range(inode-i_data, page_start, page_end);
 + ret = mtd-write(mtd, ofs, len, retlen, buf);
 + if (ret || (retlen != len))
 + return -EIO;
 +
 + return 0;
 +}

It seems that these functions are completely synchronous and afaics, the
writes are called in the pdflush context, effectively blocking out operations
on other file systems at the time. Not sure if this is something that
should be fixed, but it seems to limit scalability on mtd backends.

It seems that jffs2 has the same behaviour.

 +static int bdwrite(struct super_block *sb, loff_t to, size_t len, void *buf)
 +{
 + struct block_device *bdev = logfs_super(sb)-s_bdev;
 + struct address_space *mapping = bdev-bd_inode-i_mapping;
 + struct page *page;
 + long index = to  PAGE_SHIFT;
 + long offset = to  (PAGE_SIZE-1);
 + long copylen;
 +
 + while (len) {
 + copylen = min((ulong)len, PAGE_SIZE - offset);
 +
 + page = read_cache_page(mapping, index,
 + (filler_t*)mapping-a_ops-readpage, NULL);
 + if (!page)
 + return -ENOMEM;
 + if (IS_ERR(page))
 + return PTR_ERR(page);
 + lock_page(page);
 + memcpy(page_address(page) + offset, buf, copylen);
 + set_page_dirty(page);
 + unlock_page(page);
 + page_cache_release(page);
 +
 + buf += copylen;
 + len -= copylen;
 + offset = 0;
 + index++;
 + }
 + return 0;
 +}

How about using submit_bio here instead of going to the page cache?
That would avoid doubling all the memory consumption here.

 +/*
 + * logfs_crash_dump - dump debug information to device
 + *
 + * The LogFS superblock only occupies part of a segment.  This function will
 + * write as much debug information as it can gather into the spare space.
 + */
 +void logfs_crash_dump(struct super_block *sb)
 +{
 + struct logfs_super *super = logfs_super(sb);
 + int i, blockno = 2, bs = sb-s_blocksize;
 + void *scratch = super-s_wblock[0];
 + void *stack = (void *) ((ulong)current  ~0x1fffUL);
 +
 + /* all wbufs */
 + for (i=0; iLOGFS_NO_AREAS; i++) {
 + void *wbuf = super-s_area[i]-a_wbuf;
 + u64 ofs = sb-s_blocksize + i*super-s_writesize;
 + mtdwrite(sb, ofs, super-s_writesize, wbuf);
 + }

shouldn't this use the -write() function instead of hardcoding mtdwrite.

 +module_init(logfs_init);
 +module_exit(logfs_exit);

You are missing at least a MODULE_LICENSE and should probably
add the MODULE_AUTHOR and MODULE_DESCRIPTION tags as well.
Right now, it's impossible to even load the module, because
it uses a few GPL-only symbols.

Arnd 
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Patch 04/18] include/linux/logfs.h

2007-06-06 Thread Arnd Bergmann
On Wednesday 06 June 2007, David Woodhouse wrote:
 And if I had something like this (which is admittedly contrived, but
 hardware people _do_ do stupid things to us):
    { uint32_t, uint8_t, uint16_t, uint8_t, uint32_t, uint32_t }
 
 With the 'packed' attribute the compiler would assume arbitrary
 alignment of all the 32-bit integers. But in reality it's only necessary
 for the uint16_t in the middle. A 'nopadding' attribute would deal with
 that correctly.

I would argue that a newly invented 'nopadding' attribute should reject
such a structure as invalid, because it should not let members be
unaligned. Unfortunately, this also gets tricky if you consider

struct {
uint32_t a;
uint64_t b;
uint32_t c;
};

which does have an unaligned member by default in i386, but not on
any modern platform.

Arnd 
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Patch 04/18] include/linux/logfs.h

2007-06-03 Thread Arnd Bergmann
On Sunday 03 June 2007, Jörn Engel wrote:
 +struct logfs_je_spillout {
 +   __be64  so_segment[0];
 +}__packed;

All the on-disk data structures you define in this file have naturally
aligned members, so the __packed attribute is not needed.

However, I think it causes gcc to generate larger and slower code
on some architectures, because now it has to assume that the data
structure itself has no more than byte alignment.

I'd simply remove all instances of __packed therefore. In order
to verify that you got it right in all cases, build with
'-Wpadded -Wpacked'.

Arnd 
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Patch 06/18] fs/logfs/compr.c

2007-06-03 Thread Arnd Bergmann
On Sunday 03 June 2007, Jörn Engel wrote:
 +#define COMPR_LEVEL 3
 +
 +static DEFINE_MUTEX(compr_mutex);
 +static struct z_stream_s stream;

Is there a particular reason to choose '3' as the only compression
level? Should this perhaps be a per-superblock option instead?

Also, I thought I saw discussion about making the mutex and
stream per-superblock, but don't remember if the idea was discarded.
If it was, you might want to add it to the won't-happen list.

Arnd  
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: LogFS take four

2007-06-03 Thread Arnd Bergmann
On Sunday 03 June 2007, Jörn Engel wrote:

 Unchanged:
 o error handling
 
...
 Won't happen (unless I get convinced to do otherwise):
 o Change LOGFS_BUG() and LOGFS_BUG_ON() to inline functions
   These are macros for very much the same reasons BUG() and BUG_ON() are.

I wonder how many of your LOGFS_BUG{,_ON} still remain after the
error handling is in place to deal with broken file system contents.
Ideally, I'd say the current LOGFS_BUG() should be replaced with
a function that prints about the kind of error it has hit (rate-limited),
potentially calls logfs_crash_dump(), and remounts the medium read-only,
but _not_ call BUG().

Arnd 
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Patch 14/18] fs/logfs/segment.c

2007-06-03 Thread Arnd Bergmann
On Sunday 03 June 2007, Jörn Engel wrote:
 +static DEFINE_MUTEX(compr_mutex);
 +

It seems you define a static compre_mutex in both segment.c and in compr.c,
and always lock them both at the same time. Is that a correct observation?
Is it intentional, or an oversight on your side?

Arnd 
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] LogFS take three

2007-05-17 Thread Arnd Bergmann
On Tuesday 15 May 2007, Jörn Engel wrote:
 
  I've been semi watching this, and the only comment I really can give
  is that I hate the name.  To me, logfs implies a filesystem for
  logging purposes, not for Flash hardware with wear leveling issues to
  be taken into account.
 
 Yeah, well, ...
 
 Two years ago when I started all this, I was looking for a good name.
 All I could come up with sounded stupid, so I picked LogFS as a code
 name.  As soon as I find a better name, the code name should get
 replaced.
 

When doing a google search on logfs, there are less than five results
among the first 100 that don't refer to your work. The other two listed
in there are also log-structured file systems: The inferno flash file
system (http://inferno-os.googlecode.com/svn/trunk/liblogfs/) and the
(discontinued) file system named lfs from the 2005 google summer of
code.

I'd say the name should stay, changing it now can only add more confusion.

Arnd 
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] LogFS take three

2007-05-17 Thread Arnd Bergmann
On Tuesday 15 May 2007, Jörn Engel wrote:
 Add LogFS, a scalable flash filesystem.

Hi Jörn,

Sorry for not commenting earlier, there were so many discussions on version
two that I wanted to wait for the fallout of that instead of duplicating
all the comments.

Here are a few things I notice while going through the third version:

 +/*
 + * Private errno for accessed beyond end-of-file.  Only used internally to
 + * logfs.  If this ever gets exposed to userspace or even other parts of the
 + * kernel, it is a bug.  256 was chosen as a number sufficiently above all
 + * used errno #defines.
 + *
 + * It can be argued that this is a hack and should be replaced with something
 + * else.  My last attempt to do this failed spectacularly and there are more
 + * urgent problems that users actually care about.  This will remain for the
 + * moment.  Patches are wellcome, of course.
 + */
 +#define EOF  256

It should at least be in the kernel-only errno range between 512 and 4095,
that way it can eventually be added to include/linux/errno.h.

 + * Target rename works in three atomic steps:
 + * 1. Attach old inode to new dentry (remember old dentry and new inode)
 + * 2. Remove old dentry (still remember the new inode)
 + * 3. Remove new inode
 + *
 + * Here we remember both an inode an a dentry.  If we get interrupted
 + * between steps 1 and 2, we delete both the dentry and the inode.  If
 + * we get interrupted between steps 2 and 3, we delete just the inode.
 + * In either case, the remaining objects are deleted on next mount.  From
 + * a users point of view, the operation succeeded.

This description had me confused for a while: why would you remove the
new inode. Maybe change the text to say 'target inode' or 'victim inode'?

 +static int logfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
 +{
 + struct inode *inode;
 +
 + if (dir-i_nlink = LOGFS_LINK_MAX)
 + return -EMLINK;

Why is i_nlink limited? Don't you run out of space for inodes before
overflowing?

 + * In principle, this function should loop forever, looking for GC candidates
 + * and moving data.  LogFS is designed in such a way that this loop is
 + * guaranteed to terminate.
 + *
 + * Limiting the loop to four iterations serves purely to catch cases when
 + * these guarantees have failed.  An actual endless loop is an obvious bug
 + * and should be reported as such.
 + *
 + * But there is another nasty twist to this.  As I have described in my LCA
 + * presentation, Garbage collection would have to limit itself to higher
 + * levels if the number of available free segments goes down.  This code
 + * doesn't and should fail spectacularly.  Yet - hard as I tried I haven't
 + * been able to make it fail (short of a bug elsewhere).
 + *
 + * So in a way this code is intentionally wrong as a desperate cry for a
 + * better testcase.  And I do expect to get blamed for it one day. :(
 + */

Could you bug the code to reserve fewer segments for GC than you really
need, in order to stress test GC?

 +static struct inode *logfs_alloc_inode(struct super_block *sb)
 +{
 + struct logfs_inode *li;
 +
 + li = kmem_cache_alloc(logfs_inode_cache, GFP_KERNEL);
 + if (!li)
 + return NULL;
 + logfs_init_inode(li-vfs_inode);
 + return li-vfs_inode;
 +}
 +
 +
 +struct inode *logfs_new_meta_inode(struct super_block *sb, u64 ino)
 +{
 + struct inode *inode;
 +
 + inode = logfs_alloc_inode(sb);
 + if (!inode)
 + return ERR_PTR(-ENOMEM);
 +
 + logfs_init_inode(inode);

logfs_alloc_inode() returns an initialized inode, so no need to call
logfs_init_inode() again, right?

 +static __be64 timespec_to_be64(struct timespec tsp)
 +{
 + u64 time = ((u64)tsp.tv_sec  32) + (tsp.tv_nsec  0x);
 +
 + WARN_ON(tsp.tv_nsec  9);
 + return cpu_to_be64(time);
 +}

Why not just store 64 bit nanoseconds? that would avoid the problem
with ns overflow and the year-2038 bug. OTOH, that would require
a 64 bit integer division when reading the data, so it gets you
a runtime overhead.

 +static void logfs_read_inode(struct inode *inode)
 +{
 + int ret;
 +
 + BUG_ON(inode-i_ino == LOGFS_INO_MASTER);
 +
 + ret = __logfs_read_inode(inode);
 +
 + /* What else can we do here? */
 + BUG_ON(ret);
 +}

ext2 returns make_bad_inode(inode) in this case, which seems to be
a better solution than crashing.

 +int __logfs_write_inode(struct inode *inode)
 +{
 + /*
 +  * FIXME: Those two inodes are 512 bytes in total.  Not good to
 +  * have on the stack.  Possibly the best solution would be to bite
 +  * the bullet and do another format change before release and
 +  * shrink the inodes.
 +  */
 + struct logfs_disk_inode old, new;
 +
 + BUG_ON(inode-i_ino == LOGFS_INO_MASTER);
 +
 + /* read and compare the inode first.  If it hasn't changed, don't
 +  * bother writing it. */
 + logfs_inode_to_disk(inode, new);
 + if 

Re: [PATCH] LogFS take three

2007-05-17 Thread Arnd Bergmann
On Thursday 17 May 2007, Jörn Engel wrote:
 
  Why not just store 64 bit nanoseconds? that would avoid the problem
  with ns overflow and the year-2038 bug. OTOH, that would require
  a 64 bit integer division when reading the data, so it gets you
  a runtime overhead.
 
 I like the idea.  Do conversion function exist both way?
 
 What I don't get is the year-2038 bug.  Isn't that the 31bit limit,
 while 32bit would last to 2106?

You're right, you don't hit the 2038 bug here, because you use an
unsigned variable. The bug exists elsewhere because time_t tv_sec
is signed.

Just using nanoseconds probably doesn't gain you much after all
then. You could however just have separate 32 bit fields in the
inode for seconds and nanoseconds, that will result in the exact
same layout that you have right now, but won't require a conversion
function.

Arnd 
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] LogFS take three

2007-05-17 Thread Arnd Bergmann
On Thursday 17 May 2007, Pekka Enberg wrote:
 
 Jörn Engel wrote:
  Compressing random data will actually enlarge it.  If that happens I
  simply store the verbatim uncompressed data instead and mark it as such.
  
  There is also demand for a user-controlled bit in the inode to disable
  compression completely.  All those .jpg, .mpg, .mp3, etc. just waste
  time by trying and failing to compress them.
 
 So any sane way to enable compression is on per-inode basis which makes 
 me still wonder why you need per-object compression.

1. it doesn't require user interaction, the file system will do the right
thing most of the time.

2. enlarging data is a very bad thing because it makes the behaviour
of the fs unpredictable. With uncompressed objects, you have a guaranteed
upper bound on the size.

Arnd 
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] Heads up on sys_fallocate()

2007-03-04 Thread Arnd Bergmann
On Sunday 04 March 2007, Anton Altaparmakov wrote:
  A generic_fallocate makes sense to me iff we can do it in the kernel
  more significantly more efficiently than in glibc, e.g. by using only
  a single page in page cache instead of one for each page to be  
  preallocated.
 
  If  glibc is smart enough to do an optimal implementation, I fully  
  agree
  with you.
 
 glibc cannot ever be smart enough because a file system driver will  
 always know better and be able to do things in a much more optimized  
 way.

Ok, that's not what I meant. It's obvious that the file system itself
can do better than both VFS and glibc. The question is whether VFS can
be better than glibc on file systems that don't offer their own
implementation of the fallocate operation.

Arnd 
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html