date:20101209

Purpose of block reserves

2010-12-09 Thread Arne Jansen

Hi,

while reading btrfs source code, I try to make sense of the various uses
of block reserves. The working theory is as follows:

1. Every operation has to reserve upfront every single byte it needs to
   complete its operation fully.
2. If an operation cannot determine how much space it will need, it has
   to be able to cope with running out of space. Normally it does it by
   inserting an orphan item, doing its work in multiple transactions and
   removing the orphan item. The commits in between normally free up enough
   space to continue the operation.
3. All other enospc situations are errors in program logic and should result
   in BUG_ON.

It would be great of someone with a deeper knowledge could correct, expand
or just confirm this.

Thanks,
Arne
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2 0/5] Btrfs: Readonly snapshots

2010-12-09 Thread Li Zefan

This patchset adds readonly-snapshots support. You can create a
readonly snapshot, and you can also set a snapshot (subvolume also)
readonly/writable on the fly.

A few readonly checks are added in setattr, permission, remove_xattr
and set_xattr callbacks, as well as in some ioctls.

The patchset is also available in:
git://repo.or.cz/linux-btrfs-devel.git readonly-snapshots

Note: I've changed the async snapshot ABI. So if the patchset is acceptable,
the first patch has to be merged into .37 to avoid ABI breakage.

---
 fs/btrfs/ctree.h   |3 +
 fs/btrfs/disk-io.c |   36 +
 fs/btrfs/inode.c   |8 ++
 fs/btrfs/ioctl.c   |  193 +++-
 fs/btrfs/ioctl.h   |   19 -
 fs/btrfs/transaction.c |8 ++
 fs/btrfs/transaction.h |1 +
 fs/btrfs/xattr.c   |   18 +
 8 files changed, 230 insertions(+), 56 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2 1/5] Btrfs: Make async snapshot ioctl more generic

2010-12-09 Thread Li Zefan

So we don't have to add new structures as we create more ioctls
for snapshots/subvolumes.

Now to create async snapshot, set BTRFS_SUBVOL_CREATE_SNAP_ASYNC bit
of vol_arg_v2->flags, and then call ioctl(BTRFS_IOCT_SNAP_CREATE_V2).

Signed-off-by: Li Zefan 
---
 fs/btrfs/ioctl.c |   34 +-
 fs/btrfs/ioctl.h |   14 +-
 2 files changed, 30 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index f1c9bb4..dc953bc 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -947,23 +947,31 @@ out:
 
 static noinline int btrfs_ioctl_snap_create(struct file *file,
void __user *arg, int subvol,
-   int async)
+   bool v2)
 {
struct btrfs_ioctl_vol_args *vol_args = NULL;
-   struct btrfs_ioctl_async_vol_args *async_vol_args = NULL;
+   struct btrfs_ioctl_vol_args_v2 *vol_args_v2 = NULL;
char *name;
u64 fd;
u64 transid = 0;
+   bool async = false;
int ret;
 
-   if (async) {
-   async_vol_args = memdup_user(arg, sizeof(*async_vol_args));
-   if (IS_ERR(async_vol_args))
-   return PTR_ERR(async_vol_args);
+   if (v2) {
+   vol_args_v2 = memdup_user(arg, sizeof(*vol_args_v2));
+   if (IS_ERR(vol_args_v2))
+   return PTR_ERR(vol_args_v2);
 
-   name = async_vol_args->name;
-   fd = async_vol_args->fd;
-   async_vol_args->name[BTRFS_SNAPSHOT_NAME_MAX] = '\0';
+   if (vol_args_v2->flags & ~BTRFS_SUBVOL_CREATE_SNAP_ASYNC) {
+   ret = -EINVAL;
+   goto out;
+   }
+
+   name = vol_args_v2->name;
+   fd = vol_args_v2->fd;
+   vol_args_v2->name[BTRFS_SUBVOL_NAME_MAX] = '\0';
+   if (vol_args_v2->flags & BTRFS_SUBVOL_CREATE_SNAP_ASYNC)
+   async = true;
} else {
vol_args = memdup_user(arg, sizeof(*vol_args));
if (IS_ERR(vol_args))
@@ -978,13 +986,13 @@ static noinline int btrfs_ioctl_snap_create(struct file 
*file,
 
if (!ret && async) {
if (copy_to_user(arg +
-   offsetof(struct btrfs_ioctl_async_vol_args,
+   offsetof(struct btrfs_ioctl_vol_args_v2,
transid), &transid, sizeof(transid)))
return -EFAULT;
}
-
+out:
kfree(vol_args);
-   kfree(async_vol_args);
+   kfree(vol_args_v2);
 
return ret;
 }
@@ -2246,7 +2254,7 @@ long btrfs_ioctl(struct file *file, unsigned int
return btrfs_ioctl_getversion(file, argp);
case BTRFS_IOC_SNAP_CREATE:
return btrfs_ioctl_snap_create(file, argp, 0, 0);
-   case BTRFS_IOC_SNAP_CREATE_ASYNC:
+   case BTRFS_IOC_SNAP_CREATE_V2:
return btrfs_ioctl_snap_create(file, argp, 0, 1);
case BTRFS_IOC_SUBVOL_CREATE:
return btrfs_ioctl_snap_create(file, argp, 1, 0);
diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h
index 17c99eb..9affc49 100644
--- a/fs/btrfs/ioctl.h
+++ b/fs/btrfs/ioctl.h
@@ -30,11 +30,15 @@ struct btrfs_ioctl_vol_args {
char name[BTRFS_PATH_NAME_MAX + 1];
 };
 
-#define BTRFS_SNAPSHOT_NAME_MAX 4079
-struct btrfs_ioctl_async_vol_args {
+#define BTRFS_SUBVOL_CREATE_SNAP_ASYNC (1ULL << 0)
+
+#define BTRFS_SUBVOL_NAME_MAX 4039
+struct btrfs_ioctl_vol_args_v2 {
__s64 fd;
__u64 transid;
-   char name[BTRFS_SNAPSHOT_NAME_MAX + 1];
+   __u64 flags;
+   __u64 unused[4];
+   char name[BTRFS_SUBVOL_NAME_MAX + 1];
 };
 
 #define BTRFS_INO_LOOKUP_PATH_MAX 4080
@@ -187,6 +191,6 @@ struct btrfs_ioctl_space_args {
struct btrfs_ioctl_space_args)
 #define BTRFS_IOC_START_SYNC _IOR(BTRFS_IOCTL_MAGIC, 24, __u64)
 #define BTRFS_IOC_WAIT_SYNC  _IOW(BTRFS_IOCTL_MAGIC, 22, __u64)
-#define BTRFS_IOC_SNAP_CREATE_ASYNC _IOW(BTRFS_IOCTL_MAGIC, 23, \
-  struct btrfs_ioctl_async_vol_args)
+#define BTRFS_IOC_SNAP_CREATE_V2 _IOW(BTRFS_IOCTL_MAGIC, 23, \
+  struct btrfs_ioctl_vol_args_v2)
 #endif
-- 
1.6.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2 2/5] Btrfs: Refactor btrfs_ioctl_snap_create()

2010-12-09 Thread Li Zefan

Split it into two functions for two different ioctls, since they
share no common code.

Also fix a memory leak in a failure path.

Signed-off-by: Li Zefan 
---
 fs/btrfs/ioctl.c |   71 +++--
 1 files changed, 36 insertions(+), 35 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index dc953bc..5ecd532 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -946,54 +946,55 @@ out:
 }
 
 static noinline int btrfs_ioctl_snap_create(struct file *file,
-   void __user *arg, int subvol,
-   bool v2)
+   void __user *arg, int subvol)
 {
-   struct btrfs_ioctl_vol_args *vol_args = NULL;
-   struct btrfs_ioctl_vol_args_v2 *vol_args_v2 = NULL;
-   char *name;
-   u64 fd;
+   struct btrfs_ioctl_vol_args *vol_args;
+   int ret;
+
+   vol_args = memdup_user(arg, sizeof(*vol_args));
+   if (IS_ERR(vol_args))
+   return PTR_ERR(vol_args);
+   vol_args->name[BTRFS_PATH_NAME_MAX] = '\0';
+
+   ret = btrfs_ioctl_snap_create_transid(file, vol_args->name,
+ vol_args->fd, subvol, NULL);
+
+   kfree(vol_args);
+   return ret;
+}
+
+static noinline int btrfs_ioctl_snap_create_v2(struct file *file,
+  void __user *arg, int subvol)
+{
+   struct btrfs_ioctl_vol_args_v2 *vol_args;
u64 transid = 0;
bool async = false;
int ret;
 
-   if (v2) {
-   vol_args_v2 = memdup_user(arg, sizeof(*vol_args_v2));
-   if (IS_ERR(vol_args_v2))
-   return PTR_ERR(vol_args_v2);
-
-   if (vol_args_v2->flags & ~BTRFS_SUBVOL_CREATE_SNAP_ASYNC) {
-   ret = -EINVAL;
-   goto out;
-   }
+   vol_args = memdup_user(arg, sizeof(*vol_args));
+   if (IS_ERR(vol_args))
+   return PTR_ERR(vol_args);
 
-   name = vol_args_v2->name;
-   fd = vol_args_v2->fd;
-   vol_args_v2->name[BTRFS_SUBVOL_NAME_MAX] = '\0';
-   if (vol_args_v2->flags & BTRFS_SUBVOL_CREATE_SNAP_ASYNC)
-   async = true;
-   } else {
-   vol_args = memdup_user(arg, sizeof(*vol_args));
-   if (IS_ERR(vol_args))
-   return PTR_ERR(vol_args);
-   name = vol_args->name;
-   fd = vol_args->fd;
-   vol_args->name[BTRFS_PATH_NAME_MAX] = '\0';
+   if (vol_args->flags & ~BTRFS_SUBVOL_CREATE_SNAP_ASYNC) {
+   ret = -EINVAL;
+   goto out;
}
 
-   ret = btrfs_ioctl_snap_create_transid(file, name, fd,
- subvol, &transid);
+   vol_args->name[BTRFS_SUBVOL_NAME_MAX] = '\0';
+   if (vol_args->flags & BTRFS_SUBVOL_CREATE_SNAP_ASYNC)
+   async = true;
+
+   ret = btrfs_ioctl_snap_create_transid(file, vol_args->name,
+ vol_args->fd, subvol, &transid);
 
if (!ret && async) {
if (copy_to_user(arg +
offsetof(struct btrfs_ioctl_vol_args_v2,
transid), &transid, sizeof(transid)))
-   return -EFAULT;
+   ret = -EFAULT;
}
 out:
kfree(vol_args);
-   kfree(vol_args_v2);
-
return ret;
 }
 
@@ -2253,11 +2254,11 @@ long btrfs_ioctl(struct file *file, unsigned int
case FS_IOC_GETVERSION:
return btrfs_ioctl_getversion(file, argp);
case BTRFS_IOC_SNAP_CREATE:
-   return btrfs_ioctl_snap_create(file, argp, 0, 0);
+   return btrfs_ioctl_snap_create(file, argp, 0);
case BTRFS_IOC_SNAP_CREATE_V2:
-   return btrfs_ioctl_snap_create(file, argp, 0, 1);
+   return btrfs_ioctl_snap_create_v2(file, argp, 0);
case BTRFS_IOC_SUBVOL_CREATE:
-   return btrfs_ioctl_snap_create(file, argp, 1, 0);
+   return btrfs_ioctl_snap_create(file, argp, 1);
case BTRFS_IOC_SNAP_DESTROY:
return btrfs_ioctl_snap_destroy(file, argp);
case BTRFS_IOC_DEFAULT_SUBVOL:
-- 
1.6.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2 3/5] Btrfs: Add helper __setup_root_post()

2010-12-09 Thread Li Zefan

Eliminate duplicate code in btrfs_read_fs_root_no_radix() and
find_and_setup_root().

Also prepare for the next patch.

Signed-off-by: Li Zefan 
---
 fs/btrfs/disk-io.c |   31 +++
 1 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 33b6d45..ca20c6d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -987,14 +987,25 @@ static int __setup_root(u32 nodesize, u32 leafsize, u32 
sectorsize,
return 0;
 }
 
+static void __setup_root_post(struct btrfs_root *root)
+{
+   u32 blocksize;
+   u64 generation;
+
+   generation = btrfs_root_generation(&root->root_item);
+   blocksize = btrfs_level_size(root, btrfs_root_level(&root->root_item));
+   root->node = read_tree_block(root, btrfs_root_bytenr(&root->root_item),
+blocksize, generation);
+   BUG_ON(!root->node);
+   root->commit_root = btrfs_root_node(root);
+}
+
 static int find_and_setup_root(struct btrfs_root *tree_root,
   struct btrfs_fs_info *fs_info,
   u64 objectid,
   struct btrfs_root *root)
 {
int ret;
-   u32 blocksize;
-   u64 generation;
 
__setup_root(tree_root->nodesize, tree_root->leafsize,
 tree_root->sectorsize, tree_root->stripesize,
@@ -1005,12 +1016,7 @@ static int find_and_setup_root(struct btrfs_root 
*tree_root,
return -ENOENT;
BUG_ON(ret);
 
-   generation = btrfs_root_generation(&root->root_item);
-   blocksize = btrfs_level_size(root, btrfs_root_level(&root->root_item));
-   root->node = read_tree_block(root, btrfs_root_bytenr(&root->root_item),
-blocksize, generation);
-   BUG_ON(!root->node);
-   root->commit_root = btrfs_root_node(root);
+   __setup_root_post(root);
return 0;
 }
 
@@ -,8 +1117,6 @@ struct btrfs_root *btrfs_read_fs_root_no_radix(struct 
btrfs_root *tree_root,
struct btrfs_fs_info *fs_info = tree_root->fs_info;
struct btrfs_path *path;
struct extent_buffer *l;
-   u64 generation;
-   u32 blocksize;
int ret = 0;
 
root = kzalloc(sizeof(*root), GFP_NOFS);
@@ -1149,12 +1153,7 @@ struct btrfs_root *btrfs_read_fs_root_no_radix(struct 
btrfs_root *tree_root,
return ERR_PTR(ret);
}
 
-   generation = btrfs_root_generation(&root->root_item);
-   blocksize = btrfs_level_size(root, btrfs_root_level(&root->root_item));
-   root->node = read_tree_block(root, btrfs_root_bytenr(&root->root_item),
-blocksize, generation);
-   root->commit_root = btrfs_root_node(root);
-   BUG_ON(!root->node);
+   __setup_root_post(root);
 out:
if (location->objectid != BTRFS_TREE_LOG_OBJECTID)
root->ref_cows = 1;
-- 
1.6.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2 4/5] Btrfs: Add readonly snapshots support

2010-12-09 Thread Li Zefan

Usage:

Set BTRFS_SUBVOL_RDONLY of btrfs_ioctl_vol_arg_v2->flags, and call
ioctl(BTRFS_I0CTL_SNAP_CREATE_V2).

Implementation:

- In disk set readonly bit of btrfs_root_item->flags, and in memory
set btrfs_root->readonly.

- Add readonly checks in btrfs_permission (inode_permission),
btrfs_setattr, btrfs_set/remove_xattr and some ioctls.

Signed-off-by: Li Zefan 
---
 fs/btrfs/ctree.h   |3 +++
 fs/btrfs/disk-io.c |5 +
 fs/btrfs/inode.c   |8 
 fs/btrfs/ioctl.c   |   40 +++-
 fs/btrfs/ioctl.h   |1 +
 fs/btrfs/transaction.c |8 
 fs/btrfs/transaction.h |1 +
 fs/btrfs/xattr.c   |   18 ++
 8 files changed, 75 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index af52f6d..ad37c78 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -597,6 +597,8 @@ struct btrfs_dir_item {
u8 type;
 } __attribute__ ((__packed__));
 
+#define BTRFS_ROOT_SNAP_RDONLY (1ULL << 0)
+
 struct btrfs_root_item {
struct btrfs_inode_item inode;
__le64 generation;
@@ -1116,6 +1118,7 @@ struct btrfs_root {
int defrag_running;
char *name;
int in_sysfs;
+   bool readonly;
 
/* the dirty list is only used by non-reference counted roots */
struct list_head dirty_list;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index ca20c6d..74748df 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -991,6 +991,7 @@ static void __setup_root_post(struct btrfs_root *root)
 {
u32 blocksize;
u64 generation;
+   u64 flags;
 
generation = btrfs_root_generation(&root->root_item);
blocksize = btrfs_level_size(root, btrfs_root_level(&root->root_item));
@@ -998,6 +999,10 @@ static void __setup_root_post(struct btrfs_root *root)
 blocksize, generation);
BUG_ON(!root->node);
root->commit_root = btrfs_root_node(root);
+
+   flags = btrfs_root_flags(&root->root_item);
+   if (flags & BTRFS_ROOT_SNAP_RDONLY)
+   root->readonly = true;
 }
 
 static int find_and_setup_root(struct btrfs_root *tree_root,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 0f34cae..e99676d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3671,8 +3671,12 @@ static int btrfs_setattr_size(struct inode *inode, 
struct iattr *attr)
 static int btrfs_setattr(struct dentry *dentry, struct iattr *attr)
 {
struct inode *inode = dentry->d_inode;
+   struct btrfs_root *root = BTRFS_I(inode)->root;
int err;
 
+   if (root->readonly)
+   return -EROFS;
+
err = inode_change_ok(inode, attr);
if (err)
return err;
@@ -7207,6 +7211,10 @@ static int btrfs_set_page_dirty(struct page *page)
 
 static int btrfs_permission(struct inode *inode, int mask)
 {
+   struct btrfs_root *root = BTRFS_I(inode)->root;
+
+   if (root->readonly && (mask & MAY_WRITE))
+   return -EROFS;
if ((BTRFS_I(inode)->flags & BTRFS_INODE_READONLY) && (mask & 
MAY_WRITE))
return -EACCES;
return generic_permission(inode, mask, btrfs_check_acl);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 5ecd532..db2b231 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -147,6 +147,9 @@ static int btrfs_ioctl_setflags(struct file *file, void 
__user *arg)
unsigned int flags, oldflags;
int ret;
 
+   if (root->readonly)
+   return -EROFS;
+
if (copy_from_user(&flags, arg, sizeof(flags)))
return -EFAULT;
 
@@ -360,7 +363,8 @@ fail:
 }
 
 static int create_snapshot(struct btrfs_root *root, struct dentry *dentry,
-  char *name, int namelen, u64 *async_transid)
+  char *name, int namelen, u64 *async_transid,
+  bool readonly)
 {
struct inode *inode;
struct dentry *parent;
@@ -378,6 +382,7 @@ static int create_snapshot(struct btrfs_root *root, struct 
dentry *dentry,
btrfs_init_block_rsv(&pending_snapshot->block_rsv);
pending_snapshot->dentry = dentry;
pending_snapshot->root = root;
+   pending_snapshot->readonly = readonly;
 
trans = btrfs_start_transaction(root->fs_info->extent_root, 5);
if (IS_ERR(trans)) {
@@ -509,7 +514,7 @@ static inline int btrfs_may_create(struct inode *dir, 
struct dentry *child)
 static noinline int btrfs_mksubvol(struct path *parent,
   char *name, int namelen,
   struct btrfs_root *snap_src,
-  u64 *async_transid)
+  u64 *async_transid, bool readonly)
 {
struct inode *dir  = parent->dentry->d_inode;
struct dentry *dentry;
@@ -541,7 +546,7 @@ static noinline int btrfs_mksubvol(struct path *parent,
 
if (snap_src) {

[PATCH v2 5/5] Btrfs: Add BTRFS_IOC_SUBVOL_GETFLAGS/SETFLAGS ioctl

2010-12-09 Thread Li Zefan

This allows us to set a snapshot or a subvolume readonly or writable
on the fly.

Usage:

Set BTRFS_SUBVOL_RDONLY of btrfs_ioctl_vol_arg_v2->flags, and then
call ioctl(BTRFS_IOCTL_SUBVOL_SETFLAGS);

Changelog for v2:
- Add _GETFLAGS ioctl.
- Check if the passed fd is the root of a subvolume.
- Change the name from _SNAP_SETFLAGS to _SUBVOL_SETFLAGS.

Signed-off-by: Li Zefan 
---
 fs/btrfs/ioctl.c |   92 ++
 fs/btrfs/ioctl.h |4 ++
 2 files changed, 96 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index db2b231..29304c7 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1010,6 +1010,94 @@ out:
return ret;
 }
 
+static noinline int btrfs_ioctl_subvol_getflags(struct file *file,
+   void __user *arg)
+{
+   struct inode *inode = fdentry(file)->d_inode;
+   struct btrfs_root *root = BTRFS_I(inode)->root;
+   struct btrfs_ioctl_vol_args_v2 *vol_args;
+   int ret = 0;
+   u64 flags = 0;
+
+   if (inode->i_ino != BTRFS_FIRST_FREE_OBJECTID)
+   return -EINVAL;
+
+   vol_args = memdup_user(arg, sizeof(*vol_args));
+   if (IS_ERR(vol_args))
+   return PTR_ERR(vol_args);
+
+   down_read(&root->fs_info->subvol_sem);
+   if (root->readonly)
+   flags |= BTRFS_SUBVOL_RDONLY;
+   up_read(&root->fs_info->subvol_sem);
+
+   if (copy_to_user(arg + offsetof(struct btrfs_ioctl_vol_args_v2, flags),
+&flags, sizeof(flags)))
+   ret = -EFAULT;
+
+   kfree(vol_args);
+   return ret;
+}
+
+static noinline int btrfs_ioctl_subvol_setflags(struct file *file,
+ void __user *arg)
+{
+   struct inode *inode = fdentry(file)->d_inode;
+   struct btrfs_root *root = BTRFS_I(inode)->root;
+   struct btrfs_trans_handle *trans;
+   struct btrfs_ioctl_vol_args_v2 *vol_args;
+   u64 root_flags;
+   u64 flags;
+   int ret = 0;
+
+   if (root->fs_info->sb->s_flags & MS_RDONLY)
+   return -EROFS;
+
+   if (inode->i_ino != BTRFS_FIRST_FREE_OBJECTID)
+   return -EINVAL;
+
+   vol_args = memdup_user(arg, sizeof(*vol_args));
+   if (IS_ERR(vol_args))
+   return PTR_ERR(vol_args);
+   flags = vol_args->flags;
+
+   if (flags & ~BTRFS_SUBVOL_RDONLY) {
+   ret = -EINVAL;
+   goto out;
+   }
+
+   down_write(&root->fs_info->subvol_sem);
+
+   /* nothing to do */
+   if (!!(flags & BTRFS_SUBVOL_RDONLY) == root->readonly)
+   goto out_unlock;
+
+   root_flags = btrfs_root_flags(&root->root_item);
+   if (flags & BTRFS_SUBVOL_RDONLY)
+   btrfs_set_root_flags(&root->root_item,
+root_flags | BTRFS_ROOT_SNAP_RDONLY);
+   else
+   btrfs_set_root_flags(&root->root_item,
+root_flags & ~BTRFS_ROOT_SNAP_RDONLY);
+   root->readonly = !root->readonly;
+
+   trans = btrfs_start_transaction(root, 1);
+   if (IS_ERR(trans)) {
+   ret = PTR_ERR(trans);
+   goto out_unlock;
+   }
+
+   ret = btrfs_update_root(trans, root,
+   &root->root_key, &root->root_item);
+
+   btrfs_commit_transaction(trans, root);
+out_unlock:
+   up_write(&root->fs_info->subvol_sem);
+out:
+   kfree(vol_args);
+   return ret;
+}
+
 /*
  * helper to check if the subvolume references other subvolumes
  */
@@ -2283,6 +2371,10 @@ long btrfs_ioctl(struct file *file, unsigned int
return btrfs_ioctl_snap_create(file, argp, 1);
case BTRFS_IOC_SNAP_DESTROY:
return btrfs_ioctl_snap_destroy(file, argp);
+   case BTRFS_IOC_SUBVOL_GETFLAGS:
+   return btrfs_ioctl_subvol_getflags(file, argp);
+   case BTRFS_IOC_SUBVOL_SETFLAGS:
+   return btrfs_ioctl_subvol_setflags(file, argp);
case BTRFS_IOC_DEFAULT_SUBVOL:
return btrfs_ioctl_default_subvol(file, argp);
case BTRFS_IOC_DEFRAG:
diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h
index 192fa99..19f7259 100644
--- a/fs/btrfs/ioctl.h
+++ b/fs/btrfs/ioctl.h
@@ -194,4 +194,8 @@ struct btrfs_ioctl_space_args {
 #define BTRFS_IOC_WAIT_SYNC  _IOW(BTRFS_IOCTL_MAGIC, 22, __u64)
 #define BTRFS_IOC_SNAP_CREATE_V2 _IOW(BTRFS_IOCTL_MAGIC, 23, \
   struct btrfs_ioctl_vol_args_v2)
+#define BTRFS_IOC_SUBVOL_GETFLAGS _IOW(BTRFS_IOCTL_MAGIC, 25, \
+  struct btrfs_ioctl_vol_args_v2)
+#define BTRFS_IOC_SUBVOL_SETFLAGS _IOW(BTRFS_IOCTL_MAGIC, 26, \
+  struct btrfs_ioctl_vol_args_v2)
 #endif
-- 
1.6.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info a

[PATCH v2]Btrfs: pwrite blocked when writing from the mmaped buffer of the same page

2010-12-09 Thread Zhong, Xin

This problem is found in meego testing:
http://bugs.meego.com/show_bug.cgi?id=6672
A file in btrfs is mmaped and the mmaped buffer is passed to pwrite to write to 
the same page
of the same file. In btrfs_file_aio_write(), the pages is locked by 
prepare_pages(). So when
btrfs_copy_from_user() is called, page fault happens and the same page needs to 
be locked again
in filemap_fault(). The fix is to move iov_iter_fault_in_readable() before 
prepage_pages() to make page
fault happen before pages are locked. And also disable page fault in critical 
region in
btrfs_copy_from_user().

Reviewed-by: Yan, Zheng
Signed-off-by: Zhong, Xin 
---
 fs/btrfs/file.c |   92 ---
 1 files changed, 60 insertions(+), 32 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index c1faded..66836d8 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -48,30 +48,34 @@ static noinline int btrfs_copy_from_user(loff_t pos, int 
num_pages,
 struct page **prepared_pages,
 struct iov_iter *i)
 {
-   size_t copied;
+   size_t copied = 0;
int pg = 0;
int offset = pos & (PAGE_CACHE_SIZE - 1);
+   int total_copied = 0;
 
while (write_bytes > 0) {
size_t count = min_t(size_t,
 PAGE_CACHE_SIZE - offset, write_bytes);
struct page *page = prepared_pages[pg];
-again:
-   if (unlikely(iov_iter_fault_in_readable(i, count)))
-   return -EFAULT;
-
-   /* Copy data from userspace to the current page */
-   copied = iov_iter_copy_from_user(page, i, offset, count);
+   /*
+* Copy data from userspace to the current page
+*
+* Disable pagefault to avoid recursive lock since
+* the pages are already locked
+*/
+   pagefault_disable();
+   copied = iov_iter_copy_from_user_atomic(page, i, offset, count);
+   pagefault_enable();
 
/* Flush processor's dcache for this page */
flush_dcache_page(page);
iov_iter_advance(i, copied);
write_bytes -= copied;
+   total_copied += copied;
 
+   /* Return to btrfs_file_aio_write to fault page */
if (unlikely(copied == 0)) {
-   count = min_t(size_t, PAGE_CACHE_SIZE - offset,
- iov_iter_single_seg_count(i));
-   goto again;
+   break;
}
 
if (unlikely(copied < PAGE_CACHE_SIZE - offset)) {
@@ -81,7 +85,7 @@ again:
offset = 0;
}
}
-   return 0;
+   return total_copied;
 }
 
 /*
@@ -854,6 +858,8 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
unsigned long last_index;
int will_write;
int buffered = 0;
+   int copied = 0;
+   int dirty_pages = 0;
 
will_write = ((file->f_flags & O_DSYNC) || IS_SYNC(inode) ||
  (file->f_flags & O_DIRECT));
@@ -970,7 +976,17 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
WARN_ON(num_pages > nrptrs);
memset(pages, 0, sizeof(struct page *) * nrptrs);
 
-   ret = btrfs_delalloc_reserve_space(inode, write_bytes);
+   /*
+* Fault pages before locking them in prepare_pages
+* to avoid recursive lock
+*/
+   if (unlikely(iov_iter_fault_in_readable(&i, write_bytes))) {
+   ret = -EFAULT;
+   goto out;
+   }
+
+   ret = btrfs_delalloc_reserve_space(inode,
+   num_pages << PAGE_CACHE_SHIFT);
if (ret)
goto out;
 
@@ -978,37 +994,49 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
pos, first_index, last_index,
write_bytes);
if (ret) {
-   btrfs_delalloc_release_space(inode, write_bytes);
+   btrfs_delalloc_release_space(inode,
+   num_pages << PAGE_CACHE_SHIFT);
goto out;
}
 
-   ret = btrfs_copy_from_user(pos, num_pages,
+   copied = btrfs_copy_from_user(pos, num_pages,
   write_bytes, pages, &i);
-   if (ret == 0) {
+   dirty_pages = (copied + PAGE_CACHE_SIZE - 1) >>
+   PAGE_CACHE_SHIFT;
+
+   if (num_pages > dirty_pages) {
+   if (copied > 0)
+   atomic_inc(
+

Re: The value displayed by 'ls -s' command is strange.

2010-12-09 Thread Miao Xie

On wed, 08 Dec 2010 08:53:55 +0900, Tsutomu Itoh wrote:
>>> I think that the disk allocation size of each file becomes a monotone 
>>> increase
>>> when the file is made.
>>> But, it sometimes return to 0.  Is it correct?
>>>
>>
>> The # of blocks is:
>>
>>  stat->blocks = (inode_get_bytes(inode) +
>>  BTRFS_I(inode)->delalloc_bytes)>>  9;
>>
>> So I think after sub(delalloc_bytes) and before inode_add_bytes(), you may
>> see 0 value.
> 
> Yes, I also think so.
> But, I think that such a state is too long for only the update timing...

Several months ago, some one posted a patch to get the allocated size of the 
compressed file,
http://marc.info/?l=linux-btrfs&m=128109745012238&w=2
this patch may help you to implement what you need.

Regards
Miao

>>
>>>
>>> The result of the test at 2.6.37-rc4 is shown below.
>>> (see inode no. 291)
>>>
>>>  # df -T /test14
>>>  FilesystemType   1K-blocks  Used Available Use% Mounted on
>>>  /dev/sdd14   btrfs 4162560  8736   3709440   1% /test14
>>>  # dd if=/dev/zero of=/test14/dir/as001.26603 bs=1M count=100
>>>  # dd if=/dev/zero of=/test14/dir/as002.26603 bs=1M count=200
>>>  # dd if=/dev/zero of=/test14/dir/sy001.26603 bs=1M count=300 
>>> oflag=direct
>>>  # dd if=/dev/zero of=/test14/dir/as003.26603 bs=1M count=400
>>>  # ls -lis /test14/dir
>>>  total 406528
>>>  288  0 -rw-r--r-- 1 root root 104857600 Dec  7 15:07 as001.26603
>>>  289  0 -rw-r--r-- 1 root root 209715200 Dec  7 15:07 as002.26603
>>>   ->  291  99328 -rw-r--r-- 1 root root 419430400 Dec  7 15:08 as003.26603
>>>  290 307200 -rw-r--r-- 1 root root 314572800 Dec  7 15:08 sy001.26603
>>>  # sleep 3
>>>  # ls -lis /test14/dir
>>>  total 406528
>>>  288  0 -rw-r--r-- 1 root root 104857600 Dec  7 15:07 as001.26603
>>>  289  0 -rw-r--r-- 1 root root 209715200 Dec  7 15:07 as002.26603
>>>   ->  291  99328 -rw-r--r-- 1 root root 419430400 Dec  7 15:08 as003.26603
>>>  290 307200 -rw-r--r-- 1 root root 314572800 Dec  7 15:08 sy001.26603
>>>  # sleep 3
>>>  # ls -lis /test14/dir
>>>  total 307200
>>>  288  0 -rw-r--r-- 1 root root 104857600 Dec  7 15:07 as001.26603
>>>  289  0 -rw-r--r-- 1 root root 209715200 Dec  7 15:07 as002.26603
>>>   ->  291  0 -rw-r--r-- 1 root root 419430400 Dec  7 15:08 as003.26603
>>>  290 307200 -rw-r--r-- 1 root root 314572800 Dec  7 15:08 sy001.26603
>>>  # sleep 3
>>>  # ls -lis /test14/dir
>>>  total 409600
>>>  288 102400 -rw-r--r-- 1 root root 104857600 Dec  7 15:07 as001.26603
>>>  289  0 -rw-r--r-- 1 root root 209715200 Dec  7 15:07 as002.26603
>>>   ->  291  0 -rw-r--r-- 1 root root 419430400 Dec  7 15:08 as003.26603
>>>  290 307200 -rw-r--r-- 1 root root 314572800 Dec  7 15:08 sy001.26603
>>>  # sync
>>>  # ls -lis /test14/dir
>>>  total 1024000
>>>  288 102400 -rw-r--r-- 1 root root 104857600 Dec  7 15:07 as001.26603
>>>  289 204800 -rw-r--r-- 1 root root 209715200 Dec  7 15:07 as002.26603
>>>   ->  291 409600 -rw-r--r-- 1 root root 419430400 Dec  7 15:08 as003.26603
>>>  290 307200 -rw-r--r-- 1 root root 314572800 Dec  7 15:08 sy001.26603
>>>
>>> The trace result of btrfs_getattr() is shown below.
>>>
>>>   Dec  7 15:08:03 luna kernel: ino:291 blocks:198656 i_blocks:0 i_bytes:0 
>>> delalloc_bytes:101711872
>>>   Dec  7 15:08:06 luna kernel: ino:291 blocks:198656 i_blocks:0 i_bytes:0 
>>> delalloc_bytes:101711872
>>>   Dec  7 15:08:09 luna kernel: ino:291 blocks:0 i_blocks:0 i_bytes:0 
>>> delalloc_bytes:0
>>>   Dec  7 15:08:12 luna kernel: ino:291 blocks:0 i_blocks:0 i_bytes:0 
>>> delalloc_bytes:0
>>>   Dec  7 15:08:18 luna kernel: ino:291 blocks:819200 i_blocks:819200 
>>> i_bytes:0 delalloc_bytes:0
>>>
>>>
>>> Regards,
>>> Itoh
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

mount snapshot by object-id

2010-12-09 Thread Michael Niederle

Hi!

I'm currently writing a btrfs-rescue tool and therefor began to study the
btrfs-on-disk structures in detail.

The root tree contains a ROOT_ITEM entry for *every* subvolume in the whole
file system, but only DIR_ITEM entries for subvolumes that were created in the
root directory of the filesystem.

If we have a destroyed root directory, there is no way to access subvolumes
stored deeper in the fs-tree by name. It might be nice to have the ability to
mount a subvolume by object-id. A simple tool could list all available
subvolumes with their object-ids, generation numbers, and so on.

Greetings, Michael
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: mount snapshot by object-id

2010-12-09 Thread Calvin Walton

On Thu, 2010-12-09 at 14:51 +0100, Michael Niederle wrote:
> Hi!
> 
> I'm currently writing a btrfs-rescue tool and therefor began to study the
> btrfs-on-disk structures in detail.
> 
> The root tree contains a ROOT_ITEM entry for *every* subvolume in the whole
> file system, but only DIR_ITEM entries for subvolumes that were created in the
> root directory of the filesystem.
> 
> If we have a destroyed root directory, there is no way to access subvolumes
> stored deeper in the fs-tree by name. It might be nice to have the ability to
> mount a subvolume by object-id. A simple tool could list all available
> subvolumes with their object-ids, generation numbers, and so on.

Unless I misunderstand something, this is actually already possible;
just poorly documented. You can use the 'subvolid=' mount option to
mount by object id; the object id can be found using the tool
btrfs subvolume list 

The subvolid= mount option was missing from the wiki, so I've just added
it:
https://btrfs.wiki.kernel.org/index.php/Getting_started#Mount_Options

-- 
Calvin Walton 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: mount snapshot by object-id

2010-12-09 Thread Dipl.-Ing. Michael Niederle

Hi, Calvin!

Thanks a lot for this information and for updating the wiki!

The option works - on healthy disks ...

I will continue writing my rescue-tool. I also wrote a btrfs_subvolumes command
that displays all subvolumes of an unmounted filesystem. This helps a lot if
mounting the filesystem leads to a crash!

Greetings, Michael




--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] fs/vfs/security: pass last path component to LSM on inode creation

2010-12-09 Thread John Stoffel

> "Eric" == Eric Paris  writes:

Eric> SELinux would like to implement a new labeling behavior of newly
Eric> created inodes.  We currently label new inodes based on the
Eric> parent and the creating process.  This new behavior would also
Eric> take into account the name of the new object when deciding the
Eric> new label.  This is not the (supposed) full path, just the last
Eric> component of the path.

Eric> This is very useful because creating /etc/shadow is different
Eric> than creating /etc/passwd but the kernel hooks are unable to
Eric> differentiate these operations.  We currently require that
Eric> userspace realize it is doing some difficult operation like that
Eric> and than userspace jumps through SELinux hoops to get things set
Eric> up correctly.  This patch does not implement new behavior, that
Eric> is obviously contained in a seperate SELinux patch, but it does
Eric> pass the needed name down to the correct LSM hook.  If no such
Eric> name exists it is fine to pass NULL.

I've looked this patch over, and maybe I'm missing something, but how
does knowing the name of the file really tell you anything, esp when
you only get the filename, not the path?  What threat are you
addressing with this change?  

So what happens when I create a file /home/john/shadow, does selinux
(or LSM in general) then run extra checks because the filename is
'shadow' in your model?  

I *think* the overhead shouldn't be there if SELINUX is disabled, but
have you confirmed this?  How you run performance tests before/after
this change when doing lots of creations of inodes to see what sort of
performance changes might be there?

Thanks,
John
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: What to do about subvolumes?

2010-12-09 Thread J. Bruce Fields

On Wed, Dec 08, 2010 at 09:41:33PM -0700, Andreas Dilger wrote:
> On 2010-12-08, at 16:07, Neil Brown wrote:
> > On Mon, 6 Dec 2010 11:48:45 -0500 "J. Bruce Fields" 
> > wrote:
> > 
> >> On Fri, Dec 03, 2010 at 04:01:44PM -0700, Andreas Dilger wrote:
> >>> Any chance we can add a ->get_fsid(sb, inode) method to
> >>> export_operations (or something simiar), that allows the
> >>> filesystem to generate an FSID based on the volume and
> >>> inode that is being exported?
> >> 
> >> No objection from here.
> > 
> > My standard objection here is that you cannot guarantee that the
> > fsid is 100% guarantied to be unique across all filesystems in
> > the system (including filesystems mounted from dm snapshots of
> > filesystems that are currently mounted).  NFSd needs this uniqueness.
> 
> Sure, but you also cannot guarantee that the devno is constant across 
> reboots, yet NFS continues to use this much-less-constant value...
> 
> > This is only really an objection if user-space cannot over-ride
> > the fsid provided by the filesystem.
> 
> Agreed.  It definitely makes sense to allow this, for whatever strange 
> circumstances might arise.  However, defaulting to using the filesystem UUID 
> definitely makes the most sense, and looking at the nfs-utils mountd code, it 
> seems that this is already standard behaviour for local block devices 
> (excluding "btrfs" filesystems).
> 
> > I'd be very happy to see an interface to user-space whereby
> > user-space can get a reasonably unique fsid for a given
> > filesystem.
> 
> Hmm, maybe I'm missing something, but why does userspace need to be able to 
> get this value?  I would think that nfsd gets it from the filesystem directly 
> in the kernel, but if a "uuid=" option is present in the exports file that is 
> preferentially used over the value from the filesystem.

Well, the kernel can't distinguish the case of an explicit "uuid="
option in /etc/exports from one that was (as is the normal default)
generated automatically by mountd.  Maybe not a big deal.

The uuid seems like a useful thing to have access to from userspace
anyway, for userspace nfs servers if for no other reason:

> That said, I think Aneesh's open_by_handle patchset also made the UUID 
> visible in /proc//mountinfo, after the filesystems stored it in
> sb->s_uuid at mount time.  That _should_ make it visible for non-block 
> mountpoints as well, assuming they fill in s_uuid.
> 
> > Whether this is an export_operations method or some field in the
> > 'struct super' which gets copied out doesn't matter to me.
> 
> Since Aneesh has already developed patches, is there any objection to using 
> those (last sent to linux-fsdevel on 2010-10-29):
> 
> [PATCH -V22 12/14] vfs: Export file system uuid via /proc//mountinfo
> [PATCH -V22 13/14] ext3: Copy fs UUID to superblock.
> [PATCH -V22 14/14] ext4: Copy fs UUID to superblock

I can't see anything wrong with that.

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] fs/vfs/security: pass last path component to LSM on inode creation

2010-12-09 Thread Eric Paris

On Thu, 2010-12-09 at 10:05 -0500, John Stoffel wrote:
> > "Eric" == Eric Paris  writes:

> So what happens when I create a file /home/john/shadow, does selinux
> (or LSM in general) then run extra checks because the filename is
> 'shadow' in your model?  

It's entirely a question of labeling and one that was discussed on the
LSM list in some detail:

http://marc.info/?t=12914130822&r=1&w=2

The basic synopsis is that when a new inode is created SELinux must
apply some label.  It makes the decision for what label to apply based
on 3 pieces of information.

The label of the parent inode.
The label of the process creating the new inode.
The 'class' of the inode, S_ISREG, S_ISDIR, S_ISLNK, etc

This patch adds a 4th piece of information, the name of the object being
created.  An obvious situation where this will be useful is devtmpfs
(although you'll find other examples in the above thread).  devtmpfs
when it creates char/block devices is unable to distinguish between kmem
and console and so they are created with a generic label.  hotplug/udev
is then called which does some pathname like matching and relabels them
to something more specific.  We've found that many people are able to
race against this particular updating and get spurious denials in /dev.
With this patch devtmpfs will be able to get the labels correct to begin
with.

I'm certainly willing to discuss the security implications of this
patch, but that would probably be best done with a significantly
shortened cc-list.  You'll see in the above mentioned thread that a
number of 'security' people (even those who are staunchly anti-SELinux)
recognize there is value in this and that it is certainly much better
than we have today.

> I *think* the overhead shouldn't be there if SELINUX is disabled, but
> have you confirmed this?  How you run performance tests before/after
> this change when doing lots of creations of inodes to see what sort of
> performance changes might be there?

I've actually recently done some perf testing on creating large numbers
of inodes using bonnie++, since SELinux was a noticeable overhead in
that operation.  Doing that same test with SELinux disabled (or enabled)
I do not see a noticeable difference when this patch is applied or not.
It's just an extra argument to a function that goes unused.

-Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] fs/vfs/security: pass last path component to LSM on inode creation

2010-12-09 Thread Serge Hallyn

Quoting John Stoffel (j...@stoffel.org):
> > "Eric" == Eric Paris  writes:
> 
> Eric> SELinux would like to implement a new labeling behavior of newly
> Eric> created inodes.  We currently label new inodes based on the
> Eric> parent and the creating process.  This new behavior would also
> Eric> take into account the name of the new object when deciding the
> Eric> new label.  This is not the (supposed) full path, just the last
> Eric> component of the path.
> 
> Eric> This is very useful because creating /etc/shadow is different
> Eric> than creating /etc/passwd but the kernel hooks are unable to
> Eric> differentiate these operations.  We currently require that
> Eric> userspace realize it is doing some difficult operation like that
> Eric> and than userspace jumps through SELinux hoops to get things set
> Eric> up correctly.  This patch does not implement new behavior, that
> Eric> is obviously contained in a seperate SELinux patch, but it does
> Eric> pass the needed name down to the correct LSM hook.  If no such
> Eric> name exists it is fine to pass NULL.
> 
> I've looked this patch over, and maybe I'm missing something, but how
> does knowing the name of the file really tell you anything, esp when
> you only get the filename, not the path?  What threat are you
> addressing with this change?  

Like you, I keep thinking back to this patch and going back and forth.
But to answer your question:  in some cases, the name of the file
(plus the context of the directory in which it is created) can tell
you what assumptions userspace will make about it.  And userspace most
definately is a part of the TCB, i.e. /bin/passwd and /bin/login.

-serge
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Fsck, parent transid verify failed

2010-12-09 Thread Tom Kuther

Chris Mason  oracle.com> writes:

> [...]
> Build the latest tools, then:
> 
> btrfsck -s 1 /dev/xxx
> btrfsck -s 2 /dev/xxx
> 
> If either of these work we have an easy way to get it mounted.  Just let
> me know.
> 

Hello,

I get those "parent transid verify failed" errors too after a system failure.

# btrfsck -s 1 /dev/md0 
using SB copy 1, bytenr 67108864
found 1954912653312 bytes used err is 0
total csum bytes: 1892054684
total tree bytes: 3455627264
total fs tree bytes: 1082691584
btree space waste bytes: 584155173
file data blocks allocated: 12808940421120
 referenced 1933520879616
Btrfs v0.19-35-g1b444cd-dirty
# btrfsck -s 2 /dev/md0 
using SB copy 2, bytenr 274877906944
found 1954912653312 bytes used err is 0
-snip-

Both seem to work.
What would be the steps to get it mounted?

Thanks in advance.

~thomas


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Fsck, parent transid verify failed

2010-12-09 Thread Chris Mason

Excerpts from Tommy Jonsson's message of 2010-12-08 15:07:58 -0500:
> >> Build the latest tools, then:
> >>
> >> btrfsck -s 1 /dev/xxx
> >> btrfsck -s 2 /dev/xxx
> >>
> >> If either of these work we have an easy way to get it mounted.  Just let
> >> me know.
> >>
> >> -chris
> >>
> 
> > $ btrfsck -s 1 /dev/sda
> > using SB copy 1, bytenr 67108864
> > parent transid verify failed on 2721514774528 wanted 39651 found 39649
> > btrfsck: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root->node)' 
> > failed.
> > Aborted
> >
> > $ btrfsck -s 2 /dev/sda
> > using SB copy 2, bytenr 274877906944
> > parent transid verify failed on 2721514774528 wanted 39651 found 39649
> > btrfsck: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root->node)' 
> > failed.
> > Aborted
> >
> > Tried "btrfs-debug-tree /dev/sda" and "btrfs-debug-tree -e /dev/sda" :
> > parent transid verify failed on 2721514774528 wanted 39651 found 39649
> > btrfs-debug-tree: disk-io.c:739: open_ctree_fd: Assertion
> > `!(!tree_root->node)' failed.
> >
> > dmesg said:
> > [268375.903581] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 2 
> > transid 39650 /dev/sdd
> > [268375.904241] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 1 
> > transid 39651 /dev/sdc
> > [268375.904526] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 3 
> > transid 39651 /dev/sda
> 
> Sorry to be a bother, but do you have any other suggestions ?

Not a bother at all, I'm polishing off a version of fsck that I hope
will be able to construct a good tree for you.  It's my main priority
right now and I hope to have something ready early Monday.

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Permissions model for btrfs?

2010-12-09 Thread Wayne Pollock

I looked though the wiki (and "searched" the archives) but
don't see an answer.  Will btrfs support old POSIX-style ACLs
and permissions, or the new NFS/NT style ACLs like ZFS?  From
the patch I saw, it seems old POSIX ACLs and permissions, but
I'd like to know for sure.  (And maybe the FAQ on the wiki could
address this?)  If it is the older POSIC ACLs, is there any plan
to support NFSv4 ACLs in the future?

On a related note, will btrfs support any ext4 attributes (via chattr)?

Thanks!

-- 
Wayne Pollock
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] fs/vfs/security: pass last path component to LSM on inode creation

2010-12-09 Thread John Stoffel

> "Eric" == Eric Paris  writes:

Eric> On Thu, 2010-12-09 at 10:05 -0500, John Stoffel wrote:
>> > "Eric" == Eric Paris  writes:

>> So what happens when I create a file /home/john/shadow, does selinux
>> (or LSM in general) then run extra checks because the filename is
>> 'shadow' in your model?  

Eric> It's entirely a question of labeling and one that was discussed on the
Eric> LSM list in some detail:

Eric> http://marc.info/?t=12914130822&r=1&w=2

Thank you for pointing me at this discussion.  I'm working my way
through it, but so far I'm not seeing any consensus that this is
really the proper thing to do.  I personally feel this should be in
userspace if at all possible.

Eric> The basic synopsis is that when a new inode is created SELinux
Eric> must apply some label.  It makes the decision for what label to
Eric> apply based on 3 pieces of information.

Eric> The label of the parent inode.
Eric> The label of the process creating the new inode.
Eric> The 'class' of the inode, S_ISREG, S_ISDIR, S_ISLNK, etc

These seem to be ok, if you're using label based security.  But since
I freely admit I'm not an expert or even a user, I'm just trying to
understand and push back to make sure we do what's good.  And which
doesn't impact non-SElinux users.  

Eric> This patch adds a 4th piece of information, the name of the
Eric> object being created.  An obvious situation where this will be
Eric> useful is devtmpfs (although you'll find other examples in the
Eric> above thread).  devtmpfs when it creates char/block devices is
Eric> unable to distinguish between kmem and console and so they are
Eric> created with a generic label.  hotplug/udev is then called which
Eric> does some pathname like matching and relabels them to something
Eric> more specific.  We've found that many people are able to race
Eric> against this particular updating and get spurious denials in
Eric> /dev.  With this patch devtmpfs will be able to get the labels
Eric> correct to begin with.

So your Label based access controls are *also* based on pathnames?
Right?  

Eric> I'm certainly willing to discuss the security implications of this
Eric> patch, but that would probably be best done with a significantly
Eric> shortened cc-list.  You'll see in the above mentioned thread that a
Eric> number of 'security' people (even those who are staunchly anti-SELinux)
Eric> recognize there is value in this and that it is certainly much better
Eric> than we have today.

>> I *think* the overhead shouldn't be there if SELINUX is disabled, but
>> have you confirmed this?  How you run performance tests before/after
>> this change when doing lots of creations of inodes to see what sort of
>> performance changes might be there?

Eric> I've actually recently done some perf testing on creating large
Eric> numbers of inodes using bonnie++, since SELinux was a noticeable
Eric> overhead in that operation.  Doing that same test with SELinux
Eric> disabled (or enabled) I do not see a noticeable difference when
Eric> this patch is applied or not.  It's just an extra argument to a
Eric> function that goes unused.

That answers alot of my concerns then.  Not having it impact users in
a non-SELinux context is vitally important to me.

Thanks,
John
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-09 Thread Ted Ts'o

On Tue, Dec 07, 2010 at 09:37:20PM -0600, Jon Nelson wrote:
> One difference is the location of the transaction logs (pg_xlog). In
> my case, /var/lib/pgsql/data *is* mountpoint for the test volume
> (actually, it's a symlink to the mount point). In your case, that is
> not so. Perhaps that makes a difference?  pgsql_tmp might also be on
> two different volumes in your case (I can't be sure).

I just tried tried to run t.sql five times with /var/lib/postgres as a
mountpoint for a 5400 rpm disk, and I still haven't been able to
replicate it.

If you can point out how to query pgsql_tmp (I'm using a completely
default postgres install), that would be helpful, but I don't think it
would be going anywhere else.

Still trying

- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Permissions model for btrfs?

2010-12-09 Thread Sean Bartell

On Thu, Dec 09, 2010 at 12:35:35PM -0500, Wayne Pollock wrote:
> I looked though the wiki (and "searched" the archives) but
> don't see an answer.  Will btrfs support old POSIX-style ACLs
> and permissions, or the new NFS/NT style ACLs like ZFS?  From
> the patch I saw, it seems old POSIX ACLs and permissions, but
> I'd like to know for sure.  (And maybe the FAQ on the wiki could
> address this?)  If it is the older POSIC ACLs, is there any plan
> to support NFSv4 ACLs in the future?

Right now it supports POSIX ACLs. I don't know about future plans.

> On a related note, will btrfs support any ext4 attributes (via chattr)?

It currently supports AaDdiS.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] fs/vfs/security: pass last path component to LSM on inode creation

2010-12-09 Thread Eric Paris

On Thu, 2010-12-09 at 12:48 -0500, John Stoffel wrote:
> > "Eric" == Eric Paris  writes:
> 
> Eric> On Thu, 2010-12-09 at 10:05 -0500, John Stoffel wrote:
> >> > "Eric" == Eric Paris  writes:
> 
> Eric> This patch adds a 4th piece of information, the name of the
> Eric> object being created.  An obvious situation where this will be
> Eric> useful is devtmpfs (although you'll find other examples in the
> Eric> above thread).  devtmpfs when it creates char/block devices is
> Eric> unable to distinguish between kmem and console and so they are
> Eric> created with a generic label.  hotplug/udev is then called which
> Eric> does some pathname like matching and relabels them to something
> Eric> more specific.  We've found that many people are able to race
> Eric> against this particular updating and get spurious denials in
> Eric> /dev.  With this patch devtmpfs will be able to get the labels
> Eric> correct to begin with.
> 
> So your Label based access controls are *also* based on pathnames?
> Right?

Access decisions are still based solely on the label.  This patch can
influence how new objects get their label, which makes the access
decisions indirectly path based.  You'll find a reasonable summary and
commentary on lwn in this weeks security section.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-09 Thread Jon Nelson

On Thu, Dec 9, 2010 at 12:01 PM, Ted Ts'o  wrote:
> On Tue, Dec 07, 2010 at 09:37:20PM -0600, Jon Nelson wrote:
>> One difference is the location of the transaction logs (pg_xlog). In
>> my case, /var/lib/pgsql/data *is* mountpoint for the test volume
>> (actually, it's a symlink to the mount point). In your case, that is
>> not so. Perhaps that makes a difference?  pgsql_tmp might also be on
>> two different volumes in your case (I can't be sure).
>
> I just tried tried to run t.sql five times with /var/lib/postgres as a
> mountpoint for a 5400 rpm disk, and I still haven't been able to
> replicate it.

You should be OK, there. Are you using encryption or no?
I had difficulty replicating the issue without encryption.

> If you can point out how to query pgsql_tmp (I'm using a completely
> default postgres install), that would be helpful, but I don't think it
> would be going anywhere else.

Normally it's /var/lib/pgsql/data/pgsql_tmp (or
/var/lib/postgres/data/pgsql_tmp in your case). By placing
/var/lib/{postgresql,pgsql}/data on the LUKS + ext4 volume, on both
openSUSE 11.3 and Kubuntu, I was able to replicate the problem easily,
in VirtualBox. I can give qemu a try. In both cases I was using a
2.6.37x kernel.

-- 
Jon
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] Btrfs-progs: update super fields for space cache

2010-12-09 Thread Josef Bacik

This patch updates the super field to add the cache_generation member.  It also
makes us set it to -1 on mkfs so any new filesystem will get the space cache
stuff turned on.  Thanks,

Signed-off-by: Josef Bacik 
---
 ctree.h |6 +-
 utils.c |1 +
 2 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/ctree.h b/ctree.h
index b79e238..962c510 100644
--- a/ctree.h
+++ b/ctree.h
@@ -340,8 +340,10 @@ struct btrfs_super_block {
 
char label[BTRFS_LABEL_SIZE];
 
+   __le64 cache_generation;
+
/* future expansion */
-   __le64 reserved[32];
+   __le64 reserved[31];
u8 sys_chunk_array[BTRFS_SYSTEM_CHUNK_ARRAY_SIZE];
 } __attribute__ ((__packed__));
 
@@ -1564,6 +1566,8 @@ BTRFS_SETGET_STACK_FUNCS(super_incompat_flags, struct 
btrfs_super_block,
 incompat_flags, 64);
 BTRFS_SETGET_STACK_FUNCS(super_csum_type, struct btrfs_super_block,
 csum_type, 16);
+BTRFS_SETGET_STACK_FUNCS(super_cache_generation, struct btrfs_super_block,
+cache_generation, 64);
 
 static inline int btrfs_super_csum_size(struct btrfs_super_block *s)
 {
diff --git a/utils.c b/utils.c
index 2f4c6e1..b890728 100644
--- a/utils.c
+++ b/utils.c
@@ -103,6 +103,7 @@ int make_btrfs(int fd, const char *device, const char 
*label,
btrfs_set_super_stripesize(&super, stripesize);
btrfs_set_super_csum_type(&super, BTRFS_CSUM_TYPE_CRC32);
btrfs_set_super_chunk_root_generation(&super, 1);
+   btrfs_set_super_cache_generation(&super, -1);
if (label)
strcpy(super.label, label);
 
-- 
1.6.6.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] Btrfs-progs: add support for mixed data+metadata block groups

2010-12-09 Thread Josef Bacik

So alot of crazy people (I'm looking at you Meego) want to use btrfs on phones
and such with small devices.  Unfortunately the way we split out metadata/data
chunks it makes space usage inefficient for volumes that are smaller than
1gigabyte.  So add a -M option for mixing metadata+data, and default to this
mixed mode if the filesystem is less than or equal to 1 gigabyte.  I've tested
this with xfstests on a 100mb filesystem and everything is a-ok.

Signed-off-by: Josef Bacik 
---
 btrfs-vol.c  |4 +-
 btrfs_cmds.c |   13 +-
 ctree.h  |   10 +++--
 mkfs.c   |  122 +-
 utils.c  |   10 ++--
 utils.h  |2 +-
 6 files changed, 112 insertions(+), 49 deletions(-)

diff --git a/btrfs-vol.c b/btrfs-vol.c
index 8069778..7200bbc 100644
--- a/btrfs-vol.c
+++ b/btrfs-vol.c
@@ -129,7 +129,9 @@ int main(int ac, char **av)
exit(1);
}
if (cmd == BTRFS_IOC_ADD_DEV) {
-   ret = btrfs_prepare_device(devfd, device, 1, &dev_block_count);
+   int mixed = 0;
+
+   ret = btrfs_prepare_device(devfd, device, 1, &dev_block_count, 
&mixed);
if (ret) {
fprintf(stderr, "Unable to init %s\n", device);
exit(1);
diff --git a/btrfs_cmds.c b/btrfs_cmds.c
index 8031c58..683aec0 100644
--- a/btrfs_cmds.c
+++ b/btrfs_cmds.c
@@ -705,6 +705,7 @@ int do_add_volume(int nargs, char **args)
int devfd, res;
u64 dev_block_count = 0;
struct stat st;
+   int mixed = 0;
 
devfd = open(args[i], O_RDWR);
if (!devfd) {
@@ -727,7 +728,7 @@ int do_add_volume(int nargs, char **args)
continue;
}
 
-   res = btrfs_prepare_device(devfd, args[i], 1, &dev_block_count);
+   res = btrfs_prepare_device(devfd, args[i], 1, &dev_block_count, 
&mixed);
if (res) {
fprintf(stderr, "ERROR: Unable to init '%s'\n", 
args[i]);
close(devfd);
@@ -889,8 +890,14 @@ int do_df_filesystem(int nargs, char **argv)
memset(description, 0, 80);
 
if (flags & BTRFS_BLOCK_GROUP_DATA) {
-   snprintf(description, 5, "%s", "Data");
-   written += 4;
+   if (flags & BTRFS_BLOCK_GROUP_METADATA) {
+   snprintf(description, 15, "%s",
+"Data+Metadata");
+   written += 14;
+   } else {
+   snprintf(description, 5, "%s", "Data");
+   written += 4;
+   }
} else if (flags & BTRFS_BLOCK_GROUP_SYSTEM) {
snprintf(description, 7, "%s", "System");
written += 6;
diff --git a/ctree.h b/ctree.h
index 962c510..ed83d02 100644
--- a/ctree.h
+++ b/ctree.h
@@ -352,13 +352,15 @@ struct btrfs_super_block {
  * ones specified below then we will fail to mount
  */
 #define BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF   (1ULL << 0)
-#define BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL  (2ULL << 0)
+#define BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL  (1ULL << 1)
+#define BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS(1ULL << 2)
 
 #define BTRFS_FEATURE_COMPAT_SUPP  0ULL
 #define BTRFS_FEATURE_COMPAT_RO_SUPP   0ULL
-#define BTRFS_FEATURE_INCOMPAT_SUPP\
-   (BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF | \
-BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL)
+#define BTRFS_FEATURE_INCOMPAT_SUPP\
+   (BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF | \
+BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL |\
+BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS)
 
 /*
  * A leaf is full of items. offset and size tell us where to find
diff --git a/mkfs.c b/mkfs.c
index 2e99b95..04de93a 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -69,7 +69,7 @@ static u64 parse_size(char *s)
return atol(s) * mult;
 }
 
-static int make_root_dir(struct btrfs_root *root)
+static int make_root_dir(struct btrfs_root *root, int mixed)
 {
struct btrfs_trans_handle *trans;
struct btrfs_key location;
@@ -88,30 +88,47 @@ static int make_root_dir(struct btrfs_root *root)
 0, BTRFS_MKFS_SYSTEM_GROUP_SIZE);
BUG_ON(ret);
 
-   ret = btrfs_alloc_chunk(trans, root->fs_info->extent_root,
-   &chunk_start, &chunk_size,
-   BTRFS_BLOCK_GROUP_METADATA);
-   BUG_ON(ret);
-   ret = btrfs_make_block_group(trans, root, 0,
-BTRFS_BLOCK_GROUP_METADATA,
-BTRFS_FIRST_CHUNK_TREE_OBJECTID,
-chunk_start, chunk_size);
-   BUG_ON(ret);
+   if (mixed) {
+

Re: [PATCH v2 4/5] Btrfs: Add readonly snapshots support

2010-12-09 Thread Goffredo Baroncelli

Hi Li,

On Thursday, 09 December, 2010, Li Zefan wrote:
> Usage:
> 
> Set BTRFS_SUBVOL_RDONLY of btrfs_ioctl_vol_arg_v2->flags, and call
> ioctl(BTRFS_I0CTL_SNAP_CREATE_V2).
> 
> Implementation:
> 
> - In disk set readonly bit of btrfs_root_item->flags, and in memory
> set btrfs_root->readonly.
> 
> - Add readonly checks in btrfs_permission (inode_permission),
> btrfs_setattr, btrfs_set/remove_xattr and some ioctls.
> 
> Signed-off-by: Li Zefan 
> ---
>  fs/btrfs/ctree.h   |3 +++
>  fs/btrfs/disk-io.c |5 +
>  fs/btrfs/inode.c   |8 
>  fs/btrfs/ioctl.c   |   40 +++-
>  fs/btrfs/ioctl.h   |1 +
>  fs/btrfs/transaction.c |8 
>  fs/btrfs/transaction.h |1 +
>  fs/btrfs/xattr.c   |   18 ++
>  8 files changed, 75 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index af52f6d..ad37c78 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -597,6 +597,8 @@ struct btrfs_dir_item {
>   u8 type;
>  } __attribute__ ((__packed__));
>  
> +#define BTRFS_ROOT_SNAP_RDONLY   (1ULL << 0)
> +
>  struct btrfs_root_item {
>   struct btrfs_inode_item inode;
>   __le64 generation;
> @@ -1116,6 +1118,7 @@ struct btrfs_root {
>   int defrag_running;
>   char *name;
>   int in_sysfs;
> + bool readonly;

Does make sense to store the same information in two places ?
If we have access to root->readonly, we have also access to "root-
>root_item.flags". Because we need the latter, we can get rid of the former.


We can replace a test like

if(root->readonly) 

with

if(root->root_item.flags & BTRFS_ROOT_SNAP_RDONLY)

Or better we can create a two macros like:

#define btrfs_root_readonly(x) ((x)->root_item.flags & BTRFS_ROOT_SNAP_RDONLY)
#define btrfs_root_set_readonly(x, ro)  \
do{ (x)->root_item.flags =  \
((x)->root_item.flags & ~BTRFS_ROOT_SNAP_RDONLY) |  \
(ro ? BTRFS_ROOT_SNAP_RDONLY : 0 ); \
}while(0)


Sorry for to be too late for this kind of suggestion. But I think that this 
optimization may help to avoid misalignment between the two variables (see my 
reply in the next patch).
[...]
-- 
gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) 
Key fingerprint = 4769 7E51 5293 D36C 814E  C054 BF04 F161 3DC5 0512
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: btrfs subvolume snapshot hung in btrfs_commit_transaction

2010-12-09 Thread Ian! D. Allen

Hello developers - Last chance to get more details on this btrfs hang
(below) before I reboot the machine.  Anything I can do to gather
more data?

Linux linux 2.6.35-23-generic #40-Ubuntu SMP Wed Nov 17 22:14:33 UTC 2010 
x86_64 GNU/Linux

Description: Ubuntu 10.10

Package: btrfs-tools
Status: install ok installed
Priority: optional
Section: admin
Installed-Size: 1408
Maintainer: Ubuntu Developers 
Architecture: amd64
Version: 0.19+20100601-3
Depends: e2fslibs (>= 1.37), libc6 (>= 2.7), libcomerr2 (>= 1.01), libuuid1 (>= 
2.16), zlib1g (>= 1:1.2.0)
Original-Maintainer: Daniel Baumann 
Homepage: http://btrfs.wiki.kernel.org/

On Wed, Dec 08, 2010 at 09:01:02AM -0500, Ian! D. Allen wrote:
> I've been exercising btrfs doing a continuous loop of:
> 
> - delete an old snapshot to keep disk space about the same
> - create snapshot from previous snapshot
> - rsync root into new snapshot
> 
> I have room for 150 snapshots on disk.  I delete the oldest, create
> the newest, do the rsync into the newest, repeat.  It hung today on
> snapshot 564:
> 
> $ ps uww 24575
> USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
> root 24575  0.0  0.0   6224   332 pts/10   DN   07:35   0:00 btrfs 
> subvolume snapshot /mnt/sde1/snap564 /mnt/sde1/snap565
> 
> $ ps lww 24575
> F   UID   PID  PPID PRI  NIVSZ   RSS WCHAN  STAT TTYTIME COMMAND
> 4 0 24575 27716  35   -   6224   332 btrfs_ DN   pts/10 0:00 btrfs 
> subvolume snapshot /mnt/sde1/snap564 /mnt/sde1/snap565
> 
> $ ps -o wchan 24575
> WCHAN
> btrfs_commit_transaction
> 
> No messages in "dmesg" or kernel log.  Anyone want me to run some other
> debug tests to find out what is wrong?  Anything that tries to access
> anything inside the btrfs file system /dev/sde1 hangs uninterruptably:
> 
> 1 0  1863 2  20   0  0 0 wait_f D?  0:29 
> [btrfs-transacti]
> 4 0  4933  4925  20   0  26524  2864 lookup D+   pts/10 0:02 /bin/bash
> 1   777 27995  7318  20   0  26576  1784 vfs_re D+   pts/52 0:00 bash
> 0   777 29395  7284  20   0  21856   688 vfs_re Dpts/51 0:00 ls -abp 
> --color=auto /mnt/sde1
> 0   777 29510  7284  20   0  21856   692 vfs_re Dpts/51 0:00 /bin/ls 
> /mnt/sde1
> 
> $ ps -o wchan 1863
> WCHAN
> wait_for_commit
> 
> $ ps -o wchan 27995
> WCHAN
> vfs_readdir

-- 
| Ian! D. Allen  -  idal...@idallen.ca  -  Ottawa, Ontario, Canada
| Home Page: http://idallen.com/   Contact Improv: http://contactimprov.ca/
| College professor (Free/Libre GNU+Linux) at: http://teaching.idallen.com/
| Defend digital freedom:  http://eff.org/  and have fun:  http://fools.ca/
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: What to do about subvolumes?

2010-12-09 Thread Martin Steigerwald

Am Mittwoch 01 Dezember 2010 schrieb Mike Hommey:
> On Wed, Dec 01, 2010 at 11:01:37AM -0500, Chris Mason wrote:
> > Excerpts from C Anthony Risinger's message of 2010-12-01 09:51:55 
-0500:
> > > On Wed, Dec 1, 2010 at 8:21 AM, Josef Bacik  
wrote:
> > > > === How do we want subvolumes to work from a user perspective?
> > > > ===
> > > > 
> > > > 1) Users need to be able to create their own subvolumes. Â The
> > > > permission semantics will be absolutely the same as creating
> > > > directories, so I don't think this is too tricky. Â We want this
> > > > because you can only take snapshots of subvolumes, and so it is
> > > > important that users be able to create their own discrete
> > > > snapshottable targets.
> > > > 
> > > > 2) Users need to be able to snapshot their subvolumes. Â This is
> > > > basically the same as #1, but it bears repeating.
> > > 
> > > could it be possible to convert a directory into a volume?  or at
> > > least base a snapshot off it?
> > 
> > I'm afraid this turns into the same complexity as creating a new
> > volume and copying all the files/dirs in by hand.
> 
> Except you wouldn't have to copy data, only metadata.

And it could probably be race-free. If I'd  cp -reflink or rsync stuff from 
a real directory to a subvolume and then rename the old directory to an 
other name and the subvolume to the directory name then I might be missing 
files that have been created during the copy process and missing changes to 
files that have been already copied.

What I would like is an easy way to make ~/.kde or whatever a subvolume to 
be able to snapshot it independently while KDE applications or whatever is 
using and writing to it, *without* any userland even noticing it and 
without - except for metadata for managing the subvolume - any additional 
space consumption.

So

deepdance:/#12> btrfs subvolume create /home/martin/.kde
ERROR: '/home/martin/.kde' exists

would just make a subvolume out of ~/.kde even if it needs splitting out 
the tree or even copying the tree data into a new tree.

There are other filesystem operations like btrfs filesystem balance that can 
be expensive as well.

All that said from a user point of view. Maybe technical its not feasible. 
But it would be nice if it can be made feasible without loosing existing 
advantages.

And maybe

deepdance:/> btrfs subvolume create .   
ERROR: '.' exists

should really remain this way ;).

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

signature.asc
Description: This is a digitally signed message part.

Re: [PATCH v2 5/5] Btrfs: Add BTRFS_IOC_SUBVOL_GETFLAGS/SETFLAGS ioctl

2010-12-09 Thread Goffredo Baroncelli

Hi Li,

On Thursday, 09 December, 2010, Li Zefan wrote:
> This allows us to set a snapshot or a subvolume readonly or writable
> on the fly.
> 
> Usage:
> 
> Set BTRFS_SUBVOL_RDONLY of btrfs_ioctl_vol_arg_v2->flags, and then
> call ioctl(BTRFS_IOCTL_SUBVOL_SETFLAGS);
> 
> Changelog for v2:
> - Add _GETFLAGS ioctl.
> - Check if the passed fd is the root of a subvolume.
> - Change the name from _SNAP_SETFLAGS to _SUBVOL_SETFLAGS.
> 
> Signed-off-by: Li Zefan 
> ---
>  fs/btrfs/ioctl.c |   92 
++
>  fs/btrfs/ioctl.h |4 ++
>  2 files changed, 96 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index db2b231..29304c7 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -1010,6 +1010,94 @@ out:
[...]
> + struct btrfs_ioctl_vol_args_v2 *vol_args;
> + int ret = 0;
> + u64 flags = 0;
> +
> + if (inode->i_ino != BTRFS_FIRST_FREE_OBJECTID)
> + return -EINVAL;
> +
> + vol_args = memdup_user(arg, sizeof(*vol_args));

Would be better to avoid a dynamic allocation for a so small struct ? 
And as more general comment: what is the reason to pass a so complex struct 
only for setting/getting the flags ?
Would be it better to pass directly a u64 variable.

[..]
> +
> +static noinline int btrfs_ioctl_subvol_setflags(struct file *file,
> +   void __user *arg)
> +{
> + struct inode *inode = fdentry(file)->d_inode;
> + struct btrfs_root *root = BTRFS_I(inode)->root;
> + struct btrfs_trans_handle *trans;
> + struct btrfs_ioctl_vol_args_v2 *vol_args;
> + u64 root_flags;
> + u64 flags;
> + int ret = 0;
> +
> + if (root->fs_info->sb->s_flags & MS_RDONLY)
> + return -EROFS;
> +
> + if (inode->i_ino != BTRFS_FIRST_FREE_OBJECTID)
> + return -EINVAL;
> +
> + vol_args = memdup_user(arg, sizeof(*vol_args));

Same as before.

> + if (IS_ERR(vol_args))
> + return PTR_ERR(vol_args);
> + flags = vol_args->flags;
> +
> + if (flags & ~BTRFS_SUBVOL_RDONLY) {
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + down_write(&root->fs_info->subvol_sem);
> +
> + /* nothing to do */
> + if (!!(flags & BTRFS_SUBVOL_RDONLY) == root->readonly)
> + goto out_unlock;

This is only an aesthetic comment: I prefer a simpler style like

if ((flags & BTRFS_SUBVOL_RDONLY) && root->readonly)
goto out_unlock;

But I know that every body has its style :-)

> +
> + root_flags = btrfs_root_flags(&root->root_item);
> + if (flags & BTRFS_SUBVOL_RDONLY)
> + btrfs_set_root_flags(&root->root_item,
> +  root_flags | BTRFS_ROOT_SNAP_RDONLY);
> + else
> + btrfs_set_root_flags(&root->root_item,
> +  root_flags & ~BTRFS_ROOT_SNAP_RDONLY);
> + root->readonly = !root->readonly;

I double checked this line. But if I read the code correctly I think that the 
line above is wrong: the field "root->readonly" is flipped regardless the 
value of the flags passed by the user.

Moreover I have another question: why internally the flags is 
BTRFS_ROOT_SNAP_RDONLY, instead in user space the flag is BTRFS_SUBVOL_RDONLY 
? 

I suggest to 
- rename BTRFS_SUBVOL_RDONLY in BTRFS_SUBVOL_CREATE_SNAP_RDONLY (like 
BTRFS_SUBVOL_CREATE_SNAP_ASYNC)
- rename BTRFS_ROOT_SNAP_RDONLY in BTRFS_ROOT_FLAGS_RDONLY and use both in 
userspace and in kernel space this flag. I suggested to remove SNAP because 
the flag make sense also for a "standard" subvolume.


Gegards 
G.Baroncelli


-- 
gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) 
Key fingerprint = 4769 7E51 5293 D36C 814E  C054 BF04 F161 3DC5 0512
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-09 Thread Ted Ts'o

On Thu, Dec 09, 2010 at 12:10:58PM -0600, Jon Nelson wrote:
> 
> You should be OK, there. Are you using encryption or no?
> I had difficulty replicating the issue without encryption.

Yes, I'm using encryption.  LUKS with aes-xts-plain-sha256, and then
LVM on top of LUKS.

> > If you can point out how to query pgsql_tmp (I'm using a completely
> > default postgres install), that would be helpful, but I don't think it
> > would be going anywhere else.
> 
> Normally it's /var/lib/pgsql/data/pgsql_tmp (or
> /var/lib/postgres/data/pgsql_tmp in your case). By placing
> /var/lib/{postgresql,pgsql}/data on the LUKS + ext4 volume, on both
> openSUSE 11.3 and Kubuntu, I was able to replicate the problem easily,
> in VirtualBox. I can give qemu a try. In both cases I was using a
> 2.6.37x kernel.

Ah, I'm not using virtualization.  I'm running on a X410 laptop, on
raw hardware.  Perhaps virtualization slows things down enough that it
triggers?  Or maybe you're running with a more constrained memory than
I?  How much memory do you have configured in your VM?

  - Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-09 Thread Jon Nelson

On Thu, Dec 9, 2010 at 2:13 PM, Ted Ts'o  wrote:
> On Thu, Dec 09, 2010 at 12:10:58PM -0600, Jon Nelson wrote:
>>
>> You should be OK, there. Are you using encryption or no?
>> I had difficulty replicating the issue without encryption.
>
> Yes, I'm using encryption.  LUKS with aes-xts-plain-sha256, and then
> LVM on top of LUKS.

Hmm.
The cipher is listed as:
aes-cbc-essiv:sha256

>> > If you can point out how to query pgsql_tmp (I'm using a completely
>> > default postgres install), that would be helpful, but I don't think it
>> > would be going anywhere else.
>>
>> Normally it's /var/lib/pgsql/data/pgsql_tmp (or
>> /var/lib/postgres/data/pgsql_tmp in your case). By placing
>> /var/lib/{postgresql,pgsql}/data on the LUKS + ext4 volume, on both
>> openSUSE 11.3 and Kubuntu, I was able to replicate the problem easily,
>> in VirtualBox. I can give qemu a try. In both cases I was using a
>> 2.6.37x kernel.
>
> Ah, I'm not using virtualization.  I'm running on a X410 laptop, on
> raw hardware.  Perhaps virtualization slows things down enough that it
> triggers?  Or maybe you're running with a more constrained memory than
> I?  How much memory do you have configured in your VM?

512MB.

'free' reports 75MB, 419MB free.

I originally noticed the problem on really real hardware (thinkpad
T61p), however.

-- 
Jon
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-09 Thread Andi Kleen

> 512MB.
> 
> 'free' reports 75MB, 419MB free.
> 
> I originally noticed the problem on really real hardware (thinkpad
> T61p), however.

If you can easily reproduce it could you try a git bisect?

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] Btrfs: fix sync subvol/snapshot creation

2010-12-09 Thread Sage Weil

We were incorrectly taking the async path even for the sync ioctls by
passing in &transid unconditionally.

There's ample room for further cleanup here, but this keeps the fix simple.

Signed-off-by: Sage Weil 
---
 fs/btrfs/ioctl.c |   20 +++-
 1 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 41614c3..4c2d7c4 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -970,6 +970,15 @@ static noinline int btrfs_ioctl_snap_create(struct file 
*file,
name = async_vol_args->name;
fd = async_vol_args->fd;
async_vol_args->name[BTRFS_SNAPSHOT_NAME_MAX] = '\0';
+
+   ret = btrfs_ioctl_snap_create_transid(file, name, fd,
+ subvol, &transid);
+
+   if (ret == 0 &&
+   copy_to_user(arg +
+offsetof(struct btrfs_ioctl_async_vol_args,
+ transid), &transid, sizeof(transid)))
+   ret = -EFAULT;
} else {
vol_args = memdup_user(arg, sizeof(*vol_args));
if (IS_ERR(vol_args))
@@ -977,16 +986,9 @@ static noinline int btrfs_ioctl_snap_create(struct file 
*file,
name = vol_args->name;
fd = vol_args->fd;
vol_args->name[BTRFS_PATH_NAME_MAX] = '\0';
-   }
-
-   ret = btrfs_ioctl_snap_create_transid(file, name, fd,
- subvol, &transid);
 
-   if (!ret && async) {
-   if (copy_to_user(arg +
-   offsetof(struct btrfs_ioctl_async_vol_args,
-   transid), &transid, sizeof(transid)))
-   return -EFAULT;
+   ret = btrfs_ioctl_snap_create_transid(file, name, fd,
+ subvol, NULL);
}
 
kfree(vol_args);
-- 
1.6.6.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-09 Thread Chris Mason

Excerpts from Andi Kleen's message of 2010-12-09 18:16:16 -0500:
> > 512MB.
> > 
> > 'free' reports 75MB, 419MB free.
> > 
> > I originally noticed the problem on really real hardware (thinkpad
> > T61p), however.
> 
> If you can easily reproduce it could you try a git bisect?

Do we have a known good kernel?  I looked back through the thread and
didn't see any reports where the postgres test on ext4 passed in this
config.

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-09 Thread Matt

On Fri, Dec 10, 2010 at 2:38 AM, Chris Mason  wrote:
> Excerpts from Andi Kleen's message of 2010-12-09 18:16:16 -0500:
>> > 512MB.
>> >
>> > 'free' reports 75MB, 419MB free.
>> >
>> > I originally noticed the problem on really real hardware (thinkpad
>> > T61p), however.
>>
>> If you can easily reproduce it could you try a git bisect?
>
> Do we have a known good kernel?  I looked back through the thread and
> didn't see any reports where the postgres test on ext4 passed in this
> config.
>
> -chris
>

Try a kernel before 5a87b7a5da250c9be6d757758425dfeaf8ed3179

from the tests I've done that one showed the least or no corruption if
you count the empty /etc/env.d/03opengl as an artefact

(I tested 3 commits in total)


1) 5a87b7a5da250c9be6d757758425dfeaf8ed3179

2) 1de3e3df917459422cb2aecac440febc8879d410

3) bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc

1 -> 3 (earlier -> later)

Regards

Matt
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-09 Thread Mike Fedyk

On Thu, Dec 9, 2010 at 5:38 PM, Chris Mason  wrote:
> Excerpts from Andi Kleen's message of 2010-12-09 18:16:16 -0500:
>> > 512MB.
>> >
>> > 'free' reports 75MB, 419MB free.
>> >
>> > I originally noticed the problem on really real hardware (thinkpad
>> > T61p), however.
>>
>> If you can easily reproduce it could you try a git bisect?
>
> Do we have a known good kernel?  I looked back through the thread and
> didn't see any reports where the postgres test on ext4 passed in this
> config.
>

2.6.34.something.  -- Any chance a newer kernel can be tested to be found good?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-09 Thread Chris Mason

Excerpts from Mike Fedyk's message of 2010-12-09 20:58:40 -0500:
> On Thu, Dec 9, 2010 at 5:38 PM, Chris Mason  wrote:
> > Excerpts from Andi Kleen's message of 2010-12-09 18:16:16 -0500:
> >> > 512MB.
> >> >
> >> > 'free' reports 75MB, 419MB free.
> >> >
> >> > I originally noticed the problem on really real hardware (thinkpad
> >> > T61p), however.
> >>
> >> If you can easily reproduce it could you try a git bisect?
> >
> > Do we have a known good kernel?  I looked back through the thread and
> > didn't see any reports where the postgres test on ext4 passed in this
> > config.
> >
> 
> 2.6.34.something.  -- Any chance a newer kernel can be tested to be found 
> good?

But he is triggering the ext4 corruption without dm-crypt.  I think
dm-crypt was still used somewhere on the system during the test, just
not on the partitions that actually hit the corruption.

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-09 Thread Jon Nelson

On Thu, Dec 9, 2010 at 8:00 PM, Chris Mason  wrote:
> Excerpts from Mike Fedyk's message of 2010-12-09 20:58:40 -0500:
>> On Thu, Dec 9, 2010 at 5:38 PM, Chris Mason  wrote:
>> > Excerpts from Andi Kleen's message of 2010-12-09 18:16:16 -0500:
>> >> > 512MB.
>> >> >
>> >> > 'free' reports 75MB, 419MB free.
>> >> >
>> >> > I originally noticed the problem on really real hardware (thinkpad
>> >> > T61p), however.
>> >>
>> >> If you can easily reproduce it could you try a git bisect?
>> >
>> > Do we have a known good kernel?  I looked back through the thread and
>> > didn't see any reports where the postgres test on ext4 passed in this
>> > config.
>> >
>>
>> 2.6.34.something.  -- Any chance a newer kernel can be tested to be found 
>> good?
>
> But he is triggering the ext4 corruption without dm-crypt.  I think
> dm-crypt was still used somewhere on the system during the test, just
> not on the partitions that actually hit the corruption.

I now have my doubts about being able to trigger it without dm-crypt.
I stand by my report, but I'm also unable to reproduce it after after
50 test iterations. I did run 2.6.36 for a few weeks without any issue
that I recall.



-- 
Jon
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Btrfs: fix sync subvol/snapshot creation

2010-12-09 Thread Li Zefan

Sage Weil wrote:
> We were incorrectly taking the async path even for the sync ioctls by
> passing in &transid unconditionally.
> 

Ha, I fixed this accidentally in my patchset. :)

> There's ample room for further cleanup here, but this keeps the fix simple.
> 
> Signed-off-by: Sage Weil 

Reviewed-by: Li Zefan 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Btrfs: fix sync subvol/snapshot creation

2010-12-09 Thread Chris Mason

Excerpts from Li Zefan's message of 2010-12-09 21:11:25 -0500:
> Sage Weil wrote:
> > We were incorrectly taking the async path even for the sync ioctls by
> > passing in &transid unconditionally.
> > 
> 
> Ha, I fixed this accidentally in my patchset. :)

Thanks guys,

Unless either of you object, I'll take this and the snapshot ABI change
for the next rc.

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Btrfs: fix sync subvol/snapshot creation

2010-12-09 Thread Sage Weil

On Thu, 9 Dec 2010, Chris Mason wrote:
> Excerpts from Li Zefan's message of 2010-12-09 21:11:25 -0500:
> > Sage Weil wrote:
> > > We were incorrectly taking the async path even for the sync ioctls by
> > > passing in &transid unconditionally.
> > > 
> > 
> > Ha, I fixed this accidentally in my patchset. :)
>
> Thanks guys,
> 
> Unless either of you object, I'll take this and the snapshot ABI change
> for the next rc.

Sounds good to me.

Just FYI, I'm still seeing some weird corruption and crashes on my 
machines and haven't narrowed it down yet.  Should have more info soon.

sage
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-09 Thread Ted Ts'o

On Fri, Dec 10, 2010 at 02:53:30AM +0100, Matt wrote:
> 
> Try a kernel before 5a87b7a5da250c9be6d757758425dfeaf8ed3179
> 
> from the tests I've done that one showed the least or no corruption if
> you count the empty /etc/env.d/03opengl as an artefact

Yes, that's a good test.  Also try commit bd2d0210cf.  The patch
series that is most likely to be at fault if there is a regression in
between 5a87b7a5d and bd2d0210cf inclusive.

I did a lot of testing before submitting it, but that wa a tricky
rewrite.  If you can reproduce the problem reliably, it might be good
to try commit 16828088f9 (the commit before 5a87b7a5d) and commit
bd2d0210cf.  If it reliably reproduces on bd2d0210cf, but is clean on
16828088f9, then it's my ext4 block i/o submission patches, and we'll
need to either figure out what's going on or back out that set of
changes.

If that's the case, a bisect of those changes (it's only 6 commits, so
it shouldn't take long) would be most appreciated.

Regards,

- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v3] Btrfs: Make async snapshot ioctl more generic

2010-12-09 Thread Li Zefan

If we had reserved some bytes in struct btrfs_ioctl_vol_args, we
wouldn't have to create a new structure for async snapshot creation.

Here we convert async snapshot ioctl to use a more generic ABI, as
we'll add more ioctls for snapshots/subvolumes in the future, readonly
snapshots for example.

Signed-off-by: Li Zefan 
---

I rebased the patch on Sage's fix, so you don't have to rebase it
by yourself.

---
 fs/btrfs/ioctl.c |   44 +++-
 fs/btrfs/ioctl.h |   14 +-
 2 files changed, 36 insertions(+), 22 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 7cc2e8e..f87552a 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -947,31 +947,41 @@ out:
 
 static noinline int btrfs_ioctl_snap_create(struct file *file,
void __user *arg, int subvol,
-   int async)
+   int v2)
 {
struct btrfs_ioctl_vol_args *vol_args = NULL;
-   struct btrfs_ioctl_async_vol_args *async_vol_args = NULL;
+   struct btrfs_ioctl_vol_args_v2 *vol_args_v2 = NULL;
char *name;
u64 fd;
-   u64 transid = 0;
int ret;
 
-   if (async) {
-   async_vol_args = memdup_user(arg, sizeof(*async_vol_args));
-   if (IS_ERR(async_vol_args))
-   return PTR_ERR(async_vol_args);
+   if (v2) {
+   u64 transid = 0;
+   u64 *ptr = NULL;
 
-   name = async_vol_args->name;
-   fd = async_vol_args->fd;
-   async_vol_args->name[BTRFS_SNAPSHOT_NAME_MAX] = '\0';
+   vol_args_v2 = memdup_user(arg, sizeof(*vol_args_v2));
+   if (IS_ERR(vol_args_v2))
+   return PTR_ERR(vol_args_v2);
+
+   if (vol_args_v2->flags & ~BTRFS_SUBVOL_CREATE_ASYNC) {
+   ret = -EINVAL;
+   goto out;
+   }
+
+   name = vol_args_v2->name;
+   fd = vol_args_v2->fd;
+   vol_args_v2->name[BTRFS_SUBVOL_NAME_MAX] = '\0';
+
+   if (vol_args_v2->flags & BTRFS_SUBVOL_CREATE_ASYNC)
+   ptr = &transid;
 
ret = btrfs_ioctl_snap_create_transid(file, name, fd,
- subvol, &transid);
+ subvol, ptr);
 
-   if (ret == 0 &&
+   if (ret == 0 && ptr &&
copy_to_user(arg +
-offsetof(struct btrfs_ioctl_async_vol_args,
- transid), &transid, sizeof(transid)))
+offsetof(struct btrfs_ioctl_vol_args_v2,
+ transid), ptr, sizeof(*ptr)))
ret = -EFAULT;
} else {
vol_args = memdup_user(arg, sizeof(*vol_args));
@@ -984,9 +994,9 @@ static noinline int btrfs_ioctl_snap_create(struct file 
*file,
ret = btrfs_ioctl_snap_create_transid(file, name, fd,
  subvol, NULL);
}
-
+out:
kfree(vol_args);
-   kfree(async_vol_args);
+   kfree(vol_args_v2);
 
return ret;
 }
@@ -2248,7 +2258,7 @@ long btrfs_ioctl(struct file *file, unsigned int
return btrfs_ioctl_getversion(file, argp);
case BTRFS_IOC_SNAP_CREATE:
return btrfs_ioctl_snap_create(file, argp, 0, 0);
-   case BTRFS_IOC_SNAP_CREATE_ASYNC:
+   case BTRFS_IOC_SNAP_CREATE_V2:
return btrfs_ioctl_snap_create(file, argp, 0, 1);
case BTRFS_IOC_SUBVOL_CREATE:
return btrfs_ioctl_snap_create(file, argp, 1, 0);
diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h
index 17c99eb..c344d12 100644
--- a/fs/btrfs/ioctl.h
+++ b/fs/btrfs/ioctl.h
@@ -30,11 +30,15 @@ struct btrfs_ioctl_vol_args {
char name[BTRFS_PATH_NAME_MAX + 1];
 };
 
-#define BTRFS_SNAPSHOT_NAME_MAX 4079
-struct btrfs_ioctl_async_vol_args {
+#define BTRFS_SUBVOL_CREATE_ASYNC  (1ULL << 0)
+
+#define BTRFS_SUBVOL_NAME_MAX 4039
+struct btrfs_ioctl_vol_args_v2 {
__s64 fd;
__u64 transid;
-   char name[BTRFS_SNAPSHOT_NAME_MAX + 1];
+   __u64 flags;
+   __u64 unused[4];
+   char name[BTRFS_SUBVOL_NAME_MAX + 1];
 };
 
 #define BTRFS_INO_LOOKUP_PATH_MAX 4080
@@ -187,6 +191,6 @@ struct btrfs_ioctl_space_args {
struct btrfs_ioctl_space_args)
 #define BTRFS_IOC_START_SYNC _IOR(BTRFS_IOCTL_MAGIC, 24, __u64)
 #define BTRFS_IOC_WAIT_SYNC  _IOW(BTRFS_IOCTL_MAGIC, 22, __u64)
-#define BTRFS_IOC_SNAP_CREATE_ASYNC _IOW(BTRFS_IOCTL_MAGIC, 23, \
-  struct btrfs_ioctl_async_vol_args)
+#define BTRFS_IOC_SNAP_CREATE_V2 _IOW(BTRFS_IOCTL_MAGIC, 23, \
+  struct btrfs_ioctl_vol

Re: [PATCH v2 4/5] Btrfs: Add readonly snapshots support

2010-12-09 Thread Li Zefan

>> +#define BTRFS_ROOT_SNAP_RDONLY  (1ULL << 0)
>> +
>>  struct btrfs_root_item {
>>  struct btrfs_inode_item inode;
>>  __le64 generation;
>> @@ -1116,6 +1118,7 @@ struct btrfs_root {
>>  int defrag_running;
>>  char *name;
>>  int in_sysfs;
>> +bool readonly;
> 
> Does make sense to store the same information in two places ?
> If we have access to root->readonly, we have also access to "root-
>> root_item.flags". Because we need the latter, we can get rid of the former.
> 
> 
> We can replace a test like
> 
>   if(root->readonly) 
> 
> with
> 
>   if(root->root_item.flags & BTRFS_ROOT_SNAP_RDONLY)
> 
> Or better we can create a two macros like:
> 
> #define btrfs_root_readonly(x) ((x)->root_item.flags & BTRFS_ROOT_SNAP_RDONLY)
> #define btrfs_root_set_readonly(x, ro)
> \
>   do{ (x)->root_item.flags =  \
>   ((x)->root_item.flags & ~BTRFS_ROOT_SNAP_RDONLY) |  \
>   (ro ? BTRFS_ROOT_SNAP_RDONLY : 0 ); \
>   }while(0)
> 
> 

Makes sense.

(except that inline functions are preferable)

> Sorry for to be too late for this kind of suggestion. But I think that this 
> optimization may help to avoid misalignment between the two variables (see my 
> reply in the next patch).
> [...]
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-09 Thread Jon Nelson

On Thu, Dec 9, 2010 at 8:38 PM, Ted Ts'o  wrote:
> On Fri, Dec 10, 2010 at 02:53:30AM +0100, Matt wrote:
>>
>> Try a kernel before 5a87b7a5da250c9be6d757758425dfeaf8ed3179
>>
>> from the tests I've done that one showed the least or no corruption if
>> you count the empty /etc/env.d/03opengl as an artefact
>
> Yes, that's a good test.  Also try commit bd2d0210cf.  The patch
> series that is most likely to be at fault if there is a regression in
> between 5a87b7a5d and bd2d0210cf inclusive.
>
> I did a lot of testing before submitting it, but that wa a tricky
> rewrite.  If you can reproduce the problem reliably, it might be good
> to try commit 16828088f9 (the commit before 5a87b7a5d) and commit
> bd2d0210cf.  If it reliably reproduces on bd2d0210cf, but is clean on
> 16828088f9, then it's my ext4 block i/o submission patches, and we'll
> need to either figure out what's going on or back out that set of
> changes.
>
> If that's the case, a bisect of those changes (it's only 6 commits, so
> it shouldn't take long) would be most appreciated.

I observed the behavior on bd2d0210cf in a qemu-kvm install of
openSUSE 11.3 (x86_64) on *totally* different host - an AMD quad-core.

I did /not/ observe the behavior on 16828088f9 (yet). I'll run the
test a few more times on 1682..

Additionally, I am building a bisected kernel now (
cb20d5188366f04d96d2e07b1240cc92170ade40 ), but won't be able to get
back at it for a while.

I hope this helps.

-- 
Jon
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 5/5] Btrfs: Add BTRFS_IOC_SUBVOL_GETFLAGS/SETFLAGS ioctl

2010-12-09 Thread Li Zefan

Goffredo Baroncelli wrote:
> Hi Li,
> 
> On Thursday, 09 December, 2010, Li Zefan wrote:
>> This allows us to set a snapshot or a subvolume readonly or writable
>> on the fly.
>>
>> Usage:
>>
>> Set BTRFS_SUBVOL_RDONLY of btrfs_ioctl_vol_arg_v2->flags, and then
>> call ioctl(BTRFS_IOCTL_SUBVOL_SETFLAGS);
>>
>> Changelog for v2:
>> - Add _GETFLAGS ioctl.
>> - Check if the passed fd is the root of a subvolume.
>> - Change the name from _SNAP_SETFLAGS to _SUBVOL_SETFLAGS.
>>
>> Signed-off-by: Li Zefan 
>> ---
>>  fs/btrfs/ioctl.c |   92 
> ++
>>  fs/btrfs/ioctl.h |4 ++
>>  2 files changed, 96 insertions(+), 0 deletions(-)
>>
>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>> index db2b231..29304c7 100644
>> --- a/fs/btrfs/ioctl.c
>> +++ b/fs/btrfs/ioctl.c
>> @@ -1010,6 +1010,94 @@ out:
> [...]
>> +struct btrfs_ioctl_vol_args_v2 *vol_args;
>> +int ret = 0;
>> +u64 flags = 0;
>> +
>> +if (inode->i_ino != BTRFS_FIRST_FREE_OBJECTID)
>> +return -EINVAL;
>> +
>> +vol_args = memdup_user(arg, sizeof(*vol_args));
> 
> Would be better to avoid a dynamic allocation for a so small struct ? 
> And as more general comment: what is the reason to pass a so complex struct 
> only for setting/getting the flags ?
> Would be it better to pass directly a u64 variable.
> 

I was considering using the same structure for all subvolume ioctls,
but here seems it's overkill..

> [..]
>> +
>> +static noinline int btrfs_ioctl_subvol_setflags(struct file *file,
>> +  void __user *arg)
>> +{
>> +struct inode *inode = fdentry(file)->d_inode;
>> +struct btrfs_root *root = BTRFS_I(inode)->root;
>> +struct btrfs_trans_handle *trans;
>> +struct btrfs_ioctl_vol_args_v2 *vol_args;
>> +u64 root_flags;
>> +u64 flags;
>> +int ret = 0;
>> +
>> +if (root->fs_info->sb->s_flags & MS_RDONLY)
>> +return -EROFS;
>> +
>> +if (inode->i_ino != BTRFS_FIRST_FREE_OBJECTID)
>> +return -EINVAL;
>> +
>> +vol_args = memdup_user(arg, sizeof(*vol_args));
> 
> Same as before.
> 
>> +if (IS_ERR(vol_args))
>> +return PTR_ERR(vol_args);
>> +flags = vol_args->flags;
>> +
>> +if (flags & ~BTRFS_SUBVOL_RDONLY) {
>> +ret = -EINVAL;
>> +goto out;
>> +}
>> +
>> +down_write(&root->fs_info->subvol_sem);
>> +
>> +/* nothing to do */
>> +if (!!(flags & BTRFS_SUBVOL_RDONLY) == root->readonly)
>> +goto out_unlock;
> 
> This is only an aesthetic comment: I prefer a simpler style like
> 
>   if ((flags & BTRFS_SUBVOL_RDONLY) && root->readonly)
>   goto out_unlock;
> 

They are not equivalent. The former checks if the flags and the
root both have readonly bit set or cleared.

> But I know that every body has its style :-)
> 
>> +
>> +root_flags = btrfs_root_flags(&root->root_item);
>> +if (flags & BTRFS_SUBVOL_RDONLY)
>> +btrfs_set_root_flags(&root->root_item,
>> + root_flags | BTRFS_ROOT_SNAP_RDONLY);
>> +else
>> +btrfs_set_root_flags(&root->root_item,
>> + root_flags & ~BTRFS_ROOT_SNAP_RDONLY);
>> +root->readonly = !root->readonly;
> 
> I double checked this line. But if I read the code correctly I think that the 
> line above is wrong: the field "root->readonly" is flipped regardless the 
> value of the flags passed by the user.
> 

Yep, that's because if we don't need to flip it, we've already exited early.

Note, there's only one flag.

> Moreover I have another question: why internally the flags is 
> BTRFS_ROOT_SNAP_RDONLY, instead in user space the flag is BTRFS_SUBVOL_RDONLY 
> ? 
> 

That's my carelessness..

> I suggest to 
> - rename BTRFS_SUBVOL_RDONLY in BTRFS_SUBVOL_CREATE_SNAP_RDONLY (like 
> BTRFS_SUBVOL_CREATE_SNAP_ASYNC)
> - rename BTRFS_ROOT_SNAP_RDONLY in BTRFS_ROOT_FLAGS_RDONLY and use both in 
> userspace and in kernel space this flag. I suggested to remove SNAP because 
> the flag make sense also for a "standard" subvolume.
> 

I'd prefer not to mix flags for root_item flags and vol ioctl flags.

And CREATE_RDONLY and CREATE_ASYNC can also be valid for subvolumes too.
Support for async subvolume creation is already there, just lack of an
user interface. So I've changed _CREATE_SNAP_ASYNC to _CREATE_ASYNC
in the patch I sent just now.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Btrfs: do not BUG if we fail to remove the orphan item for dead snapshots

2010-12-09 Thread Li Zefan

01:25, Josef Bacik wrote:
> Not being able to delete an orphan item isn't a horrible thing.  The worst 
> that
> happens is the next time around we try and do the orphan cleanup and we can't
> find the referenced object and just delete the item and move on.  Thanks,
> 

Would be better to add code comment? Otherwise later people may wonder why
the return value is not checked and see it as a bug.

> Signed-off-by: Josef Bacik 
> ---
>  fs/btrfs/extent-tree.c |1 -
>  1 files changed, 0 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 8aed05e..8c26441 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -6354,7 +6354,6 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
>   if (ret > 0) {
>   ret = btrfs_del_orphan_item(trans, tree_root,
>   root->root_key.objectid);
> - BUG_ON(ret);
>   }
>   }
>  
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 5/5] Btrfs: Add BTRFS_IOC_SUBVOL_GETFLAGS/SETFLAGS ioctl

2010-12-09 Thread Ian! D. Allen

On Fri, Dec 10, 2010 at 03:12:41PM +0800, Li Zefan wrote:
> >> +  /* nothing to do */
> >> +  if (!!(flags & BTRFS_SUBVOL_RDONLY) == root->readonly)
> >> +  goto out_unlock;
> > 
> > This is only an aesthetic comment: I prefer a simpler style like
> > 
> > if ((flags & BTRFS_SUBVOL_RDONLY) && root->readonly)
> > goto out_unlock;
> 
> They are not equivalent. The former checks if the flags and the
> root both have readonly bit set or cleared.

This is an excellent reason to add what you just replied (above) as a
comment into the code ahead of that line, so others don't make the same
mistake in interpreting what it does:

/* nothing to do */
/* check if flags and root both have readonly bit set or cleared */
if (!!(flags & BTRFS_SUBVOL_RDONLY) == root->readonly)
goto out_unlock;

"Debugging is twice as hard as writing the code in the first place.
 Therefore, if you write the code as cleverly as possible, you are,
 by definition, not smart enough to debug it." - Brian W. Kernighan

-- 
| Ian! D. Allen  -  idal...@idallen.ca  -  Ottawa, Ontario, Canada
| Home Page: http://idallen.com/   Contact Improv: http://contactimprov.ca/
| College professor (Free/Libre GNU+Linux) at: http://teaching.idallen.com/
| Defend digital freedom:  http://eff.org/  and have fun:  http://fools.ca/
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

49 matches

Mail list logo