date:20180319

Re: High CPU usage of RT threads...

2018-03-19 Thread Shyam Prasad N

Hi Nikolay,

Thanks for your reply on this.

Checked the stack trace for many of the stuck threads. Looks like all
of them are stuck in this loop...
[] exit_to_usermode_loop+0x72/0xd0
[] prepare_exit_to_usermode+0x26/0x30
[] retint_user+0x8/0x10
[] 0x

Seems like it is stuck in a tight loop in exit_to_usermode_loop.
FWIW, we started seeing this issue with nodatacow btrfs mount option.
Previously we were running with default option of datacow.
However, this also coincides with fairly heavy unlink load that we've
been putting the system under.

Please let me know if there is anything else you can think of, based
on the above data.

Regards,
Shyam


On Thu, Mar 15, 2018 at 12:59 PM, Nikolay Borisov  wrote:
>
>
> On 15.03.2018 09:23, Shyam Prasad N wrote:
>> Hi,
>>
>> Our servers run some daemons that are scheduled to run many real time
>> threads. These threads serve the client nodes by performing I/O on top
>> of some set of disks, configured as DRBD pairs with disks on other
>> peer servers for high availability of data. Btrfs is the filesystem
>> that is configured on top of DRBD.
>>
>> While testing high availability with fairly high load, we have noticed
>> the following behaviour a couple of times: When the server which was
>> killed comes back up and gets ready and DRBD disks start syncing the
>> data between the disks, a performance hit is generally expected at the
>> peer node which has taken over the service now. However, the real time
>> threads (mentioned above) on the active node are hogging the CPUs. As
>> a part of debugging the issue, we tried to force a core dump on these
>> threads by using a SIGABRT. However, these threads were not responding
>> to any signals. Only after using real-time throttling (to reduce real
>> time CPU usage to 50%), and waiting around for a few minutes, we were
>> able to force a core dump. However, the corefile generated didn't have
>> much useful info (I think it was a partial/corrupted core dump).
>>
>> Based on the above behaviour, (signals not being picked up), it looks
>> to me like all these threads were likely stuck inside some system
>> call. And since majority of the system calls by these threads are VFS
>> calls on btrfs, I feel that these threads may have been stuck in some
>> I/O. Specifically, based on the CPU usage, in some spinlock (I'm open
>> to suggestions of other possibilities). And this is the reason I'm
>> posting on this mailing list.
>
> When you have a bunch of those threads get a dump of the stacks of all
> sleeping tasks by "echo w > /proc/sysrq-trigger" .
>
>>
>> Is there a known bug which might have caused this? Kernel version
>> we're using is 4.4.0.
>
> This is rather old kernel, you should at least be using latest 4.4.y
> stable kernel. BTRFS is a moving target and there are a lot of
> improvements made every release. So I'd suggest to try 4.14 at least on
> one offending machine.
>
>> If we go for a kernel upgrade, what are the chances of not seeing this
>> behaviour again?
>>
>> Or is my analysis of the problem entirely wrong? My feeling is that
>> this maybe some issue with using Btrfs when it doesn't get a response
>> from DRBD quickly enough.
>
> Feelings don't count for anything. Next time this happens extract
> stacktrace from the offending threads i.e. smapling /proc/[pid of
> hogging thread]/stack. Furthermore, if we assume that btrfs is indeed
> not getting responses fast enough this means most clients should really
> be stuck in io sleep and not doing any processing.
>
>
>> Because we have been using ext4 on top of DRBD for a long time, and
>> have never seen such issues during HA tests there.
>>



-- 
-Shyam
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: High CPU usage of RT threads...

2018-03-19 Thread Nikolay Borisov



On 19.03.2018 09:13, Shyam Prasad N wrote:
> Hi Nikolay,
> 
> Thanks for your reply on this.
> 
> Checked the stack trace for many of the stuck threads. Looks like all
> of them are stuck in this loop...
> [] exit_to_usermode_loop+0x72/0xd0
> [] prepare_exit_to_usermode+0x26/0x30
> [] retint_user+0x8/0x10
> [] 0x

Well, this doesn't imply btrfs at all.

How about the _full_ output of :

echo w > /proq/sysrq-trigger

Perhaps there is a lot of load in workqueues?

> 
> Seems like it is stuck in a tight loop in exit_to_usermode_loop.
> FWIW, we started seeing this issue with nodatacow btrfs mount option.
> Previously we were running with default option of datacow.
> However, this also coincides with fairly heavy unlink load that we've
> been putting the system under.
> 
> Please let me know if there is anything else you can think of, based
> on the above data.
> 
> Regards,
> Shyam
> 
> 
> On Thu, Mar 15, 2018 at 12:59 PM, Nikolay Borisov  wrote:
>>
>>
>> On 15.03.2018 09:23, Shyam Prasad N wrote:
>>> Hi,
>>>
>>> Our servers run some daemons that are scheduled to run many real time
>>> threads. These threads serve the client nodes by performing I/O on top
>>> of some set of disks, configured as DRBD pairs with disks on other
>>> peer servers for high availability of data. Btrfs is the filesystem
>>> that is configured on top of DRBD.
>>>
>>> While testing high availability with fairly high load, we have noticed
>>> the following behaviour a couple of times: When the server which was
>>> killed comes back up and gets ready and DRBD disks start syncing the
>>> data between the disks, a performance hit is generally expected at the
>>> peer node which has taken over the service now. However, the real time
>>> threads (mentioned above) on the active node are hogging the CPUs. As
>>> a part of debugging the issue, we tried to force a core dump on these
>>> threads by using a SIGABRT. However, these threads were not responding
>>> to any signals. Only after using real-time throttling (to reduce real
>>> time CPU usage to 50%), and waiting around for a few minutes, we were
>>> able to force a core dump. However, the corefile generated didn't have
>>> much useful info (I think it was a partial/corrupted core dump).
>>>
>>> Based on the above behaviour, (signals not being picked up), it looks
>>> to me like all these threads were likely stuck inside some system
>>> call. And since majority of the system calls by these threads are VFS
>>> calls on btrfs, I feel that these threads may have been stuck in some
>>> I/O. Specifically, based on the CPU usage, in some spinlock (I'm open
>>> to suggestions of other possibilities). And this is the reason I'm
>>> posting on this mailing list.
>>
>> When you have a bunch of those threads get a dump of the stacks of all
>> sleeping tasks by "echo w > /proc/sysrq-trigger" .
>>
>>>
>>> Is there a known bug which might have caused this? Kernel version
>>> we're using is 4.4.0.
>>
>> This is rather old kernel, you should at least be using latest 4.4.y
>> stable kernel. BTRFS is a moving target and there are a lot of
>> improvements made every release. So I'd suggest to try 4.14 at least on
>> one offending machine.
>>
>>> If we go for a kernel upgrade, what are the chances of not seeing this
>>> behaviour again?
>>>
>>> Or is my analysis of the problem entirely wrong? My feeling is that
>>> this maybe some issue with using Btrfs when it doesn't get a response
>>> from DRBD quickly enough.
>>
>> Feelings don't count for anything. Next time this happens extract
>> stacktrace from the offending threads i.e. smapling /proc/[pid of
>> hogging thread]/stack. Furthermore, if we assume that btrfs is indeed
>> not getting responses fast enough this means most clients should really
>> be stuck in io sleep and not doing any processing.
>>
>>
>>> Because we have been using ext4 on top of DRBD for a long time, and
>>> have never seen such issues during HA tests there.
>>>
> 
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v3 0/3] btrfs: Add three new unprivileged ioctls to allow normal users to call "sub list/show" etc.

2018-03-19 Thread Misono, Tomohiro

changelog:

v2-> v3
 - fix kbuild test bot warning
v1 -> v2
  - completely reimplement 1st/2nd ioctl to have user friendly api
  - various cleanup, remove unnecessary goto
===

This adds three new unprivileged ioctls:

1st patch: ioctl which returns subvolume information of ROOT_ITEM and 
ROOT_BACKREF
2nd patch: ioctl which returns subvolume information of ROOT_REF (without 
subvolume name)
3rd patch: user version of ino_lookup ioctl which also peforms permission check.

They will be used to implement user version of "subvolume list/show" etc in 
user tools.
See each commit log for more detals.

The rfc implementation of btrfs-progs can be found in the ML titled as follows: 
  [RFC PATCH v3 0/7] btrfs-progs: Allow normal user to call "subvolume 
list/show"


Tomohiro Misono (3):
  btrfs: Add unprivileged ioctl which returns subvolume information
  btrfs: Add unprivileged ioctl which returns subvolume's ROOT_REF
  btrfs: Add unprivileged version of ino_lookup ioctl

 fs/btrfs/ioctl.c   | 413 +
 include/uapi/linux/btrfs.h |  84 ++
 2 files changed, 497 insertions(+)

-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v3 1/3] btrfs: Add unprivileged ioctl which returns subvolume information

2018-03-19 Thread Misono, Tomohiro

 Add new unprivileged ioctl BTRFS_IOC_GET_SUBVOL_INFO which returns
the information of subvolume containing this inode.
(i.e. returns the information in ROOT_ITEM and ROOT_BACKREF.)

Signed-off-by: Tomohiro Misono 
---
 fs/btrfs/ioctl.c   | 118 +
 include/uapi/linux/btrfs.h |  51 
 2 files changed, 169 insertions(+)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 111ee282b777..be2abe7a5bba 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2309,6 +2309,122 @@ static noinline int btrfs_ioctl_ino_lookup(struct file 
*file,
return ret;
 }
 
+/* Get the subvolume information in BTRFS_ROOT_ITEM and BTRFS_ROOT_BACKREF */
+static noinline int btrfs_ioctl_get_subvol_info(struct file *file,
+  void __user *argp)
+{
+   struct btrfs_ioctl_get_subvol_info_args *subvol_info;
+   struct btrfs_root *root;
+   struct btrfs_path *path;
+   struct btrfs_key key;
+
+   struct btrfs_root_item root_item;
+   struct btrfs_root_ref *rref;
+   struct extent_buffer *l;
+   int slot;
+
+   unsigned long item_off;
+   unsigned long item_len;
+
+   struct inode *inode;
+   int ret;
+
+   path = btrfs_alloc_path();
+   if (!path)
+   return -ENOMEM;
+
+   subvol_info = kzalloc(sizeof(*subvol_info), GFP_KERNEL);
+   if (!subvol_info) {
+   btrfs_free_path(path);
+   return -ENOMEM;
+   }
+   inode = file_inode(file);
+
+   root = BTRFS_I(inode)->root->fs_info->tree_root;
+   key.objectid = BTRFS_I(inode)->root->root_key.objectid;
+   key.type = BTRFS_ROOT_ITEM_KEY;
+   key.offset = 0;
+   ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+   if (ret < 0) {
+   goto out;
+   } else if (ret > 0) {
+   /* If the subvolume is a snapshot, offset is not zero */
+   u64 objectid = key.objectid;
+
+   btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
+   if (key.objectid != objectid ||
+   key.type != BTRFS_ROOT_ITEM_KEY) {
+   ret = -ENOENT;
+   goto out;
+   }
+   }
+
+   l = path->nodes[0];
+   slot = path->slots[0];
+   item_off = btrfs_item_ptr_offset(l, slot);
+   item_len = btrfs_item_size_nr(l, slot);
+   read_extent_buffer(l, &root_item, item_off, item_len);
+
+   subvol_info->id = key.objectid;
+
+   subvol_info->generation = btrfs_root_generation(&root_item);
+   subvol_info->flags = btrfs_root_flags(&root_item);
+
+   memcpy(subvol_info->uuid, root_item.uuid, BTRFS_UUID_SIZE);
+   memcpy(subvol_info->parent_uuid, root_item.parent_uuid,
+   BTRFS_UUID_SIZE);
+   memcpy(subvol_info->received_uuid, root_item.received_uuid,
+   BTRFS_UUID_SIZE);
+
+   subvol_info->ctransid = btrfs_root_ctransid(&root_item);
+   subvol_info->ctime.sec = btrfs_stack_timespec_sec(&root_item.ctime);
+   subvol_info->ctime.nsec = btrfs_stack_timespec_nsec(&root_item.ctime);
+
+   subvol_info->otransid = btrfs_root_otransid(&root_item);
+   subvol_info->otime.sec = btrfs_stack_timespec_sec(&root_item.otime);
+   subvol_info->otime.nsec = btrfs_stack_timespec_nsec(&root_item.otime);
+
+   subvol_info->stransid = btrfs_root_stransid(&root_item);
+   subvol_info->stime.sec = btrfs_stack_timespec_sec(&root_item.stime);
+   subvol_info->stime.nsec = btrfs_stack_timespec_nsec(&root_item.stime);
+
+   subvol_info->rtransid = btrfs_root_rtransid(&root_item);
+   subvol_info->rtime.sec = btrfs_stack_timespec_sec(&root_item.rtime);
+   subvol_info->rtime.nsec = btrfs_stack_timespec_nsec(&root_item.rtime);
+
+   btrfs_release_path(path);
+   key.type = BTRFS_ROOT_BACKREF_KEY;
+   key.offset = 0;
+   ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+   if (ret < 0)
+   goto out;
+
+   l = path->nodes[0];
+   slot = path->slots[0];
+   btrfs_item_key_to_cpu(l, &key, slot);
+   if (key.objectid == subvol_info->id &&
+   key.type == BTRFS_ROOT_BACKREF_KEY){
+   subvol_info->parent_id = key.offset;
+
+   rref = btrfs_item_ptr(l, slot, struct btrfs_root_ref);
+   subvol_info->dirid = btrfs_root_ref_dirid(l, rref);
+
+   item_off = btrfs_item_ptr_offset(l, slot)
+   + sizeof(struct btrfs_root_ref);
+   item_len = btrfs_item_size_nr(l, slot)
+   - sizeof(struct btrfs_root_ref);
+   read_extent_buffer(l, subvol_info->name, item_off, item_len);
+   }
+
+   if (copy_to_user(argp, subvol_info, sizeof(*subvol_info)))
+   ret = -EFAULT;
+
+out:
+   kzfree(subvol_inf

[PATCH v3 2/3] btrfs: Add unprivileged ioctl which returns subvolume's ROOT_REF

2018-03-19 Thread Misono, Tomohiro

Add unprivileged ioctl BTRFS_IOC_GET_SUBVOL_ROOTREF which
returns ROOT_REF information of the subvolume containing this inode.
The min id of root ref's subvolume to be searched is specified by
min_id in struct btrfs_ioctl_get_subvol_rootref_args.

If there are more root refs than BTRFS_MAX_ROOTREF_BUFFER_NUM,
this ioctl sets min_id to the last searched root ref's subvolid + 1
and return -EOVERFLOW. Therefore the caller can just call this ioctl
again without changing the argument to continue search.

Signed-off-by: Tomohiro Misono 
---
 fs/btrfs/ioctl.c   | 91 ++
 include/uapi/linux/btrfs.h | 16 
 2 files changed, 107 insertions(+)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index be2abe7a5bba..5b97a735790e 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2425,6 +2425,95 @@ static noinline int btrfs_ioctl_get_subvol_info(struct 
file *file,
return ret;
 }
 
+/* Returns ROOT_REF information of the subvolume contining this inode. */
+static noinline int btrfs_ioctl_get_subvol_rootref(struct file *file,
+  void __user *argp)
+{
+   struct btrfs_ioctl_get_subvol_rootref_args *rootrefs;
+   struct btrfs_root_ref *rref;
+   struct btrfs_root *root;
+   struct btrfs_path *path;
+   struct btrfs_key key;
+
+   struct extent_buffer *l;
+   int slot;
+
+   struct inode *inode;
+   int i, nritems;
+   int ret;
+   u64 objectid;
+   u8 found;
+
+   path = btrfs_alloc_path();
+   if (!path)
+   return -ENOMEM;
+
+   rootrefs = memdup_user(argp, sizeof(*rootrefs));
+   if (!rootrefs) {
+   btrfs_free_path(path);
+   return -ENOMEM;
+   }
+
+   inode = file_inode(file);
+   root = BTRFS_I(inode)->root->fs_info->tree_root;
+   objectid = BTRFS_I(inode)->root->root_key.objectid;
+
+   key.objectid = objectid;
+   key.type = BTRFS_ROOT_REF_KEY;
+   key.offset = rootrefs->min_id;
+   found = 0;
+   while (1) {
+   ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+   if (ret < 0)
+   goto out;
+
+   l = path->nodes[0];
+   slot = path->slots[0];
+   nritems = btrfs_header_nritems(l);
+   if (nritems - slot == 0) {
+   ret = 0;
+   goto out;
+   }
+
+   for (i = slot; i < nritems; i++) {
+   btrfs_item_key_to_cpu(l, &key, i);
+   if (key.objectid != objectid ||
+   key.type != BTRFS_ROOT_REF_KEY) {
+   ret = 0;
+   goto out;
+   }
+
+   if (found == BTRFS_MAX_ROOTREF_BUFFER_NUM) {
+   /* update min_id for next search */
+   rootrefs->min_id = key.offset;
+   ret = -EOVERFLOW;
+   goto out;
+   }
+
+   rref = btrfs_item_ptr(l, i, struct btrfs_root_ref);
+   rootrefs->rootref[found].subvolid = key.offset;
+   rootrefs->rootref[found].dirid =
+ btrfs_root_ref_dirid(l, rref);
+   found++;
+   }
+
+   btrfs_release_path(path);
+   key.offset++;
+   }
+
+out:
+   if (!ret || ret == -EOVERFLOW) {
+   rootrefs->num_items = found;
+   if (copy_to_user(argp, rootrefs, sizeof(*rootrefs)))
+   ret = -EFAULT;
+   }
+
+   btrfs_free_path(path);
+   kfree(rootrefs);
+
+   return ret;
+}
+
 static noinline int btrfs_ioctl_snap_destroy(struct file *file,
 void __user *arg)
 {
@@ -5781,6 +5870,8 @@ long btrfs_ioctl(struct file *file, unsigned int
return btrfs_ioctl_set_features(file, argp);
case BTRFS_IOC_GET_SUBVOL_INFO:
return btrfs_ioctl_get_subvol_info(file, argp);
+   case BTRFS_IOC_GET_SUBVOL_ROOTREF:
+   return btrfs_ioctl_get_subvol_rootref(file, argp);
}
 
return -ENOTTY;
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index ed053852c71f..82c88d52d6e6 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -774,6 +774,20 @@ struct btrfs_ioctl_get_subvol_info_args {
__u64 reserved[8];
 };
 
+#define BTRFS_MAX_ROOTREF_BUFFER_NUM 255
+struct btrfs_ioctl_get_subvol_rootref_args {
+   /* in/out, min id of rootref's subvolid to be searched */
+   __u64 min_id;
+   /* out */
+   struct {
+   __u64 subvolid;
+   __u64 dirid;
+   } rootref[BTRFS_MAX_ROOTRE

[PATCH v3 3/3] btrfs: Add unprivileged version of ino_lookup ioctl

2018-03-19 Thread Misono, Tomohiro

Add unprivileged version of ino_lookup ioctl BTRFS_IOC_INO_LOOKUP_USER
to allow normal users to call "btrfs subvololume list/show" etc. in
combination with BTRFS_IOC_GET_SUBVOL_INFO/BTRFS_IOC_GET_SUBVOL_ROOTREF.

This can be used like BTRFS_IOC_INO_LOOKUP but the argument is
different. This is  because it always searches the fs/file tree
correspoinding to the fd with which this ioctl is called and also
returns the name of bottom subvolume.

The main differences from original ino_lookup ioctl are:
  1. Read + Exec permission will be checked using inode_permission()
 during path construction. -EACCES will be returned in case
 of failure.
  2. Path construction will be stopped at the inode number which
 corresponds to the fd with which this ioctl is called. If
 constructed path does not exist under fd's inode, -EACCES
 will be returned.
  3. The name of bottom subvolume is also searched and filled.

Note that the maximum length of path is shorter 256 (BTRFS_VOL_NAME_MAX+1)
bytes than ino_lookup ioctl because of space of subvolume's name.

Signed-off-by: Tomohiro Misono 
---
 fs/btrfs/ioctl.c   | 204 +
 include/uapi/linux/btrfs.h |  17 
 2 files changed, 221 insertions(+)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 5b97a735790e..5ea1e7bd03ce 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2267,6 +2267,166 @@ static noinline int btrfs_search_path_in_tree(struct 
btrfs_fs_info *info,
return ret;
 }
 
+static noinline int btrfs_search_path_in_tree_user(struct inode *inode,
+   struct btrfs_ioctl_ino_lookup_user_args *args)
+{
+   struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
+   struct super_block *sb = inode->i_sb;
+   struct btrfs_key upper_limit = BTRFS_I(inode)->location;
+   u64 treeid = BTRFS_I(inode)->root->root_key.objectid;
+   u64 dirid = args->dirid;
+
+   unsigned long item_off;
+   unsigned long item_len;
+   struct btrfs_inode_ref *iref;
+   struct btrfs_root_ref *rref;
+   struct btrfs_root *root;
+   struct btrfs_path *path;
+   struct btrfs_key key, key2;
+   struct extent_buffer *l;
+   struct inode *temp_inode;
+   char *ptr;
+   int slot;
+   int len;
+   int total_len = 0;
+   int ret = -1;
+
+   path = btrfs_alloc_path();
+   if (!path)
+   return -ENOMEM;
+
+   /*
+* If the bottom subvolume does not exist directly under upper_limit,
+* construct the path in bottomup way.
+*/
+   if (dirid != upper_limit.objectid) {
+   ptr = &args->path[BTRFS_INO_LOOKUP_USER_PATH_MAX - 1];
+
+   key.objectid = treeid;
+   key.type = BTRFS_ROOT_ITEM_KEY;
+   key.offset = (u64)-1;
+   root = btrfs_read_fs_root_no_name(fs_info, &key);
+   if (IS_ERR(root)) {
+   ret = -ENOENT;
+   goto out;
+   }
+
+   key.objectid = dirid;
+   key.type = BTRFS_INODE_REF_KEY;
+   key.offset = (u64)-1;
+   while (1) {
+   ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+   if (ret < 0) {
+   goto out;
+   } else if (ret > 0) {
+   ret = btrfs_previous_item(root, path, dirid,
+ BTRFS_INODE_REF_KEY);
+   if (ret < 0) {
+   goto out;
+   } else if (ret > 0) {
+   ret = -ENOENT;
+   goto out;
+   }
+   }
+
+   l = path->nodes[0];
+   slot = path->slots[0];
+   btrfs_item_key_to_cpu(l, &key, slot);
+
+   iref = btrfs_item_ptr(l, slot, struct btrfs_inode_ref);
+   len = btrfs_inode_ref_name_len(l, iref);
+   ptr -= len + 1;
+   total_len += len + 1;
+   if (ptr < args->path) {
+   ret = -ENAMETOOLONG;
+   goto out;
+   }
+
+   *(ptr + len) = '/';
+   read_extent_buffer(l, ptr,
+   (unsigned long)(iref + 1), len);
+
+   /* Check the read+exec permission of this directory */
+   ret = btrfs_previous_item(root, path, dirid,
+ BTRFS_INODE_ITEM_KEY);
+   if (ret < 0) {
+   goto out;
+   } else if (ret > 0) {
+   ret = -ENOENT;
+

[RFC PATCH v3 0/7] btrfs-progs: Allow normal user to call "subvolume list/show"

2018-03-19 Thread Misono, Tomohiro

changelog:

v2 -> v3
  - use get_euid() to check the caller's privilege (and remove 3rd patch)
  - improve error handling
v1 -> v2
  - add independent error handling patch (1st patch)
  - reimplement according to ioctl change
  - various cleanup
===

This RFC implements user version of "subvolume list/show" using three new 
ioctls.
The ioctl patch to the kernel can be found in the ML titled 
  "[PATCH v3 0/3] btrfs: Add three new unprivileged ioctls to allow normal 
users to call "sub list/show" etc.

1th patch is independent and improvements of error handling
2nd-4th are some prepartion works.
5th patch is the main part.
6th-7th adds the test for "subvolume list"

The main behavior differences between root and normal users are:

- "sub list" list the subvolumes which exist under the specified path 
(including the path itself). The specified path itself is not needed to be
 a subvolume. Also If the subvolume cannot be opend but the parent
directory can be, the information other than name or id would be zeroed out.

- snapshot filed of "subvolume show" just lists
the snapshots under the specified subvolume.


This is a part of RFC I sent last December[1] whose aim is to improve normal 
users' usability.
The remaining works of RFC are: 
  - Allow "sub delete" for empty subvolume
  - Allow "qgroup show" to check quota limit

[1] https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg70991.html


Tomohiro Misono (7):
  btrfs-progs: sub list: Call rb_free_nodes() in error path
  btrfs-progs: ioctl: Add 3 definitions of new unprivileged ioctl
  btrfs-progs: sub list: Pass specified path down to
btrfs_list_subvols()
  btrfs-progs: fallback to open without O_NOATIME flag in
find_mount_root()
  btrfs-progs: sub list: Allow normal user to call "subvolume list/show"
  btrfs-progs: test: Add helper function to check if test user exists
  btrfs-porgs: test: Add cli-test/009 to check subvolume list for both
root and normal user

 btrfs-list.c   | 376 +++--
 btrfs-list.h   |   7 +-
 cmds-subvolume.c   |  14 +-
 ioctl.h|  86 +++
 tests/cli-tests/009-subvolume-list/test.sh | 136 +++
 tests/common   |  10 +
 utils.c|  13 +-
 7 files changed, 609 insertions(+), 33 deletions(-)
 create mode 100755 tests/cli-tests/009-subvolume-list/test.sh

-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RFC PATCH v3 2/7] btrfs-progs: ioctl: Add 3 definitions of new unprivileged ioctl

2018-03-19 Thread Misono, Tomohiro

Add 3 definitions of new unprivileged ioctl (BTRFS_IOC_GET_SUBVOL_INFO,
BTRFS_IOC_GET_SUBVOL_ROOTREF and BTRFS_IOC_INO_LOOKUP_USER). They will
be used to implement user version of "btrfs subvolume list" etc.

Signed-off-by: Tomohiro Misono 
---
 ioctl.h | 86 +
 1 file changed, 86 insertions(+)

diff --git a/ioctl.h b/ioctl.h
index 709e996f..c6624352 100644
--- a/ioctl.h
+++ b/ioctl.h
@@ -320,6 +320,22 @@ struct btrfs_ioctl_ino_lookup_args {
 };
 BUILD_ASSERT(sizeof(struct btrfs_ioctl_ino_lookup_args) == 4096);
 
+#define BTRFS_INO_LOOKUP_USER_PATH_MAX (4080-BTRFS_VOL_NAME_MAX-1)
+struct btrfs_ioctl_ino_lookup_user_args {
+   /* in, inode number containing the subvolume of 'subvolid' */
+   __u64 dirid;
+   /* in */
+   __u64 subvolid;
+   /* out, name of the subvolume of 'subvolid' */
+   char name[BTRFS_VOL_NAME_MAX + 1];
+   /*
+* out, constructed path from the directory with which
+* the ioctl is called to dirid
+*/
+   char path[BTRFS_INO_LOOKUP_USER_PATH_MAX];
+};
+BUILD_ASSERT(sizeof(struct btrfs_ioctl_ino_lookup_user_args) == 4096);
+
 struct btrfs_ioctl_search_key {
/* which root are we searching.  0 is the tree of tree roots */
__u64 tree_id;
@@ -672,6 +688,70 @@ BUILD_ASSERT(sizeof(struct btrfs_ioctl_send_args_64) == 
72);
 
 #define BTRFS_IOC_SEND_64_COMPAT_DEFINED 1
 
+struct btrfs_ioctl_get_subvol_info_args {
+   /* All filed is out */
+   /* Id of this subvolume */
+   __u64 id;
+   /* Name of this subvolume, used to get the real name at mount point */
+   char name[BTRFS_VOL_NAME_MAX + 1];
+   /*
+* Id of the subvolume which contains this subvolume.
+* Zero for top-level subvolume or deleted subvolume
+*/
+   __u64 parent_id;
+   /*
+* Inode number of the directory which contains this subvolume.
+* Zero for top-level subvolume or deleted subvolume
+*/
+   __u64 dirid;
+
+   /* Latest transaction id of this subvolume */
+   __u64 generation;
+   /* Flags of this subvolume */
+   __u64 flags;
+
+   /* uuid of this subvolume */
+   __u8 uuid[BTRFS_UUID_SIZE];
+   /*
+* uuid of the subvolume of which this subvolume is a snapshot.
+* All zero for non-snapshot subvolume
+*/
+   __u8 parent_uuid[BTRFS_UUID_SIZE];
+   /*
+* uuid of the subvolume from which this subvolume is received.
+* All zero for non-received subvolume
+*/
+   __u8 received_uuid[BTRFS_UUID_SIZE];
+
+   /* Transaction id indicates when change/create/send/receive happens */
+   __u64 ctransid;
+   __u64 otransid;
+   __u64 stransid;
+   __u64 rtransid;
+   /* Time corresponds to c/o/s/rtransid */
+   struct btrfs_ioctl_timespec ctime;
+   struct btrfs_ioctl_timespec otime;
+   struct btrfs_ioctl_timespec stime;
+   struct btrfs_ioctl_timespec rtime;
+
+   __u64 reserved[8];
+};
+
+#define BTRFS_MAX_ROOTREF_BUFFER_NUM 255
+struct btrfs_ioctl_get_subvol_rootref_args {
+   /* in/out, min id of rootref's subvolid to be searched */
+   __u64 min_id;
+   /* out */
+   struct {
+   __u64 subvolid;
+   __u64 dirid;
+   } rootref[BTRFS_MAX_ROOTREF_BUFFER_NUM];
+   /* out, number of found items */
+   __u8 num_items;
+   __u8 align[7];
+};
+BUILD_ASSERT(sizeof(struct btrfs_ioctl_get_subvol_rootref_args) == 4096);
+
 /* Error codes as returned by the kernel */
 enum btrfs_err_code {
notused,
@@ -828,6 +908,12 @@ static inline char *btrfs_err_str(enum btrfs_err_code 
err_code)
   struct btrfs_ioctl_feature_flags[3])
 #define BTRFS_IOC_RM_DEV_V2_IOW(BTRFS_IOCTL_MAGIC, 58, \
   struct btrfs_ioctl_vol_args_v2)
+#define BTRFS_IOC_GET_SUBVOL_INFO _IOR(BTRFS_IOCTL_MAGIC, 60, \
+   struct btrfs_ioctl_get_subvol_info_args)
+#define BTRFS_IOC_GET_SUBVOL_ROOTREF _IOWR(BTRFS_IOCTL_MAGIC, 61, \
+   struct btrfs_ioctl_get_subvol_rootref_args)
+#define BTRFS_IOC_INO_LOOKUP_USER _IOWR(BTRFS_IOCTL_MAGIC, 62, \
+   struct btrfs_ioctl_ino_lookup_user_args)
 #ifdef __cplusplus
 }
 #endif
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RFC PATCH v3 1/7] btrfs-progs: sub list: Call rb_free_nodes() in error path

2018-03-19 Thread Misono, Tomohiro

After btrfs_list_subvols() is called, root_lookup may hold some allocated
memory area even if the function fails.
Therefore rb_free_nodes() should be called.

Signed-off-by: Tomohiro Misono 
---
 btrfs-list.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/btrfs-list.c b/btrfs-list.c
index e01c5899..50e5ce5f 100644
--- a/btrfs-list.c
+++ b/btrfs-list.c
@@ -1523,8 +1523,10 @@ int btrfs_list_subvols_print(int fd, struct 
btrfs_list_filter_set *filter_set,
return ret;
 
ret = btrfs_list_subvols(fd, &root_lookup);
-   if (ret)
+   if (ret) {
+   rb_free_nodes(&root_lookup.root, free_root_info);
return ret;
+   }
filter_and_sort_subvol(&root_lookup, &root_sort, filter_set,
 comp_set, top_id);
 
@@ -1554,14 +1556,18 @@ int btrfs_get_toplevel_subvol(int fd, struct root_info 
*the_ri)
return ret;
 
ret = btrfs_list_subvols(fd, &rl);
-   if (ret)
+   if (ret) {
+   rb_free_nodes(&rl.root, free_root_info);
return ret;
+   }
 
rbn = rb_first(&rl.root);
ri = rb_entry(rbn, struct root_info, rb_node);
 
-   if (ri->root_id != BTRFS_FS_TREE_OBJECTID)
+   if (ri->root_id != BTRFS_FS_TREE_OBJECTID) {
+   rb_free_nodes(&rl.root, free_root_info);
return -ENOENT;
+   }
 
memcpy(the_ri, ri, offsetof(struct root_info, path));
the_ri->path = strdup_or_null("/");
@@ -1585,8 +1591,10 @@ int btrfs_get_subvol(int fd, struct root_info *the_ri)
return ret;
 
ret = btrfs_list_subvols(fd, &rl);
-   if (ret)
+   if (ret) {
+   rb_free_nodes(&rl.root, free_root_info);
return ret;
+   }
 
rbn = rb_first(&rl.root);
while(rbn) {
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RFC PATCH v3 4/7] btrfs-progs: fallback to open without O_NOATIME flag in find_mount_root()

2018-03-19 Thread Misono, Tomohiro

O_NOATIME flag requires effective UID of process matches file's owner
or has CAP_FOWNER capabilities. Fallback to open without O_NOATIME flag
so that normal user can also call find_mount_root().

This is a preparation work to allow normal user to call "subvolume show".

Signed-off-by: Tomohiro Misono 
---
 utils.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/utils.c b/utils.c
index 81c54faa..acea70a5 100644
--- a/utils.c
+++ b/utils.c
@@ -2045,6 +2045,9 @@ int find_mount_root(const char *path, char **mount_root)
char *longest_match = NULL;
 
fd = open(path, O_RDONLY | O_NOATIME);
+   if (fd < 0 && errno == EPERM)
+   fd = open(path, O_RDONLY);
+
if (fd < 0)
return -errno;
close(fd);
-- 
2.14.3


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RFC PATCH v3 3/7] btrfs-progs: sub list: Pass specified path down to btrfs_list_subvols()

2018-03-19 Thread Misono, Tomohiro

This is a preparetion work to allow normal user to call
"subvolume list/show".

Signed-off-by: Tomohiro Misono 
---
 btrfs-list.c | 16 +---
 btrfs-list.h |  7 ---
 cmds-subvolume.c |  6 +++---
 utils.c  | 10 +-
 4 files changed, 21 insertions(+), 18 deletions(-)

diff --git a/btrfs-list.c b/btrfs-list.c
index 50e5ce5f..833ff912 100644
--- a/btrfs-list.c
+++ b/btrfs-list.c
@@ -1489,7 +1489,8 @@ next:
}
 }
 
-static int btrfs_list_subvols(int fd, struct root_lookup *root_lookup)
+static int btrfs_list_subvols(int fd, struct root_lookup *root_lookup,
+   const char *path)
 {
int ret;
 
@@ -1510,7 +1511,7 @@ static int btrfs_list_subvols(int fd, struct root_lookup 
*root_lookup)
 int btrfs_list_subvols_print(int fd, struct btrfs_list_filter_set *filter_set,
   struct btrfs_list_comparer_set *comp_set,
   enum btrfs_list_layout layout, int full_path,
-  const char *raw_prefix)
+  const char *raw_prefix, const char *path)
 {
struct root_lookup root_lookup;
struct root_lookup root_sort;
@@ -1522,7 +1523,7 @@ int btrfs_list_subvols_print(int fd, struct 
btrfs_list_filter_set *filter_set,
if (ret)
return ret;
 
-   ret = btrfs_list_subvols(fd, &root_lookup);
+   ret = btrfs_list_subvols(fd, &root_lookup, path);
if (ret) {
rb_free_nodes(&root_lookup.root, free_root_info);
return ret;
@@ -1543,7 +1544,8 @@ static char *strdup_or_null(const char *s)
return strdup(s);
 }
 
-int btrfs_get_toplevel_subvol(int fd, struct root_info *the_ri)
+int btrfs_get_toplevel_subvol(int fd, struct root_info *the_ri,
+   const char *path)
 {
int ret;
struct root_lookup rl;
@@ -1555,7 +1557,7 @@ int btrfs_get_toplevel_subvol(int fd, struct root_info 
*the_ri)
if (ret)
return ret;
 
-   ret = btrfs_list_subvols(fd, &rl);
+   ret = btrfs_list_subvols(fd, &rl, path);
if (ret) {
rb_free_nodes(&rl.root, free_root_info);
return ret;
@@ -1578,7 +1580,7 @@ int btrfs_get_toplevel_subvol(int fd, struct root_info 
*the_ri)
return ret;
 }
 
-int btrfs_get_subvol(int fd, struct root_info *the_ri)
+int btrfs_get_subvol(int fd, struct root_info *the_ri, const char *path)
 {
int ret, rr;
struct root_lookup rl;
@@ -1590,7 +1592,7 @@ int btrfs_get_subvol(int fd, struct root_info *the_ri)
if (ret)
return ret;
 
-   ret = btrfs_list_subvols(fd, &rl);
+   ret = btrfs_list_subvols(fd, &rl, path);
if (ret) {
rb_free_nodes(&rl.root, free_root_info);
return ret;
diff --git a/btrfs-list.h b/btrfs-list.h
index 6e5fc778..9b6720e6 100644
--- a/btrfs-list.h
+++ b/btrfs-list.h
@@ -169,12 +169,13 @@ struct btrfs_list_comparer_set 
*btrfs_list_alloc_comparer_set(void);
 int btrfs_list_subvols_print(int fd, struct btrfs_list_filter_set *filter_set,
   struct btrfs_list_comparer_set *comp_set,
   enum btrfs_list_layout layot, int full_path,
-  const char *raw_prefix);
+  const char *raw_prefix, const char *path);
 int btrfs_list_find_updated_files(int fd, u64 root_id, u64 oldest_gen);
 int btrfs_list_get_default_subvolume(int fd, u64 *default_id);
 char *btrfs_list_path_for_root(int fd, u64 root);
 int btrfs_list_get_path_rootid(int fd, u64 *treeid);
-int btrfs_get_subvol(int fd, struct root_info *the_ri);
-int btrfs_get_toplevel_subvol(int fd, struct root_info *the_ri);
+int btrfs_get_subvol(int fd, struct root_info *the_ri, const char *path);
+int btrfs_get_toplevel_subvol(int fd, struct root_info *the_ri,
+   const char *path);
 
 #endif
diff --git a/cmds-subvolume.c b/cmds-subvolume.c
index 8a473f7a..faa10c5a 100644
--- a/cmds-subvolume.c
+++ b/cmds-subvolume.c
@@ -620,7 +620,7 @@ static int cmd_subvol_list(int argc, char **argv)
btrfs_list_setup_print_column(BTRFS_LIST_PATH);
 
ret = btrfs_list_subvols_print(fd, filter_set, comparer_set,
-   layout, !is_list_all && !is_only_in_path, NULL);
+   layout, !is_list_all && !is_only_in_path, NULL, subvol);
 
 out:
close_file_or_dir(fd, dirstream);
@@ -844,7 +844,7 @@ static int cmd_subvol_get_default(int argc, char **argv)
btrfs_list_setup_print_column(BTRFS_LIST_PATH);
 
ret = btrfs_list_subvols_print(fd, filter_set, NULL,
-   BTRFS_LIST_LAYOUT_DEFAULT, 1, NULL);
+   BTRFS_LIST_LAYOUT_DEFAULT, 1, NULL, subvol);
 
if (filter_set)
free(filter_set);
@@ -1110,7 +1110,7 @@ static int cmd_subvol_show(int argc, char **argv)
goto out;
}
btrfs_list_subvols_print(fd, filter_set, NULL, BTRFS_LIST_LAYOUT_RAW,
-

[RFC PATCH v3 6/7] btrfs-progs: test: Add helper function to check if test user exists

2018-03-19 Thread Misono, Tomohiro

Test user 'progs-test' will be used to test the behavior of normal user.

In order to pass this check, add the user by "useradd -M progs-test".
Note that progs-test should not have root privileges.

Signed-off-by: Tomohiro Misono 
---
 tests/common | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/tests/common b/tests/common
index fae30f1d..e8f7c061 100644
--- a/tests/common
+++ b/tests/common
@@ -307,6 +307,16 @@ check_global_prereq()
fi
 }
 
+check_testuser()
+{
+   id -u progs-test > /dev/null 2>&1
+   if [ $? -ne 0 ]; then
+   _not_run "Need to add user \"progs-test\""
+   fi
+   # Note that progs-test should not have root privileges
+   # otherwise test may not run as expected
+}
+
 check_image()
 {
local image
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RFC PATCH v3 7/7] btrfs-porgs: test: Add cli-test/009 to check subvolume list for both root and normal user

2018-03-19 Thread Misono, Tomohiro


Signed-off-by: Tomohiro Misono 
---
 tests/cli-tests/009-subvolume-list/test.sh | 136 +
 1 file changed, 136 insertions(+)
 create mode 100755 tests/cli-tests/009-subvolume-list/test.sh

diff --git a/tests/cli-tests/009-subvolume-list/test.sh 
b/tests/cli-tests/009-subvolume-list/test.sh
new file mode 100755
index ..ac00a697
--- /dev/null
+++ b/tests/cli-tests/009-subvolume-list/test.sh
@@ -0,0 +1,136 @@
+#!/bin/bash
+# test for "subvolume list" both for root and normal user
+
+source "$TEST_TOP/common"
+
+check_testuser
+check_prereq mkfs.btrfs
+check_prereq btrfs
+
+setup_root_helper
+prepare_test_dev
+
+
+# test if the ids returned by "sub list" match expected ids
+# $1  ... indicate run as root or test user
+# $2  ... PATH to be specified by sub list command
+# $3~ ... expected return ids
+test_list()
+{
+   local SUDO
+   if [ $1 -eq 1 ]; then
+   SUDO=$SUDO_HELPER
+   else
+   SUDO="sudo -u progs-test"
+   fi
+
+   result=$(run_check_stdout $SUDO "$TOP/btrfs" subvolume list "$2" | \
+   awk '{print $2}' | xargs | sort -n)
+
+   shift
+   shift
+   expected=($(echo "$@" | tr " " "\n" | sort -n))
+   expected=$(IFS=" "; echo "${expected[*]}")
+
+   if [ "$result" != "$expected" ]; then
+   echo "result  : $result"
+   echo "expected: $expected"
+   _fail "ids returned by sub list does not match expected ids"
+   fi
+}
+
+run_check $SUDO_HELPER "$TOP/mkfs.btrfs" -f "$TEST_DEV"
+run_check_mount_test_dev
+cd "$TEST_MNT"
+
+# create subvolumes and directories and make some non-readable
+# by user 'progs-test'
+run_check $SUDO_HELPER "$TOP/btrfs" subvolume create sub1
+run_check $SUDO_HELPER "$TOP/btrfs" subvolume create sub1/subsub1
+run_check $SUDO_HELPER mkdir sub1/dir
+
+run_check $SUDO_HELPER "$TOP/btrfs" subvolume create sub2
+run_check $SUDO_HELPER mkdir -p sub2/dir/dirdir
+run_check $SUDO_HELPER "$TOP/btrfs" subvolume create sub2/dir/subsub2
+run_check $SUDO_HELPER "$TOP/btrfs" subvolume create sub2/dir/dirdir/subsubX
+
+run_check $SUDO_HELPER "$TOP/btrfs" subvolume create sub3
+run_check $SUDO_HELPER "$TOP/btrfs" subvolume create sub3/subsub3
+run_check $SUDO_HELPER mkdir sub3/dir
+run_check $SUDO_HELPER "$TOP/btrfs" subvolume create sub3/dir/subsubY
+run_check $SUDO_HELPER chmod o-r sub3
+
+run_check $SUDO_HELPER "$TOP/btrfs" subvolume create sub4
+run_check $SUDO_HELPER "$TOP/btrfs" subvolume create sub4/subsub4
+run_check $SUDO_HELPER mkdir sub4/dir
+run_check $SUDO_HELPER "$TOP/btrfs" subvolume create sub4/dir/subsubZ
+run_check $SUDO_HELPER setfacl -m u:progs-test:- sub4/dir
+
+run_check $SUDO_HELPER touch "file"
+
+# expected result for root:
+#
+# ID 257 gen 9 top level 5 path sub1
+# ID 258 gen 8 top level 257 path sub1/subsub1
+# ID 259 gen 11 top level 5 path sub2
+# ID 260 gen 10 top level 259 path sub2/dir/subsub2
+# ID 261 gen 11 top level 259 path sub2/dir/dirdir/subsubX
+# ID 262 gen 15 top level 5 path sub3
+# ID 263 gen 13 top level 262 path sub3/subsub3
+# ID 264 gen 14 top level 262 path sub3/dir/subsubY
+# ID 265 gen 17 top level 5 path sub4
+# ID 266 gen 16 top level 265 path sub4/subsub4
+# ID 267 gen 17 top level 265 path sub4/dir/subsubZ
+
+# check for root for both absolute/relative path
+# always returns all subvolumes
+all=$(seq 257 267)
+test_list 1 "$TEST_MNT" "${all[@]}"
+test_list 1 "$TEST_MNT/sub1" "${all[@]}"
+test_list 1 "$TEST_MNT/sub1/dir" "${all[@]}"
+test_list 1 "$TEST_MNT/sub2" "${all[@]}"
+test_list 1 "$TEST_MNT/sub2/dir" "${all[@]}"
+test_list 1 "$TEST_MNT/sub3" "${all[@]}"
+test_list 1 "$TEST_MNT/sub4" "${all[@]}"
+run_mustfail "should fail for file" \
+   $SUDO_HELPER "$TOP/btrfs" subvolume list "$TEST_MNT/file"
+
+test_list 1 "." "${all[@]}"
+test_list 1 "sub1" "${all[@]}"
+test_list 1 "sub1/dir" "${all[@]}"
+test_list 1 "sub2" "${all[@]}"
+test_list 1 "sub2/dir" "${all[@]}"
+test_list 1 "sub3" "${all[@]}"
+test_list 1 "sub4" "${all[@]}"
+run_mustfail "should fail for file" \
+   $SUDO_HELPER "$TOP/btrfs" subvolume list "file"
+
+# check for normal user for both absolute/relative path
+# only returns subvolumes under specified path
+test_list 0 "$TEST_MNT" "257 258 259 260 261 262 265 266"
+test_list 0 "$TEST_MNT/sub1" "257 258"
+test_list 0 "$TEST_MNT/sub1/dir" ""
+test_list 0 "$TEST_MNT/sub2" "259 260 261"
+test_list 0 "$TEST_MNT/sub2/dir" "260 261"
+run_mustfail "should raise permission error" \
+   sudo -u progs-test "$TOP/btrfs" subvolume list "$TEST_MNT/sub3"
+test_list 0 "$TEST_MNT/sub4" "265 266"
+run_mustfail "should raise permission error" \
+   sudo -u progs-test "$TOP/btrfs" subvolume list "$TEST_MNT/sub4/dir"
+run_mustfail "should fail for file" \
+   sudo -u progs-test "$TOP/btrfs" subvolume list "$TEST_MNT/file"
+
+test_list 0 "." "257 258 259 260 261 262 265 266"
+test_list 0 "sub1/dir" ""
+test_list 0 "sub2" "259 260 261"
+test_list 0 "sub2/dir" "260 261"
+run_mustfail

[RFC PATCH v3 5/7] btrfs-progs: sub list: Allow normal user to call "subvolume list/show"

2018-03-19 Thread Misono, Tomohiro

Allow normal user to call "subvolume list/show" by using 3 new
unprivileged ioctls (BTRFS_IOC_GET_SUBVOL_INFO,
BTRFS_IOC_GET_SUBVOL_ROOTREF and BTRFS_IOC_INO_LOOKUP_USER).

Note that for root, "subvolume list" returns all the subvolume in the
filesystem by default, but for normal user, it returns subvolumes
which exist under the specified path (including the path itself).
The specified path itself is not needed to be a subvolume.
If the subvolume cannot be opend but the parent directory can be,
the information other than name or id would be zeroed out.

Also, for normal user, snapshot filed of "subvolume show" just lists
the snapshots under the specified subvolume.

Signed-off-by: Tomohiro Misono 
---
 btrfs-list.c | 344 +--
 cmds-subvolume.c |   8 ++
 2 files changed, 341 insertions(+), 11 deletions(-)

diff --git a/btrfs-list.c b/btrfs-list.c
index 833ff912..c819499f 100644
--- a/btrfs-list.c
+++ b/btrfs-list.c
@@ -33,6 +33,7 @@
 #include 
 #include "btrfs-list.h"
 #include "rbtree-utils.h"
+#include 
 
 #define BTRFS_LIST_NFILTERS_INCREASE   (2 * BTRFS_LIST_FILTER_MAX)
 #define BTRFS_LIST_NCOMPS_INCREASE (2 * BTRFS_LIST_COMP_MAX)
@@ -549,6 +550,9 @@ static int resolve_root(struct root_lookup *rl, struct 
root_info *ri,
int len = 0;
struct root_info *found;
 
+   if (ri->full_path != NULL)
+   return 0;
+
/*
 * we go backwards from the root_info object and add pathnames
 * from parent directories as we go.
@@ -672,6 +676,50 @@ static int lookup_ino_path(int fd, struct root_info *ri)
return 0;
 }
 
+/* user version of lookup_ino_path which also cheks the access right */
+static int lookup_ino_path_user(int fd, struct root_info *ri)
+{
+   struct btrfs_ioctl_ino_lookup_user_args args;
+   int ret = 0;
+
+   if (ri->path)
+   return 0;
+   if (!ri->ref_tree)
+   return -ENOENT;
+
+   memset(&args, 0, sizeof(args));
+   args.dirid = ri->dir_id;
+   args.subvolid = ri->root_id;
+
+   ret = ioctl(fd, BTRFS_IOC_INO_LOOKUP_USER, &args);
+   if (ret < 0) {
+   if (errno == ENOENT) {
+   ri->ref_tree = 0;
+   return -ENOENT;
+   }
+   if (errno != EACCES) {
+   error("failed to lookup path for root %llu: %m",
+   (unsigned long long)ri->ref_tree);
+   return ret;
+   } else {
+   return -EACCES;
+   }
+   }
+
+   ri->path = malloc(strlen(args.path) + strlen(args.name) + 1);
+   if (!ri->path)
+   return -ENOMEM;
+   strcpy(ri->path, args.path);
+
+   ri->name = malloc(strlen(args.name) + 1);
+   if (!ri->name)
+   return -ENOMEM;
+   strcpy(ri->name, args.name);
+
+   strcat(ri->path, ri->name);
+   return ret;
+}
+
 /* finding the generation for a given path is a two step process.
  * First we use the inode lookup routine to find out the root id
  *
@@ -1310,6 +1358,266 @@ static int list_subvol_fill_paths(int fd, struct 
root_lookup *root_lookup)
return 0;
 }
 
+static int fill_subvol_info(struct root_lookup *root_lookup, int fd)
+{
+   struct btrfs_ioctl_get_subvol_info_args subvol_info;
+   u64 root_offset;
+   int ret;
+
+   ret = ioctl(fd, BTRFS_IOC_GET_SUBVOL_INFO, &subvol_info);
+   if (ret < 0)
+   return -errno;
+
+   if (!uuid_is_null(subvol_info.parent_uuid))
+   root_offset = subvol_info.otransid;
+   else
+   root_offset = 0;
+
+   add_root(root_lookup, subvol_info.id, 0,
+root_offset, subvol_info.flags, subvol_info.dirid,
+NULL, 0,
+subvol_info.otransid, subvol_info.generation,
+subvol_info.otime.sec, subvol_info.uuid,
+subvol_info.parent_uuid, subvol_info.received_uuid);
+
+   return 0;
+}
+
+static int fill_subvol_info_top(struct root_lookup *root_lookup, int fd)
+{
+   struct btrfs_ioctl_get_subvol_info_args subvol_info;
+   struct root_info *ri;
+   u64 root_offset;
+   int ret;
+
+   ret = ioctl(fd, BTRFS_IOC_GET_SUBVOL_INFO, &subvol_info);
+   if (ret < 0)
+   return -errno;
+
+   if (!uuid_is_null(subvol_info.parent_uuid))
+   root_offset = subvol_info.otransid;
+   else
+   root_offset = 0;
+
+   add_root(root_lookup, subvol_info.id, subvol_info.parent_id,
+root_offset, subvol_info.flags, subvol_info.dirid,
+subvol_info.name, strlen(subvol_info.name),
+subvol_info.otransid, subvol_info.generation,
+subvol_info.otime.sec, subvol_info.uuid,
+subvol_info.parent_uuid, subvol_info.received_uuid);
+
+   /*
+* fill the path since we won't lookup directory/subvolume

[PATCH] btrfs-progs: mkfs: add uuid and otime to ROOT_ITEM of FS_TREE

2018-03-19 Thread Misono, Tomohiro

Currently, the top-level subvolume lacks the UUID. As a result, both
non-snapshot subvolume and snapshot of top-level subvolume do not have
Parent UUID and cannot be distinguisued. Therefore "fi show" of
top-level lists all the subvolumes which lacks the UUID in
"Snapshot(s)" filed.  Also, it lacks the otime information.

Fix this by adding the UUID and otime at the mkfs time.  As a
consequence, snapshots of top-level subvolume now have a Parent UUID and
UUID tree will create an entry for top-level subvolume at mount time.
This should not cause the problem for current kernel, but user program
which relies on the empty Parent UUID may be affected by this change.

Signed-off-by: Tomohiro Misono 
---
This is also needed in order that "sub list -s" works properly for
non-privileged user[1] even if there are snapshots of toplevel subvolume.

Currently the check if a subvolume is a snapshot is done by looking at the key
offset of ROOT_ITEM of subvolume (non-zero for snapshot) used by tree search 
ioctl.
However, non-privileged version of "sub list" won't use tree search ioctl and 
just
looking if parent uuid is null or not. Therefore there is no way to recognize
snapshots of toplevel subvolume.

[1] https://marc.info/?l=linux-btrfs&m=152144463907830&w=2

 mkfs/common.c | 14 ++
 mkfs/main.c   |  3 +++
 2 files changed, 17 insertions(+)

diff --git a/mkfs/common.c b/mkfs/common.c
index 16916ca2..6924d9b7 100644
--- a/mkfs/common.c
+++ b/mkfs/common.c
@@ -44,6 +44,7 @@ static int btrfs_create_tree_root(int fd, struct 
btrfs_mkfs_config *cfg,
u32 itemoff;
int ret = 0;
int blk;
+   u8 uuid[BTRFS_UUID_SIZE];
 
memset(buf->data + sizeof(struct btrfs_header), 0,
cfg->nodesize - sizeof(struct btrfs_header));
@@ -77,6 +78,19 @@ static int btrfs_create_tree_root(int fd, struct 
btrfs_mkfs_config *cfg,
btrfs_set_item_offset(buf, btrfs_item_nr(nritems), itemoff);
btrfs_set_item_size(buf, btrfs_item_nr(nritems),
sizeof(root_item));
+   if (blk == MKFS_FS_TREE) {
+   time_t now = time(NULL);
+
+   uuid_generate(uuid);
+   memcpy(root_item.uuid, uuid, BTRFS_UUID_SIZE);
+   btrfs_set_stack_timespec_sec(&root_item.otime, now);
+   btrfs_set_stack_timespec_sec(&root_item.ctime, now);
+   } else {
+   memset(uuid, 0, BTRFS_UUID_SIZE);
+   memcpy(root_item.uuid, uuid, BTRFS_UUID_SIZE);
+   btrfs_set_stack_timespec_sec(&root_item.otime, 0);
+   btrfs_set_stack_timespec_sec(&root_item.ctime, 0);
+   }
write_extent_buffer(buf, &root_item,
btrfs_item_ptr_offset(buf, nritems),
sizeof(root_item));
diff --git a/mkfs/main.c b/mkfs/main.c
index 5a717f70..52d92581 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -315,6 +315,7 @@ static int create_tree(struct btrfs_trans_handle *trans,
struct btrfs_key location;
struct btrfs_root_item root_item;
struct extent_buffer *tmp;
+   u8 uuid[BTRFS_UUID_SIZE] = {0};
int ret;
 
ret = btrfs_copy_root(trans, root, root->node, &tmp, objectid);
@@ -325,6 +326,8 @@ static int create_tree(struct btrfs_trans_handle *trans,
btrfs_set_root_bytenr(&root_item, tmp->start);
btrfs_set_root_level(&root_item, btrfs_header_level(tmp));
btrfs_set_root_generation(&root_item, trans->transid);
+   /* clear uuid of source tree */
+   memcpy(root_item.uuid, uuid, BTRFS_UUID_SIZE);
free_extent_buffer(tmp);
 
location.objectid = objectid;
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] btrfs-progs: mkfs: add uuid and otime to ROOT_ITEM of FS_TREE

2018-03-19 Thread Hugo Mills

On Mon, Mar 19, 2018 at 05:16:42PM +0900, Misono, Tomohiro wrote:
> Currently, the top-level subvolume lacks the UUID. As a result, both
> non-snapshot subvolume and snapshot of top-level subvolume do not have
> Parent UUID and cannot be distinguisued. Therefore "fi show" of
> top-level lists all the subvolumes which lacks the UUID in
> "Snapshot(s)" filed.  Also, it lacks the otime information.
> 
> Fix this by adding the UUID and otime at the mkfs time.  As a
> consequence, snapshots of top-level subvolume now have a Parent UUID and
> UUID tree will create an entry for top-level subvolume at mount time.
> This should not cause the problem for current kernel, but user program
> which relies on the empty Parent UUID may be affected by this change.

   Is there any way of adding a UUID to the top level subvol on an
existing filesystem? It would be helpful not to have to rebuild every
filesystem in the world to fix this.

   Hugo.

> Signed-off-by: Tomohiro Misono 
> ---
> This is also needed in order that "sub list -s" works properly for
> non-privileged user[1] even if there are snapshots of toplevel subvolume.
> 
> Currently the check if a subvolume is a snapshot is done by looking at the key
> offset of ROOT_ITEM of subvolume (non-zero for snapshot) used by tree search 
> ioctl.
> However, non-privileged version of "sub list" won't use tree search ioctl and 
> just
> looking if parent uuid is null or not. Therefore there is no way to recognize
> snapshots of toplevel subvolume.
> 
> [1] https://marc.info/?l=linux-btrfs&m=152144463907830&w=2
> 
>  mkfs/common.c | 14 ++
>  mkfs/main.c   |  3 +++
>  2 files changed, 17 insertions(+)
> 
> diff --git a/mkfs/common.c b/mkfs/common.c
> index 16916ca2..6924d9b7 100644
> --- a/mkfs/common.c
> +++ b/mkfs/common.c
> @@ -44,6 +44,7 @@ static int btrfs_create_tree_root(int fd, struct 
> btrfs_mkfs_config *cfg,
>   u32 itemoff;
>   int ret = 0;
>   int blk;
> + u8 uuid[BTRFS_UUID_SIZE];
>  
>   memset(buf->data + sizeof(struct btrfs_header), 0,
>   cfg->nodesize - sizeof(struct btrfs_header));
> @@ -77,6 +78,19 @@ static int btrfs_create_tree_root(int fd, struct 
> btrfs_mkfs_config *cfg,
>   btrfs_set_item_offset(buf, btrfs_item_nr(nritems), itemoff);
>   btrfs_set_item_size(buf, btrfs_item_nr(nritems),
>   sizeof(root_item));
> + if (blk == MKFS_FS_TREE) {
> + time_t now = time(NULL);
> +
> + uuid_generate(uuid);
> + memcpy(root_item.uuid, uuid, BTRFS_UUID_SIZE);
> + btrfs_set_stack_timespec_sec(&root_item.otime, now);
> + btrfs_set_stack_timespec_sec(&root_item.ctime, now);
> + } else {
> + memset(uuid, 0, BTRFS_UUID_SIZE);
> + memcpy(root_item.uuid, uuid, BTRFS_UUID_SIZE);
> + btrfs_set_stack_timespec_sec(&root_item.otime, 0);
> + btrfs_set_stack_timespec_sec(&root_item.ctime, 0);
> + }
>   write_extent_buffer(buf, &root_item,
>   btrfs_item_ptr_offset(buf, nritems),
>   sizeof(root_item));
> diff --git a/mkfs/main.c b/mkfs/main.c
> index 5a717f70..52d92581 100644
> --- a/mkfs/main.c
> +++ b/mkfs/main.c
> @@ -315,6 +315,7 @@ static int create_tree(struct btrfs_trans_handle *trans,
>   struct btrfs_key location;
>   struct btrfs_root_item root_item;
>   struct extent_buffer *tmp;
> + u8 uuid[BTRFS_UUID_SIZE] = {0};
>   int ret;
>  
>   ret = btrfs_copy_root(trans, root, root->node, &tmp, objectid);
> @@ -325,6 +326,8 @@ static int create_tree(struct btrfs_trans_handle *trans,
>   btrfs_set_root_bytenr(&root_item, tmp->start);
>   btrfs_set_root_level(&root_item, btrfs_header_level(tmp));
>   btrfs_set_root_generation(&root_item, trans->transid);
> + /* clear uuid of source tree */
> + memcpy(root_item.uuid, uuid, BTRFS_UUID_SIZE);
>   free_extent_buffer(tmp);
>  
>   location.objectid = objectid;

-- 
Hugo Mills | This chap Anon is writing some perfectly lovely
hugo@... carfax.org.uk | stuff at the moment.
http://carfax.org.uk/  |
PGP: E2AB1DE4  |


signature.asc
Description: Digital signature

Kernel warning - not sure if this is important

2018-03-19 Thread Peter Chant

I got this kernel warning overnight.  Possibly during or after a dedup
using duperemove.  I'm not sure if that is relevent.  Seems to relate to
fs/btrfs/backref.c line 1266.

Don't know if it is important.  Thought I'd post it just in case.

I'm afraid this is a screen-shot in the old fashioned sense:
https://drive.google.com/file/d/1NelZV2ld9Yqqt8b2y2sg6aAwgevgMy4X/view?usp=sharing

Pete
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Kernel warning - not sure if this is important

2018-03-19 Thread Nikolay Borisov



On 19.03.2018 10:21, Peter Chant wrote:
> I got this kernel warning overnight.  Possibly during or after a dedup
> using duperemove.  I'm not sure if that is relevent.  Seems to relate to
> fs/btrfs/backref.c line 1266.
> 
> Don't know if it is important.  Thought I'd post it just in case.
> 
> I'm afraid this is a screen-shot in the old fashioned sense:
> https://drive.google.com/file/d/1NelZV2ld9Yqqt8b2y2sg6aAwgevgMy4X/view?usp=sharing

You are hitting a warning that was removed in : c8195a7b1ad5 ("btrfs:
remove spurious WARN_ON(ref->count < 0) in find_parent_nodes")


In short - you can disregard it. However, it's likely you will continue
seeing it in the future unless you update to 4.16 (the commit is not
tagged for stable ))



> 
> Pete
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] btrfs: Validate child tree block's level and first key

2018-03-19 Thread Qu Wenruo

We have several reports about node pointer points to incorrect child
tree blocks, which could have even wrong owner and level but still with
valid generation and checksum.

Although btrfs check could handle it and print error message like:
leaf parent key incorrect 60670574592

Kernel doesn't have enough check on this type of corruption correctly.
At least add such check to read_tree_block() and btrfs_read_buffer(),
where we need two new parameters @first_key and @level to verify the
child tree block.

For case where we don't have parent node to get @first_key and @level,
just pass @first_key as NULL and kernel will skip such check.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/backref.c |  6 +++--
 fs/btrfs/ctree.c   | 25 +-
 fs/btrfs/disk-io.c | 69 --
 fs/btrfs/disk-io.h |  8 +++---
 fs/btrfs/extent-tree.c |  6 -
 fs/btrfs/print-tree.c  | 10 +---
 fs/btrfs/qgroup.c  |  7 +++--
 fs/btrfs/relocation.c  | 21 ---
 fs/btrfs/tree-log.c| 12 ++---
 9 files changed, 126 insertions(+), 38 deletions(-)

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index 26484648d090..3866b8ab20f1 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -738,7 +738,8 @@ static int add_missing_keys(struct btrfs_fs_info *fs_info,
BUG_ON(ref->key_for_search.type);
BUG_ON(!ref->wanted_disk_byte);
 
-   eb = read_tree_block(fs_info, ref->wanted_disk_byte, 0);
+   eb = read_tree_block(fs_info, ref->wanted_disk_byte, 0, NULL,
+0);
if (IS_ERR(eb)) {
free_pref(ref);
return PTR_ERR(eb);
@@ -1291,7 +1292,8 @@ static int find_parent_nodes(struct btrfs_trans_handle 
*trans,
ref->level == 0) {
struct extent_buffer *eb;
 
-   eb = read_tree_block(fs_info, ref->parent, 0);
+   eb = read_tree_block(fs_info, ref->parent, 0,
+NULL, 0);
if (IS_ERR(eb)) {
ret = PTR_ERR(eb);
goto out;
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index b88a79e69ddf..e49ca6d529e8 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1436,7 +1436,7 @@ get_old_root(struct btrfs_root *root, u64 time_seq)
if (old_root && tm && tm->op != MOD_LOG_KEY_REMOVE_WHILE_FREEING) {
btrfs_tree_read_unlock(eb_root);
free_extent_buffer(eb_root);
-   old = read_tree_block(fs_info, logical, 0);
+   old = read_tree_block(fs_info, logical, 0, NULL, 0);
if (WARN_ON(IS_ERR(old) || !extent_buffer_uptodate(old))) {
if (!IS_ERR(old))
free_extent_buffer(old);
@@ -1656,6 +1656,7 @@ int btrfs_realloc_node(struct btrfs_trans_handle *trans,
btrfs_set_lock_blocking(parent);
 
for (i = start_slot; i <= end_slot; i++) {
+   struct btrfs_key first_key;
int close = 1;
 
btrfs_node_key(parent, &disk_key, i);
@@ -1665,6 +1666,7 @@ int btrfs_realloc_node(struct btrfs_trans_handle *trans,
progress_passed = 1;
blocknr = btrfs_node_blockptr(parent, i);
gen = btrfs_node_ptr_generation(parent, i);
+   btrfs_node_key_to_cpu(parent, &first_key, i);
if (last_block == 0)
last_block = blocknr;
 
@@ -1688,7 +1690,9 @@ int btrfs_realloc_node(struct btrfs_trans_handle *trans,
uptodate = 0;
if (!cur || !uptodate) {
if (!cur) {
-   cur = read_tree_block(fs_info, blocknr, gen);
+   cur = read_tree_block(fs_info, blocknr, gen,
+ &first_key,
+ parent_level - 1);
if (IS_ERR(cur)) {
return PTR_ERR(cur);
} else if (!extent_buffer_uptodate(cur)) {
@@ -1696,7 +1700,8 @@ int btrfs_realloc_node(struct btrfs_trans_handle *trans,
return -EIO;
}
} else if (!uptodate) {
-   err = btrfs_read_buffer(cur, gen);
+   err = btrfs_read_buffer(cur, gen, &first_key,
+   parent_level - 1);
if (err) {
free_extent_buffer(cur);
return err;
@@ -1849,14 +1854,17 @@ read_node_slot(st

RE: [PATCH v5 0/2] Remove false-positive VLAs when using max()

2018-03-19 Thread David Laight

From: linus...@gmail.com [mailto:linus...@gmail.com] On Behalf Of Linus Torvalds
> Sent: 18 March 2018 23:36
...
> 
> Yeah, and since we're in the situation that *new* gcc versions work
> for us anyway, and we only have issues with older gcc's (that sadly
> people still use), even if there was a new cool feature we couldn't
> use it.

Is it necessary to have the full checks for old versions of gcc?

Even -Wvla could be predicated on very recent gcc - since we aren't
worried about whether gcc decides to generate a vla, but whether
the source requests one.

David

[PATCH] btrfs-progs: print-tree: Do correct offset output for ROOT_ITEM

2018-03-19 Thread Qu Wenruo

Commit Fixes: 8c36786c8198 ("btrfs-progs: print-tree: Print offset as tree 
objectid for ROOT_ITEM")
changes how we translate offset of ROOT_ITEM.

However the fact is, even for ROOT_ITEM, we have different meaning of
offset.

For tree reloc tree, it's indeed subvolume id.
But for other trees, it's the transid of when it's created.

Fix it.

Reported-by: Nikolay Borisov 
Signed-off-by: Qu Wenruo 
---
 print-tree.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/print-tree.c b/print-tree.c
index 45350fea0963..848e296c4e67 100644
--- a/print-tree.c
+++ b/print-tree.c
@@ -834,7 +834,16 @@ void btrfs_print_key(struct btrfs_disk_key *disk_key)
 */
case BTRFS_ROOT_ITEM_KEY:
printf(" ");
-   print_objectid(stdout, offset, type);
+   /*
+* Normally offset of ROOT_ITEM should presents the generation
+* when this root is created.
+* However if this is tree reloc tree, offset is the subvolume
+* id of its source. Here we do extra check on this.
+*/
+   if (objectid == BTRFS_TREE_RELOC_OBJECTID)
+   print_objectid(stdout, offset, type);
+   else
+   printf("%lld", offset);
printf(")");
break;
default:
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v5 2/2] Remove false-positive VLAs when using max()

2018-03-19 Thread Andrey Ryabinin



On 03/16/2018 07:25 AM, Kees Cook wrote:
> As part of removing VLAs from the kernel[1], we want to build with -Wvla,
> but it is overly pessimistic and only accepts constant expressions for
> stack array sizes, instead of also constant values. The max() macro
> triggers the warning, so this refactors these uses of max() to use the
> new const_max() instead.
> 
> [1] https://lkml.org/lkml/2018/3/7/621
> 
> Signed-off-by: Kees Cook 
> ---
>  drivers/input/touchscreen/cyttsp4_core.c |  2 +-
>  fs/btrfs/tree-checker.c  |  3 ++-
>  lib/vsprintf.c   |  5 +++--
>  net/ipv4/proc.c  |  8 
>  net/ipv6/proc.c  | 11 +--
>  5 files changed, 15 insertions(+), 14 deletions(-)
> 

FWIW, the patch below is alternative way to deal with these (Note, I didn't 
test my patch, just demonstrating the idea).
It's quite simple, and should work on any gcc version.

This approach wouldn't work well for CONFIG dependent max values, especially in 
case of single constant
expression being dependent on several config options, but it seems we don't 
have any these.


 drivers/input/touchscreen/cyttsp4_core.c | 3 ++-
 fs/btrfs/tree-checker.c  | 3 ++-
 lib/vsprintf.c   | 6 --
 net/ipv4/proc.c  | 4 +++-
 net/ipv6/proc.c  | 6 --
 5 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/input/touchscreen/cyttsp4_core.c 
b/drivers/input/touchscreen/cyttsp4_core.c
index 727c3232517c..ce546a3fad3d 100644
--- a/drivers/input/touchscreen/cyttsp4_core.c
+++ b/drivers/input/touchscreen/cyttsp4_core.c
@@ -868,7 +868,8 @@ static void cyttsp4_get_mt_touches(struct cyttsp4_mt_data 
*md, int num_cur_tch)
struct cyttsp4_touch tch;
int sig;
int i, j, t = 0;
-   int ids[max(CY_TMA1036_MAX_TCH, CY_TMA4XX_MAX_TCH)];
+   int ids[CY_TMA4XX_MAX_TCH];
+   BUILD_BUG_ON(CY_TMA1036_MAX_TCH > CY_TMA4XX_MAX_TCH);
 
memset(ids, 0, si->si_ofs.tch_abs[CY_TCH_T].max * sizeof(int));
for (i = 0; i < num_cur_tch; i++) {
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index 8871286c1a91..ad4c2fea572f 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -346,7 +346,8 @@ static int check_dir_item(struct btrfs_fs_info *fs_info,
 */
if (key->type == BTRFS_DIR_ITEM_KEY ||
key->type == BTRFS_XATTR_ITEM_KEY) {
-   char namebuf[max(BTRFS_NAME_LEN, XATTR_NAME_MAX)];
+   char namebuf[BTRFS_NAME_LEN];
+   BUILD_BUG_ON(XATTR_NAME_MAX > BTRFS_NAME_LEN);
 
read_extent_buffer(leaf, namebuf,
(unsigned long)(di + 1), name_len);
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 942b5234a59b..fa081d684660 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -754,13 +754,15 @@ char *resource_string(char *buf, char *end, struct 
resource *res,
 #define FLAG_BUF_SIZE  (2 * sizeof(res->flags))
 #define DECODED_BUF_SIZE   sizeof("[mem - 64bit pref window disabled]")
 #define RAW_BUF_SIZE   sizeof("[mem - flags 0x]")
-   char sym[max(2*RSRC_BUF_SIZE + DECODED_BUF_SIZE,
-2*RSRC_BUF_SIZE + FLAG_BUF_SIZE + RAW_BUF_SIZE)];
+   char sym[2*RSRC_BUF_SIZE + DECODED_BUF_SIZE];
 
char *p = sym, *pend = sym + sizeof(sym);
int decode = (fmt[0] == 'R') ? 1 : 0;
const struct printf_spec *specp;
 
+   BUILD_BUG_ON((2*RSRC_BUF_SIZE + FLAG_BUF_SIZE + RAW_BUF_SIZE) >
+   (2*RSRC_BUF_SIZE + DECODED_BUF_SIZE));
+
*p++ = '[';
if (res->flags & IORESOURCE_IO) {
p = string(p, pend, "io  ", str_spec);
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index d97e83b2dd33..9d08749de8d0 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -46,7 +46,7 @@
 #include 
 #include 
 
-#define TCPUDP_MIB_MAX max_t(u32, UDP_MIB_MAX, TCP_MIB_MAX)
+#define TCPUDP_MIB_MAX TCP_MIB_MAX
 
 /*
  * Report socket allocation statistics [m...@utu.fi]
@@ -404,6 +404,8 @@ static int snmp_seq_show_tcp_udp(struct seq_file *seq, void 
*v)
struct net *net = seq->private;
int i;
 
+   BUILD_BUG_ON(UDP_MIB_MAX > TCP_MIB_MAX);
+
memset(buff, 0, TCPUDP_MIB_MAX * sizeof(unsigned long));
 
seq_puts(seq, "\nTcp:");
diff --git a/net/ipv6/proc.c b/net/ipv6/proc.c
index 1678cf037688..3ad91dae7324 100644
--- a/net/ipv6/proc.c
+++ b/net/ipv6/proc.c
@@ -32,8 +32,7 @@
 
 #define MAX4(a, b, c, d) \
max_t(u32, max_t(u32, a, b), max_t(u32, c, d))
-#define SNMP_MIB_MAX MAX4(UDP_MIB_MAX, TCP_MIB_MAX, \
-   IPSTATS_MIB_MAX, ICMP_MIB_MAX)
+#define SNMP_MIB_MAX IPSTATS_MIB_MAX
 
 static int sockstat6_seq_show(struct seq_file *seq, void *v)
 {
@@ -198,6 +197,9 @@ static void snmp6_seq_show_item(struct seq_file *seq, void 
__pe

Re: [PATCH] btrfs-progs: print-tree: Do correct offset output for ROOT_ITEM

2018-03-19 Thread Nikolay Borisov



On 19.03.2018 12:28, Qu Wenruo wrote:
> Commit Fixes: 8c36786c8198 ("btrfs-progs: print-tree: Print offset as tree 
> objectid for ROOT_ITEM")
> changes how we translate offset of ROOT_ITEM.
> 
> However the fact is, even for ROOT_ITEM, we have different meaning of
> offset.
> 
> For tree reloc tree, it's indeed subvolume id.
> But for other trees, it's the transid of when it's created.
> 
> Fix it.
> 
> Reported-by: Nikolay Borisov 
> Signed-off-by: Qu Wenruo 
> ---
>  print-tree.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/print-tree.c b/print-tree.c
> index 45350fea0963..848e296c4e67 100644
> --- a/print-tree.c
> +++ b/print-tree.c
> @@ -834,7 +834,16 @@ void btrfs_print_key(struct btrfs_disk_key *disk_key)
>*/
>   case BTRFS_ROOT_ITEM_KEY:
>   printf(" ");
> - print_objectid(stdout, offset, type);
> + /*
> +  * Normally offset of ROOT_ITEM should presents the generation
> +  * when this root is created.
> +  * However if this is tree reloc tree, offset is the subvolume
> +  * id of its source. Here we do extra check on this.
> +  */
> + if (objectid == BTRFS_TREE_RELOC_OBJECTID)
> + print_objectid(stdout, offset, type);

Do we even have to do this for the reloc tree. If the offset is just a
numeric id tied to a subvolume then the only possible value different
than a numerci value is FS_TREE (in case we have 5 as the source subvol)
but even this is not very informative. I'd suggest just leaving it as a
numeric value.

> + else
> + printf("%lld", offset);
>   printf(")");
>   break;
>   default:
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] btrfs-progs: print-tree: Do correct offset output for ROOT_ITEM

2018-03-19 Thread Qu Wenruo



On 2018年03月19日 18:46, Nikolay Borisov wrote:
> 
> 
> On 19.03.2018 12:28, Qu Wenruo wrote:
>> Commit Fixes: 8c36786c8198 ("btrfs-progs: print-tree: Print offset as tree 
>> objectid for ROOT_ITEM")
>> changes how we translate offset of ROOT_ITEM.
>>
>> However the fact is, even for ROOT_ITEM, we have different meaning of
>> offset.
>>
>> For tree reloc tree, it's indeed subvolume id.
>> But for other trees, it's the transid of when it's created.
>>
>> Fix it.
>>
>> Reported-by: Nikolay Borisov 
>> Signed-off-by: Qu Wenruo 
>> ---
>>  print-tree.c | 11 ++-
>>  1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/print-tree.c b/print-tree.c
>> index 45350fea0963..848e296c4e67 100644
>> --- a/print-tree.c
>> +++ b/print-tree.c
>> @@ -834,7 +834,16 @@ void btrfs_print_key(struct btrfs_disk_key *disk_key)
>>   */
>>  case BTRFS_ROOT_ITEM_KEY:
>>  printf(" ");
>> -print_objectid(stdout, offset, type);
>> +/*
>> + * Normally offset of ROOT_ITEM should presents the generation
>> + * when this root is created.
>> + * However if this is tree reloc tree, offset is the subvolume
>> + * id of its source. Here we do extra check on this.
>> + */
>> +if (objectid == BTRFS_TREE_RELOC_OBJECTID)
>> +print_objectid(stdout, offset, type);
> 
> Do we even have to do this for the reloc tree. If the offset is just a
> numeric id tied to a subvolume then the only possible value different
> than a numerci value is FS_TREE (in case we have 5 as the source subvol)
> but even this is not very informative. I'd suggest just leaving it as a
> numeric value.

What about this value?
18446744073709551607

It's data reloc tree objectid.
And it's pretty long since it's -9ULL.

At least this makes 2 objectid more readable.

Thanks,
Qu

> 
>> +else
>> +printf("%lld", offset);
>>  printf(")");
>>  break;
>>  default:
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



signature.asc
Description: OpenPGP digital signature

Re: [PATCH] btrfs-progs: print-tree: Do correct offset output for ROOT_ITEM

2018-03-19 Thread Nikolay Borisov



On 19.03.2018 12:50, Qu Wenruo wrote:
> What about this value?
> 18446744073709551607
> 
> It's data reloc tree objectid.
> And it's pretty long since it's -9ULL.
> 
> At least this makes 2 objectid more readable.

Fair enough.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: High CPU usage of RT threads...

2018-03-19 Thread Shyam Prasad N

Hi,

Attaching the sysrq-trigger output.

Regards,
Shyam

On Mon, Mar 19, 2018 at 12:45 PM, Nikolay Borisov  wrote:
>
>
> On 19.03.2018 09:13, Shyam Prasad N wrote:
>> Hi Nikolay,
>>
>> Thanks for your reply on this.
>>
>> Checked the stack trace for many of the stuck threads. Looks like all
>> of them are stuck in this loop...
>> [] exit_to_usermode_loop+0x72/0xd0
>> [] prepare_exit_to_usermode+0x26/0x30
>> [] retint_user+0x8/0x10
>> [] 0x
>
> Well, this doesn't imply btrfs at all.
>
> How about the _full_ output of :
>
> echo w > /proq/sysrq-trigger
>
> Perhaps there is a lot of load in workqueues?
>
>>
>> Seems like it is stuck in a tight loop in exit_to_usermode_loop.
>> FWIW, we started seeing this issue with nodatacow btrfs mount option.
>> Previously we were running with default option of datacow.
>> However, this also coincides with fairly heavy unlink load that we've
>> been putting the system under.
>>
>> Please let me know if there is anything else you can think of, based
>> on the above data.
>>
>> Regards,
>> Shyam
>>
>>
>> On Thu, Mar 15, 2018 at 12:59 PM, Nikolay Borisov  wrote:
>>>
>>>
>>> On 15.03.2018 09:23, Shyam Prasad N wrote:
 Hi,

 Our servers run some daemons that are scheduled to run many real time
 threads. These threads serve the client nodes by performing I/O on top
 of some set of disks, configured as DRBD pairs with disks on other
 peer servers for high availability of data. Btrfs is the filesystem
 that is configured on top of DRBD.

 While testing high availability with fairly high load, we have noticed
 the following behaviour a couple of times: When the server which was
 killed comes back up and gets ready and DRBD disks start syncing the
 data between the disks, a performance hit is generally expected at the
 peer node which has taken over the service now. However, the real time
 threads (mentioned above) on the active node are hogging the CPUs. As
 a part of debugging the issue, we tried to force a core dump on these
 threads by using a SIGABRT. However, these threads were not responding
 to any signals. Only after using real-time throttling (to reduce real
 time CPU usage to 50%), and waiting around for a few minutes, we were
 able to force a core dump. However, the corefile generated didn't have
 much useful info (I think it was a partial/corrupted core dump).

 Based on the above behaviour, (signals not being picked up), it looks
 to me like all these threads were likely stuck inside some system
 call. And since majority of the system calls by these threads are VFS
 calls on btrfs, I feel that these threads may have been stuck in some
 I/O. Specifically, based on the CPU usage, in some spinlock (I'm open
 to suggestions of other possibilities). And this is the reason I'm
 posting on this mailing list.
>>>
>>> When you have a bunch of those threads get a dump of the stacks of all
>>> sleeping tasks by "echo w > /proc/sysrq-trigger" .
>>>

 Is there a known bug which might have caused this? Kernel version
 we're using is 4.4.0.
>>>
>>> This is rather old kernel, you should at least be using latest 4.4.y
>>> stable kernel. BTRFS is a moving target and there are a lot of
>>> improvements made every release. So I'd suggest to try 4.14 at least on
>>> one offending machine.
>>>
 If we go for a kernel upgrade, what are the chances of not seeing this
 behaviour again?

 Or is my analysis of the problem entirely wrong? My feeling is that
 this maybe some issue with using Btrfs when it doesn't get a response
 from DRBD quickly enough.
>>>
>>> Feelings don't count for anything. Next time this happens extract
>>> stacktrace from the offending threads i.e. smapling /proc/[pid of
>>> hogging thread]/stack. Furthermore, if we assume that btrfs is indeed
>>> not getting responses fast enough this means most clients should really
>>> be stuck in io sleep and not doing any processing.
>>>
>>>
 Because we have been using ext4 on top of DRBD for a long time, and
 have never seen such issues during HA tests there.

>>
>>
>>



-- 
-Shyam

[Mon Mar 19 00:38:16 2018] sysrq: SysRq : Show backtrace of all active CPUs
[Mon Mar 19 00:38:16 2018] Sending NMI to all CPUs:
[Mon Mar 19 00:38:16 2018] NMI backtrace for cpu 0
[Mon Mar 19 00:38:16 2018] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
4.4.0-62-generic #83-Ubuntu
[Mon Mar 19 00:38:16 2018] Hardware name: Xen HVM domU, BIOS 4.7.1-1.9 
02/16/2017
[Mon Mar 19 00:38:16 2018] task: 81e11500 ti: 81e0 task.ti: 
81e0
[Mon Mar 19 00:38:16 2018] RIP: 0010:[]  [] 
native_safe_halt+0x6/0x10
[Mon Mar 19 00:38:16 2018] RSP: 0018:81e03e98  EFLAGS: 0246
[Mon Mar 19 00:38:16 2018] RAX:  RBX: 81f38200 RCX: 

[Mon Mar 19 00:38:16 2018] RDX:  RSI:  RDI: 

[Mo

Re: [PATCH] btrfs-progs: print-tree: Do correct offset output for ROOT_ITEM

2018-03-19 Thread Nikolay Borisov



On 19.03.2018 12:54, Nikolay Borisov wrote:
> 
> 
> On 19.03.2018 12:50, Qu Wenruo wrote:
>> What about this value?
>> 18446744073709551607
>>
>> It's data reloc tree objectid.
>> And it's pretty long since it's -9ULL.
>>
>> At least this makes 2 objectid more readable.
> 
> Fair enough.
> 

Looking a bit more I wonder if we should use the same print_objectid
function for the "owner" field of a leaf block.

For the data reloc tree we currently get:

leaf 30490624 items 2 free space 16061 generation 4 owner
18446744073709551607


Or for the leaf belonging to the first fs tree (5):
leaf 30883840 items 11 free space 15354 generation 10 owner 5
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] btrfs-progs: print-tree: Do correct offset output for ROOT_ITEM

2018-03-19 Thread Qu Wenruo



On 2018年03月19日 19:15, Nikolay Borisov wrote:
> 
> 
> On 19.03.2018 12:54, Nikolay Borisov wrote:
>>
>>
>> On 19.03.2018 12:50, Qu Wenruo wrote:
>>> What about this value?
>>> 18446744073709551607
>>>
>>> It's data reloc tree objectid.
>>> And it's pretty long since it's -9ULL.
>>>
>>> At least this makes 2 objectid more readable.
>>
>> Fair enough.
>>
> 
> Looking a bit more I wonder if we should use the same print_objectid
> function for the "owner" field of a leaf block.
> 
> For the data reloc tree we currently get:
> 
> leaf 30490624 items 2 free space 16061 generation 4 owner
> 18446744073709551607
> 
> 
> Or for the leaf belonging to the first fs tree (5):
> leaf 30883840 items 11 free space 15354 generation 10 owner 5

Good idea!

Looks pretty nice to me.

Thanks,
Qu

> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



signature.asc
Description: OpenPGP digital signature

Re: High CPU usage of RT threads...

2018-03-19 Thread Shyam Prasad N

On Mon, Mar 19, 2018 at 4:37 PM, Nikolay Borisov  wrote:
>
>
> On 19.03.2018 13:02, Shyam Prasad N wrote:
>> Hi,
>>
>> Attaching the sysrq-trigger output.
>
> Has this been obtained while the machine experienced a period of a lot
> of blocked threads? Because the output shows a machine which is idle?
>
Hmm.. No, actually. The threads are still taking up CPU, and not
responding to signals.


-- 
-Shyam
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: High CPU usage of RT threads...

2018-03-19 Thread Nikolay Borisov



On 19.03.2018 13:48, Shyam Prasad N wrote:
> On Mon, Mar 19, 2018 at 4:37 PM, Nikolay Borisov  wrote:
>>
>>
>> On 19.03.2018 13:02, Shyam Prasad N wrote:
>>> Hi,
>>>
>>> Attaching the sysrq-trigger output.
>>
>> Has this been obtained while the machine experienced a period of a lot
>> of blocked threads? Because the output shows a machine which is idle?
>>
> Hmm.. No, actually. The threads are still taking up CPU, and not
> responding to signals.

Considering all the data you provided I'm inclined to say you have a
problem of different nature than btrfs. Nothing really points at the
direction of btrfs.

> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: High CPU usage of RT threads...

2018-03-19 Thread Shyam Prasad N

Thank you for the analysis, Nikolay.
Will try to upgrade the kernel and check if the issue reproduces.

Regards,
Shyam

On Mon, Mar 19, 2018 at 5:21 PM, Nikolay Borisov  wrote:
>
>
> On 19.03.2018 13:48, Shyam Prasad N wrote:
>> On Mon, Mar 19, 2018 at 4:37 PM, Nikolay Borisov  wrote:
>>>
>>>
>>> On 19.03.2018 13:02, Shyam Prasad N wrote:
 Hi,

 Attaching the sysrq-trigger output.
>>>
>>> Has this been obtained while the machine experienced a period of a lot
>>> of blocked threads? Because the output shows a machine which is idle?
>>>
>> Hmm.. No, actually. The threads are still taking up CPU, and not
>> responding to signals.
>
> Considering all the data you provided I'm inclined to say you have a
> problem of different nature than btrfs. Nothing really points at the
> direction of btrfs.
>
>>
>>



-- 
-Shyam
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] btrfs-progs: print-tree: Do correct offset output for ROOT_ITEM

2018-03-19 Thread Nikolay Borisov



On 19.03.2018 12:28, Qu Wenruo wrote:
> Commit Fixes: 8c36786c8198 ("btrfs-progs: print-tree: Print offset as tree 
> objectid for ROOT_ITEM")
> changes how we translate offset of ROOT_ITEM.
> 
> However the fact is, even for ROOT_ITEM, we have different meaning of
> offset.
> 
> For tree reloc tree, it's indeed subvolume id.
> But for other trees, it's the transid of when it's created.
> 
> Fix it.
> 
> Reported-by: Nikolay Borisov 
> Signed-off-by: Qu Wenruo 
> ---
>  print-tree.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/print-tree.c b/print-tree.c
> index 45350fea0963..848e296c4e67 100644
> --- a/print-tree.c
> +++ b/print-tree.c
> @@ -834,7 +834,16 @@ void btrfs_print_key(struct btrfs_disk_key *disk_key)
>*/
>   case BTRFS_ROOT_ITEM_KEY:
>   printf(" ");
> - print_objectid(stdout, offset, type);
> + /*
> +  * Normally offset of ROOT_ITEM should presents the generation
> +  * when this root is created.
> +  * However if this is tree reloc tree, offset is the subvolume
> +  * id of its source. Here we do extra check on this.
> +  */
> + if (objectid == BTRFS_TREE_RELOC_OBJECTID)
> + print_objectid(stdout, offset, type);
> + else
> + printf("%lld", offset);
>   printf(")");

Reviewed-by: Nikolay Borisov 

>   break;
>   default:
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2] btrfs-progs: ctree: Add extra level check for read_node_slot()

2018-03-19 Thread David Sterba

On Thu, Feb 08, 2018 at 08:59:40AM +0800, Qu Wenruo wrote:
> Strangely, we have level check in btrfs_print_tree() while we don't have
> the same check in read_node_slot().
> 
> That's to say, for the following corruption, btrfs_search_slot() or
> btrfs_next_leaf() can return invalid leaf:
> 
> Parent eb:
>   node XX level 1
>   ^^^
>   Child should be leaf (level 0)
>   ...
>   key (XXX XXX XXX) block YY
> 
> Child eb:
>   leaf YY level 1
>   ^^^
>   Something went wrong now
> 
> And for the corrupted leaf returned, later caller can be screwed up
> easily.
> 
> Although the root cause (powerloss, but still something wrong breaking
> metadata CoW of btrfs) is still unknown, at least enhance btrfs-progs to
> avoid SEGV.
> 
> Reported-by: Ralph Gauges 
> Signed-off-by: Qu Wenruo 

Applied, thanks.

> ---
> changlog:
> v2:
>   Check if the extent buffer is up-to-date before checking its level to
>   avoid possible NULL pointer access.
> ---
>  ctree.c | 16 +++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/ctree.c b/ctree.c
> index 4fc33b14000a..430805e3043f 100644
> --- a/ctree.c
> +++ b/ctree.c
> @@ -22,6 +22,7 @@
>  #include "repair.h"
>  #include "internal.h"
>  #include "sizes.h"
> +#include "messages.h"
>  
>  static int split_node(struct btrfs_trans_handle *trans, struct btrfs_root
> *root, struct btrfs_path *path, int level);
> @@ -640,7 +641,9 @@ static int bin_search(struct extent_buffer *eb, struct 
> btrfs_key *key,
>  struct extent_buffer *read_node_slot(struct btrfs_fs_info *fs_info,
>  struct extent_buffer *parent, int slot)
>  {
> + struct extent_buffer *ret;
>   int level = btrfs_header_level(parent);
> +
>   if (slot < 0)
>   return NULL;
>   if (slot >= btrfs_header_nritems(parent))
> @@ -649,8 +652,19 @@ struct extent_buffer *read_node_slot(struct 
> btrfs_fs_info *fs_info,
>   if (level == 0)
>   return NULL;
>  
> - return read_tree_block(fs_info, btrfs_node_blockptr(parent, slot),
> + ret = read_tree_block(fs_info, btrfs_node_blockptr(parent, slot),
>  btrfs_node_ptr_generation(parent, slot));
> + if (!extent_buffer_uptodate(ret))
> + return ERR_PTR(-EIO);
> +
> + if (btrfs_header_level(ret) != level - 1) {
> + error("child eb corrupted: parent bytenr=%llu item=%d parent 
> level=%d child level=%d",

Please unindent the strings that are do not fit 80 chars on the line.
I've fixed that now, but I do that too often despite this has been known
to be the preferred style.

> +   btrfs_header_bytenr(parent), slot,
> +   btrfs_header_level(parent), btrfs_header_level(ret));
> + free_extent_buffer(ret);
> + return ERR_PTR(-EIO);
> + }
> + return ret;
>  }
>  
>  static int balance_level(struct btrfs_trans_handle *trans,
> -- 
> 2.16.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/2 v3] Btrfs-progs: check, fix false error reports for shared prealloc extents

2018-03-19 Thread David Sterba

On Thu, Mar 15, 2018 at 02:35:36PM +0200, Nikolay Borisov wrote:
> On 14.03.2018 22:11, fdman...@kernel.org wrote:
> > From: Filipe Manana 
> > 
> > Under some cases the filesystem checker reports an error when it finds
> > checksum items for an extent that is referenced by an inode as a prealloc
> > extent. Such cases are not an error when the extent is actually shared
> > (was cloned/reflinked) with other inodes and was written through one of
> > those other inodes.
> > 
> > Example:
> > 
> >   $ mkfs.btrfs -f /dev/sdb
> >   $ mount /dev/sdb /mnt
> > 
> >   $ touch /mnt/foo
> >   $ xfs_io -c "falloc 0 256K" /mnt/foo
> >   $ sync
> > 
> >   $ xfs_io -c "pwrite -S 0xab 0 256K" /mnt/foo
> >   $ touch /mnt/bar
> >   $ xfs_io -c "reflink /mnt/foo 0 0 256K" /mnt/bar
> >   $ xfs_io -c "fsync" /mnt/bar
> > 
> >   
> >   $ mount /dev/sdb /mnt
> >   $ umount /mnt
> > 
> >   $ btrfs check /dev/sdc
> >   Checking filesystem on /dev/sdb
> >   UUID: 52d3006e-ee3b-40eb-aa21-e56253a03d39
> >   checking extents
> >   checking free space cache
> >   checking fs roots
> >   root 5 inode 257 errors 800, odd csum item
> >   ERROR: errors found in fs roots
> >   found 688128 bytes used, error(s) found
> >   total csum bytes: 256
> >   total tree bytes: 163840
> >   total fs tree bytes: 65536
> >   total extent tree bytes: 16384
> >   btree space waste bytes: 138819
> >   file data blocks allocated: 10747904
> >referenced 10747904
> >   $ echo $?
> >   1
> > 
> > So teach check to not report such cases as errors by checking if the
> > extent is shared with other inodes and if so, consider it an error the
> > existence of checksum items only if all those other inodes are referencing
> > the extent as a prealloc extent.
> > This case can be hit often when running the generic/475 testcase from
> > fstests.
> > 
> > A test case will follow in a separate patch.
> > 
> > Signed-off-by: Filipe Manana 
> 
> Reviewed-by: Nikolay Borisov 

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2 v3] Btrfs-progs: add fsck test for filesystem with shared prealloc extents

2018-03-19 Thread David Sterba

On Wed, Mar 14, 2018 at 08:11:18PM +, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 
> Verify that a filesystem check operation (fsck) does not report the
> following scenario as an error:
> 
> An extent is shared between two inodes, as a result of clone/reflink
> operation, and for one of the inodes, lets call it inode A, the extent is
> referenced through a file extent item as a prealloc extent, while for the
> other inode, call it inode B, the extent is referenced through a regular
> file extent item, that is, it was written to. The goal of this test is to
> make sure a filesystem check operation will not report "odd csum items"
> errors for the prealloc extent at inode A, because this scenario is valid
> since the extent was written through inode B and therefore it is expected
> to have checksum items in the filesystem's checksum btree for that shared
> extent.
> 
> Such scenario can be created with the following steps for example:
> 
>  mkfs.btrfs -f /dev/sdb
>  mount /dev/sdb /mnt
> 
>  touch /mnt/foo
>  xfs_io -c "falloc 0 256K" /mnt/foo
>  sync
> 
>  xfs_io -c "pwrite -S 0xab 0 256K" /mnt/foo
>  touch /mnt/bar
>  xfs_io -c "reflink /mnt/foo 0 0 256K" /mnt/bar
>  xfs_io -c "fsync" /mnt/bar
> 
>  
>  mount /dev/sdb /mnt
>  umount /mnt
> 
> This scenario is fixed by the following patch for the filesystem checker:
> 
>  "Btrfs-progs: check, fix false error reports for shared prealloc extents"
> 
> Signed-off-by: Filipe Manana 
> Reviewed-by: Qu Wenruo 

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] btrfs-progs: convert/ext2: Remove check for ext2_ext_attr_entry->e_value_block

2018-03-19 Thread David Sterba

On Wed, Mar 14, 2018 at 08:56:57AM +0800, Qu Wenruo wrote:
> In latest e2fsprogs (1.44.0) definition of ext2_ext_attr_entry has
> removed member e_value_block, as currently ext* doesn't support it set
> anyway.
> 
> So remove such check so that we can pass compile.
> 
> Signed-off-by: Qu Wenruo 

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] btrfs-progs: mkfs: add uuid and otime to ROOT_ITEM of FS_TREE

2018-03-19 Thread David Sterba

On Mon, Mar 19, 2018 at 08:20:10AM +, Hugo Mills wrote:
> On Mon, Mar 19, 2018 at 05:16:42PM +0900, Misono, Tomohiro wrote:
> > Currently, the top-level subvolume lacks the UUID. As a result, both
> > non-snapshot subvolume and snapshot of top-level subvolume do not have
> > Parent UUID and cannot be distinguisued. Therefore "fi show" of
> > top-level lists all the subvolumes which lacks the UUID in
> > "Snapshot(s)" filed.  Also, it lacks the otime information.
> > 
> > Fix this by adding the UUID and otime at the mkfs time.  As a
> > consequence, snapshots of top-level subvolume now have a Parent UUID and
> > UUID tree will create an entry for top-level subvolume at mount time.
> > This should not cause the problem for current kernel, but user program
> > which relies on the empty Parent UUID may be affected by this change.
> 
>Is there any way of adding a UUID to the top level subvol on an
> existing filesystem? It would be helpful not to have to rebuild every
> filesystem in the world to fix this.

We can do that by a special purpose tool. The easiest way is to set the
uuid on an unmouted filesystem, but as this is a one-time action I hope
this is acceptable. Added to todo, thanks for the suggestion.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] btrfs-progs: mkfs: add uuid and otime to ROOT_ITEM of FS_TREE

2018-03-19 Thread Hugo Mills

On Mon, Mar 19, 2018 at 02:02:23PM +0100, David Sterba wrote:
> On Mon, Mar 19, 2018 at 08:20:10AM +, Hugo Mills wrote:
> > On Mon, Mar 19, 2018 at 05:16:42PM +0900, Misono, Tomohiro wrote:
> > > Currently, the top-level subvolume lacks the UUID. As a result, both
> > > non-snapshot subvolume and snapshot of top-level subvolume do not have
> > > Parent UUID and cannot be distinguisued. Therefore "fi show" of
> > > top-level lists all the subvolumes which lacks the UUID in
> > > "Snapshot(s)" filed.  Also, it lacks the otime information.
> > > 
> > > Fix this by adding the UUID and otime at the mkfs time.  As a
> > > consequence, snapshots of top-level subvolume now have a Parent UUID and
> > > UUID tree will create an entry for top-level subvolume at mount time.
> > > This should not cause the problem for current kernel, but user program
> > > which relies on the empty Parent UUID may be affected by this change.
> > 
> >Is there any way of adding a UUID to the top level subvol on an
> > existing filesystem? It would be helpful not to have to rebuild every
> > filesystem in the world to fix this.
> 
> We can do that by a special purpose tool. The easiest way is to set the
> uuid on an unmouted filesystem, but as this is a one-time action I hope
> this is acceptable. Added to todo, thanks for the suggestion.

   Sounds good to me.

   Hugo.

-- 
Hugo Mills | Talking about music is like dancing about
hugo@... carfax.org.uk | architecture
http://carfax.org.uk/  |
PGP: E2AB1DE4  |   Frank Zappa


signature.asc
Description: Digital signature

Re: [PATCH v2 1/7] btrfs-progs: convert/ext2: Fix inline extent creation check

2018-03-19 Thread David Sterba

On Tue, Mar 13, 2018 at 09:56:10AM +0800, Qu Wenruo wrote:
> Signed-off-by: Qu Wenruo 

Please squash patches 1 and 2 together and write a changelog, somethiing
like is in the cover letter. This is a bugfix, we need to document what
was broken and how it's fixed.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] btrfs-progs: mkfs: add uuid and otime to ROOT_ITEM of FS_TREE

2018-03-19 Thread Christoph Anton Mitterer

On Mon, 2018-03-19 at 14:02 +0100, David Sterba wrote:
> We can do that by a special purpose tool.

No average user will ever run (even know) about that...

Could you perhaps either do it automatically in fsck (which is IMO als
a bad idea as fsck should be read-only per default)... or at least add
a warning to fsck, like a "Info: Please run tool foo to get bar done."?

Cheers,
Chris.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 7/7] btrfs-progs: test/mkfs: Add test case for rootdir inline extent size

2018-03-19 Thread David Sterba

On Tue, Mar 13, 2018 at 09:56:16AM +0800, Qu Wenruo wrote:
> Add a test case for mkfs --rootdir, using files with different file
> sizes to check if invalid large inline extent could exist.
> 
> Signed-off-by: Qu Wenruo 
> ---
>  tests/mkfs-tests/014-rootdir-inline-extent/test.sh | 81 
> ++
>  1 file changed, 81 insertions(+)
>  create mode 100755 tests/mkfs-tests/014-rootdir-inline-extent/test.sh
> 
> diff --git a/tests/mkfs-tests/014-rootdir-inline-extent/test.sh 
> b/tests/mkfs-tests/014-rootdir-inline-extent/test.sh
> new file mode 100755
> index ..e0765b0bc4e2
> --- /dev/null
> +++ b/tests/mkfs-tests/014-rootdir-inline-extent/test.sh
> @@ -0,0 +1,81 @@
> +#!/bin/bash
> +# Regression test for mkfs.btrfs --rootdir with inline file extents
> +# For any large inline file extent, btrfs check could already report it
> +
> +source "$TOP/tests/common"

  "$TEST_TOP/common"

> +
> +check_global_prereq fallocate
> +check_prereq mkfs.btrfs
> +
> +prepare_test_dev
> +
> +tmp=$(mktemp -d --tmpdir btrfs-progs-mkfs.rootdirXXX)
> +
> +pagesize=$(getconf PAGESIZE)
> +create_file()
> +{
> + local size=$1
> + # Reuse size and filename
> + run_check fallocate -l $size "$tmp/$size"
> +}
> +
> +test_mkfs_rootdir()
> +{
> + nodesize=$1
> + run_check "$TOP/mkfs.btrfs" --nodesize $nodesize -f --rootdir "$tmp" \
> + "$TEST_DEV"
> + run_check $SUDO_HELPER "$TOP/btrfs" check "$TEST_DEV"
> +}
> +
> +# File sizes is designed to cross differnet node size, so even
> +# the sectorsize is not 4K, we can still test it well.
> +create_file 512
> +create_file 1024
> +create_file 2048
> +
> +create_file 3994
> +create_file 3995 # For 4K node size, max inline would be 4k - 101
> +create_file 3996
> +
> +create_file 4095
> +create_file 4096
> +create_file 4097
> +
> +create_file 8090
> +create_file 8091
> +create_file 8092
> +
> +create_file 8191
> +create_file 8192
> +create_file 8193
> +
> +create_file 16282
> +create_file 16283
> +create_file 16284
> +
> +create_file 16383
> +create_file 16384
> +create_file 16385
> +
> +create_file 32666
> +create_file 32667
> +create_file 32668
> +
> +create_file 32767
> +create_file 32768
> +create_file 32769
> +
> +create_file 65434
> +create_file 65435
> +create_file 65436
> +
> +create_file 65535
> +create_file 65536
> +create_file 65537

Please rewrite that as a for cycle that adds the -1 and +1 values.

> +
> +for nodesize in 4096 8192 16384 32768 65536; do
> + if [ $nodesize -ge $pagesize ]; then
> + test_mkfs_rootdir $nodesize
> + fi
> +done
> +rm -rf -- "$tmp"
> -- 
> 2.16.2
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] btrfs-progs: mkfs: add uuid and otime to ROOT_ITEM of FS_TREE

2018-03-19 Thread David Sterba

On Mon, Mar 19, 2018 at 02:26:38PM +0100, Christoph Anton Mitterer wrote:
> On Mon, 2018-03-19 at 14:02 +0100, David Sterba wrote:
> > We can do that by a special purpose tool.
> 
> No average user will ever run (even know) about that...
> 
> Could you perhaps either do it automatically in fsck (which is IMO als
> a bad idea as fsck should be read-only per default)... or at least add
> a warning to fsck, like a "Info: Please run tool foo to get bar done."?

Makes sense, check can warn and point to the command, in a similar way
we have the fix-device-count.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH v3 5/7] btrfs-progs: sub list: Allow normal user to call "subvolume list/show"

2018-03-19 Thread Goffredo Baroncelli

On 03/19/2018 08:32 AM, Misono, Tomohiro wrote:
> Allow normal user to call "subvolume list/show" by using 3 new
> unprivileged ioctls (BTRFS_IOC_GET_SUBVOL_INFO,
> BTRFS_IOC_GET_SUBVOL_ROOTREF and BTRFS_IOC_INO_LOOKUP_USER).
> 
> Note that for root, "subvolume list" returns all the subvolume in the
> filesystem by default, but for normal user, it returns subvolumes
> which exist under the specified path (including the path itself).
> The specified path itself is not needed to be a subvolume.
> If the subvolume cannot be opend but the parent directory can be,
> the information other than name or id would be zeroed out.
> 
> Also, for normal user, snapshot filed of "subvolume show" just lists
> the snapshots under the specified subvolume.
> 
> Signed-off-by: Tomohiro Misono 
> ---
>  btrfs-list.c | 344 
> +--
>  cmds-subvolume.c |   8 ++
>  2 files changed, 341 insertions(+), 11 deletions(-)
> 
> diff --git a/btrfs-list.c b/btrfs-list.c
> index 833ff912..c819499f 100644
> --- a/btrfs-list.c
> +++ b/btrfs-list.c
> @@ -33,6 +33,7 @@
>  #include 
>  #include "btrfs-list.h"
>  #include "rbtree-utils.h"
> +#include 
>  
>  #define BTRFS_LIST_NFILTERS_INCREASE (2 * BTRFS_LIST_FILTER_MAX)
>  #define BTRFS_LIST_NCOMPS_INCREASE   (2 * BTRFS_LIST_COMP_MAX)
> @@ -549,6 +550,9 @@ static int resolve_root(struct root_lookup *rl, struct 
> root_info *ri,
>   int len = 0;
>   struct root_info *found;
>  
> + if (ri->full_path != NULL)
> + return 0;
> +
>   /*
>* we go backwards from the root_info object and add pathnames
>* from parent directories as we go.
> @@ -672,6 +676,50 @@ static int lookup_ino_path(int fd, struct root_info *ri)
>   return 0;
>  }
>  
> +/* user version of lookup_ino_path which also cheks the access right */
> +static int lookup_ino_path_user(int fd, struct root_info *ri)
> +{
> + struct btrfs_ioctl_ino_lookup_user_args args;
> + int ret = 0;
> +
> + if (ri->path)
> + return 0;
> + if (!ri->ref_tree)
> + return -ENOENT;
> +
> + memset(&args, 0, sizeof(args));
> + args.dirid = ri->dir_id;
> + args.subvolid = ri->root_id;
> +
> + ret = ioctl(fd, BTRFS_IOC_INO_LOOKUP_USER, &args);
> + if (ret < 0) {
> + if (errno == ENOENT) {
> + ri->ref_tree = 0;
> + return -ENOENT;
> + }
> + if (errno != EACCES) {
> + error("failed to lookup path for root %llu: %m",
> + (unsigned long long)ri->ref_tree);
> + return ret;
> + } else {
> + return -EACCES;
> + }
> + }
> +
> + ri->path = malloc(strlen(args.path) + strlen(args.name) + 1);
> + if (!ri->path)
> + return -ENOMEM;
> + strcpy(ri->path, args.path);
> +
> + ri->name = malloc(strlen(args.name) + 1);
> + if (!ri->name)
> + return -ENOMEM;
> + strcpy(ri->name, args.name);
> +
> + strcat(ri->path, ri->name);
> + return ret;
> +}
> +
>  /* finding the generation for a given path is a two step process.
>   * First we use the inode lookup routine to find out the root id
>   *
> @@ -1310,6 +1358,266 @@ static int list_subvol_fill_paths(int fd, struct 
> root_lookup *root_lookup)
>   return 0;
>  }
>  
> +static int fill_subvol_info(struct root_lookup *root_lookup, int fd)
> +{
> + struct btrfs_ioctl_get_subvol_info_args subvol_info;
> + u64 root_offset;
> + int ret;
> +
> + ret = ioctl(fd, BTRFS_IOC_GET_SUBVOL_INFO, &subvol_info);
> + if (ret < 0)
> + return -errno;
> +
> + if (!uuid_is_null(subvol_info.parent_uuid))
> + root_offset = subvol_info.otransid;
> + else
> + root_offset = 0;
> +
> + add_root(root_lookup, subvol_info.id, 0,
> +  root_offset, subvol_info.flags, subvol_info.dirid,
> +  NULL, 0,
> +  subvol_info.otransid, subvol_info.generation,
> +  subvol_info.otime.sec, subvol_info.uuid,
> +  subvol_info.parent_uuid, subvol_info.received_uuid);
> +
> + return 0;
> +}
> +
> +static int fill_subvol_info_top(struct root_lookup *root_lookup, int fd)
> +{
> + struct btrfs_ioctl_get_subvol_info_args subvol_info;
> + struct root_info *ri;
> + u64 root_offset;
> + int ret;
> +
> + ret = ioctl(fd, BTRFS_IOC_GET_SUBVOL_INFO, &subvol_info);
> + if (ret < 0)
> + return -errno;
> +
> + if (!uuid_is_null(subvol_info.parent_uuid))
> + root_offset = subvol_info.otransid;
> + else
> + root_offset = 0;
> +
> + add_root(root_lookup, subvol_info.id, subvol_info.parent_id,
> +  root_offset, subvol_info.flags, subvol_info.dirid,
> +  subvol_info.name, strlen(subvol_info.name),
> +  subvol_info.otransid, subvol_info.generation,
> +  sub

Re: [PATCH] btrfs: fix lockdep splat in btrfs_alloc_subvolume_writers

2018-03-19 Thread Jeff Mahoney

On 3/16/18 4:12 PM, David Sterba wrote:
> On Fri, Mar 16, 2018 at 02:36:27PM -0400, je...@suse.com wrote:
>> From: Jeff Mahoney 
>>
>> While running btrfs/011, I hit the following lockdep splat.
>>
>> This is the important bit:
>>pcpu_alloc+0x1ac/0x5e0
>>__percpu_counter_init+0x4e/0xb0
>>btrfs_init_fs_root+0x99/0x1c0 [btrfs]
>>btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
>>resolve_indirect_refs+0x130/0x830 [btrfs]
>>find_parent_nodes+0x69e/0xff0 [btrfs]
>>btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
>>btrfs_find_all_roots+0x50/0x70 [btrfs]
>>btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
>>btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
>>
>> The percpu_counter_init call in btrfs_alloc_subvolume_writers
>> uses GFP_KERNEL, which we can't do during transaction commit.
>>
>> This switches it to GFP_NOFS.
> 
>> Signed-off-by: Jeff Mahoney 
>> ---
>>  fs/btrfs/disk-io.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 21f34ad0d411..eb6bb3169a9e 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -1108,7 +1108,7 @@ static struct btrfs_subvolume_writers 
>> *btrfs_alloc_subvolume_writers(void)
>>  if (!writers)
>>  return ERR_PTR(-ENOMEM);
>>  
>> -ret = percpu_counter_init(&writers->counter, 0, GFP_KERNEL);
>> +ret = percpu_counter_init(&writers->counter, 0, GFP_NOFS);
> 
> A line above the diff context is another allocation that does GFP_NOFS,
> so one of the gfp flags were wrong.
> 
> Looks like there's another instance where percpu allocates with
> GFP_KERNEL: create_space_info that can be called from the path that
> allocates chunks, so this also looks like a NOFS candidate.

We can get rid of this case entirely.  Those call sites should be
removed since the space_infos are all allocated at mount time.

-Jeff

-- 
Jeff Mahoney
SUSE Labs



signature.asc
Description: OpenPGP digital signature

Re: Crashes running btrfs scrub

2018-03-19 Thread Liu Bo

On Sun, Mar 18, 2018 at 3:52 PM, waxhead  wrote:
> Liu Bo wrote:
>>
>> On Sat, Mar 17, 2018 at 5:26 PM, Liu Bo  wrote:
>>>
>>> On Fri, Mar 16, 2018 at 2:46 PM, Mike Stevens 
>>> wrote:
>
> Could you please paste the whole dmesg, it looks like it hit
> btrfs_abort_transaction(),
> which should give us more information about where goes wrong.


 The whole thing is here https://pastebin.com/4ENq2saQ
>>>
>>>
>>> Given this,
>>>
>>> [  299.410998] BTRFS: error (device sdag) in
>>> btrfs_create_pending_block_groups:10192: errno=-27 unknown
>>>
>>> it refers to -EFBIG, so I think the warning comes from
>>>
>>> btrfs_add_system_chunk()
>>> {
>>> ...
>>>  if (array_size + item_size + sizeof(disk_key)
>>>
>>>  > BTRFS_SYSTEM_CHUNK_ARRAY_SIZE) {
>>>
>>>  mutex_unlock(&fs_info->chunk_mutex);
>>>
>>>  return -EFBIG;
>>>
>>>  }
>>>
>>> If that's the case, we need to check this earlier during mount.
>>>
>>
>> I didn't realize this until now,  we do have a limitation on up to how
>> many disks btrfs could handle, in order to make balance/scrub work
>> properly (where system chunks may be set readonly),
>>
>> ((BTRFS_SYSTEM_CHUNK_ARRAY_SIZE / 2) - sizeof(struct btrfs_chunk)) /
>> sizeof(struct btrfs_stripe) + 1
>>
>> will be the number of disks btrfs can handle at most.
>
>
> Am I understanding this correct, BTRFS have limit to the number of physical
> devices it can handle?! (max 30 devices?!)
>
> Or are this referring to the number of devices BTRFS can utilize in a stripe
> (in which case 30 actually sounds like a high number).
>
> 30 devices is really not that much, heck you get 90 disks top load JBOD
> storage chassis these days and BTRFS does sound like an attractive choice
> for things like that.

So Mike's case is, that both metadata and data are configured as
raid6, and the operations, balance and scrub, that he tried, need to
set the existing block group as readonly (in order to avoid any
further changes being applied during operations are running), then we
got into the place where another system chunk is needed.

However, I think it'd be better to have some warnings about this when
doing a) mkfs.btrfs -mraid6, b) btrfs device add.

David, any idea?

thanks,
liubo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] btrfs-progs: print-tree: Enhance warning on tree block level mismatch and error handling

2018-03-19 Thread David Sterba

On Thu, Mar 15, 2018 at 12:48:15PM +0800, Qu Wenruo wrote:
> This patch enhance the tree block level mismatch by the following
> methods:
> 
> 1) Merge same warning branches into one
>We had two branches showing the same message, and their condition
>is also the same. Merge them
> 
> 2) Only skip bad slot
>The old code skipped all the remaining slots, here we just skip one
>slot to output as many correct tree blocks as possible.
> 
> 3) Enhance warning message
>Ouput the parent bytenr and expected and wrong level, so we don't
>need refer to the stdout to get which tree block is corrupted.
> 
> Signed-off-by: Qu Wenruo 

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] btrfs-progs: inspect-dump-tree: add '-f|--follow' options to print all children tree blocks for '-b'

2018-03-19 Thread David Sterba

On Wed, Mar 14, 2018 at 09:05:38AM +0800, Qu Wenruo wrote:
> When debuging with "btrfs inspect dump-tree", it's not that handy if we
> want to iterate all child tree blocks starting from a specified block.
> 
> -b can only print a single block, while without -b "btrfs inspect dump-tree"
> will need extra tree roots fulfilled to continue, which is not possible
> for some damaged filesystem.
> 
> Add a new option '-f|--follow' to iterate a sub-tree starting from block
> specified by '-b|--block', so we would have less limitation.
> 
> Signed-off-by: Qu Wenruo 

I've dropped the shot option for now and updated the changelog. While
'-f' is probably a good choice, I'd postpone adding that if we want to
use it for other purposes.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/1] btrfs-progs: Fix typos in docs and user-facing strings

2018-03-19 Thread David Sterba

On Thu, Mar 15, 2018 at 08:39:09PM -0400, Nicholas D Steeves wrote:
> Signed-off-by: Nicholas D Steeves 

Thank you very much. I see the amount of typos is lower than the last time.
Patch applied cleanly, so no manual merging was necesssary.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/1] btrfs-progs: docs: annual typo, clarity, & grammar review & fixups

2018-03-19 Thread David Sterba

On Thu, Mar 15, 2018 at 08:49:26PM -0400, Nicholas D Steeves wrote:
> Hi Qu,
> 
> So sorry for the incredibly delayed reply [it got lost in my drafts
> folder], I sincerely appreciate the time you took to respond.  There
> is a lot in your responses that I suspect would benefit readers of the
> btrfs wiki, so I've drawn attention to them by replying inline.  I've
> omitted the sections David resolved with his merge.
> 
> P.S. Even graduate-level native speakers struggle with the
> multitude of special-cases in English!
> 
> On Sun, Oct 22, 2017 at 06:54:16PM +0800, Qu Wenruo wrote:
> > Hi Nicholas,
> > 
> > Thanks for the documentation update.
> > Since I'm not a native English speaker, I may not help much to organize
> > the sentence, but I can help to explain the question noted in the
> > modification.
> > 
> > On 2017年10月22日 08:00, Nicholas D Steeves wrote:
> > > In one big patch, as requested
> [...]
> > > --- a/Documentation/btrfs-balance.asciidoc
> > > +++ b/Documentation/btrfs-balance.asciidoc
> > > @@ -21,7 +21,7 @@ filesystem.
> > >  The balance operation is cancellable by the user. The on-disk state of 
> > > the
> > >  filesystem is always consistent so an unexpected interruption (eg. 
> > > system crash,
> > >  reboot) does not corrupt the filesystem. The progress of the balance 
> > > operation
> > > -is temporarily stored and will be resumed upon mount, unless the mount 
> > > option
> > > +is temporarily stored (EDIT: where is it stored?) and will be 
> > > resumed upon mount, unless the mount option
> > 
> > To be specific, they are stored in data reloc tree and tree reloc tree.
> > 
> > Data reloc tree stores the data/metadata written to new location.
> > 
> > And tree reloc tree is kind of special snapshot for each tree whose tree
> > block is get relocated during the relocation.
> 
> Is there already a document on the btrfs allocation?  This seems like
> it might be a nice addition for the wiki.  I'm guessing it would fit
> under
> https://btrfs.wiki.kernel.org/index.php/Main_Page#Developer_documentation
> 
> > > @@ -200,11 +200,11 @@ section 'PROFILES'.
> > >  ENOSPC
> > >  --
> > >  
> > > -The way balance operates, it usually needs to temporarily create a new 
> > > block
> > > +The way balance operates, it usually needs to temporarily create a 
> > > new block
> > >  group and move the old data there. For that it needs work space, 
> > > otherwise
> > >  it fails for ENOSPC reasons.
> > >  This is not the same ENOSPC as if the free space is exhausted. This 
> > > refers to
> > > -the space on the level of block groups.
> > > +the space on the level of block groups. (EDIT: What is the 
> > > relationship between the new block group and the work space?  Is the "old 
> > > data" removed from the new block group?  Please say something about block 
> > > groups to clarify)
> > 
> > Here I think we're talking about allocating new block group, so it's
> > using unallocated space.
> > 
> > While for normal space usage, we're allocating from *allocated* block
> > group space.
> > 
> > So there are two levels of space allocation:
> > 
> > 1) Extent level
> >Always allocated from existing block group (or chunk).
> >Data extent, tree block allocation are all happening at this level.
> > 
> > 2) Block group (or chunk, which are the same) level
> >Always allocated from free device space.
> > 
> > I think the original sentence just wants to address this.
> 
> Also seems like a good fit for a btrfs allocation document.
> 
> > >  
> > >  The free work space can be calculated from the output of the *btrfs 
> > > filesystem show*
> > >  command:
> > > @@ -227,7 +227,7 @@ space. After that it might be possible to run other 
> > > filters.
> > >  
> > >  Conversion to profiles based on striping (RAID0, RAID5/6) require the 
> > > work
> > >  space on each device. An interrupted balance may leave partially filled 
> > > block
> > > -groups that might consume the work space.
> > > +groups that might (EDIT: is this 2nd level of uncertainty 
> > > necessary?) consume the work space.
> > >  
> [...]
> > > @@ -3,7 +3,7 @@ btrfs-filesystem(8)
> [...]
> > >  SYNOPSIS
> > >  
> > > @@ -53,8 +53,8 @@ not total size of filesystem.
> > >  when the filesystem is full. Its 'total' size is dynamic based on the
> > >  filesystem size, usually not larger than 512MiB, 'used' may fluctuate.
> > >  +
> > > -The global block reserve is accounted within Metadata. In case the 
> > > filesystem
> > > -metadata are exhausted, 'GlobalReserve/total + Metadata/used = 
> > > Metadata/total'.
> > > +The global block reserve is accounted within Metadata. In case the 
> > > filesystem
> > > +metadata are exhausted, 'GlobalReserve/total + Metadata/used = 
> > > Metadata/total'. (EDIT: s/are/is/? And please write more for clarity. 
> > > Is "global block reserve" part of GlobalReserve that is accounted within 
> > > Metadata?  Isn't all of GlobalReserve's metadata accounted within

Re: [PATCH 4.14 024/110] btrfs: use proper endianness accessors for super_copy

2018-03-19 Thread David Sterba

On Sat, Mar 17, 2018 at 06:27:15PM +0100, Christoph Biedl wrote:
> Greg Kroah-Hartman wrote...
> 
> > On Thu, Mar 15, 2018 at 07:55:42PM +0100, Christoph Biedl wrote:
> 
> > > > commit 3c181c12c431fe33b669410d663beb9cceefcd1b upstream.
> 
> > > On big-endian systems, this change intruduces severe corruption,
> > > resulting in complete loss of the data on the used block device.
> 
> > That sucks.  Can you test Linus's tree to verify the problem is there?
> > I'll gladly revert this if Linus's tree also gets the revert, I don't
> > want you to hit this when you upgrade to a newer kernel.
> 
> Confirmed: The problem is, err ... was in Linus' tree as well. The
> rather recent commit 8f5fd927c3a7 reverted the change, after that
> everything is as expected again.

Thanks for checking.

> Looking at the original commit, I don't have a clue why things go wrong
> so horribly

It's a half endianness conversion. The plain in-memory structures are in
LE and has to be accessed via the helpers, super block copy and the
root item.  The commit adds only one half of the conversion, that
naturally does not exhibit on LE, because the macros are no-op.

Originally, the items were stored from the on-disk type to on-disk type,
regardless of the CPU:

  super->chunk_root = root_item->bytenr;

The patch should have added conversion of both values, like

  btrfs_set_super_chunk_root(super, btrfs_root_bytenr(root_item));

and not

  btrfs_set_super_chunk_root(super, root_item->bytenr);

It's not very obvious from the context though, typically there's one
structure that needs the accessors and is set from a value that's in CPU
byteorder. I think that the exception to this pattern confused all
involved developers.

The root_item members are annotated as __le64 that should be caught by
sparse/smatch checker in the buggy case, but we don't run the checkers
every the time.

> - otherwise don't be afraid of my data. I took this as a
> chance to verify my data recovery procedure, with success.

Should that not be the case, the damaged items in superblock can be
byteswapped back. That's 6 x u64 at most, I have a tool for that now.

Thanks again for the report and sorry for the trouble.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] btrfs-progs: print-tree: Do correct offset output for ROOT_ITEM

2018-03-19 Thread David Sterba

On Mon, Mar 19, 2018 at 06:28:10PM +0800, Qu Wenruo wrote:
> Commit Fixes: 8c36786c8198 ("btrfs-progs: print-tree: Print offset as tree 
> objectid for ROOT_ITEM")
> changes how we translate offset of ROOT_ITEM.
> 
> However the fact is, even for ROOT_ITEM, we have different meaning of
> offset.
> 
> For tree reloc tree, it's indeed subvolume id.
> But for other trees, it's the transid of when it's created.
> 
> Fix it.
> 
> Reported-by: Nikolay Borisov 
> Signed-off-by: Qu Wenruo 

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] btrfs: fix lockdep splat in btrfs_alloc_subvolume_writers

2018-03-19 Thread David Sterba

On Mon, Mar 19, 2018 at 01:52:05PM -0400, Jeff Mahoney wrote:
> On 3/16/18 4:12 PM, David Sterba wrote:
> > On Fri, Mar 16, 2018 at 02:36:27PM -0400, je...@suse.com wrote:
> >> From: Jeff Mahoney 
> >>
> >> While running btrfs/011, I hit the following lockdep splat.
> >>
> >> This is the important bit:
> >>pcpu_alloc+0x1ac/0x5e0
> >>__percpu_counter_init+0x4e/0xb0
> >>btrfs_init_fs_root+0x99/0x1c0 [btrfs]
> >>btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
> >>resolve_indirect_refs+0x130/0x830 [btrfs]
> >>find_parent_nodes+0x69e/0xff0 [btrfs]
> >>btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
> >>btrfs_find_all_roots+0x50/0x70 [btrfs]
> >>btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
> >>btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
> >>
> >> The percpu_counter_init call in btrfs_alloc_subvolume_writers
> >> uses GFP_KERNEL, which we can't do during transaction commit.
> >>
> >> This switches it to GFP_NOFS.
> > 
> >> Signed-off-by: Jeff Mahoney 
> >> ---
> >>  fs/btrfs/disk-io.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> >> index 21f34ad0d411..eb6bb3169a9e 100644
> >> --- a/fs/btrfs/disk-io.c
> >> +++ b/fs/btrfs/disk-io.c
> >> @@ -1108,7 +1108,7 @@ static struct btrfs_subvolume_writers 
> >> *btrfs_alloc_subvolume_writers(void)
> >>if (!writers)
> >>return ERR_PTR(-ENOMEM);
> >>  
> >> -  ret = percpu_counter_init(&writers->counter, 0, GFP_KERNEL);
> >> +  ret = percpu_counter_init(&writers->counter, 0, GFP_NOFS);
> > 
> > A line above the diff context is another allocation that does GFP_NOFS,
> > so one of the gfp flags were wrong.
> > 
> > Looks like there's another instance where percpu allocates with
> > GFP_KERNEL: create_space_info that can be called from the path that
> > allocates chunks, so this also looks like a NOFS candidate.
> 
> We can get rid of this case entirely.  Those call sites should be
> removed since the space_infos are all allocated at mount time.

That would be great and make a few things simpler. So this means that
__find_space_info never fails once the space infos are properly
initialized, right? That was my concern in do_chunk_alloc and
btrfs_make_block_group (that's called from __btrfs_alloc_chunk).

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] btrfs: fix lockdep splat in btrfs_alloc_subvolume_writers

2018-03-19 Thread Jeff Mahoney

On 3/19/18 2:08 PM, David Sterba wrote:
> On Mon, Mar 19, 2018 at 01:52:05PM -0400, Jeff Mahoney wrote:
>> On 3/16/18 4:12 PM, David Sterba wrote:
>>> On Fri, Mar 16, 2018 at 02:36:27PM -0400, je...@suse.com wrote:
 From: Jeff Mahoney 

 While running btrfs/011, I hit the following lockdep splat.

 This is the important bit:
pcpu_alloc+0x1ac/0x5e0
__percpu_counter_init+0x4e/0xb0
btrfs_init_fs_root+0x99/0x1c0 [btrfs]
btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
resolve_indirect_refs+0x130/0x830 [btrfs]
find_parent_nodes+0x69e/0xff0 [btrfs]
btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
btrfs_find_all_roots+0x50/0x70 [btrfs]
btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]

 The percpu_counter_init call in btrfs_alloc_subvolume_writers
 uses GFP_KERNEL, which we can't do during transaction commit.

 This switches it to GFP_NOFS.
>>>
 Signed-off-by: Jeff Mahoney 
 ---
  fs/btrfs/disk-io.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
 index 21f34ad0d411..eb6bb3169a9e 100644
 --- a/fs/btrfs/disk-io.c
 +++ b/fs/btrfs/disk-io.c
 @@ -1108,7 +1108,7 @@ static struct btrfs_subvolume_writers 
 *btrfs_alloc_subvolume_writers(void)
if (!writers)
return ERR_PTR(-ENOMEM);
  
 -  ret = percpu_counter_init(&writers->counter, 0, GFP_KERNEL);
 +  ret = percpu_counter_init(&writers->counter, 0, GFP_NOFS);
>>>
>>> A line above the diff context is another allocation that does GFP_NOFS,
>>> so one of the gfp flags were wrong.
>>>
>>> Looks like there's another instance where percpu allocates with
>>> GFP_KERNEL: create_space_info that can be called from the path that
>>> allocates chunks, so this also looks like a NOFS candidate.
>>
>> We can get rid of this case entirely.  Those call sites should be
>> removed since the space_infos are all allocated at mount time.
> 
> That would be great and make a few things simpler. So this means that
> __find_space_info never fails once the space infos are properly
> initialized, right? That was my concern in do_chunk_alloc and
> btrfs_make_block_group (that's called from __btrfs_alloc_chunk).

That's a different case.  The raid levels are added when the first block
group of a particular read level is loaded up.  That can happen when the
block groups are read in initially, where it should be safe to use
GFP_KERNEL or when a chunk of a new type is allocated.  The thing is
that a chunk of a new type will only be allocated when we're converting
via balance, so we may be able to do the kobject_add for the raid level
when we start the balance rather than wait for it to create the block group.

-Jeff


-- 
Jeff Mahoney
SUSE Labs



signature.asc
Description: OpenPGP digital signature

Re: [PATCH] btrfs: Validate child tree block's level and first key

2018-03-19 Thread kbuild test robot

Hi Qu,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on v4.16-rc4]
[also build test ERROR on next-20180319]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Qu-Wenruo/btrfs-Validate-child-tree-block-s-level-and-first-key/20180320-054353
config: i386-randconfig-x006-201811 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   fs//btrfs/ref-verify.c: In function 'walk_down_tree':
>> fs//btrfs/ref-verify.c:586:9: error: too few arguments to function 
>> 'read_tree_block'
   eb = read_tree_block(fs_info, block_bytenr, gen);
^~~
   In file included from fs//btrfs/ref-verify.c:22:0:
   fs//btrfs/disk-io.h:55:23: note: declared here
struct extent_buffer *read_tree_block(struct btrfs_fs_info *fs_info, u64 
bytenr,
  ^~~

vim +/read_tree_block +586 fs//btrfs/ref-verify.c

fd708b81 Josef Bacik 2017-09-29  570  
fd708b81 Josef Bacik 2017-09-29  571  /* Walk down to the leaf from the given 
level */
fd708b81 Josef Bacik 2017-09-29  572  static int walk_down_tree(struct 
btrfs_root *root, struct btrfs_path *path,
fd708b81 Josef Bacik 2017-09-29  573  int level, u64 
*bytenr, u64 *num_bytes)
fd708b81 Josef Bacik 2017-09-29  574  {
fd708b81 Josef Bacik 2017-09-29  575struct btrfs_fs_info *fs_info = 
root->fs_info;
fd708b81 Josef Bacik 2017-09-29  576struct extent_buffer *eb;
fd708b81 Josef Bacik 2017-09-29  577u64 block_bytenr, gen;
fd708b81 Josef Bacik 2017-09-29  578int ret = 0;
fd708b81 Josef Bacik 2017-09-29  579  
fd708b81 Josef Bacik 2017-09-29  580while (level >= 0) {
fd708b81 Josef Bacik 2017-09-29  581if (level) {
fd708b81 Josef Bacik 2017-09-29  582block_bytenr = 
btrfs_node_blockptr(path->nodes[level],
fd708b81 Josef Bacik 2017-09-29  583
   path->slots[level]);
fd708b81 Josef Bacik 2017-09-29  584gen = 
btrfs_node_ptr_generation(path->nodes[level],
fd708b81 Josef Bacik 2017-09-29  585
path->slots[level]);
fd708b81 Josef Bacik 2017-09-29 @586eb = 
read_tree_block(fs_info, block_bytenr, gen);
fd708b81 Josef Bacik 2017-09-29  587if (IS_ERR(eb))
fd708b81 Josef Bacik 2017-09-29  588return 
PTR_ERR(eb);
fd708b81 Josef Bacik 2017-09-29  589if 
(!extent_buffer_uptodate(eb)) {
fd708b81 Josef Bacik 2017-09-29  590
free_extent_buffer(eb);
fd708b81 Josef Bacik 2017-09-29  591return -EIO;
fd708b81 Josef Bacik 2017-09-29  592}
fd708b81 Josef Bacik 2017-09-29  593
btrfs_tree_read_lock(eb);
fd708b81 Josef Bacik 2017-09-29  594
btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
fd708b81 Josef Bacik 2017-09-29  595path->nodes[level-1] = 
eb;
fd708b81 Josef Bacik 2017-09-29  596path->slots[level-1] = 
0;
fd708b81 Josef Bacik 2017-09-29  597path->locks[level-1] = 
BTRFS_READ_LOCK_BLOCKING;
fd708b81 Josef Bacik 2017-09-29  598} else {
fd708b81 Josef Bacik 2017-09-29  599ret = 
process_leaf(root, path, bytenr, num_bytes);
fd708b81 Josef Bacik 2017-09-29  600if (ret)
fd708b81 Josef Bacik 2017-09-29  601break;
fd708b81 Josef Bacik 2017-09-29  602}
fd708b81 Josef Bacik 2017-09-29  603level--;
fd708b81 Josef Bacik 2017-09-29  604}
fd708b81 Josef Bacik 2017-09-29  605return ret;
fd708b81 Josef Bacik 2017-09-29  606  }
fd708b81 Josef Bacik 2017-09-29  607  

:: The code at line 586 was first introduced by commit
:: fd708b81d972a0714b02a60eb4792fdbf15868c4 Btrfs: add a extent ref verify 
tool

:: TO: Josef Bacik 
:: CC: David Sterba 

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip

Re: [PATCH v5 0/2] Remove false-positive VLAs when using max()

2018-03-19 Thread Linus Torvalds

On Mon, Mar 19, 2018 at 2:43 AM, David Laight  wrote:
>
> Is it necessary to have the full checks for old versions of gcc?
>
> Even -Wvla could be predicated on very recent gcc - since we aren't
> worried about whether gcc decides to generate a vla, but whether
> the source requests one.

You are correct. We could just ignore the issue with old gcc versions,
and disable -Wvla rather than worry about it.

  Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH v3 5/7] btrfs-progs: sub list: Allow normal user to call "subvolume list/show"

2018-03-19 Thread Misono, Tomohiro

On 2018/03/20 2:09, Goffredo Baroncelli wrote:
> On 03/19/2018 08:32 AM, Misono, Tomohiro wrote
[snip]
>>  static void print_subvolume_column(struct root_info *subv,
>> enum btrfs_list_column_enum column)
>>  {
>> @@ -1492,19 +1800,33 @@ next:
>>  static int btrfs_list_subvols(int fd, struct root_lookup *root_lookup,
>>  const char *path)
>>  {
>> +uid_t uid;
>>  int ret;
>>  
>> -ret = list_subvol_search(fd, root_lookup);
>> -if (ret) {
>> -error("can't perform the search: %m");
>> -return ret;
>> +uid = geteuid();
>> +if (!uid) {
> 
> A minor tip: I think uid == 0 is better here, because we should check that 
> uid is root

ok, thanks.

> 
>> +ret = list_subvol_search(fd, root_lookup);
>> +if (ret) {
>> +error("can't perform the search: %s", strerror(-ret));
>> +return ret;
>> +}
>> +/*
>> + * now we have an rbtree full of root_info objects, but we
>> + * need to fill in their path names within the subvol that
>> + * is referencing each one.
>> + */
>> +ret = list_subvol_fill_paths(fd, root_lookup);
>> +} else {
>> +ret = list_subvol_user(fd, root_lookup, path);
>> +if (ret) {
>> +if (ret == -ENOTTY)
>> +error("can't perform the search: Operation by non-privileged user is not 
>> supported by this kernel.");
>> +else
>> +error("can't perform the search: %s",
>> +strerror(-ret));
>> +}
> 
> Sorry for noticing that only now: which is the reason to not using the new 
> api also in "root" case? I know that the behavior is a bit different, so my 
> question is: it is possible to extend the new ioctls to support also the old 
> behavior in the root case? So we could get rid of the "tree search" ioctl, 
> using it only for the old kernel

It is not possible to provide the same functionality for root by current 
implementation of ioctl.

Currently all three ioctls returns the information of subvolume which contains 
the fd's inode.
This is because to prevent a user from getting arbitrary subvolume information 
in the filesystem.

So, in order to work "sub list" with these new ioctls , we need to do:
1. open the path
2. call GET_SUBVOL_INFO to get the subvolume info of opened path.
3. call GET_SUBVOL_ROOTREF to get the list of child subvolume id.
4. call INO_LOOKUP_USER to check if a user can really access the (parent 
directory of) child subvolume and construct the path.
5. for each accessible child subvolume, repeat 1-5. 

Therefore there is no way to search subvolumes outside of mount point if the 
default subvolume
has been changed and cannot be used as an alternative of tree search ioctl for 
root.

If we pass a subvolid to be searched as an argument (allowing for root only), 
it is possible to use
these ioctls to replace tree search ioctl. However, this means the behavior of 
ioctl is different from
root and normal user. Is it a good thing? Also, the number of ioctl call 
certainly increases since 
GET_SUBVOL_INFO/SUBVOL_ROOTREF needs to be called for each subvolume while tree 
search returns multiple
information at once. I'd like to know others' opinions.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v5 0/2] Remove false-positive VLAs when using max()

2018-03-19 Thread Arnd Bergmann

On Tue, Mar 20, 2018 at 7:29 AM, Linus Torvalds
 wrote:
> On Mon, Mar 19, 2018 at 2:43 AM, David Laight  wrote:
>>
>> Is it necessary to have the full checks for old versions of gcc?
>>
>> Even -Wvla could be predicated on very recent gcc - since we aren't
>> worried about whether gcc decides to generate a vla, but whether
>> the source requests one.
>
> You are correct. We could just ignore the issue with old gcc versions,
> and disable -Wvla rather than worry about it.

This version might also be an option:

diff --git a/Makefile b/Makefile
index 37fc475a2b92..49dd9f0fb76c 100644
--- a/Makefile
+++ b/Makefile
@@ -687,7 +687,8 @@ KBUILD_CFLAGS += $(call cc-option,-fno-reorder-blocks,) \
 endif

 ifneq ($(CONFIG_FRAME_WARN),0)
-KBUILD_CFLAGS += $(call cc-option,-Wframe-larger-than=${CONFIG_FRAME_WARN})
+KBUILD_CFLAGS += $(call cc-option,-Wstack-usage=${CONFIG_FRAME_WARN}, \
+   -$(call cc-option,-Wframe-larger-than=${CONFIG_FRAME_WARN}))
 endif

 # This selects the stack protector compiler flag. Testing it is delayed

Wiht -Wstack-usage=, we should get a similar warning to -Wvla for frames that
contain real VLAs, but not when there is a VLA that ends up being a compile-time
constant size in the end. Wstack-usage was introduced in gcc-4.7, so
on older versions
it turns back into Wframe-larger-than=.

An example output would be

security/integrity/ima/ima_crypto.c: In function 'ima_calc_buffer_hash':
security/integrity/ima/ima_crypto.c:616:5: error: stack usage might be
unbounded [-Werror=stack-usage=]

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2] btrfs-progs: ctree: Add extra level check for read_node_slot()

2018-03-19 Thread Qu Wenruo



On 2018年03月19日 20:36, David Sterba wrote:
> On Thu, Feb 08, 2018 at 08:59:40AM +0800, Qu Wenruo wrote:
>> Strangely, we have level check in btrfs_print_tree() while we don't have
>> the same check in read_node_slot().
>>
>> That's to say, for the following corruption, btrfs_search_slot() or
>> btrfs_next_leaf() can return invalid leaf:
>>
>> Parent eb:
>>   node XX level 1
>>   ^^^
>>   Child should be leaf (level 0)
>>   ...
>>   key (XXX XXX XXX) block YY
>>
>> Child eb:
>>   leaf YY level 1
>>   ^^^
>>   Something went wrong now
>>
>> And for the corrupted leaf returned, later caller can be screwed up
>> easily.
>>
>> Although the root cause (powerloss, but still something wrong breaking
>> metadata CoW of btrfs) is still unknown, at least enhance btrfs-progs to
>> avoid SEGV.
>>
>> Reported-by: Ralph Gauges 
>> Signed-off-by: Qu Wenruo 
> 
> Applied, thanks.
> 
>> ---
>> changlog:
>> v2:
>>   Check if the extent buffer is up-to-date before checking its level to
>>   avoid possible NULL pointer access.
>> ---
>>  ctree.c | 16 +++-
>>  1 file changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/ctree.c b/ctree.c
>> index 4fc33b14000a..430805e3043f 100644
>> --- a/ctree.c
>> +++ b/ctree.c
>> @@ -22,6 +22,7 @@
>>  #include "repair.h"
>>  #include "internal.h"
>>  #include "sizes.h"
>> +#include "messages.h"
>>  
>>  static int split_node(struct btrfs_trans_handle *trans, struct btrfs_root
>>*root, struct btrfs_path *path, int level);
>> @@ -640,7 +641,9 @@ static int bin_search(struct extent_buffer *eb, struct 
>> btrfs_key *key,
>>  struct extent_buffer *read_node_slot(struct btrfs_fs_info *fs_info,
>> struct extent_buffer *parent, int slot)
>>  {
>> +struct extent_buffer *ret;
>>  int level = btrfs_header_level(parent);
>> +
>>  if (slot < 0)
>>  return NULL;
>>  if (slot >= btrfs_header_nritems(parent))
>> @@ -649,8 +652,19 @@ struct extent_buffer *read_node_slot(struct 
>> btrfs_fs_info *fs_info,
>>  if (level == 0)
>>  return NULL;
>>  
>> -return read_tree_block(fs_info, btrfs_node_blockptr(parent, slot),
>> +ret = read_tree_block(fs_info, btrfs_node_blockptr(parent, slot),
>> btrfs_node_ptr_generation(parent, slot));
>> +if (!extent_buffer_uptodate(ret))
>> +return ERR_PTR(-EIO);
>> +
>> +if (btrfs_header_level(ret) != level - 1) {
>> +error("child eb corrupted: parent bytenr=%llu item=%d parent 
>> level=%d child level=%d",
> 
> Please unindent the strings that are do not fit 80 chars on the line.
> I've fixed that now, but I do that too often despite this has been known
> to be the preferred style.

Sorry for the extra trouble.

But I'm still a little uncertain about the correct way to handle such
long string.
It would be pretty nice to have some recommendation for the following cases:

1) Super long, already over 80 chars even without indent
   No indent at all or just normal indent?

2) Short enough to have some indent, but can't be aligned to the bracket
   Should we use as much indent as possible or leave no indent at all?

Thanks,
Qu

> 
>> +  btrfs_header_bytenr(parent), slot,
>> +  btrfs_header_level(parent), btrfs_header_level(ret));
>> +free_extent_buffer(ret);
>> +return ERR_PTR(-EIO);
>> +}
>> +return ret;
>>  }
>>  
>>  static int balance_level(struct btrfs_trans_handle *trans,
>> -- 
>> 2.16.1
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



signature.asc
Description: OpenPGP digital signature

Re: [PATCH] btrfs-progs: inspect-dump-tree: add '-f|--follow' options to print all children tree blocks for '-b'

2018-03-19 Thread Qu Wenruo



On 2018年03月20日 02:44, David Sterba wrote:
> On Wed, Mar 14, 2018 at 09:05:38AM +0800, Qu Wenruo wrote:
>> When debuging with "btrfs inspect dump-tree", it's not that handy if we
>> want to iterate all child tree blocks starting from a specified block.
>>
>> -b can only print a single block, while without -b "btrfs inspect dump-tree"
>> will need extra tree roots fulfilled to continue, which is not possible
>> for some damaged filesystem.
>>
>> Add a new option '-f|--follow' to iterate a sub-tree starting from block
>> specified by '-b|--block', so we would have less limitation.
>>
>> Signed-off-by: Qu Wenruo 
> 
> I've dropped the shot option for now and updated the changelog. While
> '-f' is probably a good choice, I'd postpone adding that if we want to
> use it for other purposes.

I'm fine with this.

Just curious if this means we should also double check the short option
usage or this is just for "-f"?

Thanks,
Qu


> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



signature.asc
Description: OpenPGP digital signature

Re: [PATCH v2] btrfs-progs: ctree: Add extra level check for read_node_slot()

2018-03-19 Thread Qu Wenruo



On 2018年03月20日 13:16, Qu Wenruo wrote:
> 
> 
> On 2018年03月19日 20:36, David Sterba wrote:
>> On Thu, Feb 08, 2018 at 08:59:40AM +0800, Qu Wenruo wrote:
>>> Strangely, we have level check in btrfs_print_tree() while we don't have
>>> the same check in read_node_slot().
>>>
>>> That's to say, for the following corruption, btrfs_search_slot() or
>>> btrfs_next_leaf() can return invalid leaf:
>>>
>>> Parent eb:
>>>   node XX level 1
>>>   ^^^
>>>   Child should be leaf (level 0)
>>>   ...
>>>   key (XXX XXX XXX) block YY
>>>
>>> Child eb:
>>>   leaf YY level 1
>>>   ^^^
>>>   Something went wrong now
>>>
>>> And for the corrupted leaf returned, later caller can be screwed up
>>> easily.
>>>
>>> Although the root cause (powerloss, but still something wrong breaking
>>> metadata CoW of btrfs) is still unknown, at least enhance btrfs-progs to
>>> avoid SEGV.
>>>
>>> Reported-by: Ralph Gauges 
>>> Signed-off-by: Qu Wenruo 
>>
>> Applied, thanks.
>>
>>> ---
>>> changlog:
>>> v2:
>>>   Check if the extent buffer is up-to-date before checking its level to
>>>   avoid possible NULL pointer access.
>>> ---
>>>  ctree.c | 16 +++-
>>>  1 file changed, 15 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/ctree.c b/ctree.c
>>> index 4fc33b14000a..430805e3043f 100644
>>> --- a/ctree.c
>>> +++ b/ctree.c
>>> @@ -22,6 +22,7 @@
>>>  #include "repair.h"
>>>  #include "internal.h"
>>>  #include "sizes.h"
>>> +#include "messages.h"
>>>  
>>>  static int split_node(struct btrfs_trans_handle *trans, struct btrfs_root
>>>   *root, struct btrfs_path *path, int level);
>>> @@ -640,7 +641,9 @@ static int bin_search(struct extent_buffer *eb, struct 
>>> btrfs_key *key,
>>>  struct extent_buffer *read_node_slot(struct btrfs_fs_info *fs_info,
>>>struct extent_buffer *parent, int slot)
>>>  {
>>> +   struct extent_buffer *ret;
>>> int level = btrfs_header_level(parent);
>>> +
>>> if (slot < 0)
>>> return NULL;
>>> if (slot >= btrfs_header_nritems(parent))
>>> @@ -649,8 +652,19 @@ struct extent_buffer *read_node_slot(struct 
>>> btrfs_fs_info *fs_info,
>>> if (level == 0)
>>> return NULL;
>>>  
>>> -   return read_tree_block(fs_info, btrfs_node_blockptr(parent, slot),
>>> +   ret = read_tree_block(fs_info, btrfs_node_blockptr(parent, slot),
>>>btrfs_node_ptr_generation(parent, slot));
>>> +   if (!extent_buffer_uptodate(ret))
>>> +   return ERR_PTR(-EIO);
>>> +
>>> +   if (btrfs_header_level(ret) != level - 1) {
>>> +   error("child eb corrupted: parent bytenr=%llu item=%d parent 
>>> level=%d child level=%d",
>>
>> Please unindent the strings that are do not fit 80 chars on the line.
>> I've fixed that now, but I do that too often despite this has been known
>> to be the preferred style.
> 
> Sorry for the extra trouble.
> 
> But I'm still a little uncertain about the correct way to handle such
> long string.
> It would be pretty nice to have some recommendation for the following cases:
> 
> 1) Super long, already over 80 chars even without indent
>No indent at all or just normal indent?

And in that case, should we move the string to a new line?
As neither way, it's already over 80 chars.

Thanks,
Qu

> 
> 2) Short enough to have some indent, but can't be aligned to the bracket
>Should we use as much indent as possible or leave no indent at all?
> 
> Thanks,
> Qu
> 
>>
>>> + btrfs_header_bytenr(parent), slot,
>>> + btrfs_header_level(parent), btrfs_header_level(ret));
>>> +   free_extent_buffer(ret);
>>> +   return ERR_PTR(-EIO);
>>> +   }
>>> +   return ret;
>>>  }
>>>  
>>>  static int balance_level(struct btrfs_trans_handle *trans,
>>> -- 
>>> 2.16.1
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>> the body of a message to majord...@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> 



signature.asc
Description: OpenPGP digital signature

[PATCH v3 0/6] Fix long standing -EOPNOTSUPP problem caused by large inline extent

2018-03-19 Thread Qu Wenruo

The patch is based on v4.15.1, and is designed to replace the old patch
in devel branch.

Kernel doesn't support dropping range inside inline extent, and prevents
such thing happening by limiting max inline extent size to
min(max_inline, sectorsize - 1) in cow_file_range_inline().

However btrfs-progs only inherit the BTRFS_MAX_INLINE_DATA_SIZE() macro,
which doesn't have sectorsize check.
And since btrfs-progs defaults to 16K nodesize, above macro allows large
inline extent over 15K size.

This leads to unexpected kernel behavior.

The bug exists in several parts of btrfs-progs, any tool which creates
file extent is involved, including:
1) btrfs-convert
2) mkfs --rootdir

This patchset fixes the problems in convert (both ext2 and reiserfs),
mkfs --rootdir, then add check support for both original and lowmem
mode, and finally adds 2 test cases, one for mkfs and one for convert.

For mkfs test case, it can already be exposed by misc/002, but a
pin-point test case will be much better.

changelog:
v2:
  Don't modify BTRFS_MAX_INLINE_DATA_SIZE(), but add extra check to
  callers who create file extents.
v3:
  Merge fixes for convert.
  Add real commit message for convert fixes.
  Use $TEST_TOP to replace cooperate with stand alone test cases.
  Use for loops to make the new test case shorter.

Qu Wenruo (6):
  btrfs-progs: convert: Fix inline file extent creation condition
  btrfs-progs: mkfs/rootdir: Fix inline extent creation check
  btrfs-progs: check/original mode: Check inline extent size
  btrfs-progs: check/lowmem mode: Check inline extent size
  btrfs-progs: test/convert: Add test case for invalid large inline data
extent
  btrfs-progs: test/mkfs: Add test case for rootdir inline extent size

 check/main.c   | 16 
 check/mode-lowmem.c| 28 +
 check/mode-original.h  |  1 +
 convert/source-ext2.c  |  2 +-
 convert/source-reiserfs.c  |  3 +-
 mkfs/rootdir.c |  6 ++-
 .../016-invalid-large-inline-extent/test.sh| 22 ++
 tests/mkfs-tests/014-rootdir-inline-extent/test.sh | 47 ++
 8 files changed, 121 insertions(+), 4 deletions(-)
 create mode 100755 tests/convert-tests/016-invalid-large-inline-extent/test.sh
 create mode 100755 tests/mkfs-tests/014-rootdir-inline-extent/test.sh

-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v3 5/6] btrfs-progs: test/convert: Add test case for invalid large inline data extent

2018-03-19 Thread Qu Wenruo

Signed-off-by: Qu Wenruo 
---
 .../016-invalid-large-inline-extent/test.sh| 22 ++
 1 file changed, 22 insertions(+)
 create mode 100755 tests/convert-tests/016-invalid-large-inline-extent/test.sh

diff --git a/tests/convert-tests/016-invalid-large-inline-extent/test.sh 
b/tests/convert-tests/016-invalid-large-inline-extent/test.sh
new file mode 100755
index ..f37c7c09d2e7
--- /dev/null
+++ b/tests/convert-tests/016-invalid-large-inline-extent/test.sh
@@ -0,0 +1,22 @@
+#!/bin/bash
+# Check if btrfs-convert refuses to rollback the filesystem, and leave the fs
+# and the convert image untouched
+
+source "$TEST_TOP/common"
+source "$TEST_TOP/common.convert"
+
+setup_root_helper
+prepare_test_dev
+check_prereq btrfs-convert
+check_global_prereq mke2fs
+
+convert_test_prep_fs ext4 mke2fs -t ext4 -b 4096
+
+# Create a 6K file, which should not be inlined
+run_check $SUDO_HELPER dd if=/dev/zero bs=2k count=3 of="$TEST_MNT/file1"
+
+run_check_umount_test_dev
+
+# convert_test_do_convert() will call btrfs check, which should expose any
+# invalid inline extent with too large size
+convert_test_do_convert
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v3 6/6] btrfs-progs: test/mkfs: Add test case for rootdir inline extent size

2018-03-19 Thread Qu Wenruo

Add a test case for mkfs --rootdir, using files with different file
sizes to check if invalid large inline extent could exist.

Signed-off-by: Qu Wenruo 
---
 tests/mkfs-tests/014-rootdir-inline-extent/test.sh | 47 ++
 1 file changed, 47 insertions(+)
 create mode 100755 tests/mkfs-tests/014-rootdir-inline-extent/test.sh

diff --git a/tests/mkfs-tests/014-rootdir-inline-extent/test.sh 
b/tests/mkfs-tests/014-rootdir-inline-extent/test.sh
new file mode 100755
index ..167ced1e4987
--- /dev/null
+++ b/tests/mkfs-tests/014-rootdir-inline-extent/test.sh
@@ -0,0 +1,47 @@
+#!/bin/bash
+# Regression test for mkfs.btrfs --rootdir with inline file extents
+# For any large inline file extent, btrfs check could already report it
+
+source "$TEST_TOP/common"
+
+check_global_prereq fallocate
+check_prereq mkfs.btrfs
+
+prepare_test_dev
+
+tmp=$(mktemp -d --tmpdir btrfs-progs-mkfs.rootdirXXX)
+
+pagesize=$(getconf PAGESIZE)
+create_file()
+{
+   local size=$1
+   # Reuse size as filename
+   run_check fallocate -l $size "$tmp/$size"
+}
+
+test_mkfs_rootdir()
+{
+   nodesize=$1
+   run_check "$TOP/mkfs.btrfs" --nodesize $nodesize -f --rootdir "$tmp" \
+   "$TEST_DEV"
+   run_check $SUDO_HELPER "$TOP/btrfs" check "$TEST_DEV"
+}
+
+# Use power of 2 from 512 to 64K (maximum node size) as base file size
+for i in 512 1024 2048 4096 8192 16384 32768; do
+   create_file $(($i - 102))
+   # 101 is the overhead size for max inline extent
+   create_file $(($i - 101))
+   create_file $(($i - 100))
+
+   create_file $(($i - 1))
+   create_file $i
+   create_file $(($i + 1))
+done
+
+for nodesize in 4096 8192 16384 32768 65536; do
+   if [ $nodesize -ge $pagesize ]; then
+   test_mkfs_rootdir $nodesize
+   fi
+done
+rm -rf -- "$tmp"
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v3 1/6] btrfs-progs: convert: Fix inline file extent creation condition

2018-03-19 Thread Qu Wenruo

[Bug]
On btrfs converted from ext*, one user reported the following kernel
warning:
 [ cut here ]
 BTRFS: Transaction aborted (error -95)
 WARNING: CPU: 0 PID: 324 at fs/btrfs/inode.c:3042 
btrfs_finish_ordered_io+0x7ab/0x850 [btrfs]
 Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
 RIP: 0010:btrfs_finish_ordered_io+0x7ab/0x850 [btrfs]
 ...
 Call Trace:
  normal_work_helper+0x39/0x370 [btrfs]
  process_one_work+0x1ce/0x410
  worker_thread+0x2b/0x3d0
  ? process_one_work+0x410/0x410
  kthread+0x113/0x130
  ? kthread_create_on_node+0x70/0x70
  ? do_syscall_64+0x74/0x190
  ? SyS_exit_group+0x10/0x10
  ret_from_fork+0x35/0x40
 ---[ end trace c8ed62ff6a525901 ]---
 BTRFS: error (device dm-2) in
btrfs_finish_ordered_io:3042: errno=-95 unknown
 BTRFS info (device dm-2): forced readonly
 BTRFS error (device dm-2): pending csums is 6447104

[Cause]
The call trace and the unique return value points to
__btrfs_drop_extents(), when we tries to drop pages of an inline extent,
we will trigger such -EOPNOTSUPP.

However kernel has limitation on the size of inline file extent
(sector size for ram size and sector size - 1for on-disk size),
btrfs-convert doesn't have the same limitation, resulting much larger
file extent.

The lack of correct inline extent size check dates back to 2008 when
btrfs-convert is added into btrfs-progs.

[Fix]
Fix the inline extent creation condition, not only using
BTRFS_MAX_INLINE_DATA_SIZE(), which is only the maximum size of inline
data according to nodesize, but also limit it against sector size.

Signed-off-by: Qu Wenruo 
---
 convert/source-ext2.c | 2 +-
 convert/source-reiserfs.c | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/convert/source-ext2.c b/convert/source-ext2.c
index b1492c78693d..e13fbe00415a 100644
--- a/convert/source-ext2.c
+++ b/convert/source-ext2.c
@@ -310,7 +310,7 @@ static int ext2_create_file_extents(struct 
btrfs_trans_handle *trans,
if (ret)
goto fail;
if ((convert_flags & CONVERT_FLAG_INLINE_DATA) && data.first_block == 0
-   && data.num_blocks > 0
+   && data.num_blocks > 0 && inode_size < sectorsize
&& inode_size <= BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info)) {
u64 num_bytes = data.num_blocks * sectorsize;
u64 disk_bytenr = data.disk_block * sectorsize;
diff --git a/convert/source-reiserfs.c b/convert/source-reiserfs.c
index 39d6f0728bd3..eeb68d962c5d 100644
--- a/convert/source-reiserfs.c
+++ b/convert/source-reiserfs.c
@@ -376,7 +376,8 @@ static int reiserfs_convert_tail(struct btrfs_trans_handle 
*trans,
u64 isize;
int ret;
 
-   if (length >= BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info))
+   if (length >= BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info) ||
+   length >= root->fs_info->sectorsize)
return convert_direct(trans, root, objectid, inode, body,
  length, offset, convert_flags);
 
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v3 2/6] btrfs-progs: mkfs/rootdir: Fix inline extent creation check

2018-03-19 Thread Qu Wenruo

Just like convert, we need extra check against sector size for creating
inline extent.

Signed-off-by: Qu Wenruo 
---
 mkfs/rootdir.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mkfs/rootdir.c b/mkfs/rootdir.c
index e06b65ac13e4..a1d223a2408a 100644
--- a/mkfs/rootdir.c
+++ b/mkfs/rootdir.c
@@ -139,7 +139,8 @@ static int fill_inode_item(struct btrfs_trans_handle *trans,
}
if (S_ISREG(src->st_mode)) {
btrfs_set_stack_inode_size(dst, (u64)src->st_size);
-   if (src->st_size <= BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info))
+   if (src->st_size <= BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info) &&
+   src->st_size < sectorsize)
btrfs_set_stack_inode_nbytes(dst, src->st_size);
else {
blocks = src->st_size / sectorsize;
@@ -327,7 +328,8 @@ static int add_file_items(struct btrfs_trans_handle *trans,
if (st->st_size % sectorsize)
blocks += 1;
 
-   if (st->st_size <= BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info)) {
+   if (st->st_size <= BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info) &&
+   st->st_size < sectorsize) {
char *buffer = malloc(st->st_size);
 
if (!buffer) {
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v3 3/6] btrfs-progs: check/original mode: Check inline extent size

2018-03-19 Thread Qu Wenruo

For inline compressed file extent, kernel doesn't allow inline extent
ram size larger than sector size and on-disk inline extent size should
not exceed BTRFS_MAX_INLINE_DATA_SIZE().

For inline uncompressed file extent, kernel doesn't allow inline extent
ram and on-disk size larger than either BTRFS_MAX_INLINE_DATA_SIZE() or
sector size.

Check it in original mode.

Signed-off-by: Qu Wenruo 
---
 check/main.c  | 16 
 check/mode-original.h |  1 +
 2 files changed, 17 insertions(+)

diff --git a/check/main.c b/check/main.c
index 97baae583f04..7821faa929a3 100644
--- a/check/main.c
+++ b/check/main.c
@@ -560,6 +560,8 @@ static void print_inode_error(struct btrfs_root *root, 
struct inode_record *rec)
fprintf(stderr, ", bad file extent");
if (errors & I_ERR_FILE_EXTENT_OVERLAP)
fprintf(stderr, ", file extent overlap");
+   if (errors & I_ERR_FILE_EXTENT_TOO_LARGE)
+   fprintf(stderr, ", inline file extent too large");
if (errors & I_ERR_FILE_EXTENT_DISCOUNT)
fprintf(stderr, ", file extent discount");
if (errors & I_ERR_DIR_ISIZE_WRONG)
@@ -1433,6 +1435,8 @@ static int process_file_extent(struct btrfs_root *root,
u64 disk_bytenr = 0;
u64 extent_offset = 0;
u64 mask = root->fs_info->sectorsize - 1;
+   u32 max_inline_size = min_t(u32, mask,
+   BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info));
int extent_type;
int ret;
 
@@ -1458,9 +1462,21 @@ static int process_file_extent(struct btrfs_root *root,
extent_type = btrfs_file_extent_type(eb, fi);
 
if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
+   u8 compression = btrfs_file_extent_compression(eb, fi);
+   struct btrfs_item *item = btrfs_item_nr(slot);
+
num_bytes = btrfs_file_extent_inline_len(eb, slot, fi);
if (num_bytes == 0)
rec->errors |= I_ERR_BAD_FILE_EXTENT;
+   if (compression) {
+   if (btrfs_file_extent_inline_item_len(eb, item) >
+   max_inline_size ||
+   num_bytes > root->fs_info->sectorsize)
+   rec->errors |= I_ERR_FILE_EXTENT_TOO_LARGE;
+   } else {
+   if (num_bytes > max_inline_size)
+   rec->errors |= I_ERR_FILE_EXTENT_TOO_LARGE;
+   }
rec->found_size += num_bytes;
num_bytes = (num_bytes + mask) & ~mask;
} else if (extent_type == BTRFS_FILE_EXTENT_REG ||
diff --git a/check/mode-original.h b/check/mode-original.h
index f859af478f0f..368de692fdd1 100644
--- a/check/mode-original.h
+++ b/check/mode-original.h
@@ -185,6 +185,7 @@ struct file_extent_hole {
 #define I_ERR_SOME_CSUM_MISSING(1 << 12)
 #define I_ERR_LINK_COUNT_WRONG (1 << 13)
 #define I_ERR_FILE_EXTENT_ORPHAN   (1 << 14)
+#define I_ERR_FILE_EXTENT_TOO_LARGE(1 << 15)
 
 struct inode_record {
struct list_head backrefs;
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v3 4/6] btrfs-progs: check/lowmem mode: Check inline extent size

2018-03-19 Thread Qu Wenruo

Signed-off-by: Qu Wenruo 
---
 check/mode-lowmem.c | 28 
 1 file changed, 28 insertions(+)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index 62bcf3d2e126..e5446f9c7280 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -1417,6 +1417,8 @@ static int check_file_extent(struct btrfs_root *root, 
struct btrfs_key *fkey,
u64 csum_found; /* In byte size, sectorsize aligned */
u64 search_start;   /* Logical range start we search for csum */
u64 search_len; /* Logical range len we search for csum */
+   u32 max_inline_extent_size = min_t(u32, root->fs_info->sectorsize - 1,
+   BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info));
unsigned int extent_type;
unsigned int is_hole;
int compressed = 0;
@@ -1440,6 +1442,32 @@ static int check_file_extent(struct btrfs_root *root, 
struct btrfs_key *fkey,
root->objectid, fkey->objectid, fkey->offset);
err |= FILE_EXTENT_ERROR;
}
+   if (compressed) {
+   if (extent_num_bytes > root->fs_info->sectorsize) {
+   error(
+"root %llu EXTENT_DATA[%llu %llu] too large inline extent ram size, have %llu, 
max: %u",
+   root->objectid, fkey->objectid,
+   fkey->offset, extent_num_bytes,
+   root->fs_info->sectorsize - 1);
+   err |= FILE_EXTENT_ERROR;
+   }
+   if (item_inline_len > max_inline_extent_size) {
+   error(
+"root %llu EXTENT_DATA[%llu %llu] too large inline extent on-disk size, have 
%u, max: %u",
+   root->objectid, fkey->objectid,
+   fkey->offset, item_inline_len,
+   max_inline_extent_size);
+   err |= FILE_EXTENT_ERROR;
+   }
+   } else {
+   if (extent_num_bytes > max_inline_extent_size) {
+   error(
+ "root %llu EXTENT_DATA[%llu %llu] too large inline extent size, have %llu, 
max: %u",
+   root->objectid, fkey->objectid, fkey->offset,
+   extent_num_bytes, max_inline_extent_size);
+   err |= FILE_EXTENT_ERROR;
+   }
+   }
if (!compressed && extent_num_bytes != item_inline_len) {
error(
"root %llu EXTENT_DATA[%llu %llu] wrong inline size, have: 
%llu, expected: %u",
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] btrfs: Allow non-privileged user to delete empty subvolume by default

2018-03-19 Thread Misono, Tomohiro

Deletion of subvolume by non-privileged user is completely restricted
by default because we can delete a subvolume even if it is not empty
and may cause data loss. In other words, when user_subvol_rm_allowed
mount option is used, a user can delete a subvolume containing the
directory which cannot be deleted directly by the user.

However, there should be no harm to allow users to delete empty subvolumes
when rmdir(2) would have been allowed if they were normal directories.
This patch allows deletion of empty subvolume by default.

Note that user_subvol_rm_allowed option requires write+exec permission
of the subvolume to be deleted, but they are not required for empty
subvolume.

The comment in the code is also updated accordingly.

Signed-off-by: Tomohiro Misono 
---
 fs/btrfs/ioctl.c | 55 +++
 1 file changed, 31 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 111ee282b777..838406a7a7f5 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2366,36 +2366,43 @@ static noinline int btrfs_ioctl_snap_destroy(struct 
file *file,
dest = BTRFS_I(inode)->root;
if (!capable(CAP_SYS_ADMIN)) {
/*
-* Regular user.  Only allow this with a special mount
-* option, when the user has write+exec access to the
-* subvol root, and when rmdir(2) would have been
-* allowed.
+* By default, regular user is only allowed to delete
+* empty subvols when rmdir(2) would have been allowed
+* if they were normal directories.
 *
-* Note that this is _not_ check that the subvol is
-* empty or doesn't contain data that we wouldn't
+* If the mount option 'user_subvol_rm_allowed' is set,
+* it allows users to delete non-empty subvols when the
+* user has write+exec access to the subvol root and when
+* rmdir(2) would have been allowed (except the emptiness
+* check).
+*
+* Note that this option does _not_ check that if the subvol
+* is empty or doesn't contain data that the user wouldn't
 * otherwise be able to delete.
 *
-* Users who want to delete empty subvols should try
-* rmdir(2).
+* Users who want to delete empty subvols created by
+* snapshot (ino number == 2) can use rmdir(2).
 */
-   err = -EPERM;
-   if (!btrfs_test_opt(fs_info, USER_SUBVOL_RM_ALLOWED))
-   goto out_dput;
+   err = -ENOTEMPTY;
+   if (inode->i_size != BTRFS_EMPTY_DIR_SIZE) {
+   if (!btrfs_test_opt(fs_info, USER_SUBVOL_RM_ALLOWED))
+   goto out_dput;
 
-   /*
-* Do not allow deletion if the parent dir is the same
-* as the dir to be deleted.  That means the ioctl
-* must be called on the dentry referencing the root
-* of the subvol, not a random directory contained
-* within it.
-*/
-   err = -EINVAL;
-   if (root == dest)
-   goto out_dput;
+   /*
+* Do not allow deletion if the parent dir is the same
+* as the dir to be deleted.  That means the ioctl
+* must be called on the dentry referencing the root
+* of the subvol, not a random directory contained
+* within it.
+*/
+   err = -EINVAL;
+   if (root == dest)
+   goto out_dput;
 
-   err = inode_permission(inode, MAY_WRITE | MAY_EXEC);
-   if (err)
-   goto out_dput;
+   err = inode_permission(inode, MAY_WRITE | MAY_EXEC);
+   if (err)
+   goto out_dput;
+   }
}
 
/* check if subvolume may be deleted by a user */
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] btrfs: ctree.h: Fix wrong comment position about csum size

2018-03-19 Thread Misono, Tomohiro


Signed-off-by: Tomohiro Misono 
---
 fs/btrfs/ctree.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index da308774b8a4..8f59cb20dd4c 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -86,9 +86,9 @@ struct btrfs_ordered_sum;
  */
 #define BTRFS_LINK_MAX 65535U
 
+/* four bytes for CRC32 */
 static const int btrfs_csum_sizes[] = { 4 };
 
-/* four bytes for CRC32 */
 #define BTRFS_EMPTY_DIR_SIZE 0
 
 /* ioprio of readahead is set to idle */
-- 
2.14.3


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

69 matches

Mail list logo