Re: off-by-one uncompressed invalid ram_bytes corruptions

2018-05-17 Thread Qu Wenruo


On 2018年05月18日 13:23, Steve Leung wrote:
> Hi list,
> 
> I've got 3-device raid1 btrfs filesystem that's throwing up some
> "corrupt leaf" errors in dmesg.  This is a uniquified list I've
> observed lately:
> 
>   BTRFS critical (device sda1): corrupt leaf: root=1 block=4970196795392
> slot=307 ino=206231 file_offset=0, invalid ram_bytes for uncompressed
> inline extent, have 3468 expect 3469

Would you please use "btrfs-debug-tree -b 4970196795392 /dev/sda1" to
dump the leaf?

It's caught by tree-checker code which is ensuring all tree blocks are
correct before btrfs can take use of them.

That inline extent size check is tested, so I'm wondering if this
indicates any real corruption.
That btrfs-debug-tree output will definitely help.

BTW, if I didn't miss anything, there should not be any inlined extent
in root tree.

>   BTRFS critical (device sda1): corrupt leaf: root=1 block=4970552426496
> slot=91 ino=209736 file_offset=0, invalid ram_bytes for uncompressed
> inline extent, have 3496 expect 3497

Same dump will definitely help.

>   BTRFS critical (device sda1): corrupt leaf: root=1 block=4970712399872
> slot=221 ino=205230 file_offset=0, invalid ram_bytes for uncompressed
> inline extent, have 1790 expect 1791
>   BTRFS critical (device sda1): corrupt leaf: root=1 block=4970803920896
> slot=368 ino=205732 file_offset=0, invalid ram_bytes for uncompressed
> inline extent, have 2475 expect 2476
>   BTRFS critical (device sda1): corrupt leaf: root=1 block=4970987945984
> slot=236 ino=208896 file_offset=0, invalid ram_bytes for uncompressed
> inline extent, have 490 expect 491
> 
> All of them seem to be 1 short of the expected value.
> 
> Some files do seem to be inaccessible on the filesystem, and btrfs
> inspect-internal on any of those inode numbers fails with:
> 
>  ERROR: ino paths ioctl: Input/output error
> 
> and another message for that inode appears.
> 
> 'btrfs check' (output attached) seems to notice these corruptions (among
> a few others, some of which seem to be related to a problematic attempt
> to build Android I posted about some months ago).
> 
> Other information:
> 
> Arch Linux x86-64, kernel 4.16.6, btrfs-progs 4.16.  The filesystem has
> about 25 snapshots at the moment, only a handful of compressed files,
> and nothing fancy like qgroups enabled.
> 
> btrfs fi show:
> 
>  Label: none  uuid: 9d4db9e3-b9c3-4f6d-8cb4-60ff55e96d82
>  Total devices 4 FS bytes used 2.48TiB
>  devid    1 size 1.36TiB used 1.13TiB path /dev/sdd1
>  devid    2 size 464.73GiB used 230.00GiB path /dev/sdc1
>  devid    3 size 1.36TiB used 1.13TiB path /dev/sdb1
>  devid    4 size 3.49TiB used 2.49TiB path /dev/sda1
> 
> btrfs fi df:
> 
>  Data, RAID1: total=2.49TiB, used=2.48TiB
>  System, RAID1: total=32.00MiB, used=416.00KiB
>  Metadata, RAID1: total=7.00GiB, used=5.29GiB
>  GlobalReserve, single: total=512.00MiB, used=0.00B
> 
> dmesg output attached as well.
> 
> Thanks in advance for any assistance!  I have backups of all the
> important stuff here but it would be nice to fix the corruptions in place.

And btrfs check doesn't report the same problem as the default original
mode doesn't have such check.

Please also post the result of "btrfs check --mode=lowmem /dev/sda1"

Thanks,
Qu

> 
> Steve



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 4/6] Btrfs: remove unused check of skip_locking

2018-05-17 Thread Nikolay Borisov


On 18.05.2018 05:55, Liu Bo wrote:
> On Wed, May 16, 2018 at 3:03 PM, Nikolay Borisov  wrote:
>>
>>
>> On 15.05.2018 20:52, Liu Bo wrote:
>>> The check is superfluous since all of callers who set search_for_commit
>>> also have skip_locking set.
>>
>> This is false. For example btrfs_qgroup_rescan_worker sets
>> search_commit_root = 1 but doesn't set skip locking. So either your
>> assumption is wrong or the code in btrfs_qgroup_rescan_worker has missed
>> that (which seems more likely).
>>
> 
> I'm a bit confused as I didn't see btrfs_qgroup_rescan_worker() set
> path->search_commit_root.

Then you are not basing your patches on the latest development branch
(misc-next) from David's github. The code in question is indeed very
recent:

e3884f5bd7cc ("btrfs: qgroup: Search commit root for rescan to avoid
missing extent")


> 
> thanks,
> liubo
> 
>> I think this change necessitates either:
>>
>> a) ASSERT(p->skip_locking) inside the if (p->search_commit_root) branch
>>
>> or
>>
>> b) Unconditionally setting ->skip_locking if we have set
>> search_commit_root and removing opencoded set of skip_locking alongside
>> search_commit_root.
>>
>> I think b) is the better option since it provides "fire and forget"
>> semantics when you want to search the commit root, since it's only read
>> only.
>>
>>>
>>> Signed-off-by: Liu Bo 
>>> ---
>>>  fs/btrfs/ctree.c | 2 --
>>>  1 file changed, 2 deletions(-)
>>>
>>> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
>>> index 399839df7a8f..cf34eca41d4e 100644
>>> --- a/fs/btrfs/ctree.c
>>> +++ b/fs/btrfs/ctree.c
>>> @@ -2623,8 +2623,6 @@ static struct extent_buffer 
>>> *btrfs_search_slot_get_root(struct btrfs_root *root,
>>>   level = btrfs_header_level(b);
>>>   if (p->need_commit_sem)
>>>   up_read(_info->commit_root_sem);
>>> - if (!p->skip_locking)
>>> - btrfs_tree_read_lock(b);
>>>
>>>   goto out;
>>>   }
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 6/6] Btrfs: remove always true check in unlock_up

2018-05-17 Thread Qu Wenruo


On 2018年05月18日 11:00, Liu Bo wrote:
> As unlock_up() is written as
> 
> for () {
>if (!path->locks[i])
>break;
>...
>if (... && path->locks[i]) {
>}
> }
> 
> Apparently, @path->locks[i] is always true at this 'if'.
> 
> Signed-off-by: Liu Bo 

Reviewed-by: Qu Wenruo 

Thanks,
Qu

> ---
>  fs/btrfs/ctree.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index e619f7e01794..65dbebb8cc59 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -2330,7 +2330,7 @@ static noinline void unlock_up(struct btrfs_path *path, 
> int level,
>   no_skips = 1;
>  
>   t = path->nodes[i];
> - if (i >= lowest_unlock && i > skip_level && path->locks[i]) {
> + if (i >= lowest_unlock && i > skip_level) {
>   btrfs_tree_unlock_rw(t, path->locks[i]);
>   path->locks[i] = 0;
>   if (write_lock_level &&
> 



signature.asc
Description: OpenPGP digital signature


off-by-one uncompressed invalid ram_bytes corruptions

2018-05-17 Thread Steve Leung

Hi list,

I've got 3-device raid1 btrfs filesystem that's throwing up some
"corrupt leaf" errors in dmesg.  This is a uniquified list I've
observed lately:

  BTRFS critical (device sda1): corrupt leaf: root=1 
block=4970196795392 slot=307 ino=206231 file_offset=0, invalid ram_bytes 
for uncompressed inline extent, have 3468 expect 3469
  BTRFS critical (device sda1): corrupt leaf: root=1 
block=4970552426496 slot=91 ino=209736 file_offset=0, invalid ram_bytes 
for uncompressed inline extent, have 3496 expect 3497
  BTRFS critical (device sda1): corrupt leaf: root=1 
block=4970712399872 slot=221 ino=205230 file_offset=0, invalid ram_bytes 
for uncompressed inline extent, have 1790 expect 1791
  BTRFS critical (device sda1): corrupt leaf: root=1 
block=4970803920896 slot=368 ino=205732 file_offset=0, invalid ram_bytes 
for uncompressed inline extent, have 2475 expect 2476
  BTRFS critical (device sda1): corrupt leaf: root=1 
block=4970987945984 slot=236 ino=208896 file_offset=0, invalid ram_bytes 
for uncompressed inline extent, have 490 expect 491


All of them seem to be 1 short of the expected value.

Some files do seem to be inaccessible on the filesystem, and btrfs
inspect-internal on any of those inode numbers fails with:

 ERROR: ino paths ioctl: Input/output error

and another message for that inode appears.

'btrfs check' (output attached) seems to notice these corruptions (among 
a few others, some of which seem to be related to a problematic attempt 
to build Android I posted about some months ago).


Other information:

Arch Linux x86-64, kernel 4.16.6, btrfs-progs 4.16.  The filesystem has 
about 25 snapshots at the moment, only a handful of compressed files, 
and nothing fancy like qgroups enabled.


btrfs fi show:

 Label: none  uuid: 9d4db9e3-b9c3-4f6d-8cb4-60ff55e96d82
 Total devices 4 FS bytes used 2.48TiB
 devid1 size 1.36TiB used 1.13TiB path /dev/sdd1
 devid2 size 464.73GiB used 230.00GiB path /dev/sdc1
 devid3 size 1.36TiB used 1.13TiB path /dev/sdb1
 devid4 size 3.49TiB used 2.49TiB path /dev/sda1

btrfs fi df:

 Data, RAID1: total=2.49TiB, used=2.48TiB
 System, RAID1: total=32.00MiB, used=416.00KiB
 Metadata, RAID1: total=7.00GiB, used=5.29GiB
 GlobalReserve, single: total=512.00MiB, used=0.00B

dmesg output attached as well.

Thanks in advance for any assistance!  I have backups of all the 
important stuff here but it would be nice to fix the corruptions in place.


Steve


btrfs-check.txt.gz
Description: application/gzip


dmesg.txt.gz
Description: application/gzip


Re: [PATCH v2 4/6] Btrfs: remove unused check of skip_locking

2018-05-17 Thread Qu Wenruo


On 2018年05月18日 11:00, Liu Bo wrote:
> The check is superfluous since all of callers who set search_for_commit
> also have skip_locking set.
> 
> Signed-off-by: Liu Bo 

Reviewed-by: Qu Wenruo 

Although more obvious comment about search_commit_root and skip_locking
in ctree.h will be much better.

Thanks,
Qu

> ---
>  fs/btrfs/ctree.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index d12fc0474e21..8d3b09038f37 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -2623,8 +2623,6 @@ static struct extent_buffer 
> *btrfs_search_slot_get_root(struct btrfs_root *root,
>   level = btrfs_header_level(b);
>   if (p->need_commit_sem)
>   up_read(_info->commit_root_sem);
> - if (!p->skip_locking)
> - btrfs_tree_read_lock(b);
>  
>   goto out;
>   }
> 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v2 2/6] Btrfs: use more straightforward extent_buffer_uptodate

2018-05-17 Thread Qu Wenruo


On 2018年05月18日 11:00, Liu Bo wrote:
> If parent_transid "0" is passed to btrfs_buffer_uptodate(),
> btrfs_buffer_uptodate() is equivalent to extent_buffer_uptodate(), but
> extent_buffer_uptodate() is preferred since we don't have to look into
> verify_parent_transid().
> 
> Signed-off-by: Liu Bo 

Reviewed-by: Qu Wenruo 

And considering how little extra work we do in btrfs_buffer_uptodate(),
what about make it an inline function?

Thanks,
Qu

> ---
>  fs/btrfs/ctree.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 9fa3d77c98d4..a96d308c51b8 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -2445,7 +2445,7 @@ noinline void btrfs_unlock_up_safe(struct btrfs_path 
> *path, int level)
>* and give up so that our caller doesn't loop forever
>* on our EAGAINs.
>*/
> - if (!btrfs_buffer_uptodate(tmp, 0, 0))
> + if (!extent_buffer_uptodate(tmp))
>   ret = -EIO;
>   free_extent_buffer(tmp);
>   } else {
> 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v2 3/6] Btrfs: move get root of btrfs_search_slot to a helper

2018-05-17 Thread Qu Wenruo


On 2018年05月18日 11:00, Liu Bo wrote:
> It's good to have a helper instead of having all get-root details
> open-coded.
> 
> The new helper locks (if necessary) and sets root node of the path.
> 
> Also invert the checks to make the code flow easier to read.
> 
> There is no functional change in this commit.
> 
> Signed-off-by: Liu Bo 

Less indents with better use of goto tag, indeed easier to read.

Reviewed-by: Qu Wenruo 

Thanks,
Qu

> ---
>  fs/btrfs/ctree.c | 115 
> +--
>  1 file changed, 70 insertions(+), 45 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index a96d308c51b8..d12fc0474e21 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -2598,6 +2598,75 @@ int btrfs_find_item(struct btrfs_root *fs_root, struct 
> btrfs_path *path,
>   return 0;
>  }
>  
> +static struct extent_buffer *btrfs_search_slot_get_root(struct btrfs_root 
> *root,
> + struct btrfs_path *p,
> + int write_lock_level)
> +{
> + struct btrfs_fs_info *fs_info = root->fs_info;
> + struct extent_buffer *b;
> + int root_lock;
> + int level = 0;
> +
> + /*
> +  * we try very hard to do read locks on the root
> +  */
> + root_lock = BTRFS_READ_LOCK;
> +
> + if (p->search_commit_root) {
> + /*
> +  * the commit roots are read only so we always do read locks
> +  */
> + if (p->need_commit_sem)
> + down_read(_info->commit_root_sem);
> + b = root->commit_root;
> + extent_buffer_get(b);
> + level = btrfs_header_level(b);
> + if (p->need_commit_sem)
> + up_read(_info->commit_root_sem);
> + if (!p->skip_locking)
> + btrfs_tree_read_lock(b);
> +
> + goto out;
> + }
> +
> + if (p->skip_locking) {
> + b = btrfs_root_node(root);
> + level = btrfs_header_level(b);
> + goto out;
> + }
> +
> + /*
> +  * we don't know the level of the root node until we actually
> +  * have it read locked
> +  */
> + b = btrfs_read_lock_root_node(root);
> + level = btrfs_header_level(b);
> + if (level > write_lock_level)
> + goto out;
> +
> + /*
> +  * whoops, must trade for write lock
> +  */
> + btrfs_tree_read_unlock(b);
> + free_extent_buffer(b);
> + b = btrfs_lock_root_node(root);
> + root_lock = BTRFS_WRITE_LOCK;
> + /*
> +  * the level might have changed, check again
> +  */
> + level = btrfs_header_level(b);
> +
> +out:
> + p->nodes[level] = b;
> + if (!p->skip_locking)
> + p->locks[level] = root_lock;
> + /*
> +  * Callers are responsible for drop b's refs.
> +  */
> + return b;
> +}
> +
> +
>  /*
>   * btrfs_search_slot - look for a key in a tree and perform necessary
>   * modifications to preserve tree invariants.
> @@ -2634,7 +2703,6 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, 
> struct btrfs_root *root,
>   int err;
>   int level;
>   int lowest_unlock = 1;
> - int root_lock;
>   /* everything at write_lock_level or lower must be write locked */
>   int write_lock_level = 0;
>   u8 lowest_level = 0;
> @@ -2672,50 +2740,7 @@ int btrfs_search_slot(struct btrfs_trans_handle 
> *trans, struct btrfs_root *root,
>  
>  again:
>   prev_cmp = -1;
> - /*
> -  * we try very hard to do read locks on the root
> -  */
> - root_lock = BTRFS_READ_LOCK;
> - level = 0;
> - if (p->search_commit_root) {
> - /*
> -  * the commit roots are read only
> -  * so we always do read locks
> -  */
> - if (p->need_commit_sem)
> - down_read(_info->commit_root_sem);
> - b = root->commit_root;
> - extent_buffer_get(b);
> - level = btrfs_header_level(b);
> - if (p->need_commit_sem)
> - up_read(_info->commit_root_sem);
> - if (!p->skip_locking)
> - btrfs_tree_read_lock(b);
> - } else {
> - if (p->skip_locking) {
> - b = btrfs_root_node(root);
> - level = btrfs_header_level(b);
> - } else {
> - /* we don't know the level of the root node
> -  * until we actually have it read locked
> -  */
> - b = btrfs_read_lock_root_node(root);
> - level = btrfs_header_level(b);
> - if (level <= write_lock_level) {
> - /* whoops, must trade for write lock */
> - btrfs_tree_read_unlock(b);
> -   

RE: [PATCH v6 2/3] btrfs: Add unprivileged ioctl which returns subvolume's ROOT_REF

2018-05-17 Thread Gu, Jinxiang


> -Original Message-
> From: linux-btrfs-ow...@vger.kernel.org 
> [mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of Tomohiro Misono
> Sent: Friday, May 18, 2018 10:55 AM
> To: linux-btrfs@vger.kernel.org
> Subject: [PATCH v6 2/3] btrfs: Add unprivileged ioctl which returns 
> subvolume's ROOT_REF
> 
> Add unprivileged ioctl BTRFS_IOC_GET_SUBVOL_ROOTREF which returns ROOT_REF 
> information of the subvolume containing this inode
> except the subvolume name (this is because to prevent potential name leak). 
> The subvolume name will be gained by user version of
> ino_lookup ioctl (BTRFS_IOC_INO_LOOKUP_USER) which also performs permission 
> check.
> 
> The min id of root ref's subvolume to be searched is specified by @min_id in 
> struct btrfs_ioctl_get_subvol_rootref_args. After the search
> ends, @min_id is set to the last searched root ref's subvolid + 1. Also, if 
> there are more root refs than
> BTRFS_MAX_ROOTREF_BUFFER_NUM, -EOVERFLOW is returned. Therefore the caller 
> can just call this ioctl again without changing the
> argument to continue search.
> 
> Reviewed-by: Qu Wenruo 
> Signed-off-by: Tomohiro Misono 
> ---
>  v4 -> v5
> - Update error handling of btrfs_next_leaf() to cover all cases
> - Use btrfs_next_item() to reduce the call of btrfs_search_slot()
> 
>  fs/btrfs/ioctl.c   | 102 
> +
>  include/uapi/linux/btrfs.h |  16 +++
>  2 files changed, 118 insertions(+)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 
> 31af6e91c614..463ddedd90da 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -2365,6 +2365,106 @@ static noinline int 
> btrfs_ioctl_get_subvol_info(struct file *file,
>   return ret;
>  }
> 
> +/*
> + * Return ROOT_REF information of the subvolume containing this inode
> + * except the subvolume name.
> + */
> +static noinline int btrfs_ioctl_get_subvol_rootref(struct file *file,
> +void __user *argp)
> +{
> + struct btrfs_ioctl_get_subvol_rootref_args *rootrefs;
> + struct btrfs_root_ref *rref;
> + struct btrfs_root *root;
> + struct btrfs_path *path;
> + struct btrfs_key key;
> +
> + struct extent_buffer *l;
> + int slot;
> +
> + struct inode *inode;
> + int ret;
> + u64 objectid;
> + u8 found;
> +
> + path = btrfs_alloc_path();
> + if (!path)
> + return -ENOMEM;
> +
> + rootrefs = memdup_user(argp, sizeof(*rootrefs));
> + if (!rootrefs) {
> + btrfs_free_path(path);
> + return -ENOMEM;
> + }
> +
> + inode = file_inode(file);
> + root = BTRFS_I(inode)->root->fs_info->tree_root;
> + objectid = BTRFS_I(inode)->root->root_key.objectid;
> +
> + key.objectid = objectid;
> + key.type = BTRFS_ROOT_REF_KEY;
> + key.offset = rootrefs->min_id;
> + found = 0;
> +
> + ret = btrfs_search_slot(NULL, root, , path, 0, 0);
> + if (ret < 0) {
> + goto out;
> + } else if (path->slots[0] >=
> + btrfs_header_nritems(path->nodes[0])) {
> + ret = btrfs_next_leaf(root, path);
> + if (ret < 0) {
> + goto out;
> + } else if (ret > 0) {
> + ret = -EUCLEAN;
> + goto out;
> + }
> + }
> + while (1) {
> + l = path->nodes[0];
> + slot = path->slots[0];
> +
> + btrfs_item_key_to_cpu(l, , slot);
> + if (key.objectid != objectid ||
> + key.type != BTRFS_ROOT_REF_KEY) {
> + ret = 0;
> + goto out;
> + }
> +
> + if (found == BTRFS_MAX_ROOTREF_BUFFER_NUM) {
> + ret = -EOVERFLOW;
> + goto out;
> + }
> +
> + rref = btrfs_item_ptr(l, slot, struct btrfs_root_ref);
> + rootrefs->rootref[found].subvolid = key.offset;
> + rootrefs->rootref[found].dirid =
> +   btrfs_root_ref_dirid(l, rref);
> + found++;
> +
> + ret = btrfs_next_item(root, path);
> + if (ret < 0) {
> + goto out;
> + } else if (ret > 0) {
> + ret = -EUCLEAN;
> + goto out;
> + }
> + }
> +
> +out:
> + if (!ret || ret == -EOVERFLOW) {
> + rootrefs->num_items = found;
> + /* update min_id for next search */
> + if (found)
> + rootrefs->min_id =
> + rootrefs->rootref[found - 1].subvolid + 1;
> + if (copy_to_user(argp, rootrefs, sizeof(*rootrefs)))
> + ret = -EFAULT;
> + }
> +
> + btrfs_free_path(path);
> + kfree(rootrefs);
> + return ret;
> +}
> +
>  static noinline int 

Re: [PATCH v2 1/6] Btrfs: remove superfluous free_extent_buffer

2018-05-17 Thread Qu Wenruo


On 2018年05月18日 11:00, Liu Bo wrote:
> read_block_for_search() can be simplified as,
> 
> tmp = find_extent_buffer();
> if (tmp)
>return;
> 
> free_extent_buffer();
> read_tree_block();
> 
> Apparently, @tmp must be NULL at this point, free_extent_buffer() is not
> needed.>
> Signed-off-by: Liu Bo 

Reviewed-by: Qu wenruo 

Thanks,
Qu
> ---
>  fs/btrfs/ctree.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index b3f6f300e492..9fa3d77c98d4 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -2432,7 +2432,6 @@ noinline void btrfs_unlock_up_safe(struct btrfs_path 
> *path, int level)
>   btrfs_unlock_up_safe(p, level + 1);
>   btrfs_set_path_blocking(p);
>  
> - free_extent_buffer(tmp);
>   if (p->reada != READA_NONE)
>   reada_for_search(fs_info, p, level, slot, key->objectid);
>  
> 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v2] Btrfs: add parent_transid parameter to veirfy_level_key

2018-05-17 Thread Qu Wenruo


On 2018年05月18日 13:02, Nikolay Borisov wrote:
> 
> 
> On 18.05.2018 05:59, Liu Bo wrote:
>> As verify_level_key() is checked after verify_parent_transid(), i.e.
>>
>> if (verify_parent_transid())
>>ret = -EIO;
>> else if (verify_level_key())
>>ret = -EUCLEAN;
>>
>> if parent_transid is 0, verify_parent_transid() skips verifying
>> parent_transid and considers eb as valid, and if verify_level_key()
>> reports something wrong, we're not going to know if it's caused by
>> corrupted metadata or non-checkecd eb (e.g. stale eb).
> 
> It's really unclear (from the changelog) how the stale eb ties with
> parent_transid. You should have explained the relationship between
> checking parenttransid and stale/non-stale ebs

It's in fact related to the fixing patch:
Btrfs: fix the corruption by reading stale btree blocks

But whatever, extra mention won't hurt.

> 
>>
>> @parent_transid is able to tell whether the eb's generation has been
>> verified by the caller.
> 
> It's really unclear why parent_transid is able to tell whether the
> generation is verified by the caller.

At least this looks pretty straightforward to me.
@parent_transid = 0 means no generation check, and @parent_transid != 0
means already checked.

So at least for this part I didn't get any confusion.

Thanks,
Qu

> 
>>
>> Signed-off-by: Liu Bo 
>> ---
>> v2: - more explicit commit log.
>> - adjust the position shown in error msg.
>>
>>  fs/btrfs/disk-io.c | 13 +++--
>>  1 file changed, 7 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 60caa68c3618..ad865176a3ca 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -416,7 +416,7 @@ static int btrfs_check_super_csum(struct btrfs_fs_info 
>> *fs_info,
>>  
>>  static int verify_level_key(struct btrfs_fs_info *fs_info,
>>  struct extent_buffer *eb, int level,
>> -struct btrfs_key *first_key)
>> +struct btrfs_key *first_key, u64 parent_transid)
>>  {
>>  int found_level;
>>  struct btrfs_key found_key;
>> @@ -454,10 +454,11 @@ static int verify_level_key(struct btrfs_fs_info 
>> *fs_info,
>>  if (ret) {
>>  WARN_ON(1);
>>  btrfs_err(fs_info,
>> -"tree first key mismatch detected, bytenr=%llu key expected=(%llu, %u, 
>> %llu) has=(%llu, %u, %llu)",
>> -  eb->start, first_key->objectid, first_key->type,
>> -  first_key->offset, found_key.objectid,
>> -  found_key.type, found_key.offset);
>> +"tree first key mismatch detected, bytenr=%llu parent_transid=%llu key 
>> expected=(%llu, %u, %llu) has=(%llu, %u, %llu)",
>> +  eb->start, parent_transid, first_key->objectid,
>> +  first_key->type, first_key->offset,
>> +  found_key.objectid, found_key.type,
>> +  found_key.offset);
>>  }
>>  #endif
>>  return ret;
>> @@ -493,7 +494,7 @@ static int btree_read_extent_buffer_pages(struct 
>> btrfs_fs_info *fs_info,
>> parent_transid, 0))
>>  ret = -EIO;
>>  else if (verify_level_key(fs_info, eb, level,
>> -  first_key))
>> +  first_key, parent_transid))
>>  ret = -EUCLEAN;
>>  else
>>  break;
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v6 1/3] btrfs: Add unprivileged ioctl which returns subvolume information

2018-05-17 Thread Gu, Jinxiang


> -Original Message-
> From: linux-btrfs-ow...@vger.kernel.org 
> [mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of Tomohiro Misono
> Sent: Friday, May 18, 2018 10:55 AM
> To: linux-btrfs@vger.kernel.org
> Subject: [PATCH v6 1/3] btrfs: Add unprivileged ioctl which returns subvolume 
> information
> 
> Add new unprivileged ioctl BTRFS_IOC_GET_SUBVOL_INFO which returns the 
> information of subvolume containing this inode.
> (i.e. returns the information in ROOT_ITEM and ROOT_BACKREF.)
> 
> Signed-off-by: Tomohiro Misono 
> ---
>  v5 -> v6
> - Use brfs_read_fs_root_no_name() to get subvolume root and simplify
>   code
>  v4 -> v5
> - Update error handling of btrfs_next_leaf() to cover all cases
> - Return error if ROOT_BACKREF is not found (except top-level)
> 
>  fs/btrfs/ioctl.c   | 125 
> +
>  include/uapi/linux/btrfs.h |  51 ++
>  2 files changed, 176 insertions(+)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 
> 48e2ddff32bd..31af6e91c614 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -2242,6 +2242,129 @@ static noinline int btrfs_ioctl_ino_lookup(struct 
> file *file,
>   return ret;
>  }
> 
> +/* Get the subvolume information in BTRFS_ROOT_ITEM and
> +BTRFS_ROOT_BACKREF */ static noinline int btrfs_ioctl_get_subvol_info(struct 
> file *file,
> +void __user *argp)
> +{
> + struct btrfs_ioctl_get_subvol_info_args *subvol_info;
> + struct btrfs_fs_info *fs_info;
> + struct btrfs_root *root;
> + struct btrfs_path *path;
> + struct btrfs_key key;
> +
> + struct btrfs_root_item *root_item;
> + struct btrfs_root_ref *rref;
> + struct extent_buffer *l;
> + int slot;
> +
> + unsigned long item_off;
> + unsigned long item_len;
> +
> + struct inode *inode;
> + int ret = 0;
> +
> + path = btrfs_alloc_path();
> + if (!path)
> + return -ENOMEM;
> +
> + subvol_info = kzalloc(sizeof(*subvol_info), GFP_KERNEL);
> + if (!subvol_info) {
> + btrfs_free_path(path);
> + return -ENOMEM;
> + }
> +
> + inode = file_inode(file);
> + fs_info = BTRFS_I(inode)->root->fs_info;
> +
> + /* Get root_item of inode's subvolume */
> + key.objectid = BTRFS_I(inode)->root->root_key.objectid;
> + key.type = BTRFS_ROOT_ITEM_KEY;
> + key.offset = (u64)-1;
> + root = btrfs_read_fs_root_no_name(fs_info, );
> + if (IS_ERR(root)) {
> + ret = PTR_ERR(root);
> + goto out;
> + }
> + root_item = >root_item;
> +
> + subvol_info->id = key.objectid;
> +
> + subvol_info->generation = btrfs_root_generation(root_item);
> + subvol_info->flags = btrfs_root_flags(root_item);
> +
> + memcpy(subvol_info->uuid, root_item->uuid, BTRFS_UUID_SIZE);
> + memcpy(subvol_info->parent_uuid, root_item->parent_uuid,
> + BTRFS_UUID_SIZE);
> + memcpy(subvol_info->received_uuid, root_item->received_uuid,
> + BTRFS_UUID_SIZE);
> +
> + subvol_info->ctransid = btrfs_root_ctransid(root_item);
> + subvol_info->ctime.sec = btrfs_stack_timespec_sec(_item->otime);
> + subvol_info->ctime.nsec =
> +btrfs_stack_timespec_nsec(_item->ctime);
> +
> + subvol_info->otransid = btrfs_root_otransid(root_item);
> + subvol_info->otime.sec = btrfs_stack_timespec_sec(_item->otime);
> + subvol_info->otime.nsec =
> +btrfs_stack_timespec_nsec(_item->otime);
> +
> + subvol_info->stransid = btrfs_root_stransid(root_item);
> + subvol_info->stime.sec = btrfs_stack_timespec_sec(_item->stime);
> + subvol_info->stime.nsec =
> +btrfs_stack_timespec_nsec(_item->stime);
> +
> + subvol_info->rtransid = btrfs_root_rtransid(root_item);
> + subvol_info->rtime.sec = btrfs_stack_timespec_sec(_item->rtime);
> + subvol_info->rtime.nsec =
> +btrfs_stack_timespec_nsec(_item->rtime);
> +
> + if (key.objectid != BTRFS_FS_TREE_OBJECTID) {
> + /* Search root tree for ROOT_BACKREF of this subvolume */
> + root = fs_info->tree_root;
> +
> + key.type = BTRFS_ROOT_BACKREF_KEY;
> + key.offset = 0;
> + ret = btrfs_search_slot(NULL, root, , path, 0, 0);
> + if (ret < 0) {
> + goto out;
> + } else if (path->slots[0] >=
> + btrfs_header_nritems(path->nodes[0])) {
> + ret = btrfs_next_leaf(root, path);
> + if (ret < 0) {
> + goto out;
> + } else if (ret > 0) {
> + ret = -EUCLEAN;
> + goto out;
> + }
> + }
> +
> + l = path->nodes[0];
> + slot = path->slots[0];
> +  

Re: [PATCH v2] Btrfs: add parent_transid parameter to veirfy_level_key

2018-05-17 Thread Qu Wenruo


On 2018年05月18日 10:59, Liu Bo wrote:
> As verify_level_key() is checked after verify_parent_transid(), i.e.
> 
> if (verify_parent_transid())
>ret = -EIO;
> else if (verify_level_key())
>ret = -EUCLEAN;
> 
> if parent_transid is 0, verify_parent_transid() skips verifying
> parent_transid and considers eb as valid, and if verify_level_key()
> reports something wrong, we're not going to know if it's caused by
> corrupted metadata or non-checkecd eb (e.g. stale eb).
> 
> @parent_transid is able to tell whether the eb's generation has been
> verified by the caller.
> 
> Signed-off-by: Liu Bo 

Reviewed-by: Qu Wenruo 

Thanks,
Qu

> ---
> v2: - more explicit commit log.
> - adjust the position shown in error msg.
> 
>  fs/btrfs/disk-io.c | 13 +++--
>  1 file changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 60caa68c3618..ad865176a3ca 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -416,7 +416,7 @@ static int btrfs_check_super_csum(struct btrfs_fs_info 
> *fs_info,
>  
>  static int verify_level_key(struct btrfs_fs_info *fs_info,
>   struct extent_buffer *eb, int level,
> - struct btrfs_key *first_key)
> + struct btrfs_key *first_key, u64 parent_transid)
>  {
>   int found_level;
>   struct btrfs_key found_key;
> @@ -454,10 +454,11 @@ static int verify_level_key(struct btrfs_fs_info 
> *fs_info,
>   if (ret) {
>   WARN_ON(1);
>   btrfs_err(fs_info,
> -"tree first key mismatch detected, bytenr=%llu key expected=(%llu, %u, %llu) 
> has=(%llu, %u, %llu)",
> -   eb->start, first_key->objectid, first_key->type,
> -   first_key->offset, found_key.objectid,
> -   found_key.type, found_key.offset);
> +"tree first key mismatch detected, bytenr=%llu parent_transid=%llu key 
> expected=(%llu, %u, %llu) has=(%llu, %u, %llu)",
> +   eb->start, parent_transid, first_key->objectid,
> +   first_key->type, first_key->offset,
> +   found_key.objectid, found_key.type,
> +   found_key.offset);
>   }
>  #endif
>   return ret;
> @@ -493,7 +494,7 @@ static int btree_read_extent_buffer_pages(struct 
> btrfs_fs_info *fs_info,
>  parent_transid, 0))
>   ret = -EIO;
>   else if (verify_level_key(fs_info, eb, level,
> -   first_key))
> +   first_key, parent_transid))
>   ret = -EUCLEAN;
>   else
>   break;
> 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v2] Btrfs: add parent_transid parameter to veirfy_level_key

2018-05-17 Thread Nikolay Borisov


On 18.05.2018 05:59, Liu Bo wrote:
> As verify_level_key() is checked after verify_parent_transid(), i.e.
> 
> if (verify_parent_transid())
>ret = -EIO;
> else if (verify_level_key())
>ret = -EUCLEAN;
> 
> if parent_transid is 0, verify_parent_transid() skips verifying
> parent_transid and considers eb as valid, and if verify_level_key()
> reports something wrong, we're not going to know if it's caused by
> corrupted metadata or non-checkecd eb (e.g. stale eb).

It's really unclear (from the changelog) how the stale eb ties with
parent_transid. You should have explained the relationship between
checking parenttransid and stale/non-stale ebs

> 
> @parent_transid is able to tell whether the eb's generation has been
> verified by the caller.

It's really unclear why parent_transid is able to tell whether the
generation is verified by the caller.

> 
> Signed-off-by: Liu Bo 
> ---
> v2: - more explicit commit log.
> - adjust the position shown in error msg.
> 
>  fs/btrfs/disk-io.c | 13 +++--
>  1 file changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 60caa68c3618..ad865176a3ca 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -416,7 +416,7 @@ static int btrfs_check_super_csum(struct btrfs_fs_info 
> *fs_info,
>  
>  static int verify_level_key(struct btrfs_fs_info *fs_info,
>   struct extent_buffer *eb, int level,
> - struct btrfs_key *first_key)
> + struct btrfs_key *first_key, u64 parent_transid)
>  {
>   int found_level;
>   struct btrfs_key found_key;
> @@ -454,10 +454,11 @@ static int verify_level_key(struct btrfs_fs_info 
> *fs_info,
>   if (ret) {
>   WARN_ON(1);
>   btrfs_err(fs_info,
> -"tree first key mismatch detected, bytenr=%llu key expected=(%llu, %u, %llu) 
> has=(%llu, %u, %llu)",
> -   eb->start, first_key->objectid, first_key->type,
> -   first_key->offset, found_key.objectid,
> -   found_key.type, found_key.offset);
> +"tree first key mismatch detected, bytenr=%llu parent_transid=%llu key 
> expected=(%llu, %u, %llu) has=(%llu, %u, %llu)",
> +   eb->start, parent_transid, first_key->objectid,
> +   first_key->type, first_key->offset,
> +   found_key.objectid, found_key.type,
> +   found_key.offset);
>   }
>  #endif
>   return ret;
> @@ -493,7 +494,7 @@ static int btree_read_extent_buffer_pages(struct 
> btrfs_fs_info *fs_info,
>  parent_transid, 0))
>   ret = -EIO;
>   else if (verify_level_key(fs_info, eb, level,
> -   first_key))
> +   first_key, parent_transid))
>   ret = -EUCLEAN;
>   else
>   break;
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 05/18] btrfs-progs: qgroups: add pathname to show output

2018-05-17 Thread Misono Tomohiro
On 2018/05/17 6:38, je...@suse.com wrote:
> From: Jeff Mahoney 
> 
> The btrfs qgroup show command currently only exports qgroup IDs,
> forcing the user to resolve which subvolume each corresponds to.
> 
> This patch adds pathname resolution to qgroup show so that when
> the -P option is used, the last column contains the pathname of
> the root of the subvolume it describes.  In the case of nested
> qgroups, it will show the number of member qgroups or the paths
> of the members if the -v option is used.
> 
> Pathname can also be used as a sort parameter.
> 
> Reviewed-by: Qu Wenruo 
> Signed-off-by: Jeff Mahoney 
> ---
>  Documentation/btrfs-qgroup.asciidoc |   4 +
>  cmds-qgroup.c   |  18 -
>  kerncompat.h|   1 +
>  qgroup.c| 149 
> 
>  qgroup.h|   4 +-
>  5 files changed, 157 insertions(+), 19 deletions(-)
> 
> diff --git a/Documentation/btrfs-qgroup.asciidoc 
> b/Documentation/btrfs-qgroup.asciidoc
> index 3108457c..360b3269 100644
> --- a/Documentation/btrfs-qgroup.asciidoc
> +++ b/Documentation/btrfs-qgroup.asciidoc
> @@ -97,10 +97,14 @@ print child qgroup id.
>  print limit of referenced size of qgroup.
>  -e
>  print limit of exclusive size of qgroup.
> +-P
> +print pathname to the root of the subvolume managed by qgroup.  For nested 
> qgroups, the number of members will be printed unless -v is specified.
>  -F
>  list all qgroups which impact the given path(include ancestral qgroups)
>  -f
>  list all qgroups which impact the given path(exclude ancestral qgroups)
> +-v
> +Be more verbose.  Print pathnames of member qgroups when nested.
>  --raw
>  raw numbers in bytes, without the 'B' suffix.
>  --human-readable
> diff --git a/cmds-qgroup.c b/cmds-qgroup.c
> index 93206900..33053725 100644
> --- a/cmds-qgroup.c
> +++ b/cmds-qgroup.c
> @@ -282,8 +282,11 @@ static const char * const cmd_qgroup_show_usage[] = {
>   "   (including ancestral qgroups)",
>   "-f list all qgroups which impact the given path",
>   "   (excluding ancestral qgroups)",
> + "-P print first-level qgroups using pathname",
> + "   - nested qgroups will be reported as a count",
> + "-v verbose, prints pathnames for all nested qgroups",
>   HELPINFO_UNITS_LONG,
> - "--sort=qgroupid,rfer,excl,max_rfer,max_excl",
> + "--sort=qgroupid,rfer,excl,max_rfer,max_excl,pathname",
>   "   list qgroups sorted by specified items",
>   "   you can use '+' or '-' in front of each item.",
>   "   (+:ascending, -:descending, ascending default)",
> @@ -302,6 +305,7 @@ static int cmd_qgroup_show(int argc, char **argv)
>   unsigned unit_mode;
>   int sync = 0;
>   enum btrfs_util_error err;
> + bool verbose = false;
>  
>   struct btrfs_qgroup_comparer_set *comparer_set;
>   struct btrfs_qgroup_filter_set *filter_set;
> @@ -319,10 +323,11 @@ static int cmd_qgroup_show(int argc, char **argv)
>   static const struct option long_options[] = {
>   {"sort", required_argument, NULL, GETOPT_VAL_SORT},
>   {"sync", no_argument, NULL, GETOPT_VAL_SYNC},
> + {"verbose", no_argument, NULL, 'v'},
>   { NULL, 0, NULL, 0 }
>   };
>  
> - c = getopt_long(argc, argv, "pcreFf", long_options, NULL);
> + c = getopt_long(argc, argv, "pPcreFfv", long_options, NULL);
>   if (c < 0)
>   break;
>   switch (c) {
> @@ -330,6 +335,10 @@ static int cmd_qgroup_show(int argc, char **argv)
>   btrfs_qgroup_setup_print_column(
>   BTRFS_QGROUP_PARENT);
>   break;
> + case 'P':
> + btrfs_qgroup_setup_print_column(
> + BTRFS_QGROUP_PATHNAME);
> + break;
>   case 'c':
>   btrfs_qgroup_setup_print_column(
>   BTRFS_QGROUP_CHILD);
> @@ -357,6 +366,9 @@ static int cmd_qgroup_show(int argc, char **argv)
>   case GETOPT_VAL_SYNC:
>   sync = 1;
>   break;
> + case 'v':
> + verbose = true;
> + break;
>   default:
>   usage(cmd_qgroup_show_usage);
>   }
> @@ -398,7 +410,7 @@ static int cmd_qgroup_show(int argc, char **argv)
>   BTRFS_QGROUP_FILTER_PARENT,
>   qgroupid);
>   }
> - ret = btrfs_show_qgroups(fd, filter_set, comparer_set);
> + ret = btrfs_show_qgroups(fd, filter_set, comparer_set, verbose);
>  

Re: [PATCH V4] test online label ioctl

2018-05-17 Thread Dave Chinner
On Thu, May 17, 2018 at 10:28:26AM -0500, Eric Sandeen wrote:
> This tests the online label ioctl that btrfs has, which has been
> recently proposed for XFS.
> 
> To run, it requires an updated xfs_io with the label command and a
> filesystem that supports it
> 
> A slight change here to _require_xfs_io_command as well, so that tests
> which simply fail with "Inappropriate ioctl" can be caught in the
> common case.
> 
> Signed-off-by: Eric Sandeen 
> ---
> 
> Now with new and improved sequential V4 versioning!

Looks good now.

Reviewed-by: Dave Chinner 
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 4/6] Btrfs: remove unused check of skip_locking

2018-05-17 Thread Liu Bo
The check is superfluous since all of callers who set search_for_commit
also have skip_locking set.

Signed-off-by: Liu Bo 
---
 fs/btrfs/ctree.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index d12fc0474e21..8d3b09038f37 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -2623,8 +2623,6 @@ static struct extent_buffer 
*btrfs_search_slot_get_root(struct btrfs_root *root,
level = btrfs_header_level(b);
if (p->need_commit_sem)
up_read(_info->commit_root_sem);
-   if (!p->skip_locking)
-   btrfs_tree_read_lock(b);
 
goto out;
}
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/6] Btrfs: use more straightforward extent_buffer_uptodate

2018-05-17 Thread Liu Bo
If parent_transid "0" is passed to btrfs_buffer_uptodate(),
btrfs_buffer_uptodate() is equivalent to extent_buffer_uptodate(), but
extent_buffer_uptodate() is preferred since we don't have to look into
verify_parent_transid().

Signed-off-by: Liu Bo 
---
 fs/btrfs/ctree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 9fa3d77c98d4..a96d308c51b8 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -2445,7 +2445,7 @@ noinline void btrfs_unlock_up_safe(struct btrfs_path 
*path, int level)
 * and give up so that our caller doesn't loop forever
 * on our EAGAINs.
 */
-   if (!btrfs_buffer_uptodate(tmp, 0, 0))
+   if (!extent_buffer_uptodate(tmp))
ret = -EIO;
free_extent_buffer(tmp);
} else {
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/6] Btrfs: remove superfluous free_extent_buffer

2018-05-17 Thread Liu Bo
read_block_for_search() can be simplified as,

tmp = find_extent_buffer();
if (tmp)
   return;

free_extent_buffer();
read_tree_block();

Apparently, @tmp must be NULL at this point, free_extent_buffer() is not
needed.

Signed-off-by: Liu Bo 
---
 fs/btrfs/ctree.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index b3f6f300e492..9fa3d77c98d4 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -2432,7 +2432,6 @@ noinline void btrfs_unlock_up_safe(struct btrfs_path 
*path, int level)
btrfs_unlock_up_safe(p, level + 1);
btrfs_set_path_blocking(p);
 
-   free_extent_buffer(tmp);
if (p->reada != READA_NONE)
reada_for_search(fs_info, p, level, slot, key->objectid);
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 3/6] Btrfs: move get root of btrfs_search_slot to a helper

2018-05-17 Thread Liu Bo
It's good to have a helper instead of having all get-root details
open-coded.

The new helper locks (if necessary) and sets root node of the path.

Also invert the checks to make the code flow easier to read.

There is no functional change in this commit.

Signed-off-by: Liu Bo 
---
 fs/btrfs/ctree.c | 115 +--
 1 file changed, 70 insertions(+), 45 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index a96d308c51b8..d12fc0474e21 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -2598,6 +2598,75 @@ int btrfs_find_item(struct btrfs_root *fs_root, struct 
btrfs_path *path,
return 0;
 }
 
+static struct extent_buffer *btrfs_search_slot_get_root(struct btrfs_root 
*root,
+   struct btrfs_path *p,
+   int write_lock_level)
+{
+   struct btrfs_fs_info *fs_info = root->fs_info;
+   struct extent_buffer *b;
+   int root_lock;
+   int level = 0;
+
+   /*
+* we try very hard to do read locks on the root
+*/
+   root_lock = BTRFS_READ_LOCK;
+
+   if (p->search_commit_root) {
+   /*
+* the commit roots are read only so we always do read locks
+*/
+   if (p->need_commit_sem)
+   down_read(_info->commit_root_sem);
+   b = root->commit_root;
+   extent_buffer_get(b);
+   level = btrfs_header_level(b);
+   if (p->need_commit_sem)
+   up_read(_info->commit_root_sem);
+   if (!p->skip_locking)
+   btrfs_tree_read_lock(b);
+
+   goto out;
+   }
+
+   if (p->skip_locking) {
+   b = btrfs_root_node(root);
+   level = btrfs_header_level(b);
+   goto out;
+   }
+
+   /*
+* we don't know the level of the root node until we actually
+* have it read locked
+*/
+   b = btrfs_read_lock_root_node(root);
+   level = btrfs_header_level(b);
+   if (level > write_lock_level)
+   goto out;
+
+   /*
+* whoops, must trade for write lock
+*/
+   btrfs_tree_read_unlock(b);
+   free_extent_buffer(b);
+   b = btrfs_lock_root_node(root);
+   root_lock = BTRFS_WRITE_LOCK;
+   /*
+* the level might have changed, check again
+*/
+   level = btrfs_header_level(b);
+
+out:
+   p->nodes[level] = b;
+   if (!p->skip_locking)
+   p->locks[level] = root_lock;
+   /*
+* Callers are responsible for drop b's refs.
+*/
+   return b;
+}
+
+
 /*
  * btrfs_search_slot - look for a key in a tree and perform necessary
  * modifications to preserve tree invariants.
@@ -2634,7 +2703,6 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, 
struct btrfs_root *root,
int err;
int level;
int lowest_unlock = 1;
-   int root_lock;
/* everything at write_lock_level or lower must be write locked */
int write_lock_level = 0;
u8 lowest_level = 0;
@@ -2672,50 +2740,7 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, 
struct btrfs_root *root,
 
 again:
prev_cmp = -1;
-   /*
-* we try very hard to do read locks on the root
-*/
-   root_lock = BTRFS_READ_LOCK;
-   level = 0;
-   if (p->search_commit_root) {
-   /*
-* the commit roots are read only
-* so we always do read locks
-*/
-   if (p->need_commit_sem)
-   down_read(_info->commit_root_sem);
-   b = root->commit_root;
-   extent_buffer_get(b);
-   level = btrfs_header_level(b);
-   if (p->need_commit_sem)
-   up_read(_info->commit_root_sem);
-   if (!p->skip_locking)
-   btrfs_tree_read_lock(b);
-   } else {
-   if (p->skip_locking) {
-   b = btrfs_root_node(root);
-   level = btrfs_header_level(b);
-   } else {
-   /* we don't know the level of the root node
-* until we actually have it read locked
-*/
-   b = btrfs_read_lock_root_node(root);
-   level = btrfs_header_level(b);
-   if (level <= write_lock_level) {
-   /* whoops, must trade for write lock */
-   btrfs_tree_read_unlock(b);
-   free_extent_buffer(b);
-   b = btrfs_lock_root_node(root);
-   root_lock = BTRFS_WRITE_LOCK;
-
-   /* the level might have changed, check again */
-

[PATCH v2 5/6] Btrfs: grab write lock directly if write_lock_level is the max level

2018-05-17 Thread Liu Bo
Typically, when acquiring root node's lock, btrfs tries its best to get
read lock and trade for write lock if @write_lock_level implies to do so.

In case of (cow && (p->keep_locks || p->lowest_level)), write_lock_level
is set to BTRFS_MAX_LEVEL, which means we need to acquire root node's
write lock directly.

In this particular case, the dance of acquiring read lock and then trading
for write lock can be saved.

Signed-off-by: Liu Bo 
---
 fs/btrfs/ctree.c | 29 -
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 8d3b09038f37..e619f7e01794 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -2633,20 +2633,23 @@ static struct extent_buffer 
*btrfs_search_slot_get_root(struct btrfs_root *root,
goto out;
}
 
-   /*
-* we don't know the level of the root node until we actually
-* have it read locked
-*/
-   b = btrfs_read_lock_root_node(root);
-   level = btrfs_header_level(b);
-   if (level > write_lock_level)
-   goto out;
+   if (write_lock_level < BTRFS_MAX_LEVEL) {
+   /*
+* we don't know the level of the root node until we actually
+* have it read locked
+*/
+   b = btrfs_read_lock_root_node(root);
+   level = btrfs_header_level(b);
+   if (level > write_lock_level)
+   goto out;
+
+   /*
+* whoops, must trade for write lock
+*/
+   btrfs_tree_read_unlock(b);
+   free_extent_buffer(b);
+   }
 
-   /*
-* whoops, must trade for write lock
-*/
-   btrfs_tree_read_unlock(b);
-   free_extent_buffer(b);
b = btrfs_lock_root_node(root);
root_lock = BTRFS_WRITE_LOCK;
/*
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 6/6] Btrfs: remove always true check in unlock_up

2018-05-17 Thread Liu Bo
As unlock_up() is written as

for () {
   if (!path->locks[i])
   break;
   ...
   if (... && path->locks[i]) {
   }
}

Apparently, @path->locks[i] is always true at this 'if'.

Signed-off-by: Liu Bo 
---
 fs/btrfs/ctree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index e619f7e01794..65dbebb8cc59 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -2330,7 +2330,7 @@ static noinline void unlock_up(struct btrfs_path *path, 
int level,
no_skips = 1;
 
t = path->nodes[i];
-   if (i >= lowest_unlock && i > skip_level && path->locks[i]) {
+   if (i >= lowest_unlock && i > skip_level) {
btrfs_tree_unlock_rw(t, path->locks[i]);
path->locks[i] = 0;
if (write_lock_level &&
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 0/6] btrfs_search_slot cleanups

2018-05-17 Thread Liu Bo
Here are a collection of patches I did for btrfs_search_slot().

v2: more explicit commit log for each patch.

Liu Bo (6):
  Btrfs: remove superfluous free_extent_buffer
  Btrfs: use more straightforward extent_buffer_uptodate
  Btrfs: move get root of btrfs_search_slot to a helper
  Btrfs: remove unused check of skip_locking
  Btrfs: grab write lock directly if write_lock_level is the max level
  Btrfs: remove always true check in unlock_up

 fs/btrfs/ctree.c | 121 +--
 1 file changed, 73 insertions(+), 48 deletions(-)

-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] Btrfs: add parent_transid parameter to veirfy_level_key

2018-05-17 Thread Liu Bo
As verify_level_key() is checked after verify_parent_transid(), i.e.

if (verify_parent_transid())
   ret = -EIO;
else if (verify_level_key())
   ret = -EUCLEAN;

if parent_transid is 0, verify_parent_transid() skips verifying
parent_transid and considers eb as valid, and if verify_level_key()
reports something wrong, we're not going to know if it's caused by
corrupted metadata or non-checkecd eb (e.g. stale eb).

@parent_transid is able to tell whether the eb's generation has been
verified by the caller.

Signed-off-by: Liu Bo 
---
v2: - more explicit commit log.
- adjust the position shown in error msg.

 fs/btrfs/disk-io.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 60caa68c3618..ad865176a3ca 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -416,7 +416,7 @@ static int btrfs_check_super_csum(struct btrfs_fs_info 
*fs_info,
 
 static int verify_level_key(struct btrfs_fs_info *fs_info,
struct extent_buffer *eb, int level,
-   struct btrfs_key *first_key)
+   struct btrfs_key *first_key, u64 parent_transid)
 {
int found_level;
struct btrfs_key found_key;
@@ -454,10 +454,11 @@ static int verify_level_key(struct btrfs_fs_info *fs_info,
if (ret) {
WARN_ON(1);
btrfs_err(fs_info,
-"tree first key mismatch detected, bytenr=%llu key expected=(%llu, %u, %llu) 
has=(%llu, %u, %llu)",
- eb->start, first_key->objectid, first_key->type,
- first_key->offset, found_key.objectid,
- found_key.type, found_key.offset);
+"tree first key mismatch detected, bytenr=%llu parent_transid=%llu key 
expected=(%llu, %u, %llu) has=(%llu, %u, %llu)",
+ eb->start, parent_transid, first_key->objectid,
+ first_key->type, first_key->offset,
+ found_key.objectid, found_key.type,
+ found_key.offset);
}
 #endif
return ret;
@@ -493,7 +494,7 @@ static int btree_read_extent_buffer_pages(struct 
btrfs_fs_info *fs_info,
   parent_transid, 0))
ret = -EIO;
else if (verify_level_key(fs_info, eb, level,
- first_key))
+ first_key, parent_transid))
ret = -EUCLEAN;
else
break;
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 1/3] btrfs: Add unprivileged ioctl which returns subvolume information

2018-05-17 Thread Misono Tomohiro
On 2018/05/18 11:54, Tomohiro Misono wrote:
> Add new unprivileged ioctl BTRFS_IOC_GET_SUBVOL_INFO which returns
> the information of subvolume containing this inode.
> (i.e. returns the information in ROOT_ITEM and ROOT_BACKREF.)
> 
> Signed-off-by: Tomohiro Misono 
> ---
>  v5 -> v6
> - Use brfs_read_fs_root_no_name() to get subvolume root and simplify
>   code
>  v4 -> v5
> - Update error handling of btrfs_next_leaf() to cover all cases
> - Return error if ROOT_BACKREF is not found (except top-level)
> 
>  fs/btrfs/ioctl.c   | 125 
> +
>  include/uapi/linux/btrfs.h |  51 ++
>  2 files changed, 176 insertions(+)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 48e2ddff32bd..31af6e91c614 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -2242,6 +2242,129 @@ static noinline int btrfs_ioctl_ino_lookup(struct 
> file *file,
>   return ret;
>  }
>  
> +/* Get the subvolume information in BTRFS_ROOT_ITEM and BTRFS_ROOT_BACKREF */
> +static noinline int btrfs_ioctl_get_subvol_info(struct file *file,
> +void __user *argp)
> +{
> + struct btrfs_ioctl_get_subvol_info_args *subvol_info;
> + struct btrfs_fs_info *fs_info;
> + struct btrfs_root *root;
> + struct btrfs_path *path;
> + struct btrfs_key key;
> +
> + struct btrfs_root_item *root_item;
> + struct btrfs_root_ref *rref;
> + struct extent_buffer *l;
> + int slot;
> +
> + unsigned long item_off;
> + unsigned long item_len;
> +
> + struct inode *inode;
> + int ret = 0;
> +
> + path = btrfs_alloc_path();
> + if (!path)
> + return -ENOMEM;
> +
> + subvol_info = kzalloc(sizeof(*subvol_info), GFP_KERNEL);
> + if (!subvol_info) {
> + btrfs_free_path(path);
> + return -ENOMEM;
> + }
> +
> + inode = file_inode(file);
> + fs_info = BTRFS_I(inode)->root->fs_info;
> +
> + /* Get root_item of inode's subvolume */
> + key.objectid = BTRFS_I(inode)->root->root_key.objectid;
> + key.type = BTRFS_ROOT_ITEM_KEY;
> + key.offset = (u64)-1;
> + root = btrfs_read_fs_root_no_name(fs_info, );
> + if (IS_ERR(root)) {
> + ret = PTR_ERR(root);
> + goto out;
> + }
> + root_item = >root_item;
> +
> + subvol_info->id = key.objectid;
> +
> + subvol_info->generation = btrfs_root_generation(root_item);
> + subvol_info->flags = btrfs_root_flags(root_item);
> +
> + memcpy(subvol_info->uuid, root_item->uuid, BTRFS_UUID_SIZE);
> + memcpy(subvol_info->parent_uuid, root_item->parent_uuid,
> + BTRFS_UUID_SIZE);
> + memcpy(subvol_info->received_uuid, root_item->received_uuid,
> + BTRFS_UUID_SIZE);
> +
> + subvol_info->ctransid = btrfs_root_ctransid(root_item);
> + subvol_info->ctime.sec = btrfs_stack_timespec_sec(_item->otime);
  obviously ->ctime, 
sorry for messed up.

> + subvol_info->ctime.nsec = btrfs_stack_timespec_nsec(_item->ctime);
> +
> + subvol_info->otransid = btrfs_root_otransid(root_item);
> + subvol_info->otime.sec = btrfs_stack_timespec_sec(_item->otime);
> + subvol_info->otime.nsec = btrfs_stack_timespec_nsec(_item->otime);
> +
> + subvol_info->stransid = btrfs_root_stransid(root_item);
> + subvol_info->stime.sec = btrfs_stack_timespec_sec(_item->stime);
> + subvol_info->stime.nsec = btrfs_stack_timespec_nsec(_item->stime);
> +
> + subvol_info->rtransid = btrfs_root_rtransid(root_item);
> + subvol_info->rtime.sec = btrfs_stack_timespec_sec(_item->rtime);
> + subvol_info->rtime.nsec = btrfs_stack_timespec_nsec(_item->rtime);
> +
> + if (key.objectid != BTRFS_FS_TREE_OBJECTID) {
> + /* Search root tree for ROOT_BACKREF of this subvolume */
> + root = fs_info->tree_root;
> +
> + key.type = BTRFS_ROOT_BACKREF_KEY;
> + key.offset = 0;
> + ret = btrfs_search_slot(NULL, root, , path, 0, 0);
> + if (ret < 0) {
> + goto out;
> + } else if (path->slots[0] >=
> + btrfs_header_nritems(path->nodes[0])) {
> + ret = btrfs_next_leaf(root, path);
> + if (ret < 0) {
> + goto out;
> + } else if (ret > 0) {
> + ret = -EUCLEAN;
> + goto out;
> + }
> + }
> +
> + l = path->nodes[0];
> + slot = path->slots[0];
> + btrfs_item_key_to_cpu(l, , slot);
> + if (key.objectid == subvol_info->id &&
> + key.type == BTRFS_ROOT_BACKREF_KEY) {
> + 

Re: [PATCH 4/6] Btrfs: remove unused check of skip_locking

2018-05-17 Thread Liu Bo
On Wed, May 16, 2018 at 3:03 PM, Nikolay Borisov  wrote:
>
>
> On 15.05.2018 20:52, Liu Bo wrote:
>> The check is superfluous since all of callers who set search_for_commit
>> also have skip_locking set.
>
> This is false. For example btrfs_qgroup_rescan_worker sets
> search_commit_root = 1 but doesn't set skip locking. So either your
> assumption is wrong or the code in btrfs_qgroup_rescan_worker has missed
> that (which seems more likely).
>

I'm a bit confused as I didn't see btrfs_qgroup_rescan_worker() set
path->search_commit_root.

thanks,
liubo

> I think this change necessitates either:
>
> a) ASSERT(p->skip_locking) inside the if (p->search_commit_root) branch
>
> or
>
> b) Unconditionally setting ->skip_locking if we have set
> search_commit_root and removing opencoded set of skip_locking alongside
> search_commit_root.
>
> I think b) is the better option since it provides "fire and forget"
> semantics when you want to search the commit root, since it's only read
> only.
>
>>
>> Signed-off-by: Liu Bo 
>> ---
>>  fs/btrfs/ctree.c | 2 --
>>  1 file changed, 2 deletions(-)
>>
>> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
>> index 399839df7a8f..cf34eca41d4e 100644
>> --- a/fs/btrfs/ctree.c
>> +++ b/fs/btrfs/ctree.c
>> @@ -2623,8 +2623,6 @@ static struct extent_buffer 
>> *btrfs_search_slot_get_root(struct btrfs_root *root,
>>   level = btrfs_header_level(b);
>>   if (p->need_commit_sem)
>>   up_read(_info->commit_root_sem);
>> - if (!p->skip_locking)
>> - btrfs_tree_read_lock(b);
>>
>>   goto out;
>>   }
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v6 3/3] btrfs: Add unprivileged version of ino_lookup ioctl

2018-05-17 Thread Tomohiro Misono
Add unprivileged version of ino_lookup ioctl BTRFS_IOC_INO_LOOKUP_USER
to allow normal users to call "btrfs subvolume list/show" etc. in
combination with BTRFS_IOC_GET_SUBVOL_INFO/BTRFS_IOC_GET_SUBVOL_ROOTREF.

This can be used like BTRFS_IOC_INO_LOOKUP but the argument is
different. This is  because it always searches the fs/file tree
correspoinding to the fd with which this ioctl is called and also
returns the name of bottom subvolume.

The main differences from original ino_lookup ioctl are:
  1. Read + Exec permission will be checked using inode_permission()
 during path construction. -EACCES will be returned in case
 of failure.
  2. Path construction will be stopped at the inode number which
 corresponds to the fd with which this ioctl is called. If
 constructed path does not exist under fd's inode, -EACCES
 will be returned.
  3. The name of bottom subvolume is also searched and filled.

Note that the maximum length of path is shorter 256 (BTRFS_VOL_NAME_MAX+1)
bytes than ino_lookup ioctl because of space of subvolume's name.

Reviewed-by: Gu Jinxiang 
Reviewed-by: Qu Wenruo 
Signed-off-by: Tomohiro Misono 
---
 fs/btrfs/ioctl.c   | 204 +
 include/uapi/linux/btrfs.h |  17 
 2 files changed, 221 insertions(+)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 463ddedd90da..5e8dfa816ba9 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2200,6 +2200,166 @@ static noinline int btrfs_search_path_in_tree(struct 
btrfs_fs_info *info,
return ret;
 }
 
+static noinline int btrfs_search_path_in_tree_user(struct inode *inode,
+   struct btrfs_ioctl_ino_lookup_user_args *args)
+{
+   struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
+   struct super_block *sb = inode->i_sb;
+   struct btrfs_key upper_limit = BTRFS_I(inode)->location;
+   u64 treeid = BTRFS_I(inode)->root->root_key.objectid;
+   u64 dirid = args->dirid;
+
+   unsigned long item_off;
+   unsigned long item_len;
+   struct btrfs_inode_ref *iref;
+   struct btrfs_root_ref *rref;
+   struct btrfs_root *root;
+   struct btrfs_path *path;
+   struct btrfs_key key, key2;
+   struct extent_buffer *l;
+   struct inode *temp_inode;
+   char *ptr;
+   int slot;
+   int len;
+   int total_len = 0;
+   int ret;
+
+   path = btrfs_alloc_path();
+   if (!path)
+   return -ENOMEM;
+
+   /*
+* If the bottom subvolume does not exist directly under upper_limit,
+* construct the path in bottomup way.
+*/
+   if (dirid != upper_limit.objectid) {
+   ptr = >path[BTRFS_INO_LOOKUP_USER_PATH_MAX - 1];
+
+   key.objectid = treeid;
+   key.type = BTRFS_ROOT_ITEM_KEY;
+   key.offset = (u64)-1;
+   root = btrfs_read_fs_root_no_name(fs_info, );
+   if (IS_ERR(root)) {
+   ret = -ENOENT;
+   goto out;
+   }
+
+   key.objectid = dirid;
+   key.type = BTRFS_INODE_REF_KEY;
+   key.offset = (u64)-1;
+   while (1) {
+   ret = btrfs_search_slot(NULL, root, , path, 0, 0);
+   if (ret < 0) {
+   goto out;
+   } else if (ret > 0) {
+   ret = btrfs_previous_item(root, path, dirid,
+ BTRFS_INODE_REF_KEY);
+   if (ret < 0) {
+   goto out;
+   } else if (ret > 0) {
+   ret = -ENOENT;
+   goto out;
+   }
+   }
+
+   l = path->nodes[0];
+   slot = path->slots[0];
+   btrfs_item_key_to_cpu(l, , slot);
+
+   iref = btrfs_item_ptr(l, slot, struct btrfs_inode_ref);
+   len = btrfs_inode_ref_name_len(l, iref);
+   ptr -= len + 1;
+   total_len += len + 1;
+   if (ptr < args->path) {
+   ret = -ENAMETOOLONG;
+   goto out;
+   }
+
+   *(ptr + len) = '/';
+   read_extent_buffer(l, ptr,
+   (unsigned long)(iref + 1), len);
+
+   /* Check the read+exec permission of this directory */
+   ret = btrfs_previous_item(root, path, dirid,
+ BTRFS_INODE_ITEM_KEY);
+   if (ret < 0) {
+   goto out;
+   

[PATCH v6 1/3] btrfs: Add unprivileged ioctl which returns subvolume information

2018-05-17 Thread Tomohiro Misono
Add new unprivileged ioctl BTRFS_IOC_GET_SUBVOL_INFO which returns
the information of subvolume containing this inode.
(i.e. returns the information in ROOT_ITEM and ROOT_BACKREF.)

Signed-off-by: Tomohiro Misono 
---
 v5 -> v6
- Use brfs_read_fs_root_no_name() to get subvolume root and simplify
  code
 v4 -> v5
- Update error handling of btrfs_next_leaf() to cover all cases
- Return error if ROOT_BACKREF is not found (except top-level)

 fs/btrfs/ioctl.c   | 125 +
 include/uapi/linux/btrfs.h |  51 ++
 2 files changed, 176 insertions(+)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 48e2ddff32bd..31af6e91c614 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2242,6 +2242,129 @@ static noinline int btrfs_ioctl_ino_lookup(struct file 
*file,
return ret;
 }
 
+/* Get the subvolume information in BTRFS_ROOT_ITEM and BTRFS_ROOT_BACKREF */
+static noinline int btrfs_ioctl_get_subvol_info(struct file *file,
+  void __user *argp)
+{
+   struct btrfs_ioctl_get_subvol_info_args *subvol_info;
+   struct btrfs_fs_info *fs_info;
+   struct btrfs_root *root;
+   struct btrfs_path *path;
+   struct btrfs_key key;
+
+   struct btrfs_root_item *root_item;
+   struct btrfs_root_ref *rref;
+   struct extent_buffer *l;
+   int slot;
+
+   unsigned long item_off;
+   unsigned long item_len;
+
+   struct inode *inode;
+   int ret = 0;
+
+   path = btrfs_alloc_path();
+   if (!path)
+   return -ENOMEM;
+
+   subvol_info = kzalloc(sizeof(*subvol_info), GFP_KERNEL);
+   if (!subvol_info) {
+   btrfs_free_path(path);
+   return -ENOMEM;
+   }
+
+   inode = file_inode(file);
+   fs_info = BTRFS_I(inode)->root->fs_info;
+
+   /* Get root_item of inode's subvolume */
+   key.objectid = BTRFS_I(inode)->root->root_key.objectid;
+   key.type = BTRFS_ROOT_ITEM_KEY;
+   key.offset = (u64)-1;
+   root = btrfs_read_fs_root_no_name(fs_info, );
+   if (IS_ERR(root)) {
+   ret = PTR_ERR(root);
+   goto out;
+   }
+   root_item = >root_item;
+
+   subvol_info->id = key.objectid;
+
+   subvol_info->generation = btrfs_root_generation(root_item);
+   subvol_info->flags = btrfs_root_flags(root_item);
+
+   memcpy(subvol_info->uuid, root_item->uuid, BTRFS_UUID_SIZE);
+   memcpy(subvol_info->parent_uuid, root_item->parent_uuid,
+   BTRFS_UUID_SIZE);
+   memcpy(subvol_info->received_uuid, root_item->received_uuid,
+   BTRFS_UUID_SIZE);
+
+   subvol_info->ctransid = btrfs_root_ctransid(root_item);
+   subvol_info->ctime.sec = btrfs_stack_timespec_sec(_item->otime);
+   subvol_info->ctime.nsec = btrfs_stack_timespec_nsec(_item->ctime);
+
+   subvol_info->otransid = btrfs_root_otransid(root_item);
+   subvol_info->otime.sec = btrfs_stack_timespec_sec(_item->otime);
+   subvol_info->otime.nsec = btrfs_stack_timespec_nsec(_item->otime);
+
+   subvol_info->stransid = btrfs_root_stransid(root_item);
+   subvol_info->stime.sec = btrfs_stack_timespec_sec(_item->stime);
+   subvol_info->stime.nsec = btrfs_stack_timespec_nsec(_item->stime);
+
+   subvol_info->rtransid = btrfs_root_rtransid(root_item);
+   subvol_info->rtime.sec = btrfs_stack_timespec_sec(_item->rtime);
+   subvol_info->rtime.nsec = btrfs_stack_timespec_nsec(_item->rtime);
+
+   if (key.objectid != BTRFS_FS_TREE_OBJECTID) {
+   /* Search root tree for ROOT_BACKREF of this subvolume */
+   root = fs_info->tree_root;
+
+   key.type = BTRFS_ROOT_BACKREF_KEY;
+   key.offset = 0;
+   ret = btrfs_search_slot(NULL, root, , path, 0, 0);
+   if (ret < 0) {
+   goto out;
+   } else if (path->slots[0] >=
+   btrfs_header_nritems(path->nodes[0])) {
+   ret = btrfs_next_leaf(root, path);
+   if (ret < 0) {
+   goto out;
+   } else if (ret > 0) {
+   ret = -EUCLEAN;
+   goto out;
+   }
+   }
+
+   l = path->nodes[0];
+   slot = path->slots[0];
+   btrfs_item_key_to_cpu(l, , slot);
+   if (key.objectid == subvol_info->id &&
+   key.type == BTRFS_ROOT_BACKREF_KEY) {
+   subvol_info->parent_id = key.offset;
+
+   rref = btrfs_item_ptr(l, slot, struct btrfs_root_ref);
+   subvol_info->dirid = btrfs_root_ref_dirid(l, rref);
+
+   item_off = 

[PATCH v6 0/3] btrfs: Add three new unprivileged ioctls to allow normal users to call "sub list/show" etc.

2018-05-17 Thread Tomohiro Misono
changelog:
v5 -> v6
  - Update 1st patch by using btrfs_fs_root_no_name()
  - Return -EUCLEAN when btrfs_next_leaf/next_item() should not fail
  - Add Reviewed-by tag
v4 -> v5
  - Update error handling of 1st/2nd patch. See each log for details
  - Fix misspelling
v3 -> v4
  - call btrfs_next_leaf() after btrfs_search_slot() when the slot
position exceeds the number of items
  - rebased to current misc-next
v2 -> v3
  - fix kbuild test bot warning
v1 -> v2
  - completely reimplement 1st/2nd ioctl to have user friendly api
  - various cleanup, remove unnecessary goto
===

This adds three new unprivileged ioctls:

1st patch:
  ioctl which returns subvolume information of ROOT_ITEM and ROOT_BACKREF
2nd patch:
  ioctl which returns subvolume information of ROOT_REF (without subvolume name)
3rd patch: 
  user version of ino_lookup ioctl which also performs permission check.

They will be used to implement user version of "subvolume list/show" etc.
in user tools.
See each commit log for more detals.

The implementation of btrfs-progs can be found in the ML titled as follows: 
  [PATCH 0/11] btrfs-progs: Rework of "subvolume list/show" and relax the root 
privileges of them

Tomohiro Misono (3):
  btrfs: Add unprivileged ioctl which returns subvolume information
  btrfs: Add unprivileged ioctl which returns subvolume's ROOT_REF
  btrfs: Add unprivileged version of ino_lookup ioctl

 fs/btrfs/ioctl.c   | 431 +
 include/uapi/linux/btrfs.h |  84 +
 2 files changed, 515 insertions(+)

-- 
2.14.3


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v6 2/3] btrfs: Add unprivileged ioctl which returns subvolume's ROOT_REF

2018-05-17 Thread Tomohiro Misono
Add unprivileged ioctl BTRFS_IOC_GET_SUBVOL_ROOTREF which
returns ROOT_REF information of the subvolume containing this inode
except the subvolume name (this is because to prevent potential name
leak). The subvolume name will be gained by user version of ino_lookup
ioctl (BTRFS_IOC_INO_LOOKUP_USER) which also performs permission check.

The min id of root ref's subvolume to be searched is specified by
@min_id in struct btrfs_ioctl_get_subvol_rootref_args. After the search
ends, @min_id is set to the last searched root ref's subvolid + 1. Also,
if there are more root refs than BTRFS_MAX_ROOTREF_BUFFER_NUM, -EOVERFLOW
is returned. Therefore the caller can just call this ioctl again without
changing the argument to continue search.

Reviewed-by: Qu Wenruo 
Signed-off-by: Tomohiro Misono 
---
 v4 -> v5
- Update error handling of btrfs_next_leaf() to cover all cases
- Use btrfs_next_item() to reduce the call of btrfs_search_slot()

 fs/btrfs/ioctl.c   | 102 +
 include/uapi/linux/btrfs.h |  16 +++
 2 files changed, 118 insertions(+)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 31af6e91c614..463ddedd90da 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2365,6 +2365,106 @@ static noinline int btrfs_ioctl_get_subvol_info(struct 
file *file,
return ret;
 }
 
+/*
+ * Return ROOT_REF information of the subvolume containing this inode
+ * except the subvolume name.
+ */
+static noinline int btrfs_ioctl_get_subvol_rootref(struct file *file,
+  void __user *argp)
+{
+   struct btrfs_ioctl_get_subvol_rootref_args *rootrefs;
+   struct btrfs_root_ref *rref;
+   struct btrfs_root *root;
+   struct btrfs_path *path;
+   struct btrfs_key key;
+
+   struct extent_buffer *l;
+   int slot;
+
+   struct inode *inode;
+   int ret;
+   u64 objectid;
+   u8 found;
+
+   path = btrfs_alloc_path();
+   if (!path)
+   return -ENOMEM;
+
+   rootrefs = memdup_user(argp, sizeof(*rootrefs));
+   if (!rootrefs) {
+   btrfs_free_path(path);
+   return -ENOMEM;
+   }
+
+   inode = file_inode(file);
+   root = BTRFS_I(inode)->root->fs_info->tree_root;
+   objectid = BTRFS_I(inode)->root->root_key.objectid;
+
+   key.objectid = objectid;
+   key.type = BTRFS_ROOT_REF_KEY;
+   key.offset = rootrefs->min_id;
+   found = 0;
+
+   ret = btrfs_search_slot(NULL, root, , path, 0, 0);
+   if (ret < 0) {
+   goto out;
+   } else if (path->slots[0] >=
+   btrfs_header_nritems(path->nodes[0])) {
+   ret = btrfs_next_leaf(root, path);
+   if (ret < 0) {
+   goto out;
+   } else if (ret > 0) {
+   ret = -EUCLEAN;
+   goto out;
+   }
+   }
+   while (1) {
+   l = path->nodes[0];
+   slot = path->slots[0];
+
+   btrfs_item_key_to_cpu(l, , slot);
+   if (key.objectid != objectid ||
+   key.type != BTRFS_ROOT_REF_KEY) {
+   ret = 0;
+   goto out;
+   }
+
+   if (found == BTRFS_MAX_ROOTREF_BUFFER_NUM) {
+   ret = -EOVERFLOW;
+   goto out;
+   }
+
+   rref = btrfs_item_ptr(l, slot, struct btrfs_root_ref);
+   rootrefs->rootref[found].subvolid = key.offset;
+   rootrefs->rootref[found].dirid =
+ btrfs_root_ref_dirid(l, rref);
+   found++;
+
+   ret = btrfs_next_item(root, path);
+   if (ret < 0) {
+   goto out;
+   } else if (ret > 0) {
+   ret = -EUCLEAN;
+   goto out;
+   }
+   }
+
+out:
+   if (!ret || ret == -EOVERFLOW) {
+   rootrefs->num_items = found;
+   /* update min_id for next search */
+   if (found)
+   rootrefs->min_id =
+   rootrefs->rootref[found - 1].subvolid + 1;
+   if (copy_to_user(argp, rootrefs, sizeof(*rootrefs)))
+   ret = -EFAULT;
+   }
+
+   btrfs_free_path(path);
+   kfree(rootrefs);
+   return ret;
+}
+
 static noinline int btrfs_ioctl_snap_destroy(struct file *file,
 void __user *arg)
 {
@@ -5499,6 +5599,8 @@ long btrfs_ioctl(struct file *file, unsigned int
return btrfs_ioctl_set_features(file, argp);
case BTRFS_IOC_GET_SUBVOL_INFO:
return btrfs_ioctl_get_subvol_info(file, argp);
+   case BTRFS_IOC_GET_SUBVOL_ROOTREF:
+   return btrfs_ioctl_get_subvol_rootref(file, argp);
}

Re: [PATCH 2/6] Btrfs: use more straightforward extent_buffer_uptodate

2018-05-17 Thread Liu Bo
On Wed, May 16, 2018 at 2:43 PM, Nikolay Borisov  wrote:
>
>
> On 15.05.2018 20:52, Liu Bo wrote:
>> In read_block_for_search(), it's straightforward to use
>> extent_buffer_uptodate() instead since 0 is passed as parent transid to
>
> "instead of the more heavyweight btrfs_buffer_update"
>

I don't think it's about heavyweight, they're actually equivalent in this case.

I just want to reduce the burden of reading these code as
verify_parent_transid() really has some corner cases to think about.

>> btrfs_buffer_uptodate(), which means the check for parent transid is not
>> needed.
>>
>> Signed-off-by: Liu Bo 
>
> Codewise LGTM:
>
> Reviewed-by: Nikolay Borisov 

Thanks for taking a look at this.

thanks,
liubo

>> ---
>>  fs/btrfs/ctree.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
>> index 9fa3d77c98d4..a96d308c51b8 100644
>> --- a/fs/btrfs/ctree.c
>> +++ b/fs/btrfs/ctree.c
>> @@ -2445,7 +2445,7 @@ noinline void btrfs_unlock_up_safe(struct btrfs_path 
>> *path, int level)
>>* and give up so that our caller doesn't loop forever
>>* on our EAGAINs.
>>*/
>> - if (!btrfs_buffer_uptodate(tmp, 0, 0))
>> + if (!extent_buffer_uptodate(tmp))
>>   ret = -EIO;
>>   free_extent_buffer(tmp);
>>   } else {
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 1/3] btrfs: Add unprivileged ioctl which returns subvolume information

2018-05-17 Thread Misono Tomohiro
On 2018/05/18 10:10, Qu Wenruo wrote:
> 
> 
> On 2018年05月18日 09:00, Misono Tomohiro wrote:
>> On 2018/05/17 15:39, Qu Wenruo wrote:
>>>
>>>
>>> On 2018年05月16日 13:49, Tomohiro Misono wrote:
 Add new unprivileged ioctl BTRFS_IOC_GET_SUBVOL_INFO which returns
 the information of subvolume containing this inode.
 (i.e. returns the information in ROOT_ITEM and ROOT_BACKREF.)

 Signed-off-by: Tomohiro Misono 
 ---
  v4 -> v5
 - Update error handling of btrfs_next_leaf() to cover all cases
 - Return error if ROOT_BACKREF is not found (except top-level)

  fs/btrfs/ioctl.c   | 146 
 +
  include/uapi/linux/btrfs.h |  51 
  2 files changed, 197 insertions(+)

 diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
 index 48e2ddff32bd..c1c9ae9a937d 100644
 --- a/fs/btrfs/ioctl.c
 +++ b/fs/btrfs/ioctl.c
 @@ -2242,6 +2242,150 @@ static noinline int btrfs_ioctl_ino_lookup(struct 
 file *file,
return ret;
  }
  
 +/* Get the subvolume information in BTRFS_ROOT_ITEM and 
 BTRFS_ROOT_BACKREF */
 +static noinline int btrfs_ioctl_get_subvol_info(struct file *file,
 + void __user *argp)
 +{
 +  struct btrfs_ioctl_get_subvol_info_args *subvol_info;
 +  struct btrfs_root *root;
 +  struct btrfs_path *path;
 +  struct btrfs_key key;
 +
 +  struct btrfs_root_item root_item;
 +  struct btrfs_root_ref *rref;
 +  struct extent_buffer *l;
 +  int slot;
 +
 +  unsigned long item_off;
 +  unsigned long item_len;
 +
 +  struct inode *inode;
 +  int ret;
 +
 +  path = btrfs_alloc_path();
 +  if (!path)
 +  return -ENOMEM;
 +
 +  subvol_info = kzalloc(sizeof(*subvol_info), GFP_KERNEL);
 +  if (!subvol_info) {
 +  btrfs_free_path(path);
 +  return -ENOMEM;
 +  }
 +
 +  inode = file_inode(file);
 +  root = BTRFS_I(inode)->root->fs_info->tree_root;
 +
 +  key.objectid = BTRFS_I(inode)->root->root_key.objectid;
 +  key.type = BTRFS_ROOT_ITEM_KEY;
 +  key.offset = 0;
 +
 +  ret = btrfs_search_slot(NULL, root, , path, 0, 0);
>>>
>>> What about just using btrfs_get_fs_root()?
>>>
>>> It would save several lines, and have better error handling along with
>>> better tree cache.
>>
>> Here we search ROOT_ITEM of subvolume in root_tree and
>> I don't think btrfs_get_fs_root() would save lines, right?
> 
> btrfs_root has root_item member.
> 
> So just btrfs_get_fs_root(), then use root->root_item.

Ok, I see. Thanks for you kind help.

> 
> 
 +  if (ret < 0) {
 +  goto out;
 +  } else if (ret > 0) {
 +  u64 objectid = key.objectid;
 +
 +  if (path->slots[0] >= btrfs_header_nritems(path->nodes[0])) {
 +  ret = btrfs_next_leaf(root, path);
 +  if (ret < 0) {
 +  goto out;
 +  } else if (ret > 0) {
 +  ret = -ENOENT;
 +  goto out;
 +  }
 +  }
 +
 +  /* If the subvolume is a snapshot, offset is not zero */
 +  btrfs_item_key_to_cpu(path->nodes[0], , path->slots[0]);
 +  if (key.objectid != objectid ||
 +  key.type != BTRFS_ROOT_ITEM_KEY) {
 +  ret = -ENOENT;
 +  goto out;
 +  }
 +  }
> 
> These lines can be saved.
> 
> Thanks,
> Qu
> 
 +  l = path->nodes[0];
 +  slot = path->slots[0];
 +  item_off = btrfs_item_ptr_offset(l, slot);
 +  item_len = btrfs_item_size_nr(l, slot);
 +  read_extent_buffer(l, _item, item_off, item_len);
 +
 +  subvol_info->id = key.objectid;
 +
 +  subvol_info->generation = btrfs_root_generation(_item);
 +  subvol_info->flags = btrfs_root_flags(_item);
 +
 +  memcpy(subvol_info->uuid, root_item.uuid, BTRFS_UUID_SIZE);
 +  memcpy(subvol_info->parent_uuid, root_item.parent_uuid,
 +  BTRFS_UUID_SIZE);
 +  memcpy(subvol_info->received_uuid, root_item.received_uuid,
 +  BTRFS_UUID_SIZE);
 +
 +  subvol_info->ctransid = btrfs_root_ctransid(_item);
 +  subvol_info->ctime.sec = btrfs_stack_timespec_sec(_item.ctime);
 +  subvol_info->ctime.nsec = btrfs_stack_timespec_nsec(_item.ctime);
 +
 +  subvol_info->otransid = btrfs_root_otransid(_item);
 +  subvol_info->otime.sec = btrfs_stack_timespec_sec(_item.otime);
 +  subvol_info->otime.nsec = btrfs_stack_timespec_nsec(_item.otime);
 +
 +  subvol_info->stransid = btrfs_root_stransid(_item);
 +  subvol_info->stime.sec = btrfs_stack_timespec_sec(_item.stime);
 

Re: [PATCH v5 1/3] btrfs: Add unprivileged ioctl which returns subvolume information

2018-05-17 Thread Qu Wenruo


On 2018年05月18日 09:00, Misono Tomohiro wrote:
> On 2018/05/17 15:39, Qu Wenruo wrote:
>>
>>
>> On 2018年05月16日 13:49, Tomohiro Misono wrote:
>>> Add new unprivileged ioctl BTRFS_IOC_GET_SUBVOL_INFO which returns
>>> the information of subvolume containing this inode.
>>> (i.e. returns the information in ROOT_ITEM and ROOT_BACKREF.)
>>>
>>> Signed-off-by: Tomohiro Misono 
>>> ---
>>>  v4 -> v5
>>> - Update error handling of btrfs_next_leaf() to cover all cases
>>> - Return error if ROOT_BACKREF is not found (except top-level)
>>>
>>>  fs/btrfs/ioctl.c   | 146 
>>> +
>>>  include/uapi/linux/btrfs.h |  51 
>>>  2 files changed, 197 insertions(+)
>>>
>>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>>> index 48e2ddff32bd..c1c9ae9a937d 100644
>>> --- a/fs/btrfs/ioctl.c
>>> +++ b/fs/btrfs/ioctl.c
>>> @@ -2242,6 +2242,150 @@ static noinline int btrfs_ioctl_ino_lookup(struct 
>>> file *file,
>>> return ret;
>>>  }
>>>  
>>> +/* Get the subvolume information in BTRFS_ROOT_ITEM and BTRFS_ROOT_BACKREF 
>>> */
>>> +static noinline int btrfs_ioctl_get_subvol_info(struct file *file,
>>> +  void __user *argp)
>>> +{
>>> +   struct btrfs_ioctl_get_subvol_info_args *subvol_info;
>>> +   struct btrfs_root *root;
>>> +   struct btrfs_path *path;
>>> +   struct btrfs_key key;
>>> +
>>> +   struct btrfs_root_item root_item;
>>> +   struct btrfs_root_ref *rref;
>>> +   struct extent_buffer *l;
>>> +   int slot;
>>> +
>>> +   unsigned long item_off;
>>> +   unsigned long item_len;
>>> +
>>> +   struct inode *inode;
>>> +   int ret;
>>> +
>>> +   path = btrfs_alloc_path();
>>> +   if (!path)
>>> +   return -ENOMEM;
>>> +
>>> +   subvol_info = kzalloc(sizeof(*subvol_info), GFP_KERNEL);
>>> +   if (!subvol_info) {
>>> +   btrfs_free_path(path);
>>> +   return -ENOMEM;
>>> +   }
>>> +
>>> +   inode = file_inode(file);
>>> +   root = BTRFS_I(inode)->root->fs_info->tree_root;
>>> +
>>> +   key.objectid = BTRFS_I(inode)->root->root_key.objectid;
>>> +   key.type = BTRFS_ROOT_ITEM_KEY;
>>> +   key.offset = 0;
>>> +
>>> +   ret = btrfs_search_slot(NULL, root, , path, 0, 0);
>>
>> What about just using btrfs_get_fs_root()?
>>
>> It would save several lines, and have better error handling along with
>> better tree cache.
> 
> Here we search ROOT_ITEM of subvolume in root_tree and
> I don't think btrfs_get_fs_root() would save lines, right?

btrfs_root has root_item member.

So just btrfs_get_fs_root(), then use root->root_item.


>>> +   if (ret < 0) {
>>> +   goto out;
>>> +   } else if (ret > 0) {
>>> +   u64 objectid = key.objectid;
>>> +
>>> +   if (path->slots[0] >= btrfs_header_nritems(path->nodes[0])) {
>>> +   ret = btrfs_next_leaf(root, path);
>>> +   if (ret < 0) {
>>> +   goto out;
>>> +   } else if (ret > 0) {
>>> +   ret = -ENOENT;
>>> +   goto out;
>>> +   }
>>> +   }
>>> +
>>> +   /* If the subvolume is a snapshot, offset is not zero */
>>> +   btrfs_item_key_to_cpu(path->nodes[0], , path->slots[0]);
>>> +   if (key.objectid != objectid ||
>>> +   key.type != BTRFS_ROOT_ITEM_KEY) {
>>> +   ret = -ENOENT;
>>> +   goto out;
>>> +   }
>>> +   }

These lines can be saved.

Thanks,
Qu

>>> +   l = path->nodes[0];
>>> +   slot = path->slots[0];
>>> +   item_off = btrfs_item_ptr_offset(l, slot);
>>> +   item_len = btrfs_item_size_nr(l, slot);
>>> +   read_extent_buffer(l, _item, item_off, item_len);
>>> +
>>> +   subvol_info->id = key.objectid;
>>> +
>>> +   subvol_info->generation = btrfs_root_generation(_item);
>>> +   subvol_info->flags = btrfs_root_flags(_item);
>>> +
>>> +   memcpy(subvol_info->uuid, root_item.uuid, BTRFS_UUID_SIZE);
>>> +   memcpy(subvol_info->parent_uuid, root_item.parent_uuid,
>>> +   BTRFS_UUID_SIZE);
>>> +   memcpy(subvol_info->received_uuid, root_item.received_uuid,
>>> +   BTRFS_UUID_SIZE);
>>> +
>>> +   subvol_info->ctransid = btrfs_root_ctransid(_item);
>>> +   subvol_info->ctime.sec = btrfs_stack_timespec_sec(_item.ctime);
>>> +   subvol_info->ctime.nsec = btrfs_stack_timespec_nsec(_item.ctime);
>>> +
>>> +   subvol_info->otransid = btrfs_root_otransid(_item);
>>> +   subvol_info->otime.sec = btrfs_stack_timespec_sec(_item.otime);
>>> +   subvol_info->otime.nsec = btrfs_stack_timespec_nsec(_item.otime);
>>> +
>>> +   subvol_info->stransid = btrfs_root_stransid(_item);
>>> +   subvol_info->stime.sec = btrfs_stack_timespec_sec(_item.stime);
>>> +   subvol_info->stime.nsec = btrfs_stack_timespec_nsec(_item.stime);
>>> +
>>> +   subvol_info->rtransid = btrfs_root_rtransid(_item);
>>> +   

Re: [PATCH v5 1/3] btrfs: Add unprivileged ioctl which returns subvolume information

2018-05-17 Thread Misono Tomohiro
On 2018/05/17 15:39, Qu Wenruo wrote:
> 
> 
> On 2018年05月16日 13:49, Tomohiro Misono wrote:
>> Add new unprivileged ioctl BTRFS_IOC_GET_SUBVOL_INFO which returns
>> the information of subvolume containing this inode.
>> (i.e. returns the information in ROOT_ITEM and ROOT_BACKREF.)
>>
>> Signed-off-by: Tomohiro Misono 
>> ---
>>  v4 -> v5
>> - Update error handling of btrfs_next_leaf() to cover all cases
>> - Return error if ROOT_BACKREF is not found (except top-level)
>>
>>  fs/btrfs/ioctl.c   | 146 
>> +
>>  include/uapi/linux/btrfs.h |  51 
>>  2 files changed, 197 insertions(+)
>>
>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>> index 48e2ddff32bd..c1c9ae9a937d 100644
>> --- a/fs/btrfs/ioctl.c
>> +++ b/fs/btrfs/ioctl.c
>> @@ -2242,6 +2242,150 @@ static noinline int btrfs_ioctl_ino_lookup(struct 
>> file *file,
>>  return ret;
>>  }
>>  
>> +/* Get the subvolume information in BTRFS_ROOT_ITEM and BTRFS_ROOT_BACKREF 
>> */
>> +static noinline int btrfs_ioctl_get_subvol_info(struct file *file,
>> +   void __user *argp)
>> +{
>> +struct btrfs_ioctl_get_subvol_info_args *subvol_info;
>> +struct btrfs_root *root;
>> +struct btrfs_path *path;
>> +struct btrfs_key key;
>> +
>> +struct btrfs_root_item root_item;
>> +struct btrfs_root_ref *rref;
>> +struct extent_buffer *l;
>> +int slot;
>> +
>> +unsigned long item_off;
>> +unsigned long item_len;
>> +
>> +struct inode *inode;
>> +int ret;
>> +
>> +path = btrfs_alloc_path();
>> +if (!path)
>> +return -ENOMEM;
>> +
>> +subvol_info = kzalloc(sizeof(*subvol_info), GFP_KERNEL);
>> +if (!subvol_info) {
>> +btrfs_free_path(path);
>> +return -ENOMEM;
>> +}
>> +
>> +inode = file_inode(file);
>> +root = BTRFS_I(inode)->root->fs_info->tree_root;
>> +
>> +key.objectid = BTRFS_I(inode)->root->root_key.objectid;
>> +key.type = BTRFS_ROOT_ITEM_KEY;
>> +key.offset = 0;
>> +
>> +ret = btrfs_search_slot(NULL, root, , path, 0, 0);
> 
> What about just using btrfs_get_fs_root()?
> 
> It would save several lines, and have better error handling along with
> better tree cache.

Here we search ROOT_ITEM of subvolume in root_tree and
I don't think btrfs_get_fs_root() would save lines, right?

Thanks,

> 
> Thanks,
> Qu
> 
>> +if (ret < 0) {
>> +goto out;
>> +} else if (ret > 0) {
>> +u64 objectid = key.objectid;
>> +
>> +if (path->slots[0] >= btrfs_header_nritems(path->nodes[0])) {
>> +ret = btrfs_next_leaf(root, path);
>> +if (ret < 0) {
>> +goto out;
>> +} else if (ret > 0) {
>> +ret = -ENOENT;
>> +goto out;
>> +}
>> +}
>> +
>> +/* If the subvolume is a snapshot, offset is not zero */
>> +btrfs_item_key_to_cpu(path->nodes[0], , path->slots[0]);
>> +if (key.objectid != objectid ||
>> +key.type != BTRFS_ROOT_ITEM_KEY) {
>> +ret = -ENOENT;
>> +goto out;
>> +}
>> +}
>> +
>> +l = path->nodes[0];
>> +slot = path->slots[0];
>> +item_off = btrfs_item_ptr_offset(l, slot);
>> +item_len = btrfs_item_size_nr(l, slot);
>> +read_extent_buffer(l, _item, item_off, item_len);
>> +
>> +subvol_info->id = key.objectid;
>> +
>> +subvol_info->generation = btrfs_root_generation(_item);
>> +subvol_info->flags = btrfs_root_flags(_item);
>> +
>> +memcpy(subvol_info->uuid, root_item.uuid, BTRFS_UUID_SIZE);
>> +memcpy(subvol_info->parent_uuid, root_item.parent_uuid,
>> +BTRFS_UUID_SIZE);
>> +memcpy(subvol_info->received_uuid, root_item.received_uuid,
>> +BTRFS_UUID_SIZE);
>> +
>> +subvol_info->ctransid = btrfs_root_ctransid(_item);
>> +subvol_info->ctime.sec = btrfs_stack_timespec_sec(_item.ctime);
>> +subvol_info->ctime.nsec = btrfs_stack_timespec_nsec(_item.ctime);
>> +
>> +subvol_info->otransid = btrfs_root_otransid(_item);
>> +subvol_info->otime.sec = btrfs_stack_timespec_sec(_item.otime);
>> +subvol_info->otime.nsec = btrfs_stack_timespec_nsec(_item.otime);
>> +
>> +subvol_info->stransid = btrfs_root_stransid(_item);
>> +subvol_info->stime.sec = btrfs_stack_timespec_sec(_item.stime);
>> +subvol_info->stime.nsec = btrfs_stack_timespec_nsec(_item.stime);
>> +
>> +subvol_info->rtransid = btrfs_root_rtransid(_item);
>> +subvol_info->rtime.sec = btrfs_stack_timespec_sec(_item.rtime);
>> +subvol_info->rtime.nsec = btrfs_stack_timespec_nsec(_item.rtime);
>> +
>> +btrfs_release_path(path);
>> +if (key.objectid != 

Re: [PATCH 2/2] vfs: dedupe should return EPERM if permission is not granted

2018-05-17 Thread Mark Fasheh
On Thu, May 17, 2018 at 01:15:51AM -0400, Zygo Blaxell wrote:
> On Sun, May 13, 2018 at 11:26:39AM -0700, Darrick J. Wong wrote:
> > On Sun, May 13, 2018 at 06:21:52PM +, Mark Fasheh wrote:
> > > On Fri, May 11, 2018 at 05:06:34PM -0700, Darrick J. Wong wrote:
> > > > On Fri, May 11, 2018 at 12:26:51PM -0700, Mark Fasheh wrote:
> > > > > Right now we return EINVAL if a process does not have permission to 
> > > > > dedupe a
> > > > > file. This was an oversight on my part. EPERM gives a true 
> > > > > description of
> > > > > the nature of our error, and EINVAL is already used for the case that 
> > > > > the
> > > > > filesystem does not support dedupe.
> > > > > 
> > > > > Signed-off-by: Mark Fasheh 
> > > > > ---
> > > > >  fs/read_write.c | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/fs/read_write.c b/fs/read_write.c
> > > > > index 77986a2e2a3b..8edef43a182c 100644
> > > > > --- a/fs/read_write.c
> > > > > +++ b/fs/read_write.c
> > > > > @@ -2038,7 +2038,7 @@ int vfs_dedupe_file_range(struct file *file, 
> > > > > struct file_dedupe_range *same)
> > > > >   info->status = -EINVAL;
> > > > >   } else if (!(is_admin || (dst_file->f_mode & 
> > > > > FMODE_WRITE) ||
> > > > >uid_eq(current_fsuid(), dst->i_uid))) {
> > > > > - info->status = -EINVAL;
> > > > > + info->status = -EPERM;
> > > > 
> > > > Hmm, are we allowed to change this aspect of the kabi after the fact?
> > > > 
> > > > Granted, we're only trading one error code for another, but will the
> > > > existing users of this care?  xfs_io won't and I assume duperemove won't
> > > > either, but what about bees? :)
> > > 
> > > Yeah if you see my initial e-mail I check bees and also rust-btrfs. I 
> > > think
> > > this is fine as we're simply expanding on an error code return. There's no
> > > magic behavior expected with respect to these error codes either.
> > 
> > Ok.  No objections from me, then.
> > 
> > Acked-by: Darrick J. Wong 
> 
> For what it's worth, no objection from me either.  ;)
> 
> bees runs only with admin privilege and will never hit the modified line.

Awesome, thanks for the review Zygo.
--Mark
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] vfs: allow dedupe of user owned read-only files

2018-05-17 Thread Mark Fasheh
On Sun, May 13, 2018 at 10:50:25PM +0200, Adam Borowski wrote:
> On Sun, May 13, 2018 at 06:16:53PM +, Mark Fasheh wrote:
> > On Sat, May 12, 2018 at 04:49:20AM +0200, Adam Borowski wrote:
> > > On Fri, May 11, 2018 at 12:26:50PM -0700, Mark Fasheh wrote:
> > > > The permission check in vfs_dedupe_file_range() is too coarse - We
> > > > only allow dedupe of the destination file if the user is root, or
> > > > they have the file open for write.
> > > > 
> > > > This effectively limits a non-root user from deduping their own
> > > > read-only files. As file data during a dedupe does not change,
> > > > this is unexpected behavior and this has caused a number of issue
> > > > reports.
> [...]
> > > > So change the check so we allow dedupe on the target if:
> > > > 
> > > > - the root or admin is asking for it
> > > > - the owner of the file is asking for the dedupe
> > > > - the process has write access
> > > 
> > > I submitted a similar patch in May 2016, yet it has never been applied
> > > despite multiple pings, with no NAK.  My version allowed dedupe if:
> > > - the root or admin is asking for it
> > > - the file has w permission (on the inode -- ie, could have been opened 
> > > rw)
> > 
> > Ahh, yes I see that now. I did wind up acking it too :)
> > > 
> > > I like this new version better than mine: "root or owner or w" is more
> > > Unixy than "could have been opened w".
> > 
> > I agree, IMHO the behavior in this patch is intuitive. What we had before
> > would surprise users.
> 
> Actually, there's one reason to still consider "could have been opened w":
> with it, deduplication programs can simply open the file r and not care
> about ETXTBSY at all.  Otherwise, every program needs to stat() and have
> logic to pick the proper argument to the open() call (r if owner/root,
> rw or w if not).

That makes sense. The goal after all is to be able to just open r and not
worry about it. I hadn't considered the other possibilities that
inode_permission() covers. I'll make that change for my next series.

Thanks,
  --Mark
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 03/12] Btrfs: don't BUG_ON() in btrfs_truncate_inode_items()

2018-05-17 Thread David Sterba
On Fri, May 11, 2018 at 01:13:31PM -0700, Omar Sandoval wrote:
> From: Omar Sandoval 
> 
> btrfs_free_extent() can fail because of ENOMEM. There's no reason to
> panic here, we can just abort the transaction.
> 
> Fixes: f4b9aa8d3b87 ("btrfs_truncate")
> Reviewed-by: Nikolay Borisov 
> Signed-off-by: Omar Sandoval 

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: Remove stale comment about select_delayed_ref

2018-05-17 Thread David Sterba
On Thu, May 17, 2018 at 02:16:29PM +0300, Nikolay Borisov wrote:
> select_delayed_ref really just gets the next delayed ref which has to
> be processed - either an add ref or drop ref. We never go back for
> anything. So the comment is actually bogus, just remove it.
> 
> Signed-off-by: Nikolay Borisov 

Added to misc-next, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/2] btrfs: sysfs: Add entry which shows rmdir(2) can work for subvolume

2018-05-17 Thread David Sterba
On Thu, May 17, 2018 at 02:24:51PM +0900, Misono Tomohiro wrote:
> Deletion of a subvolume by rmdir(2) has become allowed by the
> 'commit cd2decf640b1 ("btrfs: Allow rmdir(2) to delete an empty
> subvolume")'.
> 
> It is a kind of new feature and this commits add a sysfs entry
>   /sys/fs/btrfs/features/rmdir_subvol
> to indicate the availability of the feature so that a user program
> (e.g. xfstests) can detect it.
> 
> Prior to this commit, all entry in /sys/fs/btrfs/features is a feature
> which depends on feature bits of superblock (i.e. each feature affects
> on-disk format) and managed by attribute_group "btrfs_feature_attr_group".
> For each fs, entries in /sys/fs/btrfs/UUID/features indicate which
> features are enabled (or can be changed online) for the fs.
> 
> However, rmdir_subvol feature only depends on kernel modules. Therefore
> new attribute_group "btrfs_static_feature_attr_group" is introduced and
> sysfs_merge_group() is used to share /sys/fs/btrfs/features directory.
> Features in "btrfs_static_feature_attr_group" won't be listed in each
> /sys/fs/btrfs/UUID/features.
> 
> Signed-off-by: Tomohiro Misono 
> ---
> Hi David,
> 
> Sorry for the misunderstanding.

No problem.

> How about this version which uses /sys/fs/btrfs/features
> while using different attribute_group for static features.

Perfect, that's what I had in mind, added to misc-next, thanks.

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] btrfs: use kvzalloc for EXTENT_SAME temporary data

2018-05-17 Thread Nikolay Borisov


On 17.05.2018 18:41, David Sterba wrote:
> The dedupe range is 16 MiB, with 4 KiB pages and 8 byte pointers, the
> arrays can be 32KiB large. To avoid allocation failures due to
> fragmented memory, use the allocation with fallback to vmalloc.
> 
> The arrays are allocated and freed only inside btrfs_extent_same and
> reused for all the ranges.
> 
> Signed-off-by: David Sterba 

LGTM:

Reviewed-by: Nikolay Borisov 
> ---
> 
> Based on git://github.com/kdave/btrfs-devel.git ext/timofey/dedupe-16mb-limit
> 
> v2:
> - use kvmalloc_array
> - use __GFP_ZERO, as there's no kvzalloc_array
> - check that there are no other frees missed,
>   see https://patchwork.kernel.org/patch/10374941/ for reference
> 
>  fs/btrfs/ioctl.c | 16 +---
>  1 file changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index b572e38b4b64..20751829104d 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -3178,12 +3178,13 @@ static int btrfs_extent_same(struct inode *src, u64 
> loff, u64 olen,
>* locking. We use an array for the page pointers. Size of the array is
>* bounded by len, which is in turn bounded by BTRFS_MAX_DEDUPE_LEN.
>*/
> - cmp.src_pages = kcalloc(num_pages, sizeof(struct page *), GFP_KERNEL);
> - cmp.dst_pages = kcalloc(num_pages, sizeof(struct page *), GFP_KERNEL);
> + cmp.src_pages = kvmalloc_array(num_pages, sizeof(struct page *),
> +GFP_KERNEL | __GFP_ZERO);
> + cmp.dst_pages = kvmalloc_array(num_pages, sizeof(struct page *),
> +GFP_KERNEL | __GFP_ZERO);
>   if (!cmp.src_pages || !cmp.dst_pages) {
> - kfree(cmp.src_pages);
> - kfree(cmp.dst_pages);
> - return -ENOMEM;
> + ret = -ENOMEM;
> + goto out_free;
>   }
>  
>   if (same_inode)
> @@ -3211,8 +3212,9 @@ static int btrfs_extent_same(struct inode *src, u64 
> loff, u64 olen,
>   else
>   btrfs_double_inode_unlock(src, dst);
>  
> - kfree(cmp.src_pages);
> - kfree(cmp.dst_pages);
> +out_free:
> + kvfree(cmp.src_pages);
> + kvfree(cmp.dst_pages);
>  
>   return ret;
>  }
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] btrfs: use kvzalloc for EXTENT_SAME temporary data

2018-05-17 Thread David Sterba
The dedupe range is 16 MiB, with 4 KiB pages and 8 byte pointers, the
arrays can be 32KiB large. To avoid allocation failures due to
fragmented memory, use the allocation with fallback to vmalloc.

The arrays are allocated and freed only inside btrfs_extent_same and
reused for all the ranges.

Signed-off-by: David Sterba 
---

Based on git://github.com/kdave/btrfs-devel.git ext/timofey/dedupe-16mb-limit

v2:
- use kvmalloc_array
- use __GFP_ZERO, as there's no kvzalloc_array
- check that there are no other frees missed,
  see https://patchwork.kernel.org/patch/10374941/ for reference

 fs/btrfs/ioctl.c | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index b572e38b4b64..20751829104d 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3178,12 +3178,13 @@ static int btrfs_extent_same(struct inode *src, u64 
loff, u64 olen,
 * locking. We use an array for the page pointers. Size of the array is
 * bounded by len, which is in turn bounded by BTRFS_MAX_DEDUPE_LEN.
 */
-   cmp.src_pages = kcalloc(num_pages, sizeof(struct page *), GFP_KERNEL);
-   cmp.dst_pages = kcalloc(num_pages, sizeof(struct page *), GFP_KERNEL);
+   cmp.src_pages = kvmalloc_array(num_pages, sizeof(struct page *),
+  GFP_KERNEL | __GFP_ZERO);
+   cmp.dst_pages = kvmalloc_array(num_pages, sizeof(struct page *),
+  GFP_KERNEL | __GFP_ZERO);
if (!cmp.src_pages || !cmp.dst_pages) {
-   kfree(cmp.src_pages);
-   kfree(cmp.dst_pages);
-   return -ENOMEM;
+   ret = -ENOMEM;
+   goto out_free;
}
 
if (same_inode)
@@ -3211,8 +3212,9 @@ static int btrfs_extent_same(struct inode *src, u64 loff, 
u64 olen,
else
btrfs_double_inode_unlock(src, dst);
 
-   kfree(cmp.src_pages);
-   kfree(cmp.dst_pages);
+out_free:
+   kvfree(cmp.src_pages);
+   kvfree(cmp.dst_pages);
 
return ret;
 }
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/3] btrfs: add read mirror policy

2018-05-17 Thread Austin S. Hemmelgarn

On 2018-05-17 10:46, Jeff Mahoney wrote:

On 5/16/18 6:35 PM, David Sterba wrote:

On Wed, May 16, 2018 at 06:03:56PM +0800, Anand Jain wrote:

Not yet ready for the integration. As I need to introduce
-o no_read_mirror_policy instead of -o read_mirror_policy=-


Mount option is mostly likely not the right interface for setting such
options, as usual.



I've seen a few alternate suggestions in the thread.  I suppose the real
question is: what and where is the intended persistence for this choice?

A mount option gets it via fstab.  How would a user be expected to set
it consistently via ioctl on each mount?  Properties could work, but
there's more discussion needed there.  Personally, I like the property
idea since it could conceivably be used on a per-file basis.


For the specific proposed use case (the tests), it probably doesn't need 
to be persistent beyond mount options.


However, this also allows for a trivial configuration using a slow 
storage device to provide redundancy for a fast storage device of the 
same size, which is potentially very useful for some people.  In that 
case, I can see most people who would be using it wanting it to follow 
the filesystem regardless of what context it's being mounted in (for 
example, it shouldn't need an extra option if mounted from a recovery 
environment or if it's moved to another system).


Most of my reason for recommending properties is that filesystem level 
properties appear to be the best thing BTRFS has to store per-volume 
configuration that's supposed to stay with the volume, despite not 
really being used for that even though there are quite a few mount 
options that are logical candidates for this type of thing (for example, 
the `ssd` options, `metadata_ratio`, and `max_inline` all make more 
logical sense as a property of the volume, not the mount).

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V4] test online label ioctl

2018-05-17 Thread Eric Sandeen

This tests the online label ioctl that btrfs has, which has been
recently proposed for XFS.

To run, it requires an updated xfs_io with the label command and a
filesystem that supports it

A slight change here to _require_xfs_io_command as well, so that tests
which simply fail with "Inappropriate ioctl" can be caught in the
common case.

Signed-off-by: Eric Sandeen 
---

Now with new and improved sequential V4 versioning!

This passes on btrfs, _notruns on xfs/ext4 of yore, and passes
on xfs w/ my online label patchset (as long as xfs_io has the new
capability)

V2: Add a max label length helper
Set the proper btrfs max label length o_O oops
Filter trailing whitespace from blkid output

V3: lowercase local vars, simplify max label len function

V4: use/test new -s / -c options to set and clear

diff --git a/common/rc b/common/rc
index ffe53236..333cfb82 100644
--- a/common/rc
+++ b/common/rc
@@ -2144,6 +2144,9 @@ _require_xfs_io_command()
echo $testio | grep -q "Inappropriate ioctl" && \
_notrun "xfs_io $command support is missing"
;;
+   "label")
+   testio=`$XFS_IO_PROG -c "label" $TEST_DIR 2>&1`
+   ;;
"open")
# -c "open $f" is broken in xfs_io <= 4.8. Along with the fix,
# a new -C flag was introduced to execute one shot commands.
@@ -2182,7 +2185,7 @@ _require_xfs_io_command()
rm -f $testfile 2>&1 > /dev/null
echo $testio | grep -q "not found" && \
_notrun "xfs_io $command support is missing"
-   echo $testio | grep -q "Operation not supported" && \
+   echo $testio | grep -q "Operation not supported\|Inappropriate ioctl" 
&& \
_notrun "xfs_io $command failed (old kernel/wrong fs?)"
echo $testio | grep -q "Invalid" && \
_notrun "xfs_io $command failed (old kernel/wrong fs/bad args?)"
@@ -3788,6 +3791,29 @@ _require_scratch_feature()
esac
 }
 
+# The maximum filesystem label length, /not/ including terminating NULL

+_label_get_max()
+{
+   case $FSTYP in
+   xfs)
+   echo 12
+   ;;
+   btrfs)
+   echo 255
+   ;;
+   *)
+   _notrun "$FSTYP does not define maximum label length"
+   ;;
+   esac
+}
+
+# Helper to check above early in a script
+_require_label_get_max()
+{
+   # Just call _label_get_max which will notrun if appropriate
+   dummy=$(_label_get_max)
+}
+
 init_rc
 
 

diff --git a/tests/generic/488 b/tests/generic/488
new file mode 100755
index ..3616522d
--- /dev/null
+++ b/tests/generic/488
@@ -0,0 +1,95 @@
+#! /bin/bash
+# FS QA Test 488
+#
+# Test the online filesystem label set/get ioctls
+#
+#---
+# Copyright (c) 2018 Red Hat, Inc.  All Rights Reserved.
+# Author: Eric Sandeen 
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+_supported_fs generic
+_supported_os Linux
+_require_scratch
+_require_xfs_io_command "label"
+_require_label_get_max
+
+_scratch_mkfs > $seqres.full 2>&1
+_scratch_mount
+
+# Make sure we can set & clear the label
+$XFS_IO_PROG -c "label -s label.$seq" $SCRATCH_MNT
+$XFS_IO_PROG -c "label" $SCRATCH_MNT
+
+$XFS_IO_PROG -c "label -c" $SCRATCH_MNT
+$XFS_IO_PROG -c "label" $SCRATCH_MNT
+
+# And that userspace can see it now, while mounted
+# NB: some blkid has trailing whitespace, filter it out here
+$XFS_IO_PROG -c "label -s label.$seq" $SCRATCH_MNT
+$XFS_IO_PROG -c "label" $SCRATCH_MNT
+blkid -s LABEL $SCRATCH_DEV | _filter_scratch | sed -e "s/ $//g"
+
+# And that the it is still there when it's unmounted
+_scratch_unmount
+blkid -s LABEL $SCRATCH_DEV | _filter_scratch | sed -e "s/ 

Re: [PATCH v2 0/3] btrfs: add read mirror policy

2018-05-17 Thread Jeff Mahoney
On 5/17/18 8:25 AM, Austin S. Hemmelgarn wrote:
> On 2018-05-16 22:32, Anand Jain wrote:
>>
>>
>> On 05/17/2018 06:35 AM, David Sterba wrote:
>>> On Wed, May 16, 2018 at 06:03:56PM +0800, Anand Jain wrote:
 Not yet ready for the integration. As I need to introduce
 -o no_read_mirror_policy instead of -o read_mirror_policy=-
>>>
>>> Mount option is mostly likely not the right interface for setting such
>>> options, as usual.
>>
>>   I am ok to make it ioctl for the final. What do you think?
>>
>>
>>   But to reproduce the bug posted in
>>     Btrfs: fix the corruption by reading stale btree blocks
>>   It needs to be a mount option, as randomly the pid can
>>   still pick the disk specified in the mount option.
>>
> Personally, I'd vote for filesystem property (thus handled through the
> standard `btrfs property` command) that can be overridden by a mount
> option.  With that approach, no new tool (or change to an existing tool)
> would be needed, existing volumes could be converted to use it in a
> backwards compatible manner (old kernels would just ignore the
> property), and you could still have the behavior you want in tests (and
> in theory it could easily be adapted to be a per-subvolume setting if we
> ever get per-subvolume chunk profile support).

Properties are a combination of interfaces presented through a single
command.  Although the kernel API would allow a direct-to-property
interface via the btrfs.* extended attributes, those are currently
limited to a single inode.  The label property is set via ioctl and
stored in the superblock.  The read-only subvolume property is also set
by ioctl but stored in the root flags.

As it stands, every property is explicitly defined in the tools, so any
addition would require tools changes.  This is a bigger discussion,
though.  We *could* use the xattr interface to access per-root or
fs-global properties, but we'd need to define that interface.
btrfs_listxattr could get interesting, though I suppose we could
simplify it by only allowing the per-subvolume and fs-global operations
on root inodes.

-Jeff

-- 
Jeff Mahoney
SUSE Labs



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v2 0/3] btrfs: add read mirror policy

2018-05-17 Thread Jeff Mahoney
On 5/16/18 6:35 PM, David Sterba wrote:
> On Wed, May 16, 2018 at 06:03:56PM +0800, Anand Jain wrote:
>> Not yet ready for the integration. As I need to introduce
>> -o no_read_mirror_policy instead of -o read_mirror_policy=-
> 
> Mount option is mostly likely not the right interface for setting such
> options, as usual.


I've seen a few alternate suggestions in the thread.  I suppose the real
question is: what and where is the intended persistence for this choice?

A mount option gets it via fstab.  How would a user be expected to set
it consistently via ioctl on each mount?  Properties could work, but
there's more discussion needed there.  Personally, I like the property
idea since it could conceivably be used on a per-file basis.

-Jeff

-- 
Jeff Mahoney
SUSE Labs



signature.asc
Description: OpenPGP digital signature


[PATCH] btrfs: fix describe_relocation string pointer

2018-05-17 Thread Anand Jain
Looks like the original idea was to print the hex of the flags which
is not coded with their flag name. So use the current buf pointer bp
instead of buf.

Signed-off-by: Anand Jain 
---
 fs/btrfs/relocation.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 74656d79e511..879b76fa881a 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -4344,7 +4344,7 @@ static void describe_relocation(struct btrfs_fs_info 
*fs_info,
DESCRIBE_FLAG(RAID5,"raid5");
DESCRIBE_FLAG(RAID6,"raid6");
if (flags)
-   snprintf(buf, buf - bp + sizeof(buf), "|0x%llx", flags);
+   snprintf(bp, buf - bp + sizeof(buf), "|0x%llx", flags);
 #undef DESCRIBE_FLAG
}
 
-- 
2.15.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


btrfs hang on rsync in linux 4.16.8

2018-05-17 Thread E V
running an rsync -av --del as the only process hiting my btrfs backup
filesystem. rsync is now stuck and so is all other access to the
filesystem. Looking at ps it seems the btrfs-cleaner is running, so
maybe that deadlocked with Stack for the rsync:

[<0>] balance_dirty_pages_ratelimited+0x270/0xca0
[<0>] __btrfs_buffered_write+0x3ae/0x730 [btrfs]
[<0>] btrfs_file_write_iter+0x36e/0x510 [btrfs]
[<0>] __vfs_write+0xcf/0x130
[<0>] vfs_write+0xab/0x190
[<0>] SyS_write+0x3d/0x90
[<0>] do_syscall_64+0x55/0x100
[<0>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[<0>] 0x

dmesg has a bunch of stuff starting with:
[245043.379979] WARNING: CPU: 4 PID: 2608 at fs/btrfs/locking.c:251
btrfs_tree_lock+0x1be/0x1d0 [btrfs]
[245043.381571] Modules linked in: ipmi_si mpt3sas raid_class
scsi_transport_sas dell_rbu nfsv3 nfsv4 nfs fscache ext2 mgag200
intel_powerclamp i2c_algo_bit coretemp crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel aesni_intel drm_kms_helper aes_x86_64 joydev evdev
crypto_simd iTCO_wdt cryptd syscopyarea iTCO_vendor_support
sysfillrect sysimgblt dcdbas sg fb_sys_fops ttm glue_helper drm pcspkr
serio_raw ipmi_devintf ipmi_msghandler lpc_ich acpi_power_meter
i7core_edac mfd_core button nfsd auth_rpcgss oid_registry nfs_acl
lockd grace sunrpc loop autofs4 ext4 crc32c_generic crc16 mbcache jbd2
btrfs xor zstd_decompress zstd_compress xxhash raid6_pq hid_generic
usbhid hid sd_mod psmouse crc32c_intel i2c_core ehci_pci uhci_hcd
ehci_hcd ixgbe mdio megaraid_sas usbcore ptp usb_common pps_core
scsi_mod bnx2
[245043.391437]  [last unloaded: ipmi_si]
[245043.392792] CPU: 4 PID: 2608 Comm: btrfs-transacti Tainted: G
  W I  4.16.8 #1
[245043.395489] RIP: 0010:btrfs_tree_lock+0x1be/0x1d0 [btrfs]
[245043.396791] RSP: 0018:c9000424b840 EFLAGS: 00010246
[245043.398093] RAX: 0a30 RBX: 8807e20a3d20 RCX:
0001
[245043.399414] RDX: 0001 RSI: 0002 RDI:
8807e20a3d20
[245043.400732] RBP: 0001 R08: 88041f39a700 R09:
8800
[245043.402021] R10: 0040 R11: 8807e20a3d20 R12:
8807cb220630
[245043.403296] R13: 0001 R14: 8807cb220628 R15:
88041fbdf000
[245043.404780] FS:  () GS:88082fc8()
knlGS:
[245043.406050] CS:  0010 DS:  ES:  CR0: 80050033
[245043.407321] CR2: 7fffdbdb9f10 CR3: 01c09005 CR4:
000206e0
[245043.408670] Call Trace:
[245043.409977]  btrfs_search_slot+0x761/0xa60 [btrfs]
[245043.411278]  btrfs_insert_empty_items+0x62/0xb0 [btrfs]
[245043.412572]  btrfs_insert_item+0x5b/0xc0 [btrfs]
[245043.413922]  btrfs_create_pending_block_groups+0xfb/0x1e0 [btrfs]
[245043.415216]  do_chunk_alloc+0x1e5/0x2a0 [btrfs]
[245043.416487]  find_free_extent+0xcd0/0xf60 [btrfs]
[245043.417813]  btrfs_reserve_extent+0x96/0x1e0 [btrfs]
[245043.419105]  btrfs_alloc_tree_block+0xfb/0x4a0 [btrfs]
[245043.420378]  __btrfs_cow_block+0x127/0x550 [btrfs]
[245043.421652]  btrfs_cow_block+0xee/0x190 [btrfs]
[245043.422979]  btrfs_search_slot+0x227/0xa60 [btrfs]
[245043.424279]  ? btrfs_update_inode_item+0x59/0x100 [btrfs]
[245043.425538]  ? iput+0x72/0x1e0
[245043.426798]  write_one_cache_group.isra.49+0x20/0x90 [btrfs]
[245043.428131]  btrfs_start_dirty_block_groups+0x102/0x420 [btrfs]
[245043.429419]  btrfs_commit_transaction+0x11b/0x880 [btrfs]
[245043.430712]  ? start_transaction+0x8e/0x410 [btrfs]
[245043.432006]  transaction_kthread+0x184/0x1a0 [btrfs]
[245043.433341]  kthread+0xf0/0x130
[245043.434628]  ? btrfs_cleanup_transaction+0x4e0/0x4e0 [btrfs]
[245043.435928]  ? kthread_create_worker_on_cpu+0x40/0x40
[245043.437236]  ret_from_fork+0x1f/0x30
[245043.438472] Code: 43 58 85 c0 75 2c f0 ff 43 58 f0 ff 43 44 65 48
8b 04 25 00 4d 01 00 8b 80 00 04 00 00 89 43 40 48 83 c4 28 5b 5d 41
5c 41 5d c3 <0f> 0b e9 60 fe ff ff 0f 0b eb d0 0f 1f 80 00 00 00 00 8b
47 4c
[245043.441054] ---[ end trace 15abaa2aaf36827f ]---

ending with a stuck btrfs-transcaction:

245770.403336] btrfs-transacti D0  2608  2 0x8000
[245770.404281] Call Trace:
[245770.405276]  ? __schedule+0x2b1/0x770
[245770.406205]  schedule+0x2d/0x80
[245770.407150]  btrfs_tree_lock+0x111/0x1d0 [btrfs]
[245770.408073]  ? wait_woken+0x80/0x80
[245770.409059]  btrfs_search_slot+0x761/0xa60 [btrfs]
[245770.409994]  btrfs_insert_empty_items+0x62/0xb0 [btrfs]
[245770.410913]  btrfs_insert_item+0x5b/0xc0 [btrfs]
[245770.411817]  btrfs_create_pending_block_groups+0xfb/0x1e0 [btrfs]
[245770.412738]  do_chunk_alloc+0x1e5/0x2a0 [btrfs]
[245770.413627]  find_free_extent+0xcd0/0xf60 [btrfs]
[245770.414513]  btrfs_reserve_extent+0x96/0x1e0 [btrfs]
[245770.415398]  btrfs_alloc_tree_block+0xfb/0x4a0 [btrfs]
[245770.416285]  __btrfs_cow_block+0x127/0x550 [btrfs]
[245770.417230]  btrfs_cow_block+0xee/0x190 [btrfs]
[245770.418118]  btrfs_search_slot+0x227/0xa60 [btrfs]
[245770.419004]  ? btrfs_update_inode_item+0x59/0x100 [btrfs]
[245770.419880]  ? iput+0x72/0x1e0

Re: [PATCH v2 0/3] btrfs: add read mirror policy

2018-05-17 Thread Austin S. Hemmelgarn

On 2018-05-16 22:32, Anand Jain wrote:



On 05/17/2018 06:35 AM, David Sterba wrote:

On Wed, May 16, 2018 at 06:03:56PM +0800, Anand Jain wrote:

Not yet ready for the integration. As I need to introduce
-o no_read_mirror_policy instead of -o read_mirror_policy=-


Mount option is mostly likely not the right interface for setting such
options, as usual.


  I am ok to make it ioctl for the final. What do you think?


  But to reproduce the bug posted in
    Btrfs: fix the corruption by reading stale btree blocks
  It needs to be a mount option, as randomly the pid can
  still pick the disk specified in the mount option.

Personally, I'd vote for filesystem property (thus handled through the 
standard `btrfs property` command) that can be overridden by a mount 
option.  With that approach, no new tool (or change to an existing tool) 
would be needed, existing volumes could be converted to use it in a 
backwards compatible manner (old kernels would just ignore the 
property), and you could still have the behavior you want in tests (and 
in theory it could easily be adapted to be a per-subvolume setting if we 
ever get per-subvolume chunk profile support).


Of course, I'd actually like to see most of the mount options available 
as filesystem level properties with the option to override through mount 
options, but that's a lot more ambitious of an undertaking.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/3] btrfs: balance: add kernel log for end or paused

2018-05-17 Thread David Sterba
On Wed, May 16, 2018 at 10:51:28AM +0800, Anand Jain wrote:
> Add a kernel log when the balance ends, either for cancel or completed
> or if it is paused.

Missing S-O-B.

> ---
> v1->v2: Moved from 2/3 to 3/3
> 
>  fs/btrfs/volumes.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index ce68c4f42f94..a4e243a29f5c 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -4053,6 +4053,13 @@ int btrfs_balance(struct btrfs_fs_info *fs_info,
>   ret = __btrfs_balance(fs_info);
>  
>   mutex_lock(_info->balance_mutex);
> + if (ret == -ECANCELED && atomic_read(_info->balance_pause_req))
> + btrfs_info(fs_info, "balance: paused");
> + else if (ret == -ECANCELED && atomic_read(_info->balance_cancel_req))
> + btrfs_info(fs_info, "balance: canceled");
> + else
> + btrfs_info(fs_info, "balance: ended with status: %d", ret);

Wouldn't that repeat the same once again? There are status messages
printed already, when eg. balance is cancelled or paused.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/3] btrfs: balance: prefix kernel logs

2018-05-17 Thread David Sterba
On Wed, May 16, 2018 at 10:51:26AM +0800, Anand Jain wrote:
> Kernel logs are very important for the forensic investigations of the
> issues in general make it easy to use it. This patch adds 'balance:'
> prefix so that it can be easily searched.
> 
> Signed-off-by: Anand Jain 

Added to next, thanks. Please stick to the coding style and add an empty
newlines between declarations and statements and un-indent long printk
strings so they fit 80 columns. I've fixed that for now.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/3] btrfs: balance: add args info during start and resume

2018-05-17 Thread David Sterba
On Wed, May 16, 2018 at 10:51:27AM +0800, Anand Jain wrote:
> Balance args info is an important information to be reviewed for the
> system audit. So this patch adds it to the kernel log.
> 
> Example:
> 
> -> btrfs bal start -dprofiles='raid1|single',convert=raid5 
> -mprofiles='raid1|single',convert=raid5 /btrfs
> 
>  kernel: BTRFS info (device sdb): balance: start data profiles=raid1|single 
> convert=raid5 metadata profiles=raid1|single convert=raid5 system 
> profiles=raid1|single convert=raid5
> 
> -> btrfs bal start -dprofiles=raid5,convert=single 
> -mprofiles='raid1|single',convert=raid5 --background /btrfs
> 
>  kernel: BTRFS info (device sdb): balance: start data profiles=raid5 
> convert=single metadata profiles=raid1|single convert=raid5 system 
> profiles=raid1|single convert=raid5
> 
> Signed-off-by: Anand Jain 

The log message can be useful, however please avoid using unchecked
strcat and rewrite it without it if possible. You can find some
inspiration how to join several strings in describe_relocation().
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: Remove stale comment about select_delayed_ref

2018-05-17 Thread Nikolay Borisov
select_delayed_ref really just gets the next delayed ref which has to
be processed - either an add ref or drop ref. We never go back for
anything. So the comment is actually bogus, just remove it.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/extent-tree.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 795dc533856f..cae33c531a3c 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2705,10 +2705,6 @@ static noinline int __btrfs_run_delayed_refs(struct 
btrfs_trans_handle *trans,
spin_lock(_ref->lock);
btrfs_merge_delayed_refs(trans, delayed_refs, locked_ref);
 
-   /*
-* locked_ref is the head node, so we have to go one
-* node back for any delayed ref updates
-*/
ref = select_delayed_ref(locked_ref);
 
if (ref && ref->seq &&
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


btrfs device replace can cause silent or noisy corruption on compressed NOCOW/NODATASUM

2018-05-17 Thread james harvey
Looks like Qu may have taken care of corrupted compressed data with
NODATASUM from causing causing random kernel memory corruption.

As long as the compressed data was valid and could be uncompressed,
there were no problems, even on data marked NOCOW/NODATASUM.  If the
data being sent to be uncompressed was invalid and failed
decompression, it would sometimes give an I/O error, and sometimes
cause random kernel memory corruption.

I retraced my steps to try to figure out how my data got corrupted in
the first place.  The pattern of corruption didn't make any sense for
this to be hardware related, or a user-caused badly executed dd.

In short, "btrfs device replace" caused it.  When it copies data to a
new drive, and encounters NOCOW/NODATASUM compressed data, it copies
the data in uncompressed form to the new drive, leaving it in
compressed form on the other mirror, leaving it all marked as
compressed.  I don't know how it handles the longer length.  I don't
know if it only writes out the compressed length, if it writes out the
uncompressed length possibly overwriting other data its writing out or
even worse other file extents, etc.

To rule out anything else, I started in a fresh VM with the May 1,
2018, Arch installation ISO.  That's kernel 4.16.5, btrfs-progs 4.16.

Starting with a freshly partitioned disk with (3) 10GB partitions, a
fairly minimal reproducing case with lots of explaining comments can
be read here:

https://pastebin.com/VvNk90Wa

I of course don't know the extent of this.  I don't know all of the
situations where NOCOW/NODATASUM extents are compressed anyway.  In my
real world case, it was journald logs.  We know journald/systemd
submits those for defragmentation.  I haven't verified if it submits
the defragmentation asking for compression.  In my reproducing example
linked above, I had to defragment the file asking for compression to
cause the file to be compressed.  If that's the extent of the bug,
probably lots of journald logs out there that have been through a
replace have corruption, but hopefully no databases.  I don't think
any databases, and not many database administrators, are going to
submit the files for defragmentation with compression.  But, if
compression can be triggered in more situations than this, it's
possible there's a lot of corruption (sometimes silent) out there on
important things like databases.

Obviously, btrfs device replace or something it depends on needs
fixing.  It's above my pay grade on if some type of alert should be
sent out saying not to use replace on btrfs-progs less than a new
version that hasn't come out yet.  Probably depends on how big the
extent of the bug is.

I also submit that even with corrupted compressed data no longer being
submitted for decompression, and even with btrfs device replace soon
being patched, that there should be a way for all NODATASUM data that
is mirrored to have the mirrored copies compared, regardless of if
compression is involved.  I think check or scrub should gain this
functionality.  Obviously, without a checksum, no automatic repair can
happen, but the user can at least be alerted that something is wrong.
As the example will show, if the corruption happens on the mirrored
copy that isn't read, it's silent corruption, unless that good copy
goes bad someday.  Btrfs has a chance to give NODATASUM data extra
protection over other filesystems, somewhere between mirrored copies
just really protecting against a disk failure like most
implementations and like btrfs does with NODATASUM data now, and
between btrfs' checksummed mirrors that guard against bit rot and
one-mirror accidental corruption.

I'd even be interested in writing such an addon to check or scrub, if
it would be accepted, assuming it was written well and worked of
course.  If someone else wants to do it, that's OK too.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 0/3] btrfs: Add three new unprivileged ioctls to allow normal users to call "sub list/show" etc.

2018-05-17 Thread Qu Wenruo


On 2018年05月17日 16:25, Misono Tomohiro wrote:
> On 2018/05/17 15:56, Qu Wenruo wrote:
>>
>>
>> On 2018年05月16日 13:49, Tomohiro Misono wrote:
>>> [based on current misc-next]
>>>
>>> changelog:
>>> v4 -> v5
>>>   - Update error handling of 1st/2nd patch. See each log for details
>>>   - Fix misspelling
>>> v3 -> v4
>>>   - call btrfs_next_leaf() after btrfs_search_slot() when the slot
>>> position exceeds the number of items
>>>   - rebased to current misc-next
>>> v2 -> v3
>>>   - fix kbuild test bot warning
>>> v1 -> v2
>>>   - completely reimplement 1st/2nd ioctl to have user friendly api
>>>   - various cleanup, remove unnecessary goto
>>> ===
>>>
>>> This adds three new unprivileged ioctls:
>>>
>>> 1st patch:
>>>   ioctl which returns subvolume information of ROOT_ITEM and ROOT_BACKREF
>>> 2nd patch:
>>>   ioctl which returns subvolume information of ROOT_REF (without subvolume 
>>> name)
>>
>> First 2 patches looks mostly fine.
>>
>>> 3rd patch: 
>>>   user version of ino_lookup ioctl which also performs permission check.
>>
>> I'm a little concerned about this.
>>
>> What will happen in the following scenario?
>> - Environment is container whose rootfs is a subvolume of btrfs
>> - The root and normal use try to call subvolume list on their rootfs
>>
>> Will it leak the real subvolume layout to the container root/normal user?
>>
>> Or it will leak anyway even without the unprivileged ioctl?
> 
> Hi,
> 
> I'm not sure about container, but these ioctls searches subvolume (fs tree) of
> fd with witch ioctl is called (i.e. the caller needs to open the subvolume 
> first)
> and cannot search arbitrary tree. So, normal user can only get the information
> under the rootfs's subvolume.
> 
> On the other and, root can use TREE_SEARCH/INO_LOOKUP ioctl too which can 
> search
> arbitrary tree and get all info. So, I think root can get real layout.
> 
> Does this answer make sense?

Makes sense now.

Reviewed-by: Qu Wenruo 

Thanks,
Qu

> 
>>
>> Thanks,
>> Qu
>>
>>>
>>> They will be used to implement user version of "subvolume list/show" etc.
>>> in user tools.
>>> See each commit log for more detals.
>>>
>>> The implementation of btrfs-progs can be found in the ML titled as follows: 
>>>   [PATCH 0/11] btrfs-progs: Rework of "subvolume list/show" and relax the 
>>> root privileges of them
>>>
>>> Tomohiro Misono (3):
>>>   btrfs: Add unprivileged ioctl which returns subvolume information
>>>   btrfs: Add unprivileged ioctl which returns subvolume's ROOT_REF
>>>   btrfs: Add unprivileged version of ino_lookup ioctl
>>>
>>>  fs/btrfs/ioctl.c   | 452 
>>> +
>>>  include/uapi/linux/btrfs.h |  84 +
>>>  2 files changed, 536 insertions(+)
>>>
>>
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 3/4] btrfs: lzo: Add header length check to avoid slab out of bounds access

2018-05-17 Thread Qu Wenruo


On 2018年05月17日 16:19, Qu Wenruo wrote:
> 
> 
> On 2018年05月17日 16:14, Nikolay Borisov wrote:
>>
>>
>> On 17.05.2018 09:27, Qu Wenruo wrote:
>>> James Harvey reported that some corrupted compressed extent data can
>>> lead to various kernel memory corruption.
>>>
>>> Such corrupted extent data belongs to inode with NODATASUM flags, thus
>>> data csum won't help us detecting such bug.
>>>
>>> If lucky enough, kasan could catch it like:
>>> ==
>>> BUG: KASAN: slab-out-of-bounds in lzo_decompress_bio+0x384/0x7a0 [btrfs]
>>> Write of size 4096 at addr 8800606cb0f8 by task kworker/u16:0/2338
>>>
>>> CPU: 3 PID: 2338 Comm: kworker/u16:0 Tainted: G   O  
>>> 4.17.0-rc5-custom+ #50
>>> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
>>> Workqueue: btrfs-endio btrfs_endio_helper [btrfs]
>>> Call Trace:
>>>  dump_stack+0xc2/0x16b
>>>  print_address_description+0x6a/0x270
>>>  kasan_report+0x260/0x380
>>>  memcpy+0x34/0x50
>>>  lzo_decompress_bio+0x384/0x7a0 [btrfs]
>>>  end_compressed_bio_read+0x99f/0x10b0 [btrfs]
>>>  bio_endio+0x32e/0x640
>>>  normal_work_helper+0x15a/0xea0 [btrfs]
>>>  process_one_work+0x7e3/0x1470
>>>  worker_thread+0x1b0/0x1170
>>>  kthread+0x2db/0x390
>>>  ret_from_fork+0x22/0x40
>>> ...
>>> ==
>>>
>>> The offending compressed data has the following info:
>>>
>>> Header: length 32768(Looks completely valid)
>>> Segment 0 Header:   length 3472882419   (Obvious out of bounds)
>>>
>>> Then when handling segment 0, since it's over the current page, we need
>>> the compressed data to workspace, then such large size would trigger
>>> out-of-bounds memory access, screwing up the whole kernel.
>>>
>>> Fix it by adding extra checks on header and segment headers to ensure we
>>> won't access out-of-bounds, and even checks the decompressed data won't
>>> be out-of-bounds.
>>>
>>> Reported-by: James Harvey 
>>> Signed-off-by: Qu Wenruo 
>>> ---
>>>  fs/btrfs/lzo.c | 35 ++-
>>>  1 file changed, 34 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/btrfs/lzo.c b/fs/btrfs/lzo.c
>>> index 3d2ae4c08876..78ebc809072f 100644
>>> --- a/fs/btrfs/lzo.c
>>> +++ b/fs/btrfs/lzo.c
>>> @@ -281,6 +281,7 @@ static int lzo_decompress_bio(struct list_head *ws, 
>>> struct compressed_bio *cb)
>>> unsigned long working_bytes;
>>> size_t in_len;
>>> size_t out_len;
>>> +   size_t max_segment_len = lzo1x_worst_compress(PAGE_SIZE);
>>> unsigned long in_offset;
>>> unsigned long in_page_bytes_left;
>>> unsigned long tot_in;
>>> @@ -294,6 +295,18 @@ static int lzo_decompress_bio(struct list_head *ws, 
>>> struct compressed_bio *cb)
>>>  
>>> data_in = kmap(pages_in[0]);
>>> tot_len = read_compress_length(data_in);
>>> +   /*
>>> +* Compressed data header check.
>>> +*
>>> +* The real compressed size can't exceed extent length, and all pages
>>> +* should be used (a full pending page is not possible).
>>> +* If this happens it means the compressed extent is corrupted.
>>> +*/
>>> +   if (tot_len > min_t(size_t, BTRFS_MAX_COMPRESSED, srclen) ||
>>> +   tot_len < srclen - PAGE_SIZE) {
>>> +   ret = -EUCLEAN;
>>> +   goto done;
>>> +   }
>>
>> So tot_len is the compressed size as written in the compressed stream,
>> whereas srclen is the number of bytes on-disk this compressed extent
>> take up (as derived from submit_compressed_extents). Shouldn't those two
>> always be equal, i.e perhaps an assert is in order?
> 
> Nope, and in most case, tot_len is smaller than srclen (extent size).
> 
> The compressed data can have pending zeros, and in fact, for 8K all
> "0xcd" lzo compressed data, tot_len is 100 bytes, and the remaining
> bytes are all pending zeros.
> While compressed extent size is always rounded up to sector size, and in
> that 8K all "0xcd" case, @srclen will be 4K.
> 
> Thanks,
> Qu
> 
>>
>> srclen  comes from the async_extent struct, which in turns is
>> initialized in compress_file_range with the value of "total_compressed",
>> and the value there is actually initialized by
>> btrfs_compress_pages->lzo_compress_pages (that code makes me wanna sing
>> "You spin me right round, baby Right round like a record, baby").

Forgot to mention, that call sites are all *write* sites, not *read* sites.

For read, cb->compressed_len will just be the extent size.

Anyway, the check works fine for both sites.

Thanks,
Qu

>>
>>
>>>  
>>> tot_in = LZO_LEN;
>>> in_offset = LZO_LEN;
>>> @@ -308,6 +321,17 @@ static int lzo_decompress_bio(struct list_head *ws, 
>>> struct compressed_bio *cb)
>>> in_offset += LZO_LEN;
>>> tot_in += LZO_LEN;
>>>  
>>> +   /*
>>> +* Segment header check.
>>> +*
>>> +* The segment 

Re: [PATCH v5 0/3] btrfs: Add three new unprivileged ioctls to allow normal users to call "sub list/show" etc.

2018-05-17 Thread Misono Tomohiro
On 2018/05/17 15:56, Qu Wenruo wrote:
> 
> 
> On 2018年05月16日 13:49, Tomohiro Misono wrote:
>> [based on current misc-next]
>>
>> changelog:
>> v4 -> v5
>>   - Update error handling of 1st/2nd patch. See each log for details
>>   - Fix misspelling
>> v3 -> v4
>>   - call btrfs_next_leaf() after btrfs_search_slot() when the slot
>> position exceeds the number of items
>>   - rebased to current misc-next
>> v2 -> v3
>>   - fix kbuild test bot warning
>> v1 -> v2
>>   - completely reimplement 1st/2nd ioctl to have user friendly api
>>   - various cleanup, remove unnecessary goto
>> ===
>>
>> This adds three new unprivileged ioctls:
>>
>> 1st patch:
>>   ioctl which returns subvolume information of ROOT_ITEM and ROOT_BACKREF
>> 2nd patch:
>>   ioctl which returns subvolume information of ROOT_REF (without subvolume 
>> name)
> 
> First 2 patches looks mostly fine.
> 
>> 3rd patch: 
>>   user version of ino_lookup ioctl which also performs permission check.
> 
> I'm a little concerned about this.
> 
> What will happen in the following scenario?
> - Environment is container whose rootfs is a subvolume of btrfs
> - The root and normal use try to call subvolume list on their rootfs
> 
> Will it leak the real subvolume layout to the container root/normal user?
> 
> Or it will leak anyway even without the unprivileged ioctl?

Hi,

I'm not sure about container, but these ioctls searches subvolume (fs tree) of
fd with witch ioctl is called (i.e. the caller needs to open the subvolume 
first)
and cannot search arbitrary tree. So, normal user can only get the information
under the rootfs's subvolume.

On the other and, root can use TREE_SEARCH/INO_LOOKUP ioctl too which can search
arbitrary tree and get all info. So, I think root can get real layout.

Does this answer make sense?

> 
> Thanks,
> Qu
> 
>>
>> They will be used to implement user version of "subvolume list/show" etc.
>> in user tools.
>> See each commit log for more detals.
>>
>> The implementation of btrfs-progs can be found in the ML titled as follows: 
>>   [PATCH 0/11] btrfs-progs: Rework of "subvolume list/show" and relax the 
>> root privileges of them
>>
>> Tomohiro Misono (3):
>>   btrfs: Add unprivileged ioctl which returns subvolume information
>>   btrfs: Add unprivileged ioctl which returns subvolume's ROOT_REF
>>   btrfs: Add unprivileged version of ino_lookup ioctl
>>
>>  fs/btrfs/ioctl.c   | 452 
>> +
>>  include/uapi/linux/btrfs.h |  84 +
>>  2 files changed, 536 insertions(+)
>>
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] btrfs: lzo: Add header length check to avoid slab out of bounds access

2018-05-17 Thread Qu Wenruo


On 2018年05月17日 16:14, Nikolay Borisov wrote:
> 
> 
> On 17.05.2018 09:27, Qu Wenruo wrote:
>> James Harvey reported that some corrupted compressed extent data can
>> lead to various kernel memory corruption.
>>
>> Such corrupted extent data belongs to inode with NODATASUM flags, thus
>> data csum won't help us detecting such bug.
>>
>> If lucky enough, kasan could catch it like:
>> ==
>> BUG: KASAN: slab-out-of-bounds in lzo_decompress_bio+0x384/0x7a0 [btrfs]
>> Write of size 4096 at addr 8800606cb0f8 by task kworker/u16:0/2338
>>
>> CPU: 3 PID: 2338 Comm: kworker/u16:0 Tainted: G   O  
>> 4.17.0-rc5-custom+ #50
>> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
>> Workqueue: btrfs-endio btrfs_endio_helper [btrfs]
>> Call Trace:
>>  dump_stack+0xc2/0x16b
>>  print_address_description+0x6a/0x270
>>  kasan_report+0x260/0x380
>>  memcpy+0x34/0x50
>>  lzo_decompress_bio+0x384/0x7a0 [btrfs]
>>  end_compressed_bio_read+0x99f/0x10b0 [btrfs]
>>  bio_endio+0x32e/0x640
>>  normal_work_helper+0x15a/0xea0 [btrfs]
>>  process_one_work+0x7e3/0x1470
>>  worker_thread+0x1b0/0x1170
>>  kthread+0x2db/0x390
>>  ret_from_fork+0x22/0x40
>> ...
>> ==
>>
>> The offending compressed data has the following info:
>>
>> Header:  length 32768(Looks completely valid)
>> Segment 0 Header:length 3472882419   (Obvious out of bounds)
>>
>> Then when handling segment 0, since it's over the current page, we need
>> the compressed data to workspace, then such large size would trigger
>> out-of-bounds memory access, screwing up the whole kernel.
>>
>> Fix it by adding extra checks on header and segment headers to ensure we
>> won't access out-of-bounds, and even checks the decompressed data won't
>> be out-of-bounds.
>>
>> Reported-by: James Harvey 
>> Signed-off-by: Qu Wenruo 
>> ---
>>  fs/btrfs/lzo.c | 35 ++-
>>  1 file changed, 34 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/lzo.c b/fs/btrfs/lzo.c
>> index 3d2ae4c08876..78ebc809072f 100644
>> --- a/fs/btrfs/lzo.c
>> +++ b/fs/btrfs/lzo.c
>> @@ -281,6 +281,7 @@ static int lzo_decompress_bio(struct list_head *ws, 
>> struct compressed_bio *cb)
>>  unsigned long working_bytes;
>>  size_t in_len;
>>  size_t out_len;
>> +size_t max_segment_len = lzo1x_worst_compress(PAGE_SIZE);
>>  unsigned long in_offset;
>>  unsigned long in_page_bytes_left;
>>  unsigned long tot_in;
>> @@ -294,6 +295,18 @@ static int lzo_decompress_bio(struct list_head *ws, 
>> struct compressed_bio *cb)
>>  
>>  data_in = kmap(pages_in[0]);
>>  tot_len = read_compress_length(data_in);
>> +/*
>> + * Compressed data header check.
>> + *
>> + * The real compressed size can't exceed extent length, and all pages
>> + * should be used (a full pending page is not possible).
>> + * If this happens it means the compressed extent is corrupted.
>> + */
>> +if (tot_len > min_t(size_t, BTRFS_MAX_COMPRESSED, srclen) ||
>> +tot_len < srclen - PAGE_SIZE) {
>> +ret = -EUCLEAN;
>> +goto done;
>> +}
> 
> So tot_len is the compressed size as written in the compressed stream,
> whereas srclen is the number of bytes on-disk this compressed extent
> take up (as derived from submit_compressed_extents). Shouldn't those two
> always be equal, i.e perhaps an assert is in order?

Nope, and in most case, tot_len is smaller than srclen (extent size).

The compressed data can have pending zeros, and in fact, for 8K all
"0xcd" lzo compressed data, tot_len is 100 bytes, and the remaining
bytes are all pending zeros.
While compressed extent size is always rounded up to sector size, and in
that 8K all "0xcd" case, @srclen will be 4K.

Thanks,
Qu

> 
> srclen  comes from the async_extent struct, which in turns is
> initialized in compress_file_range with the value of "total_compressed",
> and the value there is actually initialized by
> btrfs_compress_pages->lzo_compress_pages (that code makes me wanna sing
> "You spin me right round, baby Right round like a record, baby").
> 
> 
>>  
>>  tot_in = LZO_LEN;
>>  in_offset = LZO_LEN;
>> @@ -308,6 +321,17 @@ static int lzo_decompress_bio(struct list_head *ws, 
>> struct compressed_bio *cb)
>>  in_offset += LZO_LEN;
>>  tot_in += LZO_LEN;
>>  
>> +/*
>> + * Segment header check.
>> + *
>> + * The segment length must not exceed max lzo compression
>> + * size, nor the total compressed size
>> + */
>> +if (in_len > max_segment_len || tot_in + in_len > tot_len) {
>> +ret = -EUCLEAN;
>> +goto done;
>> +}
>> +
>>  tot_in += in_len;
>>  

Re: [PATCH 3/4] btrfs: lzo: Add header length check to avoid slab out of bounds access

2018-05-17 Thread Nikolay Borisov


On 17.05.2018 09:27, Qu Wenruo wrote:
> James Harvey reported that some corrupted compressed extent data can
> lead to various kernel memory corruption.
> 
> Such corrupted extent data belongs to inode with NODATASUM flags, thus
> data csum won't help us detecting such bug.
> 
> If lucky enough, kasan could catch it like:
> ==
> BUG: KASAN: slab-out-of-bounds in lzo_decompress_bio+0x384/0x7a0 [btrfs]
> Write of size 4096 at addr 8800606cb0f8 by task kworker/u16:0/2338
> 
> CPU: 3 PID: 2338 Comm: kworker/u16:0 Tainted: G   O  
> 4.17.0-rc5-custom+ #50
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> Workqueue: btrfs-endio btrfs_endio_helper [btrfs]
> Call Trace:
>  dump_stack+0xc2/0x16b
>  print_address_description+0x6a/0x270
>  kasan_report+0x260/0x380
>  memcpy+0x34/0x50
>  lzo_decompress_bio+0x384/0x7a0 [btrfs]
>  end_compressed_bio_read+0x99f/0x10b0 [btrfs]
>  bio_endio+0x32e/0x640
>  normal_work_helper+0x15a/0xea0 [btrfs]
>  process_one_work+0x7e3/0x1470
>  worker_thread+0x1b0/0x1170
>  kthread+0x2db/0x390
>  ret_from_fork+0x22/0x40
> ...
> ==
> 
> The offending compressed data has the following info:
> 
> Header:   length 32768(Looks completely valid)
> Segment 0 Header: length 3472882419   (Obvious out of bounds)
> 
> Then when handling segment 0, since it's over the current page, we need
> the compressed data to workspace, then such large size would trigger
> out-of-bounds memory access, screwing up the whole kernel.
> 
> Fix it by adding extra checks on header and segment headers to ensure we
> won't access out-of-bounds, and even checks the decompressed data won't
> be out-of-bounds.
> 
> Reported-by: James Harvey 
> Signed-off-by: Qu Wenruo 
> ---
>  fs/btrfs/lzo.c | 35 ++-
>  1 file changed, 34 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/lzo.c b/fs/btrfs/lzo.c
> index 3d2ae4c08876..78ebc809072f 100644
> --- a/fs/btrfs/lzo.c
> +++ b/fs/btrfs/lzo.c
> @@ -281,6 +281,7 @@ static int lzo_decompress_bio(struct list_head *ws, 
> struct compressed_bio *cb)
>   unsigned long working_bytes;
>   size_t in_len;
>   size_t out_len;
> + size_t max_segment_len = lzo1x_worst_compress(PAGE_SIZE);
>   unsigned long in_offset;
>   unsigned long in_page_bytes_left;
>   unsigned long tot_in;
> @@ -294,6 +295,18 @@ static int lzo_decompress_bio(struct list_head *ws, 
> struct compressed_bio *cb)
>  
>   data_in = kmap(pages_in[0]);
>   tot_len = read_compress_length(data_in);
> + /*
> +  * Compressed data header check.
> +  *
> +  * The real compressed size can't exceed extent length, and all pages
> +  * should be used (a full pending page is not possible).
> +  * If this happens it means the compressed extent is corrupted.
> +  */
> + if (tot_len > min_t(size_t, BTRFS_MAX_COMPRESSED, srclen) ||
> + tot_len < srclen - PAGE_SIZE) {
> + ret = -EUCLEAN;
> + goto done;
> + }

So tot_len is the compressed size as written in the compressed stream,
whereas srclen is the number of bytes on-disk this compressed extent
take up (as derived from submit_compressed_extents). Shouldn't those two
always be equal, i.e perhaps an assert is in order?

srclen  comes from the async_extent struct, which in turns is
initialized in compress_file_range with the value of "total_compressed",
and the value there is actually initialized by
btrfs_compress_pages->lzo_compress_pages (that code makes me wanna sing
"You spin me right round, baby Right round like a record, baby").


>  
>   tot_in = LZO_LEN;
>   in_offset = LZO_LEN;
> @@ -308,6 +321,17 @@ static int lzo_decompress_bio(struct list_head *ws, 
> struct compressed_bio *cb)
>   in_offset += LZO_LEN;
>   tot_in += LZO_LEN;
>  
> + /*
> +  * Segment header check.
> +  *
> +  * The segment length must not exceed max lzo compression
> +  * size, nor the total compressed size
> +  */
> + if (in_len > max_segment_len || tot_in + in_len > tot_len) {
> + ret = -EUCLEAN;
> + goto done;
> + }
> +
>   tot_in += in_len;
>   working_bytes = in_len;
>   may_late_unmap = need_unmap = false;
> @@ -358,7 +382,7 @@ static int lzo_decompress_bio(struct list_head *ws, 
> struct compressed_bio *cb)
>   }
>   }
>  
> - out_len = lzo1x_worst_compress(PAGE_SIZE);
> + out_len = max_segment_len;
>   ret = lzo1x_decompress_safe(buf, in_len, workspace->buf,
>   _len);
>   if (need_unmap)
> @@ -368,6 

Re: [PATCH 2/4] btrfs: lzo: Add comment about the how btrfs records its lzo compressed data

2018-05-17 Thread Qu Wenruo


On 2018年05月17日 15:48, Nikolay Borisov wrote:
> 
> 
> On 17.05.2018 09:27, Qu Wenruo wrote:
>> Signed-off-by: Qu Wenruo 
> 
> Overall it looks good and useful just a couple of nits below.
>> ---
>>  fs/btrfs/lzo.c | 23 +++
>>  1 file changed, 23 insertions(+)
>>
>> diff --git a/fs/btrfs/lzo.c b/fs/btrfs/lzo.c
>> index 0667ea07f766..3d2ae4c08876 100644
>> --- a/fs/btrfs/lzo.c
>> +++ b/fs/btrfs/lzo.c
>> @@ -17,6 +17,29 @@
>>  
>>  #define LZO_LEN 4
>>  
>> +/*
>> + * Btrfs LZO compression format
>> + *
>> + * Regular LZO compressed data extent is consist of:
> nit: s/is consist/consists
>> + * 1.  Header
>> + * Fixed size. LZO_LEN (4) bytes long, LE16.
>> + * Records the total size (*includes* the header) of real compressed 
>> data.
>> + *
>> + * 2.  Segment(s)
>> + * Variable size. Includes one segment header, and then data payload.
>> + * One btrfs compressed data can have one or more segments.
> 
> So "one btrfs compressed data" should really mean "one btrfs compressed
> extent" I guess?

Oh, yes, one btrfs compressed extent.

Thanks,
Qu

> 
>> + *
>> + * 2.1 Segment header
>> + * Fixed size. LZO_LEN (4) bytes long, LE16.
>> + * Records the total size of the segment (*excludes* the header).
>> + *
>> + * 2.2 Data Payload
>> + * Variable size. Size up limit should be 
>> lzo1x_worst_compress(PAGE_SIZE).
>> + *
>> + * While for inlined LZO compressed data extent, it doesn't have Header, 
>> just
>> + * one Segment.
> 
> 
> 
>> + */
>> +
>>  struct workspace {
>>  void *mem;
>>  void *buf;  /* where decompressed data goes */
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] btrfs: lzo: Add comment about the how btrfs records its lzo compressed data

2018-05-17 Thread Nikolay Borisov


On 17.05.2018 09:27, Qu Wenruo wrote:
> Signed-off-by: Qu Wenruo 

Overall it looks good and useful just a couple of nits below.
> ---
>  fs/btrfs/lzo.c | 23 +++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/fs/btrfs/lzo.c b/fs/btrfs/lzo.c
> index 0667ea07f766..3d2ae4c08876 100644
> --- a/fs/btrfs/lzo.c
> +++ b/fs/btrfs/lzo.c
> @@ -17,6 +17,29 @@
>  
>  #define LZO_LEN  4
>  
> +/*
> + * Btrfs LZO compression format
> + *
> + * Regular LZO compressed data extent is consist of:
nit: s/is consist/consists
> + * 1.  Header
> + * Fixed size. LZO_LEN (4) bytes long, LE16.
> + * Records the total size (*includes* the header) of real compressed 
> data.
> + *
> + * 2.  Segment(s)
> + * Variable size. Includes one segment header, and then data payload.
> + * One btrfs compressed data can have one or more segments.

So "one btrfs compressed data" should really mean "one btrfs compressed
extent" I guess?

> + *
> + * 2.1 Segment header
> + * Fixed size. LZO_LEN (4) bytes long, LE16.
> + * Records the total size of the segment (*excludes* the header).
> + *
> + * 2.2 Data Payload
> + * Variable size. Size up limit should be 
> lzo1x_worst_compress(PAGE_SIZE).
> + *
> + * While for inlined LZO compressed data extent, it doesn't have Header, just
> + * one Segment.



> + */
> +
>  struct workspace {
>   void *mem;
>   void *buf;  /* where decompressed data goes */
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] btrfs: compression: Add linux/sizes.h for compression.h

2018-05-17 Thread Nikolay Borisov


On 17.05.2018 09:27, Qu Wenruo wrote:
> Since compression.h is using SZ_* macros, and if some user only includes
> compression.h without linux/sizes.h, it will cause compile error.
> 
> One example is lzo.c, if it uses BTRFS_MAX_COMPRESSED, it would cause
> compile error.
> 
> Fix it by adding linux/sizes.h in compression.h
> 
> Signed-off-by: Qu Wenruo 

Reviewed-by: Nikolay Borisov 

> ---
>  fs/btrfs/compression.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
> index cc605f7b23fb..317703d9b073 100644
> --- a/fs/btrfs/compression.h
> +++ b/fs/btrfs/compression.h
> @@ -6,6 +6,7 @@
>  #ifndef BTRFS_COMPRESSION_H
>  #define BTRFS_COMPRESSION_H
>  
> +#include 
>  /*
>   * We want to make sure that amount of RAM required to uncompress an extent 
> is
>   * reasonable, so we limit the total size in ram of a compressed extent to
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] btrfs: fix BUG trying to resume balance without resume flag

2018-05-17 Thread Anand Jain



On 05/16/2018 10:35 PM, David Sterba wrote:

On Mon, Apr 30, 2018 at 05:48:45PM +0800, Anand Jain wrote:

We set the BTRFS_BALANCE_RESUME flag in the btrfs_recover_balance(),
which is not called during the remount. So when resuming from the
paused balance we hit BUG.

  kernel: kernel BUG at fs/btrfs/volumes.c:3890!
  ::
  kernel:  balance_kthread+0x51/0x60 [btrfs]
  kernel:  kthread+0x111/0x130
  ::
  kernel: RIP: btrfs_balance+0x12e1/0x1570 [btrfs] RSP: ba7d0090bde8

Reproducer:
   On a mounted BTRFS.

   btrfs balance start --full-balance /btrfs
   btrfs balance pause /btrfs
   mount -o remount,ro /dev/sdb /btrfs
   mount -o remount,rw /dev/sdb /btrfs

To fix this set the BTRFS_BALANCE_RESUME flag in btrfs_resume_balance_async()
instead of btrfs_recover_balance().

Signed-off-by: Anand Jain 
---
v1->v2: btrfs_resume_balance_async() can be called only from remount or
mount, we don't need to hold fs_info->balance_lock.


For mount it's ok, there's nothing that can run in parallel, but remount
ro->rw that resumes the balance can be run with the ioctl that checks
balance status in parallel, at any time. As the fs_info::balance_ctl is
set up, btrfs_ioctl_balance_progress will proceed and return the current
status.


 You are right.


Strictly speaking we should rather keep the balance at the paused state
unless it is resumed by the user again, that means neither mount nor
remount-rw should resume the balance automatically, former case needs
writing balance status to the disk. Which needs compatibility
verification.  So for now just avoid BUG.

  fs/btrfs/volumes.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 3e6983a169c4..64bcaf25908b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4115,6 +4115,8 @@ int btrfs_resume_balance_async(struct btrfs_fs_info 
*fs_info)
return 0;
}
  
+	fs_info->balance_ctl->b_flags |= BTRFS_BALANCE_RESUME;


Though there's no update operation on flags, I think it's still better
to add the locks that set the resume status. And a comment why it's
here.


ok.


+
tsk = kthread_run(balance_kthread, fs_info, "btrfs-balance");
return PTR_ERR_OR_ZERO(tsk);
  }
@@ -4156,7 +4158,6 @@ int btrfs_recover_balance(struct btrfs_fs_info *fs_info)
  
  	bctl->fs_info = fs_info;

bctl->flags = btrfs_balance_flags(leaf, item);
-   bctl->flags |= BTRFS_BALANCE_RESUME;


Also, it does not hurt to leave this here, as it logically matches what
the function does: we found the balance item, thus we set the status
accordingly.


ok.


Please update and resend, I'd like to push this to 4.17-rc as it's a
crash fix with a reproducer. Thanks.


Sent v3 with the above changes.

Thanks, Anand

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3] btrfs: fix BUG trying to resume balance without resume flag

2018-05-17 Thread Anand Jain
We set the BTRFS_BALANCE_RESUME flag in the btrfs_recover_balance()
only, which isn't called during the remount. So when resuming from
the paused balance we hit the BUG.

 kernel: kernel BUG at fs/btrfs/volumes.c:3890!
 ::
 kernel:  balance_kthread+0x51/0x60 [btrfs]
 kernel:  kthread+0x111/0x130
 ::
 kernel: RIP: btrfs_balance+0x12e1/0x1570 [btrfs] RSP: ba7d0090bde8

Reproducer:
  On a mounted BTRFS.

  btrfs balance start --full-balance /btrfs
  btrfs balance pause /btrfs
  mount -o remount,ro /dev/sdb /btrfs
  mount -o remount,rw /dev/sdb /btrfs

To fix this set the BTRFS_BALANCE_RESUME flag in btrfs_resume_balance_async().

Signed-off-by: Anand Jain 
---
v2->v3: . Hold fs_info->balance_lock because in remount context there can
be more the one thread accessing the balance_ctl.
. Don't remove the BTRFS_BALANCE_RESUME set at recover_balance
as in the original code.
. Add comments.
v1->v2: btrfs_resume_balance_async() can be called only from remount or
mount, we don't need to hold fs_info->balance_lock.

 fs/btrfs/volumes.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 40e4259bdd51..c68976856d87 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4112,6 +4112,15 @@ int btrfs_resume_balance_async(struct btrfs_fs_info 
*fs_info)
return 0;
}
 
+   /*
+* A remount ro->rw sequence should continue with the
+* paused balance irrespective of who pauses it, system or
+* the user as of now, so set the resume flag.
+*/
+   spin_lock(_info->balance_lock);
+   fs_info->balance_ctl->flags |= BTRFS_BALANCE_RESUME;
+   spin_unlock(_info->balance_lock);
+
tsk = kthread_run(balance_kthread, fs_info, "btrfs-balance");
return PTR_ERR_OR_ZERO(tsk);
 }
-- 
2.7.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] btrfs: Skip some btrfs_cross_ref_exist() check in nocow path

2018-05-17 Thread Ethan Lien
In nocow path, we check if the extent is snapshotted in
btrfs_cross_ref_exist(). We can do the similar check earlier and avoid
unnecessary search into extent tree.

A fio test on a Intel D-1531, 16GB RAM, SSD RAID-5 machine as follows:

[global]
group_reporting
time_based
thread=1
ioengine=libaio
bs=4k
iodepth=32
size=64G
runtime=180
numjobs=8
rw=randwrite

[file1]
filename=/mnt/nocow/testfile

IOPS result:   unpatched patched

1 fio round: 4667046958
snapshot
2 fio round: 5182654498
3 fio round: 5976761289

After snapshot, the first fio get about 5% performance gain. As we
continually write to the same file, all writes will resume to nocow mode
and eventually we have no performance gain.

Signed-off-by: Ethan Lien 
---

V2:
 Add comment and performance test.

 fs/btrfs/inode.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d241285a0d2a..177630337108 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1373,6 +1373,13 @@ static noinline int run_delalloc_nocow(struct inode 
*inode,
btrfs_file_extent_encryption(leaf, fi) ||
btrfs_file_extent_other_encoding(leaf, fi))
goto out_check;
+   /*
+* We can skip the checking of generation of
+* extent item in btrfs_cross_ref_exist().
+*/
+   if (btrfs_file_extent_generation(leaf, fi) <=
+   btrfs_root_last_snapshot(>root_item))
+   goto out_check;
if (extent_type == BTRFS_FILE_EXTENT_REG && !force)
goto out_check;
if (btrfs_extent_readonly(fs_info, disk_bytenr))
@@ -7368,6 +7375,14 @@ noinline int can_nocow_extent(struct inode *inode, u64 
offset, u64 *len,
btrfs_file_extent_other_encoding(leaf, fi))
goto out;
 
+   /*
+* We can skip the checking of generation of
+* extent item in btrfs_cross_ref_exist().
+*/
+   if (btrfs_file_extent_generation(leaf, fi) <=
+   btrfs_root_last_snapshot(>root_item))
+   goto out;
+
backref_offset = btrfs_file_extent_offset(leaf, fi);
 
if (orig_start) {
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 0/3] btrfs: Add three new unprivileged ioctls to allow normal users to call "sub list/show" etc.

2018-05-17 Thread Qu Wenruo


On 2018年05月16日 13:49, Tomohiro Misono wrote:
> [based on current misc-next]
> 
> changelog:
> v4 -> v5
>   - Update error handling of 1st/2nd patch. See each log for details
>   - Fix misspelling
> v3 -> v4
>   - call btrfs_next_leaf() after btrfs_search_slot() when the slot
> position exceeds the number of items
>   - rebased to current misc-next
> v2 -> v3
>   - fix kbuild test bot warning
> v1 -> v2
>   - completely reimplement 1st/2nd ioctl to have user friendly api
>   - various cleanup, remove unnecessary goto
> ===
> 
> This adds three new unprivileged ioctls:
> 
> 1st patch:
>   ioctl which returns subvolume information of ROOT_ITEM and ROOT_BACKREF
> 2nd patch:
>   ioctl which returns subvolume information of ROOT_REF (without subvolume 
> name)

First 2 patches looks mostly fine.

> 3rd patch: 
>   user version of ino_lookup ioctl which also performs permission check.

I'm a little concerned about this.

What will happen in the following scenario?
- Environment is container whose rootfs is a subvolume of btrfs
- The root and normal use try to call subvolume list on their rootfs

Will it leak the real subvolume layout to the container root/normal user?

Or it will leak anyway even without the unprivileged ioctl?

Thanks,
Qu

> 
> They will be used to implement user version of "subvolume list/show" etc.
> in user tools.
> See each commit log for more detals.
> 
> The implementation of btrfs-progs can be found in the ML titled as follows: 
>   [PATCH 0/11] btrfs-progs: Rework of "subvolume list/show" and relax the 
> root privileges of them
> 
> Tomohiro Misono (3):
>   btrfs: Add unprivileged ioctl which returns subvolume information
>   btrfs: Add unprivileged ioctl which returns subvolume's ROOT_REF
>   btrfs: Add unprivileged version of ino_lookup ioctl
> 
>  fs/btrfs/ioctl.c   | 452 
> +
>  include/uapi/linux/btrfs.h |  84 +
>  2 files changed, 536 insertions(+)
> 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v5 2/3] btrfs: Add unprivileged ioctl which returns subvolume's ROOT_REF

2018-05-17 Thread Qu Wenruo


On 2018年05月16日 13:49, Tomohiro Misono wrote:
> Add unprivileged ioctl BTRFS_IOC_GET_SUBVOL_ROOTREF which
> returns ROOT_REF information of the subvolume containing this inode
> except the subvolume name (this is because to prevent potential name
> leak). The subvolume name will be gained by user version of ino_lookup
> ioctl (BTRFS_IOC_INO_LOOKUP_USER) which also performs permission check.
> 
> The min id of root ref's subvolume to be searched is specified by
> @min_id in struct btrfs_ioctl_get_subvol_rootref_args. After the search
> ends, @min_id is set to the last searched root ref's subvolid + 1. Also,
> if there are more root refs than BTRFS_MAX_ROOTREF_BUFFER_NUM, -EOVERFLOW
> is returned. Therefore the caller can just call this ioctl again without
> changing the argument to continue search.
> 
> Signed-off-by: Tomohiro Misono 

Looks good.

Reviewed-by: Qu Wenruo 

Just a little off-topic nitpick below, no need to address in this patch.

> ---
>  v4 -> v5
> - Update error handling of btrfs_next_leaf() to cover all cases
> - Use btrfs_next_item() to reduce the call of btrfs_search_slot()
> 
>  fs/btrfs/ioctl.c   | 102 
> +
>  include/uapi/linux/btrfs.h |  16 +++
>  2 files changed, 118 insertions(+)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index c1c9ae9a937d..db5de77540e1 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -2386,6 +2386,106 @@ static noinline int 
> btrfs_ioctl_get_subvol_info(struct file *file,
>   return ret;
>  }
>  
> +/*
> + * Return ROOT_REF information of the subvolume containing this inode
> + * except the subvolume name.
> + */
> +static noinline int btrfs_ioctl_get_subvol_rootref(struct file *file,
> +void __user *argp)
> +{
> + struct btrfs_ioctl_get_subvol_rootref_args *rootrefs;
> + struct btrfs_root_ref *rref;
> + struct btrfs_root *root;
> + struct btrfs_path *path;
> + struct btrfs_key key;
> +
> + struct extent_buffer *l;
> + int slot;
> +
> + struct inode *inode;
> + int ret;
> + u64 objectid;
> + u8 found;
> +
> + path = btrfs_alloc_path();
> + if (!path)
> + return -ENOMEM;
> +
> + rootrefs = memdup_user(argp, sizeof(*rootrefs));
> + if (!rootrefs) {
> + btrfs_free_path(path);
> + return -ENOMEM;
> + }
> +
> + inode = file_inode(file);
> + root = BTRFS_I(inode)->root->fs_info->tree_root;
> + objectid = BTRFS_I(inode)->root->root_key.objectid;
> +
> + key.objectid = objectid;
> + key.type = BTRFS_ROOT_REF_KEY;
> + key.offset = rootrefs->min_id;
> + found = 0;
> +
> + ret = btrfs_search_slot(NULL, root, , path, 0, 0);
> + if (ret < 0) {
> + goto out;
> + } else if (path->slots[0] >=
> + btrfs_header_nritems(path->nodes[0])) {
> + ret = btrfs_next_leaf(root, path);
> + if (ret < 0) {
> + goto out;
> + } else if (ret > 0) {
> + ret = -ENOENT;
> + goto out;
> + }
> + }
> + while (1) {
> + l = path->nodes[0];
> + slot = path->slots[0];
> +
> + btrfs_item_key_to_cpu(l, , slot);
> + if (key.objectid != objectid ||
> + key.type != BTRFS_ROOT_REF_KEY) {
> + ret = 0;
> + goto out;
> + }
> +
> + if (found == BTRFS_MAX_ROOTREF_BUFFER_NUM) {
> + ret = -EOVERFLOW;
> + goto out;
> + }
> +
> + rref = btrfs_item_ptr(l, slot, struct btrfs_root_ref);
> + rootrefs->rootref[found].subvolid = key.offset;
> + rootrefs->rootref[found].dirid =
> +   btrfs_root_ref_dirid(l, rref);
> + found++;
> +
> + ret = btrfs_next_item(root, path);
> + if (ret < 0) {
> + goto out;
> + } else if (ret > 0) {
> + ret = -ENOENT;
> + goto out;
> + }
> + }
> +
> +out:
> + if (!ret || ret == -EOVERFLOW) {
> + rootrefs->num_items = found;
> + /* update min_id for next search */
> + if (found)
> + rootrefs->min_id =
> + rootrefs->rootref[found - 1].subvolid + 1;
> + if (copy_to_user(argp, rootrefs, sizeof(*rootrefs)))
> + ret = -EFAULT;
> + }
> +
> + btrfs_free_path(path);
> + kfree(rootrefs);
> + return ret;
> +}
> +
>  static noinline int btrfs_ioctl_snap_destroy(struct file *file,
>void __user *arg)
>  {
> @@ -5520,6 +5620,8 @@ long btrfs_ioctl(struct file *file, unsigned int
>   return 

Re: [PATCH v5 1/3] btrfs: Add unprivileged ioctl which returns subvolume information

2018-05-17 Thread Qu Wenruo


On 2018年05月16日 13:49, Tomohiro Misono wrote:
> Add new unprivileged ioctl BTRFS_IOC_GET_SUBVOL_INFO which returns
> the information of subvolume containing this inode.
> (i.e. returns the information in ROOT_ITEM and ROOT_BACKREF.)
> 
> Signed-off-by: Tomohiro Misono 
> ---
>  v4 -> v5
> - Update error handling of btrfs_next_leaf() to cover all cases
> - Return error if ROOT_BACKREF is not found (except top-level)
> 
>  fs/btrfs/ioctl.c   | 146 
> +
>  include/uapi/linux/btrfs.h |  51 
>  2 files changed, 197 insertions(+)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 48e2ddff32bd..c1c9ae9a937d 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -2242,6 +2242,150 @@ static noinline int btrfs_ioctl_ino_lookup(struct 
> file *file,
>   return ret;
>  }
>  
> +/* Get the subvolume information in BTRFS_ROOT_ITEM and BTRFS_ROOT_BACKREF */
> +static noinline int btrfs_ioctl_get_subvol_info(struct file *file,
> +void __user *argp)
> +{
> + struct btrfs_ioctl_get_subvol_info_args *subvol_info;
> + struct btrfs_root *root;
> + struct btrfs_path *path;
> + struct btrfs_key key;
> +
> + struct btrfs_root_item root_item;
> + struct btrfs_root_ref *rref;
> + struct extent_buffer *l;
> + int slot;
> +
> + unsigned long item_off;
> + unsigned long item_len;
> +
> + struct inode *inode;
> + int ret;
> +
> + path = btrfs_alloc_path();
> + if (!path)
> + return -ENOMEM;
> +
> + subvol_info = kzalloc(sizeof(*subvol_info), GFP_KERNEL);
> + if (!subvol_info) {
> + btrfs_free_path(path);
> + return -ENOMEM;
> + }
> +
> + inode = file_inode(file);
> + root = BTRFS_I(inode)->root->fs_info->tree_root;
> +
> + key.objectid = BTRFS_I(inode)->root->root_key.objectid;
> + key.type = BTRFS_ROOT_ITEM_KEY;
> + key.offset = 0;
> +
> + ret = btrfs_search_slot(NULL, root, , path, 0, 0);

What about just using btrfs_get_fs_root()?

It would save several lines, and have better error handling along with
better tree cache.

Thanks,
Qu

> + if (ret < 0) {
> + goto out;
> + } else if (ret > 0) {
> + u64 objectid = key.objectid;
> +
> + if (path->slots[0] >= btrfs_header_nritems(path->nodes[0])) {
> + ret = btrfs_next_leaf(root, path);
> + if (ret < 0) {
> + goto out;
> + } else if (ret > 0) {
> + ret = -ENOENT;
> + goto out;
> + }
> + }
> +
> + /* If the subvolume is a snapshot, offset is not zero */
> + btrfs_item_key_to_cpu(path->nodes[0], , path->slots[0]);
> + if (key.objectid != objectid ||
> + key.type != BTRFS_ROOT_ITEM_KEY) {
> + ret = -ENOENT;
> + goto out;
> + }
> + }
> +
> + l = path->nodes[0];
> + slot = path->slots[0];
> + item_off = btrfs_item_ptr_offset(l, slot);
> + item_len = btrfs_item_size_nr(l, slot);
> + read_extent_buffer(l, _item, item_off, item_len);
> +
> + subvol_info->id = key.objectid;
> +
> + subvol_info->generation = btrfs_root_generation(_item);
> + subvol_info->flags = btrfs_root_flags(_item);
> +
> + memcpy(subvol_info->uuid, root_item.uuid, BTRFS_UUID_SIZE);
> + memcpy(subvol_info->parent_uuid, root_item.parent_uuid,
> + BTRFS_UUID_SIZE);
> + memcpy(subvol_info->received_uuid, root_item.received_uuid,
> + BTRFS_UUID_SIZE);
> +
> + subvol_info->ctransid = btrfs_root_ctransid(_item);
> + subvol_info->ctime.sec = btrfs_stack_timespec_sec(_item.ctime);
> + subvol_info->ctime.nsec = btrfs_stack_timespec_nsec(_item.ctime);
> +
> + subvol_info->otransid = btrfs_root_otransid(_item);
> + subvol_info->otime.sec = btrfs_stack_timespec_sec(_item.otime);
> + subvol_info->otime.nsec = btrfs_stack_timespec_nsec(_item.otime);
> +
> + subvol_info->stransid = btrfs_root_stransid(_item);
> + subvol_info->stime.sec = btrfs_stack_timespec_sec(_item.stime);
> + subvol_info->stime.nsec = btrfs_stack_timespec_nsec(_item.stime);
> +
> + subvol_info->rtransid = btrfs_root_rtransid(_item);
> + subvol_info->rtime.sec = btrfs_stack_timespec_sec(_item.rtime);
> + subvol_info->rtime.nsec = btrfs_stack_timespec_nsec(_item.rtime);
> +
> + btrfs_release_path(path);
> + if (key.objectid != BTRFS_FS_TREE_OBJECTID) {
> + key.type = BTRFS_ROOT_BACKREF_KEY;
> + key.offset = 0;
> + ret = btrfs_search_slot(NULL, root, , path, 0, 0);
> + if (ret < 0) {
> + goto out;
> + 

RE: [PATCH v5 2/3] btrfs: Add unprivileged ioctl which returns subvolume's ROOT_REF

2018-05-17 Thread Gu, Jinxiang
Hi,

> -Original Message-
> From: linux-btrfs-ow...@vger.kernel.org 
> [mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of Tomohiro Misono
> Sent: Wednesday, May 16, 2018 1:50 PM
> To: linux-btrfs@vger.kernel.org
> Subject: [PATCH v5 2/3] btrfs: Add unprivileged ioctl which returns 
> subvolume's ROOT_REF
> 
> Add unprivileged ioctl BTRFS_IOC_GET_SUBVOL_ROOTREF which returns ROOT_REF 
> information of the subvolume containing this inode
> except the subvolume name (this is because to prevent potential name leak). 
> The subvolume name will be gained by user version of
> ino_lookup ioctl (BTRFS_IOC_INO_LOOKUP_USER) which also performs permission 
> check.
> 
> The min id of root ref's subvolume to be searched is specified by @min_id in 
> struct btrfs_ioctl_get_subvol_rootref_args. After the search
> ends, @min_id is set to the last searched root ref's subvolid + 1. Also, if 
> there are more root refs than
> BTRFS_MAX_ROOTREF_BUFFER_NUM, -EOVERFLOW is returned. Therefore the caller 
> can just call this ioctl again without changing the
> argument to continue search.
> 
> Signed-off-by: Tomohiro Misono 
> ---
>  v4 -> v5
> - Update error handling of btrfs_next_leaf() to cover all cases
> - Use btrfs_next_item() to reduce the call of btrfs_search_slot()
> 
>  fs/btrfs/ioctl.c   | 102 
> +
>  include/uapi/linux/btrfs.h |  16 +++
>  2 files changed, 118 insertions(+)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 
> c1c9ae9a937d..db5de77540e1 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -2386,6 +2386,106 @@ static noinline int 
> btrfs_ioctl_get_subvol_info(struct file *file,
>   return ret;
>  }
> 
> +/*
> + * Return ROOT_REF information of the subvolume containing this inode
> + * except the subvolume name.
> + */
> +static noinline int btrfs_ioctl_get_subvol_rootref(struct file *file,
> +void __user *argp)
> +{
> + struct btrfs_ioctl_get_subvol_rootref_args *rootrefs;
> + struct btrfs_root_ref *rref;
> + struct btrfs_root *root;
> + struct btrfs_path *path;
> + struct btrfs_key key;
> +
> + struct extent_buffer *l;
> + int slot;
> +
> + struct inode *inode;
> + int ret;
> + u64 objectid;
> + u8 found;
> +
> + path = btrfs_alloc_path();
> + if (!path)
> + return -ENOMEM;
> +
> + rootrefs = memdup_user(argp, sizeof(*rootrefs));
> + if (!rootrefs) {
> + btrfs_free_path(path);
> + return -ENOMEM;
> + }
> +
> + inode = file_inode(file);
> + root = BTRFS_I(inode)->root->fs_info->tree_root;
> + objectid = BTRFS_I(inode)->root->root_key.objectid;
> +
> + key.objectid = objectid;
> + key.type = BTRFS_ROOT_REF_KEY;
> + key.offset = rootrefs->min_id;
> + found = 0;
> +
> + ret = btrfs_search_slot(NULL, root, , path, 0, 0);
> + if (ret < 0) {
> + goto out;
> + } else if (path->slots[0] >=
> + btrfs_header_nritems(path->nodes[0])) {
> + ret = btrfs_next_leaf(root, path);
> + if (ret < 0) {
> + goto out;
> + } else if (ret > 0) {
> + ret = -ENOENT;
> + goto out;
> + }
> + }
> + while (1) {
> + l = path->nodes[0];
> + slot = path->slots[0];
> +
> + btrfs_item_key_to_cpu(l, , slot);
> + if (key.objectid != objectid ||
> + key.type != BTRFS_ROOT_REF_KEY) {
> + ret = 0;
> + goto out;
> + }
> +
> + if (found == BTRFS_MAX_ROOTREF_BUFFER_NUM) {
> + ret = -EOVERFLOW;
> + goto out;
> + }
> +
> + rref = btrfs_item_ptr(l, slot, struct btrfs_root_ref);
> + rootrefs->rootref[found].subvolid = key.offset;
> + rootrefs->rootref[found].dirid =
> +   btrfs_root_ref_dirid(l, rref);
> + found++;
> +
> + ret = btrfs_next_item(root, path);
> + if (ret < 0) {
> + goto out;
> + } else if (ret > 0) {
> + ret = -ENOENT;
Should return -EUCLEAN in this scenario.


> + goto out;
> + }
> + }
> +
> +out:
> + if (!ret || ret == -EOVERFLOW) {
> + rootrefs->num_items = found;
> + /* update min_id for next search */
> + if (found)
> + rootrefs->min_id =
> + rootrefs->rootref[found - 1].subvolid + 1;
> + if (copy_to_user(argp, rootrefs, sizeof(*rootrefs)))
> + ret = -EFAULT;
> + }
> +
> + btrfs_free_path(path);
> + kfree(rootrefs);
> + return ret;
> +}
> +
>  static noinline int 

[PATCH 3/4] btrfs: lzo: Add header length check to avoid slab out of bounds access

2018-05-17 Thread Qu Wenruo
James Harvey reported that some corrupted compressed extent data can
lead to various kernel memory corruption.

Such corrupted extent data belongs to inode with NODATASUM flags, thus
data csum won't help us detecting such bug.

If lucky enough, kasan could catch it like:
==
BUG: KASAN: slab-out-of-bounds in lzo_decompress_bio+0x384/0x7a0 [btrfs]
Write of size 4096 at addr 8800606cb0f8 by task kworker/u16:0/2338

CPU: 3 PID: 2338 Comm: kworker/u16:0 Tainted: G   O  
4.17.0-rc5-custom+ #50
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
Workqueue: btrfs-endio btrfs_endio_helper [btrfs]
Call Trace:
 dump_stack+0xc2/0x16b
 print_address_description+0x6a/0x270
 kasan_report+0x260/0x380
 memcpy+0x34/0x50
 lzo_decompress_bio+0x384/0x7a0 [btrfs]
 end_compressed_bio_read+0x99f/0x10b0 [btrfs]
 bio_endio+0x32e/0x640
 normal_work_helper+0x15a/0xea0 [btrfs]
 process_one_work+0x7e3/0x1470
 worker_thread+0x1b0/0x1170
 kthread+0x2db/0x390
 ret_from_fork+0x22/0x40
...
==

The offending compressed data has the following info:

Header: length 32768(Looks completely valid)
Segment 0 Header:   length 3472882419   (Obvious out of bounds)

Then when handling segment 0, since it's over the current page, we need
the compressed data to workspace, then such large size would trigger
out-of-bounds memory access, screwing up the whole kernel.

Fix it by adding extra checks on header and segment headers to ensure we
won't access out-of-bounds, and even checks the decompressed data won't
be out-of-bounds.

Reported-by: James Harvey 
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/lzo.c | 35 ++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/lzo.c b/fs/btrfs/lzo.c
index 3d2ae4c08876..78ebc809072f 100644
--- a/fs/btrfs/lzo.c
+++ b/fs/btrfs/lzo.c
@@ -281,6 +281,7 @@ static int lzo_decompress_bio(struct list_head *ws, struct 
compressed_bio *cb)
unsigned long working_bytes;
size_t in_len;
size_t out_len;
+   size_t max_segment_len = lzo1x_worst_compress(PAGE_SIZE);
unsigned long in_offset;
unsigned long in_page_bytes_left;
unsigned long tot_in;
@@ -294,6 +295,18 @@ static int lzo_decompress_bio(struct list_head *ws, struct 
compressed_bio *cb)
 
data_in = kmap(pages_in[0]);
tot_len = read_compress_length(data_in);
+   /*
+* Compressed data header check.
+*
+* The real compressed size can't exceed extent length, and all pages
+* should be used (a full pending page is not possible).
+* If this happens it means the compressed extent is corrupted.
+*/
+   if (tot_len > min_t(size_t, BTRFS_MAX_COMPRESSED, srclen) ||
+   tot_len < srclen - PAGE_SIZE) {
+   ret = -EUCLEAN;
+   goto done;
+   }
 
tot_in = LZO_LEN;
in_offset = LZO_LEN;
@@ -308,6 +321,17 @@ static int lzo_decompress_bio(struct list_head *ws, struct 
compressed_bio *cb)
in_offset += LZO_LEN;
tot_in += LZO_LEN;
 
+   /*
+* Segment header check.
+*
+* The segment length must not exceed max lzo compression
+* size, nor the total compressed size
+*/
+   if (in_len > max_segment_len || tot_in + in_len > tot_len) {
+   ret = -EUCLEAN;
+   goto done;
+   }
+
tot_in += in_len;
working_bytes = in_len;
may_late_unmap = need_unmap = false;
@@ -358,7 +382,7 @@ static int lzo_decompress_bio(struct list_head *ws, struct 
compressed_bio *cb)
}
}
 
-   out_len = lzo1x_worst_compress(PAGE_SIZE);
+   out_len = max_segment_len;
ret = lzo1x_decompress_safe(buf, in_len, workspace->buf,
_len);
if (need_unmap)
@@ -368,6 +392,15 @@ static int lzo_decompress_bio(struct list_head *ws, struct 
compressed_bio *cb)
ret = -EIO;
break;
}
+   /*
+* Decompressed data length check.
+* The uncompressed data should not exceed uncompressed extent
+* size.
+*/
+   if (tot_out + out_len > cb->len) {
+   ret = -EUCLEAN;
+   break;
+   }
 
buf_start = tot_out;
tot_out += out_len;
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  

[PATCH 4/4] btrfs: lzo: Hardern inline lzo compressed extent decompression

2018-05-17 Thread Qu Wenruo
Unlike regular lzo compressed extent, inline extent doesn't have Header
and only has one Segment.
And further more, inlined extent always has csum in its leaf header,
it's less possible to have corrupted data.

Anyway, still add extra segment header length check.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/lzo.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/fs/btrfs/lzo.c b/fs/btrfs/lzo.c
index 78ebc809072f..077851a9f498 100644
--- a/fs/btrfs/lzo.c
+++ b/fs/btrfs/lzo.c
@@ -425,6 +425,7 @@ static int lzo_decompress(struct list_head *ws, unsigned 
char *data_in,
struct workspace *workspace = list_entry(ws, struct workspace, list);
size_t in_len;
size_t out_len;
+   size_t max_segment_len = lzo1x_worst_compress(PAGE_SIZE);
int ret = 0;
char *kaddr;
unsigned long bytes;
@@ -434,6 +435,10 @@ static int lzo_decompress(struct list_head *ws, unsigned 
char *data_in,
data_in += LZO_LEN;
 
in_len = read_compress_length(data_in);
+   if (in_len > max_segment_len) {
+   ret = -EUCLEAN;
+   goto out;
+   }
data_in += LZO_LEN;
 
out_len = PAGE_SIZE;
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/4] btrfs: lzo: Add comment about the how btrfs records its lzo compressed data

2018-05-17 Thread Qu Wenruo
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/lzo.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/fs/btrfs/lzo.c b/fs/btrfs/lzo.c
index 0667ea07f766..3d2ae4c08876 100644
--- a/fs/btrfs/lzo.c
+++ b/fs/btrfs/lzo.c
@@ -17,6 +17,29 @@
 
 #define LZO_LEN4
 
+/*
+ * Btrfs LZO compression format
+ *
+ * Regular LZO compressed data extent is consist of:
+ * 1.  Header
+ * Fixed size. LZO_LEN (4) bytes long, LE16.
+ * Records the total size (*includes* the header) of real compressed data.
+ *
+ * 2.  Segment(s)
+ * Variable size. Includes one segment header, and then data payload.
+ * One btrfs compressed data can have one or more segments.
+ *
+ * 2.1 Segment header
+ * Fixed size. LZO_LEN (4) bytes long, LE16.
+ * Records the total size of the segment (*excludes* the header).
+ *
+ * 2.2 Data Payload
+ * Variable size. Size up limit should be lzo1x_worst_compress(PAGE_SIZE).
+ *
+ * While for inlined LZO compressed data extent, it doesn't have Header, just
+ * one Segment.
+ */
+
 struct workspace {
void *mem;
void *buf;  /* where decompressed data goes */
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/4] btrfs: lzo: Harden decompression callers to avoid

2018-05-17 Thread Qu Wenruo
James Harvey reported pretty strange kernel misbehavior where after
reading certain btrfs compressed data, kernel crash with unrelated
calltrace.
(https://bugzilla.kernel.org/show_bug.cgi?id=199707 and
 https://www.spinics.net/lists/linux-btrfs/msg77971.html)

Thanks for his comprehensive debug help, we located the problem to:

1) Bad lzo decompression verification check
   The direct cause.
   The lzo decompression code involves memory copy for compressed data
   who crosses page boundary.
   And if corrupted segment header contains invalid length, btrfs will
   do memory copy and then corrupt kernel memory.

   Fix it by add header length check for each segment, and just in case,
   also add uncompressed data length check. (Patch 3)

2) Compressed extent created for NODATASUM inode
   The root cause.
   Will be addressed in another patchset.

Other patches are mostly cosmetic, like adding extra include for
compression.h, to avoid later compiling error. (Patch 1)
Or adding comment of how btrfs organise its compressed data (Patch 2)
And adding extra check for inlined compressed data even it's not really
possible to happen (Patch 4)

Qu Wenruo (4):
  btrfs: compression: Add linux/sizes.h for compression.h
  btrfs: lzo: Add comment about the how btrfs records its lzo compressed
data
  btrfs: lzo: Add header length check to avoid slab out of bounds access
  btrfs: lzo: Hardern inline lzo compressed extent decompression

 fs/btrfs/compression.h |  1 +
 fs/btrfs/lzo.c | 63 +-
 2 files changed, 63 insertions(+), 1 deletion(-)

-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/4] btrfs: compression: Add linux/sizes.h for compression.h

2018-05-17 Thread Qu Wenruo
Since compression.h is using SZ_* macros, and if some user only includes
compression.h without linux/sizes.h, it will cause compile error.

One example is lzo.c, if it uses BTRFS_MAX_COMPRESSED, it would cause
compile error.

Fix it by adding linux/sizes.h in compression.h

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/compression.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index cc605f7b23fb..317703d9b073 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -6,6 +6,7 @@
 #ifndef BTRFS_COMPRESSION_H
 #define BTRFS_COMPRESSION_H
 
+#include 
 /*
  * We want to make sure that amount of RAM required to uncompress an extent is
  * reasonable, so we limit the total size in ram of a compressed extent to
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v5 1/3] btrfs: Add unprivileged ioctl which returns subvolume information

2018-05-17 Thread Gu, Jinxiang


> -Original Message-
> From: linux-btrfs-ow...@vger.kernel.org 
> [mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of Tomohiro Misono
> Sent: Wednesday, May 16, 2018 1:50 PM
> To: linux-btrfs@vger.kernel.org
> Subject: [PATCH v5 1/3] btrfs: Add unprivileged ioctl which returns subvolume 
> information
> 
> Add new unprivileged ioctl BTRFS_IOC_GET_SUBVOL_INFO which returns the 
> information of subvolume containing this inode.
> (i.e. returns the information in ROOT_ITEM and ROOT_BACKREF.)
> 
> Signed-off-by: Tomohiro Misono 
> ---
>  v4 -> v5
> - Update error handling of btrfs_next_leaf() to cover all cases
> - Return error if ROOT_BACKREF is not found (except top-level)
> 
>  fs/btrfs/ioctl.c   | 146 
> +
>  include/uapi/linux/btrfs.h |  51 
>  2 files changed, 197 insertions(+)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 
> 48e2ddff32bd..c1c9ae9a937d 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -2242,6 +2242,150 @@ static noinline int btrfs_ioctl_ino_lookup(struct 
> file *file,
>   return ret;
>  }
> 
> +/* Get the subvolume information in BTRFS_ROOT_ITEM and
> +BTRFS_ROOT_BACKREF */ static noinline int btrfs_ioctl_get_subvol_info(struct 
> file *file,
> +void __user *argp)
> +{
> + struct btrfs_ioctl_get_subvol_info_args *subvol_info;
> + struct btrfs_root *root;
> + struct btrfs_path *path;
> + struct btrfs_key key;
> +
> + struct btrfs_root_item root_item;
> + struct btrfs_root_ref *rref;
> + struct extent_buffer *l;
> + int slot;
> +
> + unsigned long item_off;
> + unsigned long item_len;
> +
> + struct inode *inode;
> + int ret;
> +
> + path = btrfs_alloc_path();
> + if (!path)
> + return -ENOMEM;
> +
> + subvol_info = kzalloc(sizeof(*subvol_info), GFP_KERNEL);
> + if (!subvol_info) {
> + btrfs_free_path(path);
> + return -ENOMEM;
> + }
> +
> + inode = file_inode(file);
> + root = BTRFS_I(inode)->root->fs_info->tree_root;
> +
> + key.objectid = BTRFS_I(inode)->root->root_key.objectid;
> + key.type = BTRFS_ROOT_ITEM_KEY;
> + key.offset = 0;
> +
> + ret = btrfs_search_slot(NULL, root, , path, 0, 0);
> + if (ret < 0) {
> + goto out;
> + } else if (ret > 0) {
> + u64 objectid = key.objectid;
> +
> + if (path->slots[0] >= btrfs_header_nritems(path->nodes[0])) {
> + ret = btrfs_next_leaf(root, path);
> + if (ret < 0) {
> + goto out;
> + } else if (ret > 0) {
> + ret = -ENOENT;
> + goto out;
> + }
> + }
> +
> + /* If the subvolume is a snapshot, offset is not zero */
> + btrfs_item_key_to_cpu(path->nodes[0], , path->slots[0]);
> + if (key.objectid != objectid ||
> + key.type != BTRFS_ROOT_ITEM_KEY) {
> + ret = -ENOENT;
> + goto out;
> + }
> + }
> +
> + l = path->nodes[0];
> + slot = path->slots[0];
> + item_off = btrfs_item_ptr_offset(l, slot);
> + item_len = btrfs_item_size_nr(l, slot);
> + read_extent_buffer(l, _item, item_off, item_len);
> +
> + subvol_info->id = key.objectid;
> +
> + subvol_info->generation = btrfs_root_generation(_item);
> + subvol_info->flags = btrfs_root_flags(_item);
> +
> + memcpy(subvol_info->uuid, root_item.uuid, BTRFS_UUID_SIZE);
> + memcpy(subvol_info->parent_uuid, root_item.parent_uuid,
> + BTRFS_UUID_SIZE);
> + memcpy(subvol_info->received_uuid, root_item.received_uuid,
> + BTRFS_UUID_SIZE);
> +
> + subvol_info->ctransid = btrfs_root_ctransid(_item);
> + subvol_info->ctime.sec = btrfs_stack_timespec_sec(_item.ctime);
> + subvol_info->ctime.nsec = btrfs_stack_timespec_nsec(_item.ctime);
> +
> + subvol_info->otransid = btrfs_root_otransid(_item);
> + subvol_info->otime.sec = btrfs_stack_timespec_sec(_item.otime);
> + subvol_info->otime.nsec = btrfs_stack_timespec_nsec(_item.otime);
> +
> + subvol_info->stransid = btrfs_root_stransid(_item);
> + subvol_info->stime.sec = btrfs_stack_timespec_sec(_item.stime);
> + subvol_info->stime.nsec = btrfs_stack_timespec_nsec(_item.stime);
> +
> + subvol_info->rtransid = btrfs_root_rtransid(_item);
> + subvol_info->rtime.sec = btrfs_stack_timespec_sec(_item.rtime);
> + subvol_info->rtime.nsec = btrfs_stack_timespec_nsec(_item.rtime);
> +
> + btrfs_release_path(path);
> + if (key.objectid != BTRFS_FS_TREE_OBJECTID) {
> + key.type = BTRFS_ROOT_BACKREF_KEY;
> + key.offset = 0;
> +