date:20180308

Re: Inconsistence between sender and receiver

2018-03-08 Thread Andrei Borzenkov

09.03.2018 08:38, Liu Bo пишет:
> On Thu, Mar 08, 2018 at 09:15:50AM +0300, Andrei Borzenkov wrote:
>> 07.03.2018 21:49, Liu Bo пишет:
>>> Hi,
>>>
>>> In the following steps[1], if  on receiver side has got
>>> changed via 'btrfs property set', then after doing incremental
>>> updates, receiver gets a different snapshot from what sender has sent.
>>>
>>> The reason behind it is that there is no change about file 'foo' in
>>> the send stream, such that receiver simply creates a snapshot of
>>>  on its side with nothing to apply from the send stream.
>>>
>>> A possible way to avoid this is to check rtransid and ctranid of
>>>  on receiver side, but I'm not very sure whether the current
>>> behavior is made deliberately, does anyone have an idea? 
>>>
>>> Thanks,
>>>
>>> -liubo
>>>
>>> [1]:
>>> $ btrfs sub create /mnt/send/sub
>>> $ touch /mnt/send/sub/foo
>>> $ btrfs sub snap -r /mnt/send/sub /mnt/send/parent
>>>
>>> # send parent out
>>> $ btrfs send /mnt/send/parent | btrfs receive /mnt/recv/
>>>
>>> # change parent and file under it
>>> $ btrfs property set -t subvol /mnt/recv/parent ro false
>>
>> Is removing the ability to modify read-only property an option? What are
>> use cases for it? What can it do that "btrfs sub snap read-only
>> writable" cannot?
>>
> 
> Tbh, I don't know any usecase for that, I just wanted to toggle off
> the ro bit in order to change something inside .
> 
>>> $ truncate -s 4096 /mnt/recv/parent/foo
>>>
>>> $ btrfs sub snap -r /mnt/send/sub /mnt/send/update
>>> $ btrfs send -p /mnt/send/parent /mnt/send/update | btrfs receive /mnt/recv
>>>
>>
>> This should fail right away because /mnt/send/parent is not read-only.
>> If it does not, this is really a bug.
>>
> 
> It's not '/mnt/send/parent', but '/mnt/recv/parent' got changed.
> 

Ah, sorry.

What happened to this patch which clears received_uuid when ro is
flipped off?

https://patchwork.kernel.org/patch/9986521/
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Btrfs: replace: cache rbio when rebuild data on missing device

2018-03-08 Thread Liu Bo

On Tue, Mar 06, 2018 at 12:28:18PM +0100, David Sterba wrote:
> On Fri, Mar 02, 2018 at 04:10:38PM -0700, Liu Bo wrote:
> > Rebuild on missing device is as same as recover, after it's done, rbio
> > has data which is consistent with on-disk data, so it can be cached to
> > avoid further reads.
> 
> Please add a comment that describes why the READ and REBUILD can be
> merged together, it's not obvious from the code.

Will update it.

Thanks,

-liubo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Inconsistence between sender and receiver

2018-03-08 Thread Liu Bo

On Thu, Mar 08, 2018 at 03:29:48PM +, Filipe Manana wrote:
> On Wed, Mar 7, 2018 at 6:49 PM, Liu Bo  wrote:
> > Hi,
> >
> > In the following steps[1], if  on receiver side has got
> > changed via 'btrfs property set', then after doing incremental
> > updates, receiver gets a different snapshot from what sender has sent.
> >
> > The reason behind it is that there is no change about file 'foo' in
> > the send stream, such that receiver simply creates a snapshot of
> >  on its side with nothing to apply from the send stream.
> >
> > A possible way to avoid this is to check rtransid and ctranid of
> >  on receiver side, but I'm not very sure whether the current
> > behavior is made deliberately, does anyone have an idea?
> >
> > Thanks,
> >
> > -liubo
> >
> > [1]:
> > $ btrfs sub create /mnt/send/sub
> > $ touch /mnt/send/sub/foo
> > $ btrfs sub snap -r /mnt/send/sub /mnt/send/parent
> >
> > # send parent out
> > $ btrfs send /mnt/send/parent | btrfs receive /mnt/recv/
> >
> > # change parent and file under it
> > $ btrfs property set -t subvol /mnt/recv/parent ro false
> > $ truncate -s 4096 /mnt/recv/parent/foo
> >
> > $ btrfs sub snap -r /mnt/send/sub /mnt/send/update
> > $ btrfs send -p /mnt/send/parent /mnt/send/update | btrfs receive /mnt/recv
> >
> > $ ls -l /mnt/send/update
> > total 0
> > -rw-r--r-- 1 root root 0 Mar  6 11:13 foo
> >
> > $ ls -l /mnt/recv/update
> > total 0
> > -rw-r--r-- 1 root root 4096 Mar  6 11:14 foo
> 
> Interesting. I wasn't aware of that and I'm surprised too, I wouldn't
> expect that behaviour (and I must see the code to figure out why it
> behaves like that).

The code about comparing rtransid and ctransid does exist in
cmds-receive.c, but had been commented.

Thanks,

-liubo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Inconsistence between sender and receiver

2018-03-08 Thread Liu Bo

On Thu, Mar 08, 2018 at 09:15:50AM +0300, Andrei Borzenkov wrote:
> 07.03.2018 21:49, Liu Bo пишет:
> > Hi,
> > 
> > In the following steps[1], if  on receiver side has got
> > changed via 'btrfs property set', then after doing incremental
> > updates, receiver gets a different snapshot from what sender has sent.
> > 
> > The reason behind it is that there is no change about file 'foo' in
> > the send stream, such that receiver simply creates a snapshot of
> >  on its side with nothing to apply from the send stream.
> > 
> > A possible way to avoid this is to check rtransid and ctranid of
> >  on receiver side, but I'm not very sure whether the current
> > behavior is made deliberately, does anyone have an idea? 
> > 
> > Thanks,
> > 
> > -liubo
> > 
> > [1]:
> > $ btrfs sub create /mnt/send/sub
> > $ touch /mnt/send/sub/foo
> > $ btrfs sub snap -r /mnt/send/sub /mnt/send/parent
> > 
> > # send parent out
> > $ btrfs send /mnt/send/parent | btrfs receive /mnt/recv/
> > 
> > # change parent and file under it
> > $ btrfs property set -t subvol /mnt/recv/parent ro false
> 
> Is removing the ability to modify read-only property an option? What are
> use cases for it? What can it do that "btrfs sub snap read-only
> writable" cannot?
>

Tbh, I don't know any usecase for that, I just wanted to toggle off
the ro bit in order to change something inside .

> > $ truncate -s 4096 /mnt/recv/parent/foo
> > 
> > $ btrfs sub snap -r /mnt/send/sub /mnt/send/update
> > $ btrfs send -p /mnt/send/parent /mnt/send/update | btrfs receive /mnt/recv
> > 
> 
> This should fail right away because /mnt/send/parent is not read-only.
> If it does not, this is really a bug.
>

It's not '/mnt/send/parent', but '/mnt/recv/parent' got changed.

Thanks,

-liubo

> Of course one may go one step further and set /mnt/send/parent read-only
> again, then we get this issue.
> 
> > $ ls -l /mnt/send/update
> > total 0
> > -rw-r--r-- 1 root root 0 Mar  6 11:13 foo
> > 
> > $ ls -l /mnt/recv/update
> > total 0
> > -rw-r--r-- 1 root root 4096 Mar  6 11:14 foo
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] kernel.h: Skip single-eval logic on literals in min()/max()

2018-03-08 Thread Kees Cook

On Thu, Mar 8, 2018 at 5:35 PM, Linus Torvalds
 wrote:
> I don't want to weaken the type enforcement, and I _thought_ you had
> done that __builtin_types_compatible_p() to keep it in place.

I thought so too (that originally came from Josh), but on removal, I
was surprised that the checking was retained. :)

> But if that's not why you did it, then why was it there at all? If the
> type warning shows through even if it's in the other expression, then
> just a
>
>
> #define __min(t1, t2, x, y) \
> __builtin_choose_expr(  \
> __builtin_constant_p(x) &   \
> __builtin_constant_p(y),\
> (t1)(x) < (t2)(y) ? (t1)(x) : (t2)(y),  \
> __single_eval_min(t1, t2,   \
>...
>
> would seem to be sufficient?
>
> Because logically, the only thing that matters is that x and y don't
> have any side effects and can be evaluated twice, and
> "__builtin_constant_p()" is already a much stronger version of that.
>
> Hmm? The __builtin_types_compatible_p() just doesn't seem to matter
> for the only thing I thought it was there for.

Yup, agreed. I'll drop it.

-Kees

-- 
Kees Cook
Pixel Security
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] kernel.h: Skip single-eval logic on literals in min()/max()

2018-03-08 Thread Linus Torvalds

On Thu, Mar 8, 2018 at 4:45 PM, Kees Cook  wrote:
>
> Rasmus mentioned this too. What I said there was that I was shy to
> make that change, since we already can't mix that kind of thing with
> the existing min()/max() implementation. The existing min()/max() is
> already extremely strict, so there are no instances of this in the
> tree.

Yes, but I also didn't want to add any new cases in case people add
new min/max() users.

But:

> If I explicitly add one, I see this with or without the patch:
>
> In file included from drivers/misc/lkdtm.h:7:0,
>  from drivers/misc/lkdtm_core.c:33:
> drivers/misc/lkdtm_core.c: In function ‘lkdtm_module_exit’:
> ./include/linux/kernel.h:809:16: warning: comparison of distinct
> pointer types lacks a cast

Oh, ok, in that case, just drop the __builtin_types_compatible_p()
entirely. It's not adding anything.

I was expecting the non-chosen expression in __builtin_choose_expr()
to not cause type warnings. I'm actually surprised it does. Type games
is why __builtin_choose_expr() tends to exist in the first place.

> So are you saying you _want_ the type enforcement weakened here, or
> that I should just not use __builtin_types_compatible_p()?

I don't want to weaken the type enforcement, and I _thought_ you had
done that __builtin_types_compatible_p() to keep it in place.

But if that's not why you did it, then why was it there at all? If the
type warning shows through even if it's in the other expression, then
just a

#define __min(t1, t2, x, y) \
__builtin_choose_expr(  \
__builtin_constant_p(x) &   \
__builtin_constant_p(y),\
(t1)(x) < (t2)(y) ? (t1)(x) : (t2)(y),  \
__single_eval_min(t1, t2,   \
   ...

would seem to be sufficient?

Because logically, the only thing that matters is that x and y don't
have any side effects and can be evaluated twice, and
"__builtin_constant_p()" is already a much stronger version of that.

Hmm? The __builtin_types_compatible_p() just doesn't seem to matter
for the only thing I thought it was there for.

  Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] btrfs: Add unprivileged version of ino_lookup ioctl

2018-03-08 Thread Misono, Tomohiro

On 2018/03/08 18:47, Nikolay Borisov wrote:
> 
> 
> On  6.03.2018 10:31, Misono, Tomohiro wrote:
>> Add unprivileged version of ino_lookup ioctl (BTRFS_IOC_INO_LOOKUP_USER)
>> to allow normal users to call "btrfs subvololume list/show" etc. in
>> combination with BTRFS_IOC_GET_SUBVOL_INFO.
>>
>> This can be used like BTRFS_IOC_INO_LOOKUP but the argument is
>> different because it also returns the name of bottom subvolume,
>> which is ommited by BTRFS_IOC_GET_SUBVOL_INFO ioctl.
>> Also, this must be called with fd which exists under the tree of
>> args->treeid to prevent for user to search any tree.
>>
>> The main differences from original ino_lookup ioctl are:
>>   1. Read permission will be checked using inode_permission()
>>  during path construction. -EACCES will be returned in case
>>  of failure.
>>   2. Path construction will be stopped at the inode number which
>>  corresponds to the fd with which this ioctl is called. If
>>  constructed path does not exist under fd's inode, -EACCES
>>  will be returned.
>>   3. The name of bottom subvolume is also searched and filled.
>>
>> Note that the maximum length of path is shorter 272 bytes than
> 
> Did you mean 255? Since BTRFS_VOL_NAME_MAX is defined as 255?

My mistake, actually 255 + 1 (for null) + 8 (for subvolume id) = 264 bytes in 
this version.

> 
>> ino_lookup ioctl because of space of subvolume's id and name.
>>
>> Signed-off-by: Tomohiro Misono 
>> ---
>>  fs/btrfs/ioctl.c   | 218 
>> +
>>  include/uapi/linux/btrfs.h |  16 
>>  2 files changed, 234 insertions(+)
>>
>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>> index 1dba309dce31..ac23da98b7e7 100644
>> --- a/fs/btrfs/ioctl.c
>> +++ b/fs/btrfs/ioctl.c
>> @@ -2425,6 +2425,179 @@ static noinline int btrfs_search_path_in_tree(struct 
>> btrfs_fs_info *info,
>>  return ret;
>>  }
>>  
>> +static noinline int btrfs_search_path_in_tree_user(struct btrfs_fs_info 
>> *info,
>> +struct super_block *sb,
>> +struct btrfs_key upper_limit,
>> +struct btrfs_ioctl_ino_lookup_user_args *args)
>> +{
>> +struct btrfs_root *root;
>> +struct btrfs_key key, key2;
>> +char *ptr;
>> +int ret = -1;
>> +int slot;
>> +int len;
>> +int total_len = 0;
>> +u64 dirid = args->dirid;
>> +struct btrfs_inode_ref *iref;
>> +struct btrfs_root_ref rref;
>> +
>> +unsigned long item_off;
>> +unsigned long item_len;
>> +
>> +struct extent_buffer *l;
>> +struct btrfs_path *path = NULL;
>> +
>> +struct inode *inode;
>> +int *new = 0;
>> +
>> +path = btrfs_alloc_path();
>> +if (!path)
>> +return -ENOMEM;
>> +
>> +if (dirid == upper_limit.objectid)
>> +/*
>> + * If the bottom subvolume exists directly under upper limit,
>> + * there is no need to construct the path and just get the
>> + * subvolume's name
>> + */
>> +goto get_subvol_name;
> 
> Eliminate the label and factor out the code in a new function or just
> put in the body of the if.

ok.

> 
>> +if (dirid == BTRFS_FIRST_FREE_OBJECTID)
>> +/* The subvolume does not exist under upper_limit */
>> +goto access_err;
> 
> just set ret to -EACCESS and goto out. And eliminate access_err label
> altogether. Furthermore, shouldn't this check really ether:
> 
> a) Be the first one in this function so that we return ASAP (i.e. even
> before calling btrfs_alloc_path)
> 
> or
> 
> b) Put in the ioctl function itself. Currently for the

I'd like to adopt b) approach.

> btrfs_ioctl_ino_lookup case it's duplicated in both that function and
> btrfs_search_path_in_tree
> >> +
>> +ptr = >path[BTRFS_INO_LOOKUP_PATH_MAX - 1];
>> +
>> +key.objectid = args->treeid;
>> +key.type = BTRFS_ROOT_ITEM_KEY;
>> +key.offset = (u64)-1;
>> +root = btrfs_read_fs_root_no_name(info, );
>> +if (IS_ERR(root)) {
>> +btrfs_err(info, "could not find root %llu", args->treeid);
>> +ret = -ENOENT;
>> +goto out;
>> +}
>> +
>> +key.objectid = dirid;
>> +key.type = BTRFS_INODE_REF_KEY;
>> +key.offset = (u64)-1;
>> +
>> +while (1) {
>> +ret = btrfs_search_slot(NULL, root, , path, 0, 0);
>> +if (ret < 0)
>> +goto out;
>> +else if (ret > 0) {
>> +ret = btrfs_previous_item(root, path, dirid,
>> +  BTRFS_INODE_REF_KEY);
>> +if (ret < 0)
>> +goto out;
>> +else if (ret > 0) {
>> +ret = -ENOENT;
>> +goto out;
>> +}
>> +}
>> +
>> +l = path->nodes[0];
>> +slot = path->slots[0];

Re: [PATCH 1/2] btrfs: Add unprivileged subvolume search ioctl

2018-03-08 Thread Misono, Tomohiro


On 2018/03/08 17:29, Nikolay Borisov wrote:
> 
> 
> On  6.03.2018 10:30, Misono, Tomohiro wrote:
>> Add new unprivileged ioctl (BTRFS_IOC_GET_SUBVOL_INFO) which searches
>> and returns only subvolume related item (ROOT_ITEM/ROOT_BACKREF/ROOT_REF)
>> from root tree. The arguments of this ioctl are the same as treesearch
>> ioctl and can be used like treesearch ioctl.
>>
>> Since treesearch ioctl requires root privilege, this ioctl is needed
>> to allow normal users to call "btrfs subvolume list/show" etc.
>>
>> Note that the subvolume name in ROOT_BACKREF/ROOT_REF will not be copied to
>> prevent potential name leak. The name can be obtained by calling
>> user version's ino_lookup ioctl (BTRFS_IOC_INO_LOOKUP_USER).
>>
>> Signed-off-by: Tomohiro Misono 
>> ---
>>  fs/btrfs/ioctl.c   | 254 
>> +
>>  include/uapi/linux/btrfs.h |   2 +
>>  2 files changed, 256 insertions(+)
>>
>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>> index 111ee282b777..1dba309dce31 100644
>> --- a/fs/btrfs/ioctl.c
>> +++ b/fs/btrfs/ioctl.c
>> @@ -1921,6 +1921,28 @@ static noinline int key_in_sk(struct btrfs_key *key,
>>  return 1;
>>  }
>>  
>> +/*
>> + * check if key is in sk and subvolume related range
>> + */
>> +static noinline int key_in_sk_and_subvol(struct btrfs_key *key,
>> +  struct btrfs_ioctl_search_key *sk)
>> +{
>> +int ret;
>> +
>> +ret = key_in_sk(key, sk);
>> +if (!ret)
>> +return 0;
>> +
>> +if ((key->objectid == BTRFS_FS_TREE_OBJECTID ||
>> +(key->objectid >= BTRFS_FIRST_FREE_OBJECTID &&
>> + key->objectid <= BTRFS_LAST_FREE_OBJECTID)) &&
> 
> Why do you need the FIRST_FREE/LAST_FREE object id checks? The range
> described by those are ordinary files. Since you are only interested in
> subvolume data then you should omit those, no?

There are ROOT_ITEM for other trees (EXTETN_TREE, DEV_TREE etc.) and they 
should be skipped.

> 
>> +key->type >= BTRFS_ROOT_ITEM_KEY &&
>> +key->type <= BTRFS_ROOT_BACKREF_KEY)
> I think this check should really be:
> 
> if (key->objectid == BTRFS_FS_TREE_OBJECTID || key->objectid ==
> BTRFS_ROOT_ITEM_KEY || key->type == BTRFS_ROOT_BACKREF_KEY
>>> +   return 1;
>> +
>> +return 0;
>> +}
>> +
>>  static noinline int copy_to_sk(struct btrfs_path *path,
>> struct btrfs_key *key,
>> struct btrfs_ioctl_search_key *sk,
>> @@ -2045,6 +2067,142 @@ static noinline int copy_to_sk(struct btrfs_path 
>> *path,
>>  return ret;
>>  }
>>  
>> +/*
>> + * Basically the same as copy_to_sk() but restricts the copied item
>> + * within subvolume related one using key_in_sk_and_subvol().
>> + * Also, the name of subvolume will be omitted from
>> + * ROOT_BACKREF/ROOT_REF item.
>> + */
>> +static noinline int copy_subvol_item(struct btrfs_path *path,
>> +   struct btrfs_key *key,
>> +   struct btrfs_ioctl_search_key *sk,
>> +   size_t *buf_size,
>> +   char __user *ubuf,
>> +   unsigned long *sk_offset,
>> +   int *num_found)
>> +{
>> +u64 found_transid;
>> +struct extent_buffer *leaf;
>> +struct btrfs_ioctl_search_header sh;
>> +struct btrfs_key test;
>> +unsigned long item_off;
>> +unsigned long item_len;
>> +int nritems;
>> +int i;
>> +int slot;
>> +int ret = 0;
>> +
>> +leaf = path->nodes[0];
>> +slot = path->slots[0];
>> +nritems = btrfs_header_nritems(leaf);
>> +
>> +if (btrfs_header_generation(leaf) > sk->max_transid) {
>> +i = nritems;
>> +goto advance_key;
> 
> I don't see why you need a goto label at all. Just put the necessary
> code inside the if and return directly.

Most of this code is just copied from copy_to_sk().
I will rewrite code in more appropriate way in future version.

> 
>> +}
>> +found_transid = btrfs_header_generation(leaf);
>> +
>> +for (i = slot; i < nritems; i++) {
>> +item_off = btrfs_item_ptr_offset(leaf, i);
>> +item_len = btrfs_item_size_nr(leaf, i);
>> +
>> +btrfs_item_key_to_cpu(leaf, key, i);
>> +if (!key_in_sk_and_subvol(key, sk))
>> +continue;
>> +
>> +/* will not copy the name of subvolume */
>> +if (key->type == BTRFS_ROOT_BACKREF_KEY ||
>> +key->type == BTRFS_ROOT_REF_KEY)
>> +item_len = sizeof(struct btrfs_root_ref);
>> +
>> +if (sizeof(sh) + item_len > *buf_size) {
>> +if (*num_found) {
>> +ret = 1;
>> +goto out;
>  This can be a simple return 1;
> 
> 
>> +}
>> +
>> +/*
>> + * return one empty item back for

Re: [PATCH 1/2] btrfs: Add unprivileged subvolume search ioctl

2018-03-08 Thread Misono, Tomohiro



On 2018/03/08 4:00, Goffredo Baroncelli wrote:
> On 03/07/2018 01:40 AM, Misono, Tomohiro wrote:
>> On 2018/03/07 5:29, Goffredo Baroncelli wrote:
>>> On 03/06/2018 09:30 AM, Misono, Tomohiro wrote:
 Add new unprivileged ioctl (BTRFS_IOC_GET_SUBVOL_INFO) which searches
 and returns only subvolume related item (ROOT_ITEM/ROOT_BACKREF/ROOT_REF)
 from root tree. The arguments of this ioctl are the same as treesearch
 ioctl and can be used like treesearch ioctl.
>>>
>>> Is it a pro ? The treesearch ioctl is tightly coupled to the btrfs internal 
>>> structure, this means that if we would change the btrfs internal structure, 
>>> we have to update all the clients of this api. For the treesearch it is an 
>>> acceptable compromise between flexibility and speed of developing. But for 
>>> a more specialized API, I think that it would make sense to provide a less 
>>> coupled api to the internal structure.
>>
>> Thanks for the comments.
>>
>> The reason I choose the same api is just to minimize the code change in 
>> btrfs-progs.
>> For tree search part, it works just switching the ioctl number from 
>> BTRFS_IOC_TREE_SEARCH
>> to BTRFS_IOC_GET_SUBVOL_INFO in list_subvol_search()[1].
>>
>> [1] https://marc.info/?l=linux-btrfs=152032537911218=2
> 
> I suggest to avoid this approach. An API is forever; we already have a 
> "root-only" one which is quite unfriendly and error prone; I think that we 
> should put all the energies to make a better one. 
> 
> I think that the major weaknesses of this api are:
> - it exports the the data in "le" format  (see struct btrfs_root_item as 
> example); 
> - it requires the user to increase the key for the next ioctl call. This 
> could be doable in the kernel space before returning
> - this ioctl exports both the ROOT_BACKREF and ROOT_ITEM info. Why not make 
> two separate (simplers) ioctl(s) ?

I agree with you and will split this ioctl into two pats (one is for getting 
ROOT_ITEM and the other for ROOT_REF/BACKREF)
and make them to have user friendly apis.

> 
>>>
>>> Below some comments
> [...]
> 
 +  if ((key->objectid == BTRFS_FS_TREE_OBJECTID ||
 +  (key->objectid >= BTRFS_FIRST_FREE_OBJECTID &&
 +   key->objectid <= BTRFS_LAST_FREE_OBJECTID)) &&
 +  key->type >= BTRFS_ROOT_ITEM_KEY &&
 +  key->type <= BTRFS_ROOT_BACKREF_KEY)
>>>
>>> Why returning all the range [BTRFS_ROOT_ITEM_KEY...BTRFS_ROOT_BACKREF_KEY]. 
>>> It would be sufficient to return only
>>>
>>>   + (key->type == BTRFS_ROOT_ITEM_KEY ||
>>>   +  key->type == BTRFS_ROOT_BACKREF_KEY))
>>
>> Sorry, it is a mistake, I mean "key->type <= BTRFS_ROOTREF_KEY".
>> Although btrfs-progs only uses BTRFS_ROOT_BACKREF_KEY, I notice libbtrfs 
>> uses BTRFS_ROOT_REF_KEY instead. So, I think it is better to return both
>> ROOT_BACKREF_KEY and ROOT_REF_KEY. I will fix this in v2.
>>
> 
> I was referring to the '>=' and '<=' instead of '=='. If another type is 
> added in the middle, it would be returned. I find it a bit error prone.

ok, I will change this. thanks.
Tomohiro Misono

> 
> BR
> G.Baroncelli
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] kernel.h: Skip single-eval logic on literals in min()/max()

2018-03-08 Thread Kees Cook

On Thu, Mar 8, 2018 at 3:48 PM, Linus Torvalds
 wrote:
> On Thu, Mar 8, 2018 at 1:40 PM, Kees Cook  wrote:
>> +#define __min(t1, t2, x, y)\
>> +   __builtin_choose_expr(__builtin_constant_p(x) &&\
>> + __builtin_constant_p(y) &&\
>> + __builtin_types_compatible_p(t1, t2), \
>> + (t1)(x) < (t2)(y) ? (t1)(x) : (t2)(y),\
>
> I understand why you use __builtin_types_compatible_p(), but please don't.
>
> It will mean that trivial constants like "5" and "sizeof(x)" won't
> simplify, because they have different types.

Rasmus mentioned this too. What I said there was that I was shy to
make that change, since we already can't mix that kind of thing with
the existing min()/max() implementation. The existing min()/max() is
already extremely strict, so there are no instances of this in the
tree. If I explicitly add one, I see this with or without the patch:

In file included from drivers/misc/lkdtm.h:7:0,
 from drivers/misc/lkdtm_core.c:33:
drivers/misc/lkdtm_core.c: In function ‘lkdtm_module_exit’:
./include/linux/kernel.h:809:16: warning: comparison of distinct
pointer types lacks a cast
  (void) ( == );   \
^
./include/linux/kernel.h:818:2: note: in expansion of macro ‘__max’
  __max(typeof(x), typeof(y),   \
  ^
./include/linux/printk.h:308:34: note: in expansion of macro ‘max’
  printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
  ^~~
drivers/misc/lkdtm_core.c:500:2: note: in expansion of macro ‘pr_info’
  pr_info("%lu\n", max(16, sizeof(unsigned long)));
  ^~~

> The ?: will give the right combined type anyway, and if you want the
> type comparison warning, just add a comma-expression with something
> like like
>
>(t1 *)1 == (t2 *)1
>
> to get the type compatibility warning.

When I tried removing __builtin_types_compatible_p(), I still got the
type-check warning because I think the preprocessor still sees the
"(void) ( == )" before optimizing? So, I technically _can_
drop the __builtin_types_compatible_p(), and still keep the type
warning. :P

> Yeah, yeah, maybe none of the VLA cases triggered that, but it seems
> silly to not just get that obvious constant case right.
>
> Hmm?

So are you saying you _want_ the type enforcement weakened here, or
that I should just not use __builtin_types_compatible_p()?

Thanks!

-Kees

-- 
Kees Cook
Pixel Security
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 08/20] btrfs-progs: qgroups: introduce btrfs_qgroup_query

2018-03-08 Thread Qu Wenruo



On 2018年03月08日 23:21, Jeff Mahoney wrote:
> On 3/8/18 12:54 AM, Qu Wenruo wrote:
>>
>>
>> On 2018年03月08日 10:40, je...@suse.com wrote:
>>> From: Jeff Mahoney 
>>>
>>> The only mechanism we have in the progs for searching qgroups is to load
>>> all of them and filter the results.  This works for qgroup show but
>>> to add quota information to 'btrfs subvoluem show' it's pretty wasteful.
>>>
>>> This patch splits out setting up the search and performing the search so
>>> we can search for a single qgroupid more easily.  Since TREE_SEARCH
>>> will give results that don't strictly match the search terms, we add
>>> a filter to match only the results we care about.
>>>
>>> Signed-off-by: Jeff Mahoney 
>>> ---
>>>  qgroup.c | 143 
>>> ---
>>>  qgroup.h |   7 
>>>  2 files changed, 116 insertions(+), 34 deletions(-)
>>>
>>> diff --git a/qgroup.c b/qgroup.c
>>> index 57815718..d076b1de 100644
>>> --- a/qgroup.c
>>> +++ b/qgroup.c
>>> @@ -1165,11 +1165,30 @@ static inline void print_status_flag_warning(u64 
>>> flags)
>>> warning("qgroup data inconsistent, rescan recommended");
>>>  }
>>>  
>>> -static int __qgroups_search(int fd, struct qgroup_lookup *qgroup_lookup)
>>> +static bool key_in_range(const struct btrfs_key *key,
>>> +const struct btrfs_ioctl_search_key *sk)
>>> +{
>>> +   if (key->objectid < sk->min_objectid ||
>>> +   key->objectid > sk->max_objectid)
>>> +   return false;
>>> +
>>> +   if (key->type < sk->min_type ||
>>> +   key->type > sk->max_type)
>>> +   return false;
>>> +
>>> +   if (key->offset < sk->min_offset ||
>>> +   key->offset > sk->max_offset)
>>> +   return false;
>>> +
>>> +   return true;
>>> +}
>>
>> Even with the key_in_range() check here, we are still following the tree
>> search slice behavior:
>>
>> tree search will still gives us all the items in key range from
>> (min_objectid, min_type, min_offset) to
>> (max_objectid, max_type, max_offset).
>>
>> I don't see much different between the tree search ioctl and this one.
> 
> It's fundamentally different.
> 
> The one in the kernel has a silly interface.  It should be min_key and
> max_key since the components aren't evaluated independently.  It
> effectively treats min_key and max_key as 136-bit values and returns
> everything between them, inclusive.  That's the slice behavior, as you
> call it.
> 
> This key_in_range treats each component separately and acts as a filter
> on the slice returned from the kernel.  If we request min/max_offset =
> 259, the caller will not get anything without offset = 259.

You're right.

I'm just an idiot, the separate check on objectid/type/offset breaks the
slice behavior so we won't found any thing unrelated.


So the current design is completely fine, even we try to query something
doesn't exist we won't have any noise.

Thanks for the filter and this really helps a lot.

Reviewed-by: Qu Wenruo 

Thanks,
Qu

> 
> I suppose, ultimately, this could also be done using a filter on the
> rbtree using the existing interface but that seems even more wasteful.
> 
> -Jeff
> 
>>> +
>>> +static int __qgroups_search(int fd, struct btrfs_ioctl_search_args *args,
>>> +   struct qgroup_lookup *qgroup_lookup)
>>>  {
>>> int ret;
>>> -   struct btrfs_ioctl_search_args args;
>>> -   struct btrfs_ioctl_search_key *sk = 
>>> +   struct btrfs_ioctl_search_key *sk = >key;
>>> +   struct btrfs_ioctl_search_key filter_key = args->key;
>>> struct btrfs_ioctl_search_header *sh;
>>> unsigned long off = 0;
>>> unsigned int i;
>>> @@ -1180,30 +1199,15 @@ static int __qgroups_search(int fd, struct 
>>> qgroup_lookup *qgroup_lookup)
>>> u64 qgroupid;
>>> u64 qgroupid1;
>>>  
>>> -   memset(, 0, sizeof(args));
>>> -
>>> -   sk->tree_id = BTRFS_QUOTA_TREE_OBJECTID;
>>> -   sk->max_type = BTRFS_QGROUP_RELATION_KEY;
>>> -   sk->min_type = BTRFS_QGROUP_STATUS_KEY;
>>> -   sk->max_objectid = (u64)-1;
>>> -   sk->max_offset = (u64)-1;
>>> -   sk->max_transid = (u64)-1;
>>> -   sk->nr_items = 4096;
>>> -
>>> qgroup_lookup_init(qgroup_lookup);
>>>  
>>> while (1) {
>>> -   ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, );
>>> +   ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, args);
>>> if (ret < 0) {
>>> -   if (errno == ENOENT) {
>>> -   error("can't list qgroups: quotas not enabled");
>>> +   if (errno == ENOENT)
>>> ret = -ENOTTY;
>>> -   } else {
>>> -   error("can't list qgroups: %s",
>>> -  strerror(errno));
>>> +   else
>>> ret = -errno;
>>> -   }
>>> -
>>> break;
>>> }
>>>  
>>> @@ -1217,37 +1221,46 @@ static int __qgroups_search(int fd,

Re: [PATCH] kernel.h: Skip single-eval logic on literals in min()/max()

2018-03-08 Thread Linus Torvalds

On Thu, Mar 8, 2018 at 1:40 PM, Kees Cook  wrote:
> +#define __min(t1, t2, x, y)\
> +   __builtin_choose_expr(__builtin_constant_p(x) &&\
> + __builtin_constant_p(y) &&\
> + __builtin_types_compatible_p(t1, t2), \
> + (t1)(x) < (t2)(y) ? (t1)(x) : (t2)(y),\

I understand why you use __builtin_types_compatible_p(), but please don't.

It will mean that trivial constants like "5" and "sizeof(x)" won't
simplify, because they have different types.

The ?: will give the right combined type anyway, and if you want the
type comparison warning, just add a comma-expression with something
like like

   (t1 *)1 == (t2 *)1

to get the type compatibility warning.

Yeah, yeah, maybe none of the VLA cases triggered that, but it seems
silly to not just get that obvious constant case right.

Hmm?

  Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: spurious full btrfs corruption

2018-03-08 Thread Qu Wenruo



On 2018年03月08日 22:38, Christoph Anton Mitterer wrote:
> Hey.
> 
> 
> On Tue, 2018-03-06 at 09:50 +0800, Qu Wenruo wrote:
>>> These were the two files:
>>> -rw-r--r-- 1 calestyo calestyo   90112 Feb 22 16:46 'Lady In The
>>> Water/05.mp3'
>>> -rw-r--r-- 1 calestyo calestyo 4892407 Feb 27 23:28
>>> '/home/calestyo/share/music/Lady In The Water/05.mp3'
>>>
>>>
>>> -rw-r--r-- 1 calestyo calestyo 1904640 Feb 22 16:47 'The Hunt For
>>> Red October [Intrada]/21.mp3'
>>> -rw-r--r-- 1 calestyo calestyo 2968128 Feb 27 23:28
>>> '/home/calestyo/share/music/The Hunt For Red October
>>> [Intrada]/21.mp3'
>>>
>>> with the former (smaller one) being the corrupted one (i.e. the one
>>> returned by btrfs-restore).
>>>
>>> Both are (in terms of filesize) multiples of 4096... what does that
>>> mean now?
>>
>> That means either we lost some file extents or inode items.
>>
>> Btrfs-restore only found EXTENT_DATA, which contains the pointer to
>> the
>> real data, and inode number.
>> But no INODE_ITEM is found, which records the real inode size, so it
>> can
>> only use EXTENT_DATA to rebuild as much data as possible.
>> That why all recovered one is aligned to 4K.
>>
>> So some metadata is also corrupted.
> 
> But that can also happen to just some files?

Yep, one tree leaf corruption would leads to corruption to several files.

> Anyway... still strange that it hit just those two (which weren't
> touched for long).
> 
> 
>>> However, all the qcow2 files from the restore are more or less
>>> garbage.
>>> During the btrfs-restore it already complained on them, that it
>>> would
>>> loop too often on them and whether I want to continue or not (I
>>> choose
>>> n and on another full run I choose y).
>>>
>>> Some still contain a partition table, some partitions even
>>> filesystems
>>> (btrfs again)... but I cannot mount them.
>>
>> I think the same problem happens on them too.
>>
>> Some data is lost while some are good.
>> Anyway, they would be garbage.
> 
> Again, still strange... that so many files (of those that I really
> checked) were fully okay,... while those 4 were all broken.

One leaf contains some extent data is corrupted here.

> 
> When it only uses EXTENT_DATA, would that mean that it basically breaks
> on every border where the file is split up into multiple extents (which
> is of course likely for the (CoWed) images that I had.

This depends on the leaf boundary.

But normally one corrupted leaf can only lead to one or two corruption.
For 4 files, we have at least 2 leaf corruption.


> 
> 
> 
>>>
 Would you please try to restore the fs on another system with
 good
 memory?
>>>
>>> Which one? The originally broken fs from the SSD?
>>
>> Yep.
>>
>>> And what should I try to find out here?
>>
>> During restore, if the csum error happens again on the newly created
>> destination btrfs.
>> (And I recommend use mount option nospace_cache,notreelog on the
>> destination fs)
> 
> So an update on this (everything on the OLD notebook with likely good
> memory):
> 
> I booted again from USBstick (with 4.15 kernel/progs),
> luksOpened+losetup+luksOpened (yes two dm-crypt, the one from the
> external restore HDD, then the image file of the SSD which again
> contained dmcrypt+LUKS, of which one was the broken btrfs).
> 
> As I've mentioned before... btrfs-restore (and the other tools for
> trying to find the bytenr) immediately fail here.
> They bring some "block mapping error" and produce no output.
> 
> This worked on my first rescue attempt (where I had 4.12 kernel/progs).
> 
> Since I had no 4.12 kernel/progs at hand anymore, I went to an even
> older rescue stick, wich has 4.7 kernel/progs (if I'm not wrong).
> There it worked again (on the same image file).
> 
> So something changed after 4.14, which makes the tools no longer being
> able to restore at least that what they could restore at 4.14.

This seems to be a regression.
But I'm not sure if it's the kernel to blame or the btrfs-progs.

> 
> 
> => Some bug recently introduced in btrfs-progs?

Is the "block mapping error" message from kernel or btrfs-progs?

> 
> 
> 
> 
> I finished the dump then (from OLD notebook/good RAM) with 4.7
> kernel/progs,... to the very same external HDD I've used before.
> 
> And afterwards I:
> diff -qr --no-dereference restoreFromNEWnotebook/ restoreFromOLDnotebook/
> 
> => No differences were found, except one further file that was in the
> new restoreFromOLDnotebook. Could be that this was a file wich I
> deleted on the old restore because of csum errors, but I don't really
> remember (actually I thought to remember that there were a few which I
> deleted).
> 
> Since all other files were equal (that is at least in terms of file
> contents and symlink targets - I didn't compare the metadata like
> permissions, dates and owners)... the qcow2 images are garbage as well.
> 
> => No csum errors were recorded in the kernel log during the diff, and
> since both, the (remaining) restore results from the NEW notebook and
>

[PATCH v2] kernel.h: Skip single-eval logic on literals in min()/max()

2018-03-08 Thread Kees Cook

When max() is used in stack array size calculations from literal values
(e.g. "char foo[max(sizeof(struct1), sizeof(struct2))]", the compiler
thinks this is a dynamic calculation due to the single-eval logic, which
is not needed in the literal case. This change removes several accidental
stack VLAs from an x86 allmodconfig build:

$ diff -u before.txt after.txt | grep ^-
-drivers/input/touchscreen/cyttsp4_core.c:871:2: warning: ISO C90 forbids 
variable length array ‘ids’ [-Wvla]
-fs/btrfs/tree-checker.c:344:4: warning: ISO C90 forbids variable length array 
‘namebuf’ [-Wvla]
-lib/vsprintf.c:747:2: warning: ISO C90 forbids variable length array ‘sym’ 
[-Wvla]
-net/ipv4/proc.c:403:2: warning: ISO C90 forbids variable length array ‘buff’ 
[-Wvla]
-net/ipv6/proc.c:198:2: warning: ISO C90 forbids variable length array ‘buff’ 
[-Wvla]
-net/ipv6/proc.c:218:2: warning: ISO C90 forbids variable length array ‘buff64’ 
[-Wvla]

Based on an earlier patch from Josh Poimboeuf.

Signed-off-by: Kees Cook 
---
v2:
- fix copy/paste-o max1_/max2_ (ijc)
- clarify "compile-time" constant in comment (Rasmus)
- clean up formatting on min_t()/max_t()
---
 include/linux/kernel.h | 50 --
 1 file changed, 32 insertions(+), 18 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 3fd291503576..108cdf7bd484 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -787,37 +787,57 @@ static inline void ftrace_dump(enum ftrace_dump_mode 
oops_dump_mode) { }
  * strict type-checking.. See the
  * "unnecessary" pointer comparison.
  */
-#define __min(t1, t2, min1, min2, x, y) ({ \
+#define __single_eval_min(t1, t2, min1, min2, x, y) ({ \
t1 min1 = (x);  \
t2 min2 = (y);  \
(void) ( == );\
min1 < min2 ? min1 : min2; })
 
+/*
+ * In the case of compile-time constant values, there is no need to do
+ * the double-evaluation protection, so the raw comparison can be made.
+ * This allows min()/max() to be used in stack array allocations and
+ * avoid the compiler thinking it is a dynamic value leading to an
+ * accidental VLA.
+ */
+#define __min(t1, t2, x, y)\
+   __builtin_choose_expr(__builtin_constant_p(x) &&\
+ __builtin_constant_p(y) &&\
+ __builtin_types_compatible_p(t1, t2), \
+ (t1)(x) < (t2)(y) ? (t1)(x) : (t2)(y),\
+ __single_eval_min(t1, t2, \
+   __UNIQUE_ID(min1_), \
+   __UNIQUE_ID(min2_), \
+   x, y))
+
 /**
  * min - return minimum of two values of the same or compatible types
  * @x: first value
  * @y: second value
  */
-#define min(x, y)  \
-   __min(typeof(x), typeof(y), \
- __UNIQUE_ID(min1_), __UNIQUE_ID(min2_),   \
- x, y)
+#define min(x, y)  __min(typeof(x), typeof(y), x, y)
 
-#define __max(t1, t2, max1, max2, x, y) ({ \
+#define __single_eval_max(t1, t2, max1, max2, x, y) ({ \
t1 max1 = (x);  \
t2 max2 = (y);  \
(void) ( == );\
max1 > max2 ? max1 : max2; })
 
+#define __max(t1, t2, x, y)\
+   __builtin_choose_expr(__builtin_constant_p(x) &&\
+ __builtin_constant_p(y) &&\
+ __builtin_types_compatible_p(t1, t2), \
+ (t1)(x) > (t2)(y) ? (t1)(x) : (t2)(y),\
+ __single_eval_max(t1, t2, \
+   __UNIQUE_ID(max1_), \
+   __UNIQUE_ID(max2_), \
+   x, y))
 /**
  * max - return maximum of two values of the same or compatible types
  * @x: first value
  * @y: second value
  */
-#define max(x, y)  \
-   __max(typeof(x), typeof(y), \
- __UNIQUE_ID(max1_), __UNIQUE_ID(max2_),   \
- x, y)
+#define max(x, y)  __max(typeof(x), typeof(y), x, y)
 
 /**
  * min3 - return minimum of three values
@@ -869,10 +889,7 @@ static inline void ftrace_dump(enum ftrace_dump_mode 
oops_dump_mode) { }
  * @x: first value
  * @y: second value
  */
-#define min_t(type, x, y)  \
-   __min(type, type,   \
- __UNIQUE_ID(min1_), __UNIQUE_ID(min2_),

Re: [PATCH 0/3] Remove accidental VLA usage

2018-03-08 Thread Kees Cook

On Thu, Mar 8, 2018 at 2:12 PM, Rasmus Villemoes
 wrote:
> On 8 March 2018 at 21:39, Kees Cook  wrote:
>> However, this works for me:
>>
>> #define __new_max(t1, t2, max1, max2, x, y)\
>>__builtin_choose_expr(__builtin_constant_p(x) && \
>>  __builtin_constant_p(y) && \
>>  __builtin_types_compatible_p(t1, t2), \
>>  (t1)(x) > (t2)(y) ? (t1)(x) : (t2)(y),\
>>  __max(t1, t2, max1, max2, x, y))
>>
>> #define new_max(x, y) \
>> __new_max(typeof(x), typeof(y), \
>>   __UNIQUE_ID(max1_), __UNIQUE_ID(max2_),   \
>>   x, y)
>
> Yes, that would seem to do the trick.
>
> Thinking out loud: do we really want or need the
> __builtin_types_compatible condition when x and y are compile-time
> constants? I think it would be nice to be able to use max(16,
> sizeof(bla)) without having to cast either the literal or the sizeof.
> Just omitting the type compatibility check might be dangerous, but
> perhaps it could be relaxed to a check that both values are
> representable in their common promoted type. Something like
>
> (type_signed(t1) == type_signed(t2)) || ((t1)x >= 0 && (t2)y >= 0)
>
> should be safe (if the types have same signedness, or the value of
> signed type is positive), though it doesn't allow a few corner cases
> (e.g. short vs. unsigned char is always ok due to promotion to int,
> and also if the signed type is strictly wider than the unsigned type).

I agree, it would be nice. However, I think it'd be better to continue
to depend on max_t() for these kinds of cases though. For example:

char foo[max_t(size_t, 6, sizeof(something))];

Works with the proposed patch.

Also, I think this mismatch would already be triggering warnings, so
we shouldn't have any currently.

-Kees

-- 
Kees Cook
Pixel Security
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2] btrfs-progs: free-space-cache: Enhance free space cache free space check

2018-03-08 Thread Qu Wenruo



On 2018年03月08日 22:05, Nikolay Borisov wrote:
> 
> 
> On  8.03.2018 09:02, Qu Wenruo wrote:
>> When we found free space difference between free space cache and block
>> group item, we just discard this free space cache.
>>
>> Normally such difference is caused by btrfs_reserve_extent() called by
>> delalloc which is out of a transaction.
>> And since all btrfs_release_extent() is called with a transaction, under
>> heavy race free space cache can have less free space than block group
>> item.
>>
>> Normally kernel will detect such difference and just discard that cache.
>>
>> However we must be more careful if free space cache has more free space
>> cache, and if that happens, paried with above race one invalid free
>> space cache can be loaded into kernel.
>>
>> So if we find any free space cache who has more free space then block
>> group item, we report it as an error other than ignoring it.
>>
>> Signed-off-by: Qu Wenruo 
>> ---
>> v2:
>>   Fix the timming of free space output.
>> ---
>>  check/main.c   |  4 +++-
>>  free-space-cache.c | 32 
>>  2 files changed, 27 insertions(+), 9 deletions(-)
>>
>> diff --git a/check/main.c b/check/main.c
>> index 97baae583f04..bc31f7e32061 100644
>> --- a/check/main.c
>> +++ b/check/main.c
>> @@ -5339,7 +5339,9 @@ static int check_space_cache(struct btrfs_root *root)
>>  error += ret;
>>  } else {
>>  ret = load_free_space_cache(root->fs_info, cache);
>> -if (!ret)
>> +if (ret < 0)
>> +error++;
>> +if (ret <= 0)
>>  continue;
>>  }
>>  
>> diff --git a/free-space-cache.c b/free-space-cache.c
>> index f933f9f1cf3f..9b83a71ca59a 100644
>> --- a/free-space-cache.c
>> +++ b/free-space-cache.c
>> @@ -438,7 +438,8 @@ int load_free_space_cache(struct btrfs_fs_info *fs_info,
>>  struct btrfs_path *path;
>>  u64 used = btrfs_block_group_used(_group->item);
>>  int ret = 0;
>> -int matched;
>> +u64 bg_free;
>> +s64 diff;
>>  
>>  path = btrfs_alloc_path();
>>  if (!path)
>> @@ -448,18 +449,33 @@ int load_free_space_cache(struct btrfs_fs_info 
>> *fs_info,
>>block_group->key.objectid);
>>  btrfs_free_path(path);
>>  
>> -matched = (ctl->free_space == (block_group->key.offset - used -
>> -   block_group->bytes_super));
>> -if (ret == 1 && !matched) {
>> -__btrfs_remove_free_space_cache(ctl);
>> +bg_free = block_group->key.offset - used - block_group->bytes_super;
>> +diff = ctl->free_space - bg_free;
>> +if (ret == 1 && diff) {
>>  fprintf(stderr,
>> -   "block group %llu has wrong amount of free space\n",
>> -   block_group->key.objectid);
>> +   "block group %llu has wrong amount of free space, free 
>> space cache has %llu block group has %llu\n",nit: Always put units when 
>> printing numbers. In this case we are talking
> about bytes.
>> +   block_group->key.objectid, ctl->free_space, bg_free);
>> +__btrfs_remove_free_space_cache(ctl);
>> +/*
>> + * Due to btrfs_reserve_extent() can happen out of > +  
>>  * transaction, but all btrfs_release_extent() happens inside
>> + * a transaction, so under heavy race it's possible that free
>> + * space cache has less free space, and both kernel just discard
>> + * such cache. But if we find some case where free space cache
>> + * has more free space, this means under certain case such
>> + * cache can be loaded and cause double allocate.
>> + *
>> + * Detect such possibility here.
>> + */
>> +if (diff > 0)
>> +error(
>> +"free space cache has more free space than block group item, this could 
>> leads to serious corruption, please contact btrfs developers");
> 
> I'm not entirely happy with this message. So they will post to the
> mailing list saying something along the lines of "I got this message
> what do I do no, please help".  Better to output actionable data so that
> the user can post it immediately.

Unfortunately, this is already the situation we don't expect to see.

What we really need is to know this could happen, and if possible some
info about the situation.
There is not much actionable data here.

Thanks,
Qu

> 
>>  ret = -1;
>>  }
>>  
>>  if (ret < 0) {
>> -ret = 0;
>> +if (diff <= 0)
>> +ret = 0;
>>  
>>  fprintf(stderr,
>> "failed to load free space cache for block group %llu\n",
>>



signature.asc
Description: OpenPGP digital signature

Re: [PATCH] kernel.h: Skip single-eval logic on literals in min()/max()

2018-03-08 Thread Kees Cook

On Thu, Mar 8, 2018 at 2:18 PM, Andrew Morton  wrote:
> On Thu, 8 Mar 2018 13:40:45 -0800 Kees Cook  wrote:
>
>> When max() is used in stack array size calculations from literal values
>> (e.g. "char foo[max(sizeof(struct1), sizeof(struct2))]", the compiler
>> thinks this is a dynamic calculation due to the single-eval logic, which
>> is not needed in the literal case. This change removes several accidental
>> stack VLAs from an x86 allmodconfig build:
>>
>> $ diff -u before.txt after.txt | grep ^-
>> -drivers/input/touchscreen/cyttsp4_core.c:871:2: warning: ISO C90 forbids 
>> variable length array ‘ids’ [-Wvla]
>> -fs/btrfs/tree-checker.c:344:4: warning: ISO C90 forbids variable length 
>> array ‘namebuf’ [-Wvla]
>> -lib/vsprintf.c:747:2: warning: ISO C90 forbids variable length array ‘sym’ 
>> [-Wvla]
>> -net/ipv4/proc.c:403:2: warning: ISO C90 forbids variable length array 
>> ‘buff’ [-Wvla]
>> -net/ipv6/proc.c:198:2: warning: ISO C90 forbids variable length array 
>> ‘buff’ [-Wvla]
>> -net/ipv6/proc.c:218:2: warning: ISO C90 forbids variable length array 
>> ‘buff64’ [-Wvla]
>>
>> Based on an earlier patch from Josh Poimboeuf.
>>
>> ...
>>
>> --- a/include/linux/kernel.h
>> +++ b/include/linux/kernel.h
>> @@ -787,37 +787,57 @@ static inline void ftrace_dump(enum ftrace_dump_mode 
>> oops_dump_mode) { }
>>   * strict type-checking.. See the
>>   * "unnecessary" pointer comparison.
>>   */
>> -#define __min(t1, t2, min1, min2, x, y) ({   \
>> +#define __single_eval_min(t1, t2, min1, min2, x, y) ({   \
>>   t1 min1 = (x);  \
>>   t2 min2 = (y);  \
>>   (void) ( == );\
>>   min1 < min2 ? min1 : min2; })
>>
>> +/*
>> + * In the case of builtin constant values, there is no need to do the
>> + * double-evaluation protection, so the raw comparison can be made.
>> + * This allows min()/max() to be used in stack array allocations and
>> + * avoid the compiler thinking it is a dynamic value leading to an
>> + * accidental VLA.
>> + */
>> +#define __min(t1, t2, x, y)  \
>> + __builtin_choose_expr(__builtin_constant_p(x) &&\
>> +   __builtin_constant_p(y) &&\
>> +   __builtin_types_compatible_p(t1, t2), \
>> +   (t1)(x) < (t2)(y) ? (t1)(x) : (t2)(y),\
>> +   __single_eval_min(t1, t2, \
>> + __UNIQUE_ID(max1_), \
>> + __UNIQUE_ID(max2_), \
>> + x, y))
>> +
>
> Holy crap.
>
> I suppose gcc will one day be fixed and we won't need this.
>
> Is there a good reason to convert min()?  Surely nobody will be using
> min to dimension an array - always max?  Just for symmetry, I guess.

I just went with symmetry. It seems like an ugly risk to implement min
and mix differently. :) In theory it may produce smaller code for rare
min() uses, but I haven't actually verified that.

I will send a v2 with the two nits mentioned...

-Kees

-- 
Kees Cook
Pixel Security
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] kernel.h: Skip single-eval logic on literals in min()/max()

2018-03-08 Thread Andrew Morton

On Thu, 8 Mar 2018 13:40:45 -0800 Kees Cook  wrote:

> When max() is used in stack array size calculations from literal values
> (e.g. "char foo[max(sizeof(struct1), sizeof(struct2))]", the compiler
> thinks this is a dynamic calculation due to the single-eval logic, which
> is not needed in the literal case. This change removes several accidental
> stack VLAs from an x86 allmodconfig build:
> 
> $ diff -u before.txt after.txt | grep ^-
> -drivers/input/touchscreen/cyttsp4_core.c:871:2: warning: ISO C90 forbids 
> variable length array ‘ids’ [-Wvla]
> -fs/btrfs/tree-checker.c:344:4: warning: ISO C90 forbids variable length 
> array ‘namebuf’ [-Wvla]
> -lib/vsprintf.c:747:2: warning: ISO C90 forbids variable length array ‘sym’ 
> [-Wvla]
> -net/ipv4/proc.c:403:2: warning: ISO C90 forbids variable length array ‘buff’ 
> [-Wvla]
> -net/ipv6/proc.c:198:2: warning: ISO C90 forbids variable length array ‘buff’ 
> [-Wvla]
> -net/ipv6/proc.c:218:2: warning: ISO C90 forbids variable length array 
> ‘buff64’ [-Wvla]
> 
> Based on an earlier patch from Josh Poimboeuf.
> 
> ...
>
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -787,37 +787,57 @@ static inline void ftrace_dump(enum ftrace_dump_mode 
> oops_dump_mode) { }
>   * strict type-checking.. See the
>   * "unnecessary" pointer comparison.
>   */
> -#define __min(t1, t2, min1, min2, x, y) ({   \
> +#define __single_eval_min(t1, t2, min1, min2, x, y) ({   \
>   t1 min1 = (x);  \
>   t2 min2 = (y);  \
>   (void) ( == );\
>   min1 < min2 ? min1 : min2; })
>  
> +/*
> + * In the case of builtin constant values, there is no need to do the
> + * double-evaluation protection, so the raw comparison can be made.
> + * This allows min()/max() to be used in stack array allocations and
> + * avoid the compiler thinking it is a dynamic value leading to an
> + * accidental VLA.
> + */
> +#define __min(t1, t2, x, y)  \
> + __builtin_choose_expr(__builtin_constant_p(x) &&\
> +   __builtin_constant_p(y) &&\
> +   __builtin_types_compatible_p(t1, t2), \
> +   (t1)(x) < (t2)(y) ? (t1)(x) : (t2)(y),\
> +   __single_eval_min(t1, t2, \
> + __UNIQUE_ID(max1_), \
> + __UNIQUE_ID(max2_), \
> + x, y))
> +

Holy crap.

I suppose gcc will one day be fixed and we won't need this.

Is there a good reason to convert min()?  Surely nobody will be using
min to dimension an array - always max?  Just for symmetry, I guess.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] kernel.h: Skip single-eval logic on literals in min()/max()

2018-03-08 Thread Ian Campbell

On Thu, 2018-03-08 at 13:40 -0800, Kees Cook wrote:
> 
> +#define __min(t1, t2, x, y)  \
> + __builtin_choose_expr(__builtin_constant_p(x) &&\
> +   __builtin_constant_p(y) &&\
> +   __builtin_types_compatible_p(t1, t2), \
> +   (t1)(x) < (t2)(y) ? (t1)(x) : (t2)(y),\
> +   __single_eval_min(t1, t2, \
> + __UNIQUE_ID(max1_), \
> + __UNIQUE_ID(max2_), \

min1_ etc?

Ian.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] Remove accidental VLA usage

2018-03-08 Thread Rasmus Villemoes

On 8 March 2018 at 21:39, Kees Cook  wrote:
> On Thu, Mar 8, 2018 at 11:57 AM, Rasmus Villemoes
>  wrote:
>> On 2018-03-08 16:02, Josh Poimboeuf wrote:
>>> On Wed, Mar 07, 2018 at 07:30:44PM -0800, Kees Cook wrote:
>>> +extern long __error_incompatible_types_in_min_macro;
>>> +extern long __error_incompatible_types_in_max_macro;
>>> +
>>> +#define __min(t1, t2, x, y)  \
>>> + __builtin_choose_expr(__builtin_types_compatible_p(t1, t2), \
>>> +   (t1)(x) < (t2)(y) ? (t1)(x) : (t2)(y),\
>>> +   (t1)__error_incompatible_types_in_min_macro)
>>>
>>>  /**
>>>   * min - return minimum of two values of the same or compatible types
>>>   * @x: first value
>>>   * @y: second value
>>>   */
>>> -#define min(x, y)\
>>> - __min(typeof(x), typeof(y), \
>>> -   __UNIQUE_ID(min1_), __UNIQUE_ID(min2_),   \
>>> -   x, y)
>>> +#define min(x, y) __min(typeof(x), typeof(y), x, y)  \
>>>
>>
>> But this introduces the the-chosen-one-of-x-and-y-gets-evaluated-twice
>> problem. Maybe we don't care? But until we get a
>> __builtin_assert_this_has_no_side_effects() I think that's a little
>> dangerous.
>
> Eek, yes, we can't do the double-eval. The proposed change breaks
> things badly. :)
>
> a:   20
> b:   40
> max(a++, b++): 40
> a:   21
> b:   41
>
> a:   20
> b:   40
> new_max(a++, b++): 41
> a:   21
> b:   42
>
> However, this works for me:
>
> #define __new_max(t1, t2, max1, max2, x, y)\
>__builtin_choose_expr(__builtin_constant_p(x) && \
>  __builtin_constant_p(y) && \
>  __builtin_types_compatible_p(t1, t2), \
>  (t1)(x) > (t2)(y) ? (t1)(x) : (t2)(y),\
>  __max(t1, t2, max1, max2, x, y))
>
> #define new_max(x, y) \
> __new_max(typeof(x), typeof(y), \
>   __UNIQUE_ID(max1_), __UNIQUE_ID(max2_),   \
>   x, y)

Yes, that would seem to do the trick.

Thinking out loud: do we really want or need the
__builtin_types_compatible condition when x and y are compile-time
constants? I think it would be nice to be able to use max(16,
sizeof(bla)) without having to cast either the literal or the sizeof.
Just omitting the type compatibility check might be dangerous, but
perhaps it could be relaxed to a check that both values are
representable in their common promoted type. Something like

(type_signed(t1) == type_signed(t2)) || ((t1)x >= 0 && (t2)y >= 0)

should be safe (if the types have same signedness, or the value of
signed type is positive), though it doesn't allow a few corner cases
(e.g. short vs. unsigned char is always ok due to promotion to int,
and also if the signed type is strictly wider than the unsigned type).

Rasmus
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] kernel.h: Skip single-eval logic on literals in min()/max()

2018-03-08 Thread Kees Cook

When max() is used in stack array size calculations from literal values
(e.g. "char foo[max(sizeof(struct1), sizeof(struct2))]", the compiler
thinks this is a dynamic calculation due to the single-eval logic, which
is not needed in the literal case. This change removes several accidental
stack VLAs from an x86 allmodconfig build:

$ diff -u before.txt after.txt | grep ^-
-drivers/input/touchscreen/cyttsp4_core.c:871:2: warning: ISO C90 forbids 
variable length array ‘ids’ [-Wvla]
-fs/btrfs/tree-checker.c:344:4: warning: ISO C90 forbids variable length array 
‘namebuf’ [-Wvla]
-lib/vsprintf.c:747:2: warning: ISO C90 forbids variable length array ‘sym’ 
[-Wvla]
-net/ipv4/proc.c:403:2: warning: ISO C90 forbids variable length array ‘buff’ 
[-Wvla]
-net/ipv6/proc.c:198:2: warning: ISO C90 forbids variable length array ‘buff’ 
[-Wvla]
-net/ipv6/proc.c:218:2: warning: ISO C90 forbids variable length array ‘buff64’ 
[-Wvla]

Based on an earlier patch from Josh Poimboeuf.

Signed-off-by: Kees Cook 
---
 include/linux/kernel.h | 42 ++
 1 file changed, 30 insertions(+), 12 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 3fd291503576..e0b39d461582 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -787,37 +787,57 @@ static inline void ftrace_dump(enum ftrace_dump_mode 
oops_dump_mode) { }
  * strict type-checking.. See the
  * "unnecessary" pointer comparison.
  */
-#define __min(t1, t2, min1, min2, x, y) ({ \
+#define __single_eval_min(t1, t2, min1, min2, x, y) ({ \
t1 min1 = (x);  \
t2 min2 = (y);  \
(void) ( == );\
min1 < min2 ? min1 : min2; })
 
+/*
+ * In the case of builtin constant values, there is no need to do the
+ * double-evaluation protection, so the raw comparison can be made.
+ * This allows min()/max() to be used in stack array allocations and
+ * avoid the compiler thinking it is a dynamic value leading to an
+ * accidental VLA.
+ */
+#define __min(t1, t2, x, y)\
+   __builtin_choose_expr(__builtin_constant_p(x) &&\
+ __builtin_constant_p(y) &&\
+ __builtin_types_compatible_p(t1, t2), \
+ (t1)(x) < (t2)(y) ? (t1)(x) : (t2)(y),\
+ __single_eval_min(t1, t2, \
+   __UNIQUE_ID(max1_), \
+   __UNIQUE_ID(max2_), \
+   x, y))
+
 /**
  * min - return minimum of two values of the same or compatible types
  * @x: first value
  * @y: second value
  */
-#define min(x, y)  \
-   __min(typeof(x), typeof(y), \
- __UNIQUE_ID(min1_), __UNIQUE_ID(min2_),   \
- x, y)
+#define min(x, y)  __min(typeof(x), typeof(y), x, y)
 
-#define __max(t1, t2, max1, max2, x, y) ({ \
+#define __single_eval_max(t1, t2, max1, max2, x, y) ({ \
t1 max1 = (x);  \
t2 max2 = (y);  \
(void) ( == );\
max1 > max2 ? max1 : max2; })
 
+#define __max(t1, t2, x, y)\
+   __builtin_choose_expr(__builtin_constant_p(x) &&\
+ __builtin_constant_p(y) &&\
+ __builtin_types_compatible_p(t1, t2), \
+ (t1)(x) > (t2)(y) ? (t1)(x) : (t2)(y),\
+ __single_eval_max(t1, t2, \
+   __UNIQUE_ID(max1_), \
+   __UNIQUE_ID(max2_), \
+   x, y))
 /**
  * max - return maximum of two values of the same or compatible types
  * @x: first value
  * @y: second value
  */
-#define max(x, y)  \
-   __max(typeof(x), typeof(y), \
- __UNIQUE_ID(max1_), __UNIQUE_ID(max2_),   \
- x, y)
+#define max(x, y)  __max(typeof(x), typeof(y), x, y)
 
 /**
  * min3 - return minimum of three values
@@ -871,7 +891,6 @@ static inline void ftrace_dump(enum ftrace_dump_mode 
oops_dump_mode) { }
  */
 #define min_t(type, x, y)  \
__min(type, type,   \
- __UNIQUE_ID(min1_), __UNIQUE_ID(min2_),   \
  x, y)
 
 /**
@@ -882,7 +901,6 @@ static inline void ftrace_dump(enum ftrace_dump_mode 
oops_dump_mode) { }
  */
 #define max_t(type, x, y)  \

Re: [PATCH 0/3] Remove accidental VLA usage

2018-03-08 Thread Andrew Morton

On Thu, 8 Mar 2018 09:02:36 -0600 Josh Poimboeuf  wrote:

> On Wed, Mar 07, 2018 at 07:30:44PM -0800, Kees Cook wrote:
> > This series adds SIMPLE_MAX() to be used in places where a stack array
> > is actually fixed, but the compiler still warns about VLA usage due to
> > confusion caused by the safety checks in the max() macro.
> > 
> > I'm sending these via -mm since that's where I've introduced SIMPLE_MAX(),
> > and they should all have no operational differences.
> 
> What if we instead simplify the max() macro's type checking so that GCC
> can more easily fold the array size constants?  The below patch seems to
> work:
> 
> -/*
> - * min()/max()/clamp() macros that also do
> - * strict type-checking.. See the
> - * "unnecessary" pointer comparison.
> - */
> -#define __min(t1, t2, min1, min2, x, y) ({   \
> - t1 min1 = (x);  \
> - t2 min2 = (y);  \
> - (void) ( == );\
> - min1 < min2 ? min1 : min2; })
> +extern long __error_incompatible_types_in_min_macro;
> +extern long __error_incompatible_types_in_max_macro;
> +
> +#define __min(t1, t2, x, y)  \
> + __builtin_choose_expr(__builtin_types_compatible_p(t1, t2), \
> +   (t1)(x) < (t2)(y) ? (t1)(x) : (t2)(y),\
> +   (t1)__error_incompatible_types_in_min_macro)

This will move the error detection from compile-time to link-time. 
That's tolerable I guess, but a bit sad and should be flagged in the
changelog at least.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] Remove accidental VLA usage

2018-03-08 Thread Kees Cook

On Thu, Mar 8, 2018 at 11:57 AM, Rasmus Villemoes
 wrote:
> On 2018-03-08 16:02, Josh Poimboeuf wrote:
>> On Wed, Mar 07, 2018 at 07:30:44PM -0800, Kees Cook wrote:
>>> This series adds SIMPLE_MAX() to be used in places where a stack array
>>> is actually fixed, but the compiler still warns about VLA usage due to
>>> confusion caused by the safety checks in the max() macro.
>>>
>>> I'm sending these via -mm since that's where I've introduced SIMPLE_MAX(),
>>> and they should all have no operational differences.
>>
>> What if we instead simplify the max() macro's type checking so that GCC
>> can more easily fold the array size constants?  The below patch seems to
>> work:
>>
>
>> +extern long __error_incompatible_types_in_min_macro;
>> +extern long __error_incompatible_types_in_max_macro;
>> +
>> +#define __min(t1, t2, x, y)  \
>> + __builtin_choose_expr(__builtin_types_compatible_p(t1, t2), \
>> +   (t1)(x) < (t2)(y) ? (t1)(x) : (t2)(y),\
>> +   (t1)__error_incompatible_types_in_min_macro)
>>
>>  /**
>>   * min - return minimum of two values of the same or compatible types
>>   * @x: first value
>>   * @y: second value
>>   */
>> -#define min(x, y)\
>> - __min(typeof(x), typeof(y), \
>> -   __UNIQUE_ID(min1_), __UNIQUE_ID(min2_),   \
>> -   x, y)
>> +#define min(x, y) __min(typeof(x), typeof(y), x, y)  \
>>
>
> But this introduces the the-chosen-one-of-x-and-y-gets-evaluated-twice
> problem. Maybe we don't care? But until we get a
> __builtin_assert_this_has_no_side_effects() I think that's a little
> dangerous.

Eek, yes, we can't do the double-eval. The proposed change breaks
things badly. :)

a:   20
b:   40
max(a++, b++): 40
a:   21
b:   41

a:   20
b:   40
new_max(a++, b++): 41
a:   21
b:   42

However, this works for me:

#define __new_max(t1, t2, max1, max2, x, y)\
   __builtin_choose_expr(__builtin_constant_p(x) && \
 __builtin_constant_p(y) && \
 __builtin_types_compatible_p(t1, t2), \
 (t1)(x) > (t2)(y) ? (t1)(x) : (t2)(y),\
 __max(t1, t2, max1, max2, x, y))

#define new_max(x, y) \
__new_max(typeof(x), typeof(y), \
  __UNIQUE_ID(max1_), __UNIQUE_ID(max2_),   \
  x, y)

(pardon the whitespace damage...)

Let me spin a sane patch and test it...

-Kees

-- 
Kees Cook
Pixel Security
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] Remove accidental VLA usage

2018-03-08 Thread Rasmus Villemoes

On 2018-03-08 16:02, Josh Poimboeuf wrote:
> On Wed, Mar 07, 2018 at 07:30:44PM -0800, Kees Cook wrote:
>> This series adds SIMPLE_MAX() to be used in places where a stack array
>> is actually fixed, but the compiler still warns about VLA usage due to
>> confusion caused by the safety checks in the max() macro.
>>
>> I'm sending these via -mm since that's where I've introduced SIMPLE_MAX(),
>> and they should all have no operational differences.
> 
> What if we instead simplify the max() macro's type checking so that GCC
> can more easily fold the array size constants?  The below patch seems to
> work:
> 

> +extern long __error_incompatible_types_in_min_macro;
> +extern long __error_incompatible_types_in_max_macro;
> +
> +#define __min(t1, t2, x, y)  \
> + __builtin_choose_expr(__builtin_types_compatible_p(t1, t2), \
> +   (t1)(x) < (t2)(y) ? (t1)(x) : (t2)(y),\
> +   (t1)__error_incompatible_types_in_min_macro)
>  
>  /**
>   * min - return minimum of two values of the same or compatible types
>   * @x: first value
>   * @y: second value
>   */
> -#define min(x, y)\
> - __min(typeof(x), typeof(y), \
> -   __UNIQUE_ID(min1_), __UNIQUE_ID(min2_),   \
> -   x, y)
> +#define min(x, y) __min(typeof(x), typeof(y), x, y)  \
>  

But this introduces the the-chosen-one-of-x-and-y-gets-evaluated-twice
problem. Maybe we don't care? But until we get a
__builtin_assert_this_has_no_side_effects() I think that's a little
dangerous.

Rasmus
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: How to change/fix 'Received UUID'

2018-03-08 Thread Marc MERLIN

On Thu, Mar 08, 2018 at 09:36:49PM +0300, Andrei Borzenkov wrote:
> Yes. Your source has Received UUID. In this case btrfs send will
> transmit received UUID instead of subvolume UUID as reference to base
> snapshot. You need to either clear received UUID on source or set
> received UUID on destination to received UUID of source (not to
> subvolume UUID of source).

gargamel:/var/local/src/python-btrfs/examples# ./set_received_uuid.py 
0e220a4f-6426-4745-8399-0da0084f8b23 313
37 1234.5678 /mnt/btrfs_bigbackup/DS1/Video_ro.20180220_21:03:41
  
Current subvolume information:  
  
  subvol_id: 94887  
  
  received_uuid: 2afc7a5e-107f-d54b-8929-197b80b70828   
  
  stime: 1234.5678 (1970-01-01T00:20:34.567800) 
  
  stransid: 31337   
  
  rtime: 1520488877.415709329 (2018-03-08T06:01:17.415709)  
  
  rtransid: 255755  
  

  
Setting received subvolume...   
  

  
Resulting subvolume information:
  
  subvol_id: 94887  
  
  received_uuid: 0e220a4f-6426-4745-8399-0da0084f8b23   
  
  stime: 1234.5678 (1970-01-01T00:20:34.567800) 
  
  stransid: 31337   
  
  rtime: 1520537034.890253770 (2018-03-08T19:23:54.890254)  
  
  rtransid: 256119  
  

  
gargamel:/var/local/src/python-btrfs/examples# btrfs property set -ts 
/mnt/btrfs_bigbackup/DS1/Video_ro.201802
20_21:03:41 ro true

This worked fine, thank you so much.
I now have an incremental send that is going on and will take a few dozen 
minutes instead
of days for 8TB+ :)

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2] btrfs: Add nossd_spread mount option

2018-03-08 Thread Omar Sandoval

On Thu, Mar 08, 2018 at 10:48:48AM -0800, Howard McLauchlan wrote:
> Btrfs has two mount options for SSD optimizations: ssd and ssd_spread.
> Presently there is an option to disable all SSD optimizations, but there
> isn't an option to disable just ssd_spread.
> 
> This patch adds a mount option nossd_spread that disables ssd_spread
> only.
> 
> Reviewed-by: Josef Bacik 

Reviewed-by: Omar Sandoval 

> Signed-off-by: Howard McLauchlan 
> ---
> V2:remove duplicate code with fallthrough
> Based on 4.16-rc4
> 
>  fs/btrfs/super.c | 11 +++
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 4b817947e00f..5bfcbf6f7eea 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -310,10 +310,10 @@ static void btrfs_put_super(struct super_block *sb)
>  enum {
>   Opt_degraded, Opt_subvol, Opt_subvolid, Opt_device, Opt_nodatasum,
>   Opt_nodatacow, Opt_max_inline, Opt_alloc_start, Opt_nobarrier, Opt_ssd,
> - Opt_nossd, Opt_ssd_spread, Opt_thread_pool, Opt_noacl, Opt_compress,
> - Opt_compress_type, Opt_compress_force, Opt_compress_force_type,
> - Opt_notreelog, Opt_ratio, Opt_flushoncommit, Opt_discard,
> - Opt_space_cache, Opt_space_cache_version, Opt_clear_cache,
> + Opt_nossd, Opt_ssd_spread, Opt_nossd_spread, Opt_thread_pool, Opt_noacl,
> + Opt_compress, Opt_compress_type, Opt_compress_force,
> + Opt_compress_force_type, Opt_notreelog, Opt_ratio, Opt_flushoncommit,
> + Opt_discard, Opt_space_cache, Opt_space_cache_version, Opt_clear_cache,
>   Opt_user_subvol_rm_allowed, Opt_enospc_debug, Opt_subvolrootid,
>   Opt_defrag, Opt_inode_cache, Opt_no_space_cache, Opt_recovery,
>   Opt_skip_balance, Opt_check_integrity,
> @@ -353,6 +353,7 @@ static const match_table_t tokens = {
>   {Opt_ssd, "ssd"},
>   {Opt_ssd_spread, "ssd_spread"},
>   {Opt_nossd, "nossd"},
> + {Opt_nossd_spread, "nossd_spread"},
>   {Opt_acl, "acl"},
>   {Opt_noacl, "noacl"},
>   {Opt_notreelog, "notreelog"},
> @@ -579,6 +580,8 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char 
> *options,
>   btrfs_set_opt(info->mount_opt, NOSSD);
>   btrfs_clear_and_info(info, SSD,
>"not using ssd optimizations");
> + /* Fallthrough */
> + case Opt_nossd_spread:
>   btrfs_clear_and_info(info, SSD_SPREAD,
>"not using spread ssd allocation 
> scheme");
>   break;
> -- 
> 2.14.1
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2] btrfs: Add nossd_spread mount option

2018-03-08 Thread Howard McLauchlan

Btrfs has two mount options for SSD optimizations: ssd and ssd_spread.
Presently there is an option to disable all SSD optimizations, but there
isn't an option to disable just ssd_spread.

This patch adds a mount option nossd_spread that disables ssd_spread
only.

Reviewed-by: Josef Bacik 
Signed-off-by: Howard McLauchlan 
---
V2:remove duplicate code with fallthrough
Based on 4.16-rc4

 fs/btrfs/super.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 4b817947e00f..5bfcbf6f7eea 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -310,10 +310,10 @@ static void btrfs_put_super(struct super_block *sb)
 enum {
Opt_degraded, Opt_subvol, Opt_subvolid, Opt_device, Opt_nodatasum,
Opt_nodatacow, Opt_max_inline, Opt_alloc_start, Opt_nobarrier, Opt_ssd,
-   Opt_nossd, Opt_ssd_spread, Opt_thread_pool, Opt_noacl, Opt_compress,
-   Opt_compress_type, Opt_compress_force, Opt_compress_force_type,
-   Opt_notreelog, Opt_ratio, Opt_flushoncommit, Opt_discard,
-   Opt_space_cache, Opt_space_cache_version, Opt_clear_cache,
+   Opt_nossd, Opt_ssd_spread, Opt_nossd_spread, Opt_thread_pool, Opt_noacl,
+   Opt_compress, Opt_compress_type, Opt_compress_force,
+   Opt_compress_force_type, Opt_notreelog, Opt_ratio, Opt_flushoncommit,
+   Opt_discard, Opt_space_cache, Opt_space_cache_version, Opt_clear_cache,
Opt_user_subvol_rm_allowed, Opt_enospc_debug, Opt_subvolrootid,
Opt_defrag, Opt_inode_cache, Opt_no_space_cache, Opt_recovery,
Opt_skip_balance, Opt_check_integrity,
@@ -353,6 +353,7 @@ static const match_table_t tokens = {
{Opt_ssd, "ssd"},
{Opt_ssd_spread, "ssd_spread"},
{Opt_nossd, "nossd"},
+   {Opt_nossd_spread, "nossd_spread"},
{Opt_acl, "acl"},
{Opt_noacl, "noacl"},
{Opt_notreelog, "notreelog"},
@@ -579,6 +580,8 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char 
*options,
btrfs_set_opt(info->mount_opt, NOSSD);
btrfs_clear_and_info(info, SSD,
 "not using ssd optimizations");
+   /* Fallthrough */
+   case Opt_nossd_spread:
btrfs_clear_and_info(info, SSD_SPREAD,
 "not using spread ssd allocation 
scheme");
break;
-- 
2.14.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: How to change/fix 'Received UUID'

2018-03-08 Thread Andrei Borzenkov

08.03.2018 19:02, Marc MERLIN пишет:
> On Thu, Mar 08, 2018 at 09:34:45AM +0300, Andrei Borzenkov wrote:
>> 08.03.2018 09:06, Marc MERLIN пишет:
>>> On Tue, Mar 06, 2018 at 12:02:47PM -0800, Marc MERLIN wrote:
> https://github.com/knorrie/python-btrfs/commit/1ace623f95300ecf581b1182780fd6432a46b24d

 Well, I had never heard about it until now, thank you.

 I'll see if I can make it work when I get a bit of time.
>>>
>>> Sorry, I missed the fact that there was no code to write at all.
>>> gargamel:/var/local/src/python-btrfs/examples# ./set_received_uuid.py 
>>> 2afc7a5e-107f-d54b-8929-197b80b70828 31337 1234.5678 
>>> /mnt/btrfs_bigbackup/DS1/Video_ro.20180220_21:03:41
>>> Current subvolume information:
>>>   subvol_id: 94887
>>>   received_uuid: ----
>>>   stime: 0.0 (1970-01-01T00:00:00)
>>>   stransid: 0  
>>>   rtime: 0.0 (1970-01-01T00:00:00)
>>>   rtransid: 0  
>>>
>>> Setting received subvolume...
>>>
>>> Resulting subvolume information:
>>>   subvol_id: 94887
>>>   received_uuid: 2afc7a5e-107f-d54b-8929-197b80b70828
>>>   stime: 1234.5678 (1970-01-01T00:20:34.567800)
>>>   stransid: 31337
>>>   rtime: 1520488877.415709329 (2018-03-08T06:01:17.415709)
>>>   rtransid: 255755
>>>
>>> gargamel:/var/local/src/python-btrfs/examples# btrfs property set -ts 
>>> /mnt/btrfs_bigbackup/DS1/Video_ro.20180220_21:03:41 ro true
>>>
>>>
>>> ABORT: btrfs send -p /mnt/btrfs_pool1/Video_ro.20180205_21:05:15 
>>> Video_ro.20180307_22:03:03 |  btrfs receive /mnt/btrfs_bigbackup/DS1//. 
>>> failed
>>> At subvol Video_ro.20180307_22:03:03
>>> At snapshot Video_ro.20180307_22:03:03
>>> ERROR: cannot find parent subvolume
>>>
>>> gargamel:/mnt/btrfs_pool1# btrfs subvolume show 
>>> /mnt/btrfs_pool1/Video_ro.20180220_21\:03\:41/
>>> Video_ro.20180220_21:03:41
>>
>> Not sure I understand how this subvolume is related. You send
>> differences between Video_ro.20180205_21:05:15 and
>> Video_ro.20180307_22:03:03, so you need to have (replica of)
>> Video_ro.20180205_21:05:15 on destination. How exactly
>> Video_ro.20180220_21:03:41 comes in picture here?
>  
> Sorry, I pasted the wrong thing.
> ABORT: btrfs send -p /mnt/btrfs_pool1/Video_ro.20180220_21:03:41 
> Video_ro.20180308_07:50:06 |  btrfs receive /mnt/btrfs_bigbackup/DS1//. failed
> At subvol Video_ro.20180308_07:50:06
> At snapshot Video_ro.20180308_07:50:06
> ERROR: cannot find parent subvolume
> 
> Same problem basically, just copied the wrong attempt, sorry about that.
> 
> Do I need to make sure of more than
> DS1/Video_ro.20180220_21:03:41
> Received UUID:  2afc7a5e-107f-d54b-8929-197b80b70828
> 
> be equal to
> Name:   Video_ro.20180220_21:03:41
> UUID:   2afc7a5e-107f-d54b-8929-197b80b70828
> 

Yes. Your source has Received UUID. In this case btrfs send will
transmit received UUID instead of subvolume UUID as reference to base
snapshot. You need to either clear received UUID on source or set
received UUID on destination to received UUID of source (not to
subvolume UUID of source).

> Thanks,
> Marc
> 
> 
>>> Name:   Video_ro.20180220_21:03:41
>>> UUID:   2afc7a5e-107f-d54b-8929-197b80b70828
>>> Parent UUID:e5ec5c1e-6b49-084e-8820-5a8cfaa1b089
>>> Received UUID:  0e220a4f-6426-4745-8399-0da0084f8b23
I mean this:

>>> Creation time:  2018-02-20 21:03:42 -0800
>>> Subvolume ID:   11228
>>> Generation: 4174
>>> Gen at creation:4150
>>> Parent ID:  5
>>> Top level ID:   5
>>> Flags:  readonly
>>> Snapshot(s):
>>> Video_rw.20180220_21:03:41
>>> Video
>>>
>>>
>>> Wasn't I supposed to set 2afc7a5e-107f-d54b-8929-197b80b70828 onto the 
>>> destination?
>>>
>>> Doesn't that look ok now? Is there something else I'm missing?
>>> gargamel:/mnt/btrfs_pool1# btrfs subvolume show 
>>> /mnt/btrfs_bigbackup/DS1/Video_ro.20180220_21:03:41
>>> DS1/Video_ro.20180220_21:03:41
>>> Name:   Video_ro.20180220_21:03:41
>>> UUID:   cb4f343c-5e79-7f49-adf0-7ce0b29f23b3
>>> Parent UUID:0e220a4f-6426-4745-8399-0da0084f8b23
>>> Received UUID:  2afc7a5e-107f-d54b-8929-197b80b70828
>>> Creation time:  2018-02-20 21:13:36 -0800
>>> Subvolume ID:   94887
>>> Generation: 250689
>>> Gen at creation:250689
>>> Parent ID:  89160
>>> Top level ID:   89160
>>> Flags:  readonly
>>> Snapshot(s):
>>>
>>> Thanks,
>>> Marc
>>>
>>
>>
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org

Re: [PATCH] fstests: test regression of -EEXIST on creating new file after log replay

2018-03-08 Thread Liu Bo

On Thu, Mar 08, 2018 at 10:07:33AM +, Filipe Manana wrote:
> On Wed, Mar 7, 2018 at 11:59 PM, Liu Bo  wrote:
> > The regression is introduced to btrfs in linux v4.4 and it refuses to create
> > new files after log replay by returning -EEXIST.
> >
> > Although the problem is on btrfs only, there is no btrfs stuff in terms of
> > test, so this makes it generic.
> >
> > The kernel fix is
> >   Btrfs: fix unexpected -EEXIST when creating new inode
> >
> > Signed-off-by: Liu Bo 
> > ---
> >  tests/generic/481 | 75 
> > +++
> >  tests/generic/481.out |  5 
> >  tests/generic/group   |  1 +
> >  3 files changed, 81 insertions(+)
> >  create mode 100755 tests/generic/481
> >  create mode 100644 tests/generic/481.out
> >
> > diff --git a/tests/generic/481 b/tests/generic/481
> > new file mode 100755
> > index 000..8d8bb2b
> > --- /dev/null
> > +++ b/tests/generic/481
> > @@ -0,0 +1,75 @@
> > +#! /bin/bash
> > +# FSQA Test No. 481
> > +#
> > +# Reproduce a regression of btrfs that leads to -EEXIST on creating new 
> > files
> > +# after log replay.
> > +#
> > +# The kernel fix is
> > +#   Btrfs: fix unexpected -EEXIST when creating new inode
> > +#
> > +#---
> > +#
> > +# Copyright (C) 2018 Oracle. All Rights Reserved.
> > +# Author: Bo Liu 
> > +#
> > +# This program is free software; you can redistribute it and/or
> > +# modify it under the terms of the GNU General Public License as
> > +# published by the Free Software Foundation.
> > +#
> > +# This program is distributed in the hope that it would be useful,
> > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +# GNU General Public License for more details.
> > +#
> > +# You should have received a copy of the GNU General Public License
> > +# along with this program; if not, write the Free Software Foundation,
> > +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> > +#---
> > +#
> > +
> > +seq=`basename $0`
> > +seqres=$RESULT_DIR/$seq
> > +echo "QA output created by $seq"
> > +tmp=/tmp/$$
> > +status=1   # failure is the default!
> > +trap "_cleanup; exit \$status" 0 1 2 3 15
> > +
> > +_cleanup()
> > +{
> > +   _cleanup_flakey
> > +   cd /
> > +   rm -f $tmp.*
> > +}
> > +
> > +# get standard environment, filters and checks
> > +. ./common/rc
> > +. ./common/filter
> > +. ./common/dmflakey
> > +
> > +# real QA test starts here
> > +_supported_fs generic
> > +_supported_os Linux
> > +_require_scratch
> > +_require_dm_target flakey
> > +
> > +rm -f $seqres.full
> > +
> > +_scratch_mkfs >>$seqres.full 2>&1
> > +_init_flakey
> > +_mount_flakey
> > +
> > +# create a file and keep it in write ahead log
> > +$XFS_IO_PROG -f -c "pwrite 0 4k" -c "fsync" $SCRATCH_MNT/foo | 
> > _filter_xfs_io
> 
> What's the point of writing data to the file if we later don't check
> that the data is there (and the file size is 4K)?
> To trigger the bug in btrfs, it was also not needed to write data to
> the file in the first place.
>

Oh right, only 'touch' should be fine enough.

> Other then that looks good.
> Thanks for doing this.
> 
> Reviewed-by: Filipe Manana 
> 
> > +
> > +# fail this filesystem so that remount can replay the write ahead log
> > +_flakey_drop_and_remount
> > +
> > +# see if we can create a new file successfully
> > +touch $SCRATCH_MNT/bar
> > +
> > +_unmount_flakey
> > +
> > +echo "Silence is golden"
> > +
> > +status=0
> > +exit
> > diff --git a/tests/generic/481.out b/tests/generic/481.out
> > new file mode 100644
> > index 000..66a6345
> > --- /dev/null
> > +++ b/tests/generic/481.out
> > @@ -0,0 +1,5 @@
> > +QA output created by 481
> > +wrote 4096/4096 bytes at offset 0
> > +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> > +touch: cannot touch '/mnt/scratch/bar': File exists

Will fix this, too.

Thanks for the comments.

Thanks,

-liubo

> > +Silence is golden
> > diff --git a/tests/generic/group b/tests/generic/group
> > index ea2056b..05f60f2 100644
> > --- a/tests/generic/group
> > +++ b/tests/generic/group
> > @@ -483,3 +483,4 @@
> >  478 auto quick
> >  479 auto quick metadata
> >  480 auto quick metadata
> > +481 auto quick
> > --
> > 2.5.0
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe fstests" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> 
> -- 
> Filipe David Manana,
> 
> “Whether you think you can, or you think you can't — you're right.”
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at

Re: [PATCH 0/3] Remove accidental VLA usage

2018-03-08 Thread Josh Poimboeuf

On Thu, Mar 08, 2018 at 10:02:01AM -0800, Kees Cook wrote:
> On Thu, Mar 8, 2018 at 7:02 AM, Josh Poimboeuf  wrote:
> > On Wed, Mar 07, 2018 at 07:30:44PM -0800, Kees Cook wrote:
> >> This series adds SIMPLE_MAX() to be used in places where a stack array
> >> is actually fixed, but the compiler still warns about VLA usage due to
> >> confusion caused by the safety checks in the max() macro.
> >>
> >> I'm sending these via -mm since that's where I've introduced SIMPLE_MAX(),
> >> and they should all have no operational differences.
> >
> > What if we instead simplify the max() macro's type checking so that GCC
> > can more easily fold the array size constants?  The below patch seems to
> > work:
> 
> Oh magic! Very nice. I couldn't figure out how to do this when I
> stared at it. Yes, let me respin. (I assume I can add your S-o-b?)

I'm going to be traveling for a few days, so I bequeath the patch to
you.  You can add my SOB.  I agree with Steve's suggestion to run it
through 0-day.

> 
> -Kees
> 
> >
> > diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> > index 3fd291503576..ec863726da29 100644
> > --- a/include/linux/kernel.h
> > +++ b/include/linux/kernel.h
> > @@ -782,42 +782,32 @@ ftrace_vprintk(const char *fmt, va_list ap)
> >  static inline void ftrace_dump(enum ftrace_dump_mode oops_dump_mode) { }
> >  #endif /* CONFIG_TRACING */
> >
> > -/*
> > - * min()/max()/clamp() macros that also do
> > - * strict type-checking.. See the
> > - * "unnecessary" pointer comparison.
> > - */
> > -#define __min(t1, t2, min1, min2, x, y) ({ \
> > -   t1 min1 = (x);  \
> > -   t2 min2 = (y);  \
> > -   (void) ( == );\
> > -   min1 < min2 ? min1 : min2; })
> > +extern long __error_incompatible_types_in_min_macro;
> > +extern long __error_incompatible_types_in_max_macro;
> > +
> > +#define __min(t1, t2, x, y)\
> > +   __builtin_choose_expr(__builtin_types_compatible_p(t1, t2), \
> > + (t1)(x) < (t2)(y) ? (t1)(x) : (t2)(y),\
> > + (t1)__error_incompatible_types_in_min_macro)
> >
> >  /**
> >   * min - return minimum of two values of the same or compatible types
> >   * @x: first value
> >   * @y: second value
> >   */
> > -#define min(x, y)  \
> > -   __min(typeof(x), typeof(y), \
> > - __UNIQUE_ID(min1_), __UNIQUE_ID(min2_),   \
> > - x, y)
> > +#define min(x, y) __min(typeof(x), typeof(y), x, y)\
> >
> > -#define __max(t1, t2, max1, max2, x, y) ({ \
> > -   t1 max1 = (x);  \
> > -   t2 max2 = (y);  \
> > -   (void) ( == );\
> > -   max1 > max2 ? max1 : max2; })
> > +#define __max(t1, t2, x, y)\
> > +   __builtin_choose_expr(__builtin_types_compatible_p(t1, t2), \
> > + (t1)(x) > (t2)(y) ? (t1)(x) : (t2)(y),\
> > + (t1)__error_incompatible_types_in_max_macro)
> >
> >  /**
> >   * max - return maximum of two values of the same or compatible types
> >   * @x: first value
> >   * @y: second value
> >   */
> > -#define max(x, y)  \
> > -   __max(typeof(x), typeof(y), \
> > - __UNIQUE_ID(max1_), __UNIQUE_ID(max2_),   \
> > - x, y)
> > +#define max(x, y) __max(typeof(x), typeof(y), x, y)
> >
> >  /**
> >   * min3 - return minimum of three values
> > @@ -869,10 +859,7 @@ static inline void ftrace_dump(enum ftrace_dump_mode 
> > oops_dump_mode) { }
> >   * @x: first value
> >   * @y: second value
> >   */
> > -#define min_t(type, x, y)  \
> > -   __min(type, type,   \
> > - __UNIQUE_ID(min1_), __UNIQUE_ID(min2_),   \
> > - x, y)
> > +#define min_t(type, x, y) __min(type, type, x, y)
> >
> >  /**
> >   * max_t - return maximum of two values, using the specified type
> > @@ -880,10 +867,7 @@ static inline void ftrace_dump(enum ftrace_dump_mode 
> > oops_dump_mode) { }
> >   * @x: first value
> >   * @y: second value
> >   */
> > -#define max_t(type, x, y)  \
> > -   __max(type, type,   \
> > - __UNIQUE_ID(min1_), __UNIQUE_ID(min2_),   \
> > - x, y)
> > +#define max_t(type, x, y) __max(type, type, x, y)  
> > \
> >
> >  /**
> >   * clamp_t - return a value clamped to a given range using a given type
> 
> 
> 
> -- 
> Kees Cook
> Pixel Security

-- 
Josh
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to

Re: [PATCH 0/3] Remove accidental VLA usage

2018-03-08 Thread Steven Rostedt

On Thu, 8 Mar 2018 09:02:36 -0600
Josh Poimboeuf  wrote:

> On Wed, Mar 07, 2018 at 07:30:44PM -0800, Kees Cook wrote:
> > This series adds SIMPLE_MAX() to be used in places where a stack array
> > is actually fixed, but the compiler still warns about VLA usage due to
> > confusion caused by the safety checks in the max() macro.
> > 
> > I'm sending these via -mm since that's where I've introduced SIMPLE_MAX(),
> > and they should all have no operational differences.  
> 
> What if we instead simplify the max() macro's type checking so that GCC
> can more easily fold the array size constants?  The below patch seems to
> work:

Nice. Have you tried to do a allmodconfig and build on various archs?

Of course pushing it to kernel.org will have the zero day bot do it for
you ;-)

-- Steve

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] Remove accidental VLA usage

2018-03-08 Thread Kees Cook

On Thu, Mar 8, 2018 at 7:02 AM, Josh Poimboeuf  wrote:
> On Wed, Mar 07, 2018 at 07:30:44PM -0800, Kees Cook wrote:
>> This series adds SIMPLE_MAX() to be used in places where a stack array
>> is actually fixed, but the compiler still warns about VLA usage due to
>> confusion caused by the safety checks in the max() macro.
>>
>> I'm sending these via -mm since that's where I've introduced SIMPLE_MAX(),
>> and they should all have no operational differences.
>
> What if we instead simplify the max() macro's type checking so that GCC
> can more easily fold the array size constants?  The below patch seems to
> work:

Oh magic! Very nice. I couldn't figure out how to do this when I
stared at it. Yes, let me respin. (I assume I can add your S-o-b?)

-Kees

>
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 3fd291503576..ec863726da29 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -782,42 +782,32 @@ ftrace_vprintk(const char *fmt, va_list ap)
>  static inline void ftrace_dump(enum ftrace_dump_mode oops_dump_mode) { }
>  #endif /* CONFIG_TRACING */
>
> -/*
> - * min()/max()/clamp() macros that also do
> - * strict type-checking.. See the
> - * "unnecessary" pointer comparison.
> - */
> -#define __min(t1, t2, min1, min2, x, y) ({ \
> -   t1 min1 = (x);  \
> -   t2 min2 = (y);  \
> -   (void) ( == );\
> -   min1 < min2 ? min1 : min2; })
> +extern long __error_incompatible_types_in_min_macro;
> +extern long __error_incompatible_types_in_max_macro;
> +
> +#define __min(t1, t2, x, y)\
> +   __builtin_choose_expr(__builtin_types_compatible_p(t1, t2), \
> + (t1)(x) < (t2)(y) ? (t1)(x) : (t2)(y),\
> + (t1)__error_incompatible_types_in_min_macro)
>
>  /**
>   * min - return minimum of two values of the same or compatible types
>   * @x: first value
>   * @y: second value
>   */
> -#define min(x, y)  \
> -   __min(typeof(x), typeof(y), \
> - __UNIQUE_ID(min1_), __UNIQUE_ID(min2_),   \
> - x, y)
> +#define min(x, y) __min(typeof(x), typeof(y), x, y)\
>
> -#define __max(t1, t2, max1, max2, x, y) ({ \
> -   t1 max1 = (x);  \
> -   t2 max2 = (y);  \
> -   (void) ( == );\
> -   max1 > max2 ? max1 : max2; })
> +#define __max(t1, t2, x, y)\
> +   __builtin_choose_expr(__builtin_types_compatible_p(t1, t2), \
> + (t1)(x) > (t2)(y) ? (t1)(x) : (t2)(y),\
> + (t1)__error_incompatible_types_in_max_macro)
>
>  /**
>   * max - return maximum of two values of the same or compatible types
>   * @x: first value
>   * @y: second value
>   */
> -#define max(x, y)  \
> -   __max(typeof(x), typeof(y), \
> - __UNIQUE_ID(max1_), __UNIQUE_ID(max2_),   \
> - x, y)
> +#define max(x, y) __max(typeof(x), typeof(y), x, y)
>
>  /**
>   * min3 - return minimum of three values
> @@ -869,10 +859,7 @@ static inline void ftrace_dump(enum ftrace_dump_mode 
> oops_dump_mode) { }
>   * @x: first value
>   * @y: second value
>   */
> -#define min_t(type, x, y)  \
> -   __min(type, type,   \
> - __UNIQUE_ID(min1_), __UNIQUE_ID(min2_),   \
> - x, y)
> +#define min_t(type, x, y) __min(type, type, x, y)
>
>  /**
>   * max_t - return maximum of two values, using the specified type
> @@ -880,10 +867,7 @@ static inline void ftrace_dump(enum ftrace_dump_mode 
> oops_dump_mode) { }
>   * @x: first value
>   * @y: second value
>   */
> -#define max_t(type, x, y)  \
> -   __max(type, type,   \
> - __UNIQUE_ID(min1_), __UNIQUE_ID(min2_),   \
> - x, y)
> +#define max_t(type, x, y) __max(type, type, x, y)
>   \
>
>  /**
>   * clamp_t - return a value clamped to a given range using a given type



-- 
Kees Cook
Pixel Security
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Ongoing Btrfs stability issues

2018-03-08 Thread Alex Adriaanse

On Mar 2, 2018, at 11:29 AM, Liu Bo  wrote:
> On Thu, Mar 01, 2018 at 09:40:41PM +0200, Nikolay Borisov wrote:
>> On  1.03.2018 21:04, Alex Adriaanse wrote:
>>> Thanks so much for the suggestions so far, everyone. I wanted to report 
>>> back on this. Last Friday I made the following changes per suggestions from 
>>> this thread:
>>> 
>>> 1. Change the nightly balance to the following:
>>> 
>>>btrfs balance start -dusage=20 
>>>btrfs balance start -dusage=40,limit=10 
>>>btrfs balance start -musage=30 
>>> 
>>> 2. Upgrade kernels for all VMs to 4.14.13-1~bpo9+1, which contains the SSD 
>>> space allocation fix.
>>> 
>>> 3. Boot Linux with the elevator=noop option
>>> 
>>> 4. Change /sys/block/xvd*/queue/scheduler to "none"
>>> 
>>> 5. Mount all our Btrfs filesystems with the "enospc_debug" option.
>> 
>> SO that's good, however you didn't apply the out of tree patch (it has
>> already been merged into the for-next so will likely land in 4.17) I
>> pointed you at. As a result when you your ENOSPC error there is no extra
>> information being printed so we can't really reason about what might be
>> going wrong in the metadata flushing algorithms.

Sorry, I clearly missed that one. I have applied the patch you referenced and 
rebooted the VM in question. This morning we had another FS failure on the same 
machine that caused it to go into readonly mode. This happened after that 
device was experiencing 100% I/O utilization for some time. No balance was 
running at the time; last balance finished about 6 hours prior to the error.

Kernel messages:
[211238.262683] use_block_rsv: 163 callbacks suppressed
[211238.262683] BTRFS: block rsv returned -28
[211238.266718] [ cut here ]
[211238.270462] WARNING: CPU: 0 PID: 391 at fs/btrfs/extent-tree.c:8463 
btrfs_alloc_tree_block+0x39b/0x4c0 [btrfs]
[211238.277203] Modules linked in: xt_nat xt_tcpudp veth ipt_MASQUERADE 
nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo 
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype 
iptable_filter xt_conntrack nf_nat nf_conntrack libcrc32c crc32c_generic 
br_netfilter bridge stp llc intel_rapl sb_edac crct10dif_pclmul crc32_pclmul 
ghash_clmulni_intel ppdev parport_pc intel_rapl_perf parport serio_raw evdev 
ip_tables x_tables autofs4 btrfs xor zstd_decompress zstd_compress xxhash 
raid6_pq ata_generic crc32c_intel ata_piix libata xen_blkfront cirrus ttm 
drm_kms_helper aesni_intel aes_x86_64 crypto_simd cryptd glue_helper psmouse 
drm ena scsi_mod i2c_piix4 button
[211238.319618] CPU: 0 PID: 391 Comm: btrfs-transacti Tainted: GW   
4.14.13 #3
[211238.325479] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
[211238.330742] task: 9cb43abb70c0 task.stack: b234c3b58000
[211238.335575] RIP: 0010:btrfs_alloc_tree_block+0x39b/0x4c0 [btrfs]
[211238.340454] RSP: 0018:b234c3b5b958 EFLAGS: 00010282
[211238.344782] RAX: 001d RBX: 9cb43bdea128 RCX: 

[211238.350562] RDX:  RSI: 9cb440a166f8 RDI: 
9cb440a166f8
[211238.356066] RBP: 4000 R08: 0001 R09: 
7d81
[211238.361649] R10: 0001 R11: 7d81 R12: 
9cb43bdea000
[211238.367304] R13: 9cb437f2c800 R14: 0001 R15: 
ffe4
[211238.372658] FS:  () GS:9cb440a0() 
knlGS:
[211238.379048] CS:  0010 DS:  ES:  CR0: 80050033
[211238.384681] CR2: 7f90a6677000 CR3: 0003cea0a006 CR4: 
001606f0
[211238.391380] DR0:  DR1:  DR2: 

[211238.398050] DR3:  DR6: fffe0ff0 DR7: 
0400
[211238.404730] Call Trace:
[211238.407880]  __btrfs_cow_block+0x125/0x5c0 [btrfs]
[211238.412455]  btrfs_cow_block+0xcb/0x1b0 [btrfs]
[211238.416292]  btrfs_search_slot+0x1fd/0x9e0 [btrfs]
[211238.420630]  lookup_inline_extent_backref+0x105/0x610 [btrfs]
[211238.425215]  __btrfs_free_extent.isra.61+0xf5/0xd30 [btrfs]
[211238.429663]  __btrfs_run_delayed_refs+0x516/0x12a0 [btrfs]
[211238.434077]  btrfs_run_delayed_refs+0x7a/0x270 [btrfs]
[211238.438541]  btrfs_commit_transaction+0x3e1/0x950 [btrfs]
[211238.442899]  ? remove_wait_queue+0x60/0x60
[211238.446503]  transaction_kthread+0x195/0x1b0 [btrfs]
[211238.450578]  kthread+0xfc/0x130
[211238.453924]  ? btrfs_cleanup_transaction+0x580/0x580 [btrfs]
[211238.458381]  ? kthread_create_on_node+0x70/0x70
[211238.462225]  ? do_group_exit+0x3a/0xa0
[211238.465586]  ret_from_fork+0x1f/0x30
[211238.468814] Code: ff 48 c7 c6 28 97 58 c0 48 c7 c7 a0 e1 5d c0 e8 0c d0 f7 
d5 85 c0 0f 84 1c fd ff ff 44 89 fe 48 c7 c7 58 0c 59 c0 e8 70 2f 9e d5 <0f> ff 
e9 06 fd ff ff 4c 63 e8 31 d2 48 89 ee 48 89 df e8 4e eb
[211238.482366] ---[ end trace 48dd1ab4e2e46f6e ]---
[211238.486524] BTRFS info (device xvdc): space_info 4 has 18446744073258958848 
free, is not full

Re: [PATCH 2/2] btrfs: replace waitqueue_actvie with cond_wake_up

2018-03-08 Thread David Sterba

On Thu, Mar 08, 2018 at 02:19:28PM +0200, Nikolay Borisov wrote:
> > @@ -941,9 +941,7 @@ void btrfs_bio_counter_inc_noblocked(struct 
> > btrfs_fs_info *fs_info)
> >  void btrfs_bio_counter_sub(struct btrfs_fs_info *fs_info, s64 amount)
> >  {
> > percpu_counter_sub(_info->bio_counter, amount);
> > -
> > -   if (waitqueue_active(_info->replace_wait))
> > -   wake_up(_info->replace_wait);
> > +   cond_wake_up_nomb(_info->replace_wait);
> 
> nit/offtopic:
> 
> I think here the code requires comments since we have 2 types of
> waiters for fs_info->replace_wait. One is dependent on the
> percpu_counter_sum (i.e. the btrfs_rm_dev_replace_blocked). And then
> there is another condition on the same wait entry - the
> btrfs_bio_counter_inc_blocked i.e:
> 
>  wait_event(fs_info->replace_wait,   
>!test_bit(BTRFS_FS_STATE_DEV_REPLACING,
>   
>  _info->fs_state)); 
> 
> geez, who would come up with such synchronization ... 

'git blame' is the right tool, I'm not a fan of the dev-replace locking
either.

> >  }
> >  
> >  void btrfs_bio_counter_inc_blocked(struct btrfs_fs_info *fs_info)
> > @@ -3092,11 +3086,8 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
> > atomic_set(_root_tree->log_commit[index2], 0);
> > mutex_unlock(_root_tree->log_mutex);
> >  
> > -   /*
> > -* The barrier before waitqueue_active is implied by mutex_unlock
> > -*/
> > -   if (waitqueue_active(_root_tree->log_commit_wait[index2]))
> > -   wake_up(_root_tree->log_commit_wait[index2]);
> > +   /* The barrier is implied by mutex_unlock */
> > +   cond_wake_up_nomb(_root_tree->log_commit_wait[index2]);
> 
> I think this is wrong (not your code) but the original assumption that 
> the RELEASE semantics provided by mutex_unlock is sufficient. 

That was added in 33a9eca7e4a4c2c17ae by me.

> According to memory-barriers.txt: 
> 
> Section 'LOCK ACQUISITION FUNCTIONS' states: 
> 
> 
>  (2) RELEASE operation implication:   
>   
>   
>   
>  Memory operations issued before the RELEASE will be completed before the 
>   
>  RELEASE operation has completed. 
>   
>   
>   
>  Memory operations issued after the RELEASE *may* be completed before the 
> 
>  RELEASE operation has completed.
> 
> (I've bolded the may portion)
> 
> The example given there: 
> 
> As an example, consider the following:
>   
>   
>   
> *A = a;   
>   
> *B = b;   
>   
> ACQUIRE   
>   
> *C = c;   
>   
> *D = d;   
>   
> RELEASE   
>   
> *E = e;   
>   
> *F = f;   
>   
>   
>   
> The following sequence of events is acceptable:   
>   
>   
>   
> ACQUIRE, {*F,*A}, *E, {*C,*D}, *B, RELEASE  
> 
> So if we assume that *C is modifying the flag which the waitqueue is 
> checking, 
> and *E is the actual wakeup, then those accesses can be re-ordered...
> 
> IMHO this code should be considered broken... 

Maybe, but the code has to be taken as a whole, the log_mutex and
waiting does not follow a simple pattern that could be matched on the
barriers example. In this case I think that the log_commit_wait is
partially synchronized by the log_mutex and the bad scenario of missed
wakeup will not happen. We'll have either empty waiter queue, or the
waiters are blocked on the mutex (ie. the waitqueu is active and we'll
always have someone to wake). But this is an observation made after a
short time reading the code so there actually might be a tiny window
where some of the assumptions do not hold.

Point of the patch is to do the transformation to the helpers. If there
are bugs that could be obscured by the patch, I can postpone it until
the bugs are fixed.

> 
> 
> >  out:
> > mutex_lock(>log_mutex);
> > btrfs_remove_all_log_ctxs(root, index1, ret);
> > @@ -3104,11 +3095,8 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
> >

Re: How to change/fix 'Received UUID'

2018-03-08 Thread Marc MERLIN

On Thu, Mar 08, 2018 at 09:34:45AM +0300, Andrei Borzenkov wrote:
> 08.03.2018 09:06, Marc MERLIN пишет:
> > On Tue, Mar 06, 2018 at 12:02:47PM -0800, Marc MERLIN wrote:
> >>> https://github.com/knorrie/python-btrfs/commit/1ace623f95300ecf581b1182780fd6432a46b24d
> >>
> >> Well, I had never heard about it until now, thank you.
> >>
> >> I'll see if I can make it work when I get a bit of time.
> > 
> > Sorry, I missed the fact that there was no code to write at all.
> > gargamel:/var/local/src/python-btrfs/examples# ./set_received_uuid.py 
> > 2afc7a5e-107f-d54b-8929-197b80b70828 31337 1234.5678 
> > /mnt/btrfs_bigbackup/DS1/Video_ro.20180220_21:03:41
> > Current subvolume information:
> >   subvol_id: 94887
> >   received_uuid: ----
> >   stime: 0.0 (1970-01-01T00:00:00)
> >   stransid: 0  
> >   rtime: 0.0 (1970-01-01T00:00:00)
> >   rtransid: 0  
> > 
> > Setting received subvolume...
> > 
> > Resulting subvolume information:
> >   subvol_id: 94887
> >   received_uuid: 2afc7a5e-107f-d54b-8929-197b80b70828
> >   stime: 1234.5678 (1970-01-01T00:20:34.567800)
> >   stransid: 31337
> >   rtime: 1520488877.415709329 (2018-03-08T06:01:17.415709)
> >   rtransid: 255755
> > 
> > gargamel:/var/local/src/python-btrfs/examples# btrfs property set -ts 
> > /mnt/btrfs_bigbackup/DS1/Video_ro.20180220_21:03:41 ro true
> > 
> > 
> > ABORT: btrfs send -p /mnt/btrfs_pool1/Video_ro.20180205_21:05:15 
> > Video_ro.20180307_22:03:03 |  btrfs receive /mnt/btrfs_bigbackup/DS1//. 
> > failed
> > At subvol Video_ro.20180307_22:03:03
> > At snapshot Video_ro.20180307_22:03:03
> > ERROR: cannot find parent subvolume
> > 
> > gargamel:/mnt/btrfs_pool1# btrfs subvolume show 
> > /mnt/btrfs_pool1/Video_ro.20180220_21\:03\:41/
> > Video_ro.20180220_21:03:41
> 
> Not sure I understand how this subvolume is related. You send
> differences between Video_ro.20180205_21:05:15 and
> Video_ro.20180307_22:03:03, so you need to have (replica of)
> Video_ro.20180205_21:05:15 on destination. How exactly
> Video_ro.20180220_21:03:41 comes in picture here?
 
Sorry, I pasted the wrong thing.
ABORT: btrfs send -p /mnt/btrfs_pool1/Video_ro.20180220_21:03:41 
Video_ro.20180308_07:50:06 |  btrfs receive /mnt/btrfs_bigbackup/DS1//. failed
At subvol Video_ro.20180308_07:50:06
At snapshot Video_ro.20180308_07:50:06
ERROR: cannot find parent subvolume

Same problem basically, just copied the wrong attempt, sorry about that.

Do I need to make sure of more than
DS1/Video_ro.20180220_21:03:41
Received UUID:  2afc7a5e-107f-d54b-8929-197b80b70828

be equal to
Name:   Video_ro.20180220_21:03:41
UUID:   2afc7a5e-107f-d54b-8929-197b80b70828

Thanks,
Marc


> > Name:   Video_ro.20180220_21:03:41
> > UUID:   2afc7a5e-107f-d54b-8929-197b80b70828
> > Parent UUID:e5ec5c1e-6b49-084e-8820-5a8cfaa1b089
> > Received UUID:  0e220a4f-6426-4745-8399-0da0084f8b23>   
> >   Creation time:  2018-02-20 21:03:42 -0800
> > Subvolume ID:   11228
> > Generation: 4174
> > Gen at creation:4150
> > Parent ID:  5
> > Top level ID:   5
> > Flags:  readonly
> > Snapshot(s):
> > Video_rw.20180220_21:03:41
> > Video
> > 
> > 
> > Wasn't I supposed to set 2afc7a5e-107f-d54b-8929-197b80b70828 onto the 
> > destination?
> > 
> > Doesn't that look ok now? Is there something else I'm missing?
> > gargamel:/mnt/btrfs_pool1# btrfs subvolume show 
> > /mnt/btrfs_bigbackup/DS1/Video_ro.20180220_21:03:41
> > DS1/Video_ro.20180220_21:03:41
> > Name:   Video_ro.20180220_21:03:41
> > UUID:   cb4f343c-5e79-7f49-adf0-7ce0b29f23b3
> > Parent UUID:0e220a4f-6426-4745-8399-0da0084f8b23
> > Received UUID:  2afc7a5e-107f-d54b-8929-197b80b70828
> > Creation time:  2018-02-20 21:13:36 -0800
> > Subvolume ID:   94887
> > Generation: 250689
> > Gen at creation:250689
> > Parent ID:  89160
> > Top level ID:   89160
> > Flags:  readonly
> > Snapshot(s):
> > 
> > Thanks,
> > Marc
> > 
> 
> 

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 14/22] btrfs: remove trivial locking wrappers of tree mod log

2018-03-08 Thread David Sterba

On Thu, Mar 08, 2018 at 05:37:33PM +0200, Nikolay Borisov wrote:
> 
> 
> On  8.03.2018 16:33, David Sterba wrote:
> > The wrappers are trivial and do bring any extra value on top of the
> 
> nit: s/do/don't/ ?

Right, fixed.

> > plain locking primitives.
> > 
> > Signed-off-by: David Sterba 
> 
> Just 2 minor nits, otherwise LGTM:
> 
> Reviewed-by: Nikolay Borisov 
> 
> > - * Note: must be called with write lock (tree_mod_log_write_lock).
> > + * Note: must be called with write lock for fs_info_;;tree_mod_log_lock.
> nit: s/;;/::/

Fixed, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 18/22] btrfs: move allocation after simple tests in tree_mod_log_insert_key

2018-03-08 Thread David Sterba

On Thu, Mar 08, 2018 at 03:26:11PM +, Filipe Manana wrote:
> On Thu, Mar 8, 2018 at 2:33 PM, David Sterba  wrote:
> > The allocation of tree_mod_elem can be delayed after tree_mod_dont_log.
> > In case it causes the function to return, the allocation would be
> > redundant. Otherwise it could cause unnecessary failure if there's not
> > enough memory.
> 
> Nop, the allocation must be made before calling tree_mod_dont_log(),
> as that function acquires a write lock.
> That's why there's a call to tree_mod_need_log() before the allocation
> and a call to tree_mod_dont_log() after (they do the same checks).

Ah right, thanks, it's even documented, dunno why I missed it.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 14/22] btrfs: remove trivial locking wrappers of tree mod log

2018-03-08 Thread Nikolay Borisov



On  8.03.2018 16:33, David Sterba wrote:
> The wrappers are trivial and do bring any extra value on top of the

nit: s/do/don't/ ?

> plain locking primitives.
> 
> Signed-off-by: David Sterba 

Just 2 minor nits, otherwise LGTM:

Reviewed-by: Nikolay Borisov 

> ---
>  fs/btrfs/ctree.c | 58 
> +++-
>  1 file changed, 19 insertions(+), 39 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 716c140798c3..7d39ef6ce20a 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -330,26 +330,6 @@ struct tree_mod_elem {
>   struct tree_mod_root old_root;
>  };
>  
> -static inline void tree_mod_log_read_lock(struct btrfs_fs_info *fs_info)
> -{
> - read_lock(_info->tree_mod_log_lock);
> -}
> -
> -static inline void tree_mod_log_read_unlock(struct btrfs_fs_info *fs_info)
> -{
> - read_unlock(_info->tree_mod_log_lock);
> -}
> -
> -static inline void tree_mod_log_write_lock(struct btrfs_fs_info *fs_info)
> -{
> - write_lock(_info->tree_mod_log_lock);
> -}
> -
> -static inline void tree_mod_log_write_unlock(struct btrfs_fs_info *fs_info)
> -{
> - write_unlock(_info->tree_mod_log_lock);
> -}
> -
>  /*
>   * Pull a new tree mod seq number for our operation.
>   */
> @@ -369,14 +349,14 @@ static inline u64 btrfs_inc_tree_mod_seq(struct 
> btrfs_fs_info *fs_info)
>  u64 btrfs_get_tree_mod_seq(struct btrfs_fs_info *fs_info,
>  struct seq_list *elem)
>  {
> - tree_mod_log_write_lock(fs_info);
> + write_lock(_info->tree_mod_log_lock);
>   spin_lock(_info->tree_mod_seq_lock);
>   if (!elem->seq) {
>   elem->seq = btrfs_inc_tree_mod_seq(fs_info);
>   list_add_tail(>list, _info->tree_mod_seq_list);
>   }
>   spin_unlock(_info->tree_mod_seq_lock);
> - tree_mod_log_write_unlock(fs_info);
> + write_unlock(_info->tree_mod_log_lock);
>  
>   return elem->seq;
>  }
> @@ -418,7 +398,7 @@ void btrfs_put_tree_mod_seq(struct btrfs_fs_info *fs_info,
>* anything that's lower than the lowest existing (read: blocked)
>* sequence number can be removed from the tree.
>*/
> - tree_mod_log_write_lock(fs_info);
> + write_lock(_info->tree_mod_log_lock);
>   tm_root = _info->tree_mod_log;
>   for (node = rb_first(tm_root); node; node = next) {
>   next = rb_next(node);
> @@ -428,7 +408,7 @@ void btrfs_put_tree_mod_seq(struct btrfs_fs_info *fs_info,
>   rb_erase(node, tm_root);
>   kfree(tm);
>   }
> - tree_mod_log_write_unlock(fs_info);
> + write_unlock(_info->tree_mod_log_lock);
>  }
>  
>  /*
> @@ -439,7 +419,7 @@ void btrfs_put_tree_mod_seq(struct btrfs_fs_info *fs_info,
>   * for root replace operations, or the logical address of the affected
>   * block for all other operations.
>   *
> - * Note: must be called with write lock (tree_mod_log_write_lock).
> + * Note: must be called with write lock for fs_info_;;tree_mod_log_lock.
nit: s/;;/::/
>   */
>  static noinline int
>  __tree_mod_log_insert(struct btrfs_fs_info *fs_info, struct tree_mod_elem 
> *tm)
> @@ -477,7 +457,7 @@ __tree_mod_log_insert(struct btrfs_fs_info *fs_info, 
> struct tree_mod_elem *tm)
>   * Determines if logging can be omitted. Returns 1 if it can. Otherwise, it
>   * returns zero with the tree_mod_log_lock acquired. The caller must hold
>   * this until all tree mod log insertions are recorded in the rb tree and 
> then
> - * call tree_mod_log_write_unlock() to release.
> + * write unlock fs_info::tree_mod_log_lock.
>   */
>  static inline int tree_mod_dont_log(struct btrfs_fs_info *fs_info,
>   struct extent_buffer *eb) {
> @@ -487,9 +467,9 @@ static inline int tree_mod_dont_log(struct btrfs_fs_info 
> *fs_info,
>   if (eb && btrfs_header_level(eb) == 0)
>   return 1;
>  
> - tree_mod_log_write_lock(fs_info);
> + write_lock(_info->tree_mod_log_lock);
>   if (list_empty(&(fs_info)->tree_mod_seq_list)) {
> - tree_mod_log_write_unlock(fs_info);
> + write_unlock(_info->tree_mod_log_lock);
>   return 1;
>   }
>  
> @@ -551,7 +531,7 @@ static noinline int tree_mod_log_insert_key(struct 
> extent_buffer *eb, int slot,
>   }
>  
>   ret = __tree_mod_log_insert(eb->fs_info, tm);
> - tree_mod_log_write_unlock(eb->fs_info);
> + write_unlock(>fs_info->tree_mod_log_lock);
>   if (ret)
>   kfree(tm);
>  
> @@ -613,7 +593,7 @@ static noinline int tree_mod_log_insert_move(struct 
> extent_buffer *eb,
>   ret = __tree_mod_log_insert(eb->fs_info, tm);
>   if (ret)
>   goto free_tms;
> - tree_mod_log_write_unlock(eb->fs_info);
> + write_unlock(>fs_info->tree_mod_log_lock);
>   kfree(tm_list);
>  
>   return 0;
> @@ -624,7 +604,7 @@ static noinline int tree_mod_log_insert_move(struct 
> extent_buffer *eb,
>

Re: [PATCH 17/22] btrfs: kill tree_mod_log_set_root_pointer helper

2018-03-08 Thread Nikolay Borisov



On  8.03.2018 16:33, David Sterba wrote:
> A useless wrapper around tree_mod_log_insert_root that hides missing
> error handling. Move it to the callers.
> 
> Signed-off-by: David Sterba 

For 15-17:

Reviewed-by: Nikolay Borisov 

> ---
>  fs/btrfs/ctree.c | 20 +++-
>  1 file changed, 7 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index ceb0836d74cd..30950439731e 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -883,16 +883,6 @@ static noinline int tree_mod_log_free_eb(struct 
> extent_buffer *eb)
>   return ret;
>  }
>  
> -static noinline void
> -tree_mod_log_set_root_pointer(struct btrfs_root *root,
> -   struct extent_buffer *new_root_node,
> -   int log_removal)
> -{
> - int ret;
> - ret = tree_mod_log_insert_root(root->node, new_root_node, log_removal);
> - BUG_ON(ret < 0);
> -}
> -
>  /*
>   * check if the tree block can be shared by multiple trees
>   */
> @@ -1119,7 +1109,8 @@ static noinline int __btrfs_cow_block(struct 
> btrfs_trans_handle *trans,
>   parent_start = buf->start;
>  
>   extent_buffer_get(cow);
> - tree_mod_log_set_root_pointer(root, cow, 1);
> + ret = tree_mod_log_insert_root(root->node, cow, 1);
> + BUG_ON(ret < 0);
>   rcu_assign_pointer(root->node, cow);
>  
>   btrfs_free_tree_block(trans, root, buf, parent_start,
> @@ -1873,7 +1864,8 @@ static noinline int balance_level(struct 
> btrfs_trans_handle *trans,
>   goto enospc;
>   }
>  
> - tree_mod_log_set_root_pointer(root, child, 1);
> + ret = tree_mod_log_insert_root(root->node, child, 1);
> + BUG_ON(ret < 0);
>   rcu_assign_pointer(root->node, child);
>  
>   add_root_to_dirty_list(root);
> @@ -3319,6 +3311,7 @@ static noinline int insert_new_root(struct 
> btrfs_trans_handle *trans,
>   struct extent_buffer *c;
>   struct extent_buffer *old;
>   struct btrfs_disk_key lower_key;
> + int ret;
>  
>   BUG_ON(path->nodes[level]);
>   BUG_ON(path->nodes[level-1] != root->node);
> @@ -3357,7 +3350,8 @@ static noinline int insert_new_root(struct 
> btrfs_trans_handle *trans,
>   btrfs_mark_buffer_dirty(c);
>  
>   old = root->node;
> - tree_mod_log_set_root_pointer(root, c, 0);
> + ret = tree_mod_log_insert_root(root->node, c, 0);
> + BUG_ON(ret < 0);
>   rcu_assign_pointer(root->node, c);
>  
>   /* the super has an extra ref to root->node */
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Inconsistence between sender and receiver

2018-03-08 Thread Filipe Manana

On Wed, Mar 7, 2018 at 6:49 PM, Liu Bo  wrote:
> Hi,
>
> In the following steps[1], if  on receiver side has got
> changed via 'btrfs property set', then after doing incremental
> updates, receiver gets a different snapshot from what sender has sent.
>
> The reason behind it is that there is no change about file 'foo' in
> the send stream, such that receiver simply creates a snapshot of
>  on its side with nothing to apply from the send stream.
>
> A possible way to avoid this is to check rtransid and ctranid of
>  on receiver side, but I'm not very sure whether the current
> behavior is made deliberately, does anyone have an idea?
>
> Thanks,
>
> -liubo
>
> [1]:
> $ btrfs sub create /mnt/send/sub
> $ touch /mnt/send/sub/foo
> $ btrfs sub snap -r /mnt/send/sub /mnt/send/parent
>
> # send parent out
> $ btrfs send /mnt/send/parent | btrfs receive /mnt/recv/
>
> # change parent and file under it
> $ btrfs property set -t subvol /mnt/recv/parent ro false
> $ truncate -s 4096 /mnt/recv/parent/foo
>
> $ btrfs sub snap -r /mnt/send/sub /mnt/send/update
> $ btrfs send -p /mnt/send/parent /mnt/send/update | btrfs receive /mnt/recv
>
> $ ls -l /mnt/send/update
> total 0
> -rw-r--r-- 1 root root 0 Mar  6 11:13 foo
>
> $ ls -l /mnt/recv/update
> total 0
> -rw-r--r-- 1 root root 4096 Mar  6 11:14 foo

Interesting. I wasn't aware of that and I'm surprised too, I wouldn't
expect that behaviour (and I must see the code to figure out why it
behaves like that).
No idea if it's intentional or not, but at a first glance it seems not.

>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 18/22] btrfs: move allocation after simple tests in tree_mod_log_insert_key

2018-03-08 Thread Filipe Manana

On Thu, Mar 8, 2018 at 2:33 PM, David Sterba  wrote:
> The allocation of tree_mod_elem can be delayed after tree_mod_dont_log.
> In case it causes the function to return, the allocation would be
> redundant. Otherwise it could cause unnecessary failure if there's not
> enough memory.

Nop, the allocation must be made before calling tree_mod_dont_log(),
as that function acquires a write lock.
That's why there's a call to tree_mod_need_log() before the allocation
and a call to tree_mod_dont_log() after (they do the same checks).


>
> Signed-off-by: David Sterba 
> ---
>  fs/btrfs/ctree.c | 8 +++-
>  1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 30950439731e..da6e2c3ca2d0 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -521,15 +521,13 @@ static noinline int tree_mod_log_insert_key(struct 
> extent_buffer *eb, int slot,
> if (!tree_mod_need_log(eb->fs_info, eb))
> return 0;
>
> +   if (tree_mod_dont_log(eb->fs_info, eb))
> +   return 0;
> +
> tm = alloc_tree_mod_elem(eb, slot, op, flags);
> if (!tm)
> return -ENOMEM;
>
> -   if (tree_mod_dont_log(eb->fs_info, eb)) {
> -   kfree(tm);
> -   return 0;
> -   }
> -
> ret = __tree_mod_log_insert(eb->fs_info, tm);
> write_unlock(>fs_info->tree_mod_log_lock);
> if (ret)
> --
> 2.16.2
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 08/20] btrfs-progs: qgroups: introduce btrfs_qgroup_query

2018-03-08 Thread Jeff Mahoney

On 3/8/18 12:54 AM, Qu Wenruo wrote:
> 
> 
> On 2018年03月08日 10:40, je...@suse.com wrote:
>> From: Jeff Mahoney 
>>
>> The only mechanism we have in the progs for searching qgroups is to load
>> all of them and filter the results.  This works for qgroup show but
>> to add quota information to 'btrfs subvoluem show' it's pretty wasteful.
>>
>> This patch splits out setting up the search and performing the search so
>> we can search for a single qgroupid more easily.  Since TREE_SEARCH
>> will give results that don't strictly match the search terms, we add
>> a filter to match only the results we care about.
>>
>> Signed-off-by: Jeff Mahoney 
>> ---
>>  qgroup.c | 143 
>> ---
>>  qgroup.h |   7 
>>  2 files changed, 116 insertions(+), 34 deletions(-)
>>
>> diff --git a/qgroup.c b/qgroup.c
>> index 57815718..d076b1de 100644
>> --- a/qgroup.c
>> +++ b/qgroup.c
>> @@ -1165,11 +1165,30 @@ static inline void print_status_flag_warning(u64 
>> flags)
>>  warning("qgroup data inconsistent, rescan recommended");
>>  }
>>  
>> -static int __qgroups_search(int fd, struct qgroup_lookup *qgroup_lookup)
>> +static bool key_in_range(const struct btrfs_key *key,
>> + const struct btrfs_ioctl_search_key *sk)
>> +{
>> +if (key->objectid < sk->min_objectid ||
>> +key->objectid > sk->max_objectid)
>> +return false;
>> +
>> +if (key->type < sk->min_type ||
>> +key->type > sk->max_type)
>> +return false;
>> +
>> +if (key->offset < sk->min_offset ||
>> +key->offset > sk->max_offset)
>> +return false;
>> +
>> +return true;
>> +}
> 
> Even with the key_in_range() check here, we are still following the tree
> search slice behavior:
> 
> tree search will still gives us all the items in key range from
> (min_objectid, min_type, min_offset) to
> (max_objectid, max_type, max_offset).
> 
> I don't see much different between the tree search ioctl and this one.

It's fundamentally different.

The one in the kernel has a silly interface.  It should be min_key and
max_key since the components aren't evaluated independently.  It
effectively treats min_key and max_key as 136-bit values and returns
everything between them, inclusive.  That's the slice behavior, as you
call it.

This key_in_range treats each component separately and acts as a filter
on the slice returned from the kernel.  If we request min/max_offset =
259, the caller will not get anything without offset = 259.

I suppose, ultimately, this could also be done using a filter on the
rbtree using the existing interface but that seems even more wasteful.

-Jeff

>> +
>> +static int __qgroups_search(int fd, struct btrfs_ioctl_search_args *args,
>> +struct qgroup_lookup *qgroup_lookup)
>>  {
>>  int ret;
>> -struct btrfs_ioctl_search_args args;
>> -struct btrfs_ioctl_search_key *sk = 
>> +struct btrfs_ioctl_search_key *sk = >key;
>> +struct btrfs_ioctl_search_key filter_key = args->key;
>>  struct btrfs_ioctl_search_header *sh;
>>  unsigned long off = 0;
>>  unsigned int i;
>> @@ -1180,30 +1199,15 @@ static int __qgroups_search(int fd, struct 
>> qgroup_lookup *qgroup_lookup)
>>  u64 qgroupid;
>>  u64 qgroupid1;
>>  
>> -memset(, 0, sizeof(args));
>> -
>> -sk->tree_id = BTRFS_QUOTA_TREE_OBJECTID;
>> -sk->max_type = BTRFS_QGROUP_RELATION_KEY;
>> -sk->min_type = BTRFS_QGROUP_STATUS_KEY;
>> -sk->max_objectid = (u64)-1;
>> -sk->max_offset = (u64)-1;
>> -sk->max_transid = (u64)-1;
>> -sk->nr_items = 4096;
>> -
>>  qgroup_lookup_init(qgroup_lookup);
>>  
>>  while (1) {
>> -ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, );
>> +ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, args);
>>  if (ret < 0) {
>> -if (errno == ENOENT) {
>> -error("can't list qgroups: quotas not enabled");
>> +if (errno == ENOENT)
>>  ret = -ENOTTY;
>> -} else {
>> -error("can't list qgroups: %s",
>> -   strerror(errno));
>> +else
>>  ret = -errno;
>> -}
>> -
>>  break;
>>  }
>>  
>> @@ -1217,37 +1221,46 @@ static int __qgroups_search(int fd, struct 
>> qgroup_lookup *qgroup_lookup)
>>   * read the root_ref item it contains
>>   */
>>  for (i = 0; i < sk->nr_items; i++) {
>> -sh = (struct btrfs_ioctl_search_header *)(args.buf +
>> +struct btrfs_key key;
>> +
>> +sh = (struct btrfs_ioctl_search_header *)(args->buf +
>>off);
>>  off += sizeof(*sh);
>>

Re: [PATCH 0/3] Remove accidental VLA usage

2018-03-08 Thread Josh Poimboeuf

On Wed, Mar 07, 2018 at 07:30:44PM -0800, Kees Cook wrote:
> This series adds SIMPLE_MAX() to be used in places where a stack array
> is actually fixed, but the compiler still warns about VLA usage due to
> confusion caused by the safety checks in the max() macro.
> 
> I'm sending these via -mm since that's where I've introduced SIMPLE_MAX(),
> and they should all have no operational differences.

What if we instead simplify the max() macro's type checking so that GCC
can more easily fold the array size constants?  The below patch seems to
work:

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 3fd291503576..ec863726da29 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -782,42 +782,32 @@ ftrace_vprintk(const char *fmt, va_list ap)
 static inline void ftrace_dump(enum ftrace_dump_mode oops_dump_mode) { }
 #endif /* CONFIG_TRACING */
 
-/*
- * min()/max()/clamp() macros that also do
- * strict type-checking.. See the
- * "unnecessary" pointer comparison.
- */
-#define __min(t1, t2, min1, min2, x, y) ({ \
-   t1 min1 = (x);  \
-   t2 min2 = (y);  \
-   (void) ( == );\
-   min1 < min2 ? min1 : min2; })
+extern long __error_incompatible_types_in_min_macro;
+extern long __error_incompatible_types_in_max_macro;
+
+#define __min(t1, t2, x, y)\
+   __builtin_choose_expr(__builtin_types_compatible_p(t1, t2), \
+ (t1)(x) < (t2)(y) ? (t1)(x) : (t2)(y),\
+ (t1)__error_incompatible_types_in_min_macro)
 
 /**
  * min - return minimum of two values of the same or compatible types
  * @x: first value
  * @y: second value
  */
-#define min(x, y)  \
-   __min(typeof(x), typeof(y), \
- __UNIQUE_ID(min1_), __UNIQUE_ID(min2_),   \
- x, y)
+#define min(x, y) __min(typeof(x), typeof(y), x, y)\
 
-#define __max(t1, t2, max1, max2, x, y) ({ \
-   t1 max1 = (x);  \
-   t2 max2 = (y);  \
-   (void) ( == );\
-   max1 > max2 ? max1 : max2; })
+#define __max(t1, t2, x, y)\
+   __builtin_choose_expr(__builtin_types_compatible_p(t1, t2), \
+ (t1)(x) > (t2)(y) ? (t1)(x) : (t2)(y),\
+ (t1)__error_incompatible_types_in_max_macro)
 
 /**
  * max - return maximum of two values of the same or compatible types
  * @x: first value
  * @y: second value
  */
-#define max(x, y)  \
-   __max(typeof(x), typeof(y), \
- __UNIQUE_ID(max1_), __UNIQUE_ID(max2_),   \
- x, y)
+#define max(x, y) __max(typeof(x), typeof(y), x, y)
 
 /**
  * min3 - return minimum of three values
@@ -869,10 +859,7 @@ static inline void ftrace_dump(enum ftrace_dump_mode 
oops_dump_mode) { }
  * @x: first value
  * @y: second value
  */
-#define min_t(type, x, y)  \
-   __min(type, type,   \
- __UNIQUE_ID(min1_), __UNIQUE_ID(min2_),   \
- x, y)
+#define min_t(type, x, y) __min(type, type, x, y)
 
 /**
  * max_t - return maximum of two values, using the specified type
@@ -880,10 +867,7 @@ static inline void ftrace_dump(enum ftrace_dump_mode 
oops_dump_mode) { }
  * @x: first value
  * @y: second value
  */
-#define max_t(type, x, y)  \
-   __max(type, type,   \
- __UNIQUE_ID(min1_), __UNIQUE_ID(min2_),   \
- x, y)
+#define max_t(type, x, y) __max(type, type, x, y)  
\
 
 /**
  * clamp_t - return a value clamped to a given range using a given type
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: spurious full btrfs corruption

2018-03-08 Thread Christoph Anton Mitterer

Hey.

On Tue, 2018-03-06 at 09:50 +0800, Qu Wenruo wrote:
> > These were the two files:
> > -rw-r--r-- 1 calestyo calestyo   90112 Feb 22 16:46 'Lady In The
> > Water/05.mp3'
> > -rw-r--r-- 1 calestyo calestyo 4892407 Feb 27 23:28
> > '/home/calestyo/share/music/Lady In The Water/05.mp3'
> > 
> > 
> > -rw-r--r-- 1 calestyo calestyo 1904640 Feb 22 16:47 'The Hunt For
> > Red October [Intrada]/21.mp3'
> > -rw-r--r-- 1 calestyo calestyo 2968128 Feb 27 23:28
> > '/home/calestyo/share/music/The Hunt For Red October
> > [Intrada]/21.mp3'
> > 
> > with the former (smaller one) being the corrupted one (i.e. the one
> > returned by btrfs-restore).
> > 
> > Both are (in terms of filesize) multiples of 4096... what does that
> > mean now?
> 
> That means either we lost some file extents or inode items.
> 
> Btrfs-restore only found EXTENT_DATA, which contains the pointer to
> the
> real data, and inode number.
> But no INODE_ITEM is found, which records the real inode size, so it
> can
> only use EXTENT_DATA to rebuild as much data as possible.
> That why all recovered one is aligned to 4K.
> 
> So some metadata is also corrupted.

But that can also happen to just some files?
Anyway... still strange that it hit just those two (which weren't
touched for long).

> > However, all the qcow2 files from the restore are more or less
> > garbage.
> > During the btrfs-restore it already complained on them, that it
> > would
> > loop too often on them and whether I want to continue or not (I
> > choose
> > n and on another full run I choose y).
> > 
> > Some still contain a partition table, some partitions even
> > filesystems
> > (btrfs again)... but I cannot mount them.
> 
> I think the same problem happens on them too.
> 
> Some data is lost while some are good.
> Anyway, they would be garbage.

Again, still strange... that so many files (of those that I really
checked) were fully okay,... while those 4 were all broken.

When it only uses EXTENT_DATA, would that mean that it basically breaks
on every border where the file is split up into multiple extents (which
is of course likely for the (CoWed) images that I had.

> > 
> > > Would you please try to restore the fs on another system with
> > > good
> > > memory?
> > 
> > Which one? The originally broken fs from the SSD?
> 
> Yep.
> 
> > And what should I try to find out here?
> 
> During restore, if the csum error happens again on the newly created
> destination btrfs.
> (And I recommend use mount option nospace_cache,notreelog on the
> destination fs)

So an update on this (everything on the OLD notebook with likely good
memory):

I booted again from USBstick (with 4.15 kernel/progs),
luksOpened+losetup+luksOpened (yes two dm-crypt, the one from the
external restore HDD, then the image file of the SSD which again
contained dmcrypt+LUKS, of which one was the broken btrfs).

As I've mentioned before... btrfs-restore (and the other tools for
trying to find the bytenr) immediately fail here.
They bring some "block mapping error" and produce no output.

This worked on my first rescue attempt (where I had 4.12 kernel/progs).

Since I had no 4.12 kernel/progs at hand anymore, I went to an even
older rescue stick, wich has 4.7 kernel/progs (if I'm not wrong).
There it worked again (on the same image file).

So something changed after 4.14, which makes the tools no longer being
able to restore at least that what they could restore at 4.14.

=> Some bug recently introduced in btrfs-progs?

I finished the dump then (from OLD notebook/good RAM) with 4.7
kernel/progs,... to the very same external HDD I've used before.

And afterwards I:
diff -qr --no-dereference restoreFromNEWnotebook/ restoreFromOLDnotebook/

=> No differences were found, except one further file that was in the
new restoreFromOLDnotebook. Could be that this was a file wich I
deleted on the old restore because of csum errors, but I don't really
remember (actually I thought to remember that there were a few which I
deleted).

Since all other files were equal (that is at least in terms of file
contents and symlink targets - I didn't compare the metadata like
permissions, dates and owners)... the qcow2 images are garbage as well.

=> No csum errors were recorded in the kernel log during the diff, and
since both, the (remaining) restore results from the NEW notebook and
the ones just made on the OLD one were read because of the diff,... I'd
guess that no further corruption happened in the recent btrfs-restore.

On to the next working site:

> > > This -28 (ENOSPC) seems to show that the extent tree of the new
> > > btrfs
> > > is
> > > corrupted.
> > 
> > "new" here is dm-1, right? Which is the fresh btrfs I've created on
> > some 8TB HDD for my recovery works.
> > While that FS shows me:
> > [26017.690417] BTRFS info (device dm-2): disk space caching is
> > enabled
> > [26017.690421] BTRFS info (device dm-2): has skinny extents
> > [26017.798959] BTRFS info (device dm-2): bdev /dev/mapper/data-a4
> > errs:
>

[PATCH 19/22] btrfs: separate types for submit_bio_start and submit_bio_done

2018-03-08 Thread David Sterba

The callbacks make use of different parameters that are passed to the
other type unnecessarily. This patch adds separate types for each and
the unused parameters will be removed.

The type extent_submit_bio_hook_t keeps all parameters and can be used
where the start/done types are not appropriate.

Signed-off-by: David Sterba 
---
 fs/btrfs/disk-io.c   | 8 
 fs/btrfs/disk-io.h   | 4 ++--
 fs/btrfs/extent_io.h | 9 +
 3 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 21f34ad0d411..6ffc97305d6a 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -124,8 +124,8 @@ struct async_submit_bio {
void *private_data;
struct btrfs_fs_info *fs_info;
struct bio *bio;
-   extent_submit_bio_hook_t *submit_bio_start;
-   extent_submit_bio_hook_t *submit_bio_done;
+   extent_submit_bio_start_t *submit_bio_start;
+   extent_submit_bio_done_t *submit_bio_done;
int mirror_num;
unsigned long bio_flags;
/*
@@ -759,8 +759,8 @@ static void run_one_async_free(struct btrfs_work *work)
 blk_status_t btrfs_wq_submit_bio(struct btrfs_fs_info *fs_info, struct bio 
*bio,
 int mirror_num, unsigned long bio_flags,
 u64 bio_offset, void *private_data,
-extent_submit_bio_hook_t *submit_bio_start,
-extent_submit_bio_hook_t *submit_bio_done)
+extent_submit_bio_start_t *submit_bio_start,
+extent_submit_bio_done_t *submit_bio_done)
 {
struct async_submit_bio *async;
 
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index 301151a50ac1..dcc94215de4f 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -131,8 +131,8 @@ blk_status_t btrfs_bio_wq_end_io(struct btrfs_fs_info 
*info, struct bio *bio,
 blk_status_t btrfs_wq_submit_bio(struct btrfs_fs_info *fs_info, struct bio 
*bio,
int mirror_num, unsigned long bio_flags,
u64 bio_offset, void *private_data,
-   extent_submit_bio_hook_t *submit_bio_start,
-   extent_submit_bio_hook_t *submit_bio_done);
+   extent_submit_bio_start_t *submit_bio_start,
+   extent_submit_bio_done_t *submit_bio_done);
 unsigned long btrfs_async_submit_limit(struct btrfs_fs_info *info);
 int btrfs_write_tree_block(struct extent_buffer *buf);
 void btrfs_wait_tree_block_writeback(struct extent_buffer *buf);
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index a7a850abd600..202546435db6 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -95,6 +95,15 @@ struct io_failure_record;
 typedefblk_status_t (extent_submit_bio_hook_t)(void *private_data, 
struct bio *bio,
   int mirror_num, unsigned long bio_flags,
   u64 bio_offset);
+
+typedef blk_status_t (extent_submit_bio_start_t)(void *private_data,
+   struct bio *bio, int mirror_num, unsigned long bio_flags,
+   u64 bio_offset);
+
+typedef blk_status_t (extent_submit_bio_done_t)(void *private_data,
+   struct bio *bio, int mirror_num, unsigned long bio_flags,
+   u64 bio_offset);
+
 struct extent_io_ops {
/*
 * The following callbacks must be allways defined, the function
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 18/22] btrfs: move allocation after simple tests in tree_mod_log_insert_key

2018-03-08 Thread David Sterba

The allocation of tree_mod_elem can be delayed after tree_mod_dont_log.
In case it causes the function to return, the allocation would be
redundant. Otherwise it could cause unnecessary failure if there's not
enough memory.

Signed-off-by: David Sterba 
---
 fs/btrfs/ctree.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 30950439731e..da6e2c3ca2d0 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -521,15 +521,13 @@ static noinline int tree_mod_log_insert_key(struct 
extent_buffer *eb, int slot,
if (!tree_mod_need_log(eb->fs_info, eb))
return 0;
 
+   if (tree_mod_dont_log(eb->fs_info, eb))
+   return 0;
+
tm = alloc_tree_mod_elem(eb, slot, op, flags);
if (!tm)
return -ENOMEM;
 
-   if (tree_mod_dont_log(eb->fs_info, eb)) {
-   kfree(tm);
-   return 0;
-   }
-
ret = __tree_mod_log_insert(eb->fs_info, tm);
write_unlock(>fs_info->tree_mod_log_lock);
if (ret)
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 10/22] btrfs: drop fs_info parameter from tree_mod_log_free_eb

2018-03-08 Thread David Sterba

It's provided by the extent_buffer.

Signed-off-by: David Sterba 
---
 fs/btrfs/ctree.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 6d2f8562bb36..632beadf5725 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -654,12 +654,10 @@ __tree_mod_log_free_eb(struct btrfs_fs_info *fs_info,
return 0;
 }
 
-static noinline int
-tree_mod_log_insert_root(struct btrfs_fs_info *fs_info,
-struct extent_buffer *old_root,
-struct extent_buffer *new_root,
-int log_removal)
+static noinline int tree_mod_log_insert_root(struct extent_buffer *old_root,
+struct extent_buffer *new_root, int log_removal)
 {
+   struct btrfs_fs_info *fs_info = old_root->fs_info;
struct tree_mod_elem *tm = NULL;
struct tree_mod_elem **tm_list = NULL;
int nritems = 0;
@@ -932,8 +930,7 @@ tree_mod_log_set_root_pointer(struct btrfs_root *root,
  int log_removal)
 {
int ret;
-   ret = tree_mod_log_insert_root(root->fs_info, root->node,
-  new_root_node, log_removal);
+   ret = tree_mod_log_insert_root(root->node, new_root_node, log_removal);
BUG_ON(ret < 0);
 }
 
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 12/22] btrfs: embed tree_mod_move structure to tree_mod_elem

2018-03-08 Thread David Sterba

The tree_mod_move is not used anywhere and can be embedded as anonymous
structure.

Signed-off-by: David Sterba 
---
 fs/btrfs/ctree.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 5585433eac40..e53af58697db 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -299,11 +299,6 @@ enum mod_log_op {
MOD_LOG_ROOT_REPLACE,
 };
 
-struct tree_mod_move {
-   int dst_slot;
-   int nr_items;
-};
-
 struct tree_mod_root {
u64 logical;
u8 level;
@@ -326,7 +321,10 @@ struct tree_mod_elem {
u64 blockptr;
 
/* this is used for op == MOD_LOG_MOVE_KEYS */
-   struct tree_mod_move move;
+   struct {
+   int dst_slot;
+   int nr_items;
+   } move;
 
/* this is used for op == MOD_LOG_ROOT_REPLACE */
struct tree_mod_root old_root;
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 22/22] btrfs: rename submit callbacks and drop double underscores

2018-03-08 Thread David Sterba

Signed-off-by: David Sterba 
---
 fs/btrfs/disk-io.c |  8 
 fs/btrfs/inode.c   | 23 +++
 2 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f24522a6f36f..e112708e3e9b 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -805,7 +805,7 @@ static blk_status_t btree_csum_one_bio(struct bio *bio)
return errno_to_blk_status(ret);
 }
 
-static blk_status_t __btree_submit_bio_start(void *private_data, struct bio 
*bio,
+static blk_status_t btree_submit_bio_start(void *private_data, struct bio *bio,
 u64 bio_offset)
 {
/*
@@ -815,7 +815,7 @@ static blk_status_t __btree_submit_bio_start(void 
*private_data, struct bio *bio
return btree_csum_one_bio(bio);
 }
 
-static blk_status_t __btree_submit_bio_done(void *private_data, struct bio 
*bio,
+static blk_status_t btree_submit_bio_done(void *private_data, struct bio *bio,
int mirror_num)
 {
struct inode *inode = private_data;
@@ -875,8 +875,8 @@ static blk_status_t btree_submit_bio_hook(void 
*private_data, struct bio *bio,
 */
ret = btrfs_wq_submit_bio(fs_info, bio, mirror_num, 0,
  bio_offset, private_data,
- __btree_submit_bio_start,
- __btree_submit_bio_done);
+ btree_submit_bio_start,
+ btree_submit_bio_done);
}
 
if (ret)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5cfba8be73be..16c6e2fe3b58 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1921,7 +1921,7 @@ int btrfs_merge_bio_hook(struct page *page, unsigned long 
offset,
  * At IO completion time the cums attached on the ordered extent record
  * are inserted into the btree
  */
-static blk_status_t __btrfs_submit_bio_start(void *private_data, struct bio 
*bio,
+static blk_status_t btrfs_submit_bio_start(void *private_data, struct bio *bio,
u64 bio_offset)
 {
struct inode *inode = private_data;
@@ -1940,7 +1940,7 @@ static blk_status_t __btrfs_submit_bio_start(void 
*private_data, struct bio *bio
  * At IO completion time the cums attached on the ordered extent record
  * are inserted into the btree
  */
-static blk_status_t __btrfs_submit_bio_done(void *private_data, struct bio 
*bio,
+static blk_status_t btrfs_submit_bio_done(void *private_data, struct bio *bio,
  int mirror_num)
 {
struct inode *inode = private_data;
@@ -2013,8 +2013,8 @@ static blk_status_t btrfs_submit_bio_hook(void 
*private_data, struct bio *bio,
/* we're doing a write, do the async checksumming */
ret = btrfs_wq_submit_bio(fs_info, bio, mirror_num, bio_flags,
  bio_offset, inode,
- __btrfs_submit_bio_start,
- __btrfs_submit_bio_done);
+ btrfs_submit_bio_start,
+ btrfs_submit_bio_done);
goto out;
} else if (!skip_sum) {
ret = btrfs_csum_one_bio(inode, bio, 0, 0);
@@ -8268,7 +8268,7 @@ static void btrfs_endio_direct_write(struct bio *bio)
bio_put(bio);
 }
 
-static blk_status_t __btrfs_submit_bio_start_direct_io(void *private_data,
+static blk_status_t btrfs_submit_bio_start_direct_io(void *private_data,
struct bio *bio, u64 offset)
 {
struct inode *inode = private_data;
@@ -8349,9 +8349,8 @@ static inline blk_status_t 
btrfs_lookup_and_bind_dio_csum(struct inode *inode,
return 0;
 }
 
-static inline blk_status_t
-__btrfs_submit_dio_bio(struct bio *bio, struct inode *inode, u64 file_offset,
-  int async_submit)
+static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
+   struct inode *inode, u64 file_offset, int async_submit)
 {
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
struct btrfs_dio_private *dip = bio->bi_private;
@@ -8374,8 +8373,8 @@ __btrfs_submit_dio_bio(struct bio *bio, struct inode 
*inode, u64 file_offset,
if (write && async_submit) {
ret = btrfs_wq_submit_bio(fs_info, bio, 0, 0,
  file_offset, inode,
- __btrfs_submit_bio_start_direct_io,
- __btrfs_submit_bio_done);
+ btrfs_submit_bio_start_direct_io,
+ btrfs_submit_bio_done);
goto err;
} else if (write) {
/*
@@ -8461,7 +8460,7 @@ static int

[PATCH 20/22] btrfs: remove unused parameters from extent_submit_bio_start_t

2018-03-08 Thread David Sterba

Remove parameters not used by any of the callbacks.

Signed-off-by: David Sterba 
---
 fs/btrfs/disk-io.c   | 2 --
 fs/btrfs/extent_io.h | 3 +--
 fs/btrfs/inode.c | 4 +---
 3 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 6ffc97305d6a..e396ed366751 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -725,7 +725,6 @@ static void run_one_async_start(struct btrfs_work *work)
 
async = container_of(work, struct  async_submit_bio, work);
ret = async->submit_bio_start(async->private_data, async->bio,
- async->mirror_num, async->bio_flags,
  async->bio_offset);
if (ret)
async->status = ret;
@@ -808,7 +807,6 @@ static blk_status_t btree_csum_one_bio(struct bio *bio)
 }
 
 static blk_status_t __btree_submit_bio_start(void *private_data, struct bio 
*bio,
-int mirror_num, unsigned long 
bio_flags,
 u64 bio_offset)
 {
/*
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 202546435db6..1d3792568e64 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -97,8 +97,7 @@ typedef   blk_status_t (extent_submit_bio_hook_t)(void 
*private_data, struct bio *
   u64 bio_offset);
 
 typedef blk_status_t (extent_submit_bio_start_t)(void *private_data,
-   struct bio *bio, int mirror_num, unsigned long bio_flags,
-   u64 bio_offset);
+   struct bio *bio, u64 bio_offset);
 
 typedef blk_status_t (extent_submit_bio_done_t)(void *private_data,
struct bio *bio, int mirror_num, unsigned long bio_flags,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index f53470112670..9b33544caaf8 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1922,7 +1922,6 @@ int btrfs_merge_bio_hook(struct page *page, unsigned long 
offset,
  * are inserted into the btree
  */
 static blk_status_t __btrfs_submit_bio_start(void *private_data, struct bio 
*bio,
-   int mirror_num, unsigned long bio_flags,
u64 bio_offset)
 {
struct inode *inode = private_data;
@@ -8271,8 +8270,7 @@ static void btrfs_endio_direct_write(struct bio *bio)
 }
 
 static blk_status_t __btrfs_submit_bio_start_direct_io(void *private_data,
-   struct bio *bio, int mirror_num,
-   unsigned long bio_flags, u64 offset)
+   struct bio *bio, u64 offset)
 {
struct inode *inode = private_data;
blk_status_t ret;
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 17/22] btrfs: kill tree_mod_log_set_root_pointer helper

2018-03-08 Thread David Sterba

A useless wrapper around tree_mod_log_insert_root that hides missing
error handling. Move it to the callers.

Signed-off-by: David Sterba 
---
 fs/btrfs/ctree.c | 20 +++-
 1 file changed, 7 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index ceb0836d74cd..30950439731e 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -883,16 +883,6 @@ static noinline int tree_mod_log_free_eb(struct 
extent_buffer *eb)
return ret;
 }
 
-static noinline void
-tree_mod_log_set_root_pointer(struct btrfs_root *root,
- struct extent_buffer *new_root_node,
- int log_removal)
-{
-   int ret;
-   ret = tree_mod_log_insert_root(root->node, new_root_node, log_removal);
-   BUG_ON(ret < 0);
-}
-
 /*
  * check if the tree block can be shared by multiple trees
  */
@@ -1119,7 +1109,8 @@ static noinline int __btrfs_cow_block(struct 
btrfs_trans_handle *trans,
parent_start = buf->start;
 
extent_buffer_get(cow);
-   tree_mod_log_set_root_pointer(root, cow, 1);
+   ret = tree_mod_log_insert_root(root->node, cow, 1);
+   BUG_ON(ret < 0);
rcu_assign_pointer(root->node, cow);
 
btrfs_free_tree_block(trans, root, buf, parent_start,
@@ -1873,7 +1864,8 @@ static noinline int balance_level(struct 
btrfs_trans_handle *trans,
goto enospc;
}
 
-   tree_mod_log_set_root_pointer(root, child, 1);
+   ret = tree_mod_log_insert_root(root->node, child, 1);
+   BUG_ON(ret < 0);
rcu_assign_pointer(root->node, child);
 
add_root_to_dirty_list(root);
@@ -3319,6 +3311,7 @@ static noinline int insert_new_root(struct 
btrfs_trans_handle *trans,
struct extent_buffer *c;
struct extent_buffer *old;
struct btrfs_disk_key lower_key;
+   int ret;
 
BUG_ON(path->nodes[level]);
BUG_ON(path->nodes[level-1] != root->node);
@@ -3357,7 +3350,8 @@ static noinline int insert_new_root(struct 
btrfs_trans_handle *trans,
btrfs_mark_buffer_dirty(c);
 
old = root->node;
-   tree_mod_log_set_root_pointer(root, c, 0);
+   ret = tree_mod_log_insert_root(root->node, c, 0);
+   BUG_ON(ret < 0);
rcu_assign_pointer(root->node, c);
 
/* the super has an extra ref to root->node */
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 16/22] btrfs: kill tree_mod_log_set_node_key helper

2018-03-08 Thread David Sterba

A trivial wrapper that can be simply opencoded and makes the GFP
allocation request more visible. The error handling is now moved to the
callers.

Signed-off-by: David Sterba 
---
 fs/btrfs/ctree.c | 32 +---
 1 file changed, 17 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 45c17433bb76..ceb0836d74cd 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -837,16 +837,6 @@ tree_mod_log_eb_copy(struct btrfs_fs_info *fs_info, struct 
extent_buffer *dst,
return ret;
 }
 
-static noinline void tree_mod_log_set_node_key(struct extent_buffer *eb,
-   int slot, int atomic)
-{
-   int ret;
-
-   ret = tree_mod_log_insert_key(eb, slot, MOD_LOG_KEY_REPLACE,
-   atomic ? GFP_ATOMIC : GFP_NOFS);
-   BUG_ON(ret < 0);
-}
-
 static noinline int tree_mod_log_free_eb(struct extent_buffer *eb)
 {
struct tree_mod_elem **tm_list = NULL;
@@ -1962,7 +1952,9 @@ static noinline int balance_level(struct 
btrfs_trans_handle *trans,
} else {
struct btrfs_disk_key right_key;
btrfs_node_key(right, _key, 0);
-   tree_mod_log_set_node_key(parent, pslot + 1, 0);
+   ret = tree_mod_log_insert_key(parent, pslot + 1,
+   MOD_LOG_KEY_REPLACE, GFP_NOFS);
+   BUG_ON(ret < 0);
btrfs_set_node_key(parent, _key, pslot + 1);
btrfs_mark_buffer_dirty(parent);
}
@@ -2006,7 +1998,9 @@ static noinline int balance_level(struct 
btrfs_trans_handle *trans,
/* update the parent key to reflect our changes */
struct btrfs_disk_key mid_key;
btrfs_node_key(mid, _key, 0);
-   tree_mod_log_set_node_key(parent, pslot, 0);
+   ret = tree_mod_log_insert_key(parent, pslot,
+   MOD_LOG_KEY_REPLACE, GFP_NOFS);
+   BUG_ON(ret < 0);
btrfs_set_node_key(parent, _key, pslot);
btrfs_mark_buffer_dirty(parent);
}
@@ -2107,7 +2101,9 @@ static noinline int push_nodes_for_insert(struct 
btrfs_trans_handle *trans,
struct btrfs_disk_key disk_key;
orig_slot += left_nr;
btrfs_node_key(mid, _key, 0);
-   tree_mod_log_set_node_key(parent, pslot, 0);
+   ret = tree_mod_log_insert_key(parent, pslot,
+   MOD_LOG_KEY_REPLACE, GFP_NOFS);
+   BUG_ON(ret < 0);
btrfs_set_node_key(parent, _key, pslot);
btrfs_mark_buffer_dirty(parent);
if (btrfs_header_nritems(left) > orig_slot) {
@@ -2161,7 +2157,9 @@ static noinline int push_nodes_for_insert(struct 
btrfs_trans_handle *trans,
struct btrfs_disk_key disk_key;
 
btrfs_node_key(right, _key, 0);
-   tree_mod_log_set_node_key(parent, pslot + 1, 0);
+   ret = tree_mod_log_insert_key(parent, pslot + 1,
+   MOD_LOG_KEY_REPLACE, GFP_NOFS);
+   BUG_ON(ret < 0);
btrfs_set_node_key(parent, _key, pslot + 1);
btrfs_mark_buffer_dirty(parent);
 
@@ -3114,13 +3112,17 @@ static void fixup_low_keys(struct btrfs_fs_info 
*fs_info,
 {
int i;
struct extent_buffer *t;
+   int ret;
 
for (i = level; i < BTRFS_MAX_LEVEL; i++) {
int tslot = path->slots[i];
+
if (!path->nodes[i])
break;
t = path->nodes[i];
-   tree_mod_log_set_node_key(t, tslot, 1);
+   ret = tree_mod_log_insert_key(t, tslot, MOD_LOG_KEY_REPLACE,
+   GFP_ATOMIC);
+   BUG_ON(ret < 0);
btrfs_set_node_key(t, key, tslot);
btrfs_mark_buffer_dirty(path->nodes[i]);
if (tslot != 0)
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 21/22] btrfs: remove unused parameters from extent_submit_bio_done_t

2018-03-08 Thread David Sterba

Remove parameters not used by any of the callbacks.

Signed-off-by: David Sterba 
---
 fs/btrfs/disk-io.c   | 6 ++
 fs/btrfs/extent_io.h | 3 +--
 fs/btrfs/inode.c | 3 +--
 3 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index e396ed366751..f24522a6f36f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -743,8 +743,7 @@ static void run_one_async_done(struct btrfs_work *work)
return;
}
 
-   async->submit_bio_done(async->private_data, async->bio, 
async->mirror_num,
-  async->bio_flags, async->bio_offset);
+   async->submit_bio_done(async->private_data, async->bio, 
async->mirror_num);
 }
 
 static void run_one_async_free(struct btrfs_work *work)
@@ -817,8 +816,7 @@ static blk_status_t __btree_submit_bio_start(void 
*private_data, struct bio *bio
 }
 
 static blk_status_t __btree_submit_bio_done(void *private_data, struct bio 
*bio,
-   int mirror_num, unsigned long 
bio_flags,
-   u64 bio_offset)
+   int mirror_num)
 {
struct inode *inode = private_data;
blk_status_t ret;
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 1d3792568e64..d69c106077fc 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -100,8 +100,7 @@ typedef blk_status_t (extent_submit_bio_start_t)(void 
*private_data,
struct bio *bio, u64 bio_offset);
 
 typedef blk_status_t (extent_submit_bio_done_t)(void *private_data,
-   struct bio *bio, int mirror_num, unsigned long bio_flags,
-   u64 bio_offset);
+   struct bio *bio, int mirror_num);
 
 struct extent_io_ops {
/*
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 9b33544caaf8..5cfba8be73be 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1941,8 +1941,7 @@ static blk_status_t __btrfs_submit_bio_start(void 
*private_data, struct bio *bio
  * are inserted into the btree
  */
 static blk_status_t __btrfs_submit_bio_done(void *private_data, struct bio 
*bio,
- int mirror_num, unsigned long bio_flags,
- u64 bio_offset)
+ int mirror_num)
 {
struct inode *inode = private_data;
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 15/22] btrfs: kill trivial wrapper tree_mod_log_eb_move

2018-03-08 Thread David Sterba

The wrapper is effectively an alias for tree_mod_log_insert_move but
also hides the missing error handling. To make that more visible, lift
the BUG_ON to the callers.

Signed-off-by: David Sterba 
---
 fs/btrfs/ctree.c | 27 ---
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 7d39ef6ce20a..45c17433bb76 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -837,14 +837,6 @@ tree_mod_log_eb_copy(struct btrfs_fs_info *fs_info, struct 
extent_buffer *dst,
return ret;
 }
 
-static inline void tree_mod_log_eb_move(struct extent_buffer *dst,
-int dst_offset, int src_offset, int nr_items)
-{
-   int ret;
-   ret = tree_mod_log_insert_move(dst, dst_offset, src_offset, nr_items);
-   BUG_ON(ret < 0);
-}
-
 static noinline void tree_mod_log_set_node_key(struct extent_buffer *eb,
int slot, int atomic)
 {
@@ -3225,8 +3217,8 @@ static int push_node_left(struct btrfs_trans_handle 
*trans,
 
if (push_items < src_nritems) {
/*
-* don't call tree_mod_log_eb_move here, key removal was already
-* fully logged by tree_mod_log_eb_copy above.
+* Don't call tree_mod_log_insert_move here, key removal was
+* already fully logged by tree_mod_log_eb_copy above.
 */
memmove_extent_buffer(src, btrfs_node_key_ptr_offset(0),
  btrfs_node_key_ptr_offset(push_items),
@@ -3281,7 +3273,8 @@ static int balance_node_right(struct btrfs_trans_handle 
*trans,
if (max_push < push_items)
push_items = max_push;
 
-   tree_mod_log_eb_move(dst, push_items, 0, dst_nritems);
+   ret = tree_mod_log_insert_move(dst, push_items, 0, dst_nritems);
+   BUG_ON(ret < 0);
memmove_extent_buffer(dst, btrfs_node_key_ptr_offset(push_items),
  btrfs_node_key_ptr_offset(0),
  (dst_nritems) *
@@ -3399,9 +3392,11 @@ static void insert_ptr(struct btrfs_trans_handle *trans,
BUG_ON(slot > nritems);
BUG_ON(nritems == BTRFS_NODEPTRS_PER_BLOCK(fs_info));
if (slot != nritems) {
-   if (level)
-   tree_mod_log_eb_move(lower, slot + 1, slot,
+   if (level) {
+   ret = tree_mod_log_insert_move(lower, slot + 1, slot,
nritems - slot);
+   BUG_ON(ret < 0);
+   }
memmove_extent_buffer(lower,
  btrfs_node_key_ptr_offset(slot + 1),
  btrfs_node_key_ptr_offset(slot),
@@ -4872,9 +4867,11 @@ static void del_ptr(struct btrfs_root *root, struct 
btrfs_path *path,
 
nritems = btrfs_header_nritems(parent);
if (slot != nritems - 1) {
-   if (level)
-   tree_mod_log_eb_move(parent, slot, slot + 1,
+   if (level) {
+   ret = tree_mod_log_insert_move(parent, slot, slot + 1,
nritems - slot - 1);
+   BUG_ON(ret < 0);
+   }
memmove_extent_buffer(parent,
  btrfs_node_key_ptr_offset(slot),
  btrfs_node_key_ptr_offset(slot + 1),
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 14/22] btrfs: remove trivial locking wrappers of tree mod log

2018-03-08 Thread David Sterba

The wrappers are trivial and do bring any extra value on top of the
plain locking primitives.

Signed-off-by: David Sterba 
---
 fs/btrfs/ctree.c | 58 +++-
 1 file changed, 19 insertions(+), 39 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 716c140798c3..7d39ef6ce20a 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -330,26 +330,6 @@ struct tree_mod_elem {
struct tree_mod_root old_root;
 };
 
-static inline void tree_mod_log_read_lock(struct btrfs_fs_info *fs_info)
-{
-   read_lock(_info->tree_mod_log_lock);
-}
-
-static inline void tree_mod_log_read_unlock(struct btrfs_fs_info *fs_info)
-{
-   read_unlock(_info->tree_mod_log_lock);
-}
-
-static inline void tree_mod_log_write_lock(struct btrfs_fs_info *fs_info)
-{
-   write_lock(_info->tree_mod_log_lock);
-}
-
-static inline void tree_mod_log_write_unlock(struct btrfs_fs_info *fs_info)
-{
-   write_unlock(_info->tree_mod_log_lock);
-}
-
 /*
  * Pull a new tree mod seq number for our operation.
  */
@@ -369,14 +349,14 @@ static inline u64 btrfs_inc_tree_mod_seq(struct 
btrfs_fs_info *fs_info)
 u64 btrfs_get_tree_mod_seq(struct btrfs_fs_info *fs_info,
   struct seq_list *elem)
 {
-   tree_mod_log_write_lock(fs_info);
+   write_lock(_info->tree_mod_log_lock);
spin_lock(_info->tree_mod_seq_lock);
if (!elem->seq) {
elem->seq = btrfs_inc_tree_mod_seq(fs_info);
list_add_tail(>list, _info->tree_mod_seq_list);
}
spin_unlock(_info->tree_mod_seq_lock);
-   tree_mod_log_write_unlock(fs_info);
+   write_unlock(_info->tree_mod_log_lock);
 
return elem->seq;
 }
@@ -418,7 +398,7 @@ void btrfs_put_tree_mod_seq(struct btrfs_fs_info *fs_info,
 * anything that's lower than the lowest existing (read: blocked)
 * sequence number can be removed from the tree.
 */
-   tree_mod_log_write_lock(fs_info);
+   write_lock(_info->tree_mod_log_lock);
tm_root = _info->tree_mod_log;
for (node = rb_first(tm_root); node; node = next) {
next = rb_next(node);
@@ -428,7 +408,7 @@ void btrfs_put_tree_mod_seq(struct btrfs_fs_info *fs_info,
rb_erase(node, tm_root);
kfree(tm);
}
-   tree_mod_log_write_unlock(fs_info);
+   write_unlock(_info->tree_mod_log_lock);
 }
 
 /*
@@ -439,7 +419,7 @@ void btrfs_put_tree_mod_seq(struct btrfs_fs_info *fs_info,
  * for root replace operations, or the logical address of the affected
  * block for all other operations.
  *
- * Note: must be called with write lock (tree_mod_log_write_lock).
+ * Note: must be called with write lock for fs_info_;;tree_mod_log_lock.
  */
 static noinline int
 __tree_mod_log_insert(struct btrfs_fs_info *fs_info, struct tree_mod_elem *tm)
@@ -477,7 +457,7 @@ __tree_mod_log_insert(struct btrfs_fs_info *fs_info, struct 
tree_mod_elem *tm)
  * Determines if logging can be omitted. Returns 1 if it can. Otherwise, it
  * returns zero with the tree_mod_log_lock acquired. The caller must hold
  * this until all tree mod log insertions are recorded in the rb tree and then
- * call tree_mod_log_write_unlock() to release.
+ * write unlock fs_info::tree_mod_log_lock.
  */
 static inline int tree_mod_dont_log(struct btrfs_fs_info *fs_info,
struct extent_buffer *eb) {
@@ -487,9 +467,9 @@ static inline int tree_mod_dont_log(struct btrfs_fs_info 
*fs_info,
if (eb && btrfs_header_level(eb) == 0)
return 1;
 
-   tree_mod_log_write_lock(fs_info);
+   write_lock(_info->tree_mod_log_lock);
if (list_empty(&(fs_info)->tree_mod_seq_list)) {
-   tree_mod_log_write_unlock(fs_info);
+   write_unlock(_info->tree_mod_log_lock);
return 1;
}
 
@@ -551,7 +531,7 @@ static noinline int tree_mod_log_insert_key(struct 
extent_buffer *eb, int slot,
}
 
ret = __tree_mod_log_insert(eb->fs_info, tm);
-   tree_mod_log_write_unlock(eb->fs_info);
+   write_unlock(>fs_info->tree_mod_log_lock);
if (ret)
kfree(tm);
 
@@ -613,7 +593,7 @@ static noinline int tree_mod_log_insert_move(struct 
extent_buffer *eb,
ret = __tree_mod_log_insert(eb->fs_info, tm);
if (ret)
goto free_tms;
-   tree_mod_log_write_unlock(eb->fs_info);
+   write_unlock(>fs_info->tree_mod_log_lock);
kfree(tm_list);
 
return 0;
@@ -624,7 +604,7 @@ static noinline int tree_mod_log_insert_move(struct 
extent_buffer *eb,
kfree(tm_list[i]);
}
if (locked)
-   tree_mod_log_write_unlock(eb->fs_info);
+   write_unlock(>fs_info->tree_mod_log_lock);
kfree(tm_list);
kfree(tm);
 
@@ -703,7 +683,7 @@ static noinline int tree_mod_log_insert_root(struct 
extent_buffer *old_root,
if (!ret)

[PATCH 11/22] btrfs: drop unused fs_info parameter from tree_mod_log_eb_move

2018-03-08 Thread David Sterba

Signed-off-by: David Sterba 
---
 fs/btrfs/ctree.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 632beadf5725..5585433eac40 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -859,8 +859,7 @@ tree_mod_log_eb_copy(struct btrfs_fs_info *fs_info, struct 
extent_buffer *dst,
return ret;
 }
 
-static inline void
-tree_mod_log_eb_move(struct btrfs_fs_info *fs_info, struct extent_buffer *dst,
+static inline void tree_mod_log_eb_move(struct extent_buffer *dst,
 int dst_offset, int src_offset, int nr_items)
 {
int ret;
@@ -3305,7 +3304,7 @@ static int balance_node_right(struct btrfs_trans_handle 
*trans,
if (max_push < push_items)
push_items = max_push;
 
-   tree_mod_log_eb_move(fs_info, dst, push_items, 0, dst_nritems);
+   tree_mod_log_eb_move(dst, push_items, 0, dst_nritems);
memmove_extent_buffer(dst, btrfs_node_key_ptr_offset(push_items),
  btrfs_node_key_ptr_offset(0),
  (dst_nritems) *
@@ -3424,8 +3423,8 @@ static void insert_ptr(struct btrfs_trans_handle *trans,
BUG_ON(nritems == BTRFS_NODEPTRS_PER_BLOCK(fs_info));
if (slot != nritems) {
if (level)
-   tree_mod_log_eb_move(fs_info, lower, slot + 1,
-slot, nritems - slot);
+   tree_mod_log_eb_move(lower, slot + 1, slot,
+   nritems - slot);
memmove_extent_buffer(lower,
  btrfs_node_key_ptr_offset(slot + 1),
  btrfs_node_key_ptr_offset(slot),
@@ -4897,8 +4896,8 @@ static void del_ptr(struct btrfs_root *root, struct 
btrfs_path *path,
nritems = btrfs_header_nritems(parent);
if (slot != nritems - 1) {
if (level)
-   tree_mod_log_eb_move(fs_info, parent, slot,
-slot + 1, nritems - slot - 1);
+   tree_mod_log_eb_move(parent, slot, slot + 1,
+   nritems - slot - 1);
memmove_extent_buffer(parent,
  btrfs_node_key_ptr_offset(slot),
  btrfs_node_key_ptr_offset(slot + 1),
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 13/22] btrfs: drop fs_info parameter from __tree_mod_log_oldest_root

2018-03-08 Thread David Sterba

It's provided by the extent_buffer.

Signed-off-by: David Sterba 
---
 fs/btrfs/ctree.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index e53af58697db..716c140798c3 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1195,9 +1195,8 @@ static noinline int __btrfs_cow_block(struct 
btrfs_trans_handle *trans,
  * returns the logical address of the oldest predecessor of the given root.
  * entries older than time_seq are ignored.
  */
-static struct tree_mod_elem *
-__tree_mod_log_oldest_root(struct btrfs_fs_info *fs_info,
-  struct extent_buffer *eb_root, u64 time_seq)
+static struct tree_mod_elem *__tree_mod_log_oldest_root(
+   struct extent_buffer *eb_root, u64 time_seq)
 {
struct tree_mod_elem *tm;
struct tree_mod_elem *found = NULL;
@@ -1214,7 +1213,7 @@ __tree_mod_log_oldest_root(struct btrfs_fs_info *fs_info,
 * first operation that's logged for this root.
 */
while (1) {
-   tm = tree_mod_log_search_oldest(fs_info, root_logical,
+   tm = tree_mod_log_search_oldest(eb_root->fs_info, root_logical,
time_seq);
if (!looped && !tm)
return NULL;
@@ -1404,7 +1403,7 @@ get_old_root(struct btrfs_root *root, u64 time_seq)
u64 logical;
 
eb_root = btrfs_read_lock_root_node(root);
-   tm = __tree_mod_log_oldest_root(fs_info, eb_root, time_seq);
+   tm = __tree_mod_log_oldest_root(eb_root, time_seq);
if (!tm)
return eb_root;
 
@@ -1468,7 +1467,7 @@ int btrfs_old_root_level(struct btrfs_root *root, u64 
time_seq)
int level;
struct extent_buffer *eb_root = btrfs_root_node(root);
 
-   tm = __tree_mod_log_oldest_root(root->fs_info, eb_root, time_seq);
+   tm = __tree_mod_log_oldest_root(eb_root, time_seq);
if (tm && tm->op == MOD_LOG_ROOT_REPLACE) {
level = tm->old_root.level;
} else {
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 03/22] btrfs: remove redundant variable in __do_readpage

2018-03-08 Thread David Sterba

The value of page_end is only stored to end, no other use.

Signed-off-by: David Sterba 
---
 fs/btrfs/extent_io.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 36514baa661e..ffcbdb2390a1 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2888,8 +2888,7 @@ static int __do_readpage(struct extent_io_tree *tree,
 {
struct inode *inode = page->mapping->host;
u64 start = page_offset(page);
-   u64 page_end = start + PAGE_SIZE - 1;
-   u64 end;
+   const u64 end = start + PAGE_SIZE - 1;
u64 cur = start;
u64 extent_offset;
u64 last_byte = i_size_read(inode);
@@ -2909,7 +2908,6 @@ static int __do_readpage(struct extent_io_tree *tree,
 
set_page_extent_mapped(page);
 
-   end = page_end;
if (!PageUptodate(page)) {
if (cleancache_get_page(page) == 0) {
BUG_ON(blocksize != PAGE_SIZE);
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 06/22] btrfs: drop fs_info parameter from tree_mod_log_set_node_key

2018-03-08 Thread David Sterba

It's provided by the extent_buffer.

Signed-off-by: David Sterba 
---
 fs/btrfs/ctree.c | 19 ---
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index b88a79e69ddf..11e985ca0bc2 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -877,13 +877,12 @@ tree_mod_log_eb_move(struct btrfs_fs_info *fs_info, 
struct extent_buffer *dst,
BUG_ON(ret < 0);
 }
 
-static noinline void
-tree_mod_log_set_node_key(struct btrfs_fs_info *fs_info,
- struct extent_buffer *eb, int slot, int atomic)
+static noinline void tree_mod_log_set_node_key(struct extent_buffer *eb,
+   int slot, int atomic)
 {
int ret;
 
-   ret = tree_mod_log_insert_key(fs_info, eb, slot,
+   ret = tree_mod_log_insert_key(eb->fs_info, eb, slot,
MOD_LOG_KEY_REPLACE,
atomic ? GFP_ATOMIC : GFP_NOFS);
BUG_ON(ret < 0);
@@ -2007,8 +2006,7 @@ static noinline int balance_level(struct 
btrfs_trans_handle *trans,
} else {
struct btrfs_disk_key right_key;
btrfs_node_key(right, _key, 0);
-   tree_mod_log_set_node_key(fs_info, parent,
- pslot + 1, 0);
+   tree_mod_log_set_node_key(parent, pslot + 1, 0);
btrfs_set_node_key(parent, _key, pslot + 1);
btrfs_mark_buffer_dirty(parent);
}
@@ -2052,7 +2050,7 @@ static noinline int balance_level(struct 
btrfs_trans_handle *trans,
/* update the parent key to reflect our changes */
struct btrfs_disk_key mid_key;
btrfs_node_key(mid, _key, 0);
-   tree_mod_log_set_node_key(fs_info, parent, pslot, 0);
+   tree_mod_log_set_node_key(parent, pslot, 0);
btrfs_set_node_key(parent, _key, pslot);
btrfs_mark_buffer_dirty(parent);
}
@@ -2153,7 +2151,7 @@ static noinline int push_nodes_for_insert(struct 
btrfs_trans_handle *trans,
struct btrfs_disk_key disk_key;
orig_slot += left_nr;
btrfs_node_key(mid, _key, 0);
-   tree_mod_log_set_node_key(fs_info, parent, pslot, 0);
+   tree_mod_log_set_node_key(parent, pslot, 0);
btrfs_set_node_key(parent, _key, pslot);
btrfs_mark_buffer_dirty(parent);
if (btrfs_header_nritems(left) > orig_slot) {
@@ -2207,8 +2205,7 @@ static noinline int push_nodes_for_insert(struct 
btrfs_trans_handle *trans,
struct btrfs_disk_key disk_key;
 
btrfs_node_key(right, _key, 0);
-   tree_mod_log_set_node_key(fs_info, parent,
- pslot + 1, 0);
+   tree_mod_log_set_node_key(parent, pslot + 1, 0);
btrfs_set_node_key(parent, _key, pslot + 1);
btrfs_mark_buffer_dirty(parent);
 
@@ -3167,7 +3164,7 @@ static void fixup_low_keys(struct btrfs_fs_info *fs_info,
if (!path->nodes[i])
break;
t = path->nodes[i];
-   tree_mod_log_set_node_key(fs_info, t, tslot, 1);
+   tree_mod_log_set_node_key(t, tslot, 1);
btrfs_set_node_key(t, key, tslot);
btrfs_mark_buffer_dirty(path->nodes[i]);
if (tslot != 0)
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 04/22] btrfs: cleanup merging conditions in submit_extent_page

2018-03-08 Thread David Sterba

The merge call was factored out to a separate helper but it's a trivial
one and arguably we can opencode it and cache the value.

Signed-off-by: David Sterba 
---
 fs/btrfs/extent_io.c | 26 +-
 1 file changed, 9 insertions(+), 17 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index ffcbdb2390a1..2fbc58b0b908 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2744,18 +2744,6 @@ static int __must_check submit_one_bio(struct bio *bio, 
int mirror_num,
return blk_status_to_errno(ret);
 }
 
-static int merge_bio(struct extent_io_tree *tree, struct page *page,
-unsigned long offset, size_t size, struct bio *bio,
-unsigned long bio_flags)
-{
-   int ret = 0;
-   if (tree->ops)
-   ret = tree->ops->merge_bio_hook(page, offset, size, bio,
-   bio_flags);
-   return ret;
-
-}
-
 /*
  * @opf:   bio REQ_OP_* and REQ_* flags as one value
  * @bio_ret:   must be valid pointer, newly allocated bio will be stored there
@@ -2774,23 +2762,27 @@ static int submit_extent_page(unsigned int opf, struct 
extent_io_tree *tree,
 {
int ret = 0;
struct bio *bio;
-   int contig = 0;
-   int old_compressed = prev_bio_flags & EXTENT_BIO_COMPRESSED;
size_t page_size = min_t(size_t, size, PAGE_SIZE);
sector_t sector = offset >> 9;
 
ASSERT(bio_ret);
 
if (*bio_ret) {
+   bool contig;
+   bool can_merge = true;
+
bio = *bio_ret;
-   if (old_compressed)
+   if (prev_bio_flags & EXTENT_BIO_COMPRESSED)
contig = bio->bi_iter.bi_sector == sector;
else
contig = bio_end_sector(bio) == sector;
 
-   if (prev_bio_flags != bio_flags || !contig ||
+   if (tree->ops && tree->ops->merge_bio_hook(page, offset,
+   page_size, bio, bio_flags))
+   can_merge = false;
+
+   if (prev_bio_flags != bio_flags || !contig || !can_merge ||
force_bio_submit ||
-   merge_bio(tree, page, pg_offset, page_size, bio, bio_flags) 
||
bio_add_page(bio, page, page_size, pg_offset) < page_size) {
ret = submit_one_bio(bio, mirror_num, prev_bio_flags);
if (ret < 0) {
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 08/22] btrfs: drop fs_info parameter from tree_mod_log_insert_key

2018-03-08 Thread David Sterba

It's provided by the extent_buffer.

Signed-off-by: David Sterba 
---
 fs/btrfs/ctree.c | 27 ---
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 3b707af76579..60e27bef0e27 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -536,28 +536,26 @@ alloc_tree_mod_elem(struct extent_buffer *eb, int slot,
return tm;
 }
 
-static noinline int
-tree_mod_log_insert_key(struct btrfs_fs_info *fs_info,
-   struct extent_buffer *eb, int slot,
-   enum mod_log_op op, gfp_t flags)
+static noinline int tree_mod_log_insert_key(struct extent_buffer *eb, int slot,
+   enum mod_log_op op, gfp_t flags)
 {
struct tree_mod_elem *tm;
int ret;
 
-   if (!tree_mod_need_log(fs_info, eb))
+   if (!tree_mod_need_log(eb->fs_info, eb))
return 0;
 
tm = alloc_tree_mod_elem(eb, slot, op, flags);
if (!tm)
return -ENOMEM;
 
-   if (tree_mod_dont_log(fs_info, eb)) {
+   if (tree_mod_dont_log(eb->fs_info, eb)) {
kfree(tm);
return 0;
}
 
-   ret = __tree_mod_log_insert(fs_info, tm);
-   tree_mod_log_write_unlock(fs_info);
+   ret = __tree_mod_log_insert(eb->fs_info, tm);
+   tree_mod_log_write_unlock(eb->fs_info);
if (ret)
kfree(tm);
 
@@ -879,8 +877,7 @@ static noinline void tree_mod_log_set_node_key(struct 
extent_buffer *eb,
 {
int ret;
 
-   ret = tree_mod_log_insert_key(eb->fs_info, eb, slot,
-   MOD_LOG_KEY_REPLACE,
+   ret = tree_mod_log_insert_key(eb, slot, MOD_LOG_KEY_REPLACE,
atomic ? GFP_ATOMIC : GFP_NOFS);
BUG_ON(ret < 0);
 }
@@ -1178,7 +1175,7 @@ static noinline int __btrfs_cow_block(struct 
btrfs_trans_handle *trans,
add_root_to_dirty_list(root);
} else {
WARN_ON(trans->transid != btrfs_header_generation(parent));
-   tree_mod_log_insert_key(fs_info, parent, parent_slot,
+   tree_mod_log_insert_key(parent, parent_slot,
MOD_LOG_KEY_REPLACE, GFP_NOFS);
btrfs_set_node_blockptr(parent, parent_slot,
cow->start);
@@ -3441,8 +3438,8 @@ static void insert_ptr(struct btrfs_trans_handle *trans,
  (nritems - slot) * sizeof(struct btrfs_key_ptr));
}
if (level) {
-   ret = tree_mod_log_insert_key(fs_info, lower, slot,
- MOD_LOG_KEY_ADD, GFP_NOFS);
+   ret = tree_mod_log_insert_key(lower, slot, MOD_LOG_KEY_ADD,
+   GFP_NOFS);
BUG_ON(ret < 0);
}
btrfs_set_node_key(lower, key, slot);
@@ -4914,8 +4911,8 @@ static void del_ptr(struct btrfs_root *root, struct 
btrfs_path *path,
  sizeof(struct btrfs_key_ptr) *
  (nritems - slot - 1));
} else if (level) {
-   ret = tree_mod_log_insert_key(fs_info, parent, slot,
- MOD_LOG_KEY_REMOVE, GFP_NOFS);
+   ret = tree_mod_log_insert_key(parent, slot, MOD_LOG_KEY_REMOVE,
+   GFP_NOFS);
BUG_ON(ret < 0);
}
 
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 09/22] btrfs: drop fs_info parameter from tree_mod_log_free_eb

2018-03-08 Thread David Sterba

It's provided by the extent_buffer.

Signed-off-by: David Sterba 
---
 fs/btrfs/ctree.c | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 60e27bef0e27..6d2f8562bb36 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -41,8 +41,6 @@ static int balance_node_right(struct btrfs_trans_handle 
*trans,
  struct extent_buffer *src_buf);
 static void del_ptr(struct btrfs_root *root, struct btrfs_path *path,
int level, int slot);
-static int tree_mod_log_free_eb(struct btrfs_fs_info *fs_info,
-struct extent_buffer *eb);
 
 struct btrfs_path *btrfs_alloc_path(void)
 {
@@ -882,8 +880,7 @@ static noinline void tree_mod_log_set_node_key(struct 
extent_buffer *eb,
BUG_ON(ret < 0);
 }
 
-static noinline int
-tree_mod_log_free_eb(struct btrfs_fs_info *fs_info, struct extent_buffer *eb)
+static noinline int tree_mod_log_free_eb(struct extent_buffer *eb)
 {
struct tree_mod_elem **tm_list = NULL;
int nritems = 0;
@@ -893,7 +890,7 @@ tree_mod_log_free_eb(struct btrfs_fs_info *fs_info, struct 
extent_buffer *eb)
if (btrfs_header_level(eb) == 0)
return 0;
 
-   if (!tree_mod_need_log(fs_info, NULL))
+   if (!tree_mod_need_log(eb->fs_info, NULL))
return 0;
 
nritems = btrfs_header_nritems(eb);
@@ -910,11 +907,11 @@ tree_mod_log_free_eb(struct btrfs_fs_info *fs_info, 
struct extent_buffer *eb)
}
}
 
-   if (tree_mod_dont_log(fs_info, eb))
+   if (tree_mod_dont_log(eb->fs_info, eb))
goto free_tms;
 
-   ret = __tree_mod_log_free_eb(fs_info, tm_list, nritems);
-   tree_mod_log_write_unlock(fs_info);
+   ret = __tree_mod_log_free_eb(eb->fs_info, tm_list, nritems);
+   tree_mod_log_write_unlock(eb->fs_info);
if (ret)
goto free_tms;
kfree(tm_list);
@@ -1183,7 +1180,7 @@ static noinline int __btrfs_cow_block(struct 
btrfs_trans_handle *trans,
  trans->transid);
btrfs_mark_buffer_dirty(parent);
if (last_ref) {
-   ret = tree_mod_log_free_eb(fs_info, buf);
+   ret = tree_mod_log_free_eb(buf);
if (ret) {
btrfs_abort_transaction(trans, ret);
return ret;
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 05/22] btrfs: document more parameters of submit_extent_page

2018-03-08 Thread David Sterba

Signed-off-by: David Sterba 
---
 fs/btrfs/extent_io.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 2fbc58b0b908..f1d199c8883e 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2746,7 +2746,19 @@ static int __must_check submit_one_bio(struct bio *bio, 
int mirror_num,
 
 /*
  * @opf:   bio REQ_OP_* and REQ_* flags as one value
+ * @tree:  tree so we can call our merge_bio hook
+ * @wbc:   optional writeback control for io accounting
+ * @page:  page to add to the bio
+ * @pg_offset: offset of the new bio or to check whether we are adding
+ *  a contiguous page to the previous one
+ * @size:  portion of page that we want to write
+ * @offset:starting offset in the page
+ * @bdev:  attach newly created bios to this bdev
  * @bio_ret:   must be valid pointer, newly allocated bio will be stored there
+ * @end_io_func: end_io callback for new bio
+ * @mirror_num: desired mirror to read/write
+ * @prev_bio_flags:  flags of previous bio to see if we can merge the current 
one
+ * @bio_flags: flags of the current bio to see if we can merge them
  */
 static int submit_extent_page(unsigned int opf, struct extent_io_tree *tree,
  struct writeback_control *wbc,
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 07/22] btrfs: drop fs_info parameter from tree_mod_log_insert_move

2018-03-08 Thread David Sterba

It's provided by the extent_buffer.

Signed-off-by: David Sterba 
---
 fs/btrfs/ctree.c | 23 ++-
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 11e985ca0bc2..3b707af76579 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -564,10 +564,8 @@ tree_mod_log_insert_key(struct btrfs_fs_info *fs_info,
return ret;
 }
 
-static noinline int
-tree_mod_log_insert_move(struct btrfs_fs_info *fs_info,
-struct extent_buffer *eb, int dst_slot, int src_slot,
-int nr_items)
+static noinline int tree_mod_log_insert_move(struct extent_buffer *eb,
+   int dst_slot, int src_slot, int nr_items)
 {
struct tree_mod_elem *tm = NULL;
struct tree_mod_elem **tm_list = NULL;
@@ -575,7 +573,7 @@ tree_mod_log_insert_move(struct btrfs_fs_info *fs_info,
int i;
int locked = 0;
 
-   if (!tree_mod_need_log(fs_info, eb))
+   if (!tree_mod_need_log(eb->fs_info, eb))
return 0;
 
tm_list = kcalloc(nr_items, sizeof(struct tree_mod_elem *), GFP_NOFS);
@@ -603,7 +601,7 @@ tree_mod_log_insert_move(struct btrfs_fs_info *fs_info,
}
}
 
-   if (tree_mod_dont_log(fs_info, eb))
+   if (tree_mod_dont_log(eb->fs_info, eb))
goto free_tms;
locked = 1;
 
@@ -613,26 +611,26 @@ tree_mod_log_insert_move(struct btrfs_fs_info *fs_info,
 * buffer, i.e. dst_slot < src_slot.
 */
for (i = 0; i + dst_slot < src_slot && i < nr_items; i++) {
-   ret = __tree_mod_log_insert(fs_info, tm_list[i]);
+   ret = __tree_mod_log_insert(eb->fs_info, tm_list[i]);
if (ret)
goto free_tms;
}
 
-   ret = __tree_mod_log_insert(fs_info, tm);
+   ret = __tree_mod_log_insert(eb->fs_info, tm);
if (ret)
goto free_tms;
-   tree_mod_log_write_unlock(fs_info);
+   tree_mod_log_write_unlock(eb->fs_info);
kfree(tm_list);
 
return 0;
 free_tms:
for (i = 0; i < nr_items; i++) {
if (tm_list[i] && !RB_EMPTY_NODE(_list[i]->node))
-   rb_erase(_list[i]->node, _info->tree_mod_log);
+   rb_erase(_list[i]->node, >fs_info->tree_mod_log);
kfree(tm_list[i]);
}
if (locked)
-   tree_mod_log_write_unlock(fs_info);
+   tree_mod_log_write_unlock(eb->fs_info);
kfree(tm_list);
kfree(tm);
 
@@ -872,8 +870,7 @@ tree_mod_log_eb_move(struct btrfs_fs_info *fs_info, struct 
extent_buffer *dst,
 int dst_offset, int src_offset, int nr_items)
 {
int ret;
-   ret = tree_mod_log_insert_move(fs_info, dst, dst_offset, src_offset,
-  nr_items);
+   ret = tree_mod_log_insert_move(dst, dst_offset, src_offset, nr_items);
BUG_ON(ret < 0);
 }
 
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 01/22] btrfs: assume that bio_ret is always valid in submit_extent_page

2018-03-08 Thread David Sterba

All callers pass a valid pointer so we can drop the redundant checks.
The call to submit_one_bio never happend and can be removed.

Signed-off-by: David Sterba 
---
 fs/btrfs/extent_io.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index dfeb74a0be77..d00d5a59ff21 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2758,6 +2758,7 @@ static int merge_bio(struct extent_io_tree *tree, struct 
page *page,
 
 /*
  * @opf:   bio REQ_OP_* and REQ_* flags as one value
+ * @bio_ret:   must be valid pointer, newly allocated bio will be stored there
  */
 static int submit_extent_page(unsigned int opf, struct extent_io_tree *tree,
  struct writeback_control *wbc,
@@ -2778,7 +2779,9 @@ static int submit_extent_page(unsigned int opf, struct 
extent_io_tree *tree,
size_t page_size = min_t(size_t, size, PAGE_SIZE);
sector_t sector = offset >> 9;
 
-   if (bio_ret && *bio_ret) {
+   ASSERT(bio_ret);
+
+   if (*bio_ret) {
bio = *bio_ret;
if (old_compressed)
contig = bio->bi_iter.bi_sector == sector;
@@ -2813,10 +2816,7 @@ static int submit_extent_page(unsigned int opf, struct 
extent_io_tree *tree,
wbc_account_io(wbc, page, page_size);
}
 
-   if (bio_ret)
-   *bio_ret = bio;
-   else
-   ret = submit_one_bio(bio, mirror_num, bio_flags);
+   *bio_ret = bio;
 
return ret;
 }
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 02/22] btrfs: assume that prev_em_start is always valid in __do_readpage

2018-03-08 Thread David Sterba

All callers pass a valid pointer, we can remove the redundant checks.

Signed-off-by: David Sterba 
---
 fs/btrfs/extent_io.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d00d5a59ff21..36514baa661e 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2875,6 +2875,8 @@ __get_extent_map(struct inode *inode, struct page *page, 
size_t pg_offset,
  * handlers)
  * XXX JDM: This needs looking at to ensure proper page locking
  * return 0 on success, otherwise return error
+ *
+ * @prev_em_start: return value of previous em start value; must be valid
  */
 static int __do_readpage(struct extent_io_tree *tree,
 struct page *page,
@@ -2903,6 +2905,8 @@ static int __do_readpage(struct extent_io_tree *tree,
size_t blocksize = inode->i_sb->s_blocksize;
unsigned long this_bio_flag = 0;
 
+   ASSERT(prev_em_start);
+
set_page_extent_mapped(page);
 
end = page_end;
@@ -3012,12 +3016,11 @@ static int __do_readpage(struct extent_io_tree *tree,
 * non-optimal behavior (submitting 2 bios for the same extent).
 */
if (test_bit(EXTENT_FLAG_COMPRESSED, >flags) &&
-   prev_em_start && *prev_em_start != (u64)-1 &&
+   *prev_em_start != (u64)-1 &&
*prev_em_start != em->orig_start)
force_bio_submit = true;
 
-   if (prev_em_start)
-   *prev_em_start = em->orig_start;
+   *prev_em_start = em->orig_start;
 
free_extent_map(em);
em = NULL;
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 00/22] Misc cleanups

2018-03-08 Thread David Sterba

Cleanups that have piled over the time. The "very important" unused
argument removals, some renaming and simplifications.

The effects on stack consumption are not huge, but still measurable:

ctree.c:tree_mod_log_insert_key   +16 (48 -> 64)
ctree.c:tree_mod_log_insert_move   -8 (104 -> 96)
ctree.c:insert_ptr.isra.24 -8 (80 -> 72)
ctree.c:del_ptr.isra.25-8 (96 -> 88)
ctree.c:btrfs_old_root_level   -8 (32 -> 24)
ctree.c:btrfs_set_item_key_safe-8 (88 -> 80)
extent_io.c:submit_extent_page-40 (112 -> 72)

David Sterba (22):
  btrfs: assume that bio_ret is always valid in submit_extent_page
  btrfs: assume that prev_em_start is always valid in __do_readpage
  btrfs: remove redundant variable in __do_readpage
  btrfs: cleanup merging conditions in submit_extent_page
  btrfs: document more parameters of submit_extent_page
  btrfs: drop fs_info parameter from tree_mod_log_set_node_key
  btrfs: drop fs_info parameter from tree_mod_log_insert_move
  btrfs: drop fs_info parameter from tree_mod_log_insert_key
  btrfs: drop fs_info parameter from tree_mod_log_free_eb
  btrfs: drop fs_info parameter from tree_mod_log_free_eb
  btrfs: drop unused fs_info parameter from tree_mod_log_eb_move
  btrfs: embed tree_mod_move structure to tree_mod_elem
  btrfs: drop fs_info parameter from __tree_mod_log_oldest_root
  btrfs: remove trivial locking wrappers of tree mod log
  btrfs: kill trivial wrapper tree_mod_log_eb_move
  btrfs: kill tree_mod_log_set_node_key helper
  btrfs: kill tree_mod_log_set_root_pointer helper
  btrfs: move allocation after simple tests in tree_mod_log_insert_key
  btrfs: separate types for submit_bio_start and submit_bio_done
  btrfs: remove unused parameters from extent_submit_bio_start_t
  btrfs: remove unused parameters from extent_submit_bio_done_t
  btrfs: rename submit callbacks and drop double underscores

 fs/btrfs/ctree.c | 234 ---
 fs/btrfs/disk-io.c   |  24 +++---
 fs/btrfs/disk-io.h   |   4 +-
 fs/btrfs/extent_io.c |  61 --
 fs/btrfs/extent_io.h |   7 ++
 fs/btrfs/inode.c |  30 +++
 6 files changed, 158 insertions(+), 202 deletions(-)

-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 06/20] btrfs-progs: qgroups: add pathname to show output

2018-03-08 Thread Jeff Mahoney

On 3/8/18 12:33 AM, Qu Wenruo wrote:
> 
> 
> On 2018年03月08日 10:40, je...@suse.com wrote:
>> From: Jeff Mahoney 
>>
>> The btrfs qgroup show command currently only exports qgroup IDs,
>> forcing the user to resolve which subvolume each corresponds to.
>>
>> This patch adds pathname resolution to qgroup show so that when
>> the -P option is used, the last column contains the pathname of
>> the root of the subvolume it describes.  In the case of nested
>> qgroups, it will show the number of member qgroups or the paths
>> of the members if the -v option is used.
>>
>> Pathname can also be used as a sort parameter.
>>
>> Signed-off-by: Jeff Mahoney 
> 
> Reviewed-by: Qu Wenruo 
> 
> Except one nitpick inlined below.
> 
> [snip]
>>  }
>> +if (bq->pathname)
>> +free((void *)bq->pathname);
> 
> What about just free(bq->pathname);?
> 
> Is this (void *) used to get around the const prefix?

Yes.

Thanks,

-Jeff

-- 
Jeff Mahoney
SUSE Labs



signature.asc
Description: OpenPGP digital signature

Re: [PATCH v2] btrfs-progs: free-space-cache: Enhance free space cache free space check

2018-03-08 Thread Nikolay Borisov



On  8.03.2018 09:02, Qu Wenruo wrote:
> When we found free space difference between free space cache and block
> group item, we just discard this free space cache.
> 
> Normally such difference is caused by btrfs_reserve_extent() called by
> delalloc which is out of a transaction.
> And since all btrfs_release_extent() is called with a transaction, under
> heavy race free space cache can have less free space than block group
> item.
> 
> Normally kernel will detect such difference and just discard that cache.
> 
> However we must be more careful if free space cache has more free space
> cache, and if that happens, paried with above race one invalid free
> space cache can be loaded into kernel.
> 
> So if we find any free space cache who has more free space then block
> group item, we report it as an error other than ignoring it.
> 
> Signed-off-by: Qu Wenruo 
> ---
> v2:
>   Fix the timming of free space output.
> ---
>  check/main.c   |  4 +++-
>  free-space-cache.c | 32 
>  2 files changed, 27 insertions(+), 9 deletions(-)
> 
> diff --git a/check/main.c b/check/main.c
> index 97baae583f04..bc31f7e32061 100644
> --- a/check/main.c
> +++ b/check/main.c
> @@ -5339,7 +5339,9 @@ static int check_space_cache(struct btrfs_root *root)
>   error += ret;
>   } else {
>   ret = load_free_space_cache(root->fs_info, cache);
> - if (!ret)
> + if (ret < 0)
> + error++;
> + if (ret <= 0)
>   continue;
>   }
>  
> diff --git a/free-space-cache.c b/free-space-cache.c
> index f933f9f1cf3f..9b83a71ca59a 100644
> --- a/free-space-cache.c
> +++ b/free-space-cache.c
> @@ -438,7 +438,8 @@ int load_free_space_cache(struct btrfs_fs_info *fs_info,
>   struct btrfs_path *path;
>   u64 used = btrfs_block_group_used(_group->item);
>   int ret = 0;
> - int matched;
> + u64 bg_free;
> + s64 diff;
>  
>   path = btrfs_alloc_path();
>   if (!path)
> @@ -448,18 +449,33 @@ int load_free_space_cache(struct btrfs_fs_info *fs_info,
> block_group->key.objectid);
>   btrfs_free_path(path);
>  
> - matched = (ctl->free_space == (block_group->key.offset - used -
> -block_group->bytes_super));
> - if (ret == 1 && !matched) {
> - __btrfs_remove_free_space_cache(ctl);
> + bg_free = block_group->key.offset - used - block_group->bytes_super;
> + diff = ctl->free_space - bg_free;
> + if (ret == 1 && diff) {
>   fprintf(stderr,
> -"block group %llu has wrong amount of free space\n",
> -block_group->key.objectid);
> +"block group %llu has wrong amount of free space, free 
> space cache has %llu block group has %llu\n",nit: Always put units when 
> printing numbers. In this case we are talking
about bytes.
> +block_group->key.objectid, ctl->free_space, bg_free);
> + __btrfs_remove_free_space_cache(ctl);
> + /*
> +  * Due to btrfs_reserve_extent() can happen out of > +  
>  * transaction, but all btrfs_release_extent() happens inside
> +  * a transaction, so under heavy race it's possible that free
> +  * space cache has less free space, and both kernel just discard
> +  * such cache. But if we find some case where free space cache
> +  * has more free space, this means under certain case such
> +  * cache can be loaded and cause double allocate.
> +  *
> +  * Detect such possibility here.
> +  */
> + if (diff > 0)
> + error(
> +"free space cache has more free space than block group item, this could 
> leads to serious corruption, please contact btrfs developers");

I'm not entirely happy with this message. So they will post to the
mailing list saying something along the lines of "I got this message
what do I do no, please help".  Better to output actionable data so that
the user can post it immediately.

>   ret = -1;
>   }
>  
>   if (ret < 0) {
> - ret = 0;
> + if (diff <= 0)
> + ret = 0;
>  
>   fprintf(stderr,
>  "failed to load free space cache for block group %llu\n",
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Improve error stats message

2018-03-08 Thread Anand Jain




On 03/08/2018 01:37 AM, Diego wrote:

A typical notification of filesystem errors looks like this:

BTRFS error (device sda2): bdev /dev/sda2 errs: wr 0, rd 1, flush 0, corrupt 0, 
gen 0

The device name is being printed twice. Also, these abbreviatures
feel unnecesary. Make the message look like this instead:

BTRFS error (device sda2): errors: write 0, read 1, flush 0, corrupt 0, 
generation 0


Signed-off-by: Diego Calleja 
---
  fs/btrfs/volumes.c | 5 ++---
  1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 2ceb924ca0d6..52fee5bb056f 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -7238,9 +7238,8 @@ static void btrfs_dev_stat_print_on_error(struct 
btrfs_device *dev)
  {
if (!dev->dev_stats_valid)
return;
-   btrfs_err_rl_in_rcu(dev->fs_info,
-   "bdev %s errs: wr %u, rd %u, flush %u, corrupt %u, gen %u",
-  rcu_str_deref(dev->name),
+   btrfs_err_rl(dev->fs_info,
+   "errors: write %u, read %u, flush %u, corrupt %u, generation 
%u",
   btrfs_dev_stat_read(dev, BTRFS_DEV_STAT_WRITE_ERRS),
   btrfs_dev_stat_read(dev, BTRFS_DEV_STAT_READ_ERRS),
   btrfs_dev_stat_read(dev, BTRFS_DEV_STAT_FLUSH_ERRS),




As btrfs_err_rl()->btrfs_printk() does its magic to print the
latest_bdev, this will print the wrong device for the error
reporting in case of multiple device FS.

There is this RFC in the ML, to use FSID insted. Comments are
welcome.

   [RFC PATCH] Btrfs: fix fs logging for multi device

Thanks, Anand

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Per subvolume "RAID" level?!

2018-03-08 Thread Austin S. Hemmelgarn


On 2018-03-08 05:36, waxhead wrote:
Just out of curiosity, are there any work going on for enabling 
different "RAID" levels per subvolume?!
Not that I know of, but it would be great to have (I could get rid of 
some of the various small isolated volumes I have solely to have a 
different storage profile for part of the system (it makes essentially 
zero sense to use raid1 for /var/cache)), assuming that cross-profile 
reflinks are properly disallowed (or have an option to be disallowed).


And out of even more curiosity how is this planned to be handled with 
btrfs balance?! When per subvolume "RAID" levels are good to go, how 
would you then run the balance filters to convert / leave alone certain 
parts of the filesystem?!
I would assume a new balance filter would be added to select only chunks 
used by a given subvolume.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/2] btrfs: introduce conditional wakeup helpers

2018-03-08 Thread Nikolay Borisov



On  8.03.2018 13:49, David Sterba wrote:
> Add convenience wrappers for the waitqueue management that involves
> memory barriers to prevent deadlocks. The helpers will let us remove
> barriers and the necessary comments in several places.
> 
> Signed-off-by: David Sterba 

Reviewed-by: Nikolay Borisov 

> ---
>  fs/btrfs/ctree.h | 23 +++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index bf545e7552b8..c9827a1b676d 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -3748,4 +3748,27 @@ static inline int btrfs_is_testing(struct 
> btrfs_fs_info *fs_info)
>  #endif
>   return 0;
>  }
> +
> +static inline void cond_wake_up(struct wait_queue_head *wq)
> +{
> + /*
> +  * This implies a full smp_mb barrier, see comments for
> +  * waitqueue_active why.
> +  */
> + if (wq_has_sleeper(wq))
> + wake_up(wq);
> +}
> +
> +static inline void cond_wake_up_nomb(struct wait_queue_head *wq)
> +{
> + /*
> +  * Special case for conditional wakeup where the barrier required for
> +  * waitqueue_active is implied by some of the preceding code. Eg. one
> +  * of such atomic operations (atomic_dec_and_return, ...), or a
> +  * unlock/lock sequence, etc.
> +  */
> + if (waitqueue_active(wq))
> + wake_up(wq);
> +}
> +
>  #endif
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] btrfs: replace waitqueue_actvie with cond_wake_up

2018-03-08 Thread Nikolay Borisov



On  8.03.2018 13:49, David Sterba wrote:
> Use the wrappers and reduce the amount of low-level details about the
> waitqueue management.
> 
> Signed-off-by: David Sterba 
> ---
>  fs/btrfs/compression.c   |  7 +--
>  fs/btrfs/delayed-inode.c |  9 +++--
>  fs/btrfs/dev-replace.c   | 10 --
>  fs/btrfs/extent-tree.c   |  7 +--
>  fs/btrfs/inode.c |  9 +++--
>  fs/btrfs/locking.c   | 34 +++---
>  fs/btrfs/ordered-data.c  | 14 --
>  fs/btrfs/transaction.c   |  7 +--
>  fs/btrfs/tree-log.c  | 28 
>  9 files changed, 36 insertions(+), 89 deletions(-)
> 
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index 562c3e633403..2d2d7380d381 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -1003,12 +1003,7 @@ static void __free_workspace(int type, struct 
> list_head *workspace,
>   btrfs_compress_op[idx]->free_workspace(workspace);
>   atomic_dec(total_ws);
>  wake:
> - /*
> -  * Make sure counter is updated before we wake up waiters.
> -  */
> - smp_mb();
> - if (waitqueue_active(ws_wait))
> - wake_up(ws_wait);
> + cond_wake_up(ws_wait);
>  }
>  
>  static void free_workspace(int type, struct list_head *ws)
> diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
> index d06bef16ebd5..3e7f5f26ff0f 100644
> --- a/fs/btrfs/delayed-inode.c
> +++ b/fs/btrfs/delayed-inode.c
> @@ -472,13 +472,10 @@ static void finish_one_item(struct btrfs_delayed_root 
> *delayed_root)
>  {
>   int seq = atomic_inc_return(_root->items_seq);
>  
> - /*
> -  * atomic_dec_return implies a barrier for waitqueue_active
> -  */
> + /* atomic_dec_return implies a barrier for waitqueue_active */
>   if ((atomic_dec_return(_root->items) <
> - BTRFS_DELAYED_BACKGROUND || seq % BTRFS_DELAYED_BATCH == 0) &&
> - waitqueue_active(_root->wait))
> - wake_up(_root->wait);
> + BTRFS_DELAYED_BACKGROUND || seq % BTRFS_DELAYED_BATCH == 0))
> + cond_wake_up_nomb(_root->wait);
>  }
>  
>  static void __btrfs_remove_delayed_item(struct btrfs_delayed_item 
> *delayed_item)
> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
> index e279f04b3388..f498572155f1 100644
> --- a/fs/btrfs/dev-replace.c
> +++ b/fs/btrfs/dev-replace.c
> @@ -928,9 +928,9 @@ void btrfs_dev_replace_clear_lock_blocking(
>   ASSERT(atomic_read(_replace->read_locks) > 0);
>   ASSERT(atomic_read(_replace->blocking_readers) > 0);
>   read_lock(_replace->lock);
> - if (atomic_dec_and_test(_replace->blocking_readers) &&
> - waitqueue_active(_replace->read_lock_wq))
> - wake_up(_replace->read_lock_wq);
> + /* Barrier implied by atomic_dec_and_test */
> + if (atomic_dec_and_test(_replace->blocking_readers))
> + cond_wake_up_nomb(_replace->read_lock_wq);
>  }
>  
>  void btrfs_bio_counter_inc_noblocked(struct btrfs_fs_info *fs_info)
> @@ -941,9 +941,7 @@ void btrfs_bio_counter_inc_noblocked(struct btrfs_fs_info 
> *fs_info)
>  void btrfs_bio_counter_sub(struct btrfs_fs_info *fs_info, s64 amount)
>  {
>   percpu_counter_sub(_info->bio_counter, amount);
> -
> - if (waitqueue_active(_info->replace_wait))
> - wake_up(_info->replace_wait);
> + cond_wake_up_nomb(_info->replace_wait);

nit/offtopic:

I think here the code requires comments since we have 2 types of waiters for 
fs_info->replace_wait. One is dependent on the percpu_counter_sum (i.e. the 
btrfs_rm_dev_replace_blocked). And then there is another condition on the same 
wait entry - the btrfs_bio_counter_inc_blocked i.e:

 wait_event(fs_info->replace_wait,   
   !test_bit(BTRFS_FS_STATE_DEV_REPLACING,  
 _info->fs_state)); 

geez, who would come up with such synchronization ... 
>  }
>  
>  void btrfs_bio_counter_inc_blocked(struct btrfs_fs_info *fs_info)
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 2760292e1175..d57801711884 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -10999,12 +10999,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, 
> struct fstrim_range *range)
>  void btrfs_end_write_no_snapshotting(struct btrfs_root *root)
>  {
>   percpu_counter_dec(>subv_writers->counter);
> - /*
> -  * Make sure counter is updated before we wake up waiters.
> -  */
> - smp_mb();
> - if (waitqueue_active(>subv_writers->wait))
> - wake_up(>subv_writers->wait);
> + cond_wake_up(>subv_writers->wait);
>  }
>  
>  int btrfs_start_write_no_snapshotting(struct btrfs_root *root)
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index fc5b7d82b842..b963b5b4734e 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -1168,13 +1168,10 @@ static noinline void

[PATCH 2/2] btrfs: replace waitqueue_actvie with cond_wake_up

2018-03-08 Thread David Sterba

Use the wrappers and reduce the amount of low-level details about the
waitqueue management.

Signed-off-by: David Sterba 
---
 fs/btrfs/compression.c   |  7 +--
 fs/btrfs/delayed-inode.c |  9 +++--
 fs/btrfs/dev-replace.c   | 10 --
 fs/btrfs/extent-tree.c   |  7 +--
 fs/btrfs/inode.c |  9 +++--
 fs/btrfs/locking.c   | 34 +++---
 fs/btrfs/ordered-data.c  | 14 --
 fs/btrfs/transaction.c   |  7 +--
 fs/btrfs/tree-log.c  | 28 
 9 files changed, 36 insertions(+), 89 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 562c3e633403..2d2d7380d381 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -1003,12 +1003,7 @@ static void __free_workspace(int type, struct list_head 
*workspace,
btrfs_compress_op[idx]->free_workspace(workspace);
atomic_dec(total_ws);
 wake:
-   /*
-* Make sure counter is updated before we wake up waiters.
-*/
-   smp_mb();
-   if (waitqueue_active(ws_wait))
-   wake_up(ws_wait);
+   cond_wake_up(ws_wait);
 }
 
 static void free_workspace(int type, struct list_head *ws)
diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index d06bef16ebd5..3e7f5f26ff0f 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -472,13 +472,10 @@ static void finish_one_item(struct btrfs_delayed_root 
*delayed_root)
 {
int seq = atomic_inc_return(_root->items_seq);
 
-   /*
-* atomic_dec_return implies a barrier for waitqueue_active
-*/
+   /* atomic_dec_return implies a barrier for waitqueue_active */
if ((atomic_dec_return(_root->items) <
-   BTRFS_DELAYED_BACKGROUND || seq % BTRFS_DELAYED_BATCH == 0) &&
-   waitqueue_active(_root->wait))
-   wake_up(_root->wait);
+   BTRFS_DELAYED_BACKGROUND || seq % BTRFS_DELAYED_BATCH == 0))
+   cond_wake_up_nomb(_root->wait);
 }
 
 static void __btrfs_remove_delayed_item(struct btrfs_delayed_item 
*delayed_item)
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index e279f04b3388..f498572155f1 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -928,9 +928,9 @@ void btrfs_dev_replace_clear_lock_blocking(
ASSERT(atomic_read(_replace->read_locks) > 0);
ASSERT(atomic_read(_replace->blocking_readers) > 0);
read_lock(_replace->lock);
-   if (atomic_dec_and_test(_replace->blocking_readers) &&
-   waitqueue_active(_replace->read_lock_wq))
-   wake_up(_replace->read_lock_wq);
+   /* Barrier implied by atomic_dec_and_test */
+   if (atomic_dec_and_test(_replace->blocking_readers))
+   cond_wake_up_nomb(_replace->read_lock_wq);
 }
 
 void btrfs_bio_counter_inc_noblocked(struct btrfs_fs_info *fs_info)
@@ -941,9 +941,7 @@ void btrfs_bio_counter_inc_noblocked(struct btrfs_fs_info 
*fs_info)
 void btrfs_bio_counter_sub(struct btrfs_fs_info *fs_info, s64 amount)
 {
percpu_counter_sub(_info->bio_counter, amount);
-
-   if (waitqueue_active(_info->replace_wait))
-   wake_up(_info->replace_wait);
+   cond_wake_up_nomb(_info->replace_wait);
 }
 
 void btrfs_bio_counter_inc_blocked(struct btrfs_fs_info *fs_info)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 2760292e1175..d57801711884 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -10999,12 +10999,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, 
struct fstrim_range *range)
 void btrfs_end_write_no_snapshotting(struct btrfs_root *root)
 {
percpu_counter_dec(>subv_writers->counter);
-   /*
-* Make sure counter is updated before we wake up waiters.
-*/
-   smp_mb();
-   if (waitqueue_active(>subv_writers->wait))
-   wake_up(>subv_writers->wait);
+   cond_wake_up(>subv_writers->wait);
 }
 
 int btrfs_start_write_no_snapshotting(struct btrfs_root *root)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index fc5b7d82b842..b963b5b4734e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1168,13 +1168,10 @@ static noinline void async_cow_submit(struct btrfs_work 
*work)
nr_pages = (async_cow->end - async_cow->start + PAGE_SIZE) >>
PAGE_SHIFT;
 
-   /*
-* atomic_sub_return implies a barrier for waitqueue_active
-*/
+   /* atomic_sub_return implies a barrier */
if (atomic_sub_return(nr_pages, _info->async_delalloc_pages) <
-   5 * SZ_1M &&
-   waitqueue_active(_info->async_submit_wait))
-   wake_up(_info->async_submit_wait);
+   5 * SZ_1M)
+   cond_wake_up_nomb(_info->async_submit_wait);
 
if (async_cow->inode)
submit_compressed_extents(async_cow->inode, async_cow);
diff --git a/fs/btrfs/locking.c b/fs/btrfs/locking.c
index 621083f8932c..cce666cd104e

[PATCH 0/2] Cleanup waitqueue_active and barriers

2018-03-08 Thread David Sterba

This is depends on current misc-next with "btrfs: Relax memory barrier
in btrfs_tree_unlock".

David Sterba (2):
  btrfs: introduce conditional wakeup helpers
  btrfs: replace waitqueue_actvie with cond_wake_up

 fs/btrfs/compression.c   |  7 +--
 fs/btrfs/ctree.h | 23 +++
 fs/btrfs/delayed-inode.c |  9 +++--
 fs/btrfs/dev-replace.c   | 10 --
 fs/btrfs/extent-tree.c   |  7 +--
 fs/btrfs/inode.c |  9 +++--
 fs/btrfs/locking.c   | 34 +++---
 fs/btrfs/ordered-data.c  | 14 --
 fs/btrfs/transaction.c   |  7 +--
 fs/btrfs/tree-log.c  | 28 
 10 files changed, 59 insertions(+), 89 deletions(-)

-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/2] btrfs: introduce conditional wakeup helpers

2018-03-08 Thread David Sterba

Add convenience wrappers for the waitqueue management that involves
memory barriers to prevent deadlocks. The helpers will let us remove
barriers and the necessary comments in several places.

Signed-off-by: David Sterba 
---
 fs/btrfs/ctree.h | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index bf545e7552b8..c9827a1b676d 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3748,4 +3748,27 @@ static inline int btrfs_is_testing(struct btrfs_fs_info 
*fs_info)
 #endif
return 0;
 }
+
+static inline void cond_wake_up(struct wait_queue_head *wq)
+{
+   /*
+* This implies a full smp_mb barrier, see comments for
+* waitqueue_active why.
+*/
+   if (wq_has_sleeper(wq))
+   wake_up(wq);
+}
+
+static inline void cond_wake_up_nomb(struct wait_queue_head *wq)
+{
+   /*
+* Special case for conditional wakeup where the barrier required for
+* waitqueue_active is implied by some of the preceding code. Eg. one
+* of such atomic operations (atomic_dec_and_return, ...), or a
+* unlock/lock sequence, etc.
+*/
+   if (waitqueue_active(wq))
+   wake_up(wq);
+}
+
 #endif
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] btrfs: tree-checker: Avoid accidental stack VLA

2018-03-08 Thread David Sterba

On Wed, Mar 07, 2018 at 07:30:47PM -0800, Kees Cook wrote:
> In the quest to remove all stack VLAs from the kernel[1], this refactors
> the stack array size calculation to avoid using max(), which makes the
> compiler think the size isn't fixed.
> 
> [1] https://lkml.org/lkml/2018/3/7/621
> 
> Signed-off-by: Kees Cook 

Acked-by: David Sterba 

for whatever name you decide for the max macro.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 1/3] vsprintf: Remove accidental VLA usage

2018-03-08 Thread Thomas Gleixner

On Thu, 8 Mar 2018, Rasmus Villemoes wrote:
> On 2018-03-08 04:30, Kees Cook wrote:
> >  /**
> > + * SIMPLE_MAX - return maximum of two values without any type checking
> > + * @x: first value
> > + * @y: second value
> > + *
> > + * This should only be used in stack array sizes, since the type-checking
> > + * from max() confuses the compiler into thinking a VLA is being used.
> > + */
> > +#define SIMPLE_MAX(x, y)   ((size_t)(x) > (size_t)(y) ? (size_t)(x) \
> > +  : (size_t)(y))
> 
> This will be abused at some point, leading to the usual double
> evaluation etc. etc. problems. The name is also too long (and in general
> we should avoid adjectives like "simple", "safe", people reading the
> code won't know what is simple or safe about it). I think this should work
> 
> #define MAX(x, y) (__builtin_choose_expr((x) > (y), x, y))
> 
> That forces (x)>(y) to be a compile-time constant, so x and y must also
> be; hence there can be no side effects. The MIN version of this could
> replace the custom __const_min in fs/file.c, and probably other places
> as well.
> 
> I tested that this at least works in the vsprintf case, -Wvla no longer
> complains. fs/file.c also compiles with the MIN version of this.
> 
> I suppose MIN and MAX will collide with other uses in the tree. Hmm.

Make it CONST_MAX() or something like that which makes it entirely clear.

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: dmesg flooded with "Very big device. Trying to use READ CAPACITY(16)" with 8TB HDDs

2018-03-08 Thread Menion

Actually this path can be taken in few occurrency

1) device probe, only when the device is plugged or detected the first time
2) revalidate_disk fops of block device

Is it possible that BTRFS every 5 minutes call the revalidate_disk?

2018-03-08 11:16 GMT+01:00 Menion :
> Hi again
> I had a discussion in linux-scsi about this topic
> My understanding is that it is true that the read_capacity is opaque
> to the filesystem but it is also true that the scsi layer export two
> specific read_capacity ops, the read10 and read16 and the upper layers
> shall select the proper one, based on the response of the other.
> In the log, I see that this read_capacity_10 is called every 5
> minutes, and it fallback to read_capacity_16, since who is doing it
> endup in calling sd_read_capacity in scsi layer, rather then pickup
> read10 or read16 directly
> I am not telling that BTRFS is doing it for sure, but I have ruled out
> smartd, so based on the periodicity of 5 minutes, can you think about
> anything in the BTRFS internals that can be responsible of this?
>
> 2018-03-02 17:19 GMT+01:00 Menion :
>> Thanks
>> My point was to understand if this action was taken by BTRFS or
>> automously by scsi.
>> From your word it seems clear to me that this should go in
>> KERNEL_DEBUG level, instead of KERNEL_NOTICE
>> Bye
>>
>> 2018-03-02 16:18 GMT+01:00 David Sterba :
>>> On Fri, Mar 02, 2018 at 12:37:49PM +0100, Menion wrote:
 Is it really a no problem? I mean, for some reason BTRFS is
 continuously read the HDD capacity in an array, that does not seem to
 be really correct
>>>
>>> The message comes from SCSI:
>>> https://elixir.bootlin.com/linux/latest/source/drivers/scsi/sd.c#L2508
>>>
>>> Reading drive capacity could be totally opaque for the filesystem, eg.
>>> when the scsi layer compares the requested block address with the device
>>> size.
>>>
>>> The sizes of blockdevices is obtained from the i_size member of the
>>> inode representing the block device, so there's no direct read by btrfs.
>>> You'd have better luck reporting that to scsi or block layer
>>> mailinglists.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] fstests: test regression of -EEXIST on creating new file after log replay

2018-03-08 Thread Filipe Manana

On Wed, Mar 7, 2018 at 11:59 PM, Liu Bo  wrote:
> The regression is introduced to btrfs in linux v4.4 and it refuses to create
> new files after log replay by returning -EEXIST.
>
> Although the problem is on btrfs only, there is no btrfs stuff in terms of
> test, so this makes it generic.
>
> The kernel fix is
>   Btrfs: fix unexpected -EEXIST when creating new inode
>
> Signed-off-by: Liu Bo 
> ---
>  tests/generic/481 | 75 
> +++
>  tests/generic/481.out |  5 
>  tests/generic/group   |  1 +
>  3 files changed, 81 insertions(+)
>  create mode 100755 tests/generic/481
>  create mode 100644 tests/generic/481.out
>
> diff --git a/tests/generic/481 b/tests/generic/481
> new file mode 100755
> index 000..8d8bb2b
> --- /dev/null
> +++ b/tests/generic/481
> @@ -0,0 +1,75 @@
> +#! /bin/bash
> +# FSQA Test No. 481
> +#
> +# Reproduce a regression of btrfs that leads to -EEXIST on creating new files
> +# after log replay.
> +#
> +# The kernel fix is
> +#   Btrfs: fix unexpected -EEXIST when creating new inode
> +#
> +#---
> +#
> +# Copyright (C) 2018 Oracle. All Rights Reserved.
> +# Author: Bo Liu 
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#---
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +tmp=/tmp/$$
> +status=1   # failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> +   _cleanup_flakey
> +   cd /
> +   rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +. ./common/dmflakey
> +
> +# real QA test starts here
> +_supported_fs generic
> +_supported_os Linux
> +_require_scratch
> +_require_dm_target flakey
> +
> +rm -f $seqres.full
> +
> +_scratch_mkfs >>$seqres.full 2>&1
> +_init_flakey
> +_mount_flakey
> +
> +# create a file and keep it in write ahead log
> +$XFS_IO_PROG -f -c "pwrite 0 4k" -c "fsync" $SCRATCH_MNT/foo | _filter_xfs_io
> +
> +# fail this filesystem so that remount can replay the write ahead log
> +_flakey_drop_and_remount
> +
> +# see if we can create a new file successfully
> +touch $SCRATCH_MNT/bar
> +
> +_unmount_flakey
> +
> +echo "Silence is golden"
> +
> +status=0
> +exit
> diff --git a/tests/generic/481.out b/tests/generic/481.out
> new file mode 100644
> index 000..66a6345
> --- /dev/null
> +++ b/tests/generic/481.out
> @@ -0,0 +1,5 @@
> +QA output created by 481
> +wrote 4096/4096 bytes at offset 0
> +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +touch: cannot touch '/mnt/scratch/bar': File exists

I missed this before, but this error from 'touch' is not supposed to happen.
So this golden output is for the buggy, unpatched btrfs.

> +Silence is golden
> diff --git a/tests/generic/group b/tests/generic/group
> index ea2056b..05f60f2 100644
> --- a/tests/generic/group
> +++ b/tests/generic/group
> @@ -483,3 +483,4 @@
>  478 auto quick
>  479 auto quick metadata
>  480 auto quick metadata
> +481 auto quick
> --
> 2.5.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Per subvolume "RAID" level?!

2018-03-08 Thread waxhead

Just out of curiosity, are there any work going on for enabling 
different "RAID" levels per subvolume?!


And out of even more curiosity how is this planned to be handled with 
btrfs balance?! When per subvolume "RAID" levels are good to go, how 
would you then run the balance filters to convert / leave alone certain 
parts of the filesystem?!

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: dmesg flooded with "Very big device. Trying to use READ CAPACITY(16)" with 8TB HDDs

2018-03-08 Thread Menion

Hi again
I had a discussion in linux-scsi about this topic
My understanding is that it is true that the read_capacity is opaque
to the filesystem but it is also true that the scsi layer export two
specific read_capacity ops, the read10 and read16 and the upper layers
shall select the proper one, based on the response of the other.
In the log, I see that this read_capacity_10 is called every 5
minutes, and it fallback to read_capacity_16, since who is doing it
endup in calling sd_read_capacity in scsi layer, rather then pickup
read10 or read16 directly
I am not telling that BTRFS is doing it for sure, but I have ruled out
smartd, so based on the periodicity of 5 minutes, can you think about
anything in the BTRFS internals that can be responsible of this?

2018-03-02 17:19 GMT+01:00 Menion :
> Thanks
> My point was to understand if this action was taken by BTRFS or
> automously by scsi.
> From your word it seems clear to me that this should go in
> KERNEL_DEBUG level, instead of KERNEL_NOTICE
> Bye
>
> 2018-03-02 16:18 GMT+01:00 David Sterba :
>> On Fri, Mar 02, 2018 at 12:37:49PM +0100, Menion wrote:
>>> Is it really a no problem? I mean, for some reason BTRFS is
>>> continuously read the HDD capacity in an array, that does not seem to
>>> be really correct
>>
>> The message comes from SCSI:
>> https://elixir.bootlin.com/linux/latest/source/drivers/scsi/sd.c#L2508
>>
>> Reading drive capacity could be totally opaque for the filesystem, eg.
>> when the scsi layer compares the requested block address with the device
>> size.
>>
>> The sizes of blockdevices is obtained from the i_size member of the
>> inode representing the block device, so there's no direct read by btrfs.
>> You'd have better luck reporting that to scsi or block layer
>> mailinglists.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] fstests: test regression of -EEXIST on creating new file after log replay

2018-03-08 Thread Filipe Manana

On Wed, Mar 7, 2018 at 11:59 PM, Liu Bo  wrote:
> The regression is introduced to btrfs in linux v4.4 and it refuses to create
> new files after log replay by returning -EEXIST.
>
> Although the problem is on btrfs only, there is no btrfs stuff in terms of
> test, so this makes it generic.
>
> The kernel fix is
>   Btrfs: fix unexpected -EEXIST when creating new inode
>
> Signed-off-by: Liu Bo 
> ---
>  tests/generic/481 | 75 
> +++
>  tests/generic/481.out |  5 
>  tests/generic/group   |  1 +
>  3 files changed, 81 insertions(+)
>  create mode 100755 tests/generic/481
>  create mode 100644 tests/generic/481.out
>
> diff --git a/tests/generic/481 b/tests/generic/481
> new file mode 100755
> index 000..8d8bb2b
> --- /dev/null
> +++ b/tests/generic/481
> @@ -0,0 +1,75 @@
> +#! /bin/bash
> +# FSQA Test No. 481
> +#
> +# Reproduce a regression of btrfs that leads to -EEXIST on creating new files
> +# after log replay.
> +#
> +# The kernel fix is
> +#   Btrfs: fix unexpected -EEXIST when creating new inode
> +#
> +#---
> +#
> +# Copyright (C) 2018 Oracle. All Rights Reserved.
> +# Author: Bo Liu 
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#---
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +tmp=/tmp/$$
> +status=1   # failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> +   _cleanup_flakey
> +   cd /
> +   rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +. ./common/dmflakey
> +
> +# real QA test starts here
> +_supported_fs generic
> +_supported_os Linux
> +_require_scratch
> +_require_dm_target flakey
> +
> +rm -f $seqres.full
> +
> +_scratch_mkfs >>$seqres.full 2>&1
> +_init_flakey
> +_mount_flakey
> +
> +# create a file and keep it in write ahead log
> +$XFS_IO_PROG -f -c "pwrite 0 4k" -c "fsync" $SCRATCH_MNT/foo | _filter_xfs_io

What's the point of writing data to the file if we later don't check
that the data is there (and the file size is 4K)?
To trigger the bug in btrfs, it was also not needed to write data to
the file in the first place.

Other then that looks good.
Thanks for doing this.

Reviewed-by: Filipe Manana 

> +
> +# fail this filesystem so that remount can replay the write ahead log
> +_flakey_drop_and_remount
> +
> +# see if we can create a new file successfully
> +touch $SCRATCH_MNT/bar
> +
> +_unmount_flakey
> +
> +echo "Silence is golden"
> +
> +status=0
> +exit
> diff --git a/tests/generic/481.out b/tests/generic/481.out
> new file mode 100644
> index 000..66a6345
> --- /dev/null
> +++ b/tests/generic/481.out
> @@ -0,0 +1,5 @@
> +QA output created by 481
> +wrote 4096/4096 bytes at offset 0
> +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +touch: cannot touch '/mnt/scratch/bar': File exists
> +Silence is golden
> diff --git a/tests/generic/group b/tests/generic/group
> index ea2056b..05f60f2 100644
> --- a/tests/generic/group
> +++ b/tests/generic/group
> @@ -483,3 +483,4 @@
>  478 auto quick
>  479 auto quick metadata
>  480 auto quick metadata
> +481 auto quick
> --
> 2.5.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] btrfs: Add unprivileged version of ino_lookup ioctl

2018-03-08 Thread Nikolay Borisov



On  6.03.2018 10:31, Misono, Tomohiro wrote:
> Add unprivileged version of ino_lookup ioctl (BTRFS_IOC_INO_LOOKUP_USER)
> to allow normal users to call "btrfs subvololume list/show" etc. in
> combination with BTRFS_IOC_GET_SUBVOL_INFO.
> 
> This can be used like BTRFS_IOC_INO_LOOKUP but the argument is
> different because it also returns the name of bottom subvolume,
> which is ommited by BTRFS_IOC_GET_SUBVOL_INFO ioctl.
> Also, this must be called with fd which exists under the tree of
> args->treeid to prevent for user to search any tree.
> 
> The main differences from original ino_lookup ioctl are:
>   1. Read permission will be checked using inode_permission()
>  during path construction. -EACCES will be returned in case
>  of failure.
>   2. Path construction will be stopped at the inode number which
>  corresponds to the fd with which this ioctl is called. If
>  constructed path does not exist under fd's inode, -EACCES
>  will be returned.
>   3. The name of bottom subvolume is also searched and filled.
> 
> Note that the maximum length of path is shorter 272 bytes than

Did you mean 255? Since BTRFS_VOL_NAME_MAX is defined as 255?

> ino_lookup ioctl because of space of subvolume's id and name.
> 
> Signed-off-by: Tomohiro Misono 
> ---
>  fs/btrfs/ioctl.c   | 218 
> +
>  include/uapi/linux/btrfs.h |  16 
>  2 files changed, 234 insertions(+)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 1dba309dce31..ac23da98b7e7 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -2425,6 +2425,179 @@ static noinline int btrfs_search_path_in_tree(struct 
> btrfs_fs_info *info,
>   return ret;
>  }
>  
> +static noinline int btrfs_search_path_in_tree_user(struct btrfs_fs_info 
> *info,
> + struct super_block *sb,
> + struct btrfs_key upper_limit,
> + struct btrfs_ioctl_ino_lookup_user_args *args)
> +{
> + struct btrfs_root *root;
> + struct btrfs_key key, key2;
> + char *ptr;
> + int ret = -1;
> + int slot;
> + int len;
> + int total_len = 0;
> + u64 dirid = args->dirid;
> + struct btrfs_inode_ref *iref;
> + struct btrfs_root_ref rref;
> +
> + unsigned long item_off;
> + unsigned long item_len;
> +
> + struct extent_buffer *l;
> + struct btrfs_path *path = NULL;
> +
> + struct inode *inode;
> + int *new = 0;
> +
> + path = btrfs_alloc_path();
> + if (!path)
> + return -ENOMEM;
> +
> + if (dirid == upper_limit.objectid)
> + /*
> +  * If the bottom subvolume exists directly under upper limit,
> +  * there is no need to construct the path and just get the
> +  * subvolume's name
> +  */
> + goto get_subvol_name;

Eliminate the label and factor out the code in a new function or just
put in the body of the if.

> + if (dirid == BTRFS_FIRST_FREE_OBJECTID)
> + /* The subvolume does not exist under upper_limit */
> + goto access_err;

just set ret to -EACCESS and goto out. And eliminate access_err label
altogether. Furthermore, shouldn't this check really ether:

a) Be the first one in this function so that we return ASAP (i.e. even
before calling btrfs_alloc_path)

or

b) Put in the ioctl function itself. Currently for the
btrfs_ioctl_ino_lookup case it's duplicated in both that function and
btrfs_search_path_in_tree

> +
> + ptr = >path[BTRFS_INO_LOOKUP_PATH_MAX - 1];
> +
> + key.objectid = args->treeid;
> + key.type = BTRFS_ROOT_ITEM_KEY;
> + key.offset = (u64)-1;
> + root = btrfs_read_fs_root_no_name(info, );
> + if (IS_ERR(root)) {
> + btrfs_err(info, "could not find root %llu", args->treeid);
> + ret = -ENOENT;
> + goto out;
> + }
> +
> + key.objectid = dirid;
> + key.type = BTRFS_INODE_REF_KEY;
> + key.offset = (u64)-1;
> +
> + while (1) {
> + ret = btrfs_search_slot(NULL, root, , path, 0, 0);
> + if (ret < 0)
> + goto out;
> + else if (ret > 0) {
> + ret = btrfs_previous_item(root, path, dirid,
> +   BTRFS_INODE_REF_KEY);
> + if (ret < 0)
> + goto out;
> + else if (ret > 0) {
> + ret = -ENOENT;
> + goto out;
> + }
> + }
> +
> + l = path->nodes[0];
> + slot = path->slots[0];
> + btrfs_item_key_to_cpu(l, , slot);
> +
> + iref = btrfs_item_ptr(l, slot, struct btrfs_inode_ref);
> + len = btrfs_inode_ref_name_len(l, iref);
> + ptr -= len + 1;
> + total_len += len + 1;
> +

[PATCH v3] fstests: btrfs/146: make sure hit all stripes in the case of compression

2018-03-08 Thread Lu Fengqi

In the case of compression, each 128K input data chunk will be compressed
to 4K (because of the characters written are duplicate). Therefore we have
to write (128K * 16) to make sure every stripe can be hit.

Signed-off-by: Lu Fengqi 
---
V3: Write ($number_of_devices * 2048)K data unconditionally.
V2: Modify the regular expression to ensure that it matches various
compress mount options.

 tests/btrfs/146 | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/tests/btrfs/146 b/tests/btrfs/146
index 7071c128ca0a..246baba1427a 100755
--- a/tests/btrfs/146
+++ b/tests/btrfs/146
@@ -74,9 +74,12 @@ _scratch_pool_mkfs "-d raid0 -m raid1" > $seqres.full 2>&1
 _scratch_mount
 
 # How much do we need to write? We need to hit all of the stripes. btrfs uses
-# a fixed 64k stripesize, so write enough to hit each one
+# a fixed 64k stripesize, so write enough to hit each one. In the case of
+# compression, each 128K input data chunk will be compressed to 4K (because of
+# the characters written are duplicate). Therefore we have to write (128K * 16)
+# to make sure every stripe can be hit.
 number_of_devices=`echo $SCRATCH_DEV_POOL | wc -w`
-write_kb=$(($number_of_devices * 64))
+write_kb=$(($number_of_devices * 2048))
 _require_fs_space $SCRATCH_MNT $write_kb
 
 testfile=$SCRATCH_MNT/fsync-err-test
-- 
2.16.2



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/2] btrfs: Add unprivileged subvolume search ioctl

2018-03-08 Thread Nikolay Borisov



On  6.03.2018 10:30, Misono, Tomohiro wrote:
> Add new unprivileged ioctl (BTRFS_IOC_GET_SUBVOL_INFO) which searches
> and returns only subvolume related item (ROOT_ITEM/ROOT_BACKREF/ROOT_REF)
> from root tree. The arguments of this ioctl are the same as treesearch
> ioctl and can be used like treesearch ioctl.
> 
> Since treesearch ioctl requires root privilege, this ioctl is needed
> to allow normal users to call "btrfs subvolume list/show" etc.
> 
> Note that the subvolume name in ROOT_BACKREF/ROOT_REF will not be copied to
> prevent potential name leak. The name can be obtained by calling
> user version's ino_lookup ioctl (BTRFS_IOC_INO_LOOKUP_USER).
> 
> Signed-off-by: Tomohiro Misono 
> ---
>  fs/btrfs/ioctl.c   | 254 
> +
>  include/uapi/linux/btrfs.h |   2 +
>  2 files changed, 256 insertions(+)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 111ee282b777..1dba309dce31 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -1921,6 +1921,28 @@ static noinline int key_in_sk(struct btrfs_key *key,
>   return 1;
>  }
>  
> +/*
> + * check if key is in sk and subvolume related range
> + */
> +static noinline int key_in_sk_and_subvol(struct btrfs_key *key,
> +   struct btrfs_ioctl_search_key *sk)
> +{
> + int ret;
> +
> + ret = key_in_sk(key, sk);
> + if (!ret)
> + return 0;
> +
> + if ((key->objectid == BTRFS_FS_TREE_OBJECTID ||
> + (key->objectid >= BTRFS_FIRST_FREE_OBJECTID &&
> +  key->objectid <= BTRFS_LAST_FREE_OBJECTID)) &&

Why do you need the FIRST_FREE/LAST_FREE object id checks? The range
described by those are ordinary files. Since you are only interested in
subvolume data then you should omit those, no?

> + key->type >= BTRFS_ROOT_ITEM_KEY &&
> + key->type <= BTRFS_ROOT_BACKREF_KEY)
I think this check should really be:

if (key->objectid == BTRFS_FS_TREE_OBJECTID || key->objectid ==
BTRFS_ROOT_ITEM_KEY || key->type == BTRFS_ROOT_BACKREF_KEY

> + return 1;
> +
> + return 0;
> +}
> +
>  static noinline int copy_to_sk(struct btrfs_path *path,
>  struct btrfs_key *key,
>  struct btrfs_ioctl_search_key *sk,
> @@ -2045,6 +2067,142 @@ static noinline int copy_to_sk(struct btrfs_path 
> *path,
>   return ret;
>  }
>  
> +/*
> + * Basically the same as copy_to_sk() but restricts the copied item
> + * within subvolume related one using key_in_sk_and_subvol().
> + * Also, the name of subvolume will be omitted from
> + * ROOT_BACKREF/ROOT_REF item.
> + */
> +static noinline int copy_subvol_item(struct btrfs_path *path,
> +struct btrfs_key *key,
> +struct btrfs_ioctl_search_key *sk,
> +size_t *buf_size,
> +char __user *ubuf,
> +unsigned long *sk_offset,
> +int *num_found)
> +{
> + u64 found_transid;
> + struct extent_buffer *leaf;
> + struct btrfs_ioctl_search_header sh;
> + struct btrfs_key test;
> + unsigned long item_off;
> + unsigned long item_len;
> + int nritems;
> + int i;
> + int slot;
> + int ret = 0;
> +
> + leaf = path->nodes[0];
> + slot = path->slots[0];
> + nritems = btrfs_header_nritems(leaf);
> +
> + if (btrfs_header_generation(leaf) > sk->max_transid) {
> + i = nritems;
> + goto advance_key;

I don't see why you need a goto label at all. Just put the necessary
code inside the if and return directly.

> + }
> + found_transid = btrfs_header_generation(leaf);
> +
> + for (i = slot; i < nritems; i++) {
> + item_off = btrfs_item_ptr_offset(leaf, i);
> + item_len = btrfs_item_size_nr(leaf, i);
> +
> + btrfs_item_key_to_cpu(leaf, key, i);
> + if (!key_in_sk_and_subvol(key, sk))
> + continue;
> +
> + /* will not copy the name of subvolume */
> + if (key->type == BTRFS_ROOT_BACKREF_KEY ||
> + key->type == BTRFS_ROOT_REF_KEY)
> + item_len = sizeof(struct btrfs_root_ref);
> +
> + if (sizeof(sh) + item_len > *buf_size) {
> + if (*num_found) {
> + ret = 1;
> + goto out;
 This can be a simple return 1;


> + }
> +
> + /*
> +  * return one empty item back for v1, which does not
> +  * handle -EOVERFLOW
> +  */
> +
> + *buf_size = sizeof(sh) + item_len;
> + item_len = 0;
> + ret = -EOVERFLOW;
> + }
> +
> + if (sizeof(sh) + item_len + *sk_offset > *buf_size) {
> +

Re: [PATCH v2 1/3] vsprintf: Remove accidental VLA usage

2018-03-08 Thread Rasmus Villemoes

On 2018-03-08 04:30, Kees Cook wrote:
> In the quest to remove all stack VLAs from the kernel[1], this introduces
> a new "simple max" macro, and changes the "sym" array size calculation to
> use it. The value is actually a fixed size, but since the max() macro uses
> some extensive tricks for safety, it ends up looking like a variable size
> to the compiler.
> 
> [1] https://lkml.org/lkml/2018/3/7/621
> 
> Signed-off-by: Kees Cook 
> ---
>  include/linux/kernel.h | 11 +++
>  lib/vsprintf.c |  4 ++--
>  2 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 3fd291503576..1da554e9997f 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -820,6 +820,17 @@ static inline void ftrace_dump(enum ftrace_dump_mode 
> oops_dump_mode) { }
> x, y)
>  
>  /**
> + * SIMPLE_MAX - return maximum of two values without any type checking
> + * @x: first value
> + * @y: second value
> + *
> + * This should only be used in stack array sizes, since the type-checking
> + * from max() confuses the compiler into thinking a VLA is being used.
> + */
> +#define SIMPLE_MAX(x, y) ((size_t)(x) > (size_t)(y) ? (size_t)(x) \
> +: (size_t)(y))

This will be abused at some point, leading to the usual double
evaluation etc. etc. problems. The name is also too long (and in general
we should avoid adjectives like "simple", "safe", people reading the
code won't know what is simple or safe about it). I think this should work

#define MAX(x, y) (__builtin_choose_expr((x) > (y), x, y))

That forces (x)>(y) to be a compile-time constant, so x and y must also
be; hence there can be no side effects. The MIN version of this could
replace the custom __const_min in fs/file.c, and probably other places
as well.

I tested that this at least works in the vsprintf case, -Wvla no longer
complains. fs/file.c also compiles with the MIN version of this.

I suppose MIN and MAX will collide with other uses in the tree. Hmm.

Rasmus
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 1/3] btrfs-progs: check/lowmem: Fix the incorrect error message of check_extent_data_item

2018-03-08 Thread Qu Wenruo



On 2018年02月28日 18:13, Lu Fengqi wrote:
> Instead of the disk_bytenr and disk_num_bytes of the extent_item which the
> file extent references, we should output the objectid and offset of the
> file extent. And the leaf may be shared by the file trees, we should print
> the objectid of the root and the owner of the leaf.
> 
> Fixes: b0d360b541f0 ("btrfs-progs: check: introduce function to check data 
> backref in extent tree")
> Signed-off-by: Lu Fengqi 
> ---
> V2: Output the objectid of the root and the owner of the leaf.
> 
>  check/mode-lowmem.c | 11 ++-
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
> index 62bcf3d2e126..f37b1b2c1571 100644
> --- a/check/mode-lowmem.c
> +++ b/check/mode-lowmem.c
> @@ -2631,9 +2631,9 @@ static int check_extent_data_item(struct btrfs_root 
> *root,
>  
>   if (!(extent_flags & BTRFS_EXTENT_FLAG_DATA)) {
>   error(
> - "extent[%llu %llu] backref type mismatch, wanted bit: %llx",
> - disk_bytenr, disk_num_bytes,
> - BTRFS_EXTENT_FLAG_DATA);
> +"file extent[%llu %llu] root %llu owner %llu backref type mismatch, wanted 
> bit: %llx",
> + fi_key.objectid, fi_key.offset, root->objectid, owner,

Indeed this is much easier to identify the problem.

Reviewed-by: Qu Wenruo 

Thanks,
Qu

> + BTRFS_EXTENT_FLAG_DATA);
>   err |= BACKREF_MISMATCH;
>   }
>  
> @@ -2722,8 +2722,9 @@ out:
>   err |= BACKREF_MISSING;
>   btrfs_release_path();
>   if (err & BACKREF_MISSING) {
> - error("data extent[%llu %llu] backref lost",
> -   disk_bytenr, disk_num_bytes);
> + error(
> + "file extent[%llu %llu] root %llu owner %llu backref lost",
> + fi_key.objectid, fi_key.offset, root->objectid, owner);
>   }
>   return err;
>  }
> 



signature.asc
Description: OpenPGP digital signature

Re: [PATCH 3/3] btrfs-progs: fsck-tests: Introduce test case with keyed data backref with the extent offset

2018-03-08 Thread Qu Wenruo



On 2018年02月28日 18:13, Lu Fengqi wrote:
> Add the testcase for false alert of data extent backref lost with the
> extent offset.
> 
> The image can be reproduced by the following commands:
> --
> dev=~/test.img
> mnt=/mnt/btrfs
> 
> umount $mnt &> /dev/null
> fallocate -l 128M $dev
> 
> mkfs.btrfs $dev
> mount $dev $mnt
> 
> for i in `seq 1 10`; do
>   xfs_io -f -c "pwrite 0 2K" $mnt/file$i
> done
> 
> xfs_io -f -c "falloc 0 64K" $mnt/file11
> 
> for i in `seq 1 32`; do
>   xfs_io -f -c "reflink $mnt/file11 0 $(($i * 64))K 64K" $mnt/file11
> done
> 
> xfs_io -f -c "reflink $mnt/file11 32K $((33 * 64))K 32K" $mnt/file11
> 
> btrfs subvolume snapshot $mnt $mnt/snap1
> 
> umount $mnt
> btrfs-image -c9 $dev extent_data_ref.img
> --
> 

The image reproducer looks good.

Reviewed-by: Qu Wenruo 

Thanks,
Qu


> Signed-off-by: Lu Fengqi 
> ---
>  .../fsck-tests/020-extent-ref-cases/extent_data_ref.img  | Bin 0 -> 6144 
> bytes
>  1 file changed, 0 insertions(+), 0 deletions(-)
>  create mode 100644 tests/fsck-tests/020-extent-ref-cases/extent_data_ref.img
> 
> diff --git a/tests/fsck-tests/020-extent-ref-cases/extent_data_ref.img 
> b/tests/fsck-tests/020-extent-ref-cases/extent_data_ref.img
> new file mode 100644
> index 
> ..3ab2396ba9c810d98f16a5efcf7fe23ee4b12ab5
> GIT binary patch
> literal 6144
> zcmeHKhgXx!wg;pLB1&(OB2^R zr3M7)NG}0thfs0@@80*$I`{nncdhGQ-`exrd*<8U?B6%D_DrmOs>hswC7R@)a*^1Q
> zoukK(t9nD9`o4x$Zyr51!+&3ungAIaVYi!m$6rv1jhq;d6iHImVvj2b5>V
> zZ~4u@ZwCGY81Nfk`ym*TMS8{0BlxQ)8N6Q>rPiyN%7Kg|ljJ!X25e4SR
> zqg7JI60Rvr4w?fol%`OCtCcBcV<_$gA%9RVL#wS;7f#<1L22?a92Cb2xOK=~
> zt zw}B)|I!j7tJIsO}mE_?C(}7{eH9eLGlP^k{#vwDU3*RUu55ql{i19j@Nl3c!7G!?o
>

Re: [PATCH 2/3] btrfs-progs: check/lowmem: Fix false alert of data extent backref lost for snapshot

2018-03-08 Thread Qu Wenruo



On 2018年02月28日 18:13, Lu Fengqi wrote:
> Btrfs lowmem check reports the following false alert:
> --
> ERROR: file extent[267 2162688] root 256 owner 5 backref lost
> --
> 
> The file extent is in the leaf which is shared by file tree 256 and fs
> tree.
> --
> leaf 30605312 items 46 free space 4353 generation 7 owner 5
> ..
> item 45 key (267 EXTENT_DATA 2162688) itemoff 5503 itemsize 53
> generation 7 type 2 (prealloc)
> prealloc data disk byte 13631488 nr 65536
> prealloc data offset 32768 nr 32768
> --
> 
> And there is the corresponding extent_data_ref item in the extent tree.
> --
> item 1 key (13631488 EXTENT_DATA_REF 1007496934287921081) itemoff 
> 15274 itemsize 28
> extent data backref root 5 objectid 267 offset 2129920 count 1
> --
> 
> The offset of EXTENT_DATA_REF which is the hash of the owner root objectid,
> the inode number and the calculated offset (file offset - extent offset).
> 
> What caused the false alert is the code mix up the owner root objectid and
> the file tree objectid.
> 
> Fixes: b0d360b541f0 ("btrfs-progs: check: introduce function to check data 
> backref in extent tree")
> Signed-off-by: Lu Fengqi 

Looks good.

Reviewed-by: Qu Wenruo 

Thanks,
Qu

> ---
>  check/mode-lowmem.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
> index f37b1b2c1571..6f1ea8db341d 100644
> --- a/check/mode-lowmem.c
> +++ b/check/mode-lowmem.c
> @@ -2689,8 +2689,8 @@ static int check_extent_data_item(struct btrfs_root 
> *root,
>   /* Didn't find inlined data backref, try EXTENT_DATA_REF_KEY */
>   dbref_key.objectid = btrfs_file_extent_disk_bytenr(eb, fi);
>   dbref_key.type = BTRFS_EXTENT_DATA_REF_KEY;
> - dbref_key.offset = hash_extent_data_ref(root->objectid,
> - fi_key.objectid, fi_key.offset - offset);
> + dbref_key.offset = hash_extent_data_ref(owner, fi_key.objectid,
> + fi_key.offset - offset);
>  
>   ret = btrfs_search_slot(NULL, root->fs_info->extent_root,
>   _key, , 0, 0);
> 



signature.asc
Description: OpenPGP digital signature

Re: [PATCH v2] fstests: btrfs/146: make sure hit all stripes in the case of compression

2018-03-08 Thread Lu Fengqi

On Thu, Mar 08, 2018 at 02:10:59PM +0800, Eryu Guan wrote:
>On Thu, Mar 08, 2018 at 01:56:45PM +0800, Lu Fengqi wrote:
>> In the case of compression, each 128K input data chunk will be compressed
>> to 4K (because of the characters written are duplicate). Therefore we have
>> to write (128K * 16) to make sure every stripe can be hit.
>> 
>> Signed-off-by: Lu Fengqi 
>> ---
>> 
>> V2: Modify the regular expression to ensure that it matches various
>> compress mount options.
>> 
>>  tests/btrfs/146 | 11 +--
>>  1 file changed, 9 insertions(+), 2 deletions(-)
>> 
>> diff --git a/tests/btrfs/146 b/tests/btrfs/146
>> index 7071c128ca0a..a51eda68eaf3 100755
>> --- a/tests/btrfs/146
>> +++ b/tests/btrfs/146
>> @@ -74,9 +74,16 @@ _scratch_pool_mkfs "-d raid0 -m raid1" > $seqres.full 2>&1
>>  _scratch_mount
>>  
>>  # How much do we need to write? We need to hit all of the stripes. btrfs 
>> uses
>> -# a fixed 64k stripesize, so write enough to hit each one
>> +# a fixed 64k stripesize, so write enough to hit each one. In the case of
>> +# compression, each 128K input data chunk will be compressed to 4K (because 
>> of
>> +# the characters written are duplicate). Therefore we have to write (128K * 
>> 16)
>> +# to make sure every stripe can be hit.
>>  number_of_devices=`echo $SCRATCH_DEV_POOL | wc -w`
>> -write_kb=$(($number_of_devices * 64))
>> +if ! echo $MOUNT_OPTIONS | grep -qoP 'compress(-force)?(=(?!no)|,|$)'; then
>
>I'm wondering if we could just write ($number_of_devices * 2048)K data
>unconditionally, so we could get rid of this case switch and the fancy
>perl style regexp?

That is exactly what I want. I'm just not sure if anyone will mind that the
amount of data written is not exactly hitting all stripes. Since you think
it does not matter, I'm glad to delete it.

>
>Thanks,
>Eryu
>
>> +write_kb=$(($number_of_devices * 64))
>> +else
>> +write_kb=$(($number_of_devices * 2048))
>> +fi
>>  _require_fs_space $SCRATCH_MNT $write_kb
>>  
>>  testfile=$SCRATCH_MNT/fsync-err-test
>> -- 
>> 2.16.2
>> 
>> 
>> 
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>

-- 
Thanks,
Lu


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

91 matches

Mail list logo