Re: [PATCH 2/3] btrfs: return required error from btrfs_check_super_csum

2018-03-20 Thread Anand Jain



On 03/21/2018 10:37 AM, Anand Jain wrote:



@@ -418,8 +418,10 @@ static int btrfs_check_super_csum(struct 
btrfs_fs_info *fs_info,

    crc, BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE);
  btrfs_csum_final(crc, result);
-    if (memcmp(raw_disk_sb, result, sizeof(result)))
-    return 1;
+    if (memcmp(raw_disk_sb, result, sizeof(result))) {
+    btrfs_err(fs_info, "superblock checksum mismatch");
+    return -EINVAL;


Don't we want EUCLEAN here, since an error in the checksum indicates
corruption?


  With the v2 patch sentout here is the error str that is seen on the cli.

   mount: mount /dev/sdb on /btrfs failed: Structure needs cleaning


 And since we return -EINVAL for the error csum_type not found we get..

 mount: wrong fs type, bad option, bad superblock on /dev/sdb,
   missing codepage or helper program, or other error

   In some cases useful info is found in syslog - try
   dmesg | tail or so.

 Looks like mount syscall interprets -EINVAL error.


I am ok. However would have much better if it says checksum error.


 Also we may need an error code to say incompatible disk format/version.


Do you/anyone suggest I should update the errno.h?


Thanks, Anand



Thanks, Anand
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] btrfs: return required error from btrfs_check_super_csum

2018-03-20 Thread Anand Jain




@@ -418,8 +418,10 @@ static int btrfs_check_super_csum(struct btrfs_fs_info 
*fs_info,
  crc, BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE);
btrfs_csum_final(crc, result);
  
-	if (memcmp(raw_disk_sb, result, sizeof(result)))

-   return 1;
+   if (memcmp(raw_disk_sb, result, sizeof(result))) {
+   btrfs_err(fs_info, "superblock checksum mismatch");
+   return -EINVAL;


Don't we want EUCLEAN here, since an error in the checksum indicates
corruption?


 With the v2 patch sentout here is the error str that is seen on the cli.

  mount: mount /dev/sdb on /btrfs failed: Structure needs cleaning

I am ok. However would have much better if it says checksum error.

Do you/anyone suggest I should update the errno.h?

Thanks, Anand
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Crashes running btrfs scrub

2018-03-20 Thread Qu Wenruo


On 2018年03月21日 01:44, Mike Stevens wrote:
> 
>>> 30 devices is really not that much, heck you get 90 disks top load JBOD
>>> storage chassis these days and BTRFS does sound like an attractive choice
>>> for things like that.
> 
>> So Mike's case is, that both metadata and data are configured as
>> raid6, and the operations, balance and scrub, that he tried, need to
>> set the existing block group as readonly (in order to avoid any
>> further changes being applied during operations are running), then we
>> got into the place where another system chunk is needed.
> 
>> However, I think it'd be better to have some warnings about this when
>> doing a) mkfs.btrfs -mraid6, b) btrfs device add.
> 
>> David, any idea?
> 
> I'll certainly vote for a warning, I would have set this up differently had I 
> been aware.  
> 
> My filesystem check seems to have returned successfully:
> 
> [root@auswscs9903] ~ # btrfs check --readonly /dev/sdb
> Checking filesystem on /dev/sdb
> UUID: 77afc2bb-f7a8-4ce9-9047-c031f7571150
> checking extents
> checking free space cache
> checking fs roots
> checking csums
> checking root refs
> found 97926270238720 bytes used err is 0
> total csum bytes: 95395030288
> total tree bytes: 201223503872
> total fs tree bytes: 84484636672
> total extent tree bytes: 7195869184
> btree space waste bytes: 29627784154
> file data blocks allocated: 97756261568512
> 
> I've remounted the filesystem and I can at least touch a file.  I'm 
> restarting the rsync that was running when it originally went read only.
> What is the next step if it drops back to r/o?

As already mentioned, if you're using tons of disks and RAID0/10/5/6 as
metadata profile, you can just convert your metadata (or just system) to
RAID1/DUP.

Then there will be more than enough space for system chunk array.

Thanks,
Qu

> 
> 
> The information contained in this e-mail is for the exclusive use of the 
> intended recipient(s) and may be confidential, proprietary, and/or 
> legally privileged.  Inadvertent disclosure of this message does not 
> constitute a waiver of any privilege.  If you receive this message in 
> error, please do not directly or indirectly use, print, copy, forward,
> or disclose any part of this message.  Please also delete this e-mail 
> and all copies and notify the sender.  Thank you. 
> 
> N�r��y���b�X��ǧv�^�)޺{.n�+{�n�߲)���w*jg����ݢj/���z�ޖ��2�ޙ���&�)ߡ�a�����G���h��j:+v���w�٥
> 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v5 0/2] Remove false-positive VLAs when using max()

2018-03-20 Thread Al Viro
On Tue, Mar 20, 2018 at 04:26:52PM -0700, Linus Torvalds wrote:
> On Tue, Mar 20, 2018 at 4:23 PM, Linus Torvalds
>  wrote:
> >
> > Hmm. So thanks to the diseased mind of Martin Uecker, there's a better
> > test for "__is_constant()":
> >
> >   /* Glory to Martin Uecker  */
> >   #define __is_constant(a) \
> > (sizeof(int) == sizeof(*(1 ? ((void*)((a) * 0l)) : (int*)1)))
> >
> > that is actually *specified* by the C standard to work, and doesn't
> > even depend on any gcc extensions.
> 
> Well, it does depend on 'sizeof(*(void *)X)' being 1 and the compiler
> not complaining about it, and that sizeof(int) is not 1.
> 
> But since we depend on those things in the kernel anyway, that's fine.

It also depends upon "ICE for null pointer constant purposes" having the
same extensions as "ICE for enum purposes", etc., which is not obvious.

Back in 2007 or so we had a long thread regarding null pointer constants
in sparse; I probably still have notes from back then, but that'll take
some serious digging to find ;-/

What's more, gcc definitely has odd extensions.  Example I remember from
back then:
extern unsigned n;
struct {
int x : 1 + n - n;
} y;

is accepted.  Used to be quietly accepted with -Wpedantic -std=c99, even,
but that got fixed - with -Wpedantic it does, at least, warn.  What is
and what is not recognized is fairly random - 1 + n - n + 1 + n - n
is recognized as "constant", 1 + n + n + 1 - n - n is not.  Of course,
neither is an ICE.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/2] btrfs: return required error from btrfs_check_super_csum

2018-03-20 Thread Anand Jain
Return the required -EINVAL and -EUCLEAN from the function
btrfs_check_super_csum(). And more the error log into the
parent function.

Signed-off-by: Anand Jain 
---
 fs/btrfs/disk-io.c | 28 +---
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 582ed6af3c50..4c6de2743250 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -392,9 +392,10 @@ static int verify_parent_transid(struct extent_io_tree 
*io_tree,
 /*
  * Return 0 if the superblock checksum type matches the checksum value of that
  * algorithm. Pass the raw disk superblock data.
+ * Otherwise: -EINVAL  if csum type is not found
+ *   -EUCLEAN if csum does not match
  */
-static int btrfs_check_super_csum(struct btrfs_fs_info *fs_info,
- char *raw_disk_sb)
+static int btrfs_check_super_csum(char *raw_disk_sb)
 {
struct btrfs_super_block *disk_sb =
(struct btrfs_super_block *)raw_disk_sb;
@@ -403,11 +404,8 @@ static int btrfs_check_super_csum(struct btrfs_fs_info 
*fs_info,
char result[sizeof(crc)];
 
/* We support csum type crc32 only as of now */
-   if (csum_type != BTRFS_CSUM_TYPE_CRC32) {
-   btrfs_err(fs_info, "unsupported checksum algorithm %u",
- csum_type);
-   return 1;
-   }
+   if (csum_type != BTRFS_CSUM_TYPE_CRC32)
+   return -EINVAL;
 
/*
 * The super_block structure does not span the whole
@@ -419,7 +417,7 @@ static int btrfs_check_super_csum(struct btrfs_fs_info 
*fs_info,
btrfs_csum_final(crc, result);
 
if (memcmp(raw_disk_sb, result, sizeof(result)))
-   return 1;
+   return -EUCLEAN;
 
return 0;
 }
@@ -2571,9 +2569,17 @@ int open_ctree(struct super_block *sb,
 * We want to check superblock checksum, the type is stored inside.
 * Pass the whole disk block of size BTRFS_SUPER_INFO_SIZE (4k).
 */
-   if (btrfs_check_super_csum(fs_info, bh->b_data)) {
-   btrfs_err(fs_info, "superblock checksum mismatch");
-   err = -EINVAL;
+   err = btrfs_check_super_csum(bh->b_data);
+   if (err) {
+   if (err == -EINVAL)
+   pr_err("BTRFS error (device %pg): unsupported checksum 
algorithm",
+   fs_devices->latest_bdev);
+   else if (err == -EUCLEAN)
+   pr_err("BTRFS error (device %pg): superblock checksum 
mismatch",
+   fs_devices->latest_bdev);
+   else
+   pr_err("BTRFS error (device %pg): checksum check failed 
%d",
+   fs_devices->latest_bdev, err);
brelse(bh);
goto fail_alloc;
}
-- 
2.15.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/2] btrfs: cleanup btrfs_check_super_csum() to check the csum_type to the type

2018-03-20 Thread Anand Jain
%csum_type is being checked to check the number of csum types we support,
which is 1 as of now. Instead just check if the type matches with the
only type we support, that is BTRFS_CSUM_TYPE_CRC32. And further adds
cleanup.

Signed-off-by: Anand Jain 
---
 fs/btrfs/disk-io.c | 41 +++--
 1 file changed, 19 insertions(+), 22 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 1657d6aa4fa6..582ed6af3c50 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -399,32 +399,29 @@ static int btrfs_check_super_csum(struct btrfs_fs_info 
*fs_info,
struct btrfs_super_block *disk_sb =
(struct btrfs_super_block *)raw_disk_sb;
u16 csum_type = btrfs_super_csum_type(disk_sb);
-   int ret = 0;
-
-   if (csum_type == BTRFS_CSUM_TYPE_CRC32) {
-   u32 crc = ~(u32)0;
-   char result[sizeof(crc)];
-
-   /*
-* The super_block structure does not span the whole
-* BTRFS_SUPER_INFO_SIZE range, we expect that the unused space
-* is filled with zeros and is included in the checksum.
-*/
-   crc = btrfs_csum_data(raw_disk_sb + BTRFS_CSUM_SIZE,
-   crc, BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE);
-   btrfs_csum_final(crc, result);
-
-   if (memcmp(raw_disk_sb, result, sizeof(result)))
-   ret = 1;
-   }
+   u32 crc = ~(u32)0;
+   char result[sizeof(crc)];
 
-   if (csum_type >= ARRAY_SIZE(btrfs_csum_sizes)) {
+   /* We support csum type crc32 only as of now */
+   if (csum_type != BTRFS_CSUM_TYPE_CRC32) {
btrfs_err(fs_info, "unsupported checksum algorithm %u",
-   csum_type);
-   ret = 1;
+ csum_type);
+   return 1;
}
 
-   return ret;
+   /*
+* The super_block structure does not span the whole
+* BTRFS_SUPER_INFO_SIZE range, we expect that the unused space
+* is filled with zeros and is included in the checksum.
+*/
+   crc = btrfs_csum_data(raw_disk_sb + BTRFS_CSUM_SIZE,
+ crc, BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE);
+   btrfs_csum_final(crc, result);
+
+   if (memcmp(raw_disk_sb, result, sizeof(result)))
+   return 1;
+
+   return 0;
 }
 
 /*
-- 
2.15.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 0/2] Preparatory to add the csum check in the scan context

2018-03-20 Thread Anand Jain
v1->v2:
  Merge 2/3 and 3/3
  Use -EUCLEAN for csum mismatch
  Move the error logging to the parent function
  Drop the struct btrfs_fsinfo as the argument for
btrfs_check_super_csum()
  Do not make btrfs_check_super_csum() nonstatic here, it will be part
of the patchset which will use the csum in the scan context

Anand Jain (2):
  btrfs: cleanup btrfs_check_super_csum() to check the csum_type to the
type
  btrfs: return required error from btrfs_check_super_csum

 fs/btrfs/disk-io.c | 57 --
 1 file changed, 30 insertions(+), 27 deletions(-)

-- 
2.15.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 0/2] Remove false-positive VLAs when using max()

2018-03-20 Thread Linus Torvalds
On Tue, Mar 20, 2018 at 4:23 PM, Linus Torvalds
 wrote:
>
> Hmm. So thanks to the diseased mind of Martin Uecker, there's a better
> test for "__is_constant()":
>
>   /* Glory to Martin Uecker  */
>   #define __is_constant(a) \
> (sizeof(int) == sizeof(*(1 ? ((void*)((a) * 0l)) : (int*)1)))
>
> that is actually *specified* by the C standard to work, and doesn't
> even depend on any gcc extensions.

Well, it does depend on 'sizeof(*(void *)X)' being 1 and the compiler
not complaining about it, and that sizeof(int) is not 1.

But since we depend on those things in the kernel anyway, that's fine.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 0/2] Remove false-positive VLAs when using max()

2018-03-20 Thread Linus Torvalds
On Sat, Mar 17, 2018 at 1:07 PM, Kees Cook  wrote:
>
> No luck! :( gcc 4.4 refuses to play along. And, hilariously, not only
> does it not change the complaint about __builtin_choose_expr(), it
> also thinks that's a VLA now.

Hmm. So thanks to the diseased mind of Martin Uecker, there's a better
test for "__is_constant()":

  /* Glory to Martin Uecker  */
  #define __is_constant(a) \
(sizeof(int) == sizeof(*(1 ? ((void*)((a) * 0l)) : (int*)1)))

that is actually *specified* by the C standard to work, and doesn't
even depend on any gcc extensions.

The reason is some really subtle pointer conversion rules, where the
type of the ternary operator will depend on whether one of the
pointers is NULL or not.

And the definition of NULL, in turn, very much depends on "integer
constant expression that has the value 0".

Are you willing to do one final try on a generic min/max? Same as my
last patch, but using the above __is_constant() test instead of
__builtin_constant_p?

   Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] btrfs: refactor btrfs_check_super_csum() for scan

2018-03-20 Thread Anand Jain



On 03/20/2018 07:10 PM, Nikolay Borisov wrote:



On 20.03.2018 11:53, Anand Jain wrote:

In preparation to use the function btrfs_check_super_csum() for
the device scan context, make it nonstatic and pass the
struct block_device instead of the struct fs_info.

Signed-off-by: Anand Jain 
---
  fs/btrfs/disk-io.c | 12 ++--
  fs/btrfs/disk-io.h |  1 +
  2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index aafd836af61d..d2ace2dca9de 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -393,8 +393,7 @@ static int verify_parent_transid(struct extent_io_tree 
*io_tree,
   * Return 0 if the superblock checksum type matches the checksum value of that
   * algorithm. Pass the raw disk superblock data.
   */
-static int btrfs_check_super_csum(struct btrfs_fs_info *fs_info,
- char *raw_disk_sb)
+int btrfs_check_super_csum(struct block_device *bdev, char *raw_disk_sb)
  {


Since this has become a public function and you've changed the fs_info
parameter to taking a bdev, which is not used for anything else than
printing the error I think it's appropriate to document which block
device should be passed to this function.

Also passing the block device only for printing purposes seems a bit odd.


 Its the device on which we have read the superblock. I find it
 odd too. Will do that pr_err on the parent function. Will add
 comments to the public function.


Thanks, Anand

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] btrfs: return required error from btrfs_check_super_csum

2018-03-20 Thread Anand Jain





@@ -418,8 +418,10 @@ static int btrfs_check_super_csum(struct btrfs_fs_info 
*fs_info,
  crc, BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE);
btrfs_csum_final(crc, result);
  
-	if (memcmp(raw_disk_sb, result, sizeof(result)))

-   return 1;
+   if (memcmp(raw_disk_sb, result, sizeof(result))) {
+   btrfs_err(fs_info, "superblock checksum mismatch");
+   return -EINVAL;


Don't we want EUCLEAN here, since an error in the checksum indicates
corruption?


 yeah, I struggled with it before.
 Strangely there isn't an error code for the csum error
 when csum being widely used in the computer systems.

 Our closest are:

#define EFAULT  14  /* Bad address */
#define EUCLEAN 135 /* Structure needs cleaning */

 Will use the suggested EUCLEAN.

Thanks, Anand
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel warning - not sure if this is important

2018-03-20 Thread Pete
On 03/19/2018 08:34 AM, Nikolay Borisov wrote:

> You are hitting a warning that was removed in : c8195a7b1ad5 ("btrfs:
> remove spurious WARN_ON(ref->count < 0) in find_parent_nodes")
> 
> 
> In short - you can disregard it. However, it's likely you will continue
> seeing it in the future unless you update to 4.16 (the commit is not
> tagged for stable ))
> 

Thank you, much appreciated.  I think I can manage to wait for 4.16!

Pete
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: Allow non-privileged user to delete empty subvolume by default

2018-03-20 Thread Goffredo Baroncelli
On 03/20/2018 07:45 AM, Misono, Tomohiro wrote:
> Deletion of subvolume by non-privileged user is completely restricted
> by default because we can delete a subvolume even if it is not empty
> and may cause data loss. In other words, when user_subvol_rm_allowed
> mount option is used, a user can delete a subvolume containing the
> directory which cannot be deleted directly by the user.
> 
> However, there should be no harm to allow users to delete empty subvolumes
> when rmdir(2) would have been allowed if they were normal directories.
> This patch allows deletion of empty subvolume by default.

Instead of modifying the ioctl, what about allowing rmdir(2) to work for an 
_empty_ subvolume (and all the permission check are satisfied) ?



> 
> Note that user_subvol_rm_allowed option requires write+exec permission
> of the subvolume to be deleted, but they are not required for empty
> subvolume.
> 
> The comment in the code is also updated accordingly.
> 
> Signed-off-by: Tomohiro Misono 
> ---
>  fs/btrfs/ioctl.c | 55 +++
>  1 file changed, 31 insertions(+), 24 deletions(-)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 111ee282b777..838406a7a7f5 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -2366,36 +2366,43 @@ static noinline int btrfs_ioctl_snap_destroy(struct 
> file *file,
>   dest = BTRFS_I(inode)->root;
>   if (!capable(CAP_SYS_ADMIN)) {
>   /*
> -  * Regular user.  Only allow this with a special mount
> -  * option, when the user has write+exec access to the
> -  * subvol root, and when rmdir(2) would have been
> -  * allowed.
> +  * By default, regular user is only allowed to delete
> +  * empty subvols when rmdir(2) would have been allowed
> +  * if they were normal directories.
>*
> -  * Note that this is _not_ check that the subvol is
> -  * empty or doesn't contain data that we wouldn't
> +  * If the mount option 'user_subvol_rm_allowed' is set,
> +  * it allows users to delete non-empty subvols when the
> +  * user has write+exec access to the subvol root and when
> +  * rmdir(2) would have been allowed (except the emptiness
> +  * check).
> +  *
> +  * Note that this option does _not_ check that if the subvol
> +  * is empty or doesn't contain data that the user wouldn't
>* otherwise be able to delete.
>*
> -  * Users who want to delete empty subvols should try
> -  * rmdir(2).
> +  * Users who want to delete empty subvols created by
> +  * snapshot (ino number == 2) can use rmdir(2).
>*/
> - err = -EPERM;
> - if (!btrfs_test_opt(fs_info, USER_SUBVOL_RM_ALLOWED))
> - goto out_dput;
> + err = -ENOTEMPTY;
> + if (inode->i_size != BTRFS_EMPTY_DIR_SIZE) {
> + if (!btrfs_test_opt(fs_info, USER_SUBVOL_RM_ALLOWED))
> + goto out_dput;
>  
> - /*
> -  * Do not allow deletion if the parent dir is the same
> -  * as the dir to be deleted.  That means the ioctl
> -  * must be called on the dentry referencing the root
> -  * of the subvol, not a random directory contained
> -  * within it.
> -  */
> - err = -EINVAL;
> - if (root == dest)
> - goto out_dput;
> + /*
> +  * Do not allow deletion if the parent dir is the same
> +  * as the dir to be deleted.  That means the ioctl
> +  * must be called on the dentry referencing the root
> +  * of the subvol, not a random directory contained
> +  * within it.
> +  */
> + err = -EINVAL;
> + if (root == dest)
> + goto out_dput;
>  
> - err = inode_permission(inode, MAY_WRITE | MAY_EXEC);
> - if (err)
> - goto out_dput;
> + err = inode_permission(inode, MAY_WRITE | MAY_EXEC);
> + if (err)
> + goto out_dput;
> + }
>   }
>  
>   /* check if subvolume may be deleted by a user */
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Crashes running btrfs scrub

2018-03-20 Thread Goffredo Baroncelli
On 03/19/2018 07:06 PM, Liu Bo wrote:
[...]
> 
> So Mike's case is, that both metadata and data are configured as
> raid6, and the operations, balance and scrub, that he tried, need to
> set the existing block group as readonly (in order to avoid any
> further changes being applied during operations are running), then we
> got into the place where another system chunk is needed.
> 
> However, I think it'd be better to have some warnings about this when
> doing a) mkfs.btrfs -mraid6, b) btrfs device add.

What about in the kernel dmesg during the mount

> 
> David, any idea?
> 
> thanks,
> liubo
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] btrfs: defer adding raid type kobject until after chunk relocation

2018-03-20 Thread jeffm
From: Jeff Mahoney 

Any time the first block group of a new type is created, we add a new
kobject to sysfs to hold the attributes for that type.  Kobject-internal
allocations always use GFP_KERNEL, making them prone to fs-reclaim races.
While it appears as if this can occur any time a block group is created,
the only times the first block group of a new type can be created in
memory is at mount and when we create the first new block group during
raid conversion.

This patch adds a new list to track pending kobject additions and then
handles them after we do chunk relocation.  Between relocating the
target chunk (or forcing allocation of a new chunk in the case of data)
and removing the old chunk, we're in a safe place for fs-reclaim to
occur.  We're holding the volume mutex, which is already held across
page faults, and the delete_unused_bgs_mutex, which will only stall
the cleaner thread.

Signed-off-by: Jeff Mahoney 
---
 fs/btrfs/ctree.h   |  6 -
 fs/btrfs/disk-io.c |  2 ++
 fs/btrfs/extent-tree.c | 59 +++---
 fs/btrfs/sysfs.c   |  2 +-
 fs/btrfs/volumes.c | 12 ++
 5 files changed, 61 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index ffbb05aa66fa..75dbdf1bbead 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -381,8 +381,9 @@ struct btrfs_dev_replace {
 
 /* For raid type sysfs entries */
 struct raid_kobject {
-   int raid_type;
+   u64 flags;
struct kobject kobj;
+   struct list_head list;
 };
 
 struct btrfs_space_info {
@@ -938,6 +939,8 @@ struct btrfs_fs_info {
int thread_pool_size;
 
struct kobject *space_info_kobj;
+   struct list_head pending_raid_kobjs;
+   spinlock_t pending_raid_kobjs_lock; /* uncontended */
 
u64 total_pinned;
 
@@ -2688,6 +2691,7 @@ int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 
bytenr);
 int btrfs_make_block_group(struct btrfs_trans_handle *trans,
   struct btrfs_fs_info *fs_info, u64 bytes_used,
   u64 type, u64 chunk_offset, u64 size);
+void btrfs_add_raid_kobjects(struct btrfs_fs_info *fs_info);
 struct btrfs_trans_handle *btrfs_start_trans_remove_block_group(
struct btrfs_fs_info *fs_info,
const u64 chunk_offset);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index eb6bb3169a9e..d5e1c2ff71ff 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2447,6 +2447,8 @@ int open_ctree(struct super_block *sb,
INIT_LIST_HEAD(_info->delayed_iputs);
INIT_LIST_HEAD(_info->delalloc_roots);
INIT_LIST_HEAD(_info->caching_block_groups);
+   INIT_LIST_HEAD(_info->pending_raid_kobjs);
+   spin_lock_init(_info->pending_raid_kobjs_lock);
spin_lock_init(_info->delalloc_root_lock);
spin_lock_init(_info->trans_lock);
spin_lock_init(_info->fs_roots_radix_lock);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 0e9a21230217..bb5368faa937 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9908,9 +9908,39 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
return 0;
 }
 
+/* link_block_group will queue up kobjects to add when we're reclaim-safe */
+void btrfs_add_raid_kobjects(struct btrfs_fs_info *fs_info)
+{
+   struct btrfs_space_info *space_info;
+   struct raid_kobject *rkobj;
+   LIST_HEAD(list);
+   int index;
+   int ret = 0;
+
+   spin_lock(_info->pending_raid_kobjs_lock);
+   list_splice_init(_info->pending_raid_kobjs, );
+   spin_unlock(_info->pending_raid_kobjs_lock);
+
+   list_for_each_entry(rkobj, , list) {
+   space_info = __find_space_info(fs_info, rkobj->flags);
+   index = __get_raid_index(rkobj->flags);
+
+   ret = kobject_add(>kobj, _info->kobj,
+ "%s", get_raid_name(index));
+   if (ret) {
+   kobject_put(>kobj);
+   break;
+   }
+   }
+   if (ret)
+   btrfs_warn(fs_info,
+  "failed to add kobject for block cache, ignoring");
+}
+
 static void link_block_group(struct btrfs_block_group_cache *cache)
 {
struct btrfs_space_info *space_info = cache->space_info;
+   struct btrfs_fs_info *fs_info = cache->fs_info;
int index = get_block_group_index(cache);
bool first = false;
 
@@ -9921,27 +9951,19 @@ static void link_block_group(struct 
btrfs_block_group_cache *cache)
up_write(_info->groups_sem);
 
if (first) {
-   struct raid_kobject *rkobj;
-   int ret;
-
-   rkobj = kzalloc(sizeof(*rkobj), GFP_NOFS);
-   if (!rkobj)
-   goto out_err;
-   rkobj->raid_type = index;
-   kobject_init(>kobj, _raid_ktype);

[PATCH 1/2] btrfs: remove dead create_space_info calls

2018-03-20 Thread jeffm
From: Jeff Mahoney 

Since commit 2be12ef79 (btrfs: Separate space_info create/update), we've
separated out the creation and updating of the space info structures.
That commit was a straightforward refactoring of the two parts of
update_space_info, but we can go a step further.  Since commits
c59021f84 (Btrfs: fix OOPS of empty filesystem after balance) and
b742bb82f (Btrfs: Link block groups of different raid types), we know
that the space_info structures will be created at mount and there will
only ever be, at most, three of them.

This patch cleans out the create_space_info calls after __find_space_info
returns NULL since __find_space_info *can't* return NULL.

The initial cause for reviewing this was the kobject_add calls from
create_space_info occuring in sites where fs-reclaim wasn't allowed.  Now
we are certain they occur only early in the mount process and are safe.

Signed-off-by: Jeff Mahoney 
---
 fs/btrfs/ctree.h   |  6 +++---
 fs/btrfs/extent-tree.c | 16 ++--
 2 files changed, 5 insertions(+), 17 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index da308774b8a4..ffbb05aa66fa 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -952,9 +952,9 @@ struct btrfs_fs_info {
struct btrfs_fs_devices *fs_devices;
 
/*
-* the space_info list is almost entirely read only.  It only changes
-* when we add a new raid type to the FS, and that happens
-* very rarely.  RCU is used to protect it.
+* The space_info list is effectively read only after initial
+* setup.  It is populated at mount time and cleaned up after
+* all block groups are removed.  RCU is used to protect it.
 */
struct list_head space_info;
 
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 2d5e963fae88..0e9a21230217 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4603,11 +4603,7 @@ static int do_chunk_alloc(struct btrfs_trans_handle 
*trans,
return -ENOSPC;
 
space_info = __find_space_info(fs_info, flags);
-   if (!space_info) {
-   ret = create_space_info(fs_info, flags, _info);
-   if (ret)
-   return ret;
-   }
+   ASSERT(space_info);
 
 again:
spin_lock(_info->lock);
@@ -10255,15 +10251,7 @@ int btrfs_make_block_group(struct btrfs_trans_handle 
*trans,
 * with its ->space_info set.
 */
cache->space_info = __find_space_info(fs_info, cache->flags);
-   if (!cache->space_info) {
-   ret = create_space_info(fs_info, cache->flags,
-  >space_info);
-   if (ret) {
-   btrfs_remove_free_space_cache(cache);
-   btrfs_put_block_group(cache);
-   return ret;
-   }
-   }
+   ASSERT(cache->space_info);
 
ret = btrfs_add_block_group_cache(fs_info, cache);
if (ret) {
-- 
2.15.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Crashes running btrfs scrub

2018-03-20 Thread Mike Stevens

>> 30 devices is really not that much, heck you get 90 disks top load JBOD
>> storage chassis these days and BTRFS does sound like an attractive choice
>> for things like that.

> So Mike's case is, that both metadata and data are configured as
> raid6, and the operations, balance and scrub, that he tried, need to
> set the existing block group as readonly (in order to avoid any
> further changes being applied during operations are running), then we
> got into the place where another system chunk is needed.

> However, I think it'd be better to have some warnings about this when
> doing a) mkfs.btrfs -mraid6, b) btrfs device add.

> David, any idea?

I'll certainly vote for a warning, I would have set this up differently had I 
been aware.  

My filesystem check seems to have returned successfully:

[root@auswscs9903] ~ # btrfs check --readonly /dev/sdb
Checking filesystem on /dev/sdb
UUID: 77afc2bb-f7a8-4ce9-9047-c031f7571150
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
found 97926270238720 bytes used err is 0
total csum bytes: 95395030288
total tree bytes: 201223503872
total fs tree bytes: 84484636672
total extent tree bytes: 7195869184
btree space waste bytes: 29627784154
file data blocks allocated: 97756261568512

I've remounted the filesystem and I can at least touch a file.  I'm restarting 
the rsync that was running when it originally went read only.
What is the next step if it drops back to r/o?


The information contained in this e-mail is for the exclusive use of the 
intended recipient(s) and may be confidential, proprietary, and/or 
legally privileged.  Inadvertent disclosure of this message does not 
constitute a waiver of any privilege.  If you receive this message in 
error, please do not directly or indirectly use, print, copy, forward,
or disclose any part of this message.  Please also delete this e-mail 
and all copies and notify the sender.  Thank you. 



Re: [PATCH] btrfs-progs: Beautify owner when printing leaf/nodes

2018-03-20 Thread Qu Wenruo


On 2018年03月20日 16:45, Nikolay Borisov wrote:
> Currently we print the raw values of the owner field of leaf/nodes.
> This can result in output like the following:
> 
> leaf 30490624 items 2 free space 16061 generation 4 owner 18446744073709551607
> 
> With the patch applied the same leaf looks like:
> 
> leaf 30490624 items 2 free space 16061 generation 4 owner DATA_RELOC_TREE
> 
> Signed-off-by: Nikolay Borisov 

Looks good.

Reviewed-by: Qu Wenruo 

Thanks,
Qu

> ---
>  print-tree.c | 14 --
>  1 file changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/print-tree.c b/print-tree.c
> index d59f9002f54c..a2f6bfc027c9 100644
> --- a/print-tree.c
> +++ b/print-tree.c
> @@ -1188,11 +1188,12 @@ void btrfs_print_leaf(struct btrfs_root *root, struct 
> extent_buffer *eb)
>   header_flags_to_str(flags, flags_str);
>   nr = btrfs_header_nritems(eb);
>  
> - printf("leaf %llu items %d free space %d generation %llu owner %llu\n",
> + printf("leaf %llu items %d free space %d generation %llu owner ",
>   (unsigned long long)btrfs_header_bytenr(eb), nr,
>   btrfs_leaf_free_space(root, eb),
> - (unsigned long long)btrfs_header_generation(eb),
> - (unsigned long long)btrfs_header_owner(eb));
> + (unsigned long long)btrfs_header_generation(eb));
> + print_objectid(stdout, btrfs_header_owner(eb), 0);
> + printf("\n");
>   printf("leaf %llu flags 0x%llx(%s) backref revision %d\n",
>   btrfs_header_bytenr(eb), flags, flags_str, backref_rev);
>   print_uuids(eb);
> @@ -1365,12 +1366,13 @@ void btrfs_print_tree(struct btrfs_root *root, struct 
> extent_buffer *eb, int fol
>   btrfs_print_leaf(root, eb);
>   return;
>   }
> - printf("node %llu level %d items %d free %u generation %llu owner 
> %llu\n",
> + printf("node %llu level %d items %d free %u generation %llu owner ",
>  (unsigned long long)eb->start,
>   btrfs_header_level(eb), nr,
>   (u32)BTRFS_NODEPTRS_PER_BLOCK(root->fs_info) - nr,
> - (unsigned long long)btrfs_header_generation(eb),
> - (unsigned long long)btrfs_header_owner(eb));
> + (unsigned long long)btrfs_header_generation(eb));
> + print_objectid(stdout, btrfs_header_owner(eb), 0);
> + printf("\n");
>   print_uuids(eb);
>   fflush(stdout);
>   for (i = 0; i < nr; i++) {
> 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 3/3] btrfs: refactor btrfs_check_super_csum() for scan

2018-03-20 Thread Nikolay Borisov


On 20.03.2018 11:53, Anand Jain wrote:
> In preparation to use the function btrfs_check_super_csum() for
> the device scan context, make it nonstatic and pass the
> struct block_device instead of the struct fs_info.
> 
> Signed-off-by: Anand Jain 
> ---
>  fs/btrfs/disk-io.c | 12 ++--
>  fs/btrfs/disk-io.h |  1 +
>  2 files changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index aafd836af61d..d2ace2dca9de 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -393,8 +393,7 @@ static int verify_parent_transid(struct extent_io_tree 
> *io_tree,
>   * Return 0 if the superblock checksum type matches the checksum value of 
> that
>   * algorithm. Pass the raw disk superblock data.
>   */
> -static int btrfs_check_super_csum(struct btrfs_fs_info *fs_info,
> -   char *raw_disk_sb)
> +int btrfs_check_super_csum(struct block_device *bdev, char *raw_disk_sb)
>  {

Since this has become a public function and you've changed the fs_info
parameter to taking a bdev, which is not used for anything else than
printing the error I think it's appropriate to document which block
device should be passed to this function.

Also passing the block device only for printing purposes seems a bit odd.

>   struct btrfs_super_block *disk_sb =
>   (struct btrfs_super_block *)raw_disk_sb;
> @@ -404,8 +403,8 @@ static int btrfs_check_super_csum(struct btrfs_fs_info 
> *fs_info,
>  
>   /* We support csum type crc32 only as of now */
>   if (csum_type != BTRFS_CSUM_TYPE_CRC32) {
> - btrfs_err(fs_info, "unsupported checksum algorithm %u",
> -   csum_type);
> + pr_err("BTRFS error (device %pg): unsupported checksum 
> algorithm %u",
> + bdev, csum_type);
>   return -EINVAL;
>   }
>  
> @@ -419,7 +418,8 @@ static int btrfs_check_super_csum(struct btrfs_fs_info 
> *fs_info,
>   btrfs_csum_final(crc, result);
>  
>   if (memcmp(raw_disk_sb, result, sizeof(result))) {
> - btrfs_err(fs_info, "superblock checksum mismatch");
> + pr_err("BTRFS error (device %pg): superblock checksum mismatch",
> + bdev);
>   return -EINVAL;
>   }
>  
> @@ -2573,7 +2573,7 @@ int open_ctree(struct super_block *sb,
>* We want to check superblock checksum, the type is stored inside.
>* Pass the whole disk block of size BTRFS_SUPER_INFO_SIZE (4k).
>*/
> - err = btrfs_check_super_csum(fs_info, bh->b_data);
> + err = btrfs_check_super_csum(fs_devices->latest_bdev, bh->b_data);
>   if (err) {
>   brelse(bh);
>   goto fail_alloc;
> diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
> index 70a88d61b547..1427fa1181d4 100644
> --- a/fs/btrfs/disk-io.h
> +++ b/fs/btrfs/disk-io.h
> @@ -69,6 +69,7 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int 
> max_mirrors);
>  struct buffer_head *btrfs_read_dev_super(struct block_device *bdev);
>  int btrfs_read_dev_one_super(struct block_device *bdev, int copy_num,
>   struct buffer_head **bh_ret);
> +int btrfs_check_super_csum(struct block_device *bdev, char *raw_disk_sb);
>  int btrfs_commit_super(struct btrfs_fs_info *fs_info);
>  struct btrfs_root *btrfs_read_fs_root(struct btrfs_root *tree_root,
> struct btrfs_key *location);
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: Validate child tree block's level and first key

2018-03-20 Thread kbuild test robot
Hi Qu,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on v4.16-rc4]
[also build test WARNING on next-20180319]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Qu-Wenruo/btrfs-Validate-child-tree-block-s-level-and-first-key/20180320-054353
reproduce:
# apt-get install sparse
make ARCH=x86_64 allmodconfig
make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

>> fs/btrfs/ref-verify.c:586:45: sparse: not enough arguments for function 
>> read_tree_block
   fs/btrfs/ref-verify.c:284:27: sparse: context imbalance in 'add_block_entry' 
- different lock contexts for basic block
   fs/btrfs/ref-verify.c:371:20: sparse: context imbalance in 'add_tree_block' 
- unexpected unlock
   fs/btrfs/ref-verify.c:396:28: sparse: context imbalance in 
'add_shared_data_ref' - unexpected unlock
   fs/btrfs/ref-verify.c:434:28: sparse: context imbalance in 
'add_extent_data_ref' - unexpected unlock
   fs/btrfs/ref-verify.c:892:20: sparse: context imbalance in 
'btrfs_ref_tree_mod' - unexpected unlock
   fs/btrfs/ref-verify.c: In function 'walk_down_tree':
   fs/btrfs/ref-verify.c:586:9: error: too few arguments to function 
'read_tree_block'
   eb = read_tree_block(fs_info, block_bytenr, gen);
^~~
   In file included from fs/btrfs/ref-verify.c:22:0:
   fs/btrfs/disk-io.h:55:23: note: declared here
struct extent_buffer *read_tree_block(struct btrfs_fs_info *fs_info, u64 
bytenr,
  ^~~

vim +586 fs/btrfs/ref-verify.c

fd708b81 Josef Bacik 2017-09-29  570  
fd708b81 Josef Bacik 2017-09-29  571  /* Walk down to the leaf from the given 
level */
fd708b81 Josef Bacik 2017-09-29  572  static int walk_down_tree(struct 
btrfs_root *root, struct btrfs_path *path,
fd708b81 Josef Bacik 2017-09-29  573  int level, u64 
*bytenr, u64 *num_bytes)
fd708b81 Josef Bacik 2017-09-29  574  {
fd708b81 Josef Bacik 2017-09-29  575struct btrfs_fs_info *fs_info = 
root->fs_info;
fd708b81 Josef Bacik 2017-09-29  576struct extent_buffer *eb;
fd708b81 Josef Bacik 2017-09-29  577u64 block_bytenr, gen;
fd708b81 Josef Bacik 2017-09-29  578int ret = 0;
fd708b81 Josef Bacik 2017-09-29  579  
fd708b81 Josef Bacik 2017-09-29  580while (level >= 0) {
fd708b81 Josef Bacik 2017-09-29  581if (level) {
fd708b81 Josef Bacik 2017-09-29  582block_bytenr = 
btrfs_node_blockptr(path->nodes[level],
fd708b81 Josef Bacik 2017-09-29  583
   path->slots[level]);
fd708b81 Josef Bacik 2017-09-29  584gen = 
btrfs_node_ptr_generation(path->nodes[level],
fd708b81 Josef Bacik 2017-09-29  585
path->slots[level]);
fd708b81 Josef Bacik 2017-09-29 @586eb = 
read_tree_block(fs_info, block_bytenr, gen);
fd708b81 Josef Bacik 2017-09-29  587if (IS_ERR(eb))
fd708b81 Josef Bacik 2017-09-29  588return 
PTR_ERR(eb);
fd708b81 Josef Bacik 2017-09-29  589if 
(!extent_buffer_uptodate(eb)) {
fd708b81 Josef Bacik 2017-09-29  590
free_extent_buffer(eb);
fd708b81 Josef Bacik 2017-09-29  591return -EIO;
fd708b81 Josef Bacik 2017-09-29  592}
fd708b81 Josef Bacik 2017-09-29  593
btrfs_tree_read_lock(eb);
fd708b81 Josef Bacik 2017-09-29  594
btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
fd708b81 Josef Bacik 2017-09-29  595path->nodes[level-1] = 
eb;
fd708b81 Josef Bacik 2017-09-29  596path->slots[level-1] = 
0;
fd708b81 Josef Bacik 2017-09-29  597path->locks[level-1] = 
BTRFS_READ_LOCK_BLOCKING;
fd708b81 Josef Bacik 2017-09-29  598} else {
fd708b81 Josef Bacik 2017-09-29  599ret = 
process_leaf(root, path, bytenr, num_bytes);
fd708b81 Josef Bacik 2017-09-29  600if (ret)
fd708b81 Josef Bacik 2017-09-29  601break;
fd708b81 Josef Bacik 2017-09-29  602}
fd708b81 Josef Bacik 2017-09-29  603level--;
fd708b81 Josef Bacik 2017-09-29  604}
fd708b81 Josef Bacik 2017-09-29  605return ret;
fd708b81 Josef Bacik 2017-09-29  606  }
fd708b81 Josef Bacik 2017-09-29  607  

:: The code at line 586 was first introduced by commit
:: fd708b81d972a0714b02a60eb4792fdbf15868c4 Btrfs: add a extent ref verify 
tool

:: TO: Josef Bacik <jo...@toxicpanda.com>
:: CC: David Sterba <dste...@suse.com>

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Co

Re: [PATCH 1/3] btrfs: cleanup btrfs_check_super_csum() to check the csum_type to the type

2018-03-20 Thread Nikolay Borisov


On 20.03.2018 12:11, Nikolay Borisov wrote:
> 
> 
> On 20.03.2018 11:53, Anand Jain wrote:
>> %csum_type is being checked to check the number of csum types we support,
>> which is 1 as of now. Instead just check if the type matches with the
>> only type we support, that is BTRFS_CSUM_TYPE_CRC32. And further adds
>> cleanup.
>>
>> Signed-off-by: Anand Jain 
>> ---
>>  fs/btrfs/disk-io.c | 41 +++--
>>  1 file changed, 19 insertions(+), 22 deletions(-)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 1657d6aa4fa6..582ed6af3c50 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -399,32 +399,29 @@ static int btrfs_check_super_csum(struct btrfs_fs_info 
>> *fs_info,
>>  struct btrfs_super_block *disk_sb =
>>  (struct btrfs_super_block *)raw_disk_sb;
>>  u16 csum_type = btrfs_super_csum_type(disk_sb);
>> -int ret = 0;
>> -
>> -if (csum_type == BTRFS_CSUM_TYPE_CRC32) {
>> -u32 crc = ~(u32)0;
>> -char result[sizeof(crc)];
>> -
>> -/*
>> - * The super_block structure does not span the whole
>> - * BTRFS_SUPER_INFO_SIZE range, we expect that the unused space
>> - * is filled with zeros and is included in the checksum.
>> - */
>> -crc = btrfs_csum_data(raw_disk_sb + BTRFS_CSUM_SIZE,
>> -crc, BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE);
>> -btrfs_csum_final(crc, result);
>> -
>> -if (memcmp(raw_disk_sb, result, sizeof(result)))
>> -ret = 1;
>> -}
>> +u32 crc = ~(u32)0;
>> +char result[sizeof(crc)];
>>  
>> -if (csum_type >= ARRAY_SIZE(btrfs_csum_sizes)) {
>> +/* We support csum type crc32 only as of now */
>> +if (csum_type != BTRFS_CSUM_TYPE_CRC32) {
>>  btrfs_err(fs_info, "unsupported checksum algorithm %u",
>> -csum_type);
>> -ret = 1;
>> +  csum_type);
>> +return 1;
> 
> I think it would make sense to return EINVAL here. Then the sole caller
> of this function (open_ctree) can use the code a bit more idiomatically:
> 
> ret = btrfs_check_super_csum()
> 
> if (ret < 0) {
> handle failure
> }
> 
> Right now we return 1 on both when the function fails due to invalid CRC
> algorithm and if the checksum doesn't match. And we are going to print 2
> things: unsupported checksum algorithm and the
> btrfs_err(fs_info, "superblock checksum mismatch");from open_ctree.
> 
> Let's implemented proper error returning strategy for this function.
> 

Disregard, you have already implemented something like that in 2/3. I
will comment there how it can be improved.

> 
> 
>>  }
>>  
>> -return ret;
>> +/*
>> + * The super_block structure does not span the whole
>> + * BTRFS_SUPER_INFO_SIZE range, we expect that the unused space
>> + * is filled with zeros and is included in the checksum.
>> + */
>> +crc = btrfs_csum_data(raw_disk_sb + BTRFS_CSUM_SIZE,
>> +  crc, BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE);
>> +btrfs_csum_final(crc, result);
>> +
>> +if (memcmp(raw_disk_sb, result, sizeof(result)))
>> +return 1;
> 
> Perhaps return EUCLEAN i.e something is corrupted hence the checksum
> didn't pan out as we expected.
> 
>> +
>> +return 0;
>>  }
>>  
>>  /*
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] btrfs: return required error from btrfs_check_super_csum

2018-03-20 Thread Nikolay Borisov


On 20.03.2018 11:53, Anand Jain wrote:
> Return the required -EINVAL from the function btrfs_check_super_csum()
> and move the error logging into the btrfs_check_super_csum() itself.
> 
> Signed-off-by: Anand Jain 
> ---
>  fs/btrfs/disk-io.c | 13 +++--
>  1 file changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 582ed6af3c50..aafd836af61d 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -406,7 +406,7 @@ static int btrfs_check_super_csum(struct btrfs_fs_info 
> *fs_info,
>   if (csum_type != BTRFS_CSUM_TYPE_CRC32) {
>   btrfs_err(fs_info, "unsupported checksum algorithm %u",
> csum_type);
> - return 1;
> + return -EINVAL;
>   }
>  
>   /*
> @@ -418,8 +418,10 @@ static int btrfs_check_super_csum(struct btrfs_fs_info 
> *fs_info,
> crc, BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE);
>   btrfs_csum_final(crc, result);
>  
> - if (memcmp(raw_disk_sb, result, sizeof(result)))
> - return 1;
> + if (memcmp(raw_disk_sb, result, sizeof(result))) {
> + btrfs_err(fs_info, "superblock checksum mismatch");
> + return -EINVAL;

Don't we want EUCLEAN here, since an error in the checksum indicates
corruption?

> + }
>  
>   return 0;
>  }
> @@ -2571,9 +2573,8 @@ int open_ctree(struct super_block *sb,
>* We want to check superblock checksum, the type is stored inside.
>* Pass the whole disk block of size BTRFS_SUPER_INFO_SIZE (4k).
>*/
> - if (btrfs_check_super_csum(fs_info, bh->b_data)) {
> - btrfs_err(fs_info, "superblock checksum mismatch");
> - err = -EINVAL;
> + err = btrfs_check_super_csum(fs_info, bh->b_data);
> + if (err) {
>   brelse(bh);
>   goto fail_alloc;
>   }
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] btrfs: cleanup btrfs_check_super_csum() to check the csum_type to the type

2018-03-20 Thread Nikolay Borisov


On 20.03.2018 11:53, Anand Jain wrote:
> %csum_type is being checked to check the number of csum types we support,
> which is 1 as of now. Instead just check if the type matches with the
> only type we support, that is BTRFS_CSUM_TYPE_CRC32. And further adds
> cleanup.
> 
> Signed-off-by: Anand Jain 
> ---
>  fs/btrfs/disk-io.c | 41 +++--
>  1 file changed, 19 insertions(+), 22 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 1657d6aa4fa6..582ed6af3c50 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -399,32 +399,29 @@ static int btrfs_check_super_csum(struct btrfs_fs_info 
> *fs_info,
>   struct btrfs_super_block *disk_sb =
>   (struct btrfs_super_block *)raw_disk_sb;
>   u16 csum_type = btrfs_super_csum_type(disk_sb);
> - int ret = 0;
> -
> - if (csum_type == BTRFS_CSUM_TYPE_CRC32) {
> - u32 crc = ~(u32)0;
> - char result[sizeof(crc)];
> -
> - /*
> -  * The super_block structure does not span the whole
> -  * BTRFS_SUPER_INFO_SIZE range, we expect that the unused space
> -  * is filled with zeros and is included in the checksum.
> -  */
> - crc = btrfs_csum_data(raw_disk_sb + BTRFS_CSUM_SIZE,
> - crc, BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE);
> - btrfs_csum_final(crc, result);
> -
> - if (memcmp(raw_disk_sb, result, sizeof(result)))
> - ret = 1;
> - }
> + u32 crc = ~(u32)0;
> + char result[sizeof(crc)];
>  
> - if (csum_type >= ARRAY_SIZE(btrfs_csum_sizes)) {
> + /* We support csum type crc32 only as of now */
> + if (csum_type != BTRFS_CSUM_TYPE_CRC32) {
>   btrfs_err(fs_info, "unsupported checksum algorithm %u",
> - csum_type);
> - ret = 1;
> +   csum_type);
> + return 1;

I think it would make sense to return EINVAL here. Then the sole caller
of this function (open_ctree) can use the code a bit more idiomatically:

ret = btrfs_check_super_csum()

if (ret < 0) {
handle failure
}

Right now we return 1 on both when the function fails due to invalid CRC
algorithm and if the checksum doesn't match. And we are going to print 2
things: unsupported checksum algorithm and the
btrfs_err(fs_info, "superblock checksum mismatch");from open_ctree.

Let's implemented proper error returning strategy for this function.



>   }
>  
> - return ret;
> + /*
> +  * The super_block structure does not span the whole
> +  * BTRFS_SUPER_INFO_SIZE range, we expect that the unused space
> +  * is filled with zeros and is included in the checksum.
> +  */
> + crc = btrfs_csum_data(raw_disk_sb + BTRFS_CSUM_SIZE,
> +   crc, BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE);
> + btrfs_csum_final(crc, result);
> +
> + if (memcmp(raw_disk_sb, result, sizeof(result)))
> + return 1;

Perhaps return EUCLEAN i.e something is corrupted hence the checksum
didn't pan out as we expected.

> +
> + return 0;
>  }
>  
>  /*
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/3] Preparatory to add the csum check in the scan context

2018-03-20 Thread Anand Jain
Anand Jain (3):
  btrfs: cleanup btrfs_check_super_csum() to check the csum_type to the
type
  btrfs: return required error from btrfs_check_super_csum
  btrfs: refactor btrfs_check_super_csum() for scan

 fs/btrfs/disk-io.c | 50 --
 fs/btrfs/disk-io.h |  1 +
 2 files changed, 25 insertions(+), 26 deletions(-)

-- 
2.15.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] btrfs: cleanup btrfs_check_super_csum() to check the csum_type to the type

2018-03-20 Thread Anand Jain
%csum_type is being checked to check the number of csum types we support,
which is 1 as of now. Instead just check if the type matches with the
only type we support, that is BTRFS_CSUM_TYPE_CRC32. And further adds
cleanup.

Signed-off-by: Anand Jain 
---
 fs/btrfs/disk-io.c | 41 +++--
 1 file changed, 19 insertions(+), 22 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 1657d6aa4fa6..582ed6af3c50 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -399,32 +399,29 @@ static int btrfs_check_super_csum(struct btrfs_fs_info 
*fs_info,
struct btrfs_super_block *disk_sb =
(struct btrfs_super_block *)raw_disk_sb;
u16 csum_type = btrfs_super_csum_type(disk_sb);
-   int ret = 0;
-
-   if (csum_type == BTRFS_CSUM_TYPE_CRC32) {
-   u32 crc = ~(u32)0;
-   char result[sizeof(crc)];
-
-   /*
-* The super_block structure does not span the whole
-* BTRFS_SUPER_INFO_SIZE range, we expect that the unused space
-* is filled with zeros and is included in the checksum.
-*/
-   crc = btrfs_csum_data(raw_disk_sb + BTRFS_CSUM_SIZE,
-   crc, BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE);
-   btrfs_csum_final(crc, result);
-
-   if (memcmp(raw_disk_sb, result, sizeof(result)))
-   ret = 1;
-   }
+   u32 crc = ~(u32)0;
+   char result[sizeof(crc)];
 
-   if (csum_type >= ARRAY_SIZE(btrfs_csum_sizes)) {
+   /* We support csum type crc32 only as of now */
+   if (csum_type != BTRFS_CSUM_TYPE_CRC32) {
btrfs_err(fs_info, "unsupported checksum algorithm %u",
-   csum_type);
-   ret = 1;
+ csum_type);
+   return 1;
}
 
-   return ret;
+   /*
+* The super_block structure does not span the whole
+* BTRFS_SUPER_INFO_SIZE range, we expect that the unused space
+* is filled with zeros and is included in the checksum.
+*/
+   crc = btrfs_csum_data(raw_disk_sb + BTRFS_CSUM_SIZE,
+ crc, BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE);
+   btrfs_csum_final(crc, result);
+
+   if (memcmp(raw_disk_sb, result, sizeof(result)))
+   return 1;
+
+   return 0;
 }
 
 /*
-- 
2.15.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] btrfs: refactor btrfs_check_super_csum() for scan

2018-03-20 Thread Anand Jain
In preparation to use the function btrfs_check_super_csum() for
the device scan context, make it nonstatic and pass the
struct block_device instead of the struct fs_info.

Signed-off-by: Anand Jain 
---
 fs/btrfs/disk-io.c | 12 ++--
 fs/btrfs/disk-io.h |  1 +
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index aafd836af61d..d2ace2dca9de 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -393,8 +393,7 @@ static int verify_parent_transid(struct extent_io_tree 
*io_tree,
  * Return 0 if the superblock checksum type matches the checksum value of that
  * algorithm. Pass the raw disk superblock data.
  */
-static int btrfs_check_super_csum(struct btrfs_fs_info *fs_info,
- char *raw_disk_sb)
+int btrfs_check_super_csum(struct block_device *bdev, char *raw_disk_sb)
 {
struct btrfs_super_block *disk_sb =
(struct btrfs_super_block *)raw_disk_sb;
@@ -404,8 +403,8 @@ static int btrfs_check_super_csum(struct btrfs_fs_info 
*fs_info,
 
/* We support csum type crc32 only as of now */
if (csum_type != BTRFS_CSUM_TYPE_CRC32) {
-   btrfs_err(fs_info, "unsupported checksum algorithm %u",
- csum_type);
+   pr_err("BTRFS error (device %pg): unsupported checksum 
algorithm %u",
+   bdev, csum_type);
return -EINVAL;
}
 
@@ -419,7 +418,8 @@ static int btrfs_check_super_csum(struct btrfs_fs_info 
*fs_info,
btrfs_csum_final(crc, result);
 
if (memcmp(raw_disk_sb, result, sizeof(result))) {
-   btrfs_err(fs_info, "superblock checksum mismatch");
+   pr_err("BTRFS error (device %pg): superblock checksum mismatch",
+   bdev);
return -EINVAL;
}
 
@@ -2573,7 +2573,7 @@ int open_ctree(struct super_block *sb,
 * We want to check superblock checksum, the type is stored inside.
 * Pass the whole disk block of size BTRFS_SUPER_INFO_SIZE (4k).
 */
-   err = btrfs_check_super_csum(fs_info, bh->b_data);
+   err = btrfs_check_super_csum(fs_devices->latest_bdev, bh->b_data);
if (err) {
brelse(bh);
goto fail_alloc;
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index 70a88d61b547..1427fa1181d4 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -69,6 +69,7 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int 
max_mirrors);
 struct buffer_head *btrfs_read_dev_super(struct block_device *bdev);
 int btrfs_read_dev_one_super(struct block_device *bdev, int copy_num,
struct buffer_head **bh_ret);
+int btrfs_check_super_csum(struct block_device *bdev, char *raw_disk_sb);
 int btrfs_commit_super(struct btrfs_fs_info *fs_info);
 struct btrfs_root *btrfs_read_fs_root(struct btrfs_root *tree_root,
  struct btrfs_key *location);
-- 
2.15.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] btrfs: return required error from btrfs_check_super_csum

2018-03-20 Thread Anand Jain
Return the required -EINVAL from the function btrfs_check_super_csum()
and move the error logging into the btrfs_check_super_csum() itself.

Signed-off-by: Anand Jain 
---
 fs/btrfs/disk-io.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 582ed6af3c50..aafd836af61d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -406,7 +406,7 @@ static int btrfs_check_super_csum(struct btrfs_fs_info 
*fs_info,
if (csum_type != BTRFS_CSUM_TYPE_CRC32) {
btrfs_err(fs_info, "unsupported checksum algorithm %u",
  csum_type);
-   return 1;
+   return -EINVAL;
}
 
/*
@@ -418,8 +418,10 @@ static int btrfs_check_super_csum(struct btrfs_fs_info 
*fs_info,
  crc, BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE);
btrfs_csum_final(crc, result);
 
-   if (memcmp(raw_disk_sb, result, sizeof(result)))
-   return 1;
+   if (memcmp(raw_disk_sb, result, sizeof(result))) {
+   btrfs_err(fs_info, "superblock checksum mismatch");
+   return -EINVAL;
+   }
 
return 0;
 }
@@ -2571,9 +2573,8 @@ int open_ctree(struct super_block *sb,
 * We want to check superblock checksum, the type is stored inside.
 * Pass the whole disk block of size BTRFS_SUPER_INFO_SIZE (4k).
 */
-   if (btrfs_check_super_csum(fs_info, bh->b_data)) {
-   btrfs_err(fs_info, "superblock checksum mismatch");
-   err = -EINVAL;
+   err = btrfs_check_super_csum(fs_info, bh->b_data);
+   if (err) {
brelse(bh);
goto fail_alloc;
}
-- 
2.15.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4.14 024/110] btrfs: use proper endianness accessors for super_copy

2018-03-20 Thread Anand Jain



On 03/20/2018 03:32 AM, David Sterba wrote:

On Sat, Mar 17, 2018 at 06:27:15PM +0100, Christoph Biedl wrote:

Greg Kroah-Hartman wrote...


On Thu, Mar 15, 2018 at 07:55:42PM +0100, Christoph Biedl wrote:



commit 3c181c12c431fe33b669410d663beb9cceefcd1b upstream.



On big-endian systems, this change intruduces severe corruption,
resulting in complete loss of the data on the used block device.



That sucks.  Can you test Linus's tree to verify the problem is there?
I'll gladly revert this if Linus's tree also gets the revert, I don't
want you to hit this when you upgrade to a newer kernel.


Confirmed: The problem is, err ... was in Linus' tree as well. The
rather recent commit 8f5fd927c3a7 reverted the change, after that
everything is as expected again.


Thanks for checking.


Looking at the original commit, I don't have a clue why things go wrong
so horribly


It's a half endianness conversion. The plain in-memory structures are in
LE and has to be accessed via the helpers, super block copy and the
root item.  The commit adds only one half of the conversion, that
naturally does not exhibit on LE, because the macros are no-op.

Originally, the items were stored from the on-disk type to on-disk type,
regardless of the CPU:

   super->chunk_root = root_item->bytenr;

The patch should have added conversion of both values, like

   btrfs_set_super_chunk_root(super, btrfs_root_bytenr(root_item));

and not

   btrfs_set_super_chunk_root(super, root_item->bytenr);

It's not very obvious from the context though, typically there's one
structure that needs the accessors and is set from a value that's in CPU
byteorder. I think that the exception to this pattern confused all
involved developers.

The root_item members are annotated as __le64 that should be caught by
sparse/smatch checker in the buggy case, but we don't run the checkers
every the time.


 Ah! the RC is at the other side of the equation.
 Makes sense to me. Thanks.


- otherwise don't be afraid of my data. I took this as a
chance to verify my data recovery procedure, with success.


Should that not be the case, the damaged items in superblock can be
byteswapped back. That's 6 x u64 at most, I have a tool for that now.

Thanks again for the report and sorry for the trouble.


 It's entirely my mistake. My apologies for the inconvenience.

Thanks, Anand



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: ctree.h: Fix wrong comment position about csum size

2018-03-20 Thread Qu Wenruo


On 2018年03月20日 14:47, Misono, Tomohiro wrote:
> 
> Signed-off-by: Tomohiro Misono 

Reviewed-by: Qu Wenruo 

BTW this reminds me that, btrfs-progs is still using BTRFS_CRC32_SIZE
macro which the original comment is for.

It may be a good time to clean it up in btrfs-progs.

Thanks,
Qu

> ---
>  fs/btrfs/ctree.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index da308774b8a4..8f59cb20dd4c 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -86,9 +86,9 @@ struct btrfs_ordered_sum;
>   */
>  #define BTRFS_LINK_MAX 65535U
>  
> +/* four bytes for CRC32 */
>  static const int btrfs_csum_sizes[] = { 4 };
>  
> -/* four bytes for CRC32 */
>  #define BTRFS_EMPTY_DIR_SIZE 0
>  
>  /* ioprio of readahead is set to idle */
> 



signature.asc
Description: OpenPGP digital signature


[PATCH] btrfs: Allow non-privileged user to delete empty subvolume by default

2018-03-20 Thread Misono, Tomohiro
Deletion of subvolume by non-privileged user is completely restricted
by default because we can delete a subvolume even if it is not empty
and may cause data loss. In other words, when user_subvol_rm_allowed
mount option is used, a user can delete a subvolume containing the
directory which cannot be deleted directly by the user.

However, there should be no harm to allow users to delete empty subvolumes
when rmdir(2) would have been allowed if they were normal directories.
This patch allows deletion of empty subvolume by default.

Note that user_subvol_rm_allowed option requires write+exec permission
of the subvolume to be deleted, but they are not required for empty
subvolume.

The comment in the code is also updated accordingly.

Signed-off-by: Tomohiro Misono 
---
 fs/btrfs/ioctl.c | 55 +++
 1 file changed, 31 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 111ee282b777..838406a7a7f5 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2366,36 +2366,43 @@ static noinline int btrfs_ioctl_snap_destroy(struct 
file *file,
dest = BTRFS_I(inode)->root;
if (!capable(CAP_SYS_ADMIN)) {
/*
-* Regular user.  Only allow this with a special mount
-* option, when the user has write+exec access to the
-* subvol root, and when rmdir(2) would have been
-* allowed.
+* By default, regular user is only allowed to delete
+* empty subvols when rmdir(2) would have been allowed
+* if they were normal directories.
 *
-* Note that this is _not_ check that the subvol is
-* empty or doesn't contain data that we wouldn't
+* If the mount option 'user_subvol_rm_allowed' is set,
+* it allows users to delete non-empty subvols when the
+* user has write+exec access to the subvol root and when
+* rmdir(2) would have been allowed (except the emptiness
+* check).
+*
+* Note that this option does _not_ check that if the subvol
+* is empty or doesn't contain data that the user wouldn't
 * otherwise be able to delete.
 *
-* Users who want to delete empty subvols should try
-* rmdir(2).
+* Users who want to delete empty subvols created by
+* snapshot (ino number == 2) can use rmdir(2).
 */
-   err = -EPERM;
-   if (!btrfs_test_opt(fs_info, USER_SUBVOL_RM_ALLOWED))
-   goto out_dput;
+   err = -ENOTEMPTY;
+   if (inode->i_size != BTRFS_EMPTY_DIR_SIZE) {
+   if (!btrfs_test_opt(fs_info, USER_SUBVOL_RM_ALLOWED))
+   goto out_dput;
 
-   /*
-* Do not allow deletion if the parent dir is the same
-* as the dir to be deleted.  That means the ioctl
-* must be called on the dentry referencing the root
-* of the subvol, not a random directory contained
-* within it.
-*/
-   err = -EINVAL;
-   if (root == dest)
-   goto out_dput;
+   /*
+* Do not allow deletion if the parent dir is the same
+* as the dir to be deleted.  That means the ioctl
+* must be called on the dentry referencing the root
+* of the subvol, not a random directory contained
+* within it.
+*/
+   err = -EINVAL;
+   if (root == dest)
+   goto out_dput;
 
-   err = inode_permission(inode, MAY_WRITE | MAY_EXEC);
-   if (err)
-   goto out_dput;
+   err = inode_permission(inode, MAY_WRITE | MAY_EXEC);
+   if (err)
+   goto out_dput;
+   }
}
 
/* check if subvolume may be deleted by a user */
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 4/6] btrfs-progs: check/lowmem mode: Check inline extent size

2018-03-20 Thread Qu Wenruo
Signed-off-by: Qu Wenruo 
---
 check/mode-lowmem.c | 28 
 1 file changed, 28 insertions(+)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index 62bcf3d2e126..e5446f9c7280 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -1417,6 +1417,8 @@ static int check_file_extent(struct btrfs_root *root, 
struct btrfs_key *fkey,
u64 csum_found; /* In byte size, sectorsize aligned */
u64 search_start;   /* Logical range start we search for csum */
u64 search_len; /* Logical range len we search for csum */
+   u32 max_inline_extent_size = min_t(u32, root->fs_info->sectorsize - 1,
+   BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info));
unsigned int extent_type;
unsigned int is_hole;
int compressed = 0;
@@ -1440,6 +1442,32 @@ static int check_file_extent(struct btrfs_root *root, 
struct btrfs_key *fkey,
root->objectid, fkey->objectid, fkey->offset);
err |= FILE_EXTENT_ERROR;
}
+   if (compressed) {
+   if (extent_num_bytes > root->fs_info->sectorsize) {
+   error(
+"root %llu EXTENT_DATA[%llu %llu] too large inline extent ram size, have %llu, 
max: %u",
+   root->objectid, fkey->objectid,
+   fkey->offset, extent_num_bytes,
+   root->fs_info->sectorsize - 1);
+   err |= FILE_EXTENT_ERROR;
+   }
+   if (item_inline_len > max_inline_extent_size) {
+   error(
+"root %llu EXTENT_DATA[%llu %llu] too large inline extent on-disk size, have 
%u, max: %u",
+   root->objectid, fkey->objectid,
+   fkey->offset, item_inline_len,
+   max_inline_extent_size);
+   err |= FILE_EXTENT_ERROR;
+   }
+   } else {
+   if (extent_num_bytes > max_inline_extent_size) {
+   error(
+ "root %llu EXTENT_DATA[%llu %llu] too large inline extent size, have %llu, 
max: %u",
+   root->objectid, fkey->objectid, fkey->offset,
+   extent_num_bytes, max_inline_extent_size);
+   err |= FILE_EXTENT_ERROR;
+   }
+   }
if (!compressed && extent_num_bytes != item_inline_len) {
error(
"root %llu EXTENT_DATA[%llu %llu] wrong inline size, have: 
%llu, expected: %u",
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 2/6] btrfs-progs: mkfs/rootdir: Fix inline extent creation check

2018-03-20 Thread Qu Wenruo
Just like convert, we need extra check against sector size for creating
inline extent.

Signed-off-by: Qu Wenruo 
---
 mkfs/rootdir.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mkfs/rootdir.c b/mkfs/rootdir.c
index e06b65ac13e4..a1d223a2408a 100644
--- a/mkfs/rootdir.c
+++ b/mkfs/rootdir.c
@@ -139,7 +139,8 @@ static int fill_inode_item(struct btrfs_trans_handle *trans,
}
if (S_ISREG(src->st_mode)) {
btrfs_set_stack_inode_size(dst, (u64)src->st_size);
-   if (src->st_size <= BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info))
+   if (src->st_size <= BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info) &&
+   src->st_size < sectorsize)
btrfs_set_stack_inode_nbytes(dst, src->st_size);
else {
blocks = src->st_size / sectorsize;
@@ -327,7 +328,8 @@ static int add_file_items(struct btrfs_trans_handle *trans,
if (st->st_size % sectorsize)
blocks += 1;
 
-   if (st->st_size <= BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info)) {
+   if (st->st_size <= BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info) &&
+   st->st_size < sectorsize) {
char *buffer = malloc(st->st_size);
 
if (!buffer) {
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 3/6] btrfs-progs: check/original mode: Check inline extent size

2018-03-20 Thread Qu Wenruo
For inline compressed file extent, kernel doesn't allow inline extent
ram size larger than sector size and on-disk inline extent size should
not exceed BTRFS_MAX_INLINE_DATA_SIZE().

For inline uncompressed file extent, kernel doesn't allow inline extent
ram and on-disk size larger than either BTRFS_MAX_INLINE_DATA_SIZE() or
sector size.

Check it in original mode.

Signed-off-by: Qu Wenruo 
---
 check/main.c  | 16 
 check/mode-original.h |  1 +
 2 files changed, 17 insertions(+)

diff --git a/check/main.c b/check/main.c
index 97baae583f04..7821faa929a3 100644
--- a/check/main.c
+++ b/check/main.c
@@ -560,6 +560,8 @@ static void print_inode_error(struct btrfs_root *root, 
struct inode_record *rec)
fprintf(stderr, ", bad file extent");
if (errors & I_ERR_FILE_EXTENT_OVERLAP)
fprintf(stderr, ", file extent overlap");
+   if (errors & I_ERR_FILE_EXTENT_TOO_LARGE)
+   fprintf(stderr, ", inline file extent too large");
if (errors & I_ERR_FILE_EXTENT_DISCOUNT)
fprintf(stderr, ", file extent discount");
if (errors & I_ERR_DIR_ISIZE_WRONG)
@@ -1433,6 +1435,8 @@ static int process_file_extent(struct btrfs_root *root,
u64 disk_bytenr = 0;
u64 extent_offset = 0;
u64 mask = root->fs_info->sectorsize - 1;
+   u32 max_inline_size = min_t(u32, mask,
+   BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info));
int extent_type;
int ret;
 
@@ -1458,9 +1462,21 @@ static int process_file_extent(struct btrfs_root *root,
extent_type = btrfs_file_extent_type(eb, fi);
 
if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
+   u8 compression = btrfs_file_extent_compression(eb, fi);
+   struct btrfs_item *item = btrfs_item_nr(slot);
+
num_bytes = btrfs_file_extent_inline_len(eb, slot, fi);
if (num_bytes == 0)
rec->errors |= I_ERR_BAD_FILE_EXTENT;
+   if (compression) {
+   if (btrfs_file_extent_inline_item_len(eb, item) >
+   max_inline_size ||
+   num_bytes > root->fs_info->sectorsize)
+   rec->errors |= I_ERR_FILE_EXTENT_TOO_LARGE;
+   } else {
+   if (num_bytes > max_inline_size)
+   rec->errors |= I_ERR_FILE_EXTENT_TOO_LARGE;
+   }
rec->found_size += num_bytes;
num_bytes = (num_bytes + mask) & ~mask;
} else if (extent_type == BTRFS_FILE_EXTENT_REG ||
diff --git a/check/mode-original.h b/check/mode-original.h
index f859af478f0f..368de692fdd1 100644
--- a/check/mode-original.h
+++ b/check/mode-original.h
@@ -185,6 +185,7 @@ struct file_extent_hole {
 #define I_ERR_SOME_CSUM_MISSING(1 << 12)
 #define I_ERR_LINK_COUNT_WRONG (1 << 13)
 #define I_ERR_FILE_EXTENT_ORPHAN   (1 << 14)
+#define I_ERR_FILE_EXTENT_TOO_LARGE(1 << 15)
 
 struct inode_record {
struct list_head backrefs;
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 5/6] btrfs-progs: test/convert: Add test case for invalid large inline data extent

2018-03-20 Thread Qu Wenruo
Signed-off-by: Qu Wenruo 
---
 .../016-invalid-large-inline-extent/test.sh| 22 ++
 1 file changed, 22 insertions(+)
 create mode 100755 tests/convert-tests/016-invalid-large-inline-extent/test.sh

diff --git a/tests/convert-tests/016-invalid-large-inline-extent/test.sh 
b/tests/convert-tests/016-invalid-large-inline-extent/test.sh
new file mode 100755
index ..f37c7c09d2e7
--- /dev/null
+++ b/tests/convert-tests/016-invalid-large-inline-extent/test.sh
@@ -0,0 +1,22 @@
+#!/bin/bash
+# Check if btrfs-convert refuses to rollback the filesystem, and leave the fs
+# and the convert image untouched
+
+source "$TEST_TOP/common"
+source "$TEST_TOP/common.convert"
+
+setup_root_helper
+prepare_test_dev
+check_prereq btrfs-convert
+check_global_prereq mke2fs
+
+convert_test_prep_fs ext4 mke2fs -t ext4 -b 4096
+
+# Create a 6K file, which should not be inlined
+run_check $SUDO_HELPER dd if=/dev/zero bs=2k count=3 of="$TEST_MNT/file1"
+
+run_check_umount_test_dev
+
+# convert_test_do_convert() will call btrfs check, which should expose any
+# invalid inline extent with too large size
+convert_test_do_convert
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 1/6] btrfs-progs: convert: Fix inline file extent creation condition

2018-03-20 Thread Qu Wenruo
[Bug]
On btrfs converted from ext*, one user reported the following kernel
warning:
 [ cut here ]
 BTRFS: Transaction aborted (error -95)
 WARNING: CPU: 0 PID: 324 at fs/btrfs/inode.c:3042 
btrfs_finish_ordered_io+0x7ab/0x850 [btrfs]
 Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
 RIP: 0010:btrfs_finish_ordered_io+0x7ab/0x850 [btrfs]
 ...
 Call Trace:
  normal_work_helper+0x39/0x370 [btrfs]
  process_one_work+0x1ce/0x410
  worker_thread+0x2b/0x3d0
  ? process_one_work+0x410/0x410
  kthread+0x113/0x130
  ? kthread_create_on_node+0x70/0x70
  ? do_syscall_64+0x74/0x190
  ? SyS_exit_group+0x10/0x10
  ret_from_fork+0x35/0x40
 ---[ end trace c8ed62ff6a525901 ]---
 BTRFS: error (device dm-2) in
btrfs_finish_ordered_io:3042: errno=-95 unknown
 BTRFS info (device dm-2): forced readonly
 BTRFS error (device dm-2): pending csums is 6447104

[Cause]
The call trace and the unique return value points to
__btrfs_drop_extents(), when we tries to drop pages of an inline extent,
we will trigger such -EOPNOTSUPP.

However kernel has limitation on the size of inline file extent
(sector size for ram size and sector size - 1for on-disk size),
btrfs-convert doesn't have the same limitation, resulting much larger
file extent.

The lack of correct inline extent size check dates back to 2008 when
btrfs-convert is added into btrfs-progs.

[Fix]
Fix the inline extent creation condition, not only using
BTRFS_MAX_INLINE_DATA_SIZE(), which is only the maximum size of inline
data according to nodesize, but also limit it against sector size.

Signed-off-by: Qu Wenruo 
---
 convert/source-ext2.c | 2 +-
 convert/source-reiserfs.c | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/convert/source-ext2.c b/convert/source-ext2.c
index b1492c78693d..e13fbe00415a 100644
--- a/convert/source-ext2.c
+++ b/convert/source-ext2.c
@@ -310,7 +310,7 @@ static int ext2_create_file_extents(struct 
btrfs_trans_handle *trans,
if (ret)
goto fail;
if ((convert_flags & CONVERT_FLAG_INLINE_DATA) && data.first_block == 0
-   && data.num_blocks > 0
+   && data.num_blocks > 0 && inode_size < sectorsize
&& inode_size <= BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info)) {
u64 num_bytes = data.num_blocks * sectorsize;
u64 disk_bytenr = data.disk_block * sectorsize;
diff --git a/convert/source-reiserfs.c b/convert/source-reiserfs.c
index 39d6f0728bd3..eeb68d962c5d 100644
--- a/convert/source-reiserfs.c
+++ b/convert/source-reiserfs.c
@@ -376,7 +376,8 @@ static int reiserfs_convert_tail(struct btrfs_trans_handle 
*trans,
u64 isize;
int ret;
 
-   if (length >= BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info))
+   if (length >= BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info) ||
+   length >= root->fs_info->sectorsize)
return convert_direct(trans, root, objectid, inode, body,
  length, offset, convert_flags);
 
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 0/6] Fix long standing -EOPNOTSUPP problem caused by large inline extent

2018-03-20 Thread Qu Wenruo
The patch is based on v4.15.1, and is designed to replace the old patch
in devel branch.

Kernel doesn't support dropping range inside inline extent, and prevents
such thing happening by limiting max inline extent size to
min(max_inline, sectorsize - 1) in cow_file_range_inline().

However btrfs-progs only inherit the BTRFS_MAX_INLINE_DATA_SIZE() macro,
which doesn't have sectorsize check.
And since btrfs-progs defaults to 16K nodesize, above macro allows large
inline extent over 15K size.

This leads to unexpected kernel behavior.

The bug exists in several parts of btrfs-progs, any tool which creates
file extent is involved, including:
1) btrfs-convert
2) mkfs --rootdir

This patchset fixes the problems in convert (both ext2 and reiserfs),
mkfs --rootdir, then add check support for both original and lowmem
mode, and finally adds 2 test cases, one for mkfs and one for convert.

For mkfs test case, it can already be exposed by misc/002, but a
pin-point test case will be much better.

changelog:
v2:
  Don't modify BTRFS_MAX_INLINE_DATA_SIZE(), but add extra check to
  callers who create file extents.
v3:
  Merge fixes for convert.
  Add real commit message for convert fixes.
  Use $TEST_TOP to replace cooperate with stand alone test cases.
  Use for loops to make the new test case shorter.

Qu Wenruo (6):
  btrfs-progs: convert: Fix inline file extent creation condition
  btrfs-progs: mkfs/rootdir: Fix inline extent creation check
  btrfs-progs: check/original mode: Check inline extent size
  btrfs-progs: check/lowmem mode: Check inline extent size
  btrfs-progs: test/convert: Add test case for invalid large inline data
extent
  btrfs-progs: test/mkfs: Add test case for rootdir inline extent size

 check/main.c   | 16 
 check/mode-lowmem.c| 28 +
 check/mode-original.h  |  1 +
 convert/source-ext2.c  |  2 +-
 convert/source-reiserfs.c  |  3 +-
 mkfs/rootdir.c |  6 ++-
 .../016-invalid-large-inline-extent/test.sh| 22 ++
 tests/mkfs-tests/014-rootdir-inline-extent/test.sh | 47 ++
 8 files changed, 121 insertions(+), 4 deletions(-)
 create mode 100755 tests/convert-tests/016-invalid-large-inline-extent/test.sh
 create mode 100755 tests/mkfs-tests/014-rootdir-inline-extent/test.sh

-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 6/6] btrfs-progs: test/mkfs: Add test case for rootdir inline extent size

2018-03-20 Thread Qu Wenruo
Add a test case for mkfs --rootdir, using files with different file
sizes to check if invalid large inline extent could exist.

Signed-off-by: Qu Wenruo 
---
 tests/mkfs-tests/014-rootdir-inline-extent/test.sh | 47 ++
 1 file changed, 47 insertions(+)
 create mode 100755 tests/mkfs-tests/014-rootdir-inline-extent/test.sh

diff --git a/tests/mkfs-tests/014-rootdir-inline-extent/test.sh 
b/tests/mkfs-tests/014-rootdir-inline-extent/test.sh
new file mode 100755
index ..167ced1e4987
--- /dev/null
+++ b/tests/mkfs-tests/014-rootdir-inline-extent/test.sh
@@ -0,0 +1,47 @@
+#!/bin/bash
+# Regression test for mkfs.btrfs --rootdir with inline file extents
+# For any large inline file extent, btrfs check could already report it
+
+source "$TEST_TOP/common"
+
+check_global_prereq fallocate
+check_prereq mkfs.btrfs
+
+prepare_test_dev
+
+tmp=$(mktemp -d --tmpdir btrfs-progs-mkfs.rootdirXXX)
+
+pagesize=$(getconf PAGESIZE)
+create_file()
+{
+   local size=$1
+   # Reuse size as filename
+   run_check fallocate -l $size "$tmp/$size"
+}
+
+test_mkfs_rootdir()
+{
+   nodesize=$1
+   run_check "$TOP/mkfs.btrfs" --nodesize $nodesize -f --rootdir "$tmp" \
+   "$TEST_DEV"
+   run_check $SUDO_HELPER "$TOP/btrfs" check "$TEST_DEV"
+}
+
+# Use power of 2 from 512 to 64K (maximum node size) as base file size
+for i in 512 1024 2048 4096 8192 16384 32768; do
+   create_file $(($i - 102))
+   # 101 is the overhead size for max inline extent
+   create_file $(($i - 101))
+   create_file $(($i - 100))
+
+   create_file $(($i - 1))
+   create_file $i
+   create_file $(($i + 1))
+done
+
+for nodesize in 4096 8192 16384 32768 65536; do
+   if [ $nodesize -ge $pagesize ]; then
+   test_mkfs_rootdir $nodesize
+   fi
+done
+rm -rf -- "$tmp"
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html