Re: [PATCH 2/3] btrfs: return required error from btrfs_check_super_csum
On 03/21/2018 10:37 AM, Anand Jain wrote: @@ -418,8 +418,10 @@ static int btrfs_check_super_csum(struct btrfs_fs_info *fs_info, crc, BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE); btrfs_csum_final(crc, result); - if (memcmp(raw_disk_sb, result, sizeof(result))) - return 1; + if (memcmp(raw_disk_sb, result, sizeof(result))) { + btrfs_err(fs_info, "superblock checksum mismatch"); + return -EINVAL; Don't we want EUCLEAN here, since an error in the checksum indicates corruption? With the v2 patch sentout here is the error str that is seen on the cli. mount: mount /dev/sdb on /btrfs failed: Structure needs cleaning And since we return -EINVAL for the error csum_type not found we get.. mount: wrong fs type, bad option, bad superblock on /dev/sdb, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so. Looks like mount syscall interprets -EINVAL error. I am ok. However would have much better if it says checksum error. Also we may need an error code to say incompatible disk format/version. Do you/anyone suggest I should update the errno.h? Thanks, Anand Thanks, Anand -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] btrfs: return required error from btrfs_check_super_csum
@@ -418,8 +418,10 @@ static int btrfs_check_super_csum(struct btrfs_fs_info *fs_info, crc, BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE); btrfs_csum_final(crc, result); - if (memcmp(raw_disk_sb, result, sizeof(result))) - return 1; + if (memcmp(raw_disk_sb, result, sizeof(result))) { + btrfs_err(fs_info, "superblock checksum mismatch"); + return -EINVAL; Don't we want EUCLEAN here, since an error in the checksum indicates corruption? With the v2 patch sentout here is the error str that is seen on the cli. mount: mount /dev/sdb on /btrfs failed: Structure needs cleaning I am ok. However would have much better if it says checksum error. Do you/anyone suggest I should update the errno.h? Thanks, Anand -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Crashes running btrfs scrub
On 2018年03月21日 01:44, Mike Stevens wrote: > >>> 30 devices is really not that much, heck you get 90 disks top load JBOD >>> storage chassis these days and BTRFS does sound like an attractive choice >>> for things like that. > >> So Mike's case is, that both metadata and data are configured as >> raid6, and the operations, balance and scrub, that he tried, need to >> set the existing block group as readonly (in order to avoid any >> further changes being applied during operations are running), then we >> got into the place where another system chunk is needed. > >> However, I think it'd be better to have some warnings about this when >> doing a) mkfs.btrfs -mraid6, b) btrfs device add. > >> David, any idea? > > I'll certainly vote for a warning, I would have set this up differently had I > been aware. > > My filesystem check seems to have returned successfully: > > [root@auswscs9903] ~ # btrfs check --readonly /dev/sdb > Checking filesystem on /dev/sdb > UUID: 77afc2bb-f7a8-4ce9-9047-c031f7571150 > checking extents > checking free space cache > checking fs roots > checking csums > checking root refs > found 97926270238720 bytes used err is 0 > total csum bytes: 95395030288 > total tree bytes: 201223503872 > total fs tree bytes: 84484636672 > total extent tree bytes: 7195869184 > btree space waste bytes: 29627784154 > file data blocks allocated: 97756261568512 > > I've remounted the filesystem and I can at least touch a file. I'm > restarting the rsync that was running when it originally went read only. > What is the next step if it drops back to r/o? As already mentioned, if you're using tons of disks and RAID0/10/5/6 as metadata profile, you can just convert your metadata (or just system) to RAID1/DUP. Then there will be more than enough space for system chunk array. Thanks, Qu > > > The information contained in this e-mail is for the exclusive use of the > intended recipient(s) and may be confidential, proprietary, and/or > legally privileged. Inadvertent disclosure of this message does not > constitute a waiver of any privilege. If you receive this message in > error, please do not directly or indirectly use, print, copy, forward, > or disclose any part of this message. Please also delete this e-mail > and all copies and notify the sender. Thank you. > > N�r��y���b�X��ǧv�^�){.n�+{�n�߲)���w*jg����ݢj/���z�ޖ��2�ޙ���&�)ߡ�a�����G���h��j:+v���w�٥ > signature.asc Description: OpenPGP digital signature
Re: [PATCH v5 0/2] Remove false-positive VLAs when using max()
On Tue, Mar 20, 2018 at 04:26:52PM -0700, Linus Torvalds wrote: > On Tue, Mar 20, 2018 at 4:23 PM, Linus Torvalds >wrote: > > > > Hmm. So thanks to the diseased mind of Martin Uecker, there's a better > > test for "__is_constant()": > > > > /* Glory to Martin Uecker */ > > #define __is_constant(a) \ > > (sizeof(int) == sizeof(*(1 ? ((void*)((a) * 0l)) : (int*)1))) > > > > that is actually *specified* by the C standard to work, and doesn't > > even depend on any gcc extensions. > > Well, it does depend on 'sizeof(*(void *)X)' being 1 and the compiler > not complaining about it, and that sizeof(int) is not 1. > > But since we depend on those things in the kernel anyway, that's fine. It also depends upon "ICE for null pointer constant purposes" having the same extensions as "ICE for enum purposes", etc., which is not obvious. Back in 2007 or so we had a long thread regarding null pointer constants in sparse; I probably still have notes from back then, but that'll take some serious digging to find ;-/ What's more, gcc definitely has odd extensions. Example I remember from back then: extern unsigned n; struct { int x : 1 + n - n; } y; is accepted. Used to be quietly accepted with -Wpedantic -std=c99, even, but that got fixed - with -Wpedantic it does, at least, warn. What is and what is not recognized is fairly random - 1 + n - n + 1 + n - n is recognized as "constant", 1 + n + n + 1 - n - n is not. Of course, neither is an ICE. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 2/2] btrfs: return required error from btrfs_check_super_csum
Return the required -EINVAL and -EUCLEAN from the function btrfs_check_super_csum(). And more the error log into the parent function. Signed-off-by: Anand Jain--- fs/btrfs/disk-io.c | 28 +--- 1 file changed, 17 insertions(+), 11 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 582ed6af3c50..4c6de2743250 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -392,9 +392,10 @@ static int verify_parent_transid(struct extent_io_tree *io_tree, /* * Return 0 if the superblock checksum type matches the checksum value of that * algorithm. Pass the raw disk superblock data. + * Otherwise: -EINVAL if csum type is not found + * -EUCLEAN if csum does not match */ -static int btrfs_check_super_csum(struct btrfs_fs_info *fs_info, - char *raw_disk_sb) +static int btrfs_check_super_csum(char *raw_disk_sb) { struct btrfs_super_block *disk_sb = (struct btrfs_super_block *)raw_disk_sb; @@ -403,11 +404,8 @@ static int btrfs_check_super_csum(struct btrfs_fs_info *fs_info, char result[sizeof(crc)]; /* We support csum type crc32 only as of now */ - if (csum_type != BTRFS_CSUM_TYPE_CRC32) { - btrfs_err(fs_info, "unsupported checksum algorithm %u", - csum_type); - return 1; - } + if (csum_type != BTRFS_CSUM_TYPE_CRC32) + return -EINVAL; /* * The super_block structure does not span the whole @@ -419,7 +417,7 @@ static int btrfs_check_super_csum(struct btrfs_fs_info *fs_info, btrfs_csum_final(crc, result); if (memcmp(raw_disk_sb, result, sizeof(result))) - return 1; + return -EUCLEAN; return 0; } @@ -2571,9 +2569,17 @@ int open_ctree(struct super_block *sb, * We want to check superblock checksum, the type is stored inside. * Pass the whole disk block of size BTRFS_SUPER_INFO_SIZE (4k). */ - if (btrfs_check_super_csum(fs_info, bh->b_data)) { - btrfs_err(fs_info, "superblock checksum mismatch"); - err = -EINVAL; + err = btrfs_check_super_csum(bh->b_data); + if (err) { + if (err == -EINVAL) + pr_err("BTRFS error (device %pg): unsupported checksum algorithm", + fs_devices->latest_bdev); + else if (err == -EUCLEAN) + pr_err("BTRFS error (device %pg): superblock checksum mismatch", + fs_devices->latest_bdev); + else + pr_err("BTRFS error (device %pg): checksum check failed %d", + fs_devices->latest_bdev, err); brelse(bh); goto fail_alloc; } -- 2.15.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 1/2] btrfs: cleanup btrfs_check_super_csum() to check the csum_type to the type
%csum_type is being checked to check the number of csum types we support, which is 1 as of now. Instead just check if the type matches with the only type we support, that is BTRFS_CSUM_TYPE_CRC32. And further adds cleanup. Signed-off-by: Anand Jain--- fs/btrfs/disk-io.c | 41 +++-- 1 file changed, 19 insertions(+), 22 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 1657d6aa4fa6..582ed6af3c50 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -399,32 +399,29 @@ static int btrfs_check_super_csum(struct btrfs_fs_info *fs_info, struct btrfs_super_block *disk_sb = (struct btrfs_super_block *)raw_disk_sb; u16 csum_type = btrfs_super_csum_type(disk_sb); - int ret = 0; - - if (csum_type == BTRFS_CSUM_TYPE_CRC32) { - u32 crc = ~(u32)0; - char result[sizeof(crc)]; - - /* -* The super_block structure does not span the whole -* BTRFS_SUPER_INFO_SIZE range, we expect that the unused space -* is filled with zeros and is included in the checksum. -*/ - crc = btrfs_csum_data(raw_disk_sb + BTRFS_CSUM_SIZE, - crc, BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE); - btrfs_csum_final(crc, result); - - if (memcmp(raw_disk_sb, result, sizeof(result))) - ret = 1; - } + u32 crc = ~(u32)0; + char result[sizeof(crc)]; - if (csum_type >= ARRAY_SIZE(btrfs_csum_sizes)) { + /* We support csum type crc32 only as of now */ + if (csum_type != BTRFS_CSUM_TYPE_CRC32) { btrfs_err(fs_info, "unsupported checksum algorithm %u", - csum_type); - ret = 1; + csum_type); + return 1; } - return ret; + /* +* The super_block structure does not span the whole +* BTRFS_SUPER_INFO_SIZE range, we expect that the unused space +* is filled with zeros and is included in the checksum. +*/ + crc = btrfs_csum_data(raw_disk_sb + BTRFS_CSUM_SIZE, + crc, BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE); + btrfs_csum_final(crc, result); + + if (memcmp(raw_disk_sb, result, sizeof(result))) + return 1; + + return 0; } /* -- 2.15.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 0/2] Preparatory to add the csum check in the scan context
v1->v2: Merge 2/3 and 3/3 Use -EUCLEAN for csum mismatch Move the error logging to the parent function Drop the struct btrfs_fsinfo as the argument for btrfs_check_super_csum() Do not make btrfs_check_super_csum() nonstatic here, it will be part of the patchset which will use the csum in the scan context Anand Jain (2): btrfs: cleanup btrfs_check_super_csum() to check the csum_type to the type btrfs: return required error from btrfs_check_super_csum fs/btrfs/disk-io.c | 57 -- 1 file changed, 30 insertions(+), 27 deletions(-) -- 2.15.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 0/2] Remove false-positive VLAs when using max()
On Tue, Mar 20, 2018 at 4:23 PM, Linus Torvaldswrote: > > Hmm. So thanks to the diseased mind of Martin Uecker, there's a better > test for "__is_constant()": > > /* Glory to Martin Uecker */ > #define __is_constant(a) \ > (sizeof(int) == sizeof(*(1 ? ((void*)((a) * 0l)) : (int*)1))) > > that is actually *specified* by the C standard to work, and doesn't > even depend on any gcc extensions. Well, it does depend on 'sizeof(*(void *)X)' being 1 and the compiler not complaining about it, and that sizeof(int) is not 1. But since we depend on those things in the kernel anyway, that's fine. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 0/2] Remove false-positive VLAs when using max()
On Sat, Mar 17, 2018 at 1:07 PM, Kees Cookwrote: > > No luck! :( gcc 4.4 refuses to play along. And, hilariously, not only > does it not change the complaint about __builtin_choose_expr(), it > also thinks that's a VLA now. Hmm. So thanks to the diseased mind of Martin Uecker, there's a better test for "__is_constant()": /* Glory to Martin Uecker */ #define __is_constant(a) \ (sizeof(int) == sizeof(*(1 ? ((void*)((a) * 0l)) : (int*)1))) that is actually *specified* by the C standard to work, and doesn't even depend on any gcc extensions. The reason is some really subtle pointer conversion rules, where the type of the ternary operator will depend on whether one of the pointers is NULL or not. And the definition of NULL, in turn, very much depends on "integer constant expression that has the value 0". Are you willing to do one final try on a generic min/max? Same as my last patch, but using the above __is_constant() test instead of __builtin_constant_p? Linus -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] btrfs: refactor btrfs_check_super_csum() for scan
On 03/20/2018 07:10 PM, Nikolay Borisov wrote: On 20.03.2018 11:53, Anand Jain wrote: In preparation to use the function btrfs_check_super_csum() for the device scan context, make it nonstatic and pass the struct block_device instead of the struct fs_info. Signed-off-by: Anand Jain--- fs/btrfs/disk-io.c | 12 ++-- fs/btrfs/disk-io.h | 1 + 2 files changed, 7 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index aafd836af61d..d2ace2dca9de 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -393,8 +393,7 @@ static int verify_parent_transid(struct extent_io_tree *io_tree, * Return 0 if the superblock checksum type matches the checksum value of that * algorithm. Pass the raw disk superblock data. */ -static int btrfs_check_super_csum(struct btrfs_fs_info *fs_info, - char *raw_disk_sb) +int btrfs_check_super_csum(struct block_device *bdev, char *raw_disk_sb) { Since this has become a public function and you've changed the fs_info parameter to taking a bdev, which is not used for anything else than printing the error I think it's appropriate to document which block device should be passed to this function. Also passing the block device only for printing purposes seems a bit odd. Its the device on which we have read the superblock. I find it odd too. Will do that pr_err on the parent function. Will add comments to the public function. Thanks, Anand -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] btrfs: return required error from btrfs_check_super_csum
@@ -418,8 +418,10 @@ static int btrfs_check_super_csum(struct btrfs_fs_info *fs_info, crc, BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE); btrfs_csum_final(crc, result); - if (memcmp(raw_disk_sb, result, sizeof(result))) - return 1; + if (memcmp(raw_disk_sb, result, sizeof(result))) { + btrfs_err(fs_info, "superblock checksum mismatch"); + return -EINVAL; Don't we want EUCLEAN here, since an error in the checksum indicates corruption? yeah, I struggled with it before. Strangely there isn't an error code for the csum error when csum being widely used in the computer systems. Our closest are: #define EFAULT 14 /* Bad address */ #define EUCLEAN 135 /* Structure needs cleaning */ Will use the suggested EUCLEAN. Thanks, Anand -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Kernel warning - not sure if this is important
On 03/19/2018 08:34 AM, Nikolay Borisov wrote: > You are hitting a warning that was removed in : c8195a7b1ad5 ("btrfs: > remove spurious WARN_ON(ref->count < 0) in find_parent_nodes") > > > In short - you can disregard it. However, it's likely you will continue > seeing it in the future unless you update to 4.16 (the commit is not > tagged for stable )) > Thank you, much appreciated. I think I can manage to wait for 4.16! Pete -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Allow non-privileged user to delete empty subvolume by default
On 03/20/2018 07:45 AM, Misono, Tomohiro wrote: > Deletion of subvolume by non-privileged user is completely restricted > by default because we can delete a subvolume even if it is not empty > and may cause data loss. In other words, when user_subvol_rm_allowed > mount option is used, a user can delete a subvolume containing the > directory which cannot be deleted directly by the user. > > However, there should be no harm to allow users to delete empty subvolumes > when rmdir(2) would have been allowed if they were normal directories. > This patch allows deletion of empty subvolume by default. Instead of modifying the ioctl, what about allowing rmdir(2) to work for an _empty_ subvolume (and all the permission check are satisfied) ? > > Note that user_subvol_rm_allowed option requires write+exec permission > of the subvolume to be deleted, but they are not required for empty > subvolume. > > The comment in the code is also updated accordingly. > > Signed-off-by: Tomohiro Misono> --- > fs/btrfs/ioctl.c | 55 +++ > 1 file changed, 31 insertions(+), 24 deletions(-) > > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c > index 111ee282b777..838406a7a7f5 100644 > --- a/fs/btrfs/ioctl.c > +++ b/fs/btrfs/ioctl.c > @@ -2366,36 +2366,43 @@ static noinline int btrfs_ioctl_snap_destroy(struct > file *file, > dest = BTRFS_I(inode)->root; > if (!capable(CAP_SYS_ADMIN)) { > /* > - * Regular user. Only allow this with a special mount > - * option, when the user has write+exec access to the > - * subvol root, and when rmdir(2) would have been > - * allowed. > + * By default, regular user is only allowed to delete > + * empty subvols when rmdir(2) would have been allowed > + * if they were normal directories. >* > - * Note that this is _not_ check that the subvol is > - * empty or doesn't contain data that we wouldn't > + * If the mount option 'user_subvol_rm_allowed' is set, > + * it allows users to delete non-empty subvols when the > + * user has write+exec access to the subvol root and when > + * rmdir(2) would have been allowed (except the emptiness > + * check). > + * > + * Note that this option does _not_ check that if the subvol > + * is empty or doesn't contain data that the user wouldn't >* otherwise be able to delete. >* > - * Users who want to delete empty subvols should try > - * rmdir(2). > + * Users who want to delete empty subvols created by > + * snapshot (ino number == 2) can use rmdir(2). >*/ > - err = -EPERM; > - if (!btrfs_test_opt(fs_info, USER_SUBVOL_RM_ALLOWED)) > - goto out_dput; > + err = -ENOTEMPTY; > + if (inode->i_size != BTRFS_EMPTY_DIR_SIZE) { > + if (!btrfs_test_opt(fs_info, USER_SUBVOL_RM_ALLOWED)) > + goto out_dput; > > - /* > - * Do not allow deletion if the parent dir is the same > - * as the dir to be deleted. That means the ioctl > - * must be called on the dentry referencing the root > - * of the subvol, not a random directory contained > - * within it. > - */ > - err = -EINVAL; > - if (root == dest) > - goto out_dput; > + /* > + * Do not allow deletion if the parent dir is the same > + * as the dir to be deleted. That means the ioctl > + * must be called on the dentry referencing the root > + * of the subvol, not a random directory contained > + * within it. > + */ > + err = -EINVAL; > + if (root == dest) > + goto out_dput; > > - err = inode_permission(inode, MAY_WRITE | MAY_EXEC); > - if (err) > - goto out_dput; > + err = inode_permission(inode, MAY_WRITE | MAY_EXEC); > + if (err) > + goto out_dput; > + } > } > > /* check if subvolume may be deleted by a user */ > -- gpg @keyserver.linux.it: Goffredo Baroncelli Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Crashes running btrfs scrub
On 03/19/2018 07:06 PM, Liu Bo wrote: [...] > > So Mike's case is, that both metadata and data are configured as > raid6, and the operations, balance and scrub, that he tried, need to > set the existing block group as readonly (in order to avoid any > further changes being applied during operations are running), then we > got into the place where another system chunk is needed. > > However, I think it'd be better to have some warnings about this when > doing a) mkfs.btrfs -mraid6, b) btrfs device add. What about in the kernel dmesg during the mount > > David, any idea? > > thanks, > liubo > -- gpg @keyserver.linux.it: Goffredo Baroncelli Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] btrfs: defer adding raid type kobject until after chunk relocation
From: Jeff MahoneyAny time the first block group of a new type is created, we add a new kobject to sysfs to hold the attributes for that type. Kobject-internal allocations always use GFP_KERNEL, making them prone to fs-reclaim races. While it appears as if this can occur any time a block group is created, the only times the first block group of a new type can be created in memory is at mount and when we create the first new block group during raid conversion. This patch adds a new list to track pending kobject additions and then handles them after we do chunk relocation. Between relocating the target chunk (or forcing allocation of a new chunk in the case of data) and removing the old chunk, we're in a safe place for fs-reclaim to occur. We're holding the volume mutex, which is already held across page faults, and the delete_unused_bgs_mutex, which will only stall the cleaner thread. Signed-off-by: Jeff Mahoney --- fs/btrfs/ctree.h | 6 - fs/btrfs/disk-io.c | 2 ++ fs/btrfs/extent-tree.c | 59 +++--- fs/btrfs/sysfs.c | 2 +- fs/btrfs/volumes.c | 12 ++ 5 files changed, 61 insertions(+), 20 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index ffbb05aa66fa..75dbdf1bbead 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -381,8 +381,9 @@ struct btrfs_dev_replace { /* For raid type sysfs entries */ struct raid_kobject { - int raid_type; + u64 flags; struct kobject kobj; + struct list_head list; }; struct btrfs_space_info { @@ -938,6 +939,8 @@ struct btrfs_fs_info { int thread_pool_size; struct kobject *space_info_kobj; + struct list_head pending_raid_kobjs; + spinlock_t pending_raid_kobjs_lock; /* uncontended */ u64 total_pinned; @@ -2688,6 +2691,7 @@ int btrfs_can_relocate(struct btrfs_fs_info *fs_info, u64 bytenr); int btrfs_make_block_group(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info, u64 bytes_used, u64 type, u64 chunk_offset, u64 size); +void btrfs_add_raid_kobjects(struct btrfs_fs_info *fs_info); struct btrfs_trans_handle *btrfs_start_trans_remove_block_group( struct btrfs_fs_info *fs_info, const u64 chunk_offset); diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index eb6bb3169a9e..d5e1c2ff71ff 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2447,6 +2447,8 @@ int open_ctree(struct super_block *sb, INIT_LIST_HEAD(_info->delayed_iputs); INIT_LIST_HEAD(_info->delalloc_roots); INIT_LIST_HEAD(_info->caching_block_groups); + INIT_LIST_HEAD(_info->pending_raid_kobjs); + spin_lock_init(_info->pending_raid_kobjs_lock); spin_lock_init(_info->delalloc_root_lock); spin_lock_init(_info->trans_lock); spin_lock_init(_info->fs_roots_radix_lock); diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 0e9a21230217..bb5368faa937 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -9908,9 +9908,39 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info) return 0; } +/* link_block_group will queue up kobjects to add when we're reclaim-safe */ +void btrfs_add_raid_kobjects(struct btrfs_fs_info *fs_info) +{ + struct btrfs_space_info *space_info; + struct raid_kobject *rkobj; + LIST_HEAD(list); + int index; + int ret = 0; + + spin_lock(_info->pending_raid_kobjs_lock); + list_splice_init(_info->pending_raid_kobjs, ); + spin_unlock(_info->pending_raid_kobjs_lock); + + list_for_each_entry(rkobj, , list) { + space_info = __find_space_info(fs_info, rkobj->flags); + index = __get_raid_index(rkobj->flags); + + ret = kobject_add(>kobj, _info->kobj, + "%s", get_raid_name(index)); + if (ret) { + kobject_put(>kobj); + break; + } + } + if (ret) + btrfs_warn(fs_info, + "failed to add kobject for block cache, ignoring"); +} + static void link_block_group(struct btrfs_block_group_cache *cache) { struct btrfs_space_info *space_info = cache->space_info; + struct btrfs_fs_info *fs_info = cache->fs_info; int index = get_block_group_index(cache); bool first = false; @@ -9921,27 +9951,19 @@ static void link_block_group(struct btrfs_block_group_cache *cache) up_write(_info->groups_sem); if (first) { - struct raid_kobject *rkobj; - int ret; - - rkobj = kzalloc(sizeof(*rkobj), GFP_NOFS); - if (!rkobj) - goto out_err; - rkobj->raid_type = index; - kobject_init(>kobj, _raid_ktype);
[PATCH 1/2] btrfs: remove dead create_space_info calls
From: Jeff MahoneySince commit 2be12ef79 (btrfs: Separate space_info create/update), we've separated out the creation and updating of the space info structures. That commit was a straightforward refactoring of the two parts of update_space_info, but we can go a step further. Since commits c59021f84 (Btrfs: fix OOPS of empty filesystem after balance) and b742bb82f (Btrfs: Link block groups of different raid types), we know that the space_info structures will be created at mount and there will only ever be, at most, three of them. This patch cleans out the create_space_info calls after __find_space_info returns NULL since __find_space_info *can't* return NULL. The initial cause for reviewing this was the kobject_add calls from create_space_info occuring in sites where fs-reclaim wasn't allowed. Now we are certain they occur only early in the mount process and are safe. Signed-off-by: Jeff Mahoney --- fs/btrfs/ctree.h | 6 +++--- fs/btrfs/extent-tree.c | 16 ++-- 2 files changed, 5 insertions(+), 17 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index da308774b8a4..ffbb05aa66fa 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -952,9 +952,9 @@ struct btrfs_fs_info { struct btrfs_fs_devices *fs_devices; /* -* the space_info list is almost entirely read only. It only changes -* when we add a new raid type to the FS, and that happens -* very rarely. RCU is used to protect it. +* The space_info list is effectively read only after initial +* setup. It is populated at mount time and cleaned up after +* all block groups are removed. RCU is used to protect it. */ struct list_head space_info; diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 2d5e963fae88..0e9a21230217 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -4603,11 +4603,7 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans, return -ENOSPC; space_info = __find_space_info(fs_info, flags); - if (!space_info) { - ret = create_space_info(fs_info, flags, _info); - if (ret) - return ret; - } + ASSERT(space_info); again: spin_lock(_info->lock); @@ -10255,15 +10251,7 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, * with its ->space_info set. */ cache->space_info = __find_space_info(fs_info, cache->flags); - if (!cache->space_info) { - ret = create_space_info(fs_info, cache->flags, - >space_info); - if (ret) { - btrfs_remove_free_space_cache(cache); - btrfs_put_block_group(cache); - return ret; - } - } + ASSERT(cache->space_info); ret = btrfs_add_block_group_cache(fs_info, cache); if (ret) { -- 2.15.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: Crashes running btrfs scrub
>> 30 devices is really not that much, heck you get 90 disks top load JBOD >> storage chassis these days and BTRFS does sound like an attractive choice >> for things like that. > So Mike's case is, that both metadata and data are configured as > raid6, and the operations, balance and scrub, that he tried, need to > set the existing block group as readonly (in order to avoid any > further changes being applied during operations are running), then we > got into the place where another system chunk is needed. > However, I think it'd be better to have some warnings about this when > doing a) mkfs.btrfs -mraid6, b) btrfs device add. > David, any idea? I'll certainly vote for a warning, I would have set this up differently had I been aware. My filesystem check seems to have returned successfully: [root@auswscs9903] ~ # btrfs check --readonly /dev/sdb Checking filesystem on /dev/sdb UUID: 77afc2bb-f7a8-4ce9-9047-c031f7571150 checking extents checking free space cache checking fs roots checking csums checking root refs found 97926270238720 bytes used err is 0 total csum bytes: 95395030288 total tree bytes: 201223503872 total fs tree bytes: 84484636672 total extent tree bytes: 7195869184 btree space waste bytes: 29627784154 file data blocks allocated: 97756261568512 I've remounted the filesystem and I can at least touch a file. I'm restarting the rsync that was running when it originally went read only. What is the next step if it drops back to r/o? The information contained in this e-mail is for the exclusive use of the intended recipient(s) and may be confidential, proprietary, and/or legally privileged. Inadvertent disclosure of this message does not constitute a waiver of any privilege. If you receive this message in error, please do not directly or indirectly use, print, copy, forward, or disclose any part of this message. Please also delete this e-mail and all copies and notify the sender. Thank you.
Re: [PATCH] btrfs-progs: Beautify owner when printing leaf/nodes
On 2018年03月20日 16:45, Nikolay Borisov wrote: > Currently we print the raw values of the owner field of leaf/nodes. > This can result in output like the following: > > leaf 30490624 items 2 free space 16061 generation 4 owner 18446744073709551607 > > With the patch applied the same leaf looks like: > > leaf 30490624 items 2 free space 16061 generation 4 owner DATA_RELOC_TREE > > Signed-off-by: Nikolay BorisovLooks good. Reviewed-by: Qu Wenruo Thanks, Qu > --- > print-tree.c | 14 -- > 1 file changed, 8 insertions(+), 6 deletions(-) > > diff --git a/print-tree.c b/print-tree.c > index d59f9002f54c..a2f6bfc027c9 100644 > --- a/print-tree.c > +++ b/print-tree.c > @@ -1188,11 +1188,12 @@ void btrfs_print_leaf(struct btrfs_root *root, struct > extent_buffer *eb) > header_flags_to_str(flags, flags_str); > nr = btrfs_header_nritems(eb); > > - printf("leaf %llu items %d free space %d generation %llu owner %llu\n", > + printf("leaf %llu items %d free space %d generation %llu owner ", > (unsigned long long)btrfs_header_bytenr(eb), nr, > btrfs_leaf_free_space(root, eb), > - (unsigned long long)btrfs_header_generation(eb), > - (unsigned long long)btrfs_header_owner(eb)); > + (unsigned long long)btrfs_header_generation(eb)); > + print_objectid(stdout, btrfs_header_owner(eb), 0); > + printf("\n"); > printf("leaf %llu flags 0x%llx(%s) backref revision %d\n", > btrfs_header_bytenr(eb), flags, flags_str, backref_rev); > print_uuids(eb); > @@ -1365,12 +1366,13 @@ void btrfs_print_tree(struct btrfs_root *root, struct > extent_buffer *eb, int fol > btrfs_print_leaf(root, eb); > return; > } > - printf("node %llu level %d items %d free %u generation %llu owner > %llu\n", > + printf("node %llu level %d items %d free %u generation %llu owner ", > (unsigned long long)eb->start, > btrfs_header_level(eb), nr, > (u32)BTRFS_NODEPTRS_PER_BLOCK(root->fs_info) - nr, > - (unsigned long long)btrfs_header_generation(eb), > - (unsigned long long)btrfs_header_owner(eb)); > + (unsigned long long)btrfs_header_generation(eb)); > + print_objectid(stdout, btrfs_header_owner(eb), 0); > + printf("\n"); > print_uuids(eb); > fflush(stdout); > for (i = 0; i < nr; i++) { > signature.asc Description: OpenPGP digital signature
Re: [PATCH 3/3] btrfs: refactor btrfs_check_super_csum() for scan
On 20.03.2018 11:53, Anand Jain wrote: > In preparation to use the function btrfs_check_super_csum() for > the device scan context, make it nonstatic and pass the > struct block_device instead of the struct fs_info. > > Signed-off-by: Anand Jain> --- > fs/btrfs/disk-io.c | 12 ++-- > fs/btrfs/disk-io.h | 1 + > 2 files changed, 7 insertions(+), 6 deletions(-) > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index aafd836af61d..d2ace2dca9de 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -393,8 +393,7 @@ static int verify_parent_transid(struct extent_io_tree > *io_tree, > * Return 0 if the superblock checksum type matches the checksum value of > that > * algorithm. Pass the raw disk superblock data. > */ > -static int btrfs_check_super_csum(struct btrfs_fs_info *fs_info, > - char *raw_disk_sb) > +int btrfs_check_super_csum(struct block_device *bdev, char *raw_disk_sb) > { Since this has become a public function and you've changed the fs_info parameter to taking a bdev, which is not used for anything else than printing the error I think it's appropriate to document which block device should be passed to this function. Also passing the block device only for printing purposes seems a bit odd. > struct btrfs_super_block *disk_sb = > (struct btrfs_super_block *)raw_disk_sb; > @@ -404,8 +403,8 @@ static int btrfs_check_super_csum(struct btrfs_fs_info > *fs_info, > > /* We support csum type crc32 only as of now */ > if (csum_type != BTRFS_CSUM_TYPE_CRC32) { > - btrfs_err(fs_info, "unsupported checksum algorithm %u", > - csum_type); > + pr_err("BTRFS error (device %pg): unsupported checksum > algorithm %u", > + bdev, csum_type); > return -EINVAL; > } > > @@ -419,7 +418,8 @@ static int btrfs_check_super_csum(struct btrfs_fs_info > *fs_info, > btrfs_csum_final(crc, result); > > if (memcmp(raw_disk_sb, result, sizeof(result))) { > - btrfs_err(fs_info, "superblock checksum mismatch"); > + pr_err("BTRFS error (device %pg): superblock checksum mismatch", > + bdev); > return -EINVAL; > } > > @@ -2573,7 +2573,7 @@ int open_ctree(struct super_block *sb, >* We want to check superblock checksum, the type is stored inside. >* Pass the whole disk block of size BTRFS_SUPER_INFO_SIZE (4k). >*/ > - err = btrfs_check_super_csum(fs_info, bh->b_data); > + err = btrfs_check_super_csum(fs_devices->latest_bdev, bh->b_data); > if (err) { > brelse(bh); > goto fail_alloc; > diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h > index 70a88d61b547..1427fa1181d4 100644 > --- a/fs/btrfs/disk-io.h > +++ b/fs/btrfs/disk-io.h > @@ -69,6 +69,7 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int > max_mirrors); > struct buffer_head *btrfs_read_dev_super(struct block_device *bdev); > int btrfs_read_dev_one_super(struct block_device *bdev, int copy_num, > struct buffer_head **bh_ret); > +int btrfs_check_super_csum(struct block_device *bdev, char *raw_disk_sb); > int btrfs_commit_super(struct btrfs_fs_info *fs_info); > struct btrfs_root *btrfs_read_fs_root(struct btrfs_root *tree_root, > struct btrfs_key *location); > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Validate child tree block's level and first key
Hi Qu, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on v4.16-rc4] [also build test WARNING on next-20180319] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Qu-Wenruo/btrfs-Validate-child-tree-block-s-level-and-first-key/20180320-054353 reproduce: # apt-get install sparse make ARCH=x86_64 allmodconfig make C=1 CF=-D__CHECK_ENDIAN__ sparse warnings: (new ones prefixed by >>) >> fs/btrfs/ref-verify.c:586:45: sparse: not enough arguments for function >> read_tree_block fs/btrfs/ref-verify.c:284:27: sparse: context imbalance in 'add_block_entry' - different lock contexts for basic block fs/btrfs/ref-verify.c:371:20: sparse: context imbalance in 'add_tree_block' - unexpected unlock fs/btrfs/ref-verify.c:396:28: sparse: context imbalance in 'add_shared_data_ref' - unexpected unlock fs/btrfs/ref-verify.c:434:28: sparse: context imbalance in 'add_extent_data_ref' - unexpected unlock fs/btrfs/ref-verify.c:892:20: sparse: context imbalance in 'btrfs_ref_tree_mod' - unexpected unlock fs/btrfs/ref-verify.c: In function 'walk_down_tree': fs/btrfs/ref-verify.c:586:9: error: too few arguments to function 'read_tree_block' eb = read_tree_block(fs_info, block_bytenr, gen); ^~~ In file included from fs/btrfs/ref-verify.c:22:0: fs/btrfs/disk-io.h:55:23: note: declared here struct extent_buffer *read_tree_block(struct btrfs_fs_info *fs_info, u64 bytenr, ^~~ vim +586 fs/btrfs/ref-verify.c fd708b81 Josef Bacik 2017-09-29 570 fd708b81 Josef Bacik 2017-09-29 571 /* Walk down to the leaf from the given level */ fd708b81 Josef Bacik 2017-09-29 572 static int walk_down_tree(struct btrfs_root *root, struct btrfs_path *path, fd708b81 Josef Bacik 2017-09-29 573 int level, u64 *bytenr, u64 *num_bytes) fd708b81 Josef Bacik 2017-09-29 574 { fd708b81 Josef Bacik 2017-09-29 575struct btrfs_fs_info *fs_info = root->fs_info; fd708b81 Josef Bacik 2017-09-29 576struct extent_buffer *eb; fd708b81 Josef Bacik 2017-09-29 577u64 block_bytenr, gen; fd708b81 Josef Bacik 2017-09-29 578int ret = 0; fd708b81 Josef Bacik 2017-09-29 579 fd708b81 Josef Bacik 2017-09-29 580while (level >= 0) { fd708b81 Josef Bacik 2017-09-29 581if (level) { fd708b81 Josef Bacik 2017-09-29 582block_bytenr = btrfs_node_blockptr(path->nodes[level], fd708b81 Josef Bacik 2017-09-29 583 path->slots[level]); fd708b81 Josef Bacik 2017-09-29 584gen = btrfs_node_ptr_generation(path->nodes[level], fd708b81 Josef Bacik 2017-09-29 585 path->slots[level]); fd708b81 Josef Bacik 2017-09-29 @586eb = read_tree_block(fs_info, block_bytenr, gen); fd708b81 Josef Bacik 2017-09-29 587if (IS_ERR(eb)) fd708b81 Josef Bacik 2017-09-29 588return PTR_ERR(eb); fd708b81 Josef Bacik 2017-09-29 589if (!extent_buffer_uptodate(eb)) { fd708b81 Josef Bacik 2017-09-29 590 free_extent_buffer(eb); fd708b81 Josef Bacik 2017-09-29 591return -EIO; fd708b81 Josef Bacik 2017-09-29 592} fd708b81 Josef Bacik 2017-09-29 593 btrfs_tree_read_lock(eb); fd708b81 Josef Bacik 2017-09-29 594 btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK); fd708b81 Josef Bacik 2017-09-29 595path->nodes[level-1] = eb; fd708b81 Josef Bacik 2017-09-29 596path->slots[level-1] = 0; fd708b81 Josef Bacik 2017-09-29 597path->locks[level-1] = BTRFS_READ_LOCK_BLOCKING; fd708b81 Josef Bacik 2017-09-29 598} else { fd708b81 Josef Bacik 2017-09-29 599ret = process_leaf(root, path, bytenr, num_bytes); fd708b81 Josef Bacik 2017-09-29 600if (ret) fd708b81 Josef Bacik 2017-09-29 601break; fd708b81 Josef Bacik 2017-09-29 602} fd708b81 Josef Bacik 2017-09-29 603level--; fd708b81 Josef Bacik 2017-09-29 604} fd708b81 Josef Bacik 2017-09-29 605return ret; fd708b81 Josef Bacik 2017-09-29 606 } fd708b81 Josef Bacik 2017-09-29 607 :: The code at line 586 was first introduced by commit :: fd708b81d972a0714b02a60eb4792fdbf15868c4 Btrfs: add a extent ref verify tool :: TO: Josef Bacik <jo...@toxicpanda.com> :: CC: David Sterba <dste...@suse.com> --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Co
Re: [PATCH 1/3] btrfs: cleanup btrfs_check_super_csum() to check the csum_type to the type
On 20.03.2018 12:11, Nikolay Borisov wrote: > > > On 20.03.2018 11:53, Anand Jain wrote: >> %csum_type is being checked to check the number of csum types we support, >> which is 1 as of now. Instead just check if the type matches with the >> only type we support, that is BTRFS_CSUM_TYPE_CRC32. And further adds >> cleanup. >> >> Signed-off-by: Anand Jain>> --- >> fs/btrfs/disk-io.c | 41 +++-- >> 1 file changed, 19 insertions(+), 22 deletions(-) >> >> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c >> index 1657d6aa4fa6..582ed6af3c50 100644 >> --- a/fs/btrfs/disk-io.c >> +++ b/fs/btrfs/disk-io.c >> @@ -399,32 +399,29 @@ static int btrfs_check_super_csum(struct btrfs_fs_info >> *fs_info, >> struct btrfs_super_block *disk_sb = >> (struct btrfs_super_block *)raw_disk_sb; >> u16 csum_type = btrfs_super_csum_type(disk_sb); >> -int ret = 0; >> - >> -if (csum_type == BTRFS_CSUM_TYPE_CRC32) { >> -u32 crc = ~(u32)0; >> -char result[sizeof(crc)]; >> - >> -/* >> - * The super_block structure does not span the whole >> - * BTRFS_SUPER_INFO_SIZE range, we expect that the unused space >> - * is filled with zeros and is included in the checksum. >> - */ >> -crc = btrfs_csum_data(raw_disk_sb + BTRFS_CSUM_SIZE, >> -crc, BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE); >> -btrfs_csum_final(crc, result); >> - >> -if (memcmp(raw_disk_sb, result, sizeof(result))) >> -ret = 1; >> -} >> +u32 crc = ~(u32)0; >> +char result[sizeof(crc)]; >> >> -if (csum_type >= ARRAY_SIZE(btrfs_csum_sizes)) { >> +/* We support csum type crc32 only as of now */ >> +if (csum_type != BTRFS_CSUM_TYPE_CRC32) { >> btrfs_err(fs_info, "unsupported checksum algorithm %u", >> -csum_type); >> -ret = 1; >> + csum_type); >> +return 1; > > I think it would make sense to return EINVAL here. Then the sole caller > of this function (open_ctree) can use the code a bit more idiomatically: > > ret = btrfs_check_super_csum() > > if (ret < 0) { > handle failure > } > > Right now we return 1 on both when the function fails due to invalid CRC > algorithm and if the checksum doesn't match. And we are going to print 2 > things: unsupported checksum algorithm and the > btrfs_err(fs_info, "superblock checksum mismatch");from open_ctree. > > Let's implemented proper error returning strategy for this function. > Disregard, you have already implemented something like that in 2/3. I will comment there how it can be improved. > > >> } >> >> -return ret; >> +/* >> + * The super_block structure does not span the whole >> + * BTRFS_SUPER_INFO_SIZE range, we expect that the unused space >> + * is filled with zeros and is included in the checksum. >> + */ >> +crc = btrfs_csum_data(raw_disk_sb + BTRFS_CSUM_SIZE, >> + crc, BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE); >> +btrfs_csum_final(crc, result); >> + >> +if (memcmp(raw_disk_sb, result, sizeof(result))) >> +return 1; > > Perhaps return EUCLEAN i.e something is corrupted hence the checksum > didn't pan out as we expected. > >> + >> +return 0; >> } >> >> /* >> > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] btrfs: return required error from btrfs_check_super_csum
On 20.03.2018 11:53, Anand Jain wrote: > Return the required -EINVAL from the function btrfs_check_super_csum() > and move the error logging into the btrfs_check_super_csum() itself. > > Signed-off-by: Anand Jain> --- > fs/btrfs/disk-io.c | 13 +++-- > 1 file changed, 7 insertions(+), 6 deletions(-) > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index 582ed6af3c50..aafd836af61d 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -406,7 +406,7 @@ static int btrfs_check_super_csum(struct btrfs_fs_info > *fs_info, > if (csum_type != BTRFS_CSUM_TYPE_CRC32) { > btrfs_err(fs_info, "unsupported checksum algorithm %u", > csum_type); > - return 1; > + return -EINVAL; > } > > /* > @@ -418,8 +418,10 @@ static int btrfs_check_super_csum(struct btrfs_fs_info > *fs_info, > crc, BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE); > btrfs_csum_final(crc, result); > > - if (memcmp(raw_disk_sb, result, sizeof(result))) > - return 1; > + if (memcmp(raw_disk_sb, result, sizeof(result))) { > + btrfs_err(fs_info, "superblock checksum mismatch"); > + return -EINVAL; Don't we want EUCLEAN here, since an error in the checksum indicates corruption? > + } > > return 0; > } > @@ -2571,9 +2573,8 @@ int open_ctree(struct super_block *sb, >* We want to check superblock checksum, the type is stored inside. >* Pass the whole disk block of size BTRFS_SUPER_INFO_SIZE (4k). >*/ > - if (btrfs_check_super_csum(fs_info, bh->b_data)) { > - btrfs_err(fs_info, "superblock checksum mismatch"); > - err = -EINVAL; > + err = btrfs_check_super_csum(fs_info, bh->b_data); > + if (err) { > brelse(bh); > goto fail_alloc; > } > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] btrfs: cleanup btrfs_check_super_csum() to check the csum_type to the type
On 20.03.2018 11:53, Anand Jain wrote: > %csum_type is being checked to check the number of csum types we support, > which is 1 as of now. Instead just check if the type matches with the > only type we support, that is BTRFS_CSUM_TYPE_CRC32. And further adds > cleanup. > > Signed-off-by: Anand Jain> --- > fs/btrfs/disk-io.c | 41 +++-- > 1 file changed, 19 insertions(+), 22 deletions(-) > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index 1657d6aa4fa6..582ed6af3c50 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -399,32 +399,29 @@ static int btrfs_check_super_csum(struct btrfs_fs_info > *fs_info, > struct btrfs_super_block *disk_sb = > (struct btrfs_super_block *)raw_disk_sb; > u16 csum_type = btrfs_super_csum_type(disk_sb); > - int ret = 0; > - > - if (csum_type == BTRFS_CSUM_TYPE_CRC32) { > - u32 crc = ~(u32)0; > - char result[sizeof(crc)]; > - > - /* > - * The super_block structure does not span the whole > - * BTRFS_SUPER_INFO_SIZE range, we expect that the unused space > - * is filled with zeros and is included in the checksum. > - */ > - crc = btrfs_csum_data(raw_disk_sb + BTRFS_CSUM_SIZE, > - crc, BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE); > - btrfs_csum_final(crc, result); > - > - if (memcmp(raw_disk_sb, result, sizeof(result))) > - ret = 1; > - } > + u32 crc = ~(u32)0; > + char result[sizeof(crc)]; > > - if (csum_type >= ARRAY_SIZE(btrfs_csum_sizes)) { > + /* We support csum type crc32 only as of now */ > + if (csum_type != BTRFS_CSUM_TYPE_CRC32) { > btrfs_err(fs_info, "unsupported checksum algorithm %u", > - csum_type); > - ret = 1; > + csum_type); > + return 1; I think it would make sense to return EINVAL here. Then the sole caller of this function (open_ctree) can use the code a bit more idiomatically: ret = btrfs_check_super_csum() if (ret < 0) { handle failure } Right now we return 1 on both when the function fails due to invalid CRC algorithm and if the checksum doesn't match. And we are going to print 2 things: unsupported checksum algorithm and the btrfs_err(fs_info, "superblock checksum mismatch");from open_ctree. Let's implemented proper error returning strategy for this function. > } > > - return ret; > + /* > + * The super_block structure does not span the whole > + * BTRFS_SUPER_INFO_SIZE range, we expect that the unused space > + * is filled with zeros and is included in the checksum. > + */ > + crc = btrfs_csum_data(raw_disk_sb + BTRFS_CSUM_SIZE, > + crc, BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE); > + btrfs_csum_final(crc, result); > + > + if (memcmp(raw_disk_sb, result, sizeof(result))) > + return 1; Perhaps return EUCLEAN i.e something is corrupted hence the checksum didn't pan out as we expected. > + > + return 0; > } > > /* > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/3] Preparatory to add the csum check in the scan context
Anand Jain (3): btrfs: cleanup btrfs_check_super_csum() to check the csum_type to the type btrfs: return required error from btrfs_check_super_csum btrfs: refactor btrfs_check_super_csum() for scan fs/btrfs/disk-io.c | 50 -- fs/btrfs/disk-io.h | 1 + 2 files changed, 25 insertions(+), 26 deletions(-) -- 2.15.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] btrfs: cleanup btrfs_check_super_csum() to check the csum_type to the type
%csum_type is being checked to check the number of csum types we support, which is 1 as of now. Instead just check if the type matches with the only type we support, that is BTRFS_CSUM_TYPE_CRC32. And further adds cleanup. Signed-off-by: Anand Jain--- fs/btrfs/disk-io.c | 41 +++-- 1 file changed, 19 insertions(+), 22 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 1657d6aa4fa6..582ed6af3c50 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -399,32 +399,29 @@ static int btrfs_check_super_csum(struct btrfs_fs_info *fs_info, struct btrfs_super_block *disk_sb = (struct btrfs_super_block *)raw_disk_sb; u16 csum_type = btrfs_super_csum_type(disk_sb); - int ret = 0; - - if (csum_type == BTRFS_CSUM_TYPE_CRC32) { - u32 crc = ~(u32)0; - char result[sizeof(crc)]; - - /* -* The super_block structure does not span the whole -* BTRFS_SUPER_INFO_SIZE range, we expect that the unused space -* is filled with zeros and is included in the checksum. -*/ - crc = btrfs_csum_data(raw_disk_sb + BTRFS_CSUM_SIZE, - crc, BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE); - btrfs_csum_final(crc, result); - - if (memcmp(raw_disk_sb, result, sizeof(result))) - ret = 1; - } + u32 crc = ~(u32)0; + char result[sizeof(crc)]; - if (csum_type >= ARRAY_SIZE(btrfs_csum_sizes)) { + /* We support csum type crc32 only as of now */ + if (csum_type != BTRFS_CSUM_TYPE_CRC32) { btrfs_err(fs_info, "unsupported checksum algorithm %u", - csum_type); - ret = 1; + csum_type); + return 1; } - return ret; + /* +* The super_block structure does not span the whole +* BTRFS_SUPER_INFO_SIZE range, we expect that the unused space +* is filled with zeros and is included in the checksum. +*/ + crc = btrfs_csum_data(raw_disk_sb + BTRFS_CSUM_SIZE, + crc, BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE); + btrfs_csum_final(crc, result); + + if (memcmp(raw_disk_sb, result, sizeof(result))) + return 1; + + return 0; } /* -- 2.15.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] btrfs: refactor btrfs_check_super_csum() for scan
In preparation to use the function btrfs_check_super_csum() for the device scan context, make it nonstatic and pass the struct block_device instead of the struct fs_info. Signed-off-by: Anand Jain--- fs/btrfs/disk-io.c | 12 ++-- fs/btrfs/disk-io.h | 1 + 2 files changed, 7 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index aafd836af61d..d2ace2dca9de 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -393,8 +393,7 @@ static int verify_parent_transid(struct extent_io_tree *io_tree, * Return 0 if the superblock checksum type matches the checksum value of that * algorithm. Pass the raw disk superblock data. */ -static int btrfs_check_super_csum(struct btrfs_fs_info *fs_info, - char *raw_disk_sb) +int btrfs_check_super_csum(struct block_device *bdev, char *raw_disk_sb) { struct btrfs_super_block *disk_sb = (struct btrfs_super_block *)raw_disk_sb; @@ -404,8 +403,8 @@ static int btrfs_check_super_csum(struct btrfs_fs_info *fs_info, /* We support csum type crc32 only as of now */ if (csum_type != BTRFS_CSUM_TYPE_CRC32) { - btrfs_err(fs_info, "unsupported checksum algorithm %u", - csum_type); + pr_err("BTRFS error (device %pg): unsupported checksum algorithm %u", + bdev, csum_type); return -EINVAL; } @@ -419,7 +418,8 @@ static int btrfs_check_super_csum(struct btrfs_fs_info *fs_info, btrfs_csum_final(crc, result); if (memcmp(raw_disk_sb, result, sizeof(result))) { - btrfs_err(fs_info, "superblock checksum mismatch"); + pr_err("BTRFS error (device %pg): superblock checksum mismatch", + bdev); return -EINVAL; } @@ -2573,7 +2573,7 @@ int open_ctree(struct super_block *sb, * We want to check superblock checksum, the type is stored inside. * Pass the whole disk block of size BTRFS_SUPER_INFO_SIZE (4k). */ - err = btrfs_check_super_csum(fs_info, bh->b_data); + err = btrfs_check_super_csum(fs_devices->latest_bdev, bh->b_data); if (err) { brelse(bh); goto fail_alloc; diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h index 70a88d61b547..1427fa1181d4 100644 --- a/fs/btrfs/disk-io.h +++ b/fs/btrfs/disk-io.h @@ -69,6 +69,7 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors); struct buffer_head *btrfs_read_dev_super(struct block_device *bdev); int btrfs_read_dev_one_super(struct block_device *bdev, int copy_num, struct buffer_head **bh_ret); +int btrfs_check_super_csum(struct block_device *bdev, char *raw_disk_sb); int btrfs_commit_super(struct btrfs_fs_info *fs_info); struct btrfs_root *btrfs_read_fs_root(struct btrfs_root *tree_root, struct btrfs_key *location); -- 2.15.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] btrfs: return required error from btrfs_check_super_csum
Return the required -EINVAL from the function btrfs_check_super_csum() and move the error logging into the btrfs_check_super_csum() itself. Signed-off-by: Anand Jain--- fs/btrfs/disk-io.c | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 582ed6af3c50..aafd836af61d 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -406,7 +406,7 @@ static int btrfs_check_super_csum(struct btrfs_fs_info *fs_info, if (csum_type != BTRFS_CSUM_TYPE_CRC32) { btrfs_err(fs_info, "unsupported checksum algorithm %u", csum_type); - return 1; + return -EINVAL; } /* @@ -418,8 +418,10 @@ static int btrfs_check_super_csum(struct btrfs_fs_info *fs_info, crc, BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE); btrfs_csum_final(crc, result); - if (memcmp(raw_disk_sb, result, sizeof(result))) - return 1; + if (memcmp(raw_disk_sb, result, sizeof(result))) { + btrfs_err(fs_info, "superblock checksum mismatch"); + return -EINVAL; + } return 0; } @@ -2571,9 +2573,8 @@ int open_ctree(struct super_block *sb, * We want to check superblock checksum, the type is stored inside. * Pass the whole disk block of size BTRFS_SUPER_INFO_SIZE (4k). */ - if (btrfs_check_super_csum(fs_info, bh->b_data)) { - btrfs_err(fs_info, "superblock checksum mismatch"); - err = -EINVAL; + err = btrfs_check_super_csum(fs_info, bh->b_data); + if (err) { brelse(bh); goto fail_alloc; } -- 2.15.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4.14 024/110] btrfs: use proper endianness accessors for super_copy
On 03/20/2018 03:32 AM, David Sterba wrote: On Sat, Mar 17, 2018 at 06:27:15PM +0100, Christoph Biedl wrote: Greg Kroah-Hartman wrote... On Thu, Mar 15, 2018 at 07:55:42PM +0100, Christoph Biedl wrote: commit 3c181c12c431fe33b669410d663beb9cceefcd1b upstream. On big-endian systems, this change intruduces severe corruption, resulting in complete loss of the data on the used block device. That sucks. Can you test Linus's tree to verify the problem is there? I'll gladly revert this if Linus's tree also gets the revert, I don't want you to hit this when you upgrade to a newer kernel. Confirmed: The problem is, err ... was in Linus' tree as well. The rather recent commit 8f5fd927c3a7 reverted the change, after that everything is as expected again. Thanks for checking. Looking at the original commit, I don't have a clue why things go wrong so horribly It's a half endianness conversion. The plain in-memory structures are in LE and has to be accessed via the helpers, super block copy and the root item. The commit adds only one half of the conversion, that naturally does not exhibit on LE, because the macros are no-op. Originally, the items were stored from the on-disk type to on-disk type, regardless of the CPU: super->chunk_root = root_item->bytenr; The patch should have added conversion of both values, like btrfs_set_super_chunk_root(super, btrfs_root_bytenr(root_item)); and not btrfs_set_super_chunk_root(super, root_item->bytenr); It's not very obvious from the context though, typically there's one structure that needs the accessors and is set from a value that's in CPU byteorder. I think that the exception to this pattern confused all involved developers. The root_item members are annotated as __le64 that should be caught by sparse/smatch checker in the buggy case, but we don't run the checkers every the time. Ah! the RC is at the other side of the equation. Makes sense to me. Thanks. - otherwise don't be afraid of my data. I took this as a chance to verify my data recovery procedure, with success. Should that not be the case, the damaged items in superblock can be byteswapped back. That's 6 x u64 at most, I have a tool for that now. Thanks again for the report and sorry for the trouble. It's entirely my mistake. My apologies for the inconvenience. Thanks, Anand -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: ctree.h: Fix wrong comment position about csum size
On 2018年03月20日 14:47, Misono, Tomohiro wrote: > > Signed-off-by: Tomohiro MisonoReviewed-by: Qu Wenruo BTW this reminds me that, btrfs-progs is still using BTRFS_CRC32_SIZE macro which the original comment is for. It may be a good time to clean it up in btrfs-progs. Thanks, Qu > --- > fs/btrfs/ctree.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h > index da308774b8a4..8f59cb20dd4c 100644 > --- a/fs/btrfs/ctree.h > +++ b/fs/btrfs/ctree.h > @@ -86,9 +86,9 @@ struct btrfs_ordered_sum; > */ > #define BTRFS_LINK_MAX 65535U > > +/* four bytes for CRC32 */ > static const int btrfs_csum_sizes[] = { 4 }; > > -/* four bytes for CRC32 */ > #define BTRFS_EMPTY_DIR_SIZE 0 > > /* ioprio of readahead is set to idle */ > signature.asc Description: OpenPGP digital signature
[PATCH] btrfs: Allow non-privileged user to delete empty subvolume by default
Deletion of subvolume by non-privileged user is completely restricted by default because we can delete a subvolume even if it is not empty and may cause data loss. In other words, when user_subvol_rm_allowed mount option is used, a user can delete a subvolume containing the directory which cannot be deleted directly by the user. However, there should be no harm to allow users to delete empty subvolumes when rmdir(2) would have been allowed if they were normal directories. This patch allows deletion of empty subvolume by default. Note that user_subvol_rm_allowed option requires write+exec permission of the subvolume to be deleted, but they are not required for empty subvolume. The comment in the code is also updated accordingly. Signed-off-by: Tomohiro Misono--- fs/btrfs/ioctl.c | 55 +++ 1 file changed, 31 insertions(+), 24 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 111ee282b777..838406a7a7f5 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -2366,36 +2366,43 @@ static noinline int btrfs_ioctl_snap_destroy(struct file *file, dest = BTRFS_I(inode)->root; if (!capable(CAP_SYS_ADMIN)) { /* -* Regular user. Only allow this with a special mount -* option, when the user has write+exec access to the -* subvol root, and when rmdir(2) would have been -* allowed. +* By default, regular user is only allowed to delete +* empty subvols when rmdir(2) would have been allowed +* if they were normal directories. * -* Note that this is _not_ check that the subvol is -* empty or doesn't contain data that we wouldn't +* If the mount option 'user_subvol_rm_allowed' is set, +* it allows users to delete non-empty subvols when the +* user has write+exec access to the subvol root and when +* rmdir(2) would have been allowed (except the emptiness +* check). +* +* Note that this option does _not_ check that if the subvol +* is empty or doesn't contain data that the user wouldn't * otherwise be able to delete. * -* Users who want to delete empty subvols should try -* rmdir(2). +* Users who want to delete empty subvols created by +* snapshot (ino number == 2) can use rmdir(2). */ - err = -EPERM; - if (!btrfs_test_opt(fs_info, USER_SUBVOL_RM_ALLOWED)) - goto out_dput; + err = -ENOTEMPTY; + if (inode->i_size != BTRFS_EMPTY_DIR_SIZE) { + if (!btrfs_test_opt(fs_info, USER_SUBVOL_RM_ALLOWED)) + goto out_dput; - /* -* Do not allow deletion if the parent dir is the same -* as the dir to be deleted. That means the ioctl -* must be called on the dentry referencing the root -* of the subvol, not a random directory contained -* within it. -*/ - err = -EINVAL; - if (root == dest) - goto out_dput; + /* +* Do not allow deletion if the parent dir is the same +* as the dir to be deleted. That means the ioctl +* must be called on the dentry referencing the root +* of the subvol, not a random directory contained +* within it. +*/ + err = -EINVAL; + if (root == dest) + goto out_dput; - err = inode_permission(inode, MAY_WRITE | MAY_EXEC); - if (err) - goto out_dput; + err = inode_permission(inode, MAY_WRITE | MAY_EXEC); + if (err) + goto out_dput; + } } /* check if subvolume may be deleted by a user */ -- 2.14.3 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 4/6] btrfs-progs: check/lowmem mode: Check inline extent size
Signed-off-by: Qu Wenruo--- check/mode-lowmem.c | 28 1 file changed, 28 insertions(+) diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c index 62bcf3d2e126..e5446f9c7280 100644 --- a/check/mode-lowmem.c +++ b/check/mode-lowmem.c @@ -1417,6 +1417,8 @@ static int check_file_extent(struct btrfs_root *root, struct btrfs_key *fkey, u64 csum_found; /* In byte size, sectorsize aligned */ u64 search_start; /* Logical range start we search for csum */ u64 search_len; /* Logical range len we search for csum */ + u32 max_inline_extent_size = min_t(u32, root->fs_info->sectorsize - 1, + BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info)); unsigned int extent_type; unsigned int is_hole; int compressed = 0; @@ -1440,6 +1442,32 @@ static int check_file_extent(struct btrfs_root *root, struct btrfs_key *fkey, root->objectid, fkey->objectid, fkey->offset); err |= FILE_EXTENT_ERROR; } + if (compressed) { + if (extent_num_bytes > root->fs_info->sectorsize) { + error( +"root %llu EXTENT_DATA[%llu %llu] too large inline extent ram size, have %llu, max: %u", + root->objectid, fkey->objectid, + fkey->offset, extent_num_bytes, + root->fs_info->sectorsize - 1); + err |= FILE_EXTENT_ERROR; + } + if (item_inline_len > max_inline_extent_size) { + error( +"root %llu EXTENT_DATA[%llu %llu] too large inline extent on-disk size, have %u, max: %u", + root->objectid, fkey->objectid, + fkey->offset, item_inline_len, + max_inline_extent_size); + err |= FILE_EXTENT_ERROR; + } + } else { + if (extent_num_bytes > max_inline_extent_size) { + error( + "root %llu EXTENT_DATA[%llu %llu] too large inline extent size, have %llu, max: %u", + root->objectid, fkey->objectid, fkey->offset, + extent_num_bytes, max_inline_extent_size); + err |= FILE_EXTENT_ERROR; + } + } if (!compressed && extent_num_bytes != item_inline_len) { error( "root %llu EXTENT_DATA[%llu %llu] wrong inline size, have: %llu, expected: %u", -- 2.16.2 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 2/6] btrfs-progs: mkfs/rootdir: Fix inline extent creation check
Just like convert, we need extra check against sector size for creating inline extent. Signed-off-by: Qu Wenruo--- mkfs/rootdir.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/mkfs/rootdir.c b/mkfs/rootdir.c index e06b65ac13e4..a1d223a2408a 100644 --- a/mkfs/rootdir.c +++ b/mkfs/rootdir.c @@ -139,7 +139,8 @@ static int fill_inode_item(struct btrfs_trans_handle *trans, } if (S_ISREG(src->st_mode)) { btrfs_set_stack_inode_size(dst, (u64)src->st_size); - if (src->st_size <= BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info)) + if (src->st_size <= BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info) && + src->st_size < sectorsize) btrfs_set_stack_inode_nbytes(dst, src->st_size); else { blocks = src->st_size / sectorsize; @@ -327,7 +328,8 @@ static int add_file_items(struct btrfs_trans_handle *trans, if (st->st_size % sectorsize) blocks += 1; - if (st->st_size <= BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info)) { + if (st->st_size <= BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info) && + st->st_size < sectorsize) { char *buffer = malloc(st->st_size); if (!buffer) { -- 2.16.2 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 3/6] btrfs-progs: check/original mode: Check inline extent size
For inline compressed file extent, kernel doesn't allow inline extent ram size larger than sector size and on-disk inline extent size should not exceed BTRFS_MAX_INLINE_DATA_SIZE(). For inline uncompressed file extent, kernel doesn't allow inline extent ram and on-disk size larger than either BTRFS_MAX_INLINE_DATA_SIZE() or sector size. Check it in original mode. Signed-off-by: Qu Wenruo--- check/main.c | 16 check/mode-original.h | 1 + 2 files changed, 17 insertions(+) diff --git a/check/main.c b/check/main.c index 97baae583f04..7821faa929a3 100644 --- a/check/main.c +++ b/check/main.c @@ -560,6 +560,8 @@ static void print_inode_error(struct btrfs_root *root, struct inode_record *rec) fprintf(stderr, ", bad file extent"); if (errors & I_ERR_FILE_EXTENT_OVERLAP) fprintf(stderr, ", file extent overlap"); + if (errors & I_ERR_FILE_EXTENT_TOO_LARGE) + fprintf(stderr, ", inline file extent too large"); if (errors & I_ERR_FILE_EXTENT_DISCOUNT) fprintf(stderr, ", file extent discount"); if (errors & I_ERR_DIR_ISIZE_WRONG) @@ -1433,6 +1435,8 @@ static int process_file_extent(struct btrfs_root *root, u64 disk_bytenr = 0; u64 extent_offset = 0; u64 mask = root->fs_info->sectorsize - 1; + u32 max_inline_size = min_t(u32, mask, + BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info)); int extent_type; int ret; @@ -1458,9 +1462,21 @@ static int process_file_extent(struct btrfs_root *root, extent_type = btrfs_file_extent_type(eb, fi); if (extent_type == BTRFS_FILE_EXTENT_INLINE) { + u8 compression = btrfs_file_extent_compression(eb, fi); + struct btrfs_item *item = btrfs_item_nr(slot); + num_bytes = btrfs_file_extent_inline_len(eb, slot, fi); if (num_bytes == 0) rec->errors |= I_ERR_BAD_FILE_EXTENT; + if (compression) { + if (btrfs_file_extent_inline_item_len(eb, item) > + max_inline_size || + num_bytes > root->fs_info->sectorsize) + rec->errors |= I_ERR_FILE_EXTENT_TOO_LARGE; + } else { + if (num_bytes > max_inline_size) + rec->errors |= I_ERR_FILE_EXTENT_TOO_LARGE; + } rec->found_size += num_bytes; num_bytes = (num_bytes + mask) & ~mask; } else if (extent_type == BTRFS_FILE_EXTENT_REG || diff --git a/check/mode-original.h b/check/mode-original.h index f859af478f0f..368de692fdd1 100644 --- a/check/mode-original.h +++ b/check/mode-original.h @@ -185,6 +185,7 @@ struct file_extent_hole { #define I_ERR_SOME_CSUM_MISSING(1 << 12) #define I_ERR_LINK_COUNT_WRONG (1 << 13) #define I_ERR_FILE_EXTENT_ORPHAN (1 << 14) +#define I_ERR_FILE_EXTENT_TOO_LARGE(1 << 15) struct inode_record { struct list_head backrefs; -- 2.16.2 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 5/6] btrfs-progs: test/convert: Add test case for invalid large inline data extent
Signed-off-by: Qu Wenruo--- .../016-invalid-large-inline-extent/test.sh| 22 ++ 1 file changed, 22 insertions(+) create mode 100755 tests/convert-tests/016-invalid-large-inline-extent/test.sh diff --git a/tests/convert-tests/016-invalid-large-inline-extent/test.sh b/tests/convert-tests/016-invalid-large-inline-extent/test.sh new file mode 100755 index ..f37c7c09d2e7 --- /dev/null +++ b/tests/convert-tests/016-invalid-large-inline-extent/test.sh @@ -0,0 +1,22 @@ +#!/bin/bash +# Check if btrfs-convert refuses to rollback the filesystem, and leave the fs +# and the convert image untouched + +source "$TEST_TOP/common" +source "$TEST_TOP/common.convert" + +setup_root_helper +prepare_test_dev +check_prereq btrfs-convert +check_global_prereq mke2fs + +convert_test_prep_fs ext4 mke2fs -t ext4 -b 4096 + +# Create a 6K file, which should not be inlined +run_check $SUDO_HELPER dd if=/dev/zero bs=2k count=3 of="$TEST_MNT/file1" + +run_check_umount_test_dev + +# convert_test_do_convert() will call btrfs check, which should expose any +# invalid inline extent with too large size +convert_test_do_convert -- 2.16.2 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 1/6] btrfs-progs: convert: Fix inline file extent creation condition
[Bug] On btrfs converted from ext*, one user reported the following kernel warning: [ cut here ] BTRFS: Transaction aborted (error -95) WARNING: CPU: 0 PID: 324 at fs/btrfs/inode.c:3042 btrfs_finish_ordered_io+0x7ab/0x850 [btrfs] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs] RIP: 0010:btrfs_finish_ordered_io+0x7ab/0x850 [btrfs] ... Call Trace: normal_work_helper+0x39/0x370 [btrfs] process_one_work+0x1ce/0x410 worker_thread+0x2b/0x3d0 ? process_one_work+0x410/0x410 kthread+0x113/0x130 ? kthread_create_on_node+0x70/0x70 ? do_syscall_64+0x74/0x190 ? SyS_exit_group+0x10/0x10 ret_from_fork+0x35/0x40 ---[ end trace c8ed62ff6a525901 ]--- BTRFS: error (device dm-2) in btrfs_finish_ordered_io:3042: errno=-95 unknown BTRFS info (device dm-2): forced readonly BTRFS error (device dm-2): pending csums is 6447104 [Cause] The call trace and the unique return value points to __btrfs_drop_extents(), when we tries to drop pages of an inline extent, we will trigger such -EOPNOTSUPP. However kernel has limitation on the size of inline file extent (sector size for ram size and sector size - 1for on-disk size), btrfs-convert doesn't have the same limitation, resulting much larger file extent. The lack of correct inline extent size check dates back to 2008 when btrfs-convert is added into btrfs-progs. [Fix] Fix the inline extent creation condition, not only using BTRFS_MAX_INLINE_DATA_SIZE(), which is only the maximum size of inline data according to nodesize, but also limit it against sector size. Signed-off-by: Qu Wenruo--- convert/source-ext2.c | 2 +- convert/source-reiserfs.c | 3 ++- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/convert/source-ext2.c b/convert/source-ext2.c index b1492c78693d..e13fbe00415a 100644 --- a/convert/source-ext2.c +++ b/convert/source-ext2.c @@ -310,7 +310,7 @@ static int ext2_create_file_extents(struct btrfs_trans_handle *trans, if (ret) goto fail; if ((convert_flags & CONVERT_FLAG_INLINE_DATA) && data.first_block == 0 - && data.num_blocks > 0 + && data.num_blocks > 0 && inode_size < sectorsize && inode_size <= BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info)) { u64 num_bytes = data.num_blocks * sectorsize; u64 disk_bytenr = data.disk_block * sectorsize; diff --git a/convert/source-reiserfs.c b/convert/source-reiserfs.c index 39d6f0728bd3..eeb68d962c5d 100644 --- a/convert/source-reiserfs.c +++ b/convert/source-reiserfs.c @@ -376,7 +376,8 @@ static int reiserfs_convert_tail(struct btrfs_trans_handle *trans, u64 isize; int ret; - if (length >= BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info)) + if (length >= BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info) || + length >= root->fs_info->sectorsize) return convert_direct(trans, root, objectid, inode, body, length, offset, convert_flags); -- 2.16.2 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 0/6] Fix long standing -EOPNOTSUPP problem caused by large inline extent
The patch is based on v4.15.1, and is designed to replace the old patch in devel branch. Kernel doesn't support dropping range inside inline extent, and prevents such thing happening by limiting max inline extent size to min(max_inline, sectorsize - 1) in cow_file_range_inline(). However btrfs-progs only inherit the BTRFS_MAX_INLINE_DATA_SIZE() macro, which doesn't have sectorsize check. And since btrfs-progs defaults to 16K nodesize, above macro allows large inline extent over 15K size. This leads to unexpected kernel behavior. The bug exists in several parts of btrfs-progs, any tool which creates file extent is involved, including: 1) btrfs-convert 2) mkfs --rootdir This patchset fixes the problems in convert (both ext2 and reiserfs), mkfs --rootdir, then add check support for both original and lowmem mode, and finally adds 2 test cases, one for mkfs and one for convert. For mkfs test case, it can already be exposed by misc/002, but a pin-point test case will be much better. changelog: v2: Don't modify BTRFS_MAX_INLINE_DATA_SIZE(), but add extra check to callers who create file extents. v3: Merge fixes for convert. Add real commit message for convert fixes. Use $TEST_TOP to replace cooperate with stand alone test cases. Use for loops to make the new test case shorter. Qu Wenruo (6): btrfs-progs: convert: Fix inline file extent creation condition btrfs-progs: mkfs/rootdir: Fix inline extent creation check btrfs-progs: check/original mode: Check inline extent size btrfs-progs: check/lowmem mode: Check inline extent size btrfs-progs: test/convert: Add test case for invalid large inline data extent btrfs-progs: test/mkfs: Add test case for rootdir inline extent size check/main.c | 16 check/mode-lowmem.c| 28 + check/mode-original.h | 1 + convert/source-ext2.c | 2 +- convert/source-reiserfs.c | 3 +- mkfs/rootdir.c | 6 ++- .../016-invalid-large-inline-extent/test.sh| 22 ++ tests/mkfs-tests/014-rootdir-inline-extent/test.sh | 47 ++ 8 files changed, 121 insertions(+), 4 deletions(-) create mode 100755 tests/convert-tests/016-invalid-large-inline-extent/test.sh create mode 100755 tests/mkfs-tests/014-rootdir-inline-extent/test.sh -- 2.16.2 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 6/6] btrfs-progs: test/mkfs: Add test case for rootdir inline extent size
Add a test case for mkfs --rootdir, using files with different file sizes to check if invalid large inline extent could exist. Signed-off-by: Qu Wenruo--- tests/mkfs-tests/014-rootdir-inline-extent/test.sh | 47 ++ 1 file changed, 47 insertions(+) create mode 100755 tests/mkfs-tests/014-rootdir-inline-extent/test.sh diff --git a/tests/mkfs-tests/014-rootdir-inline-extent/test.sh b/tests/mkfs-tests/014-rootdir-inline-extent/test.sh new file mode 100755 index ..167ced1e4987 --- /dev/null +++ b/tests/mkfs-tests/014-rootdir-inline-extent/test.sh @@ -0,0 +1,47 @@ +#!/bin/bash +# Regression test for mkfs.btrfs --rootdir with inline file extents +# For any large inline file extent, btrfs check could already report it + +source "$TEST_TOP/common" + +check_global_prereq fallocate +check_prereq mkfs.btrfs + +prepare_test_dev + +tmp=$(mktemp -d --tmpdir btrfs-progs-mkfs.rootdirXXX) + +pagesize=$(getconf PAGESIZE) +create_file() +{ + local size=$1 + # Reuse size as filename + run_check fallocate -l $size "$tmp/$size" +} + +test_mkfs_rootdir() +{ + nodesize=$1 + run_check "$TOP/mkfs.btrfs" --nodesize $nodesize -f --rootdir "$tmp" \ + "$TEST_DEV" + run_check $SUDO_HELPER "$TOP/btrfs" check "$TEST_DEV" +} + +# Use power of 2 from 512 to 64K (maximum node size) as base file size +for i in 512 1024 2048 4096 8192 16384 32768; do + create_file $(($i - 102)) + # 101 is the overhead size for max inline extent + create_file $(($i - 101)) + create_file $(($i - 100)) + + create_file $(($i - 1)) + create_file $i + create_file $(($i + 1)) +done + +for nodesize in 4096 8192 16384 32768 65536; do + if [ $nodesize -ge $pagesize ]; then + test_mkfs_rootdir $nodesize + fi +done +rm -rf -- "$tmp" -- 2.16.2 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html