Re: Major design flaw with BTRFS Raid, temporary device drop will corrupt nodatacow files
On 2018年06月28日 11:14, r...@georgianit.com wrote:
>
> On Wed, Jun 27, 2018, at 10:55 PM, Qu Wenruo wrote:
>>
>> Please get yourself clear of what other raid1 is doing.
>
> A drive failure, where the drive is still there when the computer reboots, is
> a situation that *any* raid 1 (or for that matter, raid 5, raid 6, anything
> but raid 0) will recover from perfectly without raising a sweat. Some will
> rebuild the array automatically,

WOW, that's black magic, at least for RAID1.

Plain RAID1 has no idea which copy is correct, unlike btrfs, which has
data checksums. Never mind anything else, just tell me: how would it
determine which copy is correct?

The only possibility is that the misbehaving device missed several
superblock updates, so we have a chance to detect that it's out of date.
But that doesn't always work.

If you're talking about the missing generation check in btrfs, that's a
valid point, but it's far from a "major design flaw", as there are a lot
of cases where other RAID1 implementations (mdraid or LVM mirrors) can
also be affected (the split-brain case).

> others will automatically kick out the misbehaving drive. *none* of them
> will take back the drive with old data and start commingling that data
> with the good copy. This behaviour from BTRFS is completely abnormal, and
> defeats even the most basic expectations of RAID.

RAID1 can only tolerate one missing device; it has nothing to do with
error detection. And it's impossible to detect such a case without extra
help.

Your expectation is completely wrong.

> I'm not the one who has to clear his expectations here.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
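Qu's point, that plain RAID1 has no way to tell which mirror holds the correct data while btrfs can check each copy against its stored checksum, can be sketched as a toy model (hypothetical code, not btrfs internals; btrfs actually records crc32c sums in its csum tree, sha256 merely stands in here):

```python
import hashlib

def csum(data: bytes) -> str:
    # btrfs records a per-block data checksum in its csum tree
    # (crc32c by default; sha256 stands in for it here)
    return hashlib.sha256(data).hexdigest()

def raid1_read(mirrors):
    # Plain RAID1: hand back whichever copy the read happens to hit;
    # there is nothing to compare the copies against.
    return mirrors[0]

def btrfs_read(mirrors, stored_csum):
    # btrfs: verify each copy against the stored checksum and fall
    # back to the other mirror on a mismatch.
    for copy in mirrors:
        if csum(copy) == stored_csum:
            return copy
    raise IOError("all mirrors failed checksum verification")

good = b"current data"
stale = b"old data from the dropped device"
stored = csum(good)

# RAID1 may happily return the stale copy...
assert raid1_read([stale, good]) == stale
# ...while a checksum-verified read finds the good one in either order.
assert btrfs_read([stale, good], stored) == good
assert btrfs_read([good, stale], stored) == good
```

Note this is exactly why the nodatacow case is different: with nodatasum there is no `stored_csum` to verify against.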
Re: [PATCH] fstests: btrfs: Test if btrfs will corrupt nodatasum compressed extent when replacing device
On Thu, Jun 28, 2018 at 08:11:00AM +0300, Nikolay Borisov wrote:
>
> On 1.06.2018 04:34, Qu Wenruo wrote:
>> This is a long existing bug (from 2012) but exposed by a reporter
>> recently: when a compressed extent without data csum gets written to the
>> device-replace target device, the written data is in fact uncompressed
>> data rather than the original compressed data.
>>
>> And since btrfs still considers the data compressed and will try to read
>> it as compressed, it can cause read errors.
>>
>> The root cause is located, and one RFC patch has already been sent to
>> fix it, titled "[PATCH RFC] btrfs: scrub: Don't use inode pages for
>> device replace".
>> (The RFC is only for the extra possible way to fix the bug, the fix
>> itself should work without problem.)
>>
>> Reported-by: James Harvey
>> Signed-off-by: Qu Wenruo
>
> Reviewed-by: Nikolay Borisov

Thanks for the review!

I assume the v3 patch also passes your review :)

Eryu
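The failure mode described in the changelog, raw bytes sitting where a compressed extent is expected, can be illustrated with a small sketch (zlib stands in for btrfs's compression; this is a toy model, not btrfs code):

```python
import zlib

payload = b"A" * 4096  # file data

# Normal case: the extent on disk holds compressed bytes,
# and reading it back decompresses cleanly.
on_disk = zlib.compress(payload)
assert zlib.decompress(on_disk) == payload

# Buggy replace: the target device received the *uncompressed* bytes,
# but the extent item still says "compressed", so the read path tries
# to decompress raw data and fails -- the user-visible result is EIO.
on_disk = payload
try:
    zlib.decompress(on_disk)
    corrupted_read_failed = False
except zlib.error:
    corrupted_read_failed = True
assert corrupted_read_failed
```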
Re: [PATCH] fstests: btrfs: Test if btrfs will corrupt nodatasum compressed extent when replacing device
On 1.06.2018 04:34, Qu Wenruo wrote:
> This is a long existing bug (from 2012) but exposed by a reporter
> recently: when a compressed extent without data csum gets written to the
> device-replace target device, the written data is in fact uncompressed
> data rather than the original compressed data.
>
> And since btrfs still considers the data compressed and will try to read
> it as compressed, it can cause read errors.
>
> The root cause is located, and one RFC patch has already been sent to
> fix it, titled "[PATCH RFC] btrfs: scrub: Don't use inode pages for
> device replace".
> (The RFC is only for the extra possible way to fix the bug, the fix
> itself should work without problem.)
>
> Reported-by: James Harvey
> Signed-off-by: Qu Wenruo

Reviewed-by: Nikolay Borisov

> ---
>  tests/btrfs/161     | 91 +
>  tests/btrfs/161.out |  2 +
>  tests/btrfs/group   |  1 +
>  3 files changed, 94 insertions(+)
>  create mode 100755 tests/btrfs/161
>  create mode 100644 tests/btrfs/161.out
>
> diff --git a/tests/btrfs/161 b/tests/btrfs/161
> new file mode 100755
> index ..d4a2b474
> --- /dev/null
> +++ b/tests/btrfs/161
> @@ -0,0 +1,91 @@
> +#! /bin/bash
> +# FS QA Test 161
> +#
> +# Test if btrfs will corrupt compressed data extent without data csum
> +# by replacing it with uncompressed data, when doing replacing device.
> +#
> +# This could be fixed by the following RFC patch:
> +# "[PATCH RFC] btrfs: scrub: Don't use inode pages for device replace"
> +#
> +#---
> +# Copyright (C) 2018 SUSE Linux Products GmbH. All Rights Reserved.
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
> +#---
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1	# failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> +	cd /
> +	rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +
> +# remove previous $seqres.full before test
> +rm -f $seqres.full
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs btrfs
> +_supported_os Linux
> +_require_test
> +_require_scratch_dev_pool 2
> +_require_scratch_dev_pool_equal_size
> +
> +_scratch_dev_pool_get 1
> +_spare_dev_get
> +_scratch_pool_mkfs >> $seqres.full 2>&1
> +
> +# Create nodatasum inode
> +_scratch_mount "-o nodatasum"
> +touch $SCRATCH_MNT/nodatasum_file
> +_scratch_remount "datasum,compress"
> +_pwrite_byte 0xcd 0 128K $SCRATCH_MNT/nodatasum_file > /dev/null
> +
> +# Write the compressed data back to disk
> +sync
> +
> +# Replace the device
> +_run_btrfs_util_prog replace start -Bf 1 $SPARE_DEV $SCRATCH_MNT
> +
> +_scratch_unmount
> +
> +_mount $SPARE_DEV $SCRATCH_MNT
> +
> +# Since now the compressed extent contains *UNCOMPRESSED* data, reading
> +# it will easily trigger an EIO error
> +cat $SCRATCH_MNT/nodatasum_file > /dev/null
> +
> +_scratch_unmount
> +_spare_dev_put
> +_scratch_dev_pool_put
> +
> +echo "Silence is golden"
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/btrfs/161.out b/tests/btrfs/161.out
> new file mode 100644
> index ..1752a243
> --- /dev/null
> +++ b/tests/btrfs/161.out
> @@ -0,0 +1,2 @@
> +QA output created by 161
> +Silence is golden
> diff --git a/tests/btrfs/group b/tests/btrfs/group
> index f04ee8d5..f900b3d0 100644
> --- a/tests/btrfs/group
> +++ b/tests/btrfs/group
> @@ -163,3 +163,4 @@
>  158 auto quick raid scrub
>  159 auto quick
>  160 auto quick
> +161 auto quick replace
Re: [PATCH v1] btrfs: quota: Set rescan progress to (u64)-1 if we hit last leaf
On 2018/06/27 19:19, Qu Wenruo wrote:
> Commit ff3d27a048d9 ("btrfs: qgroup: Finish rescan when hit the last leaf
> of extent tree") added a new exit for rescan finish.
>
> However, after finishing a quota rescan we set
> fs_info->qgroup_rescan_progress to (u64)-1 before exiting through the
> original exit path, while that assignment of (u64)-1 is missing in the
> new exit path.
>
> The end result is that the quota status item doesn't have the same value
> ((u64)-1 vs the last bytenr + 1). Although it doesn't affect quota
> accounting, it's still better to keep the original behavior.
>
> Reported-by: Misono Tomohiro
> Fixes: ff3d27a048d9 ("btrfs: qgroup: Finish rescan when hit the last leaf
> of extent tree")
> Signed-off-by: Qu Wenruo
> ---
> changelog:
> v2:
>   Commit message update, as the bug only changes the resulting quota
>   status item without impacting the behavior.

Reviewed-by: Misono Tomohiro

(As you said, the problem I reported in
https://marc.info/?t=15299930357=1=2 is not related to this change)

Thanks,
Tomohiro Misono

> ---
>  fs/btrfs/qgroup.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index 1874a6d2e6f5..99f2b9ce0f15 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -2680,8 +2680,10 @@ qgroup_rescan_leaf(struct btrfs_fs_info *fs_info, struct btrfs_path *path,
>  		free_extent_buffer(scratch_leaf);
>  	}
>
> -	if (done && !ret)
> +	if (done && !ret) {
>  		ret = 1;
> +		fs_info->qgroup_rescan_progress.objectid = (u64)-1;
> +	}
>  	return ret;
>  }
Re: Major design flaw with BTRFS Raid, temporary device drop will corrupt nodatacow files
On Wed, Jun 27, 2018, at 10:55 PM, Qu Wenruo wrote:
>
> Please get yourself clear of what other raid1 is doing.

A drive failure, where the drive is still there when the computer
reboots, is a situation that *any* raid 1 (or for that matter, raid 5,
raid 6, anything but raid 0) will recover from perfectly without raising
a sweat. Some will rebuild the array automatically, others will
automatically kick out the misbehaving drive. *None* of them will take
back the drive with old data and start commingling that data with the
good copy. This behaviour from BTRFS is completely abnormal, and defeats
even the most basic expectations of RAID.

I'm not the one who has to clear his expectations here.
Re: Major design flaw with BTRFS Raid, temporary device drop will corrupt nodatacow files
On 2018年06月28日 10:10, Remi Gauvin wrote:
> On 2018-06-27 09:58 PM, Qu Wenruo wrote:
>>
>> On 2018年06月28日 09:42, Remi Gauvin wrote:
>>> There seems to be a major design flaw with BTRFS that needs to be better
>>> documented, to avoid massive data loss.
>>>
>>> Tested with Raid 1 on Ubuntu Kernel 4.15
>>>
>>> The use case being tested was a Virtualbox VDI file created with the
>>> NODATACOW attribute, (as is often suggested, due to the painful
>>> performance penalty of COW on these files.)
>>
>> NODATACOW implies NODATASUM.
>
> Yes, yes, none of which changes the simple fact that if you use this
> option, which is often touted as outright necessary for some types of
> files, BTRFS raid is worse than useless: not only will it not protect
> your data at all from bitrot (as expected), it will actively go out of
> its way to corrupt it!
>
> This is not expected behaviour from 'Raid', and I despair that this
> seems to be something that I have to explain!

Nope, all normal RAID1 implementations are the same: if one copy gets
corrupted, you won't know which copy is correct.

The btrfs csum is already doing a much better job than plain RAID1.

Please get yourself clear of what other raid1 is doing.

Thanks,
Qu
Major design flaw with BTRFS Raid, temporary device drop will corrupt nodatacow files
On 2018-06-27 09:58 PM, Qu Wenruo wrote:
>
> On 2018年06月28日 09:42, Remi Gauvin wrote:
>> There seems to be a major design flaw with BTRFS that needs to be better
>> documented, to avoid massive data loss.
>>
>> Tested with Raid 1 on Ubuntu Kernel 4.15
>>
>> The use case being tested was a Virtualbox VDI file created with the
>> NODATACOW attribute, (as is often suggested, due to the painful
>> performance penalty of COW on these files.)
>
> NODATACOW implies NODATASUM.

Yes, yes, none of which changes the simple fact that if you use this
option, which is often touted as outright necessary for some types of
files, BTRFS raid is worse than useless: not only will it not protect
your data at all from bitrot (as expected), it will actively go out of
its way to corrupt it!

This is not expected behaviour from 'Raid', and I despair that this
seems to be something that I have to explain!
Re: Major design flaw with BTRFS Raid, temporary device drop will corrupt nodatacow files
On 2018年06月28日 09:42, Remi Gauvin wrote:
> There seems to be a major design flaw with BTRFS that needs to be better
> documented, to avoid massive data loss.
>
> Tested with Raid 1 on Ubuntu Kernel 4.15
>
> The use case being tested was a Virtualbox VDI file created with the
> NODATACOW attribute, (as is often suggested, due to the painful
> performance penalty of COW on these files.)

NODATACOW implies NODATASUM.

From btrfs(5):
---
Enable data copy-on-write for newly created files. Nodatacow implies
nodatasum, and disables compression. All files created under nodatacow
are also set the NOCOW file attribute (see chattr(1)).
---

Although it's talking about the mount option, it also applies to
per-inode options.

Thanks,
Qu

> However, if a device is temporarily dropped (in this case, tested by
> disconnecting drives) and re-connects automatically next boot, BTRFS
> does not in any way synchronize the VDI file, or have any means to know
> that one of the copies is out of date and bad.
>
> The result of trying to use said VDI file is interestingly insane.
> Scrub did not do anything to rectify the situation.
Major design flaw with BTRFS Raid, temporary device drop will corrupt nodatacow files
There seems to be a major design flaw with BTRFS that needs to be better
documented, to avoid massive data loss.

Tested with Raid 1 on Ubuntu Kernel 4.15

The use case being tested was a Virtualbox VDI file created with the
NODATACOW attribute, (as is often suggested, due to the painful
performance penalty of COW on these files.)

However, if a device is temporarily dropped (in this case, tested by
disconnecting drives) and re-connects automatically next boot, BTRFS
does not in any way synchronize the VDI file, or have any means to know
that one of the copies is out of date and bad.

The result of trying to use said VDI file is interestingly insane.
Scrub did not do anything to rectify the situation.
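A rough model of why the file ends up "interestingly insane" (a hypothetical sketch, not btrfs code): with NODATACOW there are no data checksums, so once one mirror has silently missed writes, per-block reads served from either mirror return an arbitrary mix of old and new data, and scrub has nothing to compare against:

```python
# Two RAID1 mirrors of a 4-block nocow file; mirror 1 was disconnected
# while blocks 1 and 2 were rewritten, so it still holds old data.
new = [b"new0", b"new1", b"new2", b"new3"]
old = [b"new0", b"old1", b"old2", b"new3"]
mirrors = {0: new, 1: old}

def read_file(mirror_for_block):
    # Without data csums there is nothing to verify, so whichever
    # mirror each block read happens to hit is what the application
    # gets back.
    return [mirrors[m][i] for i, m in enumerate(mirror_for_block)]

# Reads spread across both mirrors yield a file that matches neither
# the old nor the new version -- silent corruption.
data = read_file([0, 1, 0, 1])
assert data != new and data != old
```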
Re: [PATCH] Btrfs: fix mount failure when qgroup rescan is in progress
On 2018年06月27日 07:43, fdman...@kernel.org wrote:
> From: Filipe Manana
>
> If a power failure happens while the qgroup rescan kthread is running,
> the next mount operation will always fail. This is because of a recent
> regression that makes qgroup_rescan_init() incorrectly return -EINVAL
> when we are mounting the filesystem (through btrfs_read_qgroup_config()).
> This causes the -EINVAL error to be returned regardless of any qgroup
> flags being set instead of returning the error only when neither of
> the flags BTRFS_QGROUP_STATUS_FLAG_RESCAN nor BTRFS_QGROUP_STATUS_FLAG_ON
> are set.

My fault.

> A test case for fstests follows up soon.
>
> Fixes: 9593bf49675e ("btrfs: qgroup: show more meaningful qgroup_rescan_init error message")
> Signed-off-by: Filipe Manana

Reviewed-by: Qu Wenruo

Thanks,
Qu

> ---
>  fs/btrfs/qgroup.c | 13 ++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index 1874a6d2e6f5..d4171de93087 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -2784,13 +2784,20 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
>
>  	if (!init_flags) {
>  		/* we're resuming qgroup rescan at mount time */
> -		if (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN))
> +		if (!(fs_info->qgroup_flags &
> +		      BTRFS_QGROUP_STATUS_FLAG_RESCAN)) {
>  			btrfs_warn(fs_info,
>  			"qgroup rescan init failed, qgroup is not enabled");
> -		else if (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_ON))
> +			ret = -EINVAL;
> +		} else if (!(fs_info->qgroup_flags &
> +			     BTRFS_QGROUP_STATUS_FLAG_ON)) {
>  			btrfs_warn(fs_info,
>  			"qgroup rescan init failed, qgroup rescan is not queued");
> -			return -EINVAL;
> +			ret = -EINVAL;
> +		}
> +
> +		if (ret)
> +			return ret;
>  	}
>
>  	mutex_lock(&fs_info->qgroup_rescan_lock);
Re: unsolvable technical issues?
Chris Murphy wrote:
> On Thu, Jun 21, 2018 at 5:13 PM, waxhead wrote:
>> According to this:
>> https://stratis-storage.github.io/StratisSoftwareDesign.pdf
>> Page 4, section 1.2 claims that BTRFS still has significant technical
>> issues that may never be resolved.
>>
>> Could someone shed some light on exactly what these technical issues
>> might be?! What are BTRFS' biggest technical problems?
>
> I think it's appropriate to file an issue and ask what they're referring
> to. It very well might be use case specific to Red Hat.
> https://github.com/stratis-storage/stratis-storage.github.io/issues
>
> I also think it's appropriate to crosslink: include the URL for the start
> of this thread in the issue, and the issue URL in this thread.

https://github.com/stratis-storage/stratis-storage.github.io/issues/1

Apparently the author has toned down the wording a bit, which confirms
that the claim was without basis and probably based on "popular myth".
The document the PDF links to is not yet updated.
Re: [PATCH] Btrfs: fix mount failure when qgroup rescan is in progress
On Wed, Jun 27, 2018 at 4:55 PM, Nikolay Borisov wrote:
>
> On 27.06.2018 18:45, Filipe Manana wrote:
>> On Wed, Jun 27, 2018 at 4:44 PM, Nikolay Borisov wrote:
>>>
>>> On 27.06.2018 02:43, fdman...@kernel.org wrote:
>>>> From: Filipe Manana
>>>>
>>>> If a power failure happens while the qgroup rescan kthread is running,
>>>> the next mount operation will always fail. This is because of a recent
>>>> regression that makes qgroup_rescan_init() incorrectly return -EINVAL
>>>> when we are mounting the filesystem (through btrfs_read_qgroup_config()).
>>>> This causes the -EINVAL error to be returned regardless of any qgroup
>>>> flags being set instead of returning the error only when neither of
>>>> the flags BTRFS_QGROUP_STATUS_FLAG_RESCAN nor BTRFS_QGROUP_STATUS_FLAG_ON
>>>> are set.
>>>>
>>>> A test case for fstests follows up soon.
>>>>
>>>> Fixes: 9593bf49675e ("btrfs: qgroup: show more meaningful qgroup_rescan_init error message")
>>>> Signed-off-by: Filipe Manana
>>>> ---
>>>>  fs/btrfs/qgroup.c | 13 ++---
>>>>  1 file changed, 10 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
>>>> index 1874a6d2e6f5..d4171de93087 100644
>>>> --- a/fs/btrfs/qgroup.c
>>>> +++ b/fs/btrfs/qgroup.c
>>>> @@ -2784,13 +2784,20 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
>>>>
>>>>  	if (!init_flags) {
>>>>  		/* we're resuming qgroup rescan at mount time */
>>>> -		if (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN))
>>>> +		if (!(fs_info->qgroup_flags &
>>>> +		      BTRFS_QGROUP_STATUS_FLAG_RESCAN)) {
>>>>  			btrfs_warn(fs_info,
>>>>  			"qgroup rescan init failed, qgroup is not enabled");
>>>> -		else if (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_ON))
>>>> +			ret = -EINVAL;
>>>> +		} else if (!(fs_info->qgroup_flags &
>>>> +			     BTRFS_QGROUP_STATUS_FLAG_ON)) {
>>>>  			btrfs_warn(fs_info,
>>>>  			"qgroup rescan init failed, qgroup rescan is not queued");
>>>> -			return -EINVAL;
>>>> +			ret = -EINVAL;
>>>> +		}
>>>> +
>>>> +		if (ret)
>>>> +			return ret;
>>>
>>> How is this patch functionally different than the old code. In both
>>> cases if either of those 2 is not set a warn is printed and -EINVAL is
>>> returned?
>>
>> It is explained in the changelog:
>
> No need to be snide

No one's being snide. Simply, I can't see how the changelog doesn't
explain what is already quite easy to notice from the code.

>
>> "This is because of a recent
>> regression that makes qgroup_rescan_init() incorrectly return -EINVAL
>> when we are mounting the filesystem (through btrfs_read_qgroup_config()).
>> This causes the -EINVAL error to be returned regardless of any qgroup
>> flags being set instead of returning the error only when neither of
>> the flags BTRFS_QGROUP_STATUS_FLAG_RESCAN nor BTRFS_QGROUP_STATUS_FLAG_ON
>> are set."
>>
>> If you can't understand it, try the test case...
>
> Ok, I see it now; however, your description contradicts the code.
> Currently -EINVAL will be returned when either of the 2 flags is unset,
> i.e.
>
> !BTRFS_QGROUP_STATUS_FLAG_RESCAN && BTRFS_QGROUP_STATUS_FLAG_ON
> !BTRFS_QGROUP_STATUS_FLAG_ON && BTRFS_QGROUP_STATUS_FLAG_RESCAN
>
> and in your description you refer to neither, that is, the 2 flags both
> being unset. Perhaps those combinations are invalid due to some other
> reasons which are not visible in the code, but in this case the changelog
> should be expanded to cover why those 2 combinations are impossible
> (because if they are, -EINVAL is still likely)?

I don't think the changelog is contradictory. It says that -EINVAL is
always returned, independently of which qgroup flags are set/missing.
Further, it says that the error should be returned only when one of those
2 qgroup flags is not set (or both are not set).

>
>>>>  	}
>>>>
>>>>  	mutex_lock(&fs_info->qgroup_rescan_lock);
Re: [PATCH] Btrfs: fix mount failure when qgroup rescan is in progress
On 27.06.2018 18:45, Filipe Manana wrote:
> On Wed, Jun 27, 2018 at 4:44 PM, Nikolay Borisov wrote:
>>
>> On 27.06.2018 02:43, fdman...@kernel.org wrote:
>>> From: Filipe Manana
>>>
>>> If a power failure happens while the qgroup rescan kthread is running,
>>> the next mount operation will always fail. This is because of a recent
>>> regression that makes qgroup_rescan_init() incorrectly return -EINVAL
>>> when we are mounting the filesystem (through btrfs_read_qgroup_config()).
>>> This causes the -EINVAL error to be returned regardless of any qgroup
>>> flags being set instead of returning the error only when neither of
>>> the flags BTRFS_QGROUP_STATUS_FLAG_RESCAN nor BTRFS_QGROUP_STATUS_FLAG_ON
>>> are set.
>>>
>>> A test case for fstests follows up soon.
>>>
>>> Fixes: 9593bf49675e ("btrfs: qgroup: show more meaningful qgroup_rescan_init error message")
>>> Signed-off-by: Filipe Manana
>>> ---
>>>  fs/btrfs/qgroup.c | 13 ++---
>>>  1 file changed, 10 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
>>> index 1874a6d2e6f5..d4171de93087 100644
>>> --- a/fs/btrfs/qgroup.c
>>> +++ b/fs/btrfs/qgroup.c
>>> @@ -2784,13 +2784,20 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
>>>
>>>  	if (!init_flags) {
>>>  		/* we're resuming qgroup rescan at mount time */
>>> -		if (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN))
>>> +		if (!(fs_info->qgroup_flags &
>>> +		      BTRFS_QGROUP_STATUS_FLAG_RESCAN)) {
>>>  			btrfs_warn(fs_info,
>>>  			"qgroup rescan init failed, qgroup is not enabled");
>>> -		else if (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_ON))
>>> +			ret = -EINVAL;
>>> +		} else if (!(fs_info->qgroup_flags &
>>> +			     BTRFS_QGROUP_STATUS_FLAG_ON)) {
>>>  			btrfs_warn(fs_info,
>>>  			"qgroup rescan init failed, qgroup rescan is not queued");
>>> -			return -EINVAL;
>>> +			ret = -EINVAL;
>>> +		}
>>> +
>>> +		if (ret)
>>> +			return ret;
>>
>> How is this patch functionally different than the old code. In both
>> cases if either of those 2 is not set a warn is printed and -EINVAL is
>> returned?
>
> It is explained in the changelog:

No need to be snide

>
> "This is because of a recent
> regression that makes qgroup_rescan_init() incorrectly return -EINVAL
> when we are mounting the filesystem (through btrfs_read_qgroup_config()).
> This causes the -EINVAL error to be returned regardless of any qgroup
> flags being set instead of returning the error only when neither of
> the flags BTRFS_QGROUP_STATUS_FLAG_RESCAN nor BTRFS_QGROUP_STATUS_FLAG_ON
> are set."
>
> If you can't understand it, try the test case...

Ok, I see it now; however, your description contradicts the code.
Currently -EINVAL will be returned when either of the 2 flags is unset,
i.e.

!BTRFS_QGROUP_STATUS_FLAG_RESCAN && BTRFS_QGROUP_STATUS_FLAG_ON
!BTRFS_QGROUP_STATUS_FLAG_ON && BTRFS_QGROUP_STATUS_FLAG_RESCAN

and in your description you refer to neither, that is, the 2 flags both
being unset. Perhaps those combinations are invalid due to some other
reasons which are not visible in the code, but in this case the changelog
should be expanded to cover why those 2 combinations are impossible
(because if they are, -EINVAL is still likely)?

>
>>>  	}
>>>
>>>  	mutex_lock(&fs_info->qgroup_rescan_lock);
Re: [PATCH] Btrfs: fix mount failure when qgroup rescan is in progress
On Wed, Jun 27, 2018 at 4:44 PM, Nikolay Borisov wrote:
>
> On 27.06.2018 02:43, fdman...@kernel.org wrote:
>> From: Filipe Manana
>>
>> If a power failure happens while the qgroup rescan kthread is running,
>> the next mount operation will always fail. This is because of a recent
>> regression that makes qgroup_rescan_init() incorrectly return -EINVAL
>> when we are mounting the filesystem (through btrfs_read_qgroup_config()).
>> This causes the -EINVAL error to be returned regardless of any qgroup
>> flags being set instead of returning the error only when neither of
>> the flags BTRFS_QGROUP_STATUS_FLAG_RESCAN nor BTRFS_QGROUP_STATUS_FLAG_ON
>> are set.
>>
>> A test case for fstests follows up soon.
>>
>> Fixes: 9593bf49675e ("btrfs: qgroup: show more meaningful qgroup_rescan_init error message")
>> Signed-off-by: Filipe Manana
>> ---
>>  fs/btrfs/qgroup.c | 13 ++---
>>  1 file changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
>> index 1874a6d2e6f5..d4171de93087 100644
>> --- a/fs/btrfs/qgroup.c
>> +++ b/fs/btrfs/qgroup.c
>> @@ -2784,13 +2784,20 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
>>
>>  	if (!init_flags) {
>>  		/* we're resuming qgroup rescan at mount time */
>> -		if (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN))
>> +		if (!(fs_info->qgroup_flags &
>> +		      BTRFS_QGROUP_STATUS_FLAG_RESCAN)) {
>>  			btrfs_warn(fs_info,
>>  			"qgroup rescan init failed, qgroup is not enabled");
>> -		else if (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_ON))
>> +			ret = -EINVAL;
>> +		} else if (!(fs_info->qgroup_flags &
>> +			     BTRFS_QGROUP_STATUS_FLAG_ON)) {
>>  			btrfs_warn(fs_info,
>>  			"qgroup rescan init failed, qgroup rescan is not queued");
>> -			return -EINVAL;
>> +			ret = -EINVAL;
>> +		}
>> +
>> +		if (ret)
>> +			return ret;
>
> How is this patch functionally different than the old code. In both
> cases if either of those 2 is not set a warn is printed and -EINVAL is
> returned?

It is explained in the changelog:

"This is because of a recent
regression that makes qgroup_rescan_init() incorrectly return -EINVAL
when we are mounting the filesystem (through btrfs_read_qgroup_config()).
This causes the -EINVAL error to be returned regardless of any qgroup
flags being set instead of returning the error only when neither of
the flags BTRFS_QGROUP_STATUS_FLAG_RESCAN nor BTRFS_QGROUP_STATUS_FLAG_ON
are set."

If you can't understand it, try the test case...

>
>>  	}
>>
>>  	mutex_lock(&fs_info->qgroup_rescan_lock);
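The functional difference being discussed comes down to control flow: in the pre-fix C code the warnings were an if/else-if chain, but the `return -EINVAL;` that followed was not inside either branch (no braces), so it executed unconditionally for every mount-time caller. A behavioral model in Python (a sketch of the shapes involved, not the kernel code):

```python
import errno

FLAG_RESCAN = 0x1
FLAG_ON = 0x2

def rescan_init_buggy(flags):
    # Old shape: the if/elif chain guards only the warnings; the
    # return below sits outside both branches, so every mount-time
    # caller gets -EINVAL no matter which flags are set.
    if not (flags & FLAG_RESCAN):
        pass  # btrfs_warn("... qgroup is not enabled")
    elif not (flags & FLAG_ON):
        pass  # btrfs_warn("... qgroup rescan is not queued")
    return -errno.EINVAL  # always reached -- the regression

def rescan_init_fixed(flags):
    # Fixed shape: -EINVAL is set only inside the branches, so a
    # resume with both flags present succeeds.
    ret = 0
    if not (flags & FLAG_RESCAN):
        ret = -errno.EINVAL  # btrfs_warn("... qgroup is not enabled")
    elif not (flags & FLAG_ON):
        ret = -errno.EINVAL  # btrfs_warn("... rescan is not queued")
    return ret

# Resuming a rescan at mount time with both flags set must succeed:
assert rescan_init_buggy(FLAG_RESCAN | FLAG_ON) == -errno.EINVAL  # the bug
assert rescan_init_fixed(FLAG_RESCAN | FLAG_ON) == 0
# With a flag missing, both versions correctly fail:
assert rescan_init_fixed(FLAG_ON) == -errno.EINVAL
```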
Re: [PATCH] Btrfs: fix mount failure when qgroup rescan is in progress
On 27.06.2018 02:43, fdman...@kernel.org wrote:
> From: Filipe Manana
>
> If a power failure happens while the qgroup rescan kthread is running,
> the next mount operation will always fail. This is because of a recent
> regression that makes qgroup_rescan_init() incorrectly return -EINVAL
> when we are mounting the filesystem (through btrfs_read_qgroup_config()).
> This causes the -EINVAL error to be returned regardless of any qgroup
> flags being set instead of returning the error only when neither of
> the flags BTRFS_QGROUP_STATUS_FLAG_RESCAN nor BTRFS_QGROUP_STATUS_FLAG_ON
> are set.
>
> A test case for fstests follows up soon.
>
> Fixes: 9593bf49675e ("btrfs: qgroup: show more meaningful qgroup_rescan_init error message")
> Signed-off-by: Filipe Manana
> ---
>  fs/btrfs/qgroup.c | 13 ++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index 1874a6d2e6f5..d4171de93087 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -2784,13 +2784,20 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
>
>  	if (!init_flags) {
>  		/* we're resuming qgroup rescan at mount time */
> -		if (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN))
> +		if (!(fs_info->qgroup_flags &
> +		      BTRFS_QGROUP_STATUS_FLAG_RESCAN)) {
>  			btrfs_warn(fs_info,
>  			"qgroup rescan init failed, qgroup is not enabled");
> -		else if (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_ON))
> +			ret = -EINVAL;
> +		} else if (!(fs_info->qgroup_flags &
> +			     BTRFS_QGROUP_STATUS_FLAG_ON)) {
>  			btrfs_warn(fs_info,
>  			"qgroup rescan init failed, qgroup rescan is not queued");
> -			return -EINVAL;
> +			ret = -EINVAL;
> +		}
> +
> +		if (ret)
> +			return ret;

How is this patch functionally different than the old code. In both
cases if either of those 2 is not set a warn is printed and -EINVAL is
returned?

>  	}
>
>  	mutex_lock(&fs_info->qgroup_rescan_lock);
[PATCH] fstests: test power failure on btrfs while qgroups rescan is in progress
From: Filipe Manana Test that if a power failure happens on a filesystem with quotas (qgroups) enabled while the quota rescan kernel thread is running, we will be able to mount the filesystem after the power failure. This test is motivated by a recent regression introduced in the linux kernel's 4.18 merge window and is fixed by a patch with the title: "Btrfs: fix mount failure when qgroup rescan is in progress" Signed-off-by: Filipe Manana --- tests/btrfs/166 | 57 + tests/btrfs/166.out | 2 ++ tests/btrfs/group | 1 + 3 files changed, 60 insertions(+) create mode 100755 tests/btrfs/166 create mode 100644 tests/btrfs/166.out diff --git a/tests/btrfs/166 b/tests/btrfs/166 new file mode 100755 index ..c74b0861 --- /dev/null +++ b/tests/btrfs/166 @@ -0,0 +1,57 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright (C) 2018 SUSE Linux Products GmbH. All Rights Reserved. +# +# FSQA Test No. 166 +# +# Test that if a power failure happens on a filesystem with quotas (qgroups) +# enabled while the quota rescan kernel thread is running, we will be able +# to mount the filesystem after the power failure. +# +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo "QA output created by $seq" +tmp=/tmp/$$ +status=1 # failure is the default! +trap "_cleanup; exit \$status" 0 1 2 3 15 + +_cleanup() +{ + _cleanup_flakey + cd / + rm -f $tmp.* +} + +# get standard environment, filters and checks +. ./common/rc +. ./common/filter +. ./common/dmflakey + +# real QA test starts here +_supported_fs btrfs +_supported_os Linux +_require_scratch +_require_dm_target flakey + +rm -f $seqres.full + +_scratch_mkfs >>$seqres.full 2>&1 +_require_metadata_journaling $SCRATCH_DEV +_init_flakey +_mount_flakey + +# Enable qgroups on the filesystem. This will start the qgroup rescan kernel +# thread. 
+_run_btrfs_util_prog quota enable $SCRATCH_MNT + +# Simulate a power failure, while the qgroup rescan kernel thread is running, +# and then mount the filesystem to check that mounting the filesystem does not +# fail. +_flakey_drop_and_remount + +_unmount_flakey +_cleanup_flakey + +echo "Silence is golden" +status=0 +exit diff --git a/tests/btrfs/166.out b/tests/btrfs/166.out new file mode 100644 index ..1b1db1f8 --- /dev/null +++ b/tests/btrfs/166.out @@ -0,0 +1,2 @@ +QA output created by 166 +Silence is golden diff --git a/tests/btrfs/group b/tests/btrfs/group index 9988cedd..e68aa1b7 100644 --- a/tests/btrfs/group +++ b/tests/btrfs/group @@ -168,3 +168,4 @@ 163 auto quick volume 164 auto quick volume 165 auto quick subvol +166 auto quick qgroup -- 2.11.0
[PATCH] Btrfs: fix mount failure when qgroup rescan is in progress
From: Filipe Manana If a power failure happens while the qgroup rescan kthread is running, the next mount operation will always fail. This is because of a recent regression that makes qgroup_rescan_init() incorrectly return -EINVAL when we are mounting the filesystem (through btrfs_read_qgroup_config()). This causes the -EINVAL error to be returned regardless of any qgroup flags being set, instead of returning the error only when neither of the flags BTRFS_QGROUP_STATUS_FLAG_RESCAN nor BTRFS_QGROUP_STATUS_FLAG_ON is set. A test case for fstests follows up soon. Fixes: 9593bf49675e ("btrfs: qgroup: show more meaningful qgroup_rescan_init error message") Signed-off-by: Filipe Manana --- fs/btrfs/qgroup.c | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index 1874a6d2e6f5..d4171de93087 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -2784,13 +2784,20 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid, if (!init_flags) { /* we're resuming qgroup rescan at mount time */ - if (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN)) + if (!(fs_info->qgroup_flags & + BTRFS_QGROUP_STATUS_FLAG_RESCAN)) { btrfs_warn(fs_info, "qgroup rescan init failed, qgroup is not enabled"); - else if (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_ON)) + ret = -EINVAL; + } else if (!(fs_info->qgroup_flags & +BTRFS_QGROUP_STATUS_FLAG_ON)) { btrfs_warn(fs_info, "qgroup rescan init failed, qgroup rescan is not queued"); - return -EINVAL; + ret = -EINVAL; + } + + if (ret) + return ret; } mutex_lock(&fs_info->qgroup_rescan_lock); -- 2.11.0
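The shape of the regression is easy to demonstrate outside the kernel. The sketch below is illustrative only: FLAG_ON, FLAG_RESCAN and EINVAL_ERR are stand-ins for the kernel's BTRFS_QGROUP_STATUS_FLAG_* bits and -EINVAL, not the real code. The regressed version falls through its warning-only branches into an unconditional -EINVAL, so even a perfectly valid mount-time resume fails; the fixed version ties the error to the branch whose check actually failed:

```c
#include <assert.h>

/* Illustrative stand-ins for BTRFS_QGROUP_STATUS_FLAG_* and -EINVAL. */
#define FLAG_ON     (1u << 0)
#define FLAG_RESCAN (1u << 1)
#define EINVAL_ERR  (-22)

/* Regressed shape: the if/else-if only prints warnings, then -EINVAL
 * is returned unconditionally, even when both flags are present. */
static int rescan_init_regressed(unsigned int qgroup_flags)
{
	if (!(qgroup_flags & FLAG_RESCAN))
		; /* btrfs_warn(): "qgroup is not enabled" */
	else if (!(qgroup_flags & FLAG_ON))
		; /* btrfs_warn(): "qgroup rescan is not queued" */
	return EINVAL_ERR; /* bug: reached on every mount-time resume */
}

/* Fixed shape: the error is set only inside the failing branch. */
static int rescan_init_fixed(unsigned int qgroup_flags)
{
	int ret = 0;

	if (!(qgroup_flags & FLAG_RESCAN))
		ret = EINVAL_ERR;
	else if (!(qgroup_flags & FLAG_ON))
		ret = EINVAL_ERR;
	return ret;
}
```

With both flags set (the valid resume case) the regressed version still reports an error while the fixed one succeeds, which is exactly the mount failure described above.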
[PATCH 3/4] btrfs: Rename EXTENT_BUFFER_DUMMY to EXTENT_BUFFER_PRIVATE
EXTENT_BUFFER_DUMMY is an awful name for this flag. Buffers which have this flag set are not in any way dummy. Rather, they are private in the sense that they are not linked to the global buffer tree. This flag has subtle implications for the way free_extent_buffer works, for example, and it also controls whether page->mapping->private_lock is held during extent_buffer release. Pages for a private buffer cannot be under io, nor can they be written by a 3rd party, so taking the lock is unnecessary. Signed-off-by: Nikolay Borisov --- fs/btrfs/disk-io.c | 2 +- fs/btrfs/extent_io.c | 10 +- fs/btrfs/extent_io.h | 2 +- 3 files changed, 7 insertions(+), 7 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 8a469f70d5ee..1c655be92690 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -4093,7 +4093,7 @@ void btrfs_mark_buffer_dirty(struct extent_buffer *buf) * enabled. Normal people shouldn't be marking dummy buffers as dirty * outside of the sanity tests. */ - if (unlikely(test_bit(EXTENT_BUFFER_DUMMY, &buf->bflags))) + if (unlikely(test_bit(EXTENT_BUFFER_PRIVATE, &buf->bflags))) return; #endif root = BTRFS_I(buf->pages[0]->mapping->host)->root; diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 6ac15804bab1..6611207e8e1f 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -4642,7 +4642,7 @@ int extent_buffer_under_io(struct extent_buffer *eb) static void btrfs_release_extent_buffer_page(struct extent_buffer *eb) { int i; - int mapped = !test_bit(EXTENT_BUFFER_DUMMY, &eb->bflags); + int mapped = !test_bit(EXTENT_BUFFER_PRIVATE, &eb->bflags); BUG_ON(extent_buffer_under_io(eb)); @@ -4755,7 +4755,7 @@ struct extent_buffer *btrfs_clone_extent_buffer(struct extent_buffer *src) } set_bit(EXTENT_BUFFER_UPTODATE, &new->bflags); - set_bit(EXTENT_BUFFER_DUMMY, &new->bflags); + set_bit(EXTENT_BUFFER_PRIVATE, &new->bflags); return new; } @@ -4780,7 +4780,7 @@ struct extent_buffer *__alloc_dummy_extent_buffer(struct btrfs_fs_info *fs_info, } set_extent_buffer_uptodate(eb);
btrfs_set_header_nritems(eb, 0); - set_bit(EXTENT_BUFFER_DUMMY, &eb->bflags); + set_bit(EXTENT_BUFFER_PRIVATE, &eb->bflags); return eb; err: @@ -5086,7 +5086,7 @@ static int release_extent_buffer(struct extent_buffer *eb) /* Should be safe to release our pages at this point */ btrfs_release_extent_buffer_page(eb); #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS - if (unlikely(test_bit(EXTENT_BUFFER_DUMMY, &eb->bflags))) { + if (unlikely(test_bit(EXTENT_BUFFER_PRIVATE, &eb->bflags))) { __free_extent_buffer(eb); return 1; } @@ -5117,7 +5117,7 @@ void free_extent_buffer(struct extent_buffer *eb) spin_lock(&eb->refs_lock); if (atomic_read(&eb->refs) == 2 && - test_bit(EXTENT_BUFFER_DUMMY, &eb->bflags)) + test_bit(EXTENT_BUFFER_PRIVATE, &eb->bflags)) atomic_dec(&eb->refs); if (atomic_read(&eb->refs) == 2 && diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index 0bfd4aeb822d..bfccaec2c89a 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -46,7 +46,7 @@ #define EXTENT_BUFFER_STALE 6 #define EXTENT_BUFFER_WRITEBACK 7 #define EXTENT_BUFFER_READ_ERR 8/* read IO error */ -#define EXTENT_BUFFER_DUMMY 9 +#define EXTENT_BUFFER_PRIVATE 9 #define EXTENT_BUFFER_IN_TREE 10 #define EXTENT_BUFFER_WRITE_ERR 11/* write IO error */ -- 2.7.4
[PATCH 4/4] btrfs: Remove unnecessary locking code in qgroup_rescan_leaf
In qgroup_rescan_leaf a copy is made of the target leaf by calling btrfs_clone_extent_buffer. The latter allocates a new buffer, attaches a new set of pages and copies the content of the source buffer. The new scratch buffer is only used to iterate its items; it's not published anywhere and cannot be accessed by a third party. Hence, it's not necessary to perform any locking on it whatsoever. Furthermore, remove the extra extent_buffer_get call since the new buffer is always allocated with a reference count of 1, which is sufficient here. No functional changes. Signed-off-by: Nikolay Borisov --- fs/btrfs/qgroup.c | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index 1874a6d2e6f5..3b57dc247fa2 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -2647,9 +2647,6 @@ qgroup_rescan_leaf(struct btrfs_fs_info *fs_info, struct btrfs_path *path, mutex_unlock(&fs_info->qgroup_rescan_lock); goto out; } - extent_buffer_get(scratch_leaf); - btrfs_tree_read_lock(scratch_leaf); - btrfs_set_lock_blocking_rw(scratch_leaf, BTRFS_READ_LOCK); slot = path->slots[0]; btrfs_release_path(path); mutex_unlock(&fs_info->qgroup_rescan_lock); @@ -2675,10 +2672,8 @@ qgroup_rescan_leaf(struct btrfs_fs_info *fs_info, struct btrfs_path *path, goto out; } out: - if (scratch_leaf) { - btrfs_tree_read_unlock_blocking(scratch_leaf); + if (scratch_leaf) free_extent_buffer(scratch_leaf); - } if (done && !ret) ret = 1; -- 2.7.4
[PATCH 2/4] btrfs: Document locking requirement via lockdep_assert_held
Remove stale comment since there is no longer an eb->eb_lock and document the locking expectation with a lockdep_assert_held statement. No functional changes. Signed-off-by: Nikolay Borisov --- fs/btrfs/extent_io.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 4180a3b7e725..6ac15804bab1 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -5064,9 +5064,10 @@ static inline void btrfs_release_extent_buffer_rcu(struct rcu_head *head) __free_extent_buffer(eb); } -/* Expects to have eb->eb_lock already held */ static int release_extent_buffer(struct extent_buffer *eb) { + lockdep_assert_held(&eb->refs_lock); + WARN_ON(atomic_read(&eb->refs) == 0); if (atomic_dec_and_test(&eb->refs)) { if (test_and_clear_bit(EXTENT_BUFFER_IN_TREE, &eb->bflags)) { -- 2.7.4
[PATCH 1/4] btrfs: Refactor loop in btrfs_release_extent_buffer_page
The purpose of the function is to free all the pages comprising an extent buffer. This can be achieved with a simple for loop rather than the slightly more involved 'do {} while' construct. So rewrite the loop using a 'for' construct. Additionally we can never have an extent_buffer that is 0 pages so remove the check for index == 0. No functional changes. Signed-off-by: Nikolay Borisov --- fs/btrfs/extent_io.c | 13 - 1 file changed, 4 insertions(+), 9 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index cce6087d6880..4180a3b7e725 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -4641,19 +4641,14 @@ int extent_buffer_under_io(struct extent_buffer *eb) */ static void btrfs_release_extent_buffer_page(struct extent_buffer *eb) { - unsigned long index; - struct page *page; + int i; int mapped = !test_bit(EXTENT_BUFFER_DUMMY, &eb->bflags); BUG_ON(extent_buffer_under_io(eb)); - index = num_extent_pages(eb->start, eb->len); - if (index == 0) - return; + for (i = 0; i < num_extent_pages(eb->start, eb->len); i++) { + struct page *page = eb->pages[i]; - do { - index--; - page = eb->pages[index]; if (!page) continue; if (mapped) @@ -4685,7 +4680,7 @@ static void btrfs_release_extent_buffer_page(struct extent_buffer *eb) /* One for when we allocated the page */ put_page(page); - } while (index != 0); + } } /* -- 2.7.4
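As a userspace illustration of the two loop shapes (this stands in for iterating eb->pages[], it is not the kernel function): the 'do {} while' version needs an explicit guard for a zero-entry array because its body always runs at least once, while the 'for' version handles that case naturally. Iteration order also flips from descending to ascending, which is harmless here since each page is handled independently:

```c
#include <assert.h>
#include <stddef.h>

/* Count non-NULL slots, the 'do {} while' way, guard included. */
static int visit_do_while(void **pages, unsigned long num)
{
	unsigned long index = num;
	int seen = 0;

	if (index == 0)	/* explicit guard the 'for' rewrite drops */
		return 0;
	do {
		index--;
		if (pages[index])
			seen++;
	} while (index != 0);
	return seen;
}

/* Same visit, as a plain 'for' loop: zero iterations just fall through. */
static int visit_for(void **pages, unsigned long num)
{
	unsigned long i;
	int seen = 0;

	for (i = 0; i < num; i++)
		if (pages[i])
			seen++;
	return seen;
}
```

Both functions agree on every input, including num == 0, which is why the refactor can drop the early return.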
[PATCH 0/4] Misc cleanups
Here are a couple of cleanups for things I observed while looking at the extent_buffer management code. Patch 1 rewrites a do {} while into a simple for() construct. This survived xfstests + selftests Patch 2 substitutes an outdated comment with a lockdep_assert_held call Patch 3 renames the idiotic EXTENT_BUFFER_DUMMY to something more meaningful Patch 4 removes some cargo-cult copied code when performing qgroup leaf scan Nikolay Borisov (4): btrfs: Refactor loop in btrfs_release_extent_buffer_page btrfs: Document locking requirement via lockdep_assert_held btrfs: Rename EXTENT_BUFFER_DUMMY to EXTENT_BUFFER_PRIVATE btrfs: Remove unnecessary locking code in qgroup_rescan_leaf fs/btrfs/disk-io.c | 2 +- fs/btrfs/extent_io.c | 26 +++--- fs/btrfs/extent_io.h | 2 +- fs/btrfs/qgroup.c| 7 +-- 4 files changed, 14 insertions(+), 23 deletions(-) -- 2.7.4
Re: [PATCH v2] btrfs: Add graceful handling of V0 extents
On Wed, Jun 27, 2018 at 09:12:06AM -0400, Noah Massey wrote: > On Tue, Jun 26, 2018 at 12:02 PM David Sterba wrote: > > > > On Tue, Jun 26, 2018 at 04:57:36PM +0300, Nikolay Borisov wrote: > > > Following the removal of the v0 handling code let's be courteous and > > > print an error message when such extents are handled. In the cases > > > where we have a transaction just abort it, otherwise just call > > > btrfs_handle_fs_error. Both cases result in the FS being re-mounted RO. > > > > > > Signed-off-by: Nikolay Borisov > > > > > - ASSERT(key.type != BTRFS_EXTENT_REF_V0_KEY); > > > - if (key.type == BTRFS_SHARED_BLOCK_REF_KEY) { > > > + if (key.type == BTRFS_EXTENT_REF_V0_KEY) { > > > + err = -EINVAL; > > > + btrfs_print_v0_err(rc->extent_root->fs_info); > > > + btrfs_handle_fs_error(rc->extent_root->fs_info, err, > > > + NULL); > > > + goto out; > > > + } else if (key.type == BTRFS_SHARED_BLOCK_REF_KEY) { > > > > The v0 check should be made last as it's not expected to happen. I'm > > commiting with this diff > > > > --- a/fs/btrfs/relocation.c > > +++ b/fs/btrfs/relocation.c > > @@ -787,13 +787,7 @@ struct backref_node *build_backref_tree(struct > > reloc_control *rc, > > goto next; > > } > > > > - if (key.type == BTRFS_EXTENT_REF_V0_KEY) { > > - err = -EINVAL; > > - btrfs_print_v0_err(rc->extent_root->fs_info); > > - btrfs_handle_fs_error(rc->extent_root->fs_info, err, > > - NULL); > > - goto out; > > - } else if (key.type == BTRFS_SHARED_BLOCK_REF_KEY) { > > + if (key.type == BTRFS_SHARED_BLOCK_REF_KEY) { > > if (key.objectid == key.offset) { > > /* > > * only root blocks of reloc trees use > > @@ -838,6 +832,12 @@ struct backref_node *build_backref_tree(struct > > reloc_control *rc, > > goto next; > > } else if (key.type != BTRFS_TREE_BLOCK_REF_KEY) { > > goto next; > > The V0 check needs to be before this one Right, due to !=. Thanks for catching it. 
> > + } else if (key.type == BTRFS_EXTENT_REF_V0_KEY) { > > + err = -EINVAL; > > + btrfs_print_v0_err(rc->extent_root->fs_info); > > + btrfs_handle_fs_error(rc->extent_root->fs_info, err, > > + NULL); > > + goto out; > > } > > > > /* key.type == BTRFS_TREE_BLOCK_REF_KEY */
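The ordering bug Noah caught can be reduced to a small standalone sketch (illustrative types and return codes only, not the relocation code itself): placing the V0 branch after the catch-all `type != TREE_BLOCK_REF` guard makes it dead code, because every V0 key already matches the guard and is silently skipped:

```c
#include <assert.h>

/* Illustrative key types; values are arbitrary, not the on-disk ones. */
enum ref_type { SHARED_BLOCK_REF, TREE_BLOCK_REF, EXTENT_REF_V0, OTHER_REF };

#define HANDLED    0
#define NEXT       1
#define EINVAL_ERR (-22)

/* V0 branch after the catch-all: unreachable, V0 keys fall into NEXT. */
static int classify_broken(enum ref_type type)
{
	if (type == SHARED_BLOCK_REF)
		return HANDLED;
	else if (type != TREE_BLOCK_REF)
		return NEXT;		/* swallows EXTENT_REF_V0 */
	else if (type == EXTENT_REF_V0)
		return EINVAL_ERR;	/* dead code */
	return HANDLED;			/* tree block ref */
}

/* V0 branch before the catch-all, as Noah's correction places it. */
static int classify_fixed(enum ref_type type)
{
	if (type == SHARED_BLOCK_REF)
		return HANDLED;
	else if (type == EXTENT_REF_V0)
		return EINVAL_ERR;	/* now actually reported */
	else if (type != TREE_BLOCK_REF)
		return NEXT;
	return HANDLED;
}
```

In the broken ordering a V0 extent is quietly treated as "skip to next" instead of raising the error, which is precisely why the V0 check has to precede the `!=` guard.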
Re: [PATCH v2] btrfs: Add graceful handling of V0 extents
On Tue, Jun 26, 2018 at 12:02 PM David Sterba wrote: > > On Tue, Jun 26, 2018 at 04:57:36PM +0300, Nikolay Borisov wrote: > > Following the removal of the v0 handling code let's be courteous and > > print an error message when such extents are handled. In the cases > > where we have a transaction just abort it, otherwise just call > > btrfs_handle_fs_error. Both cases result in the FS being re-mounted RO. > > > > Signed-off-by: Nikolay Borisov > > > - ASSERT(key.type != BTRFS_EXTENT_REF_V0_KEY); > > - if (key.type == BTRFS_SHARED_BLOCK_REF_KEY) { > > + if (key.type == BTRFS_EXTENT_REF_V0_KEY) { > > + err = -EINVAL; > > + btrfs_print_v0_err(rc->extent_root->fs_info); > > + btrfs_handle_fs_error(rc->extent_root->fs_info, err, > > + NULL); > > + goto out; > > + } else if (key.type == BTRFS_SHARED_BLOCK_REF_KEY) { > > The v0 check should be made last as it's not expected to happen. I'm > commiting with this diff > > --- a/fs/btrfs/relocation.c > +++ b/fs/btrfs/relocation.c > @@ -787,13 +787,7 @@ struct backref_node *build_backref_tree(struct > reloc_control *rc, > goto next; > } > > - if (key.type == BTRFS_EXTENT_REF_V0_KEY) { > - err = -EINVAL; > - btrfs_print_v0_err(rc->extent_root->fs_info); > - btrfs_handle_fs_error(rc->extent_root->fs_info, err, > - NULL); > - goto out; > - } else if (key.type == BTRFS_SHARED_BLOCK_REF_KEY) { > + if (key.type == BTRFS_SHARED_BLOCK_REF_KEY) { > if (key.objectid == key.offset) { > /* > * only root blocks of reloc trees use > @@ -838,6 +832,12 @@ struct backref_node *build_backref_tree(struct > reloc_control *rc, > goto next; > } else if (key.type != BTRFS_TREE_BLOCK_REF_KEY) { > goto next; The V0 check needs to be before this one > + } else if (key.type == BTRFS_EXTENT_REF_V0_KEY) { > + err = -EINVAL; > + btrfs_print_v0_err(rc->extent_root->fs_info); > + btrfs_handle_fs_error(rc->extent_root->fs_info, err, > + NULL); > + goto out; > } > > /* key.type == BTRFS_TREE_BLOCK_REF_KEY */ > @@ -3734,11 +3734,7 @@ int 
add_data_references(struct reloc_control *rc, > if (key.objectid != extent_key->objectid) > break; > > - if (key.type == BTRFS_EXTENT_REF_V0_KEY) { > - btrfs_print_v0_err(eb->fs_info); > - btrfs_handle_fs_error(eb->fs_info, -EINVAL, NULL); > - ret = -EINVAL; > - } else if (key.type == BTRFS_SHARED_DATA_REF_KEY) { > + if (key.type == BTRFS_SHARED_DATA_REF_KEY) { > ret = __add_tree_block(rc, key.offset, blocksize, >blocks); > } else if (key.type == BTRFS_EXTENT_DATA_REF_KEY) { > @@ -3746,6 +3742,10 @@ int add_data_references(struct reloc_control *rc, > struct btrfs_extent_data_ref); > ret = find_data_references(rc, extent_key, >eb, dref, blocks); > + } else if (key.type == BTRFS_EXTENT_REF_V0_KEY) { > + btrfs_print_v0_err(eb->fs_info); > + btrfs_handle_fs_error(eb->fs_info, -EINVAL, NULL); > + ret = -EINVAL; > } else { > ret = 0; > }
Re: [PATCH v2] btrfs: return EUCLEAN if extent_inline_ref type is invalid
On Mon, Jun 25, 2018 at 09:28:32AM +0800, Su Yue wrote: > > > On 06/22/2018 05:40 PM, David Sterba wrote: > > On Fri, Jun 22, 2018 at 04:18:01PM +0800, Su Yue wrote: > >> If type of extent_inline_ref found is not expected, filesystem may have > >> been corrupted, should return EUCLEAN instead of EINVAL. > >> No functional changes. > >> > >> Signed-off-by: Su Yue > >> --- > >> Changelog: > >> v2: > >> Add changes in build_backref_tree, get_extent_inline_ref and > >> add_inline_refs. > > > > v2 looks good. I saw one implicit check for invalid ref in > > add_data_references that still returns -EINVAL. > > > > The implicit check is located in relocation.c:3804. I changed > it in v1 which is not listed in the changelog. > Do I need to send v3 to list all functions in the commit message? No, the change is in the patch, I must have overlooked it, sorry. Added to misc-next now, thanks.
fstests/btrfs/011 lockdep warning in 4.18-rc
Hi, I've seen the following lockdep warning after the 4.18 merges, it's probably a cross-subsystem locking issue so I waited some time if this will not go away after merge window. Slab shrinker calls evict inode, in parallel there's an unmount in progress and at some point locks get taken in the wrong order. [ 635.423621] == [ 635.429802] WARNING: possible circular locking dependency detected [ 635.432211] 4.18.0-rc1-default+ #164 Not tainted [ 635.434006] -- [ 635.436888] kswapd0/77 is trying to acquire lock: [ 635.438777] f6fa0d1f (_node->mutex){+.+.}, at: __btrfs_release_delayed_node+0x49/0x2b0 [btrfs] [ 635.440579] [ 635.440579] but task is already holding lock: [ 635.441636] 25dee6ec (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30 [ 635.442851] [ 635.442851] which lock already depends on the new lock. [ 635.442851] [ 635.444221] [ 635.444221] the existing dependency chain (in reverse order) is: [ 635.445398] [ 635.445398] -> #3 (fs_reclaim){+.+.}: [ 635.446499]fs_reclaim_acquire.part.111+0x29/0x30 [ 635.447332]kmem_cache_alloc+0x29/0x2b0 [ 635.448066]btrfs_alloc_inode+0x24/0x260 [btrfs] [ 635.448889]alloc_inode+0x18/0x80 [ 635.449549]new_inode_pseudo+0xc/0x60 [ 635.450304]new_inode+0x12/0x30 [ 635.451601]iget5_locked+0xb1/0xf0 [ 635.452524]btrfs_iget+0x57/0xf0 [btrfs] [ 635.453276]__lookup_free_space_inode+0xd8/0x150 [btrfs] [ 635.454205]lookup_free_space_inode+0x63/0xc0 [btrfs] [ 635.455058]load_free_space_cache+0x6e/0x190 [btrfs] [ 635.455949]cache_block_group+0x1c6/0x460 [btrfs] [ 635.456793]find_free_extent+0x889/0x14e0 [btrfs] [ 635.457640]btrfs_reserve_extent+0x9b/0x180 [btrfs] [ 635.458463]btrfs_alloc_tree_block+0x1e8/0x440 [btrfs] [ 635.459204]__btrfs_cow_block+0x109/0x700 [btrfs] [ 635.460124]btrfs_cow_block+0x129/0x2f0 [btrfs] [ 635.460938]btrfs_search_slot+0x22f/0xa70 [btrfs] [ 635.461977]btrfs_lookup_inode+0x3a/0xc0 [btrfs] [ 635.463158]__btrfs_update_delayed_inode+0x75/0x270 [btrfs] [ 635.464032]__btrfs_run_delayed_items+0x147/0x1d0 [btrfs] 
[ 635.464825]btrfs_commit_transaction+0x18a/0x9e0 [btrfs] [ 635.465737]sync_filesystem+0x6b/0x90 [ 635.466439]generic_shutdown_super+0x22/0x100 [ 635.467229]kill_anon_super+0xe/0x20 [ 635.467994]btrfs_kill_super+0x12/0xa0 [btrfs] [ 635.468827]deactivate_locked_super+0x29/0x60 [ 635.469622]cleanup_mnt+0x3b/0x70 [ 635.470363]task_work_run+0x9b/0xd0 [ 635.471479]exit_to_usermode_loop+0xbb/0xc0 [ 635.472519]do_syscall_64+0x16c/0x170 [ 635.473329]entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 635.474309] [ 635.474309] -> #2 (_ctl->mutex){+.+.}: [ 635.475430]__mutex_lock+0x86/0x9c0 [ 635.476206]cache_block_group+0x1bb/0x460 [btrfs] [ 635.477152]find_free_extent+0x889/0x14e0 [btrfs] [ 635.478101]btrfs_reserve_extent+0x9b/0x180 [btrfs] [ 635.479074]btrfs_alloc_tree_block+0x1e8/0x440 [btrfs] [ 635.480081]__btrfs_cow_block+0x109/0x700 [btrfs] [ 635.481028]btrfs_cow_block+0x129/0x2f0 [btrfs] [ 635.482020]btrfs_search_slot+0x22f/0xa70 [btrfs] [ 635.482733]btrfs_insert_empty_items+0x67/0xc0 [btrfs] [ 635.483459]btrfs_uuid_tree_add+0x1d2/0x2e0 [btrfs] [ 635.484158]btrfs_uuid_scan_kthread+0x159/0x340 [btrfs] [ 635.484880]kthread+0x11b/0x140 [ 635.485443]ret_from_fork+0x24/0x30 [ 635.486075] [ 635.486075] -> #1 (_info->groups_sem){}: [ 635.486931]down_read+0x3b/0x60 [ 635.487456]find_free_extent+0x992/0x14e0 [btrfs] [ 635.488137]btrfs_reserve_extent+0x9b/0x180 [btrfs] [ 635.488833]btrfs_alloc_tree_block+0x1e8/0x440 [btrfs] [ 635.489639]__btrfs_cow_block+0x109/0x700 [btrfs] [ 635.490440]btrfs_cow_block+0x129/0x2f0 [btrfs] [ 635.491102]btrfs_search_slot+0x22f/0xa70 [btrfs] [ 635.491782]btrfs_insert_empty_items+0x67/0xc0 [btrfs] [ 635.492508]btrfs_insert_delayed_items+0x86/0x490 [btrfs] [ 635.493305]__btrfs_run_delayed_items+0x96/0x1d0 [btrfs] [ 635.494200]btrfs_commit_transaction+0x297/0x9e0 [btrfs] [ 635.494989]btrfs_mksubvol+0x4e8/0x530 [btrfs] [ 635.495651]btrfs_ioctl_snap_create_transid+0x170/0x180 [btrfs] [ 635.496458]btrfs_ioctl_snap_create_v2+0x124/0x180 [btrfs] [ 
635.497224]btrfs_ioctl+0xc35/0x31a0 [btrfs] [ 635.498003]do_vfs_ioctl+0xa2/0x6b0 [ 635.498680]ksys_ioctl+0x3a/0x70 [ 635.499331]__x64_sys_ioctl+0x16/0x20 [ 635.500032]do_syscall_64+0x5a/0x170 [ 635.500715]entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 635.501574] [
[PATCH v1] btrfs: quota: Set rescan progress to (u64)-1 if we hit last leaf
Commit ff3d27a048d9 ("btrfs: qgroup: Finish rescan when hit the last leaf of extent tree") added a new exit for rescan finish. However, after finishing quota rescan we set fs_info->qgroup_rescan_progress to (u64)-1 before we exit through the original exit path, while we missed that assignment in the new exit path. The end result is that the quota status item doesn't carry the same value ((u64)-1 vs the last bytenr + 1). Although it doesn't affect quota accounting, it's still better to keep the original behavior. Reported-by: Misono Tomohiro Fixes: ff3d27a048d9 ("btrfs: qgroup: Finish rescan when hit the last leaf of extent tree") Signed-off-by: Qu Wenruo --- changelog: v2: Commit message update, as the bug only changes the resulting quota status item without impacting the behavior. --- fs/btrfs/qgroup.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index 1874a6d2e6f5..99f2b9ce0f15 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -2680,8 +2680,10 @@ qgroup_rescan_leaf(struct btrfs_fs_info *fs_info, struct btrfs_path *path, free_extent_buffer(scratch_leaf); } - if (done && !ret) + if (done && !ret) { ret = 1; + fs_info->qgroup_rescan_progress.objectid = (u64)-1; + } return ret; } -- 2.18.0
Re: Enabling quota may not correctly rescan on 4.17
On 2018-06-27 16:57, Qu Wenruo wrote: > > > On 2018-06-27 16:47, Nikolay Borisov wrote: >> >> >> On 27.06.2018 11:38, Qu Wenruo wrote: >>> >>> >>> On 2018-06-27 16:34, Qu Wenruo wrote: On 2018-06-27 16:25, Misono Tomohiro wrote: > On 2018/06/27 17:10, Qu Wenruo wrote: >> >> >> On 2018-06-26 14:00, Misono Tomohiro wrote: >>> Hello Nikolay, >>> >>> I noticed that commit 5d23515be669 ("btrfs: Move qgroup rescan >>> on quota enable to btrfs_quota_enable") in 4.17 sometimes causes >>> to fail correctly rescanning quota when quota is enabled. >>> >>> Simple reproducer: >>> >>> $ mkfs.btrfs -f $DEV >>> $ mount $DEV /mnt >>> $ dd if=/dev/urandom of=/mnt/file bs=1000 count=1000 >>> $ btrfs quota enable /mnt >>> $ umount /mnt >>> $ btrfs check $DEV >>> ... >>> checking quota groups >>> Counts for qgroup id: 0/5 are different >>> our:referenced 1019904 referenced compressed 1019904 >>> disk: referenced 16384 referenced compressed 16384 >>> diff: referenced 1003520 referenced compressed 1003520 >>> our:exclusive 1019904 exclusive compressed 1019904 >>> disk: exclusive 16384 exclusive compressed 16384 >>> diff: exclusive 1003520 exclusive compressed 1003520 >>> found 1413120 bytes used, error(s) found >>> ... >>> >>> This can be also observed in btrfs/114. (Note that progs < 4.17 >>> returns error code 0 even if quota is not consistent and therefore >>> the test will incorrectly pass.) >> >> BTW, would you please try to dump the quota tree for such a mismatch case? >> >> It could be a btrfs-progs bug which should skip quota checking if it >> found the quota status item has the RESCAN flag. 
> > Yes, this is what I see after running btrfs/114 (/dev/sdh1 is scratch > dev): > > $ sudo btrfs check -Q /dev/sdh1 > Checking filesystem on /dev/sdh1 > UUID: d07f6028-0ae7-40d4-ac45-01a4505ddcfb > Print quota groups for /dev/sdh1 > UUID: d07f6028-0ae7-40d4-ac45-01a4505ddcfb > Counts for qgroup id: 0/5 are different > our:referenced 170999808 referenced compressed 170999808 > disk: referenced 16384 referenced compressed 16384 > diff: referenced 170983424 referenced compressed 170983424 > our:exclusive 170999808 exclusive compressed 170999808 > disk: exclusive 16384 exclusive compressed 16384 > diff: exclusive 170983424 exclusive compressed 170983424 > > > $ sudo btrfs inspect-internal dump-tree -t quota /dev/sdh1 > btrfs-progs v4.17 > quota tree key (QUOTA_TREE ROOT_ITEM 0) > leaf 213958656 items 3 free space 16096 generation 9 owner QUOTA_TREE > leaf 213958656 flags 0x1(WRITTEN) backref revision 1 > fs uuid d07f6028-0ae7-40d4-ac45-01a4505ddcfb > chunk uuid 78d753d0-eeb7-4c3e-b825-b6c2c5de5c7a > item 0 key (0 QGROUP_STATUS 0) itemoff 16251 itemsize 32 > version 1 generation 9 flags ON scan 30572545 Scan is not -1 and flags is only ON, without RESCAN. 
> item 1 key (0 QGROUP_INFO 0/5) itemoff 16211 itemsize 40 > generation 7 > referenced 16384 referenced_compressed 16384 > exclusive 16384 exclusive_compressed 16384 > item 2 key (0 QGROUP_LIMIT 0/5) itemoff 16171 itemsize 40 > flags 0 > max_referenced 0 max_exclusive 0 > rsv_referenced 0 rsv_exclusive 0 > total bytes 26843545600 > bytes used 171769856 > uuid d07f6028-0ae7-40d4-ac45-01a4505ddcfb > > > And if I mount+rescan again: > > $ sudo mount /dev/sdh1 /mnt > $ sudo btrfs quota rescan -w /mnt > $ sudo umount /mnt > > $ sudo btrfs check -Q /dev/sdh1 > Checking filesystem on /dev/sdh1 > UUID: d07f6028-0ae7-40d4-ac45-01a4505ddcfb > Print quota groups for /dev/sdh1 > UUID: d07f6028-0ae7-40d4-ac45-01a4505ddcfb > Counts for qgroup id: 0/5 > our:referenced 170999808 referenced compressed 170999808 > disk: referenced 170999808 referenced compressed 170999808 > our:exclusive 170999808 exclusive compressed 170999808 > disk: exclusive 170999808 exclusive compressed 170999808 > > $ sudo btrfs inspect-internal dump-tree -t quota /dev/sdh1 > btrfs-progs v4.17 > quota tree key (QUOTA_TREE ROOT_ITEM 0) > leaf 31309824 items 3 free space 16096 generation 13 owner QUOTA_TREE > leaf 31309824 flags 0x1(WRITTEN) backref revision 1 > fs uuid d07f6028-0ae7-40d4-ac45-01a4505ddcfb > chunk uuid 78d753d0-eeb7-4c3e-b825-b6c2c5de5c7a > item 0 key (0 QGROUP_STATUS 0) itemoff 16251 itemsize 32 > version 1 generation 13 flags ON scan 213827585
Re: Enabling quota may not correctly rescan on 4.17
On 2018-06-27 16:47, Nikolay Borisov wrote: > > > On 27.06.2018 11:38, Qu Wenruo wrote: >> >> >> On 2018-06-27 16:34, Qu Wenruo wrote: >>> >>> >>> On 2018-06-27 16:25, Misono Tomohiro wrote: On 2018/06/27 17:10, Qu Wenruo wrote: > > > On 2018-06-26 14:00, Misono Tomohiro wrote: >> Hello Nikolay, >> >> I noticed that commit 5d23515be669 ("btrfs: Move qgroup rescan >> on quota enable to btrfs_quota_enable") in 4.17 sometimes causes >> to fail correctly rescanning quota when quota is enabled. >> >> Simple reproducer: >> >> $ mkfs.btrfs -f $DEV >> $ mount $DEV /mnt >> $ dd if=/dev/urandom of=/mnt/file bs=1000 count=1000 >> $ btrfs quota enable /mnt >> $ umount /mnt >> $ btrfs check $DEV >> ... >> checking quota groups >> Counts for qgroup id: 0/5 are different >> our:referenced 1019904 referenced compressed 1019904 >> disk: referenced 16384 referenced compressed 16384 >> diff: referenced 1003520 referenced compressed 1003520 >> our:exclusive 1019904 exclusive compressed 1019904 >> disk: exclusive 16384 exclusive compressed 16384 >> diff: exclusive 1003520 exclusive compressed 1003520 >> found 1413120 bytes used, error(s) found >> ... >> >> This can be also observed in btrfs/114. (Note that progs < 4.17 >> returns error code 0 even if quota is not consistent and therefore >> the test will incorrectly pass.) > > BTW, would you please try to dump the quota tree for such a mismatch case? > > It could be a btrfs-progs bug which should skip quota checking if it > found the quota status item has the RESCAN flag. 
Yes, this is what I see after running btrfs/114 (/dev/sdh1 is scratch dev): $ sudo btrfs check -Q /dev/sdh1 Checking filesystem on /dev/sdh1 UUID: d07f6028-0ae7-40d4-ac45-01a4505ddcfb Print quota groups for /dev/sdh1 UUID: d07f6028-0ae7-40d4-ac45-01a4505ddcfb Counts for qgroup id: 0/5 are different our:referenced 170999808 referenced compressed 170999808 disk: referenced 16384 referenced compressed 16384 diff: referenced 170983424 referenced compressed 170983424 our:exclusive 170999808 exclusive compressed 170999808 disk: exclusive 16384 exclusive compressed 16384 diff: exclusive 170983424 exclusive compressed 170983424 $ sudo btrfs inspect-internal dump-tree -t quota /dev/sdh1 btrfs-progs v4.17 quota tree key (QUOTA_TREE ROOT_ITEM 0) leaf 213958656 items 3 free space 16096 generation 9 owner QUOTA_TREE leaf 213958656 flags 0x1(WRITTEN) backref revision 1 fs uuid d07f6028-0ae7-40d4-ac45-01a4505ddcfb chunk uuid 78d753d0-eeb7-4c3e-b825-b6c2c5de5c7a item 0 key (0 QGROUP_STATUS 0) itemoff 16251 itemsize 32 version 1 generation 9 flags ON scan 30572545 >>> >>> Scan is not -1 and flags is only ON, without RESCAN. 
>>> item 1 key (0 QGROUP_INFO 0/5) itemoff 16211 itemsize 40 generation 7 referenced 16384 referenced_compressed 16384 exclusive 16384 exclusive_compressed 16384 item 2 key (0 QGROUP_LIMIT 0/5) itemoff 16171 itemsize 40 flags 0 max_referenced 0 max_exclusive 0 rsv_referenced 0 rsv_exclusive 0 total bytes 26843545600 bytes used 171769856 uuid d07f6028-0ae7-40d4-ac45-01a4505ddcfb And if I mount+rescan again: $ sudo mount /dev/sdh1 /mnt $ sudo btrfs quota rescan -w /mnt $ sudo umount /mnt $ sudo btrfs check -Q /dev/sdh1 Checking filesystem on /dev/sdh1 UUID: d07f6028-0ae7-40d4-ac45-01a4505ddcfb Print quota groups for /dev/sdh1 UUID: d07f6028-0ae7-40d4-ac45-01a4505ddcfb Counts for qgroup id: 0/5 our:referenced 170999808 referenced compressed 170999808 disk: referenced 170999808 referenced compressed 170999808 our:exclusive 170999808 exclusive compressed 170999808 disk: exclusive 170999808 exclusive compressed 170999808 $ sudo btrfs inspect-internal dump-tree -t quota /dev/sdh1 btrfs-progs v4.17 quota tree key (QUOTA_TREE ROOT_ITEM 0) leaf 31309824 items 3 free space 16096 generation 13 owner QUOTA_TREE leaf 31309824 flags 0x1(WRITTEN) backref revision 1 fs uuid d07f6028-0ae7-40d4-ac45-01a4505ddcfb chunk uuid 78d753d0-eeb7-4c3e-b825-b6c2c5de5c7a item 0 key (0 QGROUP_STATUS 0) itemoff 16251 itemsize 32 version 1 generation 13 flags ON scan 213827585 >>> >>> Still doesn't look good. >>> >>> In v4.17.2 (sorry, just checking the behavior on my host), after correct >>> rescan + sync, if we don't have RESCAN flag, we
[PATCH] btrfs: quota: Reset rescan progress if we hit last leaf
Commit ff3d27a048d9 ("btrfs: qgroup: Finish rescan when hit the last leaf of extent tree") added a new exit path for rescan finish. However, that new exit path does not reset fs_info->qgroup_rescan_progress to (u64)-1 after the rescan finishes. qgroup_rescan_progress is also used to determine whether we should account dirty extents: if a dirty extent is beyond qgroup_rescan_progress, we don't account it, since the rescan will account it later.

Without properly resetting qgroup_rescan_progress to (u64)-1, all later dirty extents beyond the stale qgroup_rescan_progress will be ignored, screwing up the whole quota accounting.

Reported-by: Misono Tomohiro
Fixes: ff3d27a048d9 ("btrfs: qgroup: Finish rescan when hit the last leaf of extent tree")
Signed-off-by: Qu Wenruo
---
 fs/btrfs/qgroup.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 1874a6d2e6f5..99f2b9ce0f15 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2680,8 +2680,10 @@ qgroup_rescan_leaf(struct btrfs_fs_info *fs_info, struct btrfs_path *path,
 		free_extent_buffer(scratch_leaf);
 	}
 
-	if (done && !ret)
+	if (done && !ret) {
 		ret = 1;
+		fs_info->qgroup_rescan_progress.objectid = (u64)-1;
+	}
 	return ret;
 }
-- 
2.18.0
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Enabling quota may not correctly rescan on 4.17
On 27.06.2018 11:38, Qu Wenruo wrote: > > > On 2018年06月27日 16:34, Qu Wenruo wrote: >> >> >> On 2018年06月27日 16:25, Misono Tomohiro wrote: >>> On 2018/06/27 17:10, Qu Wenruo wrote: On 2018年06月26日 14:00, Misono Tomohiro wrote: > Hello Nikolay, > > I noticed that commit 5d23515be669 ("btrfs: Move qgroup rescan > on quota enable to btrfs_quota_enable") in 4.17 sometimes causes > to fail correctly rescanning quota when quota is enabled. > > Simple reproducer: > > $ mkfs.btrfs -f $DEV > $ mount $DEV /mnt > $ dd if=/dev/urandom of=/mnt/file bs=1000 count=1000 > $ btrfs quota enbale /mnt > $ umount /mnt > $ btrfs check $DEV > ... > checking quota groups > Counts for qgroup id: 0/5 are different > our:referenced 1019904 referenced compressed 1019904 > disk: referenced 16384 referenced compressed 16384 > diff: referenced 1003520 referenced compressed 1003520 > our:exclusive 1019904 exclusive compressed 1019904 > disk: exclusive 16384 exclusive compressed 16384 > diff: exclusive 1003520 exclusive compressed 1003520 > found 1413120 bytes used, error(s) found > ... > > This can be also observed in btrfs/114. (Note that progs < 4.17 > returns error code 0 even if quota is not consistency and therefore > test will incorrectly pass.) BTW, would you please try to dump the quota tree for such mismatch case? It could be a btrfs-progs bug which it should skip quota checking if it found the quota status item has RESCAN flag. 
>>> >>> Yes, this is what I see after running btrfs/114 (/dev/sdh1 is scratch dev): >>> >>> $ sudo btrfs check -Q /dev/sdh1 >>> Checking filesystem on /dev/sdh1 >>> UUID: d07f6028-0ae7-40d4-ac45-01a4505ddcfb >>> Print quota groups for /dev/sdh1 >>> UUID: d07f6028-0ae7-40d4-ac45-01a4505ddcfb >>> Counts for qgroup id: 0/5 are different >>> our:referenced 170999808 referenced compressed 170999808 >>> disk: referenced 16384 referenced compressed 16384 >>> diff: referenced 170983424 referenced compressed 170983424 >>> our:exclusive 170999808 exclusive compressed 170999808 >>> disk: exclusive 16384 exclusive compressed 16384 >>> diff: exclusive 170983424 exclusive compressed 170983424 >>> >>> >>> $ sudo btrfs inspect-internal dump-tree -t quota /dev/sdh1 >>> btrfs-progs v4.17 >>> quota tree key (QUOTA_TREE ROOT_ITEM 0) >>> leaf 213958656 items 3 free space 16096 generation 9 owner QUOTA_TREE >>> leaf 213958656 flags 0x1(WRITTEN) backref revision 1 >>> fs uuid d07f6028-0ae7-40d4-ac45-01a4505ddcfb >>> chunk uuid 78d753d0-eeb7-4c3e-b825-b6c2c5de5c7a >>> item 0 key (0 QGROUP_STATUS 0) itemoff 16251 itemsize 32 >>> version 1 generation 9 flags ON scan 30572545 >> >> Scan is not -1 and flags is only ON, without RESCAN. 
>> >>> item 1 key (0 QGROUP_INFO 0/5) itemoff 16211 itemsize 40 >>> generation 7 >>> referenced 16384 referenced_compressed 16384 >>> exclusive 16384 exclusive_compressed 16384 >>> item 2 key (0 QGROUP_LIMIT 0/5) itemoff 16171 itemsize 40 >>> flags 0 >>> max_referenced 0 max_exclusive 0 >>> rsv_referenced 0 rsv_exclusive 0 >>> total bytes 26843545600 >>> bytes used 171769856 >>> uuid d07f6028-0ae7-40d4-ac45-01a4505ddcfb >>> >>> >>> And if I mount+rescan again: >>> >>> $ sudo mount /dev/sdh1 /mnt >>> $ sudo btrfs quota rescan -w /mnt >>> $ sudo umount /mnt >>> >>> $ sudo btrfs check -Q /dev/sdh1 >>> Checking filesystem on /dev/sdh1 >>> UUID: d07f6028-0ae7-40d4-ac45-01a4505ddcfb >>> Print quota groups for /dev/sdh1 >>> UUID: d07f6028-0ae7-40d4-ac45-01a4505ddcfb >>> Counts for qgroup id: 0/5 >>> our:referenced 170999808 referenced compressed 170999808 >>> disk: referenced 170999808 referenced compressed 170999808 >>> our:exclusive 170999808 exclusive compressed 170999808 >>> disk: exclusive 170999808 exclusive compressed 170999808 >>> >>> $ sudo btrfs inspect-internal dump-tree -t quota /dev/sdh1 >>> btrfs-progs v4.17 >>> quota tree key (QUOTA_TREE ROOT_ITEM 0) >>> leaf 31309824 items 3 free space 16096 generation 13 owner QUOTA_TREE >>> leaf 31309824 flags 0x1(WRITTEN) backref revision 1 >>> fs uuid d07f6028-0ae7-40d4-ac45-01a4505ddcfb >>> chunk uuid 78d753d0-eeb7-4c3e-b825-b6c2c5de5c7a >>> item 0 key (0 QGROUP_STATUS 0) itemoff 16251 itemsize 32 >>> version 1 generation 13 flags ON scan 213827585 >> >> Still doesn't look good. >> >> In v4.17.2 (sorry, just checking the behavior on my host), after correct >> rescan + sync, if we don't have RESCAN flag, we should have scan set to -1. >> >> While in in v4.18-rc1, it doesn't reset the scan progress to -1 after >> finished. >> And just as explained in previous reply, if
Re: Enabling quota may not correctly rescan on 4.17
On 2018年06月27日 16:34, Qu Wenruo wrote: > > > On 2018年06月27日 16:25, Misono Tomohiro wrote: >> On 2018/06/27 17:10, Qu Wenruo wrote: >>> >>> >>> On 2018年06月26日 14:00, Misono Tomohiro wrote: Hello Nikolay, I noticed that commit 5d23515be669 ("btrfs: Move qgroup rescan on quota enable to btrfs_quota_enable") in 4.17 sometimes causes to fail correctly rescanning quota when quota is enabled. Simple reproducer: $ mkfs.btrfs -f $DEV $ mount $DEV /mnt $ dd if=/dev/urandom of=/mnt/file bs=1000 count=1000 $ btrfs quota enbale /mnt $ umount /mnt $ btrfs check $DEV ... checking quota groups Counts for qgroup id: 0/5 are different our:referenced 1019904 referenced compressed 1019904 disk: referenced 16384 referenced compressed 16384 diff: referenced 1003520 referenced compressed 1003520 our:exclusive 1019904 exclusive compressed 1019904 disk: exclusive 16384 exclusive compressed 16384 diff: exclusive 1003520 exclusive compressed 1003520 found 1413120 bytes used, error(s) found ... This can be also observed in btrfs/114. (Note that progs < 4.17 returns error code 0 even if quota is not consistency and therefore test will incorrectly pass.) >>> >>> BTW, would you please try to dump the quota tree for such mismatch case? >>> >>> It could be a btrfs-progs bug which it should skip quota checking if it >>> found the quota status item has RESCAN flag. 
>> >> Yes, this is what I see after running btrfs/114 (/dev/sdh1 is scratch dev): >> >> $ sudo btrfs check -Q /dev/sdh1 >> Checking filesystem on /dev/sdh1 >> UUID: d07f6028-0ae7-40d4-ac45-01a4505ddcfb >> Print quota groups for /dev/sdh1 >> UUID: d07f6028-0ae7-40d4-ac45-01a4505ddcfb >> Counts for qgroup id: 0/5 are different >> our:referenced 170999808 referenced compressed 170999808 >> disk: referenced 16384 referenced compressed 16384 >> diff: referenced 170983424 referenced compressed 170983424 >> our:exclusive 170999808 exclusive compressed 170999808 >> disk: exclusive 16384 exclusive compressed 16384 >> diff: exclusive 170983424 exclusive compressed 170983424 >> >> >> $ sudo btrfs inspect-internal dump-tree -t quota /dev/sdh1 >> btrfs-progs v4.17 >> quota tree key (QUOTA_TREE ROOT_ITEM 0) >> leaf 213958656 items 3 free space 16096 generation 9 owner QUOTA_TREE >> leaf 213958656 flags 0x1(WRITTEN) backref revision 1 >> fs uuid d07f6028-0ae7-40d4-ac45-01a4505ddcfb >> chunk uuid 78d753d0-eeb7-4c3e-b825-b6c2c5de5c7a >> item 0 key (0 QGROUP_STATUS 0) itemoff 16251 itemsize 32 >> version 1 generation 9 flags ON scan 30572545 > > Scan is not -1 and flags is only ON, without RESCAN. 
> >> item 1 key (0 QGROUP_INFO 0/5) itemoff 16211 itemsize 40 >> generation 7 >> referenced 16384 referenced_compressed 16384 >> exclusive 16384 exclusive_compressed 16384 >> item 2 key (0 QGROUP_LIMIT 0/5) itemoff 16171 itemsize 40 >> flags 0 >> max_referenced 0 max_exclusive 0 >> rsv_referenced 0 rsv_exclusive 0 >> total bytes 26843545600 >> bytes used 171769856 >> uuid d07f6028-0ae7-40d4-ac45-01a4505ddcfb >> >> >> And if I mount+rescan again: >> >> $ sudo mount /dev/sdh1 /mnt >> $ sudo btrfs quota rescan -w /mnt >> $ sudo umount /mnt >> >> $ sudo btrfs check -Q /dev/sdh1 >> Checking filesystem on /dev/sdh1 >> UUID: d07f6028-0ae7-40d4-ac45-01a4505ddcfb >> Print quota groups for /dev/sdh1 >> UUID: d07f6028-0ae7-40d4-ac45-01a4505ddcfb >> Counts for qgroup id: 0/5 >> our:referenced 170999808 referenced compressed 170999808 >> disk: referenced 170999808 referenced compressed 170999808 >> our:exclusive 170999808 exclusive compressed 170999808 >> disk: exclusive 170999808 exclusive compressed 170999808 >> >> $ sudo btrfs inspect-internal dump-tree -t quota /dev/sdh1 >> btrfs-progs v4.17 >> quota tree key (QUOTA_TREE ROOT_ITEM 0) >> leaf 31309824 items 3 free space 16096 generation 13 owner QUOTA_TREE >> leaf 31309824 flags 0x1(WRITTEN) backref revision 1 >> fs uuid d07f6028-0ae7-40d4-ac45-01a4505ddcfb >> chunk uuid 78d753d0-eeb7-4c3e-b825-b6c2c5de5c7a >> item 0 key (0 QGROUP_STATUS 0) itemoff 16251 itemsize 32 >> version 1 generation 13 flags ON scan 213827585 > > Still doesn't look good. > > In v4.17.2 (sorry, just checking the behavior on my host), after correct > rescan + sync, if we don't have RESCAN flag, we should have scan set to -1. > > While in in v4.18-rc1, it doesn't reset the scan progress to -1 after > finished. > And just as explained in previous reply, if later dirty extents are > after scan progress, it won't be accounted. > So this explains everything. > > We just need to find why scan progress is not set
Re: Enabling quota may not correctly rescan on 4.17
On 2018年06月27日 16:25, Misono Tomohiro wrote: > On 2018/06/27 17:10, Qu Wenruo wrote: >> >> >> On 2018年06月26日 14:00, Misono Tomohiro wrote: >>> Hello Nikolay, >>> >>> I noticed that commit 5d23515be669 ("btrfs: Move qgroup rescan >>> on quota enable to btrfs_quota_enable") in 4.17 sometimes causes >>> to fail correctly rescanning quota when quota is enabled. >>> >>> Simple reproducer: >>> >>> $ mkfs.btrfs -f $DEV >>> $ mount $DEV /mnt >>> $ dd if=/dev/urandom of=/mnt/file bs=1000 count=1000 >>> $ btrfs quota enbale /mnt >>> $ umount /mnt >>> $ btrfs check $DEV >>> ... >>> checking quota groups >>> Counts for qgroup id: 0/5 are different >>> our:referenced 1019904 referenced compressed 1019904 >>> disk: referenced 16384 referenced compressed 16384 >>> diff: referenced 1003520 referenced compressed 1003520 >>> our:exclusive 1019904 exclusive compressed 1019904 >>> disk: exclusive 16384 exclusive compressed 16384 >>> diff: exclusive 1003520 exclusive compressed 1003520 >>> found 1413120 bytes used, error(s) found >>> ... >>> >>> This can be also observed in btrfs/114. (Note that progs < 4.17 >>> returns error code 0 even if quota is not consistency and therefore >>> test will incorrectly pass.) >> >> BTW, would you please try to dump the quota tree for such mismatch case? >> >> It could be a btrfs-progs bug which it should skip quota checking if it >> found the quota status item has RESCAN flag. 
> > Yes, this is what I see after running btrfs/114 (/dev/sdh1 is scratch dev): > > $ sudo btrfs check -Q /dev/sdh1 > Checking filesystem on /dev/sdh1 > UUID: d07f6028-0ae7-40d4-ac45-01a4505ddcfb > Print quota groups for /dev/sdh1 > UUID: d07f6028-0ae7-40d4-ac45-01a4505ddcfb > Counts for qgroup id: 0/5 are different > our:referenced 170999808 referenced compressed 170999808 > disk: referenced 16384 referenced compressed 16384 > diff: referenced 170983424 referenced compressed 170983424 > our:exclusive 170999808 exclusive compressed 170999808 > disk: exclusive 16384 exclusive compressed 16384 > diff: exclusive 170983424 exclusive compressed 170983424 > > > $ sudo btrfs inspect-internal dump-tree -t quota /dev/sdh1 > btrfs-progs v4.17 > quota tree key (QUOTA_TREE ROOT_ITEM 0) > leaf 213958656 items 3 free space 16096 generation 9 owner QUOTA_TREE > leaf 213958656 flags 0x1(WRITTEN) backref revision 1 > fs uuid d07f6028-0ae7-40d4-ac45-01a4505ddcfb > chunk uuid 78d753d0-eeb7-4c3e-b825-b6c2c5de5c7a > item 0 key (0 QGROUP_STATUS 0) itemoff 16251 itemsize 32 > version 1 generation 9 flags ON scan 30572545 Scan is not -1 and flags is only ON, without RESCAN. 
> item 1 key (0 QGROUP_INFO 0/5) itemoff 16211 itemsize 40 > generation 7 > referenced 16384 referenced_compressed 16384 > exclusive 16384 exclusive_compressed 16384 > item 2 key (0 QGROUP_LIMIT 0/5) itemoff 16171 itemsize 40 > flags 0 > max_referenced 0 max_exclusive 0 > rsv_referenced 0 rsv_exclusive 0 > total bytes 26843545600 > bytes used 171769856 > uuid d07f6028-0ae7-40d4-ac45-01a4505ddcfb > > > And if I mount+rescan again: > > $ sudo mount /dev/sdh1 /mnt > $ sudo btrfs quota rescan -w /mnt > $ sudo umount /mnt > > $ sudo btrfs check -Q /dev/sdh1 > Checking filesystem on /dev/sdh1 > UUID: d07f6028-0ae7-40d4-ac45-01a4505ddcfb > Print quota groups for /dev/sdh1 > UUID: d07f6028-0ae7-40d4-ac45-01a4505ddcfb > Counts for qgroup id: 0/5 > our:referenced 170999808 referenced compressed 170999808 > disk: referenced 170999808 referenced compressed 170999808 > our:exclusive 170999808 exclusive compressed 170999808 > disk: exclusive 170999808 exclusive compressed 170999808 > > $ sudo btrfs inspect-internal dump-tree -t quota /dev/sdh1 > btrfs-progs v4.17 > quota tree key (QUOTA_TREE ROOT_ITEM 0) > leaf 31309824 items 3 free space 16096 generation 13 owner QUOTA_TREE > leaf 31309824 flags 0x1(WRITTEN) backref revision 1 > fs uuid d07f6028-0ae7-40d4-ac45-01a4505ddcfb > chunk uuid 78d753d0-eeb7-4c3e-b825-b6c2c5de5c7a > item 0 key (0 QGROUP_STATUS 0) itemoff 16251 itemsize 32 > version 1 generation 13 flags ON scan 213827585 Still doesn't look good. In v4.17.2 (sorry, just checking the behavior on my host), after correct rescan + sync, if we don't have RESCAN flag, we should have scan set to -1. While in in v4.18-rc1, it doesn't reset the scan progress to -1 after finished. And just as explained in previous reply, if later dirty extents are after scan progress, it won't be accounted. So this explains everything. We just need to find why scan progress is not set correctly after rescan is finished. 
Thanks,
Qu

> item 1 key (0 QGROUP_INFO 0/5) itemoff 16211 itemsize 40
> generation 11
> referenced
Re: Enabling quota may not correctly rescan on 4.17
On 2018/06/27 17:22, Nikolay Borisov wrote:
>
>
> On 27.06.2018 11:20, Misono Tomohiro wrote:
>> I can see the failure with or without options...
>> maybe it depends on machine spec?
>
> I'm testing in a virtual machine:
>
> qemu-system-x86_64 -smp 6 -kernel
> /home/nborisov/projects/kernel/source/arch/x86_64/boot/bzImage -append
> root=/dev/vda rw -enable-kvm -m 4096 -drive
> file=/home/nborisov/projects/qemu/rootfs/ubuntu15.img,if=virtio -virtfs
> local,id=fsdev1,path=/mnt/vm_share,security_model=passthrough,mount_tag=hostshare
> -drive file=/home/nborisov/scratch/scratch-images/btrfs-test.img,if=virtio
> -drive file=/home/nborisov/scratch/scratch-images/scratch2.img,if=virtio
> -drive file=/home/nborisov/scratch/scratch-images/scratch3.img,if=virtio
> -drive file=/home/nborisov/scratch/scratch-images/scratch4.img,if=virtio
> -drive file=/home/nborisov/scratch/scratch-images/scratch5.img,if=virtio
> -drive file=/home/nborisov/scratch/scratch-images/scratch6.img,if=virtio
> -redir tcp:1235::22 -daemonize
>
> Perhaps it's not visible on slow storage. Are you testing on NVME or
> something like that?

No, I use sata ssd and hdd and the problem can be seen on both.
Re: Enabling quota may not correctly rescan on 4.17
On 2018/06/27 17:10, Qu Wenruo wrote:
>
>
> On 2018年06月26日 14:00, Misono Tomohiro wrote:
>> Hello Nikolay,
>>
>> I noticed that commit 5d23515be669 ("btrfs: Move qgroup rescan
>> on quota enable to btrfs_quota_enable") in 4.17 sometimes causes
>> to fail correctly rescanning quota when quota is enabled.
>>
>> Simple reproducer:
>>
>> $ mkfs.btrfs -f $DEV
>> $ mount $DEV /mnt
>> $ dd if=/dev/urandom of=/mnt/file bs=1000 count=1000
>> $ btrfs quota enable /mnt
>> $ umount /mnt
>> $ btrfs check $DEV
>> ...
>> checking quota groups
>> Counts for qgroup id: 0/5 are different
>> our:referenced 1019904 referenced compressed 1019904
>> disk: referenced 16384 referenced compressed 16384
>> diff: referenced 1003520 referenced compressed 1003520
>> our:exclusive 1019904 exclusive compressed 1019904
>> disk: exclusive 16384 exclusive compressed 16384
>> diff: exclusive 1003520 exclusive compressed 1003520
>> found 1413120 bytes used, error(s) found
>> ...
>>
>> This can be also observed in btrfs/114. (Note that progs < 4.17
>> returns error code 0 even if quota is not consistent and therefore
>> test will incorrectly pass.)
>
> BTW, would you please try to dump the quota tree for such mismatch case?
>
> It could be a btrfs-progs bug which it should skip quota checking if it
> found the quota status item has RESCAN flag.
Yes, this is what I see after running btrfs/114 (/dev/sdh1 is scratch dev):

$ sudo btrfs check -Q /dev/sdh1
Checking filesystem on /dev/sdh1
UUID: d07f6028-0ae7-40d4-ac45-01a4505ddcfb
Print quota groups for /dev/sdh1
UUID: d07f6028-0ae7-40d4-ac45-01a4505ddcfb
Counts for qgroup id: 0/5 are different
our:referenced 170999808 referenced compressed 170999808
disk: referenced 16384 referenced compressed 16384
diff: referenced 170983424 referenced compressed 170983424
our:exclusive 170999808 exclusive compressed 170999808
disk: exclusive 16384 exclusive compressed 16384
diff: exclusive 170983424 exclusive compressed 170983424

$ sudo btrfs inspect-internal dump-tree -t quota /dev/sdh1
btrfs-progs v4.17
quota tree key (QUOTA_TREE ROOT_ITEM 0)
leaf 213958656 items 3 free space 16096 generation 9 owner QUOTA_TREE
leaf 213958656 flags 0x1(WRITTEN) backref revision 1
fs uuid d07f6028-0ae7-40d4-ac45-01a4505ddcfb
chunk uuid 78d753d0-eeb7-4c3e-b825-b6c2c5de5c7a
	item 0 key (0 QGROUP_STATUS 0) itemoff 16251 itemsize 32
		version 1 generation 9 flags ON scan 30572545
	item 1 key (0 QGROUP_INFO 0/5) itemoff 16211 itemsize 40
		generation 7
		referenced 16384 referenced_compressed 16384
		exclusive 16384 exclusive_compressed 16384
	item 2 key (0 QGROUP_LIMIT 0/5) itemoff 16171 itemsize 40
		flags 0
		max_referenced 0 max_exclusive 0
		rsv_referenced 0 rsv_exclusive 0
total bytes 26843545600
bytes used 171769856
uuid d07f6028-0ae7-40d4-ac45-01a4505ddcfb

And if I mount+rescan again:

$ sudo mount /dev/sdh1 /mnt
$ sudo btrfs quota rescan -w /mnt
$ sudo umount /mnt

$ sudo btrfs check -Q /dev/sdh1
Checking filesystem on /dev/sdh1
UUID: d07f6028-0ae7-40d4-ac45-01a4505ddcfb
Print quota groups for /dev/sdh1
UUID: d07f6028-0ae7-40d4-ac45-01a4505ddcfb
Counts for qgroup id: 0/5
our:referenced 170999808 referenced compressed 170999808
disk: referenced 170999808 referenced compressed 170999808
our:exclusive 170999808 exclusive compressed 170999808
disk: exclusive 170999808 exclusive compressed 170999808

$ sudo btrfs inspect-internal dump-tree -t quota /dev/sdh1
btrfs-progs v4.17
quota tree key (QUOTA_TREE ROOT_ITEM 0)
leaf 31309824 items 3 free space 16096 generation 13 owner QUOTA_TREE
leaf 31309824 flags 0x1(WRITTEN) backref revision 1
fs uuid d07f6028-0ae7-40d4-ac45-01a4505ddcfb
chunk uuid 78d753d0-eeb7-4c3e-b825-b6c2c5de5c7a
	item 0 key (0 QGROUP_STATUS 0) itemoff 16251 itemsize 32
		version 1 generation 13 flags ON scan 213827585
	item 1 key (0 QGROUP_INFO 0/5) itemoff 16211 itemsize 40
		generation 11
		referenced 170999808 referenced_compressed 170999808
		exclusive 170999808 exclusive_compressed 170999808
	item 2 key (0 QGROUP_LIMIT 0/5) itemoff 16171 itemsize 40
		flags 0
		max_referenced 0 max_exclusive 0
		rsv_referenced 0 rsv_exclusive 0
total bytes 26843545600
bytes used 171769856
uuid d07f6028-0ae7-40d4-ac45-01a4505ddcfb

>
> Thanks,
> Qu
>
>> My observation is that this commit changed to call initial quota rescan
>> when quota is enabled instead of first commit transaction after enabling
>> quota, and therefore if there is something not committed at that time,
>> their usage will not be accounted.
>>
>> Actually this can be simply fixed by calling "btrfs rescan" again or
>> calling "btrfs fi sync"
Re: Enabling quota may not correctly rescan on 4.17
On 27.06.2018 11:20, Misono Tomohiro wrote:
> I can see the failure with or without options...
> maybe it depends on machine spec?

I'm testing in a virtual machine:

qemu-system-x86_64 -smp 6 -kernel
/home/nborisov/projects/kernel/source/arch/x86_64/boot/bzImage -append
root=/dev/vda rw -enable-kvm -m 4096 -drive
file=/home/nborisov/projects/qemu/rootfs/ubuntu15.img,if=virtio -virtfs
local,id=fsdev1,path=/mnt/vm_share,security_model=passthrough,mount_tag=hostshare
-drive file=/home/nborisov/scratch/scratch-images/btrfs-test.img,if=virtio
-drive file=/home/nborisov/scratch/scratch-images/scratch2.img,if=virtio
-drive file=/home/nborisov/scratch/scratch-images/scratch3.img,if=virtio
-drive file=/home/nborisov/scratch/scratch-images/scratch4.img,if=virtio
-drive file=/home/nborisov/scratch/scratch-images/scratch5.img,if=virtio
-drive file=/home/nborisov/scratch/scratch-images/scratch6.img,if=virtio
-redir tcp:1235::22 -daemonize

Perhaps it's not visible on slow storage. Are you testing on NVME or
something like that?
Re: Enabling quota may not correctly rescan on 4.17
On 2018/06/27 17:04, Nikolay Borisov wrote:
>
>
> On 27.06.2018 10:55, Misono Tomohiro wrote:
>> On 2018/06/27 16:40, Nikolay Borisov wrote:
>>>
>>>
>>> On 26.06.2018 09:00, Misono Tomohiro wrote:
>>>> Hello Nikolay,
>>>>
>>>> I noticed that commit 5d23515be669 ("btrfs: Move qgroup rescan
>>>> on quota enable to btrfs_quota_enable") in 4.17 sometimes causes
>>>> to fail correctly rescanning quota when quota is enabled.
>>>>
>>>> Simple reproducer:
>>>>
>>>> $ mkfs.btrfs -f $DEV
>>>> $ mount $DEV /mnt
>>>> $ dd if=/dev/urandom of=/mnt/file bs=1000 count=1000
>>>> $ btrfs quota enable /mnt
>>>> $ umount /mnt
>>>> $ btrfs check $DEV
>>>> ...
>>>> checking quota groups
>>>> Counts for qgroup id: 0/5 are different
>>>> our:referenced 1019904 referenced compressed 1019904
>>>> disk: referenced 16384 referenced compressed 16384
>>>> diff: referenced 1003520 referenced compressed 1003520
>>>> our:exclusive 1019904 exclusive compressed 1019904
>>>> disk: exclusive 16384 exclusive compressed 16384
>>>> diff: exclusive 1003520 exclusive compressed 1003520
>>>> found 1413120 bytes used, error(s) found
>>>> ...
>>>
>>> I ran your script 100 times with progs 4.17 and 4.18-rc1 and didn't
>>> observe this error. I didn't observe btrfs/114 also failing but I ran it
>>> a lot less. Is there anything else i can do to make your small
>>> reproducer more likely to trigger?
>>
>> How about btrfs/114? I saw the problem in it first (progs 4.17/kernel 4.18-rc2)
>> and it seems always happen in my environment.
>
> So far nothing, I'm using David's github/misc-next branch, and latest
> commit is: 5330a89b3ee3.
>
> My mount options are:
>
> MOUNT_OPTIONS -- -o enospc_debug -o space_cache=v2 /dev/vdc /media/scratch

I can see the failure with or without options...
maybe it depends on machine spec?

>
>>
>>>> This can be also observed in btrfs/114. (Note that progs < 4.17
>>>> returns error code 0 even if quota is not consistent and therefore
>>>> test will incorrectly pass.)
>>>>
>>>> My observation is that this commit changed to call initial quota rescan
>>>> when quota is enabled instead of first commit transaction after enabling
>>>> quota, and therefore if there is something not committed at that time,
>>>> their usage will not be accounted.
>>>>
>>>> Actually this can be simply fixed by calling "btrfs rescan" again or
>>>> calling "btrfs fi sync" before "btrfs quota enable".
>>>>
>>>> I think the commit itself makes the code much easier to read, so it may
>>>> be better to fix the problem in progs (i.e. calling sync before quota enable).
>>>>
>>>> Do you have any thoughts?
>>>>
>>>> Thanks,
>>>> Tomohiro Misono
Re: Enabling quota may not correctly rescan on 4.17
On 2018年06月26日 14:00, Misono Tomohiro wrote:
> Hello Nikolay,
>
> I noticed that commit 5d23515be669 ("btrfs: Move qgroup rescan
> on quota enable to btrfs_quota_enable") in 4.17 sometimes causes
> to fail correctly rescanning quota when quota is enabled.
>
> Simple reproducer:
>
> $ mkfs.btrfs -f $DEV
> $ mount $DEV /mnt
> $ dd if=/dev/urandom of=/mnt/file bs=1000 count=1000
> $ btrfs quota enable /mnt
> $ umount /mnt
> $ btrfs check $DEV
> ...
> checking quota groups
> Counts for qgroup id: 0/5 are different
> our:referenced 1019904 referenced compressed 1019904
> disk: referenced 16384 referenced compressed 16384
> diff: referenced 1003520 referenced compressed 1003520
> our:exclusive 1019904 exclusive compressed 1019904
> disk: exclusive 16384 exclusive compressed 16384
> diff: exclusive 1003520 exclusive compressed 1003520
> found 1413120 bytes used, error(s) found
> ...
>
> This can be also observed in btrfs/114. (Note that progs < 4.17
> returns error code 0 even if quota is not consistent and therefore
> test will incorrectly pass.)

BTW, would you please try to dump the quota tree for such mismatch case?

It could be a btrfs-progs bug which it should skip quota checking if it
found the quota status item has RESCAN flag.

Thanks,
Qu

>
> My observation is that this commit changed to call initial quota rescan
> when quota is enabled instead of first commit transaction after enabling
> quota, and therefore if there is something not committed at that time,
> their usage will not be accounted.
>
> Actually this can be simply fixed by calling "btrfs rescan" again or
> calling "btrfs fi sync" before "btrfs quota enable".
>
> I think the commit itself makes the code much easier to read, so it may
> be better to fix the problem in progs (i.e. calling sync before quota enable).
>
> Do you have any thoughts?
>
> Thanks,
> Tomohiro Misono
Re: [PATCH] btrfs: qgroups: Move transaction managed inside btrfs_quota_enable
On 2018年06月26日 16:46, Misono Tomohiro wrote:
> On 2018/06/26 16:09, Nikolay Borisov wrote:
>> Commit 5d23515be669 ("btrfs: Move qgroup rescan on quota enable to
>> btrfs_quota_enable") not only resulted in easier to follow code but
>> it also introduced a subtle bug. It changed the timing of when the initial
>> rescan was happening - before the commit it would happen after the
>> transaction commit had occurred, but after the commit it might happen
>> before the transaction was committed. This results in failure to
>> correctly rescan the quota since there could be data which is still not
>> committed on disk.
>>
>> This patch aims to fix this by moving the transaction creation/commit
>> inside btrfs_quota_enable, which allows to schedule the quota rescan
>> after the transaction has been committed.
>>
>> Fixes: 5d23515be669 ("btrfs: Move qgroup rescan on quota enable to
>> btrfs_quota_enable")
>> Reported-by: Misono Tomohiro
>> Signed-off-by: Nikolay Borisov
>> ---
>> Hi Misono,
>>
>> Care to test the following patch? If you say it's ok I will do a similar one
>> for the btrfs_quota_disable function. This will also allow me to get rid of
>> the extra err variable in btrfs_ioctl_quota_ctl. Additionally I think the
>> number of blocks (2) passed to the transaction for reservation might be
>> wrong.
>
> Hi,
>
> The patch does not remove start_transaction() from btrfs_ioctl_quota_ctl(),
> so this does not work, but I understand your approach (continue to below).
>
>>  fs/btrfs/ioctl.c  |  2 +-
>>  fs/btrfs/qgroup.c | 17 ++---
>>  fs/btrfs/qgroup.h |  3 +--
>>  3 files changed, 16 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>> index a399750b9e41..bf99d7aae3ae 100644
>> --- a/fs/btrfs/ioctl.c
>> +++ b/fs/btrfs/ioctl.c
>> @@ -5161,7 +5161,7 @@ static long btrfs_ioctl_quota_ctl(struct file *file, void __user *arg)
>>  
>>  	switch (sa->cmd) {
>>  	case BTRFS_QUOTA_CTL_ENABLE:
>> -		ret = btrfs_quota_enable(trans, fs_info);
>> +		ret = btrfs_quota_enable(fs_info);
>>  		break;
>>  	case BTRFS_QUOTA_CTL_DISABLE:
>>  		ret = btrfs_quota_disable(trans, fs_info);
>> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
>> index 1874a6d2e6f5..91bb7e97c0d0 100644
>> --- a/fs/btrfs/qgroup.c
>> +++ b/fs/btrfs/qgroup.c
>> @@ -875,8 +875,7 @@ static int btrfs_clean_quota_tree(struct btrfs_trans_handle *trans,
>>  	return ret;
>>  }
>>  
>> -int btrfs_quota_enable(struct btrfs_trans_handle *trans,
>> -		       struct btrfs_fs_info *fs_info)
>> +int btrfs_quota_enable(struct btrfs_fs_info *fs_info)
>>  {
>>  	struct btrfs_root *quota_root;
>>  	struct btrfs_root *tree_root = fs_info->tree_root;
>> @@ -886,6 +885,7 @@ int btrfs_quota_enable(struct btrfs_trans_handle *trans,
>>  	struct btrfs_key key;
>>  	struct btrfs_key found_key;
>>  	struct btrfs_qgroup *qgroup = NULL;
>> +	struct btrfs_trans_handle *trans = NULL;
>>  	int ret = 0;
>>  	int slot;
>>  
>> @@ -893,6 +893,12 @@ int btrfs_quota_enable(struct btrfs_trans_handle *trans,
>>  	if (fs_info->quota_root)
>>  		goto out;
>>  
>> +	trans = btrfs_start_transaction(tree_root, 2);
>
> (Should we use fs_root for quota?)
>
>> +	if (IS_ERR(trans)) {
>> +		ret = PTR_ERR(trans);
>> +		goto out;
>> +	}
>> +
>>  	fs_info->qgroup_ulist = ulist_alloc(GFP_KERNEL);
>>  	if (!fs_info->qgroup_ulist) {
>>  		ret = -ENOMEM;
>> @@ -987,6 +993,11 @@ int btrfs_quota_enable(struct btrfs_trans_handle *trans,
>>  	fs_info->quota_root = quota_root;
>>  	set_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags);
>>  	spin_unlock(&fs_info->qgroup_lock);
>> +
>> +	ret = btrfs_commit_transaction(trans);
>> +	if (ret)
>> +		goto out_free_path;
>> +
>>  	ret = qgroup_rescan_init(fs_info, 0, 1);
>
> However, I'm not sure if this approach completely works well when some files are
> concurrently written while quota is being enabled.
> Since before the commit 5d23515be669, quota_rescan_init() is called during transaction
> commit, but now quota_rescan_init() is called outside of a transaction.
> So, is there still a slight possibility that the same problem occurs here?

This is the tricky part of btrfs quota rescan.

For rescan, it only scans the commit root (even before the large quota
rework), and records the current scanning location.

Then, qgroup code does a trick: if a new dirty extent is after the current
scanning location, which means rescan would scan that extent later, it skips
the accounting and lets rescan handle it.

Currently, since qgroup only accounts extents at transaction commit time, the
only possible cause of problems should be the timing of
@qgroup_rescan_progress initialization.

I think we should hold a trans handle when setting @qgroup_rescan_progress,
but I may need extra investigation into the race.

Thanks,
Qu

>
> (I
Re: Enabling quota may not correctly rescan on 4.17
On 27.06.2018 10:55, Misono Tomohiro wrote:
> On 2018/06/27 16:40, Nikolay Borisov wrote:
>>
>> On 26.06.2018 09:00, Misono Tomohiro wrote:
>>> Hello Nikolay,
>>>
>>> I noticed that commit 5d23515be669 ("btrfs: Move qgroup rescan
>>> on quota enable to btrfs_quota_enable") in 4.17 sometimes causes
>>> quota rescan to fail when quota is enabled.
>>>
>>> Simple reproducer:
>>>
>>> $ mkfs.btrfs -f $DEV
>>> $ mount $DEV /mnt
>>> $ dd if=/dev/urandom of=/mnt/file bs=1000 count=1000
>>> $ btrfs quota enable /mnt
>>> $ umount /mnt
>>> $ btrfs check $DEV
>>> ...
>>> checking quota groups
>>> Counts for qgroup id: 0/5 are different
>>> our:  referenced 1019904 referenced compressed 1019904
>>> disk: referenced 16384 referenced compressed 16384
>>> diff: referenced 1003520 referenced compressed 1003520
>>> our:  exclusive 1019904 exclusive compressed 1019904
>>> disk: exclusive 16384 exclusive compressed 16384
>>> diff: exclusive 1003520 exclusive compressed 1003520
>>> found 1413120 bytes used, error(s) found
>>> ...
>>
>> I ran your script 100 times with progs 4.17 and 4.18-rc1 and didn't
>> observe this error. I didn't observe btrfs/114 failing either, but I ran
>> it a lot less. Is there anything else I can do to make your small
>> reproducer more likely to trigger?
>
> How about btrfs/114? I saw the problem there first (progs 4.17 / kernel
> 4.18-rc2) and it seems to always happen in my environment.

So far nothing. I'm using David's github/misc-next branch, and the latest
commit is 5330a89b3ee3. My mount options are:

MOUNT_OPTIONS -- -o enospc_debug -o space_cache=v2 /dev/vdc /media/scratch

>
>>> This can also be observed in btrfs/114. (Note that progs < 4.17
>>> returns error code 0 even if quota is not consistent, so the
>>> test will incorrectly pass.)
>>>
>>> My observation is that this commit changed the initial quota rescan to
>>> run when quota is enabled instead of at the first transaction commit
>>> after enabling quota, so if there is anything not yet committed at that
>>> time, its usage will not be accounted.
>>>
>>> Actually this can simply be fixed by calling "btrfs quota rescan" again
>>> or calling "btrfs fi sync" before "btrfs quota enable".
>>>
>>> I think the commit itself makes the code much easier to read, so it may
>>> be better to fix the problem in progs (i.e. calling sync before quota
>>> enable).
>>>
>>> Do you have any thoughts?
>>>
>>> Thanks,
>>> Tomohiro Misono
Re: Enabling quota may not correctly rescan on 4.17
On 2018/06/27 16:40, Nikolay Borisov wrote:
>
> On 26.06.2018 09:00, Misono Tomohiro wrote:
>> Hello Nikolay,
>>
>> I noticed that commit 5d23515be669 ("btrfs: Move qgroup rescan
>> on quota enable to btrfs_quota_enable") in 4.17 sometimes causes
>> quota rescan to fail when quota is enabled.
>>
>> Simple reproducer:
>>
>> $ mkfs.btrfs -f $DEV
>> $ mount $DEV /mnt
>> $ dd if=/dev/urandom of=/mnt/file bs=1000 count=1000
>> $ btrfs quota enable /mnt
>> $ umount /mnt
>> $ btrfs check $DEV
>> ...
>> checking quota groups
>> Counts for qgroup id: 0/5 are different
>> our:  referenced 1019904 referenced compressed 1019904
>> disk: referenced 16384 referenced compressed 16384
>> diff: referenced 1003520 referenced compressed 1003520
>> our:  exclusive 1019904 exclusive compressed 1019904
>> disk: exclusive 16384 exclusive compressed 16384
>> diff: exclusive 1003520 exclusive compressed 1003520
>> found 1413120 bytes used, error(s) found
>> ...
>
> I ran your script 100 times with progs 4.17 and 4.18-rc1 and didn't
> observe this error. I didn't observe btrfs/114 failing either, but I ran
> it a lot less. Is there anything else I can do to make your small
> reproducer more likely to trigger?

How about btrfs/114? I saw the problem there first (progs 4.17 / kernel
4.18-rc2) and it seems to always happen in my environment.

>
>> This can also be observed in btrfs/114. (Note that progs < 4.17
>> returns error code 0 even if quota is not consistent, so the
>> test will incorrectly pass.)
>>
>> My observation is that this commit changed the initial quota rescan to
>> run when quota is enabled instead of at the first transaction commit
>> after enabling quota, so if there is anything not yet committed at that
>> time, its usage will not be accounted.
>>
>> Actually this can simply be fixed by calling "btrfs quota rescan" again
>> or calling "btrfs fi sync" before "btrfs quota enable".
>>
>> I think the commit itself makes the code much easier to read, so it may
>> be better to fix the problem in progs (i.e. calling sync before quota
>> enable).
>>
>> Do you have any thoughts?
>>
>> Thanks,
>> Tomohiro Misono
Re: Enabling quota may not correctly rescan on 4.17
On 26.06.2018 09:00, Misono Tomohiro wrote:
> Hello Nikolay,
>
> I noticed that commit 5d23515be669 ("btrfs: Move qgroup rescan
> on quota enable to btrfs_quota_enable") in 4.17 sometimes causes
> quota rescan to fail when quota is enabled.
>
> Simple reproducer:
>
> $ mkfs.btrfs -f $DEV
> $ mount $DEV /mnt
> $ dd if=/dev/urandom of=/mnt/file bs=1000 count=1000
> $ btrfs quota enable /mnt
> $ umount /mnt
> $ btrfs check $DEV
> ...
> checking quota groups
> Counts for qgroup id: 0/5 are different
> our:  referenced 1019904 referenced compressed 1019904
> disk: referenced 16384 referenced compressed 16384
> diff: referenced 1003520 referenced compressed 1003520
> our:  exclusive 1019904 exclusive compressed 1019904
> disk: exclusive 16384 exclusive compressed 16384
> diff: exclusive 1003520 exclusive compressed 1003520
> found 1413120 bytes used, error(s) found
> ...

I ran your script 100 times with progs 4.17 and 4.18-rc1 and didn't observe
this error. I didn't observe btrfs/114 failing either, but I ran it a lot
less. Is there anything else I can do to make your small reproducer more
likely to trigger?

> This can also be observed in btrfs/114. (Note that progs < 4.17 returns
> error code 0 even if quota is not consistent, so the test will
> incorrectly pass.)
>
> My observation is that this commit changed the initial quota rescan to
> run when quota is enabled instead of at the first transaction commit
> after enabling quota, so if there is anything not yet committed at that
> time, its usage will not be accounted.
>
> Actually this can simply be fixed by calling "btrfs quota rescan" again
> or calling "btrfs fi sync" before "btrfs quota enable".
>
> I think the commit itself makes the code much easier to read, so it may
> be better to fix the problem in progs (i.e. calling sync before quota
> enable).
>
> Do you have any thoughts?
>
> Thanks,
> Tomohiro Misono
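Misono's suggested workaround (force a transaction commit before enabling quota, or rescan again afterwards) can be scripted as below. This is a sketch of the sequence discussed in the thread for an affected 4.17 kernel; `$DEV` is a placeholder for a disposable test device, and the script is destructive to it:

```shell
#!/bin/sh
set -e
DEV=/dev/sdb1   # placeholder: a scratch device that will be reformatted
MNT=/mnt

mkfs.btrfs -f "$DEV"
mount "$DEV" "$MNT"
dd if=/dev/urandom of="$MNT/file" bs=1000 count=1000

# Workaround: commit pending writes first, so the initial rescan
# (which only walks the commit root) sees the new data.
btrfs filesystem sync "$MNT"
btrfs quota enable "$MNT"

# Alternative workaround: rescan again after enabling and wait for it.
# btrfs quota rescan -w "$MNT"

umount "$MNT"
btrfs check "$DEV"   # qgroup counts should now be consistent
```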
Re: [PATCH v4 0/5] code cleanups for btrfs_get_acl()
On Wed, Jun 27, 2018 at 12:16:33PM +0800, Chengguang Xu wrote:
> This patch set does code cleanups for btrfs_get_acl().
>
> Chengguang Xu (5):
>   btrfs: return error instead of crash when detecting unexpected type in btrfs_get_acl()
>   btrfs: replace empty string with NULL when getting attribute length in btrfs_get_acl()
>   btrfs: remove unnecessary -ERANGE check in btrfs_get_acl()
>   btrfs: avoid error code overriding in btrfs_get_acl()
>   btrfs: remove unnecessary bracket in btrfs_get_acl()

Perfect, that's exactly what I expected. Thanks.