Re: [PATCH v2] fstests: btrfs: Add regression test for reserved space leak.
On 2015/08/05 12:10, Dave Chinner wrote: On Wed, Aug 05, 2015 at 11:52:36AM +0900, Tsutomu Itoh wrote: On 2015/08/05 10:57, Dave Chinner wrote: On Wed, Aug 05, 2015 at 09:39:35AM +0800, Qu Wenruo wrote: Tsutomu Itoh wrote on 2015/08/05 10:26 +0900: On 2015/08/05 10:08, Qu Wenruo wrote: +# As the reserved space freeing happens at commit_transaction time, +# without a transaction commit, no reserved space needs freeing and +# won't trigger the bug. +sync Isn't '$BTRFS_UTIL_PROG filesystem sync' better instead of 'sync'? Thanks, Tsutomu Hi, Tsutomu-san, Yes, I did use such method before, but Dave said it's better to use unified interface to sync a filesystem other than the specialized one. So I still use sync as Dave said. Mainly because "sync" is what users will use to make sure their data is safe. filesystem specific tools have a habit of doing "special stuff" to sync a filesystem, so it may not reflect the way users expect the system to behaviour when they run sync. The other option is this: _syncfs() { mntpt=$1 $XFS_IO_PROG -c syncfs $mntpt } _sync_test() { _syncfs $TEST_DIR } _sync_scratch() { _syncfs $SCRATCH_MNT } which only runs sync on the filesystem that needs syncing (via the syncfs() syscall) I think that syncfs is better instead of sync because the syncing of the specified filesystem is necessary. sync guarantees that. IOWs, there's no difference between sync and syncfs for the persepctive of this test and so the test does not need really need changing. I agree that sync and syncfs is not different for this test. However, sync is syncing of all filesystems, and, syncfs is syncing of the specified filesystem only. Therefor, I think that it is better to use sync and syncfs properly according to the necessity. Thanks, Tsutomu -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] fstests: btrfs: Add regression test for reserved space leak.
On Wed, Aug 05, 2015 at 11:52:36AM +0900, Tsutomu Itoh wrote: > On 2015/08/05 10:57, Dave Chinner wrote: > >On Wed, Aug 05, 2015 at 09:39:35AM +0800, Qu Wenruo wrote: > >>Tsutomu Itoh wrote on 2015/08/05 10:26 +0900: > >>>On 2015/08/05 10:08, Qu Wenruo wrote: > +# As the reserved space freeing happens at commit_transaction time, > +# without a transaction commit, no reserved space needs freeing and > +# won't trigger the bug. > +sync > >>> > >>>Isn't '$BTRFS_UTIL_PROG filesystem sync' better instead of 'sync'? > >>> > >>>Thanks, > >>>Tsutomu > >>Hi, Tsutomu-san, > >> > >>Yes, I did use such method before, but Dave said it's better to use > >>unified interface to sync a filesystem other than the specialized > >>one. > >> > >>So I still use sync as Dave said. > > > >Mainly because "sync" is what users will use to make sure their data > >is safe. filesystem specific tools have a habit of doing "special > >stuff" to sync a filesystem, so it may not reflect the way users > >expect the system to behaviour when they run sync. > > > >The other option is this: > > > >_syncfs() > >{ > > mntpt=$1 > > > > $XFS_IO_PROG -c syncfs $mntpt > >} > > > >_sync_test() > >{ > > _syncfs $TEST_DIR > >} > > > >_sync_scratch() > >{ > > _syncfs $SCRATCH_MNT > >} > > > >which only runs sync on the filesystem that needs syncing (via the > >syncfs() syscall) > > I think that syncfs is better instead of sync because the syncing of > the specified filesystem is necessary. sync guarantees that. IOWs, there's no difference between sync and syncfs for the persepctive of this test and so the test does not need really need changing. Cheers, Dave. -- Dave Chinner da...@fromorbit.com -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] fstests: btrfs: Add regression test for reserved space leak.
Tsutomu Itoh wrote on 2015/08/05 11:52 +0900: On 2015/08/05 10:57, Dave Chinner wrote: On Wed, Aug 05, 2015 at 09:39:35AM +0800, Qu Wenruo wrote: Tsutomu Itoh wrote on 2015/08/05 10:26 +0900: On 2015/08/05 10:08, Qu Wenruo wrote: +# As the reserved space freeing happens at commit_transaction time, +# without a transaction commit, no reserved space needs freeing and +# won't trigger the bug. +sync Isn't '$BTRFS_UTIL_PROG filesystem sync' better instead of 'sync'? Thanks, Tsutomu Hi, Tsutomu-san, Yes, I did use such method before, but Dave said it's better to use unified interface to sync a filesystem other than the specialized one. So I still use sync as Dave said. Mainly because "sync" is what users will use to make sure their data is safe. filesystem specific tools have a habit of doing "special stuff" to sync a filesystem, so it may not reflect the way users expect the system to behaviour when they run sync. The other option is this: _syncfs() { mntpt=$1 $XFS_IO_PROG -c syncfs $mntpt } _sync_test() { _syncfs $TEST_DIR } _sync_scratch() { _syncfs $SCRATCH_MNT } which only runs sync on the filesystem that needs syncing (via the syncfs() syscall) I think that syncfs is better instead of sync because the syncing of the specified filesystem is necessary. Thanks, Tsutomu I'm OK changing the sync to _sync_scratch(). But no one in btrfs testcase uses that function and I didn't see the reason to specifically call _sync_scratch() instead of sync() BTW. Thanks, Qu Cheers, Dave. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] fstests: btrfs: Add regression test for reserved space leak.
On 2015/08/05 10:57, Dave Chinner wrote: On Wed, Aug 05, 2015 at 09:39:35AM +0800, Qu Wenruo wrote: Tsutomu Itoh wrote on 2015/08/05 10:26 +0900: On 2015/08/05 10:08, Qu Wenruo wrote: +# As the reserved space freeing happens at commit_transaction time, +# without a transaction commit, no reserved space needs freeing and +# won't trigger the bug. +sync Isn't '$BTRFS_UTIL_PROG filesystem sync' better instead of 'sync'? Thanks, Tsutomu Hi, Tsutomu-san, Yes, I did use such method before, but Dave said it's better to use unified interface to sync a filesystem other than the specialized one. So I still use sync as Dave said. Mainly because "sync" is what users will use to make sure their data is safe. filesystem specific tools have a habit of doing "special stuff" to sync a filesystem, so it may not reflect the way users expect the system to behaviour when they run sync. The other option is this: _syncfs() { mntpt=$1 $XFS_IO_PROG -c syncfs $mntpt } _sync_test() { _syncfs $TEST_DIR } _sync_scratch() { _syncfs $SCRATCH_MNT } which only runs sync on the filesystem that needs syncing (via the syncfs() syscall) I think that syncfs is better instead of sync because the syncing of the specified filesystem is necessary. Thanks, Tsutomu Cheers, Dave. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] fstests: btrfs: Add regression test for reserved space leak.
On Wed, Aug 05, 2015 at 09:39:35AM +0800, Qu Wenruo wrote: > Tsutomu Itoh wrote on 2015/08/05 10:26 +0900: > >On 2015/08/05 10:08, Qu Wenruo wrote: > >>+# As the reserved space freeing happens at commit_transaction time, > >>+# without a transaction commit, no reserved space needs freeing and > >>+# won't trigger the bug. > >>+sync > > > >Isn't '$BTRFS_UTIL_PROG filesystem sync' better instead of 'sync'? > > > >Thanks, > >Tsutomu > Hi, Tsutomu-san, > > Yes, I did use such method before, but Dave said it's better to use > unified interface to sync a filesystem other than the specialized > one. > > So I still use sync as Dave said. Mainly because "sync" is what users will use to make sure their data is safe. filesystem specific tools have a habit of doing "special stuff" to sync a filesystem, so it may not reflect the way users expect the system to behaviour when they run sync. The other option is this: _syncfs() { mntpt=$1 $XFS_IO_PROG -c syncfs $mntpt } _sync_test() { _syncfs $TEST_DIR } _sync_scratch() { _syncfs $SCRATCH_MNT } which only runs sync on the filesystem that needs syncing (via the syncfs() syscall) Cheers, Dave. -- Dave Chinner da...@fromorbit.com -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] fstests: btrfs: Add regression test for reserved space leak.
Tsutomu Itoh wrote on 2015/08/05 10:26 +0900: On 2015/08/05 10:08, Qu Wenruo wrote: The regression is introduced in v4.2-rc1, with the big btrfs qgroup change. The problem is, qgroup reserved space is never freed, causing even we increase the limit, we can still hit the EDQUOT much faster than it should. Reported-by: Tsutomu Itoh Signed-off-by: Qu Wenruo --- tests/btrfs/089 | 88 + tests/btrfs/089.out | 5 +++ tests/btrfs/group | 1 + 3 files changed, 94 insertions(+) create mode 100755 tests/btrfs/089 create mode 100644 tests/btrfs/089.out diff --git a/tests/btrfs/089 b/tests/btrfs/089 new file mode 100755 index 000..82db96c --- /dev/null +++ b/tests/btrfs/089 @@ -0,0 +1,88 @@ +#! /bin/bash +# FS QA Test 089 +# +# Regression test for btrfs qgroup reserved space leak. +# +# Due to qgroup reserved space leak, EDQUOT can be trigged even it's not +# over limit after previous write. +# +#--- +# Copyright (c) 2015 Fujitsu. All Rights Reserved. +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write the Free Software Foundation, +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +#--- +# + +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo "QA output created by $seq" + +here=`pwd` +tmp=/tmp/$$ +status=1 # failure is the default! +trap "_cleanup; exit \$status" 0 1 2 3 15 + +_cleanup() +{ + cd / + rm -f $tmp.* +} + +# get standard environment, filters and checks +. ./common/rc +. ./common/filter + +# real QA test starts here + +# Modify as appropriate. +_supported_fs btrfs +_supported_os Linux +_require_scratch +_need_to_be_root + +# Use big blocksize to ensure there is still enough space left +# for metadata reserve after hitting EDQUOT +BLOCKSIZE=$(( 2 * 1024 * 1024 )) +FILESIZE=$(( 128 * 1024 * 1024 )) # 128Mbytes + +# The last block won't be able to finish write, as metadata takes +# $NODESIZE space, causing the last block triggering EDQUOT +LENGTH=$(( $FILESIZE - $BLOCKSIZE )) + +_scratch_mkfs >>$seqres.full 2>&1 +_scratch_mount +_require_fs_space $SCRATCH_MNT $(($FILESIZE * 2 / 1024)) + +_run_btrfs_util_prog quota enable $SCRATCH_MNT +_run_btrfs_util_prog qgroup limit $FILESIZE 5 $SCRATCH_MNT + +$XFS_IO_PROG -f -c "pwrite -b $BLOCKSIZE 0 $LENGTH" \ + $SCRATCH_MNT/foo | _filter_xfs_io + +# A sync is needed to trigger a commit_transaction. +# As the reserved space freeing happens at commit_transaction time, +# without a transaction commit, no reserved space needs freeing and +# won't trigger the bug. +sync Isn't '$BTRFS_UTIL_PROG filesystem sync' better instead of 'sync'? Thanks, Tsutomu Hi, Tsutomu-san, Yes, I did use such method before, but Dave said it's better to use unified interface to sync a filesystem other than the specialized one. So I still use sync as Dave said. Thanks, Qu + +# Double the limit to allow further write +_run_btrfs_util_prog qgroup limit $(($FILESIZE * 2)) 5 $SCRATCH_MNT + +# Test whether further write can succeed +$XFS_IO_PROG -f -c "pwrite -b $BLOCKSIZE $LENGTH $LENGTH" \ + $SCRATCH_MNT/foo | _filter_xfs_io + +# success, all done +status=0 +exit diff --git a/tests/btrfs/089.out b/tests/btrfs/089.out new file mode 100644 index 000..396888f --- /dev/null +++ b/tests/btrfs/089.out @@ -0,0 +1,5 @@ +QA output created by 089 +wrote 132120576/132120576 bytes at offset 0 +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +wrote 132120576/132120576 bytes at offset 132120576 +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) diff --git a/tests/btrfs/group b/tests/btrfs/group index ffe18bf..225b532 100644 --- a/tests/btrfs/group +++ b/tests/btrfs/group @@ -91,6 +91,7 @@ 086 auto quick clone 087 auto quick send 088 auto quick metadata +089 auto quick qgroup 090 auto quick metadata 091 auto quick qgroup 092 auto quick send -- To unsubscribe from this list: send the line "unsubscribe fstests" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] fstests: btrfs: Add regression test for reserved space leak.
On 2015/08/05 10:08, Qu Wenruo wrote: > The regression is introduced in v4.2-rc1, with the big btrfs qgroup > change. > The problem is, qgroup reserved space is never freed, causing even we > increase the limit, we can still hit the EDQUOT much faster than it > should. > > Reported-by: Tsutomu Itoh > Signed-off-by: Qu Wenruo > --- > tests/btrfs/089 | 88 > + > tests/btrfs/089.out | 5 +++ > tests/btrfs/group | 1 + > 3 files changed, 94 insertions(+) > create mode 100755 tests/btrfs/089 > create mode 100644 tests/btrfs/089.out > > diff --git a/tests/btrfs/089 b/tests/btrfs/089 > new file mode 100755 > index 000..82db96c > --- /dev/null > +++ b/tests/btrfs/089 > @@ -0,0 +1,88 @@ > +#! /bin/bash > +# FS QA Test 089 > +# > +# Regression test for btrfs qgroup reserved space leak. > +# > +# Due to qgroup reserved space leak, EDQUOT can be trigged even it's not > +# over limit after previous write. > +# > +#--- > +# Copyright (c) 2015 Fujitsu. All Rights Reserved. > +# > +# This program is free software; you can redistribute it and/or > +# modify it under the terms of the GNU General Public License as > +# published by the Free Software Foundation. > +# > +# This program is distributed in the hope that it would be useful, > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > +# GNU General Public License for more details. > +# > +# You should have received a copy of the GNU General Public License > +# along with this program; if not, write the Free Software Foundation, > +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA > +#--- > +# > + > +seq=`basename $0` > +seqres=$RESULT_DIR/$seq > +echo "QA output created by $seq" > + > +here=`pwd` > +tmp=/tmp/$$ > +status=1 # failure is the default! > +trap "_cleanup; exit \$status" 0 1 2 3 15 > + > +_cleanup() > +{ > + cd / > + rm -f $tmp.* > +} > + > +# get standard environment, filters and checks > +. ./common/rc > +. ./common/filter > + > +# real QA test starts here > + > +# Modify as appropriate. > +_supported_fs btrfs > +_supported_os Linux > +_require_scratch > +_need_to_be_root > + > +# Use big blocksize to ensure there is still enough space left > +# for metadata reserve after hitting EDQUOT > +BLOCKSIZE=$(( 2 * 1024 * 1024 )) > +FILESIZE=$(( 128 * 1024 * 1024 )) # 128Mbytes > + > +# The last block won't be able to finish write, as metadata takes > +# $NODESIZE space, causing the last block triggering EDQUOT > +LENGTH=$(( $FILESIZE - $BLOCKSIZE )) > + > +_scratch_mkfs >>$seqres.full 2>&1 > +_scratch_mount > +_require_fs_space $SCRATCH_MNT $(($FILESIZE * 2 / 1024)) > + > +_run_btrfs_util_prog quota enable $SCRATCH_MNT > +_run_btrfs_util_prog qgroup limit $FILESIZE 5 $SCRATCH_MNT > + > +$XFS_IO_PROG -f -c "pwrite -b $BLOCKSIZE 0 $LENGTH" \ > + $SCRATCH_MNT/foo | _filter_xfs_io > + > +# A sync is needed to trigger a commit_transaction. > +# As the reserved space freeing happens at commit_transaction time, > +# without a transaction commit, no reserved space needs freeing and > +# won't trigger the bug. > +sync Isn't '$BTRFS_UTIL_PROG filesystem sync' better instead of 'sync'? Thanks, Tsutomu > + > +# Double the limit to allow further write > +_run_btrfs_util_prog qgroup limit $(($FILESIZE * 2)) 5 $SCRATCH_MNT > + > +# Test whether further write can succeed > +$XFS_IO_PROG -f -c "pwrite -b $BLOCKSIZE $LENGTH $LENGTH" \ > + $SCRATCH_MNT/foo | _filter_xfs_io > + > +# success, all done > +status=0 > +exit > diff --git a/tests/btrfs/089.out b/tests/btrfs/089.out > new file mode 100644 > index 000..396888f > --- /dev/null > +++ b/tests/btrfs/089.out > @@ -0,0 +1,5 @@ > +QA output created by 089 > +wrote 132120576/132120576 bytes at offset 0 > +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > +wrote 132120576/132120576 bytes at offset 132120576 > +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > diff --git a/tests/btrfs/group b/tests/btrfs/group > index ffe18bf..225b532 100644 > --- a/tests/btrfs/group > +++ b/tests/btrfs/group > @@ -91,6 +91,7 @@ > 086 auto quick clone > 087 auto quick send > 088 auto quick metadata > +089 auto quick qgroup > 090 auto quick metadata > 091 auto quick qgroup > 092 auto quick send > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] fstests: btrfs: Add regression test for reserved space leak.
The regression is introduced in v4.2-rc1, with the big btrfs qgroup change. The problem is, qgroup reserved space is never freed, causing even we increase the limit, we can still hit the EDQUOT much faster than it should. Reported-by: Tsutomu Itoh Signed-off-by: Qu Wenruo --- tests/btrfs/089 | 88 + tests/btrfs/089.out | 5 +++ tests/btrfs/group | 1 + 3 files changed, 94 insertions(+) create mode 100755 tests/btrfs/089 create mode 100644 tests/btrfs/089.out diff --git a/tests/btrfs/089 b/tests/btrfs/089 new file mode 100755 index 000..82db96c --- /dev/null +++ b/tests/btrfs/089 @@ -0,0 +1,88 @@ +#! /bin/bash +# FS QA Test 089 +# +# Regression test for btrfs qgroup reserved space leak. +# +# Due to qgroup reserved space leak, EDQUOT can be trigged even it's not +# over limit after previous write. +# +#--- +# Copyright (c) 2015 Fujitsu. All Rights Reserved. +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write the Free Software Foundation, +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +#--- +# + +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo "QA output created by $seq" + +here=`pwd` +tmp=/tmp/$$ +status=1 # failure is the default! +trap "_cleanup; exit \$status" 0 1 2 3 15 + +_cleanup() +{ + cd / + rm -f $tmp.* +} + +# get standard environment, filters and checks +. ./common/rc +. ./common/filter + +# real QA test starts here + +# Modify as appropriate. +_supported_fs btrfs +_supported_os Linux +_require_scratch +_need_to_be_root + +# Use big blocksize to ensure there is still enough space left +# for metadata reserve after hitting EDQUOT +BLOCKSIZE=$(( 2 * 1024 * 1024 )) +FILESIZE=$(( 128 * 1024 * 1024 )) # 128Mbytes + +# The last block won't be able to finish write, as metadata takes +# $NODESIZE space, causing the last block triggering EDQUOT +LENGTH=$(( $FILESIZE - $BLOCKSIZE )) + +_scratch_mkfs >>$seqres.full 2>&1 +_scratch_mount +_require_fs_space $SCRATCH_MNT $(($FILESIZE * 2 / 1024)) + +_run_btrfs_util_prog quota enable $SCRATCH_MNT +_run_btrfs_util_prog qgroup limit $FILESIZE 5 $SCRATCH_MNT + +$XFS_IO_PROG -f -c "pwrite -b $BLOCKSIZE 0 $LENGTH" \ + $SCRATCH_MNT/foo | _filter_xfs_io + +# A sync is needed to trigger a commit_transaction. +# As the reserved space freeing happens at commit_transaction time, +# without a transaction commit, no reserved space needs freeing and +# won't trigger the bug. +sync + +# Double the limit to allow further write +_run_btrfs_util_prog qgroup limit $(($FILESIZE * 2)) 5 $SCRATCH_MNT + +# Test whether further write can succeed +$XFS_IO_PROG -f -c "pwrite -b $BLOCKSIZE $LENGTH $LENGTH" \ + $SCRATCH_MNT/foo | _filter_xfs_io + +# success, all done +status=0 +exit diff --git a/tests/btrfs/089.out b/tests/btrfs/089.out new file mode 100644 index 000..396888f --- /dev/null +++ b/tests/btrfs/089.out @@ -0,0 +1,5 @@ +QA output created by 089 +wrote 132120576/132120576 bytes at offset 0 +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +wrote 132120576/132120576 bytes at offset 132120576 +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) diff --git a/tests/btrfs/group b/tests/btrfs/group index ffe18bf..225b532 100644 --- a/tests/btrfs/group +++ b/tests/btrfs/group @@ -91,6 +91,7 @@ 086 auto quick clone 087 auto quick send 088 auto quick metadata +089 auto quick qgroup 090 auto quick metadata 091 auto quick qgroup 092 auto quick send -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fstests: btrfs: Add regression test for reserved space leak.
Filipe David Manana wrote on 2015/08/04 14:16 +0100: On Tue, Aug 4, 2015 at 7:27 AM, Qu Wenruo wrote: The regression is introduced in v4.2-rc1, with the big btrfs qgroup change. The problem is, qgroup reserved space is never freed, causing even we increase the limit, we can still hit the EDQUOT much faster than it should. Reported-by: Tsutomu Itoh Signed-off-by: Qu Wenruo Thanks for doing this Qu. The test fails without the btrfs fix and passes with it, as expected. However, one question below: Thanks for the review, Filipe. I'll explain it inline below. --- tests/btrfs/089 | 83 + tests/btrfs/089.out | 5 tests/btrfs/group | 1 + 3 files changed, 89 insertions(+) create mode 100755 tests/btrfs/089 create mode 100644 tests/btrfs/089.out diff --git a/tests/btrfs/089 b/tests/btrfs/089 new file mode 100755 index 000..0c018f2 --- /dev/null +++ b/tests/btrfs/089 @@ -0,0 +1,83 @@ +#! /bin/bash +# FS QA Test 089 +# +# Regression test for btrfs qgroup reserved space leak. +# +# Due to qgroup reserved space leak, EDQUOT can be trigged even it's not +# over limit after previous write. +# +#--- +# Copyright (c) 2015 Fujitsu. All Rights Reserved. +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write the Free Software Foundation, +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +#--- +# + +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo "QA output created by $seq" + +here=`pwd` +tmp=/tmp/$$ +status=1 # failure is the default! +trap "_cleanup; exit \$status" 0 1 2 3 15 + +_cleanup() +{ + cd / + rm -f $tmp.* +} + +# get standard environment, filters and checks +. ./common/rc +. ./common/filter + +# real QA test starts here + +# Modify as appropriate. +_supported_fs btrfs +_supported_os Linux +_require_scratch +_need_to_be_root + +# Use big blocksize to ensure there is still enough space left +# for metadata reserve after hitting EDQUOT +BLOCKSIZE=$(( 2 * 1024 * 1024 )) +FILESIZE=$(( 128 * 1024 * 1024 )) # 128Mbytes + +# The last block won't be able to finish write, as metadata takes +# $NODESIZE space, causing the last block triggering EDQUOT +LENGTH=$(( $FILESIZE - $BLOCKSIZE )) + +_scratch_mkfs >>$seqres.full 2>&1 +_scratch_mount +_require_fs_space $SCRATCH_MNT $(($FILESIZE * 2 / 1024)) + +_run_btrfs_util_prog quota enable $SCRATCH_MNT +_run_btrfs_util_prog qgroup limit $FILESIZE 5 $SCRATCH_MNT + +$XFS_IO_PROG -f -c "pwrite -b $BLOCKSIZE 0 $LENGTH" \ + $SCRATCH_MNT/foo | _filter_xfs_io +sync Why is the sync needed here? Can you add a comment explaining why? It isn't trivial/obvious (for me at least), specially because without the call to "sync" the test passes without the btrfs fix. thanks No problem, I'll send a v2 patch with explain about the sync. The reason is, without the sync, it's highly possible the data is not flush into disk. So the reserved space is correct until data is written. For current write flow with sync, without the fix patch: 1) Want to write first 126M Reserve 126M space Qgroup 5: reserved = 126M, rfer = 0(*), rfer_max = 128M *: Just ignore metadata, as blocksize 2M is much larger than nodesize(16K) 2) Sync Data writeback and metadata change |- Run delayed refs |- Qgroup accouting Qgroup 5: reserved = 126M, rfer = 126M, rfer_max = 128M ^^ Should be 0, as reserved data is written into disk. 3) Increase limit to 256M Qgroup 5: reserved = 126M, rfer = 126M, rfer_max = 256M 4) Want to write the next 126M Reserve 126M space. But qgroup 5 only has less than 4M available space. rfer_max - (reserved + rfer) = 4M So reserve fails with EDQUOT. On the other hand, if there is no sync: 1) Want to write first 126M Reserve 126M space Qgroup 5: reserved = 126M, rfer = 0(*), rfer_max = 128M *: Ignore metadata again. Also we assume your memory is large enough to keep that amount of dirty pages without trigger page flush. 3) Increase limit to 256M Qgroup 5: reserved = 126M, rfer = 0, rfer_max = 256M Rfer will only be increase at commit_transaction() time. So it will stay 0 until manually sync or dirty page number triggers a flush. 4) Want to write the next 126M Reserve 126M space. Now qgroup 5 has 256 - 126 = 130M available space. The reserve will succeed withou
Re: [PATCH 2/3] Btrfs: fix null pointer dereference when extent buffer is already freed
Wasn't this already solved by commit [1] added in the last 4.2-rc? Ah yes. it does. Thanks, Anand -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BTRFS disaster (of my own making). Is this recoverable?
On Tue, Aug 4, 2015 at 1:10 AM, Duncan <1i5t5.dun...@cox.net> wrote: > I don't remember you saying anything about trying btrfs restore. I did try it earlier in dry-run mode and it can't do anything: = btrfs restore -D /dev/sdc /mnt/saved/ warning, device 2 is missing bytenr mismatch, want=20971520, have=0 Couldn't read chunk root Could not open root, trying backup super warning, device 2 is missing bytenr mismatch, want=20971520, have=0 Couldn't read chunk root Could not open root, trying backup super warning, device 2 is missing bytenr mismatch, want=20971520, have=0 Couldn't read chunk root Could not open root, trying backup super = Find-roots, list-roots, none of it works. = btrfs restore --list-roots /dev/sde checksum verify failed on 20971520 found 8B1D9672 wanted 2F8A4238 checksum verify failed on 20971520 found 8B1D9672 wanted 2F8A4238 bytenr mismatch, want=20971520, have=8330001001141004672 Couldn't read chunk root Could not open root, trying backup super warning, device 1 is missing checksum verify failed on 20971520 found 8B1D9672 wanted 2F8A4238 checksum verify failed on 20971520 found 8B1D9672 wanted 2F8A4238 bytenr mismatch, want=20971520, have=8330001001141004672 Couldn't read chunk root Could not open root, trying backup super warning, device 1 is missing checksum verify failed on 20971520 found 8B1D9672 wanted 2F8A4238 checksum verify failed on 20971520 found 8B1D9672 wanted 2F8A4238 bytenr mismatch, want=20971520, have=8330001001141004672 Couldn't read chunk root Could not open root, trying backup super = btrfs-find-root -a /dev/sdc warning, device 2 is missing Couldn't read chunk root btrfs-find-root -a /dev/sde Couldn't read chunk root Open ctree failed = In defense of BTRFS it's been absolutely trouble free keeping my data through many power outages, etc. and I plan to keep using it. It was my own doing that caused the problem. Seems that if there was someway to edit something in those first overwritten 32MB of disc 2 to say "hey, I'm really here, just a bit screwed up" maybe some of the recovery tools could actually work. Thanks to all. Chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
On Tue, Aug 4, 2015 at 4:28 AM, Austin S Hemmelgarn wrote: > On 2015-08-04 00:58, John Ettedgui wrote: >> >> On Mon, Aug 3, 2015 at 8:01 PM, Qu Wenruo wrote: >>> >>> Although the best practice is staying away from such converted fs, either >>> using pure, newly created btrfs, or convert back to ext* before any >>> balance. >>> >> Unfortunately I don't have enough hard drive space to do a clean >> btrfs, so my only way to use btrfs for that partition was a >> conversion. > > If you could get your hands on a decent sized flash drive (32G or more), you > could do an incremental conversion offline. The steps would look something > like this: > > 1. Boot the system into a LiveCD or something similar that doesn't need to > run from your regular root partition (SystemRescueCD would be my personal > recommendation, although if you go that way, make sure to boot the > alternative kernel, as it's a lot newer then the standard ones). > 2. Plug in the flash drive, format it as BTRFS. > 3. Mount both your old partition and the flash drive somewhere. > 4. Start copying files from the old partition to the flash drive. > 5. When you hit ENOSPC on the flash drive, unmount the old partition, shrink > it down to the minimum size possible, and create a new partition in the free > space produced by doing so. > 6. Add the new partition to the BTRFS filesystem on the flash drive. > 7. Repeat steps 4-6 until you have copied everything. > 8. Wipe the old partition, and add it to the BTRFS filesystem. > 9. Run a full balance on the new BTRFS filesystem. > 10. Delete the partition from step 5 that is closest to the old partition > (via btrfs device delete), then resize the old partition to fill the space > that the deleted partition took up. > 11. Repeat steps 9-10 until the only remaining partitions in the new BTRFS > filesystem are the old one and the flash drive. > 12. Delete the flash drive from the BTRFS filesystem. > > This takes some time and coordination, but it does work reliably as long as > you are careful (I've done it before on multiple systems). > > I suppose I could do that even without the flash as I have some free space anyway, but moving Tbs of data with Gbs of free space will take days, plus the repartitioning. It'd probably be easier to start with a 1Tb drive or something. Is this currently my best bet as conversion is not as good as I thought? I believe my other 2 partitions also come from conversion, though I may have rebuilt them later from scratch. Thank you! John -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] Btrfs: fix null pointer dereference when extent buffer is already freed
On Tue, Aug 4, 2015 at 3:43 PM, Anand Jain wrote: > When, read_tree_block() returns error it has already freed the extent_buffer > > read_tree_block(..) > { > :: > ret = btree_read_extent_buffer_pages(root, buf, 0, parent_transid); > <== fails > if (ret) { > free_extent_buffer(buf); <=== its freed already > return ERR_PTR(ret); > } > :: > } > > open_ctree() > { > :: > chunk_root->node = read_tree_block(..); > :: > > BTRFS: failed to read chunk root on sdf > BUG: unable to handle kernel NULL pointer dereference at 001f > IP: [] free_extent_buffer+0x1e/0xb0 [btrfs] > :: > [] ? free_root_extent_buffers+0x1e/0x40 [btrfs] > [] free_root_pointers+0x56/0x60 [btrfs] > [] open_ctree+0x19e0/0x2360 [btrfs] > [] btrfs_mount+0x9e6/0xb10 [btrfs] > [] ? find_next_zero_bit+0x1a/0x30 > [] ? find_next_bit+0x15/0x30 > [] ? pcpu_alloc+0x35a/0x680 > [] mount_fs+0x38/0x190 > [] ? __alloc_percpu+0x15/0x20 > [] vfs_kern_mount+0x6b/0x120 > [] btrfs_mount+0x1f8/0xb10 [btrfs] > [] ? pcpu_alloc+0x35a/0x680 > [] mount_fs+0x38/0x190 > [] ? __alloc_percpu+0x15/0x20 > [] vfs_kern_mount+0x6b/0x120 > [] do_mount+0x224/0xb80 > [] SyS_mount+0x7e/0xe0 > [] system_call_fastpath+0x12/0x71 > > this patch avoids calling it again > > Signed-off-by: Anand Jain > --- > fs/btrfs/disk-io.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index da7b7bf..e8901ac 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -2081,7 +2081,8 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info > *fs_info) > static void free_root_extent_buffers(struct btrfs_root *root) > { > if (root) { > - free_extent_buffer(root->node); > + if (!IS_ERR(root->node)) > + free_extent_buffer(root->node); Wasn't this already solved by commit [1] added in the last 4.2-rc? Either way, it's better to avoid having root->node with an error in the first place, should be NULL or a valid (non-error) address (like what's done everywhere else). [1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=95ab1f64908795a2edd6b847eca94a0c63a44be4 > free_extent_buffer(root->commit_root); > root->node = NULL; > root->commit_root = NULL; > -- > 2.4.1 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Filipe David Manana, "Reasonable men adapt themselves to the world. Unreasonable men adapt the world to themselves. That's why all progress depends on unreasonable men." -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/3] introduce function to handle device offline
Here the patch 3/3 below adds a function to handle device offline, also in this patch the device offline is triggred by the external sysfs interface. Patch 1/3 makes it possible to mount a or continue to be rw mounted (if -o degraded option is set or if tolerated failure is more than missing) when one of the device goes offline in midest of the mounted FS. Patch 2/3 fixes a bug. Anand Jain (3): Btrfs: allow -o rw,degraded for single group profile Btrfs: fix null pointer dereference when extent buffer is already freed Btrfs: introduce function to handle device offline fs/btrfs/disk-io.c | 6 ++- fs/btrfs/super.c | 3 +- fs/btrfs/sysfs.c | 20 +- fs/btrfs/volumes.c | 113 + fs/btrfs/volumes.h | 7 5 files changed, 145 insertions(+), 4 deletions(-) -- 2.4.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] Btrfs: allow -o rw,degraded for single group profile
As of now mount with missing device with degraded option is allowed as long as number of missing devices is below the number of device failure/missing that a group profile could tolerate. However there is feature/bug in the btrfs that when write happens with least number of devices with which degraded mount is possible, chunk allocation would default to single. Which means any further mount with missing device will fail even with degraded option as the number of device failure that single group profile could tolerate is zero. Which also means the user has to find the missing device to mount the FS RW able to recover from this situation, which is not practical. A self inflected problem. test case eg: mkfs.btrfs -draid1 -mraid /dev/sdc /dev/sdd modprobe -r btrfs && modprobe btrfs <== dev scan is cleared mount -o degraded /dev/sdc /btrfs <== sdd is not used dd if=/dev/zero of=/btrfs/tf1 count=1 <== creates single profile btrfs fi sync /btrfs umount /btrfs mount -o degraded /dev/sdc /btrfs <== fails. since single profile would need all the disks This patch will let the RW mount to succeed with degraded option, when the group profile tolerance is zero or below the number of device missing. Now users have a choice to rebalance the group profile back to their desired when missing device reappears during the subsequent mounts. Signed-off-by: Anand Jain --- fs/btrfs/disk-io.c | 3 ++- fs/btrfs/super.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 2593952..da7b7bf 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2897,7 +2897,8 @@ retry_root_backup: btrfs_calc_num_tolerated_disk_barrier_failures(fs_info); if (fs_info->fs_devices->missing_devices > fs_info->num_tolerated_disk_barrier_failures && - !(sb->s_flags & MS_RDONLY)) { + !(sb->s_flags & MS_RDONLY || + btrfs_test_opt(fs_info->dev_root, DEGRADED))) { printk(KERN_WARNING "BTRFS: " "too many missing devices, writable mount is not allowed\n"); goto fail_sysfs; diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 96a7857..16687c2 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1656,7 +1656,8 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data) if (fs_info->fs_devices->missing_devices > fs_info->num_tolerated_disk_barrier_failures && - !(*flags & MS_RDONLY)) { + !(*flags & MS_RDONLY || + btrfs_test_opt(root, DEGRADED))) { btrfs_warn(fs_info, "too many missing devices, writable remount is not allowed"); ret = -EACCES; -- 2.4.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] Btrfs: introduce function to handle device offline
RFC: Btrfs should offline the device in the following context as below.. 1) to report that device has failed per the IO errors 2) identify the target for the (hot) replacement 3) avoid further commit error reported against the failing device and 4) fix the bug which would unmount btrfs (in some systemd config) when one of the device goes missing in a multi device btrfs. which sounds good for single device btrfs but not when fault tolerance is more than 0. So as of now, this patch will provide a handle to offline a device, when requested it would bring the device offline if the conditions are right (that is mount degraded option is set or failure tolerance is more than total missing devices) or panic/bug (per user config). The consumer of this handle device offline as of now is sysfs interface '/sys/fs/btrfs//device//missing' The write feature for this interface is also added in this patch. This interface plus systemd changes I hope fix to 4th reason (as above) for this patch. This patch is for the review as of now. Thanks Signed-off-by: Anand Jain --- fs/btrfs/sysfs.c | 20 +- fs/btrfs/volumes.c | 113 + fs/btrfs/volumes.h | 7 3 files changed, 139 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index d7f6aab..31d3d65 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -656,7 +656,8 @@ int btrfs_sysfs_rm_device_link(struct btrfs_fs_devices *fs_devices, if (!fs_devices->device_dir_kobj) return -EINVAL; - if (one_device && one_device->bdev) { + if (one_device && (one_device->bdev || + one_device->dev_state == BTRFS_DEV_STATE_OFFLINED)) { disk = one_device->bdev->bd_part; disk_kobj = &part_to_dev(disk)->kobj; @@ -1268,6 +1269,23 @@ static ssize_t btrfs_dev_attr_store(struct kobject *kobj, * we might need some of the parameter to be writable * but as of now just deny all */ + int ret; + unsigned long val; + struct btrfs_device *dev = to_btrfs_dev(kobj); + + ret = kstrtoul(skip_spaces(buf), 0, &val); + if (ret) + return ret; + + if (BTRFS_DEV_CHECK_ATTR(&a->attr, missing)) { + if (val != 0 && val != 1) + return -EINVAL; + if (val == dev->missing) + return -EINVAL; + btrfs_put_dev_offline(dev); + return count; + } + return -EPERM; } diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 5fa73f4..56d92f0 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -769,6 +769,28 @@ static void free_device(struct rcu_head *head) schedule_work(&device->rcu_work); } +static void __close_device(struct work_struct *work) +{ + struct btrfs_device *device; + + device = container_of(work, struct btrfs_device, rcu_work); + + if (device->bdev) + blkdev_put(device->bdev, device->mode); + + device->bdev = NULL; +} + +static void close_device(struct rcu_head *head) +{ + struct btrfs_device *device; + + device = container_of(head, struct btrfs_device, rcu); + + INIT_WORK(&device->rcu_work, __close_device); + schedule_work(&device->rcu_work); +} + static int __btrfs_close_devices(struct btrfs_fs_devices *fs_devices) { struct btrfs_device *device, *tmp; @@ -6887,3 +6909,94 @@ void btrfs_close_one_device(struct btrfs_device *device) call_rcu(&device->rcu, free_device); } +void btrfs_close_one_device_dont_free(struct btrfs_device *device) +{ + struct btrfs_fs_devices *fs_devices = device->fs_devices; + + if (device->bdev) + fs_devices->open_devices--; + + if (device->writeable && + device->devid != BTRFS_DEV_REPLACE_DEVID) { + list_del_init(&device->dev_alloc_list); + fs_devices->rw_devices--; + } + + device->writeable = 0; + + call_rcu(&device->rcu, close_device); +} + +void __btrfs_put_dev_offline(struct btrfs_device *device) +{ + struct btrfs_device *next_device; + struct btrfs_fs_devices *fs_devices; + + fs_devices = device->fs_devices; + + mutex_lock(&fs_devices->device_list_mutex); + lock_chunks(fs_devices->fs_info->fs_root); + + next_device = list_entry(fs_devices->devices.next, + struct btrfs_device, dev_list); + if (device->bdev == fs_devices->fs_info->sb->s_bdev) + fs_devices->fs_info->sb->s_bdev = next_device->bdev; + + if (device->bdev == fs_devices->latest_bdev) + fs_devices->latest_bdev = next_device->bdev; + + btrfs_close_one_device_dont_free(device); + device->missing = 1; + fs_devices->missing_devices++; + + device->dev_state = BTRFS_DEV_STATE_OFFLINED; + + rcu_barrier(); + + unlock_chu
[PATCH 2/3] Btrfs: fix null pointer dereference when extent buffer is already freed
When, read_tree_block() returns error it has already freed the extent_buffer read_tree_block(..) { :: ret = btree_read_extent_buffer_pages(root, buf, 0, parent_transid); <== fails if (ret) { free_extent_buffer(buf); <=== its freed already return ERR_PTR(ret); } :: } open_ctree() { :: chunk_root->node = read_tree_block(..); :: BTRFS: failed to read chunk root on sdf BUG: unable to handle kernel NULL pointer dereference at 001f IP: [] free_extent_buffer+0x1e/0xb0 [btrfs] :: [] ? free_root_extent_buffers+0x1e/0x40 [btrfs] [] free_root_pointers+0x56/0x60 [btrfs] [] open_ctree+0x19e0/0x2360 [btrfs] [] btrfs_mount+0x9e6/0xb10 [btrfs] [] ? find_next_zero_bit+0x1a/0x30 [] ? find_next_bit+0x15/0x30 [] ? pcpu_alloc+0x35a/0x680 [] mount_fs+0x38/0x190 [] ? __alloc_percpu+0x15/0x20 [] vfs_kern_mount+0x6b/0x120 [] btrfs_mount+0x1f8/0xb10 [btrfs] [] ? pcpu_alloc+0x35a/0x680 [] mount_fs+0x38/0x190 [] ? __alloc_percpu+0x15/0x20 [] vfs_kern_mount+0x6b/0x120 [] do_mount+0x224/0xb80 [] SyS_mount+0x7e/0xe0 [] system_call_fastpath+0x12/0x71 this patch avoids calling it again Signed-off-by: Anand Jain --- fs/btrfs/disk-io.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index da7b7bf..e8901ac 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2081,7 +2081,8 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info) static void free_root_extent_buffers(struct btrfs_root *root) { if (root) { - free_extent_buffer(root->node); + if (!IS_ERR(root->node)) + free_extent_buffer(root->node); free_extent_buffer(root->commit_root); root->node = NULL; root->commit_root = NULL; -- 2.4.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
On Mon, Aug 3, 2015 at 9:01 PM, Qu Wenruo wrote: > Oh, converted... > That's too bad. :( > > [[What's wrong with convert]] > Although btrfs is flex enough in theory to fit itself into the free space of > ext* and works fine, > But in practice, ext* is too fragmental in the standard of btrfs, not to > mention it also enables mixed-blockgroup. There is an -f flag for mkfs to help users avoid accidents. Is there a case to be made for btrfs-convert having either a -f flag, or an interactive "Convert has limitations that could increase risk to data, please see the wiki. Continue? y/n" OR "Convert has limitations, is not recommended for production usage, please see the wiki. Continue? y/n" It just seems users are jumping into convert without reading the wiki warning. Is it a good idea to reduce problems for less experienced users by actively discouraging btrfs-convert for production use? -- Chris Murphy -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fstests: btrfs: Add regression test for reserved space leak.
On Tue, Aug 4, 2015 at 7:27 AM, Qu Wenruo wrote: > The regression is introduced in v4.2-rc1, with the big btrfs qgroup > change. > The problem is, qgroup reserved space is never freed, causing even we > increase the limit, we can still hit the EDQUOT much faster than it > should. > > Reported-by: Tsutomu Itoh > Signed-off-by: Qu Wenruo Thanks for doing this Qu. The test fails without the btrfs fix and passes with it, as expected. However, one question below: > --- > tests/btrfs/089 | 83 > + > tests/btrfs/089.out | 5 > tests/btrfs/group | 1 + > 3 files changed, 89 insertions(+) > create mode 100755 tests/btrfs/089 > create mode 100644 tests/btrfs/089.out > > diff --git a/tests/btrfs/089 b/tests/btrfs/089 > new file mode 100755 > index 000..0c018f2 > --- /dev/null > +++ b/tests/btrfs/089 > @@ -0,0 +1,83 @@ > +#! /bin/bash > +# FS QA Test 089 > +# > +# Regression test for btrfs qgroup reserved space leak. > +# > +# Due to qgroup reserved space leak, EDQUOT can be trigged even it's not > +# over limit after previous write. > +# > +#--- > +# Copyright (c) 2015 Fujitsu. All Rights Reserved. > +# > +# This program is free software; you can redistribute it and/or > +# modify it under the terms of the GNU General Public License as > +# published by the Free Software Foundation. > +# > +# This program is distributed in the hope that it would be useful, > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > +# GNU General Public License for more details. > +# > +# You should have received a copy of the GNU General Public License > +# along with this program; if not, write the Free Software Foundation, > +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA > +#--- > +# > + > +seq=`basename $0` > +seqres=$RESULT_DIR/$seq > +echo "QA output created by $seq" > + > +here=`pwd` > +tmp=/tmp/$$ > +status=1 # failure is the default! > +trap "_cleanup; exit \$status" 0 1 2 3 15 > + > +_cleanup() > +{ > + cd / > + rm -f $tmp.* > +} > + > +# get standard environment, filters and checks > +. ./common/rc > +. ./common/filter > + > +# real QA test starts here > + > +# Modify as appropriate. > +_supported_fs btrfs > +_supported_os Linux > +_require_scratch > +_need_to_be_root > + > +# Use big blocksize to ensure there is still enough space left > +# for metadata reserve after hitting EDQUOT > +BLOCKSIZE=$(( 2 * 1024 * 1024 )) > +FILESIZE=$(( 128 * 1024 * 1024 )) # 128Mbytes > + > +# The last block won't be able to finish write, as metadata takes > +# $NODESIZE space, causing the last block triggering EDQUOT > +LENGTH=$(( $FILESIZE - $BLOCKSIZE )) > + > +_scratch_mkfs >>$seqres.full 2>&1 > +_scratch_mount > +_require_fs_space $SCRATCH_MNT $(($FILESIZE * 2 / 1024)) > + > +_run_btrfs_util_prog quota enable $SCRATCH_MNT > +_run_btrfs_util_prog qgroup limit $FILESIZE 5 $SCRATCH_MNT > + > +$XFS_IO_PROG -f -c "pwrite -b $BLOCKSIZE 0 $LENGTH" \ > + $SCRATCH_MNT/foo | _filter_xfs_io > +sync Why is the sync needed here? Can you add a comment explaining why? It isn't trivial/obvious (for me at least), specially because without the call to "sync" the test passes without the btrfs fix. thanks > + > +# Double the limit to allow further write > +_run_btrfs_util_prog qgroup limit $(($FILESIZE * 2)) 5 $SCRATCH_MNT > + > +# Test whether further write can succeed > +$XFS_IO_PROG -f -c "pwrite -b $BLOCKSIZE $LENGTH $LENGTH" \ > + $SCRATCH_MNT/foo | _filter_xfs_io > + > +# success, all done > +status=0 > +exit > diff --git a/tests/btrfs/089.out b/tests/btrfs/089.out > new file mode 100644 > index 000..396888f > --- /dev/null > +++ b/tests/btrfs/089.out > @@ -0,0 +1,5 @@ > +QA output created by 089 > +wrote 132120576/132120576 bytes at offset 0 > +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > +wrote 132120576/132120576 bytes at offset 132120576 > +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > diff --git a/tests/btrfs/group b/tests/btrfs/group > index ffe18bf..225b532 100644 > --- a/tests/btrfs/group > +++ b/tests/btrfs/group > @@ -91,6 +91,7 @@ > 086 auto quick clone > 087 auto quick send > 088 auto quick metadata > +089 auto quick qgroup > 090 auto quick metadata > 091 auto quick qgroup > 092 auto quick send > -- > 1.8.3.1 > > -- > To unsubscribe from this list: send the line "unsubscribe fstests" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Filipe David Manana, "Reasonable men adapt themselves to the world. Unreasonable men adapt the world to themselves. That's why all progress depends on unreasonable men." -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in th
Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
On 2015-08-04 00:58, John Ettedgui wrote: On Mon, Aug 3, 2015 at 8:01 PM, Qu Wenruo wrote: Although the best practice is staying away from such converted fs, either using pure, newly created btrfs, or convert back to ext* before any balance. Unfortunately I don't have enough hard drive space to do a clean btrfs, so my only way to use btrfs for that partition was a conversion. If you could get your hands on a decent sized flash drive (32G or more), you could do an incremental conversion offline. The steps would look something like this: 1. Boot the system into a LiveCD or something similar that doesn't need to run from your regular root partition (SystemRescueCD would be my personal recommendation, although if you go that way, make sure to boot the alternative kernel, as it's a lot newer then the standard ones). 2. Plug in the flash drive, format it as BTRFS. 3. Mount both your old partition and the flash drive somewhere. 4. Start copying files from the old partition to the flash drive. 5. When you hit ENOSPC on the flash drive, unmount the old partition, shrink it down to the minimum size possible, and create a new partition in the free space produced by doing so. 6. Add the new partition to the BTRFS filesystem on the flash drive. 7. Repeat steps 4-6 until you have copied everything. 8. Wipe the old partition, and add it to the BTRFS filesystem. 9. Run a full balance on the new BTRFS filesystem. 10. Delete the partition from step 5 that is closest to the old partition (via btrfs device delete), then resize the old partition to fill the space that the deleted partition took up. 11. Repeat steps 9-10 until the only remaining partitions in the new BTRFS filesystem are the old one and the flash drive. 12. Delete the flash drive from the BTRFS filesystem. This takes some time and coordination, but it does work reliably as long as you are careful (I've done it before on multiple systems). smime.p7s Description: S/MIME Cryptographic Signature