Re: [PATCH v2] fstests: btrfs: Add regression test for reserved space leak.

2015-08-04 Thread Tsutomu Itoh

On 2015/08/05 12:10, Dave Chinner wrote:

On Wed, Aug 05, 2015 at 11:52:36AM +0900, Tsutomu Itoh wrote:

On 2015/08/05 10:57, Dave Chinner wrote:

On Wed, Aug 05, 2015 at 09:39:35AM +0800, Qu Wenruo wrote:

Tsutomu Itoh wrote on 2015/08/05 10:26 +0900:

On 2015/08/05 10:08, Qu Wenruo wrote:

+# As the reserved space freeing happens at commit_transaction time,
+# without a transaction commit, no reserved space needs freeing and
+# won't trigger the bug.
+sync


Isn't '$BTRFS_UTIL_PROG filesystem sync' better instead of 'sync'?

Thanks,
Tsutomu

Hi, Tsutomu-san,

Yes, I did use such method before, but Dave said it's better to use
unified interface to sync a filesystem other than the specialized
one.

So I still use sync as Dave said.


Mainly because "sync" is what users will use to make sure their data
is safe. filesystem specific tools have a habit of doing "special
stuff" to sync a filesystem, so it may not reflect the way users
expect the system to behaviour when they run sync.

The other option is this:

_syncfs()
{
mntpt=$1

$XFS_IO_PROG -c syncfs $mntpt
}

_sync_test()
{
_syncfs $TEST_DIR
}

_sync_scratch()
{
_syncfs $SCRATCH_MNT
}

which only runs sync on the filesystem that needs syncing (via the
syncfs() syscall)


I think that syncfs is better instead of sync because the syncing of
the specified filesystem is necessary.


sync guarantees that. IOWs, there's no difference between sync and
syncfs for the persepctive of this test and so the test does not
need really need changing.


I agree that sync and syncfs is not different for this test.

However, sync is syncing of all filesystems, and, syncfs is syncing
of the specified filesystem only.
Therefor, I think that it is better to use sync and syncfs properly
according to the necessity.

Thanks,
Tsutomu


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] fstests: btrfs: Add regression test for reserved space leak.

2015-08-04 Thread Dave Chinner
On Wed, Aug 05, 2015 at 11:52:36AM +0900, Tsutomu Itoh wrote:
> On 2015/08/05 10:57, Dave Chinner wrote:
> >On Wed, Aug 05, 2015 at 09:39:35AM +0800, Qu Wenruo wrote:
> >>Tsutomu Itoh wrote on 2015/08/05 10:26 +0900:
> >>>On 2015/08/05 10:08, Qu Wenruo wrote:
> +# As the reserved space freeing happens at commit_transaction time,
> +# without a transaction commit, no reserved space needs freeing and
> +# won't trigger the bug.
> +sync
> >>>
> >>>Isn't '$BTRFS_UTIL_PROG filesystem sync' better instead of 'sync'?
> >>>
> >>>Thanks,
> >>>Tsutomu
> >>Hi, Tsutomu-san,
> >>
> >>Yes, I did use such method before, but Dave said it's better to use
> >>unified interface to sync a filesystem other than the specialized
> >>one.
> >>
> >>So I still use sync as Dave said.
> >
> >Mainly because "sync" is what users will use to make sure their data
> >is safe. filesystem specific tools have a habit of doing "special
> >stuff" to sync a filesystem, so it may not reflect the way users
> >expect the system to behaviour when they run sync.
> >
> >The other option is this:
> >
> >_syncfs()
> >{
> > mntpt=$1
> >
> > $XFS_IO_PROG -c syncfs $mntpt
> >}
> >
> >_sync_test()
> >{
> > _syncfs $TEST_DIR
> >}
> >
> >_sync_scratch()
> >{
> > _syncfs $SCRATCH_MNT
> >}
> >
> >which only runs sync on the filesystem that needs syncing (via the
> >syncfs() syscall)
> 
> I think that syncfs is better instead of sync because the syncing of
> the specified filesystem is necessary.

sync guarantees that. IOWs, there's no difference between sync and
syncfs for the persepctive of this test and so the test does not
need really need changing.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] fstests: btrfs: Add regression test for reserved space leak.

2015-08-04 Thread Qu Wenruo



Tsutomu Itoh wrote on 2015/08/05 11:52 +0900:

On 2015/08/05 10:57, Dave Chinner wrote:

On Wed, Aug 05, 2015 at 09:39:35AM +0800, Qu Wenruo wrote:

Tsutomu Itoh wrote on 2015/08/05 10:26 +0900:

On 2015/08/05 10:08, Qu Wenruo wrote:

+# As the reserved space freeing happens at commit_transaction time,
+# without a transaction commit, no reserved space needs freeing and
+# won't trigger the bug.
+sync


Isn't '$BTRFS_UTIL_PROG filesystem sync' better instead of 'sync'?

Thanks,
Tsutomu

Hi, Tsutomu-san,

Yes, I did use such method before, but Dave said it's better to use
unified interface to sync a filesystem other than the specialized
one.

So I still use sync as Dave said.


Mainly because "sync" is what users will use to make sure their data
is safe. filesystem specific tools have a habit of doing "special
stuff" to sync a filesystem, so it may not reflect the way users
expect the system to behaviour when they run sync.

The other option is this:

_syncfs()
{
mntpt=$1

$XFS_IO_PROG -c syncfs $mntpt
}

_sync_test()
{
_syncfs $TEST_DIR
}

_sync_scratch()
{
_syncfs $SCRATCH_MNT
}

which only runs sync on the filesystem that needs syncing (via the
syncfs() syscall)


I think that syncfs is better instead of sync because the syncing of
the specified filesystem is necessary.

Thanks,
Tsutomu


I'm OK changing the sync to _sync_scratch().

But no one in btrfs testcase uses that function and I didn't see the 
reason to specifically call _sync_scratch() instead of sync() BTW.


Thanks,
Qu



Cheers,

Dave.





--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] fstests: btrfs: Add regression test for reserved space leak.

2015-08-04 Thread Tsutomu Itoh

On 2015/08/05 10:57, Dave Chinner wrote:

On Wed, Aug 05, 2015 at 09:39:35AM +0800, Qu Wenruo wrote:

Tsutomu Itoh wrote on 2015/08/05 10:26 +0900:

On 2015/08/05 10:08, Qu Wenruo wrote:

+# As the reserved space freeing happens at commit_transaction time,
+# without a transaction commit, no reserved space needs freeing and
+# won't trigger the bug.
+sync


Isn't '$BTRFS_UTIL_PROG filesystem sync' better instead of 'sync'?

Thanks,
Tsutomu

Hi, Tsutomu-san,

Yes, I did use such method before, but Dave said it's better to use
unified interface to sync a filesystem other than the specialized
one.

So I still use sync as Dave said.


Mainly because "sync" is what users will use to make sure their data
is safe. filesystem specific tools have a habit of doing "special
stuff" to sync a filesystem, so it may not reflect the way users
expect the system to behaviour when they run sync.

The other option is this:

_syncfs()
{
mntpt=$1

$XFS_IO_PROG -c syncfs $mntpt
}

_sync_test()
{
_syncfs $TEST_DIR
}

_sync_scratch()
{
_syncfs $SCRATCH_MNT
}

which only runs sync on the filesystem that needs syncing (via the
syncfs() syscall)


I think that syncfs is better instead of sync because the syncing of
the specified filesystem is necessary.

Thanks,
Tsutomu



Cheers,

Dave.




--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] fstests: btrfs: Add regression test for reserved space leak.

2015-08-04 Thread Dave Chinner
On Wed, Aug 05, 2015 at 09:39:35AM +0800, Qu Wenruo wrote:
> Tsutomu Itoh wrote on 2015/08/05 10:26 +0900:
> >On 2015/08/05 10:08, Qu Wenruo wrote:
> >>+# As the reserved space freeing happens at commit_transaction time,
> >>+# without a transaction commit, no reserved space needs freeing and
> >>+# won't trigger the bug.
> >>+sync
> >
> >Isn't '$BTRFS_UTIL_PROG filesystem sync' better instead of 'sync'?
> >
> >Thanks,
> >Tsutomu
> Hi, Tsutomu-san,
> 
> Yes, I did use such method before, but Dave said it's better to use
> unified interface to sync a filesystem other than the specialized
> one.
> 
> So I still use sync as Dave said.

Mainly because "sync" is what users will use to make sure their data
is safe. filesystem specific tools have a habit of doing "special
stuff" to sync a filesystem, so it may not reflect the way users
expect the system to behaviour when they run sync.

The other option is this:

_syncfs()
{
mntpt=$1

$XFS_IO_PROG -c syncfs $mntpt
}

_sync_test()
{
_syncfs $TEST_DIR
}

_sync_scratch()
{
_syncfs $SCRATCH_MNT
}

which only runs sync on the filesystem that needs syncing (via the
syncfs() syscall)

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] fstests: btrfs: Add regression test for reserved space leak.

2015-08-04 Thread Qu Wenruo



Tsutomu Itoh wrote on 2015/08/05 10:26 +0900:

On 2015/08/05 10:08, Qu Wenruo wrote:

The regression is introduced in v4.2-rc1, with the big btrfs qgroup
change.
The problem is, qgroup reserved space is never freed, causing even we
increase the limit, we can still hit the EDQUOT much faster than it
should.

Reported-by: Tsutomu Itoh 
Signed-off-by: Qu Wenruo 
---
   tests/btrfs/089 | 88 
+
   tests/btrfs/089.out |  5 +++
   tests/btrfs/group   |  1 +
   3 files changed, 94 insertions(+)
   create mode 100755 tests/btrfs/089
   create mode 100644 tests/btrfs/089.out

diff --git a/tests/btrfs/089 b/tests/btrfs/089
new file mode 100755
index 000..82db96c
--- /dev/null
+++ b/tests/btrfs/089
@@ -0,0 +1,88 @@
+#! /bin/bash
+# FS QA Test 089
+#
+# Regression test for btrfs qgroup reserved space leak.
+#
+# Due to qgroup reserved space leak, EDQUOT can be trigged even it's not
+# over limit after previous write.
+#
+#---
+# Copyright (c) 2015 Fujitsu. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_need_to_be_root
+
+# Use big blocksize to ensure there is still enough space left
+# for metadata reserve after hitting EDQUOT
+BLOCKSIZE=$(( 2 * 1024 * 1024 ))
+FILESIZE=$(( 128 * 1024 * 1024 )) # 128Mbytes
+
+# The last block won't be able to finish write, as metadata takes
+# $NODESIZE space, causing the last block triggering EDQUOT
+LENGTH=$(( $FILESIZE - $BLOCKSIZE ))
+
+_scratch_mkfs >>$seqres.full 2>&1
+_scratch_mount
+_require_fs_space $SCRATCH_MNT $(($FILESIZE * 2 / 1024))
+
+_run_btrfs_util_prog quota enable $SCRATCH_MNT
+_run_btrfs_util_prog qgroup limit $FILESIZE 5 $SCRATCH_MNT
+
+$XFS_IO_PROG -f -c "pwrite -b $BLOCKSIZE 0 $LENGTH" \
+   $SCRATCH_MNT/foo | _filter_xfs_io
+
+# A sync is needed to trigger a commit_transaction.
+# As the reserved space freeing happens at commit_transaction time,
+# without a transaction commit, no reserved space needs freeing and
+# won't trigger the bug.
+sync


Isn't '$BTRFS_UTIL_PROG filesystem sync' better instead of 'sync'?

Thanks,
Tsutomu

Hi, Tsutomu-san,

Yes, I did use such method before, but Dave said it's better to use 
unified interface to sync a filesystem other than the specialized one.


So I still use sync as Dave said.

Thanks,
Qu



+
+# Double the limit to allow further write
+_run_btrfs_util_prog qgroup limit $(($FILESIZE * 2)) 5 $SCRATCH_MNT
+
+# Test whether further write can succeed
+$XFS_IO_PROG -f -c "pwrite -b $BLOCKSIZE $LENGTH $LENGTH" \
+   $SCRATCH_MNT/foo | _filter_xfs_io
+
+# success, all done
+status=0
+exit
diff --git a/tests/btrfs/089.out b/tests/btrfs/089.out
new file mode 100644
index 000..396888f
--- /dev/null
+++ b/tests/btrfs/089.out
@@ -0,0 +1,5 @@
+QA output created by 089
+wrote 132120576/132120576 bytes at offset 0
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 132120576/132120576 bytes at offset 132120576
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
diff --git a/tests/btrfs/group b/tests/btrfs/group
index ffe18bf..225b532 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -91,6 +91,7 @@
   086 auto quick clone
   087 auto quick send
   088 auto quick metadata
+089 auto quick qgroup
   090 auto quick metadata
   091 auto quick qgroup
   092 auto quick send




--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] fstests: btrfs: Add regression test for reserved space leak.

2015-08-04 Thread Tsutomu Itoh
On 2015/08/05 10:08, Qu Wenruo wrote:
> The regression is introduced in v4.2-rc1, with the big btrfs qgroup
> change.
> The problem is, qgroup reserved space is never freed, causing even we
> increase the limit, we can still hit the EDQUOT much faster than it
> should.
> 
> Reported-by: Tsutomu Itoh 
> Signed-off-by: Qu Wenruo 
> ---
>   tests/btrfs/089 | 88 
> +
>   tests/btrfs/089.out |  5 +++
>   tests/btrfs/group   |  1 +
>   3 files changed, 94 insertions(+)
>   create mode 100755 tests/btrfs/089
>   create mode 100644 tests/btrfs/089.out
> 
> diff --git a/tests/btrfs/089 b/tests/btrfs/089
> new file mode 100755
> index 000..82db96c
> --- /dev/null
> +++ b/tests/btrfs/089
> @@ -0,0 +1,88 @@
> +#! /bin/bash
> +# FS QA Test 089
> +#
> +# Regression test for btrfs qgroup reserved space leak.
> +#
> +# Due to qgroup reserved space leak, EDQUOT can be trigged even it's not
> +# over limit after previous write.
> +#
> +#---
> +# Copyright (c) 2015 Fujitsu. All Rights Reserved.
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#---
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1 # failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> + cd /
> + rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs btrfs
> +_supported_os Linux
> +_require_scratch
> +_need_to_be_root
> +
> +# Use big blocksize to ensure there is still enough space left
> +# for metadata reserve after hitting EDQUOT
> +BLOCKSIZE=$(( 2 * 1024 * 1024 ))
> +FILESIZE=$(( 128 * 1024 * 1024 )) # 128Mbytes
> +
> +# The last block won't be able to finish write, as metadata takes
> +# $NODESIZE space, causing the last block triggering EDQUOT
> +LENGTH=$(( $FILESIZE - $BLOCKSIZE ))
> +
> +_scratch_mkfs >>$seqres.full 2>&1
> +_scratch_mount
> +_require_fs_space $SCRATCH_MNT $(($FILESIZE * 2 / 1024))
> +
> +_run_btrfs_util_prog quota enable $SCRATCH_MNT
> +_run_btrfs_util_prog qgroup limit $FILESIZE 5 $SCRATCH_MNT
> +
> +$XFS_IO_PROG -f -c "pwrite -b $BLOCKSIZE 0 $LENGTH" \
> + $SCRATCH_MNT/foo | _filter_xfs_io
> +
> +# A sync is needed to trigger a commit_transaction.
> +# As the reserved space freeing happens at commit_transaction time,
> +# without a transaction commit, no reserved space needs freeing and
> +# won't trigger the bug.
> +sync

Isn't '$BTRFS_UTIL_PROG filesystem sync' better instead of 'sync'?

Thanks,
Tsutomu

> +
> +# Double the limit to allow further write
> +_run_btrfs_util_prog qgroup limit $(($FILESIZE * 2)) 5 $SCRATCH_MNT
> +
> +# Test whether further write can succeed
> +$XFS_IO_PROG -f -c "pwrite -b $BLOCKSIZE $LENGTH $LENGTH" \
> + $SCRATCH_MNT/foo | _filter_xfs_io
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/btrfs/089.out b/tests/btrfs/089.out
> new file mode 100644
> index 000..396888f
> --- /dev/null
> +++ b/tests/btrfs/089.out
> @@ -0,0 +1,5 @@
> +QA output created by 089
> +wrote 132120576/132120576 bytes at offset 0
> +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +wrote 132120576/132120576 bytes at offset 132120576
> +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> diff --git a/tests/btrfs/group b/tests/btrfs/group
> index ffe18bf..225b532 100644
> --- a/tests/btrfs/group
> +++ b/tests/btrfs/group
> @@ -91,6 +91,7 @@
>   086 auto quick clone
>   087 auto quick send
>   088 auto quick metadata
> +089 auto quick qgroup
>   090 auto quick metadata
>   091 auto quick qgroup
>   092 auto quick send
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] fstests: btrfs: Add regression test for reserved space leak.

2015-08-04 Thread Qu Wenruo
The regression is introduced in v4.2-rc1, with the big btrfs qgroup
change.
The problem is, qgroup reserved space is never freed, causing even we
increase the limit, we can still hit the EDQUOT much faster than it
should.

Reported-by: Tsutomu Itoh 
Signed-off-by: Qu Wenruo 
---
 tests/btrfs/089 | 88 +
 tests/btrfs/089.out |  5 +++
 tests/btrfs/group   |  1 +
 3 files changed, 94 insertions(+)
 create mode 100755 tests/btrfs/089
 create mode 100644 tests/btrfs/089.out

diff --git a/tests/btrfs/089 b/tests/btrfs/089
new file mode 100755
index 000..82db96c
--- /dev/null
+++ b/tests/btrfs/089
@@ -0,0 +1,88 @@
+#! /bin/bash
+# FS QA Test 089
+#
+# Regression test for btrfs qgroup reserved space leak.
+#
+# Due to qgroup reserved space leak, EDQUOT can be trigged even it's not
+# over limit after previous write.
+#
+#---
+# Copyright (c) 2015 Fujitsu. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_need_to_be_root
+
+# Use big blocksize to ensure there is still enough space left
+# for metadata reserve after hitting EDQUOT
+BLOCKSIZE=$(( 2 * 1024 * 1024 ))
+FILESIZE=$(( 128 * 1024 * 1024 )) # 128Mbytes 
+
+# The last block won't be able to finish write, as metadata takes
+# $NODESIZE space, causing the last block triggering EDQUOT
+LENGTH=$(( $FILESIZE - $BLOCKSIZE ))
+
+_scratch_mkfs >>$seqres.full 2>&1
+_scratch_mount
+_require_fs_space $SCRATCH_MNT $(($FILESIZE * 2 / 1024))
+
+_run_btrfs_util_prog quota enable $SCRATCH_MNT
+_run_btrfs_util_prog qgroup limit $FILESIZE 5 $SCRATCH_MNT
+
+$XFS_IO_PROG -f -c "pwrite -b $BLOCKSIZE 0 $LENGTH" \
+   $SCRATCH_MNT/foo | _filter_xfs_io
+
+# A sync is needed to trigger a commit_transaction.
+# As the reserved space freeing happens at commit_transaction time,
+# without a transaction commit, no reserved space needs freeing and
+# won't trigger the bug.
+sync
+
+# Double the limit to allow further write
+_run_btrfs_util_prog qgroup limit $(($FILESIZE * 2)) 5 $SCRATCH_MNT
+
+# Test whether further write can succeed
+$XFS_IO_PROG -f -c "pwrite -b $BLOCKSIZE $LENGTH $LENGTH" \
+   $SCRATCH_MNT/foo | _filter_xfs_io
+
+# success, all done
+status=0
+exit
diff --git a/tests/btrfs/089.out b/tests/btrfs/089.out
new file mode 100644
index 000..396888f
--- /dev/null
+++ b/tests/btrfs/089.out
@@ -0,0 +1,5 @@
+QA output created by 089
+wrote 132120576/132120576 bytes at offset 0
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 132120576/132120576 bytes at offset 132120576
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
diff --git a/tests/btrfs/group b/tests/btrfs/group
index ffe18bf..225b532 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -91,6 +91,7 @@
 086 auto quick clone
 087 auto quick send
 088 auto quick metadata
+089 auto quick qgroup
 090 auto quick metadata
 091 auto quick qgroup
 092 auto quick send
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fstests: btrfs: Add regression test for reserved space leak.

2015-08-04 Thread Qu Wenruo



Filipe David Manana wrote on 2015/08/04 14:16 +0100:

On Tue, Aug 4, 2015 at 7:27 AM, Qu Wenruo  wrote:

The regression is introduced in v4.2-rc1, with the big btrfs qgroup
change.
The problem is, qgroup reserved space is never freed, causing even we
increase the limit, we can still hit the EDQUOT much faster than it
should.

Reported-by: Tsutomu Itoh 
Signed-off-by: Qu Wenruo 


Thanks for doing this Qu.
The test fails without the btrfs fix and passes with it, as expected.
However, one question below:

Thanks for the review, Filipe.

I'll explain it inline below.



---
  tests/btrfs/089 | 83 +
  tests/btrfs/089.out |  5 
  tests/btrfs/group   |  1 +
  3 files changed, 89 insertions(+)
  create mode 100755 tests/btrfs/089
  create mode 100644 tests/btrfs/089.out

diff --git a/tests/btrfs/089 b/tests/btrfs/089
new file mode 100755
index 000..0c018f2
--- /dev/null
+++ b/tests/btrfs/089
@@ -0,0 +1,83 @@
+#! /bin/bash
+# FS QA Test 089
+#
+# Regression test for btrfs qgroup reserved space leak.
+#
+# Due to qgroup reserved space leak, EDQUOT can be trigged even it's not
+# over limit after previous write.
+#
+#---
+# Copyright (c) 2015 Fujitsu. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_need_to_be_root
+
+# Use big blocksize to ensure there is still enough space left
+# for metadata reserve after hitting EDQUOT
+BLOCKSIZE=$(( 2 * 1024 * 1024 ))
+FILESIZE=$(( 128 * 1024 * 1024 )) # 128Mbytes
+
+# The last block won't be able to finish write, as metadata takes
+# $NODESIZE space, causing the last block triggering EDQUOT
+LENGTH=$(( $FILESIZE - $BLOCKSIZE ))
+
+_scratch_mkfs >>$seqres.full 2>&1
+_scratch_mount
+_require_fs_space $SCRATCH_MNT $(($FILESIZE * 2 / 1024))
+
+_run_btrfs_util_prog quota enable $SCRATCH_MNT
+_run_btrfs_util_prog qgroup limit $FILESIZE 5 $SCRATCH_MNT
+
+$XFS_IO_PROG -f -c "pwrite -b $BLOCKSIZE 0 $LENGTH" \
+   $SCRATCH_MNT/foo | _filter_xfs_io
+sync


Why is the sync needed here? Can you add a comment explaining why? It
isn't trivial/obvious (for me at least), specially because without the
call to "sync" the test passes without the btrfs fix.

thanks

No problem, I'll send a v2 patch with explain about the sync.

The reason is, without the sync, it's highly possible the data is not 
flush into disk. So the reserved space is correct until data is written.


For current write flow with sync, without the fix patch:
1) Want to write first 126M
   Reserve 126M space

   Qgroup 5: reserved = 126M, rfer = 0(*), rfer_max = 128M
*: Just ignore metadata, as blocksize 2M is much larger than nodesize(16K)

2) Sync
   Data writeback and metadata change
|- Run delayed refs
   |- Qgroup accouting
   Qgroup 5: reserved = 126M, rfer = 126M, rfer_max = 128M
 ^^ Should be 0, as reserved data is written into disk.

3) Increase limit to 256M
   Qgroup 5: reserved = 126M, rfer = 126M, rfer_max = 256M

4) Want to write the next 126M
   Reserve 126M space.

   But qgroup 5 only has less than 4M available space.
   rfer_max - (reserved + rfer) = 4M

   So reserve fails with EDQUOT.

On the other hand, if there is no sync:
1) Want to write first 126M
   Reserve 126M space

   Qgroup 5: reserved = 126M, rfer = 0(*), rfer_max = 128M
   *: Ignore metadata again.
   Also we assume your memory is large enough to keep that amount of
   dirty pages without trigger page flush.

3) Increase limit to 256M
   Qgroup 5: reserved = 126M, rfer = 0, rfer_max = 256M

   Rfer will only be increase at commit_transaction() time.
   So it will stay 0 until manually sync or dirty page number triggers a
   flush.

4) Want to write the next 126M
   Reserve 126M space.

   Now qgroup 5 has 256 - 126 = 130M available space.
   The reserve will succeed withou

Re: [PATCH 2/3] Btrfs: fix null pointer dereference when extent buffer is already freed

2015-08-04 Thread Anand Jain



Wasn't this already solved by commit [1] added in the last 4.2-rc?


 Ah yes. it does.

Thanks, Anand
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BTRFS disaster (of my own making). Is this recoverable?

2015-08-04 Thread Sonic
On Tue, Aug 4, 2015 at 1:10 AM, Duncan <1i5t5.dun...@cox.net> wrote:
> I don't remember you saying anything about trying btrfs restore.

I did try it earlier in dry-run mode and it can't do anything:
=
btrfs restore -D /dev/sdc /mnt/saved/
warning, device 2 is missing
bytenr mismatch, want=20971520, have=0
Couldn't read chunk root
Could not open root, trying backup super
warning, device 2 is missing
bytenr mismatch, want=20971520, have=0
Couldn't read chunk root
Could not open root, trying backup super
warning, device 2 is missing
bytenr mismatch, want=20971520, have=0
Couldn't read chunk root
Could not open root, trying backup super
=

Find-roots, list-roots, none of it works.
=
btrfs restore --list-roots /dev/sde
checksum verify failed on 20971520 found 8B1D9672 wanted 2F8A4238
checksum verify failed on 20971520 found 8B1D9672 wanted 2F8A4238
bytenr mismatch, want=20971520, have=8330001001141004672
Couldn't read chunk root
Could not open root, trying backup super
warning, device 1 is missing
checksum verify failed on 20971520 found 8B1D9672 wanted 2F8A4238
checksum verify failed on 20971520 found 8B1D9672 wanted 2F8A4238
bytenr mismatch, want=20971520, have=8330001001141004672
Couldn't read chunk root
Could not open root, trying backup super
warning, device 1 is missing
checksum verify failed on 20971520 found 8B1D9672 wanted 2F8A4238
checksum verify failed on 20971520 found 8B1D9672 wanted 2F8A4238
bytenr mismatch, want=20971520, have=8330001001141004672
Couldn't read chunk root
Could not open root, trying backup super
=
btrfs-find-root -a /dev/sdc
warning, device 2 is missing
Couldn't read chunk root

btrfs-find-root -a /dev/sde
Couldn't read chunk root
Open ctree failed
=

In defense of BTRFS it's been absolutely trouble free keeping my data
through many power outages, etc. and I plan to keep using it. It was
my own doing that caused the problem.

Seems that if there was someway to edit something in those first
overwritten 32MB of disc 2 to say "hey, I'm really here, just a bit
screwed up" maybe some of the recovery tools could actually work.

Thanks to all.

Chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: mount btrfs takes 30 minutes, btrfs check runs out of memory

2015-08-04 Thread John Ettedgui
On Tue, Aug 4, 2015 at 4:28 AM, Austin S Hemmelgarn
 wrote:
> On 2015-08-04 00:58, John Ettedgui wrote:
>>
>> On Mon, Aug 3, 2015 at 8:01 PM, Qu Wenruo  wrote:
>>>
>>> Although the best practice is staying away from such converted fs, either
>>> using pure, newly created btrfs, or convert back to ext* before any
>>> balance.
>>>
>> Unfortunately I don't have enough hard drive space to do a clean
>> btrfs, so my only way to use btrfs for that partition was a
>> conversion.
>
> If you could get your hands on a decent sized flash drive (32G or more), you
> could do an incremental conversion offline.  The steps would look something
> like this:
>
> 1. Boot the system into a LiveCD or something similar that doesn't need to
> run from your regular root partition (SystemRescueCD would be my personal
> recommendation, although if you go that way, make sure to boot the
> alternative kernel, as it's a lot newer then the standard ones).
> 2. Plug in the flash drive, format it as BTRFS.
> 3. Mount both your old partition and the flash drive somewhere.
> 4. Start copying files from the old partition to the flash drive.
> 5. When you hit ENOSPC on the flash drive, unmount the old partition, shrink
> it down to the minimum size possible, and create a new partition in the free
> space produced by doing so.
> 6. Add the new partition to the BTRFS filesystem on the flash drive.
> 7. Repeat steps 4-6 until you have copied everything.
> 8. Wipe the old partition, and add it to the BTRFS filesystem.
> 9. Run a full balance on the new BTRFS filesystem.
> 10. Delete the partition from step 5 that is closest to the old partition
> (via btrfs device delete), then resize the old partition to fill the space
> that the deleted partition took up.
> 11. Repeat steps 9-10 until the only remaining partitions in the new BTRFS
> filesystem are the old one and the flash drive.
> 12. Delete the flash drive from the BTRFS filesystem.
>
> This takes some time and coordination, but it does work reliably as long as
> you are careful (I've done it before on multiple systems).
>
>
I suppose I could do that even without the flash as I have some free
space anyway, but moving Tbs of data with Gbs of free space will take
days, plus the repartitioning. It'd probably be easier to start with a
1Tb drive or something.
Is this currently my best bet as conversion is not as good as I thought?

I believe my other 2 partitions also come from conversion, though I
may have rebuilt them later from scratch.

Thank you!
John
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] Btrfs: fix null pointer dereference when extent buffer is already freed

2015-08-04 Thread Filipe David Manana
On Tue, Aug 4, 2015 at 3:43 PM, Anand Jain  wrote:
> When, read_tree_block() returns error it has already freed the extent_buffer
>
> read_tree_block(..)
> {
> ::
> ret = btree_read_extent_buffer_pages(root, buf, 0, parent_transid); 
> <== fails
> if (ret) {
> free_extent_buffer(buf); <=== its freed already
> return ERR_PTR(ret);
> }
> ::
> }
>
> open_ctree()
> {
> ::
> chunk_root->node = read_tree_block(..);
> ::
>
> BTRFS: failed to read chunk root on sdf
> BUG: unable to handle kernel NULL pointer dereference at 001f
> IP: [] free_extent_buffer+0x1e/0xb0 [btrfs]
> ::
> [] ? free_root_extent_buffers+0x1e/0x40 [btrfs]
> [] free_root_pointers+0x56/0x60 [btrfs]
> [] open_ctree+0x19e0/0x2360 [btrfs]
> [] btrfs_mount+0x9e6/0xb10 [btrfs]
> [] ? find_next_zero_bit+0x1a/0x30
> [] ? find_next_bit+0x15/0x30
> [] ? pcpu_alloc+0x35a/0x680
> [] mount_fs+0x38/0x190
> [] ? __alloc_percpu+0x15/0x20
> [] vfs_kern_mount+0x6b/0x120
> [] btrfs_mount+0x1f8/0xb10 [btrfs]
> [] ? pcpu_alloc+0x35a/0x680
> [] mount_fs+0x38/0x190
> [] ? __alloc_percpu+0x15/0x20
> [] vfs_kern_mount+0x6b/0x120
> [] do_mount+0x224/0xb80
> [] SyS_mount+0x7e/0xe0
> [] system_call_fastpath+0x12/0x71
>
> this patch avoids calling it again
>
> Signed-off-by: Anand Jain 
> ---
>  fs/btrfs/disk-io.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index da7b7bf..e8901ac 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2081,7 +2081,8 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info 
> *fs_info)
>  static void free_root_extent_buffers(struct btrfs_root *root)
>  {
> if (root) {
> -   free_extent_buffer(root->node);
> +   if (!IS_ERR(root->node))
> +   free_extent_buffer(root->node);

Wasn't this already solved by commit [1] added in the last 4.2-rc?
Either way, it's better to avoid having root->node with an error in
the first place, should be NULL or a valid (non-error) address (like
what's done everywhere else).

[1] 
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=95ab1f64908795a2edd6b847eca94a0c63a44be4

> free_extent_buffer(root->commit_root);
> root->node = NULL;
> root->commit_root = NULL;
> --
> 2.4.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/3] introduce function to handle device offline

2015-08-04 Thread Anand Jain
Here the patch 3/3 below adds a function to handle device offline,
also in this patch the device offline is triggred by the external
sysfs interface.
Patch 1/3 makes it possible to mount a or continue to be rw mounted
(if -o degraded option is set or if tolerated failure is more than
missing) when one of the device goes offline in midest of the mounted
FS.
Patch 2/3 fixes a bug. 

Anand Jain (3):
  Btrfs: allow -o rw,degraded for single group profile
  Btrfs: fix null pointer dereference when extent buffer is already
freed
  Btrfs: introduce function to handle device offline

 fs/btrfs/disk-io.c |   6 ++-
 fs/btrfs/super.c   |   3 +-
 fs/btrfs/sysfs.c   |  20 +-
 fs/btrfs/volumes.c | 113 +
 fs/btrfs/volumes.h |   7 
 5 files changed, 145 insertions(+), 4 deletions(-)

-- 
2.4.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] Btrfs: allow -o rw,degraded for single group profile

2015-08-04 Thread Anand Jain
As of now mount with missing device with degraded option is
allowed as long as number of missing devices is below the
number of device failure/missing that a group profile could
tolerate.

However there is feature/bug in the btrfs that when write
happens with least number of devices with which degraded mount
is possible, chunk allocation would default to single.

Which means any further mount with missing device will fail
even with degraded option as the number of device failure that
single group profile could tolerate is zero. Which also means
the user has to find the missing device to mount the FS RW able
to recover from this situation, which is not practical. A self
inflected problem.

test case eg:
 mkfs.btrfs -draid1 -mraid /dev/sdc /dev/sdd
 modprobe -r btrfs && modprobe btrfs  <== dev scan is cleared
 mount -o degraded /dev/sdc /btrfs <== sdd is not used
 dd if=/dev/zero of=/btrfs/tf1 count=1 <== creates single profile
 btrfs fi sync /btrfs
 umount /btrfs
 mount -o degraded /dev/sdc /btrfs  <== fails.
since single profile would need all the disks

This patch will let the RW mount to succeed with degraded option,
when the group profile tolerance is zero or below the number of
device missing. Now users have a choice to rebalance the group
profile back to their desired when missing device reappears during
the subsequent mounts.

Signed-off-by: Anand Jain 
---
 fs/btrfs/disk-io.c | 3 ++-
 fs/btrfs/super.c   | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 2593952..da7b7bf 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2897,7 +2897,8 @@ retry_root_backup:
btrfs_calc_num_tolerated_disk_barrier_failures(fs_info);
if (fs_info->fs_devices->missing_devices >
 fs_info->num_tolerated_disk_barrier_failures &&
-   !(sb->s_flags & MS_RDONLY)) {
+   !(sb->s_flags & MS_RDONLY ||
+   btrfs_test_opt(fs_info->dev_root, DEGRADED))) {
printk(KERN_WARNING "BTRFS: "
"too many missing devices, writable mount is not 
allowed\n");
goto fail_sysfs;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 96a7857..16687c2 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1656,7 +1656,8 @@ static int btrfs_remount(struct super_block *sb, int 
*flags, char *data)
 
if (fs_info->fs_devices->missing_devices >
 fs_info->num_tolerated_disk_barrier_failures &&
-   !(*flags & MS_RDONLY)) {
+   !(*flags & MS_RDONLY ||
+   btrfs_test_opt(root, DEGRADED))) {
btrfs_warn(fs_info,
"too many missing devices, writable remount is 
not allowed");
ret = -EACCES;
-- 
2.4.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] Btrfs: introduce function to handle device offline

2015-08-04 Thread Anand Jain
RFC:

Btrfs should offline the device in the following context as below..
1) to report that device has failed per the IO errors
2) identify the target for the (hot) replacement
3) avoid further commit error reported against the failing device and
4) fix the bug which would unmount btrfs (in some systemd config)
   when one of the device goes missing in a multi device btrfs.
   which sounds good for single device btrfs but not when fault
   tolerance is more than 0.

So as of now, this patch will provide a handle to offline a device,
when requested it would bring the device offline if the conditions
are right (that is mount degraded option is set or failure tolerance
is more than total missing devices) or panic/bug (per user config).

The consumer of this handle device offline as of now is sysfs
interface '/sys/fs/btrfs//device//missing'
The write feature for this interface is also added in this patch.
This interface plus systemd changes I hope fix to 4th reason
(as above) for this patch.

This patch is for the review as of now.

Thanks

Signed-off-by: Anand Jain 
---
 fs/btrfs/sysfs.c   |  20 +-
 fs/btrfs/volumes.c | 113 +
 fs/btrfs/volumes.h |   7 
 3 files changed, 139 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index d7f6aab..31d3d65 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -656,7 +656,8 @@ int btrfs_sysfs_rm_device_link(struct btrfs_fs_devices 
*fs_devices,
if (!fs_devices->device_dir_kobj)
return -EINVAL;
 
-   if (one_device && one_device->bdev) {
+   if (one_device && (one_device->bdev ||
+   one_device->dev_state == BTRFS_DEV_STATE_OFFLINED)) {
disk = one_device->bdev->bd_part;
disk_kobj = &part_to_dev(disk)->kobj;
 
@@ -1268,6 +1269,23 @@ static ssize_t btrfs_dev_attr_store(struct kobject *kobj,
 * we might need some of the parameter to be writable
 * but as of now just deny all
 */
+   int ret;
+   unsigned long val;
+   struct btrfs_device *dev = to_btrfs_dev(kobj);
+
+   ret = kstrtoul(skip_spaces(buf), 0, &val);
+   if (ret)
+   return ret;
+
+   if (BTRFS_DEV_CHECK_ATTR(&a->attr, missing)) {
+   if (val != 0 && val != 1)
+   return -EINVAL;
+   if (val == dev->missing)
+   return -EINVAL;
+   btrfs_put_dev_offline(dev);
+   return count;
+   }
+
return -EPERM;
 }
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 5fa73f4..56d92f0 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -769,6 +769,28 @@ static void free_device(struct rcu_head *head)
schedule_work(&device->rcu_work);
 }
 
+static void __close_device(struct work_struct *work)
+{
+   struct btrfs_device *device;
+
+   device = container_of(work, struct btrfs_device, rcu_work);
+
+   if (device->bdev)
+   blkdev_put(device->bdev, device->mode);
+
+   device->bdev = NULL;
+}
+
+static void close_device(struct rcu_head *head)
+{
+   struct btrfs_device *device;
+
+   device = container_of(head, struct btrfs_device, rcu);
+
+   INIT_WORK(&device->rcu_work, __close_device);
+   schedule_work(&device->rcu_work);
+}
+
 static int __btrfs_close_devices(struct btrfs_fs_devices *fs_devices)
 {
struct btrfs_device *device, *tmp;
@@ -6887,3 +6909,94 @@ void btrfs_close_one_device(struct btrfs_device *device)
call_rcu(&device->rcu, free_device);
 }
 
+void btrfs_close_one_device_dont_free(struct btrfs_device *device)
+{
+   struct btrfs_fs_devices *fs_devices = device->fs_devices;
+
+   if (device->bdev)
+   fs_devices->open_devices--;
+
+   if (device->writeable &&
+   device->devid != BTRFS_DEV_REPLACE_DEVID) {
+   list_del_init(&device->dev_alloc_list);
+   fs_devices->rw_devices--;
+   }
+
+   device->writeable = 0;
+
+   call_rcu(&device->rcu, close_device);
+}
+
+void __btrfs_put_dev_offline(struct btrfs_device *device)
+{
+   struct btrfs_device *next_device;
+   struct btrfs_fs_devices *fs_devices;
+
+   fs_devices = device->fs_devices;
+
+   mutex_lock(&fs_devices->device_list_mutex);
+   lock_chunks(fs_devices->fs_info->fs_root);
+
+   next_device = list_entry(fs_devices->devices.next,
+   struct btrfs_device, dev_list);
+   if (device->bdev == fs_devices->fs_info->sb->s_bdev)
+   fs_devices->fs_info->sb->s_bdev = next_device->bdev;
+
+   if (device->bdev == fs_devices->latest_bdev)
+   fs_devices->latest_bdev = next_device->bdev;
+
+   btrfs_close_one_device_dont_free(device);
+   device->missing = 1;
+   fs_devices->missing_devices++;
+
+   device->dev_state = BTRFS_DEV_STATE_OFFLINED;
+
+   rcu_barrier();
+
+   unlock_chu

[PATCH 2/3] Btrfs: fix null pointer dereference when extent buffer is already freed

2015-08-04 Thread Anand Jain
When, read_tree_block() returns error it has already freed the extent_buffer

read_tree_block(..)
{
::
ret = btree_read_extent_buffer_pages(root, buf, 0, parent_transid); <== 
fails
if (ret) {
free_extent_buffer(buf); <=== its freed already
return ERR_PTR(ret);
}
::
}

open_ctree()
{
::
chunk_root->node = read_tree_block(..);
::

BTRFS: failed to read chunk root on sdf
BUG: unable to handle kernel NULL pointer dereference at 001f
IP: [] free_extent_buffer+0x1e/0xb0 [btrfs]
::
[] ? free_root_extent_buffers+0x1e/0x40 [btrfs]
[] free_root_pointers+0x56/0x60 [btrfs]
[] open_ctree+0x19e0/0x2360 [btrfs]
[] btrfs_mount+0x9e6/0xb10 [btrfs]
[] ? find_next_zero_bit+0x1a/0x30
[] ? find_next_bit+0x15/0x30
[] ? pcpu_alloc+0x35a/0x680
[] mount_fs+0x38/0x190
[] ? __alloc_percpu+0x15/0x20
[] vfs_kern_mount+0x6b/0x120
[] btrfs_mount+0x1f8/0xb10 [btrfs]
[] ? pcpu_alloc+0x35a/0x680
[] mount_fs+0x38/0x190
[] ? __alloc_percpu+0x15/0x20
[] vfs_kern_mount+0x6b/0x120
[] do_mount+0x224/0xb80
[] SyS_mount+0x7e/0xe0
[] system_call_fastpath+0x12/0x71

this patch avoids calling it again

Signed-off-by: Anand Jain 
---
 fs/btrfs/disk-io.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index da7b7bf..e8901ac 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2081,7 +2081,8 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info 
*fs_info)
 static void free_root_extent_buffers(struct btrfs_root *root)
 {
if (root) {
-   free_extent_buffer(root->node);
+   if (!IS_ERR(root->node))
+   free_extent_buffer(root->node);
free_extent_buffer(root->commit_root);
root->node = NULL;
root->commit_root = NULL;
-- 
2.4.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: mount btrfs takes 30 minutes, btrfs check runs out of memory

2015-08-04 Thread Chris Murphy
On Mon, Aug 3, 2015 at 9:01 PM, Qu Wenruo  wrote:

> Oh, converted...
> That's too bad. :(
>
> [[What's wrong with convert]]
> Although btrfs is flex enough in theory to fit itself into the free space of
> ext* and works fine,
> But in practice, ext* is too fragmental in the standard of btrfs, not to
> mention it also enables mixed-blockgroup.

There is an -f flag for mkfs to help users avoid accidents. Is there a
case to be made for btrfs-convert having either a -f flag, or an
interactive

"Convert has limitations that could increase risk to data, please see
the wiki. Continue? y/n"
OR
"Convert has limitations, is not recommended for production usage,
please see the wiki. Continue? y/n"

It just seems users are jumping into convert without reading the wiki
warning. Is it a good idea to reduce problems for less experienced
users by actively discouraging btrfs-convert for production use?



-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fstests: btrfs: Add regression test for reserved space leak.

2015-08-04 Thread Filipe David Manana
On Tue, Aug 4, 2015 at 7:27 AM, Qu Wenruo  wrote:
> The regression is introduced in v4.2-rc1, with the big btrfs qgroup
> change.
> The problem is, qgroup reserved space is never freed, causing even we
> increase the limit, we can still hit the EDQUOT much faster than it
> should.
>
> Reported-by: Tsutomu Itoh 
> Signed-off-by: Qu Wenruo 

Thanks for doing this Qu.
The test fails without the btrfs fix and passes with it, as expected.
However, one question below:

> ---
>  tests/btrfs/089 | 83 
> +
>  tests/btrfs/089.out |  5 
>  tests/btrfs/group   |  1 +
>  3 files changed, 89 insertions(+)
>  create mode 100755 tests/btrfs/089
>  create mode 100644 tests/btrfs/089.out
>
> diff --git a/tests/btrfs/089 b/tests/btrfs/089
> new file mode 100755
> index 000..0c018f2
> --- /dev/null
> +++ b/tests/btrfs/089
> @@ -0,0 +1,83 @@
> +#! /bin/bash
> +# FS QA Test 089
> +#
> +# Regression test for btrfs qgroup reserved space leak.
> +#
> +# Due to qgroup reserved space leak, EDQUOT can be trigged even it's not
> +# over limit after previous write.
> +#
> +#---
> +# Copyright (c) 2015 Fujitsu. All Rights Reserved.
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#---
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1   # failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> +   cd /
> +   rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs btrfs
> +_supported_os Linux
> +_require_scratch
> +_need_to_be_root
> +
> +# Use big blocksize to ensure there is still enough space left
> +# for metadata reserve after hitting EDQUOT
> +BLOCKSIZE=$(( 2 * 1024 * 1024 ))
> +FILESIZE=$(( 128 * 1024 * 1024 )) # 128Mbytes
> +
> +# The last block won't be able to finish write, as metadata takes
> +# $NODESIZE space, causing the last block triggering EDQUOT
> +LENGTH=$(( $FILESIZE - $BLOCKSIZE ))
> +
> +_scratch_mkfs >>$seqres.full 2>&1
> +_scratch_mount
> +_require_fs_space $SCRATCH_MNT $(($FILESIZE * 2 / 1024))
> +
> +_run_btrfs_util_prog quota enable $SCRATCH_MNT
> +_run_btrfs_util_prog qgroup limit $FILESIZE 5 $SCRATCH_MNT
> +
> +$XFS_IO_PROG -f -c "pwrite -b $BLOCKSIZE 0 $LENGTH" \
> +   $SCRATCH_MNT/foo | _filter_xfs_io
> +sync

Why is the sync needed here? Can you add a comment explaining why? It
isn't trivial/obvious (for me at least), specially because without the
call to "sync" the test passes without the btrfs fix.

thanks

> +
> +# Double the limit to allow further write
> +_run_btrfs_util_prog qgroup limit $(($FILESIZE * 2)) 5 $SCRATCH_MNT
> +
> +# Test whether further write can succeed
> +$XFS_IO_PROG -f -c "pwrite -b $BLOCKSIZE $LENGTH $LENGTH" \
> +   $SCRATCH_MNT/foo | _filter_xfs_io
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/btrfs/089.out b/tests/btrfs/089.out
> new file mode 100644
> index 000..396888f
> --- /dev/null
> +++ b/tests/btrfs/089.out
> @@ -0,0 +1,5 @@
> +QA output created by 089
> +wrote 132120576/132120576 bytes at offset 0
> +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +wrote 132120576/132120576 bytes at offset 132120576
> +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> diff --git a/tests/btrfs/group b/tests/btrfs/group
> index ffe18bf..225b532 100644
> --- a/tests/btrfs/group
> +++ b/tests/btrfs/group
> @@ -91,6 +91,7 @@
>  086 auto quick clone
>  087 auto quick send
>  088 auto quick metadata
> +089 auto quick qgroup
>  090 auto quick metadata
>  091 auto quick qgroup
>  092 auto quick send
> --
> 1.8.3.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
th

Re: mount btrfs takes 30 minutes, btrfs check runs out of memory

2015-08-04 Thread Austin S Hemmelgarn

On 2015-08-04 00:58, John Ettedgui wrote:

On Mon, Aug 3, 2015 at 8:01 PM, Qu Wenruo  wrote:

Although the best practice is staying away from such converted fs, either
using pure, newly created btrfs, or convert back to ext* before any balance.


Unfortunately I don't have enough hard drive space to do a clean
btrfs, so my only way to use btrfs for that partition was a
conversion.
If you could get your hands on a decent sized flash drive (32G or more), 
you could do an incremental conversion offline.  The steps would look 
something like this:


1. Boot the system into a LiveCD or something similar that doesn't need 
to run from your regular root partition (SystemRescueCD would be my 
personal recommendation, although if you go that way, make sure to boot 
the alternative kernel, as it's a lot newer then the standard ones).

2. Plug in the flash drive, format it as BTRFS.
3. Mount both your old partition and the flash drive somewhere.
4. Start copying files from the old partition to the flash drive.
5. When you hit ENOSPC on the flash drive, unmount the old partition, 
shrink it down to the minimum size possible, and create a new partition 
in the free space produced by doing so.

6. Add the new partition to the BTRFS filesystem on the flash drive.
7. Repeat steps 4-6 until you have copied everything.
8. Wipe the old partition, and add it to the BTRFS filesystem.
9. Run a full balance on the new BTRFS filesystem.
10. Delete the partition from step 5 that is closest to the old 
partition (via btrfs device delete), then resize the old partition to 
fill the space that the deleted partition took up.
11. Repeat steps 9-10 until the only remaining partitions in the new 
BTRFS filesystem are the old one and the flash drive.

12. Delete the flash drive from the BTRFS filesystem.

This takes some time and coordination, but it does work reliably as long 
as you are careful (I've done it before on multiple systems).





smime.p7s
Description: S/MIME Cryptographic Signature