Re: [PATCH 1/2] Btrfs: be more precise on errors when getting an inode from disk

2016-07-21 Thread NeilBrown
On Fri, Jul 22 2016, J. Bruce Fields wrote:

> On Fri, Jul 22, 2016 at 11:08:17AM +1000, NeilBrown wrote:
>> On Fri, Jun 10 2016, fdman...@kernel.org wrote:
>> 
>> > From: Filipe Manana 
>> >
>> > When we attempt to read an inode from disk, we end up always returning an
>> > -ESTALE error to the caller regardless of the actual failure reason, which
>> > can be an out of memory problem (when allocating a path), some error found
>> > when reading from the fs/subvolume btree (like a genuine IO error) or the
>> > inode does not exist. So let's start returning the real error code to the
>> > callers so that they don't treat all -ESTALE errors as meaning that the
>> > inode does not exist (such as during orphan cleanup). This will also be
>> > needed for a subsequent patch in the same series dealing with a special
>> > fsync case.
>> >
>> > Signed-off-by: Filipe Manana 
>> 
>> SNIP
>> 
>> > @@ -5594,7 +5602,8 @@ struct inode *btrfs_iget(struct super_block *s, struct btrfs_key *location,
>> >} else {
>> >unlock_new_inode(inode);
>> >iput(inode);
>> > -  inode = ERR_PTR(-ESTALE);
>> > +  ASSERT(ret < 0);
>> > +  inode = ERR_PTR(ret < 0 ? ret : -ESTALE);
>> >}
>> 
>> Just a heads-up.  This change breaks NFS :-(
>> 
>> The change in error code percolates up the call chain:
>> 
>>  nfs4_putfh->fh_verify->nfsd_set_fh_dentry->exportfs_decode_fh
>> ->btrfs_fh_to_dentry->btrfs_get_dentry->btrfs_iget
>> 
>> and nfsd returns NFS4ERR_NOENT to the client instead of NFS4ERR_STALE,
>> and the client doesn't handle that quite the same way.
>> 
>> This doesn't mean that the change is wrong, but it could mean we need to
>> fix something else in the path to sanitize the error code.
>> 
>> nfsd_set_fh_dentry already has
>> 
>>  error = nfserr_stale;
>>  if (PTR_ERR(exp) == -ENOENT)
>>  return error;
>> 
>>  if (IS_ERR(exp))
>>  return nfserrno(PTR_ERR(exp));
>> 
>> for a different error case, so duplicating that would work, but I doubt
>> it is best.  At the very least we should check for valid errors, not
>> specific invalid ones.
>> 
>> Bruce: do you have an opinion where we should make sure that PUTFH (and
>> various other requests) returns a valid error code?
>
> Uh, I guess not.  Maybe exportfs_decode_fh?
>
> Though my kneejerk reaction is to be cranky and wonder why btrfs
> suddenly needs a different convention for decode_fh().
>

I can certainly agree with that perspective, though it would be
appropriate in that case to make sure we document the requirements for
fh_to_dentry (the current spelling for 'decode_fh').  So I went looking
for documentation and found:

 * fh_to_dentry:
 *    @fh_to_dentry is given a &struct super_block (@sb) and a file handle
 *    fragment (@fh, @fh_len). It should return a &struct dentry which refers
 *    to the same file that the file handle fragment refers to.  If it cannot,
 *    it should return a %NULL pointer if the file was found but no acceptable
 *    dentries were available, or an %ERR_PTR error code indicating why it
 *    couldn't be found (e.g. %ENOENT or %ENOMEM).  Any suitable dentry can be
 *returned including, if necessary, a new dentry created with d_alloc_root.
 *The caller can then find any other extant dentries by following the
 *d_alias links.
 *

So the new btrfs code is actually conformant!!
That documentation dates back to 2002, when I wrote it, and it looks like
ENOENT wasn't handled correctly even then :-(

I suspect anything that isn't ENOMEM should be converted to ESTALE.
ENOMEM causes the client to be asked to retry the request later.

Does this look reasonable to you?
(Adding Christoph, as he has contributed a lot to exportfs.)

If there is agreement I'll test and post a proper patch.

Thanks,
NeilBrown


diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
index 207ba8d627ca..3527b58cd5bc 100644
--- a/fs/exportfs/expfs.c
+++ b/fs/exportfs/expfs.c
@@ -428,10 +428,10 @@ struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid,
if (!nop || !nop->fh_to_dentry)
return ERR_PTR(-ESTALE);
result = nop->fh_to_dentry(mnt->mnt_sb, fid, fh_len, fileid_type);
-   if (!result)
-   result = ERR_PTR(-ESTALE);
-   if (IS_ERR(result))
-   return result;
+   if (PTR_ERR(result) == -ENOMEM)
+   return ERR_CAST(result);
+   if (IS_ERR_OR_NULL(result))
+   return ERR_PTR(-ESTALE);
 
if (d_is_dir(result)) {
/*
@@ -541,6 +541,8 @@ struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid,
 
  err_result:
dput(result);
+   if (err != -ENOMEM)
+   err = -ESTALE;
return ERR_PTR(err);
 }
 EXPORT_SYMBOL_GPL(exportfs_decode_fh);




Re: [PATCH 1/2] Btrfs: be more precise on errors when getting an inode from disk

2016-07-21 Thread J. Bruce Fields
On Fri, Jul 22, 2016 at 11:08:17AM +1000, NeilBrown wrote:
> On Fri, Jun 10 2016, fdman...@kernel.org wrote:
> 
> > From: Filipe Manana 
> >
> > When we attempt to read an inode from disk, we end up always returning an
> > -ESTALE error to the caller regardless of the actual failure reason, which
> > can be an out of memory problem (when allocating a path), some error found
> > when reading from the fs/subvolume btree (like a genuine IO error) or the
> > inode does not exist. So let's start returning the real error code to the
> > callers so that they don't treat all -ESTALE errors as meaning that the
> > inode does not exist (such as during orphan cleanup). This will also be
> > needed for a subsequent patch in the same series dealing with a special
> > fsync case.
> >
> > Signed-off-by: Filipe Manana 
> 
> SNIP
> 
> > @@ -5594,7 +5602,8 @@ struct inode *btrfs_iget(struct super_block *s, struct btrfs_key *location,
> > } else {
> > unlock_new_inode(inode);
> > iput(inode);
> > -   inode = ERR_PTR(-ESTALE);
> > +   ASSERT(ret < 0);
> > +   inode = ERR_PTR(ret < 0 ? ret : -ESTALE);
> > }
> 
> Just a heads-up.  This change breaks NFS :-(
> 
> The change in error code percolates up the call chain:
> 
>  nfs4_putfh->fh_verify->nfsd_set_fh_dentry->exportfs_decode_fh
> ->btrfs_fh_to_dentry->btrfs_get_dentry->btrfs_iget
> 
> and nfsd returns NFS4ERR_NOENT to the client instead of NFS4ERR_STALE,
> and the client doesn't handle that quite the same way.
> 
> This doesn't mean that the change is wrong, but it could mean we need to
> fix something else in the path to sanitize the error code.
> 
> nfsd_set_fh_dentry already has
> 
>   error = nfserr_stale;
>   if (PTR_ERR(exp) == -ENOENT)
>   return error;
> 
>   if (IS_ERR(exp))
>   return nfserrno(PTR_ERR(exp));
> 
> for a different error case, so duplicating that would work, but I doubt
> it is best.  At the very least we should check for valid errors, not
> specific invalid ones.
> 
> Bruce: do you have an opinion where we should make sure that PUTFH (and
> various other requests) returns a valid error code?

Uh, I guess not.  Maybe exportfs_decode_fh?

Though my kneejerk reaction is to be cranky and wonder why btrfs
suddenly needs a different convention for decode_fh().

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] Btrfs: be more precise on errors when getting an inode from disk

2016-07-21 Thread NeilBrown
On Fri, Jun 10 2016, fdman...@kernel.org wrote:

> From: Filipe Manana 
>
> When we attempt to read an inode from disk, we end up always returning an
> -ESTALE error to the caller regardless of the actual failure reason, which
> can be an out of memory problem (when allocating a path), some error found
> when reading from the fs/subvolume btree (like a genuine IO error) or the
> inode does not exist. So let's start returning the real error code to the
> callers so that they don't treat all -ESTALE errors as meaning that the
> inode does not exist (such as during orphan cleanup). This will also be
> needed for a subsequent patch in the same series dealing with a special
> fsync case.
>
> Signed-off-by: Filipe Manana 

SNIP

> @@ -5594,7 +5602,8 @@ struct inode *btrfs_iget(struct super_block *s, struct btrfs_key *location,
>   } else {
>   unlock_new_inode(inode);
>   iput(inode);
> - inode = ERR_PTR(-ESTALE);
> + ASSERT(ret < 0);
> + inode = ERR_PTR(ret < 0 ? ret : -ESTALE);
>   }

Just a heads-up.  This change breaks NFS :-(

The change in error code percolates up the call chain:

 nfs4_putfh->fh_verify->nfsd_set_fh_dentry->exportfs_decode_fh
->btrfs_fh_to_dentry->btrfs_get_dentry->btrfs_iget

and nfsd returns NFS4ERR_NOENT to the client instead of NFS4ERR_STALE,
and the client doesn't handle that quite the same way.

This doesn't mean that the change is wrong, but it could mean we need to
fix something else in the path to sanitize the error code.

nfsd_set_fh_dentry already has

error = nfserr_stale;
if (PTR_ERR(exp) == -ENOENT)
return error;

if (IS_ERR(exp))
return nfserrno(PTR_ERR(exp));

for a different error case, so duplicating that would work, but I doubt
it is best.  At the very least we should check for valid errors, not
specific invalid ones.

Bruce: do you have an opinion where we should make sure that PUTFH (and
various other requests) returns a valid error code?

Thanks,
NeilBrown





Re: A lot warnings in dmesg while running thunderbird

2016-07-21 Thread Adam Borowski
On Thu, Jul 21, 2016 at 11:25:02PM +0200, Gabriel C wrote:
> On 21.07.2016 14:56, Chris Mason wrote:
> > On 07/20/2016 01:50 PM, Gabriel C wrote:
> >> After 24h of running the program and thunderbird all is still fine here.
> >>
> >> I let it run one more day.. But looks very good.
> >>
> > 
> > Thanks for your time in helping to track this down.  It'll go into the 
> > next merge window and be cc'd to stable.
> > 
> 
> You are welcome :)
> 
> Test program was running without problems for 52h.. I think your fix is fine :)

AOL.
I haven't tested it for as long, but I had it running concurrently with some
big compiles (kernel; sbuild, i.e. snapshot + dpkg + subvolume delete), cleanup
of several tens of daily snapshots, etc.

No problems.

-- 
An imaginary friend squared is a real enemy.


[PATCH 17/17] xfs: test realtime rmapbt code

2016-07-21 Thread Darrick J. Wong
Test the realtime rmap btree code by exercising various IO patterns
on realtime files.

Signed-off-by: Darrick J. Wong 
---
 tests/xfs/878 |   88 +++
 tests/xfs/878.out |9 
 tests/xfs/879 |   67 +
 tests/xfs/879.out |7 +++
 tests/xfs/880 |   86 ++
 tests/xfs/880.out |7 +++
 tests/xfs/881 |   95 +++
 tests/xfs/881.out |8 
 tests/xfs/882 |  108 +
 tests/xfs/882.out |   11 +
 tests/xfs/883 |   89 
 tests/xfs/883.out |   10 +
 tests/xfs/884 |   91 +
 tests/xfs/884.out |9 
 tests/xfs/885 |   94 ++
 tests/xfs/885.out |   10 +
 tests/xfs/886 |  104 +++
 tests/xfs/886.out |7 +++
 tests/xfs/887 |  107 +
 tests/xfs/887.out |7 +++
 tests/xfs/888 |   76 +
 tests/xfs/888.out |6 +++
 tests/xfs/group   |   11 +
 23 files changed, 1107 insertions(+)
 create mode 100644 tests/xfs/878
 create mode 100644 tests/xfs/878.out
 create mode 100755 tests/xfs/879
 create mode 100644 tests/xfs/879.out
 create mode 100755 tests/xfs/880
 create mode 100644 tests/xfs/880.out
 create mode 100755 tests/xfs/881
 create mode 100644 tests/xfs/881.out
 create mode 100755 tests/xfs/882
 create mode 100644 tests/xfs/882.out
 create mode 100755 tests/xfs/883
 create mode 100644 tests/xfs/883.out
 create mode 100755 tests/xfs/884
 create mode 100644 tests/xfs/884.out
 create mode 100755 tests/xfs/885
 create mode 100644 tests/xfs/885.out
 create mode 100755 tests/xfs/886
 create mode 100644 tests/xfs/886.out
 create mode 100755 tests/xfs/887
 create mode 100644 tests/xfs/887.out
 create mode 100755 tests/xfs/888
 create mode 100644 tests/xfs/888.out


diff --git a/tests/xfs/878 b/tests/xfs/878
new file mode 100644
index 000..b1f2672
--- /dev/null
+++ b/tests/xfs/878
@@ -0,0 +1,88 @@
+#! /bin/bash
+# FS QA Test No. 878
+#
+# Set rrmapino to another inode on a non-rt rmap fs and see if repair fixes it.
+#
+#---
+# Copyright (c) 2016, Oracle and/or its affiliates.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+
+seq=`basename "$0"`
+seqres="$RESULT_DIR/$seq"
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -rf "$tmp".* $metadump_file
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/attr
+
+# real QA test starts here
+_supported_os Linux
+_require_xfs_scratch_rmapbt
+
+rm -f "$seqres.full"
+
+echo "Format and mount"
+SCRATCH_RTDEV= USE_EXTERNAL= _scratch_mkfs > "$seqres.full" 2>&1
+_scratch_mount
+
+echo "Create some files"
+$XFS_IO_PROG -f -c "pwrite -S 0x68 0 " $SCRATCH_MNT/f1 >> $seqres.full
+$XFS_IO_PROG -f -c "pwrite -S 0x68 0 " $SCRATCH_MNT/f2 >> $seqres.full
+echo garbage > $SCRATCH_MNT/f3
+ino=$(stat -c '%i' $SCRATCH_MNT/f3)
+_scratch_unmount
+
+echo "Corrupt fs"
+$XFS_DB_PROG -x -c 'sb 0' -c "write rrmapino $ino" $SCRATCH_DEV >> $seqres.full
+_scratch_mount
+
+echo "Check files"
+md5sum $SCRATCH_MNT/f1 2>&1 | _filter_scratch
+
+echo "Try to create more files"
+$XFS_IO_PROG -f -c "pwrite -S 0x68 0 " $SCRATCH_MNT/f3 >> $seqres.full 2>&1
+
+echo "Repair fs"
+_scratch_unmount 2>&1 | _filter_scratch
+$XFS_LOGPRINT_PROG -t $extra_log_options $device 2>&1 \
+   | tee $tmp.logprint | grep -q ""
+if [ $? -ne 0 ]; then
+   echo "Dirty log, zeroing..." >> $seqres.full
+   _scratch_xfs_repair -L >> $seqres.full 2>&1
+else
+   _scratch_xfs_repair >> $seqres.full 2>&1
+fi
+_scratch_xfs_repair >> $seqres.full 2>&1
+
+echo "Try to create more files (again)"
+_scratch_mount
+$XFS_IO_PROG -f -c "pwrite -S 0x68 0 " $SCRATCH_MNT/f4 >> $seqres.full
+
+# success, all done
+status=0
+exit
diff 

[PATCH 08/17] xfs/129: fix post-metadump remounting idiocy

2016-07-21 Thread Darrick J. Wong
Use the standard _scratch_mount to mount the filesystem from the restored
image, instead of trying to call mount directly.  This is needed in case
we had custom mount options (like rtdev).

Signed-off-by: Darrick J. Wong 
---
 tests/xfs/129 |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)


diff --git a/tests/xfs/129 b/tests/xfs/129
index d29f842..17ff238 100755
--- a/tests/xfs/129
+++ b/tests/xfs/129
@@ -34,7 +34,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 _cleanup()
 {
 cd /
-umount $SCRATCH_MNT > /dev/null 2>&1
+_scratch_unmount > /dev/null 2>&1
 rm -rf $tmp.* $testdir $metadump_file $TEST_DIR/image
 }
 
@@ -76,8 +76,8 @@ _scratch_metadump $metadump_file
 # Now restore the obfuscated one back and take a look around
 echo "Restore metadump"
 xfs_mdrestore $metadump_file $TEST_DIR/image
-_mount -t $FSTYP $TEST_DIR/image $SCRATCH_MNT
-umount $SCRATCH_MNT
+SCRATCH_DEV=$TEST_DIR/image _scratch_mount
+SCRATCH_DEV=$TEST_DIR/image _scratch_unmount
 
 echo "Check restored fs"
 _check_generic_filesystem $metadump_file



[PATCH 16/17] xfs/122: add the realtime rmapbt inode and btree fields

2016-07-21 Thread Darrick J. Wong
Add the on-disk structures added by the realtime rmapbt.

Signed-off-by: Darrick J. Wong 
---
 tests/xfs/122.out |6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)


diff --git a/tests/xfs/122.out b/tests/xfs/122.out
index c4ed725..b20011e 100644
--- a/tests/xfs/122.out
+++ b/tests/xfs/122.out
@@ -44,6 +44,7 @@ offsetof(xfs_sb_t, sb_rextents) = 24
 offsetof(xfs_sb_t, sb_rextsize) = 80
 offsetof(xfs_sb_t, sb_rextslog) = 125
 offsetof(xfs_sb_t, sb_rootino) = 56
+offsetof(xfs_sb_t, sb_rrmapino) = 264
 offsetof(xfs_sb_t, sb_rsumino) = 72
 offsetof(xfs_sb_t, sb_sectlog) = 121
 offsetof(xfs_sb_t, sb_sectsize) = 102
@@ -86,6 +87,9 @@ sizeof(struct xfs_refcount_key) = 4
 sizeof(struct xfs_refcount_rec) = 12
 sizeof(struct xfs_rmap_key) = 20
 sizeof(struct xfs_rmap_rec) = 24
+sizeof(struct xfs_rtrmap_key) = 24
+sizeof(struct xfs_rtrmap_rec) = 32
+sizeof(struct xfs_rtrmap_root) = 4
 sizeof(struct xfs_rud_log_format) = 48
 sizeof(struct xfs_rui_log_format) = 48
 sizeof(xfs_agf_t) = 224
@@ -130,7 +134,7 @@ sizeof(xfs_dir2_sf_off_t) = 2
 sizeof(xfs_disk_dquot_t) = 104
 sizeof(xfs_dq_logformat_t) = 24
 sizeof(xfs_dqblk_t) = 136
-sizeof(xfs_dsb_t) = 264
+sizeof(xfs_dsb_t) = 272
 sizeof(xfs_efd_log_format_32_t) = 28
 sizeof(xfs_efd_log_format_64_t) = 32
 sizeof(xfs_efi_log_format_32_t) = 28



[PATCH 12/17] reflink: test cross-mountpoint reflink and dedupe

2016-07-21 Thread Darrick J. Wong
Test sharing blocks via reflink and dedupe between two different
mountpoints of the same filesystem.  This shouldn't work, since
we don't allow cross-mountpoint functions.

Signed-off-by: Darrick J. Wong 
---
 tests/generic/927 |   88 +
 tests/generic/927.out |9 +
 tests/generic/928 |   86 
 tests/generic/928.out |   10 ++
 tests/generic/group   |2 +
 5 files changed, 195 insertions(+)
 create mode 100755 tests/generic/927
 create mode 100644 tests/generic/927.out
 create mode 100755 tests/generic/928
 create mode 100644 tests/generic/928.out


diff --git a/tests/generic/927 b/tests/generic/927
new file mode 100755
index 000..c7d82b7
--- /dev/null
+++ b/tests/generic/927
@@ -0,0 +1,88 @@
+#! /bin/bash
+# FS QA Test No. 927
+#
+# Check that cross-mountpoint reflink doesn't work.
+#
+#---
+# Copyright (c) 2016 Oracle, Inc.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 7 15
+
+_cleanup()
+{
+   cd /
+   rm -rf $tmp.*
+   wait
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/reflink
+
+# real QA test starts here
+_supported_os Linux
+_require_scratch_reflink
+_require_cp_reflink
+
+echo "Format and mount"
+_scratch_mkfs > $seqres.full 2>&1
+_scratch_mount >> $seqres.full 2>&1
+
+testdir=$SCRATCH_MNT/test-$seq
+mkdir $testdir
+otherdir=/tmp/m.$seq
+othertestdir=$otherdir/test-$seq
+rm -rf $otherdir
+mkdir $otherdir
+
+blocks=1
+blksz=65536
+sz=$((blksz * blocks))
+
+echo "Mount otherdir"
+SCRATCH_MNT=$otherdir _scratch_mount
+
+echo "Create file"
+_pwrite_byte 0x61 0 $sz $testdir/file >> $seqres.full
+
+filter_md5() {
+   _filter_scratch | sed -e "s,$otherdir,OTHER_DIR,g"
+}
+
+echo "Reflink one file to another"
+_cp_reflink $testdir/file $othertestdir/otherfiles 2>&1 | filter_md5
+
+echo "Check output"
+md5sum $testdir/file | _filter_scratch
+test -e $othertestdir/otherfiles && echo "otherfiles should not exist"
+
+echo "Unmount otherdir"
+SCRATCH_MNT=$otherdir _scratch_unmount
+rm -rf $otherdir
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/927.out b/tests/generic/927.out
new file mode 100644
index 000..3ef3057
--- /dev/null
+++ b/tests/generic/927.out
@@ -0,0 +1,9 @@
+QA output created by 927
+Format and mount
+Mount otherdir
+Create file
+Reflink one file to another
+cp: failed to clone 'OTHER_DIR/test-927/otherfiles' from 'SCRATCH_MNT/test-927/file': Invalid cross-device link
+Check output
+2d61aa54b58c2e94403fb092c3dbc027  SCRATCH_MNT/test-927/file
+Unmount otherdir
diff --git a/tests/generic/928 b/tests/generic/928
new file mode 100755
index 000..86268e7
--- /dev/null
+++ b/tests/generic/928
@@ -0,0 +1,86 @@
+#! /bin/bash
+# FS QA Test No. 928
+#
+# Check that cross-mountpoint dedupe doesn't work.
+#
+#---
+# Copyright (c) 2016 Oracle, Inc.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 7 15
+
+_cleanup()
+{
+   cd /
+   rm -rf $tmp.*
+   wait
+}
+
+# get standard environment, filters and 

[PATCH 15/17] xfs: scrub fs (if still mounted) at the end of the test

2016-07-21 Thread Darrick J. Wong
Teach _check_xfs_filesystem to scrub mounted filesystems before
unmounting and fscking them.  This is mostly to test the online
scrub tool...

Signed-off-by: Darrick J. Wong 
---
 README|3 +++
 common/config |1 +
 common/rc |7 +++
 3 files changed, 11 insertions(+)


diff --git a/README b/README
index 4509cc1..c19fcb1 100644
--- a/README
+++ b/README
@@ -84,6 +84,9 @@ Preparing system for tests (IRIX and Linux):
run xfs_repair -n to check the filesystem; xfs_repair to rebuild
metadata indexes; and xfs_repair -n (a third time) to check the
results of the rebuilding.
+- set TEST_XFS_SCRUB=1 to have _check_xfs_filesystem run
+  xfs_scrub -vd to scrub the filesystem metadata online before
+  unmounting to run the offline check.
 
 - or add a case to the switch in common/config assigning
   these variables based on the hostname of your test
diff --git a/common/config b/common/config
index 08d5d80..168f46c 100644
--- a/common/config
+++ b/common/config
@@ -163,6 +163,7 @@ export XFS_REPAIR_PROG="`set_prog_path xfs_repair`"
 export XFS_DB_PROG="`set_prog_path xfs_db`"
 export XFS_GROWFS_PROG=`set_prog_path xfs_growfs`
 export XFS_IO_PROG="`set_prog_path xfs_io`"
+export XFS_SCRUB_PROG="`set_prog_path xfs_scrub`"
 export XFS_PARALLEL_REPAIR_PROG="`set_prog_path xfs_prepair`"
 export XFS_PARALLEL_REPAIR64_PROG="`set_prog_path xfs_prepair64`"
 export __XFSDUMP_PROG="`set_prog_path xfsdump`"
diff --git a/common/rc b/common/rc
index 3b45578..861a721 100644
--- a/common/rc
+++ b/common/rc
@@ -2391,6 +2391,13 @@ _check_xfs_filesystem()
 
 if [ "$type" = "xfs" ]
 then
+if [ -n "$TEST_XFS_SCRUB" ] && [ -x "$XFS_SCRUB_PROG" ]; then
+"$XFS_SCRUB_PROG" $scrubflag -vd $device >>$seqres.full
+if [ $? -ne 0 ]; then
+echo "filesystem on $device failed scrub (see $seqres.full)"
+ok=0
+fi
+fi
 # mounted ...
 mountpoint=`_umount_or_remount_ro $device`
 fi



[PATCH 13/17] xfs: test swapext with reflink

2016-07-21 Thread Darrick J. Wong
Add a few tests to stress the new swapext code for reflink and rmap.

Signed-off-by: Darrick J. Wong 
---
 tests/xfs/873 |  107 +
 tests/xfs/873.out |6 +++
 tests/xfs/874 |   99 +
 tests/xfs/874.out |   10 +
 tests/xfs/875 |  100 ++
 tests/xfs/875.out |   12 ++
 tests/xfs/group   |3 +
 7 files changed, 337 insertions(+)
 create mode 100755 tests/xfs/873
 create mode 100644 tests/xfs/873.out
 create mode 100755 tests/xfs/874
 create mode 100644 tests/xfs/874.out
 create mode 100755 tests/xfs/875
 create mode 100644 tests/xfs/875.out


diff --git a/tests/xfs/873 b/tests/xfs/873
new file mode 100755
index 000..a980148
--- /dev/null
+++ b/tests/xfs/873
@@ -0,0 +1,107 @@
+#! /bin/bash
+# FS QA Test No. 873
+#
+# See how well xfs_fsr handles "defragging" a file with a hojillion extents.
+#
+#---
+# Copyright (c) 2016, Oracle and/or its affiliates.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+
+seq=`basename "$0"`
+seqres="$RESULT_DIR/$seq"
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -rf "$tmp".*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/attr
+. ./common/reflink
+
+# real QA test starts here
+_supported_os Linux
+_require_scratch_reflink
+_require_cp_reflink
+_require_test_program "punch-alternating"
+test -x $XFS_FSR_PROG || _notrun "xfs_fsr not found"
+
+rm -f "$seqres.full"
+
+echo "Format and mount"
+_scratch_mkfs > "$seqres.full" 2>&1
+_scratch_mount >> "$seqres.full" 2>&1
+
+testdir="$SCRATCH_MNT/test-$seq"
+mkdir "$testdir"
+
+# Setup for 16000 blocks, but we'll accept stress testing down to
+# 2^10 blocks... that should be plenty for anyone.
+fnr=$((12 + LOAD_FACTOR))
+free_blocks=$(stat -f -c '%a' "$testdir")
+blksz=$(stat -f -c '%S' "$testdir")
+space_avail=$((free_blocks * blksz))
+calc_space() {
+   blocks_needed=$(( 2 ** (fnr + 1) ))
+   space_needed=$((blocks_needed * blksz * 5 / 4))
+}
+calc_space
+while test $space_needed -gt $space_avail; do
+   fnr=$((fnr - 1))
+   calc_space
+done
+test $fnr -lt 10 && _notrun "Insufficient space for stress test; would only create $blocks_needed extents."
+bytes=$((blocks_needed * blksz))
+
+echo "Create a many-block file"
+echo "creating $blocks_needed blocks..." >> "$seqres.full"
+_pwrite_byte 0x62 0 $blksz $testdir/file0 >> $seqres.full
+"$XFS_IO_PROG" -f -c "pwrite -S 0x61 -b 4194304 0 $bytes" "$testdir/file1" >> "$seqres.full"
+echo "punching..." >> "$seqres.full"
+"$here/src/punch-alternating" "$testdir/file1" >> "$seqres.full"
+seq 0 2 $((2 ** (fnr + 1) )) | while read lblk; do
+   _reflink_range $testdir/file0 0 $testdir/file1 $((lblk * blksz)) $blksz >> $seqres.full
+done
+echo "...done" >> "$seqres.full"
+_scratch_cycle_mount
+
+echo "Reflink the big file"
+echo "reflinking $((blocks_needed / 2)) blocks, $((bytes / 2)) bytes" >> "$seqres.full"
+_reflink_range "$testdir/file1" 0 "$testdir/file2" 0 $bytes >> "$seqres.full"
+
+echo "Defrag the big file"
+old_nextents=$(xfs_io -c 'stat -v' $testdir/file1 | grep 'nextents' | cut -d ' ' -f 3)
+$XFS_FSR_PROG -v -d $testdir/file1 >> $seqres.full
+new_nextents=$(xfs_io -c 'stat -v' $testdir/file1 | grep 'nextents' | cut -d ' ' -f 3)
+
+echo "Check extent count"
+$XFS_IO_PROG -c 'stat -v' $testdir/file1 >> $seqres.full
+$XFS_IO_PROG -c 'stat -v' $testdir/file2 >> $seqres.full
+echo "extents: $old_nextents -> $new_nextents" >> $seqres.full
+test $old_nextents -gt $new_nextents || echo "FAIL: $old_nextents -> $new_nextents"
+
+# success, all done
+status=0
+exit
diff --git a/tests/xfs/873.out b/tests/xfs/873.out
new file mode 100644
index 000..9dddf55
--- /dev/null
+++ b/tests/xfs/873.out
@@ -0,0 +1,6 @@
+QA output created by 873
+Format and mount
+Create a many-block file
+Reflink the big file
+Defrag the big file
+Check extent count
diff --git a/tests/xfs/874 b/tests/xfs/874
new file mode 100755
index 000..fb31ea8
--- /dev/null
+++ 

[PATCH 14/17] xfs: more rmapbt tests

2016-07-21 Thread Darrick J. Wong
More tests for the reverse mapping functionality.

Signed-off-by: Darrick J. Wong 
---
 tests/xfs/876 |   76 +++
 tests/xfs/876.out |4 ++
 tests/xfs/877 |   85 +
 tests/xfs/877.out |   10 ++
 tests/xfs/group   |4 ++
 5 files changed, 178 insertions(+), 1 deletion(-)
 create mode 100755 tests/xfs/876
 create mode 100644 tests/xfs/876.out
 create mode 100755 tests/xfs/877
 create mode 100644 tests/xfs/877.out


diff --git a/tests/xfs/876 b/tests/xfs/876
new file mode 100755
index 000..cf73d40
--- /dev/null
+++ b/tests/xfs/876
@@ -0,0 +1,76 @@
+#! /bin/bash
+# FS QA Test No. 876
+#
+# Create a big enough rmapbt that we tickle a fdblocks accounting bug.
+#
+#---
+# Copyright (c) 2016, Oracle and/or its affiliates.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+
+seq=`basename "$0"`
+seqres="$RESULT_DIR/$seq"
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -rf "$tmp".*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/attr
+. ./common/reflink
+
+# real QA test starts here
+_supported_os Linux
+_require_scratch
+_require_xfs_scratch_rmapbt
+_require_scratch_reflink
+_require_test_program "punch-alternating"
+
+rm -f "$seqres.full"
+
+echo "+ create scratch fs"
+_scratch_mkfs > "$seqres.full" 2>&1
+
+echo "+ mount fs image"
+_scratch_mount
+blksz="$(stat -f $SCRATCH_MNT -c '%S')"
+isize=$(xfs_info $SCRATCH_MNT | grep isize | sed -e 's/^.*isize=\([0-9]*\).*$/\1/g')
+
+bt_ptrs=$(( (blksz - 56) / 44 ))
+bt_recs=$(( (blksz - 56) / 24 ))
+
+blocks=$((bt_ptrs * bt_recs + 1))
+_require_fs_space $SCRATCH_MNT $(( (2 * blocks * blksz) * 5 / 4096 ))
+len=$((blocks * blksz))
+
+echo "+ make some files"
+$XFS_IO_PROG -f -c "falloc 0 $len" -c "pwrite -S 0x68 -b 1048576 0 $len" $SCRATCH_MNT/f1 >> $seqres.full
+$XFS_IO_PROG -f -c "falloc 0 $len" -c "pwrite -S 0x68 -b 1048576 0 $len" $SCRATCH_MNT/f2 >> $seqres.full
+./src/punch-alternating $SCRATCH_MNT/f1 >> "$seqres.full"
+./src/punch-alternating $SCRATCH_MNT/f2 >> "$seqres.full"
+
+# success, all done
+status=0
+exit
diff --git a/tests/xfs/876.out b/tests/xfs/876.out
new file mode 100644
index 000..96c1970
--- /dev/null
+++ b/tests/xfs/876.out
@@ -0,0 +1,4 @@
+QA output created by 876
++ create scratch fs
++ mount fs image
++ make some files
diff --git a/tests/xfs/877 b/tests/xfs/877
new file mode 100755
index 000..9700263
--- /dev/null
+++ b/tests/xfs/877
@@ -0,0 +1,85 @@
+#! /bin/bash
+# FS QA Test No. 877
+#
+# Make sure query_range returns -EINVAL if lowkey > highkey.
+#
+#---
+# Copyright (c) 2016, Oracle and/or its affiliates.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+
+seq=`basename "$0"`
+seqres="$RESULT_DIR/$seq"
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -rf "$tmp".* $metadump_file
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/attr
+
+# real QA test starts here
+_supported_os Linux
+_require_xfs_scratch_rmapbt
+
+rm -f "$seqres.full"
+
+echo "Format and mount"
+_scratch_mkfs > "$seqres.full" 2>&1
+_scratch_mount
+blksz=65536
+blocks=16

[PATCH 09/17] common/dmerror: fix mount option issues

2016-07-21 Thread Darrick J. Wong
Calling _mount doesn't work when we need extra mount options, such
as those for realtime devices.  Since this is just a normal scratch
device mount except for the source device, call _scratch_mount with
SCRATCH_DEV set to the dmerror device.

Signed-off-by: Darrick J. Wong 
---
 common/dmerror |7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)


diff --git a/common/dmerror b/common/dmerror
index 5ad9994..22b9ea9 100644
--- a/common/dmerror
+++ b/common/dmerror
@@ -41,14 +41,9 @@ _dmerror_init()
DMERROR_TABLE="0 $blk_dev_size error $dm_backing_dev 0"
 }
 
-_dmerror_mount_options()
-{
-   echo `_common_dev_mount_options $*` $DMERROR_DEV $SCRATCH_MNT
-}
-
 _dmerror_mount()
 {
-   _mount -t $FSTYP `_dmerror_mount_options $*`
+   SCRATCH_DEV=$DMERROR_DEV _scratch_mount $*
 }
 
 _dmerror_unmount()

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
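The `SCRATCH_DEV=$DMERROR_DEV _scratch_mount $*` idiom in the patch above relies on the shell's per-command environment assignment: a `VAR=value cmd` prefix overrides the variable for that single invocation only. A minimal sketch of the mechanism (the device paths and the `mount_scratch` helper are illustrative, not real xfstests code):

```shell
SCRATCH_DEV=/dev/sda1                       # illustrative default scratch device
mount_scratch() { echo "mounting $SCRATCH_DEV"; }

# The prefixed assignment applies only to this one call, mirroring
# SCRATCH_DEV=$DMERROR_DEV _scratch_mount in the patch.
out1=$(SCRATCH_DEV=/dev/mapper/error-test mount_scratch)
out2=$(mount_scratch)
echo "$out1"   # mounting /dev/mapper/error-test
echo "$out2"   # mounting /dev/sda1
```

This is why the helper needs no save/restore logic: after the dmerror mount, SCRATCH_DEV still points at the original device.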


[PATCH 07/17] xfs/128: cycle_mount the scratch device, not the test device

2016-07-21 Thread Darrick J. Wong
This test uses the scratch device, so cycle that, not the test dev.
This is also an xfs_fsr test, so put it in the fsr group.

Signed-off-by: Darrick J. Wong 
---
 tests/xfs/128   |7 ---
 tests/xfs/group |2 +-
 2 files changed, 5 insertions(+), 4 deletions(-)


diff --git a/tests/xfs/128 b/tests/xfs/128
index 8758d7e..2e756d5 100755
--- a/tests/xfs/128
+++ b/tests/xfs/128
@@ -66,7 +66,7 @@ _pwrite_byte 0x61 0 $((blks * blksz)) $testdir/file1 >> $seqres.full
 _cp_reflink $testdir/file1 $testdir/file2
 _cp_reflink $testdir/file2 $testdir/file3
 _cp_reflink $testdir/file3 $testdir/file4
-_test_cycle_mount
+_scratch_cycle_mount
 free_blocks1=$(stat -f $testdir -c '%f')
 
 md5sum $testdir/file1 | _filter_scratch
@@ -82,7 +82,7 @@ c04=$(_md5_checksum $testdir/file4)
 echo "CoW the reflink copies"
 _pwrite_byte 0x62 $blksz $blksz $testdir/file2 >> $seqres.full
 _pwrite_byte 0x63 $(( blksz * (blks - 1) )) $blksz $testdir/file3 >> $seqres.full
-_test_cycle_mount
+_scratch_cycle_mount
 free_blocks2=$(stat -f $testdir -c '%f')
 
 md5sum $testdir/file1 | _filter_scratch
@@ -97,11 +97,12 @@ c14=$(_md5_checksum $testdir/file4)
 
 echo "Defragment"
 lsattr -l $testdir/ | _filter_scratch | _filter_spaces
+filefrag -v $testdir/file* >> $seqres.full
 $XFS_FSR_PROG -v -d $testdir/file1 >> $seqres.full
 $XFS_FSR_PROG -v -d $testdir/file2 >> $seqres.full # fsr probably breaks the link
 $XFS_FSR_PROG -v -d $testdir/file3 >> $seqres.full # fsr probably breaks the link
 $XFS_FSR_PROG -v -d $testdir/file4 >> $seqres.full # fsr probably ignores this file
-_test_cycle_mount
+_scratch_cycle_mount
 free_blocks3=$(stat -f $testdir -c '%f')
 
 md5sum $testdir/file1 | _filter_scratch
diff --git a/tests/xfs/group b/tests/xfs/group
index aa3b3ec..5ccf3d6 100644
--- a/tests/xfs/group
+++ b/tests/xfs/group
@@ -125,7 +125,7 @@
 125 fuzzers
 126 fuzzers
 127 auto quick clone
-128 auto quick clone
+128 auto quick clone fsr
 129 auto quick clone
 130 fuzzers clone
 131 auto quick clone



[PATCH 06/17] xfs: run xfs_repair at the end of each test

2016-07-21 Thread Darrick J. Wong
Run xfs_repair twice at the end of each test -- once to rebuild
the btree indices, and again with -n to verify the rebuild.

Signed-off-by: Darrick J. Wong 
---
 README|4 
 common/rc |   30 ++
 2 files changed, 34 insertions(+)


diff --git a/README b/README
index 2647e12..4509cc1 100644
--- a/README
+++ b/README
@@ -80,6 +80,10 @@ Preparing system for tests (IRIX and Linux):
added to the end of fsstresss and fsx invocations, respectively,
in case you wish to exclude certain operational modes from these
tests.
+ - set TEST_XFS_REPAIR_REBUILD=1 to have _check_xfs_filesystem
+   run xfs_repair -n to check the filesystem; xfs_repair to rebuild
+   metadata indexes; and xfs_repair -n (a third time) to check the
+   results of the rebuilding.
 
 - or add a case to the switch in common/config assigning
   these variables based on the hostname of your test
diff --git a/common/rc b/common/rc
index 7c79bf8..3b45578 100644
--- a/common/rc
+++ b/common/rc
@@ -2428,6 +2428,36 @@ _check_xfs_filesystem()
 ok=0
 fi
 
+if [ -n "$TEST_XFS_REPAIR_REBUILD" ]; then
+$XFS_REPAIR_PROG -n $extra_options $extra_log_options $extra_rt_options $device >$tmp.repair 2>&1
+if [ $? -ne 0 ]
+then
+echo "_check_xfs_filesystem: filesystem on $device is inconsistent (r) (see $seqres.full)"
+
+echo "_check_xfs_filesystem: filesystem on $device is inconsistent" >>$seqres.full
+echo "*** xfs_repair -n output ***">>$seqres.full
+cat $tmp.repair | _fix_malloc  >>$seqres.full
+echo "*** end xfs_repair output"   >>$seqres.full
+
+ok=0
+fi
+rm -f $tmp.fs_check $tmp.logprint $tmp.repair
+
+$XFS_REPAIR_PROG $extra_options $extra_log_options $extra_rt_options $device >$tmp.repair 2>&1
+if [ $? -ne 0 ]
+then
+echo "_check_xfs_filesystem: filesystem on $device is inconsistent (r) (see $seqres.full)"
+
+echo "_check_xfs_filesystem: filesystem on $device is inconsistent" >>$seqres.full
+echo "*** xfs_repair -n output ***">>$seqres.full
+cat $tmp.repair | _fix_malloc  >>$seqres.full
+echo "*** end xfs_repair output"   >>$seqres.full
+
+ok=0
+fi
+rm -f $tmp.fs_check $tmp.logprint $tmp.repair
+fi
+
 $XFS_REPAIR_PROG -n $extra_options $extra_log_options $extra_rt_options $device >$tmp.repair 2>&1
 if [ $? -ne 0 ]
 then



[PATCH 10/17] xfs/179: use scratch device helpers

2016-07-21 Thread Darrick J. Wong
Use the helper functions for scratch devices.  This fixes a problem
where xfs/179 fails when there's a realtime device.

Signed-off-by: Darrick J. Wong 
---
 tests/xfs/179 |   10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)


diff --git a/tests/xfs/179 b/tests/xfs/179
index e0b0af8..18459cb 100755
--- a/tests/xfs/179
+++ b/tests/xfs/179
@@ -65,7 +65,7 @@ _pwrite_byte 0x61 0 $blksz $testdir/file1 >> $seqres.full
 _cp_reflink $testdir/file1 $testdir/file2 >> $seqres.full
 
 echo "Change reference count"
-umount $SCRATCH_MNT
+_scratch_unmount
 echo "set refcount to -4" >> $seqres.full
 $XFS_DB_PROG -x -c 'agf 0' -c 'addr refcntroot' -c 'write recs[1].refcount 4294967292' $SCRATCH_DEV >> $seqres.full
 echo "check refcount after setting to -4" >> $seqres.full
@@ -80,7 +80,7 @@ _cp_reflink $testdir/file1 $testdir/file6 >> $seqres.full
 _cp_reflink $testdir/file1 $testdir/file7 >> $seqres.full
 
 echo "Check scratch fs"
-umount $SCRATCH_MNT
+_scratch_unmount
 echo "check refcount after reflinking 5 more times" >> $seqres.full
 $XFS_DB_PROG -c 'agf 0' -c 'addr refcntroot' -c 'p recs[1]' $SCRATCH_DEV >> $seqres.full
 _scratch_mount >> $seqres.full
@@ -91,7 +91,7 @@ _pwrite_byte 0x62 0 $blksz $testdir/file5 >> $seqres.full
 _pwrite_byte 0x62 0 $blksz $testdir/file7 >> $seqres.full
 
 echo "Check scratch fs"
-umount $SCRATCH_MNT
+_scratch_unmount
 echo "check refcount after cowing 3 files" >> $seqres.full
 $XFS_DB_PROG -c 'agf 0' -c 'addr refcntroot' -c 'p recs[1]' $SCRATCH_DEV >> $seqres.full
 _scratch_mount >> $seqres.full
@@ -100,10 +100,10 @@ echo "Remove reflinked files"
 rm -rf $testdir/file*
 
 echo "Check scratch fs"
-umount $SCRATCH_MNT
+_scratch_unmount
 echo "check refcount after removing all files" >> $seqres.full
 $XFS_DB_PROG -c 'agf 0' -c 'addr refcntroot' -c 'p recs[1]' $SCRATCH_DEV >> $seqres.full
-$XFS_REPAIR_PROG -o force_geometry -n $SCRATCH_DEV >> $seqres.full 2>&1
+_scratch_xfs_repair -o force_geometry -n >> $seqres.full 2>&1
 res=$?
 if [ $res -eq 0 ]; then
# If repair succeeds then format the device so that the post-test



[PATCH 11/17] xfs/234: use scratch device helpers

2016-07-21 Thread Darrick J. Wong
Use the helper functions for scratch devices.  This fixes a problem
where xfs/234 fails when there's a realtime device.

Signed-off-by: Darrick J. Wong 
---
 tests/xfs/234 |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)


diff --git a/tests/xfs/234 b/tests/xfs/234
index 2bbf295..8dadc34 100755
--- a/tests/xfs/234
+++ b/tests/xfs/234
@@ -34,7 +34,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 _cleanup()
 {
 cd /
-umount $SCRATCH_MNT > /dev/null 2>&1
+_scratch_unmount > /dev/null 2>&1
 rm -rf $tmp.* $metadump_file $TEST_DIR/image
 }
 
@@ -76,8 +76,8 @@ _scratch_metadump $metadump_file
 # Now restore the obfuscated one back and take a look around
 echo "Restore metadump"
 xfs_mdrestore $metadump_file $TEST_DIR/image
-_mount -t $FSTYP $TEST_DIR/image $SCRATCH_MNT
-umount $SCRATCH_MNT
+SCRATCH_DEV=$TEST_DIR/image _scratch_mount
+SCRATCH_DEV=$TEST_DIR/image _scratch_unmount
 
 echo "Check restored fs"
 _check_generic_filesystem $metadump_file



[PATCH 01/17] xfs/26[34]: remove duplicate tests

2016-07-21 Thread Darrick J. Wong
These two tests were accidentally double-added as xfs/30[78], but the
newer versions have fixed-up helper usage and fewer whitespace
problems, so nuke the old tests.

Signed-off-by: Darrick J. Wong 
---
 tests/xfs/263 |  179 -
 tests/xfs/263.out |   13 
 tests/xfs/264 |  180 -
 tests/xfs/264.out |   13 
 tests/xfs/group   |2 -
 5 files changed, 387 deletions(-)
 delete mode 100755 tests/xfs/263
 delete mode 100644 tests/xfs/263.out
 delete mode 100755 tests/xfs/264
 delete mode 100644 tests/xfs/264.out


diff --git a/tests/xfs/263 b/tests/xfs/263
deleted file mode 100755
index 6659dee..000
--- a/tests/xfs/263
+++ /dev/null
@@ -1,179 +0,0 @@
-#! /bin/bash
-# FS QA Test No. 263
-#
-# Test recovery of "lost" CoW blocks:
-# - Use the debugger to fake a leftover CoW extent
-# - See if xfs_repair fixes it
-#
-#---
-# Copyright (c) 2016, Oracle and/or its affiliates.  All Rights Reserved.
-#
-# This program is free software; you can redistribute it and/or
-# modify it under the terms of the GNU General Public License as
-# published by the Free Software Foundation.
-#
-# This program is distributed in the hope that it would be useful,
-# but WITHOUT ANY WARRANTY; without even the implied warranty of
-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-# GNU General Public License for more details.
-#
-# You should have received a copy of the GNU General Public License
-# along with this program; if not, write the Free Software Foundation,
-# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
-#---
-
-seq=`basename $0`
-seqres=$RESULT_DIR/$seq
-echo "QA output created by $seq"
-
-here=`pwd`
-tmp=/tmp/$$
-status=1	# failure is the default!
-trap "_cleanup; exit \$status" 0 1 2 3 15
-
-_cleanup()
-{
-cd /
-rm -rf $tmp.*
-}
-
-# get standard environment, filters and checks
-. ./common/rc
-. ./common/filter
-. ./common/reflink
-
-# real QA test starts here
-_supported_os Linux
-_supported_fs xfs
-_require_scratch_reflink
-
-rm -f $seqres.full
-
-echo "Format"
-_scratch_mkfs > $seqres.full 2>&1
-_scratch_mount >> $seqres.full
-is_rmap=$(xfs_info $SCRATCH_MNT | grep -c "rmapbt=1")
-umount $SCRATCH_MNT
-
-_get_agf_data() {
-   field="$1"
-   shift
-
-   xfs_db -c 'agf 1' "$@" -c "p $field" $SCRATCH_DEV | awk '{print $3}'
-}
-
-_set_agf_data() {
-   field="$1"
-   value="$2"
-   shift; shift
-
-   xfs_db -x -c 'agf 1' "$@" -c "write $field -- $value" $SCRATCH_DEV >> $seqres.full
-}
-
-_get_sb_data() {
-   field="$1"
-   shift
-
-   xfs_db -c 'sb 0' "$@" -c "p $field" $SCRATCH_DEV | awk '{print $3}'
-}
-
-_set_sb_data() {
-   field="$1"
-   value="$2"
-   shift; shift
-
-   xfs_db -x -c 'sb 0' "$@" -c "write $field -- $value" $SCRATCH_DEV >> $seqres.full
-}
-
-_filter_leftover() {
-   grep "^leftover" | sed -e "s/[0-9]\+/NR/g"
-}
-
-_dump_status() {
-   echo "** " "$@"
-   xfs_db -c 'sb 0' -c p $SCRATCH_DEV
-   echo "** agf header"
-   xfs_db -c 'agf 1' -c p $SCRATCH_DEV
-   echo "** refcntbt"
-   xfs_db -c 'agf 1' -c 'addr refcntroot' -c p $SCRATCH_DEV
-   echo "** rmapbt"
-   test $is_rmap -gt 0 && xfs_db -c 'agf 1' -c 'addr rmaproot' -c p $SCRATCH_DEV
-   echo "** bnobt"
-   xfs_db -c 'agf 1' -c 'addr bnoroot' -c p $SCRATCH_DEV
-   echo "** cntbt"
-   xfs_db -c 'agf 1' -c 'addr cntroot' -c p $SCRATCH_DEV
-}
-
-echo "We need AG1 to have a single free extent"
-bno_lvl=$(_get_agf_data level -c 'addr bnoroot')
-bno_nr=$(_get_agf_data numrecs -c 'addr bnoroot')
-refc_lvl=$(_get_agf_data level -c 'addr refcntroot')
-refc_nr=$(_get_agf_data numrecs -c 'addr refcntroot')
-
-test $bno_lvl -eq 0 || echo "  AG 1 bnobt must only have one level"
-test $bno_nr -eq 1 || echo "  AG 1 bnobt must only have one record"
-test $refc_lvl -eq 0 || echo "  AG 1 refcountbt must only have one level"
-test $refc_nr -eq 0 || echo "  AG 1 refcountbt must only have one record"
-
-if [ $is_rmap -gt 0 ]; then
-   rmap_lvl=$(_get_agf_data level -c 'addr rmaproot')
-   rmap_nr=$(_get_agf_data numrecs -c 'addr rmaproot')
-   test $rmap_lvl -eq 0 || echo "  AG 1 rmapbt must only have one level"
-fi
-
-echo "Find our extent and old counter values"
-bno=$(_get_agf_data "recs[1].startblock" -c 'addr bnoroot')
-len=$(_get_agf_data "recs[1].blockcount" -c 'addr bnoroot')
-agf_freeblks=$(_get_agf_data freeblks)
-sb_fdblocks=$(_get_sb_data fdblocks)
-
-test $len -ge 200 || echo "  AG 1 doesn't have enough free blocks"
-
-# Take the last 100 blocks of the free extent
-debris_len=100
-debris_bno=$((bno + len - debris_len))
-
-echo "Remove the extent from the freesp btrees"
-_set_agf_data "recs[1].blockcount" $((len - 

[PATCH 04/17] xfs/122: list the new log redo items

2016-07-21 Thread Darrick J. Wong
List the new log redo items.  These should have stable sizes.

Signed-off-by: Darrick J. Wong 
---
 tests/xfs/122.out |8 
 1 file changed, 8 insertions(+)


diff --git a/tests/xfs/122.out b/tests/xfs/122.out
index ebc4421..c4ed725 100644
--- a/tests/xfs/122.out
+++ b/tests/xfs/122.out
@@ -60,7 +60,11 @@ sizeof(struct xfs_attr3_leaf_hdr) = 80
 sizeof(struct xfs_attr3_leafblock) = 88
 sizeof(struct xfs_attr3_rmt_hdr) = 56
 sizeof(struct xfs_btree_block) = 72
+sizeof(struct xfs_bud_log_format) = 48
+sizeof(struct xfs_bui_log_format) = 48
 sizeof(struct xfs_clone_args) = 32
+sizeof(struct xfs_cud_log_format) = 32
+sizeof(struct xfs_cui_log_format) = 32
 sizeof(struct xfs_da3_blkinfo) = 56
 sizeof(struct xfs_da3_intnode) = 64
 sizeof(struct xfs_da3_node_hdr) = 64
@@ -76,10 +80,14 @@ sizeof(struct xfs_extent_data_info) = 32
 sizeof(struct xfs_fs_eofblocks) = 128
 sizeof(struct xfs_icreate_log) = 28
 sizeof(struct xfs_log_dinode) = 176
+sizeof(struct xfs_map_extent) = 32
+sizeof(struct xfs_phys_extent) = 16
 sizeof(struct xfs_refcount_key) = 4
 sizeof(struct xfs_refcount_rec) = 12
 sizeof(struct xfs_rmap_key) = 20
 sizeof(struct xfs_rmap_rec) = 24
+sizeof(struct xfs_rud_log_format) = 48
+sizeof(struct xfs_rui_log_format) = 48
 sizeof(xfs_agf_t) = 224
 sizeof(xfs_agfl_t) = 36
 sizeof(xfs_agi_t) = 336



[PATCH 02/17] xfs: use rmapbt-checking helper

2016-07-21 Thread Darrick J. Wong
Don't open-code _notrun checks for the rmapbt, just use the helper.

Signed-off-by: Darrick J. Wong 
---
 tests/xfs/310 |4 +---
 tests/xfs/314 |4 +---
 tests/xfs/317 |4 +---
 tests/xfs/322 |4 +---
 4 files changed, 4 insertions(+), 12 deletions(-)


diff --git a/tests/xfs/310 b/tests/xfs/310
index 36b683c..bfdec39 100755
--- a/tests/xfs/310
+++ b/tests/xfs/310
@@ -47,6 +47,7 @@ _cleanup()
 _supported_os Linux
 _supported_fs xfs
 _require_scratch
+_require_xfs_scratch_rmapbt
 _require_xfs_io_command "falloc"
 
 rm -f $seqres.full
@@ -56,9 +57,6 @@ echo "Figure out block size"
 _scratch_mkfs >/dev/null 2>&1
 _scratch_mount >> $seqres.full
 
-is_rmap=$(xfs_info $SCRATCH_MNT | grep -c "rmapbt=1")
-test $is_rmap -gt 0 || _notrun "rmap not supported on scratch fs"
-
 testdir=$SCRATCH_MNT/test-$seq
 blksz="$(stat -f $SCRATCH_MNT -c '%S')"
 
diff --git a/tests/xfs/314 b/tests/xfs/314
index 6b867b8..ec16c6f 100755
--- a/tests/xfs/314
+++ b/tests/xfs/314
@@ -48,6 +48,7 @@ _supported_os Linux
 _supported_fs xfs
 _require_cp_reflink
 _require_scratch_reflink
+_require_xfs_scratch_rmapbt
 _require_error_injection
 _require_xfs_io_error_injection "rmap_finish_one"
 
@@ -60,9 +61,6 @@ echo "Format filesystem"
 _scratch_mkfs >/dev/null 2>&1
 _scratch_mount >> $seqres.full
 
-is_rmap=$(xfs_info $SCRATCH_MNT | grep -c "rmapbt=1")
-test $is_rmap -gt 0 || _notrun "rmap not supported on scratch fs"
-
 echo "Create files"
 _pwrite_byte 0x66 0 $sz $SCRATCH_MNT/file1 >> $seqres.full
 _cp_reflink $SCRATCH_MNT/file1 $SCRATCH_MNT/file2
diff --git a/tests/xfs/317 b/tests/xfs/317
index ac9f3ae..507c2a9 100755
--- a/tests/xfs/317
+++ b/tests/xfs/317
@@ -45,6 +45,7 @@ _cleanup()
 _supported_os Linux
 _supported_fs xfs
 _require_scratch
+_require_xfs_scratch_rmapbt
 _require_error_injection
 _require_xfs_io_error_injection "rmap_finish_one"
 
@@ -57,9 +58,6 @@ echo "Format filesystem"
 _scratch_mkfs >/dev/null 2>&1
 _scratch_mount >> $seqres.full
 
-is_rmap=$(xfs_info $SCRATCH_MNT | grep -c "rmapbt=1")
-test $is_rmap -gt 0 || _notrun "rmap not supported on scratch fs"
-
 echo "Create files"
 touch $SCRATCH_MNT/file1
 _pwrite_byte 0x67 0 $sz $SCRATCH_MNT/file0 >> $seqres.full
diff --git a/tests/xfs/322 b/tests/xfs/322
index 0dddb1f..ef402c8 100755
--- a/tests/xfs/322
+++ b/tests/xfs/322
@@ -48,6 +48,7 @@ _supported_os Linux
 _supported_fs xfs
 _require_cp_reflink
 _require_scratch_reflink
+_require_xfs_scratch_rmapbt
 _require_xfs_io_error_injection "rmap_finish_one"
 
 rm -f $seqres.full
@@ -59,9 +60,6 @@ echo "Format filesystem"
 _scratch_mkfs >/dev/null 2>&1
 _scratch_mount >> $seqres.full
 
-is_rmap=$(xfs_info $SCRATCH_MNT | grep -c "rmapbt=1")
-test $is_rmap -gt 0 || _notrun "rmap not supported on scratch fs"
-
 echo "Create files"
 _pwrite_byte 0x66 0 $sz $SCRATCH_MNT/file1 >> $seqres.full
 $XFS_IO_PROG -f -c "truncate $sz" $SCRATCH_MNT/file3 >> $seqres.full



[PATCH v7 00/17] xfstests: fixes and new tests for rmap/reflink/etc

2016-07-21 Thread Darrick J. Wong
Hi all,

This is the seventh revision of a patchset that adds to xfstests
support for testing reverse-mappings of physical blocks to file and
metadata (rmap); support for testing multiple file logical blocks to
the same physical block (reflink); and implements the beginnings of
online metadata scrubbing.

The first eleven patches fix various bugs in existing reflink and rmap
tests, most of which were a result of not using helpers when I should
have.

After that, there are new tests to make sure that we can't clone_range
across mountpoints; a test to make sure that swapext works with many
extents on an rmap filesystem; and tests for realtime reverse-mapping.
There are also revised patches for testing xfs_repair's ability to
rebuild filesystem indices correctly, and to run xfs_scrub during each
test.

If you're going to start using this mess, you probably ought to just
pull from my github trees for kernel[1], xfsprogs[2], and xfstests[3].
There are also updates for xfs-docs[4].  The kernel patches should
apply to dchinner's for-next; xfsprogs patches to for-next; and
xfstest to master.  The kernel git tree already has for-next included.

The patches have been xfstested with x64, i386, armv7l, and ppc64.
AFAICT these don't cause any new failures for the 'auto' group.

This is an extraordinary way to eat your data.  Enjoy! 
Comments and questions are, as always, welcome.

--D

[1] https://github.com/djwong/linux/tree/djwong-experimental
[2] https://github.com/djwong/xfsprogs/tree/djwong-experimental
[3] https://github.com/djwong/xfstests/tree/djwong-devel
[4] https://github.com/djwong/xfs-documentation/tree/djwong-devel


[PATCH 03/17] xfs/310: fix the size calculation for the huge device

2016-07-21 Thread Darrick J. Wong
Fix the calculation of the dmhuge size.  The previous code tried to
calculate the size correctly, but got it wrong for 1k block sizes.
Therefore, clean the whole mess up.

Signed-off-by: Darrick J. Wong 
---
 tests/xfs/310 |5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)


diff --git a/tests/xfs/310 b/tests/xfs/310
index bfdec39..5125773 100755
--- a/tests/xfs/310
+++ b/tests/xfs/310
@@ -63,7 +63,9 @@ blksz="$(stat -f $SCRATCH_MNT -c '%S')"
 umount $SCRATCH_MNT
 
 echo "Format huge device"
-_dmhugedisk_init $((blksz * 2 * 4400)) # a little over 2^22 blocks
+nr_blks=2100000	# 2^21 plus a little more
+sectors=$(( (nr_blks * 3) * blksz / 512 )) # each AG must have > 2^21 blocks
+_dmhugedisk_init $sectors
 _mkfs_dev -d agcount=2 $DMHUGEDISK_DEV
 _mount $DMHUGEDISK_DEV $SCRATCH_MNT
 xfs_info $SCRATCH_MNT >> $seqres.full
@@ -71,7 +73,6 @@ xfs_info $SCRATCH_MNT >> $seqres.full
 echo "Create the original file blocks"
 mkdir $testdir
 blksz="$(stat -f $testdir -c '%S')"
-nr_blks=2100000	# 2^21 plus a little more
 $XFS_IO_PROG -f -c "falloc 0 $((nr_blks * blksz))" $testdir/file1 >> $seqres.full
 
 echo "Check extent count"



[PATCH 05/17] common/reflink: actually test dedupe on scratch device

2016-07-21 Thread Darrick J. Wong
In _require_scratch_dedupe, test the scratch device, not the testdev.

Signed-off-by: Darrick J. Wong 
---
 common/reflink |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/common/reflink b/common/reflink
index 4ec390d..1363971 100644
--- a/common/reflink
+++ b/common/reflink
@@ -107,7 +107,7 @@ _require_scratch_dedupe()
_scratch_mount
 "$XFS_IO_PROG" -f -c "pwrite -S 0x61 0 65536" "$SCRATCH_MNT/file1" > /dev/null
 "$XFS_IO_PROG" -f -c "pwrite -S 0x61 0 65536" "$SCRATCH_MNT/file2" > /dev/null
-   testio="$("$XFS_IO_PROG" -f -c "dedupe $TEST_DIR/file1 0 0 65536" "$TEST_DIR/file2" 2>&1)"
+   testio="$("$XFS_IO_PROG" -f -c "dedupe $SCRATCH_MNT/file1 0 0 65536" "$SCRATCH_MNT/file2" 2>&1)"
echo $testio | grep -q "Operation not supported" && \
_notrun "Dedupe not supported by test filesystem type: $FSTYP"
echo $testio | grep -q "Inappropriate ioctl for device" && \



Re: [PATCH] generic/371: run write(2) and fallocate(2) in parallel

2016-07-21 Thread Dave Chinner
On Thu, Jul 21, 2016 at 06:41:00PM +0800, Eryu Guan wrote:
> On Thu, Jul 21, 2016 at 03:30:25PM +0800, Wang Xiaoguang wrote:
> > +
> > +run_time=$((180 * $TIME_FACTOR))
> 
> 180s is too long time, I can reproduce it in around 10s on my test vm,
> just loop for 100 times for each operation (pwrite and falloc)

Or simply do:

run_time=$((20 * $TIME_FACTOR * $TIME_FACTOR))

So that the time factor scales up quickly when you want to run long
running tests.
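To make the effect of the suggested quadratic factor concrete, here is a quick sketch comparing the two scalings (the values are illustrative; TIME_FACTOR=1 is the default in xfstests):

```shell
# Compare linear (180 * tf) vs quadratic (20 * tf * tf) runtime scaling.
for tf in 1 2 3; do
    linear=$((180 * tf))
    quadratic=$((20 * tf * tf))
    echo "TIME_FACTOR=$tf linear=${linear}s quadratic=${quadratic}s"
done
# TIME_FACTOR=1 linear=180s quadratic=20s
# TIME_FACTOR=2 linear=360s quadratic=80s
# TIME_FACTOR=3 linear=540s quadratic=180s
```

So the default run stays short (20s), while TIME_FACTOR=3 already matches the original 180s run.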

> > --- a/tests/generic/group
> > +++ b/tests/generic/group
> > @@ -373,3 +373,4 @@
> >  368 auto quick richacl
> >  369 auto quick richacl
> >  370 auto quick richacl
> > +371 auto enospc prealloc stress
> 
> So we can add 'quick' group and remove 'stress'.

I'd leave the test in stress if we can run it based on time as per
my above suggestion.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH] fstests: btrfs: Test send on heavily deduped file

2016-07-21 Thread Dave Chinner
On Thu, Jul 21, 2016 at 10:05:25AM +0800, Qu Wenruo wrote:
> 
> 
> At 07/21/2016 07:37 AM, Dave Chinner wrote:
> >On Wed, Jul 20, 2016 at 03:40:29PM +0800, Qu Wenruo wrote:
> >>At 07/20/2016 03:01 PM, Eryu Guan wrote:
> >>>On Tue, Jul 19, 2016 at 01:42:03PM +0800, Qu Wenruo wrote:
> >
> >This test uses $LOAD_FACTOR, so it should be in 'stress' group. And it
> >hangs the latest kernel, stop other tests from running, I think we can
> >add it to 'dangerous' group as well.
> >
> 
> Thanks for this info.
> I'm completely OK to add this group to 'stress' and 'dangerous'.
> 
> 
> However I'm a little curious about the meaning/standard of these groups.
> 
> Does 'dangerous' conflicts with 'auto'?
> Since under most case, tester would just execute './check -g auto' and the
> system hangs at the test case.
> So I'm a little confused with the 'auto' group.
> >>>
> >>>I quote my previous email here to explain the 'auto' group
> >>>http://www.spinics.net/lists/fstests/msg03262.html
> >>>
> >>>"
> >>>I searched for Dave's explainations on 'auto' group in his reviews, and
> >>>got the following definitions:
> >>>
> >>>- it should be a valid & reliable test (it's finished and have
> >>> deterministic output) [1]
> >>>- it passes on current upstream kernels, if it fails, it's likely to be
> >>> resolved in forseeable future [2]
> >>>- it should take no longer than 5 minutes to finish [3]
> >>>"
> >>>
> >>>And "The only difference between quick and auto group criteria is the
> >>>test runtime." Usually 'quick' tests finish within 30s.
> >>>
> >>>For the 'dangerous' group, it was added in commit 3f28d55c3954 ("add
> >>>freeze and dangerous groups"), and seems that it didn't have a very
> >>>clear definition[*]. But I think any test that could hang/crash recent
> >>>kernels is considered as dangerous.
> >>>
> >>>* http://oss.sgi.com/archives/xfs/2012-03/msg00073.html
> >>
> >>Thanks for all the info, really helps a lot.
> >>
> >>Especially for quick and auto difference.
> >>
> >>Would you mind me applying this standard to current btrfs test cases?
> >
> >It should be applied to all test cases, regardless of the filesystem
> >type.
> 
> It's straightforward for specific fs test cases.
> 
> But for generic, I'm a little concerned of the quick/auto standard.
> Should we use result of one single fs(and which fs? I assume xfs
> though) or all fs, to determine quick/auto group?

It's up to the person introducing the new test to determine how it
should be classified. If this causes problems for other people, then
they can send patches to reclassify it appropriately based on their
runtime numbers and configuration.

Historically speaking, we've tended to ignore btrfs performance
because it's been so randomly terrible. It's so often been a massive
outlier that we've generally considered btrfs performance to be a
bug that needs fixing and not something that requires the test to be
reclassified for everyone else.

> So it makes quick/auto tag quite hard to determine.

It's quite straightforward, really. Send patches with numbers for
the tests you want reclassified, and lots of people will say "yes, I
see that too" or "no, that only takes 2s on my smallest, slowest
test machine, it should remain as a quick test". And that's about
it.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH v2] Btrfs: fix memory leak of block group cache

2016-07-21 Thread Liu Bo
On Thu, Jul 21, 2016 at 03:17:41PM -0400, Chris Mason wrote:
> 
> 
> On 07/21/2016 03:03 PM, Liu Bo wrote:
> > On Thu, Jul 21, 2016 at 11:32:26AM -0400, Chris Mason wrote:
> > > 
> > > 
> > > On 07/20/2016 08:44 PM, Liu Bo wrote:
> > > > While processing delayed refs, we may update block group's statistics
> > > > and attach it to cur_trans->dirty_bgs, and later writing dirty block
> > > > groups will process the list, which happens during
> > > > btrfs_commit_transaction().
> > > > 
> > > > For whatever reason, the transaction is aborted and dirty_bgs
> > > > is not processed in cleanup_transaction(), we end up with memory leak
> > > > of these dirty block group cache.
> > > > 
> > > > Since btrfs_start_dirty_block_groups() doesn't make it go to the commit
> > > > critical section, this also adds the cleanup work inside it.
> > > 
> > > It's the start_dirty_block_groups() hunk that worries me a bit:
> > > 
> > > > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > > > index 50bd683..7a35c9d 100644
> > > > --- a/fs/btrfs/extent-tree.c
> > > > +++ b/fs/btrfs/extent-tree.c
> > > > @@ -3698,6 +3698,8 @@ again:
> > > > goto again;
> > > > }
> > > > spin_unlock(&cur_trans->dirty_bgs_lock);
> > > > +   } else if (ret < 0) {
> > > > +   btrfs_cleanup_dirty_bgs(cur_trans, root);
> > > > }
> > > > 
> > > > btrfs_free_path(path);
> > > > 
> > > 
> > > We have checks in here to make sure only one process runs
> > > btrfs_start_dirty_block_groups() but that doesn't mean that only one 
> > > process
> > > its messing around with the cache inode.  Is there any reason we can't let
> > > this cleanup wait for the cleanup_transaction code?
> > > 
> > > btrfs_run_delayed_refs() already aborts when it fails.
> > 
> > update_block_group() is the only producer that adds block group caches to
> > the dirty_bgs list, and if btrfs_run_delayed_refs() aborts, the transaction
> > is aborted, so it seems that there won't be anyone manipulating the
> > dirty_bgs list, or am I missing something?
> > 
> 
> No, the dirty_bgs processing is safe I think.  My concern is with the cache
> inode which we iput()

I think iput() is OK, we're doing iput() on the block group caches on the
io_bgs list, where all block groups' inodes have been igrab()'d.  If others
are messing around with our cache inode, they should have their own igrab,
too.

> 
> > Another point is that when we fail on btrfs_start_dirty_block_groups(),
> > btrfs_commit_transaction() won't get to cleanup_transaction error
> > handling,
> 
> Right, because we don't actually finish the commit.  Someone will eventually
> though ;)

Hmm yes, it's possible that there's a concurrent transaction commit
running.  If that's not true, we may still resort to
btrfs_error_commit_super(); other than that, I don't see who could
commit/cleanup the transaction after entering the BTRFS_FS_STATE_ERROR
state.

Thanks,

-liubo

> 
> > 
> > btrfs_commit_transaction() {
> > ...
> > if (!test_bit(BTRFS_TRANS_DIRTY_BG_RUN, &cur_trans->flags)) {
> > ...
> > ret = btrfs_start_dirty_block_groups(trans, root);
> > }
> > if (ret) {
> > btrfs_end_transaction(trans, root);
> > return ret;
> > }
> > ...
> > cleanup_transaction();
> > }
> > 
> > 
> > But yes, if we delay the cleanup, we still have a chance to do cleanup
> > in btrfs_error_commit_super(), and I have sent another patch to add
> > several ASSERT()s to check for block-group-related memory leaks, with
> > which we'll be warned if anything goes wrong.
> > 
> > I'm OK to remove the part that causes concerns.
> 
> Thanks.
> 
> -chris
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: A lot warnings in dmesg while running thunderbird

2016-07-21 Thread Gabriel C


On 21.07.2016 14:56, Chris Mason wrote:
> On 07/20/2016 01:50 PM, Gabriel C wrote:
>>
>> After 24h of running the program and thunderbird all is still fine here.
>>
>> I let it run one more day.. But looks very good.
>>
> 
> Thanks for your time in helping to track this down.  It'll go into the 
> next merge window and be cc'd to stable.
> 

You are welcome :)

Test program was running without problems for 52h. I think your fix is fine :)

Also feel free to add Tested-by: Gabriel Craciunescu  to 
your commit.

Regards,

Gabriel


Re: [PATCH v2] Btrfs: fix memory leak of block group cache

2016-07-21 Thread Chris Mason



On 07/21/2016 03:03 PM, Liu Bo wrote:

On Thu, Jul 21, 2016 at 11:32:26AM -0400, Chris Mason wrote:



On 07/20/2016 08:44 PM, Liu Bo wrote:

While processing delayed refs, we may update a block group's statistics
and attach it to cur_trans->dirty_bgs, and later writing dirty block
groups will process the list, which happens during
btrfs_commit_transaction().

If, for whatever reason, the transaction is aborted and dirty_bgs
is not processed in cleanup_transaction(), we end up with a memory leak
of these dirty block group caches.

Since a failure in btrfs_start_dirty_block_groups() never reaches the
commit critical section, this also adds the cleanup work inside it.


It's the start_dirty_block_groups() hunk that worries me a bit:


diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 50bd683..7a35c9d 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3698,6 +3698,8 @@ again:
goto again;
}
spin_unlock(&cur_trans->dirty_bgs_lock);
+   } else if (ret < 0) {
+   btrfs_cleanup_dirty_bgs(cur_trans, root);
}

btrfs_free_path(path);



We have checks in here to make sure only one process runs
btrfs_start_dirty_block_groups(), but that doesn't mean that only one process
is messing around with the cache inode.  Is there any reason we can't let
this cleanup wait for the cleanup_transaction code?

btrfs_run_delayed_refs() already aborts when it fails.


update_block_group() is the only producer that adds block group caches to
the dirty_bgs list, and if btrfs_run_delayed_refs() aborts, the transaction
is aborted, so it seems that there won't be anyone manipulating the
dirty_bgs list, or am I missing something?



No, the dirty_bgs processing is safe I think.  My concern is with the 
cache inode which we iput()



Another point is that when we fail on btrfs_start_dirty_block_groups(),
btrfs_commit_transaction() won't get to cleanup_transaction error
handling,


Right, because we don't actually finish the commit.  Someone will 
eventually though ;)




btrfs_commit_transaction() {
...
if (!test_bit(BTRFS_TRANS_DIRTY_BG_RUN, &cur_trans->flags)) {
...
ret = btrfs_start_dirty_block_groups(trans, root);
}
if (ret) {
btrfs_end_transaction(trans, root);
return ret;
}
...
cleanup_transaction();
}


But yes, if we delay the cleanup, we still have a chance to do cleanup
in btrfs_error_commit_super(), and I have sent another patch to add
several ASSERT()s to check for block-group-related memory leaks, with which
we'll be warned if anything goes wrong.

I'm OK to remove the part that causes concerns.


Thanks.

-chris



Re: [PATCH v2] Btrfs: fix memory leak of block group cache

2016-07-21 Thread Liu Bo
On Thu, Jul 21, 2016 at 11:32:26AM -0400, Chris Mason wrote:
> 
> 
> On 07/20/2016 08:44 PM, Liu Bo wrote:
> > While processing delayed refs, we may update a block group's statistics
> > and attach it to cur_trans->dirty_bgs, and later writing dirty block
> > groups will process the list, which happens during
> > btrfs_commit_transaction().
> > 
> > If, for whatever reason, the transaction is aborted and dirty_bgs
> > is not processed in cleanup_transaction(), we end up with a memory leak
> > of these dirty block group caches.
> > 
> > Since a failure in btrfs_start_dirty_block_groups() never reaches the
> > commit critical section, this also adds the cleanup work inside it.
> 
> It's the start_dirty_block_groups() hunk that worries me a bit:
> 
> > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > index 50bd683..7a35c9d 100644
> > --- a/fs/btrfs/extent-tree.c
> > +++ b/fs/btrfs/extent-tree.c
> > @@ -3698,6 +3698,8 @@ again:
> > goto again;
> > }
> > spin_unlock(&cur_trans->dirty_bgs_lock);
> > +   } else if (ret < 0) {
> > +   btrfs_cleanup_dirty_bgs(cur_trans, root);
> > }
> > 
> > btrfs_free_path(path);
> > 
> 
> We have checks in here to make sure only one process runs
> btrfs_start_dirty_block_groups(), but that doesn't mean that only one process
> is messing around with the cache inode.  Is there any reason we can't let
> this cleanup wait for the cleanup_transaction code?
> 
> btrfs_run_delayed_refs() already aborts when it fails.

update_block_group() is the only producer that adds block group caches to
the dirty_bgs list, and if btrfs_run_delayed_refs() aborts, the transaction
is aborted, so it seems that there won't be anyone manipulating the
dirty_bgs list, or am I missing something?

Another point is that when we fail on btrfs_start_dirty_block_groups(),
btrfs_commit_transaction() won't get to cleanup_transaction error
handling,

btrfs_commit_transaction() {
...
if (!test_bit(BTRFS_TRANS_DIRTY_BG_RUN, &cur_trans->flags)) {
...
ret = btrfs_start_dirty_block_groups(trans, root);
}
if (ret) {
btrfs_end_transaction(trans, root);
return ret;
}
...
cleanup_transaction();
}


But yes, if we delay the cleanup, we still have a chance to do cleanup
in btrfs_error_commit_super(), and I have sent another patch to add
several ASSERT()s to check for block-group-related memory leaks, with which
we'll be warned if anything goes wrong.

I'm OK to remove the part that causes concerns.

Thanks,

-liubo
> 
> -chris
> 
> 


Re: mount btrfs takes 30 minutes, btrfs check runs out of memory

2016-07-21 Thread Graham Cobb
On 21/07/16 09:19, Qu Wenruo wrote:
> We don't usually get such large extent tree dump from a real world use
> case.

Let us know if you want some more :-)

I have a heavily used single disk BTRFS filesystem with about 3.7TB in
use and about 9 million extents.  I am happy to provide an extent dump
if it is useful to you.  Particularly if you don't need me to actually
unmount it (i.e. you can live with some inconsistencies).




Re: Status of SMR with BTRFS

2016-07-21 Thread Patrik Lundquist
On 21 July 2016 at 15:34, Chris Murphy  wrote:
>
> Do programs have a way to communicate what portion of a data file is
> modified, so that only changed blocks are COW'd? When I change a
> single pixel in a 400MiB image and do a save (to overwrite the
> original file), it takes just as long to overwrite as to write it out
> as a new file. It'd be neat if that could be optimized but I don't see
> it being the case at the moment.

Programs can choose to seek within a file and only overwrite changed
parts, like BitTorrent (use NOCOW or defrag files like that).

Paint programs usually compress the changed image on save, so most of
the file is changed anyway. But if it's a raw image file, just writing
the changed pixels should work; that would require a comparison
with the original image (or a per-pixel change history), so I doubt
anyone cares to implement it at the application level.


Re: [PATCH v2] Btrfs: fix memory leak of block group cache

2016-07-21 Thread Chris Mason



On 07/20/2016 08:44 PM, Liu Bo wrote:

While processing delayed refs, we may update a block group's statistics
and attach it to cur_trans->dirty_bgs, and later writing dirty block
groups will process the list, which happens during
btrfs_commit_transaction().

If, for whatever reason, the transaction is aborted and dirty_bgs
is not processed in cleanup_transaction(), we end up with a memory leak
of these dirty block group caches.

Since a failure in btrfs_start_dirty_block_groups() never reaches the
commit critical section, this also adds the cleanup work inside it.


It's the start_dirty_block_groups() hunk that worries me a bit:


diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 50bd683..7a35c9d 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3698,6 +3698,8 @@ again:
goto again;
}
spin_unlock(&cur_trans->dirty_bgs_lock);
+   } else if (ret < 0) {
+   btrfs_cleanup_dirty_bgs(cur_trans, root);
}

btrfs_free_path(path);



We have checks in here to make sure only one process runs 
btrfs_start_dirty_block_groups(), but that doesn't mean that only one 
process is messing around with the cache inode.  Is there any reason we 
can't let this cleanup wait for the cleanup_transaction code?


btrfs_run_delayed_refs() already aborts when it fails.

-chris




Finding only non-snapshots via btrfs subvol list

2016-07-21 Thread Holger Hoffstätte

I'm trying to find non-snapshots, i.e. 'top-level' subvolumes in a
filesystem and this seems harder than it IMHO should be.

The fs is just like:

/mnt/stuff
 subvolA
 subvolA-date1
 subvolA-date2
 subvolB
 subvolB-date1
 subvolB-date2
..

All I want are the subvol{A,B} *without* the snapshots, but so
far I haven't been able to accomplish this easily with "subvol list"
and its options. -s lists only snapshots, but what I want is the
exact opposite.

So far the best I could find - except for relying on my ad-hoc naming
conventions and inverse-grepping for that - is via -q (print parent UUID)
and matching on that:

btrfs subvol list -q /mnt/stuff | grep "parent_uuid -" | cut -f 11 -d " "

gives me what I want - but eeww. So somehow I think I'm missing something
trivial. Is there a better, non-greppy way?

Holger
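As far as I know there is no built-in inverse of -s, so matching on the
empty parent UUID is about it. As a sketch, the field counting in the
grep/cut pipeline above could be replaced by a small C filter; the
`ID <n> gen <n> top level <n> parent_uuid <uuid|-> path <path>` line layout
is an assumption based on the command output in question:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the path field of a `btrfs subvol list -q` line whose parent
 * UUID is "-" (i.e. a subvolume that is not a snapshot), or NULL for
 * snapshot lines.  Matching on the literal "parent_uuid - path " marker
 * avoids counting whitespace-separated fields by hand.
 */
static const char *toplevel_path(const char *line)
{
	const char *marker = "parent_uuid - path ";
	const char *hit = strstr(line, marker);

	return hit ? hit + strlen(marker) : NULL;
}
```

Feeding each output line of `btrfs subvol list -q /mnt/stuff` through
toplevel_path() and printing the non-NULL results reproduces the pipeline.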


Re: Status of SMR with BTRFS

2016-07-21 Thread Chris Murphy
On Thu, Jul 21, 2016 at 8:12 AM, Austin S. Hemmelgarn
 wrote:

> This really isn't fake COW, it's COW, just at a higher level than most
> programmers would think of it.

Alright I'll stop calling it that. Thanks.


-- 
Chris Murphy


Re: Status of SMR with BTRFS

2016-07-21 Thread Austin S. Hemmelgarn

On 2016-07-21 09:34, Chris Murphy wrote:

On Thu, Jul 21, 2016 at 6:46 AM, Austin S. Hemmelgarn
 wrote:

On 2016-07-20 15:58, Chris Murphy wrote:


On Sun, Jul 17, 2016 at 3:08 AM, Hendrik Friedel 
wrote:


Well, btrfs does write data very differently to many other file systems. On
every write the file is copied to another place, even if just one bit is
changed. That's special and I am wondering whether that could cause
problems.



It depends on the application. In practice, the program most
responsible for writing the file often does a faux-COW by writing a
whole new (temporary) file somewhere; when that operation completes,
it then deletes the original and move+renames the temporary one into
the place where the original one was, doing fsync in between each of those
operations. I think some of this is done via VFS also. It's all much
more metadata centric than what Btrfs would do on its own.


I'm pretty certain that the VFS itself does not do replace by rename type
stuff.


I can't tell what does it. But so far every program I've tried: vi,
gedit, GIMP, writes out a new file - as in, it has a different inode
number and every extent has a different address. That every program
reimplements this faux-COW would kinda surprise me rather than just
letting the VFS do it for everyone. I think since ancient times
literally overwriting files is just a bad idea that pretty much
guarantees data loss of old and new data if something interrupts that
overwrite.
This really isn't fake COW, it's COW, just at a higher level than most 
programmers would think of it.  The rename to replace is the pointer 
update, and the copy granularity is variable based on the size of the file.


The whole practice is used by just about everything, and dates back to 
before even SVR4, because traditional filesystems will corrupt files if 
they're being written when a power loss or crash occurs.  It's also 
popular because it breaks hard links, which have often been used as a poor 
man's form of deduplication.  Even on newer journaled filesystems, 
things aren't always safe across a power loss if you don't do this.  It 
can't be done legitimately in the VFS though, because POSIX requires 
that the inode not change if the file is just overwritten or rewritten 
in place.  Vi (which is probably vim on your system, although all other 
implementations I know of do likewise) does this by itself.  Most 
graphical applications have it happen through libraries they link to (I 
know for a fact that Qt has an option to do this, and I'm pretty certain 
Glib does too, but I don't know if they do by default or not).  In 
general though, it's really not all that much duplicated code, maybe 20 
lines tops, assuming they don't use predictable file names and open code 
the temporary name generation.
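As a sketch, the roughly-20-line dance described above looks like this in
userspace C (hypothetical helper names; it uses a predictable `.tmp` suffix
for brevity, which real applications avoid, and omits the directory fsync
some of them add):

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Replace the contents of 'path' via the write-temp/fsync/rename dance. */
static int atomic_replace(const char *path, const char *data)
{
	char tmp[4096];
	size_t len = strlen(data);
	int fd;

	snprintf(tmp, sizeof(tmp), "%s.tmp", path);
	fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;
	/* write the full new copy, then make sure it is on disk */
	if (write(fd, data, len) != (ssize_t)len || fsync(fd) != 0) {
		close(fd);
		unlink(tmp);
		return -1;
	}
	close(fd);
	/* the "pointer update": atomically swap the new copy into place */
	if (rename(tmp, path) != 0) {
		unlink(tmp);
		return -1;
	}
	return 0;
}

/* Helper for checking results: compare a file's contents to 'want'. */
static int contents_equal(const char *path, const char *want)
{
	char buf[4096];
	ssize_t n;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return 0;
	n = read(fd, buf, sizeof(buf) - 1);
	close(fd);
	if (n < 0)
		return 0;
	buf[n] = '\0';
	return strcmp(buf, want) == 0;
}
```

atomic_replace() either fully succeeds or leaves the old file intact; an
interrupted run can at worst leave a stray temporary file behind.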



BTRFS by nature technically does though, it's the same idea as a COW
update, just at a higher level, so we're technically doing the same thing
for every single block that changes.  The only issue I can think of in this
context with a replace by rename is that you end up hitting the metadata
trees twice.


Do programs have a way to communicate what portion of a data file is
modified, so that only changed blocks are COW'd? When I change a
single pixel in a 400MiB image and do a save (to overwrite the
original file), it takes just as long to overwrite as to write it out
as a new file. It'd be neat if that could be optimized but I don't see
it being the case at the moment.
AFAIUI, in BTRFS (and also ZFS), whatever blocks get rewritten get 
COW'ed.  So, rewriting the whole file will COW the whole file, not just 
the blocks that are different.  Trying to check in the FS itself what 
changed is actually rather inefficient (you will almost always spend 
more time comparing data than you will save by writing it all out if 
you're using fast storage, and every write potentially implies a huge 
number of reads), and relying on the application to tell us is 
dangerous.  That said, most of the required infrastructure is already 
present in the in-band deduplication stuff, and in fact, it may do this 
for files that get rewritten frequently enough that they don't get 
pushed out of it's cache (I haven't tested for this, and I don't have 
the time or expertise to read through the code to see if it will, but 
based on my current understanding of how it works, it should do this 
implicitly).  The whole thing is a trade off though, because only 
COW'ing the parts that changed leads to higher levels of fragmentation, 
and that's part of why database and disk image files have such issues 
with fragmentation and making them NOCOW helps with these issues, they 
only get spot rewrites.



Re: Status of SMR with BTRFS

2016-07-21 Thread Andrei Borzenkov
On Thu, Jul 21, 2016 at 4:34 PM, Chris Murphy  wrote:

>
> Do programs have a way to communicate what portion of a data file is
> modified, so that only changed blocks are COW'd? When I change a
> single pixel in a 400MiB image and do a save (to overwrite the
> original file), it takes just as long to overwrite as to write it out
> as a new file. It'd be neat if that could be optimized but I don't see
> it being the case at the moment.
>

NetApp has an option to do it for CIFS connections. It literally
compares old and new files on renames and discards duplicates. It is
off by default. I am not aware of anyone using it :)


Re: Status of SMR with BTRFS

2016-07-21 Thread Chris Murphy
On Thu, Jul 21, 2016 at 6:46 AM, Austin S. Hemmelgarn
 wrote:
> On 2016-07-20 15:58, Chris Murphy wrote:
>>
>> On Sun, Jul 17, 2016 at 3:08 AM, Hendrik Friedel 
>> wrote:
>>
>>> Well, btrfs does write data very differently to many other file systems. On
>>> every write the file is copied to another place, even if just one bit is
>>> changed. That's special and I am wondering whether that could cause
>>> problems.
>>
>>
>> It depends on the application. In practice, the program most
>> responsible for writing the file often does a faux-COW by writing a
>> whole new (temporary) file somewhere; when that operation completes,
>> it then deletes the original and move+renames the temporary one into
>> the place where the original one was, doing fsync in between each of those
>> operations. I think some of this is done via VFS also. It's all much
>> more metadata centric than what Btrfs would do on its own.
>
> I'm pretty certain that the VFS itself does not do replace by rename type
> stuff.

I can't tell what does it. But so far every program I've tried: vi,
gedit, GIMP, writes out a new file - as in, it has a different inode
number and every extent has a different address. That every program
reimplements this faux-COW would kinda surprise me rather than just
letting the VFS do it for everyone. I think since ancient times
literally overwriting files is just a bad idea that pretty much
guarantees data loss of old and new data if something interrupts that
overwrite.


> BTRFS by nature technically does though, it's the same idea as a COW
> update, just at a higher level, so we're technically doing the same thing
> for every single block that changes.  The only issue I can think of in this
> context with a replace by rename is that you end up hitting the metadata
> trees twice.

Do programs have a way to communicate what portion of a data file is
modified, so that only changed blocks are COW'd? When I change a
single pixel in a 400MiB image and do a save (to overwrite the
original file), it takes just as long to overwrite as to write it out
as a new file. It'd be neat if that could be optimized but I don't see
it being the case at the moment.


-- 
Chris Murphy


Re: [PATCH 1/4] btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster()

2016-07-21 Thread Josef Bacik

On 07/20/2016 09:49 PM, Wang Xiaoguang wrote:

hello,

On 07/20/2016 09:18 PM, Josef Bacik wrote:

On 07/20/2016 01:56 AM, Wang Xiaoguang wrote:

In prealloc_file_extent_cluster(), btrfs_check_data_free_space() uses
the wrong file offset for reloc_inode: it uses cluster->start and
cluster->end, which are actually extent bytenrs. The correct values should
be cluster->[start|end] minus the block group's start bytenr.

 start bytenr             cluster->start
      |                         |
      v                         v
      | extent |  extent  | ... | extent |
      |---------------------------------|
      |   block group (reloc_inode)     |

Signed-off-by: Wang Xiaoguang 
---
 fs/btrfs/relocation.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 0477dca..a0de885 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3030,12 +3030,14 @@ int prealloc_file_extent_cluster(struct inode *inode,
 u64 num_bytes;
 int nr = 0;
 int ret = 0;
+u64 prealloc_start = cluster->start - offset;
+u64 prealloc_end = cluster->end - offset;

 BUG_ON(cluster->start != cluster->boundary[0]);
 inode_lock(inode);

-ret = btrfs_check_data_free_space(inode, cluster->start,
-  cluster->end + 1 - cluster->start);
+ret = btrfs_check_data_free_space(inode, prealloc_start,
+  prealloc_end + 1 - prealloc_start);
 if (ret)
 goto out;

@@ -3056,8 +3058,8 @@ int prealloc_file_extent_cluster(struct inode *inode,
 break;
 nr++;
 }
-btrfs_free_reserved_data_space(inode, cluster->start,
-   cluster->end + 1 - cluster->start);
+btrfs_free_reserved_data_space(inode, prealloc_start,
+   prealloc_end + 1 - prealloc_start);
 out:
 inode_unlock(inode);
 return ret;



This ends up being the same amount.  Consider this scenario

bg bytenr = 4096
cluster->start = 8192
cluster->end = 12287

cluster->end + 1 - cluster->start = 4096

prealloc_start = cluster->start - offset = 4096
prealloc_end = cluster->end - offset = 8191

prealloc_end + 1 - prealloc_start = 4096

You shift both by the same amount, which gives you the same answer.  Thanks,

Thanks for reviewing.
Yes, I know the amount of preallocated data space is the same, this patch
does not fix any bugs :)

For every block group to be balanced, we create a corresponding inode.
For this inode, the initial offset should be 0. In your above example,
before this patch, it's btrfs_free_reserved_data_space(inode, 8192, 4096);
with this patch, it's btrfs_free_reserved_data_space(inode, 4096, 4096).

I just want to make btrfs_free_reserved_data_space()'s 'start' argument
be the offset inside the block group, not the offset inside the whole fs
bytenr space. I'm not a native English speaker; I hope I have expressed
what I mean :)

But yes, I'm also OK with removing this patch.



Oh I see, OK, that's fine. If we're just trying to make it look sane then go 
for it.

Signed-off-by: Josef Bacik 

Thanks,

Josef
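As a sanity check of the arithmetic in the exchange above (toy math only,
not kernel code), shifting both ends of an inclusive range by the block
group's start bytenr changes the offsets but never the length:

```c
#include <assert.h>

typedef unsigned long long u64;

/* Length of an inclusive byte range [start, end]. */
static u64 range_len(u64 start, u64 end)
{
	return end + 1 - start;
}
```

With bg bytenr 4096, cluster->start 8192 and cluster->end 12287, both
range_len(8192, 12287) and range_len(4096, 8191) come out to 4096; only
where the range sits (fs bytenr space vs. offset inside the block group)
differs.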



Re: A lot warnings in dmesg while running thunderbird

2016-07-21 Thread Chris Mason

On 07/20/2016 01:50 PM, Gabriel C wrote:


After 24h of running the program and thunderbird all is still fine here.

I let it run one more day.. But looks very good.



Thanks for your time in helping to track this down.  It'll go into the 
next merge window and be cc'd to stable.


-chris



Re: Status of SMR with BTRFS

2016-07-21 Thread Austin S. Hemmelgarn

On 2016-07-20 15:58, Chris Murphy wrote:

On Sun, Jul 17, 2016 at 3:08 AM, Hendrik Friedel  wrote:


Well, btrfs does write data very differently to many other file systems. On
every write the file is copied to another place, even if just one bit is
changed. That's special and I am wondering whether that could cause
problems.


It depends on the application. In practice, the program most
responsible for writing the file often does a faux-COW by writing a
whole new (temporary) file somewhere; when that operation completes,
it then deletes the original and move+renames the temporary one into
the place where the original one was, doing fsync in between each of those
operations. I think some of this is done via VFS also. It's all much
more metadata centric than what Btrfs would do on its own.
I'm pretty certain that the VFS itself does not do replace by rename 
type stuff.  BTRFS by nature technically does though, it's the same idea 
as a COW update, just at a higher level, so we're technically doing the 
same thing for every single block that changes.  The only issue I can 
think of in this context with a replace by rename is that you end up 
hitting the metadata trees twice.


I'd expect the write pattern of Btrfs to be similar to f2fs, with
respect to sequentiality of new writes.
Not necessarily. F2FS is log structured, and while not as strict as 
traditional log structured filesystems, it still has a similar long-term 
write pattern to stuff like NILFS2 or LFS.  I've not done as much with 
F2FS specifically, but I can say based on comparison to other log 
structured filesystems that outside of WORM write patterns in userspace, 
BTRFS does not have a similar write pattern to a log structured 
filesystem.  We try to pack stuff into existing allocations pretty 
aggressively, so we end up with most of our writes condensed in a small 
area of the disk.  The only cases I've seen where we get long sequential 
writes are when writing out single files one by one, without having 
anything else running at the same time on the FS.




Re: Status of SMR with BTRFS

2016-07-21 Thread Matthias Prager
> I'd expect the write pattern of Btrfs to be similar to f2fs, with
> respect to sequentiality of new writes.
Ideally yes - though my tests with a Seagate SMR drive suggest
otherwise. Optimizing the write behavior would probably lead to speed
improvements for btrfs on spinning disks.

---
Matthias Prager


[PATCH] Btrfs: fix delalloc accounting after copy_from_user faults

2016-07-21 Thread Chris Mason
Commit 56244ef151c3cd11 was almost but not quite enough to fix the
reservation math after btrfs_copy_from_user returned partial copies.

Some users are still seeing warnings in btrfs_destroy_inode, and with a
long enough test run I'm able to trigger them as well.

This patch fixes the accounting math again, bringing it much closer to
the way it was before the sectorsize conversion Chandan did.  The
problem is accounting for the offset into the page/sector when we do a
partial copy.  This one just uses the dirty_sectors variable which
should already be updated properly.

Signed-off-by: Chris Mason 
cc: sta...@vger.kernel.org # v4.6+
---
 fs/btrfs/file.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index f3f61d1..bcfb4a2 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1629,13 +1629,11 @@ again:
 * managed to copy.
 */
if (num_sectors > dirty_sectors) {
-   /*
-* we round down because we don't want to count
-* any partial blocks actually sent through the
-* IO machines
-*/
-   release_bytes = round_down(release_bytes - copied,
- root->sectorsize);
+
+   /* release everything except the sectors we dirtied */
+   release_bytes -= dirty_sectors <<
+   root->fs_info->sb->s_blocksize_bits;
+
if (copied > 0) {
spin_lock(&BTRFS_I(inode)->lock);
BTRFS_I(inode)->outstanding_extents++;
-- 
2.8.0.rc2
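As a toy model of the accounting above (illustrative only, not the kernel's
actual code paths): with a 4096-byte sectorsize, reserving 3 sectors and then
copying only 200 bytes that straddle a sector boundary dirties 2 sectors, so
exactly one sector can be released; the old round_down() math releases two,
because it never accounts for the offset into the first sector:

```c
#include <assert.h>

#define SECTORSIZE 4096ULL

/* round x down to a sector boundary */
static unsigned long long round_down_sector(unsigned long long x)
{
	return x & ~(SECTORSIZE - 1);
}

/* pre-fix math: round down the remainder after subtracting bytes copied */
static unsigned long long release_old(unsigned long long reserved,
				      unsigned long long copied)
{
	return round_down_sector(reserved - copied);
}

/* post-fix math: keep exactly the sectors that were dirtied */
static unsigned long long release_new(unsigned long long reserved,
				      unsigned long long dirty_sectors)
{
	return reserved - dirty_sectors * SECTORSIZE;
}
```

With reserved = 3 * 4096, copied = 200 and dirty_sectors = 2, release_new()
gives 4096 while release_old() gives 8192, releasing a sector that is still
dirty.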



[PATCH] Btrfs: avoid deadlocks during reservations in btrfs_truncate_block

2016-07-21 Thread Chris Mason
From: Josef Bacik 

The new enospc code makes it possible to deadlock if we don't use
FLUSH_LIMIT during reservations inside a transaction.  This enforces
the correct flush type to avoid both deadlocks and assertions.

Signed-off-by: Chris Mason 
Signed-off-by: Josef Bacik 
---
 fs/btrfs/extent-tree.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 3a129c4..9fcb8c9 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5939,10 +5939,15 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes)
 * the middle of a transaction commit.  We also don't need the delalloc
 * mutex since we won't race with anybody.  We need this mostly to make
 * lockdep shut its filthy mouth.
+*
+* If we have a transaction open (can happen if we call truncate_block
+* from truncate), then we need FLUSH_LIMIT so we don't deadlock.
 */
if (btrfs_is_free_space_inode(inode)) {
flush = BTRFS_RESERVE_NO_FLUSH;
delalloc_lock = false;
+   } else if (current->journal_info) {
+   flush = BTRFS_RESERVE_FLUSH_LIMIT;
}
 
if (flush != BTRFS_RESERVE_NO_FLUSH &&
-- 
2.8.0.rc2



Re: [PATCH] generic/371: run write(2) and fallocate(2) in parallel

2016-07-21 Thread Eryu Guan
On Thu, Jul 21, 2016 at 03:30:25PM +0800, Wang Xiaoguang wrote:
> Currently in btrfs, there is something wrong with fallocate(2)'s data
> space reservation: it'll temporarily occupy more data space than it
> really needs, which in turn will impact other operations' data requests.
> 
> In this test case, it runs write(2) and fallocate(2) in parallel, and the
> total needed data space for these two operations doesn't exceed the whole
> fs' free data space, to see whether we will get any unexpected ENOSPC error.
> 
> Signed-off-by: Wang Xiaoguang 
> ---
>  tests/generic/371 | 119 
> ++
>  tests/generic/371.out |   2 +
>  tests/generic/group   |   1 +
>  3 files changed, 122 insertions(+)
>  create mode 100755 tests/generic/371
>  create mode 100644 tests/generic/371.out
> 
> diff --git a/tests/generic/371 b/tests/generic/371
> new file mode 100755
> index 000..b85327a
> --- /dev/null
> +++ b/tests/generic/371
> @@ -0,0 +1,119 @@
> +#! /bin/bash
> +# FS QA Test 371
> +#
> +# Run write(2) and fallocate(2) in parallel, where the total data space
> +# needed for these operations doesn't exceed the whole fs free data space,
> +# to see whether we will get any unexpected ENOSPC error.
> +#
> +#---
> +# Copyright (c) 2016 Fujitsu.  All Rights Reserved.
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#---
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1 # failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> + cd /
> + rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +
> +# remove previous $seqres.full before test
> +rm -f $seqres.full
> +
> +# Modify as appropriate.
> +_supported_fs generic
> +_supported_os Linux
> +_require_scratch

Need '_require_xfs_io_command "falloc"', otherwise the test fails on ext2/ext3,
because they don't support fallocate(2).

> +
> +_scratch_mkfs_sized $((256 * 1024 * 1024)) >> $seqres.full 2>&1
> +_scratch_mount
> +
> +testfile1=$SCRATCH_MNT/testfile1
> +testfile2=$SCRATCH_MNT/testfile2
> +
> +write_work()
> +{
> + rm -f $testfile1
> + while [ 1 ]; do
> + $XFS_IO_PROG -f -c "pwrite 0 80M" $testfile1 >>$seqres.full 2>&1
> + grep "No space left on device" $seqres.full >/dev/null
> + if [ $? -eq 0 ]; then
> + echo "unexpected ENOSPC error occurs"
> + exit 1
> + fi

No need to grep for the error message; just redirect stdout to /dev/null and
any error will appear on stderr, which will break the golden image.

And $seqres.full is where we dump logs for debugging purposes; tests should
dump the output they need to check somewhere like $tmp.
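A minimal sketch of what this suggestion could look like, with each iteration's
output captured under $tmp instead of grepping the shared $seqres.full (here
`dd` merely stands in for the $XFS_IO_PROG call, so the names are illustrative,
not the final test code):

```shell
# Hypothetical reworked write loop: each iteration's stderr goes to its own
# $tmp file, so only this iteration's ENOSPC message can match.
tmp=/tmp/demo.$$

write_once() {
	# stand-in for: $XFS_IO_PROG -f -c "pwrite 0 80M" $testfile1
	dd if=/dev/zero of=/dev/null bs=1k count=1 >/dev/null 2>$tmp.write
	if grep -q "No space left on device" $tmp.write; then
		echo "unexpected ENOSPC error occurs"
		return 1
	fi
	return 0
}

write_once
rm -f $tmp.*
```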

> + rm -f $testfile1
> + done
> +}
> +
> +fallocate_work()
> +{
> + rm -f  $testfile2
> + while [ 1 ]; do
> + $XFS_IO_PROG -f -c "falloc 0 80M" $testfile2 >>$seqres.full 2>&1
> + grep "No space left on device" $seqres.full >/dev/null
> + if [ $? -eq 0 ]; then
> + echo "unexpected ENOSPC error occurs"
> + exit 1
> + fi
> + rm -f $testfile2
> + done
> +}
> +
> +run_time=$((180 * $TIME_FACTOR))

180s is too long; I can reproduce it in around 10s on my test VM. Just loop
100 times for each operation (pwrite and falloc).
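The suggested change, sketched (the iteration count follows the review; the
loop body is a placeholder for the real pwrite/falloc call):

```shell
# Bound each worker by iteration count instead of wall-clock time; the
# reviewer reproduced the bug in ~10s with this bound.
loops=100
i=0
while [ $i -lt $loops ]; do
	: # $XFS_IO_PROG -f -c "pwrite 0 80M" ... would go here
	i=$((i + 1))
done
echo "completed $i iterations"
```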

> +write_work &
> +write_work_pid=$!
> +fallocate_work &
> +fallocate_work_pid=$!
> +
> +for ((elapsed_time = 0; elapsed_time < run_time; elapsed_time += 5)); do
> + kill -0 $write_work_pid >/dev/null 2>&1
> + if [ $? -ne 0 ]; then
> + kill $fallocate_work_pid >/dev/null 2>&1
> + break
> + fi
> +
> + kill -0 $fallocate_work_pid >/dev/null 2>&1
> + if [ $? -ne 0 ]; then
> + kill $write_work_pid >/dev/null 2>&1
> + break
> + fi
> + sleep 5
> +done
> +
> +kill $fallocate_work_pid >/dev/null 2>&1
> +kill $write_work_pid >/dev/null 2>&1
> +wait
> +
> +# wait un-finished xfs_io
> +while ps aux | grep "xfs_io" | grep -qv grep; do
> + sleep 1
> +done

And this seems 

Re: mount btrfs takes 30 minutes, btrfs check runs out of memory

2016-07-21 Thread Qu Wenruo



At 07/21/2016 04:13 PM, John Ettedgui wrote:

On Thu, Jul 21, 2016 at 1:10 AM Qu Wenruo wrote:

Thanks for the info, pretty helpful.

After a simple analysis, the defrag did do a pretty good job.

-----------------------------------------------------------------
           | Avg extent size | Median extent size | Data extents |
-----------------------------------------------------------------
Predefrag  | 2.6M            | 512K               | 1043589      |
Postdefrag | 7.4M            | 80K                | 359823       |
-----------------------------------------------------------------

Defrag reduced the number of extents to 34%!

Quite awesome.

I still see quite a lot of small extents (in fact, there are more small
extents after defrag), so I assume there is room for more improvement.

But considering that mount time is only affected by the number of extents
(data and metadata, and the amount of metadata is not affected by defrag),
the improvement is already quite obvious.

Much better than I expected.

Thanks,
Qu

I'm glad to be of help.
Is there anything else you'd like me to try?
I don't have any non-defragmented partitions anymore, but you already
got that information so that should be ok.

Thank you,
John


No more.

The dump is already good enough for me to dig for some time.

We don't usually get such a large extent tree dump from a real-world use case.

It would help us in several ways, from determining how fragmented a block
group is to determining whether a defrag will help.


Thanks,
Qu




[PATCH v2] btrfs: Update quick and auto tag for btrfs group

2016-07-21 Thread Qu Wenruo
Update the following quick/auto tags based on their execution times:
007
050
100
101

Two systems are used to determine their execution times.
One is backed by SATA spinning rust, whose maximum R/W speed is
about 100MB/s, i.e. modern desktop performance. (VM1)

The other is a VM inside an OpenStack pool, with stronger CPU and
memory performance but high-latency storage.
Maximum R/W speed is around 150MB/s, though latency is much higher than
a normal HDD's. (VM2)

The 'quick' standard is a little stricter: a test is tagged 'quick' only
when both systems pass it within 30s (+/- 10%). 'auto' is less strict:
any test that passes within 5min (+/- 10%) stays in the 'auto' group.

Other test cases that don't fit either standard on both systems will not
be modified.

Execution time result: (Unit: seconds)
--
Test case No. | VM1| VM2  | Modification |
--
007   | 4  | 2| +quick   |
050   | 4  | 13   | +quick   |
100   | 57 | 151  | -quick   |
101   | 45 | 59   | -quick   |
--

Signed-off-by: Qu Wenruo 
---
v2:
  Keep btrfs/011 into 'auto' group, to keep the coverage.
---
 tests/btrfs/group | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tests/btrfs/group b/tests/btrfs/group
index a21a80a..f2e8ab2 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -9,7 +9,7 @@
 004 auto rw metadata
 005 auto defrag
 006 auto quick
-007 auto rw metadata send
+007 auto quick rw metadata send
 008 auto quick send
 009 auto quick subvol
 010 auto quick defrag
@@ -52,7 +52,7 @@
 047 auto quick send
 048 auto quick
 049 auto quick
-050 auto send
+050 auto quick send
 051 auto quick send
 052 auto quick clone
 053 auto quick send
@@ -102,8 +102,8 @@
 097 auto quick send clone
 098 auto quick metadata clone
 099 auto quick qgroup
-100 auto quick replace
-101 auto quick replace
+100 auto replace
+101 auto replace
 102 auto quick metadata enospc
 103 auto quick clone compress
 104 auto qgroup
-- 
2.9.0





Re: mount btrfs takes 30 minutes, btrfs check runs out of memory

2016-07-21 Thread Qu Wenruo

Thanks for the info, pretty helpful.

After a simple analysis, the defrag did do a pretty good job.

-----------------------------------------------------------------
           | Avg extent size | Median extent size | Data extents |
-----------------------------------------------------------------
Predefrag  | 2.6M            | 512K               | 1043589      |
Postdefrag | 7.4M            | 80K                | 359823       |
-----------------------------------------------------------------

Defrag reduced the number of extents to 34%!

Quite awesome.

I still see quite a lot of small extents (in fact, there are more small
extents after defrag), so I assume there is room for more improvement.


But considering that mount time is only affected by the number of extents
(data and metadata, and the amount of metadata is not affected by defrag),
the improvement is already quite obvious.

Much better than I expected.

Thanks,
Qu

At 07/20/2016 06:44 PM, John Ettedgui wrote:

On Mon, Jul 18, 2016 at 2:07 AM Qu Wenruo wrote:

Yes, to compare the extent size and verify my assumption.

But I'm afraid you don't have any fs with that slow mount time any more.


Here are 2 links for the information you requested, I've gzipped each
file as it was quite big...

https://mega.nz/#!QhQSHBhb!RwN3kDBK6ZOkq3e5UkNhzB0XnbfgZgql4c5fvjfDq1w

https://mega.nz/#!M5gVAbLA!S_TxIls1_q6MqMVlCRK5XxTXifXPE76tdJWsf5XRxYE


I didn't look at their content, but just comparing their size, there is
quite a difference there.

Thank you,
John





Re: [PATCH] btrfs: Update quick and auto tag for btrfs group

2016-07-21 Thread Qu Wenruo



At 07/21/2016 03:47 PM, Eryu Guan wrote:

On Thu, Jul 21, 2016 at 10:06:07AM +0800, Qu Wenruo wrote:

Update the following quick/auto tags based on their execution times:
007
011
050
100
101

Two systems are used to determine their execution times.
One is backed by SATA spinning rust, whose maximum R/W speed is
about 100MB/s, i.e. modern desktop performance. (VM1)

The other is a VM inside an OpenStack pool, with stronger CPU and
memory performance but high-latency storage.
Maximum R/W speed is around 150MB/s, though latency is much higher than
a normal HDD's. (VM2)

The 'quick' standard is a little stricter: a test is tagged 'quick' only
when both systems pass it within 30s (+/- 10%). 'auto' is less strict:
any test that passes within 5min (+/- 10%) stays in the 'auto' group.

Other test cases that don't fit either standard on both systems will not
be modified.

Execution time result: (Unit: seconds)
--
Test case No. | VM1| VM2  | Modification |
--
007   | 4  | 2| +quick   |
011   | 669| 1748 | -auto|
050   | 4  | 13   | +quick   |
100   | 57 | 151  | -quick   |
101   | 45 | 59   | -quick   |
--


Hmm, I'm a bit hesitant to take this. We usually try to make new tests
run quickly enough to fit the auto/quick groups, or reduce the run time of
existing tests if they take too long (e.g. 86c1b55 xfs/042:
reduce runtime of the test).

But we usually don't remove 'auto' group from a test just because of
test time, so we don't lose test coverage, especially when the test is
potent. (I think btrfs/011 is one of these potent tests, it finds
hang/crash/failure from time to time. BTW, btrfs/011 took 210s on my
test vm, which has 4vcpu and 8G memory.)

I think the "<30s <5m" rule is more like a guidance, not a hard rule.

But 007 and 050 clearly belong to 'quick' group.

Thanks,
Eryu


Right, 011 is really helpful, so I'm OK to keep the 'auto' tag.

I'll update the patch soon.

Thanks,
Qu




Re: [PATCH] btrfs: Update quick and auto tag for btrfs group

2016-07-21 Thread Eryu Guan
On Thu, Jul 21, 2016 at 10:06:07AM +0800, Qu Wenruo wrote:
> Update the following quick/auto tags based on their execution times:
> 007
> 011
> 050
> 100
> 101
> 
> Two systems are used to determine their execution times.
> One is backed by SATA spinning rust, whose maximum R/W speed is
> about 100MB/s, i.e. modern desktop performance. (VM1)
> 
> The other is a VM inside an OpenStack pool, with stronger CPU and
> memory performance but high-latency storage.
> Maximum R/W speed is around 150MB/s, though latency is much higher than
> a normal HDD's. (VM2)
> 
> The 'quick' standard is a little stricter: a test is tagged 'quick' only
> when both systems pass it within 30s (+/- 10%). 'auto' is less strict:
> any test that passes within 5min (+/- 10%) stays in the 'auto' group.
> 
> Other test cases that don't fit either standard on both systems will not
> be modified.
> 
> Execution time result: (Unit: seconds)
> --
> Test case No. | VM1| VM2  | Modification |
> --
> 007   | 4  | 2| +quick   |
> 011   | 669| 1748 | -auto|
> 050   | 4  | 13   | +quick   |
> 100   | 57 | 151  | -quick   |
> 101   | 45 | 59   | -quick   |
> --

Hmm, I'm a bit hesitant to take this. We usually try to make new tests
run quickly enough to fit the auto/quick groups, or reduce the run time of
existing tests if they take too long (e.g. 86c1b55 xfs/042:
reduce runtime of the test).

But we usually don't remove 'auto' group from a test just because of
test time, so we don't lose test coverage, especially when the test is
potent. (I think btrfs/011 is one of these potent tests, it finds
hang/crash/failure from time to time. BTW, btrfs/011 took 210s on my
test vm, which has 4vcpu and 8G memory.)

I think the "<30s <5m" rule is more like a guidance, not a hard rule.

But 007 and 050 clearly belong to 'quick' group.

Thanks,
Eryu


[PATCH] generic/371: run write(2) and fallocate(2) in parallel

2016-07-21 Thread Wang Xiaoguang
Currently in btrfs, there is something wrong with fallocate(2)'s data
space reservation: it'll temporarily occupy more data space than it
really needs, which in turn will impact other operations' data requests.

In this test case, it runs write(2) and fallocate(2) in parallel, and the
total data space needed for these two operations doesn't exceed the whole
fs free data space, to see whether we will get any unexpected ENOSPC error.

Signed-off-by: Wang Xiaoguang 
---
 tests/generic/371 | 119 ++
 tests/generic/371.out |   2 +
 tests/generic/group   |   1 +
 3 files changed, 122 insertions(+)
 create mode 100755 tests/generic/371
 create mode 100644 tests/generic/371.out

diff --git a/tests/generic/371 b/tests/generic/371
new file mode 100755
index 000..b85327a
--- /dev/null
+++ b/tests/generic/371
@@ -0,0 +1,119 @@
+#! /bin/bash
+# FS QA Test 371
+#
+# Run write(2) and fallocate(2) in parallel, where the total data space
+# needed for these operations doesn't exceed the whole fs free data space,
+# to see whether we will get any unexpected ENOSPC error.
+#
+#---
+# Copyright (c) 2016 Fujitsu.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# Modify as appropriate.
+_supported_fs generic
+_supported_os Linux
+_require_scratch
+
+_scratch_mkfs_sized $((256 * 1024 * 1024)) >> $seqres.full 2>&1
+_scratch_mount
+
+testfile1=$SCRATCH_MNT/testfile1
+testfile2=$SCRATCH_MNT/testfile2
+
+write_work()
+{
+   rm -f $testfile1
+   while [ 1 ]; do
+   $XFS_IO_PROG -f -c "pwrite 0 80M" $testfile1 >>$seqres.full 2>&1
+   grep "No space left on device" $seqres.full >/dev/null
+   if [ $? -eq 0 ]; then
+   echo "unexpected ENOSPC error occurs"
+   exit 1
+   fi
+   rm -f $testfile1
+   done
+}
+
+fallocate_work()
+{
+   rm -f  $testfile2
+   while [ 1 ]; do
+   $XFS_IO_PROG -f -c "falloc 0 80M" $testfile2 >>$seqres.full 2>&1
+   grep "No space left on device" $seqres.full >/dev/null
+   if [ $? -eq 0 ]; then
+   echo "unexpected ENOSPC error occurs"
+   exit 1
+   fi
+   rm -f $testfile2
+   done
+}
+
+run_time=$((180 * $TIME_FACTOR))
+write_work &
+write_work_pid=$!
+fallocate_work &
+fallocate_work_pid=$!
+
+for ((elapsed_time = 0; elapsed_time < run_time; elapsed_time += 5)); do
+   kill -0 $write_work_pid >/dev/null 2>&1
+   if [ $? -ne 0 ]; then
+   kill $fallocate_work_pid >/dev/null 2>&1
+   break
+   fi
+
+   kill -0 $fallocate_work_pid >/dev/null 2>&1
+   if [ $? -ne 0 ]; then
+   kill $write_work_pid >/dev/null 2>&1
+   break
+   fi
+   sleep 5
+done
+
+kill $fallocate_work_pid >/dev/null 2>&1
+kill $write_work_pid >/dev/null 2>&1
+wait
+
+# wait un-finished xfs_io
+while ps aux | grep "xfs_io" | grep -qv grep; do
+   sleep 1
+done
+
+echo "Silence is golden"
+status=0
+exit
diff --git a/tests/generic/371.out b/tests/generic/371.out
new file mode 100644
index 000..22ec8a2
--- /dev/null
+++ b/tests/generic/371.out
@@ -0,0 +1,2 @@
+QA output created by 371
+Silence is golden
diff --git a/tests/generic/group b/tests/generic/group
index 97ecb65..3d4a802 100644
--- a/tests/generic/group
+++ b/tests/generic/group
@@ -373,3 +373,4 @@
 368 auto quick richacl
 369 auto quick richacl
 370 auto quick richacl
+371 auto enospc prealloc stress
-- 
2.9.0





Re: [PATCH] btrfs: test whether fallocate(2) can preallocate half of the whole fs space

2016-07-21 Thread Wang Xiaoguang

hello,

On 07/21/2016 03:00 AM, Christoph Hellwig wrote:

On Wed, Jul 20, 2016 at 03:34:50PM +0800, Wang Xiaoguang wrote:

Currently in btrfs, there is something wrong with data space reservation.
For example, if we try to preallocate more than half of the whole fs space,
ENOSPC will occur, but the fs actually still has free space to satisfy this
request.

To easily reproduce this bug, this test case needs the fs to be in mixed
mode (btrfs-specific), so put this test case in the btrfs group, not generic.

But the actual test isn't btrfs specific, and other file systems might
have issues in this area as well.  Please add it to the generic group
instead.

OK, new version will be sent soon.
ext4 & xfs pass the test, btrfs fails.

Regards,
Xiaoguang Wang




Re: [PATCH] btrfs: test whether fallocate(2) can preallocate half of the whole fs space

2016-07-21 Thread Wang Xiaoguang

hello,

On 07/20/2016 05:25 PM, Filipe Manana wrote:

On Wed, Jul 20, 2016 at 8:34 AM, Wang Xiaoguang
 wrote:

Currently in btrfs, there is something wrong with data space reservation.
For example, if we try to preallocate more than half of the whole fs space,
ENOSPC will occur, but the fs actually still has free space to satisfy this
request.

To easily reproduce this bug, this test case needs the fs to be in mixed
mode (btrfs-specific), so put this test case in the btrfs group, not generic.

Why does it need to be in mixed mode (why is it easier)? As far as I
can see from the corresponding btrfs patch (and its description),
the problem affects non-mixed mode, therefore it could be a generic test.

In btrfs mixed mode, as you know, data and metadata share the same
btrfs_space_info. In btrfs_fallocate(), if we try to pre-allocate 64M of
data, it first makes bytes_may_use += 64M; later btrfs_reserve_extent()
only makes bytes_reserved += 64M, but does not decrease bytes_may_use.
Assume our fs is 128M: now bytes_may_use + bytes_reserved already equals
the whole 128M, so any later transaction operation in btrfs_fallocate()
will fail with ENOSPC. But note, bytes_may_use is only decreased at the
end of btrfs_fallocate(). So I think this bug is easy to reproduce in
mixed mode :)
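The arithmetic above, as a small shell walk-through (figures are from this
mail, in MiB; the variable names only mirror the btrfs_space_info fields, and
the 1M metadata reservation is an assumed stand-in for "any later transaction
operation"):

```shell
# Mixed-mode accounting during btrfs_fallocate(), units in MiB.
total=128
bytes_may_use=0
bytes_reserved=0

bytes_may_use=$((bytes_may_use + 64))     # fallocate reserves 64M up front
bytes_reserved=$((bytes_reserved + 64))   # btrfs_reserve_extent() accounts it
                                          # again; bytes_may_use not yet decreased

meta=1   # any later transaction reservation, however small (assumed 1M here)
if [ $((bytes_may_use + bytes_reserved + meta)) -gt $total ]; then
	echo "ENOSPC: accounted $((bytes_may_use + bytes_reserved + meta))M > ${total}M"
fi
```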


For non-mixed mode, still assume the fs is 128M. For data's
btrfs_space_info, before we call btrfs_free_reserved_data_space() at the
end of btrfs_fallocate(), bytes_may_use += 64M and bytes_reserved += 64M,
so bytes_may_use + bytes_reserved reaches 128M; but since there is no
other process allocating data space, there will still be no ENOSPC error.

I have rewritten the test case to do write(2) and fallocate(2) in
parallel and put it in the generic group, thanks.


Regards,
Xiaoguang Wang




Signed-off-by: Wang Xiaoguang 
---
  tests/btrfs/127 | 61 +
  tests/btrfs/127.out |  2 ++
  tests/btrfs/group   |  1 +
  3 files changed, 64 insertions(+)
  create mode 100755 tests/btrfs/127
  create mode 100644 tests/btrfs/127.out

diff --git a/tests/btrfs/127 b/tests/btrfs/127
new file mode 100755
index 000..f95c72f
--- /dev/null
+++ b/tests/btrfs/127
@@ -0,0 +1,61 @@
+#! /bin/bash
+# FS QA Test 127
+#
+# Test whether fallocate(2) can preallocate half of the whole fs space.
+#
+#---
+# Copyright (c) 2016 Fujitsu.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+
+MKFS_OPTIONS="-M"

This now overrides any specific mkfs options specified in the command
line, doesn't it?


+_scratch_mkfs_sized $((256 * 1024 * 1024)) >> $seqres.full 2>&1
+_scratch_mount
+
+$XFS_IO_PROG -f -c "falloc 0 128M" $SCRATCH_MNT/testfile | _filter_xfs_io

No need for the filter here.


+
+echo "Silence is golden"
+status=0
+exit
diff --git a/tests/btrfs/127.out b/tests/btrfs/127.out
new file mode 100644
index 000..0af84c0
--- /dev/null
+++ b/tests/btrfs/127.out
@@ -0,0 +1,2 @@
+QA output created by 127
+Silence is golden
diff --git a/tests/btrfs/group b/tests/btrfs/group
index a21a80a..4c3ac00 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -129,3 +129,4 @@
  124 auto replace
  125 auto replace
  126 auto quick qgroup
+127 auto quick metadata enospc
--
2.9.0



--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

