Re: [PATCH] fstests: generic: Test SHARED flag of fiemap ioctl before and after sync

2016-05-12 Thread Darrick J. Wong
On Fri, May 13, 2016 at 09:52:52AM +0800, Qu Wenruo wrote:
> The test case checks the SHARED flag returned by the fiemap ioctl on
> reflinked files before and after sync.
> 
> Normally the SHARED flag won't change just because of a sync.
> 
> But btrfs doesn't handle the SHARED flag well: when answering fiemap it
> only consults metadata already committed to disk and ignores pending
> delayed extent tree (reverse extent search tree) modifications.
> 
> So btrfs will not return the correct SHARED flag on reflinked files until
> a sync commits all metadata.
> 
> This test case checks exactly that.
> 
> Signed-off-by: Qu Wenruo 
> --
> And of course, xfs handles it quite well. Nice work Darrick.
> Also, the test case needs the new infrastructure introduced in the
> previous generic/352 test case.
> ---
>  tests/generic/353 | 86 +++
>  tests/generic/353.out |  9 ++
>  tests/generic/group   |  1 +
>  3 files changed, 96 insertions(+)
>  create mode 100755 tests/generic/353
>  create mode 100644 tests/generic/353.out
> 
> diff --git a/tests/generic/353 b/tests/generic/353
> new file mode 100755
> index 000..1e9117e
> --- /dev/null
> +++ b/tests/generic/353
> @@ -0,0 +1,86 @@
> +#! /bin/bash
> +# FS QA Test 353
> +#
> +# Check if the fiemap ioctl returns the correct SHARED flag on a
> +# reflinked file before and after syncing the fs
> +#
> +# Btrfs has a bug in checking shared extents: it can only handle metadata
> +# already committed to disk, not delayed extent tree modifications.
> +# This causes the SHARED flag to appear only after sync.

I noticed this a while ago, but figured it was just btrfs being btrfs.  Ho hum.

Thanks for writing a test and getting the problem fixed.

> +#
> +#---
> +# Copyright (c) 2016 Fujitsu. All Rights Reserved.
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#---
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1 # failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> + cd /
> + rm -f $tmp.*

tmp isn't used for anything in this testcase.

> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +. ./common/reflink
> +. ./common/punch
> +
> +# remove previous $seqres.full before test
> +rm -f $seqres.full
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs generic
> +_supported_os Linux
> +_require_scratch_reflink
> +_require_fiemap
> +
> +_scratch_mkfs > /dev/null 2>&1
> +_scratch_mount
> +
> +blocksize=64k
> +file1="$SCRATCH_MNT/file1"
> +file2="$SCRATCH_MNT/file2"
> +
> +# write the initial file
> +_pwrite_byte 0xcdcdcdcd 0 $blocksize $file1 | _filter_xfs_io
> +
> +# reflink initial file
> +_reflink_range $file1 0 $file2 0 $blocksize | _filter_xfs_io
> +
> +# check their fiemap to make sure it's correct
> +$XFS_IO_PROG -c "fiemap -v" $file1 | _filter_fiemap_flags
> +$XFS_IO_PROG -c "fiemap -v" $file2 | _filter_fiemap_flags
> +
> +# sync and recheck, to make sure the fiemap doesn't change just
> +# due to sync
> +sync
> +$XFS_IO_PROG -c "fiemap -v" $file1 | _filter_fiemap_flags
> +$XFS_IO_PROG -c "fiemap -v" $file2 | _filter_fiemap_flags

Nowadays, when I write a test that prints similar output one after the other I
will also write a comment to the output to distinguish the two cases, e.g.:

+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
fiemap before sync
+0: [0..127]: 0x2001
+0: [0..127]: 0x2001
fiemap after sync
+0: [0..127]: 0x2001
+0: [0..127]: 0x2001

This way when a bunch of tests regress some months later it's easier
for me to relearn what's going on.

(I wasn't always good at doing that.)
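Something like this in the test would do it (an untested sketch of the
suggested change):

    echo "fiemap before sync"
    $XFS_IO_PROG -c "fiemap -v" $file1 | _filter_fiemap_flags
    $XFS_IO_PROG -c "fiemap -v" $file2 | _filter_fiemap_flags
    sync
    echo "fiemap after sync"
    $XFS_IO_PROG -c "fiemap -v" $file1 | _filter_fiemap_flags
    $XFS_IO_PROG -c "fiemap -v" $file2 | _filter_fiemap_flags

(For reference, the expected flag value 0x2001 in the golden output is
FIEMAP_EXTENT_LAST | FIEMAP_EXTENT_SHARED.)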

Otherwise, everything looks ok.

--D

> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/generic/353.out b/tests/generic/353.out
> new file mode 100644
> index 000..0cd8981
> --- /dev/null
> +++ b/tests/generic/353.out
> @@ -0,0 +1,9 @@
> +QA output created by 353
> +wrote 65536/65536 bytes at offset 0
> +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +linked 65536/65536 bytes at offset 0
> +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)

Re: BTRFS Data at Rest File Corruption

2016-05-12 Thread Richard A. Lochner
Chris,

See notes inline.

On Thu, 2016-05-12 at 19:41 -0600, Chris Murphy wrote:
> On Thu, May 12, 2016 at 11:49 AM, Richard A. Lochner wrote:
> 
> > 
> > I suspected, and I still suspect that the error occurred upon a
> > metadata update that corrupted the checksum for the file, probably
> > due
> > to silent memory corruption.  If the checksum was silently
> > corrupted,
> > it would be simply written to both drives causing this type of
> > error.
> Metadata is checksummed independently of data. So if the data isn't
> updated, its checksum doesn't change, only metadata checksum is
> changed.
> > 
> > 
> > btrfs dmesg(s):
> > 
> > [16510.334020] BTRFS warning (device sdb1): checksum error at
> > logical
> > 3037444042752 on dev /dev/sdb1, sector 4988789496, root 259, inode
> > 1437377, offset 75754369024, length 4096, links 1 (path:
> > Rick/sda4.img)
> > [16510.334043] BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr
> > 0, rd
> > 0, flush 0, corrupt 5, gen 0
> > [16510.345662] BTRFS error (device sdb1): unable to fixup (regular)
> > error at logical 3037444042752 on dev /dev/sdb1
> > 
> > [17606.978439] BTRFS warning (device sdb1): checksum error at
> > logical
> > 3037444042752 on dev /dev/sdc1, sector 4988750584, root 259, inode
> > 1437377, offset 75754369024, length 4096, links 1 (path:
> > Rick/sda4.img)
> > [17606.978460] BTRFS error (device sdb1): bdev /dev/sdc1 errs: wr
> > 0, rd
> > 13, flush 0, corrupt 4, gen 0
> > [17606.989497] BTRFS error (device sdb1): unable to fixup (regular)
> > error at logical 3037444042752 on dev /dev/sdc1
> This is confusing. Are these from the same boot? The later time has a
> lower corrupt count. Can you just 'dd if=sda4.img of=/dev/null' and report
> all (new) messages in dmesg? It seems to me the problems on both devices
> should show pretty much the same monotonic time.

My apologies, they were from different boots.  After the dd, I get
these:

[109479.550836] BTRFS warning (device sdb1): csum failed ino 1437377
off 75754369024 csum 1689728329 expected csum 2165338402
[109479.596626] BTRFS warning (device sdb1): csum failed ino 1437377
off 75754369024 csum 1689728329 expected csum 2165338402
[109479.601969] BTRFS warning (device sdb1): csum failed ino 1437377
off 75754369024 csum 1689728329 expected csum 2165338402
[109479.602189] BTRFS warning (device sdb1): csum failed ino 1437377
off 75754369024 csum 1689728329 expected csum 2165338402
[109479.602323] BTRFS warning (device sdb1): csum failed ino 1437377
off 75754369024 csum 1689728329 expected csum 2165338402
> 
> Also what do you get for these for each device:
> 
> smartctl -l scterc /dev/sdX
> cat /sys/block/sdX/device/timeout
> 
# smartctl -l scterc  /dev/sdb
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.4.8-300.fc23.x86_64] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

SCT Error Recovery Control:
   Read: 70 (7.0 seconds)
  Write: 70 (7.0 seconds)

# smartctl -l scterc  /dev/sdc
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.4.8-300.fc23.x86_64] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

SCT Error Recovery Control:
   Read: 70 (7.0 seconds)
  Write: 70 (7.0 seconds)

# cat /sys/block/sdb/device/timeout
30
# cat /sys/block/sdc/device/timeout
30
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 3/3] btrfs-progs: autogen: Don't show success message on failure

2016-05-12 Thread Zhao Lei
When autogen.sh fails, the success message still appears in the output:
 # ./autogen.sh
 ...
 configure.ac:131: error: possibly undefined macro: PKG_CHECK_VAR
  If this token and others are legitimate, please use m4_pattern_allow.
  See the Autoconf documentation.

 Now type './configure' and 'make' to compile.
 #

Fixed by checking the return values of aclocal, autoconf and autoheader.

After the patch:
 # ./autogen.sh
 ...
 configure.ac:132: error: possibly undefined macro: PKG_CHECK_VAR
  If this token and others are legitimate, please use m4_pattern_allow.
  See the Autoconf documentation.
 #

Signed-off-by: Zhao Lei 
---
 autogen.sh | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/autogen.sh b/autogen.sh
index 8b9a9cb..a5f9af2 100755
--- a/autogen.sh
+++ b/autogen.sh
@@ -64,9 +64,10 @@ echo "   automake:   $(automake --version | head -1)"
 chmod +x version.sh
 rm -rf autom4te.cache
 
-aclocal $AL_OPTS
-autoconf $AC_OPTS
-autoheader $AH_OPTS
+aclocal $AL_OPTS &&
+autoconf $AC_OPTS &&
+autoheader $AH_OPTS ||
+exit 1
 
 # it's better to use helper files from automake installation than
 # maintain copies in git tree
-- 
1.8.5.1
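
For reference, the chained form relies on standard shell short-circuiting:
in "a && b && c || d", d runs if any of a, b or c fails. A tiny
illustration (not part of the patch):

    true && true || echo "not printed"
    true && false || echo "printed: a command failed"

so autogen.sh now exits with status 1 as soon as aclocal, autoconf or
autoheader fails.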



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/3] btrfs-progs: autogen: Make the build succeed on CentOS 6 and 7

2016-05-12 Thread Zhao Lei
The btrfs-progs build fails on CentOS 6 and 7:
 #./autogen.sh
 ...
 configure.ac:131: error: possibly undefined macro: PKG_CHECK_VAR
  If this token and others are legitimate, please use m4_pattern_allow.
  See the Autoconf documentation.
 ...

It seems PKG_CHECK_VAR is new in pkg-config 0.28 (24-Jan-2013):
http://redmine.audacious-media-player.org/boards/1/topics/736

And the newest version available for CentOS 7 in the yum repos and on
rpmfind.net is pkgconfig-0.27.1-4.el7:
http://rpmfind.net/linux/rpm2html/search.php?query=pkgconfig=Search+...=centos=

I updated my pkg-config to 0.30, but still hit the above error.
(Maybe it is a problem with my setup.)

To let users on CentOS 6 and 7 build btrfs-progs without further changes,
we can avoid using PKG_CHECK_VAR with the following approach, found in:
https://github.com/audacious-media-player/audacious-plugins/commit/f95ab6f939ecf0d9232b3165f9241d2ea9676b9e
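
For reference, PKG_CHECK_VAR is essentially a configure-time pkg-config
variable query, and the replacement in the diff below runs the same query
directly. For example (the printed path is only illustrative; it varies by
distro):

    $ pkg-config udev --variable=udevdir
    /usr/lib/udev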

Changelog v1->v2:
1: Add AC_SUBST(UDEVDIR)
   Suggested by: Jeff Mahoney 

Signed-off-by: Zhao Lei 
---
 configure.ac | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/configure.ac b/configure.ac
index 4688bc7..c79472c 100644
--- a/configure.ac
+++ b/configure.ac
@@ -128,7 +128,8 @@ PKG_STATIC(UUID_LIBS_STATIC, [uuid])
 PKG_CHECK_MODULES(ZLIB, [zlib])
 PKG_STATIC(ZLIB_LIBS_STATIC, [zlib])
 
-PKG_CHECK_VAR([UDEVDIR], [udev], [udevdir])
+UDEVDIR="$(pkg-config udev --variable=udevdir)"
+AC_SUBST(UDEVDIR)
 
 dnl lzo library does not provide pkg-config, let use classic way
 AC_CHECK_LIB([lzo2], [lzo_version], [
-- 
1.8.5.1



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 0/3] btrfs-progs: autogen: Some compatibility fixes

2016-05-12 Thread Zhao Lei
Changelog v1->v2:
1: btrfs-progs: autogen: Make the build succeed on CentOS 6 and 7
   Add AC_SUBST(UDEVDIR), suggested by: Jeff Mahoney 

Zhao Lei (3):
  btrfs-progs: autogen: Avoid chdir failure on dirnames with blanks
  btrfs-progs: autogen: Make the build succeed on CentOS 6 and 7
  btrfs-progs: autogen: Don't show success message on failure

 autogen.sh   | 9 +
 configure.ac | 3 ++-
 2 files changed, 7 insertions(+), 5 deletions(-)

-- 
1.8.5.1



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/3] btrfs-progs: autogen: Avoid chdir failure on dirnames with blanks

2016-05-12 Thread Zhao Lei
If the source is in a directory whose name contains blanks, such as:
  /var/lib/jenkins/workspace/btrfs progs

autogen.sh fails:
./autogen.sh: line 95: cd: /var/lib/jenkins/workspace/btrfs: No such file or directory

This is fixed by quoting the argument of the cd command.
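
A minimal illustration of the failure (path hypothetical):

    THEDIR='/var/lib/jenkins/workspace/btrfs progs'
    cd $THEDIR     # unquoted: word-split into two arguments, cd fails
    cd "$THEDIR"   # quoted: the whole path is passed as one argument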

Signed-off-by: Zhao Lei 
---
 autogen.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/autogen.sh b/autogen.sh
index 9669850..8b9a9cb 100755
--- a/autogen.sh
+++ b/autogen.sh
@@ -92,7 +92,7 @@ find_autofile config.guess
 find_autofile config.sub
 find_autofile install-sh
 
-cd $THEDIR
+cd "$THEDIR"
 
 echo
 echo "Now type '$srcdir/configure' and 'make' to compile."
-- 
1.8.5.1



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: About in-band dedupe for v4.7

2016-05-12 Thread Qu Wenruo



Wang Shilong wrote on 2016/05/13 11:13 +0800:

Hello Guys,

I am not commenting because of Qu's patch specifically; of course, Qu and
Mark Fasheh do really good work contributing to Btrfs.

Just my two cents!

1) I think right now we should really focus on Btrfs stability; there are
still many bugs hidden inside Btrfs, see Filipe's patches flying here and
there. Unfortunately, I have not seen the number of Btrfs bugs shrink over
these two years, even though we have frozen new features. We started
adopting Btrfs internally some time ago, but it is unfortunately really
unstable.

So why not sit down and make it stable first?


Make sense.

Although it may be contrary to your expectations, in-band dedupe did
expose some existing bugs we had never found before.

And they are all reproducible without in-band dedupe.

Some examples:
1) fiemap bugs
   Not just one, but at least two.
   See the recently submitted generic/352 and 353.

   And the fix is already under testing and may come out soon.

2) inode->outstanding_extents problems.
   Currently we use a hard-coded SZ_128M maximum extent length for
   non-compressed file extents.
   But if we change the limit to a smaller one, for example 128K,
   we get an outstanding_extents leak and tons of warnings.

   Although it doesn't affect the current code, it's still better to fix,
   and we're already testing the fix.

3) Slow backref walks.
   Noted in a comment in backref.c since ancient days, but we didn't
   pay much attention until in-band dedupe and heavy reflink workloads.

Even if it's not that obvious, we are doing our best to stabilize btrfs
during the push of in-band dedupe.

(Though a nitpicking jerk will never see this.)

But you are still quite right in this case; we may have been in a rush to
push it.



2) I am not against new features, but I think we should be really careful
with them now. Especially if a new feature affects the normal write/read
path, I think the following things can be done to make things better:


Although the effect on the normal routines is kept minimal, you're still
right: we lack overall documentation explaining the design, which tries to
reduce the impact on the existing write path.




-> Provide your design and documentation (maybe on the wiki or somewhere
else) so that other guys can really review the design instead of having to
read a bunch of code first. We really need to understand the pros and cons
of the design; also, if there are TODOs, please clarify how we can solve
those problems (or whether that is even possible).


Right, this was already planned, but we were always busy with other fixes.

This will change dramatically in the next two months: since the
modifications to the patchset are now minimal, we have time to create and
improve the documentation.

Thanks for the reminder.



   -> We really need a lot of testing and benchmarking if it affects the
normal write/read path.

   -> I think Chris is nice about merging patches, but I would really argue
that next time we want to merge new features, please make sure at least
two other guys have reviewed the patches.


Thank you!
Shilong


Thanks for your suggestions, they really help a lot!

Thanks,
Qu


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: About in-band dedupe for v4.7

2016-05-12 Thread Wang Shilong
Hello Guys,

I am not commenting because of Qu's patch specifically; of course, Qu and
Mark Fasheh do really good work contributing to Btrfs.

Just my two cents!

1) I think right now we should really focus on Btrfs stability; there are
still many bugs hidden inside Btrfs, see Filipe's patches flying here and
there. Unfortunately, I have not seen the number of Btrfs bugs shrink over
these two years, even though we have frozen new features. We started
adopting Btrfs internally some time ago, but it is unfortunately really
unstable.

So why not sit down and make it stable first?

2) I am not against new features, but I think we should be really careful
with them now. Especially if a new feature affects the normal write/read
path, I think the following things can be done to make things better:

-> Provide your design and documentation (maybe on the wiki or somewhere
else) so that other guys can really review the design instead of having to
read a bunch of code first. We really need to understand the pros and cons
of the design; also, if there are TODOs, please clarify how we can solve
those problems (or whether that is even possible).

   -> We really need a lot of testing and benchmarking if it affects the
normal write/read path.

   -> I think Chris is nice about merging patches, but I would really argue
that next time we want to merge new features, please make sure at least
two other guys have reviewed the patches.


Thank you!
Shilong

On Wed, May 11, 2016 at 6:11 AM, Mark Fasheh  wrote:
> On Tue, May 10, 2016 at 03:19:52PM +0800, Qu Wenruo wrote:
>> Hi, Chris, Josef and David,
>>
>> As merge window for v4.7 is coming, it would be good to hear your
>> ideas about the inband dedupe.
>>
>> We are addressing the ENOSPC problem which Josef pointed out, and we
>> believe the final fix patch would come out at the beginning of the
>> merge window.(Next week)
>
> How about the fiemap performance problem you referenced before? My guess is
> that it happens because you don't coalesce writes into anything larger than
> a page so you're stuck deduping at some silly size like 4k. This in turn
> fragments the files so much that fiemap has a hard time walking backrefs.
>
> I have to check the patches to be sure but perhaps you can tell me whether
> my hunch is correct or not.
>
>
> In fact, I actually asked privately for time to review your dedupe patches,
> but I've been literally so busy cleaning up after the mess you left in your
> last qgroups rewrite I haven't had time.
>
> You literally broke qgroups in almost every spot that matters. In some cases
> (drop_snapshot) you tore out working code and left in a /* TODO */ comment
> for someone else to complete.  Snapshot create was so trivially and
> completely broken by your changes that weeks later, I'm still hunting a
> solution which doesn't involve adding an extra _commit_ to our commit.  This
> is a MASSIVE regression from where we were before.
>
> IMHO, you should not be trusted with large features or rewrites until you
> can demonstrate:
>
>  - A willingness to *completely* solve the problem you are trying to 'fix',
>not do half the job which someone else will have to complete for you.
>
>  - Actual testing. The snapshot bug I reference above exists purely because
>nobody created a snapshot inside of one and checked the qgroup numbers!
>
> Sorry to be so harsh.
>--Mark
>
> --
> Mark Fasheh
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] fstests: generic: Test SHARED flag of fiemap ioctl before and after sync

2016-05-12 Thread Qu Wenruo
The test case checks the SHARED flag returned by the fiemap ioctl on
reflinked files before and after sync.

Normally the SHARED flag won't change just because of a sync.

But btrfs doesn't handle the SHARED flag well: when answering fiemap it
only consults metadata already committed to disk and ignores pending
delayed extent tree (reverse extent search tree) modifications.

So btrfs will not return the correct SHARED flag on reflinked files until
a sync commits all metadata.

This test case checks exactly that.

Signed-off-by: Qu Wenruo 
--
And of course, xfs handles it quite well. Nice work Darrick.
Also, the test case needs the new infrastructure introduced in the
previous generic/352 test case.
---
 tests/generic/353 | 86 +++
 tests/generic/353.out |  9 ++
 tests/generic/group   |  1 +
 3 files changed, 96 insertions(+)
 create mode 100755 tests/generic/353
 create mode 100644 tests/generic/353.out

diff --git a/tests/generic/353 b/tests/generic/353
new file mode 100755
index 000..1e9117e
--- /dev/null
+++ b/tests/generic/353
@@ -0,0 +1,86 @@
+#! /bin/bash
+# FS QA Test 353
+#
+# Check if the fiemap ioctl returns the correct SHARED flag on a
+# reflinked file before and after syncing the fs
+#
+# Btrfs has a bug in checking shared extents: it can only handle metadata
+# already committed to disk, not delayed extent tree modifications.
+# This causes the SHARED flag to appear only after sync.
+#
+#---
+# Copyright (c) 2016 Fujitsu. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/reflink
+. ./common/punch
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs generic
+_supported_os Linux
+_require_scratch_reflink
+_require_fiemap
+
+_scratch_mkfs > /dev/null 2>&1
+_scratch_mount
+
+blocksize=64k
+file1="$SCRATCH_MNT/file1"
+file2="$SCRATCH_MNT/file2"
+
+# write the initial file
+_pwrite_byte 0xcdcdcdcd 0 $blocksize $file1 | _filter_xfs_io
+
+# reflink initial file
+_reflink_range $file1 0 $file2 0 $blocksize | _filter_xfs_io
+
+# check their fiemap to make sure it's correct
+$XFS_IO_PROG -c "fiemap -v" $file1 | _filter_fiemap_flags
+$XFS_IO_PROG -c "fiemap -v" $file2 | _filter_fiemap_flags
+
+# sync and recheck, to make sure the fiemap doesn't change just
+# due to sync
+sync
+$XFS_IO_PROG -c "fiemap -v" $file1 | _filter_fiemap_flags
+$XFS_IO_PROG -c "fiemap -v" $file2 | _filter_fiemap_flags
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/353.out b/tests/generic/353.out
new file mode 100644
index 000..0cd8981
--- /dev/null
+++ b/tests/generic/353.out
@@ -0,0 +1,9 @@
+QA output created by 353
+wrote 65536/65536 bytes at offset 0
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+linked 65536/65536 bytes at offset 0
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+0: [0..127]: 0x2001
+0: [0..127]: 0x2001
+0: [0..127]: 0x2001
+0: [0..127]: 0x2001
diff --git a/tests/generic/group b/tests/generic/group
index 3f00386..0392d4d 100644
--- a/tests/generic/group
+++ b/tests/generic/group
@@ -355,3 +355,4 @@
 350 blockdev quick rw
 351 blockdev quick rw
 352 auto clone
+353 auto quick clone
-- 
2.5.5



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BTRFS Data at Rest File Corruption

2016-05-12 Thread Chris Murphy
On Thu, May 12, 2016 at 11:49 AM, Richard A. Lochner  wrote:

> I suspected, and I still suspect that the error occurred upon a
> metadata update that corrupted the checksum for the file, probably due
> to silent memory corruption.  If the checksum was silently corrupted,
> it would be simply written to both drives causing this type of error.

Metadata is checksummed independently of data. So if the data isn't
updated, its checksum doesn't change, only metadata checksum is
changed.

>
> btrfs dmesg(s):
>
> [16510.334020] BTRFS warning (device sdb1): checksum error at logical
> 3037444042752 on dev /dev/sdb1, sector 4988789496, root 259, inode
> 1437377, offset 75754369024, length 4096, links 1 (path: Rick/sda4.img)
> [16510.334043] BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 0, rd
> 0, flush 0, corrupt 5, gen 0
> [16510.345662] BTRFS error (device sdb1): unable to fixup (regular)
> error at logical 3037444042752 on dev /dev/sdb1
>
> [17606.978439] BTRFS warning (device sdb1): checksum error at logical
> 3037444042752 on dev /dev/sdc1, sector 4988750584, root 259, inode
> 1437377, offset 75754369024, length 4096, links 1 (path: Rick/sda4.img)
> [17606.978460] BTRFS error (device sdb1): bdev /dev/sdc1 errs: wr 0, rd
> 13, flush 0, corrupt 4, gen 0
> [17606.989497] BTRFS error (device sdb1): unable to fixup (regular)
> error at logical 3037444042752 on dev /dev/sdc1

This is confusing. Are these from the same boot? The later time has a lower
corrupt count. Can you just 'dd if=sda4.img of=/dev/null' and report
all (new) messages in dmesg? It seems to me the problems on both devices
should show pretty much the same monotonic time.

Also what do you get for these for each device:

smartctl -l scterc /dev/sdX
cat /sys/block/sdX/device/timeout


-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BTRFS Data at Rest File Corruption

2016-05-12 Thread Richard A. Lochner
Austin,

Ah, the idea of rewriting the "bad" data block is very interesting. I
had not thought of that.  Interestingly, the corrupted file is a raw
backup image of a btrfs file system partition. I can mount it as a loop
device.  I suppose I could rewrite that data block, mount it and run a
scrub on that mounted loop device to find out if it is truly fixed.

I should also mention that this data is not critical to me.  I only
brought this issue up because I thought it might be of interest.  

I can think of ways to protect against most manifestations of this type
of error (since metadata is checksummed in btrfs), but I cannot argue
that it would be worth the development effort, increased code
complexity or the additional cpu cycles required to implement such a
"defensive" algorithm for an "edge case" like this.  Even with a
defensive algorithm, these errors could still occur, but I believe you
could shrink the time window in which they could occur enough to
significantly reduce their probability.

That said, I happen to have experienced this particular error twice
(over a period of about 7 months) with btrfs on this system.  I do
believe that both were due to memory errors and I plan to upgrade soon
to a Haswell system with ECC memory because of this. 

However, I wonder if my "commodity hardware" is that unique?

In any event, thank you very much for your time and insight.

Rick Lochner


On Thu, 2016-05-12 at 14:29 -0400, Austin S. Hemmelgarn wrote:
> On 2016-05-12 13:49, Richard A. Lochner wrote:
> > 
> > Austin,
> > 
> > I rebooted the computer and reran the scrub to no avail.  The error
> > is
> > consistent.
> > 
> > The reason I brought this question to the mailing list is because
> > it
> > seemed like a situation that might be of interest to the
> > developers.
> >  Perhaps, there might be a way to "defend" against this type of
> > corruption.
> > 
> > I suspected, and I still suspect that the error occurred upon a
> > metadata update that corrupted the checksum for the file, probably
> > due
> > to silent memory corruption.  If the checksum was silently
> > corrupted,
> > it would be simply written to both drives causing this type of
> > error.
> That does seem to be the most likely cause, and sadly, is not
> something 
> any filesystem can protect reliably against on any commodity
> hardware.
> > 
> > 
> > With that in mind, I proved (see below) that the data blocks match
> > on
> > both mirrors.  This I expected since the data blocks should not
> > have
> > been touched as the file has not been written.
> > 
> > This is the sequence of events as I see them that I think might be
> > of
> > interest to the developers.
> > 
> > 1. A block containing a checksum for the file was read into memory.
> > The block read would have been checksummed, so the checksum for the
> > file must have been good at that moment.
> It's worth noting that BTRFS doesn't verify all the checksums in a 
> metadata block when it loads that metadata block, only the ones for
> the 
> reads that triggered the metadata block being loaded will get
> verified.
> > 
> > 
> > 2. The checksum block was then altered in memory (perhaps to add or
> > change a value).
> > 
> > 3. A new checksum would then have been calculated for the checksum
> > block.
> > 
> > 4. The checksum block would have been written to both mirrors.
> > 
> > Presumably, in the case that I am experiencing, an undetected
> > memory
> > error must have occurred after 1 and before step 3 was completed.
> > 
> > I wonder if there is a way to correct or detect that situation.
> The closest we could get is to provide an option to handle this in 
> scrub, preferably with a big scary warning on it as this same
> situation 
> can be easily caused by someone modifying the disks themselves (we
> can't 
> reasonably protect against that, but we shouldn't make it trivial
> for 
> people to inject arbitrary data that way either).
> > 
> > 
> > As I stated previously, the machine on which this occurred does not
> > have ECC memory, however, I would not think that the majority of
> > users
> > running btrfs do either.  If it has happened to me, it likely has
> > happened to others.
> > 
> > Rick Lochner
> > 
> > btrfs dmesg(s):
> > 
> > [16510.334020] BTRFS warning (device sdb1): checksum error at
> > logical
> > 3037444042752 on dev /dev/sdb1, sector 4988789496, root 259, inode
> > 1437377, offset 75754369024, length 4096, links 1 (path:
> > Rick/sda4.img)
> > [16510.334043] BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr
> > 0, rd
> > 0, flush 0, corrupt 5, gen 0
> > [16510.345662] BTRFS error (device sdb1): unable to fixup (regular)
> > error at logical 3037444042752 on dev /dev/sdb1
> > 
> > [17606.978439] BTRFS warning (device sdb1): checksum error at
> > logical
> > 3037444042752 on dev /dev/sdc1, sector 4988750584, root 259, inode
> > 1437377, offset 75754369024, length 4096, links 1 (path:
> > Rick/sda4.img)
> > [17606.978460] BTRFS error (device sdb1): bdev 

Re: BTRFS Data at Rest File Corruption

2016-05-12 Thread Goffredo Baroncelli
On 2016-05-12 20:29, Austin S. Hemmelgarn wrote:
>> I wonder if there is a way to correct or detect that situation.
> The closest we could get is to provide an option to handle this in
> scrub, preferably with a big scary warning on it as this same
> situation can be easily cause by someone modifying the disks
> themselves (we can't reasonably protect against that, but we
> shouldn't make it trivial for people to inject arbitrary data that
> way either).

"btrfs check" has the option "--init-csum-tree"...

Anyway, there should be an option to recalculate the checksum for a single
file. BTRFS is good at highlighting that a file is corrupted, but it should
still offer the possibility to read it anyway: in some cases it is better to
have a corrupted file (and know that it is corrupted) than to lose the whole
file.

BR
G.Baroncelli
-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/1] btrfs-progs: Typo review of strings and comments

2016-05-12 Thread Nicholas D Steeves
On 12 May 2016 at 07:29, David Sterba  wrote:
> On Wed, May 11, 2016 at 07:50:35PM -0400, Nicholas D Steeves wrote:
>> There were a couple of instances where I wasn't sure what to do; I've
>> annotated them, and they are long lines now.  To find them, search or
>> grep the diff for 'Steeves'.
>
> Found and updated, most of them real typos; 'strtoull' is the name of a C
> library function.

Hi David,

Thank you for reviewing this patch.  Sorry for missing the context of
the strtoull comment; I should have been able to infer that and am
embarrassed that I failed to.  Also, embarrassed because I think I've
used it in some C++ code!

I learned how to use git rebase and git reset today, and can submit a
v2 patch diffed against master at your earliest convenience.  My only
remaining question is this:

mkfs.c: printf("Incompatible features:  %s", features_buf)
  * Should this be left as "Imcompat features"?

Regards,
Nicholas
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: About in-band dedupe for v4.7

2016-05-12 Thread Mark Fasheh
On Wed, May 11, 2016 at 07:36:59PM +0200, David Sterba wrote:
> On Tue, May 10, 2016 at 07:52:11PM -0700, Mark Fasheh wrote:
> > Taking your history with qgroups out of this btw, my opinion does not
> > change.
> > 
> > With respect to in-memory only dedupe, it is my honest opinion that such a
> > limited feature is not worth the extra maintenance work. In particular
> > there's about 800 lines of code in the userspace patches which I'm sure
> > you'd want merged, because how could we test this then?
> 
> I like the in-memory dedup backend. It's lightweight, only a heuristic,
> does not need any IO or persistent storage. OTOH I consider it a subpart
> of the in-band deduplication that does all the persistency etc. So I
> treat the ioctl interface from a broader aspect.

Those are all nice qualities, but what do they all get us?

For example, my 'large' duperemove test involves about 750 gigabytes of
general purpose data - quite literally /home off my workstation.

After the run I'm usually seeing between 65-75 gigabytes saved for a total
of only 10% duplicated data. I would expect this to be fairly 'average' -
/home on my machine has the usual stuff - documents, source code, media,
etc.

So if you were writing your whole fs out you could expect about the same
from inline dedupe - 10%-ish. Let's be generous and go with that number
though as a general 'this is how much dedupe we get'.

What the memory backend is doing then is providing a cache of sha256/block
calculations. This cache is very expensive to fill, and every written block
must go through it. On top of that, the cache does not persist between
mounts, and has items regularly removed from it when we run low on memory.
All of this will drive down the amount of duplicated data we can find.

So our best case savings is probably way below 10% - let's be _really_ nice
and say 5%.

Now ask yourself the question - would you accept a write cache which is
expensive to fill and would only have a hit rate of less than 5%?

Oh and there's 800 lines of userspace we'd merge to manage this cache too,
kernel ioctls which would have to be finalized, etc.


> A usecase I find interesting is to keep the in-memory dedup cache and
> then flush it to disk on demand, compared to automatically synced dedup
> (eg. at commit time).

What's the benefit here? We're still going to be hashing blocks on the way
in, and if we're not deduping them at write time then we'll just have to
remove the extents and dedupe them later.


> > A couple of example sore points in my review so far:
> > 
> > - Internally you're using a mutex (instead of a spinlock) to lock out
> >   queries to the in-memory hash, which I can see becoming a performance
> >   problem in the write path.
> > 
> > - Also, we're doing SHA256 in the write path which I expect will
> >   slow it down even more dramatically. Given that all the work done gets
> >   thrown out every time we fill the hash (or remount), I just don't see
> >   much benefit to the user with this.
> 
> I had some ideas to use faster hashes and do sha256 when it's going to
> be stored on disk, but there were some concerns. The objection against
> speed and performance hit at write time is valid. But we'll need to
> verify that in real performance tests, which haven't happend yet up to
> my knowledge.

This is the type of thing that IMHO absolutely must be provided with each
code drop of the feature. Dedupe is nice but _nobody_ will use it if it's
slow. I know this from experience. I personally feel that btrfs has had
enough of 'cute' and 'almost working' features. If we want inline dedupe we
should do it correctly and with the right metrics from the beginning.


This is slightly unrelated to our discussion but my other unsolicited
opinion: As a kernel developer and maintainer of a file system for well over
a decade I will say that balancing the number of out of tree patches is
necessary but we should never be accepting of large features just because
'they've been out for a long time'. Again I mention this because other parts
of the discussion felt like they were going in that direction.

Thanks,
--Mark

--
Mark Fasheh
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs-progs: added quiet-option for scripts

2016-05-12 Thread btrfs
From: M G Berberich 

-q,--quiet to prevent status-messages on stderr
--verbose as alternative for -v
moved 'Mode NO_FILE_DATA enabled' message to stderr
changed default for g_verbose to 1
---
 cmds-send.c | 26 ++
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/cmds-send.c b/cmds-send.c
index 4063475..81b086e 100644
--- a/cmds-send.c
+++ b/cmds-send.c
@@ -44,7 +44,9 @@
 #include "send.h"
 #include "send-utils.h"
 
-static int g_verbose = 0;
+/* default is 1 for historical reasons
+   changing may break scripts */
+static int g_verbose = 1;
 
 struct btrfs_send {
int send_fd;
@@ -301,10 +303,10 @@ static int do_send(struct btrfs_send *send, u64 parent_root_id,
"Try upgrading your kernel or don't use -e.\n");
goto out;
}
-   if (g_verbose > 0)
+   if (g_verbose > 1)
fprintf(stderr, "BTRFS_IOC_SEND returned %d\n", ret);
 
-   if (g_verbose > 0)
+   if (g_verbose > 1)
fprintf(stderr, "joining genl thread\n");
 
close(pipefd[1]);
@@ -429,9 +431,11 @@ int cmd_send(int argc, char **argv)
while (1) {
enum { GETOPT_VAL_SEND_NO_DATA = 256 };
static const struct option long_options[] = {
+   { "verbose", no_argument, NULL, 'v' },
+   { "quiet", no_argument, NULL, 'q' },
{ "no-data", no_argument, NULL, GETOPT_VAL_SEND_NO_DATA 
}
};
-   int c = getopt_long(argc, argv, "vec:f:i:p:", long_options, NULL);
+   int c = getopt_long(argc, argv, "vqec:f:i:p:", long_options, NULL);
 
if (c < 0)
break;
@@ -440,6 +444,9 @@ int cmd_send(int argc, char **argv)
case 'v':
g_verbose++;
break;
+   case 'q':
+   g_verbose--;
+   break;
case 'e':
new_end_cmd_semantic = 1;
break;
@@ -622,8 +629,8 @@ int cmd_send(int argc, char **argv)
}
}
 
-   if (send_flags & BTRFS_SEND_FLAG_NO_FILE_DATA)
-   printf("Mode NO_FILE_DATA enabled\n");
+   if ((send_flags & BTRFS_SEND_FLAG_NO_FILE_DATA) && g_verbose > 1)
+   fprintf(stderr, "Mode NO_FILE_DATA enabled\n");
 
for (i = optind; i < argc; i++) {
int is_first_subvol;
@@ -632,7 +639,8 @@ int cmd_send(int argc, char **argv)
free(subvol);
subvol = argv[i];
 
-   fprintf(stderr, "At subvol %s\n", subvol);
+   if (g_verbose > 0)
+   fprintf(stderr, "At subvol %s\n", subvol);
 
subvol = realpath(subvol, NULL);
if (!subvol) {
@@ -713,8 +721,9 @@ const char * const cmd_send_usage[] = {
"which case 'btrfs send' will determine a suitable parent among the",
"clone sources itself.",
"\n",
-   "-v   Enable verbose debug output. Each occurrence of",
+   "-v, --verboseEnable verbose debug output. Each occurrence of",
" this option increases the verbose level more.",
+   "-q, --quiet  suppress messages to stderr.",
"-e   If sending multiple subvols at once, use the new",
" format and omit the end-cmd between the subvols.",
"-p   Send an incremental stream from  to",
@@ -728,5 +737,6 @@ const char * const cmd_send_usage[] = {
" does not contain any file data and thus cannot be 
used",
" to transfer changes. This mode is faster and useful 
to",
" show the differences in metadata.",
+   "--help   display this help and exit",
NULL
 };
-- 
2.8.1
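
A sketch of the resulting behavior (based on reading the patch, not on a
test run; paths illustrative):

    btrfs send /mnt/snap > s.stream     # default g_verbose=1: "At subvol ..." on stderr
    btrfs send -q /mnt/snap > s.stream  # g_verbose=0: status messages suppressed
    btrfs send -v /mnt/snap > s.stream  # g_verbose=2: adds debug output such as
                                        # "BTRFS_IOC_SEND returned ..."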

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Input/output error on newly created file

2016-05-12 Thread Szalma László

On 2016-05-12 19:41, Nikolaus Rath wrote:

On May 12 2016, Diego Calleja  wrote:

On Thursday, May 12, 2016 at 8:46:00 (CEST), Nikolaus Rath wrote:

*ping*

Anyone any idea?

All I can say is that I've had the same problem in the past. In my
case, the problematic files were active torrents. The interesting
thing is that I was able to read them correctly up to a point, then
I would get the same error as you. No messages in dmesg. The amount
of data I was able to read from them was not random, it was
something multiple of 4K. After reboot the problems went away and
I wasn't able to reproduce it.

There has been reports of similiar issues in the past:
http://www.spinics.net/lists/linux-btrfs/msg52371.html

Thanks for the pointer. So according to these reports, the IO errors
happen with:


1. dm-crypt, LVM and mount options
rw,noatime,compress=lzo,ssd,discard,space_cache,autodefrag,inode_cache
2. dm-crypt, no LVM, and mount options
noatime,compress,nossd

I can add the following data point:

3. dm-crypt on LVM, and mount options
relatime,compress=lzo


(Just to preserve this for the future)


I don't want to repeat my past e-mails, but I think I have a similar
issue. In short: a Xen virtual machine. The files that occasionally become
unreadable (I/O error, but no error in dmesg or anywhere else) are MySQL
MyISAM database files, and they are always small, like 16 KiB or less.
Sometimes dropping the fs cache fixes the problem, sometimes not.
Unmounting and mounting always fixes it. Scrub says the filesystem is OK
(while the file is unreadable).


I have experienced a similar problem with log files (smtp or apache logs),
but it is rare: it happens once or twice a month on a heavily loaded
mail/web server. (The log file I/O errors are not a real problem for me;
MySQL is.) The problem has been present from 3.18 to 4.5.


I have started migrating the MySQL table files to InnoDB, but I think this
is only a workaround.


László Szalma





Best,
-Nikolaus



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BTRFS Data at Rest File Corruption

2016-05-12 Thread Austin S. Hemmelgarn

On 2016-05-12 13:49, Richard A. Lochner wrote:

Austin,

I rebooted the computer and reran the scrub to no avail.  The error is
consistent.

The reason I brought this question to the mailing list is because it
seemed like a situation that might be of interest to the developers.
 Perhaps, there might be a way to "defend" against this type of
corruption.

I suspected, and I still suspect that the error occurred upon a
metadata update that corrupted the checksum for the file, probably due
to silent memory corruption.  If the checksum was silently corrupted,
it would be simply written to both drives causing this type of error.
That does seem to be the most likely cause, and sadly, is not something 
any filesystem can protect reliably against on any commodity hardware.


With that in mind, I proved (see below) that the data blocks match on
both mirrors.  This I expected since the data blocks should not have
been touched as the file has not been written.

This is the sequence of events as I see them that I think might be of
interest to the developers.

1. A block containing a checksum for the file was read into memory.
The block read would have been checksummed, so the checksum for the
file must have been good at that moment.
It's worth noting that BTRFS doesn't verify all the checksums in a 
metadata block when it loads that metadata block, only the ones for the 
reads that triggered the metadata block being loaded will get verified.


2. The checksum block was then altered in memory (perhaps to add or
change a value).

3. A new checksum would then have been calculated for the checksum
block.

4. The checksum block would have been written to both mirrors.

Presumably, in the case that I am experiencing, an undetected memory
error must have occurred after 1 and before step 3 was completed.

I wonder if there is a way to correct or detect that situation.
The closest we could get is to provide an option to handle this in 
scrub, preferably with a big scary warning on it as this same situation 
can be easily caused by someone modifying the disks themselves (we can't 
reasonably protect against that, but we shouldn't make it trivial for 
people to inject arbitrary data that way either).


As I stated previously, the machine on which this occurred does not
have ECC memory, however, I would not think that the majority of users
running btrfs do either.  If it has happened to me, it likely has
happened to others.

Rick Lochner

btrfs dmesg(s):

[16510.334020] BTRFS warning (device sdb1): checksum error at logical
3037444042752 on dev /dev/sdb1, sector 4988789496, root 259, inode
1437377, offset 75754369024, length 4096, links 1 (path: Rick/sda4.img)
[16510.334043] BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 0, rd
0, flush 0, corrupt 5, gen 0
[16510.345662] BTRFS error (device sdb1): unable to fixup (regular)
error at logical 3037444042752 on dev /dev/sdb1

[17606.978439] BTRFS warning (device sdb1): checksum error at logical
3037444042752 on dev /dev/sdc1, sector 4988750584, root 259, inode
1437377, offset 75754369024, length 4096, links 1 (path: Rick/sda4.img)
[17606.978460] BTRFS error (device sdb1): bdev /dev/sdc1 errs: wr 0, rd
13, flush 0, corrupt 4, gen 0
[17606.989497] BTRFS error (device sdb1): unable to fixup (regular)
error at logical 3037444042752 on dev /dev/sdc1

How I compared the data blocks:

#btrfs-map-logical -l 3037444042752  /dev/sdc1
mirror 1 logical 3037444042752 physical 2554240299008 device /dev/sdc1
mirror 1 logical 3037444046848 physical 2554240303104 device /dev/sdc1
mirror 2 logical 3037444042752 physical 2554260221952 device /dev/sdb1
mirror 2 logical 3037444046848 physical 2554260226048 device /dev/sdb1

#dd if=/dev/sdc1 bs=1 skip=2554240299008 count=4096 of=c1
4096+0 records in
4096+0 records out
4096 bytes (4.1 kB) copied, 0.0292201 s, 140 kB/s

#dd if=/dev/sdc1 bs=1 skip=2554240303104 count=4096 of=c2
4096+0 records in
4096+0 records out
4096 bytes (4.1 kB) copied, 0.0142381 s, 288 kB/s

#dd if=/dev/sdb1 bs=1 skip=2554260221952 count=4096 of=b1
4096+0 records in
4096+0 records out
4096 bytes (4.1 kB) copied, 0.0293211 s, 140 kB/s

#dd if=/dev/sdb1 bs=1 skip=2554260226048 count=4096 of=b2
4096+0 records in
4096+0 records out
4096 bytes (4.1 kB) copied, 0.0151947 s, 270 kB/s

#diff b1 c1
#diff b2 c2

Excellent thinking here.

Now, if you can find some external method to verify that that block is 
in fact correct, you can just write it back into the file itself at the 
correct offset, and fix the issue.
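
For instance, something along these lines (a hedged sketch only: the offset
and length come from the dmesg lines above, b1 is the block dumped with dd
earlier, and the mount point is hypothetical):

    dd if=b1 of=/mnt/pool/Rick/sda4.img bs=4096 \
       seek=$((75754369024 / 4096)) count=1 conv=notrunc

The rewrite goes through the normal write path, so btrfs computes and
stores a fresh checksum for that block.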


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BTRFS Data at Rest File Corruption

2016-05-12 Thread Richard A. Lochner
Andrew,

I agree with your supposition about the metadata and corrupted RAM.  

I verified that the data blocks on both devices are equal (see my reply
to Austin for the commands I used).  I believe that they correctly prove
that the blocks are, in fact, equal.

I am not sure I have the skills to "walk the checksum tree manually" as
you described.  I would also like to verify that the checksum blocks
agree as I expect they do, but I may have to "bone up" on my tree
walking skills first.

Thanks for your help.

Rick Lochner

On Wed, 2016-05-11 at 21:16 -0400, Andrew Wade wrote:
> 
> I would expect the "data at rest" to be good too. But perhaps
> something happened to the metadata (checksum). If the checksum was
> corrupted in RAM it could be written back to the disks due to updates
> elsewhere in the metadata node.
> If this is what happened I would expect the metadata node containing
> the checksum to have a recent generation number.
> I'm not actually a BTRFS developer myself, but you might be able to
> find the generation by using btrfs-debug-tree from btrfs-tools.
> btrfs-debug-tree -r /dev/sdc1 will give you the block number of the
> checksum tree root, which you can then feed into btrfs-debug-tree -b
>  /dev/sdc1 and walk the tree manually. You're looking for the
> largest key before 3037444042752. 
> For dumping the data and metadata blocks I think btrfs-map-logical is
> what you need, though to be honest I've never used this tool myself.
> Even if the file data is still good I don't know of a simple way to
> tell BTRFS to ignore the checksums for a file. It is possible to
> regenerate the checksum tree for the entire filesystem, but I
> personally wouldn't do that unless you really need the file.
> regards,
> Andrew
> 
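
A rough sketch of the walk Andrew describes (the block number is a
placeholder to be filled in from the first command's output):

    btrfs-debug-tree -r /dev/sdc1           # note the csum tree root block
    btrfs-debug-tree -b <block> /dev/sdc1   # dump it; follow child pointers
                                            # toward the largest csum key at
                                            # or before logical 3037444042752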
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: fsck: to repair or not to repair

2016-05-12 Thread Ashish Samant



On 05/12/2016 10:35 AM, Nikolaus Rath wrote:

On May 12 2016, Henk Slager  wrote:

On Wed, May 11, 2016 at 11:10 PM, Nikolaus Rath  wrote:

Hello,

I recently ran btrfsck on one of my file systems, and got the following
messages:

checking extents
checking free space cache
checking fs roots
root 5 inode 3149867 errors 400, nbytes wrong
root 5 inode 3150237 errors 400, nbytes wrong
root 5 inode 3150238 errors 400, nbytes wrong
root 5 inode 3150242 errors 400, nbytes wrong
root 5 inode 3150260 errors 400, nbytes wrong
[ lots of similar message with different inode numbers ]
root 5 inode 15595011 errors 400, nbytes wrong
root 5 inode 15595016 errors 400, nbytes wrong
Checking filesystem on /dev/mapper/vg0-nikratio_crypt
UUID: 8742472d-a9b0-4ab6-b67a-5d21f14f7a38
found 263648960636 bytes used err is 1
total csum bytes: 395314372
total tree bytes: 908644352
total fs tree bytes: 352735232
total extent tree bytes: 95039488
btree space waste bytes: 156301160
file data blocks allocated: 675209801728
  referenced 410351722496
Btrfs v3.17



Can someone explain to me the risk that I run by attempting a repair,
and (conversely) what I put at stake when continuing to use this file
system as-is?

It was once mentioned on this mailing list that if 'errors 400, nbytes
wrong' is the only error on an fs, btrfs check --repair can fix it (around
the time of the 4.4 tools release, by Qu AFAIK).
I had (have?) about 7 of those errors in small files on an fs that is
2.5 years old and has quite a few older ro snapshots. I once tried to
fix them with the 4.5.0 tools plus some patches, but they did not
actually get fixed. With at least the 4.5.2 or 4.5.3 tools it should be
possible to fix them in your case. Maybe you first want to test on an
overlay of the device, or copy the whole fs with dd. It depends on how much
time you can allow the fs to be offline etc.; it is up to you.
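
(A rough sketch of such an overlay using a throwaway dm snapshot, in case
it helps; the device name is taken from the check output above and the
sizes are illustrative:

    dev=/dev/mapper/vg0-nikratio_crypt
    truncate -s 2G /tmp/cow                       # room for repair writes
    cow=$(losetup -f --show /tmp/cow)
    dmsetup create repair-test --table \
        "0 $(blockdev --getsz $dev) snapshot $dev $cow N 8"
    btrfs check --repair /dev/mapper/repair-test  # writes land in the overlay

Tear it down afterwards with "dmsetup remove repair-test" and
"losetup -d $cow".)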

In my case, I recreated the files in the working subvol, but as long

[...]

How did you determine which files were affected? Is there a way to map
inodes to paths?

 btrfs inspect-internal inode-resolve <inode> <path>

This resolves the <inode> in the subvol mounted at <path> to its fs paths.
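
For example, with the first inode from the btrfsck output above (mount
point hypothetical, printed path illustrative):

    # btrfs inspect-internal inode-resolve 3149867 /mnt
    /mnt/some/affected/file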

Thanks,
Ashish



Thanks!
-Nikolaus



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BTRFS Data at Rest File Corruption

2016-05-12 Thread Richard A. Lochner
Austin,

I rebooted the computer and reran the scrub to no avail.  The error is
consistent.

The reason I brought this question to the mailing list is because it
seemed like a situation that might be of interest to the developers.
 Perhaps, there might be a way to "defend" against this type of
corruption.

I suspected, and I still suspect that the error occurred upon a
metadata update that corrupted the checksum for the file, probably due
to silent memory corruption.  If the checksum was silently corrupted,
it would be simply written to both drives causing this type of error.

With that in mind, I proved (see below) that the data blocks match on
both mirrors.  This I expected since the data blocks should not have
been touched as the file has not been written.

This is the sequence of events as I see them that I think might be of
interest to the developers.

1. A block containing a checksum for the file was read into memory.
The block read would have been checksummed, so the checksum for the
file must have been good at that moment.

2. The checksum block was then altered in memory (perhaps to add or
change a value).

3. A new checksum would then have been calculated for the checksum
block.

4. The checksum block would have been written to both mirrors.

Presumably, in the case that I am experiencing, an undetected memory
error must have occurred after 1 and before step 3 was completed.

I wonder if there is a way to correct or detect that situation.  

As I stated previously, the machine on which this occurred does not
have ECC memory, however, I would not think that the majority of users
running btrfs do either.  If it has happened to me, it likely has
happened to others.

Rick Lochner

btrfs dmesg(s):

[16510.334020] BTRFS warning (device sdb1): checksum error at logical
3037444042752 on dev /dev/sdb1, sector 4988789496, root 259, inode
1437377, offset 75754369024, length 4096, links 1 (path: Rick/sda4.img)
[16510.334043] BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 0, rd
0, flush 0, corrupt 5, gen 0
[16510.345662] BTRFS error (device sdb1): unable to fixup (regular)
error at logical 3037444042752 on dev /dev/sdb1

[17606.978439] BTRFS warning (device sdb1): checksum error at logical
3037444042752 on dev /dev/sdc1, sector 4988750584, root 259, inode
1437377, offset 75754369024, length 4096, links 1 (path: Rick/sda4.img)
[17606.978460] BTRFS error (device sdb1): bdev /dev/sdc1 errs: wr 0, rd
13, flush 0, corrupt 4, gen 0
[17606.989497] BTRFS error (device sdb1): unable to fixup (regular)
error at logical 3037444042752 on dev /dev/sdc1

How I compared the data blocks:

#btrfs-map-logical -l 3037444042752  /dev/sdc1
mirror 1 logical 3037444042752 physical 2554240299008 device /dev/sdc1
mirror 1 logical 3037444046848 physical 2554240303104 device /dev/sdc1
mirror 2 logical 3037444042752 physical 2554260221952 device /dev/sdb1
mirror 2 logical 3037444046848 physical 2554260226048 device /dev/sdb1

#dd if=/dev/sdc1 bs=1 skip=2554240299008 count=4096 of=c1
4096+0 records in
4096+0 records out
4096 bytes (4.1 kB) copied, 0.0292201 s, 140 kB/s

#dd if=/dev/sdc1 bs=1 skip=2554240303104 count=4096 of=c2
4096+0 records in
4096+0 records out
4096 bytes (4.1 kB) copied, 0.0142381 s, 288 kB/s

#dd if=/dev/sdb1 bs=1 skip=2554260221952 count=4096 of=b1
4096+0 records in
4096+0 records out
4096 bytes (4.1 kB) copied, 0.0293211 s, 140 kB/s

#dd if=/dev/sdb1 bs=1 skip=2554260226048 count=4096 of=b2
4096+0 records in
4096+0 records out
4096 bytes (4.1 kB) copied, 0.0151947 s, 270 kB/s

#diff b1 c1
#diff b2 c2

On Wed, 2016-05-11 at 15:26 -0400, Austin S. Hemmelgarn wrote:
On 2016-05-11 14:36, Richard Lochner wrote:
> > 
> > Hello,
> > 
> > I have encountered a data corruption error with BTRFS which may or
> > may
> > not be of interest to your developers.
> > 
> > The problem is that an unmodified file on a RAID-1 volume that had
> > been scrubbed successfully is now corrupt.  The details follow.
> > 
> > The volume was formatted as btrfs with raid1 data and raid1
> > metadata
> > on two new 4T hard drives (WD Data Center Re WD4000FYYZ) .
> > 
> > A large binary file was copied to the volume (~76 GB) on December
> > 27,
> > 2015.  Soon after copying the file, a btrfs scrub was run. There
> > were
> > no errors.  Multiple scrubs have also been run over the past
> > several
> > months.
> > 
> > Recently, a scrub returned an unrecoverable error on that file.
> > Again, the file has not been modified since it was originally
> > copied
> > and has the time stamp from December.  Furthermore, SMART tests
> > (long)
> > for both drives do not indicate any errors (Current_Pending_Sector
> > or
> > otherwise).
> > 
> > I should note that the system does not have ECC memory.
> > 
> > It would be interesting to me to know if:
> > 
> > a) The primary and secondary data blocks match (I suspect they do),
> > and
> > b) The primary and secondary checksums for the block match (I
> > suspect
> > they do as well)
> Do you mean if 

Re: Input/output error on newly created file

2016-05-12 Thread Nikolaus Rath
On May 12 2016, Diego Calleja  wrote:
> El jueves, 12 de mayo de 2016 8:46:00 (CEST) Nikolaus Rath escribió:
>> *ping*
>> 
>> Anyone any idea?
>
> All I can say is that I've had the same problem in the past. In my
> case, the problematic files were active torrents. The interesting
> thing is that I was able to read them correctly up to a point, then
> I would get the same error as you. No messages in dmesg. The amount
> of data I was able to read from them was not random, it was
> something multiple of 4K. After reboot the problems went away and
> I wasn't able to reproduce it.
>
> There have been reports of similar issues in the past:
> http://www.spinics.net/lists/linux-btrfs/msg52371.html

Thanks for the pointer. So according to these reports, the IO errors
happen with:


1. dm-crypt, LVM and mount options
   rw,noatime,compress=lzo,ssd,discard,space_cache,autodefrag,inode_cache
2. dm-crypt, no LVM, and mount options
   noatime,compress,nossd

I can add the following data point:

3. dm-crypt on LVM, and mount options
   relatime,compress=lzo


(Just to preserve this for the future)


Best,
-Nikolaus
-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

 »Time flies like an arrow, fruit flies like a Banana.«


Re: [PATCH 1/2] Btrfs: fix race between fsync and direct IO writes for prealloc extents

2016-05-12 Thread Josef Bacik

On 05/09/2016 07:01 AM, fdman...@kernel.org wrote:

From: Filipe Manana 

When we do a direct IO write against a preallocated extent (fallocate)
that does not go beyond the i_size of the inode, we do the write operation
without holding the inode's i_mutex (an optimization that landed in
commit 38851cc19adb ("Btrfs: implement unlocked dio write")). This allows
for a very tiny time window where a race can happen with a concurrent
fsync using the fast code path, as the direct IO write path creates first
a new extent map (no longer flagged as a prealloc extent) and then it
creates the ordered extent, while the fast fsync path first collects
ordered extents and then it collects extent maps. This allows for the
possibility of the fast fsync path to collect the new extent map without
collecting the new ordered extent, and therefore logging an extent item
based on the extent map without waiting for the ordered extent to be
created and complete. This can result in a situation where after a log
replay we end up with an extent not marked anymore as prealloc but it was
only partially written (or not written at all), exposing random, stale or
garbage data corresponding to the unwritten pages and without any
checksums in the csum tree covering the extent's range.

This is an extension of what was done in commit de0ee0edb21f ("Btrfs: fix
race between fsync and lockless direct IO writes").

So fix this by creating first the ordered extent and then the extent
map, so that this way if the fast fsync path collects the new extent
map it also collects the corresponding ordered extent.

Signed-off-by: Filipe Manana 


Reviewed-by: Josef Bacik 

Thanks,

Josef


Re: [PATCH] Btrfs: add semaphore to synchronize direct IO writes with fsync

2016-05-12 Thread Josef Bacik

On 05/12/2016 10:26 AM, fdman...@kernel.org wrote:

From: Filipe Manana 

Due to the optimization of lockless direct IO writes (the inode's i_mutex
is not held) introduced in commit 38851cc19adb ("Btrfs: implement unlocked
dio write"), we started having races between such writes with concurrent
fsync operations that use the fast fsync path. These races were addressed
in the patches titled "Btrfs: fix race between fsync and lockless direct
IO writes" and "Btrfs: fix race between fsync and direct IO writes for
prealloc extents". The races happened because the direct IO path, like
every other write path, creates extent maps followed by the
corresponding ordered extents, while the fast fsync path first collected
ordered extents and then collected extent maps. This made it possible
to log file extent items (based on the collected extent maps) without
waiting for the corresponding ordered extents to complete (get their IO
done). The two fixes mentioned before added a solution that consists of
making the direct IO path create first the ordered extents and then the
extent maps, while the fsync path attempts to collect any new ordered
extents once it collects the extent maps. This was simple and did not
require adding any synchronization primitive to any data structure (struct
btrfs_inode for example) but it makes things more fragile for future
development endeavours and adds an exceptional approach compared to the
other write paths.

This change adds a read-write semaphore to the btrfs inode structure and
makes the direct IO path create the extent maps and the ordered extents
while holding read access on that semaphore, while the fast fsync path
collects extent maps and ordered extents while holding write access on
that semaphore. The logic for direct IO write path is encapsulated in a
new helper function that is used both for cow and nocow direct IO writes.

Signed-off-by: Filipe Manana 


Looks good, thanks Filipe,

Reviewed-by: Josef Bacik 

Thanks,

Josef


Re: fsck: to repair or not to repair

2016-05-12 Thread Nikolaus Rath
On May 12 2016, Henk Slager  wrote:
> On Wed, May 11, 2016 at 11:10 PM, Nikolaus Rath  wrote:
>> Hello,
>>
>> I recently ran btrfsck on one of my file systems, and got the following
>> messages:
>>
>> checking extents
>> checking free space cache
>> checking fs roots
>> root 5 inode 3149867 errors 400, nbytes wrong
>> root 5 inode 3150237 errors 400, nbytes wrong
>> root 5 inode 3150238 errors 400, nbytes wrong
>> root 5 inode 3150242 errors 400, nbytes wrong
>> root 5 inode 3150260 errors 400, nbytes wrong
>> [ lots of similar message with different inode numbers ]
>> root 5 inode 15595011 errors 400, nbytes wrong
>> root 5 inode 15595016 errors 400, nbytes wrong
>> Checking filesystem on /dev/mapper/vg0-nikratio_crypt
>> UUID: 8742472d-a9b0-4ab6-b67a-5d21f14f7a38
>> found 263648960636 bytes used err is 1
>> total csum bytes: 395314372
>> total tree bytes: 908644352
>> total fs tree bytes: 352735232
>> total extent tree bytes: 95039488
>> btree space waste bytes: 156301160
>> file data blocks allocated: 675209801728
>>  referenced 410351722496
>> Btrfs v3.17
>>
>>
>>
>> Can someone explain to me the risk that I run by attempting a repair,
>> and (conversely) what I put at stake when continuing to use this file
>> system as-is?
>
> It has once been mentioned in this mail-list, that if the 'errors 400,
> nbytes wrong' is the only error on an fs, btrfs check --repair can fix
> them ( was around time of tools release 4.4 , by Qu AFAIK).
> I had /(have?) about 7 of those errors in small files on an fs that is
> 2.5 years old and has quite some older ro snapshots. I once tried to
> fix them with 4.5.0 + some patches tools, but actually they did not
> get fixed. At least with 4.5.2 or 4.5.3 tools it should be possible to
> fix them in your case. Maybe you first want to test it on an overlay
> of the device or copy the whole fs with dd. It depends on how much
> time you can allow the fs to be offline etc, it is up to you.
>
> In my case, I recreated the files in the working subvol, but as long
[...]

How did you determine which files were affected? Is there a way to map
inodes to paths?


Thanks!
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

 »Time flies like an arrow, fruit flies like a Banana.«


[PATCH] Btrfs: add semaphore to synchronize direct IO writes with fsync

2016-05-12 Thread fdmanana
From: Filipe Manana 

Due to the optimization of lockless direct IO writes (the inode's i_mutex
is not held) introduced in commit 38851cc19adb ("Btrfs: implement unlocked
dio write"), we started having races between such writes with concurrent
fsync operations that use the fast fsync path. These races were addressed
in the patches titled "Btrfs: fix race between fsync and lockless direct
IO writes" and "Btrfs: fix race between fsync and direct IO writes for
prealloc extents". The races happened because the direct IO path, like
every other write path, creates extent maps followed by the
corresponding ordered extents, while the fast fsync path first collected
ordered extents and then collected extent maps. This made it possible
to log file extent items (based on the collected extent maps) without
waiting for the corresponding ordered extents to complete (get their IO
done). The two fixes mentioned before added a solution that consists of
making the direct IO path create first the ordered extents and then the
extent maps, while the fsync path attempts to collect any new ordered
extents once it collects the extent maps. This was simple and did not
require adding any synchronization primitive to any data structure (struct
btrfs_inode for example) but it makes things more fragile for future
development endeavours and adds an exceptional approach compared to the
other write paths.

This change adds a read-write semaphore to the btrfs inode structure and
makes the direct IO path create the extent maps and the ordered extents
while holding read access on that semaphore, while the fast fsync path
collects extent maps and ordered extents while holding write access on
that semaphore. The logic for direct IO write path is encapsulated in a
new helper function that is used both for cow and nocow direct IO writes.

Signed-off-by: Filipe Manana 
---
 fs/btrfs/btrfs_inode.h |  10 
 fs/btrfs/inode.c   | 134 +++--
 fs/btrfs/tree-log.c|  51 ++-
 3 files changed, 77 insertions(+), 118 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 61205e3..1da5753 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -196,6 +196,16 @@ struct btrfs_inode {
struct list_head delayed_iput;
long delayed_iput_count;
 
+   /*
+* To avoid races between lockless (i_mutex not held) direct IO writes
+* and concurrent fsync requests. Direct IO writes must acquire read
+* access on this semaphore for creating an extent map and its
+* corresponding ordered extent. The fast fsync path must acquire write
+* access on this semaphore before it collects ordered extents and
+* extent maps.
+*/
+   struct rw_semaphore dio_sem;
+
struct inode vfs_inode;
 };
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5065ac2..c483bd21 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7145,6 +7145,43 @@ out:
return em;
 }
 
+static struct extent_map *btrfs_create_dio_extent(struct inode *inode,
+ const u64 start,
+ const u64 len,
+ const u64 orig_start,
+ const u64 block_start,
+ const u64 block_len,
+ const u64 orig_block_len,
+ const u64 ram_bytes,
+ const int type)
+{
+   struct extent_map *em = NULL;
+   int ret;
+
+   down_read(&BTRFS_I(inode)->dio_sem);
+   if (type != BTRFS_ORDERED_NOCOW) {
+   em = create_pinned_em(inode, start, len, orig_start,
+ block_start, block_len, orig_block_len,
+ ram_bytes, type);
+   if (IS_ERR(em))
+   goto out;
+   }
+   ret = btrfs_add_ordered_extent_dio(inode, start, block_start,
+  len, block_len, type);
+   if (ret) {
+   if (em) {
+   free_extent_map(em);
+   btrfs_drop_extent_cache(inode, start,
+   start + len - 1, 0);
+   }
+   em = ERR_PTR(ret);
+   }
+ out:
+   up_read(&BTRFS_I(inode)->dio_sem);
+
+   return em;
+}
+
 static struct extent_map *btrfs_new_extent_direct(struct inode *inode,
  u64 start, u64 len)
 {
@@ -7160,43 +7197,13 @@ static struct extent_map *btrfs_new_extent_direct(struct inode *inode,
if (ret)
return ERR_PTR(ret);
 
-   /*
-* Create the ordered extent before the extent map. This is 

Re: fsck: to repair or not to repair

2016-05-12 Thread Henk Slager
On Wed, May 11, 2016 at 11:10 PM, Nikolaus Rath  wrote:
> Hello,
>
> I recently ran btrfsck on one of my file systems, and got the following
> messages:
>
> checking extents
> checking free space cache
> checking fs roots
> root 5 inode 3149867 errors 400, nbytes wrong
> root 5 inode 3150237 errors 400, nbytes wrong
> root 5 inode 3150238 errors 400, nbytes wrong
> root 5 inode 3150242 errors 400, nbytes wrong
> root 5 inode 3150260 errors 400, nbytes wrong
> [ lots of similar message with different inode numbers ]
> root 5 inode 15595011 errors 400, nbytes wrong
> root 5 inode 15595016 errors 400, nbytes wrong
> Checking filesystem on /dev/mapper/vg0-nikratio_crypt
> UUID: 8742472d-a9b0-4ab6-b67a-5d21f14f7a38
> found 263648960636 bytes used err is 1
> total csum bytes: 395314372
> total tree bytes: 908644352
> total fs tree bytes: 352735232
> total extent tree bytes: 95039488
> btree space waste bytes: 156301160
> file data blocks allocated: 675209801728
>  referenced 410351722496
> Btrfs v3.17
>
>
>
> Can someone explain to me the risk that I run by attempting a repair,
> and (conversely) what I put at stake when continuing to use this file
> system as-is?

It has once been mentioned in this mail-list, that if the 'errors 400,
nbytes wrong' is the only error on an fs, btrfs check --repair can fix
them ( was around time of tools release 4.4 , by Qu AFAIK).
I had /(have?) about 7 of those errors in small files on an fs that is
2.5 years old and has quite some older ro snapshots. I once tried to
fix them with 4.5.0 + some patches tools, but actually they did not
get fixed. At least with 4.5.2 or 4.5.3 tools it should be possible to
fix them in your case. Maybe you first want to test it on an overlay
of the device or copy the whole fs with dd. It depends on how much
time you can allow the fs to be offline etc, it is up to you.
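
For reference, a minimal sketch of such an overlay using the
device-mapper snapshot target; /dev/sdX, the sizes and the names are
assumptions, and the fs must be unmounted. All writes land in the loop
file, so the real device is never modified:

  truncate -s 10G /tmp/overlay.img
  LOOP=$(losetup -f --show /tmp/overlay.img)
  echo "0 $(blockdev --getsz /dev/sdX) snapshot /dev/sdX $LOOP N 8" \
      | dmsetup create btrfs-overlay
  btrfs check --repair /dev/mapper/btrfs-overlay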

In my case, I recreated the files in the working subvol, but as long
as I don't remove the older snapshots, the errors 400 will still be
there I assume. At least I don't experience any negative impact of it,
so I keep it like it is until at some point in time the older
snapshots get removed or I am somehow forced to clone back the data
into a fresh fs. I am running mostly latest stable or sometimes
mainline kernel.


Re: btrfs ate my data in just two days, after a fresh install. ram and disk are ok. it still mounts, but I cannot repair

2016-05-12 Thread Zygo Blaxell
On Thu, May 12, 2016 at 04:35:24PM +0200, Niccolò Belli wrote:
> When doing the btrfs check I also always do a btrfs scrub and it never found
> any error. Once it didn't manage to finish the scrub because of:
> BTRFS critical (device dm-0): corrupt leaf, slot offset bad:
> block=670597120,root=1, slot=6
> and btrfs scrub status reported "was aborted after 00:00:10".
> 
> Talking about scrub I created a systemd timer to run scrub hourly and I
> noticed 2 *uncorrectable* errors suddenly appeared on my system. So I
> immediately re-ran the scrub just to confirm it and then I rebooted into the
> Arch live usb and ran btrfs check: the metadata were perfect. So I ran
> btrfs scrub from the live usb and there were no errors at all! I rebooted
> into my system and ran scrub once again and the uncorrectable errors
> were really gone! It happened two times in the past few days.

That's what a RAM corruption problem looks like when you run btrfs scrub.
Maybe the RAM itself is OK, but *something* is scribbling on it.

Does the Arch live usb use the same kernel as your normal system?

> Almost no patches get applied by the Arch kernel team:
> https://git.archlinux.org/svntogit/packages.git/tree/trunk?h=packages/linux
> At the moment the only one is a harmless
> "change-default-console-loglevel.patch".

Did you try an older (or newer) kernel?  I've been running 4.5.x on a few
canary systems, but so far none of them have survived more than a day.
Contrast with 4.1.x and 4.4.x, which runs for months between reboots
for me.  Maybe there's a regression in 4.5.x, maybe I did something
wrong in my config or build, or maybe I just have too few data points
to draw any conclusions, but my data so far is telling me to stay on
4.4.x until something changes (i.e. wait for a 4.5.x stable update or
skip directly to 4.6.x).  :-/

It's always worth trying this if only to eliminate regression as a
possible root cause early.  In practice, every mainline kernel release
has a regression that affects at least one combination of config options
and hardware.  btrfs is stable enough now that you can be running one
or two releases behind to avoid a problem elsewhere in the kernel.

> Another option will be crashing it with my car's wheels hoping that because
> of my comprehensive insurance policy Dell will give me the next model (the
> Skylake one) as a replacement (hoping that it will not suffer from the same
> issue of the Broadwell one).

The first rule of Insurance Fraud Club:  don't talk about Insurance
Fraud Club.  ;)

It's possible there's a problem that affects only very specific chipsets.
You seem to have eliminated RAM in isolation, but there could be a problem
in the kernel that affects only your chipset.





Re: [PATCH] fstests: test creating a symlink and then fsync its parent directory

2016-05-12 Thread Josef Bacik

On 04/24/2016 09:26 PM, fdman...@kernel.org wrote:

From: Filipe Manana 

Test creating a symlink, fsync its parent directory, power fail and mount
again the filesystem. After these steps the symlink should exist and its
content must match what we specified when we created it (must not be
empty or point to something else).

This is motivated by an issue in btrfs where after the log replay happens
we get empty symlinks, which not only does not make much sense from a
user's point of view, it's also not valid to have empty links in linux
(which is explicitly forbidden by the symlink(2) system call).

The issue in btrfs is fixed by the following patch for the linux kernel:

  "Btrfs: fix empty symlink after creating symlink and fsync parent dir"

Tested against ext3, ext4, xfs, f2fs, reiserfs and nilfs2.

Signed-off-by: Filipe Manana 


Reviewed-by: Josef Bacik 

Thanks,

Josef



Re: Input/output error on newly created file

2016-05-12 Thread Diego Calleja
El jueves, 12 de mayo de 2016 8:46:00 (CEST) Nikolaus Rath escribió:
> *ping*
> 
> Anyone any idea?

All I can say is that I've had the same problem in the past. In my
case, the problematic files were active torrents. The interesting
thing is that I was able to read them correctly up to a point, then
I would get the same error as you. No messages in dmesg. The amount
of data I was able to read from them was not random, it was
something multiple of 4K. After reboot the problems went away and
I wasn't able to reproduce it.

There have been reports of similar issues in the past:
http://www.spinics.net/lists/linux-btrfs/msg52371.html


Re: Amount of scrubbed data goes from 15.90GiB to 26.66GiB after defragment -r -v -clzo on a fs always mounted with compress=lzo

2016-05-12 Thread Niccolò Belli
Thanks for the detailed explanation; hopefully in the future someone will
be able to make defrag snapshot/reflink aware in a scalable manner.
I will not use defrag anymore, but what do you suggest I do to
reclaim the lost space? Get rid of my current snapshots, or maybe simply
run bedup?


Niccolò


Re: Input/output error on newly created file

2016-05-12 Thread Nikolaus Rath
*ping*

Anyone any idea?

Best,
-Nikolaus

On May 09 2016, Nikolaus Rath  wrote:
> On May 09 2016, Filipe Manana  wrote:
>> On Sun, May 8, 2016 at 11:18 PM, Nikolaus Rath  wrote:
>>> Hello,
>>>
>>> I just created an innocent 10 MB file on a btrfs file system, yet my attempt
>>> to read it a few seconds later (and ever since), just gives:
>>>
>>> $ ls -l in-progress/mysterious-io-error
>>> -rw-rw-r-- 1 nikratio nikratio 10485760 May  8 14:41 
>>> in-progress/mysterious-io-error
>>> $ cat in-progress/mysterious-io-error
>>> cat: in-progress/mysterious-io-error: Input/output error
>>
>> If you unmount and mount again the filesystem, does it happen again?
>
> After rebooting, the previously unaccessible file can now be read. But I
> cannot tell if it contains the right data.
>
> However, I just encountered the same problem with another, freshly
> created file.
>
>> How did you create the file?
>
> In Python 3. The equivalent code is more or less:
>
> with open('file.dat', 'wb+') as fh:
>     for buf in generate_data():
>         fh.write(buf) # bufsize is about 128 kB
>
>
> However, I should note that there is a lot of activity in this
> file system (it contains my home directory), so the above alone will
> probably not reproduce the problem.
>
> That said, so far both the problematic files were created by the same
> application (S3QL, of which luckily I am also the maintainer).
>
>
>> Does fsck reports any issues?
>
> Do you mean btrfsck? It actually has a lot to say:
>
> checking extents
> checking free space cache
> checking fs roots
> root 5 inode 3149867 errors 400, nbytes wrong
> root 5 inode 3150237 errors 400, nbytes wrong
> root 5 inode 3150238 errors 400, nbytes wrong
> root 5 inode 3150242 errors 400, nbytes wrong
> root 5 inode 3150260 errors 400, nbytes wrong
> [...]
> root 5 inode 15595011 errors 400, nbytes wrong
> root 5 inode 15595016 errors 400, nbytes wrong
> Checking filesystem on /dev/mapper/vg0-nikratio_crypt
> UUID: 8742472d-a9b0-4ab6-b67a-5d21f14f7a38
> found 263648960636 bytes used err is 1
> total csum bytes: 395314372
> total tree bytes: 908644352
> total fs tree bytes: 352735232
> total extent tree bytes: 95039488
> btree space waste bytes: 156301160
> file data blocks allocated: 675209801728
>  referenced 410351722496
> Btrfs v3.17
>
> However, the inode of the problematic file (16186241) is not
> mentioned. But I guess this is not surprising, because also for this
> file, I can read the contents after remounting.
>
>
> Best,
> -Nikolaus
>
>
> -- 
> GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
> Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F
>
>  »Time flies like an arrow, fruit flies like a Banana.«


-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

 »Time flies like an arrow, fruit flies like a Banana.«


Re: btrfs ate my data in just two days, after a fresh install. ram and disk are ok. it still mounts, but I cannot repair

2016-05-12 Thread Austin S. Hemmelgarn

On 2016-05-12 10:35, Niccolò Belli wrote:

On lunedì 9 maggio 2016 18:29:41 CEST, Zygo Blaxell wrote:

Did you also check the data matches the backup?  btrfs check will only
look at the metadata, which is 0.1% of what you've copied.  From what
you've written, there should be a lot of errors in the data too.  If you
have incorrect data but btrfs scrub finds no incorrect checksums, then
your storage layer is probably fine and we have to look at CPU, host RAM,
and software as possible culprits.
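
(One quick, hedged way to do such a backup-vs-live comparison, assuming
the backup is a plain file tree at /backup: a checksum-based rsync dry
run lists differing files without changing anything:

  rsync -rcn --itemize-changes /backup/ /mnt/ | head)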

The logs you've posted so far indicate that bad metadata (e.g. negative
item lengths, nonsense transids in metadata references but sane transids
in the referred pages) is getting into otherwise valid and well-formed
btrfs metadata pages.  Since these pages are protected by checksums,
the corruption can't be originating in the storage layer--if it was, the
pages should be rejected as they are read from disk, before btrfs even
looks at them, and the insane transid should be the "found" one not the
"expected" one.  That suggests there is either RAM corruption happening
_after_ the data is read from disk (i.e. while the pages are cached in
RAM), or a severe software bug in the kernel you're running.


When doing the btrfs check I also always do a btrfs scrub and it never
found any error. Once it didn't manage to finish the scrub because of:
BTRFS critical (device dm-0): corrupt leaf, slot offset bad:
block=670597120,root=1, slot=6
and btrfs scrub status reported "was aborted after 00:00:10".

Talking about scrub I created a systemd timer to run scrub hourly and I
noticed 2 *uncorrectable* errors suddenly appeared on my system. So I
immediately re-ran the scrub just to confirm it and then I rebooted into
the Arch live usb and ran btrfs check: the metadata were perfect. So
I ran btrfs scrub from the live usb and there were no errors at all!
I rebooted into my system and ran scrub once again and the
uncorrectable errors were really gone! It happened two times in the
past few days.
This would indicate to me that you've either got bad RAM (most likely), 
or some other hardware component is not working correctly.  It's not 
unusual for hardware issues to be intermittent.



Try different kernel versions (e.g. 4.4.9 or 4.1.23) in case whoever
maintains your kernel had a bad day and merged a patch they should
not have.


Almost no patches get applied by the Arch kernel team:
https://git.archlinux.org/svntogit/packages.git/tree/trunk?h=packages/linux
At the moment the only one is a harmless
"change-default-console-loglevel.patch".


Try a minimal configuration with as few drivers as possible loaded,
especially GPU drivers and anything from the staging subdirectory--when
these drivers have bugs, they ruin everything.


Arch kernel team is quite conservative regarding staging/experimental
features, I remember they rejected some config patches I submitted
because of this.
Anyway I will try to blacklist as many kernel modules as I can. Maybe
blacklisting GPU is too much because if I can't actually use my laptop
it will be much more difficult to reproduce the issue.
Disable the GPU driver, but make sure you have the VGA_CONSOLE config 
enabled, and you should be fine (you'll just get a 80x25 text-mode 
console instead of a high-resolution one).



Try memtest86+ which has a few more/different tests than memtest86.
I have encountered RAM modules that pass memtest86 but fail memtest86+
and vice versa.

Try memtester, a memory tester that runs as a Linux process, so it can
detect corruption caused when device drivers spray data randomly into
RAM,
or when the CPU thermal controls are influenced by Linux (an overheating
CPU-to-RAM bridge can really ruin your day, and some of the dumber laptop
designs rely on the OS for thermal management).

Try running more than one memory testing process, in case there is a bug
in your hardware that affects interactions between multiple cores
(memtest
is single-threaded).  You can run memtest86 inside a kvm (e.g. kvm
-m 3072 -kernel /boot/memtest86.bin) to detect these kinds of issues.

Kernel compiles are a bad way to test RAM.  I've successfully built
kernels on hosts with known RAM failures.  The kernels don't always work
properly, but it's quite rare to see a build fail outright.


I didn't use memtest86+ because of the lack of EFI support, but I just
tried the shiny new memtest86 7.0 beta with improved tests for 12+ hours
without issues.
Also I ran "memtester 4G" and "systester-cli -gausslg 64M -threads 4
-turns 10" together for 12 hours without any issue, so I think both
my ram and cpu are ok.
That's probably a good indication of the CPU and the MB being OK, but 
not necessarily the RAM.  There's two other possible options for testing 
the RAM that haven't been mentioned yet though (which I hadn't thought 
of myself until now):
1. If you have access to Windows, try the Windows Memory Diagnostic. 
This runs yet another slightly different set of tests from memtest86 and 
memtest86+, 

Re: [PATCH 2/3] btrfs-progs: autogen: Make build success in CentOS 6 and 7

2016-05-12 Thread Jeff Mahoney
On 5/12/16 6:42 AM, Zhao Lei wrote:
> btrfs-progs build failed in CentOS 6 and 7:
>  #./autogen.sh
>  ...
>  configure.ac:131: error: possibly undefined macro: PKG_CHECK_VAR
>   If this token and others are legitimate, please use m4_pattern_allow.
>   See the Autoconf documentation.
>  ...
> 
> Seems PKG_CHECK_VAR is new in pkgconfig 0.28 (24-Jan-2013):
> http://redmine.audacious-media-player.org/boards/1/topics/736
> 
> And the max available version for CentOS 7 in yum-repo and
> rpmfind.net is: pkgconfig-0.27.1-4.el7
> http://rpmfind.net/linux/rpm2html/search.php?query=pkgconfig=Search+...=centos=
> 
> I updated my pkgconfig to 0.30, but it still failed with the above error.
> (Maybe it is a problem with my setup.)
> 
> To let users on CentOS 6 and 7 build btrfs-progs without
> further changes, we can avoid using PKG_CHECK_VAR in the following
> way, found in:
> https://github.com/audacious-media-player/audacious-plugins/commit/f95ab6f939ecf0d9232b3165f9241d2ea9676b9e
> 
> Signed-off-by: Zhao Lei 
> ---
>  configure.ac | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/configure.ac b/configure.ac
> index 4688bc7..a754990 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -128,7 +128,7 @@ PKG_STATIC(UUID_LIBS_STATIC, [uuid])
>  PKG_CHECK_MODULES(ZLIB, [zlib])
>  PKG_STATIC(ZLIB_LIBS_STATIC, [zlib])
>  
> -PKG_CHECK_VAR([UDEVDIR], [udev], [udevdir])
> +UDEVDIR="$(pkg-config udev --variable=udevdir)"

You need, minimally, AC_SUBST(UDEVDIR) here as well.
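
I.e., something along these lines (a sketch, not a tested patch):

  UDEVDIR="$(pkg-config udev --variable=udevdir)"
  AC_SUBST([UDEVDIR])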

-Jeff

-- 
Jeff Mahoney
SUSE Labs





Re: btrfs ate my data in just two days, after a fresh install. ram and disk are ok. it still mounts, but I cannot repair

2016-05-12 Thread Niccolò Belli

On lunedì 9 maggio 2016 18:29:41 CEST, Zygo Blaxell wrote:

Did you also check the data matches the backup?  btrfs check will only
look at the metadata, which is 0.1% of what you've copied.  From what
you've written, there should be a lot of errors in the data too.  If you
have incorrect data but btrfs scrub finds no incorrect checksums, then
your storage layer is probably fine and we have to look at CPU, host RAM,
and software as possible culprits.

The logs you've posted so far indicate that bad metadata (e.g. negative
item lengths, nonsense transids in metadata references but sane transids
in the referred pages) is getting into otherwise valid and well-formed
btrfs metadata pages.  Since these pages are protected by checksums,
the corruption can't be originating in the storage layer--if it was, the
pages should be rejected as they are read from disk, before btrfs even
looks at them, and the insane transid should be the "found" one not the
"expected" one.  That suggests there is either RAM corruption happening
_after_ the data is read from disk (i.e. while the pages are cached in
RAM), or a severe software bug in the kernel you're running.


When doing the btrfs check I also always do a btrfs scrub and it never 
found any error. Once it didn't manage to finish the scrub because of:
BTRFS critical (device dm-0): corrupt leaf, slot offset bad: 
block=670597120,root=1, slot=6

and btrfs scrub status reported "was aborted after 00:00:10".

Talking about scrub I created a systemd timer to run scrub hourly and I 
noticed 2 *uncorrectable* errors suddenly appeared on my system. So I 
immediately re-ran the scrub just to confirm it and then I rebooted into
the Arch live usb and ran btrfs check: the metadata were perfect. So I
ran btrfs scrub from the live usb and there were no errors at all! I
rebooted into my system and ran scrub once again and the uncorrectable
errors were really gone! It happened two times in the past few days.



Try different kernel versions (e.g. 4.4.9 or 4.1.23) in case whoever
maintains your kernel had a bad day and merged a patch they should
not have.


Almost no patches get applied by the Arch kernel team: 
https://git.archlinux.org/svntogit/packages.git/tree/trunk?h=packages/linux
At the moment the only one is a harmless
"change-default-console-loglevel.patch".



Try a minimal configuration with as few drivers as possible loaded,
especially GPU drivers and anything from the staging subdirectory--when
these drivers have bugs, they ruin everything.


Arch kernel team is quite conservative regarding staging/experimental 
features, I remember they rejected some config patches I submitted because 
of this.
Anyway I will try to blacklist as many kernel modules as I can. Maybe 
blacklisting GPU is too much because if I can't actually use my laptop it 
will be much more difficult to reproduce the issue.



Try memtest86+ which has a few more/different tests than memtest86.
I have encountered RAM modules that pass memtest86 but fail memtest86+
and vice versa.

Try memtester, a memory tester that runs as a Linux process, so it can
detect corruption caused when device drivers spray data randomly into RAM,
or when the CPU thermal controls are influenced by Linux (an overheating
CPU-to-RAM bridge can really ruin your day, and some of the dumber laptop
designs rely on the OS for thermal management).

Try running more than one memory testing process, in case there is a bug
in your hardware that affects interactions between multiple cores (memtest
is single-threaded).  You can run memtest86 inside a kvm (e.g. kvm
-m 3072 -kernel /boot/memtest86.bin) to detect these kinds of issues.

Kernel compiles are a bad way to test RAM.  I've successfully built
kernels on hosts with known RAM failures.  The kernels don't always work
properly, but it's quite rare to see a build fail outright.


I didn't use memtest86+ because of the lack of EFI support, but I just 
tried the shiny new memtest86 7.0 beta with improved tests for 12+ hours 
without issues.
Also I ran "memtester 4G" and "systester-cli -gausslg 64M -threads 4
-turns 10" together for 12 hours without any issue, so I think both my
ram and cpu are ok.


I can only think of two possible culprits now (correct me if I'm wrong):
1) A btrfs bug
2) Another module screwing things around

I can do nothing about btrfs bugs so I will try to hunt the second option. 
This is the list of modules I'm running:


lsmod | awk '$4 == ""' | awk '{print $1}' | sort

8250_dw
ac
acpi_als
acpi_pad
aesni_intel
ahci
algif_skcipher
ansi_cprng
arc4
atkbd
battery
bnep
btrfs
btusb
cdc_ether
cmac
coretemp
crc32c_intel
crc32_pclmul
crct10dif_pclmul
dell_laptop
dell_wmi
dm_crypt
drbg
ecb
elan_i2c
evdev
ext4
fan
fjes
ghash_clmulni_intel
gpio_lynxpoint
hid_generic
hid_multitouch
hmac
i2c_designware_platform
i2c_hid
i2c_i801
i915
input_leds
int3400_thermal
int3402_thermal
int3403_thermal
intel_hid
intel_pch_thermal
intel_powerclamp
intel_rapl
ip_tables

Undelete deleted subvolume?

2016-05-12 Thread Andrei Borzenkov
I accidentally deleted wrong snapshot using SUSE snapper. Is it
possible to undelete subvolume? I know that it is possible to extract
files from old tree (although SLES12 does not seem to offer
btrfs-find-root), but is it possible to "reconnect" subvolume back?
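
(For reference, the offline extraction route mentioned above is usually
a variation on the following sketch; the bytenr is whatever old tree
root btrfs-find-root reports, and the paths are hypothetical:

  btrfs-find-root /dev/sdX
  btrfs restore -t <bytenr> /dev/sdX /mnt/recovery)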


Re: [PATCH] add option to suppress "At subvol …" message in btrfs send

2016-05-12 Thread David Sterba
On Sat, May 07, 2016 at 06:29:58PM +0200, M G Berberich wrote:
> btrfs send puts an “At subvol …” message on stderr, which is very annoying in
> scripts, esp. cron-jobs. Piping stderr to /dev/null does suppress this
> message, but also error messages which one would probably want to
> see. I added an option so as not to change the default behavior of btrfs send
> and possibly break existing scripts, but moving this message to
> verbose would be O.K. for me too.

We should use the current verbosity option. For compatibility reasons,
I'd keep the 'At subvol' printed as default, matching verbosity level 1.
All existing messages verbosity should then become 2, and the proposed
quiet option 0.


Re: [PATCH 0/1] btrfs-progs: Typo review of strings and comments

2016-05-12 Thread David Sterba
On Wed, May 11, 2016 at 07:50:35PM -0400, Nicholas D Steeves wrote:
> Thank you David Sterba for the btrfs-typos.txt which gave me a head
> start.  Unfortunately I wasn't able to finish before btrfs-progs-4.5.3
> was released, because I decided to use emacs'
> ispell-comments-and-strings to do a full review.  I had to rebase to
> kdave's 4.5.2 branch on github, and that is what this patch will
> cleanly apply to.

Yes, patch applied cleanly.

> There were a couple of instances where I wasn't sure what to do; I've
> annotated them, and they are long lines now.  To find them, search or
> grep the diff for 'Steeves'.

Found and updated, most of them real typos; 'strtoull' is the name of a C
library function.

Thanks.


[PATCH 2/3] btrfs-progs: autogen: Make build success in CentOS 6 and 7

2016-05-12 Thread Zhao Lei
btrfs-progs build failed in CentOS 6 and 7:
 #./autogen.sh
 ...
 configure.ac:131: error: possibly undefined macro: PKG_CHECK_VAR
  If this token and others are legitimate, please use m4_pattern_allow.
  See the Autoconf documentation.
 ...

Seems PKG_CHECK_VAR is new in pkgconfig 0.28 (24-Jan-2013):
http://redmine.audacious-media-player.org/boards/1/topics/736

And the max available version for CentOS 7 in yum-repo and
rpmfind.net is: pkgconfig-0.27.1-4.el7
http://rpmfind.net/linux/rpm2html/search.php?query=pkgconfig=Search+...=centos=

I updated my pkgconfig to 0.30, but it still failed with the above error.
(Maybe it is a problem with my setup.)

To let users on CentOS 6 and 7 build btrfs-progs without
further changes, we can avoid using PKG_CHECK_VAR in the following
way, found in:
https://github.com/audacious-media-player/audacious-plugins/commit/f95ab6f939ecf0d9232b3165f9241d2ea9676b9e

Signed-off-by: Zhao Lei 
---
 configure.ac | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure.ac b/configure.ac
index 4688bc7..a754990 100644
--- a/configure.ac
+++ b/configure.ac
@@ -128,7 +128,7 @@ PKG_STATIC(UUID_LIBS_STATIC, [uuid])
 PKG_CHECK_MODULES(ZLIB, [zlib])
 PKG_STATIC(ZLIB_LIBS_STATIC, [zlib])
 
-PKG_CHECK_VAR([UDEVDIR], [udev], [udevdir])
+UDEVDIR="$(pkg-config udev --variable=udevdir)"
 
 dnl lzo library does not provide pkg-config, let use classic way
 AC_CHECK_LIB([lzo2], [lzo_version], [
-- 
1.8.5.1





[PATCH 1/3] btrfs-progs: autogen: Avoid chdir fail on dirname with blank

2016-05-12 Thread Zhao Lei
If source put in dir with blanks, as:
  /var/lib/jenkins/workspace/btrfs progs

autogen will fail:
./autogen.sh: line 95: cd: /var/lib/jenkins/workspace/btrfs: No such file or 
directory

This can be fixed by quoting the argument of the cd command.

Signed-off-by: Zhao Lei 
---
 autogen.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/autogen.sh b/autogen.sh
index 9669850..8b9a9cb 100755
--- a/autogen.sh
+++ b/autogen.sh
@@ -92,7 +92,7 @@ find_autofile config.guess
 find_autofile config.sub
 find_autofile install-sh
 
-cd $THEDIR
+cd "$THEDIR"
 
 echo
 echo "Now type '$srcdir/configure' and 'make' to compile."
-- 
1.8.5.1





[PATCH 3/3] btrfs-progs: autogen: Don't show success message on fail

2016-05-12 Thread Zhao Lei
When autogen.sh fails, the success message still appears in the output:
 # ./autogen.sh
 ...
 configure.ac:131: error: possibly undefined macro: PKG_CHECK_VAR
  If this token and others are legitimate, please use m4_pattern_allow.
  See the Autoconf documentation.

 Now type './configure' and 'make' to compile.
 #

Fixed by checking the return values of the autotools commands.

After patch:
 # ./autogen.sh
 ...
 configure.ac:132: error: possibly undefined macro: PKG_CHECK_VAR
  If this token and others are legitimate, please use m4_pattern_allow.
  See the Autoconf documentation.
 #

Signed-off-by: Zhao Lei 
---
 autogen.sh | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/autogen.sh b/autogen.sh
index 8b9a9cb..a5f9af2 100755
--- a/autogen.sh
+++ b/autogen.sh
@@ -64,9 +64,10 @@ echo "   automake:   $(automake --version | head -1)"
 chmod +x version.sh
 rm -rf autom4te.cache
 
-aclocal $AL_OPTS
-autoconf $AC_OPTS
-autoheader $AH_OPTS
+aclocal $AL_OPTS &&
+autoconf $AC_OPTS &&
+autoheader $AH_OPTS ||
+exit 1
 
 # it's better to use helper files from automake installation than
 # maintain copies in git tree
-- 
1.8.5.1





Re: Amount of scrubbed data goes from 15.90GiB to 26.66GiB after defragment -r -v -clzo on a fs always mounted with compress=lzo

2016-05-12 Thread Duncan
Niccolò Belli posted on Wed, 11 May 2016 21:50:43 +0200 as excerpted:

> Hi,
> Before doing the daily backup I did a btrfs check and btrfs scrub as
> usual.

> After that this time I also decided to run btrfs filesystem defragment
> -r -v -clzo on all subvolumes (from a live distro) and just to be sure I
> ran check and scrub once again.
> 
> Before defragment: total bytes scrubbed: 15.90GiB with 0 errors
> After  defragment: total bytes scrubbed: 26.66GiB with 0 errors
> 
> What did happen? This is something like a night and day difference:
> almost double the data! As stated in the subject all the subolumes have
> always been mounted with compress=lzo in /etc/fstab, even when I
> installed the distro a couple of days ago I manually mounted the
> subvolumes with -o compress=lzo. Instead I never used autodefrag.

I'd place money on your use of either snapshots or dedup.  As CAM says 
(perhaps too) briefly, defrag isn't snapshot (technically, reflink) 
aware, and will break reflinks from other snapshots/dedups as it defrags 
whatever file it's currently working on.

If there's few to no reflinks, as there won't be if you're not using 
snapshots, btrfs dedup, etc, no problem, but where there's existing 
reflinks, the mechanism both snapshots and the various btrfs dedup tools 
use, it will rewrite only the copy of the data it's working on, leaving 
the others as they are, thus effectively doubling (for the snapshots and 
first defrag case) the data usage, the old possibly multiply snapshot-
reflinked copy, and the new defragged copy that no longer shares extents 
with the snapshots and other previously reflinked copies.
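
(If you want to see how much of a path is still shared before deciding
to defrag it, newer btrfs-progs can report exclusive vs. shared usage;
this assumes a progs version that already ships the filesystem du
subcommand:

  btrfs filesystem du -s /path/to/check)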


And unlike a normal defrag, when you use the compress option, it forces a 
rewrite of every file in order to (possibly re)compress it.  So while a 
normal defrag would have only rewritten some files and would have only 
expanded data usage to the extent it actually did rewrites, the compress 
option forced it to recompress all files it came across, breaking all 
those reflinks and duplicating the data if existing snapshots, etc, still 
referenced the old copies, in the process, thereby effectively doubling 
your data usage.

The fact that it didn't /quite/ double usage may be down to the normal 
compress mount option only doing a quick compression test and not 
compressing it if the file doesn't seem particularly compressible based 
on that quick test, while the defrag with compress likely actually checks 
every (128 KiB compression) block, getting a bit better compression in 
the process.  So the defrag/compress run didn't quite double usage as it 
compressed some stuff that the runtime compression didn't.  (FWIW, you 
can get the more thorough runtime compression behavior with the compress-
force option, which always tries compression, not just doing a quick test 
and skipping compression on the entire file if the bit the test tried 
didn't compress so well.)
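
(In fstab terms the difference is just the one mount option; the device
and mountpoint below are illustrative:

  UUID=<fs-uuid>  /data  btrfs  compress=lzo        0  0
  UUID=<fs-uuid>  /data  btrfs  compress-force=lzo  0  0)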


FWIW, around 3.9, btrfs defrag was actually snapshot/reflink aware for a 
few releases, but it turned out that dealing with all those reflinks 
simply didn't scale well with the then existing code, and people were 
reporting defrag runs taking days or weeks, to (would-be) months with 
enough snapshots and with quotas (which didn't scale well either) turned 
on.

Obviously that was simply unworkable, so defrag's snapshot awareness was 
reverted until they could make it scale better, as a working but snapshot 
unaware defrag was clearly more practical than one that couldn't be run 
because it'd take months, and that snapshot awareness has yet to be 
reactivated.


So now the bottom line is don't defrag what you don't want un-reflinked.


FWIW, autodefrag has the same problem in theory, but the effect in 
practice is far more limited, in part because it only does its defrag 
thing when some part of the file is being rewritten (and thus COWed 
elsewhere, which already un-reflinks the actually written blocks), and 
while autodefrag will magnify that a bit by COWing somewhat larger 
extents, for files of any size (MiB scale and larger) it's not going to 
rewrite and thus duplicate the entire file, as defrag could do.  And 
it's definitely not going to be rewriting all files in large sections of 
the filesystem as recursive defrag with the compression option will.

Additionally, autodefrag will tend to defrag the file shortly after it 
has been changed, likely before any snapshots have been taken if they're 
only taken daily or so, so you'll only have effectively two copies of the 
portion of the file that was changed, the old version as still locked in 
place by previous snapshots and the new version, not the three that 
you're likely to have if you wait until snapshots have been done before 
doing the defrag (the old version as in previous snapshots, the new 
version as initially written and locked in place by post-change pre-
defrag snapshots, and the new version as defragged).

-- 
Duncan - List replies preferred.   No HTML msgs.

Re: Btrfs progs release 4.5.3

2016-05-12 Thread David Sterba
On Thu, May 12, 2016 at 09:14:44AM +, Duncan wrote:
> David Sterba posted on Wed, 11 May 2016 16:47:39 +0200 as excerpted:
> 
> > btrfs-progs 4.5.2 have been released. A bugfix release.
> 
> So 4.5.3 as stated in the subject line, or 4.5.2 as stated in the message 
> body?
> 
> Given that I'm on 4.5.2, it must be 4.5.3 that's just released. =:^)

The subject line is correct, I'm copying the previous release message
text and forgot to update the version everywhere.


Re: Idea on compatibility for old distributions

2016-05-12 Thread David Sterba
On Thu, May 12, 2016 at 04:51:22PM +0800, Qu Wenruo wrote:
> David Sterba wrote on 2016/05/12 10:37 +0200:
> > On Thu, May 12, 2016 at 08:57:47AM +0800, Qu Wenruo wrote:
> >> Thanks for the info.
> >>
> >> I also read the phoronix news yesterday.
> >> So for RHEL6 that's meaningless.
> >>
> >> But I'm not sure whether it's still meaningless for OpenSUSE, maybe
> >> David has some plan on it?
> >
> > No plans.
> >
> So the compatibility layer patch is just to allow btrfs-progs to be 
> compiled on old distros?

Yes, but convert compatibility is with e2fsprogs while
FIEMAP_EXTENT_SHARED is kernel interface.


Re: Btrfs progs release 4.5.3

2016-05-12 Thread Duncan
David Sterba posted on Wed, 11 May 2016 16:47:39 +0200 as excerpted:

> btrfs-progs 4.5.2 have been released. A bugfix release.

So 4.5.3 as stated in the subject line, or 4.5.2 as stated in the message 
body?

Given that I'm on 4.5.2, it must be 4.5.3 that's just released. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot

2016-05-12 Thread David Sterba
On Wed, May 11, 2016 at 01:30:21PM -0700, Josef Bacik wrote:
> > Signed-off-by: Qu Wenruo 
> > Signed-off-by: Mark Fasheh 
> Reviewed-by: Josef Bacik 

Applied to for-next with the following fixup to make it bisectable:

---
btrfs: build fixup for qgroup_account_snapshot

The macro btrfs_std_error got renamed to btrfs_handle_fs_error in an
independent branch for the same merge target (4.7). To make the code
compilable for bisectability reasons, add a temporary stub.

Signed-off-by: David Sterba 
---
 fs/btrfs/transaction.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index d7172d7ced5f..530081388d77 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1311,6 +1311,11 @@ int btrfs_defrag_root(struct btrfs_root *root)
return ret;
 }

+/* Bisectability fixup, remove in 4.8 */
+#ifndef btrfs_std_error
+#define btrfs_std_error btrfs_handle_fs_error
+#endif
+
 /*
  * Do all special snapshot related qgroup dirty hack.
  *
---


Re: Idea on compatibility for old distributions

2016-05-12 Thread Qu Wenruo

David Sterba wrote on 2016/05/12 10:37 +0200:

On Thu, May 12, 2016 at 08:57:47AM +0800, Qu Wenruo wrote:

Thanks for the info.

I also read the phoronix news yesterday.
So for RHEL6 that's meaningless.

But I'm not sure whether it's still meaningless for OpenSUSE, maybe
David has some plan on it?


No plans.


So the compatibility layer patch is just to allow btrfs-progs to be 
compiled on old distros?


Thanks,
Qu




Re: Idea on compatibility for old distributions

2016-05-12 Thread David Sterba
On Thu, May 12, 2016 at 08:57:47AM +0800, Qu Wenruo wrote:
> Thanks for the info.
> 
> I also read the phoronix news yesterday.
> So for RHEL6 that's meaningless.
> 
> But I'm not sure whether it's still meaningless for OpenSUSE, maybe 
> David has some plan on it?

No plans.


[PATCH v2] fstests: generic: Test reserved extent map search routine on deduped file

2016-05-12 Thread Qu Wenruo
For a fully deduped file, which means all its file extents point to
the same bytenr, btrfs can hit a soft lockup when the fiemap ioctl is
called on that file, like the following output:
--
CPU: 1 PID: 7500 Comm: xfs_io Not tainted 4.5.0-rc6+ #2
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox
12/01/2006
task: 880027681b40 ti: 8800276e task.ti: 8800276e
RIP: 0010:[]  []
__merge_refs+0x34/0x120 [btrfs]
RSP: 0018:8800276e3c08  EFLAGS: 0202
RAX: 8800269cc330 RBX: 8800269cdb18 RCX: 0007
RDX: 61b0 RSI: 8800269cc4c8 RDI: 8800276e3c88
RBP: 8800276e3c20 R08:  R09: 0001
R10:  R11:  R12: 880026ea3cb0
R13: 8800276e3c88 R14: 880027132a50 R15: 88002743
FS:  7f10201df700() GS:88003fa0()
knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f10201ec000 CR3: 27603000 CR4: 000406e0
Stack:
    8800276e3ce8
 a0259f38 0005 8800274c6870 8800274c7d88
 00c1  0001 27431190
Call Trace:
 [] find_parent_nodes+0x448/0x740 [btrfs]
 [] btrfs_check_shared+0x102/0x1b0 [btrfs]
 [] ? __might_fault+0x4d/0xa0
 [] extent_fiemap+0x2ac/0x550 [btrfs]
 [] ? __filemap_fdatawait_range+0x96/0x160
 [] ? btrfs_get_extent+0xb30/0xb30 [btrfs]
 [] btrfs_fiemap+0x45/0x50 [btrfs]
 [] do_vfs_ioctl+0x498/0x670
 [] SyS_ioctl+0x79/0x90
 [] entry_SYSCALL_64_fastpath+0x12/0x6f
Code: 41 55 41 54 53 4c 8b 27 4c 39 e7 0f 84 e9 00 00 00 49 89 fd 49 8b
34 24 49 39 f5 48 8b 1e 75 17 e9 d5 00 00 00 49 39 dd 48 8b 03 <48> 89
de 0f 84 b9 00 00 00 48 89 c3 8b 46 2c 41 39 44 24 2c 75
--

Also btrfs will return a wrong flag for all these extents; they should
have the SHARED (0x2000) flag, while btrfs still considers them
exclusive extents.
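
One hedged way to observe the flag from userspace is xfs_io's fiemap
command; the output below is illustrative, with 0x2000 being
FIEMAP_EXTENT_SHARED:

  $ xfs_io -r -c "fiemap -v" /mnt/file
   EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
     0: [0..127]:        256..383           128 0x2000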

On the other hand, with the unmerged xfs reflink patches, xfs handles it
without problems, and patched btrfs can also handle it.

This test case will create a large, fully deduped file to check whether
the fs can handle the fiemap ioctl and return the correct SHARED flag,
for any fs which supports reflink.

Reported-by: Tsutomu Itoh 
Signed-off-by: Qu Wenruo 
---
v2:
  Use more xfs_io wrappers instead of calling $XFS_IO_PROG directly
  Add fiemap requirement
  Refactor output so it matches the golden output when LOAD_FACTOR is not 1
---
 common/punch  | 17 +
 tests/generic/352 | 98 +++
 tests/generic/352.out |  5 +++
 tests/generic/group   |  1 +
 4 files changed, 121 insertions(+)
 create mode 100755 tests/generic/352
 create mode 100644 tests/generic/352.out
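
(For reference, a sketch of what the new _filter_fiemap_flags helper is
meant to consume and emit; the field numbers follow "xfs_io -c 'fiemap -v'"
output, and the example itself, including $testfile, is not part of the
patch.)
--
$XFS_IO_PROG -c "fiemap -v" $testfile | _filter_fiemap_flags
# a verbose fiemap line such as
#	0: [0..127]:  256..383  128  0x2001
# becomes "0: [0..127]: 0x2001"; hole rows keep the word "hole", and any
# flags word with the unwritten bit (0x800) set is printed as "unwritten"
--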

diff --git a/common/punch b/common/punch
index 43f04c2..44c6e1c 100644
--- a/common/punch
+++ b/common/punch
@@ -218,6 +218,23 @@ _filter_fiemap()
_coalesce_extents
 }
 
+_filter_fiemap_flags()
+{
+	$AWK_PROG '
+		$3 ~ /hole/ {
+			print $1, $2, $3;
+			next;
+		}
+		$5 ~ /0x[[:xdigit:]]*8[[:xdigit:]][[:xdigit:]]/ {
+			print $1, $2, "unwritten";
+			next;
+		}
+		$5 ~ /0x[[:xdigit:]]+/ {
+			print $1, $2, $5;
+		}' |
+	_coalesce_extents
+}
+
 # Filters fiemap output to only print the 
 # file offset column and whether or not
 # it is an extent or a hole
diff --git a/tests/generic/352 b/tests/generic/352
new file mode 100755
index 000..3537074
--- /dev/null
+++ b/tests/generic/352
@@ -0,0 +1,98 @@
+#! /bin/bash
+# FS QA Test 352
+#
+# Test fiemap ioctl on heavily deduped file
+#
+# This test case checks that reverse extent map searching goes without
+# problem and returns the correct SHARED flag, where unpatched btrfs
+# would soft lock up and return the wrong SHARED flag.
+#
+#---
+# Copyright (c) 2016 Fujitsu.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!

Re: BTRFS Data at Rest File Corruption

2016-05-12 Thread Chris Murphy
What are the mount options for this filesystem?

Maybe filter the journalctl output or /var/log/messages with grep/egrep
-i for btrfs, and also for libata/scsi messages, as sketched below.
Anything from the previous days might reveal some clue.
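
Something along these lines (just a sketch; adjust the time window and
patterns to taste):

journalctl --since yesterday | grep -iE 'btrfs|libata|scsi'
grep -iE 'btrfs|libata|scsi' /var/log/messages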

I've got multiple Btrfs raid1's and several times I've had
*correctable* errors. So your expectation is proper.


Chris Murphy


Re: 4.4.0 - no space left with >1.7 TB free space left

2016-05-12 Thread Tomasz Chmielewski

On 2016-05-12 15:03, Tomasz Chmielewski wrote:
> FYI, I'm still getting this with 4.5.3, which probably means the fix
> was not yet included ("No space left" at snapshot time):
>
> /var/log/postgresql/postgresql-9.3-main.log:2016-05-11 06:06:10 UTC
> LOG:  could not close temporary statistics file
> "pg_stat_tmp/db_0.tmp": No space left on device
> /var/log/postgresql/postgresql-9.3-main.log:2016-05-11 06:06:10 UTC
> LOG:  could not close temporary statistics file
> "pg_stat_tmp/global.tmp": No space left on device
> /var/log/postgresql/postgresql-9.3-main.log:2016-05-11 06:06:10 UTC
> LOG:  could not close temporary statistics file
> "pg_stat_tmp/db_0.tmp": No space left on device
> /var/log/postgresql/postgresql-9.3-main.log:2016-05-11 06:06:10 UTC
> LOG:  could not close temporary statistics file
> "pg_stat_tmp/global.tmp": No space left on device
>
> I've tried mounting with space_cache=v2, but it didn't help.

On the good side, I see it's in 4.6-rc7.


Tomasz Chmielewski
http://wpkg.org



Re: 4.4.0 - no space left with >1.7 TB free space left

2016-05-12 Thread Tomasz Chmielewski

On 2016-04-08 20:53, Roman Mamedov wrote:
>>> Do you snapshot the parent subvolume which holds the databases? Can you
>>> correlate that perhaps ENOSPC occurs at the time of snapshotting? If
>>> yes, then you should try the patch
>>> https://patchwork.kernel.org/patch/7967161/
>>>
>>> (Too bad this was not included into 4.4.1.)
>>
>> By the way - was it included in any later kernel? I'm running 4.4.5 on
>> that server, but still hitting the same issue.
>
> It's not in 4.4.6 either. I don't know why it doesn't get included, or
> what we need to do. Last time I asked, it was queued:
> http://www.spinics.net/lists/linux-btrfs/msg52478.html
> But maybe that meant 4.5 or 4.6 only? While the bug is affecting people
> on 4.4.x today.

FYI, I'm still getting this with 4.5.3, which probably means the fix was
not yet included ("No space left" at snapshot time):

/var/log/postgresql/postgresql-9.3-main.log:2016-05-11 06:06:10 UTC LOG:
 could not close temporary statistics file "pg_stat_tmp/db_0.tmp": No
space left on device
/var/log/postgresql/postgresql-9.3-main.log:2016-05-11 06:06:10 UTC LOG:
 could not close temporary statistics file "pg_stat_tmp/global.tmp": No
space left on device
/var/log/postgresql/postgresql-9.3-main.log:2016-05-11 06:06:10 UTC LOG:
 could not close temporary statistics file "pg_stat_tmp/db_0.tmp": No
space left on device
/var/log/postgresql/postgresql-9.3-main.log:2016-05-11 06:06:10 UTC LOG:
 could not close temporary statistics file "pg_stat_tmp/global.tmp": No
space left on device

I've tried mounting with space_cache=v2, but it didn't help.



Tomasz Chmielewski
http://wpkg.org
