Niccolò Belli wrote on 2016/05/05 01:21 +0200:
I really need your help, because it's the second time btrfs ate my data
in a couple of days and I can't use my laptop if I don't find the culprit.

This was the mail I sent a couple of days ago:
https://www.spinics.net/lists/linux-btrfs/msg54754.html

Output in that mail shows obvious tree block corruption:
checksum verify failed on 245498111 found C7652CC3 wanted 00000000
checksum verify failed on 245498111 found C7652CC3 wanted 00000000
checksum verify failed on 245498111 found C7652CC3 wanted 00000000
checksum verify failed on 245498111 found C7652CC3 wanted 00000000
bytenr mismatch, want=245498111, have=8454382400481263616

That's the root cause of the tons of errors that follow.
I assume it may be the same cause this time.

I previously thought the culprit was a bug in kernel 4.6-rc, but I was
wrong.

Then I reinstalled the whole system (Arch Linux) from scratch, and after
just two days I lost some of my data, again. Once again btrfs check
--repair got stuck in an infinite loop and I can't repair my fs. The
system has always been shut down properly, except for a single time when
I had to forcibly power it off just after boot because I didn't see
any signal on the screen.

First the obvious things:

- memory is ok
(https://drive.google.com/open?id=0Bwe9Wtc-5xF1VnJ0SE9fT1FZMTg)
- disk is ok
(https://drive.google.com/open?id=0Bwe9Wtc-5xF1NGRhd2daVDRJVGc)
- tlp has SATA_LINKPWR_ON_BAT=max_performance
(https://drive.google.com/open?id=0Bwe9Wtc-5xF1dFAwUE5ETVpNWGM)
- rootfs mount options:
rw,noatime,compress=lzo,ssd,discard,space_cache,autodefrag,subvolid=257,subvol=/@

- Command line: BOOT_IMAGE=/@/boot/vmlinuz-linux root=UUID=4fc2278e-f6e8-4a21-8876-cabbf885bb2e rw rootflags=subvol=@ cryptdevice=/dev/disk/by-uuid/c7c8f501-507c-4bd2-a80a-8c7360651f02:cryptroot:allow-discards quiet
- scrub didn't find any error:
$ sudo btrfs scrub status /
scrub status for 4fc2278e-f6e8-4a21-8876-cabbf885bb2e
       scrub started at Thu May  5 00:57:30 2016 and finished after 00:00:45
       total bytes scrubbed: 22.26GiB with 0 errors

I have the whole rootfs encrypted, including boot. I followed these
steps:
https://wiki.archlinux.org/index.php/Dm-crypt/Encrypting_an_entire_system#Btrfs_subvolumes_with_swap


Would it be OK for you to test your btrfs on a plain ssd, without encryption?

I know this suggestion is quite rude, but this would hugely reduce the possible layers we need to investigate.

And just as Chris Murphy said, reducing the mount options is also a pretty good starting point for debugging.


Disk is a SAMSUNG SSD PM851 M.2 2280 256GB (Firmware Version: EXT25D0Q).
Laptop is a Dell XPS 13 9343 QHD+.
Distro is Arch Linux, kernel version is 4.5.1. btrfs-progs is 4.5.2.

Two days after the previous data loss I finished reinstalling my
distro from scratch, then I decided to do a full backup from a snapshot
using tar. This is what I got while trying to back up my data:

tar: usr/share/kig/icons/hicolor/32x32/actions/test.png: Read error at byte 0, while reading 810 bytes: Input/output error
tar: usr/share/kig/icons/hicolor/32x32/actions/circlebpd.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/pointOnLine.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/bezierN.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/convexhull.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/centerofcurvature.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/en.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/circlebps.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/directrix.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/beziercurves.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/segment_midpoint.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/distance.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/circlebcl.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/conicb5p.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/kig_polygon.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/conicasymptotes.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/pointxy.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/attacher.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/coniclineintersection.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/vectorsum.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/rbezier4.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/ellipsebffp.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/angle.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/kig_text.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/vectordifference.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/segmentaxis.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/radicalline.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/polygonsides.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/projection.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/inversion.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/bezier4.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/equilateralhyperbolab4p.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/areaCircle.png: Cannot stat: Stale file handle
tar: var/lib/samba/private/msg.sock/666: socket ignored
tar: Exiting with failure status due to previous errors


[ 3057.008185] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283

The tree blocks are heavily damaged again.
The wanted transid is enormous, definitely not sane.

So the parent node is already corrupted,
although the child transid, 283, seems quite valid.
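
To make the comparison between the two transids concrete, here is a minimal Python sketch that XORs the wanted and found values from the kernel message and lists the bit positions where they differ. The variable names are made up for illustration. The result only describes the numbers themselves; it does not establish what caused the corruption.

```python
# Values copied from the "parent transid verify failed" kernel message.
wanted = 3458764513820541211  # transid recorded in the parent node
found = 283                   # transid stored in the child tree block

# XOR leaves a 1 in every bit position where the two values differ.
diff = wanted ^ found
differing_bits = [i for i in range(64) if (diff >> i) & 1]

print(hex(wanted))       # 0x300000000000011b
print(hex(found))        # 0x11b
print(differing_bits)    # [60, 61]
```

In other words, the wanted value is exactly 283 with bits 60 and 61 also set, so the low bits of the parent's pointer agree with the child and only two high bits are off.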


[ 3057.008195] BTRFS error (device dm-0): error loading props for ino 183988 (root 505): -5
[ 3057.008417] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
[ 3057.008631] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
[ 3057.009165] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
[ 3057.009389] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
[ 3057.009734] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
[ 3057.009960] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
[ 3057.010664] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
[ 3057.010888] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
[ 3057.011201] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
[ 3331.795474] verify_parent_transid: 57 callbacks suppressed
[ 3331.795480] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
[ 3331.795776] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
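
Since the same message repeats many times, it can help to collapse the log into distinct (block, wanted, found) triples. The following is a minimal Python sketch under stated assumptions: the sample text holds three lines copied from the log, and the `summarize` helper and its regular expression are illustrative, not part of any btrfs tool. The pattern allows arbitrary whitespace after "failed" so it also matches lines that a mail client has wrapped.

```python
import re
from collections import Counter

# Three lines copied from the kernel log, one message per line.
SAMPLE = """\
[ 3057.008185] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
[ 3057.008195] BTRFS error (device dm-0): error loading props for ino 183988 (root 505): -5
[ 3057.008417] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
"""

# \s+ tolerates a line break between "failed" and "on".
PATTERN = re.compile(
    r"parent transid verify failed\s+on (\d+) wanted (\d+) found (\d+)"
)

def summarize(log_text):
    """Count transid failures per (bytenr, wanted, found) triple."""
    counts = Counter()
    for match in PATTERN.finditer(log_text):
        counts[tuple(int(group) for group in match.groups())] += 1
    return counts

summary = summarize(SAMPLE)
for (bytenr, wanted, found), n in summary.items():
    print(f"block {bytenr}: wanted {wanted}, found {found} ({n} times)")
```

Run against the full log, every transid failure quoted here refers to the same block, 528089088, with the same wanted and found values.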

I made a copy of /dev/mapper/cryptroot with dd on an external drive and
I ran btrfs check on it (btrfs-progs 4.5.2):
https://drive.google.com/open?id=0Bwe9Wtc-5xF1SjJacXpMMU5mems (37MB)

I checked it, but the output seems to be truncated?

Thanks,
Qu


Then I tried to run btrfs check --repair on it but once again it got
stuck in an infinite loop like this one
(https://www.spinics.net/lists/linux-btrfs/msg54146.html) and after an
hour of looping and several hundred MB of logs I had to kill it.
Here is the log, truncated to 30MB:
https://drive.google.com/open?id=0Bwe9Wtc-5xF1SmRuVUlfeGRES3M

They are probably not needed but here is snapper -c @ list:
https://drive.google.com/open?id=0Bwe9Wtc-5xF1N0llOFpfVXVwNVk
and btrfs subvolume list -p /:
https://drive.google.com/open?id=0Bwe9Wtc-5xF1andCdWZzeV9VbDg

This is the link to the whole gdrive directory with all the logs:
https://drive.google.com/open?id=0Bwe9Wtc-5xF1UFltcXhtRmt4YjA

I really don't know what may be the problem, maybe discard? I can't
think about switching back to ext4 and losing snapshots, transactions,
compression, incremental send/receive backups etc.
I would really love being able to do something to fix it, but I don't
have the slightest idea about what's the problem. Hopefully someone here
will be smarter than me and find the problem, otherwise I will have to
switch to ext4 because I need my laptop to work.

Thanks,
Niccolò
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
