Niccolò Belli wrote on 2016/05/05 01:21 +0200:
I really need your help, because it's the second time btrfs ate my data in a couple of days and I can't use my laptop if I don't find the culprit. This was the mail I sent a couple of days ago: https://www.spinics.net/lists/linux-btrfs/msg54754.html
Output in that mail shows obvious tree block corruption:

checksum verify failed on 245498111 found C7652CC3 wanted 00000000
checksum verify failed on 245498111 found C7652CC3 wanted 00000000
checksum verify failed on 245498111 found C7652CC3 wanted 00000000
checksum verify failed on 245498111 found C7652CC3 wanted 00000000
bytenr mismatch, want=245498111, have=8454382400481263616

That's the root cause of the tons of errors that follow. I assume it may be the same cause this time.
I previously thought the culprit was a bug in kernel 4.6-rc, but I was wrong. Then I reinstalled the whole system (Arch Linux) from scratch, and after just two days I lost some of my data again. Once again btrfs check --repair got stuck in an infinite loop and I can't repair my fs. The system has always been shut down properly, except for a single time when I had to forcibly power it off just after boot because I didn't see any signal on the screen.

First the obvious things:
- memory is ok (https://drive.google.com/open?id=0Bwe9Wtc-5xF1VnJ0SE9fT1FZMTg)
- disk is ok (https://drive.google.com/open?id=0Bwe9Wtc-5xF1NGRhd2daVDRJVGc)
- tlp has SATA_LINKPWR_ON_BAT=max_performance (https://drive.google.com/open?id=0Bwe9Wtc-5xF1dFAwUE5ETVpNWGM)
- rootfs mount options: rw,noatime,compress=lzo,ssd,discard,space_cache,autodefrag,subvolid=257,subvol=/@
- Command line: BOOT_IMAGE=/@/boot/vmlinuz-linux root=UUID=4fc2278e-f6e8-4a21-8876-cabbf885bb2e rw rootflags=subvol=@ cryptdevice=/dev/disk/by-uuid/c7c8f501-507c-4bd2-a80a-8c7360651f02:cryptroot:allow-discards quiet
- scrub didn't find any error:

$ sudo btrfs scrub status /
scrub status for 4fc2278e-f6e8-4a21-8876-cabbf885bb2e
        scrub started at Thu May 5 00:57:30 2016 and finished after 00:00:45
        total bytes scrubbed: 22.26GiB with 0 errors

I have the whole rootfs encrypted, including boot. I followed these steps: https://wiki.archlinux.org/index.php/Dm-crypt/Encrypting_an_entire_system#Btrfs_subvolumes_with_swap
Would it be OK for you to test your btrfs on a plain ssd, without encryption?
I know this suggestion is quite rude, but it would hugely reduce the number of layers we need to investigate.
And just as Chris Murphy said, reducing the mount options is also a pretty good starting point for debugging.
Disk is a SAMSUNG SSD PM851 M.2 2280 256GB (Firmware Version: EXT25D0Q). Laptop is a Dell XPS 13 9343 QHD+. Distro is Arch Linux, kernel version is 4.5.1. btrfs-progs is 4.5.2.

Two days after the previous data loss I finished reinstalling my distro from scratch, then I decided to do a full backup from a snapshot using tar. This is what I got while trying to back up my data:

tar: usr/share/kig/icons/hicolor/32x32/actions/test.png: Read error at byte 0, while reading 810 bytes: Input/output error
tar: usr/share/kig/icons/hicolor/32x32/actions/circlebpd.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/pointOnLine.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/bezierN.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/convexhull.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/centerofcurvature.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/en.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/circlebps.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/directrix.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/beziercurves.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/segment_midpoint.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/distance.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/circlebcl.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/conicb5p.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/kig_polygon.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/conicasymptotes.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/pointxy.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/attacher.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/coniclineintersection.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/vectorsum.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/rbezier4.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/ellipsebffp.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/angle.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/kig_text.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/vectordifference.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/segmentaxis.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/radicalline.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/polygonsides.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/projection.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/inversion.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/bezier4.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/equilateralhyperbolab4p.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/areaCircle.png: Cannot stat: Stale file handle
tar: var/lib/samba/private/msg.sock/666: socket ignored
tar: Exiting with failure status due to previous errors

[ 3057.008185] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
Tree blocks are again heavily damaged. The wanted transid is extremely large, definitely not sane, so the parent node is already corrupted. The child transid, 283, seems quite valid though.
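As a side check on the numbers in that log line, here is a small Python sketch that compares the wanted and found transid bit by bit. It is only an illustration: the regular expression and the function name are made up for this example and are not part of btrfs-progs or the kernel.

```python
import re

# One of the kernel messages quoted in this thread, copied verbatim.
LINE = ("BTRFS error (device dm-0): parent transid verify failed on 528089088 "
        "wanted 3458764513820541211 found 283")

# Illustrative pattern (an assumption); it only targets this message shape.
PATTERN = re.compile(
    r"parent transid verify failed on (\d+) wanted (\d+) found (\d+)")


def compare_transids(line):
    """Return (bytenr, wanted, found, wanted XOR found) for one log line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a parent transid message")
    bytenr, wanted, found = (int(group) for group in match.groups())
    return bytenr, wanted, found, wanted ^ found


bytenr, wanted, found, diff = compare_transids(LINE)
print(f"block   {bytenr}")
print(f"wanted  {wanted:#018x}")
print(f"found   {found:#018x}")
print(f"xor     {diff:#018x} ({bin(diff).count('1')} differing bits)")
```

With these values the wanted transid works out to 0x300000000000011b, that is, 283 (0x11b) with two extra high bits set. That may hint at a couple of flipped bits in the parent node rather than random garbage, but it does not by itself say which layer flipped them.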
[ 3057.008195] BTRFS error (device dm-0): error loading props for ino 183988 (root 505): -5
[ 3057.008417] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
[ 3057.008631] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
[ 3057.009165] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
[ 3057.009389] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
[ 3057.009734] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
[ 3057.009960] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
[ 3057.010664] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
[ 3057.010888] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
[ 3057.011201] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
[ 3331.795474] verify_parent_transid: 57 callbacks suppressed
[ 3331.795480] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
[ 3331.795776] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283

I made a copy of /dev/mapper/cryptroot with dd on an external drive and ran btrfs check on it (btrfs-progs 4.5.2): https://drive.google.com/open?id=0Bwe9Wtc-5xF1SjJacXpMMU5mems (37MB)
Checked, but it seems the output is truncated?

Thanks,
Qu
Then I tried to run btrfs check --repair on it, but once again it got stuck in an infinite loop like this one (https://www.spinics.net/lists/linux-btrfs/msg54146.html), and after an hour of looping and several hundred MB of logs I had to kill it. Here is the log, truncated to 30MB: https://drive.google.com/open?id=0Bwe9Wtc-5xF1SmRuVUlfeGRES3M

They are probably not needed, but here is snapper -c @ list: https://drive.google.com/open?id=0Bwe9Wtc-5xF1N0llOFpfVXVwNVk
and btrfs subvolume list -p /: https://drive.google.com/open?id=0Bwe9Wtc-5xF1andCdWZzeV9VbDg

This is the link to the whole gdrive directory with all the logs: https://drive.google.com/open?id=0Bwe9Wtc-5xF1UFltcXhtRmt4YjA

I really don't know what the problem may be, maybe discard? I don't even want to think about switching back to ext4 and losing snapshots, transactions, compression, incremental send/receive backups etc. I would really love to be able to do something to fix it, but I don't have the slightest idea what the problem is. Hopefully someone here will be smarter than me and find it, otherwise I will have to switch to ext4 because I need my laptop to work.

Thanks,
Niccolò

-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html