Hello!

I tried to "cp --reflink" a huge file (about 80G, a VMware disk
image). After about a minute my PC started thrashing the hard disk,
and some minutes later the command returned with an out-of-memory
message. I could no longer open terminals in KDE Konsole to
investigate dmesg, start new programs, or log out. Hard disk access
seemed to be blocked. Opening a new terminal within Konsole printed
"unable to start /bin/bash" in red letters after a few seconds.

I rebooted using the reset button; I tried Alt+Print+S first, but it
didn't seem to sync anything to the disk. The system booted up just
fine: some error messages about unusable free space caches came up,
but it reached the login manager. However, I can no longer log in: KDE
startup freezes the system. If I ssh into the box first, I can see some
dmesg output related to "bad blocks" and some "transid" errors. So I
ran a scrub, and this is what I get:

# jupiter btrfs-progs-unstable [git:integration-20110805] # ./btrfs scr start -B /mnt/btrfs
ERROR: scrubbing /mnt/btrfs failed for device id 1 (Input/output error)
scrub canceled for 493dacb5-0397-4b47-bd18-c2b2349c9958
        scrub started at Sat Oct  8 00:41:59 2011 and was aborted after 6113 seconds
        total bytes scrubbed: 586.67GB with 40 errors
        error details: verify=40
        corrected errors: 0, uncorrectable errors: 20, unverified errors: 0

jupiter ~ # while true; do dmesg -c; sleep 1; done
[13686.476028] zcache: destroyed pool id=0
[13844.990091] device fsid 493dacb5-0397-4b47-bd18-c2b2349c9958 devid 1 transid 45326 /dev/sdd3
[13844.990374] btrfs: use lzo compression
[13845.074088] btrfs: disk space caching is enabled
[13852.481822] zcache: created ephemeral tmem pool, id=0
[19577.056022] btrfs: unable to fixup at 641086156800
[19577.056211] btrfs: unable to fixup at 641086160896
[19577.056383] btrfs: unable to fixup at 641086164992
[19577.056555] btrfs: unable to fixup at 641086169088
[19577.056733] btrfs: unable to fixup at 641086173184
[19577.858378] btrfs: unable to fixup at 641086156800
[19577.858566] btrfs: unable to fixup at 641086160896
[19577.858736] btrfs: unable to fixup at 641086164992
[19577.858909] btrfs: unable to fixup at 641086169088
[19577.859083] btrfs: unable to fixup at 641086173184
[19986.054338] verify_parent_transid: 310 callbacks suppressed
[19986.054343] parent transid verify failed on 641086156800 wanted 43863 found 43873
[19986.054559] parent transid verify failed on 641086156800 wanted 43863 found 43873
[19986.054904] parent transid verify failed on 641086156800 wanted 43863 found 43873
[19986.062448] parent transid verify failed on 641086156800 wanted 43863 found 43873
[19986.062455] parent transid verify failed on 641086156800 wanted 43863 found 43873
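
To make the pattern in the output above easier to see, here is a minimal Python sketch that pulls the block addresses and generation numbers out of such lines. The sample lines are copied from the dmesg output above (with the wrapped line joined); the function name `summarize` and the regular expressions are only illustrative assumptions, not part of any btrfs tool.

```python
import re

# Sample lines copied from the dmesg output above (wrapped line joined).
LOG = """\
[19577.056022] btrfs: unable to fixup at 641086156800
[19577.056211] btrfs: unable to fixup at 641086160896
[19577.056383] btrfs: unable to fixup at 641086164992
[19577.056555] btrfs: unable to fixup at 641086169088
[19577.056733] btrfs: unable to fixup at 641086173184
[19986.054343] parent transid verify failed on 641086156800 wanted 43863 found 43873
"""

FIXUP = re.compile(r"unable to fixup at (\d+)")
TRANSID = re.compile(
    r"parent transid verify failed on (\d+) wanted (\d+) found (\d+)"
)


def summarize(log):
    """Return (sorted addresses, set of spacings, set of transid triples)."""
    addrs = sorted({int(m.group(1)) for m in FIXUP.finditer(log)})
    # Distance between each pair of neighbouring addresses.
    steps = {b - a for a, b in zip(addrs, addrs[1:])}
    transid = {(int(a), int(w), int(f)) for a, w, f in TRANSID.findall(log)}
    return addrs, steps, transid


addrs, steps, transid = summarize(LOG)
print("distinct addresses:", len(addrs))
print("spacing in bytes:", steps)
for addr, wanted, found in sorted(transid):
    print(addr, "generation gap:", found - wanted)
```

On this sample the five addresses turn out to be consecutive 4096-byte blocks, and the block at 641086156800 carries a generation 10 newer than the one its parent expects.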

I was able to rsync my /home to the original partition I created my
btrfs from about 2 weeks ago, so this is not a complete disaster, but
the rsync run logged many messages like these to dmesg:

[10902.814420] btrfs no csum found for inode 445127 start 58392576
[10902.815153] btrfs no csum found for inode 445127 start 58396672
[10902.815951] btrfs no csum found for inode 445127 start 58400768
[10902.816692] btrfs no csum found for inode 445127 start 58404864
[10902.817430] btrfs no csum found for inode 445127 start 58408960
[10902.818168] btrfs no csum found for inode 445127 start 58413056
[10902.818904] btrfs no csum found for inode 445127 start 58417152
[10902.819683] btrfs no csum found for inode 445127 start 58421248
[10902.820421] btrfs no csum found for inode 445127 start 58425344
[10902.821154] btrfs no csum found for inode 445127 start 58429440
[10902.821887] btrfs no csum found for inode 445127 start 58433536
[10902.822673] btrfs no csum found for inode 445127 start 58437632
[10902.823414] btrfs no csum found for inode 445127 start 58441728
[10902.824151] btrfs no csum found for inode 445127 start 58445824
[10902.824889] btrfs no csum found for inode 445127 start 58449920
[10902.825716] btrfs no csum found for inode 445127 start 58454016
[10960.325903] verify_parent_transid: 470 callbacks suppressed
[10960.325908] parent transid verify failed on 641086173184 wanted 43863 found 43873
[10960.326129] parent transid verify failed on 641086173184 wanted 43863 found 43873
[10960.326319] parent transid verify failed on 641086173184 wanted 43863 found 43873
[10960.334898] parent transid verify failed on 641086173184 wanted 43863 found 43873
[10960.334906] parent transid verify failed on 641086173184 wanted 43863 found 43873
[10960.334912] btrfs no csum found for inode 288125 start 8912896
[10960.335131] parent transid verify failed on 641086173184 wanted 43863 found 43873
[10960.335322] parent transid verify failed on 641086173184 wanted 43863 found 43873
[10960.335518] parent transid verify failed on 641086173184 wanted 43863 found 43873
[10960.335840] parent transid verify failed on 641086173184 wanted 43863 found 43873
[10960.335849] parent transid verify failed on 641086173184 wanted 43863 found 43873
[10960.335854] btrfs no csum found for inode 288125 start 8916992
[10960.336643] btrfs no csum found for inode 288125 start 8921088
[10960.337413] btrfs no csum found for inode 288125 start 8925184
[10960.338169] btrfs no csum found for inode 288125 start 8929280
[10960.338920] btrfs no csum found for inode 288125 start 8933376
[10960.339702] btrfs no csum found for inode 288125 start 8937472
[10960.340440] btrfs no csum found for inode 288125 start 8941568
[10960.341179] btrfs no csum found for inode 288125 start 8945664
[10960.341916] btrfs no csum found for inode 288125 start 8949760
[10960.342746] btrfs no csum found for inode 288125 start 8953856
[10960.343483] btrfs no csum found for inode 288125 start 8957952
[10960.344222] btrfs no csum found for inode 288125 start 8962048

I'm on Gentoo, using gentoo-sources 3.0.4 in the backup system (the
btrfs system runs on 3.0.6):

# uname -a
Linux jupiter 3.0.4-gentoo #1 SMP Sat Oct 1 17:20:43 CEST 2011 i686 Intel(R) Pentium(R) 4 CPU 3.20GHz GenuineIntel GNU/Linux

The btrfs partition consists of multiple subvolumes. I would not
lose my /home, as I was able to sync it, but I would lose the rest of
the file system, which includes a complete Gentoo installation and some
big data files that are only recoverable by investing much time. So I'd
love to get rid of the problems scrub complains about. I don't mind
having to delete some files that I can probably recover easily, but I
can find no way to identify these files. Is there a way to map
the above error messages to file system paths?
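
As a starting point for the "no csum found" lines, which do name an inode, here is a minimal Python sketch that collapses them into contiguous byte ranges per inode and prints a plain `find -inum` command for each. The sample is a subset of the lines quoted above; the 4096-byte step is an assumption read off the offsets in the log, and `csum_ranges` is a hypothetical helper name. The mount point is left as a placeholder because the log does not say which subvolume an inode belongs to, and inode numbers are only unique within one subvolume. This does not cover the "parent transid" messages, which carry a block address rather than an inode.

```python
import re
from collections import defaultdict

CSUM = re.compile(r"no csum found for inode (\d+) start (\d+)")
BLOCK = 4096  # assumption: offsets in the log advance in 4 KiB steps


def csum_ranges(log):
    """Return {inode: [(first_offset, end_offset_exclusive), ...]}."""
    offsets = defaultdict(set)
    for inode, start in CSUM.findall(log):
        offsets[int(inode)].add(int(start))
    ranges = {}
    for inode, starts in offsets.items():
        merged = []
        for s in sorted(starts):
            if merged and merged[-1][1] == s:
                # Offset continues the previous range: extend it.
                merged[-1][1] = s + BLOCK
            else:
                merged.append([s, s + BLOCK])
        ranges[inode] = [tuple(r) for r in merged]
    return ranges


# Subset of the dmesg lines quoted above.
SAMPLE = """\
[10902.814420] btrfs no csum found for inode 445127 start 58392576
[10902.815153] btrfs no csum found for inode 445127 start 58396672
[10902.815951] btrfs no csum found for inode 445127 start 58400768
[10960.334912] btrfs no csum found for inode 288125 start 8912896
[10960.335854] btrfs no csum found for inode 288125 start 8916992
"""

for inode, spans in sorted(csum_ranges(SAMPLE).items()):
    for first, end in spans:
        print(f"inode {inode}: bytes {first}-{end - 1}")
    # Each mounted subvolume would need to be searched separately.
    print(f"  find <subvolume mount point> -xdev -inum {inode}")
```

Feeding it the full output of `dmesg` instead of `SAMPLE` would give one line per affected range, which keeps the list of candidate files short.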

Just for completeness, here's my fstab with mount options (the relevant part):

/dev/sdd3 / btrfs compress=lzo,autodefrag,subvol=root 0 1
/dev/sdd3 /home btrfs compress=lzo,autodefrag,subvol=home 0 2
/dev/sdd3 /usr/portage btrfs compress=lzo,autodefrag,subvol=portage 0 2
/dev/sdd3 /usr/src btrfs compress=lzo,autodefrag,subvol=usr-src 0 2
/dev/sdd3 /tmp btrfs compress=lzo,autodefrag,subvol=tmp,nodev,nosuid 0 2
/dev/sdd3 /var/tmp btrfs compress=lzo,autodefrag,subvol=var-tmp,nodev,nosuid 0 2
/dev/sdd3 /mnt/btrfs-subvol-0 btrfs compress=lzo,subvolid=0,autodefrag,noauto 0 2

I got rid of the free space caching issues by mounting with
clear_cache. The system's memory is stable (memtest86 does not report
errors). The hard disk is fresh from the factory, with no errors
reported by smartctl and no sector errors reported in dmesg. When the
problem with "cp --reflink" occurred, there was a btrfs-related
backtrace in dmesg, but I was not able to capture it. The file system
was created with metadata mirroring and user data striping, although
it currently sits on a single disk only. I planned to add more disks
once btrfs proves stable for me.

Thanks in advance,
Kai