Hi all
$ uname -a
Gentoo Linux s9 3.3.1-pf #2 SMP PREEMPT Mon Apr 9 00:35:28 EEST 2012 i686 Intel(R) Core(TM) i5-2467M CPU @ 1.60GHz GenuineIntel GNU/Linux
For the past year or so I have been running everything on 4 partitions:
/dev/sda1 -> dm-crypt -> btrfs raid 0 ROOT 10.0GB
/dev/sda2 -> dm-crypt -> btrfs raid 0 ROOT 10.0GB
/dev/sda3 -> dm-crypt -> btrfs raid 0 HOME 10.0GB
/dev/sda4 -> dm-crypt -> btrfs raid 0 HOME 10.0GB
Both filesystems were mounted with "noatime,nodiratime,ssd,discard,compress=lzo".
I set up that multi-partition monster back in the 2.6.36-ish days, when
dm-crypt either could not use multiple cores on a single partition, or
I simply didn't know that it already could. At one point it definitely
couldn't.
Over time HOME filled up, and at the time of last night's baby eating,
"df -hT" showed 1.7G free. Yes, I know free space is complicated in
btrfs. Space had never been an issue, so I didn't think to check
regularly with better tools, such as "btrfs fi show", I guess.
I upgraded from 3.2.2-pf to 3.3.1-pf* and proceeded to launch my
regular apps (Firefox, TB, office, etc.), except they all hung. Checking
my /var/log/messages window revealed what was happening:
* pf-sources => http://pf.natalenko.name/
...
Apr 8 02:45:52 s9 sudo: leho : TTY=pts/0 ; PWD=/home/leho ; USER=root ; COMMAND=/bin/tail -f /home/leho/.tail/awesome-leho /home/leho/.tail/messages /home/leho/.tail/openvpn.log
Apr 8 02:45:52 s9 sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Apr 8 02:46:11 s9 kernel: [ 189.691778] attempt to access beyond end of device
Apr 8 02:46:11 s9 kernel: [ 189.691787] dm-3: rw=129, want=23361976, limit=20967424
Apr 8 02:46:11 s9 kernel: [ 189.691792] attempt to access beyond end of device
Apr 8 02:46:11 s9 kernel: [ 189.691795] dm-3: rw=129, want=27556216, limit=20967424
Apr 8 02:46:11 s9 kernel: [ 189.691799] attempt to access beyond end of device
...
Apr 8 02:46:11 s9 kernel: [ 189.691869] attempt to access beyond end of device
Apr 8 02:46:11 s9 kernel: [ 189.691874] dm-3: rw=129, want=69498616, limit=20967424
...
Apr 8 02:46:11 s9 kernel: [ 189.692233] attempt to access beyond end of device
Apr 8 02:46:11 s9 kernel: [ 189.692237] dm-3: rw=129, want=228879736, limit=20967424
(thousands of lines like this; as you can see, "want" keeps growing)
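For reference, as I understand the block layer, "want" in these messages is the sector at which the rejected request ends and "limit" is the device size, both counted in 512-byte sectors. Under that assumption, a small Python sketch (the helper names are made up for illustration) converts the values from the excerpt: limit comes out just under 10 GiB, which fits the 10.0GB partitions, while the requested offsets run to roughly 11, 33 and 109 GiB.

```python
# Assumption: "want" and "limit" in the kernel message are counts of
# 512-byte sectors (the block layer's native unit).
import re

SECTOR_BYTES = 512

# Tolerates the message being wrapped between "want=..." and "limit=...".
LOG_LINE = re.compile(r"dm-\d+: rw=\d+, want=(\d+),\s*limit=(\d+)")


def sectors_to_gib(sectors: int) -> float:
    """Convert a 512-byte sector count to GiB."""
    return sectors * SECTOR_BYTES / 2**30


def overshoot(line: str):
    """Return (want_gib, limit_gib, want/limit) for one log line, or None."""
    m = LOG_LINE.search(line)
    if m is None:
        return None
    want, limit = int(m.group(1)), int(m.group(2))
    return sectors_to_gib(want), sectors_to_gib(limit), want / limit


if __name__ == "__main__":
    # Values copied from the syslog excerpt above.
    samples = [
        "dm-3: rw=129, want=23361976, limit=20967424",
        "dm-3: rw=129, want=69498616, limit=20967424",
        "dm-3: rw=129, want=228879736, limit=20967424",
    ]
    for s in samples:
        want_gib, limit_gib, ratio = overshoot(s)
        print(f"want={want_gib:.2f} GiB limit={limit_gib:.2f} GiB ({ratio:.1f}x)")
```

The last sample asks for an offset more than ten times the size of the device, which is consistent with the filesystem following garbage block pointers rather than merely running a little past the end.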
And it was all downhill from there. The result is a badly corrupted
filesystem that seems to be beyond repair. After hard rebooting I
started getting csum errors in various places, and any modification to
the filesystem, even deleting files, would start another flood of
"attempt to access beyond end of device", totally messing up syslog-ng.
With the blazing speed of an SSD, that probably isn't a surprise.
Searching around, I found out about the ENOSPC issue, which may still
be present in 3.3. Is there any useful information I could provide
about this? I now have bigger partitions and probably won't run out of
space again for a while.
I also discovered the btrfs "restore" binary, although possibly too
late, since I had already hard rebooted a few times and done more damage
to HOME. It printed a whole bunch of "ret is -3" messages and produced
0-byte files. Occasionally files came out intact, but the majority seem
to be corrupt. Is this a reasonable result to expect after running out
of space?
"btrfs scrub" reported uncorrectable error counts in the millions, and
at least thousands of csum mismatch errors were visible in dmesg.
"btrfs balance" would bomb the machine with the same "attempt to access
beyond end of device" errors.
I made images of the two btrfs partitions on sda3 and sda4 for future
diagnosis, though I think they are pretty corrupt. Or could there be
some magic poke or offset that would make more of the data
"restore"-able? :>
So, in conclusion:
* Is filesystem-wide corruption like this made more likely by running on
top of dm-crypt or on a multi-device btrfs? dm-crypt is definitely
staying for me, but I have now consolidated to just 2 partitions.
* What exactly should happen in an out-of-space scenario like the one
above?
* I guess I should keep an eye on "btrfs fi show" regularly?
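On the last question: as far as I understand, "df" and "btrfs fi show" measure different things. The per-devid "used" figure in "btrfs fi show" is space already allocated to chunks, and once every device is fully allocated btrfs may be unable to create a new metadata chunk and hit ENOSPC even while "df" still reports free space. Below is a minimal Python sketch of such a check (the function names, the 1 GiB threshold and the sample device paths are hypothetical). It parses captured "btrfs fi show" output and flags devices with little unallocated space. The device-line format is an assumption based on btrfs-progs of this era; newer versions print units such as GiB, which the regex also accepts.

```python
# Hypothetical helper: flag btrfs devices whose space is (nearly) all
# allocated to chunks, based on text captured from "btrfs fi show".
# Assumption: device lines look like
#   devid    1 size 10.00GB used 9.98GB path /dev/dm-2
# and units are binary (1 GB = 1024**3 bytes); other btrfs-progs
# versions may format this differently.
import re

UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

DEVID_LINE = re.compile(
    r"devid\s+(\d+)\s+size\s+([\d.]+)([KMGT])i?B\s+"
    r"used\s+([\d.]+)([KMGT])i?B\s+path\s+(\S+)"
)


def to_bytes(value: str, unit: str) -> float:
    """Convert e.g. ("10.00", "G") to bytes using binary units."""
    return float(value) * UNITS[unit]


def low_unallocated(show_output: str, min_free_bytes: float = 1024**3):
    """Return (path, unallocated_bytes) for each device whose
    unallocated space (size minus used) is below min_free_bytes."""
    flagged = []
    for m in DEVID_LINE.finditer(show_output):
        size = to_bytes(m.group(2), m.group(3))
        used = to_bytes(m.group(4), m.group(5))
        unallocated = size - used
        if unallocated < min_free_bytes:
            flagged.append((m.group(6), unallocated))
    return flagged


if __name__ == "__main__":
    # Made-up sample output for illustration only, not from this system.
    sample = """\
Label: none  uuid: 00000000-0000-0000-0000-000000000000
\tTotal devices 2 FS bytes used 18.20GB
\tdevid    1 size 10.00GB used 10.00GB path /dev/mapper/home1
\tdevid    2 size 10.00GB used 8.50GB path /dev/mapper/home2
"""
    for path, free in low_unallocated(sample):
        print(f"{path}: only {free / 1024**3:.2f} GiB unallocated")
```

In practice the input could come from running "btrfs fi show" as root (e.g. from cron) and passing its stdout to low_unallocated; the threshold is arbitrary and worth tuning to the filesystem size.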