On Tue, Jul 17, 2012 at 12:29:33AM +0300, Sami Liedes wrote:
> So, currently my idea is to boot the machine with a live USB stick,
> install kvm and make qemu qcow images backed by the real (2*1.1T)
> devices, but writing changes to the qcow images (I dare not mess with
> the actual devices, and don't happen to have quite 2.2T extra space
> outside of them...), and try to run scrub there. If that succeeds and
> the bug happens there too, debugging *should* be easier, and it
> *should* be possible to run it under KMEMCHECK too. If the bug doesn't
> happen inside a virtual machine, that would be interesting information
> too.

I have now been able to reproduce the bug in KVM with the setup
described above. I think it's safe to say now that the bug depends on
some kind of interaction between btrfs and dm-crypt.

With the following setup, the bug does NOT happen:

* kvm, single cpu
* sees 3 disks, /dev/vda=root, /dev/vdb=btrfs-dev1, /dev/vdc=btrfs-dev2
* The btrfs devices are essentially snapshots of the real btrfs
  devices in raid-1 configuration (2*1.1T). As the real devices are
  encrypted, the decryption is done outside the KVM, i.e. the KVM
  snapshots are backed by the decrypted devices.

With the following setup, the bug DOES happen:

* kvm, single cpu
* sees 3 disks, /dev/vda=root, /dev/vdb=part1, /dev/vdc=part2, where
  part[12] are LUKS containers containing the individual btrfs
  devices
* inside kvm, they are opened using

    cryptsetup luksOpen /dev/vdb root1
    cryptsetup luksOpen /dev/vdc root2

* after this, the filesystem is mounted with

    mount /dev/mapper/root1 /media -o device=/dev/mapper/root1,device=/dev/mapper/root2

* The devices are snapshots of the actual physical encrypted
  partitions containing the btrfs devices.

I have not yet figured out if this can be reproduced using a pristine,
smaller btrfs filesystem in raid-1 configuration inside KVM or if
there's something about my specific filesystem that causes this. I can
investigate that too; it's easier for me to do than the above testing,
as I don't need to have continuous physical access to the computer to
do that.

Here's the .config of the kernel I used inside KVM to reproduce this:

http://www.niksula.hut.fi/~sliedes/btrfs/config.3.4.4

I also ran the same tests with KMEMCHECK. Both with and without
crypto, there were quite a number of (of course possibly false)
warnings from btrfs code. I doubt any of them are related to this bug;
there were no KMEMCHECK warnings during the scrub operation.

Here are the logs, anyway:

http://www.niksula.hut.fi/~sliedes/btrfs/screenlog.nocrypto.gz
http://www.niksula.hut.fi/~sliedes/btrfs/screenlog.crypto.gz

	Sami
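
For anyone wanting to try something similar, the reproducing setup above can be summarised as a dry-run shell sketch. It only prints commands and executes nothing. The cryptsetup and mount lines are the ones quoted in this mail; everything else is an assumption for illustration: the host device paths (/dev/sdX1, /dev/sdY1), the image names, the guest memory size, the exact qemu-img and qemu invocations, and the final "btrfs scrub start -B" call are placeholders, not the commands actually used.

```shell
#!/bin/sh
# Dry run: every function only prints a command line; nothing is executed.
# Device paths and image names are placeholders, not those from the report.
set -eu

# Host side: qcow2 overlay backed by a raw device; writes go to the overlay only.
overlay_cmd() {
    # $1 = backing block device, $2 = overlay image to create
    printf 'qemu-img create -f qcow2 -b %s -F raw %s\n' "$1" "$2"
}

# Host side: single-CPU guest with a root disk plus the two overlays as
# virtio disks (seen as /dev/vda, /dev/vdb and /dev/vdc inside the guest).
guest_cmd() {
    # $1 = root image, $2 and $3 = overlay images
    printf 'qemu-system-x86_64 -enable-kvm -smp 1 -m 2048'
    for img in "$@"; do
        printf ' -drive file=%s,if=virtio' "$img"
    done
    printf '\n'
}

# Guest side: open both LUKS containers, mount the two-device filesystem,
# then start a foreground scrub (the scrub invocation is an assumption).
guest_steps() {
    printf '%s\n' \
        'cryptsetup luksOpen /dev/vdb root1' \
        'cryptsetup luksOpen /dev/vdc root2' \
        'mount /dev/mapper/root1 /media -o device=/dev/mapper/root1,device=/dev/mapper/root2' \
        'btrfs scrub start -B /media'
}

overlay_cmd /dev/sdX1 dev1.qcow2
overlay_cmd /dev/sdY1 dev2.qcow2
guest_cmd root.qcow2 dev1.qcow2 dev2.qcow2
guest_steps
```

For the variant where the bug does not happen, the overlays would instead be backed by the already-decrypted device-mapper nodes on the host, and the two luksOpen steps inside the guest would be dropped.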