On Tue, Jul 17, 2012 at 12:29:33AM +0300, Sami Liedes wrote:
> So, currently my idea is to boot the machine with a live USB stick,
> install kvm and make qemu qcow images backed by the real (2*1.1T)
> devices, but writing changes to the qcow images (I dare not mess with
> the actual devices, and don't happen to have quite 2.2T extra space
> outside of them...), and try to run scrub there. If that succeeds and
> the bug happens there too, debugging *should* be easier, and it
> *should* be possible to run it under KMEMCHECK too. If the bug doesn't
> happen inside a virtual machine, that would be interesting information
> too.

I have now been able to reproduce the bug in KVM with the setup
described above.

I think it's safe to say now that the bug depends on some kind of
interaction between btrfs and dm-crypt. With the following setup, the
bug does NOT happen:

* kvm, single cpu

* sees 3 disks, /dev/vda=root, /dev/vdb=btrfs-dev1, /dev/vdc=btrfs-dev2

* The btrfs devices are essentially snapshots of the real btrfs
  devices in raid-1 configuration (2*1.1T). As the real devices are
  encrypted, the decryption is done outside the KVM, i.e. the KVM
  snapshots are backed by the decrypted devices.

With the following setup, the bug DOES happen:

* kvm, single cpu

* sees 3 disks, /dev/vda=root, /dev/vdb=part1, /dev/vdc=part2, where
  part[12] are LUKS containers containing the individual btrfs
  devices

* inside kvm, they are opened using

    cryptsetup luksOpen /dev/vdb root1
    cryptsetup luksOpen /dev/vdc root2

* after this, the filesystem is mounted with

    mount /dev/mapper/root1 /media -o device=/dev/mapper/root1,device=/dev/mapper/root2

* The devices are snapshots of the actual physical encrypted
  partitions containing the btrfs devices.
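
The two setups differ only in what backs the copy-on-write overlays
and in where the decryption happens. A minimal sketch of how such
overlays and the single-CPU guest could be created follows. It is not
the exact command set used here: the device paths, the image file
names, the choice of qcow2 and the use of virtio drives are all
assumptions for illustration. The script only prints the commands so
they can be reviewed before anything touches a real device.

```shell
#!/bin/sh
# Sketch only: prints the commands instead of running them.
# All device paths and file names below are hypothetical.

# Print the qemu-img command that creates a qcow2 overlay on a raw backing
# device. Guest writes land in the overlay file.
overlay_cmd() {
    printf 'qemu-img create -f qcow2 -o backing_file=%s,backing_fmt=raw %s\n' "$1" "$2"
}

# Print a single-CPU KVM command line. With virtio drives the guest
# typically enumerates them in order as /dev/vda, /dev/vdb, /dev/vdc.
vm_cmd() {
    printf 'qemu-system-x86_64 -enable-kvm -smp 1'
    for img in "$@"; do
        printf ' -drive file=%s,if=virtio' "$img"
    done
    printf '\n'
}

# Setup without the bug: overlays on the devices already decrypted on the host.
overlay_cmd /dev/mapper/host-plain1 nocrypt1.qcow2
overlay_cmd /dev/mapper/host-plain2 nocrypt2.qcow2
vm_cmd root.img nocrypt1.qcow2 nocrypt2.qcow2

# Setup with the bug: overlays on the encrypted partitions themselves,
# to be opened with cryptsetup inside the guest.
overlay_cmd /dev/sdX1 crypt1.qcow2
overlay_cmd /dev/sdY1 crypt2.qcow2
vm_cmd root.img crypt1.qcow2 crypt2.qcow2
```

Because the overlays absorb all guest writes, the same physical
devices can back both variants without being modified.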

I have not yet figured out if this can be reproduced using a pristine,
smaller btrfs filesystem in raid-1 configuration inside KVM or if
there's something about my specific filesystem that causes this. I can
investigate that too; it's easier for me to do than the above testing,
as I don't need to have continuous physical access to the computer to
do that.
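
A rough sketch of what such a test with a pristine filesystem might
look like inside the guest is given below. This has not been run as
part of the testing above. The image names, the size and the mount
point are assumptions, and it assumes a cryptsetup version that
attaches a loop device automatically when given a regular file. The
script prints the plan rather than executing it.

```shell
#!/bin/sh
# Hypothetical reproduction plan for a small btrfs raid-1 on dm-crypt.
# Prints the commands; nothing is executed by this script.

plan() {
    size=$1   # size of each backing file, e.g. 2G (assumption)
    mnt=$2    # mount point (assumption)
    for i in 1 2; do
        echo "truncate -s $size test$i.img"
        echo "cryptsetup luksFormat test$i.img"
        echo "cryptsetup luksOpen test$i.img test$i"
    done
    # Mirror both data and metadata across the two dm-crypt devices.
    echo "mkfs.btrfs -m raid1 -d raid1 /dev/mapper/test1 /dev/mapper/test2"
    echo "mount /dev/mapper/test1 $mnt -o device=/dev/mapper/test1,device=/dev/mapper/test2"
    # -B keeps scrub in the foreground until it finishes.
    echo "btrfs scrub start -B $mnt"
}

plan 2G /media
```

The filesystem would need to be populated with some data between the
mount and the scrub for the scrub to have anything to verify.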

Here's the .config of the kernel I used inside KVM to reproduce this:

  http://www.niksula.hut.fi/~sliedes/btrfs/config.3.4.4

I also ran the same tests with KMEMCHECK. Both with and without
crypto, there were quite a number of (of course possibly false)
warnings from btrfs code. I doubt any of them are related to this bug
- there were no KMEMCHECK warnings during the scrub operation. Here
are the logs, anyway:

  http://www.niksula.hut.fi/~sliedes/btrfs/screenlog.nocrypto.gz
  http://www.niksula.hut.fi/~sliedes/btrfs/screenlog.crypto.gz

        Sami
