Hi, On Fri, Oct 26, 2012 at 08:35:43AM +0200, Jan Schmidt wrote: > while running extensive qgroup and tree mod log tests I got to the following > BUG, which is probably not related to tree mod log:
I've left the fsx DIO tester overnight, after Josef identified the fix for the csum mismatch bug, and found it crashed at the same point as Jan reported: [80582.012720] ------------[ cut here ]------------ [80582.013045] kernel BUG at fs/btrfs/ctree.c:2945! [80582.013045] invalid opcode: 0000 [#1] PREEMPT SMP io_blk virtio_pci sr_mod 8139cp parport microcode virtio_ring floppy virtio sg pcspkr button i2c_piix4 cdrom processor thermal_sys scsi_dh_alua scsi_dh_emc scsi_dh_rdac scsi_dh_hp_sw scsi_dh ata_ge neric ata_piix [last unloaded: btrfs] [80582.013045] CPU 16 [80582.013045] Pid: 13045, comm: btrfs-endio-wri Tainted: G W 3.8.0-rc4-desktop+ #54 Bochs Bochs [80582.013045] RIP: 0010:[<ffffffffa0393a25>] [<ffffffffa0393a25>] btrfs_set_item_key_safe+0x65/0x1e0 [btrfs] [80582.013045] RSP: 0018:ffff880adb9d9ad8 EFLAGS: 00010206 [80582.013045] RAX: 000000000000006c RBX: ffff880adb9d9c18 RCX: 000000000000012e [80582.013045] RDX: 0000000000016000 RSI: ffff880adeeee076 RDI: ffff880adb9d9af9 [80582.013045] RBP: ffff880adb9d9b38 R08: ffff880ae4cd9dc0 R09: 0000000000000000 [80582.013045] R10: ffff880bee7c0f40 R11: 0000000000000015 R12: 0000000000000001 [80582.013045] R13: ffff880add976c78 R14: ffff880adb9d9ae8 R15: ffff880be92301c0 [80582.013045] FS: 0000000000000000(0000) GS:ffff880c1f200000(0000) knlGS:0000000000000000 [80582.013045] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [80582.013045] CR2: ffffffffff600400 CR3: 0000000be6cb8000 CR4: 00000000000006e0 [80582.013045] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [80582.013045] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [80582.013045] Process btrfs-endio-wri (pid: 13045, threadinfo ffff880adb9d8000, task ffff880b301545c0) [80582.013045] Stack: [80582.013045] ffff880bec37a000 ffff880b932a5170 000000000000012e 000000000160006c [80582.013045] ffff880adb9d9b00 ffffffffa03a1ad0 000000000000012e 0000000000010000 [80582.013045] ffff880add976c78 ffff880be92301c0 0000000000016000 0000000000000f96 [80582.013045] Call Trace: [80582.013045] [<ffffffffa03a1ad0>] ? btrfs_inc_extent_ref+0x90/0xb0 [btrfs] [80582.013045] [<ffffffffa03cdc50>] __btrfs_drop_extents+0x530/0xb70 [btrfs] [80582.013045] [<ffffffffa03cecab>] btrfs_drop_extents+0x6b/0x90 [btrfs] [80582.013045] [<ffffffffa03bce1a>] insert_reserved_file_extent+0x8a/0x2a0 [btrfs] [80582.013045] [<ffffffffa03c0c6e>] btrfs_finish_ordered_io+0x39e/0x3e0 [btrfs] [80582.013045] [<ffffffffa03c0cc0>] finish_ordered_fn+0x10/0x20 [btrfs] [80582.013045] [<ffffffffa03e7205>] worker_loop+0x165/0x4c0 [btrfs] [80582.013045] [<ffffffffa03e70a0>] ? check_pending_worker_creates+0xe0/0xe0 [btrfs] [80582.013045] [<ffffffffa03e70a0>] ? check_pending_worker_creates+0xe0/0xe0 [btrfs] [80582.013045] [<ffffffff8106d026>] kthread+0xc6/0xd0 [80582.013045] [<ffffffff8106cf60>] ? kthread_freezable_should_stop+0x70/0x70 [80582.013045] [<ffffffff815f7a7c>] ret_from_fork+0x7c/0xb0 [80582.013045] [<ffffffff8106cf60>] ? kthread_freezable_should_stop+0x70/0x70 [80582.013045] Code: 00 00 00 4c 89 ef 48 63 d2 4c 89 f6 48 8d 14 92 48 8d 54 92 65 e8 6c 0c 04 00 48 8b 0b 48 39 4d b0 48 8b 55 b9 0f b6 45 b8 76 0b <0f> 0b eb fe 0f 1f 80 00 00 00 00 72 1e 3a 43 08 77 ee 66 0f 1f [80582.013045] RIP [<ffffffffa0393a25>] btrfs_set_item_key_safe+0x65/0x1e0 [btrfs] [80582.013045] RSP <ffff880adb9d9ad8> [80582.071249] ---[ end trace 5360a95c1506db85 ]--- The lines are shifted, but it's the same code: 2933 void btrfs_set_item_key_safe(struct btrfs_trans_handle *trans, 2934 struct btrfs_root *root, struct btrfs_path *path, 2935 struct btrfs_key *new_key) 2936 { 2937 struct btrfs_disk_key disk_key; 2938 struct extent_buffer *eb; 2939 int slot; 2940 2941 eb = path->nodes[0]; 2942 slot = path->slots[0]; 2943 if (slot > 0) { 2944 btrfs_item_key(eb, &disk_key, slot - 1); 2945 BUG_ON(comp_keys(&disk_key, new_key) >= 0); ^^^^ 2946 } 2947 if (slot < btrfs_header_nritems(eb) - 1) { 2948 btrfs_item_key(eb, &disk_key, slot + 1); 2949 BUG_ON(comp_keys(&disk_key, new_key) <= 0); 2950 } 2951 2952 btrfs_cpu_key_to_disk(&disk_key, new_key); 2953 btrfs_set_item_key(eb, &disk_key, slot); 2954 btrfs_mark_buffer_dirty(eb); 2955 if (slot == 0) 2956 fixup_low_keys(trans, root, path, &disk_key, 1); 2957 } It was unexpected so I haven't put any debugging into config or code. According to the timestamps, the test has been running for 16h 58m. Test setup: * kvm virtual machine with 48 cpus * 48G memory (single bit ECC) * 5x 136GB 15K SAS disks (virtio,cache=none) filesystem was freshly created before the test as data/raid0 and meta/radi0 and then converted via balance to data/single meta/single (that was one of the reproducers of the csum problem). balance log: [19441.016374] btrfs: disk space caching is enabled [19452.900668] btrfs: relocating block group 4332584960 flags 9 [19455.935676] btrfs: found 4 extents [19456.247759] btrfs: found 4 extents [19456.471688] btrfs: relocating block group 37683200 flags 12 [19459.345387] btrfs: found 16 extents [19459.688217] btrfs: relocating block group 20971520 flags 10 [19460.194694] btrfs: found 1 extents [19460.465852] btrfs: relocating block group 12582912 flags 1 [19463.829548] btrfs: found 1 extents [19464.123115] btrfs: found 1 extents [19464.512113] btrfs: relocating block group 4194304 flags 4 [19475.407355] btrfs: found 1 extents # btrfs fi df /mnt/btrfs/ Data: total=3.00GiB, used=11.29MiB System, RAID1: total=32.00MiB, used=4.00KiB System: total=4.00MiB, used=0.00 Metadata, RAID1: total=2.00GiB, used=336.00KiB # btrfs fi show Label: none uuid: a15e1f6c-0e19-420e-93e2-82b4aa072a47 Total devices 5 FS bytes used 10.41MiB devid 1 size 136.70GiB used 1.04GiB path /dev/vda devid 2 size 136.70GiB used 2.00GiB path /dev/vdb devid 3 size 136.70GiB used 1.00GiB path /dev/vdc devid 4 size 136.70GiB used 1.03GiB path /dev/vdd devid 5 size 136.70GiB used 2.00GiB path /dev/vde the sources used were cmason/for-linus (0d90060f330ef49a60f56df2d9439ba263c979d9) plus https://patchwork.kernel.org/patch/2020661/ (Btrfs: put csums on the right ordered extent) plus https://patchwork.kernel.org/patch/1987481/ (Btrfs: fix wrong max device number for single profile) btrfs-debug-tree dumped the filesystem without problems, so it's an in-memory issue. david -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html