On Tue, Feb 1, 2011 at 11:37 PM, cwillu <cwi...@cwillu.com> wrote:
> A couple hours after a build finished involving creating and deleting
> a couple snapshots, I got the following BUG.  The system locked up
> completely.
>
> This is 2.6.38rc2 with btrfs from josef's master (9d4ba5:  Btrfs:
> handle errors in btrfs_orphan_cleanup).
>
> Original screenshot at http://imgur.com/sCinW
> Retyped from that:
>
> kernel BUG at /var/lib/dkms/btrfs/git/build/inode.c:150!
> invalid opcode: 0000 [#1] SMP
> last sysfs file:
> /sys/devices/pci0000:00/0000:00:1f.2/host3/target3:0:0/3:0:0:0/block/sdb/uevent
> CPU 1
> Modules linked in: binfmt_misc ppdev ipt_MASQUERADE iptable_nat nf_nat
> nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT
> xt_tcpudp iptable_filter ip_tables x_tables bridge stp
> snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm
> snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer
> snd_seq_device aes_x86_64 snd aes_generic soundcore dm_crypt
> asus_atk0110 lp snd_page_alloc parport raid10 raid456
> async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy
> async_tx raid1 raid0 multipath linear btrfs zlib_deflate libcrc32c
> radeon ttm drm_kms_helper drm usbhid usb_storage hid uas ahci
> i2c_algo_bit r8169 libahci pata_jmicron
>
> Pid: 17930, comm: btrfs-delalloc- Not tainted 2.6.38-020638rc2-generic
> #201101220905 P5Q3/System Product Name
> RIP: 0010:[<ffffffffa0219cf8>] [<ffffffffa0219cf8>]
> insert_inline_extent+0x328/0x330 [btrfs]
> RSP: 0000:ffff88020a17bbf0  EFLAGS: 00010282
> RAX: 00000000ffffffef RBX: 000000000000003f RCX: ffff88020a17a000
> RDX: 0000000000000008 RSI: ffff880000000000 RDI: ffff880185a99c38
> RBP: ffff88020a17bc80 R08: 0000000000000000 R09: 0000000000000000
> R10: ffff88022a433800 R11: ffff88017b096a00 R12: ffff88021bfb7390
> R13: 0000000000000054 R14: 0000000000000200 R15: 0000000000000000
> FS:  0000000000000000(0000) GS:ffff8800bfc80000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00007f6b5afdfc68 CR3: 000000021697c000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process btrfs-delalloc- (pid: 17930, threadinfo ffff88020a17a000, task
> ffff88022b158000)
> Stack:
>  00000002005b09d8 0000000000000001 ffff88022a433800 0000000000000fd5
>  ffff88017b096a00 0000000000000001 ffff88022cab3d80 0000000000000200
>  00000000005b09d5 000000000000006c ffff88020a17bc00 00000054a0253ff8
> Call Trace:
>  [<ffffffffa0219e21>] cow_file_range_inline+0x121/0x190 [btrfs]
>  [<ffffffff815a2ffe>] ? mutex_lock+0x1e/0x50
>  [<ffffffffa021ffb3>] compress_file_range+0x483/0x5e0 [btrfs]
>  [<ffffffffa0220145>] async_cow_start+0x35/0x50 [btrfs]
>  [<ffffffffa02419bc>] worker_loop+0x15c/0x5b0 [btrfs]
>  [<ffffffffa0241860>] ? worker_loop+0x0/0x5b0 [btrfs]
>  [<ffffffff81085147>] kthread+0x97/0xa0
>  [<ffffffff8100ce24>] kernel_thread_helper+0x4/0x10
>  [<ffffffff810850b0>] ? kthread+0x0/0xa0
>  [<ffffffff8100ce20>] ? kernel_thread_helper+0x0/0x10
> Code: f8 03 48 0f af c2 4c 89 ea 48 c1 e0 0c 48 01 f0 4a 8d 34 38 e8
> 0a b4 01 00 83 6b 1c 01 48 8b 7d 98 e8 3d 9e ef e0 e9 fc fe ff ff <0f>
> 0b eb fe 0f 1f 40 00 55 48 89 e5 48 83 ec 70 48 89 5d d8 4c
> RIP  [<ffffffffa0219cf8>] insert_inline_extent+0x328/0x330 [btrfs]
>  RSP <ffff88020a17bbf0>


The crash happened after the system had been idle for a few hours.  The
last significant workload was an experiment with my build process abusing
the snapshot facility: a base system was installed to a subvolume via
debootstrap, then 8 snapshots of that state were taken.  Packages were
installed to those snapshots in parallel, and then everything was rsync'd
back to the master, after which the snapshots were deleted.  There
wouldn't have been any syncs or fsyncs during the builds (dpkg was run
with libeatmydata, for instance).  The original subvolume would have been
~1 GB, and each snapshot grew to slightly more than that, so there would
have been ~8 GB worth of snapshots being deleted in the background.
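
For anyone who wants to try reproducing this, the workload above can be
roughly sketched as a list of commands.  This is only an approximation of
what my build process does: the paths, the suite name, the package list and
the snapshot names are placeholders I made up for illustration, and using
the eatmydata wrapper inside a chroot is an assumption about how
libeatmydata was hooked in.  The script only prints the commands; it does
not run anything.

```python
import shlex


def build_workload(base="/mnt/build/base", snap_dir="/mnt/build/snaps",
                   suite="sid", packages=("build-essential",), count=8):
    """Return the workload's command lines grouped by phase (nothing is executed).

    All default paths, the suite and the package list are hypothetical.
    """
    snaps = [f"{snap_dir}/snap{i}" for i in range(count)]
    return {
        # Base system goes into its own subvolume.
        "bootstrap": [["btrfs", "subvolume", "create", base],
                      ["debootstrap", suite, base]],
        # Take `count` snapshots of the freshly installed base.
        "snapshot": [["btrfs", "subvolume", "snapshot", base, s] for s in snaps],
        # These ran in parallel, one per snapshot, with fsync disabled.
        "install": [["chroot", s, "eatmydata", "apt-get", "install", "-y",
                     *packages] for s in snaps],
        # Copy the results back into the master subvolume.
        "merge": [["rsync", "-a", s + "/", base + "/"] for s in snaps],
        # Deletion returns quickly; the cleanup happens in the background.
        "delete": [["btrfs", "subvolume", "delete", s] for s in snaps],
    }


if __name__ == "__main__":
    for phase, commands in build_workload().items():
        print(f"# {phase}")
        for command in commands:
            print(shlex.join(command))
```

The "install" commands are the only phase that would be started
concurrently; everything else ran one command after another.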

The code in question dates back to the original zlib compression
commit, and the system was running with compress=lzo;  perhaps there's
some mismatch there?
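
One more data point, offered as a guess rather than a diagnosis: RAX in the
register dump is 00000000ffffffef.  If RAX still held the return value of
the call that the BUG check tested (the oops alone does not prove that),
then reading its low 32 bits as a signed int gives a negative errno.  The
helper name below is mine, purely for illustration.

```python
import errno


def as_signed32(value):
    """Interpret the low 32 bits of value as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


rax = 0x00000000FFFFFFEF  # RAX as retyped from the oops above
ret = as_signed32(rax)

# Look up the symbolic errno name for the positive error number.
name = errno.errorcode.get(-ret, "unknown")
print(ret, name)
```

On Linux that works out to -17, i.e. -EEXIST, which would suggest that an
insert collided with an item that already existed.  Whether that has
anything to do with lzo versus zlib I can't say.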
