On 08.09.2011 01:33, Justin Gottula wrote: > Hi, > > > I recently created a Btrfs volume on top of a software (mdadm) raid5 array > (since Btrfs currently lacks raid5 support at the FS level). On this 640 GB > volume, I stored a ~400 GB tar file. After a couple weeks of use, I used > 'btrfs > defragment' on this file in an effort to (a) defrag and (b) compress the > file. I > made sure I was using the latest version of the userspace utilities (Btrfs > v0.19-35-g1b444cd-dirty) as well as kernel 3.0.
I think Liu Bo posted a fix for this a while ago: [PATCH] Btrfs: fix a bug of balance on full multi-disk partitions -Arne > Now, this may or may not have had something to do with the drives being at 55 > celsius, which I only discovered later, but after I had left this operation to > work overnight, I came back to some scary messages in the kernel log. > Immediately after the operation started (as far as I can tell), the following > messages showed up in the kernel log: > > > [17055.793912] btrfs: relocating block group 636489826304 flags 1 > > [17112.566998] attempt to access beyond end of device > [17112.567003] md127p1: rw=145, want=1248172032, limit=1248171999 > [17112.567004] attempt to access beyond end of device > [17112.567006] md127p1: rw=145, want=1248172280, limit=1248171999 > [17112.567008] attempt to access beyond end of device > [17112.567009] md127p1: rw=145, want=1248172416, limit=1248171999 > [17112.567011] attempt to access beyond end of device > [17112.567012] md127p1: rw=145, want=1248172664, limit=1248171999 > [17112.567014] attempt to access beyond end of device > [17112.567015] md127p1: rw=145, want=1248172912, limit=1248171999 > [17112.567016] attempt to access beyond end of device > [17112.567018] md127p1: rw=145, want=1248172928, limit=1248171999 > (thousands more of the above in rapid succession) > > > and occasionally a few of these: > > > [17157.916746] btrfs csum failed ino 258 off 8192 csum 2566472073 private > 1679206033 > [17157.916758] btrfs csum failed ino 258 off 12288 csum 2566472073 private > 248979876 > [17157.916771] btrfs csum failed ino 258 off 16384 csum 2566472073 private > 3790022839 > > > Then, later, another burst of the same device access warnings followed by > this: > > > [20063.971837] ------------[ cut here ]------------ > [20063.972050] kernel BUG at fs/btrfs/ctree.c:300! > [20063.972238] invalid opcode: 0000 [#1] PREEMPT SMP > [20063.972666] CPU 0 > [20063.972669] Modules linked in: ipv6 loop nouveau snd_hda_codec_via ttm > drm_kms_helper drm r8169 forcedeth i2c_nforce2 mii ppdev i2c_algo_bit mxm_wmi > parport_pc wmi pcspkr video parport asus_atk0110 snd_hda_intel evdev sg > edac_core edac_mce_amd snd_hda_codec processor psmouse button snd_hwdep > snd_pcm > snd_timer snd soundcore snd_page_alloc k10temp serio_raw i2c_core usbhid hid > raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy > async_tx > raid0 btrfs zlib_deflate crc32c libcrc32c ext4 mbcache jbd2 crc16 ohci_hcd > ehci_hcd usbcore md_mod sata_nv sd_mod pata_amd libata scsi_mod > [20063.974670] > [20063.974670] Pid: 7425, comm: btrfs-endio-wri Not tainted 3.0-ARCH #1 System > manufacturer System Product Name/M4N68T-M-V2 > [20063.974670] RIP: 0010:[<ffffffffa01a1df1>] [<ffffffffa01a1df1>] > update_ref_for_cow+0x331/0x340 [btrfs] > [20063.974670] RSP: 0018:ffff8800236798b0 EFLAGS: 00010282 > [20063.974670] RAX: 00000000fffffffb RBX: ffff88002539f000 RCX: > 00000001bc22b580 > [20063.974670] RDX: 00000001bc22b540 RSI: 000060ffc24024f0 RDI: > ffffea0000cb7930 > [20063.974670] RBP: ffff880023679900 R08: ffffffffa01a121a R09: > 0000000000000000 > [20063.974670] R10: 00000000fffffffb R11: 0000000000000001 R12: > ffff880020a72a00 > [20063.974670] R13: ffff880037c7f000 R14: ffff88002c24fc00 R15: > ffff88002367997c > [20063.974670] FS: 00007f6c7ea81740(0000) GS:ffff88003d800000(0000) > knlGS:0000000000000000 > [20063.974670] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > [20063.974670] CR2: 00007fe40de62000 CR3: 0000000025978000 CR4: > 00000000000006f0 > [20063.974670] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > 0000000000000000 > [20063.974670] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: > 0000000000000400 > [20063.974670] Process btrfs-endio-wri (pid: 7425, threadinfo > ffff880023678000, > task ffff88003a43f300) > [20063.974670] Stack: > [20063.974670] 0000000000001000 ffff880023678000 ffff880023679fd8 > ffff880025cb8010 > [20063.974670] ffff880023679910 ffff88002539f000 ffff88002c24fc00 > ffff880020a72a00 > [20063.974670] ffff880037c7f000 ffff880003955900 ffff8800236799b0 > ffffffffa01a2276 > [20063.974670] Call Trace: > [20063.974670] [<ffffffffa01a2276>] __btrfs_cow_block+0x476/0x890 [btrfs] > [20063.974670] [<ffffffffa01a27a8>] btrfs_cow_block+0x118/0x360 [btrfs] > [20063.974670] [<ffffffffa01a7e0e>] btrfs_search_slot+0x1de/0x900 [btrfs] > [20063.974670] [<ffffffffa01bac58>] btrfs_lookup_file_extent+0x38/0x40 > [btrfs] > [20063.974670] [<ffffffffa01d54c4>] btrfs_drop_extents+0x104/0xa10 [btrfs] > [20063.974670] [<ffffffffa01cadf3>] > insert_reserved_file_extent.constprop.47+0x73/0x250 [btrfs] > [20063.974670] [<ffffffffa01c6007>] ? start_transaction+0x97/0x2b0 [btrfs] > [20063.974670] [<ffffffffa01cbf9a>] btrfs_finish_ordered_io+0x2da/0x320 > [btrfs] > [20063.974670] [<ffffffffa01cc02d>] btrfs_writepage_end_io_hook+0x4d/0xf0 > [btrfs] > [20063.974670] [<ffffffffa01e5c9f>] end_bio_extent_writepage+0x12f/0x170 > [btrfs] > [20063.974670] [<ffffffff811894fd>] bio_endio+0x1d/0x40 > [20063.974670] [<ffffffffa01becf6>] end_workqueue_fn+0x56/0x140 [btrfs] > [20063.974670] [<ffffffffa01f2c15>] worker_loop+0x165/0x520 [btrfs] > [20063.974670] [<ffffffffa01f2ab0>] ? btrfs_queue_worker+0x2f0/0x2f0 [btrfs] > [20063.974670] [<ffffffff8107ee9c>] kthread+0x8c/0xa0 > [20063.974670] [<ffffffff813f5b64>] kernel_thread_helper+0x4/0x10 > [20063.974670] [<ffffffff8107ee10>] ? kthread_worker_fn+0x190/0x190 > [20063.974670] [<ffffffff813f5b60>] ? gs_change+0x13/0x13 > [20063.974670] Code: 48 8b 45 b8 e9 1c fe ff ff f6 45 c9 01 0f 85 1c fe ff ff > 0f > 0b 0f 0b 48 89 45 b8 e8 ba f6 24 e1 48 8b 45 b8 e9 89 fd ff ff 0f 0b <0f> 0b > 0f > 0b 0f 0b 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 57 > [20063.974670] RIP [<ffffffffa01a1df1>] update_ref_for_cow+0x331/0x340 > [btrfs] > [20063.974670] RSP <ffff8800236798b0> > [20063.974670] [drm] nouveau 0000:00:0d.0: Setting dpms mode 0 on vga encoder > (output 0) > [20064.021164] ---[ end trace 5d092910be378bb8 ]--- > [20098.068178] attempt to access beyond end of device > [20098.068495] md127p1: rw=0, want=1250270168, limit=1248171999 > [20098.069269] attempt to access beyond end of device > [20098.070180] md127p1: rw=0, want=1250270168, limit=1248171999 > [20098.071105] attempt to access beyond end of device > [20098.072280] md127p1: rw=0, want=1250532312, limit=1248171999 > [20098.072973] BUG: unable to handle kernel NULL pointer dereference at > 0000000000000030 > [20098.073881] IP: [<ffffffffa01b8027>] btrfs_print_leaf+0x37/0x880 [btrfs] > [20098.074824] PGD 39989067 PUD 39aed067 PMD 0 > [20098.075361] Oops: 0000 [#2] PREEMPT SMP > [20098.075361] CPU 0 > [20098.075361] Modules linked in: ipv6 loop nouveau snd_hda_codec_via ttm > drm_kms_helper drm r8169 forcedeth i2c_nforce2 mii ppdev i2c_algo_bit mxm_wmi > parport_pc wmi pcspkr video parport asus_atk0110 snd_hda_intel evdev sg > edac_core edac_mce_amd snd_hda_codec processor psmouse button snd_hwdep > snd_pcm > snd_timer snd soundcore snd_page_alloc k10temp serio_raw i2c_core usbhid hid > raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy > async_tx > raid0 btrfs zlib_deflate crc32c libcrc32c ext4 mbcache jbd2 crc16 ohci_hcd > ehci_hcd usbcore md_mod sata_nv sd_mod pata_amd libata scsi_mod > [20098.075361] > [20098.075361] Pid: 7416, comm: btrfs-transacti Tainted: G D 3.0-ARCH > #1 System manufacturer System Product Name/M4N68T-M-V2 > [20098.075361] RIP: 0010:[<ffffffffa01b8027>] [<ffffffffa01b8027>] > btrfs_print_leaf+0x37/0x880 [btrfs] > [20098.075361] RSP: 0018:ffff880009437af0 EFLAGS: 00010286 > [20098.075361] RAX: 00000000fffffffb RBX: 0000000000000000 RCX: > ffff880009437fd8 > [20098.075361] RDX: ffff880009436000 RSI: 0000000000000000 RDI: > ffff88002539f800 > [20098.075361] RBP: ffff880009437b80 R08: 0000000000000001 R09: > 0000000000000000 > [20098.075361] R10: 00000000fffffffb R11: 0000000000000001 R12: > 0000000000000101 > [20098.075361] R13: 00000094db37b000 R14: 00000000fffffffb R15: > 0000000000001000 > [20098.075361] FS: 00007fdcd1611700(0000) GS:ffff88003d800000(0000) > knlGS:0000000000000000 > [20098.075361] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > [20098.075361] CR2: 0000000000000030 CR3: 0000000039abd000 CR4: > 00000000000006f0 > [20098.075361] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > 0000000000000000 > [20098.075361] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: > 0000000000000400 > [20098.075361] Process btrfs-transacti (pid: 7416, threadinfo > ffff880009436000, > task ffff88003ba2dd70) > [20098.075361] Stack: > [20098.075361] 0000000000002000 0000000000001000 ffff880009437c28 > 00000094db37b000 > [20098.075361] 0000000000000000 ffffffff81142ad3 ffff88002539f800 > 0000000000000090 > [20098.075361] 0000000000000090 000080503a22a120 a800000094db37b0 > 0000000000001000 > [20098.075361] Call Trace: > [20098.075361] [<ffffffff81142ad3>] ? kmem_cache_alloc+0x133/0x160 > [20098.075361] [<ffffffffa01b02df>] __btrfs_free_extent+0x61f/0x6d0 [btrfs] > [20098.075361] [<ffffffffa01b2b6d>] ? run_clustered_refs+0x11d/0x830 [btrfs] > [20098.075361] [<ffffffffa01b2dd9>] run_clustered_refs+0x389/0x830 [btrfs] > [20098.075361] [<ffffffff813f0b6e>] ? schedule+0x34e/0x9f0 > [20098.075361] [<ffffffffa0205400>] ? btrfs_find_ref_cluster+0x20/0x190 > [btrfs] > [20098.075361] [<ffffffffa01b3348>] btrfs_run_delayed_refs+0xc8/0x220 [btrfs] > [20098.075361] [<ffffffff813f27ee>] ? mutex_unlock+0xe/0x10 > [20098.075361] [<ffffffffa01c5381>] btrfs_commit_transaction+0x81/0x8a0 > [btrfs] > [20098.075361] [<ffffffff8107f7f0>] ? abort_exclusive_wait+0xb0/0xb0 > [20098.075361] [<ffffffffa01be3ed>] transaction_kthread+0x26d/0x290 [btrfs] > [20098.075361] [<ffffffffa01be180>] ? btrfs_congested_fn+0xd0/0xd0 [btrfs] > [20098.075361] [<ffffffff8107ee9c>] kthread+0x8c/0xa0 > [20098.075361] [<ffffffff813f5b64>] kernel_thread_helper+0x4/0x10 > [20098.075361] [<ffffffff8107ee10>] ? kthread_worker_fn+0x190/0x190 > [20098.075361] [<ffffffff813f5b60>] ? gs_change+0x13/0x13 > [20098.075361] Code: 89 5d d8 4c 89 65 e0 4c 89 6d e8 4c 89 75 f0 4c 89 7d f8 > 66 > 66 66 66 90 65 48 8b 0c 25 48 cd 00 00 48 8d 91 28 e0 ff ff 48 89 f3 > [20098.075361] RIP [<ffffffffa01b8027>] btrfs_print_leaf+0x37/0x880 [btrfs] > [20098.075361] RSP <ffff880009437af0> > [20098.075361] CR2: 0000000000000030 > [20098.121409] ---[ end trace 5d092910be378bb9 ]--- > > > And some time later, this: > > > [36853.419291] ------------[ cut here ]------------ > [36853.419852] kernel BUG at fs/btrfs/extent-tree.c:5510! > [36853.419852] invalid opcode: 0000 [#3] PREEMPT SMP > [36853.419852] CPU 0 > [36853.419852] Modules linked in: ipv6 loop nouveau snd_hda_codec_via ttm > drm_kms_helper drm r8169 forcedeth i2c_nforce2 mii ppdev i2c_algo_bit mxm_wmi > parport_pc wmi pcspkr video parport asus_atk0110 snd_hda_intel evdev sg > edac_core edac_mce_amd snd_hda_codec processor psmouse button snd_hwdep > snd_pcm > snd_timer snd soundcore snd_page_alloc k10temp serio_raw i2c_core usbhid hid > raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy > async_tx > raid0 btrfs zlib_deflate crc32c libcrc32c ext4 mbcache jbd2 crc16 ohci_hcd > ehci_hcd usbcore md_mod sata_nv sd_mod pata_amd libata scsi_mod > [36853.419852] > [36853.419852] Pid: 7413, comm: btrfs-delayed-m Tainted: G D 3.0-ARCH > #1 System manufacturer System Product Name/M4N68T-M-V2 > [36853.419852] RIP: 0010:[<ffffffffa01b3263>] [<ffffffffa01b3263>] > run_clustered_refs+0x813/0x830 [btrfs] > [36853.419852] RSP: 0018:ffff880025c13c10 EFLAGS: 00010282 > [36853.419852] RAX: 00000000fffffffb RBX: ffff880025dfc2a0 RCX: > ffff88002a2ceb80 > [36853.419852] RDX: 0000000000000008 RSI: 00000094db37a000 RDI: > 0000000000000000 > [36853.419852] RBP: ffff880025c13cf0 R08: 0000000000000001 R09: > 0000000000000000 > [36853.419852] R10: ffff88002539f800 R11: ffff880037c7f090 R12: > ffff88002869c3c0 > [36853.419852] R13: ffff88001c5b7200 R14: ffff88000a66a1e0 R15: > 0000000000000000 > [36853.419852] FS: 00007fd708110700(0000) GS:ffff88003d800000(0000) > knlGS:0000000000000000 > [36853.419852] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > [36853.419852] CR2: 00000000017bd048 CR3: 000000001e1cc000 CR4: > 00000000000006f0 > [36853.419852] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > 0000000000000000 > [36853.419852] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: > 0000000000000400 > [36853.419852] Process btrfs-delayed-m (pid: 7413, threadinfo > ffff880025c12000, > task ffff88003a7bebd0) > [36853.419852] Stack: > [36853.419852] ffff880025398800 0000000000000000 ffff880025c13c60 > ffffffffa01a45f1 > [36853.419852] ffff880025c13c40 000000033a22a5a0 0000000000000000 > 0000000000000002 > [36853.419852] ffff880026b69d00 ffff880025c13d28 0000000000000000 > ffff880025c13d30 > [36853.419852] Call Trace: > [36853.419852] [<ffffffffa01a45f1>] ? bin_search+0x81/0x130 [btrfs] > [36853.419852] [<ffffffffa01b3348>] btrfs_run_delayed_refs+0xc8/0x220 [btrfs] > [36853.419852] [<ffffffffa01c5c53>] __btrfs_end_transaction+0x83/0x320 > [btrfs] > [36853.419852] [<ffffffffa01c5f08>] btrfs_end_transaction_dmeta+0x18/0x20 > [btrfs] > [36853.419852] [<ffffffffa020f8b8>] > btrfs_async_run_delayed_node_done+0xd8/0x1a0 [btrfs] > [36853.419852] [<ffffffffa01f2c15>] worker_loop+0x165/0x520 [btrfs] > [36853.419852] [<ffffffffa01f2ab0>] ? btrfs_queue_worker+0x2f0/0x2f0 [btrfs] > [36853.419852] [<ffffffff8107ee9c>] kthread+0x8c/0xa0 > [36853.419852] [<ffffffff813f5b64>] kernel_thread_helper+0x4/0x10 > [36853.419852] [<ffffffff8107ee10>] ? kthread_worker_fn+0x190/0x190 > [36853.419852] [<ffffffff813f5b60>] ? gs_change+0x13/0x13 > [36853.419852] Code: e0 e9 12 f9 ff ff 0f 0b 80 fa b2 0f 84 bc f9 ff ff 0f 0b > be > 95 00 00 00 48 c7 c7 37 a0 21 a0 e8 d4 95 ea e0 e9 04 f9 ff ff 0f 0b <0f> 0b > 0f > 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 66 66 66 66 66 66 2e > [36853.419852] RIP [<ffffffffa01b3263>] run_clustered_refs+0x813/0x830 > [btrfs] > [36853.419852] RSP <ffff880025c13c10> > [36853.419852] [drm] nouveau 0000:00:0d.0: Setting dpms mode 0 on vga encoder > (output 0) > [36853.473157] ---[ end trace 5d092910be378bba ]--- > > > On the next boot, the volume wouldn't mount: > > > [ 7.010353] btrfs: use zlib compression > [ 7.040893] attempt to access beyond end of device > [ 7.040896] md126p1: rw=0, want=1250270312, limit=1248171999 > [ 7.040902] attempt to access beyond end of device > [ 7.040904] md126p1: rw=0, want=1250270312, limit=1248171999 > [ 7.040908] attempt to access beyond end of device > [ 7.040910] md126p1: rw=0, want=1250532456, limit=1248171999 > [ 7.040914] btrfs: failed to read tree root on md126p1 > [ 7.046842] btrfs: open_ctree failed > > > At this point, I took a disk image and dived in, and in doing so discovered > that > somehow, there were CHUNK_ITEMs in the chunk tree that referred to physical > address ranges that were entirely outside of the device (matching up to the > ranges showing up in the kernel log over and over). Evidently, the filesystem > driver thought it should move the root tree onto a chunk that existed at a > nonexistant offset in the device. I checked the superblocks and verified that > the total_bytes fields matched up correctly to the actual device size, which > leaves me wondering how those chunks ever got there. > > I hope this report proves useful for finding the bug responsible for this. If > complete logs, details about versions used or steps taken, diagnostic output > from system commands, or particular pieces of the disk image would be helpful > for debugging this problem, I can provide them. > > > Thanks, > > Justin > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html