Re: unable to handle kernel paging request - btrfs
Here is another trace, similar to the original issue. I have a bit more detail on this one, and it is available as text, which if nothing else is more convenient, so I'll go ahead and paste it. I don't intend to keep pasting these unless I get something that looks different; I only posted the initial BUG.

Oct 10 05:11:15 hab nc[1250]: ip_tables ext4 crc16 mbcache jbd2 radeon nxt200x cx88_dvb cx88_vp3054_i2c videobuf2_dvb dvb_core tuner_simple tuner_types tuner cx8800 cx8802 videobuf2_dma_sg videobuf2_memops videobuf2_v4l2 cx88_alsa cx88xx mousedev fbcon videobuf2_core bitblit dm_region_hash dm_log dm_mod
Oct 10 05:11:15 hab nc[1250]: [81346.935203] CPU: 3 PID: 29648 Comm: kworker/u16:3 Not tainted 4.4.24 #1
Oct 10 05:11:15 hab nc[1250]: [81346.935317] Hardware name: Gigabyte Technology Co., Ltd. GA-880GM-UD2H/GA-880GM-UD2H, BIOS F8 10/11/2010
Oct 10 05:11:15 hab nc[1250]: [81346.935544] Workqueue: btrfs-endio btrfs_endio_helper [btrfs]
Oct 10 05:11:15 hab nc[1250]: [81346.935657] task: 880415acae00 ti: 88019a584000 task.ti: 88019a584000
Oct 10 05:11:15 hab nc[1250]: [81346.935783] RIP: 0010:[] [] __memcpy+0x12/0x20
Oct 10 05:11:15 hab nc[1250]: [81346.935930] RSP: 0018:88019a587c68 EFLAGS: 00010246
Oct 10 05:11:15 hab nc[1250]: [81346.936023] RAX: c90002ecfff8 RBX: 1000 RCX: 01ff
Oct 10 05:11:15 hab nc[1250]: [81346.936142] RDX: RSI: 88008c950008 RDI: c90002ed
Oct 10 05:11:15 hab nc[1250]: [81346.936262] RBP: 88019a587d30 R08: 41545345 R09: c90002ece000
Oct 10 05:11:15 hab nc[1250]: [81346.936382] R10: e8cc09e0 R11: 1000 R12: 88008c95
Oct 10 05:11:15 hab nc[1250]: [81346.936502] R13: 4154534d R14: R15: 8802b25b2798
Oct 10 05:11:15 hab nc[1250]: [81346.936623] FS: 7fe90a15d780() GS:880427cc() knlGS:
Oct 10 05:11:15 hab nc[1250]: [81346.936756] CS: 0010 DS: ES: CR0: 8005003b
Oct 10 05:11:16 hab nc[1250]: [81346.937182] 8800102c5720 0004 41545345 4154334d
Oct 10 05:11:16 hab nc[1250]: [81346.937347] c90002ece000 1000 0002 003a
Oct 10 05:11:16 hab nc[1250]: [81346.937515] Call Trace:
Oct 10 05:11:16 hab nc[1250]: [81346.937621] [] ? lzo_decompress_biovec+0x1d1/0x2c0 [btrfs]
Oct 10 05:11:16 hab nc[1250]: [81346.944148] [] end_compressed_bio_read+0x20c/0x2c0 [btrfs]
Oct 10 05:11:16 hab nc[1250]: [81346.950610] [] ? resched_curr+0x60/0xc0
Oct 10 05:11:16 hab nc[1250]: [81346.957055] [] bio_endio+0x3a/0x70
Oct 10 05:11:16 hab nc[1250]: [81346.963516] [] end_workqueue_fn+0x37/0x40 [btrfs]
Oct 10 05:11:16 hab nc[1250]: [81346.970009] [] normal_work_helper+0xae/0x2d0 [btrfs]
Oct 10 05:11:16 hab nc[1250]: [81346.976532] [] btrfs_endio_helper+0xd/0x10 [btrfs]
Oct 10 05:11:16 hab nc[1250]: [81346.983010] [] process_one_work+0x148/0x400
Oct 10 05:11:16 hab nc[1250]: [81346.989509] [] worker_thread+0x46/0x430
Oct 10 05:11:16 hab nc[1250]: [81346.996013] [] ? rescuer_thread+0x2d0/0x2d0
Oct 10 05:11:16 hab nc[1250]: [81347.034423] Code: ff ff 48 8b 43 60 48 2b 43 50 88 43 4e 5b 5d f3 c3 90 90 90 90 90 90 90 90 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 f3
Oct 10 05:11:16 hab nc[1250]: [81347.041852] RIP [] __memcpy+0x12/0x20
Oct 10 05:11:16 hab nc[1250]: [81347.048565] RSP
Oct 10 05:11:16 hab nc[1250]: [81347.055218] CR2: c90002ed
Oct 10 05:11:16 hab nc[1250]: [81347.104741] ---[ end trace 9a43c0b6d874fe31 ]---
Oct 10 05:11:16 hab nc[1250]: [81347.104752] BUG: unable to handle kernel paging request at c90002c4a000
Oct 10 05:11:16 hab nc[1250]: [81347.104761] IP: [] __memcpy+0x12/0x20
Oct 10 05:11:16 hab nc[1250]: [81347.104767] PGD 417427067 PUD 417488067 PMD 410881067 PTE 0
Oct 10 05:11:16 hab nc[1250]: [81347.104771] Oops: 0002 [#2] SMP
Oct 10 05:11:16 hab nc[1250]: [81347.104825] Modules linked in: netconsole configfs tun ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_conntrack veth iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables ext4 crc16 mbcache jbd2 radeon nxt200x cx88_dvb cx88_vp3054_i2c videobuf2_dvb dvb_core tuner_simple tuner_types tuner cx8800 cx8802 videobuf2_dma_sg videobuf2_memops videobuf2_v4l2 cx88_alsa cx88xx mousedev fbcon videobuf2_core bitblit softcursor tveeprom font tileblit drm_kms_helper kvm_amd rc_core kvm v4l2_common cfbfillrect syscopyarea videodev cfbimgblt sysfillrect snd_hda_codec_realtek snd_hda_codec_generic irqbypass i2c_algo_bit sysimgblt fb_sys_fops snd_hda_intel k10temp cfbcopyarea ttm snd_hda_codec snd_hwdep i2c_piix4 snd_hda_core drm hid_logitech_hidpp snd_pcm r8169
[81347.104954] CR2: c90002c4a000 CR3: cb2fb000 CR4: 06e0
Oct 10 05:11:16 hab nc[1250]: [81347.104955] Stack:
Oct 10 05:11:16 hab nc[1250]: [81347.104960] a02ef741
Re: unable to handle kernel paging request - btrfs
I'm not sure if this is related to the same issue or not, but I just started getting a new BUG, followed by a panic. (I've also enabled network console capture, so you won't have to squint at photos.) The original BUG is:

[14740.444257] [ cut here ]
[14740.444293] kernel BUG at /usr/src/linux-stable/fs/btrfs/volumes.c:5509!
[14740.444323] invalid opcode: [#1] SMP
[14740.444348] Modules linked in: nfsd auth_rpcgss oid_registry lockd grace sunrpc it87 hwmon_vid netconsole configfs tun ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_conntrack veth iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables ext4 crc16 mbcache jbd2 radeon nxt200x cx88_dvb cx88_vp3054_i2c videobuf2_dvb dvb_core tuner_simple tuner_types tuner fbcon bitblit softcursor font tileblit drm_kms_helper kvm_amd kvm cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt mousedev fb_sys_fops cfbcopyarea cx88_alsa ttm cx8802 drm cx8800 videobuf2_dma_sg videobuf2_memops videobuf2_v4l2 cx88xx snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel videobuf2_core snd_hda_codec tveeprom rc_core irqbypass v4l2_common videodev k10temp i2c_algo_bit
[14740.444799] snd_hwdep i2c_piix4 snd_hda_core hid_logitech_hidpp snd_pcm r8169 8250 snd_timer snd mii 8250_base backlight serial_core soundcore evdev sch_fq_codel hid_logitech_dj hid_generic usbhid btrfs firewire_ohci atkbd ata_generic pata_acpi firewire_core crc_itu_t xor zlib_deflate ohci_pci pata_atiixp raid6_pq ehci_pci ohci_hcd ehci_hcd usbcore usb_common dm_mirror dm_region_hash dm_log dm_mod
[14740.445028] CPU: 1 PID: 3213 Comm: kworker/u16:2 Not tainted 4.4.24 #1
[14740.445056] Hardware name: Gigabyte Technology Co., Ltd. GA-880GM-UD2H/GA-880GM-UD2H, BIOS F8 10/11/2010
[14740.445116] Workqueue: btrfs-endio btrfs_endio_helper [btrfs]
[14740.445143] task: 8803ff527300 ti: 8803e3c8c000 task.ti: 8803e3c8c000
[14740.445173] RIP: 0010:[] [] __btrfs_map_block+0xdfd/0x1140 [btrfs]
[14740.445226] RSP: 0018:8803e3c8faa0 EFLAGS: 00010282
[14740.445248] RAX: cdf2f040 RBX: 0002 RCX: 0002
[14740.445277] RDX: RSI: 21b27000 RDI: 8800cab4fb40
[14740.445306] RBP: 8803e3c8fb88 R08: 050743c0 R09: cdf2f040
[14740.445334] R10: 0001 R11: 1e4d R12: cdf2f03f
[14740.445363] R13: 9000 R14: 8803e3c8fbd0 R15: 0001
[14740.445391] FS: 7f9e2befc7c0() GS:880427c4() knlGS:
[14740.445423] CS: 0010 DS: ES: CR0: 8005003b
[14740.445446] CR2: 7fc533bf7000 CR3: 0003e29e4000 CR4: 06e0
[14740.445474] Stack:
[14740.445484] 8803e3c8fab0 81084577 8112acf0 02011200
[14740.445526] 880410cacc60 880410cacc90 1e4e 8803ff527300
[14740.445565] 1e4e 880414e68ee8
[14740.445603] Call Trace:
[14740.445618] [] ? __enqueue_entity+0x67/0x70
[14740.445644] [] ? mempool_alloc_slab+0x10/0x20
[14740.445680] [] btrfs_map_bio+0x71/0x320 [btrfs]
[14740.445707] [] ? kmem_cache_alloc+0x190/0x1f0
[14740.445742] [] ? btrfs_bio_wq_end_io+0x2e/0x80 [btrfs]
[14740.445780] [] btrfs_submit_compressed_read+0x451/0x4a0 [btrfs]
[14740.445821] [] btrfs_submit_bio_hook+0x1a0/0x1b0 [btrfs]
[14740.445860] [] ? btrfs_io_bio_alloc+0x10/0x30 [btrfs]
[14740.445900] [] ? btrfs_create_repair_bio+0xc3/0xe0 [btrfs]
[14740.445940] [] end_bio_extent_readpage+0x44f/0x510 [btrfs]
[14740.445981] [] ? btrfs_create_repair_bio+0xe0/0xe0 [btrfs]
[14740.446011] [] bio_endio+0x3a/0x70
[14740.446042] [] end_workqueue_fn+0x37/0x40 [btrfs]
[14740.446080] [] normal_work_helper+0xae/0x2d0 [btrfs]
[14740.446118] [] btrfs_endio_helper+0xd/0x10 [btrfs]
[14740.446145] [] process_one_work+0x148/0x400
[14740.446170] [] worker_thread+0x46/0x430
[14740.446193] [] ? rescuer_thread+0x2d0/0x2d0
[14740.446217] [] ? rescuer_thread+0x2d0/0x2d0
[14740.446241] [] kthread+0xc4/0xe0
[14740.446262] [] ? kthread_park+0x50/0x50
[14740.446286] [] ret_from_fork+0x3f/0x70
[14740.446309] [] ? kthread_park+0x50/0x50
[14740.446332] Code: 60 ff ff ff 48 63 d3 48 2b 4d c0 48 0f af c1 48 39 c2 48 0f 46 c2 48 89 45 90 89 d9 c7 85 70 ff ff ff 00 00 00 00 e9 f9 f3 ff ff <0f> 0b bb f4 ff ff ff e9 c7 fa ff ff be 6a 16 00 00 48 c7 c7 18
[14740.446672] RIP [] __btrfs_map_block+0xdfd/0x1140 [btrfs]
[14740.446714] RSP
[14740.456756] ---[ end trace e349a675c6512569 ]---
[14740.456832] BUG: unable to handle kernel paging request at ffd8
[14740.456869] IP: [] kthread_data+0xb/0x20
[14740.456896] PGD 1a0a067 PUD 1a0c067 PMD 0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: unable to handle kernel paging request - btrfs
On Fri, Sep 30, 2016 at 8:38 PM, Jeff Mahoney <je...@suse.com> wrote:
> On 9/30/16 5:07 PM, Rich Freeman wrote:
>> On Fri, Sep 30, 2016 at 4:55 PM, Jeff Mahoney <je...@suse.com> wrote:
>>> This looks like a use-after-free on one of the pages used for
>>> compression. Can you post the output of objdump -Dr
>>> /lib/modules/$(uname -r)/kernel/fs/btrfs/btrfs.ko somewhere?
>>
>> Sure: https://drive.google.com/open?id=0BwUDImviY_gcR3JfT0Z1cUlRVEk
>>
>> I was impressed by just how large it was.
>>
>> I take it you're going to try to use the offsets in the oops to figure
>> out where it went wrong? I really need to get kernel core dumping
>> working on this box...
>
> Yep. What I think is happening is that we have workspace getting freed
> while it's in use. The faulting address is in vmalloc space and it's
> also the first argument to memcpy, which makes it the destination. In
> lzo_decompress_biovec, that means it's the workspace->cbuf. Beyond that
> I'll have to dig a bit more.

I'll confess to not being much of a kernel hacker, but could this error
also be caused by a buffer overrun? If working_bytes or
in_page_bytes_left are larger than the size of the buffer, then the
memcpy would overrun the length of the buffer. I don't know if that
generates a different error than the one reported. What guarantee do we
have that working_bytes is less than the size of workspace->cbuf?

I'm just throwing stuff out there, because as far as I can tell the code
never frees workspace (I'm guessing kunmap at the very end might take
care of it).

--
Rich
Re: unable to handle kernel paging request - btrfs
On Fri, Sep 30, 2016 at 4:55 PM, Jeff Mahoney wrote:
> This looks like a use-after-free on one of the pages used for
> compression. Can you post the output of objdump -Dr
> /lib/modules/$(uname -r)/kernel/fs/btrfs/btrfs.ko somewhere?

Sure: https://drive.google.com/open?id=0BwUDImviY_gcR3JfT0Z1cUlRVEk

I was impressed by just how large it was.

I take it you're going to try to use the offsets in the oops to figure
out where it went wrong? I really need to get kernel core dumping
working on this box...

--
Rich
Re: unable to handle kernel paging request - btrfs
On Thu, Sep 22, 2016 at 1:41 PM, Jeff Mahoney <je...@suse.com> wrote:
> On 9/22/16 8:18 AM, Rich Freeman wrote:
>> I have been getting panics consistently after doing a btrfs replace
>> operation on a raid1 and rebooting. I linked a photo of the panic; I
>> haven't been able to get a text capture of it.
>>
>> https://ibin.co/2vx0HhDeViu3.jpg
>>
>> I'm getting this error on the latest 4.4, 4.1, and even on an old
>> 3.18.26 kernel I had lying around.
>>
>> I tried the "remove root_log_ctx from ctx list before btrfs_sync_log
>> returns" patch on 4.1 and that did not solve my problem either.
>>
>> I'm able to boot into single-user mode, and if I don't start any
>> processes the system seems fairly stable. I am also able to start a
>> btrfs balance and run that for several hours without issue. If I
>> start launching services the system will tend to panic, though how
>> many processes I can launch will vary. I don't think that a
>> particular file being accessed is triggering the issue, since the
>> point where it fails varies. I suspect it may be load-related.
>>
>> Mounting with compress=no doesn't seem to help either. Granted, I see
>> lzo_decompress in the backtrace, and that is probably a read operation.
>>
>> Any suggestions? Google hasn't been helpful on this one...
>
> Can you boot with panic_on_oops=1, reproduce it, and capture that Oops?
> The trace in your photo is a secondary Oops (tainted D), which means
> that something else went wrong before that and now the system is
> tripping over it. Secondary Oopses don't really help the debugging
> process because the system was already in a broken, undefined, state.

Ok, the system has been up for a week without issue, but it just
panicked and rebooted right towards the end of a balance (it literally
had about 30 of 2500 chunks left).

After it came up (and after waiting for it to fully mount, as there were
a bunch of free space warnings/etc) I managed to capture an initial oops
when it happened again: https://ibin.co/2wt0n2IaCOA3.jpg

This is on a system without swap, though my understanding is that the
paging system is used for other things.

Note that I've updated my kernel since my last post. When it panicked
during the balance it was running 4.4.21, and on the oops I actually
captured it was on 4.4.23 (I was just waiting for the balance to finish
before rebooting with a new kernel).

--
Rich
Re: btrfs and containers
On Wed, Mar 9, 2016 at 4:45 PM, Marc MERLIN wrote:
> On Wed, Mar 09, 2016 at 02:21:26PM -0700, Chris Murphy wrote:
>> > I have a very stripped down docker image that actually mounts a
>> > portion of my root filesystem read only.
>> > While it's running out of a btrfs filesystem, you can't run btrfs
>> > commands against it:
>> > 05233e5c91f0:/# btrfs fi show
>> > 05233e5c91f0:/# btrfs subvol list /
>> > ERROR: can't perform the search - Operation not permitted
>> > 05233e5c91f0:/# btrfs subvol list .
>> > ERROR: can't perform the search - Operation not permitted
>> >
>> > I didn't do anything special, it's just working that way.
>>
>> Yep, you're not using --privileged, in which case you can't list
>> things. But I'm not sure what the equivalent is off hand with
>> systemd-nspawn containers; I think those may always be privileged?
>
> Ok, cool. I just used docker out of the box, glad to know it errs on
> the secure side by default.
> (and I don't have systemd, so that may also help me there)

I'm sure the default capability list for systemd-nspawn and docker is
different. I know that you can tune nspawn to give the container
whatever capabilities you want it to have.

A general warning, though: linux containers are still not quite 100%
secure when root is running inside. Obviously the fewer capabilities
you give them the better, but the level of isolation isn't quite to VM
levels. It is better than chroot levels, however.
Re: btrfs raid
On Sun, Mar 6, 2016 at 4:07 PM, Chris Murphy <li...@colorremedies.com> wrote:
> On Sun, Mar 6, 2016 at 5:01 AM, Rich Freeman <ri...@gentoo.org> wrote:
>> I think it depends on how you define "old." I think that 3.18.28
>> would be fine as it is a supported longterm.
>
> For raid56? I disagree. There were substantial raid56 code changes in
> 3.19 that were not backported to 3.18.

Of course. I was referring to raid1. I wouldn't run raid56 without an
expectation of occasionally losing everything on any version of linux. :)

If I were just testing it, or if I could tolerate losing everything
occasionally, I'd probably track the current stable, if not mainline,
depending on my goals.

--
Rich
Re: btrfs raid
On Tue, Mar 1, 2016 at 11:27 AM, Hugo Mills wrote:
> Definitely don't use parity RAID on 3.19. It's not really something
> I'd trust, personally, even on 4.4, except for testing purposes.

++ - raid 5/6 are fairly unstable at this point. Raid 1 should be just
fine.

> TBH, I wouldn't really want to be running something as old as 3.19
> either. The actual problems of running older kernels are, IME,
> considerably worse than the perceived problems of upgrading.

I think it depends on how you define "old." I think that 3.18.28 would
be fine as it is a supported longterm. I've just upgraded to the 4.1
series, which I plan to track until a new longterm has been out for a
few months and things look quiet.

3.19 is very problematic though, as it is no longer supported. I'd
sooner "downgrade" to 3.18.28 (which likely has more btrfs backports,
unless your distro handles them). Or, upgrade to 4.1.19.

If you are using highly experimental features like raid5 support on
btrfs then bleeding-edge is probably better, but I've found I've had
the fewest issues sticking with the previous longterm. I've been bitten
by a few btrfs regressions over the years, and I think 3.19 was
actually around the time I got hit by one of them. Since I've switched
to just staying on a longterm once it hits the x.x.15 version or so,
I've found things to be much more reliable.

--
Rich
Re: Deadlock after upgrade to 4.1
On Fri, Dec 25, 2015 at 11:34 PM, Chris Murphy wrote:
> I would then also try to reproduce with 4.2.8 or 4.3.3, because those
> have ~25% more backports than made it to 4.1.15, so there's an off
> chance it's fixed there.

I take it that those backports are in the queue, though? I was actually
thinking about updating to 4.1 over the holidays, but this thread is
making me think that btrfs isn't quite ready in 4.1 yet. 3.18.25 is
about the best experience with btrfs I've had so far, and I guess I
don't really have any reason to update until raid5 is stable (which
seems a long way off).

--
Rich
Re: btrfs autodefrag?
On Sat, Oct 17, 2015 at 12:36 PM, Xavier Gnata wrote:
> 2) Disabling copy-on-write for just the VM image directory.

Unless this has changed, doing this will also disable checksumming. I
don't see any reason why it has to, but it does. So, I avoid using this
at all costs.

--
Rich
Re: State of Dedup / Defrag
On Wed, Oct 14, 2015 at 10:47 PM, Zygo Blaxell wrote:
> I wouldn't describe dedup+defrag as unsafe. More like insane. You won't
> lose any data, but running both will waste a lot of time and power.
> Either one is OK without the other, or applied to non-overlapping sets
> of files, but they are operations with opposite results.

That is probably why I disabled it then. I now recall past discussion
that defragging a file wasn't snapshot-aware, though I thought that was
fixed.

Obviously there is always a tradeoff, since from a dedup perspective
you're best off arranging extents so that you're sharing as much as
possible, and from a defrag standpoint you want each file to have a
single extent, even if two files differ by a single byte.

I've pretty much stopped running VMs on btrfs, and I've adjusted my
journal settings to something more sane, so the defrag isn't nearly as
important these days.

--
Rich
Re: RAID6 stable enough for production?
On Wed, Oct 14, 2015 at 9:47 PM, Chris Murphy wrote:
> For that matter, now that GlusterFS has checksums and snapshots...

Interesting - I haven't kept up with that. Does it actually do
end-to-end checksums? That is, compute the checksum at the time of
storage, store the checksum in the metadata somehow, and ensure the
checksum matches when data is retrieved?

I forget whether it was glusterfs or ceph I was looking at, but some of
those distributed filesystems will only checksum data while in transit,
not while it is at rest. So, if a server claims it has a copy of the
file, then it is assumed to be a good copy, and you never realize that
even though you have 5 copies of that file distributed around, the copy
on the server you ended up using differs from the other 4.

I'm also not sure if it supports an n+1/2 model like raid5/6, or if it
is just a 2*n model like raid1. If I want to store 5TB of data with
redundancy, I'd prefer not to need 10TB worth of drives to do it,
regardless of how many systems they're spread across.

--
Rich
Re: RAID6 stable enough for production?
On Wed, Oct 14, 2015 at 4:53 PM, Donald Pearson wrote:
> Personally I would still recommend zfs on illumos in production,
> because it's nearly unshakeable, and the creative things you can do to
> deal with problems are pretty remarkable. The unfortunate reality,
> though, is that over time your system will probably grow and expand,
> and zfs is very locked in to the original configuration. Adding vdevs
> is a poor solution IMO.

This is the main thing that has kept me away from zfs - you can't
modify a vdev like you can with an md array or btrfs. I don't think zfs
makes use of all your space if you have mixed disk sizes in a raid-z
either - it works like mdadm. I'm not sure whether btrfs will be any
better in that regard (if I have 2x3TB and 3x1TB drives in a RAID5 I
should get 6TB of usable space, not 4TB, without messing with
partitioning).

So, I am running raid1 btrfs in the hope that I'll be able to move to
something more efficient in the future. However, I would not personally
be using raid5/6 for anything but pure experimentation on btrfs anytime
soon. I don't even trust the 4.1 kernel series for btrfs at all just
yet, and you're not going to be running anything older than that for
raid5/6.

--
Rich
Re: State of Dedup / Defrag
On Wed, Oct 14, 2015 at 1:09 AM, Zygo Blaxell wrote:
> I wouldn't try to use dedup on a kernel older than v4.1 because of
> these fixes in 4.1 and later:

I would assume that these will be ported to the other longterm kernels
like 3.18 at some point?

> Do dedup a photo or video file collection. Don't dedup
> a live database server on a filesystem with compression enabled...yet.

Likewise. Typically I just dedup the entire filesystem, so it sounds
like we're not quite there yet. Would it make sense to put this on the
wiki in the gotchas section?

> Using dedup and defrag at the same time is still a bad idea. The
> features work against each other.

You mentioned quite a bit about autodefrag. I was thinking more in
terms of using explicit defrag, as was done by dedup in the past. It
looks like duperemove doesn't actually do this, perhaps because it is
also considered unsafe these days.

Thanks, I was just trying to get a sense for where this was at. It
sounds like we're getting to the point where it could be used in
general, but for now it is probably best to run it manually on stuff
that isn't too busy.

--
Rich
State of Dedup / Defrag
What is the current state of dedup and defrag in btrfs? I seem to
recall there having been problems a few months ago, and I've stopped
using them, but I haven't seen much news since. I'm interested both in
the 3.18 and subsequent kernel series.

--
Rich
Re: BTRFS as image store for KVM?
On Mon, Oct 5, 2015 at 7:16 AM, Lionel Bouton wrote:
> According to the bad performance -> unstable logic, md would then be
> the less stable RAID1 implementation, which doesn't make sense to me.

The argument wasn't that bad performance meant that something was
unstable. The argument was that a lack of significant performance
optimization meant that the developers considered it unstable and not
worth investing time in optimizing.

So, the question isn't whether btrfs is or isn't faster than something
else. The question is whether it is or isn't faster than it could be if
it were properly optimized. That is, how does btrfs perform today
against the btrfs of 20 years from now, which obviously cannot be
benchmarked today.

That said, I'm not really convinced that the developers haven't fixed
this because they feel it would need to be redone later after major
refactoring. I think it is more likely that there are just very few
developers working on btrfs, and load-balancing on raid just doesn't
rank high on their list of interests or possibly expertise. If any are
being paid to work on btrfs, then most likely their employers don't
care too much about it either.

I did find the phoronix results interesting, though. The whole driver
for "layer-violation" is that with knowledge of the filesystem you can
better optimize what you do/don't read and write, and that may be
showing here.

--
Rich
Re: BTRFS as image store for KVM?
On Sun, Oct 4, 2015 at 8:03 AM, Lionel Bouton wrote:
> This focus on single reader RAID1 performance surprises me.
>
> 1/ AFAIK the kernel md RAID1 code behaves the same (last time I
> checked you need 2 processes to read from 2 devices at once) and I've
> never seen anyone arguing that the current md code is unstable.

Perhaps, but with btrfs it wouldn't be hard to get 1000 processes
reading from a raid1 and have every single request directed to the same
disk, with the other disk remaining completely idle. I believe the
algorithm is just whether the pid is even or odd; it doesn't take into
account disk activity at all, let alone disk performance or anything
more sophisticated than that. I'm sure md does a better job than that.

--
Rich
Re: fstrim silently does nothing on dev add/dev rem'd filesystem
On Sun, Sep 27, 2015 at 10:45 PM, Duncan <1i5t5.dun...@cox.net> wrote:
> But I think part of the reasoning behind the relatively low priority
> this issue has received is that it's a low-visibility issue not really
> affecting most people running btrfs, either because they're not
> running on ssd or because they simply don't have a particularly high
> awareness of what trim does, and thus of how it's failing to work here
> and what that means to them. If we get a rash of people posting
> on-list that it's affecting them, that relative priority is likely to
> go up, and with it the patch testing and integration schedule for the
> affected patches.

I've never actually seen fstrim do anything on btrfs (0 bytes trimmed).
I stopped using it a few months ago when the news came out about all
the issues with its implementation, and I believe my drive is still
blacklisted anyway.

It really should be fixed, but right now that goes all around - if
btrfs fixed it tomorrow I'd still be stuck until somebody figures out
how to reliably do it on a Samsung 850.

--
Rich
Re: Latest kernel to use?
On Fri, Sep 25, 2015 at 9:25 AM, Bostjan Skufca wrote:
> Similar here: I am sticking with 3.19.2, which has proven to work fine
> for me.

I'd recommend still tracking SOME stable series. I'm sure there were
fixes in 3.19 for btrfs (to say nothing of other subsystems) that
you're missing with that version. 3.19 is also unsupported at this
time.

You might want to consider moving to either 3.18.21 or 4.1.8 and
tracking those series instead. I doubt you'd give up much moving back
to 3.18, and there have been a bunch of btrfs fixes in that series
(though it seems to me that 3.18 has been slower to receive btrfs
patches than some of the other series). I'm on the fence right now
about making the move to 4.1. Maybe in a few releases I'll be there,
depending on what the noise on the lists sounds like.

There was a time when you were better off on bleeding-edge linux for
btrfs. If you REALLY want to run btrfs raid5 or something like that,
then I'd say that is still your best strategy. However, if you stick
with features that have been around for a year, the longterm kernels
seem a lot less likely to hit you with a regression, as long as you
don't switch to a new one the day it is declared as such.

--
Rich
Re: BTRFS as image store for KVM?
On Sat, Sep 19, 2015 at 9:26 PM, Jim Salter wrote:
> ZFS, by contrast, works like absolute gangbusters for KVM image
> storage.

I'd be interested in what allows ZFS to handle KVM image storage well,
and whether this could be implemented in btrfs. I'd think that the
fragmentation issues would potentially apply to any COW filesystem, and
if ZFS has a solution for this then it would probably benefit btrfs to
implement the same solution, and not just for VM images.

--
Rich
Re: Latest kernel to use?
On Fri, Sep 25, 2015 at 7:20 AM, Austin S Hemmelgarn wrote:
> On 2015-09-24 17:07, Sjoerd wrote:
>> Maybe a silly question for most of you, but the wiki states to always
>> try to use the latest kernel with btrfs. Which one would be best:
>> - 4.2.1 (currently latest stable and matches the btrfs-progs
>>   versioning) or
>> - the 4.3.x (mainline)?
>>
>> Stable sounds more stable to me (hence the name ;) ), but the
>> mainline kernel seems to be in more active development?
>
> Like Hugo said, 4.2.1 is what you want right now. In general, go with
> the highest version number that isn't a -rc version (4.3 isn't
> actually released yet; IIRC they're up to 4.3-rc2 right now, and
> almost at -rc3). (We should probably be specific like this on the
> wiki.)

I'll just say that my btrfs stability has gone WAY up since I stopped
following this advice and instead followed a recent longterm. Right now
I'm following 3.18. There were some really bad corruption issues in
3.17/18/19 that burned me, and today, while considering moving up to
4.1, I'm still seeing a lot of threads about issues during balance/etc.
I still run into the odd issue with 3.18, but not nearly to the degree
that I used to.

Now, I would stick with a recent longterm. The older longterms go back
to a time when btrfs was far more experimental. Even 3.16 probably has
a lot of issues that are fixed in 3.18. That said, if you do run into
an issue on a longterm kernel, nobody around here is likely to be able
to help you much unless you can reproduce it on the most recent stable
kernel.

Just tossing that out as an alternative opinion. Right now I'm sticking
with 3.18, but I'm interested in making the 4.1 switch once issues with
that seem to have died down.

--
Rich
Re: FYIO: A rant about btrfs
On Wed, Sep 16, 2015 at 12:45 PM, Martin Tippmann wrote: > From reading the list I understand that btrfs is still very much work > in progress and performance is not a top priority at this stage but I > don't see why it shouldn't perform at least equally good as ZFS/F2FS > on the same workloads. Is looking at performance problems on the > development roadmap? My sense is that any shortcomings in comparison to ZFS just reflect a lack of maturity - there just hasn't been as much focus on performance. I'm not aware of any fundamental design issues which are likely to make btrfs perform worse than ZFS in the long-term. F2FS is a fundamentally different beast. It is a log-based filesystem as far as I'm aware, and on flash that gives it some substantial advantages, but it doesn't support snapshotting/etc. I'm sure that in the long term some operations are just going to be faster on F2FS no matter what, just due to its design, and other operations will always be slower on F2FS. To draw an analogy, imagine you have a 1TB ext4 filesystem and a 1TB btrfs filesystem. On each you create a 900GB file, and then proceed to make millions of internal writes all over it. The ext4 filesystem is just going to completely outperform btrfs at this job, and I suspect it would outperform zfs as well. For such a use case you don't really even need a filesystem - you might as well just be reading/writing random blocks right off the disk, and ext4 is pretty close to that in behavior when it comes to internal file modifications. The COW filesystems are going to be fragmenting the living daylights out of the file and its metadata. Of course, if you pulled the plug in the middle of one of those operations the COW filesystems are more likely to end up in a sane state if you care about the order of file modifications, and if you're doing this on RAID both zfs and btrfs will be immune to any write hole issues. 
Also, if you go making reflink copies of large files on a btrfs filesystem it will perform MUCH better than doing the equivalent on ext4 (which requires copying all the data, at a cost of both time and space). In the end you have to look at your application, and not just performance stats. There are tradeoffs. Personally, I've had enough hard drive failures that btrfs is worth it to me just for the assurance that when something goes wrong the filesystem knows what is good and what isn't. As drives get bigger this becomes more and more important. -- Rich
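The internal-rewrite analogy above can be put in code. This is a toy model, not real allocator behavior for any filesystem: in-place filesystems leave the file's layout alone when overwriting, while a naive COW filesystem (no autodefrag) relocates every rewritten block, so the extent count explodes:

```python
# Toy model of extent fragmentation under random internal rewrites:
# in-place overwrites (ext4-style) vs copy-on-write relocation
# (btrfs/ZFS-style with no defragmentation). Illustrative only.
import random

def count_extents(layout):
    """Count runs of physically contiguous blocks in a logical file."""
    extents = 1
    for prev, cur in zip(layout, layout[1:]):
        if cur != prev + 1:
            extents += 1
    return extents

def simulate(blocks, writes, cow):
    # layout[i] = physical block currently backing logical block i
    layout = list(range(blocks))      # file starts fully contiguous
    next_free = blocks
    rng = random.Random(42)
    for _ in range(writes):
        i = rng.randrange(blocks)
        if cow:
            layout[i] = next_free     # COW: rewritten block is relocated
            next_free += 1
        # in-place: physical location is unchanged
    return count_extents(layout)

inplace = simulate(10_000, 2_000, cow=False)
cowfs = simulate(10_000, 2_000, cow=True)
print(inplace, cowfs)  # in-place stays at 1 extent; COW fragments heavily
```

The model ignores metadata fragmentation entirely, which only makes the COW side look better than it would be in practice.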
Re: raid1 on uneven-sized disks
On Sun, Aug 9, 2015 at 8:47 AM, Hugo Mills h...@carfax.org.uk wrote: On Sun, Aug 09, 2015 at 02:29:53PM +0200, Jim MacBaine wrote: Hi, How does btrfs handle raid1 on a bunch of uneven sized disks? Can I just keep adding arbitrarily sized disks to an existing raid1 and expect the file system to continue to keep two copies of everything, so I could survive the loss of any single disk without data loss? Does btrfs work this way? Yes, exactly. You may find that http://carfax.org.uk/btrfs-usage/ is helpful. The key is that btrfs manages raid at the chunk level, not the device level. When btrfs needs more disk space it allocates a new chunk from unallocated space on a device. If it is in raid1 mode it will allocate a pair of chunks from two different drives, storing the same data in each. The allocation algorithm is reasonably smart so if you have 2x1TB drives and 1x3TB drive you'll end up with about 2TB of data stored and not 1TB on each of the two 1TB drives and an empty unusable 3TB drive. This is also why you can switch between raid modes on the fly - switching modes only affects newly-allocated chunks, and the old ones operate in whatever mode they were previously in. A balance operation rewrites the existing data to new chunks which would force everything to use the new mode. This also lets you do things like add a disk to a raid5. If you have 5 disks and add one more, existing chunks will be striped across 5 drives, and new chunks will be striped across 6, unless you balance them. That may be a bit oversimplified, and obviously others on the list know all the details... -- Rich
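The pairing behavior described here can be sketched as a greedy allocator. This is a simplified model (illustrative 1 GB chunk size, not the real btrfs allocator), but it reproduces the 2x1TB + 1x3TB result from the message above:

```python
# Simplified model of btrfs raid1 chunk allocation: each data chunk is
# mirrored on the two devices with the most unallocated space.
# Chunk size and device sizes are illustrative, in GB.

def raid1_capacity(free, chunk=1):
    """free: per-device unallocated space. Returns usable raid1 space."""
    free = list(free)
    usable = 0
    while True:
        free.sort(reverse=True)
        if len(free) < 2 or free[1] < chunk:
            break  # fewer than two devices can still host a mirror
        free[0] -= chunk       # one copy on the emptiest device...
        free[1] -= chunk       # ...and one on the next emptiest
        usable += chunk        # one chunk of data, stored twice
    return usable

# 2x 1 TB plus 1x 3 TB: the allocator keeps pairing the big drive with
# a small one, so about 2 TB is usable, not 1 TB.
print(raid1_capacity([1000, 1000, 3000]))  # 2000
```

Because the largest device (3 TB) is bigger than the other two combined (2 TB), the extra 1 TB on it can never find a mirror partner and goes unused, which matches the carfax calculator's result.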
Re: systemd : Timed out waiting for device dev-disk-by…
On Mon, Jul 27, 2015 at 1:20 AM, Duncan 1i5t5.dun...@cox.net wrote: Philip Seeger posted on Sun, 26 Jul 2015 22:39:04 +0200 as excerpted: Hi, 50% of the time when booting, the system go in safe mode because my 12x 4TB RAID10 btrfs is taking too long to mount from fstab. This won't help, but I've seen this exact behavior too (some time ago). Except that it wasn't 50% that it didn't work, more like almost everytime. Commenting out the fstab entry fixed it, mounting using a cronjob (@reboot) worked without a problem. (As far as I remember, options like x-systemd.device-timeout didn't change anything.) If someone has the answer, I'd be interested too. You mean something like a custom systemd *.service unit file? That's what I'd do here. =:^) I'd have to play with it to work out the kinks, but I'm pretty sure you'd be better off with a mount unit instead of basically reinventing a mount unit using a service unit. I'd also think that you could use drop-ins to enhance the auto-generated units created by the fstab generator, if you just wanted to add a dependency or such to a mount unit. However, I've never tried to create a drop-in for a generated unit. Mount units should take any setting in systemd.unit which includes all the ordering/dependency/etc controls. -- Rich
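For reference, a hypothetical mount unit along the lines discussed might look like the sketch below. The device UUID, mount point, and timeout are all made up; check systemd.mount(5) for the options your version supports:

```ini
# /etc/systemd/system/mnt-big.mount (hypothetical example)
# The unit file name must match the mount point: /mnt/big -> mnt-big.mount
[Unit]
Description=Large multi-device btrfs array

[Mount]
What=/dev/disk/by-uuid/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
Where=/mnt/big
Type=btrfs
Options=defaults,noatime
# Give a slow 12-device array more time before systemd gives up
TimeoutSec=5min

[Install]
WantedBy=local-fs.target
```

Enabling it (systemctl enable mnt-big.mount) after removing the fstab entry should get the ordering/dependency controls of a normal mount unit without reinventing them in a service unit.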
Re: Please add 9c4f61f01d269815bb7c37be3ede59c5587747c6 to stable
On Mon, Apr 13, 2015 at 12:58 PM, Greg KH gre...@linuxfoundation.org wrote: On Mon, Apr 13, 2015 at 07:28:38PM +0500, Roman Mamedov wrote: On Thu, 2 Apr 2015 10:17:47 -0400 Chris Mason c...@fb.com wrote: Hi stable friends, Can you please backport this one to 3.19.y. It fixes a bug introduced by: 381cf6587f8a8a8e981bc0c18859b51dc756, which was tagged for stable 3.14+ The symptoms of the bug are deadlocks during log replay after a crash. The patch wasn't intentionally fixing the deadlock, which is why we missed it when tagging fixes. Unfortunately still not fixed (no btrfs-related changes) in 3.14.38 and 3.18.11 released today. I have a few hundred stable backports left to sort through, don't worry, this is still in the queue, it's not lost. It looks like this still isn't in 3.18.12, though it looks like it is in 3.19.5. -- Rich
Re: Upgrade to 3.19.2 Kernel fails to boot
On Wed, Apr 1, 2015 at 2:50 AM, Anand Jain anand.j...@oracle.com wrote: Eric found something like this and has a fix within the email. Sub: I think btrfs: fix leak of path in btrfs_find_item broke stable trees ... I don't mind trying this patch if the maintainers recommend it. I'm still getting panics every few days and 3.18.10 won't mount my root filesystem, so I've been running on 3.18.8. -- Rich
Re: btrfs dedup - available or experimental? Or yet to be?
On Sun, Mar 29, 2015 at 7:43 AM, Kai Krakow hurikha...@gmail.com wrote: With the planned performance improvements, I'm guessing the best way will become mounting the root subvolume (subvolid 0) and letting duperemove work on that as a whole - including crossing all fs boundaries. Why cross filesystem boundaries by default? If you scan from the root subvolume you're guaranteed to traverse every file on the filesystem (which is all that can be deduped) without crossing any filesystem boundaries. Even if you have btrfs on non-btrfs on btrfs there must be some other path that reaches the same files when scanning from subvolid 0. -- Rich
Re: btrfs dedup - available or experimental? Or yet to be?
On Thu, Mar 26, 2015 at 8:07 PM, Martin m_bt...@ml1.co.uk wrote: Anyone with any comments on how well duperemove performs for TB-sized volumes? Took many hours but less than a day for a few TB - I'm not sure whether it is smart enough to take less time on subsequent scans like bedup. Does it work across subvolumes? (Presumably not...) As far as I can tell, yes. Unless you pass a command-line option it crosses filesystem boundaries and even scans non-btrfs filesystems (like /proc, /dev, etc). Obviously you'll want to avoid that since it only wastes time and I can just imagine it trying to hash kcore and such. Other than being less-than-ideal intelligence-wise, it seemed effective. I can live with that in an early release like this. -- Rich
Re: snapshot destruction making IO extremely slow
On Wed, Mar 25, 2015 at 6:55 AM, Marc Cousin cousinm...@gmail.com wrote: On 25/03/2015 02:19, David Sterba wrote: as it reads the pre/post snapshots and deletes them if the diff is empty. This adds some IO stress. I couldn't find a clear explanation in the documentation. Does it mean that when there is absolutely no difference between two snapshots, one of them is deleted? And that snapper does a diff between them to determine that? It seems like there should be some supported way of doing a diff on two btrfs subvolumes. There should be no need to recursively scan trees if the heads of those trees are shared. If I change one file at the bottom of a 10 layer directory hierarchy, it should only take a small number of reads to determine this. The problem is that we don't have any functionality in kernel space to do this (that I'm aware of), and we don't expose the necessary information to userspace for it to do this smartly (again, as far as I'm aware). Maybe there would be some way to do it using btrfs send and parsing the output. -- Rich
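The "shared tree heads" argument can be sketched directly: if two snapshots share their unchanged subtrees as identical nodes, a diff can prune those subtrees and only walk the rewritten path. A toy model, not btrfs's actual on-disk structures:

```python
# Sketch of COW-aware tree diffing: a snapshot shares unchanged
# subtrees with its origin as the *same* node objects, so a diff can
# prune them and run in time proportional to what changed, not to the
# total tree size. Node layout here is invented for illustration.

class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

def diff(a, b, path=""):
    """Yield paths that differ between two snapshot trees."""
    if a is b:                 # same object == shared COW subtree: prune
        return
    kids_a = {c.name: c for c in a.children}
    kids_b = {c.name: c for c in b.children}
    for name in kids_a.keys() | kids_b.keys():
        ca, cb = kids_a.get(name), kids_b.get(name)
        if ca is None or cb is None:
            yield f"{path}/{name}"              # added or removed
        else:
            yield from diff(ca, cb, f"{path}/{name}")

# A big subtree that both snapshots share, plus one changed path.
shared = Node("shared", [Node(f"f{i}") for i in range(1000)])
old_root = Node("root", [Node("a", [Node("b", [Node("x")])]), shared])
# COW "rename x -> y": only root, a, b are rewritten; `shared` is reused.
new_root = Node("root", [Node("a", [Node("b", [Node("y")])]), shared])

changed = sorted(diff(old_root, new_root))
print(changed)  # ['/a/b/x', '/a/b/y'] - the 1000-file subtree was never visited
```

In real btrfs the "same object" test would be a comparison of block pointers and generation numbers, which is essentially what the send code does internally.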
Re: Upgrade to 3.19.2 Kernel fails to boot
On Tue, Mar 24, 2015 at 2:31 AM, Anand Jain anand.j...@oracle.com wrote: Do you have this fix .. [PATCH] Btrfs: release path before starting transaction in can_nocow_extent could you try ?. I believe I already have this patch. 3.18.9 contains this: commit bdeeab62a611f1f7cd48fd285ce568e8dcd0455a Merge: 797afdf 1bda19e Author: Linus Torvalds torva...@linux-foundation.org Date: Fri Oct 18 16:46:21 2013 -0700 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fix from Chris Mason: Sage hit a deadlock with ceph on btrfs, and Josef tracked it down to a regression in our initial rc1 pull. When doing nocow writes we were sometimes starting a transaction with locks held * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: release path before starting transaction in can_nocow_extent
Re: btrfs dedup - available or experimental? Or yet to be?
On Mon, Mar 23, 2015 at 7:22 PM, Hugo Mills h...@carfax.org.uk wrote: On Mon, Mar 23, 2015 at 11:10:46PM +, Martin wrote: As titled: Does btrfs have dedup (on raid1 multiple disks) that can be enabled? The current state of play is on the wiki: https://btrfs.wiki.kernel.org/index.php/Deduplication I hadn't realized that bedup was deprecated. This seems unfortunate since it seemed to be a lot smarter about detecting what has and hasn't already been scanned, and it also supported defragmenting files while de-duplicating them. I'll give duperemove a shot. I just packaged it on Gentoo. -- Rich
Re: Upgrade to 3.19.2 Kernel fails to boot
On Mon, Mar 23, 2015 at 4:23 AM, Anand Jain anand.j...@oracle.com wrote: Do you still have the problem ? Can you pls confirm on the latest btrfs ? Since I am fixing the devices part of the btrfs, I am bit nervous. I'm having a similar problem. I'm getting some kind of btrfs corruption that causes a panic/reboot, and then the initramfs won't mount root for 3.18.9, but it will mount it for 3.18.8. Running on 3.18.8 eventually caused the panic to repeat, so I'm not sure that 3.18.9 is necessarily breaking things - it might just be fussier about not mounting a dirty fs. I did run a btrfs check --repair and it ended up moving some chromium preferences from the user profile folder to lost+found. That got the system to run for about 8 hours, but it still panicked the next morning. I'm now running on 3.18.7 to see what happens. Unfortunately I haven't been doing a good job about capturing logs. I'll try to capture more the next time this happens. I've been running fine on 3.18 for a while now, so I'm not sure where all of this is coming from. -- Rich
Re: Upgrade to 3.19.2 Kernel fails to boot
On Mon, Mar 23, 2015 at 9:22 AM, Rich Freeman r-bt...@thefreemanclan.net wrote: I'm having a similar problem. I'm getting some kind of btrfs corruption that causes a panic/reboot, and then the initramfs won't mount root for 3.18.9, but it will mount it for 3.18.8. Running on 3.18.8 eventually caused the panic to repeat, so I'm not sure that 3.18.9 is necessarily breaking things - it might just be fussier about not mounting a dirty fs. This continues to happen. The filesystem won't mount with 3.18.9, but will mount with 3.18.8. Here is the dmesg output from dracut on 3.18.9: [ 240.765147] INFO: task mount:395 blocked for more than 120 seconds. [ 240.765224] Not tainted 3.18.9-gentoo #1 [ 240.765274] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [ 240.765809] mount D 880427c51900 11800 395 1 0x0004 [ 240.765927] 88040d2f76a8 0082 8804106170f0 00011900 [ 240.766181] 88040d2f7fd8 00011900 88041593d6e0 8804106170f0 [ 240.766373] 88040d2f76b8 8800cb505c70 8800cb505cf0 8800cb505cd8 [ 240.766556] Call Trace: [ 240.766618] [81504084] schedule+0x24/0x60 [ 240.766719] [a032fe9d] btrfs_tree_lock+0x4d/0x1c0 [btrfs] [ 240.766780] [810882f0] ? prepare_to_wait_event+0x100/0x100 [ 240.766859] [a02d3859] btrfs_search_slot+0x6e9/0x9f0 [btrfs] [ 240.766939] [a02d5503] btrfs_insert_empty_items+0x73/0xd0 [btrfs] [ 240.767017] [a02ce495] ? btrfs_alloc_path+0x15/0x20 [btrfs] [ 240.767118] [a033012a] btrfs_insert_orphan_item+0x5a/0x80 [btrfs] [ 240.767211] [a03316c5] insert_orphan_item+0x65/0xa0 [btrfs] [ 240.767301] [a0336589] replay_one_buffer+0x349/0x360 [btrfs] [ 240.767391] [a0330ff5] walk_up_log_tree+0xc5/0x220 [btrfs] [ 240.767481] [a03311eb] walk_log_tree+0x9b/0x1a0 [btrfs] [ 240.767572] [a0338932] btrfs_recover_log_trees+0x262/0x4d0 [btrfs] [ 240.767662] [a0336240] ? 
replay_one_extent+0x780/0x780 [btrfs] [ 240.767749] [a02f4b9f] open_ctree+0x17ef/0x2100 [btrfs] [ 240.767827] [a02cb876] btrfs_mount+0x766/0x900 [btrfs] [ 240.767886] [81175bef] mount_fs+0x3f/0x1b0 [ 240.767940] [811331b0] ? __alloc_percpu+0x10/0x20 [ 240.767997] [8118fc53] vfs_kern_mount+0x63/0x100 [ 240.768087] [a02cb28b] btrfs_mount+0x17b/0x900 [btrfs] [ 240.768146] [81132e8a] ? pcpu_alloc+0x35a/0x660 [ 240.768201] [81175bef] mount_fs+0x3f/0x1b0 [ 240.768255] [811331b0] ? __alloc_percpu+0x10/0x20 [ 240.768311] [8118fc53] vfs_kern_mount+0x63/0x100 [ 240.768365] [8119289c] do_mount+0x20c/0xaf0 [ 240.768420] [81118eb9] ? __get_free_pages+0x9/0x40 [ 240.768474] [81192555] ? copy_mount_options+0x35/0x150 [ 240.768528] [81193497] SyS_mount+0x97/0xf0 [ 240.768582] [81507ad2] system_call_fastpath+0x12/0x17 [ 240.768638] INFO: task btrfs-transacti:435 blocked for more than 120 seconds. [ 240.768693] Not tainted 3.18.9-gentoo #1 [ 240.768742] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [ 240.768811] btrfs-transacti D 880427c11900 12424 435 2 0x [ 240.768928] 8800cfab7dc8 0046 880410f01a10 00011900 [ 240.769119] 8800cfab7fd8 00011900 81a16460 880410f01a10 [ 240.769302] 8800cfab7dd8 88040c7ab000 8800cb554000 8800cb5301a0 [ 240.769485] Call Trace: [ 240.769540] [81504084] schedule+0x24/0x60 [ 240.769625] [a02f73e5] btrfs_commit_transaction+0x275/0xa40 [btrfs] [ 240.769698] [810882f0] ? prepare_to_wait_event+0x100/0x100 [ 240.769784] [a02f305d] transaction_kthread+0x1ad/0x240 [btrfs] [ 240.769870] [a02f2eb0] ? btrfs_cleanup_transaction+0x530/0x530 [btrfs] [ 240.769942] [8106aa04] kthread+0xc4/0xe0 [ 240.769997] [8106a940] ? kthread_create_on_node+0x190/0x190 [ 240.770064] [81507a2c] ret_from_fork+0x7c/0xb0 [ 240.770119] [8106a940] ? kthread_create_on_node+0x190/0x190 [ 360.832426] INFO: task mount:395 blocked for more than 120 seconds. 
[ 360.832488] Not tainted 3.18.9-gentoo #1 [ 360.832539] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [ 360.832609] mount D 880427c51900 11800 395 1 0x0004 [ 360.832727] 88040d2f76a8 0082 8804106170f0 00011900 [ 360.832911] 88040d2f7fd8 00011900 88041593d6e0 8804106170f0 [ 360.833093] 88040d2f76b8 8800cb505c70 8800cb505cf0 8800cb505cd8 [ 360.833276] Call Trace: [ 360.833385] [81504084] schedule+0x24/0x60 [ 360.833495] [a032fe9d] btrfs_tree_lock+0x4d/0x1c0 [btrfs] [ 360.833555] [810882f0] ? prepare_to_wait_event+0x100/0x100 [ 360.833634
btrfs raid5 with mixed disks
How does btrfs raid5 handle mixed-size disks? The docs weren't terribly clear on this. Suppose I have 4x3TB and 1x1TB disks. Using conventional lvm+mdadm in raid5 mode I'd expect to be able to fit about 10TB of space on those (2TB striped across 4 disks plus 1TB striped across 5 disks after partitioning). How much would btrfs be able to store in the same configuration? I did see something about being able to use fixed-size stripes, and I'm not sure if this helps. If it does, are there any penalties, especially with future expansion of the array? With raid1 mode btrfs is reasonably smart about mixed disk sizes, and you usually end up with half of the total space available. -- Rich
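For what it's worth, a greedy model of chunk-level raid5 allocation (stripe each chunk group across every device that still has unallocated space, with one device's worth going to parity) reproduces the 10TB figure from the lvm+mdadm comparison. This is a sketch in the spirit of the carfax space calculator, not the actual btrfs allocator:

```python
# Greedy model of btrfs raid5 chunk allocation: each chunk group is
# striped across all devices with remaining unallocated space, and one
# stripe's worth goes to parity. A model, not the real allocator.

def raid5_capacity(free):
    """free: per-device unallocated space (any units). Usable raid5 space."""
    usable = 0
    while True:
        live = [f for f in free if f > 0]
        if len(live) < 3:
            break  # fewer than 3 devices left: no useful raid5 stripe
        width = len(live)
        step = min(live)               # allocate until the smallest runs dry
        usable += step * (width - 1)   # one device's worth is parity
        free = [f - step for f in live]
    return usable

# 4x 3 TB + 1x 1 TB (in GB): first 1 TB is striped over all 5 disks
# (4 TB usable), then 2 TB over the remaining 4 disks (6 TB usable),
# for 10 TB total - the same as the partitioned lvm+mdadm layout.
print(raid5_capacity([3000, 3000, 3000, 3000, 1000]))  # 10000
```

So in this model btrfs loses nothing to the mixed sizes here; whether the real allocator achieves the same depends on how it picks stripe widths as devices fill.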
Re: scrub implies failing drive - smartctl blissfully unaware
On Tue, Nov 25, 2014 at 6:13 PM, Chris Murphy li...@colorremedies.com wrote: A few years ago companies including Western Digital started shipping large cheap drives, think of the green drives. These had very high TLER (Time Limited Error Recovery) settings, a.k.a. SCT ERC. Later they completely took out the ability to configure this error recovery timing so you only get the upward of 2 minutes to actually get a read error reported by the drive. Why sell an $80 hard drive when you can change a few bytes in the firmware and sell a crippled $80 drive and an otherwise-identical non-crippled $130 drive? -- Rich
Re: filesystem corruption
On Thu, Oct 30, 2014 at 9:02 PM, Tobias Holst to...@tobby.eu wrote: Addition: I found some posts here about a general file system corruption in 3.17 and 3.17.1 - is this the cause? Additionally I am using ro-snapshots - maybe this is the cause, too? Anyway: Can I fix that or do I have to reinstall? Haven't touched the filesystem, just did a scrub (found 0 errors). Yup - ro-snapshots is a big problem in 3.17. You can probably recover now by: 1. Update your kernel to 3.17.2 - that takes care of all the big known 3.16/17 issues in general. 2. Run btrfs check using btrfs-tools 3.17. That can clean up the broken snapshots in your filesystem. That is fairly likely to get your filesystem working normally again. It worked for me. I was getting some balance issues when trying to add another device and I'm not sure if 3.17.2 totally fixed that - I ended up cancelling the balance and it will be a while before I have to balance this particular filesystem again, so I'll just hold off and hope things stabilize. -- Rich
Re: BTRFS balance segfault, where to go from here
On Tue, Oct 28, 2014 at 9:12 AM, E V eliven...@gmail.com wrote: I've seen dead locks on 3.16.3. Personally, I'm staying with 3.14 until something newer stabilizes, haven't had any issues with it. You might want to try the latest 3.14, though I think there should be a new one pretty soon with quite a few btrfs patches. Yeah, I forget what drove me to switch to a newer kernel, but I'm wishing I had stuck with 3.14. The last set of stable kernels has been a pretty rough ride. :) My sense browsing the list is that the activity level has picked up a bit, and that might be why 3.15-17 have been a bit more bug-ridden than is normal. For the long-term it is actually a good sign for the vitality of btrfs. But, I'll probably track 3.17 until a new longterm is announced and be a bit more conservative. -- Rich
Re: BTRFS balance segfault, where to go from here
On Tue, Oct 28, 2014 at 9:33 AM, Duncan 1i5t5.dun...@cox.net wrote: Since it's not an option here I've not looked into it too closely personally, and don't know if it'll fit your needs, but if it does, it may well be simpler to substitute it into the existing backup setup without rewriting the WHOLE thing, than to do that full rewrite from scratch, without the btrfs/zfs features. I'd at least look into it, assuming you haven't already. I haven't researched zfs as thoroughly as btrfs and I'm not running it, but you're certainly right that it is more mature (though I would not say that zfs on linux is as mature as zfs on BSD or especially Solaris). Keep in mind that ZFS is marketed more towards enterprise workloads. It isn't quite as dynamic as btrfs is intended to be, though in truth many of those btrfs features like reshaping a raid5 aren't implemented yet. My sense is that you're going to need to plan ahead a bit more with ZFS and making changes without doing a full backup/re-create is going to be harder. It also isn't designed for SSD (though it does have features for SSD caching of the write log and I think also read-caching, which is something that does not yet exist for btrfs). From what I understand of both I'd say that btrfs actually has the better overall design, but zfs just has a LOT more maturity. I think that btrfs will eventually overtake it, but just when that will happen is anybody's guess, and it certainly isn't there today. The one thing that zfs does have going for you is that you're very unlikely to get BUGs and PANICs anytime you do something as simple as running rsync on it. I will also note that I rsync data off of my btrfs filesystem all the time without issue. I do not have experience with using rsync to write TO a btrfs filesystem. 
Right now I don't trust btrfs send enough to rely on it - the whole purpose of using rsync right now is to backup my btrfs data to an ext4 partition which lets me sleep well at night while still getting to play around with btrfs and make use of features like snapshots/etc. :) If I was running a large (ie measured in 10s of disks) storage system I'd probably go with ZFS now. In such a setup being limited to RAID6s of maybe 7 drives each and having to add/remove drives 7 at a time wouldn't be a big deal. When you're running a system with 6 disks total that is a much bigger limitation. If you look at something like Backblaze's storage pods that is the perfect example of the kind of situation ZFS was designed to handle. On the other hand, btrfs aims to eventually address that while being a decent default filesystem for your smartphone. -- Rich
Re: btrfs balance segfault, kernel BUG at fs/btrfs/extent-tree.c:7727
On Mon, Oct 13, 2014 at 11:12 AM, Rich Freeman r-bt...@thefreemanclan.net wrote: On Thu, Oct 9, 2014 at 10:19 AM, Petr Janecek jane...@ucw.cz wrote: I have trouble finishing btrfs balance on five disk raid10 fs. I added a disk to 4x3TB raid10 fs and run btrfs balance start /mnt/b3, which segfaulted after few hours, probably because of the BUG below. btrfs check does not find any errors, both before the balance and after reboot (the fs becomes un-umountable). [22744.238559] WARNING: CPU: 0 PID: 4211 at fs/btrfs/extent-tree.c:876 btrfs_lookup_extent_info+0x292/0x30a [btrfs]() [22744.532378] kernel BUG at fs/btrfs/extent-tree.c:7727! I am running into something similar. I just added a 3TB drive to my raid1 btrfs and started a balance. The balance segfaulted, and I find this in dmesg: I got another one of these crashes during a balance today, and this is on 3.17.1 with the Btrfs: race free update of commit root for ro snapshots patch. So, there is something else in 3.17.1 that causes this problem. I did see mention of an extent error of some kind on the lists and I don't have that patch - I believe it is planned for 3.17.2. After the crash the filesystem became read-only. I didn't have any way to easily capture the logs, but I got repeated crashes when trying to re-mount the filesystem after rebooting. The dmesg log showed read errors from one of the devices (bdev /dev/sdb2 errs: wr 0, rd 1361, flush 0, corrupt 0, gen 0). When I tried to btrfs check the filesystem with btrfs-progs 3.17 it abruptly terminated and output an error mentioning could not find extent items followed by root and a really large number. I finally managed to recover by mounting the device with skip_balance - I suspect that it was crashing due to attempts to restart the failing balance. Then after letting the filesystem settle down I unmounted it cleanly and rebooted and everything was back to normal. 
However, I'm still getting bdev /dev/sdb2 errs: wr 0, rd 1361, flush 0, corrupt 0, gen 0 in my dmesg logs. I have tried scrubbing the device with no errors found. -- Rich
Re: device balance times
On Thu, Oct 23, 2014 at 10:35 PM, Zygo Blaxell ce3g8...@umail.furryterror.org wrote: - single profile: we can tolerate zero missing disks, so we don't allow rw mounts even if degraded. That seems like the wrong logic here. By all means mount read-only by default for safety, but there should be a way to force a read-write mount on any filesystem, precisely because the RAID modes can be mixed and even if you lose two devices on a RAID1 system not ALL the data is lost if you have more than two drives. By all means return an error when reading a file that is completely missing. By all means have an extra fsck mode that goes ahead and deletes all the missing files (assuming it has metadata) or perhaps moves them all to a new lost+notfound subvolume or something. Indeed, if the lost device just happens to not actually contain any data you might be lucky and not lose any data at all when losing a single device in a filesystem that entirely uses the single profile. That would be a bit of an edge case though, but one that is automatically handled if you give the admin the ability to force read-write/etc. -- Rich
Re: device balance times
On Fri, Oct 24, 2014 at 12:07 PM, Zygo Blaxell ce3g8...@umail.furryterror.org wrote: We could also leave this as an option to the user mount -o degraded-and-I-want-to-lose-my-data, but in my opinion the use case is very, very exceptional. Well, it is only exceptional if you never shut down during a conversion to raid1 as far as I understand it. :) IMHO the use case is common any time restoring the entire filesystem from backups is inconvenient. That covers a *lot* of users. I never have a machine with more than 50% of its raw disk space devoted to btrfs because I need raw space on the disk to do mkfs+rsync from the broken read-only btrfs filesystems. The problem is that if you want btrfs raid1 and you ALSO want to have an extra set of spares for copying your entire RAID1 to something else, you're talking about a lot of extra disk space. I really don't want to maintain a SAN just in case I have a btrfs problem. :) I realize things are still somewhat experimental now, but we need to at least think about how things will work long-term. Copying all your data to another filesystem and re-creating the btrfs filesystem isn't really a good recovery mode. Restoring from backups is also becoming increasingly difficult. IO bandwidth just has not kept pace with disk capacity. It can take the better part of a day to copy a multi-TB array, and if you need to copy it two ways you have to double the time, not to mention having multiple TB of disks lying around. -- Rich
Re: Poll: time to switch skinny-metadata on by default?
On Tue, Oct 21, 2014 at 5:29 AM, Duncan 1i5t5.dun...@cox.net wrote: David Sterba posted on Mon, 20 Oct 2014 18:34:03 +0200 as excerpted: On Thu, Oct 16, 2014 at 01:33:37PM +0200, David Sterba wrote: I'd like to make it default with the 3.17 release of btrfs-progs. Please let me know if you have objections. For the record, 3.17 will not change the defaults. The timing of the poll was very bad to get enough feedback before the release. Let's keep it open for now. FWIW my own results agree with yours, I've had no problem with skinny-metadata here, and it has been my default now for a couple backup-and-new-mkfs.btrfs generations, now. How does one enable it for an existing filesystem? Is it safe to just run btrfstune -x? Can this be done on a mounted filesystem? Are there any risks with converting? -- Rich
Re: unexplainable corruptions 3.17.0
On Mon, Oct 20, 2014 at 10:04 AM, Zygo Blaxell zblax...@furryterror.org wrote: On Fri, Oct 17, 2014 at 08:17:37AM +, Hugo Mills wrote: On Fri, Oct 17, 2014 at 10:10:09AM +0200, Tomasz Torcz wrote: On Fri, Oct 17, 2014 at 04:02:03PM +0800, Liu Bo wrote: Recently I've observed some corruptions to systemd's journal files which are somewhat puzzling. This is especially worrying as this is a btrfs raid1 setup and I expected auto-healing. System details: 3.17.0-301.fc21.x86_64 btrfs: raid1 over 2x dm-crypted 6TB HDDs. mount opts: rw,relatime,seclabel,compress=lzo,space_cache Reads with cat or hexdump fail with: read(4, 0x1001000, 65536) = -1 EIO (Input/output error) Does scrub work for you? As there seems to be no way to scrub individual files, I've started a scrub of the full volume. It will take some hours to finish. Meanwhile, could you satisfy my curiosity: what would scrub do that wouldn't be done by just reading the whole file? It checks both copies. Reading the file will only read one of the copies of any given block (so if that's good and the other copy is bad, it won't fix anything). Really? One of my earliest btrfs tests was to run a loop of 'sha1sum -c' on a gigabyte or two of files in one window while I used dd to write random data in random locations directly to one of the filesystem mirror partitions in the other. I did this test *specifically* to watch the automatic checksumming and self-healing features of btrfs in action. A complete 'sha1sum' verification of the filesystem contents passed even though the kernel log was showing checksum errors scrolling by faster than I could read, which strongly implies that read() normally does check both mirrors before returning EIO.

I think you misread the earlier post. It sounds like the algorithm is:

1. Receive a request to read a block from a file.
2. Determine which mirrored copy to read it from (it sounds like this is sub-optimal today; presumably you'd want to use the least busy disk, or the disk with the head closest to the right cylinder).
3. Read the block and verify the checksum. If it matches, return the data.
4. If not, find another mirrored copy to read it from, if one exists, and verify its checksum. If it matches, return the data and update all other mirrored copies with it.
5. Repeat step 4 until you run out of mirrored copies. If so, return an error.

So, doing random reads will NOT be equivalent to scrubbing the disks, because with a scrub you want to check that ALL copies are good, and the algorithm above only determines that at least one copy is good. When you used dd to overwrite blocks, you didn't get errors because when the first copy failed the filesystem just read the second copy as intended. That isn't a scrub - it is a recovery.

An actual scrub isn't file-focused but device-focused. It starts reading at the start of the device and verifies each logical unit of data sequentially. This can be done asynchronously since btrfs stores checksums, as opposed to a traditional RAID where the reads need to be synchronous, since the validity of a mirror/stripe can only be ascertained by comparing it to all the other devices in that mirror/stripe (and even then, unless you're using something like RAID6+, you couldn't determine which copy is bad without a checksum). In theory I'd expect a scrub with btrfs to be less detrimental to performance as a result - a read request could halt the scrub on one device without delaying the scrub on the other devices. Writes in RAID1 mode necessarily disrupt two devices, but others would not be impacted.

-- Rich
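[Editor's note] To make the distinction concrete, here is a toy model (plain shell with md5sum standing in for the btrfs checksum; file names and function names are invented for illustration, this is not btrfs code). A "read" returns as soon as any mirrored copy matches the stored checksum, repairing bad siblings as a side effect; only a "scrub" reports the health of every copy:

```shell
# Toy model of the read-vs-scrub distinction described above.
set -e
mkdir -p /tmp/mirror-demo && cd /tmp/mirror-demo
printf 'hello world' > copy0           # mirror 0 (good)
printf 'garbage data' > copy1          # mirror 1 (silently corrupted)
stored=$(md5sum < copy0 | cut -d' ' -f1)

read_block() {                         # emulates steps 1-5 above
    for c in copy0 copy1; do
        if [ "$(md5sum < "$c" | cut -d' ' -f1)" = "$stored" ]; then
            for o in copy0 copy1; do   # self-heal: rewrite any bad sibling
                [ "$(md5sum < "$o" | cut -d' ' -f1)" = "$stored" ] || cp "$c" "$o"
            done
            cat "$c"; return 0
        fi
    done
    return 1                           # every copy failed: this is the EIO case
}

read_block                             # succeeds even though copy1 is bad, and repairs it
cmp -s copy0 copy1 && echo "both copies now verify"
```

Note that the loop stops at the first good copy, which is exactly why a clean pass of 'sha1sum -c' over every file proves nothing about the second mirror unless a failure forced the fallback path.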
Re: unexplainable corruptions 3.17.0
On Fri, Oct 17, 2014 at 8:53 AM, Chris Mason c...@fb.com wrote: This sounds like the problem fixed with some patches to our extent mapping code that went in with the merge window. I've cherry-picked a few for stable and I'm running them through tests now. They are in my stable-3.17 branch, and I'll send to Greg once Linus grabs the revert for the last one.

Just for clarity - when can we expect to see these in the kernel? I wasn't sure which merge window you're referring to. I take it that 3.17.1 is still unpatched (for this and the readonly snapshot issue, which requires reverting 9c3b306e1c9e6be4be09e99a8fe2227d1005effc).
Re: Random file system corruption in 3.17 (not BTRFS related...?)
On Wed, Oct 15, 2014 at 10:30 AM, Josef Bacik jba...@fb.com wrote: We've found it, the Fedora guys are reverting the bad patch now, we'll get the fix sent back to stable shortly. Sorry about that.

After reverting this commit, can the bad snapshots be deleted/repaired/etc without wiping and restoring the entire filesystem? Copying 2.3TB of data isn't a particularly fast operation...

-- Rich
Re: what is the best way to monitor raid1 drive failures?
On Tue, Oct 14, 2014 at 10:48 AM, Suman C schakr...@gmail.com wrote: The new drive shows up as sdb. btrfs fi show still prints drive missing. I mounted the filesystem with ro,degraded and tried adding the new sdb drive, which results in the following error (-f because the new drive has a fs from the past):
# btrfs device add -f /dev/sdb /mnt2/raid1pool
/dev/sdb is mounted
Unless I am missing something, this looks like a bug.

You need to first run btrfs device delete missing /mnt2/raid1pool, I believe (missing is a keyword for a missing device in the array - if the device were still present you could specify it by /dev/sdX).

-- Rich
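[Editor's note] For anyone hitting the same situation, the overall replacement sequence being discussed looks roughly like the following. This is a hedged sketch, not a verified recipe: device names and the mount point are illustrative, and a filesystem mounted read-only cannot accept device changes, so the degraded mount needs to be read-write:

```shell
# Hedged sketch: replacing a failed member of a btrfs raid1.
mount -o degraded /dev/sdc /mnt2/raid1pool    # rw,degraded - not ro,degraded
wipefs -a /dev/sdb                            # clear the stale fs signature instead of relying on add -f
btrfs device add /dev/sdb /mnt2/raid1pool     # bring the replacement into the array
btrfs device delete missing /mnt2/raid1pool   # drop the ghost member; data re-mirrors
```

Whether "delete missing" has to happen before or after the add has reportedly varied, so if one ordering fails it may be worth trying the other.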
Re: What is the vision for btrfs fs repair?
On Sun, Oct 12, 2014 at 6:14 AM, Martin Steigerwald mar...@lichtvoll.de wrote: On Friday, 10 October 2014, at 10:37:44, Chris Murphy wrote: On Oct 10, 2014, at 6:53 AM, Bob Marley bobmar...@shiftmail.org wrote: On 10/10/2014 03:58, Chris Murphy wrote: * mount -o recovery Enable autorecovery attempts if a bad tree root is found at mount time. I'm confused why it's not the default yet. Maybe it's continuing to evolve at a pace that suggests something could sneak in that makes things worse? It is almost an oxymoron in that I'm manually enabling an autorecovery. If true, maybe the closest indication we'd get of btrfs stability is the default enabling of autorecovery.

No way! I wouldn't want a default like that. If you think about distributed transactions: suppose a sync was issued on both sides of a distributed transaction, then power was lost on one side, then btrfs had corruption. When I remount it, definitely the worst thing that can happen is that it auto-rolls-back to a previous known-good state.

For a general purpose file system, losing 30 seconds (or less) of questionably committed, likely corrupt data beats a file system that won't mount without user intervention, which requires a secret decoder ring to get it to mount at all, and may require the use of specialized tools to retrieve that data in any case. The fail-safe behavior is to treat the known good tree root as the default tree root, and bypass the bad tree root if it cannot be repaired, so that the volume can be mounted with default mount options (i.e. the ones in fstab). Otherwise it's a filesystem that isn't well suited for general purpose use as rootfs, let alone for boot.

To understand this a bit better: what can be the reasons a recent tree gets corrupted? I always thought that with a controller, device, and driver combination that honors fsync, with BTRFS it would either be the new state or the last known good state *anyway*. So where does the need to roll back arise from?
In theory the recovery option should never be necessary. Btrfs makes all the guarantees everybody wants it to: when the data is fsynced, it will never be lost. The question is what should happen when a corrupted tree root, which should never happen, happens anyway. The options are to refuse to mount the filesystem by default, or to mount it by default, discarding about 30-60s worth of writes. And yes, when this situation happens (whether it mounts by default or not) btrfs has broken its promise of data being durable after a successful fsync return.

As has been pointed out, braindead drive firmware is the most likely cause of this sort of issue. However, there are a number of other hardware and software errors that could cause it, including errors in Linux outside of btrfs, and of course bugs in btrfs as well. In an ideal world no filesystem would need any kind of recovery/repair tools; needing them often means that the fsync promise was broken. The real question is, once that has happened, how do you move on?

I think the best default is to auto-recover, but to have better facilities for reporting errors to the user. Right now btrfs is very quiet about failures: maybe a cryptic message in dmesg, and nobody reads all of that unless they're looking for something. If btrfs could report significant issues, that might mitigate the impact of an auto-recovery.

Also, another thing to consider during recovery is whether the damaged data could optionally be stored in a snapshot of some kind, maybe in the way that ext3/4 rollback data after conversion gets stored in a snapshot. My knowledge of the underlying structures is weak, but I'd think that a corrupted tree root practically is a snapshot already, and turning it into one might even be easier than cleaning it up. Of course, we would need to ensure the snapshot could be deleted without further error. Doing anything with the snapshot might require special tools, but if people want to do disk scraping they could.
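[Editor's note] For reference, the manual recovery path being debated above boils down to something like this (a hedged sketch; the device path is illustrative, and on much later kernels the option was renamed usebackuproot):

```shell
# Hedged sketch: manually asking btrfs to fall back to an older tree root.
mount -o recovery /dev/sdX /mnt
# If that fails too, candidate roots can be located for offline inspection:
btrfs-find-root /dev/sdX
mount -o ro,recovery /dev/sdX /mnt   # read-only attempt as a last resort
```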
-- Rich
Re: btrfs balance segfault, kernel BUG at fs/btrfs/extent-tree.c:7727
On Thu, Oct 9, 2014 at 10:19 AM, Petr Janecek jane...@ucw.cz wrote: I have trouble finishing btrfs balance on a five disk raid10 fs. I added a disk to a 4x3TB raid10 fs and ran btrfs balance start /mnt/b3, which segfaulted after a few hours, probably because of the BUG below. btrfs check does not find any errors, both before the balance and after reboot (the fs becomes unmountable).
[22744.238559] WARNING: CPU: 0 PID: 4211 at fs/btrfs/extent-tree.c:876 btrfs_lookup_extent_info+0x292/0x30a [btrfs]()
[22744.532378] kernel BUG at fs/btrfs/extent-tree.c:7727!

I am running into something similar. I just added a 3TB drive to my raid1 btrfs and started a balance. The balance segfaulted, and I find this in dmesg:
[453046.291762] BTRFS info (device sde2): relocating block group 10367073779712 flags 17
[453062.494151] BTRFS info (device sde2): found 13 extents
[453069.283368] [ cut here ]
[453069.283468] kernel BUG at /data/src/linux-3.17.0-gentoo/fs/btrfs/relocation.c:931!
[453069.283590] invalid opcode: [#1] SMP
[453069.283666] Modules linked in: vhost_net vhost macvtap macvlan tun ipt_MASQUERADE xt_conntrack veth nfsd auth_rpcgss oid_registry lockd iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables it87 hwmon_vid hid_logitech_dj nxt200x cx88_dvb videobuf_dvb dvb_core cx88_vp3054_i2c tuner_simple tuner_types tuner mousedev hid_generic usbhid cx88_alsa radeon cx8800 cx8802 cx88xx snd_hda_codec_realtek btcx_risc snd_hda_codec_generic videobuf_dma_sg videobuf_core kvm_amd tveeprom kvm rc_core v4l2_common cfbfillrect fbcon videodev cfbimgblt snd_hda_intel bitblit snd_hda_controller cfbcopyarea softcursor font tileblit i2c_algo_bit k10temp snd_hda_codec backlight drm_kms_helper snd_hwdep i2c_piix4 ttm snd_pcm snd_timer drm snd soundcore 8250 evdev [453069.285043] serial_core ext4 crc16 jbd2 mbcache zram lz4_compress zsmalloc ata_generic pata_acpi btrfs xor zlib_deflate atkbd raid6_pq ohci_pci firewire_ohci
firewire_core crc_itu_t pata_atiixp ehci_pci ohci_hcd ehci_hcd usbcore usb_common r8169 mii sunrpc dm_mirror dm_region_hash dm_log dm_mod [453069.285552] CPU: 1 PID: 17270 Comm: btrfs Not tainted 3.17.0-gentoo #1 [453069.285657] Hardware name: Gigabyte Technology Co., Ltd. GA-880GM-UD2H/GA-880GM-UD2H, BIOS F8 10/11/2010 [453069.285806] task: 88040ec556e0 ti: 88010cf94000 task.ti: 88010cf94000 [453069.285925] RIP: 0010:[a02ddd62] [a02ddd62] build_backref_tree+0x1152/0x11b0 [btrfs] [453069.286137] RSP: 0018:88010cf97848 EFLAGS: 00010206 [453069.286223] RAX: 8800ae67c800 RBX: 880122e94000 RCX: 880122e949c0 [453069.286336] RDX: 09270788d000 RSI: 880054c3fbc0 RDI: 8800ae67c800 [453069.286449] RBP: 88010cf97958 R08: 000159a0 R09: 880122e94000 [453069.286561] R10: 0003 R11: R12: 8802da313000 [453069.286674] R13: 8802da313c60 R14: 880122e94780 R15: 88040c277000 [453069.286787] FS: 7f175ac51880() GS:880427c4() knlGS:f7333b40 [453069.286913] CS: 0010 DS: ES: CR0: 8005003b [453069.287005] CR2: 7f208de58000 CR3: 0003b0a9c000 CR4: 07e0 [453069.287116] Stack: [453069.287151] 88010cf97868 880122e94000 01ff880122e94300 880342156060 [453069.287282] 880122e94780 8802da313c60 880122e94600 8800ae67c800 [453069.287412] 880122e947c0 8802da313000 88040c277120 88010005 [453069.287542] Call Trace: [453069.287640] [a02ddfa3] relocate_tree_blocks+0x1e3/0x630 [btrfs] [453069.287796] [a02e0550] relocate_block_group+0x3d0/0x650 [btrfs] [453069.287951] [a02e0958] btrfs_relocate_block_group+0x188/0x2a0 [btrfs] [453069.288113] [a02b48f0] btrfs_relocate_chunk.isra.61+0x70/0x780 [btrfs] [453069.288276] [a02c7fd0] ? btrfs_set_lock_blocking_rw+0x70/0xc0 [btrfs] [453069.288438] [a02b0e79] ? free_extent_buffer+0x59/0xb0 [btrfs] [453069.288590] [a02b8e99] btrfs_balance+0x829/0xf40 [btrfs] [453069.288738] [a02bf80f] btrfs_ioctl_balance+0x1af/0x510 [btrfs] [453069.288890] [a02c59e4] btrfs_ioctl+0xa54/0x2950 [btrfs] [453069.288995] [8111d016] ? 
lru_cache_add_active_or_unevictable+0x26/0x90 [453069.289119] [8113a061] ? handle_mm_fault+0xbe1/0xdb0 [453069.289219] [811ffdce] ? cred_has_capability+0x5e/0x100 [453069.289323] [8104065c] ? __do_page_fault+0x1fc/0x4f0 [453069.289422] [8117d80e] do_vfs_ioctl+0x7e/0x4f0 [453069.289513] [811ff64f] ? file_has_perm+0x8f/0xa0 [453069.289606] [8117dd09] SyS_ioctl+0x89/0xa0 [453069.289692] [81040a1c] ? do_page_fault+0xc/0x10 [453069.289785] [814f5752] system_call_fastpath+0x16/0x1b [453069.289881] Code: ff ff 48 8b 9d 20 ff ff ff e9 11 ff ff ff 0f 0b be ec 03 00 00 48 c7 c7 d0 f0 30 a0 e8 28 00 d7 e0 e9 06 f3 ff ff e8 c4 42
Re: 3.17.0-rc7: kernel BUG at fs/btrfs/relocation.c:931!
On Thu, Oct 2, 2014 at 3:27 AM, Tomasz Chmielewski t...@virtall.com wrote: Got this when running balance with 3.17.0-rc7: [173475.410717] kernel BUG at fs/btrfs/relocation.c:931! I just started a post on another thread with this exact same issue on 3.17.0. I started a balance after adding a new drive. [453046.291762] BTRFS info (device sde2): relocating block group 10367073779712 flags 17 [453062.494151] BTRFS info (device sde2): found 13 extents [453069.283368] [ cut here ] [453069.283468] kernel BUG at /data/src/linux-3.17.0-gentoo/fs/btrfs/relocation.c:931! [453069.283590] invalid opcode: [#1] SMP [453069.283666] Modules linked in: vhost_net vhost macvtap macvlan tun ipt_MASQUERADE xt_conntrack veth nfsd auth_rpcgss oid_registry lockd iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables it87 hwmon_vid hid_logitech_dj nxt200x cx88_dvb videobuf_dvb dvb_core cx88_vp3054_i2c tuner_simple tuner_types tuner mousedev hid_generic usbhid cx88_alsa radeon cx8800 cx8802 cx88xx snd_hda_codec_realtek btcx_risc snd_hda_codec_generic videobuf_dma_sg videobuf_core kvm_amd tveeprom kvm rc_core v4l2_common cfbfillrect fbcon videodev cfbimgblt snd_hda_intel bitblit snd_hda_controller cfbcopyarea softcursor font tileblit i2c_algo_bit k10temp snd_hda_codec backlight drm_kms_helper snd_hwdep i2c_piix4 ttm snd_pcm snd_timer drm snd soundcore 8250 evdev [453069.285043] serial_core ext4 crc16 jbd2 mbcache zram lz4_compress zsmalloc ata_generic pata_acpi btrfs xor zlib_deflate atkbd raid6_pq ohci_pci firewire_ohci firewire_core crc_itu_t pata_atiixp ehci_pci ohci_hcd ehci_hcd usbcore usb_common r8169 mii sunrpc dm_mirror dm_region_hash dm_log dm_mod [453069.285552] CPU: 1 PID: 17270 Comm: btrfs Not tainted 3.17.0-gentoo #1 [453069.285657] Hardware name: Gigabyte Technology Co., Ltd. 
GA-880GM-UD2H/GA-880GM-UD2H, BIOS F8 10/11/2010 [453069.285806] task: 88040ec556e0 ti: 88010cf94000 task.ti: 88010cf94000 [453069.285925] RIP: 0010:[a02ddd62] [a02ddd62] build_backref_tree+0x1152/0x11b0 [btrfs] [453069.286137] RSP: 0018:88010cf97848 EFLAGS: 00010206 [453069.286223] RAX: 8800ae67c800 RBX: 880122e94000 RCX: 880122e949c0 [453069.286336] RDX: 09270788d000 RSI: 880054c3fbc0 RDI: 8800ae67c800 [453069.286449] RBP: 88010cf97958 R08: 000159a0 R09: 880122e94000 [453069.286561] R10: 0003 R11: R12: 8802da313000 [453069.286674] R13: 8802da313c60 R14: 880122e94780 R15: 88040c277000 [453069.286787] FS: 7f175ac51880() GS:880427c4() knlGS:f7333b40 [453069.286913] CS: 0010 DS: ES: CR0: 8005003b [453069.287005] CR2: 7f208de58000 CR3: 0003b0a9c000 CR4: 07e0 [453069.287116] Stack: [453069.287151] 88010cf97868 880122e94000 01ff880122e94300 880342156060 [453069.287282] 880122e94780 8802da313c60 880122e94600 8800ae67c800 [453069.287412] 880122e947c0 8802da313000 88040c277120 88010005 [453069.287542] Call Trace: [453069.287640] [a02ddfa3] relocate_tree_blocks+0x1e3/0x630 [btrfs] [453069.287796] [a02e0550] relocate_block_group+0x3d0/0x650 [btrfs] [453069.287951] [a02e0958] btrfs_relocate_block_group+0x188/0x2a0 [btrfs] [453069.288113] [a02b48f0] btrfs_relocate_chunk.isra.61+0x70/0x780 [btrfs] [453069.288276] [a02c7fd0] ? btrfs_set_lock_blocking_rw+0x70/0xc0 [btrfs] [453069.288438] [a02b0e79] ? free_extent_buffer+0x59/0xb0 [btrfs] [453069.288590] [a02b8e99] btrfs_balance+0x829/0xf40 [btrfs] [453069.288738] [a02bf80f] btrfs_ioctl_balance+0x1af/0x510 [btrfs] [453069.288890] [a02c59e4] btrfs_ioctl+0xa54/0x2950 [btrfs] [453069.288995] [8111d016] ? lru_cache_add_active_or_unevictable+0x26/0x90 [453069.289119] [8113a061] ? handle_mm_fault+0xbe1/0xdb0 [453069.289219] [811ffdce] ? cred_has_capability+0x5e/0x100 [453069.289323] [8104065c] ? __do_page_fault+0x1fc/0x4f0 [453069.289422] [8117d80e] do_vfs_ioctl+0x7e/0x4f0 [453069.289513] [811ff64f] ? 
file_has_perm+0x8f/0xa0 [453069.289606] [8117dd09] SyS_ioctl+0x89/0xa0 [453069.289692] [81040a1c] ? do_page_fault+0xc/0x10 [453069.289785] [814f5752] system_call_fastpath+0x16/0x1b [453069.289881] Code: ff ff 48 8b 9d 20 ff ff ff e9 11 ff ff ff 0f 0b be ec 03 00 00 48 c7 c7 d0 f0 30 a0 e8 28 00 d7 e0 e9 06 f3 ff ff e8 c4 42 02 00 0f 0b 3c b0 0f 84 72 f1 ff ff be 22 03 00 00 48 c7 c7 d0 f0 30 [453069.290429] RIP [a02ddd62] build_backref_tree+0x1152/0x11b0 [btrfs] [453069.290591] RSP 88010cf97848 [453069.316194] ---[ end trace 5fdc0af4cc62bf41 ]---
Re: btrfs send and kernel 3.17
On Sun, Oct 12, 2014 at 7:11 AM, David Arendt ad...@prnet.org wrote: This weekend I finally had time to try btrfs send again on the newly created fs. Now I am running into another problem: btrfs send returns: ERROR: send ioctl failed with -12: Cannot allocate memory In dmesg I see only the following output: parent transid verify failed on 21325004800 wanted 2620 found 8325

I'm not using send at all, but I've been running into parent transid verify failed messages, where the wanted is way smaller than the found, when trying to balance a raid1 after adding a new drive. Originally I had gotten a BUG, and after reboot the balance finished (interestingly enough without moving any chunks to the new drive - just consolidating everything on the old drives), and then when I try to do another balance I get:

[ 4426.987177] BTRFS info (device sdc2): relocating block group 10367073779712 flags 17
[ 4446.287998] BTRFS info (device sdc2): found 13 extents
[ 4451.330887] parent transid verify failed on 10063286579200 wanted 987432 found 993678
[ 4451.350663] parent transid verify failed on 10063286579200 wanted 987432 found 993678

The btrfs program itself outputs:

btrfs balance start -v /data
Dumping filters: flags 0x7, state 0x0, force is off
DATA (flags 0x0): balancing
METADATA (flags 0x0): balancing
SYSTEM (flags 0x0): balancing
ERROR: error during balancing '/data' - Cannot allocate memory
There may be more info in syslog - try dmesg | tail

This is also on 3.17. This may be completely unrelated, but it seemed similar enough to be worth mentioning.
The filesystem otherwise seems to work fine, other than the new drive not having any data on it:

Label: 'datafs' uuid: cd074207-9bc3-402d-bee8-6a8c77d56959
Total devices 6 FS bytes used 2.16TiB
devid 1 size 2.73TiB used 2.40TiB path /dev/sdc2
devid 2 size 931.32GiB used 695.03GiB path /dev/sda2
devid 3 size 931.32GiB used 700.00GiB path /dev/sdb2
devid 4 size 931.32GiB used 700.00GiB path /dev/sdd2
devid 5 size 931.32GiB used 699.00GiB path /dev/sde2
devid 6 size 2.73TiB used 0.00 path /dev/sdf2

This is btrfs-progs-3.16.2.

-- Rich
Re: btrfs random filesystem corruption in kernel 3.17
On Mon, Oct 13, 2014 at 4:27 PM, David Arendt ad...@prnet.org wrote: From my own experience and based on what other people are saying, I think there is a random btrfs filesystem corruption problem in kernel 3.17, at least related to snapshots, therefore I decided to post using another subject to draw attention to it from people not concerned about btrfs send. More information can be found in the btrfs send posts. Did the filesystem you tried to balance contain snapshots? Read-only ones?

The filesystem contains numerous subvolumes and snapshots, many of which are read-only. I'm managing many with snapper. The similarity of the transid verify errors made me think this issue is related, and the root cause may have nothing to do with btrfs send. As far as I can tell these errors aren't having any effect on my data - hopefully the system is catching the problems before there are actual disk writes/etc.

-- Rich
Re: btrfs random filesystem corruption in kernel 3.17
On Mon, Oct 13, 2014 at 4:48 PM, john terragon jterra...@gmail.com wrote: I think I just found a consistent simple way to trigger the problem (at least on my system). And, as I guessed before, it seems to be related just to readonly snapshots:
1) I create a readonly snapshot
2) I do some changes on the source subvolume for the snapshot (I'm not sure changes are strictly needed)
3) reboot (or probably just unmount and remount. I reboot because the fs I've problems with contains my root subvolume)
After the rebooting (or the remount) I consistently have the corruption with the usual multitude of these in dmesg:
parent transid verify failed on 902316032 wanted 2484 found 4101
and the characteristic ls -la output:
drwxr-xr-x 1 root root 250 Oct 10 15:37 root
d? ? ?? ?? root-b2
drwxr-xr-x 1 root root 250 Oct 10 15:37 root-b3
d? ? ?? ?? root-backup
root-backup and root-b2 are both readonly whereas root-b3 is rw (and it didn't get corrupted). David, maybe you can try the same steps on one of your machines?

Look at that. I didn't realize it, but indeed I have a corrupted snapshot:

/data/.snapshots/5338/:
ls: cannot access /data/.snapshots/5338/snapshot: Cannot allocate memory
total 4
drwxr-xr-x 1 root root 32 Oct 11 06:09 .
drwxr-x--- 1 root root 32 Oct 11 07:42 ..
-rw--- 1 root root 135 Oct 11 06:09 info.xml
d? ? ?? ?? snapshot

Several older snapshots are fine, and those predate my 3.17 upgrade. I noticed that this corrupted snapshot isn't even listed in my snapper lists.

btrfs su delete /data/.snapshots/5338/snapshot
Transaction commit: none (default)
ERROR: error accessing '/data/.snapshots/5338/snapshot'

Removing them appears to be problematic as well. I might just disable compress=lzo and go back to 3.16 to see how that goes.

-- Rich
Re: btrfs random filesystem corruption in kernel 3.17
On Mon, Oct 13, 2014 at 4:55 PM, Rich Freeman r-bt...@thefreemanclan.net wrote: On Mon, Oct 13, 2014 at 4:48 PM, john terragon jterra...@gmail.com wrote: After the rebooting (or the remount) I consistently have the corruption with the usual multitude of these in dmesg parent transid verify failed on 902316032 wanted 2484 found 4101 and the characteristic ls -la output

Sorry to double-reply, but I left this out. I have a long string of these early in boot as well that I never noticed before.

-- Rich
Re: btrfs random filesystem corruption in kernel 3.17
On Mon, Oct 13, 2014 at 5:22 PM, john terragon jterra...@gmail.com wrote: I'm using compress=no, so compression doesn't seem to be related, at least in my case. Just read-only snapshots on 3.17 (although I haven't tried 3.16).

I was using lzo compression, and hence my comment about turning it off before going back to 3.16 (not realizing that 3.16 has subsequently been fixed). Ironically enough, I discovered this as I was about to migrate my ext4 backup drive into my btrfs raid1. Maybe I'll go ahead and wait on that and have an rsync backup of the filesystem handy (minus snapshots) just in case. :) I'd switch to 3.16, but it sounds like there is no way to remove the snapshots at the moment, and I can live for a while without the ability to create new ones.

Interestingly enough, it doesn't look like ALL snapshots are affected. I checked, and some of the snapshots I made last weekend while doing system updates look accessible. They are significantly smaller, and the subvolumes they were made from are also fairly new - though I have no idea if that is related. The corrupted subvolumes do show up in btrfs su list, but they cannot be examined using btrfs su show. It would be VERY nice to have a way of cleaning this up without blowing away the entire filesystem...

-- Rich
Re: 3.16 Managed to ENOSPC with 80% used
On Thu, Sep 25, 2014 at 5:21 PM, Holger Hoffstätte holger.hoffstae...@googlemail.com wrote: That's why I mentioned adding a second device - that will immediately allow cleanup with headroom. An additional 8GB tmpfs volume can work wonders.

If you add a single 8GB tmpfs to a RAID1 btrfs array, is it safe to assume that you'll still always have a redundant copy of everything on a disk somewhere during the recovery? Would only a single tmpfs volume actually help in this case? I get a bit nervous about doing a cleanup that involves moving metadata to tmpfs of all places, since some kind of deadlock/etc could result in unrecoverable data loss. Doing the same thing with an actual hard drive would concern me less.

-- Rich
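[Editor's note] The temporary-extra-device trick discussed here is usually done with a file-backed loop device so it can be removed cleanly afterwards. A hedged sketch, with sizes and paths purely illustrative; anything btrfs places on the loop device lives in RAM only, which is exactly the redundancy concern raised above, so the window should be kept as short as possible:

```shell
# Hedged sketch: temporary extra device to escape a btrfs ENOSPC.
mount -t tmpfs -o size=8G tmpfs /mnt/scratch
truncate -s 8G /mnt/scratch/btrfs-spill.img
losetup /dev/loop0 /mnt/scratch/btrfs-spill.img
btrfs device add /dev/loop0 /mnt/full-fs
btrfs balance start -dusage=5 /mnt/full-fs     # reclaim nearly-empty chunks
btrfs device delete /dev/loop0 /mnt/full-fs    # migrates any data off the loop device
losetup -d /dev/loop0
umount /mnt/scratch
```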
Re: Is it necessary to balance a btrfs raid1 array?
On Wed, Sep 10, 2014 at 9:06 AM, Austin S Hemmelgarn ahferro...@gmail.com wrote: Normally, you shouldn't need to run balance at all on most BTRFS filesystems, unless your usage patterns vary widely over time (I'm actually a good example of this, most of the files in my home directory are relatively small, except for when I am building a system with buildroot or compiling a kernel, and on occasion I have VM images that I'm working with).

I tend to agree, but I do keep a close eye on free space. If I get to the point where I'm over 90% allocated to chunks, with lots of unused space inside them, I run a balance. I tend to have the most problems with my root/OS filesystem running on a 64GB SSD, likely because it is so small.

Is there a big performance penalty to running mixed chunks on an SSD? I believe this would get rid of the risk of ENOSPC issues if everything gets allocated to chunks. There are obviously no issues with random access on an SSD, but there could be other problems (cache utilization, etc).

I tend to watch btrfs fi show, and if the total space used starts getting high then I run a balance. Usually I run with -dusage=30 or -dusage=50, but sometimes I get to the point where I just need to do a full balance. Often it is helpful to run a series of balance commands starting at -dusage=10 and moving up in increments. This at least prevents killing IO continuously for hours. If we can get to a point where balancing can operate at low IO priority, that would be helpful.

IO priority is a problem in btrfs in general. Even tasks run at idle scheduling priority can really block up a disk. I've seen a lot of hurry-up-and-wait behavior in btrfs. It seems like the initial commit to the log/etc is willing to accept a very large volume of data, and then when all the trees get updated the system grinds to a crawl trying to deal with all the data that was committed.
The problem is that you have two queues, with the second queue being rate-limiting but the first queue being the one that applies priority control. What we really need is for the log to have controls on how much it accepts, so that updating the trees/etc is never rate-limiting. That will limit the ability to have short IO write bursts, but it would prevent low-priority writes from blocking high-priority reads/writes.

-- Rich
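[Editor's note] The incremental balance described above can be scripted; this is a hedged sketch with an illustrative mount point and cutoffs, not a canonical recipe:

```shell
# Hedged sketch: stepwise balance, cheapest chunks first.
# Each pass only rewrites chunks at or below the given usage percentage,
# so the long I/O hit is broken into shorter, interruptible passes.
for u in 10 25 50 75; do
    btrfs balance start -dusage=$u -musage=$u /mnt/pool
done
```

-musage applies the same cutoff to metadata chunks; drop it (or skip the loop in favor of a plain full balance) depending on which chunk type is over-allocated.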
Re: Distro vs latest kernel for BTRFS?
On Fri, Aug 22, 2014 at 8:04 AM, Austin S Hemmelgarn ahferro...@gmail.com wrote: I personally use Gentoo Unstable on all my systems, so I build all my kernels locally anyway, and stay pretty much in-line with the current stable mainline kernel.

Gentoo Unstable probably means gentoo-sources, testing version, which follows the stable kernel branch (the most recent stable release, not the long-term stable one). The stable version of gentoo-sources generally follows the most recent longterm stable kernel (so 3.14 right now). I'm not sure what the exact policy is, but that is my sense of it. So, you're still running a stable kernel most likely.

If you really want mainline then you want git-sources, which I believe follows the most recent mainline release. Of course, if you're following it that closely then you probably should think about just doing a git clone and managing it yourself, since then you can handle patches/etc more easily.

I think the best option for somebody running btrfs is to stick with a stable kernel branch, either the current stable or a very recent longterm. I wouldn't go back into 3.2 land or anything like that. But, yes, if you had stuck with 3.14 and not gone to the current stable then you would have missed the compress=lzo deadlock. So, pick your poison. :)

Rich
Re: Significance of high number of mails on this list?
On Fri, Aug 22, 2014 at 3:35 AM, Duncan 1i5t5.dun...@cox.net wrote: No claim to be a dev, btrfs or otherwise, here, but I believe in this case you /are/ being too paranoid. Both btrfs send and receive only deal with data/metadata they know how to deal with. If it's corrupt in some way or if they don't understand it, they don't send/write it, they fail. IOW, if it works without error it's as guaranteed to be golden as these things get. The problem is that it doesn't always work without error in the first place, sometimes it /does/ fail. In that instance you can always try again as the existing data/metadata shouldn't be damaged, but if it keeps failing you may have to try something else, rsync, etc.

Well, my main use-case for rsync right now is "btrfs bug hoses my filesystem", so it would be nice to have a daily full backup on something other than btrfs so that it is unlikely to suffer the same problem at the same time. Using btrfs send with that backup would certainly be more efficient, but it would defeat the purpose of the backup, which is to not be btrfs. I am already using mirroring in the event of drive failure, and offsite cloud backups of critical data in the event of a larger catastrophe. Btrfs eating my data is a somewhat likely failure mode in the grand scheme of things, so I protect against it so that I can still have fun playing with btrfs without losing sleep.

I've actually restored from it once. I suspect that I could have fixed my ENOSPC problem without resorting to that, but the usual FAQ solutions didn't work and I was running short on time, and that particular filesystem was only 64GB anyway so it was a fast restore (and that is why this filesystem is prone to ENOSPC in the first place). Oh, and I'm using rsnapshot, so I also get the benefit of a few days worth of backups - almost as good as snapper, though in reality obviously not the same thing.
--
Rich
Re: [PATCH] Btrfs: fix task hang under heavy compressed write
On Wed, Aug 13, 2014 at 7:54 AM, Martin Steigerwald mar...@lichtvoll.de wrote:
> On Tuesday, 12 August 2014, 15:44:59, Liu Bo wrote:
> > This has been reported and discussed for a long time, and this hang occurs in both 3.15 and 3.16.
>
> Liu, is this safe for testing yet?

I'm more than happy to test this and re-enable lzo (I've been running fine on 3.16 with it disabled, but had numerous issues when it was enabled on 3.15 and the rcs). It would just be helpful to clarify exactly which patch we should be testing, and which kernel we should test it against, to be most helpful. No sense generating issue reports that aren't useful.

Rich
Re: Blocked tasks on 3.15.1
On Tue, Jul 22, 2014 at 10:53 AM, Chris Mason c...@fb.com wrote:
> Thanks for the help in tracking this down everyone. We'll get there! Are you all running multi-disk systems (from a btrfs POV, more than one device)? I don't care how many physical drives this maps to, just does btrfs think there's more than one drive.

I've been away on vacation so I haven't been able to try your latest patch, but I can try whatever is out there starting this weekend.

I was getting fairly consistent hangs during heavy IO (especially rsync) on 3.15 with lzo enabled. This is on raid1 across 5 drives, directly against the partitions themselves (no dmcrypt, mdadm, lvm, etc). I disabled lzo and haven't had problems since. I'm now running on mainline without issue, but I think I did see the hang on mainline when I tried enabling lzo again briefly.

Rich
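To answer the how-many-devices question without trusting your memory of how the filesystem was created, one option is to count what the kernel itself exposes in sysfs. A sketch, assuming the /sys/fs/btrfs/<fsid>/devices layout; the second argument only exists so the function can be exercised against a fake tree instead of a real mount:

```shell
#!/bin/sh
# Count how many devices btrfs thinks a filesystem has by listing the
# per-fsid devices directory in sysfs. The sysfs root is a parameter so
# this sketch can be tried without a real btrfs mount.
btrfs_device_count() {
    fsid=$1
    sysroot=${2:-/sys/fs/btrfs}
    ls "$sysroot/$fsid/devices" 2>/dev/null | wc -l
}
# Real usage would look something like (fsid from "btrfs filesystem show"):
#   btrfs_device_count 01234567-89ab-cdef-0123-456789abcdef
```

Anything greater than 1 means btrfs considers it a multi-device filesystem, regardless of the physical layout underneath.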
Re: Blocked tasks on 3.15.1
On Fri, Jun 27, 2014 at 8:22 PM, Chris Samuel ch...@csamuel.org wrote:
> On Fri, 27 Jun 2014 05:20:41 PM Duncan wrote:
> > If I'm not mistaken the fix for the 3.16 series bug was: ea4ebde02e08558b020c4b61bb9a4c0fcf63028e (Btrfs: fix deadlocks with trylock on tree nodes).
>
> That patch applies cleanly to 3.15.2, so if it is indeed the fix it should probably go to -stable for the next 3.15 release.

I can confirm that 3.15.2 definitely has the deadlock problem. I tried upgrading just to convince myself of this before patching it, and it only took a few hours before it stopped syncing with the usual errors.

I applied the patch on Jun 28 around 20:00 UTC. I haven't had a deadlock since, despite having the filesystem fairly active, with a few reboots, some deleted snapshots, being assimilated by the new sysvinit replacement, etc. That doesn't really prove anything though; for all I know it will hang a week from now. However, the patch seems stable so far on 3.15.2.

Rich
Re: Blocked tasks on 3.15.1
On Fri, Jun 27, 2014 at 9:06 AM, Duncan 1i5t5.dun...@cox.net wrote:
> Hopefully that problem's fixed on 3.16-rc2+, but as of yet there's not enough 3.16-rc2+ reports out there from folks experiencing issues with 3.15 blocked tasks to rightfully say.

Any chance that it was backported to 3.15.2? I'd rather not move to mainline just for btrfs.

I got another block this morning and failed to capture a log before my terminals gave out. I switched back to 3.15.0 for the moment, and we'll see if that fares any better.

Rich
Re: Blocked tasks on 3.15.1
On Fri, Jun 27, 2014 at 11:52 AM, Chris Murphy li...@colorremedies.com wrote:
> On Jun 27, 2014, at 9:14 AM, Rich Freeman r-bt...@thefreemanclan.net wrote:
> > I got another block this morning and failed to capture a log before my terminals gave out. I switched back to 3.15.0 for the moment, and we'll see if that fares any better.
>
> Yeah, I'd start going backwards. The idea of going forwards is to hopefully get you unstuck or extract data where otherwise you can't; it's not really a recommendation for production usage. It's also often useful if you can reproduce the block with a current rc kernel and issue sysrq+w and post that. Then do your regression with an older kernel.

So, obviously I'm getting my money's worth from the btrfs team, but neither is always a great option, as neither involves me running a stable kernel. 3.15.0 contains CVE-2014-4014, although I'm running a version patched for that vulnerability. If I go back any further I'd probably have to backport it myself, and I only know about it because my distro patched that CVE on 3.15.0 before moving to 3.15.1. Running 3.16 doesn't bother me much from a btrfs standpoint, but it means I'm getting unstable updates on all the other modules as well. It is just more to deal with.

I might give 3.15.2 a shot and see what happens, and I can always fall back to 3.15.0 again.

Rich
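For anyone else trying to capture the sysrq+w output Chris mentions, it amounts to the following (run as root). This is only a sketch: the output path is arbitrary, and SYSRQ_DRY_RUN is my own guard so the steps can be previewed without touching /proc:

```shell
#!/bin/sh
# Dump blocked (uninterruptible) tasks via sysrq-w and save dmesg for a
# bug report. Needs root; set SYSRQ_DRY_RUN=1 to just print the steps.
capture_sysrq_w() {
    if [ "${SYSRQ_DRY_RUN:-0}" = "1" ]; then
        echo "echo 1 > /proc/sys/kernel/sysrq"
        echo "echo w > /proc/sysrq-trigger"
        echo "dmesg > /tmp/sysrq-w.txt"
        return 0
    fi
    echo 1 > /proc/sys/kernel/sysrq   # make sure sysrq is fully enabled
    echo w > /proc/sysrq-trigger      # ask the kernel to dump blocked tasks
    dmesg > /tmp/sysrq-w.txt          # save the dump to paste into the report
}
SYSRQ_DRY_RUN=1
capture_sysrq_w
```

Alt-SysRq-W on the console does the same thing as the trigger write, which is handy when the box is too wedged for a shell.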
Blocked tasks on 3.15.1
I've been getting blocked tasks on 3.15.1, generally at times when the filesystem is somewhat busy (such as doing a backup via scp/clonezilla writing to the disk). A week ago I had enabled snapper for a day, which resulted in a daily cleanup of about 8 snapshots at once, which might have contributed, but I've been limping along since.

Here is a pastebin of my dmesg from the hung tasks and a subsequent Alt-SysRq-W: http://pastebin.com/yYdcxFTE

When this happens the system remains somewhat stable, but no writes to the disk succeed, and I start getting load averages in the dozens as tasks start blocking. On reboot the system generally works fine, though it can hang a day or two later.

I'm happy to try patches, or to capture any other output that is helpful the next time this happens; the system is fairly stable as long as I capture things someplace other than my btrfs filesystems. I didn't see anything quite like this on the list. I updated my kernel around the time this behavior started, and was on 3.15.0 previously (though I haven't tried reverting yet).

Rich
Re: [PATCH] Btrfs: fix deadlock with nested trans handles
On Sat, Mar 15, 2014 at 7:51 AM, Duncan 1i5t5.dun...@cox.net wrote:
> 1) Does running the snapper cleanup command from that cron job manually trigger the problem as well?

As you can imagine, I'm not too keen to trigger this often. But yes, I just gave it a shot on my SSD, and cleaning a few days of timelines triggered a panic.

> 2) What about modifying the cron job to run hourly, or perhaps every six hours, so it's deleting only 2 or 12 instead of 48 at a time? Does that help? If so, then it's a thundering herd problem. While definitely still a bug, you'll at least have a workaround until it's fixed.

It definitely looks like a thundering herd problem. I stopped the cron jobs (including the creation of snapshots, based on your later warning). However, I am deleting my snapshots one at a time, at a rate of one every 5-30 minutes, and while that is creating surprisingly high disk loads on my SSD and hard drives, I don't get any panics. I figured that having only one deletion pending per checkpoint would eliminate the locking risk.

I did get some blocked task messages in dmesg, like:

[105538.121239] INFO: task mysqld:3006 blocked for more than 120 seconds.
[105538.121251] Not tainted 3.13.6-gentoo #1
[105538.121256] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[105538.121262] mysqld D 880395f63e80 3432 3006 1 0x
[105538.121273] 88028b623d38 0086 88028b623dc8 81c10440
[105538.121283] 0200 88028b623fd8 880395f63b80 00012c40
[105538.121291] 00012c40 880395f63b80 532b7877 880410e7e578
[105538.121299] Call Trace:
[105538.121316] [81623d73] schedule+0x6a/0x6c
[105538.121327] [81623f52] schedule_preempt_disabled+0x9/0xb
[105538.121337] [816251af] __mutex_lock_slowpath+0x155/0x1af
[105538.121347] [812b9db0] ? radix_tree_tag_set+0x71/0xd4
[105538.121356] [81625225] mutex_lock+0x1c/0x2e
[105538.121365] [8123c168] btrfs_log_inode_parent+0x161/0x308
[105538.121373] [8162466d] ? mutex_unlock+0x11/0x13
[105538.121382] [8123cd37] btrfs_log_dentry_safe+0x39/0x52
[105538.121390] [8121a0c9] btrfs_sync_file+0x1bc/0x280
[105538.121401] [811339a3] vfs_fsync_range+0x13/0x1d
[105538.121409] [811339c4] vfs_fsync+0x17/0x19
[105538.121416] [81133c3c] do_fsync+0x30/0x55
[105538.121423] [81133e40] SyS_fsync+0xb/0xf
[105538.121432] [8162c2e2] system_call_fastpath+0x16/0x1b

I suspect that this may not be terribly helpful; it probably reflects tasks waiting for a lock rather than whatever is holding it. It was more of a problem when I was trying to delete a snapshot per minute on my SSD, or one every 5 minutes on the hard drives.

Rich
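For reference, the one-at-a-time deletion amounts to something like the following loop. The snapshot paths and interval are illustrative, and BTRFS_CMD defaults to echoing the command so the sketch dry-runs safely; drop the default to actually delete:

```shell
#!/bin/sh
# Delete btrfs snapshots one at a time with a pause between them, instead
# of letting snapper queue ~48 deletions in one shot. BTRFS_CMD defaults
# to echoing the command, so this sketch only prints what it would do.
BTRFS_CMD=${BTRFS_CMD:-"echo btrfs"}
delete_staggered() {
    interval=$1
    shift
    for snap in "$@"; do
        $BTRFS_CMD subvolume delete "$snap"
        sleep "$interval"   # let each deletion reach a commit before the next
    done
}
# Hypothetical example paths:
delete_staggered 0 /mnt/.snapshots/101/snapshot /mnt/.snapshots/102/snapshot
```

With a 300-1800 second interval there is at most one pending deletion per transaction commit, which is the property that seems to keep the panics away here.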
Re: [PATCH] Btrfs: fix deadlock with nested trans handles
On Wed, Mar 12, 2014 at 12:34 PM, Rich Freeman r-bt...@thefreemanclan.net wrote:
> On Wed, Mar 12, 2014 at 11:24 AM, Josef Bacik jba...@fb.com wrote:
> > On 03/12/2014 08:56 AM, Rich Freeman wrote:
> > > After a number of reboots the system became stable, presumably whatever race condition btrfs was hitting followed a favorable path. I do have a 2GB btrfs-image pre-dating my application of this patch that was causing the issue last week.
> >
> > Uhm wow that's pretty epic. I will talk to chris and figure out how we want to deal with that and send you a patch shortly.
>
> Thanks. A tiny bit more background.

And some more background. I had more reboots over the next two days at the same time each day, just after my crontab completed successfully. One of the last things it does is run the snapper cleanups, which delete a bunch of snapshots. During a reboot I checked, and there were a bunch of deleted snapshots, which disappeared over the next 30-60 seconds before the panic, and then they would re-appear on the next reboot.

I disabled the snapper cron job and this morning had no issues at all. One day isn't much to establish a trend, but I suspect that this is the cause. Obviously getting rid of snapshots would be desirable at some point, but I can wait for a patch. Snapper would be deleting about 48 snapshots at the same time, since I create them hourly and the cleanup occurs daily on two different subvolumes on the same filesystem.

Rich
Re: [PATCH] Btrfs: fix deadlock with nested trans handles
On Thu, Mar 6, 2014 at 7:25 PM, Zach Brown z...@redhat.com wrote:
> On Thu, Mar 06, 2014 at 07:01:07PM -0500, Josef Bacik wrote:
> > Zach found this deadlock that would happen like this
> And this fixes it. It's run through a few times successfully.

I'm not sure if my issue is related to this or not; happy to start a new thread if not. I applied this patch as I was running into locks, but I am still having them. See: http://picpaste.com/IMG_20140312_072458-KPH35pQ6.jpg

After a number of reboots the system became stable, presumably whatever race condition btrfs was hitting followed a favorable path. I do have a 2GB btrfs-image pre-dating my application of this patch that was causing the issue last week.

Rich
Re: [PATCH] Btrfs: fix deadlock with nested trans handles
On Wed, Mar 12, 2014 at 11:24 AM, Josef Bacik jba...@fb.com wrote:
> On 03/12/2014 08:56 AM, Rich Freeman wrote:
> > After a number of reboots the system became stable, presumably whatever race condition btrfs was hitting followed a favorable path. I do have a 2GB btrfs-image pre-dating my application of this patch that was causing the issue last week.
>
> Uhm wow that's pretty epic. I will talk to chris and figure out how we want to deal with that and send you a patch shortly.

Thanks. If you need any info from me at all beyond the capture, let me know.

A tiny bit more background: the system would boot normally, but panic after about 30-90 seconds (usually long enough to log into KDE, perhaps even fire up a browser, etc). In single-user mode I could mount the filesystem read-only without issue. If I mounted it read-write (in recovery mode or normally) I'd get the panic after about 30-60 seconds. On one occasion it seemed stable, but panicked when I unmounted it.

I have to say that I'm impressed that it recovers at all. I'd rather have the filesystem not write anything if it isn't sure it can write it correctly, and that seems to be the effect here. Just about all the issues I've run into with btrfs have tended to be lockup-type issues, not silent corruption.

Rich