oops at mount
hi All, I'm new on the list.

System:
Distributor ID: Ubuntu
Description:    Ubuntu 13.04
Release:        13.04
Codename:       raring

Linux ctu 3.8.0-19-generic #30-Ubuntu SMP Wed May 1 16:35:23 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

The symptom is the same with the Saucy 3.9 kernel.

ii  btrfs-tools  0.20~git20130524~650e656-0daily13~raring1  amd64  Checksumming Copy on Write Filesystem utilities

I also tried btrfs-tools v0.19 before, with no luck.

$ btrfsck --repair /dev/sda1
enabling repair mode
parent transid verify failed on 430612480 wanted 81016 found 81011
parent transid verify failed on 430612480 wanted 81016 found 81011
parent transid verify failed on 430612480 wanted 81016 found 81011
parent transid verify failed on 430612480 wanted 81016 found 81011
Ignoring transid failure
Checking filesystem on /dev/sda1
UUID: deed1ffb-27bb-4555-b5ce-8a3c8ee5612c
checking extents
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
checking csums
checking root refs
found 67570520064 bytes used err is 0
total csum bytes: 65168792
total tree bytes: 789745664
total fs tree bytes: 651145216
total extent tree bytes: 50372608
btree space waste bytes: 192929190
file data blocks allocated: 80764424192 referenced 69347667968
Btrfs v0.20-rc1

If I mount it, I get an oops message. The machine is not completely frozen, but I have to reboot it to be able to use it again.
[  69.257107] btrfsck[2703]: segfault at 7ff069802710 ip 7ff063ceecbd sp 7fff9bb5db70 error 4 in libc-2.17.so[7ff063c6f000+1be000]
[ 480.799981] device fsid deed1ffb-27bb-4555-b5ce-8a3c8ee5612c devid 1 transid 81010 /dev/sda1
[ 480.802507] btrfs: disk space caching is enabled
[ 480.851534] Btrfs detected SSD devices, enabling SSD mode
[ 480.863245] btrfs bad tree block start 0 413601792
[ 480.863320] btrfs bad tree block start 0 413601792
[ 480.863389] ------------[ cut here ]------------
[ 480.863426] Kernel BUG at a03d3b6a [verbose debug info unavailable]
[ 480.863459] invalid opcode: [#1] SMP
[ 480.863490] Modules linked in: ip6table_filter(F) ip6_tables(F) xt_state(F) ipt_REJECT(F) xt_CHECKSUM(F) iptable_mangle(F) xt_tcpudp(F) iptable_filter(F) ipt_MASQUERADE(F) iptable_nat(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F) nf_nat_ipv4(F) nf_nat(F) nf_conntrack(F) ip_tables(F) x_tables(F) bridge(F) stp(F) llc(F) pci_stub vboxpci(OF) vboxnetadp(OF) vboxnetflt(OF) vboxdrv(OF) rfcomm bnep snd_hda_codec_hdmi snd_hda_codec_idt binfmt_misc(F) qcserial usb_wwan usbserial pata_pcmcia arc4(F) hid_generic coretemp kvm_intel iwldvm kvm mac80211 ghash_clmulni_intel(F) aesni_intel(F) aes_x86_64(F) xts(F) lrw(F) gf128mul(F) ablk_helper(F) cryptd(F) usbhid hid joydev(F) tpm_infineon hp_wmi sparse_keymap uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev pcmcia microcode(F) btusb bluetooth psmouse(F) serio_raw(F) intel_ips btrfs(F) tpm_tis libcrc32c(F) zlib_deflate(F) sdhci_pci snd_hda_intel sdhci snd_hda_codec snd_hwdep(F) snd_pcm(F) firewire_ohci snd_page_alloc(F) firewire_core snd_seq_midi(F) snd_seq_midi_event(F) crc_itu_t(F) yenta_socket pcmcia_rsrc i915 pcmcia_core snd_rawmidi(F) drm_kms_helper snd_seq(F) hp_accel drm lis3lv02d snd_seq_device(F) input_polldev snd_timer(F) wmi iwlwifi snd(F) video(F) mac_hid cfg80211 lpc_ich i2c_algo_bit mei e1000e(F) soundcore(F) lp(F) parport(F) ahci(F) libahci(F)
[ 480.864322] CPU 3
[ 480.864338] Pid: 5550, comm: mount Tainted: GF O 3.8.0-19-generic #30-Ubuntu Hewlett-Packard HP EliteBook 2540p/7008
[ 480.864386] RIP: 0010:[a03d3b6a] [a03d3b6a] btrfs_recover_log_trees+0x23a/0x390 [btrfs]
[ 480.864474] RSP: 0018:88012ad41b40 EFLAGS: 00010282
[ 480.864499] RAX: fffb RBX: 88018b91c000 RCX: 0001801c001b
[ 480.864531] RDX: 0001801c001c RSI: 801c001b RDI: 8801b20b3900
[ 480.864563] RBP: 88012ad41bf0 R08: R09: 0001
[ 480.864594] R10: R11: R12: 88014fc0a5a0
[ 480.864625] R13: 88011d2f0e40 R14: 88018b91a800 R15: 8801ab3ea000
[ 480.864656] FS: 7fb531818840() GS:8801bbcc() knlGS:
[ 480.864693] CS: 0010 DS: ES: CR0: 8005003b
[ 480.864718] CR2: 006a5000 CR3: 00016800b000 CR4: 07e0
[ 480.864750] DR0: DR1: DR2:
[ 480.864781] DR3: DR6: 0ff0 DR7: 0400
[ 480.864813] Process mount (pid: 5550, threadinfo 88012ad4, task 880128522e80)
[ 480.864847] Stack:
[ 480.864860]  8801b0e5ce40 88012ad41b98 fffa ff84
[ 480.864905]  faff 010684ff 0106 ff84
[ 480.864947]  faff 84ff 0106
[ 480.864990] Call Trace:
[
Re: oops at mount
On Thu, May 30, 2013 at 05:17:06AM -0600, Papp Tamas wrote:
> hi All, I'm new on the list.
[...]
> The symptom is the same with Saucy 3.9 kernel.

Can you try btrfs-next

  git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git

If it's still not fixed, please file a bug at bugzilla.kernel.org and make sure the component is set to btrfs. Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Btrfs metadata corruption; unmountable FS
On Wed, May 29, 2013 at 08:55:31PM -0600, Alex Marquez wrote:
> I'm not entirely sure what went completely wrong. Three possibilities are most likely, and they're listed below. For reference, here are supplemental materials split out into their own pastebins:
>
> * btrfs-debug-tree -R log: http://pastebin.com/7ePy9sin
> * dmesg log: http://pastebin.com/s1sdJRyd
>
> (btrfs tools are git head) Mounting with recovery,ro is no use. I've also taken a metadata dump with btrfs-image, though it completed with errors, so the dump may be incomplete. It's also 5 GBs, but I'm more than willing to make it publicly downloadable if it would help the cause.
>
> ** 1 Firstly, I have a raid1 (and, as I'll explain, partially raid10) array of 8 raw drives. A couple experience a controller error every once in a while. So it /may/ be the case that the hardware itself caused this problem, but I find it less likely than the following other two possibilities. (However, in part 3's log there is some mention of sdf giving IO errors...)
>
> ** 2 A couple of months ago I was doing a balance, trying to convert from raid10 to raid1. At the time, it was on the 3.6 kernel. I kept getting enospc errors (even with plenty of space), so I went from doing a soft conversion to a hard one. Of course, in the process my server was hard-rebooted by accident. When back online, I used btrfsck and it showed a bunch of extent vs. csum problems, which I used --repair to attempt to deal with. Though I can't recall the problems exactly, I do remember that it triggered an odd check regarding csums existing for extents that were freed. The commit which introduced this printf was https://git.kernel.org/cgit/linux/kernel/git/mason/btrfs-progs.git/commit/?id=580ccf9e2ef4607f5b67b531190e7842c4b2b0db
>
> Since then, every once in a while I would do another balance (sometimes soft, sometimes hard) in an attempt to complete the conversion -- to no avail, but seemingly to no harm.
>
> ** 3 Now, 2 weeks ago I (foolishly) thought I'd try the new skinny extents feature (mistaking it as available in 3.9) in order to see if it might alleviate the issues I've had with trying to finish that conversion. I enabled it via btrfstune, but quickly noted that my 3.9 kernel wouldn't mount the filesystem anymore (because of the incompatible feature). However, nothing had changed on-disk (given I wasn't running 3.10) but the flag... So I looked into clearing that flag, but btrfstune provided me no recourse. So I did something very dangerous and foolish: I went into btrfstune.c and changed the setting of the flag to clear the flag instead, then reran it. I mounted again, fingers crossed, and lo and behold, it was fine! Unfortunately, after some use, the filesystem failed and went read-only. That's when I got scared and decided it was time to stop trying to fix things myself (of course, far too late).
>
> The actual log is at http://pastebin.com/s1sdJRyd
> On line 85 you can see where I tried to mount it.
> Line 87 is where I remounted after my btrfstune hack.

May 17 18:13:25 norman kernel: [ 1677.876008] item 1 key (51401449938944 a9 0) itemoff 3911 itemsize 33

So it did actually get a skinny extent in there, that's the skinny extent item key. You'll have to reset the flag and move to btrfs-next/3.10. Seems like you are smart enough to do basic things, so if you don't like that option you can just fix btrfsck to go through and delete any extent entry that has BTRFS_METADATA_ITEM_KEY, and then --repair should put them back normally.

If you want to do option #2 you don't need to reset the flag; leave it unset and then add a function to cmds-check.c right before check_extents() and have it just go through the extent tree and delete any entries with that key, and then check_extents() will take care of the rest. This is a bit dangerous though, so I'd really recommend option #1.
Thanks,

Josef
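Josef's option #2 boils down to a pre-pass that walks the extent tree and drops every item whose key type is BTRFS_METADATA_ITEM_KEY, letting check_extents() rebuild them as regular extent items. As a rough standalone illustration of that key filtering (this uses a plain in-memory array and a made-up `drop_key_type` helper, not the real btrfs-progs search/delete API; only the key-type constants below are real, 0xa9 = 169 being the skinny extent type from the log line above):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Key type values from the btrfs on-disk format. */
#define BTRFS_EXTENT_ITEM_KEY   168
#define BTRFS_METADATA_ITEM_KEY 169  /* skinny extent item, 0xa9 */

struct key { uint64_t objectid; uint8_t type; uint64_t offset; };

/* Remove every entry with the given key type, compacting the array
 * in place; returns the new number of entries.  The real pre-pass
 * would do this with btrfs_search_slot()/deletion over the extent
 * tree instead of an array. */
static size_t drop_key_type(struct key *items, size_t n, uint8_t type)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++)
        if (items[i].type != type)
            items[out++] = items[i];
    return out;
}
```

After such a pass, no METADATA_ITEM keys remain, which is what lets the subsequent --repair repopulate the extent tree in the old (non-skinny) format.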
Re: oops at mount
On Thu, 30 May 2013 08:32:35 -0400, Josef Bacik wrote:
> On Thu, May 30, 2013 at 05:17:06AM -0600, Papp Tamas wrote:
>> hi All, I'm new on the list.
[...]
> Can you try btrfs-next git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git if it's still not fixed please file a bug at bugzilla.kernel.org and make sure the component is set to btrfs. Thanks,

Papp is using an Intel X18-M/X25-M/X25-V G2 SSD. At least with an Intel X25 SSD that identifies itself with the ID INTEL SSDSA2M080 and on one with the ID INTEL SSDSA2M040, I've tested whether they honor the flush request. And these two SSDs don't do so; they ignore it. If you cut the power after a flush request completes, the data that was written before the flush request is gone; the write cache was _not_ flushed.

You can only disable the write cache during/after every boot with

  hdparm -W 0 /dev/sd...

(which reduces the SSD's write speed to about 4 MB/s), or avoid such SSDs, or be prepared to restore from backup occasionally.
Re: oops at mount
Quoting Stefan Behrens (2013-05-30 08:55:58)
> Papp is using an Intel X18-M/X25-M/X25-V G2 SSD. At least with an Intel X25 SSD that identifies itself with INTEL SSDSA2M080 and on one with the ID INTEL SSDSA2M040, I've tested whether they honor the flush request. And these two SSDs don't do so, they ignore it.
[...]

Hi Stefan,

How did you verify this? I'm sure Intel will want to hear about it if we can reproduce on all filesystems.

-chris
Re: oops at mount
On Thu, 30 May 2013 10:03:29 -0400, Chris Mason wrote:
> How did you verify this? I'm sure Intel will want to hear about it if we can reproduce on all filesystems.

We have written a kernel module that (among other things) is able to write 4KB blocks of random data at random locations on an SSD, and in a second step to read and verify that data. The test procedure to check SSDs is:

1. Write 4KB blocks of random data to random locations on the disk. Send a submit_bio(REQ_FLUSH) after each 4KB block. Log the completion of the write request and of the flush request together with the result value.

2. Somewhere in the middle of operation, switch off all power, drive presence and SAS data pins between the SSD and the SATA host controller.

3. Wait some time, then enable the connection between the SSD and the host controller again.

4. Read back the 4KB blocks of random data at random locations using the same seed value that was used to generate the contents and locations when the blocks were written. Verify the data, and log whether the verification succeeded or failed.

5. Compare the log of the write and flush request completions with the one of the read and verify process. SSDs that honor the flush request don't cause verify errors for blocks where the write bio and the flush bio completed successfully.

Those two Intel SSDs that I mentioned failed this test. Other Intel SSD types passed it. Maybe a firmware update would fix this issue; I suppose it would, but I have never tried it. My intention was not to blame the SSD manufacturer; in fact, I like their SSDs very much and buy and use them frequently. I just wanted to spare Josef the headache of questioning the Btrfs implementation. The issue that Papp described looks just like a power failure in conjunction with a storage device that ignores flush requests.
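The seed-replay idea behind steps 1 and 4 — regenerate the same (location, contents) sequence from one seed on both the write and the verify side — can be sketched in userspace C. This is a simplified model with an in-memory "disk" standing in for the SSD; the block count, PRNG, and function names are assumptions, not Stefan's actual kernel module:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DISK_BLOCKS 256   /* toy "disk" of 4KB blocks */
#define BLOCK_SIZE  4096

/* Derive the next block index and fill byte from the seeded PRNG
 * state, so the verify pass can regenerate the exact sequence. */
static void gen_block(unsigned *state, size_t *idx, uint8_t *fill)
{
    *idx  = (size_t)rand_r(state) % DISK_BLOCKS;
    *fill = (uint8_t)(rand_r(state) & 0xff);
}

/* Step 1: write n seeded random blocks.  The real module would
 * submit_bio(REQ_FLUSH) after each block and log the completions. */
static void write_pass(uint8_t *disk, unsigned seed, size_t n)
{
    unsigned state = seed;
    for (size_t i = 0; i < n; i++) {
        size_t idx; uint8_t fill;
        gen_block(&state, &idx, &fill);
        memset(disk + idx * BLOCK_SIZE, fill, BLOCK_SIZE);
    }
}

/* Step 4: replay the same seed to reconstruct what each written block
 * should contain after all n writes, then count mismatching blocks. */
static size_t verify_pass(const uint8_t *disk, unsigned seed, size_t n)
{
    unsigned state = seed;
    uint8_t expected[DISK_BLOCKS];
    uint8_t written[DISK_BLOCKS] = {0};
    size_t errors = 0;

    for (size_t i = 0; i < n; i++) {
        size_t idx; uint8_t fill;
        gen_block(&state, &idx, &fill);
        expected[idx] = fill;   /* later writes win, as on the disk */
        written[idx] = 1;
    }
    for (size_t idx = 0; idx < DISK_BLOCKS; idx++) {
        if (!written[idx])
            continue;
        for (size_t b = 0; b < BLOCK_SIZE; b++)
            if (disk[idx * BLOCK_SIZE + b] != expected[idx]) {
                errors++;
                break;
            }
    }
    return errors;
}
```

In the real test the power cut happens between the two passes, and a drive that honored its flushes produces zero verify errors for blocks whose write and flush bios both completed.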
Re: oops at mount
Quoting Stefan Behrens (2013-05-30 10:59:59)
> We have written a kernel module that (among other things) is able to write 4KB blocks of random data at random locations on an SSD, and in a second step to read and verify that data.
[...]
> The issue that Papp described looks just like a power failure in conjunction with a storage device that ignores flush requests.

It's definitely useful information. The gen2's did have some problems (mine failed as well), but I didn't realize how bad the powercut handling was.

-chris
Re: nocow 'C' flag ignored after balance
On Wed, May 29, 2013, Miao Xie wrote:
> On Wed, 29 May 2013 10:55:11 +0900, Liu Bo wrote:
>> On Tue, May 28, 2013 at 09:22:11AM -0500, Kyle Gates wrote:
>>> From: Liu Bo bo.li@oracle.com
>>> Subject: [PATCH] Btrfs: fix broken nocow after a normal balance
>>> [...]
>>> Sorry for the long wait in replying. This patch was unsuccessful in fixing the problem (on my 3.8 Ubuntu Raring kernel). I can probably try again on a newer version if you think it will help. This was my first kernel compile, so I patched by hand and waited (10 hours on my old 32-bit single-core machine). I did move some of the files off and back onto the filesystem to start fresh and compare, but all seem to exhibit the same behavior after a balance.
>>
>> Thanks for testing the patch, although it didn't help you. Actually I tested it to be sure that it fixed the problems in my reproducer. So anyway, can you please apply this debug patch in order to nail it down?
>>
>> thanks, liubo
>
> Your patch can not fix the above problem because we may update ->last_snapshot after we relocate the file data extent. For example, there are two block groups which will be relocated: one is a data block group, the other is a metadata block group. We relocate the data block group first, set the new generation for the file data extent item/the relative extent item, and set (new_generation - 1) for ->last_snapshot. After the relocation of this block group, we end the transaction and drop the relocation tree. If we ended the space balance now, we wouldn't break the nocow rule, because ->last_snapshot is less than the generation of the file data extent item/the relative extent item. But there is still one block group to be relocated; when relocating the second block group, we also start a new transaction and update ->last_snapshot if needed. So ->last_snapshot becomes greater than the generation of the file data extent item we set before, and the nocow rule is broken.
>
> Back to the above problem: I don't think it is a serious problem. We only do COW once after the relocation, then we still honour the nocow rule. The behaviour is similar to snapshot. So maybe it needn't be fixed.

I would argue that for large VM workloads, running a balance or adding disks is a common practice that will result in a drastic drop in performance as well as massive increases in metadata writes and fragmentation. In my case my disks were thrashing severely, performance was poor, and ntp couldn't even hold my clock stable. If the fix is nontrivial, please add this to the todo list.

Thanks,
Kyle

> If we must fix this problem, I think the only way is to get the generation at the beginning of the space balance, and then set it to ->last_snapshot if ->last_snapshot is less than it; don't use (current_generation - 1) to update ->last_snapshot. Besides that, don't forget to store the generation into btrfs_balance_item, or the problem will happen again after we resume the balance.
>
> Thanks
> Miao
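Miao's proposed fix can be modeled as a tiny pure function: sample the generation once when the balance starts and only ever clamp ->last_snapshot up to that value, instead of pushing it to (current_generation - 1) in every relocation transaction. This is an illustrative model only; the real logic lives in the relocation and transaction code, and the function names here are made up:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

/* Nocow is allowed only for extents newer than last_snapshot;
 * older extents may be shared, so they must be COWed. */
static int nocow_allowed(u64 extent_gen, u64 last_snapshot)
{
    return extent_gen > last_snapshot;
}

/* Broken rule: each relocation transaction pushes last_snapshot to
 * (current_gen - 1), so an extent relocated in an earlier transaction
 * of the same balance falls behind it. */
static u64 update_broken(u64 last_snapshot, u64 current_gen)
{
    (void)last_snapshot;
    return current_gen - 1;
}

/* Proposed rule: clamp to the generation sampled once at the start of
 * the balance, so later transactions can't overtake extents relocated
 * earlier in the same balance. */
static u64 update_fixed(u64 last_snapshot, u64 balance_start_gen)
{
    return last_snapshot < balance_start_gen ? balance_start_gen
                                             : last_snapshot;
}
```

With two block groups relocated in transactions 101 and 102, an extent stamped with generation 101 stays nocow under the fixed rule (last_snapshot stays at the balance-start generation 100) but loses nocow under the broken rule (last_snapshot reaches 101).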
Re: Btrfsck errors (fs tree 565 refs 125 not found) - how serious
Hi again,

I am able to induce the btrfsck errors I experienced using a synthetic workload on a fresh filesystem with linux-3.10.0-rc2. However, as filing the bug report would take quite some time (uploading 512 MB trace files, writing a short read-me, ...), I wonder whether this is an issue worth reporting, or maybe just caused by an outdated version of btrfsck (I am using btrfs-progs-0.20.rc1.20130308git704a08c-1).

Regards,
Clemens

fs tree 565 refs 125 not found
unresolved ref root 807 dir 813347 index 277 namelen 39 name snapshot_1368273601_2013-05-11_14:00:01 error 600
..
Re: Btrfs metadata corruption; unmountable FS
Oh, I see... Well, at least now I know. Thanks!

I'll probably go for the safer route of using 3.10... though I'd like to know how stable the current RC is wrt btrfs, or if instead I should wait for the release.

~Alex

On May 30, 2013, at 8:52 AM, Josef Bacik jba...@fusionio.com wrote:
> On Wed, May 29, 2013 at 08:55:31PM -0600, Alex Marquez wrote:
>> I'm not entirely sure what went completely wrong.
[...]
> You'll have to reset the flag and move to btrfs-next/3.10.
[...]
> This is a bit dangerous though so I'd really recommend option #1.
>
> Thanks,
> Josef
Re: Btrfsck errors (fs tree 565 refs 125 not found) - how serious
On Thu, May 30, 2013 at 12:06:50PM -0600, Clemens Eisserer wrote:
> Hi again,
>
> I am able to induce the btrfsck errors I experienced using a synthetic workload on a fresh filesystem with linux-3.10.0-rc2.
[...]

You are running on an unmounted fs, right? Also please make sure you are running the git version:

  git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git

Thanks,
Josef
Re: oops at mount
On 05/30/2013 13:17, Papp Tamas wrote:
> hi All, I'm new on the list.
[...]
> If I mount, I get an oops message. The machine is not completely freezed, but I have to reboot it to be able to use it again.
[...]
[PATCH] Btrfs: stop all workers before cleaning up roots
Dave reported a panic because the extent_root->commit_root was NULL in the caching kthread. That is because we just unset it in free_root_pointers, which is not the correct thing to do; we have to either wait for the caching kthread to complete or hold the extent_commit_sem lock so we know the thread has exited. This patch makes the kthreads all stop first and then we do our cleanup. This should fix the race. Thanks,

Reported-by: David Sterba dste...@suse.cz
Signed-off-by: Josef Bacik jba...@fusionio.com
---
 fs/btrfs/disk-io.c | 6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 2b53afd..77cb566 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3547,13 +3547,13 @@ int close_ctree(struct btrfs_root *root)
 
 	btrfs_free_block_groups(fs_info);
 
-	free_root_pointers(fs_info, 1);
+	btrfs_stop_all_workers(fs_info);
 
 	del_fs_roots(fs_info);
 
-	iput(fs_info->btree_inode);
+	free_root_pointers(fs_info, 1);
 
-	btrfs_stop_all_workers(fs_info);
+	iput(fs_info->btree_inode);
 
 #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
 	if (btrfs_test_opt(root, CHECK_INTEGRITY))
-- 
1.7.7.6
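The race the patch closes is the classic stop-workers-before-freeing-state ordering: a background thread may still dereference a pointer that teardown is about to unset. A minimal userspace sketch of that pattern, assuming pthreads (this illustrates the ordering only; the struct and function names are invented, not the btrfs code):

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct state {
    atomic_int stop;   /* set to ask the worker to exit */
    int *root;         /* shared data the worker dereferences */
    long reads;        /* how many reads the worker performed */
};

/* Worker: keeps dereferencing the shared pointer until told to stop,
 * like the caching kthread reading extent_root->commit_root. */
static void *worker(void *arg)
{
    struct state *s = arg;
    while (!atomic_load(&s->stop)) {
        assert(s->root != NULL);   /* would be a NULL deref in the bug */
        s->reads += *s->root;
    }
    return NULL;
}

/* Correct teardown, mirroring the patch: stop and join the worker
 * FIRST, and only then free/unset the pointer it was using. */
static long shutdown_in_order(void)
{
    struct state s = { .root = malloc(sizeof(int)) };
    pthread_t t;

    *s.root = 1;
    pthread_create(&t, NULL, worker, &s);

    atomic_store(&s.stop, 1);  /* like btrfs_stop_all_workers() */
    pthread_join(t, NULL);     /* the worker has really exited */

    free(s.root);              /* free_root_pointers() is now safe */
    s.root = NULL;
    return s.reads;
}
```

Reversing the last two steps (free before stop/join) is exactly the use-after-free/NULL-deref window the original close_ctree ordering had.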
testing stable pages being modified
'stable' pages have always been a bit of a fiction. It's easy to
intentionally modify stable pages under io with some help from page
references that ignore mappings and page state.

Here's a little test that uses O_DIRECT to get the pinned aio ring pages
under IO and then has event completion stores modify them while they're in
flight. It's a nice quick way to test the consequences of stable pages
being modified. It can be used to burp out ratelimited csum failure kernel
messages with btrfs, for example.

- z

#define _GNU_SOURCE
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <limits.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/uio.h>
#include <assert.h>
#include <libaio.h>

int main(int argc, char **argv)
{
	size_t total = 1 * 1024 * 1024;
	size_t page_size = sysconf(_SC_PAGESIZE);
	struct iovec *iov;
	size_t iov_nr = total / page_size;
	void *junk;
	io_context_t ctx = NULL;
	int nr_iocbs = 3;
	struct iocb iocbs[nr_iocbs];
	struct iocb *iocb_ptrs[nr_iocbs];
	struct io_event events[nr_iocbs];
	int ret;
	int fd;
	int nr;
	int i;

	if (argc != 2) {
		fprintf(stderr, "usage: %s file_to_overwrite\n", argv[0]);
		exit(1);
	}

	iov = calloc(iov_nr, sizeof(*iov));
	junk = malloc(total);
	assert(iov && junk);

	fd = open(argv[1], O_RDWR|O_CREAT|O_DIRECT, 0644);
	assert(fd >= 0);

	ret = io_setup(nr_iocbs, &ctx);
	assert(ret >= 0);

	/* point the iovecs at the pinned aio ring mapping */
	for (i = 0; i < iov_nr; i++) {
		iov[i].iov_base = ctx;
		iov[i].iov_len = page_size;
	}

	/* initial write to allocate the file region */
	ret = writev(fd, iov, iov_nr);
	assert(ret == total);

	/*
	 * Keep one of each of these iocbs in flight:
	 *
	 * [0]: hopefully fast 0 byte read to keep churning events
	 * [1]: dio read of file bytes to trigger csum verification
	 * [2]: dio write of unstable event pages
	 */
	io_prep_pread(&iocbs[0], fd, junk, 0, 0);
	io_prep_pread(&iocbs[1], fd, junk, total, 0);
	io_prep_pwritev(&iocbs[2], fd, iov, iov_nr, 0);

	for (i = 0; i < nr_iocbs; i++)
		iocb_ptrs[i] = &iocbs[i];

	nr = nr_iocbs;
	for (;;) {
		ret = io_submit(ctx, nr, iocb_ptrs);
		assert(ret == nr);

		nr = io_getevents(ctx, 1, nr_iocbs, events, NULL);
		assert(nr > 0);

		for (i = 0; i < nr; i++)
			iocb_ptrs[i] = events[i].obj;
	}

	return 0;
}
Re: testing stable pages being modified
Quoting Zach Brown (2013-05-30 18:36:10)
> 'stable' pages have always been a bit of a fiction. It's easy to
> intentionally modify stable pages under io with some help from page
> references that ignore mappings and page state.
>
> Here's a little test that uses O_DIRECT to get the pinned aio ring pages
> under IO and then has event completion stores modify them while they're
> in flight. It's a nice quick way to test the consequences of stable
> pages being modified. It can be used to burp out ratelimited csum
> failure kernel messages with btrfs, for example.

Changing O_DIRECT pages in flight has always been a deep dark corner case,
and crc errors are the expected result. Have you found anyone doing this in
real life?

I do like the small test program though, we should extend it into a test
to make sure crcs are really crcing.

-chris