oops at mount

2013-05-30 Thread Papp Tamas

hi All,

I'm new on the list.

System:
Distributor ID: Ubuntu
Description:    Ubuntu 13.04
Release:        13.04
Codename:       raring

Linux ctu 3.8.0-19-generic #30-Ubuntu SMP Wed May 1 16:35:23 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

The symptom is the same with Saucy 3.9 kernel.

ii  btrfs-tools   0.20~git20130524~650e656-0daily13~raring1 amd64  Checksumming Copy on Write Filesystem utilities



I also tried btrfs-tools v0.19 before with no luck.


$ btrfsck --repair /dev/sda1
enabling repair mode
parent transid verify failed on 430612480 wanted 81016 found 81011
parent transid verify failed on 430612480 wanted 81016 found 81011
parent transid verify failed on 430612480 wanted 81016 found 81011
parent transid verify failed on 430612480 wanted 81016 found 81011
Ignoring transid failure
Checking filesystem on /dev/sda1
UUID: deed1ffb-27bb-4555-b5ce-8a3c8ee5612c
checking extents
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
checking csums
checking root refs
found 67570520064 bytes used err is 0
total csum bytes: 65168792
total tree bytes: 789745664
total fs tree bytes: 651145216
total extent tree bytes: 50372608
btree space waste bytes: 192929190
file data blocks allocated: 80764424192
 referenced 69347667968
Btrfs v0.20-rc1
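
The "parent transid verify failed" lines above can be read mechanically: "wanted" is the generation the parent node recorded for the block, and "found" is the generation stored in the block that was actually read. The following is only a minimal sketch for pulling those numbers out of such a line; the function name and the "behind" field are made up for illustration and are not part of btrfsck.

```python
import re

# Pattern for the message format shown in the btrfsck output above.
TRANSID_RE = re.compile(
    r"parent transid verify failed on (\d+) wanted (\d+) found (\d+)"
)

def parse_transid_failure(line):
    """Return the numbers from one transid failure line, or None if no match."""
    m = TRANSID_RE.search(line)
    if m is None:
        return None
    bytenr, wanted, found = (int(g) for g in m.groups())
    # "behind" (hypothetical field) is how many generations older the block is
    # than its parent expects.
    return {"bytenr": bytenr, "wanted": wanted, "found": found,
            "behind": wanted - found}

result = parse_transid_failure(
    "parent transid verify failed on 430612480 wanted 81016 found 81011"
)
```

For the line in this report, the block at 430612480 is five generations older than the parent expects.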


If I mount, I get an oops message. The machine does not freeze completely, but I have to reboot it to
be able to use it again.



[   69.257107] btrfsck[2703]: segfault at 7ff069802710 ip 7ff063ceecbd sp 7fff9bb5db70 error 4 in libc-2.17.so[7ff063c6f000+1be000]

[  480.799981] device fsid deed1ffb-27bb-4555-b5ce-8a3c8ee5612c devid 1 transid 81010 /dev/sda1
[  480.802507] btrfs: disk space caching is enabled
[  480.851534] Btrfs detected SSD devices, enabling SSD mode
[  480.863245] btrfs bad tree block start 0 413601792
[  480.863320] btrfs bad tree block start 0 413601792
[  480.863389] ------------[ cut here ]------------
[  480.863426] Kernel BUG at a03d3b6a [verbose debug info unavailable]
[  480.863459] invalid opcode:  [#1] SMP
[  480.863490] Modules linked in: ip6table_filter(F) ip6_tables(F) xt_state(F) ipt_REJECT(F) 
xt_CHECKSUM(F) iptable_mangle(F) xt_tcpudp(F) iptable_filter(F) ipt_MASQUERADE(F) iptable_nat(F) 
nf_conntrack_ipv4(F) nf_defrag_ipv4(F) nf_nat_ipv4(F) nf_nat(F) nf_conntrack(F) ip_tables(F) 
x_tables(F) bridge(F) stp(F) llc(F) pci_stub vboxpci(OF) vboxnetadp(OF) vboxnetflt(OF) vboxdrv(OF) 
rfcomm bnep snd_hda_codec_hdmi snd_hda_codec_idt binfmt_misc(F) qcserial usb_wwan usbserial 
pata_pcmcia arc4(F) hid_generic coretemp kvm_intel iwldvm kvm mac80211 ghash_clmulni_intel(F) 
aesni_intel(F) aes_x86_64(F) xts(F) lrw(F) gf128mul(F) ablk_helper(F) cryptd(F) usbhid hid joydev(F) 
tpm_infineon hp_wmi sparse_keymap uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core 
videodev pcmcia microcode(F) btusb bluetooth psmouse(F) serio_raw(F) intel_ips btrfs(F) tpm_tis 
libcrc32c(F) zlib_deflate(F) sdhci_pci snd_hda_intel sdhci snd_hda_codec snd_hwdep(F) snd_pcm(F) 
firewire_ohci snd_page_alloc(F) firewire_core snd_seq_midi(F) snd_seq_midi_event(F) crc_itu_t(F) 
yenta_socket pcmcia_rsrc i915 pcmcia_core snd_rawmidi(F) drm_kms_helper snd_seq(F) hp_accel drm 
lis3lv02d snd_seq_device(F) input_polldev snd_timer(F) wmi iwlwifi snd(F) video(F) mac_hid cfg80211 
lpc_ich i2c_algo_bit mei e1000e(F) soundcore(F) lp(F) parport(F) ahci(F) libahci(F)

[  480.864322] CPU 3
[  480.864338] Pid: 5550, comm: mount Tainted: GF  O 3.8.0-19-generic #30-Ubuntu 
Hewlett-Packard HP EliteBook 2540p/7008
[  480.864386] RIP: 0010:[a03d3b6a]  [a03d3b6a] 
btrfs_recover_log_trees+0x23a/0x390 [btrfs]

[  480.864474] RSP: 0018:88012ad41b40  EFLAGS: 00010282
[  480.864499] RAX: fffb RBX: 88018b91c000 RCX: 0001801c001b
[  480.864531] RDX: 0001801c001c RSI: 801c001b RDI: 8801b20b3900
[  480.864563] RBP: 88012ad41bf0 R08:  R09: 0001
[  480.864594] R10:  R11:  R12: 88014fc0a5a0
[  480.864625] R13: 88011d2f0e40 R14: 88018b91a800 R15: 8801ab3ea000
[  480.864656] FS:  7fb531818840() GS:8801bbcc() 
knlGS:
[  480.864693] CS:  0010 DS:  ES:  CR0: 8005003b
[  480.864718] CR2: 006a5000 CR3: 00016800b000 CR4: 07e0
[  480.864750] DR0:  DR1:  DR2: 
[  480.864781] DR3:  DR6: 0ff0 DR7: 0400
[  480.864813] Process mount (pid: 5550, threadinfo 88012ad4, task 
880128522e80)
[  480.864847] Stack:
[  480.864860]  8801b0e5ce40 88012ad41b98 fffa 
ff84
[  480.864905]  faff 010684ff 0106 
ff84
[  480.864947]  faff 84ff 0106 

[  480.864990] Call Trace:
[  

Re: oops at mount

2013-05-30 Thread Josef Bacik
On Thu, May 30, 2013 at 05:17:06AM -0600, Papp Tamas wrote:
 hi All,
 
 I'm new on the list.
 
 System:
 Distributor ID:   Ubuntu
 Description:  Ubuntu 13.04
 Release:  13.04
 Codename: raring
 
 Linux ctu 3.8.0-19-generic #30-Ubuntu SMP Wed May 1 16:35:23 UTC 2013 x86_64 
 x86_64 x86_64 GNU/Linux
 
 The symptom is the same with Saucy 3.9 kernel.

Can you try btrfs-next?

git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git

If it's still not fixed, please file a bug at bugzilla.kernel.org and make sure
the component is set to btrfs.  Thanks,

Josef
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Btrfs metadata corruption; unmountable FS

2013-05-30 Thread Josef Bacik
On Wed, May 29, 2013 at 08:55:31PM -0600, Alex Marquez wrote:
 I'm not entirely sure what went completely wrong.  Three possibilities are 
 most likely, and they're listed below.
 For reference, here are supplemental materials split out into their own 
 pastebins:
 * btrfs-debug-tree -R log http://pastebin.com/7ePy9sin
 * dmesg log http://pastebin.com/s1sdJRyd
 (btrfs tools are git head)
 Mounting with recovery,ro is no use.
 I've also taken a metadata dump with btrfs-image, though it completed with 
 errors, so the dump may be incomplete.  It's also 5 GBs, but I'm more than 
 willing to make it publicly downloadable if it would help the cause.
 
 ** 1
 Firstly, I have a raid1 (and, as I'll explain, partially raid10) array of 8 
 raw drives.  A couple experience a controller error every once in a while.  
 So it /may/ be the case that the hardware itself caused this problem, but I 
 find it less likely than the following other two possibilities.  (However, in 
 part 3's log there is some mention of sdf giving IO errors...)
 
 ** 2
 A couple of months ago I was doing a balance, trying to convert from raid10 
 to raid1.  At the time, it was on the 3.6 kernel.
 
 I kept getting enospc errors (even with plenty of space), so I went from 
 doing a soft conversion to a hard one.  Of course, in the process my server 
 was hard-rebooted by accident.  When back online, I used btrfsck and it 
 showed a bunch of extent vs. csum problems, which I used --repair to attempt 
 to deal with. 
 
 Though I can't recall the problems exactly, I do remember that it triggered 
 an odd check regarding csums existing for extents that were freed.
 The commit which introduced this printf was 
 https://git.kernel.org/cgit/linux/kernel/git/mason/btrfs-progs.git/commit/?id=580ccf9e2ef4607f5b67b531190e7842c4b2b0db
 
 Since then, every once in a while I would do another balance (sometimes soft, 
 sometimes hard) in an attempt to complete the conversion -- to no avail, but 
 seemingly to no harm.
 
 ** 3
 Now, 2 weeks ago I (foolishly) thought I'd try the new skinny extents feature 
 (mistaking it as available in 3.9) in order to see if it might alleviate the 
 issues I've had with trying to finish that conversion.  I enabled it via 
 btrfstune, but quickly noted that my 3.9 kernel wouldn't mount the filesystem 
 anymore (because of the incompatible feature).
 
 However, nothing had changed on-disk (given I wasn't running 3.10) but the 
 flag...  So I looked into clearing that flag, but btrfstune provided me no 
 recourse.  So I did something very dangerous and foolish:  I went into 
 btrfstune.c and changed the setting of the flag to clear the flag instead, 
 then reran it.  I mounted again, fingers crossed, and lo and behold, it was 
 fine!
 
 Unfortunately, after some use, the filesystem failed and went read-only.  
 That's when I got scared and decided it was time to stop trying to fix things 
 myself (of course, far too late).
 
 The actual log is at http://pastebin.com/s1sdJRyd
 On line 85 you can see where I tried to mount it
 Line 87 is where I remounted after my btrfstune hack

May 17 18:13:25 norman kernel: [ 1677.876008]   item 1 key (51401449938944 a9 
0) itemoff 3911 itemsize 33

So it did actually get a skinny extent in there; that's the skinny extent item
key.  Option #1: set the flag again and move to btrfs-next/3.10.  Seems like you
are smart enough to do basic things, so if you don't like that option, option #2
is to fix btrfsck to go through and delete any extent entry that has
BTRFS_METADATA_ITEM_KEY, and then --repair should put them back normally.  For
option #2 you don't need to set the flag again; leave it unset and then
add a function to cmds-check.c right before check_extents() and have it just go
through the extent tree and delete any entries with that key, and then
check_extents() will take care of the rest.  This is a bit dangerous though, so
I'd really recommend option #1.  Thanks,

Josef
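
As a rough illustration of what the message above describes, the sketch below parses the key out of the quoted dmesg line and filters a list of keys by type. It is not btrfsck code: the real change would be C in cmds-check.c operating on the extent tree, and parse_key() and drop_skinny_items() are hypothetical names. The key type values 0xa8 and 0xa9 are the on-disk values of BTRFS_EXTENT_ITEM_KEY and BTRFS_METADATA_ITEM_KEY; the leaf dump prints the type in hex.

```python
import re

# Key type values from the btrfs on-disk format.
BTRFS_EXTENT_ITEM_KEY = 0xA8    # 168, regular extent item
BTRFS_METADATA_ITEM_KEY = 0xA9  # 169, skinny metadata extent item

# Matches the "key (objectid type offset)" part of a leaf dump line.
KEY_RE = re.compile(r"key \((\d+) ([0-9a-f]+) (\d+)\)")

def parse_key(line):
    """Return (objectid, type, offset) from a leaf dump line, or None."""
    m = KEY_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2), 16), int(m.group(3))

def drop_skinny_items(keys):
    """Hypothetical model of the extra pass: keep every key except skinny ones."""
    return [k for k in keys if k[1] != BTRFS_METADATA_ITEM_KEY]

line = ("May 17 18:13:25 norman kernel: [ 1677.876008]   item 1 key "
        "(51401449938944 a9 0) itemoff 3911 itemsize 33")
key = parse_key(line)
is_skinny = key[1] == BTRFS_METADATA_ITEM_KEY
```

In this toy model the entries that remain are what a later repair pass would work from; whether --repair really rebuilds the removed entries is Josef's claim above, not something the sketch verifies.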


Re: oops at mount

2013-05-30 Thread Stefan Behrens
On Thu, 30 May 2013 08:32:35 -0400, Josef Bacik wrote:
 On Thu, May 30, 2013 at 05:17:06AM -0600, Papp Tamas wrote:
 hi All,

 I'm new on the list.

 System:
 Distributor ID:  Ubuntu
 Description: Ubuntu 13.04
 Release: 13.04
 Codename:raring

 Linux ctu 3.8.0-19-generic #30-Ubuntu SMP Wed May 1 16:35:23 UTC 2013 x86_64 
 x86_64 x86_64 GNU/Linux

 The symptom is the same with Saucy 3.9 kernel.
 
 Can you try btrfs-next
 
 git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git
 
 if it's still not fixed please file a bug at bugzilla.kernel.org and make sure
 the component is set to btrfs.  Thanks,

Papp is using an Intel X18-M/X25-M/X25-V G2 SSD. At least with an Intel
X25 SSD that identifies itself with INTEL SSDSA2M080 and on one with
the ID INTEL SSDSA2M040, I've tested whether they honor the flush
request. And these two SSDs don't do so, they ignore it. If you cut the
power after a flush request completes, the data that was written before
the flush request is gone, the write cache was _not_ flushed.

The only options are to disable the write cache during/after every boot with
hdparm -W 0 /dev/sd... (which reduces the SSD's write speed to about 4 MB/s),
to avoid such SSDs, or to be prepared to restore from backup occasionally.



Re: oops at mount

2013-05-30 Thread Chris Mason
Quoting Stefan Behrens (2013-05-30 08:55:58)
 On Thu, 30 May 2013 08:32:35 -0400, Josef Bacik wrote:
  On Thu, May 30, 2013 at 05:17:06AM -0600, Papp Tamas wrote:
  hi All,
 
  I'm new on the list.
 
  System:
  Distributor ID:  Ubuntu
  Description: Ubuntu 13.04
  Release: 13.04
  Codename:raring
 
  Linux ctu 3.8.0-19-generic #30-Ubuntu SMP Wed May 1 16:35:23 UTC 2013 
  x86_64 x86_64 x86_64 GNU/Linux
 
  The symptom is the same with Saucy 3.9 kernel.
  
  Can you try btrfs-next
  
  git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git
  
  if it's still not fixed please file a bug at bugzilla.kernel.org and make 
  sure
  the component is set to btrfs.  Thanks,
 
 Papp is using an Intel X18-M/X25-M/X25-V G2 SSD. At least with an Intel
 X25 SSD that identifies itself with INTEL SSDSA2M080 and on one with
 the ID INTEL SSDSA2M040, I've tested whether they honor the flush
 request. And these two SSDs don't do so, they ignore it. If you cut the
 power after a flush request completes, the data that was written before
 the flush request is gone, the write cache was _not_ flushed.
 
 You can only disable the write cache during/after every boot hdparm -W
 0 /dev/sd... (which reduces the SSDs write speed to about 4 MB/s), or
 avoid such SSDs, or prepare to restore from backup occasionally.

Hi Stefan,

How did you verify this?  I'm sure intel will want to hear about it if
we can reproduce on all filesystems.

-chris



Re: oops at mount

2013-05-30 Thread Stefan Behrens
On Thu, 30 May 2013 10:03:29 -0400, Chris Mason wrote:
 Quoting Stefan Behrens (2013-05-30 08:55:58)
 Papp is using an Intel X18-M/X25-M/X25-V G2 SSD. At least with an Intel
 X25 SSD that identifies itself with INTEL SSDSA2M080 and on one with
 the ID INTEL SSDSA2M040, I've tested whether they honor the flush
 request. And these two SSDs don't do so, they ignore it. If you cut the
 power after a flush request completes, the data that was written before
 the flush request is gone, the write cache was _not_ flushed.

 You can only disable the write cache during/after every boot hdparm -W
 0 /dev/sd... (which reduces the SSDs write speed to about 4 MB/s), or
 avoid such SSDs, or prepare to restore from backup occasionally.
 
 Hi Stefan,
 
 How did you verify this?  I'm sure intel will want to hear about it if
 we can reproduce on all filesystems.
 
 -chris
 

We have written a kernel module that (among other things) is able to write 4KB
blocks of random data at random locations on an SSD, and in a second step
to read and verify that data.

The test procedure to check SSDs is:
1. Write 4KB blocks of random data to random locations on the disk. Send
a submit_bio(REQ_FLUSH) after each 4KB block. Log the completion of the
write request and of the flush request together with the result value.
2. Somewhere in the middle of operation, switch off all power, drive
presence and SAS data pins between the SSD and the SATA host controller.
3. Wait some time, afterwards enable the connection between the SSD and
the host controller again.
4. Read back the 4KB blocks of random data at random locations using the
same seed value that was used to generate the contents and location when
the blocks were written. Verify the data, log whether the verification
succeeded or failed.
5. Compare the log of the write and flush request completion with the
one of the read and verify process.

SSDs that honor the flush request don't cause verify errors for blocks
where the write bio and the flush bio completed successfully. Those two
Intel SSDs that I mentioned failed this test. Other Intel SSD types
passed the test.
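
The procedure above can be sketched in user space against a simulated drive. This is not the kernel module mentioned in this message: FakeSSD, plan() and run_test() are hypothetical names, the "drive" is an in-memory model with a volatile write cache, and the power cut is just a method call. The part taken from the description is that the same seed regenerates both the locations and the contents, so the verify pass needs nothing but the seed and the completion log.

```python
import random

BLOCK = 4096  # 4KB blocks, as in the procedure above

def plan(seed, count, device_blocks):
    """Regenerate the same (block number, payload) sequence from a seed."""
    rng = random.Random(seed)
    blocks = rng.sample(range(device_blocks), count)  # distinct locations
    return [(b, bytes(rng.getrandbits(8) for _ in range(BLOCK))) for b in blocks]

class FakeSSD:
    """In-memory model of a drive with a volatile write cache (not a real device)."""
    def __init__(self, honors_flush):
        self.honors_flush = honors_flush
        self.media = {}  # survives a power cut
        self.cache = {}  # lost on a power cut

    def write(self, block, data):
        self.cache[block] = data
        return 0  # completion status

    def flush(self):
        if self.honors_flush:
            self.media.update(self.cache)
            self.cache.clear()
        return 0  # a drive that ignores the flush still reports success

    def power_cut(self):
        self.cache.clear()

    def read(self, block):
        return self.media.get(block)

def run_test(ssd, seed=1234, count=32, device_blocks=1024, cut_after=20):
    """Return the acknowledged blocks that fail verification after a power cut."""
    acked = []  # blocks whose write and flush both completed successfully
    for block, data in plan(seed, count, device_blocks)[:cut_after]:
        if ssd.write(block, data) == 0 and ssd.flush() == 0:
            acked.append(block)
    ssd.power_cut()
    expected = dict(plan(seed, count, device_blocks))  # same seed, same data
    return [b for b in acked if ssd.read(b) != expected[b]]

good = run_test(FakeSSD(honors_flush=True))
bad = run_test(FakeSSD(honors_flush=False))
```

In this model a drive that honors the flush produces no verify errors for acknowledged blocks, while one that ignores it loses every acknowledged block.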

Maybe a firmware update would fix this issue; I suppose it will, but I have
never tried it. My intention was not to blame the SSD manufacturer; in
fact, I like their SSDs very much and buy and use them frequently. I
just wanted to spare Josef the headache of questioning the Btrfs
implementation. The issue that Papp described looks just like a power
failure in conjunction with a storage device that ignores flush requests.



Re: oops at mount

2013-05-30 Thread Chris Mason
Quoting Stefan Behrens (2013-05-30 10:59:59)
 On Thu, 30 May 2013 10:03:29 -0400, Chris Mason wrote:
  Quoting Stefan Behrens (2013-05-30 08:55:58)
  Papp is using an Intel X18-M/X25-M/X25-V G2 SSD. At least with an Intel
  X25 SSD that identifies itself with INTEL SSDSA2M080 and on one with
  the ID INTEL SSDSA2M040, I've tested whether they honor the flush
  request. And these two SSDs don't do so, they ignore it. If you cut the
  power after a flush request completes, the data that was written before
  the flush request is gone, the write cache was _not_ flushed.
 
  You can only disable the write cache during/after every boot hdparm -W
  0 /dev/sd... (which reduces the SSDs write speed to about 4 MB/s), or
  avoid such SSDs, or prepare to restore from backup occasionally.
  
  Hi Stefan,
  
  How did you verify this?  I'm sure intel will want to hear about it if
  we can reproduce on all filesystems.
  
  -chris
  
 
 We have written a kernel module that (among others) is able to write 4KB
 block of random data at random locations on an SSD, and in a second step
 to read and verify that data.
 
 The test procedure to check SSDs is:
 1. Write 4KB blocks of random data to random locations on the disk. Send
 a submit_bio(REQ_FLUSH) after each 4KB block. Log the completion of the
 write request and of the flush request together with the result value.
 2. Somewhere in the middle of operation, switch off all power, drive
 presence and SAS data pins between the SSD and the SATA host controller.
 3. Wait some time, afterwards enable the connection between the SSD and
 the host controller again.
 4. Read back the 4KB blocks of random data at random locations using the
 same seed value that was used to generate the contents and location when
 the blocks were written. Verify the data, log whether the verification
 succeeded or failed.
 5. Compare the log of the write and flush request completion with the
 one of the read and verify process.
 
 SSDs that honor the flush request don't cause verify errors for blocks
 where the write bio and the flush bio completed successfully. Those two
 Intel SSDs that I mentioned failed this test. Other Intel SSD types
 succeeded the test.
 
 Maybe a firmware update would fix this issue, I suppose it will, I have
 never tried it. My intention was not to blame the SSD manufacturer, in
 fact, I like their SSDs very much and buy and use them frequently. I
 just wanted to prevent Josef from the headache to question the Btrfs
 implementation. The issue that Papp described looks just like a power
 failure in conjunction with a storage device that ignores flush requests.

It's definitely useful information.  The gen2's did have some problems
(mine failed as well) but I didn't realize how bad the powercut handling
was.

-chris



Re: nocow 'C' flag ignored after balance

2013-05-30 Thread Kyle Gates

On Wed, May 29, 2013 Miao Xie wrote:

On wed, 29 May 2013 10:55:11 +0900, Liu Bo wrote:

On Tue, May 28, 2013 at 09:22:11AM -0500, Kyle Gates wrote:

From: Liu Bo bo.li@oracle.com

Subject: [PATCH] Btrfs: fix broken nocow after a normal balance


[...]

Sorry for the long wait in replying.
This patch was unsuccessful in fixing the problem (on my 3.8 Ubuntu
Raring kernel). I can probably try again on a newer version if you
think it will help.
This was my first kernel compile so I patched by hand and waited (10
hours on my old 32 bit single core machine).

I did move some of the files off and back on to the filesystem to
start fresh and compare but all seem to exhibit the same behavior
after a balance.



Thanks for testing the patch although it didn't help you.
Actually I tested it to be sure that it fixed the problems in my 
reproducer.


So anyway can you please apply this debug patch in order to nail it down?


Your patch cannot fix the above problem because we may update ->last_snapshot
after we relocate the file data extent.

For example, there are two block groups which will be relocated: one is a data
block group, the other is a metadata block group. We relocate the data block
group first, and set the new generation for the file data extent item/the
related extent item and set (new_generation - 1) for ->last_snapshot. After the
relocation of this block group, we end the transaction and drop the relocation
tree. If we ended the space balance now, we wouldn't break the nocow rule,
because ->last_snapshot is less than the generation of the file data extent
item/the related extent item. But there is still one block group to be
relocated; when relocating the second block group, we also start a new
transaction and update ->last_snapshot if needed. So ->last_snapshot becomes
greater than the generation of the file data extent item we set before, and the
nocow rule is broken.

Back to the above problem: I don't think it is a serious problem. We only do COW
once after the relocation, and then we will still honour the nocow rule. The
behaviour is similar to a snapshot.

So maybe it needn't be fixed.


I would argue that for large vm workloads, running a balance or adding disks 
is a common practice that will result in a drastic drop in performance as 
well as massive increases in metadata writes and fragmentation.
In my case my disks were thrashing severely, performance was poor and ntp 
couldn't even hold my clock stable.

If the fix is nontrivial please add this to the todo list.
Thanks,
Kyle

If we must fix this problem, I think the only way is to get the generation at
the beginning of the space balance, and then set ->last_snapshot to it if
->last_snapshot is less than it; don't use (current_generation - 1) to update
->last_snapshot. Besides that, don't forget to store the generation into
btrfs_balance_item, or the problem will happen after we resume the balance.
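
The two-block-group scenario and the proposed fix from this thread can be modelled with a few integers. This is a toy model, not kernel code: can_nocow() and balance() are hypothetical names, one transaction per relocated block group is assumed, and the balance is assumed to start at an already committed generation, so every relocated extent gets a strictly newer generation.

```python
def can_nocow(extent_generation, last_snapshot):
    """Nocow rule as described in the thread: an extent may be overwritten in
    place only if it is newer than ->last_snapshot."""
    return extent_generation > last_snapshot

def balance(start_generation, block_groups, pin_start_generation):
    """Relocate block groups in order and report whether the file's data
    extent can still be written nocow afterwards."""
    generation = start_generation
    last_snapshot = 0
    file_extent_generation = None
    for group in block_groups:
        generation += 1  # new transaction for this block group
        if pin_start_generation:
            # Proposed fix: never advance ->last_snapshot past the generation
            # recorded at the beginning of the balance.
            last_snapshot = max(last_snapshot, start_generation)
        else:
            # Behaviour described in the thread: (current generation - 1).
            last_snapshot = max(last_snapshot, generation - 1)
        if group == "data":
            file_extent_generation = generation  # extent item is rewritten
    return can_nocow(file_extent_generation, last_snapshot)

only_data = balance(100, ["data"], pin_start_generation=False)
data_then_metadata = balance(100, ["data", "metadata"], pin_start_generation=False)
with_fix = balance(100, ["data", "metadata"], pin_start_generation=True)
```

With only the data block group relocated, the extent stays nocow. Relocating the metadata block group afterwards pushes ->last_snapshot up to the extent's generation and forces one COW. Pinning the start generation avoids that in this model.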

Thanks
Miao



thanks,
liubo

[...]








Re: Btrfsck errors (fs tree 565 refs 125 not found) - how serious

2013-05-30 Thread Clemens Eisserer
Hi again,

I am able to induce the btrfsck errors I experienced using a synthetic
workload on a fresh filesystem with linux-3.10.0.rc2.
However, as filing the bug report would take quite some time (uploading
512mb trace-files, writing a short read-me, ...) I wonder whether this
is an issue worth reporting, or maybe just caused by an outdated
version of btrfsck (I am using
btrfs-progs-0.20.rc1.20130308git704a08c-1).

Regards, Clemens

 fs tree 565 refs 125 not found
 unresolved ref root 807 dir 813347 index 277 namelen 39 name
 snapshot_1368273601_2013-05-11_14:00:01 error 600
  ..


Re: Btrfs metadata corruption; unmountable FS

2013-05-30 Thread Alex Marquez
Oh, I see... Well at least now I know. Thanks!

I'll probably go for the safer route of using 3.10... Though I'd like to know
how stable the current RC is wrt btrfs, or whether I should wait for the
release instead.

~Alex

On May 30, 2013, at 8:52 AM, Josef Bacik jba...@fusionio.com wrote:

 On Wed, May 29, 2013 at 08:55:31PM -0600, Alex Marquez wrote:
 I'm not entirely sure what went completely wrong.  Three possibilities are 
 most likely, and they're listed below.
 For reference, here are supplemental materials split out into their own 
 pastebins:
 * btrfs-debug-tree -R log http://pastebin.com/7ePy9sin
 * dmesg log http://pastebin.com/s1sdJRyd
 (btrfs tools are git head)
 Mounting with recovery,ro is no use.
 I've also taken a metadata dump with btrfs-image, though it completed with 
 errors, so the dump may be incomplete.  It's also 5 GBs, but I'm more than 
 willing to make it publicly downloadable if it would help the cause.
 
 ** 1
 Firstly, I have a raid1 (and, as I'll explain, partially raid10) array of 8 
 raw drives.  A couple experience a controller error every once in a while.  
 So it /may/ be the case that the hardware itself caused this problem, but I 
 find it less likely than the following other two possibilities.  (However, 
 in part 3's log there is some mention of sdf giving IO errors...)
 
 ** 2
 A couple of months ago I was doing a balance, trying to convert from raid10 
 to raid1.  At the time, it was on the 3.6 kernel.
 
 I kept getting enospc errors (even with plenty of space), so I went from 
 doing a soft conversion to a hard one.  Of course, in the process my server 
 was hard-rebooted by accident.  When back online, I used btrfsck and it 
 showed a bunch of extent vs. csum problems, which I used --repair to attempt 
 to deal with. 
 
 Though I can't recall the problems exactly, I do remember that it triggered 
 an odd check regarding csums existing for extents that were freed.
 The commit which introduced this printf was 
 https://git.kernel.org/cgit/linux/kernel/git/mason/btrfs-progs.git/commit/?id=580ccf9e2ef4607f5b67b531190e7842c4b2b0db
 
 Since then, every once in a while I would do another balance (sometimes 
 soft, sometimes hard) in an attempt to complete the conversion -- to no 
 avail, but seemingly to no harm.
 
 ** 3
 Now, 2 weeks ago I (foolishly) thought I'd try the new skinny extents 
 feature (mistaking it as available in 3.9) in order to see if it might 
 alleviate the issues I've had with trying to finish that conversion.  I 
 enabled it via btrfstune, but quickly noted that my 3.9 kernel wouldn't 
 mount the filesystem anymore (because of the incompatible feature).
 
 However, nothing had changed on-disk (given I wasn't running 3.10) but the 
 flag...  So I looked into clearing that flag, but btrfstune provided me no 
 recourse.  So I did something very dangerous and foolish:  I went into 
 btrfstune.c and changed the setting of the flag to clear the flag instead, 
 then reran it.  I mounted again, fingers crossed, and lo and behold, it was 
 fine!
 
 Unfortunately, after some use, the filesystem failed and went read-only.  
 That's when I got scared and decided it was time to stop trying to fix 
 things myself (of course, far too late).
 
 The actual log is at http://pastebin.com/s1sdJRyd
 On line 85 you can see where I tried to mount it
 Line 87 is where I remounted after my btrfstune hack
 
 May 17 18:13:25 norman kernel: [ 1677.876008]   item 1 key (51401449938944 a9 
 0) itemoff 3911 itemsize 33
 
 So it did actually get a skinny extent in there, thats the skinny extent item
 key.  You'll have to reset the flag and move to btrfs-next/3.10.  Seems like 
 you
 are smart enough to do basic things so if don't like that option you can just
 fix btrfsck to go through and delete any extent entry that has
 BTRFS_METADATA_ITEM_KEY and then --repair should put them back normally.  If 
 you
 want to do option #2 you don't need to reset the flag, leave it unset and then
 add a function to cmds-check.c right before check_extents() and have it just 
 go
 through the extent tree and delete any entries with that key, and then
 check_extents() will take care of the rest.  This is a bit dangerous though so
 I'd really recommend option #1.  Thanks,
 
 Josef


Re: Btrfsck errors (fs tree 565 refs 125 not found) - how serious

2013-05-30 Thread Josef Bacik
On Thu, May 30, 2013 at 12:06:50PM -0600, Clemens Eisserer wrote:
 Hi again,
 
 I am able to induce the btrfsck errors I experienced using a synthetic
 workload on a fresh filesystem with linux-3.10.0.rc2.
 However as filing the bug-report would take quite some time (uploading
 512mb trace-files, writing a short read-me, ...) I wonder whether this
 is an issue worth reporting, or maybe just caused by an outdated
 version of btrfsck (I am using
 btrfs-progs-0.20.rc1.20130308git704a08c-1).
 

You are running on an unmounted fs right?  Also please make sure you are running
the git version

git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git

Thanks,

Josef


Re: oops at mount

2013-05-30 Thread Stefan Behrens

On 05/30/2013 13:17, Papp Tamas wrote:

[...]

[PATCH] Btrfs: stop all workers before cleaning up roots

2013-05-30 Thread Josef Bacik
Dave reported a panic because extent_root->commit_root was NULL in the
caching kthread.  That is because we just unset it in free_root_pointers, which
is not the correct thing to do: we have to either wait for the caching kthread
to complete or hold the extent_commit_sem lock so we know the thread has exited.
This patch makes all the kthreads stop first, and only then do we do our
cleanup.  This should fix the race.  Thanks,

Reported-by: David Sterba <dste...@suse.cz>
Signed-off-by: Josef Bacik <jba...@fusionio.com>
---
 fs/btrfs/disk-io.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 2b53afd..77cb566 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3547,13 +3547,13 @@ int close_ctree(struct btrfs_root *root)
 
 	btrfs_free_block_groups(fs_info);
 
-	free_root_pointers(fs_info, 1);
+	btrfs_stop_all_workers(fs_info);
 
 	del_fs_roots(fs_info);
 
-	iput(fs_info->btree_inode);
+	free_root_pointers(fs_info, 1);
 
-	btrfs_stop_all_workers(fs_info);
+	iput(fs_info->btree_inode);
 
 #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
 	if (btrfs_test_opt(root, CHECK_INTEGRITY))
-- 
1.7.7.6
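
For illustration only, and not part of the patch: a minimal userspace sketch of
the ordering the patch establishes, assuming a pthread stands in for the caching
kthread.  Every name here (fake_fs_info, fake_caching_worker, fake_open,
fake_close) is hypothetical and none of it is kernel API.  The point is only
that the worker is stopped and joined before the pointer it dereferences is
freed and cleared.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for btrfs_fs_info; not kernel code. */
struct fake_fs_info {
	int *commit_root;		/* plays the role of extent_root->commit_root */
	atomic_int stop;		/* tells the worker to exit */
	pthread_t worker;
	unsigned long bad_reads;	/* written by the worker, read after join */
};

/* Plays the role of the caching kthread: keeps dereferencing commit_root. */
static void *fake_caching_worker(void *arg)
{
	struct fake_fs_info *fs = arg;

	while (!atomic_load(&fs->stop)) {
		if (*fs->commit_root != 42)
			fs->bad_reads++;
	}
	return NULL;
}

/* Allocate the shared state and start the worker.  Returns 0 on success. */
static int fake_open(struct fake_fs_info *fs)
{
	fs->commit_root = malloc(sizeof(*fs->commit_root));
	if (!fs->commit_root)
		return -1;
	*fs->commit_root = 42;
	fs->bad_reads = 0;
	atomic_init(&fs->stop, 0);
	if (pthread_create(&fs->worker, NULL, fake_caching_worker, fs) != 0) {
		free(fs->commit_root);
		fs->commit_root = NULL;
		return -1;
	}
	return 0;
}

static void fake_close(struct fake_fs_info *fs)
{
	/* Step 1: stop the worker and wait until it has really exited. */
	atomic_store(&fs->stop, 1);
	pthread_join(fs->worker, NULL);

	/* Step 2: only now is it safe to drop what the worker was using. */
	free(fs->commit_root);
	fs->commit_root = NULL;
}
```

Built with something like `cc -pthread`, a caller runs fake_open() followed by
fake_close().  Swapping the two steps in fake_close() would let the worker
dereference a freed or NULL pointer, which is the shape of the race described
in the commit message.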

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


testing stable pages being modified

2013-05-30 Thread Zach Brown
'stable' pages have always been a bit of a fiction.  It's easy to
intentionally modify stable pages under io with some help from page
references that ignore mappings and page state.

Here's a little test that uses O_DIRECT to get the pinned aio ring pages
under IO and then has event completion stores modify them while they're
in flight.

It's a nice quick way to test the consequences of stable pages being
modified.  It can be used to burp out ratelimited csum failure kernel
messages with btrfs, for example.

- z

#define _GNU_SOURCE
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <limits.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/uio.h>
#include <assert.h>
#include <libaio.h>

int main(int argc, char **argv)
{
	size_t total = 1 * 1024 * 1024;
	size_t page_size = sysconf(_SC_PAGESIZE);
	struct iovec *iov;
	size_t iov_nr = total / page_size;
	void *junk;
	io_context_t ctx = NULL;
	int nr_iocbs = 3;
	struct iocb iocbs[nr_iocbs];
	struct iocb *iocb_ptrs[nr_iocbs];
	struct io_event events[nr_iocbs];
	int ret;
	int fd;
	int nr;
	int i;

	if (argc != 2) {
		fprintf(stderr, "usage: %s file_to_overwrite\n", argv[0]);
		exit(1);
	}

	iov = calloc(iov_nr, sizeof(*iov));
	junk = malloc(total);
	assert(iov && junk);

	fd = open(argv[1], O_RDWR|O_CREAT|O_DIRECT, 0644);
	assert(fd >= 0);

	ret = io_setup(nr_iocbs, &ctx);
	assert(ret >= 0);

	/* every iovec points at the aio ring page that ctx refers to */
	for (i = 0; i < iov_nr; i++) {
		iov[i].iov_base = ctx;
		iov[i].iov_len = page_size;
	}

	/* initial write to allocate the file region */
	ret = writev(fd, iov, iov_nr);
	assert(ret == total);

	/*
	 * Keep one of each of these iocbs in flight:
	 *
	 * [0]: hopefully fast 0 byte read to keep churning events
	 * [1]: dio read of file bytes to trigger csum verification
	 * [2]: dio write of unstable event pages
	 */
	io_prep_pread(&iocbs[0], fd, junk, 0, 0);
	io_prep_pread(&iocbs[1], fd, junk, total, 0);
	io_prep_pwritev(&iocbs[2], fd, iov, iov_nr, 0);

	for (i = 0; i < nr_iocbs; i++)
		iocb_ptrs[i] = &iocbs[i];
	nr = nr_iocbs;

	for (;;) {
		ret = io_submit(ctx, nr, iocb_ptrs);
		assert(ret == nr);

		nr = io_getevents(ctx, 1, nr_iocbs, events, NULL);
		assert(nr > 0);

		/* resubmit exactly the iocbs that just completed */
		for (i = 0; i < nr; i++)
			iocb_ptrs[i] = events[i].obj;
	}

	return 0;
}


Re: testing stable pages being modified

2013-05-30 Thread Chris Mason
Quoting Zach Brown (2013-05-30 18:36:10)
> 'stable' pages have always been a bit of a fiction.  It's easy to
> intentionally modify stable pages under io with some help from page
> references that ignore mappings and page state.
>
> Here's a little test that uses O_DIRECT to get the pinned aio ring pages
> under IO and then has event completion stores modify them while they're
> in flight.
>
> It's a nice quick way to test the consequences of stable pages being
> modified.  It can be used to burp out ratelimited csum failure kernel
> messages with btrfs, for example.

Changing O_DIRECT buffers while the IO is in flight has always been a deep
dark corner case, and crc errors are the expected result.  Have you found
anyone doing this in real life?

I do like the small test program though, we should extend it into a test
to make sure crcs are really crcing.

-chris
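
As a rough sketch of why a store into a page under IO shows up as a csum
failure, here is a small userspace model.  It assumes a plain bitwise CRC-32C
and a hypothetical helper, csum_fails_after_store(); it models the idea only
and is not the btrfs on-disk checksum code or an extension of the test program
above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t crc32c(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint32_t crc = 0xFFFFFFFFu;
	int bit;

	while (len--) {
		crc ^= *p++;
		for (bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/*
 * Hypothetical model of the race: checksum a page when the write is
 * submitted, let a store land in the page while it is "in flight", then
 * verify what actually reached the "disk".
 *
 * A store_offset at or beyond the page size means no store happens.
 * Returns 1 if verification fails, 0 if it passes.
 */
static int csum_fails_after_store(size_t store_offset)
{
	unsigned char page[4096];
	unsigned char disk[4096];
	uint32_t csum;

	memset(page, 0xAB, sizeof(page));
	csum = crc32c(page, sizeof(page));	/* computed at submit time */

	if (store_offset < sizeof(page))
		page[store_offset] ^= 0x01;	/* store lands during the IO */

	memcpy(disk, page, sizeof(disk));	/* device sees the modified page */
	return crc32c(disk, sizeof(disk)) != csum;
}
```

Calling csum_fails_after_store(100) reports a mismatch because the checksum
was taken before the store, while an offset past the end of the page leaves
the data untouched and the check passes.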
