btrfs crash with a corrupted(?) filesystem

2014-02-04 Thread Roman Mamedov
Hello,

My server had a period of instability (PSU-related issues), some lockups,
some strange crashes, and some files became corrupted, and perhaps parts of
a filesystem too. One BTRFS partition now fails with the following errors.

On an attempt to make a snapshot:

[   48.035664] btrfs: corrupt leaf, bad key order: block=193529446400,root=1, slot=9
[   48.035795] [ cut here ]
[   48.035840] kernel BUG at fs/btrfs/inode.c:873!
[   48.035884] invalid opcode:  [#1] SMP 
[   48.036000] Modules linked in: cpufreq_stats cpufreq_userspace 
cpufreq_powersave cpufreq_conservative nfsd auth_rpcgss oid_registry nfs_acl 
nfs lockd dns_resolver fscache sunrpc 8021q garp mrp bridge stp llc fuse ext3 
jbd it87 hwmon_vid ecryptfs snd_hda_codec_hdmi acpi_cpufreq 
snd_hda_codec_realtek mperf snd_hda_intel snd_hda_codec kvm_amd snd_pcsp 
snd_hwdep kvm fglrx(PO) snd_pcm_oss sp5100_tco snd_mixer_oss crc32c_intel 
ghash_clmulni_intel snd_pcm eeepc_wmi aesni_intel asus_wmi sparse_keymap 
uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core psmouse rfkill 
snd_page_alloc video videodev edac_mce_amd snd_timer edac_core aes_x86_64 
serio_raw evdev fam15h_power media snd ablk_helper joydev i2c_piix4 cp210x 
cryptd processor usbserial k10temp lrw i2c_core mxm_wmi thermal_sys gf128mul 
wmi glue_helper soundcore ehci_pci button ext4 crc16 jbd2 mbcache btrfs 
zlib_deflate crc32c libcrc32c dm_mod raid1 raid456 async_raid6_recov 
async_memcpy async_pq async_xor async_tx xor raid6_pq md_mod nbd sata_nv sg 
sd_mod crc_t10dif hid_generic usbhid hid sata_promise ohci_hcd ehci_hcd 
microcode xhci_hcd ahci ata_generic libahci libata usbcore r8169 mii usb_common 
scsi_mod
[   48.040775] CPU: 4 PID: 4145 Comm: btrfs Tainted: P   O 3.10.28-rm1+ #54
[   48.040825] Hardware name: To be filled by O.E.M. To be filled by O.E.M./M5A97 LE R2.0, BIOS 1903 07/11/2013
[   48.040876] task: 88040b4dc300 ti: 880409c14000 task.ti: 880409c14000
[   48.040926] RIP: 0010:[a023f225]  [a023f225] __cow_file_range+0x445/0x4e0 [btrfs]
[   48.041042] RSP: 0018:880409c15328  EFLAGS: 00010203
[   48.041086] RAX: 1000 RBX:  RCX: 0102
[   48.041131] RDX: 0103 RSI: 88042e16a2e0 RDI: 88042e146e50
[   48.041177] RBP: 880409c153e8 R08:  R09: 0003
[   48.041222] R10: 0004 R11: 0001 R12: 8804352eb0a0
[   48.041267] R13:  R14: 88042e16a2e0 R15: 
[   48.041318] FS:  7f064ac9e780() GS:88044ed0() knlGS:
[   48.041367] CS:  0010 DS:  ES:  CR0: 8005003b
[   48.041411] CR2: 0046a9f0 CR3: 0004388dc000 CR4: 000407e0
[   48.041456] DR0:  DR1:  DR2: 
[   48.041532] DR3:  DR6: 0ff0 DR7: 0400
[   48.041576] Stack:
[   48.041617]  880409c15388 880430a2e800 8804352eb130 880409c15350
[   48.041813]  0003 88042e16a100 880409c15350 ea000eb312c8
[   48.042006]  88042e146e50 ffff16a0 8804347c0800
[   48.042200] Call Trace:
[   48.042274]  [a0255a89] ? free_extent_buffer+0x59/0xa0 [btrfs]
[   48.042348]  [a023f668] run_delalloc_nocow+0x3a8/0xaf0 [btrfs]
[   48.042421]  [a02401c0] run_delalloc_range+0x330/0x390 [btrfs]
[   48.042495]  [a0254641] __extent_writepage+0x2f1/0x750 [btrfs]
[   48.042570]  [a0254d52] extent_write_cache_pages.isra.31.constprop.47+0x2b2/0x3c0 [btrfs]
[   48.042650]  [a02550d7] extent_writepages+0x47/0x60 [btrfs]
[   48.042752]  [a023bee0] ? can_nocow_odirect+0x330/0x330 [btrfs]
[   48.042823]  [a0239873] btrfs_writepages+0x23/0x30 [btrfs]
[   48.042873]  [8110ea19] do_writepages+0x19/0x40
[   48.042921]  [81104711] __filemap_fdatawrite_range+0x51/0x60
[   48.042969]  [811054ce] filemap_fdatawrite_range+0xe/0x10
[   48.043042]  [a024f9e8] btrfs_wait_ordered_range+0x48/0x110 [btrfs]
[   48.043116]  [a027499a] __btrfs_write_out_cache+0x76a/0x990 [btrfs]
[   48.043187]  [a0232405] ? btrfs_buffer_uptodate+0x65/0x80 [btrfs]
[   48.043261]  [a0274ec2] btrfs_write_out_cache+0xb2/0xf0 [btrfs]
[   48.045346]  [a0255a89] ? free_extent_buffer+0x59/0xa0 [btrfs]
[   48.045414]  [a0227383] btrfs_write_dirty_block_groups+0x573/0x660 [btrfs]
[   48.045489]  [a02353a4] commit_cowonly_roots+0x164/0x260 [btrfs]
[   48.045560]  [a023724c] btrfs_commit_transaction+0x59c/0xab0 [btrfs]
[   48.045614]  [81075030] ? finish_wait+0x80/0x80
[   48.045686]  [a026692a] btrfs_mksubvol.isra.49+0x3aa/0x450 [btrfs]
[   48.045759]  [a0266aba] btrfs_ioctl_snap_create_transid+0xea/0x170 [btrfs]
[   48.045838]  [a0266bfa] ? btrfs_ioctl_snap_create_v2+0x3a/0x140 

Re: btrfs crash with a corrupted(?) filesystem

2014-02-04 Thread Hugo Mills
On Tue, Feb 04, 2014 at 10:23:10PM +0600, Roman Mamedov wrote:
> On an attempt to make a snapshot:
>
> [   48.035664] btrfs: corrupt leaf, bad key order: block=193529446400,root=1, slot=9
[snip]

   Bad key order is pretty much always down to hardware corrupting
data at some point -- which would go well with your list of hardware
problems above.
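For illustration: btrfs keeps the items of a leaf sorted by key, comparing objectid first, then type, then offset, so "bad key order" means some key is not strictly greater than the one before it. The Python sketch below models keys as plain tuples (the symbolic "EXTENT_CSUM" names stand in for the real numeric objectid and type values, an assumption made for readability) and uses the offsets of items 0 to 11 from the leaf dump posted later in this thread. `first_bad_slot` is a hypothetical helper, not part of btrfs-progs; it reports the earlier slot of the offending pair, which appears to be how the kernel's `slot=9` lines up with item 9.

```python
from typing import Optional, Sequence, Tuple

# (objectid, type, offset); the string names are placeholders for the real
# numeric objectid/type, which are identical for every item in this leaf.
Key = Tuple[str, str, int]


def first_bad_slot(keys: Sequence[Key]) -> Optional[int]:
    """Return the first slot whose key is not strictly smaller than the key
    in the next slot, or None if the leaf is correctly ordered."""
    for slot in range(len(keys) - 1):
        # Tuple comparison is lexicographic: objectid, then type, then offset.
        if keys[slot] >= keys[slot + 1]:
            return slot
    return None


# Key offsets of items 0..11 as printed by btrfs-debug-tree for block 193529446400.
offsets = [
    4278808576, 4278853632, 4278919168, 4278931456, 4278976512, 4279001088,
    4279005184, 4279033856, 4279107584, 72998785024, 4279345152, 4279369728,
]
leaf_keys = [("EXTENT_CSUM", "EXTENT_CSUM", off) for off in offsets]

print(first_bad_slot(leaf_keys))  # 9
```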

> Currently I have it mounted read-only, and all data seems to be accessible.
> Short of copying everything away and recreating the FS, how can I bring it
> back to working order? Is btrfsck a good option here?

   The first investigation to do would be to look at the block in
question and see if it's got an obvious problem with it. If you post
the output of btrfs-debug-tree -b 193529446400 /dev/whatever, we can
take a look at the indexing.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- You're never alone with a rubber duck... --- 




Re: btrfs crash with a corrupted(?) filesystem

2014-02-04 Thread Roman Mamedov
On Tue, 4 Feb 2014 16:32:35 +
Hugo Mills h...@carfax.org.uk wrote:

> The first investigation to do would be to look at the block in
> question and see if it's got an obvious problem with it. If you post
> the output of btrfs-debug-tree -b 193529446400 /dev/whatever, we can
> take a look at the indexing.

Thanks; here it is:

# btrfs-debug-tree -b 193529446400 /dev/md4 
leaf 193529446400 items 81 free space 46 generation 565572 owner 7
fs uuid dd12de99-bbe5-45cf-b869-6313c1f58431
chunk uuid b61f845a-ada5-4bcf-b995-7c5e1affa63d
    item 0 key (EXTENT_CSUM EXTENT_CSUM 4278808576) itemoff 3955 itemsize 40
        extent csum item
    item 1 key (EXTENT_CSUM EXTENT_CSUM 4278853632) itemoff 3895 itemsize 60
        extent csum item
    item 2 key (EXTENT_CSUM EXTENT_CSUM 4278919168) itemoff 3883 itemsize 12
        extent csum item
    item 3 key (EXTENT_CSUM EXTENT_CSUM 4278931456) itemoff 3843 itemsize 40
        extent csum item
    item 4 key (EXTENT_CSUM EXTENT_CSUM 4278976512) itemoff 3819 itemsize 24
        extent csum item
    item 5 key (EXTENT_CSUM EXTENT_CSUM 4279001088) itemoff 3815 itemsize 4
        extent csum item
    item 6 key (EXTENT_CSUM EXTENT_CSUM 4279005184) itemoff 3787 itemsize 28
        extent csum item
    item 7 key (EXTENT_CSUM EXTENT_CSUM 4279033856) itemoff 3715 itemsize 72
        extent csum item
    item 8 key (EXTENT_CSUM EXTENT_CSUM 4279107584) itemoff 3619 itemsize 96
        extent csum item
    item 9 key (EXTENT_CSUM EXTENT_CSUM 72998785024) itemoff 3599 itemsize 20
        extent csum item
    item 10 key (EXTENT_CSUM EXTENT_CSUM 4279345152) itemoff 3583 itemsize 16
        extent csum item
    item 11 key (EXTENT_CSUM EXTENT_CSUM 4279369728) itemoff 3575 itemsize 8
        extent csum item
    item 12 key (EXTENT_CSUM EXTENT_CSUM 4279463936) itemoff 3551 itemsize 24
        extent csum item
    item 13 key (EXTENT_CSUM EXTENT_CSUM 4279496704) itemoff 3523 itemsize 28
        extent csum item
    item 14 key (EXTENT_CSUM EXTENT_CSUM 4279525376) itemoff 3515 itemsize 8
        extent csum item
    item 15 key (EXTENT_CSUM EXTENT_CSUM 4279533568) itemoff 3511 itemsize 4
        extent csum item
    item 16 key (EXTENT_CSUM EXTENT_CSUM 4279537664) itemoff 3491 itemsize 20
        extent csum item
    item 17 key (EXTENT_CSUM EXTENT_CSUM 4279562240) itemoff 3463 itemsize 28
        extent csum item
    item 18 key (EXTENT_CSUM EXTENT_CSUM 4279607296) itemoff 3459 itemsize 4
        extent csum item
    item 19 key (EXTENT_CSUM EXTENT_CSUM 4279615488) itemoff 3403 itemsize 56
        extent csum item
    item 20 key (EXTENT_CSUM EXTENT_CSUM 4280020992) itemoff 3395 itemsize 8
        extent csum item
    item 21 key (EXTENT_CSUM EXTENT_CSUM 4280033280) itemoff 3391 itemsize 4
        extent csum item
    item 22 key (EXTENT_CSUM EXTENT_CSUM 4280053760) itemoff 3379 itemsize 12
        extent csum item
    item 23 key (EXTENT_CSUM EXTENT_CSUM 4280082432) itemoff 3331 itemsize 48
        extent csum item
    item 24 key (EXTENT_CSUM EXTENT_CSUM 4280135680) itemoff 3311 itemsize 20
        extent csum item
    item 25 key (EXTENT_CSUM EXTENT_CSUM 4280156160) itemoff 3299 itemsize 12
        extent csum item
    item 26 key (EXTENT_CSUM EXTENT_CSUM 4280168448) itemoff 3255 itemsize 44
        extent csum item
    item 27 key (EXTENT_CSUM EXTENT_CSUM 4280229888) itemoff 3243 itemsize 12
        extent csum item
    item 28 key (EXTENT_CSUM EXTENT_CSUM 4280262656) itemoff 3231 itemsize 12
        extent csum item
    item 29 key (EXTENT_CSUM EXTENT_CSUM 4280360960) itemoff 3223 itemsize 8
        extent csum item
    item 30 key (EXTENT_CSUM EXTENT_CSUM 4280369152) itemoff 3123 itemsize 100
        extent csum item
    item 31 key (EXTENT_CSUM EXTENT_CSUM 4280496128) itemoff 3115 itemsize
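
Finding the bad item by eye is easy here, but the same check can be run mechanically over the text output. A rough Python sketch, assuming each item line has the `item N key (OBJECTID TYPE OFFSET)` shape shown above; `ITEM_RE`, `parse_items` and `bad_order` are illustrative names, not part of btrfs-progs. Because the symbolic names do not carry the numeric ordering, the check only compares neighbours that share the same objectid and type names, which is all this leaf contains.

```python
import re
from typing import Iterator, List, Tuple

# Matches the key part of lines such as
#   item 9 key (EXTENT_CSUM EXTENT_CSUM 72998785024) itemoff 3599 itemsize 20
# itemoff/itemsize are ignored, so lines wrapped after the key still parse.
ITEM_RE = re.compile(r"item\s+(\d+)\s+key\s+\((\S+)\s+(\S+)\s+(\d+)\)")

Item = Tuple[int, str, str, int]  # (item number, objectid, type, offset)


def parse_items(dump: str) -> List[Item]:
    """Extract (item, objectid, type, offset) tuples from debug-tree text."""
    items = []
    for line in dump.splitlines():
        m = ITEM_RE.search(line)
        if m:
            items.append((int(m.group(1)), m.group(2), m.group(3), int(m.group(4))))
    return items


def bad_order(items: List[Item]) -> Iterator[int]:
    """Yield item numbers whose offset is not below the next item's offset.

    Only neighbours with identical objectid/type names are compared.
    """
    for cur, nxt in zip(items, items[1:]):
        if cur[1:3] == nxt[1:3] and cur[3] >= nxt[3]:
            yield cur[0]


sample = """\
    item 8 key (EXTENT_CSUM EXTENT_CSUM 4279107584) itemoff 3619 itemsize 96
        extent csum item
    item 9 key (EXTENT_CSUM EXTENT_CSUM 72998785024) itemoff 3599 itemsize 20
        extent csum item
    item 10 key (EXTENT_CSUM EXTENT_CSUM 4279345152) itemoff 3583 itemsize 16
        extent csum item
"""

print(list(bad_order(parse_items(sample))))  # [9]
```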

Re: btrfs crash with a corrupted(?) filesystem

2014-02-04 Thread Hugo Mills
On Tue, Feb 04, 2014 at 10:35:06PM +0600, Roman Mamedov wrote:
> Thanks; here it is:
>
> # btrfs-debug-tree -b 193529446400 /dev/md4
> leaf 193529446400 items 81 free space 46 generation 565572 owner 7
[snip]
>     item 8 key (EXTENT_CSUM EXTENT_CSUM 4279107584) itemoff 3619 itemsize 96
>         extent csum item
>     item 9 key (EXTENT_CSUM EXTENT_CSUM 72998785024) itemoff 3599 itemsize 20
>         extent csum item

   ^^^ Here it is.

The previous key (item 8):
>>> hex(4279107584)
'0xff0e0000L'

This key (item 9):
>>> hex(72998785024)
'0x10ff111000L'

   So it looks likely that you've got a single bit flip in the key.
Josef had a patch for fsck some time before Christmas that would deal
with (some of) these cases, but I'm not sure if this is one of them.
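
The bit-flip reading can be checked directly. A flip from 0 to 1 can only make the offset larger, so it is enough to try clearing each set bit of item 9's offset and keep the results that fall between item 8 and item 10. A small Python sketch of that search, assuming exactly one bit was flipped (the variable names are illustrative). With the values from this leaf it leaves a single candidate, bit 36 (0x1000000000), giving 0xff111000 (4279308288), which sits between 4279107584 and 4279345152.

```python
prev_off = 4279107584   # item 8
bad_off = 72998785024   # item 9, the suspect key offset
next_off = 4279345152   # item 10

# A 0 -> 1 flip can only increase the value, so try clearing each set bit
# and keep results that restore prev_off < offset < next_off.
candidates = []
for bit in range(64):
    mask = 1 << bit
    if bad_off & mask:
        fixed = bad_off & ~mask
        if prev_off < fixed < next_off:
            candidates.append((bit, fixed))

for bit, fixed in candidates:
    print(f"bit {bit}: {hex(fixed)} = {fixed}")
```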

   Hugo.


