Re: Hard crash on 4.9.5

2017-03-13 Thread Omar Sandoval
On Mon, Mar 13, 2017 at 10:58:29PM +0100, Kai Krakow wrote:
> On Sat, 28 Jan 2017 15:50:38 -0500,
> Matt McKinnon wrote:
> 
> > This same file system (which crashed again with the same errors) is
> > also giving this output during a metadata or data balance:
> 
> This looks similar to the err=-17 that I am experiencing when
> using a VirtualBox image on btrfs in CoW mode (compress=lzo).
> 
> During IO-intensive workloads, it results in "object already exists,
> err -17" (or similar; someone else also experienced it through another
> workload). The resulting btrfs check shows the same errors, reporting
> inodes without csum.
> 
> Trying to continue using this file system in successive boots usually
> results in boot freezes or a completely unmountable filesystem, broken
> beyond repair.
> 
> My impression is that using the bfq elevator also lets me trigger
> this bug without VirtualBox, i.e. during normal system usage, and
> mostly during boot when IO load is very high. So I also stopped
> using bfq, although it gave me much better interactivity.
> 
> Marking vbox images nocow and using the standard elevators (cfq,
> deadline) has exposed no such problems so far, even under excessive
> IO load.
> 
> EOM

This sounds similar to a bug I fixed here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8e2bd3b7fac91b79a6115fd1511ca20b2a09696d

That change is in v4.10. If you're not already running a kernel version
with that fix, could you check if that solves it?
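
For anyone wanting to script the "am I already on a kernel with that fix" check, here is a minimal sketch. It only compares the leading version numbers of a `uname -r` style string against v4.10, the release named above. The function names are made up for illustration, and the check knows nothing about stable or distribution backports, so a "False" result does not prove the fix is absent.

```python
import re

FIX_VERSION = (4, 10)  # mainline release named above as containing the fix


def parse_kernel_release(release):
    """Return the leading numeric components of a `uname -r` style string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.groups() if part is not None)


def has_fix(release, fix_version=FIX_VERSION):
    """True if major.minor is at or after the mainline release with the fix.

    Assumption: backports to older stable kernels are not considered.
    """
    return parse_kernel_release(release)[:2] >= fix_version


print(has_fix("4.9.5-custom"))  # the kernel from the original report: False
```

On a live system the string could come from `platform.release()` instead of a literal.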
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Hard crash on 4.9.5

2017-03-13 Thread Kai Krakow
On Sat, 28 Jan 2017 15:50:38 -0500,
Matt McKinnon wrote:

> This same file system (which crashed again with the same errors) is
> also giving this output during a metadata or data balance:

This looks similar to the err=-17 that I am experiencing when
using a VirtualBox image on btrfs in CoW mode (compress=lzo).

During IO-intensive workloads, it results in "object already exists,
err -17" (or similar; someone else also experienced it through another
workload). The resulting btrfs check shows the same errors, reporting
inodes without csum.

Trying to continue using this file system in successive boots usually
results in boot freezes or a completely unmountable filesystem, broken
beyond repair.

My impression is that using the bfq elevator also lets me trigger
this bug without VirtualBox, i.e. during normal system usage, and
mostly during boot when IO load is very high. So I also stopped
using bfq, although it gave me much better interactivity.

Marking vbox images nocow and using the standard elevators (cfq,
deadline) has exposed no such problems so far, even under excessive
IO load.
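
To confirm which elevator a disk is actually using, the sketch below reads the sysfs scheduler file, where the active entry is the one in square brackets (for example "noop deadline [cfq]"). The helper names and the device name "sda" are assumptions for illustration. The nocow attribute itself is normally set with `chattr +C`, which only takes full effect on new or empty files; that step is not covered here.

```python
import re
from pathlib import Path


def active_scheduler(sysfs_text):
    """Return the bracketed (active) entry of a queue/scheduler line, or None."""
    match = re.search(r"\[([^\]]+)\]", sysfs_text)
    return match.group(1) if match else None


def read_active_scheduler(device):
    """Read /sys/block/<device>/queue/scheduler and return the active elevator."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return active_scheduler(path.read_text())


# Works on a literal line; read_active_scheduler("sda") needs a real device.
print(active_scheduler("noop deadline [cfq]"))
```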

EOM

> Jan 27 19:42:47 my_machine kernel: [  335.018123] BTRFS info (device 
> sda1): no csum found for inode 28472371 start 2191360
> Jan 27 19:42:47 my_machine kernel: [  335.018128] BTRFS info (device 
> sda1): no csum found for inode 28472371 start 2195456
> Jan 27 19:42:47 my_machine kernel: [  335.018491] BTRFS info (device 
> sda1): no csum found for inode 28472371 start 4018176
> Jan 27 19:42:47 my_machine kernel: [  335.018496] BTRFS info (device 
> sda1): no csum found for inode 28472371 start 4022272
> Jan 27 19:42:47 my_machine kernel: [  335.018499] BTRFS info (device 
> sda1): no csum found for inode 28472371 start 4026368
> Jan 27 19:42:47 my_machine kernel: [  335.018502] BTRFS info (device 
> sda1): no csum found for inode 28472371 start 4030464
> Jan 27 19:42:47 my_machine kernel: [  335.019443] BTRFS info (device 
> sda1): no csum found for inode 28472371 start 6156288
> Jan 27 19:42:47 my_machine kernel: [  335.019688] BTRFS info (device 
> sda1): no csum found for inode 28472371 start 7933952
> Jan 27 19:42:47 my_machine kernel: [  335.019693] BTRFS info (device 
> sda1): no csum found for inode 28472371 start 7938048
> Jan 27 19:42:47 my_machine kernel: [  335.019754] BTRFS info (device 
> sda1): no csum found for inode 28472371 start 8077312
> Jan 27 19:42:47 my_machine kernel: [  335.025485] BTRFS warning (device
> sda1): csum failed ino 28472371 off 2191360 csum 4031061501 expected csum 0
> Jan 27 19:42:47 my_machine kernel: [  335.025490] BTRFS warning (device
> sda1): csum failed ino 28472371 off 2195456 csum 2371784003 expected csum 0
> Jan 27 19:42:47 my_machine kernel: [  335.025526] BTRFS warning (device
> sda1): csum failed ino 28472371 off 4018176 csum 3812080098 expected csum 0
> Jan 27 19:42:47 my_machine kernel: [  335.025531] BTRFS warning (device
> sda1): csum failed ino 28472371 off 4022272 csum 2776681411 expected csum 0
> Jan 27 19:42:47 my_machine kernel: [  335.025534] BTRFS warning (device
> sda1): csum failed ino 28472371 off 4026368 csum 1179241675 expected csum 0
> Jan 27 19:42:47 my_machine kernel: [  335.025540] BTRFS warning (device
> sda1): csum failed ino 28472371 off 4030464 csum 1256914217 expected csum 0
> Jan 27 19:42:47 my_machine kernel: [  335.026142] BTRFS warning (device
> sda1): csum failed ino 28472371 off 7933952 csum 2695958066 expected csum 0
> Jan 27 19:42:47 my_machine kernel: [  335.026147] BTRFS warning (device
> sda1): csum failed ino 28472371 off 7938048 csum 3260800596 expected csum 0
> Jan 27 19:42:47 my_machine kernel: [  335.026934] BTRFS warning (device
> sda1): csum failed ino 28472371 off 6156288 csum 4293116449 expected csum 0
> Jan 27 19:42:47 my_machine kernel: [  335.033249] BTRFS warning (device
> sda1): csum failed ino 28472371 off 8077312 csum 4031878292 expected csum 0
> 
> Can these be ignored?
> 
> 
> On 01/25/2017 04:06 PM, Liu Bo wrote:
> > On Mon, Jan 23, 2017 at 03:03:55PM -0500, Matt McKinnon wrote:  
> >> Wondering what to do about this error which says 'reboot needed'.
> >> It has happened three times in the past week:
> >>  
> >
> > Well, I don't think btrfs's logic here is wrong. The following stack
> > shows that an NFS client has sent a second unlink against the same
> > inode while somehow the inode was not fully deleted by the first
> > unlink.
> >
> > So it would be good if you could add some debugging information to
> > get us further.
> >
> > Thanks,
> >
> > -liubo
> >  
> >> Jan 23 14:16:17 my_machine kernel: [ 2568.595648] BTRFS error
> >> (device sda1): err add delayed dir index item(index: 23810) into
> >> the deletion tree of the delayed node(root id: 257, inode id:
> >> 2661433, errno: -17) Jan 23 14:16:17 my_machine kernel:
> >> [ 2568.611010] [ cut here ]
> >> Jan 23 14:16:17 my_machine kernel: [ 2568.615628] kernel BUG at
> >> 

Hard crash on 4.9.5, part 2

2017-01-30 Thread Matt McKinnon
This file system has an error that I also had in the distant past, where
the mount would fail with a "file exists" error.  Running a btrfs check
gives the following over and over again:


Found file extent holes:
start: 0, len: 290816
root 257 inode 28472371 errors 1000, some csum missing
root 257 inode 28472416 errors 1000, some csum missing
root 257 inode 9182183 errors 1000, some csum missing
root 257 inode 9182186 errors 1000, some csum missing
root 257 inode 28419536 errors 1100, file extent discount, some csum missing
Found file extent holes:
start: 0, len: 290816
root 257 inode 28472371 errors 1000, some csum missing
root 257 inode 28472416 errors 1000, some csum missing
root 257 inode 9182183 errors 1000, some csum missing
root 257 inode 9182186 errors 1000, some csum missing
root 257 inode 28419536 errors 1100, file extent discount, some csum missing


Are these reported once per subvolume snapshot I have, and will the
output eventually end?
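
Since the check output repeats the same block, one way to see how many distinct inodes are really affected is to collapse the repeats. The sketch below does that for lines of the form shown above; the regular expression and function name are assumptions based only on this excerpt, not on any documented btrfs-progs output format. It cannot tell whether the repeats come from snapshots.

```python
import re

# Matches lines such as "root 257 inode 28472371 errors 1000, some csum missing".
LINE_RE = re.compile(r"root (\d+) inode (\d+) errors ([0-9a-f]+), (.*)")

SAMPLE = """\
root 257 inode 28472371 errors 1000, some csum missing
root 257 inode 28472416 errors 1000, some csum missing
root 257 inode 9182183 errors 1000, some csum missing
root 257 inode 9182186 errors 1000, some csum missing
root 257 inode 28419536 errors 1100, file extent discount, some csum missing
root 257 inode 28472371 errors 1000, some csum missing
root 257 inode 28472416 errors 1000, some csum missing
root 257 inode 9182183 errors 1000, some csum missing
root 257 inode 9182186 errors 1000, some csum missing
root 257 inode 28419536 errors 1100, file extent discount, some csum missing
"""


def unique_inode_errors(check_output):
    """Map (root, inode) to (error code, description), keeping the first hit."""
    seen = {}
    for line in check_output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            root, inode, code, text = match.groups()
            seen.setdefault((int(root), int(inode)), (code, text))
    return seen


for (root, inode), (code, text) in unique_inode_errors(SAMPLE).items():
    print(root, inode, code, text)
```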

Here is the crash after the mount (with recovery/usebackuproot):

[  627.233213] BTRFS warning (device sda1): 'recovery' is deprecated, 
use 'usebackuproot' instead
[  627.233216] BTRFS info (device sda1): trying to use backup root at 
mount time

[  627.233218] BTRFS info (device sda1): disk space caching is enabled
[  627.233220] BTRFS info (device sda1): has skinny extents
[  709.234688] [ cut here ]
[  709.234734] WARNING: CPU: 5 PID: 3468 at fs/btrfs/file.c:546 
btrfs_drop_extent_cache+0x3e8/0x400 [btrfs]
[  709.234735] Modules linked in: ipmi_devintf nfsd auth_rpcgss nfs_acl 
nfs lockd grace sunrpc fscache lp parport intel_rapl sb_edac
 edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel 
xt_tcpudp kvm nf_conntrack_ipv4 nf_defrag_ipv4 irqbypass crct10d
if_pclmul crc32_pclmul ghash_clmulni_intel xt_conntrack aesni_intel 
btrfs nf_conntrack aes_x86_64 lrw gf128mul iptable_filter glue_h
elper ip_tables ablk_helper cryptd x_tables dm_multipath joydev mei_me 
ioatdma mei lpc_ich wmi ipmi_si ipmi_msghandler shpchp mac_hi
d ses enclosure scsi_transport_sas raid10 raid456 async_raid6_recov 
async_memcpy async_pq async_xor async_tx xor hid_generic megarai
d_sas raid6_pq ahci libcrc32c libahci igb usbhid raid1 hid i2c_algo_bit 
raid0 dca ptp multipath pps_core linear dm_mirror dm_region_

hash dm_log
[  709.234812] CPU: 5 PID: 3468 Comm: mount Not tainted 4.9.5-custom #1
[  709.234813] Hardware name: Supermicro 
X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b 04/28/2014
[  709.234816]  bd3784bb7568 8e3c8e7c  

[  709.234820]  bd3784bb75a8 8e07d3d1 02220070 
9e5f0ae4d150
[  709.234823]  0002d000 9e5f0bc91f78 9e5f0bc91da8 
0002c000

[  709.234827] Call Trace:
[  709.234837]  [] dump_stack+0x63/0x87
[  709.234846]  [] __warn+0xd1/0xf0
[  709.234850]  [] warn_slowpath_null+0x1d/0x20
[  709.234874]  [] btrfs_drop_extent_cache+0x3e8/0x400 
[btrfs]
[  709.234895]  [] __btrfs_drop_extents+0x5b2/0xd30 
[btrfs]
[  709.234914]  [] ? 
generic_bin_search.constprop.36+0x8b/0x1e0 [btrfs]
[  709.234931]  [] ? btrfs_set_path_blocking+0x36/0x70 
[btrfs]

[  709.234942]  [] ? kmem_cache_alloc+0x194/0x1a0
[  709.234958]  [] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
[  709.234977]  [] btrfs_drop_extents+0x79/0xa0 [btrfs]
[  709.235002]  [] replay_one_extent+0x414/0x7b0 [btrfs]
[  709.235007]  [] ? autoremove_wake_function+0x40/0x40
[  709.235030]  [] replay_one_buffer+0x4cc/0x7c0 [btrfs]
[  709.235053]  [] ? 
mark_extent_buffer_accessed+0x4f/0x70 [btrfs]

[  709.235074]  [] walk_down_log_tree+0x1ba/0x3b0 [btrfs]
[  709.235094]  [] walk_log_tree+0xb4/0x1a0 [btrfs]
[  709.235114]  [] btrfs_recover_log_trees+0x20e/0x460 
[btrfs]

[  709.235133]  [] ? replay_one_extent+0x7b0/0x7b0 [btrfs]
[  709.235154]  [] open_ctree+0x2640/0x27f0 [btrfs]
[  709.235171]  [] btrfs_mount+0xca4/0xec0 [btrfs]
[  709.235176]  [] ? find_next_zero_bit+0x1e/0x20
[  709.235180]  [] ? pcpu_next_unpop+0x3e/0x50
[  709.235184]  [] ? find_next_bit+0x19/0x20
[  709.235190]  [] mount_fs+0x39/0x160
[  709.235193]  [] ? __alloc_percpu+0x15/0x20
[  709.235196]  [] vfs_kern_mount+0x67/0x110
[  709.235213]  [] btrfs_mount+0x18b/0xec0 [btrfs]
[  709.235216]  [] ? find_next_zero_bit+0x1e/0x20
[  709.235220]  [] mount_fs+0x39/0x160
[  709.235223]  [] ? __alloc_percpu+0x15/0x20
[  709.235225]  [] vfs_kern_mount+0x67/0x110
[  709.235228]  [] do_mount+0x1bb/0xc80
[  709.235232]  [] ? kmem_cache_alloc_trace+0x14b/0x1b0
[  709.235235]  [] SyS_mount+0x83/0xd0
[  709.235240]  [] entry_SYSCALL_64_fastpath+0x1e/0xad
[  709.235243] ---[ end trace d4e5dcddb432b7d3 ]---
[  709.354972] BTRFS: error (device sda1) in btrfs_replay_log:2506: 
errno=-17 Object already exists (Failed to recover log tree)
[  709.355570] BTRFS error (device sda1): cleaner transaction attach 
returned -30

[  709.548919] BTRFS error (device sda1): open_ctree failed


-Matt

Re: Hard crash on 4.9.5

2017-01-28 Thread Matt McKinnon
This same file system (which crashed again with the same errors) is also 
giving this output during a metadata or data balance:


Jan 27 19:42:47 my_machine kernel: [  335.018123] BTRFS info (device 
sda1): no csum found for inode 28472371 start 2191360
Jan 27 19:42:47 my_machine kernel: [  335.018128] BTRFS info (device 
sda1): no csum found for inode 28472371 start 2195456
Jan 27 19:42:47 my_machine kernel: [  335.018491] BTRFS info (device 
sda1): no csum found for inode 28472371 start 4018176
Jan 27 19:42:47 my_machine kernel: [  335.018496] BTRFS info (device 
sda1): no csum found for inode 28472371 start 4022272
Jan 27 19:42:47 my_machine kernel: [  335.018499] BTRFS info (device 
sda1): no csum found for inode 28472371 start 4026368
Jan 27 19:42:47 my_machine kernel: [  335.018502] BTRFS info (device 
sda1): no csum found for inode 28472371 start 4030464
Jan 27 19:42:47 my_machine kernel: [  335.019443] BTRFS info (device 
sda1): no csum found for inode 28472371 start 6156288
Jan 27 19:42:47 my_machine kernel: [  335.019688] BTRFS info (device 
sda1): no csum found for inode 28472371 start 7933952
Jan 27 19:42:47 my_machine kernel: [  335.019693] BTRFS info (device 
sda1): no csum found for inode 28472371 start 7938048
Jan 27 19:42:47 my_machine kernel: [  335.019754] BTRFS info (device 
sda1): no csum found for inode 28472371 start 8077312
Jan 27 19:42:47 my_machine kernel: [  335.025485] BTRFS warning (device 
sda1): csum failed ino 28472371 off 2191360 csum 4031061501 expected csum 0
Jan 27 19:42:47 my_machine kernel: [  335.025490] BTRFS warning (device 
sda1): csum failed ino 28472371 off 2195456 csum 2371784003 expected csum 0
Jan 27 19:42:47 my_machine kernel: [  335.025526] BTRFS warning (device 
sda1): csum failed ino 28472371 off 4018176 csum 3812080098 expected csum 0
Jan 27 19:42:47 my_machine kernel: [  335.025531] BTRFS warning (device 
sda1): csum failed ino 28472371 off 4022272 csum 2776681411 expected csum 0
Jan 27 19:42:47 my_machine kernel: [  335.025534] BTRFS warning (device 
sda1): csum failed ino 28472371 off 4026368 csum 1179241675 expected csum 0
Jan 27 19:42:47 my_machine kernel: [  335.025540] BTRFS warning (device 
sda1): csum failed ino 28472371 off 4030464 csum 1256914217 expected csum 0
Jan 27 19:42:47 my_machine kernel: [  335.026142] BTRFS warning (device 
sda1): csum failed ino 28472371 off 7933952 csum 2695958066 expected csum 0
Jan 27 19:42:47 my_machine kernel: [  335.026147] BTRFS warning (device 
sda1): csum failed ino 28472371 off 7938048 csum 3260800596 expected csum 0
Jan 27 19:42:47 my_machine kernel: [  335.026934] BTRFS warning (device 
sda1): csum failed ino 28472371 off 6156288 csum 4293116449 expected csum 0
Jan 27 19:42:47 my_machine kernel: [  335.033249] BTRFS warning (device 
sda1): csum failed ino 28472371 off 8077312 csum 4031878292 expected csum 0


Can these be ignored?
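
To compare the "no csum found" and "csum failed" messages without reading them line by line, the sketch below groups the offsets per inode. It is written against the wrapped log format as it appears in this mail, so it first collapses whitespace; the function name and sample are illustrative only. It does not say whether the messages are safe to ignore, only which offsets each message type names.

```python
import re
from collections import defaultdict

NO_CSUM_RE = re.compile(r"no csum found for inode (\d+) start (\d+)")
CSUM_FAILED_RE = re.compile(
    r"csum failed ino (\d+) off (\d+) csum (\d+) expected csum (\d+)"
)

SAMPLE = """\
Jan 27 19:42:47 my_machine kernel: [  335.018123] BTRFS info (device
sda1): no csum found for inode 28472371 start 2191360
Jan 27 19:42:47 my_machine kernel: [  335.018128] BTRFS info (device
sda1): no csum found for inode 28472371 start 2195456
Jan 27 19:42:47 my_machine kernel: [  335.025485] BTRFS warning (device
sda1): csum failed ino 28472371 off 2191360 csum 4031061501 expected csum 0
Jan 27 19:42:47 my_machine kernel: [  335.025490] BTRFS warning (device
sda1): csum failed ino 28472371 off 2195456 csum 2371784003 expected csum 0
"""


def summarize_csum_messages(log_text):
    """Return ({inode: offsets without csum}, {inode: offsets that failed})."""
    flat = re.sub(r"\s+", " ", log_text)  # undo the mail client's line wrapping
    missing = defaultdict(set)
    failed = defaultdict(set)
    for inode, start in NO_CSUM_RE.findall(flat):
        missing[int(inode)].add(int(start))
    for inode, off, _csum, _expected in CSUM_FAILED_RE.findall(flat):
        failed[int(inode)].add(int(off))
    return dict(missing), dict(failed)


missing, failed = summarize_csum_messages(SAMPLE)
for inode in sorted(missing):
    print(inode, sorted(missing[inode]), sorted(failed.get(inode, ())))
```

Run over the full log above, the same grouping shows whether every failed offset also has a matching "no csum found" entry.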


On 01/25/2017 04:06 PM, Liu Bo wrote:

On Mon, Jan 23, 2017 at 03:03:55PM -0500, Matt McKinnon wrote:

Wondering what to do about this error which says 'reboot needed'.  It has
happened three times in the past week:



Well, I don't think btrfs's logic here is wrong. The following stack
shows that an NFS client has sent a second unlink against the same inode
while somehow the inode was not fully deleted by the first unlink.

So it would be good if you could add some debugging information to get us
further.

Thanks,

-liubo


Jan 23 14:16:17 my_machine kernel: [ 2568.595648] BTRFS error (device sda1):
err add delayed dir index item(index: 23810) into the deletion tree of the
delayed node(root id: 257, inode id: 2661433, errno: -17)
Jan 23 14:16:17 my_machine kernel: [ 2568.611010] [ cut here
]
Jan 23 14:16:17 my_machine kernel: [ 2568.615628] kernel BUG at
fs/btrfs/delayed-inode.c:1557!
Jan 23 14:16:17 my_machine kernel: [ 2568.620942] invalid opcode:  [#1]
SMP
Jan 23 14:16:17 my_machine kernel: [ 2568.624960] Modules linked in: ufs
qnx4 hfsplus hfs minix ntfs msdos jfs xfs ipt_REJECT nf_rej
ect_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack
nf_conntrack iptable_filter ip_tables x_tables ipmi_devintf nfsd au
th_rpcgss nfs_acl nfs lockd grace sunrpc fscache intel_rapl sb_edac
edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_int
el kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper crypt
d dm_multipath joydev mei_me mei lpc_ich ioatdma wmi ipmi_si ipmi_msghandler
btrfs shpchp mac_hid lp parport ses enclosure scsi_tran
sport_sas raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor
async_tx xor raid6_pq libcrc32c igb hid_generic i2c_algo_
bit raid1 dca usbhid ahci raid0 ptp megaraid_sas multipath
Jan 23 14:16:17 my_machine kernel: [ 2568.697150]  hid libahci pps_core
linear dm_mirror dm_region_hash dm_log
Jan 23 14:16:17 my_machine kernel: [ 2568.702689] CPU: 0 PID: 2440 Comm:
nfsd Tainted: GW   4.9.5-custom #1
Jan 23 14:16:17 

Re: Hard crash on 4.9.5

2017-01-25 Thread Liu Bo
On Mon, Jan 23, 2017 at 03:03:55PM -0500, Matt McKinnon wrote:
> Wondering what to do about this error which says 'reboot needed'.  It has
> happened three times in the past week:
> 

Well, I don't think btrfs's logic here is wrong. The following stack
shows that an NFS client has sent a second unlink against the same inode
while somehow the inode was not fully deleted by the first unlink.

So it would be good if you could add some debugging information to get us
further.

Thanks,

-liubo

> Jan 23 14:16:17 my_machine kernel: [ 2568.595648] BTRFS error (device sda1):
> err add delayed dir index item(index: 23810) into the deletion tree of the
> delayed node(root id: 257, inode id: 2661433, errno: -17)
> Jan 23 14:16:17 my_machine kernel: [ 2568.611010] [ cut here
> ]
> Jan 23 14:16:17 my_machine kernel: [ 2568.615628] kernel BUG at
> fs/btrfs/delayed-inode.c:1557!
> Jan 23 14:16:17 my_machine kernel: [ 2568.620942] invalid opcode:  [#1]
> SMP
> Jan 23 14:16:17 my_machine kernel: [ 2568.624960] Modules linked in: ufs
> qnx4 hfsplus hfs minix ntfs msdos jfs xfs ipt_REJECT nf_rej
> ect_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack
> nf_conntrack iptable_filter ip_tables x_tables ipmi_devintf nfsd au
> th_rpcgss nfs_acl nfs lockd grace sunrpc fscache intel_rapl sb_edac
> edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_int
> el kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
> aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper crypt
> d dm_multipath joydev mei_me mei lpc_ich ioatdma wmi ipmi_si ipmi_msghandler
> btrfs shpchp mac_hid lp parport ses enclosure scsi_tran
> sport_sas raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor
> async_tx xor raid6_pq libcrc32c igb hid_generic i2c_algo_
> bit raid1 dca usbhid ahci raid0 ptp megaraid_sas multipath
> Jan 23 14:16:17 my_machine kernel: [ 2568.697150]  hid libahci pps_core
> linear dm_mirror dm_region_hash dm_log
> Jan 23 14:16:17 my_machine kernel: [ 2568.702689] CPU: 0 PID: 2440 Comm:
> nfsd Tainted: GW   4.9.5-custom #1
> Jan 23 14:16:17 my_machine kernel: [ 2568.710166] Hardware name: Supermicro
> X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b 04/28
> /2014
> Jan 23 14:16:17 my_machine kernel: [ 2568.719207] task: 95a42addab80
> task.stack: b9da8533
> Jan 23 14:16:17 my_machine kernel: [ 2568.725124] RIP:
> 0010:[]  []
> btrfs_delete_delayed_dir_inde
> x+0x286/0x290 [btrfs]
> Jan 23 14:16:17 my_machine kernel: [ 2568.735604] RSP: 0018:b9da85333be0
> EFLAGS: 00010286
> Jan 23 14:16:17 my_machine kernel: [ 2568.740917] RAX:  RBX:
> 95a3b104b690 RCX: 
> Jan 23 14:16:17 my_machine kernel: [ 2568.748048] RDX: 0001 RSI:
> 95a42fc0dcc8 RDI: 95a42fc0dcc8
> Jan 23 14:16:17 my_machine kernel: [ 2568.755171] RBP: b9da85333c48 R08:
> 0491 R09: 
> Jan 23 14:16:17 my_machine kernel: [ 2568.762297] R10: 0005 R11:
> 0006 R12: 95a3b104b6d8
> Jan 23 14:16:17 my_machine kernel: [ 2568.769429] R13: 5d02 R14:
> 95a82953d800 R15: ffef
> Jan 23 14:16:17 my_machine kernel: [ 2568.776555] FS: ()
> GS:95a42fc0() knlGS:
> Jan 23 14:16:17 my_machine kernel: [ 2568.784639] CS:  0010 DS:  ES:
>  CR0: 80050033
> Jan 23 14:16:17 my_machine kernel: [ 2568.790377] CR2: 7f12ea376000 CR3:
> 0003e1e07000 CR4: 001406f0
> Jan 23 14:16:17 my_machine kernel: [ 2568.797503] Stack:
> Jan 23 14:16:17 my_machine kernel: [ 2568.799524]  9b7fe5f2
> 95a3b104b560 0004 95a3f96b3e80
> Jan 23 14:16:17 my_machine kernel: [ 2568.806983]  95a3f96b3e80
> 39ff95a814eeeb68 6000289c 5d02
> Jan 23 14:16:17 my_machine kernel: [ 2568.814436]  95a3f7457c40
> 95a3bcb74138 95a814eeeb68 00289c39
> Jan 23 14:16:17 my_machine kernel: [ 2568.821891] Call Trace:
> Jan 23 14:16:17 my_machine kernel: [ 2568.824343]  [] ?
> mutex_lock+0x12/0x2f
> Jan 23 14:16:17 my_machine kernel: [ 2568.829671]  []
> __btrfs_unlink_inode+0x198/0x4c0 [btrfs]
> Jan 23 14:16:17 my_machine kernel: [ 2568.836555]  []
> btrfs_unlink_inode+0x1c/0x40 [btrfs]
> Jan 23 14:16:17 my_machine kernel: [ 2568.843086]  []
> btrfs_unlink+0x6b/0xb0 [btrfs]
> Jan 23 14:16:17 my_machine kernel: [ 2568.849091]  []
> vfs_unlink+0xda/0x190
> Jan 23 14:16:17 my_machine kernel: [ 2568.854315]  [] ?
> lookup_one_len+0xd3/0x130
> Jan 23 14:16:17 my_machine kernel: [ 2568.860075]  []
> nfsd_unlink+0x16e/0x210 [nfsd]
> Jan 23 14:16:17 my_machine kernel: [ 2568.866084]  []
> nfsd3_proc_remove+0x7c/0x110 [nfsd]
> Jan 23 14:16:17 my_machine kernel: [ 2568.872529]  []
> nfsd_dispatch+0xb8/0x1f0 [nfsd]
> Jan 23 14:16:17 my_machine kernel: [ 2568.878641]  []
> svc_process_common+0x43f/0x700 [sunrpc]
> Jan 23 14:16:17 my_machine kernel: [ 2568.885432]  []
> 

Re: Hard crash on 4.9.5

2017-01-25 Thread Liu Bo
On Mon, Jan 23, 2017 at 09:27:22PM +0100, Hans van Kranenburg wrote:
> On 01/23/2017 09:03 PM, Matt McKinnon wrote:
> > Wondering what to do about this error which says 'reboot needed'.  It has
> > happened three times in the past week:
> > 
> > Jan 23 14:16:17 my_machine kernel: [ 2568.595648] BTRFS error (device
> > sda1): err add delayed dir index item(index: 23810) into the deletion
> > tree of the delayed node(root id: 257, inode id: 2661433, errno: -17)
> > Jan 23 14:16:17 my_machine kernel: [ 2568.611010] [ cut here
> > ]
> > Jan 23 14:16:17 my_machine kernel: [ 2568.615628] kernel BUG at
> > fs/btrfs/delayed-inode.c:1557!
> > Jan 23 14:16:17 my_machine kernel: [ 2568.620942] invalid opcode: 
> > [#1] SMP
> > [...]
> 
> The purpose of the code involved is that if you create a directory or
> file and quickly remove it again, the filesystem doesn't need to do two
> disk writes; it can just erase the entry from memory before writing
> anything to disk.
> 
>  8< more 
> 
> This is when the functionality was added:
> 
> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=16cdcec736cd214350cdb591bf1091f8beedefa0
> 
> If you look for "err add delayed dir" in the source code of that
> commit, you can see where the error message is constructed:
> 
> errno: -17, just after it called __btrfs_add_delayed_insertion_item
> 
> __btrfs_add_delayed_insertion_item calls __btrfs_add_delayed_item, and
> the only non-0 return in that function is: return -EEXIST, which is -17
> 
> I think this means you added a file or directory, and the kernel code
> tried to add the file twice to the list of additions, which it
> has no way to deal with except making the whole kernel crash.
> 

This was happening during an unlink, so I think it somehow ran into a
double deletion.

Thanks,

-liubo

>  >8 
> 
> A while ago someone reported this on IRC, running a 4.8.13 kernel.
> (that's when I looked up the above info). I can also find it in Oct 2016
> in my IRC logs, but without any info on kernel version.
> 
> Anyway, it seems to point to something that's going wrong with changes
> that are *not* on disk *yet*, and the crash is preventing .
> 
> -- 
> Hans van Kranenburg


Re: Hard crash on 4.9.5

2017-01-23 Thread Hans van Kranenburg
On 01/23/2017 09:27 PM, Hans van Kranenburg wrote:
> [... press send without rereading ...]
> 
> Anyway, it seems to point to something that's going wrong with changes
> that are *not* on disk *yet*, and the crash is preventing ...

... whatever incorrect data this situation might result in from reaching
disk, at least.

-- 
Hans van Kranenburg


Re: Hard crash on 4.9.5

2017-01-23 Thread Hans van Kranenburg
On 01/23/2017 09:03 PM, Matt McKinnon wrote:
> Wondering what to do about this error which says 'reboot needed'.  It has
> happened three times in the past week:
> 
> Jan 23 14:16:17 my_machine kernel: [ 2568.595648] BTRFS error (device
> sda1): err add delayed dir index item(index: 23810) into the deletion
> tree of the delayed node(root id: 257, inode id: 2661433, errno: -17)
> Jan 23 14:16:17 my_machine kernel: [ 2568.611010] [ cut here
> ]
> Jan 23 14:16:17 my_machine kernel: [ 2568.615628] kernel BUG at
> fs/btrfs/delayed-inode.c:1557!
> Jan 23 14:16:17 my_machine kernel: [ 2568.620942] invalid opcode: 
> [#1] SMP
> [...]

The purpose of the code involved is that if you create a directory or
file and quickly remove it again, the filesystem doesn't need to do two
disk writes; it can just erase the entry from memory before writing
anything to disk.

 8< more 

This is when the functionality was added:

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=16cdcec736cd214350cdb591bf1091f8beedefa0

If you look for "err add delayed dir" in the source code of that
commit, you can see where the error message is constructed:

errno: -17, just after it called __btrfs_add_delayed_insertion_item

__btrfs_add_delayed_insertion_item calls __btrfs_add_delayed_item, and
the only non-0 return in that function is: return -EEXIST, which is -17
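
The mapping from -17 to EEXIST can be double-checked from userspace. The small sketch below uses Python's standard errno table; the helper name is made up, and the numeric values are those of the platform Python runs on (on Linux, 17 is EEXIST).

```python
import errno
import os


def describe_kernel_errno(value):
    """Translate a negative kernel-style return code into (name, message)."""
    code = abs(value)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


print(describe_kernel_errno(-17))
```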

I think this means you added a file or directory, and the kernel code
tried to add the file twice to the list of additions, which it
has no way to deal with except making the whole kernel crash.

 >8 

A while ago someone reported this on IRC, running a 4.8.13 kernel.
(that's when I looked up the above info). I can also find it in Oct 2016
in my IRC logs, but without any info on kernel version.

Anyway, it seems to point to something that's going wrong with changes
that are *not* on disk *yet*, and the crash is preventing .

-- 
Hans van Kranenburg


Hard crash on 4.9.5

2017-01-23 Thread Matt McKinnon
Wondering what to do about this error which says 'reboot needed'.  It has
happened three times in the past week:


Jan 23 14:16:17 my_machine kernel: [ 2568.595648] BTRFS error (device 
sda1): err add delayed dir index item(index: 23810) into the deletion 
tree of the delayed node(root id: 257, inode id: 2661433, errno: -17)
Jan 23 14:16:17 my_machine kernel: [ 2568.611010] [ cut here 
]
Jan 23 14:16:17 my_machine kernel: [ 2568.615628] kernel BUG at 
fs/btrfs/delayed-inode.c:1557!
Jan 23 14:16:17 my_machine kernel: [ 2568.620942] invalid opcode:  
[#1] SMP
Jan 23 14:16:17 my_machine kernel: [ 2568.624960] Modules linked in: ufs 
qnx4 hfsplus hfs minix ntfs msdos jfs xfs ipt_REJECT nf_rej
ect_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack 
nf_conntrack iptable_filter ip_tables x_tables ipmi_devintf nfsd au
th_rpcgss nfs_acl nfs lockd grace sunrpc fscache intel_rapl sb_edac 
edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_int
el kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel 
aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper crypt
d dm_multipath joydev mei_me mei lpc_ich ioatdma wmi ipmi_si 
ipmi_msghandler btrfs shpchp mac_hid lp parport ses enclosure scsi_tran
sport_sas raid10 raid456 async_raid6_recov async_memcpy async_pq 
async_xor async_tx xor raid6_pq libcrc32c igb hid_generic i2c_algo_

bit raid1 dca usbhid ahci raid0 ptp megaraid_sas multipath
Jan 23 14:16:17 my_machine kernel: [ 2568.697150]  hid libahci pps_core 
linear dm_mirror dm_region_hash dm_log
Jan 23 14:16:17 my_machine kernel: [ 2568.702689] CPU: 0 PID: 2440 Comm: 
nfsd Tainted: GW   4.9.5-custom #1
Jan 23 14:16:17 my_machine kernel: [ 2568.710166] Hardware name: 
Supermicro X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b 04/28

/2014
Jan 23 14:16:17 my_machine kernel: [ 2568.719207] task: 95a42addab80 
task.stack: b9da8533
Jan 23 14:16:17 my_machine kernel: [ 2568.725124] RIP: 
0010:[]  [] 
btrfs_delete_delayed_dir_inde

x+0x286/0x290 [btrfs]
Jan 23 14:16:17 my_machine kernel: [ 2568.735604] RSP: 
0018:b9da85333be0  EFLAGS: 00010286
Jan 23 14:16:17 my_machine kernel: [ 2568.740917] RAX:  
RBX: 95a3b104b690 RCX: 
Jan 23 14:16:17 my_machine kernel: [ 2568.748048] RDX: 0001 
RSI: 95a42fc0dcc8 RDI: 95a42fc0dcc8
Jan 23 14:16:17 my_machine kernel: [ 2568.755171] RBP: b9da85333c48 
R08: 0491 R09: 
Jan 23 14:16:17 my_machine kernel: [ 2568.762297] R10: 0005 
R11: 0006 R12: 95a3b104b6d8
Jan 23 14:16:17 my_machine kernel: [ 2568.769429] R13: 5d02 
R14: 95a82953d800 R15: ffef
Jan 23 14:16:17 my_machine kernel: [ 2568.776555] FS: 
() GS:95a42fc0() knlGS:
Jan 23 14:16:17 my_machine kernel: [ 2568.784639] CS:  0010 DS:  ES: 
 CR0: 80050033
Jan 23 14:16:17 my_machine kernel: [ 2568.790377] CR2: 7f12ea376000 
CR3: 0003e1e07000 CR4: 001406f0

Jan 23 14:16:17 my_machine kernel: [ 2568.797503] Stack:
Jan 23 14:16:17 my_machine kernel: [ 2568.799524]  9b7fe5f2 
95a3b104b560 0004 95a3f96b3e80
Jan 23 14:16:17 my_machine kernel: [ 2568.806983]  95a3f96b3e80 
39ff95a814eeeb68 6000289c 5d02
Jan 23 14:16:17 my_machine kernel: [ 2568.814436]  95a3f7457c40 
95a3bcb74138 95a814eeeb68 00289c39

Jan 23 14:16:17 my_machine kernel: [ 2568.821891] Call Trace:
Jan 23 14:16:17 my_machine kernel: [ 2568.824343]  [] 
? mutex_lock+0x12/0x2f
Jan 23 14:16:17 my_machine kernel: [ 2568.829671]  [] 
__btrfs_unlink_inode+0x198/0x4c0 [btrfs]
Jan 23 14:16:17 my_machine kernel: [ 2568.836555]  [] 
btrfs_unlink_inode+0x1c/0x40 [btrfs]
Jan 23 14:16:17 my_machine kernel: [ 2568.843086]  [] 
btrfs_unlink+0x6b/0xb0 [btrfs]
Jan 23 14:16:17 my_machine kernel: [ 2568.849091]  [] 
vfs_unlink+0xda/0x190
Jan 23 14:16:17 my_machine kernel: [ 2568.854315]  [] 
? lookup_one_len+0xd3/0x130
Jan 23 14:16:17 my_machine kernel: [ 2568.860075]  [] 
nfsd_unlink+0x16e/0x210 [nfsd]
Jan 23 14:16:17 my_machine kernel: [ 2568.866084]  [] 
nfsd3_proc_remove+0x7c/0x110 [nfsd]
Jan 23 14:16:17 my_machine kernel: [ 2568.872529]  [] 
nfsd_dispatch+0xb8/0x1f0 [nfsd]
Jan 23 14:16:17 my_machine kernel: [ 2568.878641]  [] 
svc_process_common+0x43f/0x700 [sunrpc]
Jan 23 14:16:17 my_machine kernel: [ 2568.885432]  [] 
svc_process+0xfc/0x1c0 [sunrpc]
Jan 23 14:16:17 my_machine kernel: [ 2568.891528]  [] 
nfsd+0xf0/0x160 [nfsd]
Jan 23 14:16:17 my_machine kernel: [ 2568.896838]  [] 
? nfsd_destroy+0x60/0x60 [nfsd]
Jan 23 14:16:17 my_machine kernel: [ 2568.902931]  [] 
kthread+0xca/0xe0
Jan 23 14:16:17 my_machine kernel: [ 2568.907807]  [] 
? kthread_park+0x60/0x60
Jan 23 14:16:17 my_machine kernel: [ 2568.913296]  [] 
ret_from_fork+0x25/0x30
Jan 23 14:16:17 my_machine kernel: [ 2568.918693] Code: ff ff 48 8b 43 
10 49 8b