Hi all,

We run a >500 TiB backup system on iSCSI targets, using 19 btrfs
filesystems (the largest of which is 110 TiB) on Ubuntu 14.04 LTS with
various kernel versions and btrfs-progs v3.17.1. The hardware is a
24-core Xeon E5-2620 on an Intel S2600GZ board with 128 GiB RAM.

Since btrfs switched to kworkers (I think in 3.15), the frontend server
has been crashing somewhat randomly with soft lockups (see attachment).
The system is rock solid with the 3.14.22 kernel.

The lockups happen during the nightly cron-controlled rsync backups, at
random times during the run.

We are well aware that this tends to be one of those “it doesn’t work”
bug reports, but it’s really hard to pin down the source of the problem
beyond the fact that it seems to be related to the kworkers. We’d be
glad to provide any further information; please let us know what you
need.

Regards
Patrick
-- 
Patrick Schmid  <sch...@phys.ethz.ch>     support: +41 44 633 2668
IT Services Group, HPT H 8                voice:   +41 44 633 3997
Departement Physik, ETH Zurich
CH-8093 Zurich, Switzerland

Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207104] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u481:26:108963]
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207147] Modules linked in: btrfs(E) xor(E) raid6_pq(E) tcp_diag(E) inet_diag(E) autofs4(E) ib_iser(E) rdma_cm(E) iw_cm(E) ib_cm(E) ib_sa(E) ib_mad(E) ib_core(E) ib_addr(E) iscsi_tcp(E) libiscsi_tcp(E) libiscsi(E) scsi_transport_iscsi(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) mousedev(E) cryptd(E) ioatdma(E) sb_edac(E) microcode(E) ipmi_si(E) edac_core(E) lpc_ich(E) mei_me(E) ipmi_msghandler(E) tpm_tis(E) mei(E) wmi(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) nfs(E) lockd(E) sunrpc(E) fscache(E) lp(E) parport(E) hid_generic(E) usbhid(E) hid(E) igb(E) ixgbe(E) i2c_algo_bit(E) dca(E) isci(E) ptp(E) ahci(E) libsas(E) scsi_transport_sas(E) libahci(E) mdio(E) arcmsr(E) pps_core(E)
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207152] CPU: 0 PID: 108963 Comm: kworker/u481:26 Tainted: G            EL 3.17.2-stable.slub #6
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207154] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.03.0003.041920141333 04/19/2014
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207185] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207186] task: ffff8802e34a8000 ti: ffff88070a5a8000 task.ti: ffff88070a5a8000
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207194] RIP: 0010:[<ffffffff810b0b35>]  [<ffffffff810b0b35>] queue_read_lock_slowpath+0xb5/0xd0
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207195] RSP: 0018:ffff88070a5aba00  EFLAGS: 00000206
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207196] RAX: 00000000000041b8 RBX: ffff8806bdac3a18 RCX: 0000000000003bcc
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207197] RDX: ffff8800a2c4f350 RSI: 0000000000003bcc RDI: ffff8800a2c4f354
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207198] RBP: ffff88070a5aba08 R08: 0000000000003bc6 R09: 0000000000000000
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207199] R10: 00000000ffffffff R11: 0000000000000001 R12: ffff88081ee14300
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207200] R13: ffff88100e6e0000 R14: ffffffff810946ac R15: ffff88070a5ab9a8
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207202] FS:  0000000000000000(0000) GS:ffff88081ee00000(0000) knlGS:0000000000000000
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207203] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207204] CR2: 0000000002b97fc8 CR3: 0000000001c16000 CR4: 00000000000407f0
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207205] Stack:
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207207]  ffffffff8173b07c ffff88070a5aba68 ffffffffa04d8a3b 0000000000000000
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207209]  ffff88070a5aba78 ffffffffa04757af 00003f66a0497f6e ffff88061c29af68
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207211]  ffff8800a2c4f2e0 ffff88100f36d800 ffff880000000000 0000160000000000
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207212] Call Trace:
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207218]  [<ffffffff8173b07c>] ? _raw_read_lock+0x1c/0x30
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207233]  [<ffffffffa04d8a3b>] btrfs_tree_read_lock+0x5b/0x120 [btrfs]
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207241]  [<ffffffffa04757af>] ? leaf_space_used+0xcf/0x110 [btrfs]
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207249]  [<ffffffffa0477d6b>] btrfs_read_lock_root_node+0x3b/0x50 [btrfs]
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207258]  [<ffffffffa047cbee>] btrfs_search_slot+0x50e/0xa10 [btrfs]
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207269]  [<ffffffffa0494257>] btrfs_lookup_file_extent+0x37/0x40 [btrfs]
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207282]  [<ffffffffa04b35da>] __btrfs_drop_extents+0x16a/0xd90 [btrfs]
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207285]  [<ffffffff810946ac>] ? try_to_wake_up+0x1fc/0x340
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207299]  [<ffffffffa04bc65b>] ? __set_extent_bit+0x15b/0x540 [btrfs]
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207302]  [<ffffffff811b0a12>] ? kmem_cache_alloc+0x122/0x130
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207311]  [<ffffffffa0477aea>] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207323]  [<ffffffffa04a36ce>] insert_reserved_file_extent.constprop.59+0x9e/0x2f0 [btrfs]
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207335]  [<ffffffffa04a94c5>] btrfs_finish_ordered_io+0x2e5/0x5f0 [btrfs]
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207345]  [<ffffffffa04a9ad5>] finish_ordered_fn+0x15/0x20 [btrfs]
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207358]  [<ffffffffa04cf3e2>] normal_work_helper+0xc2/0x2b0 [btrfs]
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207362]  [<ffffffff8107fe09>] ? pwq_activate_delayed_work+0x39/0x80
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207374]  [<ffffffffa04cf742>] btrfs_endio_write_helper+0x12/0x20 [btrfs]
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207377]  [<ffffffff81082000>] process_one_work+0x150/0x3f0
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207379]  [<ffffffff810826f1>] worker_thread+0x121/0x520
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207381]  [<ffffffff810825d0>] ? rescuer_thread+0x330/0x330
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207385]  [<ffffffff81087992>] kthread+0xd2/0xf0
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207388]  [<ffffffff810878c0>] ? kthread_create_on_node+0x180/0x180
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207390]  [<ffffffff8173b6bc>] ret_from_fork+0x7c/0xb0
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207393]  [<ffffffff810878c0>] ? kthread_create_on_node+0x180/0x180
Nov 12 23:25:16 phd-bkp-gw kernel: [29411.207413] Code: 8b 02 3c ff 74 f8 f3 c3 55 48 89 e5 e8 a8 df 67 00 5d c3 83 e1 fe 0f b7 f1 b8 00 80 00 00 44 0f b7 42 04 66 44 39 c1 74 83 f3 90 <83> e8 01 75 ee 66 66 66 90 66 66 90 eb e0 66 2e 0f 1f 84 00 00
