Hi All,
we have stability problems with our Lustre-on-ZFS installation:
CentOS 7.6, kernel 3.10.0-957.5.1.el7.x86_64, Lustre 2.12.0, ZFS 0.7.12.
The OSS servers have hanging ll_ost_io kernel threads. The threads lock
up the CPU and are reported by the NMI watchdog as stuck for 22 s; please
have a look at the attached log.
Is this a known problem, and is it a ZFS problem?
Kind regards,
Bernd Melchers
--
Archiv- und Backup-Service | fab-serv...@zedat.fu-berlin.de
Freie Universität Berlin | Tel. +49-30-838-55905
[Fri Apr 5 14:49:05 2019] NMI watchdog: BUG: soft lockup - CPU#8 stuck for 22s! [ll_ost_io00_022:123082]
[Fri Apr 5 14:49:05 2019] Modules linked in: osp(OE) ofd(OE) lfsck(OE) ost(OE)
mgc(OE) osd_zfs(OE) lquota(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE)
obdclass(OE) lnet(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd
grace fscache libcfs(OE) iTCO_wdt iTCO_vendor_support skx_edac intel_powerclamp
coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel
aesni_intel lrw gf128mul glue_helper ablk_helper cryptd opa_vnic pcspkr
zfs(POE) zunicode(POE) zavl(POE) icp(POE) ses enclosure zcommon(POE)
znvpair(POE) ast ttm spl(OE) drm_kms_helper syscopyarea sysfillrect rpcrdma
sysimgblt fb_sys_fops sg joydev drm ib_isert iscsi_target_mod mei_me
drm_panel_orientation_quirks mei lpc_ich i2c_i801 wmi ib_iser libiscsi
scsi_transport_iscsi ib_srpt target_core_mod ib_srp acpi_pad acpi_power_meter
scsi_transport_srp
[Fri Apr 5 14:49:05 2019] scsi_tgt ib_ipoib(OE) rdma_ucm ib_ucm ib_uverbs
ib_umad rdma_cm ib_cm iw_cm ip_tables dm_service_time sd_mod crc_t10dif
crct10dif_generic crct10dif_pclmul crct10dif_common hfi1(OE) crc32c_intel
rdmavt(OE) i40e(OE) ahci i2c_algo_bit mpt3sas ptp libahci ib_core pps_core
raid_class libata scsi_transport_sas ipmi_si ipmi_devintf ipmi_msghandler nfit
libnvdimm dm_multipath sunrpc dm_mirror dm_region_hash dm_log dm_mod
[Fri Apr 5 14:49:05 2019] CPU: 8 PID: 123082 Comm: ll_ost_io00_022 Tainted: P OEL 3.10.0-957.5.1.el7.x86_64 #1
[Fri Apr 5 14:49:05 2019] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.00.01.0013.030920180427 03/09/2018
[Fri Apr 5 14:49:05 2019] task: a07371fcb0c0 ti: a071e5c54000 task.ti: a071e5c54000
[Fri Apr 5 14:49:05 2019] RIP: 0010:[] [] dbuf_prefetch+0xb/0x590 [zfs]
[Fri Apr 5 14:49:05 2019] RSP: 0018:a071e5c57838 EFLAGS: 0206
[Fri Apr 5 14:49:05 2019] RAX: RBX: 0074a315 RCX: 0002
[Fri Apr 5 14:49:05 2019] RDX: 00162de556bc RSI: 0001 RDI: a071ed958348
[Fri Apr 5 14:49:05 2019] RBP: a071e5c57840 R08: 0020 R09:
[Fri Apr 5 14:49:05 2019] R10: 0074a315 R11: e1978c898e00 R12: 0074a315
[Fri Apr 5 14:49:05 2019] R13: a071e5c57840 R14: a9b65e92 R15: a071e5c57840
[Fri Apr 5 14:49:05 2019] FS: () GS:a073cdc0() knlGS:
[Fri Apr 5 14:49:05 2019] CS: 0010 DS: ES: CR0: 80050033
[Fri Apr 5 14:49:05 2019] CR2: 2b9788a2 CR3: 000e5081 CR4: 007607e0
[Fri Apr 5 14:49:05 2019] DR0: DR1: DR2:
[Fri Apr 5 14:49:05 2019] DR3: DR6: fffe0ff0 DR7: 0400
[Fri Apr 5 14:49:05 2019] PKRU:
[Fri Apr 5 14:49:05 2019] Call Trace:
[Fri Apr 5 14:49:05 2019] [] dmu_zfetch+0x320/0x520 [zfs]
[Fri Apr 5 14:49:05 2019] [] dmu_buf_hold_array_by_dnode+0x420/0x4a0 [zfs]
[Fri Apr 5 14:49:05 2019] [] dmu_buf_hold_array_by_bonus+0x69/0x90 [zfs]
[Fri Apr 5 14:49:05 2019] [] osd_bufs_get+0x45b/0xd70 [osd_zfs]
[Fri Apr 5 14:49:05 2019] [] ? cfs_percpt_unlock+0x1a/0xb0 [libcfs]
[Fri Apr 5 14:49:05 2019] [] ofd_preprw+0x6b7/0x1160 [ofd]
[Fri Apr 5 14:49:05 2019] [] ? __req_capsule_get+0x15f/0x740 [ptlrpc]
[Fri Apr 5 14:49:05 2019] [] tgt_brw_read+0x9db/0x1e50 [ptlrpc]
[Fri Apr 5 14:49:05 2019] [] ? null_alloc_rs+0x16d/0x340 [ptlrpc]
[Fri Apr 5 14:49:05 2019] [] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
[Fri Apr 5 14:49:05 2019] [] ? null_alloc_rs+0x186/0x340 [ptlrpc]
[Fri Apr 5 14:49:05 2019] [] ? lustre_pack_reply_v2+0x14f/0x280 [ptlrpc]
[Fri Apr 5 14:49:05 2019] [] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc]
[Fri Apr 5 14:49:05 2019] [] ? lustre_pack_reply+0x11/0x20 [ptlrpc]
[Fri Apr 5 14:49:05 2019] [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[Fri Apr 5 14:49:05 2019] [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc]
[Fri Apr 5 14:49:05 2019] [] ? ktime_get_real_seconds+0xe/0x10 [libcfs]
[Fri Apr 5 14:49:05 2019] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[Fri Apr 5 14:49:05 2019] [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc]
[Fri Apr 5 14:49:05 2019] [] ? default_wake_function+0x12/0x20
[Fri Apr 5 14:49:05 2019] [] ? __wake_up_common+0x5b/0x90
[Fri Apr 5 14:49:05 2019] [] ptlrpc_main+0xafc/0x1fc
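
For anyone triaging the trace above, here is a minimal Python sketch that lists which kernel module each stack frame belongs to. This is a rough aid for the "is this a ZFS problem?" question, not a diagnosis. The names parse_frames, modules_on_stack and NO_MODULE are made up for this example and are not part of any Lustre or ZFS tooling. The sketch assumes the usual "symbol+0xoff/0xsize [module]" frame format and tolerates the line wrapping added by mail clients. Frames prefixed with "?" are ones the kernel's unwinder could not verify, so they are excluded from the per-module count.

```python
import re
from collections import Counter

# Matches frames such as "dmu_zfetch+0x320/0x520 [zfs]". The "[module]" part
# is optional because core-kernel symbols are printed without it, and a
# leading "? " marks a frame the unwinder could not verify.
FRAME_RE = re.compile(
    r"(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)

# Placeholder label (an assumption of this sketch) for frames with no module tag.
NO_MODULE = "(no module)"


def parse_frames(log_text):
    """Return (function, module, reliable) tuples in the order they appear.

    The RIP line is included as the first frame when it is present.
    """
    # Collapse all whitespace so a frame and its wrapped "[module]" tag rejoin.
    flat = " ".join(log_text.split())
    frames = []
    for match in FRAME_RE.finditer(flat):
        guess_marker, func, module = match.groups()
        frames.append((func, module or NO_MODULE, guess_marker is None))
    return frames


def modules_on_stack(frames):
    """Count reliable frames per module, ignoring "?" frames."""
    return Counter(module for _, module, reliable in frames if reliable)
```

To use it, save the log to a file (the name is up to you), pass its contents to parse_frames(), and print the frames or modules_on_stack(frames). A frame cut off by the mail client is still parsed, but it shows up under "(no module)" because its module tag is missing.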