[lustre-discuss] lustre stability problem (2.12.0)

2019-04-09 Thread Bernd Melchers
Hi All,
we are having stability problems with our Lustre-on-ZFS installation:
CentOS 7.6, kernel 3.10.0-957.5.1.el7.x86_64, Lustre 2.12.0, ZFS 0.7.12

The OSS servers have hanging ll_ost_io kernel threads. The threads lock up
their CPUs and the soft-lockup watchdog reports them stuck after 22 seconds;
please have a look at the attachment.
Is this a known problem, and is it a ZFS problem?
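
For reference, this is roughly what we can collect the next time a thread
hangs (just a sketch; the dump file name is an arbitrary placeholder):

  # current soft-lockup threshold (the watchdog complains at roughly twice this value)
  sysctl kernel.watchdog_thresh
  # dump the stacks of blocked tasks to dmesg
  echo w > /proc/sysrq-trigger
  # save the Lustre kernel debug log on the affected OSS
  lctl dk /tmp/lustre-debug.$(hostname).log
  # confirm which module versions are actually loaded
  cat /sys/module/zfs/version
  lctl get_param version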


Kind regards
Bernd Melchers

-- 
Archiv- und Backup-Service | fab-serv...@zedat.fu-berlin.de
Freie Universität Berlin   | Tel. +49-30-838-55905
[Fri Apr  5 14:49:05 2019] NMI watchdog: BUG: soft lockup - CPU#8 stuck for 22s! [ll_ost_io00_022:123082]
[Fri Apr  5 14:49:05 2019] Modules linked in: osp(OE) ofd(OE) lfsck(OE) ost(OE) mgc(OE) osd_zfs(OE) lquota(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache libcfs(OE) iTCO_wdt iTCO_vendor_support skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd opa_vnic pcspkr zfs(POE) zunicode(POE) zavl(POE) icp(POE) ses enclosure zcommon(POE) znvpair(POE) ast ttm spl(OE) drm_kms_helper syscopyarea sysfillrect rpcrdma sysimgblt fb_sys_fops sg joydev drm ib_isert iscsi_target_mod mei_me drm_panel_orientation_quirks mei lpc_ich i2c_i801 wmi ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp acpi_pad acpi_power_meter scsi_transport_srp
[Fri Apr  5 14:49:05 2019]  scsi_tgt ib_ipoib(OE) rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ip_tables dm_service_time sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common hfi1(OE) crc32c_intel rdmavt(OE) i40e(OE) ahci i2c_algo_bit mpt3sas ptp libahci ib_core pps_core raid_class libata scsi_transport_sas ipmi_si ipmi_devintf ipmi_msghandler nfit libnvdimm dm_multipath sunrpc dm_mirror dm_region_hash dm_log dm_mod
[Fri Apr  5 14:49:05 2019] CPU: 8 PID: 123082 Comm: ll_ost_io00_022 Tainted: P   OEL    3.10.0-957.5.1.el7.x86_64 #1
[Fri Apr  5 14:49:05 2019] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.00.01.0013.030920180427 03/09/2018
[Fri Apr  5 14:49:05 2019] task: a07371fcb0c0 ti: a071e5c54000 task.ti: a071e5c54000
[Fri Apr  5 14:49:05 2019] RIP: 0010:[]  [] dbuf_prefetch+0xb/0x590 [zfs]
[Fri Apr  5 14:49:05 2019] RSP: 0018:a071e5c57838  EFLAGS: 0206
[Fri Apr  5 14:49:05 2019] RAX:  RBX: 0074a315 RCX: 0002
[Fri Apr  5 14:49:05 2019] RDX: 00162de556bc RSI: 0001 RDI: a071ed958348
[Fri Apr  5 14:49:05 2019] RBP: a071e5c57840 R08: 0020 R09: 
[Fri Apr  5 14:49:05 2019] R10: 0074a315 R11: e1978c898e00 R12: 0074a315
[Fri Apr  5 14:49:05 2019] R13: a071e5c57840 R14: a9b65e92 R15: a071e5c57840
[Fri Apr  5 14:49:05 2019] FS:  () GS:a073cdc0() knlGS:
[Fri Apr  5 14:49:05 2019] CS:  0010 DS:  ES:  CR0: 80050033
[Fri Apr  5 14:49:05 2019] CR2: 2b9788a2 CR3: 000e5081 CR4: 007607e0
[Fri Apr  5 14:49:05 2019] DR0:  DR1:  DR2: 
[Fri Apr  5 14:49:05 2019] DR3:  DR6: fffe0ff0 DR7: 0400
[Fri Apr  5 14:49:05 2019] PKRU: 
[Fri Apr  5 14:49:05 2019] Call Trace:
[Fri Apr  5 14:49:05 2019]  [] dmu_zfetch+0x320/0x520 [zfs]
[Fri Apr  5 14:49:05 2019]  [] dmu_buf_hold_array_by_dnode+0x420/0x4a0 [zfs]
[Fri Apr  5 14:49:05 2019]  [] dmu_buf_hold_array_by_bonus+0x69/0x90 [zfs]
[Fri Apr  5 14:49:05 2019]  [] osd_bufs_get+0x45b/0xd70 [osd_zfs]
[Fri Apr  5 14:49:05 2019]  [] ? cfs_percpt_unlock+0x1a/0xb0 [libcfs]
[Fri Apr  5 14:49:05 2019]  [] ofd_preprw+0x6b7/0x1160 [ofd]
[Fri Apr  5 14:49:05 2019]  [] ? __req_capsule_get+0x15f/0x740 [ptlrpc]
[Fri Apr  5 14:49:05 2019]  [] tgt_brw_read+0x9db/0x1e50 [ptlrpc]
[Fri Apr  5 14:49:05 2019]  [] ? null_alloc_rs+0x16d/0x340 [ptlrpc]
[Fri Apr  5 14:49:05 2019]  [] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
[Fri Apr  5 14:49:05 2019]  [] ? null_alloc_rs+0x186/0x340 [ptlrpc]
[Fri Apr  5 14:49:05 2019]  [] ? lustre_pack_reply_v2+0x14f/0x280 [ptlrpc]
[Fri Apr  5 14:49:05 2019]  [] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc]
[Fri Apr  5 14:49:05 2019]  [] ? lustre_pack_reply+0x11/0x20 [ptlrpc]
[Fri Apr  5 14:49:05 2019]  [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[Fri Apr  5 14:49:05 2019]  [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc]
[Fri Apr  5 14:49:05 2019]  [] ? ktime_get_real_seconds+0xe/0x10 [libcfs]
[Fri Apr  5 14:49:05 2019]  [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[Fri Apr  5 14:49:05 2019]  [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc]
[Fri Apr  5 14:49:05 2019]  [] ? default_wake_function+0x12/0x20
[Fri Apr  5 14:49:05 2019]  [] ? __wake_up_common+0x5b/0x90
[Fri Apr  5 14:49:05 2019]  [] ptlrpc_main+0xafc/0x1fc

[lustre-discuss] lfs check *, change of behaviour from 2.7 to 2.10?

2019-04-09 Thread Andrew Elwell
I've just noticed that 'lfs check mds' / 'lfs check servers' no longer works
for unprivileged users on 2.10.0 or later clients, yet it worked for 2.7.x clients.

Is this by design?
(lfs quota thankfully still works as a normal user, though)
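
For completeness, this is the sort of thing I'm running (with /mnt/lustre
standing in for the real mount point):

  # as an unprivileged user on a 2.10+ client
  lfs check mds        # ping the MDS(es)
  lfs check osts       # ping the OSTs
  lfs check servers    # all servers
  # quota still works fine without privileges
  lfs quota -u $USER /mnt/lustre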


Andrew