Hi all!

Hopefully some of you can shed some light on this. We have big problems with
Samba crashing when macOS SMB clients access certain, seemingly random, folders
and files over vfs_ceph.

When browsing the CephFS folder in question directly on a Ceph node where CephFS
is mounted, we also see issues such as slow directory listings. We suspect that
macOS fetching xattr metadata creates a lot of traffic, but it should not lock
up the cluster like this. In the logs we see both rdlock and wrlock, but mostly
rdlocks.

End clients experience spurious disconnects when the issue occurs, up to roughly
a handful of times a day. Is this a config issue? Have we hit a bug? It's
certainly not a feature :/

Any pointers on how to troubleshoot or rectify this problem are most welcome.

ceph version 14.2.11
samba version 4.12.10-SerNet-Ubuntu-10.focal
Supermicro X11, Intel Silver 4110, 9 ceph nodes, 2x40GbE network, 150 OSD
spinners, NVMe db/journal

--

2020-11-17 22:09:07.525706 [WRN] evicting unresponsive client bo-samba-03 
(3887652779), after 301.746 seconds
2020-11-17 22:09:07.525580 [INF] Evicting (and blacklisting) client session 
3877970532 (10.40.30.133:0/3971626932)
2020-11-17 22:09:07.525536 [WRN] evicting unresponsive client bo-samba-03 
(3877970532), after 302.034 seconds
2020-11-17 22:07:23.915412 [INF] Cluster is now healthy
2020-11-17 22:07:23.915381 [INF] Health check cleared: MDS_SLOW_REQUEST (was: 1 
MDSs report slow requests)
2020-11-17 22:07:23.915330 [INF] Health check cleared: MDS_CLIENT_LATE_RELEASE 
(was: 1 clients failing to respond to capability release)
2020-11-17 22:07:23.064492 [INF] MDS health message cleared (mds.?): 1 slow 
requests are blocked > 30 secs
2020-11-17 22:07:23.064457 [INF] MDS health message cleared (mds.?): Client 
bo-samba-03 failing to respond to capability release
2020-11-17 22:07:17.524023 [WRN] client.3887663354 isn't responding to 
mclientcaps(revoke), ino 0x10001202b55 pending pAsLsXsFs issued pAsLsXsFsx, 
sent 63.325997 seconds ago
2020-11-17 22:07:17.523987 [INF] Evicting (and blacklisting) client session 
3887663354 (10.40.30.133:0/3230547239)
2020-11-17 22:07:17.523967 [WRN] evicting unresponsive client bo-samba-03 
(3887663354), after 64.5412 seconds
2020-11-17 22:07:17.523610 [WRN] slow request 63.325528 seconds old, received 
at 2020-11-17 22:06:14.197986: client_request(client.3878823430:4 lookup 
#0x100011f9a68/mappe uten navn 2020-11-17 22:06:14.197908 caller_uid=111139, 
caller_gid=110513{}) currently failed to rdlock, waiting
2020-11-17 22:07:17.523596 [WRN] 1 slow requests, 1 included below; oldest 
blocked for > 63.325529 secs
2020-11-17 22:07:19.255177 [WRN] Health check failed: 1 clients failing to 
respond to capability release (MDS_CLIENT_LATE_RELEASE)
2020-11-17 22:07:12.523453 [WRN] 1 slow requests, 0 included below; oldest 
blocked for > 58.325433 secs
2020-11-17 22:07:07.523382 [WRN] 1 slow requests, 0 included below; oldest 
blocked for > 53.325362 secs
2020-11-17 22:07:02.523360 [WRN] 1 slow requests, 0 included below; oldest 
blocked for > 48.325307 secs
2020-11-17 22:06:57.523218 [WRN] 1 slow requests, 0 included below; oldest 
blocked for > 43.325199 secs
2020-11-17 22:06:52.523203 [WRN] 1 slow requests, 0 included below; oldest 
blocked for > 38.325158 secs
2020-11-17 22:06:47.523105 [WRN] slow request 33.325065 seconds old, received 
at 2020-11-17 22:06:14.197986: client_request(client.3878823430:4 lookup 
#0x100011f9a68/mappe uten navn 2020-11-17 22:06:14.197908 caller_uid=111139, 
caller_gid=110513{}) currently failed to rdlock, waiting
2020-11-17 22:06:47.523100 [WRN] 1 slow requests, 1 included below; oldest 
blocked for > 33.325065 secs
2020-11-17 22:06:51.431745 [WRN] Health check failed: 1 MDSs report slow 
requests (MDS_SLOW_REQUEST)
2020-11-17 22:06:20.045030 [INF] Cluster is now healthy
2020-11-17 22:06:20.045008 [INF] Health check cleared: MDS_SLOW_REQUEST (was: 1 
MDSs report slow requests)
2020-11-17 22:06:20.044960 [INF] Health check cleared: MDS_CLIENT_LATE_RELEASE 
(was: 1 clients failing to respond to capability release)
2020-11-17 22:06:19.062307 [INF] MDS health message cleared (mds.?): 1 slow 
requests are blocked > 30 secs
2020-11-17 22:06:19.062253 [INF] MDS health message cleared (mds.?): Client 
bo-samba-03 failing to respond to capability release
2020-11-17 22:06:15.936150 [WRN] Health check failed: 1 clients failing to 
respond to capability release (MDS_CLIENT_LATE_RELEASE)
2020-11-17 22:06:12.522624 [WRN] client.3869410498 isn't responding to 
mclientcaps(revoke), ino 0x10001202b55 pending pAsLsXsFs issued pAsLsXsFsx, 
sent 64.045677 seconds ago
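
To make the "isn't responding to mclientcaps(revoke)" lines above easier to
read, here is a minimal Python sketch that compares the "issued" and "pending"
cap strings. It assumes the usual CephFS cap notation (p = pin, then the groups
A/L/X/F for auth, link, xattr and file, each followed by lowercase flags such
as s = shared and x = exclusive). The function names are made up for this
example and are not part of any Ceph tool. If that reading of the notation is
right, the MDS appears to be waiting for the client to give up Fx (file
exclusive) on ino 0x10001202b55.

```python
import re

# Assumed CephFS cap notation: 'p' = pin, then groups of one uppercase
# letter (A=auth, L=link, X=xattr, F=file) followed by lowercase flags.
GROUPS = {"A": "auth", "L": "link", "X": "xattr", "F": "file"}
FLAGS = {"s": "shared", "x": "exclusive", "c": "cache", "r": "read",
         "w": "write", "b": "buffer", "l": "lazyio"}


def parse_caps(caps):
    """Split a cap string into a set of (group, flag) pairs, e.g. ('F', 'x')."""
    result = set()
    if caps.startswith("p"):
        result.add(("p", ""))  # the pin cap carries no flags
        caps = caps[1:]
    for group, flags in re.findall(r"([A-Z])([a-z]*)", caps):
        for flag in flags:
            result.add((group, flag))
    return result


def revoked(issued, pending):
    """Caps the client holds (issued) that the MDS wants back (not pending)."""
    return sorted(parse_caps(issued) - parse_caps(pending))


def describe(pairs):
    """Render (group, flag) pairs in a human-readable form."""
    return [f"{g}{f} ({GROUPS.get(g, g)} {FLAGS.get(f, f)})" for g, f in pairs]


if __name__ == "__main__":
    # Cap strings copied from the revoke warnings in the log above.
    print(describe(revoked("pAsLsXsFsx", "pAsLsXsFs")))
```

Both revoke warnings in the excerpt (22:06:12 and 22:07:17) carry the same cap
strings and the same inode, so the same call covers both.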


--thomas

--
Thomas Hukkelberg
tho...@hovedkvarteret.no
+47 971 81 192
--
supp...@hovedkvarteret.no
+47 966 44 999




_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
