David, a few inputs based on my working experience with CephFS. They may or may not be relevant to the issue you're seeing in your cluster.

1. Put the metadata pool on NVMe. Some folks will claim it isn't needed, but I have seen worse performance with it on HDD, even though the metadata size is very small.
2. Ensure the MDS node has enough RAM allocated for the MDS cache (this won't improve performance drastically, but it helps to some extent). On a side note, the MDS has some bugs related to oversubscribed memory usage, regardless of the cache settings, if you have more than 64GB of RAM. Take a look:
   http://tracker.ceph.com/issues/21402
   http://tracker.ceph.com/issues/22599
   https://bugzilla.redhat.com/show_bug.cgi?id=1531679
3. CephFS is not great for small files (KBs) but works well with large files (MBs or GBs). So using it as a filer (NFS/SMB) needs some administrative attention.
4. Next, check whether you have a large number of inodes/files in CephFS. If so, make sure dirfrags, active/active MDS and similar tunables are configured on the Luminous version you are running on filestore, or ask users not to store millions of small files in a single directory (it's a debatable scenario; I'm not sure how much control you have over your customers' use case).
5. Always use kernel mounts. ceph-fuse is much slower (3-5 times slower than kernel mounts), as you probably already know.

--
Deepak

From: ceph-users [mailto:[email protected]] On Behalf Of David C
Sent: Wednesday, March 14, 2018 10:46 AM
To: John Spray <[email protected]>
Cc: ceph-users <[email protected]>
Subject: Re: [ceph-users] Cephfs MDS slow requests

Thanks, John. I'm pretty sure the root of my slow OSD issues is filestore subfolder splitting.

On Wed, Mar 14, 2018 at 2:17 PM, John Spray <[email protected]> wrote:
On Tue, Mar 13, 2018 at 7:17 PM, David C <[email protected]> wrote:
> Hi All
>
> I have a Samba server that is exporting directories from a Cephfs Kernel
> mount.
> Performance has been pretty good for the last year but users have
> recently been complaining of short "freezes", these seem to coincide with
> MDS related slow requests in the monitor ceph.log such as:
>
>> 2018-03-13 13:34:58.461030 osd.15 osd.15 10.10.10.211:6812/13367 5752 : cluster [WRN] slow request 31.834418 seconds old, received at 2018-03-13 13:34:26.626474: osd_repop(mds.0.5495:810644 3.3e e14085/14019 3:7cea5bac:::10001a88b8f.00000000:head v 14085'846936) currently commit_sent
>> 2018-03-13 13:34:59.461270 osd.15 osd.15 10.10.10.211:6812/13367 5754 : cluster [WRN] slow request 32.832059 seconds old, received at 2018-03-13 13:34:26.629151: osd_repop(mds.0.5495:810671 2.dc2 e14085/14020 2:43bdcc3f:::10001e91a91.00000000:head v 14085'21394) currently commit_sent
>> 2018-03-13 14:23:57.409427 osd.30 osd.30 10.10.10.212:6824/14997 5708 : cluster [WRN] slow request 30.536832 seconds old, received at 2018-03-13 14:23:26.872513: osd_repop(mds.0.5495:865403 2.fb6 e14085/14077 2:6df955ef:::10001e93542.000000c4:head v 14085'21296) currently commit_sent
>> 2018-03-13 14:23:57.409449 osd.30 osd.30 10.10.10.212:6824/14997 5709 : cluster [WRN] slow request 30.529640 seconds old, received at 2018-03-13 14:23:26.879704: osd_repop(mds.0.5495:865407 2.595 e14085/14019 2:a9a56101:::10001e93542.000000c8:head v 14085'20437) currently commit_sent
>> 2018-03-13 14:23:57.409453 osd.30 osd.30 10.10.10.212:6824/14997 5710 : cluster [WRN] slow request 30.503138 seconds old, received at 2018-03-13 14:23:26.906207: osd_repop(mds.0.5495:865423 2.ea e14085/14055 2:57096bbf:::10001e93542.000000d8:head v 14085'21147) currently commit_sent
>
>
> --
>
> Looking in the MDS log, with debug set to 4, it's full of "setfilelockrule
> 1" and "setfilelockrule 2":
>
>> 2018-03-13 14:23:00.446905 7fde43e73700 4 mds.0.server handle_client_request client_request(client.9174621:141162337 setfilelockrule 1, type 4, owner 14971048052668053939, pid 7, start 120, length 1, wait 0 #0x10001e8dc37 2018-03-13 14:22:58.838521 caller_uid=1155, caller_gid=1131{}) v2
>> 2018-03-13 14:23:00.447050 7fde43e73700 4 mds.0.server handle_client_request client_request(client.9174621:141162338 setfilelockrule 2, type 4, owner 14971048137043556787, pid 4632, start 0, length 0, wait 0 #0x10001e8dc37 2018-03-13 14:22:58.838521 caller_uid=0, caller_gid=0{}) v2
>> 2018-03-13 14:23:00.447258 7fde43e73700 4 mds.0.server handle_client_request client_request(client.9174621:141162339 setfilelockrule 2, type 4, owner 14971048137043550643, pid 4632, start 0, length 0, wait 0 #0x10001e8dc37 2018-03-13 14:22:58.838521 caller_uid=0, caller_gid=0{}) v2
>> 2018-03-13 14:23:00.447393 7fde43e73700 4 mds.0.server handle_client_request client_request(client.9174621:141162340 setfilelockrule 1, type 4, owner 14971048052668053939, pid 7, start 124, length 1, wait 0 #0x10001e8dc37 2018-03-13 14:22:58.838521 caller_uid=1155, caller_gid=1131{}) v2

The MDS reporting slow requests when file locking in use is a bug, the ticket is:
http://tracker.ceph.com/issues/22428

Probably only indirectly related to the stuck OSD requests: perhaps the application itself is having trouble promptly releasing locks because it is hung up on flushing its data to slow OSDs.

John

>
> --
>
> I don't have a particularly good monitoring set up on this cluster yet, but
> a cursory look at a few things such as iostat doesn't seem to suggest OSDs
> are being hammered.
>
> Some questions:
>
> 1) Can anyone recommend a way of diagnosing this issue?
> 2) Are the multiple "setfilelockrule" per inode to be expected? I assume
> this is something to do with the Samba oplocks.
> 3) What's the recommended highest MDS debug setting before performance
> starts to be adversely affected (I'm aware log files will get huge)?
> 4) What's the best way of matching inodes in the MDS log to the file names
> in cephfs?
>
> Hardware/Versions:
>
> Luminous 12.1.1
> Cephfs client 3.10.0-514.2.2.el7.x86_64
> Samba 4.4.4
> 4 node cluster, each node 1xIntel 3700 NVME, 12x SATA, 40Gbps networking
>
> Thanks in advance!
>
> Cheers,
> David
>
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
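
For questions 1 and 2 in the quoted message, a rough first step is to tally which inodes keep showing up in the logs. The Python sketch below assumes the exact line formats quoted above (Luminous 12.1.x) and will need adjusting for other releases. It pulls the object name out of each OSD `slow request ... osd_repop(...)` line and reads its hex prefix as the CephFS inode number (my understanding is that CephFS objects are named `<inode hex>.<index hex>`, but treat that as an assumption), and it counts `setfilelockrule` requests per inode and rule from the MDS debug log. The function names are made up for illustration. As far as I recall from `ceph_fs.h`, rule 1 is an fcntl (POSIX byte-range) lock, rule 2 is a flock lock, and type 4 means unlock; please verify that against the source for your version.

```python
import re
from collections import Counter

# Patterns are written against the log lines quoted in this thread
# (Luminous 12.1.x); other releases may format these messages differently.
SLOW_REQ_RE = re.compile(
    r"(?P<osd>osd\.\d+) .*?slow request (?P<age>[\d.]+) seconds old"
    r".*?osd_repop\((?P<reqid>\S+) (?P<pgid>\S+) \S+ (?P<hobj>\S+) v "
)
SETFILELOCK_RE = re.compile(
    r"setfilelockrule (?P<rule>\d+), type (?P<type>\d+).*?#(?P<ino>0x[0-9a-f]+)"
)


def parse_slow_request(line):
    """Extract OSD, age, pool and object/inode from one slow-request line."""
    m = SLOW_REQ_RE.search(line)
    if m is None:
        return None
    # The hobject looks like "3:7cea5bac:::10001a88b8f.00000000:head"; the
    # object name sits between ":::" and the trailing ":<snap>" field.
    name = m.group("hobj").split(":::", 1)[1].rsplit(":", 1)[0]
    # Assumption: CephFS object names are "<inode hex>.<index hex>".
    ino_hex, _, index_hex = name.partition(".")
    return {
        "osd": m.group("osd"),
        "age_s": float(m.group("age")),
        "pool": int(m.group("pgid").split(".")[0]),
        "object": name,
        "ino": int(ino_hex, 16),
        "index": int(index_hex, 16),
    }


def slow_requests_by_inode(lines):
    """Count slow requests per inode, keyed like the MDS log ("0x...")."""
    counts = Counter()
    for line in lines:
        rec = parse_slow_request(line)
        if rec is not None:
            counts[hex(rec["ino"])] += 1
    return counts


def setfilelock_by_inode(lines):
    """Count setfilelock requests per (inode, rule) in an MDS debug log."""
    counts = Counter()
    for line in lines:
        m = SETFILELOCK_RE.search(line)
        if m is not None:
            counts[(m.group("ino"), int(m.group("rule")))] += 1
    return counts
```

Because the functions only iterate over lines, an open log file can be passed straight in, e.g. `slow_requests_by_inode(open("ceph.log")).most_common(10)`. An inode that dominates both tallies would be a reasonable candidate to map back to a path.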
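
For question 4 (matching inodes in the MDS log to file names), one brute-force approach is to convert the `#0x...` inode from the MDS log to an integer and walk the kernel mount looking for a matching `st_ino`, which is essentially what `find <mount> -inum <decimal inode>` does. This Python sketch assumes a 64-bit kernel client, where I believe CephFS inode numbers are reported unchanged as `st_ino`. `parse_mds_ino` and `find_paths_by_inode` are illustrative names, not Ceph tools.

```python
import os


def parse_mds_ino(token):
    """Turn an MDS-log inode token such as "#0x10001e8dc37" into an int."""
    # int(..., 16) accepts the "0x" prefix, so only the "#" needs stripping.
    return int(token.lstrip("#"), 16)


def find_paths_by_inode(mount_point, ino):
    """Walk a mounted tree and return every path whose st_ino equals ino.

    Assumption: on a 64-bit CephFS kernel client, st_ino is the CephFS
    inode number. Hard links mean more than one path can match.
    """
    matches = []
    for dirpath, dirnames, filenames in os.walk(mount_point):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so that symlinks are compared by their own inode.
                if os.lstat(path).st_ino == ino:
                    matches.append(path)
            except OSError:
                # The entry vanished or is unreadable during the walk; skip it.
                continue
    return matches
```

For example, `find_paths_by_inode("/mnt/cephfs/share", parse_mds_ino("#0x10001e8dc37"))`, where the mount path is hypothetical. With millions of files the walk will be slow, so it is worth pointing it at the narrowest directory you suspect.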
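
On the filestore subfolder splitting mentioned in David's reply: as I read the filestore config reference, a PG subdirectory is split once it holds more than `filestore_split_multiple * abs(filestore_merge_threshold) * 16` objects, plus, in releases that have it, a random offset from `filestore_split_rand_factor`. The Python sketch below only does that arithmetic. The formula is my reading of the docs, not verified against the Luminous source, and the inputs should come from your own OSDs (e.g. `ceph daemon osd.15 config get filestore_split_multiple`) rather than from any values assumed here.

```python
def filestore_split_threshold(split_multiple, merge_threshold, split_rand_factor=0):
    """Return (min, max) objects a filestore subdirectory holds before splitting.

    Assumes the documented formula:
        (split_multiple * abs(merge_threshold) + rand() % split_rand_factor) * 16
    A split_rand_factor of 0 is treated as "no random component".
    """
    base = split_multiple * abs(merge_threshold)
    # rand() % n yields 0 .. n-1, so the largest possible offset is n - 1.
    max_extra = split_rand_factor - 1 if split_rand_factor > 0 else 0
    return base * 16, (base + max_extra) * 16
```

Since the threshold scales linearly with `filestore_split_multiple`, raising it postpones splits (at the cost of larger directories), while the random factor spreads splits out so that PGs created at the same time don't all split at once.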
