David, a few inputs based on my working experience with CephFS. They might or
might not be relevant to the current issue seen in your cluster.


  1.  Create the metadata pool on NVMe. Folks may claim it isn't needed, but I have
seen much worse performance with metadata on HDD, even though the metadata
footprint is very small (rough command sketch after the links below).
  2.  Ensure the MDS node has enough RAM allocated for the MDS cache (this won't
improve performance drastically, but it helps to some extent; see the config sketch
below as well). As a side note, the MDS has a bug that can oversubscribe memory
regardless of the cache settings if you have more than 64GB of RAM. Take a look:
http://tracker.ceph.com/issues/21402

http://tracker.ceph.com/issues/22599

https://bugzilla.redhat.com/show_bug.cgi?id=1531679
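
For point 1, on Luminous you can pin the metadata pool to NVMe with a device-class
CRUSH rule. A rough sketch only - the rule/pool names and pg counts are just
examples, and it assumes your NVMe OSDs already carry the "nvme" device class:

  # CRUSH rule that only selects OSDs with device class "nvme"
  ceph osd crush rule create-replicated nvme-only default host nvme

  # either create the metadata pool with that rule...
  ceph osd pool create cephfs_metadata 64 64 replicated nvme-only

  # ...or move an existing metadata pool onto it (this triggers backfill
  # of the metadata objects, so do it in a quiet window)
  ceph osd pool set cephfs_metadata crush_rule nvme-only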
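
For point 2, on recent Luminous the MDS cache limit is memory based; older releases
use mds_cache_size (an inode count) instead. Something like this on the MDS node,
with the 16GB value only as an example - size it to your RAM:

  [mds]
  mds_cache_memory_limit = 17179869184   # 16 GiB

  # or change it at runtime, e.g.
  ceph tell mds.<id> injectargs '--mds_cache_memory_limit=17179869184'

Keep an eye on the resident memory of the ceph-mds process because of the bugs
linked above.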

  3.  CephFS is not great with small files (KBs in size) but works well with large
files (MBs or GBs), so a filer-style (NFS/SMB) use case needs administrative
attention.
  4.  The next thing to check is a large inode/file count in CephFS. Make sure
tunables such as dirfrag and active/active MDS are enabled on the Luminous version
you are running on filestore (see the sketch after this list), or ask users not to
store millions of small files in a single directory (a debatable scenario; not sure
how much control you have over your customers' use cases).
  5.  Always use kernel mounts. ceph-fuse is super slow (3-5 times slower than the
kernel client); you may already know this. A mount example is below.
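
For point 4, a rough sketch of the knobs (flag names vary slightly across releases,
and "cephfs" here is just a placeholder for your filesystem name):

  # directory fragmentation; on by default in recent Luminous, so this may be a no-op
  ceph fs set cephfs allow_dirfrags true

  # allow and enable a second active MDS (keep at least one standby as well)
  ceph fs set cephfs allow_multimds true
  ceph fs set cephfs max_mds 2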
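
For point 5, a typical kernel mount looks like this (monitor address, client name
and secret file path are placeholders for your own values):

  mount -t ceph mon1:6789:/ /mnt/cephfs -o name=samba,secretfile=/etc/ceph/client.samba.secret

ceph-fuse goes through FUSE and userspace, which is where most of the 3-5x slowdown
comes from for a Samba re-export workload.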


--
Deepak

From: ceph-users [mailto:[email protected]] On Behalf Of David C
Sent: Wednesday, March 14, 2018 10:46 AM
To: John Spray <[email protected]>
Cc: ceph-users <[email protected]>
Subject: Re: [ceph-users] Cephfs MDS slow requests

Thanks, John. I'm pretty sure the root of my slow OSD issues is filestore 
subfolder splitting.
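
For anyone hitting the same thing: a filestore subdirectory splits once it holds
more than roughly filestore_split_multiple * abs(filestore_merge_threshold) * 16
files, so those settings control when the splitting storms hit. A sketch with
illustrative values only; defaults differ by release:

  [osd]
  filestore_split_multiple = 8
  filestore_merge_threshold = -16   # negative should disable merging; the absolute value still feeds the split formula
  filestore_split_rand_factor = 20  # spreads splits out so they don't all hit at once

There is also an offline pre-split via ceph-objectstore-tool --op apply-layout-settings
that may be worth reading up on.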

On Wed, Mar 14, 2018 at 2:17 PM, John Spray 
<[email protected]<mailto:[email protected]>> wrote:
On Tue, Mar 13, 2018 at 7:17 PM, David C 
<[email protected]<mailto:[email protected]>> wrote:
> Hi All
>
> I have a Samba server that is exporting directories from a Cephfs Kernel
> mount. Performance has been pretty good for the last year but users have
> recently been complaining of short "freezes", these seem to coincide with
> MDS related slow requests in the monitor ceph.log such as:
>
>> 2018-03-13 13:34:58.461030 osd.15 osd.15 
>> 10.10.10.211:6812/13367 5752 :
>> cluster [WRN] slow request 31.834418 seconds old, received at 2018-03-13
>> 13:34:26.626474: osd_repop(mds.0.5495:810644 3.3e e14085/14019
>> 3:7cea5bac:::10001a88b8f.00000000:head v 14085'846936) currently commit_sent
>> 2018-03-13 13:34:59.461270 osd.15 osd.15 
>> 10.10.10.211:6812/13367 5754 :
>> cluster [WRN] slow request 32.832059 seconds old, received at 2018-03-13
>> 13:34:26.629151: osd_repop(mds.0.5495:810671 2.dc2 e14085/14020
>> 2:43bdcc3f:::10001e91a91.00000000:head v 14085'21394) currently commit_sent
>> 2018-03-13 14:23:57.409427 osd.30 osd.30 
>> 10.10.10.212:6824/14997 5708 :
>> cluster [WRN] slow request 30.536832 seconds old, received at 2018-03-13
>> 14:23:26.872513: osd_repop(mds.0.5495:865403 2.fb6 e14085/14077
>> 2:6df955ef:::10001e93542.000000c4:head v 14085'21296) currently commit_sent
>> 2018-03-13 14:23:57.409449 osd.30 osd.30 
>> 10.10.10.212:6824/14997 5709 :
>> cluster [WRN] slow request 30.529640 seconds old, received at 2018-03-13
>> 14:23:26.879704: osd_repop(mds.0.5495:865407 2.595 e14085/14019
>> 2:a9a56101:::10001e93542.000000c8:head v 14085'20437) currently commit_sent
>> 2018-03-13 14:23:57.409453 osd.30 osd.30 
>> 10.10.10.212:6824/14997 5710 :
>> cluster [WRN] slow request 30.503138 seconds old, received at 2018-03-13
>> 14:23:26.906207: osd_repop(mds.0.5495:865423 2.ea e14085/14055
>> 2:57096bbf:::10001e93542.000000d8:head v 14085'21147) currently commit_sent
>
>
> --
>
> Looking in the MDS log, with debug set to 4, it's full of "setfilelockrule
> 1" and "setfilelockrule 2":
>
>> 2018-03-13 14:23:00.446905 7fde43e73700  4 mds.0.server
>> handle_client_request client_request(client.9174621:141162337
>> setfilelockrule 1, type 4, owner 14971048052668053939, pid 7, start 120,
>> length 1, wait 0 #0x10001e8dc37 2018-03-13 14:22:58.838521 caller_uid=1155,
>> caller_gid=1131{}) v2
>> 2018-03-13 14:23:00.447050 7fde43e73700  4 mds.0.server
>> handle_client_request client_request(client.9174621:141162338
>> setfilelockrule 2, type 4, owner 14971048137043556787, pid 4632, start 0,
>> length 0, wait 0 #0x10001e8dc37 2018-03-13 14:22:58.838521 caller_uid=0,
>> caller_gid=0{}) v2
>> 2018-03-13 14:23:00.447258 7fde43e73700  4 mds.0.server
>> handle_client_request client_request(client.9174621:141162339
>> setfilelockrule 2, type 4, owner 14971048137043550643, pid 4632, start 0,
>> length 0, wait 0 #0x10001e8dc37 2018-03-13 14:22:58.838521 caller_uid=0,
>> caller_gid=0{}) v2
>> 2018-03-13 14:23:00.447393 7fde43e73700  4 mds.0.server
>> handle_client_request client_request(client.9174621:141162340
>> setfilelockrule 1, type 4, owner 14971048052668053939, pid 7, start 124,
>> length 1, wait 0 #0x10001e8dc37 2018-03-13 14:22:58.838521 caller_uid=1155,
>> caller_gid=1131{}) v2
The MDS reporting slow requests when file locking is in use is a known bug; the
ticket is:
http://tracker.ceph.com/issues/22428

Probably only indirectly related to the stuck OSD requests: perhaps
the application itself is having trouble promptly releasing locks
because it is hung up on flushing its data to slow OSDs.

John

>
> --
>
> I don't have a particularly good monitoring set up on this cluster yet, but
> a cursory look at a few things such as iostat doesn't seem to suggest OSDs
> are being hammered.
>
> Some questions:
>
> 1) Can anyone recommend a way of diagnosing this issue?
> 2) Are the multiple "setfilelockrule" entries per inode to be expected? I assume
> this is something to do with the Samba oplocks.
> 3) What's the recommended highest MDS debug setting before performance
> starts to be adversely affected (I'm aware log files will get huge)?
> 4) What's the best way of matching inodes in the MDS log to the file names
> in cephfs?
>
> Hardware/Versions:
>
> Luminous 12.1.1
> Cephfs client 3.10.0-514.2.2.el7.x86_64
> Samba 4.4.4
> 4 node cluster, each node 1xIntel 3700 NVME, 12x SATA, 40Gbps networking
>
> Thanks in advance!
>
> Cheers,
> David
>
>
>
> _______________________________________________
> ceph-users mailing list
> [email protected]<mailto:[email protected]>
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


-----------------------------------------------------------------------------------
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
-----------------------------------------------------------------------------------
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
