Hi, experts,

We are using CephFS (15.2.*) with the kernel mount in our production environment.
Recently, when we do massive reads from the cluster (multiple processes), ceph
health always reports slow ops for some OSDs (built on 8 TB HDDs, with SSDs used
as DB cache).

Our cluster serves more read requests than write requests.

The health log looks like this:
100 slow ops, oldest one blocked for 114 sec, [osd.* ...] has slow ops (SLOW_OPS)

My question is: are there any best practices for processing hundreds of millions of small files (100 KB-300 KB per file, 10,000+ files in each directory, and more than 5,000 directories)?

Small files are slow on any HDD system; each HDD can only do around 100 ops per sec. Some things to try (note that some may involve copying data):

-If your workload mostly processes recent files, you could use 2 pools: an SSD pool for recent files and a larger HDD pool for less frequently accessed, archived files.

-If you can process files in groups, consider storing them in larger lumps, such as tar files, or even restructuring the data in SQLite. You would then modify the processing application to tar/untar the group, or to access the data via SQLite.

-It may help to reduce read_ahead_kb on your HDD devices to reduce unneeded load.

-Using dm-cache on the HDDs may help, though our experience with it is not great (we use dm-writecache instead, but that is geared toward speeding up writes). It should cache more recently read objects on SSD, but its promotion algorithms may not match your workload pattern, so try it first in a lab with a similar workload.

-Using a Ceph cache tier may help, though our experience with it is not great, and its support is also deprecated.

-The file sizes, averaging 150 KB, are not large but also not extremely small. You could lower the application concurrency (number of processes) so as not to push the disk % busy above, say, 80%. With a 150 KB size you should get around 10 MB/s read speed from your HDD. Having too many processes could actually slow things down.

-You may want to lower your scrub rates or increase the scrub window; with a lot of small files, scrubbing will already be stressing your HDDs.

-Any Ceph recovery/healing with small files on HDD will also slow things down. It is something to bear in mind, though there is not much we can do about it.
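
As a rough illustration of the "larger lumps" suggestion, here is a minimal Python sketch that packs one directory of small files into a single SQLite file and reads one back by name. The table layout and the function names (pack_directory, read_file) are hypothetical, chosen only for this example. It also assumes that only one process writes a given database file at a time, since SQLite locking over a network file system can be unreliable.

```python
import sqlite3
from pathlib import Path


def pack_directory(src_dir: str, db_path: str) -> int:
    """Store each regular file directly under src_dir as one row (name, data).

    Keep db_path outside src_dir, otherwise the database itself gets packed.
    Returns the number of rows in the table afterwards.
    """
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS files "
            "(name TEXT PRIMARY KEY, data BLOB NOT NULL)"
        )
        rows = (
            (p.name, p.read_bytes())
            for p in sorted(Path(src_dir).iterdir())
            if p.is_file()
        )
        with conn:  # commit the whole batch as a single transaction
            conn.executemany(
                "INSERT OR REPLACE INTO files (name, data) VALUES (?, ?)", rows
            )
        return conn.execute("SELECT COUNT(*) FROM files").fetchone()[0]
    finally:
        conn.close()


def read_file(db_path: str, name: str) -> bytes:
    """Fetch the contents of one packed file by its original file name."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT data FROM files WHERE name = ?", (name,)
        ).fetchone()
        if row is None:
            raise KeyError(name)
        return row[0]
    finally:
        conn.close()
```

With 10,000+ files per directory, one database per directory would turn 10,000+ small objects into one large file that is read mostly sequentially, which is the kind of access HDDs handle much better. Whether that fits depends on how the application looks files up.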

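The per-HDD numbers above can be sanity-checked with simple arithmetic: throughput is roughly files per second times file size. The sketch below is only a back-of-the-envelope model. The ios_per_file parameter is a hypothetical knob, not something measured, to account for extra disk operations per file (for example metadata lookups).

```python
def estimated_read_mb_per_s(
    iops: float, file_kb: float, ios_per_file: float = 1.0
) -> float:
    """Rough per-HDD read throughput for a small-file workload, in MB/s.

    iops         -- random operations the disk can do per second (~100 for HDD)
    file_kb      -- average file size in KB
    ios_per_file -- assumed disk operations needed per file (hypothetical)
    """
    files_per_s = iops / ios_per_file
    return files_per_s * file_kb / 1000.0


# Compare a few assumed values of ios_per_file for 150 KB files at 100 ops/sec.
for ios_per_file in (1.0, 1.5, 2.0):
    mb_s = estimated_read_mb_per_s(100, 150, ios_per_file)
    print(f"{ios_per_file:.1f} I/Os per file -> about {mb_s:.1f} MB/s per HDD")
```

With exactly one operation per file this gives an upper bound of 15 MB/s, so a figure of around 10 MB/s is plausible once some per-file overhead is included. Adding more reader processes does not raise this ceiling; it only lengthens the queue on each disk.
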
/maged

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
