Hoping someone may be able to help point out where my bottleneck(s) may be.

I have an 80TB kRBD image on an EC8:2 pool, with an XFS filesystem on top of it.
This was not an ideal scenario; rather, it was a rescue mission to dump a large, 
aging RAID array before it was too late, so I'm working with the hand I was dealt.

To further complicate matters, the main directory structure consists of lots 
and lots of small files in deep directories.

My goal is to rsync (or otherwise copy) data from the RBD to cephfs, but it's 
just unbearably slow and will take ~150 days to transfer ~35TB, which is far 
from ideal.

>          15.41G  79%    4.36MB/s    0:56:09 (xfr#23165, ir-chk=4061/27259)

> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0.17    0.00    1.34   13.23    0.00   85.26
> Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
> rbd0           124.00      0.66     0.00   0.00   17.30     5.48   50.00      0.17     0.00   0.00   31.70     3.49    0.00      0.00     0.00   0.00    0.00     0.00    3.39  96.40

The above is rsync progress and iostat output (captured during the rsync) for a 
copy from the RBD to a local SSD, to rule out any bottleneck on the cephfs side.
That is about 16G in 1h, not exactly blazing, and it covers only 5 of the 7000 
directories I'm looking to offload to cephfs.
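
As a rough sanity check on those numbers, here is a small back-of-the-envelope 
sketch. It treats rsync's MB/s and the ~35TB figure as decimal units, which is 
an assumption, and the function names are just for illustration. It shows how 
long ~35TB would take at the 4.36MB/s seen above, and what average rate a 
~150 day estimate implies.

```python
# Back-of-the-envelope transfer-time arithmetic.
# Assumption: decimal units (1 MB = 1e6 bytes, 1 TB = 1e12 bytes).
SECONDS_PER_DAY = 86_400


def days_to_transfer(total_bytes: float, bytes_per_second: float) -> float:
    """Days needed to move total_bytes at a steady bytes_per_second."""
    return total_bytes / bytes_per_second / SECONDS_PER_DAY


def implied_rate_mb_s(total_bytes: float, days: float) -> float:
    """Average MB/s implied by moving total_bytes in the given number of days."""
    return total_bytes / (days * SECONDS_PER_DAY) / 1e6


TOTAL_BYTES = 35e12  # ~35TB still to move

# Days at the RBD -> local SSD rate reported by rsync (4.36MB/s)
print(round(days_to_transfer(TOTAL_BYTES, 4.36e6)))

# Average rate implied by the ~150 day estimate
print(round(implied_rate_mb_s(TOTAL_BYTES, 150), 2))
```

Under those assumptions the local-SSD rate works out to roughly three months 
for the full ~35TB, and ~150 days corresponds to an average of about 2.7MB/s.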

Currently running 15.2.11, and the host is Ubuntu 20.04 (5.4.0-72-generic) with 
a single E5-2620, 64GB of memory, and 4x10GbT bond talking to ceph, iperf 
proves it out.
EC8:2, across about 16 hosts, 240 OSDs, with 24 of those being 8TB 7.2k SAS, 
and the other 216 being 2TB 7.2K SATA. So there are quite a few spindles in 
play here.
Only 128 PGs in this pool, but it's the only RBD image in this pool. The autoscaler 
recommends going to 512, but I was hoping to avoid the performance overhead of 
the PG splits if possible, given perf is bad enough as is.

Examining the main directory structure, it looks like there are about 7000 files 
per directory, about 60% of which are <1MiB, totaling nearly 5GiB per directory.
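
For anyone wanting to reproduce that kind of breakdown, a minimal sketch of a 
size survey is below. The function name and the 1MiB threshold constant are my 
own choices for illustration, and it makes one pass over the tree with 
os.walk, so on a filesystem this slow it is best pointed at a single directory 
rather than the whole image.

```python
import os
import sys

ONE_MIB = 1024 * 1024  # threshold for counting a file as "small"


def survey(root):
    """Return (file_count, small_file_count, total_bytes) for the tree under root."""
    count = small = total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                # lstat so symlinks are measured themselves, not followed
                size = os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
            count += 1
            total += size
            if size < ONE_MIB:
                small += 1
    return count, small, total


if __name__ == "__main__" and len(sys.argv) > 1:
    files, small_files, total_bytes = survey(sys.argv[1])
    pct = 100.0 * small_files / files if files else 0.0
    print(f"{files} files, {pct:.0f}% under 1MiB, {total_bytes / 2**30:.2f} GiB total")
```

Run against one directory, it prints the file count, the share of files under 
1MiB, and the total size in GiB.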

My fstab for this is:
> xfs   _netdev,noatime 0       0

I tried to increase the read_ahead_kb to 4M from 128K at 
/sys/block/rbd0/queue/read_ahead_kb to match the object/stripe size of the EC 
pool, but that doesn't appear to have had much of an impact.

The only thing I can think of that I could possibly try as a change would be to 
increase the queue depth in the rbdmap up from 128, so that's my next bullet to try.
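
Before changing that, a rough check with Little's law (requests in flight is 
about request rate times latency) can be run against the iostat sample above. 
This is only a sketch: it assumes that one sample is representative, and the 
function names are made up for illustration. The second helper assumes 
per-request latency would stay flat with more concurrency, which it most 
likely would not.

```python
def in_flight(iops: float, await_ms: float) -> float:
    """Little's law: average requests outstanding = rate * latency."""
    return iops * await_ms / 1000.0


def iops_at_depth(depth: float, await_ms: float) -> float:
    """Hypothetical IOPS if `depth` requests were kept in flight at the same latency."""
    return depth * 1000.0 / await_ms


# Figures from the iostat line for rbd0
reads = in_flight(124.00, 17.30)   # r/s * r_await
writes = in_flight(50.00, 31.70)   # w/s * w_await
estimated = reads + writes

print(f"estimated requests in flight: {estimated:.2f}")
print(f"iostat aqu-sz:                {3.39:.2f}")

# Purely hypothetical: read IOPS with 32 reads outstanding at 17.30 ms each
print(f"reads/s at depth 32:          {iops_at_depth(32, 17.30):.0f}")
```

The estimate lands close to the reported aqu-sz of 3.39, so on average only 
three or four requests are outstanding, far below 128. If that holds, the 
queue depth limit may not be what is binding here, and per-request latency 
combined with low concurrency looks like the more likely constraint.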

Attaching xfs_info in case there are any useful nuggets:
> meta-data=/dev/rbd0              isize=256    agcount=81, agsize=268435455 blks
>          =                       sectsz=512   attr=2, projid32bit=0
>          =                       crc=0        finobt=0, sparse=0, rmapbt=0
>          =                       reflink=0
> data     =                       bsize=4096   blocks=21483470848, imaxpct=5
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0, ftype=0
> log      =internal log           bsize=4096   blocks=32768, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=0
> realtime =none                   extsz=4096   blocks=0, rtextents=0

And rbd-info:
> rbd image 'rbd-image-name':
>         size 85 TiB in 22282240 objects
>         order 22 (4 MiB objects)
>         snapshot_count: 0
>         id: a09cac2b772af5
>         data_pool: rbd-ec82-pool
>         block_name_prefix: rbd_data.29.a09cac2b772af5
>         format: 2
>         features: layering, exclusive-lock, object-map, fast-diff, deep-flatten, data-pool
>         op_features:
>         flags:
>         create_timestamp: Mon Apr 12 18:44:38 2021
>         access_timestamp: Mon Apr 12 18:44:38 2021
>         modify_timestamp: Mon Apr 12 18:44:38 2021

Any other ideas or hints are greatly appreciated.


ceph-users mailing list -- ceph-users@ceph.io