[ceph-users] Re: CephFS mirror very slow (maybe for small files?)

2023-11-15 Thread Stuart Cornell
Hi Jos,
 I have tried adding multiple daemons, but it seems only one is active and there 
is no improvement in throughput.
On further reading, your suggestion conflicts with the docs 
(https://docs.ceph.com/en/reef/dev/cephfs-mirroring/#:~:text=Multiple%20mirror%20daemons%20can%20be,set%20thus%20providing%20high%2Davailability.)
The docs recommend against multiple daemons, yet they also say that mirroring "...rebalances 
the directory assignment amongst the new set thus providing high-availability." 
This sounds like it can only balance if there are multiple directories 
registered for snapshots. As mentioned in my OP, we must use only one.
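A minimal sketch of checking this (commands as per the upstream cephfs-mirroring 
docs; angle-bracket names are placeholders, and older releases may need the 
filesystem name as an extra argument):

    # List mirror daemons together with the filesystems and peers they serve.
    ceph fs snapshot mirror daemon status

    # Per-directory sync statistics come from the mirror daemon's admin socket;
    # <fs_name>@<fs_id> and <peer_uuid> are taken from the status output above.
    ceph --admin-daemon /var/run/ceph/cephfs-mirror.<client_id>.asok \
        fs mirror peer status <fs_name>@<fs_id> <peer_uuid>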


[ceph-users] Re: CephFS mirror very slow (maybe for small files?)

2023-11-15 Thread Stuart Cornell
Thank you, Jos.
 I will try multiple daemons to see how that helps. It looks like I will need to 
wait for the fix [1] to land in a release (it is currently pending review) before I can 
apply it.

Stuart


[ceph-users] Re: CephFS mirror very slow (maybe for small files?)

2023-11-13 Thread Jos Collin

Hi Stuart,

I would highly recommend applying this fix [1], so that mirroring 
works as expected and uses the previous snapshot for syncing.

Having multiple mirror daemons also improves the speed.
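
A minimal sketch of scaling the daemons out, assuming a cephadm-managed 
cluster (the count is illustrative):

    # Ask the orchestrator to run three cephfs-mirror daemons instead of one.
    ceph orch apply cephfs-mirror 3

    # Confirm how many instances are actually running.
    ceph orch ls cephfs-mirror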

[1] https://github.com/ceph/ceph/pull/54405

- Jos Collin

On 13/11/23 21:31, Stuart Cornell wrote:

Hi all.
I have successfully configured an operational mirror between two sites for CephFS. 
The mirroring is running, but the speed of data transfer varies a lot 
over time (200KB/s – 120MB/s). The network infrastructure between the two Ceph 
clusters is reliable and should not be the cause of this variation. The 
FS in question has a lot of small files in it, and I suspect this is the cause 
of the variability; that is, the transfer of many small files will be more impacted 
by greater site-site latency.
If this suspicion is true, what options do I have to improve the overall 
throughput?

   *   Is it possible to parallelise or “chunk” the transfers with some options 
to the mirror daemon?
   *   Would the use of multiple snapshot mirror points help?
   *   Note that I am currently forced to use only a single point (ceph fs snapshot 
mirror add <fs_name> <path>) because the part of the FS in use is managed by 
OpenStack Manila, which created a subvolume; requests to add mirrors for 
sub-directories are therefore denied. (A minimal sketch of the registration 
commands is included after this list.)
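A minimal sketch of the registration commands involved (taken from the 
cephfs-mirroring documentation; the angle-bracket names are placeholders, 
not the real values from this cluster):

    # Enable the mirroring module (both clusters) and mirroring on the source FS.
    ceph mgr module enable mirroring
    ceph fs snapshot mirror enable <fs_name>

    # Bootstrap the peer: create a token on the target cluster, import it on the source.
    ceph fs snapshot mirror peer_bootstrap create <fs_name> <client_entity> <site_name>
    ceph fs snapshot mirror peer_bootstrap import <fs_name> <bootstrap_token>

    # Register the single mirrored path; with Manila this is the subvolume root.
    ceph fs snapshot mirror add <fs_name> <path>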
Any suggestions for how I can improve this throughput would be most welcome.
Currently I am running Pacific (16.2.10) on the sender and Quincy (17.2.6) on 
the target.

Stuart Cornell
Cloud Development Director
http://graphcore.ai





[ceph-users] Re: CephFS mirror very slow (maybe for small files?)

2023-11-13 Thread Peter Grandi
> the speed of data transfer is varying a lot over time (200KB/s
> – 120MB/s). [...] The FS in question, has a lot of small files
> in it and I suspect this is the cause of the variability – ie,
> the transfer of many small files will be more impacted by
> greater site-site latency.

200KB/s on small files across sites? That's pretty good. I have
seen rates of 3-5KB/s on some Ceph instances for reading local
small files, never mind remotely.

> If this suspicion is true, what options do I have to improve
> the overall throughput?

In practice, not much. Perhaps switching to all-RAM storage (with 
battery backup) for the OSDs might help :-). In one case, by undoing 
some of the more egregious issues, I managed to improve small-file 
transfer rates locally by 10 times, that is, to 40-60KB/s. 
In your case a 10-times improvement, if achievable, might get 
you transfer rates of 2MB/s. Often the question is not just the 
longer network latency, but whether your underlying storage can 
sustain the IOPS needed for "scan"-type operations at the same 
time as the user workload.
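
For a rough sense of scale (the numbers below are assumptions for 
illustration, not measurements from this thread): a single-threaded copy 
that pays one round trip per small file is latency-bound at roughly 
file size divided by round-trip time.

    # ~16 KB average file size and ~20 ms of per-file round trips gives
    # about 16 KB / 0.02 s = 800 KB/s, no matter how fat the pipe is.
    awk 'BEGIN { size_kb = 16; rtt_s = 0.02; printf "%.0f KB/s\n", size_kb / rtt_s }'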

Perhaps it would go a lot faster if you just used RSYNC, or even just 
'tar -f - -c ... | ssh ... tar -f - -x' (or 'rclone' if you 
don't use CephFS). It would be worth doing a test of 
transferring a directory (or bucket, if you don't use CephFS) 
with small files by RSYNC and/or 'tar' to a non-Ceph remote 
target and to a Ceph remote target, to see what you could achieve.
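
A minimal sketch of such a test (hosts and paths are placeholders; one 
target sits on a plain local disk, the other on the remote CephFS):

    # Push the same small-file tree to both targets and compare wall-clock times.
    time rsync -a --stats /mnt/cephfs/smallfiles/ remotehost:/data/non-ceph-target/
    time rsync -a --stats /mnt/cephfs/smallfiles/ remotehost:/mnt/cephfs/ceph-target/

    # Or the tar-over-ssh variant:
    time tar -C /mnt/cephfs/smallfiles -cf - . | ssh remotehost 'tar -C /data/non-ceph-target -xf -'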

No network/sharded filesystem (and very few local ones) handles 
small files well. In some cases I have seen, Ceph was used to 
store an image of a traditional filesystem of a type more suitable for 
small files, mounted on a loop device.

https://www.sabi.co.uk/blog/anno05-4th.html?051016#051016
https://www.sabi.co.uk/blog/0909Sep.html?090919#090919
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io