Re: Parallelizing rsync through multiple ssh connections

2021-12-16 Thread Robin H. Johnson via rsync
On Thu, Nov 04, 2021 at 04:58:03PM +0100, SERVANT Cyril via rsync wrote:
> Hi, I want to increase the speed of rsync transfers over ssh.
Thanks for your great email here.

Having had similar issues in the past trying to rsync single large
files, I wanted to share some of the ideas I'd found to work:

HPN-SSH patches. The website is out of date, but don't let that put
you off. HPN-SSH can saturate 40Gbit links with tuning (though it takes
real work to do that tuning). The main things there are the buffer
patches and the multithreaded AES, but you can also use the NONE cipher
for benchmarking.
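Concretely, the kind of invocation I mean for a NONE-cipher benchmark
run (a sketch from memory of the HPN-SSH option names; both ends need
the HPN patches, and the host/paths are placeholders):

  # NONE cipher: the bulk data stream is unencrypted while authentication
  # stays encrypted - benchmarking only, on links where that's acceptable.
  rsync -a --rsh="ssh -oNoneEnabled=yes -oNoneSwitch=yes" \
      /lustre/bigfile user@destination:/lustre/bigfile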

Intel had a paper from 2010 showing the HPN boost (and also other work
on multiple streams):
https://www.intel.com/content/dam/support/us/en/documents/network/sb/fedexcasestudyfinal.pdf

Facebook's WARP/WDT tooling:
https://github.com/facebookarchive/wdt
https://opensourcelibs.com/lib/warp-cli

Lastly, I was trying multipath TCP:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/configuring_and_managing_networking/getting-started-with-multipath-tcp_configuring-and-managing-networking
I didn't get very far on the MPTCP research angle.
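For reference, the basic MPTCP experiment looked roughly like this
(assumes a Linux 5.6+ kernel and the mptcpize wrapper from the mptcpd
project; host/paths are placeholders):

  # Enable MPTCP in the kernel, then force an unmodified rsync/ssh to
  # open MPTCP sockets instead of plain TCP (mptcpize is an LD_PRELOAD shim).
  sysctl -w net.mptcp.enabled=1
  mptcpize run rsync -a /lustre/bigfile user@destination:/lustre/bigfile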

I think all of these are likely to be complementary to your work on
partitioning the large file.

If you have a sample large file and permission to test without
encryption, temporarily replacing ssh with either the NONE cipher or a
buffer-tuned netcat would let you identify where rsync's bottleneck
lies in your situation. I previously found that rsync didn't do a good
job on the rsync:// wire protocol over high-latency links: it had too
many round trips and didn't do much work between them.
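A minimal form of that netcat test (listen syntax varies between netcat
implementations; pv is just there for a live throughput readout):

  # Receiver: discard the data to measure the pure network path, or
  # redirect into the target file system to include its write speed.
  nc -l 9000 > /dev/null

  # Sender:
  pv /lustre/bigfile | nc destination.example.com 9000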

From looking at the rsync code in the past, I think the checksum system
in general is going to be your largest problem:
- it assumes that it's checksumming a single stream for each file
- a meaningful replacement would be either independent per-segment
  checksums or something like a Merkle tree (sketch below)
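A minimal sketch of the per-segment idea (bash; the 1 GiB segment size
is made up - run it on both ends and diff the outputs):

  # Hash each segment of the file separately, so segments can be verified
  # - and re-transferred - independently and in parallel. A Merkle tree
  # would additionally hash these hashes pairwise up to a single root.
  file=/lustre/bigfile
  size=$(stat -c %s "$file")
  seg_mb=1024                                   # 1 GiB segments
  for ((off_mb=0; off_mb*1024*1024<size; off_mb+=seg_mb)); do
      sum=$(dd if="$file" bs=1M skip=$off_mb count=$seg_mb 2>/dev/null | sha256sum)
      echo "$off_mb $sum"
  done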
> 1. The need
> 
> TL;DR: we need to transfer one huge file quickly (100Gb/s) through ssh.
...
> In order to maximize transfer speed, I first tried different Ciphers / MACs.
> The list of authorized Ciphers / MACs is provided to me by our security team.
> With these constraints, I can reach 1Gb/s to 3Gb/s. That is still far from
> the expected result. This is due to the way encryption/decryption works on
> modern CPUs: it is really efficient thanks to AES-NI, but single-threaded.
> The bandwidth limiter is the speed of a single CPU core.
HPN-SSH's multithreaded AES (MT-AES) gets you to many cores at the SSH level.

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136




Parallelizing rsync through multiple ssh connections

2021-11-04 Thread SERVANT Cyril via rsync
Hi, I want to increase the speed of rsync transfers over ssh.


1. The need

TL;DR: we need to transfer one huge file quickly (100Gb/s) through ssh.

I'm working at CEA (Alternative Energies and Atomic Energy Commission) in
France. We have a complex of compute clusters, and our customers regularly
need to transfer big files to and from the clusters. The bandwidth between
the customers and us is generally between 10Gb/s and 100Gb/s. The file
system used is LustreFS, which can handle more than 100Gb/s of read and
write throughput. One security constraint is the use of ssh for every
connection.

In order to maximize transfer speed, I first tried different Ciphers / MACs.
The list of authorized Ciphers / MACs is provided to me by our security team.
With these constraints, I can reach 1Gb/s to 3Gb/s. That is still far from
the expected result. This is due to the way encryption/decryption works on
modern CPUs: it is really efficient thanks to AES-NI, but single-threaded.
The bandwidth limiter is the speed of a single CPU core.
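For reference, this single-core ceiling is easy to measure with
openssl's built-in benchmark:

  # Single-threaded AES-GCM throughput with AES-NI - roughly the upper
  # bound for one ssh connection using an AES cipher.
  openssl speed -evp aes-128-gcm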

So the next step is: just use parallel or xargs with rsync. That works like
a charm in most cases, but not in the compute cluster case. As I said
earlier, the files are stored in LustreFS. Good practice for this file
system is to create very few, but very large, files. And with the way
compute clusters work, you generally end up with one really big file, often
hundreds of gigabytes or even terabytes.
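For completeness, the per-file parallelization that works well for many
files looks roughly like this (paths and job count are placeholders):

  # One rsync - and thus one ssh connection, one CPU core doing crypto -
  # per file, 8 files in flight at a time. Useless for one huge file.
  find /lustre/src -maxdepth 1 -type f -print0 \
      | xargs -0 -P 8 -I{} rsync -a {} user@destination:/lustre/dst/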


2. What has been done

I created parallel-sftp (https://github.com/cea-hpc/openssh-portable/). It
is just a fork of openssh's sftp that creates multiple ssh connections for
a single transfer. This makes parallelization really simple: files are
transferred in parallel just like with the parallel/xargs solution, and big
files are transferred in chunks directly into the destination file (created
as a sparse file). One big advantage of this solution is that it doesn't
require any server-side change. All the parallelization is done on the
client side.
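The chunk mechanism can be illustrated with stock tools (just a sketch
of the idea, not what parallel-sftp actually runs; offsets and paths are
made up):

  # Pull the 1 GiB chunk at offset 2 GiB of a remote file into the same
  # offset of the local copy; conv=notrunc leaves the rest of the sparse
  # destination file untouched.
  ssh user@source "dd if=/lustre/bigfile bs=1M skip=2048 count=1024" \
      | dd of=/lustre/bigfile bs=1M seek=2048 conv=notrunc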

However, there are two caveats: there is no consistency check of the copied
files, and an interrupted transfer must be restarted from scratch, because
there is no way to know exactly which chunks of a big file have already
been transferred.
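The first caveat can at least be checked after the fact, at the cost of
re-reading the whole file on both sides:

  # Compare whole-file checksums once the transfer has finished.
  sha256sum /lustre/bigfile
  ssh user@source sha256sum /lustre/bigfile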


3. Is rsync the best solution?

Now I'm thinking that adding parallelization to rsync is the best solution.
We could take advantage of the delta-transfer algorithm in order to transfer
just parts of a file. I can imagine a first rsync connection taking care of
detecting the diffs between the local and remote files, and then forking (or
creating threads) for the actual transfers. The development work could be
split into two parts:
- adding the possibility to transfer part of a file (from byte x to byte y);
- adding the possibility to delegate the transfers to other threads /
  processes (a rough sketch follows).
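As a very rough client-side illustration of those two parts with stock
tools (bash; host, paths and worker count are made up, and there is no
delta detection or consistency check here):

  # Partition the remote file into n byte ranges and fetch them over n
  # independent ssh connections in parallel, writing each range at its
  # own offset in the destination file.
  file=/lustre/bigfile; remote=user@source; n=4
  size_mb=$(( ($(ssh "$remote" stat -c %s "$file") + 1048575) / 1048576 ))
  step=$(( (size_mb + n - 1) / n ))              # MiB per worker
  for ((i=0; i<n; i++)); do
      ssh "$remote" "dd if=$file bs=1M skip=$((i*step)) count=$step" \
          | dd of="$file" bs=1M seek=$((i*step)) conv=notrunc &
  done
  wait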

What do you think about this? Does it look feasible? If I develop it, does it
have a chance to be merged upstream? I understand it's kind of a niche use
case, but I know it's a frequent need in the super-computing world.

One important thing to note is that at CEA we have the manpower and will to
develop this functionality. We are also open to sponsoring, for development
and/or reviews.


Thank you,
-- 
Cyril



parallelizing rsync

2021-10-27 Thread Steve French via rsync
I noticed a few external threads about problems with rsync performance
due to lack of parallelization. Modern Linux tools are expected to use
"io_uring" for parallel I/O, some projects have apparently already
experimented with it, and it is something we were looking at as well.
For example, a Google search shows multiple examples of rsync with
io_uring:

https://news.ycombinator.com/item?id=23132549

and

https://wheybags.com/blog/wcp.html
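For what it's worth, the io_uring side can be explored independently of
rsync with fio's io_uring engine (needs a reasonably recent fio; the
file and sizes below are placeholders):

  # Parallel 1 MiB reads submitted through io_uring, 32 in flight.
  fio --name=uring-read --ioengine=io_uring --rw=read --bs=1M \
      --iodepth=32 --direct=1 --size=4G --filename=/lustre/bigfile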

Are there already upstream patches in progress for this in rsync, or
should we submit something similar?

-- 
Thanks,

Steve
