Re: [Bug 5124] Parallelize the rsync run using multiple threads and/or connections

2023-04-17 Thread Harry Mangalam via rsync
Obviously self-promotional, but Parsyncfp?

https://github.com/hjmangalam/parsyncfp


There are other similarly scripted parallel rsync wrappers, but I like this one.  ;)
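For anyone wanting to try it, a hypothetical invocation (flag names are from my memory of the parsyncfp README, so verify them before use) might look like:

```
parsyncfp --NP=8 --chunksize=10G --startdir=/data projects remotehost:/backup
```

Here --NP would be the number of parallel rsync processes and --chunksize the amount of data handed to each one; the README documents the real options.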

Hjm

On Mon, Apr 17, 2023, 3:42 AM just subscribed for rsync-qa from bugzilla
via rsync  wrote:

> https://bugzilla.samba.org/show_bug.cgi?id=5124
>
> --- Comment #12 from Paulo Marques  ---
> Using multiple connections also helps when you have LACP network links,
> which are relatively common in data center setups to have both
> redundancy and increased bandwidth.
>
> If you have two 1Gbps links aggregated, you can only use 1Gbps using
> rsync, but you could use 2Gbps if rsync made several connections from
> different TCP ports.
>
> --
> You are receiving this mail because:
> You are the QA Contact for the bug.
> --
> Please use reply-all for most replies to avoid omitting the mailing list.
> To unsubscribe or change options:
> https://lists.samba.org/mailman/listinfo/rsync
> Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
>


RE: [Bug 5124] Parallelize the rsync run using multiple threads and/or connections

2019-02-07 Thread Marc Roos via rsync

I made a bash script doing this in parallel: it checks how many rsyncs are
running and then starts another concurrent one. My parallel sessions run
against different servers. I doubt it would make sense to run multiple
sessions between the same two hosts; my single rsync session was already
limited by the host's IOPS, so two threads would each run at half speed.

IMO rsync does what it needs to do; if you want it to run in parallel,
execute it in parallel.
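A minimal sketch of that script's control flow, with hypothetical server
names and the rsync call replaced by a stand-in so the loop itself is
visible:

```shell
#!/bin/bash
# Start one transfer per destination server, but never more than MAX at
# once.  The real worker would be something like:
#     rsync -a /data/ "$host:/backup/data/"
MAX=2
start_sync() {
    sleep 1    # stand-in for the rsync invocation
}
for host in server1 server2 server3 server4 server5; do
    # Block while MAX background transfers are still running.
    while [ "$(jobs -pr | wc -l)" -ge "$MAX" ]; do
        sleep 0.2
    done
    echo "starting transfer to $host"
    start_sync "$host" &
done
wait    # don't exit until every transfer has finished
echo "all transfers done"
```

GNU parallel or xargs -P can replace the hand-rolled loop if they are
available.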


 >
 >--- Comment #8 from Michael  ---
 >+1 from me on this.
 >
 >We have several situations where we need to copy a large number of very
 >small files, and I expect that having multiple file transfer threads,
 >allowing say ~5 transfers concurrently, would speed up the process
 >considerably. I expect that this would also make better use of the
 >available network bandwidth, as each transfer appears to have an
 >overhead for starting and completing the transfer,

So test it with two or three concurrent sessions.

 >which makes the effective transfer rate far less than the available
 >network bandwidth. This is the method one of our pieces of backup
 >software uses to speed up backups, and it is also implemented in
 >FileZilla for file transfers.
 >Consider a very large file that needs to be transferred, along with a
 >number of small files. In a single-transfer mode, all other files would
 >need to wait while the large file is transferred. If there are multiple
 >transfers happening concurrently, the smaller files will continue
 >transferring while the large file transfers. I have seen the benefits
 >of this sort of implementation in other software.
 >
 >I can also see benefits in having file transfers begin whilst rsync is
 >comparing files. This could logically work if you consider that rsync
 >makes a 'list' of files to be transferred and that it begins
 >transferring files as soon as this list begins to be populated. In
 >situations where there are a large number of files and few of them have
 >changed, the sync could effectively be completed by the time rsync is
 >finished comparing files (given the few changed files may have already
 >been transferred during the comparison). This is also effectively
 >implemented in FileZilla (consider copying a directory tree, where
 >FileZilla has to recurse into each directory and add each file to the
 >copy queue).
 >
 >Interestingly, I assumed this was already an option for rsync, so I
 >went looking for the necessary option. However, all I found were the
 >previously mentioned hacks, which weren't what I was going for.
 >


Re: [Bug 5124] Parallelize the rsync run using multiple threads and/or connections

2014-02-10 Thread Jason Haar
On 26/01/14 18:03, L.A. Walsh wrote:
> But multiple TCP connections are not used to load a single picture.
> They are used for separate items on the page.  A single TCP stream CAN
> be very fast and rsync isn't close to hitting that limit.
> The proof?  Using 1Gb connections, smb/cifs could get 125MB/s writes
> and 119MB/s reads -- the writes were at theoretical speeds and were
> faster, because the sender doesn't have to wait for the ACKs with a
> large window size.


A bit late but I'll add my 2c worth.

bbcp - a multi-TCP, multi-threaded application. It completely nails rsync
when transferring over high-bandwidth/high-latency links:

http://moo.nac.uci.edu/~hjm/HOWTO_move_data.html

Like BitTorrent, it establishes multiple TCP channels between a bbcp
client and server, and I guess it has a parent process that tells each
child what part of the directory structure/data stream it is responsible
for, then joins it all up at the other end.

I have tested rsync over a 100Mbps continental link and am lucky to get
10Mbps. Using bbcp with 4-6 channels, I can get 40-50Mbps (that's on a
link with other real traffic on it - so it may have actually got
80-90Mbps by itself for all I know).
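For comparison, a hypothetical bbcp invocation along those lines (the -s
stream-count and -w window-size flags are my recollection of bbcp's
options; check bbcp's own help before relying on them):

```
bbcp -P 2 -s 6 -w 2M /data/bigfile user@remote.example.org:/backup/
```

-s would set the number of parallel TCP streams, which is the knob that
gives bbcp its advantage on high-latency links.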

-- 
Cheers

Jason Haar
Information Security Manager, Trimble Navigation Ltd.
Phone: +1 408 481 8171
PGP Fingerprint: 7A2E 0407 C9A6 CAF6 2B9F 8422 C063 5EBB FE1D 66D1



Re: [Bug 5124] Parallelize the rsync run using multiple threads and/or connections

2014-01-28 Thread L.A. Walsh



Matthias Schniedermeyer wrote:
> On 25.01.2014 21:03, L.A. Walsh wrote:
>> If rsync already hitting over 200MB (even over 100MB)/s I'd agree that
>> using multiple TCP connections would help as they could be processed
>> by different cpu's on both ends.  But since it doesn't even hit
>> 100MB/s locally, the limit *isn't* TCP connections.
>
> Just FYI.
> Rsync 3.1 got much better in that regard.
> When i rsync a file locally from tmpfs to tmpfs rsync does that with
> about 560MB/s. Rsync 3.0.X managed to do less than half of that.

---

rsync --version
rsync  version 3.1.0  protocol version 31

Running an rsync that compares SRC with a DST specified by --compare-dest,
writing to an empty partition (i.e. just copying the differences between
SRC and the compare-dest to the empty partition), takes 45-90 minutes on
a 1TB partition. Usually that's:
 Home-2014.01.12-03.07.03 HnS -wi-ao---   1.04g
 Home-2014.01.14-03.07.07 HnS -wi-ao---   2.05g
 Home-2014.01.16-03.07.02 HnS -wi-ao---   1.42g
 Home-2014.01.18-03.07.03 HnS -wi-ao---   1.26g
 Home-2014.01.20-03.07.03 HnS -wi-ao---   2.30g
 Home-2014.01.21-03.07.03 HnS -wi-ao---   2.96g
 Home-2014.01.22-03.07.03 HnS -wi-ao---   1.57g
 Home-2014.01.23-03.07.03 HnS -wi-ao---   1.80g

i.e. 1-3g of differences per run.

3g/45 minutes = 3g/2700s ≈ 1.1MB/s -- not even close to 100MB/s.

It has to read the time & date stamps of quite a few files, but my best
local speed has been under 100MB/s.

What size files are you transferring, and how many (on average)?

My times are for my home partition (in case that wasn't obvious from the
partition names above... ;-))...

It has 4,986,955 files in 716132824K (~683G).
Lots of seeks, very little full-speed reading (on a RAID capable of up
to ~1GB/s).

So.. what size files and how much data are you transferring, and to/from
what type of disks?








Re: [Bug 5124] Parallelize the rsync run using multiple threads and/or connections

2014-01-28 Thread Matthias Schniedermeyer
On 28.01.2014 04:26, L.A. Walsh wrote:
> Matthias Schniedermeyer wrote:
>> On 25.01.2014 21:03, L.A. Walsh wrote:
>>> If rsync already hitting over 200MB (even over 100MB)/s I'd agree
>>> that using multiple TCP connections would help as they could be
>>> processed by different cpu's on both ends.  But since it doesn't
>>> even hit 100MB/s locally, the limit *isn't* TCP connections.
>>
>> Just FYI.
>> Rsync 3.1 got much better in that regard.
>> When i rsync a file locally from tmpfs to tmpfs rsync does that with
>> about 560MB/s. Rsync 3.0.X managed to do less than half of that.
>
> ---
>
> rsync --version
> rsync  version 3.1.0  protocol version 31
>
> Running an rsync that compares SRC with a DST specified by
> --compare-dest, writing to an empty partition (i.e. just copying the
> differences between SRC and the compare-dest to the empty partition),
> takes 45-90 minutes on a 1TB partition. Usually that's:
>  Home-2014.01.12-03.07.03 HnS -wi-ao---   1.04g
>  Home-2014.01.14-03.07.07 HnS -wi-ao---   2.05g
>  Home-2014.01.16-03.07.02 HnS -wi-ao---   1.42g
>  Home-2014.01.18-03.07.03 HnS -wi-ao---   1.26g
>  Home-2014.01.20-03.07.03 HnS -wi-ao---   2.30g
>  Home-2014.01.21-03.07.03 HnS -wi-ao---   2.96g
>  Home-2014.01.22-03.07.03 HnS -wi-ao---   1.57g
>  Home-2014.01.23-03.07.03 HnS -wi-ao---   1.80g
>
> i.e. 1-3g of differences per run.
>
> 3g/45 minutes = 3g/2700s ≈ 1.1MB/s -- not even close to 100MB/s.
>
> It has to read the time & date stamps of quite a few files, but my
> best local speed has been under 100MB/s.
>
> What size files are you transferring, and how many (on average)?
>
> My times are for my home partition (in case that wasn't obvious from
> the partition names above... ;-))...
>
> It has 4,986,955 files in 716132824K (~683G).
> Lots of seeks, very little full-speed reading (on a RAID capable of up
> to ~1GB/s).
>
> So.. what size files and how much data are you transferring, and
> to/from what type of disks?

For the number I used in that email, I transferred a 10GB file from a
tmpfs to the same tmpfs:

 cd /tmpfs
 dd if=/dev/zero of=zero bs=1M count=10k
 rsync -avP zero zero.2
sending incremental file list
zero
 10,737,418,240 100%  562.67MB/s0:00:18 (xfr#1, to-chk=0/1)

sent 10,740,039,772 bytes  received 35 bytes  580,542,692.27 bytes/sec
total size is 10,737,418,240  speedup is 1.00


As I personally don't have any hardware that can sustain such
performance, I can really only compare dry performance in a tmpfs
against the older rsync.

On my real hardware I only have SSDs, non-RAID HDDs and Gigabit
Ethernet; none of those can sustain the bandwidth rsync delivers, so my
copy operations mostly use all the available bandwidth, or are
seek/latency-limited when I transfer many small files. Even the SSDs I
own aren't that great in that regard (good at the read part, not so good
at the random-write part).

That you don't get good performance copying/synchronising nearly 5
million files doesn't surprise me at all; HDDs are really bad at that.
Even high-performance RAIDs with many spindles only reduce that problem,
they can't eliminate it.

I remember a whitepaper Intel released a few years ago comparing SSD
performance against a high-performance RAID (16 or 24 15k-RPM HDDs,
IIRC) in an Exchange e-mail scenario, so about as random as it gets.
AFAIR just one SSD had 80% of the seek performance of the entire
high-performance RAID system they benchmarked against, and the benchmark
used 3 SSDs, so the 3 SSDs came out something like 200% faster than the
high-performance RAID.
Copying ginormous numbers of small files is practically not much
different from random seeking (which is what the Exchange benchmark was
about).

What works against you is that rsync processes the file list in
alphabetical order. If the files were processed in creation order(*),
performance MIGHT be better, as that would reduce seeks, but AFAIK there
is really no way to do that with rsync. (Keeping track, via inotify, of
which files were created and in what order, and doing a separate copy of
the new files in creation order, might be a way of speeding up that
part; then an additional rsync run for the rest.)
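A rough sketch of that creation-order idea (my own illustration, not an
existing rsync feature): sort the file list by inode number as a cheap,
non-guaranteed proxy for creation order, then feed it to rsync via
--files-from=-.  GNU find's -printf is assumed, and the demo directory is
created on the fly:

```shell
#!/bin/bash
# Build a file list ordered by inode number instead of alphabetically.
SRC=$(mktemp -d)
touch "$SRC/zebra"     # created first
touch "$SRC/apple"     # created second
# %i = inode number, %P = path relative to the find root.
list=$(find "$SRC" -type f -printf '%i\t%P\n' | sort -n | cut -f2-)
echo "$list"
# That list would then drive the copy, e.g.:
#     echo "$list" | rsync -a --files-from=- "$SRC/" /dest/
```

On most filesystems freshly created files get increasing inode numbers,
so 'zebra' tends to sort before 'apple' here, though as the footnote
says that ordering is not guaranteed.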


In conclusion: sorry, I can't really help you with your problem.




*:
But that heavily depends on the workload.
And you would need a way to determine creation order, either CRTIME or
the inode number. CRTIME isn't supported by all filesystems, and inode
order is usually not GUARANTEED to be deterministic, although it might
be.

With that said, this could work in a scenario where files are created
with their full contents and not changed afterwards. If files are
changed over time, it would only help for the newly created files, not
for the changed ones.



-- 

Matthias


Re: [Bug 5124] Parallelize the rsync run using multiple threads and/or connections

2014-01-26 Thread Matthias Schniedermeyer
On 25.01.2014 21:03, L.A. Walsh wrote:
> If rsync already hitting over 200MB (even over 100MB)/s I'd agree that
> using multiple TCP connections would help as they could be processed by
> different cpu's on both ends.  But since it doesn't even hit 100MB/s
> locally, the limit *isn't* TCP connections.

Just FYI.
Rsync 3.1 got much better in that regard.
When I rsync a file locally from tmpfs to tmpfs, rsync does that with
about 560MB/s. Rsync 3.0.X managed less than half of that.




-- 

Matthias


Re: [Bug 5124] Parallelize the rsync run using multiple threads and/or connections

2014-01-25 Thread L.A. Walsh

samba-b...@samba.org wrote:

https://bugzilla.samba.org/show_bug.cgi?id=5124

> --- Comment #5 from Andrew J. Kroll fo...@dr.ea.ms 2014-01-19 03:10:35 UTC ---
> Another proven case is your typical modern web browser.
> There is a very good reason why multiple connections are used to load
> in those pretty pictures you see. It is all about getting around the
> latency by using TCP as a double buffer.

---
But multiple TCP connections are not used to load a single picture.
They are used for separate items on the page.  A single TCP stream CAN
be very fast, and rsync isn't close to hitting that limit.

The proof?  Using 1Gb connections, smb/cifs could get 125MB/s writes and
119MB/s reads -- the writes were at theoretical speed, and were faster
because the sender doesn't have to wait for the ACKs with a large window
size.

Using a 10Gb connection (or two of them, using link aggregation to
deliver up to 20Gb), I'm limited to 400-550MB/s -- I don't get close to
the theoretical maximum.  The bottleneck is the *single* TCP connection
that is allowed per user between client and server.  One TCP connection
is limited by how fast the CPUs can process and move the data in memory.
Depending on the transfer size and direction, I can move the CPU
bottleneck between client and server, but no matter what, it hits the
CPU limit for processing one TCP connection on one end or the other.

If rsync were already hitting over 200MB/s (or even over 100MB/s), I'd
agree that using multiple TCP connections would help, as they could be
processed by different CPUs on both ends.  But since it doesn't even hit
100MB/s locally, the limit *isn't* TCP connections.

Note -- I could probably get faster speeds if CIFS weren't limited to
64K packet sizes, but that's another limitation.

One TCP connection could go faster if I had faster CPUs, but that's not
likely to happen soon.  TCP isn't the limit.

