Re: [Bug 5124] Parallelize the rsync run using multiple threads and/or connections
Obviously self-promotional, but Parsyncfp? https://github.com/hjmangalam/parsyncfp There are other similar scripted parallel rsyncs, but I like this one. ;)

Hjm

On Mon, Apr 17, 2023, 3:42 AM "just subscribed for rsync-qa from bugzilla via rsync" wrote:

> https://bugzilla.samba.org/show_bug.cgi?id=5124
>
> --- Comment #12 from Paulo Marques ---
> Using multiple connections also helps when you have LACP network links,
> which are relatively common in data center setups to provide both
> redundancy and increased bandwidth.
>
> If you have two aggregated 1Gbps links, rsync can only use 1Gbps, but it
> could use 2Gbps if it made several connections from different TCP ports.
>
> --
> You are receiving this mail because:
> You are the QA Contact for the bug.

--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
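For what it's worth, the scripted-parallel-rsync idea is easy to sketch in plain shell. A minimal, hypothetical example (function name and paths are mine, not from parsyncfp): start one rsync per top-level subdirectory, a few at a time, so each transfer gets its own TCP connection -- which is also what lets an LACP bond hash the streams onto different links.

```shell
#!/bin/sh
# parallel_rsync SRC DEST [MAXJOBS]
# Run one rsync per top-level subdirectory of SRC, MAXJOBS at a time.
# Note: top-level plain files in SRC are ignored in this sketch.
parallel_rsync() {
    src=$1 dest=$2 jobs=${3:-4}
    mkdir -p "$dest"
    # -print0/-0 keeps unusual filenames safe; -P runs rsyncs in parallel.
    find "$src" -mindepth 1 -maxdepth 1 -type d -print0 |
        xargs -0 -P "$jobs" -I{} rsync -a {} "$dest/"
}
```

Usage would be something like `parallel_rsync /data/src /data/dest 4`.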
RE: [Bug 5124] Parallelize the rsync run using multiple threads and/or connections
I made a bash script that does this in parallel: it checks how many rsyncs are running and then starts another concurrent one. My parallel sessions are against different servers; I doubt it would make any sense to run multiple sessions between the same two hosts. My single rsync session was already limited by the host's IOPS, so two threads would each run at half speed. IMO rsync does what it needs to do; if you want it to run in parallel, execute it in parallel.

> --- Comment #8 from Michael ---
> +1 from me on this.
>
> We have several situations where we need to copy a large number of very
> small files, and I expect that having multiple file transfer threads,
> allowing say ~5 transfers concurrently, would speed up the process
> considerably. I expect that this would also make better use of the
> available network bandwidth, as each transfer appears to have an overhead
> for starting and completing the transfer

So test it with two or three concurrent sessions.

> which makes the effective transfer rate far less than the available
> network bandwidth. This is the method one of our pieces of backup software
> uses to speed up backups, and it is also implemented in FileZilla for file
> transfers. Consider a very large file that needs to be transferred, along
> with a number of small files. In a single-transfer mode, all other files
> would need to wait while the large file is transferred. If there are
> multiple transfers happening concurrently, the smaller files will continue
> transferring while the large file transfers. I have seen the benefits of
> this sort of implementation in other software.
>
> I can also see benefits in having file transfers begin whilst rsync is
> comparing files. This could logically work if you consider that rsync makes
> a 'list' of files to be transferred and that it begins transferring files
> as soon as this list begins to be populated.
> In situations where there are a large number of files and few of these
> files changed, the sync could effectively be completed by the time rsync
> is finished comparing files (given the few changed files may have already
> been transferred during the file comparison). This is also effectively
> implemented in FileZilla (consider copying a directory, in which FileZilla
> has to recurse into each directory and add each file to copy into the
> queue).
>
> Interestingly, I assumed this was already an option for rsync, so I went
> looking to find the necessary option. However, all I found were the
> previously mentioned hacks, which weren't what I was going for.
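The "check how many rsyncs are running, then start another" loop described above can be sketched roughly like this. A hedged sketch: the helper name, server names, and paths are invented for illustration, not taken from the poster's actual script.

```shell
#!/bin/sh
# wait_for_slots NAME MAX: block until fewer than MAX processes named
# NAME are running, so the caller can cap its concurrency.
wait_for_slots() {
    # pgrep -c prints the number of matching processes (0 if none);
    # -x requires an exact name match.
    while [ "$(pgrep -cx "$1")" -ge "$2" ]; do
        sleep 1
    done
}

# Hypothetical usage: one rsync session per server, at most 3 at a time.
# for host in server1 server2 server3 server4; do
#     wait_for_slots rsync 3
#     rsync -a "/srv/data/$host/" "$host:/srv/backup/" &
# done
# wait
```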
Re: [Bug 5124] Parallelize the rsync run using multiple threads and/or connections
On 26/01/14 18:03, L.A. Walsh wrote:
> But multiple TCP connections are not used to load a single picture. They
> are used for separate items on the page. A single TCP stream CAN be very
> fast and rsync isn't close to hitting that limit. The proof? Using 1Gb
> connections, smb/cifs could get 125MB/s writes and 119MB/s reads -- the
> writes were at theoretical speeds and were faster, because the sender
> doesn't have to wait for the ACKs with a large window size.

A bit late, but I'll add my 2c worth.

bbcp -- a multi-TCP/threaded application -- completely nails rsync when transferring over high-bandwidth/high-latency links: http://moo.nac.uci.edu/~hjm/HOWTO_move_data.html

Like BitTorrent, it establishes multiple TCP channels between a bbcp client and server, and I guess has a parent process that tells each child what part of the directory structure/data stream it is responsible for, and joins it all up at the other end.

I have tested rsync over a 100Mb/s continental link and am lucky to get 10Mb/s. Using bbcp with 4-6 channels, I can get 40-50Mb/s (that's on a link with other real traffic on it, so it may have actually got 80-90Mb/s by itself for all I know).

--
Cheers
Jason Haar
Information Security Manager, Trimble Navigation Ltd.
Phone: +1 408 481 8171
PGP Fingerprint: 7A2E 0407 C9A6 CAF6 2B9F 8422 C063 5EBB FE1D 66D1
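For readers who want to try this: a bbcp invocation looks something like the following. This is a hypothetical example from memory -- the host and paths are placeholders, and the flags (-s for the number of parallel TCP streams, -P for a progress interval in seconds) should be checked against `bbcp -h` before relying on them.

```shell
# 6 parallel TCP streams, progress report every 10 seconds
bbcp -s 6 -P 10 /data/bigfile remote.example.org:/data/
```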
Re: [Bug 5124] Parallelize the rsync run using multiple threads and/or connections
Matthias Schniedermeyer wrote:
> On 25.01.2014 21:03, L.A. Walsh wrote:
>> If rsync were already hitting over 200MB/s (or even 100MB/s) I'd agree
>> that using multiple TCP connections would help, as they could be
>> processed by different CPUs on both ends. But since it doesn't even hit
>> 100MB/s locally, the limit *isn't* TCP connections.
>
> Just FYI. Rsync 3.1 got much better in that regard. When I rsync a file
> locally from tmpfs to tmpfs, rsync does that with about 560MB/s. Rsync
> 3.0.X managed to do less than half of that.

---
rsync --version
rsync version 3.1.0 protocol version 31

Running an rsync that compares SRC with a DST specified by --compare-dest, writing to an empty partition (i.e. just copying the differences between SRC and the cmp-dest to the empty partition), takes 45-90 minutes on a 1TB partition. Usually that's:

Home-2014.01.12-03.07.03 HnS -wi-ao--- 1.04g
Home-2014.01.14-03.07.07 HnS -wi-ao--- 2.05g
Home-2014.01.16-03.07.02 HnS -wi-ao--- 1.42g
Home-2014.01.18-03.07.03 HnS -wi-ao--- 1.26g
Home-2014.01.20-03.07.03 HnS -wi-ao--- 2.30g
Home-2014.01.21-03.07.03 HnS -wi-ao--- 2.96g
Home-2014.01.22-03.07.03 HnS -wi-ao--- 1.57g
Home-2014.01.23-03.07.03 HnS -wi-ao--- 1.80g

1-3g in length. 3g/45minutes = 3g/2700s = ~1.1MB/s -- not even close to 100MB/s. It has to read the timestamps of quite a few files, but my best local speed has been under 100MB/s.

What size files are you transferring, and how many (on average)? My times are for my home partition (in case that wasn't obvious from the partition names above... ;-))... It has 4,986,955 files in 716,132,824K (~683G). Lots of seeks, very little full-speed reading (the RAID can do up to ~1GB/s).

So.. what size files and how much info are you transferring, and to/from what type of disks?
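The backup scheme described above -- copy only the differences between SRC and a previous snapshot into an empty destination -- maps onto rsync's --compare-dest option. A minimal sketch (the function name and paths are hypothetical; note --compare-dest must be an absolute path or one relative to the destination):

```shell
#!/bin/sh
# compare_dest_sync SRC BASE DEST
# Copy into DEST only those files under SRC that differ from BASE;
# files that are unchanged relative to BASE are skipped entirely.
compare_dest_sync() {
    src=$1 base=$2 dest=$3
    mkdir -p "$dest"
    rsync -a --compare-dest="$base" "$src/" "$dest/"
}
```

Usage would be something like `compare_dest_sync /home /backups/Home-previous /backups/Home-new`.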
Re: [Bug 5124] Parallelize the rsync run using multiple threads and/or connections
On 28.01.2014 04:26, L.A. Walsh wrote:
> [snip]
> So.. what size files and how much info are you transferring and to/from
> what type of disks?

For the numbers I used in the email, I transferred a 10GB file from a tmpfs to the same tmpfs.
cd /tmpfs
dd if=/dev/zero of=zero bs=1M count=10k
rsync -avP zero zero.2
sending incremental file list
zero
 10,737,418,240 100% 562.67MB/s 0:00:18 (xfr#1, to-chk=0/1)

sent 10,740,039,772 bytes  received 35 bytes  580,542,692.27 bytes/sec
total size is 10,737,418,240  speedup is 1.00

As I personally don't have any hardware that can sustain such performance, I can really only compare dry performance in a tmpfs against the older rsync. On my real hardware I only have SSDs, non-RAID HDDs and Gigabit Ethernet; none of those can sustain the bandwidth rsync delivers, so my copy operations mostly use all available bandwidth, or are seek/latency limited when I transfer many small files. Even the SSDs I own aren't that great in that regard (good at the read part, not so good at the random-write part).

That you don't get good performance copying/synchronising nearly 5 million files doesn't surprise me at all; HDDs are really bad at that. Even high-performance RAIDs with many spindles only reduce that problem, they can't eliminate it. I remember a whitepaper Intel released a few years ago comparing SSD performance against a high-performance RAID (16 or 24 15k RPM HDDs, IIRC) in an Exchange e-mail scenario, so about as random as it gets. AFAIR just 1 SSD had 80% of the seek performance of the entire high-performance RAID system they benchmarked it against, and the benchmark used 3 SSDs, so it was something like 200% better performance for the 3 SSDs vs. the high-performance RAID. Copying ginormous amounts of small files is practically not much different from random seeking (which is what the Exchange benchmark was about).

What works against you is that rsync processes the file list in alphabetical order. If the files were processed in creation order(*), performance MIGHT be better as that would reduce seeks, but AFAIK there is really no way to do that with rsync.
(Keeping track of which files were created, and in what order, via inotify, and doing a separate copy of the new files in creation order, might be a way of speeding up that part -- with an additional rsync run for the rest.)

In conclusion: sorry, I can't really help you with your problem.

*: But that heavily depends on workload. And you would need a way to determine creation order, either CRTIME or the inode number. CRTIME isn't supported by all filesystems; inode numbers are usually not GUARANTEED to be deterministic, although they might be. With that said, this could work in a scenario where files are created with full contents and not changed after that. If files are changed over time, it would only help for newly created files, not for changed files.

--
Matthias
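One way to approximate the creation-order idea without inotify is to sort the file list by inode number and feed it to rsync via --files-from -- with the footnote's caveats: inode order only tends to correlate with creation/on-disk order, nothing guarantees it. A hedged sketch with an invented function name, relying on GNU find's -printf:

```shell
#!/bin/sh
# inode_order_sync SRC DEST
# Transfer files in ascending inode-number order instead of rsync's
# default alphabetical order, which MAY reduce seeking on HDDs.
inode_order_sync() {
    src=$1 dest=$2
    mkdir -p "$dest"
    # GNU find: %i = inode number, %P = path relative to SRC.
    ( cd "$src" && find . -type f -printf '%i %P\n' ) |
        sort -n | cut -d' ' -f2- |
        rsync -a --files-from=- "$src" "$dest"
}
```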
Re: [Bug 5124] Parallelize the rsync run using multiple threads and/or connections
On 25.01.2014 21:03, L.A. Walsh wrote:
> If rsync were already hitting over 200MB/s (or even 100MB/s) I'd agree
> that using multiple TCP connections would help, as they could be
> processed by different CPUs on both ends. But since it doesn't even hit
> 100MB/s locally, the limit *isn't* TCP connections.

Just FYI. Rsync 3.1 got much better in that regard. When I rsync a file locally from tmpfs to tmpfs, rsync does that with about 560MB/s. Rsync 3.0.X managed to do less than half of that.

--
Matthias
Re: [Bug 5124] Parallelize the rsync run using multiple threads and/or connections
samba-b...@samba.org wrote:
> https://bugzilla.samba.org/show_bug.cgi?id=5124
>
> --- Comment #5 from Andrew J. Kroll fo...@dr.ea.ms 2014-01-19 03:10:35 UTC ---
> Another proven case is your typical modern web browser. There is a very
> good reason why multiple connections are used to load in those pretty
> pictures you see. It is all about getting around the latency by using TCP
> as a double buffer.

But multiple TCP connections are not used to load a single picture. They are used for separate items on the page. A single TCP stream CAN be very fast, and rsync isn't close to hitting that limit.

The proof? Using 1Gb connections, smb/cifs could get 125MB/s writes and 119MB/s reads -- the writes were at theoretical speeds and were faster, because the sender doesn't have to wait for the ACKs with a large window size. Using a 10Gb connection (or 2 of them using link aggregation to deliver up to 20Gb), I'm limited to 400-550MB/s -- I don't get close to the theoretical maximums. The bottleneck is the *single* TCP connection that is allowed per user between client and server. One TCP connection is limited by how fast the CPUs can process and move the data in memory. Depending on the transfer size and direction, I can move the CPU bottleneck between client and server, but no matter what, it hits the CPU limit for processing one TCP connection on one end or the other.

If rsync were already hitting over 200MB/s (or even 100MB/s) I'd agree that using multiple TCP connections would help, as they could be processed by different CPUs on both ends. But since it doesn't even hit 100MB/s locally, the limit *isn't* TCP connections.

Note -- I could probably get faster speeds if CIFS weren't limited to 64K packet sizes, but that's another limitation. One TCP connection could go faster if I had faster CPUs, but that's not likely to happen soon. TCP isn't the limit.
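A practical way to test this claim on your own link is to compare single-stream and multi-stream throughput with iperf3, whose -P flag sets the number of parallel client streams. A hypothetical helper (name and port are mine): if four streams together go much faster than one, parallel connections would help a transfer tool on that path; if not, the bottleneck is elsewhere.

```shell
#!/bin/sh
# link_stream_test HOST [PORT]: run iperf3 against HOST once with a
# single TCP stream and once with four parallel streams (-P 4), so the
# two throughput figures can be compared.
link_stream_test() {
    host=$1 port=${2:-5201}
    iperf3 -c "$host" -p "$port" -P 1 -t 2
    iperf3 -c "$host" -p "$port" -P 4 -t 2
}
```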