Assuming your 40TB is spread across many files, the main thing I'd play 
with is the number of threads used in the copy (Robocopy has a /MT:n 
option which defaults to 8 but can be set as high as 128, per 
https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/robocopy).
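
If it helps, a minimal sketch of what that invocation might look like 
(the paths are placeholders, and the /R and /W retry settings are just a 
suggestion; /MT:128 is the documented maximum):

    # Hypothetical paths; driving robocopy with the maximum thread count.
    import subprocess
    subprocess.run([
        "robocopy", r"D:\scans", r"\\va-storage\images",
        "/E",            # copy subdirectories, including empty ones
        "/MT:128",       # documented maximum for multithreaded copies
        "/R:2", "/W:5",  # bounded retries instead of the huge defaults
        "/LOG:copy.log", # per-run log file
    ])  # note: robocopy exit codes below 8 indicate success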
 
On the assumption that each thread uses blocking APIs on a TCP connection, 
using the highest number of threads (128) will let you keep the throughput 
and TCP window size of each TCP connection closer to the range that mere 
mortals work at, and still be able to saturate this cool 10Gbps 
high-latency link.

To saturate the 10Gbps link, your 128 connections will average ~80Mbps per 
connection. With e.g. a ~50msec actual RTT, each connection will need at 
least a ~512KB send buffer (and probably 2-3x that, since you'll see 
variance across connections and some will need to run faster than the 
average). On older OSs this could be a challenge (and require tweaking the 
default send and receive buffer limits), but it should not be a problem as 
long as RFC 1323 TCP window auto-scaling is supported and enabled on both 
sides. I would like to assume that any Windows and CentOS setup with a 
10Gbps NIC has TCP window scaling on by default, and that all network 
elements on the path (including e.g. firewalls) pass the RFC 1323 TCP 
Window Scale option, but assumptions like that could be a bit presumptuous. 
Just in case, there is some useful discussion here 
<https://blogs.technet.microsoft.com/netgeeks/2018/05/21/a-word-about-autotuninglevel-tcp-receive-auto-tuning-level-explained/>
(on Windows, "netsh interface tcp show global" shows the current 
auto-tuning level).
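
As a sanity check on those numbers, here is a minimal back-of-the-envelope 
sketch in Python; the 10Gbps / 128-connection / 50msec figures are just the 
assumptions from above, not measurements:

    # Per-connection rate and the TCP window (bandwidth-delay product)
    # each connection needs in order to sustain it.
    link_bps = 10e9       # assumed 10Gbps link
    connections = 128     # robocopy /MT:128, one TCP connection per thread
    rtt_sec = 0.050       # assumed ~50msec round-trip time

    per_conn_bps = link_bps / connections       # ~78 Mbps per connection
    bdp_bytes = per_conn_bps * rtt_sec / 8      # bandwidth-delay product
    print(f"per connection:   {per_conn_bps / 1e6:.0f} Mbps")
    print(f"min send window:  {bdp_bytes / 1024:.0f} KB")       # ~477 KB
    print(f"with 2-3x margin: {2 * bdp_bytes / 1024:.0f}-"
          f"{3 * bdp_bytes / 1024:.0f} KB")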

One other obvious question to ask, since (as noted by Thomas) your average 
transfer speed will need to be at least 3.7Gbps in order to complete 40TB 
in each 24hr period, is whether anyone else is using that nice fat 10Gbps 
link... Is the link dedicated to you? You'll be using well over 30% of its 
capacity, and it only takes two or three other users wanting to do the 
same for the math to never work out...
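
To put numbers on that daily requirement, and on Todd's point below about 
loss, here is a rough sketch; the 1460-byte MSS and the Mathis et al. 
approximation (rate ~ 1.22 * MSS / (RTT * sqrt(loss))) are textbook 
assumptions on my part, not anything measured on your link:

    # Average rate required to move 40TB in 24 hours.
    bytes_per_day = 40e12
    required_bps = bytes_per_day * 8 / 86400
    print(f"required average: {required_bps / 1e9:.2f} Gbps")  # ~3.70 Gbps

    # Mathis et al. approximation: rate ~ 1.22 * MSS / (RTT * sqrt(loss)).
    # Solve for the loss rate that still allows ~80Mbps on one connection.
    mss_bits = 1460 * 8   # assumed typical Ethernet MSS
    rtt_sec = 0.050       # assumed ~50msec round-trip time
    target_bps = 80e6     # per-connection rate from the math above
    max_loss = (1.22 * mss_bits / (rtt_sec * target_bps)) ** 2
    print(f"max tolerable loss: {max_loss:.6%}")  # on the order of 0.001%

In other words, even a tiny fraction of a percent of loss is enough to 
drag a single Reno-style connection below the per-connection rate you 
need, which is part of why running many connections in parallel helps in 
practice.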

On Thursday, September 13, 2018 at 10:15:23 PM UTC+2, Jay Askren wrote:
>
> Todd Montgomery,
>
> Correct, robocopy uses TCP.  We have a 10 Gbps terrestrial line and are 
> working on getting a second line for redundancy.
>
>
> Thomas, 
> Thanks for the link.  I will look at that page.
>
> Jay
>
>
>
> On Thursday, September 13, 2018 at 11:04:54 AM UTC-6, Todd L. Montgomery 
> wrote:
>>
>> Hi Jay.
>>
>> Going to assume Robocopy uses TCP....
>>
>> As you had no real issues without a WAN, I would assume the TCP window 
>> sizes, etc. are all fine for the rates you need.
>>
>> Latency will play a role, but loss is likely the more impactful factor, 
>> as congestion control will be more of a throttle than flow control. With 
>> TCP at a low loss rate, throughput scales inversely with RTT: as RTT 
>> goes up, throughput goes down, but the relationship is linear. With 
>> loss, even low loss, throughput scales with 1/sqrt(loss rate). After 
>> about 5% loss, TCP-Reno goes into stop-and-wait: 1 MSS per RTT. That 
>> scaling is non-linear, and even in the < 5% loss-rate range it is really 
>> painful for throughput.
>>
>> In short, loss will slow WANs down quite a lot. Latency will also have 
>> an impact, though with less potential for damage.
>>
>> Running multiple TCP connections over the same path will mean that they 
>> will fight with one another via congestion control trying to find a 
>> fairness point that jumps around and can end up underutilizing the 
>> bandwidth at times. This is where things like TCP BBR can be helpful. But 
>> still, loss will cause quite a slow down.
>>
>> What can you do? Well, it depends on what your links between the areas 
>> actually are..... terrestrial vs. satellite, etc. Lots of options.
>>
>> On Thu, Sep 13, 2018 at 9:41 AM Jay Askren <jay.a...@gmail.com> wrote:
>>
>>> We need to push 40 TB of images per day from our scanning department in 
>>> Utah to our storage servers in Virginia and then we download about 4 TB of 
>>> processed images per day back to Utah.  In our previous process we had no 
>>> problem getting the throughput we needed by using Robocopy which comes with 
>>> Windows, but our old storage servers were here in Utah.  We can get 
>>> Robocopy to work across the WAN but we have to run 3 or 4 Robocopy 
>>> processes under different Windows users which is somewhat fragile and feels 
>>> like a bad hack.  The files here in Utah are on a Windows server because of 
>>> the proprietary software needed to run the scanner.  All of our servers in 
>>> Virginia run CentOS.
>>>
>>> Any thoughts on how to transfer files over long distance and still get 
>>> high throughput?  I believe the issue we are running into is high latency.
>>>
>>>
>>
