[gpfsug-discuss] Tuning AFM for high throughput/high IO over _really_ long

Jake Carroll Wed, 09 Nov 2016 10:33:45 -0800

Scott,

Nah, very much pure AFM to AFM here, hence we are a little surprised. Last time 
we did this over a longish link we almost caused an outage with the ease with 
which we attained throughput - but maybe we are hitting some magic tolerances in 
latency and in-flight IO semantics that SS/GPFS/AFM is not well tweaked for 
(yet...).


Yes - we are catching up at SC. I think it's all been arranged? We are also 
talking to one of your resources about this AFM throughput behaviour this 
afternoon. John I believe his name is?

Anyway - if you've got any ideas, am all ears!
> 
> 
> Today's Topics:
> 
>   1. Re: Tuning AFM for high throughput/high IO over _really_ long
>      distances (Scott Fadden)
>   2. Re: Tuning AFM for high throughput/high IO over _really_ long
>      distances (Jan-Frode Myklebust) (Jake Carroll)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Wed, 9 Nov 2016 10:08:42 -0800
> From: "Scott Fadden" <[email protected]>
> To: gpfsug main discussion list <[email protected]>
> Subject: Re: [gpfsug-discuss] Tuning AFM for high throughput/high IO
>    over _really_ long distances
> Message-ID:
>    
> <of438b4e0a.3df65f65-on88258066.006179ea-88258066.0063a...@notes.na.collabserv.com>
>    
> Content-Type: text/plain; charset="utf-8"
> 
> Jake,
> 
> If AFM is using NFS it is all about NFS tuning. The copy from one side to 
> the other is basically just a client writing to an NFS mount. There are a 
> few things you can look at:
> 1. NFS transfer size (make it 1MiB, I think that is the max)
> 2. TCP tuning for large window size. This is discussed under "Tuning active 
> file management home communications" in the docs. On that page you will 
> also find some discussion of increasing gateway threads and similar 
> things that may help.
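As a rough illustration of why window size matters at this distance: one TCP stream cannot move more than one window per round trip. The sketch below applies that rule to the 180ms round trip reported in the original post. The helper name `max_stream_mbit` is made up for this note, and the result is an upper bound that ignores loss and protocol overhead.

```shell
# Rough ceiling for one TCP stream: throughput <= window / RTT.
# max_stream_mbit is a hypothetical helper, not a GPFS or kernel tool.
max_stream_mbit() {
    local window_bytes=$1 rtt_ms=$2
    # bytes * 8 = bits per round trip; dividing by (rtt_ms * 1000) gives Mbit/s
    echo $(( window_bytes * 8 / (rtt_ms * 1000) ))
}

max_stream_mbit 87380 180      # default receive buffer from tcp_rmem
max_stream_mbit 1048576 180    # a 1 MiB window
max_stream_mbit 16777216 180   # a 16 MiB window
```

Integer division rounds down, so read the results as approximate. The point is the order of magnitude: a connection that never grows past a small window stays far below line rate at 180ms, whatever the link can carry.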
> 
> We can discuss further as I understand we will be meeting at SC16.
> 
> Scott Fadden
> Spectrum Scale - Technical Marketing 
> Phone: (503) 880-5833 
> [email protected]
> http://www.ibm.com/systems/storage/spectrum/scale
> 
> 
> 
> From:   Jake Carroll <[email protected]>
> To:     "[email protected]" 
> <[email protected]>
> Date:   11/09/2016 09:39 AM
> Subject:        [gpfsug-discuss] Tuning AFM for high throughput/high IO 
> over _really_ long distances
> Sent by:        [email protected]
> 
> 
> 
> Hi.
> 
> I've got a GPFS to GPFS AFM cache/home (IW) relationship set up over a 
> really long distance. About 180ms of latency between the two clusters and 
> around 13,000km of optical path. Fortunately for me, I've actually got 
> near theoretical maximum IO over the NICs between the clusters and I'm 
> iPerf'ing at around 8.90 to 9.2Gbit/sec over a 10GbE circuit. All MTU9000 
> all the way through.
> 
> Anyway - I'm finding my AFM traffic to be dragging its feet and I don't 
> really understand why that might be. I've verified the link's and 
> transport's ability, as I said above, with iPerf and CERN's FDT to near 
> 10Gbit/sec. 
> 
> I also verified the clusters on both sides in terms of disk IO and they 
> both seem easily capable in IOZone and IOR tests of multiple GB/sec of 
> throughput.
> 
> So - my questions:
> 
> 1. Are there very specific tunings AFM needs for high latency/long 
> distance IO? 
> 2. Are there very specific NIC/TCP-stack tunings (beyond the type of 
> thing we already have in place) that benefit AFM over really long 
> distances and high latency?
> 3. We are seeing really lazy/sticky 'ls -als' on the 'cache' side in 
> the home mount. It sometimes takes 20 to 30 seconds before the command 
> line will report back with a long listing of files. Any ideas why it'd 
> take that long to get a response from 'home'?
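One way to put the 20 to 30 second listing delay in context is to convert it into round trips at 180ms. The helper name `round_trips` is hypothetical. Whether AFM actually makes one serialized trip to home per directory entry or attribute check is an assumption here; nothing in the thread confirms it.

```shell
# How many sequential round trips fit into a given wait at a given RTT.
# round_trips is a hypothetical helper for back-of-envelope estimates only.
round_trips() {
    local wait_s=$1 rtt_ms=$2
    echo $(( wait_s * 1000 / rtt_ms ))
}

round_trips 20 180   # lower end of the reported delay
round_trips 30 180   # upper end of the reported delay
```

If the assumption holds, a directory needing a hundred or so serialized lookups against home would already land in the reported range, with no bandwidth problem involved at all.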
> 
> We've got our TCP stack set up fairly aggressively on all hosts that 
> participate in these two clusters.
> 
> ethtool -C enp2s0f0 adaptive-rx off
> ifconfig enp2s0f0 txqueuelen 10000
> sysctl -w net.core.rmem_max=536870912
> sysctl -w net.core.wmem_max=536870912
> sysctl -w net.ipv4.tcp_rmem="4096 87380 268435456"
> sysctl -w net.ipv4.tcp_wmem="4096 65536 268435456"
> sysctl -w net.core.netdev_max_backlog=250000
> sysctl -w net.ipv4.tcp_congestion_control=htcp
> sysctl -w net.ipv4.tcp_mtu_probing=1
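To sanity-check these buffer sizes against the link, it may help to work out the bandwidth-delay product, that is, how many bytes have to be in flight to keep the pipe full. The sketch below uses the 180ms latency and the 10GbE and iPerf figures from this post. The helper name `bdp_bytes` is made up for illustration.

```shell
# Bandwidth-delay product: bytes that must be in flight to keep the link full.
# bdp_bytes is a hypothetical helper; inputs are link rate in Mbit/s and RTT in ms.
bdp_bytes() {
    local rate_mbit=$1 rtt_ms=$2
    # Mbit/s * ms = kilobits; * 1000 / 8 converts to bytes, i.e. multiply by 125
    echo $(( rate_mbit * rtt_ms * 125 ))
}

bdp_bytes 10000 180   # full 10GbE line rate
bdp_bytes 9200 180    # the upper iPerf figure quoted above
```

Both results come in under the 268435456 byte maximum set in tcp_rmem and tcp_wmem, so the buffer ceilings alone do not obviously explain the slow AFM traffic. One caveat: the kernel reserves part of each socket buffer for overhead (see tcp_adv_win_scale), so the usable window may be smaller than the configured maximum.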
> 
> I modified a couple of small things on the AFM 'cache' side to see if it'd 
> make a difference, such as:
> 
> mmchconfig afmNumWriteThreads=4
> mmchconfig afmNumReadThreads=4
> 
> But no difference so far.
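A possible reading of why four threads change little, sketched under a strong assumption: each thread keeps exactly one 1 MiB request outstanding and waits a full 180ms round trip before sending the next. The thread does not confirm that AFM threads work this way. Both helper names are hypothetical, and 225000000 is simply 10 Gbit/sec times 180ms, divided by 8.

```shell
# Concurrent requests needed to keep pipe_bytes in flight when each request
# carries request_bytes and waits one full round trip before the next.
# requests_in_flight and sync_mbit are hypothetical helpers, not AFM tunables.
requests_in_flight() {
    local pipe_bytes=$1 request_bytes=$2
    # integer ceiling division
    echo $(( (pipe_bytes + request_bytes - 1) / request_bytes ))
}

# Throughput in Mbit/s for a given number of such synchronous requests.
sync_mbit() {
    local threads=$1 request_bytes=$2 rtt_ms=$3
    echo $(( threads * request_bytes * 8 / (rtt_ms * 1000) ))
}

requests_in_flight 225000000 1048576   # 10 Gbit/sec * 180ms, 1 MiB requests
sync_mbit 4 1048576 180                # four threads, one 1 MiB request each
```

Under that assumption, four threads would top out at a few hundred Mbit/sec, and filling the link would take a couple of hundred requests in flight. If AFM pipelines requests differently, the numbers do not apply, but the same arithmetic works for whatever request size and concurrency it really uses.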
> 
> Thoughts would be appreciated. I've done this before over much shorter 
> distances (30km) and I've flattened a 10GbE wire without really 
> tuning...anything. Are my large in-flight-packets 
> numbers/long-time-to-acknowledgement semantics going to hurt here? I 
> really thought AFM might be well designed for exactly this kind of work at 
> long distance *and* high throughput - so I must be missing something!
> 
> -jc
> 
> 
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> 
> 
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Wed, 9 Nov 2016 18:09:14 +0000
> From: Jake Carroll <[email protected]>
> To: "[email protected]"
>    <[email protected]>
> Subject: Re: [gpfsug-discuss] Tuning AFM for high throughput/high IO
>    over _really_ long distances (Jan-Frode Myklebust)
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset="utf-8"
> 
> Hi jf...
> 
> 
>> Mostly curious, don't have experience in such environments, but ... Is this
>> AFM over NFS or NSD protocol? Might be interesting to try the other option
>> -- and also check how nsdperf performs over such distance/latency.
> 
> As it turns out, it seems, very few people do. 
> 
> I will test nsdperf over it and see how it performs. And yes, it is AFM -> 
> AFM. No NFS involved here!
> 
> -jc
> 
> 
> 
>    ------------------------------
> 
> 
>    Message: 3
>    Date: Wed, 09 Nov 2016 18:05:21 +0000
>    From: Jan-Frode Myklebust <[email protected]>
>    To: "[email protected]"
>        <[email protected]>
>    Subject: Re: [gpfsug-discuss] Tuning AFM for high throughput/high IO
>        over _really_ long distances
>    Message-ID:
>        <CAHwPathy=4z=jDXN5qa3ys+Z-_7n=tsjh7cz3zkzfwqmg34...@mail.gmail.com>
>    Content-Type: text/plain; charset="utf-8"
> 
>    Mostly curious, don't have experience in such environments, but ... Is this
>    AFM over NFS or NSD protocol? Might be interesting to try the other option
>    -- and also check how nsdperf performs over such distance/latency.
> 
> 
> 
>    -jf
> 
>    ------------------------------
> 
> 
>    End of gpfsug-discuss Digest, Vol 58, Issue 12
>    **********************************************
> 
> 
> 
> ------------------------------
> 
> 
> End of gpfsug-discuss Digest, Vol 58, Issue 13
> **********************************************
