Scott,

Nah, this is very much pure AFM to AFM here, hence we are a little surprised. Last time we did this over a longish link we almost caused an outage with the ease with which we attained throughput - but maybe we are hitting some magic tolerances in latency and in-flight IO semantics that SS/GPFS/AFM is not well tweaked for (yet...).
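For reference, the in-flight arithmetic behind that suspicion can be sketched roughly as below. This is a back-of-the-envelope Python sketch, not a description of how AFM actually schedules IO: the 180ms RTT, the 10GbE link and the 268435456-byte TCP buffer ceiling come from this thread, while the function names and the "4 requests of 1MiB outstanding" scenario are purely illustrative assumptions.

```python
# Rough sketch: how much data must be in flight to fill a long, fat pipe,
# and what throughput ceiling a fixed amount of in-flight data implies.
# All names here are hypothetical; nothing below is an AFM or GPFS API.

RTT_SEC = 0.180                 # round-trip latency quoted in the thread
LINK_BITS_PER_SEC = 10e9        # 10GbE circuit
TCP_MAX_WINDOW = 268435456      # max field of tcp_rmem/tcp_wmem in the thread


def bdp_bytes(link_bits_per_sec: float, rtt_sec: float) -> float:
    """Bandwidth-delay product: bytes that must be unacknowledged to fill the link."""
    return link_bits_per_sec * rtt_sec / 8


def max_throughput_bits_per_sec(in_flight_bytes: float, rtt_sec: float) -> float:
    """Upper bound on throughput when at most in_flight_bytes are outstanding."""
    return in_flight_bytes * 8 / rtt_sec


if __name__ == "__main__":
    bdp = bdp_bytes(LINK_BITS_PER_SEC, RTT_SEC)
    print(f"BDP for 10GbE at 180ms: {bdp / 1e6:.0f} MB")
    print(f"TCP max window covers BDP: {TCP_MAX_WINDOW > bdp}")

    # ASSUMPTION for illustration only: 4 requests of 1MiB each in flight,
    # each waiting a full round trip before the next one is issued.
    in_flight = 4 * 1024 * 1024
    ceiling = max_throughput_bits_per_sec(in_flight, RTT_SEC)
    print(f"Ceiling with 4 x 1MiB outstanding: {ceiling / 1e9:.3f} Gbit/sec")
```

On these numbers the TCP buffer ceiling sits above the roughly 225MB needed to fill the pipe, so a limit on outstanding requests higher up the stack would be one plausible place to look, though nothing here confirms that is what is happening.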
Yes - we are catching up at SC. I think it's all been arranged? We are also talking to one of your resources about this AFM throughput behaviour this afternoon. John, I believe his name is? Anyway - if you've got any ideas, I'm all ears!

> Today's Topics:
>
>    1. Re: Tuning AFM for high throughput/high IO over _really_ long
>       distances (Scott Fadden)
>    2. Re: Tuning AFM for high throughput/high IO over _really_ long
>       distances (Jan-Frode Myklebust) (Jake Carroll)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 9 Nov 2016 10:08:42 -0800
> From: "Scott Fadden" <[email protected]>
> To: gpfsug main discussion list <[email protected]>
> Subject: Re: [gpfsug-discuss] Tuning AFM for high throughput/high IO
>    over _really_ long distances
> Message-ID:
>    <of438b4e0a.3df65f65-on88258066.006179ea-88258066.0063a...@notes.na.collabserv.com>
> Content-Type: text/plain; charset="utf-8"
>
> Jake,
>
> If AFM is using NFS it is all about NFS tuning. The copy from one side to
> the other is basically just a client writing to an NFS mount. There are a
> few things you can look at:
> 1. NFS transfer size (make it 1MiB, I think that is the max)
> 2. TCP tuning for large window size. This is discussed on "Tuning active
>    file management home communications" in the docs. On this page you will
>    find some discussion on increasing gateway threads, and other similar
>    things that may help as well.
>
> We can discuss further as I understand we will be meeting at SC16.
>
> Scott Fadden
> Spectrum Scale - Technical Marketing
> Phone: (503) 880-5833
> [email protected]
> http://www.ibm.com/systems/storage/spectrum/scale
>
> From: Jake Carroll <[email protected]>
> To: "[email protected]"
>    <[email protected]>
> Date: 11/09/2016 09:39 AM
> Subject: [gpfsug-discuss] Tuning AFM for high throughput/high IO
>    over _really_ long distances
> Sent by: [email protected]
>
> Hi.
>
> I've got a GPFS-to-GPFS AFM cache/home (IW) relationship set up over a
> really long distance. About 180ms of latency between the two clusters and
> around 13,000km of optical path. Fortunately for me, I've actually got
> near theoretical maximum IO over the NICs between the clusters and I'm
> iPerf'ing at around 8.90 to 9.2Gbit/sec over a 10GbE circuit. All MTU9000
> all the way through.
>
> Anyway - I'm finding my AFM traffic to be dragging its feet and I don't
> really understand why that might be. I've verified the link's and
> transport's ability as I said above with iPerf, and CERN's FDT to near
> 10Gbit/sec.
>
> I also verified the clusters on both sides in terms of disk IO and they
> both seem easily capable in IOZone and IOR tests of multiple GB/sec of
> throughput.
>
> So - my questions:
>
> 1. Are there very specific tunings AFM needs for high latency/long
>    distance IO?
> 2. Are there very specific NIC/TCP-stack tunings (beyond the type of
>    thing we already have in place) that benefit AFM over really long
>    distances and high latency?
> 3. We are seeing on the "cache" side really lazy/sticky "ls -als" in
>    the home mount. It sometimes takes 20 to 30 seconds before the command
>    line will report back with a long listing of files. Any ideas why it'd
>    take that long to get a response from "home"?
>
> We've got our TCP stack set up fairly aggressively, on all hosts that
> participate in these two clusters.
>
> ethtool -C enp2s0f0 adaptive-rx off
> ifconfig enp2s0f0 txqueuelen 10000
> sysctl -w net.core.rmem_max=536870912
> sysctl -w net.core.wmem_max=536870912
> sysctl -w net.ipv4.tcp_rmem="4096 87380 268435456"
> sysctl -w net.ipv4.tcp_wmem="4096 65536 268435456"
> sysctl -w net.core.netdev_max_backlog=250000
> sysctl -w net.ipv4.tcp_congestion_control=htcp
> sysctl -w net.ipv4.tcp_mtu_probing=1
>
> I modified a couple of small things on the AFM "cache" side to see if it'd
> make a difference such as:
>
> mmchconfig afmNumWriteThreads=4
> mmchconfig afmNumReadThreads=4
>
> But no difference so far.
>
> Thoughts would be appreciated. I've done this before over much shorter
> distances (30km) and I've flattened a 10GbE wire without really
> tuning...anything. Are my large in-flight-packets
> numbers/long-time-to-acknowledgement semantics going to hurt here? I
> really thought AFM might be well designed for exactly this kind of work at
> long distance *and* high throughput - so I must be missing something!
>
> -jc
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL:
> <http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20161109/c775cf5a/attachment-0001.html>
>
> ------------------------------
>
> Message: 2
> Date: Wed, 9 Nov 2016 18:09:14 +0000
> From: Jake Carroll <[email protected]>
> To: "[email protected]"
>    <[email protected]>
> Subject: Re: [gpfsug-discuss] Tuning AFM for high throughput/high IO
>    over _really_ long distances (Jan-Frode Myklebust)
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset="utf-8"
>
> Hi jf...
>
> >>> Mostly curious, don't have experience in such environments, but ... Is this
> >>> AFM over NFS or NSD protocol? Might be interesting to try the other option
> >>> -- and also check how nsdperf performs over such distance/latency.
>
> As it turns out, it seems, very few people do.
>
> I will test nsdperf over it and see how it performs. And yes, it is AFM to
> AFM. No NFS involved here!
>
> -jc
>
> ------------------------------
>
> Message: 3
> Date: Wed, 09 Nov 2016 18:05:21 +0000
> From: Jan-Frode Myklebust <[email protected]>
> To: "[email protected]"
>    <[email protected]>
> Subject: Re: [gpfsug-discuss] Tuning AFM for high throughput/high IO
>    over _really_ long distances
> Message-ID:
>    <CAHwPathy=4z=jDXN5qa3ys+Z-_7n=tsjh7cz3zkzfwqmg34...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Mostly curious, don't have experience in such environments, but ... Is this
> AFM over NFS or NSD protocol? Might be interesting to try the other option
> -- and also check how nsdperf performs over such distance/latency.
>
> -jf
>
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL:
> <http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20161109/f44369ab/attachment.html>
>
> ------------------------------
>
> End of gpfsug-discuss Digest, Vol 58, Issue 12
> **********************************************
>
> ------------------------------
>
> End of gpfsug-discuss Digest, Vol 58, Issue 13
> **********************************************
