Dave --

That's an unfortunate crash in the runtime (glibc is reporting heap corruption in orted), not the MPI layer. Did you get a corefile, perchance? Could you send a backtrace?
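If you didn't get a core file, something along these lines should produce one and get us a usable trace -- a rough sketch only; the core file's name and location depend on your kernel's core_pattern, and if orted is being launched on remote nodes the limit has to be raised there too:

    # allow core dumps, then re-run the failing case
    ulimit -c unlimited
    ~/openmpi-master/bin/mpirun -np 2 --mca pml yalla ./NPmpi.master -o np.out

    # point gdb at orted plus the resulting core and grab the stack
    gdb /homes/daveturner/openmpi-master/bin/orted core.<pid>
    (gdb) bt full
    (gdb) thread apply all bt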
> On Feb 23, 2015, at 9:53 PM, Dave Turner <drdavetur...@gmail.com> wrote:
>
> Jeff, George,
>
> When I try to use yalla with the dev master I get the error message below:
>
> ~/openmpi-master/bin/mpirun -np 2 --mca pml yalla ./NPmpi.master -o np.out
>
> *** Error in `/homes/daveturner/openmpi-master/bin/orted': free(): corrupted unsorted chunks: 0x0000000002351270 ***
> *** Error in `/homes/daveturner/openmpi-master/bin/orted': corrupted double-linked list: 0x0000000002351260 ***
>
> George: I'm not sure what the PML ob1 is that you refer to. I'm running with the default settings for the most part and trying to tune those. When we had nodes with just RoCE over 10 Gbps and just QDR IB, the latencies were the same at around 3 microseconds. The bandwidths were around 10 Gbps and 30 Gbps.
>
> Dave
>
> On Sun, Feb 22, 2015 at 5:37 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> Dave --
>
> Just out of curiosity, what kind of performance do you get when you use MXM? (e.g., the yalla PML on master)
>
> > On Feb 19, 2015, at 6:41 PM, Dave Turner <drdavetur...@gmail.com> wrote:
> >
> > I've downloaded the OpenMPI master as suggested and rerun all my aggregate tests across my system with QDR IB and 10 Gbps RoCE.
> >
> > The attached unidirectional.pdf graph is the ping-pong performance for 1 core on 1 machine to 1 core on the 2nd. The red curve for OpenMPI 1.8.3 shows lower performance for small and also medium message sizes for the base test without using any tuning parameters. The green line from the OpenMPI master shows lower performance only for small messages, but great performance for medium sizes. Turning off the 10 Gbps card entirely produces great performance for all message sizes. So the fixes in the master at least help, but it still seems to be choosing RoCE for small messages rather than QDR IB. They both use the openib btl, so I assume it just chooses one at random, and this is probably not that surprising. Since there are no tunable parameters for multiple openib btl's, this cannot be manually tuned.
> >
> > The bi-directional ping-pong tests show basically the same thing, with lower performance for small message sizes for 1.8.3 and the master. However, I'm also seeing the max bandwidth being limited to 44 Gbps instead of 60 Gbps for the master for some reason.
> >
> > The aggregate tests in the 3rd graph are for 20 cores on one machine yelling at 20 cores on the 2nd machine (bi-directional too). They likewise show the lower 10 Gbps RoCE performance for small messages, and also show the max bandwidth being limited to 45 Gbps for the master.
> >
> > Our solution for now is to simply exclude mlx4_1, which is the 10 Gbps card. That will give us QDR performance but not let us use the extra 10 Gbps to channel bond for large messages. It is more worrisome that the max bandwidth on the bi-directional and aggregate tests using the master is lower than it should be.
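A side note on the mlx4_1 workaround above: assuming mlx4_1 is the device name that ibv_devinfo reports for the 10 Gbps card, restricting the openib BTL to the QDR HCA should just be a matter of something like

    mpirun -np 2 --mca btl_openib_if_exclude mlx4_1 ./NPmpi.master -o np.out

(or an equivalent btl_openib_if_include listing only the QDR device). That only hides the port from openib, though -- it doesn't give you per-device latency/bandwidth knobs, which is the real gap you're describing.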
> >
> > Dave
> >
> > On Wed, Feb 11, 2015 at 11:00 AM, <devel-requ...@open-mpi.org> wrote:
> >
> > Message: 1
> > Date: Tue, 10 Feb 2015 20:41:30 -0500
> > From: George Bosilca <bosi...@icl.utk.edu>
> > Subject: Re: [OMPI devel] RoCE plus QDR IB tunable parameters
> >
> > Somehow one of the most basic pieces of information about the capabilities of the BTLs (bandwidth) disappeared from the MCA parameters, and the one left (latency) was mislabeled. This mishap not only prevented the communication engine from correctly ordering the BTLs for small messages (the latency-bound part), but also introduced an undesirable bias in the logic that load-balances between multiple devices (the bandwidth part).
> >
> > I just pushed a fix to master:
> > https://github.com/open-mpi/ompi/commit/e173f9b0c0c63c3ea24b8d8bc0ebafe1f1736acb
> > Once validated, this should be moved over to the 1.8 branch.
> >
> > Dave, do you think it is possible to rerun your experiment with the current master?
> >
> > Thanks,
> > George.
> >
> > On Mon, Feb 9, 2015 at 2:57 PM, Dave Turner <drdavetur...@gmail.com> wrote:
> > >
> > > Gilles,
> > >
> > > I tried running with btl_openib_cpc_include rdmacm and saw no change.
> > >
> > > Let's simplify the problem by forgetting about the channel bonding. If I just do an aggregate test of 16 cores on one machine talking to 16 on a second machine, without any settings changed from the default install of OpenMPI, I see that RoCE over the 10 Gbps link is used for small messages and then it switches over to QDR IB for large messages. I don't see channel bonding for large messages, but I can turn this on with the btl_tcp_exclusivity parameter.
> > >
> > > I think there are 2 problems here, both related to the fact that the QDR IB link and RoCE both use the same openib btl. The first problem is that the slower RoCE link is being chosen for small messages, which does lower performance significantly. The second problem is that I don't think there are parameters to allow for tuning of multiple openib btl's to manually select one over the other.
> > >
> > > Dave
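One quick sanity check on the tunables question above: after rebuilding from master with George's fix, ompi_info from that install should show whether the per-BTL latency/bandwidth parameters are back and what their defaults are -- a sketch, assuming ompi_info sits next to the mpirun you are testing with:

    ~/openmpi-master/bin/ompi_info --param btl openib --level 9 | grep -Ei 'latency|bandwidth'
    ~/openmpi-master/bin/ompi_info --param btl tcp --level 9 | grep -Ei 'latency|bandwidth'

If those still report the bogus defaults George quotes further down, the small-message ordering won't improve.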
> > > On Fri, Feb 6, 2015 at 8:24 PM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
> > >
> > >> Dave,
> > >>
> > >> These settings tell ompi to use native InfiniBand on the QDR IB port and TCP/IP on the other port.
> > >>
> > >> From the FAQ, RoCE is implemented in the openib btl:
> > >> http://www.open-mpi.org/faq/?category=openfabrics#ompi-over-roce
> > >>
> > >> Did you use
> > >> --mca btl_openib_cpc_include rdmacm
> > >> in your first tests?
> > >>
> > >> I had some second thoughts about the bandwidth values, and imho they should be 327680 and 81920 because of the 8/10 encoding. (That being said, it should not change the measured performance.)
> > >>
> > >> Also, could you try again by forcing the same btl_tcp_latency and btl_openib_latency?
> > >>
> > >> Cheers,
> > >>
> > >> Gilles
> > >>
> > >> Dave Turner <drdavetur...@gmail.com> wrote:
> > >>
> > >> George,
> > >>
> > >> I can check with my guys on Monday, but I think the bandwidth parameters are the defaults. I did alter these to 40960 and 10240 as someone else suggested to me. The attached graph shows the base red line, along with the manually balanced blue line and the auto-balanced green line (0's for both). This shift lower suggests to me that the higher TCP latency is being pulled in. I'm not sure why the curves are shifted right.
> > >>
> > >> Dave
> > >>
> > >> On Fri, Feb 6, 2015 at 5:32 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
> > >>
> > >>> Dave,
> > >>>
> > >>> Based on your ompi_info.all, the following bandwidths are reported on your system:
> > >>>
> > >>> MCA btl: parameter "btl_openib_bandwidth" (current value: "4", data source: default, level: 5 tuner/detail, type: unsigned)
> > >>>          Approximate maximum bandwidth of interconnect (0 = auto-detect value at run-time [not supported in all BTL modules], >= 1 = bandwidth in Mbps)
> > >>>
> > >>> MCA btl: parameter "btl_tcp_bandwidth" (current value: "100", data source: default, level: 5 tuner/detail, type: unsigned)
> > >>>          Approximate maximum bandwidth of interconnect (0 = auto-detect value at run-time [not supported in all BTL modules], >= 1 = bandwidth in Mbps)
> > >>>
> > >>> This basically states that on your system the default values for these parameters are wrong, your TCP network being much faster than the IB. This explains the somewhat unexpected decision of OMPI.
> > >>>
> > >>> As a possible solution, I suggest you set these bandwidth values to something more meaningful (directly in your configuration file). As an example,
> > >>>
> > >>> btl_openib_bandwidth = 40000
> > >>> btl_tcp_bandwidth = 10000
> > >>>
> > >>> makes more sense based on your HPC system description.
> > >>>
> > >>> George.
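For "directly in your configuration file" above: the usual spot is an MCA params file, e.g. $HOME/.openmpi/mca-params.conf per user or <prefix>/etc/openmpi-mca-params.conf system-wide. A minimal sketch with George's example values (rough line rates in Mbps, not measured numbers):

    # $HOME/.openmpi/mca-params.conf
    btl_openib_bandwidth = 40000
    btl_tcp_bandwidth = 10000

The same settings can also be passed per run with --mca btl_openib_bandwidth 40000 --mca btl_tcp_bandwidth 10000 on the mpirun command line.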
> > >>>
> > >>> On Fri, Feb 6, 2015 at 5:37 PM, Dave Turner <drdavetur...@gmail.com> wrote:
> > >>>
> > >>>> We have nodes in our HPC system that have 2 NICs, one being QDR IB and the second being a slower 10 Gbps card configured for both RoCE and TCP. Aggregate bandwidth tests with 20 cores on one node yelling at 20 cores on a second node (attached roce.ib.aggregate.pdf) show that without tuning, the slower RoCE interface is being used for small messages and then QDR IB is used for larger messages (red line). Tuning the tcp_exclusivity to 1024 to match the openib_exclusivity adds another 20 Gbps of bidirectional bandwidth to the high end (green line), and I'm guessing this is TCP traffic and not RoCE.
> > >>>>
> > >>>> So by default the slower interface is being chosen on the low end, and I don't think there are tunable parameters to allow me to choose the QDR interface as the default. Going forward we'll probably just disable RoCE on these nodes and go with QDR IB plus 10 Gbps TCP for large messages.
> > >>>>
> > >>>> However, I do think these issues will come up more in the future. With the low latency of RoCE matching IB, there are more opportunities to do channel bonding or to allow multiple interfaces to carry aggregate traffic for even smaller message sizes.
> > >>>>
> > >>>> Dave Turner
> >
> > Message: 2
> > Date: Tue, 10 Feb 2015 20:34:59 -0700
> > From: Howard Pritchard <hpprit...@gmail.com>
> > Subject: Re: [OMPI devel] RoCE plus QDR IB tunable parameters
> >
> > Hi George,
> >
> > I'd say commit cf377db82 explains the vanishing of the bandwidth metric as well as the mis-labeling of the latency metric.
> >
> > Howard
> >
> > <unidirectional.pdf><bidirectional.pdf><aggregate.pdf>
>
> --
> Work: davetur...@ksu.edu (785) 532-7791
> 118 Nichols Hall, Manhattan KS 66502
> Home: drdavetur...@gmail.com
> cell: (785) 770-5929

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/