I tried this. However, 23 bytes is too small, so I added 23 to the 56 bytes required for the PML header, giving 79. With 79 I do not get the error.
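A minimal sketch of that arithmetic, assuming the 56-byte minimum reported in the error output is the PML header size. The helper name `tcp_limit_flags` is hypothetical and only builds the three MCA arguments used in this thread; it does not launch anything.

```python
# Assumption: 56 is the minimum eager limit printed in the Open MPI error
# output quoted in this thread; treat it as the PML header size.
PML_HEADER_MIN = 56

def tcp_limit_flags(payload_bytes, header_bytes=PML_HEADER_MIN):
    """Hypothetical helper: return the limit value and the matching --mca arguments."""
    limit = payload_bytes + header_bytes
    names = (
        "btl_tcp_rndv_eager_limit",
        "btl_tcp_eager_limit",
        "btl_tcp_max_send_size",
    )
    flags = []
    for name in names:
        # Each parameter gets the same value, as in the commands in this thread.
        flags += ["--mca", name, str(limit)]
    return limit, flags

limit, flags = tcp_limit_flags(23)
print(limit)
print(" ".join(flags))
```

The printed arguments can be pasted into the mpirun command line in place of the three limits set to 23.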
mpirun -host host0,host1 -np 2 --mca btl self,tcp --mca btl_tcp_flags 3 --mca btl_tcp_rndv_eager_limit 23 --mca btl_tcp_eager_limit 23 --mca btl_tcp_max_send_size 23 MPI_Isend_ator_c

*** An error occurred in MPI_Init
The "eager limit" MCA parameter in the tcp BTL was set to a value which
is too low for Open MPI to function properly. Please re-run your job
with a higher eager limit value for this BTL; the exact MCA parameter
name and its corresponding minimum value is shown below.

  Local host: host0
  BTL name: tcp
  BTL eager limit value: 23 (set via btl_tcp_eager_limit)
  BTL eager limit minimum: 56
  MCA parameter name: btl_tcp_eager_limit
--------------------------------------------------------------------------

mpirun -host host0,host1 -np 2 --mca btl self,tcp --mca btl_tcp_flags 3 --mca btl_tcp_rndv_eager_limit 79 --mca btl_tcp_eager_limit 79 --mca btl_tcp_max_send_size 79 MPI_Isend_ator_c

MPITEST info (0): Starting MPI_Isend_ator: All Isend TO Root test
MPITEST info (0): Node spec MPITEST_comm_sizes[6]=2 too large, using 1
MPITEST info (0): Node spec MPITEST_comm_sizes[22]=2 too large, using 1
MPITEST info (0): Node spec MPITEST_comm_sizes[32]=2 too large, using 1
MPITEST_results: MPI_Isend_ator: All Isend TO Root all tests PASSED (3744)

________________________________________
From: devel [devel-boun...@open-mpi.org] On Behalf Of George Bosilca [bosi...@icl.utk.edu]
Sent: Wednesday, May 07, 2014 1:23 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] regression with derived datatypes

Strange. The outcome and the timing of this issue seem to highlight a link with the other datatype-related issue you reported earlier, and, as suggested by Ralph, with Gilles's scif+vader issue. Generally speaking, the mechanism used to split the data in the case of multiple BTLs is identical to the one used to split the data in fragments.
So, if the culprit is in the splitting logic, one might see some weirdness as soon as we force the exclusive usage of the send protocol with an unconventional fragment size. In other words, using the following flags "--mca btl tcp,self --mca btl_tcp_flags 3 --mca btl_tcp_rndv_eager_limit 23 --mca btl_tcp_eager_limit 23 --mca btl_tcp_max_send_size 23" should always transfer wrong data, even when only one single BTL is in play.

  George.

On May 7, 2014, at 13:11 , Rolf vandeVaart <rvandeva...@nvidia.com> wrote:

OK. So, I investigated a little more. I only see the issue when I am running with multiple ports enabled such that I have two openib BTLs instantiated. In addition, large message RDMA has to be enabled. If those conditions are not met, then I do not see the problem. For example:

FAILS:
> mpirun -np 2 -host host1,host2 -mca btl_openib_if_include mlx5_0:1,mlx5_0:2 -mca btl_openib_flags 3 MPI_Isend_ator_c

PASS:
> mpirun -np 2 -host host1,host2 -mca btl_openib_if_include mlx5_0:1 -mca btl_openib_flags 3 MPI_Isend_ator_c
> mpirun -np 2 -host host1,host2 -mca btl_openib_if_include mlx5_0:1,mlx5_0:2 -mca btl_openib_flags 1 MPI_Isend_ator_c

So we must have some type of issue when we break up the message between the two openib BTLs. Maybe someone else can confirm my observations?
I was testing against the latest trunk.

Rolf

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Joshua Ladd
Sent: Wednesday, May 07, 2014 10:48 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] regression with derived datatypes

Rolf,

This was run on a Sandy Bridge system with ConnectX-3 cards.

Josh

On Wed, May 7, 2014 at 10:46 AM, Joshua Ladd <jladd.m...@gmail.com> wrote:

Elena, can you run your reproducer on the trunk, please, and see if the problem persists?

Josh

On Wed, May 7, 2014 at 10:26 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

On May 7, 2014, at 10:03 AM, Elena Elkina <elena.elk...@itseez.com> wrote:

> Yes, this commit is also in the trunk.

Yes, I understand that -- my question is: is this same *behavior* happening on the trunk. I.e., is there some other effect on the trunk that is causing the bad behavior to not occur?

> Best,
> Elena
>
>
> On Wed, May 7, 2014 at 5:45 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> Is this also happening on the trunk?
>
>
> Sent from my phone. No type good.
>
> On May 7, 2014, at 9:44 AM, "Elena Elkina" <elena.elk...@itseez.com> wrote:
>
>> Sorry,
>>
>> Fixes #4501: Datatype unpack code produces incorrect results in some case
>>
>> ---svn-pre-commit-ignore-below---
>>
>> r31370 [[BR]]
>> Reshape all the packing/unpacking functions to use the same skeleton.
>> Rewrite the
>> generic_unpacking to take advantage of the same capabilitites.
>>
>> r31380 [[BR]]
>> Remove a non-necessary label.
>>
>> r31387 [[BR]]
>> Correctly save the displacement for the case where the convertor is not
>> completed. As we need to have the right displacement at the beginning
>> of the next call, we should save the position relative to the beginning
>> of the buffer and not to the last loop.
>>
>> Best regards,
>> Elena
>>
>>
>> On Wed, May 7, 2014 at 5:43 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
>> Can you cite the branch and SVN r number?
>>
>> Sent from my phone. No type good.
>>
>> > On May 7, 2014, at 9:24 AM, "Elena Elkina" <elena.elk...@itseez.com> wrote:
>> >
>> > b531973419a056696e6f88d813769aa4f1f1aee6
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/05/14701.php

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/