Re: [OMPI users] Application hangs on mpi_waitall

2013-06-25 Thread eblosch
An update: I recoded the mpi_waitall call as a loop over the requests with
mpi_test and a 30-second timeout.  The timeout triggers unpredictably,
sometimes after 10 minutes of run time, other times after 15 minutes, for
the exact same case.
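
Roughly, the replacement loop looks like the sketch below (C bindings for
illustration only; the function name, the tag/source arrays, and the
per-request bookkeeping are placeholders rather than the actual code):

    /* Diagnostic sketch: poll the requests with MPI_Test instead of
     * blocking in MPI_Waitall, and report anything still pending after
     * the timeout. */
    #include <mpi.h>
    #include <stdio.h>

    int wait_with_timeout(int n, MPI_Request reqs[], const int tags[],
                          const int srcs[], double timeout_sec)
    {
        double t0 = MPI_Wtime();
        int i, done, remaining = n;

        while (remaining > 0) {
            remaining = 0;
            for (i = 0; i < n; i++) {
                if (reqs[i] == MPI_REQUEST_NULL)     /* already completed */
                    continue;
                MPI_Test(&reqs[i], &done, MPI_STATUS_IGNORE);
                if (!done)
                    remaining++;
            }
            if (remaining > 0 && MPI_Wtime() - t0 > timeout_sec) {
                /* Timed out: report which requests are still outstanding;
                 * for receives, the tag and source identify the message. */
                for (i = 0; i < n; i++)
                    if (reqs[i] != MPI_REQUEST_NULL)
                        fprintf(stderr,
                                "request %d (tag %d, source %d) still pending\n",
                                i, tags[i], srcs[i]);
                return 1;                            /* caller decides what to do */
            }
        }
        return 0;
    }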

After the 30 seconds, I print out the status of all outstanding receive
requests.  The message tags that are still outstanding have definitely
been sent, so I am wondering why they are not being received.

As I said before, every process posts non-blocking standard receives, then
non-blocking standard sends, then calls mpi_waitall (as in the sketch
below). Each process is typically waiting on 200 to 300 requests. Is
deadlock possible with this approach under some unusual set of conditions?
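
A minimal C sketch of the pattern, with a two-neighbor ring standing in
for the real connectivity and made-up counts, tags, and buffer sizes:

    /* Post every non-blocking receive, then every non-blocking standard
     * send, then a single MPI_Waitall over all requests. */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size, i, left, right;
        double sendbuf[2][1024], recvbuf[2][1024];
        MPI_Request reqs[4];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        left  = (rank - 1 + size) % size;
        right = (rank + 1) % size;
        for (i = 0; i < 1024; i++)
            sendbuf[0][i] = sendbuf[1][i] = (double)rank;

        /* 1. Post all non-blocking receives first. */
        MPI_Irecv(recvbuf[0], 1024, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Irecv(recvbuf[1], 1024, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

        /* 2. Then all non-blocking standard sends. */
        MPI_Isend(sendbuf[0], 1024, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[2]);
        MPI_Isend(sendbuf[1], 1024, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[3]);

        /* 3. Wait on every outstanding request; in our code this is the
         *    call that never returns. */
        MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);

        MPI_Finalize();
        return 0;
    }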

Thanks again,

Ed

> I'm running OpenMPI 1.6.4 and seeing a problem where mpi_waitall never
> returns.  The case runs fine with MVAPICH.  The logic associated with the
> communications has been extensively debugged in the past; we don't think
> it has errors.   Each process posts non-blocking receives, non-blocking
> sends, and then does waitall on all the outstanding requests.
>
> The work is broken down into 960 chunks. If I run with 960 processes (60
> nodes of 16 cores each), things seem to work.  If I use 160 processes
> (each process handling 6 chunks of work), then each process is handling 6
> times as much communication, and that is the case that hangs with OpenMPI
> 1.6.4; again, seems to work with MVAPICH.  Is there an obvious place to
> start, diagnostically?  We're using the openib btl.
>
> Thanks,
>
> Ed




Re: [OMPI users] EXTERNAL: Re: Need advice on performance problem

2013-06-11 Thread eblosch
Problem solved. I had not configured with --with-mxm=/opt/mellanox/mcm,
and that location was not auto-detected.  Once I rebuilt with this option,
everything worked fine and scaled better than MVAPICH out to 800 cores.
MVAPICH's configure log showed that it had found this component of the
OFED stack.
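
For anyone who hits the same thing, the rebuild amounted to something like
the following (the install prefix and "[...other options...]" are
placeholders; the MXM path is whatever your Mellanox install provides):

    ./configure --prefix=/usr/local/openmpi-1.6.4 \
                --with-mxm=/opt/mellanox/mcm [...other options...]
    make all install
    ompi_info | grep -i mxm

If the build picked up MXM, that last grep should show an mxm component;
an empty result is a sign the support was not compiled in.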

Ed


> If you run at 224 and things look okay, then I would suspect something in
> the upper level switch that spans cabinets. At that point, I'd have to
> leave it to Mellanox to advise.
>
>
> On Jun 11, 2013, at 6:55 AM, "Blosch, Edwin L" wrote:
>
>> I tried adding "-mca btl openib,sm,self"  but it did not make any
>> difference.
>>
>> Jesus’ e-mail this morning has got me thinking.  In our system, each
>> cabinet has 224 cores, and we are reaching a different level of the
>> system architecture when we go beyond 224.  I got an additional data
>> point at 256 and found that performance is already falling off. Perhaps
>> I did not build OpenMPI properly to support the Mellanox adapters that
>> are used in the backplane, or I need some configuration setting similar
>> to FAQ #19 in the Tuning/Openfabrics section.
>>
>> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
>> Behalf Of Ralph Castain
>> Sent: Sunday, June 09, 2013 6:48 PM
>> To: Open MPI Users
>> Subject: Re: [OMPI users] EXTERNAL: Re: Need advice on performance
>> problem
>>
>> Strange - it looks like a classic oversubscription behavior. Another
>> possibility is that it isn't using IB for some reason when extended to
>> the other nodes. What does your cmd line look like? Have you tried
>> adding "-mca btl openib,sm,self" just to ensure it doesn't use TCP for
>> some reason?
>>
>>
>> On Jun 9, 2013, at 4:31 PM, "Blosch, Edwin L" wrote:
>>
>>
>> Correct.  20 nodes, dual-socket with 8 cores per socket (16 cores per
>> node) = 320.
>>
>> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
>> Behalf Of Ralph Castain
>> Sent: Sunday, June 09, 2013 6:18 PM
>> To: Open MPI Users
>> Subject: Re: [OMPI users] EXTERNAL: Re: Need advice on performance
>> problem
>>
>> So, just to be sure - when you run 320 "cores", you are running across
>> 20 nodes?
>>
>> Just want to ensure we are using "core" the same way - some people
>> confuse cores with hyperthreads.
>>
>> On Jun 9, 2013, at 3:50 PM, "Blosch, Edwin L" wrote:
>>
>>
>>
>> 16.  dual-socket Xeon, E5-2670.
>>
>> I am trying a larger model to see if the performance drop-off happens at
>> a different number of cores.
>> Also I’m running some intermediate core-count sizes to refine the curve
>> a bit.
>> I also added mpi_show_mca_params all, and at the same time,
>> btl_openib_use_eager_rdma 1, just to see if that does anything.
>>
>> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
>> Behalf Of Ralph Castain
>> Sent: Sunday, June 09, 2013 5:04 PM
>> To: Open MPI Users
>> Subject: EXTERNAL: Re: [OMPI users] Need advice on performance problem
>>
>> Looks to me like things are okay thru 160, and then things fall apart
>> after that point. How many cores are on a node?
>>
>>
>> On Jun 9, 2013, at 1:59 PM, "Blosch, Edwin L" wrote:
>>
>>
>>
>>
>> I’m having some trouble getting good scaling with OpenMPI 1.6.4 and I
>> don’t know where to start looking. This is an Infiniband FDR network
>> with Sandy Bridge nodes.  I am using affinity (--bind-to-core) but no
>> other options. As the number of cores goes up, the message sizes are
>> typically going down. There seem to be lots of options in the FAQ, and I
>> would welcome any advice on where to start.  All these timings are on a
>> completely empty system except for me.
>>
>> Thanks
>>
>>
>> MPI     | # cores | Ave. Rate | Std. Dev. % | # timings | Speedup | Efficiency
>> --------+---------+-----------+-------------+-----------+---------+-----------
>> MVAPICH |      16 |    8.6783 |     0.995 % |         2 |  16.000 |     1.0000
>> MVAPICH |      48 |    8.7665 |     1.937 % |         3 |  47.517 |     0.9899
>> MVAPICH |      80 |    8.8900 |     2.291 % |         3 |  78.095 |     0.9762
>> MVAPICH |     160 |    8.9897 |     2.409 % |         3 | 154.457 |     0.9654
>> MVAPICH |     320 |    8.9780 |     2.801 % |         3 | 309.317 |     0.9666
>> MVAPICH |     480 |    8.9704 |     2.316 % |         3 | 464.366 |     0.9674
>> MVAPICH |     640 |    9.0792 |     1.138 % |         3 | 611.739 |     0.9558
>> MVAPICH |     720 |    9.1328 |     1.052 % |         3 | 684.162 |     0.9502
>> MVAPICH |     800 |    9.1945 |     0.773 % |         3 | 755.079 |     0.9438
>> OpenMPI |      16 |    8.6743 |     2.335 % |         2 |  16.000 |     1.0000
>> OpenMPI |      48 |    8.7826 |     1.605 % |         2 |  47.408 |     0.9877
>> OpenMPI |      80 |    8.8861 |     0.120 % |         2 |