Re: [OMPI users] orte-ps and orte-top behavior

2014-10-31 Thread Brock Palen
Thanks!

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



> On Oct 31, 2014, at 2:22 PM, Ralph Castain  wrote:
> 
> 
>> On Oct 30, 2014, at 3:15 PM, Brock Palen  wrote:
>> 
>> If I'm on the node hosting mpirun for a job, and run:
>> 
>> orte-ps
>> 
>> It finds the job and shows the pids and info for all ranks.
>> 
>> If I use orte-top, though, it has no such default: I have to find the mpirun 
>> pid and then pass it explicitly.
>> 
>> Why do the two behave differently?  They show data from the same source, 
>> don't they?
> 
> Yeah, well….no good reason, really. Just historical. I can make them 
> consistent :-)
> 
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> XSEDE Campus Champion
>> bro...@umich.edu
>> (734)936-1985
>> 
>> 
>> 
> 



Re: [OMPI users] orte-ps and orte-top behavior

2014-10-31 Thread Ralph Castain

> On Oct 30, 2014, at 3:15 PM, Brock Palen  wrote:
> 
> If I'm on the node hosting mpirun for a job, and run:
> 
> orte-ps
> 
> It finds the job and shows the pids and info for all ranks.
> 
> If I use orte-top, though, it has no such default: I have to find the mpirun 
> pid and then pass it explicitly.
> 
> Why do the two behave differently?  They show data from the same source, 
> don't they?

Yeah, well….no good reason, really. Just historical. I can make them consistent 
:-)
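
For reference, the current behaviour looks roughly like this (the --pid spelling 
is from memory, so check orte-top --help on your install; <mpirun-pid> is a 
placeholder):

  orte-ps                       # run on the node hosting mpirun: finds the job by itself
  orte-top --pid <mpirun-pid>   # orte-top has to be pointed at the mpirun pid explicitly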

> 
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> bro...@umich.edu
> (734)936-1985
> 
> 
> 



[OMPI users] IB Retry Limit Errors when fabric changes

2014-10-31 Thread Brock Palen
Does anyone have issues with jobs dying with errors:

> The InfiniBand retry count between two MPI processes has been
> exceeded.  "Retry count" is defined in the InfiniBand spec 1.2
> (section 12.7.38):

We started seeing this about a year ago.  It can happen whenever we make 
changes to the IB fabric: several times now, just plugging line cards into 
switches on a live system has caused large swaths of jobs to die with this 
error.

Does anyone else have this problem?  We run a Mellanox-based fabric.
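
For what it's worth, a hedged illustration of the two knobs the openib error 
help text points at (parameter names as I recall them from the openib BTL, so 
verify with ompi_info; the value and ./my_mpi_app are placeholders, and raising 
the timeout only papers over fabric-induced drops):

  # inspect the current retry/timeout settings of the openib BTL
  ompi_info --param btl openib --level 9 | grep -E 'ib_retry_count|ib_timeout'

  # try a larger timeout for a job
  mpirun --mca btl_openib_ib_timeout 28 ./my_mpi_app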

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985





Re: [OMPI users] knem in Open MPI 1.8.3

2014-10-31 Thread Roland Fehrenbacher
> "Nathan" == Nathan Hjelm  writes:

Hi Nathan

Nathan> I want to close the loop on this issue. 1.8.5 will address
Nathan> it in several ways:

Nathan>  - knem support in btl/sm has been fixed. A sanity check was
Nathan>    disabling knem during component registration. I wrote the
Nathan>    sanity check before the 1.7 release and didn't intend
Nathan>    this side-effect.

Nathan>  - vader now supports xpmem, cma, and knem. The best
Nathan>    available single-copy mechanism will be used. If multiple
Nathan>    single-copy mechanisms are available you can select which
Nathan>    one you want to use at runtime.

Nathan> More about the vader btl can be found here:
Nathan> http://blogs.cisco.com/performance/the-vader-shared-memory-transport-in-open-mpi-now-featuring-3-flavors-of-zero-copy/

Nathan> -Nathan Hjelm HPC-5, LANL
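
For reference, a hedged sketch of the runtime selection Nathan describes; the 
parameter name below is the one used in later Open MPI releases, so verify it 
with ompi_info on 1.8.5, and ./my_mpi_app is a placeholder:

  # list the vader parameters, including the single-copy mechanism selector
  ompi_info --param btl vader --level 9

  # e.g. force knem even if xpmem/cma are also available
  mpirun --mca btl_vader_single_copy_mechanism knem ./my_mpi_app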

Thanks for the great info. A question about xpmem: are there any plans for
someone to maintain the code?

Roland

---
http://www.q-leap.com / http://qlustar.com
  --- HPC / Storage / Cloud Linux Cluster OS ---


Re: [OMPI users] Bug in OpenMPI-1.8.3: storage limitation in shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

2014-10-31 Thread Michael.Rachner
Dear developers of OPENMPI,

There remains a hang observed in MPI_WIN_ALLOCATE_SHARED.

But first: 
Thank you for your advice to set shmem_mmap_relocate_backing_file = 1.
It turned out that the bad (but silent) allocations by MPI_WIN_ALLOCATE_SHARED, 
which I had observed in the past once ~140 MB of shared memory had been 
allocated, were indeed caused by too little storage being available for the 
shared-memory backing files. Applying the MCA parameter resolved the problem.
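
For reference, the parameter can be applied per run on the mpirun command line, 
or made permanent in $HOME/.openmpi/mca-params.conf (./my_cfd_code and -np 24 
are placeholders):

  mpirun --mca shmem_mmap_relocate_backing_file 1 -np 24 ./my_cfd_code

  # or, in mca-params.conf:
  shmem_mmap_relocate_backing_file = 1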

Now the allocation of shared data windows by MPI_WIN_ALLOCATE_SHARED in the 
OpenMPI-1.8.3 release version works on both clusters!
I tested it with both my small shared-memory Fortran test program and our 
Fortran CFD code.
It worked even when allocating 1000 shared data windows containing a total of 
40 GB.  Very well.

But now to the remaining problem:
Following Jeff's email of 2014-10-24 (attached below), we also installed and 
tested the bugfixed OpenMPI nightly tarball of 2014-10-24 
(openmpi-dev-176-g9334abc.tar.gz) on Cluster5.
That version worked well when our CFD code ran on only 1 node.
But I now observe that when running the CFD code on 2 nodes with 2 processes 
per node, after a total of 200 MB of data has been allocated in 20 shared 
windows, the allocation of the 21st window fails: all 4 processes enter 
MPI_WIN_ALLOCATE_SHARED but never leave it. The code hangs in that routine 
without any message.

In contrast, that bug does NOT occur with the OpenMPI-1.8.3 release version 
with the same program on the same machine.
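
For reference, a minimal C sketch of the allocation pattern in question (the 
actual test codes are Fortran; the window count and per-rank array size here 
are illustrative):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm shmcomm;
    MPI_Win  win;
    MPI_Aint qsize;
    int      rank, iwin, qdisp;
    int     *base, *base0;
    const int nwin = 20;      /* number of shared windows (illustrative) */
    const int idim = 50000;   /* integer elements per rank per window    */

    MPI_Init(&argc, &argv);

    /* shared windows must be allocated on a communicator whose ranks
       all live on the same node */
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &shmcomm);
    MPI_Comm_rank(shmcomm, &rank);

    for (iwin = 0; iwin < nwin; iwin++) {
        /* every rank contributes idim ints; the reported hang occurs
           inside this call */
        MPI_Win_allocate_shared((MPI_Aint)idim * sizeof(int), (int)sizeof(int),
                                MPI_INFO_NULL, shmcomm, &base, &win);

        /* obtain the start of rank 0's segment of the shared region */
        MPI_Win_shared_query(win, 0, &qsize, &qdisp, &base0);

        if (rank == 0)
            printf("window %d allocated, rank 0 segment at %p\n",
                   iwin + 1, (void *)base0);
        /* windows are deliberately never freed, as in the test programs */
    }

    MPI_Finalize();
    return 0;
}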

That means for you:  
   In openmpi-dev-176-g9334abc.tar.gz, the newly introduced bugfix for shared 
memory allocation may not yet be correctly coded,
   or that version contains another new bug in shared memory allocation 
compared to the working(!) 1.8.3 release version.

Greetings to you all
  Michael Rachner




-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres 
(jsquyres)
Sent: Friday, October 24, 2014 22:45
To: Open MPI User's List
Subject: Re: [OMPI users] Bug in OpenMPI-1.8.3: storage limitation in shared 
memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

Nathan tells me that this may well be related to a fix that was literally just 
pulled into the v1.8 branch today:

https://github.com/open-mpi/ompi-release/pull/56

Would you mind testing any nightly tarball after tonight?  (i.e., the v1.8 
tarballs generated tonight will be the first ones to contain this fix)

http://www.open-mpi.org/nightly/master/



On Oct 24, 2014, at 11:46 AM,   
wrote:

> Dear developers of OPENMPI,
>  
> I am running a small, downsized Fortran test program for shared memory 
> allocation (using MPI_WIN_ALLOCATE_SHARED and MPI_WIN_SHARED_QUERY)
> on only 1 node of 2 different Linux clusters with OPENMPI-1.8.3 and 
> Intel-14.0.4 / Intel-13.0.1, respectively.
>  
> The program simply allocates a sequence of shared data windows, each 
> consisting of 1 integer*4 array.
> None of the windows is freed, so the amount of data allocated in shared 
> windows grows during the course of the execution.
>  
> That worked well on the 1st cluster (Laki, having 8 procs per node) 
> when allocating even 1000 shared windows, each having 50000 integer*4 array 
> elements, i.e. a total of 200 MBytes.
> On the 2nd cluster (Cluster5, having 24 procs per node) it also worked on the 
> login node, but it did NOT work on a compute node.
> In that error case, there seems to be an internal storage limit of 
> ~140 MB for the total storage allocated across all shared windows.
> When that limit is reached, all later shared memory allocations fail (but 
> silently).
> So the first attempt to use such a badly allocated shared data window results 
> in a bus error due to the bad storage address encountered.
>  
> That strange behavior was observed both with the small test program and 
> with my large Fortran CFD code.
> If the error occurs, then it occurs with both codes, and in both cases at a 
> storage limit of ~140 MB.
> I found that this storage limit depends only weakly on the number of 
> processes (for np=2,4,8,16,24 it is: 144.4, 144.0, 141.0, 137.0, 
> 132.2 MB).
>  
> Note that the shared memory storage available on both clusters was very large 
> (many GB of free memory).
>  
> Here is the error message when running with np=2 and an array 
> dimension of idim_1=50000 for the integer*4 array allocated per shared 
> window on the compute node of Cluster5:
> In that case, the error occurred at the 723rd shared window, which is the 
> 1st badly allocated window in that case:
> (722 successfully allocated shared windows * 50000 array elements * 4 
> Bytes/el. = 144.4 MB)
>  
>  
> [1,0]: on nodemaster: iwin= 722 :
> [1,0]:  total storage [MByte] alloc. in shared windows so far:   
> 144.4000
> 

Re: [OMPI users] knem in Open MPI 1.8.3

2014-10-31 Thread Brice Goglin
On 31/10/2014 00:24, Gus Correa wrote:
> 2) Any recommendation for the values of the
> various vader btl parameters?
> [There are 12 of them in OMPI 1.8.3!
> That is real challenge to get right.]
>
> Which values did you use in your benchmarks?
> Defaults?
> Other?
>
> In particular, is there an optimal value for the eager/rendezvous
> threshold value? (btl_vader_eager_limit, default=4kB)
> [The INRIA web site suggests 32kB for the sm+knem counterpart
> (btl_sm_eager_limit, default=4kB).]

There's no perfect value, and no easy way to tune all this.

The impact of direct copy mechanisms such as XPMEM/KNEM/CMA depends on
the contention in your memory bus and caches. If you're doing an
Alltoall, the optimal threshold for enabling them will be much lower
than if you're doing a pingpong, because doing a single copy instead of
two usually helps more when the memory subsystem is overloaded. And it
also depends on your process placement and on which cache (and cache size)
is shared between the processes.
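
For reference, if one does want to experiment, changing the threshold per run 
is just an MCA parameter (both names appear in Gus's question above; 32768 is 
merely the value the INRIA page suggests for sm+knem, and ./my_app is a 
placeholder):

  mpirun --mca btl_vader_eager_limit 32768 ./my_app
  mpirun --mca btl_sm_eager_limit 32768 ./my_app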

Unfortunately, microbenchmarks will hardly help you decide on a better
threshold, because performance also depends on the state of the buffers in
the caches (did the application write the send buffer recently? will the
application read the buffer soon? microbenchmarks ignore these), and each
copy strategy may have a different impact on the caches (which process is
reading and writing, on which cores, and from/to which buffers?).

So I'd say don't bother tuning things for too long...

Brice