Re: [OMPI devel] [OMPI svn] svn:open-mpi r25323

2011-10-20 Thread Jeff Squyres
On Oct 19, 2011, at 6:41 PM, George Bosilca wrote:

> A careful reading of the committed patch would have pointed out that none of 
> the concerns raised so far were true; the "old-way" behavior of the OMPI code 
> was preserved.

Then perhaps you could have added some comments to explain the not-obvious 
semantics, and fewer people would have argued.  :-)

> Moreover, every single one of the error codes removed has not been used in ages.

On the trunk.  It is highly likely that no one else is using those codes 
anywhere, but you can't *know* that.  A courtesy RFC is always a good idea 
here.  

Indeed, you have railed against exactly this kind of behavior before: people 
changing things on the trunk that had impact on your private research branches. 
 :-)

> What Brian pointed out as evil, evil being a subjective notion by itself, 
> didn't prevent the correct behavior of the code, nor did it affect its 
> correctness in any way. Anyway, to address his concern I pushed a patch (25333) 
> putting the OMPI error codes back where they were originally.

The point was that an RFC would have been a much less controversial way of 
doing this, especially since the code turned out to be fairly subtle, 
uncommented, and different from what has been done in the past.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




[OMPI devel] Process ranks

2011-10-20 Thread Ralph Castain
Since people may not be fully familiar, and because things have evolved, I 
thought it might help to provide a brief explanation of the ranks we assign to 
processes in OMPI.

Each process has four "ranks" assigned to it at launch:

1. vpid - equivalent to its MPI rank within the job. You can access the vpid 
with ORTE_PROC_MY_NAME->vpid.

2. local_rank - the relative rank of the process, within its own job, on the 
local node. For example, if there are three processes from this job on the 
node, then the lowest vpid process would have local_rank=0, the next highest 
vpid process would have local_rank=1, etc. The local_rank is typically used by 
the shared memory subsystem to decide which proc will create the backing file.

Note that processes from dynamically spawned jobs on the node will have 
overlapping local_ranks. For example, if a process in the above job were to 
comm_spawn two more procs on the node, the one of those with the lowest vpid 
would also have local_rank=0, since it belongs to a different jobid.

Every process has full knowledge of the local_rank for every other process 
executing within that mpirun AND for any proc that connected to it via MPI 
connect/accept or comm_spawn (the info is included in the modex during the 
connect/accept procedure). You can obtain the local_rank of any process using

orte_local_rank_t orte_ess.get_local_rank(proc_name)

This will return ORTE_LOCAL_RANK_INVALID if the info isn't known.

3. node_rank - the relative rank of the process, spanning all jobs under this 
mpirun, on the local node. The node_rank is typically used by the OOB to select 
a static port from the given range, thus ensuring that each proc on the node - 
regardless of job - takes a unique port. For example, if there are three 
processes from this job on the node, then the lowest vpid process would have 
node_rank=0, the next highest vpid process would have node_rank=1, etc. If a 
process then comm_spawns another process onto the node, the new process will 
have node_rank=3 since the computation spans -all- jobs.

Every process has full knowledge of the node_rank for every other process 
executing within that mpirun AND for any proc that connected to it via MPI 
connect/accept or comm_spawn (the info is included in the modex during the 
connect/accept procedure). You can obtain the node_rank of any process using

orte_node_rank_t orte_ess.get_node_rank(proc_name)

This will return ORTE_NODE_RANK_INVALID if the info isn't known.

4. app_rank - the relative rank of the process within its app_context. This 
equates to the vpid for a job that contains only one app_context. However, for 
jobs with multiple app_contexts, this value provides a way of determining a 
proc's rank solely within its own app_context. Each process only has access to 
its own app_rank in orte_process_info - it doesn't have any knowledge of the 
app_rank for other processes.
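
Putting the above together, here is a quick (untested) sketch of how a proc 
might query these ranks for a peer. This is not code from the OMPI tree; the 
include paths and the use of opal_output are my assumptions, while the rank 
accessors are the ones described above.

/* Sketch only: print my own vpid plus a peer's local_rank and node_rank. */
#include "opal/util/output.h"
#include "orte/mca/ess/ess.h"
#include "orte/runtime/orte_globals.h"

static void show_ranks(orte_process_name_t *peer)
{
    orte_local_rank_t lrank;
    orte_node_rank_t nrank;

    /* my MPI rank within my own job */
    opal_output(0, "my vpid = %u", (unsigned)ORTE_PROC_MY_NAME->vpid);

    /* peer's rank among procs of its own job on its node */
    lrank = orte_ess.get_local_rank(peer);
    if (ORTE_LOCAL_RANK_INVALID == lrank) {
        opal_output(0, "peer local_rank unknown");
    } else {
        opal_output(0, "peer local_rank = %u", (unsigned)lrank);
    }

    /* peer's rank among all procs under this mpirun on its node */
    nrank = orte_ess.get_node_rank(peer);
    if (ORTE_NODE_RANK_INVALID == nrank) {
        opal_output(0, "peer node_rank unknown");
    } else {
        opal_output(0, "peer node_rank = %u", (unsigned)nrank);
    }
}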

HTH
Ralph




[OMPI devel] Determining locality

2011-10-20 Thread Ralph Castain
For those wishing to use the new locality functionality, here is a little 
(hopefully clearer) info on how to do it. A few clarifications first may help:

1. the locality is defined by the precise cpu set upon which a process is 
bound. If not bound, this obviously includes all the available cpus on the node 
where the process resides. 

2. the locality value we return to you is a bitmask where each bit represents a 
specific layer of common usage between you (the proc in which the call to 
orte_ess.proc_get_locality is made) and the given process. In other words, if 
the "socket" bit is set, it means you and the process you specified are both 
bound to the same socket.

Important note: it does -not- mean that the other process is currently 
executing on the same socket as you are executing upon at this instant in time. 
It only means that the OS is allowing that process to use the same socket that 
you are allowed to use. As the process swaps in/out and moves around, it may or 
may not be co-located on the socket with you at any given instant.

We do not currently provide a way for a process to get the relative locality of 
two other remote processes. However, the infrastructure supports this, so we 
can add it if/when someone shows a use-case for it.

3. every process has locality info for all of its peers AND for any proc that 
connected to it via MPI connect/accept or comm_spawn (the info is included in 
the modex during the connect/accept procedure). This is true regardless of 
launch method, with the exception of cnos (which doesn't have a modex).


With that in mind, let's start with determining if a proc is on the same node. 
The only way to determine if two procs other than yourself are on the same node 
is to compare their daemon vpids:

if (orte_ess.proc_get_daemon(A) == orte_ess.proc_get_daemon(B)), then A and B 
are on the same node.
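
For example, a minimal sketch of that comparison (the ORTE_VPID_INVALID guard 
and the include path are my assumptions; the daemon accessor is the one quoted 
above):

/* Sketch only: are two other procs A and B on the same node? */
#include <stdbool.h>
#include "orte/mca/ess/ess.h"

static bool on_same_node(orte_process_name_t *a, orte_process_name_t *b)
{
    orte_vpid_t da = orte_ess.proc_get_daemon(a);
    orte_vpid_t db = orte_ess.proc_get_daemon(b);

    /* if either daemon is unknown, we cannot tell (assumed sentinel) */
    if (ORTE_VPID_INVALID == da || ORTE_VPID_INVALID == db) {
        return false;
    }
    return da == db;
}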


However, there are two ways to determine if another proc is on the same node as 
you. First, you can of course use the above method to determine if you share 
the same daemon:

if (orte_ess.proc_get_daemon(A) == ORTE_PROC_MY_DAEMON->vpid), then we are on 
the same node.

Alternatively, you can use the proc locality since it contains a "node" bit:

if (OPAL_PROC_ON_LOCAL_NODE(orte_ess.proc_get_locality(A))), then the proc is 
on the same node as us.


Similarly, we can determine if another process shares a socket, NUMA node, or 
other hardware element with us by applying the corresponding OPAL_PROC_ON_xxx 
macro to the locality returned by calling orte_ess.proc_get_locality for that 
process.
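
As a concrete (hedged) sketch of these self-relative checks: the calls and 
OPAL_PROC_ON_LOCAL_NODE are quoted above, but OPAL_PROC_ON_LOCAL_SOCKET is my 
guess at a socket-level macro following the OPAL_PROC_ON_xxx pattern, and the 
include paths are assumptions.

/* Sketch only: locality of a peer relative to the calling proc. */
#include "opal/util/output.h"
#include "orte/mca/ess/ess.h"
#include "orte/runtime/orte_globals.h"

static void report_locality(orte_process_name_t *peer)
{
    /* same node, via the daemon vpid */
    if (orte_ess.proc_get_daemon(peer) == ORTE_PROC_MY_DAEMON->vpid) {
        opal_output(0, "peer was launched by my daemon (same node)");
    }

    /* same node, via the locality bitmask */
    if (OPAL_PROC_ON_LOCAL_NODE(orte_ess.proc_get_locality(peer))) {
        opal_output(0, "peer is allowed to use my node");
    }

    /* same socket (assumed macro name) */
    if (OPAL_PROC_ON_LOCAL_SOCKET(orte_ess.proc_get_locality(peer))) {
        opal_output(0, "peer is allowed to use my socket");
    }
}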

HTH
Ralph




[OMPI devel] MPI 2.2 datatypes

2011-10-20 Thread Eugene Loh
In MTT testing, we check the OMPI version number to decide whether to test 
MPI 2.2 datatypes.


Specifically, in intel_tests/src/mpitest_def.h:

#define MPITEST_2_2_datatype 0
#if defined(OPEN_MPI)
#if (OMPI_MAJOR_VERSION > 1) || (OMPI_MAJOR_VERSION == 1 && OMPI_MINOR_VERSION >= 7)
#undef MPITEST_2_2_datatype
#define MPITEST_2_2_datatype 1
#endif
#endif
#if MPI_VERSION > 2 || (MPI_VERSION == 2 && MPI_SUBVERSION >= 2)
#undef MPITEST_2_2_datatype
#define MPITEST_2_2_datatype 1
#endif

The check looks for OMPI 1.7 or higher, but we introduced support for 
MPI 2.2 datatypes in 1.5.4.  So, can we check for 1.5.4 or higher?  Or, 
is it possible that this support might not go into the first 1.6 
release?  I'm willing to make the changes, but just wanted some guidance 
on what to expect in 1.6.
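
For concreteness, here is one possible (untested) variant of the check that 
accepts 1.5.4 or later, assuming OMPI_RELEASE_VERSION is available alongside 
the major/minor macros and that the 2.2 datatype support stays in whatever 
becomes 1.6:

#define MPITEST_2_2_datatype 0
#if defined(OPEN_MPI)
#if (OMPI_MAJOR_VERSION > 1) || \
    (OMPI_MAJOR_VERSION == 1 && OMPI_MINOR_VERSION > 5) || \
    (OMPI_MAJOR_VERSION == 1 && OMPI_MINOR_VERSION == 5 && OMPI_RELEASE_VERSION >= 4)
#undef MPITEST_2_2_datatype
#define MPITEST_2_2_datatype 1
#endif
#endif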


Re: [OMPI devel] === CREATE FAILURE (trunk) ===

2011-10-20 Thread Ralph Castain
regenerating now...

On Oct 20, 2011, at 7:14 PM, MPI Team wrote:

> 
> ERROR: Command returned a non-zero exist status (trunk):
>   make distcheck
> 
> Start time: Thu Oct 20 21:00:02 EDT 2011
> End time:   Thu Oct 20 21:14:13 EDT 2011
> 
> ===
> [... previous lines snipped ...]
> make[2]: Entering directory 
> `/home/mpiteam/openmpi/nightly-tarball-build-root/trunk/create-r25345/ompi/ompi/mca/vprotocol'
> make[2]: Leaving directory 
> `/home/mpiteam/openmpi/nightly-tarball-build-root/trunk/create-r25345/ompi/ompi/mca/vprotocol'
> (cd mca/common/mx && make  top_distdir=../../../../openmpi-1.7a1r25345 
> distdir=../../../../openmpi-1.7a1r25345/ompi/mca/common/mx \
> am__remove_distdir=: am__skip_length_check=: am__skip_mode_fix=: distdir)
> make[2]: Entering directory 
> `/home/mpiteam/openmpi/nightly-tarball-build-root/trunk/create-r25345/ompi/ompi/mca/common/mx'
> make[2]: Leaving directory 
> `/home/mpiteam/openmpi/nightly-tarball-build-root/trunk/create-r25345/ompi/ompi/mca/common/mx'
> (cd mca/common/cuda && make  top_distdir=../../../../openmpi-1.7a1r25345 
> distdir=../../../../openmpi-1.7a1r25345/ompi/mca/common/cuda \
> am__remove_distdir=: am__skip_length_check=: am__skip_mode_fix=: distdir)
> make[2]: Entering directory 
> `/home/mpiteam/openmpi/nightly-tarball-build-root/trunk/create-r25345/ompi/ompi/mca/common/cuda'
> make[2]: Leaving directory 
> `/home/mpiteam/openmpi/nightly-tarball-build-root/trunk/create-r25345/ompi/ompi/mca/common/cuda'
> (cd mca/common/sm && make  top_distdir=../../../../openmpi-1.7a1r25345 
> distdir=../../../../openmpi-1.7a1r25345/ompi/mca/common/sm \
> am__remove_distdir=: am__skip_length_check=: am__skip_mode_fix=: distdir)
> make[2]: Entering directory 
> `/home/mpiteam/openmpi/nightly-tarball-build-root/trunk/create-r25345/ompi/ompi/mca/common/sm'
> make[2]: Leaving directory 
> `/home/mpiteam/openmpi/nightly-tarball-build-root/trunk/create-r25345/ompi/ompi/mca/common/sm'
> (cd mca/common/portals && make  top_distdir=../../../../openmpi-1.7a1r25345 
> distdir=../../../../openmpi-1.7a1r25345/ompi/mca/common/portals \
> am__remove_distdir=: am__skip_length_check=: am__skip_mode_fix=: distdir)
> make[2]: Entering directory 
> `/home/mpiteam/openmpi/nightly-tarball-build-root/trunk/create-r25345/ompi/ompi/mca/common/portals'
> make[2]: Leaving directory 
> `/home/mpiteam/openmpi/nightly-tarball-build-root/trunk/create-r25345/ompi/ompi/mca/common/portals'
> (cd mca/allocator/bucket && make  top_distdir=../../../../openmpi-1.7a1r25345 
> distdir=../../../../openmpi-1.7a1r25345/ompi/mca/allocator/bucket \
> am__remove_distdir=: am__skip_length_check=: am__skip_mode_fix=: distdir)
> make[2]: Entering directory 
> `/home/mpiteam/openmpi/nightly-tarball-build-root/trunk/create-r25345/ompi/ompi/mca/allocator/bucket'
> make[2]: Leaving directory 
> `/home/mpiteam/openmpi/nightly-tarball-build-root/trunk/create-r25345/ompi/ompi/mca/allocator/bucket'
> (cd mca/allocator/basic && make  top_distdir=../../../../openmpi-1.7a1r25345 
> distdir=../../../../openmpi-1.7a1r25345/ompi/mca/allocator/basic \
> am__remove_distdir=: am__skip_length_check=: am__skip_mode_fix=: distdir)
> make[2]: Entering directory 
> `/home/mpiteam/openmpi/nightly-tarball-build-root/trunk/create-r25345/ompi/ompi/mca/allocator/basic'
> make[2]: Leaving directory 
> `/home/mpiteam/openmpi/nightly-tarball-build-root/trunk/create-r25345/ompi/ompi/mca/allocator/basic'
> (cd mca/bml/r2 && make  top_distdir=../../../../openmpi-1.7a1r25345 
> distdir=../../../../openmpi-1.7a1r25345/ompi/mca/bml/r2 \
> am__remove_distdir=: am__skip_length_check=: am__skip_mode_fix=: distdir)
> make[2]: Entering directory 
> `/home/mpiteam/openmpi/nightly-tarball-build-root/trunk/create-r25345/ompi/ompi/mca/bml/r2'
> make[2]: Leaving directory 
> `/home/mpiteam/openmpi/nightly-tarball-build-root/trunk/create-r25345/ompi/ompi/mca/bml/r2'
> (cd mca/btl/self && make  top_distdir=../../../../openmpi-1.7a1r25345 
> distdir=../../../../openmpi-1.7a1r25345/ompi/mca/btl/self \
> am__remove_distdir=: am__skip_length_check=: am__skip_mode_fix=: distdir)
> make[2]: Entering directory 
> `/home/mpiteam/openmpi/nightly-tarball-build-root/trunk/create-r25345/ompi/ompi/mca/btl/self'
> make[2]: Leaving directory 
> `/home/mpiteam/openmpi/nightly-tarball-build-root/trunk/create-r25345/ompi/ompi/mca/btl/self'
> (cd mca/btl/mx && make  top_distdir=../../../../openmpi-1.7a1r25345 
> distdir=../../../../openmpi-1.7a1r25345/ompi/mca/btl/mx \
> am__remove_distdir=: am__skip_length_check=: am__skip_mode_fix=: distdir)
> make[2]: Entering directory 
> `/home/mpiteam/openmpi/nightly-tarball-build-root/trunk/create-r25345/ompi/ompi/mca/btl/mx'
> make[2]: Leaving directory 
> `/home/mpiteam/openmpi/nightly-tarball-build-root/trunk/create-r25345/ompi/ompi/mca/btl/mx'
> (cd mca/btl/ofud && make  top_distdir=../../../../openmpi-1.7a1r25345 
> distdir=../../..