[OMPI devel] Communicator based on locality of PU

2014-11-28 Thread Nick Papior Andersen
Sorry, Jeff, I missed your message about sending it to the dev list.

Background:
I wanted an easy way to generate communicators based on the locality of the
PUs used by an MPI job.
My initial idea is to use MPI_Win_create to create shared memory based on
locality.
In my application I have a few arrays which are rarely needed, but when they
are, I need the full data from all processes.
Instead of performing a full Allgather I could back the arrays with shared
memory and skip the communication overhead, paying only the cost of memory
locality. OK, this might be too specific a use case, but I wanted to test it
to learn something about shared memory in MPI ;)
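
For context, here is a minimal C sketch of the pattern I have in mind. It is
illustrative only: it uses MPI_Win_allocate_shared (the MPI-3 call for
allocating shared-memory windows) rather than MPI_Win_create, and the array
of one double per rank is purely an example. Each node-local group shares one
array, so every local rank can read all contributions directly instead of
running an Allgather.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm node_comm;
    MPI_Win  win;
    double  *my_slot;   /* this rank's element of the node-shared array */
    double  *base;      /* base of rank 0's segment, i.e. the whole array */
    MPI_Aint ssize;
    int      disp_unit, wrank, nrank, nsize;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &wrank);

    /* One communicator per shared-memory domain (node). */
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);
    MPI_Comm_rank(node_comm, &nrank);
    MPI_Comm_size(node_comm, &nsize);

    /* Each rank contributes one double; with the default contiguous
     * allocation the per-rank segments form one node-wide array. */
    MPI_Win_allocate_shared(sizeof(double), sizeof(double), MPI_INFO_NULL,
                            node_comm, &my_slot, &win);

    MPI_Win_lock_all(MPI_MODE_NOCHECK, win);
    *my_slot = (double)wrank;          /* write my contribution */
    MPI_Win_sync(win);
    MPI_Barrier(node_comm);            /* all local ranks have written */
    MPI_Win_sync(win);

    /* Read the whole node-local array directly through the shared mapping. */
    MPI_Win_shared_query(win, 0, &ssize, &disp_unit, &base);
    if (nrank == 0) {
        for (int i = 0; i < nsize; i++)
            printf("slot %d = %g\n", i, base[i]);
    }
    MPI_Win_unlock_all(win);

    MPI_Win_free(&win);
    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}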

This functionality already exists in the hwloc base; it contains all the
information that is needed.

So I worked on the idea and got MPI to recognize a few more flags based on
the locality provided by hwloc.
The function MPI_Comm_split_type already provides this kind of splitting via
MPI_COMM_TYPE_SHARED, which pretty much does what I wanted.
But it falls short of a general scheme covering all levels of the locality
hierarchy.

So I added the following split types, based on these locality levels (a small
usage sketch follows the list):
OMPI_COMM_TYPE_CU
OMPI_COMM_TYPE_HOST
OMPI_COMM_TYPE_BOARD
OMPI_COMM_TYPE_NODE // same as MPI_COMM_TYPE_SHARED
MPI_COMM_TYPE_SHARED // same as OMPI_COMM_TYPE_NODE
OMPI_COMM_TYPE_NUMA
OMPI_COMM_TYPE_SOCKET
OMPI_COMM_TYPE_L3CACHE
OMPI_COMM_TYPE_L2CACHE
OMPI_COMM_TYPE_L1CACHE
OMPI_COMM_TYPE_CORE
OMPI_COMM_TYPE_HWTHREAD
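
To illustrate the intended usage, here is a minimal C sketch that splits on
the NUMA level; it assumes the OMPI_COMM_TYPE_* constants are exposed through
mpi.h when building against my branch (the attached Fortran program exercises
all of the types):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm numa_comm;
    int wrank, nrank, nsize;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &wrank);

    /* Group ranks that share a NUMA node (OMPI_COMM_TYPE_NUMA is an
     * Open MPI extension from the branch, not part of the MPI standard). */
    MPI_Comm_split_type(MPI_COMM_WORLD, OMPI_COMM_TYPE_NUMA, 0,
                        MPI_INFO_NULL, &numa_comm);
    MPI_Comm_rank(numa_comm, &nrank);
    MPI_Comm_size(numa_comm, &nsize);

    printf("world rank %d: NUMA-local rank %d out of %d ranks\n",
           wrank, nrank, nsize);

    MPI_Comm_free(&numa_comm);
    MPI_Finalize();
    return 0;
}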

My branch can be found at: https://github.com/zerothi/ompi

First a small "bug" report on the compilation:
I had problems right after the autogen.pl script.
Procedure:
$> git clone .. ompi
$> cd ompi
$> ./autogen.pl
My build versions:
m4: 1.4.17
automake: 1.14
autoconf: 2.69
libtool: 2.4.3
The autogen step completes successfully (the autogen output is attached if
needed).
$> mkdir build
$> cd build
$> ../configure --with-platform=optimized
I have attached the config.log (note that I have tested it with both the
shipped 1.9.1 and 1.10.0 hwloc)
$> make all
Error message is:
make[2]: Entering directory '/home/nicpa/test/build/opal/libltdl'
CDPATH="${ZSH_VERSION+.}:" && cd ../../../opal/libltdl && /bin/bash
/home/nicpa/test/config/missing aclocal-1.14 -I ../../config
aclocal-1.14: error: ../../config/autogen_found_items.m4:308: file
'opal/mca/backtrace/configure.m4' does not exist
This error message is the same as the one reported here:
http://www.open-mpi.org/community/lists/devel/2013/07/12504.php
My work-around is simple.
It has to do with the generated ACLOCAL_AMFLAGS variable
in build/opal/libltdl/Makefile
OLD:
ACLOCAL_AMFLAGS = -I ../../config
CORRECT:
ACLOCAL_AMFLAGS = -I ../../
Either the configure script creates the wrong include paths for the m4
scripts, or the m4 scripts are not fully copied to the config directory.
OK, it works and the fix is simple. I just wonder why this happens?


First, here is my test system 1:
$> hwloc-info
depth 0: 1 Machine (type #1)
depth 1: 1 Socket (type #3)
depth 2: 1 L3Cache (type #4)
depth 3: 2 L2Cache (type #4)
depth 4: 2 L1dCache (type #4)
depth 5: 2 L1iCache (type #4)
depth 6: 2 Core (type #5)
depth 7: 4 PU (type #6)
Special depth -3: 2 Bridge (type #9)
Special depth -4: 4 PCI Device (type #10)
Special depth -5: 5 OS Device (type #11)

and my test system 2:
depth 0: 1 Machine (type #1)
depth 1: 1 Socket (type #3)
depth 2: 1 L3Cache (type #4)
depth 3: 4 L2Cache (type #4)
depth 4: 4 L1dCache (type #4)
depth 5: 4 L1iCache (type #4)
depth 6: 4 Core (type #5)
depth 7: 8 PU (type #6)
Special depth -3: 3 Bridge (type #9)
Special depth -4: 3 PCI Device (type #10)
Special depth -5: 4 OS Device (type #11)
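
For reference, the per-depth information above can also be queried
programmatically; a minimal C sketch using the hwloc 1.x API (the same
library the locality levels are taken from) might look like this, covering
only the normal depths and not the special I/O depths:

#include <stdio.h>
#include <hwloc.h>

int main(void)
{
    hwloc_topology_t topo;
    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    /* Walk the normal topology levels and print the object count and type
     * at each depth, similar to the hwloc-info listings above. */
    unsigned topodepth = hwloc_topology_get_depth(topo);
    for (unsigned depth = 0; depth < topodepth; depth++) {
        hwloc_obj_type_t type = hwloc_get_depth_type(topo, depth);
        printf("depth %u: %u %s\n",
               depth,
               hwloc_get_nbobjs_by_depth(topo, depth),
               hwloc_obj_type_string(type));
    }

    hwloc_topology_destroy(topo);
    return 0;
}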

Here is an excerpt of what it can do (I have attached a Fortran program that
creates a communicator with each of the types):

$> mpirun -np 4 ./comm_split
Example of MPI_Comm_Split_Type

Currently using 4 nodes.

Comm using CU Node: 2 local rank: 2 out of 4 ranks
Comm using CU Node: 3 local rank: 3 out of 4 ranks
Comm using CU Node: 1 local rank: 1 out of 4 ranks
Comm using CU Node: 0 local rank: 0 out of 4 ranks

Comm using Host Node: 0 local rank: 0 out of 4 ranks
Comm using Host Node: 2 local rank: 2 out of 4 ranks
Comm using Host Node: 3 local rank: 3 out of 4 ranks
Comm using Host Node: 1 local rank: 1 out of 4 ranks

Comm using Board Node: 2 local rank: 2 out of 4 ranks
Comm using Board Node: 3 local rank: 3 out of 4 ranks
Comm using Board Node: 1 local rank: 1 out of 4 ranks
Comm using Board Node: 0 local rank: 0 out of 4 ranks

Comm using Node Node: 0 local rank: 0 out of 4 ranks
Comm using Node Node: 1 local rank: 1 out of 4 ranks
Comm using Node Node: 2 local rank: 2 out of 4 ranks
Comm using Node Node: 3 local rank: 3 out of 4 ranks

Comm using Shared Node: 0 local rank: 0 out of 4 ranks
Comm using Shared Node: 3 local rank: 3 out of 4 ranks
Comm using Shared Node: 1 local rank: 1 out of 4 ranks
Comm using Shared Node: 2 local rank: 2 out of 4 ranks

Comm using Numa Node: 0 local rank: 0 out of 1 ranks
Comm using Numa Node: 2 local rank: 0 out of 1 ranks
Comm using Numa Node: 3 local rank

Re: [OMPI devel] Setting up debug environment on Eclipse PTP

2014-11-28 Thread Ralph Castain
I’m not sure we have any developers using PTP - have you tried asking this 
question on the PTP mailing list, assuming that project still exists?


> On Nov 27, 2014, at 7:38 PM, Alvyn Liang  wrote:
> 
> Dear all,
> 
> I am trying to learn how Open MPI works. Following various instructions on the web, I 
> tried to set up MPI "Hello world" projects on Eclipse PTP. I am wondering if there is 
> a recommended way to set up such an environment.
> 
> I did try a few combinations, but I am still stuck at the point where sometimes: 
> 1. Little bug symbols show up in the left panel (next to the line numbers) 
> while debugging, with messages like "Symbol 'ompi_mpi_finalized' could not be 
> resolved". I think this is due to environment variables or paths not being 
> set correctly, but I don't know what I have missed.
> 2. I cannot toggle breakpoints, or toggled breakpoints are set on a relative 
> file path, so the threads do not stop at the breakpoints.
> 
> My environment is CentOS 6.6 running on a machine with 32 GB of memory and an Intel 
> i7-3770. Since I am still experimenting with local debugging, I am using the 
> Generic Open MPI Interactive launcher with connection type local (or remote to 
> 127.0.0.1), and with only a few processes. My detailed Eclipse installation 
> configuration is attached.
> 
> My Open MPI is configured as
> ../configure --enable-debug --enable-event-debug --enable-mem-debug 
> --enable-mem-profile
> The compiler is the GNU C compiler.
> 
> This gives a lot of information in the console while debugging, but it is not very 
> useful. I am not sure whether I should run 'make install' to install Open MPI to 
> /usr, simply add the Open MPI source tree to the project, or both. Open MPI has an 
> examples folder, but I don't know how to use that code directly as my source 
> code. For now I can step into the Open MPI source code, but sometimes I cannot 
> toggle breakpoints. Attached is my current debug configuration.
> 
> Good day,
> 
> Alvyn
> 
> A screen shot:
> https://www.dropbox.com/s/s105m2qgi14oj2y/Screenshot-Parallel%20Debug%20-%20ompitest-build-ompi-mpi-c-profile-pinit.c%20-%20Eclipse%20.png?dl=0
>  
> 
> Eclipse configuration:
> https://www.dropbox.com/s/5fnrqyga842w0e0/eclipse.conf?dl=0 
> 
> 



Re: [OMPI devel] Question about tight integration with not-yet-supported queuing systems

2014-11-28 Thread Ralph Castain
Hey Marc - just wanted to check to see if you felt this would indeed solve the 
problem for you. I’d rather not invest the time if this isn’t going to meet the 
need, and I honestly don’t know of a better solution.


> On Nov 20, 2014, at 2:13 PM, Ralph Castain  wrote:
> 
> Here’s what I can provide:
> 
> * lsrun -n N bash   This causes OpenLava to create an allocation and drop you 
> into a bash shell (or the shell of your choice).
> 
> * mpirun …..   This will read the allocation and use OpenLava to start the 
> daemons, and then the application, on the allocated nodes.
> 
> You can execute as many mpirun’s as you like, then release the allocation (I 
> believe by simply exiting the shell) when done.
> 
> Does that match your expectations?
> Ralph
> 
> 
>> On Nov 20, 2014, at 2:03 PM, Marc Höppner wrote:
>> 
>> Hi,
>> 
>> Yes, lsrun exists under OpenLava. 
>> 
>> Using mpirun is fine, but OpenLava currently requires it to be launched 
>> through a bash script (openmpi-mpirun). It would be neater if one could do 
>> away with that. 
>> 
>> Again, thanks for looking into this!
>> 
>> /Marc
>> 
>>> Hold on - I was discussing this with a (possibly former) OpenLava developer 
>>> who made some suggestions that would make this work. It all hinges on one 
>>> thing.
>>> 
>>> Can you please check and see if you have “lsrun” on your system? If you do, 
>>> then I can offer a tight integration in that we would use OpenLava to 
>>> actually launch the OMPI daemons. Still not sure I could support you 
>>> directly launching MPI apps without using mpirun, if that’s what you are 
>>> after.
>>> 
>>> 
On Nov 18, 2014, at 7:58 AM, Marc Höppner wrote:
 
 Hi Ralph,
 
I really appreciate you guys looking into this! At least now I know that 
there isn't a better way to run MPI jobs. It's probably worth looking into LSF 
again.
 
 Cheers,
 
 Marc
> I took a brief gander at the OpenLava source code, and a couple of things 
> jump out. First, OpenLava is a batch scheduler and only supports batch 
> execution - there is no interactive command for "run this job". So you 
> would have to "bsub" mpirun regardless.
> 
> Once you submit the job, mpirun can certainly read the local allocation 
> via the environment. However, we cannot use the OpenLava internal 
> functions to launch the daemons or processes, as the code is GPLv2 and 
> thus has an incompatible (viral) license. Ordinarily we get around that by 
> just executing the interactive job-execution command, but OpenLava 
> doesn't have one.
> 
> So we'd have no other choice but to use ssh to launch the daemons on the 
> remote nodes. This is exactly what the provided openmpi wrapper script 
> that comes with OpenLava already does.
> 
> Bottom line: I don't see a way to do any deeper integration without an 
> interactive execution command. If OpenLava had a way of getting an 
> allocation and then interactively running jobs, we could support what you 
> requested. This doesn't seem to be what they are intending, unless I'm 
> missing something (the documentation is rather incomplete).
> 
> Ralph
> 
> 
> On Tue, Nov 18, 2014 at 6:20 AM, Marc Höppner wrote:
> Hi,
> 
> Sure, no problem. And about the C API, I really don't know more than what 
> I was told in the Google Groups post I referred to (i.e. the API is 
> essentially identical to LSF 4-6, which should be documented on the web).
> 
> The output of env can be found here: 
> https://dl.dropboxusercontent.com/u/1918141/env.txt 
> 
> 
> /M
> 
> Marc P. Hoeppner, PhD
> Team Leader
> BILS Genome Annotation Platform
> Department for Medical Biochemistry and Microbiology
> Uppsala University, Sweden
> marc.hoepp...@bils.se 
>> On 18 Nov 2014, at 15:14, Ralph Castain wrote:
>> 
>> If you could just run a single copy of "env" and send the output along, 
>> that would help a lot. I'm not interested in the usual path etc, but 
>> would like to see the envars that OpenLava is setting.
>> 
>> Thanks
>> Ralph
>> 
>> 
>> On Tue, Nov 18, 2014 at 2:19 AM, Gilles Gouaillardet wrote:
>> Marc,
>> 
>> the reply you pointed to is a bit confusing to me:
>> 
>> "There is a native C API which can submit/start/stop/kill/re-queue jobs"
>> this is not what I am looking for :-(
>> 
>> "you need to make an appropriate call to openlava to start a remote 
>> process"
>> this is what I am interested in :-)
>> could you be more specific (e.g. point me to the functions, since the 
>> OpenLava doc is pretty minima