Re: [OMPI users] deadlock on intercommunicator after MPI_Comm_spawn_multiple (OS X / Linux)

2012-12-04 Thread Valentin Clement
Hi, 

Thanks, with this the problem is gone. In fact, neither interface causes a 
problem when it is the only one active. Anyway, passing the options to mpiexec 
is just fine for me for the moment. 

Regards

Valentin 
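
If retyping the flags on every run becomes tedious, Open MPI also reads MCA
parameters from environment variables named OMPI_MCA_<parameter>. The lines
below are a minimal sketch of that alternative, assuming en0 is the interface
to exclude (en1 for the wireless interface, as described in the quoted reply):

```shell
# Same two MCA parameters as on the mpiexec command line, set through the
# environment instead. en0 is taken from the quoted advice; adjust as needed.
export OMPI_MCA_oob_tcp_if_exclude=en0
export OMPI_MCA_btl_tcp_if_exclude=en0
```

With these exported in the shell that launches the job, mpiexec should pick
them up without any -mca flags on the command line.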


On Dec 5, 2012, at 11:19 AM, Ralph Castain  wrote:

> Strange - that shouldn't be happening. However, to get thru it, just restrict 
> the interfaces OMPI uses. If it's the hardwired Ethernet that is causing the 
> problem, then add
> 
> -mca oob_tcp_if_exclude en0 -mca btl_tcp_if_exclude en0
> 
> to your cmd line. If it's the wireless, then substitute en1 for en0 in the 
> above.
> 
> 
> 
> On Tue, Dec 4, 2012 at 5:16 PM, Valentin Clement  
> wrote:
> Hi, 
> 
> It seems the problem happens when I have two active interfaces on my 
> computer. Is there any configuration that allows using 
> MPI_Comm_spawn_multiple on a machine with multiple interfaces? 
> 
> Regards, 
> 
> Valentin 
> 
> On Dec 3, 2012, at 3:00 PM, Valentin Clement  
> wrote:
> 
>> Hi, 
>> 
>> I'm calling MPI_Comm_spawn_multiple in a fairly large application, and I've 
>> seen a deadlock occur in a very strange situation. When I run my 
>> application on Ubuntu 12.10 with OpenMPI 1.6.3, there is absolutely no 
>> problem. 
>> 
>> On Mac OS X 10.8.2, also with OpenMPI 1.6.3, I get a deadlock 
>> on the intercommunicator returned by MPI_Comm_spawn_multiple, but only if 
>> my Ethernet interface is enabled. If I disable it, the deadlock is gone. 
>> 
>> Does anyone have an idea of what is happening? I have attached the output 
>> of ompi_info on both OS X and Linux. 
>> 
>> Regards,
>> 
>> Valentin 
>> 
>> 
>> -
>> Valentin Clement - Student trainee at RIKEN AICS
>> Programming Environment Research Team
>> valentin.clem...@hefr.ch
>> valentin.clem...@riken.jp
>> Master thesis project
>> POP-C++ on the K Computer 
>> Project homepage: https://forge.tic.eia-fr.ch/projects/poponk
>> Project board: https://forge.tic.eia-fr.ch/projects/poponk/wiki/Wiki
>> -
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] deadlock on intercommunicator after MPI_Comm_spawn_multiple (OS X / Linux)

2012-12-04 Thread Ralph Castain
Strange - that shouldn't be happening. However, to get thru it, just
restrict the interfaces OMPI uses. If it's the hardwired Ethernet that is
causing the problem, then add

-mca oob_tcp_if_exclude en0 -mca btl_tcp_if_exclude en0

to your cmd line. If it's the wireless, then substitute en1 for en0 in the
above.
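
As a rough illustration of where those flags go, the sketch below assembles a
full command line. The application name ./my_app and the process count are
hypothetical placeholders; en0 is the wired Ethernet interface mentioned above.

```shell
# Assumption: en0 is the interface to exclude; use en1 for the wireless one.
EXCLUDE_IF="en0"
MCA_OPTS="-mca oob_tcp_if_exclude ${EXCLUDE_IF} -mca btl_tcp_if_exclude ${EXCLUDE_IF}"

# Print the resulting command line. Remove the leading "echo" to launch the
# (hypothetical) application ./my_app for real.
echo mpiexec -n 1 ${MCA_OPTS} ./my_app
```

Note that setting an exclude list replaces the default one, which normally
covers the loopback interface, so it may be worth adding the loopback device
(lo0 on OS X) to the comma-separated list as well.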



On Tue, Dec 4, 2012 at 5:16 PM, Valentin Clement
wrote:

> Hi,
>
> It seems the problem happens when I have two active interfaces on my
> computer. Is there any configuration that allows using
> MPI_Comm_spawn_multiple on a machine with multiple interfaces?
>
> Regards,
>
> Valentin
>
> On Dec 3, 2012, at 3:00 PM, Valentin Clement 
> wrote:
>
> Hi,
>
> I'm calling MPI_Comm_spawn_multiple in a fairly large application, and I've
> seen a deadlock occur in a very strange situation. When I run my
> application on Ubuntu 12.10 with OpenMPI 1.6.3, there is absolutely no
> problem.
>
> On Mac OS X 10.8.2, also with OpenMPI 1.6.3, I get a deadlock on the
> intercommunicator returned by MPI_Comm_spawn_multiple, but only if my
> Ethernet interface is enabled. If I disable it, the deadlock is gone.
>
> Does anyone have an idea of what is happening? I have attached the output
> of ompi_info on both OS X and Linux.
>
> Regards,
>
> Valentin


Re: [OMPI users] deadlock on intercommunicator after MPI_Comm_spawn_multiple (OS X / Linux)

2012-12-04 Thread Valentin Clement
Hi, 

It seems the problem happens when I have two active interfaces on my 
computer. Is there any configuration that allows using 
MPI_Comm_spawn_multiple on a machine with multiple interfaces? 

Regards, 

Valentin 

On Dec 3, 2012, at 3:00 PM, Valentin Clement  wrote:

> Hi, 
> 
> I'm calling MPI_Comm_spawn_multiple in a fairly large application, and I've 
> seen a deadlock occur in a very strange situation. When I run my 
> application on Ubuntu 12.10 with OpenMPI 1.6.3, there is absolutely no 
> problem. 
> 
> On Mac OS X 10.8.2, also with OpenMPI 1.6.3, I get a deadlock 
> on the intercommunicator returned by MPI_Comm_spawn_multiple, but only if my 
> Ethernet interface is enabled. If I disable it, the deadlock is gone. 
> 
> Does anyone have an idea of what is happening? I have attached the output of 
> ompi_info on both OS X and Linux. 
> 
> Regards,
> 
> Valentin 


Re: [OMPI users] Windows support for OpenMPI

2012-12-04 Thread Durga Choudhury
All

Since I did not see any Microsoft/other 'official' folks pick up the ball,
let me step up. I have been lurking on this list for quite a while and I am
a generic scientific programmer (i.e. I use many frameworks such as
OpenCL/OpenMP etc., not just MPI).
Although I am primarily a Linux user, I do own licenses for multiple versions
of Visual Studio and have a small cluster that dual boots to Windows/Linux
(and more nodes can be added on demand). I cannot do any large scale
testing on this, but I can build and run regression tests etc.

If the community needs the Windows support to continue, I can take up that
responsibility, until a more capable person/group is found at least.

Thanks
Durga


On Mon, Dec 3, 2012 at 12:32 PM, Damien  wrote:

> All,
>
> I completely missed the message about Shiqing departing as the OpenMPI
> Windows maintainer.  I'll try and keep Windows builds going for 1.6 at
> least, I have 2011 and 2013 Intel licenses and VS2008 and 2012, but not
> 2010.  I see that the 1.6.3 code base already doesn't build on Windows in
> VS2012  :-(.
>
> While I can try and keep builds going, I don't have access to a Windows
> cluster right now, and I'm flat out on two other projects. I can test on my
> workstation, but that will only go so far. Longer-term, there needs to be a
> decision made on whether Windows gets to be a first-class citizen in
> OpenMPI or not.  Jeff's already told me that 1.7 is lagging behind on
> Windows.  It would be a shame to have all the work Shiqing put in gradually
> decay because it can't be supported enough.  If there's any
> Microsoft/HPC/Azure folks observing this list, or any other vendors who run
> on Windows with OpenMPI, maybe we can see what can be done if you're
> interested.
>
> Damien


Re: [OMPI users] BLCR + Qlogic infiniband

2012-12-04 Thread William Hay
On 28 November 2012 11:14, William Hay  wrote:

> I'm trying to build openmpi with support for BLCR plus qlogic infiniband
> (plus grid engine).  Everything seems to compile OK and checkpoints are
> taken but whenever I try to restore a checkpoint I get the following error:
> - do_mmap(, 2aaab18c7000, 1000, ...) failed:
> ffea
> - mmap failed: /dev/ipath
> - thaw_threads returned error, aborting. -22
> - thaw_threads returned error, aborting. -22
> Restart failed: Invalid argument
>
> This occurs whether I specify psm or openib as the btl.
>
> This looks like the sort of thing I would expect to be handled by the blcr
> supporting code in openmpi.  So I guess I have a couple of questions.
> 1) Are Infiniband and BLCR support in openmpi compatible?
> 2) Are there any special tricks necessary to get them working together?
>
A third question occurred to me that may be relevant.  How do I verify
that my openmpi install has blcr support built in?  I would have thought
this would mean that either mpiexec or binaries built with mpicc would have
libcr linked in.  However, running ldd doesn't report this in either case.
I'm setting LD_PRELOAD to point to it, but I would have thought openmpi
would need to register a callback with blcr, and it would be easier to do
this if the library were linked in rather than trying to detect whether it
has been LD_PRELOADed.  I'm building with the following options:

./configure --prefix=/home/ccaawih/openmpi-blcr --with-openib --without-psm \
    --with-blcr=/usr --with-blcr-libdir=/usr/lib64 --with-ft=cr \
    --enable-ft-thread --enable-mpi-threads --with-sge
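
One way to approach the verification question is to ask ompi_info which
checkpoint/restart (crs) components were built, instead of relying on ldd.
The sketch below assumes that a BLCR-enabled build lists a line of the form
"MCA crs: blcr (...)" in ompi_info output; the helper name has_blcr_crs is
made up for this illustration.

```shell
# has_blcr_crs: read ompi_info output on stdin and succeed if a BLCR crs
# component is listed. The matched line format is an assumption.
has_blcr_crs() {
    grep -Eq 'MCA crs:[[:space:]]*blcr'
}

# Only query ompi_info if it is actually on the PATH.
if command -v ompi_info >/dev/null 2>&1; then
    if ompi_info | has_blcr_crs; then
        echo "crs:blcr component found"
    else
        echo "no crs:blcr component listed"
    fi
fi
```

A possible reason ldd shows nothing on mpiexec is that Open MPI usually builds
its components as separate plugins, so libcr would be a dependency of the
blcr plugin under <prefix>/lib/openmpi and not of mpiexec itself. Running ldd
on that plugin file may be more informative.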


Re: [OMPI users] CentOS 6.3 & OpenMPI 1.6.3

2012-12-04 Thread Joseph Farran

Hi Mike.

Removed the old mxm, downloaded and installed:

/tmp/mxm/v1.1/per-ofed/1.5.4.1/mxm-1.1.3a5e745-1.x86_64-rhel6u3.rpm

I am using OFED 1.5.4.1 and it still fails at the same spot:

make[2]: Entering directory `/data/apps/sources/openmpi-1.6.3/ompi/mca/mtl/mxm'
  CC mtl_mxm.lo
  CC mtl_mxm_cancel.lo
  CC mtl_mxm_component.lo
  CC mtl_mxm_endpoint.lo
  CC mtl_mxm_probe.lo
  CC mtl_mxm_recv.lo
  CC mtl_mxm_send.lo
  CCLD   mca_mtl_mxm.la
/bin/grep: /usr/local/mofed-inst/1.5.4.1/lib/librdmacm.la: No such file or directory
/bin/sed: can't read /usr/local/mofed-inst/1.5.4.1/lib/librdmacm.la: No such file or directory
libtool: link: `/usr/local/mofed-inst/1.5.4.1/lib/librdmacm.la' is not a valid libtool archive
make[2]: *** [mca_mtl_mxm.la] Error 1
make[2]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi/mca/mtl/mxm'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/data/apps/sources/openmpi-1.6.3/ompi'
make: *** [all-recursive] Error 1
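
The libtool messages above suggest that some .la file on the link line names
/usr/local/mofed-inst/1.5.4.1/lib/librdmacm.la in its dependency_libs entry,
while no such file exists on this machine. A small sketch for finding which
archive carries that reference follows; the helper name list_la_refs and the
directory passed to it are assumptions for illustration, not taken from the
build log.

```shell
# list_la_refs DIR NAME: print every .la file under DIR whose dependency_libs
# line mentions NAME. Prints nothing if DIR is missing or nothing matches.
list_la_refs() {
    search_dir="$1"
    lib_name="$2"
    find "$search_dir" -type f -name '*.la' \
        -exec grep -l "dependency_libs=.*${lib_name}" {} + 2>/dev/null
}
```

It could be called as "list_la_refs <mxm install dir> librdmacm.la", where the
directory is wherever the mxm RPM placed its libraries on the build host.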


On 12/2/2012 10:18 PM, Mike Dubman wrote:

ohh.. you have MOFED 1.5.4.1, thought it was 1.5.3-3.1.0
will provide you a link to mxm package compiled with this MOFED version (thanks 
to no ABI in OFED).

On Sun, Dec 2, 2012 at 10:04 PM, Joseph Farran <jfar...@uci.edu> wrote:

1.5.4.1