Re: [OMPI users] problem about mpirun on two nodes

2016-05-23 Thread Jeff Squyres (jsquyres)
You might want to test with some known-good MPI applications first.  Try 
following the steps in this FAQ item:

https://www.open-mpi.org/faq/?category=running#diagnose-multi-host-problems
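For context, the checks in that FAQ item boil down to something along these lines
(the host names are placeholders for your own nodes, and ring_c is one of the
trivial programs you can build under examples/ in the Open MPI tarball):

$ mpirun -np 2 --host master,slave hostname    # non-MPI program: exercises only ssh and daemon startup
$ mpirun -np 2 --host master,slave ./ring_c    # simple MPI program: exercises the MPI communication path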


> On May 23, 2016, at 2:31 PM, dour...@aol.com wrote:
> 
> Jeff, Thank you for your advice.
> 
> My bad. I grabbed the wrong screenshot, because I tested so many different 
> settings. After I came back to the original network settings, the "permission 
> denied" error of course disappeared, but the other messages were still there. 
> The master node has two NICs, one for the WAN (via another server) with 
> zone=external and the other for the slave node, with zone=internal. The NICs 
> on the master are on different subnets. The NIC on the slave node is set to 
> 'internal'. Their status was confirmed with firewall-cmd --get-active-zones. 
> 
> I temporarily stopped firewalld and the error messages disappeared. I saw six 
> processes running on each node, but now all the processes keep running 
> forever at 100% CPU usage.
> 
> 
> -Original Message-
> From: Jeff Squyres (jsquyres) <jsquy...@cisco.com>
> To: Open MPI User's List <us...@open-mpi.org>
> Sent: Mon, May 23, 2016 9:13 am
> Subject: Re: [OMPI users] problem about mpirun on two nodes
> 
> On May 21, 2016, at 11:31 PM, dour...@aol.com wrote:
>> 
>> I encountered a problem with mpirun and SSH when using Open MPI 1.10.0 
>> compiled with gcc, running on CentOS 7.2.
>> When I execute mpirun on my 2-node cluster, I get the errors pasted below.
>> 
>> [douraku@master home]$ mpirun -np 12 a.out
>> Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
> 
> This is the key right here: you got a permission denied error when you 
> (assumedly) tried to execute on the remote server.
> 
> Triple check your ssh settings to ensure that you can run on the remote 
> server(s) without a password or interactive passphrase entry.
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2016/05/29282.php
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2016/05/29290.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI users] problem about mpirun on two nodes

2016-05-23 Thread douraku
Jeff, Thank you for your advice.

My bad. I grabbed the wrong screenshot, because I tested so many different 
settings. After I came back to the original network settings, the "permission 
denied" error of course disappeared, but the other messages were still there. 
The master node has two NICs, one for the WAN (via another server) with 
zone=external and the other for the slave node, with zone=internal. The NICs on 
the master are on different subnets. The NIC on the slave node is set to 
'internal'. Their status was confirmed with firewall-cmd --get-active-zones.

I temporarily stopped firewalld and the error messages disappeared. I saw six 
processes running on each node, but now all the processes keep running forever 
at 100% CPU usage.
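
For what it is worth, here is a sketch of what I plan to try next (the subnet 
and the port range are my guesses, not confirmed values): keep firewalld 
running but open the internal zone, and pin Open MPI to the node-to-node subnet 
so it does not try the external NIC.

$ sudo firewall-cmd --zone=internal --add-port=1024-65535/tcp    # allow the dynamic TCP ports MPI picks
$ mpirun -np 12 --mca oob_tcp_if_include 192.168.1.0/24 \
                --mca btl_tcp_if_include 192.168.1.0/24 a.out    # restrict both channels to the internal subnet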


-Original Message-
From: Jeff Squyres (jsquyres) <jsquy...@cisco.com>
To: Open MPI User's List <us...@open-mpi.org>
Sent: Mon, May 23, 2016 9:13 am
Subject: Re: [OMPI users] problem about mpirun on two nodes

On May 21, 2016, at 11:31 PM, dour...@aol.com wrote:
> 
> I encountered a problem with mpirun and SSH when using Open MPI 1.10.0 
> compiled with gcc, running on CentOS 7.2.
> When I execute mpirun on my 2-node cluster, I get the errors pasted below.
> 
> [douraku@master home]$ mpirun -np 12 a.out
> Permission denied (publickey,gssapi-keyex,gssapi-with-mic).

This is the key right here: you got a permission denied error when you 
(assumedly) tried to execute on the remote server.

Triple check your ssh settings to ensure that you can run on the remote 
server(s) without a password or interactive passphrase entry.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

___
users mailing list
us...@open-mpi.org
Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2016/05/29282.php


Re: [OMPI users] problem about mpirun on two nodes

2016-05-23 Thread Jeff Squyres (jsquyres)
On May 21, 2016, at 11:31 PM, dour...@aol.com wrote:
> 
> I encountered a problem with mpirun and SSH when using Open MPI 1.10.0 
> compiled with gcc, running on CentOS 7.2.
> When I execute mpirun on my 2-node cluster, I get the errors pasted below.
> 
> [douraku@master home]$ mpirun -np 12 a.out
> Permission denied (publickey,gssapi-keyex,gssapi-with-mic).

This is the key right here: you got a permission denied error when you 
(assumedly) tried to execute on the remote server.

Triple check your ssh settings to ensure that you can run on the remote 
server(s) without a password or interactive passphrase entry.
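
A minimal sketch of that check (the host name "slave" is a placeholder; the key 
options are just one common choice):

$ ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa     # key with no passphrase
$ ssh-copy-id douraku@slave                    # install the public key on the other node
$ ssh -o BatchMode=yes slave hostname          # must print the host name without any prompt

The last command has to succeed from every node that mpirun will launch from.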

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



[OMPI users] problem about mpirun on two nodes

2016-05-22 Thread douraku
Hi all

I encountered a problem with mpirun and SSH when using Open MPI 1.10.0 compiled 
with gcc, running on CentOS 7.2.
When I execute mpirun on my 2-node cluster, I get the errors pasted below.


[douraku@master home]$ mpirun -np 12 a.out
Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
--
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
  one or more nodes. Please check your PATH and LD_LIBRARY_PATH
  settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
  Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
  Please check with your sys admin to determine the correct location to use.

*  compilation of the orted with dynamic libraries when static are required
  (e.g., on Cray). Please check your configure cmd line and consider using
  one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
  lack of common network interfaces and/or no route found between
  them. Please check network connectivity (including firewalls
  and network routing requirements).
--


Here is some information about my settings.
- When only the master node is used, this does not happen.
- Open MPI is installed in /opt/openmpi-1.10.0/ on the master node.
- /opt and /home are exported and NFS-mounted on the slave node.
- The master and slave nodes and their numbers of CPUs are listed in the 
openmpi-default-hostfile (a sketch is shown after this list).
- The path to the MPI library was confirmed (no doubt there, because /home and 
/opt are shared).
- Password-less login using a public key has been configured, so I can log in 
from master to slave, or slave to master, without a password.
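
For reference, a sketch of what I mean (the slot counts are illustrative, and 
the extra checks are ones I can run from the master):

# openmpi-default-hostfile
master slots=6
slave  slots=6

$ ssh slave which orted                             # the non-interactive PATH on the slave must find orted
$ mpirun -np 2 --host master,slave hostname         # daemons only, no MPI traffic
$ mpirun -np 12 --prefix /opt/openmpi-1.10.0 a.out  # or rebuild with --enable-orterun-prefix-by-default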

I see similar issues in the FAQ for systems consisting of multiple slave nodes, 
where ssh login is necessary between the slave nodes because of Open MPI's 
"tree structure" launch. So I am puzzled why the same issue occurs, since my 
system does not have multiple slave nodes (and password-less login was 
established in both directions).
I hope I can get some suggestions for solving this issue.

Many thanks in advance.


Re: [OMPI users] Problem with mpirun for java codes

2013-01-18 Thread Jeff Squyres (jsquyres)
If the examples didn't work for you, then something else was wrong (or there's 
some bug that we're unaware of) -- we're all able to run the examples ok.  We 
pulled Java from the 1.7.0 release because of issues with multi-dimensional 
arrays, MPI.OBJECT weirdness, ...etc.  Basic functionality, like the examples, 
should work fine.



On Jan 18, 2013, at 8:20 PM, Karos Lotfifar <foad...@gmail.com>
 wrote:

> Hi Chuck, 
> 
> No, I tried the examples but they did not work as well. Hope the issues would 
> be resolved soon.
> 
> Regards,
> Karos
> 
> On 18 Jan 2013, at 20:26, Ralph Castain <r...@open-mpi.org> wrote:
> 
>> I expect it to be in the 1.7.1 release - we just need some time to really 
>> test it and clean things up.
>> 
>> Meantime, it is available in the developer's nightly tarball, or via svn 
>> checkout.
>> 
>> 
>> On Jan 18, 2013, at 12:23 PM, Chuck Mosher <chuckmos...@yahoo.com> wrote:
>> 
>>> Ralph - I'm relying on you guys to support Java, hope it makes it back in 
>>> soon !!!
>>> 
>>> I've had no problems, by the way.
>>> 
>>> Karos - Were you able to make and run the Java examples in the 
>>> MPI_ROOT/examples directory  ?
>>> 
>>> I started with those after similar hiccups trying to get things up and 
>>> running.
>>> 
>>> Chuck Mosher
>>> JavaSeis.org
>>> 
>>> From: Ralph Castain <r...@open-mpi.org>
>>> To: Open MPI Users <us...@open-mpi.org> 
>>> Sent: Thursday, January 17, 2013 2:27 PM
>>> Subject: Re: [OMPI users] Problem with mpirun for java codes
>>> 
>>> Just as an FYI: we have removed the Java bindings from the 1.7.0 release 
>>> due to all the reported errors - looks like that code just isn't ready yet 
>>> for release. It remains available on the nightly snapshots of the 
>>> developer's trunk while we continue to debug it.
>>> 
>>> With that said, I tried your example using the current developer's trunk - 
>>> and it worked just fine.
>>> 
>>> I ran it on a single node, however. Were you running this across multiple 
>>> nodes? Is it possible that the "classes" directory wasn't available on the 
>>> remote node?
>>> 
>>> 
>>> On Jan 16, 2013, at 4:17 PM, Karos Lotfifar <foad...@gmail.com> wrote:
>>> 
>>>> Hi, 
>>>> The version that I am using is 
>>>> 
>>>> 1.7rc6 (pre-release)
>>>> 
>>>> 
>>>> Regards,
>>>> Karos
>>>> 
>>>> On 16 Jan 2013, at 21:07, Ralph Castain <r...@open-mpi.org> wrote:
>>>> 
>>>>> Which version of OMPI are you using?
>>>>> 
>>>>> 
>>>>> On Jan 16, 2013, at 11:43 AM, Karos Lotfifar <foad...@gmail.com> wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> I am still struggling with the installation problems! I get very strange 
>>>>>> errors. everything is fine when I run OpenMPI for C codes, but when I 
>>>>>> try to run a simple java code I get very strange error. The code is as 
>>>>>> simple as the following and I can not get it running:
>>>>>> 
>>>>>> import mpi.*;
>>>>>> 
>>>>>> class JavaMPI {
>>>>>>   public static void main(String[] args) throws MPIException {
>>>>>> MPI.Init(args);
>>>>>> System.out.println("Hello world from rank " + 
>>>>>>   MPI.COMM_WORLD.Rank() + " of " +
>>>>>>   MPI.COMM_WORLD.Size() );
>>>>>> MPI.Finalize();
>>>>>>   }
>>>>>> } 
>>>>>> 
>>>>>> everything is ok with mpijavac, my java code, etc. when I try to run the 
>>>>>> code with the following command:
>>>>>> 
>>>>>> /usr/local/bin/mpijavac -d classes JavaMPI.java   --> FINE
>>>>>> /usr/local/bin/mpirun -np 2 java -cp ./classes JavaMPI  --> *ERROR*
>>>>>> 
>>>>>> I'll the following error. Could you please help me about this (As I 
>>>>>> mentioned the I can run C MPI codes without any problem ). The system 
>>>>>> specifications are:
>>>>>> 
>>>>>> JRE version: 6.0_30-b12 (java-sun-6)
>>>>>> OS: Linux 3.0.0-30-generic-pae #47-Ubuntu
>>

Re: [OMPI users] Problem with mpirun for java codes

2013-01-18 Thread Karos Lotfifar
Hi Chuck, 

No, I tried the examples but they did not work as well. Hope the issues would 
be resolved soon.

Regards,
Karos

On 18 Jan 2013, at 20:26, Ralph Castain <r...@open-mpi.org> wrote:

> I expect it to be in the 1.7.1 release - we just need some time to really 
> test it and clean things up.
> 
> Meantime, it is available in the developer's nightly tarball, or via svn 
> checkout.
> 
> 
> On Jan 18, 2013, at 12:23 PM, Chuck Mosher <chuckmos...@yahoo.com> wrote:
> 
>> Ralph - I'm relying on you guys to support Java, hope it makes it back in 
>> soon !!!
>> 
>> I've had no problems, by the way.
>> 
>> Karos - Were you able to make and run the Java examples in the 
>> MPI_ROOT/examples directory ?
>> 
>> I started with those after similar hiccups trying to get things up and 
>> running.
>> 
>> Chuck Mosher
>> JavaSeis.org
>> 
>> From: Ralph Castain <r...@open-mpi.org>
>> To: Open MPI Users <us...@open-mpi.org> 
>> Sent: Thursday, January 17, 2013 2:27 PM
>> Subject: Re: [OMPI users] Problem with mpirun for java codes
>> 
>> Just as an FYI: we have removed the Java bindings from the 1.7.0 release due 
>> to all the reported errors - looks like that code just isn't ready yet for 
>> release. It remains available on the nightly snapshots of the developer's 
>> trunk while we continue to debug it.
>> 
>> With that said, I tried your example using the current developer's trunk - 
>> and it worked just fine.
>> 
>> I ran it on a single node, however. Were you running this across multiple 
>> nodes? Is it possible that the "classes" directory wasn't available on the 
>> remote node?
>> 
>> 
>> On Jan 16, 2013, at 4:17 PM, Karos Lotfifar <foad...@gmail.com> wrote:
>> 
>>> Hi, 
>>> The version that I am using is 
>>> 
>>> 1.7rc6 (pre-release)
>>> 
>>> 
>>> Regards,
>>> Karos
>>> 
>>> On 16 Jan 2013, at 21:07, Ralph Castain <r...@open-mpi.org> wrote:
>>> 
>>>> Which version of OMPI are you using?
>>>> 
>>>> 
>>>> On Jan 16, 2013, at 11:43 AM, Karos Lotfifar <foad...@gmail.com> wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> I am still struggling with the installation problems! I get very strange 
>>>>> errors. everything is fine when I run OpenMPI for C codes, but when I try 
>>>>> to run a simple java code I get very strange error. The code is as simple 
>>>>> as the following and I can not get it running:
>>>>> 
>>>>> import mpi.*;
>>>>> 
>>>>> class JavaMPI {
>>>>>   public static void main(String[] args) throws MPIException {
>>>>> MPI.Init(args);
>>>>> System.out.println("Hello world from rank " + 
>>>>>   MPI.COMM_WORLD.Rank() + " of " +
>>>>>   MPI.COMM_WORLD.Size() );
>>>>> MPI.Finalize();
>>>>>   }
>>>>> } 
>>>>> 
>>>>> everything is ok with mpijavac, my java code, etc. when I try to run the 
>>>>> code with the following command:
>>>>> 
>>>>> /usr/local/bin/mpijavac -d classes JavaMPI.java   --> FINE
>>>>> /usr/local/bin/mpirun -np 2 java -cp ./classes JavaMPI  --> *ERROR*
>>>>> 
>>>>> I'll the following error. Could you please help me about this (As I 
>>>>> mentioned the I can run C MPI codes without any problem ). The system 
>>>>> specifications are:
>>>>> 
>>>>> JRE version: 6.0_30-b12 (java-sun-6)
>>>>> OS: Linux 3.0.0-30-generic-pae #47-Ubuntu
>>>>> CPU:total 4 (2 cores per cpu, 2 threads per core) family 6 model 42 
>>>>> stepping 7, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, 
>>>>> popcnt, ht
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> ##
>>>>> #
>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>> #
>>>>> #  SIGSEGV#
>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>> #
>>>>> #  SIGSEGV (0xb) at pc=0x70e1dd12, pid=28616, tid=3063311216
>>>>> #
>>>

Re: [OMPI users] Problem with mpirun for java codes

2013-01-18 Thread Ralph Castain
I expect it to be in the 1.7.1 release - we just need some time to really test 
it and clean things up.

Meantime, it is available in the developer's nightly tarball, or via svn 
checkout.


On Jan 18, 2013, at 12:23 PM, Chuck Mosher <chuckmos...@yahoo.com> wrote:

> Ralph - I'm relying on you guys to support Java, hope it makes it back in 
> soon !!!
> 
> I've had no problems, by the way.
> 
> Karos - Were you able to make and run the Java examples in the 
> MPI_ROOT/examples directory ?
> 
> I started with those after similar hiccups trying to get things up and 
> running.
> 
> Chuck Mosher
> JavaSeis.org
> 
> From: Ralph Castain <r...@open-mpi.org>
> To: Open MPI Users <us...@open-mpi.org> 
> Sent: Thursday, January 17, 2013 2:27 PM
> Subject: Re: [OMPI users] Problem with mpirun for java codes
> 
> Just as an FYI: we have removed the Java bindings from the 1.7.0 release due 
> to all the reported errors - looks like that code just isn't ready yet for 
> release. It remains available on the nightly snapshots of the developer's 
> trunk while we continue to debug it.
> 
> With that said, I tried your example using the current developer's trunk - 
> and it worked just fine.
> 
> I ran it on a single node, however. Were you running this across multiple 
> nodes? Is it possible that the "classes" directory wasn't available on the 
> remote node?
> 
> 
> On Jan 16, 2013, at 4:17 PM, Karos Lotfifar <foad...@gmail.com> wrote:
> 
>> Hi, 
>> The version that I am using is 
>> 
>> 1.7rc6 (pre-release)
>> 
>> 
>> Regards,
>> Karos
>> 
>> On 16 Jan 2013, at 21:07, Ralph Castain <r...@open-mpi.org> wrote:
>> 
>>> Which version of OMPI are you using?
>>> 
>>> 
>>> On Jan 16, 2013, at 11:43 AM, Karos Lotfifar <foad...@gmail.com> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I am still struggling with the installation problems! I get very strange 
>>>> errors. everything is fine when I run OpenMPI for C codes, but when I try 
>>>> to run a simple java code I get very strange error. The code is as simple 
>>>> as the following and I can not get it running:
>>>> 
>>>> import mpi.*;
>>>> 
>>>> class JavaMPI {
>>>>   public static void main(String[] args) throws MPIException {
>>>> MPI.Init(args);
>>>> System.out.println("Hello world from rank " + 
>>>>   MPI.COMM_WORLD.Rank() + " of " +
>>>>   MPI.COMM_WORLD.Size() );
>>>> MPI.Finalize();
>>>>   }
>>>> } 
>>>> 
>>>> everything is ok with mpijavac, my java code, etc. when I try to run the 
>>>> code with the following command:
>>>> 
>>>> /usr/local/bin/mpijavac -d classes JavaMPI.java   --> FINE
>>>> /usr/local/bin/mpirun -np 2 java -cp ./classes JavaMPI  --> *ERROR*
>>>> 
>>>> I'll the following error. Could you please help me about this (As I 
>>>> mentioned the I can run C MPI codes without any problem ). The system 
>>>> specifications are:
>>>> 
>>>> JRE version: 6.0_30-b12 (java-sun-6)
>>>> OS: Linux 3.0.0-30-generic-pae #47-Ubuntu
>>>> CPU:total 4 (2 cores per cpu, 2 threads per core) family 6 model 42 
>>>> stepping 7, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, 
>>>> popcnt, ht
>>>> 
>>>> 
>>>> 
>>>> 
>>>> ##
>>>> #
>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>> #
>>>> #  SIGSEGV#
>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>> #
>>>> #  SIGSEGV (0xb) at pc=0x70e1dd12, pid=28616, tid=3063311216
>>>> #
>>>>  (0xb) at pc=0x70f61d12, pid=28615, tid=3063343984
>>>> #
>>>> # JRE version: 6.0_30-b12
>>>> # JRE version: 6.0_30-b12
>>>> # Java VM: Java HotSpot(TM) Server VM (20.5-b03 mixed mode linux-x86 )
>>>> # Problematic frame:
>>>> # C  [libmpi.so.1+0x20d12]  unsigned __int128+0xa2
>>>> #
>>>> # An error report file with more information is saved as:
>>>> # /home/karos/hs_err_pid28616.log
>>>> # Java VM: Java HotSpot(TM) Server VM (20.5-b03 mixed mode linux-x86 )
>>>> # Problematic frame:
>&g

Re: [OMPI users] Problem with mpirun for java codes

2013-01-18 Thread Chuck Mosher
Ralph - I'm relying on you guys to support Java, hope it makes it back in soon 
!!!

I've had no problems, by the way.

Karos - Were you able to make and run the Java examples in the 
MPI_ROOT/examples directory ?

I started with those after similar hiccups trying to get things up and running.
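
Roughly what that looks like, from memory (the mpi.jar path is an assumption 
about where your install put it; adjust to match):

$ cd $MPI_ROOT/examples
$ mpijavac Hello.java                                   # the wrapper compiler adds the MPI classes for you
$ mpirun -np 2 java -cp .:$MPI_ROOT/lib/mpi.jar Hello   # run two ranks of the bundled Hello example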

Chuck Mosher
JavaSeis.org



 From: Ralph Castain <r...@open-mpi.org>
To: Open MPI Users <us...@open-mpi.org> 
Sent: Thursday, January 17, 2013 2:27 PM
Subject: Re: [OMPI users] Problem with mpirun for java codes
 

Just as an FYI: we have removed the Java bindings from the 1.7.0 release due to 
all the reported errors - looks like that code just isn't ready yet for 
release. It remains available on the nightly snapshots of the developer's trunk 
while we continue to debug it.

With that said, I tried your example using the current developer's trunk - and 
it worked just fine.

I ran it on a single node, however. Were you running this across multiple 
nodes? Is it possible that the "classes" directory wasn't available on the 
remote node?



On Jan 16, 2013, at 4:17 PM, Karos Lotfifar <foad...@gmail.com> wrote:

Hi, 
>The version that I am using is 
>
>
>1.7rc6 (pre-release)
>
>
>
>
>Regards,
>Karos
>
>On 16 Jan 2013, at 21:07, Ralph Castain <r...@open-mpi.org> wrote:
>
>
>Which version of OMPI are you using?
>>
>>
>>
>>
>>On Jan 16, 2013, at 11:43 AM, Karos Lotfifar <foad...@gmail.com> wrote:
>>
>>Hi,
>>>
>>>I am still struggling with the installation problems! I get very strange 
>>>errors. everything is fine when I run OpenMPI for C codes, but when I try to 
>>>run a simple java code I get very strange error. The code is as simple as 
>>>the following and I can not get it running:
>>>
>>>import mpi.*;
>>>
>>>class JavaMPI {
>>>  public static void main(String[] args) throws MPIException {
>>>    MPI.Init(args);
>>>    System.out.println("Hello world from rank " + 
>>>  MPI.COMM_WORLD.Rank() + " of " +
>>>  MPI.COMM_WORLD.Size() );
>>>    MPI.Finalize();
>>>  }
>>>} 
>>>
>>>everything is ok with mpijavac, my java code, etc. when I try to run the 
>>>code with the following command:
>>>
>>>/usr/local/bin/mpijavac -d classes JavaMPI.java   --> FINE
>>>/usr/local/bin/mpirun -np 2 java -cp ./classes JavaMPI  --> *ERROR*
>>>
>>>I'll the following error. Could you please help me about this (As I 
>>>mentioned the I can run C MPI codes without any problem ). The system 
>>>specifications are:
>>>
>>>JRE version: 6.0_30-b12 (java-sun-6)
>>>OS: Linux 3.0.0-30-generic-pae #47-Ubuntu
>>>CPU:total 4 (2 cores per cpu, 2 threads per core) family 6 model 42 stepping 
>>>7, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, ht
>>>
>>>
>>>
>>>
>>>##
>>>#
>>># A fatal error has been detected by the Java Runtime Environment:
>>>#
>>>#  SIGSEGV#
>>># A fatal error has been detected by the Java Runtime Environment:
>>>#
>>>#  SIGSEGV (0xb) at pc=0x70e1dd12, pid=28616, tid=3063311216
>>>#
>>> (0xb) at pc=0x70f61d12, pid=28615, tid=3063343984
>>>#
>>># JRE version: 6.0_30-b12
>>># JRE version: 6.0_30-b12
>>># Java VM: Java HotSpot(TM) Server VM (20.5-b03 mixed mode linux-x86 )
>>># Problematic frame:
>>># C  [libmpi.so.1+0x20d12]  unsigned __int128+0xa2
>>>#
>>># An error report file with more information is saved as:
>>># /home/karos/hs_err_pid28616.log
>>># Java VM: Java HotSpot(TM) Server VM (20.5-b03 mixed mode linux-x86 )
>>># Problematic frame:
>>># C  [libmpi.so.1+0x20d12]  unsigned __int128+0xa2
>>>#
>>># An error report file with more information is saved as:
>>># /home/karos/hs_err_pid28615.log
>>>#
>>># If you would like to submit a bug report, please visit:
>>>#   http://java.sun.com/webapps/bugreport/crash.jsp
>>># The crash happened outside the Java Virtual Machine in native code.
>>># See problematic frame for where to report the bug.
>>>#
>>>[tulips:28616] *** Process received signal ***
>>>[tulips:28616] Signal: Aborted (6)
>>>[tulips:28616] Signal code:  (-6)
>>>[tulips:28616] [ 0] [0xb777840c]
>>>[tulips:28616] [ 1] [0xb7778424]
>>>[tulips:28616] [ 

Re: [OMPI users] Problem with mpirun for java codes

2013-01-17 Thread Ralph Castain
Just as an FYI: we have removed the Java bindings from the 1.7.0 release due to 
all the reported errors - looks like that code just isn't ready yet for 
release. It remains available on the nightly snapshots of the developer's trunk 
while we continue to debug it.

With that said, I tried your example using the current developer's trunk - and 
it worked just fine.

I ran it on a single node, however. Were you running this across multiple 
nodes? Is it possible that the "classes" directory wasn't available on the 
remote node?
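
A quick way to check that guess from the directory you launch in (host names 
below are placeholders):

$ mpirun -np 2 --host node1,node2 ls ./classes                   # every rank should see the compiled .class files
$ mpirun -np 2 --host node1,node2 java -cp $PWD/classes JavaMPI  # an absolute classpath avoids surprises with the remote cwd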


On Jan 16, 2013, at 4:17 PM, Karos Lotfifar  wrote:

> Hi, 
> The version that I am using is 
> 
> 1.7rc6 (pre-release)
> 
> 
> Regards,
> Karos
> 
> On 16 Jan 2013, at 21:07, Ralph Castain  wrote:
> 
>> Which version of OMPI are you using?
>> 
>> 
>> On Jan 16, 2013, at 11:43 AM, Karos Lotfifar  wrote:
>> 
>>> Hi,
>>> 
>>> I am still struggling with the installation problems! I get very strange 
>>> errors. everything is fine when I run OpenMPI for C codes, but when I try 
>>> to run a simple java code I get very strange error. The code is as simple 
>>> as the following and I can not get it running:
>>> 
>>> import mpi.*;
>>> 
>>> class JavaMPI {
>>>   public static void main(String[] args) throws MPIException {
>>> MPI.Init(args);
>>> System.out.println("Hello world from rank " + 
>>>   MPI.COMM_WORLD.Rank() + " of " +
>>>   MPI.COMM_WORLD.Size() );
>>> MPI.Finalize();
>>>   }
>>> } 
>>> 
>>> everything is ok with mpijavac, my java code, etc. when I try to run the 
>>> code with the following command:
>>> 
>>> /usr/local/bin/mpijavac -d classes JavaMPI.java   --> FINE
>>> /usr/local/bin/mpirun -np 2 java -cp ./classes JavaMPI  --> *ERROR*
>>> 
>>> I'll the following error. Could you please help me about this (As I 
>>> mentioned the I can run C MPI codes without any problem ). The system 
>>> specifications are:
>>> 
>>> JRE version: 6.0_30-b12 (java-sun-6)
>>> OS: Linux 3.0.0-30-generic-pae #47-Ubuntu
>>> CPU:total 4 (2 cores per cpu, 2 threads per core) family 6 model 42 
>>> stepping 7, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, 
>>> popcnt, ht
>>> 
>>> 
>>> 
>>> 
>>> ##
>>> #
>>> # A fatal error has been detected by the Java Runtime Environment:
>>> #
>>> #  SIGSEGV#
>>> # A fatal error has been detected by the Java Runtime Environment:
>>> #
>>> #  SIGSEGV (0xb) at pc=0x70e1dd12, pid=28616, tid=3063311216
>>> #
>>>  (0xb) at pc=0x70f61d12, pid=28615, tid=3063343984
>>> #
>>> # JRE version: 6.0_30-b12
>>> # JRE version: 6.0_30-b12
>>> # Java VM: Java HotSpot(TM) Server VM (20.5-b03 mixed mode linux-x86 )
>>> # Problematic frame:
>>> # C  [libmpi.so.1+0x20d12]  unsigned __int128+0xa2
>>> #
>>> # An error report file with more information is saved as:
>>> # /home/karos/hs_err_pid28616.log
>>> # Java VM: Java HotSpot(TM) Server VM (20.5-b03 mixed mode linux-x86 )
>>> # Problematic frame:
>>> # C  [libmpi.so.1+0x20d12]  unsigned __int128+0xa2
>>> #
>>> # An error report file with more information is saved as:
>>> # /home/karos/hs_err_pid28615.log
>>> #
>>> # If you would like to submit a bug report, please visit:
>>> #   http://java.sun.com/webapps/bugreport/crash.jsp
>>> # The crash happened outside the Java Virtual Machine in native code.
>>> # See problematic frame for where to report the bug.
>>> #
>>> [tulips:28616] *** Process received signal ***
>>> [tulips:28616] Signal: Aborted (6)
>>> [tulips:28616] Signal code:  (-6)
>>> [tulips:28616] [ 0] [0xb777840c]
>>> [tulips:28616] [ 1] [0xb7778424]
>>> [tulips:28616] [ 2] /lib/i386-linux-gnu/libc.so.6(gsignal+0x4f) [0xb75e3cff]
>>> [tulips:28616] [ 3] /lib/i386-linux-gnu/libc.so.6(abort+0x175) [0xb75e7325]
>>> [tulips:28616] [ 4] 
>>> /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(+0x5dcf7f) 
>>> [0xb6f6df7f]
>>> [tulips:28616] [ 5] 
>>> /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(+0x724897) 
>>> [0xb70b5897]
>>> [tulips:28616] [ 6] 
>>> /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(JVM_handle_linux_signal+0x21c)
>>>  [0xb6f7529c]
>>> [tulips:28616] [ 7] 
>>> /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(+0x5dff64) 
>>> [0xb6f70f64]
>>> [tulips:28616] [ 8] [0xb777840c]
>>> [tulips:28616] [ 9] [0xb3891548]
>>> [tulips:28616] *** End of error message ***
>>> [tulips:28615] *** Process received signal ***
>>> [tulips:28615] Signal: Aborted (6)
>>> [tulips:28615] Signal code:  (-6)
>>> #
>>> # If you would like to submit a bug report, please visit:
>>> #   http://java.sun.com/webapps/bugreport/crash.jsp
>>> # The crash happened outside the Java Virtual Machine in native code.
>>> # See problematic frame for where to report the bug.
>>> #
>>> [tulips:28615] [ 0] [0xb778040c]
>>> [tulips:28615] [ 1] [0xb7780424]
>>> [tulips:28615] [ 2] 

Re: [OMPI users] Problem with mpirun for java codes

2013-01-16 Thread Karos Lotfifar
Hi, 
The version that I am using is 

1.7rc6 (pre-release)


Regards,
Karos

On 16 Jan 2013, at 21:07, Ralph Castain  wrote:

> Which version of OMPI are you using?
> 
> 
> On Jan 16, 2013, at 11:43 AM, Karos Lotfifar  wrote:
> 
>> Hi,
>> 
>> I am still struggling with the installation problems! I get very strange 
>> errors. everything is fine when I run OpenMPI for C codes, but when I try to 
>> run a simple java code I get very strange error. The code is as simple as 
>> the following and I can not get it running:
>> 
>> import mpi.*;
>> 
>> class JavaMPI {
>>   public static void main(String[] args) throws MPIException {
>> MPI.Init(args);
>> System.out.println("Hello world from rank " + 
>>   MPI.COMM_WORLD.Rank() + " of " +
>>   MPI.COMM_WORLD.Size() );
>> MPI.Finalize();
>>   }
>> } 
>> 
>> everything is ok with mpijavac, my java code, etc. when I try to run the 
>> code with the following command:
>> 
>> /usr/local/bin/mpijavac -d classes JavaMPI.java   --> FINE
>> /usr/local/bin/mpirun -np 2 java -cp ./classes JavaMPI  --> *ERROR*
>> 
>> I'll the following error. Could you please help me about this (As I 
>> mentioned the I can run C MPI codes without any problem ). The system 
>> specifications are:
>> 
>> JRE version: 6.0_30-b12 (java-sun-6)
>> OS: Linux 3.0.0-30-generic-pae #47-Ubuntu
>> CPU:total 4 (2 cores per cpu, 2 threads per core) family 6 model 42 stepping 
>> 7, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, ht
>> 
>> 
>> 
>> 
>> ##
>> #
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> #  SIGSEGV#
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> #  SIGSEGV (0xb) at pc=0x70e1dd12, pid=28616, tid=3063311216
>> #
>>  (0xb) at pc=0x70f61d12, pid=28615, tid=3063343984
>> #
>> # JRE version: 6.0_30-b12
>> # JRE version: 6.0_30-b12
>> # Java VM: Java HotSpot(TM) Server VM (20.5-b03 mixed mode linux-x86 )
>> # Problematic frame:
>> # C  [libmpi.so.1+0x20d12]  unsigned __int128+0xa2
>> #
>> # An error report file with more information is saved as:
>> # /home/karos/hs_err_pid28616.log
>> # Java VM: Java HotSpot(TM) Server VM (20.5-b03 mixed mode linux-x86 )
>> # Problematic frame:
>> # C  [libmpi.so.1+0x20d12]  unsigned __int128+0xa2
>> #
>> # An error report file with more information is saved as:
>> # /home/karos/hs_err_pid28615.log
>> #
>> # If you would like to submit a bug report, please visit:
>> #   http://java.sun.com/webapps/bugreport/crash.jsp
>> # The crash happened outside the Java Virtual Machine in native code.
>> # See problematic frame for where to report the bug.
>> #
>> [tulips:28616] *** Process received signal ***
>> [tulips:28616] Signal: Aborted (6)
>> [tulips:28616] Signal code:  (-6)
>> [tulips:28616] [ 0] [0xb777840c]
>> [tulips:28616] [ 1] [0xb7778424]
>> [tulips:28616] [ 2] /lib/i386-linux-gnu/libc.so.6(gsignal+0x4f) [0xb75e3cff]
>> [tulips:28616] [ 3] /lib/i386-linux-gnu/libc.so.6(abort+0x175) [0xb75e7325]
>> [tulips:28616] [ 4] 
>> /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(+0x5dcf7f) 
>> [0xb6f6df7f]
>> [tulips:28616] [ 5] 
>> /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(+0x724897) 
>> [0xb70b5897]
>> [tulips:28616] [ 6] 
>> /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(JVM_handle_linux_signal+0x21c)
>>  [0xb6f7529c]
>> [tulips:28616] [ 7] 
>> /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(+0x5dff64) 
>> [0xb6f70f64]
>> [tulips:28616] [ 8] [0xb777840c]
>> [tulips:28616] [ 9] [0xb3891548]
>> [tulips:28616] *** End of error message ***
>> [tulips:28615] *** Process received signal ***
>> [tulips:28615] Signal: Aborted (6)
>> [tulips:28615] Signal code:  (-6)
>> #
>> # If you would like to submit a bug report, please visit:
>> #   http://java.sun.com/webapps/bugreport/crash.jsp
>> # The crash happened outside the Java Virtual Machine in native code.
>> # See problematic frame for where to report the bug.
>> #
>> [tulips:28615] [ 0] [0xb778040c]
>> [tulips:28615] [ 1] [0xb7780424]
>> [tulips:28615] [ 2] /lib/i386-linux-gnu/libc.so.6(gsignal+0x4f) [0xb75ebcff]
>> [tulips:28615] [ 3] /lib/i386-linux-gnu/libc.so.6(abort+0x175) [0xb75ef325]
>> [tulips:28615] [ 4] 
>> /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(+0x5dcf7f) 
>> [0xb6f75f7f]
>> [tulips:28615] [ 5] 
>> /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(+0x724897) 
>> [0xb70bd897]
>> [tulips:28615] [ 6] 
>> /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(JVM_handle_linux_signal+0x21c)
>>  [0xb6f7d29c]
>> [tulips:28615] [ 7] 
>> /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(+0x5dff64) 
>> [0xb6f78f64]
>> [tulips:28615] [ 8] [0xb778040c]
>> [tulips:28615] [ 9] [0xb3899548]
>> [tulips:28615] *** End of error message ***
>> 

Re: [OMPI users] Problem with mpirun for java codes

2013-01-16 Thread Ralph Castain
Which version of OMPI are you using?


On Jan 16, 2013, at 11:43 AM, Karos Lotfifar  wrote:

> Hi,
> 
> I am still struggling with the installation problems! I get very strange 
> errors. Everything is fine when I run Open MPI with C codes, but when I try to 
> run a simple Java code I get a very strange error. The code is as simple as the 
> following and I cannot get it running:
> 
> import mpi.*;
> 
> class JavaMPI {
>   public static void main(String[] args) throws MPIException {
>     MPI.Init(args);
>     System.out.println("Hello world from rank " +
>                        MPI.COMM_WORLD.Rank() + " of " +
>                        MPI.COMM_WORLD.Size());
>     MPI.Finalize();
>   }
> }
> 
> Everything is OK with mpijavac, my Java code, etc. When I try to run the code 
> with the following command:
> 
> /usr/local/bin/mpijavac -d classes JavaMPI.java   --> FINE
> /usr/local/bin/mpirun -np 2 java -cp ./classes JavaMPI  --> *ERROR*
> 
> I get the following error. Could you please help me with this? (As I 
> mentioned, I can run C MPI codes without any problem.) The system 
> specifications are:
> 
> JRE version: 6.0_30-b12 (java-sun-6)
> OS: Linux 3.0.0-30-generic-pae #47-Ubuntu
> CPU:total 4 (2 cores per cpu, 2 threads per core) family 6 model 42 stepping 
> 7, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, ht
> 
> 
> 
> 
> ##
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV#
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x70e1dd12, pid=28616, tid=3063311216
> #
>  (0xb) at pc=0x70f61d12, pid=28615, tid=3063343984
> #
> # JRE version: 6.0_30-b12
> # JRE version: 6.0_30-b12
> # Java VM: Java HotSpot(TM) Server VM (20.5-b03 mixed mode linux-x86 )
> # Problematic frame:
> # C  [libmpi.so.1+0x20d12]  unsigned __int128+0xa2
> #
> # An error report file with more information is saved as:
> # /home/karos/hs_err_pid28616.log
> # Java VM: Java HotSpot(TM) Server VM (20.5-b03 mixed mode linux-x86 )
> # Problematic frame:
> # C  [libmpi.so.1+0x20d12]  unsigned __int128+0xa2
> #
> # An error report file with more information is saved as:
> # /home/karos/hs_err_pid28615.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://java.sun.com/webapps/bugreport/crash.jsp
> # The crash happened outside the Java Virtual Machine in native code.
> # See problematic frame for where to report the bug.
> #
> [tulips:28616] *** Process received signal ***
> [tulips:28616] Signal: Aborted (6)
> [tulips:28616] Signal code:  (-6)
> [tulips:28616] [ 0] [0xb777840c]
> [tulips:28616] [ 1] [0xb7778424]
> [tulips:28616] [ 2] /lib/i386-linux-gnu/libc.so.6(gsignal+0x4f) [0xb75e3cff]
> [tulips:28616] [ 3] /lib/i386-linux-gnu/libc.so.6(abort+0x175) [0xb75e7325]
> [tulips:28616] [ 4] 
> /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(+0x5dcf7f) 
> [0xb6f6df7f]
> [tulips:28616] [ 5] 
> /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(+0x724897) 
> [0xb70b5897]
> [tulips:28616] [ 6] 
> /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(JVM_handle_linux_signal+0x21c)
>  [0xb6f7529c]
> [tulips:28616] [ 7] 
> /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(+0x5dff64) 
> [0xb6f70f64]
> [tulips:28616] [ 8] [0xb777840c]
> [tulips:28616] [ 9] [0xb3891548]
> [tulips:28616] *** End of error message ***
> [tulips:28615] *** Process received signal ***
> [tulips:28615] Signal: Aborted (6)
> [tulips:28615] Signal code:  (-6)
> #
> # If you would like to submit a bug report, please visit:
> #   http://java.sun.com/webapps/bugreport/crash.jsp
> # The crash happened outside the Java Virtual Machine in native code.
> # See problematic frame for where to report the bug.
> #
> [tulips:28615] [ 0] [0xb778040c]
> [tulips:28615] [ 1] [0xb7780424]
> [tulips:28615] [ 2] /lib/i386-linux-gnu/libc.so.6(gsignal+0x4f) [0xb75ebcff]
> [tulips:28615] [ 3] /lib/i386-linux-gnu/libc.so.6(abort+0x175) [0xb75ef325]
> [tulips:28615] [ 4] 
> /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(+0x5dcf7f) 
> [0xb6f75f7f]
> [tulips:28615] [ 5] 
> /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(+0x724897) 
> [0xb70bd897]
> [tulips:28615] [ 6] 
> /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(JVM_handle_linux_signal+0x21c)
>  [0xb6f7d29c]
> [tulips:28615] [ 7] 
> /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(+0x5dff64) 
> [0xb6f78f64]
> [tulips:28615] [ 8] [0xb778040c]
> [tulips:28615] [ 9] [0xb3899548]
> [tulips:28615] *** End of error message ***
> --
> mpirun noticed that process rank 1 with PID 28616 on node tulips exited on 
> signal 6 (Aborted).
> --
> 
> 

Re: [OMPI users] problem with mpirun

2011-11-04 Thread Jeff Squyres
We really need more information in order to help you.  Please see:

http://www.open-mpi.org/community/help/


On Nov 3, 2011, at 7:37 PM, amine mrabet wrote:

> I installed the latest version of Open MPI; now I have this error:
> It seems that [at least] one of the processes that was started with
> mpirun did not invoke MPI_INIT before quitting (it is possible that
> more than one process did not invoke MPI_INIT -- mpirun was only
> notified of the first one, which was on node n0).
> 
> :)
> 
> 
> 2011/11/3 amine mrabet 
> yes i have old version i will instal  1.4.4 and see 
> merci 
> 
> 
> 2011/11/3 Jeff Squyres 
> It sounds like you have an old version of Open MPI that is not ignoring your 
> unconfigured OpenFabrics devices in your Linux install.  This is a guess 
> because you didn't provide any information about your Open MPI installation.  
> :-)
> 
> Try upgrading to a newer version of Open MPI.
> 
> 
> On Nov 3, 2011, at 12:52 PM, amine mrabet wrote:
> 
> > i use openmpi in my computer
> >
> > 2011/11/3 Ralph Castain 
> > Couple of things:
> >
> > 1. Check the configure cmd line you gave - OMPI thinks your local computer 
> > should have an openib support that isn't correct.
> >
> > 2. did you recompile your app on your local computer, using the version of 
> > OMPI built/installed there?
> >
> >
> > On Nov 3, 2011, at 10:10 AM, amine mrabet wrote:
> >
> > > hey ,
> > > i use mpirun tu run program  with using mpi this program worked well in 
> > > university computer
> > >
> > > but with mine i have this error
> > >  i run with
> > >
> > > amine@dellam:~/Bureau$ mpirun  -np 2 pl
> > > and i have this error
> > >
> > > libibverbs: Fatal: couldn't read uverbs ABI version.
> > > --
> > > [0,0,0]: OpenIB on host dellam was unable to find any HCAs.
> > > Another transport will be used instead, although this may result in
> > > lower performance.
> > >
> > >
> > >
> > >
> > >
> > > any help?!
> > > --
> > > amine mrabet
> > > ___
> > > users mailing list
> > > us...@open-mpi.org
> > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >
> >
> > --
> > amine mrabet
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> 
> -- 
> amine mrabet 
> 
> 
> 
> -- 
> amine mrabet 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] problem with mpirun

2011-11-03 Thread amine mrabet
I installed the latest version of Open MPI; now I have this error:
It seems that [at least] one of the processes that was started with
mpirun did not invoke MPI_INIT before quitting (it is possible that
more than one process did not invoke MPI_INIT -- mpirun was only
notified of the first one, which was on node n0).

:)


2011/11/3 amine mrabet 

> yes i have old version i will instal  1.4.4 and see
> merci
>
>
> 2011/11/3 Jeff Squyres 
>
>> It sounds like you have an old version of Open MPI that is not ignoring
>> your unconfigured OpenFabrics devices in your Linux install.  This is a
>> guess because you didn't provide any information about your Open MPI
>> installation.  :-)
>>
>> Try upgrading to a newer version of Open MPI.
>>
>>
>> On Nov 3, 2011, at 12:52 PM, amine mrabet wrote:
>>
>> > i use openmpi in my computer
>> >
>> > 2011/11/3 Ralph Castain 
>> > Couple of things:
>> >
>> > 1. Check the configure cmd line you gave - OMPI thinks your local
>> computer should have an openib support that isn't correct.
>> >
>> > 2. did you recompile your app on your local computer, using the version
>> of OMPI built/installed there?
>> >
>> >
>> > On Nov 3, 2011, at 10:10 AM, amine mrabet wrote:
>> >
>> > > hey ,
>> > > i use mpirun tu run program  with using mpi this program worked well
>> in university computer
>> > >
>> > > but with mine i have this error
>> > >  i run with
>> > >
>> > > amine@dellam:~/Bureau$ mpirun  -np 2 pl
>> > > and i have this error
>> > >
>> > > libibverbs: Fatal: couldn't read uverbs ABI version.
>> > >
>> --
>> > > [0,0,0]: OpenIB on host dellam was unable to find any HCAs.
>> > > Another transport will be used instead, although this may result in
>> > > lower performance.
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > any help?!
>> > > --
>> > > amine mrabet
>> > > ___
>> > > users mailing list
>> > > us...@open-mpi.org
>> > > http://www.open-mpi.org/mailman/listinfo.cgi/users
>> >
>> >
>> > ___
>> > users mailing list
>> > us...@open-mpi.org
>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>> >
>> >
>> >
>> > --
>> > amine mrabet
>> > ___
>> > users mailing list
>> > us...@open-mpi.org
>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
>
>
> --
> amine mrabet
>



-- 
amine mrabet


Re: [OMPI users] problem with mpirun

2011-11-03 Thread amine mrabet
Yes, I have an old version. I will install 1.4.4 and see.
Thanks

2011/11/3 Jeff Squyres 

> It sounds like you have an old version of Open MPI that is not ignoring
> your unconfigured OpenFabrics devices in your Linux install.  This is a
> guess because you didn't provide any information about your Open MPI
> installation.  :-)
>
> Try upgrading to a newer version of Open MPI.
>
>
> On Nov 3, 2011, at 12:52 PM, amine mrabet wrote:
>
> > i use openmpi in my computer
> >
> > 2011/11/3 Ralph Castain 
> > Couple of things:
> >
> > 1. Check the configure cmd line you gave - OMPI thinks your local
> computer should have an openib support that isn't correct.
> >
> > 2. did you recompile your app on your local computer, using the version
> of OMPI built/installed there?
> >
> >
> > On Nov 3, 2011, at 10:10 AM, amine mrabet wrote:
> >
> > > hey ,
> > > i use mpirun tu run program  with using mpi this program worked well
> in university computer
> > >
> > > but with mine i have this error
> > >  i run with
> > >
> > > amine@dellam:~/Bureau$ mpirun  -np 2 pl
> > > and i have this error
> > >
> > > libibverbs: Fatal: couldn't read uverbs ABI version.
> > >
> --
> > > [0,0,0]: OpenIB on host dellam was unable to find any HCAs.
> > > Another transport will be used instead, although this may result in
> > > lower performance.
> > >
> > >
> > >
> > >
> > >
> > > any help?!
> > > --
> > > amine mrabet
> > > ___
> > > users mailing list
> > > us...@open-mpi.org
> > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >
> >
> > --
> > amine mrabet
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



-- 
amine mrabet


Re: [OMPI users] problem with mpirun

2011-11-03 Thread Jeff Squyres
It sounds like you have an old version of Open MPI that is not ignoring your 
unconfigured OpenFabrics devices in your Linux install.  This is a guess 
because you didn't provide any information about your Open MPI installation.  
:-)

Try upgrading to a newer version of Open MPI.
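
If upgrading has to wait, a stopgap along these lines should at least keep Open 
MPI away from the OpenFabrics stack (it hides the warning rather than fixing 
the install):

$ mpirun --mca btl ^openib -np 2 pl     # "^openib" excludes the openib BTL; TCP / shared memory get used instead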


On Nov 3, 2011, at 12:52 PM, amine mrabet wrote:

> i use openmpi in my computer 
> 
> 2011/11/3 Ralph Castain 
> Couple of things:
> 
> 1. Check the configure cmd line you gave - OMPI thinks your local computer 
> should have an openib support that isn't correct.
> 
> 2. did you recompile your app on your local computer, using the version of 
> OMPI built/installed there?
> 
> 
> On Nov 3, 2011, at 10:10 AM, amine mrabet wrote:
> 
> > hey ,
> > i use mpirun tu run program  with using mpi this program worked well in 
> > university computer
> >
> > but with mine i have this error
> >  i run with
> >
> > amine@dellam:~/Bureau$ mpirun  -np 2 pl
> > and i have this error
> >
> > libibverbs: Fatal: couldn't read uverbs ABI version.
> > --
> > [0,0,0]: OpenIB on host dellam was unable to find any HCAs.
> > Another transport will be used instead, although this may result in
> > lower performance.
> >
> >
> >
> >
> >
> > any help?!
> > --
> > amine mrabet
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> 
> -- 
> amine mrabet 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] problem with mpirun

2011-11-03 Thread amine mrabet
I use Open MPI on my computer

2011/11/3 Ralph Castain 

> Couple of things:
>
> 1. Check the configure cmd line you gave - OMPI thinks your local computer
> should have an openib support that isn't correct.
>
> 2. did you recompile your app on your local computer, using the version of
> OMPI built/installed there?
>
>
> On Nov 3, 2011, at 10:10 AM, amine mrabet wrote:
>
> > hey ,
> > i use mpirun tu run program  with using mpi this program worked well in
> university computer
> >
> > but with mine i have this error
> >  i run with
> >
> > amine@dellam:~/Bureau$ mpirun  -np 2 pl
> > and i have this error
> >
> > libibverbs: Fatal: couldn't read uverbs ABI version.
> >
> --
> > [0,0,0]: OpenIB on host dellam was unable to find any HCAs.
> > Another transport will be used instead, although this may result in
> > lower performance.
> >
> >
> >
> >
> >
> > any help?!
> > --
> > amine mrabet
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



-- 
amine mrabet


Re: [OMPI users] problem with mpirun

2011-11-03 Thread Ralph Castain
Couple of things:

1. Check the configure cmd line you gave - OMPI thinks your local computer 
should have openib support, which isn't correct.

2. did you recompile your app on your local computer, using the version of OMPI 
built/installed there?
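
A concrete sketch of point 2 (the source file name below is a placeholder; use 
whatever your program is actually built from):

$ which mpicc mpirun        # both should point at the Open MPI you installed locally
$ mpicc -o pl pl.c          # rebuild the program with that installation's wrapper compiler
$ mpirun -np 2 ./pl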


On Nov 3, 2011, at 10:10 AM, amine mrabet wrote:

> hey ,
> i use mpirun tu run program  with using mpi this program worked well in 
> university computer 
> 
> but with mine i have this error
>  i run with 
> 
> amine@dellam:~/Bureau$ mpirun  -np 2 pl
> and i have this error 
> 
> libibverbs: Fatal: couldn't read uverbs ABI version.
> --
> [0,0,0]: OpenIB on host dellam was unable to find any HCAs.
> Another transport will be used instead, although this may result in 
> lower performance.
> 
> 
> 
> 
> 
> any help?!
> -- 
> amine mrabet 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




[OMPI users] problem with mpirun

2011-11-03 Thread amine mrabet
Hey,
I use mpirun to run a program that uses MPI. This program worked well on the
university computer, but with mine I get this error.

I run with:

amine@dellam:~/Bureau$ mpirun -np 2 pl

and I get this error:

libibverbs: Fatal: couldn't read uverbs ABI version.
--
[0,0,0]: OpenIB on host dellam was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.





any help?!
-- 
amine mrabet


Re: [OMPI users] problem using mpirun over multiple nodes

2011-05-26 Thread Jagannath Mondal
Hi Jeff,
  Thanks to you, I figured out the problem. As you suspected, it was iptables,
which was acting as a firewall on some machines. After I stopped iptables, the
MPI communication is going fine. I even tried with 5 machines together and the
communication is going all right.
Thanks again,
Jagannath

On Thu, May 26, 2011 at 5:19 AM, Jeff Squyres  wrote:

> ssh may be allowed but other random TCP ports may not.
>
> iptables is the typical firewall software that most Linux installations
> use; it may have been enabled by default.
>
> I'm a little doubtful that this is your problem, though, because you're
> apparently able to *launch* your application, which means that OMPI's
> out-of-band communication system was able to make some sockets.  So it's a
> little weird that the MPI layer's TCP sockets were borked.  But let's check
> for firewall software, first...
>
>
> On May 26, 2011, at 12:42 AM, Jagannath Mondal wrote:
>
> > Hi Jeff,
> > I was wondering how I can check whether there is any firewall
> software . In fact I can use ssh to go from one machine to another . But,
> only with mpirun , it does not work. I was wondering whether it is possible
> that even in presence of firewall ssh may work but mpirun may not.
> > Jagannath
> >
> > On Wed, May 25, 2011 at 10:42 PM, Jeff Squyres (jsquyres) <
> jsquy...@cisco.com> wrote:
> > Are you running any firewall software?
> >
> > Sent from my phone. No type good.
> >
> > On May 25, 2011, at 10:41 PM, "Jagannath Mondal" <
> jagannath.mon...@gmail.com> wrote:
> >
> >> Hi,
> >> I am having a problem in running mpirun  over multiple nodes.
> >> To run a job  over two 8-core processors, I generated a hostfile as
> follows:
> >>  yethiraj30 slots=8 max_slots=8
> >>   yethiraj31 slots=8 max_slots=8
> >>
> >> These two machines are intra-connected and I have installed openmpi
> 1.3.3.
> >> Then If I try to run the replica exchange simulation using the following
> command:
> >> mpirun -np 16 --hostfile  hostfile  mdrun_4mpi -s topol_.tpr -multi 16
> -replex 100 >& log_replica_test
> >>
> >> But I find following error and job does not proceed at all :
> >> btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect()
> to 192.168.0.31 failed: No route to host (113)
> >>
> >> Here is the full details:
> >>
> >> NNODES=16, MYRANK=0, HOSTNAME=yethiraj30
> >> NNODES=16, MYRANK=1, HOSTNAME=yethiraj30
> >> NNODES=16, MYRANK=4, HOSTNAME=yethiraj30
> >> NNODES=16, MYRANK=2, HOSTNAME=yethiraj30
> >> NNODES=16, MYRANK=6, HOSTNAME=yethiraj30
> >> NNODES=16, MYRANK=3, HOSTNAME=yethiraj30
> >> NNODES=16, MYRANK=5, HOSTNAME=yethiraj30
> >> NNODES=16, MYRANK=7, HOSTNAME=yethiraj30
> >>
> [yethiraj30][[22604,1],0][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect]
> connect() to 192.168.0.31 failed: No route to host (113)
> >>
> [yethiraj30][[22604,1],4][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect]
> connect() to 192.168.0.31 failed: No route to host (113)
> >>
> [yethiraj30][[22604,1],6][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect]
> connect() to 192.168.0.31 failed: No route to host (113)
> >>
> [yethiraj30][[22604,1],1][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect]
> connect() to 192.168.0.31 failed: No route to host (113)
> >>
> [yethiraj30][[22604,1],3][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect]
> connect() to 192.168.0.31 failed: No route to host (113)
> >>
> [yethiraj30][[22604,1],2][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect]
> connect() to 192.168.0.31 failed: No route to host (113)
> >> NNODES=16, MYRANK=10, HOSTNAME=yethiraj31
> >> NNODES=16, MYRANK=12, HOSTNAME=yethiraj31
> >>
> >> I am not sure how to resolve this issue. In general, I can go from one
> machine to another without any problem using ssh. But, when I am trying to
> run openmpi over both the machines, I get this error. Any help will be
> appreciated.
> >>
> >> Jagannath
> >> ___
> >> users mailing list
> >> us...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


Re: [OMPI users] problem using mpirun over multiple nodes

2011-05-26 Thread Jeff Squyres
ssh may be allowed but other random TCP ports may not.

iptables is the typical firewall software that most Linux installations use; it 
may have been enabled by default.

I'm a little doubtful that this is your problem, though, because you're 
apparently able to *launch* your application, which means that OMPI's 
out-of-band communication system was able to make some sockets.  So it's a 
little weird that the MPI layer's TCP sockets were borked.  But let's check for 
firewall software, first...
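
The usual first checks look something like this (assuming a RHEL/CentOS-style 
iptables service script; the subnet comes from the 192.168.0.x addresses in 
your output):

$ sudo iptables -L -n                                    # list the active firewall rules
$ sudo service iptables stop                             # temporarily disable it for a test run
$ sudo iptables -I INPUT -s 192.168.0.0/24 -j ACCEPT     # or just allow traffic from the cluster subnet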


On May 26, 2011, at 12:42 AM, Jagannath Mondal wrote:

> Hi Jeff,
> I was wondering how I can check whether there is any firewall software . 
> In fact I can use ssh to go from one machine to another . But, only with 
> mpirun , it does not work. I was wondering whether it is possible that even 
> in presence of firewall ssh may work but mpirun may not. 
> Jagannath
> 
> On Wed, May 25, 2011 at 10:42 PM, Jeff Squyres (jsquyres) 
>  wrote:
> Are you running any firewall software?
> 
> Sent from my phone. No type good. 
> 
> On May 25, 2011, at 10:41 PM, "Jagannath Mondal"  
> wrote:
> 
>> Hi, 
>> I am having a problem in running mpirun  over multiple nodes. 
>> To run a job  over two 8-core processors, I generated a hostfile as follows:
>>  yethiraj30 slots=8 max_slots=8
>>   yethiraj31 slots=8 max_slots=8
>> 
>> These two machines are intra-connected and I have installed openmpi 1.3.3.
>> Then If I try to run the replica exchange simulation using the following 
>> command:
>> mpirun -np 16 --hostfile  hostfile  mdrun_4mpi -s topol_.tpr -multi 16 
>> -replex 100 >& log_replica_test
>> 
>> But I find following error and job does not proceed at all : 
>> btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 
>> 192.168.0.31 failed: No route to host (113)
>> 
>> Here is the full details:
>> 
>> NNODES=16, MYRANK=0, HOSTNAME=yethiraj30
>> NNODES=16, MYRANK=1, HOSTNAME=yethiraj30
>> NNODES=16, MYRANK=4, HOSTNAME=yethiraj30
>> NNODES=16, MYRANK=2, HOSTNAME=yethiraj30
>> NNODES=16, MYRANK=6, HOSTNAME=yethiraj30
>> NNODES=16, MYRANK=3, HOSTNAME=yethiraj30
>> NNODES=16, MYRANK=5, HOSTNAME=yethiraj30
>> NNODES=16, MYRANK=7, HOSTNAME=yethiraj30
>> [yethiraj30][[22604,1],0][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect]
>>  connect() to 192.168.0.31 failed: No route to host (113)
>> [yethiraj30][[22604,1],4][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect]
>>  connect() to 192.168.0.31 failed: No route to host (113)
>> [yethiraj30][[22604,1],6][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect]
>>  connect() to 192.168.0.31 failed: No route to host (113)
>> [yethiraj30][[22604,1],1][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect]
>>  connect() to 192.168.0.31 failed: No route to host (113)
>> [yethiraj30][[22604,1],3][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect]
>>  connect() to 192.168.0.31 failed: No route to host (113)
>> [yethiraj30][[22604,1],2][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect]
>>  connect() to 192.168.0.31 failed: No route to host (113)
>> NNODES=16, MYRANK=10, HOSTNAME=yethiraj31
>> NNODES=16, MYRANK=12, HOSTNAME=yethiraj31
>> 
>> I am not sure how to resolve this issue. In general, I can go from one 
>> machine to another without any problem using ssh. But, when I am trying to 
>> run openmpi over both the machines, I get this error. Any help will be 
>> appreciated.
>> 
>> Jagannath
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] problem using mpirun over multiple nodes

2011-05-26 Thread Jeff Squyres (jsquyres)
Are you running any firewall software?

Sent from my phone. No type good. 

On May 25, 2011, at 10:41 PM, "Jagannath Mondal"  
wrote:

> Hi, 
> I am having a problem in running mpirun  over multiple nodes. 
> To run a job  over two 8-core processors, I generated a hostfile as follows:
>  yethiraj30 slots=8 max_slots=8
>   yethiraj31 slots=8 max_slots=8
> 
> These two machines are intra-connected and I have installed openmpi 1.3.3.
> Then If I try to run the replica exchange simulation using the following 
> command:
> mpirun -np 16 --hostfile  hostfile  mdrun_4mpi -s topol_.tpr -multi 16 
> -replex 100 >& log_replica_test
> 
> But I find following error and job does not proceed at all : 
> btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to 
> 192.168.0.31 failed: No route to host (113)
> 
> Here is the full details:
> 
> NNODES=16, MYRANK=0, HOSTNAME=yethiraj30
> NNODES=16, MYRANK=1, HOSTNAME=yethiraj30
> NNODES=16, MYRANK=4, HOSTNAME=yethiraj30
> NNODES=16, MYRANK=2, HOSTNAME=yethiraj30
> NNODES=16, MYRANK=6, HOSTNAME=yethiraj30
> NNODES=16, MYRANK=3, HOSTNAME=yethiraj30
> NNODES=16, MYRANK=5, HOSTNAME=yethiraj30
> NNODES=16, MYRANK=7, HOSTNAME=yethiraj30
> [yethiraj30][[22604,1],0][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect]
>  connect() to 192.168.0.31 failed: No route to host (113)
> [yethiraj30][[22604,1],4][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect]
>  connect() to 192.168.0.31 failed: No route to host (113)
> [yethiraj30][[22604,1],6][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect]
>  connect() to 192.168.0.31 failed: No route to host (113)
> [yethiraj30][[22604,1],1][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect]
>  connect() to 192.168.0.31 failed: No route to host (113)
> [yethiraj30][[22604,1],3][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect]
>  connect() to 192.168.0.31 failed: No route to host (113)
> [yethiraj30][[22604,1],2][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect]
>  connect() to 192.168.0.31 failed: No route to host (113)
> NNODES=16, MYRANK=10, HOSTNAME=yethiraj31
> NNODES=16, MYRANK=12, HOSTNAME=yethiraj31
> 
> I am not sure how to resolve this issue. In general, I can go from one 
> machine to another without any problem using ssh. But, when I am trying to 
> run openmpi over both the machines, I get this error. Any help will be 
> appreciated.
> 
> Jagannath
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


[OMPI users] problem using mpirun over multiple nodes

2011-05-25 Thread Jagannath Mondal
Hi,
I am having a problem running mpirun over multiple nodes.
To run a job over two 8-core machines, I generated a hostfile as follows:
 yethiraj30 slots=8 max_slots=8
 yethiraj31 slots=8 max_slots=8

These two machines are interconnected and I have installed openmpi 1.3.3.
Then, if I try to run the replica exchange simulation using the following
command:
mpirun -np 16 --hostfile hostfile mdrun_4mpi -s topol_.tpr -multi 16
-replex 100 >& log_replica_test

I find the following error and the job does not proceed at all:
btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect] connect() to
192.168.0.31 failed: No route to host (113)

Here is the full details:

NNODES=16, MYRANK=0, HOSTNAME=yethiraj30
NNODES=16, MYRANK=1, HOSTNAME=yethiraj30
NNODES=16, MYRANK=4, HOSTNAME=yethiraj30
NNODES=16, MYRANK=2, HOSTNAME=yethiraj30
NNODES=16, MYRANK=6, HOSTNAME=yethiraj30
NNODES=16, MYRANK=3, HOSTNAME=yethiraj30
NNODES=16, MYRANK=5, HOSTNAME=yethiraj30
NNODES=16, MYRANK=7, HOSTNAME=yethiraj30
[yethiraj30][[22604,1],0][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect]
connect() to 192.168.0.31 failed: No route to host (113)
[yethiraj30][[22604,1],4][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect]
connect() to 192.168.0.31 failed: No route to host (113)
[yethiraj30][[22604,1],6][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect]
connect() to 192.168.0.31 failed: No route to host (113)
[yethiraj30][[22604,1],1][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect]
connect() to 192.168.0.31 failed: No route to host (113)
[yethiraj30][[22604,1],3][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect]
connect() to 192.168.0.31 failed: No route to host (113)
[yethiraj30][[22604,1],2][btl_tcp_endpoint.c:636:mca_btl_tcp_endpoint_complete_connect]
connect() to 192.168.0.31 failed: No route to host (113)
NNODES=16, MYRANK=10, HOSTNAME=yethiraj31
NNODES=16, MYRANK=12, HOSTNAME=yethiraj31

I am not sure how to resolve this issue. In general, I can go from one
machine to another without any problem using ssh. But, when I am trying to
run openmpi over both the machines, I get this error. Any help will be
appreciated.

Jagannath
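
(A related thing worth checking when "No route to host" shows up and the
machines have more than one network interface: Open MPI's TCP BTL may be
trying to use an interface/subnet the other node cannot reach. A sketch,
assuming the interface the two machines share is called eth0 on both:

    mpirun -np 16 --hostfile hostfile \
           --mca btl_tcp_if_include eth0 \
           mdrun_4mpi -s topol_.tpr -multi 16 -replex 100

The interface name is just an illustration; check with ifconfig which
interface actually carries the 192.168.0.x subnet.)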


Re: [OMPI users] PROBLEM WITH MPIRUN

2010-11-29 Thread Tushar Andriyas
and the openmpi-1.2.7-pgi??

On Mon, Nov 29, 2010 at 6:27 AM, Tushar Andriyas wrote:

> Hi there,
> The thing is I did not write the code myself and am just trying to get it
> to work. So, would it help if i change the version of the compiler or is
> that it happens with every pgi compiler suite??
>
>
> On Sun, Nov 28, 2010 at 11:45 PM, Simon Hammond 
> wrote:
>
>> Hi,
>>
>> This isn't usually an error - you get this by using conventional
>> Fortran exit methods. The Fortran stop means the program hit the exit
>> statements in the code. I have only had this with PGI.
>>
>>
>>
>>
>> --
>> Si Hammond
>>
>> Research Fellow & Knowledge Transfer Associate
>> Performance Computing & Visualisation
>> Department of Computer Science
>> University of Warwick, UK, CV4 7AL
>>
>> --
>>
>>
>>
>> On 29 November 2010 04:56, Tushar Andriyas 
>> wrote:
>> > Hi there,
>> > I have posted before about the problems that I am facing with mpirun. I
>> have
>> > gotten some help but right now i am stuck with an error
>> message.FORTRAN
>> > STOP when I invoke mpirun..can someone help PLEASE!!
>> > I m using openmpi-1.2.7-pgi and pgi-7.2 compiler.
>> > ___
>> > users mailing list
>> > us...@open-mpi.org
>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>> >
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
>


Re: [OMPI users] PROBLEM WITH MPIRUN

2010-11-29 Thread Tushar Andriyas
Hi there,
The thing is, I did not write the code myself and am just trying to get it to
work. So, would it help if I changed the version of the compiler, or does this
happen with every PGI compiler suite?

On Sun, Nov 28, 2010 at 11:45 PM, Simon Hammond wrote:

> Hi,
>
> This isn't usually an error - you get this by using conventional
> Fortran exit methods. The Fortran stop means the program hit the exit
> statements in the code. I have only had this with PGI.
>
>
>
>
> --
> Si Hammond
>
> Research Fellow & Knowledge Transfer Associate
> Performance Computing & Visualisation
> Department of Computer Science
> University of Warwick, UK, CV4 7AL
>
> --
>
>
>
> On 29 November 2010 04:56, Tushar Andriyas  wrote:
> > Hi there,
> > I have posted before about the problems that I am facing with mpirun. I
> have
> > gotten some help but right now i am stuck with an error
> message.FORTRAN
> > STOP when I invoke mpirun..can someone help PLEASE!!
> > I m using openmpi-1.2.7-pgi and pgi-7.2 compiler.
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


Re: [OMPI users] PROBLEM WITH MPIRUN

2010-11-29 Thread Simon Hammond
Hi,

This isn't usually an error - you get this message by using conventional
Fortran exit methods. The FORTRAN STOP message means the program hit a STOP
statement in the code. I have only seen this with PGI.



--
Si Hammond

Research Fellow & Knowledge Transfer Associate
Performance Computing & Visualisation
Department of Computer Science
University of Warwick, UK, CV4 7AL
--



On 29 November 2010 04:56, Tushar Andriyas  wrote:
> Hi there,
> I have posted before about the problems that I am facing with mpirun. I have
> gotten some help but right now i am stuck with an error message.FORTRAN
> STOP when I invoke mpirun..can someone help PLEASE!!
> I m using openmpi-1.2.7-pgi and pgi-7.2 compiler.
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


[OMPI users] PROBLEM WITH MPIRUN

2010-11-28 Thread Tushar Andriyas
Hi there,

I have posted before about the problems that I am facing with mpirun. I have
gotten some help, but right now I am stuck with an error message: FORTRAN
STOP when I invoke mpirun. Can someone help, please?

I am using openmpi-1.2.7-pgi and the pgi-7.2 compiler.


Re: [OMPI users] problem with mpirun

2010-06-25 Thread Nifty Tom Mitchell
On Fri, Jun 11, 2010 at 11:03:03AM +0200, asmae.elbahlo...@mpsa.com wrote:
> Sender: users-boun...@open-mpi.org
> 
> 
>hello,
> 
>i'm doing a tutorial on OpenFoam, but when i run in parallel by typing
>"mpirun -np 30 foamProMesh -parallel | tee 2>&1 log/FPM.log"

.
> 
>[1] in file autoHexMesh/meshRefinement/meshRefinement.C at line
>1180.
> 

Your error is in OpenFoam -- the MPI errors are
simply MPI cleaning up because OpenFoam bailed.

If you run the parallel example of OpenFOAM
at http://www.openfoam.com/docs/user/damBreak.php#x7-610002.3.11,
following the instructions with care,
does it run better?

The source is full of tutorials and examples but 
it appears that running in parallel requires changes
in multiple places when the number of ranks changes.
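
For example (a sketch only, assuming a standard OpenFOAM case layout with a
system/decomposeParDict file; utility names and dictionary syntax can differ
between OpenFOAM versions), changing the rank count usually also means
redoing the domain decomposition:

    # make the decomposition match the new number of ranks (here 30)
    sed -i 's/^numberOfSubdomains.*/numberOfSubdomains 30;/' system/decomposeParDict
    decomposePar
    mpirun -np 30 foamProMesh -parallel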


-- 
T o m  M i t c h e l l 
Found me a new hat, now what?



Re: [OMPI users] problem with mpirun

2010-06-11 Thread Jeff Squyres
I'm afraid I don't know anything about OpenFOAM, but it looks like it
deliberately chose to abort due to some error (i.e., it then called MPI_ABORT
to abort).

I don't know what those stack traces mean; you will likely have better luck 
asking your question on the OpenFoam support list.

Good luck!


On Jun 11, 2010, at 5:03 AM,  wrote:

> hello,
> i'm doing a tutorial on OpenFoam, but when i run in parallel by typing 
> "mpirun -np 30 foamProMesh -parallel | tee 2>&1 log/FPM.log"
> On the terminal window, after a few seconds of running, it iterates but I have at
> the end:
>  
>  
> tta201@linux-qv31:/media/OpenFoam/Travaux/F1car_asmaetest> mpirun -np 30 
> -machinefile machinefile foamProMesh -parallel | tee 2>&1 log/FPM.log
> /*---*\
> |   F ield         | FOAM: The Open Source CFD Toolbox
> |   O peration     | Version:  1.5-2.2
> |   A nd           | Web:  http://www.iconcfd.com
> |   M anipulation  |
> \*---*/
> 
> Exec   : foamProMesh -parallel
> Date   : Jun 11 2010
> Time   : 10:42:24
> Host   : Foam1
> PID    : 9789
> Case   : /media/OpenFoam/Travaux/F1car_asmaetest
> nProcs : 30
> Slaves : 29
> (
> Foam1.9790
> Foam1.9791
> Foam1.9792
> Foam2.9224
> Foam2.9225
> Foam2.9226
> Foam2.9227
> Foam3.8925
> Foam3.8926
> Foam3.8927
> ..
>
> Added patches in = 0 s
> Selecting decompositionMethod hierarchical
> Overall mesh bounding box  : (-5.60160988792 -5.00165616875 -0.259253998544)
> (9.39931715541 5.00165616875 

Re: [OMPI users] Problem running mpirun with ssh on remote nodes - Daemon did not report back when launched problem

2010-04-08 Thread rohan nigam
Hi Jeff,

You were right.  One of the other admins of the server I am working on had a
script that re-enabled the firewall every time I logged in. So even when I
turned it off manually, the firewall came back the next time I logged in, and
hence the error.

Thanks.

- Rohan


--- On Tue, 4/6/10, Jeff Squyres <jsquy...@cisco.com> wrote:

From: Jeff Squyres <jsquy...@cisco.com>
Subject: Re: [OMPI users] Problem running mpirun with ssh on remote nodes 
-Daemon did not report back when launched problem
To: "Open MPI Users" <us...@open-mpi.org>
List-Post: users@lists.open-mpi.org
Date: Tuesday, April 6, 2010, 11:40 AM

Open MPI opens random TCP sockets during the startup phase of MPI processes -- 
mostly from the "orted" helper process that is started on each node (or VM) 
back to the initiating mpirun process.

Do you have firewalling or other TCP blocking software running?  Or are the 
appropriate TCP routes setup between all your VMs?
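
If a firewall has to stay up, one option is to pin Open MPI's MPI-level TCP
traffic to a known port range and open just that range. A sketch, assuming a
1.3/1.4-era build; parameter names can change between versions, so check what
your build actually supports first:

    # list the TCP-related parameters of this build
    ompi_info --param btl tcp
    ompi_info --param oob tcp

    # restrict the TCP BTL to ports 10000-10099 (placeholder range)
    mpirun --mca btl_tcp_port_min_v4 10000 \
           --mca btl_tcp_port_range_v4 100 \
           -np 4 /root/xentools/hello

The out-of-band (orted) channel has analogous parameters; their names are in
the ompi_info output above.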


On Apr 2, 2010, at 5:00 PM, rohan nigam wrote:

> Hello,
> 
> I am trying to run a simple hello world program before actually launching 
> some very heavy load testing over the Xen SMP set up that I have. 
> 
> I am trying to run this command over four different hosts, Dom0  being the 
> host where i am launching mpirun and rest three being xen guest domains.
> 
> I have password less login setup across all the hosts. These hosts are 
> actually on AMD Opteron dual socket Quad core with 2 cores assigned to each 
> host/domain.
> 
> mpirun --prefix /root/xentools/openmpi-gcc/ -mca plm_rsh_agent ssh --host 
> localhost, xenguest1 -np 4 /root/xentools/hello
> 
> I am able to run mpirun successfully when I launch this command from one of 
> the guests and also when i lauch this command on dom0 (localhost)  alone. But 
> when i launch mpirun from the Dom 0 on one or more guests there is no 
> response from the guests and I am eventually having to kill the process which 
> reports saying 
>       xenguest1 - daemon did not report back when launched
> 
> Can someone point out where I am going wrong. I have seen people having 
> similar problem in the list but no one posted how they got around this 
> problem.
> 
> Note: I also tried setting the default agent launcher to ssh. Also, on every 
> host the directory structure is exactly the same and also the Library paths 
> and paths are also set up properly and the executable is also present at the 
> exact same location..
> 
> Thanks,
> Rohan Nigam
> Research Asst, 
> Univ. of Houston
> 
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



  

[OMPI users] Problem running mpirun with ssh on remote nodes - Daemon did not report back when launched problem

2010-04-02 Thread rohan nigam
Hello,

I am trying to run a simple hello world program before actually launching some 
very heavy load testing over the Xen SMP set up that I have. 

I am trying to run this command over four different hosts, Dom0 being the host
where I am launching mpirun and the other three being Xen guest domains.

I have password-less login set up across all the hosts. These hosts are actually
on a dual-socket quad-core AMD Opteron, with 2 cores assigned to each host/domain.

mpirun --prefix /root/xentools/openmpi-gcc/ -mca plm_rsh_agent ssh --host localhost, xenguest1 -np 4 /root/xentools/hello

I am able to run mpirun successfully when I launch this command from one of the
guests, and also when I launch it on dom0 (localhost) alone. But when I launch
mpirun from Dom0 on one or more guests, there is no response from the guests and
I eventually have to kill the process, which reports:
  xenguest1 - daemon did not report back when launched

Can someone point out where I am going wrong? I have seen people with a similar
problem on the list, but no one posted how they got around it.

Note: I also tried setting the default launch agent to ssh. Also, on every
host the directory structure is exactly the same, the library paths and PATH
are set up properly, and the executable is present at the exact same location.

Thanks,
Rohan Nigam
Research Asst, 
Univ. of Houston
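
(A couple of quick checks that are sometimes useful for "daemon did not
report back": confirm that non-interactive ssh works and that the remote
shell can find orted, since the PATH seen by "ssh host command" is often
narrower than the one seen by an interactive login. A sketch, assuming the
guest is reachable as xenguest1:

    ssh xenguest1 hostname        # must return without any prompt
    ssh xenguest1 which orted     # orted must be on the non-interactive PATH
    ssh xenguest1 env | grep -i path

If orted is found and ssh is silent, the remaining suspect is usually a
firewall blocking the TCP connection from orted back to mpirun.)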





  

Re: [OMPI users] Problem with mpirun -preload-binary option

2009-12-09 Thread Josh Hursey
I verified that the preload functionality works on the trunk. It seems  
to be broken on the v1.3/v1.4 branches. The version of this code has  
changed significantly between the v1.3/v1.4 and the trunk/v1.5  
versions. I filed a bug about this so it does not get lost:

  https://svn.open-mpi.org/trac/ompi/ticket/2139

Can you try this again with either the trunk or v1.5 to see if that  
helps with the preloading?


However, you need to fix the password-less login issue before anything
else will work. If mpirun is prompting you for a password, then it
will not work properly.


-- Josh

On Nov 12, 2009, at 3:50 PM, Qing Pang wrote:

Now that I have passwordless-ssh set up both directions, and  
verified working - I still have the same problem.
I'm able to run ssh/scp on both master and client nodes - (at this  
point, they are pretty much the same), without being asked for  
password. And mpirun works fine if I have the executable put in the  
same directory on both nodes.


But when I tried the preload-binary option, I still have the same  
problem - it asked me for the password of the node running mpirun,  
and then tells that scp failed.


---


Josh Wrote:

Though the --preload-binary option was created while building the  
checkpoint/restart functionality it does not depend on checkpoint/ 
restart function in any way (just a side effect of the initial  
development).


The problem you are seeing is a result of the computing environment  
setup of password-less ssh. The --preload-binary command uses  
'scp' (at the moment) to copy the files from the node running mpirun  
to the compute nodes. The compute nodes are the ones that call  
'scp', so you will need to setup password-less ssh in both directions.


-- Josh

On Nov 11, 2009, at 8:38 AM, Ralph Castain wrote:


I'm no expert on the preload-binary option - but I would suspect that
is the case given your observations.


That option was created to support checkpoint/restart, not for what
you are attempting to do. Like I said, you -should- be able to use  
it for that purpose, but I expect you may hit a few quirks like this  
along the way.


On Nov 11, 2009, at 9:16 AM, Qing Pang wrote:

> Thank you very much for your help! I believe I do have password- 
less
ssh set up, at least from master node to client node (desktop ->  
laptop in my case). If I type >ssh node1 on my desktop terminal, I  
am able to get to the laptop node without being asked for password.  
And as I mentioned, if I copy the example executable from desktop to  
the laptop node using scp, then I am able to run it from desktop  
using both nodes.

> Back to the preload-binary problem - I am asked for the password of
my master node - the node I am working on - not the remote client  
node. Do you mean that I should set up password-less ssh in both  
direction? Does the client node need to access master node through  
password-less ssh to make the preload-binary option work?

>
>
> Ralph Castain Wrote:
>
> It -should- work, but you need password-less ssh setup. See our FAQ
> for how to do that, if you are unfamiliar with it.
>
> On Nov 10, 2009, at 2:02 PM, Qing Pang wrote:
>
> I'm having problem getting the mpirun "preload-binary" option to  
work.

>>
>> I'm using ubutu8.10 with openmpi 1.3.3, nodes connected with

Ethernet cable.
>> If I copy the executable to client nodes using scp, then do  
mpirun,

everything works.

>>
>> But I really want to avoid the copying, so I tried the

-preload-binary option.

>>
>> When I typed the command on my master node as below (gordon- 
desktop

is my master node, and gordon-laptop is the client node):

>>
>>

--

>> gordon_at_gordon-desktop:~/Desktop/openmpi-1.3.3/examples$ mpirun
>> -machinefile machine.linux -np 2 --preload-binary $(pwd)/ 
hello_c.out

>>

--

>>
>> I got the following:
>>
>> gordon_at_gordon-desktop's password: (I entered my password here,
why am I asked for the password? I am working under this account  
anyway)

>>
>>
>> WARNING: Remote peer ([[18118,0],1]) failed to preload a file.
>>
>> Exit Status: 256
>> Local File:

/tmp/openmpi-sessions-gordon_at_gordon-laptop_0/18118/0/hello_c.out
>> Remote File: /home/gordon/Desktop/openmpi-1.3.3/examples/ 
hello_c.out

>> Command:
>> scp

gordon-desktop:/home/gordon/Desktop/openmpi-1.3.3/examples/hello_c.out
>> /tmp/openmpi-sessions-gordon_at_gordon-laptop_0/18118/0/ 
hello_c.out

>>
>> Will continue attempting to launch the process(es).
>>

--

>>

--

>> mpirun was unable to launch the specified application as it could

not access

>> or execute an executable:
>>
>> Executable: 

Re: [OMPI users] Problem with mpirun -preload-binary option

2009-11-12 Thread Qing Pang
Now that I have passwordless ssh set up in both directions, and verified it
is working, I still have the same problem.
I'm able to run ssh/scp on both the master and client nodes (at this
point, they are pretty much the same) without being asked for a password.
And mpirun works fine if I put the executable in the same directory
on both nodes.


But when I try the preload-binary option, I still have the same
problem - it asks me for the password of the node running mpirun, and
then reports that scp failed.


---


Josh Wrote:

Though the --preload-binary option was created while building the 
checkpoint/restart functionality it does not depend on 
checkpoint/restart function in any way (just a side effect of the 
initial development).


The problem you are seeing is a result of the computing environment 
setup of password-less ssh. The --preload-binary command uses 'scp' (at 
the moment) to copy the files from the node running mpirun to the 
compute nodes. The compute nodes are the ones that call 'scp', so you 
will need to setup password-less ssh in both directions.


-- Josh

On Nov 11, 2009, at 8:38 AM, Ralph Castain wrote:

 I'm no expert on the preload-binary option - but I would suspect that
is the case given your observations.


 That option was created to support checkpoint/restart, not for what 
you are attempting to do. Like I said, you -should- be able to use it 
for that purpose, but I expect you may hit a few quirks like this along 
the way.


 On Nov 11, 2009, at 9:16 AM, Qing Pang wrote:

> Thank you very much for your help! I believe I do have password-less 
ssh set up, at least from master node to client node (desktop -> laptop 
in my case). If I type >ssh node1 on my desktop terminal, I am able to 
get to the laptop node without being asked for password. And as I 
mentioned, if I copy the example executable from desktop to the laptop 
node using scp, then I am able to run it from desktop using both nodes.
> Back to the preload-binary problem - I am asked for the password of 
my master node - the node I am working on - not the remote client node. 
Do you mean that I should set up password-less ssh in both direction? 
Does the client node need to access master node through password-less 
ssh to make the preload-binary option work?

>
>
> Ralph Castain Wrote:
>
> It -should- work, but you need password-less ssh setup. See our FAQ
> for how to do that, if you are unfamiliar with it.
>
> On Nov 10, 2009, at 2:02 PM, Qing Pang wrote:
>
> I'm having problem getting the mpirun "preload-binary" option to work.
>>
>> I'm using ubutu8.10 with openmpi 1.3.3, nodes connected with 

Ethernet cable.
>> If I copy the executable to client nodes using scp, then do mpirun, 

everything works.

>>
>> But I really want to avoid the copying, so I tried the 

-preload-binary option.

>>
>> When I typed the command on my master node as below (gordon-desktop 

is my master node, and gordon-laptop is the client node):

>>
>> 

--

>> gordon_at_gordon-desktop:~/Desktop/openmpi-1.3.3/examples$ mpirun
>> -machinefile machine.linux -np 2 --preload-binary $(pwd)/hello_c.out
>> 

--

>>
>> I got the following:
>>
>> gordon_at_gordon-desktop's password: (I entered my password here, 

why am I asked for the password? I am working under this account anyway)

>>
>>
>> WARNING: Remote peer ([[18118,0],1]) failed to preload a file.
>>
>> Exit Status: 256
>> Local File: 

/tmp/openmpi-sessions-gordon_at_gordon-laptop_0/18118/0/hello_c.out

>> Remote File: /home/gordon/Desktop/openmpi-1.3.3/examples/hello_c.out
>> Command:
>> scp 

gordon-desktop:/home/gordon/Desktop/openmpi-1.3.3/examples/hello_c.out

>> /tmp/openmpi-sessions-gordon_at_gordon-laptop_0/18118/0/hello_c.out
>>
>> Will continue attempting to launch the process(es).
>> 

--
>> 

--
>> mpirun was unable to launch the specified application as it could 

not access

>> or execute an executable:
>>
>> Executable: /home/gordon/Desktop/openmpi-1.3.3/examples/hello_c.out
>> Node: node1
>>
>> while attempting to start process rank 1.
>> 

--

>>
>> Had anyone succeeded with the 'preload-binary' option with the 
similar settings? I assume this mpirun option should work when compiling 
openmpi with default options? Anything I need to set?

>>
>> --qing
>>
>>




Re: [OMPI users] Problem with mpirun -preload-binary option

2009-11-11 Thread Josh Hursey
Though the --preload-binary option was created while building the
checkpoint/restart functionality, it does not depend on checkpoint/restart
functionality in any way (it was just a side effect of the initial development).

The problem you are seeing is a result of the computing environment setup of 
password-less ssh. The --preload-binary command uses 'scp' (at the moment) to 
copy the files from the node running mpirun to the compute nodes. The compute 
nodes are the ones that call 'scp', so you will need to setup password-less ssh 
in both directions.
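
A minimal sketch of setting that up with key-based authentication (user,
node1 and headnode below are placeholder names; adapt to your own accounts
and hostnames):

    # on the head node (the one running mpirun): head -> compute
    ssh-keygen -t rsa                  # accept defaults, empty passphrase
    ssh-copy-id user@node1

    # on each compute node: compute -> head, so the scp back works
    ssh-keygen -t rsa
    ssh-copy-id user@headnode

    # both directions must work without a password prompt
    ssh node1 true
    ssh node1 ssh headnode true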

-- Josh

On Nov 11, 2009, at 8:38 AM, Ralph Castain wrote:

> I'm no expert on the preload-binary option - but I would suspect that is the 
> case given your observations.
> 
> That option was created to support checkpoint/restart, not for what you are 
> attempting to do. Like I said, you -should- be able to use it for that 
> purpose, but I expect you may hit a few quirks like this along the way.
> 
> On Nov 11, 2009, at 9:16 AM, Qing Pang wrote:
> 
>> Thank you very much for your help! I believe I do have password-less ssh set 
>> up, at least from master node to client node (desktop -> laptop in my case). 
>> If I type >ssh node1 on my desktop terminal, I am able to get to the laptop 
>> node without being asked for password. And as I mentioned, if I copy the 
>> example executable from desktop to the laptop node using scp, then I am able 
>> to run it from desktop using both nodes.
>> Back to the preload-binary problem - I am asked for the password of my 
>> master node - the node I am working on - not the remote client node. Do you 
>> mean that I should set up password-less ssh in both direction? Does the 
>> client node need to access master node through password-less ssh to make the 
>> preload-binary option work?
>> 
>> 
>> Ralph Castain Wrote:
>> 
>> It -should- work, but you need password-less ssh setup. See our FAQ
>> for how to do that, if you are unfamiliar with it.
>> 
>> On Nov 10, 2009, at 2:02 PM, Qing Pang wrote:
>> 
>> I'm having problem getting the mpirun "preload-binary" option to work.
>>> 
>>> I'm using ubutu8.10 with openmpi 1.3.3, nodes connected with Ethernet cable.
>>> If I copy the executable to client nodes using scp, then do mpirun, 
>>> everything works.
>>> 
>>> But I really want to avoid the copying, so I tried the -preload-binary 
>>> option.
>>> 
>>> When I typed the command on my master node as below (gordon-desktop is my 
>>> master node, and gordon-laptop is the client node):
>>> 
>>> --
>>> gordon_at_gordon-desktop:~/Desktop/openmpi-1.3.3/examples$  mpirun
>>> -machinefile machine.linux -np 2 --preload-binary $(pwd)/hello_c.out
>>> --
>>> 
>>> I got the following:
>>> 
>>> gordon_at_gordon-desktop's password:  (I entered my password here, why am I 
>>> asked for the password? I am working under this account anyway)
>>> 
>>> 
>>> WARNING: Remote peer ([[18118,0],1]) failed to preload a file.
>>> 
>>> Exit Status: 256
>>> Local  File: 
>>> /tmp/openmpi-sessions-gordon_at_gordon-laptop_0/18118/0/hello_c.out
>>> Remote File: /home/gordon/Desktop/openmpi-1.3.3/examples/hello_c.out
>>> Command:
>>> scp  gordon-desktop:/home/gordon/Desktop/openmpi-1.3.3/examples/hello_c.out
>>> /tmp/openmpi-sessions-gordon_at_gordon-laptop_0/18118/0/hello_c.out
>>> 
>>> Will continue attempting to launch the process(es).
>>> --
>>> --
>>> mpirun was unable to launch the specified application as it could not access
>>> or execute an executable:
>>> 
>>> Executable: /home/gordon/Desktop/openmpi-1.3.3/examples/hello_c.out
>>> Node: node1
>>> 
>>> while attempting to start process rank 1.
>>> --
>>> 
>>> Had anyone succeeded with the 'preload-binary' option with the similar 
>>> settings? I assume this mpirun option should work when compiling openmpi 
>>> with default  options? Anything I need to set?
>>> 
>>> --qing
>>> 
>>> 
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] Problem with mpirun -preload-binary option

2009-11-11 Thread Ralph Castain
I'm no expert on the preload-binary option - but I would suspect that  
is the case given your observations.


That option was created to support checkpoint/restart, not for what  
you are attempting to do. Like I said, you -should- be able to use it  
for that purpose, but I expect you may hit a few quirks like this  
along the way.


On Nov 11, 2009, at 9:16 AM, Qing Pang wrote:

Thank you very much for your help! I believe I do have password-less  
ssh set up, at least from master node to client node (desktop ->  
laptop in my case). If I type >ssh node1 on my desktop terminal, I  
am able to get to the laptop node without being asked for password.  
And as I mentioned, if I copy the example executable from desktop to  
the laptop node using scp, then I am able to run it from desktop  
using both nodes.
Back to the preload-binary problem - I am asked for the password of  
my master node - the node I am working on - not the remote client  
node. Do you mean that I should set up password-less ssh in both  
direction? Does the client node need to access master node through  
password-less ssh to make the preload-binary option work?



Ralph Castain Wrote:

It -should- work, but you need password-less ssh setup. See our FAQ
for how to do that, if you are unfamiliar with it.

On Nov 10, 2009, at 2:02 PM, Qing Pang wrote:

I'm having problem getting the mpirun "preload-binary" option to work.


I'm using ubutu8.10 with openmpi 1.3.3, nodes connected with  
Ethernet cable.
If I copy the executable to client nodes using scp, then do mpirun,  
everything works.


But I really want to avoid the copying, so I tried the -preload- 
binary option.


When I typed the command on my master node as below (gordon-desktop  
is my master node, and gordon-laptop is the client node):


--
gordon_at_gordon-desktop:~/Desktop/openmpi-1.3.3/examples$  mpirun
-machinefile machine.linux -np 2 --preload-binary $(pwd)/hello_c.out
--

I got the following:

gordon_at_gordon-desktop's password:  (I entered my password here,  
why am I asked for the password? I am working under this account  
anyway)



WARNING: Remote peer ([[18118,0],1]) failed to preload a file.

Exit Status: 256
Local  File: /tmp/openmpi-sessions-gordon_at_gordon- 
laptop_0/18118/0/hello_c.out

Remote File: /home/gordon/Desktop/openmpi-1.3.3/examples/hello_c.out
Command:
scp  gordon-desktop:/home/gordon/Desktop/openmpi-1.3.3/examples/ 
hello_c.out

/tmp/openmpi-sessions-gordon_at_gordon-laptop_0/18118/0/hello_c.out

Will continue attempting to launch the process(es).
--
--
mpirun was unable to launch the specified application as it could  
not access

or execute an executable:

Executable: /home/gordon/Desktop/openmpi-1.3.3/examples/hello_c.out
Node: node1

while attempting to start process rank 1.
--

Had anyone succeeded with the 'preload-binary' option with the  
similar settings? I assume this mpirun option should work when  
compiling openmpi with default  options? Anything I need to set?


--qing




___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] Problem with mpirun -preload-binary option

2009-11-11 Thread Qing Pang
Thank you very much for your help! I believe I do have password-less ssh 
set up, at least from master node to client node (desktop -> laptop in 
my case). If I type >ssh node1 on my desktop terminal, I am able to get 
to the laptop node without being asked for password. And as I mentioned, 
if I copy the example executable from desktop to the laptop node using 
scp, then I am able to run it from desktop using both nodes.
Back to the preload-binary problem - I am asked for the password of my
master node - the node I am working on - not the remote client node. Do
you mean that I should set up password-less ssh in both directions? Does
the client node need to access the master node through password-less ssh to
make the preload-binary option work?



Ralph Castain Wrote:

It -should- work, but you need password-less ssh setup. See our FAQ
for how to do that, if you are unfamiliar with it.

On Nov 10, 2009, at 2:02 PM, Qing Pang wrote:

I'm having problem getting the mpirun "preload-binary" option to work.


I'm using ubutu8.10 with openmpi 1.3.3, nodes connected with Ethernet 
cable.
If I copy the executable to client nodes using scp, then do mpirun, 
everything works.


But I really want to avoid the copying, so I tried the -preload-binary 
option.


When I typed the command on my master node as below (gordon-desktop is 
my master node, and gordon-laptop is the client node):


-- 


gordon_at_gordon-desktop:~/Desktop/openmpi-1.3.3/examples$  mpirun
-machinefile machine.linux -np 2 --preload-binary $(pwd)/hello_c.out
-- 



I got the following:

gordon_at_gordon-desktop's password:  (I entered my password here, why 
am I asked for the password? I am working under this account anyway)



WARNING: Remote peer ([[18118,0],1]) failed to preload a file.

Exit Status: 256
Local  File: 
/tmp/openmpi-sessions-gordon_at_gordon-laptop_0/18118/0/hello_c.out

Remote File: /home/gordon/Desktop/openmpi-1.3.3/examples/hello_c.out
Command:
 scp  
gordon-desktop:/home/gordon/Desktop/openmpi-1.3.3/examples/hello_c.out

/tmp/openmpi-sessions-gordon_at_gordon-laptop_0/18118/0/hello_c.out

Will continue attempting to launch the process(es).
-- 

-- 

mpirun was unable to launch the specified application as it could not 
access

or execute an executable:

Executable: /home/gordon/Desktop/openmpi-1.3.3/examples/hello_c.out
Node: node1

while attempting to start process rank 1.
-- 



Had anyone succeeded with the 'preload-binary' option with the similar 
settings? I assume this mpirun option should work when compiling 
openmpi with default  options? Anything I need to set?


--qing






Re: [OMPI users] Problem with mpirun -preload-binary option

2009-11-10 Thread Ralph Castain
It -should- work, but you need password-less ssh setup. See our FAQ  
for how to do that, if you are unfamiliar with it.



On Nov 10, 2009, at 2:02 PM, Qing Pang wrote:


I'm having problem getting the mpirun "preload-binary" option to work.

I'm using ubutu8.10 with openmpi 1.3.3, nodes connected with  
Ethernet cable.
If I copy the executable to client nodes using scp, then do mpirun,  
everything works.


But I really want to avoid the copying, so I tried the -preload- 
binary option.


When I typed the command on my master node as below (gordon-desktop  
is my master node, and gordon-laptop is the client node):


--
gordon_at_gordon-desktop:~/Desktop/openmpi-1.3.3/examples$  mpirun
-machinefile machine.linux -np 2 --preload-binary $(pwd)/hello_c.out
--

I got the following:

gordon_at_gordon-desktop's password:  (I entered my password here,  
why am I asked for the password? I am working under this account  
anyway)



WARNING: Remote peer ([[18118,0],1]) failed to preload a file.

Exit Status: 256
Local  File: /tmp/openmpi-sessions-gordon_at_gordon-laptop_0/18118/0/ 
hello_c.out

Remote File: /home/gordon/Desktop/openmpi-1.3.3/examples/hello_c.out
Command:
scp  gordon-desktop:/home/gordon/Desktop/openmpi-1.3.3/examples/ 
hello_c.out

/tmp/openmpi-sessions-gordon_at_gordon-laptop_0/18118/0/hello_c.out

Will continue attempting to launch the process(es).
--
--
mpirun was unable to launch the specified application as it could  
not access

or execute an executable:

Executable: /home/gordon/Desktop/openmpi-1.3.3/examples/hello_c.out
Node: node1

while attempting to start process rank 1.
--

Had anyone succeeded with the 'preload-binary' option with the  
similar settings? I assume this mpirun option should work when  
compiling openmpi with default  options? Anything I need to set?


--qing

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




[OMPI users] Problem with mpirun -preload-binary option

2009-11-10 Thread Qing Pang

I'm having a problem getting the mpirun "preload-binary" option to work.

I'm using Ubuntu 8.10 with openmpi 1.3.3, with the nodes connected by an
Ethernet cable.
If I copy the executable to the client nodes using scp and then do mpirun,
everything works.


But I really want to avoid the copying, so I tried the -preload-binary 
option.


When I typed the command on my master node as below (gordon-desktop is 
my master node, and gordon-laptop is the client node):


--
gordon_at_gordon-desktop:~/Desktop/openmpi-1.3.3/examples$  mpirun
-machinefile machine.linux -np 2 --preload-binary $(pwd)/hello_c.out
--

I got the following:

gordon_at_gordon-desktop's password:  (I entered my password here, why 
am I asked for the password? I am working under this account anyway)



WARNING: Remote peer ([[18118,0],1]) failed to preload a file.

Exit Status: 256
Local  File: 
/tmp/openmpi-sessions-gordon_at_gordon-laptop_0/18118/0/hello_c.out

Remote File: /home/gordon/Desktop/openmpi-1.3.3/examples/hello_c.out
Command:
 scp  
gordon-desktop:/home/gordon/Desktop/openmpi-1.3.3/examples/hello_c.out

/tmp/openmpi-sessions-gordon_at_gordon-laptop_0/18118/0/hello_c.out

Will continue attempting to launch the process(es).
--
--
mpirun was unable to launch the specified application as it could not 
access

or execute an executable:

Executable: /home/gordon/Desktop/openmpi-1.3.3/examples/hello_c.out
Node: node1

while attempting to start process rank 1.
--

Has anyone succeeded with the 'preload-binary' option with similar
settings? I assume this mpirun option should work when openmpi is compiled
with the default options? Is there anything I need to set?


--qing



Re: [OMPI users] problem calling mpirun from script invoked

2009-10-29 Thread Ralph Castain
Please see my earlier response. This proposed solution will work, but may be
unstable as it (a) removes all of OMPI's internal variables, some of which
are required; and (b) also removes all the variables that might be needed by
your system. For example, envars directing the use of specific transports,
or defining buffer sizes, will all be removed from the subsequent execution.

So it can work - but may lead to surprising results. Definitely a "user
beware" method.

:-)

Ralph


On Thu, Oct 29, 2009 at 2:34 AM, Per Madsen  wrote:

>  Could your problem be related to the MCA parameter "contamination"
> problem, where the child MPI process inherits MCA environment variables from
> the parent process? That problem still exists.
>
>
>
> Back in 2007 I was implementing a program that solves two large
> interrelated systems of equations (+200.000.000 eq.) using PCG iteration.
> The program starts to iterate on the first system until a certain degree of
> convergence, then the master node executes a shell script which starts the
> parallel solver on the second system. Again the iteration is to certain
> degree of convergence, some parameters from solving the second system are
> stored in files. After the solving of the second system, the stored
> parameters are used in the solver for the first system. Both before and
> after the master node makes the system call the nodes are synchronized via
> calls of MPI_BARRIER.
>
>
>
> The program was hanging when the master node executed the shell script.
>
>
>
> I found that it was because MCA environment variables was inherited form
> the parent process, and solved the problem by adding the following to the
> script starting the second MPI program:
>
>
>
> for i in $(env | grep OMPI_MCA | sed 's/=/ /' | awk '{print $1}')
> do
>   unset $i
> done
>
>
>
> Med venlig hilsen / Regards
>
> Per Madsen
> Seniorforsker / Senior scientist
>
> AARHUS UNIVERSITET / UNIVERSITY OF AARHUS
> Det Jordbrugsvidenskabelige Fakultet / Faculty of Agricultural Sciences
> Inst. for Genetik og Bioteknologi / Dept. of Genetics and Biotechnology
> Blichers Allé 20, P.O. BOX 50, DK-8830 Tjele
> Tel: +45 8999 1900  Direct: +45 8999 1216  E-mail: per.mad...@agrsci.dk  Web: www.agrsci.dk
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


[OMPI users] problem calling mpirun from script invoked

2009-10-29 Thread Per Madsen
Could your problem be related to the MCA parameter "contamination" problem,
where the child MPI process inherits MCA environment variables from the parent
process? That problem still exists.



Back in 2007 I was implementing a program that solves two large interrelated
systems of equations (200,000,000+ equations) using PCG iteration. The program
starts to iterate on the first system until a certain degree of convergence, then
the master node executes a shell script which starts the parallel solver on the
second system. Again the iteration runs to a certain degree of convergence, and
some parameters from solving the second system are stored in files. After the
second system is solved, the stored parameters are used in the solver for
the first system. Both before and after the master node makes the system call,
the nodes are synchronized via calls to MPI_BARRIER.



The program was hanging when the master node executed the shell script.



I found that it was because MCA environment variables were inherited from the
parent process, and solved the problem by adding the following to the script
starting the second MPI program:



for i in $(env | grep OMPI_MCA | sed 's/=/ /' | awk '{print $1}')
do
  unset $i
done
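
A more compact variant of the same scrubbing (bash-only; a sketch that relies
on bash's ${!prefix@} expansion and therefore only touches variables whose
names start with OMPI_MCA_):

    # unset every variable whose name begins with OMPI_MCA_
    unset "${!OMPI_MCA_@}"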




Med venlig hilsen / Regards

Per Madsen
Seniorforsker / Senior scientist



 AARHUS UNIVERSITET / UNIVERSITY OF AARHUS
Det Jordbrugsvidenskabelige Fakultet / Faculty of Agricultural Sciences
Inst. for Genetik og Bioteknologi / Dept. of Genetics and Biotechnology
Blichers Allé 20, P.O. BOX 50
DK-8830 Tjele

Tel: +45 8999 1900
Direct:  +45 8999 1216
Mobile:  +45
E-mail:  per.mad...@agrsci.dk
Web: www.agrsci.dk



Re: [OMPI users] problem calling mpirun from script invoked with mpirun

2009-10-28 Thread Luke Shulenburger
Thanks,

That's what I wanted to know.  And thanks for all the help!

Luke

On Wed, Oct 28, 2009 at 9:06 PM, Ralph Castain  wrote:
> I see. No, we don't copy your envars and ship them to remote nodes. Simple
> reason is that we don't know which ones we can safely move, and which would
> cause problems.
>
> However, we do provide a mechanism for you to tell us which envars to move.
> Just add:
>
> -x LD_LIBRARY_PATH
>
> to your mpirun cmd line and we will pickup that value and move it. You can
> use that option any number of times.
>
> I know it's a tad tedious if you have to move many of them, but it's the
> only safe mechanism we could devise.
>
> HTH
> Ralph


Re: [OMPI users] problem calling mpirun from script invoked with mpirun

2009-10-28 Thread Ralph Castain
I see. No, we don't copy your envars and ship them to remote nodes. The simple
reason is that we don't know which ones we can safely move, and which would
cause problems.

However, we do provide a mechanism for you to tell us which envars to move.
Just add:

-x LD_LIBRARY_PATH

to your mpirun cmd line and we will pick up that value and move it. You can
use that option any number of times.
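
For example (a sketch; ./mpiprogram is a placeholder for your own binary):

    # forward selected variables from the local environment to every rank
    mpirun -np 2 -x LD_LIBRARY_PATH -x PATH ./mpiprogram

    # -x can also set a value explicitly for the remote side
    mpirun -np 2 -x OMP_NUM_THREADS=4 ./mpiprogram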

I know it's a tad tedious if you have to move many of them, but it's the
only safe mechanism we could devise.

HTH
Ralph


On Wed, Oct 28, 2009 at 2:36 PM, Luke Shulenburger
wrote:

> My apologies for not being clear.  These variables are set in my
> environment, they just are not published to the other nodes in the
> cluster when the jobs are run through the scheduler.  At the moment,
> even though I can use mpirun to run jobs locally on the head node
> without touching my environment, if I use the scheduler I am forced to
> do something like source my bashrc in the job submission script to get
> them set.  I had always assumed that mpirun just copied my current
> environment variables to the nodes, but this does not seem to be
> happening in this case.
>
> Luke
>
> On Wed, Oct 28, 2009 at 4:30 PM, Ralph Castain  wrote:
> > Normally, one does simply set the ld_library_path in your environment to
> > point to the right thing. Alternatively, you could configure OMPI with
> >
> > --enable-mpirun-prefix-by-default
> >
> > This tells OMPI to automatically add the prefix you configured the system
> > with to your ld_library_path and path envars. It should solve your
> problem,
> > if you don't want to simply set those values in your environment anyway.
> >
> > Ralph
> >
> >
> > On Wed, Oct 28, 2009 at 2:10 PM, Luke Shulenburger <
> lshulenbur...@gmail.com>
> > wrote:
> >>
> >> Thanks for the quick reply.  This leads me to another issue I have
> >> been having with openmpi as it relates to sge.  The "tight
> >> integration" works where I do not have to give mpirun a hostfile when
> >> I use the scheduler, but it does not seem to be passing on my
> >> environment variables.  Specifically because I used intel compilers to
> >> compile openmpi, I have to be sure to set the LD_LIBRARY_PATH
> >> correctly in my job submission script or openmpi will not run (giving
> >> the error discussed in the FAQ).  Where I am a little lost is whether
> >> this is a problem with the way I built openmpi or whether it is a
> >> configuration problem with sge.
> >>
> >> This may be unrelated to my previous problem, but the similarities
> >> with the environment variables made me think of it.
> >>
> >> Thanks for your consideration,
> >> Luke Shulenburger
> >> Geophysical Laboratory
> >> Carnegie Institution of Washington
> >>
> >> On Wed, Oct 28, 2009 at 3:48 PM, Ralph Castain 
> wrote:
> >> > I'm afraid we have never really supported this kind of nested
> >> > invocations of
> >> > mpirun. If it works with any version of OMPI, it is totally a fluke -
> it
> >> > might work one time, and then fail the next.
> >> >
> >> > The problem is that we pass envars to the launched processes to
> control
> >> > their behavior, and these conflict with what mpirun needs. We have
> tried
> >> > various scrubbing mechanisms (i.e., having mpirun start out by
> scrubbing
> >> > the
> >> > environment of envars that would have come from the initial mpirun,
> but
> >> > they
> >> > all have the unfortunate possibility of removing parameters provided
> by
> >> > the
> >> > user - and that can cause its own problems.
> >> >
> >> > I don't know if we will ever support nested operations - occasionally,
> I
> >> > do
> >> > give it some thought, but have yet to find a foolproof solution.
> >> >
> >> > Ralph
> >> >
> >> >
> >> > On Wed, Oct 28, 2009 at 1:11 PM, Luke Shulenburger
> >> > 
> >> > wrote:
> >> >>
> >> >> Hello,
> >> >> I am having trouble with a script that calls mpi.  Basically my
> >> >> problem distills to wanting to call a script with:
> >> >>
> >> >> mpirun -np # ./script.sh
> >> >>
> >> >> where script.sh looks like:
> >> >> #!/bin/bash
> >> >> mpirun -np 2 ./mpiprogram
> >> >>
> >> >> Whenever I invoke script.sh normally (as ./script.sh for instance) it
> >> >> works fine, but if I do mpirun -np 2 ./script.sh I get the following
> >> >> error:
> >> >>
> >> >> [ppv.stanford.edu:08814] [[27860,1],0] ORTE_ERROR_LOG: A message is
> >> >> attempting to be sent to a process whose contact information is
> >> >> unknown in file rml_oob_send.c at line 105
> >> >> [ppv.stanford.edu:08814] [[27860,1],0] could not get route to
> >> >> [[INVALID],INVALID]
> >> >> [ppv.stanford.edu:08814] [[27860,1],0] ORTE_ERROR_LOG: A message is
> >> >> attempting to be sent to a process whose contact information is
> >> >> unknown in file base/plm_base_proxy.c at line 86
> >> >>
> >> >> I have also tried running with mpirun -d to get some debugging info
> >> >> and it appears that the proctable is 

Re: [OMPI users] problem calling mpirun from script invoked with mpirun

2009-10-28 Thread Luke Shulenburger
My apologies for not being clear.  These variables are set in my
environment, they just are not published to the other nodes in the
cluster when the jobs are run through the scheduler.  At the moment,
even though I can use mpirun to run jobs locally on the head node
without touching my environment, if I use the scheduler I am forced to
do something like source my bashrc in the job submission script to get
them set.  I had always assumed that mpirun just copied my current
environment variables to the nodes, but this does not seem to be
happening in this case.

Luke

On Wed, Oct 28, 2009 at 4:30 PM, Ralph Castain  wrote:
> Normally, one does simply set the ld_library_path in your environment to
> point to the right thing. Alternatively, you could configure OMPI with
>
> --enable-mpirun-prefix-by-default
>
> This tells OMPI to automatically add the prefix you configured the system
> with to your ld_library_path and path envars. It should solve your problem,
> if you don't want to simply set those values in your environment anyway.
>
> Ralph
>
>
> On Wed, Oct 28, 2009 at 2:10 PM, Luke Shulenburger 
> wrote:
>>
>> Thanks for the quick reply.  This leads me to another issue I have
>> been having with openmpi as it relates to sge.  The "tight
>> integration" works where I do not have to give mpirun a hostfile when
>> I use the scheduler, but it does not seem to be passing on my
>> environment variables.  Specifically because I used intel compilers to
>> compile openmpi, I have to be sure to set the LD_LIBRARY_PATH
>> correctly in my job submission script or openmpi will not run (giving
>> the error discussed in the FAQ).  Where I am a little lost is whether
>> this is a problem with the way I built openmpi or whether it is a
>> configuration problem with sge.
>>
>> This may be unrelated to my previous problem, but the similarities
>> with the environment variables made me think of it.
>>
>> Thanks for your consideration,
>> Luke Shulenburger
>> Geophysical Laboratory
>> Carnegie Institution of Washington
>>
>> On Wed, Oct 28, 2009 at 3:48 PM, Ralph Castain  wrote:
>> > I'm afraid we have never really supported this kind of nested
>> > invocations of
>> > mpirun. If it works with any version of OMPI, it is totally a fluke - it
>> > might work one time, and then fail the next.
>> >
>> > The problem is that we pass envars to the launched processes to control
>> > their behavior, and these conflict with what mpirun needs. We have tried
>> > various scrubbing mechanisms (i.e., having mpirun start out by scrubbing
>> > the
>> > environment of envars that would have come from the initial mpirun, but
>> > they
>> > all have the unfortunate possibility of removing parameters provided by
>> > the
>> > user - and that can cause its own problems.
>> >
>> > I don't know if we will ever support nested operations - occasionally, I
>> > do
>> > give it some thought, but have yet to find a foolproof solution.
>> >
>> > Ralph
>> >
>> >
>> > On Wed, Oct 28, 2009 at 1:11 PM, Luke Shulenburger
>> > 
>> > wrote:
>> >>
>> >> Hello,
>> >> I am having trouble with a script that calls mpi.  Basically my
>> >> problem distills to wanting to call a script with:
>> >>
>> >> mpirun -np # ./script.sh
>> >>
>> >> where script.sh looks like:
>> >> #!/bin/bash
>> >> mpirun -np 2 ./mpiprogram
>> >>
>> >> Whenever I invoke script.sh normally (as ./script.sh for instance) it
>> >> works fine, but if I do mpirun -np 2 ./script.sh I get the following
>> >> error:
>> >>
>> >> [ppv.stanford.edu:08814] [[27860,1],0] ORTE_ERROR_LOG: A message is
>> >> attempting to be sent to a process whose contact information is
>> >> unknown in file rml_oob_send.c at line 105
>> >> [ppv.stanford.edu:08814] [[27860,1],0] could not get route to
>> >> [[INVALID],INVALID]
>> >> [ppv.stanford.edu:08814] [[27860,1],0] ORTE_ERROR_LOG: A message is
>> >> attempting to be sent to a process whose contact information is
>> >> unknown in file base/plm_base_proxy.c at line 86
>> >>
>> >> I have also tried running with mpirun -d to get some debugging info
>> >> and it appears that the proctable is not being created for the second
>> >> mpirun.  The command hangs like so:
>> >>
>> >> [ppv.stanford.edu:08823] procdir:
>> >> /tmp/openmpi-sessions-sluke@ppv.stanford.edu_0/27855/0/0
>> >> [ppv.stanford.edu:08823] jobdir:
>> >> /tmp/openmpi-sessions-sluke@ppv.stanford.edu_0/27855/0
>> >> [ppv.stanford.edu:08823] top: openmpi-sessions-sluke@ppv.stanford.edu_0
>> >> [ppv.stanford.edu:08823] tmp: /tmp
>> >> [ppv.stanford.edu:08823] [[27855,0],0] node[0].name ppv daemon 0 arch
>> >> ffc91200
>> >> [ppv.stanford.edu:08823] Info: Setting up debugger process table for
>> >> applications
>> >>  MPIR_being_debugged = 0
>> >>  MPIR_debug_state = 1
>> >>  MPIR_partial_attach_ok = 1
>> >>  MPIR_i_am_starter = 0
>> >>  MPIR_proctable_size = 1
>> >>  MPIR_proctable:
>> >>    (i, host, exe, pid) = (0, 

Re: [OMPI users] problem calling mpirun from script invoked with mpirun

2009-10-28 Thread Ralph Castain
Normally, you would simply set LD_LIBRARY_PATH in your environment to
point to the right thing. Alternatively, you could configure OMPI with

--enable-mpirun-prefix-by-default

This tells OMPI to automatically add the prefix you configured the system
with to your LD_LIBRARY_PATH and PATH envars. It should solve your problem
if you don't want to simply set those values in your environment anyway.
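
For illustration only - the install prefix below is just a placeholder for
wherever you actually put OMPI - a build configured along these lines bakes
that behavior in:

./configure --prefix=/opt/openmpi-1.3.3 \
    --enable-mpirun-prefix-by-default --with-sge
make all install

With that flag, every mpirun behaves as if you had passed
--prefix /opt/openmpi-1.3.3 on the command line, so the shells started on
the remote nodes pick up the matching bin/ and lib/ directories without any
extra settings in your job script.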

Ralph


On Wed, Oct 28, 2009 at 2:10 PM, Luke Shulenburger wrote:

> Thanks for the quick reply.  This leads me to another issue I have
> been having with openmpi as it relates to sge.  The "tight
> integration" works in that I do not have to give mpirun a hostfile when
> I use the scheduler, but it does not seem to be passing on my
> environment variables.  Specifically because I used intel compilers to
> compile openmpi, I have to be sure to set the LD_LIBRARY_PATH
> correctly in my job submission script or openmpi will not run (giving
> the error discussed in the FAQ).  Where I am a little lost is whether
> this is a problem with the way I built openmpi or whether it is a
> configuration problem with sge.
>
> This may be unrelated to my previous problem, but the similarities
> with the environment variables made me think of it.
>
> Thanks for your consideration,
> Luke Shulenburger
> Geophysical Laboratory
> Carnegie Institution of Washington
>
> On Wed, Oct 28, 2009 at 3:48 PM, Ralph Castain  wrote:
> > I'm afraid we have never really supported this kind of nested invocation
> > of mpirun. If it works with any version of OMPI, it is totally a fluke - it
> > might work one time, and then fail the next.
> >
> > The problem is that we pass envars to the launched processes to control
> > their behavior, and these conflict with what mpirun needs. We have tried
> > various scrubbing mechanisms (i.e., having mpirun start out by scrubbing
> > the environment of envars that would have come from the initial mpirun),
> > but they all have the unfortunate possibility of removing parameters
> > provided by the user - and that can cause its own problems.
> >
> > I don't know if we will ever support nested operations - occasionally, I
> > do give it some thought, but have yet to find a foolproof solution.
> >
> > Ralph
> >
> >
> > On Wed, Oct 28, 2009 at 1:11 PM, Luke Shulenburger <lshulenbur...@gmail.com> wrote:
> >>
> >> Hello,
> >> I am having trouble with a script that calls mpi.  Basically my
> >> problem distills to wanting to call a script with:
> >>
> >> mpirun -np # ./script.sh
> >>
> >> where script.sh looks like:
> >> #!/bin/bash
> >> mpirun -np 2 ./mpiprogram
> >>
> >> Whenever I invoke script.sh normally (as ./script.sh for instance) it
> >> works fine, but if I do mpirun -np 2 ./script.sh I get the following
> >> error:
> >>
> >> [ppv.stanford.edu:08814] [[27860,1],0] ORTE_ERROR_LOG: A message is
> >> attempting to be sent to a process whose contact information is
> >> unknown in file rml_oob_send.c at line 105
> >> [ppv.stanford.edu:08814] [[27860,1],0] could not get route to
> >> [[INVALID],INVALID]
> >> [ppv.stanford.edu:08814] [[27860,1],0] ORTE_ERROR_LOG: A message is
> >> attempting to be sent to a process whose contact information is
> >> unknown in file base/plm_base_proxy.c at line 86
> >>
> >> I have also tried running with mpirun -d to get some debugging info
> >> and it appears that the proctable is not being created for the second
> >> mpirun.  The command hangs like so:
> >>
> >> [ppv.stanford.edu:08823] procdir:
> >> /tmp/openmpi-sessions-sluke@ppv.stanford.edu_0/27855/0/0
> >> [ppv.stanford.edu:08823] jobdir:
> >> /tmp/openmpi-sessions-sluke@ppv.stanford.edu_0/27855/0
> >> [ppv.stanford.edu:08823] top: openmpi-sessions-sluke@ppv.stanford.edu_0
> >> [ppv.stanford.edu:08823] tmp: /tmp
> >> [ppv.stanford.edu:08823] [[27855,0],0] node[0].name ppv daemon 0 arch
> >> ffc91200
> >> [ppv.stanford.edu:08823] Info: Setting up debugger process table for
> >> applications
> >>  MPIR_being_debugged = 0
> >>  MPIR_debug_state = 1
> >>  MPIR_partial_attach_ok = 1
> >>  MPIR_i_am_starter = 0
> >>  MPIR_proctable_size = 1
> >>  MPIR_proctable:
> >>(i, host, exe, pid) = (0, ppv.stanford.edu,
> >> /home/sluke/maintenance/openmpi-1.3.3/examples/./shell.sh, 8824)
> >> [ppv.stanford.edu:08825] procdir:
> >> /tmp/openmpi-sessions-sluke@ppv.stanford.edu_0/27855/1/0
> >> [ppv.stanford.edu:08825] jobdir:
> >> /tmp/openmpi-sessions-sluke@ppv.stanford.edu_0/27855/1
> >> [ppv.stanford.edu:08825] top: openmpi-sessions-sluke@ppv.stanford.edu_0
> >> [ppv.stanford.edu:08825] tmp: /tmp
> >> [ppv.stanford.edu:08825] [[27855,1],0] ORTE_ERROR_LOG: A message is
> >> attempting to be sent to a process whose contact information is
> >> unknown in file rml_oob_send.c at line 105
> >> [ppv.stanford.edu:08825] [[27855,1],0] could not get route to
> >> [[INVALID],INVALID]
> >> [ppv.stanford.edu:08825] [[27855,1],0] ORTE_ERROR_LOG: A message is
> >> attempting to be sent 

Re: [OMPI users] problem calling mpirun from script invoked with mpirun

2009-10-28 Thread Luke Shulenburger
Thanks for the quick reply.  This leads me to another issue I have
been having with openmpi as it relates to sge.  The "tight
integration" works in that I do not have to give mpirun a hostfile when
I use the scheduler, but it does not seem to be passing on my
environment variables.  Specifically because I used intel compilers to
compile openmpi, I have to be sure to set the LD_LIBRARY_PATH
correctly in my job submission script or openmpi will not run (giving
the error discussed in the FAQ).  Where I am a little lost is whether
this is a problem with the way I built openmpi or whether it is a
configuration problem with sge.
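
For concreteness, my submission script is essentially the sketch below; the
parallel environment name and the library paths are placeholders for this
cluster's setup and the Intel 11.1 runtime:

#!/bin/bash
#$ -cwd
#$ -pe orte 12
export LD_LIBRARY_PATH=/home/sluke/lib:/opt/intel/lib/intel64:$LD_LIBRARY_PATH
mpirun ./mpiprogram

If I drop the export line, openmpi will not run and I get the error from
the FAQ.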

This may be unrelated to my previous problem, but the similarities
with the environment variables made me think of it.

Thanks for your consideration,
Luke Shulenburger
Geophysical Laboratory
Carnegie Institution of Washington

On Wed, Oct 28, 2009 at 3:48 PM, Ralph Castain  wrote:
> I'm afraid we have never really supported this kind of nested invocation of
> mpirun. If it works with any version of OMPI, it is totally a fluke - it
> might work one time, and then fail the next.
>
> The problem is that we pass envars to the launched processes to control
> their behavior, and these conflict with what mpirun needs. We have tried
> various scrubbing mechanisms (i.e., having mpirun start out by scrubbing the
> environment of envars that would have come from the initial mpirun), but they
> all have the unfortunate possibility of removing parameters provided by the
> user - and that can cause its own problems.
>
> I don't know if we will ever support nested operations - occasionally, I do
> give it some thought, but have yet to find a foolproof solution.
>
> Ralph
>
>
> On Wed, Oct 28, 2009 at 1:11 PM, Luke Shulenburger wrote:
>>
>> Hello,
>> I am having trouble with a script that calls mpi.  Basically my
>> problem distills to wanting to call a script with:
>>
>> mpirun -np # ./script.sh
>>
>> where script.sh looks like:
>> #!/bin/bash
>> mpirun -np 2 ./mpiprogram
>>
>> Whenever I invoke script.sh normally (as ./script.sh for instance) it
>> works fine, but if I do mpirun -np 2 ./script.sh I get the following
>> error:
>>
>> [ppv.stanford.edu:08814] [[27860,1],0] ORTE_ERROR_LOG: A message is
>> attempting to be sent to a process whose contact information is
>> unknown in file rml_oob_send.c at line 105
>> [ppv.stanford.edu:08814] [[27860,1],0] could not get route to
>> [[INVALID],INVALID]
>> [ppv.stanford.edu:08814] [[27860,1],0] ORTE_ERROR_LOG: A message is
>> attempting to be sent to a process whose contact information is
>> unknown in file base/plm_base_proxy.c at line 86
>>
>> I have also tried running with mpirun -d to get some debugging info
>> and it appears that the proctable is not being created for the second
>> mpirun.  The command hangs like so:
>>
>> [ppv.stanford.edu:08823] procdir:
>> /tmp/openmpi-sessions-sluke@ppv.stanford.edu_0/27855/0/0
>> [ppv.stanford.edu:08823] jobdir:
>> /tmp/openmpi-sessions-sluke@ppv.stanford.edu_0/27855/0
>> [ppv.stanford.edu:08823] top: openmpi-sessions-sluke@ppv.stanford.edu_0
>> [ppv.stanford.edu:08823] tmp: /tmp
>> [ppv.stanford.edu:08823] [[27855,0],0] node[0].name ppv daemon 0 arch
>> ffc91200
>> [ppv.stanford.edu:08823] Info: Setting up debugger process table for
>> applications
>>  MPIR_being_debugged = 0
>>  MPIR_debug_state = 1
>>  MPIR_partial_attach_ok = 1
>>  MPIR_i_am_starter = 0
>>  MPIR_proctable_size = 1
>>  MPIR_proctable:
>>    (i, host, exe, pid) = (0, ppv.stanford.edu,
>> /home/sluke/maintenance/openmpi-1.3.3/examples/./shell.sh, 8824)
>> [ppv.stanford.edu:08825] procdir:
>> /tmp/openmpi-sessions-sluke@ppv.stanford.edu_0/27855/1/0
>> [ppv.stanford.edu:08825] jobdir:
>> /tmp/openmpi-sessions-sluke@ppv.stanford.edu_0/27855/1
>> [ppv.stanford.edu:08825] top: openmpi-sessions-sluke@ppv.stanford.edu_0
>> [ppv.stanford.edu:08825] tmp: /tmp
>> [ppv.stanford.edu:08825] [[27855,1],0] ORTE_ERROR_LOG: A message is
>> attempting to be sent to a process whose contact information is
>> unknown in file rml_oob_send.c at line 105
>> [ppv.stanford.edu:08825] [[27855,1],0] could not get route to
>> [[INVALID],INVALID]
>> [ppv.stanford.edu:08825] [[27855,1],0] ORTE_ERROR_LOG: A message is
>> attempting to be sent to a process whose contact information is
>> unknown in file base/plm_base_proxy.c at line 86
>> [ppv.stanford.edu:08825] Info: Setting up debugger process table for
>> applications
>>  MPIR_being_debugged = 0
>>  MPIR_debug_state = 1
>>  MPIR_partial_attach_ok = 1
>>  MPIR_i_am_starter = 0
>>  MPIR_proctable_size = 0
>>  MPIR_proctable:
>>
>>
>> In this case, it does not matter what the ultimate mpiprogram I try to
>> run is; the shell script fails in the same way regardless (I've tried
>> the hello_f90 executable from the openmpi examples directory).  Here
>> are some details of my setup:
>>
>> I have built openmpi 1.3.3 with the intel fortran and c compilers
>> (version 11.1).  The machine 

Re: [OMPI users] problem calling mpirun from script invoked with mpirun

2009-10-28 Thread Ralph Castain
I'm afraid we have never really supported this kind of nested invocation of
mpirun. If it works with any version of OMPI, it is totally a fluke - it
might work one time, and then fail the next.

The problem is that we pass envars to the launched processes to control
their behavior, and these conflict with what mpirun needs. We have tried
various scrubbing mechanisms (i.e., having mpirun start out by scrubbing the
environment of envars that would have come from the initial mpirun), but they
all have the unfortunate possibility of removing parameters provided by the
user - and that can cause its own problems.
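
To see the sort of thing I mean, something like this - the exact variable
names differ from release to release - lists the control envars a launched
process inherits:

mpirun -np 1 printenv | grep '^OMPI_'

Those are the values an inner mpirun would inherit from the outer one, and
that is where the confusion starts.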

I don't know if we will ever support nested operations - occasionally, I do
give it some thought, but have yet to find a foolproof solution.
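
For what it's worth, the hand-rolled scrubbing people usually try looks
roughly like the sketch below - strip every OMPI_-prefixed variable inside
script.sh before it calls the inner mpirun. It may get you past the
ORTE_ERROR_LOG messages in some cases, but it also throws away any OMPI_MCA_
parameters you set yourself, and for the reasons above it is not something
we support:

#!/bin/bash
# unsupported workaround sketch: drop the envars injected by the outer mpirun
for v in $(env | grep '^OMPI_' | cut -d= -f1); do
    unset "$v"
done
mpirun -np 2 ./mpiprogram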

Ralph


On Wed, Oct 28, 2009 at 1:11 PM, Luke Shulenburger wrote:

> Hello,
> I am having trouble with a script that calls mpi.  Basically my
> problem distills to wanting to call a script with:
>
> mpirun -np # ./script.sh
>
> where script.sh looks like:
> #!/bin/bash
> mpirun -np 2 ./mpiprogram
>
> Whenever I invoke script.sh normally (as ./script.sh for instance) it
> works fine, but if I do mpirun -np 2 ./script.sh I get the following
> error:
>
> [ppv.stanford.edu:08814] [[27860,1],0] ORTE_ERROR_LOG: A message is
> attempting to be sent to a process whose contact information is
> unknown in file rml_oob_send.c at line 105
> [ppv.stanford.edu:08814] [[27860,1],0] could not get route to
> [[INVALID],INVALID]
> [ppv.stanford.edu:08814] [[27860,1],0] ORTE_ERROR_LOG: A message is
> attempting to be sent to a process whose contact information is
> unknown in file base/plm_base_proxy.c at line 86
>
> I have also tried running with mpirun -d to get some debugging info
> and it appears that the proctable is not being created for the second
> mpirun.  The command hangs like so:
>
> [ppv.stanford.edu:08823] procdir:
> /tmp/openmpi-sessions-sluke@ppv.stanford.edu_0/27855/0/0
> [ppv.stanford.edu:08823] jobdir:
> /tmp/openmpi-sessions-sluke@ppv.stanford.edu_0/27855/0
> [ppv.stanford.edu:08823] top: openmpi-sessions-sluke@ppv.stanford.edu_0
> [ppv.stanford.edu:08823] tmp: /tmp
> [ppv.stanford.edu:08823] [[27855,0],0] node[0].name ppv daemon 0 arch
> ffc91200
> [ppv.stanford.edu:08823] Info: Setting up debugger process table for
> applications
>  MPIR_being_debugged = 0
>  MPIR_debug_state = 1
>  MPIR_partial_attach_ok = 1
>  MPIR_i_am_starter = 0
>  MPIR_proctable_size = 1
>  MPIR_proctable:
>(i, host, exe, pid) = (0, ppv.stanford.edu,
> /home/sluke/maintenance/openmpi-1.3.3/examples/./shell.sh, 8824)
> [ppv.stanford.edu:08825] procdir:
> /tmp/openmpi-sessions-sluke@ppv.stanford.edu_0/27855/1/0
> [ppv.stanford.edu:08825] jobdir:
> /tmp/openmpi-sessions-sluke@ppv.stanford.edu_0/27855/1
> [ppv.stanford.edu:08825] top: openmpi-sessions-sluke@ppv.stanford.edu_0
> [ppv.stanford.edu:08825] tmp: /tmp
> [ppv.stanford.edu:08825] [[27855,1],0] ORTE_ERROR_LOG: A message is
> attempting to be sent to a process whose contact information is
> unknown in file rml_oob_send.c at line 105
> [ppv.stanford.edu:08825] [[27855,1],0] could not get route to
> [[INVALID],INVALID]
> [ppv.stanford.edu:08825] [[27855,1],0] ORTE_ERROR_LOG: A message is
> attempting to be sent to a process whose contact information is
> unknown in file base/plm_base_proxy.c at line 86
> [ppv.stanford.edu:08825] Info: Setting up debugger process table for
> applications
>  MPIR_being_debugged = 0
>  MPIR_debug_state = 1
>  MPIR_partial_attach_ok = 1
>  MPIR_i_am_starter = 0
>  MPIR_proctable_size = 0
>  MPIR_proctable:
>
>
> In this case, it does not matter what the ultimate mpiprogram I try to
> run is; the shell script fails in the same way regardless (I've tried
> the hello_f90 executable from the openmpi examples directory).  Here
> are some details of my setup:
>
> I have built openmpi 1.3.3 with the intel fortran and c compilers
> (version 11.1).  The machine uses rocks with the SGE scheduler, so I
> have run configure with ./configure --prefix=/home/sluke --with-sge,
> however this problem persists even if I am running on the head node
> outside of the scheduler.  I am attaching the resulting config.log to
> this email as well as output to ompi_info --all and ifconfig.  I hope
> this gives the experts on the list enough to go from, but I will be
> happy to provide any more information that might be helpful.
>
> Luke Shulenburger
> Geophysical Laboratory
> Carnegie Institution of Washington
>
>
> PS I have tried this on a machine with openmpi-1.2.6 and cannot
> reproduce the error, however on a second machine with openmpi-1.3.2 I
> have the same problem.
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>