Re: [OMPI devel] mpirun --prefix question

2007-03-22 Thread Jeff Squyres
FWIW, I believe that we had intended --prefix to handle simple cases  
which is why this probably doesn't work for you.  But as long as the  
different prefixes are specified for different nodes, it could  
probably be made to work.


Which launcher are you using this with?



On Mar 21, 2007, at 11:36 PM, Ralph Castain wrote:


Yo David

What system are you running this on? RoadRunner? If so, I can take a look at
"fixing" it for you tomorrow (Thurs).

Ralph


On 3/21/07 10:17 AM, "David Daniel"  wrote:


I'm experimenting with heterogeneous applications (x86_64 <-->
ppc64), where the systems share the file system where Open MPI is
installed.

What I would like to be able to do is something like this:

mpirun --np 1 --host host-x86_64 --prefix /opt/ompi/x86_64
a.out.x86_64 : --np 1 --host host-ppc64 --prefix /opt/ompi/ppc64
a.out.ppc64

Unfortunately it looks as if the second --prefix is always ignored.
My guess is that orte_app_context_t::prefix_dir is getting set, but
only the 0th app context is ever consulted (except in the dynamic
process stuff where I do see a loop over the app context array).

I can of course work around it with startup scripts, but a command
line solution would be attractive.
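
(By "startup scripts" I mean something along these lines in each node's
non-interactive shell startup, keyed on the local architecture - just a
sketch, reusing the install locations from the mpirun line above:

case `uname -m` in
    x86_64) OMPI=/opt/ompi/x86_64 ;;
    ppc64)  OMPI=/opt/ompi/ppc64 ;;
esac
export PATH=$OMPI/bin:$PATH
export LD_LIBRARY_PATH=$OMPI/lib:$LD_LIBRARY_PATH

so that the ssh-launched daemons and the application should both pick up
the matching installation without any --prefix at all.)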

This is with openmpi-1.2.

Thanks, David

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] mpirun --prefix question

2007-03-22 Thread David Daniel

This is a development system for roadrunner using ssh.

David

On Mar 22, 2007, at 5:19 AM, Jeff Squyres wrote:


FWIW, I believe that we had intended --prefix to handle simple cases
which is why this probably doesn't work for you.  But as long as the
different prefixes are specified for different nodes, it could
probably be made to work.

Which launcher are you using this with?



On Mar 21, 2007, at 11:36 PM, Ralph Castain wrote:


Yo David

What system are you running this on? RoadRunner? If so, I can take
a look at
"fixing" it for you tomorrow (Thurs).

Ralph


On 3/21/07 10:17 AM, "David Daniel"  wrote:


I'm experimenting with heterogeneous applications (x86_64 <-->
ppc64), where the systems share the file system where Open MPI is
installed.

What I would like to be able to do is something like this:

mpirun --np 1 --host host-x86_64 --prefix /opt/ompi/x86_64
a.out.x86_64 : --np 1 --host host-ppc64 --prefix /opt/ompi/ppc64
a.out.ppc64

Unfortunately it looks as if the second --prefix is always ignored.
My guess is that orte_app_context_t::prefix_dir is getting set, but
only the 0th app context is ever consulted (except in the dynamic
process stuff where I do see a loop over the app context array).

I can of course work around it with startup scripts, but a command
line solution would be attractive.

This is with openmpi-1.2.

Thanks, David

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Jeff Squyres
Cisco Systems

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


--
David Daniel 
Computer Science for High-Performance Computing (CCS-1)






Re: [OMPI devel] mpirun --prefix question

2007-03-22 Thread Ralph Castain
We had a nice chat about this on the OpenRTE telecon this morning. The
question of what to do with multiple prefixes has been a long-running issue,
most recently captured in Trac bug report #497. The problem is that prefix
is intended to tell us where to find the ORTE/OMPI executables, and
therefore is associated with a node - not an app_context. What we haven't
been able to define is an appropriate notation that a user can exploit to
tell us the association.

This issue has arisen on several occasions where either (a) users have
heterogeneous clusters with a common file system, so the prefix must be
adjusted on each *type* of node to point to the correct type of binary; or
(b) for whatever reason, typically on rsh/ssh clusters, users have installed
the binaries in different locations on some of the nodes. In this latter
case, the reports have been from homogeneous clusters, so the *type* of
binary was never the issue - it just wasn't located where we expected.

Sun's solution is (I believe) what most of us would expect - they locate
their executables in the same relative location on all their nodes. The
binary in that location is correct for that local architecture. This
requires, though, that the "prefix" location not be on a common file system.

Unfortunately, that isn't the case with LANL's roadrunner, nor can we expect
that everyone will follow that sensible approach :-). So we need a notation
to support the "exception" case where someone needs to truly specify prefix
versus node(s).

We discussed a number of options, including auto-detecting the local arch
and appending it to the specified "prefix" and several others. After
discussing them, those of us on the call decided that adding a field to the
hostfile that specifies the prefix to use on that host would be the best
solution. This could be done on a cluster-level basis, so - although it is
annoying to create the data file - at least it would only have to be done
once.
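
Purely for illustration, the hostfile entries might end up looking something
like this - the "prefix=" keyword is hypothetical, since we have not settled
on the exact notation yet:

host-x86_64 slots=4 prefix=/opt/ompi/x86_64
host-ppc64  slots=4 prefix=/opt/ompi/ppc64

presumably with any --prefix given on the mpirun command line acting as the
default for hosts that don't specify one.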

Again, this is the exception case, so requiring a little inconvenience seems
a reasonable thing to do.

Anyone have heartburn and/or other suggestions? If not, we might start to
play with this next week. We would have to do some small modifications to
the RAS, RMAPS, and PLS components to ensure that any multi-prefix info gets
correctly propagated and used across all platforms for consistent behavior.

Ralph


On 3/22/07 9:11 AM, "David Daniel"  wrote:

> This is a development system for roadrunner using ssh.
> 
> David
> 
> On Mar 22, 2007, at 5:19 AM, Jeff Squyres wrote:
> 
>> FWIW, I believe that we had intended --prefix to handle simple cases
>> which is why this probably doesn't work for you.  But as long as the
>> different prefixes are specified for different nodes, it could
>> probably be made to work.
>> 
>> Which launcher are you using this with?
>> 
>> 
>> 
>> On Mar 21, 2007, at 11:36 PM, Ralph Castain wrote:
>> 
>>> Yo David
>>> 
>>> What system are you running this on? RoadRunner? If so, I can take
>>> a look at
>>> "fixing" it for you tomorrow (Thurs).
>>> 
>>> Ralph
>>> 
>>> 
>>> On 3/21/07 10:17 AM, "David Daniel"  wrote:
>>> 
 I'm experimenting with heterogeneous applications (x86_64 <-->
 ppc64), where the systems share the file system where Open MPI is
 installed.
 
 What I would like to be able to do is something like this:
 
 mpirun --np 1 --host host-x86_64 --prefix /opt/ompi/x86_64
 a.out.x86_64 : --np 1 --host host-ppc64 --prefix /opt/ompi/ppc64
 a.out.ppc64
 
 Unfortunately it looks as if the second --prefix is always ignored.
 My guess is that orte_app_context_t::prefix_dir is getting set, but
 only the 0th app context is ever consulted (except in the dynamic
 process stuff where I do see a loop over the app context array).
 
 I can of course work around it with startup scripts, but a command
 line solution would be attractive.
 
 This is with openmpi-1.2.
 
 Thanks, David
 
 ___
 devel mailing list
 de...@open-mpi.org
 http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> 
>>> 
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> 
>> 
>> -- 
>> Jeff Squyres
>> Cisco Systems
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> --
> David Daniel 
> Computer Science for High-Performance Computing (CCS-1)
> 
> 
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




[OMPI devel] RH Enterprise Linux issue

2007-03-22 Thread Greg Watson

Is this a known problem? Building ompi 1.2 on RHEL4:

./configure --with-devel-headers --without-threads

(actually tried without '--without-threads' too, but no change)

$ mpirun -np 2 test
[beth:06029] *** Process received signal ***
[beth:06029] Signal: Segmentation fault (11)
[beth:06029] Signal code: Address not mapped (1)
[beth:06029] Failing at address: 0x2e342e33
[beth:06029] [ 0] /lib/tls/libc.so.6 [0x21b890]
[beth:06029] [ 1] /usr/local/lib/libopen-rte.so.0(orte_init_stage1+0x293) [0xb7fc50cb]
[beth:06029] [ 2] /usr/local/lib/libopen-rte.so.0(orte_system_init+0x1e) [0xb7fc84be]
[beth:06029] [ 3] /usr/local/lib/libopen-rte.so.0(orte_init+0x6a) [0xb7fc4cee]
[beth:06029] [ 4] mpirun(orterun+0x14b) [0x8049ecb]
[beth:06029] [ 5] mpirun(main+0x2a) [0x8049d7a]
[beth:06029] [ 6] /lib/tls/libc.so.6(__libc_start_main+0xd3) [0x208de3]
[beth:06029] [ 7] mpirun [0x8049cc9]
[beth:06029] *** End of error message ***
Segmentation fault

Thanks,

Greg


Re: [OMPI devel] RH Enterprise Linux issue

2007-03-22 Thread Greg Watson

Oh, and this is a single x86 machine. Just trying to launch locally.

$ uname -a
Linux  2.6.9-42.0.2.ELsmp #1 SMP Thu Aug 17 18:00:32 EDT 2006 i686 i686 i386 GNU/Linux


Greg

On Mar 22, 2007, at 12:17 PM, Greg Watson wrote:


Is this a known problem? Building ompi 1.2 on RHEL4:

./configure --with-devel-headers --without-threads

(actually tried without '--without-threads' too, but no change)

$ mpirun -np 2 test
[beth:06029] *** Process received signal ***
[beth:06029] Signal: Segmentation fault (11)
[beth:06029] Signal code: Address not mapped (1)
[beth:06029] Failing at address: 0x2e342e33
[beth:06029] [ 0] /lib/tls/libc.so.6 [0x21b890]
[beth:06029] [ 1] /usr/local/lib/libopen-rte.so.0(orte_init_stage1 
+0x293) [0xb7fc50cb]
[beth:06029] [ 2] /usr/local/lib/libopen-rte.so.0(orte_system_init 
+0x1e) [0xb7fc84be]
[beth:06029] [ 3] /usr/local/lib/libopen-rte.so.0(orte_init+0x6a)  
[0xb7fc4cee]

[beth:06029] [ 4] mpirun(orterun+0x14b) [0x8049ecb]
[beth:06029] [ 5] mpirun(main+0x2a) [0x8049d7a]
[beth:06029] [ 6] /lib/tls/libc.so.6(__libc_start_main+0xd3)  
[0x208de3]

[beth:06029] [ 7] mpirun [0x8049cc9]
[beth:06029] *** End of error message ***
Segmentation fault

Thanks,

Greg




Re: [OMPI devel] RH Enterprise Linux issue

2007-03-22 Thread Jeff Squyres
No, not a known problem -- my cluster is RHEL4U4 -- I use it for many  
thousands of runs of the OMPI v1.2 branch every day...


Can you see where it's dying in orte_init_stage1?
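
Something along these lines should be enough to find out (assuming your gdb
accepts --args; if not, use "set args -np 2 test" from inside gdb):

$ gdb --args mpirun -np 2 test
(gdb) run
[... wait for the SIGSEGV ...]
(gdb) bt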


On Mar 22, 2007, at 2:17 PM, Greg Watson wrote:


Is this a known problem? Building ompi 1.2 on RHEL4:

./configure --with-devel-headers --without-threads

(actually tried without '--without-threads' too, but no change)

$ mpirun -np 2 test
[beth:06029] *** Process received signal ***
[beth:06029] Signal: Segmentation fault (11)
[beth:06029] Signal code: Address not mapped (1)
[beth:06029] Failing at address: 0x2e342e33
[beth:06029] [ 0] /lib/tls/libc.so.6 [0x21b890]
[beth:06029] [ 1] /usr/local/lib/libopen-rte.so.0(orte_init_stage1
+0x293) [0xb7fc50cb]
[beth:06029] [ 2] /usr/local/lib/libopen-rte.so.0(orte_system_init
+0x1e) [0xb7fc84be]
[beth:06029] [ 3] /usr/local/lib/libopen-rte.so.0(orte_init+0x6a)
[0xb7fc4cee]
[beth:06029] [ 4] mpirun(orterun+0x14b) [0x8049ecb]
[beth:06029] [ 5] mpirun(main+0x2a) [0x8049d7a]
[beth:06029] [ 6] /lib/tls/libc.so.6(__libc_start_main+0xd3)  
[0x208de3]

[beth:06029] [ 7] mpirun [0x8049cc9]
[beth:06029] *** End of error message ***
Segmentation fault

Thanks,

Greg
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] RH Enterprise Linux issue

2007-03-22 Thread Greg Watson

gdb says this:

#0  0x2e342e33 in ?? ()
#1  0xb7fe1d31 in orte_pls_base_select () from /usr/local/lib/libopen-rte.so.0
#2  0xb7fc50cb in orte_init_stage1 () from /usr/local/lib/libopen-rte.so.0
#3  0xb7fc84be in orte_system_init () from /usr/local/lib/libopen-rte.so.0
#4  0xb7fc4cee in orte_init () from /usr/local/lib/libopen-rte.so.0
#5  0x08049ecb in orterun (argc=4, argv=0xb9f4) at orterun.c:369
#6  0x08049d7a in main (argc=4, argv=0xb9f4) at main.c:13
(gdb) The program is running.  Exit anyway? (y or n) y

I can recompile with debugging if that would be useful. Let me know  
if there's anything else I can do.


Here's ompi_info in case it helps:

Open MPI: 1.2
Open MPI SVN revision: r14027
Open RTE: 1.2
Open RTE SVN revision: r14027
OPAL: 1.2
OPAL SVN revision: r14027
Prefix: /usr/local
Configured architecture: i686-pc-linux-gnu
Configured on: Thu Mar 22 13:39:30 EDT 2007
Built on: Thu Mar 22 13:55:38 EDT 2007
C bindings: yes
C++ bindings: yes
Fortran77 bindings: yes (all)
Fortran90 bindings: no
Fortran90 bindings size: na
C compiler: gcc
C compiler absolute: /usr/bin/gcc
C++ compiler: g++
C++ compiler absolute: /usr/bin/g++
Fortran77 compiler: g77
Fortran77 compiler abs: /usr/bin/g77
Fortran90 compiler: none
Fortran90 compiler abs: none
C profiling: yes
C++ profiling: yes
Fortran77 profiling: yes
Fortran90 profiling: no
C++ exceptions: no
Thread support: no
Internal debug support: no
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
libltdl support: yes
Heterogeneous support: yes
mpirun default --prefix: no
mca: base: component_find: unable to open pml teg: file not found (ignored)
MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.2)
MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2)
MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2)
MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2)
MCA timer: linux (MCA v1.0, API v1.0, Component v1.2)
MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
MCA coll: basic (MCA v1.0, API v1.0, Component v1.2)
MCA coll: self (MCA v1.0, API v1.0, Component v1.2)
MCA coll: sm (MCA v1.0, API v1.0, Component v1.2)
MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2)
MCA io: romio (MCA v1.0, API v1.0, Component v1.2)
MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2)
MCA pml: cm (MCA v1.0, API v1.0, Component v1.2)
MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2)
MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2)
MCA rcache: rb (MCA v1.0, API v1.0, Component v1.2)
MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2)
MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2)
MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2)
MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
MCA topo: unity (MCA v1.0, API v1.0, Component v1.2)
MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2)
MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2)
MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2)
MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2)
MCA gpr: null (MCA v1.0, API v1.0, Component v1.2)
MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2)
MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2)
MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2)
MCA iof: svc (MCA v1.0, API v1.0, Component v1.2)
MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2)
MCA ns: replica (MCA v1.0, API v2.0, Component v1.2)
MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.2)
MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2)
MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.0.2)
MCA ras: localhost (MCA v1.0, API v1.3, Component v1.2)
MCA ras: slurm (MCA v1.0, API v1.3, Component v1.2)
MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.2)
MCA rds: proxy (MCA v1.0, API v1.3, Component v1.2)
MCA rds: resfile (MCA v1.0, API v1.3, Component v1.2)
MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.2)
MCA rmgr: proxy (MCA v1.0, API v2.0, Co

Re: [OMPI devel] RH Enterprise Linux issue

2007-03-22 Thread Jeff Squyres

Yes, if you could recompile with debugging, that would be great.

What launcher are you trying to use?


On Mar 22, 2007, at 2:35 PM, Greg Watson wrote:


gdb says this:

#0  0x2e342e33 in ?? ()
#1  0xb7fe1d31 in orte_pls_base_select () from /usr/local/lib/libopen-
rte.so.0
#2  0xb7fc50cb in orte_init_stage1 () from /usr/local/lib/libopen-
rte.so.0
#3  0xb7fc84be in orte_system_init () from /usr/local/lib/libopen-
rte.so.0
#4  0xb7fc4cee in orte_init () from /usr/local/lib/libopen-rte.so.0
#5  0x08049ecb in orterun (argc=4, argv=0xb9f4) at orterun.c:369
#6  0x08049d7a in main (argc=4, argv=0xb9f4) at main.c:13
(gdb) The program is running.  Exit anyway? (y or n) y

I can recompile with debugging if that would be useful. Let me know
if there's anything else I can do.

Here's ompi_info in case it helps:

 Open MPI: 1.2
Open MPI SVN revision: r14027
 Open RTE: 1.2
Open RTE SVN revision: r14027
 OPAL: 1.2
OPAL SVN revision: r14027
   Prefix: /usr/local
Configured architecture: i686-pc-linux-gnu
Configured on: Thu Mar 22 13:39:30 EDT 2007
 Built on: Thu Mar 22 13:55:38 EDT 2007
   C bindings: yes
 C++ bindings: yes
   Fortran77 bindings: yes (all)
   Fortran90 bindings: no
Fortran90 bindings size: na
   C compiler: gcc
  C compiler absolute: /usr/bin/gcc
 C++ compiler: g++
C++ compiler absolute: /usr/bin/g++
   Fortran77 compiler: g77
   Fortran77 compiler abs: /usr/bin/g77
   Fortran90 compiler: none
   Fortran90 compiler abs: none
  C profiling: yes
C++ profiling: yes
  Fortran77 profiling: yes
  Fortran90 profiling: no
   C++ exceptions: no
   Thread support: no
   Internal debug support: no
  MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
  libltdl support: yes
Heterogeneous support: yes
mpirun default --prefix: no
mca: base: component_find: unable to open pml teg: file not found
(ignored)
MCA backtrace: execinfo (MCA v1.0, API v1.0, Component  
v1.2)

   MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component
v1.2)
MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2)
MCA maffinity: first_use (MCA v1.0, API v1.0, Component
v1.2)
MCA timer: linux (MCA v1.0, API v1.0, Component v1.2)
MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
 MCA coll: basic (MCA v1.0, API v1.0, Component v1.2)
 MCA coll: self (MCA v1.0, API v1.0, Component v1.2)
 MCA coll: sm (MCA v1.0, API v1.0, Component v1.2)
 MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2)
   MCA io: romio (MCA v1.0, API v1.0, Component v1.2)
MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2)
  MCA pml: cm (MCA v1.0, API v1.0, Component v1.2)
  MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2)
  MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2)
   MCA rcache: rb (MCA v1.0, API v1.0, Component v1.2)
   MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2)
  MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2)
  MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2)
  MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
 MCA topo: unity (MCA v1.0, API v1.0, Component v1.2)
  MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2)
   MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2)
   MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2)
   MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2)
  MCA gpr: null (MCA v1.0, API v1.0, Component v1.2)
  MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2)
  MCA gpr: replica (MCA v1.0, API v1.0, Component  
v1.2)

  MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2)
  MCA iof: svc (MCA v1.0, API v1.0, Component v1.2)
   MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2)
   MCA ns: replica (MCA v1.0, API v2.0, Component  
v1.2)

  MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
  MCA ras: dash_host (MCA v1.0, API v1.3, Component
v1.2)
  MCA ras: gridengine (MCA v1.0, API v1.3, Component
v1.2)
  MCA ras: hostfile (MCA v1.0, API v1.0, Component
v1.0.2)
  MCA ras: localhost (MCA v1.0, API v1.3, Component
v1.2)
  MCA ras: slurm (MCA v1.0, API v1.3, Component v1.2)
  MCA rds: hostfile (MCA v1.0, API v1.3, Component  
v1.2)

  MCA rds: proxy (MCA v1.0, API v1.3, Compon

Re: [OMPI devel] RH Enterprise Linux issue

2007-03-22 Thread Greg Watson
Scratch that. The problem was an installation over an old copy of  
ompi. Obviously picking up some old stuff.
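
(For the archives: the "unable to open pml teg" warning in the ompi_info
output above was presumably the leftover in question. Clearing out the old
tree before re-running "make install" - something like the following,
assuming the default /usr/local prefix - is the usual way to avoid that:

rm -rf /usr/local/lib/openmpi
rm -f /usr/local/lib/libmpi* /usr/local/lib/libopen-rte* /usr/local/lib/libopen-pal*

Adjust the paths if you configured with a different --prefix.)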


Sorry for the disturbance. Back to the bat cave...

Greg

On Mar 22, 2007, at 12:46 PM, Jeff Squyres wrote:


Yes, if you could recompile with debugging, that would be great.

What launcher are you trying to use?


On Mar 22, 2007, at 2:35 PM, Greg Watson wrote:


gdb says this:

#0  0x2e342e33 in ?? ()
#1  0xb7fe1d31 in orte_pls_base_select () from /usr/local/lib/ 
libopen-

rte.so.0
#2  0xb7fc50cb in orte_init_stage1 () from /usr/local/lib/libopen-
rte.so.0
#3  0xb7fc84be in orte_system_init () from /usr/local/lib/libopen-
rte.so.0
#4  0xb7fc4cee in orte_init () from /usr/local/lib/libopen-rte.so.0
#5  0x08049ecb in orterun (argc=4, argv=0xb9f4) at orterun.c:369
#6  0x08049d7a in main (argc=4, argv=0xb9f4) at main.c:13
(gdb) The program is running.  Exit anyway? (y or n) y

I can recompile with debugging if that would be useful. Let me know
if there's anything else I can do.

Here's ompi_info in case it helps:

 Open MPI: 1.2
Open MPI SVN revision: r14027
 Open RTE: 1.2
Open RTE SVN revision: r14027
 OPAL: 1.2
OPAL SVN revision: r14027
   Prefix: /usr/local
Configured architecture: i686-pc-linux-gnu
Configured on: Thu Mar 22 13:39:30 EDT 2007
 Built on: Thu Mar 22 13:55:38 EDT 2007
   C bindings: yes
 C++ bindings: yes
   Fortran77 bindings: yes (all)
   Fortran90 bindings: no
Fortran90 bindings size: na
   C compiler: gcc
  C compiler absolute: /usr/bin/gcc
 C++ compiler: g++
C++ compiler absolute: /usr/bin/g++
   Fortran77 compiler: g77
   Fortran77 compiler abs: /usr/bin/g77
   Fortran90 compiler: none
   Fortran90 compiler abs: none
  C profiling: yes
C++ profiling: yes
  Fortran77 profiling: yes
  Fortran90 profiling: no
   C++ exceptions: no
   Thread support: no
   Internal debug support: no
  MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
  libltdl support: yes
Heterogeneous support: yes
mpirun default --prefix: no
mca: base: component_find: unable to open pml teg: file not found
(ignored)
MCA backtrace: execinfo (MCA v1.0, API v1.0, Component
v1.2)
   MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component
v1.2)
MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2)
MCA maffinity: first_use (MCA v1.0, API v1.0, Component
v1.2)
MCA timer: linux (MCA v1.0, API v1.0, Component v1.2)
MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
MCA allocator: bucket (MCA v1.0, API v1.0, Component  
v1.0)

 MCA coll: basic (MCA v1.0, API v1.0, Component v1.2)
 MCA coll: self (MCA v1.0, API v1.0, Component v1.2)
 MCA coll: sm (MCA v1.0, API v1.0, Component v1.2)
 MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2)
   MCA io: romio (MCA v1.0, API v1.0, Component v1.2)
MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2)
  MCA pml: cm (MCA v1.0, API v1.0, Component v1.2)
  MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2)
  MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2)
   MCA rcache: rb (MCA v1.0, API v1.0, Component v1.2)
   MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2)
  MCA btl: self (MCA v1.0, API v1.0.1, Component  
v1.2)

  MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2)
  MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
 MCA topo: unity (MCA v1.0, API v1.0, Component v1.2)
  MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2)
   MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2)
   MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2)
   MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2)
  MCA gpr: null (MCA v1.0, API v1.0, Component v1.2)
  MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2)
  MCA gpr: replica (MCA v1.0, API v1.0, Component
v1.2)
  MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2)
  MCA iof: svc (MCA v1.0, API v1.0, Component v1.2)
   MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2)
   MCA ns: replica (MCA v1.0, API v2.0, Component
v1.2)
  MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
  MCA ras: dash_host (MCA v1.0, API v1.3, Component
v1.2)
  MCA ras: gridengine (MCA v1.0, API v1.3, Component
v1.2)
  MCA ras: hostfile (MCA v1.0, API v1.0, Component
v1.0.2)
  MCA ras: localhost (MCA v1.0, API v1.3, 

Re: [OMPI devel] RH Enterprise Linux issue

2007-03-22 Thread Jeff Squyres

Whew!  You had me worried there for a minute... :-)


On Mar 22, 2007, at 3:15 PM, Greg Watson wrote:


Scratch that. The problem was an installation over an old copy of
ompi. Obviously picking up some old stuff.

Sorry for the disturbance. Back to the bat cave...

Greg

On Mar 22, 2007, at 12:46 PM, Jeff Squyres wrote:


Yes, if you could recompile with debugging, that would be great.

What launcher are you trying to use?


On Mar 22, 2007, at 2:35 PM, Greg Watson wrote:


gdb says this:

#0  0x2e342e33 in ?? ()
#1  0xb7fe1d31 in orte_pls_base_select () from /usr/local/lib/
libopen-
rte.so.0
#2  0xb7fc50cb in orte_init_stage1 () from /usr/local/lib/libopen-
rte.so.0
#3  0xb7fc84be in orte_system_init () from /usr/local/lib/libopen-
rte.so.0
#4  0xb7fc4cee in orte_init () from /usr/local/lib/libopen-rte.so.0
#5  0x08049ecb in orterun (argc=4, argv=0xb9f4) at orterun.c:369
#6  0x08049d7a in main (argc=4, argv=0xb9f4) at main.c:13
(gdb) The program is running.  Exit anyway? (y or n) y

I can recompile with debugging if that would be useful. Let me know
if there's anything else I can do.

Here's ompi_info in case it helps:

 Open MPI: 1.2
Open MPI SVN revision: r14027
 Open RTE: 1.2
Open RTE SVN revision: r14027
 OPAL: 1.2
OPAL SVN revision: r14027
   Prefix: /usr/local
Configured architecture: i686-pc-linux-gnu
Configured on: Thu Mar 22 13:39:30 EDT 2007
 Built on: Thu Mar 22 13:55:38 EDT 2007
   C bindings: yes
 C++ bindings: yes
   Fortran77 bindings: yes (all)
   Fortran90 bindings: no
Fortran90 bindings size: na
   C compiler: gcc
  C compiler absolute: /usr/bin/gcc
 C++ compiler: g++
C++ compiler absolute: /usr/bin/g++
   Fortran77 compiler: g77
   Fortran77 compiler abs: /usr/bin/g77
   Fortran90 compiler: none
   Fortran90 compiler abs: none
  C profiling: yes
C++ profiling: yes
  Fortran77 profiling: yes
  Fortran90 profiling: no
   C++ exceptions: no
   Thread support: no
   Internal debug support: no
  MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
  libltdl support: yes
Heterogeneous support: yes
mpirun default --prefix: no
mca: base: component_find: unable to open pml teg: file not found
(ignored)
MCA backtrace: execinfo (MCA v1.0, API v1.0, Component
v1.2)
   MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component
v1.2)
MCA paffinity: linux (MCA v1.0, API v1.0, Component  
v1.2)

MCA maffinity: first_use (MCA v1.0, API v1.0, Component
v1.2)
MCA timer: linux (MCA v1.0, API v1.0, Component  
v1.2)
MCA allocator: basic (MCA v1.0, API v1.0, Component  
v1.0)

MCA allocator: bucket (MCA v1.0, API v1.0, Component
v1.0)
 MCA coll: basic (MCA v1.0, API v1.0, Component  
v1.2)

 MCA coll: self (MCA v1.0, API v1.0, Component v1.2)
 MCA coll: sm (MCA v1.0, API v1.0, Component v1.2)
 MCA coll: tuned (MCA v1.0, API v1.0, Component  
v1.2)
   MCA io: romio (MCA v1.0, API v1.0, Component  
v1.2)

MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2)
  MCA pml: cm (MCA v1.0, API v1.0, Component v1.2)
  MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2)
  MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2)
   MCA rcache: rb (MCA v1.0, API v1.0, Component v1.2)
   MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2)
  MCA btl: self (MCA v1.0, API v1.0.1, Component
v1.2)
  MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2)
  MCA btl: tcp (MCA v1.0, API v1.0.1, Component  
v1.0)
 MCA topo: unity (MCA v1.0, API v1.0, Component  
v1.2)
  MCA osc: pt2pt (MCA v1.0, API v1.0, Component  
v1.2)

   MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2)
   MCA errmgr: orted (MCA v1.0, API v1.3, Component  
v1.2)
   MCA errmgr: proxy (MCA v1.0, API v1.3, Component  
v1.2)

  MCA gpr: null (MCA v1.0, API v1.0, Component v1.2)
  MCA gpr: proxy (MCA v1.0, API v1.0, Component  
v1.2)

  MCA gpr: replica (MCA v1.0, API v1.0, Component
v1.2)
  MCA iof: proxy (MCA v1.0, API v1.0, Component  
v1.2)

  MCA iof: svc (MCA v1.0, API v1.0, Component v1.2)
   MCA ns: proxy (MCA v1.0, API v2.0, Component  
v1.2)

   MCA ns: replica (MCA v1.0, API v2.0, Component
v1.2)
  MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
  MCA ras: dash_host (MCA v1.0, API v1.3, Component
v1.2)
  MCA ras: gridengine (MCA v1.0, API v1.3, Component
v1.2)
  

Re: [OMPI devel] mpirun --prefix question

2007-03-22 Thread David Daniel

OK. This sounds sensible.

Thanks, David

On Mar 22, 2007, at 10:38 AM, Ralph Castain wrote:


We had a nice chat about this on the OpenRTE telecon this morning. The
question of what to do with multiple prefixes has been a long-running issue,
most recently captured in Trac bug report #497. The problem is that prefix
is intended to tell us where to find the ORTE/OMPI executables, and
therefore is associated with a node - not an app_context. What we haven't
been able to define is an appropriate notation that a user can exploit to
tell us the association.

This issue has arisen on several occasions where either (a) users have
heterogeneous clusters with a common file system, so the prefix must be
adjusted on each *type* of node to point to the correct type of binary; or
(b) for whatever reason, typically on rsh/ssh clusters, users have installed
the binaries in different locations on some of the nodes. In this latter
case, the reports have been from homogeneous clusters, so the *type* of
binary was never the issue - it just wasn't located where we expected.

Sun's solution is (I believe) what most of us would expect - they locate
their executables in the same relative location on all their nodes. The
binary in that location is correct for that local architecture. This
requires, though, that the "prefix" location not be on a common file system.

Unfortunately, that isn't the case with LANL's roadrunner, nor can we expect
that everyone will follow that sensible approach :-). So we need a notation
to support the "exception" case where someone needs to truly specify prefix
versus node(s).

We discussed a number of options, including auto-detecting the local arch
and appending it to the specified "prefix" and several others. After
discussing them, those of us on the call decided that adding a field to the
hostfile that specifies the prefix to use on that host would be the best
solution. This could be done on a cluster-level basis, so - although it is
annoying to create the data file - at least it would only have to be done
once.

Again, this is the exception case, so requiring a little inconvenience seems
a reasonable thing to do.

Anyone have heartburn and/or other suggestions? If not, we might start to
play with this next week. We would have to do some small modifications to
the RAS, RMAPS, and PLS components to ensure that any multi-prefix info gets
correctly propagated and used across all platforms for consistent behavior.


Ralph


On 3/22/07 9:11 AM, "David Daniel"  wrote:


This is a development system for roadrunner using ssh.

David

On Mar 22, 2007, at 5:19 AM, Jeff Squyres wrote:


FWIW, I believe that we had intended --prefix to handle simple cases
which is why this probably doesn't work for you.  But as long as the
different prefixes are specified for different nodes, it could
probably be made to work.

Which launcher are you using this with?



On Mar 21, 2007, at 11:36 PM, Ralph Castain wrote:


Yo David

What system are you running this on? RoadRunner? If so, I can take
a look at
"fixing" it for you tomorrow (Thurs).

Ralph


On 3/21/07 10:17 AM, "David Daniel"  wrote:


I'm experimenting with heterogeneous applications (x86_64 <-->
ppc64), where the systems share the file system where Open MPI is
installed.

What I would like to be able to do is something like this:

mpirun --np 1 --host host-x86_64 --prefix /opt/ompi/x86_64
a.out.x86_64 : --np 1 --host host-ppc64 --prefix /opt/ompi/ppc64
a.out.ppc64

Unfortunately it looks as if the second --prefix is always ignored.
My guess is that orte_app_context_t::prefix_dir is getting set, but
only the 0th app context is ever consulted (except in the dynamic
process stuff where I do see a loop over the app context array).

I can of course work around it with startup scripts, but a command
line solution would be attractive.

This is with openmpi-1.2.

Thanks, David

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Jeff Squyres
Cisco Systems

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


--
David Daniel 
Computer Science for High-Performance Computing (CCS-1)




___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


--
David Daniel 
Computer Science for High-Performance Computing (CCS-1)