Re: [O-MPI devel] [PATCH] Update Open MPI for new libibverbs API

2005-09-27 Thread Brian Barrett

On Sep 26, 2005, at 4:20 PM, Roland Dreier wrote:


[It's somewhat annoying to have to subscribe to de...@open-mpi.org
just to be able to send patches, but oh well...]


It's even more annoying to be deluged with SPAM ;).  We (the LAM
developers) used to try to keep our mailing lists as open as
possible.  In the end, SPAM pushed the signal-to-noise ratio way too
low and something had to be done.  Requiring subscriptions to post
was the best we could do.



This patch updates Open MPI for the new ibv_create_cq() API.
Signed-off-by: Roland Dreier 


I'll admit my ignorance - is this part of a particular release of  
OpenIB, or is this something that has happened recently in SVN?  I  
ask because we already have people using OpenIB and Open MPI  
together, and it would be bad to suddenly break things for them.   
Testing for the number of arguments to a function is horribly
unreliable - is there some version number or other key in the OpenIB
headers we can use to determine which version of the function to use?


Brian



--- ompi/mca/btl/openib/btl_openib.c    (revision 7507)
+++ ompi/mca/btl/openib/btl_openib.c    (working copy)
@@ -656,7 +656,8 @@ int mca_btl_openib_module_init(mca_btl_o
     }
 
     /* Create the low and high priority queue pairs */
-    openib_btl->ib_cq_low = ibv_create_cq(ctx, mca_btl_openib_component.ib_cq_size, NULL);
+    openib_btl->ib_cq_low = ibv_create_cq(ctx, mca_btl_openib_component.ib_cq_size,
+                                          NULL, NULL, 0);
     if(NULL == openib_btl->ib_cq_low) {
         BTL_ERROR(("error creating low priority cq for %s errno says %s\n",
@@ -665,7 +666,8 @@ int mca_btl_openib_module_init(mca_btl_o
         return OMPI_ERROR;
     }
 
-    openib_btl->ib_cq_high = ibv_create_cq(ctx, mca_btl_openib_component.ib_cq_size, NULL);
+    openib_btl->ib_cq_high = ibv_create_cq(ctx, mca_btl_openib_component.ib_cq_size,
+                                           NULL, NULL, 0);
     if(NULL == openib_btl->ib_cq_high) {
         BTL_ERROR(("error creating high priority cq for %s errno says %s\n",

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel





Re: [O-MPI devel] ompi_info Seg Fault, missing component -- linux (solved?) (fwd)

2005-09-27 Thread Ferris McCormick
On Mon, 2005-09-26 at 20:09 +, Ferris McCormick wrote:
> On Mon, 2005-09-26 at 14:59 +, Ferris McCormick wrote:
> > On Fri, 2005-09-16 at 11:35 -0500, Brian Barrett wrote:
> > > On Sep 16, 2005, at 8:44 AM, Ferris McCormick wrote:
> > > 
> > > > ==
> > > > fmccor@polylepis util [235]% ./opal_timer
> > > > --> frequency: 9
> > > > --> cycle count
> > > > Slept approximately 903151189 cycles, or 1003501 us
> > > > --> usecs
> > > > Slept approximately 18446744073289684648 us
> > > > ==
> > > 
> > > That last value means that I'm munging the upper 32 bits of the tick 
> > > register (it's 64 bits long).  So we're not quite there yet, but 
> > > getting closer.  I should be able to get to that today.
> > > 
> > > The other problem is very odd.  Since you're compiling in 32bit mode, 
> > > I'd expect us to see it on our PowerPC machines, but I haven't run into 
> > > that one yet.  I'll try to compile without debugging and see what I can 
> > > see.
> > > 
> > > 
> > > Brian 
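The "munging the upper 32 bits" Brian describes is the classic pitfall
when reassembling a 64-bit tick value from two 32-bit reads; a minimal
sketch of the correct combination (not the actual opal_timer code):

```c
#include <stdint.h>

/* Combine the two 32-bit halves of a 64-bit tick register.  The high
 * half must be widened to 64 bits *before* shifting; shifting a 32-bit
 * value by 32 is undefined, and a signed high half would additionally
 * sign-extend garbage into the upper word. */
static uint64_t combine_ticks(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}
```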

Here's where the SegFault comes from.
For whatever reason, when working with the verbose opal_output_stream,
eventually opal_paffinity_base_open sets opal_paffinity_base_output=-1
(at paffinity_base_open.c, 62) and calls mca_base_components_open with
that value as the output_id.  In turn, if output_id!=0,
mca_base_components_open calls:
=
 if (output_id != 0) {
opal_output_set_verbosity(output_id, verbose_level);
  }
==

Now, opal_output_set_verbosity (in opal/util/output.c) unconditionally
does this:
info[output_id].ldi_verbose_level = level;
(where, for verbose, this is info[-1].ldi_verbose_level=0;)
On my system, this wipes out verbose itself.

Elsewhere in output.c, such constructs are bracketed with
if(output_id >= 0) { ... } (or if(-1 == output_id) {...}), and I suspect
that is needed here, too.
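A sketch of the guard suggested above (names and table size simplified;
the real stream table lives in opal/util/output.c):

```c
#define MAX_STREAMS 64                    /* simplified stand-in table size */
static int ldi_verbose_level[MAX_STREAMS];

/* Guarded set-verbosity path: output_id == -1 is the "no stream"
 * pseudo-id and must never be used as an array index, otherwise the
 * write to info[-1] clobbers whatever sits before the table. */
static void set_verbosity_guarded(int output_id, int level)
{
    if (output_id >= 0 && output_id < MAX_STREAMS) {
        ldi_verbose_level[output_id] = level;
    }
}
```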

Hope this helps,
Ferris
-- 
Ferris McCormick (P44646, MI) 
Developer, Gentoo Linux (Sparc, Devrel)




Re: [O-MPI devel] ompi_info Seg Fault, missing component -- linux (solved?) (fwd)

2005-09-27 Thread Jeff Squyres
Thanks muchly for tracking this down!  I'm working on the fixes right 
now; will commit shortly.




On Sep 27, 2005, at 11:59 AM, Ferris McCormick wrote:


On Mon, 2005-09-26 at 20:09 +, Ferris McCormick wrote:

On Mon, 2005-09-26 at 14:59 +, Ferris McCormick wrote:

On Fri, 2005-09-16 at 11:35 -0500, Brian Barrett wrote:

On Sep 16, 2005, at 8:44 AM, Ferris McCormick wrote:


==
fmccor@polylepis util [235]% ./opal_timer
--> frequency: 9
--> cycle count
Slept approximately 903151189 cycles, or 1003501 us
--> usecs
Slept approximately 18446744073289684648 us
==


That last value means that I'm munging the upper 32 bits of the tick
register (it's 64 bits long).  So we're not quite there yet, but
getting closer.  I should be able to get to that today.

The other problem is very odd.  Since you're compiling in 32bit mode,
I'd expect us to see it on our PowerPC machines, but I haven't run into
that one yet.  I'll try to compile without debugging and see what I can
see.


Brian


Here's where the SegFault comes from.
For whatever reason, when working with the verbose opal_output_stream,
eventually opal_paffinity_base_open sets opal_paffinity_base_output=-1
(at paffinity_base_open.c, 62) and calls mca_base_components_open with
that value as the output_id.  In turn, if output_id!=0,
mca_base_components_open calls:
=
 if (output_id != 0) {
opal_output_set_verbosity(output_id, verbose_level);
  }
==

Now, opal_output_set_verbosity (in opal/util/output.c) unconditionally
does this:
info[output_id].ldi_verbose_level = level;
(where, for verbose, this is info[-1].ldi_verbose_level=0;)
On my system, this wipes out verbose itself.

Elsewhere in output.c, such constructs are bracketed with
if(output_id >= 0) { ... } (or if(-1 == output_id) {...}), and I suspect
that is needed here, too.

Hope this helps,
Ferris
--
Ferris McCormick (P44646, MI) 
Developer, Gentoo Linux (Sparc, Devrel)


--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/



Re: [O-MPI devel] [PATCH] Update Open MPI for new libibverbs API

2005-09-27 Thread Roland Dreier
Brian> It's even more annoying to be deluged with SPAM ;).  We
Brian> (the LAM developers) used to try to keep our mailing lists
Brian> as open as possible.  In the end, SPAM pushed the
Brian> signal-to-noise ratio way too low and something had to be
Brian> done.  Requiring subscriptions to post was the best we
Brian> could do.

I understand that you have limited resources to administer your
mailing list, but certainly lists like openib-general and linux-kernel
show that it is possible to run lists with low levels of spam and
still allow posting by anyone.

In general, if I have to subscribe to a list just to send a bug fix to
a project, I'm quite likely to forget about it.  So you are definitely
missing out on contributions by closing your lists.

Brian> I'll admit my ignorance - is this part of a particular
Brian> release of OpenIB, or is this something that has happened
Brian> recently in SVN?  I ask because we already have people
Brian> using OpenIB and Open MPI together, and it would be bad to
Brian> suddenly break things for them.  Testing for the number of
Brian> arguments to a function is horribly unreliable - is there
Brian> some version number or other key in the OpenIB headers we
Brian> can use to determine which version of the function to use?

OpenIB has not done an "official" release of any userspace components,
so this falls into the category of prerelease API breakage.

New kernels will require a new libibverbs, so the number of obsolete
old development versions should decrease fairly quickly.

 - R.


[O-MPI devel] Back to 32bit on 64bit machines...

2005-09-27 Thread Nathan DeBardeleben

So is this an error or am I configuring wrong?

Here's my configure:

[sparkplug]~/ompi > ./configure CFLAGS=-m32 FFLAGS=-m32 CXXFLAGS=-m32 
--without-threads --prefix=/home/ndebard/local/ompi 
--with-devel-headers --without-gm


I've also tried adding --build=i586-suse-linux, that didn't help either.
Basically the compile eventually ends here:

 g++ -DHAVE_CONFIG_H -I. -I. -I../../../include -I../../../include 
-I../../../include -I../../.. -I../../.. -I../../../include 
-I../../../opal -I../../../orte -I../../../ompi -m32 -g -Wall -Wundef 
-Wno-long-long -finline-functions -MT comm.lo -MD -MP -MF 
.deps/comm.Tpo -c comm.cc  -fPIC -DPIC -o .libs/comm.o
/bin/sh ../../../libtool --mode=link g++  -m32 -g -Wall -Wundef 
-Wno-long-long -finline-functions   -export-dynamic   -o libmpi_cxx.la 
-rpath /home/ndebard/local/ompi/lib  mpicxx.lo intercepts.lo comm.lo  
-lm  -lutil -lnsl
g++ -shared -nostdlib 
/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../lib/crti.o 
/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/32/crtbeginS.o  
.libs/mpicxx.o .libs/intercepts.o .libs/comm.o  -lutil -lnsl 
-L/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/32 
-L/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3 
-L/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../x86_64-suse-linux/lib/../lib 
-L/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../x86_64-suse-linux/lib 
-L/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../lib 
-L/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../.. -L/lib/../lib 
-L/usr/lib/../lib /usr/lib64/libstdc++.so -lm -lc -lgcc_s_32 
/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/32/crtendS.o 
/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../lib/crtn.o  
-m32 -Wl,-soname -Wl,libmpi_cxx.so.0 -o .libs/libmpi_cxx.so.0.0.0

/usr/lib64/libstdc++.so: could not read symbols: Invalid operation
collect2: ld returned 1 exit status
make[3]: *** [libmpi_cxx.la] Error 1
make[3]: Leaving directory `/home/ndebard/ompi/ompi/mpi/cxx'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/home/ndebard/ompi/ompi/mpi'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/ndebard/ompi/ompi'
make: *** [all-recursive] Error 1
[sparkplug]~/ompi >


I'm having problems I think might be 64bit related and want to prove it 
by building in 32bit mode.

Oh, here's some basics if it helps.


[sparkplug]~/ompi > cat /etc/issue

Welcome to SuSE Linux 9.1 (x86-64) - Kernel \r (\l).


[sparkplug]~/ompi > uname -a
Linux sparkplug 2.6.10 #4 SMP Wed Jan 26 11:50:00 MST 2005 x86_64 
x86_64 x86_64 GNU/Linux
[sparkplug]~/ompi > 



--
-- Nathan
Correspondence
-
Nathan DeBardeleben, Ph.D.
Los Alamos National Laboratory
Parallel Tools Team
High Performance Computing Environments
phone: 505-667-3428
email: ndeb...@lanl.gov
-



Re: [O-MPI devel] Back to 32bit on 64bit machines...

2005-09-27 Thread Jeff Squyres
This looks like it *might* be a libtool problem -- it's picking up the  
/usr/lib64/libstdc++.so when you're compiling in 32 bit mode (and  
therefore barfing).


Can you send the libtool command that immediately preceded this link  
line?


As a workaround, you should be able to pass --disable-cxx to configure  
to disable the MPI C++ bindings, and therefore skip building in this tree.


Ralf -- any thoughts?



On Sep 27, 2005, at 3:23 PM, Nathan DeBardeleben wrote:


So is this an error or am I configuring wrong?

Here's my configure:


[sparkplug]~/ompi > ./configure CFLAGS=-m32 FFLAGS=-m32 CXXFLAGS=-m32
--without-threads --prefix=/home/ndebard/local/ompi
--with-devel-headers --without-gm


I've also tried adding --build=i586-suse-linux, that didn't help  
either.

Basically the compile eventually ends here:


 g++ -DHAVE_CONFIG_H -I. -I. -I../../../include -I../../../include
-I../../../include -I../../.. -I../../.. -I../../../include
-I../../../opal -I../../../orte -I../../../ompi -m32 -g -Wall -Wundef
-Wno-long-long -finline-functions -MT comm.lo -MD -MP -MF
.deps/comm.Tpo -c comm.cc  -fPIC -DPIC -o .libs/comm.o
/bin/sh ../../../libtool --mode=link g++  -m32 -g -Wall -Wundef
-Wno-long-long -finline-functions   -export-dynamic   -o libmpi_cxx.la
-rpath /home/ndebard/local/ompi/lib  mpicxx.lo intercepts.lo comm.lo
-lm  -lutil -lnsl
g++ -shared -nostdlib
/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../lib/crti.o
/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/32/crtbeginS.o
.libs/mpicxx.o .libs/intercepts.o .libs/comm.o  -lutil -lnsl
-L/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/32
-L/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3
-L/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../x86_64-suse-linux/lib/../lib
-L/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../x86_64-suse-linux/lib
-L/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../lib
-L/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../.. -L/lib/../lib
-L/usr/lib/../lib /usr/lib64/libstdc++.so -lm -lc -lgcc_s_32
/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/32/crtendS.o
/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../lib/crtn.o
-m32 -Wl,-soname -Wl,libmpi_cxx.so.0 -o .libs/libmpi_cxx.so.0.0.0
/usr/lib64/libstdc++.so: could not read symbols: Invalid operation
collect2: ld returned 1 exit status
make[3]: *** [libmpi_cxx.la] Error 1
make[3]: Leaving directory `/home/ndebard/ompi/ompi/mpi/cxx'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/home/ndebard/ompi/ompi/mpi'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/ndebard/ompi/ompi'
make: *** [all-recursive] Error 1
[sparkplug]~/ompi >


I'm having problems I think might be 64bit related and want to prove it
by building in 32bit mode.
Oh, here's some basics if it helps.


[sparkplug]~/ompi > cat /etc/issue

Welcome to SuSE Linux 9.1 (x86-64) - Kernel \r (\l).


[sparkplug]~/ompi > uname -a
Linux sparkplug 2.6.10 #4 SMP Wed Jan 26 11:50:00 MST 2005 x86_64
x86_64 x86_64 GNU/Linux
[sparkplug]~/ompi >



--
-- Nathan
Correspondence
-
Nathan DeBardeleben, Ph.D.
Los Alamos National Laboratory
Parallel Tools Team
High Performance Computing Environments
phone: 505-667-3428
email: ndeb...@lanl.gov
-




--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/



Re: [O-MPI devel] Back to 32bit on 64bit machines...

2005-09-27 Thread Ralf Wildenhues
Hi Nathan, Jeff,

* Jeff Squyres wrote on Tue, Sep 27, 2005 at 09:39:59PM CEST:
> This looks like it *might* be a libtool problem -- it's picking up the  
> /usr/lib64/libstdc++.so when you're compiling in 32 bit mode (and  
> therefore barfing).

Yep, I think it is.

> Can you send the libtool command that immediately preceded this link  
> line?
> 
> As a workaround, you should be able to --disable-cxx to disable the MPI  
> C++ bindings, and therefore skip building in this tree.

Other, better-suited workarounds: either
- remove the 64bit paths from compiler_lib_search_path and
  sys_lib_search_path_spec in the generated libtool script(s)
  (note these variables are set both at the very beginning,
  and at the very end, once for each source file language),
or
- link with "LDFLAGS=-L/usr/lib", if /usr/lib is where your 
  32-bit libstdc++.so is located.
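Concretely, the LDFLAGS workaround would look something like this
(reusing Nathan's configure line; that /usr/lib holds the 32-bit
libstdc++.so is an assumption about this SuSE layout):

```shell
# Force the 32-bit library directory onto the link path so libtool does
# not resolve libstdc++.so from /usr/lib64.
./configure CFLAGS=-m32 FFLAGS=-m32 CXXFLAGS=-m32 LDFLAGS=-L/usr/lib \
    --without-threads --prefix=/home/ndebard/local/ompi \
    --with-devel-headers --without-gm
```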

We're not really sure yet how to fix this for all distributions.

Sorry for the inconvenience,
Ralf

> On Sep 27, 2005, at 3:23 PM, Nathan DeBardeleben wrote:
> 
> > So is this an error or am I configuring wrong?
> >
> > Here's my configure:
> >
> >> [sparkplug]~/ompi > ./configure CFLAGS=-m32 FFLAGS=-m32 CXXFLAGS=-m32
> >> --without-threads --prefix=/home/ndebard/local/ompi
> >> --with-devel-headers --without-gm
> >
> > I've also tried adding --build=i586-suse-linux, that didn't help  
> > either.
> > Basically the compile eventually ends here:
> >
> >>  g++ -DHAVE_CONFIG_H -I. -I. -I../../../include -I../../../include
> >> -I../../../include -I../../.. -I../../.. -I../../../include
> >> -I../../../opal -I../../../orte -I../../../ompi -m32 -g -Wall -Wundef
> >> -Wno-long-long -finline-functions -MT comm.lo -MD -MP -MF
> >> .deps/comm.Tpo -c comm.cc  -fPIC -DPIC -o .libs/comm.o
> >> /bin/sh ../../../libtool --mode=link g++  -m32 -g -Wall -Wundef
> >> -Wno-long-long -finline-functions   -export-dynamic   -o libmpi_cxx.la
> >> -rpath /home/ndebard/local/ompi/lib  mpicxx.lo intercepts.lo comm.lo
> >> -lm  -lutil -lnsl
> >> g++ -shared -nostdlib
> >> /usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../lib/crti.o
> >> /usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/32/crtbeginS.o
> >> .libs/mpicxx.o .libs/intercepts.o .libs/comm.o  -lutil -lnsl
> >> -L/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/32
> >> -L/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3
> >> -L/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../x86_64-suse-linux/lib/../lib
> >> -L/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../x86_64-suse-linux/lib
> >> -L/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../lib
> >> -L/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../.. -L/lib/../lib
> >> -L/usr/lib/../lib /usr/lib64/libstdc++.so -lm -lc -lgcc_s_32
> >> /usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/32/crtendS.o
> >> /usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../lib/crtn.o
> >> -m32 -Wl,-soname -Wl,libmpi_cxx.so.0 -o .libs/libmpi_cxx.so.0.0.0
> >> /usr/lib64/libstdc++.so: could not read symbols: Invalid operation
> >> collect2: ld returned 1 exit status
> >> make[3]: *** [libmpi_cxx.la] Error 1
> >> make[3]: Leaving directory `/home/ndebard/ompi/ompi/mpi/cxx'
> >> make[2]: *** [all-recursive] Error 1
> >> make[2]: Leaving directory `/home/ndebard/ompi/ompi/mpi'
> >> make[1]: *** [all-recursive] Error 1
> >> make[1]: Leaving directory `/home/ndebard/ompi/ompi'
> >> make: *** [all-recursive] Error 1
> >> [sparkplug]~/ompi >
> >
> > I'm having problems I think might be 64bit related and want to prove it
> > by building in 32bit mode.
> > Oh, here's some basics if it helps.
> >
> >> [sparkplug]~/ompi > cat /etc/issue
> >>
> >> Welcome to SuSE Linux 9.1 (x86-64) - Kernel \r (\l).
> >>
> >>
> >> [sparkplug]~/ompi > uname -a
> >> Linux sparkplug 2.6.10 #4 SMP Wed Jan 26 11:50:00 MST 2005 x86_64
> >> x86_64 x86_64 GNU/Linux
> >> [sparkplug]~/ompi >


[O-MPI devel] bproc question

2005-09-27 Thread Greg Watson

Hi,

Trying to install ompi on a bproc machine with no network filesystem.  
I've copied the contents of the ompi lib directory into /tmp/ompi on  
each node and set my LD_LIBRARY_PATH to /tmp/ompi. However when I run  
the program, I get the following error. Any suggestions on what else  
I need to do?


Thanks,

Greg

[n0:31161] [NO-NAME] ORTE_ERROR_LOG: Not found in file  
orte_init_stage1.c at line 191
[n0:31161] [NO-NAME] ORTE_ERROR_LOG: Not found in file  
orte_system_init.c at line 39
[n0:31161] [NO-NAME] ORTE_ERROR_LOG: Not found in file orte_init.c at  
line 47
 
--------------------------------------------------------------------------

Sorry!  You were supposed to get help about:
orted:init-failure
from the file:
help-orted.txt
But I couldn't find any file matching that name.  Sorry!
 
--------------------------------------------------------------------------

--------------------------------------------------------------------------

A daemon (pid 31161) launched by the bproc PLS component on node 0 died
unexpectedly so we are aborting.

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
 
--------------------------------------------------------------------------
[bluesteel.lanl.gov:31160] [0,0,0] ORTE_ERROR_LOG: Error in file  
pls_bproc.c at line 870




Re: [O-MPI devel] bproc question

2005-09-27 Thread Ralf Wildenhues
Hi Greg,

* Greg Watson wrote on Tue, Sep 27, 2005 at 10:27:22PM CEST:
> 
> Trying to install ompi on a bproc machine with no network filesystem.  
> I've copied the contents of the ompi lib directory into /tmp/ompi on  
> each node and set my LD_LIBRARY_PATH to /tmp/ompi. However when I run  
> the program, I get the following error. Any suggestions on what else  
> I need to do?

[ Disclaimer: I don't know much about bproc, so I don't know if this
applies here ]

You could try to
  configure --prefix=/tmp/ompi
and then just
  make install
there?

Cheers,
Ralf

> [n0:31161] [NO-NAME] ORTE_ERROR_LOG: Not found in file  
> orte_init_stage1.c at line 191
> [n0:31161] [NO-NAME] ORTE_ERROR_LOG: Not found in file  
> orte_system_init.c at line 39
> [n0:31161] [NO-NAME] ORTE_ERROR_LOG: Not found in file orte_init.c at  
> line 47


Re: [O-MPI devel] bproc question

2005-09-27 Thread Jeff Squyres

This exact problem came up in a different context today.

This is only a side-effect of us having crummy error messages.  :-(

What is happening is that OMPI is not finding its components.   
Specifically, it's looking for the SDS components in this case, not  
finding them, and then barfing.


Open MPI, by default, looks in $prefix/lib/openmpi and  
$HOME/.openmpi/components for its components.  This is set with the  
"mca_component_path" MCA parameter -- you can certainly change it to be  
whatever you need.  For example:


-----
[15:26] odin:~/svn/ompi/ompi/runtime % ompi_info --param mca all
[snipped]
     MCA mca: parameter "mca_component_path" (current value:
              "/u/jsquyres/bogus/lib/openmpi:/u/jsquyres/.openmpi/components")
              Path where to look for Open MPI and ORTE components
[snipped]
-----

So you should be able to:

orterun --mca mca_component_path /path/where/you/have/them ...

Disclaimer: this *used* to work, but I haven't tried it in a long time.  
 There's no reason that it shouldn't work, but we all know how bit rot  
happens...
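mca_component_path behaves like a PATH-style, colon-separated list; a
minimal sketch of how such a value splits into candidate component
directories (directory names here are made up):

```c
#include <stdio.h>
#include <string.h>

/* Split a colon-separated component path, print each candidate
 * directory, and return how many were found. */
static int list_component_dirs(const char *path)
{
    char buf[1024];
    int n = 0;
    strncpy(buf, path, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    for (char *dir = strtok(buf, ":"); dir != NULL; dir = strtok(NULL, ":")) {
        printf("would scan: %s\n", dir);
        n++;
    }
    return n;
}
```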


However, be aware that the wrapper compilers are still hard-coded to  
look in $prefix/lib to link the OMPI/ORTE/OPAL compilers.  You can  
override that stuff with environment variables if you need to, but it's  
not desirable.


Sidenote: in LAM, we had a single, top-level environment variable named  
LAMHOME that would override all this stuff.  However, we found that it  
*really* confused most users -- there were very, very few instances  
where there was a genuine need for it.  So we didn't add a single,  
top-level control like that in OMPI.



On Sep 27, 2005, at 4:27 PM, Greg Watson wrote:


Hi,

Trying to install ompi on a bproc machine with no network filesystem.
I've copied the contents of the ompi lib directory into /tmp/ompi on
each node and set my LD_LIBRARY_PATH to /tmp/ompi. However when I run
the program, I get the following error. Any suggestions on what else
I need to do?

Thanks,

Greg

[n0:31161] [NO-NAME] ORTE_ERROR_LOG: Not found in file
orte_init_stage1.c at line 191
[n0:31161] [NO-NAME] ORTE_ERROR_LOG: Not found in file
orte_system_init.c at line 39
[n0:31161] [NO-NAME] ORTE_ERROR_LOG: Not found in file orte_init.c at
line 47
--------------------------------------------------------------------------
Sorry!  You were supposed to get help about:
 orted:init-failure
from the file:
 help-orted.txt
But I couldn't find any file matching that name.  Sorry!
--------------------------------------------------------------------------

--------------------------------------------------------------------------
A daemon (pid 31161) launched by the bproc PLS component on node 0 died
unexpectedly so we are aborting.

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to
have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
[bluesteel.lanl.gov:31160] [0,0,0] ORTE_ERROR_LOG: Error in file
pls_bproc.c at line 870




--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/



Re: [O-MPI devel] bproc question

2005-09-27 Thread Greg Watson

Yes!!! It worked:

I have the components installed in /home/gwatson/ompi_install/lib/openmpi
on the front end and in /tmp/ompi/openmpi on the nodes.  The two things
I need to do are:


1. Set my LD_LIBRARY_PATH to /home/gwatson/ompi_install/lib:/tmp/ompi  
so that it picks up the shared libraries on the front end and on the  
nodes.


2. Use the following command to run my program 'x':

orterun --mca mca_component_path /home/gwatson/ompi_install/lib/openmpi:/tmp/ompi/openmpi -np 2 ./x


Cheers,

Greg

On Sep 27, 2005, at 2:53 PM, Jeff Squyres wrote:


This exact problem came up in a different context today.

This is only a side-effect of us having crummy error messages.  :-(

What is happening is that OMPI is not finding its components.
Specifically, it's looking for the SDS components in this case, not
finding them, and then barfing.

Open MPI, by default, looks in $prefix/lib/openmpi and
$HOME/.openmpi/components for its components.  This is set with the
"mca_component_path" MCA parameter -- you can certainly change it  
to be

whatever you need.  For example:

-----
[15:26] odin:~/svn/ompi/ompi/runtime % ompi_info --param mca all
[snipped]
     MCA mca: parameter "mca_component_path" (current value:
              "/u/jsquyres/bogus/lib/openmpi:/u/jsquyres/.openmpi/components")
              Path where to look for Open MPI and ORTE components
[snipped]
-----

So you should be able to:

orterun --mca mca_component_path /path/where/you/have/them ...

Disclaimer: this *used* to work, but I haven't tried it in a long time.
There's no reason that it shouldn't work, but we all know how bit rot
happens...

However, be aware that the wrapper compilers are still hard-coded to
look in $prefix/lib to link the OMPI/ORTE/OPAL compilers.  You can
override that stuff with environment variables if you need to, but it's
not desirable.

Sidenote: in LAM, we had a single, top-level environment variable named
LAMHOME that would override all this stuff.  However, we found that it
*really* confused most users -- there were very, very few instances
where there was a genuine need for it.  So we didn't add a single,
top-level control like that in OMPI.


On Sep 27, 2005, at 4:27 PM, Greg Watson wrote:



Hi,

Trying to install ompi on a bproc machine with no network filesystem.
I've copied the contents of the ompi lib directory into /tmp/ompi on
each node and set my LD_LIBRARY_PATH to /tmp/ompi. However when I run
the program, I get the following error. Any suggestions on what else
I need to do?

Thanks,

Greg

[n0:31161] [NO-NAME] ORTE_ERROR_LOG: Not found in file
orte_init_stage1.c at line 191
[n0:31161] [NO-NAME] ORTE_ERROR_LOG: Not found in file
orte_system_init.c at line 39
[n0:31161] [NO-NAME] ORTE_ERROR_LOG: Not found in file orte_init.c at
line 47
--------------------------------------------------------------------------
Sorry!  You were supposed to get help about:
 orted:init-failure
from the file:
 help-orted.txt
But I couldn't find any file matching that name.  Sorry!
--------------------------------------------------------------------------

--------------------------------------------------------------------------
A daemon (pid 31161) launched by the bproc PLS component on node 0 died
unexpectedly so we are aborting.

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to
have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
[bluesteel.lanl.gov:31160] [0,0,0] ORTE_ERROR_LOG: Error in file
pls_bproc.c at line 870





--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/
