date:20060906

Re: [OMPI devel] [Open MPI] #334: Building with Libtool 2.1a fails to run OpenIB BTL

2006-09-06 Thread Ralf Wildenhues

Hello,

* Open MPI wrote on Wed, Sep 06, 2006 at 01:00:00PM CEST:
> #334: Building with Libtool 2.1a fails to run OpenIB BTL

>Are you testing uninstalled or installed programs/libraries?
> 
>  Installed.
> 
>If you are testing an uninstalled program: does libtool generate a shell
>  wrapper for it?  If yes: post the two shell wrappers generated by 1.5.22
>  and HEAD.
> 
>  I was testing with a trivial "hello world" MPI application; i.e., one that
>  I had compiled with mpicc and was running with "mpirun -np 2 --mca btl
>  openib hello".  Hence, I was testing against installed trees of Open MPI.
>  I took care to "rm -rf" the installation tree before testing each so that
>  there would be no kruft left from prior installs.

OK, I can only assume that with 1.5.22, some code links against
libibverbs, or loads it earlier at runtime, so that the symbol is
present.  In any case I wonder why mthca.so isn't linked directly
against libibverbs (maybe useful to suggest that to upstream).

To find out the culprit, here's a couple of quick (well) ways:
  diff build-with-lt1.5/Makefile build-with-lt2.1a/Makefile

and look for differences in library link variables.  Otherwise, the
config.log outputs should give clues.

To give you some more hints for possible causes: ompi_info could have a
different set of RPATH entries, or different NEEDED libraries than your
test executable; if any of those cause libibverbs.so to be loaded, then
the symbol would be visible already.  Maybe your test executables even
have different RPATHs or NEEDED libs (find out with 'objdump -p' and
ldd)?
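
As a rough illustration of that comparison, here is a minimal sketch that assumes the `objdump -p` output of each binary has already been captured as text. The helper names and the two sample outputs are made up for illustration and are not taken from the ticket.

```python
def parse_dynamic_section(objdump_text):
    """Collect NEEDED, RPATH and RUNPATH entries from `objdump -p` output."""
    entries = {"NEEDED": set(), "RPATH": set(), "RUNPATH": set()}
    for line in objdump_text.splitlines():
        fields = line.split(None, 1)
        if len(fields) == 2 and fields[0] in entries:
            tag, value = fields[0], fields[1].strip()
            # RPATH/RUNPATH hold colon-separated directories; NEEDED holds one name.
            parts = [value] if tag == "NEEDED" else value.split(":")
            entries[tag].update(parts)
    return entries


def diff_dynamic_sections(text_a, text_b):
    """Return {tag: (only_in_a, only_in_b)} for every tag that differs."""
    a = parse_dynamic_section(text_a)
    b = parse_dynamic_section(text_b)
    return {tag: (a[tag] - b[tag], b[tag] - a[tag])
            for tag in a if a[tag] != b[tag]}


# Made-up sample outputs, standing in for ompi_info and the test executable.
SAMPLE_OMPI_INFO = """Dynamic Section:
  NEEDED               libibverbs.so.1
  NEEDED               libc.so.6
  RPATH                /opt/openmpi/lib
"""
SAMPLE_HELLO = """Dynamic Section:
  NEEDED               libc.so.6
  RPATH                /opt/openmpi/lib
"""

if __name__ == "__main__":
    print(diff_dynamic_sections(SAMPLE_OMPI_INFO, SAMPLE_HELLO))
```

Any library that shows up for only one of the two binaries is a candidate for the one that pulls in libibverbs.so early.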

>  I've attached 2 tarballs to the bug (you have to go to the URL of the bug
>  to get them; they are not included in the mails):

If there are tarballs available at
http://svn.open-mpi.org/trac/ompi/ticket/334, then I'm too blind to find
them.  Would that be elsewhere?

>   * One with all the configure output and the wrapper script for ompi_info.
>  Note that ompi_info -- which lt_dlopen()'s the OpenIB BTL -- does not show
>  the same problem (i.e., the OpenIB BTL opens properly and ompi_info shows
>  its information).  This happens with both the uninstalled and installed
>  ompi_info.
>   * Another with the same for 2.1a.

That would be very helpful.

Cheers,
Ralf

Re: [OMPI devel] [Open MPI] #334: Building with Libtool 2.1a fails to run OpenIB BTL

2006-09-06 Thread Gleb Natapov

This error usually happens when libibverbs is dlopened without the
RTLD_GLOBAL flag.

On Wed, Sep 06, 2006 at 03:05:39PM +0200, Ralf Wildenhues wrote:
> Hello,
> 
> * Open MPI wrote on Wed, Sep 06, 2006 at 01:00:00PM CEST:
> > #334: Building with Libtool 2.1a fails to run OpenIB BTL
> 
> >Are you testing uninstalled or installed programs/libraries?
> > 
> >  Installed.
> > 
> >If you are testing an uninstalled program: does libtool generate a shell
> >  wrapper for it?  If yes: post the two shell wrappers generated by 1.5.22
> >  and HEAD.
> > 
> >  I was testing with a trivial "hello world" MPI application; i.e., one that
> >  I had compiled with mpicc and was running with "mpirun -np 2 --mca btl
> >  openib hello".  Hence, I was testing against installed trees of Open MPI.
> >  I took care to "rm -rf" the installation tree before testing each so that
> >  there would be no kruft left from prior installs.
> 
> OK, I can only assume that with 1.5.22, some code links against
> libibverbs, or loads it earlier at runtime, so that the symbol is
> present.  In any case I wonder why mthca.so isn't linked directly
> against libibverbs (maybe useful to suggest that to upstream).
> 
> To find out the culprit, here's a couple of quick (well) ways:
>   diff build-with-lt1.5/Makefile build-with-lt2.1a/Makefile
> 
> and look for differences in library link variables.  Otherwise, the
> config.log outputs should give clues.
> 
> To give you some more hints for possible causes: ompi_info could have a
> different set of RPATH entries, or different NEEDED libraries than your
> test executable; if any of those cause libibverbs.so to be loaded, then
> the symbol would be visible already.  Maybe your test executables even
> have different RPATHs or NEEDED libs (find out with 'objdump -p' and
> ldd)?
> 
> >  I've attached 2 tarballs to the bug (you have to go to the URL of the bug
> >  to get them; they are not included in the mails):
> 
> If there are tarballs available at
> http://svn.open-mpi.org/trac/ompi/ticket/334, then I'm too blind to find
> them.  Would that be elsewhere?
> 
> >   * One with all the configure output and the wrapper script for ompi_info.
> >  Note that ompi_info -- which lt_dlopen()'s the OpenIB BTL -- does not show
> >  the same problem (i.e., the OpenIB BTL opens properly and ompi_info shows
> >  its information).  This happens with both the uninstalled and installed
> >  ompi_info.
> >   * Another with the same for 2.1a.
> 
> That would be very helpful.
> 
> Cheers,
> Ralf

--
Gleb.

Re: [OMPI devel] [Open MPI] #334: Building with Libtool 2.1a fails to run OpenIB BTL

2006-09-06 Thread Ralf Wildenhues

* Gleb Natapov wrote on Wed, Sep 06, 2006 at 03:26:38PM CEST:
> This error usually happens when libibverbs is dlopened without the
> RTLD_GLOBAL flag.

Ah, yep.  That is a difference between 2.1a and 1.5.x.  Not an
undisputed one.

I don't know enough about the setup to suggest a workaround right away.
It may also be useful to bring this up on the Libtool list, as a case
in point against this change (or for some way to choose either).

Cheers,
Ralf

Re: [OMPI devel] [Open MPI] #334: Building with Libtool 2.1a fails to run OpenIB BTL

2006-09-06 Thread Jeff Squyres

On 9/6/06 9:05 AM, "Ralf Wildenhues" wrote:

>>  I was testing with a trivial "hello world" MPI application; i.e., one that
>>  I had compiled with mpicc and was running with "mpirun -np 2 --mca btl
>>  openib hello".  Hence, I was testing against installed trees of Open MPI.
>>  I took care to "rm -rf" the installation tree before testing each so that
>>  there would be no kruft left from prior installs.
> 
> OK, I can only assume that with 1.5.22, some code links against
> libibverbs, or loads it earlier at runtime, so that the symbol is
> present.  In any case I wonder why mthca.so isn't linked directly
> against libibverbs (maybe useful to suggest that to upstream).

I'm not sure.  I think it's a plugin; I'll ask.

> To find out the culprit, here's a couple of quick (well) ways:
>   diff build-with-lt1.5/Makefile build-with-lt2.1a/Makefile
> 
> and look for differences in library link variables.  Otherwise, the
> config.log outputs should give clues.

I did that early in the ticket and didn't see much of a difference.
However, while typing this, Gleb just replied about RTLD_GLOBAL.  I'll go
comment on that...

> To give you some more hints for possible causes: ompi_info could have a
> different set of RPATH entries, or different NEEDED libraries than your
> test executable; if any of those cause libibverbs.so to be loaded, then
> the symbol would be visible already.  Maybe your test executables even
> have different RPATHs or NEEDED libs (find out with 'objdump -p' and
> ldd)?
> 
>>  I've attached 2 tarballs to the bug (you have to go to the URL of the bug
>>  to get them; they are not included in the mails):
> 
> If there are tarballs available at
> http://svn.open-mpi.org/trac/ompi/ticket/334, then I'm too blind to find
> them.  Would that be elsewhere?

It's because I'm a bozo and forgot to attach them.  They're now near the top
of the ticket in the "Attachments" section.  Sorry about that...

http://svn.open-mpi.org/trac/ompi/ticket/334

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems

Re: [OMPI devel] [Open MPI] #334: Building with Libtool 2.1a fails to run OpenIB BTL

2006-09-06 Thread Jeff Squyres

On 9/6/06 9:37 AM, "Ralf Wildenhues" wrote:

>> This error usually happens when libibverbs is dlopened without the
>> RTLD_GLOBAL flag.
> 
> Ah, yep.  That is a difference between 2.1a and 1.5.x.  Not an
> undisputed one.
> 
> I don't know enough about the setup to suggest a workaround right away.
> It may also be useful to bring this up on the Libtool list, as a case
> in point against this change (or for some way to choose either).

The setup is that we lt_dlopen() the OpenIB BTL (which was linked against
libibverbs.so), which, in turn, dlopen's mthca.so.  Hence, since mthca.so
apparently requires at least one symbol from libibverbs.so, that dlopen
fails.
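
The visibility difference can be sketched with Python's ctypes, which wraps dlopen(). This is a simplified illustration, not Open MPI or Libtool code: the function names are hypothetical, both paths are placeholders supplied by the caller, and the dependency is loaded directly here rather than pulled in through a BTL.

```python
import ctypes


def open_library(path, make_global):
    """dlopen() a shared object via ctypes.

    With make_global=True the library's symbols join the global namespace
    (RTLD_GLOBAL), so objects loaded later can resolve against them.
    Otherwise they stay private to this handle (RTLD_LOCAL).
    path=None opens the running program itself, as dlopen(NULL, ...) does.
    """
    mode = ctypes.RTLD_GLOBAL if make_global else ctypes.RTLD_LOCAL
    return ctypes.CDLL(path, mode=mode)


def load_plugin_chain(dependency_path, plugin_path, make_global=True):
    """Load a dependency, then a plugin that needs the dependency's symbols.

    Assumption: the plugin is NOT itself linked against the dependency.
    In that case the second load may raise OSError (undefined symbol)
    when make_global is False, which mirrors the failure discussed here.
    """
    dependency = open_library(dependency_path, make_global)
    plugin = open_library(plugin_path, make_global=False)
    return dependency, plugin
```

A call such as `load_plugin_chain("/path/to/dependency.so", "/path/to/plugin.so")` would only succeed on a system where both placeholder files exist.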

I'll mail the LT list.

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems

Re: [OMPI devel] [IPv6] new component oob/tcp6

2006-09-06 Thread Bogdan Costescu

On Fri, 1 Sep 2006, Ralph Castain wrote:

> The only use case I am really concerned about is that of a Head Node 
> Process (HNP) that needs to talk to both IPv6 and IPv4 systems. I 
> admit this will be unusual,

This and other aspects were discussed or at least mentioned in a 
thread starting at:

http://www.open-mpi.org/community/lists/devel/2006/03/0781.php

I don't know why you think that this (talking to different nodes via 
different channels) is unusual - I think that it's quite probable, 
especially in a heterogeneous environment.

However, if the present discussion is only about a proof of concept 
version, then I'd say that anything to show IPv6 functionality would 
be acceptable.

-- 
Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: bogdan.coste...@iwr.uni-heidelberg.de

Re: [OMPI devel] [IPv6] new component oob/tcp6

2006-09-06 Thread Ralph H Castain

Actually, I was a part of that thread - see my comments beginning with
http://www.open-mpi.org/community/lists/devel/2006/03/0797.php.

Perhaps I communicated poorly here. The issue in the prior thread was that
most systems nowadays offer at least some level of IPv6 compatibility,
even if nothing more than mapping IPv6 addresses to IPv4. My point in that
thread was that some types of systems (e.g., embedded systems) don't - they
have no ability to interact with IPv6 at all - but that these are not
commonly found in the high performance world (the focus of OpenMPI).

Although I expect hetero operations to be fairly common, I don't expect to
see too many high performance systems that have no library support at all
for IPv6.

Hope that clarifies my comment. The intent is to fully support both types of
systems anyway, so I'll concede that the point (i.e., how unusual the
situation might be) is somewhat moot.

On 9/6/06 8:13 AM, "Bogdan Costescu" wrote:

> On Fri, 1 Sep 2006, Ralph Castain wrote:
> 
>> The only use case I am really concerned about is that of a Head Node
>> Process (HNP) that needs to talk to both IPv6 and IPv4 systems. I
>> admit this will be unusual,
> 
> This and other aspects were discussed or at least mentioned in a
> thread starting at:
> 
> http://www.open-mpi.org/community/lists/devel/2006/03/0781.php
> 
> I don't know why you think that this (talking to different nodes via
> different channels) is unusual - I think that it's quite probable,
> especially in a heterogeneous environment.
> 
> However, if the present discussion is only about a proof of concept
> version, then I'd say that anything to show IPv6 functionality would
> be acceptable.

Re: [OMPI devel] [IPv6] new component oob/tcp6

2006-09-06 Thread Christian Kauhaus

Bogdan Costescu:
>I don't know why you think that this (talking to different nodes via 
>different channels) is unusual - I think that it's quite probable, 
>especially in a heterogeneous environment.

I think the first goal should be to get IPv6 working -- and this is much
easier when we restrict ourselves to the case where all systems
participating in one(!) job are reachable via a single protocol version,
either IPv4 or IPv6.

I'm not quite sure if we need to run a *single* job across a network
with both systems that are not reachable via IPv4 and systems
that are not reachable via IPv6. If there is a practical need for this,
we will probably tackle this in the future. Note that the current plan
does not restrict the use of OpenMPI in heterogeneous IPv4/IPv6
environments, but we will not support mixed IPv4/IPv6 operation in a
single job right now. 

Our current plan is to look into the hostfile and see if there are 

(1a) just IPv4 addresses
(1b) IPv4 addresses and hostnames for which 'A' queries can be resolved
(2a) just IPv6 addresses
(2b) IPv6 addresses and hostnames for which 'AAAA' queries can be resolved.

In case 1 we initially use an IPv4 transport and in case 2 we initially
use an IPv6 transport for the oob. If neither case 1 nor case 2 applies,
we abort.
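
The classification above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the hostfile is assumed to be already parsed into a list of address or hostname strings, and preferring IPv4 when both cases apply is a choice made for this sketch, not something decided in this thread.

```python
import ipaddress
import socket


def resolves(host, family):
    """True if `host` resolves for the given address family (A or AAAA lookup)."""
    try:
        socket.getaddrinfo(host, None, family)
        return True
    except socket.gaierror:
        return False


def entry_supports(entry, family, resolver=resolves):
    """Check one hostfile entry against one address family."""
    try:
        address = ipaddress.ip_address(entry)
    except ValueError:
        # Not a literal address, so treat it as a hostname (cases 1b / 2b).
        return resolver(entry, family)
    wanted_version = 4 if family == socket.AF_INET else 6
    return address.version == wanted_version  # cases 1a / 2a


def choose_oob_transport(entries, resolver=resolves):
    """Return "ipv4" (case 1) or "ipv6" (case 2); raise if neither applies."""
    if all(entry_supports(e, socket.AF_INET, resolver) for e in entries):
        return "ipv4"  # assumption: IPv4 wins when both cases apply
    if all(entry_supports(e, socket.AF_INET6, resolver) for e in entries):
        return "ipv6"
    raise ValueError("hosts are not all reachable via a single IP version")
```

Passing a custom `resolver` keeps the decision logic testable without real DNS lookups.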

I hope that all can agree that this is a good starting point. 

Regards
  Christian

-- 
Dipl.-Inf. Christian Kauhaus   <><
Lehrstuhl fuer Rechnerarchitektur und -kommunikation 
Institut fuer Informatik * Ernst-Abbe-Platz 1-2 * D-07743 Jena
Tel: +49 3641 9 46376  *  Fax: +49 3641 9 46372   *  Raum 3217

Re: [OMPI devel] [IPv6] new component oob/tcp6

2006-09-06 Thread Adrian Knoth

On Wed, Sep 06, 2006 at 05:44:23PM +0200, Christian Kauhaus wrote:

> Our current plan is to look into the hostfile and see if there are 
> 
> (1a) just IPv4 addresses
> (1b) IPv4 addresses and hostnames for which 'A' queries can be resolved
> (2a) just IPv6 addresses
> (2b) IPv6 addresses and hostnames for which 'AAAA' queries can be resolved.

Speaking of which: Today, I've extended rds/hostfile/ to accept
IPv6 addresses.

This now gives me the possibility to specify IPv6 addresses,
resulting in an IPv4 (yes, I-P-v-four) connection.

Obviously, I'll have to investigate ;)

(just to let you know I'm working on it)

-- 
mail: a...@thur.de  http://adi.thur.de  PGP: v2-key via keyserver

Wer braucht 'ne Maus, wenn er 'ne Tastatur hat? (Sebastian Linser)

Re: [OMPI devel] [IPv6] new component oob/tcp6

2006-09-06 Thread Ralph H Castain

On 9/6/06 9:44 AM, "Christian Kauhaus" wrote:

> Bogdan Costescu:
>> I don't know why you think that this (talking to different nodes via
>> different channels) is unusual - I think that it's quite probable,
>> especially in a heterogeneous environment.
> 
> I think the first goal should be to get IPv6 working -- and this is much
> easier when we restrict ourselves to the case where all systems
> participating in one(!) job are reachable via a single protocol version,
> either IPv4 or IPv6.
> 
> I'm not quite sure if we need to run a *single* job across a network
> with both systems that are not reachable via IPv4 and systems
> that are not reachable via IPv6. If there is a practical need for this,
> we will probably tackle this in the future. Note that the current plan
> does not restrict the use of OpenMPI in heterogeneous IPv4/IPv6
> environments, but we will not support mixed IPv4/IPv6 operation in a
> single job right now.
> 
> Our current plan is to look into the hostfile and see if there are
> 
> (1a) just IPv4 addresses
> (1b) IPv4 addresses and hostnames for which 'A' queries can be resolved
> (2a) just IPv6 addresses
> (2b) IPv6 addresses and hostnames for which 'AAAA' queries can be resolved.
> 
> In case 1 we initially use an IPv4 transport and in case 2 we initially
> use an IPv6 transport for the oob. If neither case 1 nor case 2 applies,
> we abort.
> 

Actually, that could cause us considerable problems. Only a subset of OpenRTE
and OpenMPI users actually have hostfiles - the majority do not. Hence, if
we base the IPv6 operation on what is in a hostfile we will be in trouble.

I believe we are going to have to use the "select" mechanism of the OOB
and/or the RML frameworks to let us know which protocol to use when talking
to a specific host.

I also believe you cannot assume that this choice will be consistent for all
processes involved in a job. For example, the head node process must talk to
the external network, which may well be IPv6. However, the nodes *inside*
the cluster may well be IPv4 since they could likely be sitting on a NAT.
The HNP still needs to talk to those nodes as well as the external network.

I don't believe that letting both modes co-exist is all that much harder a
problem to solve. We have similar situations elsewhere in the code base and
have found that the framework mechanism works very well in this situation.

I need to answer Adrian's note anyway and will describe there how to handle
multiple component operations.
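
To make the per-host idea concrete, here is a minimal sketch of choosing a component for each peer instead of once per job. It is not the actual OOB/RML "select" API: the `Component` class, its fields, the priorities, and the addresses are all hypothetical, and only literal IP addresses are handled.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical stand-in for a transport component such as oob/tcp6."""
    name: str
    ip_version: int
    priority: int

    def can_reach(self, address):
        # A component is usable for a peer if it speaks that peer's IP version.
        return ipaddress.ip_address(address).version == self.ip_version


def select_component(components, address):
    """Pick the highest-priority component able to reach this one peer."""
    usable = [c for c in components if c.can_reach(address)]
    if not usable:
        raise LookupError(f"no component can reach {address}")
    return max(usable, key=lambda c: c.priority)


if __name__ == "__main__":
    available = [Component("tcp", 4, 10), Component("tcp6", 6, 10)]
    # A head node process could talk IPv4 inside the cluster and IPv6 outside.
    for peer in ("10.0.0.5", "2001:db8::10"):
        print(peer, "->", select_component(available, peer).name)
```

Because the choice is made per peer, both components can be active in the same process at once.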

> I hope that all can agree that this is a good starting point.
> 
> Regards
>   Christian
