Re: [OMPI devel] [Open MPI] #334: Building with Libtool 2.1a fails to run OpenIB BTL
Hello,

* Open MPI wrote on Wed, Sep 06, 2006 at 01:00:00PM CEST:
> #334: Building with Libtool 2.1a fails to run OpenIB BTL
>
> >Are you testing uninstalled or installed programs/libraries?
>
> Installed.
>
> >If you are testing an uninstalled program: does libtool generate a shell
> >wrapper for it? If yes: post the two shell wrappers generated by 1.5.22
> >and HEAD.
>
> I was testing with a trivial "hello world" MPI application; i.e., one that
> I had compiled with mpicc and was running with "mpirun -np 2 --mca btl
> openib hello". Hence, I was testing against installed trees of Open MPI.
> I took care to "rm -rf" the installation tree before testing each so that
> there would be no kruft left from prior installs.

OK, I can only assume that with 1.5.22, some code links against
libibverbs, or loads it earlier at runtime, so that the symbol is
present. In any case I wonder why mthca.so isn't linked directly
against libibverbs (maybe useful to suggest that to upstream).

To find out the culprit, here's a couple of quick (well) ways:

  diff build-with-lt1.5/Makefile build-with-lt2.1a/Makefile

and look for differences in library link variables. Otherwise, the
config.log outputs should give clues.

To give you some more hints for possible causes: ompi_info could have a
different set of RPATH entries, or different NEEDED libraries than your
test executable; if any of those cause libibverbs.so to be loaded, then
the symbol would be visible already. Maybe your test executables even
have different RPATHs or NEEDED libs (find out with 'objdump -p' and
ldd)?

> I've attached 2 tarballs to the bug (you have to go to the URL of the bug
> to get them; they are not included in the mails):

If there are tarballs available at
http://svn.open-mpi.org/trac/ompi/ticket/334, then I'm too blind to find
them. Would that be elsewhere?

> * One with all the configure output and the wrapper script for ompi_info.
> Note that ompi_info -- which lt_dlopen()'s the OpenIB BTL -- does not show
> the same problem (i.e., the OpenIB BTL opens properly and ompi_info shows
> its information). This happens with both the uninstalled and installed
> ompi_info.
> * Another with the same for 2.1a.

That would be very helpful.

Cheers,
Ralf
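The comparison of NEEDED and RPATH entries suggested above can be partly automated. The following is only a minimal sketch in Python, not part of the original thread: it assumes the usual `objdump -p` layout in which each dynamic-section entry is a tag followed by its value, and the function names are hypothetical.

```python
import re

def parse_dynamic_section(objdump_output):
    """Collect NEEDED, RPATH and RUNPATH entries from `objdump -p` text."""
    entries = {"NEEDED": [], "RPATH": [], "RUNPATH": []}
    for line in objdump_output.splitlines():
        match = re.match(r"\s*(NEEDED|RPATH|RUNPATH)\s+(\S+)", line)
        if match:
            tag, value = match.groups()
            # RPATH/RUNPATH values are colon-separated directory lists.
            values = [value] if tag == "NEEDED" else value.split(":")
            entries[tag].extend(values)
    return entries

def needed_difference(output_a, output_b):
    """Return libraries that only one of the two binaries lists as NEEDED."""
    needed_a = set(parse_dynamic_section(output_a)["NEEDED"])
    needed_b = set(parse_dynamic_section(output_b)["NEEDED"])
    return sorted(needed_a ^ needed_b)
```

The text for each binary could be obtained with something like `subprocess.run(["objdump", "-p", path], capture_output=True, text=True).stdout`. If libibverbs shows up in the difference between ompi_info and the test executable, that would support the explanation given above.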
Re: [OMPI devel] [Open MPI] #334: Building with Libtool 2.1a fails to run OpenIB BTL
This error usually happens when libibverbs is dlopened without the
RTLD_GLOBAL flag.

On Wed, Sep 06, 2006 at 03:05:39PM +0200, Ralf Wildenhues wrote:
> Hello,
>
> * Open MPI wrote on Wed, Sep 06, 2006 at 01:00:00PM CEST:
> > #334: Building with Libtool 2.1a fails to run OpenIB BTL
> >
> > >Are you testing uninstalled or installed programs/libraries?
> >
> > Installed.
> >
> > >If you are testing an uninstalled program: does libtool generate a shell
> > >wrapper for it? If yes: post the two shell wrappers generated by 1.5.22
> > >and HEAD.
> >
> > I was testing with a trivial "hello world" MPI application; i.e., one that
> > I had compiled with mpicc and was running with "mpirun -np 2 --mca btl
> > openib hello". Hence, I was testing against installed trees of Open MPI.
> > I took care to "rm -rf" the installation tree before testing each so that
> > there would be no kruft left from prior installs.
>
> OK, I can only assume that with 1.5.22, some code links against
> libibverbs, or loads it earlier at runtime, so that the symbol is
> present. In any case I wonder why mthca.so isn't linked directly
> against libibverbs (maybe useful to suggest that to upstream).
>
> To find out the culprit, here's a couple of quick (well) ways:
>
>   diff build-with-lt1.5/Makefile build-with-lt2.1a/Makefile
>
> and look for differences in library link variables. Otherwise, the
> config.log outputs should give clues.
>
> To give you some more hints for possible causes: ompi_info could have a
> different set of RPATH entries, or different NEEDED libraries than your
> test executable; if any of those cause libibverbs.so to be loaded, then
> the symbol would be visible already. Maybe your test executables even
> have different RPATHs or NEEDED libs (find out with 'objdump -p' and
> ldd)?
>
> > I've attached 2 tarballs to the bug (you have to go to the URL of the bug
> > to get them; they are not included in the mails):
>
> If there are tarballs available at
> http://svn.open-mpi.org/trac/ompi/ticket/334, then I'm too blind to find
> them. Would that be elsewhere?
>
> > * One with all the configure output and the wrapper script for ompi_info.
> > Note that ompi_info -- which lt_dlopen()'s the OpenIB BTL -- does not show
> > the same problem (i.e., the OpenIB BTL opens properly and ompi_info shows
> > its information). This happens with both the uninstalled and installed
> > ompi_info.
> > * Another with the same for 2.1a.
>
> That would be very helpful.
>
> Cheers,
> Ralf
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

--
Gleb.
Re: [OMPI devel] [Open MPI] #334: Building with Libtool 2.1a fails to run OpenIB BTL
* Gleb Natapov wrote on Wed, Sep 06, 2006 at 03:26:38PM CEST:
> This error usually happens when libibverbs is dlopened without the
> RTLD_GLOBAL flag.

Ah, yep. That is a difference between 2.1a and 1.5.x. Not an
undisputed one.

I don't know enough about the setup to suggest a workaround right away.
It may also be useful to bring this up on the Libtool list, as a case
in point against this change (or for some way to choose either).

Cheers,
Ralf
Re: [OMPI devel] [Open MPI] #334: Building with Libtool 2.1a fails to run OpenIB BTL
On 9/6/06 9:05 AM, "Ralf Wildenhues" wrote:

>> I was testing with a trivial "hello world" MPI application; i.e., one that
>> I had compiled with mpicc and was running with "mpirun -np 2 --mca btl
>> openib hello". Hence, I was testing against installed trees of Open MPI.
>> I took care to "rm -rf" the installation tree before testing each so that
>> there would be no kruft left from prior installs.
>
> OK, I can only assume that with 1.5.22, some code links against
> libibverbs, or loads it earlier at runtime, so that the symbol is
> present. In any case I wonder why mthca.so isn't linked directly
> against libibverbs (maybe useful to suggest that to upstream).

I'm not sure. I think it's a plugin; I'll ask.

> To find out the culprit, here's a couple of quick (well) ways:
>
>   diff build-with-lt1.5/Makefile build-with-lt2.1a/Makefile
>
> and look for differences in library link variables. Otherwise, the
> config.log outputs should give clues.

I did that early in the ticket and didn't see much of a difference.
However, while typing this, Gleb just replied about RTLD_GLOBAL. I'll go
comment on that...

> To give you some more hints for possible causes: ompi_info could have a
> different set of RPATH entries, or different NEEDED libraries than your
> test executable; if any of those cause libibverbs.so to be loaded, then
> the symbol would be visible already. Maybe your test executables even
> have different RPATHs or NEEDED libs (find out with 'objdump -p' and
> ldd)?
>
>> I've attached 2 tarballs to the bug (you have to go to the URL of the bug
>> to get them; they are not included in the mails):
>
> If there are tarballs available at
> http://svn.open-mpi.org/trac/ompi/ticket/334, then I'm too blind to find
> them. Would that be elsewhere?

It's because I'm a bozo and forgot to attach them. They're now near the
top of the ticket in the "Attachments" section. Sorry about that...

http://svn.open-mpi.org/trac/ompi/ticket/334

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
Re: [OMPI devel] [Open MPI] #334: Building with Libtool 2.1a fails to run OpenIB BTL
On 9/6/06 9:37 AM, "Ralf Wildenhues" wrote:

>> This error usually happens when libibverbs is dlopened without the
>> RTLD_GLOBAL flag.
>
> Ah, yep. That is a difference between 2.1a and 1.5.x. Not an
> undisputed one.
>
> I don't know enough about the setup to suggest a workaround right away.
> It may also be useful to bring this up on the Libtool list, as a case
> in point against this change (or for some way to choose either).

The setup is that we lt_dlopen() the OpenIB BTL (which was linked against
libibverbs.so), which, in turn, dlopen's mthca.so. Hence, since mthca.so
apparently requires at least one symbol from libibverbs.so, and that
symbol is not globally visible, that dlopen fails.

I'll mail the LT list.

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
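The visibility issue described above comes down to the flags passed to dlopen(). The sketch below is an illustration in Python via ctypes rather than anything from the Open MPI or Libtool sources. The helper names are hypothetical, and it only shows how RTLD_GLOBAL and RTLD_LOCAL are selected. It does not reproduce what lt_dlopen() does internally.

```python
import ctypes
import os

def dlopen_mode(export_symbols):
    """Build dlopen() flags.

    With RTLD_GLOBAL the library's symbols become available for resolving
    references in libraries loaded afterwards; with RTLD_LOCAL they do not.
    """
    visibility = os.RTLD_GLOBAL if export_symbols else os.RTLD_LOCAL
    return os.RTLD_NOW | visibility

def load_library(path, export_symbols):
    """dlopen `path` through ctypes with the chosen symbol visibility."""
    return ctypes.CDLL(path, mode=dlopen_mode(export_symbols))
```

In the scenario from this thread, a plugin such as mthca.so that is not itself linked against libibverbs can only resolve its libibverbs symbols if something has already loaded libibverbs with global visibility, for example `load_library(path_to_libibverbs, True)` with a path chosen by the caller. Whether preloading like this is an acceptable workaround for Open MPI is not settled in the thread.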
Re: [OMPI devel] [IPv6] new component oob/tcp6
On Fri, 1 Sep 2006, Ralph Castain wrote:

> The only use case I am really concerned about is that of a Head Node
> Process (HNP) that needs to talk to both IPv6 and IPv4 systems. I
> admit this will be unusual,

This and other aspects were discussed or at least mentioned in a
thread starting at:

http://www.open-mpi.org/community/lists/devel/2006/03/0781.php

I don't know why you think that this (talking to different nodes via
different channels) is unusual - I think that it's quite probable,
especially in a heterogeneous environment.

However, if the present discussion is only about a proof of concept
version, then I'd say that anything to show IPv6 functionality would
be acceptable.

--
Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: bogdan.coste...@iwr.uni-heidelberg.de
Re: [OMPI devel] [IPv6] new component oob/tcp6
Actually, I was a part of that thread - see my comments beginning with
http://www.open-mpi.org/community/lists/devel/2006/03/0797.php.

Perhaps I communicated poorly here. The point made in the prior thread was
that nearly all systems nowadays offer at least some level of IPv6
compatibility, even if it is nothing more than mapping IPv6 addresses to
IPv4. My point in that thread was that some types of systems (e.g.,
embedded systems) don't - they have no ability to interact with IPv6 at
all - but that these are not commonly found in the high performance world
(the focus of OpenMPI).

Although I expect hetero operations to be fairly common, I don't expect to
see too many high performance systems that have no library support at all
for IPv6.

Hope that clarifies my comment. The intent is to fully support both types
of systems anyway, so I'll concede that the point (i.e., how unusual the
situation might be) is somewhat moot.

On 9/6/06 8:13 AM, "Bogdan Costescu" wrote:

> On Fri, 1 Sep 2006, Ralph Castain wrote:
>
>> The only use case I am really concerned about is that of a Head Node
>> Process (HNP) that needs to talk to both IPv6 and IPv4 systems. I
>> admit this will be unusual,
>
> This and other aspects were discussed or at least mentioned in a
> thread starting at:
>
> http://www.open-mpi.org/community/lists/devel/2006/03/0781.php
>
> I don't know why you think that this (talking to different nodes via
> different channels) is unusual - I think that it's quite probable,
> especially in a heterogeneous environment.
>
> However, if the present discussion is only about a proof of concept
> version, then I'd say that anything to show IPv6 functionality would
> be acceptable.
Re: [OMPI devel] [IPv6] new component oob/tcp6
Bogdan Costescu:
>I don't know why you think that this (talking to different nodes via
>different channels) is unusual - I think that it's quite probable,
>especially in a heterogeneous environment.

I think the first goal should be to get IPv6 working -- and this is much
easier when we restrict ourselves to the case when all systems
participating in one(!) job are reachable via a single protocol version,
either IPv4 or IPv6.

I'm not quite sure if we need to run a *single* job across a network
with both systems that are not reachable via IPv4 and systems
that are not reachable via IPv6. If there is a practical need for this,
we will probably tackle this in the future. Note that the current plan
does not restrict the use of OpenMPI in heterogeneous IPv4/IPv6
environments, but we will not support mixed IPv4/IPv6 operation in a
single job right now.

Our current plan is to look into the hostfile and see if there are

(1a) just IPv4 addresses
(1b) IPv4 addresses and hostnames for which 'A' queries can be resolved
(2a) just IPv6 addresses
(2b) IPv6 addresses and hostnames for which '' queries can be resolved.

In case 1 we initially use an IPv4 transport and in case 2 we initially
use an IPv6 transport for the oob. If neither case 1 nor case 2 applies,
we abort.

I hope that all can agree that this is a good starting point.

Regards
  Christian

--
Dipl.-Inf. Christian Kauhaus <><
Lehrstuhl fuer Rechnerarchitektur und -kommunikation
Institut fuer Informatik * Ernst-Abbe-Platz 1-2 * D-07743 Jena
Tel: +49 3641 9 46376 * Fax: +49 3641 9 46372 * Raum 3217
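The hostfile check outlined in cases 1 and 2 above could look roughly like the following. This is a sketch under stated assumptions, not the actual oob/tcp6 code. It is written in Python for brevity, the function names are hypothetical, case 1 is tried before case 2, and the record type left blank in (2b) is assumed to be AAAA, which is what an AF_INET6 lookup asks for.

```python
import ipaddress
import socket

def entry_supports(entry, family, resolve=socket.getaddrinfo):
    """True if a hostfile entry is usable over `family` (AF_INET or AF_INET6)."""
    try:
        version = ipaddress.ip_address(entry).version
    except ValueError:
        # Not an address literal: treat it as a hostname and ask the resolver
        # for addresses of the requested family only.
        try:
            return bool(resolve(entry, None, family))
        except socket.gaierror:
            return False
    return version == (4 if family == socket.AF_INET else 6)

def choose_oob_family(entries, resolve=socket.getaddrinfo):
    """Return AF_INET for case 1, AF_INET6 for case 2, None if neither fits."""
    for family in (socket.AF_INET, socket.AF_INET6):
        if all(entry_supports(entry, family, resolve) for entry in entries):
            return family
    return None
```

A caller would abort the job when `choose_oob_family` returns None, matching the "we abort" rule. The `resolve` parameter exists only so that the lookup can be replaced in tests or on hosts without DNS.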
Re: [OMPI devel] [IPv6] new component oob/tcp6
On Wed, Sep 06, 2006 at 05:44:23PM +0200, Christian Kauhaus wrote:

> Our current plan is to look into the hostfile and see if there are
>
> (1a) just IPv4 addresses
> (1b) IPv4 addresses and hostnames for which 'A' queries can be resolved
> (2a) just IPv6 addresses
> (2b) IPv6 addresses and hostnames for which '' queries can be resolved.

Speaking of which:

Today, I've extended rds/hostfile/ to accept IPv6 addresses. This now
gives me the possibility to specify IPv6 addresses, resulting in an IPv4
(yes, I-P-v-four) connection.

Obviously, I'll have to investigate ;)

(just to let you know I'm working on it)

--
mail: a...@thur.de  http://adi.thur.de  PGP: v2-key via keyserver

Wer braucht 'ne Maus, wenn er 'ne Tastatur hat? (Sebastian Linser)
Re: [OMPI devel] [IPv6] new component oob/tcp6
On 9/6/06 9:44 AM, "Christian Kauhaus" wrote:

> Bogdan Costescu:
>> I don't know why you think that this (talking to different nodes via
>> different channels) is unusual - I think that it's quite probable,
>> especially in a heterogeneous environment.
>
> I think the first goal should be to get IPv6 working -- and this is much
> easier when we restrict ourselves to the case when all systems
> participating in one(!) job are reachable via a single protocol version,
> either IPv4 or IPv6.
>
> I'm not quite sure if we need to run a *single* job across a network
> with both systems that are not reachable via IPv4 and systems
> that are not reachable via IPv6. If there is a practical need for this,
> we will probably tackle this in the future. Note that the current plan
> does not restrict the use of OpenMPI in heterogeneous IPv4/IPv6
> environments, but we will not support mixed IPv4/IPv6 operation in a
> single job right now.
>
> Our current plan is to look into the hostfile and see if there are
>
> (1a) just IPv4 addresses
> (1b) IPv4 addresses and hostnames for which 'A' queries can be resolved
> (2a) just IPv6 addresses
> (2b) IPv6 addresses and hostnames for which '' queries can be resolved.
>
> In case 1 we initially use an IPv4 transport and in case 2 we initially
> use an IPv6 transport for the oob. If neither case 1 nor case 2 applies,
> we abort.
>

Actually, that could cause us considerable problems. Only a subset of
OpenRTE and OpenMPI users actually have hostfiles - the majority do not.
Hence, if we base the IPv6 operation on what is in a hostfile we will be
in trouble.

I believe we are going to have to use the "select" mechanism of the OOB
and/or the RML frameworks to let us know which protocol to use when
talking to a specific host. I also believe you cannot assume that this
choice will be consistent for all processes involved in a job.

For example, the head node process must talk to the external network,
which may well be IPv6. However, the nodes *inside* the cluster may well
be IPv4 since they could likely be sitting on a NAT. The HNP still needs
to talk to those nodes as well as the external network.

I don't believe that letting both modes co-exist is all that much harder a
problem to solve. We have similar situations elsewhere in the code base
and have found that the framework mechanism works very well in this
situation. I need to answer Adrian's note anyway and will describe there
how to handle multiple component operations.

> I hope that all can agree that this is a good starting point.
>
> Regards
> Christian
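The per-host choice argued for above, in which the HNP speaks IPv6 to the outside and IPv4 to nodes behind a NAT, can be pictured as picking a component for each peer rather than once per job. The following is only a toy sketch in Python and does not reflect the real OOB/RML select code. The component table, the function names and the example addresses are all hypothetical.

```python
import ipaddress

# Hypothetical component table: component name -> IP version it speaks.
# The names echo the oob/tcp6 subject line but are illustrative only.
COMPONENTS = {"tcp": 4, "tcp6": 6}

def select_component(peer_address):
    """Pick the component able to reach one peer, or None if none can."""
    version = ipaddress.ip_address(peer_address).version
    for name, speaks in COMPONENTS.items():
        if speaks == version:
            return name
    return None

def plan_connections(peer_addresses):
    """Map every peer to its own component; peers may differ within a job."""
    return {peer: select_component(peer) for peer in peer_addresses}
```

For a head node with one external IPv6 peer and one internal IPv4 node, `plan_connections(["2001:db8::1", "192.168.0.5"])` assigns a different component to each, which is the mixed operation that a single job-wide choice would rule out.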