I think your proposed approach is an excellent one! I know it will take work to implement, which raises its own issues, but I do believe that it is the only real long-term solution.
Just my $0.002. I would be willing to help with implementation, if that would be of use. Not sure I understand the build system well enough to just do it, I fear.

On 2/7/08 9:34 AM, "Jeff Squyres" <jsquy...@cisco.com> wrote:

> All these comments are good. I confess that although I should have, I
> really did not previously consider the complexity of adding in N
> contrib packages to OMPI.
>
> The goal of the contrib packages is to easily allow additional
> functionality that is nicely integrated with Open MPI. An obvious way
> to do this is to include the code in the Open MPI tarball, but that
> leads to the logistics and other issues that have been identified.
>
> Ralph proposes a good way around this. But what about going farther
> than that: what if we offer a standardized set of hooks for
> including contrib functionality *after* core OMPI has been installed?
> Yes, it's one more step after OMPI has been installed -- but if we can
> keep it as *one* step, perhaps the user onus is not that bad. Let me
> explain.
>
> Consider a new standalone executable: ompi_contrib. You would run
> ompi_contrib to install and uninstall contrib functionality into your
> existing OMPI:
>
>    ompi_contrib --install http://www.example.com/nbc/nbc-ompi-contrib.tar.gz
> or ompi_contrib --install file:///home/htor/nbc-ompi-contrib.tar.gz
>
> This will download NBC (if http), build it, and install it into the
> current OMPI. It is likely that the nbc-ompi-contrib.tar.gz file will
> contain the real NBC tarball (or maybe just a reference to it?) plus a
> small number of hook/glue scripts for OMPI integration (perhaps quite
> similar to what is in the contrib/ tree [on the branch] today for
> NBC?). Likewise, after NBC is installed into the local OMPI
> installation, ompi_info should be able to show "nbc" as installed
> contrib functionality. It then follows that we might be able to do:
>
>    ompi_contrib --uninstall nbc
>
> to uninstall contrib NBC from the local OMPI installation.
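
A minimal sketch of how the ompi_contrib command line described above might be dispatched, offered only to make the idea concrete. No such tool exists in the thread beyond the proposal; the function names (build_parser, plan) and the action labels are hypothetical, and nothing is actually downloaded, built, or installed here.

```python
import argparse
from urllib.parse import urlparse


def build_parser():
    """Parser for the proposed (hypothetical) ompi_contrib tool."""
    parser = argparse.ArgumentParser(prog="ompi_contrib")
    # Exactly one of --install / --uninstall must be given.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--install", metavar="URL",
                       help="http(s):// or file:// location of a contrib tarball")
    group.add_argument("--uninstall", metavar="NAME",
                       help="name of an installed contrib package, e.g. nbc")
    return parser


def plan(argv):
    """Return an (action, target) pair describing what would be done.

    The action labels are made up for this sketch; a real tool would
    fetch, build, and install (or remove) the package at this point.
    """
    args = build_parser().parse_args(argv)
    if args.uninstall:
        return ("uninstall", args.uninstall)
    url = urlparse(args.install)
    if url.scheme in ("http", "https"):
        # Remote tarball: needs a download step before building.
        return ("download-build-install", args.install)
    if url.scheme == "file":
        # Local tarball: only the filesystem path is needed.
        return ("build-install", url.path)
    raise SystemExit(f"unsupported URL scheme: {url.scheme!r}")
```

For example, plan(["--uninstall", "nbc"]) yields the pair ("uninstall", "nbc"), and a file:// URL is reduced to its local path. Keeping install and uninstall mutually exclusive mirrors the "one step" goal in the proposal.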
>
> This kind of approach would seem to have several benefits:
>
> - Keep a clear[er] distinction between core OMPI and contributed
>   packages.
>
> - Allow simple integration of MPI libraries, tools, and even
>   applications (!) (think: numerical libraries, boost C++ libraries,
>   etc. -- how many of your users install additional tools on top of MPI
>   incorrectly?). Anything
>
> - Allow 3rd parties to have "contrib" code to Open MPI without needing
>   to get into our code tree (and sign the 3rd party agreements, etc.),
>   keeping our distribution size down, avoiding release schedule
>   logistical issues, keeping our "core" build time down, etc.
>
> - Allow integration of contrib functionality at both a per-user and
>   system-wide basis.
>
> What I'm really proposing here is that OMPI becomes a system that can
> have additional functionality installed / uninstalled. Based on the
> infrastructure that we already have, this is not as much of a stretch
> as one would think.
>
> Comments?
>
> ("who's going to write this" is a question that will also have to be
> answered, but perhaps we can discuss the code concept/idea first...)
>
>
>
> On Feb 7, 2008, at 10:11 AM, Ralph H Castain wrote:
>
>> I believe Brian and Terry raise good points. May I offer a possible
>> alternative? What if we only include in Open MPI an include file that
>> contains the "hooks" to libNBC, and have the build system only "see"
>> those if someone specifies --with-NBC (or whatever option name you
>> like). If you like, you can make the inclusion automatic if libNBC is
>> detected on the system. It would make sense to also add -libNBC to the
>> mpicc et al wrappers as well when the build system includes the
>> function definitions.
>>
>> This would allow those users that want (or can) to use that library
>> link against it, without adding a bunch of source code to our release.
>> I suspect there are complications that will have to be dealt with, but
>> offer it as something to consider.
>>
>>
>> Also, remember that there is an added burden when we add source code
>> to Open MPI that we haven't discussed - we are now adding coordination
>> issues to our own release cycle. If libNBC changes, are we now going
>> to be pressed to issue another OMPI release so that the new NBC
>> version is included? Do we now need to coordinate our releases with
>> theirs so that things align?
>>
>> And if we have an increasing number of such "included" packages, how
>> complex is -that- release discussion going to get?!?
>>
>>
>> On 2/7/08 4:48 AM, "Terry Dontje" <terry.don...@sun.com> wrote:
>>
>>> Torsten Hoefler wrote:
>>>> Hi Brian,
>>>>
>>>>> Let me start by reminding everyone that I have no vote, so this
>>>>> should probably be sent to /dev/null.
>>>>>
>>>> thanks for your comment and this will not go to /dev/null!
>>>>
>>>>
>>>>> I think Ralph raised some good points. I'd like to raise another.
>>>>>
>>>> yes [will reply to this in a separate thread]
>>>>
>>>>
>>>>> Does it make sense to bring LibNBC into the release at this point,
>>>>> given the current standardization process of non-blocking
>>>>> collectives?
>>>>>
>>>>> My feeling is no, based on the long term support costs. We had this
>>>>> problem with a function in LAM/MPI -- MPIL_SPAWN, I believe it was
>>>>> -- that was almost but not quite MPI_COMM_SPAWN. It was added to
>>>>> allow spawn before the standard was finished for dynamics. The
>>>>> problem is, it wasn't quite MPI_COMM_SPAWN, so we were now stuck
>>>>> with yet another function to support (in a touchy piece of code)
>>>>> for infinity and beyond.
>>>>>
>>>>> I worry that we'll have the same with LibNBC -- a piece of code
>>>>> that solves an immediate problem (no non-blocking collectives in
>>>>> MPI) but will become a long-term support anchor.
>>>>> Since this is something we'll be encouraging users to write code
>>>>> to, it's not like support for mvapi, where we can just deprecate it
>>>>> and users won't really notice. It's one thing to tell them to
>>>>> update their cluster software stack -- it's another to tell them to
>>>>> rewrite their applications.
>>>>>
>>>> I think this is a very good and valid point. However, I would like to
>>>> deprecate the NBC_* things as soon as non-blocking collectives are a
>>>> part of the standard. Of course, this would probably need two minor
>>>> versions to "clean" the code-base, but this is (will be) our normal
>>>> procedure (just what happened to MVAPI).
>>>>
>>>>
>>> Though it doesn't seem to me that NBC is a slam dunk to get into the
>>> MPI spec and I could imagine it changing significantly due to someone
>>> else's opinion/needs.
>>>> And rewriting the user's application will not be that hard, it'll
>>>> mainly be vim:%s/NBC_/MPI_/g. Even if we change the interface (e.g.
>>>> add tags or decide to use the more limited split collective
>>>> approach), this task is rather easy and can be automated easily.
>>>> It's not a functionality change, just an interface.
>>>>
>>>>
>>> Though if NBC is built by default for release builds I think that
>>> raises the bar saying that we OMPI believe this should be used by all
>>> of our users without any concerns that the API may change or it might
>>> have significant issues.
>>>
>>> On a similar track do you have any tests that validate the
>>> functionality/correctness of NBC that can be run as a part of the MTT
>>> nightly tests?
>>>
>>> My opinion is I have no problem with NBC being merged in just that I
>>> don't think it should be built by default.
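
The NBC_ to MPI_ rename that Torsten describes above as a one-line vim substitution could be scripted roughly as follows. This is only a sketch under the assumption that the eventual standard names differ from the LibNBC names by prefix alone, which the thread itself treats as uncertain. The helper name rename_nbc_to_mpi is hypothetical, and the call in the usage note is sample text, not a statement about either API.

```python
import re

# Match NBC_ only at the start of an identifier, so that a name such as
# MY_NBC_FLAG is left alone (the plain vim substitution would rewrite it).
_NBC_PREFIX = re.compile(r"\bNBC_")


def rename_nbc_to_mpi(source: str) -> str:
    """Return source text with every leading NBC_ prefix replaced by MPI_."""
    return _NBC_PREFIX.sub("MPI_", source)
```

Applied to the sample text "NBC_Ibcast(buf, n, MPI_INT, 0, comm, &h);" this returns the same call spelled with the MPI_ prefix. Any change to argument lists, such as the added tags mentioned above, would need more than a prefix rewrite.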
>>>
>>> --td
>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel