[OMPI devel] rhc-step2b compile failures
It looks like the compile failures from last night (http://www.open-mpi.org/mtt/index.php?do_redir=531) were caused by a missing header file in the openib btl that came in with Ralph's merge yesterday. I fixed it with r17453 on the trunk. -- Jeff Squyres Cisco Systems
Re: [OMPI devel] Scheduled merge of ORTE devel branch to trunk
Update on this plan... I am going to delay merging the branch over to the trunk for a while, perhaps several weeks, for several reasons:

1. The trunk is somewhat unstable at the moment. Things are going in and being backed out, errors in recent commits are being found and corrected, etc., at a pace I simply cannot track during a merge.
2. I am still waiting for Galen's commit of the CNL code.
3. We are about to hit a long weekend, and I need to get moving again on development.

I had no major conflicts when sync'ing the branch with the trunk, so I don't think the eventual merge will be a problem. For now, though, perhaps it is better to let the trunk settle down a little before introducing another major perturbation. We can revisit this during Tuesday's telecon and see where we go from here.

Ralph

On 2/12/08 7:16 PM, "Ralph Castain" wrote:
> Well, best laid plans of mice and men, as they say.
>
> We were just having -way- too much fun here at IBM today going over the new ORTE design, planning for future scalability changes, etc., so American decided to cancel my flight back home! So thoughtful!
>
> I will be spending my Wed (hopefully!) en route back home. It looks like it will be Thurs before I can do the merge. My apologies to all - but I would really rather not try to do it from a notebook computer in an airline terminal!
>
> Ralph
>
> On 2/12/08 9:54 AM, "Jeff Squyres" wrote:
>> Ralph --
>>
>> We talked about this on the OMPI con call today and everyone agrees that this seems to be a good plan. Just as a safety net: if the merge goes disastrously wrong and you're unavailable Thu/Fri this week, we can just back it out and try again later.
>>
>> Thanks!
>>
>> On Feb 11, 2008, at 11:37 PM, Ralph Castain wrote:
>>> Hello all
>>>
>>> Per last week's telecon, we planned the merge of the latest ORTE devel branch to the OMPI trunk for after Sun had committed its C++ changes. That happened over the weekend.
>>>
>>> Therefore, based on the requests at the telecon, I will be merging the current ORTE devel branch to the trunk on Wed 2/13. I'll make the commit around 4:30pm Eastern time - will send out a warning shortly before the commit to let you know it is coming. I'll advise of any delays.
>>>
>>> This will be a snapshot of that devel branch - it will include the upgraded launch system, remove the GPR, add the new tool communication library, allow arbitrary mpiruns to interconnect, support the revamped hostfile and dash-host behaviors per the wiki, etc.
>>>
>>> However, it is incomplete and contains some known flaws. For example, totalview support has not been enabled yet. Comm_spawn, which is currently broken on the OMPI trunk, is fixed - but singleton comm_spawn remains broken. I am in the process of establishing support for direct and standalone launch capabilities, but those won't be in the merge. I have updated all of the launchers, but can only certify the SLURM, TM, and RSH ones to work - the Xgrid launcher is known to not compile, so if you have Xgrid on your Mac, you need to tell the build system to not build that component.
>>>
>>> This will give you a chance to look over the new arch, though, and I understand that people would like to begin having a chance to test and review the revised code. Hopefully, you will find most of the bugs to be minor.
>>>
>>> Please advise of any concerns about this merge. The schedule is totally driven by the requests of the MPI team members (delaying the merge has no impact on ORTE development), so requests to shift the schedule should be discussed amongst the community.
>>>
>>> Thanks
>>> Ralph
>>>
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] [RFC] Non-blocking collectives (LibNBC) merge to trunk
So I don't think that we ever concluded this discussion/RFC. I am in favor of bringing in libnbc, given the qualifications below. Others?

On Feb 8, 2008, at 12:16 PM, Jeff Squyres wrote: Terry -- I reluctantly agree. :-) What I envision is not difficult (a first-cut/feature-lean version is probably only several hundred lines of perl?), but I don't have the cycles (at present) to implement it -- my priorities are elsewhere at the moment. If anyone is interested in this, I would gladly talk them through what [I think] needs to be done. That being said, for NBC, per Terry's points: - if it's not compiled/installed by default - if we can make a big enough red flag for users that it's an R&D effort that is subject to change (perhaps 3'x5'?) Then I think it would not be a bad thing to include NBC. But then I think we need to disallow any other contrib/ projects until someone can find the cycles to implement a better solution (such as an ompi_contrib executable/system).

On Feb 7, 2008, at 1:18 PM, Terry Dontje wrote: Jeff, the below sounds good if we really believe there is going to be a whole bunch of addons. I am not sure NBC really constitutes an addon so much as some research work that might become an official API. So I look at the NBC stuff more like a BTL or PM that is in the process of being developed/refined for prime time. So would a new PM or BTL be added via ompi_contrib? I wouldn't think they would. The ompi_contrib sounds like a nice utility, but I have a feeling there are bigger fish to fry unless we really believe there will be a lot of addons that we will need to support. --td

Jeff Squyres wrote: All these comments are good. I confess that although I should have, I really did not previously consider the complexity of adding in N contrib packages to OMPI. The goal of the contrib packages is to easily allow additional functionality that is nicely integrated with Open MPI.
An obvious way to do this is to include the code in the Open MPI tarball, but that leads to the logistics and other issues that have been identified. Ralph proposes a good way around this. But what about going farther than that: what if we offer a standardized set of hooks for including contrib functionality *after* core OMPI has been installed? Yes, it's one more step after OMPI has been installed -- but if we can keep it as *one* step, perhaps the user onus is not that bad. Let me explain.

Consider a new standalone executable: ompi_contrib. You would run ompi_contrib to install and uninstall contrib functionality into your existing OMPI:

  ompi_contrib --install http://www.example.com/nbc/nbc-ompi-contrib.tar.gz

or

  ompi_contrib --install file:///home/htor/nbc-ompi-contrib.tar.gz

This will download NBC (if http), build it, and install it into the current OMPI. It is likely that the nbc-ompi-contrib.tar.gz file will contain the real NBC tarball (or maybe just a reference to it?) plus a small number of hook/glue scripts for OMPI integration (perhaps quite similar to what is in the contrib/ tree [on the branch] today for NBC?). Likewise, after NBC is installed into the local OMPI installation, ompi_info should be able to show "nbc" as installed contrib functionality. It then follows that we might be able to do:

  ompi_contrib --uninstall nbc

to uninstall contrib NBC from the local OMPI installation. This kind of approach would seem to have several benefits:

- Keep a clear[er] distinction between core OMPI and contributed packages.
- Allow simple integration of MPI libraries, tools, and even applications (!) (think: numerical libraries, boost C++ libraries, etc. -- how many of your users install additional tools on top of MPI incorrectly?).
- Allow 3rd parties to have "contrib" code for Open MPI without needing to get into our code tree (and sign the 3rd party agreements, etc.), keeping our distribution size down, avoiding release-schedule logistical issues, keeping our "core" build time down, etc.
- Allow integration of contrib functionality on both a per-user and a system-wide basis.

What I'm really proposing here is that OMPI becomes a system that can have additional functionality installed / uninstalled. Based on the infrastructure that we already have, this is not as much of a stretch as one would think. Comments? ("who's going to write this" is a question that will also have to be answered, but perhaps we can discuss the code concept/idea first...)

On Feb 7, 2008, at 10:11 AM, Ralph H Castain wrote: I believe Brian and Terry raise good points. May I offer a possible alternative? What if we only include in Open MPI an include file that contains the "hooks" to libNBC, and have the build system only "see" those if someone specifies --with-NBC (or whatever option name you like). If you like, you can make the inclusion automatic if libNBC is detected on the system. It would make sense to also add -libNBC to the mpicc et al wrappers as well when t
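Ralph's include-file alternative could look roughly like the following sketch. Everything here is hypothetical: the header name, the configure-generated macro, and the prototype are illustrative assumptions, not Open MPI's or libNBC's actual code.

```c
/* ompi_nbc_hooks.h -- hypothetical sketch of the "hooks" header Ralph
 * describes.  The build system would only expose this header (and define
 * OMPI_WANT_LIBNBC) when the user configures with --with-NBC; without
 * that option, core OMPI never references libNBC at all. */
#ifndef OMPI_NBC_HOOKS_H
#define OMPI_NBC_HOOKS_H

#ifdef OMPI_WANT_LIBNBC
/* Declarations forwarded to the external libNBC.  The NBC_ prefix follows
 * the naming convention mentioned in this thread, but this prototype is
 * an illustrative assumption -- consult libNBC itself for its real API. */
int NBC_Init_hook(void);
#endif /* OMPI_WANT_LIBNBC */

#endif /* OMPI_NBC_HOOKS_H */
```

Under this arrangement the mpicc wrapper would likewise append the libNBC link flag only when the option was enabled, keeping the default build free of any libNBC dependency.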
Re: [OMPI devel] [RFC] Non-blocking collectives (LibNBC) merge to trunk
I am in favor of bringing this in. - Galen

On Feb 14, 2008, at 1:15 PM, Jeff Squyres wrote: So I don't think that we ever concluded this discussion/RFC. I am in favor of bringing in libnbc, given the qualifications below. Others?
Re: [OMPI devel] [RFC] Non-blocking collectives (LibNBC) merge to trunk
I am NOT in favor of bringing LibNBC into the trunk. Some of my concerns were already stated by Brian and Ralph, and the answers didn't clearly address all my reservations. I don't want to bring the MPI 3 Forum discussion onto this mailing list, but I think that as long as there is not even the smallest beginning of a consensus in the MPI Forum, we should keep LibNBC outside the main distribution.

Second: who really needs this? Please ask them to request it over the public mailing lists, and to share with us their needs and concerns. This will demonstrate the need for such a feature not only for Open MPI but for MPI 3 as a whole. I have the chance to work surrounded by math people. Not just math users, but the ones who design, maintain, analyze, improve, and work daily on some of the most used mathematical libraries out there. And they never had any need for non-blocking collectives. Moreover, they state that most well-designed algorithms have very regular patterns that fit well with the blocking way the collectives are designed today. Additionally, in the few cases where they might use non-blocking approaches, based on the current trend toward multi-core, what the MPI standard allows today is enough. On the practical side, I doubt that we want to validate, maintain, and distribute something that will be useful to only a very limited number of people.

Third: I wonder how the life cycle of this addition will be different from the libnbc that we already have in mca/coll. I guess IU is maintaining the current coll/libnbc. However, is there anybody testing it on a regular basis? MTT doesn't contain anything related to libnbc. Is there anybody using it?

Fourth: it was claimed that the integration of LibNBC will not require any modification of the Open MPI source, and that NBC_ will be the prefix of these functions. Then it makes perfect sense to distribute them as a separate library, doesn't it?

Thanks, george.
[OMPI devel] memchecker and weak symbols
It turns out that memchecker breaks our usage of weak symbols. The problem is that the definition of the weak symbol must always appear before the first use of the function. There are two MPI functions used in the memchecker.h file: MPI_Type_get_contents and MPI_Type_get_envelope. The memchecker.h header gets included before we get a chance to define the #pragma weak, and the symbols are incorrectly marked in the resulting object file. Additionally, I remember that we decided not to use any MPI-level functions inside the Open MPI library. I guess the correct way of doing this is to call the function provided by the datatype engine, ompi_ddt_get_args, directly: once with "which" set to zero (to retrieve the values, i.e. similar to MPI_Type_get_envelope) and once with "which" set to one (to retrieve the contents, i.e. similar to MPI_Type_get_contents). Moreover, there is a better way to implement the memchecker_call function by taking advantage of the datatype engine. It will make memchecker really dependent on Open MPI ... but I guess not more than it is right now :) Ping me if you are interested in exploring this option. Thanks, george.