[OMPI devel] 3rd party code contributions
I thought maybe we should move this to another thread as it really isn't about Torsten's specific RFC.

I just took a quick gander at the code base to see how extensive this problem might really be per Terry's concern. What I found was that we have added 3rd party code in several places. How we want to define them in terms of this issue is probably something for discussion.

Packages I could readily identify include:

1. event library
2. ROMIO
3. VT
4. backtrace
5. PLPA - this one is a little less obvious, but still being released as a separate package
6. libNBC

There may well be others - these are only the ones I know about. By 3rd party package, I mean these are blocks of code obtained as a complete, distinct version and "dropped in" to the OMPI code repository, and then to some degree tied into our build system. They are not code specifically developed for OMPI by OMPI developers.

We have already discussed the issues with this approach. I am particularly concerned with the maintenance and release cycle issues right now.

If these packages could be linked to our code instead of embedded within it, then it seems to me that updating them could become much easier. For example, we could download and install the latest ROMIO + Panasas patch, compile it, and simply link it into libompi - without occupying someone with constantly fixing the build system issues, etc.

Obviously, I don't claim to know enough about what was done to integrate ROMIO to know if this would easily work. I only use it to illustrate the point - the same could be said about the event library, for example.

Given our maintenance support problems, it would seem to me that changing the way we do 3rd party packaging may be worth consideration and some effort. I can't prioritize that relative to 1.3, though I do note that, from LANL's perspective, the ROMIO issue is a definite blocker for 1.3 release.
Ralph
Re: [OMPI devel] ROMIO
I know that Argonne was engaged at some level to help with the OMPI ROMIO integration -- was it on a formal or informal basis?

On Feb 7, 2008, at 12:02 PM, Ralph H Castain wrote:

> I just -know- this is everyone's favorite subject, but...
>
> Brian used to take care of the ROMIO code in Open MPI, but he has now moved on to greener, happier pastures. As he left, he did raise the question of who was going to maintain ROMIO, which we all happily dodged.
>
> I raise this question again because I have been informed that a new ROMIO patch may have come out that is required for Panasas support. I don't know enough myself to verify this situation, but it did raise the flag about who is going to track and support this area of our code, especially since we now may have to do something in that area for 1.3.
>
> Any great thoughts?
> Ralph
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] 3rd party code contributions
On Feb 8, 2008, at 10:38 AM, Ralph Castain wrote:

> I thought maybe we should move this to another thread as it really isn't about Torsten's specific RFC.
>
> I just took a quick gander at the code base to see how extensive this problem might really be per Terry's concern. What I found was that we have added 3rd party code in several places. How we want to define them in terms of this issue is probably something for discussion.
>
> Packages I could readily identify include:
>
> 1. event library
> 4. backtrace
> 5. PLPA - this one is a little less obvious, but still being released as a separate package

FWIW, these packages are part of "core" OMPI and are not especially problematic. We upgrade them when we have a need or desire to (which has been low frequency); we don't try to stay in sync with their release schedules at all.

> 2. ROMIO

ROMIO has traditionally been a problem (keeping up with its releases and patches). We have long-since agreed that we definitely want to include ROMIO in our tarball, even though that presents challenges. One thing that makes it *slightly* easier is that Brian added the mechanics for OMPI to use a ROMIO that is outside of Open MPI rather than the one that is bundled with it. It's not a perfect solution, but it does help some.

> 3. VT
> 6. libNBC

These two are definitely in the "contrib" category.

> There may well be others - these are only the ones I know about. By 3rd party package, I mean these are blocks of code obtained as a complete, distinct version and "dropped in" to the OMPI code repository, and then to some degree tied into our build system. They are not code specifically developed for OMPI by OMPI developers.

Those are all that I'm aware of.

> We have already discussed the issues with this approach. I am particularly concerned with the maintenance and release cycle issues right now.
>
> If these packages could be linked to our code instead of embedded within it, then it seems to me that updating them could become much easier. For example, we could download and install the latest ROMIO + Panasas patch, compile it, and simply link it into libompi - without occupying someone with constantly fixing the build system issues, etc.

FWIW:

- event, backtrace, PLPA, ROMIO are included in OMPI because we wanted to certify them as part of "core" OMPI. That is, we wanted to certify the whole system (vs. relying on [untested] combinations of versions that already exist on users' systems).

- ROMIO is likely the only one of that group that presents ongoing logistics problems. The mechanism Brian added was seen as a workaround. Argonne will definitely need to be involved at some level to improve the ROMIO integration. Some talks started between Brian, me, and Rob (ANL) about a) making our integration better/easier, and b) having access to the ROMIO SVN to be able to suck down releases when we want to, but they kinda tapered off (Brian left and I got other priorities). There was also talk of LANL maintaining its own ROMIO tree and pushing it into OMPI, but I don't know what happened there.

  I can help with part of the ROMIO make-the-integration-easier (not in the immediate future, though -- probably not for a few weeks), but I do not think that I can do it on an ongoing basis.

  Note, too, that ROMIO is no longer distributed as a separate package -- it's only included in MPICH2. So it's a little harder to just link against a ROMIO that is already installed on a system -- there won't be one that isn't already bundled with an MPI.

- vt and libnbc are a different category; they are add-on functionality, not "core" OMPI.

> Obviously, I don't claim to know enough about what was done to integrate ROMIO to know if this would easily work. I only use it to illustrate the point - the same could be said about the event library, for example.
>
> Given our maintenance support problems, it would seem to me that changing the way we do 3rd party packaging may be worth consideration and some effort. I can't prioritize that relative to 1.3, though I do note that, from LANL's perspective, the ROMIO issue is a definite blocker for 1.3 release.

Hmm. This is odd because of the prior statements about ROMIO from LANL (that LANL was going to maintain ROMIO and push it into OMPI). I'm assuming that's changed? If ROMIO is a v1.3 blocker for LANL, can LANL commit resources to fixing the problem?

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] 3rd party code contributions
On Fri, 8 Feb 2008, Ralph Castain wrote:

> 1. event library
> 2. ROMIO
> 3. VT
> 4. backtrace
> 5. PLPA - this one is a little less obvious, but still being released as a separate package
> 6. libNBC

Sorry to Ralph, but I clipped everything from his e-mail, and then am going to make references to it. Oh well :).

One minor correction -- the entire backtrace framework is not a third party deal. The *DARWIN/Mac OS X* component relies heavily on third party code, but the others (Linux and Solaris) are just wrappers around code in their respective C libraries.

I believe I was responsible for the event library, ROMIO, and backtrace before leaving LANL. I'll go through the motivations and issues with all three in terms of integration.

Event Library: The event library is the core "rendezvous" point for all of Open MPI, so any issues with it cause lots of issues with Open MPI in general. We've also hacked it considerably since taking the original libevent source -- we've renamed all the functions, we've made it thread safe in a way the author was unwilling to do, and we've fixed some performance issues unique to our usage model. In short, this is no longer really the same libevent that might already be installed on the system. Using such an unmodified libevent would be disastrous.

ROMIO is actually one about which there was significant discussion prior to my leaving Los Alamos. There are a number of problems / issues with ROMIO. First and foremost, without ROMIO, we are not a fully compliant MPI implementation. So we have to ship ROMIO -- it's the only way to have that important check mark. But its current integration has some issues -- it's hard to test patches independently. There is actually a mode in the current Open MPI tree where the MPI interface to MPI-I/O is not provided by Open MPI and no io components are built. This is to allow users to build ROMIO independently of Open MPI, for testing updates or whatever. There are some disadvantages to this. First, the independent ROMIO will use generalized requests instead of being hooked into our progress engine, so there may be some progress issues (I never verified either way). Second, it does mean dealing with another package to build on the user's site.

Jeff is correct -- there was discussion about how to make the integration "better" -- many of the changes were on our side, and we were going to have to ask for a couple of changes from Argonne. If someone is going to put in the considerable amount of time to make this happen, I'm happy to write up whatever notes I can remember / find on the issue.

The Darwin backtrace component is mostly maintenance free. It doesn't support 64-bit Intel chips, but that's fine. Once every 18 months or so, I need to get a new copy for the latest operating system, although the truth is I don't think anything bad happens if we just stop doing the updates at OS release (by the way, I did the one for Leopard, so we're probably all going to be sick of MPI and on to other things before the next time it has to be done). While it's useful, if the community is really worried, it could probably be deleted. But having a stack trace when you segfault sure is nice :).

Brian
[OMPI devel] Datasize confusion in MPI_Write can lead to data loss!
Hello!

I tested Open MPI at HLRS for some time without detecting new problems in the implementation, but now I have found some awful ones with MPI_File_write which can lead to data loss.

When creating a struct for a mixed datatype like

struct {
    short a;
    int b;
}

the C compiler introduces a gap of 2 bytes in the data representation for this type, due to the 4-byte alignment of the integer on 32-bit systems.

If I now try to use MPI_File_write to write this data to a file and use MPI_SHORT_INT as the MPI datatype, this leads to data loss.

I located the problem at the combined use of "write" and MPI_Type_size in MPI_File_write. MPI_Type_size(MPI_SHORT_INT) returns 6 bytes, whereas the struct uses 8 bytes in memory because of the 2-byte gap. The write function in ad_write.c then loses data because the gaps are not included in the calculation of the complete data size to be written into the file.

This problem also occurs in the other I/O functions. As far as I could find out, the problem does not seem to be present with derived datatypes.

The question is now how to "fix" it:

i) Either the MPI standard is not clear on this point, and the datatypes MPI_SHORT_INT, MPI_DOUBLE_INT, ... should be forbidden for use with structs of these types,
ii) or the implementation of the MPI_Type_size function has to be modified to return the value of e.g. true_ub, which contains the correct value,
iii) or the MPI_File_write function must not use the write function in the "contiguous" way on the data and should take care of the gaps.

Regards

Christoph Niethammer
Re: [OMPI devel] [RFC] Non-blocking collectives (LibNBC) merge to trunk
Terry -- I reluctantly agree. :-)

What I envision is not difficult (a first cut/feature-lean version is probably only several hundred lines of perl?), but I don't have the cycles (at present) to implement it -- my priorities are elsewhere at the moment. If anyone is interested in this, I would gladly talk them through what [I think] needs to be done.

That being said, for NBC, per Terry's points:

- if it's not compiled/installed by default
- if we can make a big enough red flag for users that it's an R&D effort that is subject to change (perhaps 3'x5'?)

Then I think it would not be a bad thing to include NBC. But then I think we need to disallow any other contrib/ projects until someone can find the cycles to implement a better solution (such as an ompi_contrib executable/system).

On Feb 7, 2008, at 1:18 PM, Terry Dontje wrote:

> Jeff, the below sounds good if we really believe there is going to be a whole bunch of addons. I am not sure NBC really constitutes an addon so much as some research work that might become an official API. So I look at the NBC stuff more like a BTL or PM that is in progress of being developed/refined for prime time. So would a new PM or BTL be added via ompi_contrib? I wouldn't think they would.
>
> The ompi_contrib sounds like a nice utility but I have a feeling there are bigger fish to fry unless we really believe there will be a lot of addons that we will need to support.
>
> --td
>
> Jeff Squyres wrote:
>> All these comments are good. I confess that although I should have, I really did not previously consider the complexity of adding in N contrib packages to OMPI.
>>
>> The goal of the contrib packages is to easily allow additional functionality that is nicely integrated with Open MPI. An obvious way to do this is to include the code in the Open MPI tarball, but that leads to the logistics and other issues that have been identified.
>>
>> Ralph proposes a good way around this. But what about going farther than that: what if we offer a standardized set of hooks for including contrib functionality *after* core OMPI has been installed? Yes, it's one more step after OMPI has been installed -- but if we can keep it as *one* step, perhaps the user onus is not that bad. Let me explain.
>>
>> Consider a new standalone executable: ompi_contrib. You would run ompi_contrib to install and uninstall contrib functionality into your existing OMPI:
>>
>>    ompi_contrib --install http://www.example.com/nbc/nbc-ompi-contrib.tar.gz
>> or ompi_contrib --install file:///home/htor/nbc-ompi-contrib.tar.gz
>>
>> This will download NBC (if http), build it, and install it into the current OMPI. It is likely that the nbc-ompi-contrib.tar.gz file will contain the real NBC tarball (or maybe just a reference to it?) plus a small number of hook/glue scripts for OMPI integration (perhaps quite similar to what is in the contrib/ tree [on the branch] today for NBC?). Likewise, after NBC is installed into the local OMPI installation, ompi_info should be able to show "nbc" as installed contrib functionality. It then follows that we might be able to do:
>>
>>    ompi_contrib --uninstall nbc
>>
>> to uninstall contrib NBC from the local OMPI installation.
>>
>> This kind of approach would seem to have several benefits:
>>
>> - Keep a clear[er] distinction between core OMPI and contributed packages.
>>
>> - Allow simple integration of MPI libraries, tools, and even applications (!) (think: numerical libraries, boost C++ libraries, etc. -- how many of your users install additional tools on top of MPI incorrectly?). Anything
>>
>> - Allow 3rd parties to have "contrib" code to Open MPI without needing to get into our code tree (and sign the 3rd party agreements, etc.), keeping our distribution size down, avoiding release schedule logistical issues, keeping our "core" build time down, etc.
>>
>> - Allow integration of contrib functionality at both a per-user and system-wide basis.
>>
>> What I'm really proposing here is that OMPI becomes a system that can have additional functionality installed / uninstalled. Based on the infrastructure that we already have, this is not as much of a stretch as one would think.
>>
>> Comments?
>>
>> ("who's going to write this" is a question that will also have to be answered, but perhaps we can discuss the code concept/idea first...)
>>
>> On Feb 7, 2008, at 10:11 AM, Ralph H Castain wrote:
>>> I believe Brian and Terry raise good points. May I offer a possible alternative? What if we only include in Open MPI an include file that contains the "hooks" to libNBC, and have the build system only "see" those if someone specifies --with-NBC (or whatever option name you like). If you like, you can make the inclusion automatic if libNBC is detected on the system.
>>>
>>> It would make sense to also add -libNBC to the mpicc et al wrappers as well when the build system includes the function definitions.
>>>
>>> This would allow those users that want (or can) to use that library link against it, without adding a bunch of so
Re: [OMPI devel] Datasize confusion in MPI_Write can lead to data loss!
MPI_Type_size is supposed to return only the size of useful data, which apparently it does (MPI_SHORT_INT is 6 bytes). What I think happens is that the MPI_SHORT_INT type is a predefined one, but it's a really strange predefined type: it's one of the few that are not contiguous. The problem seems to come from the fact that MPI_File_write does a contiguous write for the predefined data types, making the assumption that they are all contiguous.

I tracked the problem down to the romio/adio/common/iscontig.c file. For Open MPI the last #else branch is used. The first case in the switch checks for MPI_COMBINER_NAMED (which is what an MPI implementation is supposed to return for predefined data types) and sets the flag to 1 (which means contiguous). This is obviously wrong for MPI_SHORT_INT. It really looks like a ROMIO problem, so I guess this email should be redirected to their mailing list.

Thanks,
george.

On Feb 8, 2008, at 12:50 PM, Christoph Niethammer wrote:

> Hello!
>
> I tested Open MPI at HLRS for some time without detecting new problems in the implementation, but now I have found some awful ones with MPI_File_write which can lead to data loss.
>
> When creating a struct for a mixed datatype like
>
> struct {
>     short a;
>     int b;
> }
>
> the C compiler introduces a gap of 2 bytes in the data representation for this type, due to the 4-byte alignment of the integer on 32-bit systems.
>
> If I now try to use MPI_File_write to write this data to a file and use MPI_SHORT_INT as the MPI datatype, this leads to data loss.
>
> I located the problem at the combined use of "write" and MPI_Type_size in MPI_File_write. MPI_Type_size(MPI_SHORT_INT) returns 6 bytes, whereas the struct uses 8 bytes in memory because of the 2-byte gap. The write function in ad_write.c then loses data because the gaps are not included in the calculation of the complete data size to be written into the file.
>
> This problem also occurs in the other I/O functions. As far as I could find out, the problem does not seem to be present with derived datatypes.
>
> The question is now how to "fix" it:
>
> i) Either the MPI standard is not clear on this point, and the datatypes MPI_SHORT_INT, MPI_DOUBLE_INT, ... should be forbidden for use with structs of these types,
> ii) or the implementation of the MPI_Type_size function has to be modified to return the value of e.g. true_ub, which contains the correct value,
> iii) or the MPI_File_write function must not use the write function in the "contiguous" way on the data and should take care of the gaps.
>
> Regards
>
> Christoph Niethammer
Re: [OMPI devel] Datasize confusion in MPI_Write can lead to data loss!
Hi George,

Good, if you come to the same conclusion with regard to ROMIO using MPI_Type_size internally...

So either take this comment from iscontig.c ,-]

/* This function needs more work. It should check for contiguity in other cases as well. */

and mail it to the romio list, or have a specialized version of ADIOI_Datatype_iscontig for ompi ,-]

Either way, the mpi_test_suite in that regard is sane.

Thanks,
Rainer

On Friday 08 February 2008 18:22, George Bosilca wrote:
> MPI_Type_size is supposed to return only the size of useful data, which apparently it does (MPI_SHORT_INT is 6 bytes). What I think happens is that the MPI_SHORT_INT type is a predefined one, but it's a really strange predefined type: it's one of the few that are not contiguous. The problem seems to come from the fact that MPI_File_write does a contiguous write for the predefined data types, making the assumption that they are all contiguous.
>
> I tracked the problem down to the romio/adio/common/iscontig.c file. For Open MPI the last #else branch is used. The first case in the switch checks for MPI_COMBINER_NAMED (which is what an MPI implementation is supposed to return for predefined data types) and sets the flag to 1 (which means contiguous). This is obviously wrong for MPI_SHORT_INT. It really looks like a ROMIO problem, so I guess this email should be redirected to their mailing list.
>
> Thanks,
> george.
>
> On Feb 8, 2008, at 12:50 PM, Christoph Niethammer wrote:
> > Hello!
> >
> > I tested Open MPI at HLRS for some time without detecting new problems in the implementation, but now I have found some awful ones with MPI_File_write which can lead to data loss.
> >
> > When creating a struct for a mixed datatype like
> >
> > struct {
> >     short a;
> >     int b;
> > }
> >
> > the C compiler introduces a gap of 2 bytes in the data representation for this type, due to the 4-byte alignment of the integer on 32-bit systems.
> >
> > If I now try to use MPI_File_write to write this data to a file and use MPI_SHORT_INT as the MPI datatype, this leads to data loss.
> >
> > I located the problem at the combined use of "write" and MPI_Type_size in MPI_File_write. MPI_Type_size(MPI_SHORT_INT) returns 6 bytes, whereas the struct uses 8 bytes in memory because of the 2-byte gap. The write function in ad_write.c then loses data because the gaps are not included in the calculation of the complete data size to be written into the file.
> >
> > This problem also occurs in the other I/O functions. As far as I could find out, the problem does not seem to be present with derived datatypes.
> >
> > The question is now how to "fix" it:
> >
> > i) Either the MPI standard is not clear on this point, and the datatypes MPI_SHORT_INT, MPI_DOUBLE_INT, ... should be forbidden for use with structs of these types,
> > ii) or the implementation of the MPI_Type_size function has to be modified to return the value of e.g. true_ub, which contains the correct value,
> > iii) or the MPI_File_write function must not use the write function in the "contiguous" way on the data and should take care of the gaps.
> >
> > Regards
> >
> > Christoph Niethammer

--
Dipl.-Inf. Rainer Keller    http://www.hlrs.de/people/keller
HLRS                        Tel: ++49 (0)711-685 6 5858
Nobelstrasse 19             Fax: ++49 (0)711-685 6 5832
70550 Stuttgart             email: kel...@hlrs.de
Germany                     AIM/Skype: rusraink
Re: [OMPI devel] 3rd party code contributions
I'm going to "re-integrate" Jeff and Brian's comments into one response.

I have no problem with either of their observations. I only included the event library, backtrace, and PLPA in my list for completeness. I expected we would continue to treat those as we are, recognizing that this means -someone- is going to have to step up to support those when we need to update them. In the event library case, I know people have talked about a major change coming soon - a release that has significant improvement we may care about. Not sure when that might happen, or who is going to do that integration.

As to ROMIO: as with many of the community's "planned" contributions, they have tended to fade with time and personnel turnover. At this time, there is no way LANL could support a ROMIO integration without a significant delay to the proposed 1.3 release schedule. Not that such a delay particularly bothers me - I don't see a pressing need to just throw something out there, and I have been beaten severely around the neck-and-shoulders the last two days about how out of date our ROMIO version is, and that it lacks a critical Panasas patch that is severely impacting performance.

I'll continue to talk to people here about possibly getting help with ROMIO. I don't know the prospects, but it will take some time for someone to become familiar enough with our code base/build system to make a real contribution. Alternatively, -I- may have to take this on, which will definitely delay the 1.3 RTE work, effectively just transferring the "blocker" from one part of the code to another. ;-)

But we can deal with that on a separate thread. For now, I think Jeff's last response to the other thread is where we are converging: delay work on a 3rd party contribution system until we have more cycles, but don't bring more 3rd party code (post-libNBC) in until we have a better mechanism.
Ralph On 2/8/08 9:06 AM, "Jeff Squyres" wrote: > On Feb 8, 2008, at 10:38 AM, Ralph Castain wrote: > >> I thought maybe we should move this to another thread as it really >> isn't >> about Torsten's specific RFC. >> >> I just took a quick gander at the code base to see how extensive this >> problem might really be per Terry's concern. What I found was that >> we have >> added 3rd party code in several places. How we want to define them >> in terms >> of this issue is probably something for discussion. >> >> Packages I could readily identify include: >> >> 1. event library >> 4. backtrace >> 5. PLPA - this one is a little less obvious, but still being >> released as a >> separate package > > FWIW, these packages are part of "core" OMPI and are not especially > problematic. We upgrade them when we have a need or desire to (which > has been low frequency); we don't try to stay in sync with their > release schedules at all. > >> 2. ROMIO > > ROMIO has traditionally been a problem (keeping up with its releases > and patches). We have long-since agreed that we definitely want to > include ROMIO in our tarball, even though that presents challenges. > One thing that makes it *slightly* easier is that Brian added the > mechanics for OMPI to use a ROMIO that is outside of Open MPI rather > than the one that is bundled with it. It's not a perfect solution, > but it does help some. > >> 3. VT >> 6. libNBC > > These two are definitely in the "contrib" category. > >> There may well be others - these are only the ones I know about. By >> 3rd >> party package, I mean these are blocks of code obtained as a complete, >> distinct version and "dropped in" to the OMPI code repository, and >> then to >> some degree tied into our build system. They are not code specifically >> developed for OMPI by OMPI developers. > > Those are all that I'm aware of. > >> We have already discussed the issues with this approach. 
I am >> particularly >> concerned with the maintenance and release cycle issues right now. >> >> If these packages could be linked to our code instead of embedded >> within it, >> then it seems to me that updating them could become much easier. For >> example, we could download and install the latest ROMIO + Panasas >> patch, >> compile it, and simply link it into libompi - without occupying >> someone with >> constantly fixing the build system issues, etc. > > FWIW: > > - event,backtrace,PLPA,ROMIO are included in OMPI because we wanted to > certify them as part of "core" OMPI. That is, we wanted to certify > the whole system (vs. relying on [untested] combinations of versions > that already exist on users' systems). > > - ROMIO is likely the only one of that group that presents ongoing > logistics problems. The mechanism Brian added was seen as a > workaround. Argonne will definitely need to be involved at some level > to improve the ROMIO integration. Some talks started between Brian, > me, and Rob(ANL) about a) making our integration better/easier, and b) > having access to the ROMIO SVN to be able to suck down releases when > we want to,
[OMPI devel] PML V will be enabled again
Hi everyone,

All the problems detected the last time PML V was enabled in the trunk have been fixed. We invite you to give it a try (add a .ompi_unignore file in ompi/mca/pml/v) on your favorite platform and compilation options, and to report any issues you may encounter. If none are detected, we plan to remove the ignore tag on Wed., Feb. 6.

Thanks,
Aurelien

--
Dr. Aurélien Bouteiller
Sr. Research Associate - Innovative Computing Laboratory
Suite 350, 1122 Volunteer Boulevard
Knoxville, TN 37996
865 974 6321
Re: [OMPI devel] Datasize confusion in MPI_Write can lead to data loss!
Here is a sketch of a ROMIO patch for Open MPI. I just wrote it and have not had time to test it. If you can test it, please let me know whether it solves the problem.

Thanks,
george.

Index: iscontig.c
===
--- iscontig.c (revision 17399)
+++ iscontig.c (working copy)
@@ -58,6 +58,20 @@
     *flag = MPI_SGI_type_is_contig(datatype) && (displacement == 0);
 }
+#elif defined(OMPI_MPI_H)
+
+#include "ompi/datatype/datatype.h"
+
+void ADIOI_Datatype_iscontig(MPI_Datatype datatype, int *flag)
+{
+    /*
+     * The Open MPI contiguity check returns true for datatypes with
+     * gaps at the beginning and at the end. We have to provide
+     * a count of 2 in order to get these gaps taken into account.
+     */
+    *flag = ompi_ddt_is_contiguous_memory_layout(datatype, 2);
+}
+
 #else

On Feb 8, 2008, at 12:26 PM, Rainer Keller wrote:

Hi George,

Good that you come to the same conclusion with regard to ROMIO using MPI_Type_size internally... So, taking this comment from iscontig.c ,-]

/* This function needs more work. It should check for contiguity in other cases as well.*/

we can either mail the ROMIO list or have a specialized version of ADIOI_Datatype_iscontig for OMPI ,-]

Either way, the mpi_test_suite is sane in that regard.

Thanks,
Rainer

On Friday 08 February 2008 18:22, George Bosilca wrote:

MPI_Type_size is supposed to return only the size of useful data, which apparently it does (MPI_SHORT_INT is 6 bytes). What I think happens is that MPI_SHORT_INT is a predefined type, but a really strange one: it is one of the few that are not contiguous. The problem seems to come from the fact that MPI_File_write does a contiguous write for the predefined data types, on the assumption that they are all contiguous.

I tracked the problem down to the romio/adio/common/iscontig.c file. For Open MPI the last #else branch is used. The first case in the switch checks for MPI_COMBINER_NAMED (which is what an MPI implementation is supposed to return for predefined data types) and sets the flag to 1 (which means contiguous). This is obviously wrong for MPI_SHORT_INT.

It really looks like a ROMIO problem, so I guess this email should be redirected to their mailing list.

Thanks,
george.

On Feb 8, 2008, at 12:50 PM, Christoph Niethammer wrote:

Hello!

I tested Open MPI at HLRS for some time without detecting new problems in the implementation, but now I have found some awful ones with MPI_File_write that can lead to data loss:

When creating a struct for a mixed datatype like

struct { short a; int b; }

the C compiler introduces a gap of 2 bytes in the data representation of this type due to the 4-byte alignment of the integer on 32-bit systems. If I now try to use MPI_File_write to write this data to a file with MPI_SHORT_INT as the MPI datatype, data is lost.

I located the problem in the combined use of "write" and MPI_Type_size in MPI_File_write. MPI_Type_size(MPI_SHORT_INT) returns 6 bytes, whereas the struct uses 8 bytes in memory because of the 2-byte gap. The write function in ad_write.c then loses data because the gaps are not included in the calculation of the total data size to be written to the file. This problem also occurs in the other I/O functions. As far as I could find out, the problem does not seem to be present with derived data types.

The question now is how to "fix" it:

i) Either the MPI standard is not clear on this point, and the data types MPI_SHORT_INT, MPI_DOUBLE_INT, ... should be forbidden for use with structs of these types,
ii) or the implementation of the MPI_Type_size function has to be modified to return the value of e.g. true_ub, which contains the correct value,
iii) or the MPI_File_write function must not use the write function in the "contiguous" way on the data and should take care of the gaps.

Regards
Christoph Niethammer

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

--
Dipl.-Inf. Rainer Keller   http://www.hlrs.de/people/keller
HLRS                       Tel: ++49 (0)711-685 6 5858
Nobelstrasse 19            Fax: ++49 (0)711-685 6 5832
70550 Stuttgart            email: kel...@hlrs.de
Germany                    AIM/Skype:rusraink

smime.p7s Description: S/MIME cryptographic signature
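For readers following the thread, the size mismatch in the report can be reproduced without MPI at all. The following is only a minimal sketch in plain C: the struct mirrors the one in the report, while the helper names (payload_bytes, memory_bytes, gap_bytes) are hypothetical and exist only for this illustration. The actual numbers depend on the platform ABI; with a 2-byte short and a 4-byte-aligned int they would be the 6, 8, and 2 discussed above.

```c
#include <stddef.h>  /* offsetof, size_t */

/* The struct from the report: a short followed by an int. */
struct short_int {
    short a;
    int   b;
};

/* Bytes of useful data, i.e. the quantity the thread says
 * MPI_Type_size reports for MPI_SHORT_INT. */
size_t payload_bytes(void)
{
    return sizeof(short) + sizeof(int);
}

/* Bytes one struct element occupies in memory, padding included. */
size_t memory_bytes(void)
{
    return sizeof(struct short_int);
}

/* Padding the compiler inserted between members a and b. */
size_t gap_bytes(void)
{
    return offsetof(struct short_int, b) - sizeof(short);
}
```

A write that copies payload_bytes() from the start of the struct, as if the element were contiguous, stops short of the end of member b whenever gap_bytes() is non-zero. That is the data loss described in the report.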
Re: [OMPI devel] Datasize confusion in MPI_Write can lead to data loss!
The patch I sent a few minutes ago will only remove the problem for Open MPI. However, ROMIO's generic test for contiguous data types is still broken. Only checking for COMBINER_NAMED is clearly not enough. A second test checking that the size and the extent of the data type are equal will make the check a lot more accurate.

Thanks,
george.

On Feb 8, 2008, at 12:26 PM, Rainer Keller wrote:

Hi George,

Good that you come to the same conclusion with regard to ROMIO using MPI_Type_size internally... So, taking this comment from iscontig.c ,-]

/* This function needs more work. It should check for contiguity in other cases as well.*/

we can either mail the ROMIO list or have a specialized version of ADIOI_Datatype_iscontig for OMPI ,-]

Either way, the mpi_test_suite is sane in that regard.

Thanks,
Rainer

On Friday 08 February 2008 18:22, George Bosilca wrote:

MPI_Type_size is supposed to return only the size of useful data, which apparently it does (MPI_SHORT_INT is 6 bytes). What I think happens is that MPI_SHORT_INT is a predefined type, but a really strange one: it is one of the few that are not contiguous. The problem seems to come from the fact that MPI_File_write does a contiguous write for the predefined data types, on the assumption that they are all contiguous.

I tracked the problem down to the romio/adio/common/iscontig.c file. For Open MPI the last #else branch is used. The first case in the switch checks for MPI_COMBINER_NAMED (which is what an MPI implementation is supposed to return for predefined data types) and sets the flag to 1 (which means contiguous). This is obviously wrong for MPI_SHORT_INT.

It really looks like a ROMIO problem, so I guess this email should be redirected to their mailing list.

Thanks,
george.

On Feb 8, 2008, at 12:50 PM, Christoph Niethammer wrote:

Hello!

I tested Open MPI at HLRS for some time without detecting new problems in the implementation, but now I have found some awful ones with MPI_File_write that can lead to data loss:

When creating a struct for a mixed datatype like

struct { short a; int b; }

the C compiler introduces a gap of 2 bytes in the data representation of this type due to the 4-byte alignment of the integer on 32-bit systems. If I now try to use MPI_File_write to write this data to a file with MPI_SHORT_INT as the MPI datatype, data is lost.

I located the problem in the combined use of "write" and MPI_Type_size in MPI_File_write. MPI_Type_size(MPI_SHORT_INT) returns 6 bytes, whereas the struct uses 8 bytes in memory because of the 2-byte gap. The write function in ad_write.c then loses data because the gaps are not included in the calculation of the total data size to be written to the file. This problem also occurs in the other I/O functions. As far as I could find out, the problem does not seem to be present with derived data types.

The question now is how to "fix" it:

i) Either the MPI standard is not clear on this point, and the data types MPI_SHORT_INT, MPI_DOUBLE_INT, ... should be forbidden for use with structs of these types,
ii) or the implementation of the MPI_Type_size function has to be modified to return the value of e.g. true_ub, which contains the correct value,
iii) or the MPI_File_write function must not use the write function in the "contiguous" way on the data and should take care of the gaps.

Regards
Christoph Niethammer

--
Dipl.-Inf. Rainer Keller   http://www.hlrs.de/people/keller
HLRS                       Tel: ++49 (0)711-685 6 5858
Nobelstrasse 19            Fax: ++49 (0)711-685 6 5832
70550 Stuttgart            email: kel...@hlrs.de
Germany                    AIM/Skype:rusraink
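The stricter test George suggests (named combiner plus size equal to extent) can be sketched as follows. This is a toy model, not ROMIO code: struct type_info and both function names are hypothetical. In a real MPI program the fields would presumably be filled from MPI_Type_get_envelope, MPI_Type_size, and MPI_Type_get_extent. The sketch covers only the predefined-type case and ignores the other branches of the real switch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what an MPI library reports about a datatype. */
struct type_info {
    bool   is_named;  /* combiner is MPI_COMBINER_NAMED (predefined type) */
    size_t size;      /* bytes of useful data */
    size_t extent;    /* span in memory from lower bound to upper bound */
};

/* The check as described in the thread: every named type is
 * assumed to be contiguous. */
bool is_contig_named_only(const struct type_info *t)
{
    return t->is_named;
}

/* The stricter check: a named type only counts as contiguous
 * when its size equals its extent, i.e. it contains no gaps. */
bool is_contig_size_extent(const struct type_info *t)
{
    return t->is_named && t->size == t->extent;
}
```

With the values from the report (size 6, and an extent of 8 assumed from the 8-byte struct layout), the first check accepts MPI_SHORT_INT as contiguous and the second rejects it. As the message says, equal size and extent makes the check more accurate; it is not claimed to be a complete contiguity test.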
[OMPI devel] request help debugging openib btl problem
I'm using Open MPI 1.2.5 with a QLogic HCA and the openib BTL (not PSM). osu_latency and osu_bw work OK, but when I run osu_bibw with a message size of 2 MB (1<<21), it hangs in btl_openib_component_progress() waiting for something. I tried adding printfs at each point where ibv_post_send(), ibv_post_recv(), and ibv_poll_cq() are called and then ran a Python script, which verified that all sends and recvs got a good completion notice in the posted order (mca_btl_openib_component.use_srq is zero for this test). Note that only RC SEND (12252-byte) messages are being sent at this point. I can send the trace of ibv_* calls if it will help. Any suggestions on what to look for are welcome.
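The in-order completion check described above was done with a Python script over printf output. The same idea can be sketched in C as a small FIFO of outstanding work-request IDs. Everything here is hypothetical illustration: the names (wr_trace, trace_post, trace_complete) and the TRACE_DEPTH limit are not from the poster's script or from Open MPI. One trace per queue would be fed the wr_id passed to ibv_post_send() or ibv_post_recv(), and the wr_id of each work completion returned by ibv_poll_cq().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TRACE_DEPTH 1024  /* hypothetical cap on outstanding requests */

/* FIFO of work-request IDs that were posted but have not completed yet. */
struct wr_trace {
    uint64_t ids[TRACE_DEPTH];
    size_t   head;   /* index of the oldest outstanding entry */
    size_t   count;  /* number of outstanding entries */
};

void trace_init(struct wr_trace *t)
{
    t->head = 0;
    t->count = 0;
}

/* Record a posted request; returns false if the trace is full. */
bool trace_post(struct wr_trace *t, uint64_t wr_id)
{
    if (t->count == TRACE_DEPTH)
        return false;
    t->ids[(t->head + t->count) % TRACE_DEPTH] = wr_id;
    t->count++;
    return true;
}

/* Record a completion; returns true only if it matches the oldest
 * outstanding post, i.e. completions arrive in posted order. */
bool trace_complete(struct wr_trace *t, uint64_t wr_id)
{
    if (t->count == 0 || t->ids[t->head] != wr_id)
        return false;
    t->head = (t->head + 1) % TRACE_DEPTH;
    t->count--;
    return true;
}
```

If the program hangs while a trace still has a non-zero count, the entry at head identifies the oldest request that never completed. A count of zero, as reported above, suggests the wait is on something other than a missing completion.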