Just a reminder -- this RFC timed out today. If there are no objections, I'll commit the patch on #4205 to the trunk tomorrow evening.
No one has come up with a patch yet for the v1.7 branch (because of ABI reasons, it must be different from what we do on the trunk), but since that is definitely a bug fix, it can go in at any time.

On Feb 10, 2014, at 7:14 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> WHAT: On trunk, force MPI_Count/MPI_Offset to be 32 bits when building in 32
> bit mode (they are currently 64 bit, even in a 32 bit build). On v1.7, leave
> the sizes at 64 bit (for ABI reasons), but put error checking in the MPI API
> layer to ensure we won't over/underflow 32 bits.
>
> WHY: See ticket #4205 (https://svn.open-mpi.org/trac/ompi/ticket/4205)
>
> WHERE: On trunk, this can be solved entirely in configury. In v1.7/v1.8,
> make changes in the MPI API layer (e.g., check MPI_Send to ensure
> (count*size_of_datatype)<2B)
>
> TIMEOUT: I'll tentatively say next Tuesday teleconf, Feb 18, 2014, but it can
> be pushed back -- there's no real rush; this isn't a hot issue (but it is
> wrong and should be fixed).
>
> MORE DETAIL:
>
> I noticed that MPI_Get_elements_x() and MPI_Type_size_x() were giving wrong
> answers when compiled in 32 bit mode on a 64 bit machine. This is because in
> that build:
>
> - size_t: 4 bytes
> - ptrdiff_t: 4 bytes
> - MPI_Aint: 4 bytes
> - MPI_Offset: 8 bytes
> - MPI_Count: 8 bytes
>
> Some data points:
>
> 1. MPI-3 says that MPI_Count must be big enough to hold both an MPI_Aint and
> MPI_Offset.
>
> 2. The entire PML/BML/BTL/convertor infrastructure uses size_t as its
> underlying computation type.
>
> 3. The _x tests were failing in 32 bit builds because they take
> (count,datatype) input that intentionally results in a number of bytes that
> is larger than 2 billion, assign that value to a size_t (which is 32 bits),
> cause an overflow, and therefore get the wrong answer.
>
> To solve this:
>
> - On the trunk, we can just not allow MPI_Count (and therefore MPI_Offset) to
> be larger than size_t. This means that on 32 bit builds -- on both 32 and 64
> bit systems -- sizeof(MPI_Aint) == sizeof(MPI_Offset) == sizeof(MPI_Count) ==
> 4. There is a patch for this on #4205.
>
> - Because of ABI issues, we cannot change the size of MPI_Count/MPI_Offset on
> v1.7, so we can just check for over/underflow in the MPI API. For example,
> we can check that (count * size_of_datatype) < 2 billion (other checks will
> also be necessary; this is just an example). I have no patch for this yet.
>
> As a side effect, this means that -- for 32 bit builds -- we will not support
> large filesystems well (e.g., filesystems with 64 bit offsets). BlueGene is
> an example of such a system (not that OMPI supports BlueGene, but...).
> Specifically: for 32 bit builds, we'll only allow MPI_Offset to be 32 bits.
> I don't think that this is a major issue, because 32 bit builds are not a
> huge issue for the OMPI community, but I raise the point in the spirit of
> full disclosure. Fixing it to allow 32 bit MPI_Aint but 64 bit MPI_Offset
> and MPI_Count would likely mean re-tooling the PML/BML/BTL/convertor
> infrastructure to use something other than size_t, and I have zero desire to
> do that! (please, no OMPI vendor reveal that they're going to seriously
> build giant 32 bit systems...)
>
> Also, while investigating this issue, I discovered that the configury for
> determining the Fortran MPI_ADDRESS_KIND, MPI_OFFSET_KIND, and MPI_COUNT_KIND
> values was unrelated to the C types that we discovered for these concepts.
> The patch on #4205 fixes this issue as well -- the Fortran MPI_*_KIND values
> are now directly correlated with the C types that were discovered.
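
To make the v1.7 idea from the quoted RFC concrete, here is a rough sketch of the kind of (count * size_of_datatype) check it describes. This is not the missing v1.7 patch and not Open MPI code: byte_count_fits() is a hypothetical helper name, and treating "2 billion" as INT32_MAX is my assumption for illustration. As the RFC notes, other checks would also be necessary.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of Open MPI): returns 1 if
 * count * type_size is no larger than limit, 0 otherwise.
 * Assumption: the caller passes limit = INT32_MAX to stand in for
 * the "2 billion" bound mentioned in the RFC.
 */
static int byte_count_fits(int64_t count, int64_t type_size, int64_t limit)
{
    /* Negative inputs are treated as invalid rather than checked. */
    if (count < 0 || type_size < 0 || limit < 0) {
        return 0;
    }
    /* Zero bytes always fit; this also avoids dividing by zero below. */
    if (count == 0 || type_size == 0) {
        return 1;
    }
    /* Divide instead of multiplying so the check itself cannot overflow. */
    return count <= limit / type_size;
}
```

For example, byte_count_fits(1000, 8, INT32_MAX) passes, while byte_count_fits(300000000, 8, INT32_MAX) fails, because 2.4 billion bytes exceeds the bound. An API entry point would presumably return an MPI error in the failing case instead of handing the value to size_t-based code.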
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/