Just a reminder -- this RFC timed out today.

If there are no objections to this, I'll commit the patch on #4205 to the trunk 
tomorrow evening.

No one has come up with a patch yet for the v1.7 branch (for ABI reasons, it 
must differ from what we do on the trunk), but since that is definitely a bug 
fix, it can go in at any time.


On Feb 10, 2014, at 7:14 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> WHAT: On trunk, force MPI_Count/MPI_Offset to be 32 bits when building in 32 
> bit mode (they are currently 64 bit, even in a 32 bit build).  On v1.7, leave 
> the sizes at 64 bit (for ABI reasons), but put error checking in the MPI API 
> layer to ensure we won't over/underflow 32 bits.
> 
> WHY: See ticket #4205 (https://svn.open-mpi.org/trac/ompi/ticket/4205)
> 
> WHERE: On trunk, this can be solved entirely in configury.  In v1.7/v1.8, 
> make changes in the MPI API layer (e.g., check MPI_Send to ensure 
> (count * size_of_datatype) < 2 billion).
> 
> TIMEOUT: I'll tentatively say next Tuesday teleconf, Feb 18, 2014, but it can 
> be pushed back -- there's no real rush; this isn't a hot issue (but it is 
> wrong and should be fixed).
> 
> MORE DETAIL:
> 
> I noticed that MPI_Get_elements_x() and MPI_Type_size_x() were giving wrong 
> answers when compiled in 32 bit mode on a 64 bit machine.  This is because in 
> that build:
> 
> - size_t: 4 bytes
> - ptrdiff_t: 4 bytes
> - MPI_Aint: 4 bytes
> - MPI_Offset: 8 bytes
> - MPI_Count: 8 bytes
> 
> Some data points:
> 
> 1. MPI-3 says that MPI_Count must be big enough to hold both an MPI_Aint and 
> MPI_Offset.
> 
> 2. The entire PML/BML/BTL/convertor infrastructure uses size_t as its 
> underlying computation type.
> 
> 3. The _x tests were failing in 32 bit builds because they take 
> (count,datatype) input that intentionally results in a number of bytes 
> larger than 2 billion.  That value was assigned to a size_t (which is 32 
> bits), overflowed, and therefore produced the wrong answer.
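
A minimal sketch of the truncation described in point 3, not the actual test 
code.  The names size32_t, total_bytes64, and total_bytes32 are hypothetical; 
size32_t is a fixed-width stand-in for a 32 bit size_t so that the effect can 
be reproduced on a 64 bit host as well.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for size_t as it appears in a 32 bit build. */
typedef uint32_t size32_t;

/* Byte count computed in 64 bits, as an 8-byte MPI_Count would hold it. */
static int64_t total_bytes64(int64_t count, int64_t type_size)
{
    assert(count >= 0 && type_size >= 0);
    return count * type_size;
}

/* The same byte count after being stored in a 32 bit size type:
 * only the low 32 bits of the 64 bit product survive the conversion. */
static size32_t total_bytes32(int64_t count, int64_t type_size)
{
    return (size32_t)total_bytes64(count, type_size);
}
```

For example, 2^30 elements of 8 bytes each is 2^33 bytes in 64 bits, but the 
32 bit value comes out as 0, which is the kind of wrong answer the _x tests 
were seeing.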
> 
> To solve this:
> 
> - On the trunk, we can just not allow MPI_Count (and therefore MPI_Offset) to 
> be larger than size_t.  This means that on 32 bit builds -- on both 32 and 64 
> bit systems -- sizeof(MPI_Aint) == sizeof(MPI_Offset) == sizeof(MPI_Count) == 
> 4.  There is a patch for this on #4205.
> 
> - Because of ABI issues, we cannot change the size of MPI_Count/MPI_Offset on 
> v1.7, so we can just check for over/underflow in the MPI API.  For example, 
> we can check that (count * size_of_datatype) < 2 billion (other checks will 
> also be necessary; this is just an example).  I have no patch for this yet.
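
One possible shape for the v1.7 check, as a sketch only (as noted above, there 
is no patch yet).  checked_byte_count is a hypothetical helper, not an 
existing Open MPI function, and the limit of INT32_MAX is an assumption about 
what "2 billion" would mean in practice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stores count * type_size in *bytes and returns 1 if
 * the product fits in a signed 32 bit value; otherwise returns 0 and leaves
 * *bytes untouched. */
static int checked_byte_count(int count, size_t type_size, int32_t *bytes)
{
    const size_t limit = (size_t)INT32_MAX;

    assert(bytes != NULL);
    if (count < 0) {
        return 0;   /* negative counts are invalid input */
    }
    /* Divide instead of multiplying so the check itself cannot overflow. */
    if (count > 0 && type_size > limit / (size_t)count) {
        return 0;
    }
    *bytes = (int32_t)((size_t)count * type_size);
    return 1;
}
```

An MPI API function could call something like this on entry and return an 
error when it yields 0, before the value ever reaches the size_t-based layers.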
> 
> As a side effect, this means that -- for 32 bit builds -- we will not support 
> large filesystems well (e.g., filesystems with 64 bit offsets).  BlueGene is 
> an example of such a system (not that OMPI supports BlueGene, but...).  
> Specifically: for 32 bit builds, we'll only allow MPI_Offset to be 32 bits.  
> I don't think that this is a major issue, because 32 bit builds are not a 
> huge issue for the OMPI community, but I raise the point in the spirit of 
> full disclosure.  Fixing it to allow 32 bit MPI_Aint but 64 bit MPI_Offset 
> and MPI_Count would likely mean re-tooling the PML/BML/BTL/convertor 
> infrastructure to use something other than size_t, and I have zero desire to 
> do that!  (please, no OMPI vendor reveal that they're going to seriously 
> build giant 32 bit systems...)
> 
> Also, while investigating this issue, I discovered that the configury for 
> determining the Fortran MPI_ADDRESS_KIND, MPI_OFFSET_KIND, and MPI_COUNT_KIND 
> values was unrelated to the C types that we discovered for these concepts.  
> The patch on #4205 fixes this issue as well -- the Fortran MPI_*_KIND values 
> are now directly correlated with the C types that were discovered.
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
