Re: [OMPI devel] Missing support for 2 types in MPI_Sizeof()
Agreed!

On 04/15/2016 11:46 PM, Jeff Squyres (jsquyres) wrote:
> All sounds like good reasons to amend the Bull test suite to no longer check
> for character and logical.  :-)
>
>> On Apr 15, 2016, at 5:38 PM, Larry Baker wrote:
>>
>> I also remember reading in the past about problems with C sizeof and
>> multibyte characters.  I just looked over the C90 standard.  In C90 code,
>> the sizeof operator returns size_t in bytes.  Except that it always returns
>> 1 for char, signed char, or unsigned char.  For an array, C90 says sizeof
>> returns the number of bytes.  I interpret this to mean that when the
>> execution character set is a 16-bit multibyte character set, sizeof a char
>> is 1, while sizeof a char[1] is 2.  I've never actually tested this.
>>
>> Any programs that marshal character strings for interchange have to be
>> quite specific, I think, in the character set being exchanged.  I don't
>> think MPI_SIZEOF has a way to know or specify the semantics of the
>> character set.
>>
>> Larry Baker
>> US Geological Survey
>> 650-329-5608
>> ba...@usgs.gov
>>
>> On 15 Apr 2016, at 2:23 PM, Larry Baker wrote:
>>
>>> Be careful what you wish for.
>>>
>>> I remember looking at this issue a while ago, but I can't remember why or
>>> how I ran into it.  I do remember convincing myself that the MPI standard
>>> was correct in restricting SIZEOF to numeric types.  For one thing, a
>>> character variable type is a string container in Fortran, while in C it is
>>> a single character.  What would be the correct interpretation for SIZEOF in
>>> Fortran?  The maximum length?  The TRIM'd length?  What would be the
>>> correct interpretation in C?  1?  strlen()?  strlen()+1?  The size of a
>>> character itself may not be the same on either end of an MPI connection if,
>>> for example, one program is compiled using 8-bit characters and the other
>>> is compiled using 16-bit characters.  Interchanging strings opens up a
>>> can of worms.  As far as logical, there is no C logical type.  In Fortran,
>>> while the size of a logical variable is specified as a "storage unit" (the
>>> same as an integer), the representation of true and false is unspecified,
>>> and, thus, is processor dependent.  On VAXes, only a single bit matters;
>>> the instruction set supports this logical data type.  (In C, though there
>>> is no logical data type, the C standard does specify 0=false and 1=true
>>> for the result of relational and logical operators and 0=false and not
>>> 0=true for logical operator operands.)  This means it is problematic to
>>> exchange logical data between Fortran programs (C makes no sense, since
>>> there is no logical data type) when different compilers (part of what
>>> Fortran calls a processor) are used.
>>>
>>> Better to find out what discussions took place in the MPI standards
>>> committee before adding extensions to SIZEOF.  They may very well have good
>>> reasons to avoid character and logical data, as I concluded.
>>>
>>> Larry Baker
>>> US Geological Survey
>>> 650-329-5608
>>> ba...@usgs.gov
>>>
>>> On 15 Apr 2016, at 5:34 AM, Jeff Squyres (jsquyres) wrote:
>>>
>>>> Nadia --
>>>>
>>>> I believe that the character and logical types are not in this script
>>>> already because the description of MPI_SIZEOF in MPI-3.1 says that the
>>>> input choice buffer parameter is:
>>>>
>>>>     IN x    a Fortran variable of numeric intrinsic type (choice)
>>>>
>>>> As I understand it (and my usual disclaimer here: I am *not* a Fortran
>>>> expert), CHARACTER and LOGICAL types are not numeric in Fortran.
>>>>
>>>> However, we could add such interfaces as an extension.  I just checked
>>>> MPICH 3.2, and they *do* include MPI_SIZEOF interfaces for CHARACTER and
>>>> LOGICAL, but they are missing many of the other MPI_SIZEOF interfaces
>>>> that we have in OMPI.  Meaning: OMPI and MPICH already diverge wildly on
>>>> MPI_SIZEOF.  :-\
>>>>
>>>> I guess I don't have a strong opinion here.  If you file a PR for this
>>>> patch, I won't object.  :-)
>>>>
>>>>> On Apr 15, 2016, at 3:22 AM, DERBEY, NADIA wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> The following trivial example doesn't compile because of 2 missing types
>>>>> in the MPI_SIZEOF subroutines (in mpi_sizeof.f90).
>>>>>
>>>>> [derbeyn@btp0 test]$ cat mpi_sizeof.f90
>>>>> program main
>>>>>         !use mpi
>>>>>         include 'mpif.h'
>>>>>
>>>>>         integer ierr, sz, mpisize
>>>>>         real r1
>>>>>         integer i1
>>>>>         character ch1
>>>>>         logical l1
>>>>>
>>>>>         call MPI_INIT(ierr)
>>>>>         call MPI_SIZEOF(r1, sz, ierr)
>>>>>         call MPI_SIZEOF(i1, sz, ierr)
>>>>>         call MPI_SIZEOF(l1, sz, ierr)
>>>>>         call MPI_SIZEOF(ch1, sz, ierr)
>>>>>         call MPI_FINALIZE(ierr)
>>>>>
>>>>> end
>>>>> [derbeyn@btp0 test]$ mpif90 -o mpi_sizeof mpi_sizeof.f90
>>>>> mpi_sizeof.f90(14): error #6285: There is no matching specif
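
Larry Baker says above that he never tested his reading of C90 sizeof. The C sketch below is one way to check it; it is not from the thread, and all the function names are made up for illustration. As far as the C standard goes, sizeof(char) is 1 by definition and an array's size is its element count times its element size, so sizeof(char[1]) should come out as 1 as well, not 2. Wide characters go through wchar_t, whose size is implementation-defined. The sketch also exercises the 0/1 results of relational and logical operators that Larry mentions.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* All function names here are illustrative; none come from MPI or the thread. */

/* sizeof(char) is 1 by definition in C. */
size_t size_of_char(void) { return sizeof(char); }

/* An array's size is element count times element size, so char[1] is 1 byte. */
size_t size_of_char_array_1(void) { return sizeof(char[1]); }

/* Wide characters use wchar_t; its size is implementation-defined. */
size_t size_of_wchar(void) { return sizeof(wchar_t); }

/* Relational operators yield exactly 0 (false) or 1 (true). */
int is_less(int a, int b) { return a < b; }

/* Any nonzero operand counts as true; !! collapses it to 0 or 1. */
int as_truth_value(int x) { return !!x; }

/* Self-check that can be called from a test program's main(). */
void check_c_sizes_and_truth(void)
{
    assert(size_of_char() == 1);
    assert(size_of_char_array_1() == 1);
    assert(size_of_wchar() >= 1);
    assert(is_less(1, 2) == 1 && is_less(2, 1) == 0);
    assert(as_truth_value(42) == 1 && as_truth_value(0) == 0);
}
```

None of this settles how a Fortran CHARACTER or LOGICAL should map onto a size, which is the question Larry raises; it only pins down the C side of his comparison.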
Re: [OMPI devel] psm2 and psm2_ep_open problems
please point me to the patch.

--

sent from my smart phone so no good type.

Howard

On Apr 15, 2016 1:04 PM, "Ralph Castain" wrote:

> I have a patch that I think will resolve this problem - would you please
> take a look?
>
> Ralph
>
>
> On Apr 15, 2016, at 7:32 AM, Ralph Castain wrote:
>
> Actually, it did come across the developer list :-)
>
> Why don’t I resolve this by just ensuring that the key we create is
> properly filled? It’s a trivial fix in the PMI ess component
>
>
> On Apr 15, 2016, at 7:26 AM, Howard Pritchard wrote:
>
> I didn't copy dev on this.
>
>
> ---------- Forwarded message ----------
> From: *Howard Pritchard*
> Date: Thursday, April 14, 2016
> Subject: psm2 and psm2_ep_open problems
> To: Open MPI Developers
>
>
> Hi Matias
>
> Actually I triaged this further.  The Open MPI PMI subsystem is actually
> doing things correctly wrt env variable setting, with or without mpirun.
> The problem has to do with psm2 and the fact that on my cluster right now
> SLURM has only scheduled about 25 jobs.  As a result, the unique key the
> PSM2 MTL is feeding to PSM2 has lots of zeros in the initial part of the
> key.  This ends up messing up the epid generated in PSM2.  The OFI MTL
> doesn't have this problem because the PSM2 provider has some of these LSBs
> set in the value it passes to PSM2.
>
> I will open a PR to "fix" the PSM2 MTL to handle this feature of PSM2.
>
> Howard
>
> On Thursday, April 14, 2016, Cabral, Matias A wrote:
>
>> Hi Howard,
>>
>> I suspect this is the known issue with using SLURM with OMPI and PSM
>> that is discussed here:
>>
>> https://www.open-mpi.org/community/lists/users/2010/12/15220.php
>>
>> As of today, orte generates the psm_key, so when using SLURM this does
>> not happen and it is necessary to set it in the environment.  Here Ralph
>> explains the workaround:
>>
>> https://www.open-mpi.org/community/lists/users/2010/12/15242.php
>>
>> As you found, epid of 0 is not a valid value.  So, basing comments on:
>>
>> https://github.com/01org/opa-psm2/blob/master/psm_ep.c
>>
>> see the assert at line 832.  psmi_ep_open_device() will do:
>>
>>     /*
>>      * We use a LID of 0 for non-HFI communication.
>>      * Since a jobkey is not available from IPS, pull the
>>      * first 16 bits from the UUID.
>>      */
>>     *epid = PSMI_EPID_PACK(((uint16_t *) unique_job_key)[0],
>>                            (rank >> 3), rank, 0,
>>                            PSMI_HFI_TYPE_DEFAULT, rank);
>>
>> In the particular case you mention below, when there is no HFI (shared
>> memory), rank is 0 and the passed key is 0, epid will be 0.
>>
>> SOLUTION:
>>
>> Set OMPI_MCA_orte_precondition_transports in the environment to a value
>> different from 0.
>>
>> Thanks,
>>
>> _MAC
>>
>> *From:* devel [mailto:devel-boun...@open-mpi.org] *On Behalf Of *Howard
>> Pritchard
>> *Sent:* Thursday, April 14, 2016 1:10 PM
>> *To:* Open MPI Developers List
>> *Subject:* [OMPI devel] psm2 and psm2_ep_open problems
>>
>> Hi Folks,
>>
>> So we have this brand-new omnipath cluster here at work,
>> but people are having problems using it on a single node using
>> srun as the job launcher.
>>
>> The customer wants to use srun to launch jobs, not the open mpi
>> mpirun.
>>
>> The customer installed 1.10.1, but I can reproduce the
>> problem with v2.x and I'm sure with master, unless I build the
>> ofi mtl.  ofi mtl works, psm2 mtl doesn't.
>>
>> I downloaded the psm2 code from github and started hacking.
>>
>> What appears to be the problem is that when running on a single
>> node one can go through a path in psmi_ep_open_device where,
>> for a single process job, the value stored into epid is zero.
>>
>> This results in an assert failing in the __psm2_ep_open_internal
>> function.
>>
>> Is there a quick and dirty workaround that doesn't involve fixing
>> psm2 MTL?  I could suggest to the sysadmins to install libfabric 1.3
>> and build the openmpi to only have ofi mtl, but perhaps there's
>> another way to get psm2 mtl to work for single node jobs?  I'd prefer
>> to not ask users to disable psm2 mtl explicitly for their single node
>> jobs.
>>
>> Thanks for suggestions.
>>
>> Howard
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2016/04/18773.php
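
Matias's explanation of how the shared-memory epid ends up as 0 can be sketched in a few lines of C. This is a minimal illustration under stated assumptions, not PSM2 code: sketch_epid_pack and its bit layout are hypothetical stand-ins for PSMI_EPID_PACK, whose real layout is not shown in the thread, and the other function names are invented too. The only property carried over is that all-zero inputs pack to zero.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* HYPOTHETICAL stand-in for PSMI_EPID_PACK: the real bit layout is not
 * given in the thread. Only the "all zero in, zero out" property matters. */
uint64_t sketch_epid_pack(uint16_t lid, uint32_t context, uint32_t rank)
{
    return ((uint64_t)lid << 48) |
           ((uint64_t)(context & 0xffffu) << 32) |
           (uint64_t)rank;
}

/* Pull the first 16 bits of the job key, as the quoted PSM2 comment describes. */
uint16_t first_16_bits(const uint8_t key[16])
{
    uint16_t v;
    memcpy(&v, key, sizeof v);   /* copy instead of casting the pointer */
    return v;
}

/* Shared-memory (non-HFI) case: the key's first 16 bits stand in for the LID. */
uint64_t sketch_shm_epid(const uint8_t key[16], uint32_t rank)
{
    return sketch_epid_pack(first_16_bits(key), rank >> 3, rank);
}

/* Self-check that can be called from a test program's main(). */
void check_sketch_epid(void)
{
    uint8_t zero_key[16] = {0};              /* key with leading zeros */
    uint8_t filled_key[16] = {0x5a, 0xc3};   /* key with leading bytes set */

    assert(sketch_shm_epid(zero_key, 0) == 0);    /* rank 0, zero key: epid 0 */
    assert(sketch_shm_epid(filled_key, 0) != 0);  /* filled key avoids it */
    assert(sketch_shm_epid(zero_key, 1) != 0);    /* nonzero rank avoids it */
}
```

Under these assumptions the sketch is consistent with both fixes discussed above: a nonzero OMPI_MCA_orte_precondition_transports value, or a key whose leading bytes are properly filled, keeps the single-process shared-memory epid away from 0.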