Re: [OMPI devel] Missing support for 2 types in MPI_Sizeof()

2016-04-18 Thread DERBEY, NADIA
Agreed!

On 04/15/2016 11:46 PM, Jeff Squyres (jsquyres) wrote:
> All sounds like good reasons to amend the Bull test suite to no longer check 
> for character and logical.  :-)
>
>
>> On Apr 15, 2016, at 5:38 PM, Larry Baker  wrote:
>>
>> I also remember reading in the past about problems with C sizeof and 
>> multibyte characters.  I just looked over the C90 standard.  In C90 code, 
>> the sizeof operator returns size_t in bytes.  Except that it always returns 
>> 1 for char, signed char, or unsigned char.  For an array, C90 says sizeof 
>> returns the number of bytes.  I interpret this to mean that when the 
>> execution character set is a 16-bit multibyte character set, sizeof a char 
>> is 1, while sizeof a char[1] is 2.  I've never actually tested this.
>>
>> Any programs that marshal character strings for interchange have to be 
>> quite specific, I think, about the character set being exchanged.  I don't 
>> think MPI_SIZEOF has a way to know or specify the semantics of the character 
>> set.
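
The sizeof question above is easy to check directly. The following is a minimal C sketch, with hypothetical helper names, that does so. The C standard defines sizeof(char) as 1, and the size of an array is its element count times the element size, so sizeof(char[1]) should also come out as 1 rather than 2. Characters that need more than one byte are stored either as several char elements (a multibyte encoding) or in the separate wchar_t type, whose size is implementation-defined. The sketch also shows why "size of a string" is ambiguous: the buffer size and the content length differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helpers, for illustration only. */

/* Size of one char; the C standard defines this as 1. */
size_t size_of_char(void) { return sizeof(char); }

/* Size of a one-element char array: element count times element size. */
size_t size_of_char_array1(void) { return sizeof(char[1]); }

/* Storage reserved for a string buffer (the "maximum length" reading). */
size_t buffer_bytes(void) { char buf[16] = "MPI"; return sizeof buf; }

/* Length of the contents, without the terminating NUL (the strlen reading). */
size_t content_chars(void) { char buf[16] = "MPI"; return strlen(buf); }

/* Wide characters are a separate type with an implementation-defined size. */
size_t size_of_wchar(void) { return sizeof(wchar_t); }

/* Runs every check; aborts through assert() if an expectation fails. */
void check_char_sizes(void)
{
    assert(size_of_char() == 1);
    assert(size_of_char_array1() == 1);
    assert(buffer_bytes() == 16);
    assert(content_chars() == 3);
    assert(size_of_wchar() >= 1);
}
```

Calling check_char_sizes() from a main() is enough to run the checks. None of this tells MPI which character set the bytes encode, which is the point made above.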
>>
>> Larry Baker
>> US Geological Survey
>> 650-329-5608
>> ba...@usgs.gov
>>
>>
>>
>> On 15 Apr 2016, at 2:23 PM, Larry Baker wrote:
>>
>>> Be careful what you wish for.
>>>
>>> I remember looking at this issue a while ago, but I can't remember why or 
>>> how I ran into it.  I do remember convincing myself that the MPI standard 
>>> was correct in restricting SIZEOF to numeric types.  For one thing, a 
>>> character variable type is a string container in Fortran, while in C it is 
>>> a single character.  What would be the correct interpretation for SIZEOF in 
>>> Fortran?  The maximum length?  The TRIM'd length?  What would be the 
>>> correct interpretation in C?  1?  strlen()?  strlen()+1?  The size of a 
>>> character itself may not be the same on either end of an MPI connection if, 
>>> for example, one program is compiled using 8-bit characters and the other 
>>> is compiled using 16-bit characters.  Interchanging strings opens up a 
>>> can of worms.  As far as logical, there is no C logical type.  In Fortran, 
>>> while the size of a logical variable is specified as a "storage unit" (the 
>>> same as an integer), the representation of true and false is unspecified, 
>>> and, thus, is processor dependent.  On VAXes, only a single bit matters; 
>>> the instruction set supports this logical data type.  (In C, though there 
>>> is no logical data type, the C standard does specify 0=false and 1=true 
>>> for the result of relational and logical operators and 0=false and not 
>>> 0=true for logical operator operands.)  This means it is problematic to 
>>> exchange logical data between Fortran programs (C makes no sense, since 
>>> there is no logical data type) when different compilers (part of what 
>>> Fortran calls a processor) are used.
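
The truth-value rules described above can be sketched in C as follows. The first two helpers show the C rules (operator results are 0 or 1; any nonzero operand counts as true). The last two are hypothetical models of two ways a Fortran processor might interpret a stored logical, one testing only a single bit (assumed here to be the low bit) and one testing for nonzero. All function names are made up for illustration. The same raw value can be read as false under one rule and true under the other, which is the interchange problem.

```c
#include <assert.h>

/* Result of a relational operator: always 0 or 1 in C. */
int relational_result(int a, int b) { return a < b; }

/* Operand of a logical operator: any nonzero value counts as true. */
int as_truth_value(int x) { return !!x; }

/* Hypothetical rule A: only one bit (assumed: the low bit) decides truth. */
int low_bit_is_true(int raw) { return (int)((unsigned)raw & 1u); }

/* Hypothetical rule B: any nonzero bit pattern is true. */
int nonzero_is_true(int raw) { return raw != 0; }

/* Returns 1 when the two rules disagree about the same raw value. */
int rules_disagree(int raw)
{
    int disagree = low_bit_is_true(raw) != nonzero_is_true(raw);
    assert(disagree == 0 || disagree == 1);  /* relational result is 0 or 1 */
    return disagree;
}
```

For example, rules_disagree(2) returns 1: the raw value 2 is false under the low-bit rule but true under the nonzero rule, while 0, 1, and -1 are read the same way by both.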
>>>
>>> Better to find out what discussions took place in the MPI standards 
>>> committee before adding extensions to SIZEOF.  They may very well have good 
>>> reasons to avoid character and logical data, as I concluded.
>>>
>>> Larry Baker
>>> US Geological Survey
>>> 650-329-5608
>>> ba...@usgs.gov
>>>
>>>
>>>
>>> On 15 Apr 2016, at 5:34 AM, Jeff Squyres (jsquyres) wrote:
>>>
>>>> Nadia --
>>>>
>>>> I believe that the character and logical types are not in this script 
>>>> already because the description of MPI_SIZEOF in MPI-3.1 says that the 
>>>> input choice buffer parameter is:
>>>>
>>>> IN x a Fortran variable of numeric intrinsic type (choice)
>>>>
>>>> As I understand it (and my usual disclaimer here: I am *not* a Fortran 
>>>> expert), CHARACTER and LOGICAL types are not numeric in Fortran.
>>>>
>>>> However, we could add such interfaces as an extension.
>>>>
>>>> I just checked MPICH 3.2, and they *do* include MPI_SIZEOF interfaces for 
>>>> CHARACTER and LOGICAL, but they are missing many of the other MPI_SIZEOF 
>>>> interfaces that we have in OMPI.  Meaning: OMPI and MPICH already diverge 
>>>> wildly on MPI_SIZEOF.  :-\
>>>>
>>>> I guess I don't have a strong opinion here.  If you file a PR for this 
>>>> patch, I won't object.  :-)
>>>>
>>>>
>>>>> On Apr 15, 2016, at 3:22 AM, DERBEY, NADIA wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> The following trivial example doesn't compile because of 2 missing types
>>>>> in the MPI_SIZEOF subroutines (in mpi_sizeof.f90).
>>>>>
>>>>> [derbeyn@btp0 test]$ cat mpi_sizeof.f90
>>>>>   program main
>>>>> !use mpi
>>>>>   include 'mpif.h'
>>>>>
>>>>>   integer ierr, sz, mpisize
>>>>>   real r1
>>>>>   integer i1
>>>>>   character ch1
>>>>>   logical l1
>>>>>
>>>>>   call MPI_INIT(ierr)
>>>>>   call MPI_SIZEOF(r1, sz, ierr)
>>>>>   call MPI_SIZEOF(i1, sz, ierr)
>>>>>   call MPI_SIZEOF(l1, sz, ierr)
>>>>>   call MPI_SIZEOF(ch1, sz, ierr)
>>>>>   call MPI_FINALIZE(ierr)
>>>>>
>>>>>   end
>>>>> [derbeyn@btp0 test]$ mpif90 -o mpi_sizeof mpi_sizeof.f90
>>>>> mpi_sizeof.f90(14): error #6285: There is no matching specif

Re: [OMPI devel] psm2 and psm2_ep_open problems

2016-04-18 Thread Howard Pritchard
Please point me to the patch.

--

sent from my smart phone so no good type.

Howard
On Apr 15, 2016 1:04 PM, "Ralph Castain"  wrote:

> I have a patch that I think will resolve this problem - would you please
> take a look?
>
> Ralph
>
>
>
> On Apr 15, 2016, at 7:32 AM, Ralph Castain  wrote:
>
> Actually, it did come across the developer list :-)
>
> Why don’t I resolve this by just ensuring that the key we create is
> properly filled? It’s a trivial fix in the PMI ess component
>
>
> On Apr 15, 2016, at 7:26 AM, Howard Pritchard  wrote:
>
> I didn't copy dev on this.
>
>
>
> -- Forwarded message --
> From: *Howard Pritchard*
> Date: Thursday, April 14, 2016
> Subject: psm2 and psm2_ep_open problems
> To: Open MPI Developers
>
>
> Hi Matias
>
> Actually I triaged this further.  The Open MPI PMI subsystem is actually
> doing things correctly wrt env variable setting, with or without mpirun.
> The problem has to do with PSM2 and the fact that on my cluster right now
> SLURM has only scheduled about 25 jobs.  As a result, the unique key the
> PSM2 MTL feeds to PSM2 has lots of zeros in the initial part of the key.
> This ends up messing up the epid generated in PSM2.  The OFI MTL doesn't
> have this problem because the PSM2 provider has some of these LSBs set in
> the value it passes to PSM2.
>
> I will open a PR to "fix" the PSM2 MTL to handle this feature of PSM2.
>
> Howard
>
> On Thursday, April 14, 2016, Cabral, Matias A wrote:
>
>> Hi Howard,
>>
>>
>>
>> I suspect this is the known issue with using SLURM with OMPI and PSM
>> that is discussed here:
>>
>> https://www.open-mpi.org/community/lists/users/2010/12/15220.php
>>
>>
>>
>> As of today, orte generates the psm_key, so when using SLURM this does
>> not happen and it is necessary to set it in the environment.  Here Ralph
>> explains the workaround:
>>
>> https://www.open-mpi.org/community/lists/users/2010/12/15242.php
>>
>>
>>
>> As you found, an epid of 0 is not a valid value.  So, basing my comments
>> on the assert at line 832 of:
>>
>> https://github.com/01org/opa-psm2/blob/master/psm_ep.c
>>
>>
>>
>> psmi_ep_open_device() will do:
>>
>>
>>
>> /*
>>  * We use a LID of 0 for non-HFI communication.
>>  * Since a jobkey is not available from IPS, pull the
>>  * first 16 bits from the UUID.
>>  */
>> *epid = PSMI_EPID_PACK(((uint16_t *) unique_job_key)[0],
>>                        (rank >> 3), rank, 0,
>>                        PSMI_HFI_TYPE_DEFAULT, rank);
>>
>> In the particular case you mention below, when there is no HFI (shared
>> memory), the rank is 0, and the passed key is 0, the epid will be 0.
>>
>>
>>
>> SOLUTION:
>>
>> Set OMPI_MCA_orte_precondition_transports in the environment to a value
>> other than 0.
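
A toy C model of the failure described above, for illustration only. toy_epid_pack is a hypothetical stand-in for PSMI_EPID_PACK: its field widths and positions are made up and are not PSM2's real layout, and TOY_HFI_TYPE_DEFAULT is assumed to be 0 to match the statement above that the epid comes out as 0. It takes the first 16 bits of the job key plus values derived from the rank, as the quoted call does. When those 16 bits are zero and the rank is 0, every packed field is zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed value, chosen so the model matches the epid == 0 case above. */
#define TOY_HFI_TYPE_DEFAULT 0u

/* Hypothetical stand-in for PSMI_EPID_PACK; the bit layout is invented. */
static uint64_t toy_epid_pack(uint16_t key16, uint32_t context,
                              uint32_t subcontext, uint32_t unit,
                              uint32_t type, uint32_t rank)
{
    return ((uint64_t)key16 << 48) |
           ((uint64_t)(context & 0xffu) << 40) |
           ((uint64_t)(subcontext & 0x7u) << 37) |
           ((uint64_t)(unit & 0x3u) << 35) |
           ((uint64_t)(type & 0x7u) << 32) |
           (uint64_t)rank;
}

/* Mirrors the quoted call: only the first 16 bits of the key are used. */
uint64_t toy_epid_for(const unsigned char key[16], uint32_t rank)
{
    uint16_t key16;

    assert(key != NULL);
    memcpy(&key16, key, sizeof key16);  /* first two bytes of the job key */
    return toy_epid_pack(key16, rank >> 3, rank, 0,
                         TOY_HFI_TYPE_DEFAULT, rank);
}
```

In this model an all-zero key with rank 0 gives an epid of 0, and so does a key whose only nonzero bytes come after the first two. A key with a nonzero leading byte, or a nonzero rank, gives a nonzero epid. That is consistent with a job key that starts with zeros triggering the assert for a single-process job.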
>>
>>
>>
>> Thanks,
>>
>>
>>
>> _MAC
>>
>>
>>
>> *From:* devel [mailto:devel-boun...@open-mpi.org] *On Behalf Of *Howard
>> Pritchard
>> *Sent:* Thursday, April 14, 2016 1:10 PM
>> *To:* Open MPI Developers List 
>> *Subject:* [OMPI devel] psm2 and psm2_ep_open problems
>>
>>
>>
>> Hi Folks,
>>
>>
>>
>> So we have this brand-new omnipath cluster here at work,
>>
>> but people are having problems using it on a single node using
>>
>> srun as the job launcher.
>>
>>
>>
>> The customer wants to use srun to launch jobs not the open mpi
>>
>> mpirun.
>>
>>
>>
>> The customer installed 1.10.1, but I can reproduce the
>>
>> problem with v2.x and I'm sure with master, unless I build the
>>
>> ofi mtl.  ofi mtl works, psm2 mtl doesn't.
>>
>>
>>
>> I downloaded the psm2 code from github and started hacking.
>>
>>
>>
>> What appears to be the problem is that when running on a single
>>
>> node one can go through a path in psmi_ep_open_device where
>>
>> for a single process job, the value stored into epid is zero.
>>
>>
>>
>> This results in an assert failing in the __psm2_ep_open_internal
>>
>> function.
>>
>>
>>
>> Is there a quick and dirty workaround that doesn't involve fixing
>>
>> psm2 MTL?  I could suggest to the sysadmins to install libfabric 1.3
>>
>> and build the openmpi to only have ofi mtl, but perhaps there's
>>
>> another way to get psm2 mtl to work for single node jobs?  I'd prefer
>>
>> to not ask users to disable psm2 mtl explicitly for their single node
>> jobs.
>>
>>
>>
>> Thanks for suggestions.
>>
>>
>>
>> Howard
>>
>>
>>
>>
>>
>>
>>
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2016/04/18773.php
>