Re: [OMPI devel] coll/ml without hwloc (?)

2014-08-27 Thread Ralph Castain
Done, set for 1.8.3

On Aug 26, 2014, at 7:56 AM, Shamis, Pavel  wrote:

> Theoretically, we may make it functional (with good performance) even without
> hwloc.
> As it is today, I would suggest disabling ML if hwloc is disabled.
> 
> Best,
> Pasha
> 
>> -----Original Message-----
>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Gilles
>> Gouaillardet
>> Sent: Tuesday, August 26, 2014 4:38 AM
>> To: Open MPI Developers
>> Subject: [OMPI devel] coll/ml without hwloc (?)
>> 
>> Folks,
>> 
>> I just committed r32604 in order to fix compilation (pmix) when ompi is
>> configured with --without-hwloc.
>> 
>> Now, even a trivial hello world program issues the following output
>> (which is non-fatal and could even be reported as a warning):
>> 
>> [soleil][[32389,1],0][../../../../../../src/ompi-trunk/ompi/mca/coll/ml/coll_ml_module.c:1496:ml_discover_hierarchy] COLL-ML Error: (size of mca_bcol_base_components_in_use = 3) != (size of mca_sbgp_base_components_in_use = 2) or zero.
>> [soleil][[32389,1],1][../../../../../../src/ompi-trunk/ompi/mca/coll/ml/coll_ml_module.c:1496:ml_discover_hierarchy] COLL-ML Error: (size of mca_bcol_base_components_in_use = 3) != (size of mca_sbgp_base_components_in_use = 2) or zero.
>> 
>> 
>> In my understanding, coll/ml somehow relies on the topology information
>> (reported by hwloc), so I am wondering whether we should simply
>> *not* compile coll/ml, or set its priority to zero, if ompi is configured
>> with --without-hwloc.
>> 
>> Any thoughts?
>> 
>> Cheers,
>> 
>> Gilles
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: http://www.open-
>> mpi.org/community/lists/devel/2014/08/15708.php
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/08/15711.php
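
For anyone hitting this before the 1.8.3 change lands, the runtime equivalent of what the thread proposes is to deselect coll/ml (or zero its priority) on the command line. This is a sketch only, assuming the usual MCA selection syntax and the coll_ml_priority parameter name, with a hypothetical ./hello binary (check ompi_info if the parameter name differs):

mpirun -np 2 --mca coll ^ml ./hello
mpirun -np 2 --mca coll_ml_priority 0 ./hello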



Re: [OMPI devel] intercomm_create from the ibm test suite hangs

2014-08-27 Thread Ralph Castain
Took me a while to track this down, but it is now fixed -- a combination of several
minor errors.

Thanks
Ralph

On Aug 27, 2014, at 4:07 AM, Gilles Gouaillardet wrote:

> Folks,
> 
> The intercomm_create test case from the ibm test suite can hang under
> some configurations.
> 
> Basically, it will spawn n tasks in a first communicator, and then n
> tasks in a second communicator.
> 
> When I run from node0:
> mpirun -np 1 --mca btl tcp,self --mca coll ^ml -host node1,node2
> ./intercomm_create
> 
> The second spawn will hang.
> A simple workaround is to use 3 hosts:
> mpirun -np 1 --mca btl tcp,self --mca coll ^ml -host node1,node2,node3
> ./intercomm_create
> 
> The second spawn creates the task on node2.
> For some reason I cannot fully understand, pmix believes the orteds of nodes
> node1 and node2 are involved in the allgather.
> Since node1 is not involved whatsoever, the program hangs
> /* in create_dmns, orte_get_job_data_object(sig->signature[0].jobid)
> returns jdata with jdata->map->num_nodes = 2 */
> 
> Cheers,
> 
> Gilles
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/08/15732.php



Re: [OMPI devel] malloc 0 warnings

2014-08-27 Thread Lisandro Dalcin
On 27 August 2014 02:38, Jeff Squyres (jsquyres)  wrote:
> If you have reproducers, yes, that would be most helpful -- thanks.
>

Here you have another one...

$ cat igatherv.c
#include <mpi.h>
int main(int argc, char *argv[])
{
  signed char a=1,b=2;
  int rcounts[1] = {0};
  int rdispls[1] = {0};
  MPI_Request request;
  MPI_Init(&argc, &argv);
  MPI_Igatherv(&a, 0, MPI_SIGNED_CHAR,
               &b, rcounts, rdispls, MPI_SIGNED_CHAR,
               0, MPI_COMM_SELF, &request);
  MPI_Wait(&request, MPI_STATUS_IGNORE);
  MPI_Finalize();
  return 0;
}

$ mpicc igatherv.c
$ ./a.out
malloc debug: Request for 0 bytes (nbc_internal.h, 496)
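
For context, the "Request for 0 bytes" line appears to come from a debug-build allocation hook flagging zero-byte requests rather than from the MPI semantics themselves; the usual cure is to guard the allocation when the computed size is zero. A generic sketch of that guard pattern (illustrative only -- not the actual libnbc code):

#include <stdlib.h>
#include <string.h>

/* Illustrative pattern only: a zero-byte request is legal C, but debug
 * allocators often flag it, so skip the allocation (or round it up to one
 * byte) when the computed size is zero. */
static void *alloc_maybe_empty(size_t nbytes)
{
    if (nbytes == 0) {
        return NULL;     /* callers must accept NULL for a zero-byte buffer */
    }
    return malloc(nbytes);
}

int main(void)
{
    void *empty = alloc_maybe_empty(0);  /* no 0-byte request reaches malloc */
    void *buf   = alloc_maybe_empty(16);
    if (buf != NULL) {
        memset(buf, 0, 16);
    }
    free(buf);
    free(empty);                         /* free(NULL) is a no-op */
    return 0;
}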


Re: [OMPI devel] TKR

2014-08-27 Thread Jed Brown
"Jeff Squyres (jsquyres)"  writes:
> Before Fortran 08, there was no Fortran equivalent of C's (void*).
> Hence, it was actually impossible -- using pure Fortran -- to have
> Fortran prototypes for MPI subroutines that take a choice buffer
> (e.g., MPI_Send, which takes a (void*) buffer argument in C).

Just a note that Fortran 2008 doesn't really have this either.  It is in
TS 29113, which is scheduled for inclusion in the next Fortran standard.
Many compilers support it already.




Re: [OMPI devel] malloc 0 warnings

2014-08-27 Thread Lisandro Dalcin
On 27 August 2014 02:38, Jeff Squyres (jsquyres)  wrote:
> If you have reproducers, yes, that would be most helpful -- thanks.
>

OK, here you have something to start. To be fair, this is a reduction
with zero count. I have many other tests for reductions with zero
count that are failing.

Does Open MPI ban zero-count reduction calls, or is any failure actually a bug?

$ cat ireduce_scatter_block.c
#include <mpi.h>
int main(int argc, char *argv[])
{
  MPI_Request request;
  MPI_Init(&argc, &argv);
  MPI_Ireduce_scatter_block(NULL, NULL, 0, MPI_INT,
                            MPI_SUM, MPI_COMM_SELF, &request);
  MPI_Wait(&request, MPI_STATUS_IGNORE);
  MPI_Finalize();
  return 0;
}

$ mpicc ireduce_scatter_block.c
$ ./a.out
malloc debug: Request for 0 bytes (coll_libnbc_ireduce_scatter_block.c, 67)


Re: [OMPI devel] TKR

2014-08-27 Thread Orion Poplawski

On 08/27/2014 08:32 AM, Jeff Squyres (jsquyres) wrote:

On Aug 27, 2014, at 10:05 AM, Orion Poplawski  wrote:


Can someone give me a quick overview of the tkr/ignore-tkr split in the fortran 
bindings?


Heh.  How much do you want to know?  How far down the Fortran rabbit hole do 
you want to go?  :-)


In the process of updating the Fedora openmpi packages from 1.8.1 in Fedora 
21/22 to 1.8.2 we seem to have gone from libmpi_usempi.so to 
libmpi_usempi_ignore_tkr.so and I'm not sure why.


Did you upgrade gcc/gfortran to 4.9[.x]?  If so, that's likely why.



That's the trick.  Thanks very much for the description.  I'm glad we 
got this change in now then, and it looks like we can safely update 
older releases if needed.



In short:

- pre gcc/gfortran-4.9: uses the TKR interface
- gcc/gfortran >= 4.9: uses the ignore-TKR interface

TKR = Fortran-eese for "type, kind, rank".  "Type" is what you would expect: INTEGER, DOUBLE PRECISION, 
...etc.  "Kind", as I understand it, is a variant of the type: e.g., there are different kinds of INTEGERs.  I'm sure 
that a Fortran expert will disagree with me here, but for a software engineer, it comes down to INTEGERs of different sizes: 2 
byte integer values, 4 byte integer values, etc.  "Rank" is the array dimension of the variable (which is a little 
confusing in an MPI context, where "rank" has an entirely different meaning).

Before Fortran 08, there was no Fortran equivalent of C's (void*).  Hence, it 
was actually impossible -- using pure Fortran -- to have Fortran prototypes for 
MPI subroutines that take a choice buffer (e.g., MPI_Send, which takes a 
(void*) buffer argument in C).
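
To make the choice-buffer point concrete, here is a small C sketch (illustrative, not from the original post): because the buffer argument is just a pointer, one prototype covers every type, kind, and rank -- exactly what pre-TS-29113 Fortran could not express.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int    ibuf[4] = {1, 2, 3, 4}, irecv[4];
    double dbuf[2] = {3.14, 2.71}, drecv[2];
    MPI_Init(&argc, &argv);
    /* The same (void*) choice-buffer prototype accepts both arrays: */
    MPI_Sendrecv(ibuf, 4, MPI_INT,    0, 0, irecv, 4, MPI_INT,    0, 0,
                 MPI_COMM_SELF, MPI_STATUS_IGNORE);
    MPI_Sendrecv(dbuf, 2, MPI_DOUBLE, 0, 0, drecv, 2, MPI_DOUBLE, 0, 0,
                 MPI_COMM_SELF, MPI_STATUS_IGNORE);
    printf("irecv[0]=%d drecv[0]=%g\n", irecv[0], drecv[0]);
    MPI_Finalize();
    return 0;
}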

Most Fortran compilers have long-since had various pragmas that tell the 
compiler to ignore the TKR of a given subroutine parameter -- effectively 
making it like a C (void*).

Gfortran tends to be quite pure in its Fortran implementation and did not 
support this kind of ignore-TKR pragma until the 4.9 series.

Hence, gfortran >= 4.9 uses the shiny new "ignore TKR"-based 
implementation, which is significantly simpler, has more features, and is OMPI's defined path forward for 
Fortran support.

Keep in mind that all of this is based on *one* of the 3 defined Fortran 
interfaces in MPI:

1. mpif.h
2. "mpi" module
3. "mpi_f08" module

Specifically, this conversation is about #2.  Many of the aspects also apply 
to #3, but the issues are (related but) a little different there.




--
Orion Poplawski
Technical Manager 303-415-9701 x222
NWRA/CoRA Division    FAX: 303-415-9702
3380 Mitchell Lane  or...@cora.nwra.com
Boulder, CO 80301  http://www.cora.nwra.com


Re: [OMPI devel] MPI calls in callback functions during MPI_Finalize()

2014-08-27 Thread George Bosilca
Lisandro,

We all use similar mechanisms to handle internal releases. Let's give some
credit to the MPI folks who (for once) designed a clear and workable
mechanism to achieve this.

  George.



On Wed, Aug 27, 2014 at 10:15 AM, Lisandro Dalcin  wrote:

> On 26 August 2014 23:59, George Bosilca  wrote:
> > Lisandro,
> >
> > You rely on a feature clearly prohibited by the MPI standard. Please read
> > the entire section I pinpointed you to (8.7.1).
> >
> > There are 2 key sentences in the section.
> >
> > 1. When MPI_FINALIZE is called, it will first execute the equivalent of an
> > MPI_COMM_FREE on MPI_COMM_SELF.
> >
> > 2. The freeing of MPI_COMM_SELF occurs before any other parts of MPI are
> > affected. Thus, for example, calling MPI_FINALIZED will return false in any
> > of these callback functions. Once done with MPI_COMM_SELF, the order and
> > rest of the actions taken by MPI_FINALIZE is not specified.
> >
> > Thus when MPI is calling the equivalent of MPI_COMM_FREE on your
> > communicator, it is too late: MPI is already considered finalized.
> > Moreover, relying on MPI to clean up your communicators is already a bad
> > habit, which is rightfully punished by Open MPI.
> >
>
> After much thinking about it, I must surrender :-), you were right.
> Sorry for the noise.
>
>
> --
> Lisandro Dalcin
> 
> Research Scientist
> Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
> Numerical Porous Media Center (NumPor)
> King Abdullah University of Science and Technology (KAUST)
> http://numpor.kaust.edu.sa/
>
> 4700 King Abdullah University of Science and Technology
> al-Khawarizmi Bldg (Bldg 1), Office # 4332
> Thuwal 23955-6900, Kingdom of Saudi Arabia
> http://www.kaust.edu.sa
>
> Office Phone: +966 12 808-0459
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/08/15735.php
>
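
The "clear and workable mechanism" referred to here is presumably the attribute delete callback on MPI_COMM_SELF described in MPI 8.7.1: hang a keyval on MPI_COMM_SELF, and MPI_Finalize fires its delete callback first, while MPI calls are still legal. A minimal sketch (illustrative, not code from this thread):

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

/* Fired when MPI_COMM_SELF is freed at the very start of MPI_Finalize(),
 * i.e. while MPI is still fully usable -- the right place for library-level
 * cleanup that still needs MPI calls. */
static int cleanup_cb(MPI_Comm comm, int keyval, void *attr_val, void *extra_state)
{
    MPI_Comm *priv = (MPI_Comm *)attr_val;
    printf("cleanup callback: freeing private communicator\n");
    MPI_Comm_free(priv);          /* MPI calls are still legal here */
    free(priv);
    return MPI_SUCCESS;
}

int main(int argc, char *argv[])
{
    int keyval;
    MPI_Comm *priv = malloc(sizeof(MPI_Comm));

    MPI_Init(&argc, &argv);
    MPI_Comm_dup(MPI_COMM_WORLD, priv);             /* resource to clean up */

    MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN, cleanup_cb, &keyval, NULL);
    MPI_Comm_set_attr(MPI_COMM_SELF, keyval, priv); /* hook the callback */

    MPI_Finalize();   /* frees MPI_COMM_SELF first, which fires cleanup_cb */
    return 0;
}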


Re: [OMPI devel] MPI calls in callback functions during MPI_Finalize()

2014-08-27 Thread Lisandro Dalcin
On 26 August 2014 23:59, George Bosilca  wrote:
> Lisandro,
>
> You rely on a feature clearly prohibited by the MPI standard. Please read
> the entire section I pinpointed you to (8.7.1).
>
> There are 2 key sentences in the section.
>
> 1. When MPI_FINALIZE is called, it will first execute the equivalent of an
> MPI_COMM_FREE on MPI_COMM_SELF.
>
> 2. The freeing of MPI_COMM_SELF occurs before any other parts of MPI are
> affected. Thus, for example, calling MPI_FINALIZED will return false in any
> of these callback functions. Once done with MPI_COMM_SELF, the order and
> rest of the actions taken by MPI_FINALIZE is not specified.
>
> Thus when MPI is calling the equivalent of MPI_COMM_FREE on your
> communicator, it is too late: MPI is already considered finalized.
> Moreover, relying on MPI to clean up your communicators is already a bad habit,
> which is rightfully punished by Open MPI.
>

After much thinking about it, I must surrender :-), you were right.
Sorry for the noise.


-- 
Lisandro Dalcin

Research Scientist
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
Numerical Porous Media Center (NumPor)
King Abdullah University of Science and Technology (KAUST)
http://numpor.kaust.edu.sa/

4700 King Abdullah University of Science and Technology
al-Khawarizmi Bldg (Bldg 1), Office # 4332
Thuwal 23955-6900, Kingdom of Saudi Arabia
http://www.kaust.edu.sa

Office Phone: +966 12 808-0459


[OMPI devel] TKR

2014-08-27 Thread Orion Poplawski
Can someone give me a quick overview of the tkr/ignore-tkr split in the 
fortran bindings?  In the process of updating the Fedora openmpi 
packages from 1.8.1 in Fedora 21/22 to 1.8.2 we seem to have gone from 
libmpi_usempi.so to libmpi_usempi_ignore_tkr.so and I'm not sure why.


checking Fortran compiler ignore TKR syntax... not cached; checking variants
checking for Fortran compiler support of TYPE(*), DIMENSION(*)... no
checking for Fortran compiler support of !GCC$ ATTRIBUTES NO_ARG_CHECK... yes
checking Fortran compiler ignore TKR syntax... 1:type(*), dimension(*):!GCC$ ATTRIBUTES NO_ARG_CHECK ::


--
Orion Poplawski
Technical Manager 303-415-9701 x222
NWRA/CoRA Division    FAX: 303-415-9702
3380 Mitchell Lane  or...@cora.nwra.com
Boulder, CO 80301  http://www.cora.nwra.com


[OMPI devel] SVN -> git conversion: check your email address!

2014-08-27 Thread Jeff Squyres (jsquyres)
I was doing another trial SVN -> git conversion and found 2 new commit IDs this 
morning that were not in my authors list.

Please please please check 
https://github.com/open-mpi/authors/blob/master/authors.txt and ensure that the 
email address(es) listed for your commit ID(s) are what you want them to be.

Feel free to either email me corrections or send me a pull request (hint: you 
might want to try a pull request, since that's what we're going to be using for 
CMRs!).

We'll be doing the final final final SVN conversion soon, and after that point, 
it won't be possible to change your email address in the git history.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



[OMPI devel] intercomm_create from the ibm test suite hangs

2014-08-27 Thread Gilles Gouaillardet
Folks,

The intercomm_create test case from the ibm test suite can hang under
some configurations.

Basically, it will spawn n tasks in a first communicator, and then n
tasks in a second communicator.

When I run from node0:
mpirun -np 1 --mca btl tcp,self --mca coll ^ml -host node1,node2
./intercomm_create

The second spawn will hang.
A simple workaround is to use 3 hosts:
mpirun -np 1 --mca btl tcp,self --mca coll ^ml -host node1,node2,node3
./intercomm_create

The second spawn creates the task on node2.
For some reason I cannot fully understand, pmix believes the orteds of nodes
node1 and node2 are involved in the allgather.
Since node1 is not involved whatsoever, the program hangs
/* in create_dmns, orte_get_job_data_object(sig->signature[0].jobid)
returns jdata with jdata->map->num_nodes = 2 */

Cheers,

Gilles
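
A rough sketch of the pattern being described (hypothetical ./worker binary; this is not the actual ibm test suite code): the parent spawns one group of n tasks, then a second group, and it is the second spawn that hangs.

#include <mpi.h>

int main(int argc, char *argv[])
{
    int n = 2;
    MPI_Comm group1, group2;
    MPI_Init(&argc, &argv);
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, n, MPI_INFO_NULL,
                   0, MPI_COMM_WORLD, &group1, MPI_ERRCODES_IGNORE);
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, n, MPI_INFO_NULL,
                   0, MPI_COMM_WORLD, &group2, MPI_ERRCODES_IGNORE);  /* hangs */
    /* assumes ./worker also calls MPI_Comm_disconnect on its parent comm */
    MPI_Comm_disconnect(&group1);
    MPI_Comm_disconnect(&group2);
    MPI_Finalize();
    return 0;
}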


Re: [OMPI devel] Envelope of HINDEXED_BLOCK

2014-08-27 Thread George Bosilca
Lisandro,

Thanks for the tester. I pushed a fix in the trunk (r32613) and I requested
a CMR for 1.8.3.

  George.



On Tue, Aug 26, 2014 at 6:53 AM, Lisandro Dalcin  wrote:

> I've just installed 1.8.2; something is still wrong with
> HINDEXED_BLOCK datatypes.
>
> Please note the example below: it should print "ni=2" but I'm getting
> "ni=7".
>
> $ cat type_hindexed_block.c
> #include <stdio.h>
> #include <mpi.h>
> int main(int argc, char *argv[])
> {
>   MPI_Datatype datatype;
>   MPI_Aint disps[] = {0,2,4,6,8};
>   int ni,na,nd,combiner;
>   MPI_Init(&argc, &argv);
>   MPI_Type_create_hindexed_block(5, 2, disps, MPI_BYTE, &datatype);
>   MPI_Type_get_envelope(datatype, &ni, &na, &nd, &combiner);
>   printf("ni=%d na=%d nd=%d combiner=%d\n", ni, na, nd, combiner);
>   MPI_Type_free(&datatype);
>   MPI_Finalize();
>   return 0;
> }
>
> $ mpicc type_hindexed_block.c
>
> $ ./a.out
> ni=7 na=5 nd=1 combiner=18
>
>
> --
> Lisandro Dalcin
> 
> Research Scientist
> Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
> Numerical Porous Media Center (NumPor)
> King Abdullah University of Science and Technology (KAUST)
> http://numpor.kaust.edu.sa/
>
> 4700 King Abdullah University of Science and Technology
> al-Khawarizmi Bldg (Bldg 1), Office # 4332
> Thuwal 23955-6900, Kingdom of Saudi Arabia
> http://www.kaust.edu.sa
>
> Office Phone: +966 12 808-0459
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/08/15709.php
>
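
For reference, MPI-3 defines the envelope of an MPI_COMBINER_HINDEXED_BLOCK type as ni = 2 (count and blocklength), na = count (the displacements), nd = 1 (the oldtype), which is why the reproducer expects ni=2. A companion sketch using MPI_Type_get_contents to read those values back (array sizes assume the standard-conforming envelope; on a build showing the bug above, size them from MPI_Type_get_envelope instead):

#include <stdio.h>
#include <mpi.h>
int main(int argc, char *argv[])
{
  MPI_Datatype datatype;
  MPI_Aint disps[] = {0,2,4,6,8};
  int i[2];            /* expect {count, blocklength} = {5, 2} */
  MPI_Aint a[5];       /* expect the 5 byte displacements      */
  MPI_Datatype d[1];   /* expect MPI_BYTE                      */
  MPI_Init(&argc, &argv);
  MPI_Type_create_hindexed_block(5, 2, disps, MPI_BYTE, &datatype);
  MPI_Type_get_contents(datatype, 2, 5, 1, i, a, d);
  printf("count=%d blocklength=%d first_disp=%ld\n", i[0], i[1], (long)a[0]);
  MPI_Type_free(&datatype);
  MPI_Finalize();
  return 0;
}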


Re: [OMPI devel] Comm_split_type(COMM_SELF, MPI_UNDEFINED, ...)

2014-08-27 Thread George Bosilca
The proposed patch has several issues, all of them detailed on the ticket.
A correct patch and a broader tester are provided.

  George.



On Tue, Aug 26, 2014 at 8:21 PM, Jeff Squyres (jsquyres)  wrote:

> Good catch.
>
> I filed https://svn.open-mpi.org/trac/ompi/ticket/4876 with a patch for
> the fix; I want to get more eyeballs on it before I commit.
>
>
> On Aug 26, 2014, at 7:07 AM, Lisandro Dalcin  wrote:
>
> > While I agree that the code below is rather useless, I'm not
> > sure it should actually fail:
> >
> > $ cat comm_split_type.c
> > #include <mpi.h>
> > #include <assert.h>
> > int main(int argc, char *argv[])
> > {
> >  MPI_Comm comm;
> >  MPI_Init(&argc, &argv);
> >  MPI_Comm_split_type(MPI_COMM_SELF,MPI_UNDEFINED,0,MPI_INFO_NULL,&comm);
> >  assert(comm == MPI_COMM_NULL);
> >  MPI_Finalize();
> >  return 0;
> > }
> >
> > $ mpicc comm_split_type.c
> > $ ./a.out
> > [kw2060:9865] *** An error occurred in MPI_Comm_split_type
> > [kw2060:9865] *** reported by process [140735368986625,140071768424448]
> > [kw2060:9865] *** on communicator MPI_COMM_SELF
> > [kw2060:9865] *** MPI_ERR_ARG: invalid argument of some other kind
> > [kw2060:9865] *** MPI_ERRORS_ARE_FATAL (processes in this communicator
> > will now abort,
> > [kw2060:9865] ***and potentially your MPI job)
> >
> > --
> > Lisandro Dalcin
> > 
> > Research Scientist
> > Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
> > Numerical Porous Media Center (NumPor)
> > King Abdullah University of Science and Technology (KAUST)
> > http://numpor.kaust.edu.sa/
> >
> > 4700 King Abdullah University of Science and Technology
> > al-Khawarizmi Bldg (Bldg 1), Office # 4332
> > Thuwal 23955-6900, Kingdom of Saudi Arabia
> > http://www.kaust.edu.sa
> >
> > Office Phone: +966 12 808-0459
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/08/15710.php
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/08/15727.php
>
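
For comparison, MPI_Comm_split has long had the behaviour the reproducer expects from MPI_Comm_split_type: a color of MPI_UNDEFINED simply yields MPI_COMM_NULL instead of an error. A parallel sketch (same shape as the reproducer above):

#include <mpi.h>
#include <assert.h>
int main(int argc, char *argv[])
{
  MPI_Comm comm;
  MPI_Init(&argc, &argv);
  /* color = MPI_UNDEFINED: the standard says newcomm is MPI_COMM_NULL */
  MPI_Comm_split(MPI_COMM_SELF, MPI_UNDEFINED, 0, &comm);
  assert(comm == MPI_COMM_NULL);
  MPI_Finalize();
  return 0;
}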