Re: [OMPI devel] coll/ml without hwloc (?)
Done, set for 1.8.3

On Aug 26, 2014, at 7:56 AM, Shamis, Pavel wrote:

> Theoretically, we may make it functional (with good performance) even
> without hwloc. As it is today, I would suggest to disable ML if hwloc
> is disabled.
>
> Best,
> Pasha
>
>> -----Original Message-----
>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Gilles Gouaillardet
>> Sent: Tuesday, August 26, 2014 4:38 AM
>> To: Open MPI Developers
>> Subject: [OMPI devel] coll/ml without hwloc (?)
>>
>> Folks,
>>
>> i just committed r32604 in order to fix compilation (pmix) when ompi is
>> configured with --without-hwloc
>>
>> now, even a trivial hello world program issues the following output
>> (which is non fatal, and could even be reported as a warning):
>>
>> [soleil][[32389,1],0][../../../../../../src/ompi-trunk/ompi/mca/coll/ml/coll_ml_module.c:1496:ml_discover_hierarchy]
>> COLL-ML Error: (size of mca_bcol_base_components_in_use = 3) != (size of
>> mca_sbgp_base_components_in_use = 2) or zero.
>> [soleil][[32389,1],1][../../../../../../src/ompi-trunk/ompi/mca/coll/ml/coll_ml_module.c:1496:ml_discover_hierarchy]
>> COLL-ML Error: (size of mca_bcol_base_components_in_use = 3) != (size of
>> mca_sbgp_base_components_in_use = 2) or zero.
>>
>> in my understanding, coll/ml somehow relies on the topology information
>> (reported by hwloc), so i am wondering whether we should simply
>> *not* compile coll/ml, or set its priority to zero, if ompi is configured
>> with --without-hwloc
>>
>> any thoughts ?
>>
>> Cheers,
>>
>> Gilles
>>
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/08/15708.php
>
> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/08/15711.php
Re: [OMPI devel] intercomm_create from the ibm test suite hangs
Took me a while to track this down, but it is now fixed - a combination of several minor errors

Thanks
Ralph

On Aug 27, 2014, at 4:07 AM, Gilles Gouaillardet wrote:

> Folks,
>
> the intercomm_create test case from the ibm test suite can hang under
> some configurations.
>
> basically, it will spawn n tasks in a first communicator, and then n
> tasks in a second communicator.
>
> when i run from node0 :
> mpirun -np 1 --mca btl tcp,self --mca coll ^ml -host node1,node2 ./intercomm_create
>
> the second spawn will hang.
> a simple workaround is to use 3 hosts :
> mpirun -np 1 --mca btl tcp,self --mca coll ^ml -host node1,node2,node3 ./intercomm_create
>
> the second spawn creates the task on node2.
> for some reason i cannot fully understand, pmix believes the orteds of
> nodes node1 and node2 are involved in the allgather.
> since node1 is not involved whatsoever, the program hangs
> /* in create_dmns, orte_get_job_data_object(sig->signature[0].jobid)
> returns jdata with jdata->map->num_nodes = 2 */
>
> Cheers,
>
> Gilles
>
> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/08/15732.php
Re: [OMPI devel] malloc 0 warnings
On 27 August 2014 02:38, Jeff Squyres (jsquyres) wrote:
> If you have reproducers, yes, that would be most helpful -- thanks.

Here you have another one...

$ cat igatherv.c
#include <mpi.h>
int main(int argc, char *argv[])
{
  signed char a=1,b=2;
  int rcounts[1] = {0};
  int rdispls[1] = {0};
  MPI_Request request;
  MPI_Init(&argc, &argv);
  MPI_Igatherv(&a, 0, MPI_SIGNED_CHAR,
               &b, rcounts, rdispls, MPI_SIGNED_CHAR,
               0, MPI_COMM_SELF, &request);
  MPI_Wait(&request, MPI_STATUS_IGNORE);
  MPI_Finalize();
  return 0;
}

$ mpicc igatherv.c
$ ./a.out
malloc debug: Request for 0 bytes (nbc_internal.h, 496)
Re: [OMPI devel] TKR
"Jeff Squyres (jsquyres)" writes:

> Before Fortran 08, there was no Fortran equivalent of C's (void*).
> Hence, it was actually impossible -- using pure Fortran -- to have
> Fortran prototypes for MPI subroutines that take a choice buffer
> (e.g., MPI_Send, which takes a (void*) buffer argument in C).

Just a note that Fortran 2008 doesn't really have this either. It is in
TS 29113, which is scheduled for inclusion in the next Fortran standard.
Many compilers support it already.
Re: [OMPI devel] malloc 0 warnings
On 27 August 2014 02:38, Jeff Squyres (jsquyres) wrote:
> If you have reproducers, yes, that would be most helpful -- thanks.

OK, here you have something to start. To be fair, this is a reduction
with zero count. I have many other tests for reductions with zero count
that are failing. Does Open MPI ban zero-count reduction calls, or is
any failure actually a bug?

$ cat ireduce_scatter_block.c
#include <mpi.h>
int main(int argc, char *argv[])
{
  MPI_Request request;
  MPI_Init(&argc, &argv);
  MPI_Ireduce_scatter_block(NULL, NULL, 0, MPI_INT,
                            MPI_SUM, MPI_COMM_SELF, &request);
  MPI_Wait(&request, MPI_STATUS_IGNORE);
  MPI_Finalize();
  return 0;
}

$ mpicc ireduce_scatter_block.c
$ ./a.out
malloc debug: Request for 0 bytes (coll_libnbc_ireduce_scatter_block.c, 67)
Re: [OMPI devel] TKR
On 08/27/2014 08:32 AM, Jeff Squyres (jsquyres) wrote:
> On Aug 27, 2014, at 10:05 AM, Orion Poplawski wrote:
>> Can someone give me a quick overview of the tkr/ignore-tkr split in
>> the fortran bindings?
>
> Heh. How much do you want to know? How far down the Fortran rabbit
> hole do you want to go? :-)
>
>> In the process of updating the Fedora openmpi packages from 1.8.1 in
>> Fedora 21/22 to 1.8.2 we seem to have gone from libmpi_usempi.so to
>> libmpi_usempi_ignore_tkr.so and I'm not sure why.
>
> Did you upgrade gcc/gfortran to 4.9[.x]? If so, that's likely why.

That's the trick. Thanks very much for the description. I'm glad we got
this change in now then, and it looks like we can safely update older
releases if needed.

> In short:
>
> - pre gcc/gfortran-4.9: uses the TKR interface
> - gcc/gfortran >= 4.9: uses the ignore-TKR interface
>
> TKR = Fortran-eese for "type, kind, rank". "Type" is what you would
> expect: INTEGER, DOUBLE PRECISION, ...etc. "Kind", as I understand it,
> is a variant of the type: e.g., there are different kinds of INTEGERs.
> I'm sure that a Fortran expert will disagree with me here, but for a
> software engineer, it comes down to INTEGERs of different sizes: 2 byte
> integer values, 4 byte integer values, etc. "Rank" is the array
> dimension of the variable (which is a little confusing in an MPI
> context, where "rank" has an entirely different meaning).
>
> Before Fortran 08, there was no Fortran equivalent of C's (void*).
> Hence, it was actually impossible -- using pure Fortran -- to have
> Fortran prototypes for MPI subroutines that take a choice buffer
> (e.g., MPI_Send, which takes a (void*) buffer argument in C).
>
> Most Fortran compilers have long since had various pragmas that tell
> the compiler to ignore the TKR of a given subroutine parameter --
> effectively making it like a C (void*). Gfortran tends to be quite
> pure in its Fortran implementation and did not support this kind of
> ignore-TKR pragma until the 4.9 series.
>
> Hence, gfortran < 4.9 uses the TKR-based implementation, and gfortran
> >= 4.9 uses the shiny new "ignore TKR"-based implementation, which is
> significantly simpler, has more features, and is OMPI's defined path
> forward for Fortran support.
>
> Keep in mind that all of this is based on *one* of the 3 defined
> Fortran interfaces in MPI:
>
> 1. mpif.h
> 2. "mpi" module
> 3. "mpi_f08" module
>
> Specifically, this conversation is about #2. Many of the aspects also
> apply to #3, but the issues are (related but) a little different there.

-- 
Orion Poplawski
Technical Manager                     303-415-9701 x222
NWRA/CoRA Division                    FAX: 303-415-9702
3380 Mitchell Lane                    or...@cora.nwra.com
Boulder, CO 80301                     http://www.cora.nwra.com
Re: [OMPI devel] MPI calls in callback functions during MPI_Finalize()
Lisandro,

We all use similar mechanisms to handle internal releases. Let's give
some credit to the MPI folks who (for once) designed a clear and
workable mechanism to achieve this.

  George.

On Wed, Aug 27, 2014 at 10:15 AM, Lisandro Dalcin wrote:
> On 26 August 2014 23:59, George Bosilca wrote:
>> Lisandro,
>>
>> You rely on a feature clearly prohibited by the MPI standard. Please
>> read the entire section I pinpointed you to (8.7.1).
>>
>> There are 2 key sentences in the section.
>>
>> 1. When MPI_FINALIZE is called, it will first execute the equivalent
>> of an MPI_COMM_FREE on MPI_COMM_SELF.
>>
>> 2. The freeing of MPI_COMM_SELF occurs before any other parts of MPI
>> are affected. Thus, for example, calling MPI_FINALIZED will return
>> false in any of these callback functions. Once done with
>> MPI_COMM_SELF, the order and rest of the actions taken by MPI_FINALIZE
>> is not specified.
>>
>> Thus when MPI is calling the equivalent of MPI_COMM_FREE on your
>> communicator, it is too late: MPI is already considered as finalized.
>> Moreover, relying on MPI to clean up your communicators is already a
>> bad habit, which is rightfully punished by Open MPI.
>
> After much thinking about it, I must surrender :-), you were right.
> Sorry for the noise.
>
> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/08/15735.php
Re: [OMPI devel] MPI calls in callback functions during MPI_Finalize()
On 26 August 2014 23:59, George Bosilca wrote:
> Lisandro,
>
> You rely on a feature clearly prohibited by the MPI standard. Please
> read the entire section I pinpointed you to (8.7.1).
>
> There are 2 key sentences in the section.
>
> 1. When MPI_FINALIZE is called, it will first execute the equivalent
> of an MPI_COMM_FREE on MPI_COMM_SELF.
>
> 2. The freeing of MPI_COMM_SELF occurs before any other parts of MPI
> are affected. Thus, for example, calling MPI_FINALIZED will return
> false in any of these callback functions. Once done with
> MPI_COMM_SELF, the order and rest of the actions taken by MPI_FINALIZE
> is not specified.
>
> Thus when MPI is calling the equivalent of MPI_COMM_FREE on your
> communicator, it is too late: MPI is already considered as finalized.
> Moreover, relying on MPI to clean up your communicators is already a
> bad habit, which is rightfully punished by Open MPI.

After much thinking about it, I must surrender :-), you were right.
Sorry for the noise.

-- 
Lisandro Dalcin

Research Scientist
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
Numerical Porous Media Center (NumPor)
King Abdullah University of Science and Technology (KAUST)
http://numpor.kaust.edu.sa/

4700 King Abdullah University of Science and Technology
al-Khawarizmi Bldg (Bldg 1), Office # 4332
Thuwal 23955-6900, Kingdom of Saudi Arabia
http://www.kaust.edu.sa

Office Phone: +966 12 808-0459
[OMPI devel] TKR
Can someone give me a quick overview of the tkr/ignore-tkr split in the
fortran bindings? In the process of updating the Fedora openmpi packages
from 1.8.1 in Fedora 21/22 to 1.8.2 we seem to have gone from
libmpi_usempi.so to libmpi_usempi_ignore_tkr.so and I'm not sure why.

checking Fortran compiler ignore TKR syntax... not cached; checking variants
checking for Fortran compiler support of TYPE(*), DIMENSION(*)... no
checking for Fortran compiler support of !GCC$ ATTRIBUTES NO_ARG_CHECK... yes
checking Fortran compiler ignore TKR syntax... 1:type(*), dimension(*):!GCC$ ATTRIBUTES NO_ARG_CHECK ::

-- 
Orion Poplawski
Technical Manager                     303-415-9701 x222
NWRA/CoRA Division                    FAX: 303-415-9702
3380 Mitchell Lane                    or...@cora.nwra.com
Boulder, CO 80301                     http://www.cora.nwra.com
[OMPI devel] SVN -> git conversion: check your email address!
I was doing another trial SVN -> git conversion and found 2 new commit
IDs this morning that were not in my authors list.

Please please please check

    https://github.com/open-mpi/authors/blob/master/authors.txt

and ensure that the email address(es) listed for your commit ID(s) are
what you want them to be. Feel free to either email me corrections or
send me a pull request (hint: you might want to try a pull request,
since that's what we're going to be using for CMRs!).

We'll be doing the final final final SVN conversion soon, and after
that point, it won't be possible to change your email address in the
git history.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
[OMPI devel] intercomm_create from the ibm test suite hangs
Folks,

the intercomm_create test case from the ibm test suite can hang under
some configurations.

basically, it will spawn n tasks in a first communicator, and then n
tasks in a second communicator.

when i run from node0 :
mpirun -np 1 --mca btl tcp,self --mca coll ^ml -host node1,node2 ./intercomm_create

the second spawn will hang.
a simple workaround is to use 3 hosts :
mpirun -np 1 --mca btl tcp,self --mca coll ^ml -host node1,node2,node3 ./intercomm_create

the second spawn creates the task on node2.
for some reason i cannot fully understand, pmix believes the orteds of
nodes node1 and node2 are involved in the allgather.
since node1 is not involved whatsoever, the program hangs
/* in create_dmns, orte_get_job_data_object(sig->signature[0].jobid)
returns jdata with jdata->map->num_nodes = 2 */

Cheers,

Gilles
Re: [OMPI devel] Envelope of HINDEXED_BLOCK
Lisandro,

Thanks for the tester. I pushed a fix in the trunk (r32613) and I
requested a CMR for the 1.8.3.

  George.

On Tue, Aug 26, 2014 at 6:53 AM, Lisandro Dalcin wrote:
> I've just installed 1.8.2, something is still wrong with
> HINDEXED_BLOCK datatypes.
>
> Please note the example below, it should print "ni=2" but I'm getting
> "ni=7".
>
> $ cat type_hindexed_block.c
> #include <stdio.h>
> #include <mpi.h>
> int main(int argc, char *argv[])
> {
>   MPI_Datatype datatype;
>   MPI_Aint disps[] = {0,2,4,6,8};
>   int ni,na,nd,combiner;
>   MPI_Init(&argc, &argv);
>   MPI_Type_create_hindexed_block(5, 2, disps, MPI_BYTE, &datatype);
>   MPI_Type_get_envelope(datatype, &ni, &na, &nd, &combiner);
>   printf("ni=%d na=%d nd=%d combiner=%d\n", ni, na, nd, combiner);
>   MPI_Type_free(&datatype);
>   MPI_Finalize();
>   return 0;
> }
>
> $ mpicc type_hindexed_block.c
>
> $ ./a.out
> ni=7 na=5 nd=1 combiner=18
>
> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/08/15709.php
Re: [OMPI devel] Comm_split_type(COMM_SELF, MPI_UNDEFINED, ...)
The proposed patch has several issues, all of them detailed on the
ticket. A correct patch as well as a broader tester are provided.

  George.

On Tue, Aug 26, 2014 at 8:21 PM, Jeff Squyres (jsquyres) wrote:
> Good catch.
>
> I filed https://svn.open-mpi.org/trac/ompi/ticket/4876 with a patch
> for the fix; I want to get more eyeballs on it before I commit.
>
> On Aug 26, 2014, at 7:07 AM, Lisandro Dalcin wrote:
>
>> While I agree that the code below is rather useless, I'm not sure it
>> should actually fail:
>>
>> $ cat comm_split_type.c
>> #include <assert.h>
>> #include <mpi.h>
>> int main(int argc, char *argv[])
>> {
>>   MPI_Comm comm;
>>   MPI_Init(&argc, &argv);
>>   MPI_Comm_split_type(MPI_COMM_SELF, MPI_UNDEFINED, 0, MPI_INFO_NULL, &comm);
>>   assert(comm == MPI_COMM_NULL);
>>   MPI_Finalize();
>>   return 0;
>> }
>>
>> $ mpicc comm_split_type.c
>> $ ./a.out
>> [kw2060:9865] *** An error occurred in MPI_Comm_split_type
>> [kw2060:9865] *** reported by process [140735368986625,140071768424448]
>> [kw2060:9865] *** on communicator MPI_COMM_SELF
>> [kw2060:9865] *** MPI_ERR_ARG: invalid argument of some other kind
>> [kw2060:9865] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>> [kw2060:9865] ***    and potentially your MPI job)
>>
>> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/08/15710.php
>
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/08/15727.php