Re: [OMPI devel] Possible bug with derived datatypes and openib BTL in trunk
I sent this information to George off the mailing list since the attachment was somewhat large. Still strange that I seem to be the only one who sees this.

>-Original Message-
>From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of George Bosilca
>Sent: Wednesday, April 16, 2014 4:24 PM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] Possible bug with derived datatypes and openib BTL in trunk
>
>Rolf,
>
>I didn't see these on my check run. Can you run the MPI_Isend_ator test with
>mpi_ddt_pack_debug and mpi_ddt_unpack_debug set to 1? I would be
>interested in the output you get on your machine.
>
>George.
>
>
>On Apr 16, 2014, at 14:34, Rolf vandeVaart wrote:
>
>> I have seen errors when running the Intel test suite using the openib BTL
>> when transferring derived datatypes. I do not see the error with the sm or
>> tcp BTLs. The errors begin after this check-in:
>>
>> https://svn.open-mpi.org/trac/ompi/changeset/31370
>> Timestamp: 04/11/14 16:06:56 (5 days ago)
>> Author: bosilca
>> Message: Reshape all the packing/unpacking functions to use the same
>> skeleton. Rewrite the generic_unpacking to take advantage of the same
>> capabilities.
>>
>> Does anyone else see errors?
>> Here is an example running with r31370:
>>
>> [rvandevaart@drossetti-ivy1 src]$ mpirun --mca btl self,openib -np 2 -host drossetti-ivy0,drossetti-ivy1 --mca btl_openib_warn_default_gid_prefix 0 MPI_Isend_ator_c
>> MPITEST error (1): libmpitest.c:1608 i=117, int32_t value=-1, expected 117
>> MPITEST error (1): libmpitest.c:1578 i=195, char value=-1, expected -61
>> MPITEST error (1): 2 errors in buffer (17,0,12) len 273 commsize 2 commtype -10 data_type 13 root 1
>> MPITEST error (1): libmpitest.c:1608 i=117, int32_t value=-1, expected 117
>> MPITEST error (1): libmpitest.c:1578 i=195, char value=-1, expected -61
>> MPITEST error (1): 2 errors in buffer (17,2,12) len 273 commsize 2 commtype -16 data_type 13 root 1
>> MPITEST info (0): Starting MPI_Isend_ator: All Isend TO Root test
>> MPITEST info (0): Node spec MPITEST_comm_sizes[6]=2 too large, using 1
>> MPITEST info (0): Node spec MPITEST_comm_sizes[22]=2 too large, using 1
>> MPITEST info (0): Node spec MPITEST_comm_sizes[32]=2 too large, using 1
>> MPITEST error (0): libmpitest.c:1608 i=117, int32_t value=-1, expected 118
>> MPITEST error (0): libmpitest.c:1578 i=195, char value=-1, expected -60
>> MPITEST error (0): 2 errors in buffer (17,0,12) len 273 commsize 2 commtype -10 data_type 13 root 0
>> MPITEST error (0): libmpitest.c:1608 i=117, int32_t value=-1, expected 118
>> MPITEST error (0): libmpitest.c:1578 i=195, char value=-1, expected -60
>> MPITEST error (0): 2 errors in buffer (17,2,12) len 273 commsize 2 commtype -16 data_type 13 root 0
>> MPITEST error (1): libmpitest.c:1608 i=117, int32_t value=-1, expected 117
>> MPITEST error (1): libmpitest.c:1578 i=195, char value=-1, expected -61
>> MPITEST error (1): 2 errors in buffer (17,4,12) len 273 commsize 2 commtype -13 data_type 13 root 1
>> MPITEST error (0): libmpitest.c:1608 i=117, int32_t value=-1, expected 118
>> MPITEST error (0): libmpitest.c:1578 i=195, char value=-1, expected -60
>> MPITEST error (0): 2 errors in buffer (17,4,12) len 273 commsize 2 commtype -13 data_type 13 root 0
>> MPITEST error (1): libmpitest.c:1608 i=117, int32_t value=-1, expected 117
>> MPITEST error (1): libmpitest.c:1578 i=195, char value=-1, expected -61
>> MPITEST error (1): 2 errors in buffer (17,6,12) len 273 commsize 2 commtype -15 data_type 13 root 0
>> MPITEST error (0): libmpitest.c:1608 i=117, int32_t value=-1, expected 117
>> MPITEST error (0): libmpitest.c:1578 i=195, char value=-1, expected -61
>> MPITEST error (0): 2 errors in buffer (17,6,12) len 273 commsize 2 commtype -15 data_type 13 root 0
>> MPITEST_results: MPI_Isend_ator: All Isend TO Root 8 tests FAILED (of 3744)
>> ---
>> Primary job terminated normally, but 1 process returned a non-zero
>> exit code. Per user-direction, the job has been aborted.
>> ---
>> --
>> mpirun detected that one or more processes exited with non-zero
>> status, thus causing the job to be terminated. The first process to do
>> so was:
>>
>> Process name: [[12363,1],0]
>> Exit code: 4
>> --
>>
>> [rvandevaart@drossetti-ivy1 src]$
>>
>>
>> Here is an error with the trunk, which is slightly different:
>>
>> [rvandevaart@drossetti-ivy1 src]$ mpirun --mca btl self,openib -np 2 -host drossetti-ivy0,drossetti-ivy1 --mca btl_openib_warn_default_gid_prefix 0 MPI_Isend_ator_c
>> [drossetti-ivy1.nvidia.com:22875] ../../../opal/datatype/opal_datatype_position.c:72
>> Pointer 0x1ad414c size 4 is outside [0x1ac1d20,0x1ad1d08] for base ptr 0x1ac1d20 count 273 and data
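George's request above (running the test with the datatype pack/unpack debug parameters enabled) would amount to a command line along these lines. This is a sketch based on the parameter names quoted in the message, not a verified invocation:

```shell
mpirun --mca btl self,openib -np 2 -host drossetti-ivy0,drossetti-ivy1 \
       --mca mpi_ddt_pack_debug 1 --mca mpi_ddt_unpack_debug 1 \
       MPI_Isend_ator_c
```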
[OMPI devel] MTT has migrated to git
For all of you who run MTT via an SVN checkout, note that MTT has now moved to git/GitHub: https://github.com/open-mpi/mtt/ You might want to get a new clone and remove your old SVN checkout.

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI devel] 1-question developer poll
For Open MPI:
- Primary: Mercurial (hosted on BitBucket - better deal for academia)
- Secondary: Git (hosted on either BitBucket or GitHub)
- SVN only to commit back

For other projects:
- SVN - becoming less commonly used, but still used for some projects like Open MPI
- Mercurial and Git - equally for various projects

Teaching students SCM, Git is probably the most difficult since the initial learning curve is steeper than Mercurial's, and they can easily get turned around with some of the more complex features they find on their own. SVN is the easiest to teach, but the most restrictive, and it requires a dedicated hosting server in the department.

We are having a similar discussion in our department at the moment regarding which SCM system we should expose students to in the upper-level courses. Currently, we have started (over the past year and a half) using Git in at least 2 classes. Previously, students were not really exposed to SCM except if they did some independent research. It is too early to tell how successful that has been.

-- Josh

On Wed, Apr 16, 2014 at 5:32 AM, Jeff Squyres (jsquyres) wrote:
> What source code repository technology(ies) do you use for Open MPI
> development? (indicate all that apply)
>
> - SVN
> - Mercurial
> - Git
>
> I ask this question because there's serious discussions afoot to switch
> OMPI's main SVN repo to Git, and I want to get a feel for the current
> landscape out there.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/04/14537.php

--
Joshua Hursey
Assistant Professor of Computer Science
University of Wisconsin-La Crosse
http://cs.uwlax.edu/~jjhursey
Re: [MTT devel] Converted to git
I assume this means that no one found any problems or has any changes. I'll be moving this to its permanent home on GitHub sometime soon and making the MTT SVN read-only.

Trac will be migrating to be git-based soon as well. Please do not use the MTT Trac until further notice.

Thanks!

On Apr 16, 2014, at 3:40 PM, Jeff Squyres (jsquyres) wrote:

> BTW, I used the following SVN<-->email addresses mapping for creating the git
> commits. Let me know if you want something different:
>
> adkulkar = Abhishek Kulkarni
> afriedle = Andrew Friedley
> brbarret = Brian Barrett
> cyeoh = Chris Yeoh
> em162155 = Ethan Mallove
> emallove = Ethan Mallove
> eugene = Eugene Loh
> hpcstork = Sven Stork
> jjhursey = Josh Hursey
> jsquyres = Jeff Squyres
> miked = Mike Dubman
> pasha = Pavel Shamis
> rusraink = Rainer Keller
> shiqing = Shiqing Fan
> timattox = Tim Mattox
>
>
> On Apr 16, 2014, at 3:37 PM, "Jeff Squyres (jsquyres)" wrote:
>
>> I have done a TRIAL conversion to git and pushed it to a demo repo at
>> GitHub. Please examine it and let me know if you see any problems:
>>
>> https://github.com/jsquyres/mtt-test
>>
>> Note that we converted references to "rXYZ" in log messages -- see
>> https://github.com/jsquyres/mtt-test/commit/ebb98c67677b02fa00064f8b1ae0d40941c305cd
>> for an example.
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>> ___
>> mtt-devel mailing list
>> mtt-de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/mtt-devel
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> mtt-devel mailing list
> mtt-de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/mtt-devel

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
[OMPI devel] coll/tuned MPI_Bcast can crash or silently fail when using distinct datatypes across tasks
Dear Open MPI developers,

I just created #4531 in order to track this issue: https://svn.open-mpi.org/trac/ompi/ticket/4531

Basically, the coll/tuned implementation of MPI_Bcast does not work when two tasks use datatypes of different sizes. For example, if the root sends two large vectors of MPI_INT and a non-root task receives many MPI_INT, then MPI_Bcast will crash. But if the root sends many MPI_INT and the non-root task receives two large vectors of MPI_INT, then MPI_Bcast will silently fail. (The Trac ticket has attached test cases.)

I believe this kind of issue could occur in all/most collectives of the coll/tuned module, so it is not limited to MPI_Bcast.

I am wondering what could be the best way to solve this.

One solution I could think of would be to generate temporary datatypes in order to send messages whose size is exactly the segment size.

Another solution I could think of would be to have new send/recv functions. If we consider the send function:

int mca_pml_ob1_send(void *buf, size_t count, ompi_datatype_t *datatype,
                     int dst, int tag, mca_pml_base_send_mode_t sendmode,
                     ompi_communicator_t *comm)

we could imagine having an xsend function:

int mca_pml_ob1_xsend(void *buf, size_t count, ompi_datatype_t *datatype,
                      size_t offset, size_t size, int dst, int tag,
                      mca_pml_base_send_mode_t sendmode,
                      ompi_communicator_t *comm)

where offset is the number of bytes that should be skipped from the beginning of buf, and size is the (max) number of bytes to be sent (e.g. the message will be "truncated" to size bytes if (count*size(datatype) - offset) > size).

Or we could use a buffer if needed, and send/recv with the MPI_PACKED datatype (this is less efficient; would it even work on heterogeneous nodes?).

Or we could simply consider that this is just a limitation of coll/tuned (coll/basic works fine) and do nothing.

Or something else I did not think of...

Thanks in advance for your feedback,

Gilles
Re: [OMPI devel] 1-question developer poll
git (GitHub mirror, git-svn, git patches)

--
Christoph Niethammer
High Performance Computing Center Stuttgart (HLRS)
Nobelstrasse 19
70569 Stuttgart
Tel: ++49(0)711-685-87203
email: nietham...@hlrs.de
http://www.hlrs.de/people/niethammer

- Original Message -
From: "Jeff Squyres (jsquyres)"
To: "Open MPI Developers List"
Sent: Wednesday, April 16, 2014 12:32:10 PM
Subject: [OMPI devel] 1-question developer poll

What source code repository technology(ies) do you use for Open MPI development? (indicate all that apply)

- SVN
- Mercurial
- Git

I ask this question because there's serious discussions afoot to switch OMPI's main SVN repo to Git, and I want to get a feel for the current landscape out there.

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/

___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: http://www.open-mpi.org/community/lists/devel/2014/04/14537.php