Re: [OMPI devel] Possible bug with derived datatypes and openib BTL in trunk

2014-04-17 Thread Rolf vandeVaart
I sent this information to George off the mailing list since the attachment was 
somewhat large.
It is still strange that I seem to be the only one who sees this.

>-----Original Message-----
>From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of George
>Bosilca
>Sent: Wednesday, April 16, 2014 4:24 PM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] Possible bug with derived datatypes and openib
>BTL in trunk
>
>Rolf,
>
>I didn't see these on my check run. Can you run the MPI_Isend_ator test with
>mpi_ddt_pack_debug and mpi_ddt_unpack_debug set to 1? I would be
>interested in the output you get on your machine.
>
>George.
>
>
>On Apr 16, 2014, at 14:34 , Rolf vandeVaart  wrote:
>
>> I have seen errors when running the Intel test suite using the openib BTL
>> when transferring derived datatypes.  I do not see the error with the sm or
>> tcp BTLs.  The errors begin after this checkin.
>>
>> https://svn.open-mpi.org/trac/ompi/changeset/31370
>> Timestamp: 04/11/14 16:06:56 (5 days ago)
>> Author: bosilca
>> Message: Reshape all the packing/unpacking functions to use the same
>> skeleton. Rewrite the generic_unpacking to take advantage of the same
>> capabilities.
>>
>> Does anyone else see errors?  Here is an example running with r31370:
>>
>> [rvandevaart@drossetti-ivy1 src]$ mpirun --mca btl self,openib -np 2 -host drossetti-ivy0,drossetti-ivy1 --mca btl_openib_warn_default_gid_prefix 0 MPI_Isend_ator_c
>> MPITEST error (1): libmpitest.c:1608 i=117, int32_t value=-1, expected 117
>> MPITEST error (1): libmpitest.c:1578 i=195, char value=-1, expected -61
>> MPITEST error (1): 2 errors in buffer (17,0,12) len 273 commsize 2 commtype -10 data_type 13 root 1
>> MPITEST error (1): libmpitest.c:1608 i=117, int32_t value=-1, expected 117
>> MPITEST error (1): libmpitest.c:1578 i=195, char value=-1, expected -61
>> MPITEST error (1): 2 errors in buffer (17,2,12) len 273 commsize 2 commtype -16 data_type 13 root 1
>> MPITEST info  (0): Starting MPI_Isend_ator: All Isend TO Root test
>> MPITEST info  (0): Node spec MPITEST_comm_sizes[6]=2 too large, using 1
>> MPITEST info  (0): Node spec MPITEST_comm_sizes[22]=2 too large, using 1
>> MPITEST info  (0): Node spec MPITEST_comm_sizes[32]=2 too large, using 1
>> MPITEST error (0): libmpitest.c:1608 i=117, int32_t value=-1, expected 118
>> MPITEST error (0): libmpitest.c:1578 i=195, char value=-1, expected -60
>> MPITEST error (0): 2 errors in buffer (17,0,12) len 273 commsize 2 commtype -10 data_type 13 root 0
>> MPITEST error (0): libmpitest.c:1608 i=117, int32_t value=-1, expected 118
>> MPITEST error (0): libmpitest.c:1578 i=195, char value=-1, expected -60
>> MPITEST error (0): 2 errors in buffer (17,2,12) len 273 commsize 2 commtype -16 data_type 13 root 0
>> MPITEST error (1): libmpitest.c:1608 i=117, int32_t value=-1, expected 117
>> MPITEST error (1): libmpitest.c:1578 i=195, char value=-1, expected -61
>> MPITEST error (1): 2 errors in buffer (17,4,12) len 273 commsize 2 commtype -13 data_type 13 root 1
>> MPITEST error (0): libmpitest.c:1608 i=117, int32_t value=-1, expected 118
>> MPITEST error (0): libmpitest.c:1578 i=195, char value=-1, expected -60
>> MPITEST error (0): 2 errors in buffer (17,4,12) len 273 commsize 2 commtype -13 data_type 13 root 0
>> MPITEST error (1): libmpitest.c:1608 i=117, int32_t value=-1, expected 117
>> MPITEST error (1): libmpitest.c:1578 i=195, char value=-1, expected -61
>> MPITEST error (1): 2 errors in buffer (17,6,12) len 273 commsize 2 commtype -15 data_type 13 root 0
>> MPITEST error (0): libmpitest.c:1608 i=117, int32_t value=-1, expected 117
>> MPITEST error (0): libmpitest.c:1578 i=195, char value=-1, expected -61
>> MPITEST error (0): 2 errors in buffer (17,6,12) len 273 commsize 2 commtype -15 data_type 13 root 0
>> MPITEST_results: MPI_Isend_ator: All Isend TO Root 8 tests FAILED (of 3744)
>> ---
>> Primary job  terminated normally, but 1 process returned a non-zero
>> exit code.. Per user-direction, the job has been aborted.
>> ---
>> --
>>  mpirun detected that one or more processes exited with non-zero
>> status, thus causing the job to be terminated. The first process to do
>> so was:
>>
>>  Process name: [[12363,1],0]
>>  Exit code:4
>> --
>> 
>> [rvandevaart@drossetti-ivy1 src]$
>>
>>
>> Here is an error with the trunk which is slightly different.
>> [rvandevaart@drossetti-ivy1 src]$ mpirun --mca btl self,openib -np 2 -host drossetti-ivy0,drossetti-ivy1 --mca btl_openib_warn_default_gid_prefix 0 MPI_Isend_ator_c
>> [drossetti-ivy1.nvidia.com:22875] ../../../opal/datatype/opal_datatype_position.c:72
>>  Pointer 0x1ad414c size 4 is outside [0x1ac1d20,0x1ad1d08] for
>>  base ptr 0x1ac1d20 count 273 and data

[OMPI devel] MTT has migrated to git

2014-04-17 Thread Jeff Squyres (jsquyres)
For all of you who run MTT via a SVN checkout, note that MTT has now moved to 
git/github:

https://github.com/open-mpi/mtt/

You might want to get a new clone and remove your old SVN checkout.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] 1-question developer poll

2014-04-17 Thread Josh Hursey
For Open MPI:
 - Primary: Mercurial (hosted on BitBucket - better deal for academia)
 - Secondary: Git (hosted on either BitBucket or GitHub)
 - SVN only to commit back

For other projects:
 - SVN - Becoming less commonly used, but still used for some projects like
Open MPI
 - Mercurial and Git - equally for various projects.

When teaching students SCM, Git is probably the most difficult since its
initial learning curve is steeper than Mercurial's, and they can easily get
turned around by some of the more complex features they find on their own.
SVN is the easiest to teach, but the most restrictive, and it requires a
dedicated hosting server in the department.

We are having a similar discussion in our department at the moment
regarding which SCM system we should expose students to in the upper level
courses. Currently, we have started (past year and a half) using Git in at
least 2 classes. Previously, students were not really exposed to SCM except
if they did some independent research. It is too early to tell how
successful that has been.

-- Josh



On Wed, Apr 16, 2014 at 5:32 AM, Jeff Squyres (jsquyres)  wrote:

> What source code repository technology(ies) do you use for Open MPI
> development? (indicate all that apply)
>
> - SVN
> - Mercurial
> - Git
>
> I ask this question because there's serious discussions afoot to switch
> OMPI's main SVN repo to Git, and I want to get a feel for the current
> landscape out there.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/04/14537.php
>



-- 
Joshua Hursey
Assistant Professor of Computer Science
University of Wisconsin-La Crosse
http://cs.uwlax.edu/~jjhursey


Re: [MTT devel] Converted to git

2014-04-17 Thread Jeff Squyres (jsquyres)
I assume this means that no one found any problems or has any changes.

I'll be moving this to its permanent home on github sometime soon and making 
the MTT SVN be read-only.  Trac will be migrating to be git-based soon as well.

Please do not use the MTT trac until further notice.  Thanks!


On Apr 16, 2014, at 3:40 PM, Jeff Squyres (jsquyres)  wrote:

> BTW, I used the following SVN<-->email addresses mapping for creating the git 
> commits.  Let me know if you want something different:
> 
> adkulkar = Abhishek Kulkarni 
> afriedle = Andrew Friedley 
> brbarret = Brian Barrett 
> cyeoh = Chris Yeoh 
> em162155 = Ethan Mallove 
> emallove = Ethan Mallove 
> eugene = Eugene Loh 
> hpcstork = Sven Stork 
> jjhursey = Josh Hursey 
> jsquyres = Jeff Squyres 
> miked = Mike Dubman 
> pasha = Pavel Shamis 
> rusraink = Rainer Keller 
> shiqing = Shiqing Fan 
> timattox = Tim Mattox 
> 
> 
> On Apr 16, 2014, at 3:37 PM, "Jeff Squyres (jsquyres)"  
> wrote:
> 
>> I have done a TRIAL conversion to git and pushed it to a demo repo at 
>> github.  Please examine it and let me know if you see any problems:
>> 
>>   https://github.com/jsquyres/mtt-test
>> 
>> Note that we converted references to "rXYZ" in log messages -- see 
>> https://github.com/jsquyres/mtt-test/commit/ebb98c67677b02fa00064f8b1ae0d40941c305cd
>>  for an example.
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to: 
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
>> ___
>> mtt-devel mailing list
>> mtt-de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/mtt-devel
> 
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



[OMPI devel] coll/tuned MPI_Bcast can crash or silently fail when using distinct datatypes across tasks

2014-04-17 Thread Gilles Gouaillardet
Dear OpenMPI developers,

I just created ticket #4531 to track this issue:
https://svn.open-mpi.org/trac/ompi/ticket/4531

Basically, the coll/tuned implementation of MPI_Bcast does not work when
two tasks use datatypes of different sizes.
For example, if the root sends two large vectors of MPI_INT and the
non-root tasks receive many MPI_INT, then MPI_Bcast will crash.
But if the root sends many MPI_INT and the non-root tasks receive two
large vectors of MPI_INT, then MPI_Bcast will silently fail.
(The Trac ticket has test cases attached.)
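
To illustrate why the two sides can disagree, here is a simplified model
in C. It is NOT the coll/tuned source: the function name, the rule that a
datatype larger than the segment is sent unsegmented, and the 128 KiB
segment size used below are all assumptions made for the sake of the
example. The point is only that each task derives its message count from
its own datatype size.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model (an assumption, not the coll/tuned code): how many
 * messages a task would use for `count` elements of `type_size` bytes
 * when it tries to cut the data into segments of `segment_size` bytes. */
size_t model_num_messages(size_t count, size_t type_size, size_t segment_size)
{
    assert(type_size > 0 && segment_size > 0);
    if (count == 0) {
        return 0;
    }
    /* A single element does not fit in a segment: no segmentation. */
    if (type_size > segment_size) {
        return 1;
    }
    size_t per_segment = segment_size / type_size;  /* whole elements per message */
    return (count + per_segment - 1) / per_segment; /* ceiling division */
}
```

Under these assumptions, a root describing 4,000,000 bytes as 2 vectors
of 2,000,000 bytes each would use 1 message, while a non-root task
describing the same bytes as 1,000,000 four-byte integers would expect
31 messages, so the sends and receives no longer match.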

I believe this kind of issue could occur in all or most collectives of the
coll/tuned module, so it is not limited to MPI_Bcast.


I am wondering what the best way to solve this would be.

One solution I could think of would be to generate temporary datatypes
in order to send messages whose size is exactly the segment_size.

Another solution I could think of would be to have new send/recv
functions.
If we consider the send function:
int mca_pml_ob1_send(void *buf,
 size_t count,
 ompi_datatype_t * datatype,
 int dst,
 int tag,
 mca_pml_base_send_mode_t sendmode,
 ompi_communicator_t * comm)

We could imagine an xsend function:
int mca_pml_ob1_xsend(void *buf,
 size_t count,
 ompi_datatype_t * datatype,
 size_t offset,   /* bytes to skip from the beginning of buf */
 size_t size,     /* maximum number of bytes to send */
 int dst,
 int tag,
 mca_pml_base_send_mode_t sendmode,
 ompi_communicator_t * comm)

where offset is the number of bytes to skip from the beginning of buf,
and size is the (maximum) number of bytes to send, i.e. the message is
"truncated" to size bytes if (count*size(datatype) - offset) > size.

Or we could use a buffer if needed, and send/recv with the MPI_PACKED
datatype (this is less efficient; would it even work on heterogeneous
nodes?).

Or we could simply consider this a limitation of coll/tuned
(coll/basic works fine) and do nothing.

Or something else I did not think of ...
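
As a rough sketch of the offset/size idea proposed for xsend above, the
following C helpers compute how many bytes one such call would transmit
and how many calls a task would need. Both function names are
hypothetical and nothing here exists in Open MPI; the sketch only
restates the rule "truncate to size bytes if
(count*size(datatype) - offset) > size" and assumes that every task
segments by total byte count rather than by element count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes transmitted by one xsend-style call.
 * total = count * type_size is assumed not to overflow size_t. */
size_t xsend_payload_bytes(size_t count, size_t type_size,
                           size_t offset, size_t size)
{
    size_t total = count * type_size;    /* bytes described by (count, datatype) */
    if (offset >= total) {
        return 0;                        /* nothing left past the offset */
    }
    size_t remaining = total - offset;
    return remaining > size ? size : remaining;  /* truncate to `size` bytes */
}

/* Hypothetical helper: number of calls needed when every call carries at
 * most segment_size bytes. It depends only on the total byte count. */
size_t xsend_num_segments(size_t count, size_t type_size, size_t segment_size)
{
    assert(segment_size > 0);
    size_t total = count * type_size;
    return (total + segment_size - 1) / segment_size;  /* ceiling division */
}
```

With a 131072-byte segment (an arbitrary value chosen for illustration),
2 elements of 2,000,000 bytes and 1,000,000 elements of 4 bytes both
yield 31 segments, and the last segment carries 67840 bytes on both
sides, because only the total byte count enters the computation.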


Thanks in advance for your feedback.

Gilles


Re: [OMPI devel] 1-question developer poll

2014-04-17 Thread Christoph Niethammer
git (Github mirror, git-svn, git patches)

--

Christoph Niethammer
High Performance Computing Center Stuttgart (HLRS)
Nobelstrasse 19
70569 Stuttgart

Tel: ++49(0)711-685-87203
email: nietham...@hlrs.de
http://www.hlrs.de/people/niethammer



----- Original Message -----
From: "Jeff Squyres (jsquyres)" 
To: "Open MPI Developers List" 
Sent: Wednesday, April 16, 2014 12:32:10 PM
Subject: [OMPI devel] 1-question developer poll

What source code repository technology(ies) do you use for Open MPI 
development? (indicate all that apply)

- SVN
- Mercurial
- Git

I ask this question because there's serious discussions afoot to switch OMPI's 
main SVN repo to Git, and I want to get a feel for the current landscape out 
there.
