Re: [OMPI devel] coll/tuned MPI_Bcast can crash or silently fail when using distinct datatypes across tasks

2014-05-06 Thread George Bosilca
Any update on this? Can it be used in the RMA part?

  George.


On Wed, Apr 23, 2014 at 1:58 AM, Gilles Gouaillardet
 wrote:
> my bad :-(
>
> this has just been fixed
>
> Gilles
>
> On 2014/04/23 14:55, Nathan Hjelm wrote:
>> The ompi_datatype_flatten.c file appears to be missing. Let me know once
>> it is committed and I will take a look. I will see if I can write the
>> RMA code using it over the next week or so.
>>
>


Re: [OMPI devel] coll/tuned MPI_Bcast can crash or silently fail when using distinct datatypes across tasks

2014-04-23 Thread Gilles Gouaillardet
my bad :-(

this has just been fixed

Gilles

On 2014/04/23 14:55, Nathan Hjelm wrote:
> The ompi_datatype_flatten.c file appears to be missing. Let me know once
> it is committed and I will take a look. I will see if I can write the
> RMA code using it over the next week or so.
>



Re: [OMPI devel] coll/tuned MPI_Bcast can crash or silently fail when using distinct datatypes across tasks

2014-04-23 Thread Nathan Hjelm
The ompi_datatype_flatten.c file appears to be missing. Let me know once
it is committed and I will take a look. I will see if I can write the
RMA code using it over the next week or so.

-Nathan

On Wed, Apr 23, 2014 at 02:43:12PM +0900, Gilles Gouaillardet wrote:
> Nathan,
> 
> I uploaded this part to GitHub:
> https://github.com/ggouaillardet/ompi-svn-mirror/tree/flatten-datatype
>
> You really need to check the last commit:
> https://github.com/ggouaillardet/ompi-svn-mirror/commit/a8d014c6f144fa5732bdd25f8b6b05b07ea8
>
> Please consider this experimental and poorly tested.
> That being said, it is only an addition to existing code, so it does not
> break anything and could be pushed to the trunk.
> 
> Gilles
> 
> On 2014/04/23 0:05, Hjelm, Nathan T wrote:
> > I need the flatten datatype call for handling true RDMA in the one-sided
> > code as well. Is there a plan to implement this feature soon?
> >
> 




Re: [OMPI devel] coll/tuned MPI_Bcast can crash or silently fail when using distinct datatypes across tasks

2014-04-23 Thread Gilles Gouaillardet
George,

I am sorry, I cannot see how a flattened datatype can be helpful here :-(

In this example, the master must broadcast a long vector. This datatype
is contiguous, so the flattened datatype *is* the type provided by the
MPI application.

How would pipelining happen in this case (e.g. who has to cut the long
vector into pieces, and how)?

Should a temporary buffer be used? And should it then be sent in pieces
of type MPI_PACKED?
(And if so, would this be safe with a heterogeneous communicator?)
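For the sake of discussion, this is the kind of thing I have in mind
(an untested sketch only; bcast_via_pack and seg_size are made-up names,
and it assumes the packed stream can be unpacked on every peer, which is
exactly the open question above):

#include <mpi.h>
#include <stdlib.h>

/* every rank agrees on the packed size (taken from the root) and on the
 * segment boundaries, no matter how its own datatype is structured */
static void bcast_via_pack(void *buf, int count, MPI_Datatype dtype,
                           int root, MPI_Comm comm, int seg_size)
{
    int rank, packed_size = 0, pos = 0;
    MPI_Comm_rank(comm, &rank);

    if (rank == root) {
        MPI_Pack_size(count, dtype, comm, &packed_size);
    }
    MPI_Bcast(&packed_size, 1, MPI_INT, root, comm);

    char *tmp = malloc(packed_size);
    if (rank == root) {
        MPI_Pack(buf, count, dtype, tmp, packed_size, &pos, comm);
    }

    /* broadcast the packed stream in fixed-size MPI_PACKED pieces */
    for (int off = 0; off < packed_size; off += seg_size) {
        int len = packed_size - off < seg_size ? packed_size - off : seg_size;
        MPI_Bcast(tmp + off, len, MPI_PACKED, root, comm);
    }

    if (rank != root) {
        pos = 0;
        MPI_Unpack(tmp, packed_size, &pos, buf, count, dtype, comm);
    }
    free(tmp);
}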

Thanks in advance for your insights,

Gilles

On 2014/04/22 12:04, George Bosilca wrote:
> Indeed there are many potential solutions, but all require too much
> intervention on the code to be generic enough. As we discussed
> privately mid last year, the "flatten datatype" approach seems to me
> to be the most profitable. It is simple to implement and it is also
> generic: a simple change will make all pipelined collectives work (not
> only tuned but all the others as well).



Re: [OMPI devel] coll/tuned MPI_Bcast can crash or silently fail when using distinct datatypes across tasks

2014-04-23 Thread Gilles Gouaillardet
Nathan,

I uploaded this part to GitHub:
https://github.com/ggouaillardet/ompi-svn-mirror/tree/flatten-datatype

You really need to check the last commit:
https://github.com/ggouaillardet/ompi-svn-mirror/commit/a8d014c6f144fa5732bdd25f8b6b05b07ea8

Please consider this experimental and poorly tested.
That being said, it is only an addition to existing code, so it does not
break anything and could be pushed to the trunk.

Gilles

On 2014/04/23 0:05, Hjelm, Nathan T wrote:
> I need the flatten datatype call for handling true RDMA in the one-sided code
> as well. Is there a plan to implement this feature soon?
>



Re: [OMPI devel] coll/tuned MPI_Bcast can crash or silently fail when using distinct datatypes across tasks

2014-04-22 Thread Hjelm, Nathan T
I need the flatten datatype call for handling true RDMA in the one-sided code
as well. Is there a plan to implement this feature soon?

-Nathan

From: devel [devel-boun...@open-mpi.org] on behalf of George Bosilca 
[bosi...@icl.utk.edu]
Sent: Monday, April 21, 2014 9:04 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] coll/tuned MPI_Bcast can crash or silently fail when 
using distinct datatypes across tasks

Indeed there are many potential solutions, but all require too much
intervention on the code to be generic enough. As we discussed
privately mid last year, the "flatten datatype" approach seems to me
to be the most profitable. It is simple to implement and it is also
generic: a simple change will make all pipelined collectives work (not
only tuned but all the others as well).

Use a flattened datatype instead of the one provided by the MPI
application. The flattened datatype will have the same type map as the
original data, but with everything at a single level. As the MPI
standard requires all collectives to use a datatype*count with the same
type signature, this flattened datatype will allow all the peers in a
collective to have a consistent view of the operations to be done, and
as a result to use the same sane pipelining boundaries.

  George.

On Thu, Apr 17, 2014 at 5:02 AM, Gilles Gouaillardet
<gilles.gouaillar...@iferc.org> wrote:
> Dear OpenMPI developers,
>
> I just created #4531 in order to track this issue:
> https://svn.open-mpi.org/trac/ompi/ticket/4531
>
> Basically, the coll/tuned implementation of MPI_Bcast does not work when
> two tasks use datatypes of different sizes.
> For example, if the root sends two large vectors of MPI_INT and the
> non-root tasks receive many MPI_INT, then MPI_Bcast will crash.
> But if the root sends many MPI_INT and the non-root tasks receive two
> large vectors of MPI_INT, then MPI_Bcast will silently fail.
> (Test cases are attached to the Trac ticket.)
>
> I believe this kind of issue could occur with all/most collectives of
> the coll/tuned module, so it is not limited to MPI_Bcast.
>
> I am wondering what the best way to solve this could be.
>
> One solution I could think of would be to generate temporary datatypes
> in order to send messages whose size is exactly the segment_size.
>
> Another solution I could think of would be to have new send/recv
> functions.
> If we consider the send function:
> int mca_pml_ob1_send(void *buf,
>                      size_t count,
>                      ompi_datatype_t * datatype,
>                      int dst,
>                      int tag,
>                      mca_pml_base_send_mode_t sendmode,
>                      ompi_communicator_t * comm)
>
> We could imagine having an xsend function:
> int mca_pml_ob1_xsend(void *buf,
>                       size_t count,
>                       ompi_datatype_t * datatype,
>                       size_t offset,
>                       size_t size,
>                       int dst,
>                       int tag,
>                       mca_pml_base_send_mode_t sendmode,
>                       ompi_communicator_t * comm)
>
> where offset is the number of bytes to skip from the beginning of buf,
> and size is the (maximum) number of bytes to be sent (i.e. the message
> will be "truncated" to size bytes if (count*size(datatype) - offset) >
> size).
>
> Or we could use a buffer if needed and send/recv with the MPI_PACKED
> datatype (this is less efficient; would it even work on heterogeneous
> nodes?).
>
> Or we could simply consider this a limitation of coll/tuned (coll/basic
> works fine) and do nothing.
>
> Or something else I did not think of ...
>
> Thanks in advance for your feedback,
>
> Gilles


Re: [OMPI devel] coll/tuned MPI_Bcast can crash or silently fail when using distinct datatypes across tasks

2014-04-22 Thread George Bosilca
Indeed there are many potential solutions, but all require too much
intervention on the code to be generic enough. As we discussed
privately mid last year, the "flatten datatype" approach seems to me
to be the most profitable. It is simple to implement and it is also
generic: a simple change will make all pipelined collectives work (not
only tuned but all the others as well).

Use a flattened datatype instead of the one provided by the MPI
application. The flattened datatype will have the same type map as the
original data, but with everything at a single level. As the MPI
standard requires all collectives to use a datatype*count with the same
type signature, this flattened datatype will allow all the peers in a
collective to have a consistent view of the operations to be done, and
as a result to use the same sane pipelining boundaries.
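To make the idea a bit more concrete, here is roughly what a flat
equivalent of a vector of MPI_INT looks like, built with standard MPI
calls only (illustration only, not the proposed ompi_datatype_flatten
interface; flat_int_vector is a made-up name):

#include <mpi.h>
#include <stdlib.h>

/* same type map as MPI_Type_vector(count, blocklen, stride, MPI_INT),
 * but expressed as a single level of (MPI_INT, byte displacement)
 * blocks, so every peer can cut it at the same boundaries */
static MPI_Datatype flat_int_vector(int count, int blocklen, int stride)
{
    MPI_Datatype flat;
    int *blocklens = malloc(count * sizeof(int));
    MPI_Aint *disps = malloc(count * sizeof(MPI_Aint));

    for (int i = 0; i < count; i++) {
        blocklens[i] = blocklen;
        disps[i] = (MPI_Aint)i * stride * (MPI_Aint)sizeof(int);
    }
    MPI_Type_create_hindexed(count, blocklens, disps, MPI_INT, &flat);
    MPI_Type_commit(&flat);

    free(blocklens);
    free(disps);
    return flat;
}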

  George.

On Thu, Apr 17, 2014 at 5:02 AM, Gilles Gouaillardet
 wrote:
> Dear OpenMPI developers,
>
> I just created #4531 in order to track this issue:
> https://svn.open-mpi.org/trac/ompi/ticket/4531
>
> Basically, the coll/tuned implementation of MPI_Bcast does not work when
> two tasks use datatypes of different sizes.
> For example, if the root sends two large vectors of MPI_INT and the
> non-root tasks receive many MPI_INT, then MPI_Bcast will crash.
> But if the root sends many MPI_INT and the non-root tasks receive two
> large vectors of MPI_INT, then MPI_Bcast will silently fail.
> (Test cases are attached to the Trac ticket.)
>
> I believe this kind of issue could occur with all/most collectives of
> the coll/tuned module, so it is not limited to MPI_Bcast.
>
> I am wondering what the best way to solve this could be.
>
> One solution I could think of would be to generate temporary datatypes
> in order to send messages whose size is exactly the segment_size.
>
> Another solution I could think of would be to have new send/recv
> functions.
> If we consider the send function:
> int mca_pml_ob1_send(void *buf,
>                      size_t count,
>                      ompi_datatype_t * datatype,
>                      int dst,
>                      int tag,
>                      mca_pml_base_send_mode_t sendmode,
>                      ompi_communicator_t * comm)
>
> We could imagine having an xsend function:
> int mca_pml_ob1_xsend(void *buf,
>                       size_t count,
>                       ompi_datatype_t * datatype,
>                       size_t offset,
>                       size_t size,
>                       int dst,
>                       int tag,
>                       mca_pml_base_send_mode_t sendmode,
>                       ompi_communicator_t * comm)
>
> where offset is the number of bytes to skip from the beginning of buf,
> and size is the (maximum) number of bytes to be sent (i.e. the message
> will be "truncated" to size bytes if (count*size(datatype) - offset) >
> size).
>
> Or we could use a buffer if needed and send/recv with the MPI_PACKED
> datatype (this is less efficient; would it even work on heterogeneous
> nodes?).
>
> Or we could simply consider this a limitation of coll/tuned (coll/basic
> works fine) and do nothing.
>
> Or something else I did not think of ...
>
> Thanks in advance for your feedback,
>
> Gilles


[OMPI devel] coll/tuned MPI_Bcast can crash or silently fail when using distinct datatypes across tasks

2014-04-17 Thread Gilles Gouaillardet
Dear OpenMPI developers,

I just created #4531 in order to track this issue:
https://svn.open-mpi.org/trac/ompi/ticket/4531

Basically, the coll/tuned implementation of MPI_Bcast does not work when
two tasks use datatypes of different sizes.
For example, if the root sends two large vectors of MPI_INT and the
non-root tasks receive many MPI_INT, then MPI_Bcast will crash.
But if the root sends many MPI_INT and the non-root tasks receive two
large vectors of MPI_INT, then MPI_Bcast will silently fail.
(Test cases are attached to the Trac ticket.)
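For illustration only (this is not one of the attached test cases;
VECLEN and the use of MPI_Type_contiguous are arbitrary), the mismatch
looks like this:

#include <mpi.h>
#include <stdlib.h>

#define VECLEN (1 << 20)  /* large enough for coll/tuned to segment the bcast */

int main(int argc, char *argv[])
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int *buf = calloc(2 * VECLEN, sizeof(int));

    if (0 == rank) {
        /* root: two large "vectors" of MPI_INT */
        MPI_Datatype vec;
        MPI_Type_contiguous(VECLEN, MPI_INT, &vec);
        MPI_Type_commit(&vec);
        MPI_Bcast(buf, 2, vec, 0, MPI_COMM_WORLD);
        MPI_Type_free(&vec);
    } else {
        /* non-root: the same bytes described as many MPI_INT */
        MPI_Bcast(buf, 2 * VECLEN, MPI_INT, 0, MPI_COMM_WORLD);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}

The type signature is the same on both sides, so this is a legal MPI
program.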

I believe this kind of issue could occur with all/most collectives of
the coll/tuned module, so it is not limited to MPI_Bcast.


I am wondering what the best way to solve this could be.

One solution I could think of would be to generate temporary datatypes
in order to send messages whose size is exactly the segment_size.
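Something along these lines, for the trivial case only (make_segment_type
is a made-up name; a segment boundary that falls inside a basic element,
or inside a nested derived type, is exactly what makes this option
awkward):

#include <mpi.h>

/* build a temporary datatype covering exactly segment_size bytes of
 * basetype elements, assuming the boundary falls on an element boundary */
static int make_segment_type(MPI_Datatype basetype, int segment_size,
                             MPI_Datatype *segtype)
{
    int type_size;
    MPI_Type_size(basetype, &type_size);
    if (0 != segment_size % type_size) {
        return MPI_ERR_ARG;   /* boundary falls inside an element */
    }
    MPI_Type_contiguous(segment_size / type_size, basetype, segtype);
    return MPI_Type_commit(segtype);
}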

Another solution I could think of would be to have new send/recv
functions.
If we consider the send function:
int mca_pml_ob1_send(void *buf,
                     size_t count,
                     ompi_datatype_t * datatype,
                     int dst,
                     int tag,
                     mca_pml_base_send_mode_t sendmode,
                     ompi_communicator_t * comm)

We could imagine having an xsend function:
int mca_pml_ob1_xsend(void *buf,
                      size_t count,
                      ompi_datatype_t * datatype,
                      size_t offset,
                      size_t size,
                      int dst,
                      int tag,
                      mca_pml_base_send_mode_t sendmode,
                      ompi_communicator_t * comm)

where offset is the number of bytes to skip from the beginning of buf,
and size is the (maximum) number of bytes to be sent (i.e. the message
will be "truncated" to size bytes if (count*size(datatype) - offset) >
size).
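A sender-side loop could then look something like this (hypothetical
only, since xsend does not exist; segmented_send and segment_size are
made-up names, and the usual OMPI headers are assumed to be in scope):

/* both peers cut the message at the same byte offsets, independently of
 * how each side described the data */
static int segmented_send(void *buf, size_t count, ompi_datatype_t *datatype,
                          int dst, int tag, ompi_communicator_t *comm,
                          size_t segment_size)
{
    size_t type_size, total;

    ompi_datatype_type_size(datatype, &type_size);
    total = type_size * count;

    for (size_t offset = 0; offset < total; offset += segment_size) {
        size_t len = total - offset < segment_size ? total - offset
                                                   : segment_size;
        int rc = mca_pml_ob1_xsend(buf, count, datatype, offset, len, dst,
                                   tag, MCA_PML_BASE_SEND_STANDARD, comm);
        if (OMPI_SUCCESS != rc) {
            return rc;
        }
    }
    return OMPI_SUCCESS;
}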

Or we could use a buffer if needed and send/recv with the MPI_PACKED
datatype (this is less efficient; would it even work on heterogeneous
nodes?).

Or we could simply consider this a limitation of coll/tuned (coll/basic
works fine) and do nothing.

Or something else I did not think of ...


Thanks in advance for your feedback,

Gilles