Re: [OMPI devel] coll/tuned MPI_Bcast can crash or silently fail when using distinct datatypes across tasks
Any update on this? Can it be used in the RMA part?

  George.

On Wed, Apr 23, 2014 at 1:58 AM, Gilles Gouaillardet wrote:
> my bad :-(
>
> this has just been fixed
>
> Gilles
>
> On 2014/04/23 14:55, Nathan Hjelm wrote:
>> The ompi_datatype_flatten.c file appears to be missing. Let me know once
>> it is committed and I will take a look. I will see if I can write the
>> RMA code using it over the next week or so.

_______________________________________________
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: http://www.open-mpi.org/community/lists/devel/2014/04/14582.php
Re: [OMPI devel] coll/tuned MPI_Bcast can crash or silently fail when using distinct datatypes across tasks
my bad :-(

this has just been fixed

Gilles

On 2014/04/23 14:55, Nathan Hjelm wrote:
> The ompi_datatype_flatten.c file appears to be missing. Let me know once
> it is committed and I will take a look. I will see if I can write the
> RMA code using it over the next week or so.
Re: [OMPI devel] coll/tuned MPI_Bcast can crash or silently fail when using distinct datatypes across tasks
The ompi_datatype_flatten.c file appears to be missing. Let me know once
it is committed and I will take a look. I will see if I can write the
RMA code using it over the next week or so.

-Nathan

On Wed, Apr 23, 2014 at 02:43:12PM +0900, Gilles Gouaillardet wrote:
> Nathan,
>
> I uploaded this part to github :
> https://github.com/ggouaillardet/ompi-svn-mirror/tree/flatten-datatype
>
> You really need to check the last commit :
> https://github.com/ggouaillardet/ompi-svn-mirror/commit/a8d014c6f144fa5732bdd25f8b6b05b07ea8
>
> Please consider this experimental and poorly tested.
> That being said, it is only an addition to existing code, so it does not
> break anything and could be pushed to the trunk.
>
> Gilles
>
> On 2014/04/23 0:05, Hjelm, Nathan T wrote:
> > I need the flatten datatype call for handling true rdma in the one-sided
> > code as well. Is there a plan to implement this feature soon?

Link to this post: http://www.open-mpi.org/community/lists/devel/2014/04/14579.php
Re: [OMPI devel] coll/tuned MPI_Bcast can crash or silently fail when using distinct datatypes across tasks
George,

I am sorry, but I cannot see how a flattened datatype can be helpful
here :-(

In this example, the master must broadcast a long vector. This datatype
is contiguous, so the flattened datatype *is* the type provided by the
MPI application.

How would pipelining happen in this case (e.g. who has to cut the long
vector into pieces, and how) ? Should a temporary buffer be used, and
should it then be sent in pieces of type MPI_PACKED ? (and if yes, would
this be safe in a heterogeneous communicator ?)

Thanks in advance for your insights,

Gilles

On 2014/04/22 12:04, George Bosilca wrote:
> Indeed there are many potential solutions, but all require too much
> intervention on the code to be generic enough. As we discussed
> privately mid last year, the "flatten datatype" approach seems to me
> to be the most profitable. It is simple to implement and it is also
> generic: a simple change will make all pipelined collectives work (not
> only tuned but all the others as well).
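For the contiguous case Gilles asks about, cutting the long vector into pipeline pieces only needs byte arithmetic, no temporary datatype or pack buffer. The following is a minimal illustrative sketch (plain C, not Open MPI code; the helper name is made up) of how each segment could be described by an offset and a length:

```c
#include <stddef.h>

/* Illustrative sketch (not Open MPI code): length in bytes of pipeline
 * segment `seg` when a contiguous buffer of `total_bytes` is cut into
 * chunks of at most `seg_bytes` each. The last segment may be short. */
static size_t segment_length(size_t total_bytes, size_t seg_bytes, size_t seg)
{
    size_t off = seg * seg_bytes;          /* first byte of this segment */
    if (off >= total_bytes)
        return 0;                          /* past the end of the buffer */
    size_t rem = total_bytes - off;
    return rem < seg_bytes ? rem : seg_bytes;
}
```

For example, a 10-byte buffer with 4-byte segments yields pieces of 4, 4, and 2 bytes; the ranks agree on these boundaries because they are computed from byte counts alone.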
Re: [OMPI devel] coll/tuned MPI_Bcast can crash or silently fail when using distinct datatypes across tasks
Nathan,

I uploaded this part to github :
https://github.com/ggouaillardet/ompi-svn-mirror/tree/flatten-datatype

You really need to check the last commit :
https://github.com/ggouaillardet/ompi-svn-mirror/commit/a8d014c6f144fa5732bdd25f8b6b05b07ea8

Please consider this experimental and poorly tested.
That being said, it is only an addition to existing code, so it does not
break anything and could be pushed to the trunk.

Gilles

On 2014/04/23 0:05, Hjelm, Nathan T wrote:
> I need the flatten datatype call for handling true rdma in the one-sided
> code as well. Is there a plan to implement this feature soon?
Re: [OMPI devel] coll/tuned MPI_Bcast can crash or silently fail when using distinct datatypes across tasks
I need the flatten datatype call for handling true rdma in the one-sided
code as well. Is there a plan to implement this feature soon?

-Nathan

From: devel [devel-boun...@open-mpi.org] on behalf of George Bosilca [bosi...@icl.utk.edu]
Sent: Monday, April 21, 2014 9:04 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] coll/tuned MPI_Bcast can crash or silently fail when using distinct datatypes across tasks

Indeed there are many potential solutions, but all require too much
intervention on the code to be generic enough. As we discussed privately
mid last year, the "flatten datatype" approach seems to me to be the
most profitable. It is simple to implement and it is also generic: a
simple change will make all pipelined collectives work (not only tuned
but all the others as well).

Use a flattened datatype instead of the one provided by the MPI
application. The flattened datatype will have the same type map as the
original data, but all in a single level. As the MPI standard requires
all collectives to use a datatype*count with the same type signature,
this flattened datatype will allow all the peers in a collective to have
a consistent view of the operations to be done, and as a result use the
same sane pipelining boundaries.

  George.

On Thu, Apr 17, 2014 at 5:02 AM, Gilles Gouaillardet
<gilles.gouaillar...@iferc.org> wrote:
> Dear Open MPI developers,
>
> I just created #4531 in order to track this issue :
> https://svn.open-mpi.org/trac/ompi/ticket/4531
>
> Basically, the coll/tuned implementation of MPI_Bcast does not work
> when two tasks use datatypes of different sizes.
> For example, if the root sends two large vectors of MPI_INT and the
> non-root receives many MPI_INT, then MPI_Bcast will crash.
> But if the root sends many MPI_INT and the non-root receives two large
> vectors of MPI_INT, then MPI_Bcast will silently fail.
> (the Trac ticket has attached test cases)
>
> I believe this kind of issue could occur on all/most collectives of
> the coll/tuned module, so it is not limited to MPI_Bcast.
>
> I am wondering what could be the best way to solve this.
>
> One solution I could think of would be to generate temporary datatypes
> in order to send messages whose size is exactly the segment_size.
>
> Another solution I could think of would be to have new send/recv
> functions.
> If we consider the send function :
>
>     int mca_pml_ob1_send(void *buf,
>                          size_t count,
>                          ompi_datatype_t *datatype,
>                          int dst,
>                          int tag,
>                          mca_pml_base_send_mode_t sendmode,
>                          ompi_communicator_t *comm)
>
> we could imagine an xsend function :
>
>     int mca_pml_ob1_xsend(void *buf,
>                           size_t count,
>                           ompi_datatype_t *datatype,
>                           size_t offset,
>                           size_t size,
>                           int dst,
>                           int tag,
>                           mca_pml_base_send_mode_t sendmode,
>                           ompi_communicator_t *comm)
>
> where offset is the number of bytes that should be skipped from the
> beginning of buf, and size is the (max) number of bytes to be sent
> (i.e. the message will be "truncated" to size bytes if
> (count*size(datatype) - offset) > size).
>
> Or we could use a buffer if needed, and send/recv with the MPI_PACKED
> datatype (this is less efficient; would it even work on heterogeneous
> nodes ?)
>
> Or we could simply consider this is just a limitation of coll/tuned
> (coll/basic works fine) and do nothing.
>
> Or something else I did not think of ...
>
> Thanks in advance for your feedback,
>
> Gilles
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/04/14556.php

Link to this post: http://www.open-mpi.org/community/lists/devel/2014/04/14571.php
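The truncation rule in the xsend proposal quoted above ("skip offset bytes, send at most size bytes") can be sketched in plain C. The helper name is hypothetical and plain integer types stand in for the ob1/MPI ones; this is only an illustration of the proposed semantics, not the PML API:

```c
#include <stddef.h>

/* Sketch of the proposed xsend byte window: skip `offset` bytes from the
 * start of the typed buffer and send at most `size` bytes. Returns the
 * number of bytes that would actually go on the wire. */
static size_t xsend_bytes(size_t count, size_t type_size,
                          size_t offset, size_t size)
{
    size_t total = count * type_size;  /* count * size(datatype) */
    if (offset >= total)
        return 0;                      /* nothing left past the offset */
    size_t rem = total - offset;
    return rem > size ? size : rem;    /* truncate to at most `size` */
}
```

With 100 MPI_INT (4 bytes each), a window starting at byte 300 with size 64 would carry only 64 of the 100 remaining bytes, matching the "truncated to size bytes" rule.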
Re: [OMPI devel] coll/tuned MPI_Bcast can crash or silently fail when using distinct datatypes across tasks
Indeed there are many potential solutions, but all require too much
intervention on the code to be generic enough. As we discussed privately
mid last year, the "flatten datatype" approach seems to me to be the
most profitable. It is simple to implement and it is also generic: a
simple change will make all pipelined collectives work (not only tuned
but all the others as well).

Use a flattened datatype instead of the one provided by the MPI
application. The flattened datatype will have the same type map as the
original data, but all in a single level. As the MPI standard requires
all collectives to use a datatype*count that has the same type
signature, this flattened datatype will allow all the peers in a
collective to have a consistent view of the operations to be done, and
as a result use the same sane pipelining boundaries.

  George.

On Thu, Apr 17, 2014 at 5:02 AM, Gilles Gouaillardet wrote:
> Dear Open MPI developers,
>
> I just created #4531 in order to track this issue :
> https://svn.open-mpi.org/trac/ompi/ticket/4531
>
> Basically, the coll/tuned implementation of MPI_Bcast does not work
> when two tasks use datatypes of different sizes.
> For example, if the root sends two large vectors of MPI_INT and the
> non-root receives many MPI_INT, then MPI_Bcast will crash.
> But if the root sends many MPI_INT and the non-root receives two large
> vectors of MPI_INT, then MPI_Bcast will silently fail.
> (the Trac ticket has attached test cases)
>
> I believe this kind of issue could occur on all/most collectives of
> the coll/tuned module, so it is not limited to MPI_Bcast.
>
> I am wondering what could be the best way to solve this.
>
> One solution I could think of would be to generate temporary datatypes
> in order to send messages whose size is exactly the segment_size.
>
> Another solution I could think of would be to have new send/recv
> functions.
> If we consider the send function :
>
>     int mca_pml_ob1_send(void *buf,
>                          size_t count,
>                          ompi_datatype_t *datatype,
>                          int dst,
>                          int tag,
>                          mca_pml_base_send_mode_t sendmode,
>                          ompi_communicator_t *comm)
>
> we could imagine an xsend function :
>
>     int mca_pml_ob1_xsend(void *buf,
>                           size_t count,
>                           ompi_datatype_t *datatype,
>                           size_t offset,
>                           size_t size,
>                           int dst,
>                           int tag,
>                           mca_pml_base_send_mode_t sendmode,
>                           ompi_communicator_t *comm)
>
> where offset is the number of bytes that should be skipped from the
> beginning of buf, and size is the (max) number of bytes to be sent
> (i.e. the message will be "truncated" to size bytes if
> (count*size(datatype) - offset) > size).
>
> Or we could use a buffer if needed, and send/recv with the MPI_PACKED
> datatype (this is less efficient; would it even work on heterogeneous
> nodes ?)
>
> Or we could simply consider this is just a limitation of coll/tuned
> (coll/basic works fine) and do nothing.
>
> Or something else I did not think of ...
>
> Thanks in advance for your feedback,
>
> Gilles
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/04/14556.php
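George's "same type map, but all in a single level" can be illustrated with a vector datatype. A vector nests a base type inside a strided layout; its flattened form is just a one-level list of (byte offset, byte length) blocks. The sketch below uses plain C types and made-up names (flat_block_t, flatten_vector are illustrative, not the Open MPI ompi_datatype_flatten API):

```c
#include <stddef.h>

typedef struct {
    ptrdiff_t offset;   /* displacement from the buffer start, in bytes */
    size_t    length;   /* contiguous bytes at that displacement */
} flat_block_t;

/* Fill `blocks` (capacity >= count) with the single-level type map of a
 * type like MPI_Type_vector(count, blocklen, stride, MPI_INT): count
 * blocks of blocklen ints, each stride ints apart. Returns the number
 * of flattened blocks. */
static int flatten_vector(int count, int blocklen, int stride,
                          flat_block_t *blocks)
{
    for (int i = 0; i < count; i++) {
        blocks[i].offset = (ptrdiff_t)i * stride * (ptrdiff_t)sizeof(int);
        blocks[i].length = (size_t)blocklen * sizeof(int);
    }
    return count;
}
```

Because the flattened list describes the layout purely in bytes, two peers whose datatypes have the same type signature but different nesting end up with identical block lists, which is what makes consistent pipelining boundaries possible.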
[OMPI devel] coll/tuned MPI_Bcast can crash or silently fail when using distinct datatypes across tasks
Dear Open MPI developers,

I just created #4531 in order to track this issue :
https://svn.open-mpi.org/trac/ompi/ticket/4531

Basically, the coll/tuned implementation of MPI_Bcast does not work when
two tasks use datatypes of different sizes.
For example, if the root sends two large vectors of MPI_INT and the
non-root receives many MPI_INT, then MPI_Bcast will crash.
But if the root sends many MPI_INT and the non-root receives two large
vectors of MPI_INT, then MPI_Bcast will silently fail.
(the Trac ticket has attached test cases)

I believe this kind of issue could occur on all/most collectives of the
coll/tuned module, so it is not limited to MPI_Bcast.

I am wondering what could be the best way to solve this.

One solution I could think of would be to generate temporary datatypes
in order to send messages whose size is exactly the segment_size.

Another solution I could think of would be to have new send/recv
functions.
If we consider the send function :

    int mca_pml_ob1_send(void *buf,
                         size_t count,
                         ompi_datatype_t *datatype,
                         int dst,
                         int tag,
                         mca_pml_base_send_mode_t sendmode,
                         ompi_communicator_t *comm)

we could imagine an xsend function :

    int mca_pml_ob1_xsend(void *buf,
                          size_t count,
                          ompi_datatype_t *datatype,
                          size_t offset,
                          size_t size,
                          int dst,
                          int tag,
                          mca_pml_base_send_mode_t sendmode,
                          ompi_communicator_t *comm)

where offset is the number of bytes that should be skipped from the
beginning of buf, and size is the (max) number of bytes to be sent
(i.e. the message will be "truncated" to size bytes if
(count*size(datatype) - offset) > size).

Or we could use a buffer if needed, and send/recv with the MPI_PACKED
datatype (this is less efficient; would it even work on heterogeneous
nodes ?)

Or we could simply consider this is just a limitation of coll/tuned
(coll/basic works fine) and do nothing.

Or something else I did not think of ...

Thanks in advance for your feedback,

Gilles
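The failure mode described above can be made concrete with byte arithmetic. The sketch below is not the actual coll/tuned code, but it mirrors the idea that each rank derives its pipeline segment count from the size of its own datatype, so ranks using distinct (but signature-equivalent) datatypes disagree on how many messages are exchanged:

```c
#include <stddef.h>

/* Illustrative only: number of pipeline segments for `count` elements of
 * a datatype of `type_size` bytes, with at most `seg_bytes` per segment.
 * A rank whose datatype is larger than a segment sends whole datatypes
 * one at a time. */
static int num_segments(size_t type_size, size_t count, size_t seg_bytes)
{
    size_t per_seg = seg_bytes / type_size;   /* whole datatypes per segment */
    if (per_seg == 0)
        per_seg = 1;                          /* datatype exceeds segment size */
    return (int)((count + per_seg - 1) / per_seg);
}
```

With an 8 KiB segment size, a root sending 2 vectors of 1<<20 MPI_INT (type size 4 MiB) computes 2 segments, while a peer receiving the same bytes as 1<<21 plain MPI_INT computes 1024 segments: the two ranks post different numbers of messages with different lengths, which is exactly the crash/silent-failure scenario of the ticket.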