Re: [OMPI devel] Exit status
On Apr 14 2011, Ralph Castain wrote:

> I've run across an interesting issue for which I don't have a ready answer. If an MPI process aborts, we automatically abort the entire job. If an MPI process returns a non-zero exit status, indicating that there was something abnormal about its termination, we ignore it and let the job continue. We do print an error message out upon completion of the job, but we don't terminate the job upon receiving the non-zero status.
>
> Note that non-zero status is considered a "standard" method of indicating abnormal termination, though no meaning has been agreed upon for the specific value.

Not really. See below.

> Should we be allowing the job to continue in that circumstance? In the case I'm reviewing, the user's code indicates there is an error in the result. Since he has already called MPI_Finalize, he can't call MPI_Abort, and his system won't allow him to drop cores by calling "abort". So the exit status is his only way of indicating "abnormal termination". Obviously, in this case, he would prefer the job terminate as nothing useful is going to be accomplished - so no point in tying up the machine.
>
> Thoughts?

Blame Unix. Seriously. Many or most mainframes had the following categories:

  Complete success - or, rather, a failure to detect an error :-)
  Partial success, with warnings of potential problems
  Failure that was diagnosed and partially cleaned-up
  Heap horrible failure - all bets are off

Unix has no such categorisation. The distinction between a zero return and other values can occur at any point, and some programs even use them as flags. It's hopeless, and whatever you do will be wrong for many people. I have no idea what Microsoft do, but assume that it has copied Unix, as that is its SOP.

I recommend NOT rocking this boat. He might do better by calling abort after MPI_Finalize, but that's pretty iffy - just like all other approaches. To improve this needs a new function or argument to MPI_Finalize.

Regards,
Nick Maclaren.
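For concreteness, here is a minimal, generic POSIX sketch (not Open MPI's actual code) of what a launcher such as orterun can observe when a child terminates: a death by signal (abort(), kill -9, a crash) is reported through a separate mechanism, while a plain exit() only yields the low 8 bits of a status whose meaning nobody has agreed on.

#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>

/* Reap one child and report the only two things the parent can distinguish. */
static void report_child(pid_t pid)
{
    int status;

    if (waitpid(pid, &status, 0) < 0) {
        return;
    }
    if (WIFSIGNALED(status)) {
        /* abort(), kill -9, SIGSEGV, ...: unambiguously abnormal */
        printf("child %d killed by signal %d\n", (int)pid, WTERMSIG(status));
    } else if (WIFEXITED(status)) {
        /* exit(n): only the low 8 bits survive, and n != 0 carries no agreed meaning */
        printf("child %d exited with status %d\n", (int)pid, WEXITSTATUS(status));
    }
}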
Re: [OMPI devel] Exit status
On Apr 14, 2011, at 4:02 AM, N.M. Maclaren wrote:

> ... It's hopeless, and whatever you do will be wrong for many people. ...

I think that sums it up pretty well. :-)

It does seem a little strange that the scenario you describe somewhat implies that one process is calling MPI_Finalize long before the others do. Specifically, the user is concerned with tying up resources after one process has called Finalize -- which implies that the others may continue on for a while. It's not invalid, of course, but it is a little unusual.

I see two possibilities here:

1. have the user delay calling MPI_Finalize in the application until it can do the test that indicates that the rest of the job should be aborted (i.e., so that it can still call MPI_Abort if it wants to). Don't forget that an implementation is allowed to block in MPI_Finalize until all processes call MPI_Finalize, anyway. (A minimal sketch of this option appears below.)

2. add an MCA param and/or orterun CLI option to abort a job if an MPI process terminates after MPI_Finalize with a nonzero exit status.

Just my $0.02. :-)

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
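A minimal sketch of the first option, with a hypothetical application-level check_result() standing in for whatever test the user's code actually performs:

#include <mpi.h>
#include <stdlib.h>

static int check_result(void)
{
    return 0;   /* hypothetical application-specific sanity test: nonzero means "bad" */
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* ... compute ... */

    if (check_result() != 0) {
        /* Still inside MPI, so this can take the whole job down immediately. */
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    MPI_Finalize();   /* only reached when the result looks sane */
    return EXIT_SUCCESS;
}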
Re: [OMPI devel] Exit status
On Apr 14, 2011, at 5:33 AM, Jeff Squyres wrote:

> On Apr 14, 2011, at 4:02 AM, N.M. Maclaren wrote:
>
>> ... It's hopeless, and whatever you do will be wrong for many people. ...
>
> I think that sums it up pretty well. :-)
>
> It does seem a little strange that the scenario you describe somewhat implies that one process is calling MPI_Finalize long before the others do. Specifically, the user is concerned with tying up resources after one process has called Finalize -- which implies that the others may continue on for a while. It's not invalid, of course, but it is a little unusual.

I'm finding it more common than we thought. Note that I didn't say that one process called MPI_Finalize before the others. In this case, they call it fairly close together, but the individual processes continue running for quite some time, or until they determine that something is wrong and exit with non-zero status.

> I see two possibilities here:
>
> 1. have the user delay calling MPI_Finalize in the application until it can do the test that indicates that the rest of the job should be aborted (i.e., so that it can still call MPI_Abort if it wants to). Don't forget that an implementation is allowed to block in MPI_Finalize until all processes call MPI_Finalize, anyway.
>
> 2. add an MCA param and/or orterun CLI option to abort a job if an MPI process terminates after MPI_Finalize with a nonzero exit status.

I figure this last is the best option. My point was just that we abort the job if someone calls "abort". However, if they indicate their program is exiting with "something is wrong", we ignore it. Not that big a deal - the param was my option too. Just thought I'd raise it to the group since it had never been discussed.

> Just my $0.02. :-)
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] Exit status
On Apr 14, 2011, at 9:13 AM, Ralph Castain wrote:

> I figure this last is the best option. My point was just that we abort the job if someone calls "abort". However, if they indicate their program is exiting with "something is wrong", we ignore it.

Another option for the user is to call kill(getpid(), 9). That would kill the entire job, no? :-)

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
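In the user's code that suggestion would look roughly like the sketch below (the something_went_wrong flag is a hypothetical stand-in for the application's own test). The process then dies "by signal" rather than by a nonzero exit status, which the runtime already treats as abnormal termination:

#include <mpi.h>
#include <signal.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int something_went_wrong = 1;   /* hypothetical application-level flag */

    MPI_Init(&argc, &argv);
    /* ... work ... */
    MPI_Finalize();

    if (something_went_wrong) {
        /* Terminate via SIGKILL instead of exit(nonzero). */
        kill(getpid(), SIGKILL);
    }
    return 0;
}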
Re: [OMPI devel] Exit status
On Apr 14 2011, Ralph Castain wrote:

> ... It's hopeless, and whatever you do will be wrong for many people. ...
>
> I think that sums it up pretty well. :-)
>
> It does seem a little strange that the scenario you describe somewhat implies that one process is calling MPI_Finalize long before the others do. Specifically, the user is concerned with tying up resources after one process has called Finalize -- which implies that the others may continue on for a while. It's not invalid, of course, but it is a little unusual.
>
> I'm finding it more common than we thought. Note that I didn't say that one process called MPI_Finalize before the others. In this case, they call it fairly close together, but the individual processes continue running for quite some time, or until they determine that something is wrong and exit with non-zero status.

Nobody is denying that it is common. Now, what happens when you encounter a language or compiler that uses return codes for mere warnings (e.g. ignored IEEE 754 flags, as stated to be desirable by LIA-1)? Bang!

Remember that C is not the universe and many languages use MPI via the C interface, but do not let C control their model.

Regards,
Nick Maclaren.
Re: [OMPI devel] Exit status
Point well made, Nick. In other words, irrespective of OS or language, are we citing the need for "application correcting code" from OpenMPI, (relocate a/o retry) similar to ECC in memory? Ken On Thu, 2011-04-14 at 14:31 +0100, N.M. Maclaren wrote: > On Apr 14 2011, Ralph Castain wrote: > >> > >>> ... It's hopeless, and whatever you do will be wrong for many > >>> people. ... > >> > >> I think that sums it up pretty well. :-) > >> > >> It does seem a little strange that the scenario you describe somewhat > >> implies that one process is calling MPI_Finalize lng before the > >> others do. Specifically, the user is concerned with tying up resources > >> after one process has called Finalize -- which implies that the others > >> may continue on for a while. It's not invalid, of course, but it is a > >> little unusual. > > > > I'm finding it more common than we thought. Note that I didn't say that > > one process called MPI_Finalize before the others. In this case, they > > call it fairly close together, but the individual processes continue > > running for quite some time, or until they determine that something is > > wrong and exit with non-zero status. > > Nobody is denying that it is common. Now, what happens when you encounter > a language or compiler that uses return codes for mere warnings (e.g. > ignored IEEE 754 flags, as stated to be desirable by LIA-1)? Bang! > > Remember that C is not the universe and many languages use MPI via the > C interface, but do not let C control their model. > > Regards, > Nick Maclaren. > > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel = Kenneth A. Lloyd CEO - Director of Systems Science Watt Systems Technologies Inc. www.wattsys.com kenneth.ll...@wattsys.com This e-mail is covered by the Electronic Communications Privacy Act, 18 U.S.C. 2510-2521 and is intended only for the addressee named above. It may contain privileged or confidential information. If you are not the addressee you must not copy, distribute, disclose or use any of the information in it. If you have received it in error please delete it and immediately notify the sender.
Re: [OMPI devel] Exit status
I think Ralph's point is that OMPI is providing the run-time environment for the application, and it would probably behoove us to support both kinds of behaviors since there are obviously people in both camps out there. It's pretty easy to add a non-default MCA param / orterun CLI option to say "abort the job if any of them exit with a non-zero status." On Apr 14, 2011, at 9:43 AM, Ken Lloyd wrote: > Point well made, Nick. In other words, irrespective of OS or language, are we > citing the need for "application correcting code" from OpenMPI, (relocate a/o > retry) similar to ECC in memory? > > Ken > > On Thu, 2011-04-14 at 14:31 +0100, N.M. Maclaren wrote: >> On Apr 14 2011, Ralph Castain wrote: >> >> >> >>> ... It's hopeless, and whatever you do will be wrong for many >> >>> people. ... >> >> >> >> I think that sums it up pretty well. :-) >> >> >> >> It does seem a little strange that the scenario you describe somewhat >> >> implies that one process is calling MPI_Finalize lng before the >> >> others do. Specifically, the user is concerned with tying up resources >> >> after one process has called Finalize -- which implies that the others >> >> may continue on for a while. It's not invalid, of course, but it is a >> >> little unusual. >> > >> > I'm finding it more common than we thought. Note that I didn't say that >> > one process called MPI_Finalize before the others. In this case, they >> > call it fairly close together, but the individual processes continue >> > running for quite some time, or until they determine that something is >> > wrong and exit with non-zero status. >> >> Nobody is denying that it is common. Now, what happens when you encounter >> a language or compiler that uses return codes for mere warnings (e.g. >> ignored IEEE 754 flags, as stated to be desirable by LIA-1)? Bang! >> >> Remember that C is not the universe and many languages use MPI via the >> C interface, but do not let C control their model. >> >> Regards, >> Nick Maclaren. >> >> ___ >> devel mailing list >> >> de...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/devel > > = > Kenneth A. Lloyd > CEO - Director of Systems Science > Watt Systems Technologies Inc. > www.wattsys.com > kenneth.ll...@wattsys.com > > This e-mail is covered by the Electronic Communications Privacy Act, 18 > U.S.C. 2510-2521 and is intended only for the addressee named above. It may > contain privileged or confidential information. If you are not the addressee > you must not copy, distribute, disclose or use any of the information in it. > If you have received it in error please delete it and immediately notify the > sender. > > > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI devel] RFC: Add support to send/receive CUDA device memory directly
Hello Rolf,

CUDA support is always welcome. Please see my comments below.

+#if OMPI_CUDA_SUPPORT
+fl->fl_frag_block_alignment = 0;
+fl->fl_flags = 0;
+#endif

[pasha] It seems that the "fl_flags" is a hack that allows you to do the second (cuda) registration in mpool_rdma:

+#if OMPI_CUDA_SUPPORT
+if ((flags & MCA_MPOOL_FLAGS_CUDA_MEM) && mca_common_cuda_registered_memory) {
+mca_common_cuda_register(addr, size,
+ mpool->mpool_component->mpool_version.mca_component_name);
+ }
+#endif

[pasha] This is really a _hack_ to enable multiple device registration. I would prefer to see a new mpool component that supports multiple device registrations, in contrast to the single device registration in mpool_rdma.

 fl->fl_payload_buffer_size=0;
 fl->fl_payload_buffer_alignment=0;
 fl->fl_frag_class = OBJ_CLASS(ompi_free_list_item_t);
@@ -190,7 +194,19 @@
 alloc_size = num_elements * head_size + sizeof(ompi_free_list_memory_t) + flist->fl_frag_alignment;
+#if OMPI_CUDA_SUPPORT
+/* Hack for TCP since there is no memory pool. */
+if (flist->fl_frag_block_alignment) {
+alloc_size = OPAL_ALIGN(alloc_size, 4096, size_t);
+if((errno = posix_memalign((void *)&alloc_ptr, 4096, alloc_size)) != 0) {
+alloc_ptr = NULL;
+}
+} else {
+alloc_ptr = (ompi_free_list_memory_t*)malloc(alloc_size);
+}
+#else
 alloc_ptr = (ompi_free_list_memory_t*)malloc(alloc_size);
+#endif

[pasha] I would prefer not to _hack_ ompi_free_list in order to work around BTL-related issues. Such kinds of problems should be handled by the tcp btl. If you think that the free list or mpool interface is not flexible enough, we can discuss updating or modifying the interface. IMHO that is much better than a hack.

Regards,
Pavel (Pasha) Shamis
---
Application Performance Tools Group
Computer Science and Math Division
Oak Ridge National Laboratory

On Apr 13, 2011, at 12:47 PM, Rolf vandeVaart wrote:

> WHAT: Add support to send data directly from CUDA device memory via MPI calls.
>
> TIMEOUT: April 25, 2011
>
> DETAILS: When programming in a mixed MPI and CUDA environment, one cannot currently send data directly from CUDA device memory. The programmer first has to move the data into host memory, and then send it. On the receiving side, it has to first be received into host memory, and then copied into CUDA device memory.
>
> This RFC adds the ability to send and receive CUDA device memory directly.
>
> There are three basic changes being made to add the support. First, when it is detected that a buffer is CUDA device memory, the protocols that can be used are restricted to the ones that first copy data into internal buffers. This means that we will not be using the PUT and RGET protocols, just the send and receive ones. Secondly, rather than using memcpy to move the data into and out of the host buffers, the library has to use a special CUDA copy routine called cuMemcpy. Lastly, to improve performance, the internal host buffers have to also be registered with the CUDA environment (although currently it is unclear how helpful that is).
>
> By default, the code is disabled and has to be configured into the library.
> --with-cuda(=DIR) Build cuda support, optionally adding DIR/include, DIR/lib, and DIR/lib64
> --with-cuda-libdir=DIR Search for cuda libraries in DIR
>
> An initial implementation can be viewed at:
> https://bitbucket.org/rolfv/ompi-trunk-cuda-3
>
> Here is a list of the files being modified so one can see the scope of the impact.
> $ svn status
> M VERSION
> M opal/datatype/opal_convertor.h
> M opal/datatype/opal_datatype_unpack.c
> M opal/datatype/opal_datatype_pack.h
> M opal/datatype/opal_convertor.c
> M opal/datatype/opal_datatype_unpack.h
> M configure.ac
> M ompi/mca/btl/sm/btl_sm.c
> M ompi/mca/btl/sm/Makefile.am
> M ompi/mca/btl/tcp/btl_tcp_component.c
> M ompi/mca/btl/tcp/btl_tcp.c
> M ompi/mca/btl/tcp/Makefile.am
> M ompi/mca/btl/openib/btl_openib_component.c
> M ompi/mca/btl/openib/btl_openib_endpoint.c
> M ompi/mca/btl/openib/btl_openib_mca.c
> M ompi/mca/mpool/sm/Makefile.am
> M ompi/mca/mpool/sm/mpool_sm_module.c
> M ompi/mca/mpool/rdma/mpool_rdma_module.c
> M ompi/mca/mpool/rdma/Makefile.am
> M ompi/mca/mpool/mpool.h
> A ompi/mca/common/cuda
> A ompi/mca/common/cuda/configure.m4
> A ompi/mca/common/cuda/common_cuda.c
> A ompi/mca/common/cuda/help-mpi-common-cuda.txt
> A ompi/mca/common/cuda/Makefile.am
> A ompi/mca/common/cuda/common_cuda.h
> M ompi/mca/pml/ob1/pml_ob1_component.c
> M ompi/mca/pml/ob1/pml_ob1_sendreq.h
> M ompi/mca/pml/ob1/pml_ob1_recvreq.h
> M ompi/mca/pml/ob1/Makefile.am
>
Re: [OMPI devel] Exit status
On Apr 14 2011, Jeff Squyres wrote: I think Ralph's point is that OMPI is providing the run-time environment for the application, and it would probably behoove us to support both kinds of behaviors since there are obviously people in both camps out there. It's pretty easy to add a non-default MCA param / orterun CLI option to say "abort the job if any of them exit with a non-zero status." That's not a problem! Any more than a similar one to provide timeouts, both on inactivity and on total running time (useful for teaching). Unless options are unclean or excessive, they can be ignored by people who don't want them. Regards, Nick Maclaren.
[OMPI devel] Problem of memory lost in MPI_Type_create_hindexed() with count = 1 (patch proposed)
Calling MPI_Type_create_hindexed(int count, int array_of_blocklengths[], MPI_Aint array_of_displacements[], MPI_Datatype oldtype, MPI_Datatype *newtype) with a count parameter of 1 causes a loss of memory detected by valgrind:

==2053== 576 (448 direct, 128 indirect) bytes in 1 blocks are definitely lost in loss record 157 of 182
==2053==    at 0x4C2415D: malloc (vg_replace_malloc.c:195)
==2053==    by 0x4E7CEC7: opal_obj_new (opal_object.h:469)
==2053==    by 0x4E7D134: ompi_datatype_create (ompi_datatype_create.c:71)
==2053==    by 0x4E7D58E: ompi_datatype_create_hindexed (ompi_datatype_create_indexed.c:89)
==2053==    by 0x4EA74D0: PMPI_Type_create_hindexed (ptype_create_hindexed.c:75)
==2053==    by 0x401A5C: main (in /home_nfs/xxx/type_create_hindexed)

This can be reproduced with the following trivial code:
=
#include "mpi.h"

MPI_Datatype newtype;
int lg[3];
MPI_Aint disp[3];

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    disp[0] = (MPI_Aint)disp;
    disp[1] = (MPI_Aint)disp+1;
    lg[0] = 5;
    lg[1] = 5;

    MPI_Type_create_hindexed(1, lg, disp, MPI_BYTE, &newtype);
    MPI_Type_free(&newtype);

    MPI_Finalize();
}
==
If MPI_Type_create_hindexed() is called with a count parameter greater than 1, valgrind does not detect any lost record.

Patch proposed:

hg diff ompi/datatype/ompi_datatype_create_indexed.c
diff -r a2d94a70f474 ompi/datatype/ompi_datatype_create_indexed.c
--- a/ompi/datatype/ompi_datatype_create_indexed.c   Wed Mar 30 18:47:31 2011 +0200
+++ b/ompi/datatype/ompi_datatype_create_indexed.c   Thu Apr 14 16:16:08 2011 +0200
@@ -91,11 +91,6 @@
     dLength = pBlockLength[0];
     endat = disp + dLength * extent;
-    if( 1 >= count ) {
-        pdt = ompi_datatype_create( oldType->super.desc.used + 2 );
-        /* multiply by count to make it zero if count is zero */
-        ompi_datatype_add( pdt, oldType, count * dLength, disp, extent );
-    } else {
     for( i = 1; i < count; i++ ) {
         if( endat == pDisp[i] ) {
             /* contiguous with the previsious */
@@ -109,7 +104,6 @@
         }
     }
     ompi_datatype_add( pdt, oldType, dLength, disp, extent );
-    }
     *newType = pdt;
     return OMPI_SUCCESS;
 }

Explanation:
The case (0 == count) was already handled earlier by returning. The problem is that, in the case (1 >= count), ompi_datatype_create() is called again (it has just been called before). In fact the case (1 == count) is not different from the case (1 < count), so it is possible to simply drop the if-else statement.

We need a patch for the OpenMPI 1.5 branch.
Re: [OMPI devel] Problem of memory lost in MPI_Type_create_hindexed() with count = 1 (patch proposed)
That looks reasonable to me, but I'd also re-indent the body of the else{} (i.e., remove 4 spaces from each). George? On Apr 14, 2011, at 10:48 AM, Pascal Deveze wrote: > Calling MPI_Type_create_hindexed(int count, int array_of_blocklengths[], > MPI_Aint array_of_displacements[], MPI_Datatype oldtype, > MPI_Datatype *newtype) > with a count parameter of 1 causes a loss of memory detected by valgrind : > > ==2053== 576 (448 direct, 128 indirect) bytes in 1 blocks are definitely lost > in loss record 157 of 182 > ==2053==at 0x4C2415D: malloc (vg_replace_malloc.c:195) > ==2053==by 0x4E7CEC7: opal_obj_new (opal_object.h:469) > ==2053==by 0x4E7D134: ompi_datatype_create (ompi_datatype_create.c:71) > ==2053==by 0x4E7D58E: ompi_datatype_create_hindexed > (ompi_datatype_create_indexed.c:89) > ==2053==by 0x4EA74D0: PMPI_Type_create_hindexed > (ptype_create_hindexed.c:75) > ==2053==by 0x401A5C: main (in /home_nfs/xxx/type_create_hindexed) > > This can be reproduced with the following trivial code: > = > #include "mpi.h" > > MPI_Datatype newtype; > int lg[3]; > MPI_Aint disp[3]; > > int main(int argc, char **argv) { > MPI_Init(&argc,&argv); > > disp[0] = (MPI_Aint)disp; > disp[1] = (MPI_Aint)disp+1; > lg[0] = 5; > lg[1] = 5; > > MPI_Type_create_hindexed(1, lg, disp, MPI_BYTE, &newtype); > MPI_Type_free(&newtype); > > MPI_Finalize(); > } > == > If MPI_Type_create_hindexed() is called with a count parameter greater 1, > valgrind does not detect any lost record. > > Patch proposed: > > hg diff ompi/datatype/ompi_datatype_create_indexed.c > diff -r a2d94a70f474 ompi/datatype/ompi_datatype_create_indexed.c > --- a/ompi/datatype/ompi_datatype_create_indexed.c Wed Mar 30 18:47:31 > 2011 +0200 > +++ b/ompi/datatype/ompi_datatype_create_indexed.c Thu Apr 14 16:16:08 > 2011 +0200 > @@ -91,11 +91,6 @@ >dLength = pBlockLength[0]; >endat = disp + dLength * extent; > -if( 1 >= count ) { > -pdt = ompi_datatype_create( oldType->super.desc.used + 2 ); > -/* multiply by count to make it zero if count is zero */ > -ompi_datatype_add( pdt, oldType, count * dLength, disp, extent ); > -} else { >for( i = 1; i < count; i++ ) { >if( endat == pDisp[i] ) { >/* contiguous with the previsious */ > @@ -109,7 +104,6 @@ >} >} >ompi_datatype_add( pdt, oldType, dLength, disp, extent ); > -} >*newType = pdt; >return OMPI_SUCCESS; > } > > Explanation: > The case (0 == count) was resolved before by returning. > The problem is that, in the case ( 1 >= count ), ompi_datatype_create() is > called again (it has been just called before). > In fact the case (1 == count) is not different of the case (1 < count), so > it is possible to just avoid the if-else statement. > > We need a patch for OpenMPI 1.5 branch. > > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI devel] RFC: Add support to send/receive CUDA device memory directly
On Apr 13, 2011, at 12:47 PM, Rolf vandeVaart wrote:

> By default, the code is disabled and has to be configured into the library.
> --with-cuda(=DIR) Build cuda support, optionally adding DIR/include, DIR/lib, and DIR/lib64
> --with-cuda-libdir=DIR Search for cuda libraries in DIR

My $0.02: cuda shouldn't be disabled by default (and only enabled if you --with-cuda). If configure finds all the Right cuda magic, then cuda support should be enabled by default. Just like all other optional support libraries that OMPI uses.

More specifically: the cuda support code in OMPI should strive to be such that it can be enabled by default and not cause any performance penalties to codes that do not need/use any cuda stuff. I'm not saying I know how to do that -- I'm just saying that that should be the goal. :-)

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI devel] RFC: Add support to send/receive CUDA device memory directly
On Apr 13, 2011, at 12:47 PM, Rolf vandeVaart wrote:

> An initial implementation can be viewed at:
> https://bitbucket.org/rolfv/ompi-trunk-cuda-3

Random comments on the code...

1. I see changes like this:

   mca_btl_sm_la_LIBADD += \
       $(top_ompi_builddir)/ompi/mca/common/cuda/libmca_common_cuda.la

   But I don't see any common/cuda function calls in the SM BTL. Why the link?

2. I see a new "opal_output(-1,.." in btl_tcp.c. If it's a developer-only opal_output, it should be compiled out by default.

3. In ompi_free_list.c, you call posix_memalign(), protected by OMPI_CUDA_SUPPORT. Does posix_memalign() exist in Windows, and/or does OMPI_CUDA_SUPPORT exclude Windows?

4. Along with what Pasha said, it seems odd to put a CUDA-specific value in mpool.h (MCA_MPOOL_FLAGS_CUDA_MEM). --> Some explanation is required for this comment. My gut reaction is to have portable code in OMPI, such that we can support multiple registration-necessary memory pools. That being said, NVIDIA is the first mover here; is there any other interest in ever wanting to be able to register other kinds of memory, too? Or should we let NVIDIA do it this way on the assumption that it will be years before anyone *might* want to use some other multi-memory-registration scheme? I can see both sides of the coin here...

5. In pml_ob1_sendreq.h, you set size to 0 if OMPI_CUDA_SUPPORT. That means that any OMPI compiled with CUDA support will have this value -- regardless of whether they're using accelerators or not. Shouldn't there be a compile-time check AND a run-time check for this kind of thing?

6. Instead of #if OMPI_CUDA_SUPPORT to select which memcpy to use, why not have a different opal memcopy MCA component for the cuda memcpy? Would that make a bunch of convertor #if OMPI_CUDA_SUPPORT's go away? (A rough sketch combining points 5 and 6 appears below.)

7. Using the name OMPI_* down in OPAL doesn't seem like a good idea (there are still some OMPI_* preprocessor names down in there that haven't yet been converted to OPAL_*, but adding new OMPI_* names down there doesn't seem to be a good idea).

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
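To illustrate points 5 and 6: a rough sketch (not code from the RFC branch; opal_cuda_runtime_enabled and opal_cuda_is_gpu_buffer are invented names) of how a convertor copy could combine the compile-time guard with a run-time check, so that builds without CUDA, and CUDA builds that find no device at startup, stay on the plain memcpy path:

#include <string.h>
#include <stdint.h>
#if OMPI_CUDA_SUPPORT
#include <cuda.h>
#endif

extern int opal_cuda_runtime_enabled;               /* hypothetical: set once at startup */
extern int opal_cuda_is_gpu_buffer(const void *p);  /* hypothetical: UVA pointer query */

static inline void convertor_copy(void *dst, const void *src, size_t len)
{
#if OMPI_CUDA_SUPPORT                                /* compile-time: code only exists when built with CUDA */
    if (opal_cuda_runtime_enabled &&                 /* run-time: a usable device was actually found */
        (opal_cuda_is_gpu_buffer(src) || opal_cuda_is_gpu_buffer(dst))) {
        /* cuMemcpy handles host or device pointers in either direction under UVA (CUDA 4.0). */
        cuMemcpy((CUdeviceptr)(uintptr_t)dst, (CUdeviceptr)(uintptr_t)src, len);
        return;
    }
#endif
    memcpy(dst, src, len);                           /* the normal, penalty-free path */
}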
Re: [OMPI devel] RFC: Add support to send/receive CUDA device memory directly
I'd suggest supporting CUDA device queries in carto and hwloc. Ken On Thu, 2011-04-14 at 11:25 -0400, Jeff Squyres wrote: > On Apr 13, 2011, at 12:47 PM, Rolf vandeVaart wrote: > > > By default, the code is disable and has to be configured into the library. > > --with-cuda(=DIR) Build cuda support, optionally adding DIR/include, > > DIR/lib, and DIR/lib64 > > --with-cuda-libdir=DIR Search for cuda libraries in DIR > > My $0.02: cuda shouldn't be disabled by default (and only enabled if you > --with-cuda). If configure finds all the Right cuda magic, then cuda support > should be enabled by default. Just like all other optional support libraries > that OMPI uses. > > More specifically: the cuda support code in OMPI should strive to be such > that it can be enabled by default and not cause any performance penalties to > codes that do not need/use any cuda stuff. I'm not saying I know how to do > that -- I'm just saying that that should be the goal. :-) >
Re: [OMPI devel] RFC: Add support to send/receive CUDA device memory directly
>> By default, the code is disabled and has to be configured into the library.
>> --with-cuda(=DIR) Build cuda support, optionally adding DIR/include, DIR/lib, and DIR/lib64
>> --with-cuda-libdir=DIR Search for cuda libraries in DIR
>
> My $0.02: cuda shouldn't be disabled by default (and only enabled if you --with-cuda). If configure finds all the Right cuda magic, then cuda support should be enabled by default. Just like all other optional support libraries that OMPI uses.

Actually I'm not sure that it is a good idea to enable CUDA by default, since it disables the zero-copy protocol, which is critical for good performance.

My 0.02$
Pasha.
Re: [OMPI devel] RFC: Add support to send/receive CUDA device memory directly
On Apr 13, 2011, at 20:07, Ken Lloyd wrote:

> George, Yes. GPUDirect eliminated an additional (host) memory buffering step between the HCA and the GPU that took CPU cycles.

If this is the case then why do we need to use special memcpy functions to copy the data back into the host memory prior to using the send/recv protocol? If GPUDirect removes the need for host buffering then as soon as the memory is identified as being on the device (using the Unified Virtual Addressing), the device can deliver it directly to the network card.

george.

> I was never very comfortable with the kernel patch necessary, nor the patched OFED required to make it all work. Having said that, it did provide a ~14% improvement in throughput on some of my code. Not bad.
>
> Now comes GPUDirect 2.0 (mostly helping GPU-GPU across PCIe) and Unified Virtual Addressing. Holds great promise, but the real understanding comes from whitebox analysis, and instrumenting my app code.
>
> On Wed, 2011-04-13 at 17:21 -0400, George Bosilca wrote:
>> On Apr 13, 2011, at 14:48, Rolf vandeVaart wrote:
>>
>>> This work does not depend on GPU Direct. It is making use of the fact that one can malloc memory, register it with IB, and register it with CUDA via the new 4.0 cuMemHostRegister API. Then one can copy device memory into this memory.
>>
>> Wasn't that the point behind GPUDirect? To allow direct memory copy between the GPU and the network card without external intervention?
>>
>> george.
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
> =
> Kenneth A. Lloyd
> CEO - Director of Systems Science
> Watt Systems Technologies Inc.
> www.wattsys.com
> kenneth.ll...@wattsys.com
>
> This e-mail is covered by the Electronic Communications Privacy Act, 18 U.S.C. 2510-2521 and is intended only for the addressee named above. It may contain privileged or confidential information. If you are not the addressee you must not copy, distribute, disclose or use any of the information in it. If you have received it in error please delete it and immediately notify the sender.
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

"I disapprove of what you say, but I will defend to the death your right to say it" -- Evelyn Beatrice Hall
Re: [OMPI devel] RFC: Add support to send/receive CUDA device memory directly
On 14/04/2011 17:58, George Bosilca wrote:

> On Apr 13, 2011, at 20:07, Ken Lloyd wrote:
>
>> George, Yes. GPUDirect eliminated an additional (host) memory buffering step between the HCA and the GPU that took CPU cycles.
>
> If this is the case then why do we need to use special memcpy functions to copy the data back into the host memory prior to using the send/recv protocol? If GPUDirect removes the need for host buffering then as soon as the memory is identified as being on the device (using the Unified Virtual Addressing), the device can deliver it directly to the network card.

GPUDirect is only about using the same host buffer for DMA from/to both the NIC and the GPU. Without GPUDirect, you have a host buffer for the GPU and another one for IB (looks like some strange memory registration problem to me...), and you have to memcpy between them in the middle.

We're all confused with the name "GPUDirect" because we remember people doing DMA directly between the NIC and a GPU or SCSI disk ten years ago. GPUDirect doesn't go that far unfortunately :/

Brice
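To make that concrete, here is a rough sketch (not the RFC's actual code; error handling omitted, and a verbs protection domain plus a current CUDA context are assumed to exist) of the "same host buffer for both" idea: one bounce buffer registered with the HCA and with the CUDA driver, into which cuMemcpyDtoH copies the device data before the fragment is posted for sending.

#include <cuda.h>
#include <infiniband/verbs.h>
#include <stdlib.h>

static struct ibv_mr *stage_gpu_fragment(struct ibv_pd *pd, CUdeviceptr src,
                                         size_t len, void **host_buf_out)
{
    void *host_buf = NULL;
    struct ibv_mr *mr;

    posix_memalign(&host_buf, 4096, len);            /* page-aligned bounce buffer */
    mr = ibv_reg_mr(pd, host_buf, len,               /* register once with the HCA */
                    IBV_ACCESS_LOCAL_WRITE);
    cuMemHostRegister(host_buf, len, 0);             /* register the SAME buffer with CUDA (4.0 API) */
    cuMemcpyDtoH(host_buf, src, len);                /* device -> shared host buffer, no extra memcpy */

    *host_buf_out = host_buf;
    return mr;                                       /* ready to be posted via ibv_post_send() */
}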
Re: [OMPI devel] RFC: Add support to send/receive CUDA device memory directly
hwloc (since 1.1, on Linux) can already tell you which CPUs are close to a CUDA device, see
https://svn.open-mpi.org/trac/hwloc/browser/trunk/include/hwloc/cuda.h and
https://svn.open-mpi.org/trac/hwloc/browser/trunk/include/hwloc/cudart.h
Do you need anything else?

Brice

On 14/04/2011 17:44, Ken Lloyd wrote:

> I'd suggest supporting CUDA device queries in carto and hwloc.
>
> Ken
>
> On Thu, 2011-04-14 at 11:25 -0400, Jeff Squyres wrote:
>> On Apr 13, 2011, at 12:47 PM, Rolf vandeVaart wrote:
>>
>>> By default, the code is disabled and has to be configured into the library.
>>> --with-cuda(=DIR) Build cuda support, optionally adding DIR/include, DIR/lib, and DIR/lib64
>>> --with-cuda-libdir=DIR Search for cuda libraries in DIR
>>
>> My $0.02: cuda shouldn't be disabled by default (and only enabled if you --with-cuda). If configure finds all the Right cuda magic, then cuda support should be enabled by default. Just like all other optional support libraries that OMPI uses.
>>
>> More specifically: the cuda support code in OMPI should strive to be such that it can be enabled by default and not cause any performance penalties to codes that do not need/use any cuda stuff. I'm not saying I know how to do that -- I'm just saying that that should be the goal. :-)
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
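For example, with the hwloc 1.1 bitmap API, something along these lines (a sketch, assuming a loaded topology and an already-obtained CUdevice handle) is enough to find the CPUs near a given CUDA device:

#include <hwloc.h>
#include <hwloc/cuda.h>
#include <cuda.h>
#include <stdio.h>
#include <stdlib.h>

static void show_cuda_locality(hwloc_topology_t topology, CUdevice dev)
{
    hwloc_cpuset_t cpuset = hwloc_bitmap_alloc();
    char *str = NULL;

    if (0 == hwloc_cuda_get_device_cpuset(topology, dev, cpuset)) {
        hwloc_bitmap_asprintf(&str, cpuset);
        printf("CUDA device is close to cpuset %s\n", str);
        free(str);
    }
    hwloc_bitmap_free(cpuset);
}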
Re: [OMPI devel] RFC: Add support to send/receive CUDA device memory directly
On Apr 14, 2011, at 11:48 AM, Shamis, Pavel wrote: > Actually I'm not sure that it is good idea to enable CUDA by default, since > it disables zero-copy protocol, that is critical for good performance. That can easily be a run-time check during startup. -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
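A startup probe of that sort could be as small as the sketch below (the opal_cuda_runtime_enabled flag is a hypothetical name); the point is that the CUDA code paths are only switched on when a driver and at least one device are actually present:

#include <cuda.h>

static int opal_cuda_runtime_enabled = 0;   /* hypothetical flag consulted by the data path */

static void cuda_startup_check(void)
{
    int ndev = 0;

    if (CUDA_SUCCESS == cuInit(0) &&
        CUDA_SUCCESS == cuDeviceGetCount(&ndev) &&
        ndev > 0) {
        opal_cuda_runtime_enabled = 1;
    }
}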
Re: [OMPI devel] RFC: Add support to send/receive CUDA device memory directly
On Apr 14, 2011, at 12:37 PM, Brice Goglin wrote: > GPUDirect is only about using the same host buffer for DMA from/to both > the NIC and the GPU. Without GPUDirect, you have a host buffer for the > GPU and another one for IB (looks like some strange memory registration > problem to me...), and you have to memcpy between them in the middle . > > We're all confused with the name "GPUDirect" because we remember people > doing DMA directly between the NIC and a GPU or SCSI disk ten years ago. > GPUDirect doesn't go that far unfortunately :/ Correct. GPUDirect is a brilliant marketing name. Its name has nothing to do with what it really is: the ability to register the same buffer to both CUDA and OpenFabrics. As Brice says: GPUDirect does NOT send/receive data directly from the accelerator's memory. -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI devel] RFC: Add support to send/receive CUDA device memory directly
On Apr 14, 2011, at 12:41 PM, Brice Goglin wrote: > hwloc (since 1.1, on Linux) can already tell you which CPUs are close to a > CUDA device, see > https://svn.open-mpi.org/trac/hwloc/browser/trunk/include/hwloc/cuda.h and > https://svn.open-mpi.org/trac/hwloc/browser/trunk/include/hwloc/cudart.h > Do you need anything else ? Nope. I think the inference was that *all* CUDA support should be under carto/hwloc. I don't think that's quite possible, though, for some of the reasons Rolf mentioned (i.e., we need to do more than just know *where* the accelerators are). -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI devel] RFC: Add support to send/receive CUDA device memory directly
>> Actually I'm not sure that it is a good idea to enable CUDA by default, since it disables the zero-copy protocol, which is critical for good performance.
>
> That can easily be a run-time check during startup.

It could be fixed. My point was that in the existing code, it is a compile-time decision and not a run-time one.

Pasha
Re: [OMPI devel] RFC: Add support to send/receive CUDA device memory directly
On Apr 14, 2011, at 3:13 PM, Shamis, Pavel wrote: >> That can easily be a run-time check during startup. > > It could be fixed. My point was that in the existing code, it's compile time > decision and not run time. I agree; I mentioned the same issue in my review, too. Some of the code can clearly use both a compile time and a run time check (like the part that we're talking about right now :-) ). -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI devel] Problem of memory lost in MPI_Type_create_hindexed() with count = 1 (patch proposed)
Interesting, this issue exists in 2 out of 3 functions defined in the ompi_datatype_create_indexed.c file. Based on your patch I create one that fixes all the issues with the indexed type creation. Attached is the patch. I'll push it in the trunk and create CMRs. Thanks, george. Index: ompi/datatype/ompi_datatype_create_indexed.c === --- ompi/datatype/ompi_datatype_create_indexed.c(revision 24616) +++ ompi/datatype/ompi_datatype_create_indexed.c(working copy) @@ -3,7 +3,7 @@ * Copyright (c) 2004-2006 The Trustees of Indiana University and Indiana * University Research and Technology * Corporation. All rights reserved. - * Copyright (c) 2004-2009 The University of Tennessee and The University + * Copyright (c) 2004-2010 The University of Tennessee and The University * of Tennessee Research Foundation. All rights * reserved. * Copyright (c) 2004-2006 High Performance Computing Center Stuttgart, @@ -46,26 +46,21 @@ dLength = pBlockLength[0]; endat = disp + dLength; ompi_datatype_type_extent( oldType, &extent ); -if( 1 >= count ) { -pdt = ompi_datatype_create( oldType->super.desc.used + 2 ); -/* multiply by count to make it zero if count is zero */ -ompi_datatype_add( pdt, oldType, count * dLength, disp * extent, extent ); -} else { -pdt = ompi_datatype_create( count * (2 + oldType->super.desc.used) ); -for( i = 1; i < count; i++ ) { -if( endat == pDisp[i] ) { -/* contiguous with the previsious */ -dLength += pBlockLength[i]; -endat += pBlockLength[i]; -} else { -ompi_datatype_add( pdt, oldType, dLength, disp * extent, extent ); -disp = pDisp[i]; -dLength = pBlockLength[i]; -endat = disp + pBlockLength[i]; -} + +pdt = ompi_datatype_create( count * (2 + oldType->super.desc.used) ); +for( i = 1; i < count; i++ ) { +if( endat == pDisp[i] ) { +/* contiguous with the previsious */ +dLength += pBlockLength[i]; +endat += pBlockLength[i]; +} else { +ompi_datatype_add( pdt, oldType, dLength, disp * extent, extent ); +disp = pDisp[i]; +dLength = pBlockLength[i]; +endat = disp + pBlockLength[i]; } -ompi_datatype_add( pdt, oldType, dLength, disp * extent, extent ); } +ompi_datatype_add( pdt, oldType, dLength, disp * extent, extent ); *newType = pdt; return OMPI_SUCCESS; @@ -91,25 +86,20 @@ dLength = pBlockLength[0]; endat = disp + dLength * extent; -if( 1 >= count ) { -pdt = ompi_datatype_create( oldType->super.desc.used + 2 ); -/* multiply by count to make it zero if count is zero */ -ompi_datatype_add( pdt, oldType, count * dLength, disp, extent ); -} else { -for( i = 1; i < count; i++ ) { -if( endat == pDisp[i] ) { -/* contiguous with the previsious */ -dLength += pBlockLength[i]; -endat += pBlockLength[i] * extent; -} else { -ompi_datatype_add( pdt, oldType, dLength, disp, extent ); -disp = pDisp[i]; -dLength = pBlockLength[i]; -endat = disp + pBlockLength[i] * extent; -} +for( i = 1; i < count; i++ ) { +if( endat == pDisp[i] ) { +/* contiguous with the previsious */ +dLength += pBlockLength[i]; +endat += pBlockLength[i] * extent; +} else { +ompi_datatype_add( pdt, oldType, dLength, disp, extent ); +disp = pDisp[i]; +dLength = pBlockLength[i]; +endat = disp + pBlockLength[i] * extent; } -ompi_datatype_add( pdt, oldType, dLength, disp, extent ); } +ompi_datatype_add( pdt, oldType, dLength, disp, extent ); + *newType = pdt; return OMPI_SUCCESS; } On Apr 14, 2011, at 10:48 , Pascal Deveze wrote: > Calling MPI_Type_create_hindexed(int count, int array_of_blocklengths[], > MPI_Aint array_of_displacements[], MPI_Datatype oldtype, > MPI_Datatype *newtype) > with a count parameter of 1 causes a loss of memory 
detected by valgrind : > > ==2053== 576 (448 direct, 128 indirect) bytes in 1 blocks are definitely lost > in loss record 157 of 182 > ==2053==at 0x4C2415D: malloc (vg_replace_malloc.c:195) > ==2053==by 0x4E7CEC7: opal_obj_new (opal_object.h:469) > ==2053==by 0x4E7D134: ompi_datatype_create (ompi_datatype_create.c:71) > ==2053==by 0x4E7D58E: ompi_datatype_create_hindexed > (ompi_datatype_create_indexed.c:89) > ==2053==by 0x4EA74D0: PMPI_Type_create_hindexed > (ptype_create_hindexed.c:75) > ==2
[OMPI devel] Fwd: [OMPI svn-full] svn:open-mpi r24617
George -- Unfortunately, this didn't automatically create CMRs (I'm not sure why). :-( Begin forwarded message: > From: bosi...@osl.iu.edu > Date: April 14, 2011 5:50:07 PM EDT > To: svn-f...@open-mpi.org > Subject: [OMPI svn-full] svn:open-mpi r24617 > Reply-To: de...@open-mpi.org > > Author: bosilca > Date: 2011-04-14 17:50:06 EDT (Thu, 14 Apr 2011) > New Revision: 24617 > URL: https://svn.open-mpi.org/trac/ompi/changeset/24617 > > Log: > Based on the patch submitted by Pascal Deveze, here is the memory leak fix > for the type indexed creation. > > CMR v1.4 and v1.5. > > Text files modified: > trunk/ompi/datatype/ompi_datatype_create_indexed.c |62 > --- > 1 files changed, 26 insertions(+), 36 deletions(-) > > Modified: trunk/ompi/datatype/ompi_datatype_create_indexed.c > == > --- trunk/ompi/datatype/ompi_datatype_create_indexed.c(original) > +++ trunk/ompi/datatype/ompi_datatype_create_indexed.c2011-04-14 > 17:50:06 EDT (Thu, 14 Apr 2011) > @@ -3,7 +3,7 @@ > * Copyright (c) 2004-2006 The Trustees of Indiana University and Indiana > * University Research and Technology > * Corporation. All rights reserved. > - * Copyright (c) 2004-2009 The University of Tennessee and The University > + * Copyright (c) 2004-2010 The University of Tennessee and The University > * of Tennessee Research Foundation. All rights > * reserved. > * Copyright (c) 2004-2006 High Performance Computing Center Stuttgart, > @@ -46,26 +46,21 @@ > dLength = pBlockLength[0]; > endat = disp + dLength; > ompi_datatype_type_extent( oldType, &extent ); > -if( 1 >= count ) { > -pdt = ompi_datatype_create( oldType->super.desc.used + 2 ); > -/* multiply by count to make it zero if count is zero */ > -ompi_datatype_add( pdt, oldType, count * dLength, disp * extent, > extent ); > -} else { > -pdt = ompi_datatype_create( count * (2 + oldType->super.desc.used) ); > -for( i = 1; i < count; i++ ) { > -if( endat == pDisp[i] ) { > -/* contiguous with the previsious */ > -dLength += pBlockLength[i]; > -endat += pBlockLength[i]; > -} else { > -ompi_datatype_add( pdt, oldType, dLength, disp * extent, > extent ); > -disp = pDisp[i]; > -dLength = pBlockLength[i]; > -endat = disp + pBlockLength[i]; > -} > + > +pdt = ompi_datatype_create( count * (2 + oldType->super.desc.used) ); > +for( i = 1; i < count; i++ ) { > +if( endat == pDisp[i] ) { > +/* contiguous with the previsious */ > +dLength += pBlockLength[i]; > +endat += pBlockLength[i]; > +} else { > +ompi_datatype_add( pdt, oldType, dLength, disp * extent, extent > ); > +disp = pDisp[i]; > +dLength = pBlockLength[i]; > +endat = disp + pBlockLength[i]; > } > -ompi_datatype_add( pdt, oldType, dLength, disp * extent, extent ); > } > +ompi_datatype_add( pdt, oldType, dLength, disp * extent, extent ); > > *newType = pdt; > return OMPI_SUCCESS; > @@ -91,25 +86,20 @@ > dLength = pBlockLength[0]; > endat = disp + dLength * extent; > > -if( 1 >= count ) { > -pdt = ompi_datatype_create( oldType->super.desc.used + 2 ); > -/* multiply by count to make it zero if count is zero */ > -ompi_datatype_add( pdt, oldType, count * dLength, disp, extent ); > -} else { > -for( i = 1; i < count; i++ ) { > -if( endat == pDisp[i] ) { > -/* contiguous with the previsious */ > -dLength += pBlockLength[i]; > -endat += pBlockLength[i] * extent; > -} else { > -ompi_datatype_add( pdt, oldType, dLength, disp, extent ); > -disp = pDisp[i]; > -dLength = pBlockLength[i]; > -endat = disp + pBlockLength[i] * extent; > -} > +for( i = 1; i < count; i++ ) { > +if( endat == pDisp[i] ) { > +/* contiguous with the previsious */ > 
+dLength += pBlockLength[i]; > +endat += pBlockLength[i] * extent; > +} else { > +ompi_datatype_add( pdt, oldType, dLength, disp, extent ); > +disp = pDisp[i]; > +dLength = pBlockLength[i]; > +endat = disp + pBlockLength[i] * extent; > } > -ompi_datatype_add( pdt, oldType, dLength, disp, extent ); > } > +ompi_datatype_add( pdt, oldType, dLength, disp, extent ); > + > *newType = pdt; > return OMPI_SUCCESS; > } > ___ > svn-full mailing list > svn-f...@open-mpi.or