Rolf,

I'm not arguing against the choices you made. I just want to point out that 
there are other ways of achieving a similar goal with less impact on the 
components outside of the SMCUDA BTL, an approach which might be interesting in 
the long run. Thus my answer below is just there so that a trace exists of a 
possible starting point toward a less intrusive approach, in case someone is 
interested in implementing it in the future (an intern or a PhD student).

On Aug 23, 2013, at 21:53 , Rolf vandeVaart <rvandeva...@nvidia.com> wrote:

> Yes, I agree that the CUDA support is more intrusive and ends up in different 
> areas.  The problem is that the changes could not be simply isolated in a BTL.
> 
> 1. To support the direct movement of GPU buffers, we often utilize copying 
> into host memory and then out of host memory.   These copies have to be done 
> utilizing the cuMemcpy() functions rather than the memcpy() function.  This 
> is why some changes ended up in the opal datatype area.  Most of that copying 
> is driven by the convertor.

Each proc has a master_convertor, which is a specialized class of convertor 
used for the target processor(s), and this specialized convertor is cloned for 
every new request. Some of these convertors support heterogeneous operations, 
some checksums, and some just plain memcpy. CUDA could have become just another 
class of convertor, with its definition visible only in the SMCUDA component.

> 2. I added support for doing an asynchronous copy into and out of host 
> buffers.  This ended up touching datatype, PML, and BTL layers.

I'm not sure I understand the issue here, but what you do in the BTL is none of 
the PML's business as long as you respect the rules. For message logging we 
have to copy the data locally in addition to sending it over the network, and 
we do not serialize these two steps; they can progress simultaneously 
(supported by extra threads, eventually).
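To illustrate how much of this can stay inside the BTL, a rough sketch of 
asynchronous staging driven from the BTL progress loop; cuMemcpyAsync(), 
cuEventRecord() and cuEventQuery() are real CUDA driver calls, the rest 
(staged_frag_t and the two helpers) is hypothetical:

#include <cuda.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    void   *host_staging;           /* bounce buffer in host memory      */
    size_t  length;
    CUevent done;                   /* marks the end of the staging copy */
} staged_frag_t;

/* Start the device-to-host copy and return immediately; the network side
 * keeps progressing while the copy runs on its own stream. */
static int frag_stage_start(staged_frag_t *frag, CUdeviceptr gpu_src,
                            CUstream stream)
{
    if (CUDA_SUCCESS != cuMemcpyAsync((CUdeviceptr)(uintptr_t) frag->host_staging,
                                      gpu_src, frag->length, stream)) {
        return -1;
    }
    return CUDA_SUCCESS == cuEventRecord(frag->done, stream) ? 0 : -1;
}

/* Polled from the BTL progress function: once true, the fragment can be
 * handed to the regular send path without any PML involvement. */
static bool frag_stage_complete(staged_frag_t *frag)
{
    return CUDA_SUCCESS == cuEventQuery(frag->done);
}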

> 3. A GPU buffer may utilize a different protocol than a HOST buffer within a 
> BTL.  This required me to find different ways to direct which PML protocol to 
> use.  In addition, it is assumed that a BTL either supports RDMA or not.  
> There is no notion of support based on the type of buffer one is sending.

The change where the PML uses the flags on the endpoint instead of the ones on 
the BTL is a good one, allowing far more flexibility in the handling of the 
transfer protocols.
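Something along these lines, where the flag name and the structure below are 
purely hypothetical and only illustrate the per-peer decision:

/* Per-peer protocol selection: the flags live on the endpoint and can be
 * set, e.g., once the CUDA IPC handshake with that peer succeeds. */
#define SKETCH_FLAGS_GET 0x01    /* this particular peer supports RDMA get */

typedef struct {
    unsigned int endpoint_flags;
} sketch_peer_t;

static int use_rdma_protocol(const sketch_peer_t *peer)
{
    /* The same BTL can offer RGET to IPC-capable peers and fall back to
     * copy-in/copy-out for everybody else. */
    return (peer->endpoint_flags & SKETCH_FLAGS_GET) != 0;
}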

Based on my understanding of your code, you changed the AM tag used by the PML 
to send the message, something that could have been done just as easily in the 
BTL, for example by providing different paths for the PUT/GET operations based 
on the memory ownership. Basically, I could not figure out why the decision to 
use specialized RDMA (IPC-based) should be reported upstream to the PML instead 
of being a purely local decision in the BTL, exposed to the PML as a single 
PUT/GET interface.
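For the record, a rough sketch of what a purely BTL-local decision could look 
like; cuPointerGetAttribute() and CU_POINTER_ATTRIBUTE_MEMORY_TYPE are the real 
CUDA driver API, everything else (the endpoint structure and the two transfer 
paths) is hypothetical and stubbed out:

#include <cuda.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stubs standing in for the real endpoint state and transfer paths. */
typedef struct { bool cuda_ipc_ready; } sketch_endpoint_t;

static int put_via_cuda_ipc(sketch_endpoint_t *ep, const void *src,
                            void *dst, size_t len)
{ (void) ep; (void) src; (void) dst; (void) len; return 0; }

static int put_via_copy_in_copy_out(sketch_endpoint_t *ep, const void *src,
                                    void *dst, size_t len)
{ (void) ep; (void) src; (void) dst; (void) len; return 0; }

/* cuPointerGetAttribute() reports whether a pointer refers to device or
 * host memory. */
static bool is_gpu_memory(const void *ptr)
{
    CUmemorytype type = 0;
    return CUDA_SUCCESS == cuPointerGetAttribute(&type,
                               CU_POINTER_ATTRIBUTE_MEMORY_TYPE,
                               (CUdeviceptr)(uintptr_t) ptr)
           && CU_MEMORYTYPE_DEVICE == type;
}

/* The single PUT entry point the PML would see: the IPC-vs-staged decision
 * stays entirely inside the BTL. */
static int sketch_btl_put(sketch_endpoint_t *ep, const void *src,
                          void *dst, size_t len)
{
    if (ep->cuda_ipc_ready && is_gpu_memory(src)) {
        return put_via_cuda_ipc(ep, src, dst, len);      /* specialized RDMA */
    }
    return put_via_copy_in_copy_out(ep, src, dst, len);  /* regular path */
}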

  George.

> Therefore, to leverage much of the existing datatype and PML support, I had 
> to make changes in there.  Overall, I agree it is not ideal, but the best I 
> could come up with. 
> 
>> -----Original Message-----
>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of George
>> Bosilca
>> Sent: Friday, August 23, 2013 7:36 AM
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r29055 - in
>> trunk/ompi/mca: btl btl/smcuda common/cuda pml/ob1
>> 
>> Rolf,
>> 
>> On Aug 22, 2013, at 19:24 , Rolf vandeVaart <rvandeva...@nvidia.com> wrote:
>> 
>>> Hi George:
>>> 
>>> The reason it tainted the PML is because the CUDA IPC support makes use
>> of the large message RDMA protocol of the PML layer.  The smcuda btl starts
>> up, but does not initially support any large message RDMA (RGET,RPUT)
>> protocols.  Then when a GPU buffer is first accessed, the smcuda btl starts 
>> an
>> exchange of some control messages with its peer.  If they determine that
>> they can support CUDA IPC, then the smcuda calls up into the PML layer and
>> says it is OK to start using the large message RDMA.  This all happens in 
>> code
>> that is only compiled in if the user asks for CUDA-aware support.
>> 
>> The issue is not that it is compiled in only when CUDA support is enabled. The
>> problem is that it breaks the separation between the PML and the BTL.
>> Almost all BTLs did manage to implement highly optimized protocols under this
>> design without having to taint the PML. Very similarly to CUDA, I can cite the
>> CMA and KNEM support in the SM BTL. So I really wonder why the CUDA support is
>> so different, to the point where it had to go all over the place (convertor,
>> memory pool and PML)?
>> 
>>> The key requirement was I wanted to dynamically add the support for CUDA
>> IPC when the user first started accessing GPU buffers rather than during
>> MPI_Init.
>> 
>> Moving from BTL flags to endpoint-based flags is indeed a good thing. This is
>> something that should be done everywhere in the PML code, as it will allow
>> the BTL to support different behaviors based on the peer.
>> 
>> George.
>> 
>>> This is the best way I could figure out to accomplish this, but I am open to
>> other ideas.
>> 
>> 
>> 
>>> 
>>> Thanks,
>>> Rolf
>>> 
>>>> -----Original Message-----
>>>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of George
>>>> Bosilca
>>>> Sent: Thursday, August 22, 2013 11:32 AM
>>>> To: de...@open-mpi.org
>>>> Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r29055 - in
>>>> trunk/ompi/mca: btl btl/smcuda common/cuda pml/ob1
>>>> 
>>>> I'm not very keen on seeing BTL modifications tainting the PML. I
>>>> would have expected support for IPC between GPUs to be a BTL-level
>>>> decision, not a special path in the PML.
>>>> 
>>>> Is there a reason IPC support cannot be hidden down in the SMCUDA BTL?
>>>> 
>>>> Thanks,
>>>>  George.
>>>> 
>>>> On Aug 21, 2013, at 23:00 , svn-commit-mai...@open-mpi.org wrote:
>>>> 
>>>>> Author: rolfv (Rolf Vandevaart)
>>>>> Date: 2013-08-21 17:00:09 EDT (Wed, 21 Aug 2013) New Revision: 29055
>>>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/29055
>>>>> 
>>>>> Log:
>>>>> Fix support in smcuda btl so it does not blow up when there is no
>>>>> CUDA IPC
>>>> support between two GPUs. Also make it so CUDA IPC support is added
>>>> dynamically.
>>>>> Fixes ticket 3531.
>>>>> 
>>>>> Added:
>>>>> trunk/ompi/mca/btl/smcuda/README
>>>>> Text files modified:
>>>>> trunk/ompi/mca/btl/btl.h                         |     2
>>>>> trunk/ompi/mca/btl/smcuda/README                 |   113
>>>> ++++++++++++++++++++++
>>>>> trunk/ompi/mca/btl/smcuda/btl_smcuda.c           |   104
>>>> ++++++++++++++++++++
>>>>> trunk/ompi/mca/btl/smcuda/btl_smcuda.h           |    28 +++++
>>>>> trunk/ompi/mca/btl/smcuda/btl_smcuda_component.c |   200
>>>> +++++++++++++++++++++++++++++++++++++++
>>>>> trunk/ompi/mca/btl/smcuda/btl_smcuda_endpoint.h  |     5 +
>>>>> trunk/ompi/mca/common/cuda/common_cuda.c         |    29 +++++
>>>>> trunk/ompi/mca/common/cuda/common_cuda.h         |     3
>>>>> trunk/ompi/mca/pml/ob1/pml_ob1.c                 |    11 ++
>>>>> trunk/ompi/mca/pml/ob1/pml_ob1_cuda.c            |    42 ++++++++
>>>>> trunk/ompi/mca/pml/ob1/pml_ob1_recvreq.c         |     6
>>>>> 11 files changed, 535 insertions(+), 8 deletions(-)
>>>> 
