Re: [OMPI users] Not getting zero-copy with custom datatype

2024-04-29 Thread George Bosilca via users
For example, a vector where the stride and the blocklen are the same. There
is an optimizer in MPI_Type_commit that tries to reshape the datatype
description so that it represents the same memory layout but needs fewer
memcpy calls during pack/unpack.
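
To make the "described otherwise, but still contiguous" case concrete, here is a minimal sketch in plain C, with no MPI calls. It is not Open MPI's actual optimizer; the names block_t, vector_blocks and coalesce are hypothetical and exist only for this illustration. It expands a vector-style layout into (offset, length) blocks and merges blocks that touch end-to-start, assuming the blocks are sorted by offset and do not overlap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one block: byte offset and byte length. */
typedef struct {
    size_t offset;
    size_t length;
} block_t;

/* Expand a vector-style layout into explicit blocks: `count` blocks of
 * `blocklen` elements each, with block starts `stride` elements apart. */
static void vector_blocks(size_t count, size_t blocklen, size_t stride,
                          size_t elem_size, block_t *out)
{
    for (size_t i = 0; i < count; i++) {
        out[i].offset = i * stride * elem_size;
        out[i].length = blocklen * elem_size;
    }
}

/* Merge blocks that touch end-to-start, in place, and return the new
 * block count. Assumes the input is sorted by offset and non-overlapping. */
static size_t coalesce(block_t *blocks, size_t n)
{
    assert(n > 0);
    size_t last = 0;
    for (size_t i = 1; i < n; i++) {
        if (blocks[last].offset + blocks[last].length == blocks[i].offset) {
            blocks[last].length += blocks[i].length;  /* no gap: extend */
        } else {
            blocks[++last] = blocks[i];               /* gap: keep separate */
        }
    }
    return last + 1;
}
```

With 8-byte elements, 4 blocks of 3 elements at stride 3 collapse into a single 96-byte block, so the layout is contiguous even though it was described as a vector. The same blocks at stride 5 stay as 4 separate blocks, which is the non-contiguous case.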

What you are asking for can be done, but it is relatively complex, and the
benefit is not obvious, even though it sounds self-evident that zero copy
gives you better bandwidth. One of the peers would need to expose the entire
span of its memory layout to the other process (this is usually an expensive
operation), and would also need to provide the peer with the description of
its datatype (adding extra latency to the operation). With this information
the peer can then build the set of memcpy operations between its local
datatype and the remote datatype, and issue them all, resulting in a
zero-copy communication.
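
The matching step described above can be sketched in plain C as follows. This is an illustration under stated assumptions, not code from Open MPI: seg_t and copy_between_layouts are hypothetical names, both layouts are given as ordered (offset, length) segments covering the same total number of bytes, and both buffers live in one process so that memcpy can stand in for the cross-process copy. In a real same-node transfer each copy would instead go through a mechanism such as CMA.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical segment of a datatype layout: byte offset and byte length. */
typedef struct {
    size_t offset;
    size_t length;
} seg_t;

/* Walk the source and destination layouts in lockstep and issue one memcpy
 * per overlapping piece, with no intermediate packed buffer.
 * Returns the number of memcpy calls issued. */
static size_t copy_between_layouts(const unsigned char *src,
                                   const seg_t *src_segs, size_t n_src,
                                   unsigned char *dst,
                                   const seg_t *dst_segs, size_t n_dst)
{
    size_t i = 0, j = 0;        /* current segment on each side */
    size_t s_used = 0;          /* bytes consumed in src_segs[i] */
    size_t d_used = 0;          /* bytes consumed in dst_segs[j] */
    size_t copies = 0;

    while (i < n_src && j < n_dst) {
        size_t s_left = src_segs[i].length - s_used;
        size_t d_left = dst_segs[j].length - d_used;
        size_t len = s_left < d_left ? s_left : d_left;

        if (len > 0) {
            memcpy(dst + dst_segs[j].offset + d_used,
                   src + src_segs[i].offset + s_used, len);
            copies++;
        }
        s_used += len;
        d_used += len;
        /* Advance whichever side has exhausted its current segment. */
        if (s_used == src_segs[i].length) { i++; s_used = 0; }
        if (d_used == dst_segs[j].length) { j++; d_used = 0; }
    }
    return copies;
}
```

The number of copies grows with the number of segment boundaries on both sides, which is one way to see why the benefit is not obvious: a heavily fragmented pair of layouts turns one transfer into many small ones.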

  George.


On Fri, Apr 26, 2024 at 4:21 AM Pascal Boeschoten via users <
users@lists.open-mpi.org> wrote:

> Hello George,
>
> Thank you, that's good to know. I expected the datatype to be enough of a
> description of the memory layout, since it's all on the same machine.
> Would you be able to clarify what you mean with "can be seen as contiguous
> (even if described otherwise)"? In what way could it be described
> otherwise, but still be seen as contiguous?
>
> Thanks,
> Pascal Boeschoten
>
> On Tue, 23 Apr 2024 at 16:05, George Bosilca wrote:
>
>> zero copy does not work with non-contiguous datatypes (it would require
>> both processes to know the memory layout used by the peer). As long as the
>> memory layout described by the type can be seen as contiguous (even if
>> described otherwise), it should work just fine.
>>
>>   George.
>>
>> On Tue, Apr 23, 2024 at 10:02 AM Pascal Boeschoten via users <
>> users@lists.open-mpi.org> wrote:
>>
>>> Hello,
>>>
>>> I'm using a custom datatype created through MPI_Type_create_struct() to
>>> send data with a dynamic structure to another process on the same node over
>>> shared memory, and noticed it's much slower than expected.
>>>
>>> I ran a profile, and it looks like it's not using CMA zero-copy, falling
>>> back to using opal_generic_simple_pack()/opal_generic_simple_unpack().
>>> Simpler datatypes do seem to use zero-copy, using
>>> mca_btl_vader_get_cma(), so I don't think it's a configuration or system
>>> issue.
>>>
>>> I suspect it's because the struct datatype is not contiguous, i.e. the
>>> blocks of the struct have gaps between them.
>>> Is anyone able to confirm whether zero-copy with an MPI struct requires
>>> a contiguous data structure, and whether it has other requirements like the
>>> displacements being in ascending order, having homogeneous block
>>> datatypes/lengths, etc?
>>>
>>> I'm using OpenMPI 4.1.6.
>>>
>>> Thanks,
>>> Pascal Boeschoten
>>>
>>


Re: [OMPI users] Not getting zero-copy with custom datatype

2024-04-26 Thread Pascal Boeschoten via users
Hello George,

Thank you, that's good to know. I expected the datatype to be enough of a
description of the memory layout, since it's all on the same machine.
Would you be able to clarify what you mean with "can be seen as contiguous
(even if described otherwise)"? In what way could it be described
otherwise, but still be seen as contiguous?

Thanks,
Pascal Boeschoten

On Tue, 23 Apr 2024 at 16:05, George Bosilca wrote:

> zero copy does not work with non-contiguous datatypes (it would require
> both processes to know the memory layout used by the peer). As long as the
> memory layout described by the type can be seen as contiguous (even if
> described otherwise), it should work just fine.
>
>   George.
>
> On Tue, Apr 23, 2024 at 10:02 AM Pascal Boeschoten via users <
> users@lists.open-mpi.org> wrote:
>
>> Hello,
>>
>> I'm using a custom datatype created through MPI_Type_create_struct() to
>> send data with a dynamic structure to another process on the same node over
>> shared memory, and noticed it's much slower than expected.
>>
>> I ran a profile, and it looks like it's not using CMA zero-copy, falling
>> back to using opal_generic_simple_pack()/opal_generic_simple_unpack().
>> Simpler datatypes do seem to use zero-copy, using
>> mca_btl_vader_get_cma(), so I don't think it's a configuration or system
>> issue.
>>
>> I suspect it's because the struct datatype is not contiguous, i.e. the
>> blocks of the struct have gaps between them.
>> Is anyone able to confirm whether zero-copy with an MPI struct requires a
>> contiguous data structure, and whether it has other requirements like the
>> displacements being in ascending order, having homogeneous block
>> datatypes/lengths, etc?
>>
>> I'm using OpenMPI 4.1.6.
>>
>> Thanks,
>> Pascal Boeschoten
>>
>


Re: [OMPI users] Not getting zero-copy with custom datatype

2024-04-23 Thread George Bosilca via users
Zero copy does not work with non-contiguous datatypes (it would require
both processes to know the memory layout used by the peer). As long as the
memory layout described by the type can be seen as contiguous (even if
described otherwise), it should work just fine.

  George.

On Tue, Apr 23, 2024 at 10:02 AM Pascal Boeschoten via users <
users@lists.open-mpi.org> wrote:

> Hello,
>
> I'm using a custom datatype created through MPI_Type_create_struct() to
> send data with a dynamic structure to another process on the same node over
> shared memory, and noticed it's much slower than expected.
>
> I ran a profile, and it looks like it's not using CMA zero-copy, falling
> back to using opal_generic_simple_pack()/opal_generic_simple_unpack().
> Simpler datatypes do seem to use zero-copy, using mca_btl_vader_get_cma(),
> so I don't think it's a configuration or system issue.
>
> I suspect it's because the struct datatype is not contiguous, i.e. the
> blocks of the struct have gaps between them.
> Is anyone able to confirm whether zero-copy with an MPI struct requires a
> contiguous data structure, and whether it has other requirements like the
> displacements being in ascending order, having homogeneous block
> datatypes/lengths, etc?
>
> I'm using OpenMPI 4.1.6.
>
> Thanks,
> Pascal Boeschoten
>


[OMPI users] Not getting zero-copy with custom datatype

2024-04-23 Thread Pascal Boeschoten via users
Hello,

I'm using a custom datatype created through MPI_Type_create_struct() to
send data with a dynamic structure to another process on the same node over
shared memory, and noticed it's much slower than expected.

I ran a profile, and it looks like it's not using CMA zero-copy, falling
back to using opal_generic_simple_pack()/opal_generic_simple_unpack().
Simpler datatypes do seem to use zero-copy, using mca_btl_vader_get_cma(),
so I don't think it's a configuration or system issue.

I suspect it's because the struct datatype is not contiguous, i.e. the
blocks of the struct have gaps between them.
Is anyone able to confirm whether zero-copy with an MPI struct requires a
contiguous data structure, and whether it has other requirements like the
displacements being in ascending order, having homogeneous block
datatypes/lengths, etc?

I'm using OpenMPI 4.1.6.

Thanks,
Pascal Boeschoten