On Tue, 8 Nov 2011 06:36:03 -0800, Rolf vandeVaart <[email protected]>
wrote:
>> george.
>>
>>PS: Regarding the hand-copy instead of the memcpy, we tried to avoid using
>>memcpy in performance critical codes, especially when we know the size of
>>the data and the alignment. This relieves the compiler of adding ugly intrinsics,
>>allowing it to nicely pipeline to load/stores. Anyway, with both approaches
>>you will copy more data than needed for all BTLs except uGNI.
>
> I was looking at a case in a BTL I was working on where I actually need 64
> bytes (yes, bytes) as the remote key size as opposed to the current 16
> bytes (128 bits).
> Not sure how I can handle that yet. (I assume configure is my friend, but
> even in that case, all headers will need to carry around the extra data.)
>
I have been thinking about this a little bit. What I think should be done
(and I am sure George will disagree) is to allow BTLs to define how long a
segment is. The PML would then just memcpy the segments into the send
buffer (instead of copying each member).
For example, mca_btl_base_segment_t would become:

    struct mca_btl_base_segment_t {
        size_t seg_len;
    };

since the PML needs only the segment size and nothing else. Each BTL
would then define its own segment, like:

    struct mca_btl_ugni_segment_t {
        struct mca_btl_base_segment_t base;
        gni_mem_handle_t seg_key;
    };

and we would add:

    size_t btl_segment_len;

to mca_btl_base_module_t or to the base frag so the PML knows how much it
needs to copy.
This design would address George's criticism of the length of the seg_key
and also allow BTLs to do what they need to. It would require a memcpy, but
I disagree that this would slow the critical path. Even if it does, the cost
would be relatively minor (I think), and the flexibility is worth more in
the long run.
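To make the idea concrete, here is a rough sketch of how the pieces could fit together. Only mca_btl_base_segment_t, seg_len, seg_key and btl_segment_len come from the proposal above. Everything prefixed with example_ is hypothetical: the stand-in for gni_mem_handle_t, the BTL with a 64-byte key, the reduced module struct, and the pack helper are assumptions for illustration, not existing Open MPI code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Common header: the only field the PML would need to look at. */
struct mca_btl_base_segment_t {
    size_t seg_len;
};

/* Stand-in for gni_mem_handle_t (128 bits); the real type comes from
 * the uGNI headers. Hypothetical, for illustration only. */
typedef struct {
    uint64_t qword1;
    uint64_t qword2;
} example_mem_handle_t;

/* A BTL-specific segment: base first, then whatever the BTL needs. */
struct example_ugni_segment_t {
    struct mca_btl_base_segment_t base;
    example_mem_handle_t seg_key;
};

/* Hypothetical BTL that needs a 64-byte remote key. */
struct example_bigkey_segment_t {
    struct mca_btl_base_segment_t base;
    unsigned char seg_key[64];
};

/* Reduced stand-in for mca_btl_base_module_t, holding only the
 * proposed new field. */
struct example_btl_module_t {
    size_t btl_segment_len; /* sizeof() of this BTL's segment struct */
};

/* Hypothetical PML helper: treat the segments as opaque bytes and copy
 * `count` of them into the send buffer with a single memcpy.
 * Returns the number of bytes written, or 0 if they do not fit. */
static size_t example_pml_pack_segments(const struct example_btl_module_t *btl,
                                        const void *segments, size_t count,
                                        void *buf, size_t buf_len)
{
    assert(btl != NULL && segments != NULL && buf != NULL);

    /* Guard against overflow in btl_segment_len * count. */
    if (count != 0 && btl->btl_segment_len > buf_len / count) {
        return 0;
    }

    size_t total = btl->btl_segment_len * count;
    memcpy(buf, segments, total);
    return total;
}
```

The PML side never looks past base.seg_len, so the same helper serves a BTL with a 16-byte key and one with a 64-byte key. Each BTL only pays for the bytes it declares in btl_segment_len.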
-Nathan