Soliciting input from the community:
WHAT: Modify PML cm component to remove unnecessary initializations,
optimizing blocking operations
WHY: Remove overhead in the fast path by allowing a "direct mode", which reduces
single-packet latency
HOW: In PML cm, a full request may be initialized even if the request starts
and ends within the scope of the blocking send/recv function. That request is
a structure of up to 488 bytes (not including the MTL request appendix size).
The request includes the ompi_request_t structure used by an underlying MTL
component, the converter that corresponds to the datatype, and other
parameters, some of which are stored and only used if the request is
asynchronous. This causes a significant number of writes, especially
considering that the send buffer could be as small as several bytes.
The proposed patch introduces a "direct mode" (currently enabled iff the
underlying MTL is "mxm", which is the only option I had available for
testing). When enabled, it cuts the initialization for blocking send and
receive operations down to the bare minimum required to function. Aside from
initializing only part of the request structure (fields like "dst" and "tag"
are passed again to the MTL_CALL macro anyway, rather than read from the
request struct), the function uses a single pre-allocated request buffer,
which is possible because the call is blocking. Our tests show that this
increases packet rate by approximately 20% with 8-byte buffers.
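To make the idea concrete, here is a minimal sketch of the difference between the full path and a direct path. It is not the actual patch: sketch_request_t, sketch_mtl_send, sketch_send_full and sketch_send_direct are hypothetical names, the field layout and sizes are placeholders rather than the real PML cm structures, and the sketch assumes a single thread so that one shared request can be reused safely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a full PML request. Field names and sizes are
 * illustrative only and do not match the real Open MPI structures. */
typedef struct {
    int         dst;
    int         tag;
    const void *buf;
    size_t      len;
    int         complete;
    char        async_state[448]; /* placeholder for state only async requests need */
} sketch_request_t;

/* Hypothetical MTL entry point. It receives dst/tag/buf/len as arguments,
 * so it never has to read them back from the request. Returns bytes "sent". */
static size_t sketch_mtl_send(sketch_request_t *req, int dst, int tag,
                              const void *buf, size_t len)
{
    (void)dst; (void)tag; (void)buf;
    req->complete = 1; /* pretend the transfer finished immediately */
    return len;
}

/* Full path: build and initialize an entire request for every call. */
size_t sketch_send_full(int dst, int tag, const void *buf, size_t len)
{
    sketch_request_t req;
    memset(&req, 0, sizeof(req)); /* writes the whole structure */
    req.dst = dst;
    req.tag = tag;
    req.buf = buf;
    req.len = len;
    return sketch_mtl_send(&req, req.dst, req.tag, req.buf, req.len);
}

/* Direct path: reuse one pre-allocated request and touch only the field the
 * blocking call needs. Assumes no concurrent callers. */
static sketch_request_t sketch_direct_req;

size_t sketch_send_direct(int dst, int tag, const void *buf, size_t len)
{
    sketch_direct_req.complete = 0; /* the only per-call write */
    size_t sent = sketch_mtl_send(&sketch_direct_req, dst, tag, buf, len);
    assert(sketch_direct_req.complete); /* blocking: done before we return */
    return sent;
}
```

Both functions return the same result for the same arguments; the direct variant simply skips the per-call writes to fields that a blocking operation never reads.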
Note that the "redundant" if-conditions in functions where direct mode is
irrelevant (e.g. recv_init) are removed by the compiler, since after macro
substitution the condition becomes a constant such as "if (0 == 0)".
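The constant-condition effect can be illustrated with a small sketch. The macro and function names below (SKETCH_REQUEST_INIT, sketch_recv_init, sketch_blocking_recv) are hypothetical and are not the macros used in the patch. The point is only that when the request kind is a literal at each call site, the test involving it is a compile-time constant that an optimizing compiler can fold away.

```c
#include <assert.h>

/* Hypothetical request kinds, known as literals at every call site. */
#define SKETCH_BLOCKING   0
#define SKETCH_PERSISTENT 1

/* Counts how often the expensive initialization actually runs. */
static int sketch_full_init_calls = 0;

static void sketch_full_init(void)
{
    ++sketch_full_init_calls;
}

/* Shared initialization macro: skip the full init only for blocking
 * requests when direct mode is on. `kind` is a literal, so the first
 * comparison is a constant expression after substitution. */
#define SKETCH_REQUEST_INIT(kind, direct_mode)                 \
    do {                                                       \
        if (SKETCH_BLOCKING != (kind) || !(direct_mode)) {     \
            sketch_full_init();                                \
        }                                                      \
    } while (0)

/* Persistent receive: expands to `if (0 != 1 || !direct_mode)`, which is
 * always true, so a compiler can drop the branch and call init directly. */
void sketch_recv_init(int direct_mode)
{
    SKETCH_REQUEST_INIT(SKETCH_PERSISTENT, direct_mode);
}

/* Blocking receive: expands to `if (0 != 0 || !direct_mode)`, leaving only
 * the runtime check on direct_mode. */
void sketch_blocking_recv(int direct_mode)
{
    SKETCH_REQUEST_INIT(SKETCH_BLOCKING, direct_mode);
}
```

With this shape, the persistent path pays nothing for the extra condition, while the blocking path pays a single runtime test on the direct-mode flag.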
WHERE: Most of the files in ompi/mca/pml/cm.
WHEN: ?
Joshua S. Ladd, PhD
HPC Algorithms Engineer
Mellanox Technologies
Email: [email protected]
Cell: +1 (865) 258 - 8898