Hi list,

I'm currently working on IB bandwidth improvements, and maybe some of you can help me understand a few things. I'm trying to align every IB RDMA operation to 64 bytes, because unaligned operations can hurt performance anywhere from slightly to very badly, depending on the architecture.

So, I'm trying to understand the RDMA protocol (PUT and GET), and here is what I have understood so far:

* If we have only one BTL, RDMA is performed with a single GET operation; otherwise, we use multiple PUT operations. I can understand that the GET operation improves the asynchronous aspects. So why not always use GET operations?

* If mpi_leave_pinned is 0, things get stranger. We start a rendezvous (not RDMA) with a size equal to the eager limit, then we switch to RDMA because the remote peer asks for RDMA PUTs (even if btl_openib_flags does not include the PUT operation, by the way). Why this corner case? Why not start a normal RDMA transfer (especially since we switch back to RDMA afterwards)?

* The openib BTL has a "buffer alignment" parameter. Fantastic, just what I needed. Unfortunately, I can't see where it is used (and indeed performance is bad if my buffers are not aligned to 64 bytes). Am I missing something?

* I wrote a prototype that splits each GET operation in openib into two operations: a small one to correct the buffer alignment, and a big aligned one. It would certainly be better to perform the first one with a normal send/recv, but for the prototype, doing it inside the openib GET was simpler. Performance on unaligned buffers is much better (but this is just a prototype). Is anyone working on this right now, or should I pursue my effort to make it clean and stable?
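
To make the split in the last point concrete, here is a minimal sketch in C of the length computation only. It is not the prototype code: ALIGNMENT, struct split_get and split_for_alignment are names made up for illustration, and the sketch assumes alignment is judged on a single buffer address and that the target is 64 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ALIGNMENT 64u  /* assumed alignment target, in bytes */

/* Hypothetical descriptor for the two pieces of a split transfer. */
struct split_get {
    size_t head_len;  /* small unaligned piece at the start (0 if already aligned) */
    size_t body_len;  /* remainder, which starts on an ALIGNMENT boundary */
};

/* Compute how many bytes must be moved first so that the rest of the
 * transfer starts on an ALIGNMENT-byte boundary. */
static struct split_get split_for_alignment(uintptr_t addr, size_t len)
{
    struct split_get s;
    size_t misalign = (size_t)(addr % ALIGNMENT);
    size_t head = misalign ? ALIGNMENT - misalign : 0;

    /* A message shorter than the gap to the next boundary is all "head". */
    if (head > len) {
        head = len;
    }

    s.head_len = head;
    s.body_len = len - head;
    assert(s.head_len + s.body_len == len);
    return s;
}
```

For example, an address of 0x1010 with a length of 4096 bytes gives a 48-byte head followed by a 4048-byte aligned body. In this sketch the head is the piece that could go through a normal send/recv, and the body is the piece that would be issued as the aligned GET.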

Thanks in advance for any feedback,
Sylvain
