Re: [OMPI devel] RDMA with ob1 and openib

2010-04-27 Thread George Bosilca
On Apr 27, 2010, at 10:20 , Sylvain Jeaugey wrote:

> Hi list,
> 
> I'm currently working on IB bandwidth improvements, and maybe some of you can 
> help me understand a few things. I'm trying to align every IB RDMA 
> operation to 64 bytes, because unaligned operations can hurt performance 
> anywhere from slightly to very badly, depending on the architecture.
> 
> So, I'm trying to understand the RDMA protocol (PUT and GET), and here is 
> what I understood :
> 
> * if we have one btl, RDMA is performed with only one GET operation, 
> otherwise, we use multiple PUT operations. I can understand that the GET 
> operation improves asynchronous aspects. So, why not always use GET 
> operations ?

Because nobody had the time to implement the pipelined GET protocol.

> * if mpi_leave_pinned is 0, things get stranger. We start a 
> rendez-vous (not RDMA) with a size equal to the eager limit, then we switch 
> to RDMA because the remote peer asks for RDMA PUTs (even if btl_openib_flags 
> does not have the PUT operation btw). Why this corner case ? Why not start 
> a normal RDMA (especially since we switch back to RDMA afterwards) ?

I guess you just found a bug. In fact the protocol is a little bit more 
complex: eager, RDMA and send/recv. A small amount of data is sent with the 
copy in/copy out (send/recv) protocol at the end of the buffer. Originally this 
was done on the data right after the eager fragment, but because of some "well 
known" issues on IB (something related to fork, Jeff can give you more details 
here) we moved it to the end.

> * the openib btl has a "buffer alignment" parameter. Fantastic, just what I 
> needed. Unfortunately, I can't see where it is used (and indeed performance 
> is bad if my buffers are not aligned to 64 bytes). Am I missing something ?

No comments ...

> * I did a prototype to split GET operations in openib into two operations : a 
> small one to correct buffer alignment and a big aligned one. It would 
> certainly be better to perform the first one with a normal send/recv, but for 
> the prototype, doing it inside the openib GET was simpler. Performance on 
> unaligned buffers is much better (but this is just a prototype). Is there 
> anyone working on this right now or should I pursue my effort to make it 
> clean and stable ?

This can easily be done internally in the IB BTL, without any support from the 
upper layer. What I would like to have is a more generic solution, as I think 
that all BTLs are impacted by unaligned buffers for RDMA operations. My idea 
is to modify the way we deal with the eager fragment, and be able to recompute 
the eager size based on the alignment we want for the next RMA operation. For 
IB it might be 64 bytes, but for SM it is 4K...
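
As a rough illustration of the eager-size idea above, here is a minimal sketch in C. It is not Open MPI code: the function name `aligned_eager_size` and its parameters are hypothetical, and it only assumes that the alignment is a power of two and that the eager fragment may be shortened but not enlarged. The eager size is rounded down so that the byte following the eager fragment sits on the requested boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an Open MPI function): pick how many bytes to
 * send eagerly so that the remainder of the buffer, which would go over
 * RDMA, starts on an `align`-byte boundary. */
static size_t aligned_eager_size(uintptr_t buf, size_t len,
                                 size_t eager_limit, size_t align)
{
    /* Assumption: alignment is a non-zero power of two (64 for IB, 4096 for SM). */
    assert(align != 0 && (align & (align - 1)) == 0);

    /* The whole message fits in the eager fragment: nothing left to align. */
    if (len <= eager_limit)
        return len;

    /* Address where the RDMA part would start with the default eager size. */
    uintptr_t end = buf + eager_limit;

    /* Round that address down to the previous aligned boundary. */
    uintptr_t aligned_end = end & ~(uintptr_t)(align - 1);

    /* No boundary falls inside the eager fragment: keep the default size. */
    if (aligned_end <= buf)
        return eager_limit;

    return (size_t)(aligned_end - buf);
}
```

For example, with a buffer at 0x1010, an eager limit of 12288 bytes and a 64-byte alignment, the sketch shortens the eager fragment to 12272 bytes so that the RDMA part starts at 0x4000.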

  george.

> 
> Thanks in advance for any feedback,
> Sylvain




[OMPI devel] RDMA with ob1 and openib

2010-04-27 Thread Sylvain Jeaugey

Hi list,

I'm currently working on IB bandwidth improvements, and maybe some of you 
can help me understand a few things. I'm trying to align every IB RDMA 
operation to 64 bytes, because unaligned operations can hurt performance 
anywhere from slightly to very badly, depending on the architecture.


So, I'm trying to understand the RDMA protocol (PUT and GET), and here is 
what I understood :


* if we have one btl, RDMA is performed with only one GET operation, 
otherwise, we use multiple PUT operations. I can understand that the GET 
operation improves asynchronous aspects. So, why not always use GET 
operations ?


* if mpi_leave_pinned is 0, things get stranger. We start a 
rendez-vous (not RDMA) with a size equal to the eager limit, then we 
switch to RDMA because the remote peer asks for RDMA PUTs (even if 
btl_openib_flags does not have the PUT operation btw). Why this corner 
case ? Why not start a normal RDMA (especially since we switch back to 
RDMA afterwards) ?


* the openib btl has a "buffer alignment" parameter. Fantastic, just what 
I needed. Unfortunately, I can't see where it is used (and indeed 
performance is bad if my buffers are not aligned to 64 bytes). Am I 
missing something ?


* I did a prototype to split GET operations in openib into two operations 
: a small one to correct buffer alignment and a big aligned one. It would 
certainly be better to perform the first one with a normal send/recv, but 
for the prototype, doing it inside the openib GET was simpler. Performance 
on unaligned buffers is much better (but this is just a prototype). Is 
there anyone working on this right now or should I pursue my effort to 
make it clean and stable ?
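
The split described in the last point can be sketched roughly as follows. This is not the prototype itself: `struct rdma_split` and `split_for_alignment` are hypothetical names, the alignment is assumed to be a power of two, and only one address is considered (a real GET would have to look at both the local and the remote address).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type for illustration; not part of the openib BTL. */
struct rdma_split {
    size_t head_len;  /* small unaligned piece up to the next boundary */
    size_t body_len;  /* remainder, which starts on an aligned address */
};

/* Split a transfer of `len` bytes starting at `addr` into a short head that
 * corrects the alignment and a large body that starts on an `align` boundary. */
static struct rdma_split split_for_alignment(uintptr_t addr, size_t len,
                                             size_t align)
{
    /* Assumption: alignment is a non-zero power of two (64 in the IB case). */
    assert(align != 0 && (align & (align - 1)) == 0);

    struct rdma_split s;
    size_t misalign = (size_t)(addr & (uintptr_t)(align - 1));

    /* Bytes needed to reach the next boundary (0 if already aligned). */
    size_t head = misalign ? align - misalign : 0;

    /* A very short message may end before the boundary is reached. */
    if (head > len)
        head = len;

    s.head_len = head;
    s.body_len = len - head;
    return s;
}
```

With addr = 0x1010, len = 1000 and align = 64, this gives a 48-byte head and a 952-byte body starting at 0x1040. The head could then be sent with a normal send/recv and the body with a single aligned GET.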


Thanks in advance for any feedback,
Sylvain