[EMAIL PROTECTED] wrote on Wed, 14 Jun 2006 15:21 +0200:
> If we did this within BMI, we would be paying an extra round trip 
> latency time for each large TCP message, which we should probably try to 
> avoid.
> 
> I vote for just changing the ordering of sys-io.sm so that it does not 
> post write flows until a positive write ack is received from the server. 
>  That is basically equivalent to performing one handshake (or 
> rendezvous) for the whole flow rather than one per BMI message.
> 
> The sys-io.sm state machine already does that for reads.  Read flows do 
> not get posted until an ack is received from the server.

I agree this is probably the best thing to do, though it may slow
things down.  Unfortunately, with stream semantics and no rendezvous
protocol in bmi_tcp, you're mostly stuck with that slowdown.  You
could go with two sockets per connection, data + control, but that
seems like hacking around the problem.  You could also switch to
SCTP and hope it gets widely adopted someday.  :)
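
For the record, the ordering being proposed looks roughly like the
sketch below in plain C.  The names are placeholders for
illustration, not the actual sys-io.sm states or PVFS2 calls:

    struct io_op;   /* opaque per-operation state */

    extern int post_request(struct io_op *op);    /* send the write request */
    extern int wait_for_ack(struct io_op *op);    /* block on the write ack */
    extern int post_write_flow(struct io_op *op); /* start the data flow    */

    /* One ack handshake per flow, then the data; the same shape the
     * read path already has. */
    static int do_write(struct io_op *op)
    {
        int ret;

        ret = post_request(op);      /* 1. request goes to the server   */
        if (ret < 0)
            return ret;

        ret = wait_for_ack(op);      /* 2. positive write ack required  */
        if (ret < 0)
            return ret;              /*    refused: no flow ever posted */

        return post_write_flow(op);  /* 3. only now post the write flow */
    }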

Some IB insight:  because it is legal in BMI to post a send before
the receiver has posted a receive, even for expected messages, and
because IB will break the connection if data arrives with no buffers
posted, we have to implement rendezvous and credit-based flow
control.  And to do big transfers well, we have to exchange RDMA
addresses before the transfer can go anyway.
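
To make the credit part concrete, the send side boils down to
something like this (placeholder names again, not the actual bmi_ib
code):

    #include <stddef.h>

    struct ib_connection;   /* opaque per-connection state */

    extern int  credits_available(struct ib_connection *c);
    extern void consume_credit(struct ib_connection *c);
    extern void wait_for_credit_refill(struct ib_connection *c);
    extern int  post_ib_send(struct ib_connection *c,
                             const void *buf, size_t len);

    /* Never post a send unless the peer has advertised a free receive
     * buffer; otherwise IB tears down the connection. */
    static int send_with_credits(struct ib_connection *c,
                                 const void *buf, size_t len)
    {
        while (!credits_available(c))
            wait_for_credit_refill(c);  /* peer reposts buffers and
                                           returns credits */

        consume_credit(c);   /* one credit == one posted receive buffer */
        return post_ib_send(c, buf, len);
    }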

So we won't see this problem with IB, and we can actually overlap
the write ack with the RDMA address exchange that follows, with
nothing breaking.  (There is a limit of 20 messages in flight per
connection, including RDMA address exchanges, unexpecteds, and
whatnot, but we're not approaching that.)
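
Concretely, the overlap just means the two pieces get posted without
waiting on each other, something like this (placeholder names once
more):

    struct ib_connection;   /* opaque per-connection state */
    struct io_op;           /* opaque per-operation state  */

    /* rendezvous send carrying our RDMA addresses */
    extern int post_rts(struct ib_connection *c, struct io_op *op);
    /* receive posted for the server's write ack */
    extern int post_ack_recv(struct ib_connection *c, struct io_op *op);
    extern int wait_for_both(struct io_op *op);

    /* The write ack and the RDMA address exchange travel in parallel
     * on the same connection, well under the 20-messages-in-flight
     * cap. */
    static int start_write(struct ib_connection *c, struct io_op *op)
    {
        int ret;

        ret = post_ack_recv(c, op);
        if (ret < 0)
            return ret;

        ret = post_rts(c, op);
        if (ret < 0)
            return ret;

        return wait_for_both(op);
    }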

I haven't timed the gains from this overlap, but once I see the
sys-io patch you need, I'll run some tests and see if there's any
detectable difference.

                -- Pete
