Steve --

Can you file a trac bug about this?


On May 10, 2007, at 6:15 PM, Steve Wise wrote:



There are two new issues so far:

1) This has uncovered a connection migration issue in the Chelsio
driver/firmware. We are developing and testing a fix for this now; it
should be ready tomorrow, hopefully.


I have a fix for the above issue and I can continue with OMPI testing.

To work around the client-must-send issue, I put a nice fat sleep in the
udapl btl right after it calls dat_cr_accept(), in
mca_btl_udapl_accept_connect().  This, however, exposes another issue
with the udapl btl:

Neither the client nor the server side of the udapl btl connection setup
pre-posts RECV buffers before connecting. This can allow a SEND to
arrive before a RECV buffer is available. I _think_ IB will handle this
issue by retransmitting the SEND.  Chelsio's iWARP device, however,
TERMINATEs the connection. My sleep() makes this condition happen every
time.

From what I can tell, the udapl btl exchanges memory info as a first
order of business after connection establishment
(mca_btl_udapl_sendrecv()). The RECV buffer post for this exchange,
however, should really be done _before_ the dat_ep_connect() on the
active side, and _before_ the dat_cr_accept() on the server side.
Currently it's done after the ESTABLISHED event is dequeued, thus
allowing the race condition.
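To make the race concrete, here is a minimal C sketch that models one endpoint's receive queue. It is a simulation under stated assumptions, not uDAPL code: sim_ep_t, sim_post_recv(), sim_establish() and sim_deliver_send() are hypothetical stand-ins for dat_ep_post_recv(), connection establishment, and an arriving SEND. The "no RECV posted means TERMINATE" rule models the Chelsio iWARP behavior described above, not IB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one endpoint's receive queue. These names are
 * illustrative only; they are not part of the DAT API or the udapl btl. */
#define SIM_MAX_RECVS 8

typedef enum { EP_IDLE, EP_CONNECTED, EP_TERMINATED } ep_state_t;

typedef struct {
    ep_state_t state;
    size_t posted_recvs;   /* RECV buffers currently available */
    size_t delivered;      /* SENDs successfully placed into a RECV buffer */
} sim_ep_t;

/* Post one RECV buffer (stands in for dat_ep_post_recv()). */
static bool sim_post_recv(sim_ep_t *ep)
{
    assert(ep != NULL);
    if (ep->state == EP_TERMINATED || ep->posted_recvs == SIM_MAX_RECVS)
        return false;
    ep->posted_recvs++;
    return true;
}

/* Connection comes up (stands in for dat_ep_connect()/dat_cr_accept()
 * completing with an ESTABLISHED event). */
static void sim_establish(sim_ep_t *ep)
{
    if (ep->state == EP_IDLE)
        ep->state = EP_CONNECTED;
}

/* A SEND arrives from the peer. Assumed iWARP-like behavior: with no RECV
 * buffer posted, the connection is terminated. */
static void sim_deliver_send(sim_ep_t *ep)
{
    if (ep->state != EP_CONNECTED)
        return;
    if (ep->posted_recvs == 0) {
        ep->state = EP_TERMINATED;
        return;
    }
    ep->posted_recvs--;
    ep->delivered++;
}

/* Current ordering: RECV posted only after ESTABLISHED is dequeued, so the
 * peer's memory-info SEND can win the race (the sleep() forces this). */
static ep_state_t run_post_after_established(void)
{
    sim_ep_t ep = { EP_IDLE, 0, 0 };
    sim_establish(&ep);
    sim_deliver_send(&ep);      /* peer's SEND arrives first */
    sim_post_recv(&ep);         /* too late: already terminated */
    return ep.state;
}

/* Proposed ordering: RECV posted before connect/accept. */
static ep_state_t run_post_before_connect(void)
{
    sim_ep_t ep = { EP_IDLE, 0, 0 };
    sim_post_recv(&ep);         /* pre-post for the memory-info exchange */
    sim_establish(&ep);
    sim_deliver_send(&ep);      /* finds a buffer */
    return ep.state;
}
```

With the RECV posted first, the SEND finds a buffer no matter how long the accepting side waits before dequeuing ESTABLISHED.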

I believe the rules are that the ULP must ensure a RECV is posted before
the client can post a SEND for that buffer. Furthermore, the ULP must
enforce flow control somehow so that a SEND never arrives without a RECV
buffer being available.
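One common way to satisfy that flow-control rule is credit-based flow control. The sketch below assumes that scheme; it is not necessarily what the udapl btl does or should do, and fc_sender_t and the fc_* functions are hypothetical. The sender starts with one credit per RECV the peer has pre-posted, spends one credit per SEND, queues messages when it has none, and drains the queue when the peer reports that it has re-posted buffers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical credit-based flow control: a SEND may only be issued while
 * the sender holds a credit, and each credit stands for one RECV buffer the
 * peer is known to have posted. Not an OMPI or DAT API. */
typedef struct {
    size_t credits;   /* peer RECV buffers known to be posted */
    size_t queued;    /* messages waiting for a credit */
    size_t sent;      /* SENDs actually put on the wire */
} fc_sender_t;

static void fc_init(fc_sender_t *s, size_t initial_credits)
{
    assert(s != NULL);
    s->credits = initial_credits;
    s->queued = 0;
    s->sent = 0;
}

/* Send now if a credit is available; otherwise queue the message. */
static bool fc_send(fc_sender_t *s)
{
    if (s->credits == 0) {
        s->queued++;
        return false;
    }
    s->credits--;
    s->sent++;
    return true;
}

/* The peer reports n re-posted RECV buffers (e.g. piggybacked on a reply);
 * use the returned credits to drain queued messages. */
static void fc_credits_returned(fc_sender_t *s, size_t n)
{
    s->credits += n;
    while (s->queued > 0 && s->credits > 0) {
        s->queued--;
        s->credits--;
        s->sent++;
    }
}
```

The design choice is that the sender never guesses: a SEND goes out only when a matching RECV is known to exist on the other side.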

Perhaps this is just a bug that my sleep() exposed.

Or is the uDAPL btl assuming the transport will deal with lack of RECV
buffer at the time a SEND arrives?

Also: given that there is _always_ a message exchange after connection
setup, we can change that exchange to address the client-must-send-first
issue...
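A rough sketch of how the existing post-connect exchange might be reordered, assuming the rule is simply that the active (client) side must put the first SEND on the wire. All names here (xchg_side_t, wire_t, xchg_on_established(), xchg_on_recv(), xchg_simulate()) are hypothetical. Only the ordering matters: both sides pre-post their RECV, the client sends its memory info as soon as it sees ESTABLISHED, and the server sends its info only in response to the client's message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch: none of these names exist in OMPI or the DAT API.
 * It only models the ordering of the post-connect memory-info exchange
 * under a "client must send first" rule. */
typedef enum { ROLE_ACTIVE, ROLE_PASSIVE } role_t;

typedef struct {
    role_t role;
    bool recv_posted;   /* RECV pre-posted before connect/accept */
    bool sent_info;     /* this side has sent its memory info */
    bool got_info;      /* this side has received the peer's info */
} xchg_side_t;

typedef struct {
    int sends;          /* SENDs put on the wire so far */
    role_t first;       /* role of the first SEND; valid once sends > 0 */
} wire_t;

static void wire_send(wire_t *w, role_t who)
{
    if (w->sends == 0)
        w->first = who;
    w->sends++;
}

/* ESTABLISHED dequeued: only the active side sends immediately. */
static bool xchg_on_established(xchg_side_t *s)
{
    assert(s->recv_posted);     /* the RECV must already be posted */
    if (s->role == ROLE_ACTIVE) {
        s->sent_info = true;
        return true;
    }
    return false;               /* passive side waits for the peer's SEND */
}

/* The peer's info landed in the pre-posted RECV; reply if not yet sent. */
static bool xchg_on_recv(xchg_side_t *s)
{
    s->got_info = true;
    if (!s->sent_info) {
        s->sent_info = true;
        return true;
    }
    return false;
}

/* Worst case for the rule: the server dequeues ESTABLISHED before the
 * client does. Returns the role of the first SEND on the wire. */
static role_t xchg_simulate(int *sends, bool *both_done)
{
    xchg_side_t client = { ROLE_ACTIVE, true, false, false };
    xchg_side_t server = { ROLE_PASSIVE, true, false, false };
    wire_t w = { 0, ROLE_PASSIVE };

    if (xchg_on_established(&server)) wire_send(&w, ROLE_PASSIVE);
    if (xchg_on_established(&client)) wire_send(&w, ROLE_ACTIVE);
    if (xchg_on_recv(&server))        wire_send(&w, ROLE_PASSIVE);
    if (xchg_on_recv(&client))        wire_send(&w, ROLE_ACTIVE);

    *sends = w.sends;
    *both_done = client.got_info && server.got_info;
    return w.first;
}
```

Because the server's SEND is triggered by the client's message, the server cannot send first even when it sees ESTABLISHED earlier, and the exchange still completes in two messages.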


Steve.




--
Jeff Squyres
Cisco Systems
