Re: [openib-general] [ANNOUNCE] Contribute RDS (Reliable Datagram Sockets) to OpenIB

Nitin Hande Mon, 14 Nov 2005 12:50:08 -0800

Michael Krause wrote:

At 01:02 PM 11/11/2005, Ranjit Pandit wrote:
On 11/11/05, Michael Krause <[EMAIL PROTECTED]> wrote:
> Please clarify the following which was in the document provided by Oracle.
>
> On page 3 of the RDS document, under the section "RDP Interface", the 2nd
> and 3rd paragraphs state:
>
>    * RDP does not guarantee that a datagram is delivered to the remote
>      application.
>    * It is up to the RDP client to deal with datagrams lost due to transport
>      failure or remote application failure.
>
> The HCA is still a fault domain with RDS - it does not address flushing data
> out of the HCA fault domain, nor does it sound like it ensures that CQE loss
> is recoverable.
>
> I do believe RDS will replay all of the sendmsg's that it believes are
> pending, but it has no way to determine if already sent sendmsgs were
> actually successfully delivered to the remote application unless it provides
> some level of resync of the outstanding sends not completed from an
> application's perspective as well as any state updated via RDMA operations
> which may occur without an explicit send operation to flush to a known
> state. I'm still trying to ascertain whether RDS completely recovers from
> HCA failure (assuming there is another HCA / path available) between the two
> endnodes.

RDS will replay the sends that are completed in error by the HCA,
which typically would happen if the current path fails or the remote
node/HCA dies.
Does this mean that the receiving RDS entity is responsible for dealing with duplicates?

I believe so...

A Send completion error does not mean that the receiving endnode did not receive the data for either IB or iWARP; it only indicates that the Send operation failed, which could be just a loss of the receive ACK with the Send completing on the receiver. Such a scenario would imply that RDS would have to comprehend what buffers have actually been consumed before retransmission, i.e. a resync is performed; otherwise one could receive duplicate data at the application layer, which can cause corruption or other problems depending on the application. (Tolerance will vary by application, thus the ULP must present consistent semantics to enable a broader set of applications than perhaps the initial targeted application to be supported.)
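
To make the lost-ACK scenario above concrete, here is a minimal sketch in Python. It is not RDS code: the Receiver class, the sequence numbers, and both replay functions are hypothetical names invented for illustration. It assumes the sender believes sends 1 and 2 are still pending, while send 1 actually completed on the receiver and only its ACK was lost.

```python
class Receiver:
    """Hypothetical receiving endnode; 'delivered' is what the application sees."""

    def __init__(self):
        self.delivered = []
        self.last_consumed = -1  # highest sequence number already consumed

    def receive(self, seq, payload):
        self.delivered.append((seq, payload))
        self.last_consumed = max(self.last_consumed, seq)


def replay_blind(pending, receiver):
    """Retransmit everything the sender believes is pending."""
    for seq, payload in pending:
        receiver.receive(seq, payload)


def replay_after_resync(pending, receiver):
    """Ask the receiver what it consumed first, then retransmit only the rest."""
    consumed = receiver.last_consumed  # the resync step (assumed to be available)
    for seq, payload in pending:
        if seq > consumed:
            receiver.receive(seq, payload)


def run(replay):
    """Sends 0 and 1 arrive, the ACK for 1 is lost, then the path fails."""
    receiver = Receiver()
    receiver.receive(0, "a")
    receiver.receive(1, "b")          # completed on the receiver, ACK never seen
    pending = [(1, "b"), (2, "c")]    # the sender's view of outstanding sends
    replay(pending, receiver)
    return [payload for _, payload in receiver.delivered]
```

With blind replay the application sees "b" twice; with the resync step it sees each payload once. How a real implementation would learn what the receiver consumed is exactly the open question in this thread.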

In the absence of any protocol-level ack (and regardless of protocol-level ack), it is the application which has to implement its own reliability. RDS becomes a passive channel passing packets back and forth, including duplicate packets. The responsibility then shifts to the application to figure out what is missing, what is duplicated, etc.
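
As a rough illustration of what "implement its own reliability" could look like on the receiving side, the Python sketch below tags each datagram with an application-level sequence number, drops duplicates, and reports gaps. The AppReceiver class and its method names are assumptions made for this example and are not part of RDS or any OpenIB API.

```python
class AppReceiver:
    """Hypothetical application-level filter sitting above a passive channel."""

    def __init__(self):
        self.next_expected = 0   # lowest sequence number not yet delivered
        self.out_of_order = {}   # seq -> payload, held until the gap is filled
        self.delivered = []      # payloads handed to the application, in order

    def on_datagram(self, seq, payload):
        # Anything already delivered or already buffered is a duplicate.
        if seq < self.next_expected or seq in self.out_of_order:
            return "duplicate"
        self.out_of_order[seq] = payload
        # Deliver as many in-order datagrams as are now available.
        while self.next_expected in self.out_of_order:
            self.delivered.append(self.out_of_order.pop(self.next_expected))
            self.next_expected += 1
        return "accepted"

    def missing(self):
        """Sequence numbers known to be missing below the highest one buffered."""
        if not self.out_of_order:
            return []
        highest = max(self.out_of_order)
        return [s for s in range(self.next_expected, highest)
                if s not in self.out_of_order]
```

A receiver like this can ask its peer to resend whatever missing() reports. That request protocol is left out here, and it is the part each application would have to design for itself.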


Thanks
Nitin

In case of a catastrophic error on the local HCA, subsequent sends will fail (for a certain time (session_time_wait)) as if there was no alternate path available at that time. On getting an error the application should discard any sends unacknowledged by its peer and take corrective action.
Unacknowledged by the peer means at the interconnect or the application level? Again, how is the receive buffer management handled?
After the time_wait is over, subsequent sends will initiate a brand new connection which could use the alternate HCA (if the path is available).
This is understood.
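
The failure handling described above can be sketched as a small state machine. This is only a model of the behaviour as stated in this thread, written in Python. The Session class, its methods, and the injected clock are hypothetical; only the name session_time_wait comes from the description above.

```python
class Session:
    """Hypothetical sender-side model of HCA failure, time-wait, and reconnect."""

    def __init__(self, hcas, session_time_wait, clock):
        self.hcas = list(hcas)                  # usable HCAs, in preference order
        self.session_time_wait = session_time_wait
        self.clock = clock                      # callable returning the current time
        self.active = self.hcas[0]
        self.failed_at = None
        self.unacked = []                       # sends not yet acknowledged by the peer

    def on_hca_failure(self):
        """Catastrophic local HCA error: drop the path and start the time-wait."""
        if self.active is not None:
            self.hcas.remove(self.active)
            self.active = None
        self.failed_at = self.clock()
        discarded, self.unacked = self.unacked, []
        return discarded                        # the application decides what to redo

    def send(self, payload):
        if self.active is None:
            # Sends fail during the time-wait as if no alternate path existed.
            if self.clock() - self.failed_at < self.session_time_wait:
                raise ConnectionError("in session_time_wait after HCA failure")
            if not self.hcas:
                raise ConnectionError("no alternate path available")
            self.active = self.hcas[0]          # brand new connection
            self.failed_at = None
        self.unacked.append(payload)
        return self.active
```

Passing the clock in as a callable keeps the sketch deterministic. Whether "unacknowledged" is judged at the interconnect or the application level is left open here, as it is in the question above.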

Mike


------------------------------------------------------------------------

_______________________________________________
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


