At 01:24 PM 11/9/2005, Greg Lindahl wrote:
On Wed, Nov 09, 2005 at 12:18:28PM -0800, Michael Krause wrote:

> So, things like HCA failure are not transparent and one cannot simply
> replay the operations since you don't know what was really seen by the
> other side unless the application performs the resync itself.

I think you are over-stating the case. On the remote end, the kernel
piece of RDS knows what it presented to the remote application, ditto
on the local end. If only an HCA fails, and not the sending and
receiving kernels or applications, that knowledge is not lost.

Perhaps you were assuming that RDS would be implemented only in
firmware on the HCA, and there is no kernel piece that knows what's
going on. I hadn't seen that stated by anyone, and of course there are
several existing and contemplated OpenIB devices that are considerably
different from the usual offload engine. You could also choose to
implement RDS using an offload engine and still keep enough state in
the kernel to recover.

I hadn't assumed anything. I'm simply trying to understand the assertions concerning availability and recovery. What you indicate above is that RDS will implement a resync of the two sides of the association to determine what has been successfully sent. It will then retransmit whatever has not been successfully sent, transparently to the application. This implies that the reliability of the underlying interconnect isn't as critical per se, since the end-to-end RDS protocol will ensure that data is delivered to the RDS components in the face of hardware failures. Correct?
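
To make the question concrete, here is a minimal sketch of the kind of resync being discussed. It is only an illustration under stated assumptions, not RDS code: the class names (Sender, Receiver), the per-message sequence numbers, and the "last delivered" counter are all hypothetical, and nothing in this thread says RDS tracks state this way. It assumes both kernels survive the HCA failure, so the receiver still knows what it presented to its application and the sender still holds its unacknowledged messages.

```python
from collections import OrderedDict


class Receiver:
    """Hypothetical receiving side: remembers what it handed to the application."""

    def __init__(self):
        self.last_delivered = 0  # highest sequence number given to the application
        self.delivered = []      # payloads the application has seen, in order

    def on_message(self, seq, payload):
        # Deliver only the next in-order message; duplicates and gaps are dropped.
        if seq != self.last_delivered + 1:
            return False
        self.delivered.append(payload)
        self.last_delivered = seq
        return True


class Sender:
    """Hypothetical sending side: keeps every message until the peer confirms it."""

    def __init__(self):
        self.next_seq = 1
        self.unacked = OrderedDict()  # seq -> payload, in send order

    def send(self, payload, path):
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = payload
        path(seq, payload)  # 'path' stands in for whichever HCA is in use
        return seq

    def resync(self, peer_last_delivered, path):
        # Forget everything the peer says it already delivered...
        for seq in [s for s in self.unacked if s <= peer_last_delivered]:
            del self.unacked[seq]
        # ...and replay the rest, in order, over the surviving path.
        resent = []
        for seq, payload in self.unacked.items():
            path(seq, payload)
            resent.append(seq)
        return resent


def demo():
    rx, tx = Receiver(), Sender()

    def failed_path(seq, payload):
        pass  # models an HCA that silently lost the message

    tx.send("a", rx.on_message)
    tx.send("b", rx.on_message)
    tx.send("c", failed_path)  # lost in the failure
    tx.send("d", failed_path)  # lost in the failure
    resent = tx.resync(rx.last_delivered, rx.on_message)
    return resent, rx.delivered
```

In this sketch the application on either side never sees the failure: the sender replays only the messages the receiver has not delivered, and the receiver's in-order check discards anything it has already passed up.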

Mike
_______________________________________________
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general
