Re: [openib-general] [ANNOUNCE] Contribute RDS(ReliableDatagramSockets) to OpenIB

Nitin Hande Mon, 14 Nov 2005 12:50:18 -0800

Michael Krause wrote:

At 01:01 PM 11/11/2005, Nitin Hande wrote:
Michael Krause wrote:
At 10:28 AM 11/9/2005, Rick Frank wrote:
Yes, the application is responsible for detecting lost msgs at the application level - the transport can not do this.

RDS does not guarantee that a message has been delivered to the application - just that once the transport has accepted a msg it will deliver the msg to the remote node in order without duplication - dealing with retransmissions, etc. due to sporadic / intermittent msg loss over the interconnect. If after accepting the send the current path fails, then RDS will transparently fail over to another path and, if required, will resend / send any already queued msgs to the remote node - again ensuring that no msg is duplicated and they are in order. This is no different than APM - with the exception that RDS can do this across HCAs.

The application - Oracle in this case - will deal with detecting a catastrophic path failure - either due to a send that does not arrive and/or a timed-out response or send failure returned from the transport. If there is no network path to a remote node - it is required that we remove the remote node from the operating cluster to avoid what is commonly termed a "split brain" condition - otherwise known as a "partition in time".

BTW - in our case - the application failure domain logic is the same whether we are using UDP / uDAPL / iTAPI / TCP / SCTP / etc. Basically, if we can not talk to a remote node - after some defined period of time - we will remove the remote node from the cluster. In this case the database will recover all the interesting state that may have been maintained on the removed node - allowing the remaining nodes to continue. If later on, communication to the remote node is restored - it will be allowed to rejoin the cluster and take on application load.
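To make the "resend after failover, in order, without duplication" behaviour in this message concrete, here is a minimal sketch. It is not RDS code: the class names, the cumulative-ack scheme, and the in-memory "transport" are all assumptions chosen for illustration. It shows a sender that keeps accepted messages until they are acknowledged and replays them after a path failure, and a receiver that drops anything it has already delivered.

```python
class Sender:
    """Hypothetical sender: keeps every accepted message until it is acked."""

    def __init__(self):
        self.next_seq = 0
        self.unacked = {}  # seq -> payload, messages accepted but not yet acked

    def send(self, payload):
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = payload
        return seq, payload

    def on_ack(self, acked_through):
        # Cumulative ack: everything up to and including acked_through arrived.
        for seq in [s for s in self.unacked if s <= acked_through]:
            del self.unacked[seq]

    def replay(self):
        # After a path failure, resend everything still pending, oldest first.
        return sorted(self.unacked.items())


class Receiver:
    """Hypothetical receiver: delivers in order and drops duplicates."""

    def __init__(self):
        self.expected = 0
        self.delivered = []

    def on_message(self, seq, payload):
        if seq == self.expected:
            self.delivered.append(payload)
            self.expected += 1
        # seq < expected is a duplicate from a replay and is dropped;
        # seq > expected arrived early and is dropped until it is resent.
        return self.expected - 1  # cumulative ack (-1 if nothing delivered yet)
```

Note that `delivered` here only models hand-off to the remote node. Whether the remote application actually consumed the data is a separate question, which is the point being debated in the replies that follow.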
Please clarify the following which was in the document provided by Oracle.

On page 3 of the RDS document, under the section "RDP Interface", the 2nd and 3rd paragraphs state:

* RDP does not guarantee that a datagram is delivered to the remote application.
* It is up to the RDP client to deal with datagrams lost due to transport failure or remote application failure.

The HCA is still a fault domain with RDS - it does not address flushing data out of the HCA fault domain, nor does it sound like it ensures that CQE loss is recoverable.

I do believe RDS will replay all of the sendmsg's that it believes are pending, but it has no way to determine if already sent sendmsgs were actually successfully delivered to the remote application unless it provides some level of resync of the outstanding sends not completed from an application's perspective as well as any state updated via RDMA operations which may occur without an explicit send operation to flush to a known state.
If RDS could define a mechanism that the application could use to inform the sender to resync and replay on catastrophic failure, is that a correct understanding of your suggestion?
I'm not suggesting anything at this point. I'm trying to reconcile the documentation with the e-mail statements made by its proponents.
I'm still trying to ascertain whether RDS completely recovers from HCA failure (assuming there is another HCA / path available) between the two endnodes.
Reading the doc and the thread, it looks like we need src/dst port for multiplexing connections, we need seq/ack# for resyncing, we need some kind of window availability for flow control. Aren't we very close to a TCP header? ..
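For illustration, the fields listed above could be packed as follows. This is a hypothetical layout, not the RDS wire format and not something proposed in the thread: the field widths, the field order, and the name `Header` are assumptions. A real TCP header carries these same fields plus a data offset, flags, a checksum, and an urgent pointer.

```python
import struct
from typing import NamedTuple

# Network byte order: two 16-bit ports, 32-bit seq and ack, 16-bit window.
HEADER_FORMAT = "!HHIIH"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 14 bytes, no padding with "!"


class Header(NamedTuple):
    """Hypothetical header holding only the fields named in the message."""

    src_port: int  # multiplexing: which local endpoint sent this
    dst_port: int  # multiplexing: which remote endpoint receives it
    seq: int       # sequence number, used to resync after a failure
    ack: int       # highest sequence number received from the peer
    window: int    # receive credits still available, for flow control

    def pack(self) -> bytes:
        return struct.pack(HEADER_FORMAT, *self)

    @classmethod
    def unpack(cls, data: bytes) -> "Header":
        return cls(*struct.unpack(HEADER_FORMAT, data[:HEADER_SIZE]))
```

A round trip such as `Header.unpack(Header(4000, 4001, 7, 3, 64).pack())` returns the original tuple.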
TCP, as implemented by most OSes, does not provide end-to-end acknowledgement to the application. Unless one ties the TCP ACK to the application's consumption of the receive data, there is no method to ascertain that the application really received the data. The application would be required to send its own application-level acknowledgement. I believe the intent is for applications to remain responsible for the end-to-end receipt of data and that RDS and the interconnect are simply responsible for the exchange at the lower levels.

Yes, a TCP ack only implies that the peer's TCP stack has received the data, and means nothing to the application. It is the application which has to send an application-level ack to its peer.
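The distinction can be sketched like this, assuming a hypothetical `ReceivingNode` in which the transport buffers data and acknowledges it before the application has read anything. None of these names come from RDS or from any TCP stack.

```python
from collections import deque


class ReceivingNode:
    """Hypothetical receiver separating transport receipt from app consumption."""

    def __init__(self):
        self.socket_buffer = deque()  # data the transport has accepted
        self.consumed = []            # data the application has processed

    def transport_receive(self, msg_id, payload):
        # Transport-level ack: the data reached this node's buffer.
        # It says nothing about whether the application will ever read it.
        self.socket_buffer.append((msg_id, payload))
        return True

    def app_consume_one(self):
        # Application-level ack: returned only after the application has
        # taken the message out of the buffer and processed it.
        if not self.socket_buffer:
            return None
        msg_id, payload = self.socket_buffer.popleft()
        self.consumed.append(payload)
        return msg_id
```

If the receiving application fails between `transport_receive` and `app_consume_one`, the sender has seen a transport ack for data that was never consumed, which is why the sending application needs its own acknowledgement to know the message was handled.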


Nitin


Mike


------------------------------------------------------------------------

_______________________________________________
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

