Michael Krause wrote:
At 01:01 PM 11/11/2005, Nitin Hande wrote:
Michael Krause wrote:
At 10:28 AM 11/9/2005, Rick Frank wrote:
Yes, the application is responsible for detecting lost msgs at the
application level - the transport cannot do this.
RDS does not guarantee that a message has been delivered to the
application - just that once the transport has accepted a msg it
will deliver the msg to the remote node in order without duplication
- dealing with retransmissions, etc. due to sporadic / intermittent
msg loss over the interconnect. If, after accepting the send, the
current path fails, then RDS will transparently fail over to
another path and, if required, will resend / send any already queued
msgs to the remote node - again ensuring that no msg is duplicated
and they are in order. This is no different than APM - with the
exception that RDS can do this across HCAs.
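To make the ordering and no-duplication claim concrete, here is a minimal
sketch of replay across a path failover. It is a toy model under stated
assumptions, not the RDS implementation: the names Receiver, Sender and
flaky_path are hypothetical, paths are plain callables, and acks are
cumulative sequence numbers.

```python
class Receiver:
    """Delivers payloads in sequence order and drops duplicates."""

    def __init__(self):
        self.next_seq = 0      # next sequence number we expect
        self.delivered = []    # payloads handed upward, in order

    def on_receive(self, seq, payload):
        if seq == self.next_seq:
            self.delivered.append(payload)
            self.next_seq += 1
        # Anything else is a duplicate or out of order and is dropped.
        return self.next_seq   # cumulative ack


class Sender:
    """Keeps accepted messages queued until acked; replays them after failover."""

    def __init__(self, paths):
        self.paths = list(paths)  # callables: path(seq, payload) -> ack
        self.active = 0           # index of the path currently in use
        self.next_seq = 0
        self.unacked = {}         # seq -> payload

    def send(self, payload):
        self.unacked[self.next_seq] = payload
        self.next_seq += 1
        self._flush()

    def _flush(self):
        while self.active < len(self.paths):
            try:
                for seq in sorted(self.unacked):
                    if seq not in self.unacked:
                        continue  # covered by an earlier cumulative ack
                    ack = self.paths[self.active](seq, self.unacked[seq])
                    for done in [s for s in self.unacked if s < ack]:
                        del self.unacked[done]
                return
            except ConnectionError:
                self.active += 1  # fail over and replay what is still unacked
        raise ConnectionError("no path left to the remote node")


def flaky_path(receiver, good_sends):
    """Hypothetical path: after `good_sends` acked sends, it delivers one more
    message, loses the ack, and then stays down."""
    state = {"left": good_sends, "down": False}

    def path(seq, payload):
        if state["down"]:
            raise ConnectionError("path down")
        ack = receiver.on_receive(seq, payload)
        if state["left"] == 0:
            state["down"] = True
            raise ConnectionError("ack lost")
        state["left"] -= 1
        return ack

    return path
```

With one flaky path and one healthy path, a message whose ack was lost is
replayed on the second path and the receiver drops the duplicate, so the
remote side still sees each message once and in order.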
The application - Oracle in this case - will deal with detecting a
catastrophic path failure - either due to a send that does not
arrive and/or a timed-out response, or a send failure returned from the
transport. If there is no network path to a remote node - it is
required that we remove the remote node from the operating cluster
to avoid what is commonly termed as a "split brain" condition -
otherwise known as a "partition in time".
BTW - in our case - the application failure domain logic is the same
whether we are using UDP / uDAPL / iTAPI / TCP / SCTP / etc.
Basically, if we cannot talk to a remote node - after some defined
period of time - we will remove the remote node from the cluster. In
this case the database will recover all the interesting state that
may have been maintained on the removed node - allowing the
remaining nodes to continue. If later on, communication to the
remote node is restored - it will be allowed to rejoin the cluster
and take on application load.
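The eviction and rejoin rule described above can be sketched as follows.
This is only an illustration under assumptions: ClusterMembership,
heard_from and evict_silent are hypothetical names, time is passed in
explicitly, and the database recovery step is left out.

```python
class ClusterMembership:
    """Tracks which nodes are in the cluster based on when we last heard from them."""

    def __init__(self, nodes, timeout_s):
        self.timeout_s = timeout_s                  # the "defined period of time"
        self.last_heard = {n: 0.0 for n in nodes}   # node -> last contact time
        self.members = set(nodes)

    def heard_from(self, node, now):
        """Record contact; a previously removed node rejoins the cluster."""
        self.last_heard[node] = now
        self.members.add(node)

    def evict_silent(self, now):
        """Remove every member that has been silent for longer than the timeout."""
        evicted = {n for n in self.members
                   if now - self.last_heard[n] > self.timeout_s}
        self.members -= evicted
        # Recovery of state held by the evicted nodes would be triggered here.
        return evicted
```

The same logic applies regardless of which transport reports the silence,
which is the point made above about UDP / uDAPL / iTAPI / TCP / SCTP.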
Please clarify the following which was in the document provided by
Oracle.
On page 3 of the RDS document, under the section "RDP Interface", the
2nd and 3rd paragraphs state:
* RDP does not guarantee that a datagram is delivered to the
remote application.
* It is up to the RDP client to deal with datagrams lost due to
transport failure or remote application failure.
The HCA is still a fault domain with RDS - it does not address
flushing data out of the HCA fault domain, nor does it sound like it
ensures that CQE loss is recoverable.
I do believe RDS will replay all of the sendmsg's that it believes
are pending, but it has no way to determine if already sent sendmsgs
were actually successfully delivered to the remote application unless
it provides some level of resync of the outstanding sends not
completed from an application's perspective as well as any state
updated via RDMA operations which may occur without an explicit send
operation to flush to a known state.
If RDS could define a mechanism that the application could use to
inform the sender to resync and replay on catastrophic failure, is
that a correct understanding of your suggestion?
I'm not suggesting anything at this point. I'm trying to reconcile the
documentation with the e-mail statements made by its proponents.
I'm still trying to ascertain whether RDS completely
recovers from HCA failure (assuming there is another HCA / path
available) between the two endnodes.
Reading the doc and the thread, it looks like we need src/dst ports
for multiplexing connections, seq/ack numbers for resyncing, and
some kind of window availability for flow control. Aren't we very
close to a TCP header?
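For illustration, the fields listed above could be packed like this. The
layout is purely hypothetical (field widths and order are assumptions, and
it is not the RDS wire format); a real TCP header is at least 20 bytes and
also carries a data offset, flags, a checksum and an urgent pointer.

```python
import struct

# Hypothetical layout, network byte order:
# src port (16 bits), dst port (16), seq (32), ack (32), window (16)
HEADER = struct.Struct("!HHIIH")


def pack_header(src_port, dst_port, seq, ack, window):
    """Serialize the five fields into HEADER.size bytes."""
    return HEADER.pack(src_port, dst_port, seq, ack, window)


def unpack_header(data):
    """Return (src_port, dst_port, seq, ack, window) from the start of data."""
    return HEADER.unpack(data[:HEADER.size])
```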
TCP, as implemented by most OSes, does not provide end-to-end delivery
confirmation to the application. Unless one ties TCP ACK to the
application's consumption of the
receive data, there is no method to ascertain that the application
really received the data. The application would be required to send
its own application-level acknowledgement. I believe the intent is for
applications to remain responsible for the end-to-end receipt of data
and that RDS and the interconnect are simply responsible for the
exchange at the lower levels.
Yes, a TCP ack only implies that the remote TCP stack has received the
data; it means nothing to the application. It is the application that
has to send an application-level ack to its peer.
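The difference between the two acks can be sketched as below. This is a toy
model with hypothetical names (Peer, AppSender, transport_deliver,
app_consume_one); it only shows that a transport-level ack says nothing
about whether the application has consumed the data.

```python
import queue


class Peer:
    """Receiving node: the transport buffers data; the app acks after consuming it."""

    def __init__(self):
        self.rx_buffer = queue.Queue()  # data accepted by the transport
        self.processed = []             # data the application has consumed

    def transport_deliver(self, msg_id, data):
        # The transport acks as soon as the data is buffered.
        self.rx_buffer.put((msg_id, data))
        return "transport-ack"

    def app_consume_one(self):
        # The application-level ack is produced only after consumption.
        msg_id, data = self.rx_buffer.get_nowait()
        self.processed.append(data)
        return msg_id


class AppSender:
    """Sending application: a message stays pending until the peer's app acks it."""

    def __init__(self, peer):
        self.peer = peer
        self.pending = {}  # msg_id -> data not yet acked by the remote application

    def send(self, msg_id, data):
        self.pending[msg_id] = data
        return self.peer.transport_deliver(msg_id, data)

    def on_app_ack(self, msg_id):
        self.pending.pop(msg_id, None)
```

If the remote application dies between transport_deliver and
app_consume_one, the message is still in pending on the sender, which is
what lets the application detect and handle the loss itself.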
Nitin
Mike
------------------------------------------------------------------------
_______________________________________________
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general