RE: [openib-general] [ANNOUNCE] Contribute RDS(ReliableDatagramSockets) to OpenIB

2005-11-15 Thread Caitlin Bestler



 

> In the absence of any protocol-level ack (and regardless of protocol-level
> ack), it is the application which has to implement its own reliability.
> RDS becomes a passive channel passing packets back and forth, including
> duplicate packets. The responsibility then shifts to the application to
> figure out what is missing, what is duplicated, etc.
>
> This would seem at odds with earlier assertions that as long as there were
> another path to the endnode, RDS would transparently recover on behalf of
> the application.  I thought Oracle stated for their application that send
> failure would be interpreted as endnode failure and the peer cast out -
> perhaps I misread their usage model.  Other applications that might want
> to use RDS could be designed to deal with the associated faults, but if
> one has to deal with recovery / resync at the application layer, then that
> is quite a bit of work to perform in every application and is again at
> odds with the purpose of RDS, which is to move reliability to the
> interconnect to the extent possible, and to RDS, so that the UDP
> application does not need to take on this complex code and attempt to get
> it right.
   
I would agree that there isn't much point in defining a "reliable" datagram
service unless it is more reliable than an unreliable one.

To me that means that the transport should deal with all networking
problems other than a *total* failure to re-establish contact with the
remote end. That makes it basically equivalent to a point-to-point Reliable
Connection.

The biggest difference, and the justification for having something like
RDS, is to eliminate point-to-point flow control and allow it to be
replaced with ULP-based flow control that is not point-to-point. The
resources associated with tracking credits are where a lot of the overhead
inherent in multiple point-to-point connections comes from (that, and the
synchronization of that data over the network).
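
To make the tradeoff concrete, here is a minimal sketch of ULP-based credit
flow control over a datagram-style socket. Everything in it - the header
layout, the ULP_CREDIT message, the initial grant of 16 - is invented for
illustration and comes from neither the RDS proposal nor any IB/iWARP API:

/* Hypothetical ULP credit scheme over a datagram socket: the sender
 * consumes one credit per message and the receiver returns credits in an
 * application-level CREDIT message, so no per-connection transport-level
 * credit tracking is needed. */
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

enum { ULP_DATA = 1, ULP_CREDIT = 2 };

struct ulp_hdr {
    uint32_t type;      /* ULP_DATA or ULP_CREDIT */
    uint32_t credits;   /* credits returned (ULP_CREDIT only) */
};

static int peer_credits = 16;    /* initial grant, agreed out of band */

/* Send one datagram if a credit is available; otherwise the caller must
 * back off and wait for a ULP_CREDIT message (ULP flow control). */
int ulp_send(int fd, const struct sockaddr *peer, socklen_t plen,
             const void *payload, size_t len)
{
    char buf[sizeof(struct ulp_hdr) + 2048];
    struct ulp_hdr hdr = { ULP_DATA, 0 };

    if (peer_credits == 0 || len > 2048)
        return -1;               /* out of credits (or oversized payload) */
    memcpy(buf, &hdr, sizeof hdr);
    memcpy(buf + sizeof hdr, payload, len);
    if (sendto(fd, buf, sizeof hdr + len, 0, peer, plen) < 0)
        return -1;
    peer_credits--;
    return 0;
}

/* Called for each arriving ULP_CREDIT datagram from the peer. */
void ulp_credit_update(const struct ulp_hdr *hdr)
{
    if (hdr->type == ULP_CREDIT)
        peer_credits += hdr->credits;
}

Because the credits live in the ULP, one pool can be shared across many
peers, which is exactly the per-connection overhead the paragraph above
argues RDS removes.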
 
 

Re: [openib-general] [ANNOUNCE] Contribute RDS(ReliableDatagramSockets) to OpenIB

2005-11-15 Thread Michael Krause


At 12:49 PM 11/14/2005, Nitin Hande wrote:
Michael Krause wrote:
At 01:01 PM 11/11/2005, Nitin Hande wrote:
Michael Krause wrote:
At 10:28 AM 11/9/2005, Rick Frank wrote:
Yes, the application is
responsible for detecting lost msgs at the application level - the
transport can not do this.
 
RDS does not guarantee that a message has been delivered to the
application - just that once the transport has accepted a msg it will
deliver the msg to the remote node in order without duplication - dealing
with retransmissions, etc due to sporadic / intermittent msg loss over
the interconnect. If after accepting the send - the current path fails -
then RDS will transparently fail over to another path - and if required
will resend / send any already queued msgs to the remote node - again
ensuring that no msg is duplicated and they are in order.  This is
no different than APM - with the exception that RDS can do this across
HCAs.
 
The application - Oracle in this case - will deal with detecting a
catastrophic path failure - either due to a send that does not arrive,
a timed-out response, or a send failure returned from the transport. If
there is no network path to a remote node - it is required that we remove
the remote node from the operating cluster to avoid what is commonly
termed as a "split brain" condition - otherwise known as a
"partition in time".
 
BTW - in our case - the application failure domain logic is the same
whether we are using UDP /  uDAPL / iTAPI / TCP / SCTP / etc.
Basically, if we can not talk to a remote node - after some defined
period of time - we will remove the remote node from the cluster. In this
case the database will recover all the interesting state that may have
been maintained on the removed node - allowing the remaining nodes to
continue. If later on, communication to the remote node is restored - it
will be allowed to rejoin the cluster and take on application load.

Please clarify the following which was in the document provided by
Oracle.
On page 3 of the RDS document, under the section "RDP
Interface", the 2nd and 3rd paragraphs are state:
   * RDP does not guarantee that a datagram is delivered to the
remote application.
   * It is up to the RDP client to deal with datagrams lost due
to transport failure or remote application failure.
The HCA is still a fault domain with RDS - it does not address flushing
data out of the HCA fault domain, nor does it sound like it ensures that
CQE loss is recoverable.
I do believe RDS will replay all of the sendmsgs that it believes are
pending, but it has no way to determine if already-sent sendmsgs were
actually successfully delivered to the remote application unless it
provides some level of resync of the outstanding sends not completed from
an application's perspective, as well as any state updated via RDMA
operations which may occur without an explicit send operation to flush to
a known state.

If RDS could define a mechanism that the application could use to inform
the sender to resync and replay on catastrophic failure, is that a correct
understanding of your suggestion?
I'm not suggesting anything at this point. I'm trying to reconcile the
documentation with the e-mail statements made by its proponents.
I'm still trying to ascertain whether RDS completely recovers from HCA
failure (assuming there is another HCA / path available) between the two
endnodes.

Reading the doc and the thread, it looks like we need src/dst ports for
multiplexing connections, seq/ack numbers for resyncing, and some kind of
window availability for flow control. Aren't we very close to a TCP header?

TCP does not provide end-to-end delivery to the application as implemented
by most OSes. Unless one ties the TCP ACK to the application's consumption
of the receive data, there is no method to ascertain that the application
really received the data.  The application would be required to send its
own application-level acknowledgement.  I believe the intent is for
applications to remain responsible for the end-to-end receipt of data and
that RDS and the interconnect are simply responsible for the exchange at
the lower levels.

Yes, a TCP ack only implies that the stack has received the data, and means
nothing to the application. It is the application which has to send an
application-level ack to its peer.

TCP ACK was intended to be an end-to-end ACK, but implementations took it
to a lower-level ACK only.  A TCP stack linked into an application, as
demonstrated by multiple IHVs and research efforts, does provide an
end-to-end ACK and considerable performance improvements over traditional
network stack implementations.  Some claim it is more than good enough to
eliminate the need for protocol off-load / RDMA, which is true for many
applications (certainly for most Sockets, etc.) but not true when one takes
advantage of the RDMA comms paradigm, which has benefit for a number of
applications.
Mike
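
As a sketch of the application-level acknowledgement discussed above - the
end-to-end guarantee that a transport ACK (TCP or RDS) does not give - the
following illustrates one shape it could take. The struct and helper names
are hypothetical, not taken from RDS, Oracle's code, or any kernel source:

/* Application-level ack: the receiver echoes the sequence number only
 * after it has actually consumed the message. */
#include <stdint.h>
#include <sys/socket.h>

struct app_msg {
    uint32_t seq;        /* application sequence number */
    uint32_t is_ack;     /* 1 if this message is an app-level ack */
    char     payload[1024];
};

/* Receiver side: consume the payload, then acknowledge it. */
void consume_and_ack(int fd, const struct app_msg *m,
                     const struct sockaddr *peer, socklen_t plen)
{
    /* ...hand m->payload to the application here... */
    struct app_msg ack = { m->seq, 1, {0} };
    sendto(fd, &ack, sizeof ack, 0, peer, plen);   /* app-level ack */
}

/* Sender side: a message may be retired only when its app-level ack
 * arrives; a completed send alone is not proof of delivery. */
void handle_ack(const struct app_msg *m, void (*retire)(uint32_t seq))
{
    if (m->is_ack)
        retire(m->seq);
}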


Re: [openib-general] [ANNOUNCE] Contribute RDS(ReliableDatagramSockets) to OpenIB

2005-11-15 Thread Michael Krause


At 12:49 PM 11/14/2005, Nitin Hande wrote:
Michael Krause wrote:
At 01:02 PM 11/11/2005, Ranjit Pandit wrote:
On 11/11/05, Michael Krause <[EMAIL PROTECTED]> wrote:
> Please clarify the following which was in the document provided by
Oracle.
>
> On page 3 of the RDS document, under the section "RDP
Interface", the 2nd
> and 3rd paragraphs state:
>
>    * RDP does not guarantee that a datagram is
delivered to the remote
> application.
>    * It is up to the RDP client to deal with
datagrams lost due to transport
> failure or remote application failure.
>
> The HCA is still a fault domain with RDS - it does not address
flushing data
> out of the HCA fault domain, nor does it sound like it ensures that
CQE loss
> is recoverable.
>
> I do believe RDS will replay all of the sendmsgs that it believes
are
> pending, but it has no way to determine if already sent sendmsgs
were
> actually successfully delivered to the remote application unless it
provides
> some level of resync of the outstanding sends not completed from
an
> application's perspective as well as any state updated via RDMA
operations
> which may occur without an explicit send operation to flush to a
known
> state.  I'm still trying to ascertain whether RDS completely
recovers from
> HCA failure (assuming there is another HCA / path available) between
the two
> endnodes.
RDS will replay the sends that are completed in error by the HCA,
which typically would happen if the current path fails or the remote
node/HCA dies.
Does this mean that the receiving RDS entity is responsible for dealing
with duplicates?  I believe so...
A Send completion error does not mean that the receiving endnode did not
receive the data, for either IB or iWARP; it only indicates that the Send
operation failed, which could be just a loss of the receive ACK with the
Send completing on the receiver.  Such a scenario would imply that RDS
would have to comprehend what buffers have actually been consumed before
retransmission, i.e. that a resync is performed; else one could receive
duplicate data at the application layer, which can cause corruption or
other problems as a function of the application (tolerance will vary by
application, thus the ULP must present consistent semantics to enable a
broader set of applications than perhaps the initially targeted application
to be supported).

In the absence of any protocol-level ack (and regardless of protocol-level
ack), it is the application which has to implement its own reliability. RDS
becomes a passive channel passing packets back and forth, including
duplicate packets. The responsibility then shifts to the application to
figure out what is missing, what is duplicated, etc.

This would seem at odds with earlier assertions that as long as there were
another path to the endnode, RDS would transparently recover on behalf of
the application.  I thought Oracle stated for their application that send
failure would be interpreted as endnode failure and the peer cast out -
perhaps I misread their usage model.  Other applications that might want to
use RDS could be designed to deal with the associated faults, but if one
has to deal with recovery / resync at the application layer, then that is
quite a bit of work to perform in every application and is again at odds
with the purpose of RDS, which is to move reliability to the interconnect
to the extent possible, and to RDS, so that the UDP application does not
need to take on this complex code and attempt to get it right.
Mike

Thanks
Nitin


In case of a catastrophic error on the local HCA, subsequent sends will
fail (for a certain time (session_time_wait)) as if there was no alternate
path available at that time. On getting an error the application should
discard any sends unacknowledged by its peer and take corrective action.

Unacknowledged by the peer means at the interconnect or the application
level?  Again, how is the receive buffer management handled?

After the time_wait is over, subsequent sends will initiate a brand new
connection which could use the alternate HCA (if the path is available).
This is understood.
Mike
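
A minimal sketch of the receive-side duplicate suppression this exchange
keeps circling: if each sender numbers its messages, the receiver can drop
replayed duplicates after a failover. The structure and names below are
illustrative only; the RDS document does not specify this mechanism:

/* One way an RDS-like receiver could suppress duplicates after a failover
 * replay: since delivery is promised in order and without duplication,
 * tracking the highest sequence number already handed to the application
 * is enough to drop replays. */
#include <stdbool.h>
#include <stdint.h>

struct peer_state {
    uint64_t last_seq_delivered;   /* highest seq handed to the application */
};

/* Returns true if the message should be delivered, false if it is a
 * duplicate retransmitted by the sender's post-failover replay. */
bool accept_msg(struct peer_state *p, uint64_t seq)
{
    if (seq <= p->last_seq_delivered)
        return false;              /* duplicate from replay: drop silently */
    p->last_seq_delivered = seq;   /* in-order arrival assumed */
    return true;
}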


Re: [openib-general] [ANNOUNCE] Contribute RDS(ReliableDatagramSockets) to OpenIB

2005-11-14 Thread Nitin Hande

Michael Krause wrote:

At 01:01 PM 11/11/2005, Nitin Hande wrote:


Michael Krause wrote:


At 10:28 AM 11/9/2005, Rick Frank wrote:

Yes, the application is responsible for detecting lost msgs at the 
application level - the transport can not do this.
 
RDS does not guarantee that a message has been delivered to the 
application - just that once the transport has accepted a msg it 
will deliver the msg to the remote node in order without duplication 
- dealing with retransmissions, etc due to sporadic / intermittent 
msg loss over the interconnect. If after accepting the send - the 
current path fails - then RDS will transparently fail over to 
another path - and if required will resend / send any already queued 
msgs to the remote node - again ensuring that no msg is duplicated 
and they are in order.  This is no different than APM - with the 
exception that RDS can do this across HCAs.
 
The application - Oracle in this case - will deal with detecting a 
catastrophic path failure - either due to a send that does not 
arrive, a timed-out response, or a send failure returned from the 
transport. If there is no network path to a remote node - it is 
required that we remove the remote node from the operating cluster 
to avoid what is commonly termed as a "split brain" condition - 
otherwise known as a "partition in time".
 
BTW - in our case - the application failure domain logic is the same 
whether we are using UDP /  uDAPL / iTAPI / TCP / SCTP / etc. 
Basically, if we can not talk to a remote node - after some defined 
period of time - we will remove the remote node from the cluster. In 
this case the database will recover all the interesting state that 
may have been maintained on the removed node - allowing the 
remaining nodes to continue. If later on, communication to the 
remote node is restored - it will be allowed to rejoin the cluster 
and take on application load. 



Please clarify the following which was in the document provided by 
Oracle.
On page 3 of the RDS document, under the section "RDP Interface", the 
2nd and 3rd paragraphs state:
   * RDP does not guarantee that a datagram is delivered to the 
remote application.
   * It is up to the RDP client to deal with datagrams lost due to 
transport failure or remote application failure.
The HCA is still a fault domain with RDS - it does not address 
flushing data out of the HCA fault domain, nor does it sound like it 
ensures that CQE loss is recoverable.
I do believe RDS will replay all of the sendmsgs that it believes 
are pending, but it has no way to determine if already sent sendmsgs 
were actually successfully delivered to the remote application unless 
it provides some level of resync of the outstanding sends not 
completed from an application's perspective as well as any state 
updated via RDMA operations which may occur without an explicit send 
operation to flush to a known state.  


If RDS could define a mechanism that the application could use to 
inform the sender to resync and replay on catastrophic failure, is 
that a correct understanding of your suggestion?



I'm not suggesting anything at this point. I'm trying to reconcile the 
documentation with the e-mail statements made by its proponents.



I'm still trying to ascertain whether RDS completely

recovers from HCA failure (assuming there is another HCA / path 
available) between the two endnodes.


Reading the doc and the thread, it looks like we need src/dst ports for
multiplexing connections, seq/ack numbers for resyncing, and some kind of
window availability for flow control. Aren't we very close to a TCP
header?



TCP does not provide end-to-end delivery to the application as implemented
by most OSes. Unless one ties the TCP ACK to the application's consumption
of the receive data, there is no method to ascertain that the application
really received the data.  The application would be required to send its
own application-level acknowledgement.  I believe the intent is for
applications to remain responsible for the end-to-end receipt of data and
that RDS and the interconnect are simply responsible for the exchange at
the lower levels.

Yes, a TCP ack only implies that the stack has received the data, and means
nothing to the application. It is the application which has to send an
application-level ack to its peer.


Nitin



Mike






Re: [openib-general] [ANNOUNCE] Contribute RDS(ReliableDatagramSockets) to OpenIB

2005-11-14 Thread Nitin Hande

Michael Krause wrote:

At 01:02 PM 11/11/2005, Ranjit Pandit wrote:


On 11/11/05, Michael Krause <[EMAIL PROTECTED]> wrote:
> Please clarify the following which was in the document provided by 
Oracle.

>
> On page 3 of the RDS document, under the section "RDP Interface", 
the 2nd

> and 3rd paragraphs state:
>
>* RDP does not guarantee that a datagram is delivered to the remote
> application.
>* It is up to the RDP client to deal with datagrams lost due to 
transport

> failure or remote application failure.
>
> The HCA is still a fault domain with RDS - it does not address 
flushing data
> out of the HCA fault domain, nor does it sound like it ensures that 
CQE loss

> is recoverable.
>
> I do believe RDS will replay all of the sendmsgs that it believes are
> pending, but it has no way to determine if already sent sendmsgs were
> actually successfully delivered to the remote application unless it 
provides

> some level of resync of the outstanding sends not completed from an
> application's perspective as well as any state updated via RDMA 
operations

> which may occur without an explicit send operation to flush to a known
> state.  I'm still trying to ascertain whether RDS completely 
recovers from
> HCA failure (assuming there is another HCA / path available) between 
the two

> endnodes.

RDS will replay the sends that are completed in error by the HCA,
which typically would happen if the current path fails or the remote
node/HCA dies.



Does this mean that the receiving RDS entity is responsible for dealing 
with duplicates?  

I believe so...

A Send completion error does not mean that the receiving endnode did not
receive the data, for either IB or iWARP; it only indicates that the Send
operation failed, which could be just a loss of the receive ACK with the
Send completing on the receiver.  Such a scenario would imply that RDS
would have to comprehend what buffers have actually been consumed before
retransmission, i.e. that a resync is performed; else one could receive
duplicate data at the application layer, which can cause corruption or
other problems as a function of the application (tolerance will vary by
application, thus the ULP must present consistent semantics to enable a
broader set of applications than perhaps the initially targeted
application to be supported).
In the absence of any protocol-level ack (and regardless of protocol-level
ack), it is the application which has to implement its own reliability.
RDS becomes a passive channel passing packets back and forth, including
duplicate packets. The responsibility then shifts to the application to
figure out what is missing, what is duplicated, etc.


Thanks
Nitin




In case of a catastrophic error on the local HCA, subsequent sends
will fail (for a certain time (session_time_wait)) as if there was
no alternate path available at that time. On getting an error the
application should discard any sends unacknowledged by its peer and
take corrective action.



Unacknowledged by the peer means at the interconnect or the application 
level?  Again, how is the receive buffer management handled?


After the time_wait is over, subsequent sends will initiate a brand 
new connection which could use the alternate HCA (if the path is 
available).



This is understood.

Mike






Re: [openib-general] [ANNOUNCE] Contribute RDS(ReliableDatagramSockets) to OpenIB

2005-11-11 Thread Michael Krause


At 01:02 PM 11/11/2005, Ranjit Pandit wrote:
On 11/11/05, Michael Krause <[EMAIL PROTECTED]> wrote:
> Please clarify the following which was in the document provided by
Oracle.
>
> On page 3 of the RDS document, under the section "RDP
Interface", the 2nd
> and 3rd paragraphs state:
>
>    * RDP does not guarantee that a datagram is
delivered to the remote
> application.
>    * It is up to the RDP client to deal with
datagrams lost due to transport
> failure or remote application failure.
>
> The HCA is still a fault domain with RDS - it does not address
flushing data
> out of the HCA fault domain, nor does it sound like it ensures that
CQE loss
> is recoverable.
>
> I do believe RDS will replay all of the sendmsgs that it believes
are
> pending, but it has no way to determine if already sent sendmsgs
were
> actually successfully delivered to the remote application unless it
provides
> some level of resync of the outstanding sends not completed from
an
> application's perspective as well as any state updated via RDMA
operations
> which may occur without an explicit send operation to flush to a
known
> state.  I'm still trying to ascertain whether RDS completely
recovers from
> HCA failure (assuming there is another HCA / path available) between
the two
> endnodes.
RDS will replay the sends that are completed in error by the HCA,
which typically would happen if the current path fails or the remote
node/HCA dies.
Does this mean that the receiving RDS entity is responsible for dealing
with duplicates?  A Send completion error does not mean that the receiving
endnode did not receive the data, for either IB or iWARP; it only indicates
that the Send operation failed, which could be just a loss of the receive
ACK with the Send completing on the receiver.  Such a scenario would imply
that RDS would have to comprehend what buffers have actually been consumed
before retransmission, i.e. that a resync is performed; else one could
receive duplicate data at the application layer, which can cause corruption
or other problems as a function of the application (tolerance will vary by
application, thus the ULP must present consistent semantics to enable a
broader set of applications than perhaps the initially targeted application
to be supported).
In case of a catastrophic error on the local HCA, subsequent sends will
fail (for a certain time (session_time_wait)) as if there was no alternate
path available at that time. On getting an error the application should
discard any sends unacknowledged by its peer and take corrective action.

Unacknowledged by the peer means at the interconnect or the application
level?  Again, how is the receive buffer management handled?

After the time_wait is over, subsequent sends will initiate a brand new
connection which could use the alternate HCA (if the path is available).
This is understood.
Mike


Re: [openib-general] [ANNOUNCE] Contribute RDS(ReliableDatagramSockets) to OpenIB

2005-11-11 Thread Michael Krause


At 01:01 PM 11/11/2005, Nitin Hande wrote:
Michael Krause wrote:
At 10:28 AM 11/9/2005, Rick Frank wrote:
Yes, the application is
responsible for detecting lost msgs at the application level - the
transport can not do this.
 
RDS does not guarantee that a message has been delivered to the
application - just that once the transport has accepted a msg it will
deliver the msg to the remote node in order without duplication - dealing
with retransmissions, etc due to sporadic / intermittent msg loss over
the interconnect. If after accepting the send - the current path fails -
then RDS will transparently fail over to another path - and if required
will resend / send any already queued msgs to the remote node - again
ensuring that no msg is duplicated and they are in order.  This is
no different than APM - with the exception that RDS can do this across
HCAs.
 
The application - Oracle in this case - will deal with detecting a
catastrophic path failure - either due to a send that does not arrive,
a timed-out response, or a send failure returned from the transport. If
there is no network path to a remote node - it is required that we remove
the remote node from the operating cluster to avoid what is commonly
termed as a "split brain" condition - otherwise known as a
"partition in time".
 
BTW - in our case - the application failure domain logic is the same
whether we are using UDP /  uDAPL / iTAPI / TCP / SCTP / etc.
Basically, if we can not talk to a remote node - after some defined
period of time - we will remove the remote node from the cluster. In this
case the database will recover all the interesting state that may have
been maintained on the removed node - allowing the remaining nodes to
continue. If later on, communication to the remote node is restored - it
will be allowed to rejoin the cluster and take on application load.

Please clarify the following which was in the document provided by
Oracle.
On page 3 of the RDS document, under the section "RDP
Interface", the 2nd and 3rd paragraphs are state:
   * RDP does not guarantee that a datagram is delivered to the
remote application.
   * It is up to the RDP client to deal with datagrams lost due
to transport failure or remote application failure.
The HCA is still a fault domain with RDS - it does not address flushing
data out of the HCA fault domain, nor does it sound like it ensures that
CQE loss is recoverable.
I do believe RDS will replay all of the sendmsgs that it believes are
pending, but it has no way to determine if already sent sendmsgs were
actually successfully delivered to the remote application unless it
provides some level of resync of the outstanding sends not completed from
an application's perspective as well as any state updated via RDMA
operations which may occur without an explicit send operation to flush to
a known state.

If RDS could define a mechanism that the application could use to inform
the sender to resync and replay on catastrophic failure, is that a correct
understanding of your suggestion?
I'm not suggesting anything at this point. I'm trying to reconcile the
documentation with the e-mail statements made by its proponents.
I'm still trying to ascertain whether RDS completely recovers from HCA
failure (assuming there is another HCA / path available) between the two
endnodes.

Reading the doc and the thread, it looks like we need src/dst ports for
multiplexing connections, seq/ack numbers for resyncing, and some kind of
window availability for flow control. Aren't we very close to a TCP header?

TCP does not provide end-to-end delivery to the application as implemented
by most OSes. Unless one ties the TCP ACK to the application's consumption
of the receive data, there is no method to ascertain that the application
really received the data.  The application would be required to send its
own application-level acknowledgement.  I believe the intent is for
applications to remain responsible for the end-to-end receipt of data and
that RDS and the interconnect are simply responsible for the exchange at
the lower levels.
Mike


Re: [openib-general] [ANNOUNCE] Contribute RDS(ReliableDatagramSockets) to OpenIB

2005-11-11 Thread Ranjit Pandit
On 11/11/05, Michael Krause <[EMAIL PROTECTED]> wrote:
> Please clarify the following which was in the document provided by Oracle.
>
> On page 3 of the RDS document, under the section "RDP Interface", the 2nd
> and 3rd paragraphs state:
>
>* RDP does not guarantee that a datagram is delivered to the remote
> application.
>* It is up to the RDP client to deal with datagrams lost due to transport
> failure or remote application failure.
>
> The HCA is still a fault domain with RDS - it does not address flushing data
> out of the HCA fault domain, nor does it sound like it ensures that CQE loss
> is recoverable.
>
> I do believe RDS will replay all of the sendmsgs that it believes are
> pending, but it has no way to determine if already sent sendmsgs were
> actually successfully delivered to the remote application unless it provides
> some level of resync of the outstanding sends not completed from an
> application's perspective as well as any state updated via RDMA operations
> which may occur without an explicit send operation to flush to a known
> state.  I'm still trying to ascertain whether RDS completely recovers from
> HCA failure (assuming there is another HCA / path available) between the two
> endnodes.

RDS will replay the sends that are completed in error by the HCA,
which typically would happen if the current path fails or the remote
node/HCA dies.

In case of a catastrophic error on the local HCA, subsequent sends
will fail (for a certain time (session_time_wait)) as if there was
no alternate path available at that time.
On getting an error the application should discard any sends
unacknowledged by its peer and take corrective action.

After the time_wait is over, subsequent sends will initiate a brand
new connection which could use the alternate HCA (if the path is
available).
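
A sketch of the sender-side behavior described above, under the assumption
that session_time_wait is surfaced to the application simply as a period of
failing sends; the constant, the helper, and the recovery policy are all
illustrative, not part of the RDS contribution:

/* On a send error, discard sends not acknowledged at the application
 * level, wait out the failing window, and let the next send set up a new
 * connection (possibly over the alternate HCA). */
#include <errno.h>
#include <stddef.h>
#include <unistd.h>
#include <sys/socket.h>

#define SESSION_TIME_WAIT_SECS 15        /* placeholder, not from the spec */

extern void discard_unacked_sends(void); /* app-level bookkeeping, assumed */

int send_with_recovery(int fd, const void *buf, size_t len,
                       const struct sockaddr *peer, socklen_t plen)
{
    if (sendto(fd, buf, len, 0, peer, plen) >= 0)
        return 0;
    /* Catastrophic path error: sends keep failing until time_wait expires,
     * so treat unacked messages as lost and take corrective action. */
    discard_unacked_sends();
    sleep(SESSION_TIME_WAIT_SECS);
    /* This send initiates a brand-new connection, which may use the
     * alternate HCA if a path is available. */
    return sendto(fd, buf, len, 0, peer, plen) >= 0 ? 0 : -errno;
}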

>
> Mike
>


Re: [openib-general] [ANNOUNCE] Contribute RDS(ReliableDatagramSockets) to OpenIB

2005-11-11 Thread Nitin Hande

Michael Krause wrote:

At 10:28 AM 11/9/2005, Rick Frank wrote:

Yes, the application is responsible for detecting lost msgs at the 
application level - the transport can not do this.
 
RDS does not guarantee that a message has been delivered to the 
application - just that once the transport has accepted a msg it will 
deliver the msg to the remote node in order without duplication - 
dealing with retransmissions, etc due to sporadic / intermittent msg 
loss over the interconnect. If after accepting the send - the current 
path fails - then RDS will transparently fail over to another path - 
and if required will resend / send any already queued msgs to the 
remote node - again ensuring that no msg is duplicated and they are in 
order.  This is no different than APM - with the exception that RDS 
can do this across HCAs.
 
The application - Oracle in this case - will deal with detecting a 
catastrophic path failure - either due to a send that does not arrive 
a timed-out response, or a send failure returned from the 
transport. If there is no network path to a remote node - it is 
required that we remove the remote node from the operating cluster to 
avoid what is commonly termed as a "split brain" condition - otherwise 
known as a "partition in time".
 
BTW - in our case - the application failure domain logic is the same 
whether we are using UDP /  uDAPL / iTAPI / TCP / SCTP / etc. 
Basically, if we can not talk to a remote node - after some defined 
period of time - we will remove the remote node from the cluster. In 
this case the database will recover all the interesting state that may 
have been maintained on the removed node - allowing the remaining 
nodes to continue. If later on, communication to the remote node is 
restored - it will be allowed to rejoin the cluster and take on 
application load. 




Please clarify the following which was in the document provided by Oracle.

On page 3 of the RDS document, under the section "RDP Interface", the 
2nd and 3rd paragraphs state:


   * RDP does not guarantee that a datagram is delivered to the remote 
application.
   * It is up to the RDP client to deal with datagrams lost due to 
transport failure or remote application failure.


The HCA is still a fault domain with RDS - it does not address flushing 
data out of the HCA fault domain, nor does it sound like it ensures that 
CQE loss is recoverable.


I do believe RDS will replay all of the sendmsgs that it believes are 
pending, but it has no way to determine if already sent sendmsgs were 
actually successfully delivered to the remote application unless it 
provides some level of resync of the outstanding sends not completed 
from an application's perspective as well as any state updated via RDMA 
operations which may occur without an explicit send operation to flush 
to a known state.  
If RDS could define a mechanism that the application could use to 
inform the sender to resync and replay on catastrophic failure, is 
that a correct understanding of your suggestion?


I'm still trying to ascertain whether RDS completely
recovers from HCA failure (assuming there is another HCA / path 
available) between the two endnodes.
Reading the doc and the thread, it looks like we need src/dst ports for
multiplexing connections, seq/ack numbers for resyncing, and some kind of
window availability for flow control. Aren't we very close to a TCP
header?


Nitin



Mike






Re: [openib-general] [ANNOUNCE] Contribute RDS(ReliableDatagramSockets) to OpenIB

2005-11-11 Thread Michael Krause


At 10:28 AM 11/9/2005, Rick Frank wrote:

Yes, the application is responsible for detecting lost msgs at the
application level - the transport can not do this.
 
RDS does not guarantee that a message
has been delivered to the application - just that once the transport has
accepted a msg it will deliver the msg to the remote node in order
without duplication - dealing with retransmissions, etc due to sporadic /
intermittent msg loss over the interconnect. If after accepting the send
- the current path fails - then RDS will transparently fail over to
another path - and if required will resend / send any already queued msgs
to the remote node - again ensuring that no msg is duplicated and they
are in order.  This is no different than APM - with the exception
that RDS can do this across HCAs. 
 
The application - Oracle in this case -
will deal with detecting a catastrophic path failure - either due to a
send that does not arrive, a timed-out response, or a send failure
returned from the transport. If there is no network path to a remote node
- it is required that we remove the remote node from the operating
cluster to avoid what is commonly termed as a "split brain"
condition - otherwise known as a "partition in time".
 
BTW - in our case - the application
failure domain logic is the same whether we are using UDP /  uDAPL /
iTAPI / TCP / SCTP / etc. Basically, if we can not talk to a remote node
- after some defined period of time - we will remove the remote node from
the cluster. In this case the database will recover all the interesting
state that may have been maintained on the removed node - allowing the
remaining nodes to continue. If later on, communication to the remote
node is restored - it will be allowed to rejoin the cluster and take on
application load. 
Please clarify the following which was in the document provided by
Oracle. 
On page 3 of the RDS document, under the section "RDP
Interface", the 2nd and 3rd paragraphs are state: 
   * RDP does not guarantee that a datagram is delivered to the
remote application.
   * It is up to the RDP client to deal with datagrams lost due
to transport failure or remote application failure.
The HCA is still a fault domain with RDS - it does not address flushing
data out of the HCA fault domain, nor does it sound like it ensures that
CQE loss is recoverable.
I do believe RDS will replay all of the sendmsgs that it believes are
pending, but it has no way to determine if already sent sendmsgs were
actually successfully delivered to the remote application unless it
provides some level of resync of the outstanding sends not completed from
an application's perspective as well as any state updated via RDMA
operations which may occur without an explicit send operation to flush to
a known state.  I'm still trying to ascertain whether RDS completely
recovers from HCA failure (assuming there is another HCA / path
available) between the two endnodes.
Mike



RE: [openib-general] [ANNOUNCE] Contribute RDS ( ReliableDatagramSockets) to OpenIB

2005-11-10 Thread Caitlin Bestler



 

> My concern is the requirement that RDS resync the structures in the face
> of failure and know whether to re-transmit or will deal with duplicates.
> Having pre-posted buffers will help enable the resync to be accomplished
> but should not be equated with "pre-post equals one can deal with
> duplicates or will verify to prevent duplicates from occurring."
>
> Mike
   
The semantics should be that, barring an error, the flow between any two
endpoints is reliable and ordered.

The difference versus a normal point-to-point definition of reliable is
that a) lack of a receive buffer is an error, and b) the endpoint
communicates with many known remote peers (as opposed to one known remote
peer, or many unknown).

Having an API with those semantics, particularly as an upgrade in semantics
from SOCK_DGRAM while preserving SOCK_DGRAM syntax, is something that I
believe is of distinct value to many cluster-based applications. Further,
the API can be implemented in an offload device (IB or IP) more efficiently
than if it is simply implemented on top of SOCK_STREAM sockets by the
application.

Documenting and clarifying the semantics to make its general applicability
clearer should definitely be done, however.
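
As a sketch of that "SOCK_DGRAM syntax, upgraded semantics" point: the
application code is just the familiar socket/bind/sendto sequence, with the
stronger guarantees supplied underneath. The PF_RDS constant below is a
placeholder for this illustration - the proposal does not define the
family name or number:

/* Datagram syntax, upgraded semantics: one socket, many peers, but
 * reliable, ordered, non-duplicated delivery per peer, and a missing
 * receive buffer at the peer is an error rather than a silent drop. */
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef PF_RDS
#define PF_RDS 21    /* placeholder; the proposal defines no number */
#endif

int rds_example(struct sockaddr_in *me, struct sockaddr_in *peer)
{
    int fd = socket(PF_RDS, SOCK_DGRAM, 0);   /* plain SOCK_DGRAM syntax */
    if (fd < 0)
        return -1;
    if (bind(fd, (struct sockaddr *)me, sizeof *me) < 0) {
        close(fd);
        return -1;
    }
    const char msg[] = "hello";
    if (sendto(fd, msg, sizeof msg, 0,
               (struct sockaddr *)peer, sizeof *peer) < 0) {
        close(fd);
        return -1;                  /* e.g., no posted buffer at the peer */
    }
    return fd;   /* the same fd reaches any number of peers via sendto() */
}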
 

RE: [openib-general] [ANNOUNCE] Contribute RDS ( ReliableDatagramSockets) to OpenIB

2005-11-10 Thread Michael Krause


At 10:48 AM 11/10/2005, Caitlin Bestler wrote:
 

Mike Krause wrote in response to Greg Lindahl:


> If it is to be reasonably robust, then RDS should be required to support
> the resync between the two sides of the communication.  This aligns with
> the stated objective of implementing reliability in one location in
> software and one location in hardware.  Without such resync being
> required in the ULP, then one ends up with a ULP that falls short of its
> stated objectives and pushes complexity back up to the application, which
> is where the advocates have stated it is too complex or expensive to get
> it correct.

I haven't reread all of RDS fine print to double-check this, but my
impression is that RDS semantics exactly match the subset of MPI
point-to-point communications where the receiving rank is required
to have pre-posted buffers before the send is allowed.

My concern is the requirement that RDS resync the structures in the face
of failure and know whether to re-transmit or will deal with duplicates.
Having pre-posted buffers will help enable the resync to be accomplished
but should not be equated with "pre-post equals one can deal with
duplicates or will verify to prevent duplicates from occurring."
Mike
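
For concreteness, a sketch of the pre-posted receive pool being discussed,
with a comment marking exactly the gap Mike identifies: pre-posting
guarantees a landing spot, but it does not by itself detect replayed
duplicates. All names and sizes are illustrative:

/* Pre-posted receive pool in the MPI-like sense: the receiver posts a
 * fixed ring up front, so a sender holding a credit knows a buffer
 * exists. */
#include <stdlib.h>

#define RING_SIZE 64
#define BUF_SIZE  4096

struct recv_ring {
    void    *bufs[RING_SIZE];
    unsigned head;      /* next buffer to hand to the application */
};

int ring_init(struct recv_ring *r)
{
    for (unsigned i = 0; i < RING_SIZE; i++)
        if (!(r->bufs[i] = malloc(BUF_SIZE)))
            return -1;  /* in a real stack these would be posted to the HCA */
    r->head = 0;
    return 0;
}

/* Consuming a buffer is what frees a credit back to the sender; the ULP
 * still has to carry that credit (and a sequence check for duplicate
 * detection) in its own messages. */
void *ring_consume(struct recv_ring *r)
{
    void *b = r->bufs[r->head];
    r->head = (r->head + 1) % RING_SIZE;  /* slot re-posted after copy-out */
    return b;
}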


Re: [openib-general] [ANNOUNCE] Contribute RDS (ReliableDatagramSockets) to OpenIB

2005-11-10 Thread Rick Frank

Yes, this is the case.

- Original Message - 
From: "Caitlin Bestler" <[EMAIL PROTECTED]>

To: 
Sent: Thursday, November 10, 2005 1:48 PM
Subject: RE: [openib-general] [ANNOUNCE] Contribute RDS 
(ReliableDatagramSockets) to OpenIB






Mike Krause wrote in response to Greg Lindahl:

> If it is to be reasonably robust, then RDS should be required to support
> the resync between the two sides of the communication.  This aligns with
> the stated objective of implementing reliability in one location in
> software and one location in hardware.  Without such resync being
> required in the ULP, then one ends up with a ULP that falls short of its
> stated objectives and pushes complexity back up to the application, which
> is where the advocates have stated it is too complex or expensive to get
> it correct.

>> This sort of message service, by the way, has a long history in
>> distributed computing.

> Yep.

I haven't reread all of RDS fine print to double-check this, but my
impression is that RDS semantics exactly match the subset of MPI
point-to-point communications where the receiving rank is required
to have pre-posted buffers before the send is allowed.






RE: [openib-general] [ANNOUNCE] Contribute RDS ( ReliableDatagramSockets) to OpenIB

2005-11-10 Thread Caitlin Bestler
 


Mike Krause wrote in response to Greg Lindahl:


> If it is to be reasonably robust, then RDS should be required to support
> the resync between the two sides of the communication.  This aligns with
> the stated objective of implementing reliability in one location in
> software and one location in hardware.  Without such resync being
> required in the ULP, then one ends up with a ULP that falls short of its
> stated objectives and pushes complexity back up to the application, which
> is where the advocates have stated it is too complex or expensive to get
> it correct.




>> This sort of message service, by the way, has a long history in
>> distributed computing.


>   Yep.   


I haven't reread all of RDS fine print to double-check this, but my
impression is that RDS semantics exactly match the subset of MPI
point-to-point communications where the receiving rank is required
to have pre-posted buffers before the send is allowed.


 



Re: [openib-general] [ANNOUNCE] Contribute RDS (ReliableDatagramSockets) to OpenIB

2005-11-10 Thread Michael Krause


At 02:09 PM 11/9/2005, Greg Lindahl wrote:
> On Wed, Nov 09, 2005 at 01:57:06PM -0800, Michael Krause wrote:
> > What you indicate above is that RDS will implement a resync of the two
> > sides of the association to determine what has been successfully sent.
> More accurate to say that it "could" implement that. I'm just kibitzing
> on someone else's proposal.
> > This then implies that the reliability of the underlying interconnect
> > isn't as critical per se as the end-to-end RDS protocol will assure
> > that data is delivered to the RDS components in the face of hardware
> > failures.  Correct?
> Yes. That's the intent that I see in the proposal. The implementation
> required to actually support this may not be what the proposers had in
> mind.

If it is to be reasonably robust, then RDS should be required to support
the resync between the two sides of the communication.  This aligns with
the stated objective of implementing reliability in one location in
software and one location in hardware.  Without such resync being required
in the ULP, then one ends up with a ULP that falls short of its stated
objectives and pushes complexity back up to the application, which is
where the advocates have stated it is too complex or expensive to get it
correct.

> This sort of message service, by the way, has a long history in
> distributed computing.

Yep.
Mike


Re: [openib-general] [ANNOUNCE] Contribute RDS(ReliableDatagramSockets) to OpenIB

2005-11-09 Thread Grant Grundler
On Wed, Nov 09, 2005 at 12:45:17PM -0800, Caitlin Bestler wrote:
>  
...

Caitlin,
I'm having problems reading the quoting "style" too.
Please, can you take a look at "quotefix"?
http://home.in.tum.de/~jain/software/outlook-quotefix/

thanks,
grant



> 
> 
>   From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Michael Krause
>   Sent: Wednesday, November 09, 2005 12:21 PM
>   To: Rick Frank; Ranjit Pandit
>   Cc: openib-general@openib.org
>   Subject: Re: [openib-general] [ANNOUNCE] Contribute
> RDS(ReliableDatagramSockets) to OpenIB
>   
>   
> 
>   One could be able to talk to the remote node across
> other HCA but that does not mean one has an understanding of the state
> at the remote node unless the failure is noted and a resync of state
> occurs or the remote is able to deal with duplicates, etc.   This has
> nothing to do with API or the transport involved but, as Caitlin noted,
> the difference between knowing a send buffer is free vs. knowing that
> the application received the data requested.  Therefore, one has only
> reduced the reliability / robustness problem space to some extent but
> has not solved it by the use of RDS.
>   
>   
> 
> Correct. When there are point-to-point credits (even if only
> enforced/understood
> at the ULP) then the application can correctly infer that message N was
> successfully processed because the matching credit was restored. A
> transport
> neutral application can only communicate restoration of credits via ULP
> messaging. When credits are shared across sessions then the ULP
> has a much more complex task to properly communicate credits.
>  
> The proposal I presented at RAIT for multistreamed MPA had a
> non-highlighted
> option for a "wildcard" endpoint. Without the option multistream MPA is
> essentially
> the SCTP adaptation for RDMA running over plain MPA/TCP. It achieves the
> same reduction in reliable transport layer connections that RDS does,
> but
> does not reduce the number of RDMA endpoints. The wildcard option 
> reduces the number of RDMA endpoints as well, but greatly complicates
> the RDMA state machines. RDS over IB faces similar problems, but solved
> them slightly differently.
>  
> Over iWARP I believe these complexities favor keeping the point-to-point
> logical connection between QP and only reducing the number of L4 
> connections (from many TCP connections to a single TCP connection
> or SCTP association). The advantage of that approach is that the API
> from application to RDMA endpoint (QP) can be left totally unchanged.
> But I do not see any such option over IB, unless RD is improved or a
> new SCTP-like connection mode is defined.
>  
> In my opinion the multi-streaming is the most important feature here,
> but over IB I do not think there is a natural adaptation that provides
> multi-streaming without also adding the any-to-any endpoint semantics.
> Multistream MPA and SCTP can both support the any-to-any endpoint
> semantics by moving the source to payload information rather than
> transport information (by invoking "wildcard status" in MS-MPA or
> by duplicating the field for SCTP). So the RDS API strikes me as
> the best option for a transport neutral application. MS-MPA and SCTP
> reductions in transport overhead would be available without special
> API support.
>  



Re: [openib-general] [ANNOUNCE] Contribute RDS (ReliableDatagramSockets) to OpenIB

2005-11-09 Thread Ranjit Pandit
On 11/9/05, Greg Lindahl <[EMAIL PROTECTED]> wrote:
> On Wed, Nov 09, 2005 at 01:57:06PM -0800, Michael Krause wrote:
>
> > What you indicate above is that RDS
> > will implement a resync of the two sides of the association to determine
> > what has been successfully sent.
>
> More accurate to say that it "could" implement that. I'm just
> kibitzing on someone else's proposal.
>
> > This then implies that the reliability of the underlying
> > interconnect isn't as critical per se as the end-to-end RDS protocol
> > will assure that data is delivered to the RDS components in the face
> > of hardware failures.  Correct?
>
> Yes. That's the intent that I see in the proposal. The implementation
> required to actually support this may not be what the proposers had in
> mind.

The reference implementation of RDS already supports this.
It supports failover across HCAs just like APM does across ports within an HCA.

>
> This sort of message service, by the way, has a long history in
> distributed computing.
>
> -- greg


Re: [openib-general] [ANNOUNCE] Contribute RDS (ReliableDatagramSockets) to OpenIB

2005-11-09 Thread Ranjit Pandit
On 11/9/05, Michael Krause <[EMAIL PROTECTED]> wrote:

>  I hadn't assumed anything.  I'm simply trying to understand the assertions
> concerning availability and recovery.  What you indicate above is that RDS
> will implement a resync of the two sides of the association to determine
> what has been successfully sent.  It will then retransmit what has not
> been delivered, transparently to the application.  This then implies that the reliability of
> the underlying interconnect isn't as critical per se as the end-to-end RDS
> protocol will assure that data is delivered to the RDS components in the
> face of hardware failures.   Correct?
>
>  Mike

Correct.

Ranjit



Re: [openib-general] [ANNOUNCE] Contribute RDS (ReliableDatagramSockets) to OpenIB

2005-11-09 Thread Greg Lindahl
On Wed, Nov 09, 2005 at 01:57:06PM -0800, Michael Krause wrote:

> What you indicate above is that RDS 
> will implement a resync of the two sides of the association to determine 
> what has been successfully sent.

More accurate to say that it "could" implement that. I'm just
kibitzing on someone else's proposal.

> This then implies that the reliability of the underlying
> interconnect isn't as critical per se as the end-to-end RDS protocol
> will assure that data is delivered to the RDS components in the face
> of hardware failures.  Correct?

Yes. That's the intent that I see in the proposal. The implementation
required to actually support this may not be what the proposers had in
mind.

This sort of message service, by the way, has a long history in
distributed computing.

-- greg




Re: [openib-general] [ANNOUNCE] Contribute RDS (ReliableDatagramSockets) to OpenIB

2005-11-09 Thread Michael Krause


At 01:24 PM 11/9/2005, Greg Lindahl wrote:
> On Wed, Nov 09, 2005 at 12:18:28PM -0800, Michael Krause wrote:
> > So, things like HCA failure are not transparent and one cannot simply
> > replay the operations since you don't know what was really seen by the
> > other side unless the application performs the resync itself.
> I think you are over-stating the case. On the remote end, the kernel
> piece of RDS knows what it presented to the remote application, ditto on
> the local end. If only an HCA fails, and not the sending and receiving
> kernels or applications, that knowledge is not lost.
> Perhaps you were assuming that RDS would be implemented only in firmware
> on the HCA, and there is no kernel piece that knows what's going on. I
> hadn't seen that stated by anyone, and of course there are several
> existing and contemplated OpenIB devices that are considerably different
> from the usual offload engine. You could also choose to implement RDS
> using an offload engine and still keep enough state in the kernel to
> recover.

I hadn't assumed anything.  I'm simply trying to understand the assertions
concerning availability and recovery.  What you indicate above is that RDS
will implement a resync of the two sides of the association to determine
what has been successfully sent.  It will then retransmit what has not been
delivered, transparently to the application.  This then implies that the
reliability of the underlying interconnect isn't as critical per se, as the
end-to-end RDS protocol will assure that data is delivered to the RDS
components in the face of hardware failures.   Correct?
Mike


Re: [openib-general] [ANNOUNCE] Contribute RDS (ReliableDatagramSockets) to OpenIB

2005-11-09 Thread Greg Lindahl
On Wed, Nov 09, 2005 at 12:18:28PM -0800, Michael Krause wrote:

> So, things like HCA failure are not transparent and one cannot simply 
> replay the operations since you don't know what was really seen by the 
> other side unless the application performs the resync itself.

I think you are over-stating the case. On the remote end, the kernel
piece of RDS knows what it presented to the remote application, ditto
on the local end. If only an HCA fails, and not the sending and
receiving kernels or applications, that knowledge is not lost.

Perhaps you were assuming that RDS would be implemented only in
firmware on the HCA, and there is no kernel piece that knows what's
going on. I hadn't seen that stated by anyone, and of course there are
several existing and contemplated OpenIB devices that are considerably
different from the usual offload engine. You could also choose to
implement RDS using an offload engine and still keep enough state in
the kernel to recover.

-- greg
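
A sketch of the kernel-resident per-peer state this argument depends on:
if the structures below live in host memory rather than on the HCA, an HCA
failure destroys no protocol state and the unacked queue can simply be
replayed down a new path. The structures and the replay helper are
hypothetical, not drawn from any RDS implementation:

/* Per-peer state kept by the kernel piece of an RDS-like layer. */
#include <stddef.h>
#include <stdint.h>

struct pending_send {
    uint64_t seq;
    void    *data;
    size_t   len;
    struct pending_send *next;
};

struct rds_peer {
    uint64_t next_tx_seq;         /* next sequence number to assign */
    uint64_t last_rx_delivered;   /* highest seq handed to the local app */
    struct pending_send *unacked; /* survives HCA death; replayed on failover */
    int      active_hca;          /* index of the HCA currently in use */
};

/* On HCA failure: switch paths and replay everything still unacked; the
 * receiver drops any duplicates by sequence number. */
void rds_failover(struct rds_peer *p, int new_hca,
                  void (*tx)(int hca, const struct pending_send *))
{
    p->active_hca = new_hca;
    for (const struct pending_send *s = p->unacked; s; s = s->next)
        tx(new_hca, s);
}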



RE: [openib-general] [ANNOUNCE] Contribute RDS(ReliableDatagramSockets) to OpenIB

2005-11-09 Thread Caitlin Bestler



 

  
  
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Michael Krause
Sent: Wednesday, November 09, 2005 12:21 PM
To: Rick Frank; Ranjit Pandit
Cc: openib-general@openib.org
Subject: Re: [openib-general] [ANNOUNCE] Contribute RDS(ReliableDatagramSockets) to OpenIB

One could be able to talk to the remote node across another HCA, but that
does not mean one has an understanding of the state at the remote node
unless the failure is noted and a resync of state occurs, or the remote is
able to deal with duplicates, etc.  This has nothing to do with the API or
the transport involved but, as Caitlin noted, the difference between
knowing a send buffer is free vs. knowing that the application received
the data requested.  Therefore, one has only reduced the reliability /
robustness problem space to some extent but has not solved it by the use
of RDS.
Correct. When there are point-to-point credits (even if only
enforced/understood at the ULP) then the application can correctly infer
that message N was successfully processed, because the matching credit
was restored. A transport-neutral application can only communicate
restoration of credits via ULP messaging. When credits are shared across
sessions the ULP has a much more complex task to properly communicate
credits.
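[A minimal sketch of what such ULP-level credit tracking amounts to on
the send side; the names are illustrative and nothing below comes from
the RDS draft:]

    #include <stdbool.h>
    #include <stdint.h>

    /* One credit == one receive buffer the peer has posted for us. A
     * credit only comes back after the peer consumed the matching
     * message, so its return is exactly the "message N was processed"
     * inference described above. */
    struct ulp_credits {
        uint32_t avail;
    };

    static bool ulp_may_send(struct ulp_credits *c)
    {
        if (c->avail == 0)
            return false;        /* no peer buffer: hold the message */
        c->avail--;
        return true;
    }

    /* Called when the peer's ULP messaging piggybacks returned credits. */
    static void ulp_credits_returned(struct ulp_credits *c, uint32_t n)
    {
        c->avail += n;
    }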
 
The proposal I presented at RAIT for multistreamed MPA had a
non-highlighted option for a "wildcard" endpoint. Without the option,
multistream MPA is essentially the SCTP adaptation for RDMA running over
plain MPA/TCP. It achieves the same reduction in reliable transport-layer
connections that RDS does, but does not reduce the number of RDMA
endpoints. The wildcard option reduces the number of RDMA endpoints as
well, but greatly complicates the RDMA state machines. RDS over IB faces
similar problems, but solves them slightly differently.

Over iWARP I believe these complexities favor keeping the point-to-point
logical connection between QPs and only reducing the number of L4
connections (from many TCP connections to a single TCP connection or
SCTP association). The advantage of that approach is that the API from
application to RDMA endpoint (QP) can be left totally unchanged. But I
do not see any such option over IB, unless RD is improved or a new
SCTP-like connection mode is defined.

In my opinion the multi-streaming is the most important feature here,
but over IB I do not think there is a natural adaptation that provides
multi-streaming without also adding the any-to-any endpoint semantics.
Multistream MPA and SCTP can both support the any-to-any endpoint
semantics by moving the source into payload information rather than
transport information (by invoking "wildcard status" in MS-MPA or by
duplicating the field for SCTP). So the RDS API strikes me as the best
option for a transport-neutral application. MS-MPA and SCTP reductions
in transport overhead would be available without special API support.
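[To make the payload-carried source concrete: a sketch of the kind of
per-datagram ULP header this implies when many logical endpoints share
one reliable connection. Field names and widths are illustrative, not
taken from MS-MPA, SCTP, or the RDS draft:]

    #include <stdint.h>

    /* The underlying RC/TCP connection identifies only the two hosts,
     * so the logical source and destination endpoints ride in the
     * payload, ahead of the user data. */
    struct dgram_ulp_hdr {
        uint16_t src_port;  /* logical sending endpoint on source host */
        uint16_t dst_port;  /* logical receiving endpoint on the peer */
        uint32_t len;       /* payload bytes following this header */
        uint64_t seq;       /* per-flow ordering / duplicate suppression */
    };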
 
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

Re: [openib-general] [ANNOUNCE] Contribute RDS(ReliableDatagramSockets) to OpenIB

2005-11-09 Thread Michael Krause


At 10:28 AM 11/9/2005, Rick Frank wrote:

Yes, the application is responsible for detecting lost msgs at the
application level - the transport can not do this.

RDS does not guarantee that a message has been delivered to the
application - just that once the transport has accepted a msg it will
deliver the msg to the remote node in order without duplication -
dealing with retransmissions, etc. due to sporadic / intermittent msg
loss over the interconnect. If, after accepting the send, the current
path fails, then RDS will transparently fail over to another path - and
if required will resend / send any already queued msgs to the remote
node - again ensuring that no msg is duplicated and they are in order.
This is no different than APM - with the exception that RDS can do this
across HCAs.

The application - Oracle in this case - will deal with detecting a
catastrophic path failure - either due to a send that does not arrive
and/or a timed-out response or send failure returned from the transport.
If there is no network path to a remote node, it is required that we
remove the remote node from the operating cluster to avoid what is
commonly termed a "split brain" condition - otherwise known as a
"partition in time".

BTW - in our case - the application failure domain logic is the same
whether we are using UDP / uDAPL / iTAPI / TCP / SCTP / etc. Basically,
if we can not talk to a remote node after some defined period of time,
we will remove the remote node from the cluster. In this case the
database will recover all the interesting state that may have been
maintained on the removed node - allowing the remaining nodes to
continue. If, later on, communication to the remote node is restored, it
will be allowed to rejoin the cluster and take on application load.
One could be able to talk to the remote node across another HCA, but
that does not mean one has an understanding of the state at the remote
node unless the failure is noted and a resync of state occurs, or the
remote is able to deal with duplicates, etc.  This has nothing to do
with API or the transport involved but, as Caitlin noted, the difference
between knowing a send buffer is free vs. knowing that the application
received the data requested.  Therefore, one has only reduced the
reliability / robustness problem space to some extent but has not solved
it by the use of RDS.
Mike
 
 
- Original Message - 
From: Michael Krause
To: Ranjit Pandit
Cc: openib-general@openib.org
Sent: Tuesday, November 08, 2005 4:08 PM
Subject: Re: [openib-general] [ANNOUNCE] Contribute RDS(ReliableDatagramSockets) to OpenIB

At 12:33 PM 11/8/2005, Ranjit Pandit wrote:

> Mike wrote:
>  - RDS does not solve a set of failure models.  For example, if a RNIC /
> HCA were to fail, then one cannot simply replay the operations on another
> RNIC / HCA without extracting state, etc. and providing some end-to-end
> sync of what was really sent / received by the application.  Yes, one can
> recover from cable or switch port failure by using APM style recovery but
> that is only one class of faults.  The harder faults either result in the
> end node being cast out of the cluster or see silent data corruption
> unless additional steps are taken to transparently recover - again app
> writers don't want to solve the hard problems; they want that done for
> them.

The current reference implementation of RDS solves the HCA failure case
as well. Since applications don't need to keep connection states, it's
easier to handle cases like HCA and intermediate path failures. As far as
the application is concerned, every sendmsg 'could' result in a new
connection setup in the driver. If the current path fails, RDS
reestablishes a connection, if available, on a different port or a
different HCA, and replays the failed messages. Using APM is not useful
because it doesn't provide failover across HCAs.

I think others may disagree about whether RDS solves the problem.  You
have no way of knowing whether something was received or not into the
other node's coherency domain without some intermediary or application's
involvement to see that the data arrived.  As such, you might see many
hardware-level acks occur and not know there is a real failure.  If an
application takes any action assuming that send complete means it is
delivered, then it is subject to silent data corruption.  Hence, RDS can
replay to its heart's content, but until there is an application or
middleware level of acknowledgement, you have not solved the fault
domain issues.  Some may be happy with this as they just cast out the
endnode from the cluster / database, but others see the loss of a server
as a big deal, so may not be happy to see this occur.  It really comes
down to whether you believe losing a server is worthwhile just for a
local failure event which is not fatal to the rest of the server.

APM's value is the ability to recover from link failure.  It has the
same value for any other ULP in that it recovers transparently to the
ULP.

Re: [openib-general] [ANNOUNCE] Contribute RDS (ReliableDatagramSockets) to OpenIB

2005-11-09 Thread Michael Krause


At 11:42 AM 11/9/2005, Greg Lindahl wrote:

> On Tue, Nov 08, 2005 at 01:08:13PM -0800, Michael Krause wrote:
> > If an application takes any action assuming that send complete means
> > it is delivered, then it is subject to silent data corruption.
>
> Right. That's the same as pretty much all other *transport* layers. I
> don't think anyone's asserting RDS is any different: you can't assume
> the other side's application received and acted on your message until
> the other side's application tells you that it did.

So, things like HCA failure are not transparent, and one cannot simply
replay the operations, since you don't know what was really seen by the
other side unless the application performs the resync itself.  Hence,
while RDS can attempt to retransmit, the application must deal with
duplicates, etc., or note the error, resync, and retransmit to avoid
duplicates.

BTW, host-based transport implementations can transparently recover from
device failure on behalf of applications, since their state is in the
host and not in the failed device - this is true for networking,
storage, etc.  HCA / RNIC / TOE / FC / etc. all lose state or cannot be
trusted, thus must rely upon upper-level software to perform the
recovery, resync, retransmission, etc.  Unless RDS has implemented its
own state checkpoint between endnodes, this class of failures must be
solved by the application, since it cannot be solved in the hardware.
Hence, RDS may push some of its reliability requirements to the
interconnect, but it does not eliminate all reliability requirements
from the application or RDS itself.
Mike

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

Re: [openib-general] [ANNOUNCE] Contribute RDS (ReliableDatagramSockets) to OpenIB

2005-11-09 Thread Greg Lindahl
On Tue, Nov 08, 2005 at 01:08:13PM -0800, Michael Krause wrote:

> If an application takes any action assuming that send complete means
> it is delivered, then it is subject to silent data corruption.

Right. That's the same as pretty much all other *transport* layers. I
don't think anyone's asserting RDS is any different: you can't assume
the other side's application received and acted on your message until
the other side's application tells you that it did.

-- greg

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [openib-general] [ANNOUNCE] Contribute RDS(ReliableDatagramSockets) to OpenIB

2005-11-09 Thread Rick Frank



Yes, the application is responsible for detecting lost msgs at the
application level - the transport can not do this.

RDS does not guarantee that a message has been delivered to the
application - just that once the transport has accepted a msg it will
deliver the msg to the remote node in order without duplication -
dealing with retransmissions, etc. due to sporadic / intermittent msg
loss over the interconnect. If, after accepting the send, the current
path fails, then RDS will transparently fail over to another path - and
if required will resend / send any already queued msgs to the remote
node - again ensuring that no msg is duplicated and they are in order.
This is no different than APM - with the exception that RDS can do this
across HCAs.

The application - Oracle in this case - will deal with detecting a
catastrophic path failure - either due to a send that does not arrive
and/or a timed-out response or send failure returned from the transport.
If there is no network path to a remote node, it is required that we
remove the remote node from the operating cluster to avoid what is
commonly termed a "split brain" condition - otherwise known as a
"partition in time".

BTW - in our case - the application failure domain logic is the same
whether we are using UDP / uDAPL / iTAPI / TCP / SCTP / etc. Basically,
if we can not talk to a remote node after some defined period of time,
we will remove the remote node from the cluster. In this case the
database will recover all the interesting state that may have been
maintained on the removed node - allowing the remaining nodes to
continue. If, later on, communication to the remote node is restored, it
will be allowed to rejoin the cluster and take on application load.
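[A sketch of that transport-independent failure-domain rule; the grace
period and names are illustrative, not Oracle's actual logic:]

    #include <stdbool.h>
    #include <time.h>

    #define EVICT_AFTER_SECS 60        /* illustrative grace period */

    struct cluster_peer {
        time_t last_heard;             /* last successful exchange */
    };

    /* Same rule over UDP, uDAPL, iTAPI, TCP, or RDS: a node we cannot
     * reach past the grace period is removed so the surviving nodes
     * can recover its state and avoid a "split brain". */
    static bool should_evict(const struct cluster_peer *p, time_t now)
    {
        return (now - p->last_heard) > EVICT_AFTER_SECS;
    }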
 
 
- Original Message - 
From: Michael Krause
To: Ranjit Pandit
Cc: openib-general@openib.org
Sent: Tuesday, November 08, 2005 4:08 PM
Subject: Re: [openib-general] [ANNOUNCE] Contribute RDS(ReliableDatagramSockets) to OpenIB

At 12:33 PM 11/8/2005, Ranjit Pandit wrote:

> Mike wrote:
>  - RDS does not solve a set of failure models.  For example, if a RNIC /
> HCA were to fail, then one cannot simply replay the operations on another
> RNIC / HCA without extracting state, etc. and providing some end-to-end
> sync of what was really sent / received by the application.  Yes, one can
> recover from cable or switch port failure by using APM style recovery but
> that is only one class of faults.  The harder faults either result in the
> end node being cast out of the cluster or see silent data corruption
> unless additional steps are taken to transparently recover - again app
> writers don't want to solve the hard problems; they want that done for
> them.

The current reference implementation of RDS solves the HCA failure case
as well. Since applications don't need to keep connection states, it's
easier to handle cases like HCA and intermediate path failures. As far as
the application is concerned, every sendmsg 'could' result in a new
connection setup in the driver. If the current path fails, RDS
reestablishes a connection, if available, on a different port or a
different HCA, and replays the failed messages. Using APM is not useful
because it doesn't provide failover across HCAs.

I think others may disagree about whether RDS solves the problem.  You
have no way of knowing whether something was received or not into the
other node's coherency domain without some intermediary or application's
involvement to see that the data arrived.  As such, you might see many
hardware-level acks occur and not know there is a real failure.  If an
application takes any action assuming that send complete means it is
delivered, then it is subject to silent data corruption.  Hence, RDS can
replay to its heart's content, but until there is an application or
middleware level of acknowledgement, you have not solved the fault
domain issues.  Some may be happy with this as they just cast out the
endnode from the cluster / database, but others see the loss of a server
as a big deal, so may not be happy to see this occur.  It really comes
down to whether you believe losing a server is worthwhile just for a
local failure event which is not fatal to the rest of the server.

APM's value is the ability to recover from link failure.  It has the
same value for any other ULP in that it recovers transparently to the
ULP.

Mike

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

Re: [openib-general] [ANNOUNCE] Contribute RDS ( ReliableDatagramSockets) to OpenIB

2005-11-09 Thread Greg Lindahl
Caitlin,

Can you please use the standard quoting style? I can't tell which
comments are yours. Thanks.

-- greg
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


RE: [openib-general] [ANNOUNCE] Contribute RDS ( ReliableDatagramSockets) to OpenIB

2005-11-09 Thread Caitlin Bestler



 

  
  
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Michael Krause
Sent: Tuesday, November 08, 2005 1:08 PM
To: Ranjit Pandit
Cc: openib-general@openib.org
Subject: Re: [openib-general] [ANNOUNCE] Contribute RDS ( ReliableDatagramSockets) to OpenIB
At 12:33 PM 11/8/2005, Ranjit Pandit wrote:

> Mike wrote:
>  - RDS does not solve a set of failure models.  For example, if a RNIC /
> HCA were to fail, then one cannot simply replay the operations on another
> RNIC / HCA without extracting state, etc. and providing some end-to-end
> sync of what was really sent / received by the application.  Yes, one can
> recover from cable or switch port failure by using APM style recovery but
> that is only one class of faults.  The harder faults either result in the
> end node being cast out of the cluster or see silent data corruption
> unless additional steps are taken to transparently recover - again app
> writers don't want to solve the hard problems; they want that done for
> them.

The current reference implementation of RDS solves the HCA failure case
as well. Since applications don't need to keep connection states, it's
easier to handle cases like HCA and intermediate path failures. As far as
the application is concerned, every sendmsg 'could' result in a new
connection setup in the driver. If the current path fails, RDS
reestablishes a connection, if available, on a different port or a
different HCA, and replays the failed messages. Using APM is not useful
because it doesn't provide failover across HCAs.

I think others may disagree about whether RDS solves the problem.  You
have no way of knowing whether something was received or not into the
other node's coherency domain without some intermediary or application's
involvement to see that the data arrived.  As such, you might see many
hardware-level acks occur and not know there is a real failure.  If an
application takes any action assuming that send complete means it is
delivered, then it is subject to silent data corruption.  Hence, RDS can
replay to its heart's content, but until there is an application or
middleware level of acknowledgement, you have not solved the fault
domain issues.  Some may be happy with this as they just cast out the
endnode from the cluster / database, but others see the loss of a server
as a big deal, so may not be happy to see this occur.  It really comes
down to whether you believe losing a server is worthwhile just for a
local failure event which is not fatal to the rest of the server.
[cait] 
Applications should not infer anything from send completion other than
that their source buffer is no longer required for the transmit to
complete.

That is the only assumption that can be supported in a transport-neutral
way.

I'll also point out that even under InfiniBand the fact that a send or
write has completed does NOT guarantee that the remote peer has *noticed*
the data. The remote peer could fail *after* the data has been delivered
to it and before it has had a chance to act upon it. A well-designed,
robust application should never rely on anything other than a peer ack
to indicate that the peer has truly taken ownership of transmitted
information.
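[A minimal sketch of that discipline, with hypothetical names; the only
point is which event is allowed to advance application state:]

    #include <stdint.h>

    enum msg_state { MSG_IN_FLIGHT, MSG_PEER_OWNED };

    struct app_msg {
        uint64_t       id;
        enum msg_state state;
    };

    /* Local send completion: the source buffer may be reused, nothing
     * more. Do not mark the message as received here. */
    static void on_send_complete(struct app_msg *m)
    {
        (void)m;
    }

    /* Peer's explicit ULP-level ack: only now has the peer truly taken
     * ownership of the transmitted information. */
    static void on_peer_ack(struct app_msg *m)
    {
        m->state = MSG_PEER_OWNED;
    }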
 
The essence of RDS, or any similar solution, is the delivery of messages
with datagram semantics reliably over point-to-point reliable
connections. So whatever reliability and fault-tolerance benefits the
reliable connections provide are inherited by the RDS layer. After that
it is mostly a matter of how you avoid head-of-line blocking problems
when there is no receive buffer. You don't want to send an RNR (or drop
the DDP segment under iWARP) because *one* endpoint does not have
available buffers. Other than that, any reliable datagram service should
be just as reliable as the underlying RC service.
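[A sketch of the per-endpoint buffering this implies; the queue limit
and names are illustrative assumptions, not a published design:]

    #include <stdbool.h>
    #include <stdint.h>

    #define PORT_RX_MAX 64          /* illustrative per-port queue limit */

    struct port_rx {
        uint32_t queued;            /* datagrams pending for this port */
    };

    /* One full endpoint must not stall the shared reliable connection:
     * instead of RNR/drop at the transport, refuse just this port and
     * let the ULP send a per-port flow-control message to the sender. */
    static bool rds_rx_accept(struct port_rx *p)
    {
        if (p->queued >= PORT_RX_MAX)
            return false;           /* caller signals "port busy" via ULP */
        p->queued++;
        return true;
    }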
 
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

Re: [openib-general] [ANNOUNCE] Contribute RDS (ReliableDatagramSockets) to OpenIB

2005-11-09 Thread Michael Krause


At 12:33 PM 11/8/2005, Ranjit Pandit wrote:
> Mike wrote:
>  - RDS does not solve a set of failure models.  For example, if a RNIC /
> HCA were to fail, then one cannot simply replay the operations on another
> RNIC / HCA without extracting state, etc. and providing some end-to-end
> sync of what was really sent / received by the application.  Yes, one can
> recover from cable or switch port failure by using APM style recovery but
> that is only one class of faults.  The harder faults either result in the
> end node being cast out of the cluster or see silent data corruption
> unless additional steps are taken to transparently recover - again app
> writers don't want to solve the hard problems; they want that done for
> them.

The current reference implementation of RDS solves the HCA failure case
as well. Since applications don't need to keep connection states, it's
easier to handle cases like HCA and intermediate path failures. As far as
the application is concerned, every sendmsg 'could' result in a new
connection setup in the driver. If the current path fails, RDS
reestablishes a connection, if available, on a different port or a
different HCA, and replays the failed messages. Using APM is not useful
because it doesn't provide failover across HCAs.

I think others may disagree about whether RDS solves the problem.  You
have no way of knowing whether something was received or not into the
other node's coherency domain without some intermediary or application's
involvement to see that the data arrived.  As such, you might see many
hardware-level acks occur and not know there is a real failure.  If an
application takes any action assuming that send complete means it is
delivered, then it is subject to silent data corruption.  Hence, RDS can
replay to its heart's content, but until there is an application or
middleware level of acknowledgement, you have not solved the fault
domain issues.  Some may be happy with this as they just cast out the
endnode from the cluster / database, but others see the loss of a server
as a big deal, so may not be happy to see this occur.  It really comes
down to whether you believe losing a server is worthwhile just for a
local failure event which is not fatal to the rest of the server.

APM's value is the ability to recover from link failure.  It has the
same value for any other ULP in that it recovers transparently to the
ULP.
Mike

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

Re: [openib-general] [ANNOUNCE] Contribute RDS (ReliableDatagramSockets) to OpenIB

2005-11-09 Thread Michael Krause


At 12:37 PM 11/8/2005, Hal Rosenstock wrote:
On Tue, 2005-11-08 at 15:33,
Ranjit Pandit wrote:
> Using APM is not useful because it doesn't provide failover across
HCA's.
Can't APM be made to work across HCAs ?
No.  It requires state that is only within the HCA, and there are other
aspects that prevent this, e.g. no single unified QP space across all
HCAs, etc.
Mike

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

RE: [openib-general] [ANNOUNCE] Contribute RDS ( ReliableDatagramSockets) to OpenIB

2005-11-08 Thread Caitlin Bestler



 

  
  
  
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Michael Krause
Sent: Tuesday, November 08, 2005 11:52 AM
To: Rimmer, Todd
Cc: openib-general@openib.org
Subject: RE: [openib-general] [ANNOUNCE] Contribute RDS ( ReliableDatagramSockets) to OpenIB
> The entire discussion might be distilled into the following:
> - Datagram applications trade reliability for flexibility and resource
> savings.
[cait] 
Reliable Datagram applications have endpoints that accept messages from
multiple known sources, rather than from a single known source (TCP, RC)
or multiple unknown sources (UDP, UD).

This does save resources, but perhaps just as importantly it may reflect
how the application truly thinks of its communication endpoints. Oracle
is not unique in this communication requirement. This is essentially the
interface MPI presents to its users as well.

   
> - Datagram applications that require reliability have to re-invent the
> wheel, and given it is non-trivial, they often get variable quality and
> can suffer performance loss if done poorly or the network is very
> lossy.  Given networks are a lot less lossy today than years past, sans
> congestion drops, one might argue about whether there is still a
> significant problem or not.
[cait] 
Standardized congestion control that is not dependent on
application-specific control is highly desirable. In the IP world new
ULPs based upon UDP are heavily discouraged for exactly this reason.

   
> - The reliable datagram model isn't new - been there, done that on
> earlier interconnects - but it isn't free.  IB could have done something
> like RDS, but the people who pushed the original requirements (some of
> whom are advocating RDS now) did not want to take on the associated
> software enablement, thus it was subsumed into hardware and made
> slightly more restrictive as a result - perhaps more than some people
> may like.  The only real delta between RDS in one sense and the current
> IB RD is the number of outstanding messages in flight on a given EEC.
> If RD were re-defined to allow software to recover some types of
> failures much like UC, then one could simply use RD.
[cait] 
The RDS API should definitely be compatible with IB RD service,
especially any later one that solves the crippling limitation on
in-flight messages.

Similarly, the API should be compatible with IP-based solutions, which,
since it is derived from SOCK_DGRAM, isn't much of a challenge.
 

   
> - RDS does not solve a set of failure models.  For example, if a RNIC /
> HCA were to fail, then one cannot simply replay the operations on
> another RNIC / HCA without extracting state, etc. and providing some
> end-to-end sync of what was really sent / received by the application.
> Yes, one can recover from cable or switch port failure by using APM
> style recovery but that is only one class of faults.  The harder faults
> either result in the end node being cast out of the cluster or see
> silent data corruption unless additional steps are taken to
> transparently recover - again app writers don't want to solve the hard
> problems; they want that done for them.
[cait] 
This goes to the question of where the Reliable Datagram Service is
implemented. When done as middleware over existing reliable connection
services, the middleware does have a few issues in handling flushed
buffers after an RNIC failure. These issues make implementation of a
zero-copy strategy more of an issue.

But if the endpoint is truly a datagram endpoint, then these issues are
the same as for failover of connection-oriented endpoints between two
RNICs/HCAs.
 

> - RNIC / HCA provide hardware acceleration and reliable delivery to the
> remote RNIC / HCA (not to the application, since that is in a separate
> fault domain).  Doing software multiplexing over such an interconnect as
> envisioned for IB RD is relatively straightforward in many respects, but
> not a trivial exercise as some might contend.  Yes, people can point to
> a small number of lines of code, but that is just for the initial
> offering and is not an indication of what it might have to become
> long-term to add all of the bells-n-whistles that people have
> envisioned.
[cait] 
IB RD is not transport neutral, and has the problem of severe in-flight
limitations that would make it unacceptable to most applications that
would benefit from RDS even if they were

There is no way that iWARP vendors would ever implement a service
designed to match IB RD. An RDS service could be implemented over TCP,
MPA, MS-MPA or SCTP.
 

> - RDS is not an API but a ULP.  It really uses a set of physical
> connections which are then used to set up logical application
> associations (often referred to as connections, but really are not in
> terms of the interconnect).  These associations can be quickly
> established as they are just control messages over the existing
> physical connections.

Re: [openib-general] [ANNOUNCE] Contribute RDS(ReliableDatagramSockets) to OpenIB

2005-11-08 Thread Hal Rosenstock
On Tue, 2005-11-08 at 15:33, Ranjit Pandit wrote:
> Using APM is not useful because it doesn't provide failover across HCA's.

Can't APM be made to work across HCAs ?

-- Hal



___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [openib-general] [ANNOUNCE] Contribute RDS (ReliableDatagramSockets) to OpenIB

2005-11-08 Thread Ranjit Pandit
> Mike wrote:
>  - RDS does not solve a set of failure models.  For example, if a RNIC / HCA
> were to fail, then one cannot simply replay the operations on another RNIC /
> HCA without extracting state, etc. and providing some end-to-end sync of
> what was really sent / received by the application.  Yes, one can recover
> from cable or switch port failure by using APM style recovery but that is
> only one class of faults.  The harder faults either result in the end node
> being cast out of the cluster or see silent data corruption unless
> additional steps are taken to transparently recover - again app writers
> don't want to solve the hard problems; they want that done for them.

The current reference implementation of RDS solves the HCA failure case as well.
Since applications don't need to keep connection states, it's easier
to handle cases like HCA and intermediate path failures.
As far as the application is concerned, every sendmsg 'could' result in a
new connection setup in the driver.
If the current path fails, RDS reestablishes a connection, if
available, on a different port or a different HCA, and replays the
failed messages.
Using APM is not useful because it doesn't provide failover across HCAs.
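[A sketch of the failover walk being described, with assumed helpers;
this is not the reference implementation's actual code:]

    #include <stdbool.h>
    #include <stddef.h>

    struct rds_path;   /* one (HCA, port) route to the peer */

    extern struct rds_path *rds_next_path(struct rds_path *after);
    extern bool             rds_connect(struct rds_path *p);
    extern void             rds_replay_queued(struct rds_path *p);

    /* Try the remaining ports/HCAs in turn; on the first one that
     * connects, replay the queued sends in order so the application
     * never notices the path change. */
    static bool rds_failover(struct rds_path *failed)
    {
        for (struct rds_path *p = rds_next_path(failed); p != NULL;
             p = rds_next_path(p)) {
            if (rds_connect(p)) {
                rds_replay_queued(p);
                return true;
            }
        }
        return false;   /* no path left: report the error upward */
    }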

> ___
> openib-general mailing list
> openib-general@openib.org
> http://openib.org/mailman/listinfo/openib-general
>
> To unsubscribe, please visit
> http://openib.org/mailman/listinfo/openib-general
>
>
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


RE: [openib-general] [ANNOUNCE] Contribute RDS (ReliableDatagramSockets) to OpenIB

2005-11-08 Thread Michael Krause


At 03:02 PM 11/4/2005, Rimmer, Todd wrote:
> Bob wrote,
> Perhaps if tunneling udp packets over RC connections rather than
> UD connections provides better performance, as was seen in the RDS
> experiment, then why not just convert IPoIB to use a connected model
> (rather than datagrams) and then all existing IP upper level protocols
> could benefit, TCP, UDP, SCTP, 

This would miss the second major improvement of RDS, namely removing the
need for the application to perform timeouts and retries on datagram
packets.  If Oracle ran over UDP/IP/IPoIB it would not be guaranteed a
loss-less reliable interface.  If UDP/IP/IPoIB provided a loss-less
reliable interface it would likely break or affect other UDP
applications which are expecting a flow-controlled interface.
The entire discussion might be distilled into the following:

- Datagram applications trade reliability for flexibility and resource
savings.

- Datagram applications that require reliability have to re-invent the
wheel, and given it is non-trivial, they often get variable quality and
can suffer performance loss if done poorly or the network is very lossy.
Given networks are a lot less lossy today than years past, sans
congestion drops, one might argue about whether there is still a
significant problem or not.

- The reliable datagram model isn't new - been there, done that on
earlier interconnects - but it isn't free.  IB could have done something
like RDS, but the people who pushed the original requirements (some of
whom are advocating RDS now) did not want to take on the associated
software enablement, thus it was subsumed into hardware and made
slightly more restrictive as a result - perhaps more than some people
may like.  The only real delta between RDS in one sense and the current
IB RD is the number of outstanding messages in flight on a given EEC.
If RD were re-defined to allow software to recover some types of
failures much like UC, then one could simply use RD.

- RDS does not solve a set of failure models.  For example, if a RNIC /
HCA were to fail, then one cannot simply replay the operations on
another RNIC / HCA without extracting state, etc. and providing some
end-to-end sync of what was really sent / received by the application.
Yes, one can recover from cable or switch port failure by using APM
style recovery, but that is only one class of faults.  The harder faults
either result in the end node being cast out of the cluster or see
silent data corruption unless additional steps are taken to
transparently recover - again, app writers don't want to solve the hard
problems; they want that done for them.

- RNIC / HCA provide hardware acceleration and reliable delivery to the
remote RNIC / HCA (not to the application, since that is in a separate
fault domain).  Doing software multiplexing over such an interconnect as
envisioned for IB RD is relatively straightforward in many respects, but
not a trivial exercise as some might contend.  Yes, people can point to
a small number of lines of code, but that is just for the initial
offering and is not an indication of what it might have to become
long-term to add all of the bells-n-whistles that people have
envisioned.

- RDS is not an API but a ULP.  It really uses a set of physical
connections which are then used to set up logical application
associations (often referred to as connections, but really are not in
terms of the interconnect).  These associations can be quickly
established as they are just control messages over the existing physical
connections.  Again, this builds on concepts already shipping in earlier
interconnects / solutions from a number of years back.  Hence, for
large-scale applications which are association-intensive, RDS is able to
improve the performance of establishing these associations.  While RDS
improves the performance in this regard, its impacts on actual
performance stem more from avoiding some operations, thus nearly all of
the performance numbers quoted are really an apples-to-oranges
comparison.  Nothing wrong with this, but people need to keep in mind
that things are not being compared with one another on the same level,
thus the results can look more dramatic.

- One thing to keep in mind is that RDS is about not doing work to gain
performance and to potentially improve code by eliminating software that
was too complex / difficult to get clean when it was invoked to recover
from fabric-related issues.  This is somewhat the same logic as used by
NFS when migrating to TCP from UDP.  Could not get clean software, so
change the underlying comms to push the problem to a place where it is
largely solved.
Now, whether you believe RDS is great or not, it is an attempt to solve a
problem plaguing one class of applications who'd rather not spend their
resources on the problem.  That is a fair thing to consider if
someone else has already done it better using another technology. 
One could also consider having IB change the RD semantics to see if that
would solve the problem sin

RE: [openib-general] [ANNOUNCE] Contribute RDS(ReliableDatagramSockets) to OpenIB

2005-11-06 Thread Caitlin Bestler



-Original Message-
From: [EMAIL PROTECTED] on behalf of Roland Dreier
Sent: Fri 11/4/2005 6:49 PM
To: Rick Frank
Cc: openib-general@openib.org
Subject: Re: [openib-general] [ANNOUNCE] Contribute 
RDS(ReliableDatagramSockets) to OpenIB
 
Rick> Do you mean useTCP and the RC transport in the ethernet
Rick> verbs provider ?

No, I mean just write RDS for ethernet on top of sockets.  I don't
think it's worth implementing a whole RDMA provider on top of ethernet
just so you can use the same RDS code.  The SilverStorm RDS code is
only about 10K lines of code, and I think a sane implementation would
probably be less than 5K, so you're not getting much benefit from
all the effort of writing an RDMA provider.

In fact I'm not sure that it doesn't make sense to implement RDS as a
library + daemon completely in userspace.

 - R.

[Caitlin]
Correct, the idea of providing Reliable Datagram service over reliable
point-to-point tunnels enables userspace solutions as long as they
have access to high-throughput reliable connection service. Whether
a TCP service that provides no stateful acceleration qualifies is a topic
that we do not need to take up here.
[/Caitlin]

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [openib-general] [ANNOUNCE] Contribute RDS(ReliableDatagramSockets) to OpenIB

2005-11-04 Thread Roland Dreier
Rick> Do you mean useTCP and the RC transport in the ethernet
Rick> verbs provider ?

No, I mean just write RDS for ethernet on top of sockets.  I don't
think it's worth implementing a whole RDMA provider on top of ethernet
just so you can use the same RDS code.  The SilverStorm RDS code is
only about 10K lines of code, and I think a sane implementation would
probably be less than 5K, so you're not getting much benefit from
all the effort of writing an RDMA provider.

In fact I'm not sure that it doesn't make sense to implement RDS as a
library + daemon completely in userspace.

 - R.
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [openib-general] [ANNOUNCE] Contribute RDS(ReliableDatagramSockets) to OpenIB

2005-11-04 Thread Roland Dreier
Rick> We (Oracle) are currently investigating / working on an RDS
Rick> over Ethernet driver for Linux. Our current plans are to
Rick> produce a new verbs provider that registers with Gen 2 IB
Rick> verbs layer. This new driver will bind to a standard
Rick> ethernet nic driver and implement the RC semantics. This
Rick> will allow us to use 100% of the ported RDS ULP.

That seems rather an awkward way to go about it.  Why not just use TCP?

 - R.
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [openib-general] [ANNOUNCE] Contribute RDS(ReliableDatagramSockets) to OpenIB

2005-11-04 Thread Rick Frank
At this point we really need to get RDS on IB ported to Gen 2 so we can get 
this into Linux distributions ASAP.


We (Oracle) are currently investigating / working on an RDS over Ethernet 
driver for Linux. Our current plans are to produce a new verbs provider that 
registers with Gen 2 IB verbs layer. This new driver will bind to a standard 
ethernet nic driver and implement the RC semantics. This will allow us to 
use 100% of the ported RDS ULP.


Note that RDP should also run over any other interconnect that registers 
with the verbs layer - such as iWARP, etc.


- Original Message - 
From: "Bob Woodruff" <[EMAIL PROTECTED]>

To: "'Ranjit Pandit'" <[EMAIL PROTECTED]>
Cc: "Rick Frank" <[EMAIL PROTECTED]>; 
Sent: Friday, November 04, 2005 6:58 PM
Subject: RE: [openib-general] [ANNOUNCE] Contribute 
RDS(ReliableDatagramSockets) to OpenIB




Ranjit wrote,

RDS is somewhat like SDP in that it offloads/accelerates SOCK_DGRAM
instead of SOCK_STREAM.


So back to the question from Roland that started this thread.
When do you plan to re-work the code to use the OpenIB
verbs and make it suitable for the kernel ?

And do you plan to develop the code, or at least the infrastructure
to allow multiple RDS providers to plug in
so that it is ubiquitous - supported on all interconnects - to include
simple Ethernet NICs ?

woody

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit 
http://openib.org/mailman/listinfo/openib-general





___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


RE: [openib-general] [ANNOUNCE] Contribute RDS (ReliableDatagramSockets) to OpenIB

2005-11-04 Thread Bob Woodruff
Ranjit wrote,
>RDS is somewhat like SDP in that it offloads/accelerates SOCK_DGRAM
>instead of SOCK_STREAM.

So back to the question from Roland that started this thread.
When do you plan to re-work the code to use the OpenIB
verbs and make it suitable for the kernel ?

And do you plan to develop the code, or at least the infrastructure
to allow multiple RDS providers to plug in 
so that it is ubiquitous - supported on all interconnects - to include
simple Ethernet NICs ?

woody

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


RE: [openib-general] [ANNOUNCE] Contribute RDS(ReliableDatagramSockets) to OpenIB

2005-11-04 Thread Bob Woodruff
Rick wrote,
>SCTP is connection based - we have many dependencies on our connectionless 
>datagram model.

I think I get it now. I was just talking with Roy about SCTP, 
and he said the same thing, SCTP is a connected rather than datagram model,
so SCTP does not seem to solve the problem since it 
has the same FD scaling problems as TCP.

>Of course for this to work - we will need RDS to be ubiquitous - supported 
>on all interconnects - to include simple Ethernet NICs.

Makes sense.  

woody



___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [openib-general] [ANNOUNCE] Contribute RDS(ReliableDatagramSockets) to OpenIB

2005-11-04 Thread Rick Frank
SCTP is connection based - we have many dependencies on our connectionless 
datagram model.



- Original Message - 
From: "Bob Woodruff" <[EMAIL PROTECTED]>
To: "'Rimmer, Todd'" <[EMAIL PROTECTED]>; "Caitlin Bestler" 
<[EMAIL PROTECTED]>; "Rick Frank" <[EMAIL PROTECTED]>; "Pandit, 
Ranjit" <[EMAIL PROTECTED]>; "Grant Grundler" <[EMAIL PROTECTED]>

Cc: 
Sent: Friday, November 04, 2005 6:10 PM
Subject: RE: [openib-general] [ANNOUNCE] Contribute 
RDS(ReliableDatagramSockets) to OpenIB




Todd wrote,

> This would miss the second major improvement of RDS, namely removing the
> need for the application to perform timeouts and retries on datagram
> packets.  If Oracle ran over UDP/IP/IPoIB it would not be guaranteed a
> loss-less reliable interface.  If UDP/IP/IPoIB provided a loss-less
> reliable interface it would likely break or affect other UDP
> applications which are expecting a flow controlled interface.

> Todd Rimmer


Then use SCTP instead of UDP, which already provides a loss-less reliable
interface.
If SCTP has problems with the number of endpoints it can currently
support, why not just fix that problem and fix IPoIB to use a connected
model to increase performance, rather than inventing a completely new
protocol and/or address family.

address family.

Just a thought.

woody






___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit 
http://openib.org/mailman/listinfo/openib-general





___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [openib-general] [ANNOUNCE] Contribute RDS (ReliableDatagramSockets) to OpenIB

2005-11-04 Thread Ranjit Pandit
On 11/4/05, Bob Woodruff <[EMAIL PROTECTED]> wrote:
> Woody wrote,
> >Perhaps if tunneling udp packets over RC connections rather than
> >UD connections provides better performance, as was seen in the RDS
> >experiment, then why not just convert
> >IPoIB to use a connected model (rather than datagrams)
> >and then all existing IP upper level
> >protocols could benefit, TCP, UDP, SCTP, 
>
> Saying this another way.
> Make the hardware run the existing protocols better, don't
> design a new protocol to work around the problems with a
> specific hardware transport.
>

What about SDP? Isn't SDP bypassing the existing TCP protocol stack to
take advantage of a specific hardware transport - IB?

RDS is somewhat like SDP in that it offloads/accelerates SOCK_DGRAM
instead of SOCK_STREAM.

> woody
>
>
>
>
> -Original Message-
> From: Caitlin Bestler [mailto:[EMAIL PROTECTED]
> Sent: Friday, November 04, 2005 2:31 PM
> To: Woodruff, Robert J; Rick Frank; Ranjit Pandit; Grant Grundler
> Cc: openib-general@openib.org
> Subject: RE: [openib-general] [ANNOUNCE] Contribute RDS (
> ReliableDatagramSockets) to OpenIB
>
>
>
> > -Original Message-
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of Bob Woodruff
> > Sent: Friday, November 04, 2005 2:15 PM
> > To: 'Rick Frank'; Ranjit Pandit; Grant Grundler
> > Cc: openib-general@openib.org
> > Subject: RE: [openib-general] [ANNOUNCE] Contribute RDS (
> > ReliableDatagramSockets) to OpenIB
> >
> > Rick wrote,
> > >I've atttached a draft proposal for RDS from Oracle which discusses
> > >some of
> >
> > >the motivation for RDS.
> >
> > Couple of questions/comments on the spec.
> >
> >
> > AF_INET_OFFLOAD should be renamed to something like AF_INET_RDS.
> >
> > Would something like SCTP provide the same type of
> > capabilities (relaible datagrams) that you are suggesting to
> > add with RDP ?
> >
>
> Each stream within an SCTP association provides a reliable,
> ordered service.
>
> There would be two primary constraints in using SCTP for
> this usage profile:
>
> 1) The Stream ID is 16 bits, and the natural mapping would
>be to have each stream represent a source/destination
>pairing. That would imply fewer than 256 endpoints per
>host. If the source were encoded by hand then the limitation
>    would be 64K, but that's an awkward mix of application and
>transport layer encoding.
> 2) The network has to be composed of SCTP friendly equipment.
>When IP network equipment operated exclusively at L2/L3,
>and L4 was left to the endpoints, SCTP would have had no
>problem being deployed. But because of security and IPV4
>address shortages there are a lot of middleboxes that are
>L4 aware, and generally that L4 awareness is limited to
>TCP and UDP.
>
> SCTP support would also have to be part of the offload device.
> RDS enables reliable datagrams using existing offloaded RC
> services (IB RC, iWARP, TOE). No NIC enhancements are required.
>
>
>
>
> ___
> openib-general mailing list
> openib-general@openib.org
> http://openib.org/mailman/listinfo/openib-general
>
> To unsubscribe, please visit
> http://openib.org/mailman/listinfo/openib-general
>
> ___
> openib-general mailing list
> openib-general@openib.org
> http://openib.org/mailman/listinfo/openib-general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
>
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


RE: [openib-general] [ANNOUNCE] Contribute RDS (ReliableDatagramSockets) to OpenIB

2005-11-04 Thread Bob Woodruff
Todd wrote,
>This would miss the second major improvement of RDS, namely removing the
need for >the application to perform timeouts and retries on datagram
packets.  If Oracle 
>ran over UDP/IP/IPoIB it would not be guaranteed a loss-less reliable
interface.  >If UDP/IP/IPoIB provided a loss-less reliable interface it
would likely break or >affect other UDP applications which are expecting a
flow controlled interface.

>Todd Rimmer

Then use SCTP instead of UDP, which already provides a loss-less reliable
interface.
If SCTP has problems with the number of endpoints it can currently support,
why not just fix that problem and fix IpoIB to use a connected model to
increase performance, rather than inventing a completly new protocol and/or
address family.  

Just a thought.

woody






___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


RE: [openib-general] [ANNOUNCE] Contribute RDS (ReliableDatagramSockets) to OpenIB

2005-11-04 Thread Bob Woodruff
Woody wrote, 
>Perhaps if tunneling udp packets over RC connections rather than
>UD connections provides better performance, as was seen in the RDS
>experiment, then why not just convert
>IPoIB to use a connected model (rather than datagrams)
>and then all existing IP upper level
>protocols could benefit, TCP, UDP, SCTP, 

Saying this another way.
Make the hardware run the existing protocols better, don't
design a new protocol to work around the problems with a
specific hardware transport.

woody




-Original Message-
From: Caitlin Bestler [mailto:[EMAIL PROTECTED] 
Sent: Friday, November 04, 2005 2:31 PM
To: Woodruff, Robert J; Rick Frank; Ranjit Pandit; Grant Grundler
Cc: openib-general@openib.org
Subject: RE: [openib-general] [ANNOUNCE] Contribute RDS (
ReliableDatagramSockets) to OpenIB

 

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Bob Woodruff
> Sent: Friday, November 04, 2005 2:15 PM
> To: 'Rick Frank'; Ranjit Pandit; Grant Grundler
> Cc: openib-general@openib.org
> Subject: RE: [openib-general] [ANNOUNCE] Contribute RDS ( 
> ReliableDatagramSockets) to OpenIB
> 
> Rick wrote, 
> >I've attached a draft proposal for RDS from Oracle which discusses 
> >some of
> 
> >the motivation for RDS.
> 
> Couple of questions/comments on the spec.
> 
> 
> AF_INET_OFFLOAD should be renamed to something like AF_INET_RDS.
> 
> Would something like SCTP provide the same type of 
> capabilities (reliable datagrams) that you are suggesting to 
> add with RDP ?
> 

Each stream within an SCTP association provides a reliable,
ordered service.

There would be two primary constraints in using SCTP for
this usage profile:

1) The Stream ID is 16 bits, and the natural mapping would
   be to have each stream represent a source/destination
   pairing. That would imply fewer than 256 endpoints per
   host. If the source were encoded by hand then the limitation
   would be 64K, but that's an awkward mix of application and
   transport layer encoding.
2) The network has to be composed of SCTP friendly equipment.
   When IP network equipment operated exclusively at L2/L3,
   and L4 was left to the endpoints, SCTP would have had no
   problem being deployed. But because of security and IPV4
   address shortages there are a lot of middleboxes that are
   L4 aware, and generally that L4 awareness is limited to
   TCP and UDP.

SCTP support would also have to be part of the offload device.
RDS enables reliable datagrams using existing offloaded RC
services (IB RC, iWARP, TOE). No NIC enhancements are required.




___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit
http://openib.org/mailman/listinfo/openib-general

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


RE: [openib-general] [ANNOUNCE] Contribute RDS (ReliableDatagramSockets) to OpenIB

2005-11-04 Thread Rimmer, Todd
> Bob wrote,
> Perhaps if tunneling udp packets over RC connections rather than
> UD connections provides better performance, as was seen in the RDS
> experiment, then why not just convert
> IPoIB to use a connected model (rather than datagrams)
> and then all existing IP upper level
> protocols could benefit, TCP, UDP, SCTP, 

This would miss the second major improvement of RDS, namely removing the need 
for the application to perform timeouts and retries on datagram packets.  If 
Oracle ran over UDP/IP/IPoIB it would not be guaranteed a loss-less reliable 
interface.  If UDP/IP/IPoIB provided a loss-less reliable interface it would 
likely break or affect other UDP applications which are expecting a flow 
controlled interface.
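[For contrast, a sketch of the timeout-and-retry boilerplate a plain UDP
application carries today and that RDS would subsume. Simplified
blocking form on a connected UDP socket; a real receiver would also need
duplicate suppression:]

    #include <stdbool.h>
    #include <stddef.h>
    #include <sys/socket.h>
    #include <sys/time.h>

    static bool udp_send_with_retry(int fd, const void *buf, size_t len,
                                    int max_tries)
    {
        char ack;
        struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };

        /* Wait at most one second for the application-level ack. */
        setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);

        for (int i = 0; i < max_tries; i++) {
            if (send(fd, buf, len, 0) < 0)
                continue;                    /* transient error: retry */
            if (recv(fd, &ack, sizeof ack, 0) == 1)
                return true;                 /* peer confirmed receipt */
        }
        return false;                        /* give up after max_tries */
    }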

Todd Rimmer
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


RE: [openib-general] [ANNOUNCE] Contribute RDS ( ReliableDatagramSockets) to OpenIB

2005-11-04 Thread Bob Woodruff
Caitlin wrote,
>SCTP support would also have to be part of the offload device.
>RDS enables reliable datagrams using existing offloaded RC
>services (IB RC, iWARP, TOE). No NIC enhancements are required.

BTW. SCTP runs in Linux today without any NIC enhancements or offload
support. 

Perhaps if tunneling udp packets over RC connections rather than
UD connections provides better performance, as was seen in the RDS
experiment, then why not just convert
IPoIB to use a connected model (rather than datagrams)
and then all existing IP upper level
protocols could benefit, TCP, UDP, SCTP, 


woody




-Original Message-
From: Caitlin Bestler [mailto:[EMAIL PROTECTED] 
Sent: Friday, November 04, 2005 2:31 PM
To: Woodruff, Robert J; Rick Frank; Ranjit Pandit; Grant Grundler
Cc: openib-general@openib.org
Subject: RE: [openib-general] [ANNOUNCE] Contribute RDS (
ReliableDatagramSockets) to OpenIB

 

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Bob Woodruff
> Sent: Friday, November 04, 2005 2:15 PM
> To: 'Rick Frank'; Ranjit Pandit; Grant Grundler
> Cc: openib-general@openib.org
> Subject: RE: [openib-general] [ANNOUNCE] Contribute RDS ( 
> ReliableDatagramSockets) to OpenIB
> 
> Rick wrote, 
> >I've attached a draft proposal for RDS from Oracle which discusses 
> >some of
> 
> >the motivation for RDS.
> 
> Couple of questions/comments on the spec.
> 
> 
> AF_INET_OFFLOAD should be renamed to something like AF_INET_RDS.
> 
> Would something like SCTP provide the same type of 
> capabilities (reliable datagrams) that you are suggesting to 
> add with RDP ?
> 

Each stream within an SCTP association provides a reliable,
ordered service.

There would be two primary constraints in using SCTP for
this usage profile:

1) The Stream ID is 16 bits, and the natural mapping would
   be to have each stream represent a source/destination
   pairing. That would imply fewer than 256 endpoints per
   host. If the source were encoded by hand then the limitation
   would be 64K, but that's an awkward mix of application and
   transport layer encoding.
2) The network has to be composed of SCTP friendly equipment.
   When IP network equipment operated exclusively at L2/L3,
   and L4 was left to the endpoints, SCTP would have had no
   problem being deployed. But because of security and IPV4
   address shortages there are a lot of middleboxes that are
   L4 aware, and generally that L4 awareness is limited to
   TCP and UDP.

SCTP support would also have to be part of the offload device.
RDS enables reliable datagrams using existing offloaded RC
services (IB RC, iWARP, TOE). No NIC enhancements are required.




___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


RE: [openib-general] [ANNOUNCE] Contribute RDS ( ReliableDatagramSockets) to OpenIB

2005-11-04 Thread Caitlin Bestler
 

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Bob Woodruff
> Sent: Friday, November 04, 2005 2:15 PM
> To: 'Rick Frank'; Ranjit Pandit; Grant Grundler
> Cc: openib-general@openib.org
> Subject: RE: [openib-general] [ANNOUNCE] Contribute RDS ( 
> ReliableDatagramSockets) to OpenIB
> 
> Rick wrote, 
> >I've attached a draft proposal for RDS from Oracle which discusses 
> >some of
> 
> >the motivation for RDS.
> 
> Couple of questions/comments on the spec.
> 
> 
> AF_INET_OFFLOAD should be renamed to something like AF_INET_RDS.
> 
> Would something like SCTP provide the same type of 
> capabilities (reliable datagrams) that you are suggesting to 
> add with RDP ?
> 

Each stream within an SCTP association provides a reliable,
ordered service.

There would be two primary constraints in using SCTP for
this usage profile:

1) The Stream ID is 16 bits, and the natural mapping would
   be to have each stream represent a source/destination
   pairing. That would imply fewer than 256 endpoints per
   host (see the worked check after this list). If the source
   were encoded by hand then the limitation would be 64K, but
   that's an awkward mix of application and transport layer
   encoding.
2) The network has to be composed of SCTP-friendly equipment.
   When IP network equipment operated exclusively at L2/L3,
   and L4 was left to the endpoints, SCTP would have had no
   problem being deployed. But because of security concerns
   and IPv4 address shortages there are a lot of middleboxes
   that are L4-aware, and generally that L4 awareness is
   limited to TCP and UDP.

SCTP support would also have to be part of the offload device.
RDS enables reliable datagrams using existing offloaded RC
services (IB RC, iWARP, TOE). No NIC enhancements are required.
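
For contrast, the usage model proposed for RDS is an ordinary
datagram socket. A rough sketch, assuming the draft's
AF_INET_OFFLOAD family name (the numeric value below is
hypothetical) and standard sendto() semantics:

    #include <sys/socket.h>
    #include <netinet/in.h>

    #ifndef AF_INET_OFFLOAD
    #define AF_INET_OFFLOAD 27   /* hypothetical value; only the name comes from the draft */
    #endif

    /* One socket reaches any number of peers; the reliable RC
     * connections (IB RC, iWARP, TOE) live under the socket,
     * not in the application. */
    int rds_broadcast(const struct sockaddr_in *peers, int npeers)
    {
        int fd = socket(AF_INET_OFFLOAD, SOCK_DGRAM, 0);
        if (fd < 0)
            return -1;

        static const char msg[] = "hello";
        for (int i = 0; i < npeers; i++)
            sendto(fd, msg, sizeof(msg), 0,
                   (const struct sockaddr *)&peers[i], sizeof(peers[i]));
        return fd;
    }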



___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


RE: [openib-general] [ANNOUNCE] Contribute RDS (ReliableDatagramSockets) to OpenIB

2005-11-04 Thread Bob Woodruff
Rick wrote, 
>I've attached a draft proposal for RDS from Oracle which discusses some of
>the motivation for RDS.

Couple of questions/comments on the spec.


AF_INET_OFFLOAD should be renamed to something like AF_INET_RDS.

Would something like SCTP provide the same type of capabilities
(reliable datagrams) that you are suggesting to add with RDS?

http://www.networksorcery.com/enp/protocol/sctp.htm

http://www.faqs.org/rfcs/rfc2960.html

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [openib-general] [ANNOUNCE] Contribute RDS (ReliableDatagramSockets) to OpenIB

2005-11-04 Thread Sean Hefty

Rick Frank wrote:

No, we do not use TCP sockets - we use too many connections for that: 100k+.


Isn't RDS implemented on top of reliable IB/RDMA connections anyway?

- Sean

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [openib-general] [ANNOUNCE] Contribute RDS (ReliableDatagramSockets) to OpenIB

2005-11-04 Thread Rick Frank

No, we do not use TCP sockets - we use too many connections for that: 100k+.
- Original Message - 
From: "Bob Woodruff" <[EMAIL PROTECTED]>
To: "'Rick Frank'" <[EMAIL PROTECTED]>; "Ranjit Pandit" 
<[EMAIL PROTECTED]>; "Grant Grundler" <[EMAIL PROTECTED]>

Cc: 
Sent: Friday, November 04, 2005 11:35 AM
Subject: RE: [openib-general] [ANNOUNCE] Contribute RDS 
(ReliableDatagramSockets) to OpenIB




Rick wrote,
I've attached a draft proposal for RDS from Oracle which discusses some
of the motivation for RDS.

I assume that you have a driver that uses TCP sockets, correct?
If so, have you compared the performance of RDS to SDP?

woody





___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


RE: [openib-general] [ANNOUNCE] Contribute RDS (ReliableDatagramSockets) to OpenIB

2005-11-04 Thread Bob Woodruff
Rick wrote,
>I've attached a draft proposal for RDS from Oracle which discusses some of
>the motivation for RDS.

I assume that you have a driver that uses TCP sockets, correct?
If so, have you compared the performance of RDS to SDP?

woody

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


RE: [openib-general] [ANNOUNCE] Contribute RDS (ReliableDatagramSockets) to OpenIB

2005-11-03 Thread Bob Woodruff
Grant wrote, 
>2) include some docs on its use and why RDS is better than SDP.
>3) nag people to review the ported code
>4) post functional test results

Looking at the code that is in the contrib branch,
it looks like RDS uses connected channels. Is that correct?
If so, I do not see that it provides any value over SDP.

If it were indeed using datagrams over IB, then I can see that
it might provide better scaling than SDP, since with very
large numbers of connections memory usage becomes an issue;
but as it is currently coded, I don't see the point.
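
For a sense of the numbers, a back-of-envelope estimate (the
per-connection footprint below is assumed, not measured):

    full mesh of n endpoints  ->  n*(n-1)/2 connections
    n = 460 endpoints         ->  460*459/2 = 105,570 connections
                                  (the "100k+" regime mentioned
                                  elsewhere in this thread)
    ~64 KB of QP/buffer state per connection (assumed)
                              ->  459 * 64 KB ~= 29 MB of pinned
                                  state per endpoint, versus one
                                  datagram socket under RDS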

I was unable to attend the RDS talk at OpenIB workshop, so
perhaps Rick can provide some reason why this protocol is
better than SDP.

woody

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general