Include a brief overview of the rsocket protocol and underlying design
with the source code to make it easier for someone trying to decipher
the actual code.

Signed-off-by: Sean Hefty <sean.he...@intel.com>
---
 docs/rsocket |  144 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 144 insertions(+), 0 deletions(-)
 create mode 100644 docs/rsocket

diff --git a/docs/rsocket b/docs/rsocket
new file mode 100644
index 0000000..5399f6c
--- /dev/null
+++ b/docs/rsocket
@@ -0,0 +1,144 @@
+rsocket Protocol and Design Guide               9/10/2012
+
+Overview
+--------
+Rsockets is a protocol over RDMA that supports a socket-level API
+for applications.  For details on the current state of the
+implementation, readers should refer to the rsocket man page.  This
+document describes the rsocket protocol, general design, and
+some implementation details.
+
+Rsockets exchanges data by performing RDMA write operations into
+exposed data buffers.  In addition to RDMA write data, rsockets uses
+small, 32-bit messages for internal communication.  RDMA writes
+are used to transfer application data into remote data buffers
+and to notify the peer when new target data buffers are available.
+The following figure highlights the operation.
+
+   host A                   host B
+                          remote SGL                      
+  target SGL  <-------------  [  ]
+     [  ] ------
+     [  ] --    ------   receive buffer(s)         
+            --        ----->  +--+
+              --              |  |
+                --            |  |
+                  --          |  |
+                    --        +--+
+                      --       
+                        --->  +--+
+                              |  |
+                              |  |
+                              +--+
+
+The remote SGL contains the address, size, and rkey of the target SGL.  As
+receive buffers become available on host B, rsockets will issue an RDMA
+write against one of the entries in the target SGL on host A.  The
+updated entry will reference an available receive buffer.  Immediate data
+included with the RDMA write will indicate to host A that a target SGE
+has been updated.
+
+When host A has data to send, it will check its target SGL.  The current
+target SGE will contain the address, size, and rkey of the next receive
+buffer on host B.  If the data transfer is smaller than the size of the
+remote receive buffer, host A will update its target SGE to reflect the
+remaining size of the receive buffer.  That is, once a receive buffer has
+been published to a remote peer, it will be fully consumed before a second
+buffer is used.
+
+Rsockets relies on immediate data to notify the remote peer when data has
+been transferred or when a target SGL has been updated.  Because immediate
+data requires that the remote QP have a posted receive, rsockets also uses
+a credit-based flow control mechanism.  The number of credits is based on
+the size of the receive queue, with initial credits exchanged during
+connection setup.  In order to transfer data, rsockets requires both
+available receive buffers (published via the target SGL) and data credits.
+
+Since immediate data is limited to 32 bits, messages may either indicate
+the arrival of application data or may be an internal message, but not both.
+To avoid credit deadlock, rsockets reserves a small number of available
+credits for control messages only, with the protocol relying on RNR NAKs
+and retries to make forward progress.
+
+
+Connection Establishment
+------------------------
+Rsockets uses the RDMA CM for connection establishment.  The rs_conn_data
+structure is exchanged during connection setup as private data in the
+request and reply messages.
+
+struct rs_sge {
+       uint64_t addr;
+       uint32_t key;
+       uint32_t length;
+};
+
+#define RS_CONN_FLAG_NET 1
+
+struct rs_conn_data {
+       uint8_t           version;
+       uint8_t           flags;
+       uint16_t          credits;
+       uint32_t          reserved2;
+       struct rs_sge target_sgl;
+       struct rs_sge data_buf;
+};
+
+Version - current version is 1
+Flags
+RS_CONN_FLAG_NET - Set to 1 if host is big endian.
+                   Determines byte ordering for RDMA write messages
+Credits - number of initial receive credits
+Reserved2 - set to 0
+Target SGL - Address, size (# entries), and rkey of target SGL.
+             Remote side will copy this into their remote SGL.
+Data Buffer - Initial receive buffer address, size (in bytes), and rkey.
+              Remote side will copy this into their first target SGE.
+
+
+Message Format
+--------------
+Rsockets uses RDMA writes with immediate data for all message exchanges.
+RDMA writes of 0 length are used if no additional data beyond the message
+needs to be exchanged.  Immediate data is limited to 32 bits.  Rsockets
+defines the following format for messages.
+
+The upper 3 bits are used to define the type of message being exchanged,
+with the meaning of the lower 29 bits determined by the upper bits.
+
+Bits    Message             Meaning of
+31:29    Type               Bits 28:0
+000    Data Transfer     bytes transferred
+001    reserved
+010    reserved
+011    reserved
+100    Credit Update     receive credits granted
+101    reserved
+110    reserved
+111    Control           control message type
+
+Data Transfer
+Indicates that application data has been written into the next available
+receive buffer.  The size of the transfer, in bytes, is carried in the lower
+bits of the message.
+
+Credit Update
+Used to indicate that additional receive buffers and credits are available.
+The number of available credits is carried in the lower bits of the message.
+A credit update message is also used to indicate that a target SGE has been
+updated, in which case the number of additional credits may be 0.  The
+receiver of a credit update message must check for updates to the target SGL
+by inspecting the contents of the SGL.  The rsocket implementation must take
+care not to modify a remote target SGL while it may be in use.  This is done
+by tracking when a receive buffer referenced by a remote target SGL has been
+filled.
+
+Control Message - DISCONNECT
+Indicates that the rsocket connection has been fully disconnected and will no
+longer send or receive data.  Data received before the disconnect message was
+processed may still be available for reading.
+
+Control Message - SHUTDOWN
+Indicates that the remote rsocket has shut down the send side of its
+connection.  The recipient of a shutdown message will no longer accept
+incoming data, but may still transfer outbound data.
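
Note on the sender-side rule in the Overview: a transfer needs both a
published receive buffer and a data credit, and a transfer smaller than
the buffer shrinks the current target SGE.  The sketch below illustrates
that bookkeeping only.  It is not the librdmacm code: rs_claim_target()
and its parameters are hypothetical names, and advancing addr together
with length is an assumption about how the remaining space is tracked.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as struct rs_sge in the guide. */
struct rs_sge {
	uint64_t addr;
	uint32_t key;
	uint32_t length;
};

/*
 * Hypothetical helper: claim up to 'want' bytes of the current target SGE.
 * Returns the number of bytes that may be written now (0 if there is no
 * credit, no space left, or nothing to send) and stores the remote
 * address to write to in *raddr.
 */
static uint32_t rs_claim_target(struct rs_sge *sge, uint16_t *credits,
				uint32_t want, uint64_t *raddr)
{
	uint32_t len;

	assert(sge && credits && raddr);
	if (want == 0 || *credits == 0 || sge->length == 0)
		return 0;

	len = want < sge->length ? want : sge->length;
	*raddr = sge->addr;

	/* Assumption: the SGE is advanced past the bytes just claimed. */
	sge->addr += len;
	sge->length -= len;
	(*credits)--;	/* each write with immediate data uses one credit */
	return len;
}
```

Once length reaches 0, the sender would move on to the next target SGE;
that step is left out here.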

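Note on Connection Establishment: since struct rs_conn_data is carried
as private data, both peers have to agree on its layout.  The sketch
below repeats the two structures from the guide and checks, at compile
time, that they have no hidden padding on a typical ABI (an assumption;
the guide does not state the sizes).  rs_host_is_big_endian() and
rs_conn_flags() are hypothetical helpers that show one way to derive the
RS_CONN_FLAG_NET bit.  The byte order of the individual fields on the
wire is not covered here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct rs_sge {
	uint64_t addr;
	uint32_t key;
	uint32_t length;
};

#define RS_CONN_FLAG_NET 1

struct rs_conn_data {
	uint8_t		version;
	uint8_t		flags;
	uint16_t	credits;
	uint32_t	reserved2;
	struct rs_sge	target_sgl;
	struct rs_sge	data_buf;
};

/* 8 bytes of header fields followed by two 16-byte SGEs. */
_Static_assert(sizeof(struct rs_sge) == 16, "unexpected rs_sge padding");
_Static_assert(offsetof(struct rs_conn_data, target_sgl) == 8,
	       "unexpected header padding");
_Static_assert(sizeof(struct rs_conn_data) == 40,
	       "unexpected rs_conn_data padding");

/* Hypothetical helper: returns 1 on a big endian host, 0 otherwise. */
static int rs_host_is_big_endian(void)
{
	const uint16_t probe = 1;

	return *(const uint8_t *)&probe == 0;
}

/* Hypothetical helper: flags value as described under "Flags". */
static uint8_t rs_conn_flags(void)
{
	return rs_host_is_big_endian() ? RS_CONN_FLAG_NET : 0;
}
```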

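Note on Message Format: the 3-bit type / 29-bit value split from the
table can be packed and unpacked as sketched below.  All names here
(rs_msg_encode() and friends, the RS_MSG_* constants) are hypothetical
and chosen for illustration; only the bit layout comes from the guide.
Byte ordering of the immediate data on the wire is not handled.

```c
#include <assert.h>
#include <stdint.h>

/* Message types from the table: value of bits 31:29. */
enum rs_msg_kind {
	RS_MSG_DATA	= 0x0,	/* 000: data transfer, value = bytes */
	RS_MSG_CREDIT	= 0x4,	/* 100: credit update, value = credits */
	RS_MSG_CONTROL	= 0x7	/* 111: control, value = control type */
};

#define RS_MSG_TYPE_SHIFT	29
#define RS_MSG_VALUE_MASK	((UINT32_C(1) << RS_MSG_TYPE_SHIFT) - 1)

/* Build the 32-bit immediate data word for a message. */
static inline uint32_t rs_msg_encode(enum rs_msg_kind kind, uint32_t value)
{
	assert(value <= RS_MSG_VALUE_MASK);	/* must fit in 29 bits */
	return ((uint32_t)kind << RS_MSG_TYPE_SHIFT) |
	       (value & RS_MSG_VALUE_MASK);
}

/* Extract bits 31:29. */
static inline uint32_t rs_msg_get_kind(uint32_t msg)
{
	return msg >> RS_MSG_TYPE_SHIFT;
}

/* Extract bits 28:0. */
static inline uint32_t rs_msg_get_value(uint32_t msg)
{
	return msg & RS_MSG_VALUE_MASK;
}
```

With this layout a single transfer is limited to 2^29 - 1 bytes, and a
data transfer message is simply the byte count, since its type bits are
all zero.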