Lustre layers the following protocols above TCP/IP...

<lustre service specific>
lustre RPC
LNET
socket LND

Andreas's previous reply should be enough to get you started on the Lustre RPC
layer and above, so here's some material on the levels below.
The relevant headers are...

   <lustre>/lnet/include/lnet/socklnd.h
   <lustre>/lnet/include/lnet/lib-types.h


Socket LND protocol
-------------------

The HELLO message is the first thing sent on the TCP/IP byte stream and is
used to negotiate connection attributes.  The protocol version is determined
from the first 4+4 bytes of the HELLO message, which contain a magic number
and the protocol version.
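
To make the 4+4 byte check concrete, here is a minimal Python sketch of how a
receiver might sniff the start of a HELLO.  The two magic values and the name
sniff_hello are assumptions (check socklnd.h and lib-types.h for the constants
in your tree); the point is only that the magic is tried in both byte orders.

```python
import struct

# ASSUMED magic values: check socklnd.h / lib-types.h for the real ones.
LNET_PROTO_TCP_MAGIC = 0xEEBC0DED   # V1 HELLO (whole message is LE)
LNET_PROTO_MAGIC = 0x45726963       # V2 HELLO (sender's byte order)


def sniff_hello(first8: bytes):
    """Classify a HELLO from its first 4+4 bytes (magic + version).

    Returns (protocol, order, version), where order is the struct prefix
    ('<' or '>') to use for the rest of the message.  For V1 the version
    is (major, minor) from lnet_magicversion_t; for V2 it is kshm_version.
    """
    if len(first8) < 8:
        raise ValueError("need the first 8 bytes of the HELLO")
    for order in ("<", ">"):
        (magic,) = struct.unpack(order + "I", first8[:4])
        if order == "<" and magic == LNET_PROTO_TCP_MAGIC:
            major, minor = struct.unpack("<HH", first8[4:8])
            return ("V1", "<", (major, minor))
        if magic == LNET_PROTO_MAGIC:
            # The order that yields the magic is the sender's byte order;
            # the receiver must flip if that differs from its own.
            (version,) = struct.unpack(order + "I", first8[4:8])
            return ("V2", order, version)
    raise ValueError("unrecognised HELLO magic")
```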

In KSOCK_PROTO_V1, the HELLO message is an lnet_hdr_t of type LNET_MSG_HELLO,
with the dest_nid field replaced by an lnet_magicversion_t.  This is followed
by 'payload_length' bytes of IP addresses (4 bytes each) listing the
interfaces that the sending socklnd owns.  The whole message is sent in LE
byte order.
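
A V1 HELLO could be decoded roughly as below.  This sketch assumes a 72-byte
lnet_hdr_t whose first 32 bytes are the 8-byte dest_nid slot (here holding the
magic and version), src_nid, dest_pid, src_pid, type and payload_length, that
LNET_MSG_HELLO == 4, and that each address is an IPv4 address held as a
host-order integer.  All of these are assumptions to verify against
lib-types.h, and parse_v1_hello is a hypothetical name.

```python
import socket
import struct

# ASSUMED lnet_hdr_t layout: magic/version in the dest_nid slot, src_nid,
# dest_pid, src_pid, type, payload_length (32 bytes, LE), then a 40-byte
# message union, 72 bytes in total.
LNET_HDR_FIXED = struct.Struct("<IHHQIIII")
LNET_HDR_SIZE = 72        # assumption
LNET_MSG_HELLO = 4        # assumption


def parse_v1_hello(buf: bytes):
    """Decode a complete V1 HELLO: header plus the trailing IP list."""
    magic, major, minor, src_nid, _dpid, _spid, mtype, plen = \
        LNET_HDR_FIXED.unpack_from(buf, 0)
    if mtype != LNET_MSG_HELLO:
        raise ValueError("not an LNET_MSG_HELLO header")
    if plen % 4:
        raise ValueError("payload_length is not a multiple of 4")
    end = LNET_HDR_SIZE + plen
    if len(buf) < end:
        raise ValueError("truncated HELLO")
    # Each address is a LE u32 holding a host-order IPv4 integer
    # (assumption), e.g. 0x0A000001 -> 10.0.0.1.
    ips = [socket.inet_ntoa(struct.pack(">I", ip))
           for (ip,) in struct.iter_unpack("<I", buf[LNET_HDR_SIZE:end])]
    return {"version": (major, minor), "src_nid": src_nid, "ips": ips}
```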

There is no socklnd-level V1 protocol after the initial HELLO, i.e. everything
that follows is unencapsulated LNET messages.
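
Since V1 adds no framing of its own after the HELLO, a reader simply
alternates between a fixed-size header and 'payload_length' bytes of payload.
The sketch below assumes sizeof(lnet_hdr_t) == 72 and payload_length at byte
offset 28 (both to be checked against lib-types.h); split_lnet_stream is a
hypothetical helper.

```python
import struct

LNET_HDR_SIZE = 72      # ASSUMED sizeof(lnet_hdr_t)
PAYLOAD_LEN_OFF = 28    # ASSUMED offset of payload_length (LE u32)


def split_lnet_stream(stream: bytes):
    """Split post-HELLO V1 bytes into (header, payload) pairs.

    Returns the complete messages plus any trailing partial bytes, which
    the caller keeps until more data arrives on the socket.
    """
    msgs, off = [], 0
    while len(stream) - off >= LNET_HDR_SIZE:
        (plen,) = struct.unpack_from("<I", stream, off + PAYLOAD_LEN_OFF)
        end = off + LNET_HDR_SIZE + plen
        if end > len(stream):
            break                      # payload not fully received yet
        msgs.append((stream[off:off + LNET_HDR_SIZE],
                     stream[off + LNET_HDR_SIZE:end]))
        off = end
    return msgs, stream[off:]
```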

In KSOCK_PROTO_V2, the HELLO message is a ksock_hello_msg_t.  The whole
message is sent in the sender's byte order, and the receiver can use the
byte order of 'kshm_magic' on arrival to determine whether it needs to flip.
From then on, every message is a ksock_msg_t, also sent in the sender's byte
order.  This either encapsulates an LNET message (ksm_type == KSOCK_MSG_LNET)
or is a NOOP.  Every message carries zero-copy request and ACK cookies so
that a zero-copy sender can determine when the source buffer can be released
without resorting to a kernel patch.  The NOOP is provided for delivering a
zero-copy ACK when there is no LNET message to piggy-back it on.
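
The V2 per-message handling might look like the following sketch.  The
ksock_msg_t field order, the KSOCK_MSG_* values, the use of 0 to mean "no
cookie", and the helper names (decode_ksock_msg, make_noop, ZeroCopySender)
are all hypothetical; it only shows decoding in the sender's byte order and
how the request/ACK cookies let a sender know when to release a buffer.

```python
import struct

# HYPOTHETICAL constants and layout: check socklnd.h for the real ones.
KSOCK_MSG_NOOP = 0xC0
KSOCK_MSG_LNET = 0xC1
KSM_FIXED = "IIQQ"   # type, checksum, zc request cookie, zc ACK cookie


def decode_ksock_msg(buf: bytes, order: str):
    """Decode the fixed ksock_msg_t fields in the sender's byte order.

    'order' is '<' or '>', as learnt from kshm_magic in the HELLO.
    """
    mtype, _csum, req, ack = struct.unpack_from(order + KSM_FIXED, buf, 0)
    if mtype not in (KSOCK_MSG_NOOP, KSOCK_MSG_LNET):
        raise ValueError("unknown ksm_type %#x" % mtype)
    return mtype, req, ack


def make_noop(ack_cookie: int, order: str = "<") -> bytes:
    """A NOOP whose only job is to carry a zero-copy ACK cookie."""
    return struct.pack(order + KSM_FIXED, KSOCK_MSG_NOOP, 0, 0, ack_cookie)


class ZeroCopySender:
    """Sender-side bookkeeping: a buffer sent zero-copy stays pinned
    until the peer echoes its request cookie back as an ACK cookie."""

    def __init__(self):
        self._next = 1            # 0 is assumed to mean "no cookie"
        self.pinned = {}

    def send(self, buf) -> int:
        cookie = self._next
        self._next += 1
        self.pinned[cookie] = buf
        return cookie             # goes in the request cookie field

    def on_receive(self, ack_cookie: int):
        # Any incoming message, LNET or NOOP, may carry the ACK.
        if ack_cookie:
            self.pinned.pop(ack_cookie, None)
```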

Note that the socklnd may connect to a peer via a "bundle" of three sockets:
one for bidirectional "ping-pong" data and the other two for unidirectional
bulk data.  However, the message protocol on every socket is as described
above.
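
How an outgoing message picks a socket from the bundle isn't spelled out
here.  One plausible policy, sketched below purely as an assumption, sends
large payloads on the outbound bulk socket and everything else on the
ping-pong socket.  The connection-type names, pick_conn, and the 1024-byte
threshold are hypothetical.

```python
# HYPOTHETICAL connection types and threshold; the real socklnd
# selection logic and tunables may differ.
CONN_CONTROL = "control"     # bidirectional ping-pong
CONN_BULK_OUT = "bulk_out"   # unidirectional, we send bulk data
CONN_BULK_IN = "bulk_in"     # unidirectional, peer sends bulk data
MIN_BULK = 1024              # assumed size threshold in bytes


def pick_conn(payload_len: int, conns: dict):
    """Choose the socket for an outgoing message of this payload size.

    bulk_in is never used for sending; fall back to the control socket
    if no outbound bulk socket exists.
    """
    want = CONN_BULK_OUT if payload_len >= MIN_BULK else CONN_CONTROL
    return conns.get(want, conns[CONN_CONTROL])
```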

LNET protocol
-------------

Every LNET message is an lnet_hdr_t sent in LE byte order, followed by
'payload_length' bytes of opaque payload data.  There are 4 types...

PUT   - request to send data contained in the payload
ACK   - response to a PUT with ack_wmd != LNET_WIRE_HANDLE_NONE
GET   - request to fetch data
REPLY - response to a GET with data in the payload

ACK and GET messages typically have 0 bytes of payload.
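
The four types (plus LNET_MSG_HELLO, which the V1 HELLO uses) can be read from
the common header fields.  This sketch assumes the enum values and field
order in the comments mirror lib-types.h; LnetMsgType, decode_lnet_hdr and
carries_data are hypothetical names.

```python
import struct
from enum import IntEnum


class LnetMsgType(IntEnum):
    # ASSUMED to mirror lnet_msg_type_t in lib-types.h
    ACK = 0
    PUT = 1
    GET = 2
    REPLY = 3
    HELLO = 4


# ASSUMED leading fields of lnet_hdr_t, all LE:
# dest_nid, src_nid, dest_pid, src_pid, type, payload_length
LNET_HDR_COMMON = struct.Struct("<QQIIII")


def decode_lnet_hdr(hdr: bytes):
    """Return (type, src_nid, dest_nid, payload_length)."""
    dst_nid, src_nid, _dpid, _spid, mtype, plen = \
        LNET_HDR_COMMON.unpack_from(hdr, 0)
    return LnetMsgType(mtype), src_nid, dst_nid, plen


def carries_data(mtype: LnetMsgType) -> bool:
    """PUT and REPLY carry data; ACK and GET typically have 0 bytes."""
    return mtype in (LnetMsgType.PUT, LnetMsgType.REPLY)
```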

                Cheers,
                        Eric


_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
