Lustre layers the following protocols above TCP/IP...
<lustre service specific>
lustre RPC
LNET
socket LND
Andreas's previous reply should be enough to start understanding the lustre RPC
and above, so here's some stuff on the levels below.
Relevant headers are...
<lustre>/lnet/include/lnet/socklnd.h
<lustre>/lnet/include/lnet/lib-types.h
Socket LND protocol
-------------------
The HELLO message is the first thing sent on the TCP/IP bytestream and it's
used to negotiate connection attributes. The protocol version is determined
by looking at the first 4+4 bytes of the hello message, which contain a magic
number and the protocol version.
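As a rough illustration of that check, here's a sketch that peeks at the
first 8 bytes and tries the magic in both byte orders. EXAMPLE_MAGIC is a
placeholder and peek_hello() is a hypothetical helper, not anything from the
socklnd source; the real constants are in the headers listed above. Treating
the second 4 bytes as a single 32-bit version is also an assumption made for
brevity.

```python
import struct

# Placeholder only: the real magic numbers are defined in the LNET headers.
EXAMPLE_MAGIC = 0x0A0B0C0D


def peek_hello(first8: bytes, magic: int = EXAMPLE_MAGIC):
    """Return (byte_order, version) from the first 4+4 bytes of a hello.

    byte_order is '<' (little-endian) or '>' (big-endian) in struct notation,
    i.e. whichever order makes the first 4 bytes equal the expected magic.
    """
    if len(first8) != 8:
        raise ValueError("need exactly 8 bytes")
    for order in ("<", ">"):
        seen_magic, version = struct.unpack(order + "II", first8)
        if seen_magic == magic:
            return order, version
    raise ValueError("unrecognised magic")
```

A receiver would call this on the first 8 bytes read from a new connection
and then pick the V1 or V2 parsing path from the returned version.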
In KSOCK_PROTO_V1, the hello message is an lnet_hdr_t of type LNET_MSG_HELLO,
with the dest_nid replaced by lnet_magicversion_t.
This is followed by 'payload_length' bytes of IP addresses (each 4 bytes) which
list the interfaces that the sending socklnd owns.
The whole message is sent in LE byte order.
There is no socklnd level V1 protocol after the initial HELLO - i.e. everything
that follows is unencapsulated LNET messages.
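To make the V1 hello payload concrete, here's a minimal sketch that decodes
the list of interface addresses. parse_v1_hello_ips() is a hypothetical
helper, and it assumes each 4-byte entry is a little-endian 32-bit integer
whose most significant byte is the first octet of the dotted-quad address;
check that against the socklnd source before relying on it.

```python
import ipaddress
import struct


def parse_v1_hello_ips(payload: bytes):
    """Decode 'payload_length' bytes of 4-byte IP addresses sent in LE order."""
    if len(payload) % 4:
        raise ValueError("payload is not a whole number of 4-byte addresses")
    count = len(payload) // 4
    # "<%dI" unpacks 'count' little-endian unsigned 32-bit integers.
    values = struct.unpack("<%dI" % count, payload)
    return [str(ipaddress.IPv4Address(v)) for v in values]
```

For example, a payload built with struct.pack("<2I", 0xC0A80001, 0x0A000002)
decodes to 192.168.0.1 and 10.0.0.2 under the assumption above.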
In KSOCK_PROTO_V2, the hello message is a ksock_hello_msg_t. The whole message
is sent in the sender's byte order, and the receiver can check the byte order
of 'kshm_magic' on arrival to determine whether it needs to flip.
From then on every message is a ksock_msg_t, also sent in the sender's byte
order. This either encapsulates an LNET message (ksm_type == KSOCK_MSG_LNET)
or is a NOOP. Every message includes zero-copy request and ACK cookies so that
a zero-copy sender can determine when the source buffer can be released
without resorting to a kernel patch. The NOOP is provided for delivering a
zero-copy ACK when there is no LNET message to piggy-back it on.
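The cookie idea can be sketched from the sender's side as below. This is only
an illustration of the bookkeeping, not how the socklnd implements it:
ZeroCopyTracker and its methods are hypothetical names, and reserving 0 to
mean "no cookie" is an assumption.

```python
class ZeroCopyTracker:
    """Sender-side bookkeeping: hold each buffer until its cookie is ACKed."""

    def __init__(self):
        self._next_cookie = 1   # 0 is kept free to mean "no cookie" (assumption)
        self._pending = {}      # cookie -> buffer still referenced by the send

    def send(self, buf):
        """Register a zero-copy send and return the request cookie to put
        in the outgoing message."""
        cookie = self._next_cookie
        self._next_cookie += 1
        self._pending[cookie] = buf
        return cookie

    def on_ack(self, cookie):
        """Handle an ACK cookie from the peer (piggy-backed or in a NOOP).

        Returns the buffer that may now be released, or None if the cookie
        is unknown or was already acknowledged.
        """
        return self._pending.pop(cookie, None)

    def pending(self):
        """Number of buffers that cannot be released yet."""
        return len(self._pending)
```

The point is just that the sender never reuses or frees a buffer until the
peer has echoed the matching cookie back.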
Note that the socklnd may connect to its peers via a "bundle" of sockets - one
for bidirectional "ping-pong" data and the other two for unidirectional bulk
data. However, the message protocol on every socket is as described above.
LNET protocol
-------------
Every LNET message is an lnet_hdr_t sent in LE byte order followed by
'payload_length' bytes of opaque payload data. There are 4
types...
PUT - request to send data contained in the payload
ACK - response to a PUT with ack_wmd != LNET_WIRE_HANDLE_NONE
GET - request to fetch data
REPLY - response to a GET with data in the payload
ACK and GET messages typically have 0 bytes of payload.
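Here's a minimal sketch of the framing described above: a fixed-size header
followed by 'payload_length' bytes of payload. The header size (72 bytes) and
the offset of payload_length (28) are assumptions for illustration only and
should be checked against lib-types.h; split_lnet_messages() is a
hypothetical helper.

```python
import struct

HDR_SIZE = 72             # assumed sizeof(lnet_hdr_t); verify in lib-types.h
PAYLOAD_LEN_OFFSET = 28   # assumed offset of payload_length; verify likewise


def split_lnet_messages(stream: bytes):
    """Split a byte stream into (header, payload) pairs.

    Returns (messages, leftover), where leftover is any trailing partial
    message that needs more bytes before it can be parsed.
    """
    messages = []
    pos = 0
    while len(stream) - pos >= HDR_SIZE:
        hdr = stream[pos:pos + HDR_SIZE]
        # The header is sent in LE byte order, so read a little-endian u32.
        (payload_length,) = struct.unpack_from("<I", hdr, PAYLOAD_LEN_OFFSET)
        end = pos + HDR_SIZE + payload_length
        if end > len(stream):
            break  # payload not fully received yet
        messages.append((hdr, stream[pos + HDR_SIZE:end]))
        pos = end
    return messages, stream[pos:]
```

A message with 0 bytes of payload (typical for ACK and GET) simply yields an
empty payload and the next header follows immediately.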
Cheers,
Eric