Kernel Connection Multiplexor (KCM) is a facility that provides a
message-based interface over TCP for generic application protocols.
The motivation for this is the observation that although TCP is a
byte stream transport protocol with no concept of message
boundaries, a common use case is to implement a framed application
layer protocol running over TCP. To date, most TCP stacks offer a
byte stream API to applications, which places the burden of message
delineation, message I/O operation atomicity, and load balancing
on the application. With KCM an application can efficiently send
and receive application protocol messages over TCP using a
datagram interface.
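
As a rough illustration of the datagram interface from userspace, a
minimal sketch of opening a KCM socket follows. The helper name
open_kcm_socket() is hypothetical, and the fallback constants are
assumed to match the AF_KCM and KCMPROTO_CONNECTED values used in
mainline headers; the authoritative description of the API is in
Documentation/networking/kcm.txt.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fallbacks for libc headers that predate KCM. The values are assumed
 * to match mainline include/linux/socket.h and include/uapi/linux/kcm.h. */
#ifndef AF_KCM
#define AF_KCM 41
#endif
#ifndef KCMPROTO_CONNECTED
#define KCMPROTO_CONNECTED 0
#endif

/* Hypothetical helper: returns a KCM socket fd, or -1 with errno set
 * (for example EAFNOSUPPORT when KCM is not available in the kernel). */
static int open_kcm_socket(void)
{
	int fd = socket(AF_KCM, SOCK_DGRAM, KCMPROTO_CONNECTED);

	if (fd < 0) {
		int saved = errno;	/* fprintf() may clobber errno */

		fprintf(stderr, "socket(AF_KCM): %s\n", strerror(saved));
		errno = saved;
	}
	return fd;
}
```

Once TCP connections have been attached to the socket's multiplexor,
each sendmsg or recvmsg on the returned descriptor is expected to
carry one whole application message rather than an arbitrary slice of
the byte stream.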

In order to delineate messages in a TCP stream for receive in KCM, the
kernel implements a message parser. For this we chose to employ BPF,
which is applied to the TCP stream. The BPF code parses application
layer messages and returns a message length. Nearly all binary
application protocols are parsable in this manner, so KCM should be
applicable across a wide range of applications. Other than message
length determination on receive, KCM does not require any application
specific awareness and does not implement any other application
protocol semantics; these are provided in userspace or could be
implemented in a kernel module layered above KCM.
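
To make the length-returning parser concrete, here is a userspace
sketch of the same idea in plain C. It is not the BPF program itself:
the wire format (a 2-byte big-endian payload length header), the names
frame_len() and count_messages(), and the convention of returning 0
for "need more data" are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HDR_LEN 2	/* hypothetical header: 16-bit big-endian payload length */

/* Total length (header + payload) of the message starting at buf,
 * or 0 if fewer than HDR_LEN bytes are available yet. */
static size_t frame_len(const uint8_t *buf, size_t avail)
{
	assert(buf != NULL);
	if (avail < HDR_LEN)
		return 0;
	return HDR_LEN + (((size_t)buf[0] << 8) | buf[1]);
}

/* Walk a received byte stream and count the complete messages in it.
 * *consumed is set to the number of bytes covered by those messages;
 * any remainder is a partial message that needs more data. */
static size_t count_messages(const uint8_t *stream, size_t len,
			     size_t *consumed)
{
	size_t off = 0, n = 0;

	while (off < len) {
		size_t flen = frame_len(stream + off, len - off);

		if (flen == 0 || flen > len - off)
			break;	/* incomplete header or payload */
		off += flen;
		n++;
	}
	*consumed = off;
	return n;
}
```

For the 10-byte stream 00 01 'a' 00 02 'b' 'c' 00 05 'x', this finds
two complete messages covering 7 bytes and leaves a 3-byte partial
message pending. The only protocol knowledge involved is how to read
the length, which is the one piece of application awareness KCM needs.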

KCM implements an NxM multiplexor in the kernel as diagrammed below:

+------------+   +------------+   +------------+   +------------+
| KCM socket |   | KCM socket |   | KCM socket |   | KCM socket |
+------------+   +------------+   +------------+   +------------+
      |                 |               |                |
      +-----------+     |               |     +----------+
                  |     |               |     |
               +----------------------------------+
               |           Multiplexor            |
               +----------------------------------+
                 |   |           |           |  |
       +---------+   |           |           |  ------------+
       |             |           |           |              |
+----------+  +----------+  +----------+  +----------+ +----------+
|  Psock   |  |  Psock   |  |  Psock   |  |  Psock   | |  Psock   |
+----------+  +----------+  +----------+  +----------+ +----------+
      |              |           |            |             |
+----------+  +----------+  +----------+  +----------+ +----------+
| TCP sock |  | TCP sock |  | TCP sock |  | TCP sock | | TCP sock |
+----------+  +----------+  +----------+  +----------+ +----------+

The KCM sockets provide the datagram interface to applications.
Psocks hold the state for each attached TCP connection (i.e. where
message delineation is performed on receive).

A description of the APIs and design can be found in the included
Documentation/networking/kcm.txt.

Testing:

For testing I have been developing kcmperf and super_kcmperf which
should allow functional verification and some baseline comparisons
with netperf using TCP. For the test results listed below, one
instance of kcmperf is run which creates a trivial MUX with one KCM
socket and one attached TCP connection.

netperf TCP_RR
   - 1 instance, 1 byte RR size
     34219 tps

   - 1 instance, 1000000 byte RR size
     464 tps

   - 200 instances, 1 byte RR size
     1721552 tps
     86.86% CPU utilization

   - 200 instances, 1000000 byte RR size
     1165 tps
     7.16% CPU utilization

kcmperf
   - 1 instance, 1 byte RR size
     32679 tps

   - 1 instance, 1000000 byte RR size
     412 tps

   - 200 instances, 1 byte RR size
     1420454 tps
     80.02% CPU utilization 

   - 200 instances, 1000000 byte RR size
     1130 tps
     10.08% CPU utilization

Future support:

The implementation provided here should be thought of as a first cut,
for which the goal is to establish a robust base implementation. There
are many avenues for extending this basic implementation and improving
upon this:

 - Sample application support
 - SOCK_SEQPACKET support
 - Integration with TLS (TLS-in-kernel is a separate initiative).
 - Page operations/splice support
 - sendmmsg, recvmmsg support
 - Unconnected KCM sockets. Will be able to attach sockets to different
   destinations; AF_KCM addresses will be used in sendmsg and recvmsg
   to indicate the destination
 - Explore more utility in performing BPF inline with a TCP data stream
   (setting SO_MARK, rxhash for messages being sent or received on
   KCM sockets).
 - Performance work
   - Reduce locking (MUX lock is currently a bottleneck).
   - KCM socket to TCP socket affinity
   - Small message coalescing, direct calls to TCP send functions

Tom Herbert (3):
  rcu: Add list_next_or_null_rcu
  kcm: Kernel Connection Multiplexor module
  kcm: Add statistics and proc interfaces

 Documentation/networking/kcm.txt |  173 ++++
 include/linux/rculist.h          |   21 +
 include/linux/socket.h           |    6 +-
 include/net/kcm.h                |  211 +++++
 include/uapi/linux/errqueue.h    |    1 +
 include/uapi/linux/kcm.h         |   26 +
 net/Kconfig                      |    1 +
 net/Makefile                     |    1 +
 net/kcm/Kconfig                  |   10 +
 net/kcm/Makefile                 |    3 +
 net/kcm/kcmproc.c                |  415 ++++++++++
 net/kcm/kcmsock.c                | 1629 ++++++++++++++++++++++++++++++++++++++
 12 files changed, 2496 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/networking/kcm.txt
 create mode 100644 include/net/kcm.h
 create mode 100644 include/uapi/linux/kcm.h
 create mode 100644 net/kcm/Kconfig
 create mode 100644 net/kcm/Makefile
 create mode 100644 net/kcm/kcmproc.c
 create mode 100644 net/kcm/kcmsock.c

-- 
1.8.1
