This RFC proposes an implementation for a new action::

  socket(netns, inode, {actions})

This action will hook the ct/output paths that forward packets.  For
specific tuples, on specific datapaths, and with supported netdevs, it
introduces a way to bypass the normal reception path of IP{v4,v6}
packets and forward the data portions directly to the socket buffers
used by the application.

One partial implementation for this action was proposed at:
https://mail.openvswitch.org/pipermail/ovs-dev/2025-June/424390.html

This drew minimal feedback, so I'm proposing here an accompanying
userspace implementation that will provide support for flows using
the socket(netns,ino) proposal (the second proposal in that RFC) in
the 'output:' cases only (see patch 8 for details on the CT work that
still needs to be done).

Important note on patches 1-3:
==============================
Patches 1 & 2 aren't actually needed in this version.  I was spending
quite a bit of time on patch 3 getting the inet diag netlink messaging
working.  This proved to be more complex than expected because it
relies on calling setns(netns-fd), which transitions the current
context completely into the netns.  We need to use a clone that can
switch to the netns without impacting the main context.  To do that, I
was using a separate process and writing to a pipe, but that turned
out to be quite a bit of management, and I observed lots of
stalls/hangs.  The right thing to do may actually be to implement ovs
netlink commands to scan foreign netns details, but in order to get
feedback early I opted to use a heap of kluge scanning the procfs and
pulling details that way.  "It works" but it is very inelegant.

Important note on CT
====================
Some additional details are in patch 8, but basically the
recirculation generation always assumes that we will generate future
recirculations, and rewriting the chain prior to the ct() call is
quite a bit more work.  Without that rewriting, the major benefit of
this series would be lost (after all, we would take a major hit by
doing the ct() call first, and we would need to go through ct() before
we can reach an output port).  This needs more planning.

Sampling / tracing / etc
========================
I didn't implement support for ipfix/sflow yet.  That requires a bit
more work to get right.  Additionally, the way to actually dump the
traffic for these flows would be either with a mirror port (so you'd
want to use a utility like ovs-tcpdump that sets one up) or with
something like retis/bpftrace, since the skbuff forwarding 'skips' the
normal routing layers that handle pushing to listening sockets.

Performance details:
====================
I haven't benchmarked this implementation yet, as I've been working
with a kernel that has the older 'sock(tuple/try/commit)' primitives,
and I rather prefer this implementation because it seems 'cleaner'
in the end.

Aaron Conole (8):
  netdev: Add a mechanism for retrieving the target namespace.
  netlink: Introduce nl_msg_data helper for retrieving payload.
  netdev-linux: Add a socket inode and netns lookup.
  netdev: Add the ability to toggle socket lookup.
  bridge: Enable the configuration of 'socket-offload'.
  sockets: Add support for a direct socket submission action.
  dpif: Detected and advertise the socket action is available.
  ofproto: Add support for generating a socket action.

 NEWS                          |   6 +
 include/linux/openvswitch.h   |  12 ++
 lib/daemon-unix.c             |   1 +
 lib/dpif-netdev.c             |  23 +++
 lib/dpif.c                    |   3 +-
 lib/netdev-dummy.c            |  73 ++++++++
 lib/netdev-linux-private.h    |   1 +
 lib/netdev-linux.c            | 314 ++++++++++++++++++++++++++++++++++
 lib/netdev-provider.h         |  35 ++++
 lib/netdev.c                  | 128 ++++++++++++++
 lib/netdev.h                  |   9 +
 lib/netlink.c                 |  24 +++
 lib/netlink.h                 |   3 +
 lib/odp-execute.c             |   2 +
 lib/odp-util.c                |  80 ++++++++-
 ofproto/ofproto-dpif-ipfix.c  |   7 +
 ofproto/ofproto-dpif-rid.h    |   1 +
 ofproto/ofproto-dpif-sflow.c  |   7 +
 ofproto/ofproto-dpif-xlate.c  | 153 +++++++++++++++++
 ofproto/ofproto-dpif-xlate.h  |   9 +
 ofproto/ofproto-dpif.c        |  40 +++++
 ofproto/ofproto-dpif.h        |   5 +-
 tests/library.at              |   1 +
 tests/ofproto-dpif.at         |  81 +++++++++
 tests/test-netlink-policy.c   |  72 ++++++++
 utilities/checkpatch_dict.txt |   5 +
 vswitchd/bridge.c             |  14 ++
 vswitchd/vswitch.xml          |   7 +
 28 files changed, 1113 insertions(+), 3 deletions(-)

-- 
2.51.0

