> -----Original Message----- > From: Ferruh Yigit <ferruh.yi...@amd.com> > Sent: Wednesday, February 8, 2023 6:31 PM > To: Koikkara Reeny, Shibin <shibin.koikkara.re...@intel.com>; > dev@dpdk.org; Zhang, Qi Z <qi.z.zh...@intel.com>; Burakov, Anatoly > <anatoly.bura...@intel.com>; Richardson, Bruce > <bruce.richard...@intel.com>; Mcnamara, John > <john.mcnam...@intel.com> > Cc: Loftus, Ciara <ciara.lof...@intel.com> > Subject: Re: [PATCH v3] net/af_xdp: AF_XDP PMD CNI Integration > > On 2/2/2023 4:55 PM, Shibin Koikkara Reeny wrote: > > Integrate support for the AF_XDP CNI and device plugin [1] so that the > > DPDK AF_XDP PMD can work in an unprivileged container environment. > > Part of the AF_XDP PMD initialization process involves loading an eBPF > > program onto the given netdev. This operation requires privileges, > > which prevents the PMD from being able to work in an unprivileged > > container (without root access). The plugin CNI handles the program > > loading. CNI open Unix Domain Socket (UDS) and waits listening for a > > client to make requests over that UDS. The client(DPDK) connects and a > > "handshake" occurs, then the File Descriptor which points to the > > XSKMAP associated with the loaded eBPF program is handed over to the > > client. The client can then proceed with creating an AF_XDP socket and > > inserting the socket into the XSKMAP pointed to by the FD received on > > the UDS. > > > > A new vdev arg "use_cni" is created to indicate user wishes to run the > > PMD in unprivileged mode and to receive the XSKMAP FD from the CNI. > > When this flag is set, the XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD libbpf > > flag should be used when creating the socket, which tells libbpf not > > to load the default libbpf program on the netdev. We tell libbpf not > > to do this because the loading is handled by the CNI in this scenario. > > > > Patch include howto doc explain how to configure AF_XDP CNI to working > > with DPDK. > > > > I can't able to test this but I will rely on Anatoly's tested-by to proceed. > > > [1]: https://github.com/intel/afxdp-plugins-for-kubernetes > > > > Signed-off-by: Shibin Koikkara Reeny <shibin.koikkara.re...@intel.com> > > --- > > doc/guides/howto/af_xdp_cni.rst | 254 +++++++++++++++++++++ > > doc/guides/howto/index.rst | 1 + > > drivers/net/af_xdp/rte_eth_af_xdp.c | 337 > > +++++++++++++++++++++++++++- > > 3 files changed, 580 insertions(+), 12 deletions(-) create mode > > 100644 doc/guides/howto/af_xdp_cni.rst > > > > diff --git a/doc/guides/howto/af_xdp_cni.rst > > b/doc/guides/howto/af_xdp_cni.rst new file mode 100644 index > > 0000000000..1cda08b2c6 > > --- /dev/null > > +++ b/doc/guides/howto/af_xdp_cni.rst > > @@ -0,0 +1,254 @@ > > +.. SPDX-License-Identifier: BSD-3-Clause > > + Copyright(c) 2020 Intel Corporation. > > 2023?
Will update it in the next patch. > > > + > > + > > +Using a CNI with the AF_XDP driver > > +================================== > > + > > + > > + > > Is multiple empty lines required? Will update it in the next patch. > > > +Introduction > > +------------ > > + > > +CNI, the Container Network Interface, is a technology for configuring > > +container network interfaces and which can be used to setup > > +Kubernetes networking. AF_XDP is a Linux socket Address Family that > > +enables an XDP program to redirect packets to a memory buffer in > userspace. > > + > > +This document explains how to enable the `AF_XDP Plugin for > > +Kubernetes`_ within a DPDK application using the `AF_XDP PMD`_ to > connect and use these technologies. > > + > > +.. _AF_XDP Plugin for Kubernetes: > > +https://github.com/intel/afxdp-plugins-for-kubernetes > > + > > +Background > > +---------- > > + > > +The standard `AF_XDP PMD`_ initialization process involves loading an > > +eBPF program onto the kernel netdev to be used by the PMD. This > > +operation requires root or escalated Linux privileges and thus > > +prevents the PMD from working in an unprivileged container. The > > +AF_XDP CNI plugin handles this situation by providing a device plugin that > performs the program loading. > > + > > +At a technical level the CNI opens a Unix Domain Socket and listens > > +for a client to make requests over that socket. A DPDK application > > +acting as a client connects and initiates a configuration > > +"handshake". The client then receives a file descriptor which points > > +to the XSKMAP associated with the loaded eBPF program. The XSKMAP is > > +a BPF map of AF_XDP sockets (XSK). The client can then proceed with > > +creating an AF_XDP socket and inserting that socket into the XSKMAP > pointed to by the descriptor. > > + > > +The EAL vdev argument ``use_cni`` is used to indicate that the user > > +wishes to run the PMD in unprivileged mode and to receive the XSKMAP > > +file descriptor from the CNI. When this flag is set, the > > +``XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD`` > > +libbpf flag should be used when creating the socket to instruct > > +libbpf not to load the default libbpf program on the netdev. Instead > > +the loading is handled by the CNI. > > + > > +.. _AF_XDP PMD: https://doc.dpdk.org/guides/nics/af_xdp.html > > + > > +.. Note:: > > + > > + The Unix Domain Socket file path appear in the end user is > "/tmp/afxdp.sock". > > + > > @Anatoly, I guess you did a patch in the past to group runtime files in a > common folder, it was something like '/tmp/dpdk/<prefix>/'. > Does that runtime folder includes the socket files? > /tmp/afxdp.sock is created by the AF_XDP CNI plugin not by the DPDK testpmd. So we don’t have the control over the creation of this socket. > Also is this fixes socket name a blocker to have multiple primary process (-- > file-prefix)? (assuming both using af_xdp driver). > > > > +Prerequisites > > +------------- > > + > > +Docker and container prerequisites: > > + > > +* Set up the device plugin as described in the instructions for > > +`AF_XDP Plugin > > + for Kubernetes`_. > > + > > +* The Docker image should contain the libbpf and libxdp libraries, > > +which > > + are dependencies for AF_XDP, and should include support for the > > +``ethtool`` > > + command. > > + > > Is there any specific library version dependency? If so can you please > document? > Library Libbpf and Libxdp are dependence for AF_XDP and it is already documented in the https://doc.dpdk.org/guides/nics/af_xdp.html. This link is also added as part of the Prerequisites. > <...> > > > + > > + For further reference please use the `config.json`_ > > + > > + .. _config.json: > > + https://github.com/intel/afxdp-plugins-for-kubernetes/blob/main/test > > + /e2e/config.json > > + > > > Should references have a fixed tag instead of a dynamic branch that can > change later, like: > https://github.com/intel/afxdp-plugins-for- > kubernetes/blob/v0.0.2/test/e2e/config.json > > What do you think? > I agree will update in the next version. > <...> > > > @@ -81,6 +82,24 @@ RTE_LOG_REGISTER_DEFAULT(af_xdp_logtype, > NOTICE); > > > > #define ETH_AF_XDP_MP_KEY "afxdp_mp_send_fds" > > > > +#define MAX_LONG_OPT_SZ 64 > > +#define UDS_MAX_FD_NUM 2 > > +#define UDS_MAX_CMD_LEN 64 > > +#define UDS_MAX_CMD_RESP 128 > > +#define UDS_XSK_MAP_FD_MSG "/xsk_map_fd" > > +#define UDS_SOCK "/tmp/afxdp.sock" > > +#define UDS_CONNECT_MSG "/connect" > > +#define UDS_HOST_OK_MSG "/host_ok" > > +#define UDS_HOST_NAK_MSG "/host_nak" > > +#define UDS_VERSION_MSG "/version" > > +#define UDS_XSK_MAP_FD_MSG "/xsk_map_fd" > > +#define UDS_XSK_SOCKET_MSG "/xsk_socket" > > +#define UDS_FD_ACK_MSG "/fd_ack" > > +#define UDS_FD_NAK_MSG "/fd_nak" > > +#define UDS_FIN_MSG "/fin" > > +#define UDS_FIN_ACK_MSG "/fin_ack" > > + > > + > > extra empty line Will remove it in the next version. > > > static int afxdp_dev_count; > > > > /* Message header to synchronize fds via IPC */ @@ -151,6 +170,7 @@ > > struct pmd_internals { > > char prog_path[PATH_MAX]; > > bool custom_prog_configured; > > bool force_copy; > > + bool use_cni; > > struct bpf_map *map; > > > > struct rte_ether_addr eth_addr; > > @@ -170,6 +190,7 @@ struct pmd_process_private { > > #define ETH_AF_XDP_PROG_ARG "xdp_prog" > > #define ETH_AF_XDP_BUDGET_ARG > "busy_budget" > > #define ETH_AF_XDP_FORCE_COPY_ARG "force_copy" > > +#define ETH_AF_XDP_USE_CNI_ARG "use_cni" > > > > Should document new devarg in driver documentation: > doc/guides/nics/af_xdp.rst > Will update it. > Also need to add it to 'RTE_PMD_REGISTER_PARAM_STRING' macro for > pmdinfo.py. > Will be updating it in the next version. > > static const char * const valid_arguments[] = { > > ETH_AF_XDP_IFACE_ARG, > > @@ -179,8 +200,8 @@ static const char * const valid_arguments[] = { > > ETH_AF_XDP_PROG_ARG, > > ETH_AF_XDP_BUDGET_ARG, > > ETH_AF_XDP_FORCE_COPY_ARG, > > - NULL > > -}; > > + ETH_AF_XDP_USE_CNI_ARG, > > + NULL}; > > > > please keep '}' in next line. > Will update it in the next version. > <...> > > > +static int > > +make_request_cni(int sock, struct sockaddr_un *server, char *request, > > + int *req_fd, char *response, int *out_fd) { > > + int rval; > > + > > + AF_XDP_LOG(INFO, "Request: [%s]\n", request); > > + > > Isn't INFO level logging too verbose for log each request and response, what > do you think for debug level? > I agree. Will update in the next version. > <...> > > > @@ -1584,6 +1865,26 @@ static const struct eth_dev_ops ops = { > > .get_monitor_addr = eth_get_monitor_addr, }; > > > > +/* CNI option works in unprivileged container environment > > + * and ethernet device functionality will be reduced. So > > + * additional customiszed eth_dev_ops struct is needed > > + * for cni. > > + **/ > > +static const struct eth_dev_ops ops_cni = { > > + .dev_start = eth_dev_start, > > + .dev_stop = eth_dev_stop, > > + .dev_close = eth_dev_close, > > + .dev_configure = eth_dev_configure, > > + .dev_infos_get = eth_dev_info, > > + .mtu_set = eth_dev_mtu_set, > > + .rx_queue_setup = eth_rx_queue_setup, > > + .tx_queue_setup = eth_tx_queue_setup, > > + .link_update = eth_link_update, > > + .stats_get = eth_stats_get, > > + .stats_reset = eth_stats_reset, > > + .get_monitor_addr = eth_get_monitor_addr, }; > > + > > The customisation is reduced dev_ops, right? Can you please comment > which functionality is removed and why (what kind of permission > requirement prevents that functionality to be used)? > Will update in the next version. > And can you please document in the driver documentation that 'use_cni' > will cause reduced functionality and what those dropped functionalities are? > This is updated in the Prerequisites as Pod should enable some capabilities as PMD will have only limited functionality. > > /** parse busy_budget argument */ > > static int > > parse_budget_arg(const char *key __rte_unused, @@ -1704,8 +2005,8 > @@ > > xdp_get_channels_info(const char *if_name, int *max_queues, > > > > static int > > parse_parameters(struct rte_kvargs *kvlist, char *if_name, int > *start_queue, > > - int *queue_cnt, int *shared_umem, char > *prog_path, > > - int *busy_budget, int *force_copy) > > + int *queue_cnt, int *shared_umem, char *prog_path, > > + int *busy_budget, int *force_copy, int *use_cni) > > At some point need to put these variables for paramters into a struct, instead > of keep increasing the parameter list. Feel free to make a preperation patch > before this patch if you want. I think it will be good to do. But I prefer to do it after this patch.