2018-02-05 14:42 GMT+01:00 Jesper Dangaard Brouer <bro...@redhat.com>:
> On Wed, 31 Jan 2018 14:53:37 +0100 Björn Töpel <bjorn.to...@gmail.com> wrote:
>
>> The bpf_xdpsk_redirect call redirects the XDP context to the XDP
>> socket bound to the receiving queue (if any).
>
> As I explained in-person at FOSDEM, my suggestion is to use the
> bpf-map infrastructure for AF_XDP redirect, but people on this
> upstream mailing also need a chance to validate my idea ;-)
>
> The important thing to keep in-mind is how we can still maintain a
> SPSC (Single producer Single Consumer) relationship between an
> RX-queue and a userspace consumer-process.
>
> This AF_XDP "FOSDEM" patchset, store the "xsk" (xdp_sock) pointer
> directly in the net_device (_rx[].netdev_rx_queue.xs) structure. This
> limit each RX-queue to service a single xdp_sock. It sounds good from
> a SPSC pov, but not very flexible. With a "xdp_sock_map" we can get
> the flexibility to select among multiple xdp_sock'ets (via XDP
> pre-filter selecting a different map), and still maintain a SPSC
> relationship. The RX-queue will just have several SPSC relationships
> with the individual xdp_sock's.
>
> This is true for the AF_XDP-copy mode, and require less driver change.
> For the AF_XDP-zero-copy (ZC) mode drivers need significant changes
> anyhow, and in ZC case we will have to disallow this multiple
> xdp_sock's, which is simply achieved by checking if the xdp_sock
> pointer returned from the map lookup match the one that userspace
> requested driver to register it's memory for RX-rings from.
>
> The "xdp_sock_map" is an array, where the index correspond to the
> queue_index. The bpf_redirect_map() ignore the specified index and
> instead use xdp_rxq_info->queue_index in the lookup.
>
> Notice that a bpf-map have no pinned relationship with the device or
> XDP prog loaded. Thus, userspace need to bind() this map to the
> device before traffic can flow, like the proposed bind() on the
> xdp_sock. This is to establish the SPSC binding. My proposal is that
> userspace insert the xdp_sock file-descriptor(s) in the map at the
> queue-index, and the map (which is also just a file-descriptor) is
> bound maybe via bind() to a specific device (via the ifindex). Kernel
> side will walk the map and do required actions xdp_sock's in find in
> map.
>
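
For anyone following along, here is a rough user-space sketch of the lookup semantics proposed above, as I read them: the map is an array indexed by queue_index, the index passed to the redirect is ignored in favour of the receiving queue's index, and in ZC mode the socket found must be the one registered with the driver. Every name in it (xsk_stub, xsk_map, xsk_map_lookup, xsk_zc_allowed, XSK_MAP_MAX_QUEUES) is made up for illustration. It is not kernel code and not a proposed API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define XSK_MAP_MAX_QUEUES 64 /* hypothetical upper bound, illustration only */

/* Stand-in for the kernel's xdp_sock; only an id, for illustration. */
struct xsk_stub {
	int id;
};

/* Hypothetical "xdp_sock_map": an array with one slot per RX queue index. */
struct xsk_map {
	struct xsk_stub *slots[XSK_MAP_MAX_QUEUES];
	size_t nr_queues;
};

/* Stand-in for the per-queue info (xdp_rxq_info) seen at redirect time. */
struct rxq_info_stub {
	unsigned int queue_index;
};

/* Create an empty map covering nr_queues RX queues. */
static void xsk_map_init(struct xsk_map *map, size_t nr_queues)
{
	assert(nr_queues <= XSK_MAP_MAX_QUEUES);
	map->nr_queues = nr_queues;
	for (size_t i = 0; i < XSK_MAP_MAX_QUEUES; i++)
		map->slots[i] = NULL;
}

/* Userspace side: place a socket at a queue index; false if out of range. */
static bool xsk_map_insert(struct xsk_map *map, unsigned int queue_index,
			   struct xsk_stub *xs)
{
	if (queue_index >= map->nr_queues)
		return false;
	map->slots[queue_index] = xs;
	return true;
}

/*
 * Redirect-time lookup: the caller-supplied index is deliberately ignored,
 * and the receiving queue's index selects the slot instead.
 */
static struct xsk_stub *xsk_map_lookup(const struct xsk_map *map,
				       const struct rxq_info_stub *rxq,
				       unsigned int ignored_index)
{
	(void)ignored_index;
	if (rxq->queue_index >= map->nr_queues)
		return NULL;
	return map->slots[rxq->queue_index];
}

/*
 * ZC-mode check: allow the redirect only if the socket found in the map is
 * the one whose memory was registered with the driver for this queue.
 */
static bool xsk_zc_allowed(const struct xsk_stub *found,
			   const struct xsk_stub *registered)
{
	return found != NULL && found == registered;
}
```

In this model, a pre-filter choosing between sockets amounts to choosing which xsk_map instance to pass to xsk_map_lookup(). Each map still holds at most one socket per queue index.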

As we discussed at FOSDEM, I like the idea of using a map. This also
opens the door to configuring the AF_XDP sockets via bpf code, like
sockmap does.

I'll have a stab at adding an "xdp_sock_map/xskmap" or similar, and
also at extending the cgroup sock_ops to support AF_XDP sockets, so
that the xskmap can be configured from bpf-land.


Björn

> TX-side is harder, as now multiple xdp_sock's can have the same
> queue-pair ID with the same net_device. But Magnus propose that this
> can be solved with hardware. As newer NICs have many TX-queue, and the
> queue-pair ID is just an external visible number, while the kernel
> internal structure can point to a dedicated TX-queue per xdp_sock.
>
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer