On Wed, 31 Jan 2018 14:53:37 +0100 Björn Töpel <bjorn.to...@gmail.com> wrote:
> The bpf_xdpsk_redirect call redirects the XDP context to the XDP
> socket bound to the receiving queue (if any).

As I explained in person at FOSDEM, my suggestion is to use the bpf-map infrastructure for AF_XDP redirect, but people on this upstream mailing list also need a chance to validate my idea ;-)

The important thing to keep in mind is how we can still maintain an SPSC (Single Producer Single Consumer) relationship between an RX queue and a userspace consumer process.

This AF_XDP "FOSDEM" patchset stores the "xsk" (xdp_sock) pointer directly in the net_device (_rx[].netdev_rx_queue.xs) structure. This limits each RX queue to servicing a single xdp_sock. That sounds good from an SPSC point of view, but it is not very flexible.

With an "xdp_sock_map" we get the flexibility to select among multiple xdp_socks (via an XDP pre-filter selecting a different map) and still maintain an SPSC relationship: the RX queue simply has several SPSC relationships, one with each individual xdp_sock. This holds for AF_XDP copy mode, and it requires fewer driver changes. For AF_XDP zero-copy (ZC) mode, drivers need significant changes anyhow, and in the ZC case we will have to disallow multiple xdp_socks. That is simply achieved by checking whether the xdp_sock pointer returned from the map lookup matches the one whose memory userspace asked the driver to register for the RX rings.

The "xdp_sock_map" is an array where the index corresponds to the queue_index. bpf_redirect_map() ignores the specified index and instead uses xdp_rxq_info->queue_index for the lookup.

Notice that a bpf-map has no pinned relationship with the device or with the loaded XDP prog. Thus, userspace needs to bind() this map to the device before traffic can flow, like the proposed bind() on the xdp_sock. This establishes the SPSC binding.
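To make the proposed lookup semantics concrete, here is a rough userspace model in C. It is only a sketch under stated assumptions: the names xsk_map, xsk_map_update(), xsk_map_bind(), xsk_redirect_lookup(), MAX_QUEUES and the zc_owner field are hypothetical illustrations, not existing kernel or libbpf APIs. It shows the three properties described above: the slot is chosen by the RX queue_index (the index passed by the XDP program is ignored), nothing is redirected until the map is bound to a device, and in ZC mode only the socket whose memory the driver registered is accepted.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_QUEUES 64 /* hypothetical upper bound on RX queues */

/* Hypothetical stand-in for the kernel's struct xdp_sock. */
struct xdp_sock {
	int id;
};

/* Simplified, hypothetical stand-in for struct xdp_rxq_info. */
struct rxq_info {
	int ifindex;
	unsigned int queue_index;
	/* ZC mode only: socket whose memory the driver registered for
	 * this queue's RX ring; NULL means the queue runs in copy mode. */
	struct xdp_sock *zc_owner;
};

/* Hypothetical "xdp_sock_map": an array indexed by RX queue_index. */
struct xsk_map {
	int bound_ifindex; /* 0 = not yet bound to any device */
	struct xdp_sock *slots[MAX_QUEUES];
};

/* Models userspace inserting a socket at a queue index. */
int xsk_map_update(struct xsk_map *m, unsigned int queue, struct xdp_sock *xs)
{
	if (queue >= MAX_QUEUES)
		return -1;
	m->slots[queue] = xs;
	return 0;
}

/* Models bind()ing the map to a device; required before traffic flows. */
void xsk_map_bind(struct xsk_map *m, int ifindex)
{
	m->bound_ifindex = ifindex;
}

/*
 * Models bpf_redirect_map() on this map type: the index supplied by the
 * XDP program is ignored and rxq->queue_index selects the slot, so each
 * RX queue only feeds the socket at its own slot (SPSC per map).
 * Returns the target socket, or NULL if the packet must not be redirected.
 */
struct xdp_sock *xsk_redirect_lookup(const struct xsk_map *m,
				     const struct rxq_info *rxq,
				     unsigned int ignored_index)
{
	struct xdp_sock *xs;

	(void)ignored_index;
	if (m->bound_ifindex == 0 || m->bound_ifindex != rxq->ifindex)
		return NULL; /* map not bound to this device */
	if (rxq->queue_index >= MAX_QUEUES)
		return NULL;
	xs = m->slots[rxq->queue_index];
	if (xs && rxq->zc_owner && xs != rxq->zc_owner)
		return NULL; /* ZC: only the registered socket is allowed */
	return xs;
}
```

In copy mode, a second map holding a different socket at the same queue index would give that RX queue a second, independent SPSC relationship; the zc_owner comparison is what rules this out in ZC mode.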
My proposal is that userspace inserts the xdp_sock file descriptor(s) into the map at the queue index, and the map (which is also just a file descriptor) is bound, perhaps via bind(), to a specific device (via its ifindex). The kernel side then walks the map and performs the required actions on the xdp_socks it finds there.

The TX side is harder, as now multiple xdp_socks can have the same queue-pair ID on the same net_device. But Magnus proposes that this can be solved with hardware: newer NICs have many TX queues, and the queue-pair ID is just an externally visible number, while the kernel-internal structure can point to a dedicated TX queue per xdp_sock.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer