On Mon, 25 Sep 2017 02:25:51 +0200 Daniel Borkmann <dan...@iogearbox.net> wrote:

> This work enables generic transfer of metadata from XDP into skb. The
> basic idea is that we can make use of the fact that the resulting skb
> must be linear and already comes with a larger headroom for supporting
> bpf_xdp_adjust_head(), which mangles xdp->data. Here, we base our work
> on a similar principle and introduce a small helper bpf_xdp_adjust_meta()
> for adjusting a new pointer called xdp->data_meta. Thus, the packet has
> a flexible and programmable room for meta data, followed by the actual
> packet data. struct xdp_buff is therefore laid out that we first point
> to data_hard_start, then data_meta directly prepended to data followed
> by data_end marking the end of packet. bpf_xdp_adjust_head() takes into
> account whether we have meta data already prepended and if so, memmove()s
> this along with the given offset provided there's enough room.
>
> [...] The scratch space at the head
> of the packet can be multiple of 4 byte up to 32 byte large. Drivers not
> yet supporting xdp->data_meta can simply be set up with xdp->data_meta
> as xdp->data + 1 as bpf_xdp_adjust_meta() will detect this and bail out,
> such that the subsequent match against xdp->data for later access is
> guaranteed to fail.

So, xdp->data_meta is placed just before the point where the packet
data (xdp->data) starts.

I'm currently implementing a cpumap type that transfers raw XDP frames
to another CPU, with the SKB allocated on the remote CPU. (It actually
works extremely well.)

To transfer the info I need, I'm currently using xdp->data_hard_start
(the top/start of the xdp page), which should be compatible with your
approach, right?
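
To make the compatibility question concrete, here is a minimal
userspace sketch of the layout described above. It is not kernel code:
the sketch_* names are hypothetical stand-ins, and the rule that a
struct stored at data_hard_start must stay clear of the meta area is my
assumption about how the two uses of the headroom could coexist.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the four pointers discussed above;
 * this is NOT the kernel's struct xdp_buff. */
struct sketch_xdp_buff {
	uint8_t *data_hard_start;	/* start of the headroom */
	uint8_t *data_meta;		/* start of meta data, directly before data */
	uint8_t *data;			/* start of packet data */
	uint8_t *data_end;		/* end of packet data */
};

/* Hypothetical stand-in for the info stored at data_hard_start. */
struct sketch_xdp_pkt {
	void *data;
	uint16_t len;
	uint16_t headroom;
	void *dev_rx;
};

/* Size of the meta area in bytes. A driver without data_meta support
 * sets data_meta = data + 1, which is treated as "no meta data" here. */
static size_t sketch_meta_len(const struct sketch_xdp_buff *xdp)
{
	if (xdp->data_meta > xdp->data)
		return 0;
	return (size_t)(xdp->data - xdp->data_meta);
}

/* Returns 1 when a sketch_xdp_pkt placed at data_hard_start overlaps
 * neither the meta area nor the packet data, 0 otherwise. */
static int sketch_pkt_fits(const struct sketch_xdp_buff *xdp)
{
	size_t headroom = (size_t)(xdp->data - xdp->data_hard_start);
	size_t meta = sketch_meta_len(xdp);

	assert(meta <= headroom);	/* meta data lives inside the headroom */
	return headroom - meta >= sizeof(struct sketch_xdp_pkt);
}
```

With 128 bytes of headroom and 32 bytes of meta data this check passes;
with only 8 bytes of headroom it fails, which matches the -EOVERFLOW
case in the enqueue code below.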

The info I need:

struct xdp_pkt {
	void *data;
	u16 len;
	u16 headroom;
	struct net_device *dev_rx;
};

When I enqueue the xdp packet I do the following:

int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_buff *xdp,
		    struct net_device *dev_rx)
{
	struct xdp_pkt *xdp_pkt;
	int headroom;

	/* Convert xdp_buff to xdp_pkt */
	headroom = xdp->data - xdp->data_hard_start;
	if (headroom < sizeof(*xdp_pkt))
		return -EOVERFLOW;
	xdp_pkt = xdp->data_hard_start;
	xdp_pkt->data = xdp->data;
	xdp_pkt->len  = xdp->data_end - xdp->data;
	xdp_pkt->headroom = headroom - sizeof(*xdp_pkt);

	/* Info needed when constructing SKB on remote CPU */
	xdp_pkt->dev_rx = dev_rx;

	bq_enqueue(rcpu, xdp_pkt);
	return 0;
}

On the remote CPU, when dequeueing the packet, I do the following. As
you can see, I'm still lacking some meta-data that would be nice to
transfer as well. Could I use your infrastructure for that?

static struct sk_buff *cpu_map_build_skb(struct bpf_cpu_map_entry *rcpu,
					 struct xdp_pkt *xdp_pkt)
{
	unsigned int truesize;
	void *pkt_data_start;
	struct sk_buff *skb;

	/* TODO: rcpu could provide truesize, it's static per RX-ring */
	truesize = 2048;

	// pkt_data_start = xdp_pkt + sizeof(*xdp_pkt);
	pkt_data_start = xdp_pkt->data - xdp_pkt->headroom;

	/* Need to adjust "truesize" for skb_shared_info to get proper
	 * placed, to take into account that xdp_pkt is using part of
	 * headroom
	 */
	skb = build_skb(pkt_data_start, truesize - sizeof(*xdp_pkt));
	if (!skb)
		return NULL;

	skb_reserve(skb, xdp_pkt->headroom);
	__skb_put(skb, xdp_pkt->len);

	// skb_record_rx_queue(skb, rx_ring->queue_index);
	skb->protocol = eth_type_trans(skb, xdp_pkt->dev_rx);

	// How much does csum matter?
	// skb->ip_summed = CHECKSUM_UNNECESSARY; // Try to fake it...

	// Does setting skb_set_hash() matter?
	// __skb_set_hash(skb, 42, true, false); // Say it is software
	// __skb_set_hash(skb, 42, false, true); // Say it is hardware

	// Do we lack setting rx_queue... it doesn't seem to matter
	// skb_record_rx_queue(skb, 0);

	return skb;
}

(I'll send out some patches soonish, hopefully tomorrow, to show in
more detail what I'm doing.)

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer