On Mon, 25 Sep 2017 02:25:51 +0200
Daniel Borkmann <dan...@iogearbox.net> wrote:

> This work enables generic transfer of metadata from XDP into skb. The
> basic idea is that we can make use of the fact that the resulting skb
> must be linear and already comes with a larger headroom for supporting
> bpf_xdp_adjust_head(), which mangles xdp->data. Here, we base our work
> on a similar principle and introduce a small helper bpf_xdp_adjust_meta()
> for adjusting a new pointer called xdp->data_meta. Thus, the packet has
> a flexible and programmable room for meta data, followed by the actual
> packet data. struct xdp_buff is therefore laid out that we first point
> to data_hard_start, then data_meta directly prepended to data followed
> by data_end marking the end of packet. bpf_xdp_adjust_head() takes into
> account whether we have meta data already prepended and if so, memmove()s
> this along with the given offset provided there's enough room.
> 
> [...] The scratch space at the head
> of the packet can be multiple of 4 byte up to 32 byte large. Drivers not
> yet supporting xdp->data_meta can simply be set up with xdp->data_meta
> as xdp->data + 1 as bpf_xdp_adjust_meta() will detect this and bail out,
> such that the subsequent match against xdp->data for later access is
> guaranteed to fail.

So, xdp->data_meta is placed just before the start of the packet data
at xdp->data.

I'm currently implementing a cpumap type that transfers raw XDP frames
to another CPU, where the SKB is then allocated on the remote CPU.  (It
actually works extremely well.)

To transfer the info I need, I'm currently using the area at
xdp->data_hard_start (the top/start of the xdp page), which should be
compatible with your approach, right?

The info I need:

 struct xdp_pkt {
        void *data;
        u16 len;
        u16 headroom;
        struct net_device *dev_rx;
 };

When I enqueue the xdp packet I do the following:

 int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_buff *xdp,
        struct net_device *dev_rx)
 {
        struct xdp_pkt *xdp_pkt;
        int headroom;

        /* Convert xdp_buff to xdp_pkt */
        headroom = xdp->data - xdp->data_hard_start;
        if (headroom < sizeof(*xdp_pkt))
                return -EOVERFLOW;
        xdp_pkt = xdp->data_hard_start;
        xdp_pkt->data = xdp->data;
        xdp_pkt->len  = xdp->data_end - xdp->data;
        xdp_pkt->headroom = headroom - sizeof(*xdp_pkt);

        /* Info needed when constructing SKB on remote CPU */
        xdp_pkt->dev_rx = dev_rx;

        bq_enqueue(rcpu, xdp_pkt);
        return 0;
 }
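
To make the compatibility question concrete, here is a minimal
userspace sketch (not kernel code) of the headroom layout.  It assumes
the header lives at data_hard_start and the metadata occupies
[data_meta, data), as described in the quoted text.  The names
xdp_buff_sketch, xdp_pkt_sketch and sketch_pkt_hdr_fits() are
hypothetical stand-ins invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins, not the kernel definitions. */
struct xdp_buff_sketch {
	uint8_t *data_hard_start; /* start of the buffer/page */
	uint8_t *data_meta;       /* start of metadata, directly before data */
	uint8_t *data;            /* start of packet data */
	uint8_t *data_end;        /* end of packet data */
};

struct xdp_pkt_sketch {
	void *data;
	uint16_t len;
	uint16_t headroom;
	void *dev_rx;             /* opaque here; a net_device in the kernel */
};

/* Returns 1 if a header stored at data_hard_start ends before the
 * metadata area begins, 0 if the two would overlap.
 */
static int sketch_pkt_hdr_fits(const struct xdp_buff_sketch *xdp)
{
	const uint8_t *meta_start = xdp->data_meta;
	ptrdiff_t room;

	assert(xdp->data_hard_start <= xdp->data);

	/* The quoted text says drivers without support set
	 * data_meta = data + 1; treat that as "no metadata".
	 */
	if (meta_start > xdp->data)
		meta_start = xdp->data;

	room = meta_start - xdp->data_hard_start;
	return room >= (ptrdiff_t)sizeof(struct xdp_pkt_sketch);
}
```

Under these assumptions the two uses of the headroom coexist as long
as the header plus the metadata fit in front of xdp->data, so the
headroom check would have to measure up to data_meta rather than data.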

On the remote CPU, when dequeueing the packet, I do the following.  As
you can see, I'm still lacking some meta-data that would be nice to
transfer as well.  Could I use your infrastructure for that?

 static struct sk_buff *cpu_map_build_skb(struct bpf_cpu_map_entry *rcpu,
                                          struct xdp_pkt *xdp_pkt)
 {
        unsigned int truesize;
        void *pkt_data_start;
        struct sk_buff *skb;

        /* TODO: rcpu could provide truesize, it's static per RX-ring */
        truesize = 2048;

        // pkt_data_start = (void *)xdp_pkt + sizeof(*xdp_pkt);
        pkt_data_start = xdp_pkt->data - xdp_pkt->headroom;

        /* Need to adjust "truesize" for skb_shared_info to get proper
         * placed, to take into account that xdp_pkt is using part of
         * headroom
         */
        skb = build_skb(pkt_data_start, truesize - sizeof(*xdp_pkt));
        if (!skb)
                return NULL;

        skb_reserve(skb, xdp_pkt->headroom);
        __skb_put(skb, xdp_pkt->len);

        // skb_record_rx_queue(skb, rx_ring->queue_index);
        skb->protocol = eth_type_trans(skb, xdp_pkt->dev_rx);

        // How much does csum matter? 
 //     skb->ip_summed = CHECKSUM_UNNECESSARY; // Try to fake it...

        // Does setting skb_set_hash() matter?
 //     __skb_set_hash(skb, 42, true, false); // Say it is software
 //     __skb_set_hash(skb, 42, false, true); // Say it is hardware

        // Do we lack setting rx_queue... it doesn't seem to matter
 //     skb_record_rx_queue(skb, 0);

        return skb;
 }
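
The pointer arithmetic shared by the two functions above can be
checked in userspace.  The following is only a sketch under stated
assumptions: xdp_pkt_sketch, sketch_to_pkt() and
sketch_pkt_data_start() are hypothetical names, dev_rx is reduced to
an opaque pointer, and no SKB is built.  It shows that
data - headroom lands on the first byte after the header, which is
the address handed to build_skb().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for struct xdp_pkt; dev_rx is opaque here. */
struct xdp_pkt_sketch {
	void *data;
	uint16_t len;
	uint16_t headroom;
	void *dev_rx;
};

/* Enqueue side: store the header at the start of the buffer.
 * Returns NULL when the headroom cannot hold the header.
 * The caller must pass a suitably aligned hard_start.
 */
static struct xdp_pkt_sketch *sketch_to_pkt(uint8_t *hard_start,
					    uint8_t *data, uint8_t *data_end)
{
	struct xdp_pkt_sketch *pkt;
	ptrdiff_t headroom = data - hard_start;

	assert(data <= data_end);
	if (headroom < (ptrdiff_t)sizeof(*pkt))
		return NULL;

	pkt = (struct xdp_pkt_sketch *)hard_start;
	pkt->data = data;
	pkt->len = (uint16_t)(data_end - data);
	/* Remaining headroom excludes the space taken by the header */
	pkt->headroom = (uint16_t)(headroom - (ptrdiff_t)sizeof(*pkt));
	pkt->dev_rx = NULL;
	return pkt;
}

/* Dequeue side: first byte after the header. */
static uint8_t *sketch_pkt_data_start(const struct xdp_pkt_sketch *pkt)
{
	return (uint8_t *)pkt->data - pkt->headroom;
}
```

Because the start address moves forward by sizeof(*xdp_pkt), the size
passed to build_skb() has to shrink by the same amount, which is what
the truesize adjustment above accounts for.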

(I'll send out some patches soonish, hopefully tomorrow, to show in
more detail what I'm doing.)

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
