Hi Oliver,

> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier MATZ
> Sent: Wednesday, July 6, 2016 8:43 AM
> To: Liang, Cunming <cunming.liang at intel.com>; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 05/18] mbuf: add function to get packet type
> from data
> 
> Hi Cunming,
> 
> On 07/06/2016 08:44 AM, Liang, Cunming wrote:
> > Hi Olivier,
> >
> > On 7/5/2016 11:41 PM, Olivier Matz wrote:
> >> Introduce the function rte_pktmbuf_get_ptype() that parses a mbuf and
> >> returns its packet type. For now, the following packet types are
> >> parsed:
> >>     L2: Ether
> >>     L3: IPv4, IPv6
> >>     L4: TCP, UDP, SCTP
> >>
> >> The goal here is to provide a reference implementation for packet
> >> type parsing. This function will be used by testpmd in next commits,
> >> allowing to compare its result with the value given by the hardware.
> >>
> >> This function will also be useful when implementing Rx offload
> >> support in virtio pmd. Indeed, the virtio protocol gives the csum
> >> start and offset, but it does not give the L4 protocol nor it tells
> >> if the checksum is relevant for inner or outer. This information has
> >> to be known to properly set the ol_flags in mbuf.
> >>
> >> Signed-off-by: Didier Pallard <didier.pallard at 6wind.com>
> >> Signed-off-by: Jean Dao <jean.dao at 6wind.com>
> >> Signed-off-by: Olivier Matz <olivier.matz at 6wind.com>
> >> ---
> >>   doc/guides/rel_notes/release_16_11.rst |   5 +
> >>   lib/librte_mbuf/Makefile               |   5 +-
> >>   lib/librte_mbuf/rte_mbuf_ptype.c       | 234
> >> +++++++++++++++++++++++++++++++++
> >>   lib/librte_mbuf/rte_mbuf_ptype.h       |  43 ++++++
> >>   lib/librte_mbuf/rte_mbuf_version.map   |   1 +
> >>   5 files changed, 286 insertions(+), 2 deletions(-)
> >>   create mode 100644 lib/librte_mbuf/rte_mbuf_ptype.c
> >>
> >> [...]
> >> +
> >> +/* parse mbuf data to get packet type */ uint32_t
> >> +rte_pktmbuf_get_ptype(const struct rte_mbuf *m,
> >> +    struct rte_mbuf_hdr_lens *hdr_lens) {
> >> +    struct rte_mbuf_hdr_lens local_hdr_lens;
> >> +    const struct ether_hdr *eh;
> >> +    struct ether_hdr eh_copy;
> >> +    uint32_t pkt_type = RTE_PTYPE_L2_ETHER;
> >> +    uint32_t off = 0;
> >> +    uint16_t proto;
> >> +
> >> +    if (hdr_lens == NULL)
> >> +        hdr_lens = &local_hdr_lens;
> >> +
> >> +    eh = rte_pktmbuf_read(m, off, sizeof(*eh), &eh_copy);
> >> +    if (unlikely(eh == NULL))
> >> +        return 0;
> >> +    proto = eh->ether_type;
> >> +    off = sizeof(*eh);
> >> +    hdr_lens->l2_len = off;
> >> +
> >> +    if (proto == rte_cpu_to_be_16(ETHER_TYPE_IPv4)) {
> >> +        const struct ipv4_hdr *ip4h;
> >> +        struct ipv4_hdr ip4h_copy;
> >> +
> >> +        ip4h = rte_pktmbuf_read(m, off, sizeof(*ip4h), &ip4h_copy);
> >> +        if (unlikely(ip4h == NULL))
> >> +            return pkt_type;
> >> +
> >> +        pkt_type |= ptype_l3_ip(ip4h->version_ihl);
> >> +        hdr_lens->l3_len = ip4_hlen(ip4h);
> >> +        off += hdr_lens->l3_len;
> >> +        if (ip4h->fragment_offset &
> >> +                rte_cpu_to_be_16(IPV4_HDR_OFFSET_MASK |
> >> +                    IPV4_HDR_MF_FLAG)) {
> >> +            pkt_type |= RTE_PTYPE_L4_FRAG;
> >> +            hdr_lens->l4_len = 0;
> >> +            return pkt_type;
> >> +        }
> >> +        proto = ip4h->next_proto_id;
> >> +        pkt_type |= ptype_l4(proto);
> >> +    } else if (proto == rte_cpu_to_be_16(ETHER_TYPE_IPv6)) {
> >> +        const struct ipv6_hdr *ip6h;
> >> +        struct ipv6_hdr ip6h_copy;
> >> +        int frag = 0;
> >> +
> >> +        ip6h = rte_pktmbuf_read(m, off, sizeof(*ip6h), &ip6h_copy);
> >> +        if (unlikely(ip6h == NULL))
> >> +            return pkt_type;
> >> +
> >> +        proto = ip6h->proto;
> >> +        hdr_lens->l3_len = sizeof(*ip6h);
> >> +        off += hdr_lens->l3_len;
> >> +        pkt_type |= ptype_l3_ip6(proto);
> >> +        if ((pkt_type & RTE_PTYPE_L3_MASK) == RTE_PTYPE_L3_IPV6_EXT) {
> >> +            proto = skip_ip6_ext(proto, m, &off, &frag);
> >> +            hdr_lens->l3_len = off - hdr_lens->l2_len;
> >> +        }
> >> +        if (proto == 0)
> >> +            return pkt_type;
> >> +        if (frag) {
> >> +            pkt_type |= RTE_PTYPE_L4_FRAG;
> >> +            hdr_lens->l4_len = 0;
> >> +            return pkt_type;
> >> +        }
> >> +        pkt_type |= ptype_l4(proto);
> >> +    }
> >> +
> >> +    if ((pkt_type & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_UDP) {
> >> +        hdr_lens->l4_len = sizeof(struct udp_hdr);
> >> +    } else if ((pkt_type & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP) {
> >> +        const struct tcp_hdr *th;
> >> +        struct tcp_hdr th_copy;
> >> +
> >> +        th = rte_pktmbuf_read(m, off, sizeof(*th), &th_copy);
> >> +        if (unlikely(th == NULL))
> >> +            return pkt_type & (RTE_PTYPE_L2_MASK |
> >> +                RTE_PTYPE_L3_MASK);
> >> +        hdr_lens->l4_len = (th->data_off & 0xf0) >> 2;
> >> +    } else if ((pkt_type & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_SCTP) {
> >> +        hdr_lens->l4_len = sizeof(struct sctp_hdr);
> >> +    } else {
> >> +        hdr_lens->l4_len = 0;
> >> +    }
> >> +
> >> +    return pkt_type;
> >> +}
> >> diff --git a/lib/librte_mbuf/rte_mbuf_ptype.h
> >> b/lib/librte_mbuf/rte_mbuf_ptype.h
> >> index 4a34678..f468520 100644
> >> --- a/lib/librte_mbuf/rte_mbuf_ptype.h
> >> +++ b/lib/librte_mbuf/rte_mbuf_ptype.h
> >> @@ -545,6 +545,49 @@ extern "C" {
> >>           RTE_PTYPE_INNER_L3_MASK |                \
> >>           RTE_PTYPE_INNER_L4_MASK))
> >>   +struct rte_mbuf;
> >> +
> >> +/**
> >> + * Structure containing header lengths associated to a packet.
> >> + */
> >> +struct rte_mbuf_hdr_lens {
> >> +    uint8_t l2_len;
> >> +    uint8_t l3_len;
> >> +    uint8_t l4_len;
> >> +    uint8_t tunnel_len;
> >> +    uint8_t inner_l2_len;
> >> +    uint8_t inner_l3_len;
> >> +    uint8_t inner_l4_len;
> >> +};
> > [LC] The header parsing graph usually is not unique. The definition
> > maybe nice for the basic IP and L4 tunnel.
> > However it can't scale out to other cases, e.g. qinq, mac-in-mac, mpls
> > l2/l3 tunnel.
> > The parsing logic of "rte_pktmbuf_get_ptype()" and the definition of
> > "struct rte_mbuf_hdr_lens" consist a pair for one specific parser scheme.
> > In this case, the fixed function is to support below.
> >
> > + * Supported packet types are:
> > + *   L2: Ether
> > + *   L3: IPv4, IPv6
> > + *   L4: TCP, UDP, SCTP
> >
> > Of course, it can add more packet type detection logic in future. But
> > the more support, the higher the cost.
> >
> > One of the alternative way is to allow registering parser pair. APP
> > decides to choose the predefined scheme(by DPDK LIB), or to
> > self-define the parsing logic.
> > In this way, the scheme can do some assumption for the specific case
> > and ignore some useless graph detection.
> > In addition, besides the SW parser, the HW parser(identified by
> > packet_type in mbuf) can be turn on/off by leveraging the same manner.
> 
> Sorry, I'm not sure I'm fully getting what you are saying. If I understand 
> well,
> you would like to have something more flexible that supports the registration 
> of
> protocol to be recognized?
> 
> I'm not sure having a function with a dynamic registration method would really
> increase the performance compared to a static complete function.
> Actually, we will never support a tons of protocols since each layer packet 
> type
> is 4 bits, and since it requires that at least one hw supports it.

This patch will be very useful as a reference implementation, but it also 
highlights an issue with the current implementation of packet types reporting 
by HW and SW - as you just mentioned there are only 4 bits per each layer. As 
these 4 bit are used as a enumeration it is impossible to reports multiple 
headers located on the same layer. MPLS is one example, different packets could 
have different numbers of MPLS labels, but it is impossible to report using 
current packet_type structure.

It is possible, however, to  program HW to report user (application) specific 
packet types. For example, for IPoMPLS with one MPLS label, HW will report 
packet type A, but for IPoMPLS with two MPLS labels HW will reports packet type 
B. In this case, instead of defining and supporting tons of statically defined 
(or enumerated) protocol headers combinations, application will register packet 
types it expects from HW in addition to standard packet types. At the moment we 
 have high bits of packet_type reserved, so one possible solution would be to 
use the highest bit to indicate that this is user defined packet_type, specific 
to the application. Then it could be used with HW and with SW parser. For 
example, packet_type 0x8000000A is IPoMPLS with one MPLS label, 0x8000000B is 
IPoMPLS with two MPLS labels and so on.

Regards,
Andrey
> 
> As described in the cover letter, the 2 main goals of this patchset are to 
> provide
> a reference implementation for packet type recognition, and enable the
> support of virtio offloads (I'll send the patchset soon).
> This function is adapted to these 2 usages. Are you thinking of another 
> use-case
> that would not be covered?

> 
> Regards,
> Olivier

Reply via email to