[dpdk-dev] [PATCH v8 1/6] ethdev: add Tx preparation

2016-10-24 Thread Kulasek, TomaszX


> -Original Message-
> From: Ananyev, Konstantin
> Sent: Monday, October 24, 2016 14:57
> To: Kulasek, TomaszX ; dev at dpdk.org
> Cc: olivier.matz at 6wind.com
> Subject: RE: [PATCH v8 1/6] ethdev: add Tx preparation
> 
> 
> 
> > -Original Message-
> > From: Kulasek, TomaszX
> > Sent: Monday, October 24, 2016 1:49 PM
> > To: Ananyev, Konstantin ; dev at dpdk.org
> > Cc: olivier.matz at 6wind.com
> > Subject: RE: [PATCH v8 1/6] ethdev: add Tx preparation
> >
> > Hi Konstantin,
> >
> > > -Original Message-
> > > From: Ananyev, Konstantin
> > > Sent: Monday, October 24, 2016 14:15
> > > To: Kulasek, TomaszX ; dev at dpdk.org
> > > Cc: olivier.matz at 6wind.com
> > > Subject: RE: [PATCH v8 1/6] ethdev: add Tx preparation
> > >
> > > Hi Tomasz,
> > >
> >
> > [...]
> >
> > > >
> > > > +/**
> > > > + * Fix pseudo header checksum
> > > > + *
> > > > + * This function fixes pseudo header checksum for TSO and non-TSO
> > > > +tcp/udp in
> > > > + * provided mbufs packet data.
> > > > + *
> > > > + * - for non-TSO tcp/udp packets full pseudo-header checksum is
> > > > +counted
> > > and set
> > > > + *   in packet data,
> > > > + * - for TSO the IP payload length is not included in pseudo
> header.
> > > > + *
> > > > + * This function expects that used headers are in the first data
> > > > +segment of
> > > > + * mbuf, and are not fragmented.
> > > > + *
> > > > + * @param m
> > > > + *   The packet mbuf to be validated.
> > > > + * @return
> > > > + *   0 if checksum is initialized properly
> > > > + */
> > > > +static inline int
> > > > +rte_phdr_cksum_fix(struct rte_mbuf *m) {
> > > > +   struct ipv4_hdr *ipv4_hdr;
> > > > +   struct ipv6_hdr *ipv6_hdr;
> > > > +   struct tcp_hdr *tcp_hdr;
> > > > +   struct udp_hdr *udp_hdr;
> > > > +   uint64_t ol_flags = m->ol_flags;
> > > > +   uint64_t inner_l3_offset = m->l2_len;
> > > > +
> > > > +   if (ol_flags & PKT_TX_OUTER_IP_CKSUM)
> > > > +   inner_l3_offset += m->outer_l2_len + m->outer_l3_len;
> > > > +
> > > > +   /* headers are fragmented */
> > > > +   if (unlikely(rte_pktmbuf_data_len(m) >= inner_l3_offset +
> > > > +m->l3_len
> > > +
> > > > +   m->l4_len))
> > >
> > > Might be better to move that check into rte_validate_tx_offload(),
> > > so it would be called only when TX_DEBUG is on.
> >
> > While unfragmented headers are not general requirements for Tx
> > offloads, and this requirement is for this particular implementation,
> maybe for performance reasons will be better to keep it here, and just add
> #if DEBUG to leave rte_validate_tx_offload more generic.
> 
> Hmm and what is the advantage to pollute that code with more ifdefs?
> Again, why unfragmented headers are not general requirements?
> As long as DPDK pseudo-headear csum calculation routines can't handle
> fragmented case, it pretty much is a general requirement, no?
> Konstantin
> 

Ok, you're right, if we assume that this is general requirement, it should be 
moved.

> >
> > > Another thing, shouldn't it be:
> > > if (rte_pktmbuf_data_len(m) < inner_l3_offset + m->l3_len + m->l4_len)
> ?
> >
> > Yes, it should.
> >
> > > Konstantin
> > >
> >
> > Tomasz


[dpdk-dev] [PATCH v8 1/6] ethdev: add Tx preparation

2016-10-24 Thread Ananyev, Konstantin


> -Original Message-
> From: Kulasek, TomaszX
> Sent: Monday, October 24, 2016 1:49 PM
> To: Ananyev, Konstantin ; dev at dpdk.org
> Cc: olivier.matz at 6wind.com
> Subject: RE: [PATCH v8 1/6] ethdev: add Tx preparation
> 
> Hi Konstantin,
> 
> > -Original Message-
> > From: Ananyev, Konstantin
> > Sent: Monday, October 24, 2016 14:15
> > To: Kulasek, TomaszX ; dev at dpdk.org
> > Cc: olivier.matz at 6wind.com
> > Subject: RE: [PATCH v8 1/6] ethdev: add Tx preparation
> >
> > Hi Tomasz,
> >
> 
> [...]
> 
> > >
> > > +/**
> > > + * Fix pseudo header checksum
> > > + *
> > > + * This function fixes pseudo header checksum for TSO and non-TSO
> > > +tcp/udp in
> > > + * provided mbufs packet data.
> > > + *
> > > + * - for non-TSO tcp/udp packets full pseudo-header checksum is counted
> > and set
> > > + *   in packet data,
> > > + * - for TSO the IP payload length is not included in pseudo header.
> > > + *
> > > + * This function expects that used headers are in the first data
> > > +segment of
> > > + * mbuf, and are not fragmented.
> > > + *
> > > + * @param m
> > > + *   The packet mbuf to be validated.
> > > + * @return
> > > + *   0 if checksum is initialized properly
> > > + */
> > > +static inline int
> > > +rte_phdr_cksum_fix(struct rte_mbuf *m) {
> > > + struct ipv4_hdr *ipv4_hdr;
> > > + struct ipv6_hdr *ipv6_hdr;
> > > + struct tcp_hdr *tcp_hdr;
> > > + struct udp_hdr *udp_hdr;
> > > + uint64_t ol_flags = m->ol_flags;
> > > + uint64_t inner_l3_offset = m->l2_len;
> > > +
> > > + if (ol_flags & PKT_TX_OUTER_IP_CKSUM)
> > > + inner_l3_offset += m->outer_l2_len + m->outer_l3_len;
> > > +
> > > + /* headers are fragmented */
> > > + if (unlikely(rte_pktmbuf_data_len(m) >= inner_l3_offset + m->l3_len
> > +
> > > + m->l4_len))
> >
> > Might be better to move that check into rte_validate_tx_offload(), so it
> > would be called only when TX_DEBUG is on.
> 
> While unfragmented headers are not general requirements for Tx offloads, and 
> this requirement is for this particular implementation,
> maybe for performance reasons will be better to keep it here, and just add 
> #if DEBUG to leave rte_validate_tx_offload more generic.

Hmm and what is the advantage to pollute that code with more ifdefs?
Again, why unfragmented headers are not general requirements?
As long as DPDK pseudo-headear csum calculation routines can't handle 
fragmented case,
it pretty much is a general requirement, no?
Konstantin

> 
> > Another thing, shouldn't it be:
> > if (rte_pktmbuf_data_len(m) < inner_l3_offset + m->l3_len + m->l4_len) ?
> 
> Yes, it should.
> 
> > Konstantin
> >
> 
> Tomasz


[dpdk-dev] [PATCH v8 1/6] ethdev: add Tx preparation

2016-10-24 Thread Kulasek, TomaszX
Hi Konstantin,

> -Original Message-
> From: Ananyev, Konstantin
> Sent: Monday, October 24, 2016 14:15
> To: Kulasek, TomaszX ; dev at dpdk.org
> Cc: olivier.matz at 6wind.com
> Subject: RE: [PATCH v8 1/6] ethdev: add Tx preparation
> 
> Hi Tomasz,
> 

[...]

> >
> > +/**
> > + * Fix pseudo header checksum
> > + *
> > + * This function fixes pseudo header checksum for TSO and non-TSO
> > +tcp/udp in
> > + * provided mbufs packet data.
> > + *
> > + * - for non-TSO tcp/udp packets full pseudo-header checksum is counted
> and set
> > + *   in packet data,
> > + * - for TSO the IP payload length is not included in pseudo header.
> > + *
> > + * This function expects that used headers are in the first data
> > +segment of
> > + * mbuf, and are not fragmented.
> > + *
> > + * @param m
> > + *   The packet mbuf to be validated.
> > + * @return
> > + *   0 if checksum is initialized properly
> > + */
> > +static inline int
> > +rte_phdr_cksum_fix(struct rte_mbuf *m) {
> > +   struct ipv4_hdr *ipv4_hdr;
> > +   struct ipv6_hdr *ipv6_hdr;
> > +   struct tcp_hdr *tcp_hdr;
> > +   struct udp_hdr *udp_hdr;
> > +   uint64_t ol_flags = m->ol_flags;
> > +   uint64_t inner_l3_offset = m->l2_len;
> > +
> > +   if (ol_flags & PKT_TX_OUTER_IP_CKSUM)
> > +   inner_l3_offset += m->outer_l2_len + m->outer_l3_len;
> > +
> > +   /* headers are fragmented */
> > +   if (unlikely(rte_pktmbuf_data_len(m) >= inner_l3_offset + m->l3_len
> +
> > +   m->l4_len))
> 
> Might be better to move that check into rte_validate_tx_offload(), so it
> would be called only when TX_DEBUG is on.

While unfragmented headers are not general requirements for Tx offloads, and 
this requirement is for this particular implementation, maybe for performance 
reasons will be better to keep it here, and just add #if DEBUG to leave 
rte_validate_tx_offload more generic.

> Another thing, shouldn't it be:
> if (rte_pktmbuf_data_len(m) < inner_l3_offset + m->l3_len + m->l4_len) ?

Yes, it should.

> Konstantin
> 

Tomasz


[dpdk-dev] [PATCH v8 1/6] ethdev: add Tx preparation

2016-10-24 Thread Ananyev, Konstantin
Hi Tomasz,

> 
>  /**
> + * Validate general requirements for tx offload in mbuf.
> + *
> + * This function checks correctness and completeness of Tx offload settings.
> + *
> + * @param m
> + *   The packet mbuf to be validated.
> + * @return
> + *   0 if packet is valid
> + */
> +static inline int
> +rte_validate_tx_offload(const struct rte_mbuf *m)
> +{
> + uint64_t ol_flags = m->ol_flags;
> +
> + /* Does packet set any of available offloads? */
> + if (!(ol_flags & PKT_TX_OFFLOAD_MASK))
> + return 0;
> +
> + /* IP checksum can be counted only for IPv4 packet */
> + if ((ol_flags & PKT_TX_IP_CKSUM) && (ol_flags & PKT_TX_IPV6))
> + return -EINVAL;
> +
> + /* IP type not set when required */
> + if (ol_flags & (PKT_TX_L4_MASK | PKT_TX_TCP_SEG))
> + if (!(ol_flags & (PKT_TX_IPV4 | PKT_TX_IPV6)))
> + return -EINVAL;
> +
> + /* Check requirements for TSO packet */
> + if (ol_flags & PKT_TX_TCP_SEG)
> + if ((m->tso_segsz == 0) ||
> + ((ol_flags & PKT_TX_IPV4) &&
> + !(ol_flags & PKT_TX_IP_CKSUM)))
> + return -EINVAL;
> +
> + /* PKT_TX_OUTER_IP_CKSUM set for non outer IPv4 packet. */
> + if ((ol_flags & PKT_TX_OUTER_IP_CKSUM) &&
> + !(ol_flags & PKT_TX_OUTER_IPV4))
> + return -EINVAL;
> +
> + return 0;
> +}
> +
> +/**
>   * Dump an mbuf structure to a file.
>   *
>   * Dump all fields for the given packet mbuf and all its associated
> diff --git a/lib/librte_net/rte_net.h b/lib/librte_net/rte_net.h
> index d4156ae..79669d7 100644
> --- a/lib/librte_net/rte_net.h
> +++ b/lib/librte_net/rte_net.h
> @@ -38,6 +38,11 @@
>  extern "C" {
>  #endif
> 
> +#include 
> +#include 
> +#include 
> +#include 
> +
>  /**
>   * Structure containing header lengths associated to a packet, filled
>   * by rte_net_get_ptype().
> @@ -86,6 +91,91 @@ struct rte_net_hdr_lens {
>  uint32_t rte_net_get_ptype(const struct rte_mbuf *m,
>   struct rte_net_hdr_lens *hdr_lens, uint32_t layers);
> 
> +/**
> + * Fix pseudo header checksum
> + *
> + * This function fixes pseudo header checksum for TSO and non-TSO tcp/udp in
> + * provided mbufs packet data.
> + *
> + * - for non-TSO tcp/udp packets full pseudo-header checksum is counted and 
> set
> + *   in packet data,
> + * - for TSO the IP payload length is not included in pseudo header.
> + *
> + * This function expects that used headers are in the first data segment of
> + * mbuf, and are not fragmented.
> + *
> + * @param m
> + *   The packet mbuf to be validated.
> + * @return
> + *   0 if checksum is initialized properly
> + */
> +static inline int
> +rte_phdr_cksum_fix(struct rte_mbuf *m)
> +{
> + struct ipv4_hdr *ipv4_hdr;
> + struct ipv6_hdr *ipv6_hdr;
> + struct tcp_hdr *tcp_hdr;
> + struct udp_hdr *udp_hdr;
> + uint64_t ol_flags = m->ol_flags;
> + uint64_t inner_l3_offset = m->l2_len;
> +
> + if (ol_flags & PKT_TX_OUTER_IP_CKSUM)
> + inner_l3_offset += m->outer_l2_len + m->outer_l3_len;
> +
> + /* headers are fragmented */
> + if (unlikely(rte_pktmbuf_data_len(m) >= inner_l3_offset + m->l3_len +
> + m->l4_len))

Might be better to move that check into rte_validate_tx_offload(),
so it would be called only when TX_DEBUG is on.
Another thing, shouldn't it be:
if (rte_pktmbuf_data_len(m) < inner_l3_offset + m->l3_len + m->l4_len)
?
Konstantin


> + return -ENOTSUP;
> +
> + if ((ol_flags & PKT_TX_UDP_CKSUM) == PKT_TX_UDP_CKSUM) {
> + if (ol_flags & PKT_TX_IPV4) {
> + ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
> + inner_l3_offset);
> +
> + if (ol_flags & PKT_TX_IP_CKSUM)
> + ipv4_hdr->hdr_checksum = 0;
> +
> + udp_hdr = (struct udp_hdr *)((char *)ipv4_hdr +
> + m->l3_len);
> + udp_hdr->dgram_cksum = rte_ipv4_phdr_cksum(ipv4_hdr,
> + ol_flags);
> + } else {
> + ipv6_hdr = rte_pktmbuf_mtod_offset(m, struct ipv6_hdr *,
> + inner_l3_offset);
> + /* non-TSO udp */
> + udp_hdr = rte_pktmbuf_mtod_offset(m, struct udp_hdr *,
> + inner_l3_offset + m->l3_len);
> + udp_hdr->dgram_cksum = rte_ipv6_phdr_cksum(ipv6_hdr,
> + ol_flags);
> + }
> + } else if ((ol_flags & PKT_TX_TCP_CKSUM) ||
> + (ol_flags & PKT_TX_TCP_SEG)) {
> + if (ol_flags & PKT_TX_IPV4) {
> + ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
> + inner_l3_offset);
> +
> +   

[dpdk-dev] [PATCH v8 1/6] ethdev: add Tx preparation

2016-10-21 Thread Tomasz Kulasek
Added API for `rte_eth_tx_prep`

uint16_t rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id,
struct rte_mbuf **tx_pkts, uint16_t nb_pkts)

Added fields to the `struct rte_eth_desc_lim`:

uint16_t nb_seg_max;
/**< Max number of segments per whole packet. */

uint16_t nb_mtu_seg_max;
/**< Max number of segments per one MTU */

Added functions:

int rte_validate_tx_offload(const struct rte_mbuf *m)
to validate general requirements for tx offload set in mbuf of packet
  such a flag completness. In current implementation this function is
  called optionaly when RTE_LIBRTE_ETHDEV_DEBUG is enabled.

int rte_phdr_cksum_fix(struct rte_mbuf *m)
to fix pseudo header checksum for TSO and non-TSO tcp/udp packets
before hardware tx checksum offload.
 - for non-TSO tcp/udp packets full pseudo-header checksum is
   counted and set.
 - for TSO the IP payload length is not included.

PERFORMANCE TESTS
-

This feature was tested with modified csum engine from test-pmd.

The packet checksum preparation was moved from application to Tx
preparation step placed before burst.

We may expect some overhead costs caused by:
1) using additional callback before burst,
2) rescanning burst,
3) additional condition checking (packet validation),
4) worse optimization (e.g. packet data access, etc.)

We tested it using ixgbe Tx preparation implementation with some parts
disabled to have comparable information about the impact of diferent
parts of implementation.

IMPACT:

1) For unimplemented Tx preparation callback the performance impact is
   negligible,
2) For packet condition check without checksum modifications (nb_segs,
   available offloads, etc.) is 14626628/14252168 (~2.62% drop),
3) Full support in ixgbe driver (point 2 + packet checksum
   initialization) is 14060924/13588094 (~3.48% drop)

Signed-off-by: Tomasz Kulasek 
---
 config/common_base|1 +
 lib/librte_ether/rte_ethdev.h |   97 +
 lib/librte_mbuf/rte_mbuf.h|   56 
 lib/librte_net/rte_net.h  |   90 ++
 4 files changed, 244 insertions(+)

diff --git a/config/common_base b/config/common_base
index c7fd3db..619284b 100644
--- a/config/common_base
+++ b/config/common_base
@@ -120,6 +120,7 @@ CONFIG_RTE_MAX_QUEUES_PER_PORT=1024
 CONFIG_RTE_LIBRTE_IEEE1588=n
 CONFIG_RTE_ETHDEV_QUEUE_STAT_CNTRS=16
 CONFIG_RTE_ETHDEV_RXTX_CALLBACKS=y
+CONFIG_RTE_ETHDEV_TX_PREP=y

 #
 # Support NIC bypass logic
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 38641e8..d548d48 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -182,6 +182,7 @@ extern "C" {
 #include 
 #include 
 #include 
+#include 
 #include "rte_ether.h"
 #include "rte_eth_ctrl.h"
 #include "rte_dev_info.h"
@@ -699,6 +700,8 @@ struct rte_eth_desc_lim {
uint16_t nb_max;   /**< Max allowed number of descriptors. */
uint16_t nb_min;   /**< Min allowed number of descriptors. */
uint16_t nb_align; /**< Number of descriptors should be aligned to. */
+   uint16_t nb_seg_max; /**< Max number of segments per whole packet. 
*/
+   uint16_t nb_mtu_seg_max; /**< Max number of segments per one MTU */
 };

 /**
@@ -1188,6 +1191,11 @@ typedef uint16_t (*eth_tx_burst_t)(void *txq,
   uint16_t nb_pkts);
 /**< @internal Send output packets on a transmit queue of an Ethernet device. 
*/

+typedef uint16_t (*eth_tx_prep_t)(void *txq,
+  struct rte_mbuf **tx_pkts,
+  uint16_t nb_pkts);
+/**< @internal Prepare output packets on a transmit queue of an Ethernet 
device. */
+
 typedef int (*flow_ctrl_get_t)(struct rte_eth_dev *dev,
   struct rte_eth_fc_conf *fc_conf);
 /**< @internal Get current flow control parameter on an Ethernet device */
@@ -1622,6 +1630,7 @@ struct rte_eth_rxtx_callback {
 struct rte_eth_dev {
eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
+   eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit prepare 
function. */
struct rte_eth_dev_data *data;  /**< Pointer to device data */
const struct eth_driver *driver;/**< Driver for this device */
const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
@@ -2816,6 +2825,94 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id], tx_pkts, 
nb_pkts);
 }

+/**
+ * Process a burst of output packets on a transmit queue of an Ethernet device.
+ *
+ * The rte_eth_tx_prep() function is invoked to prepare output packets to be
+ * transmitted on the output queue *queue_id* of the Ethernet device designated
+ * by its *port_