date:20161005

[dpdk-dev] [PATCH 2/2] bnx2x: fix to use SOCKET_ID_ANY for slowpath memory

2016-10-05 Thread Rasesh Mody

When the DMA allocation routine is invoked in the context of a non-EAL
thread, the API rte_lcore_id() returns -1 and indexing on that in
rte_lcore_to_socket_id() leads to segfault. The fix is to use
SOCKET_ID_ANY as the socket_id for all slowpath memory allocation.

Fixes: 540a211 ("bnx2x: driver core")

Signed-off-by: Rasesh Mody 
---
 drivers/net/bnx2x/bnx2x.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/bnx2x/bnx2x.c b/drivers/net/bnx2x/bnx2x.c
index 2bb4a84..8970334 100644
--- a/drivers/net/bnx2x/bnx2x.c
+++ b/drivers/net/bnx2x/bnx2x.c
@@ -178,7 +178,7 @@ bnx2x_dma_alloc(struct bnx2x_softc *sc, size_t size, struct 
bnx2x_dma *dma,

/* Caller must take care that strlen(mz_name) < RTE_MEMZONE_NAMESIZE */
z = rte_memzone_reserve_aligned(mz_name, (uint64_t) (size),
-   rte_lcore_to_socket_id(rte_lcore_id()),
+   SOCKET_ID_ANY,
0, align);
if (z == NULL) {
PMD_DRV_LOG(ERR, "DMA alloc failed for %s", msg);
-- 
1.7.10.3

[dpdk-dev] [PATCH 1/2] bnx2x: fix maximum PF queues

2016-10-05 Thread Rasesh Mody

Fix the max number of PF rx/tx queues. Set the value based
on BNX2X_MAX_RSS_COUNT() rather than hard coding it to 128.

Fixes: 540a211 ("bnx2x: driver core")

Signed-off-by: Rasesh Mody 
---
 drivers/net/bnx2x/bnx2x.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bnx2x/bnx2x.c b/drivers/net/bnx2x/bnx2x.c
index a49a07f..2bb4a84 100644
--- a/drivers/net/bnx2x/bnx2x.c
+++ b/drivers/net/bnx2x/bnx2x.c
@@ -9556,8 +9556,8 @@ static void bnx2x_init_rte(struct bnx2x_softc *sc)
sc->max_rx_queues = min(BNX2X_VF_MAX_QUEUES_PER_VF,
sc->igu_sb_cnt);
} else {
-   sc->max_tx_queues = 128;
-   sc->max_rx_queues = 128;
+   sc->max_rx_queues = BNX2X_MAX_RSS_COUNT(sc);
+   sc->max_tx_queues = sc->max_rx_queues;
}
 }

-- 
1.7.10.3

[dpdk-dev] [PATCH v2 2/2] i40e: Enable bad checksum flags in i40e vPMD

2016-10-05 Thread Chen, Jing D

Hi, 

> -Original Message-
> From: Shaw, Jeffrey B
> Sent: Wednesday, October 5, 2016 5:13 PM
> To: dev at dpdk.org
> Cc: Zhang, Helin ; Wu, Jingjing
> ; damarion at cisco.com; Zhang, Qi Z
> ; Chen, Jing D 
> Subject: [PATCH v2 2/2] i40e: Enable bad checksum flags in i40e vPMD
> 
> From: Damjan Marion 
> 
> Decode the checksum flags from the rx descriptor, setting the appropriate bit
> in the mbuf ol_flags field when the flag indicates a bad checksum.
> 
> Signed-off-by: Damjan Marion 
> Signed-off-by: Jeff Shaw 
Acked-by: Jing Chen 

It seems this patch also fixed a vlan flag bug, should it explain a little bit?

[dpdk-dev] [PATCH v2 1/2] i40e: Add packet_type metadata in the i40e vPMD

2016-10-05 Thread Chen, Jing D

Hi, 

> -Original Message-
> From: Shaw, Jeffrey B
> Sent: Wednesday, October 5, 2016 5:13 PM
> To: dev at dpdk.org
> Cc: Zhang, Helin ; Wu, Jingjing
> ; damarion at cisco.com; Zhang, Qi Z
> ; Chen, Jing D 
> Subject: [PATCH v2 1/2] i40e: Add packet_type metadata in the i40e vPMD
> 
> From: Damjan Marion 
> 
> The ptype is decoded from the rx descriptor and stored in the packet type
> field in the mbuf using the same function as the non-vector driver.
> 
> Signed-off-by: Damjan Marion 
> Signed-off-by: Jeff Shaw 
> Acked-by: Qi Zhang 
> ---
> 
> Changes in v2:
>  - Add missing reference to i40e_recv_scattered_pkts_vec() when
>querying supported packet types.
> 
>  drivers/net/i40e/i40e_rxtx.c | 567 
> +--
>  drivers/net/i40e/i40e_rxtx.h | 563
> ++
>  drivers/net/i40e/i40e_rxtx_vec.c |  16 ++
>  3 files changed, 582 insertions(+), 564 deletions(-)
> 
Acked-by: Jing Chen

[dpdk-dev] [PATCH v2 2/2] i40e: Enable bad checksum flags in i40e vPMD

2016-10-05 Thread Jeff Shaw

From: Damjan Marion 

Decode the checksum flags from the rx descriptor, setting
the appropriate bit in the mbuf ol_flags field when the flag
indicates a bad checksum.

Signed-off-by: Damjan Marion 
Signed-off-by: Jeff Shaw 
---
 drivers/net/i40e/i40e_rxtx_vec.c | 48 +++-
 1 file changed, 28 insertions(+), 20 deletions(-)

diff --git a/drivers/net/i40e/i40e_rxtx_vec.c b/drivers/net/i40e/i40e_rxtx_vec.c
index 6c63141..d2267ad 100644
--- a/drivers/net/i40e/i40e_rxtx_vec.c
+++ b/drivers/net/i40e/i40e_rxtx_vec.c
@@ -138,19 +138,14 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq)
 static inline void
 desc_to_olflags_v(__m128i descs[4], struct rte_mbuf **rx_pkts)
 {
-   __m128i vlan0, vlan1, rss;
-   union {
-   uint16_t e[4];
-   uint64_t dword;
-   } vol;
+   __m128i vlan0, vlan1, rss, l3_l4e;

/* mask everything except RSS, flow director and VLAN flags
 * bit2 is for VLAN tag, bit11 for flow director indication
 * bit13:12 for RSS indication.
 */
-   const __m128i rss_vlan_msk = _mm_set_epi16(
-   0x, 0x, 0x, 0x,
-   0x3804, 0x3804, 0x3804, 0x3804);
+   const __m128i rss_vlan_msk = _mm_set_epi32(
+   0x1c03004, 0x1c03004, 0x1c03004, 0x1c03004);

/* map rss and vlan type to rss hash and vlan flag */
const __m128i vlan_flags = _mm_set_epi8(0, 0, 0, 0,
@@ -163,23 +158,36 @@ desc_to_olflags_v(__m128i descs[4], struct rte_mbuf 
**rx_pkts)
PKT_RX_RSS_HASH | PKT_RX_FDIR, PKT_RX_RSS_HASH, 0, 0,
0, 0, PKT_RX_FDIR, 0);

-   vlan0 = _mm_unpackhi_epi16(descs[0], descs[1]);
-   vlan1 = _mm_unpackhi_epi16(descs[2], descs[3]);
-   vlan0 = _mm_unpacklo_epi32(vlan0, vlan1);
+   const __m128i l3_l4e_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0,
+   PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | 
PKT_RX_IP_CKSUM_BAD,
+   PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD,
+   PKT_RX_EIP_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD,
+   PKT_RX_EIP_CKSUM_BAD,
+   PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD,
+   PKT_RX_L4_CKSUM_BAD,
+   PKT_RX_IP_CKSUM_BAD,
+   0);
+
+   vlan0 = _mm_unpackhi_epi32(descs[0], descs[1]);
+   vlan1 = _mm_unpackhi_epi32(descs[2], descs[3]);
+   vlan0 = _mm_unpacklo_epi64(vlan0, vlan1);

vlan1 = _mm_and_si128(vlan0, rss_vlan_msk);
vlan0 = _mm_shuffle_epi8(vlan_flags, vlan1);

-   rss = _mm_srli_epi16(vlan1, 11);
+   rss = _mm_srli_epi32(vlan1, 12);
rss = _mm_shuffle_epi8(rss_flags, rss);

+   l3_l4e = _mm_srli_epi32(vlan1, 22);
+   l3_l4e = _mm_shuffle_epi8(l3_l4e_flags, l3_l4e);
+
vlan0 = _mm_or_si128(vlan0, rss);
-   vol.dword = _mm_cvtsi128_si64(vlan0);
+   vlan0 = _mm_or_si128(vlan0, l3_l4e);

-   rx_pkts[0]->ol_flags = vol.e[0];
-   rx_pkts[1]->ol_flags = vol.e[1];
-   rx_pkts[2]->ol_flags = vol.e[2];
-   rx_pkts[3]->ol_flags = vol.e[3];
+   rx_pkts[0]->ol_flags = _mm_extract_epi16(vlan0, 0);
+   rx_pkts[1]->ol_flags = _mm_extract_epi16(vlan0, 2);
+   rx_pkts[2]->ol_flags = _mm_extract_epi16(vlan0, 4);
+   rx_pkts[3]->ol_flags = _mm_extract_epi16(vlan0, 6);
 }
 #else
 #define desc_to_olflags_v(desc, rx_pkts) do {} while (0)
@@ -754,7 +762,8 @@ i40e_rx_vec_dev_conf_condition_check(struct rte_eth_dev 
*dev)
 #ifndef RTE_LIBRTE_I40E_RX_OLFLAGS_ENABLE
/* whithout rx ol_flags, no VP flag report */
if (rxmode->hw_vlan_strip != 0 ||
-   rxmode->hw_vlan_extend != 0)
+   rxmode->hw_vlan_extend != 0 ||
+   rxmode->hw_ip_checksum != 0)
return -1;
 #endif

@@ -765,8 +774,7 @@ i40e_rx_vec_dev_conf_condition_check(struct rte_eth_dev 
*dev)
 /* - no csum error report support
 * - no header split support
 */
-   if (rxmode->hw_ip_checksum == 1 ||
-   rxmode->header_split == 1)
+   if (rxmode->header_split == 1)
return -1;

return 0;
-- 
2.1.0

[dpdk-dev] [PATCH v2 1/2] i40e: Add packet_type metadata in the i40e vPMD

2016-10-05 Thread Jeff Shaw

From: Damjan Marion 

The ptype is decoded from the rx descriptor and stored
in the packet type field in the mbuf using the same function
as the non-vector driver.

Signed-off-by: Damjan Marion 
Signed-off-by: Jeff Shaw 
Acked-by: Qi Zhang 
---

Changes in v2:
 - Add missing reference to i40e_recv_scattered_pkts_vec() when
   querying supported packet types.

 drivers/net/i40e/i40e_rxtx.c | 567 +--
 drivers/net/i40e/i40e_rxtx.h | 563 ++
 drivers/net/i40e/i40e_rxtx_vec.c |  16 ++
 3 files changed, 582 insertions(+), 564 deletions(-)

diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 554d167..7433480 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -174,569 +174,6 @@ i40e_get_iee15888_flags(struct rte_mbuf *mb, uint64_t 
qword)
 }
 #endif

-/* For each value it means, datasheet of hardware can tell more details
- *
- * @note: fix i40e_dev_supported_ptypes_get() if any change here.
- */
-static inline uint32_t
-i40e_rxd_pkt_type_mapping(uint8_t ptype)
-{
-   static const uint32_t type_table[UINT8_MAX + 1] __rte_cache_aligned = {
-   /* L2 types */
-   /* [0] reserved */
-   [1] = RTE_PTYPE_L2_ETHER,
-   [2] = RTE_PTYPE_L2_ETHER_TIMESYNC,
-   /* [3] - [5] reserved */
-   [6] = RTE_PTYPE_L2_ETHER_LLDP,
-   /* [7] - [10] reserved */
-   [11] = RTE_PTYPE_L2_ETHER_ARP,
-   /* [12] - [21] reserved */
-
-   /* Non tunneled IPv4 */
-   [22] = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
-   RTE_PTYPE_L4_FRAG,
-   [23] = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
-   RTE_PTYPE_L4_NONFRAG,
-   [24] = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
-   RTE_PTYPE_L4_UDP,
-   /* [25] reserved */
-   [26] = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
-   RTE_PTYPE_L4_TCP,
-   [27] = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
-   RTE_PTYPE_L4_SCTP,
-   [28] = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
-   RTE_PTYPE_L4_ICMP,
-
-   /* IPv4 --> IPv4 */
-   [29] = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
-   RTE_PTYPE_TUNNEL_IP |
-   RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
-   RTE_PTYPE_INNER_L4_FRAG,
-   [30] = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
-   RTE_PTYPE_TUNNEL_IP |
-   RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
-   RTE_PTYPE_INNER_L4_NONFRAG,
-   [31] = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
-   RTE_PTYPE_TUNNEL_IP |
-   RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
-   RTE_PTYPE_INNER_L4_UDP,
-   /* [32] reserved */
-   [33] = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
-   RTE_PTYPE_TUNNEL_IP |
-   RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
-   RTE_PTYPE_INNER_L4_TCP,
-   [34] = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
-   RTE_PTYPE_TUNNEL_IP |
-   RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
-   RTE_PTYPE_INNER_L4_SCTP,
-   [35] = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
-   RTE_PTYPE_TUNNEL_IP |
-   RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
-   RTE_PTYPE_INNER_L4_ICMP,
-
-   /* IPv4 --> IPv6 */
-   [36] = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
-   RTE_PTYPE_TUNNEL_IP |
-   RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
-   RTE_PTYPE_INNER_L4_FRAG,
-   [37] = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
-   RTE_PTYPE_TUNNEL_IP |
-   RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
-   RTE_PTYPE_INNER_L4_NONFRAG,
-   [38] = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
-   RTE_PTYPE_TUNNEL_IP |
-   RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
-   RTE_PTYPE_INNER_L4_UDP,
-   /* [39] reserved */
-   [40] = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
-   RTE_PTYPE_TUNNEL_IP |
-   RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
-   RTE_PTYPE_INNER_L4_TCP,
-   [41] = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
-   RTE_PTYPE_TUNNEL_IP |
-

[dpdk-dev] [PATCH v2 2/2] i40e: Enable bad checksum flags in i40e vPMD

2016-10-05 Thread Jeff Shaw

On Wed, Oct 05, 2016 at 04:57:28PM -0700, Chen, Jing D wrote:
> Hi,
> 
> > -Original Message-
> > From: Shaw, Jeffrey B
> > Sent: Wednesday, October 5, 2016 5:13 PM
> > To: dev at dpdk.org
> > Cc: Zhang, Helin ; Wu, Jingjing
> > ; damarion at cisco.com; Zhang, Qi Z
> > ; Chen, Jing D 
> > Subject: [PATCH v2 2/2] i40e: Enable bad checksum flags in i40e vPMD
> > 
> > From: Damjan Marion 
> > 
> > Decode the checksum flags from the rx descriptor, setting the appropriate 
> > bit
> > in the mbuf ol_flags field when the flag indicates a bad checksum.
> > 
> > Signed-off-by: Damjan Marion 
> > Signed-off-by: Jeff Shaw 
> > ---
> >  drivers/net/i40e/i40e_rxtx_vec.c | 48 
> > +++---
> > --
> >  1 file changed, 28 insertions(+), 20 deletions(-)
> > 
> > diff --git a/drivers/net/i40e/i40e_rxtx_vec.c
> > b/drivers/net/i40e/i40e_rxtx_vec.c
> > index 6c63141..d2267ad 100644
> > --- a/drivers/net/i40e/i40e_rxtx_vec.c
> > +++ b/drivers/net/i40e/i40e_rxtx_vec.c
> > @@ -138,19 +138,14 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq)  static
> > inline void  desc_to_olflags_v(__m128i descs[4], struct rte_mbuf **rx_pkts) 
> >  {
> > -   __m128i vlan0, vlan1, rss;
> > -   union {
> > -   uint16_t e[4];
> > -   uint64_t dword;
> > -   } vol;
> > +   __m128i vlan0, vlan1, rss, l3_l4e;
> > 
> > /* mask everything except RSS, flow director and VLAN flags
> >  * bit2 is for VLAN tag, bit11 for flow director indication
> >  * bit13:12 for RSS indication.
> >  */
> > -   const __m128i rss_vlan_msk = _mm_set_epi16(
> > -   0x, 0x, 0x, 0x,
> > -   0x3804, 0x3804, 0x3804, 0x3804);
> > +   const __m128i rss_vlan_msk = _mm_set_epi32(
> > +   0x1c03004, 0x1c03004, 0x1c03004, 0x1c03004);

Mask is wrong here. Should be 0x1c03804, ..., etc.

> > 
> > /* map rss and vlan type to rss hash and vlan flag */
> > const __m128i vlan_flags = _mm_set_epi8(0, 0, 0, 0, @@ -163,23
> > +158,36 @@ desc_to_olflags_v(__m128i descs[4], struct rte_mbuf **rx_pkts)
> > PKT_RX_RSS_HASH | PKT_RX_FDIR,
> > PKT_RX_RSS_HASH, 0, 0,
> > 0, 0, PKT_RX_FDIR, 0);
> > 
> > -   vlan0 = _mm_unpackhi_epi16(descs[0], descs[1]);
> > -   vlan1 = _mm_unpackhi_epi16(descs[2], descs[3]);
> > -   vlan0 = _mm_unpacklo_epi32(vlan0, vlan1);
> > +   const __m128i l3_l4e_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0,
> > +   PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD
> > | PKT_RX_IP_CKSUM_BAD,
> > +   PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD,
> > +   PKT_RX_EIP_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD,
> > +   PKT_RX_EIP_CKSUM_BAD,
> > +   PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD,
> > +   PKT_RX_L4_CKSUM_BAD,
> > +   PKT_RX_IP_CKSUM_BAD,
> > +   0);
> > +
> > +   vlan0 = _mm_unpackhi_epi32(descs[0], descs[1]);
> > +   vlan1 = _mm_unpackhi_epi32(descs[2], descs[3]);
> > +   vlan0 = _mm_unpacklo_epi64(vlan0, vlan1);
> > 
> > vlan1 = _mm_and_si128(vlan0, rss_vlan_msk);
> > vlan0 = _mm_shuffle_epi8(vlan_flags, vlan1);
> > 
> > -   rss = _mm_srli_epi16(vlan1, 11);
> > +   rss = _mm_srli_epi32(vlan1, 12);
> > rss = _mm_shuffle_epi8(rss_flags, rss);
> 
> My bad. Original code will use bit[13:11] to identify RSS and FDIR flag. Now 
> It masked bit 11 out when creating " rss_vlan_msk" and doing shift above,
> while it still try to use  original "rss_flags"?

Good catch.  I have no idea how you spotted that, and you're right, we should
be shifting by 11, not 12. Also the mask needs to be updated (as you
mentioned to me offline) which I noted above.

Damjan, unless you object I'll send a v3 with an updated rss_vlan_msk and
the 11 bit shift so we also get the Flow Director Filter Match (FLM)
indication.

>

[dpdk-dev] [PATCH v4 1/2] librte_ether: add internal callback functions

2016-10-05 Thread Thomas Monjalon

2016-10-05 17:04, Iremonger, Bernard:
> > > --- a/lib/librte_ether/rte_ethdev.c
> > > +++ b/lib/librte_ether/rte_ethdev.c
> > > @@ -2510,6 +2510,20 @@ void
> > >  _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
> > >   enum rte_eth_event_type event)
> > >  {
> > > + return _rte_eth_dev_callback_process_generic(dev, event, NULL); }
> > > +
> > > +void
> > > +_rte_eth_dev_callback_process_vf(struct rte_eth_dev *dev,
> > > + enum rte_eth_event_type event, void *param) {
> > > + return _rte_eth_dev_callback_process_generic(dev, event, param);
> > }
> > 
> > This function is just adding a parameter, compared to the legacy
> > _rte_eth_dev_callback_process.
> > Why calling it process_vf?
> 
> The parameter is just being added for the VF event, the handling of the other 
> events is unchanged.
> 
> > And by the way, why not just replacing the legacy function?
> > As it is a driver interface, there is no ABI restriction.
> 
> I thought there would be an ABI issue if the legacy function is replaced.
> The _rte_eth_dev_callback_process is exported in DPDK 2.2 and used in the 
> following PMD's, lib and app:
> 
> app/test/virtual_pmd
> drivers/net/e1000
> drivers/net/ixgbe
> drivers/net/mlx5
> drivers/net/vhost
> drivers/net/virtio
> lib/librte_ether
> 
>  Adding a parameter to _rte_eth_dev_callback_process()  will impact all of 
> the above.
> Will this cause an ABI issue?

No because ABI is for applications (Application Binary Interface).
Here you are just changing the driver interface. And we have no commitment
to maintain the compatibility of this interface for external drivers.

> > > --- a/lib/librte_ether/rte_ethdev.h
> > > +++ b/lib/librte_ether/rte_ethdev.h
> > > @@ -3026,6 +3026,7 @@ enum rte_eth_event_type {
> > >   /**< queue state event (enabled/disabled)
> > */
> > >   RTE_ETH_EVENT_INTR_RESET,
> > >   /**< reset interrupt event, sent to VF on PF reset */
> > > + RTE_ETH_EVENT_VF_MBOX,  /**< PF mailbox processing callback */
> > >   RTE_ETH_EVENT_MAX   /**< max value of this enum */
> > >  };
> > 
> > Either we choose to have a "generic" VF event well documented, or it is just
> > a specific event with a tip on where to find the doc.
> > Here we need at least to know how to handle the argument.
> 
> It is a specific event for VF to PF messages, details on the function and 
> arguments are in the rte_ethdev.h file.

No I think it is only explained in the ixgbe code.

[dpdk-dev] [PATCH 1/1] eal: Don't fail secondary if primary is missing tailqs

2016-10-05 Thread Thomas Monjalon

2016-10-05 09:49, Jean Tourrilhes:
> On Wed, Oct 05, 2016 at 09:58:01AM +0200, David Marchand wrote:
> > I thought you had unaligned binaries.
> > You are compiling only one binary ?
> 
>   Primary is compiled using the DPDK build process.
>   Secondary is build using the Snort build process.
>   Both are pointing to the exact same libdpdk.a.

Probably that you would have some aligned builds if Snort was using
a pkg-config approach to link DPDK.
I cannot commit but I would like to generate some pkg-config files
in the DPDK build system to ease linking from external applications.

> > I am not sure Sergio is talking about the constructor approach.
> 
>   But, this is exactly the cause of the problem.
> 
> > Anyway, the constructors invocation order should not matter.
> 
>   For tailq, I agree. For mempool constructors, order do matter.

I don't know why such a complex function (rte_mempool_register_ops) is
called inside a constructor. Maybe that's the main problem.

[dpdk-dev] [PATCH v2 04/22] qede/base: update base driver

2016-10-05 Thread Thomas Monjalon

2016-09-30 18:40, Mody, Rasesh:
> > From: Ferruh Yigit [mailto:ferruh.yigit at intel.com]
> > Thank you for the update, base driver patch update now reduced from
> > "14653 insertions(+), 8536 deletions(-)" to
> > "10857 insertions(+), 4853 deletions(-)"
> > 
> > But this is still to big for reviewing, specially there are some low 
> > hanging fruits
> > for cleanup, like big chunk of comment updates or whitespace updates or
> > non base driver codes in the patch.
> > 
> > If the expectation is that somebody non maintainer review the code,
> > understand it and highlight any possible defects, I believe this patch is 
> > too big
> > and needs to be split more into logical pieces, but since this is a driver 
> > code
> > and a little special, and it may not be possible to completely understand 
> > the
> > code without knowing underlying hardware, I am not sure how to proceed
> > and adding Bruce and Thomas to cc for guidance.

As you said it makes contribution really hard.
So it is missing the point of sharing its source code in Open Source.

> This 8.10.x.x base driver is a common code shared by multiple drivers. It has 
> gone through extensive testing.
> We have split the base driver patch into smaller logical patches. It would be 
> difficult to split this patch further.

Please check how Intel updates its base drivers and try harder.
We all know it requires some time, but it provides a valuable knowledge base.
Thanks

[dpdk-dev] [PATCH v4 1/2] librte_ether: add internal callback functions

2016-10-05 Thread Thomas Monjalon

2016-10-04 15:52, Bernard Iremonger:
> add _rte_eth_dev_callback_process_vf function.
> add _rte_eth_dev_callback_process_generic function
> 
> Adding a callback to the user application on VF to PF mailbox message,
> allows passing information to the application controlling the PF
> when a VF mailbox event message is received, such as VF reset.

I have some difficulties to parse this explanation.
Please could you reword it and precise the direction of the message
and the use case context?

> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -2510,6 +2510,20 @@ void
>  _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
>   enum rte_eth_event_type event)
>  {
> + return _rte_eth_dev_callback_process_generic(dev, event, NULL);
> +}
> +
> +void
> +_rte_eth_dev_callback_process_vf(struct rte_eth_dev *dev,
> + enum rte_eth_event_type event, void *param)
> +{
> + return _rte_eth_dev_callback_process_generic(dev, event, param);
> +}

This function is just adding a parameter, compared to the legacy
_rte_eth_dev_callback_process.
Why calling it process_vf?
And by the way, why not just replacing the legacy function?
As it is a driver interface, there is no ABI restriction.

> +
> +void
> +_rte_eth_dev_callback_process_generic(struct rte_eth_dev *dev,
> + enum rte_eth_event_type event, void *param)
> +{
[...]
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -3026,6 +3026,7 @@ enum rte_eth_event_type {
>   /**< queue state event (enabled/disabled) */
>   RTE_ETH_EVENT_INTR_RESET,
>   /**< reset interrupt event, sent to VF on PF reset */
> + RTE_ETH_EVENT_VF_MBOX,  /**< PF mailbox processing callback */
>   RTE_ETH_EVENT_MAX   /**< max value of this enum */
>  };

Either we choose to have a "generic" VF event well documented,
or it is just a specific event with a tip on where to find the doc.
Here we need at least to know how to handle the argument.

> +/**
> + * @internal Executes all the user application registered callbacks. Used by:
> + * _rte_eth_dev_callback_process and _rte_eth_dev_callback_process_vf
> + * It is for DPDK internal user only. User application should not call it
> + * directly.
> + *
> + * @param dev
> + *  Pointer to struct rte_eth_dev.
> + * @param event
> + *  Eth device interrupt event type.
> + *
> + * @param param
> + *  parameters to pass back to user application.
> + *
> + * @return
> + *  void
> + */
> +void
> +_rte_eth_dev_callback_process_generic(struct rte_eth_dev *dev,
> + enum rte_eth_event_type event, void *param);

This is really an internal function and should not be exported at all.

[dpdk-dev] [PATCH v11 00/24] Introducing rte_driver/rte_device generalization

2016-10-05 Thread Shreyansh Jain

Hi Thomas,

On Tuesday 04 October 2016 01:12 PM, Thomas Monjalon wrote:
> 2016-10-04 12:21, Shreyansh Jain:
>> Hi Thomas,
>>
>> On Monday 03 October 2016 07:58 PM, Thomas Monjalon wrote:
>>> Applied, thanks everybody for the great (re)work!
>>
>> Thanks!
>>
[...]
[...]
>>
>> It can be merged with changes for:
>>   - drv_name
>>   - EAL_ before _REGISTER_ macros
>>   - eth_driver => rte_driver naming
>
> Good.
> Could you make it this week, please?
>

Certainly. At least some of those I can send within this week :)


-
Shreyansh

[dpdk-dev] [PATCH] examples/l3fwd: em path hash offload to machine

2016-10-05 Thread Jerin Jacob

On Tue, Aug 23, 2016 at 08:24:39PM +0530, Hemant Agrawal wrote:

Maybe you can change the subject line to:
examples/l3fwd: em: use hw accelerated crc hash function for arm64
instead of:
examples/l3fwd: em path hash offload to machine

> if machine level CRC extension are available, offload the
> hash to machine provided functions e.g. armv8-a CRC extensions
> support it
> 
> Signed-off-by: Hemant Agrawal 
> ---
>  examples/l3fwd/l3fwd_em.c | 20 ++--
>  1 file changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/examples/l3fwd/l3fwd_em.c b/examples/l3fwd/l3fwd_em.c
> index def5a02..a889c67 100644
> --- a/examples/l3fwd/l3fwd_em.c
> +++ b/examples/l3fwd/l3fwd_em.c
> @@ -58,13 +58,13 @@
>  
>  #include "l3fwd.h"
>  
> -#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> +#if defined(RTE_MACHINE_CPUFLAG_SSE4_2) || defined(RTE_MACHINE_CPUFLAG_CRC32)

Rather than adding new compilation flag everywhere, Maybe you can add

#if defined(RTE_MACHINE_CPUFLAG_SSE4_2) &&
defined(RTE_MACHINE_CPUFLAG_CRC32)
#define EM_HASH_CRC 1
#endif

something like above to reduce the change for future platforms with crc
support.

Other than that, you can add:
Reviewed-by: Jerin Jacob 

>  #include 
>  #define DEFAULT_HASH_FUNC   rte_hash_crc
>  #else
>  #include 
>  #define DEFAULT_HASH_FUNC   rte_jhash
> -#endif /* RTE_MACHINE_CPUFLAG_SSE4_2 */
> +#endif
>  
>  #define IPV6_ADDR_LEN 16
>  
> @@ -169,17 +169,17 @@ ipv4_hash_crc(const void *data, __rte_unused uint32_t 
> data_len,
>   t = k->proto;
>   p = (const uint32_t *)>port_src;
>  
> -#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> +#if defined(RTE_MACHINE_CPUFLAG_SSE4_2) || defined(RTE_MACHINE_CPUFLAG_CRC32)
>   init_val = rte_hash_crc_4byte(t, init_val);
>   init_val = rte_hash_crc_4byte(k->ip_src, init_val);
>   init_val = rte_hash_crc_4byte(k->ip_dst, init_val);
>   init_val = rte_hash_crc_4byte(*p, init_val);
> -#else /* RTE_MACHINE_CPUFLAG_SSE4_2 */
> +#else
>   init_val = rte_jhash_1word(t, init_val);
>   init_val = rte_jhash_1word(k->ip_src, init_val);
>   init_val = rte_jhash_1word(k->ip_dst, init_val);
>   init_val = rte_jhash_1word(*p, init_val);
> -#endif /* RTE_MACHINE_CPUFLAG_SSE4_2 */
> +#endif
>  
>   return init_val;
>  }
> @@ -191,16 +191,16 @@ ipv6_hash_crc(const void *data, __rte_unused uint32_t 
> data_len,
>   const union ipv6_5tuple_host *k;
>   uint32_t t;
>   const uint32_t *p;
> -#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> +#if defined(RTE_MACHINE_CPUFLAG_SSE4_2) || defined(RTE_MACHINE_CPUFLAG_CRC32)
>   const uint32_t  *ip_src0, *ip_src1, *ip_src2, *ip_src3;
>   const uint32_t  *ip_dst0, *ip_dst1, *ip_dst2, *ip_dst3;
> -#endif /* RTE_MACHINE_CPUFLAG_SSE4_2 */
> +#endif
>  
>   k = data;
>   t = k->proto;
>   p = (const uint32_t *)>port_src;
>  
> -#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> +#if defined(RTE_MACHINE_CPUFLAG_SSE4_2) || defined(RTE_MACHINE_CPUFLAG_CRC32)
>   ip_src0 = (const uint32_t *) k->ip_src;
>   ip_src1 = (const uint32_t *)(k->ip_src+4);
>   ip_src2 = (const uint32_t *)(k->ip_src+8);
> @@ -219,14 +219,14 @@ ipv6_hash_crc(const void *data, __rte_unused uint32_t 
> data_len,
>   init_val = rte_hash_crc_4byte(*ip_dst2, init_val);
>   init_val = rte_hash_crc_4byte(*ip_dst3, init_val);
>   init_val = rte_hash_crc_4byte(*p, init_val);
> -#else /* RTE_MACHINE_CPUFLAG_SSE4_2 */
> +#else
>   init_val = rte_jhash_1word(t, init_val);
>   init_val = rte_jhash(k->ip_src,
>   sizeof(uint8_t) * IPV6_ADDR_LEN, init_val);
>   init_val = rte_jhash(k->ip_dst,
>   sizeof(uint8_t) * IPV6_ADDR_LEN, init_val);
>   init_val = rte_jhash_1word(*p, init_val);
> -#endif /* RTE_MACHINE_CPUFLAG_SSE4_2 */
> +#endif
>   return init_val;
>  }
>  
> -- 
> 1.9.1
>

[dpdk-dev] [PATCH v4 1/2] librte_ether: add internal callback functions

2016-10-05 Thread Iremonger, Bernard

Hi Thomas,



> Subject: Re: [dpdk-dev] [PATCH v4 1/2] librte_ether: add internal callback
> functions
> 
> 2016-10-04 15:52, Bernard Iremonger:
> > add _rte_eth_dev_callback_process_vf function.
> > add _rte_eth_dev_callback_process_generic function
> >
> > Adding a callback to the user application on VF to PF mailbox message,
> > allows passing information to the application controlling the PF when
> > a VF mailbox event message is received, such as VF reset.
> 
> I have some difficulties to parse this explanation.
> Please could you reword it and precise the direction of the message and the
> use case context?

I will reword the explanation and add use case context.

> > --- a/lib/librte_ether/rte_ethdev.c
> > +++ b/lib/librte_ether/rte_ethdev.c
> > @@ -2510,6 +2510,20 @@ void
> >  _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
> > enum rte_eth_event_type event)
> >  {
> > +   return _rte_eth_dev_callback_process_generic(dev, event, NULL); }
> > +
> > +void
> > +_rte_eth_dev_callback_process_vf(struct rte_eth_dev *dev,
> > +   enum rte_eth_event_type event, void *param) {
> > +   return _rte_eth_dev_callback_process_generic(dev, event, param);
> }
> 
> This function is just adding a parameter, compared to the legacy
> _rte_eth_dev_callback_process.
> Why calling it process_vf?

The parameter is just being added for the VF event, the handling of the other 
events is unchanged.

> And by the way, why not just replacing the legacy function?
> As it is a driver interface, there is no ABI restriction.

I thought there would be an ABI issue if the legacy function is replaced.
The _rte_eth_dev_callback_process is exported in DPDK 2.2 and used in the 
following PMD's, lib and app:

app/test/virtual_pmd
drivers/net/e1000
drivers/net/ixgbe
drivers/net/mlx5
drivers/net/vhost
drivers/net/virtio
lib/librte_ether

 Adding a parameter to _rte_eth_dev_callback_process()  will impact all of the 
above.
Will this cause an ABI issue?

> > +
> > +void
> > +_rte_eth_dev_callback_process_generic(struct rte_eth_dev *dev,
> > +   enum rte_eth_event_type event, void *param) {
> [...]
> > --- a/lib/librte_ether/rte_ethdev.h
> > +++ b/lib/librte_ether/rte_ethdev.h
> > @@ -3026,6 +3026,7 @@ enum rte_eth_event_type {
> > /**< queue state event (enabled/disabled)
> */
> > RTE_ETH_EVENT_INTR_RESET,
> > /**< reset interrupt event, sent to VF on PF reset */
> > +   RTE_ETH_EVENT_VF_MBOX,  /**< PF mailbox processing callback */
> > RTE_ETH_EVENT_MAX   /**< max value of this enum */
> >  };
> 
> Either we choose to have a "generic" VF event well documented, or it is just
> a specific event with a tip on where to find the doc.
> Here we need at least to know how to handle the argument.

It is a specific event for VF to PF messages, details on the function and 
arguments are in the rte_ethdev.h file.

> > +/**
> > + * @internal Executes all the user application registered callbacks. Used
> by:
> > + * _rte_eth_dev_callback_process and
> _rte_eth_dev_callback_process_vf
> > + * It is for DPDK internal user only. User application should not
> > +call it
> > + * directly.
> > + *
> > + * @param dev
> > + *  Pointer to struct rte_eth_dev.
> > + * @param event
> > + *  Eth device interrupt event type.
> > + *
> > + * @param param
> > + *  parameters to pass back to user application.
> > + *
> > + * @return
> > + *  void
> > + */
> > +void
> > +_rte_eth_dev_callback_process_generic(struct rte_eth_dev *dev,
> > +   enum rte_eth_event_type event, void
> *param);
> 
> This is really an internal function and should not be exported at all.

Both new functions are internal I  will make them static and remove them from 
the map file.
When the functions are made static, should the function declarations be moved 
from rte_ethdev.h to rte_ethdev.c ?

Thanks for the review.

Regards,

Bernard.

[dpdk-dev] [PATCH v2 1/8] mbuf: add function to dump ol flag list

2016-10-05 Thread Olivier Matz

Hi Pablo,

On 10/05/2016 08:45 AM, De Lara Guarch, Pablo wrote:
> Hi Olivier,
> 
>> -Original Message-
>> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
>> index fb2b962..56f37e6 100644
>> --- a/lib/librte_mbuf/rte_mbuf.c
>> +++ b/lib/librte_mbuf/rte_mbuf.c
>> @@ -319,6 +319,53 @@ const char *rte_get_rx_ol_flag_name(uint64_t mask)
>>  }
>>  }
>>
>> +struct flag_mask {
>> +uint64_t flag;
>> +uint64_t mask;
>> +const char *default_name;
>> +};
>> +
>> +/* write the list of rx ol flags in buffer buf */
>> +int rte_get_rx_ol_flag_list(uint64_t mask, char *buf, size_t buflen)
> 
> Return type should go in a separate line (same case in the other function).

I'll fix that too, thanks for reviewing.

Olivier

[dpdk-dev] [PATCH v2 4/8] app/testpmd: add option to enable lro

2016-10-05 Thread Olivier Matz

Hi Pablo,

On 10/05/2016 08:26 AM, De Lara Guarch, Pablo wrote:
> Hi Olivier,
> 
>> -Original Message-
>> From: Olivier Matz [mailto:olivier.matz at 6wind.com]
>> Sent: Friday, September 09, 2016 12:56 AM
>> To: dev at dpdk.org; De Lara Guarch, Pablo
>> Subject: [PATCH v2 4/8] app/testpmd: add option to enable lro
>>
>> Introduce a new argument '--enable-lro' to ask testpmd to enable the LRO
>> feature on enabled ports, like it's done for '--enable-rx-cksum' for
>> instance.
>>
>> Signed-off-by: Olivier Matz 
>> ---
>>  app/test-pmd/parameters.c | 4 
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
>> index 8792c2c..612ad37 100644
>> --- a/app/test-pmd/parameters.c
>> +++ b/app/test-pmd/parameters.c
>> @@ -150,6 +150,7 @@ usage(char* progname)
>> "If the drop-queue doesn't exist, the packet is dropped. "
>> "By default drop-queue=127.\n");
>>  printf("  --crc-strip: enable CRC stripping by hardware.\n");
>> +printf("  --enable-lro: enable large receive offload.\n");
>>  printf("  --enable-rx-cksum: enable rx hardware checksum
>> offload.\n");
>>  printf("  --disable-hw-vlan: disable hardware vlan.\n");
>>  printf("  --disable-hw-vlan-filter: disable hardware vlan filter.\n");
>> @@ -525,6 +526,7 @@ launch_args_parse(int argc, char** argv)
>>  { "pkt-filter-size",1, 0, 0 },
>>  { "pkt-filter-drop-queue",  1, 0, 0 },
>>  { "crc-strip",  0, 0, 0 },
>> +{ "enable-lro", 0, 0, 0 },
>>  { "enable-rx-cksum",0, 0, 0 },
>>  { "enable-scatter", 0, 0, 0 },
>>  { "disable-hw-vlan",0, 0, 0 },
>> @@ -765,6 +767,8 @@ launch_args_parse(int argc, char** argv)
>>  }
>>  if (!strcmp(lgopts[opt_idx].name, "crc-strip"))
>>  rx_mode.hw_strip_crc = 1;
>> +if (!strcmp(lgopts[opt_idx].name, "enable-lro"))
>> +rx_mode.enable_lro = 1;
>>  if (!strcmp(lgopts[opt_idx].name, "enable-scatter"))
>>  rx_mode.enable_scatter = 1;
>>  if (!strcmp(lgopts[opt_idx].name, "enable-rx-cksum"))
>> --
>> 2.8.1
> 
> Could you add this new parameter in the testpmd documentation?

Yes, I'll do it.


Thanks,
Olivier

[dpdk-dev] Proposal: enable redirection of DPDK logs from the user app

2016-10-05 Thread Olivier Matz

Hi Francesco,

On 10/05/2016 03:26 PM, Montorsi, Francesco wrote:
> Hi Olivier,
> 
>> On 10/04/2016 02:28 PM, Montorsi, Francesco wrote:
>>> Yes, but to be honest, that seems a troublesome solution for something
>>> as easy as logging a string; e.g. by using fopencookie() approach, you
>>> don't have the concept of "log message", you just provide a function
>>> that must write a block of bytes somewhere.
>>> Typically instead, you need to know where a log message starts and
>>> ends, to e.g., add prefixes/postfixes to it.
>>
>> I'm not sure that true if you call setbuf(log_stream, NULL).
>>
>> In that case, it looks easy to prefix / postfix messages with a fopencookie
>> callback like:
>>
>> /* example on stdout */
>> ssize_t
>> simple_write(void *c, const char *buf, size_t size) {
>>  ssize_t ret1, ret2, ret3;
>>
>>  ret1 = fwrite("<", 1, 1, stdout);
>>  if (ret1 == 0)
>>  return 0;
>>  ret2 = fwrite(buf, size, 1, stdout);
>>  if (ret2 == 0)
>>  return 0;
>>  ret3 = fwrite(">", 1, 1, stdout);
>>  if (ret3 == 0)
>>  return 0;
>>  return ret1 + ret2 + ret3;
>> }
>>
> I didn't know about setbuf()... but are we sure that in this way the 
> simple_write() function will always receive a full string? I mean: in the 
> manpage for setbuf() it says:
> 
> "... When the first I/O operation occurs on a file, malloc(3) is called, and 
> a buffer is obtained.  If the argument buf is NULL, only the mode is 
> affected; a new buffer will be allocated on the next read or write operation."
> 
> But: is it true that 1 write operation corresponds to 1 vfprintf() call? 
> Maybe if you have a "long" a single vfprintf() call may translate to several 
> simple_write() calls... I don't know honestly.

I did a quick test with a fixed version of simple_write():

ssize_t
simple_write(void *c, const char *buf, size_t size)
{
ssize_t ret1 = -42, ret3 = -42;
ssize_t ret = 0;

ret1 = fwrite("<", 1, 1, stdout);
if (ret1 == 0)
goto ret;
ret = fwrite(buf, 1, size, stdout);
if (ret != size)
goto ret;
ret3 = fwrite(">", 1, 1, stdout);
if (ret3 == 0)
goto ret;

ret:
/* printf("ret=%d ret1=%d ret3=%d\n", (int)ret, (int)ret1, (int)ret3); 
*/
return ret;
}


It looks like transmitting a string bigger than BUFSIZ (8192) induces
several calls to simple_write(). For smaller calls, it seems there is no
split (1 printf = 1 simple_write).

Of course, this is a just test and not a proof :)
I think we would have a similar issue with the other approach.

For me, the current API looks ok, however let's see the opinion of the
maintainer that could be different of mine. In any case, thank you for
proposing enhancements.

Regards,
Olivier

[dpdk-dev] [PATCH] ip_frag: fix missing dependency on librte_hash

2016-10-05 Thread Thomas Monjalon

2016-10-05 15:14, Panu Matilainen:
> Not sure what exactly changed and where, but I've started getting
> build failures on Fedora rawhide i386:
> lib/librte_ip_frag/ip_frag_internal.c:36:23: fatal error:
>   rte_jhash.h: No such file or directory
>  #include 
>^
> Looking at librte_ip_frag, it clearly depends on librte_hash so
> its probably more a question of something commonly masking the issue.
> 
> Signed-off-by: Panu Matilainen 

Applied, thanks

[dpdk-dev] [PATCH 2/2] kni: remove unnecessary ethtool files

2016-10-05 Thread Thomas Monjalon

2016-10-05 09:58, Remy Horton:
> 
> On 30/09/2016 11:10, Ferruh Yigit wrote:
> > Signed-off-by: Ferruh Yigit 
> > ---
> >  lib/librte_eal/linuxapp/kni/Makefile   |   3 -
> >  .../linuxapp/kni/ethtool/igb/igb_debugfs.c |  28 --
> >  .../linuxapp/kni/ethtool/igb/igb_hwmon.c   | 260 ---
> >  .../linuxapp/kni/ethtool/igb/igb_procfs.c  | 363 
> > -
> >  4 files changed, 654 deletions(-)
> >  delete mode 100644 lib/librte_eal/linuxapp/kni/ethtool/igb/igb_debugfs.c
> >  delete mode 100644 lib/librte_eal/linuxapp/kni/ethtool/igb/igb_hwmon.c
> >  delete mode 100644 lib/librte_eal/linuxapp/kni/ethtool/igb/igb_procfs.c
> 
> Acked-by: Remy Horton 

Applied, thanks

[dpdk-dev] [PATCH v2 09/12] virtio: add Rx checksum offload support

2016-10-05 Thread Olivier Matz



On 10/05/2016 03:27 PM, Maxime Coquelin wrote:
>> @@ -903,7 +905,8 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf
>> **rx_pkts, uint16_t nb_pkts)
>> rte_vlan_strip(rxm);
>>
>> /* Update offload features */
>> -   if (virtio_rx_offload(rxm, hdr) < 0) {
>> +   if ((features & VIRTIO_NET_F_GUEST_CSUM) &&
> s/VIRTIO_NET_F_GUEST_CSUM/(1u << VIRTIO_NET_F_GUEST_CSUM)/

oooh good catch :)

> And don't forget to update the test for LRO patch.

yep

> Except this, it sounds good.

Thanks, I'll send a v3 soon.

Olivier

[dpdk-dev] [PATCH v2 09/12] virtio: add Rx checksum offload support

2016-10-05 Thread Maxime Coquelin

Hi Olivier,

On 10/05/2016 01:56 PM, Olivier Matz wrote:
> Hi Maxime,
>
> On 10/03/2016 02:51 PM, Maxime Coquelin wrote:
>>> --- a/drivers/net/virtio/virtio_rxtx.c
>>> +++ b/drivers/net/virtio/virtio_rxtx.c
>>> @@ -50,6 +50,7 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> +#include 
>>>
>>>  #include "virtio_logs.h"
>>>  #include "virtio_ethdev.h"
>>> @@ -627,6 +628,56 @@ virtio_update_packet_stats(struct virtnet_stats
>>> *stats, struct rte_mbuf *mbuf)
>>>  }
>>>  }
>>>
>>> +/* Optionally fill offload information in structure */
>>> +static int
>>> +virtio_rx_offload(struct rte_mbuf *m, struct virtio_net_hdr *hdr)
>>> +{
>>> +struct rte_net_hdr_lens hdr_lens;
>>> +uint32_t hdrlen, ptype;
>>> +int l4_supported = 0;
>>> +
>>> +/* nothing to do */
>>> +if (hdr->flags == 0 && hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE)
>>> +return 0;
>> Maybe we could first check whether offload features were negotiated?
>> Doing this, we could return before accessing the header and so avoid a
>> cache miss.
>
> Yes, doing this would avoid reading the virtio header when the rx
> function is virtio_recv_pkts(). When using virtio_recv_mergeable_pkts(),
> it won't have a big impact since we already need to read hdr->num_buffers.
Right, it matters only for the non-mergeable buffers case.

>
>
> I plan to do something like this in both recv functions:
>
> @@ -854,6 +854,7 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf
> **rx_pkts, uint16_t nb_pkts)
> int error;
> uint32_t i, nb_enqueued;
> uint32_t hdr_size;
> +   uint64_t features;
> struct virtio_net_hdr *hdr;
>
> nb_used = VIRTQUEUE_NUSED(vq);
> @@ -872,6 +873,7 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf
> **rx_pkts, uint16_t nb_pkts)
> nb_rx = 0;
> nb_enqueued = 0;
> hdr_size = hw->vtnet_hdr_size;
> +   features = hw->guest_features;
>
> for (i = 0; i < num ; i++) {
> rxm = rcv_pkts[i];
> @@ -903,7 +905,8 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf
> **rx_pkts, uint16_t nb_pkts)
> rte_vlan_strip(rxm);
>
> /* Update offload features */
> -   if (virtio_rx_offload(rxm, hdr) < 0) {
> +   if ((features & VIRTIO_NET_F_GUEST_CSUM) &&
s/VIRTIO_NET_F_GUEST_CSUM/(1u << VIRTIO_NET_F_GUEST_CSUM)/
And don't forget to update the test for LRO patch.
Except this, it sounds good.

Thanks,
Maxime
> +   virtio_rx_offload(rxm, hdr) < 0) {
> virtio_discard_rxbuf(vq, rxm);
> rxvq->stats.errors++;
> continue;
>
> Thank you for the feedback.
> Olivier
>

[dpdk-dev] [PATCH] ip_frag: fix missing dependency on librte_hash

2016-10-05 Thread Panu Matilainen

Not sure what exactly changed and where, but I've started getting
build failures on Fedora rawhide i386:
lib/librte_ip_frag/ip_frag_internal.c:36:23: fatal error:
rte_jhash.h: No such file or directory
 #include 
   ^
Looking at librte_ip_frag, it clearly depends on librte_hash so
its probably more a question of something commonly masking the issue.

Signed-off-by: Panu Matilainen 
---
 lib/librte_ip_frag/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/librte_ip_frag/Makefile b/lib/librte_ip_frag/Makefile
index e97dfbd..43f8b1e 100644
--- a/lib/librte_ip_frag/Makefile
+++ b/lib/librte_ip_frag/Makefile
@@ -54,6 +54,7 @@ SYMLINK-$(CONFIG_RTE_LIBRTE_IP_FRAG)-include += rte_ip_frag.h

 DEPDIRS-$(CONFIG_RTE_LIBRTE_IP_FRAG) += lib/librte_eal
 DEPDIRS-$(CONFIG_RTE_LIBRTE_IP_FRAG) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_IP_FRAG) += lib/librte_hash
 DEPDIRS-$(CONFIG_RTE_LIBRTE_IP_FRAG) += lib/librte_mbuf
 DEPDIRS-$(CONFIG_RTE_LIBRTE_IP_FRAG) += lib/librte_mempool

-- 
2.7.4

[dpdk-dev] Proposal: enable redirection of DPDK logs from the user app

2016-10-05 Thread Matthew Hall

On Wed, Oct 05, 2016 at 01:26:30PM +, Montorsi, Francesco wrote:
> Correct, but in my experience DPDK never creates such a long line of log 
> message... 
> 
> Francesco

This comment is fatally flawed. Many of us write our applications using these 
functions. I have things which hex-dump packets when certain debug levels are 
enabled. Sadly my patch to add deeper levels to default DPDK was rejected, but 
with that I have very detailed levels that say everything I ever do to the 
packet.

Matthew.

[dpdk-dev] [PATCH] app/test: add mempool walk

2016-10-05 Thread Olivier Matz

Hi Thomas,

On 10/03/2016 10:58 PM, Thomas Monjalon wrote:
> The mempool function rte_mempool_walk was not tested.
> It will print the name of all mempools.
> 
> Signed-off-by: Thomas Monjalon 

Acked-by: Olivier Matz

[dpdk-dev] [PATCH] mbuf: fix leak and errno on pool creation error

2016-10-05 Thread Thomas Monjalon

2016-09-19 14:34, Olivier Matz:
> On error, the mempool object has to be freed, and rte_errno should be a
> positive value.
> 
> Fixes: 152ca517900b ("mbuf: use default mempool handler from config")
> 
> Signed-off-by: Olivier Matz 

Applied, thanks

[dpdk-dev] Proposal: enable redirection of DPDK logs from the user app

2016-10-05 Thread Olivier Matz

Hi Francesco,

On 10/04/2016 02:28 PM, Montorsi, Francesco wrote:
> Hi Olivier,
> 
>> It seems the mailing list stripped your patch sent as attachment. 
>> Can you please resend it again in the body of the mail?
> You're right sorry. It's attached at the end of this mail.
> 
>> I think we can already redirect logs to a file by using
>> fopencookie() + rte_openlog_stream(). Did you already check these
>> functions?
> 
> Yes, but to be honest, that seems a troublesome solution for
> something as easy as logging a string; e.g. by using fopencookie()
> approach, you don't have the concept of "log message", you just
> provide a function that must write a block of bytes somewhere.
> Typically instead, you need to know where a log message starts and
> ends, to e.g., add prefixes/postfixes to it.

I'm not sure that true if you call setbuf(log_stream, NULL).

In that case, it looks easy to prefix / postfix messages with a
fopencookie callback like:

/* example on stdout */
ssize_t
simple_write(void *c, const char *buf, size_t size)
{
ssize_t ret1, ret2, ret3;

ret1 = fwrite("<", 1, 1, stdout);
if (ret1 == 0)
return 0;
ret2 = fwrite(buf, size, 1, stdout);
if (ret2 == 0)
return 0;
ret3 = fwrite(">", 1, 1, stdout);
if (ret3 == 0)
return 0;
return ret1 + ret2 + ret3;
}


> Indeed, most of the C/C++ (open source) libraries have some simple
> hook that allows the user to have more control on logging... I think
> DPDK should be no exception... :)

I understand that the current API is a bit more complex, but I don't
feel there is any blocking issue to do what you want. What do you think?


Also, I know you've said your patch needs some rework, but as you've
also said you are using it, maybe it would be useful for you to know:
- it makes use of a global variable 'log_buffer', shared by all the
pthreads, which can lead to crashes
- it strips the log messages to 4095 chars

Regards,
Olivier

[dpdk-dev] [RFC 0/7] changing mbuf pool handler

2016-10-05 Thread Hunt, David



On 5/10/2016 12:49 PM, Hemant Agrawal wrote:
> Hi Olivier,
>
>> -Original Message-
>> From: Hunt, David [mailto:david.hunt at intel.com]
>> Hi Olivier,
>>
>>
>> On 3/10/2016 4:49 PM, Olivier Matz wrote:
>>> Hi Hemant,
>>>
>>> Thank you for your feedback.
>>>
>>> On 09/22/2016 01:52 PM, Hemant Agrawal wrote:
 Hi Olivier

 On 9/19/2016 7:12 PM, Olivier Matz wrote:
> Hello,
>
> Following discussion from [1] ("usages issue with external mempool").
>
> This is a tentative to make the mempool_ops feature introduced by
> David Hunt [2] more widely used by applications.
>
> It applies on top of a minor fix in mbuf lib [3].
>
> To sumarize the needs (please comment if I did not got it properly):
>
> - new hw-assisted mempool handlers will soon be introduced
> - to make use of it, the new mempool API [4]
>> (rte_mempool_create_empty,
> rte_mempool_populate, ...) has to be used
> - the legacy mempool API (rte_mempool_create) does not allow to
>> change
> the mempool ops. The default is "ring_p_c" depending on
> flags.
> - the mbuf helper (rte_pktmbuf_pool_create) does not allow to change
> them either, and the default is RTE_MBUF_DEFAULT_MEMPOOL_OPS
> ("ring_mp_mc")
> - today, most (if not all) applications and examples use either
> rte_pktmbuf_pool_create or rte_mempool_create to create the mbuf
> pool, making it difficult to take advantage of this feature with
> existing apps.
>
> My initial idea was to deprecate both rte_pktmbuf_pool_create() and
> rte_mempool_create(), forcing the applications to use the new API,
> which is more flexible. But after digging a bit, it appeared that
> rte_mempool_create() is widely used, and not only for mbufs.
> Deprecating it would have a big impact on applications, and
> replacing it with the new API would be overkill in many use-cases.
 I agree with the proposal.

> So I finally tried the following approach (inspired from a
> suggestion Jerin [5]):
>
> - add a new mempool_ops parameter to rte_pktmbuf_pool_create().
>> This
> unfortunatelly breaks the API, but I implemented an ABI compat layer.
> If the patch is accepted, we could discuss how to announce/schedule
> the API change.
> - update the applications and documentation to prefer
> rte_pktmbuf_pool_create() as much as possible
> - update most used examples (testpmd, l2fwd, l3fwd) to add a new
>> command
> line argument to select the mempool handler
>
> I hope the external applications would then switch to
> rte_pktmbuf_pool_create(), since it supports most of the use-cases
> (even priv_size != 0, since we can call rte_mempool_obj_iter() after) .
>
 I will still prefer if you can add the "rte_mempool_obj_cb_t *obj_cb,
 void *obj_cb_arg" into "rte_pktmbuf_pool_create". This single
 consolidated wrapper will almost make it certain that applications
 will not try to use rte_mempool_create for packet buffers.
>>> The patch changes the example applications. I'm not sure I understand
>>> why adding these arguments would force application to not use
>>> rte_mempool_create() for packet buffers. Do you have a application in
>> mind?
>>> For the mempool_ops parameter, we must pass it at init because we need
>>> to know the mempool handler before populating the pool. For object
>>> initialization, it can be done after, so I thought it was better to
>>> reduce the number of arguments to avoid to fall in the
>>> mempool_create() syndrom :)
>> I also agree with the proposal. Looks cleaner.
>>
>> I would lean to the side of keeping the parameters to the minimum, i.e.
>> not adding *obj_cb and *obj_cb_arg into rte_pktmbuf_pool_create.
>> Developers always have the option of going with rte_mempool_create if they
>> need more fine-grained control.
> [Hemant] The implementations with hw offloaded mempools don't want developer 
> using *rte_mempool_create* for packet buffer pools.
> This API does not work for hw offloaded mempool.
>
> Also, *rte_mempool_create_empty* - may not be convenient for many 
> application, as it requires calling  4+ APIs.
>
> Olivier is not in favor of deprecating the *rte_mempool_create*.   I agree 
> with concerns raised by him.
>
> Essentially, I was suggesting to upgrade * rte_pktmbuf_pool_create* to be 
> like *rte_mempool_create*  for packet buffers exclusively.
>
> This will provide a clear segregation for API usages w.r.t the packet buffer 
> pool vs all other type of mempools.

Yes, it does sound like we need those extra parameters on 
rte_pktmbuf_pool_create.

Regards,
Dave.

[dpdk-dev] [PATCH v2] doc: arm64: document DPDK application profiling methods

2016-10-05 Thread Jerin Jacob

Signed-off-by: Jerin Jacob 
---
v2:
-Addressed ARM64 specific review comments(Suggested by Thomas)
http://dpdk.org/dev/patchwork/patch/16362/
---
 doc/guides/prog_guide/profile_app.rst | 58 +++
 1 file changed, 58 insertions(+)

diff --git a/doc/guides/prog_guide/profile_app.rst 
b/doc/guides/prog_guide/profile_app.rst
index 3226187..9f1b7ee 100644
--- a/doc/guides/prog_guide/profile_app.rst
+++ b/doc/guides/prog_guide/profile_app.rst
@@ -31,6 +31,14 @@
 Profile Your Application
 

+Introduction
+
+
+The following sections describe the methods to profile DPDK applications on
+different architectures.
+
+x86
+~~~
 Intel processors provide performance counters to monitor events.
 Some tools provided by Intel can be used to profile and benchmark an 
application.
 See the *VTune Performance Analyzer Essentials* publication from Intel Press 
for more information.
@@ -50,3 +58,53 @@ The main situations that should be monitored through event 
counters are:
 Refer to the
 `Intel Performance Analysis Guide 
`_
 for details about application profiling.
+
+ARM64
+~
+
+Perf
+
+ARM64 architecture provide performance counters to monitor events.
+The Linux perf tool can be used to profile and benchmark an application.
+In addition to the standard events, perf can be used to profile arm64 specific
+PMU events through raw events(-e -rXX)
+
+Refer to the
+`ARM64 specific PMU events enumeration 
`_
+
+High-resolution cycle counter
+^
+The default cntvct_el0 based rte_rdtsc() provides portable means to get wall
+clock counter at user space. Typically it runs at <= 100MHz.
+
+The alternative method to enable rte_rdtsc() for high resolution
+wall clock counter is through armv8 PMU subsystem.
+The PMU cycle counter runs at CPU frequency, However, access to PMU cycle
+counter from user space is not enabled by default in the arm64 linux kernel.
+It is possible to enable cycle counter at user space access
+by configuring the PMU from the privileged mode (kernel space).
+
+by default rte_rdtsc() implementation uses portable cntvct_el0 scheme.
+Application can choose the PMU based implementation with
+CONFIG_RTE_ARM_EAL_RDTSC_USE_PMU
+
+Find below the example steps to configure the PMU based cycle counter on an
+armv8 machine.
+
+.. code-block:: console
+
+git clone https://github.com/jerinjacobk/armv8_pmu_cycle_counter_el0
+cd armv8_pmu_cycle_counter_el0
+make
+sudo insmod pmu_el0_cycle_counter.ko
+cd $DPDK_DIR
+make config T=arm64-armv8a-linuxapp-gcc
+echo "CONFIG_RTE_ARM_EAL_RDTSC_USE_PMU=y" >> build/.config
+make
+
+.. warning::
+
+   The PMU based scheme is useful for high accuracy performance profiling with
+   rte_rdtsc(). However, This method can not be used in conjunction with Linux
+   userspace profiling tools like perf as this scheme alters the PMU registers
+   state.
-- 
2.5.5

[dpdk-dev] [PATCH v7] net/virtio: add set_mtu in virtio

2016-10-05 Thread Dey, Souvik

Yes Mark, I have modified the patch with the below comments.

drivers/net/virtio/virtio_ethdev.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index 423c597..1dbfea6 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -653,12 +653,20 @@ virtio_dev_allmulticast_disable(struct rte_eth_dev *dev)
PMD_INIT_LOG(ERR, "Failed to disable allmulticast");
 } 

+#define VLAN_TAG_LEN   4/* 802.3ac tag (not DMA'd) */
+
+static int  virtio_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
+{
+   struct virtio_hw *hw = dev->data->dev_private;
+   uint32_t ether_hdr_len = ETHER_HDR_LEN + VLAN_TAG_LEN +
+   hw->vtnet_hdr_size;
+   uint32_t frame_size = mtu + ether_hdr_len;
+
+   if (mtu < ETHER_MIN_MTU || frame_size > VIRTIO_MAX_RX_PKTLEN) {
+   PMD_INIT_LOG(ERR, "MTU should be between %d and %d\n",
+   ETHER_MIN_MTU, (VIRTIO_MAX_RX_PKTLEN - ether_hdr_len));
+   return -EINVAL;
+   }
+   return 0;
+}

Let mem know if this looks good or we have few more comments. 

--
Regards,
Souvik

-Original Message-
From: Kavanagh, Mark B [mailto:mark.b.kavan...@intel.com] 
Sent: Wednesday, October 5, 2016 4:16 AM
To: Dey, Souvik ; yuanhan.liu at linux.intel.com; 
stephen at networkplumber.org
Cc: dev at dpdk.org
Subject: RE: [PATCH v7] net/virtio: add set_mtu in virtio

>Hi All,
>   Is there any further comments or modifications required for this 
>patch, or what next steps do you guys suggest here ?

Hi Souvik,

Some minor comments inline.

Thanks,
Mark

>
>--
>Regards,
>Souvik
>
>-Original Message-
>From: Dey, Souvik
>Sent: Saturday, October 1, 2016 10:09 AM
>To: mark.b.kavanagh at intel.com; yuanhan.liu at linux.intel.com; 
>stephen at networkplumber.org; dev at dpdk.org
>Subject: RE: [PATCH v7] net/virtio: add set_mtu in virtio
>
>Hi Liu/Stephen/Mark,
>
>   I have submitted Version 7 of this patch. Do let me know if this looks 
> proper.
>
>--
>Regards,
>Souvik
>
>-Original Message-
>From: Dey, Souvik
>Sent: Thursday, September 29, 2016 4:32 PM
>To: mark.b.kavanagh at intel.com; yuanhan.liu at linux.intel.com; 
>stephen at networkplumber.org; dev at dpdk.org
>Cc: Dey, Souvik 
>Subject: [PATCH v7] net/virtio: add set_mtu in virtio
>
>
>Virtio interfaces do not currently allow the user to specify a 
>particular Maximum Transmission Unit (MTU).Consequently, the MTU of 
>Virtio interfaces is typically set to the Ethernet default value of 1500.
>This is problematic in the case of cloud deployments, in which a 
>specific (and potentially non-standard) MTU needs to be set by a DHCP 
>server, which needs to be honored by all interfaces across the traffic 
>path.To achieve this Virtio interfaces should support setting of MTU.
>In case when GRE/VXLAN tunneling is used for internal communication, 
>there will be an overhead added by the infrastructure in the packet 
>over and above the ETHER MTU of 1518. So to take care of this overhead 
>in these cases the DHCP server corrects the L3 MTU to 1454. But since 
>virtio interfaces was not having the MTU set functionality that MTU 
>sent by the DHCP server was ignored and the instance will still send 
>packets with 1500 MTU which after encapsulation will become more than 
>1518 and eventually gets dropped in the infrastructure.
>By adding an additional 'set_mtu' function to the Virtio driver, we can 
>honor the MTU sent by the DHCP server. The dhcp server/controller can 
>then leverage this 'set_mtu' functionality to resolve the above 
>mentioned issue of packets getting dropped due to incorrect size.
>
>
>Signed-off-by: Souvik Dey 
>
>---
>Changes in v7:
>- Replaced the CRC_LEN with the merge rx buf header length.
>- Changed the frame_len max validation to VIRTIO_MAX_RX_PKTLEN.
>Changes in v6:
>- Description of change corrected
>- Corrected the identations
>- Corrected the subject line too
>- The From line was also not correct
>- Re-submitting as the below patch was not proper Changes in v5:
>- Fix log message for out-of-bounds MTU parameter in virtio_mtu_set
>- Calculate frame size, based on 'mtu' parameter
>- Corrected the upper bound and lower bound checks in virtio_mtu_set 
>Changes in v4: Incorporated review comments.
>Changes in v3: Corrected few style errors as reported by sys-stv.
>Changes in v2: Incorporated review comments.
>
> drivers/net/virtio/virtio_ethdev.c | 16 
> 1 file changed, 16 insertions(+)
>
>diff --git a/drivers/net/virtio/virtio_ethdev.c 
>b/drivers/net/virtio/virtio_ethdev.c
>index 423c597..1dbfea6 100644
>--- a/drivers/net/virtio/virtio_ethdev.c
>+++ b/drivers/net/virtio/virtio_ethdev.c
>@@ -653,12 +653,20 @@ virtio_dev_allmulticast_disable(struct rte_eth_dev *dev)
>PMD_INIT_LOG(ERR, "Failed to disable allmulticast");  }
>
>+#define VLAN_TAG_LEN   4/*

[dpdk-dev] [PATCH v2 1/2] mbuf: add rte_pktmbuff_reset_headroom function

2016-10-05 Thread Olivier Matz

Hi Maxime,

On 10/04/2016 02:05 PM, Maxime Coquelin wrote:
> Some application use rte_mbuf_raw_alloc() function to improve
> performance by not resetting mbuf's fields to their default state.
> 
> This can be however problematic for mbuf consumers that need some
> headroom, meaning that data_off field gets decremented after
> allocation. When the mbuf is re-used afterwards, there might not
> be enough room for the consumer to prepend anything, if the data_off
> field is not reset to its default value.
> 
> This patch adds a new rte_pktmbuf_reset_headroom() function that
> applications can call to reset the data_off field.
> This patch also replaces current data_off affectations in the mbuf
> lib with a call to this function.
> 
> Signed-off-by: Maxime Coquelin 

Series:

Acked-by: Olivier Matz 

Thanks

[dpdk-dev] [PATCH] log: do not drop debug logs at compile time

2016-10-05 Thread Olivier Matz

Hi David,

On 10/04/2016 10:28 AM, David Marchand wrote:
> On Mon, Oct 3, 2016 at 6:27 PM, Wiles, Keith  wrote:
>>> On Oct 3, 2016, at 10:37 AM, Olivier Matz  wrote:
>>> What makes you feel it's easier to add a log level instead of adding a
>>> new RTE_LOG_DP() function?
>>
>> It seems to me the log levels are for displaying logs at different levels 
>> adding a new macro to not log is just a hack because we do not have a log 
>> level for data path. This is why I would like to see a log level added and 
>> not a new macro.
>>
>> It also appears the new RTE_LOG() will always be in the code as you moved 
>> the test to the RTE_LOG_DP() macro. This would mean all RTE_LOG() in the 
>> code will always call rte_log(), correct?
>>
>> If using a new DEBUG_DP (maybe DATAPATH is a better log level name) level we 
>> can use the same macro as before and modify the level only. This way we can 
>> remove via the compiler any log that is below the default RTE_LOG_LEVEL. I 
>> see keeping the rte_log() could be a performance problem or code blot when 
>> you really want to remove them all.
>>
>> The DATAPATH log level would be above (smaller number) then DEBUG in the 
>> enum list. To remove all debug logs just set the RTE_LOG_LEVEL to 
>> RTE_LOG_DATAPATH.
> 
> If I try to draw a parrallel to syslog (well, the log subsystem in eal
> has always been bound to syslog ...), what you propose here is like
> adding a new level in syslog while you have syslog facilities.
> 
> With the current log api, we have types and levels, can't we filter at
> build time depending on the log "type" ?
> Here we would strip PMD type logs > INFO.

I think we may have dataplane logs in other part of the code: in the
application (log type = USER), and surely in some dpdk libraries.

Moreover, as the behavior of the macro changes (in one case it is
stripped at compilation, in the other case not), I'd say having
different functions is clearer for the developer than having a different
behavior depending on log level or log type.

Regards,
Olivier

[dpdk-dev] [PATCH] log: do not drop debug logs at compile time

2016-10-05 Thread Olivier Matz

Hi David,

On 10/04/2016 10:26 AM, David Marchand wrote:
> On Fri, Sep 16, 2016 at 9:43 AM, Olivier Matz  
> wrote:
>> Today, all logs whose level is lower than INFO are dropped at
>> compile-time. This prevents from enabling debug logs at runtime using
>> --log-level=8.
>>
>> The rationale was to remove debug logs from the data path at
>> compile-time, avoiding a test at run-time.
>>
>> This patch changes the behavior of RTE_LOG() to avoid the compile-time
>> optimization, and introduces the RTE_LOG_DP() macro that has the same
>> behavior than the previous RTE_LOG(), for the rare cases where debug
>> logs are in the data path.
>>
>> So it is now possible to enable debug logs at run-time by just
>> specifying --log-level=8. Some drivers still have special compile-time
>> options to enable more debug log. Maintainers may consider to
>> remove/reduce them.
>>
>> Signed-off-by: Olivier Matz 
>> ---
>>  config/common_base  |  1 +
>>  doc/guides/faq/faq.rst  |  2 +-
>>  drivers/net/bnxt/bnxt_txr.c |  2 +-
>>  drivers/net/nfp/nfp_net.c   |  8 +++---
>>  examples/distributor/main.c |  4 +--
>>  examples/ipsec-secgw/esp.c  |  2 +-
>>  examples/ipsec-secgw/ipsec.c|  4 +--
>>  examples/packet_ordering/main.c |  6 ++--
>>  examples/quota_watermark/qw/main.c  |  2 +-
>>  examples/tep_termination/main.c |  4 +--
>>  examples/vhost/main.c   | 14 +-
>>  examples/vhost_xen/main.c   | 20 +++---
>>  lib/librte_eal/common/include/rte_log.h | 49 
>> +
>>  13 files changed, 67 insertions(+), 51 deletions(-)
>>
>> diff --git a/config/common_base b/config/common_base
>> index 7830535..04b71e9 100644
>> --- a/config/common_base
>> +++ b/config/common_base
>> @@ -89,6 +89,7 @@ CONFIG_RTE_MAX_MEMSEG=256
>>  CONFIG_RTE_MAX_MEMZONE=2560
>>  CONFIG_RTE_MAX_TAILQ=32
>>  CONFIG_RTE_LOG_LEVEL=RTE_LOG_INFO
>> +CONFIG_RTE_LOG_DP_LEVEL=RTE_LOG_INFO
>>  CONFIG_RTE_LOG_HISTORY=256
>>  CONFIG_RTE_LIBEAL_USE_HPET=n
>>  CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
> 
> [snip]
> 
>> diff --git a/lib/librte_eal/common/include/rte_log.h 
>> b/lib/librte_eal/common/include/rte_log.h
>> index 919563c..76b198f 100644
>> --- a/lib/librte_eal/common/include/rte_log.h
>> +++ b/lib/librte_eal/common/include/rte_log.h
> 
> [snip]
> 
>> @@ -266,6 +257,30 @@ int rte_vlog(uint32_t level, uint32_t logtype, const 
>> char *format, va_list ap)
>>   *   - Negative on error.
>>   */
>>  #define RTE_LOG(l, t, ...) \
>> +rte_log(RTE_LOG_ ## l, \
>> +RTE_LOGTYPE_ ## t, # t ": " __VA_ARGS__)
>> +
>> +/**
>> + * Generates a log message for data path.
>> + *
>> + * Similar to RTE_LOG(), except that it is removed at compilation time
>> + * if the RTE_LOG_DP_LEVEL configuration option is lower than the log
>> + * level argument.
>> + *
>> + * @param l
>> + *   Log level. A value between EMERG (1) and DEBUG (8). The short name is
>> + *   expanded by the macro, so it cannot be an integer value.
>> + * @param t
>> + *   The log type, for example, EAL. The short name is expanded by the
>> + *   macro, so it cannot be an integer value.
>> + * @param ...
>> + *   The fmt string, as in printf(3), followed by the variable arguments
>> + *   required by the format.
>> + * @return
>> + *   - 0: Success.
>> + *   - Negative on error.
>> + */
>> +#define RTE_LOG_DP(l, t, ...)  \
>> (void)((RTE_LOG_ ## l <= RTE_LOG_LEVEL) ?   \
>>  rte_log(RTE_LOG_ ## l, \
>>  RTE_LOGTYPE_ ## t, # t ": " __VA_ARGS__) : \
>> --
>> 2.8.1
> 
> Hum, I suppose RTE_LOG_DP should look at RTE_LOG_DP_LEVEL.
> 

That's correct, good catch :)
I'll send a v2 with that fix.


Olivier

[dpdk-dev] [PATCH v2 09/12] virtio: add Rx checksum offload support

2016-10-05 Thread Olivier Matz

Hi Maxime,

On 10/03/2016 02:51 PM, Maxime Coquelin wrote:
>> --- a/drivers/net/virtio/virtio_rxtx.c
>> +++ b/drivers/net/virtio/virtio_rxtx.c
>> @@ -50,6 +50,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>
>>  #include "virtio_logs.h"
>>  #include "virtio_ethdev.h"
>> @@ -627,6 +628,56 @@ virtio_update_packet_stats(struct virtnet_stats
>> *stats, struct rte_mbuf *mbuf)
>>  }
>>  }
>>
>> +/* Optionally fill offload information in structure */
>> +static int
>> +virtio_rx_offload(struct rte_mbuf *m, struct virtio_net_hdr *hdr)
>> +{
>> +struct rte_net_hdr_lens hdr_lens;
>> +uint32_t hdrlen, ptype;
>> +int l4_supported = 0;
>> +
>> +/* nothing to do */
>> +if (hdr->flags == 0 && hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE)
>> +return 0;
> Maybe we could first check whether offload features were negotiated?
> Doing this, we could return before accessing the header and so avoid a
> cache miss.

Yes, doing this would avoid reading the virtio header when the rx
function is virtio_recv_pkts(). When using virtio_recv_mergeable_pkts(),
it won't have a big impact since we already need to read hdr->num_buffers.


I plan to do something like this in both recv functions:

@@ -854,6 +854,7 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf
**rx_pkts, uint16_t nb_pkts)
int error;
uint32_t i, nb_enqueued;
uint32_t hdr_size;
+   uint64_t features;
struct virtio_net_hdr *hdr;

nb_used = VIRTQUEUE_NUSED(vq);
@@ -872,6 +873,7 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf
**rx_pkts, uint16_t nb_pkts)
nb_rx = 0;
nb_enqueued = 0;
hdr_size = hw->vtnet_hdr_size;
+   features = hw->guest_features;

for (i = 0; i < num ; i++) {
rxm = rcv_pkts[i];
@@ -903,7 +905,8 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf
**rx_pkts, uint16_t nb_pkts)
rte_vlan_strip(rxm);

/* Update offload features */
-   if (virtio_rx_offload(rxm, hdr) < 0) {
+   if ((features & VIRTIO_NET_F_GUEST_CSUM) &&
+   virtio_rx_offload(rxm, hdr) < 0) {
virtio_discard_rxbuf(vq, rxm);
rxvq->stats.errors++;
continue;


Thank you for the feedback.
Olivier

[dpdk-dev] Proposal: enable redirection of DPDK logs from the user app

2016-10-05 Thread Montorsi, Francesco

Hi Olivier,

> On 10/04/2016 02:28 PM, Montorsi, Francesco wrote:
> > Yes, but to be honest, that seems a troublesome solution for something
> > as easy as logging a string; e.g. by using fopencookie() approach, you
> > don't have the concept of "log message", you just provide a function
> > that must write a block of bytes somewhere.
> > Typically instead, you need to know where a log message starts and
> > ends, to e.g., add prefixes/postfixes to it.
> 
> I'm not sure that true if you call setbuf(log_stream, NULL).
> 
> In that case, it looks easy to prefix / postfix messages with a fopencookie
> callback like:
> 
> /* example on stdout */
> ssize_t
> simple_write(void *c, const char *buf, size_t size) {
>   ssize_t ret1, ret2, ret3;
> 
>   ret1 = fwrite("<", 1, 1, stdout);
>   if (ret1 == 0)
>   return 0;
>   ret2 = fwrite(buf, size, 1, stdout);
>   if (ret2 == 0)
>   return 0;
>   ret3 = fwrite(">", 1, 1, stdout);
>   if (ret3 == 0)
>   return 0;
>   return ret1 + ret2 + ret3;
> }
> 
I didn't know about setbuf()... but are we sure that in this way the 
simple_write() function will always receive a full string? I mean: in the 
manpage for setbuf() it says:

"... When the first I/O operation occurs on a file, malloc(3) is called, and a 
buffer is obtained.  If the argument buf is NULL, only the mode is 
affected; a new buffer will be allocated on the next read or write operation."

But: is it true that 1 write operation corresponds to 1 vfprintf() call? Maybe 
if you have a "long" a single vfprintf() call may translate to several 
simple_write() calls... I don't know honestly.

> > Indeed, most of the C/C++ (open source) libraries have some simple
> > hook that allows the user to have more control on logging... I think
> > DPDK should be no exception... :)
> 
> I understand that the current API is a bit more complex, but I don't feel 
> there
> is any blocking issue to do what you want. What do you think?

See above. Moreover, IMHO it would be much more user-friendly to have a simple 
function callback to implement vs having to dig into fopencookie()+setbuf()+etc 
etc . 

> Also, I know you've said your patch needs some rework, but as you've also
> said you are using it, maybe it would be useful for you to know:
> - it makes use of a global variable 'log_buffer', shared by all the pthreads,
> which can lead to crashes

That's a good point. I will turn it into a __thread variable. Thanks for 
pointing out this.

> - it strips the log messages to 4095 chars

Correct, but in my experience DPDK never creates such a long line of log 
message... 

Francesco

[dpdk-dev] [RFC] libeventdev: event driven programming model framework for DPDK

2016-10-05 Thread Jerin Jacob

On Tue, Oct 04, 2016 at 09:49:52PM +, Vangati, Narender wrote:
> Hi Jerin,

Hi Narender,

Thanks for the comments.I agree with proposed changes; I will address these 
comments in v2.

/Jerin


> 
> 
> 
> Here are some comments on the libeventdev RFC.
> 
> These are collated thoughts after discussions with you & others to understand 
> the concepts and rationale for the current proposal.
> 
> 
> 
> 1. Concept of flow queues. This is better abstracted as flow ids and not as 
> flow queues which implies there is a queueing structure per flow. A s/w 
> implementation can do atomic load balancing on multiple flow ids more 
> efficiently than maintaining each event in a specific flow queue.
> 
> 
> 
> 2. Scheduling group. A scheduling group is more a steam of events, so an 
> event queue might be a better abstraction.
> 
> 
> 
> 3. An event queue should support the concept of max active atomic flows 
> (maximum number of active flows this queue can track at any given time) and 
> max active ordered sequences (maximum number of outstanding events waiting to 
> be egress reordered by this queue). This allows a scheduler implementation to 
> dimension/partition its resources among event queues.
> 
> 
> 
> 4. An event queue should support concept of a single consumer. In an 
> application, a stream of events may need to be brought together to a single 
> core for some stages of processing, e.g. for TX at the end of the pipeline to 
> avoid NIC reordering of the packets. Having a 'single consumer' event queue 
> for that stage allows the intensive scheduling logic to be short circuited 
> and can improve throughput for s/w implementations.
> 
> 
> 
> 5. Instead of tying eventdev access to an lcore, a higher level of 
> abstraction called event port is needed which is the application i/f to the 
> eventdev. Event ports are connected to event queues and is the object the 
> application uses to dequeue and enqueue events. There can be more than one 
> event port per lcore allowing multiple lightweight threads to have their own 
> i/f into eventdev, if the implementation supports it. An event port 
> abstraction also encapsulates dequeue depth and enqueue depth for a scheduler 
> implementations which can schedule multiple events at a time and output 
> events that can be buffered.
> 
> 
> 
> 6. An event should support priority. Per event priority is useful for 
> segregating high priority (control messages) traffic from low priority within 
> the same flow. This needs to be part of the event definition for 
> implementations which support it.
> 
> 
> 
> 7. Event port to event queue servicing priority. This allows two event ports 
> to connect to the same event queue with different priorities. For 
> implementations which support it, this allows a worker core to participate in 
> two different workflows with different priorities (workflow 1 needing 3.5 
> cores, workflow 2 needing 2.5 cores, and so on).
> 
> 
> 
> 8. Define the workflow as schedule/dequeue/enqueue. An implementation is free 
> to define schedule as NOOP. A distributed s/w scheduler can use this to 
> schedule events; also a centralized s/w scheduler can make this a NOOP on 
> non-scheduler cores.
> 
> 
> 
> 9. The schedule_from_group API does not fit the workflow.
> 
> 
> 
> 10. The ctxt_update/ctxt_wait breaks the normal workflow. If the normal 
> workflow is a dequeue -> do work based on event type -> enqueue,  a pin_event 
> argument to enqueue (where the pinned event is returned through the normal 
> dequeue) allows application workflow to remain the same whether or not an 
> implementation supports it.
> 
> 
> 
> 11. Burst dequeue/enqueue needed.
> 
> 
> 
> 12. Definition of a closed/open system - where open system is memory backed 
> and closed system eventdev has limited capacity. In such systems, it is also 
> useful to denote per event port how many packets can be active in the system. 
> This can serve as a threshold for ethdev like devices so they don't overwhelm 
> core to core events.
> 
> 
> 
> 13. There should be sort of device capabilities definition to address 
> different implementations.
> 
> 
> 
> 
> vnr
> ---
>

[dpdk-dev] [PATCH 2/2] net/bonding: enable slave VLAN filter

2016-10-05 Thread Iremonger, Bernard

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Eric Kinzie
> Sent: Thursday, August 4, 2016 7:25 PM
> To: dev at dpdk.org
> Cc: Jan Blunck 
> Subject: [dpdk-dev] [PATCH 2/2] net/bonding: enable slave VLAN filter
> 
> SR-IOV virtual functions cannot rely on promiscuous mode for the reception
> of VLAN tagged frames.  Program the vlan filter for each slave when a vlan is
> configured for the bonding master.
> 
> Signed-off-by: Eric Kinzie 

Acked-by: Bernard Iremonger

[dpdk-dev] [PATCH 1/2] net/bonding: validate speed after link up

2016-10-05 Thread Iremonger, Bernard

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Eric Kinzie
> Sent: Thursday, August 4, 2016 7:25 PM
> To: dev at dpdk.org
> Cc: Jan Blunck 
> Subject: [dpdk-dev] [PATCH 1/2] net/bonding: validate speed after link up
> 
> It's possible for the bonding driver to mistakenly reject an interface based 
> in
> it's, as yet, unnegotiated link speed and duplex.  Always allow the interface
> to be added to the bonding interface but require link properties validation to
> succeed before slave is activated.
> 
> Fixes: 2efb58cbab6e ("bond: new link bonding library")
> 
> Signed-off-by: Eric Kinzie 

Acked-by: Bernard Iremonger

[dpdk-dev] qos: traffic shaping at queue level

2016-10-05 Thread Nikhil Jagtap

Hi Cristian,

Thanks for the info. A few more comments/questions inline.

On 3 October 2016 at 23:42, Dumitrescu, Cristian <
cristian.dumitrescu at intel.com> wrote:

>
>
>
>
> *From:* Nikhil Jagtap [mailto:nikhil.jagtap at gmail.com]
> *Sent:* Friday, September 30, 2016 7:12 AM
> *To:* dev at dpdk.org; Dumitrescu, Cristian  intel.com>;
> users at dpdk.org
> *Subject:* Re: qos: traffic shaping at queue level
>
>
>
> Hi,
>
> Can someone please answer my queries?
>
> I tried using queue weights to distribute traffic-class bandwidth among
> the child queues, but did not get the desired results.
>
> [Cristian] Can you please describe what issues you see?
>
[Nikhil] At the end of a 20 minute test, the total number of packets
dequeued from the respective queues were not in the ratio 1:5.
In one other test where 4 equal-rate traffic-streams were hitting 4
different queues of the same TC configured with weights 1:2:4:8, I observed
that the queue with highest weight had the least number of dequeued packets
when in theory it should have been the one with highest packet count.

Regards,
>
> Nikhil
>
>
>
> On 27 September 2016 at 15:34, Nikhil Jagtap 
> wrote:
>
> Hi,
>
>
>
> I have a few questions about the hierarchical scheduler. I am taking a
> simple example here to get a better understanding.
>
>
>
> Reference example:
>
>   pipe rate = 30 mbps
>
>   tc 0 rate = 30 mbps
>
>   traffic-type 0 being queued to queue 0, tc 0.
>
>   traffic-type 1 being queued to queue 1, tc 0.
>
>   Assume traffic-type 0 is being received at the rate of 25 mbps.
>
>   Assume traffic-type 1 is also being received at the rate of 25 mbps.
>
>
>
> Requirement:
>
>   To limit traffic-type 0 to (CIR =  5 mbps, PIR = 30 mbps), AND
>
>   limit traffic-type 1 to (CIR = 25 mbps, PIR = 30 mbps).
>
>
>
> The questions:
>
> 1) I understand that with the scheduler, it is possible to do rate
> limiting only at the sub-port and pipe levels and not at the individual
> queue level.
>
> [Cristian] Yes, correct, only subports and pipes own token buckets, with
> all the pipe traffic classes and queues sharing their pipe token bucket.
>
>
>
> Is it possible to achieve rate limiting using the notion of queue weights?
> For the above example, will assigning weights in 1:5 ratio to the two
> queues help achieve shaping the two traffic-types at the two different
> rates?
>
> [Cristian] Yes. However, getting the weight observed accurately relies on
> all the queues being backlogged (always having packets to dequeue). When a
> pipe and certain TC is examined for dequeuing, the relative weights are
> enforced between the queues that have packets at that precise moment in
> time, with the empty queues being ignored. The fully backlogged scenario is
> not taking place in practice, and the set of non-empty queues changes over
> time. As said it the past, having big relative weight ratios between queues
> helps (1:5 should be good).
>
> [Nikhil] I see. So I guess not having fully backlogged queues could be one
of the reasons for the observations I mentioned above where the
weights-ratio does not directly translate into rate-ratio. I think I should
also mention that there was no pipelining i.e. packet-processing, queueing,
dequeing was all being done inline in a run-to-completion model.
a) Would having some kind of pipelining help achieve better rate-ratio? May
be say atleast splitting the enqueue and dequeue operations?
b) If pipelining is not an option, what would be the recommended values for
enqueue and dequeue packet count in the run-to-completion model? You have
mentioned in one of your presentations to use different values for these
two. If I go with (enqueue# > dequeue#), don't I run the risk of filling up
the scheduler queues and failed enqueues even at rates lower than the
scheduler pipe rates? In the other case where (dequeue# > enqueue#), we
would end up dequeing all packets that were enqueued every time.


>
>
> 2) In continuation to previous question: if queue weights don't help,
> would it be possible to use metering to achieve rate limiting? Assume we
> meter individual traffic-types (using CIR-PIR config mentioned above)
> before queuing it to the scheduler queues. So to achieve the respective
> queue rates, the dequeuer would be expected to prioritise green packets
> over yellow.
>
> Looking into the code, the packet color is used as an input to the dropper
> block, but does not seem to be used anywhere in the scheduler. So I guess
> it is not possible to prioritise green packets when dequeing?
>
> [Cristian] Packet color is used by Weighted RED (WRED) congestion
> management scheme on the enqueue side, not on the dequeue side. Once the
> packet has been enqueued, it cannot be dropped (i.e. every enqueued packet
> will eventually be dequeued), so rate limiting cannot be enforced on the
> dequeue side.
>
>
>
> Regards,
>
> Nikhil
>
>
>
>
>
Thanks.
Nikhil

[dpdk-dev] [PATCH v5 0/4] Cuckoo hash enhancements

2016-10-05 Thread Thomas Monjalon

2016-10-05 00:25, Pablo de Lara:
> This patchset improves lookup performance on the current hash library
> by changing the existing lookup bulk pipeline, with an improved pipeline,
> based on a loop-and-jump model, instead of the current 4-stage 2-entry 
> pipeline.
> Also, x86 vectorized intrinsics are used to improve performance when 
> comparing signatures.

Applied, thanks

[dpdk-dev] [PATCH] test_cryptodev_perf: IV and digest should be stored at a DMAeble address

2016-10-05 Thread Akhil Goyal

On 9/26/2016 10:03 PM, akhil.goyal at nxp.com wrote:
> From: Akhil Goyal 
>
> For physical crypto devices, IV and digest are processed by the crypto
> device which need the contents to be written on some DMA able address.
>
> So in order to do that, IV and digest are accomodated in the packet.
>
> Signed-off-by: Akhil Goyal 
> ---
>  app/test/test_cryptodev_perf.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/app/test/test_cryptodev_perf.c b/app/test/test_cryptodev_perf.c
> index 0ea7ec1..930d5b8 100644
> --- a/app/test/test_cryptodev_perf.c
> +++ b/app/test/test_cryptodev_perf.c
> @@ -2366,9 +2366,13 @@ test_perf_set_crypto_op(struct rte_crypto_op *op, 
> struct rte_mbuf *m,
>   op->sym->auth.aad.length = AES_CBC_CIPHER_IV_LENGTH;
>
>   /* Cipher Parameters */
> - op->sym->cipher.iv.data = aes_cbc_iv;
> + op->sym->cipher.iv.data = (uint8_t *)m->buf_addr + m->data_off;
> + op->sym->cipher.iv.phys_addr = rte_pktmbuf_mtophys(m);
>   op->sym->cipher.iv.length = AES_CBC_CIPHER_IV_LENGTH;
>
> + rte_memcpy(op->sym->cipher.iv.data, aes_cbc_iv,
> + AES_CBC_CIPHER_IV_LENGTH);
> +
>   /* Data lengths/offsets Parameters */
>   op->sym->auth.data.offset = 0;
>   op->sym->auth.data.length = data_len;
> @@ -2468,7 +2472,9 @@ test_perf_aes_sha(uint8_t dev_id, uint16_t queue_id,
>   rte_pktmbuf_free(mbufs[k]);
>   return -1;
>   }
> -
> + /* Make room for Digest and IV in mbuf */
> + rte_pktmbuf_append(mbufs[i], digest_length);
> + rte_pktmbuf_prepend(mbufs[i], AES_CBC_CIPHER_IV_LENGTH);
>   }
>
>
>
Hi Declan,

Sorry I missed out copy your name in the TO list. Do we have some 
comments on this patch.

Regards,
Akhil

[dpdk-dev] [PATCH v2]:rte_timer:timer lag issue correction

2016-10-05 Thread Thomas Monjalon

> > For Periodic timers ,if the lag gets introduced, the current code
> > added additional delay when the next peridoc timer was initialized
> > by not taking into account the delay added, with this fix the code
> > would start the next occurrence of timer keeping in account the
> > lag added.Corrected the behavior.
> > 
> > Fixes: 9b15ba89 ("timer: use a skip list")
> > 
> > Karmarkar Suyash (1):
> > Signed-off-by: Karmarkar Suyash 
> 
> Yes, this change makes sense. I ran timer tests and they passed.
> 
> Acked-by: Robert Sanford 

Applied, thanks

[dpdk-dev] [PATCH] examples/ipsec-secgw: Update checksum while decrementing ttl

2016-10-05 Thread Akhil Goyal

On 10/5/2016 6:04 AM, De Lara Guarch, Pablo wrote:
>
>
>> -Original Message-
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Sergio Gonzalez
>> Monroy
>> Sent: Monday, September 26, 2016 6:28 AM
>> To: akhil.goyal at nxp.com; dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH] examples/ipsec-secgw: Update checksum
>> while decrementing ttl
>>
>> Hi Akhil,
>>
>> This application relies on checksum offload in both outbound and inbound
>> paths (PKT_TX_IP_CKSUM flag).
[Akhil]Agreed that the application relies on checksum offload, but here 
we are talking about the inner ip header. Inner IP checksum will be 
updated on the next end point after decryption. This would expect that 
the next end point must have checksum offload capability. What if we are 
capturing the encrypted packets on wireshark or say send it to some 
other machine which does not run DPDK and do not know about checksum 
offload, then wireshark/other machine will not be able to get the 
correct the checksum and will show error.
>>
>> Because we assume that we always forward the packet in both paths, we
>> decrement the ttl in both inbound and outbound.
>> You seem to only increment (recalculate) the checksum of the inner IP
>> header in the outbound path but not the inbound path.
[Akhil]Correct I missed out the inbound path.
>>
>> Also, in the inbound path you have to consider a possible ECN value update.
[Akhil]If I take care of the ECN then it would mean I need to calculate 
the checksum completely, incremental checksum wont give correct results. 
This would surely impact performance. Any suggestion on how should we 
take care of ECN update. Should I recalculate the checksum and send the 
patch for ECN update? Or do we have a better solution.
>
> Any further comments here, Akhil?
>
> Thanks,
> Pablo
>
[Akhil] Sorry I missed out the previous reply from Sergio.

Thanks,
Akhil
>>
>> Sergio
>>
>>
>> On 26/09/2016 17:32, akhil.goyal at nxp.com wrote:
>>> From: Akhil Goyal 
>>>
>>> In IPsec-secgw application when TTL is decremented in IP header
>>> before forwarding the packet, checksum needs to be updated.
>>>
>>> In this patch an incremental checksum is added.
>>> Other applications(like l3fwd) are also doing so.
>>>
>>> Signed-off-by: Akhil Goyal 
>>> ---
>>>   examples/ipsec-secgw/ipip.h | 1 +
>>>   1 file changed, 1 insertion(+)
>>>
>>> diff --git a/examples/ipsec-secgw/ipip.h b/examples/ipsec-secgw/ipip.h
>>> index ff1dccd..ef059a9 100644
>>> --- a/examples/ipsec-secgw/ipip.h
>>> +++ b/examples/ipsec-secgw/ipip.h
>>> @@ -56,6 +56,7 @@ ipip_outbound(struct rte_mbuf *m, uint32_t offset,
>> uint32_t is_ipv6,
>>> if (inip4->ip_v == IPVERSION) {
>>> /* XXX This should be done by the forwarding engine instead
>> */
>>> inip4->ip_ttl -= 1;
>>> +   inip4->ip_sum += 1;
>>> ds_ecn = inip4->ip_tos;
>>> } else {
>>> inip6 = (struct ip6_hdr *)inip4;
>>
>>
>
>

[dpdk-dev] [PATCH v2 1/3] mem: fix hugepage mapping error messages

2016-10-05 Thread Thomas Monjalon

2016-10-04 20:07, Sergio Gonzalez Monroy:
> On 04/10/2016 18:17, Jean Tourrilhes wrote:
> > Running secondary is tricky due to the need to map the memory region
> > at the right place in VM, which is whatever primary has chosen. If the
> > base address for primary happens to by already mapped in the
> > secondary, we will hit precisely these error messages (depending if we
> > fail on the config region or the hugepages). This is why there is
> > already a comment about ASLR.
> >
> > The issue is that in most cases, remapping does not happen and "errno"
> > is not changed and therefore stale. In our case, we got a "permission
> > denied", which sent us down the wrong track. It's such a common error
> > for secondary that I feel this error message should be unambiguous and
> > helpful.
> > The call to close was also moved because close() may override errno.
> >
> > Signed-off-by: Jean Tourrilhes 
> > ---
> >   lib/librte_eal/linuxapp/eal/eal.c| 14 +++---
> >   lib/librte_eal/linuxapp/eal/eal_memory.c | 16 
> >   2 files changed, 23 insertions(+), 7 deletions(-)
> 
> Acked-by: Sergio Gonzalez Monroy 

Applied, thanks
A rebase was necessary because of this patch: http://dpdk.org/commit/c00ae961
Please check everything is OK.

[dpdk-dev] [RFC 0/7] changing mbuf pool handler

2016-10-05 Thread Hemant Agrawal

Hi Olivier,

> -Original Message-
> From: Hunt, David [mailto:david.hunt at intel.com]
> Hi Olivier,
> 
> 
> On 3/10/2016 4:49 PM, Olivier Matz wrote:
> > Hi Hemant,
> >
> > Thank you for your feedback.
> >
> > On 09/22/2016 01:52 PM, Hemant Agrawal wrote:
> >> Hi Olivier
> >>
> >> On 9/19/2016 7:12 PM, Olivier Matz wrote:
> >>> Hello,
> >>>
> >>> Following discussion from [1] ("usages issue with external mempool").
> >>>
> >>> This is a tentative to make the mempool_ops feature introduced by
> >>> David Hunt [2] more widely used by applications.
> >>>
> >>> It applies on top of a minor fix in mbuf lib [3].
> >>>
> >>> To sumarize the needs (please comment if I did not got it properly):
> >>>
> >>> - new hw-assisted mempool handlers will soon be introduced
> >>> - to make use of it, the new mempool API [4]
> (rte_mempool_create_empty,
> >>>rte_mempool_populate, ...) has to be used
> >>> - the legacy mempool API (rte_mempool_create) does not allow to
> change
> >>>the mempool ops. The default is "ring_p_c" depending on
> >>>flags.
> >>> - the mbuf helper (rte_pktmbuf_pool_create) does not allow to change
> >>>them either, and the default is RTE_MBUF_DEFAULT_MEMPOOL_OPS
> >>>("ring_mp_mc")
> >>> - today, most (if not all) applications and examples use either
> >>>rte_pktmbuf_pool_create or rte_mempool_create to create the mbuf
> >>>pool, making it difficult to take advantage of this feature with
> >>>existing apps.
> >>>
> >>> My initial idea was to deprecate both rte_pktmbuf_pool_create() and
> >>> rte_mempool_create(), forcing the applications to use the new API,
> >>> which is more flexible. But after digging a bit, it appeared that
> >>> rte_mempool_create() is widely used, and not only for mbufs.
> >>> Deprecating it would have a big impact on applications, and
> >>> replacing it with the new API would be overkill in many use-cases.
> >> I agree with the proposal.
> >>
> >>> So I finally tried the following approach (inspired from a
> >>> suggestion Jerin [5]):
> >>>
> >>> - add a new mempool_ops parameter to rte_pktmbuf_pool_create().
> This
> >>>unfortunatelly breaks the API, but I implemented an ABI compat layer.
> >>>If the patch is accepted, we could discuss how to announce/schedule
> >>>the API change.
> >>> - update the applications and documentation to prefer
> >>>rte_pktmbuf_pool_create() as much as possible
> >>> - update most used examples (testpmd, l2fwd, l3fwd) to add a new
> command
> >>>line argument to select the mempool handler
> >>>
> >>> I hope the external applications would then switch to
> >>> rte_pktmbuf_pool_create(), since it supports most of the use-cases
> >>> (even priv_size != 0, since we can call rte_mempool_obj_iter() after) .
> >>>
> >> I will still prefer if you can add the "rte_mempool_obj_cb_t *obj_cb,
> >> void *obj_cb_arg" into "rte_pktmbuf_pool_create". This single
> >> consolidated wrapper will almost make it certain that applications
> >> will not try to use rte_mempool_create for packet buffers.
> > The patch changes the example applications. I'm not sure I understand
> > why adding these arguments would force application to not use
> > rte_mempool_create() for packet buffers. Do you have a application in
> mind?
> >
> > For the mempool_ops parameter, we must pass it at init because we need
> > to know the mempool handler before populating the pool. For object
> > initialization, it can be done after, so I thought it was better to
> > reduce the number of arguments to avoid to fall in the
> > mempool_create() syndrom :)
> 
> I also agree with the proposal. Looks cleaner.
> 
> I would lean to the side of keeping the parameters to the minimum, i.e.
> not adding *obj_cb and *obj_cb_arg into rte_pktmbuf_pool_create.
> Developers always have the option of going with rte_mempool_create if they
> need more fine-grained control.

[Hemant] The implementations with hw offloaded mempools don't want developer 
using *rte_mempool_create* for packet buffer pools. 
This API does not work for hw offloaded mempool. 

Also, *rte_mempool_create_empty* - may not be convenient for many application, 
as it requires calling  4+ APIs.

Olivier is not in favor of deprecating the *rte_mempool_create*.   I agree with 
concerns raised by him. 

Essentially, I was suggesting to upgrade * rte_pktmbuf_pool_create* to be like 
*rte_mempool_create*  for packet buffers exclusively.

This will provide a clear segregation for API usages w.r.t the packet buffer 
pool vs all other type of mempools. 


Regards,
Hemant

> 
> Regards,
> Dave.
> 
> > Any other opinions?
> >
> > Regards,
> > Olivier

[dpdk-dev] [PATCH] eal: fix c++ compilation issue with rte_delay_us()

2016-10-05 Thread Thomas Monjalon

2016-10-03 18:27, Konstantin Ananyev:
> When compiling with C++, it treats
> void (*rte_delay_us)(unsigned int us);
> as definition of the global variable.
> So further linking with librte_eal fails.
> 
> Fixes: b4d63fb62240 ("eal: customize delay function")

Applied, thanks

I don't understand why it was not failing with C compilation?

[dpdk-dev] [PATCH v2] cryptodev: fix compilation error in SUSE 11 SP2

2016-10-05 Thread Adrien Mazarguil

On Wed, Oct 05, 2016 at 03:45:51AM +0100, Pablo de Lara wrote:
> This commit fixes following build error, which happens in SUSE 11 SP2,
> with gcc 4.5.1:
> 
> In file included from lib/librte_cryptodev/rte_cryptodev.c:70:0:
> lib/librte_cryptodev/rte_cryptodev.h:772:7:
> error: flexible array member in otherwise empty struct
> 
> Fixes: 347a1e037fd3 ("lib: use C99 syntax for zero-size arrays")
> 
> Signed-off-by: Pablo de Lara 
> ---
> 
> Changes in v2:
> - Fixed commit message
> 
>  lib/librte_cryptodev/rte_cryptodev.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/librte_cryptodev/rte_cryptodev.h 
> b/lib/librte_cryptodev/rte_cryptodev.h
> index d565f39..6ad5e91 100644
> --- a/lib/librte_cryptodev/rte_cryptodev.h
> +++ b/lib/librte_cryptodev/rte_cryptodev.h
> @@ -773,7 +773,7 @@ struct rte_cryptodev_sym_session {
>   } __rte_aligned(8);
>   /**< Public symmetric session details */
>  
> - char _private[];
> + __extension__ char _private[0];
>   /**< Private session material */
>  };
>  
> -- 
> 2.7.4

Acked-by: Adrien Mazarguil 

-- 
Adrien Mazarguil
6WIND

[dpdk-dev] [PATCH v2] eal: don't fail secondary if primary is missing tailqs

2016-10-05 Thread Jean Tourrilhes

If the primary and secondary process were build using different build
systems, the list of constructors included by the linker in each
binary might be different. Tailqs are registered via constructors, so
the linker magic will directly impact which tailqs are registered with
the primary and the secondary.

DPDK currently assumes that the secondary has a subset of the tailqs
registered at the primary. In some build scenario, the secondary might
register a tailq that the primary did not register. In this case,
instead of exiting with a panic, just unregister the offending tailq
and allow the secondary to run.

Signed-off-by: Jean Tourrilhes 
---
 lib/librte_eal/common/eal_common_tailqs.c | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_tailqs.c 
b/lib/librte_eal/common/eal_common_tailqs.c
index bb08ec8..cf5a771 100644
--- a/lib/librte_eal/common/eal_common_tailqs.c
+++ b/lib/librte_eal/common/eal_common_tailqs.c
@@ -143,6 +143,8 @@ rte_eal_tailq_update(struct rte_tailq_elem *t)
t->head = rte_eal_tailq_create(t->name);
} else {
t->head = rte_eal_tailq_lookup(t->name);
+   if (t->head != NULL)
+   rte_tailqs_count++;
}
 }

@@ -178,19 +180,27 @@ int
 rte_eal_tailqs_init(void)
 {
struct rte_tailq_elem *t;
+   void *tmp_te;

rte_tailqs_count = 0;

-   TAILQ_FOREACH(t, _tailq_elem_head, next) {
+   TAILQ_FOREACH_SAFE(t, _tailq_elem_head, next, tmp_te) {
/* second part of register job for "early" tailqs, see
 * rte_eal_tailq_register and EAL_REGISTER_TAILQ */
rte_eal_tailq_update(t);
if (t->head == NULL) {
RTE_LOG(ERR, EAL,
"Cannot initialize tailq: %s\n", t->name);
-   /* no need to TAILQ_REMOVE, we are going to panic in
-* rte_eal_init() */
-   goto fail;
+   if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+   /* no need to TAILQ_REMOVE, we are going
+* to panic in rte_eal_init() */
+   goto fail;
+   } else {
+   /* This means our list of constructor is
+* no the same as primary. Just remove
+* that missing tailq and continue */
+   TAILQ_REMOVE(_tailq_elem_head, t, next);
+   }
}
}

[dpdk-dev] [RFC 0/7] changing mbuf pool handler

2016-10-05 Thread Hunt, David

Hi Olivier,


On 3/10/2016 4:49 PM, Olivier Matz wrote:
> Hi Hemant,
>
> Thank you for your feedback.
>
> On 09/22/2016 01:52 PM, Hemant Agrawal wrote:
>> Hi Olivier
>>
>> On 9/19/2016 7:12 PM, Olivier Matz wrote:
>>> Hello,
>>>
>>> Following discussion from [1] ("usages issue with external mempool").
>>>
>>> This is a tentative to make the mempool_ops feature introduced
>>> by David Hunt [2] more widely used by applications.
>>>
>>> It applies on top of a minor fix in mbuf lib [3].
>>>
>>> To sumarize the needs (please comment if I did not got it properly):
>>>
>>> - new hw-assisted mempool handlers will soon be introduced
>>> - to make use of it, the new mempool API [4] (rte_mempool_create_empty,
>>>rte_mempool_populate, ...) has to be used
>>> - the legacy mempool API (rte_mempool_create) does not allow to change
>>>the mempool ops. The default is "ring_p_c" depending on
>>>flags.
>>> - the mbuf helper (rte_pktmbuf_pool_create) does not allow to change
>>>them either, and the default is RTE_MBUF_DEFAULT_MEMPOOL_OPS
>>>("ring_mp_mc")
>>> - today, most (if not all) applications and examples use either
>>>rte_pktmbuf_pool_create or rte_mempool_create to create the mbuf
>>>pool, making it difficult to take advantage of this feature with
>>>existing apps.
>>>
>>> My initial idea was to deprecate both rte_pktmbuf_pool_create() and
>>> rte_mempool_create(), forcing the applications to use the new API, which
>>> is more flexible. But after digging a bit, it appeared that
>>> rte_mempool_create() is widely used, and not only for mbufs. Deprecating
>>> it would have a big impact on applications, and replacing it with the
>>> new API would be overkill in many use-cases.
>> I agree with the proposal.
>>
>>> So I finally tried the following approach (inspired from a suggestion
>>> Jerin [5]):
>>>
>>> - add a new mempool_ops parameter to rte_pktmbuf_pool_create(). This
>>>unfortunatelly breaks the API, but I implemented an ABI compat layer.
>>>If the patch is accepted, we could discuss how to announce/schedule
>>>the API change.
>>> - update the applications and documentation to prefer
>>>rte_pktmbuf_pool_create() as much as possible
>>> - update most used examples (testpmd, l2fwd, l3fwd) to add a new command
>>>line argument to select the mempool handler
>>>
>>> I hope the external applications would then switch to
>>> rte_pktmbuf_pool_create(), since it supports most of the use-cases (even
>>> priv_size != 0, since we can call rte_mempool_obj_iter() after) .
>>>
>> I will still prefer if you can add the "rte_mempool_obj_cb_t *obj_cb,
>> void *obj_cb_arg" into "rte_pktmbuf_pool_create". This single
>> consolidated wrapper will almost make it certain that applications will
>> not try to use rte_mempool_create for packet buffers.
> The patch changes the example applications. I'm not sure I understand
> why adding these arguments would force application to not use
> rte_mempool_create() for packet buffers. Do you have a application in mind?
>
> For the mempool_ops parameter, we must pass it at init because we need
> to know the mempool handler before populating the pool. For object
> initialization, it can be done after, so I thought it was better to
> reduce the number of arguments to avoid to fall in the mempool_create()
> syndrom :)

I also agree with the proposal. Looks cleaner.

I would lean to the side of keeping the parameters to the minimum, i.e. 
not adding *obj_cb and *obj_cb_arg into rte_pktmbuf_pool_create. 
Developers always have the option of going with rte_mempool_create if 
they need more fine-grained control.

Regards,
Dave.

> Any other opinions?
>
> Regards,
> Olivier

[dpdk-dev] [PATCH 00/19] KNI checkpatch cleanup

2016-10-05 Thread Pattan, Reshma

Hi Ferruh,


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ferruh Yigit
> Sent: Thursday, September 15, 2016 4:46 PM
> To: dev at dpdk.org
> Cc: Stephen Hemminger ; Ferruh Yigit
> 
> Subject: [dpdk-dev] [PATCH 00/19] KNI checkpatch cleanup
> 
> KNI checkpatch cleanup, mostly non-functional but cosmetic modifications.
> Only functional change is related logging, switched to kernel dynamic logging
> and compile time KNI debug options removed, some log message levels
> updated.
> 
> Ferruh Yigit (19):
>   kni: move externs to the header file
>   kni: uninitialize global variables
>   kni: make static struct const
>   kni: whitespace, indentation, long line corrections
>   kni: prefer unsigned int to unsigned
>   kni: remove useless return
>   kni: comparisons should place the constant on the right
>   kni: trailing statements should be on next line
>   kni: do not use assignment in if condition
>   kni: macros with complex values should be enclosed in parentheses
>   kni: prefer min_t to min
>   kni: prefer ether_addr_copy to memcpy
>   kni: update kernel logging
>   kni: remove unnecessary 'out of memory' message
>   kni: move functions to eliminate function declarations
>   kni: remove compile time debug configuration
>   kni: updated log messages
>   kni: prefer uint32_t to unsigned int
>   kni: move kernel version ifdefs to compat header
> 

Patches 4,5,11 and 13-19 are failed to apply.

Thanks,
Reshma

[dpdk-dev] [PATCH 1/1] eal: Don't fail secondary if primary is missing tailqs

2016-10-05 Thread Jean Tourrilhes

On Wed, Oct 05, 2016 at 07:09:14PM +0200, Thomas Monjalon wrote:
> 
> Probably that you would have some aligned builds if Snort was using
> a pkg-config approach to link DPDK.

I seriously doubt it, but maybe there is some deep linker
magic that would pick the appropriate set of constructor.

> > For tailq, I agree. For mempool constructors, order do matter.
> 
> I don't know why such a complex function (rte_mempool_register_ops) is
> called inside a constructor. Maybe that's the main problem.

No. The problem is that the list of constructors linked by the
linker in each binary is different, whereas DPDK expect them to be the
same.
Regards,

Jean

[dpdk-dev] [PATCH] eal: fix c++ compilation issue with rte_delay_us()

2016-10-05 Thread Ananyev, Konstantin

Hi Thomas,

> 
> 2016-10-03 18:27, Konstantin Ananyev:
> > When compiling with C++, it treats
> > void (*rte_delay_us)(unsigned int us);
> > as definition of the global variable.
> > So further linking with librte_eal fails.
> >
> > Fixes: b4d63fb62240 ("eal: customize delay function")
> 
> Applied, thanks
> 
> I don't understand why it was not failing with C compilation?

Don't know off hand.
Yes, I would expect gcc to fail with same symptoms too.
But by some reason it puts it makes it a 'common' symbol:

$ cat rttm1.c

#include 
#include 
#include 

int main(int argc, char *argv[])
{
int ret = rte_eal_init(argc, argv);
rte_delay_us(1);
printf("return code: %d\n", ret);
return ret;
}

$ gcc -m64 -pthread -o rttm1 rttm1.o -ldl   -L/${RTE_SDK}/${RTE_TARGET}/lib 
-Wl,-lrte_eal
$ nm rttm1.o | grep rte_delay_us
0008 C rte_delay_us

Konstantin

[dpdk-dev] [PATCH v2] app/testpmd: fix DCB config issue

2016-10-05 Thread Iremonger, Bernard

> -Original Message-
> From: Lu, Wenzhuo
> Sent: Monday, September 26, 2016 2:11 AM
> To: dev at dpdk.org
> Cc: De Lara Guarch, Pablo ; Iremonger,
> Bernard ; Lu, Wenzhuo
> 
> Subject: [PATCH v2] app/testpmd: fix DCB config issue
> 
> An issue is found that DCB cannot be configured on ixgbe NICs. It's said the
> TX queue number is not right.
> On ixgbe the max TX queue number is not fixed, it depends on the multi-
> queue mode.
> 
> This patch adds the device configuration before getting info in the DCB
> configuration process. So the right info can be got depending on the
> configuration.
> 
> Fixes: 1a572499beb6 (app/testpmd: setup DCB forwarding based on traffic
> class)
> Signed-off-by: Wenzhuo Lu 

Acked-by: Bernard Iremonger

[dpdk-dev] [PATCH 2/2] kni: remove unnecessary ethtool files

2016-10-05 Thread Remy Horton



On 30/09/2016 11:10, Ferruh Yigit wrote:
> Signed-off-by: Ferruh Yigit 
> ---
>  lib/librte_eal/linuxapp/kni/Makefile   |   3 -
>  .../linuxapp/kni/ethtool/igb/igb_debugfs.c |  28 --
>  .../linuxapp/kni/ethtool/igb/igb_hwmon.c   | 260 ---
>  .../linuxapp/kni/ethtool/igb/igb_procfs.c  | 363 
> -
>  4 files changed, 654 deletions(-)
>  delete mode 100644 lib/librte_eal/linuxapp/kni/ethtool/igb/igb_debugfs.c
>  delete mode 100644 lib/librte_eal/linuxapp/kni/ethtool/igb/igb_hwmon.c
>  delete mode 100644 lib/librte_eal/linuxapp/kni/ethtool/igb/igb_procfs.c

Acked-by: Remy Horton

[dpdk-dev] [PATCH 1/2] kni: remove unused ethtool files

2016-10-05 Thread Remy Horton



On 30/09/2016 11:10, Ferruh Yigit wrote:
> Signed-off-by: Ferruh Yigit 
> ---
>  lib/librte_eal/linuxapp/kni/Makefile   |2 -
>  lib/librte_eal/linuxapp/kni/ethtool/igb/igb_ptp.c  |  944 -
>  lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.c  | 1482 
> 
>  .../linuxapp/kni/ethtool/igb/kcompat_ethtool.c | 1171 
>  .../linuxapp/kni/ethtool/ixgbe/ixgbe_sriov.h   |   73 -
>  5 files changed, 3672 deletions(-)
>  delete mode 100644 lib/librte_eal/linuxapp/kni/ethtool/igb/igb_ptp.c
>  delete mode 100644 lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.c
>  delete mode 100644 lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat_ethtool.c
>  delete mode 100644 lib/librte_eal/linuxapp/kni/ethtool/ixgbe/ixgbe_sriov.h

Acked-by: Remy Horton

[dpdk-dev] [PATCH 1/1] eal: Don't fail secondary if primary is missing tailqs

2016-10-05 Thread David Marchand

Hello,

On Tue, Oct 4, 2016 at 6:59 PM, Jean Tourrilhes  wrote:
> On Tue, Oct 04, 2016 at 02:11:39PM +0100, Sergio Gonzalez Monroy wrote:
>> The case you are trying to fix is, as an example, when your secondary app is
>> using LPM but your primary is not.
>> So basically with this patch, you are removing the tailq for LPM on
>> secondary and continuing as normal, is that the case?
>
> The secondary can't use tailq types that the primary does not
> have, because they are shared across the shared memory.

I am not a "multi process" user but afaik the primary process is
responsible for filling the shared memory.
The secondary processes look at it.
So having unaligned processes can't work.

> What happens is that the primary and secondary did not compile
> in the same list of tailq. See previous e-mail :
> http://dpdk.org/ml/archives/dev/2016-September/047329.html
> The reason it's happening is that the secondary was not
> compiled with the DPDK build system, but with the build system of the
> application (in this case, Snort). Oubviously, porting the application
> to the DPDK build system is not practical, so we need to live with
> this case.
> The build system of the application does not have all the
> subtelties of the DPDK build system, and ends up including *all* the
> constructors, wether they are used or not in the code. Moreover, they
> are included in a different order. Actually, by default the builds
> include no constructors at all (which is a big fail), so the library
> needs to be included with --whole-archive (see Snort DPDK
> instructions).

I thought you had unaligned binaries.
You are compiling only one binary ?

>> I am not convinced about this approach.
>
> I agree that the whole constructor approach is flaky and my
> patch is only a band aid. Constructors should be entirely removed
> IMHO, and a more deterministic init method should be used instead of
> depending on linker magic.
> Note that the other constructors happen to work right in my
> case, but that's probably pure luck. The list of mempool constructors
> happen to be the same and in the same order (order matters for mempool
> constructors). The app is not using spinlock, hash, crc and acl, so
> I did not look if the lists did match.

I am not sure Sergio is talking about the constructor approach.

Anyway, the constructors invocation order should not matter.
Primary and secondary processes build their local tailq entries list
in constructors (so far, I can't see how this is wrong).
"Later", each process updates this list with the actual pointer to the
lists by looking at the shared memory in rte_eal_init (calling
rte_eal_tailqs_init).

What matters is that secondary tailqs are a subset of the primary tailqs.

I still have some trouble understanding what you are trying to do.
As Sergio asked, can you come up with a simplified example/use case ?

Thanks.

-- 
David Marchand

[dpdk-dev] [PATCH 1/1] eal: Don't fail secondary if primary is missing tailqs

2016-10-05 Thread Jean Tourrilhes

On Wed, Oct 05, 2016 at 09:58:01AM +0200, David Marchand wrote:
> Hello,

Hi there,

> I thought you had unaligned binaries.
> You are compiling only one binary ?

Primary is compiled using the DPDK build process.
Secondary is build using the Snort build process.
Both are pointing to the exact same libdpdk.a.

> I am not sure Sergio is talking about the constructor approach.

But, this is exactly the cause of the problem.

> Anyway, the constructors invocation order should not matter.

For tailq, I agree. For mempool constructors, order do matter.

> Primary and secondary processes build their local tailq entries list
> in constructors (so far, I can't see how this is wrong).
> "Later", each process updates this list with the actual pointer to the
> lists by looking at the shared memory in rte_eal_init (calling
> rte_eal_tailqs_init).
> 
> What matters is that secondary tailqs are a subset of the primary tailqs.

Which is not the case for me, I have secondary including all
tailqs, and primary only having a subset.
Check here :
http://dpdk.org/ml/archives/dev/2016-September/047329.html

> I still have some trouble understanding what you are trying to do.

Having things work ;-)

> As Sergio asked, can you come up with a simplified example/use case ?

Not trivial. I'll see what I can do.

> Thanks.
> 
> 
> -- 
> David Marchand

Regards,

Jean

[dpdk-dev] [PATCH v2 1/3] mem: fix hugepage mapping error messages

2016-10-05 Thread Jean Tourrilhes

On Wed, Oct 05, 2016 at 11:51:48AM +0200, Thomas Monjalon wrote:
> 
> Applied, thanks
> A rebase was necessary because of this patch: http://dpdk.org/commit/c00ae961
> Please check everything is OK.

Tested today's master. Working as expected.
Thanks !

Jean

[dpdk-dev] [PATCH v2]:rte_timer:timer lag issue correction

2016-10-05 Thread Pattan, Reshma

Hi  Suyash,

> -Original Message-
> From: Karmarkar Suyash [mailto:skarmarkar at sonusnet.com]
> Sent: Tuesday, October 4, 2016 11:36 PM
> To: Sanford, Robert ; dev at dpdk.org;
> thomas.monjalon at 6wind.com; Pattan, Reshma 
> Subject: RE: [PATCH v2]:rte_timer:timer lag issue correction
> 
> Thanks !! So as next steps I will push the patch .
> 

Thomas will apply the patch, you no need to do anything.

Thanks,
Reshma

[dpdk-dev] [PATCH v7] net/virtio: add set_mtu in virtio

2016-10-05 Thread Kavanagh, Mark B

>Hi All,
>   Is there any further comments or modifications required for this patch, 
> or what next
>steps do you guys suggest here ?

Hi Souvik,

Some minor comments inline.

Thanks,
Mark

>
>--
>Regards,
>Souvik
>
>-Original Message-
>From: Dey, Souvik
>Sent: Saturday, October 1, 2016 10:09 AM
>To: mark.b.kavanagh at intel.com; yuanhan.liu at linux.intel.com; stephen at 
>networkplumber.org;
>dev at dpdk.org
>Subject: RE: [PATCH v7] net/virtio: add set_mtu in virtio
>
>Hi Liu/Stephen/Mark,
>
>   I have submitted Version 7 of this patch. Do let me know if this looks 
> proper.
>
>--
>Regards,
>Souvik
>
>-Original Message-
>From: Dey, Souvik
>Sent: Thursday, September 29, 2016 4:32 PM
>To: mark.b.kavanagh at intel.com; yuanhan.liu at linux.intel.com; stephen at 
>networkplumber.org;
>dev at dpdk.org
>Cc: Dey, Souvik 
>Subject: [PATCH v7] net/virtio: add set_mtu in virtio
>
>
>Virtio interfaces do not currently allow the user to specify a particular
>Maximum Transmission Unit (MTU).Consequently, the MTU of Virtio interfaces
>is typically set to the Ethernet default value of 1500.
>This is problematic in the case of cloud deployments, in which a specific
>(and potentially non-standard) MTU needs to be set by a DHCP server, which
>needs to be honored by all interfaces across the traffic path.To achieve
>this Virtio interfaces should support setting of MTU.
>In case when GRE/VXLAN tunneling is used for internal communication, there
>will be an overhead added by the infrastructure in the packet over and
>above the ETHER MTU of 1518. So to take care of this overhead in these
>cases the DHCP server corrects the L3 MTU to 1454. But since virtio
>interfaces was not having the MTU set functionality that MTU sent by the
>DHCP server was ignored and the instance will still send packets with 1500
>MTU which after encapsulation will become more than 1518 and eventually
>gets dropped in the infrastructure.
>By adding an additional 'set_mtu' function to the Virtio driver, we can
>honor the MTU sent by the DHCP server. The dhcp server/controller can
>then leverage this 'set_mtu' functionality to resolve the above
>mentioned issue of packets getting dropped due to incorrect size.
>
>
>Signed-off-by: Souvik Dey 
>
>---
>Changes in v7:
>- Replaced the CRC_LEN with the merge rx buf header length.
>- Changed the frame_len max validation to VIRTIO_MAX_RX_PKTLEN.
>Changes in v6:
>- Description of change corrected
>- Corrected the identations
>- Corrected the subject line too
>- The From line was also not correct
>- Re-submitting as the below patch was not proper
>Changes in v5:
>- Fix log message for out-of-bounds MTU parameter in virtio_mtu_set
>- Calculate frame size, based on 'mtu' parameter
>- Corrected the upper bound and lower bound checks in virtio_mtu_set
>Changes in v4: Incorporated review comments.
>Changes in v3: Corrected few style errors as reported by sys-stv.
>Changes in v2: Incorporated review comments.
>
> drivers/net/virtio/virtio_ethdev.c | 16 
> 1 file changed, 16 insertions(+)
>
>diff --git a/drivers/net/virtio/virtio_ethdev.c 
>b/drivers/net/virtio/virtio_ethdev.c
>index 423c597..1dbfea6 100644
>--- a/drivers/net/virtio/virtio_ethdev.c
>+++ b/drivers/net/virtio/virtio_ethdev.c
>@@ -653,12 +653,20 @@ virtio_dev_allmulticast_disable(struct rte_eth_dev *dev)
>PMD_INIT_LOG(ERR, "Failed to disable allmulticast");
> }
>
>+#define VLAN_TAG_LEN   4/* 802.3ac tag (not DMA'd) */

There should be a blank line between the #define and the function prototype 
beneath.

>+static int  virtio_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
>+{
>+  struct virtio_hw *hw = dev->data->dev_private;
>+  uint32_t ether_hdr_len = ETHER_HDR_LEN + VLAN_TAG_LEN +
>+  hw->vtnet_hdr_size;

I'll rely on Stephen and Yuanhan's judgment for this.

>+  uint32_t frame_size = mtu + ether_hdr_len;
>+
>+  if (mtu < ETHER_MIN_MTU || frame_size > VIRTIO_MAX_RX_PKTLEN) {
>+  PMD_INIT_LOG(ERR, "MTU should be between %d and %d\n",
>+  ETHER_MIN_MTU, VIRTIO_MAX_RX_PKTLEN);

Shouldn't last format print parameter should be (VIRTIO_MAX_RX_PKTLEN - 
ether_hdr_len)?
i.e PMD_INIT_LOG(ERR, "MTU should%d\n",
  ETHER_MIN_MTU, (VIRTIO_MAX_RX_PKTLEN - ether_hdr_len));

>+  return -EINVAL;
>+  }
>+  return 0;
>+}
>
> /*
>  * dev_ops for virtio, bare necessities for basic operation
>  */
>@@ -677,7 +685,6 @@ static const struct eth_dev_ops virtio_eth_dev_ops = {
>   .allmulticast_enable = virtio_dev_allmulticast_enable,
>   .allmulticast_disable= virtio_dev_allmulticast_disable,
>+  .mtu_set = virtio_mtu_set,
>   .dev_infos_get   = virtio_dev_info_get,
>   .stats_get   = virtio_dev_stats_get,
>   .xstats_get  = virtio_dev_xstats_get,
>--
>2.9.3.windows.1

[dpdk-dev] [PATCH v2 1/8] mbuf: add function to dump ol flag list

2016-10-05 Thread De Lara Guarch, Pablo

Hi Olivier,

> -Original Message-
> From: Olivier Matz [mailto:olivier.matz at 6wind.com]
> Sent: Friday, September 09, 2016 12:55 AM
> To: dev at dpdk.org; De Lara Guarch, Pablo
> Subject: [PATCH v2 1/8] mbuf: add function to dump ol flag list
> 
> The functions rte_get_rx_ol_flag_name() and rte_get_tx_ol_flag_name()
> can dump one flag, or set of flag that are part of the same mask (ex:
> PKT_TX_UDP_CKSUM, part of PKT_TX_L4_MASK). But they are not designed
> to
> dump the list of flags contained in mbuf->ol_flags.
> 
> This commit introduce new functions to do that. Similarly to the packet
> type dump functions, the goal is to factorize the code that could be
> used in several applications and reduce the risk of desynchronization
> between the flags and the dump functions.
> 
> Signed-off-by: Olivier Matz 
> ---
>  doc/guides/rel_notes/release_16_11.rst |  5 ++
>  lib/librte_mbuf/rte_mbuf.c | 91
> ++
>  lib/librte_mbuf/rte_mbuf.h | 28 +++
>  lib/librte_mbuf/rte_mbuf_version.map   |  2 +
>  4 files changed, 126 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/release_16_11.rst
> b/doc/guides/rel_notes/release_16_11.rst
> index 36111f3..a877e58 100644
> --- a/doc/guides/rel_notes/release_16_11.rst
> +++ b/doc/guides/rel_notes/release_16_11.rst
> @@ -50,6 +50,11 @@ New Features
> 
>Added new functions ``rte_get_ptype_*()`` to dump a packet type as a
> string.
> 
> +* **Added functions to dump the offload flags as a string.**
> +
> +  Added two new functions ``rte_get_rx_ol_flag_list()`` and
> +  ``rte_get_tx_ol_flag_list()`` to dump offload flags as a string.
> +
>  Resolved Issues
>  ---
> 
> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> index fb2b962..56f37e6 100644
> --- a/lib/librte_mbuf/rte_mbuf.c
> +++ b/lib/librte_mbuf/rte_mbuf.c
> @@ -319,6 +319,53 @@ const char *rte_get_rx_ol_flag_name(uint64_t mask)
>   }
>  }
> 
> +struct flag_mask {
> + uint64_t flag;
> + uint64_t mask;
> + const char *default_name;
> +};
> +
> +/* write the list of rx ol flags in buffer buf */
> +int rte_get_rx_ol_flag_list(uint64_t mask, char *buf, size_t buflen)

Return type should go in a separate line (same case in the other function).

Thanks,
Pablo

[dpdk-dev] [PATCH v2 4/8] app/testpmd: add option to enable lro

2016-10-05 Thread De Lara Guarch, Pablo

Hi Olivier,

> -Original Message-
> From: Olivier Matz [mailto:olivier.matz at 6wind.com]
> Sent: Friday, September 09, 2016 12:56 AM
> To: dev at dpdk.org; De Lara Guarch, Pablo
> Subject: [PATCH v2 4/8] app/testpmd: add option to enable lro
> 
> Introduce a new argument '--enable-lro' to ask testpmd to enable the LRO
> feature on enabled ports, like it's done for '--enable-rx-cksum' for
> instance.
> 
> Signed-off-by: Olivier Matz 
> ---
>  app/test-pmd/parameters.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
> index 8792c2c..612ad37 100644
> --- a/app/test-pmd/parameters.c
> +++ b/app/test-pmd/parameters.c
> @@ -150,6 +150,7 @@ usage(char* progname)
>  "If the drop-queue doesn't exist, the packet is dropped. "
>  "By default drop-queue=127.\n");
>   printf("  --crc-strip: enable CRC stripping by hardware.\n");
> + printf("  --enable-lro: enable large receive offload.\n");
>   printf("  --enable-rx-cksum: enable rx hardware checksum
> offload.\n");
>   printf("  --disable-hw-vlan: disable hardware vlan.\n");
>   printf("  --disable-hw-vlan-filter: disable hardware vlan filter.\n");
> @@ -525,6 +526,7 @@ launch_args_parse(int argc, char** argv)
>   { "pkt-filter-size",1, 0, 0 },
>   { "pkt-filter-drop-queue",  1, 0, 0 },
>   { "crc-strip",  0, 0, 0 },
> + { "enable-lro", 0, 0, 0 },
>   { "enable-rx-cksum",0, 0, 0 },
>   { "enable-scatter", 0, 0, 0 },
>   { "disable-hw-vlan",0, 0, 0 },
> @@ -765,6 +767,8 @@ launch_args_parse(int argc, char** argv)
>   }
>   if (!strcmp(lgopts[opt_idx].name, "crc-strip"))
>   rx_mode.hw_strip_crc = 1;
> + if (!strcmp(lgopts[opt_idx].name, "enable-lro"))
> + rx_mode.enable_lro = 1;
>   if (!strcmp(lgopts[opt_idx].name, "enable-scatter"))
>   rx_mode.enable_scatter = 1;
>   if (!strcmp(lgopts[opt_idx].name, "enable-rx-cksum"))
> --
> 2.8.1

Could you add this new parameter in the testpmd documentation?

Thanks,
Pablo

[dpdk-dev] [PATCH v2] cryptodev: fix compilation error in SUSE 11 SP2

2016-10-05 Thread Pablo de Lara

This commit fixes following build error, which happens in SUSE 11 SP2,
with gcc 4.5.1:

In file included from lib/librte_cryptodev/rte_cryptodev.c:70:0:
lib/librte_cryptodev/rte_cryptodev.h:772:7:
error: flexible array member in otherwise empty struct

Fixes: 347a1e037fd3 ("lib: use C99 syntax for zero-size arrays")

Signed-off-by: Pablo de Lara 
---

Changes in v2:
- Fixed commit message

 lib/librte_cryptodev/rte_cryptodev.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_cryptodev/rte_cryptodev.h 
b/lib/librte_cryptodev/rte_cryptodev.h
index d565f39..6ad5e91 100644
--- a/lib/librte_cryptodev/rte_cryptodev.h
+++ b/lib/librte_cryptodev/rte_cryptodev.h
@@ -773,7 +773,7 @@ struct rte_cryptodev_sym_session {
} __rte_aligned(8);
/**< Public symmetric session details */

-   char _private[];
+   __extension__ char _private[0];
/**< Private session material */
 };

-- 
2.7.4

[dpdk-dev] [PATCH] cryptodev: fix compilation error in SUSE 11 SP2

2016-10-05 Thread De Lara Guarch, Pablo

Hi Adrien,

> -Original Message-
> From: Adrien Mazarguil [mailto:adrien.mazarguil at 6wind.com]
> Sent: Friday, September 30, 2016 1:34 AM
> To: De Lara Guarch, Pablo
> Cc: dev at dpdk.org; Doherty, Declan
> Subject: Re: [PATCH] cryptodev: fix compilation error in SUSE 11 SP2
> 
> On Thu, Sep 29, 2016 at 07:30:31PM +, De Lara Guarch, Pablo wrote:
> > > -Original Message-
> > > From: Adrien Mazarguil [mailto:adrien.mazarguil at 6wind.com]
> > > Sent: Tuesday, September 27, 2016 12:45 AM
> > > To: De Lara Guarch, Pablo
> > > Cc: dev at dpdk.org; Doherty, Declan
> > > Subject: Re: [PATCH] cryptodev: fix compilation error in SUSE 11 SP2
> > >
> > > On Mon, Sep 26, 2016 at 10:50:35PM +0100, Pablo de Lara wrote:
> > > > This commit fixes following build error, which happens in SUSE 11 SP2,
> > > > with gcc 4.5.1:
> > > >
> > > > In file included from lib/librte_cryptodev/rte_cryptodev.c:71:0:
> > > > lib/librte_cryptodev/rte_cryptodev_pmd.h:76:7:
> > > > error: flexible array member in otherwise empty struct
> > > >
> > > > Fixes: 347a1e037fd3 ("lib: use C99 syntax for zero-size arrays")
> > > >
> > > > Signed-off-by: Pablo de Lara 
> > >
> > > Hmm, this error message does not seem related to your patch. Assuming a
> > > similar error is caused by the original code, I think there is a more
> > > important issue as the struct should not be empty. Can you check the
> > > error?
> >
> > Well, I don't really understand what is the difference between array[] and
> array[0],
> > I thought both were the same, but some compilers only accept the latter.
> 
> Before array[] got standardized by C99, a common trick was to use array[0],
> in a sense they are similar except for this one case: a struct with a single
> array[] field is explicitly not allowed in C99 since it causes the structure
> to be empty (this syntax only provides an alignment constraint for what
> follows in case padding is required), no such problem with array[0] which
> although nonstandard, is an accepted behavior, sizeof(struct foo) may yield
> 0 without complaint.
> 
> > In any case, the struct will not be empty, as there are other fields, that 
> > are
> not variable sized.
> >
> > I saw that in your patch you made these two changes (among others):
> >
> > diff --git a/lib/librte_cryptodev/rte_cryptodev.h
> b/lib/librte_cryptodev/rte_cryptodev.h
> > index affbdec..1e30a19 100644
> > --- a/lib/librte_cryptodev/rte_cryptodev.h
> > +++ b/lib/librte_cryptodev/rte_cryptodev.h
> > @@ -759,7 +759,7 @@ struct rte_cryptodev_sym_session {
> > } __rte_aligned(8);
> > /**< Public symmetric session details */
> >
> > -   char _private[0];
> > +   char _private[];
> > /**< Private session material */
> >  };
> >
> > diff --git a/lib/librte_cryptodev/rte_cryptodev_pmd.h
> b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > index 7d049ea..42e7b79 100644
> > --- a/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > +++ b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > @@ -71,7 +71,7 @@ struct rte_cryptodev_session {
> > struct rte_mempool *mp;
> > } __rte_aligned(8);
> >
> > -   char _private[0];
> > +   __extension__ char _private[0];
> >  };
> >
> > So I would expect the same change in both, as they are almost identical,
> > but you took different approaches (do you know why? I would like to know
> :))
> 
> Yes, this was done to address the exact same error (probably with the same
> old GCC version (4.4.7 perhaps?)), hence my surprise to see it fixed once
> again according to your commit log, I think your only mistake was to paste
> the error message for the wrong header in there (rte_cryptodev_pmd.h
> instead
> of rte_cryptodev.h), nothing wrong with your patch besides this.

Ohhh, all right! I understand now. Will send a v2 with the commit message fixed.

Thanks!
Pablo

> 
> > Basically, I noticed that gcc 4.5 doesn't complain when using your second
> approach,
> > that's why I changed it.
> 
> For the record GCC wrongly thinks the structure is empty because a unnamed
> struct field is declared inside. Before C11 such declarations only created a
> new type that did not occupy any space and not an actual field, hence why it
> complains when faced with [] instead of the well-behaved [0].
> 
> In this particular case it's a parsing error fixed in subsequent GCC
> versions, the unnamed struct actually uses some space otherwise it would
> have crashed during non-regression testing (right?)
> 
> --
> Adrien Mazarguil
> 6WIND

[dpdk-dev] [PATCH 1/2] i40e: Add packet_type metadata in the i40e vPMD

2016-10-05 Thread Chen, Jing D

Hi, 

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jeff Shaw
> Sent: Thursday, July 14, 2016 9:59 AM
> To: dev at dpdk.org; Zhang, Helin ; Wu, Jingjing
> ; damarion at cisco.com
> Subject: [dpdk-dev] [PATCH 1/2] i40e: Add packet_type metadata in the i40e
> vPMD
> 
> From: Damjan Marion 
> 
> The ptype is decoded from the rx descriptor and stored in the packet type
> field in the mbuf using the same function as the non-vector driver.
> 
> Signed-off-by: Damjan Marion 
> Signed-off-by: Jeff Shaw 
> ---
>  drivers/net/i40e/i40e_rxtx.c | 566 
> +--
>  drivers/net/i40e/i40e_rxtx.h | 563
> ++
>  drivers/net/i40e/i40e_rxtx_vec.c |  16 ++
>  3 files changed, 581 insertions(+), 564 deletions(-)
> 
> -
>  #define I40E_RX_DESC_EXT_STATUS_FLEXBH_MASK   0x03
>  #define I40E_RX_DESC_EXT_STATUS_FLEXBH_FD_ID  0x01
>  #define I40E_RX_DESC_EXT_STATUS_FLEXBH_FLEX   0x02
> @@ -2136,7 +1573,8 @@ i40e_dev_supported_ptypes_get(struct rte_eth_dev
> *dev)  #ifdef RTE_LIBRTE_I40E_RX_ALLOW_BULK_ALLOC
>   dev->rx_pkt_burst == i40e_recv_pkts_bulk_alloc ||  #endif
> - dev->rx_pkt_burst == i40e_recv_scattered_pkts)
> + dev->rx_pkt_burst == i40e_recv_scattered_pkts ||
> + dev->rx_pkt_burst == i40e_recv_pkts_vec)

Missed i40e_recv_scattered_pkts_vec()?

[dpdk-dev] [PATCH v2] app/test: remove hard-coding of crypto num qps

2016-10-05 Thread De Lara Guarch, Pablo

Hi,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Fiona Trahe
> Sent: Thursday, September 29, 2016 10:18 AM
> To: dev at dpdk.org
> Cc: De Lara Guarch, Pablo; Trahe, Fiona; Akhil Goyal
> Subject: [dpdk-dev] [PATCH v2] app/test: remove hard-coding of crypto num
> qps
> 
> ts_params->conf.nb_queue_pairs should not be hard coded with device
> specific number. It should be retrieved from the device info.
> Any test which changes it should restore it to orig value.
> 
> Also related cleanup of test code setting number and size of
> queue-pairs on a device, e.g.
> * Removed irrelevant ?for? loop ? was hardcoded to only loop once.
> * Removed obsolete comment re inability to free and re-allocate queu
> memory
>   and obsolete workaround for it which used to create maximum size queues.
> 
> And added freeing of ring memory on queue-pair release in aesni_mb PMD,
> else releasing and setting up queue-pair of a different size fails.
> 
> Signed-off-by: Akhil Goyal 
> Signed-off-by: Fiona Trahe 
> ---
> 
> v2:
>   Fix for broken QAT PMD unit tests exposed by v1
>   i.e. In test_device_configure_invalid_queue_pair_ids() after running tests
>   for invalid values restore original nb_queue_pairs.
>   Also cleanup of test code setting number and size of queue-pairs on a device
>   Also fix for aesni_mb PMD not freeing ring memory on qp release

Sorry, I missed this patch. Could you split this patch into different patches?
It looks like you are making (three?) changes in different places.

Thanks,
Pablo

[dpdk-dev] [PATCH v3 0/9] IPsec Enhancements

2016-10-05 Thread De Lara Guarch, Pablo



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of De Lara Guarch,
> Pablo
> Sent: Thursday, September 29, 2016 5:28 PM
> To: Gonzalez Monroy, Sergio; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 0/9] IPsec Enhancements
> 
> 
> 
> > -Original Message-
> > From: Gonzalez Monroy, Sergio
> > Sent: Thursday, September 29, 2016 8:44 AM
> > To: dev at dpdk.org
> > Cc: De Lara Guarch, Pablo
> > Subject: [PATCH v3 0/9] IPsec Enhancements
> >
> > This patch set mainly adds support for AES-GCM and AES-CTR.
> >
> > It also updates the IV generation method for AES-CBC mode using
> > the forward function instead of randomly generating the IV.
> >
> > v3:
> >  - update sample app guide
> >  - remove unused function
> >  - improve commit messages
> >
> > v2:
> >  - Update releas notes.
> >  - Initialize salt values, GCM/CTR key length now is 20B,
> >16B key and 4 LSB salt.
> >  - Do not check SP/ACL if we have no rules.
> >  - Add macro for cryptodev queue size
> >
> > Dependencies:
> > examples/ipsec-secgw: add configuration file support
> > http://dpdk.org/dev/patchwork/patch/16004/
> >
> > examples/ipsec-secgw: add sample configuration files
> > http://dpdk.org/dev/patchwork/patch/16003/
> >
> > Sergio Gonzalez Monroy (9):
> >   examples/ipsec-secgw: change CBC IV generation
> >   examples/ipsec-secgw: reset crypto operation status
> >   examples/ipsec-secgw: add AES-GCM support
> >   examples/ipsec-secgw: enable AES-CTR mode
> >   examples/ipsec-secgw: check sp only when setup
> >   examples/ipsec-secgw: add cryptodev queue size macro
> >   examples/ipsec-secgw: initialize sa salt
> >   examples/ipsec-secgw: update release notes
> >   examples/ipsec-secgw: update ipsec-secgw guide
> >
> >  doc/guides/rel_notes/release_16_11.rst   |   9 ++
> >  doc/guides/sample_app_ug/ipsec_secgw.rst |  15 ++--
> >  examples/ipsec-secgw/esp.c   | 144 
> > ++-
> >  examples/ipsec-secgw/ipsec-secgw.c   |   7 +-
> >  examples/ipsec-secgw/ipsec.c |   1 +
> >  examples/ipsec-secgw/ipsec.h |  35 +++-
> >  examples/ipsec-secgw/sa.c|  54 ++--
> >  7 files changed, 207 insertions(+), 58 deletions(-)
> >
> > --
> > 2.5.5
> 
> Series-acked-by: Pablo de Lara 

Applied to dpdk-next-crypto.
Thanks,

Pablo

[dpdk-dev] [PATCH] examples/ipsec-secgw: Update checksum while decrementing ttl

2016-10-05 Thread De Lara Guarch, Pablo



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Sergio Gonzalez
> Monroy
> Sent: Monday, September 26, 2016 6:28 AM
> To: akhil.goyal at nxp.com; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] examples/ipsec-secgw: Update checksum
> while decrementing ttl
> 
> Hi Akhil,
> 
> This application relies on checksum offload in both outbound and inbound
> paths (PKT_TX_IP_CKSUM flag).
> 
> Because we assume that we always forward the packet in both paths, we
> decrement the ttl in both inbound and outbound.
> You seem to only increment (recalculate) the checksum of the inner IP
> header in the outbound path but not the inbound path.
> 
> Also, in the inbound path you have to consider a possible ECN value update.

Any further comments here, Akhil?

Thanks,
Pablo

> 
> Sergio
> 
> 
> On 26/09/2016 17:32, akhil.goyal at nxp.com wrote:
> > From: Akhil Goyal 
> >
> > In IPsec-secgw application when TTL is decremented in IP header
> > before forwarding the packet, checksum needs to be updated.
> >
> > In this patch an incremental checksum is added.
> > Other applications(like l3fwd) are also doing so.
> >
> > Signed-off-by: Akhil Goyal 
> > ---
> >   examples/ipsec-secgw/ipip.h | 1 +
> >   1 file changed, 1 insertion(+)
> >
> > diff --git a/examples/ipsec-secgw/ipip.h b/examples/ipsec-secgw/ipip.h
> > index ff1dccd..ef059a9 100644
> > --- a/examples/ipsec-secgw/ipip.h
> > +++ b/examples/ipsec-secgw/ipip.h
> > @@ -56,6 +56,7 @@ ipip_outbound(struct rte_mbuf *m, uint32_t offset,
> uint32_t is_ipv6,
> > if (inip4->ip_v == IPVERSION) {
> > /* XXX This should be done by the forwarding engine instead
> */
> > inip4->ip_ttl -= 1;
> > +   inip4->ip_sum += 1;
> > ds_ecn = inip4->ip_tos;
> > } else {
> > inip6 = (struct ip6_hdr *)inip4;
> 
>

[dpdk-dev] [PATCH v5 4/4] hash: modify lookup bulk pipeline

2016-10-05 Thread Pablo de Lara

From: Byron Marohn 

This patch replaces the pipelined rte_hash lookup mechanism with a
loop-and-jump model, which performs significantly better,
especially for smaller table sizes and smaller table occupancies.

Signed-off-by: Byron Marohn 
Signed-off-by: Saikrishna Edupuganti 
Signed-off-by: Pablo de Lara 
Acked-by: Bruce Richardson 
Acked-by: Sameh Gobriel 
---
 lib/librte_hash/rte_cuckoo_hash.c | 378 --
 lib/librte_hash/rte_cuckoo_hash.h |   3 +-
 2 files changed, 117 insertions(+), 264 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c 
b/lib/librte_hash/rte_cuckoo_hash.c
index d762f36..3324b17 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -914,44 +914,8 @@ rte_hash_get_key_with_position(const struct rte_hash *h, 
const int32_t position,
return 0;
 }

-/* Lookup bulk stage 0: Prefetch input key */
 static inline void
-lookup_stage0(unsigned *idx, uint64_t *lookup_mask,
-   const void * const *keys)
-{
-   *idx = __builtin_ctzl(*lookup_mask);
-   if (*lookup_mask == 0)
-   *idx = 0;
-
-   rte_prefetch0(keys[*idx]);
-   *lookup_mask &= ~(1llu << *idx);
-}
-
-/*
- * Lookup bulk stage 1: Calculate primary/secondary hashes
- * and prefetch primary/secondary buckets
- */
-static inline void
-lookup_stage1(unsigned idx, hash_sig_t *prim_hash, hash_sig_t *sec_hash,
-   const struct rte_hash_bucket **primary_bkt,
-   const struct rte_hash_bucket **secondary_bkt,
-   hash_sig_t *hash_vals, const void * const *keys,
-   const struct rte_hash *h)
-{
-   *prim_hash = rte_hash_hash(h, keys[idx]);
-   hash_vals[idx] = *prim_hash;
-   *sec_hash = rte_hash_secondary_hash(*prim_hash);
-
-   *primary_bkt = >buckets[*prim_hash & h->bucket_bitmask];
-   *secondary_bkt = >buckets[*sec_hash & h->bucket_bitmask];
-
-   rte_prefetch0(*primary_bkt);
-   rte_prefetch0(*secondary_bkt);
-}
-
-static inline void
-compare_signatures(unsigned int *prim_hash_matches,
-   unsigned int *sec_hash_matches,
+compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
const struct rte_hash_bucket *prim_bkt,
const struct rte_hash_bucket *sec_bkt,
hash_sig_t prim_hash, hash_sig_t sec_hash,
@@ -962,11 +926,11 @@ compare_signatures(unsigned int *prim_hash_matches,
switch (sig_cmp_fn) {
 #ifdef RTE_MACHINE_CPUFLAG_AVX2
case RTE_HASH_COMPARE_AVX2:
-   *prim_hash_matches |= 
_mm256_movemask_ps((__m256)_mm256_cmpeq_epi32(
+   *prim_hash_matches = 
_mm256_movemask_ps((__m256)_mm256_cmpeq_epi32(
_mm256_load_si256(
(__m256i const *)prim_bkt->sig_current),
_mm256_set1_epi32(prim_hash)));
-   *sec_hash_matches |= 
_mm256_movemask_ps((__m256)_mm256_cmpeq_epi32(
+   *sec_hash_matches = 
_mm256_movemask_ps((__m256)_mm256_cmpeq_epi32(
_mm256_load_si256(
(__m256i const *)sec_bkt->sig_current),
_mm256_set1_epi32(sec_hash)));
@@ -975,7 +939,7 @@ compare_signatures(unsigned int *prim_hash_matches,
 #ifdef RTE_MACHINE_CPUFLAG_SSE2
case RTE_HASH_COMPARE_SSE:
/* Compare the first 4 signatures in the bucket */
-   *prim_hash_matches |= _mm_movemask_ps((__m128)_mm_cmpeq_epi16(
+   *prim_hash_matches = _mm_movemask_ps((__m128)_mm_cmpeq_epi16(
_mm_load_si128(
(__m128i const *)prim_bkt->sig_current),
_mm_set1_epi32(prim_hash)));
@@ -984,7 +948,7 @@ compare_signatures(unsigned int *prim_hash_matches,
(__m128i const 
*)_bkt->sig_current[4]),
_mm_set1_epi32(prim_hash << 4;
/* Compare the first 4 signatures in the bucket */
-   *sec_hash_matches |= _mm_movemask_ps((__m128)_mm_cmpeq_epi16(
+   *sec_hash_matches = _mm_movemask_ps((__m128)_mm_cmpeq_epi16(
_mm_load_si128(
(__m128i const *)sec_bkt->sig_current),
_mm_set1_epi32(sec_hash)));
@@ -1005,244 +969,134 @@ compare_signatures(unsigned int *prim_hash_matches,

 }

-/*
- * Lookup bulk stage 2:  Search for match hashes in primary/secondary locations
- * and prefetch first key slot
- */
+#define PREFETCH_OFFSET 4
 static inline void
-lookup_stage2(unsigned idx, hash_sig_t prim_hash, hash_sig_t sec_hash,
-   const struct rte_hash_bucket *prim_bkt,
-   const struct rte_hash_bucket *sec_bkt,
-   const struct

[dpdk-dev] [PATCH v5 3/4] hash: add vectorized comparison

2016-10-05 Thread Pablo de Lara

From: Byron Marohn 

In lookup bulk function, the signatures of all entries
are compared against the signature of the key that is being looked up.
Now that all the signatures are together, they can be compared
with vector instructions (SSE, AVX2), achieving higher lookup performance.

Also, entries per bucket are increased to 8 when using processors
with AVX2, as 256 bits can be compared at once, which is the size of
8x32-bit signatures.

Signed-off-by: Byron Marohn 
Signed-off-by: Saikrishna Edupuganti 
Signed-off-by: Pablo de Lara 
Acked-by: Bruce Richardson 
Acked-by: Sameh Gobriel 
---
 lib/librte_hash/rte_cuckoo_hash.c | 76 +++
 lib/librte_hash/rte_cuckoo_hash.h | 12 ++-
 2 files changed, 81 insertions(+), 7 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c 
b/lib/librte_hash/rte_cuckoo_hash.c
index a7ee2b9..d762f36 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -284,6 +284,15 @@ rte_hash_create(const struct rte_hash_parameters *params)
h->free_slots = r;
h->hw_trans_mem_support = hw_trans_mem_support;

+#if defined(RTE_ARCH_X86)
+   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
+   h->sig_cmp_fn = RTE_HASH_COMPARE_AVX2;
+   else if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2))
+   h->sig_cmp_fn = RTE_HASH_COMPARE_SSE;
+   else
+#endif
+   h->sig_cmp_fn = RTE_HASH_COMPARE_SCALAR;
+
/* Turn on multi-writer only with explicit flat from user and TM
 * support.
 */
@@ -940,6 +949,62 @@ lookup_stage1(unsigned idx, hash_sig_t *prim_hash, 
hash_sig_t *sec_hash,
rte_prefetch0(*secondary_bkt);
 }

+static inline void
+compare_signatures(unsigned int *prim_hash_matches,
+   unsigned int *sec_hash_matches,
+   const struct rte_hash_bucket *prim_bkt,
+   const struct rte_hash_bucket *sec_bkt,
+   hash_sig_t prim_hash, hash_sig_t sec_hash,
+   enum rte_hash_sig_compare_function sig_cmp_fn)
+{
+   unsigned int i;
+
+   switch (sig_cmp_fn) {
+#ifdef RTE_MACHINE_CPUFLAG_AVX2
+   case RTE_HASH_COMPARE_AVX2:
+   *prim_hash_matches |= 
_mm256_movemask_ps((__m256)_mm256_cmpeq_epi32(
+   _mm256_load_si256(
+   (__m256i const *)prim_bkt->sig_current),
+   _mm256_set1_epi32(prim_hash)));
+   *sec_hash_matches |= 
_mm256_movemask_ps((__m256)_mm256_cmpeq_epi32(
+   _mm256_load_si256(
+   (__m256i const *)sec_bkt->sig_current),
+   _mm256_set1_epi32(sec_hash)));
+   break;
+#endif
+#ifdef RTE_MACHINE_CPUFLAG_SSE2
+   case RTE_HASH_COMPARE_SSE:
+   /* Compare the first 4 signatures in the bucket */
+   *prim_hash_matches |= _mm_movemask_ps((__m128)_mm_cmpeq_epi16(
+   _mm_load_si128(
+   (__m128i const *)prim_bkt->sig_current),
+   _mm_set1_epi32(prim_hash)));
+   *prim_hash_matches |= (_mm_movemask_ps((__m128)_mm_cmpeq_epi16(
+   _mm_load_si128(
+   (__m128i const 
*)_bkt->sig_current[4]),
+   _mm_set1_epi32(prim_hash << 4;
+   /* Compare the first 4 signatures in the bucket */
+   *sec_hash_matches |= _mm_movemask_ps((__m128)_mm_cmpeq_epi16(
+   _mm_load_si128(
+   (__m128i const *)sec_bkt->sig_current),
+   _mm_set1_epi32(sec_hash)));
+   *sec_hash_matches |= (_mm_movemask_ps((__m128)_mm_cmpeq_epi16(
+   _mm_load_si128(
+   (__m128i const 
*)_bkt->sig_current[4]),
+   _mm_set1_epi32(sec_hash << 4;
+   break;
+#endif
+   default:
+   for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
+   *prim_hash_matches |=
+   ((prim_hash == prim_bkt->sig_current[i]) << i);
+   *sec_hash_matches |=
+   ((sec_hash == sec_bkt->sig_current[i]) << i);
+   }
+   }
+
+}
+
 /*
  * Lookup bulk stage 2:  Search for match hashes in primary/secondary locations
  * and prefetch first key slot
@@ -952,15 +1017,14 @@ lookup_stage2(unsigned idx, hash_sig_t prim_hash, 
hash_sig_t sec_hash,
uint64_t *extra_hits_mask, const void *keys,
const struct rte_hash *h)
 {
-   unsigned prim_hash_matches, sec_hash_matches, key_idx, i;
-   unsigned total_hash_matches;
+   unsigned int prim_hash_matches,

[dpdk-dev] [PATCH v5 2/4] hash: reorganize bucket structure

2016-10-05 Thread Pablo de Lara

From: Byron Marohn 

Move current signatures of all entries together in the bucket
and same with all alternative signatures, instead of having
current and alternative signatures together per entry in the bucket.
This will be benefitial in the next commits, where a vectorized
comparison will be performed, achieving better performance.

The alternative signatures have been moved away from
the current signatures, to make the key indices be consecutive
to the current signatures, as these two fields are used by lookup,
so they are in the same cache line.

Signed-off-by: Byron Marohn 
Signed-off-by: Saikrishna Edupuganti 
Acked-by: Bruce Richardson 
Acked-by: Sameh Gobriel 
---
 lib/librte_hash/rte_cuckoo_hash.c | 43 ++-
 lib/librte_hash/rte_cuckoo_hash.h | 17 --
 lib/librte_hash/rte_cuckoo_hash_x86.h | 20 
 3 files changed, 37 insertions(+), 43 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c 
b/lib/librte_hash/rte_cuckoo_hash.c
index 4de4422..a7ee2b9 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -421,7 +421,7 @@ make_space_bucket(const struct rte_hash *h, struct 
rte_hash_bucket *bkt)
 */
for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
/* Search for space in alternative locations */
-   next_bucket_idx = bkt->signatures[i].alt & h->bucket_bitmask;
+   next_bucket_idx = bkt->sig_alt[i] & h->bucket_bitmask;
next_bkt[i] = >buckets[next_bucket_idx];
for (j = 0; j < RTE_HASH_BUCKET_ENTRIES; j++) {
if (next_bkt[i]->key_idx[j] == EMPTY_SLOT)
@@ -434,8 +434,8 @@ make_space_bucket(const struct rte_hash *h, struct 
rte_hash_bucket *bkt)

/* Alternative location has spare room (end of recursive function) */
if (i != RTE_HASH_BUCKET_ENTRIES) {
-   next_bkt[i]->signatures[j].alt = bkt->signatures[i].current;
-   next_bkt[i]->signatures[j].current = bkt->signatures[i].alt;
+   next_bkt[i]->sig_alt[j] = bkt->sig_current[i];
+   next_bkt[i]->sig_current[j] = bkt->sig_alt[i];
next_bkt[i]->key_idx[j] = bkt->key_idx[i];
return i;
}
@@ -461,8 +461,8 @@ make_space_bucket(const struct rte_hash *h, struct 
rte_hash_bucket *bkt)
 */
bkt->flag[i] = 0;
if (ret >= 0) {
-   next_bkt[i]->signatures[ret].alt = bkt->signatures[i].current;
-   next_bkt[i]->signatures[ret].current = bkt->signatures[i].alt;
+   next_bkt[i]->sig_alt[ret] = bkt->sig_current[i];
+   next_bkt[i]->sig_current[ret] = bkt->sig_alt[i];
next_bkt[i]->key_idx[ret] = bkt->key_idx[i];
return i;
} else
@@ -544,8 +544,8 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, 
const void *key,

/* Check if key is already inserted in primary location */
for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
-   if (prim_bkt->signatures[i].current == sig &&
-   prim_bkt->signatures[i].alt == alt_hash) {
+   if (prim_bkt->sig_current[i] == sig &&
+   prim_bkt->sig_alt[i] == alt_hash) {
k = (struct rte_hash_key *) ((char *)keys +
prim_bkt->key_idx[i] * 
h->key_entry_size);
if (rte_hash_cmp_eq(key, k->key, h) == 0) {
@@ -564,8 +564,8 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, 
const void *key,

/* Check if key is already inserted in secondary location */
for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
-   if (sec_bkt->signatures[i].alt == sig &&
-   sec_bkt->signatures[i].current == alt_hash) {
+   if (sec_bkt->sig_alt[i] == sig &&
+   sec_bkt->sig_current[i] == alt_hash) {
k = (struct rte_hash_key *) ((char *)keys +
sec_bkt->key_idx[i] * 
h->key_entry_size);
if (rte_hash_cmp_eq(key, k->key, h) == 0) {
@@ -611,8 +611,8 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, 
const void *key,
for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
/* Check if slot is available */
if (likely(prim_bkt->key_idx[i] == EMPTY_SLOT)) {
-   prim_bkt->signatures[i].current = sig;
-   prim_bkt->signatures[i].alt = alt_hash;
+   prim_bkt->sig_current[i] = sig;
+   prim_bkt->sig_alt[i] = alt_hash;
prim_bkt->key_idx[i] = new_idx;
break;
}
@@ -632,8 +632,8 @@ __rte_hash_add_key_with_hash(const struct

[dpdk-dev] [PATCH v5 1/4] hash: reorder hash structure

2016-10-05 Thread Pablo de Lara

In order to optimize lookup performance, hash structure
is reordered, so all fields used for lookup will be
in the first cache line.

Signed-off-by: Pablo de Lara 
Acked-by: Bruce Richardson 
Acked-by: Sameh Gobriel 
---
 lib/librte_hash/rte_cuckoo_hash.h | 36 +---
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.h 
b/lib/librte_hash/rte_cuckoo_hash.h
index e290dab..27a47e5 100644
--- a/lib/librte_hash/rte_cuckoo_hash.h
+++ b/lib/librte_hash/rte_cuckoo_hash.h
@@ -185,30 +185,36 @@ struct rte_hash {
char name[RTE_HASH_NAMESIZE];   /**< Name of the hash. */
uint32_t entries;   /**< Total table entries. */
uint32_t num_buckets;   /**< Number of buckets in table. */
-   uint32_t key_len;   /**< Length of hash key. */
+
+   struct rte_ring *free_slots;
+   /**< Ring that stores all indexes of the free slots in the key table */
+   uint8_t hw_trans_mem_support;
+   /**< Hardware transactional memory support */
+   struct lcore_cache *local_free_slots;
+   /**< Local cache per lcore, storing some indexes of the free slots */
+   enum add_key_case add_key; /**< Multi-writer hash add behavior */
+
+   rte_spinlock_t *multiwriter_lock; /**< Multi-writer spinlock for w/o TM 
*/
+
+   /* Fields used in lookup */
+
+   uint32_t key_len __rte_cache_aligned;
+   /**< Length of hash key. */
rte_hash_function hash_func;/**< Function used to calculate hash. */
uint32_t hash_func_init_val;/**< Init value used by hash_func. */
rte_hash_cmp_eq_t rte_hash_custom_cmp_eq;
/**< Custom function used to compare keys. */
enum cmp_jump_table_case cmp_jump_table_idx;
/**< Indicates which compare function to use. */
-   uint32_t bucket_bitmask;/**< Bitmask for getting bucket index
-   from hash signature. */
+   uint32_t bucket_bitmask;
+   /**< Bitmask for getting bucket index from hash signature. */
uint32_t key_entry_size; /**< Size of each key entry. */

-   struct rte_ring *free_slots;/**< Ring that stores all indexes
-   of the free slots in the key 
table */
void *key_store;/**< Table storing all keys and data */
-   struct rte_hash_bucket *buckets;/**< Table with buckets storing 
all the
-   hash values and key 
indexes
-   to the key table*/
-   uint8_t hw_trans_mem_support;   /**< Hardware transactional
-   memory support */
-   struct lcore_cache *local_free_slots;
-   /**< Local cache per lcore, storing some indexes of the free slots */
-   enum add_key_case add_key; /**< Multi-writer hash add behavior */
-
-   rte_spinlock_t *multiwriter_lock; /**< Multi-writer spinlock for w/o TM 
*/
+   struct rte_hash_bucket *buckets;
+   /**< Table with buckets storing all the hash values and key indexes
+* to the key table.
+*/
 } __rte_cache_aligned;

 struct queue_node {
-- 
2.7.4

[dpdk-dev] [PATCH v5 0/4] Cuckoo hash enhancements

2016-10-05 Thread Pablo de Lara

This patchset improves lookup performance on the current hash library
by changing the existing lookup bulk pipeline, with an improved pipeline,
based on a loop-and-jump model, instead of the current 4-stage 2-entry pipeline.
Also, x86 vectorized intrinsics are used to improve performance when comparing 
signatures.

First patch reorganizes the order of the hash structure.
The structure takes more than one 64-byte cache line, but not all
the fields are used in the lookup operation (the most common operation).
Therefore, all these fields have been moved to the first part of the structure,
so they all fit in one cache line, improving slightly the performance in some
scenarios.

Second patch modifies the order of the bucket structure.
Currently, the buckets store all the signatures together (current and 
alternative).
In order to be able to perform a vectorized signature comparison,
all current signatures have to be together, so the order of the bucket has been 
changed,
having separated all the current signatures from the alternative signatures.

Third patch introduces x86 vectorized intrinsics.
When performing a lookup bulk operation, all current signatures in a bucket
are compared against the signature of the key being looked up.
Now that they all are together, a vectorized comparison can be performed,
which takes less instructions to be carried out.
In case of having a machine with AVX2, number of entries per bucket are
increased from 4 to 8, as AVX2 allows comparing two 256-bit values, with 
8x32-bit integers,
which are the 8 signatures on the bucket.

Fourth (and last) patch modifies the current pipeline of the lookup bulk 
function.
The new pipeline is based on a loop-and-jump model. The two key improvements 
are:

- Better prefetching: in this case, first 4 keys to be looked up are prefetched,
  and after that, the rest of the keys are prefetched at the time the 
calculation
  of the signatures are being performed. This gives more time for the CPU to
  prefetch the data requesting before actually need it, which result in less
  cache misses and therefore, higher throughput.

- Lower performance penalty when using fallback: the lookup bulk algorithm
  assumes that most times there will not be a collision in a bucket, but it 
might
  happen that two or more signatures are equal, which means that more than one
  key comparison might be necessary. In that case, only the key of the first 
hit is prefetched,
  like in the current implementation. The difference now is that if this 
comparison
  results in a miss, the information of the other keys to be compared has been 
stored,
  unlike the current implementation, which needs to perform an entire simple 
lookup again.

Changes in v5:
- Rebased against current HEAD
- Fix checkpatch warnings

Changes in v4:
- Reordered hash structure, so alt signature is at the start
  of the next cache line, and explain in the commit message
  why it has been moved
- Reordered hash structure, so name field is on top of the structure,
  leaving all the fields used in lookup in the next cache line
  (instead of the first cache line)

Changes in v3:
- Corrected the cover letter (wrong number of patches)

Changes in v2:
- Increased entries per bucket from 4 to 8 for all cases,
  so it is not architecture dependent any longer.
- Replaced compile-time signature comparison function election
  with run-time election, so best optimization available
  will be used from a single binary.
- Reordered the hash structure, so all the fields used by lookup
  are in the same cache line (first).


Byron Marohn (3):
  hash: reorganize bucket structure
  hash: add vectorized comparison
  hash: modify lookup bulk pipeline

Pablo de Lara (1):
  hash: reorder hash structure

 lib/librte_hash/rte_cuckoo_hash.c | 455 ++
 lib/librte_hash/rte_cuckoo_hash.h |  68 ++---
 lib/librte_hash/rte_cuckoo_hash_x86.h |  20 +-
 3 files changed, 235 insertions(+), 308 deletions(-)

-- 
2.7.4

[dpdk-dev] [PATCH v6 0/4] new crypto software based device

2016-10-05 Thread De Lara Guarch, Pablo



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of De Lara Guarch,
> Pablo
> Sent: Tuesday, October 04, 2016 4:37 PM
> To: Mrozowicz, SlawomirX; dev at dpdk.org
> Cc: Mrozowicz, SlawomirX
> Subject: Re: [dpdk-dev] [PATCH v6 0/4] new crypto software based device
> 
> 
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Slawomir
> Mrozowicz
> > Sent: Tuesday, October 04, 2016 8:11 AM
> > To: dev at dpdk.org
> > Cc: Mrozowicz, SlawomirX
> > Subject: [dpdk-dev] [PATCH v6 0/4] new crypto software based device
> >
> > This code provides the initial implementation of the libcrypto poll mode
> > driver.
> > All cryptography operations are using Openssl library crypto API.
> > Each algorithm uses EVP_ interface from openssl API - which is
> recommended
> > by
> > Openssl maintainers.
> >
> > For more information about how to use this driver, go to:
> > doc/guides/cryptodevs/libcrypto.rst
> >
> > Changes in V6:
> > - fix checkpatch warnings
> >
> > Changes in V5:
> > - reduce source of big data test
> >
> > Changes in V4:
> > - move aes test rework to another patch
> > - move big data test to another patch
> > - checking if libcrypto pmd is available
> >
> > Changes in V3:
> > - add nagative verification tests
> > - add big data test
> > - fix pmd according to negative verification tests
> > - change gmac aad max size
> > - update documentation and commits comments
> >
> > Changes in V2:
> > - add gcm/gmac algorithm correction
> > - unit test rework
> >
> > Slawomir Mrozowicz (1):
> >   libcrypto_pmd: initial implementation of SW crypto device
> >
> > Piotr Azarewicz (2)
> >   app/test: cryptodev AES tests rework
> >   app/test: added tests for libcrypto PMD
> >
> > Daniel Mrzyglod (1)
> >   examples/l2fwd-crypto: updated example for libcrypto PMD
> >
> >  MAINTAINERS|4 +
> >  app/test/Makefile  |2 +-
> >  app/test/test_cryptodev.c  | 1584 
> > ++--
> >  app/test/test_cryptodev.h  |1 +
> >  app/test/test_cryptodev_aes.c  |  687 -
> >  app/test/test_cryptodev_aes.h  | 1124 --
> >  app/test/test_cryptodev_aes_test_vectors.h | 1097 ++
> >  app/test/test_cryptodev_blockcipher.c  |  538 +++
> >  app/test/test_cryptodev_blockcipher.h  |  125 ++
> >  app/test/test_cryptodev_des_test_vectors.h |  955 
> >  app/test/test_cryptodev_gcm_test_vectors.h |   36 +-
> >  app/test/test_cryptodev_hash_test_vectors.h|  491 ++
> >  app/test/test_cryptodev_perf.c |  712 -
> >  config/common_base |6 +
> >  doc/guides/cryptodevs/index.rst|1 +
> >  doc/guides/cryptodevs/libcrypto.rst|  116 ++
> >  doc/guides/rel_notes/release_16_11.rst |   23 +-
> >  drivers/crypto/Makefile|1 +
> >  drivers/crypto/libcrypto/Makefile  |   60 +
> >  drivers/crypto/libcrypto/rte_libcrypto_pmd.c   | 1062 +
> >  drivers/crypto/libcrypto/rte_libcrypto_pmd_ops.c   |  708 +
> >  .../crypto/libcrypto/rte_libcrypto_pmd_private.h   |  174 +++
> >  .../crypto/libcrypto/rte_pmd_libcrypto_version.map |3 +
> >  examples/l2fwd-crypto/main.c   |9 +
> >  lib/librte_cryptodev/rte_cryptodev.h   |5 +-
> >  mk/rte.app.mk  |   23 +-
> >  26 files changed, 7621 insertions(+), 1926 deletions(-)
> >  delete mode 100644 app/test/test_cryptodev_aes.c
> >  delete mode 100644 app/test/test_cryptodev_aes.h
> >  create mode 100644 app/test/test_cryptodev_aes_test_vectors.h
> >  create mode 100644 app/test/test_cryptodev_blockcipher.c
> >  create mode 100644 app/test/test_cryptodev_blockcipher.h
> >  create mode 100644 app/test/test_cryptodev_des_test_vectors.h
> >  create mode 100644 app/test/test_cryptodev_hash_test_vectors.h
> >  create mode 100644 doc/guides/cryptodevs/libcrypto.rst
> >  create mode 100644 drivers/crypto/libcrypto/Makefile
> >  create mode 100644 drivers/crypto/libcrypto/rte_libcrypto_pmd.c
> >  create mode 100644 drivers/crypto/libcrypto/rte_libcrypto_pmd_ops.c
> >  create mode 100644 drivers/crypto/libcrypto/rte_libcrypto_pmd_private.h
> >  create mode 100644
> > drivers/crypto/libcrypto/rte_pmd_libcrypto_version.map
> >
> > --
> > 2.5.0
> 
> Series-acked-by: Pablo de Lara 
> 
> Thanks for all the rework!

Applied to dpdk-next-crypto.

I made a small change in the first patch, due to the changes introduced in
http://dpdk.org/ml/archives/dev/2016-September/047087.html,
which affects driver registration.

Thanks,
Pablo

[dpdk-dev] [PATCH v6 0/4] new crypto software based device

2016-10-05 Thread De Lara Guarch, Pablo



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Slawomir Mrozowicz
> Sent: Tuesday, October 04, 2016 8:11 AM
> To: dev at dpdk.org
> Cc: Mrozowicz, SlawomirX
> Subject: [dpdk-dev] [PATCH v6 0/4] new crypto software based device
> 
> This code provides the initial implementation of the libcrypto poll mode
> driver.
> All cryptography operations are using Openssl library crypto API.
> Each algorithm uses EVP_ interface from openssl API - which is recommended
> by
> Openssl maintainers.
> 
> For more information about how to use this driver, go to:
> doc/guides/cryptodevs/libcrypto.rst
> 
> Changes in V6:
> - fix checkpatch warnings
> 
> Changes in V5:
> - reduce source of big data test
> 
> Changes in V4:
> - move aes test rework to another patch
> - move big data test to another patch
> - checking if libcrypto pmd is available
> 
> Changes in V3:
> - add nagative verification tests
> - add big data test
> - fix pmd according to negative verification tests
> - change gmac aad max size
> - update documentation and commits comments
> 
> Changes in V2:
> - add gcm/gmac algorithm correction
> - unit test rework
> 
> Slawomir Mrozowicz (1):
>   libcrypto_pmd: initial implementation of SW crypto device
> 
> Piotr Azarewicz (2)
>   app/test: cryptodev AES tests rework
>   app/test: added tests for libcrypto PMD
> 
> Daniel Mrzyglod (1)
>   examples/l2fwd-crypto: updated example for libcrypto PMD
> 
>  MAINTAINERS|4 +
>  app/test/Makefile  |2 +-
>  app/test/test_cryptodev.c  | 1584 
> ++--
>  app/test/test_cryptodev.h  |1 +
>  app/test/test_cryptodev_aes.c  |  687 -
>  app/test/test_cryptodev_aes.h  | 1124 --
>  app/test/test_cryptodev_aes_test_vectors.h | 1097 ++
>  app/test/test_cryptodev_blockcipher.c  |  538 +++
>  app/test/test_cryptodev_blockcipher.h  |  125 ++
>  app/test/test_cryptodev_des_test_vectors.h |  955 
>  app/test/test_cryptodev_gcm_test_vectors.h |   36 +-
>  app/test/test_cryptodev_hash_test_vectors.h|  491 ++
>  app/test/test_cryptodev_perf.c |  712 -
>  config/common_base |6 +
>  doc/guides/cryptodevs/index.rst|1 +
>  doc/guides/cryptodevs/libcrypto.rst|  116 ++
>  doc/guides/rel_notes/release_16_11.rst |   23 +-
>  drivers/crypto/Makefile|1 +
>  drivers/crypto/libcrypto/Makefile  |   60 +
>  drivers/crypto/libcrypto/rte_libcrypto_pmd.c   | 1062 +
>  drivers/crypto/libcrypto/rte_libcrypto_pmd_ops.c   |  708 +
>  .../crypto/libcrypto/rte_libcrypto_pmd_private.h   |  174 +++
>  .../crypto/libcrypto/rte_pmd_libcrypto_version.map |3 +
>  examples/l2fwd-crypto/main.c   |9 +
>  lib/librte_cryptodev/rte_cryptodev.h   |5 +-
>  mk/rte.app.mk  |   23 +-
>  26 files changed, 7621 insertions(+), 1926 deletions(-)
>  delete mode 100644 app/test/test_cryptodev_aes.c
>  delete mode 100644 app/test/test_cryptodev_aes.h
>  create mode 100644 app/test/test_cryptodev_aes_test_vectors.h
>  create mode 100644 app/test/test_cryptodev_blockcipher.c
>  create mode 100644 app/test/test_cryptodev_blockcipher.h
>  create mode 100644 app/test/test_cryptodev_des_test_vectors.h
>  create mode 100644 app/test/test_cryptodev_hash_test_vectors.h
>  create mode 100644 doc/guides/cryptodevs/libcrypto.rst
>  create mode 100644 drivers/crypto/libcrypto/Makefile
>  create mode 100644 drivers/crypto/libcrypto/rte_libcrypto_pmd.c
>  create mode 100644 drivers/crypto/libcrypto/rte_libcrypto_pmd_ops.c
>  create mode 100644 drivers/crypto/libcrypto/rte_libcrypto_pmd_private.h
>  create mode 100644
> drivers/crypto/libcrypto/rte_pmd_libcrypto_version.map
> 
> --
> 2.5.0

Series-acked-by: Pablo de Lara 

Thanks for all the rework!

[dpdk-dev] [PATCH v7] net/virtio: add set_mtu in virtio

2016-10-05 Thread Dey, Souvik

Hi All,
Is there any further comments or modifications required for this patch, 
or what next steps do you guys suggest here ?

--
Regards,
Souvik

-Original Message-
From: Dey, Souvik 
Sent: Saturday, October 1, 2016 10:09 AM
To: mark.b.kavanagh at intel.com; yuanhan.liu at linux.intel.com; stephen at 
networkplumber.org; dev at dpdk.org
Subject: RE: [PATCH v7] net/virtio: add set_mtu in virtio

Hi Liu/Stephen/Mark,

I have submitted Version 7 of this patch. Do let me know if this looks 
proper.

--
Regards,
Souvik  

-Original Message-
From: Dey, Souvik 
Sent: Thursday, September 29, 2016 4:32 PM
To: mark.b.kavanagh at intel.com; yuanhan.liu at linux.intel.com; stephen at 
networkplumber.org; dev at dpdk.org
Cc: Dey, Souvik 
Subject: [PATCH v7] net/virtio: add set_mtu in virtio


Virtio interfaces do not currently allow the user to specify a particular 
Maximum Transmission Unit (MTU).Consequently, the MTU of Virtio interfaces 
is typically set to the Ethernet default value of 1500.
This is problematic in the case of cloud deployments, in which a specific
(and potentially non-standard) MTU needs to be set by a DHCP server, which 
needs to be honored by all interfaces across the traffic path.To acheive 
this Virtio interfaces should support setting of MTU.
In case when GRE/VXLAN tunneling is used for internal communication, there 
will be an overhead added by the infrastructure in the packet over and 
above the ETHER MTU of 1518. So to take care of this overhead in these 
cases the DHCP server corrects the L3 MTU to 1454. But since virtio 
interfaces was not having the MTU set functionality that MTU sent by the 
DHCP server was ignored and the instance will still send packets with 1500 
MTU which after encapsulation will become more than 1518 and eventually 
gets dropped in the infrastructure. 
By adding an additional 'set_mtu' function to the Virtio driver, we can 
honor the MTU sent by the DHCP server. The dhcp server/controller can 
then leverage this 'set_mtu' functionality to resolve the above 
mentioned issue of packets getting dropped due to incorrect size.


Signed-off-by: Souvik Dey 

---
Changes in v7:
- Replaced the CRC_LEN with the merge rx buf header length.
- Changed the frame_len max validation to VIRTIO_MAX_RX_PKTLEN.
Changes in v6:
- Description of change corrected
- Corrected the identations
- Corrected the subject line too
- The From line was also not correct
- Re-submitting as the below patch was not proper
Changes in v5: 
- Fix log message for out-of-bounds MTU parameter in virtio_mtu_set
- Calculate frame size, based on 'mtu' parameter
- Corrected the upper bound and lower bound checks in virtio_mtu_set
Changes in v4: Incorporated review comments.
Changes in v3: Corrected few style errors as reported by sys-stv.
Changes in v2: Incorporated review comments.

 drivers/net/virtio/virtio_ethdev.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index 423c597..1dbfea6 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -653,12 +653,20 @@ virtio_dev_allmulticast_disable(struct rte_eth_dev *dev)
PMD_INIT_LOG(ERR, "Failed to disable allmulticast");
 } 

+#define VLAN_TAG_LEN   4/* 802.3ac tag (not DMA'd) */
+static int  virtio_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
+{
+   struct virtio_hw *hw = dev->data->dev_private;
+   uint32_t ether_hdr_len = ETHER_HDR_LEN + VLAN_TAG_LEN +
+   hw->vtnet_hdr_size;
+   uint32_t frame_size = mtu + ether_hdr_len;
+
+   if (mtu < ETHER_MIN_MTU || frame_size > VIRTIO_MAX_RX_PKTLEN) {
+   PMD_INIT_LOG(ERR, "MTU should be between %d and %d\n",
+   ETHER_MIN_MTU, VIRTIO_MAX_RX_PKTLEN);
+   return -EINVAL;
+   }
+   return 0;
+}

 /*
  * dev_ops for virtio, bare necessities for basic operation
  */
@@ -677,7 +685,6 @@ static const struct eth_dev_ops virtio_eth_dev_ops = {
.allmulticast_enable = virtio_dev_allmulticast_enable,
.allmulticast_disable= virtio_dev_allmulticast_disable,
+   .mtu_set = virtio_mtu_set,
.dev_infos_get   = virtio_dev_info_get,
.stats_get   = virtio_dev_stats_get,
.xstats_get  = virtio_dev_xstats_get,
-- 
2.9.3.windows.1

75 matches

Mail list logo