[dpdk-dev] [PATCH 0/3] net: fix out of order rx read issue

2016-10-17 Thread Chen, Jing D
Hi,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Qi Zhang
> Sent: Monday, October 17, 2016 1:24 AM
> To: Wu, Jingjing ; Zhang, Helin
> 
> Cc: dev at dpdk.org; Zhang, Qi Z 
> Subject: [dpdk-dev] [PATCH 0/3] net: fix out of order rx read issue
> 
> Volatile point has been cast to non-volatile point when call _mm_loadu_si128,
> so add compile barrier to prevent compiler reorder.
> 
> Qi Zhang (3):
>   net/i40e: fix out of order rx read issue
>   net/ixgbe: fix out of order rx read issue
>   net/fm10k: fix out of order rx read issue
> 
>  drivers/net/fm10k/fm10k_rxtx_vec.c | 3 +++
>  drivers/net/i40e/i40e_rxtx_vec.c   | 3 +++
>  drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 3 +++
>  3 files changed, 9 insertions(+)

I have an overall comment on the commit message. It would be better to describe
why we have an out-of-order issue here and what the requirement is on the
execution order of the 4 loads.
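
To make the request concrete, here is a minimal sketch of the pattern under
discussion. It is not the driver code. It assumes the hardware completes
descriptors in ring order and that the driver wants to read the four
descriptors from the last to the first, so that a "done" bit seen in a later
descriptor implies the earlier ones are complete too. The names rx_desc,
load_desc() and read_four() are hypothetical, and a memcpy() stands in for
_mm_loadu_si128() so the sketch builds without SSE headers.

```c
#include <stdint.h>
#include <string.h>

/* Compiler-only barrier (GCC/Clang syntax): emits no instruction, but the
 * compiler may not move memory accesses across it. */
#define compiler_barrier() __asm__ volatile("" ::: "memory")

/* Hypothetical 16-byte descriptor that the NIC fills in by DMA. */
struct rx_desc {
	uint64_t addr;
	uint64_t status; /* bit 0 stands in for a "descriptor done" flag */
};

/* Plain 16-byte load standing in for _mm_loadu_si128(). The volatile
 * qualifier is cast away here, so without a barrier the compiler is free
 * to reorder consecutive calls. */
static inline struct rx_desc load_desc(const volatile struct rx_desc *p)
{
	struct rx_desc d;

	memcpy(&d, (const void *)p, sizeof(d));
	return d;
}

/* Load four descriptors from index 3 down to 0, keeping that order in the
 * generated code, and return how many leading descriptors are done. */
static int read_four(const volatile struct rx_desc *ring,
		     struct rx_desc out[4])
{
	int n = 0;

	out[3] = load_desc(&ring[3]);
	compiler_barrier();
	out[2] = load_desc(&ring[2]);
	compiler_barrier();
	out[1] = load_desc(&ring[1]);
	compiler_barrier();
	out[0] = load_desc(&ring[0]);

	while (n < 4 && (out[n].status & 1))
		n++;
	return n;
}
```

Note that this barrier only constrains the compiler. Whether the CPU itself
may reorder the loads is a separate question that depends on the
architecture.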


[dpdk-dev] [PATCH v4 2/2] i40e: Enable bad checksum flags in i40e vPMD

2016-10-06 Thread Chen, Jing D

> -Original Message-
> From: Shaw, Jeffrey B
> Sent: Wednesday, October 5, 2016 11:38 PM
> To: dev at dpdk.org
> Cc: Zhang, Helin ; Wu, Jingjing
> ; damarion at cisco.com; Zhang, Qi Z
> ; Chen, Jing D 
> Subject: [PATCH v4 2/2] i40e: Enable bad checksum flags in i40e vPMD
> 
> From: Damjan Marion 
> 
> Decode the checksum flags from the rx descriptor, setting the appropriate bit
> in the mbuf ol_flags field when the flag indicates a bad checksum.
> 
> Signed-off-by: Damjan Marion 
> Signed-off-by: Jeff Shaw 
Acked-by: Jing Chen 


[dpdk-dev] [PATCH v4 1/2] i40e: Add packet_type metadata in the i40e vPMD

2016-10-06 Thread Chen, Jing D

> -Original Message-
> From: Shaw, Jeffrey B
> Sent: Wednesday, October 5, 2016 11:38 PM
> To: dev at dpdk.org
> Cc: Zhang, Helin ; Wu, Jingjing
> ; damarion at cisco.com; Zhang, Qi Z
> ; Chen, Jing D 
> Subject: [PATCH v4 1/2] i40e: Add packet_type metadata in the i40e vPMD
> 
> From: Damjan Marion 
> 
> The ptype is decoded from the rx descriptor and stored in the packet type
> field in the mbuf using the same function as the non-vector driver.
> 
> Signed-off-by: Damjan Marion 
> Signed-off-by: Jeff Shaw 
> Acked-by: Qi Zhang 
> ---
> 
> Changes in v2:
>  - Add missing reference to i40e_recv_scattered_pkts_vec() when
>querying supported packet types.
> 
> Changes in v3:
>  - None. (Please ignore this version).
> 
> Changes in v4:
>  - Fix rss/fdir status mask and shift to get accurate Flow Director Filter
>Match (FLM) indication.
> 
>  drivers/net/i40e/i40e_rxtx.c | 567 
> +--
>  drivers/net/i40e/i40e_rxtx.h | 563
> ++
>  drivers/net/i40e/i40e_rxtx_vec.c |  16 ++
>  3 files changed, 582 insertions(+), 564 deletions(-)
Acked-by: Jing Chen 



[dpdk-dev] [PATCH v2 2/2] i40e: Enable bad checksum flags in i40e vPMD

2016-10-06 Thread Chen, Jing D
Hi,

> -Original Message-
> From: Shaw, Jeffrey B
> Sent: Wednesday, October 5, 2016 5:13 PM
> To: dev at dpdk.org
> Cc: Zhang, Helin ; Wu, Jingjing
> ; damarion at cisco.com; Zhang, Qi Z
> ; Chen, Jing D 
> Subject: [PATCH v2 2/2] i40e: Enable bad checksum flags in i40e vPMD
> 
> From: Damjan Marion 
> 
> Decode the checksum flags from the rx descriptor, setting the appropriate bit
> in the mbuf ol_flags field when the flag indicates a bad checksum.
> 
> Signed-off-by: Damjan Marion 
> Signed-off-by: Jeff Shaw 
> ---
>  drivers/net/i40e/i40e_rxtx_vec.c | 48 +++---
> --
>  1 file changed, 28 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/net/i40e/i40e_rxtx_vec.c
> b/drivers/net/i40e/i40e_rxtx_vec.c
> index 6c63141..d2267ad 100644
> --- a/drivers/net/i40e/i40e_rxtx_vec.c
> +++ b/drivers/net/i40e/i40e_rxtx_vec.c
> @@ -138,19 +138,14 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq)  static
> inline void  desc_to_olflags_v(__m128i descs[4], struct rte_mbuf **rx_pkts)  {
> - __m128i vlan0, vlan1, rss;
> - union {
> - uint16_t e[4];
> - uint64_t dword;
> - } vol;
> + __m128i vlan0, vlan1, rss, l3_l4e;
> 
>   /* mask everything except RSS, flow director and VLAN flags
>* bit2 is for VLAN tag, bit11 for flow director indication
>* bit13:12 for RSS indication.
>*/
> - const __m128i rss_vlan_msk = _mm_set_epi16(
> - 0x, 0x, 0x, 0x,
> - 0x3804, 0x3804, 0x3804, 0x3804);
> + const __m128i rss_vlan_msk = _mm_set_epi32(
> + 0x1c03004, 0x1c03004, 0x1c03004, 0x1c03004);
> 
>   /* map rss and vlan type to rss hash and vlan flag */
>   const __m128i vlan_flags = _mm_set_epi8(0, 0, 0, 0, @@ -163,23
> +158,36 @@ desc_to_olflags_v(__m128i descs[4], struct rte_mbuf **rx_pkts)
>   PKT_RX_RSS_HASH | PKT_RX_FDIR,
> PKT_RX_RSS_HASH, 0, 0,
>   0, 0, PKT_RX_FDIR, 0);
> 
> - vlan0 = _mm_unpackhi_epi16(descs[0], descs[1]);
> - vlan1 = _mm_unpackhi_epi16(descs[2], descs[3]);
> - vlan0 = _mm_unpacklo_epi32(vlan0, vlan1);
> + const __m128i l3_l4e_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0,
> + PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD
> | PKT_RX_IP_CKSUM_BAD,
> + PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD,
> + PKT_RX_EIP_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD,
> + PKT_RX_EIP_CKSUM_BAD,
> + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD,
> + PKT_RX_L4_CKSUM_BAD,
> + PKT_RX_IP_CKSUM_BAD,
> + 0);
> +
> + vlan0 = _mm_unpackhi_epi32(descs[0], descs[1]);
> + vlan1 = _mm_unpackhi_epi32(descs[2], descs[3]);
> + vlan0 = _mm_unpacklo_epi64(vlan0, vlan1);
> 
>   vlan1 = _mm_and_si128(vlan0, rss_vlan_msk);
>   vlan0 = _mm_shuffle_epi8(vlan_flags, vlan1);
> 
> - rss = _mm_srli_epi16(vlan1, 11);
> + rss = _mm_srli_epi32(vlan1, 12);
>   rss = _mm_shuffle_epi8(rss_flags, rss);

My bad. The original code uses bits [13:11] to identify the RSS and FDIR flags.
Now bit 11 is masked out when creating "rss_vlan_msk" and dropped by the shift
above, yet the code still tries to use the original "rss_flags" table?
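
A scalar sketch of the concern, modelling only the low byte of one 32-bit
lane. The table layout is copied from the quoted rss_flags definition
(_mm_set_epi8() lists byte 15 first and byte 0 last). The flag values and
the function names lookup_old() and lookup_v2() are placeholders for
illustration, not the real PKT_RX_* definitions or driver functions.

```c
#include <stdint.h>

/* Placeholder flag values, for illustration only. */
#define FLAG_RSS_HASH 0x02u
#define FLAG_FDIR     0x04u

/* The quoted rss_flags table, written lowest index first. */
static const uint8_t rss_flags[16] = {
	[1] = FLAG_FDIR,
	[6] = FLAG_RSS_HASH,
	[7] = FLAG_RSS_HASH | FLAG_FDIR,
};

/* Original code: keep bits 13:11, shift right by 11, 3-bit table index. */
static uint8_t lookup_old(uint32_t status)
{
	return rss_flags[(status & 0x3800u) >> 11];
}

/* v2 as quoted: bit 11 is not in the mask and the shift is 12, so the
 * low-byte index only carries bits 13:12 and can never exceed 3. */
static uint8_t lookup_v2(uint32_t status)
{
	return rss_flags[(status & 0x3000u) >> 12];
}
```

With bits 13:12 both set, lookup_old() lands on index 6 and reports the RSS
flag, while lookup_v2() lands on index 3, which is 0 in the unchanged table.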



[dpdk-dev] [PATCH v2 2/2] i40e: Enable bad checksum flags in i40e vPMD

2016-10-05 Thread Chen, Jing D
Hi, 

> -Original Message-
> From: Shaw, Jeffrey B
> Sent: Wednesday, October 5, 2016 5:13 PM
> To: dev at dpdk.org
> Cc: Zhang, Helin ; Wu, Jingjing
> ; damarion at cisco.com; Zhang, Qi Z
> ; Chen, Jing D 
> Subject: [PATCH v2 2/2] i40e: Enable bad checksum flags in i40e vPMD
> 
> From: Damjan Marion 
> 
> Decode the checksum flags from the rx descriptor, setting the appropriate bit
> in the mbuf ol_flags field when the flag indicates a bad checksum.
> 
> Signed-off-by: Damjan Marion 
> Signed-off-by: Jeff Shaw 
Acked-by: Jing Chen 

It seems this patch also fixes a VLAN flag bug. Should the commit message
explain that a little?




[dpdk-dev] [PATCH v2 1/2] i40e: Add packet_type metadata in the i40e vPMD

2016-10-05 Thread Chen, Jing D
Hi, 

> -Original Message-
> From: Shaw, Jeffrey B
> Sent: Wednesday, October 5, 2016 5:13 PM
> To: dev at dpdk.org
> Cc: Zhang, Helin ; Wu, Jingjing
> ; damarion at cisco.com; Zhang, Qi Z
> ; Chen, Jing D 
> Subject: [PATCH v2 1/2] i40e: Add packet_type metadata in the i40e vPMD
> 
> From: Damjan Marion 
> 
> The ptype is decoded from the rx descriptor and stored in the packet type
> field in the mbuf using the same function as the non-vector driver.
> 
> Signed-off-by: Damjan Marion 
> Signed-off-by: Jeff Shaw 
> Acked-by: Qi Zhang 
> ---
> 
> Changes in v2:
>  - Add missing reference to i40e_recv_scattered_pkts_vec() when
>querying supported packet types.
> 
>  drivers/net/i40e/i40e_rxtx.c | 567 
> +--
>  drivers/net/i40e/i40e_rxtx.h | 563
> ++
>  drivers/net/i40e/i40e_rxtx_vec.c |  16 ++
>  3 files changed, 582 insertions(+), 564 deletions(-)
> 
Acked-by: Jing Chen 



[dpdk-dev] [PATCH 1/2] i40e: Add packet_type metadata in the i40e vPMD

2016-10-05 Thread Chen, Jing D
Hi, 

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jeff Shaw
> Sent: Thursday, July 14, 2016 9:59 AM
> To: dev at dpdk.org; Zhang, Helin ; Wu, Jingjing
> ; damarion at cisco.com
> Subject: [dpdk-dev] [PATCH 1/2] i40e: Add packet_type metadata in the i40e
> vPMD
> 
> From: Damjan Marion 
> 
> The ptype is decoded from the rx descriptor and stored in the packet type
> field in the mbuf using the same function as the non-vector driver.
> 
> Signed-off-by: Damjan Marion 
> Signed-off-by: Jeff Shaw 
> ---
>  drivers/net/i40e/i40e_rxtx.c | 566 
> +--
>  drivers/net/i40e/i40e_rxtx.h | 563
> ++
>  drivers/net/i40e/i40e_rxtx_vec.c |  16 ++
>  3 files changed, 581 insertions(+), 564 deletions(-)
> 
> -
>  #define I40E_RX_DESC_EXT_STATUS_FLEXBH_MASK   0x03
>  #define I40E_RX_DESC_EXT_STATUS_FLEXBH_FD_ID  0x01
>  #define I40E_RX_DESC_EXT_STATUS_FLEXBH_FLEX   0x02
> @@ -2136,7 +1573,8 @@ i40e_dev_supported_ptypes_get(struct rte_eth_dev
> *dev)  #ifdef RTE_LIBRTE_I40E_RX_ALLOW_BULK_ALLOC
>   dev->rx_pkt_burst == i40e_recv_pkts_bulk_alloc ||  #endif
> - dev->rx_pkt_burst == i40e_recv_scattered_pkts)
> + dev->rx_pkt_burst == i40e_recv_scattered_pkts ||
> + dev->rx_pkt_burst == i40e_recv_pkts_vec)

Missed i40e_recv_scattered_pkts_vec()?



[dpdk-dev] [PATCH v2 0/5] implement new Rx checksum flag

2016-09-14 Thread Chen, Jing D
Hi,

> -Original Message-
> From: Wang, Xiao W
> Sent: Tuesday, September 06, 2016 9:27 AM
> To: dev at dpdk.org
> Cc: Chen, Jing D ; olivier.matz at 6wind.com; Wang, 
> Xiao W
> 
> Subject: [PATCH v2 0/5] implement new Rx checksum flag
> 
> v2:
> * Removed hw_ip_checksum check in fm10k_rx_vec_condition_check().
> 
> * Defined CKSUM_SHIFT for SSE bits shift.
> 
> * Changed commit title from "add back Rx checksum offload" to
>   "fix Rx checksum flags".
> 
> * Added new cksum flag support for ixgbe vector Rx, based on patch
>   (http://dpdk.org/dev/patchwork/patch/14630/) which came earlier.
> 
> v1:
> Following http://dpdk.org/dev/patchwork/patch/14941/, this patch set
> implements newly defined Rx checksum flag for igb, ixgbe, i40e and fm10k.
> 
> Currently, ixgbe and fm10k support Rx checksum offload in both scalar
> and vector datapath, while the other two don't, this patch set keeps
> this situation.
> 
> Note: This patch set has dependency on the following patches:
> 
> "mbuf: add new Rx checksum mbuf flags"
> (http://dpdk.org/dev/patchwork/patch/14941/)
> 
> "ixgbe: support checksum flags in sse vector Rx function"
> (http://dpdk.org/dev/patchwork/patch/14630/)
> 
> Xiao Wang (5):
>   net/fm10k: fix Rx checksum flags
>   net/fm10k: implement new Rx checksum flag
>   net/e1000: implement new Rx checksum flag
>   net/ixgbe: implement new Rx checksum flag
>   net/i40e: implement new Rx checksum flag
> 
>  drivers/net/e1000/igb_rxtx.c   |  4 +++-
>  drivers/net/fm10k/fm10k_rxtx.c | 14 ++
>  drivers/net/fm10k/fm10k_rxtx_vec.c | 24 +---
>  drivers/net/i40e/i40e_rxtx.c   |  6 ++
>  drivers/net/ixgbe/ixgbe_rxtx.c |  4 +++-
>  drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 30 --
>  6 files changed, 67 insertions(+), 15 deletions(-)

Acked-by: Jing Chen 


[dpdk-dev] [PATCH 2/5] net/fm10k: implement new Rx checksum flag

2016-08-29 Thread Chen, Jing D
Hi,

>  uint16_t
> diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c
> b/drivers/net/fm10k/fm10k_rxtx_vec.c
> index 9ea747e..8c08b44 100644
> --- a/drivers/net/fm10k/fm10k_rxtx_vec.c
> +++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
> @@ -95,8 +95,10 @@ fm10k_desc_to_olflags_v(__m128i descs[4], struct rte_mbuf
> **rx_pkts)
>   const __m128i l3l4cksum_flag = _mm_set_epi8(0, 0, 0, 0,
>   0, 0, 0, 0,
>   0, 0, 0, 0,
> - PKT_RX_IP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD,
> - PKT_RX_IP_CKSUM_BAD, PKT_RX_L4_CKSUM_BAD, 0);
> + (PKT_RX_IP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD) >> 1,
> + (PKT_RX_IP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD) >>
> 1,
> + (PKT_RX_IP_CKSUM_GOOD | PKT_RX_L4_CKSUM_BAD) >>
> 1,
> + (PKT_RX_IP_CKSUM_GOOD |
> PKT_RX_L4_CKSUM_GOOD) >> 1);

Can we define a macro, like "#define RTE_CKSUM_SHIFT 1", to avoid the magic
number?

> 
>   const __m128i rxe_flag = _mm_set_epi8(0, 0, 0, 0,
>   0, 0, 0, 0,
> @@ -139,6 +141,7 @@ fm10k_desc_to_olflags_v(__m128i descs[4], struct
> rte_mbuf **rx_pkts)
>   /* Process L4/L3 checksum error flags */
>   cksumflag = _mm_srli_epi16(cksumflag, L3L4EFLAG_SHIFT);
>   cksumflag = _mm_shuffle_epi8(l3l4cksum_flag, cksumflag);
> + cksumflag = _mm_slli_epi16(cksumflag, 1);
>   vtag1 = _mm_or_si128(cksumflag, vtag1);
> 
>   vol.dword = _mm_cvtsi128_si64(vtag1);
> --
> 1.9.3

Besides that, I just realized we should remove the "hw_ip_checksum" check in
fm10k_rx_vec_condition_check(), since we already support it.
Can you help make the change?
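
For reference, a scalar sketch of the shift trick in the quoted hunk, using a
named macro instead of the literal 1. The flag values below are assumptions
chosen so that one flag sits at bit 8 and therefore does not fit in an 8-bit
table entry. The index layout (bit 1 = IP error, bit 0 = L4 error) and the
function name cksum_flags() are also assumptions for this sketch.

```c
#include <stdint.h>

/* Assumed flag values, for illustration only. */
#define L4_CKSUM_BAD  (1u << 3)
#define IP_CKSUM_BAD  (1u << 4)
#define IP_CKSUM_GOOD (1u << 7)
#define L4_CKSUM_GOOD (1u << 8) /* needs 9 bits: too wide for a byte */

/* Entries are stored shifted right by this amount so they fit in 8 bits,
 * and shifted back left after the table lookup. */
#define CKSUM_SHIFT 1

/* Byte table in the same order as the quoted l3l4cksum_flag constant,
 * written lowest index first. */
static const uint8_t cksum_tbl[4] = {
	(IP_CKSUM_GOOD | L4_CKSUM_GOOD) >> CKSUM_SHIFT,
	(IP_CKSUM_GOOD | L4_CKSUM_BAD) >> CKSUM_SHIFT,
	(IP_CKSUM_BAD | L4_CKSUM_GOOD) >> CKSUM_SHIFT,
	(IP_CKSUM_BAD | L4_CKSUM_BAD) >> CKSUM_SHIFT,
};

/* Look up the 2-bit error index and restore the original bit positions. */
static uint16_t cksum_flags(unsigned int err_idx)
{
	return (uint16_t)(cksum_tbl[err_idx & 3u] << CKSUM_SHIFT);
}
```

The trick only works because none of the flags involved uses bit 0, which the
right shift would otherwise discard.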


[dpdk-dev] [PATCH] net/fm10k: fix MAC address remnant in switch

2016-08-05 Thread Chen, Jing D
Hi,

> -Original Message-
> From: Wang, Xiao W
> Sent: Friday, August 05, 2016 11:18 AM
> To: Chen, Jing D ; Lin, Xueqin  intel.com>
> Cc: dev at dpdk.org; Wang, Xiao W 
> Subject: [PATCH] net/fm10k: fix MAC address remnant in switch
> 
> When testpmd quits with two ports, the second port's MAC address
> remains in the MAC table of switch manager.
> 
> There should be some time for HW to quiesce when closing a port,
> otherwise the subsequent port close won't be handled correctly.
> 
> This patch adds some delay after turning off a logic port, just as
> what the kernel driver does.
> 
> Fixes: 8b5c9ec20b7b ("support VMDQ in MAC/VLAN filter")
> 
> Reported-by: Xueqin Lin 
> Signed-off-by: Xiao Wang 
Acked-by: Jing Chen 



[dpdk-dev] [PATCH] net/fm10k: fix RSS hash config

2016-07-22 Thread Chen, Jing D
Hi, Thomas,


> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Friday, July 22, 2016 4:29 PM
> To: Chen, Jing D 
> Cc: dev at dpdk.org; Wang, Xiao W ; Lin, Xueqin
> 
> Subject: Re: [dpdk-dev] [PATCH] net/fm10k: fix RSS hash config
> 
> 2016-07-22 08:23, Chen, Jing D:
> > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > 2016-07-21 09:35, Wang, Xiao W:
> > > > From: Chen, Jing D
> > > > > > --- a/drivers/net/fm10k/fm10k_ethdev.c
> > > > > > +++ b/drivers/net/fm10k/fm10k_ethdev.c
> > > > > > @@ -2159,8 +2159,8 @@ fm10k_rss_hash_update(struct rte_eth_dev
> *dev,
> > > > > >
> > > > > > PMD_INIT_FUNC_TRACE();
> > > > > >
> > > > > > -   if (rss_conf->rss_key_len < FM10K_RSSRK_SIZE *
> > > > > > -   FM10K_RSSRK_ENTRIES_PER_REG)
> > > > > > +   if (key && (rss_conf->rss_key_len < FM10K_RSSRK_SIZE *
> > > > > > +   FM10K_RSSRK_ENTRIES_PER_REG))
> > > > > > return -EINVAL;
> > > > > >
> > > > > > if (hf == 0)
> > > > >
> > > > > It's also possible that app wants to update rss key and not expect to 
> > > > > update
> hash
> > > > > function.
> > > > > Is that indicate we shouldn't return error in case hf == 0?
> > > > >
> > > >
> > > > If the app just wants to update RSS key, it needs to read out the RSS 
> > > > config first,
> > > then
> > > > change only the key field. This is what testpmd does for this operation.
> > > >
> > > > hf == 0 will disable RSS feature, I think we should return error to 
> > > > protect
> multi-
> > > queue.
> > >
> > > Jing, do you confirm we can apply this patch, please?
> > I think we need some rework or more explanations here.
> 
> It is not reasonnable to wait RC5 for such a fix.
> Either it is not important and postponed to 16.11 or you submit
> a v2 very shortly for 16.07.
> Please advise

Please kindly merge.


[dpdk-dev] [PATCH] net/fm10k: fix RSS hash config

2016-07-22 Thread Chen, Jing D
Hi, Thomas,

> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Friday, July 22, 2016 4:22 PM
> To: Chen, Jing D 
> Cc: dev at dpdk.org; Wang, Xiao W ; Lin, Xueqin
> 
> Subject: Re: [dpdk-dev] [PATCH] net/fm10k: fix RSS hash config
> 
> 2016-07-21 09:35, Wang, Xiao W:
> > From: Chen, Jing D
> > > > --- a/drivers/net/fm10k/fm10k_ethdev.c
> > > > +++ b/drivers/net/fm10k/fm10k_ethdev.c
> > > > @@ -2159,8 +2159,8 @@ fm10k_rss_hash_update(struct rte_eth_dev *dev,
> > > >
> > > > PMD_INIT_FUNC_TRACE();
> > > >
> > > > -   if (rss_conf->rss_key_len < FM10K_RSSRK_SIZE *
> > > > -   FM10K_RSSRK_ENTRIES_PER_REG)
> > > > +   if (key && (rss_conf->rss_key_len < FM10K_RSSRK_SIZE *
> > > > +   FM10K_RSSRK_ENTRIES_PER_REG))
> > > > return -EINVAL;
> > > >
> > > > if (hf == 0)
> > >
> > > It's also possible that app wants to update rss key and not expect to 
> > > update hash
> > > function.
> > > Is that indicate we shouldn't return error in case hf == 0?
> > >
> >
> > If the app just wants to update RSS key, it needs to read out the RSS 
> > config first,
> then
> > change only the key field. This is what testpmd does for this operation.
> >
> > hf == 0 will disable RSS feature, I think we should return error to protect 
> > multi-
> queue.
> 
> Jing, do you confirm we can apply this patch, please?
I think we need some rework or more explanations here.


[dpdk-dev] [PATCH] net/fm10k: fix RSS hash config

2016-07-21 Thread Chen, Jing D
Hi,

> diff --git a/drivers/net/fm10k/fm10k_ethdev.c 
> b/drivers/net/fm10k/fm10k_ethdev.c
> index 144b2de..01f4a72 100644
> --- a/drivers/net/fm10k/fm10k_ethdev.c
> +++ b/drivers/net/fm10k/fm10k_ethdev.c
> @@ -2159,8 +2159,8 @@ fm10k_rss_hash_update(struct rte_eth_dev *dev,
> 
>   PMD_INIT_FUNC_TRACE();
> 
> - if (rss_conf->rss_key_len < FM10K_RSSRK_SIZE *
> - FM10K_RSSRK_ENTRIES_PER_REG)
> + if (key && (rss_conf->rss_key_len < FM10K_RSSRK_SIZE *
> + FM10K_RSSRK_ENTRIES_PER_REG))
>   return -EINVAL;
> 
>   if (hf == 0)

It's also possible that the app wants to update the RSS key without updating
the hash function.
Does that indicate we shouldn't return an error in case hf == 0?
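
A stand-alone sketch of the two checks being discussed, to show which inputs
are rejected. The struct, the constants and rss_conf_check() are hypothetical
stand-ins (the 10 x 4 key size is an assumed value), and it assumes that the
"hf == 0" branch returns -EINVAL.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-ins for FM10K_RSSRK_SIZE and FM10K_RSSRK_ENTRIES_PER_REG. */
#define RSSRK_SIZE            10
#define RSSRK_ENTRIES_PER_REG 4
#define RSS_KEY_BYTES         (RSSRK_SIZE * RSSRK_ENTRIES_PER_REG)

/* Hypothetical, reduced version of the RSS configuration passed by the app. */
struct rss_conf {
	const uint8_t *rss_key; /* NULL means "leave the key unchanged" */
	uint8_t rss_key_len;
	uint64_t rss_hf;        /* hash function bits; 0 would disable RSS */
};

/* Return 0 if the request is acceptable, -EINVAL otherwise. */
static int rss_conf_check(const struct rss_conf *conf)
{
	/* Only validate the key length when a key is actually supplied. */
	if (conf->rss_key != NULL && conf->rss_key_len < RSS_KEY_BYTES)
		return -EINVAL;

	/* Assumed behaviour: refuse a request that would turn RSS off. */
	if (conf->rss_hf == 0)
		return -EINVAL;

	return 0;
}
```

Under these assumptions, an app that passes a valid key with rss_hf == 0 is
rejected, which is the case the question above is about.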




[dpdk-dev] [PATCH] net/i40e: revert VLAN filtering fix

2016-07-13 Thread Chen, Jing D
Hi,

> -Original Message-
> From: Wu, Jingjing
> Sent: Wednesday, July 13, 2016 6:28 PM
> To: Richardson, Bruce 
> Cc: dev at dpdk.org; Wu, Jingjing ; Shaw, Jeffrey B
> ; Zhang, Helin ; Chen,
> Jing D ; Ananyev, Konstantin
> 
> Subject: [PATCH] net/i40e: revert VLAN filtering fix
> 
> This reverts commit 4761f57d58c6f52543738dbe299f846d62d75895.
> Introducing VLAN table by adding VLAN adminq command will cause NIC's
> throughput drop obviously. It's a hardware issue.
> With this revert, VLAN filtering can only work when promiscuous mode is
> disabled.
> 
> Reverts: 4761f57d58c6 ("net/i40e: fix VLAN filtering in promiscuous mode")
> 
> Reported-by: Jeffrey Shaw 
> Signed-off-by: Jingjing Wu 
Acked-by: Jing Chen 


[dpdk-dev] [PATCH] fm10k: fix VF cannot receive broadcast traffic

2016-06-19 Thread Chen, Jing D
Hi, Bruce,

> -Original Message-
> From: Richardson, Bruce
> Sent: Friday, June 17, 2016 6:19 PM
> To: Wang, Xiao W 
> Cc: Chen, Jing D ; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] fm10k: fix VF cannot receive broadcast
> traffic
> 
> On Mon, Jun 06, 2016 at 05:00:47PM +0800, Wang Xiao W wrote:
> > When app tries promisc/allmulti setting, fm10k will check if a valid
> > glort is acquired, if not then exit without doing anything. It's a
> > long journey for VF to acquire glort info from VF to PF mailbox, PF to 
> > switch
> mailbox.
> > It could be a long interval that's out of DPDK's control. Thus, app
> > may
> 
> I think the use of "thus" here is wrong, as I suspect that the failure is not 
> due
> to the "long interval that's out of DPDK's control", but instead due to not
> having a valid glort.

The logic in the VF is that the glort ID is invalid at the beginning. When the
VF port is enabled by sending a mailbox message to the PF, the PF sends a
message back to the VF that carries no valid info. The VF then fakes a glort ID.
In this case, a sanity check for a valid glort ID is useless. Besides that, the
VF doesn't use the glort ID in any functional call at all.

> 
> > fail on promisc/allmulti setting in VF. In fact, we don't need a valid
> > glort value in VF, so this patch just skips the glort check for VF.
> >
> > Fixes: df02ba864695 ("fm10k: support promiscuous mode")
> >
> > Signed-off-by: Wang Xiao W 
> 
> I rework this commit message for you on apply. Please check the updated
> version when done.
> 
> /Bruce



[dpdk-dev] [PATCH] fm10k: fix VF cannot receive broadcast traffic

2016-06-14 Thread Chen, Jing D
Hi,

> -Original Message-
> From: Wang, Xiao W
> Sent: Monday, June 06, 2016 5:01 PM
> To: Chen, Jing D 
> Cc: dev at dpdk.org; Wang, Xiao W 
> Subject: [PATCH] fm10k: fix VF cannot receive broadcast traffic
> 
> When app tries promisc/allmulti setting, fm10k will check if a valid glort
> is acquired, if not then exit without doing anything. It's a long journey
> for VF to acquire glort info from VF to PF mailbox, PF to switch mailbox.
> It could be a long interval that's out of DPDK's control. Thus, app may
> fail on promisc/allmulti setting in VF. In fact, we don't need a valid
> glort value in VF, so this patch just skips the glort check for VF.
> 
> Fixes: df02ba864695 ("fm10k: support promiscuous mode")
> 
> Signed-off-by: Wang Xiao W 
Acked-by: Jing Chen 


[dpdk-dev] [PATCH] doc: remove reference to MATCH

2016-06-08 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

Intel stopped supporting MATCH, so remove the reference to MATCH in the
document.

Signed-off-by: Chen Jing D(Mark) 
---
 doc/guides/nics/fm10k.rst |5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/doc/guides/nics/fm10k.rst b/doc/guides/nics/fm10k.rst
index c4915d8..7fc4862 100644
--- a/doc/guides/nics/fm10k.rst
+++ b/doc/guides/nics/fm10k.rst
@@ -157,10 +157,9 @@ Switch manager
 The Intel FM1 family of NICs integrate a hardware switch and multiple host
 interfaces. The FM1 PMD driver only manages host interfaces. For the
 switch component another switch driver has to be loaded prior to to the
-FM1 PMD driver.  The switch driver can be acquired for Intel support or
-from the `Match Interface <https://github.com/match-interface>`_ project.
+FM1 PMD driver. The switch driver can be acquired from Intel support.
 Only Testpoint is validated with DPDK, the latest version that has been
-validated with DPDK2.2 is 4.1.6.
+validated with DPDK is 4.1.6.

 CRC striping
 
-- 
1.7.7.6



[dpdk-dev] [PATCH v2] fm10k: set packet type for multi-segment packets

2016-04-18 Thread Chen, Jing D
Hi,

> -Original Message-
> From: Michael Frasca [mailto:michael.frasca at oracle.com]
> Sent: Monday, April 18, 2016 8:52 PM
> To: Chen, Jing D 
> Cc: dev at dpdk.org; Michael Frasca 
> Subject: [PATCH v2] fm10k: set packet type for multi-segment packets
> 
> When building a chain of mbufs for a multi-segment packet, the packet_type
> field resides at the end of the chain. It should be copied forward to the head
> of the list.
> 
> Also, uses RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE to guard packet-type
> computation. The mbuf fields are not copied when this define is not set.
> 
> Fixes: fe65e1e1ce61 ("fm10k: add vector scatter Rx")
> 
> Signed-off-by: Michael Frasca 
Acked-by: Jing Chen 



[dpdk-dev] [PATCH] fm10k: set packet type for multi-segment packets

2016-04-18 Thread Chen, Jing D
Hi, Frasca,

> -Original Message-
> From: Michael Frasca [mailto:michael.frasca at oracle.com]
> Sent: Friday, April 15, 2016 3:32 AM
> To: Chen, Jing D
> Cc: dev at dpdk.org; Michael Frasca
> Subject: [PATCH] fm10k: set packet type for multi-segment packets
> 
> When building a chain of mbufs for a multi-segment packet, the
> packet_type field resides at the end of the chain. It should be
> copied forward to the head of the list.
> 
> Fixes: fe65e1e1ce61 ("fm10k: add vector scatter Rx")
> 
> Signed-off-by: Michael Frasca 
> ---
>  drivers/net/fm10k/fm10k_rxtx_vec.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c
> b/drivers/net/fm10k/fm10k_rxtx_vec.c
> index f8efe8f..66f126f 100644
> --- a/drivers/net/fm10k/fm10k_rxtx_vec.c
> +++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
> @@ -608,6 +608,7 @@ fm10k_reassemble_packets(struct fm10k_rx_queue
> *rxq,
>   /* it's the last packet of the set */
>   start->hash = end->hash;
>   start->ol_flags = end->ol_flags;
> + start->packet_type = end->packet_type;
>   pkts[pkt_idx++] = start;
>   start = end = NULL;
>   }
> --
> 2.5.0
Good catch. Just one comment: we only parse the packet type when
"RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE" is defined. Can we guard your change with
this macro? The same applies to "hash" and "ol_flags".

Best Regards,
Mark
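
A minimal sketch of the suggested guard. The struct mbuf here is a reduced,
hypothetical stand-in for rte_mbuf, and finish_chain() is a hypothetical
helper, not the driver's reassembly function. The macro is defined at the top
only so that the sketch exercises the guarded path. In a real build it would
come from the build configuration.

```c
#include <stdint.h>

/* Normally set by the build configuration; defined here for the sketch. */
#define RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE 1

/* Reduced stand-in for the mbuf fields touched by the patch. */
struct mbuf {
	uint32_t hash;
	uint64_t ol_flags;
	uint32_t packet_type;
};

/* Copy the per-packet metadata from the last segment to the head of the
 * chain, but only when the offload-flags parsing is compiled in. */
static void finish_chain(struct mbuf *start, const struct mbuf *end)
{
#ifdef RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE
	start->hash = end->hash;
	start->ol_flags = end->ol_flags;
	start->packet_type = end->packet_type;
#else
	/* The fields were never filled in, so there is nothing to copy. */
	(void)start;
	(void)end;
#endif
}
```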


[dpdk-dev] [PATCH v3] doc: update nic overview

2016-04-07 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

Add feature support list for fm10k, fm10k-vec, fm10kvf and
fm10kvf-vec.

Signed-off-by: Chen Jing D(Mark) 
---
v3:
 - rebase to latest repo
 - Add a few features that fm10k supports

v2:
 - fix a typo

 doc/guides/nics/overview.rst |  106 +-
 1 files changed, 53 insertions(+), 53 deletions(-)

diff --git a/doc/guides/nics/overview.rst b/doc/guides/nics/overview.rst
index 05f7f72..5208a6c 100644
--- a/doc/guides/nics/overview.rst
+++ b/doc/guides/nics/overview.rst
@@ -74,76 +74,76 @@ Most of these differences are summarized below.

 .. table:: Features availability in networking drivers

-    = = = = = = = = = = = = = = = = = = = = = = = = = = = 
= = = = = =
-   Feature  a b b b c e e e i i i i i i i i i i f f m m m n n p r 
s v v v v x
-f n n o x 1 n n 4 4 4 4 g g x x x x m m l l p f u c i 
z h i i m e
-p x x n g 0 a i 0 0 0 0 b b g g g g 1 1 x x i p l a n 
e o r r x n
-a 2 2 d b 0   c e e e e   v b b b b 0 0 4 5 p   l p g 
d s t t n v
-c x x i e 0   . v v   f e e e e k k e 
a t i i e i
-k   v n   . f f   . v v   .   
t   o o t r
-e   f g   .   .   . f f   .   
a . 3 t
-t v   v   v   v   v   
2 v
-  e   e   e   e   e
 e
-  c   c   c   c   c
 c
-    = = = = = = = = = = = = = = = = = = = = = = = = = = = 
= = = = = =
+    = = = = = = = = = = = = = = = = = = = = = = = = = = = 
= = = = = = = =
+   Feature  a b b b c e e e i i i i i i i i i i f f f f m m m n n 
p r s v v v v x
+f n n o x 1 n n 4 4 4 4 g g x x x x m m m m l l p f u 
c i z h i i m e
+p x x n g 0 a i 0 0 0 0 b b g g g g 1 1 1 1 x x i p l 
a n e o r r x n
+a 2 2 d b 0   c e e e e   v b b b b 0 0 0 0 4 5 p   l 
p g d s t t n v
+c x x i e 0   . v v   f e e e e k k k k e  
   a t i i e i
+k   v n   . f f   . v v   . v v
   t   o o t r
+e   f g   .   .   . f f   . f f
   a . 3 t
+t v   v   v   v   v   v
   2 v
+  e   e   e   e   e   e
 e
+  c   c   c   c   c   c
 c
+    = = = = = = = = = = = = = = = = = = = = = = = = = = = 
= = = = = = = =
speed capabilities
-   link status  X X   X X X X   X X X X X X   
X X X X
-   link status event  X X X X   X X X X
 X
-   queue status event  
 X
-   Rx interrupt   X X X X X X X X X X X
-   queue start/stop X   X X X X X X X X X X   
X   X X
-   MTU update   X X X   X   X X X X X X
-   jumbo frame  X X X X X X X X X   X X X X X X
-   scattered Rx X X X   X X X X X X X X X X X X   
X   X
+   link status  X X   X X X X   X X X X X X
   X X X X
+   link status event  X X X X   X X X X
 X
+   queue status event  
 X
+   Rx interrupt   X X X X X X X X X X X X X X X
+   queue start/stop X   X X X X X X X X X X X X X X
   X   X X
+   MTU update   X X X   X   X X X X X X
+   jumbo frame  X X X X X X X X X   X X X X X X X X X X
+   scattered Rx X X X   X X X X X X X X X X X X X X X X
   X   X
LRO  X X X X
-   TSO  X   X   X X X X X X X X X X
-   promiscuous mode X X   X X X X X X X X X X X   
X   X X
-   allmulticast modeX X X X X X X X X X X X X X   
X   X X
-   unicast MAC filter X   X X X X X X X X X X X
   X X
-   multicast MAC filter   X X X X X
   X X
-   RSS hash X   X X X X X X X   X X X X X X
-   RSS key update   X   X X X X X   X X X X   X
-   RSS reta update  X   X X X X X   X X X X   X
-   VMDq X X X 

[dpdk-dev] [PATCH v2] doc: update nic overview

2016-04-07 Thread Chen, Jing D
Hi, Thomas,


> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Thursday, April 07, 2016 3:54 PM
> To: Chen, Jing D
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] doc: update nic overview
> 
> Hi Mark,
> 
> I'm waiting for these small comments.
> Please could you rebase your patch and fix it if needed?
> 

Sorry, will do today.

> 2016-04-06 10:33, Thomas Monjalon:
> > 2016-04-01 16:55, Chen Jing D:
> > > -   stats per queue  X
> > >  X
> > > +   stats per queue  X
> > >  X
> >
> > I think you should fill "stats per queue"
> >
> > > BSD nic_uio  X   X X X X
> >
> > What is the issue with BSD?
> 



[dpdk-dev] [PATCH v2] doc: update nic overview

2016-04-05 Thread Chen, Jing D
Thomas,


> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Saturday, April 02, 2016 5:40 AM
> To: Chen, Jing D
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] doc: update nic overview
> 
> 2016-04-01 16:55, Chen Jing D:
> > Add feature support list for fm10k, fm10k-vec, fm10kvf and
> > fm10kvf-vec.
> 
> Please help me to understand what is fm10kvf.
> I see only one fm10k driver:
> % git grep 'struct eth_driver' drivers/net/fm10k/
> drivers/net/fm10k/fm10k_ethdev.c:static struct eth_driver rte_pmd_fm10k
> = {

You can refer to the definition below:

static const struct rte_pci_id pci_id_fm10k_map[] = {
#define RTE_PCI_DEV_ID_DECL_FM10K(vend, dev) { RTE_PCI_DEVICE(vend, dev) },
#define RTE_PCI_DEV_ID_DECL_FM10KVF(vend, dev) { RTE_PCI_DEVICE(vend, dev) },
#include "rte_pci_dev_ids.h"
{ .vendor_id = 0, /* sentinel */ },
};

As you can see, the fm10k driver manages 2 different types of devices, PF and
VF. We can say that there are 2 drivers under the fm10k directory. Aspects that
do not apply to both PF and VF use condition checks to control the execution
path. This lets the driver work with both PF and VF devices and reduces
redundant code.
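
A self-contained sketch of that arrangement: one ID table that matches both
device types, plus a condition check that selects the PF or VF path. The
device IDs are illustrative values, and pci_id, driver_matches() and is_vf()
are hypothetical names, not the DPDK definitions.

```c
#include <stdbool.h>
#include <stdint.h>

#define VENDOR_INTEL 0x8086
#define DEV_ID_PF    0x15A4 /* illustrative PF device ID */
#define DEV_ID_VF    0x15A5 /* illustrative VF device ID */

/* Reduced stand-in for a PCI ID table entry. */
struct pci_id {
	uint16_t vendor_id;
	uint16_t device_id;
};

/* One table, one driver: both the PF and the VF entries match. */
static const struct pci_id id_map[] = {
	{ VENDOR_INTEL, DEV_ID_PF },
	{ VENDOR_INTEL, DEV_ID_VF },
	{ 0, 0 }, /* sentinel */
};

/* Walk the table until the sentinel and report whether the device is ours. */
static bool driver_matches(uint16_t vendor, uint16_t device)
{
	const struct pci_id *id;

	for (id = id_map; id->vendor_id != 0; id++)
		if (id->vendor_id == vendor && id->device_id == device)
			return true;
	return false;
}

/* Condition check used to pick the PF or VF execution path at run time. */
static bool is_vf(uint16_t device)
{
	return device == DEV_ID_VF;
}
```

Code that only makes sense for one device type would then sit behind a test
such as "if (!is_vf(dev))", while everything else is shared.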


[dpdk-dev] [PATCH v2] doc: update nic overview

2016-04-01 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

Add feature support list for fm10k, fm10k-vec, fm10kvf and
fm10kvf-vec.

Signed-off-by: Chen Jing D(Mark) 
---
v2:
 - fix a typo

 doc/guides/nics/overview.rst |   86 +-
 1 files changed, 43 insertions(+), 43 deletions(-)

diff --git a/doc/guides/nics/overview.rst b/doc/guides/nics/overview.rst
index 542479a..b5d1c2c 100644
--- a/doc/guides/nics/overview.rst
+++ b/doc/guides/nics/overview.rst
@@ -74,39 +74,39 @@ Most of these differences are summarized below.

 .. table:: Features availability in networking drivers

-    = = = = = = = = = = = = = = = = = = = = = = = = = = = 
= = = = = =
-   Feature  a b b b c e e e i i i i i i i i i i f f m m m n n p r 
s v v v v x
-f n n o x 1 n n 4 4 4 4 g g x x x x m m l l p f u c i 
z h i i m e
-p x x n g 0 a i 0 0 0 0 b b g g g g 1 1 x x i p l a n 
e o r r x n
-a 2 2 d b 0   c e e e e   v b b b b 0 0 4 5 p   l p g 
d s t t n v
-c x x i e 0   . v v   f e e e e k k e 
a t i i e i
-k   v n   . f f   . v v   .   
t   o o t r
-e   f g   .   .   . f f   .   
a . 3 t
-t v   v   v   v   v   
2 v
-  e   e   e   e   e
 e
-  c   c   c   c   c
 c
-    = = = = = = = = = = = = = = = = = = = = = = = = = = = 
= = = = = =
-   link status  X   X X   
X X
-   link status eventX X
 X
-   queue status event  
 X
-   Rx interrupt X X X X
-   queue start/stop X   X   X X X X   X
+    = = = = = = = = = = = = = = = = = = = = = = = = = = = 
= = = = = = = =
+   Feature  a b b b c e e e i i i i i i i i i i f f f f m m m n n 
p r s v v v v x
+f n n o x 1 n n 4 4 4 4 g g x x x x m m m m l l p f u 
c i z h i i m e
+p x x n g 0 a i 0 0 0 0 b b g g g g 1 1 1 1 x x i p l 
a n e o r r x n
+a 2 2 d b 0   c e e e e   v b b b b 0 0 0 0 4 5 p   l 
p g d s t t n v
+c x x i e 0   . v v   f e e e e k k k k e  
   a t i i e i
+k   v n   . f f   . v v   . v v
   t   o o t r
+e   f g   .   .   . f f   . f f
   a . 3 t
+t v   v   v   v   v   v
   2 v
+  e   e   e   e   e   e
 e
+  c   c   c   c   c   c
 c
+    = = = = = = = = = = = = = = = = = = = = = = = = = = = 
= = = = = = = =
+   link status  X   X X
   X X
+   link status eventX X
 X
+   queue status event  
 X
+   Rx interrupt X X X X X X X X
+   queue start/stop X   X   X X X X X X X X
   X
MTU update   X   X
-   jumbo frame  X   X   X X X X
-   scattered Rx X   X   X X X X   X
+   jumbo frame  X   X   X X X X X X X X
+   scattered Rx X   X   X X X X X X X X
   X
LRO
-   TSO  X   X   X X X X
-   promiscuous mode X   X X X X   X
-   allmulticast modeX   X X X X   X
-   unicast MAC filter   X X X X
-   multicast MAC filter X X X X
-   RSS hash X   X   X X X X
-   RSS key update   X   X X X X
-   RSS reta update  X   X X X X
-   VMDq X X
+   TSO  X   X   X X X X X X X X
+   promiscuous mode X   X X X X X X
   X
+   allmulticast modeX   X X X X X X
   X
+   unicast MAC filter   X X X X X X
+   multicast MAC filter X X X X X X
+   RSS hash X   X   X X X X X X X X
+   RSS key update   X   X X X X

[dpdk-dev] [PATCH] doc: update nic overview

2016-04-01 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

Add feature support list for fm10k, fm10k-vec, fm10kvf and
fm10kvf-vec.

Signed-off-by: Chen Jing D(Mark) 
---
 doc/guides/nics/overview.rst |   86 +-
 1 files changed, 43 insertions(+), 43 deletions(-)

diff --git a/doc/guides/nics/overview.rst b/doc/guides/nics/overview.rst
index 542479a..4f6ad9e 100644
--- a/doc/guides/nics/overview.rst
+++ b/doc/guides/nics/overview.rst
@@ -74,39 +74,39 @@ Most of these differences are summarized below.

 .. table:: Features availability in networking drivers

-    = = = = = = = = = = = = = = = = = = = = = = = = = = = 
= = = = = =
-   Feature  a b b b c e e e i i i i i i i i i i f f m m m n n p r 
s v v v v x
-f n n o x 1 n n 4 4 4 4 g g x x x x m m l l p f u c i 
z h i i m e
-p x x n g 0 a i 0 0 0 0 b b g g g g 1 1 x x i p l a n 
e o r r x n
-a 2 2 d b 0   c e e e e   v b b b b 0 0 4 5 p   l p g 
d s t t n v
-c x x i e 0   . v v   f e e e e k k e 
a t i i e i
-k   v n   . f f   . v v   .   
t   o o t r
-e   f g   .   .   . f f   .   
a . 3 t
-t v   v   v   v   v   
2 v
-  e   e   e   e   e
 e
-  c   c   c   c   c
 c
-    = = = = = = = = = = = = = = = = = = = = = = = = = = = 
= = = = = =
-   link status  X   X X   
X X
-   link status eventX X
 X
-   queue status event  
 X
-   Rx interrupt X X X X
-   queue start/stop X   X   X X X X   X
+    = = = = = = = = = = = = = = = = = = = = = = = = = = = 
= = = = = = = =
+   Feature  a b b b c e e e i i i i i i i i i i f f f f m m m n n 
p r s v v v v x
+f n n o x 1 n n 4 4 4 4 g g x x x x m m m m l l p f u 
c i z h i i m e
+p x x n g 0 a i 0 0 0 0 b b g g g g 1 1 1 1 x x i p l 
a n e o r r x n
+a 2 2 d b 0   c e e e e   v b b b b 0 0 0 0 4 5 p   l 
p g d s t t n v
+c x x i e 0   . v v   f e e e e k k k k e  
   a t i i e i
+k   v n   . f f   . v v   . v .
   t   o o t r
+e   f g   .   .   . f f   . f .
   a . 3 t
+t v   v   v   v   v   v
   2 v
+  e   e   e   e   e   e
 e
+  c   c   c   c   c   c
 c
+    = = = = = = = = = = = = = = = = = = = = = = = = = = = 
= = = = = = = =
+   link status  X   X X
   X X
+   link status eventX X
 X
+   queue status event  
 X
+   Rx interrupt X X X X X X X X
+   queue start/stop X   X   X X X X X X X X
   X
MTU update   X   X
-   jumbo frame  X   X   X X X X
-   scattered Rx X   X   X X X X   X
+   jumbo frame  X   X   X X X X X X X X
+   scattered Rx X   X   X X X X X X X X
   X
LRO
-   TSO  X   X   X X X X
-   promiscuous mode X   X X X X   X
-   allmulticast modeX   X X X X   X
-   unicast MAC filter   X X X X
-   multicast MAC filter X X X X
-   RSS hash X   X   X X X X
-   RSS key update   X   X X X X
-   RSS reta update  X   X X X X
-   VMDq X X
+   TSO  X   X   X X X X X X X X
+   promiscuous mode X   X X X X X X
   X
+   allmulticast modeX   X X X X X X
   X
+   unicast MAC filter   X X X X X X
+   multicast MAC filter X X X X X X
+   RSS hash X   X   X X X X X X X X
+   RSS key update   X   X X X X X X X X
+  

[dpdk-dev] [PATCH] fm10k: conditionally disable RSS during device initialization

2016-03-31 Thread Chen, Jing D
Thomas,

We've agreed offline that the patch works without side effects.
Please kindly apply it if possible.

> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Thursday, March 31, 2016 9:57 PM
> To: dev at dpdk.org
> Cc: Michael Frasca ; Chen, Jing D
> 
> Subject: Re: [dpdk-dev] [PATCH] fm10k: conditionally disable RSS during
> device initialization
> 
> Please, anyone to confirm that the patch is valid and must be applied?
> This discussion shows some doubts.
> 
> 
> 2016-03-24 13:35, Michael Frasca:
> > Jing,
> >
> > Thanks for your assistance. The experiment that you have built should
> > allow you to observe the bug. In [5], I would expect that queue 0
> > receives roughly 1/4 of the packets that you sending, assuming the
> > input packets have varied IP addresses. Can you measure what % of
> > packets are actually being received in this single queue setup (after first
> running a 4-queue setup)?
> >
> > When trying to running with only one RX queue, the fm10k retains the
> > same RSS hash function and redirection table that was configured from
> > a previous run. As a result, some packets are still being directed to
> > other receive queues. I have confirmed this by polling the queue
> > specific stats, which I retrieved via rte_eth_xstats_get().
> >
> > Looking at fm10k_dev_rss_configure(), one should see that there is no
> > modification of fm10k registers when nb_rx_queues == 1. As far as I
> > can tell, this is the reason that only a certain partition of packets
> > are being receive in a single queue setup (after first running a multi-queue
> configuration).
> >
> > I am unable to access my development environment today, but if you
> > need, I can later craft a patch to l3fwd that shows the measurement of
> > packets received at each queue.
> >
> > Thanks,
> > Mike
> >
> >
> > > On Mar 24, 2016, at 2:40 AM, Chen, Jing D  
> > > wrote:
> > >
> > > Hi, Frasca,
> > >
> > >
> > >
> > >> -Original Message-
> > >> From: Michael Frasca [mailto:michael.frasca at oracle.com
> > >> <mailto:michael.frasca at oracle.com>]
> > >> Sent: Wednesday, March 23, 2016 9:43 PM
> > >> To: Chen, Jing D
> > >> Cc: dev at dpdk.org <mailto:dev at dpdk.org>
> > >> Subject: Re: [PATCH] fm10k: conditionally disable RSS during device
> > >> initialization
> > >>
> > >> Hi Jing,
> > >>
> > >> I ran into this issue while trying to run experiments with
> > >> different RSS configurations (no RSS being one cases). It is not
> > >> clear to me that setting this register to zero is the best way to disable
> RSS.
> > >>
> > >> After digging further, I have a theory that I'm having this issues
> > >> because I've only attached my DPDK application to SR-IOV ports. In
> > >> fm10k_dev_dglort_map_configure(), I see that 'RSS Length' is set
> > >> for the DGLORT decoder. However, it appears that this is only
> > >> invoked for physical functions.
> > >>
> > >> Could this be my problem? Is it required that I bind to the
> > >> physical function if I want to properly manipulate RSS?
> > >>
> > >> Thanks,
> > >> Mike
> > >>
> > > I don't know exactly what problem you ran into. I think we needn't
> > > worry about those DGLORT setting when using VF device.
> > >
> > > I've followed steps to use SRIOV device with RSS enabled and
> > > disabled, both are worked well from my side after applying your patch.
> Below is my setup.
> > >
> > > 1. PF with Linux driver "fm10k-next_0.19.3".
> > > 2. DPDK with latest code from master branch, apply your patch.
> > > 3. Use 1 VF device created by kernel driver.
> > > 4. use l3fwd with " ./examples/l3fwd/build/l3fwd -c fc -n 4 -- -p 0x1 --
> config="(0,0,2),(0,1,2),(0,2,3),(0,3,3)""
> > >with RSS enabled. After sending packets, I can see all 4 queues 
> > > received
> packets.
> > > 5. use l3fwd with " ./examples/l3fwd/build/l3fwd -c fc -n 4 -- -p 0x1 --
> config="(0,0,2)""
> > >with RSS disabled. After sending packets, I can see queue 0 received
> packets.
> > >
> > > Can you explain what actual problem is?
> > > We can talk offline.
> >
> 



[dpdk-dev] [PATCH] fm10k: conditionally disable RSS during device initialization

2016-03-24 Thread Chen, Jing D
Hi, Frasca,



> -Original Message-
> From: Michael Frasca [mailto:michael.frasca at oracle.com]
> Sent: Wednesday, March 23, 2016 9:43 PM
> To: Chen, Jing D
> Cc: dev at dpdk.org
> Subject: Re: [PATCH] fm10k: conditionally disable RSS during device
> initialization
> 
> Hi Jing,
> 
> I ran into this issue while trying to run experiments with different RSS
> configurations (no RSS being one cases). It is not clear to me that setting 
> this
> register to zero is the best way to disable RSS.
> 
> After digging further, I have a theory that I'm having this issues because 
> I've
> only attached my DPDK application to SR-IOV ports. In
> fm10k_dev_dglort_map_configure(), I see that 'RSS Length' is set for the
> DGLORT
> decoder. However, it appears that this is only invoked for physical functions.
> 
> Could this be my problem? Is it required that I bind to the physical function
> if I want to properly manipulate RSS?
> 
> Thanks,
> Mike
> 
I don't know exactly what problem you ran into. I think we needn't worry about
those DGLORT settings when using a VF device.

I followed the steps below to use an SR-IOV device with RSS enabled and disabled;
both worked well on my side after applying your patch. Below is my setup.

1. PF with Linux driver "fm10k-next_0.19.3".
2. DPDK with the latest code from the master branch, with your patch applied.
3. Use 1 VF device created by the kernel driver.
4. Use l3fwd with "./examples/l3fwd/build/l3fwd -c fc -n 4 -- -p 0x1 --config="(0,0,2),(0,1,2),(0,2,3),(0,3,3)""
   with RSS enabled. After sending packets, I can see that all 4 queues received packets.
5. Use l3fwd with "./examples/l3fwd/build/l3fwd -c fc -n 4 -- -p 0x1 --config="(0,0,2)""
   with RSS disabled. After sending packets, I can see that queue 0 received packets.

Can you explain what the actual problem is?
We can talk offline.


[dpdk-dev] [PATCH] fm10k: conditionally disable RSS during device initialization

2016-03-23 Thread Chen, Jing D
Hi,

> -Original Message-
> From: Michael Frasca [mailto:michael.frasca at oracle.com]
> Sent: Wednesday, March 23, 2016 12:58 AM
> To: Chen, Jing D
> Cc: dev at dpdk.org; Michael Frasca
> Subject: [PATCH] fm10k: conditionally disable RSS during device initialization
> 
> If the provided configuration does not call for RSS, then RSS is
> explicitly disabled. Without this change, the device continues to
> operate under the previous RSS configuration.
> 
> Fixes: 57033cdf8fdc ("fm10k: add PF RSS")
> 
> Signed-off-by: Michael Frasca 
Acked-by : Jing Chen 



[dpdk-dev] [PATCH v11 5/8] ethdev: add speed capabilities

2016-03-18 Thread Chen, Jing D
Hi,

Best Regards,
Mark


> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Friday, March 18, 2016 2:09 AM
> To: marcdevel at gmail.com; Richardson, Bruce; Doherty, Declan; Ananyev,
> Konstantin; Lu, Wenzhuo; Zhang, Helin; Chen, Jing D; harish.patil at 
> qlogic.com;
> rahul.lakkireddy at chelsio.com; johndale at cisco.com; vido at cesnet.cz;
> adrien.mazarguil at 6wind.com; alejandro.lucero at netronome.com
> Cc: dev at dpdk.org
> Subject: [PATCH v11 5/8] ethdev: add speed capabilities
> 
> From: Marc Sune 
> 
> The speed capabilities of a device can be retrieved with
> rte_eth_dev_info_get().
> 
> The new field speed_capa is initialized in the drivers without
> taking care of device characteristics in this patch.
> When the capabilities of a driver are accurate, the table in
> overview.rst must be filled.
> 
> Signed-off-by: Marc Sune 
> ---
>  doc/guides/nics/overview.rst   |  1 +
>  doc/guides/rel_notes/release_16_04.rst |  8 
>  drivers/net/bnx2x/bnx2x_ethdev.c   |  1 +
>  drivers/net/cxgbe/cxgbe_ethdev.c   |  1 +
>  drivers/net/e1000/em_ethdev.c  |  4 
>  drivers/net/e1000/igb_ethdev.c |  4 
>  drivers/net/fm10k/fm10k_ethdev.c   |  4 
>  drivers/net/i40e/i40e_ethdev.c |  8 
>  drivers/net/ixgbe/ixgbe_ethdev.c   |  8 
>  drivers/net/mlx4/mlx4.c|  2 ++
>  drivers/net/mlx5/mlx5_ethdev.c |  3 +++
>  drivers/net/nfp/nfp_net.c  |  2 ++
>  lib/librte_ether/rte_ethdev.h  | 21 +
>  13 files changed, 67 insertions(+)
> 
> 
>  static void
> diff --git a/drivers/net/fm10k/fm10k_ethdev.c
> b/drivers/net/fm10k/fm10k_ethdev.c
> index edc8c11..2a1c222 100644
> --- a/drivers/net/fm10k/fm10k_ethdev.c
> +++ b/drivers/net/fm10k/fm10k_ethdev.c
> @@ -1410,6 +1410,10 @@ fm10k_dev_infos_get(struct rte_eth_dev *dev,
>   .nb_min = FM10K_MIN_TX_DESC,
>   .nb_align = FM10K_MULT_TX_DESC,
>   };
> +
> + dev_info->speed_capa = ETH_LINK_SPEED_1G |
> ETH_LINK_SPEED_2_5G |
> + ETH_LINK_SPEED_10G | ETH_LINK_SPEED_25G |
> + ETH_LINK_SPEED_40G;
>  }
> 

fm10k has 100G capability, so we'd better add ETH_LINK_SPEED_100G here.
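
As a rough illustration of how such a capability bitmap is built and queried, here is a minimal sketch. The `SPEED_*` constants use made-up bit positions and are not the real `ETH_LINK_SPEED_*` definitions; `example_speed_capa()` and `speed_supported()` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit positions only; NOT the real ETH_LINK_SPEED_* values. */
#define SPEED_1G    (1u << 0)
#define SPEED_2_5G  (1u << 1)
#define SPEED_10G   (1u << 2)
#define SPEED_25G   (1u << 3)
#define SPEED_40G   (1u << 4)
#define SPEED_100G  (1u << 5)

/* Capability mask as quoted in the patch, plus the suggested 100G bit. */
static uint32_t example_speed_capa(void)
{
    return SPEED_1G | SPEED_2_5G | SPEED_10G | SPEED_25G |
           SPEED_40G | SPEED_100G;
}

/* Return 1 when every bit in 'speed' is present in 'capa', else 0. */
static int speed_supported(uint32_t capa, uint32_t speed)
{
    assert(speed != 0);  /* an empty query would trivially match */
    return (capa & speed) == speed;
}
```

An application would test the mask it gets back from the device info query in the same way, one speed bit at a time.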



[dpdk-dev] [PATCH v3 00/18] fm10k: update shared code

2016-03-08 Thread Chen, Jing D
Hi, Xiao

> -Original Message-
> From: Wang, Xiao W
> Sent: Tuesday, March 8, 2016 8:15 AM
> To: Richardson, Bruce ; Chen, Jing D
> 
> Cc: Chen, Jing D ; dev at dpdk.org; He, Shaopeng
> 
> Subject: RE: [PATCH v3 00/18] fm10k: update shared code
> 
> 
> 
> > -Original Message-
> > From: Richardson, Bruce
> > Sent: Tuesday, March 8, 2016 9:24 PM
> > To: Wang, Xiao W ; Chen, Jing D
> > 
> > Cc: Chen, Jing D ; dev at dpdk.org; He, Shaopeng
> > 
> > Subject: Re: [PATCH v3 00/18] fm10k: update shared code
> >
> > On Fri, Feb 19, 2016 at 07:06:47PM +0800, Wang Xiao W wrote:
> > > v3:
> > > * Fixed checkpatch.pl warning about long commit message.
> > > * Fixed the issue of compile failure about part of patches applied.
> > > * Split the misc-small-fixes patch into several patches.
> > >
> > > v2:
> > > * Put the two extra fix patches ahead of the base code patches.
> > >
> > > This patch set has passed regression test.
> > >
> > > Wang Xiao W (18):
> > >   fm10k: use default mailbox message handler for PF
> > >   fm10k/base: correct typecast in fm10k_update_xc_addr_pf
> > >   fm10k/base: cleanup namespace pollution
> > >   fm10k/base: use bitshift for itr_scale
> > >   fm10k/base: reset max_queues on init_hw_vf failure
> > >   fm10k/base: document ITR scale workaround in VF TDLEN register
> > >   fm10k/base: cleanup lines over 80 characters
> > >   fm10k/base: cleanup useless else
> > >   fm10k/base: use BIT macro instead of open-coded bit-shifting of 1
> > >   fm10k/base: do not use CamelCase
> > >   fm10k/base: use memcpy for mac addr copy
> > >   fm10k/base: allow removal of is_slot_appropriate function
> > >   fm10k/base: consistently use VLAN ID when referencing vid variables
> > >   fm10k/base: imporve comment per upstream review changes
> > >   fm10k/base: fix TLV structures alignment
> > >   fm10k/base: move constants to the right of binary operators
> > >   fm10k/base: minor cleanups
> > >   fm10k/base: remove unused struct element
> > >
> > >  drivers/net/fm10k/base/fm10k_api.c   |   2 +
> > >  drivers/net/fm10k/base/fm10k_api.h   |   2 +
> > >  drivers/net/fm10k/base/fm10k_mbx.c   |  63 +++-
> > >  drivers/net/fm10k/base/fm10k_mbx.h   |  11 +--
> > >  drivers/net/fm10k/base/fm10k_osdep.h |  32 ++
> > >  drivers/net/fm10k/base/fm10k_pf.c|  88 +
> > >  drivers/net/fm10k/base/fm10k_pf.h|  18 ++--
> > >  drivers/net/fm10k/base/fm10k_tlv.c   |  40 
> > >  drivers/net/fm10k/base/fm10k_tlv.h   |   9 +-
> > >  drivers/net/fm10k/base/fm10k_type.h  | 182 +++-
> ---
> > >  drivers/net/fm10k/base/fm10k_vf.c|  32 --
> > >  drivers/net/fm10k/fm10k_ethdev.c |  41 +++-
> > >  12 files changed, 222 insertions(+), 298 deletions(-)
> > >
> > > --
> > > 1.9.3
> > >
> > Hi Mark,
> >
> > Can we get fm10k maintainer review and/or ack on this patchset please.
> >
> > Thanks,
> > /Bruce
> 
> Hi Bruce,
> 
> Mark has reviewed and acked the patch set in v2, and I put the "Acked-by "
> in the v3 01/18 patch.
> It's the same for my FTAG patch.
> 

It's better to add the Acked-by in both the patch set and the cover letter; this may be more
helpful for maintainers.

> Best Regards,
> Xiao


[dpdk-dev] [PATCH v4 05/12] pmd/fm10k: add dev_ptype_info_get implementation

2016-03-02 Thread Chen, Jing D
Hi,

-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Jianfeng Tan
Sent: Thursday, February 25, 2016 6:09 PM
To: dev at dpdk.org
Subject: [dpdk-dev] [PATCH v4 05/12] pmd/fm10k: add dev_ptype_info_get 
implementation

Signed-off-by: Jianfeng Tan 
---
 drivers/net/fm10k/fm10k_ethdev.c   | 50 ++
 drivers/net/fm10k/fm10k_rxtx.c |  3 +++
 drivers/net/fm10k/fm10k_rxtx_vec.c |  3 +++
 3 files changed, 56 insertions(+)

diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 421266b..429cbdd 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -1335,6 +1335,55 @@ fm10k_dev_infos_get(struct rte_eth_dev *dev,
};
 }

+#ifdef RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE
+static const uint32_t *
+fm10k_dev_ptype_info_get(struct rte_eth_dev *dev) {
+   if (dev->rx_pkt_burst == fm10k_recv_pkts ||
+   dev->rx_pkt_burst == fm10k_recv_scattered_pkts) {
+   static uint32_t ptypes[] = {
+   /* refers to rx_desc_to_ol_flags() */
+   RTE_PTYPE_L2_ETHER,
+   RTE_PTYPE_L3_IPV4,
+   RTE_PTYPE_L3_IPV4_EXT,
+   RTE_PTYPE_L3_IPV6,
+   RTE_PTYPE_L3_IPV6_EXT,
+   RTE_PTYPE_L4_TCP,
+   RTE_PTYPE_L4_UDP,
+   RTE_PTYPE_UNKNOWN
+   };
+
+   return ptypes;
+   } else if (dev->rx_pkt_burst == fm10k_recv_pkts_vec ||
+  dev->rx_pkt_burst == fm10k_recv_scattered_pkts_vec) {
+   static uint32_t ptypes_vec[] = {
+   /* refers to fm10k_desc_to_pktype_v() */
+   RTE_PTYPE_L3_IPV4,
+   RTE_PTYPE_L3_IPV4_EXT,
+   RTE_PTYPE_L3_IPV6,
+   RTE_PTYPE_L3_IPV6_EXT,
+   RTE_PTYPE_L4_TCP,
+   RTE_PTYPE_L4_UDP,
+   RTE_PTYPE_TUNNEL_GENEVE,
+   RTE_PTYPE_TUNNEL_NVGRE,
+   RTE_PTYPE_TUNNEL_VXLAN,
+   RTE_PTYPE_TUNNEL_GRE,
+   RTE_PTYPE_UNKNOWN
+   };
+
+   return ptypes_vec;
+   }
+
+   return NULL;
+}
May I know when "fm10k_dev_ptype_info_get" will be called? In fm10k, the actual
Rx/Tx function is decided only after the port is started.


[dpdk-dev] [PATCH v3] doc: add Vector FM10K introductions

2016-02-26 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

Add introductions on how to enable Vector FM10K Rx/Tx functions,
the preconditions and assumptions on Rx/Tx configuration parameters.
The new content also lists the limitations of vector, so app/customer
can do better to select best Rx/Tx functions.

Signed-off-by: Chen Jing D(Mark) 
---
v3:
 - rebase to dpdk-next-16.04
 - Minor change to reword a few sentences.

v2:
 - rebase to latest repo
 - Reword a few sentences that not follow coding style.

 doc/guides/nics/fm10k.rst |   98 +
 1 files changed, 98 insertions(+), 0 deletions(-)

diff --git a/doc/guides/nics/fm10k.rst b/doc/guides/nics/fm10k.rst
index 4206b7f..b97f611 100644
--- a/doc/guides/nics/fm10k.rst
+++ b/doc/guides/nics/fm10k.rst
@@ -35,6 +35,104 @@ The FM10K poll mode driver library provides support for the Intel FM1
 (FM10K) family of 40GbE/100GbE adapters.


+Vector PMD for FM10K
+--------------------
+
+Vector PMD (vPMD) uses Intel® SIMD instructions to optimize packet I/O.
+It improves load/store bandwidth efficiency of L1 data cache by using a wider
+SSE/AVX ''register (1)''.
+The wider register gives space to hold multiple packet buffers so as to save
+on the number of instructions when bulk processing packets.
+
+There is no change to the PMD API. The RX/TX handlers are the only two entries for
+vPMD packet I/O. They are transparently registered at runtime RX/TX execution
+if all required conditions are met.
+
+1.  To date, only an SSE version of FM10K vPMD is available.
+To ensure that vPMD is in the binary code, set
+``CONFIG_RTE_LIBRTE_FM10K_INC_VECTOR=y`` in the configure file.
+
+Some constraints apply as pre-conditions for specific optimizations on bulk
+packet transfers. The following sections explain RX and TX constraints in the
+vPMD.
+
+
+RX Constraints
+~~~~~~~~~~~~~~
+
+
+Prerequisites and Pre-conditions
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+For Vector RX it is assumed that the number of descriptor rings will be a power
+of 2. With this pre-condition, the ring pointer can easily scroll back to the
+head after hitting the tail without a conditional check. In addition Vector RX
+can use this assumption to do a bit mask using ``ring_size - 1``.
+
+
+Features not Supported by Vector RX PMD
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Some features are not supported when trying to increase the throughput in
+vPMD. They are:
+
+*   IEEE1588
+
+*   Flow director
+
+*   Header split
+
+*   RX checksum offload
+
+Other features are supported using optional MACRO configuration. They include:
+
+*   HW VLAN strip
+
+*   L3/L4 packet type
+
+To enable via ``RX_OLFLAGS`` use ``RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE=y``.
+
+To guarantee the constraint, the following configuration flags in ``dev_conf.rxmode``
+will be checked:
+
+*   ``hw_vlan_extend``
+
+*   ``hw_ip_checksum``
+
+*   ``header_split``
+
+*   ``fdir_conf->mode``
+
+
+RX Burst Size
+^^^^^^^^^^^^^
+
+As vPMD is focused on high throughput, it processes 4 packets at a time. So it assumes
+that the RX burst should be greater than 4 packets per burst. It returns zero if using
+``nb_pkt`` < 4 in the receive handler. If ``nb_pkt`` is not a multiple of 4, a
+floor alignment will be applied.
+
+
+TX Constraint
+~~~~~~~~~~~~~
+
+Features not Supported by TX Vector PMD
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+TX vPMD only works when ``txq_flags`` is set to ``FM10K_SIMPLE_TX_FLAG``.
+This means that it does not support TX multi-segment, VLAN offload or TX csum
+offload. The following MACROs are used for these three features:
+
+*   ``ETH_TXQ_FLAGS_NOMULTSEGS``
+
+*   ``ETH_TXQ_FLAGS_NOVLANOFFL``
+
+*   ``ETH_TXQ_FLAGS_NOXSUMSCTP``
+
+*   ``ETH_TXQ_FLAGS_NOXSUMUDP``
+
+*   ``ETH_TXQ_FLAGS_NOXSUMTCP``
+
 Limitations
 -----------

-- 
1.7.7.6
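
The two RX pre-conditions in the documentation patch above (a power-of-two ring size so the index can wrap with a ``ring_size - 1`` mask, and floor alignment of the burst size to a multiple of 4) can be sketched in a few lines. This is a minimal illustration, not code from the fm10k driver: `VEC_BURST`, `ring_next()` and `burst_floor_align()` are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Packets handled per vector iteration, as stated in the documentation. */
#define VEC_BURST 4u

/*
 * Advance a ring index by 'step' and wrap it without a conditional.
 * Only valid when ring_size is a power of two, so that ring_size - 1
 * is a mask covering every valid index.
 */
static inline uint16_t ring_next(uint16_t idx, uint16_t step,
                                 uint16_t ring_size)
{
    assert(ring_size != 0 && (ring_size & (ring_size - 1)) == 0);
    return (uint16_t)((idx + step) & (ring_size - 1));
}

/*
 * Floor-align a requested burst to a multiple of VEC_BURST.
 * Requests smaller than VEC_BURST become 0, matching the documented
 * "returns zero if nb_pkt < 4" behaviour.
 */
static inline uint16_t burst_floor_align(uint16_t nb_pkts)
{
    return (uint16_t)(nb_pkts & ~(VEC_BURST - 1));
}
```

With a 512-entry ring, advancing index 510 by 4 wraps to 2; a request for 7 packets is trimmed to 4, and a request for 3 packets yields 0.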



[dpdk-dev] [PATCH v3 1/3] fm10k: enable FTAG based forwarding

2016-02-25 Thread Chen, Jing D
Hi, Bruce,

> -Original Message-
> From: Richardson, Bruce
> Sent: Thursday, February 25, 2016 9:35 PM
> To: Chen, Jing D 
> Cc: Thomas Monjalon ; Wang, Xiao W
> ; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 1/3] fm10k: enable FTAG based
> forwarding
> 
> On Thu, Feb 25, 2016 at 10:04:02AM +, Chen, Jing D wrote:
> > Hi, Bruce, Thomas,
> >
> > Best Regards,
> > Mark
> >
> > > -Original Message-
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas
> Monjalon
> > > Sent: Thursday, February 25, 2016 12:38 AM
> > > To: Richardson, Bruce; Wang, Xiao W
> > > Cc: dev at dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH v3 1/3] fm10k: enable FTAG based
> > > forwarding
> > >
> > > 2016-02-24 15:42, Bruce Richardson:
> > > > On Thu, Feb 04, 2016 at 11:38:47AM +0800, Wang Xiao W wrote:
> > > > > This patch enables reading sglort info into mbuf for RX and
> > > > > inserting an FTAG at the beginning of the packet for TX. The
> > > > > vlan_tci_outer field selected from rte_mbuf structure for sglort is 
> > > > > not
> used in fm10k now.
> > > > > In FTAG based forwarding mode, the switch will forward packets
> > > according
> > > > > to glort info in FTAG rather than mac and vlan table.
> > > > >
> > > > > To activate this feature, user needs to turn
> > > ``CONFIG_RTE_LIBRTE_FM10K_FTAG_FWD``
> > > > > to y in common_linuxapp or common_bsdapp. Currently this feature
> > > > > is
> > > supported
> > > > > only on PF, because FM10K_PFVTCTL register is read-only for VF.
> > > > >
> > > > > Signed-off-by: Wang Xiao W 
> > > >
> > > > Any comments on this patch?
> > > >
> > > > My thoughts: is there a way in which this could be done without
> > > > adding in a
> > > new
> > > > build time config option?
> > >
> > > Bruce, it's simpler to explain that build time options are forbidden
> > > to enable such options.
> > > Or the terrific kid's approach: one day, the Big Build-Option Eater
> > > will come and will eat every undecided features! ;)
> >
> > This feature is trying to use FTAG (a unique tech in fm10k) instead of
> > mac/vlan to forward packets. App need a way to tell PMD driver that
> > which forwarding style it would like to use.
> 
> Why not just specify this in the port configuration at setup time?
> 

Please educate me. I think the port configuration flags are also common to all PMD
drivers. Is it possible to add a flag like "RTE_USE_FTAG" and pass it to the PMD driver?

> > So, the best option is to let packets carry a flag in mbuf to tell drive in 
> > fast
> path.
> > You can see that this is unique to fm10k and we thought community
> > won't like to see this flag introduced into mbuf. If you do agree, we can
> introduce a new flag.
> 
> Why does it need to be specified per-mbuf? The existing config flag added is
> global, so a per-mbuf flag shouldn't be needed to get equivalent behaviour.
> 
> > So, we step backwards and assume customer will use static
> > configurations to enable this feature. After it is enabled, we'll assume app
> will use FTAG for all packets.
> 
> Yes, but instead of compile time option, why not port config-time option
> instead?
> 
> /Bruce


[dpdk-dev] [PATCH v3 1/3] fm10k: enable FTAG based forwarding

2016-02-25 Thread Chen, Jing D
Hi, Bruce, Thomas,

Best Regards,
Mark

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas Monjalon
> Sent: Thursday, February 25, 2016 12:38 AM
> To: Richardson, Bruce; Wang, Xiao W
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 1/3] fm10k: enable FTAG based
> forwarding
> 
> 2016-02-24 15:42, Bruce Richardson:
> > On Thu, Feb 04, 2016 at 11:38:47AM +0800, Wang Xiao W wrote:
> > > This patch enables reading sglort info into mbuf for RX and inserting
> > > an FTAG at the beginning of the packet for TX. The vlan_tci_outer field
> > > selected from rte_mbuf structure for sglort is not used in fm10k now.
> > > In FTAG based forwarding mode, the switch will forward packets
> according
> > > to glort info in FTAG rather than mac and vlan table.
> > >
> > > To activate this feature, user needs to turn
> ``CONFIG_RTE_LIBRTE_FM10K_FTAG_FWD``
> > > to y in common_linuxapp or common_bsdapp. Currently this feature is
> supported
> > > only on PF, because FM10K_PFVTCTL register is read-only for VF.
> > >
> > > Signed-off-by: Wang Xiao W 
> >
> > Any comments on this patch?
> >
> > My thoughts: is there a way in which this could be done without adding in a
> new
> > build time config option?
> 
> Bruce, it's simpler to explain that build time options are forbidden to
> enable such options.
> Or the terrific kid's approach: one day, the Big Build-Option Eater will come
> and will eat every undecided features! ;)

This feature uses FTAG (a technology unique to fm10k) instead of mac/vlan
to forward packets. The app needs a way to tell the PMD which forwarding
style it would like to use.
So, the best option is to let packets carry a flag in the mbuf to tell the driver in the fast path.
You can see that this is unique to fm10k, and we thought the community wouldn't like to see
this flag introduced into mbuf. If you do agree, we can introduce a new flag.
So, we stepped back and assumed the customer will use a static configuration to enable
this feature. After it is enabled, we'll assume the app will use FTAG for all packets.


[dpdk-dev] [PATCH] fm10k: optimize legacy TX func

2016-02-16 Thread Chen, Jing D
Hi,  Bruce,

> -Original Message-
> From: Richardson, Bruce
> Sent: Tuesday, February 16, 2016 11:28 PM
> To: Chen, Jing D
> Cc: Qiu, Michael; Ananyev, Konstantin; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] fm10k: optimize legacy TX func
> 
> On Thu, Jan 28, 2016 at 05:45:59PM +0800, Chen Jing D(Mark) wrote:
> > From: "Chen Jing D(Mark)" 
> >
> > When legacy TX func tries to free a bunch of mbufs, it will free them
> > one by one. This change will scan the free list and merge the requests
> > in case they belongs to same pool, then free once, which will reduce
> > cycles on freeing mbufs.
> >
> > Signed-off-by: Chen Jing D(Mark) 
> > ---
> >  doc/guides/rel_notes/release_2_3.rst |2 +
> >  drivers/net/fm10k/fm10k_rxtx.c   |   59
> -
> >  2 files changed, 52 insertions(+), 9 deletions(-)
> >
> > diff --git a/doc/guides/rel_notes/release_2_3.rst
> > b/doc/guides/rel_notes/release_2_3.rst
> > index 99de186..20ce78d 100644
> > --- a/doc/guides/rel_notes/release_2_3.rst
> > +++ b/doc/guides/rel_notes/release_2_3.rst
> > @@ -3,7 +3,9 @@ DPDK Release 2.3
> >
> >  New Features
> >  
> > +* **Optimize fm10k Tx func.**
> >
> > +  * Free multiple mbufs at a time to reduce freeing mbuf cycles.
> >
> 
> Is this really a significant enough change to warrant being called out in the
> release notes?
> Personally, I don't think so, so if you are ok with it, I'll just apply this 
> patch
> without the RN update.
> 
> /Bruce

This change gives some performance gain with the legacy TX func.
That's why I'd like to add a line in the release notes.
If you think it's not necessary, please kindly remove it.
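
For context, the idea in the quoted commit message (scan the free list, merge neighbouring buffers that belong to the same pool, and return each group with one call) can be sketched as follows. This is a self-contained model under stated assumptions, not the fm10k code: `struct pool`, `struct buf`, `pool_put_bulk()`, `free_merged()` and `MAX_BATCH` are hypothetical stand-ins for the mbuf and mempool primitives.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BATCH 32   /* hypothetical upper bound on one bulk return */

/* Hypothetical pool that only counts how it is used. */
struct pool {
    unsigned put_calls;      /* number of bulk-return calls */
    unsigned objs_returned;  /* total buffers handed back */
};

/* Hypothetical buffer that remembers its owning pool. */
struct buf {
    struct pool *pool;
};

/* Stand-in for a bulk "return to pool" primitive. */
static void pool_put_bulk(struct pool *p, struct buf **objs, unsigned n)
{
    (void)objs;
    p->put_calls++;
    p->objs_returned += n;
}

/*
 * Free bufs[0..n-1], merging consecutive buffers from the same pool
 * into a single bulk return instead of freeing them one by one.
 */
static void free_merged(struct buf **bufs, unsigned n)
{
    struct buf *batch[MAX_BATCH];
    unsigned cnt = 0;

    for (unsigned i = 0; i < n; i++) {
        assert(bufs[i] != NULL);
        /* Flush when the pool changes or the batch is full. */
        if (cnt > 0 &&
            (bufs[i]->pool != batch[0]->pool || cnt == MAX_BATCH)) {
            pool_put_bulk(batch[0]->pool, batch, cnt);
            cnt = 0;
        }
        batch[cnt++] = bufs[i];
    }
    if (cnt > 0)
        pool_put_bulk(batch[0]->pool, batch, cnt);
}
```

In the common case where every buffer in a burst comes from one pool, the whole list is returned with a single call, which is where the cycle saving would come from.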


[dpdk-dev] [PATCH v3] fm10k: fix switch manager high CPU usage

2016-02-16 Thread Chen, Jing D
Hi,

Best Regards,
Mark


> -Original Message-
> From: He, Shaopeng
> Sent: Friday, February 05, 2016 10:46 AM
> To: dev at dpdk.org
> Cc: Chen, Jing D; Wang, Xiao W; He, Shaopeng
> Subject: [PATCH v3] fm10k: fix switch manager high CPU usage
> 
> fm10k switch core uses source MAC + VID + SGLORT to do
> look up in MAC table. If no match, an exception interrupt
> will be sent to the switch manager. Too much of this kind
> of exception interrupts cause switch manager side high CPU
> usage.
> To reproduce this issue, one DPDK testpmd runs on a server
> with one fm10k NIC, mac forwards test traffic from one of
> fm10k ports to another port. The CPU usage for the switch
> manager will go up to about 20% for test traffic rate at
> 10G bps, comparing to near 0% for no test traffic.
> This patch fixes this issue. A default SGLORT is assigned
> to each TX queue. This default value works for non-VMDq mode
> and current VMDq example. For advanced VMDq usage, e.g.
> different source MAC address for different TX queue, FTAG
> forwarding function could be used to change this default
> SGLORT value.
> 
> Fixes: 9ae6068c ("fm10k: add dev start/stop")
> 
> Signed-off-by: Shaopeng He 
Acked-by : Jing Chen 




[dpdk-dev] [PATCH v2 00/16] fm10k: update shared code

2016-02-16 Thread Chen, Jing D
Hi,

Best Regards,
Mark


> -Original Message-
> From: Wang, Xiao W
> Sent: Wednesday, January 27, 2016 11:51 AM
> To: Chen, Jing D
> Cc: dev at dpdk.org; Richardson, Bruce; He, Shaopeng; Wang, Xiao W
> Subject: [PATCH v2 00/16] fm10k: update shared code
> 
> v2:
> * Put the two extra fix patches ahead of the base code patches.
> 
> Wang Xiao W (16):
>   fm10k: use default mailbox message handler for pf
>   fm10k/base: add macro definitions that are needed
>   fm10k/base: cleanup namespace pollution and correct typecast
>   fm10k/base: use bitshift for itr_scale
>   fm10k/base: reset max_queues on init_hw_vf failure
>   fm10k/base: document ITR scale workaround in VF TDLEN register
>   fm10k/base: fix checkpatch warning
>   fm10k/base: use BIT macro instead of open-coded bit-shifting of 1
>   fm10k/base: do not use CamelCase
>   fm10k/base: use memcpy for mac addr copy
>   fm10k/base: allow removal of is_slot_appropriate function
>   fm10k/base: consistently use VLAN ID when referencing vid variables
>   fm10k/base: fix comment per upstream review changes
>   fm10k/base: TLV structures must be 4byte aligned, not 1byte aligned
>   fm10k/base: move constants to the right of binary operators
>   fm10k/base: minor cleanups
> 
>  drivers/net/fm10k/base/fm10k_api.c   |   2 +
>  drivers/net/fm10k/base/fm10k_api.h   |   2 +
>  drivers/net/fm10k/base/fm10k_mbx.c   |  63 +++-
>  drivers/net/fm10k/base/fm10k_mbx.h   |  11 +--
>  drivers/net/fm10k/base/fm10k_osdep.h |  30 ++
>  drivers/net/fm10k/base/fm10k_pf.c|  88 +
>  drivers/net/fm10k/base/fm10k_pf.h|  18 ++--
>  drivers/net/fm10k/base/fm10k_tlv.c   |  40 
>  drivers/net/fm10k/base/fm10k_tlv.h   |   9 +-
>  drivers/net/fm10k/base/fm10k_type.h  | 182 +++--
> --
>  drivers/net/fm10k/base/fm10k_vf.c|  32 --
>  drivers/net/fm10k/fm10k_ethdev.c |  41 +++-
>  12 files changed, 220 insertions(+), 298 deletions(-)
> 
> --
> 1.9.3

Acked-by : Jing Chen 




[dpdk-dev] [PATCH v8 2/4] ethdev: Fill speed capability bitmaps in the PMDs

2016-02-15 Thread Chen, Jing D
Hi, Marc,

Best Regards,
Mark

> -Original Message-
> From: Nélio Laranjeiro [mailto:nelio.laranjeiro at 6wind.com]
> Sent: Monday, February 15, 2016 4:43 PM
> To: Marc Sune
> Cc: dev at dpdk.org; Lu, Wenzhuo; Zhang, Helin; Harish Patil; Chen, Jing D
> Subject: Re: [dpdk-dev] [PATCH v8 2/4] ethdev: Fill speed capability bitmaps
> in the PMDs
> 
> On Sun, Feb 14, 2016 at 11:17:37PM +0100, Marc Sune wrote:
> > Added speed capabilities to all pmds supporting physical NICs:
> >
> > * e1000
> > * ixgbe
> > * i40
> > * bnx2x
> > * cxgbe
> > * mlx4
> > * mlx5
> > * nfp
> > * fm10k
> >[...]
> > diff --git a/drivers/net/mlx5/mlx5_ethdev.c
> b/drivers/net/mlx5/mlx5_ethdev.c
> > index 1159fa3..99dac09 100644
> > --- a/drivers/net/mlx5/mlx5_ethdev.c
> > +++ b/drivers/net/mlx5/mlx5_ethdev.c
> > @@ -523,6 +523,11 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev,
> struct rte_eth_dev_info *info)
> >  * size if it is not fixed.
> >  * The API should be updated to solve this problem. */
> > info->reta_size = priv->ind_table_max_size;
> > +
> > +   info->speed_capa = ETH_SPEED_CAP_1G | ETH_SPEED_CAP_10G |
> > +   ETH_SPEED_CAP_10G |
> ETH_SPEED_CAP_40G |
> > +   ETH_SPEED_CAP_56G;
> > +
> > priv_unlock(priv);
> >  }
> 
> Hi Marc,
> 
> I have a question about this information: is it a list of the
> capabilities of the family or the capability of the NIC?
> With the ConnectX4 family we have a range of NICs that do not
> support all of these speeds.
> 
> The speeds above are also incomplete; the range is 1/10/25/40/50/100G.
> 

Fm10k also covers several cards, and different ones are designed for 
different speeds.
A better solution for fm10k is to query the NIC type (for example, BR card for 100G/40G, 
Atwood for 25/10G, etc.)
and then return the proper speed.
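
To illustrate the suggestion above, here is a minimal sketch of a per-card
lookup. It is not taken from the fm10k driver: the card_type enum, the
SPEED_CAP_* bits and card_speed_capa() are hypothetical names, and the bit
values only stand in for the ETH_SPEED_CAP_* flags proposed in the patch.

```c
#include <stdint.h>

/* Hypothetical capability bits standing in for the ETH_SPEED_CAP_* flags
 * from the patch under review; the values are illustrative only. */
#define SPEED_CAP_10G  (1u << 0)
#define SPEED_CAP_25G  (1u << 1)
#define SPEED_CAP_40G  (1u << 2)
#define SPEED_CAP_100G (1u << 3)

/* Hypothetical card identifiers; a real driver would derive this from
 * the PCI device ID or a similar hardware query. */
enum card_type {
	CARD_BR,      /* 100G/40G card mentioned in the thread */
	CARD_ATWOOD,  /* 25G/10G card mentioned in the thread */
	CARD_UNKNOWN
};

/* Return only the speeds that the given card can actually run at,
 * instead of one list for the whole NIC family. */
static uint32_t card_speed_capa(enum card_type type)
{
	switch (type) {
	case CARD_BR:
		return SPEED_CAP_40G | SPEED_CAP_100G;
	case CARD_ATWOOD:
		return SPEED_CAP_10G | SPEED_CAP_25G;
	default:
		return 0; /* unknown card: advertise nothing */
	}
}
```

A dev_infos_get style callback could then assign the result to
info->speed_capa once the card type is known.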

> --
> Nélio Laranjeiro
> 6WIND


[dpdk-dev] [PATCH v2] fm10k: handle err flags in vector RX func

2016-02-06 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

Use SSE instructions to parse the error flags in the HW Rx descriptor,
then set the corresponding bits in the mbuf.

Signed-off-by: Chen Jing D(Mark) 
---
v2:
 - rebase to latest repo
 - fix a typo in the processing of HBO and IXE error flags

 doc/guides/rel_notes/release_2_3.rst |4 +++
 drivers/net/fm10k/fm10k_rxtx_vec.c   |   42 +-
 2 files changed, 45 insertions(+), 1 deletions(-)

diff --git a/doc/guides/rel_notes/release_2_3.rst 
b/doc/guides/rel_notes/release_2_3.rst
index 7945694..6715351 100644
--- a/doc/guides/rel_notes/release_2_3.rst
+++ b/doc/guides/rel_notes/release_2_3.rst
@@ -39,6 +39,10 @@ This section should contain new features added in this 
release. Sample format:

   Enabled virtio 1.0 support for virtio pmd driver.

+* **Handle error flags in fm10k vector RX func**
+
+  * Parse err flags in Rx desc and set error bits in mbuf with vector 
instructions.
+

 Resolved Issues
 ---------------
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c 
b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 2a57eef..9f178db 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -61,11 +61,17 @@ fm10k_reset_tx_queue(struct fm10k_tx_queue *txq);
 #define L3TYPE_SHIFT (4)
 /* L4 type shift */
 #define L4TYPE_SHIFT (7)
+/* HBO flag shift */
+#define HBOFLAG_SHIFT (10)
+/* RXE flag shift */
+#define RXEFLAG_SHIFT (13)
+/* IPE/L4E flag shift */
+#define L3L4EFLAG_SHIFT (14)

 static inline void
 fm10k_desc_to_olflags_v(__m128i descs[4], struct rte_mbuf **rx_pkts)
 {
-   __m128i ptype0, ptype1, vtag0, vtag1;
+   __m128i ptype0, ptype1, vtag0, vtag1, eflag0, eflag1, cksumflag;
union {
uint16_t e[4];
uint64_t dword;
@@ -81,12 +87,29 @@ fm10k_desc_to_olflags_v(__m128i descs[4], struct rte_mbuf 
**rx_pkts)
0x0000, 0x0000, 0x0000, 0x0000,
0x000F, 0x000F, 0x000F, 0x000F);

+   /* mask for HBO and RXE flags */
+   const __m128i rxe_msk = _mm_set_epi16(
+   0x0000, 0x0000, 0x0000, 0x0000,
+   0x0001, 0x0001, 0x0001, 0x0001);
+
+   const __m128i l3l4cksum_flag = _mm_set_epi8(0, 0, 0, 0,
+   0, 0, 0, 0,
+   0, 0, 0, 0,
+   PKT_RX_IP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD,
+   PKT_RX_IP_CKSUM_BAD, PKT_RX_L4_CKSUM_BAD, 0);
+
+   const __m128i rxe_flag = _mm_set_epi8(0, 0, 0, 0,
+   0, 0, 0, 0,
+   0, 0, 0, 0,
+   0, 0, PKT_RX_RECIP_ERR, 0);
+
/* map rss type to rss hash flag */
const __m128i rss_flags = _mm_set_epi8(0, 0, 0, 0,
0, 0, 0, PKT_RX_RSS_HASH,
PKT_RX_RSS_HASH, 0, PKT_RX_RSS_HASH, 0,
PKT_RX_RSS_HASH, PKT_RX_RSS_HASH, PKT_RX_RSS_HASH, 0);

+   /* Calculate RSS_hash and Vlan fields */
ptype0 = _mm_unpacklo_epi16(descs[0], descs[1]);
ptype1 = _mm_unpacklo_epi16(descs[2], descs[3]);
vtag0 = _mm_unpackhi_epi16(descs[0], descs[1]);
@@ -97,10 +120,27 @@ fm10k_desc_to_olflags_v(__m128i descs[4], struct rte_mbuf 
**rx_pkts)
ptype0 = _mm_shuffle_epi8(rss_flags, ptype0);

vtag1 = _mm_unpacklo_epi32(vtag0, vtag1);
+   eflag0 = vtag1;
+   cksumflag = vtag1;
vtag1 = _mm_srli_epi16(vtag1, VP_SHIFT);
vtag1 = _mm_and_si128(vtag1, pkttype_msk);

vtag1 = _mm_or_si128(ptype0, vtag1);
+
+   /* Process err flags, simply set RECIP_ERR bit if HBO/IXE is set */
+   eflag1 = _mm_srli_epi16(eflag0, RXEFLAG_SHIFT);
+   eflag0 = _mm_srli_epi16(eflag0, HBOFLAG_SHIFT);
+   eflag0 = _mm_or_si128(eflag0, eflag1);
+   eflag0 = _mm_and_si128(eflag0, rxe_msk);
+   eflag0 = _mm_shuffle_epi8(rxe_flag, eflag0);
+
+   vtag1 = _mm_or_si128(eflag0, vtag1);
+
+   /* Process L4/L3 checksum error flags */
+   cksumflag = _mm_srli_epi16(cksumflag, L3L4EFLAG_SHIFT);
+   cksumflag = _mm_shuffle_epi8(l3l4cksum_flag, cksumflag);
+   vtag1 = _mm_or_si128(cksumflag, vtag1);
+
vol.dword = _mm_cvtsi128_si64(vtag1);

rx_pkts[0]->ol_flags = vol.e[0];
-- 
1.7.7.6
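
For readers who find the shuffle-based lookup in the patch above hard to
follow, here is a scalar sketch of what it computes for one descriptor. It
assumes the 16-bit status word layout implied by the three shift macros in
the patch. The FLAG_* values are placeholders (the real PKT_RX_* values come
from rte_mbuf.h), and err_flags_scalar() is a hypothetical helper, not part
of the driver.

```c
#include <stdint.h>

/* Shift values copied from the patch. */
#define HBOFLAG_SHIFT   10
#define RXEFLAG_SHIFT   13
#define L3L4EFLAG_SHIFT 14

/* Placeholder flag bits for illustration; they are NOT the real
 * PKT_RX_* values. */
#define FLAG_L4_CKSUM_BAD 0x01
#define FLAG_IP_CKSUM_BAD 0x02
#define FLAG_RECIP_ERR    0x04

/* Scalar equivalent of the two _mm_shuffle_epi8() table lookups:
 * each small index selects one byte from a constant table. */
static uint16_t err_flags_scalar(uint16_t status)
{
	/* Same order as the l3l4cksum_flag table: index 0..3. */
	static const uint8_t cksum_lut[4] = {
		0,
		FLAG_L4_CKSUM_BAD,
		FLAG_IP_CKSUM_BAD,
		FLAG_IP_CKSUM_BAD | FLAG_L4_CKSUM_BAD
	};
	/* Same order as the rxe_flag table: index 0..1. */
	static const uint8_t rxe_lut[2] = { 0, FLAG_RECIP_ERR };

	/* OR the HBO and RXE bits together, then keep only bit 0. */
	uint16_t rxe = ((status >> HBOFLAG_SHIFT) |
			(status >> RXEFLAG_SHIFT)) & 0x0001;
	/* The top two bits form the checksum-error index. */
	uint16_t cksum = status >> L3L4EFLAG_SHIFT;

	return (uint16_t)(rxe_lut[rxe] | cksum_lut[cksum]);
}
```

The vector code performs the same lookups for four descriptors at once,
which is why the tables are built with _mm_set_epi8().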



[dpdk-dev] [PATCH v2] doc: add Vector FM10K introductions

2016-02-06 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

Add an introduction on how to enable the Vector FM10K Rx/Tx functions,
and on the preconditions and assumptions on Rx/Tx configuration parameters.
The new content also lists the limitations of the vector functions, so that
applications can select the best Rx/Tx functions.

Signed-off-by: Chen Jing D(Mark) 
---
v2:
 - rebase to latest repo
 - Reword a few sentences that did not follow the coding style.

 doc/guides/nics/fm10k.rst |   98 +
 1 files changed, 98 insertions(+), 0 deletions(-)

diff --git a/doc/guides/nics/fm10k.rst b/doc/guides/nics/fm10k.rst
index 4206b7f..a502ffd 100644
--- a/doc/guides/nics/fm10k.rst
+++ b/doc/guides/nics/fm10k.rst
@@ -35,6 +35,104 @@ The FM10K poll mode driver library provides support for the 
Intel FM10000
 (FM10K) family of 40GbE/100GbE adapters.


+Vector PMD for FM10K
+--------------------
+
+Vector PMD (vPMD) uses Intel® SIMD instructions to optimize packet I/O.
+It improves load/store bandwidth efficiency of L1 data cache by using a wider
+SSE/AVX register (1).
+The wider register gives space to hold multiple packet buffers so as to save
+on the number of instructions when processing packets in bulk.
+
+There is no change to the PMD API. The RX/TX handlers are the only two entries for
+vPMD packet I/O. They are transparently registered at runtime RX/TX execution
+if all condition checks pass.
+
+1.  To date, only an SSE version of FM10K vPMD is available.
+To ensure that vPMD is in the binary code, ensure that the option
+CONFIG_RTE_LIBRTE_FM10K_INC_VECTOR=y is in the configure file.
+
+Some constraints apply as pre-conditions for specific optimizations on bulk
+packet transfers. The following sections explain RX and TX constraints in the
+vPMD.
+
+
+RX Constraints
+~~~~~~~~~~~~~~
+
+
+Prerequisites and Pre-conditions
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+For Vector RX it is assumed that the number of descriptors in the ring is a
+power of 2. With this pre-condition, the ring pointer can easily wrap back to the
+head after hitting the tail without a conditional check. In addition, Vector RX
+can use this assumption to apply a bit mask of ``ring_size - 1``.
+
+
+Features not Supported by Vector RX PMD
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Some features are not supported when trying to increase the throughput in
+vPMD. They are:
+
+*   IEEE1588
+
+*   Flow director
+
+*   Header split
+
+*   RX checksum offload
+
+Other features are supported using optional MACRO configuration. They include:
+
+*   HW VLAN strip
+
+*   L3/L4 packet type
+
+These are enabled by RX_OLFLAGS; use ``RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE=y``.
+
+To guarantee the constraint, the configuration flags in ``dev_conf.rxmode``
+will be checked:
+
+*   ``hw_vlan_extend``
+
+*   ``hw_ip_checksum``
+
+*   ``header_split``
+
+*   ``fdir_conf->mode``
+
+
+RX Burst Size
+^^^^^^^^^^^^^
+
+As vPMD is focused on high throughput, it processes 4 packets at a time. So it
+assumes that the RX burst size is at least 4. It returns zero if
+``nb_pkt`` < 4 is used in the receive handler. If ``nb_pkt`` is not a multiple of 4, it
+is rounded down to the nearest multiple of 4.
+
+
+TX Constraint
+~~~~~~~~~~~~~
+
+Features not Supported by TX Vector PMD
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+TX vPMD only works when ``txq_flags`` is set to ``FM10K_SIMPLE_TX_FLAG``.
+This means that it does not support TX multi-segment, VLAN offload or TX csum
+offload. The following MACROs are used for these three features:
+
+*   ``ETH_TXQ_FLAGS_NOMULTSEGS``
+
+*   ``ETH_TXQ_FLAGS_NOVLANOFFL``
+
+*   ``ETH_TXQ_FLAGS_NOXSUMSCTP``
+
+*   ``ETH_TXQ_FLAGS_NOXSUMUDP``
+
+*   ``ETH_TXQ_FLAGS_NOXSUMTCP``
+
 Limitations
 -----------

-- 
1.7.7.6
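
The power-of-2 ring size assumption described in the doc patch above can be
sketched as follows. This is a generic illustration, not code from the fm10k
driver; is_power_of_2() and ring_advance() are hypothetical helper names.

```c
#include <stdint.h>

/* Nonzero when n is a power of two; zero is rejected. */
static int is_power_of_2(uint32_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Advance a ring index by 'step' entries and wrap around.
 * Valid only when ring_size is a power of 2: the mask
 * (ring_size - 1) replaces an "if (idx >= ring_size)" branch. */
static uint16_t ring_advance(uint16_t idx, uint16_t step, uint16_t ring_size)
{
	return (uint16_t)((idx + step) & (ring_size - 1));
}
```

With a ring of 512 descriptors, advancing index 510 by 4 gives 2 without any
conditional check, which is what the vector RX path relies on.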



[dpdk-dev] [PATCH v2] fm10k: fix switch manager high CPU usage

2016-02-05 Thread Chen, Jing D
Hi,

Best Regards,
Mark


> -Original Message-
> From: He, Shaopeng
> Sent: Thursday, February 04, 2016 8:45 PM
> To: dev at dpdk.org
> Cc: Chen, Jing D; Wang, Xiao W; He, Shaopeng
> Subject: [PATCH v2] fm10k: fix switch manager high CPU usage
> 
> The fm10k switch core uses source MAC + VID + SGLORT to do a
> lookup in the MAC table. If there is no match, an exception interrupt
> is sent to the switch manager. Too many of these exception
> interrupts cause high CPU usage on the switch manager
> side.
> To reproduce this issue, run one DPDK testpmd on a server
> with one fm10k NIC, and MAC-forward test traffic from one of the
> fm10k ports to another port. The CPU usage of the switch
> manager goes up to about 20% for a test traffic rate of
> 10 Gbps, compared to near 0% with no test traffic.
> This patch fixes the issue. A default SGLORT is assigned
> to each TX queue. This default value works for non-VMDq mode
> and the current VMDq example. For advanced VMDq usage, e.g.
> a different source MAC address for each TX queue, the FTAG
> forwarding function could be used to change this default
> SGLORT value.
> 
> Signed-off-by: Shaopeng He 
Acked-by: Jing Chen 



[dpdk-dev] [PATCH v2] fm10k: enable PCIe port level Loopback Suppression

2016-02-05 Thread Chen, Jing D
Hi, 

Best Regards,
Mark


> -Original Message-
> From: He, Shaopeng
> Sent: Thursday, February 04, 2016 8:43 PM
> To: dev at dpdk.org
> Cc: Chen, Jing D; Wang, Xiao W; He, Shaopeng
> Subject: [PATCH v2] fm10k: enable PCIe port level Loopback Suppression
> 
> In FM10K, a single PCIe port can derive several logical ports,
> such as SR-IOV PF/VF devices and VMDQ objects. To better manage them, FM10K
> silicon assigns a unique GLORT ID to each logical port.
> When a logical port sends a broadcast packet, the silicon floods
> it to all logical ports, including the one that sent the broadcast packet.
> To prevent this, the silicon has an rxq register that is filled with the GLORT ID of
> the logical port that the queue is bound to.
> FM10K has a switch core inside, which has another loopback suppression
> mechanism at the switch level. Switch level loopback suppression mostly
> works for the Ethernet port traffic.
> This patch assigns a SGLORT for each RX queue, and enables PCIe port
> level Loopback Suppression.
> 
> Signed-off-by: Shaopeng He 
Acked-by: Jing Chen 



[dpdk-dev] [PATCH 1/2 v2] fm10k: Add Atwood Channel Support

2016-02-04 Thread Chen, Jing D
Hi,

Best Regards,
Mark


> -Original Message-
> From: Qiu, Michael
> Sent: Thursday, February 04, 2016 4:36 PM
> To: dev at dpdk.org
> Cc: Chen, Jing D; Qiu, Michael
> Subject: [PATCH 1/2 v2] fm10k: Add Atwood Channel Support
> 
> Atwood Channel is an Intel 25G NIC, and this patch adds support for it
> in DPDK.
> 
> Signed-off-by: Michael Qiu
> Acked-by: John McNamara 
Acked-by: Jing Chen 


[dpdk-dev] [PATCH 2/2] fm10k: update doc for Atwood Channel

2016-02-03 Thread Chen, Jing D
Hi,

Best Regards,
Mark


> -Original Message-
> From: Qiu, Michael
> Sent: Monday, January 11, 2016 3:28 PM
> To: dev at dpdk.org
> Cc: Chen, Jing D; Qiu, Michael
> Subject: [PATCH 2/2] fm10k: update doc for Atwood Channel
> 
> Atwood Channel is 20GbE NIC and belongs to Intel FM10K family,
> update the doc for it.

There is a typo above. It's 25G, not 20.

> 
> Signed-off-by: Michael Qiu 
> ---
>  doc/guides/rel_notes/release_2_3.rst | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/release_2_3.rst
> b/doc/guides/rel_notes/release_2_3.rst
> index 99de186..7dd9c0f 100644
> --- a/doc/guides/rel_notes/release_2_3.rst
> +++ b/doc/guides/rel_notes/release_2_3.rst
> @@ -3,7 +3,9 @@ DPDK Release 2.3
> 
>  New Features
>  ------------
> +* **New NIC Atwood Channel support.**
> 
> +  Added support for the Atwood Channel variant of Intel's fm10k NIC family.
> 
>  Resolved Issues
>  ---------------
> --
> 1.9.3



[dpdk-dev] [PATCH] fm10k: enable PCIe port level Loopback Suppression

2016-02-03 Thread Chen, Jing D
Hi,

Best Regards,
Mark


> -Original Message-
> From: He, Shaopeng
> Sent: Thursday, January 28, 2016 1:49 PM
> To: dev at dpdk.org
> Cc: Chen, Jing D; Wang, Xiao W; He, Shaopeng
> Subject: [PATCH] fm10k: enable PCIe port level Loopback Suppression
> 
> A PCIe port may represent within it multiple logical ports
> (for example when SR-IOV is enabled, or when a VMDQ type logical
> port scheme is employed assigning ports to sets of queues).
> For this reason each RX queue in each PCIe port is given a source
> GLORT that is used for loopback suppression.
> This patch assigns a SGLORT for each RX queue, and enables PCIe
> port level Loopback Suppression.
> 

The log message is a little obscure to me. Maybe you can write:
In FM10K, a single PF device can derive several logical ports, such as SR-IOV
VF devices and VMDQ objects. To better manage them, FM10K silicon assigns a
unique GLORT ID to each logical port.
When a logical port sends a broadcast packet, the silicon floods it to all
logical ports, including the one that sent the broadcast packet. To prevent this,
the silicon has an rxq register to be filled with the GLORT ID of the logical port
that the queue binds to.

> Signed-off-by: Shaopeng He 
> ---
>  drivers/net/fm10k/fm10k_ethdev.c | 18 +-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/fm10k/fm10k_ethdev.c
> b/drivers/net/fm10k/fm10k_ethdev.c
> index f6eb05d..60f821a 100644
> --- a/drivers/net/fm10k/fm10k_ethdev.c
> +++ b/drivers/net/fm10k/fm10k_ethdev.c
> @@ -690,12 +690,15 @@ static int
>  fm10k_dev_rx_init(struct rte_eth_dev *dev)



[dpdk-dev] [PATCH] fm10k: handle err flags in vector RX func

2016-01-28 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

Use SSE instructions to parse the error flags in the HW Rx descriptor,
then set the corresponding bits in the mbuf.

Signed-off-by: Chen Jing D(Mark) 
---
 doc/guides/rel_notes/release_2_3.rst |2 +
 drivers/net/fm10k/fm10k_rxtx_vec.c   |   42 +-
 2 files changed, 43 insertions(+), 1 deletions(-)

diff --git a/doc/guides/rel_notes/release_2_3.rst 
b/doc/guides/rel_notes/release_2_3.rst
index 99de186..19e8aa2 100644
--- a/doc/guides/rel_notes/release_2_3.rst
+++ b/doc/guides/rel_notes/release_2_3.rst
@@ -3,7 +3,9 @@ DPDK Release 2.3

 New Features
 ------------
+* **Handle error flags in fm10k vector RX func**

+  * Parse err flags in Rx desc and set error bits in mbuf with vector 
instructions.

 Resolved Issues
 ---------------
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c 
b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 2a57eef..0c48a48 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -61,11 +61,17 @@ fm10k_reset_tx_queue(struct fm10k_tx_queue *txq);
 #define L3TYPE_SHIFT (4)
 /* L4 type shift */
 #define L4TYPE_SHIFT (7)
+/* HBO flag shift */
+#define HBOFLAG_SHIFT (10)
+/* RXE flag shift */
+#define RXEFLAG_SHIFT (13)
+/* IPE/L4E flag shift */
+#define L3L4EFLAG_SHIFT (14)

 static inline void
 fm10k_desc_to_olflags_v(__m128i descs[4], struct rte_mbuf **rx_pkts)
 {
-   __m128i ptype0, ptype1, vtag0, vtag1;
+   __m128i ptype0, ptype1, vtag0, vtag1, eflag0, eflag1, cksumflag;
union {
uint16_t e[4];
uint64_t dword;
@@ -81,12 +87,29 @@ fm10k_desc_to_olflags_v(__m128i descs[4], struct rte_mbuf 
**rx_pkts)
0x0000, 0x0000, 0x0000, 0x0000,
0x000F, 0x000F, 0x000F, 0x000F);

+   /* mask for HBO and RXE flags */
+   const __m128i rxe_msk = _mm_set_epi16(
+   0x0000, 0x0000, 0x0000, 0x0000,
+   0x0001, 0x0001, 0x0001, 0x0001);
+
+   const __m128i l3l4cksum_flag = _mm_set_epi8(0, 0, 0, 0,
+   0, 0, 0, 0,
+   0, 0, 0, 0,
+   PKT_RX_IP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD,
+   PKT_RX_IP_CKSUM_BAD, PKT_RX_L4_CKSUM_BAD, 0);
+
+   const __m128i rxe_flag = _mm_set_epi8(0, 0, 0, 0,
+   0, 0, 0, 0,
+   0, 0, 0, 0,
+   0, 0, PKT_RX_RECIP_ERR, 0);
+
/* map rss type to rss hash flag */
const __m128i rss_flags = _mm_set_epi8(0, 0, 0, 0,
0, 0, 0, PKT_RX_RSS_HASH,
PKT_RX_RSS_HASH, 0, PKT_RX_RSS_HASH, 0,
PKT_RX_RSS_HASH, PKT_RX_RSS_HASH, PKT_RX_RSS_HASH, 0);

+   /* Calculate RSS_hash and Vlan fields */
ptype0 = _mm_unpacklo_epi16(descs[0], descs[1]);
ptype1 = _mm_unpacklo_epi16(descs[2], descs[3]);
vtag0 = _mm_unpackhi_epi16(descs[0], descs[1]);
@@ -97,10 +120,27 @@ fm10k_desc_to_olflags_v(__m128i descs[4], struct rte_mbuf 
**rx_pkts)
ptype0 = _mm_shuffle_epi8(rss_flags, ptype0);

vtag1 = _mm_unpacklo_epi32(vtag0, vtag1);
+   eflag0 = vtag1;
+   cksumflag = vtag1;
vtag1 = _mm_srli_epi16(vtag1, VP_SHIFT);
vtag1 = _mm_and_si128(vtag1, pkttype_msk);

vtag1 = _mm_or_si128(ptype0, vtag1);
+
+   /* Process err flags, simply set RECIP_ERR bit if HBO/IXE is set */
+   eflag1 = _mm_srli_epi16(eflag0, RXEFLAG_SHIFT);
+   eflag0 = _mm_srli_epi16(eflag0, HBOFLAG_SHIFT);
+   eflag0 = _mm_or_si128(eflag0, eflag1);
+   eflag0 = _mm_and_si128(eflag1, rxe_msk);
+   eflag0 = _mm_shuffle_epi8(rxe_flag, eflag0);
+
+   vtag1 = _mm_or_si128(eflag0, vtag1);
+
+   /* Process L4/L3 checksum error flags */
+   cksumflag = _mm_srli_epi16(cksumflag, L3L4EFLAG_SHIFT);
+   cksumflag = _mm_shuffle_epi8(l3l4cksum_flag, cksumflag);
+   vtag1 = _mm_or_si128(cksumflag, vtag1);
+
vol.dword = _mm_cvtsi128_si64(vtag1);

rx_pkts[0]->ol_flags = vol.e[0];
-- 
1.7.7.6



[dpdk-dev] [PATCH] fm10k: optimize legacy TX func

2016-01-28 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

When the legacy TX func frees a bunch of mbufs, it frees
them one by one. This change scans the free list and merges the
requests when they belong to the same pool, then frees them at once, which
reduces the cycles spent on freeing mbufs.

Signed-off-by: Chen Jing D(Mark) 
---
 doc/guides/rel_notes/release_2_3.rst |2 +
 drivers/net/fm10k/fm10k_rxtx.c   |   59 -
 2 files changed, 52 insertions(+), 9 deletions(-)

diff --git a/doc/guides/rel_notes/release_2_3.rst 
b/doc/guides/rel_notes/release_2_3.rst
index 99de186..20ce78d 100644
--- a/doc/guides/rel_notes/release_2_3.rst
+++ b/doc/guides/rel_notes/release_2_3.rst
@@ -3,7 +3,9 @@ DPDK Release 2.3

 New Features
 ------------
+* **Optimize fm10k Tx func.**

+  * Free multiple mbufs at a time to reduce freeing mbuf cycles.

 Resolved Issues
 ---------------
diff --git a/drivers/net/fm10k/fm10k_rxtx.c b/drivers/net/fm10k/fm10k_rxtx.c
index e958865..f3de691 100644
--- a/drivers/net/fm10k/fm10k_rxtx.c
+++ b/drivers/net/fm10k/fm10k_rxtx.c
@@ -369,6 +369,51 @@ fm10k_recv_scattered_pkts(void *rx_queue, struct rte_mbuf 
**rx_pkts,
return nb_rcv;
 }

+/*
+ * Free multiple TX mbuf at a time if they are in the same pool
+ *
+ * @txep: software desc ring index that starts to free
+ * @num: number of descs to free
+ *
+ */
+static inline void tx_free_bulk_mbuf(struct rte_mbuf **txep, int num)
+{
+   struct rte_mbuf *m, *free[RTE_FM10K_TX_MAX_FREE_BUF_SZ];
+   int i;
+   int nb_free = 0;
+
+   if (unlikely(num == 0))
+   return;
+
+   m = __rte_pktmbuf_prefree_seg(txep[0]);
+   if (likely(m != NULL)) {
+   free[0] = m;
+   nb_free = 1;
+   for (i = 1; i < num; i++) {
+   m = __rte_pktmbuf_prefree_seg(txep[i]);
+   if (likely(m != NULL)) {
+   if (likely(m->pool == free[0]->pool))
+   free[nb_free++] = m;
+   else {
+   rte_mempool_put_bulk(free[0]->pool,
+   (void *)free, nb_free);
+   free[0] = m;
+   nb_free = 1;
+   }
+   }
+   txep[i] = NULL;
+   }
+   rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
+   } else {
+   for (i = 1; i < num; i++) {
+   m = __rte_pktmbuf_prefree_seg(txep[i]);
+   if (m != NULL)
+   rte_mempool_put(m->pool, m);
+   txep[i] = NULL;
+   }
+   }
+}
+
 static inline void tx_free_descriptors(struct fm10k_tx_queue *q)
 {
uint16_t next_rs, count = 0;
@@ -385,11 +430,7 @@ static inline void tx_free_descriptors(struct 
fm10k_tx_queue *q)
 * including nb_desc */
if (q->last_free > next_rs) {
count = q->nb_desc - q->last_free;
-   while (q->last_free < q->nb_desc) {
-   rte_pktmbuf_free_seg(q->sw_ring[q->last_free]);
-   q->sw_ring[q->last_free] = NULL;
-   ++q->last_free;
-   }
+   tx_free_bulk_mbuf(&q->sw_ring[q->last_free], count);
q->last_free = 0;
}

@@ -397,10 +438,10 @@ static inline void tx_free_descriptors(struct 
fm10k_tx_queue *q)
q->nb_free += count + (next_rs + 1 - q->last_free);

/* free buffers from last_free, up to and including next_rs */
-   while (q->last_free <= next_rs) {
-   rte_pktmbuf_free_seg(q->sw_ring[q->last_free]);
-   q->sw_ring[q->last_free] = NULL;
-   ++q->last_free;
+   if (q->last_free <= next_rs) {
+   count = next_rs - q->last_free + 1;
+   tx_free_bulk_mbuf(&q->sw_ring[q->last_free], count);
+   q->last_free += count;
}

if (q->last_free == q->nb_desc)
-- 
1.7.7.6
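
The batching idea in the patch above (collect consecutive buffers from the
same pool, then return them with one bulk call) can be sketched without any
DPDK dependency. Everything below is a hypothetical stand-in: struct pool,
struct buf, pool_put_bulk() and MAX_BATCH replace rte_mempool, rte_mbuf,
rte_mempool_put_bulk() and RTE_FM10K_TX_MAX_FREE_BUF_SZ. Unlike the patch,
this sketch clears every ring slot and flushes when the batch array is full.

```c
#include <stddef.h>

#define MAX_BATCH 64 /* hypothetical cap on one bulk return */

/* Stand-in for a memory pool; counts what is returned to it. */
struct pool {
	int returned; /* total buffers given back */
	int calls;    /* number of bulk calls made */
};

/* Stand-in for an mbuf: it only remembers its owning pool. */
struct buf {
	struct pool *pool;
};

/* Stand-in for a bulk "put": one call returns n buffers. */
static void pool_put_bulk(struct pool *p, struct buf **objs, size_t n)
{
	(void)objs;
	p->returned += (int)n;
	p->calls++;
}

/* Free 'num' ring entries, grouping runs of buffers that share a
 * pool so that each run costs a single bulk call. */
static void free_bulk_by_pool(struct buf **ring, size_t num)
{
	struct buf *batch[MAX_BATCH];
	size_t nb = 0;
	size_t i;

	for (i = 0; i < num; i++) {
		struct buf *m = ring[i];

		ring[i] = NULL; /* slot is released in every case */
		if (m == NULL)
			continue;
		/* Flush when the pool changes or the batch is full. */
		if (nb > 0 && (m->pool != batch[0]->pool || nb == MAX_BATCH)) {
			pool_put_bulk(batch[0]->pool, batch, nb);
			nb = 0;
		}
		batch[nb++] = m;
	}
	if (nb > 0)
		pool_put_bulk(batch[0]->pool, batch, nb);
}
```

When all buffers come from one pool, which is the common case, a whole run
of descriptors is returned with a single call instead of one call each.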



[dpdk-dev] [PATCH] fm10k: allocate logical ports for flow director

2015-12-30 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

In fm10k, a PF, a VF, a VMDQ pool or the queues bound to a flow director rule can
be considered a logical port. The original implementation only creates
a single one for all cases. This change creates 128 logical
ports in total: the first 64 for PF and VMDQ, the second 64 for flow director.

Registers DGLORTDEC/DGLORTMAP define the rules for classifying packets
into different queues. Currently only the PF and VMDQ cases are considered.
This change adds rules for flow director.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k_ethdev.c |   54 +++---
 1 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index e4aed94..6662157 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -55,6 +55,13 @@
 #define CHARS_PER_UINT32 (sizeof(uint32_t))
 #define BIT_MASK_PER_UINT32 ((1 << CHARS_PER_UINT32) - 1)

+/* First 64 Logical ports for PF/VMDQ, second 64 for Flow director */
+#define MAX_LPORT_NUM    128
+#define GLORT_FD_Q_BASE  0x40
+#define GLORT_PF_MASK    0xFFC0
+#define GLORT_FD_MASK    GLORT_PF_MASK
+#define GLORT_FD_INDEX   GLORT_FD_Q_BASE
+
 static void fm10k_close_mbx_service(struct fm10k_hw *hw);
 static void fm10k_dev_promiscuous_enable(struct rte_eth_dev *dev);
 static void fm10k_dev_promiscuous_disable(struct rte_eth_dev *dev);
@@ -571,22 +578,11 @@ fm10k_dev_rss_configure(struct rte_eth_dev *dev)
 }

 static void
-fm10k_dev_logic_port_update(struct rte_eth_dev *dev,
-   uint16_t nb_lport_old, uint16_t nb_lport_new)
+fm10k_dev_logic_port_update(struct rte_eth_dev *dev, uint16_t nb_lport_new)
 {
struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
uint32_t i;

-   fm10k_mbx_lock(hw);
-   /* Disable previous logic ports */
-   if (nb_lport_old)
-   hw->mac.ops.update_lport_state(hw, hw->mac.dglort_map,
-   nb_lport_old, false);
-   /* Enable new logic ports */
-   hw->mac.ops.update_lport_state(hw, hw->mac.dglort_map,
-   nb_lport_new, true);
-   fm10k_mbx_unlock(hw);
-
for (i = 0; i < nb_lport_new; i++) {
/* Set unicast mode by default. App can change
 * to other mode in other API func.
@@ -606,7 +602,7 @@ fm10k_dev_mq_rx_configure(struct rte_eth_dev *dev)
struct rte_eth_conf *dev_conf = &dev->data->dev_conf;
struct fm10k_macvlan_filter_info *macvlan;
uint16_t nb_queue_pools = 0; /* pool number in configuration */
-   uint16_t nb_lport_new, nb_lport_old;
+   uint16_t nb_lport_new;

macvlan = FM10K_DEV_PRIVATE_TO_MACVLAN(dev->data->dev_private);
vmdq_conf = &dev->data->dev_conf.rx_adv_conf.vmdq_rx_conf;
@@ -624,9 +620,8 @@ fm10k_dev_mq_rx_configure(struct rte_eth_dev *dev)
if (macvlan->nb_queue_pools == nb_queue_pools)
return;

-   nb_lport_old = macvlan->nb_queue_pools ? macvlan->nb_queue_pools : 1;
nb_lport_new = nb_queue_pools ? nb_queue_pools : 1;
-   fm10k_dev_logic_port_update(dev, nb_lport_old, nb_lport_new);
+   fm10k_dev_logic_port_update(dev, nb_lport_new);

/* reset MAC/VLAN as it's based on VMDQ or PF main VSI */
memset(dev->data->mac_addrs, 0,
@@ -997,7 +992,7 @@ static void
 fm10k_dev_dglort_map_configure(struct rte_eth_dev *dev)
 {
struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-   uint32_t dglortdec, pool_len, rss_len, i;
+   uint32_t dglortdec, pool_len, rss_len, i, dglortmask;
uint16_t nb_queue_pools;
struct fm10k_macvlan_filter_info *macvlan;

@@ -1005,16 +1000,24 @@ fm10k_dev_dglort_map_configure(struct rte_eth_dev *dev)
nb_queue_pools = macvlan->nb_queue_pools;
pool_len = nb_queue_pools ? fls(nb_queue_pools - 1) : 0;
rss_len = fls(dev->data->nb_rx_queues - 1) - pool_len;
-   dglortdec = (rss_len << FM10K_DGLORTDEC_RSSLENGTH_SHIFT) | pool_len;
-
-   /* Establish only MAP 0 as valid */
-   FM10K_WRITE_REG(hw, FM10K_DGLORTMAP(0), FM10K_DGLORTMAP_ANY);

+   /* GLORT 0x0-0x3F are used by PF and VMDQ,  0x40-0x7F used by FD */
+   dglortdec = (rss_len << FM10K_DGLORTDEC_RSSLENGTH_SHIFT) | pool_len;
+   dglortmask = (GLORT_PF_MASK << FM10K_DGLORTMAP_MASK_SHIFT) |
+   hw->mac.dglort_map;
+   FM10K_WRITE_REG(hw, FM10K_DGLORTMAP(0), dglortmask);
/* Configure VMDQ/RSS DGlort Decoder */
FM10K_WRITE_REG(hw, FM10K_DGLORTDEC(0), dglortdec);

+   /* Flow Director configurations, only queue number is valid. */
+   dglortdec = fls(dev->data->nb_rx_queues - 1);
+   dglortmask = (GLORT_FD_MASK << FM10K_DGLORTMAP_MASK_SHIFT) |
+   (hw->mac.dglort_map + GLORT_FD_Q_BASE);
+   FM10K_WRITE_REG(hw, FM10K
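
The archived patch is cut off at this point. As a rough illustration of the
GLORT split it describes (0x0-0x3F for PF/VMDQ, 0x40-0x7F for flow director),
here is a sketch of a mask-and-compare match. It assumes the hardware compares
(glort & mask) against the programmed value, and that the base GLORT has its
low 7 bits clear. classify_glort(), the lport_owner enum and the 0x4000 base
used in the usage note are hypothetical.

```c
#include <stdint.h>

/* Constants copied from the patch. */
#define GLORT_FD_Q_BASE 0x40
#define GLORT_PF_MASK   0xFFC0
#define GLORT_FD_MASK   GLORT_PF_MASK

enum lport_owner {
	LPORT_PF_VMDQ,       /* first 64 logical ports */
	LPORT_FLOW_DIRECTOR, /* second 64 logical ports */
	LPORT_NONE           /* outside both ranges */
};

/* Decide which of the two DGLORT map entries a destination GLORT
 * would hit. 'base' plays the role of hw->mac.dglort_map and is
 * assumed to be aligned to 128 entries. */
static enum lport_owner classify_glort(uint16_t glort, uint16_t base)
{
	if ((glort & GLORT_PF_MASK) == base)
		return LPORT_PF_VMDQ;
	if ((glort & GLORT_FD_MASK) == (uint16_t)(base + GLORT_FD_Q_BASE))
		return LPORT_FLOW_DIRECTOR;
	return LPORT_NONE;
}
```

With a base of 0x4000, GLORTs 0x4000-0x403F fall in the PF/VMDQ range and
0x4040-0x407F in the flow director range.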

[dpdk-dev] [PATCH] doc: add Vector FM10K introductions

2015-12-23 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

Add an introduction on how to enable the Vector FM10K Rx/Tx functions,
and on the preconditions and assumptions on Rx/Tx configuration parameters.
The new content also lists the limitations of the vector functions, so that
applications can select the best Rx/Tx functions.

Signed-off-by: Chen Jing D(Mark) 
---
 doc/guides/nics/fm10k.rst |   89 +
 1 files changed, 89 insertions(+), 0 deletions(-)

diff --git a/doc/guides/nics/fm10k.rst b/doc/guides/nics/fm10k.rst
index 4206b7f..54b761c 100644
--- a/doc/guides/nics/fm10k.rst
+++ b/doc/guides/nics/fm10k.rst
@@ -34,6 +34,95 @@ FM10K Poll Mode Driver
 The FM10K poll mode driver library provides support for the Intel FM10000
 (FM10K) family of 40GbE/100GbE adapters.

+Vector PMD for FM10K
+--------------------
+Vector PMD uses Intel® SIMD instructions to optimize packet I/O.
+It improves load/store bandwidth efficiency of L1 data cache by using a wider
+SSE/AVX register (1).
+The wider register gives space to hold multiple packet buffers so as to save
+on the number of instructions when processing packets in bulk.
+
+There is no change to PMD API. The RX/TX handler are the only two entries for
+vPMD packet I/O. They are transparently registered at runtime RX/TX execution
+if all condition checks pass.
+
+1.  To date, only an SSE version of FM10K vPMD is available.
+To ensure that vPMD is in the binary code, ensure that the option
+CONFIG_RTE_LIBRTE_FM10K_INC_VECTOR=y is in the configure file.
+
+Some constraints apply as pre-conditions for specific optimizations on bulk
+packet transfers. The following sections explain RX and TX constraints in the
+vPMD.
+
+RX Constraints
+~~~~~~~~~~~~~~
+
+Prerequisites and Pre-conditions
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+The number of descriptors in the ring must be a power of 2. This is the assumption for
+Vector RX. With this pre-condition, the ring pointer can easily wrap back to the head
+after hitting the tail without a conditional check. Besides that, Vector RX can use
+it to apply a bit mask of ``ring_size - 1``.
+
+Feature not Supported by Vector RX PMD
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Some features are not supported when trying to increase the throughput in vPMD.
+They are:
+
+*   IEEE1588
+
+*   FDIR
+
+*   Header split
+
+*   RX checksum offload
+
+Other features are supported using optional MACRO configuration. They include:
+
+*   HW VLAN strip
+
+*   L3/L4 packet type
+
+These are enabled by RX_OLFLAGS (RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE=y).
+
+To guarantee the constraint, configuration flags in dev_conf.rxmode will be
+checked:
+
+*   hw_vlan_extend
+
+*   hw_ip_checksum
+
+*   header_split
+
+*   fdir_conf->mode
+
+RX Burst Size
+^^^^^^^^^^^^^
+
+vPMD is focused on high throughput and processes 4 packets at a time.
+So it assumes that the RX burst size is at least 4. It returns
+zero if nb_pkt < 4 is used in the receive handler. If nb_pkt is not a multiple of
+4, it is rounded down to the nearest multiple of 4.
+
+TX Constraint
+~~~~~~~~~~~~~
+
+Feature not Supported by TX Vector PMD
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+TX vPMD only works when txq_flags is set to FM10K_SIMPLE_TX_FLAG.
+This means that it does not support TX multi-segment, VLAN offload and TX csum
+offload. The following MACROs are used for these three features:
+
+*   ETH_TXQ_FLAGS_NOMULTSEGS
+
+*   ETH_TXQ_FLAGS_NOVLANOFFL
+
+*   ETH_TXQ_FLAGS_NOXSUMSCTP
+
+*   ETH_TXQ_FLAGS_NOXSUMUDP
+
+*   ETH_TXQ_FLAGS_NOXSUMTCP

 Limitations
 -----------
-- 
1.7.7.6
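
The "floor alignment" of the RX burst size described in the doc patch above
amounts to clearing the low two bits of the requested count. A minimal
sketch, assuming a vector width of 4 packets as stated in the text;
VEC_BURST_WIDTH and vec_burst_floor() are hypothetical names, not the
driver's own.

```c
#include <stdint.h>

/* Packets handled per vector loop iteration, per the doc text. */
#define VEC_BURST_WIDTH 4

/* Round the requested burst down to a multiple of the vector width.
 * Requests below the width become 0, which matches the documented
 * "returns zero if nb_pkt < 4" behaviour. */
static uint16_t vec_burst_floor(uint16_t nb_pkts)
{
	return (uint16_t)(nb_pkts & ~(VEC_BURST_WIDTH - 1));
}
```

An application asking for 7 packets would therefore get at most 4 from the
vector receive handler, so burst sizes that are multiples of 4 waste nothing.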



[dpdk-dev] [PATCH v2] doc: add fm10k driver

2015-12-10 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

This documentation covers an introduction to and the limitations of the Intel
FM10000 series products.

Signed-off-by: Chen Jing D(Mark) 
---

v2 changes:
 - sync with latest repo.
 - Fix some format and syntax issues.
 - Add descriptions on Testpoint.

 doc/guides/nics/fm10k.rst |   67 +
 doc/guides/nics/index.rst |1 +
 2 files changed, 68 insertions(+), 0 deletions(-)
 create mode 100644 doc/guides/nics/fm10k.rst

diff --git a/doc/guides/nics/fm10k.rst b/doc/guides/nics/fm10k.rst
new file mode 100644
index 0000000..4206b7f
--- /dev/null
+++ b/doc/guides/nics/fm10k.rst
@@ -0,0 +1,67 @@
+..  BSD LICENSE
+Copyright(c) 2015 Intel Corporation. All rights reserved.
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+
+* Redistributions of source code must retain the above copyright
+notice, this list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in
+the documentation and/or other materials provided with the
+distribution.
+* Neither the name of Intel Corporation nor the names of its
+contributors may be used to endorse or promote products derived
+from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+FM10K Poll Mode Driver
+======================
+
+The FM10K poll mode driver library provides support for the Intel FM10000
+(FM10K) family of 40GbE/100GbE adapters.
+
+
+Limitations
+-----------
+
+
+Switch manager
+~~~~~~~~~~~~~~
+
+The Intel FM10000 family of NICs integrate a hardware switch and multiple host
+interfaces. The FM10000 PMD driver only manages host interfaces. For the
+switch component another switch driver has to be loaded prior to the
+FM10000 PMD driver.  The switch driver can be acquired from Intel support or
+from the `Match Interface <https://github.com/match-interface>`_ project.
+Only Testpoint is validated with DPDK; the latest version that has been
+validated with DPDK2.2 is 4.1.6.
+
+CRC stripping
+~~~~~~~~~~~~~
+
+The FM10000 family of NICs strip the CRC from every packet coming into the
+host interface.  So, the CRC will be stripped even when the
+``rxmode.hw_strip_crc`` member is set to 0 in ``struct rte_eth_conf``.
+
+
+Maximum packet length
+~~~~~~~~~~~~~~~~~~~~~
+
+The FM10000 family of NICs support a maximum of a 15K jumbo frame. The value
+is fixed and cannot be changed. So, even when the ``rxmode.max_rx_pkt_len``
+member of ``struct rte_eth_conf`` is set to a value lower than 15364, frames
+up to 15364 bytes can still reach the host interface.
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 7bf2938..4f2cc6c 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -51,6 +51,7 @@ Network Interface Controller Drivers
 virtio
 vmxnet3
 pcap_ring
+fm10k

 **Figures**

-- 
1.7.7.6



[dpdk-dev] [PATCH] doc: add fm10k driver

2015-12-10 Thread Chen, Jing D
Hi, John,

Best Regards,
Mark


> -----Original Message-----
> From: Mcnamara, John
> Sent: Wednesday, December 09, 2015 10:21 PM
> To: Chen, Jing D; dev at dpdk.org
> Subject: RE: [PATCH] doc: add fm10k driver
> 
> > -----Original Message-----
> > From: Chen, Jing D
> > Sent: Wednesday, December 9, 2015 8:25 AM
> > To: dev at dpdk.org
> > Cc: Mcnamara, John; Chen, Jing D
> > Subject: [PATCH] doc: add fm10k driver
> >
> > From: "Chen Jing D(Mark)" 
> >
> > This documentation covers introductions and limitations on Intel
> > FM10000 series products.
> 
> Hi Mark,
> 
> Thanks for that.
> 
> The docs build cleanly but there is one whitespace warning on merge.
> 
> Minor comments below.
> 
> 
> 
> > @@ -0,0 +1,54 @@
> > +..  BSD LICENSE
> > +Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
> 
> The year should be 2015 only and this line shouldn't be indented in relation
> to the next lines.
> 
> 
> 
> > +FM10K Poll Mode Driver
> > +======================
> > +The FM10K poll mode driver library provides support for Intel FM10000
> > +family of 40GbE/100GbE adapters.
> 
> The DPDK Documentation Guidelines say to leave a blank line after section
> headers (here and with the other sections):
> 
> > +FM10K Poll Mode Driver
> > +======================
> >
> > +The FM10K poll mode driver library provides support for Intel FM10000
> > +family of 40GbE/100GbE adapters.
> 
>  See:  http://dpdk.org/doc/guides/contributing/documentation.html#rst-guidelines
> 
> 
> > +The FM10K poll mode driver library provides support for Intel FM10000
> > +family of 40GbE/100GbE adapters.
> 
> Might be worth introducing the common FM10K name here as well:
> 
> The FM10K poll mode driver library provides support for the Intel FM10000
> (FM10K) family of 40GbE/100GbE adapters.
> 
> > +Intel FM10000 family of NICs integrate an hardware switch and multiple
> > +host interfaces. FM10K PMD driver only manages host interfaces. For the
> 
> The doc uses FM10000 in some places and FM10K in others. It should use one
> or the other consistently.
> 
> 
> 
> > +switch component, another switch driver has to be loaded prior to FM10K
> > PMD driver.
> > +The switch driver either can be acquired by Intel support or from below
> > link:
> > +https://github.com/match-interface
> 
> Better to add an actual link like:
> 
> The switch driver can be acquired from Intel support or from the
> `Match Interface <https://github.com/match-interface>`_ project.
> 
> Also should that be to: https://github.com/match-interface/match
> 
> 
> 
> > +
> > +CRC strip
> > +FM10000 family always strip CRC for every packets coming into host
> > interface.
> 
> 
> Limitations
> -----------
> 
> Switch manager
> ~~~~~~~~~~~~~~
> 
> The Intel FM10000 family of NICs integrate a hardware switch and multiple host
> 
> Etc.
> 
> > +Max packet length
> > +FM10000 family support maximum of 15K jumbo frame. The value is fixed
> > +and can't be changed. So, even (struct
> > +rte_eth_conf).rxmode.max_rx_pkt_len is set to a value other than 15364,
> > the frames with 15364 byte still can reach to host interface.
> 
> This isn't clear (to me). Maybe something like:
> 
> The FM10000 family of NICs support a maximum of a 15K jumbo frame. The value
> is fixed and cannot be changed. So, even when the ``rxmode.max_rx_pkt_len``
> member of ``struct rte_eth_conf`` is set to a value lower than 15364, frames
> up to 15364 bytes can still reach the host interface.
> 
> 
> Regards,
> 
> John.
> --

Many thanks for the comments, I'll change accordingly.



[dpdk-dev] [PATCH] doc: add fm10k driver

2015-12-09 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

This documentation covers introductions and limitations on Intel
FM10000 series products.

Signed-off-by: Chen Jing D(Mark) 
---
 doc/guides/nics/fm10k.rst |   54 +
 doc/guides/nics/index.rst |1 +
 2 files changed, 55 insertions(+), 0 deletions(-)
 create mode 100644 doc/guides/nics/fm10k.rst

diff --git a/doc/guides/nics/fm10k.rst b/doc/guides/nics/fm10k.rst
new file mode 100644
index 0000000..1c7ca57
--- /dev/null
+++ b/doc/guides/nics/fm10k.rst
@@ -0,0 +1,54 @@
+..  BSD LICENSE
+Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+
+* Redistributions of source code must retain the above copyright
+notice, this list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in
+the documentation and/or other materials provided with the
+distribution.
+* Neither the name of Intel Corporation nor the names of its
+contributors may be used to endorse or promote products derived
+from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+FM10K Poll Mode Driver
+======================
+The FM10K poll mode driver library provides support for Intel FM10000 family
+of 40GbE/100GbE adapters.
+
+Limitations
+-----------
+Intel FM10000 family of NICs integrate an hardware switch and multiple host
+interfaces. FM10K PMD driver only manages host interfaces. For the switch
+component, another switch driver has to be loaded prior to FM10K PMD driver.
+The switch driver either can be acquired by Intel support or from below link:
+https://github.com/match-interface
+
+CRC strip
+FM10000 family always strip CRC for every packets coming into host interface.
+So, CRC will be stripped even (struct rte_eth_conf).rxmode.hw_strip_crc is set
+to 0.
+
+Max packet length
+FM10000 family support maximum of 15K jumbo frame. The value is fixed and can't
+be changed. So, even (struct rte_eth_conf).rxmode.max_rx_pkt_len is set to a value
+other than 15364, the frames with 15364 byte still can reach to host interface.
+
+
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 7bf2938..4f2cc6c 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -51,6 +51,7 @@ Network Interface Controller Drivers
 virtio
 vmxnet3
 pcap_ring
+fm10k

 **Figures**

-- 
1.7.7.6



[dpdk-dev] [PATCH] fm10k: fix wrong Rx func is used

2015-11-27 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

Steps to reproduce the bug:
1. Start the device with all Rx offloads disabled, so that
   Vector Rx is selected.
2. Stop the device, reconfigure it with hw_ip_checksum = 1,
   and start it again.
3. Regular Rx should now be selected, since Vector Rx doesn't
   support IP checksum offload. Instead Vector Rx is still
   used, so the checksum is not done by hardware.

The cause is a missing "else" branch in fm10k_set_rx_function():
after reconfiguration no branch assigns rx_pkt_burst, so the Rx
function chosen in the previous round stays in use.

Fixes: 77a8ab47 ("fm10k: select best Rx function")

Reported-by: Xiao Wang 
Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k_ethdev.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 4f23ce3..e4aed94 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -2486,6 +2486,8 @@ fm10k_set_rx_function(struct rte_eth_dev *dev)
dev->rx_pkt_burst = fm10k_recv_pkts_vec;
} else if (dev->data->scattered_rx)
dev->rx_pkt_burst = fm10k_recv_scattered_pkts;
+   else
+   dev->rx_pkt_burst = fm10k_recv_pkts;

rx_using_sse =
(dev->rx_pkt_burst == fm10k_recv_scattered_pkts_vec ||
-- 
1.7.7.6
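
The stale-pointer behaviour described in the commit message above can be reproduced outside the driver. The following is a minimal sketch, not the driver code: the struct, the three burst functions and the two selector functions are hypothetical stand-ins, and the vector precondition is reduced to a single flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's Rx burst functions. */
typedef int (*rx_burst_t)(void);
static int recv_pkts(void)           { return 0; }
static int recv_pkts_vec(void)       { return 1; }
static int recv_scattered_pkts(void) { return 2; }

/* Hypothetical, reduced model of the device state. */
struct dev_model {
	bool vec_allowed;        /* vector Rx preconditions hold (no offloads) */
	bool scattered_rx;       /* scattered Rx was requested */
	rx_burst_t rx_pkt_burst; /* currently selected Rx function */
};

/* Buggy shape: without a final else, an earlier choice can survive. */
static void set_rx_function_buggy(struct dev_model *dev)
{
	if (dev->vec_allowed)
		dev->rx_pkt_burst = recv_pkts_vec;
	else if (dev->scattered_rx)
		dev->rx_pkt_burst = recv_scattered_pkts;
}

/* Fixed shape: every path assigns the function pointer. */
static void set_rx_function_fixed(struct dev_model *dev)
{
	if (dev->vec_allowed)
		dev->rx_pkt_burst = recv_pkts_vec;
	else if (dev->scattered_rx)
		dev->rx_pkt_burst = recv_scattered_pkts;
	else
		dev->rx_pkt_burst = recv_pkts;
}

/* Replays the reproduction steps: start, reconfigure, start again. */
static void example_usage(void)
{
	struct dev_model dev = { true, false, NULL };

	set_rx_function_buggy(&dev);      /* first start: vector Rx */
	dev.vec_allowed = false;          /* checksum offload now enabled */
	set_rx_function_buggy(&dev);      /* second start: nothing assigned */
	assert(dev.rx_pkt_burst == recv_pkts_vec);

	set_rx_function_fixed(&dev);      /* with the else branch */
	assert(dev.rx_pkt_burst == recv_pkts);
}
```

The point of the sketch is only that a selector which is re-run on every start must assign the pointer on every path.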



[dpdk-dev] [PATCH v2] fm10k: add debug info for actual Rx/Tx func

2015-11-27 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

After the vPMD feature was introduced, the fm10k driver selects
the best Rx/Tx functions at run time. The original implementation
makes this selection silently, without any notification.

This patch adds debug logs that tell the user which Rx/Tx
functions are actually used.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k_ethdev.c |   10 +-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 0dd56d2..4f23ce3 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -2458,13 +2458,16 @@ fm10k_set_tx_function(struct rte_eth_dev *dev)
}

if (use_sse) {
+   PMD_INIT_LOG(DEBUG, "Use vector Tx func");
for (i = 0; i < dev->data->nb_tx_queues; i++) {
txq = dev->data->tx_queues[i];
fm10k_txq_vec_setup(txq);
}
dev->tx_pkt_burst = fm10k_xmit_pkts_vec;
-   } else
+   } else {
dev->tx_pkt_burst = fm10k_xmit_pkts;
+   PMD_INIT_LOG(DEBUG, "Use regular Tx func");
+   }
 }

 static void __attribute__((cold))
@@ -2488,6 +2491,11 @@ fm10k_set_rx_function(struct rte_eth_dev *dev)
(dev->rx_pkt_burst == fm10k_recv_scattered_pkts_vec ||
dev->rx_pkt_burst == fm10k_recv_pkts_vec);

+   if (rx_using_sse)
+   PMD_INIT_LOG(DEBUG, "Use vector Rx func");
+   else
+   PMD_INIT_LOG(DEBUG, "Use regular Rx func");
+
for (i = 0; i < dev->data->nb_rx_queues; i++) {
struct fm10k_rx_queue *rxq = dev->data->rx_queues[i];

-- 
1.7.7.6



[dpdk-dev] [PATCH] fm10k: add debug info for actual Rx/Tx func

2015-11-25 Thread Chen, Jing D

> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Wednesday, November 25, 2015 12:42 AM
> To: Thomas Monjalon
> Cc: Chen, Jing D; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] fm10k: add debug info for actual Rx/Tx func
> 
> On Tue, 24 Nov 2015 11:32:46 +0100
> Thomas Monjalon  wrote:
> 
> > 2015-11-24 14:00, Chen Jing D:
> > > This patch adds debug info to notify user what actual Rx/Tx
> > > func are used.
> > [...]
> > > + if (rx_using_sse)
> > > + PMD_INIT_LOG(ERR, "Use vector Rx func");
> > > + else
> > > + PMD_INIT_LOG(ERR, "Use regular Rx func");
> >
> > debug info != LOG(ERR
> 
> Really should be DEBUG.
> Developers need to remember you don't want to see those log messages
> in a production system.

Thanks for the comments. I'll change accordingly. 
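
The reasoning behind the review comment can be illustrated with a small leveled logger. This is a sketch under stated assumptions: the enum values, the threshold variable and log_msg() are hypothetical and are not the DPDK logging API. It only shows why an informational message tagged as an error cannot be filtered out in production.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical levels; a larger value means a more verbose message. */
enum sk_log_level { SK_LOG_ERR = 4, SK_LOG_DEBUG = 8 };

/* Hypothetical production default: only errors are shown. */
static enum sk_log_level log_threshold = SK_LOG_ERR;

/* Prints msg if its level passes the threshold; returns 1 if printed. */
static int log_msg(enum sk_log_level level, const char *msg)
{
	if (level > log_threshold)
		return 0;
	return fprintf(stderr, "%s\n", msg) > 0;
}

static void example_usage(void)
{
	/* Tagged DEBUG: hidden by default, visible once debugging is on. */
	assert(log_msg(SK_LOG_DEBUG, "Use vector Rx func") == 0);
	log_threshold = SK_LOG_DEBUG;
	assert(log_msg(SK_LOG_DEBUG, "Use vector Rx func") == 1);
	log_threshold = SK_LOG_ERR;
}
```

With the message tagged as an error it would always be printed, which is what the reviewers objected to.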


[dpdk-dev] [PATCH] fm10k: improvement for vPMD compiling

2015-11-24 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

The fm10k driver fails to compile on non-x86 platforms because of
its SSE instructions, and the original implementation had no switch
to turn off vPMD.
This improvement introduces a config option to turn the vPMD
functions on or off; it is on by default. On non-x86 platforms it
can simply be turned off to fix the compile issue.
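
The patch relies on weak symbols so that the driver still links when the vector source file is left out of the build. A minimal sketch of that mechanism follows; the function and enum names are hypothetical, and `__attribute__((weak))` is a GCC/Clang extension, so this assumes one of those compilers.

```c
#include <assert.h>

/*
 * Weak fallback: the linker uses this definition only when no other
 * object file provides a strong definition with the same name.
 * Hypothetical name; returns -1 to mean "vector path unavailable".
 */
int __attribute__((weak))
vec_condition_check(void)
{
	return -1;
}

enum rx_path { PATH_SCALAR, PATH_VECTOR };

/* Picks the vector path only when the check reports success (0). */
static enum rx_path select_path(void)
{
	return vec_condition_check() == 0 ? PATH_VECTOR : PATH_SCALAR;
}

static void example_usage(void)
{
	/* No strong definition is linked in, so the stub decides. */
	assert(select_path() == PATH_SCALAR);
}
```

If a second source file defined a non-weak vec_condition_check() returning 0, the same select_path() would pick the vector path without any change to this file.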

Signed-off-by: Chen Jing D(Mark) 
---
 config/common_linuxapp |1 +
 drivers/net/fm10k/Makefile |2 +-
 drivers/net/fm10k/fm10k.h  |5 +++
 drivers/net/fm10k/fm10k_ethdev.c   |   66 +---
 drivers/net/fm10k/fm10k_rxtx_vec.c |   10 +
 5 files changed, 78 insertions(+), 6 deletions(-)

diff --git a/config/common_linuxapp b/config/common_linuxapp
index f72c46d..a565153 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -206,6 +206,7 @@ CONFIG_RTE_LIBRTE_FM10K_DEBUG_TX=n
 CONFIG_RTE_LIBRTE_FM10K_DEBUG_TX_FREE=n
 CONFIG_RTE_LIBRTE_FM10K_DEBUG_DRIVER=n
 CONFIG_RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE=y
+CONFIG_RTE_LIBRTE_FM10K_INC_VECTOR=y

 #
 # Compile burst-oriented Mellanox ConnectX-3 (MLX4) PMD
diff --git a/drivers/net/fm10k/Makefile b/drivers/net/fm10k/Makefile
index 06ebf83..602a2d2 100644
--- a/drivers/net/fm10k/Makefile
+++ b/drivers/net/fm10k/Makefile
@@ -93,7 +93,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_common.c
 SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_mbx.c
 SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_vf.c
 SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_api.c
-SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_rxtx_vec.c
+SRCS-$(CONFIG_RTE_LIBRTE_FM10K_INC_VECTOR) += fm10k_rxtx_vec.c

 # this lib depends upon:
 DEPDIRS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += lib/librte_eal lib/librte_ether
diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 38d5489..cd38af2 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -129,6 +129,9 @@
 #define RTE_FM10K_TX_MAX_FREE_BUF_SZ64
 #define RTE_FM10K_DESCS_PER_LOOP4

+#define FM10K_SIMPLE_TX_FLAG ((uint32_t)ETH_TXQ_FLAGS_NOMULTSEGS | \
+   ETH_TXQ_FLAGS_NOOFFLOADS)
+
 struct fm10k_macvlan_filter_info {
uint16_t vlan_num;   /* Total VLAN number */
uint16_t mac_num;/* Total mac number */
@@ -354,4 +357,6 @@ uint16_t fm10k_recv_scattered_pkts_vec(void *, struct rte_mbuf **,
 uint16_t fm10k_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
uint16_t nb_pkts);
 void fm10k_txq_vec_setup(struct fm10k_tx_queue *txq);
+int fm10k_tx_vec_condition_check(struct fm10k_tx_queue *txq);
+
 #endif
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 34dd55c..0a1df7f 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -55,9 +55,6 @@
 #define CHARS_PER_UINT32 (sizeof(uint32_t))
 #define BIT_MASK_PER_UINT32 ((1 << CHARS_PER_UINT32) - 1)

-#define FM10K_SIMPLE_TX_FLAG ((uint32_t)ETH_TXQ_FLAGS_NOMULTSEGS | \
-   ETH_TXQ_FLAGS_NOOFFLOADS)
-
 static void fm10k_close_mbx_service(struct fm10k_hw *hw);
 static void fm10k_dev_promiscuous_enable(struct rte_eth_dev *dev);
 static void fm10k_dev_promiscuous_disable(struct rte_eth_dev *dev);
@@ -132,6 +129,65 @@ fm10k_mbx_unlock(struct fm10k_hw *hw)
rte_spinlock_unlock(FM10K_DEV_PRIVATE_TO_MBXLOCK(hw->back));
 }

+/* Stubs needed for linkage when vPMD is disabled */
+int __attribute__((weak))
+fm10k_rx_vec_condition_check(__rte_unused struct rte_eth_dev *dev)
+{
+   return -1;
+}
+
+uint16_t __attribute__((weak))
+fm10k_recv_pkts_vec(
+   __rte_unused void *rx_queue,
+   __rte_unused struct rte_mbuf **rx_pkts,
+   __rte_unused uint16_t nb_pkts)
+{
+   return 0;
+}
+
+uint16_t __attribute__((weak))
+fm10k_recv_scattered_pkts_vec(
+   __rte_unused void *rx_queue,
+   __rte_unused struct rte_mbuf **rx_pkts,
+   __rte_unused uint16_t nb_pkts)
+{
+   return 0;
+}
+
+int __attribute__((weak))
+fm10k_rxq_vec_setup(__rte_unused struct fm10k_rx_queue *rxq)
+
+{
+   return -1;
+}
+
+void __attribute__((weak))
+fm10k_rx_queue_release_mbufs_vec(
+   __rte_unused struct fm10k_rx_queue *rxq)
+{
+   return;
+}
+
+void __attribute__((weak))
+fm10k_txq_vec_setup(__rte_unused struct fm10k_tx_queue *txq)
+{
+   return;
+}
+
+int __attribute__((weak))
+fm10k_tx_vec_condition_check(__rte_unused struct fm10k_tx_queue *txq)
+{
+   return -1;
+}
+
+uint16_t __attribute__((weak))
+fm10k_xmit_pkts_vec(__rte_unused void *tx_queue,
+   __rte_unused struct rte_mbuf **tx_pkts,
+   __rte_unused uint16_t nb_pkts)
+{
+   return 0;
+}
+
 /*
  * reset queue to initial state, allocate software buffers used when starting
  * device.
@@ -2394,8 +2450,8 @@ fm10k_set_tx_function(struct rte_eth_dev *dev)

for (i = 0; i < dev->data->nb_tx_queues; i++) {
  

[dpdk-dev] [PATCH] fm10k: add debug info for actual Rx/Tx func

2015-11-24 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

After the vPMD feature was introduced, the fm10k driver selects
the best Rx/Tx functions at run time. The original implementation
makes this selection silently, without any notification.

This patch adds debug logs that tell the user which Rx/Tx
functions are actually used.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k_ethdev.c |   10 +-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 7f5c852..34dd55c 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -2402,13 +2402,16 @@ fm10k_set_tx_function(struct rte_eth_dev *dev)
}

if (use_sse) {
+   PMD_INIT_LOG(ERR, "Use vector Tx func");
for (i = 0; i < dev->data->nb_tx_queues; i++) {
txq = dev->data->tx_queues[i];
fm10k_txq_vec_setup(txq);
}
dev->tx_pkt_burst = fm10k_xmit_pkts_vec;
-   } else
+   } else {
dev->tx_pkt_burst = fm10k_xmit_pkts;
+   PMD_INIT_LOG(ERR, "Use regular Tx func");
+   }
 }

 static void __attribute__((cold))
@@ -2432,6 +2435,11 @@ fm10k_set_rx_function(struct rte_eth_dev *dev)
(dev->rx_pkt_burst == fm10k_recv_scattered_pkts_vec ||
dev->rx_pkt_burst == fm10k_recv_pkts_vec);

+   if (rx_using_sse)
+   PMD_INIT_LOG(ERR, "Use vector Rx func");
+   else
+   PMD_INIT_LOG(ERR, "Use regular Rx func");
+
for (i = 0; i < dev->data->nb_rx_queues; i++) {
struct fm10k_rx_queue *rxq = dev->data->rx_queues[i];

-- 
1.7.7.6



[dpdk-dev] [PATCH v2] fm10k: fix a crash bug when quit from testpmd

2015-11-24 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

When the fm10k port is closed, both tx_queue_clean() and
fm10k_tx_queue_release_mbufs_vec() try to release the buffers in
the SW ring. The latter doesn't sanity-check those pointers, which
causes a crash.

The fix removes the Vector TX buffer release function, since
Vector TX can share the release function with regular TX.

Fixes: fb9066e479a6 ("fm10k: reset and release mbuf")

Signed-off-by: Chen Jing D(Mark) 
Acked-by: Michael Qiu 
---
 drivers/net/fm10k/fm10k.h  |1 -
 drivers/net/fm10k/fm10k_ethdev.c   |7 +++
 drivers/net/fm10k/fm10k_rxtx_vec.c |   28 
 3 files changed, 3 insertions(+), 33 deletions(-)

v2 changes:
 - remove debug info for actual rx/tx func.

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 754aa6a..38d5489 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -237,7 +237,6 @@ struct fm10k_tx_queue {
 };

 struct fm10k_txq_ops {
-   void (*release_mbufs)(struct fm10k_tx_queue *txq);
void (*reset)(struct fm10k_tx_queue *txq);
 };

diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 441f713..7f5c852 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -386,7 +386,6 @@ fm10k_check_mq_mode(struct rte_eth_dev *dev)
 }

 static const struct fm10k_txq_ops def_txq_ops = {
-   .release_mbufs = tx_queue_free,
.reset = tx_queue_reset,
 };

@@ -1073,7 +1072,7 @@ fm10k_dev_queue_release(struct rte_eth_dev *dev)
for (i = 0; i < dev->data->nb_tx_queues; i++) {
struct fm10k_tx_queue *txq = dev->data->tx_queues[i];

-   txq->ops->release_mbufs(txq);
+   tx_queue_free(txq);
}
}

@@ -1761,7 +1760,7 @@ fm10k_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
if (dev->data->tx_queues[queue_id] != NULL) {
struct fm10k_tx_queue *txq = dev->data->tx_queues[queue_id];

-   txq->ops->release_mbufs(txq);
+   tx_queue_free(txq);
dev->data->tx_queues[queue_id] = NULL;
}

@@ -1836,7 +1835,7 @@ fm10k_tx_queue_release(void *queue)
struct fm10k_tx_queue *q = queue;
PMD_INIT_FUNC_TRACE();

-   q->ops->release_mbufs(q);
+   tx_queue_free(q);
 }

 static int
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 06beca9..6042568 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -45,8 +45,6 @@
 #endif

 static void
-fm10k_tx_queue_release_mbufs_vec(struct fm10k_tx_queue *txq);
-static void
 fm10k_reset_tx_queue(struct fm10k_tx_queue *txq);

 /* Handling the offload flags (olflags) field takes computation
@@ -634,7 +632,6 @@ fm10k_recv_scattered_pkts_vec(void *rx_queue,
 }

 static const struct fm10k_txq_ops vec_txq_ops = {
-   .release_mbufs = fm10k_tx_queue_release_mbufs_vec,
.reset = fm10k_reset_tx_queue,
 };

@@ -795,31 +792,6 @@ fm10k_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 }

 static void __attribute__((cold))
-fm10k_tx_queue_release_mbufs_vec(struct fm10k_tx_queue *txq)
-{
-   unsigned i;
-   const uint16_t max_desc = (uint16_t)(txq->nb_desc - 1);
-
-   if (txq->sw_ring == NULL || txq->nb_free == max_desc)
-   return;
-
-   /* release the used mbufs in sw_ring */
-   for (i = txq->next_dd - (txq->rs_thresh - 1);
-i != txq->next_free;
-i = (i + 1) & max_desc)
-   rte_pktmbuf_free_seg(txq->sw_ring[i]);
-
-   txq->nb_free = max_desc;
-
-   /* reset tx_entry */
-   for (i = 0; i < txq->nb_desc; i++)
-   txq->sw_ring[i] = NULL;
-
-   rte_free(txq->sw_ring);
-   txq->sw_ring = NULL;
-}
-
-static void __attribute__((cold))
 fm10k_reset_tx_queue(struct fm10k_tx_queue *txq)
 {
static const struct fm10k_tx_desc zeroed_desc = {0};
-- 
1.7.7.6
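
The crash comes from two release paths touching the same SW ring without checking the pointers. A release routine that checks and clears each pointer can be called twice safely. The sketch below illustrates that pattern only; the struct and function names are hypothetical, plain malloc/free stands in for mbuf handling, and it is not the driver's tx_queue_free().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, reduced Tx queue: one buffer pointer per descriptor. */
struct txq_model {
	void **sw_ring;    /* NULL entries hold no buffer */
	unsigned nb_desc;
};

static unsigned free_count; /* number of buffers released, for the demo */

static void buf_free(void *buf)
{
	free(buf);
	free_count++;
}

/* Releases every buffer once; a second call finds nothing to do. */
static void txq_release_bufs(struct txq_model *q)
{
	unsigned i;

	if (q == NULL || q->sw_ring == NULL)
		return;

	for (i = 0; i < q->nb_desc; i++) {
		if (q->sw_ring[i] != NULL) {
			buf_free(q->sw_ring[i]);
			q->sw_ring[i] = NULL; /* guard against double free */
		}
	}
	free(q->sw_ring);
	q->sw_ring = NULL;
}

static void example_usage(void)
{
	struct txq_model q;
	unsigned before = free_count;

	q.nb_desc = 4;
	q.sw_ring = calloc(q.nb_desc, sizeof(*q.sw_ring));
	assert(q.sw_ring != NULL);
	q.sw_ring[1] = malloc(8);
	q.sw_ring[3] = malloc(8);
	assert(q.sw_ring[1] != NULL && q.sw_ring[3] != NULL);

	txq_release_bufs(&q);
	txq_release_bufs(&q); /* second caller: no crash, no extra frees */
	assert(free_count == before + 2);
	assert(q.sw_ring == NULL);
}
```

Sharing one such routine between the regular and vector paths, as the patch does, removes the second unchecked release altogether.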



[dpdk-dev] [PATCH] fm10k: fix a crash bug when quit from testpmd

2015-11-24 Thread Chen, Jing D

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Tuesday, November 24, 2015 6:55 AM
> To: Chen, Jing D
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] fm10k: fix a crash bug when quit from
> testpmd
> 
> 2015-11-12 12:57, Chen Jing D:
> > From: "Chen Jing D(Mark)" 
> >
> > When the fm10k port is closed, both func tx_queue_clean() and
> > fm10k_tx_queue_release_mbufs_vec() will try to release buffer in
> > SW ring. The latter func won't do sanity check on those pointers
> > and cause crash.
> >
> > The fix include 2 parts.
> > 1. Remove Vector TX buffer release func since it can share the
> >release functions with regular TX.
> > 2. Add log to print out what actual Rx/Tx func is used.
> 
> 2 parts mean 2 patches.

OK, I'll send 2.

> 
> [...]
> > +   if (rx_using_sse)
> > +   PMD_INIT_LOG(ERR, "Use vector Rx func");
> > +   else
> > +   PMD_INIT_LOG(ERR, "Use regular Rx func");
> 
> Why use an error log level?

Because fm10k decides the best rx/tx func at run time, some users
complain that they can't find out which rx/tx func they are using. The
error-level log will help them.


[dpdk-dev] [PATCH v2] fm10k: fix wrong VLAN value in RX mbuf

2015-11-20 Thread Chen, Jing D
> Signed-off-by: Shaopeng He 
> ---
> ChangeLog:
> 
> v2:
> * change flag PKT_RX_VLAN_PKT to always set
> * preserve the priority bits in vlan_tci
> 
>  drivers/net/fm10k/fm10k_rxtx.c | 14 --
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/fm10k/fm10k_rxtx.c
> b/drivers/net/fm10k/fm10k_rxtx.c
> index 1bac28d..764f4f4 100644
> --- a/drivers/net/fm10k/fm10k_rxtx.c
> +++ b/drivers/net/fm10k/fm10k_rxtx.c
> @@ -104,8 +104,14 @@ rx_desc_to_ol_flags(struct rte_mbuf *m, const
> union fm10k_rx_desc *d)
>   (FM10K_RXD_STATUS_L4CS | FM10K_RXD_STATUS_L4E)))
>   m->ol_flags |= PKT_RX_L4_CKSUM_BAD;
> 
> - if (d->d.staterr & FM10K_RXD_STATUS_VEXT)
> - m->ol_flags |= PKT_RX_VLAN_PKT;
> + /**
> +  * fm10k's Ethernet switch core associates a VLAN ID and VLAN PRI

Change to "Packets in the fm10k device always carry a VLAN tag"?

> +  * for each packet. For those packets coming in without a VLAN,
> +  * the port default VLAN ID will be used.
> +  * So in fm10k, always PKT_RX_VLAN_PKT flag is set and vlan_tci
> +  * is valid for each RX packet's mbuf.
> +  */
> + m->ol_flags |= PKT_RX_VLAN_PKT;

Since vlan_tci is always valid, would it be better to move the above line
down to the lines added below?

> 
>   if (unlikely(d->d.staterr & FM10K_RXD_STATUS_HBO))
>   m->ol_flags |= PKT_RX_HBUF_OVERFLOW;
> @@ -146,6 +152,8 @@ fm10k_recv_pkts(void *rx_queue, struct rte_mbuf
> **rx_pkts,
>  #endif
> 
>   mbuf->hash.rss = desc.d.rss;
> + /* in fm10k, vlan_tci is always valid for RX packet */
> + mbuf->vlan_tci = desc.w.vlan;
> 
>   rx_pkts[count] = mbuf;
>   if (++next_dd == q->nb_desc) {
> @@ -292,6 +300,8 @@ fm10k_recv_scattered_pkts(void *rx_queue, struct
> rte_mbuf **rx_pkts,
>   rx_desc_to_ol_flags(first_seg, &desc);
>  #endif
>   first_seg->hash.rss = desc.d.rss;
> + /* in fm10k, vlan_tci is always valid for RX packet */
> + first_seg->vlan_tci = desc.w.vlan;
> 
>   /* Prefetch data of first segment, if configured to do so. */
>   rte_packet_prefetch((char *)first_seg->buf_addr +
> --
> 1.9.3
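
The difference between v1 (which masked the value with FM10K_RXD_VLAN_ID_MASK) and v2 (which keeps the whole field) is easier to see with the tag fields split out. The sketch below assumes the descriptor's VLAN word follows the IEEE 802.1Q TCI layout (3-bit priority, 1-bit DEI, 12-bit VLAN ID) and that the mask covers the low 12 bits; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 802.1Q TCI layout: [15:13] priority, [12] DEI, [11:0] VLAN ID. */
#define SK_VLAN_ID_MASK   0x0fffu
#define SK_VLAN_PRI_SHIFT 13

static uint16_t tci_vlan_id(uint16_t tci)
{
	return (uint16_t)(tci & SK_VLAN_ID_MASK);
}

static uint8_t tci_priority(uint16_t tci)
{
	return (uint8_t)(tci >> SK_VLAN_PRI_SHIFT);
}

static void example_usage(void)
{
	/* Priority 5, VLAN ID 100. */
	uint16_t tci = (uint16_t)((5u << SK_VLAN_PRI_SHIFT) | 100u);
	uint16_t masked = (uint16_t)(tci & SK_VLAN_ID_MASK);

	/* Keeping the full TCI preserves both fields (the v2 behaviour). */
	assert(tci_vlan_id(tci) == 100);
	assert(tci_priority(tci) == 5);

	/* Masking to the ID alone drops the priority (the v1 behaviour). */
	assert(tci_vlan_id(masked) == 100);
	assert(tci_priority(masked) == 0);
}
```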




[dpdk-dev] [PATCH] fm10k: fix wrong VLAN value in RX mbuf

2015-11-19 Thread Chen, Jing D
Hi, 

It is worth adding a comment that vlan_tci is only valid when
RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE is turned on and the
PKT_RX_VLAN_PKT flag is set.

Best Regards,
Mark


> -----Original Message-----
> From: He, Shaopeng
> Sent: Wednesday, November 18, 2015 4:50 PM
> To: dev at dpdk.org
> Cc: Chen, Jing D; Qiu, Michael; He, Shaopeng
> Subject: [PATCH] fm10k: fix wrong VLAN value in RX mbuf
> 
> VLAN value should be copied from RX descriptor to mbuf,
> this patch fixes this issue.
> 
> Signed-off-by: Shaopeng He 
> ---
>  drivers/net/fm10k/fm10k_rxtx.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/net/fm10k/fm10k_rxtx.c
> b/drivers/net/fm10k/fm10k_rxtx.c
> index 1bac28d..eeb635e 100644
> --- a/drivers/net/fm10k/fm10k_rxtx.c
> +++ b/drivers/net/fm10k/fm10k_rxtx.c
> @@ -146,6 +146,7 @@ fm10k_recv_pkts(void *rx_queue, struct rte_mbuf
> **rx_pkts,
>  #endif
> 
>   mbuf->hash.rss = desc.d.rss;
> + mbuf->vlan_tci = desc.w.vlan &
> FM10K_RXD_VLAN_ID_MASK;
> 
>   rx_pkts[count] = mbuf;
>   if (++next_dd == q->nb_desc) {
> @@ -292,6 +293,7 @@ fm10k_recv_scattered_pkts(void *rx_queue, struct
> rte_mbuf **rx_pkts,
>   rx_desc_to_ol_flags(first_seg, &desc);
>  #endif
>   first_seg->hash.rss = desc.d.rss;
> + first_seg->vlan_tci = desc.w.vlan &
> FM10K_RXD_VLAN_ID_MASK;
> 
>   /* Prefetch data of first segment, if configured to do so. */
>   rte_packet_prefetch((char *)first_seg->buf_addr +
> --
> 1.9.3


[dpdk-dev] [PATCH] fm10k: fix a crash bug when quit from testpmd

2015-11-12 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

When the fm10k port is closed, both tx_queue_clean() and
fm10k_tx_queue_release_mbufs_vec() try to release the buffers in
the SW ring. The latter doesn't sanity-check those pointers, which
causes a crash.

The fix includes 2 parts.
1. Remove the Vector TX buffer release func, since Vector TX can
   share the release function with regular TX.
2. Add a log that prints which Rx/Tx func is actually used.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k.h  |1 -
 drivers/net/fm10k/fm10k_ethdev.c   |   17 -
 drivers/net/fm10k/fm10k_rxtx_vec.c |   28 
 3 files changed, 12 insertions(+), 34 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 754aa6a..38d5489 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -237,7 +237,6 @@ struct fm10k_tx_queue {
 };

 struct fm10k_txq_ops {
-   void (*release_mbufs)(struct fm10k_tx_queue *txq);
void (*reset)(struct fm10k_tx_queue *txq);
 };

diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index cf7ada7..af7b0c2 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -386,7 +386,6 @@ fm10k_check_mq_mode(struct rte_eth_dev *dev)
 }

 static const struct fm10k_txq_ops def_txq_ops = {
-   .release_mbufs = tx_queue_free,
.reset = tx_queue_reset,
 };

@@ -1073,7 +1072,7 @@ fm10k_dev_queue_release(struct rte_eth_dev *dev)
for (i = 0; i < dev->data->nb_tx_queues; i++) {
struct fm10k_tx_queue *txq = dev->data->tx_queues[i];

-   txq->ops->release_mbufs(txq);
+   tx_queue_free(txq);
}
}

@@ -1793,7 +1792,7 @@ fm10k_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
if (dev->data->tx_queues[queue_id] != NULL) {
struct fm10k_tx_queue *txq = dev->data->tx_queues[queue_id];

-   txq->ops->release_mbufs(txq);
+   tx_queue_free(txq);
dev->data->tx_queues[queue_id] = NULL;
}

@@ -1872,7 +1871,7 @@ fm10k_tx_queue_release(void *queue)
struct fm10k_tx_queue *q = queue;
PMD_INIT_FUNC_TRACE();

-   q->ops->release_mbufs(q);
+   tx_queue_free(q);
 }

 static int
@@ -2439,13 +2438,16 @@ fm10k_set_tx_function(struct rte_eth_dev *dev)
}

if (use_sse) {
+   PMD_INIT_LOG(ERR, "Use vector Tx func");
for (i = 0; i < dev->data->nb_tx_queues; i++) {
txq = dev->data->tx_queues[i];
fm10k_txq_vec_setup(txq);
}
dev->tx_pkt_burst = fm10k_xmit_pkts_vec;
-   } else
+   } else {
dev->tx_pkt_burst = fm10k_xmit_pkts;
+   PMD_INIT_LOG(ERR, "Use regular Tx func");
+   }
 }

 static void __attribute__((cold))
@@ -2469,6 +2471,11 @@ fm10k_set_rx_function(struct rte_eth_dev *dev)
(dev->rx_pkt_burst == fm10k_recv_scattered_pkts_vec ||
dev->rx_pkt_burst == fm10k_recv_pkts_vec);

+   if (rx_using_sse)
+   PMD_INIT_LOG(ERR, "Use vector Rx func");
+   else
+   PMD_INIT_LOG(ERR, "Use regular Rx func");
+
for (i = 0; i < dev->data->nb_rx_queues; i++) {
struct fm10k_rx_queue *rxq = dev->data->rx_queues[i];

diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 06beca9..6042568 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -45,8 +45,6 @@
 #endif

 static void
-fm10k_tx_queue_release_mbufs_vec(struct fm10k_tx_queue *txq);
-static void
 fm10k_reset_tx_queue(struct fm10k_tx_queue *txq);

 /* Handling the offload flags (olflags) field takes computation
@@ -634,7 +632,6 @@ fm10k_recv_scattered_pkts_vec(void *rx_queue,
 }

 static const struct fm10k_txq_ops vec_txq_ops = {
-   .release_mbufs = fm10k_tx_queue_release_mbufs_vec,
.reset = fm10k_reset_tx_queue,
 };

@@ -795,31 +792,6 @@ fm10k_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 }

 static void __attribute__((cold))
-fm10k_tx_queue_release_mbufs_vec(struct fm10k_tx_queue *txq)
-{
-   unsigned i;
-   const uint16_t max_desc = (uint16_t)(txq->nb_desc - 1);
-
-   if (txq->sw_ring == NULL || txq->nb_free == max_desc)
-   return;
-
-   /* release the used mbufs in sw_ring */
-   for (i = txq->next_dd - (txq->rs_thresh - 1);
-i != txq->next_free;
-i = (i + 1) & max_desc)
-   rte_pktmbuf_free_seg(txq->sw_ring[i]);
-
-   txq->nb_free = max_desc;
-
-   /* reset tx_entry */
-   for (i = 0; i < txq->nb_desc; i++)
-   

[dpdk-dev] [PATCH v5 13/14] fm10k: fix a crash issue in vector RX func

2015-10-30 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

The vector RX function processes 4 packets at a time. When the RX
ring wraps around to the tail and the number of remaining descriptors
is not a multiple of 4, SW will overwrite memory that does not belong
to it and cause a crash. The fix allocates 4 additional HW/SW entries
at the tail to avoid the overwrite.
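
The bounds argument can be checked with a small model of a padded ring. This is a sketch, not the driver code: the struct, the function names and the burst size macro are hypothetical, the model only reads the padding, and it assumes a zeroed descriptor means "not done".

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SK_DESCS_PER_LOOP 4 /* burst size of the vector-style scan */

/* Hypothetical ring: nb_desc real entries plus zeroed padding entries. */
struct ring_model {
	uint32_t *status;  /* non-zero means the descriptor is done */
	uint16_t nb_desc;
	uint16_t nb_fake;
};

static int ring_init(struct ring_model *r, uint16_t nb_desc)
{
	r->nb_desc = nb_desc;
	r->nb_fake = SK_DESCS_PER_LOOP;
	r->status = calloc((size_t)nb_desc + r->nb_fake, sizeof(*r->status));
	return r->status != NULL ? 0 : -1;
}

/* Marks every real descriptor as done; padding stays zero. */
static void ring_mark_all_done(struct ring_model *r)
{
	uint16_t i;

	for (i = 0; i < r->nb_desc; i++)
		r->status[i] = 1;
}

/*
 * Loads a full burst unconditionally, as a vector load would, then
 * counts the leading done entries. The load may run past nb_desc,
 * but it lands in the padding, which is zero and so never "done".
 */
static unsigned ring_scan(const struct ring_model *r, uint16_t start)
{
	uint32_t burst[SK_DESCS_PER_LOOP];
	unsigned i, done = 0;

	for (i = 0; i < SK_DESCS_PER_LOOP; i++)
		burst[i] = r->status[start + i];

	for (i = 0; i < SK_DESCS_PER_LOOP; i++) {
		if (burst[i] == 0)
			break;
		done++;
	}
	return done;
}

static void example_usage(void)
{
	struct ring_model r;

	assert(ring_init(&r, 6) == 0); /* 6 is not a multiple of 4 */
	ring_mark_all_done(&r);
	assert(ring_scan(&r, 0) == 4);
	assert(ring_scan(&r, 4) == 2); /* entries 6 and 7 are padding */
	free(r.status);
}
```

Without the padding, the scan starting at index 4 would touch two entries beyond the allocation.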

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k.h|4 +++-
 drivers/net/fm10k/fm10k_ethdev.c |   19 +--
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 8e2c6a4..82a548f 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -177,7 +177,7 @@ struct fm10k_rx_queue {
struct rte_mbuf *pkt_last_seg;  /* Last segment of current packet. */
uint64_t hw_ring_phys_addr;
uint64_t mbuf_initializer; /* value to init mbufs */
-   /** need to alloc dummy mbuf, for wraparound when scanning hw ring */
+   /* need to alloc dummy mbuf, for wraparound when scanning hw ring */
struct rte_mbuf fake_mbuf;
uint16_t next_dd;
uint16_t next_alloc;
@@ -185,6 +185,8 @@ struct fm10k_rx_queue {
uint16_t alloc_thresh;
volatile uint32_t *tail_ptr;
uint16_t nb_desc;
+   /* Number of faked desc added at the tail for Vector RX function */
+   uint16_t nb_fake_desc;
uint16_t queue_id;
/* Below 2 fields only valid in case vPMD is applied. */
uint16_t rxrearm_nb; /* number of remaining to be re-armed */
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 05ed90d..dde067f 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -102,6 +102,7 @@ fm10k_mbx_unlock(struct fm10k_hw *hw)
 static inline int
 rx_queue_reset(struct fm10k_rx_queue *q)
 {
+   static const union fm10k_rx_desc zero = {{0} };
uint64_t dma_addr;
int i, diag;
PMD_INIT_FUNC_TRACE();
@@ -122,6 +123,15 @@ rx_queue_reset(struct fm10k_rx_queue *q)
q->hw_ring[i].q.hdr_addr = dma_addr;
}

+   /* initialize extra software ring entries. Space for these extra
+* entries is always allocated.
+*/
+   memset(&q->fake_mbuf, 0x0, sizeof(q->fake_mbuf));
+   for (i = 0; i < q->nb_fake_desc; ++i) {
+   q->sw_ring[q->nb_desc + i] = &q->fake_mbuf;
+   q->hw_ring[q->nb_desc + i] = zero;
+   }
+
q->next_dd = 0;
q->next_alloc = 0;
q->next_trigger = q->alloc_thresh - 1;
@@ -147,6 +157,10 @@ rx_queue_clean(struct fm10k_rx_queue *q)
for (i = 0; i < q->nb_desc; ++i)
q->hw_ring[i] = zero;

+   /* zero faked descriptors */
+   for (i = 0; i < q->nb_fake_desc; ++i)
+   q->hw_ring[q->nb_desc + i] = zero;
+
/* vPMD driver has a different way of releasing mbufs. */
if (q->rx_using_sse) {
fm10k_rx_queue_release_mbufs_vec(q);
@@ -1326,6 +1340,7 @@ fm10k_rx_queue_setup(struct rte_eth_dev *dev, uint16_t 
queue_id,
/* setup queue */
q->mp = mp;
q->nb_desc = nb_desc;
+   q->nb_fake_desc = FM10K_MULT_RX_DESC;
q->port_id = dev->data->port_id;
q->queue_id = queue_id;
q->tail_ptr = (volatile uint32_t *)
@@ -1335,8 +1350,8 @@ fm10k_rx_queue_setup(struct rte_eth_dev *dev, uint16_t 
queue_id,

/* allocate memory for the software ring */
q->sw_ring = rte_zmalloc_socket("fm10k sw ring",
-   nb_desc * sizeof(struct rte_mbuf *),
-   RTE_CACHE_LINE_SIZE, socket_id);
+   (nb_desc + q->nb_fake_desc) * sizeof(struct rte_mbuf *),
+   RTE_CACHE_LINE_SIZE, socket_id);
if (q->sw_ring == NULL) {
PMD_INIT_LOG(ERR, "Cannot allocate software ring");
rte_free(q);
-- 
1.7.7.6
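
A rough illustration of the fix described in this patch, not the driver code itself: the sketch below pads a ring with extra zeroed entries so that a batch of 4 starting at the last real descriptor still lands inside the allocation. The names demo_ring, demo_ring_init, demo_batch_write and the BATCH constant are hypothetical; the value 4 is taken from the commit message.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define BATCH 4 /* entries touched per vector iteration (from the commit message) */

/* Hypothetical ring: nb_desc real entries followed by nb_fake padding entries. */
struct demo_ring {
	uint64_t *entries;
	uint16_t nb_desc;
	uint16_t nb_fake;
};

/* Allocate the real entries plus BATCH zeroed padding entries at the tail. */
static int
demo_ring_init(struct demo_ring *r, uint16_t nb_desc)
{
	assert(r != NULL);
	r->nb_desc = nb_desc;
	r->nb_fake = BATCH;
	r->entries = calloc((size_t)nb_desc + r->nb_fake, sizeof(*r->entries));
	return r->entries == NULL ? -1 : 0;
}

/* Highest index that a batch starting at pos can touch. */
static uint32_t
demo_batch_last_index(uint16_t pos)
{
	return (uint32_t)pos + BATCH - 1;
}

/* Write one batch starting at a real descriptor; -1 if it would leave the allocation. */
static int
demo_batch_write(struct demo_ring *r, uint16_t pos, uint64_t val)
{
	const uint32_t limit = (uint32_t)r->nb_desc + r->nb_fake;
	unsigned i;

	if (pos >= r->nb_desc || demo_batch_last_index(pos) >= limit)
		return -1;
	for (i = 0; i < BATCH; i++)
		r->entries[pos + i] = val;
	return 0;
}
```

With only nb_desc entries allocated, a batch starting at index nb_desc - 1 would write up to 3 entries past the end; the padding absorbs those writes.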



[dpdk-dev] [PATCH v5 12/14] fm10k: Add function to decide best TX func

2015-10-30 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

Add func fm10k_set_tx_function to decide the best TX func in
fm10k_dev_tx_init.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k.h|1 +
 drivers/net/fm10k/fm10k_ethdev.c |   38 --
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index bfb71da..8e2c6a4 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -224,6 +224,7 @@ struct fm10k_tx_queue {
uint16_t next_rs; /* Next pos to set RS flag */
uint16_t next_dd; /* Next pos to check DD flag */
volatile uint32_t *tail_ptr;
+   uint32_t txq_flags; /* Holds flags for this TXq */
uint16_t nb_desc;
uint8_t port_id;
uint8_t tx_deferred_start; /** < don't start this queue in dev start. */
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 0b40797..05ed90d 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -53,6 +53,9 @@
 #define CHARS_PER_UINT32 (sizeof(uint32_t))
 #define BIT_MASK_PER_UINT32 ((1 << CHARS_PER_UINT32) - 1)

+#define FM10K_SIMPLE_TX_FLAG ((uint32_t)ETH_TXQ_FLAGS_NOMULTSEGS | \
+   ETH_TXQ_FLAGS_NOOFFLOADS)
+
 static void fm10k_close_mbx_service(struct fm10k_hw *hw);
 static void fm10k_dev_promiscuous_enable(struct rte_eth_dev *dev);
 static void fm10k_dev_promiscuous_disable(struct rte_eth_dev *dev);
@@ -68,6 +71,7 @@ fm10k_MACVLAN_remove_all(struct rte_eth_dev *dev);
 static void fm10k_tx_queue_release(void *queue);
 static void fm10k_rx_queue_release(void *queue);
 static void fm10k_set_rx_function(struct rte_eth_dev *dev);
+static void fm10k_set_tx_function(struct rte_eth_dev *dev);

 static void
 fm10k_mbx_initlock(struct fm10k_hw *hw)
@@ -414,6 +418,10 @@ fm10k_dev_tx_init(struct rte_eth_dev *dev)
base_addr >> (CHAR_BIT * sizeof(uint32_t)));
FM10K_WRITE_REG(hw, FM10K_TDLEN(i), size);
}
+
+   /* set up vector or scalar TX function as appropriate */
+   fm10k_set_tx_function(dev);
+
return 0;
 }

@@ -983,8 +991,7 @@ fm10k_dev_infos_get(struct rte_eth_dev *dev,
},
.tx_free_thresh = FM10K_TX_FREE_THRESH_DEFAULT(0),
.tx_rs_thresh = FM10K_TX_RS_THRESH_DEFAULT(0),
-   .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
-   ETH_TXQ_FLAGS_NOOFFLOADS,
+   .txq_flags = FM10K_SIMPLE_TX_FLAG,
};

 }
@@ -1483,6 +1490,7 @@ fm10k_tx_queue_setup(struct rte_eth_dev *dev, uint16_t 
queue_id,
q->nb_desc = nb_desc;
q->port_id = dev->data->port_id;
q->queue_id = queue_id;
+   q->txq_flags = conf->txq_flags;
q->ops = &def_txq_ops;
q->tail_ptr = (volatile uint32_t *)
&((uint32_t *)hw->hw_addr)[FM10K_TDT(queue_id)];
@@ -2094,6 +2102,32 @@ static const struct eth_dev_ops fm10k_eth_dev_ops = {
 };

 static void __attribute__((cold))
+fm10k_set_tx_function(struct rte_eth_dev *dev)
+{
+   struct fm10k_tx_queue *txq;
+   int i;
+   int use_sse = 1;
+
+   for (i = 0; i < dev->data->nb_tx_queues; i++) {
+   txq = dev->data->tx_queues[i];
+   if ((txq->txq_flags & FM10K_SIMPLE_TX_FLAG) !=
+   FM10K_SIMPLE_TX_FLAG) {
+   use_sse = 0;
+   break;
+   }
+   }
+
+   if (use_sse) {
+   for (i = 0; i < dev->data->nb_tx_queues; i++) {
+   txq = dev->data->tx_queues[i];
+   fm10k_txq_vec_setup(txq);
+   }
+   dev->tx_pkt_burst = fm10k_xmit_pkts_vec;
+   } else
+   dev->tx_pkt_burst = fm10k_xmit_pkts;
+}
+
+static void __attribute__((cold))
 fm10k_set_rx_function(struct rte_eth_dev *dev)
 {
struct fm10k_dev_info *dev_info = FM10K_DEV_PRIVATE_TO_INFO(dev);
-- 
1.7.7.6



[dpdk-dev] [PATCH v5 11/14] fm10k: introduce 2 funcs to reset TX queue and mbuf release

2015-10-30 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

Add 2 funcs to reset the TX queue and release its mbufs when Vector TX
is applied.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k_rxtx_vec.c |   68 
 1 files changed, 68 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c 
b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 4515b26..06beca9 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -44,6 +44,11 @@
 #pragma GCC diagnostic ignored "-Wcast-qual"
 #endif

+static void
+fm10k_tx_queue_release_mbufs_vec(struct fm10k_tx_queue *txq);
+static void
+fm10k_reset_tx_queue(struct fm10k_tx_queue *txq);
+
 /* Handling the offload flags (olflags) field takes computation
  * time when receiving packets. Therefore we provide a flag to disable
  * the processing of the olflags field when they are not needed. This
@@ -628,6 +633,17 @@ fm10k_recv_scattered_pkts_vec(void *rx_queue,
&split_flags[i]);
 }

+static const struct fm10k_txq_ops vec_txq_ops = {
+   .release_mbufs = fm10k_tx_queue_release_mbufs_vec,
+   .reset = fm10k_reset_tx_queue,
+};
+
+void __attribute__((cold))
+fm10k_txq_vec_setup(struct fm10k_tx_queue *txq)
+{
+   txq->ops = &vec_txq_ops;
+}
+
 static inline void
 vtx1(volatile struct fm10k_tx_desc *txdp,
struct rte_mbuf *pkt, uint64_t flags)
@@ -777,3 +793,55 @@ fm10k_xmit_pkts_vec(void *tx_queue, struct rte_mbuf 
**tx_pkts,

return nb_pkts;
 }
+
+static void __attribute__((cold))
+fm10k_tx_queue_release_mbufs_vec(struct fm10k_tx_queue *txq)
+{
+   unsigned i;
+   const uint16_t max_desc = (uint16_t)(txq->nb_desc - 1);
+
+   if (txq->sw_ring == NULL || txq->nb_free == max_desc)
+   return;
+
+   /* release the used mbufs in sw_ring */
+   for (i = txq->next_dd - (txq->rs_thresh - 1);
+i != txq->next_free;
+i = (i + 1) & max_desc)
+   rte_pktmbuf_free_seg(txq->sw_ring[i]);
+
+   txq->nb_free = max_desc;
+
+   /* reset tx_entry */
+   for (i = 0; i < txq->nb_desc; i++)
+   txq->sw_ring[i] = NULL;
+
+   rte_free(txq->sw_ring);
+   txq->sw_ring = NULL;
+}
+
+static void __attribute__((cold))
+fm10k_reset_tx_queue(struct fm10k_tx_queue *txq)
+{
+   static const struct fm10k_tx_desc zeroed_desc = {0};
+   struct rte_mbuf **txe = txq->sw_ring;
+   uint16_t i;
+
+   /* Zero out HW ring memory */
+   for (i = 0; i < txq->nb_desc; i++)
+   txq->hw_ring[i] = zeroed_desc;
+
+   /* Initialize SW ring entries */
+   for (i = 0; i < txq->nb_desc; i++)
+   txe[i] = NULL;
+
+   txq->next_dd = (uint16_t)(txq->rs_thresh - 1);
+   txq->next_rs = (uint16_t)(txq->rs_thresh - 1);
+
+   txq->next_free = 0;
+   txq->nb_used = 0;
+   /* Always allow 1 descriptor to be un-allocated to avoid
+* a H/W race condition
+*/
+   txq->nb_free = (uint16_t)(txq->nb_desc - 1);
+   FM10K_PCI_REG_WRITE(txq->tail_ptr, 0);
+}
-- 
1.7.7.6



[dpdk-dev] [PATCH v5 10/14] fm10k: use func pointer to reset TX queue and mbuf release

2015-10-30 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

Vector TX manages the TX queue differently from the scalar path, so it
needs different functions to reset the TX queue and to release the mbufs
held in it. Introduce 2 function pointers for these ops.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k.h|9 +
 drivers/net/fm10k/fm10k_ethdev.c |   24 +++-
 2 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 5525b72..bfb71da 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -206,11 +206,14 @@ struct fifo {
uint16_t *endp;
 };

+struct fm10k_txq_ops;
+
 struct fm10k_tx_queue {
struct rte_mbuf **sw_ring;
struct fm10k_tx_desc *hw_ring;
uint64_t hw_ring_phys_addr;
struct fifo rs_tracker;
+   const struct fm10k_txq_ops *ops; /* txq ops */
uint16_t last_free;
uint16_t next_free;
uint16_t nb_free;
@@ -227,6 +230,11 @@ struct fm10k_tx_queue {
uint16_t queue_id;
 };

+struct fm10k_txq_ops {
+   void (*release_mbufs)(struct fm10k_tx_queue *txq);
+   void (*reset)(struct fm10k_tx_queue *txq);
+};
+
 #define MBUF_DMA_ADDR(mb) \
((uint64_t) ((mb)->buf_physaddr + (mb)->data_off))

@@ -340,4 +348,5 @@ uint16_t fm10k_recv_scattered_pkts_vec(void *, struct 
rte_mbuf **,
uint16_t);
 uint16_t fm10k_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
uint16_t nb_pkts);
+void fm10k_txq_vec_setup(struct fm10k_tx_queue *txq);
 #endif
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 3c7b707..0b40797 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -292,6 +292,11 @@ tx_queue_disable(struct fm10k_hw *hw, uint16_t qnum)
return 0;
 }

+static const struct fm10k_txq_ops def_txq_ops = {
+   .release_mbufs = tx_queue_free,
+   .reset = tx_queue_reset,
+};
+
 static int
 fm10k_dev_configure(struct rte_eth_dev *dev)
 {
@@ -571,7 +576,9 @@ fm10k_dev_tx_queue_start(struct rte_eth_dev *dev, uint16_t 
tx_queue_id)
PMD_INIT_FUNC_TRACE();

if (tx_queue_id < dev->data->nb_tx_queues) {
-   tx_queue_reset(dev->data->tx_queues[tx_queue_id]);
+   struct fm10k_tx_queue *q = dev->data->tx_queues[tx_queue_id];
+
+   q->ops->reset(q);

/* reset head and tail pointers */
FM10K_WRITE_REG(hw, FM10K_TDH(tx_queue_id), 0);
@@ -837,8 +844,11 @@ fm10k_dev_queue_release(struct rte_eth_dev *dev)
PMD_INIT_FUNC_TRACE();

if (dev->data->tx_queues) {
-   for (i = 0; i < dev->data->nb_tx_queues; i++)
-   fm10k_tx_queue_release(dev->data->tx_queues[i]);
+   for (i = 0; i < dev->data->nb_tx_queues; i++) {
+   struct fm10k_tx_queue *txq = dev->data->tx_queues[i];
+
+   txq->ops->release_mbufs(txq);
+   }
}

if (dev->data->rx_queues) {
@@ -1455,7 +1465,9 @@ fm10k_tx_queue_setup(struct rte_eth_dev *dev, uint16_t 
queue_id,
 * different socket than was previously used.
 */
if (dev->data->tx_queues[queue_id] != NULL) {
-   tx_queue_free(dev->data->tx_queues[queue_id]);
+   struct fm10k_tx_queue *txq = dev->data->tx_queues[queue_id];
+
+   txq->ops->release_mbufs(txq);
dev->data->tx_queues[queue_id] = NULL;
}

@@ -1471,6 +1483,7 @@ fm10k_tx_queue_setup(struct rte_eth_dev *dev, uint16_t 
queue_id,
q->nb_desc = nb_desc;
q->port_id = dev->data->port_id;
q->queue_id = queue_id;
+   q->ops = &def_txq_ops;
q->tail_ptr = (volatile uint32_t *)
&((uint32_t *)hw->hw_addr)[FM10K_TDT(queue_id)];
if (handle_txconf(q, conf))
@@ -1529,9 +1542,10 @@ fm10k_tx_queue_setup(struct rte_eth_dev *dev, uint16_t 
queue_id,
 static void
 fm10k_tx_queue_release(void *queue)
 {
+   struct fm10k_tx_queue *q = queue;
PMD_INIT_FUNC_TRACE();

-   tx_queue_free(queue);
+   q->ops->release_mbufs(q);
 }

 static int
-- 
1.7.7.6



[dpdk-dev] [PATCH v5 09/14] fm10k: add Vector TX function

2015-10-30 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

Add Vector TX func fm10k_xmit_pkts_vec to transmit packets.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k.h  |5 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |  150 
 2 files changed, 155 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index d17b2fb..5525b72 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -217,6 +217,9 @@ struct fm10k_tx_queue {
uint16_t nb_used;
uint16_t free_thresh;
uint16_t rs_thresh;
+   /* Below 2 fields only valid in case vPMD is applied. */
+   uint16_t next_rs; /* Next pos to set RS flag */
+   uint16_t next_dd; /* Next pos to check DD flag */
volatile uint32_t *tail_ptr;
uint16_t nb_desc;
uint8_t port_id;
@@ -335,4 +338,6 @@ void fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue 
*rxq);
 uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
 uint16_t fm10k_recv_scattered_pkts_vec(void *, struct rte_mbuf **,
uint16_t);
+uint16_t fm10k_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
+   uint16_t nb_pkts);
 #endif
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c 
b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 4d90d6a..4515b26 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -627,3 +627,153 @@ fm10k_recv_scattered_pkts_vec(void *rx_queue,
return i + fm10k_reassemble_packets(rxq, &rx_pkts[i], nb_bufs - i,
&split_flags[i]);
 }
+
+static inline void
+vtx1(volatile struct fm10k_tx_desc *txdp,
+   struct rte_mbuf *pkt, uint64_t flags)
+{
+   __m128i descriptor = _mm_set_epi64x(flags << 56 |
+   pkt->vlan_tci << 16 | pkt->data_len,
+   MBUF_DMA_ADDR(pkt));
+   _mm_store_si128((__m128i *)txdp, descriptor);
+}
+
+static inline void
+vtx(volatile struct fm10k_tx_desc *txdp,
+   struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
+{
+   int i;
+
+   for (i = 0; i < nb_pkts; ++i, ++txdp, ++pkt)
+   vtx1(txdp, *pkt, flags);
+}
+
+static inline int __attribute__((always_inline))
+fm10k_tx_free_bufs(struct fm10k_tx_queue *txq)
+{
+   struct rte_mbuf **txep;
+   uint8_t flags;
+   uint32_t n;
+   uint32_t i;
+   int nb_free = 0;
+   struct rte_mbuf *m, *free[RTE_FM10K_TX_MAX_FREE_BUF_SZ];
+
+   /* check DD bit on threshold descriptor */
+   flags = txq->hw_ring[txq->next_dd].flags;
+   if (!(flags & FM10K_TXD_FLAG_DONE))
+   return 0;
+
+   n = txq->rs_thresh;
+
+   /* First buffer to free from S/W ring is at index
+* next_dd - (rs_thresh-1)
+*/
+   txep = &txq->sw_ring[txq->next_dd - (n - 1)];
+   m = __rte_pktmbuf_prefree_seg(txep[0]);
+   if (likely(m != NULL)) {
+   free[0] = m;
+   nb_free = 1;
+   for (i = 1; i < n; i++) {
+   m = __rte_pktmbuf_prefree_seg(txep[i]);
+   if (likely(m != NULL)) {
+   if (likely(m->pool == free[0]->pool))
+   free[nb_free++] = m;
+   else {
+   rte_mempool_put_bulk(free[0]->pool,
+   (void *)free, nb_free);
+   free[0] = m;
+   nb_free = 1;
+   }
+   }
+   }
+   rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
+   } else {
+   for (i = 1; i < n; i++) {
+   m = __rte_pktmbuf_prefree_seg(txep[i]);
+   if (m != NULL)
+   rte_mempool_put(m->pool, m);
+   }
+   }
+
+   /* buffers were freed, update counters */
+   txq->nb_free = (uint16_t)(txq->nb_free + txq->rs_thresh);
+   txq->next_dd = (uint16_t)(txq->next_dd + txq->rs_thresh);
+   if (txq->next_dd >= txq->nb_desc)
+   txq->next_dd = (uint16_t)(txq->rs_thresh - 1);
+
+   return txq->rs_thresh;
+}
+
+static inline void __attribute__((always_inline))
+tx_backlog_entry(struct rte_mbuf **txep,
+struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+   int i;
+
+   for (i = 0; i < (int)nb_pkts; ++i)
+   txep[i] = tx_pkts[i];
+}
+
+uint16_t
+fm10k_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
+   uint16_t nb_pkts)
+{
+   struct fm10k_tx_queue *txq = (struct fm10k_tx_queue *)tx_queue;
+   volatile struct fm10k_tx_desc *txdp;
+   struct rte_

[dpdk-dev] [PATCH v5 08/14] fm10k: add func to release mbuf in case Vector RX applied

2015-10-30 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

Vector RX uses different variables to track the RX HW ring, so it
needs a different func to release mbufs properly.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k.h  |1 +
 drivers/net/fm10k/fm10k_ethdev.c   |6 ++
 drivers/net/fm10k/fm10k_rxtx_vec.c |   18 ++
 3 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 5666af6..d17b2fb 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -331,6 +331,7 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf 
**tx_pkts,

 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
+void fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue *rxq);
 uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
 uint16_t fm10k_recv_scattered_pkts_vec(void *, struct rte_mbuf **,
uint16_t);
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 70dac2a..3c7b707 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -143,6 +143,12 @@ rx_queue_clean(struct fm10k_rx_queue *q)
for (i = 0; i < q->nb_desc; ++i)
q->hw_ring[i] = zero;

+   /* vPMD driver has a different way of releasing mbufs. */
+   if (q->rx_using_sse) {
+   fm10k_rx_queue_release_mbufs_vec(q);
+   return;
+   }
+
/* free software buffers */
for (i = 0; i < q->nb_desc; ++i) {
if (q->sw_ring[i]) {
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c 
b/drivers/net/fm10k/fm10k_rxtx_vec.c
index ffd022a..4d90d6a 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -321,6 +321,24 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq)
FM10K_PCI_REG_WRITE(rxq->tail_ptr, rx_id);
 }

+void __attribute__((cold))
+fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue *rxq)
+{
+   const unsigned mask = rxq->nb_desc - 1;
+   unsigned i;
+
+   if (rxq->sw_ring == NULL || rxq->rxrearm_nb >= rxq->nb_desc)
+   return;
+
+   /* free all mbufs that are valid in the ring */
+   for (i = rxq->next_dd; i != rxq->rxrearm_start; i = (i + 1) & mask)
+   rte_pktmbuf_free_seg(rxq->sw_ring[i]);
+   rxq->rxrearm_nb = rxq->nb_desc;
+
+   /* set all entries to NULL */
+   memset(rxq->sw_ring, 0, sizeof(rxq->sw_ring[0]) * rxq->nb_desc);
+}
+
 static inline uint16_t
 fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
uint16_t nb_pkts, uint8_t *split_packet)
-- 
1.7.7.6
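
The release loop in fm10k_rx_queue_release_mbufs_vec walks the ring from next_dd up to rxrearm_start using a power-of-two mask. A minimal sketch of that walk follows, assuming a power-of-two ring size; demo_ring_walk and the visited array are hypothetical, and marking a slot stands in for freeing its mbuf.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Visit ring slots in [start, end) on a ring whose size is a power of two.
 * Returns the number of slots visited. Marking visited[i] stands in for
 * freeing the buffer held in slot i.
 */
static unsigned
demo_ring_walk(unsigned start, unsigned end, unsigned nb_desc, uint8_t *visited)
{
	const unsigned mask = nb_desc - 1;
	unsigned i, n = 0;

	/* The mask trick only works for power-of-two ring sizes. */
	assert(nb_desc != 0 && (nb_desc & mask) == 0);

	for (i = start & mask; i != (end & mask); i = (i + 1) & mask) {
		visited[i] = 1;
		n++;
	}
	return n;
}
```

Note that start == end visits nothing, so a walk of this form cannot tell an empty range from a completely full ring; the patch handles the related "nothing left to free" case with its separate check of rxrearm_nb before the loop.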



[dpdk-dev] [PATCH v5 07/14] fm10k: add function to decide best RX function

2015-10-30 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

Add func fm10k_set_rx_function to decide best RX func in
fm10k_dev_rx_init

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k.h|1 +
 drivers/net/fm10k/fm10k_ethdev.c |   36 
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 8dba27b..5666af6 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -189,6 +189,7 @@ struct fm10k_rx_queue {
/* Below 2 fields only valid in case vPMD is applied. */
uint16_t rxrearm_nb; /* number of remaining to be re-armed */
uint16_t rxrearm_start;  /* the idx we start the re-arming from */
+   uint16_t rx_using_sse; /* indicates that vector RX is in use */
uint8_t port_id;
uint8_t drop_en;
uint8_t rx_deferred_start; /* don't start this queue in dev start. */
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 6be764a..70dac2a 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -67,6 +67,7 @@ static void
 fm10k_MACVLAN_remove_all(struct rte_eth_dev *dev);
 static void fm10k_tx_queue_release(void *queue);
 static void fm10k_rx_queue_release(void *queue);
+static void fm10k_set_rx_function(struct rte_eth_dev *dev);

 static void
 fm10k_mbx_initlock(struct fm10k_hw *hw)
@@ -462,7 +463,6 @@ fm10k_dev_rx_init(struct rte_eth_dev *dev)
dev->data->dev_conf.rxmode.enable_scatter) {
uint32_t reg;
dev->data->scattered_rx = 1;
-   dev->rx_pkt_burst = fm10k_recv_scattered_pkts;
reg = FM10K_READ_REG(hw, FM10K_SRRCTL(i));
reg |= FM10K_SRRCTL_BUFFER_CHAINING_EN;
FM10K_WRITE_REG(hw, FM10K_SRRCTL(i), reg);
@@ -478,6 +478,9 @@ fm10k_dev_rx_init(struct rte_eth_dev *dev)

/* Configure RSS if applicable */
fm10k_dev_mq_rx_configure(dev);
+
+   /* Decide the best RX function */
+   fm10k_set_rx_function(dev);
return 0;
 }

@@ -2070,6 +2073,34 @@ static const struct eth_dev_ops fm10k_eth_dev_ops = {
.rss_hash_conf_get  = fm10k_rss_hash_conf_get,
 };

+static void __attribute__((cold))
+fm10k_set_rx_function(struct rte_eth_dev *dev)
+{
+   struct fm10k_dev_info *dev_info = FM10K_DEV_PRIVATE_TO_INFO(dev);
+   uint16_t i, rx_using_sse;
+
+   /* In order to allow Vector Rx there are a few configuration
+* conditions to be met.
+*/
+   if (!fm10k_rx_vec_condition_check(dev) && dev_info->rx_vec_allowed) {
+   if (dev->data->scattered_rx)
+   dev->rx_pkt_burst = fm10k_recv_scattered_pkts_vec;
+   else
+   dev->rx_pkt_burst = fm10k_recv_pkts_vec;
+   } else if (dev->data->scattered_rx)
+   dev->rx_pkt_burst = fm10k_recv_scattered_pkts;
+
+   rx_using_sse =
+   (dev->rx_pkt_burst == fm10k_recv_scattered_pkts_vec ||
+   dev->rx_pkt_burst == fm10k_recv_pkts_vec);
+
+   for (i = 0; i < dev->data->nb_rx_queues; i++) {
+   struct fm10k_rx_queue *rxq = dev->data->rx_queues[i];
+
+   rxq->rx_using_sse = rx_using_sse;
+   }
+}
+
 static void
 fm10k_params_init(struct rte_eth_dev *dev)
 {
@@ -2104,9 +2135,6 @@ eth_fm10k_dev_init(struct rte_eth_dev *dev)
dev->rx_pkt_burst = &fm10k_recv_pkts;
dev->tx_pkt_burst = &fm10k_xmit_pkts;

-   if (dev->data->scattered_rx)
-   dev->rx_pkt_burst = &fm10k_recv_scattered_pkts;
-
/* only initialize in the primary process */
if (rte_eal_process_type() != RTE_PROC_PRIMARY)
return 0;
-- 
1.7.7.6
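
The decision made by fm10k_set_rx_function amounts to a small truth table. The sketch below models it with plain integers; demo_choose_rx_path and demo_rx_using_vector are hypothetical names, and vec_check_ok stands for fm10k_rx_vec_condition_check returning 0. One difference from the patch: the patch leaves rx_pkt_burst at its default when neither vector nor scatter applies, whereas this sketch returns the scalar path explicitly.

```c
#include <assert.h>

enum demo_rx_path {
	DEMO_RX_SCALAR,
	DEMO_RX_SCALAR_SCATTER,
	DEMO_RX_VEC,
	DEMO_RX_VEC_SCATTER
};

/*
 * vec_check_ok: non-zero when the configuration passes the vector RX checks.
 * vec_allowed:  non-zero when the device info permits vector RX.
 * scattered:    non-zero when scattered RX is enabled on the port.
 */
static enum demo_rx_path
demo_choose_rx_path(int vec_check_ok, int vec_allowed, int scattered)
{
	if (vec_check_ok && vec_allowed)
		return scattered ? DEMO_RX_VEC_SCATTER : DEMO_RX_VEC;
	return scattered ? DEMO_RX_SCALAR_SCATTER : DEMO_RX_SCALAR;
}

/* Value that would be copied into each queue's rx_using_sse field. */
static int
demo_rx_using_vector(enum demo_rx_path path)
{
	return path == DEMO_RX_VEC || path == DEMO_RX_VEC_SCATTER;
}
```

Deriving the per-queue flag from the chosen path, rather than re-evaluating the conditions, keeps the flag consistent with the burst function actually installed.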



[dpdk-dev] [PATCH v5 06/14] fm10k: add Vector RX scatter function

2015-10-30 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

Add func fm10k_recv_scattered_pkts_vec to receive chained packets
with SSE instructions.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k.h  |2 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |   88 
 2 files changed, 90 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 6c1c698..8dba27b 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -331,4 +331,6 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf 
**tx_pkts,
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
 uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
+uint16_t fm10k_recv_scattered_pkts_vec(void *, struct rte_mbuf **,
+   uint16_t);
 #endif
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c 
b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 64036e3..ffd022a 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -521,3 +521,91 @@ fm10k_recv_pkts_vec(void *rx_queue, struct rte_mbuf 
**rx_pkts,
 {
return fm10k_recv_raw_pkts_vec(rx_queue, rx_pkts, nb_pkts, NULL);
 }
+
+static inline uint16_t
+fm10k_reassemble_packets(struct fm10k_rx_queue *rxq,
+   struct rte_mbuf **rx_bufs,
+   uint16_t nb_bufs, uint8_t *split_flags)
+{
+   struct rte_mbuf *pkts[RTE_FM10K_MAX_RX_BURST]; /*finished pkts*/
+   struct rte_mbuf *start = rxq->pkt_first_seg;
+   struct rte_mbuf *end =  rxq->pkt_last_seg;
+   unsigned pkt_idx, buf_idx;
+
+   for (buf_idx = 0, pkt_idx = 0; buf_idx < nb_bufs; buf_idx++) {
+   if (end != NULL) {
+   /* processing a split packet */
+   end->next = rx_bufs[buf_idx];
+   start->nb_segs++;
+   start->pkt_len += rx_bufs[buf_idx]->data_len;
+   end = end->next;
+
+   if (!split_flags[buf_idx]) {
+   /* it's the last packet of the set */
+   start->hash = end->hash;
+   start->ol_flags = end->ol_flags;
+   pkts[pkt_idx++] = start;
+   start = end = NULL;
+   }
+   } else {
+   /* not processing a split packet */
+   if (!split_flags[buf_idx]) {
+   /* not a split packet, save and skip */
+   pkts[pkt_idx++] = rx_bufs[buf_idx];
+   continue;
+   }
+   end = start = rx_bufs[buf_idx];
+   }
+   }
+
+   /* save the partial packet for next time */
+   rxq->pkt_first_seg = start;
+   rxq->pkt_last_seg = end;
+   memcpy(rx_bufs, pkts, pkt_idx * (sizeof(*pkts)));
+   return pkt_idx;
+}
+
+/*
+ * vPMD receive routine that reassembles scattered packets
+ *
+ * Notice:
+ * - don't support ol_flags for rss and csum err
+ * - nb_pkts > RTE_FM10K_MAX_RX_BURST, only scan RTE_FM10K_MAX_RX_BURST
+ *   numbers of DD bit
+ */
+uint16_t
+fm10k_recv_scattered_pkts_vec(void *rx_queue,
+   struct rte_mbuf **rx_pkts,
+   uint16_t nb_pkts)
+{
+   struct fm10k_rx_queue *rxq = rx_queue;
+   uint8_t split_flags[RTE_FM10K_MAX_RX_BURST] = {0};
+   unsigned i = 0;
+
+   /* Split_flags only can support max of RTE_FM10K_MAX_RX_BURST */
+   nb_pkts = RTE_MIN(nb_pkts, RTE_FM10K_MAX_RX_BURST);
+   /* get some new buffers */
+   uint16_t nb_bufs = fm10k_recv_raw_pkts_vec(rxq, rx_pkts, nb_pkts,
+   split_flags);
+   if (nb_bufs == 0)
+   return 0;
+
+   /* happy day case, full burst + no packets to be joined */
+   const uint64_t *split_fl64 = (uint64_t *)split_flags;
+
+   if (rxq->pkt_first_seg == NULL &&
+   split_fl64[0] == 0 && split_fl64[1] == 0 &&
+   split_fl64[2] == 0 && split_fl64[3] == 0)
+   return nb_bufs;
+
+   /* reassemble any packets that need reassembly*/
+   if (rxq->pkt_first_seg == NULL) {
+   /* find the first split flag, and only reassemble then*/
+   while (i < nb_bufs && !split_flags[i])
+   i++;
+   if (i == nb_bufs)
+   return nb_bufs;
+   }
+   return i + fm10k_reassemble_packets(rxq, &rx_pkts[i], nb_bufs - i,
+   &split_flags[i]);
+}
-- 
1.7.7.6
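
The reassembly step in this patch can be sketched without DPDK types. The version below is a simplified model under stated assumptions: demo_seg is a hypothetical stand-in for struct rte_mbuf, each segment starts with nb_segs == 1 and pkt_len == data_len, and finished packets are compacted in place instead of going through a temporary array and memcpy as the patch does. The hash and ol_flags copy from the last segment is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_mbuf. */
struct demo_seg {
	struct demo_seg *next;
	uint16_t data_len;
	uint32_t pkt_len; /* meaningful on the first segment of a packet */
	uint8_t nb_segs;  /* meaningful on the first segment of a packet */
};

/* Partial packet carried over between bursts (pkt_first_seg/pkt_last_seg). */
struct demo_state {
	struct demo_seg *first;
	struct demo_seg *last;
};

/*
 * split_flags[i] != 0 means buffer i is continued by the next buffer.
 * Completed packets are written back to the front of bufs.
 * Returns the number of completed packets.
 */
static uint16_t
demo_reassemble(struct demo_state *st, struct demo_seg **bufs,
		uint16_t nb_bufs, const uint8_t *split_flags)
{
	struct demo_seg *start = st->first;
	struct demo_seg *end = st->last;
	uint16_t buf_idx, pkt_idx = 0;

	for (buf_idx = 0; buf_idx < nb_bufs; buf_idx++) {
		struct demo_seg *seg = bufs[buf_idx];

		if (end != NULL) {
			/* continue the packet in progress */
			end->next = seg;
			start->nb_segs++;
			start->pkt_len += seg->data_len;
			end = seg;
			if (!split_flags[buf_idx]) {
				/* last segment: the packet is complete */
				bufs[pkt_idx++] = start;
				start = end = NULL;
			}
		} else if (!split_flags[buf_idx]) {
			/* single-segment packet, pass through */
			bufs[pkt_idx++] = seg;
		} else {
			/* first segment of a new multi-segment packet */
			start = end = seg;
		}
	}

	/* save any partial packet for the next burst */
	st->first = start;
	st->last = end;
	return pkt_idx;
}
```

In-place compaction is safe here because pkt_idx never exceeds buf_idx, so a slot is only overwritten after it has been read.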



[dpdk-dev] [PATCH v5 05/14] fm10k: add func to do Vector RX condition check

2015-10-30 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

Add func fm10k_rx_vec_condition_check to check if Vector RX
func can be applied.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k.h  |1 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |   31 +++
 2 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 96b30a7..6c1c698 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -329,5 +329,6 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf 
**tx_pkts,
uint16_t nb_pkts);

 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
+int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
 uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
 #endif
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c 
b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 9633f35..64036e3 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -172,6 +172,37 @@ fm10k_desc_to_pktype_v(__m128i descs[4], struct rte_mbuf 
**rx_pkts)
 #endif

 int __attribute__((cold))
+fm10k_rx_vec_condition_check(struct rte_eth_dev *dev)
+{
+#ifndef RTE_LIBRTE_IEEE1588
+   struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
+   struct rte_fdir_conf *fconf = &dev->data->dev_conf.fdir_conf;
+
+#ifndef RTE_FM10K_RX_OLFLAGS_ENABLE
+   /* without rx ol_flags, no VP flag report */
+   if (rxmode->hw_vlan_extend != 0)
+   return -1;
+#endif
+
+   /* no fdir support */
+   if (fconf->mode != RTE_FDIR_MODE_NONE)
+   return -1;
+
+   /* - no csum error report support
+* - no header split support
+*/
+   if (rxmode->hw_ip_checksum == 1 ||
+   rxmode->header_split == 1)
+   return -1;
+
+   return 0;
+#else
+   RTE_SET_USED(dev);
+   return -1;
+#endif
+}
+
+int __attribute__((cold))
 fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq)
 {
uintptr_t p;
-- 
1.7.7.6



[dpdk-dev] [PATCH v5 04/14] fm10k: add Vector RX function

2015-10-30 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

This patch adds the following functions:
1. Add function fm10k_rxq_rearm to re-allocate mbufs for used descs
in the RX HW ring.
2. Add 2 functions that use SSE instructions to parse RX descs
and fill in pkt_type and ol_flags in the mbuf.
3. Add func fm10k_recv_raw_pkts_vec to parse raw packets, which
may include chained packets.
4. Add func fm10k_recv_pkts_vec to receive single mbuf packets.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k.h  |   12 +
 drivers/net/fm10k/fm10k_ethdev.c   |3 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |  426 
 3 files changed, 441 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 362a2d0..96b30a7 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -123,6 +123,12 @@
 #define FM10K_VFTA_BIT(vlan_id)(1 << ((vlan_id) & 0x1F))
 #define FM10K_VFTA_IDX(vlan_id)((vlan_id) >> 5)

+#define RTE_FM10K_RXQ_REARM_THRESH  32
+#define RTE_FM10K_VPMD_TX_BURST 32
+#define RTE_FM10K_MAX_RX_BURST  RTE_FM10K_RXQ_REARM_THRESH
+#define RTE_FM10K_TX_MAX_FREE_BUF_SZ64
+#define RTE_FM10K_DESCS_PER_LOOP4
+
 struct fm10k_macvlan_filter_info {
uint16_t vlan_num;   /* Total VLAN number */
uint16_t mac_num;/* Total mac number */
@@ -171,6 +177,8 @@ struct fm10k_rx_queue {
struct rte_mbuf *pkt_last_seg;  /* Last segment of current packet. */
uint64_t hw_ring_phys_addr;
uint64_t mbuf_initializer; /* value to init mbufs */
+   /** need to alloc dummy mbuf, for wraparound when scanning hw ring */
+   struct rte_mbuf fake_mbuf;
uint16_t next_dd;
uint16_t next_alloc;
uint16_t next_trigger;
@@ -178,6 +186,9 @@ struct fm10k_rx_queue {
volatile uint32_t *tail_ptr;
uint16_t nb_desc;
uint16_t queue_id;
+   /* Below 2 fields only valid in case vPMD is applied. */
+   uint16_t rxrearm_nb; /* number of remaining to be re-armed */
+   uint16_t rxrearm_start;  /* the idx we start the re-arming from */
uint8_t port_id;
uint8_t drop_en;
uint8_t rx_deferred_start; /* don't start this queue in dev start. */
@@ -318,4 +329,5 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf 
**tx_pkts,
uint16_t nb_pkts);

 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
+uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
 #endif
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 8dd64bf..6be764a 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -121,6 +121,9 @@ rx_queue_reset(struct fm10k_rx_queue *q)
q->next_alloc = 0;
q->next_trigger = q->alloc_thresh - 1;
FM10K_PCI_REG_WRITE(q->tail_ptr, q->nb_desc - 1);
+   q->rxrearm_start = 0;
+   q->rxrearm_nb = 0;
+
return 0;
 }

diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c 
b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 34b677b..9633f35 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -44,6 +44,133 @@
 #pragma GCC diagnostic ignored "-Wcast-qual"
 #endif

+/* Handling the offload flags (olflags) field takes computation
+ * time when receiving packets. Therefore we provide a flag to disable
+ * the processing of the olflags field when they are not needed. This
+ * gives improved performance, at the cost of losing the offload info
+ * in the received packet
+ */
+#ifdef RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE
+
+/* Vlan present flag shift */
+#define VP_SHIFT (2)
+/* L3 type shift */
+#define L3TYPE_SHIFT (4)
+/* L4 type shift */
+#define L4TYPE_SHIFT (7)
+
+static inline void
+fm10k_desc_to_olflags_v(__m128i descs[4], struct rte_mbuf **rx_pkts)
+{
+   __m128i ptype0, ptype1, vtag0, vtag1;
+   union {
+   uint16_t e[4];
+   uint64_t dword;
+   } vol;
+
+   const __m128i pkttype_msk = _mm_set_epi16(
+   0x0000, 0x0000, 0x0000, 0x0000,
+   PKT_RX_VLAN_PKT, PKT_RX_VLAN_PKT,
+   PKT_RX_VLAN_PKT, PKT_RX_VLAN_PKT);
+
+   /* mask everything except rss type */
+   const __m128i rsstype_msk = _mm_set_epi16(
+   0x0000, 0x0000, 0x0000, 0x0000,
+   0x000F, 0x000F, 0x000F, 0x000F);
+
+   /* map rss type to rss hash flag */
+   const __m128i rss_flags = _mm_set_epi8(0, 0, 0, 0,
+   0, 0, 0, PKT_RX_RSS_HASH,
+   PKT_RX_RSS_HASH, 0, PKT_RX_RSS_HASH, 0,
+   PKT_RX_RSS_HASH, PKT_RX_RSS_HASH, PKT_RX_RSS_HASH, 0);
+
+   ptype0 = _mm_unpacklo_epi16(descs[0], descs[1]);
+   ptype1 = _mm_unpacklo_epi16(descs[2], descs[3]);
+   vtag0 = _mm_unpackhi_epi16(descs[0], descs[1]);
+   v

[dpdk-dev] [PATCH v5 03/14] fm10k: Add a new func to initialize all parameters

2015-10-30 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

Add new function fm10k_params_init to initialize all fm10k related
variables.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k_ethdev.c |   35 +++
 1 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 680a7fe..8dd64bf 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -2067,6 +2067,27 @@ static const struct eth_dev_ops fm10k_eth_dev_ops = {
.rss_hash_conf_get  = fm10k_rss_hash_conf_get,
 };

+static void
+fm10k_params_init(struct rte_eth_dev *dev)
+{
+   struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct fm10k_dev_info *info = FM10K_DEV_PRIVATE_TO_INFO(dev);
+
+   /* Initialize bus info. Normally we would call fm10k_get_bus_info(), but
+* there is no way to get link status without reading BAR4.  Until this
+* works, assume we have maximum bandwidth.
+* @todo - fix bus info
+*/
+   hw->bus_caps.speed = fm10k_bus_speed_8000;
+   hw->bus_caps.width = fm10k_bus_width_pcie_x8;
+   hw->bus_caps.payload = fm10k_bus_payload_512;
+   hw->bus.speed = fm10k_bus_speed_8000;
+   hw->bus.width = fm10k_bus_width_pcie_x8;
+   hw->bus.payload = fm10k_bus_payload_256;
+
+   info->rx_vec_allowed = true;
+}
+
 static int
 eth_fm10k_dev_init(struct rte_eth_dev *dev)
 {
@@ -2113,18 +2134,8 @@ eth_fm10k_dev_init(struct rte_eth_dev *dev)
return -EIO;
}

-   /*
-* Inialize bus info. Normally we would call fm10k_get_bus_info(), but
-* there is no way to get link status without reading BAR4.  Until this
-* works, assume we have maximum bandwidth.
-* @todo - fix bus info
-*/
-   hw->bus_caps.speed = fm10k_bus_speed_8000;
-   hw->bus_caps.width = fm10k_bus_width_pcie_x8;
-   hw->bus_caps.payload = fm10k_bus_payload_512;
-   hw->bus.speed = fm10k_bus_speed_8000;
-   hw->bus.width = fm10k_bus_width_pcie_x8;
-   hw->bus.payload = fm10k_bus_payload_256;
+   /* Initialize parameters */
+   fm10k_params_init(dev);

/* Initialize the hw */
diag = fm10k_init_hw(hw);
-- 
1.7.7.6



[dpdk-dev] [PATCH v5 02/14] fm10k: add vPMD pre-condition check for each RX queue

2015-10-30 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

Add a condition check to the rx_queue_setup function. If the number of
RX descriptors does not satisfy the vPMD requirement, record that in a
flag; otherwise call fm10k_rxq_vec_setup to initialize Vector RX.
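
The precondition checked here is that the descriptor count is a power of
two. A minimal sketch of why that matters, assuming a ring that wraps its
index with a mask; is_power_of_2() and ring_next() are hypothetical
helpers written for illustration, not part of the driver:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a power-of-two test; also rejects 0. */
static bool
is_power_of_2(uint32_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Advance a ring index with a mask instead of a modulo.
 * Only valid when nb_desc is a power of two.
 */
static uint16_t
ring_next(uint16_t idx, uint16_t nb_desc)
{
	assert(is_power_of_2(nb_desc));
	return (uint16_t)((idx + 1) & (nb_desc - 1));
}
```

With a ring of 512 descriptors, ring_next(511, 512) wraps to 0; with 500
descriptors the mask trick would not be valid, which is the case the check
in this patch rejects.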

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k.h  |   11 ---
 drivers/net/fm10k/fm10k_ethdev.c   |   11 +++
 drivers/net/fm10k/fm10k_rxtx_vec.c |   21 +
 3 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index c089882..362a2d0 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -135,6 +135,8 @@ struct fm10k_dev_info {
/* Protect the mailbox to avoid race condition */
rte_spinlock_t mbx_lock;
struct fm10k_macvlan_filter_info macvlan;
+   /* Flag to indicate if RX vector conditions satisfied */
+   bool rx_vec_allowed;
 };

 /*
@@ -165,9 +167,10 @@ struct fm10k_rx_queue {
struct rte_mempool *mp;
struct rte_mbuf **sw_ring;
volatile union fm10k_rx_desc *hw_ring;
-   struct rte_mbuf *pkt_first_seg; /**< First segment of current packet. */
-   struct rte_mbuf *pkt_last_seg;  /**< Last segment of current packet. */
+   struct rte_mbuf *pkt_first_seg; /* First segment of current packet. */
+   struct rte_mbuf *pkt_last_seg;  /* Last segment of current packet. */
uint64_t hw_ring_phys_addr;
+   uint64_t mbuf_initializer; /* value to init mbufs */
uint16_t next_dd;
uint16_t next_alloc;
uint16_t next_trigger;
@@ -177,7 +180,7 @@ struct fm10k_rx_queue {
uint16_t queue_id;
uint8_t port_id;
uint8_t drop_en;
-   uint8_t rx_deferred_start; /**< don't start this queue in dev start. */
+   uint8_t rx_deferred_start; /* don't start this queue in dev start. */
 };

 /*
@@ -313,4 +316,6 @@ uint16_t fm10k_recv_scattered_pkts(void *rx_queue,

 uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
uint16_t nb_pkts);
+
+int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 #endif
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index b104fc2..680a7fe 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -1252,6 +1252,7 @@ fm10k_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
const struct rte_eth_rxconf *conf, struct rte_mempool *mp)
 {
struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct fm10k_dev_info *dev_info = FM10K_DEV_PRIVATE_TO_INFO(dev);
struct fm10k_rx_queue *q;
const struct rte_memzone *mz;

@@ -1334,6 +1335,16 @@ fm10k_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
q->hw_ring_phys_addr = mz->phys_addr;
 #endif

+   /* Check if number of descs satisfied Vector requirement */
+   if (!rte_is_power_of_2(nb_desc)) {
+   PMD_INIT_LOG(DEBUG, "queue[%d] doesn't meet Vector Rx "
+   "preconditions - canceling the feature for "
+   "the whole port[%d]",
+q->queue_id, q->port_id);
+   dev_info->rx_vec_allowed = false;
+   } else
+   fm10k_rxq_vec_setup(q);
+
dev->data->rx_queues[queue_id] = q;
return 0;
 }
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 69174d9..34b677b 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -43,3 +43,24 @@
 #ifndef __INTEL_COMPILER
 #pragma GCC diagnostic ignored "-Wcast-qual"
 #endif
+
+int __attribute__((cold))
+fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq)
+{
+   uintptr_t p;
+   struct rte_mbuf mb_def = { .buf_addr = 0 }; /* zeroed mbuf */
+
+   mb_def.nb_segs = 1;
+   /* data_off will be adjusted after new mbuf allocated for 512-byte
+* alignment.
+*/
+   mb_def.data_off = RTE_PKTMBUF_HEADROOM;
+   mb_def.port = rxq->port_id;
+   rte_mbuf_refcnt_set(&mb_def, 1);
+
+   /* prevent compiler reordering: rearm_data covers previous fields */
+   rte_compiler_barrier();
+   p = (uintptr_t)&mb_def.rearm_data;
+   rxq->mbuf_initializer = *(uint64_t *)p;
+   return 0;
+}
-- 
1.7.7.6



[dpdk-dev] [PATCH v5 01/14] fm10k: add new vPMD file

2015-10-30 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

Add the new file fm10k_rxtx_vec.c and add it to the build.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/Makefile |1 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |   45 
 2 files changed, 46 insertions(+), 0 deletions(-)
 create mode 100644 drivers/net/fm10k/fm10k_rxtx_vec.c

diff --git a/drivers/net/fm10k/Makefile b/drivers/net/fm10k/Makefile
index a4a8f56..06ebf83 100644
--- a/drivers/net/fm10k/Makefile
+++ b/drivers/net/fm10k/Makefile
@@ -93,6 +93,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_common.c
 SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_mbx.c
 SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_vf.c
 SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_api.c
+SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_rxtx_vec.c

 # this lib depends upon:
 DEPDIRS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += lib/librte_eal lib/librte_ether
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
new file mode 100644
index 0000000..69174d9
--- /dev/null
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -0,0 +1,45 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+
+#include 
+#include 
+#include "fm10k.h"
+#include "base/fm10k_type.h"
+
+#include 
+
+#ifndef __INTEL_COMPILER
+#pragma GCC diagnostic ignored "-Wcast-qual"
+#endif
-- 
1.7.7.6



[dpdk-dev] [PATCH v5 00/14] Vector Rx/Tx PMD implementation for fm10k

2015-10-30 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

v5:
 - Fix some warnings reported by checkpatch.pl
 - Squash 3 patches into 1 to avoid a compile error on unused functions.
 - Sync with master branch

v4:
 - Clear HW/SW ring content after allocating mbuf failed.

v3:
 - Add a blank line after variable definition.
 - Floor-align the nb_pkts argument to avoid memory overwrites.
 - Only scan a maximum of 32 descriptors in the scatter Rx function to avoid
   memory overwrites.
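
The floor alignment mentioned in the v3 notes can be sketched as follows.
This is only an illustration: DESCS_PER_LOOP and floor_align_burst() are
hypothetical names, and the burst width of 4 is an assumption taken from
the "4 packets at a time" description elsewhere in this series.

```c
#include <assert.h>
#include <stdint.h>

#define DESCS_PER_LOOP 4u /* assumed burst width; must be a power of two */

static_assert((DESCS_PER_LOOP & (DESCS_PER_LOOP - 1u)) == 0,
	      "burst width must be a power of two");

/* Round nb_pkts down to a multiple of DESCS_PER_LOOP so that a loop
 * handling DESCS_PER_LOOP packets per iteration never writes past the
 * end of the caller's array.
 */
static uint16_t
floor_align_burst(uint16_t nb_pkts)
{
	return (uint16_t)(nb_pkts & ~(DESCS_PER_LOOP - 1u));
}
```

A request for 7 packets is trimmed to 4, and a request for fewer than 4
is trimmed to 0, so the caller's array is never overrun by a partial
burst.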

v2:
 - Fix a typo issue.
 - Fix an improper prefetch in the vector RX function, which prefetched an
   uninitialized mbuf.
 - Remove limitation on number of desc pointer in vector RX function.
 - Re-organize some comments.
 - Add a new patch to fix a crash issue in vector RX func.
 - Add a new patch to update release notes.

v1:
This patch set includes Vector Rx/Tx functions to receive/transmit packets
for fm10k devices. It also contains logic to sanity-check the conditions
for selecting the proper RX/TX functions.

Chen Jing D(Mark) (14):
  fm10k: add new vPMD file
  fm10k: add vPMD pre-condition check for each RX queue
  fm10k: Add a new func to initialize all parameters
  fm10k: add Vector RX function
  fm10k: add func to do Vector RX condition check
  fm10k: add Vector RX scatter function
  fm10k: add function to decide best RX function
  fm10k: add func to release mbuf in case Vector RX applied
  fm10k: add Vector TX function
  fm10k: use func pointer to reset TX queue and mbuf release
  fm10k: introduce 2 funcs to reset TX queue and mbuf release
  fm10k: Add function to decide best TX func
  fm10k: fix a crash issue in vector RX func
  doc: release notes update for fm10k Vector PMD

 doc/guides/rel_notes/release_2_2.rst |6 +
 drivers/net/fm10k/Makefile   |1 +
 drivers/net/fm10k/fm10k.h|   45 ++-
 drivers/net/fm10k/fm10k_ethdev.c |  172 ++-
 drivers/net/fm10k/fm10k_rxtx_vec.c   |  847 ++
 5 files changed, 1043 insertions(+), 28 deletions(-)
 create mode 100644 drivers/net/fm10k/fm10k_rxtx_vec.c

-- 
1.7.7.6



[dpdk-dev] [PATCH v4 00/16] Vector Rx/Tx PMD implementation for fm10k

2015-10-30 Thread Chen, Jing D
Hi, Thomas,

Best Regards,
Mark


> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Friday, October 30, 2015 7:13 AM
> To: Chen, Jing D
> Cc: dev at dpdk.org; Liang, Cunming
> Subject: Re: [dpdk-dev] [PATCH v4 00/16] Vector Rx/Tx PMD implementation
> for fm10k
> 
> > > Chen Jing D(Mark) (16):
> > >   fm10k: add new vPMD file
> > >   fm10k: add vPMD pre-condition check for each RX queue
> > >   fm10k: Add a new func to initialize all parameters
> > >   fm10k: add func to re-allocate mbuf for RX ring
> > >   fm10k: add 2 functions to parse pkt_type and offload flag
> > >   fm10k: add Vector RX function
> > >   fm10k: add func to do Vector RX condition check
> > >   fm10k: add Vector RX scatter function
> > >   fm10k: add function to decide best RX function
> > >   fm10k: add func to release mbuf in case Vector RX applied
> > >   fm10k: add Vector TX function
> > >   fm10k: use func pointer to reset TX queue and mbuf release
> > >   fm10k: introduce 2 funcs to reset TX queue and mbuf release
> > >   fm10k: Add function to decide best TX func
> > >   fm10k: fix a crash issue in vector RX func
> > >   doc: release notes update for fm10k Vector PMD
> >
> > Acked-by: Cunming Liang 
> 
> Sorry, there are some checkpatch warnings and a compilation error:
> 
> SPACING: No space is necessary after a cast
> SPACING: spaces preferred around that '+'
> LINE_CONTINUATIONS: Avoid unnecessary line continuations
> 
> And more important, with clang:
> 
> fm10k_rxtx_vec.c:69:1: error: unused function 'fm10k_rxq_rearm'

Thanks for the comments. I'll fix it.


[dpdk-dev] [PATCH v4 16/16] doc: release notes update for fm10k Vector PMD

2015-10-29 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

Update 2.2 release notes, add descriptions for Vector PMD implementation
in fm10k driver.

Signed-off-by: Chen Jing D(Mark) 
---
 doc/guides/rel_notes/release_2_2.rst |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
index 9a70dae..44a3f74 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -39,6 +39,11 @@ Drivers

   Fixed issue with libvirt ``virsh destroy`` not killing the VM.

+* **fm10k:  Add Vector Rx/Tx implementation.**
+
+  This patch set includes Vector Rx/Tx functions to receive/transmit packets
+  for fm10k devices. It also contains logic to do sanity check for proper
+  RX/TX function selections.

 Libraries
 ~~~~~~~~~
-- 
1.7.7.6



[dpdk-dev] [PATCH v4 15/16] fm10k: fix a crash issue in vector RX func

2015-10-29 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

The Vector RX function processes 4 packets at a time. When the RX
ring wraps around at the tail and the number of remaining descriptors
is not a multiple of 4, SW overwrites memory that does not belong to
it and causes a crash. The fix allocates 4 additional HW/SW entries at
the tail to avoid the overwrite.
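
The padding idea can be sketched with a toy ring. This is a simplified
model under stated assumptions, not the driver code: struct toy_ring,
toy_ring_init() and toy_last_touched() are hypothetical, and plain int
objects stand in for mbufs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define BURST 4 /* entries handled per vector iteration (assumption) */

struct toy_ring {
	int **sw_ring;    /* nb_desc real slots + BURST padding slots */
	int fake;         /* dummy object the padding slots point at */
	uint16_t nb_desc;
};

/* Allocate nb_desc + BURST slots and point the padding at the dummy. */
static int
toy_ring_init(struct toy_ring *r, uint16_t nb_desc)
{
	uint16_t i;

	assert(r != NULL);
	r->sw_ring = calloc((size_t)nb_desc + BURST, sizeof(*r->sw_ring));
	if (r->sw_ring == NULL)
		return -1;
	r->nb_desc = nb_desc;
	r->fake = 0;
	for (i = 0; i < BURST; i++)
		r->sw_ring[nb_desc + i] = &r->fake;
	return 0;
}

/* Highest index touched by a BURST-wide access starting at 'start'. */
static uint32_t
toy_last_touched(uint16_t start)
{
	return (uint32_t)start + BURST - 1;
}
```

For an 8-entry ring, a burst starting at index 6 touches indices 6 to 9.
Without padding, indices 8 and 9 lie outside the allocation; with the 4
extra slots they land on the dummy object instead.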

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k.h|4 +++-
 drivers/net/fm10k/fm10k_ethdev.c |   19 +--
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 8e2c6a4..82a548f 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -177,7 +177,7 @@ struct fm10k_rx_queue {
struct rte_mbuf *pkt_last_seg;  /* Last segment of current packet. */
uint64_t hw_ring_phys_addr;
uint64_t mbuf_initializer; /* value to init mbufs */
-   /** need to alloc dummy mbuf, for wraparound when scanning hw ring */
+   /* need to alloc dummy mbuf, for wraparound when scanning hw ring */
struct rte_mbuf fake_mbuf;
uint16_t next_dd;
uint16_t next_alloc;
@@ -185,6 +185,8 @@ struct fm10k_rx_queue {
uint16_t alloc_thresh;
volatile uint32_t *tail_ptr;
uint16_t nb_desc;
+   /* Number of faked desc added at the tail for Vector RX function */
+   uint16_t nb_fake_desc;
uint16_t queue_id;
/* Below 2 fields only valid in case vPMD is applied. */
uint16_t rxrearm_nb; /* number of remaining to be re-armed */
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 469bd85..705b311 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -102,6 +102,7 @@ fm10k_mbx_unlock(struct fm10k_hw *hw)
 static inline int
 rx_queue_reset(struct fm10k_rx_queue *q)
 {
+   static const union fm10k_rx_desc zero = {{0} };
uint64_t dma_addr;
int i, diag;
PMD_INIT_FUNC_TRACE();
@@ -122,6 +123,15 @@ rx_queue_reset(struct fm10k_rx_queue *q)
q->hw_ring[i].q.hdr_addr = dma_addr;
}

+   /* initialize extra software ring entries. Space for these extra
+* entries is always allocated.
+*/
+   memset(&q->fake_mbuf, 0x0, sizeof(q->fake_mbuf));
+   for (i = 0; i < q->nb_fake_desc; ++i) {
+   q->sw_ring[q->nb_desc + i] = &q->fake_mbuf;
+   q->hw_ring[q->nb_desc + i] = zero;
+   }
+
q->next_dd = 0;
q->next_alloc = 0;
q->next_trigger = q->alloc_thresh - 1;
@@ -147,6 +157,10 @@ rx_queue_clean(struct fm10k_rx_queue *q)
for (i = 0; i < q->nb_desc; ++i)
q->hw_ring[i] = zero;

+   /* zero faked descriptors */
+   for (i = 0; i < q->nb_fake_desc; ++i)
+   q->hw_ring[q->nb_desc + i] = zero;
+
/* vPMD driver has a different way of releasing mbufs. */
if (q->rx_using_sse) {
fm10k_rx_queue_release_mbufs_vec(q);
@@ -1323,6 +1337,7 @@ fm10k_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
/* setup queue */
q->mp = mp;
q->nb_desc = nb_desc;
+   q->nb_fake_desc = FM10K_MULT_RX_DESC;
q->port_id = dev->data->port_id;
q->queue_id = queue_id;
q->tail_ptr = (volatile uint32_t *)
@@ -1332,8 +1347,8 @@ fm10k_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,

/* allocate memory for the software ring */
q->sw_ring = rte_zmalloc_socket("fm10k sw ring",
-   nb_desc * sizeof(struct rte_mbuf *),
-   RTE_CACHE_LINE_SIZE, socket_id);
+   (nb_desc + q->nb_fake_desc) * sizeof(struct rte_mbuf *),
+   RTE_CACHE_LINE_SIZE, socket_id);
if (q->sw_ring == NULL) {
PMD_INIT_LOG(ERR, "Cannot allocate software ring");
rte_free(q);
-- 
1.7.7.6



[dpdk-dev] [PATCH v4 14/16] fm10k: Add function to decide best TX func

2015-10-29 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

Add the function fm10k_set_tx_function, called from fm10k_dev_tx_init,
to select the best TX function.
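
The selection rule in this patch is "use the vector path only if every
queue has all of the simple-TX flags set". A minimal sketch of that test,
assuming made-up flag values; the TOY_* constants and all_queues_simple()
are hypothetical and do not match the real ETH_TXQ_FLAGS_* definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values, for illustration only. */
#define TOY_NOMULTSEGS 0x0001u
#define TOY_NOOFFLOADS 0x0f00u
#define TOY_SIMPLE_TX  (TOY_NOMULTSEGS | TOY_NOOFFLOADS)

/* Return true only if every queue has all simple-TX bits set. */
static bool
all_queues_simple(const uint32_t *txq_flags, size_t nb_queues)
{
	size_t i;

	assert(txq_flags != NULL || nb_queues == 0);
	for (i = 0; i < nb_queues; i++)
		if ((txq_flags[i] & TOY_SIMPLE_TX) != TOY_SIMPLE_TX)
			return false;
	return true;
}
```

Comparing the masked value against the full mask, rather than testing for
non-zero, is what makes a queue with only some of the bits set fall back
to the scalar path.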

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k.h|1 +
 drivers/net/fm10k/fm10k_ethdev.c |   38 --
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index bfb71da..8e2c6a4 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -224,6 +224,7 @@ struct fm10k_tx_queue {
uint16_t next_rs; /* Next pos to set RS flag */
uint16_t next_dd; /* Next pos to check DD flag */
volatile uint32_t *tail_ptr;
+   uint32_t txq_flags; /* Holds flags for this TXq */
uint16_t nb_desc;
uint8_t port_id;
uint8_t tx_deferred_start; /** < don't start this queue in dev start. */
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 88bd887..469bd85 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -53,6 +53,9 @@
 #define CHARS_PER_UINT32 (sizeof(uint32_t))
 #define BIT_MASK_PER_UINT32 ((1 << CHARS_PER_UINT32) - 1)

+#define FM10K_SIMPLE_TX_FLAG ((uint32_t)ETH_TXQ_FLAGS_NOMULTSEGS | \
+   ETH_TXQ_FLAGS_NOOFFLOADS)
+
 static void fm10k_close_mbx_service(struct fm10k_hw *hw);
 static void fm10k_dev_promiscuous_enable(struct rte_eth_dev *dev);
 static void fm10k_dev_promiscuous_disable(struct rte_eth_dev *dev);
@@ -68,6 +71,7 @@ fm10k_MACVLAN_remove_all(struct rte_eth_dev *dev);
 static void fm10k_tx_queue_release(void *queue);
 static void fm10k_rx_queue_release(void *queue);
 static void fm10k_set_rx_function(struct rte_eth_dev *dev);
+static void fm10k_set_tx_function(struct rte_eth_dev *dev);

 static void
 fm10k_mbx_initlock(struct fm10k_hw *hw)
@@ -414,6 +418,10 @@ fm10k_dev_tx_init(struct rte_eth_dev *dev)
base_addr >> (CHAR_BIT * sizeof(uint32_t)));
FM10K_WRITE_REG(hw, FM10K_TDLEN(i), size);
}
+
+   /* set up vector or scalar TX function as appropriate */
+   fm10k_set_tx_function(dev);
+
return 0;
 }

@@ -980,8 +988,7 @@ fm10k_dev_infos_get(struct rte_eth_dev *dev,
},
.tx_free_thresh = FM10K_TX_FREE_THRESH_DEFAULT(0),
.tx_rs_thresh = FM10K_TX_RS_THRESH_DEFAULT(0),
-   .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
-   ETH_TXQ_FLAGS_NOOFFLOADS,
+   .txq_flags = FM10K_SIMPLE_TX_FLAG,
};

 }
@@ -1479,6 +1486,7 @@ fm10k_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
q->nb_desc = nb_desc;
q->port_id = dev->data->port_id;
q->queue_id = queue_id;
+   q->txq_flags = conf->txq_flags;
q->ops = &def_txq_ops;
q->tail_ptr = (volatile uint32_t *)
&((uint32_t *)hw->hw_addr)[FM10K_TDT(queue_id)];
@@ -2090,6 +2098,32 @@ static const struct eth_dev_ops fm10k_eth_dev_ops = {
 };

 static void __attribute__((cold))
+fm10k_set_tx_function(struct rte_eth_dev *dev)
+{
+   struct fm10k_tx_queue *txq;
+   int i;
+   int use_sse = 1;
+
+   for (i = 0; i < dev->data->nb_tx_queues; i++) {
+   txq = dev->data->tx_queues[i];
+   if ((txq->txq_flags & FM10K_SIMPLE_TX_FLAG) != \
+   FM10K_SIMPLE_TX_FLAG) {
+   use_sse = 0;
+   break;
+   }
+   }
+
+   if (use_sse) {
+   for (i = 0; i < dev->data->nb_tx_queues; i++) {
+   txq = dev->data->tx_queues[i];
+   fm10k_txq_vec_setup(txq);
+   }
+   dev->tx_pkt_burst = fm10k_xmit_pkts_vec;
+   } else
+   dev->tx_pkt_burst = fm10k_xmit_pkts;
+}
+
+static void __attribute__((cold))
 fm10k_set_rx_function(struct rte_eth_dev *dev)
 {
struct fm10k_dev_info *dev_info = FM10K_DEV_PRIVATE_TO_INFO(dev);
-- 
1.7.7.6



[dpdk-dev] [PATCH v4 13/16] fm10k: introduce 2 funcs to reset TX queue and mbuf release

2015-10-29 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

Add 2 functions to reset the TX queue and release mbufs when Vector TX
is in use.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k_rxtx_vec.c |   68 
 1 files changed, 68 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 37418bf..e572715 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -44,6 +44,11 @@
 #pragma GCC diagnostic ignored "-Wcast-qual"
 #endif

+static void
+fm10k_tx_queue_release_mbufs_vec(struct fm10k_tx_queue *txq);
+static void
+fm10k_reset_tx_queue(struct fm10k_tx_queue *txq);
+
 /* Handling the offload flags (olflags) field takes computation
  * time when receiving packets. Therefore we provide a flag to disable
  * the processing of the olflags field when they are not needed. This
@@ -628,6 +633,17 @@ fm10k_recv_scattered_pkts_vec(void *rx_queue,
&split_flags[i]);
 }

+static const struct fm10k_txq_ops vec_txq_ops = {
+   .release_mbufs = fm10k_tx_queue_release_mbufs_vec,
+   .reset = fm10k_reset_tx_queue,
+};
+
+void __attribute__((cold))
+fm10k_txq_vec_setup(struct fm10k_tx_queue *txq)
+{
+   txq->ops = &vec_txq_ops;
+}
+
 static inline void
 vtx1(volatile struct fm10k_tx_desc *txdp,
struct rte_mbuf *pkt, uint64_t flags)
@@ -777,3 +793,55 @@ fm10k_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,

return nb_pkts;
 }
+
+static void __attribute__((cold))
+fm10k_tx_queue_release_mbufs_vec(struct fm10k_tx_queue *txq)
+{
+   unsigned i;
+   const uint16_t max_desc = (uint16_t)(txq->nb_desc - 1);
+
+   if (txq->sw_ring == NULL || txq->nb_free == max_desc)
+   return;
+
+   /* release the used mbufs in sw_ring */
+   for (i = txq->next_dd - (txq->rs_thresh - 1);
+i != txq->next_free;
+i = (i + 1) & max_desc)
+   rte_pktmbuf_free_seg(txq->sw_ring[i]);
+
+   txq->nb_free = max_desc;
+
+   /* reset tx_entry */
+   for (i = 0; i < txq->nb_desc; i++)
+   txq->sw_ring[i] = NULL;
+
+   rte_free(txq->sw_ring);
+   txq->sw_ring = NULL;
+}
+
+static void __attribute__((cold))
+fm10k_reset_tx_queue(struct fm10k_tx_queue *txq)
+{
+   static const struct fm10k_tx_desc zeroed_desc = {0};
+   struct rte_mbuf **txe = txq->sw_ring;
+   uint16_t i;
+
+   /* Zero out HW ring memory */
+   for (i = 0; i < txq->nb_desc; i++)
+   txq->hw_ring[i] = zeroed_desc;
+
+   /* Initialize SW ring entries */
+   for (i = 0; i < txq->nb_desc; i++)
+   txe[i] = NULL;
+
+   txq->next_dd = (uint16_t)(txq->rs_thresh - 1);
+   txq->next_rs = (uint16_t)(txq->rs_thresh - 1);
+
+   txq->next_free = 0;
+   txq->nb_used = 0;
+   /* Always allow 1 descriptor to be un-allocated to avoid
+* a H/W race condition
+*/
+   txq->nb_free = (uint16_t)(txq->nb_desc - 1);
+   FM10K_PCI_REG_WRITE(txq->tail_ptr, 0);
+}
-- 
1.7.7.6



[dpdk-dev] [PATCH v4 12/16] fm10k: use func pointer to reset TX queue and mbuf release

2015-10-29 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

Vector TX manages the TX queue differently, so it needs different
functions to reset the TX queue and to release the mbufs held in it.
Introduce 2 function pointers for these operations.
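
The pattern is a per-queue operations table. A minimal sketch, assuming
toy types; struct toy_txq, the scalar_* and vec_* callbacks, and
toy_txq_restart() are hypothetical and only mirror the shape of the
change:

```c
#include <assert.h>
#include <stddef.h>

struct toy_txq;

/* Operations that differ between the scalar and vector TX paths. */
struct toy_txq_ops {
	void (*release_mbufs)(struct toy_txq *q);
	void (*reset)(struct toy_txq *q);
};

struct toy_txq {
	const struct toy_txq_ops *ops;
	int nb_used;
	int last_reset; /* 0 = scalar, 1 = vector; for illustration only */
};

static void scalar_release(struct toy_txq *q) { q->nb_used = 0; }
static void scalar_reset(struct toy_txq *q) { q->nb_used = 0; q->last_reset = 0; }
static void vec_release(struct toy_txq *q) { q->nb_used = 0; }
static void vec_reset(struct toy_txq *q) { q->nb_used = 0; q->last_reset = 1; }

static const struct toy_txq_ops scalar_ops = {
	.release_mbufs = scalar_release,
	.reset = scalar_reset,
};

static const struct toy_txq_ops vec_ops = {
	.release_mbufs = vec_release,
	.reset = vec_reset,
};

/* Callers go through the table and never name a concrete function. */
static void
toy_txq_restart(struct toy_txq *q)
{
	assert(q != NULL && q->ops != NULL);
	q->ops->reset(q);
}
```

Switching a queue between the two paths then only means assigning a
different table to q->ops; the start and release call sites stay
unchanged.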

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k.h|9 +
 drivers/net/fm10k/fm10k_ethdev.c |   21 -
 2 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 5525b72..bfb71da 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -206,11 +206,14 @@ struct fifo {
uint16_t *endp;
 };

+struct fm10k_txq_ops;
+
 struct fm10k_tx_queue {
struct rte_mbuf **sw_ring;
struct fm10k_tx_desc *hw_ring;
uint64_t hw_ring_phys_addr;
struct fifo rs_tracker;
+   const struct fm10k_txq_ops *ops; /* txq ops */
uint16_t last_free;
uint16_t next_free;
uint16_t nb_free;
@@ -227,6 +230,11 @@ struct fm10k_tx_queue {
uint16_t queue_id;
 };

+struct fm10k_txq_ops {
+   void (*release_mbufs)(struct fm10k_tx_queue *txq);
+   void (*reset)(struct fm10k_tx_queue *txq);
+};
+
 #define MBUF_DMA_ADDR(mb) \
((uint64_t) ((mb)->buf_physaddr + (mb)->data_off))

@@ -340,4 +348,5 @@ uint16_t fm10k_recv_scattered_pkts_vec(void *, struct rte_mbuf **,
uint16_t);
 uint16_t fm10k_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
uint16_t nb_pkts);
+void fm10k_txq_vec_setup(struct fm10k_tx_queue *txq);
 #endif
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index a46a349..88bd887 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -292,6 +292,11 @@ tx_queue_disable(struct fm10k_hw *hw, uint16_t qnum)
return 0;
 }

+static const struct fm10k_txq_ops def_txq_ops = {
+   .release_mbufs = tx_queue_free,
+   .reset = tx_queue_reset,
+};
+
 static int
 fm10k_dev_configure(struct rte_eth_dev *dev)
 {
@@ -571,7 +576,8 @@ fm10k_dev_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
PMD_INIT_FUNC_TRACE();

if (tx_queue_id < dev->data->nb_tx_queues) {
-   tx_queue_reset(dev->data->tx_queues[tx_queue_id]);
+   struct fm10k_tx_queue *q = dev->data->tx_queues[tx_queue_id];
+   q->ops->reset(q);

/* reset head and tail pointers */
FM10K_WRITE_REG(hw, FM10K_TDH(tx_queue_id), 0);
@@ -837,8 +843,10 @@ fm10k_dev_queue_release(struct rte_eth_dev *dev)
PMD_INIT_FUNC_TRACE();

if (dev->data->tx_queues) {
-   for (i = 0; i < dev->data->nb_tx_queues; i++)
-   fm10k_tx_queue_release(dev->data->tx_queues[i]);
+   for (i = 0; i < dev->data->nb_tx_queues; i++) {
+   struct fm10k_tx_queue *txq = dev->data->tx_queues[i];
+   txq->ops->release_mbufs(txq);
+   }
}

if (dev->data->rx_queues) {
@@ -1454,7 +1462,8 @@ fm10k_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
 * different socket than was previously used.
 */
if (dev->data->tx_queues[queue_id] != NULL) {
-   tx_queue_free(dev->data->tx_queues[queue_id]);
+   struct fm10k_tx_queue *txq = dev->data->tx_queues[queue_id];
+   txq->ops->release_mbufs(txq);
dev->data->tx_queues[queue_id] = NULL;
}

@@ -1470,6 +1479,7 @@ fm10k_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
q->nb_desc = nb_desc;
q->port_id = dev->data->port_id;
q->queue_id = queue_id;
+   q->ops = &def_txq_ops;
q->tail_ptr = (volatile uint32_t *)
&((uint32_t *)hw->hw_addr)[FM10K_TDT(queue_id)];
if (handle_txconf(q, conf))
@@ -1528,9 +1538,10 @@ fm10k_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
 static void
 fm10k_tx_queue_release(void *queue)
 {
+   struct fm10k_tx_queue *q = queue;
PMD_INIT_FUNC_TRACE();

-   tx_queue_free(queue);
+   q->ops->release_mbufs(q);
 }

 static int
-- 
1.7.7.6



[dpdk-dev] [PATCH v4 10/16] fm10k: add func to release mbuf in case Vector RX applied

2015-10-29 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

Vector RX uses different variables to track the RX HW ring, so it
needs a different function to release mbufs properly.
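
The release loop in this patch walks the ring from one index to another,
wrapping with a mask. A minimal sketch of that walk, assuming a
power-of-two ring size; ring_walk_count() is a hypothetical helper that
only counts the slots visited instead of freeing anything:

```c
#include <assert.h>

/* Count the slots visited when walking a power-of-two ring from
 * 'from' (inclusive) to 'to' (exclusive), wrapping with a mask.
 */
static unsigned
ring_walk_count(unsigned from, unsigned to, unsigned nb_desc)
{
	const unsigned mask = nb_desc - 1;
	unsigned i, n = 0;

	assert(nb_desc != 0 && (nb_desc & mask) == 0);
	assert(from < nb_desc && to < nb_desc);
	for (i = from; i != to; i = (i + 1) & mask)
		n++;
	return n;
}
```

On an 8-entry ring, walking from 6 to 2 visits slots 6, 7, 0 and 1. When
both indices are equal the loop body never runs, which is why the patch
needs a separate counter to tell an empty ring from a full one.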

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k.h  |1 +
 drivers/net/fm10k/fm10k_ethdev.c   |6 ++
 drivers/net/fm10k/fm10k_rxtx_vec.c |   18 ++
 3 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 5666af6..d17b2fb 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -331,6 +331,7 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,

 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
+void fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue *rxq);
 uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
 uint16_t fm10k_recv_scattered_pkts_vec(void *, struct rte_mbuf **,
uint16_t);
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 4690a0c..a46a349 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -143,6 +143,12 @@ rx_queue_clean(struct fm10k_rx_queue *q)
for (i = 0; i < q->nb_desc; ++i)
q->hw_ring[i] = zero;

+   /* vPMD driver has a different way of releasing mbufs. */
+   if (q->rx_using_sse) {
+   fm10k_rx_queue_release_mbufs_vec(q);
+   return;
+   }
+
/* free software buffers */
for (i = 0; i < q->nb_desc; ++i) {
if (q->sw_ring[i]) {
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 2d1dfa3..0869aa3 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -321,6 +321,24 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq)
FM10K_PCI_REG_WRITE(rxq->tail_ptr, rx_id);
 }

+void __attribute__((cold))
+fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue *rxq)
+{
+   const unsigned mask = rxq->nb_desc - 1;
+   unsigned i;
+
+   if (rxq->sw_ring == NULL || rxq->rxrearm_nb >= rxq->nb_desc)
+   return;
+
+   /* free all mbufs that are valid in the ring */
+   for (i = rxq->next_dd; i != rxq->rxrearm_start; i = (i + 1) & mask)
+   rte_pktmbuf_free_seg(rxq->sw_ring[i]);
+   rxq->rxrearm_nb = rxq->nb_desc;
+
+   /* set all entries to NULL */
+   memset(rxq->sw_ring, 0, sizeof(rxq->sw_ring[0]) * rxq->nb_desc);
+}
+
 static inline uint16_t
 fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
uint16_t nb_pkts, uint8_t *split_packet)
-- 
1.7.7.6



[dpdk-dev] [PATCH v4 09/16] fm10k: add function to decide best RX function

2015-10-29 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

Add the function fm10k_set_rx_function, called from
fm10k_dev_rx_init, to select the best RX function.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k.h|1 +
 drivers/net/fm10k/fm10k_ethdev.c |   36 
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 8dba27b..5666af6 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -189,6 +189,7 @@ struct fm10k_rx_queue {
/* Below 2 fields only valid in case vPMD is applied. */
uint16_t rxrearm_nb; /* number of remaining to be re-armed */
uint16_t rxrearm_start;  /* the idx we start the re-arming from */
+   uint16_t rx_using_sse; /* indicates that vector RX is in use */
uint8_t port_id;
uint8_t drop_en;
uint8_t rx_deferred_start; /* don't start this queue in dev start. */
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 44c3d34..4690a0c 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -67,6 +67,7 @@ static void
 fm10k_MACVLAN_remove_all(struct rte_eth_dev *dev);
 static void fm10k_tx_queue_release(void *queue);
 static void fm10k_rx_queue_release(void *queue);
+static void fm10k_set_rx_function(struct rte_eth_dev *dev);

 static void
 fm10k_mbx_initlock(struct fm10k_hw *hw)
@@ -462,7 +463,6 @@ fm10k_dev_rx_init(struct rte_eth_dev *dev)
dev->data->dev_conf.rxmode.enable_scatter) {
uint32_t reg;
dev->data->scattered_rx = 1;
-   dev->rx_pkt_burst = fm10k_recv_scattered_pkts;
reg = FM10K_READ_REG(hw, FM10K_SRRCTL(i));
reg |= FM10K_SRRCTL_BUFFER_CHAINING_EN;
FM10K_WRITE_REG(hw, FM10K_SRRCTL(i), reg);
@@ -478,6 +478,9 @@ fm10k_dev_rx_init(struct rte_eth_dev *dev)

/* Configure RSS if applicable */
fm10k_dev_mq_rx_configure(dev);
+
+   /* Decide the best RX function */
+   fm10k_set_rx_function(dev);
return 0;
 }

@@ -2069,6 +2072,34 @@ static const struct eth_dev_ops fm10k_eth_dev_ops = {
.rss_hash_conf_get  = fm10k_rss_hash_conf_get,
 };

+static void __attribute__((cold))
+fm10k_set_rx_function(struct rte_eth_dev *dev)
+{
+   struct fm10k_dev_info *dev_info = FM10K_DEV_PRIVATE_TO_INFO(dev);
+   uint16_t i, rx_using_sse;
+
+   /* In order to allow Vector Rx there are a few configuration
+* conditions to be met.
+*/
+   if (!fm10k_rx_vec_condition_check(dev) && dev_info->rx_vec_allowed) {
+   if (dev->data->scattered_rx)
+   dev->rx_pkt_burst = fm10k_recv_scattered_pkts_vec;
+   else
+   dev->rx_pkt_burst = fm10k_recv_pkts_vec;
+   } else if (dev->data->scattered_rx)
+   dev->rx_pkt_burst = fm10k_recv_scattered_pkts;
+
+   rx_using_sse =
+   (dev->rx_pkt_burst == fm10k_recv_scattered_pkts_vec ||
+   dev->rx_pkt_burst == fm10k_recv_pkts_vec);
+
+   for (i = 0; i < dev->data->nb_rx_queues; i++) {
+   struct fm10k_rx_queue *rxq = dev->data->rx_queues[i];
+   rxq->rx_using_sse = rx_using_sse;
+   }
+
+}
+
 static void
 fm10k_params_init(struct rte_eth_dev *dev)
 {
@@ -2103,9 +2134,6 @@ eth_fm10k_dev_init(struct rte_eth_dev *dev)
dev->rx_pkt_burst = &fm10k_recv_pkts;
dev->tx_pkt_burst = &fm10k_xmit_pkts;

-   if (dev->data->scattered_rx)
-   dev->rx_pkt_burst = &fm10k_recv_scattered_pkts;
-
/* only initialize in the primary process */
if (rte_eal_process_type() != RTE_PROC_PRIMARY)
return 0;
-- 
1.7.7.6
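
The selection logic in fm10k_set_rx_function can be read as a small
decision table. The sketch below restates it with hypothetical names (an
enum instead of function pointers); it assumes the scalar non-scattered
path is the default that was already installed at init time, which is how
the patch appears to behave when neither branch assigns rx_pkt_burst.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the four RX burst functions in the patch. */
enum rx_path {
	RX_SCALAR,          /* fm10k_recv_pkts (assumed default) */
	RX_SCALAR_SCATTER,  /* fm10k_recv_scattered_pkts */
	RX_VEC,             /* fm10k_recv_pkts_vec */
	RX_VEC_SCATTER      /* fm10k_recv_scattered_pkts_vec */
};

/*
 * vec_conditions_ok: the port-level condition check passed (returned 0).
 * vec_allowed:       no queue cleared the per-port "allowed" flag.
 * scattered:         scattered RX is enabled for the port.
 */
static enum rx_path
choose_rx_path(bool vec_conditions_ok, bool vec_allowed, bool scattered)
{
	if (vec_conditions_ok && vec_allowed)
		return scattered ? RX_VEC_SCATTER : RX_VEC;
	return scattered ? RX_SCALAR_SCATTER : RX_SCALAR;
}

/* Mirrors the rx_using_sse flag that the patch stores in every queue. */
static bool
path_is_vector(enum rx_path p)
{
	return p == RX_VEC || p == RX_VEC_SCATTER;
}
```

Deriving the per-queue flag from the chosen path, rather than from the
inputs, keeps the flag consistent with whatever function ends up installed.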



[dpdk-dev] [PATCH v4 08/16] fm10k: add Vector RX scatter function

2015-10-29 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

Add func fm10k_recv_scattered_pkts_vec to receive chained packets
with SSE instructions.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k.h  |2 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |   88 
 2 files changed, 90 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 6c1c698..8dba27b 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -331,4 +331,6 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
 uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
+uint16_t fm10k_recv_scattered_pkts_vec(void *, struct rte_mbuf **,
+   uint16_t);
 #endif
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 4ecb471..2d1dfa3 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -521,3 +521,91 @@ fm10k_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 {
return fm10k_recv_raw_pkts_vec(rx_queue, rx_pkts, nb_pkts, NULL);
 }
+
+static inline uint16_t
+fm10k_reassemble_packets(struct fm10k_rx_queue *rxq,
+   struct rte_mbuf **rx_bufs,
+   uint16_t nb_bufs, uint8_t *split_flags)
+{
+   struct rte_mbuf *pkts[RTE_FM10K_MAX_RX_BURST]; /*finished pkts*/
+   struct rte_mbuf *start = rxq->pkt_first_seg;
+   struct rte_mbuf *end =  rxq->pkt_last_seg;
+   unsigned pkt_idx, buf_idx;
+
+
+   for (buf_idx = 0, pkt_idx = 0; buf_idx < nb_bufs; buf_idx++) {
+   if (end != NULL) {
+   /* processing a split packet */
+   end->next = rx_bufs[buf_idx];
+   start->nb_segs++;
+   start->pkt_len += rx_bufs[buf_idx]->data_len;
+   end = end->next;
+
+   if (!split_flags[buf_idx]) {
+   /* it's the last packet of the set */
+   start->hash = end->hash;
+   start->ol_flags = end->ol_flags;
+   pkts[pkt_idx++] = start;
+   start = end = NULL;
+   }
+   } else {
+   /* not processing a split packet */
+   if (!split_flags[buf_idx]) {
+   /* not a split packet, save and skip */
+   pkts[pkt_idx++] = rx_bufs[buf_idx];
+   continue;
+   }
+   end = start = rx_bufs[buf_idx];
+   }
+   }
+
+   /* save the partial packet for next time */
+   rxq->pkt_first_seg = start;
+   rxq->pkt_last_seg = end;
+   memcpy(rx_bufs, pkts, pkt_idx * (sizeof(*pkts)));
+   return pkt_idx;
+}
+
+/*
+ * vPMD receive routine that reassembles scattered packets
+ *
+ * Notice:
+ * - don't support ol_flags for rss and csum err
+ * - nb_pkts > RTE_FM10K_MAX_RX_BURST, only scan RTE_FM10K_MAX_RX_BURST
+ *   numbers of DD bit
+ */
+uint16_t
+fm10k_recv_scattered_pkts_vec(void *rx_queue,
+   struct rte_mbuf **rx_pkts,
+   uint16_t nb_pkts)
+{
+   struct fm10k_rx_queue *rxq = rx_queue;
+   uint8_t split_flags[RTE_FM10K_MAX_RX_BURST] = {0};
+   unsigned i = 0;
+
+   /* Split_flags only can support max of RTE_FM10K_MAX_RX_BURST */
+   nb_pkts = RTE_MIN(nb_pkts, RTE_FM10K_MAX_RX_BURST);
+   /* get some new buffers */
+   uint16_t nb_bufs = fm10k_recv_raw_pkts_vec(rxq, rx_pkts, nb_pkts,
+   split_flags);
+   if (nb_bufs == 0)
+   return 0;
+
+   /* happy day case, full burst + no packets to be joined */
+   const uint64_t *split_fl64 = (uint64_t *)split_flags;
+   if (rxq->pkt_first_seg == NULL &&
+   split_fl64[0] == 0 && split_fl64[1] == 0 &&
+   split_fl64[2] == 0 && split_fl64[3] == 0)
+   return nb_bufs;
+
+   /* reassemble any packets that need reassembly*/
+   if (rxq->pkt_first_seg == NULL) {
+   /* find the first split flag, and only reassemble then*/
+   while (i < nb_bufs && !split_flags[i])
+   i++;
+   if (i == nb_bufs)
+   return nb_bufs;
+   }
+   return i + fm10k_reassemble_packets(rxq, &rx_pkts[i], nb_bufs - i,
+   &split_flags[i]);
+}
-- 
1.7.7.6
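
To make the reassembly step easier to follow, here is a reduced sketch of
the same state machine using a hypothetical "struct seg" in place of
rte_mbuf. It is an illustration under assumptions, not the driver code: it
compacts finished packets in place instead of using a temporary array plus
memcpy as the patch does, and it skips the hash/ol_flags copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for struct rte_mbuf. */
struct seg {
	struct seg *next;
	uint16_t data_len;
	uint32_t pkt_len;
	uint8_t nb_segs;
};

/* Partial packet carried over between bursts (pkt_first_seg/pkt_last_seg). */
struct chain_state {
	struct seg *first;
	struct seg *last;
};

/*
 * split_flags[i] != 0 means buffer i is NOT the last segment of its packet.
 * Returns the number of complete packets, stored at the front of bufs.
 */
static unsigned
reassemble(struct chain_state *st, struct seg **bufs, unsigned nb_bufs,
	   const uint8_t *split_flags)
{
	struct seg *start = st->first, *end = st->last;
	unsigned in, out = 0;

	for (in = 0; in < nb_bufs; in++) {
		struct seg *cur = bufs[in];

		if (end != NULL) {
			/* continuing a split packet: append this segment */
			end->next = cur;
			start->nb_segs++;
			start->pkt_len += cur->data_len;
			end = cur;
			if (!split_flags[in]) {
				/* last segment: the packet is complete */
				bufs[out++] = start;
				start = end = NULL;
			}
		} else if (!split_flags[in]) {
			/* single-segment packet, pass through */
			bufs[out++] = cur;
		} else {
			/* first segment of a new split packet */
			start = end = cur;
		}
	}

	/* save any partial packet for the next burst */
	st->first = start;
	st->last = end;
	return out;
}
```

In-place compaction is safe here because out never runs ahead of in; the
patch's temporary array makes the same guarantee more obvious at the cost
of a copy.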



[dpdk-dev] [PATCH v4 07/16] fm10k: add func to do Vector RX condition check

2015-10-29 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

Add func fm10k_rx_vec_condition_check to check if Vector RX
func can be applied.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k.h  |1 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |   31 +++
 2 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 96b30a7..6c1c698 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -329,5 +329,6 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
uint16_t nb_pkts);

 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
+int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
 uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
 #endif
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 8c535f0..4ecb471 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -172,6 +172,37 @@ fm10k_desc_to_pktype_v(__m128i descs[4], struct rte_mbuf **rx_pkts)
 #endif

 int __attribute__((cold))
+fm10k_rx_vec_condition_check(struct rte_eth_dev *dev)
+{
+#ifndef RTE_LIBRTE_IEEE1588
+   struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
+   struct rte_fdir_conf *fconf = &dev->data->dev_conf.fdir_conf;
+
+#ifndef RTE_FM10K_RX_OLFLAGS_ENABLE
+   /* without rx ol_flags, no VP flag report */
+   if (rxmode->hw_vlan_extend != 0)
+   return -1;
+#endif
+
+   /* no fdir support */
+   if (fconf->mode != RTE_FDIR_MODE_NONE)
+   return -1;
+
+   /* - no csum error report support
+* - no header split support
+*/
+   if (rxmode->hw_ip_checksum == 1 ||
+   rxmode->header_split == 1)
+   return -1;
+
+   return 0;
+#else
+   RTE_SET_USED(dev);
+   return -1;
+#endif
+}
+
+int __attribute__((cold))
 fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq)
 {
uintptr_t p;
-- 
1.7.7.6
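
The condition check above boils down to "return -1 if any feature the
vector path cannot report is enabled". A minimal sketch with a hypothetical
feature struct follows; the field names are illustrative stand-ins for the
rxmode/fdir settings the patch reads, and the compile-time IEEE1588 case is
left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the settings the patch inspects. */
struct rx_features {
	bool fdir_enabled;        /* fdir mode is not "none" */
	bool hw_ip_checksum;      /* checksum error reporting requested */
	bool header_split;        /* header split requested */
	bool hw_vlan_extend;      /* needs the VP flag in ol_flags */
	bool olflags_compiled_in; /* ol_flags parsing is built in */
};

/* Returns 0 when the vector RX path may be used, -1 otherwise. */
static int
vec_rx_condition_check(const struct rx_features *f)
{
	/* without ol_flags support there is no VP flag to report */
	if (!f->olflags_compiled_in && f->hw_vlan_extend)
		return -1;
	/* no flow director support */
	if (f->fdir_enabled)
		return -1;
	/* no checksum error report and no header split support */
	if (f->hw_ip_checksum || f->header_split)
		return -1;
	return 0;
}
```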



[dpdk-dev] [PATCH v4 06/16] fm10k: add Vector RX function

2015-10-29 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

Add func fm10k_recv_raw_pkts_vec to parse raw packets, which may
include chained packets.
Add func fm10k_recv_pkts_vec to receive single-mbuf packets.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k.h  |1 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |  201 
 2 files changed, 202 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 5513644..96b30a7 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -329,4 +329,5 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
uint16_t nb_pkts);

 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
+uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
 #endif
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 88c9536..8c535f0 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -289,3 +289,204 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq)
/* Update the tail pointer on the NIC */
FM10K_PCI_REG_WRITE(rxq->tail_ptr, rx_id);
 }
+
+static inline uint16_t
+fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
+   uint16_t nb_pkts, uint8_t *split_packet)
+{
+   volatile union fm10k_rx_desc *rxdp;
+   struct rte_mbuf **mbufp;
+   uint16_t nb_pkts_recd;
+   int pos;
+   struct fm10k_rx_queue *rxq = rx_queue;
+   uint64_t var;
+   __m128i shuf_msk;
+   __m128i dd_check, eop_check;
+   uint16_t next_dd;
+
+   next_dd = rxq->next_dd;
+
+   /* Just the act of getting into the function from the application is
+* going to cost about 7 cycles
+*/
+   rxdp = rxq->hw_ring + next_dd;
+
+   _mm_prefetch((const void *)rxdp, _MM_HINT_T0);
+
+   /* See if we need to rearm the RX queue - gives the prefetch a bit
+* of time to act
+*/
+   if (rxq->rxrearm_nb > RTE_FM10K_RXQ_REARM_THRESH)
+   fm10k_rxq_rearm(rxq);
+
+   /* Before we start moving massive data around, check to see if
+* there is actually a packet available
+*/
+   if (!(rxdp->d.staterr & FM10K_RXD_STATUS_DD))
+   return 0;
+
+   /* Vector RX will process 4 packets at a time, strip the unaligned
+* tails in case it's not a multiple of 4.
+*/
+   nb_pkts = RTE_ALIGN_FLOOR(nb_pkts, RTE_FM10K_DESCS_PER_LOOP);
+
+   /* 4 packets DD mask */
+   dd_check = _mm_set_epi64x(0x0000000100000001LL, 0x0000000100000001LL);
+
+   /* 4 packets EOP mask */
+   eop_check = _mm_set_epi64x(0x0000000200000002LL, 0x0000000200000002LL);
+
+   /* mask to shuffle from desc. to mbuf */
+   shuf_msk = _mm_set_epi8(
+   7, 6, 5, 4,  /* octet 4~7, 32bits rss */
+   15, 14,  /* octet 14~15, low 16 bits vlan_macip */
+   13, 12,  /* octet 12~13, 16 bits data_len */
+   0xFF, 0xFF,  /* skip high 16 bits pkt_len, zero out */
+   13, 12,  /* octet 12~13, low 16 bits pkt_len */
+   0xFF, 0xFF,  /* skip high 16 bits pkt_type */
+   0xFF, 0xFF   /* Skip pkt_type field in shuffle operation */
+   );
+
+   /* Cache is empty -> need to scan the buffer rings, but first move
+* the next 'n' mbufs into the cache
+*/
+   mbufp = &rxq->sw_ring[next_dd];
+
+   /* A. load 4 packet in one loop
+* [A*. mask out 4 unused dirty field in desc]
+* B. copy 4 mbuf point from swring to rx_pkts
+* C. calc the number of DD bits among the 4 packets
+* [C*. extract the end-of-packet bit, if requested]
+* D. fill info. from desc to mbuf
+*/
+   for (pos = 0, nb_pkts_recd = 0; pos < nb_pkts;
+   pos += RTE_FM10K_DESCS_PER_LOOP,
+   rxdp += RTE_FM10K_DESCS_PER_LOOP) {
+   __m128i descs0[RTE_FM10K_DESCS_PER_LOOP];
+   __m128i pkt_mb1, pkt_mb2, pkt_mb3, pkt_mb4;
+   __m128i zero, staterr, sterr_tmp1, sterr_tmp2;
+   __m128i mbp1, mbp2; /* two mbuf pointer in one XMM reg. */
+
+   /* B.1 load 1 mbuf point */
+   mbp1 = _mm_loadu_si128((__m128i *)&mbufp[pos]);
+
+   /* Read desc statuses backwards to avoid race condition */
+   /* A.1 load 4 pkts desc */
+   descs0[3] = _mm_loadu_si128((__m128i *)(rxdp + 3));
+
+   /* B.2 copy 2 mbuf point into rx_pkts  */
+   _mm_storeu_si128((__m128i *)&rx_pkts[pos], mbp1);
+
+   /* B.1 load 1 mbuf point */
+   mbp2 = _mm_loadu_si128((__m128i *)&mbufp[pos+2]);
+
+   descs0[2] = _mm_loadu_si128((__m128i *)(rxdp + 2));
+   /* B.1 load 2 mbuf point */
+   descs0[1] = _mm_loadu_si128((

[dpdk-dev] [PATCH v4 05/16] fm10k: add 2 functions to parse pkt_type and offload flag

2015-10-29 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

Add 2 functions that use SSE instructions to parse the RX desc
and fill in pkt_type and ol_flags in the mbuf.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k_rxtx_vec.c |  127 
 1 files changed, 127 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 6c21f15..88c9536 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -44,6 +44,133 @@
 #pragma GCC diagnostic ignored "-Wcast-qual"
 #endif

+/* Handling the offload flags (olflags) field takes computation
+ * time when receiving packets. Therefore we provide a flag to disable
+ * the processing of the olflags field when they are not needed. This
+ * gives improved performance, at the cost of losing the offload info
+ * in the received packet
+ */
+#ifdef RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE
+
+/* Vlan present flag shift */
+#define VP_SHIFT (2)
+/* L3 type shift */
+#define L3TYPE_SHIFT (4)
+/* L4 type shift */
+#define L4TYPE_SHIFT (7)
+
+static inline void
+fm10k_desc_to_olflags_v(__m128i descs[4], struct rte_mbuf **rx_pkts)
+{
+   __m128i ptype0, ptype1, vtag0, vtag1;
+   union {
+   uint16_t e[4];
+   uint64_t dword;
+   } vol;
+
+   const __m128i pkttype_msk = _mm_set_epi16(
+   0x0000, 0x0000, 0x0000, 0x0000,
+   PKT_RX_VLAN_PKT, PKT_RX_VLAN_PKT,
+   PKT_RX_VLAN_PKT, PKT_RX_VLAN_PKT);
+
+   /* mask everything except rss type */
+   const __m128i rsstype_msk = _mm_set_epi16(
+   0x0000, 0x0000, 0x0000, 0x0000,
+   0x000F, 0x000F, 0x000F, 0x000F);
+
+   /* map rss type to rss hash flag */
+   const __m128i rss_flags = _mm_set_epi8(0, 0, 0, 0,
+   0, 0, 0, PKT_RX_RSS_HASH,
+   PKT_RX_RSS_HASH, 0, PKT_RX_RSS_HASH, 0,
+   PKT_RX_RSS_HASH, PKT_RX_RSS_HASH, PKT_RX_RSS_HASH, 0);
+
+   ptype0 = _mm_unpacklo_epi16(descs[0], descs[1]);
+   ptype1 = _mm_unpacklo_epi16(descs[2], descs[3]);
+   vtag0 = _mm_unpackhi_epi16(descs[0], descs[1]);
+   vtag1 = _mm_unpackhi_epi16(descs[2], descs[3]);
+
+   ptype0 = _mm_unpacklo_epi32(ptype0, ptype1);
+   ptype0 = _mm_and_si128(ptype0, rsstype_msk);
+   ptype0 = _mm_shuffle_epi8(rss_flags, ptype0);
+
+   vtag1 = _mm_unpacklo_epi32(vtag0, vtag1);
+   vtag1 = _mm_srli_epi16(vtag1, VP_SHIFT);
+   vtag1 = _mm_and_si128(vtag1, pkttype_msk);
+
+   vtag1 = _mm_or_si128(ptype0, vtag1);
+   vol.dword = _mm_cvtsi128_si64(vtag1);
+
+   rx_pkts[0]->ol_flags = vol.e[0];
+   rx_pkts[1]->ol_flags = vol.e[1];
+   rx_pkts[2]->ol_flags = vol.e[2];
+   rx_pkts[3]->ol_flags = vol.e[3];
+}
+
+static inline void
+fm10k_desc_to_pktype_v(__m128i descs[4], struct rte_mbuf **rx_pkts)
+{
+   __m128i l3l4type0, l3l4type1, l3type, l4type;
+   union {
+   uint16_t e[4];
+   uint64_t dword;
+   } vol;
+
+   /* L3 pkt type mask  Bit4 to Bit6 */
+   const __m128i l3type_msk = _mm_set_epi16(
+   0x0000, 0x0000, 0x0000, 0x0000,
+   0x0070, 0x0070, 0x0070, 0x0070);
+
+   /* L4 pkt type mask  Bit7 to Bit9 */
+   const __m128i l4type_msk = _mm_set_epi16(
+   0x0000, 0x0000, 0x0000, 0x0000,
+   0x0380, 0x0380, 0x0380, 0x0380);
+
+   /* convert RRC l3 type to mbuf format */
+   const __m128i l3type_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0,
+   0, 0, 0, RTE_PTYPE_L3_IPV6_EXT,
+   RTE_PTYPE_L3_IPV6, RTE_PTYPE_L3_IPV4_EXT,
+   RTE_PTYPE_L3_IPV4, 0);
+
+   /* Convert RRC l4 type to mbuf format. The l4type_flags values are
+* shifted right by 8 bits so that each one fits into 8 bits.
+*/
+   const __m128i l4type_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, 0,
+   RTE_PTYPE_TUNNEL_GENEVE >> 8,
+   RTE_PTYPE_TUNNEL_NVGRE >> 8,
+   RTE_PTYPE_TUNNEL_VXLAN >> 8,
+   RTE_PTYPE_TUNNEL_GRE >> 8,
+   RTE_PTYPE_L4_UDP >> 8,
+   RTE_PTYPE_L4_TCP >> 8,
+   0);
+
+   l3l4type0 = _mm_unpacklo_epi16(descs[0], descs[1]);
+   l3l4type1 = _mm_unpacklo_epi16(descs[2], descs[3]);
+   l3l4type0 = _mm_unpacklo_epi32(l3l4type0, l3l4type1);
+
+   l3type = _mm_and_si128(l3l4type0, l3type_msk);
+   l4type = _mm_and_si128(l3l4type0, l4type_msk);
+
+   l3type = _mm_srli_epi16(l3type, L3TYPE_SHIFT);
+   l4type = _mm_srli_epi16(l4type, L4TYPE_SHIFT);
+
+   l3type = _mm_shuffle_epi8(l3type_flags, l3type);
+   /* l4type_flags were shifted right by 8 bits, need to shift left back */

[dpdk-dev] [PATCH v4 04/16] fm10k: add func to re-allocate mbuf for RX ring

2015-10-29 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

Add function fm10k_rxq_rearm to re-allocate mbuf for used desc
in RX HW ring.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k.h  |   11 
 drivers/net/fm10k/fm10k_ethdev.c   |3 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |   98 
 3 files changed, 112 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 362a2d0..5513644 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -123,6 +123,12 @@
 #define FM10K_VFTA_BIT(vlan_id) (1 << ((vlan_id) & 0x1F))
 #define FM10K_VFTA_IDX(vlan_id) ((vlan_id) >> 5)

+#define RTE_FM10K_RXQ_REARM_THRESH  32
+#define RTE_FM10K_VPMD_TX_BURST 32
+#define RTE_FM10K_MAX_RX_BURST  RTE_FM10K_RXQ_REARM_THRESH
+#define RTE_FM10K_TX_MAX_FREE_BUF_SZ 64
+#define RTE_FM10K_DESCS_PER_LOOP 4
+
 struct fm10k_macvlan_filter_info {
uint16_t vlan_num;   /* Total VLAN number */
uint16_t mac_num;/* Total mac number */
@@ -171,6 +177,8 @@ struct fm10k_rx_queue {
struct rte_mbuf *pkt_last_seg;  /* Last segment of current packet. */
uint64_t hw_ring_phys_addr;
uint64_t mbuf_initializer; /* value to init mbufs */
+   /** need to alloc dummy mbuf, for wraparound when scanning hw ring */
+   struct rte_mbuf fake_mbuf;
uint16_t next_dd;
uint16_t next_alloc;
uint16_t next_trigger;
@@ -178,6 +186,9 @@ struct fm10k_rx_queue {
volatile uint32_t *tail_ptr;
uint16_t nb_desc;
uint16_t queue_id;
+   /* Below 2 fields only valid in case vPMD is applied. */
+   uint16_t rxrearm_nb; /* number of remaining to be re-armed */
+   uint16_t rxrearm_start;  /* the idx we start the re-arming from */
uint8_t port_id;
uint8_t drop_en;
uint8_t rx_deferred_start; /* don't start this queue in dev start. */
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 363ef98..44c3d34 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -121,6 +121,9 @@ rx_queue_reset(struct fm10k_rx_queue *q)
q->next_alloc = 0;
q->next_trigger = q->alloc_thresh - 1;
FM10K_PCI_REG_WRITE(q->tail_ptr, q->nb_desc - 1);
+   q->rxrearm_start = 0;
+   q->rxrearm_nb = 0;
+
return 0;
 }

diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 34b677b..6c21f15 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -64,3 +64,101 @@ fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq)
rxq->mbuf_initializer = *(uint64_t *)p;
return 0;
 }
+
+static inline void
+fm10k_rxq_rearm(struct fm10k_rx_queue *rxq)
+{
+   int i;
+   uint16_t rx_id;
+   volatile union fm10k_rx_desc *rxdp;
+   struct rte_mbuf **mb_alloc = &rxq->sw_ring[rxq->rxrearm_start];
+   struct rte_mbuf *mb0, *mb1;
+   __m128i head_off = _mm_set_epi64x(
+   RTE_PKTMBUF_HEADROOM + FM10K_RX_DATABUF_ALIGN - 1,
+   RTE_PKTMBUF_HEADROOM + FM10K_RX_DATABUF_ALIGN - 1);
+   __m128i dma_addr0, dma_addr1;
+   /* Rx buffers need to be aligned to 512 bytes */
+   const __m128i hba_msk = _mm_set_epi64x(0,
+   UINT64_MAX - FM10K_RX_DATABUF_ALIGN + 1);
+
+   rxdp = rxq->hw_ring + rxq->rxrearm_start;
+
+   /* Pull 'n' more MBUFs into the software ring */
+   if (rte_mempool_get_bulk(rxq->mp,
+(void *)mb_alloc,
+RTE_FM10K_RXQ_REARM_THRESH) < 0) {
+   dma_addr0 = _mm_setzero_si128();
+   /* Clean up all the HW/SW ring content */
+   for (i = 0; i < RTE_FM10K_RXQ_REARM_THRESH; i++) {
+   mb_alloc[i] = &rxq->fake_mbuf;
+   _mm_store_si128((__m128i *)&rxdp[i].q,
+   dma_addr0);
+   }
+
+   rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed +=
+   RTE_FM10K_RXQ_REARM_THRESH;
+   return;
+   }
+
+   /* Initialize the mbufs in vector, process 2 mbufs in one loop */
+   for (i = 0; i < RTE_FM10K_RXQ_REARM_THRESH; i += 2, mb_alloc += 2) {
+   __m128i vaddr0, vaddr1;
+   uintptr_t p0, p1;
+
+   mb0 = mb_alloc[0];
+   mb1 = mb_alloc[1];
+
+   /* Flush mbuf with pkt template.
+* Data to be rearmed is 6 bytes long.
+* Though, RX will overwrite ol_flags that are coming next
+* anyway. So overwrite whole 8 bytes with one load:
+* 6 bytes of rearm_data plus first 2 bytes of ol_flags.
+*/
+  

[dpdk-dev] [PATCH v4 03/16] fm10k: Add a new func to initialize all parameters

2015-10-29 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

Add new function fm10k_params_init to initialize all fm10k related
variables.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k_ethdev.c |   35 +++
 1 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 3c7784e..363ef98 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -2066,6 +2066,27 @@ static const struct eth_dev_ops fm10k_eth_dev_ops = {
.rss_hash_conf_get  = fm10k_rss_hash_conf_get,
 };

+static void
+fm10k_params_init(struct rte_eth_dev *dev)
+{
+   struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct fm10k_dev_info *info = FM10K_DEV_PRIVATE_TO_INFO(dev);
+
+   /* Initialize bus info. Normally we would call fm10k_get_bus_info(), but
+* there is no way to get link status without reading BAR4.  Until this
+* works, assume we have maximum bandwidth.
+* @todo - fix bus info
+*/
+   hw->bus_caps.speed = fm10k_bus_speed_8000;
+   hw->bus_caps.width = fm10k_bus_width_pcie_x8;
+   hw->bus_caps.payload = fm10k_bus_payload_512;
+   hw->bus.speed = fm10k_bus_speed_8000;
+   hw->bus.width = fm10k_bus_width_pcie_x8;
+   hw->bus.payload = fm10k_bus_payload_256;
+
+   info->rx_vec_allowed = true;
+}
+
 static int
 eth_fm10k_dev_init(struct rte_eth_dev *dev)
 {
@@ -2112,18 +2133,8 @@ eth_fm10k_dev_init(struct rte_eth_dev *dev)
return -EIO;
}

-   /*
-* Inialize bus info. Normally we would call fm10k_get_bus_info(), but
-* there is no way to get link status without reading BAR4.  Until this
-* works, assume we have maximum bandwidth.
-* @todo - fix bus info
-*/
-   hw->bus_caps.speed = fm10k_bus_speed_8000;
-   hw->bus_caps.width = fm10k_bus_width_pcie_x8;
-   hw->bus_caps.payload = fm10k_bus_payload_512;
-   hw->bus.speed = fm10k_bus_speed_8000;
-   hw->bus.width = fm10k_bus_width_pcie_x8;
-   hw->bus.payload = fm10k_bus_payload_256;
+   /* Initialize parameters */
+   fm10k_params_init(dev);

/* Initialize the hw */
diag = fm10k_init_hw(hw);
-- 
1.7.7.6



[dpdk-dev] [PATCH v4 02/16] fm10k: add vPMD pre-condition check for each RX queue

2015-10-29 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

Add condition check in rx_queue_setup func. If the number of RX desc
can't satisfy the vPMD requirement, record that in a variable. Otherwise,
call fm10k_rxq_vec_setup to initialize Vector RX.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k.h  |   11 ---
 drivers/net/fm10k/fm10k_ethdev.c   |   11 +++
 drivers/net/fm10k/fm10k_rxtx_vec.c |   21 +
 3 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index c089882..362a2d0 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -135,6 +135,8 @@ struct fm10k_dev_info {
/* Protect the mailbox to avoid race condition */
rte_spinlock_t mbx_lock;
struct fm10k_macvlan_filter_info macvlan;
+   /* Flag to indicate if RX vector conditions satisfied */
+   bool rx_vec_allowed;
 };

 /*
@@ -165,9 +167,10 @@ struct fm10k_rx_queue {
struct rte_mempool *mp;
struct rte_mbuf **sw_ring;
volatile union fm10k_rx_desc *hw_ring;
-   struct rte_mbuf *pkt_first_seg; /**< First segment of current packet. */
-   struct rte_mbuf *pkt_last_seg;  /**< Last segment of current packet. */
+   struct rte_mbuf *pkt_first_seg; /* First segment of current packet. */
+   struct rte_mbuf *pkt_last_seg;  /* Last segment of current packet. */
uint64_t hw_ring_phys_addr;
+   uint64_t mbuf_initializer; /* value to init mbufs */
uint16_t next_dd;
uint16_t next_alloc;
uint16_t next_trigger;
@@ -177,7 +180,7 @@ struct fm10k_rx_queue {
uint16_t queue_id;
uint8_t port_id;
uint8_t drop_en;
-   uint8_t rx_deferred_start; /**< don't start this queue in dev start. */
+   uint8_t rx_deferred_start; /* don't start this queue in dev start. */
 };

 /*
@@ -313,4 +316,6 @@ uint16_t fm10k_recv_scattered_pkts(void *rx_queue,

 uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
uint16_t nb_pkts);
+
+int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 #endif
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index a69c990..3c7784e 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -1251,6 +1251,7 @@ fm10k_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
const struct rte_eth_rxconf *conf, struct rte_mempool *mp)
 {
struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct fm10k_dev_info *dev_info = FM10K_DEV_PRIVATE_TO_INFO(dev);
struct fm10k_rx_queue *q;
const struct rte_memzone *mz;

@@ -1333,6 +1334,16 @@ fm10k_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id,
q->hw_ring_phys_addr = mz->phys_addr;
 #endif

+   /* Check if number of descs satisfied Vector requirement */
+   if (!rte_is_power_of_2(nb_desc)) {
+   PMD_INIT_LOG(DEBUG, "queue[%d] doesn't meet Vector Rx "
+   "preconditions - canceling the feature for "
+   "the whole port[%d]",
+q->queue_id, q->port_id);
+   dev_info->rx_vec_allowed = false;
+   } else
+   fm10k_rxq_vec_setup(q);
+
dev->data->rx_queues[queue_id] = q;
return 0;
 }
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 69174d9..34b677b 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -43,3 +43,24 @@
 #ifndef __INTEL_COMPILER
 #pragma GCC diagnostic ignored "-Wcast-qual"
 #endif
+
+int __attribute__((cold))
+fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq)
+{
+   uintptr_t p;
+   struct rte_mbuf mb_def = { .buf_addr = 0 }; /* zeroed mbuf */
+
+   mb_def.nb_segs = 1;
+   /* data_off will be adjusted after a new mbuf is allocated, for
+* 512-byte alignment.
+*/
+   mb_def.data_off = RTE_PKTMBUF_HEADROOM;
+   mb_def.port = rxq->port_id;
+   rte_mbuf_refcnt_set(&mb_def, 1);
+
+   /* prevent compiler reordering: rearm_data covers previous fields */
+   rte_compiler_barrier();
+   p = (uintptr_t)&mb_def.rearm_data;
+   rxq->mbuf_initializer = *(uint64_t *)p;
+   return 0;
+}
-- 
1.7.7.6
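
Two small pieces of arithmetic sit behind this patch: the power-of-two test
on the descriptor count, and the later adjustment of the data offset so the
receive buffer lands on a 512-byte boundary. The sketch below shows both in
plain C. The helper names are hypothetical, and the headroom value of 128
is only an illustrative assumption; the 512-byte figure comes from the
comment in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_ALIGN    512u /* buffer alignment named in the patch comment */
#define SKETCH_HEADROOM 128u /* illustrative headroom, an assumption */

/* Non-zero when n is a power of two (what the nb_desc check requires). */
static int
is_pow2(uint32_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Round (buf_addr + headroom) up to the next SKETCH_ALIGN boundary. */
static uint64_t
aligned_data_addr(uint64_t buf_addr)
{
	const uint64_t mask = ~(uint64_t)(SKETCH_ALIGN - 1);

	assert(is_pow2(SKETCH_ALIGN));
	return (buf_addr + SKETCH_HEADROOM + SKETCH_ALIGN - 1) & mask;
}

/* Offset from the buffer start to the aligned data address. */
static uint16_t
adjusted_data_off(uint64_t buf_addr)
{
	return (uint16_t)(aligned_data_addr(buf_addr) - buf_addr);
}
```

For a buffer at 0x1000 the aligned address is 0x1200, so the offset grows
from the nominal headroom to 512; for a buffer at 0x1180 the headroom
already lands on a boundary and the offset stays at 128.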



[dpdk-dev] [PATCH v4 01/16] fm10k: add new vPMD file

2015-10-29 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

Add new file fm10k_rxtx_vec.c and add it into compiling.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/Makefile |1 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |   45 
 2 files changed, 46 insertions(+), 0 deletions(-)
 create mode 100644 drivers/net/fm10k/fm10k_rxtx_vec.c

diff --git a/drivers/net/fm10k/Makefile b/drivers/net/fm10k/Makefile
index a4a8f56..06ebf83 100644
--- a/drivers/net/fm10k/Makefile
+++ b/drivers/net/fm10k/Makefile
@@ -93,6 +93,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_common.c
 SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_mbx.c
 SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_vf.c
 SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_api.c
+SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_rxtx_vec.c

 # this lib depends upon:
 DEPDIRS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += lib/librte_eal lib/librte_ether
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
new file mode 100644
index 0000000..69174d9
--- /dev/null
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -0,0 +1,45 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+
+#include 
+#include 
+#include "fm10k.h"
+#include "base/fm10k_type.h"
+
+#include 
+
+#ifndef __INTEL_COMPILER
+#pragma GCC diagnostic ignored "-Wcast-qual"
+#endif
-- 
1.7.7.6



[dpdk-dev] [PATCH v4 00/16] Vector Rx/Tx PMD implementation for fm10k

2015-10-29 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

v4:
 - Clear HW/SW ring content after allocating mbuf failed.

v3:
 - Add a blank line after variable definition.
 - Floor-align the nb_pkts argument to avoid overwriting memory.
 - Only scan a max of 32 desc in the scatter Rx function to avoid overwriting
   memory.

v2:
 - Fix a typo issue.
 - Fix an improper prefetch in vector RX function, in which prefetches
   un-initialized mbuf.
 - Remove limitation on number of desc pointer in vector RX function.
 - Re-organize some comments.
 - Add a new patch to fix a crash issue in vector RX func.
 - Add a new patch to update release notes.

v1:
This patch set includes Vector Rx/Tx functions to receive/transmit packets
for fm10k devices. It also contains logic to do sanity check for proper
RX/TX function selections.

Chen Jing D(Mark) (16):
  fm10k: add new vPMD file
  fm10k: add vPMD pre-condition check for each RX queue
  fm10k: Add a new func to initialize all parameters
  fm10k: add func to re-allocate mbuf for RX ring
  fm10k: add 2 functions to parse pkt_type and offload flag
  fm10k: add Vector RX function
  fm10k: add func to do Vector RX condition check
  fm10k: add Vector RX scatter function
  fm10k: add function to decide best RX function
  fm10k: add func to release mbuf in case Vector RX applied
  fm10k: add Vector TX function
  fm10k: use func pointer to reset TX queue and mbuf release
  fm10k: introduce 2 funcs to reset TX queue and mbuf release
  fm10k: Add function to decide best TX func
  fm10k: fix a crash issue in vector RX func
  doc: release notes update for fm10k Vector PMD

 doc/guides/rel_notes/release_2_2.rst |5 +
 drivers/net/fm10k/Makefile   |1 +
 drivers/net/fm10k/fm10k.h|   45 ++-
 drivers/net/fm10k/fm10k_ethdev.c |  169 ++-
 drivers/net/fm10k/fm10k_rxtx_vec.c   |  847 ++
 5 files changed, 1039 insertions(+), 28 deletions(-)
 create mode 100644 drivers/net/fm10k/fm10k_rxtx_vec.c

-- 
1.7.7.6



[dpdk-dev] [PATCH v3 04/16] fm10k: add func to re-allocate mbuf for RX ring

2015-10-29 Thread Chen, Jing D
Hi, Steve,

Best Regards,
Mark


> -----Original Message-----
> From: Liang, Cunming
> Sent: Thursday, October 29, 2015 4:15 PM
> To: Chen, Jing D; dev at dpdk.org
> Cc: Tao, Zhe; He, Shaopeng; Ananyev, Konstantin; Richardson, Bruce
> Subject: RE: [PATCH v3 04/16] fm10k: add func to re-allocate mbuf for RX ring
> 
> Hi Mark,
> 
> 
> > -----Original Message-----
> > From: Chen, Jing D
> > Sent: Thursday, October 29, 2015 1:24 PM
> > To: Liang, Cunming; dev at dpdk.org
> > Cc: Tao, Zhe; He, Shaopeng; Ananyev, Konstantin; Richardson, Bruce
> > Subject: RE: [PATCH v3 04/16] fm10k: add func to re-allocate mbuf for RX
> ring
> >
> > Hi, Steve,
> >
> > Best Regards,
> > Mark
> >
> >
> > > -----Original Message-
> > > From: Liang, Cunming
> > > Sent: Wednesday, October 28, 2015 9:59 PM
> > > To: Chen, Jing D; dev at dpdk.org
> > > Cc: Tao, Zhe; He, Shaopeng; Ananyev, Konstantin; Richardson, Bruce
> > > Subject: Re: [PATCH v3 04/16] fm10k: add func to re-allocate mbuf for RX
> ring
> > >
> > > Hi Mark,
> > >
> > > On 10/27/2015 5:46 PM, Chen Jing D(Mark) wrote:
> > > > From: "Chen Jing D(Mark)" 
> > > >
> > > > Add function fm10k_rxq_rearm to re-allocate mbuf for used desc
> > > > in RX HW ring.
> > > >
> > > > Signed-off-by: Chen Jing D(Mark) 
> > > > ---
> > > >   drivers/net/fm10k/fm10k.h  |9 
> > > >   drivers/net/fm10k/fm10k_ethdev.c   |3 +
> > > >   drivers/net/fm10k/fm10k_rxtx_vec.c |   90
> > > 
> > > >   3 files changed, 102 insertions(+), 0 deletions(-)
> > > [...]
> > > > +static inline void
> > > > +fm10k_rxq_rearm(struct fm10k_rx_queue *rxq)
> > > > +{
> > > > +   int i;
> > > > +   uint16_t rx_id;
> > > > +   volatile union fm10k_rx_desc *rxdp;
> > > > +   struct rte_mbuf **mb_alloc = &rxq->sw_ring[rxq->rxrearm_start];
> > > > +   struct rte_mbuf *mb0, *mb1;
> > > > +   __m128i head_off = _mm_set_epi64x(
> > > > +   RTE_PKTMBUF_HEADROOM +
> > > FM10K_RX_DATABUF_ALIGN - 1,
> > > > +   RTE_PKTMBUF_HEADROOM +
> > > FM10K_RX_DATABUF_ALIGN - 1);
> > > > +   __m128i dma_addr0, dma_addr1;
> > > > +   /* Rx buffer need to be aligned with 512 byte */
> > > > +   const __m128i hba_msk = _mm_set_epi64x(0,
> > > > +   UINT64_MAX - FM10K_RX_DATABUF_ALIGN
> > > + 1);
> > > > +
> > > > +   rxdp = rxq->hw_ring + rxq->rxrearm_start;
> > > > +
> > > > +   /* Pull 'n' more MBUFs into the software ring */
> > > > +   if (rte_mempool_get_bulk(rxq->mp,
> > > > +(void *)mb_alloc,
> > > > +RTE_FM10K_RXQ_REARM_THRESH) < 0) {
> > > Here's one potential issue when the failure happens. As tail won't be
> > > updated, head will eventually equal tail. HW won't write back any more,
> > > but the SW recv_raw_pkts_vec only checks the DD bit, so an old 'dirty'
> > > descriptor (DD bit not cleared) will be taken and SW will keep moving
> > > forward to check the next one, even beyond the tail. Sorry I didn't
> > > catch it the first time. /Steve
> >
> > I have a different view on this. If mbuf allocation keeps failing and tail
> > equals head, then HW will stop sending new packets to the HW ring, as you
> > pointed out. Then, when mbufs can be allocated again, this function will
> > refill the HW ring and update tail.
> We can't guarantee that it recovers and allocates new mbufs before the
> polling SW has already moved beyond the un-rearmed dirty entry.

Thanks! I got you. I'll change accordingly.

> > So, HW will resume filling packets into the HW ring. Receive functions
> > will continue to work.
> The point is that HW is pending at that moment, but the polling receive
> function won't wait; it just reads the next DD bit, whose value is still 1
> because it hasn't been cleared.
> > Anything I missed?
> >
> > > > +   rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed
> > > +=
> > > > +   RTE_FM10K_RXQ_REARM_THRESH;
> > > > +   return;
> > > > +   }
> > > > +
> > > > +



[dpdk-dev] [PATCH v3 08/16] fm10k: add Vector RX scatter function

2015-10-29 Thread Chen, Jing D
Hi, Steve,

Best Regards,
Mark


> -----Original Message-----
> From: Liang, Cunming
> Sent: Wednesday, October 28, 2015 10:30 PM
> To: Chen, Jing D; dev at dpdk.org
> Cc: Tao, Zhe; He, Shaopeng; Ananyev, Konstantin; Richardson, Bruce
> Subject: Re: [PATCH v3 08/16] fm10k: add Vector RX scatter function
> 
> Hi Mark,
> 
> On 10/27/2015 5:46 PM, Chen Jing D(Mark) wrote:
> > From: "Chen Jing D(Mark)" 
> >
> > Add func fm10k_recv_scattered_pkts_vec to receive chained packets
> > with SSE instructions.
> >
> > Signed-off-by: Chen Jing D(Mark) 
> > ---
> >   drivers/net/fm10k/fm10k.h  |2 +
> >   drivers/net/fm10k/fm10k_rxtx_vec.c |   88
> 
> >   2 files changed, 90 insertions(+), 0 deletions(-)
> >
> > diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
> > index 1502ae3..06697fa 100644
> > --- a/drivers/net/fm10k/fm10k.h
> > +++ b/drivers/net/fm10k/fm10k.h
> > @@ -329,4 +329,6 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct
> rte_mbuf **tx_pkts,
> >   int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
> >   int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
> >   uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
> > +uint16_t fm10k_recv_scattered_pkts_vec(void *, struct rte_mbuf **,
> > +   uint16_t);
> >   #endif
> > diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c
> b/drivers/net/fm10k/fm10k_rxtx_vec.c
> > index 2e6f1a2..3fd5d45 100644
> > --- a/drivers/net/fm10k/fm10k_rxtx_vec.c
> > +++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
> > @@ -513,3 +513,91 @@ fm10k_recv_pkts_vec(void *rx_queue, struct
> rte_mbuf **rx_pkts,
> >   {
> > return fm10k_recv_raw_pkts_vec(rx_queue, rx_pkts, nb_pkts,
> NULL);
> >   }
> > +
> > +static inline uint16_t
> > +fm10k_reassemble_packets(struct fm10k_rx_queue *rxq,
> > +   struct rte_mbuf **rx_bufs,
> > +   uint16_t nb_bufs, uint8_t *split_flags)
> > +{
> > +   struct rte_mbuf *pkts[RTE_FM10K_MAX_RX_BURST]; /*finished
> pkts*/
> > +   struct rte_mbuf *start = rxq->pkt_first_seg;
> > +   struct rte_mbuf *end =  rxq->pkt_last_seg;
> > +   unsigned pkt_idx, buf_idx;
> > +
> > +
> > +   for (buf_idx = 0, pkt_idx = 0; buf_idx < nb_bufs; buf_idx++) {
> > +   if (end != NULL) {
> > +   /* processing a split packet */
> > +   end->next = rx_bufs[buf_idx];
> > +   start->nb_segs++;
> > +   start->pkt_len += rx_bufs[buf_idx]->data_len;
> > +   end = end->next;
> > +
> > +   if (!split_flags[buf_idx]) {
> > +   /* it's the last packet of the set */
> > +   start->hash = end->hash;
> > +   start->ol_flags = end->ol_flags;
> > +   pkts[pkt_idx++] = start;
> > +   start = end = NULL;
> > +   }
> > +   } else {
> > +   /* not processing a split packet */
> > +   if (!split_flags[buf_idx]) {
> > +   /* not a split packet, save and skip */
> > +   pkts[pkt_idx++] = rx_bufs[buf_idx];
> > +   continue;
> > +   }
> > +   end = start = rx_bufs[buf_idx];
> > +   }
> I guess you forgot to consider the crc_len during processing. /Steve

In fm10k, the CRC is always stripped, and pkt_len/data_len carry the actual
data length. So we don't need to adjust for crc_len here. This is a little
different from IXGBE.
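
To make the length accounting concrete, here is a simplified sketch of chaining segments when the lengths already exclude the CRC. It is only a model of the idea, not the patch: `struct seg` and `chain_segments` are hypothetical stand-ins for the mbuf and the reassembly function, and unlike the real function it keeps no state between calls, so an unfinished trailing packet is simply not emitted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an mbuf segment. */
struct seg {
	struct seg *next;
	uint32_t pkt_len;  /* total length; meaningful on the first segment */
	uint16_t data_len; /* bytes held by this segment (CRC already stripped) */
	uint8_t nb_segs;   /* segment count; meaningful on the first segment */
};

/*
 * Chain bufs[0..n-1] into packets. split[i] != 0 means the packet that
 * bufs[i] belongs to continues in bufs[i + 1]. Finished packets are
 * stored in out[]; the return value is how many were stored.
 * pkt_len is the plain sum of data_len: no CRC length is subtracted.
 */
static size_t
chain_segments(struct seg **bufs, const uint8_t *split, size_t n,
	       struct seg **out)
{
	struct seg *start = NULL, *end = NULL;
	size_t i, n_out = 0;

	for (i = 0; i < n; i++) {
		struct seg *cur = bufs[i];

		cur->next = NULL;
		if (start == NULL) {
			/* first segment of a packet */
			start = end = cur;
			start->nb_segs = 1;
			start->pkt_len = cur->data_len;
		} else {
			/* continuation of a split packet */
			end->next = cur;
			end = cur;
			start->nb_segs++;
			start->pkt_len += cur->data_len;
		}
		if (!split[i]) {
			/* last segment of the packet */
			out[n_out++] = start;
			start = end = NULL;
		}
	}
	return n_out;
}
```

A driver whose hardware leaves the CRC in the buffer would need an extra step at the "last segment" point to trim it from the totals; that step is deliberately absent here.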



[dpdk-dev] [PATCH v3 04/16] fm10k: add func to re-allocate mbuf for RX ring

2015-10-29 Thread Chen, Jing D
Hi, Steve,

Best Regards,
Mark


> -----Original Message-----
> From: Liang, Cunming
> Sent: Wednesday, October 28, 2015 9:59 PM
> To: Chen, Jing D; dev at dpdk.org
> Cc: Tao, Zhe; He, Shaopeng; Ananyev, Konstantin; Richardson, Bruce
> Subject: Re: [PATCH v3 04/16] fm10k: add func to re-allocate mbuf for RX ring
> 
> Hi Mark,
> 
> On 10/27/2015 5:46 PM, Chen Jing D(Mark) wrote:
> > From: "Chen Jing D(Mark)" 
> >
> > Add function fm10k_rxq_rearm to re-allocate mbuf for used desc
> > in RX HW ring.
> >
> > Signed-off-by: Chen Jing D(Mark) 
> > ---
> >   drivers/net/fm10k/fm10k.h  |9 
> >   drivers/net/fm10k/fm10k_ethdev.c   |3 +
> >   drivers/net/fm10k/fm10k_rxtx_vec.c |   90
> 
> >   3 files changed, 102 insertions(+), 0 deletions(-)
> [...]
> > +static inline void
> > +fm10k_rxq_rearm(struct fm10k_rx_queue *rxq)
> > +{
> > +   int i;
> > +   uint16_t rx_id;
> > +   volatile union fm10k_rx_desc *rxdp;
> > +   struct rte_mbuf **mb_alloc = &rxq->sw_ring[rxq->rxrearm_start];
> > +   struct rte_mbuf *mb0, *mb1;
> > +   __m128i head_off = _mm_set_epi64x(
> > +   RTE_PKTMBUF_HEADROOM +
> FM10K_RX_DATABUF_ALIGN - 1,
> > +   RTE_PKTMBUF_HEADROOM +
> FM10K_RX_DATABUF_ALIGN - 1);
> > +   __m128i dma_addr0, dma_addr1;
> > +   /* Rx buffer need to be aligned with 512 byte */
> > +   const __m128i hba_msk = _mm_set_epi64x(0,
> > +   UINT64_MAX - FM10K_RX_DATABUF_ALIGN
> + 1);
> > +
> > +   rxdp = rxq->hw_ring + rxq->rxrearm_start;
> > +
> > +   /* Pull 'n' more MBUFs into the software ring */
> > +   if (rte_mempool_get_bulk(rxq->mp,
> > +(void *)mb_alloc,
> > +RTE_FM10K_RXQ_REARM_THRESH) < 0) {
> Here's one potential issue when the failure happens. As tail won't be
> updated, head will eventually equal tail. HW won't write back any more,
> but the SW recv_raw_pkts_vec only checks the DD bit, so an old 'dirty'
> descriptor (DD bit not cleared) will be taken and SW will keep moving
> forward to check the next one, even beyond the tail. Sorry I didn't
> catch it the first time. /Steve

I have a different view on this. If mbuf allocation keeps failing and tail
equals head, then HW will stop sending new packets to the HW ring, as you
pointed out. Then, when mbufs can be allocated again, this function will
refill the HW ring and update tail, so HW will resume filling packets into
the HW ring and the receive functions will continue to work.
Anything I missed?

> > +   rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed
> +=
> > +   RTE_FM10K_RXQ_REARM_THRESH;
> > +   return;
> > +   }
> > +
> > +



[dpdk-dev] [PATCH v2 0/7] interrupt mode for fm10k

2015-10-28 Thread Chen, Jing D

> -----Original Message-----
> From: He, Shaopeng
> Sent: Monday, October 26, 2015 11:48 AM
> To: dev at dpdk.org
> Cc: Chen, Jing D; Qiu, Michael; He, Shaopeng
> Subject: [PATCH v2 0/7] interrupt mode for fm10k
> 
> This patch series adds interrupt mode support for fm10k. It
> contains four major parts:
> 
> 1. implement rx_descriptor_done function in fm10k
> 2. make sure default VID available in dev_init in fm10k
> 3. fix a memory leak for non-ip packet in l3fwd-power
> 4. add rx interrupt support in fm10k PF and VF
> 
> The patch set is developed based on one previous patch set
> "[PATCH v1 00/11] interrupt mode for i40e"
> http://www.dpdk.org/ml/archives/dev/2015-September/023903.html
> 
> Shaopeng He (7):
>   fm10k: implement rx_descriptor_done function
>   fm10k: setup rx queue interrupts for PF and VF
>   fm10k: remove rx queue interrupts when dev stops
>   fm10k: add rx queue interrupt en/dis functions
>   fm10k: make sure default VID available in dev_init
>   l3fwd-power: fix a memory leak for non-ip packet
>   doc: release note update for fm10k intr mode
> 
>  doc/guides/rel_notes/release_2_2.rst |   2 +
>  drivers/net/fm10k/fm10k.h|   3 +
>  drivers/net/fm10k/fm10k_ethdev.c | 164
> +--
>  drivers/net/fm10k/fm10k_rxtx.c   |  25 ++
>  examples/l3fwd-power/main.c  |   3 +-
>  5 files changed, 189 insertions(+), 8 deletions(-)
> 
> --
> 1.9.3

Acked-by: Jing Chen



[dpdk-dev] [PATCH v3 16/16] doc: release notes update for fm10k Vector PMD

2015-10-27 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

Update the 2.2 release notes to describe the Vector PMD implementation
in the fm10k driver.

Signed-off-by: Chen Jing D(Mark) 
---
 doc/guides/rel_notes/release_2_2.rst |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/doc/guides/rel_notes/release_2_2.rst 
b/doc/guides/rel_notes/release_2_2.rst
index 9a70dae..44a3f74 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -39,6 +39,11 @@ Drivers

   Fixed issue with libvirt ``virsh destroy`` not killing the VM.

+* **fm10k:  Add Vector Rx/Tx implementation.**
+
+  This patch set includes Vector Rx/Tx functions to receive/transmit packets
+  for fm10k devices. It also contains logic to do sanity check for proper
+  RX/TX function selections.

 Libraries
 ~~~~~~~~~
-- 
1.7.7.6



[dpdk-dev] [PATCH v3 15/16] fm10k: fix a crash issue in vector RX func

2015-10-27 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

The vector RX function processes 4 packets at a time. When the RX
ring wraps at the tail and the number of remaining descriptors is not
a multiple of 4, SW will overwrite memory that does not belong to it and
cause a crash. The fix allocates 4 additional HW/SW entries at the tail
to avoid the overwrite.
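
The padding idea can be sketched on a toy ring as follows. This only models index arithmetic and is not the driver code: `struct toy_ring` and the `toy_*` helpers are hypothetical, and `DESCS_PER_LOOP` is assumed to be 4 as stated above.

```c
#include <stdint.h>
#include <stdlib.h>

#define DESCS_PER_LOOP 4 /* entries touched per vector iteration (assumed) */

struct toy_ring {
	uint32_t *slots;
	uint16_t nb_desc;      /* real entries */
	uint16_t nb_fake_desc; /* zeroed padding entries after the real ones */
};

/* Allocate nb_desc real slots plus DESCS_PER_LOOP padding slots. Returns 0 on success. */
static int
toy_ring_alloc(struct toy_ring *r, uint16_t nb_desc)
{
	r->nb_desc = nb_desc;
	r->nb_fake_desc = DESCS_PER_LOOP;
	r->slots = calloc((size_t)nb_desc + r->nb_fake_desc, sizeof(*r->slots));
	return r->slots == NULL ? -1 : 0;
}

/* Highest slot index a 4-wide access starting at 'pos' can touch. */
static uint32_t
toy_last_index_touched(uint16_t pos)
{
	return (uint32_t)pos + DESCS_PER_LOOP - 1;
}

/* Non-zero if 'pos' is a real entry and a 4-wide access there stays inside the allocation. */
static int
toy_access_in_bounds(const struct toy_ring *r, uint16_t pos)
{
	return pos < r->nb_desc &&
	       toy_last_index_touched(pos) <
	       (uint32_t)r->nb_desc + r->nb_fake_desc;
}
```

For example, with 10 real entries a 4-wide access at position 9 reaches index 12, which lies beyond the real ring but inside the padded allocation of 14 slots.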

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k.h|4 
 drivers/net/fm10k/fm10k_ethdev.c |   19 +--
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 68ae1b8..82a548f 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -177,12 +177,16 @@ struct fm10k_rx_queue {
struct rte_mbuf *pkt_last_seg;  /* Last segment of current packet. */
uint64_t hw_ring_phys_addr;
uint64_t mbuf_initializer; /* value to init mbufs */
+   /* need to alloc dummy mbuf, for wraparound when scanning hw ring */
+   struct rte_mbuf fake_mbuf;
uint16_t next_dd;
uint16_t next_alloc;
uint16_t next_trigger;
uint16_t alloc_thresh;
volatile uint32_t *tail_ptr;
uint16_t nb_desc;
+   /* Number of faked desc added at the tail for Vector RX function */
+   uint16_t nb_fake_desc;
uint16_t queue_id;
/* Below 2 fields only valid in case vPMD is applied. */
uint16_t rxrearm_nb; /* number of remaining to be re-armed */
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 469bd85..fb8ec0d 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -102,6 +102,7 @@ fm10k_mbx_unlock(struct fm10k_hw *hw)
 static inline int
 rx_queue_reset(struct fm10k_rx_queue *q)
 {
+   static const union fm10k_rx_desc zero = {{0}};
uint64_t dma_addr;
int i, diag;
PMD_INIT_FUNC_TRACE();
@@ -122,6 +123,15 @@ rx_queue_reset(struct fm10k_rx_queue *q)
q->hw_ring[i].q.hdr_addr = dma_addr;
}

+   /* initialize extra software ring entries. Space for these extra
+* entries is always allocated.
+*/
+   memset(&q->fake_mbuf, 0x0, sizeof(q->fake_mbuf));
+   for (i = 0; i < q->nb_fake_desc; ++i) {
+   q->sw_ring[q->nb_desc + i] = &q->fake_mbuf;
+   q->hw_ring[q->nb_desc + i] = zero;
+   }
+
q->next_dd = 0;
q->next_alloc = 0;
q->next_trigger = q->alloc_thresh - 1;
@@ -147,6 +157,10 @@ rx_queue_clean(struct fm10k_rx_queue *q)
for (i = 0; i < q->nb_desc; ++i)
q->hw_ring[i] = zero;

+   /* zero faked descriptors */
+   for (i = 0; i < q->nb_fake_desc; ++i)
+   q->hw_ring[q->nb_desc + i] = zero;
+
/* vPMD driver has a different way of releasing mbufs. */
if (q->rx_using_sse) {
fm10k_rx_queue_release_mbufs_vec(q);
@@ -1323,6 +1337,7 @@ fm10k_rx_queue_setup(struct rte_eth_dev *dev, uint16_t 
queue_id,
/* setup queue */
q->mp = mp;
q->nb_desc = nb_desc;
+   q->nb_fake_desc = FM10K_MULT_RX_DESC;
q->port_id = dev->data->port_id;
q->queue_id = queue_id;
q->tail_ptr = (volatile uint32_t *)
@@ -1332,8 +1347,8 @@ fm10k_rx_queue_setup(struct rte_eth_dev *dev, uint16_t 
queue_id,

/* allocate memory for the software ring */
q->sw_ring = rte_zmalloc_socket("fm10k sw ring",
-   nb_desc * sizeof(struct rte_mbuf *),
-   RTE_CACHE_LINE_SIZE, socket_id);
+   (nb_desc + q->nb_fake_desc) * sizeof(struct rte_mbuf *),
+   RTE_CACHE_LINE_SIZE, socket_id);
if (q->sw_ring == NULL) {
PMD_INIT_LOG(ERR, "Cannot allocate software ring");
rte_free(q);
-- 
1.7.7.6



[dpdk-dev] [PATCH v3 14/16] fm10k: Add function to decide best TX func

2015-10-27 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

Add function fm10k_set_tx_function, called from fm10k_dev_tx_init, to
choose the best TX function.
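
The selection rule can be summarized in a small standalone sketch: the vector path is used only if every TX queue sets both "simple" flags. The `TOY_*` flag values and names below are placeholders for illustration, not the real ETH_TXQ_FLAGS_* constants.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder flag values, for illustration only. */
#define TOY_TXQ_NOMULTSEGS 0x1u
#define TOY_TXQ_NOOFFLOADS 0x2u
#define TOY_SIMPLE_TX_FLAG (TOY_TXQ_NOMULTSEGS | TOY_TXQ_NOOFFLOADS)

enum toy_tx_path { TOY_TX_SCALAR, TOY_TX_VECTOR };

/*
 * Pick one TX path for the whole device: a single queue that needs
 * multi-segment or offload support forces the scalar path for all queues.
 */
static enum toy_tx_path
toy_choose_tx_path(const uint32_t *txq_flags, size_t nb_queues)
{
	size_t i;

	for (i = 0; i < nb_queues; i++)
		if ((txq_flags[i] & TOY_SIMPLE_TX_FLAG) != TOY_SIMPLE_TX_FLAG)
			return TOY_TX_SCALAR;
	return TOY_TX_VECTOR;
}
```

The choice is per device rather than per queue because a single burst function pointer serves all queues of the port.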

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k.h|1 +
 drivers/net/fm10k/fm10k_ethdev.c |   38 --
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 2bead12..68ae1b8 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -222,6 +222,7 @@ struct fm10k_tx_queue {
uint16_t next_rs; /* Next pos to set RS flag */
uint16_t next_dd; /* Next pos to check DD flag */
volatile uint32_t *tail_ptr;
+   uint32_t txq_flags; /* Holds flags for this TXq */
uint16_t nb_desc;
uint8_t port_id;
uint8_t tx_deferred_start; /** < don't start this queue in dev start. */
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 88bd887..469bd85 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -53,6 +53,9 @@
 #define CHARS_PER_UINT32 (sizeof(uint32_t))
 #define BIT_MASK_PER_UINT32 ((1 << CHARS_PER_UINT32) - 1)

+#define FM10K_SIMPLE_TX_FLAG ((uint32_t)ETH_TXQ_FLAGS_NOMULTSEGS | \
+   ETH_TXQ_FLAGS_NOOFFLOADS)
+
 static void fm10k_close_mbx_service(struct fm10k_hw *hw);
 static void fm10k_dev_promiscuous_enable(struct rte_eth_dev *dev);
 static void fm10k_dev_promiscuous_disable(struct rte_eth_dev *dev);
@@ -68,6 +71,7 @@ fm10k_MACVLAN_remove_all(struct rte_eth_dev *dev);
 static void fm10k_tx_queue_release(void *queue);
 static void fm10k_rx_queue_release(void *queue);
 static void fm10k_set_rx_function(struct rte_eth_dev *dev);
+static void fm10k_set_tx_function(struct rte_eth_dev *dev);

 static void
 fm10k_mbx_initlock(struct fm10k_hw *hw)
@@ -414,6 +418,10 @@ fm10k_dev_tx_init(struct rte_eth_dev *dev)
base_addr >> (CHAR_BIT * sizeof(uint32_t)));
FM10K_WRITE_REG(hw, FM10K_TDLEN(i), size);
}
+
+   /* set up vector or scalar TX function as appropriate */
+   fm10k_set_tx_function(dev);
+
return 0;
 }

@@ -980,8 +988,7 @@ fm10k_dev_infos_get(struct rte_eth_dev *dev,
},
.tx_free_thresh = FM10K_TX_FREE_THRESH_DEFAULT(0),
.tx_rs_thresh = FM10K_TX_RS_THRESH_DEFAULT(0),
-   .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
-   ETH_TXQ_FLAGS_NOOFFLOADS,
+   .txq_flags = FM10K_SIMPLE_TX_FLAG,
};

 }
@@ -1479,6 +1486,7 @@ fm10k_tx_queue_setup(struct rte_eth_dev *dev, uint16_t 
queue_id,
q->nb_desc = nb_desc;
q->port_id = dev->data->port_id;
q->queue_id = queue_id;
+   q->txq_flags = conf->txq_flags;
q->ops = _txq_ops;
q->tail_ptr = (volatile uint32_t *)
&((uint32_t *)hw->hw_addr)[FM10K_TDT(queue_id)];
@@ -2090,6 +2098,32 @@ static const struct eth_dev_ops fm10k_eth_dev_ops = {
 };

 static void __attribute__((cold))
+fm10k_set_tx_function(struct rte_eth_dev *dev)
+{
+   struct fm10k_tx_queue *txq;
+   int i;
+   int use_sse = 1;
+
+   for (i = 0; i < dev->data->nb_tx_queues; i++) {
+   txq = dev->data->tx_queues[i];
+   if ((txq->txq_flags & FM10K_SIMPLE_TX_FLAG) != \
+   FM10K_SIMPLE_TX_FLAG) {
+   use_sse = 0;
+   break;
+   }
+   }
+
+   if (use_sse) {
+   for (i = 0; i < dev->data->nb_tx_queues; i++) {
+   txq = dev->data->tx_queues[i];
+   fm10k_txq_vec_setup(txq);
+   }
+   dev->tx_pkt_burst = fm10k_xmit_pkts_vec;
+   } else
+   dev->tx_pkt_burst = fm10k_xmit_pkts;
+}
+
+static void __attribute__((cold))
 fm10k_set_rx_function(struct rte_eth_dev *dev)
 {
struct fm10k_dev_info *dev_info = FM10K_DEV_PRIVATE_TO_INFO(dev);
-- 
1.7.7.6



[dpdk-dev] [PATCH v3 13/16] fm10k: introduce 2 funcs to reset TX queue and mbuf release

2015-10-27 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" <jing.d.c...@intel.com>

Add 2 functions to reset the TX queue and release mbufs when Vector TX
is applied.
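
As a rough sketch of the mechanism: the queue carries a pointer to a small table of function pointers, and enabling the vector path just swaps that pointer. All `toy_*` names and the `released_by` field are hypothetical and exist only to make the dispatch observable.

```c
#include <stddef.h>

struct toy_txq;

/* Per-queue operations, selected once at setup time. */
struct toy_txq_ops {
	void (*release_mbufs)(struct toy_txq *txq);
	void (*reset)(struct toy_txq *txq);
};

struct toy_txq {
	const struct toy_txq_ops *ops;
	int released_by;    /* 0 = none, 1 = scalar, 2 = vector (demo only) */
	unsigned next_free;
};

static void toy_release_scalar(struct toy_txq *txq) { txq->released_by = 1; }
static void toy_release_vec(struct toy_txq *txq) { txq->released_by = 2; }
static void toy_reset(struct toy_txq *txq) { txq->next_free = 0; }

static const struct toy_txq_ops toy_def_ops = {
	.release_mbufs = toy_release_scalar,
	.reset = toy_reset,
};

static const struct toy_txq_ops toy_vec_ops = {
	.release_mbufs = toy_release_vec,
	.reset = toy_reset,
};

/* Switch a queue to the vector variants of its operations. */
static void
toy_txq_vec_setup(struct toy_txq *txq)
{
	txq->ops = &toy_vec_ops;
}

/* Generic teardown: callers never need to know which path is active. */
static void
toy_txq_teardown(struct toy_txq *txq)
{
	txq->ops->release_mbufs(txq);
	txq->ops->reset(txq);
}
```

This keeps the common queue stop/release code free of "is the vector path active?" checks.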

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k_rxtx_vec.c |   68 
 1 files changed, 68 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c 
b/drivers/net/fm10k/fm10k_rxtx_vec.c
index e802eec..7ef7910 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -44,6 +44,11 @@
 #pragma GCC diagnostic ignored "-Wcast-qual"
 #endif

+static void
+fm10k_tx_queue_release_mbufs_vec(struct fm10k_tx_queue *txq);
+static void
+fm10k_reset_tx_queue(struct fm10k_tx_queue *txq);
+
 /* Handling the offload flags (olflags) field takes computation
  * time when receiving packets. Therefore we provide a flag to disable
  * the processing of the olflags field when they are not needed. This
@@ -620,6 +625,17 @@ fm10k_recv_scattered_pkts_vec(void *rx_queue,
_flags[i]);
 }

+static const struct fm10k_txq_ops vec_txq_ops = {
+   .release_mbufs = fm10k_tx_queue_release_mbufs_vec,
+   .reset = fm10k_reset_tx_queue,
+};
+
+void __attribute__((cold))
+fm10k_txq_vec_setup(struct fm10k_tx_queue *txq)
+{
+   txq->ops = &vec_txq_ops;
+}
+
 static inline void
 vtx1(volatile struct fm10k_tx_desc *txdp,
struct rte_mbuf *pkt, uint64_t flags)
@@ -769,3 +785,55 @@ fm10k_xmit_pkts_vec(void *tx_queue, struct rte_mbuf 
**tx_pkts,

return nb_pkts;
 }
+
+static void __attribute__((cold))
+fm10k_tx_queue_release_mbufs_vec(struct fm10k_tx_queue *txq)
+{
+   unsigned i;
+   const uint16_t max_desc = (uint16_t)(txq->nb_desc - 1);
+
+   if (txq->sw_ring == NULL || txq->nb_free == max_desc)
+   return;
+
+   /* release the used mbufs in sw_ring */
+   for (i = txq->next_dd - (txq->rs_thresh - 1);
+i != txq->next_free;
+i = (i + 1) & max_desc)
+   rte_pktmbuf_free_seg(txq->sw_ring[i]);
+
+   txq->nb_free = max_desc;
+
+   /* reset tx_entry */
+   for (i = 0; i < txq->nb_desc; i++)
+   txq->sw_ring[i] = NULL;
+
+   rte_free(txq->sw_ring);
+   txq->sw_ring = NULL;
+}
+
+static void __attribute__((cold))
+fm10k_reset_tx_queue(struct fm10k_tx_queue *txq)
+{
+   static const struct fm10k_tx_desc zeroed_desc = {0};
+   struct rte_mbuf **txe = txq->sw_ring;
+   uint16_t i;
+
+   /* Zero out HW ring memory */
+   for (i = 0; i < txq->nb_desc; i++)
+   txq->hw_ring[i] = zeroed_desc;
+
+   /* Initialize SW ring entries */
+   for (i = 0; i < txq->nb_desc; i++)
+   txe[i] = NULL;
+
+   txq->next_dd = (uint16_t)(txq->rs_thresh - 1);
+   txq->next_rs = (uint16_t)(txq->rs_thresh - 1);
+
+   txq->next_free = 0;
+   txq->nb_used = 0;
+   /* Always allow 1 descriptor to be un-allocated to avoid
+* a H/W race condition
+*/
+   txq->nb_free = (uint16_t)(txq->nb_desc - 1);
+   FM10K_PCI_REG_WRITE(txq->tail_ptr, 0);
+}
-- 
1.7.7.6


