[dpdk-dev] i40e: problem with rx packet drops not accounted in statistics

2015-10-22 Thread Zhang, Helin
Hi Martin

Yes, we have a developer working on it now, and hopefully he will have
something on this fix soon.
But what do you mean by the performance problem? Do you mean the performance
numbers are not as good as expected, or something else?

Regards,
Helin

> -Original Message-
> From: Martin Weiser [mailto:martin.weiser at allegro-packets.com]
> Sent: Wednesday, October 21, 2015 4:44 PM
> To: Zhang, Helin
> Cc: dev at dpdk.org
> Subject: Re: i40e: problem with rx packet drops not accounted in statistics
> 
> Hi Helin,
> 
> any news on this issue? By the way, this is not just a statistics problem for
> us but also a performance problem, since these packet discards start appearing
> at a relatively low bandwidth (~5 GBit/s and ~1.5 Mpps).
> 
> Best regards,
> Martin
> 
> On 10.09.15 03:09, Zhang, Helin wrote:
> > Hi Martin
> >
> > Yes, the statistics issue has been reported several times recently.
> > We will check the issue and try to fix it or get a workaround soon.
> > Thank you very much!
> >
> > Regards,
> > Helin
> >
> >> -Original Message-
> >> From: Martin Weiser [mailto:martin.weiser at allegro-packets.com]
> >> Sent: Wednesday, September 9, 2015 7:58 PM
> >> To: Zhang, Helin
> >> Cc: dev at dpdk.org
> >> Subject: i40e: problem with rx packet drops not accounted in
> >> statistics
> >>
> >> Hi Helin,
> >>
> >> in one of our test setups involving i40e adapters we are experiencing
> >> packet drops which are not reflected in the interface statistics.
> >> The call to rte_eth_stats_get suggests that all packets were properly
> >> received but the total number of packets received through
> >> rte_eth_rx_burst is less than the ipackets counter.
> >> When, for example, running the l2fwd application (l2fwd -c 0xfe -n 4 -- -p 0x3)
> >> with driver debug messages enabled, the following output is generated for
> >> the interface in question:
> >>
> >> ...
> >> PMD: i40e_update_vsi_stats(): * VSI[6] stats start ***
> >> PMD: i40e_update_vsi_stats(): rx_bytes:24262434
> >> PMD: i40e_update_vsi_stats(): rx_unicast:  16779
> >> PMD: i40e_update_vsi_stats(): rx_multicast:0
> >> PMD: i40e_update_vsi_stats(): rx_broadcast:0
> >> PMD: i40e_update_vsi_stats(): rx_discards: 1192557
> >> PMD: i40e_update_vsi_stats(): rx_unknown_protocol: 0
> >> PMD: i40e_update_vsi_stats(): tx_bytes:0
> >> PMD: i40e_update_vsi_stats(): tx_unicast:  0
> >> PMD: i40e_update_vsi_stats(): tx_multicast:0
> >> PMD: i40e_update_vsi_stats(): tx_broadcast:0
> >> PMD: i40e_update_vsi_stats(): tx_discards: 0
> >> PMD: i40e_update_vsi_stats(): tx_errors:   0
> >> PMD: i40e_update_vsi_stats(): * VSI[6] stats end ***
> >> PMD: i40e_dev_stats_get(): * PF stats start ***
> >> PMD: i40e_dev_stats_get(): rx_bytes:24262434
> >> PMD: i40e_dev_stats_get(): rx_unicast:  16779
> >> PMD: i40e_dev_stats_get(): rx_multicast:0
> >> PMD: i40e_dev_stats_get(): rx_broadcast:0
> >> PMD: i40e_dev_stats_get(): rx_discards: 0
> >> PMD: i40e_dev_stats_get(): rx_unknown_protocol: 16779
> >> PMD: i40e_dev_stats_get(): tx_bytes:0
> >> PMD: i40e_dev_stats_get(): tx_unicast:  0
> >> PMD: i40e_dev_stats_get(): tx_multicast:0
> >> PMD: i40e_dev_stats_get(): tx_broadcast:0
> >> PMD: i40e_dev_stats_get(): tx_discards: 0
> >> PMD: i40e_dev_stats_get(): tx_errors:   0
> >> PMD: i40e_dev_stats_get(): tx_dropped_link_down: 0
> >> PMD: i40e_dev_stats_get(): crc_errors:   0
> >> PMD: i40e_dev_stats_get(): illegal_bytes:0
> >> PMD: i40e_dev_stats_get(): error_bytes:  0
> >> PMD: i40e_dev_stats_get(): mac_local_faults: 1
> >> PMD: i40e_dev_stats_get(): mac_remote_faults:1
> >> PMD: i40e_dev_stats_get(): rx_length_errors: 0
> >> PMD: i40e_dev_stats_get(): link_xon_rx:  0
> >> PMD: i40e_dev_stats_get(): link_xoff_rx: 0
> >> PMD: i40e_dev_stats_get(): priority_xon_rx[0]:  0
> >> PMD: i40e_dev_stats_get(): priority_xoff_rx[0]: 0
> >> PMD: i40e_dev_stats_get(): priority_xon_rx[1]:  0
> >> PMD: i40e_dev_stats_get(): priority_xoff_rx[1]: 0
> >> PMD: i40e_dev_stats_get(): priority_xon_rx[2]:  0
> >> PMD: i40e_dev_stats_get(): priority_xoff_rx[2]: 0
> >> PMD: i40e_dev_stats_get(): priority_xon_rx[3]:  0
> >> PMD: i40e_dev_stats_get(): priority_xoff_rx[3]: 0
> >> PMD: i40e_dev_stats_get(): priority_xon_rx[4]:  0
> >> PMD: i40e_dev_stats_get(): priority_xoff_rx[4]: 0
> >> PMD: i40e_dev_stats_get(): priority_xon_rx[5]:  0
> >> PMD: i40e_dev_stats_get(): priority_xoff_rx[5]: 0
> >> PMD: i40e_dev_stats_get(): priority_xon_rx[6]:  0
> >> PMD: i40e_dev_stats_get(): priority_xoff_rx[6]: 0
> >> PMD: i40e_dev_stats_get(
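
A minimal sketch of the kind of check Martin describes: count what
rte_eth_rx_burst() actually returns and compare it against the ipackets
counter from rte_eth_stats_get(). Queue 0, the burst size and the iteration
count are placeholders, not taken from the original report.

#include <stdio.h>
#include <inttypes.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

/* Poll one rx queue for a while, then compare the application's own
 * packet count with the driver's ipackets counter; on the affected
 * setup ipackets ends up larger than what the application received. */
static void
check_rx_accounting(uint8_t port_id)
{
	struct rte_mbuf *bufs[32];
	struct rte_eth_stats stats;
	uint64_t sw_rx = 0;
	unsigned iter;

	for (iter = 0; iter < 1000000; iter++) {
		uint16_t i, n = rte_eth_rx_burst(port_id, 0, bufs, 32);

		sw_rx += n;
		for (i = 0; i < n; i++)
			rte_pktmbuf_free(bufs[i]);
	}

	rte_eth_stats_get(port_id, &stats);
	printf("ipackets=%" PRIu64 " sw_rx=%" PRIu64 "\n",
	       stats.ipackets, sw_rx);
}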

[dpdk-dev] [PATCH 5/6] doc: Update BNX2X PMD documentation

2015-10-22 Thread Rasesh Mody
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Tuesday, October 20, 2015 9:20 AM
>
> This patch can be avoided by updating the documentation with each code
> change atomically.

Agreed, will take care in next submission.


[dpdk-dev] [PATCH 4/6] config: Enable BNX2X driver build by default

2015-10-22 Thread Rasesh Mody
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Tuesday, October 20, 2015 9:18 AM
>
> 2015-10-08 09:54, Rasesh Mody:
> > From: Harish Patil 
> >
> > Signed-off-by: Harish Patil 
> > Signed-off-by: Rasesh Mody 
>
> You cannot enable bnx2x without gracefully handling a missing zlib header.

Ok, will resubmit this.
Thanks!
Rasesh


[dpdk-dev] [PATCH v2 6/6] ixgbe: implementation for fdir new modes' config

2015-10-22 Thread Lu, Wenzhuo
Hi Konstantin,

> -Original Message-
> From: Ananyev, Konstantin
> Sent: Wednesday, October 21, 2015 6:19 PM
> To: Lu, Wenzhuo; dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v2 6/6] ixgbe: implementation for fdir new
> modes' config
> 
> 
> 
> > -Original Message-
> > From: Lu, Wenzhuo
> > Sent: Wednesday, October 21, 2015 2:48 AM
> > To: Ananyev, Konstantin; dev at dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH v2 6/6] ixgbe: implementation for fdir
> > new modes' config
> >
> > Hi Konstantin,
> >
> > > -Original Message-
> > > From: Ananyev, Konstantin
> > > Sent: Tuesday, October 20, 2015 9:56 PM
> > > To: Lu, Wenzhuo; dev at dpdk.org
> > > Subject: RE: [dpdk-dev] [PATCH v2 6/6] ixgbe: implementation for
> > > fdir new modes' config
> > >
> > > Hi Wenzhuo,
> > > Few questions/comments from me, see below.
> > > Thanks
> > > Konstantin
> > >
> > > > -Original Message-
> > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Wenzhuo Lu
> > > > Sent: Tuesday, September 29, 2015 6:31 AM
> > > > To: dev at dpdk.org
> > > > Subject: [dpdk-dev] [PATCH v2 6/6] ixgbe: implementation for fdir
> > > > new modes' config
> > > >
> > > > Implement the new CLIs for the fdir mac vlan and tunnel modes,
> > > > including flow_director_filter and flow_director_mask. Set the fdir
> > > > mask. Add, delete or update filter entries.
> > > >
> > > > Signed-off-by: Wenzhuo Lu 
> > > > ---
> > > >  drivers/net/ixgbe/ixgbe_ethdev.h |   3 +
> > > >  drivers/net/ixgbe/ixgbe_fdir.c   | 241
> > > ---
> > > >  2 files changed, 202 insertions(+), 42 deletions(-)
> > > >
> > > > diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h
> > > > b/drivers/net/ixgbe/ixgbe_ethdev.h
> > > > index c3d4f4f..9cc45a0 100644
> > > > --- a/drivers/net/ixgbe/ixgbe_ethdev.h
> > > > +++ b/drivers/net/ixgbe/ixgbe_ethdev.h
> > > > @@ -133,6 +133,9 @@ struct ixgbe_hw_fdir_mask {
> > > > uint16_t src_port_mask;
> > > > uint16_t dst_port_mask;
> > > > uint16_t flex_bytes_mask;
> > > > +   uint8_t  mac_addr_mask;
> > > > +   uint32_t tunnel_id_mask;
> > > > +   uint8_t  tunnel_type_mask;
> > > >  };
> > > >
> > > >  struct ixgbe_hw_fdir_info {
> > > > diff --git a/drivers/net/ixgbe/ixgbe_fdir.c
> > > > b/drivers/net/ixgbe/ixgbe_fdir.c index 5c8b833..87e7081 100644
> > > > --- a/drivers/net/ixgbe/ixgbe_fdir.c
> > > > +++ b/drivers/net/ixgbe/ixgbe_fdir.c
> > > > @@ -105,6 +105,8 @@
> > > > rte_memcpy((ipaddr), ipv6_addr, sizeof(ipv6_addr));\  } while
> > > > (0)
> > > >
> > > > +#define DEFAULT_VXLAN_PORT 4789
> > > > +
> > > >  static int fdir_erase_filter_82599(struct ixgbe_hw *hw, uint32_t
> > > > fdirhash);  static int fdir_set_input_mask_82599(struct rte_eth_dev
> *dev,
> > > > const struct rte_eth_fdir_masks *input_mask); @@ -113,7
> > > +115,8 @@
> > > > static int ixgbe_set_fdir_flex_conf(struct rte_eth_dev *dev,
> > > > static int fdir_enable_82599(struct ixgbe_hw *hw, uint32_t
> > > > fdirctrl);  static int ixgbe_fdir_filter_to_atr_input(
> > > > const struct rte_eth_fdir_filter *fdir_filter,
> > > > -   union ixgbe_atr_input *input);
> > > > +   union ixgbe_atr_input *input,
> > > > +   enum rte_fdir_mode mode);
> > > >  static uint32_t ixgbe_atr_compute_hash_82599(union
> > > > ixgbe_atr_input
> > > *atr_input,
> > > >  uint32_t key);
> > > >  static uint32_t atr_compute_sig_hash_82599(union ixgbe_atr_input
> > > > *input, @@ -122,7 +125,8 @@ static uint32_t
> > > atr_compute_perfect_hash_82599(union ixgbe_atr_input *input,
> > > > enum rte_fdir_pballoc_type pballoc);  static int
> > > > fdir_write_perfect_filter_82599(struct ixgbe_hw *hw,
> > > > union ixgbe_atr_input *input, uint8_t queue,
> > > > -   uint32_t fdircmd, uint32_t fdirhash);
> > > > +   uint32_t fdircmd, uint32_t fdirhash,
> > > > +   enum rte_fdir_mode mode);
> > > >  static int fdir_add_signature_filter_82599(struct ixgbe_hw *hw,
> > > > union ixgbe_atr_input *input, u8 queue, uint32_t 
> > > > fdircmd,
> > > > uint32_t fdirhash);
> > > > @@ -243,9 +247,15 @@ configure_fdir_flags(const struct
> > > > rte_fdir_conf
> > > *conf, uint32_t *fdirctrl)
> > > > *fdirctrl |= (IXGBE_DEFAULT_FLEXBYTES_OFFSET / sizeof(uint16_t))
> > > <<
> > > >  IXGBE_FDIRCTRL_FLEX_SHIFT;
> > > >
> > > > -   if (conf->mode == RTE_FDIR_MODE_PERFECT) {
> > > > +   if (conf->mode >= RTE_FDIR_MODE_PERFECT) {
> > >
> > > I think better: if (conf->mode >= RTE_FDIR_MODE_PERFECT &&
> > > conf->mode <= RTE_FDIR_MODE_PERFECT_TUNNEL), to make sure that future
> > > expansion of RTE_FDIR_MODE_* wouldn't break that code.
> > Yes, you're right. I'll change it.
> >
> > >
> > > > *fdirctrl |= IXGBE_FDIRCTRL_PERFECT_MATCH;
> > 

[dpdk-dev] [PATCH 1/3] fm10k: add multi-queue checking

2015-10-22 Thread Qiu, Michael
On 2015/10/15 19:07, He, Shaopeng wrote:
> Hi, Michael
>
>> -Original Message-
>> From: Qiu, Michael
>> Sent: Thursday, October 15, 2015 2:28 PM
>> To: He, Shaopeng; dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH 1/3] fm10k: add multi-queue checking
>>
>> On 2015/9/30 15:29, Shaopeng He wrote:
>>> Add multi-queue checking in device configure process.
>>> Currently, VMDQ and RSS are supported.
>>>
>>> Signed-off-by: Shaopeng He 
>>> ---
>>>  drivers/net/fm10k/fm10k_ethdev.c | 44
>>> 
>>>  1 file changed, 44 insertions(+)
>>>
>>> diff --git a/drivers/net/fm10k/fm10k_ethdev.c
>>> b/drivers/net/fm10k/fm10k_ethdev.c
>>> index a69c990..082937d 100644
>>> --- a/drivers/net/fm10k/fm10k_ethdev.c
>>> +++ b/drivers/net/fm10k/fm10k_ethdev.c
>>> @@ -283,12 +283,56 @@ tx_queue_disable(struct fm10k_hw *hw,
>> uint16_t
>>> qnum)  }
>>>
>>>  static int
>>> +fm10k_check_mq_mode(struct rte_eth_dev *dev) {
>>> +   enum rte_eth_rx_mq_mode rx_mq_mode = dev->data-
>>> dev_conf.rxmode.mq_mode;
>>> +   struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data-
>>> dev_private);
>>> +   struct rte_eth_vmdq_rx_conf *vmdq_conf;
>>> +   uint16_t nb_rx_q = dev->data->nb_rx_queues;
>>> +
>>> +   vmdq_conf = &dev->data->dev_conf.rx_adv_conf.vmdq_rx_conf;
>>> +
>>> +   if (rx_mq_mode & ETH_MQ_RX_DCB_FLAG) {
>>> +   PMD_INIT_LOG(ERR, "DCB mode is not supported.");
>>> +   return -EINVAL;
>>> +   }
>>> +
>>> +   if (!(rx_mq_mode & ETH_MQ_RX_VMDQ_FLAG))
>>> +   return 0;
>>> +
>>> +   if (hw->mac.type == fm10k_mac_vf) {
>>> +   PMD_INIT_LOG(ERR, "VMDQ mode is not supported in VF.");
>>> +   return -EINVAL;
>>> +   }
>> I think the VF check should be the first one; then we do not need to check
>> the DCB and VMDq flags.
>>
>> Thanks,
>> Michael
> Thanks for the comments. There is a case of RSS support on a VF; if the VF
> check were the first one, that case would fail, which is not correct.

OK, you are right.

Thanks,
Michael
> Thanks,
> --Shaopeng
>>> +
>>> +   /* Check VMDQ queue pool number */
>>> +   if (vmdq_conf->nb_queue_pools >
>>> +   sizeof(vmdq_conf->pool_map[0].pools) * CHAR_BIT
>> ||
>>> +   vmdq_conf->nb_queue_pools > nb_rx_q) {
>>> +   PMD_INIT_LOG(ERR, "Too many of queue pools: %d",
>>> +   vmdq_conf->nb_queue_pools);
>>> +   return -EINVAL;
>>> +   }
>>> +
>>> +   return 0;
>>> +}
>>> +
>>> +static int
>>>  fm10k_dev_configure(struct rte_eth_dev *dev)  {
>>> +   int ret;
>>> +
>>> PMD_INIT_FUNC_TRACE();
>>>
>>> if (dev->data->dev_conf.rxmode.hw_strip_crc == 0)
>>> PMD_INIT_LOG(WARNING, "fm10k always strip CRC");
>>> +   /* multiple queue mode checking */
>>> +   ret  = fm10k_check_mq_mode(dev);
>>> +   if (ret != 0) {
>>> +   PMD_DRV_LOG(ERR, "fm10k_check_mq_mode fails
>> with %d.",
>>> +   ret);
>>> +   return ret;
>>> +   }
>>>
>>> return 0;
>>>  }
>



[dpdk-dev] DPDK patch backlog

2015-10-22 Thread Qiu, Michael
On 2015/10/16 22:25, Neil Horman wrote:
> On Fri, Oct 16, 2015 at 10:45:23AM +0200, Thomas Monjalon wrote:
>> 2015-10-15 14:44, Stephen Hemminger:
>>> There are currently 428 patches in New state in DPDK patchwork.
>>>
>>> Thomas, could you start reducing that backlog?
>> Yes
>>
>>> The simplest solution would be to merge some of the big patch series
>>> from Intel for the base drivers, then reviewers can focus on the other
>>> patches.
>> That's why having a drivers/net subtree would be useful.
>>
> Agreed, a dpdk-next tree would really be the solution here.

Can't agree more :)

Thanks,
Michael
> Neil
>
>



[dpdk-dev] DPDK patch backlog

2015-10-22 Thread Qiu, Michael
On 2015/10/21 17:05, Thomas Monjalon wrote:
> 2015-10-21 11:48, Panu Matilainen:
>> On 10/21/2015 11:25 AM, Thomas Monjalon wrote:
>>> 2015-10-20 21:34, Stephen Hemminger:
 Patch backlog is not getting better, now at 486.

 How can we break this logjam?
 Do I need to make a new "ready for merge" tree?
>>> What would mean "ready for merge"?
>>> A lot of patches are acked but do not compile, or their doc is missing.
>> Well, isn't that one quite reasonable definition of being "ready"?
>> - patch must be acked
>> - patch must apply and compile (when relevant)
>> - is appropriately documented (commit message style and all)
> Yes.
> Compilation must be tested with GCC and clang, as static and shared libraries
> and for 32-bit and 64-bit targets.
> Documented means good commit message and doc or release notes updated.

What about bug fix patches?

Thanks,
Michael
>



[dpdk-dev] [PATCH] fix lpm bugs

2015-10-22 Thread mablexidana
hi:
Fixes: 25e4f515fe63 ("fix lpm bugs")




   In random LPM tests, with multiple deletions and additions of IP routes, a
lookup does not fall back to the correct remaining route.
  eg1:
   add a lot of routes:
 rule id :  1, ip : 16.32.0.0/19, next_hop : 62,
 rule id :  2, ip : 16.32.28.0/22, next_hop : 97,
 rule id : 28, ip : 16.32.0.0/21, next_hop : 36,
 ...
When you delete rule id 3 and then look up 16.32.0.150, the result is
16.32.28.0/22 instead of 16.32.0.0/19. This is because in the
delete_depth_small function, when lpm->tbl24[i].ext_entry == 0 and
lpm->tbl24[i].depth > depth, the code still falls into the tbl8 processing
path; the next_hop is then treated as a tbl8 group index (tbl8_gindex) and the
lpm->tbl8[j] data is wrongly modified.
fix: + else if (lpm->tbl24[i].ext_entry == 1) {


eg2:
Adding, deleting and then adding again also causes a problem.
In the delete_depth_small function, the valid_group of the new struct
rte_lpm_tbl8_entry is INVALID, so when lpm->tbl8[j] = new_tbl8_entry is
processed, the valid_group flag is overwritten. Then, when a route with
depth > 24 is added and a tbl8 group is allocated, tbl8_alloc returns that
group as a new index, and its data is wrongly rewritten.
fix:+ .valid_group = VALID,
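
A minimal sketch of a unit test exercising eg1 against the 2.x LPM API,
assuming the deleted rule is the covering 16.32.0.0/21 entry; the table name
and sizes are illustrative:

#include <stdio.h>
#include <rte_ip.h>
#include <rte_lpm.h>

/* eg1: after deleting the /21, a lookup that used to match it should
 * fall back to the covering /19 (next hop 62), not to tbl8 data that
 * belongs to the 16.32.28.0/22 rule. */
static void
lpm_delete_fallback_test(void)
{
	struct rte_lpm *lpm = rte_lpm_create("lpm_test", 0, 256, 0);
	uint8_t nh = 0;

	rte_lpm_add(lpm, IPv4(16, 32, 0, 0), 19, 62);
	rte_lpm_add(lpm, IPv4(16, 32, 28, 0), 22, 97);
	rte_lpm_add(lpm, IPv4(16, 32, 0, 0), 21, 36);

	rte_lpm_delete(lpm, IPv4(16, 32, 0, 0), 21);

	if (rte_lpm_lookup(lpm, IPv4(16, 32, 0, 150), &nh) == 0)
		printf("next_hop = %u (expected 62)\n", nh);

	rte_lpm_free(lpm);
}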




Thanks. I will provide the testing program later.




regards


yuerxin


At 2015-10-21 19:07:49, "Bruce Richardson"  wrote:
>On Wed, Oct 21, 2015 at 05:54:13PM +0800, mablexidana wrote:
>> hi:
>> We test some lpm cases and find some bugs, below is how to fix it. 
>> thanks :)
>
>Hi,
>
>thanks for the patch. Could you perhaps provide a description of how to
>reproduce the bug (or bugs you are fixing), so that we can reproduce them and
>verify the fix? (A unit test added to the existing lpm unit tests for this
>would be the best solution.)
>For the patch itself, the commit message should also describe the bug, and
>how the patch fixes it. It's also good to include a one-line "Fixes:" line
>in the comment - generated by using the git alias "fixline" added as:
>   fixline = log -1 --abbrev=12 --format='Fixes: %h (\"%s\")'
>
>Regards,
>/Bruce
>
>> ---
>>  lib/librte_lpm/rte_lpm.c | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>> 
>> 
>> diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
>> index 163ba3c..b5199ff 100644
>> --- a/lib/librte_lpm/rte_lpm.c
>> +++ b/lib/librte_lpm/rte_lpm.c
>> @@ -735,7 +735,7 @@ delete_depth_small(struct rte_lpm *lpm, uint32_t 
>> ip_masked,
>> lpm->tbl24[i].depth <= depth ) {
>> lpm->tbl24[i].valid = INVALID;
>> }
>> -   else {
>> +   else if (lpm->tbl24[i].ext_entry == 1){
>> /*
>>  * If TBL24 entry is extended, then there has
>>  * to be a rule with depth >= 25 in the
>> @@ -770,6 +770,7 @@ delete_depth_small(struct rte_lpm *lpm, uint32_t 
>> ip_masked,
>> 
>> 
>> struct rte_lpm_tbl8_entry new_tbl8_entry = {
>> .valid = VALID,
>> +   .valid_group = VALID,
>> .depth = sub_rule_depth,
>> .next_hop = lpm->rules_tbl
>> [sub_rule_index].next_hop,
>> @@ -781,7 +782,7 @@ delete_depth_small(struct rte_lpm *lpm, uint32_t 
>> ip_masked,
>> lpm->tbl24[i].depth <= depth ) {
>> lpm->tbl24[i] = new_tbl24_entry;
>> }
>> -   else {
>> +   else  if (lpm->tbl24[i].ext_entry == 1) {
>> /*
>>  * If TBL24 entry is extended, then there has
>>  * to be a rule with depth >= 25 in the
>> --
>> 1.8.5.2 (Apple Git-48)


[dpdk-dev] [PATCH v3 6/7] virtio: simple tx routine

2015-10-22 Thread Tan, Jianfeng
On 10/22/2015 10:26 AM, Jianfeng wrote: 

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Huawei Xie
> Sent: Tuesday, October 20, 2015 11:30 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v3 6/7] virtio: simple tx routine
> 
> Changes in v3:
> - Remove return at the end of void function
> - Remove always_inline attribute for virtio_xmit_cleanup
> 
> Bulk free of mbufs when cleaning the used ring.
> The shift operation on idx could be saved if vq_free_cnt meant free slots
> rather than free descriptors.
> 
> TODO: rearrange vq data structure, pack the stats var together so that we
> could use one vec instruction to update all of them.
> 
> Signed-off-by: Huawei Xie 
> ---
>  drivers/net/virtio/virtio_ethdev.h  |  3 ++
>  drivers/net/virtio/virtio_rxtx_simple.c | 93
> +
>  2 files changed, 96 insertions(+)
> 
> diff --git a/drivers/net/virtio/virtio_ethdev.h
> b/drivers/net/virtio/virtio_ethdev.h
> index d7797ab..ae2d47d 100644
> --- a/drivers/net/virtio/virtio_ethdev.h
> +++ b/drivers/net/virtio/virtio_ethdev.h
> @@ -111,6 +111,9 @@ uint16_t virtio_xmit_pkts(void *tx_queue, struct
> rte_mbuf **tx_pkts,  uint16_t virtio_recv_pkts_vec(void *rx_queue, struct
> rte_mbuf **rx_pkts,
>   uint16_t nb_pkts);
> 
> +uint16_t virtio_xmit_pkts_simple(void *tx_queue, struct rte_mbuf
> **tx_pkts,
> + uint16_t nb_pkts);
> +
>  /*
>   * The VIRTIO_NET_F_GUEST_TSO[46] features permit the host to send us
>   * frames larger than 1514 bytes. We do not yet support software LRO diff --
> git a/drivers/net/virtio/virtio_rxtx_simple.c
> b/drivers/net/virtio/virtio_rxtx_simple.c
> index ef17562..a53d462 100644
> --- a/drivers/net/virtio/virtio_rxtx_simple.c
> +++ b/drivers/net/virtio/virtio_rxtx_simple.c
> @@ -288,6 +288,99 @@ virtio_recv_pkts_vec(void *rx_queue, struct
> rte_mbuf **rx_pkts,
>   return nb_pkts_received;
>  }
> 
> +#define VIRTIO_TX_FREE_THRESH 32
> +#define VIRTIO_TX_MAX_FREE_BUF_SZ 32
> +#define VIRTIO_TX_FREE_NR 32
> +/* TODO: vq->tx_free_cnt could mean num of free slots so we could avoid
> +shift */ static inline void virtio_xmit_cleanup(struct virtqueue *vq) {
> + uint16_t i, desc_idx;
> + int nb_free = 0;
> + struct rte_mbuf *m, *free[VIRTIO_TX_MAX_FREE_BUF_SZ];
> +
> + desc_idx = (uint16_t)(vq->vq_used_cons_idx &
> + ((vq->vq_nentries >> 1) - 1));
> + free[0] = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
> + nb_free = 1;
> +
> + for (i = 1; i < VIRTIO_TX_FREE_NR; i++) {
> + m = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
> + if (likely(m->pool == free[0]->pool))
> + free[nb_free++] = m;
> + else {
> + rte_mempool_put_bulk(free[0]->pool, (void **)free,
> + nb_free);
> + free[0] = m;
> + nb_free = 1;
> + }
> + }
> +
> + rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
> + vq->vq_used_cons_idx += VIRTIO_TX_FREE_NR;
> + vq->vq_free_cnt += (VIRTIO_TX_FREE_NR << 1); }
> +
> +uint16_t
> +virtio_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
> + uint16_t nb_pkts)
> +{
> + struct virtqueue *txvq = tx_queue;
> + uint16_t nb_used;
> + uint16_t desc_idx;
> + struct vring_desc *start_dp;
> + uint16_t nb_tail, nb_commit;
> + int i;
> + uint16_t desc_idx_max = (txvq->vq_nentries >> 1) - 1;
> +
> + nb_used = VIRTQUEUE_NUSED(txvq);
> + rte_compiler_barrier();
> +
> + nb_commit = nb_pkts = RTE_MIN((txvq->vq_free_cnt >> 1),
> nb_pkts);

Here, if nb_commit is zero, how about returning 0 immediately?

> + desc_idx = (uint16_t) (txvq->vq_avail_idx & desc_idx_max);
> + start_dp = txvq->vq_ring.desc;
> + nb_tail = (uint16_t) (desc_idx_max + 1 - desc_idx);
> +
> + if (nb_used >= VIRTIO_TX_FREE_THRESH)
> + virtio_xmit_cleanup(tx_queue);

Should this cleanup be put before vq_free_cnt is referenced? It may return
some descriptors to vq_free_cnt.
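
A sketch of the reordering these two comments suggest: run the cleanup before
vq_free_cnt is read, and bail out early when nothing can be committed (same
variables as in the patch above):

	nb_used = VIRTQUEUE_NUSED(txvq);
	rte_compiler_barrier();

	if (nb_used >= VIRTIO_TX_FREE_THRESH)
		virtio_xmit_cleanup(txvq); /* may give descriptors back to vq_free_cnt */

	nb_commit = nb_pkts = RTE_MIN((txvq->vq_free_cnt >> 1), nb_pkts);
	if (nb_commit == 0)
		return 0; /* no free slots, skip the ring setup entirely */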

> +
> + if (nb_commit >= nb_tail) {
> + for (i = 0; i < nb_tail; i++)
> + txvq->vq_descx[desc_idx + i].cookie = tx_pkts[i];
> + for (i = 0; i < nb_tail; i++) {
> + start_dp[desc_idx].addr =
> + RTE_MBUF_DATA_DMA_ADDR(*tx_pkts);
> + start_dp[desc_idx].len = (*tx_pkts)->pkt_len;
> + tx_pkts++;
> + desc_idx++;
> + }
> + nb_commit -= nb_tail;
> + desc_idx = 0;
> + }
> + for (i = 0; i < nb_commit; i++)
> + txvq->vq_descx[desc_idx + i].cookie = tx_pkts[i];
> + for (i = 0; i < nb_commit; i++) {
> + start_dp[desc_idx].addr =
> RTE_MBUF_DATA_DMA_ADDR(*tx_pkts);
> + start_dp[desc_idx].len = (*tx_pkts)->pkt_len;
> + tx_pkts++;
> +   

[dpdk-dev] volunteer to be the maintainer of driver/net/intel sub-tree

2015-10-22 Thread Lu, Wenzhuo
Hi all,
Following the DPDK Userspace discussion about the maintenance of development
sub-trees, I'd like to volunteer to be the maintainer of the sub-tree
driver/net/intel. It includes all the PMDs for Intel NICs, and Helin can be my
backup.
I suggest we create a new directory and move driver/net/e1000,
driver/net/fm10k... into it. We can also create directories for other vendors,
just like the kernel drivers do.

Additionally, as we have observed, some patch sets change not only files in
drivers/net but also files in lib/librte_ether, doc, app, examples...
Being only the drivers/net/intel maintainer cannot work for these patch sets,
especially for new features, and applying a partial feature patch set is not
ideal. Ideally we need a maintainer to drive the RTE_ETHER discussion. Maybe
Bruce can be a top-level maintainer, so he can help when we face this scenario.

Best regards
Wenzhuo Lu



[dpdk-dev] [PATCH v3 7/7] virtio: pick simple rx/tx func

2015-10-22 Thread Tan, Jianfeng
On 10/22/2015 10:45 AM, Jianfeng wrote:

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Huawei Xie
> Sent: Tuesday, October 20, 2015 11:30 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v3 7/7] virtio: pick simple rx/tx func
> 
> The simple rx/tx functions are enabled when the user specifies single-segment
> and no-offload support.
> Mergeable rx buffers should be disabled to use simple rxtx.
> 
> Signed-off-by: Huawei Xie 
> ---
>  drivers/net/virtio/virtio_rxtx.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/net/virtio/virtio_rxtx.c 
> b/drivers/net/virtio/virtio_rxtx.c
> index 947fc46..71f8cd4 100644
> --- a/drivers/net/virtio/virtio_rxtx.c
> +++ b/drivers/net/virtio/virtio_rxtx.c
> @@ -62,6 +62,10 @@
>  #define  VIRTIO_DUMP_PACKET(m, len) do { } while (0)  #endif
> 
> +
> +#define VIRTIO_SIMPLE_FLAGS ((uint32_t)ETH_TXQ_FLAGS_NOMULTSEGS
> | \
> + ETH_TXQ_FLAGS_NOOFFLOADS)
> +
>  static int use_simple_rxtx;
> 
>  static void
> @@ -471,6 +475,14 @@ virtio_dev_tx_queue_setup(struct rte_eth_dev
> *dev,
>   return -EINVAL;
>   }
> 
> + /* Use simple rx/tx func if single segment and no offloads */
> + if ((tx_conf->txq_flags & VIRTIO_SIMPLE_FLAGS) ==
> VIRTIO_SIMPLE_FLAGS) {
> + PMD_INIT_LOG(INFO, "Using simple rx/tx path");
> + dev->tx_pkt_burst = virtio_xmit_pkts_simple;
> + dev->rx_pkt_burst = virtio_recv_pkts_vec;

Whether receive-side mergeable buffers are supported is controlled by
virtio_negotiate_feature(), so "dev->rx_pkt_burst = virtio_recv_pkts_vec"
should be restricted by hw->guest_features & VIRTIO_NET_F_MRG_RXBUF, right?
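
A sketch of the extra guard being suggested, assuming hw is reachable in
virtio_dev_tx_queue_setup() and using the PMD's existing vtpci_with_feature()
helper:

	/* The simple rx path assumes one descriptor per packet, so only
	 * pick it when mergeable rx buffers were not negotiated. */
	if ((tx_conf->txq_flags & VIRTIO_SIMPLE_FLAGS) == VIRTIO_SIMPLE_FLAGS &&
	    !vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF)) {
		PMD_INIT_LOG(INFO, "Using simple rx/tx path");
		dev->tx_pkt_burst = virtio_xmit_pkts_simple;
		dev->rx_pkt_burst = virtio_recv_pkts_vec;
		use_simple_rxtx = 1;
	}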

> + use_simple_rxtx = 1;
> + }
> +
>   ret = virtio_dev_queue_setup(dev, VTNET_TQ, queue_idx,
> vtpci_queue_idx,
>   nb_desc, socket_id, &vq);
>   if (ret < 0) {
> --
> 1.8.1.4



[dpdk-dev] [PATCH] kni: allow per-net instances

2015-10-22 Thread Zhang, Helin
Hi Dex

Two comments inline. Thank you very much for the really good contribution to
KNI!

Regards,
Helin

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Dex Chen
> Sent: Thursday, July 2, 2015 6:12 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] kni: allow per-net instances
> 
> There is a global variable 'device_in_use' which is used to make sure only one
> instance is using the /dev/kni device. If you use LXC, you will find that only
> one instance of the KNI example can be run, even if different namespaces were
> created.
> 
> In order to have /dev/kni used simultaneously in different namespaces, all of
> the global variables are made per-network-namespace variables.
> 
> With regard to single kernel thread mode, there will be one kernel thread for
> each network namespace.
> 
> Signed-off-by: Dex Chen 
> ---
>  lib/librte_eal/linuxapp/kni/kni_misc.c | 129
> ++---
>  1 file changed, 85 insertions(+), 44 deletions(-)
> 
> diff --git a/lib/librte_eal/linuxapp/kni/kni_misc.c
> b/lib/librte_eal/linuxapp/kni/kni_misc.c
> index 2e9fa89..5ba8ab8 100644
> --- a/lib/librte_eal/linuxapp/kni/kni_misc.c
> +++ b/lib/librte_eal/linuxapp/kni/kni_misc.c
> @@ -28,6 +28,9 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
> +#include 
> 
>  #include 
>  #include "kni_dev.h"
> @@ -90,18 +93,48 @@ static unsigned multiple_kthread_on = 0;
> 
>  #define KNI_DEV_IN_USE_BIT_NUM 0 /* Bit number for device in use */
> 
> -static volatile unsigned long device_in_use; /* device in use flag */ 
> -static struct
> task_struct *kni_kthread;
> +static int kni_net_id;
> 
> -/* kni list lock */
> -static DECLARE_RWSEM(kni_list_lock);
> +struct kni_net {
> + volatile unsigned long device_in_use; /* device in use flag */
> + struct task_struct *kni_kthread;
> + struct rw_semaphore kni_list_lock;
> + struct list_head kni_list_head;
> +};
> +
> +static __net_init int kni_init_net(struct net *net) {
> + struct kni_net *knet = net_generic(net, kni_net_id);
> 
> -/* kni list */
> -static struct list_head kni_list_head = LIST_HEAD_INIT(kni_list_head);
> + /* Clear the bit of device in use */
> + clear_bit(KNI_DEV_IN_USE_BIT_NUM, &knet->device_in_use);
> +
> + init_rwsem(&knet->kni_list_lock);
> + INIT_LIST_HEAD(&knet->kni_list_head);
> +
> + return 0;
> +}
> +
> +static __net_exit void kni_exit_net(struct net *net) {
> + /*
> +  * Nothing to do here.
> +  * Assuming all cleanup jobs were done in kni_release().
> +  */
> +}
Agree with Stephen, kernel should handle it well.

> +
> +static struct pernet_operations kni_net_ops = {
> + .init = kni_init_net,
> + .exit = kni_exit_net,
> + .id   = &kni_net_id,
> + .size = sizeof(struct kni_net),
> +};
> 
>  static int __init
>  kni_init(void)
>  {
> + int rc;
> +
>   KNI_PRINT(" DPDK kni module loading \n");
> 
>   if (kni_parse_kthread_mode() < 0) {
> @@ -114,8 +147,9 @@ kni_init(void)
>   return -EPERM;
>   }
> 
> - /* Clear the bit of device in use */
> - clear_bit(KNI_DEV_IN_USE_BIT_NUM, &device_in_use);
> + rc = register_pernet_subsys(&kni_net_ops);
> + if (rc)
> + goto out;
> 
>   /* Configure the lo mode according to the input parameter */
>   kni_net_config_lo_mode(lo_mode);
> @@ -123,11 +157,16 @@ kni_init(void)
>   KNI_PRINT(" DPDK kni module loaded  \n");
> 
>   return 0;
> +
> +out:
> + misc_deregister(&kni_misc);
> + return rc;
>  }
> 
>  static void __exit
>  kni_exit(void)
>  {
> + unregister_pernet_subsys(&kni_net_ops);
Should the above 'unregister' be moved after 'misc_deregister()'?

>   misc_deregister(&kni_misc);
>   KNI_PRINT("### DPDK kni module unloaded  ###\n");  } @@
> -151,19 +190,22 @@ kni_parse_kthread_mode(void)  static int



[dpdk-dev] [PATCH v3 5/7] virtio: virtio vec rx

2015-10-22 Thread Wang, Zhihong


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Huawei Xie
> Sent: Tuesday, October 20, 2015 11:30 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v3 5/7] virtio: virtio vec rx
> 
> With a fixed avail ring, we don't need to get the desc idx from the avail ring;
> the virtio driver only has to deal with the desc ring.
> This patch uses vector instructions to accelerate processing of the desc ring.
> 
> Signed-off-by: Huawei Xie 
> ---
>  drivers/net/virtio/virtio_ethdev.h  |   2 +
>  drivers/net/virtio/virtio_rxtx.c|   3 +
>  drivers/net/virtio/virtio_rxtx.h|   2 +
>  drivers/net/virtio/virtio_rxtx_simple.c | 224
> 
>  drivers/net/virtio/virtqueue.h  |   1 +
>  5 files changed, 232 insertions(+)
> 
> diff --git a/drivers/net/virtio/virtio_ethdev.h 
> b/drivers/net/virtio/virtio_ethdev.h
> index 9026d42..d7797ab 100644
> --- a/drivers/net/virtio/virtio_ethdev.h
> +++ b/drivers/net/virtio/virtio_ethdev.h
> @@ -108,6 +108,8 @@ uint16_t virtio_recv_mergeable_pkts(void *rx_queue,
> struct rte_mbuf **rx_pkts,
>  uint16_t virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>   uint16_t nb_pkts);
> 
> +uint16_t virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
> + uint16_t nb_pkts);
> 
>  /*
>   * The VIRTIO_NET_F_GUEST_TSO[46] features permit the host to send us
> diff --git a/drivers/net/virtio/virtio_rxtx.c 
> b/drivers/net/virtio/virtio_rxtx.c
> index 5162ce6..947fc46 100644
> --- a/drivers/net/virtio/virtio_rxtx.c
> +++ b/drivers/net/virtio/virtio_rxtx.c
> @@ -432,6 +432,9 @@ virtio_dev_rx_queue_setup(struct rte_eth_dev *dev,
>   vq->mpool = mp;
> 
>   dev->data->rx_queues[queue_idx] = vq;
> +
> + virtio_rxq_vec_setup(vq);
> +
>   return 0;
>  }
> 
> diff --git a/drivers/net/virtio/virtio_rxtx.h 
> b/drivers/net/virtio/virtio_rxtx.h
> index 7d2d8fe..831e492 100644
> --- a/drivers/net/virtio/virtio_rxtx.h
> +++ b/drivers/net/virtio/virtio_rxtx.h
> @@ -33,5 +33,7 @@
> 
>  #define RTE_PMD_VIRTIO_RX_MAX_BURST 64
> 
> +int virtio_rxq_vec_setup(struct virtqueue *rxq);
> +
>  int virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
>   struct rte_mbuf *m);
> diff --git a/drivers/net/virtio/virtio_rxtx_simple.c
> b/drivers/net/virtio/virtio_rxtx_simple.c
> index cac5b9f..ef17562 100644
> --- a/drivers/net/virtio/virtio_rxtx_simple.c
> +++ b/drivers/net/virtio/virtio_rxtx_simple.c
> @@ -58,6 +58,10 @@
>  #include "virtqueue.h"
>  #include "virtio_rxtx.h"
> 
> +#define RTE_VIRTIO_VPMD_RX_BURST 32
> +#define RTE_VIRTIO_DESC_PER_LOOP 8
> +#define RTE_VIRTIO_VPMD_RX_REARM_THRESH
> RTE_VIRTIO_VPMD_RX_BURST
> +
>  int __attribute__((cold))
>  virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
>   struct rte_mbuf *cookie)
> @@ -82,3 +86,223 @@ virtqueue_enqueue_recv_refill_simple(struct
> virtqueue *vq,
> 
>   return 0;
>  }
> +
> +static inline void
> +virtio_rxq_rearm_vec(struct virtqueue *rxvq)
> +{
> + int i;
> + uint16_t desc_idx;
> + struct rte_mbuf **sw_ring;
> + struct vring_desc *start_dp;
> + int ret;
> +
> + desc_idx = rxvq->vq_avail_idx & (rxvq->vq_nentries - 1);
> + sw_ring = &rxvq->sw_ring[desc_idx];
> + start_dp = &rxvq->vq_ring.desc[desc_idx];
> +
> + ret = rte_mempool_get_bulk(rxvq->mpool, (void **)sw_ring,
> + RTE_VIRTIO_VPMD_RX_REARM_THRESH);
> + if (unlikely(ret)) {
> + rte_eth_devices[rxvq->port_id].data->rx_mbuf_alloc_failed +=
> + RTE_VIRTIO_VPMD_RX_REARM_THRESH;
> + return;
> + }
> +
> + for (i = 0; i < RTE_VIRTIO_VPMD_RX_REARM_THRESH; i++) {
> + uintptr_t p;
> +
> + p = (uintptr_t)&sw_ring[i]->rearm_data;
> + *(uint64_t *)p = rxvq->mbuf_initializer;
> +
> + start_dp[i].addr =
> + (uint64_t)((uintptr_t)sw_ring[i]->buf_physaddr +
> + RTE_PKTMBUF_HEADROOM - sizeof(struct virtio_net_hdr));
> + start_dp[i].len = sw_ring[i]->buf_len -
> + RTE_PKTMBUF_HEADROOM + sizeof(struct virtio_net_hdr);
> + }
> +
> + rxvq->vq_avail_idx += RTE_VIRTIO_VPMD_RX_REARM_THRESH;
> + rxvq->vq_free_cnt -= RTE_VIRTIO_VPMD_RX_REARM_THRESH;
> + vq_update_avail_idx(rxvq);
> +}
> +
> +/* virtio vPMD receive routine, only accept(nb_pkts >=
> RTE_VIRTIO_DESC_PER_LOOP)
> + *
> + * This routine is for non-mergable RX, one desc for each guest buffer.
> + * This routine is based on the RX ring layout optimization. Each entry in 
> the
> + * avail ring points to the desc with the same index in the desc ring and 
> this
> + * will never be changed in the driver.
> + *
> + * - nb_pkts < RTE_VIRTIO_DESC_PER_LOOP, just return no packet
> + */
> +uint16_t
> +virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
> + uint16_t nb_pkts)
> +{
> + struct virtqueue *rxvq = rx_queue;
> + uint16_t nb_used;
> + uint16_t desc_idx

[dpdk-dev] [PATCH] drivers: fix shared library dependencies to external libraries

2015-10-22 Thread Panu Matilainen
On 10/21/2015 07:30 PM, Nicolas Pernas Maradei wrote:
> Hi,
>
> Are those the only two libraries with external dependencies? I took a
> quick look to the rte.app.mk file and there seem to be some others like
> -lfuse and -lnuma. Would it be possible to move those to their specific
> Makefiles as well?

AFAICS those were the only remaining *drivers* with external dependencies.

The libraries have dependencies of their own like you noted, but they're 
more scattered, and things start getting more complicated because of 
CONFIG_RTE_BUILD_COMBINE_LIBS etc. I plan to get to that later when time 
permits but wanted to get the driver side out of the way because they're 
the worst offenders, and one driver already does this so the situation 
is inconsistent too.

- Panu -


[dpdk-dev] [PATCH v3 6/7] virtio: simple tx routine

2015-10-22 Thread Xie, Huawei
On 10/21/2015 2:58 AM, Stephen Hemminger wrote:
> On Tue, 20 Oct 2015 23:30:06 +0800
> Huawei Xie  wrote:
>
>> +desc_idx = (uint16_t)(vq->vq_used_cons_idx &
>> +((vq->vq_nentries >> 1) - 1));
>> +free[0] = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
>> +nb_free = 1;
>> +
>> +for (i = 1; i < VIRTIO_TX_FREE_NR; i++) {
>> +m = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
>> +if (likely(m->pool == free[0]->pool))
>> +free[nb_free++] = m;
>> +else {
>> +rte_mempool_put_bulk(free[0]->pool, (void **)free,
>> +nb_free);
>> +free[0] = m;
>> +nb_free = 1;
>> +}
>> +}
>> +
>> +rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
> Might be better to introduce a function in rte_mbuf.h which
> does this so other drivers can use same code?
>
> rte_pktmbuf_free_bulk(pkts[], n)
Agree. It would be good to have a generic rte_pktmbuf_free(/alloc)_bulk.
Several other drivers and future vhost patches also use the same logic.
I prefer to implement this later, as this is an API change.
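
For reference, a minimal sketch of what such a helper could look like. It is
hypothetical (not in rte_mbuf.h today) and, like the virtio code above, assumes
single-segment mbufs with a reference count of one:

#include <rte_mbuf.h>
#include <rte_mempool.h>

static inline void
rte_pktmbuf_free_bulk(struct rte_mbuf *pkts[], unsigned n)
{
	void *batch[32]; /* flushed whenever the pool changes or it fills up */
	struct rte_mempool *pool = NULL;
	unsigned nb = 0, i;

	for (i = 0; i < n; i++) {
		struct rte_mbuf *m = pkts[i];

		if (m->pool != pool || nb == 32) {
			if (nb)
				rte_mempool_put_bulk(pool, batch, nb);
			pool = m->pool;
			nb = 0;
		}
		batch[nb++] = m;
	}
	if (nb)
		rte_mempool_put_bulk(pool, batch, nb);
}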



[dpdk-dev] [PATCH v3 5/7] virtio: virtio vec rx

2015-10-22 Thread Xie, Huawei
On 10/22/2015 12:04 PM, Wang, Zhihong wrote:
> Wonder if the prefetch will actually help here.
> Will prefetching rx_pkts[i] be more helpful?
What is your concern about prefetching the virtio ring?
rx_pkts is a local array; why do we need to prefetch it?



[dpdk-dev] dpdk proposal installation process

2015-10-22 Thread Panu Matilainen
On 10/21/2015 10:15 PM, Olivier MATZ wrote:
> Hi Mario,
>
> On 10/20/2015 11:17 AM, Bruce Richardson wrote:
>> On Tue, Oct 20, 2015 at 12:21:00AM +, Arevalo, Mario Alfredo C wrote:
>>> Hi folks,
>>>
>>>Good day. This is a proposal to improve the dpdk install process. I would
>>> like to know your point of view on the following points, based on previous
>>> conversations :), so I can create a new version of the patches.
>>>
>>> 1) I think the first thing I have to be aware of is "compatibility": the
>>> new changes won't affect the current dpdk behaviour.
>
> Yes. As I stated in a previous mail, I think nobody uses the current
> "make install" without specifying T=, as the default is to build
> and install for all targets.
>
> My suggestion is:
>
> - rename the previous "install" target. The name could probably
>be "mbuild" (for multiple builds). Other ideas are welcome.
>
> - when "make install" is invoked with T= argument, call the mbuild
>target to have the same behavior than before. This compat layer
>could be removed in the future.
>
> - when "make install" is invoked without T=, it installs the fhs.

Nice, this sounds like the best of both worlds.

>
>>> 2) Create new makefile rules; these rules are going to install dpdk files
>>> in default paths. However, the Linux distributions don't use the same paths
>>> for their files; the distribution and the architecture can be factors for
>>> different paths, as Panu commented in previous conversations, and he is
>>> right. So all variables could be overridden, and the variable names can be
>>> documented for the user. A configuration file for paths could also be an
>>> option, however I'm not sure.
>
> I think having variables is ok.
>
>>> 3) The default paths for dpdk follow a hierarchy; however, the variables
>>> holding those values can be overridden.
>>>
>>> -install-bin  --> /usr/bin.
>>> -install-headers  --> /usr/include/dpdk
>>> -install-lib   --> /usr/lib64
>
> I remember Panu suggested to have /usr/lib by default.
> I also think /usr/lib a better default value: some distributions
> use /usr/lib for 64 bits libs, but we never have 32 bits libs in
> /usr/lib64.

Yes, just stick /usr/lib there and be done with it, lib64 is not a good 
default for these very reasons.

>>> -install-doc --> /usr/share/doc/dpdk
>>> -install-mod--> if RTE_EXEC_ENV=linuxapp then 
>>> KERNEL_DIR=/lib/modules/$(uname -r)/extra/drivers/dpdk
>>>  else KERNEL_DIR=/boot/modules).
>
> I'm not sure KERNEL_DIR is the proper name. Maybe KMOD_DIR?
>
>>> -install-sdk --> /usr/share/dpdk and call install-headers ).
>>> -install-fhs  --> call install-libraries, install-mod, install-bin 
>>> and install-doc (maybe install-headers)
>>>
>>> 4) I'm going to take into account all feedback about variables, paths, etc.
>>> for the new version :).
>>>
>>> Thank you so much for your help.
>>>
>>>
>>> Mario.
>>
>> Hi Mario,
>>
>> that seems like a lot of commands to add - are they all individually needed?
>>
>> In terms of where things go, should the "usr" part not a) be configurable via
>> a parameter, and b) default to "/usr/local" as that's where user-installed
>> software from outside the packaging system normally gets put.
>
> A PREFIX variable would do the job.
> About the default to /usr or /usr/local, I agree that /usr/local looks
> more usual, and I don't think it's a problem for packaging as soon as
> it can be overridden.

Yeah, PREFIX support would be nice, and defaulting that to /usr/local 
would be the right thing.

- Panu -

>
>
> Regards,
> Olivier
>



[dpdk-dev] [PATCH 2/5] fm10k: enable Rx queue interrupts for PF and VF

2015-10-22 Thread Chen, Jing D
Hi,

Best Regards,
Mark


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Shaopeng He
> Sent: Friday, September 25, 2015 1:37 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH 2/5] fm10k: enable Rx queue interrupts for PF
> and VF
> 
> The patch does below things for fm10k PF and VF:
> - Setup NIC to generate MSI-X interrupts
> - Set the RXINT register to map interrupt causes to vectors
> - Implement interrupt enable/disable functions

The description is too brief; could you extend it?
Besides that, there are complicated changes in this patch.
Could you split it into several smaller ones for better understanding?

> 
> Signed-off-by: Shaopeng He 
> ---
>  drivers/net/fm10k/fm10k_ethdev.c | 147
> +--
>  1 file changed, 140 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/fm10k/fm10k_ethdev.c
> b/drivers/net/fm10k/fm10k_ethdev.c
> index a82cd59..6648934 100644
> --- a/drivers/net/fm10k/fm10k_ethdev.c
> +++ b/drivers/net/fm10k/fm10k_ethdev.c
> 
>  static int
> +fm10k_dev_rx_queue_intr_enable(struct rte_eth_dev *dev, uint16_t
> queue_id)
> +{
> + struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data-
> >dev_private);
> +
> + /* Enable ITR */
> + if (hw->mac.type == fm10k_mac_pf)
> + FM10K_WRITE_REG(hw, FM10K_ITR(Q2V(dev, queue_id)),
> + FM10K_ITR_AUTOMASK |
> FM10K_ITR_MASK_CLEAR);
> + else
> + FM10K_WRITE_REG(hw, FM10K_VFITR(Q2V(dev, queue_id)),
> + FM10K_ITR_AUTOMASK |
> FM10K_ITR_MASK_CLEAR);
> + rte_intr_enable(&dev->pci_dev->intr_handle);
> + return 0;
> +}
> +
> +static int
> +fm10k_dev_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t
> queue_id)
> +{
> + struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data-
> >dev_private);
> +
> + /* Disable ITR */
> + if (hw->mac.type == fm10k_mac_pf)
> + FM10K_WRITE_REG(hw, FM10K_ITR(Q2V(dev, queue_id)),
> + FM10K_ITR_MASK_SET);
> + else
> + FM10K_WRITE_REG(hw, FM10K_VFITR(Q2V(dev, queue_id)),
> + FM10K_ITR_MASK_SET);

In the previous enable function you use rte_intr_enable() to enable the
interrupt, but you don't disable it in this function?

> + return 0;
> +}
> +
> +static int
> +fm10k_dev_rxq_interrupt_setup(struct rte_eth_dev *dev)
> +{
> + struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data-
> >dev_private);
> + struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
> + uint32_t intr_vector, vec;
> + uint16_t queue_id;
> + int result = 0;
> +
> + /* fm10k needs interrupt for mailbox
> +  * so igb_uio is not supported for rx interrupt
> +  */

I guess you'll support both igb_uio and VFIO, with RX interrupt mode only
enabled when VFIO is used.
I suggest you add more comments here for better understanding.

> + if (!rte_intr_cap_multiple(intr_handle) ||
> + dev->data->dev_conf.intr_conf.rxq == 0)
> + return result;
> +
> + intr_vector = dev->data->nb_rx_queues;
> +
> + /* disable interrupt first */
> + rte_intr_disable(&dev->pci_dev->intr_handle);
> + if (hw->mac.type == fm10k_mac_pf)
> + fm10k_dev_disable_intr_pf(dev);
> + else
> + fm10k_dev_disable_intr_vf(dev);
> +
> + if (rte_intr_efd_enable(intr_handle, intr_vector)) {
> + PMD_INIT_LOG(ERR, "Failed to init event fd");
> + result = -EIO;
> + }
> +
> + if (rte_intr_dp_is_en(intr_handle) && !result) {
> + intr_handle->intr_vec = rte_zmalloc("intr_vec",
> + dev->data->nb_rx_queues * sizeof(int), 0);
> + if (intr_handle->intr_vec) {
> + for (queue_id = 0, vec = RX_VEC_START;
> + queue_id < dev->data-
> >nb_rx_queues;
> + queue_id++) {
> + intr_handle->intr_vec[queue_id] = vec;
> + if (vec < intr_handle->nb_efd - 1 +
> RX_VEC_START)
> + vec++;

No "else" to handle exceptional case?

> + }
> + } else {
> + PMD_INIT_LOG(ERR, "Failed to allocate %d
> rx_queues"
> + " intr_vec", dev->data->nb_rx_queues);
> + result = -ENOMEM;
> + }
> + }
> +
> + if (hw->mac.type == fm10k_mac_pf)
> + fm10k_dev_enable_intr_pf(dev);
> + else
> + fm10k_dev_enable_intr_vf(dev);
> + rte_intr_enable(&dev->pci_dev->intr_handle);
> + hw->mac.ops.update_int_moderator(hw);
> + return result;
> +}
> +
> +static int
>  fm10k_dev_handle_fault(struct fm10k_hw *hw, uint32_t eicr)
>  {
>   struct fm10k_fault fault;
> @@ -2050,6 +2181,8 @@ static const struct eth_dev_ops
> fm10k_eth_dev_ops = {
>   .tx_queue_setup = fm

[dpdk-dev] [PATCH 7/8] i40e: get_dcb_info ops implement

2015-10-22 Thread Liu, Jijiang


> -Original Message-
> From: Wu, Jingjing
> Sent: Thursday, September 24, 2015 2:03 PM
> To: dev at dpdk.org
> Cc: Wu, Jingjing; Liu, Jijiang; Zhang, Helin; Tao, Zhe; Pei, Yulong
> Subject: [PATCH 7/8] i40e: get_dcb_info ops implement
> 
> This patch implements the get_dcb_info ops in i40e driver.
> 
> Signed-off-by: Jingjing Wu 
> ---
>  drivers/net/i40e/i40e_ethdev.c | 42
> ++
>  1 file changed, 42 insertions(+)
> 
> diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
> index 7d252fa..76e2353 100644
> --- a/drivers/net/i40e/i40e_ethdev.c
> +++ b/drivers/net/i40e/i40e_ethdev.c
> @@ -220,6 +220,8 @@ static int i40e_dev_filter_ctrl(struct rte_eth_dev
> *dev,
>   enum rte_filter_type filter_type,
>   enum rte_filter_op filter_op,
>   void *arg);
> +static void i40e_dev_get_dcb_info(struct rte_eth_dev *dev,
> +   struct rte_eth_dcb_info *dcb_info);
>  static void i40e_configure_registers(struct i40e_hw *hw);  static void
> i40e_hw_init(struct i40e_hw *hw);  static int i40e_config_qinq(struct
> i40e_hw *hw, struct i40e_vsi *vsi); @@ -292,6 +294,7 @@ static const
> struct eth_dev_ops i40e_eth_dev_ops = {
>   .timesync_disable = i40e_timesync_disable,
>   .timesync_read_rx_timestamp   = i40e_timesync_read_rx_timestamp,
>   .timesync_read_tx_timestamp   = i40e_timesync_read_tx_timestamp,
> + .get_dcb_info = i40e_dev_get_dcb_info,
>  };
> 
>  static struct eth_driver rte_i40e_pmd = { @@ -6808,3 +6811,42 @@
> i40e_dcb_setup(struct rte_eth_dev *dev)
>   }
>   return 0;
>  }
> +
> +static void
> +i40e_dev_get_dcb_info(struct rte_eth_dev *dev,
> +   struct rte_eth_dcb_info *dcb_info) {
> + struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data-
> >dev_private);
> + struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data-
> >dev_private);
> + struct i40e_vsi *vsi = pf->main_vsi;
> + struct i40e_dcbx_config *dcb_cfg = &hw->local_dcbx_config;
> + uint16_t bsf, tc_mapping;
> + int i;
> +
> + if (dev->data->dev_conf.rxmode.mq_mode &
> ETH_MQ_RX_DCB_FLAG)
> + dcb_info->nb_tcs =
> + dev->data-
> >dev_conf.rx_adv_conf.dcb_rx_conf.nb_tcs;
> + else
> + dcb_info->nb_tcs = 1;
> + for (i = 0; i < I40E_MAX_USER_PRIORITY; i++)
> + dcb_info->prio_tc[i] = dcb_cfg->etscfg.prioritytable[i];
> + for (i = 0; i < dcb_info->nb_tcs; i++)
> + dcb_info->tc_bws[i] = dcb_cfg->etscfg.tcbwtable[i];
> +
> + for (i = 0; i < I40E_MAX_TRAFFIC_CLASS; i++) {
> + if (vsi->enabled_tc & (1 << i)) {
> + tc_mapping = rte_le_to_cpu_16(vsi-
> >info.tc_mapping[i]);
> + /* only main vsi support multi TCs */
> + dcb_info->tc_queue.tc_rxq[0][i].base =
> + (tc_mapping &
> I40E_AQ_VSI_TC_QUE_OFFSET_MASK) >>
> + I40E_AQ_VSI_TC_QUE_OFFSET_SHIFT;
> + dcb_info->tc_queue.tc_txq[0][i].base =
> + dcb_info->tc_queue.tc_rxq[0][i].base;
> + bsf = (tc_mapping &
> I40E_AQ_VSI_TC_QUE_NUMBER_MASK) >>
> + I40E_AQ_VSI_TC_QUE_NUMBER_SHIFT;
> + dcb_info->tc_queue.tc_rxq[0][i].nb_queue = 1 << bsf;
> + dcb_info->tc_queue.tc_txq[0][i].nb_queue =
> + dcb_info->tc_queue.tc_rxq[0][i].nb_queue;
> + }
> + }
> +}
> --
> 2.4.0

It would be great if there were testpmd command lines to get the DCB
information.


[dpdk-dev] [PATCH v3 0/7] Support new flow director modes on Intel x550 NIC

2015-10-22 Thread Wenzhuo Lu
This patch set adds 2 new flow director modes on Intel x550 NIC.
The 2 new fdir modes are mac vlan mode and tunnel mode.
The mac vlan mode can direct the flow based on the MAC address and VLAN
TCI.
The tunnel mode provides the support for VxLAN and NVGRE. x550 can recognize
VxLAN and NVGRE packets, and direct the packets based on the MAC address,
VLAN TCI, TNI/VNI.
Of course, the MAC address, VLAN TCI and TNI/VNI can be masked, so the flow
can be directed based on the remaining conditions. For example, if we want to
direct the flow based on the MAC address, we can use mac vlan mode with the
VLAN TCI masked.
For now, only x550 supports these 2 modes. The new modes should not be used on
other NICs; if they are, the ports will not be initialized successfully.
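
As an illustration, a sketch of how an application might add a mac vlan filter
through the existing filter_ctrl API once this series is applied. The port id,
MAC address and queue are placeholders, and error handling is omitted:

#include <string.h>
#include <rte_ether.h>
#include <rte_eth_ctrl.h>
#include <rte_ethdev.h>

/* Direct frames from one MAC address to rx queue 1; with the VLAN TCI
 * masked out via flow_director_mask, only the MAC is matched. How
 * flow_type interacts with the new modes follows the series; it is
 * left at 0 here. */
static int
add_mac_vlan_fdir_filter(uint8_t port_id)
{
	static const struct ether_addr mac = {
		.addr_bytes = { 0x00, 0x1b, 0x21, 0xab, 0xcd, 0xef } };
	struct rte_eth_fdir_filter entry;

	memset(&entry, 0, sizeof(entry));
	ether_addr_copy(&mac, &entry.input.flow.mac_vlan_flow.mac_addr);
	entry.action.behavior = RTE_ETH_FDIR_ACCEPT;
	entry.action.rx_queue = 1;
	entry.soft_id = 1;

	return rte_eth_dev_filter_ctrl(port_id, RTE_ETH_FILTER_FDIR,
				       RTE_ETH_FILTER_ADD, &entry);
}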

V2:
Change the word 'cloud' to 'tunnel'.
Change 'tni_vni' to 'tunnel_id'.

V3:
Change the name mac_addr_mask to mac_addr_byte_mask, because some NICs may
want to support a per-bit mask in the future.
Set the default VxLAN port only when the NIC supports VxLAN.
Make the condition stricter when checking the fdir mode, to avoid the code
being broken by future expansion.
Make the mac mask more flexible.
Add a new function for MAC VLAN and tunnel mask.


Wenzhuo Lu (7):
  lib/librte_ether: modify the structures for fdir new modes
  app/testpmd: initialize the new fields for fdir mask
  app/testpmd: new fdir modes for testpmd parameter
  app/testpmd: modify the output of the CLI show port fdir
  app/testpmd: modify and add fdir filter and mask CLIs for new modes
  ixgbe: implementation for fdir new modes' config
  doc: release notes update for flow director enhancement

 app/test-pmd/cmdline.c   | 293 +--
 app/test-pmd/config.c|  45 --
 app/test-pmd/parameters.c|   7 +-
 app/test-pmd/testpmd.c   |   3 +
 doc/guides/rel_notes/release_2_2.rst |   3 +
 drivers/net/ixgbe/ixgbe_ethdev.h |   3 +
 drivers/net/ixgbe/ixgbe_fdir.c   | 261 +++
 lib/librte_ether/rte_eth_ctrl.h  |  69 ++---
 8 files changed, 606 insertions(+), 78 deletions(-)

-- 
1.9.3



[dpdk-dev] [PATCH v3 1/7] lib/librte_ether: modify the structures for fdir new modes

2015-10-22 Thread Wenzhuo Lu
Define the new modes and modify the filter and mask structures for
the mac vlan and tunnel modes.

Signed-off-by: Wenzhuo Lu 
---
 lib/librte_ether/rte_eth_ctrl.h | 69 ++---
 1 file changed, 51 insertions(+), 18 deletions(-)

diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h
index 26b7b33..078faf9 100644
--- a/lib/librte_ether/rte_eth_ctrl.h
+++ b/lib/librte_ether/rte_eth_ctrl.h
@@ -248,6 +248,17 @@ enum rte_eth_tunnel_type {
 };

 /**
+ *  Flow Director setting modes: none, signature or perfect.
+ */
+enum rte_fdir_mode {
+   RTE_FDIR_MODE_NONE  = 0, /**< Disable FDIR support. */
+   RTE_FDIR_MODE_SIGNATURE, /**< Enable FDIR signature filter mode. */
+   RTE_FDIR_MODE_PERFECT,   /**< Enable FDIR perfect filter mode for IP. */
+   RTE_FDIR_MODE_PERFECT_MAC_VLAN, /**< Enable FDIR filter mode - MAC VLAN. */
+   RTE_FDIR_MODE_PERFECT_TUNNEL,   /**< Enable FDIR filter mode - tunnel. */
+};
+
+/**
  * filter type of tunneling packet
  */
 #define ETH_TUNNEL_FILTER_OMAC  0x01 /**< filter by outer MAC addr */
@@ -377,18 +388,46 @@ struct rte_eth_sctpv6_flow {
 };

 /**
+ * A structure used to define the input for MAC VLAN flow
+ */
+struct rte_eth_mac_vlan_flow {
+   struct ether_addr mac_addr;  /**< Mac address to match. */
+};
+
+/**
+ * Tunnel type for flow director.
+ */
+enum rte_eth_fdir_tunnel_type {
+   RTE_FDIR_TUNNEL_TYPE_NVGRE = 0,
+   RTE_FDIR_TUNNEL_TYPE_VXLAN,
+   RTE_FDIR_TUNNEL_TYPE_UNKNOWN,
+};
+
+/**
+ * A structure used to define the input for tunnel flow, now it's VxLAN or
+ * NVGRE
+ */
+struct rte_eth_tunnel_flow {
+   enum rte_eth_fdir_tunnel_type tunnel_type; /**< Tunnel type to match. */
+   uint32_t tunnel_id;/**< Tunnel ID to match. TNI, VNI... */
+   struct ether_addr mac_addr;/**< Mac address to match. */
+};
+
+/**
  * An union contains the inputs for all types of flow
  */
 union rte_eth_fdir_flow {
-   struct rte_eth_l2_flow l2_flow;
-   struct rte_eth_udpv4_flow  udp4_flow;
-   struct rte_eth_tcpv4_flow  tcp4_flow;
-   struct rte_eth_sctpv4_flow sctp4_flow;
-   struct rte_eth_ipv4_flow   ip4_flow;
-   struct rte_eth_udpv6_flow  udp6_flow;
-   struct rte_eth_tcpv6_flow  tcp6_flow;
-   struct rte_eth_sctpv6_flow sctp6_flow;
-   struct rte_eth_ipv6_flow   ipv6_flow;
+   struct rte_eth_l2_flow l2_flow;
+   struct rte_eth_udpv4_flow  udp4_flow;
+   struct rte_eth_tcpv4_flow  tcp4_flow;
+   struct rte_eth_sctpv4_flow sctp4_flow;
+   struct rte_eth_ipv4_flow   ip4_flow;
+   struct rte_eth_udpv6_flow  udp6_flow;
+   struct rte_eth_tcpv6_flow  tcp6_flow;
+   struct rte_eth_sctpv6_flow sctp6_flow;
+   struct rte_eth_ipv6_flow   ipv6_flow;
+   struct rte_eth_mac_vlan_flow   mac_vlan_flow;
+   struct rte_eth_tunnel_flow tunnel_flow;
 };

 /**
@@ -465,6 +504,9 @@ struct rte_eth_fdir_masks {
struct rte_eth_ipv6_flow   ipv6_mask;
uint16_t src_port_mask;
uint16_t dst_port_mask;
+   uint8_t mac_addr_byte_mask;  /** Per byte MAC address mask */
+   uint32_t tunnel_id_mask;  /** tunnel ID mask */
+   uint8_t tunnel_type_mask;
 };

 /**
@@ -515,15 +557,6 @@ struct rte_eth_fdir_flex_conf {
/**< Flex mask configuration for each flow type */
 };

-/**
- *  Flow Director setting modes: none, signature or perfect.
- */
-enum rte_fdir_mode {
-   RTE_FDIR_MODE_NONE  = 0, /**< Disable FDIR support. */
-   RTE_FDIR_MODE_SIGNATURE, /**< Enable FDIR signature filter mode. */
-   RTE_FDIR_MODE_PERFECT,   /**< Enable FDIR perfect filter mode. */
-};
-
 #define UINT32_BIT (CHAR_BIT * sizeof(uint32_t))
 #define RTE_FLOW_MASK_ARRAY_SIZE \
(RTE_ALIGN(RTE_ETH_FLOW_MAX, UINT32_BIT)/UINT32_BIT)
-- 
1.9.3



[dpdk-dev] [PATCH v3 2/7] app/testpmd: initialize the new fields for fdir mask

2015-10-22 Thread Wenzhuo Lu
When a port is enabled, there are default values for the fdir mask parameters.
The default values also need to be set for the new parameters.

Signed-off-by: Wenzhuo Lu 
---
 app/test-pmd/testpmd.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 386bf84..d34c81a 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -298,6 +298,9 @@ struct rte_fdir_conf fdir_conf = {
},
.src_port_mask = 0x,
.dst_port_mask = 0x,
+   .mac_addr_byte_mask = 0xFF,
+   .tunnel_type_mask = 1,
+   .tunnel_id_mask = 0x,
},
.drop_queue = 127,
 };
-- 
1.9.3



[dpdk-dev] [PATCH v3 4/7] app/testpmd: modify the output of the CLI show port fdir

2015-10-22 Thread Wenzhuo Lu
The output of the CLI "show port fdir" includes the fdir mask and the
supported flow types. But not every parameter is meaningful for all the fdir
modes, and the supported flow types are meaningless for the mac vlan and
tunnel modes. So, we output different things for different modes.

Signed-off-by: Wenzhuo Lu 
---
 app/test-pmd/config.c | 45 +++--
 1 file changed, 31 insertions(+), 14 deletions(-)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index cf2aa6e..1ec6a77 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1829,18 +1829,28 @@ set_qmap(portid_t port_id, uint8_t is_rx, uint16_t 
queue_id, uint8_t map_value)
 static inline void
 print_fdir_mask(struct rte_eth_fdir_masks *mask)
 {
-   printf("\nvlan_tci: 0x%04x, src_ipv4: 0x%08x, dst_ipv4: 0x%08x,"
- " src_port: 0x%04x, dst_port: 0x%04x",
-   mask->vlan_tci_mask, mask->ipv4_mask.src_ip,
-   mask->ipv4_mask.dst_ip,
-   mask->src_port_mask, mask->dst_port_mask);
-
-   printf("\nsrc_ipv6: 0x%08x,0x%08x,0x%08x,0x%08x,"
-" dst_ipv6: 0x%08x,0x%08x,0x%08x,0x%08x",
-   mask->ipv6_mask.src_ip[0], mask->ipv6_mask.src_ip[1],
-   mask->ipv6_mask.src_ip[2], mask->ipv6_mask.src_ip[3],
-   mask->ipv6_mask.dst_ip[0], mask->ipv6_mask.dst_ip[1],
-   mask->ipv6_mask.dst_ip[2], mask->ipv6_mask.dst_ip[3]);
+   printf("\nvlan_tci: 0x%04x, ", mask->vlan_tci_mask);
+
+   if (fdir_conf.mode == RTE_FDIR_MODE_PERFECT_MAC_VLAN)
+   printf("mac_addr: 0x%02x", mask->mac_addr_byte_mask);
+   else if (fdir_conf.mode == RTE_FDIR_MODE_PERFECT_TUNNEL)
+   printf("mac_addr: 0x%02x, tunnel_type: 0x%01x, tunnel_id: 0x%08x",
+   mask->mac_addr_byte_mask, mask->tunnel_type_mask,
+   mask->tunnel_id_mask);
+   else {
+   printf("src_ipv4: 0x%08x, dst_ipv4: 0x%08x,"
+   " src_port: 0x%04x, dst_port: 0x%04x",
+   mask->ipv4_mask.src_ip, mask->ipv4_mask.dst_ip,
+   mask->src_port_mask, mask->dst_port_mask);
+
+   printf("\nsrc_ipv6: 0x%08x,0x%08x,0x%08x,0x%08x,"
+   " dst_ipv6: 0x%08x,0x%08x,0x%08x,0x%08x",
+   mask->ipv6_mask.src_ip[0], mask->ipv6_mask.src_ip[1],
+   mask->ipv6_mask.src_ip[2], mask->ipv6_mask.src_ip[3],
+   mask->ipv6_mask.dst_ip[0], mask->ipv6_mask.dst_ip[1],
+   mask->ipv6_mask.dst_ip[2], mask->ipv6_mask.dst_ip[3]);
+   }
+
printf("\n");
 }

@@ -1966,12 +1976,19 @@ fdir_get_infos(portid_t port_id)
printf("  MODE: ");
if (fdir_info.mode == RTE_FDIR_MODE_PERFECT)
printf("  PERFECT\n");
+   else if (fdir_info.mode == RTE_FDIR_MODE_PERFECT_MAC_VLAN)
+   printf("  PERFECT-MAC-VLAN\n");
+   else if (fdir_info.mode == RTE_FDIR_MODE_PERFECT_TUNNEL)
+   printf("  PERFECT-TUNNEL\n");
else if (fdir_info.mode == RTE_FDIR_MODE_SIGNATURE)
printf("  SIGNATURE\n");
else
printf("  DISABLE\n");
-   printf("  SUPPORTED FLOW TYPE: ");
-   print_fdir_flow_type(fdir_info.flow_types_mask[0]);
+   if (fdir_info.mode != RTE_FDIR_MODE_PERFECT_MAC_VLAN
+   && fdir_info.mode != RTE_FDIR_MODE_PERFECT_TUNNEL) {
+   printf("  SUPPORTED FLOW TYPE: ");
+   print_fdir_flow_type(fdir_info.flow_types_mask[0]);
+   }
printf("  FLEX PAYLOAD INFO:\n");
printf("  max_len:   %-10"PRIu32"  payload_limit: %-10"PRIu32"\n"
   "  payload_unit:  %-10"PRIu32"  payload_seg:   %-10"PRIu32"\n"
-- 
1.9.3
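
With this change, a port started in mac vlan mode would print output along
these lines (values hypothetical, other fields omitted):

  MODE:   PERFECT-MAC-VLAN
  vlan_tci: 0x0000, mac_addr: 0xFF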



[dpdk-dev] [PATCH v3 3/7] app/testpmd: new fdir modes for testpmd parameter

2015-10-22 Thread Wenzhuo Lu
For the testpmd CLI parameter pkt-filter-mode, there are new values supported
for the new fdir modes: perfect-mac-vlan and perfect-tunnel.

Signed-off-by: Wenzhuo Lu 
---
 app/test-pmd/parameters.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index f1daa6e..df16e8f 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -707,12 +707,17 @@ launch_args_parse(int argc, char** argv)
RTE_FDIR_MODE_SIGNATURE;
else if (!strcmp(optarg, "perfect"))
fdir_conf.mode = RTE_FDIR_MODE_PERFECT;
+   else if (!strcmp(optarg, "perfect-mac-vlan"))
+   fdir_conf.mode = RTE_FDIR_MODE_PERFECT_MAC_VLAN;
+   else if (!strcmp(optarg, "perfect-tunnel"))
+   fdir_conf.mode = RTE_FDIR_MODE_PERFECT_TUNNEL;
else if (!strcmp(optarg, "none"))
fdir_conf.mode = RTE_FDIR_MODE_NONE;
else
rte_exit(EXIT_FAILURE,
 "pkt-mode-invalid %s invalid - 
must be: "
-"none, signature or perfect\n",
+"none, signature, perfect, 
perfect-mac-vlan"
+" or perfect-tunnel\n",
 optarg);
}
if (!strcmp(lgopts[opt_idx].name,
-- 
1.9.3
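
With this patch applied, the new modes can be selected when starting testpmd,
for example (coremask and other options hypothetical):

  ./testpmd -c 0xf -n 4 -- -i --pkt-filter-mode=perfect-mac-vlan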



[dpdk-dev] [PATCH v3 5/7] app/testpmd: modify and add fdir filter and mask CLIs for new modes

2015-10-22 Thread Wenzhuo Lu
Different fdir modes need different parameters, so the parameter *mode*
is introduced to the CLIs flow_director_filter and flow_director_mask. This
parameter can prompt the user to input the appropriate parameters for the
different modes.
Please be aware that the fdir mode, i.e. the value of the parameter
pkt-filter-mode, must be set when we start testpmd. We cannot set a
different mode for the mask or filter afterwards.

The new CLIs are added for the mac vlan and tunnel modes, like this,
flow_director_mask X mode MAC-VLAN vlan  mac XX,
flow_director_mask X mode Tunnel vlan  mac XX tunnel-type X tunnel-id ,
flow_director_filter X mode MAC-VLAN add/del/update mac XX:XX:XX:XX:XX:XX
vlan  flexbytes (X,X) fwd/drop queue X fd_id X,
flow_director_filter X mode Tunnel add/del/update mac XX:XX:XX:XX:XX:XX
vlan  tunnel NVGRE/VxLAN tunnel-id  flexbytes (X,X) fwd/drop queue X
fd_id X.
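For example, concrete invocations following these templates might look like
(all values hypothetical):
flow_director_mask 0 mode MAC-VLAN vlan 0xEFFF mac 0xFF,
flow_director_filter 0 mode MAC-VLAN add mac 00:11:22:33:44:55 vlan 1
flexbytes (0x88,0x48) fwd queue 1 fd_id 1.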

Signed-off-by: Wenzhuo Lu 
---
 app/test-pmd/cmdline.c | 293 ++---
 1 file changed, 278 insertions(+), 15 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 0f8f48f..ac44ab0 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -7725,6 +7725,8 @@ cmdline_parse_inst_t cmd_ethertype_filter = {
 struct cmd_flow_director_result {
cmdline_fixed_string_t flow_director_filter;
uint8_t port_id;
+   cmdline_fixed_string_t mode;
+   cmdline_fixed_string_t mode_value;
cmdline_fixed_string_t ops;
cmdline_fixed_string_t flow;
cmdline_fixed_string_t flow_type;
@@ -7747,6 +7749,12 @@ struct cmd_flow_director_result {
uint16_t  queue_id;
cmdline_fixed_string_t fd_id;
uint32_t  fd_id_value;
+   cmdline_fixed_string_t mac;
+   struct ether_addr mac_addr;
+   cmdline_fixed_string_t tunnel;
+   cmdline_fixed_string_t tunnel_type;
+   cmdline_fixed_string_t tunnel_id;
+   uint32_t tunnel_id_value;
 };

 static inline int
@@ -7818,6 +7826,25 @@ str2flowtype(char *string)
return RTE_ETH_FLOW_UNKNOWN;
 }

+static uint8_t
+str2fdir_tunneltype(char *string)
+{
+   uint8_t i = 0;
+   static const struct {
+   char str[32];
+   uint8_t type;
+   } tunneltype_str[] = {
+   {"NVGRE", RTE_FDIR_TUNNEL_TYPE_NVGRE},
+   {"VxLAN", RTE_FDIR_TUNNEL_TYPE_VXLAN},
+   };
+
+   for (i = 0; i < RTE_DIM(tunneltype_str); i++) {
+   if (!strcmp(tunneltype_str[i].str, string))
+   return tunneltype_str[i].type;
+   }
+   return RTE_FDIR_TUNNEL_TYPE_UNKNOWN;
+}
+
 #define IPV4_ADDR_TO_UINT(ip_addr, ip) \
 do { \
if ((ip_addr).family == AF_INET) \
@@ -7858,6 +7885,25 @@ cmd_flow_director_filter_parsed(void *parsed_result,
}
memset(flexbytes, 0, sizeof(flexbytes));
memset(&entry, 0, sizeof(struct rte_eth_fdir_filter));
+
+   if (fdir_conf.mode ==  RTE_FDIR_MODE_PERFECT_MAC_VLAN) {
+   if (strcmp(res->mode_value, "MAC-VLAN")) {
+   printf("Please set mode to MAC-VLAN.\n");
+   return;
+   }
+   } else if (fdir_conf.mode ==  RTE_FDIR_MODE_PERFECT_TUNNEL) {
+   if (strcmp(res->mode_value, "Tunnel")) {
+   printf("Please set mode to Tunnel.\n");
+   return;
+   }
+   } else {
+   if (strcmp(res->mode_value, "IP")) {
+   printf("Please set mode to IP.\n");
+   return;
+   }
+   entry.input.flow_type = str2flowtype(res->flow_type);
+   }
+
ret = parse_flexbytes(res->flexbytes_value,
flexbytes,
RTE_ETH_FDIR_MAX_FLEXLEN);
@@ -7866,7 +7912,6 @@ cmd_flow_director_filter_parsed(void *parsed_result,
return;
}

-   entry.input.flow_type = str2flowtype(res->flow_type);
switch (entry.input.flow_type) {
case RTE_ETH_FLOW_FRAG_IPV4:
case RTE_ETH_FLOW_NONFRAG_IPV4_OTHER:
@@ -7927,9 +7972,24 @@ cmd_flow_director_filter_parsed(void *parsed_result,
rte_cpu_to_be_16(res->ether_type);
break;
default:
-   printf("invalid parameter.\n");
-   return;
+   break;
+   }
+
+   if (fdir_conf.mode ==  RTE_FDIR_MODE_PERFECT_MAC_VLAN)
+   (void)rte_memcpy(&entry.input.flow.mac_vlan_flow.mac_addr,
+&res->mac_addr,
+sizeof(struct ether_addr));
+
+   if (fdir_conf.mode ==  RTE_FDIR_MODE_PERFECT_TUNNEL) {
+   (void)rte_memcpy(&entry.input.flow.tunnel_flow.mac_addr,
+&res->mac_addr,
+sizeof(struct ether_addr));
+   entry.input.flow.tunnel_flow.tunnel_type =
+   st

[dpdk-dev] [PATCH v3 6/7] ixgbe: implementation for fdir new modes' config

2015-10-22 Thread Wenzhuo Lu
Implement the new CLIs for the fdir mac vlan and tunnel modes, including
flow_director_filter and flow_director_mask. Set the fdir mask.
Add, delete or update the filter entries.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/ixgbe_ethdev.h |   3 +
 drivers/net/ixgbe/ixgbe_fdir.c   | 261 ++-
 2 files changed, 234 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h
index c3d4f4f..1e971b9 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.h
+++ b/drivers/net/ixgbe/ixgbe_ethdev.h
@@ -133,6 +133,9 @@ struct ixgbe_hw_fdir_mask {
uint16_t src_port_mask;
uint16_t dst_port_mask;
uint16_t flex_bytes_mask;
+   uint8_t  mac_addr_byte_mask;
+   uint32_t tunnel_id_mask;
+   uint8_t  tunnel_type_mask;
 };

 struct ixgbe_hw_fdir_info {
diff --git a/drivers/net/ixgbe/ixgbe_fdir.c b/drivers/net/ixgbe/ixgbe_fdir.c
index 5c8b833..c8352f4 100644
--- a/drivers/net/ixgbe/ixgbe_fdir.c
+++ b/drivers/net/ixgbe/ixgbe_fdir.c
@@ -105,15 +105,23 @@
rte_memcpy((ipaddr), ipv6_addr, sizeof(ipv6_addr));\
 } while (0)

+#define DEFAULT_VXLAN_PORT 4789
+#define IXGBE_FDIRIP6M_INNER_MAC_SHIFT 4
+
 static int fdir_erase_filter_82599(struct ixgbe_hw *hw, uint32_t fdirhash);
+static int fdir_set_input_mask(struct rte_eth_dev *dev,
+   const struct rte_eth_fdir_masks *input_mask);
 static int fdir_set_input_mask_82599(struct rte_eth_dev *dev,
const struct rte_eth_fdir_masks *input_mask);
+static int fdir_set_input_mask_x550(struct rte_eth_dev *dev,
+   const struct rte_eth_fdir_masks *input_mask);
 static int ixgbe_set_fdir_flex_conf(struct rte_eth_dev *dev,
const struct rte_eth_fdir_flex_conf *conf, uint32_t *fdirctrl);
 static int fdir_enable_82599(struct ixgbe_hw *hw, uint32_t fdirctrl);
 static int ixgbe_fdir_filter_to_atr_input(
const struct rte_eth_fdir_filter *fdir_filter,
-   union ixgbe_atr_input *input);
+   union ixgbe_atr_input *input,
+   enum rte_fdir_mode mode);
 static uint32_t ixgbe_atr_compute_hash_82599(union ixgbe_atr_input *atr_input,
 uint32_t key);
 static uint32_t atr_compute_sig_hash_82599(union ixgbe_atr_input *input,
@@ -122,7 +130,8 @@ static uint32_t atr_compute_perfect_hash_82599(union ixgbe_atr_input *input,
enum rte_fdir_pballoc_type pballoc);
 static int fdir_write_perfect_filter_82599(struct ixgbe_hw *hw,
union ixgbe_atr_input *input, uint8_t queue,
-   uint32_t fdircmd, uint32_t fdirhash);
+   uint32_t fdircmd, uint32_t fdirhash,
+   enum rte_fdir_mode mode);
 static int fdir_add_signature_filter_82599(struct ixgbe_hw *hw,
union ixgbe_atr_input *input, u8 queue, uint32_t fdircmd,
uint32_t fdirhash);
@@ -243,9 +252,16 @@ configure_fdir_flags(const struct rte_fdir_conf *conf, uint32_t *fdirctrl)
*fdirctrl |= (IXGBE_DEFAULT_FLEXBYTES_OFFSET / sizeof(uint16_t)) <<
 IXGBE_FDIRCTRL_FLEX_SHIFT;

-   if (conf->mode == RTE_FDIR_MODE_PERFECT) {
+   if (conf->mode >= RTE_FDIR_MODE_PERFECT
+   && conf->mode <= RTE_FDIR_MODE_PERFECT_TUNNEL) {
*fdirctrl |= IXGBE_FDIRCTRL_PERFECT_MATCH;
*fdirctrl |= (conf->drop_queue << IXGBE_FDIRCTRL_DROP_Q_SHIFT);
+   if (conf->mode == RTE_FDIR_MODE_PERFECT_MAC_VLAN)
+   *fdirctrl |= (IXGBE_FDIRCTRL_FILTERMODE_MACVLAN
+   << IXGBE_FDIRCTRL_FILTERMODE_SHIFT);
+   else if (conf->mode == RTE_FDIR_MODE_PERFECT_TUNNEL)
+   *fdirctrl |= (IXGBE_FDIRCTRL_FILTERMODE_CLOUD
+   << IXGBE_FDIRCTRL_FILTERMODE_SHIFT);
}

return 0;
@@ -274,7 +290,7 @@ reverse_fdir_bitmasks(uint16_t hi_dword, uint16_t lo_dword)
 }

 /*
- * This is based on ixgbe_fdir_set_input_mask_82599() in base/ixgbe_82599.c,
+ * This references ixgbe_fdir_set_input_mask_82599() in base/ixgbe_82599.c,
  * but makes use of the rte_fdir_masks structure to see which bits to set.
  */
 static int
@@ -342,7 +358,6 @@ fdir_set_input_mask_82599(struct rte_eth_dev *dev,

if (dev->data->dev_conf.fdir_conf.mode == RTE_FDIR_MODE_SIGNATURE) {
/*
-* IPv6 mask is only meaningful in signature mode
 * Store source and destination IPv6 masks (bit reversed)
 */
IPV6_ADDR_TO_MASK(input_mask->ipv6_mask.src_ip, src_ipv6m);
@@ -358,6 +373,122 @@ fdir_set_input_mask_82599(struct rte_eth_dev *dev,
 }

 /*
+ * This references ixgbe_fdir_set_input_mask_82599() in base/ixgbe_82599.c,
+ * but makes use of the rte_fdir_masks structure to see which bits to set.
+ */
+static int
+fdir_set_input_mask_x550(struct rte_eth_dev *dev,
+   co

[dpdk-dev] [PATCH v3 7/7] doc: release notes update for flow director enhancement

2015-10-22 Thread Wenzhuo Lu
Signed-off-by: Wenzhuo Lu 
---
 doc/guides/rel_notes/release_2_2.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
index bc9b00f..9d0a4d7 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -4,6 +4,9 @@ DPDK Release 2.2
 New Features
 ------------

+* **ixgbe: flow director enhancement on Intel x550 NIC**
+  Add 2 new flow director modes on x550.
+  One is MAC VLAN mode, the other is tunnel mode.

 Resolved Issues
 ---------------
-- 
1.9.3



[dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index

2015-10-22 Thread Xie, Huawei
On 10/21/2015 11:48 AM, Yuanhan Liu wrote:

[...]
>  
>  #define MAX_PKT_BURST 32
>  
> +static inline int __attribute__((always_inline))
> +is_valid_virt_queue_idx(uint32_t virtq_idx, int is_tx, uint32_t max_qp_idx)
> +{
> + if ((is_tx ^ (virtq_idx & 0x1)) ||
> + (virtq_idx >= max_qp_idx * VIRTIO_QNUM))
> + return 0;
> +
> + return 1;
> +}
> +
>  /**
>   * This function adds buffers to the virtio devices RX virtqueue. Buffers can
>   * be received from the physical port or from another virtio device. A packet
> @@ -68,12 +78,14 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
>   uint8_t success = 0;
>  
>   LOG_DEBUG(VHOST_DATA, "(%"PRIu64") virtio_dev_rx()\n", dev->device_fh);
> - if (unlikely(queue_id != VIRTIO_RXQ)) {
> - LOG_DEBUG(VHOST_DATA, "mq isn't supported in this version.\n");
> + if (unlikely(!is_valid_virt_queue_idx(queue_id, 0, dev->virt_qp_nb))) {
> + RTE_LOG(ERR, VHOST_DATA,
> + "%s (%"PRIu64"): virtqueue idx:%d invalid.\n",
> + __func__, dev->device_fh, queue_id);
>   return 0;
>   }
>  
> - vq = dev->virtqueue[VIRTIO_RXQ];
> + vq = dev->virtqueue[queue_id];
>   count = (count > MAX_PKT_BURST) ? MAX_PKT_BURST : count;
>  
>   /*
>
Besides the always_inline issue, I think we should remove the queue_id
check here in the "data" path. The caller should guarantee that it passes us
the correct queue idx.
We could add a VHOST_DEBUG macro for the sanity check, for debug purposes only.
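
A minimal sketch of that idea (the VHOST_DEBUG flag and the macro name are
hypothetical; it assumes is_valid_virt_queue_idx() from the patch above is
in scope):

#include <rte_debug.h>

#ifdef VHOST_DEBUG
/* Debug builds: panic on an invalid virtqueue index. */
#define VHOST_ASSERT_VALID_VIRTQ(idx, is_tx, max_qp) \
	RTE_VERIFY(is_valid_virt_queue_idx((idx), (is_tx), (max_qp)))
#else
/* Release builds: no per-burst sanity check in the data path. */
#define VHOST_ASSERT_VALID_VIRTQ(idx, is_tx, max_qp) do {} while (0)
#endif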

On the other hand, currently we lack sufficient checks for the guest,
because there could be malicious guests. We plan to fix this in the next
release.

[...]



[dpdk-dev] [PATCH v2 0/2] i40e: Enlarge the number of supported queues

2015-10-22 Thread Helin Zhang
It enlarges the number of supported queues to the hardware allowed
maximum. There was a software limitation of 64 per physical port,
which is not reasonable.

v2 changes:
Fixed issues of using wrong configured number of VF queues.

Helin Zhang (2):
  i40e: adjust the number of queues for RSS
  i40e: Enlarge the number of supported queues

 config/common_bsdapp  |   3 +-
 config/common_linuxapp|   3 +-
 drivers/net/i40e/i40e_ethdev.c| 146 --
 drivers/net/i40e/i40e_ethdev.h|   8 +++
 drivers/net/i40e/i40e_ethdev_vf.c |   2 +-
 5 files changed, 74 insertions(+), 88 deletions(-)

-- 
1.9.3



[dpdk-dev] [PATCH v2 1/2] i40e: adjust the number of queues for RSS

2015-10-22 Thread Helin Zhang
It adjusts the number of queues for RSS from a power of 2 to any number,
as long as it does not exceed the hardware allowed maximum. For example,
with 6 Rx queues configured, the old i40e_align_floor() logic used only
4 queues for RSS; with this change all 6 are used.

Signed-off-by: Helin Zhang 
---
 drivers/net/i40e/i40e_ethdev.c| 8 
 drivers/net/i40e/i40e_ethdev_vf.c | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 2dd9fdc..4b70588 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -5153,12 +5153,12 @@ i40e_pf_config_rss(struct i40e_pf *pf)
 * If both VMDQ and RSS enabled, not all of PF queues are configured.
 * It's necessary to calulate the actual PF queues that are configured.
 */
-   if (pf->dev_data->dev_conf.rxmode.mq_mode & ETH_MQ_RX_VMDQ_FLAG) {
+   if (pf->dev_data->dev_conf.rxmode.mq_mode & ETH_MQ_RX_VMDQ_FLAG)
num = i40e_pf_calc_configured_queues_num(pf);
-   num = i40e_align_floor(num);
-   } else
-   num = i40e_align_floor(pf->dev_data->nb_rx_queues);
+   else
+   num = pf->dev_data->nb_rx_queues;

+   num = RTE_MIN(num, I40E_MAX_Q_PER_TC);
PMD_INIT_LOG(INFO, "Max of contiguous %u PF queues are configured",
num);

diff --git a/drivers/net/i40e/i40e_ethdev_vf.c b/drivers/net/i40e/i40e_ethdev_vf.c
index b694400..b15ff7b 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -1915,7 +1915,7 @@ i40evf_config_rss(struct i40e_vf *vf)
return 0;
}

-   num = i40e_align_floor(vf->dev_data->nb_rx_queues);
+   num = RTE_MIN(vf->dev_data->nb_rx_queues, I40E_MAX_QP_NUM_PER_VF);
/* Fill out the look up table */
for (i = 0, j = 0; i < nb_q; i++, j++) {
if (j >= num)
-- 
1.9.3



[dpdk-dev] [PATCH v2 2/2] i40e: Enlarge the number of supported queues

2015-10-22 Thread Helin Zhang
It enlarges the number of supported queues to the hardware allowed
maximum. There was a software limitation of 64 per physical port,
which is not reasonable.

Signed-off-by: Helin Zhang 
---
 config/common_bsdapp   |   3 +-
 config/common_linuxapp |   3 +-
 drivers/net/i40e/i40e_ethdev.c | 138 +
 drivers/net/i40e/i40e_ethdev.h |   8 +++
 4 files changed, 69 insertions(+), 83 deletions(-)

v2 changes:
Fixed issues of using wrong configured number of VF queues

diff --git a/config/common_bsdapp b/config/common_bsdapp
index b37dcf4..dac6dad 100644
--- a/config/common_bsdapp
+++ b/config/common_bsdapp
@@ -141,7 +141,7 @@ CONFIG_RTE_LIBRTE_KVARGS=y
 CONFIG_RTE_LIBRTE_ETHER=y
 CONFIG_RTE_LIBRTE_ETHDEV_DEBUG=n
 CONFIG_RTE_MAX_ETHPORTS=32
-CONFIG_RTE_MAX_QUEUES_PER_PORT=256
+CONFIG_RTE_MAX_QUEUES_PER_PORT=1024
 CONFIG_RTE_LIBRTE_IEEE1588=n
 CONFIG_RTE_ETHDEV_QUEUE_STAT_CNTRS=16
 CONFIG_RTE_ETHDEV_RXTX_CALLBACKS=y
@@ -187,6 +187,7 @@ CONFIG_RTE_LIBRTE_I40E_DEBUG_TX_FREE=n
 CONFIG_RTE_LIBRTE_I40E_DEBUG_DRIVER=n
 CONFIG_RTE_LIBRTE_I40E_RX_ALLOW_BULK_ALLOC=y
 CONFIG_RTE_LIBRTE_I40E_16BYTE_RX_DESC=n
+CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_PF=64
 CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_VF=4
 CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_VM=4
 # interval up to 8160 us, aligned to 2 (or default value)
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 0de43d5..2ce8d66 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -139,7 +139,7 @@ CONFIG_RTE_LIBRTE_KVARGS=y
 CONFIG_RTE_LIBRTE_ETHER=y
 CONFIG_RTE_LIBRTE_ETHDEV_DEBUG=n
 CONFIG_RTE_MAX_ETHPORTS=32
-CONFIG_RTE_MAX_QUEUES_PER_PORT=256
+CONFIG_RTE_MAX_QUEUES_PER_PORT=1024
 CONFIG_RTE_LIBRTE_IEEE1588=n
 CONFIG_RTE_ETHDEV_QUEUE_STAT_CNTRS=16
 CONFIG_RTE_ETHDEV_RXTX_CALLBACKS=y
@@ -185,6 +185,7 @@ CONFIG_RTE_LIBRTE_I40E_DEBUG_TX_FREE=n
 CONFIG_RTE_LIBRTE_I40E_DEBUG_DRIVER=n
 CONFIG_RTE_LIBRTE_I40E_RX_ALLOW_BULK_ALLOC=y
 CONFIG_RTE_LIBRTE_I40E_16BYTE_RX_DESC=n
+CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_PF=64
 CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_VF=4
 CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_VM=4
 # interval up to 8160 us, aligned to 2 (or default value)
diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 4b70588..8928b0a 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -2240,113 +2240,88 @@ i40e_pf_parameter_init(struct rte_eth_dev *dev)
 {
struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
struct i40e_hw *hw = I40E_PF_TO_HW(pf);
-   uint16_t sum_queues = 0, sum_vsis, left_queues;
+   uint16_t qp_count = 0, vsi_count = 0;

-   /* First check if FW support SRIOV */
if (dev->pci_dev->max_vfs && !hw->func_caps.sr_iov_1_1) {
PMD_INIT_LOG(ERR, "HW configuration doesn't support SRIOV");
return -EINVAL;
}

pf->flags = I40E_FLAG_HEADER_SPLIT_DISABLED;
-   pf->max_num_vsi = RTE_MIN(hw->func_caps.num_vsis, I40E_MAX_NUM_VSIS);
-   PMD_INIT_LOG(INFO, "Max supported VSIs:%u", pf->max_num_vsi);
-   /* Allocate queues for pf */
-   if (hw->func_caps.rss) {
-   pf->flags |= I40E_FLAG_RSS;
-   pf->lan_nb_qps = RTE_MIN(hw->func_caps.num_tx_qp,
-   (uint32_t)(1 << hw->func_caps.rss_table_entry_width));
-   pf->lan_nb_qps = i40e_align_floor(pf->lan_nb_qps);
-   } else
+   pf->max_num_vsi = hw->func_caps.num_vsis;
+   pf->lan_nb_qp_max = RTE_LIBRTE_I40E_QUEUE_NUM_PER_PF;
+   pf->vmdq_nb_qp_max = RTE_LIBRTE_I40E_QUEUE_NUM_PER_VM;
+   pf->vf_nb_qp_max = RTE_LIBRTE_I40E_QUEUE_NUM_PER_VF;
+
+   /* FDir queue/VSI allocation */
+   pf->fdir_qp_offset = 0;
+   if (hw->func_caps.fd) {
+   pf->flags |= I40E_FLAG_FDIR;
+   pf->fdir_nb_qps = I40E_DEFAULT_QP_NUM_FDIR;
+   } else {
+   pf->fdir_nb_qps = 0;
+   }
+   qp_count += pf->fdir_nb_qps;
+   vsi_count += 1;
+
+   /* LAN queue/VSI allocation */
+   pf->lan_qp_offset = pf->fdir_qp_offset + pf->fdir_nb_qps;
+   if (!hw->func_caps.rss) {
pf->lan_nb_qps = 1;
-   sum_queues = pf->lan_nb_qps;
-   /* Default VSI is not counted in */
-   sum_vsis = 0;
-   PMD_INIT_LOG(INFO, "PF queue pairs:%u", pf->lan_nb_qps);
+   } else {
+   pf->flags |= I40E_FLAG_RSS;
+   pf->lan_nb_qps = pf->lan_nb_qp_max;
+   }
+   qp_count += pf->lan_nb_qps;
+   vsi_count += 1;

+   /* VF queue/VSI allocation */
+   pf->vf_qp_offset = pf->lan_qp_offset + pf->lan_nb_qps;
if (hw->func_caps.sr_iov_1_1 && dev->pci_dev->max_vfs) {
pf->flags |= I40E_FLAG_SRIOV;
pf->vf_nb_qps = RTE_LIBRTE_I40E_QUEUE_NUM_PER_VF;
-   if (dev->pci_dev->max_vfs > hw->func_caps.num_vfs) {
-   PMD_INIT_LOG(ERR, "Config VF number %u, "
-"max sup

[dpdk-dev] [PATCH v2] ixgbe: Drop flow control frames from VFs

2015-10-22 Thread Wenzhuo Lu
This patch prevents flow control frames from being transmitted
from VSIs.
With this patch in place a malicious VF cannot send flow control
or PFC packets out on the wire.

V2:
Reword the comments.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/ixgbe_pf.c | 43 +++
 1 file changed, 43 insertions(+)

diff --git a/drivers/net/ixgbe/ixgbe_pf.c b/drivers/net/ixgbe/ixgbe_pf.c
index fd1c4ca..b33f4e9 100644
--- a/drivers/net/ixgbe/ixgbe_pf.c
+++ b/drivers/net/ixgbe/ixgbe_pf.c
@@ -55,6 +55,7 @@
 #define IXGBE_MAX_VFTA (128)
 #define IXGBE_VF_MSG_SIZE_DEFAULT 1
 #define IXGBE_VF_GET_QUEUE_MSG_SIZE 5
+#define IXGBE_ETHERTYPE_FLOW_CTRL 0x8808

 static inline uint16_t
 dev_num_vf(struct rte_eth_dev *eth_dev)
@@ -166,6 +167,46 @@ void ixgbe_pf_host_uninit(struct rte_eth_dev *eth_dev)
*vfinfo = NULL;
 }

+static void
+ixgbe_add_tx_flow_control_drop_filter(struct rte_eth_dev *eth_dev)
+{
+   struct ixgbe_hw *hw =
+   IXGBE_DEV_PRIVATE_TO_HW(eth_dev->data->dev_private);
+   struct ixgbe_filter_info *filter_info =
+   IXGBE_DEV_PRIVATE_TO_FILTER_INFO(eth_dev->data->dev_private);
+   uint16_t vf_num;
+   int i;
+
+   /* occupy an entry of the ethertype filter */
+   for (i = 0; i < IXGBE_MAX_ETQF_FILTERS; i++) {
+   if (!(filter_info->ethertype_mask & (1 << i))) {
+   filter_info->ethertype_mask |= 1 << i;
+   filter_info->ethertype_filters[i] =
+   IXGBE_ETHERTYPE_FLOW_CTRL;
+   break;
+   }
+   }
+   if (i == IXGBE_MAX_ETQF_FILTERS) {
+   RTE_LOG(ERR, PMD, "Cannot find an unused ether type filter"
+   " entity for flow control.\n");
+   return;
+   }
+
+   if (hw->mac.ops.set_ethertype_anti_spoofing) {
+   IXGBE_WRITE_REG(hw, IXGBE_ETQF(i),
+   (IXGBE_ETQF_FILTER_EN |
+   IXGBE_ETQF_TX_ANTISPOOF |
+   IXGBE_ETHERTYPE_FLOW_CTRL));
+
+   vf_num = dev_num_vf(eth_dev);
+   for (i = 0; i < vf_num; i++) {
+   hw->mac.ops.set_ethertype_anti_spoofing(hw, true, i);
+   }
+   }
+
+   return;
+}
+
 int ixgbe_pf_host_configure(struct rte_eth_dev *eth_dev)
 {
uint32_t vtctl, fcrth;
@@ -262,6 +303,8 @@ int ixgbe_pf_host_configure(struct rte_eth_dev *eth_dev)
IXGBE_WRITE_REG(hw, IXGBE_FCRTH_82599(i), fcrth);
}

+   ixgbe_add_tx_flow_control_drop_filter(eth_dev);
+
return 0;
 }

-- 
1.9.3



[dpdk-dev] ixgbe: ierrors counter spuriously increasing in DPDK 2.1

2015-10-22 Thread Martin Weiser
Hi Andriy,

thank you for pointing this discussion out to me. I somehow missed it.
Unfortunately it looks like the discussion stopped after Maryam made a
good proposal, so I will add my vote on that and hopefully get things started
again.

Best regards,
Martin



On 21.10.15 17:53, Andriy Berestovskyy wrote:
> Yes Marcin,
> The issue was discussed here:
> http://dpdk.org/ml/archives/dev/2015-September/023229.html
>
> You can either fix the ierrors in ixgbe_dev_stats_get() or implement a
> workaround in your app getting the extended statistics and counting
> out some of the extended counters from the ierrors.
>
> Here is an example:
> https://github.com/Juniper/contrail-vrouter/commit/72f6ca05ac81d0ca5e7eb93c6ffe7a93648c2b00#diff-99c1f65a00658c7d38b3d1b64cb5fd93R1306
>
> Regards,
> Andriy
>
> On Wed, Oct 21, 2015 at 10:38 AM, Martin Weiser
>  wrote:
>> Hi,
>>
>> with DPDK 2.1 we are seeing the ierrors counter increasing for 82599ES
>> ports without reason. Even directly after starting test-pmd the error
>> counter immediately is 1 without even a single packet being sent to the
>> device:
>>
>> ./testpmd -c 0xfe -n 4 -- --portmask 0x3 --interactive
>> ...
>> testpmd> show port stats all
>>
>>    NIC statistics for port 0  
>> 
>>   RX-packets: 0  RX-missed: 0  RX-bytes:  0
>>   RX-badcrc:  0  RX-badlen: 0  RX-errors: 1
>>   RX-nombuf:  0
>>   TX-packets: 0  TX-errors: 0  TX-bytes:  0
>>   
>> 
>>
>>    NIC statistics for port 1  
>> 
>>   RX-packets: 0  RX-missed: 0  RX-bytes:  0
>>   RX-badcrc:  0  RX-badlen: 0  RX-errors: 1
>>   RX-nombuf:  0
>>   TX-packets: 0  TX-errors: 0  TX-bytes:  0
>>   
>> 
>>
>>
>> When packet forwarding is started the ports perform normally and
>> properly forward all packets but a huge number of ierrors is counted:
>>
>> testpmd> start
>> ...
>> testpmd> show port stats all
>>
>>    NIC statistics for port 0  
>> 
>>   RX-packets: 9011857RX-missed: 0  RX-bytes:  5020932992
>>   RX-badcrc:  0  RX-badlen: 0  RX-errors: 9011753
>>   RX-nombuf:  0
>>   TX-packets: 9026250TX-errors: 0  TX-bytes:  2922375542
>>   
>> 
>>
>>    NIC statistics for port 1  
>> 
>>   RX-packets: 9026250RX-missed: 0  RX-bytes:  2922375542
>>   RX-badcrc:  0  RX-badlen: 0  RX-errors: 9026138
>>   RX-nombuf:  0
>>   TX-packets: 9011857TX-errors: 0  TX-bytes:  5020932992
>>   
>> 
>>
>>
>> When running the exact same test with DPDK version 2.0 no ierrors are
>> reported.
>> Is anyone else seeing strange ierrors being reported for Intel Niantic
>> cards with DPDK 2.1?
>>
>> Best regards,
>> Martin
>>
>
>
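
A sketch of the workaround Andriy describes above, reading the extended
statistics and subtracting one counter from ierrors (the exact counter name
depends on the PMD; list the xstats first to find the one to exclude):

#include <string.h>
#include <rte_ethdev.h>

/* Return ierrors with one named xstats counter subtracted out. */
static uint64_t
cooked_ierrors(uint8_t port_id, const char *skip_name)
{
	struct rte_eth_stats stats;
	struct rte_eth_xstats xstats[128];
	uint64_t skip = 0;
	int i, n;

	rte_eth_stats_get(port_id, &stats);
	n = rte_eth_xstats_get(port_id, xstats, 128);
	for (i = 0; i < n; i++)
		if (strcmp(xstats[i].name, skip_name) == 0)
			skip = xstats[i].value;

	/* Guard against the counter exceeding ierrors. */
	return stats.ierrors >= skip ? stats.ierrors - skip : 0;
}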




[dpdk-dev] [PATCH v4 0/2] e1000: enable igb TSO support

2015-10-22 Thread Lu, Wenzhuo
Hi,

> -Original Message-
> From: Wang, Xiao W
> Sent: Wednesday, October 21, 2015 3:55 PM
> To: dev at dpdk.org
> Cc: Lu, Wenzhuo; Richardson, Bruce; Zhang, Helin; Wang, Xiao W
> Subject: [PATCH v4 0/2] e1000: enable igb TSO support
> 
> v4:
> * Added ULL postfix to compare mask of igb_tx_offload.
> 
> v3:
> * Removed the "unlikely" in check_tso_para function, for there was no
> obvious performance
>   difference, let the branch predictor do the job.
> 
> v2:
> * Reworded the old comment about union igb_vlan_macip which was no
> more used.
> 
> * Corrected typo in line "There're some limitaions in hardware for TCP
> segmentaion offload".
> 
> * Added "unlikely" in check_tso_para function.
> 
> v1:
> * Initial version for igb TSO feature.
> 
> Wang Xiao W (2):
>   e1000: enable igb TSO support
>   doc: update release note for igb TSO support
> 
>  doc/guides/rel_notes/release_2_2.rst |   4 +
>  drivers/net/e1000/igb_ethdev.c   |   6 +-
>  drivers/net/e1000/igb_rxtx.c | 200 
> +--
>  3 files changed, 131 insertions(+), 79 deletions(-)
> 
> --
> 1.9.3
Acked-by: Wenzhuo Lu 


[dpdk-dev] i40e: problem with rx packet drops not accounted in statistics

2015-10-22 Thread Martin Weiser
Hi Helin,

good to know that there is work being done on that issue.
By performance problem I mean that these packet discards start to
appear at low bandwidths where I would not expect any packets to be
dropped. On the same system we can reach higher bandwidths using ixgbe
NICs without losing a single packet, so seeing packets being lost at
only ~5GBit/s and ~1.5Mpps on a 40Gb adapter worries me a bit.

Best regards,
Martin


On 22.10.15 02:16, Zhang, Helin wrote:
> Hi Martin
>
> Yes, we have a developer working on it now, and hopefully he will have 
> something soon later on this fix.
> But what do you mean the performance problem? Did you mean the performance 
> number is not good as expected, or else?
>
> Regards,
> Helin
>
>> -Original Message-
>> From: Martin Weiser [mailto:martin.weiser at allegro-packets.com]
>> Sent: Wednesday, October 21, 2015 4:44 PM
>> To: Zhang, Helin
>> Cc: dev at dpdk.org
>> Subject: Re: i40e: problem with rx packet drops not accounted in statistics
>>
>> Hi Helin,
>>
>> any news on this issue? By the way this is not just a problem with 
>> statistics for us
>> but also a performance problem since these packet discards start appearing 
>> at a
>> relatively low bandwidth (~5GBit/s and ~1.5Mpps).
>>
>> Best regards,
>> Martin
>>
>> On 10.09.15 03:09, Zhang, Helin wrote:
>>> Hi Martin
>>>
>>> Yes, the statistics issue has been reported several times recently.
>>> We will check the issue and try to fix it or get a workaround soon. Thank 
>>> you
>> very much!
>>> Regards,
>>> Helin
>>>
 -Original Message-
 From: Martin Weiser [mailto:martin.weiser at allegro-packets.com]
 Sent: Wednesday, September 9, 2015 7:58 PM
 To: Zhang, Helin
 Cc: dev at dpdk.org
 Subject: i40e: problem with rx packet drops not accounted in
 statistics

 Hi Helin,

 in one of our test setups involving i40e adapters we are experiencing
 packet drops which are not reflected in the interfaces statistics.
 The call to rte_eth_stats_get suggests that all packets were properly
 received but the total number of packets received through
 rte_eth_rx_burst is less than the ipackets counter.
 When for example running the l2fwd application (l2fwd -c 0xfe -n 4 --
 -p
 0x3) and having driver debug messages enabled the following output is
 generated for the interface in question:

 ...
 PMD: i40e_update_vsi_stats(): * VSI[6] stats start
 ***
 PMD: i40e_update_vsi_stats(): rx_bytes:24262434
 PMD: i40e_update_vsi_stats(): rx_unicast:  16779
 PMD: i40e_update_vsi_stats(): rx_multicast:0
 PMD: i40e_update_vsi_stats(): rx_broadcast:0
 PMD: i40e_update_vsi_stats(): rx_discards: 1192557
 PMD: i40e_update_vsi_stats(): rx_unknown_protocol: 0
 PMD: i40e_update_vsi_stats(): tx_bytes:0
 PMD: i40e_update_vsi_stats(): tx_unicast:  0
 PMD: i40e_update_vsi_stats(): tx_multicast:0
 PMD: i40e_update_vsi_stats(): tx_broadcast:0
 PMD: i40e_update_vsi_stats(): tx_discards: 0
 PMD: i40e_update_vsi_stats(): tx_errors:   0
 PMD: i40e_update_vsi_stats(): * VSI[6] stats end
 ***
 PMD: i40e_dev_stats_get(): * PF stats start
 ***
 PMD: i40e_dev_stats_get(): rx_bytes:24262434
 PMD: i40e_dev_stats_get(): rx_unicast:  16779
 PMD: i40e_dev_stats_get(): rx_multicast:0
 PMD: i40e_dev_stats_get(): rx_broadcast:0
 PMD: i40e_dev_stats_get(): rx_discards: 0
 PMD: i40e_dev_stats_get(): rx_unknown_protocol: 16779
 PMD: i40e_dev_stats_get(): tx_bytes:0
 PMD: i40e_dev_stats_get(): tx_unicast:  0
 PMD: i40e_dev_stats_get(): tx_multicast:0
 PMD: i40e_dev_stats_get(): tx_broadcast:0
 PMD: i40e_dev_stats_get(): tx_discards: 0
 PMD: i40e_dev_stats_get(): tx_errors:   0
 PMD: i40e_dev_stats_get(): tx_dropped_link_down: 0
 PMD: i40e_dev_stats_get(): crc_errors:   0
 PMD: i40e_dev_stats_get(): illegal_bytes:0
 PMD: i40e_dev_stats_get(): error_bytes:  0
 PMD: i40e_dev_stats_get(): mac_local_faults: 1
 PMD: i40e_dev_stats_get(): mac_remote_faults:1
 PMD: i40e_dev_stats_get(): rx_length_errors: 0
 PMD: i40e_dev_stats_get(): link_xon_rx:  0
 PMD: i40e_dev_stats_get(): link_xoff_rx: 0
 PMD: i40e_dev_stats_get(): priority_xon_rx[0]:  0
 PMD: i40e_dev_stats_get(): priority_xoff_rx[0]: 0
 PMD: i40e_dev_stats_get(): priority_xon_rx[1]:  0
 PMD: i40e_dev_stats_get(): priority_xoff_rx[1]: 0
 PMD: i40e_dev_stats_get(): priority_xon_rx[2]:  0
 PMD: i40e_dev_stats_get(): priority_xoff_rx[2]: 0
>>

[dpdk-dev] volunteer to be the maintainer of driver/net/intel sub-tree

2015-10-22 Thread Thomas Monjalon
Hi Wenzhuo,

2015-10-22 02:49, Lu, Wenzhuo:
> Hi all,
> Following the discussion of DPDK user space and the maintenance of 
> development sub-trees, I'd like to volunteer myself to be the maintainer of 
> sub-tree driver/net/intel. It includes all the PMD of Intel NICs. And Helin 
> can be my backup.

Thanks for proposing.
You are already doing part of the work being maintainer of e1000,
and Helin for ixgbe and i40e.

> I suggest we create a new directory to move the driver/net/e1000, 
> driver/net/fm10k... to it. And we can also create directories for other 
> vendors just like the kernel drivers do.

We don't need to move files to be able to manage them in a sub-tree.
For the day to day tasks, it's better to limit directory depth.
And think about what happened with Broadcom and Qlogic, we are not going
to move files when Intel will buy the NIC xyz.
Generally speaking, it's better to keep company names outside of technical 
things.

> Additionally, as we observed, some patch sets will not only change the files 
> in drivers/net, but also some files in lib/librte_ether, doc, app, 
> examples... Only being drivers/net/intel maintainer cannot work for these 
> patch sets, especially for the new features. Applying partial feature patch 
> set is not ideal. Ideally we need a maintainer to drive the RTE_ETHER 
> discussion. Maybe Bruce can be a top-level maintainer. So, he can help when 
> we face this scenario.

A sub-tree is not restricted to some directories. It must manage an
expertise zone, a technical domain, an area of interest, choose your words ;)

Today we have no working sub-tree. So we should start splitting areas in
some large grain and make it work. Then we can split more with a top down
approach.
So I think we should first create the subtree for networking drivers and wait
a little before having a subtree for Intel NICs.

Do you agree?


[dpdk-dev] ixgbe: account more Rx errors Issue

2015-10-22 Thread Martin Weiser
On 14.09.15 11:50, Tahhan, Maryam wrote:
>> From: Kyle Larose [mailto:eomereadig at gmail.com] 
>> Sent: Wednesday, September 9, 2015 6:43 PM
>> To: Tahhan, Maryam
>> Cc: Olivier MATZ; Andriy Berestovskyy; dev at dpdk.org
>> Subject: Re: [dpdk-dev] ixgbe: account more Rx errors Issue
>>
>>
>> On Mon, Sep 7, 2015 at 7:44 AM, Tahhan, Maryam  
>> wrote:
>>> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
>>> Sent: Monday, September 7, 2015 9:30 AM
>>> To: Tahhan, Maryam; Andriy Berestovskyy
>>> Cc: dev at dpdk.org
>>> Subject: Re: ixgbe: account more Rx errors Issue
>>>
>>> Hi,
>>>
>>> On 09/06/2015 07:15 PM, Tahhan, Maryam wrote:
> From: Andriy Berestovskyy [mailto:aber at semihalf.com]
> Sent: Friday, September 4, 2015 5:59 PM
> To: Tahhan, Maryam
> Cc: dev at dpdk.org; Olivier MATZ
> Subject: Re: ixgbe: account more Rx errors Issue
>
> Hi Maryam,
> Please see below.
>
>> XEC counts the Number of receive IPv4, TCP, UDP or SCTP XSUM errors
> Please note than UDP checksum is optional for IPv4, but UDP packets
> with zero checksum hit XEC.
>
 I understand, but this is what the hardware register is picking up and 
 what I
>>> included previously is the definitions of the registers from the datasheet.
>> And general crc errors counts Counts the number of receive packets
>> with
> CRC errors.
>
> Let me explain you with an example.
>
> DPDK 2.0 behavior:
> host A sends 10M IPv4 UDP packets (no checksum) to host B host B
> stats: 9M ipackets + 1M ierrors (missed) = 10M
>
> DPDK 2.1 behavior:
> host A sends 10M IPv4 UDP packets (no checksum) to host B host B
> stats: 9M ipackets + 11M in ierrors (1M missed + 10M XEC) = 20M?
 Because it's hitting the 2 error registers. If you had packets with 
 multiple
>>> errors that are added up as part of ierrors you'll still be getting more 
>>> than
>>> 10M errors which is why I asked for feedback on the 3 suggestions below.
>>> What I'm saying is the number of errors being > the number of received
>>> packets will be seen if you hit multiple error registers on the NIC.
>> So our options are we can:
>> 1. Add only one of these into the error stats.
>> 2. We can introduce some cooking of stats in this scenario, so only
>> add
> either or if they are equal or one is higher than the other.
>> 3. Add them all which means you can have more errors than the number
>> of
> received packets, but TBH this is going to be the case if your
> packets have multiple errors anyway.
>
> 4. ierrors should reflect NIC drops only.
 I may have misinterpreted this, but ierrors in rte_ethdev.h ierrors is 
 defined
>>> as the Total number of erroneous received packets.
 Maybe we need a clear definition or a separate drop counter as I see
>>> uint64_t q_errors defined as: Total number of queue packets received that
>>> are dropped.
> XEC does not count drops, so IMO it should be removed from ierrors.
 While it's picking up the 0 checksum as an error (which it shouldn't
 necessarily be doing), removing it could mean missing other valid
 L3/L4 checksum errors... Let me experiment some more with L3/L4
 checksum errors and crcerrs to see if we can cook the stats around
 this register in particular. I would hate to remove it and miss
 genuine errors
>>> For me, the definition that looks the most straightforward is:
>>>
>>>  ipackets = packets successfully received by hardware
>>>  imissed  = packets dropped by hardware because the software does
>>>             not poll fast enough (= queue full)
>>>  ierrors  = packets dropped by hardware (malformed packets, ...)
>>>
>>> These 3 stats never count twice the same packet.
>>>
>>> If we want more statistics, they could go in xstats. For instance, a 
>>> counter for
>>> invalid checksum. The definition of these stats would be pmd-specific.
>>>
>>> I agree we should clarify and have a consensus on the definitions before 
>>> going
>>> further.
>>>
>>>
>>> Regards,
>>> Olivier
>> Hi Olivier
>> I think it's important to distinguish between errors and drops and provide a 
>> statistics API that exposes both. This way people have access to as much 
>> information as possible when things do go wrong and nothing is missed in 
>> terms of errors.
>>
>> My suggestion for the high level registers would be:
>> ipackets = Total number of packets successfully received by hardware
>> imissed = Total number of  packets dropped by hardware because the software 
>> does not poll fast enough (= queue full)
>> idrops = Total number of packets dropped by hardware (malformed packets, 
>> ...) Where the # of drops can ONLY be <=  the packets received (without 
>> overlap between registers).
>> ierrors = Total number of erroneous received packets. Where the # of errors 
>> can be >= the packets received (without overlap between registers), this is 
>> because there may be multiple errors ass

[dpdk-dev] [PATCH v3 0/7] Support new flow director modes on Intel x550 NIC

2015-10-22 Thread Ananyev, Konstantin


> -Original Message-
> From: Lu, Wenzhuo
> Sent: Thursday, October 22, 2015 8:12 AM
> To: dev at dpdk.org
> Cc: Ananyev, Konstantin
> Subject: [PATCH v3 0/7] Support new flow director modes on Intel x550 NIC
> 
> This patch set adds 2 new flow director modes on Intel x550 NIC.
> The 2 new fdir modes are mac vlan mode and tunnel mode.
> The mac vlan mode can direct the flow based on the MAC address and VLAN
> TCI.
> The tunnel mode provides the support for VxLAN and NVGRE. x550 can recognize
> VxLAN and NVGRE packets, and direct the packets based on the MAC address,
> VLAN TCI, TNI/VNI.
> Surely, the MAC address, VLAN TCI, TNI/VNI can be masked, so the flow
> can be directed based on the remaining fields. For example, if we want to
> direct the flow based on the MAC address, we can use mac vlan mode with
> the VLAN TCI masked.
> Now, only x550 supports these 2 modes. We should not use the new modes on
> other NICs; if we do, the ports will not be initialized successfully.
> 
> V2:
> Change the word 'cloud' to 'tunnel'.
> Change 'tni_vni' to 'tunnel_id'.
> 
> V3:
> Change the name mac_addr_mask to mac_addr_byte_mask, for some NICs may like
> to support per bit mask in future.
> Set default VxLAN port only when the NIC support VxLAN.
> Make the condition more strict when check the fdir mode for avoiding the code
> being broken with future expansion.
> Make mac mask more flexible.
> Add a new function for MAC VLAN and tunnel mask.
> 
> 
> Wenzhuo Lu (7):
>   lib/librte_ether: modify the structures for fdir new modes
>   app/testpmd: initialize the new fields for fdir mask
>   app/testpmd: new fdir modes for testpmd parameter
>   app/testpmd: modify the output of the CLI show port fdir
>   app/testpmd: modify and add fdir filter and mask CLIs for new modes
>   ixgbe: implementation for fdir new modes' config
>   doc: release notes update for flow director enhancement
> 
>  app/test-pmd/cmdline.c   | 293 
> +--
>  app/test-pmd/config.c|  45 --
>  app/test-pmd/parameters.c|   7 +-
>  app/test-pmd/testpmd.c   |   3 +
>  doc/guides/rel_notes/release_2_2.rst |   3 +
>  drivers/net/ixgbe/ixgbe_ethdev.h |   3 +
>  drivers/net/ixgbe/ixgbe_fdir.c   | 261 +++
>  lib/librte_ether/rte_eth_ctrl.h  |  69 ++---
>  8 files changed, 606 insertions(+), 78 deletions(-)
> 
> --

Acked-by: Konstantin Ananyev 

> 1.9.3



[dpdk-dev] [PATCH v3] mem: command line option to delete hugepage backing files

2015-10-22 Thread Sergio Gonzalez Monroy
On 21/10/2015 17:34, Bruce Richardson wrote:
> On Wed, Oct 21, 2015 at 04:22:45PM +, shesha Sreenivasamurthy (shesha) 
> wrote:
>> When an application using huge-pages crashes or exits, the hugetlbfs
>> backing files are not cleaned up. This is a patch to clean those files.
>> There are multi-process DPDK applications that may benefit from those
>> backing files. Therefore, I have made the cleanup configurable so that an
>> application that does not need those backing files can remove them, thus
>> not changing the current default behavior. The application itself can
>> clean them up; however, the rationale behind DPDK cleaning them up is that
>> DPDK created them and therefore it is better it unlinks them.
>>
>> Signed-off-by: Shesha Sreenivasamurthy 
>> ---
>>   lib/librte_eal/common/eal_common_options.c | 12 
>>   lib/librte_eal/common/eal_internal_cfg.h   |  1 +
>>   lib/librte_eal/common/eal_options.h|  2 ++
>>   lib/librte_eal/linuxapp/eal/eal_memory.c   | 30
>> ++
>>   4 files changed, 45 insertions(+)
>>
> 
>> +static int
>> +unlink_hugepage_files(struct hugepage_file *hugepg_tbl,
>> +unsigned num_hp_info)
>> +{
>> +unsigned socket, size;
>> +int page, nrpages = 0;
>> +
>> +/* get total number of hugepages */
>> +for (size = 0; size < num_hp_info; size++)
>> +for (socket = 0; socket < RTE_MAX_NUMA_NODES; socket++)
>> +nrpages += 
>> internal_config.hugepage_info[size].num_pages[socket];
>> +
>> +for (page = 0; page < nrpages; page++) {
>> +struct hugepage_file *hp = &hugepg_tbl[page];
>> +if (hp->final_va != NULL && unlink(hp->filepath)) {
>> +RTE_LOG(WARNING, EAL, "%s(): Removing %s failed: %s\n",
>> +__func__, hp->filepath, strerror(errno));
>> +}
>> +}
>> +return 0;
>> +}
>> +
>>   /*
>>* unmaps hugepages that are not going to be used. since we originally
>> allocate
>>* ALL hugepages (not just those we need), additional unmapping needs to
>> be done.
>> @@ -1289,6 +1311,14 @@ rte_eal_hugepage_init(void)
>>  goto fail;
>>  }
>>   
>> +/* free the hugepage backing files */
>> +if (internal_config.hugepage_unlink &&
>> +unlink_hugepage_files(tmp_hp,
>> +internal_config.num_hugepage_sizes) < 0) {
>> +RTE_LOG(ERR, EAL, "Unlinking hugepage backing files 
>> failed!\n");
>> +goto fail;
>> +}
>> +
> Sorry for the late comment, but...
>
> Rather than adding a whole new function to be called here, can the same effect
> not be got by adding in 2/3 lines like:
>   if (internal_config.hugepage_unlink)
>   unlink(hugetlb[i].filepath)
>
> at line 409 of eal_memory.c where were have done our final mmap of the file.
> [You also need the same couple of lines for the 32-bit special case at line 
> 351].
> It would be a shorter diff.
>
> /Bruce
If you wanted to avoid the extra function call, it might be cleaner to
just unlink all files when doing unmap_all_hugepages_orig.
My two cents: I think it would be easier to read/debug having a function
that "unlinks files" instead of unlinking files at different points in
map_all_hugepages.

Unfortunately the proposed approach does not work for all cases:
- If we have a single file segment, map_all_hugepages does not get called a
   second time; instead we call remap_all_hugepages.
- If we use options -m or --socket-mem, unmap_unneeded_hugepages does not
   expect files to be already unlinked, so it will fail when trying to
   unlink unneeded hugepage files.

The current patch would work as we only unlink after 
unmap_unneeded_hugepages.

Sergio



[dpdk-dev] [PATCH v4] mem: command line option to delete hugepage backing files

2015-10-22 Thread Sergio Gonzalez Monroy
On 21/10/2015 18:21, shesha Sreenivasamurthy (shesha) wrote:
> When an application using huge-pages crashes or exits, the hugetlbfs
> backing files are not cleaned up. This is a patch to clean those files.
> There are multi-process DPDK applications that may benefit from those
> backing files. Therefore, I have made the cleanup configurable so that an
> application that does not need those backing files can remove them, thus
> not changing the current default behavior. The application itself can
> clean them up; however, the rationale behind DPDK cleaning them up is that
> DPDK created them and therefore it is better it unlinks them.
>
>
> Signed-off-by: Shesha Sreenivasamurthy 
> ---
>   lib/librte_eal/common/eal_common_options.c | 12 
>   lib/librte_eal/common/eal_internal_cfg.h   |  1 +
>   lib/librte_eal/common/eal_options.h|  2 ++
>   lib/librte_eal/linuxapp/eal/eal_memory.c   | 12 
>   4 files changed, 27 insertions(+)
>
> diff --git a/lib/librte_eal/common/eal_common_options.c
> b/lib/librte_eal/common/eal_common_options.c
> index 1f459ac..5fe6374 100644
> --- a/lib/librte_eal/common/eal_common_options.c
> +++ b/lib/librte_eal/common/eal_common_options.c
> @@ -79,6 +79,7 @@ eal_long_options[] = {
>   {OPT_MASTER_LCORE,  1, NULL, OPT_MASTER_LCORE_NUM },
>   {OPT_NO_HPET,   0, NULL, OPT_NO_HPET_NUM  },
>   {OPT_NO_HUGE,   0, NULL, OPT_NO_HUGE_NUM  },
> + {OPT_HUGE_UNLINK,   0, NULL, OPT_HUGE_UNLINK_NUM  },
>   {OPT_NO_PCI,0, NULL, OPT_NO_PCI_NUM   },
>   {OPT_NO_SHCONF, 0, NULL, OPT_NO_SHCONF_NUM},
>   {OPT_PCI_BLACKLIST, 1, NULL, OPT_PCI_BLACKLIST_NUM},
> @@ -722,6 +723,10 @@ eal_parse_common_option(int opt, const char *optarg,
>   conf->no_hugetlbfs = 1;
>   break;
>   
> + case OPT_HUGE_UNLINK_NUM:
> + conf->hugepage_unlink = 1;
> + break;
> +
>   case OPT_NO_PCI_NUM:
>   conf->no_pci = 1;
>   break;
> @@ -856,6 +861,12 @@ eal_check_common_options(struct internal_config
> *internal_cfg)
>   return -1;
>   }
>   
> + if (internal_cfg->no_hugetlbfs && internal_cfg->hugepage_unlink) {
> + RTE_LOG(ERR, EAL, "Option --"OPT_HUGE_UNLINK" cannot "
> + "be specified together with --"OPT_NO_HUGE"\n");
> + return -1;
> + }
> +
>   if (rte_eal_devargs_type_count(RTE_DEVTYPE_WHITELISTED_PCI) != 0 &&
>   rte_eal_devargs_type_count(RTE_DEVTYPE_BLACKLISTED_PCI) != 0) {
>   RTE_LOG(ERR, EAL, "Options blacklist (-b) and whitelist (-w) "
> @@ -906,6 +917,7 @@ eal_common_usage(void)
>  "  -h, --help  This help\n"
>  "\nEAL options for DEBUG use only:\n"
>  "  --"OPT_NO_HUGE"   Use malloc instead of hugetlbfs\n"
> +"  --"OPT_HUGE_UNLINK"   Unlink hugepage backing file after
> initalization\n"
The line above (and a couple more in the patch) is getting wrapped, 
causing checkpatch to
report errors and git failing to apply the patch.
>  "  --"OPT_NO_PCI"Disable PCI\n"
>  "  --"OPT_NO_HPET"   Disable HPET\n"
>  "  --"OPT_NO_SHCONF" No shared config (mmap'd files)\n"
> diff --git a/lib/librte_eal/common/eal_internal_cfg.h
> b/lib/librte_eal/common/eal_internal_cfg.h
> index e2ecb0d..84b075f 100644
> --- a/lib/librte_eal/common/eal_internal_cfg.h
> +++ b/lib/librte_eal/common/eal_internal_cfg.h
> @@ -64,6 +64,7 @@ struct internal_config {
>   volatile unsigned force_nchannel; /**< force number of channels */
>   volatile unsigned force_nrank;/**< force number of ranks */
>   volatile unsigned no_hugetlbfs;   /**< true to disable hugetlbfs */
> + volatile unsigned hugepage_unlink; /** < true to unlink backing files */
>   volatile unsigned xen_dom0_support; /**< support app running on Xen
> Dom0*/
>   volatile unsigned no_pci; /**< true to disable PCI */
>   volatile unsigned no_hpet;/**< true to disable HPET */
> diff --git a/lib/librte_eal/common/eal_options.h
> b/lib/librte_eal/common/eal_options.h
> index f6714d9..745f38c 100644
> --- a/lib/librte_eal/common/eal_options.h
> +++ b/lib/librte_eal/common/eal_options.h
> @@ -63,6 +63,8 @@ enum {
>   OPT_PROC_TYPE_NUM,
>   #define OPT_NO_HPET   "no-hpet"
>   OPT_NO_HPET_NUM,
> +#define OPT_HUGE_UNLINK"huge-unlink"
> + OPT_HUGE_UNLINK_NUM,
>   #define OPT_NO_HUGE   "no-huge"
>   OPT_NO_HUGE_NUM,
>   #define OPT_NO_PCI"no-pci"
> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c
> b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index ac2745e..c6f383b 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -348,6 +348,12 @@ map_all_hugepages(struct hugepage_file *hugepg_tbl,
> 

[dpdk-dev] [PATCH v7 2/8] vhost-user: add VHOST_USER_GET_QUEUE_NUM message

2015-10-22 Thread Xie, Huawei
On 10/21/2015 11:48 AM, Yuanhan Liu wrote:
> To tell the frontend (qemu) how many queue pairs we support.
>
> And it is initiated to VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX.
s/initiated/initialized/


[dpdk-dev] [PATCH v2 01/16] fm10k: add new vPMD file

2015-10-22 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" 

Add new file fm10k_rxtx_vec.c and add it into compiling.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/Makefile |1 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |   45 
 2 files changed, 46 insertions(+), 0 deletions(-)
 create mode 100644 drivers/net/fm10k/fm10k_rxtx_vec.c

diff --git a/drivers/net/fm10k/Makefile b/drivers/net/fm10k/Makefile
index a4a8f56..06ebf83 100644
--- a/drivers/net/fm10k/Makefile
+++ b/drivers/net/fm10k/Makefile
@@ -93,6 +93,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_common.c
 SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_mbx.c
 SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_vf.c
 SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_api.c
+SRCS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k_rxtx_vec.c

 # this lib depends upon:
 DEPDIRS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += lib/librte_eal lib/librte_ether
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c 
b/drivers/net/fm10k/fm10k_rxtx_vec.c
new file mode 100644
index 000..69174d9
--- /dev/null
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -0,0 +1,45 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+
+#include 
+#include 
+#include "fm10k.h"
+#include "base/fm10k_type.h"
+
+#include 
+
+#ifndef __INTEL_COMPILER
+#pragma GCC diagnostic ignored "-Wcast-qual"
+#endif
-- 
1.7.7.6



[dpdk-dev] [PATCH v2 03/16] fm10k: Add a new func to initialize all parameters

2015-10-22 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" 

Add new function fm10k_params_init to initialize all fm10k related
variables.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k_ethdev.c |   34 ++
 1 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 3c7784e..1bc1e7c 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -2066,6 +2066,26 @@ static const struct eth_dev_ops fm10k_eth_dev_ops = {
.rss_hash_conf_get  = fm10k_rss_hash_conf_get,
 };

+static void
+fm10k_params_init(struct rte_eth_dev *dev)
+{
+   struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct fm10k_dev_info *info = FM10K_DEV_PRIVATE_TO_INFO(dev);
+   /* Inialize bus info. Normally we would call fm10k_get_bus_info(), but
+* there is no way to get link status without reading BAR4.  Until this
+* works, assume we have maximum bandwidth.
+* @todo - fix bus info
+*/
+   hw->bus_caps.speed = fm10k_bus_speed_8000;
+   hw->bus_caps.width = fm10k_bus_width_pcie_x8;
+   hw->bus_caps.payload = fm10k_bus_payload_512;
+   hw->bus.speed = fm10k_bus_speed_8000;
+   hw->bus.width = fm10k_bus_width_pcie_x8;
+   hw->bus.payload = fm10k_bus_payload_256;
+
+   info->rx_vec_allowed = true;
+}
+
 static int
 eth_fm10k_dev_init(struct rte_eth_dev *dev)
 {
@@ -2112,18 +2132,8 @@ eth_fm10k_dev_init(struct rte_eth_dev *dev)
return -EIO;
}

-   /*
-* Inialize bus info. Normally we would call fm10k_get_bus_info(), but
-* there is no way to get link status without reading BAR4.  Until this
-* works, assume we have maximum bandwidth.
-* @todo - fix bus info
-*/
-   hw->bus_caps.speed = fm10k_bus_speed_8000;
-   hw->bus_caps.width = fm10k_bus_width_pcie_x8;
-   hw->bus_caps.payload = fm10k_bus_payload_512;
-   hw->bus.speed = fm10k_bus_speed_8000;
-   hw->bus.width = fm10k_bus_width_pcie_x8;
-   hw->bus.payload = fm10k_bus_payload_256;
+   /* Initialize parameters */
+   fm10k_params_init(dev);

/* Initialize the hw */
diag = fm10k_init_hw(hw);
-- 
1.7.7.6



[dpdk-dev] [PATCH v2 05/16] fm10k: add 2 functions to parse pkt_type and offload flag

2015-10-22 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" 

Add 2 functions, in which using SSE instructions to parse RX desc
to get pkt_type and ol_flags in mbuf.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k_rxtx_vec.c |  127 
 1 files changed, 127 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c 
b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 75533f9..581a309 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -44,6 +44,133 @@
 #pragma GCC diagnostic ignored "-Wcast-qual"
 #endif

+/* Handling the offload flags (olflags) field takes computation
+ * time when receiving packets. Therefore we provide a flag to disable
+ * the processing of the olflags field when they are not needed. This
+ * gives improved performance, at the cost of losing the offload info
+ * in the received packet
+ */
+#ifdef RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE
+
+/* Vlan present flag shift */
+#define VP_SHIFT (2)
+/* L3 type shift */
+#define L3TYPE_SHIFT (4)
+/* L4 type shift */
+#define L4TYPE_SHIFT (7)
+
+static inline void
+fm10k_desc_to_olflags_v(__m128i descs[4], struct rte_mbuf **rx_pkts)
+{
+   __m128i ptype0, ptype1, vtag0, vtag1;
+   union {
+   uint16_t e[4];
+   uint64_t dword;
+   } vol;
+
+   const __m128i pkttype_msk = _mm_set_epi16(
+   0x0000, 0x0000, 0x0000, 0x0000,
+   PKT_RX_VLAN_PKT, PKT_RX_VLAN_PKT,
+   PKT_RX_VLAN_PKT, PKT_RX_VLAN_PKT);
+
+   /* mask everything except rss type */
+   const __m128i rsstype_msk = _mm_set_epi16(
+   0x0000, 0x0000, 0x0000, 0x0000,
+   0x000F, 0x000F, 0x000F, 0x000F);
+
+   /* map rss type to rss hash flag */
+   const __m128i rss_flags = _mm_set_epi8(0, 0, 0, 0,
+   0, 0, 0, PKT_RX_RSS_HASH,
+   PKT_RX_RSS_HASH, 0, PKT_RX_RSS_HASH, 0,
+   PKT_RX_RSS_HASH, PKT_RX_RSS_HASH, PKT_RX_RSS_HASH, 0);
+
+   ptype0 = _mm_unpacklo_epi16(descs[0], descs[1]);
+   ptype1 = _mm_unpacklo_epi16(descs[2], descs[3]);
+   vtag0 = _mm_unpackhi_epi16(descs[0], descs[1]);
+   vtag1 = _mm_unpackhi_epi16(descs[2], descs[3]);
+
+   ptype0 = _mm_unpacklo_epi32(ptype0, ptype1);
+   ptype0 = _mm_and_si128(ptype0, rsstype_msk);
+   ptype0 = _mm_shuffle_epi8(rss_flags, ptype0);
+
+   vtag1 = _mm_unpacklo_epi32(vtag0, vtag1);
+   vtag1 = _mm_srli_epi16(vtag1, VP_SHIFT);
+   vtag1 = _mm_and_si128(vtag1, pkttype_msk);
+
+   vtag1 = _mm_or_si128(ptype0, vtag1);
+   vol.dword = _mm_cvtsi128_si64(vtag1);
+
+   rx_pkts[0]->ol_flags = vol.e[0];
+   rx_pkts[1]->ol_flags = vol.e[1];
+   rx_pkts[2]->ol_flags = vol.e[2];
+   rx_pkts[3]->ol_flags = vol.e[3];
+}
+
+static inline void
+fm10k_desc_to_pktype_v(__m128i descs[4], struct rte_mbuf **rx_pkts)
+{
+   __m128i l3l4type0, l3l4type1, l3type, l4type;
+   union {
+   uint16_t e[4];
+   uint64_t dword;
+   } vol;
+
+   /* L3 pkt type mask  Bit4 to Bit6 */
+   const __m128i l3type_msk = _mm_set_epi16(
+   0x0000, 0x0000, 0x0000, 0x0000,
+   0x0070, 0x0070, 0x0070, 0x0070);
+
+   /* L4 pkt type mask  Bit7 to Bit9 */
+   const __m128i l4type_msk = _mm_set_epi16(
+   0x0000, 0x0000, 0x0000, 0x0000,
+   0x0380, 0x0380, 0x0380, 0x0380);
+
+   /* convert RRC l3 type to mbuf format */
+   const __m128i l3type_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0,
+   0, 0, 0, RTE_PTYPE_L3_IPV6_EXT,
+   RTE_PTYPE_L3_IPV6, RTE_PTYPE_L3_IPV4_EXT,
+   RTE_PTYPE_L3_IPV4, 0);
+
+   /* Convert RRC l4 type to mbuf format. Values in l4type_flags are
+* pre-shifted right by 8 bits so they fit into 8-bit table entries.
+*/
+   const __m128i l4type_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, 0,
+   RTE_PTYPE_TUNNEL_GENEVE >> 8,
+   RTE_PTYPE_TUNNEL_NVGRE >> 8,
+   RTE_PTYPE_TUNNEL_VXLAN >> 8,
+   RTE_PTYPE_TUNNEL_GRE >> 8,
+   RTE_PTYPE_L4_UDP >> 8,
+   RTE_PTYPE_L4_TCP >> 8,
+   0);
+
+   l3l4type0 = _mm_unpacklo_epi16(descs[0], descs[1]);
+   l3l4type1 = _mm_unpacklo_epi16(descs[2], descs[3]);
+   l3l4type0 = _mm_unpacklo_epi32(l3l4type0, l3l4type1);
+
+   l3type = _mm_and_si128(l3l4type0, l3type_msk);
+   l4type = _mm_and_si128(l3l4type0, l4type_msk);
+
+   l3type = _mm_srli_epi16(l3type, L3TYPE_SHIFT);
+   l4type = _mm_srli_epi16(l4type, L4TYPE_SHIFT);
+
+   l3type = _mm_shuffle_epi8(l3type_flags, l3type);
+   /* l4type_flags entries were pre-shifted right by 8; shift back below */
+   l4type = _mm_shuffle_epi8(l4type_flags, l4type);
+
+   l4type = _mm_slli_epi16(l4type, 8);
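
For readability, the rss_flags shuffle above is just a 16-entry table lookup
indexed by the low 4 bits of each descriptor's RSS-type field. A scalar
sketch of the same mapping (table values taken from the rss_flags constant
in the diff; the helper name is illustrative only):

static inline uint16_t
rss_type_to_flag(uint16_t rsstype)
{
    /* mirrors the rss_flags _mm_set_epi8 above, bytes e0..e15 */
    static const uint8_t table[16] = {
        0, PKT_RX_RSS_HASH, PKT_RX_RSS_HASH, PKT_RX_RSS_HASH,
        0, PKT_RX_RSS_HASH, 0, PKT_RX_RSS_HASH,
        PKT_RX_RSS_HASH, 0, 0, 0,
        0, 0, 0, 0,
    };

    return table[rsstype & 0x0F];    /* rsstype_msk keeps bits 0-3 */
}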

[dpdk-dev] [PATCH v2 06/16] fm10k: add Vector RX function

2015-10-22 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" 

Add func fm10k_recv_raw_pkts_vec to parse raw packets, which may
include chained packets.
Add func fm10k_recv_pkts_vec to receive single-mbuf packets.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k.h  |1 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |  196 
 2 files changed, 197 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 5df7960..f04ba2c 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -327,4 +327,5 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf 
**tx_pkts,
uint16_t nb_pkts);

 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
+uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
 #endif
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c 
b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 581a309..482b76c 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -281,3 +281,199 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq)
/* Update the tail pointer on the NIC */
FM10K_PCI_REG_WRITE(rxq->tail_ptr, rx_id);
 }
+
+static inline uint16_t
+fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
+   uint16_t nb_pkts, uint8_t *split_packet)
+{
+   volatile union fm10k_rx_desc *rxdp;
+   struct rte_mbuf **mbufp;
+   uint16_t nb_pkts_recd;
+   int pos;
+   struct fm10k_rx_queue *rxq = rx_queue;
+   uint64_t var;
+   __m128i shuf_msk;
+   __m128i dd_check, eop_check;
+   uint16_t next_dd;
+
+   next_dd = rxq->next_dd;
+
+   /* Just the act of getting into the function from the application is
+* going to cost about 7 cycles
+*/
+   rxdp = rxq->hw_ring + next_dd;
+
+   _mm_prefetch((const void *)rxdp, _MM_HINT_T0);
+
+   /* See if we need to rearm the RX queue - gives the prefetch a bit
+* of time to act
+*/
+   if (rxq->rxrearm_nb > RTE_FM10K_RXQ_REARM_THRESH)
+   fm10k_rxq_rearm(rxq);
+
+   /* Before we start moving massive data around, check to see if
+* there is actually a packet available
+*/
+   if (!(rxdp->d.staterr & FM10K_RXD_STATUS_DD))
+   return 0;
+
+   /* 4 packets DD mask */
+   dd_check = _mm_set_epi64x(0x0000000100000001LL, 0x0000000100000001LL);
+
+   /* 4 packets EOP mask */
+   eop_check = _mm_set_epi64x(0x0000000200000002LL, 0x0000000200000002LL);
+
+   /* mask to shuffle from desc. to mbuf */
+   shuf_msk = _mm_set_epi8(
+   7, 6, 5, 4,  /* octet 4~7, 32bits rss */
+   15, 14,  /* octet 14~15, low 16 bits vlan_macip */
+   13, 12,  /* octet 12~13, 16 bits data_len */
+   0xFF, 0xFF,  /* skip high 16 bits pkt_len, zero out */
+   13, 12,  /* octet 12~13, low 16 bits pkt_len */
+   0xFF, 0xFF,  /* skip high 16 bits pkt_type */
+   0xFF, 0xFF   /* Skip pkt_type field in shuffle operation */
+   );
+
+   /* Cache is empty -> need to scan the buffer rings, but first move
+* the next 'n' mbufs into the cache
+*/
+   mbufp = &rxq->sw_ring[next_dd];
+
+   /* A. load 4 packets' descriptors in one loop
+* [A*. mask out 4 unused dirty fields in desc]
+* B. copy 4 mbuf pointers from swring to rx_pkts
+* C. calc the number of DD bits among the 4 packets
+* [C*. extract the end-of-packet bit, if requested]
+* D. fill info. from desc to mbuf
+*/
+   for (pos = 0, nb_pkts_recd = 0; pos < nb_pkts;
+   pos += RTE_FM10K_DESCS_PER_LOOP,
+   rxdp += RTE_FM10K_DESCS_PER_LOOP) {
+   __m128i descs0[RTE_FM10K_DESCS_PER_LOOP];
+   __m128i pkt_mb1, pkt_mb2, pkt_mb3, pkt_mb4;
+   __m128i zero, staterr, sterr_tmp1, sterr_tmp2;
+   __m128i mbp1, mbp2; /* two mbuf pointer in one XMM reg. */
+
+   /* B.1 load 2 mbuf pointers */
+   mbp1 = _mm_loadu_si128((__m128i *)&mbufp[pos]);
+
+   /* Read desc statuses backwards to avoid race condition */
+   /* A.1 load 4 pkts desc */
+   descs0[3] = _mm_loadu_si128((__m128i *)(rxdp + 3));
+
+   /* B.2 copy 2 mbuf pointers into rx_pkts  */
+   _mm_storeu_si128((__m128i *)&rx_pkts[pos], mbp1);
+
+   /* B.1 load 2 mbuf pointers */
+   mbp2 = _mm_loadu_si128((__m128i *)&mbufp[pos+2]);
+
+   descs0[2] = _mm_loadu_si128((__m128i *)(rxdp + 2));
+   /* A.1 load 2 pkts descs */
+   descs0[1] = _mm_loadu_si128((__m128i *)(rxdp + 1));
+   descs0[0] = _mm_loadu_si128((__m128i *)(rxdp));
+
+   /* B.2 copy 2 mbuf pointers into rx_pkts  */
+   _mm_storeu_si128((__m128i *)&rx_pkts[pos+2], mbp2);
+
+   /* avoid compiler reorder optimization */
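
The exit condition of the receive loop above is the DD (descriptor done)
bit scan: the burst ends at the first descriptor the hardware has not
completed yet. A scalar sketch of that condition, using the same fields as
the diff (the helper name is illustrative only):

static inline uint16_t
count_done_descs(volatile union fm10k_rx_desc *ring, uint16_t start,
        uint16_t nb)
{
    uint16_t n = 0;

    /* only a contiguous run of completed descriptors is returned */
    while (n < nb && (ring[start + n].d.staterr & FM10K_RXD_STATUS_DD))
        n++;
    return n;
}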

[dpdk-dev] [PATCH v2 07/16] fm10k: add func to do Vector RX condition check

2015-10-22 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" 

Add func fm10k_rx_vec_condition_check to check if Vector RX
func can be applied.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k.h  |1 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |   31 +++
 2 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index f04ba2c..1502ae3 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -327,5 +327,6 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf 
**tx_pkts,
uint16_t nb_pkts);

 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
+int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
 uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
 #endif
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c 
b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 482b76c..96ca28b 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -172,6 +172,37 @@ fm10k_desc_to_pktype_v(__m128i descs[4], struct rte_mbuf 
**rx_pkts)
 #endif

 int __attribute__((cold))
+fm10k_rx_vec_condition_check(struct rte_eth_dev *dev)
+{
+#ifndef RTE_LIBRTE_IEEE1588
+   struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
+   struct rte_fdir_conf *fconf = &dev->data->dev_conf.fdir_conf;
+
+#ifndef RTE_FM10K_RX_OLFLAGS_ENABLE
+   /* without rx ol_flags, no VP flag report */
+   if (rxmode->hw_vlan_extend != 0)
+   return -1;
+#endif
+
+   /* no fdir support */
+   if (fconf->mode != RTE_FDIR_MODE_NONE)
+   return -1;
+
+   /* - no csum error report support
+* - no header split support
+*/
+   if (rxmode->hw_ip_checksum == 1 ||
+   rxmode->header_split == 1)
+   return -1;
+
+   return 0;
+#else
+   RTE_SET_USED(dev);
+   return -1;
+#endif
+}
+
+int __attribute__((cold))
 fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq)
 {
uintptr_t p;
-- 
1.7.7.6
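
For illustration, a port configuration that passes the check above (a
sketch; each commented alternative forces the scalar RX path instead):

struct rte_eth_conf conf = {
    .rxmode = {
        .hw_ip_checksum = 0,    /* 1 would disable vector RX */
        .header_split   = 0,    /* likewise */
    },
    .fdir_conf = {
        .mode = RTE_FDIR_MODE_NONE, /* any other mode disables it */
    },
};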



[dpdk-dev] [PATCH v2 08/16] fm10k: add Vector RX scatter function

2015-10-22 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" 

Add func fm10k_recv_scattered_pkts_vec to receive chained packets
with SSE instructions.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k.h  |2 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |   88 
 2 files changed, 90 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 1502ae3..06697fa 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -329,4 +329,6 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf 
**tx_pkts,
 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
 uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
+uint16_t fm10k_recv_scattered_pkts_vec(void *, struct rte_mbuf **,
+   uint16_t);
 #endif
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c 
b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 96ca28b..237de9d 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -508,3 +508,91 @@ fm10k_recv_pkts_vec(void *rx_queue, struct rte_mbuf 
**rx_pkts,
 {
return fm10k_recv_raw_pkts_vec(rx_queue, rx_pkts, nb_pkts, NULL);
 }
+
+static inline uint16_t
+fm10k_reassemble_packets(struct fm10k_rx_queue *rxq,
+   struct rte_mbuf **rx_bufs,
+   uint16_t nb_bufs, uint8_t *split_flags)
+{
+   struct rte_mbuf *pkts[RTE_FM10K_MAX_RX_BURST]; /*finished pkts*/
+   struct rte_mbuf *start = rxq->pkt_first_seg;
+   struct rte_mbuf *end =  rxq->pkt_last_seg;
+   unsigned pkt_idx, buf_idx;
+
+
+   for (buf_idx = 0, pkt_idx = 0; buf_idx < nb_bufs; buf_idx++) {
+   if (end != NULL) {
+   /* processing a split packet */
+   end->next = rx_bufs[buf_idx];
+   start->nb_segs++;
+   start->pkt_len += rx_bufs[buf_idx]->data_len;
+   end = end->next;
+
+   if (!split_flags[buf_idx]) {
+   /* it's the last packet of the set */
+   start->hash = end->hash;
+   start->ol_flags = end->ol_flags;
+   pkts[pkt_idx++] = start;
+   start = end = NULL;
+   }
+   } else {
+   /* not processing a split packet */
+   if (!split_flags[buf_idx]) {
+   /* not a split packet, save and skip */
+   pkts[pkt_idx++] = rx_bufs[buf_idx];
+   continue;
+   }
+   end = start = rx_bufs[buf_idx];
+   }
+   }
+
+   /* save the partial packet for next time */
+   rxq->pkt_first_seg = start;
+   rxq->pkt_last_seg = end;
+   memcpy(rx_bufs, pkts, pkt_idx * (sizeof(*pkts)));
+   return pkt_idx;
+}
+
+/*
+ * vPMD receive routine that reassembles scattered packets
+ *
+ * Notice:
+ * - don't support ol_flags for rss and csum err
+ * - nb_pkts < RTE_FM10K_DESCS_PER_LOOP, just return no packet
+ * - nb_pkts > RTE_FM10K_MAX_RX_BURST, only scan RTE_FM10K_MAX_RX_BURST
+ *   numbers of DD bit
+ * - floor align nb_pkts to a RTE_FM10K_DESCS_PER_LOOP power-of-two
+ */
+uint16_t
+fm10k_recv_scattered_pkts_vec(void *rx_queue,
+   struct rte_mbuf **rx_pkts,
+   uint16_t nb_pkts)
+{
+   struct fm10k_rx_queue *rxq = rx_queue;
+   uint8_t split_flags[RTE_FM10K_MAX_RX_BURST] = {0};
+   unsigned i = 0;
+
+   /* get some new buffers */
+   uint16_t nb_bufs = fm10k_recv_raw_pkts_vec(rxq, rx_pkts, nb_pkts,
+   split_flags);
+   if (nb_bufs == 0)
+   return 0;
+
+   /* happy day case, full burst + no packets to be joined */
+   const uint64_t *split_fl64 = (uint64_t *)split_flags;
+   if (rxq->pkt_first_seg == NULL &&
+   split_fl64[0] == 0 && split_fl64[1] == 0 &&
+   split_fl64[2] == 0 && split_fl64[3] == 0)
+   return nb_bufs;
+
+   /* reassemble any packets that need reassembly */
+   if (rxq->pkt_first_seg == NULL) {
+   /* find the first split flag, and only reassemble from there */
+   while (i < nb_bufs && !split_flags[i])
+   i++;
+   if (i == nb_bufs)
+   return nb_bufs;
+   }
+   return i + fm10k_reassemble_packets(rxq, &rx_pkts[i], nb_bufs - i,
+   &split_flags[i]);
+}
-- 
1.7.7.6
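
A short usage sketch plus a worked example of the split_flags handling
above (rxq is assumed to come from the application):

struct rte_mbuf *pkts[RTE_FM10K_MAX_RX_BURST];
uint16_t nb;

/* e.g. one packet split over 3 descriptors followed by a single-segment
 * packet: the raw burst yields 4 mbufs with split_flags = {1, 1, 0, 0};
 * reassembly chains the first three (nb_segs = 3, pkt_len summed) and
 * 2 packets are returned.  A packet whose last segment has not arrived
 * yet is parked in rxq->pkt_first_seg/pkt_last_seg until a later call. */
nb = fm10k_recv_scattered_pkts_vec(rxq, pkts, RTE_FM10K_MAX_RX_BURST);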



[dpdk-dev] [PATCH v2 00/16] Vector Rx/Tx PMD implementation for fm10k

2015-10-22 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" 

v2:
 - Fix a typo issue.
 - Fix an improper prefetch in the vector RX function, which prefetched
   an uninitialized mbuf.
 - Remove the limitation on the number of desc pointers in the vector RX
   function.
 - Re-organize some comments.
 - Add a new patch to fix a crash issue in the vector RX func.
 - Add a new patch to update the release notes.

v1:
This patch set includes Vector Rx/Tx functions to receive/transmit packets
for fm10k devices. It also contains logic to do sanity checks and select
the proper RX/TX functions.

Chen Jing D(Mark) (16):
  fm10k: add new vPMD file
  fm10k: add vPMD pre-condition check for each RX queue
  fm10k: Add a new func to initialize all parameters
  fm10k: add func to re-allocate mbuf for RX ring
  fm10k: add 2 functions to parse pkt_type and offload flag
  fm10k: add Vector RX function
  fm10k: add func to do Vector RX condition check
  fm10k: add Vector RX scatter function
  fm10k: add function to decide best RX function
  fm10k: add func to release mbuf in case Vector RX applied
  fm10k: add Vector TX function
  fm10k: use func pointer to reset TX queue and mbuf release
  fm10k: introduce 2 funcs to reset TX queue and mbuf release
  fm10k: Add function to decide best TX func
  fm10k: fix a crash issue in vector RX func
  doc: release notes update for fm10k Vector PMD

 doc/guides/rel_notes/release_2_2.rst |5 +
 drivers/net/fm10k/Makefile   |1 +
 drivers/net/fm10k/fm10k.h|   45 ++-
 drivers/net/fm10k/fm10k_ethdev.c |  168 ++-
 drivers/net/fm10k/fm10k_rxtx_vec.c   |  834 ++
 5 files changed, 1025 insertions(+), 28 deletions(-)
 create mode 100644 drivers/net/fm10k/fm10k_rxtx_vec.c

-- 
1.7.7.6



[dpdk-dev] [PATCH v2 10/16] fm10k: add func to release mbuf in case Vector RX applied

2015-10-22 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" 

Since Vector RX uses different variables to track the RX HW ring, a
different func is needed to release the mbufs properly.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k.h  |1 +
 drivers/net/fm10k/fm10k_ethdev.c   |6 ++
 drivers/net/fm10k/fm10k_rxtx_vec.c |   18 ++
 3 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 8614e81..c5e66e2 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -329,6 +329,7 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf 
**tx_pkts,

 int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 int fm10k_rx_vec_condition_check(struct rte_eth_dev *);
+void fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue *rxq);
 uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
 uint16_t fm10k_recv_scattered_pkts_vec(void *, struct rte_mbuf **,
uint16_t);
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 53c4ef1..2c3d8be 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -143,6 +143,12 @@ rx_queue_clean(struct fm10k_rx_queue *q)
for (i = 0; i < q->nb_desc; ++i)
q->hw_ring[i] = zero;

+   /* vPMD driver has a different way of releasing mbufs. */
+   if (q->rx_using_sse) {
+   fm10k_rx_queue_release_mbufs_vec(q);
+   return;
+   }
+
/* free software buffers */
for (i = 0; i < q->nb_desc; ++i) {
if (q->sw_ring[i]) {
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c 
b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 237de9d..ab0218e 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -313,6 +313,24 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq)
FM10K_PCI_REG_WRITE(rxq->tail_ptr, rx_id);
 }

+void __attribute__((cold))
+fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue *rxq)
+{
+   const unsigned mask = rxq->nb_desc - 1;
+   unsigned i;
+
+   if (rxq->sw_ring == NULL || rxq->rxrearm_nb >= rxq->nb_desc)
+   return;
+
+   /* free all mbufs that are valid in the ring */
+   for (i = rxq->next_dd; i != rxq->rxrearm_start; i = (i + 1) & mask)
+   rte_pktmbuf_free_seg(rxq->sw_ring[i]);
+   rxq->rxrearm_nb = rxq->nb_desc;
+
+   /* set all entries to NULL */
+   memset(rxq->sw_ring, 0, sizeof(rxq->sw_ring[0]) * rxq->nb_desc);
+}
+
 static inline uint16_t
 fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
uint16_t nb_pkts, uint8_t *split_packet)
-- 
1.7.7.6



[dpdk-dev] [PATCH v2 02/16] fm10k: add vPMD pre-condition check for each RX queue

2015-10-22 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" 

Add a condition check in the rx_queue_setup func. If the number of RX
descriptors can't satisfy the vPMD requirement, record that in a variable.
Otherwise, call fm10k_rxq_vec_setup to initialize Vector RX.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k.h  |   11 ---
 drivers/net/fm10k/fm10k_ethdev.c   |   11 +++
 drivers/net/fm10k/fm10k_rxtx_vec.c |   21 +
 3 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index c089882..362a2d0 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -135,6 +135,8 @@ struct fm10k_dev_info {
/* Protect the mailbox to avoid race condition */
rte_spinlock_tmbx_lock;
struct fm10k_macvlan_filter_infomacvlan;
+   /* Flag to indicate if RX vector conditions satisfied */
+   bool rx_vec_allowed;
 };

 /*
@@ -165,9 +167,10 @@ struct fm10k_rx_queue {
struct rte_mempool *mp;
struct rte_mbuf **sw_ring;
volatile union fm10k_rx_desc *hw_ring;
-   struct rte_mbuf *pkt_first_seg; /**< First segment of current packet. */
-   struct rte_mbuf *pkt_last_seg;  /**< Last segment of current packet. */
+   struct rte_mbuf *pkt_first_seg; /* First segment of current packet. */
+   struct rte_mbuf *pkt_last_seg;  /* Last segment of current packet. */
uint64_t hw_ring_phys_addr;
+   uint64_t mbuf_initializer; /* value to init mbufs */
uint16_t next_dd;
uint16_t next_alloc;
uint16_t next_trigger;
@@ -177,7 +180,7 @@ struct fm10k_rx_queue {
uint16_t queue_id;
uint8_t port_id;
uint8_t drop_en;
-   uint8_t rx_deferred_start; /**< don't start this queue in dev start. */
+   uint8_t rx_deferred_start; /* don't start this queue in dev start. */
 };

 /*
@@ -313,4 +316,6 @@ uint16_t fm10k_recv_scattered_pkts(void *rx_queue,

 uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
uint16_t nb_pkts);
+
+int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq);
 #endif
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index a69c990..3c7784e 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -1251,6 +1251,7 @@ fm10k_rx_queue_setup(struct rte_eth_dev *dev, uint16_t 
queue_id,
const struct rte_eth_rxconf *conf, struct rte_mempool *mp)
 {
struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct fm10k_dev_info *dev_info = FM10K_DEV_PRIVATE_TO_INFO(dev);
struct fm10k_rx_queue *q;
const struct rte_memzone *mz;

@@ -1333,6 +1334,16 @@ fm10k_rx_queue_setup(struct rte_eth_dev *dev, uint16_t 
queue_id,
q->hw_ring_phys_addr = mz->phys_addr;
 #endif

+   /* Check if number of descs satisfied Vector requirement */
+   if (!rte_is_power_of_2(nb_desc)) {
+   PMD_INIT_LOG(DEBUG, "queue[%d] doesn't meet Vector Rx "
+   "preconditions - canceling the feature for "
+   "the whole port[%d]",
+q->queue_id, q->port_id);
+   dev_info->rx_vec_allowed = false;
+   } else
+   fm10k_rxq_vec_setup(q);
+
dev->data->rx_queues[queue_id] = q;
return 0;
 }
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c 
b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 69174d9..34b677b 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -43,3 +43,24 @@
 #ifndef __INTEL_COMPILER
 #pragma GCC diagnostic ignored "-Wcast-qual"
 #endif
+
+int __attribute__((cold))
+fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq)
+{
+   uintptr_t p;
+   struct rte_mbuf mb_def = { .buf_addr = 0 }; /* zeroed mbuf */
+
+   mb_def.nb_segs = 1;
+   /* data_off will be adjusted after a new mbuf is allocated, for
+* 512-byte alignment.
+*/
+   mb_def.data_off = RTE_PKTMBUF_HEADROOM;
+   mb_def.port = rxq->port_id;
+   rte_mbuf_refcnt_set(&mb_def, 1);
+
+   /* prevent compiler reordering: rearm_data covers previous fields */
+   rte_compiler_barrier();
+   p = (uintptr_t)&mb_def.rearm_data;
+   rxq->mbuf_initializer = *(uint64_t *)p;
+   return 0;
+}
-- 
1.7.7.6



[dpdk-dev] [PATCH v2 04/16] fm10k: add func to re-allocate mbuf for RX ring

2015-10-22 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" 

Add function fm10k_rxq_rearm to re-allocate mbufs for used descriptors
in the RX HW ring.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k.h  |9 
 drivers/net/fm10k/fm10k_ethdev.c   |3 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |   90 
 3 files changed, 102 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 362a2d0..5df7960 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -123,6 +123,12 @@
 #define FM10K_VFTA_BIT(vlan_id)(1 << ((vlan_id) & 0x1F))
 #define FM10K_VFTA_IDX(vlan_id)((vlan_id) >> 5)

+#define RTE_FM10K_RXQ_REARM_THRESH  32
+#define RTE_FM10K_VPMD_TX_BURST 32
+#define RTE_FM10K_MAX_RX_BURST  RTE_FM10K_RXQ_REARM_THRESH
+#define RTE_FM10K_TX_MAX_FREE_BUF_SZ64
+#define RTE_FM10K_DESCS_PER_LOOP4
+
 struct fm10k_macvlan_filter_info {
uint16_t vlan_num;   /* Total VLAN number */
uint16_t mac_num;/* Total mac number */
@@ -178,6 +184,9 @@ struct fm10k_rx_queue {
volatile uint32_t *tail_ptr;
uint16_t nb_desc;
uint16_t queue_id;
+   /* Below 2 fields only valid in case vPMD is applied. */
+   uint16_t rxrearm_nb; /* number of remaining to be re-armed */
+   uint16_t rxrearm_start;  /* the idx we start the re-arming from */
uint8_t port_id;
uint8_t drop_en;
uint8_t rx_deferred_start; /* don't start this queue in dev start. */
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 1bc1e7c..24f936a 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -121,6 +121,9 @@ rx_queue_reset(struct fm10k_rx_queue *q)
q->next_alloc = 0;
q->next_trigger = q->alloc_thresh - 1;
FM10K_PCI_REG_WRITE(q->tail_ptr, q->nb_desc - 1);
+   q->rxrearm_start = 0;
+   q->rxrearm_nb = 0;
+
return 0;
 }

diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c 
b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 34b677b..75533f9 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -64,3 +64,93 @@ fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq)
rxq->mbuf_initializer = *(uint64_t *)p;
return 0;
 }
+
+static inline void
+fm10k_rxq_rearm(struct fm10k_rx_queue *rxq)
+{
+   int i;
+   uint16_t rx_id;
+   volatile union fm10k_rx_desc *rxdp;
+   struct rte_mbuf **mb_alloc = &rxq->sw_ring[rxq->rxrearm_start];
+   struct rte_mbuf *mb0, *mb1;
+   __m128i head_off = _mm_set_epi64x(
+   RTE_PKTMBUF_HEADROOM + FM10K_RX_DATABUF_ALIGN - 1,
+   RTE_PKTMBUF_HEADROOM + FM10K_RX_DATABUF_ALIGN - 1);
+   __m128i dma_addr0, dma_addr1;
+   /* Rx buffers need to be aligned to 512 bytes */
+   const __m128i hba_msk = _mm_set_epi64x(0,
+   UINT64_MAX - FM10K_RX_DATABUF_ALIGN + 1);
+
+   rxdp = rxq->hw_ring + rxq->rxrearm_start;
+
+   /* Pull 'n' more MBUFs into the software ring */
+   if (rte_mempool_get_bulk(rxq->mp,
+(void *)mb_alloc,
+RTE_FM10K_RXQ_REARM_THRESH) < 0) {
+   rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed +=
+   RTE_FM10K_RXQ_REARM_THRESH;
+   return;
+   }
+
+   /* Initialize the mbufs in vector, process 2 mbufs in one loop */
+   for (i = 0; i < RTE_FM10K_RXQ_REARM_THRESH; i += 2, mb_alloc += 2) {
+   __m128i vaddr0, vaddr1;
+   uintptr_t p0, p1;
+
+   mb0 = mb_alloc[0];
+   mb1 = mb_alloc[1];
+
+   /* Flush mbuf with pkt template.
+* Data to be rearmed is 6 bytes long.
+* Though, RX will overwrite ol_flags that are coming next
+* anyway. So overwrite whole 8 bytes with one store:
+* 6 bytes of rearm_data plus first 2 bytes of ol_flags.
+*/
+   p0 = (uintptr_t)&mb0->rearm_data;
+   *(uint64_t *)p0 = rxq->mbuf_initializer;
+   p1 = (uintptr_t)&mb1->rearm_data;
+   *(uint64_t *)p1 = rxq->mbuf_initializer;
+
+   /* load buf_addr(lo 64bit) and buf_physaddr(hi 64bit) */
+   vaddr0 = _mm_loadu_si128((__m128i *)&(mb0->buf_addr));
+   vaddr1 = _mm_loadu_si128((__m128i *)&(mb1->buf_addr));
+
+   /* convert pa to dma_addr hdr/data */
+   dma_addr0 = _mm_unpackhi_epi64(vaddr0, vaddr0);
+   dma_addr1 = _mm_unpackhi_epi64(vaddr1, vaddr1);
+
+   /* add headroom to pa values */
+   dma_addr0 = _mm_add_epi64(dma_addr0, head_off);
+   dma_addr1 = _mm_add_epi64(dma_addr1, head_off);
+
+   /* Do 512 byte alignment to satisfy HW requirement, in the
+* meanwhile, set Header Buffer
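
The remainder of this hunk does the alignment with vector adds and masks;
the same math for a single mbuf in scalar form (a sketch using the
constants from the diff):

uint64_t dma_addr = mb0->buf_physaddr + RTE_PKTMBUF_HEADROOM
        + FM10K_RX_DATABUF_ALIGN - 1;   /* what head_off adds */

/* round down to a 512-byte boundary, which is what hba_msk keeps */
dma_addr &= ~((uint64_t)FM10K_RX_DATABUF_ALIGN - 1);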

[dpdk-dev] [PATCH v2 12/16] fm10k: use func pointer to reset TX queue and mbuf release

2015-10-22 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" 

Vector TX uses a different way to manage the TX queue, so different
functions are necessary to reset the TX queue and release mbufs
in it. Introduce 2 function pointers to do such ops.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k.h|9 +
 drivers/net/fm10k/fm10k_ethdev.c |   21 -
 2 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 0a4c174..2bead12 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -204,11 +204,14 @@ struct fifo {
uint16_t *endp;
 };

+struct fm10k_txq_ops;
+
 struct fm10k_tx_queue {
struct rte_mbuf **sw_ring;
struct fm10k_tx_desc *hw_ring;
uint64_t hw_ring_phys_addr;
struct fifo rs_tracker;
+   const struct fm10k_txq_ops *ops; /* txq ops */
uint16_t last_free;
uint16_t next_free;
uint16_t nb_free;
@@ -225,6 +228,11 @@ struct fm10k_tx_queue {
uint16_t queue_id;
 };

+struct fm10k_txq_ops {
+   void (*release_mbufs)(struct fm10k_tx_queue *txq);
+   void (*reset)(struct fm10k_tx_queue *txq);
+};
+
 #define MBUF_DMA_ADDR(mb) \
((uint64_t) ((mb)->buf_physaddr + (mb)->data_off))

@@ -338,4 +346,5 @@ uint16_t fm10k_recv_scattered_pkts_vec(void *, struct 
rte_mbuf **,
uint16_t);
 uint16_t fm10k_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
uint16_t nb_pkts);
+void fm10k_txq_vec_setup(struct fm10k_tx_queue *txq);
 #endif
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 2c3d8be..0a523eb 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -292,6 +292,11 @@ tx_queue_disable(struct fm10k_hw *hw, uint16_t qnum)
return 0;
 }

+static const struct fm10k_txq_ops def_txq_ops = {
+   .release_mbufs = tx_queue_free,
+   .reset = tx_queue_reset,
+};
+
 static int
 fm10k_dev_configure(struct rte_eth_dev *dev)
 {
@@ -571,7 +576,8 @@ fm10k_dev_tx_queue_start(struct rte_eth_dev *dev, uint16_t 
tx_queue_id)
PMD_INIT_FUNC_TRACE();

if (tx_queue_id < dev->data->nb_tx_queues) {
-   tx_queue_reset(dev->data->tx_queues[tx_queue_id]);
+   struct fm10k_tx_queue *q = dev->data->tx_queues[tx_queue_id];
+   q->ops->reset(q);

/* reset head and tail pointers */
FM10K_WRITE_REG(hw, FM10K_TDH(tx_queue_id), 0);
@@ -837,8 +843,10 @@ fm10k_dev_queue_release(struct rte_eth_dev *dev)
PMD_INIT_FUNC_TRACE();

if (dev->data->tx_queues) {
-   for (i = 0; i < dev->data->nb_tx_queues; i++)
-   fm10k_tx_queue_release(dev->data->tx_queues[i]);
+   for (i = 0; i < dev->data->nb_tx_queues; i++) {
+   struct fm10k_tx_queue *txq = dev->data->tx_queues[i];
+   txq->ops->release_mbufs(txq);
+   }
}

if (dev->data->rx_queues) {
@@ -1454,7 +1462,8 @@ fm10k_tx_queue_setup(struct rte_eth_dev *dev, uint16_t 
queue_id,
 * different socket than was previously used.
 */
if (dev->data->tx_queues[queue_id] != NULL) {
-   tx_queue_free(dev->data->tx_queues[queue_id]);
+   struct fm10k_tx_queue *txq = dev->data->tx_queues[queue_id];
+   txq->ops->release_mbufs(txq);
dev->data->tx_queues[queue_id] = NULL;
}

@@ -1470,6 +1479,7 @@ fm10k_tx_queue_setup(struct rte_eth_dev *dev, uint16_t 
queue_id,
q->nb_desc = nb_desc;
q->port_id = dev->data->port_id;
q->queue_id = queue_id;
+   q->ops = &def_txq_ops;
q->tail_ptr = (volatile uint32_t *)
&((uint32_t *)hw->hw_addr)[FM10K_TDT(queue_id)];
if (handle_txconf(q, conf))
@@ -1528,9 +1538,10 @@ fm10k_tx_queue_setup(struct rte_eth_dev *dev, uint16_t 
queue_id,
 static void
 fm10k_tx_queue_release(void *queue)
 {
+   struct fm10k_tx_queue *q = queue;
PMD_INIT_FUNC_TRACE();

-   tx_queue_free(queue);
+   q->ops->release_mbufs(q);
 }

 static int
-- 
1.7.7.6



[dpdk-dev] [PATCH v2 09/16] fm10k: add function to decide best RX function

2015-10-22 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" 

Add func fm10k_set_rx_function to decide the best RX func in
fm10k_dev_rx_init.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k.h|1 +
 drivers/net/fm10k/fm10k_ethdev.c |   36 
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 06697fa..8614e81 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -187,6 +187,7 @@ struct fm10k_rx_queue {
/* Below 2 fields only valid in case vPMD is applied. */
uint16_t rxrearm_nb; /* number of remaining to be re-armed */
uint16_t rxrearm_start;  /* the idx we start the re-arming from */
+   uint16_t rx_using_sse; /* indicates that vector RX is in use */
uint8_t port_id;
uint8_t drop_en;
uint8_t rx_deferred_start; /* don't start this queue in dev start. */
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 24f936a..53c4ef1 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -67,6 +67,7 @@ static void
 fm10k_MACVLAN_remove_all(struct rte_eth_dev *dev);
 static void fm10k_tx_queue_release(void *queue);
 static void fm10k_rx_queue_release(void *queue);
+static void fm10k_set_rx_function(struct rte_eth_dev *dev);

 static void
 fm10k_mbx_initlock(struct fm10k_hw *hw)
@@ -462,7 +463,6 @@ fm10k_dev_rx_init(struct rte_eth_dev *dev)
dev->data->dev_conf.rxmode.enable_scatter) {
uint32_t reg;
dev->data->scattered_rx = 1;
-   dev->rx_pkt_burst = fm10k_recv_scattered_pkts;
reg = FM10K_READ_REG(hw, FM10K_SRRCTL(i));
reg |= FM10K_SRRCTL_BUFFER_CHAINING_EN;
FM10K_WRITE_REG(hw, FM10K_SRRCTL(i), reg);
@@ -478,6 +478,9 @@ fm10k_dev_rx_init(struct rte_eth_dev *dev)

/* Configure RSS if applicable */
fm10k_dev_mq_rx_configure(dev);
+
+   /* Decide the best RX function */
+   fm10k_set_rx_function(dev);
return 0;
 }

@@ -2069,6 +2072,34 @@ static const struct eth_dev_ops fm10k_eth_dev_ops = {
.rss_hash_conf_get  = fm10k_rss_hash_conf_get,
 };

+static void __attribute__((cold))
+fm10k_set_rx_function(struct rte_eth_dev *dev)
+{
+   struct fm10k_dev_info *dev_info = FM10K_DEV_PRIVATE_TO_INFO(dev);
+   uint16_t i, rx_using_sse;
+
+   /* In order to allow Vector Rx there are a few configuration
+* conditions to be met.
+*/
+   if (!fm10k_rx_vec_condition_check(dev) && dev_info->rx_vec_allowed) {
+   if (dev->data->scattered_rx)
+   dev->rx_pkt_burst = fm10k_recv_scattered_pkts_vec;
+   else
+   dev->rx_pkt_burst = fm10k_recv_pkts_vec;
+   } else if (dev->data->scattered_rx)
+   dev->rx_pkt_burst = fm10k_recv_scattered_pkts;
+
+   rx_using_sse =
+   (dev->rx_pkt_burst == fm10k_recv_scattered_pkts_vec ||
+   dev->rx_pkt_burst == fm10k_recv_pkts_vec);
+
+   for (i = 0; i < dev->data->nb_rx_queues; i++) {
+   struct fm10k_rx_queue *rxq = dev->data->rx_queues[i];
+   rxq->rx_using_sse = rx_using_sse;
+   }
+
+}
+
 static void
 fm10k_params_init(struct rte_eth_dev *dev)
 {
@@ -2102,9 +2133,6 @@ eth_fm10k_dev_init(struct rte_eth_dev *dev)
dev->rx_pkt_burst = &fm10k_recv_pkts;
dev->tx_pkt_burst = &fm10k_xmit_pkts;

-   if (dev->data->scattered_rx)
-   dev->rx_pkt_burst = &fm10k_recv_scattered_pkts;
-
/* only initialize in the primary process */
if (rte_eal_process_type() != RTE_PROC_PRIMARY)
return 0;
-- 
1.7.7.6
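
Once fm10k_set_rx_function has picked a burst function, the choice is
transparent to applications, which keep calling the generic API (sketch;
port_id and queue_id are assumed to be defined by the application):

struct rte_mbuf *burst[32];
uint16_t nb_rx;

/* dispatched through dev->rx_pkt_burst to fm10k_recv_pkts_vec,
 * fm10k_recv_scattered_pkts_vec or one of the scalar variants */
nb_rx = rte_eth_rx_burst(port_id, queue_id, burst, 32);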



[dpdk-dev] [PATCH v2 11/16] fm10k: add Vector TX function

2015-10-22 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" 

Add Vector TX func fm10k_xmit_pkts_vec to transmit packets.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k.h  |5 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |  150 
 2 files changed, 155 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index c5e66e2..0a4c174 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -215,6 +215,9 @@ struct fm10k_tx_queue {
uint16_t nb_used;
uint16_t free_thresh;
uint16_t rs_thresh;
+   /* Below 2 fields only valid in case vPMD is applied. */
+   uint16_t next_rs; /* Next pos to set RS flag */
+   uint16_t next_dd; /* Next pos to check DD flag */
volatile uint32_t *tail_ptr;
uint16_t nb_desc;
uint8_t port_id;
@@ -333,4 +336,6 @@ void fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue 
*rxq);
 uint16_t fm10k_recv_pkts_vec(void *, struct rte_mbuf **, uint16_t);
 uint16_t fm10k_recv_scattered_pkts_vec(void *, struct rte_mbuf **,
uint16_t);
+uint16_t fm10k_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
+   uint16_t nb_pkts);
 #endif
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c 
b/drivers/net/fm10k/fm10k_rxtx_vec.c
index ab0218e..f119c2c 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -614,3 +614,153 @@ fm10k_recv_scattered_pkts_vec(void *rx_queue,
return i + fm10k_reassemble_packets(rxq, &rx_pkts[i], nb_bufs - i,
&split_flags[i]);
 }
+
+static inline void
+vtx1(volatile struct fm10k_tx_desc *txdp,
+   struct rte_mbuf *pkt, uint64_t flags)
+{
+   __m128i descriptor = _mm_set_epi64x(flags << 56 |
+   pkt->vlan_tci << 16 | pkt->data_len,
+   MBUF_DMA_ADDR(pkt));
+   _mm_store_si128((__m128i *)txdp, descriptor);
+}
+
+static inline void
+vtx(volatile struct fm10k_tx_desc *txdp,
+   struct rte_mbuf **pkt, uint16_t nb_pkts,  uint64_t flags)
+{
+   int i;
+
+   for (i = 0; i < nb_pkts; ++i, ++txdp, ++pkt)
+   vtx1(txdp, *pkt, flags);
+}
+
+static inline int __attribute__((always_inline))
+fm10k_tx_free_bufs(struct fm10k_tx_queue *txq)
+{
+   struct rte_mbuf **txep;
+   uint8_t flags;
+   uint32_t n;
+   uint32_t i;
+   int nb_free = 0;
+   struct rte_mbuf *m, *free[RTE_FM10K_TX_MAX_FREE_BUF_SZ];
+
+   /* check DD bit on threshold descriptor */
+   flags = txq->hw_ring[txq->next_dd].flags;
+   if (!(flags & FM10K_TXD_FLAG_DONE))
+   return 0;
+
+   n = txq->rs_thresh;
+
+   /* First buffer to free from S/W ring is at index
+* next_dd - (rs_thresh-1)
+*/
+   txep = &txq->sw_ring[txq->next_dd - (n - 1)];
+   m = __rte_pktmbuf_prefree_seg(txep[0]);
+   if (likely(m != NULL)) {
+   free[0] = m;
+   nb_free = 1;
+   for (i = 1; i < n; i++) {
+   m = __rte_pktmbuf_prefree_seg(txep[i]);
+   if (likely(m != NULL)) {
+   if (likely(m->pool == free[0]->pool))
+   free[nb_free++] = m;
+   else {
+   rte_mempool_put_bulk(free[0]->pool,
+   (void *)free, nb_free);
+   free[0] = m;
+   nb_free = 1;
+   }
+   }
+   }
+   rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
+   } else {
+   for (i = 1; i < n; i++) {
+   m = __rte_pktmbuf_prefree_seg(txep[i]);
+   if (m != NULL)
+   rte_mempool_put(m->pool, m);
+   }
+   }
+
+   /* buffers were freed, update counters */
+   txq->nb_free = (uint16_t)(txq->nb_free + txq->rs_thresh);
+   txq->next_dd = (uint16_t)(txq->next_dd + txq->rs_thresh);
+   if (txq->next_dd >= txq->nb_desc)
+   txq->next_dd = (uint16_t)(txq->rs_thresh - 1);
+
+   return txq->rs_thresh;
+}
+
+static inline void __attribute__((always_inline))
+tx_backlog_entry(struct rte_mbuf **txep,
+struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+   int i;
+
+   for (i = 0; i < (int)nb_pkts; ++i)
+   txep[i] = tx_pkts[i];
+}
+
+uint16_t
+fm10k_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
+   uint16_t nb_pkts)
+{
+   struct fm10k_tx_queue *txq = (struct fm10k_tx_queue *)tx_queue;
+   volatile struct fm10k_tx_desc *txdp;
+   struct rte_mbuf **txep;
+   uint16_t n, nb_commit, tx_id;
+   uint64_t flags = FM10K_TXD_FLAG_LAST;
+   uint64_t rs = FM10K_TXD_FLAG_RS | FM10K_TXD_FLAG_LAST;
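
The 16-byte descriptor packing in vtx1 above is easier to follow in scalar
form; a sketch of the same layout, with two 64-bit stores standing in for
the single 128-bit store:

static inline void
vtx1_scalar(volatile struct fm10k_tx_desc *txdp,
        struct rte_mbuf *pkt, uint64_t flags)
{
    /* high qword: [63:56] flags, [31:16] vlan_tci, [15:0] data_len */
    uint64_t hi = flags << 56 | (uint64_t)pkt->vlan_tci << 16 |
            pkt->data_len;

    ((volatile uint64_t *)txdp)[0] = MBUF_DMA_ADDR(pkt); /* buffer addr */
    ((volatile uint64_t *)txdp)[1] = hi;
}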

[dpdk-dev] [PATCH v2 14/16] fm10k: Add function to decide best TX func

2015-10-22 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" 

Add func fm10k_set_tx_function to decide the best TX func in
fm10k_dev_tx_init.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k.h|1 +
 drivers/net/fm10k/fm10k_ethdev.c |   38 --
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 2bead12..68ae1b8 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -222,6 +222,7 @@ struct fm10k_tx_queue {
uint16_t next_rs; /* Next pos to set RS flag */
uint16_t next_dd; /* Next pos to check DD flag */
volatile uint32_t *tail_ptr;
+   uint32_t txq_flags; /* Holds flags for this TXq */
uint16_t nb_desc;
uint8_t port_id;
uint8_t tx_deferred_start; /** < don't start this queue in dev start. */
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 0a523eb..046979d 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -53,6 +53,9 @@
 #define CHARS_PER_UINT32 (sizeof(uint32_t))
 #define BIT_MASK_PER_UINT32 ((1 << CHARS_PER_UINT32) - 1)

+#define FM10K_SIMPLE_TX_FLAG ((uint32_t)ETH_TXQ_FLAGS_NOMULTSEGS | \
+   ETH_TXQ_FLAGS_NOOFFLOADS)
+
 static void fm10k_close_mbx_service(struct fm10k_hw *hw);
 static void fm10k_dev_promiscuous_enable(struct rte_eth_dev *dev);
 static void fm10k_dev_promiscuous_disable(struct rte_eth_dev *dev);
@@ -68,6 +71,7 @@ fm10k_MACVLAN_remove_all(struct rte_eth_dev *dev);
 static void fm10k_tx_queue_release(void *queue);
 static void fm10k_rx_queue_release(void *queue);
 static void fm10k_set_rx_function(struct rte_eth_dev *dev);
+static void fm10k_set_tx_function(struct rte_eth_dev *dev);

 static void
 fm10k_mbx_initlock(struct fm10k_hw *hw)
@@ -414,6 +418,10 @@ fm10k_dev_tx_init(struct rte_eth_dev *dev)
base_addr >> (CHAR_BIT * sizeof(uint32_t)));
FM10K_WRITE_REG(hw, FM10K_TDLEN(i), size);
}
+
+   /* set up vector or scalar TX function as appropriate */
+   fm10k_set_tx_function(dev);
+
return 0;
 }

@@ -980,8 +988,7 @@ fm10k_dev_infos_get(struct rte_eth_dev *dev,
},
.tx_free_thresh = FM10K_TX_FREE_THRESH_DEFAULT(0),
.tx_rs_thresh = FM10K_TX_RS_THRESH_DEFAULT(0),
-   .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
-   ETH_TXQ_FLAGS_NOOFFLOADS,
+   .txq_flags = FM10K_SIMPLE_TX_FLAG,
};

 }
@@ -1479,6 +1486,7 @@ fm10k_tx_queue_setup(struct rte_eth_dev *dev, uint16_t 
queue_id,
q->nb_desc = nb_desc;
q->port_id = dev->data->port_id;
q->queue_id = queue_id;
+   q->txq_flags = conf->txq_flags;
q->ops = &def_txq_ops;
q->tail_ptr = (volatile uint32_t *)
&((uint32_t *)hw->hw_addr)[FM10K_TDT(queue_id)];
@@ -2090,6 +2098,32 @@ static const struct eth_dev_ops fm10k_eth_dev_ops = {
 };

 static void __attribute__((cold))
+fm10k_set_tx_function(struct rte_eth_dev *dev)
+{
+   struct fm10k_tx_queue *txq;
+   int i;
+   int use_sse = 1;
+
+   for (i = 0; i < dev->data->nb_tx_queues; i++) {
+   txq = dev->data->tx_queues[i];
+   if ((txq->txq_flags & FM10K_SIMPLE_TX_FLAG) != \
+   FM10K_SIMPLE_TX_FLAG) {
+   use_sse = 0;
+   break;
+   }
+   }
+
+   if (use_sse) {
+   for (i = 0; i < dev->data->nb_tx_queues; i++) {
+   txq = dev->data->tx_queues[i];
+   fm10k_txq_vec_setup(txq);
+   }
+   dev->tx_pkt_burst = fm10k_xmit_pkts_vec;
+   } else
+   dev->tx_pkt_burst = fm10k_xmit_pkts;
+}
+
+static void __attribute__((cold))
 fm10k_set_rx_function(struct rte_eth_dev *dev)
 {
struct fm10k_dev_info *dev_info = FM10K_DEV_PRIVATE_TO_INFO(dev);
-- 
1.7.7.6
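
From the application side, opting into the vector TX path means requesting
the simple queue flags at setup time (sketch; port_id, nb_txd and socket_id
are assumed to be defined by the application):

struct rte_eth_txconf txconf = {
    /* both bits must be set on every queue, or the scalar path is used */
    .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS | ETH_TXQ_FLAGS_NOOFFLOADS,
};

rte_eth_tx_queue_setup(port_id, 0 /* queue id */, nb_txd, socket_id,
        &txconf);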



[dpdk-dev] [PATCH v2 13/16] fm10k: introduce 2 funcs to reset TX queue and mbuf release

2015-10-22 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" 

Add 2 funcs to reset the TX queue and release mbufs when Vector TX
is applied.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k_rxtx_vec.c |   68 
 1 files changed, 68 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c 
b/drivers/net/fm10k/fm10k_rxtx_vec.c
index f119c2c..5ed8653 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -44,6 +44,11 @@
 #pragma GCC diagnostic ignored "-Wcast-qual"
 #endif

+static void
+fm10k_tx_queue_release_mbufs_vec(struct fm10k_tx_queue *txq);
+static void
+fm10k_reset_tx_queue(struct fm10k_tx_queue *txq);
+
 /* Handling the offload flags (olflags) field takes computation
  * time when receiving packets. Therefore we provide a flag to disable
  * the processing of the olflags field when they are not needed. This
@@ -615,6 +620,17 @@ fm10k_recv_scattered_pkts_vec(void *rx_queue,
&split_flags[i]);
 }

+static const struct fm10k_txq_ops vec_txq_ops = {
+   .release_mbufs = fm10k_tx_queue_release_mbufs_vec,
+   .reset = fm10k_reset_tx_queue,
+};
+
+void __attribute__((cold))
+fm10k_txq_vec_setup(struct fm10k_tx_queue *txq)
+{
+   txq->ops = &vec_txq_ops;
+}
+
 static inline void
 vtx1(volatile struct fm10k_tx_desc *txdp,
struct rte_mbuf *pkt, uint64_t flags)
@@ -764,3 +780,55 @@ fm10k_xmit_pkts_vec(void *tx_queue, struct rte_mbuf 
**tx_pkts,

return nb_pkts;
 }
+
+static void __attribute__((cold))
+fm10k_tx_queue_release_mbufs_vec(struct fm10k_tx_queue *txq)
+{
+   unsigned i;
+   const uint16_t max_desc = (uint16_t)(txq->nb_desc - 1);
+
+   if (txq->sw_ring == NULL || txq->nb_free == max_desc)
+   return;
+
+   /* release the used mbufs in sw_ring */
+   for (i = txq->next_dd - (txq->rs_thresh - 1);
+i != txq->next_free;
+i = (i + 1) & max_desc)
+   rte_pktmbuf_free_seg(txq->sw_ring[i]);
+
+   txq->nb_free = max_desc;
+
+   /* reset tx_entry */
+   for (i = 0; i < txq->nb_desc; i++)
+   txq->sw_ring[i] = NULL;
+
+   rte_free(txq->sw_ring);
+   txq->sw_ring = NULL;
+}
+
+static void __attribute__((cold))
+fm10k_reset_tx_queue(struct fm10k_tx_queue *txq)
+{
+   static const struct fm10k_tx_desc zeroed_desc = {0};
+   struct rte_mbuf **txe = txq->sw_ring;
+   uint16_t i;
+
+   /* Zero out HW ring memory */
+   for (i = 0; i < txq->nb_desc; i++)
+   txq->hw_ring[i] = zeroed_desc;
+
+   /* Initialize SW ring entries */
+   for (i = 0; i < txq->nb_desc; i++)
+   txe[i] = NULL;
+
+   txq->next_dd = (uint16_t)(txq->rs_thresh - 1);
+   txq->next_rs = (uint16_t)(txq->rs_thresh - 1);
+
+   txq->next_free = 0;
+   txq->nb_used = 0;
+   /* Always allow 1 descriptor to be un-allocated to avoid
+* a H/W race condition
+*/
+   txq->nb_free = (uint16_t)(txq->nb_desc - 1);
+   FM10K_PCI_REG_WRITE(txq->tail_ptr, 0);
+}
-- 
1.7.7.6



[dpdk-dev] [PATCH v2 15/16] fm10k: fix a crash issue in vector RX func

2015-10-22 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" 

The Vector RX function processes 4 packets at a time. When the RX
ring wraps around at the tail and the number of remaining descriptors
is not a multiple of 4, SW will overwrite memory that does not belong
to it and cause a crash. The fix allocates 4 additional HW/SW ring
entries at the tail to avoid the overrun.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k.h|4 
 drivers/net/fm10k/fm10k_ethdev.c |   19 +--
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 68ae1b8..82a548f 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -177,12 +177,16 @@ struct fm10k_rx_queue {
struct rte_mbuf *pkt_last_seg;  /* Last segment of current packet. */
uint64_t hw_ring_phys_addr;
uint64_t mbuf_initializer; /* value to init mbufs */
+   /* need to alloc dummy mbuf, for wraparound when scanning hw ring */
+   struct rte_mbuf fake_mbuf;
uint16_t next_dd;
uint16_t next_alloc;
uint16_t next_trigger;
uint16_t alloc_thresh;
volatile uint32_t *tail_ptr;
uint16_t nb_desc;
+   /* Number of faked desc added at the tail for Vector RX function */
+   uint16_t nb_fake_desc;
uint16_t queue_id;
/* Below 2 fields only valid in case vPMD is applied. */
uint16_t rxrearm_nb; /* number of remaining to be re-armed */
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 046979d..31c96ac 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -102,6 +102,7 @@ fm10k_mbx_unlock(struct fm10k_hw *hw)
 static inline int
 rx_queue_reset(struct fm10k_rx_queue *q)
 {
+   static const union fm10k_rx_desc zero = {{0}};
uint64_t dma_addr;
int i, diag;
PMD_INIT_FUNC_TRACE();
@@ -122,6 +123,15 @@ rx_queue_reset(struct fm10k_rx_queue *q)
q->hw_ring[i].q.hdr_addr = dma_addr;
}

+   /* initialize extra software ring entries. Space for these extra
+* entries is always allocated.
+*/
+   memset(&q->fake_mbuf, 0x0, sizeof(q->fake_mbuf));
+   for (i = 0; i < q->nb_fake_desc; ++i) {
+   q->sw_ring[q->nb_desc + i] = &q->fake_mbuf;
+   q->hw_ring[q->nb_desc + i] = zero;
+   }
+
q->next_dd = 0;
q->next_alloc = 0;
q->next_trigger = q->alloc_thresh - 1;
@@ -147,6 +157,10 @@ rx_queue_clean(struct fm10k_rx_queue *q)
for (i = 0; i < q->nb_desc; ++i)
q->hw_ring[i] = zero;

+   /* zero faked descriptors */
+   for (i = 0; i < q->nb_fake_desc; ++i)
+   q->hw_ring[q->nb_desc + i] = zero;
+
/* vPMD driver has a different way of releasing mbufs. */
if (q->rx_using_sse) {
fm10k_rx_queue_release_mbufs_vec(q);
@@ -1323,6 +1337,7 @@ fm10k_rx_queue_setup(struct rte_eth_dev *dev, uint16_t 
queue_id,
/* setup queue */
q->mp = mp;
q->nb_desc = nb_desc;
+   q->nb_fake_desc = FM10K_MULT_RX_DESC;
q->port_id = dev->data->port_id;
q->queue_id = queue_id;
q->tail_ptr = (volatile uint32_t *)
@@ -1332,8 +1347,8 @@ fm10k_rx_queue_setup(struct rte_eth_dev *dev, uint16_t 
queue_id,

/* allocate memory for the software ring */
q->sw_ring = rte_zmalloc_socket("fm10k sw ring",
-   nb_desc * sizeof(struct rte_mbuf *),
-   RTE_CACHE_LINE_SIZE, socket_id);
+   (nb_desc + q->nb_fake_desc) * sizeof(struct rte_mbuf *),
+   RTE_CACHE_LINE_SIZE, socket_id);
if (q->sw_ring == NULL) {
PMD_INIT_LOG(ERR, "Cannot allocate software ring");
rte_free(q);
-- 
1.7.7.6
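
A concrete illustration of the overrun this patch prevents (numbers
hypothetical):

/* nb_desc = 128, next_dd = 126: one vector iteration touches ring
 * entries 126, 127, 128 and 129.  Without the nb_fake_desc entries
 * appended at the tail, indices 128-129 land past the end of both
 * hw_ring and sw_ring; with them, the accesses hit zeroed descriptors
 * and &q->fake_mbuf instead. */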



[dpdk-dev] [PATCH v2 16/16] doc: release notes update for fm10k Vector PMD

2015-10-22 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" 

Update the 2.2 release notes: add a description of the Vector PMD
implementation in the fm10k driver.

Signed-off-by: Chen Jing D(Mark) 
---
 doc/guides/rel_notes/release_2_2.rst |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/doc/guides/rel_notes/release_2_2.rst 
b/doc/guides/rel_notes/release_2_2.rst
index 9a70dae..44a3f74 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -39,6 +39,11 @@ Drivers

   Fixed issue with libvirt ``virsh destroy`` not killing the VM.

+* **fm10k: Add Vector Rx/Tx implementation.**
+
+  This patch set includes Vector Rx/Tx functions to receive/transmit packets
+  for fm10k devices. It also contains logic to do sanity checks and select
+  the proper RX/TX functions.

 Libraries
 ~
-- 
1.7.7.6



[dpdk-dev] [RFC PATCH v3 0/2] Add VHOST PMD

2015-10-22 Thread Tetsuya Mukawa
The patch introduces a new PMD. This PMD is implemented as a thin wrapper
of librte_vhost.

I've submitted the patches below in former patch sets, but it seems some
issues were fixed already.

 - [PATCH 1/3] vhost: Fix return value of GET_VRING_BASE message
 - [PATCH 2/3] vhost: Fix RESET_OWNER handling not to close callfd
 - [PATCH 3/3] vhost: Fix RESET_OWNER handling not to free virtqueue

I've still seen some resource leaks in the vhost library, but in this RFC,
I focused on the vhost PMD.
After I get agreement, I will submit a patch for the leak issue as a
separate patch. So please check the direction of the vhost PMD.

PATCH v3 changes:
 - Optimize performance.
   In RX/TX functions, change code to access only per-core data.
 - Add the below API to allow users to use vhost library APIs for a port
   managed by the vhost PMD. There are some limitations. See "rte_eth_vhost.h".
- rte_eth_vhost_portid2vdev()
   To support this functionality, the vhost library is also changed.
   Anyway, if users don't use the vhost PMD, they can still fully use the
   vhost library APIs.
 - Add code to support vhost multiple queues.
   Actually, the multiple-queue functionality is not enabled so far.

Tetsuya Mukawa (2):
  vhost: Add callback and private data for vhost PMD
  vhost: Add VHOST PMD

 config/common_linuxapp|   6 +
 drivers/net/Makefile  |   4 +
 drivers/net/vhost/Makefile|  62 +++
 drivers/net/vhost/rte_eth_vhost.c | 735 ++
 drivers/net/vhost/rte_eth_vhost.h |  65 +++
 drivers/net/vhost/rte_pmd_vhost_version.map   |   8 +
 lib/librte_vhost/rte_virtio_net.h |   3 +
 lib/librte_vhost/vhost_user/virtio-net-user.c |   8 +-
 lib/librte_vhost/virtio-net.c |  40 +-
 lib/librte_vhost/virtio-net.h |   3 +-
 mk/rte.app.mk |   8 +-
 11 files changed, 934 insertions(+), 8 deletions(-)
 create mode 100644 drivers/net/vhost/Makefile
 create mode 100644 drivers/net/vhost/rte_eth_vhost.c
 create mode 100644 drivers/net/vhost/rte_eth_vhost.h
 create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

-- 
2.1.4
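
A sketch of how the rte_eth_vhost_portid2vdev() API mentioned above might
be used (return type and the ifname field as declared in the patch set and
rte_virtio_net.h; this is a sketch, not part of the patch set):

#include <stdio.h>
#include <rte_eth_vhost.h>

static void
dump_vhost_ifname(uint8_t port_id)
{
    /* map an ethdev port managed by the vhost PMD back to its underlying
     * virtio_net device, so vhost library APIs can be applied to it */
    struct virtio_net *vdev = rte_eth_vhost_portid2vdev(port_id);

    if (vdev != NULL)
        printf("backing vhost interface: %s\n", vdev->ifname);
}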



[dpdk-dev] [RFC PATCH v3 1/2] vhost: Add callback and private data for vhost PMD

2015-10-22 Thread Tetsuya Mukawa
These variables are needed to manage one of the virtio devices using both
vhost library APIs and the vhost PMD.
For example, if the vhost PMD used the current callback handler and private
data provided by the vhost library, a DPDK application that links the vhost
library could not use some of the vhost library APIs. To avoid this, a
separate callback and private data for the vhost PMD are needed.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_vhost/rte_virtio_net.h |  3 ++
 lib/librte_vhost/vhost_user/virtio-net-user.c |  8 +++---
 lib/librte_vhost/virtio-net.c | 40 +--
 lib/librte_vhost/virtio-net.h |  3 +-
 4 files changed, 47 insertions(+), 7 deletions(-)

diff --git a/lib/librte_vhost/rte_virtio_net.h 
b/lib/librte_vhost/rte_virtio_net.h
index 93d3e27..ec84c9b 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -108,6 +108,7 @@ struct virtio_net {
uint32_tvirt_qp_nb;
uint32_tmem_idx;/** Used in set memory layout, 
unique for each queue within virtio device. */
void*priv;  /**< private context */
+   void*pmd_priv;  /**< private context for vhost 
PMD */
 } __rte_cache_aligned;

 /**
@@ -198,6 +199,8 @@ int rte_vhost_driver_unregister(const char *dev_name);

 /* Register callbacks. */
 int rte_vhost_driver_callback_register(struct virtio_net_device_ops const * 
const);
+/* Register callbacks for vhost PMD (Only for internal). */
+int rte_vhost_driver_pmd_callback_register(struct virtio_net_device_ops const 
* const);
 /* Start vhost driver session blocking loop. */
 int rte_vhost_driver_session_start(void);

diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c 
b/lib/librte_vhost/vhost_user/virtio-net-user.c
index 6a12d96..a75697f 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -288,7 +288,7 @@ user_set_vring_kick(struct vhost_device_ctx ctx, struct 
VhostUserMsg *pmsg)

if (virtio_is_ready(dev) &&
!(dev->flags & VIRTIO_DEV_RUNNING))
-   notify_ops->new_device(dev);
+   notify_new_device(dev);
 }

 /*
@@ -302,7 +302,7 @@ user_get_vring_base(struct vhost_device_ctx ctx,

/* We have to stop the queue (virtio) if it is running. */
if (dev->flags & VIRTIO_DEV_RUNNING)
-   notify_ops->destroy_device(dev);
+   notify_destroy_device(dev);

/* Here we are safe to get the last used index */
ops->get_vring_base(ctx, state->index, state);
@@ -333,7 +333,7 @@ user_reset_owner(struct vhost_device_ctx ctx,

/* We have to stop the queue (virtio) if it is running. */
if (dev->flags & VIRTIO_DEV_RUNNING)
-   notify_ops->destroy_device(dev);
+   notify_destroy_device(dev);

RTE_LOG(INFO, VHOST_CONFIG,
"reset owner --- state idx:%d state num:%d\n", state->index, 
state->num);
@@ -379,7 +379,7 @@ user_destroy_device(struct vhost_device_ctx ctx)
uint32_t i;

if (dev && (dev->flags & VIRTIO_DEV_RUNNING))
-   notify_ops->destroy_device(dev);
+   notify_destroy_device(dev);

for (i = 0; i < dev->virt_qp_nb; i++)
if (dev && dev->mem_arr[i]) {
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 3131719..eec3c22 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -64,6 +64,8 @@ struct virtio_net_config_ll {

 /* device ops to add/remove device to/from data core. */
 struct virtio_net_device_ops const *notify_ops;
+/* device ops for vhost PMD to add/remove device to/from data core. */
+struct virtio_net_device_ops const *pmd_notify_ops;
 /* root address of the linked list of managed virtio devices */
 static struct virtio_net_config_ll *ll_root;

@@ -84,6 +86,29 @@ static uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES;

 static uint64_t VHOST_PROTOCOL_FEATURES = VHOST_SUPPORTED_PROTOCOL_FEATURES;

+int
+notify_new_device(struct virtio_net *dev)
+{
+   if ((pmd_notify_ops != NULL) && (pmd_notify_ops->new_device != NULL)) {
+   int ret = pmd_notify_ops->new_device(dev);
+   if (ret != 0)
+   return ret;
+   }
+   if ((notify_ops != NULL) && (notify_ops->new_device != NULL))
+   return notify_ops->new_device(dev);
+
+   return 0;
+}
+
+void
+notify_destroy_device(volatile struct virtio_net *dev)
+{
+   if ((pmd_notify_ops != NULL) && (pmd_notify_ops->destroy_device != 
NULL))
+   pmd_notify_ops->destroy_device(dev);
+   if ((notify_ops != NULL) && (notify_ops->destroy_device != NULL))
+   notify_ops->destroy_device(dev);
+}
+
 /*
  * Converts QEMU virtual address to Vhost virtual address. This function is
  * used to convert the ring addresses to our address space.
@@ -
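
A sketch of how the vhost PMD is expected to consume the new registration
hook declared above (handler names are hypothetical; the signatures follow
struct virtio_net_device_ops):

static int eth_new_device(struct virtio_net *dev);      /* PMD handlers */
static void eth_destroy_device(volatile struct virtio_net *dev);

static const struct virtio_net_device_ops vhost_pmd_ops = {
    .new_device     = eth_new_device,
    .destroy_device = eth_destroy_device,
};

static void
vhost_pmd_register_cb(void)
{
    /* application callbacks registered through
     * rte_vhost_driver_callback_register() keep working alongside */
    rte_vhost_driver_pmd_callback_register(&vhost_pmd_ops);
}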

[dpdk-dev] [RFC PATCH v3 2/2] vhost: Add VHOST PMD

2015-10-22 Thread Tetsuya Mukawa
The patch introduces a new PMD. This PMD is implemented as a thin wrapper
of librte_vhost. It means librte_vhost is also needed to compile the PMD.
The vhost messages will be handled only when a port is started. So start
the port first, then invoke QEMU.

The PMD has 2 parameters.
 - iface:  The parameter is used to specify a path to connect to a
   virtio-net device.
 - queues: The parameter is used to specify the number of queues the
   virtio-net device has.
   (Default: 1)

Here is an example.
$ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=3' -- -i

To connect to the above testpmd, here is a QEMU command example.

$ qemu-system-x86_64 \
-chardev socket,id=chr0,path=/tmp/sock0 \
-netdev vhost-user,id=net0,chardev=chr0,vhostforce \
-device virtio-net-pci,netdev=net0

Signed-off-by: Tetsuya Mukawa 
---
 config/common_linuxapp  |   6 +
 drivers/net/Makefile|   4 +
 drivers/net/vhost/Makefile  |  62 +++
 drivers/net/vhost/rte_eth_vhost.c   | 735 
 drivers/net/vhost/rte_eth_vhost.h   |  65 +++
 drivers/net/vhost/rte_pmd_vhost_version.map |   8 +
 mk/rte.app.mk   |   8 +-
 7 files changed, 887 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/vhost/Makefile
 create mode 100644 drivers/net/vhost/rte_eth_vhost.c
 create mode 100644 drivers/net/vhost/rte_eth_vhost.h
 create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 0de43d5..7310240 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -446,6 +446,12 @@ CONFIG_RTE_LIBRTE_VHOST_NUMA=n
 CONFIG_RTE_LIBRTE_VHOST_DEBUG=n

 #
+# Compile vhost PMD
+# To compile, CONFIG_RTE_LIBRTE_VHOST should be enabled.
+#
+CONFIG_RTE_LIBRTE_PMD_VHOST=y
+
+#
 #Compile Xen domain0 support
 #
 CONFIG_RTE_LIBRTE_XEN_DOM0=n
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 5ebf963..e46a38e 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -49,5 +49,9 @@ DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
 DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt

+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += vhost
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
 include $(RTE_SDK)/mk/rte.sharelib.mk
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/net/vhost/Makefile b/drivers/net/vhost/Makefile
new file mode 100644
index 0000000..8186a80
--- /dev/null
+++ b/drivers/net/vhost/Makefile
@@ -0,0 +1,62 @@
+#   BSD LICENSE
+#
+#   Copyright (c) 2010-2015 Intel Corporation.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_vhost.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+EXPORT_MAP := rte_pmd_vhost_version.map
+
+LIBABIVER := 1
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += rte_eth_vhost.c
+
+#
+# Export include files
+#
+SYMLINK-y-include += rte_eth_vhost.h
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_kvargs
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_vhost
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/

[dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index

2015-10-22 Thread Yuanhan Liu
On Wed, Oct 21, 2015 at 05:26:18PM +0300, Michael S. Tsirkin wrote:
> On Wed, Oct 21, 2015 at 08:48:15PM +0800, Yuanhan Liu wrote:
> > > Please note that for virtio devices, guest is supposed to
> > > control the placement of incoming packets in RX queues.
> > 
> > I may not follow you.
> > 
> > Enqueuing packets to a RX queue is done at vhost lib, outside the
> > guest, how could the guest take the control here?
> > 
> > --yliu
> 
> vhost should do what guest told it to.
> 
> See virtio spec:
>   5.1.6.5.5 Automatic receive steering in multiqueue mode

Spec says:

After the driver transmitted a packet of a flow on transmitqX,
the device SHOULD cause incoming packets for that flow to be
steered to receiveqX.


Michael, I still have no idea how vhost could know the flow even
after discussion with Huawei. Could you be more specific about
this? Say, how could guest know that? And how could guest tell
vhost which RX queue it is going to use?

Thanks.

--yliu


[dpdk-dev] [PATCH v7 3/8] vhost: vring queue setup for multiple queue support

2015-10-22 Thread Xie, Huawei
On 10/21/2015 11:48 AM, Yuanhan Liu wrote:
> All queue pairs, including the default (the first) queue pair,
> are allocated dynamically, when a vring_call message is received for
> the first time for a specific queue pair.
>
> This is refactoring work for enabling vhost-user multiple queue;
> it should not break anything as it makes no functional changes:
> we don't support mq set, so there is at most one queue pair.
>
> This patch is based on Changchun's patch.
>
[...]
>  
>  void
> @@ -290,13 +298,9 @@ user_get_vring_base(struct vhost_device_ctx ctx,
>* sent and only sent in vhost_vring_stop.
>* TODO: cleanup the vring, it isn't usable since here.
>*/
> - if ((dev->virtqueue[VIRTIO_RXQ]->kickfd) >= 0) {
> - close(dev->virtqueue[VIRTIO_RXQ]->kickfd);
> - dev->virtqueue[VIRTIO_RXQ]->kickfd = -1;
> - }
> - if ((dev->virtqueue[VIRTIO_TXQ]->kickfd) >= 0) {
> - close(dev->virtqueue[VIRTIO_TXQ]->kickfd);
> - dev->virtqueue[VIRTIO_TXQ]->kickfd = -1;
> + if ((dev->virtqueue[state->index]->kickfd) >= 0) {
> + close(dev->virtqueue[state->index]->kickfd);
> + dev->virtqueue[state->index]->kickfd = -1;
>   }
Since we change the behavior here, better list in the commit message as
well.

>  
>  
> @@ -680,13 +704,21 @@ set_vring_call(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
>  {
>   struct virtio_net *dev;
>   struct vhost_virtqueue *vq;
> + uint32_t cur_qp_idx = file->index / VIRTIO_QNUM;
>  
>   dev = get_device(ctx);
>   if (dev == NULL)
>   return -1;
>  
> + /* alloc vring queue pair if it is a new queue pair */
> + if (cur_qp_idx + 1 > dev->virt_qp_nb) {
> + if (alloc_vring_queue_pair(dev, cur_qp_idx) < 0)
> + return -1;
> + }
> +
Here we rely on the fact that this set_vring_call message is sent in the
continuous ascending order of queue idx 0, 1, 2, ...

>   /* file->index refers to the queue index. The txq is 1, rxq is 0. */
>   vq = dev->virtqueue[file->index];
> + assert(vq != NULL);
>  
If we allocate the queue only when we receive the first vring message,
better add a comment that we rely on this fact.
Could we add a vhost-user message to tell us the queue number QEMU
allocates, before the vring messages?
>   if (vq->callfd >= 0)
>   close(vq->callfd);



[dpdk-dev] [RFC PATCH v2] vhost: Add VHOST PMD

2015-10-22 Thread Tetsuya Mukawa
On 2015/10/21 19:22, Bruce Richardson wrote:
> On Wed, Oct 21, 2015 at 09:25:12AM +0300, Panu Matilainen wrote:
>> On 10/21/2015 07:35 AM, Tetsuya Mukawa wrote:
>>> On 2015/10/19 22:27, Richardson, Bruce wrote:
> -Original Message-
> From: Panu Matilainen [mailto:pmatilai at redhat.com]
> Sent: Monday, October 19, 2015 2:26 PM
> To: Tetsuya Mukawa ; Richardson, Bruce
> ; Loftus, Ciara 
> Cc: dev at dpdk.org; ann.zhuangyanying at huawei.com
> Subject: Re: [dpdk-dev] [RFC PATCH v2] vhost: Add VHOST PMD
>
> On 10/19/2015 01:50 PM, Tetsuya Mukawa wrote:
>> On 2015/10/19 18:45, Bruce Richardson wrote:
>>> On Mon, Oct 19, 2015 at 10:32:50AM +0100, Loftus, Ciara wrote:
> On 2015/10/16 21:52, Bruce Richardson wrote:
>> On Mon, Aug 31, 2015 at 12:55:26PM +0900, Tetsuya Mukawa wrote:
>>> The patch introduces a new PMD. This PMD is implemented as a thin
>>> wrapper of librte_vhost. It means librte_vhost is also needed to
>>> compile the PMD. The PMD can have an 'iface' parameter, like below,
>>> to specify a path to connect to a virtio-net device.
>>>
>>> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0' -- -i
>>>
>>> To connect above testpmd, here is qemu command example.
>>>
>>> $ qemu-system-x86_64 \
>>>  
>>>  -chardev socket,id=chr0,path=/tmp/sock0 \
>>>  -netdev vhost-user,id=net0,chardev=chr0,vhostforce \
>>>  -device virtio-net-pci,netdev=net0
>>>
>>> Signed-off-by: Tetsuya Mukawa 
>> With this PMD in place, is there any need to keep the existing
>> vhost library around as a separate entity? Can the existing
>> library be subsumed/converted into a standard PMD?
>>
>> /Bruce
> Hi Bruce,
>
> I'm concerned about whether the PMD has all the features of
> librte_vhost, because librte_vhost provides more features and freedom
> than the ethdev API provides.
> In some cases, users need to choose a limited implementation without
> librte_vhost.
> I am going to eliminate such cases while implementing the PMD.
> But I don't have a strong belief that we can remove librte_vhost now.
>
> So how about keeping the current separation in the next DPDK?
> I guess people will try to replace librte_vhost with the vhost PMD,
> because apparently using ethdev APIs will be useful in many cases.
> And we will get feedback like "the vhost PMD needs to support usage
> like this".
> (Or we will not get feedback, but that's also OK.) Then, we will be
> able to merge librte_vhost and the vhost PMD.
 I agree with the above. One of the concerns I had when reviewing the
> patch was that the PMD removes some freedom that is available with the
> library. Eg. Ability to implement the new_device and destroy_device
> callbacks. If using the PMD you are constrained to the implementations of
> these in the PMD driver, but if using librte_vhost, you can implement your
> own with whatever functionality you like - a good example of this can be
> seen in the vhost sample app.
 On the other hand, the PMD is useful in that it removes a lot of
> complexity for the user and may work for some more general use cases. So I
> would be in favour of having both options available too.
 Ciara

>>> Thanks.
>>> However, just because the libraries are merged does not mean that you
>>> need be limited by PMD functionality. Many PMDs provide additional
>>> library-specific functions over and above their PMD capabilities. The
>>> bonded PMD is a good example here, as it has a whole set of extra
>>> functions to create and manipulate bonded devices - things that are
>>> obviously not part of the general ethdev API. Other vPMDs similarly
>>> include functions to allow them to be created on the fly too.
>>> regards,
>>> /Bruce
>> Hi Bruce,
>>
>> Thank you for showing a good example; I hadn't noticed that PMD.
>> I will check the bonding PMD, and try to remove librte_vhost without
>> losing freedom and features of the library.
> Hi,
>
> Just a gentle reminder - if you consider removing (even if by just
> replacing/renaming) an entire library, it needs to go through the ABI
> deprecation process.
>
> It seems obvious enough. But for all the ABI policing here, somehow we all
> failed to notice the two compatibility breaking rename-elephants in the
> room during 2.1 development:
> - libintel_dpdk was renamed to libdpdk
> - librte_pmd_virtio_uio was renamed to librte_pmd_virtio
>
> Of course these cases are easy to work around with symlinks, and are
> unrelated to the matter at hand. Just wa

[dpdk-dev] [PATCH v7 1/8] vhost-user: add protocol features support

2015-10-22 Thread Xie, Huawei
On 10/21/2015 11:48 AM, Yuanhan Liu wrote:
[...]
>
> VHOST_USER_PROTOCOL_FEATURES is initated to 0, as we don't support
>
s/initated/initialized/



[dpdk-dev] Inconsistent statistics counters for pmd_i40e

2015-10-22 Thread Eimear Morrissey


Arnon Warshavsky  wrote on 10/19/2015 03:46:22 PM:

> From: Arnon Warshavsky 
> To: Eimear Morrissey/Ireland/IBM at IBMIE
> Cc: dev at dpdk.org
> Date: 10/19/2015 03:46 PM
> Subject: Re: [dpdk-dev] Inconsistent statistics counters for pmd_i40e
>
> Hi Eimear
>
> This is the link I have.
> https://downloadcenter.intel.com/download/24769
>
> I guess that the version seen in the web page comes from a different
> parallel universe.
> You should see the actual fw version inside the zip file.

> Thanks
> /Arnon
>
> On Mon, Oct 19, 2015 at 5:30 PM, Eimear Morrissey wrote:
> Arnon Warshavsky  wrote on 10/19/2015 03:01:46 PM:
>
> > From: Arnon Warshavsky 
> > To: Eimear Morrissey/Ireland/IBM at IBMIE
> > Cc: dev at dpdk.org
> > Date: 10/19/2015 03:01 PM
> > Subject: Re: [dpdk-dev] Inconsistent statistics counters for pmd_i40e
>
> >
> > Hi Eimear,
> >
> > I just experienced the same problem with firmware versions 4.23 and
> > 4.33 (dpdk 2.0). Did not get to try the latest which is 4.5.
> > Looking at the code, I don't see that this counter is being read any
> > differently than its peer counters and I suspect the nic itself.
> > Can you tell which firmware version you were using?
> >
> > thanks
> > /Arnon
> >
> > On Mon, Oct 19, 2015 at 2:43 PM, Eimear Morrissey <eimear.morrissey at ie.ibm.com> wrote:
> >
> >
> > Hi,
> >
> > I'm having issues measuring packets dropped at the NIC in both the
> > 2.0.0 and 2.1.0 versions of DPDK on an X710 Intel NIC.
> >
> > In dpdk-2.0.0
> > Using rte_eth_xstats the rx_packets and rx_bytes counters increase as
> > expected, however rx_missed_errors is always 0 even if a sleep
> > statement is added between calls to rte_eth_rx_burst. However changing
> > the coremask so the application is running on a different socket than
> > the card will cause rx_missed_errors to increment for a limited amount
> > of time and then stop.
> > Using rte_eth_stats, ipackets is incremented on packet receipt but the
> > q_ipackets and q_errors arrays remain zero. Even crossing sockets
> > seems to have no effect on q_errors.
> >
> > In dpdk-2.1.0 the behaviour is the same as above, except that the
> > number of fields returned by rte_eth_xstats_get is reduced (no
> > rx_missed_errors at all) so running on a different socket no longer
> > has any noticeable effect on the stats.
> >
> > My understanding from the API manual is that the rte_eth_stats
> > q_errors array should count the packets missed because software isn't
> > polling fast enough, but that doesn't seem to be the case? Is there a
> > standard DPDK way to check this? The application is a forwarding one
> > so there's no other way to estimate drop except through NIC rx.
> >
> > Thanks,
> > Eimear
> >
> >
> >
> > --
> >
> > Arnon Warshavsky
> > Qwilt | work: +972-72-2221634 | mobile: +972-50-8583058 | arnon at qwilt.com

> Hi Arnon,
>
> The firmware version I'm using is 4.26. Where do you see the latest
> is 4.5 - I can't find anything obvious in the download centre?
>
> Regards,
> Eimear
>
>
>
> --
>
> Arnon Warshavsky
> Qwilt | work: +972-72-2221634 | mobile: +972-50-8583058 | arnon at qwilt.com


I tried installing the firmware but on running the NVM update tool I just
get "No updates available for your device" which makes me think that (for
my SKU at least) I'm on the latest firmware.

Also, if I bind the card back to the i40e driver, I can force the dropped
count in ifconfig to increase by decreasing the rx ring size so I'm not
convinced it's entirely a hardware issue.

Regards,
Eimear


[dpdk-dev] [PATCH] tools: exit setup script without prompt

2015-10-22 Thread John McNamara
Exit tools/setup.sh script without prompting "Press enter to continue".

The script can now be exited by typing the option number, "quit" or "q".

Signed-off-by: John McNamara 
---
 tools/setup.sh | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/tools/setup.sh b/tools/setup.sh
index 5a8b2f3..fbd3853 100755
--- a/tools/setup.sh
+++ b/tools/setup.sh
@@ -57,6 +57,12 @@ quit()
QUIT=$1
 }

+# Shortcut for quit.
+q()
+{
+   quit
+}
+
 #
 # Sets up environmental variables for ICC.
 #
@@ -628,6 +634,10 @@ while [ "$QUIT" == "0" ]; do
read our_entry
echo ""
${OPTIONS[our_entry]} ${our_entry}
-   echo
-   echo -n "Press enter to continue ..."; read
+
+   if [ "$QUIT" == "0" ] ; then
+   echo
+   echo -n "Press enter to continue ..."; read
+   fi
+
 done
-- 
1.8.1.4



[dpdk-dev] [PATCH v2 0/5] virtio: Tx performance improvements

2015-10-22 Thread Xie, Huawei
On 10/21/2015 9:20 PM, Thomas Monjalon wrote:
> 2015-10-18 22:16, Stephen Hemminger:
>> This is a tested version of the virtio Tx performance improvements
>> that I posted earlier on the list, and described at the DPDK Userspace
>> meeting in Dublin. Together they get a 25% performance improvement for
>> both small packet and large multi-segment packet case when testing
>> from DPDK guest application to Linux KVM host.
>>
>> Stephen Hemminger (5):
>>   virtio: clean up space checks on xmit
>>   virtio: don't use unlikely for normal tx stuff
>>   virtio: use indirect ring elements
>>   virtio: use any layout on transmit
>>   virtio: optimize transmit enqueue
> Huawei, do you ack this series?
>
Okay with this patchset with two remained questions,

+/* Region reserved to allow for transmit header and indirect ring */
+#define VIRTIO_MAX_TX_INDIRECT 8
+struct virtio_tx_region {
+   struct virtio_net_hdr_mrg_rxbuf tx_hdr;

Why use merge-able rx header here in the tx region?

> + struct vring_desc tx_indir[VIRTIO_MAX_TX_INDIRECT]
> +__attribute__((__aligned__(16)));

WARNING: __aligned(size) is preferred over __attribute__((aligned(size)))
[...]





[dpdk-dev] ixgbe: ierrors counter spuriously increasing in DPDK 2.1

2015-10-22 Thread Andriy Berestovskyy
Hi Martin,
We agreed on the main point: it's an issue. IMO the implementation
details are up to Maryam.

There have been a few patches, so I guess it will be fixed in 2.2.
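
For reference, a minimal sketch of the xstats-based workaround
mentioned below. The counter names are illustrative only -- check the
strings your PMD/DPDK version actually reports before relying on them:

  #include <string.h>
  #include <rte_ethdev.h>

  /* ierrors minus the extended counters that are not real errors */
  static uint64_t
  real_ierrors(uint8_t port_id)
  {
          struct rte_eth_stats stats;
          struct rte_eth_xstats xstats[256];
          uint64_t ierrors;
          int i, n;

          rte_eth_stats_get(port_id, &stats);
          ierrors = stats.ierrors;

          n = rte_eth_xstats_get(port_id, xstats, 256);
          for (i = 0; i < n; i++)
                  if (strcmp(xstats[i].name, "rx_mac_short_packet_dropped") == 0 ||
                      strcmp(xstats[i].name, "rx_l3_l4_xsum_error") == 0)
                          ierrors -= xstats[i].value;

          return ierrors;
  }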

Andriy


On Thu, Oct 22, 2015 at 9:46 AM, Martin Weiser
 wrote:
> Hi Andriy,
>
> thank you for pointing this discussion out to me. I somehow missed it.
> Unfortunately it looks like the discussion stopped after Maryam made a
> good proposal so I will vote in on that and hopefully get things started
> again.
>
> Best regards,
> Martin
>
>
>
> On 21.10.15 17:53, Andriy Berestovskyy wrote:
>> Yes Marcin,
>> The issue was discussed here:
>> http://dpdk.org/ml/archives/dev/2015-September/023229.html
>>
>> You can either fix the ierrors in ixgbe_dev_stats_get() or implement a
>> workaround in your app getting the extended statistics and counting
>> out some of extended counters from the ierrors.
>>
>> Here is an example:
>> https://github.com/Juniper/contrail-vrouter/commit/72f6ca05ac81d0ca5e7eb93c6ffe7a93648c2b00#diff-99c1f65a00658c7d38b3d1b64cb5fd93R1306
>>
>> Regards,
>> Andriy
>>
>> On Wed, Oct 21, 2015 at 10:38 AM, Martin Weiser
>>  wrote:
>>> Hi,
>>>
>>> with DPDK 2.1 we are seeing the ierrors counter increasing for 82599ES
>>> ports without reason. Even directly after starting test-pmd the error
>>> counter immediately is 1 without even a single packet being sent to the
>>> device:
>>>
>>> ./testpmd -c 0xfe -n 4 -- --portmask 0x3 --interactive
>>> ...
>>> testpmd> show port stats all
>>>
>>>   ######################## NIC statistics for port 0  ########################
>>>   RX-packets: 0  RX-missed: 0  RX-bytes:  0
>>>   RX-badcrc:  0  RX-badlen: 0  RX-errors: 1
>>>   RX-nombuf:  0
>>>   TX-packets: 0  TX-errors: 0  TX-bytes:  0
>>>   ############################################################################
>>>
>>>   ######################## NIC statistics for port 1  ########################
>>>   RX-packets: 0  RX-missed: 0  RX-bytes:  0
>>>   RX-badcrc:  0  RX-badlen: 0  RX-errors: 1
>>>   RX-nombuf:  0
>>>   TX-packets: 0  TX-errors: 0  TX-bytes:  0
>>>   ############################################################################
>>>
>>>
>>> When packet forwarding is started the ports perform normally and
>>> properly forward all packets but a huge number of ierrors is counted:
>>>
>>> testpmd> start
>>> ...
>>> testpmd> show port stats all
>>>
>>>   ######################## NIC statistics for port 0  ########################
>>>   RX-packets: 9011857    RX-missed: 0  RX-bytes:  5020932992
>>>   RX-badcrc:  0  RX-badlen: 0  RX-errors: 9011753
>>>   RX-nombuf:  0
>>>   TX-packets: 9026250    TX-errors: 0  TX-bytes:  2922375542
>>>   ############################################################################
>>>
>>>   ######################## NIC statistics for port 1  ########################
>>>   RX-packets: 9026250    RX-missed: 0  RX-bytes:  2922375542
>>>   RX-badcrc:  0  RX-badlen: 0  RX-errors: 9026138
>>>   RX-nombuf:  0
>>>   TX-packets: 9011857    TX-errors: 0  TX-bytes:  5020932992
>>>   ############################################################################
>>>
>>>
>>> When running the exact same test with DPDK version 2.0 no ierrors are
>>> reported.
>>> Is anyone else seeing strange ierrors being reported for Intel Niantic
>>> cards with DPDK 2.1?
>>>
>>> Best regards,
>>> Martin
>>>
>>
>>
>
>



-- 
Andriy Berestovskyy


[dpdk-dev] Inconsistent statistics counters for pmd_i40e

2015-10-22 Thread Arnon Warshavsky
You are right.
Given this thread, updated today:
http://dpdk.org/ml/archives/dev/2015-September/023480.html (updates from
today are still not there), it seems I was too quick to jump to a
conclusion.

Just in case, when bound to i40e, can you run ethtool -i on that interface?
It should show the fw version.
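For example (the interface name here is made up): ethtool -i ens785f0 |
grep firmware-version.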

thanks
/Arnon

On Thu, Oct 22, 2015 at 12:57 PM, Eimear Morrissey <
eimear.morrissey at ie.ibm.com> wrote:

> Arnon Warshavsky  wrote on 10/19/2015 03:46:22 PM:
>
> > From: Arnon Warshavsky 
> > To: Eimear Morrissey/Ireland/IBM at IBMIE
> > Cc: dev at dpdk.org
> > Date: 10/19/2015 03:46 PM
>
> > Subject: Re: [dpdk-dev] Inconsistent statistics counters for pmd_i40e
> >
> > Hi Eimear
> >
> > This is the link I have.
> > https://downloadcenter.intel.com/download/24769
> >
> > I guess that the version seen in the web page comes from a different
> > parallel universe.
> > You should see the actual fw version inside the zip file.
>
> > Thanks
> > /Arnon
> >
> > On Mon, Oct 19, 2015 at 5:30 PM, Eimear Morrissey <eimear.morrissey at ie.ibm.com> wrote:
> > Arnon Warshavsky  wrote on 10/19/2015 03:01:46 PM:
> >
> > > From: Arnon Warshavsky 
> > > To: Eimear Morrissey/Ireland/IBM at IBMIE
> > > Cc: dev at dpdk.org
> > > Date: 10/19/2015 03:01 PM
> > > Subject: Re: [dpdk-dev] Inconsistent statistics counters for pmd_i40e
> >
> > >
> > > Hi Eimear,
> > >
> > > I just experienced the same problem with firmware versions 4.23 and
> > > 4.33 (dpdk 2.0). Did not get to try the latest which is 4.5.
> > > Looking at the code, I don't see that this counter is being read any
> > > differently than its peer counters and I suspect the nic itself.
> > > Can you tell which firmware version you were using?
> > >
> > > thanks
> > > /Arnon
> > >
> > > On Mon, Oct 19, 2015 at 2:43 PM, Eimear Morrissey <eimear.morrissey at ie.ibm.com> wrote:
> > >
> > >
> > > Hi,
> > >
> > > I'm having issues measuring packets dropped at the NIC in both the
> > > 2.0.0 and 2.1.0 versions of DPDK on an X710 Intel NIC.
> > >
> > > In dpdk-2.0.0
> > > Using rte_eth_xstats the rx_packets and rx_bytes counters increase as
> > > expected, however rx_missed_errors is always 0 even if a sleep
> > > statement is added between calls to rte_eth_rx_burst. However changing
> > > the coremask so the application is running on a different socket than
> > > the card will cause rx_missed_errors to increment for a limited amount
> > > of time and then stop.
> > > Using rte_eth_stats, ipackets is incremented on packet receipt but the
> > > q_ipackets and q_errors arrays remain zero. Even crossing sockets
> > > seems to have no effect on q_errors.
> > >
> > > In dpdk-2.1.0 the behaviour is the same as above, except that the
> > > number of fields returned by rte_eth_xstats_get is reduced (no
> > > rx_missed_errors at all) so running on a different socket no longer
> > > has any noticeable effect on the stats.
> > >
> > > My understanding from the API manual is that the rte_eth_stats
> > > q_errors array should count the packets missed because software isn't
> > > polling fast enough, but that doesn't seem to be the case? Is there a
> > > standard DPDK way to check this? The application is a forwarding one
> > > so there's no other way to estimate drop except through NIC rx.
> > >
> > > Thanks,
> > > Eimear
> > >
> > >
> > >
> > > --
> > >
> > > Arnon Warshavsky
> > > Qwilt | work: +972-72-2221634 | mobile: +972-50-8583058 | arnon at qwilt.com
>
> > Hi Arnon,
> >
> > The firmware version I'm using is 4.26. Where do you see the latest
> > is 4.5 - I can't find anything obvious in the download centre?
> >
> > Regards,
> > Eimear
> >
> >
> >
> > --
> >
> > Arnon Warshavsky
> > Qwilt | work: +972-72-2221634 | mobile: +972-50-8583058 | arnon at qwilt.com
>
>
> I tried installing the firmware but on running the NVM update tool I just
> get "No updates available for your device" which makes me think that (for
> my SKU at least) I'm on the latest firmware.
>
> Also, if I bind the card back to the i40e driver, I can force the dropped
> count in ifconfig to increase by decreasing the rx ring size so I'm not
> convinced it's entirely a hardware issue.
>
> Regards,
> Eimear
>
>


-- 

*Arnon Warshavsky*
*Qwilt | work: +972-72-2221634 | mobile: +972-50-8583058 | arnon at qwilt.com*


[dpdk-dev] [PATCH v7 3/8] vhost: vring queue setup for multiple queue support

2015-10-22 Thread Yuanhan Liu
On Thu, Oct 22, 2015 at 09:49:58AM +, Xie, Huawei wrote:
> On 10/21/2015 11:48 AM, Yuanhan Liu wrote:
> > All queue pairs, including the default (the first) queue pair,
> > are allocated dynamically, when a vring_call message is received for
> > the first time for a specific queue pair.
> >
> > This is refactoring work for enabling vhost-user multiple queue;
> > it should not break anything as it makes no functional changes:
> > we don't support mq set, so there is at most one queue pair.
> >
> > This patch is based on Changchun's patch.
> >
> [...]
> >  
> >  void
> > @@ -290,13 +298,9 @@ user_get_vring_base(struct vhost_device_ctx ctx,
> >  * sent and only sent in vhost_vring_stop.
> >  * TODO: cleanup the vring, it isn't usable since here.
> >  */
> > -   if ((dev->virtqueue[VIRTIO_RXQ]->kickfd) >= 0) {
> > -   close(dev->virtqueue[VIRTIO_RXQ]->kickfd);
> > -   dev->virtqueue[VIRTIO_RXQ]->kickfd = -1;
> > -   }
> > -   if ((dev->virtqueue[VIRTIO_TXQ]->kickfd) >= 0) {
> > -   close(dev->virtqueue[VIRTIO_TXQ]->kickfd);
> > -   dev->virtqueue[VIRTIO_TXQ]->kickfd = -1;
> > +   if ((dev->virtqueue[state->index]->kickfd) >= 0) {
> > +   close(dev->virtqueue[state->index]->kickfd);
> > +   dev->virtqueue[state->index]->kickfd = -1;
> > }
> Since we change the behavior here, better list in the commit message as
> well.

I checked the code again, and found I should not change that:
GET_VRING_BASE is sent per virt queue pair.

BTW, it's wrong to do this kind of stuff here; we need to fix
it in the future.

> 
> >  
> >  
> > @@ -680,13 +704,21 @@ set_vring_call(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
> >  {
> > struct virtio_net *dev;
> > struct vhost_virtqueue *vq;
> > +   uint32_t cur_qp_idx = file->index / VIRTIO_QNUM;
> >  
> > dev = get_device(ctx);
> > if (dev == NULL)
> > return -1;
> >  
> > +   /* alloc vring queue pair if it is a new queue pair */
> > +   if (cur_qp_idx + 1 > dev->virt_qp_nb) {
> > +   if (alloc_vring_queue_pair(dev, cur_qp_idx) < 0)
> > +   return -1;
> > +   }
> > +
> Here we rely on the fact that this set_vring_call message is sent in the
> continuous ascending order of queue idx 0, 1, 2, ...

That's true.
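
If that ordering assumption ever breaks, a defensive variant could
allocate every missing pair up to the requested index. A sketch only,
assuming alloc_vring_queue_pair() increments dev->virt_qp_nb on
success:

	/* tolerate out-of-order queue indexes: allocate all pairs
	 * up to and including cur_qp_idx */
	while (dev->virt_qp_nb <= cur_qp_idx) {
		if (alloc_vring_queue_pair(dev, dev->virt_qp_nb) < 0)
			return -1;
	}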

> 
> > /* file->index refers to the queue index. The txq is 1, rxq is 0. */
> > vq = dev->virtqueue[file->index];
> > +   assert(vq != NULL);
> >  
> If we allocate the queue only when we receive the first vring message,
> better add a comment that we rely on this fact.

Will do that.

> Could we add a vhost-user message to tell us the queue number QEMU
> allocates, before the vring messages?

We may need to do that, but it's too late to make it into v2.2.

--yliu

> > if (vq->callfd >= 0)
> > close(vq->callfd);
> 


[dpdk-dev] [PATCH v7 4/8] vhost: rxtx: use queue id instead of constant ring index

2015-10-22 Thread Michael S. Tsirkin
On Thu, Oct 22, 2015 at 05:49:55PM +0800, Yuanhan Liu wrote:
> On Wed, Oct 21, 2015 at 05:26:18PM +0300, Michael S. Tsirkin wrote:
> > On Wed, Oct 21, 2015 at 08:48:15PM +0800, Yuanhan Liu wrote:
> > > > Please note that for virtio devices, guest is supposed to
> > > > control the placement of incoming packets in RX queues.
> > > 
> > > I may not follow you.
> > > 
> > > Enqueuing packets to a RX queue is done at vhost lib, outside the
> > > guest, how could the guest take the control here?
> > > 
> > >   --yliu
> > 
> > vhost should do what guest told it to.
> > 
> > See virtio spec:
> > 5.1.6.5.5 Automatic receive steering in multiqueue mode
> 
> Spec says:
> 
> After the driver transmitted a packet of a flow on transmitqX,
> the device SHOULD cause incoming packets for that flow to be
> steered to receiveqX.
> 
> 
> Michael, I still have no idea how vhost could know the flow even
> after discussion with Huawei. Could you be more specific about
> this? Say, how could guest know that? And how could guest tell
> vhost which RX queue it is going to use?
> 
> Thanks.
> 
>   --yliu

I don't really understand the question.

When guests transmits a packet, it makes a decision
about the flow to use, and maps that to a tx/rx pair of queues.

It sends packets out on the tx queue and expects device to
return packets from the same flow on the rx queue.

During transmit, device needs to figure out the flow
of packets as they are received from guest, and track
which flows go on which tx queue.
When it selects the rx queue, it has to use the same table.

There is currently no provision for controlling
steering for uni-directional
flows which are possible e.g. with UDP.

We might solve this in a future spec - for example, notify the guest
that steering information is missing for a given flow (by setting a flag
in a packet, or using the command queue) and have the guest send a dummy
empty packet to set the steering rule for this flow.
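
To make the spec text concrete, a rough sketch (not vhost code; struct
pkt and flow_hash() are invented for illustration) of the per-flow
table such a device keeps:

  #include <stdint.h>

  struct pkt;                              /* opaque packet handle */
  uint32_t flow_hash(const struct pkt *);  /* assumed 5-tuple hash */

  #define STEER_TBL_SIZE 1024

  /* flow hash -> queue-pair index, learned on the transmit path */
  static uint16_t steer_tbl[STEER_TBL_SIZE];

  /* TX: remember which queue pair the guest used for this flow */
  static void
  steer_learn(const struct pkt *p, uint16_t qp_idx)
  {
          steer_tbl[flow_hash(p) % STEER_TBL_SIZE] = qp_idx;
  }

  /* RX: steer incoming packets of the flow back to the same pair */
  static uint16_t
  steer_select_rxq(const struct pkt *p)
  {
          return steer_tbl[flow_hash(p) % STEER_TBL_SIZE];
  }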


-- 
MST


[dpdk-dev] [PATCH v3 7/7] virtio: pick simple rx/tx func

2015-10-22 Thread Xie, Huawei
On 10/22/2015 10:50 AM, Tan, Jianfeng wrote:
> On 10/22/2015 10:45 AM, Jianfeng wrote:
>
>> -Original Message-
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Huawei Xie
>> Sent: Tuesday, October 20, 2015 11:30 PM
>> To: dev at dpdk.org
>> Subject: [dpdk-dev] [PATCH v3 7/7] virtio: pick simple rx/tx func
>>
>> simple rx/tx func is enabled when user specifies single segment and no
>> offload support.
>> merge-able should be disabled to use simple rxtx.
>>
>> Signed-off-by: Huawei Xie 
>> ---
>>  drivers/net/virtio/virtio_rxtx.c | 12 
>>  1 file changed, 12 insertions(+)
>>
>> diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
>> index 947fc46..71f8cd4 100644
>> --- a/drivers/net/virtio/virtio_rxtx.c
>> +++ b/drivers/net/virtio/virtio_rxtx.c
>> @@ -62,6 +62,10 @@
>>  #define  VIRTIO_DUMP_PACKET(m, len) do { } while (0)  #endif
>>
>> +
>> +#define VIRTIO_SIMPLE_FLAGS	((uint32_t)ETH_TXQ_FLAGS_NOMULTSEGS | \
>> +	ETH_TXQ_FLAGS_NOOFFLOADS)
>> +
>>  static int use_simple_rxtx;
>>
>>  static void
>> @@ -471,6 +475,14 @@ virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,
>>  return -EINVAL;
>>  }
>>
>> +/* Use simple rx/tx func if single segment and no offloads */
>> +	if ((tx_conf->txq_flags & VIRTIO_SIMPLE_FLAGS) == VIRTIO_SIMPLE_FLAGS) {
>> +PMD_INIT_LOG(INFO, "Using simple rx/tx path");
>> +dev->tx_pkt_burst = virtio_xmit_pkts_simple;
>> +dev->rx_pkt_burst = virtio_recv_pkts_vec;
> Whether recv side mergeable is supported is controlled by 
> virtio_negotiate_feature().
> So "dev->rx_pkt_burst = virtio_recv_pkts_vec" should be restricted by 
> hw->guest_features & VIRTIO_NET_F_MRG_RXBUF, right?
I will add this check in the next version. However, it will still be put
here, as we want to leave ourselves a chance to dynamically choose the
normal/simple rx function.
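
Something like this fragment of virtio_dev_tx_queue_setup() (a sketch
only; assumes the vtpci_with_feature() helper and the hw pointer are in
scope at that point):

	/* use the simple paths only when mergeable rx buffers were
	 * not negotiated */
	if ((tx_conf->txq_flags & VIRTIO_SIMPLE_FLAGS) ==
			VIRTIO_SIMPLE_FLAGS &&
	    !vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF)) {
		PMD_INIT_LOG(INFO, "Using simple rx/tx path");
		dev->tx_pkt_burst = virtio_xmit_pkts_simple;
		dev->rx_pkt_burst = virtio_recv_pkts_vec;
		use_simple_rxtx = 1;
	}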
>
>> +use_simple_rxtx = 1;
>> +}
>> +
>>  ret = virtio_dev_queue_setup(dev, VTNET_TQ, queue_idx,
>> vtpci_queue_idx,
>>  nb_desc, socket_id, &vq);
>>  if (ret < 0) {
>> --
>> 1.8.1.4
>



[dpdk-dev] [PATCH v3 2/4] nfp-uio: new uio driver for netronome nfp6000 card

2015-10-22 Thread Alejandro Lucero
Submitting just the PMD for integration makes sense. I will remove all the
references to nfp_uio.

My doubt is about documentation. Working with the NFP PMD will not be
possible without nfp_uio. We could modify the documentation to say it is
possible to use igb_uio, but this is not the right thing to do (the pci
mask will be wrong). So, would it be acceptable to submit a new PMD
without any documentation for now? For the sake of integration, I prefer
that to giving wrong or incomplete documentation.

Thanks

On Wed, Oct 21, 2015 at 8:40 PM, Alejandro Lucero <
alejandro.lucero at netronome.com> wrote:

>
>
> On Wed, Oct 21, 2015 at 5:03 PM, Thomas Monjalon <
> thomas.monjalon at 6wind.com> wrote:
>
>> 2015-10-21 16:57, Alejandro Lucero:
>> > I understand the interest in not having another UIO driver. We could
>> > maintain an external nfp_uio for now, till either we get rid of it or
>> > we definitely find out it is really needed. Any chance to accept
>> > nfp_uio for now?
>>
>> No, there is some work under way to get rid of igb_uio.
>> So there is little chance that nfp_uio will ever be accepted.
>> Please take the first step of integrating your PMD without link interrupt.
>> Later we'll be able to discuss how to mitigate the interrupt issue.
>>
>
> Ok. I will create a new patchset version without nfp_uio.
>
> By the way, is that igb_uio work about the patches to uio_pci_generic?
> I thought there was some reluctance from the maintainer about adding
> PCI bus master support there.
>
>
>


[dpdk-dev] volunteer to be the maintainer of driver/net/intel sub-tree

2015-10-22 Thread Lu, Wenzhuo
Hi Thomas,

> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Thursday, October 22, 2015 4:17 PM
> To: Lu, Wenzhuo
> Cc: dev at dpdk.org; Zhang, Helin; Richardson, Bruce
> Subject: Re: volunteer to be the maintainer of driver/net/intel sub-tree
> 
> Hi Wenzhuo,
> 
> 2015-10-22 02:49, Lu, Wenzhuo:
> > Hi all,
> > Following the discussion of DPDK userspace and the maintenance of
> > development sub-trees, I'd like to volunteer to be the maintainer of
> > the driver/net/intel sub-tree. It includes all the PMDs for Intel
> > NICs, and Helin can be my backup.
> 
> Thanks for proposing.
> You are already doing part of the work being maintainer of e1000, and Helin
> for ixgbe and i40e.
> 
> > I suggest we create a new directory and move driver/net/e1000,
> > driver/net/fm10k, etc. into it. And we can also create directories
> > for other vendors, just like the kernel drivers do.
> 
> We don't need to move files to be able to manage them in a sub-tree.
> For the day to day tasks, it's better to limit directory depth.
> And think about what happened with Broadcom and Qlogic: we are not
> going to move files when Intel buys the NIC xyz.
> Generally speaking, it's better to keep company names outside of technical
> things.
> 
I think you're being reasonable. I thought adding a directory might make
things simpler, but the reality is it may introduce something that's out
of our control. :)
But a single directory also has its benefit: it easily tells us that
everything in the directory is maintained by one owner or a group of
owners. I think the key is how to let developers know what happens. If
something can explain itself, we need not clarify it. I agree we'd
better not add the company name. But I think it's still worth thinking
about how to make things clearer. Honestly, I don't have any proposal
now. Maybe we will have to come up with a doc in the end. :)

> > Additionally, as we observed, some patch sets will change not only
> > files in drivers/net, but also some files in lib/librte_ether, doc,
> > app, examples... Being only the drivers/net/intel maintainer cannot
> > work for these patch sets, especially for new features; applying a
> > partial feature patch set is not ideal. Ideally we need a maintainer
> > to drive the RTE_ETHER discussion. Maybe Bruce can be a top-level
> > maintainer, so he can help when we face this scenario.
> 
> A sub-tree is not restricted to some directories. It must manage an expertise
> zone, a technical domain, an area of interest, choose your words ;)
> 
> Today we have no working sub-tree. So we should start by splitting
> areas at a large grain and making that work. Then we can split further
> with a top-down approach.
> So I think we should first create the subtree for networking drivers and wait
> a little before having a subtree for Intel NICs.
> 
> Do you agree?
Honestly, I cannot tell whether it is better to begin with a big area or
a small one. But I agree we should have something working first; it can
become a guide. A sub-tree for networking drivers seems to be a good
option. If it's not too much bother, I'd like to mention again that my
concern is that the networking drivers are closely tied to rte_eth.


[dpdk-dev] [PATCHv6 0/9] ethdev: add new API to retrieve RX/TX queue information

2015-10-22 Thread Konstantin Ananyev
Add the ability for the upper layer to query:
1) configured RX/TX queue information.
2) information about RX/TX descriptors min/max/align
numbers per queue for the device.

v2 changes:
- Add formal check for the qinfo input parameter.
- As suggested rename 'rx_qinfo/tx_qinfo' to 'rxq_info/txq_info'

v3 changes:
- Updated rte_ether_version.map
- Merged with latest changes

v4 changes:
- rte_ether_version.map: move new functions into DPDK_2.1 sub-space.

v5 changes:
- adressed previous code-review comments
- rte_ether_version.map: move new functions into DPDK_2.2 sub-space.
- added new fields into rte_eth_dev_info

v6 changes:
- respin to comply with latest dpdk.org
- update release_notes

Konstantin Ananyev (9):
  ethdev: add new API to retrieve RX/TX queue information
  i40e: add support for eth_(rxq|txq)_info_get and (rx|tx)_desc_lim
  ixgbe: add support for eth_(rxq|txq)_info_get and (rx|tx)_desc_lim
  e1000: add support for eth_(rxq|txq)_info_get and (rx|tx)_desc_lim
  testpmd: add new command to display RX/TX queue information
  fm10k: add HW specific desc_lim data into dev_info
  cxgbe: add HW specific desc_lim data into dev_info
  vmxnet3: add HW specific desc_lim data into dev_info
  doc: release notes update for queue_info_get()

 app/test-pmd/cmdline.c                 | 48 +++
 app/test-pmd/config.c                  | 77 ++
 app/test-pmd/testpmd.h                 |  2 +
 doc/guides/rel_notes/release_2_2.rst   |  7 +++
 drivers/net/cxgbe/cxgbe_ethdev.c       |  9 
 drivers/net/e1000/e1000_ethdev.h       | 36 ++
 drivers/net/e1000/em_ethdev.c          | 14 ++
 drivers/net/e1000/em_rxtx.c            | 71 
 drivers/net/e1000/igb_ethdev.c         | 22 +
 drivers/net/e1000/igb_rxtx.c           | 66 +-
 drivers/net/fm10k/fm10k_ethdev.c       | 11 +
 drivers/net/i40e/i40e_ethdev.c         | 14 ++
 drivers/net/i40e/i40e_ethdev.h         |  5 ++
 drivers/net/i40e/i40e_ethdev_vf.c      | 12 +
 drivers/net/i40e/i40e_rxtx.c           | 37 +++
 drivers/net/ixgbe/ixgbe_ethdev.c       | 23 +
 drivers/net/ixgbe/ixgbe_ethdev.h       |  6 +++
 drivers/net/ixgbe/ixgbe_rxtx.c         | 68 +--
 drivers/net/ixgbe/ixgbe_rxtx.h         | 21 +
 drivers/net/vmxnet3/vmxnet3_ethdev.c   | 12 +
 lib/librte_ether/rte_ethdev.c          | 68 +++
 lib/librte_ether/rte_ethdev.h          | 85 +-
 lib/librte_ether/rte_ether_version.map |  8 
 23 files changed, 642 insertions(+), 80 deletions(-)

-- 
1.8.5.3



[dpdk-dev] [PATCHv6 1/9] ethdev: add new API to retrieve RX/TX queue information

2015-10-22 Thread Konstantin Ananyev
From: "Ananyev, Konstantin" 

Add the ability for the upper layer to query RX/TX queue information.
Add into rte_eth_dev_info new fields to represent information about
RX/TX descriptor min/max/align numbers per queue for the device.

Add new structures:
struct rte_eth_rxq_info
struct rte_eth_txq_info

new functions:
rte_eth_rx_queue_info_get
rte_eth_tx_queue_info_get

into rte_etdev API.

Left extra free space in the queue info structures,
so extra fields could be added later without ABI breakage.

Add new fields:
rx_desc_lim
tx_desc_lim
into rte_eth_dev_info.
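
For illustration (not part of the patch), a usage sketch combining the
new desc_lim fields and the query call; the port/queue ids, function
name and printf are arbitrary:

  #include <stdio.h>
  #include <rte_ethdev.h>

  static int
  setup_and_inspect_rxq(uint8_t port_id, uint16_t qid, uint16_t want,
                        struct rte_mempool *mp)
  {
          struct rte_eth_dev_info dev_info;
          struct rte_eth_rxq_info qinfo;
          uint16_t nb_desc = want;

          rte_eth_dev_info_get(port_id, &dev_info);

          /* clamp and align the ring size to what the PMD reports */
          if (nb_desc > dev_info.rx_desc_lim.nb_max)
                  nb_desc = dev_info.rx_desc_lim.nb_max;
          if (nb_desc < dev_info.rx_desc_lim.nb_min)
                  nb_desc = dev_info.rx_desc_lim.nb_min;
          nb_desc -= nb_desc % dev_info.rx_desc_lim.nb_align;

          if (rte_eth_rx_queue_setup(port_id, qid, nb_desc,
                          rte_eth_dev_socket_id(port_id), NULL, mp) != 0)
                  return -1;

          /* read back what the PMD actually configured */
          if (rte_eth_rx_queue_info_get(port_id, qid, &qinfo) == 0)
                  printf("rxq %u: %u descriptors, scattered rx: %u\n",
                         qid, qinfo.nb_desc, qinfo.scattered_rx);

          return 0;
  }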

Signed-off-by: Konstantin Ananyev 
---
 lib/librte_ether/rte_ethdev.c  | 68 +++
 lib/librte_ether/rte_ethdev.h  | 85 +-
 lib/librte_ether/rte_ether_version.map |  8 
 3 files changed, 159 insertions(+), 2 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index f593f6e..d18ecb5 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1447,6 +1447,19 @@ rte_eth_rx_queue_setup(uint8_t port_id, uint16_t rx_queue_id,
return -EINVAL;
}

+   if (nb_rx_desc > dev_info.rx_desc_lim.nb_max ||
+   nb_rx_desc < dev_info.rx_desc_lim.nb_min ||
+   nb_rx_desc % dev_info.rx_desc_lim.nb_align != 0) {
+
+   PMD_DEBUG_TRACE("Invalid value for nb_rx_desc(=%hu), "
+   "should be: <= %hu, = %hu, and a product of %hu\n",
+   nb_rx_desc,
+   dev_info.rx_desc_lim.nb_max,
+   dev_info.rx_desc_lim.nb_min,
+   dev_info.rx_desc_lim.nb_align);
+   return -EINVAL;
+   }
+
if (rx_conf == NULL)
rx_conf = &dev_info.default_rxconf;

@@ -1786,11 +1799,18 @@ void
 rte_eth_dev_info_get(uint8_t port_id, struct rte_eth_dev_info *dev_info)
 {
struct rte_eth_dev *dev;
+   const struct rte_eth_desc_lim lim = {
+   .nb_max = UINT16_MAX,
+   .nb_min = 0,
+   .nb_align = 1,
+   };

VALID_PORTID_OR_RET(port_id);
dev = &rte_eth_devices[port_id];

memset(dev_info, 0, sizeof(struct rte_eth_dev_info));
+   dev_info->rx_desc_lim = lim;
+   dev_info->tx_desc_lim = lim;

FUNC_PTR_OR_RET(*dev->dev_ops->dev_infos_get);
(*dev->dev_ops->dev_infos_get)(dev, dev_info);
@@ -3221,6 +3241,54 @@ rte_eth_remove_tx_callback(uint8_t port_id, uint16_t queue_id,
 }

 int
+rte_eth_rx_queue_info_get(uint8_t port_id, uint16_t queue_id,
+   struct rte_eth_rxq_info *qinfo)
+{
+   struct rte_eth_dev *dev;
+
+   VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+   if (qinfo == NULL)
+   return -EINVAL;
+
+   dev = &rte_eth_devices[port_id];
+   if (queue_id >= dev->data->nb_rx_queues) {
+   PMD_DEBUG_TRACE("Invalid RX queue_id=%d\n", queue_id);
+   return -EINVAL;
+   }
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rxq_info_get, -ENOTSUP);
+
+   memset(qinfo, 0, sizeof(*qinfo));
+   dev->dev_ops->rxq_info_get(dev, queue_id, qinfo);
+   return 0;
+}
+
+int
+rte_eth_tx_queue_info_get(uint8_t port_id, uint16_t queue_id,
+   struct rte_eth_txq_info *qinfo)
+{
+   struct rte_eth_dev *dev;
+
+   VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+   if (qinfo == NULL)
+   return -EINVAL;
+
+   dev = &rte_eth_devices[port_id];
+   if (queue_id >= dev->data->nb_tx_queues) {
+   PMD_DEBUG_TRACE("Invalid TX queue_id=%d\n", queue_id);
+   return -EINVAL;
+   }
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->txq_info_get, -ENOTSUP);
+
+   memset(qinfo, 0, sizeof(*qinfo));
+   dev->dev_ops->txq_info_get(dev, queue_id, qinfo);
+   return 0;
+}
+
+int
 rte_eth_dev_set_mc_addr_list(uint8_t port_id,
 struct ether_addr *mc_addr_set,
 uint32_t nb_mc_addr)
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 8a8c82b..4d7b6f2 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -653,6 +653,15 @@ struct rte_eth_txconf {
 };

 /**
+ * A structure contains information about HW descriptor ring limitations.
+ */
+struct rte_eth_desc_lim {
+   uint16_t nb_max;   /**< Max allowed number of descriptors. */
+   uint16_t nb_min;   /**< Min allowed number of descriptors. */
+   uint16_t nb_align; /**< Number of descriptors should be aligned to. */
+};
+
+/**
  * This enum indicates the flow control mode
  */
 enum rte_eth_fc_mode {
@@ -837,6 +846,8 @@ struct rte_eth_dev_info {
uint16_t vmdq_queue_base; /**< First queue ID for VMDQ pools. */
uint16_t vmdq_queue_num;  /**< Queue number for VMDQ pools. */
uint16_t vmdq_pool_base;  /**< First ID of VMDQ pools. */
+   struct rte_eth_desc_lim rx_desc_lim;  /**< RX descrip

[dpdk-dev] [PATCHv6 6/9] cxgbe: add HW specific desc_lim data into dev_info

2015-10-22 Thread Konstantin Ananyev
Signed-off-by: Konstantin Ananyev 
---
 drivers/net/cxgbe/cxgbe_ethdev.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/net/cxgbe/cxgbe_ethdev.c b/drivers/net/cxgbe/cxgbe_ethdev.c
index a8e057b..920e071 100644
--- a/drivers/net/cxgbe/cxgbe_ethdev.c
+++ b/drivers/net/cxgbe/cxgbe_ethdev.c
@@ -141,6 +141,12 @@ static void cxgbe_dev_info_get(struct rte_eth_dev *eth_dev,
struct adapter *adapter = pi->adapter;
int max_queues = adapter->sge.max_ethqsets / adapter->params.nports;

+   static const struct rte_eth_desc_lim cxgbe_desc_lim = {
+   .nb_max = CXGBE_MAX_RING_DESC_SIZE,
+   .nb_min = CXGBE_MIN_RING_DESC_SIZE,
+   .nb_align = 1,
+   };
+
device_info->min_rx_bufsize = CXGBE_MIN_RX_BUFSIZE;
device_info->max_rx_pktlen = CXGBE_MAX_RX_PKTLEN;
device_info->max_rx_queues = max_queues;
@@ -162,6 +168,9 @@ static void cxgbe_dev_info_get(struct rte_eth_dev *eth_dev,
   DEV_TX_OFFLOAD_TCP_TSO;

device_info->reta_size = pi->rss_size;
+
+   device_info->rx_desc_lim = cxgbe_desc_lim;
+   device_info->tx_desc_lim = cxgbe_desc_lim;
 }

 static void cxgbe_dev_promiscuous_enable(struct rte_eth_dev *eth_dev)
-- 
1.8.5.3



[dpdk-dev] [PATCHv6 4/9] e1000: add support for eth_(rxq|txq)_info_get and (rx|tx)_desc_lim

2015-10-22 Thread Konstantin Ananyev
Signed-off-by: Konstantin Ananyev 
---
 drivers/net/e1000/e1000_ethdev.h | 36 
 drivers/net/e1000/em_ethdev.c    | 14 
 drivers/net/e1000/em_rxtx.c      | 71 +++-
 drivers/net/e1000/igb_ethdev.c   | 22 +
 drivers/net/e1000/igb_rxtx.c     | 66 -
 5 files changed, 156 insertions(+), 53 deletions(-)

diff --git a/drivers/net/e1000/e1000_ethdev.h b/drivers/net/e1000/e1000_ethdev.h
index 4e69e44..3c6f613 100644
--- a/drivers/net/e1000/e1000_ethdev.h
+++ b/drivers/net/e1000/e1000_ethdev.h
@@ -108,6 +108,30 @@
ETH_RSS_IPV6_TCP_EX | \
ETH_RSS_IPV6_UDP_EX)

+/*
+ * Maximum number of Ring Descriptors.
+ *
+ * Since RDLEN/TDLEN should be multiple of 128 bytes, the number of ring
+ * descriptors should meet the following condition:
+ * (num_ring_desc * sizeof(struct e1000_rx/tx_desc)) % 128 == 0
+ */
+#define E1000_MIN_RING_DESC 32
+#define E1000_MAX_RING_DESC 4096
+
+/*
+ * TDBA/RDBA should be aligned on 16 byte boundary. But TDLEN/RDLEN should be
+ * multiple of 128 bytes. So we align TDBA/RDBA on 128 byte boundary.
+ * This will also optimize cache line size effect.
+ * H/W supports up to cache line size 128.
+ */
+#define E1000_ALIGN 128
+
+#define IGB_RXD_ALIGN	(E1000_ALIGN / sizeof(union e1000_adv_rx_desc))
+#define IGB_TXD_ALIGN	(E1000_ALIGN / sizeof(union e1000_adv_tx_desc))
+
+#define EM_RXD_ALIGN	(E1000_ALIGN / sizeof(struct e1000_rx_desc))
+#define EM_TXD_ALIGN	(E1000_ALIGN / sizeof(struct e1000_data_desc))
+
 /* structure for interrupt relative data */
 struct e1000_interrupt {
uint32_t flags;
@@ -307,6 +331,12 @@ void igb_pf_mbx_process(struct rte_eth_dev *eth_dev);

 int igb_pf_host_configure(struct rte_eth_dev *eth_dev);

+void igb_rxq_info_get(struct rte_eth_dev *dev, uint16_t queue_id,
+   struct rte_eth_rxq_info *qinfo);
+
+void igb_txq_info_get(struct rte_eth_dev *dev, uint16_t queue_id,
+   struct rte_eth_txq_info *qinfo);
+
 /*
  * RX/TX EM function prototypes
  */
@@ -343,6 +373,12 @@ uint16_t eth_em_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 uint16_t eth_em_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
uint16_t nb_pkts);

+void em_rxq_info_get(struct rte_eth_dev *dev, uint16_t queue_id,
+   struct rte_eth_rxq_info *qinfo);
+
+void em_txq_info_get(struct rte_eth_dev *dev, uint16_t queue_id,
+   struct rte_eth_txq_info *qinfo);
+
 void igb_pf_host_uninit(struct rte_eth_dev *dev);

 #endif /* _E1000_ETHDEV_H_ */
diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index 912f5dd..0cbc228 100644
--- a/drivers/net/e1000/em_ethdev.c
+++ b/drivers/net/e1000/em_ethdev.c
@@ -166,6 +166,8 @@ static const struct eth_dev_ops eth_em_ops = {
.mac_addr_add = eth_em_rar_set,
.mac_addr_remove  = eth_em_rar_clear,
.set_mc_addr_list = eth_em_set_mc_addr_list,
+   .rxq_info_get = em_rxq_info_get,
+   .txq_info_get = em_txq_info_get,
 };

 /**
@@ -933,6 +935,18 @@ eth_em_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)

dev_info->max_rx_queues = 1;
dev_info->max_tx_queues = 1;
+
+   dev_info->rx_desc_lim = (struct rte_eth_desc_lim) {
+   .nb_max = E1000_MAX_RING_DESC,
+   .nb_min = E1000_MIN_RING_DESC,
+   .nb_align = EM_RXD_ALIGN,
+   };
+
+   dev_info->tx_desc_lim = (struct rte_eth_desc_lim) {
+   .nb_max = E1000_MAX_RING_DESC,
+   .nb_min = E1000_MIN_RING_DESC,
+   .nb_align = EM_TXD_ALIGN,
+   };
 }

 /* return 0 means link status changed, -1 means not changed */
diff --git a/drivers/net/e1000/em_rxtx.c b/drivers/net/e1000/em_rxtx.c
index 3b8776d..03e1bc2 100644
--- a/drivers/net/e1000/em_rxtx.c
+++ b/drivers/net/e1000/em_rxtx.c
@@ -1081,26 +1081,6 @@ eth_em_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
return (nb_rx);
 }

-/*
- * Rings setup and release.
- *
- * TDBA/RDBA should be aligned on 16 byte boundary. But TDLEN/RDLEN should be
- * multiple of 128 bytes. So we align TDBA/RDBA on 128 byte boundary.
- * This will also optimize cache line size effect.
- * H/W supports up to cache line size 128.
- */
-#define EM_ALIGN 128
-
-/*
- * Maximum number of Ring Descriptors.
- *
- * Since RDLEN/TDLEN should be multiple of 128 bytes, the number of ring
- * desscriptors should meet the following condition:
- * (num_ring_desc * sizeof(struct e1000_rx/tx_desc)) % 128 == 0
- */
-#define EM_MIN_RING_DESC 32
-#define EM_MAX_RING_DESC 4096
-
 #define EM_MAX_BUF_SIZE 16384
 #define EM_RCTL_FLXBUF_STEP 1024

@@ -1210,11 +1190,11 @@ eth_em_tx_queue_setup(struct rte_eth_dev *dev,
/*
 * Validate number of transmit descriptors.
 * It must not exceed hardware maximum, and must be multiple
-*

[dpdk-dev] [PATCHv6 8/9] testpmd: add new command to display RX/TX queue information

2015-10-22 Thread Konstantin Ananyev
From: "Ananyev, Konstantin" 

Signed-off-by: Konstantin Ananyev 
---
 app/test-pmd/cmdline.c | 48 +++
 app/test-pmd/config.c  | 77 ++
 app/test-pmd/testpmd.h |  2 ++
 3 files changed, 127 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 0f8f48f..ea2b8a8 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -5305,6 +5305,53 @@ cmdline_parse_inst_t cmd_showport = {
},
 };

+/* *** SHOW QUEUE INFO *** */
+struct cmd_showqueue_result {
+   cmdline_fixed_string_t show;
+   cmdline_fixed_string_t type;
+   cmdline_fixed_string_t what;
+   uint8_t portnum;
+   uint16_t queuenum;
+};
+
+static void
+cmd_showqueue_parsed(void *parsed_result,
+   __attribute__((unused)) struct cmdline *cl,
+   __attribute__((unused)) void *data)
+{
+   struct cmd_showqueue_result *res = parsed_result;
+
+   if (!strcmp(res->type, "rxq"))
+   rx_queue_infos_display(res->portnum, res->queuenum);
+   else if (!strcmp(res->type, "txq"))
+   tx_queue_infos_display(res->portnum, res->queuenum);
+}
+
+cmdline_parse_token_string_t cmd_showqueue_show =
+   TOKEN_STRING_INITIALIZER(struct cmd_showqueue_result, show, "show");
+cmdline_parse_token_string_t cmd_showqueue_type =
+   TOKEN_STRING_INITIALIZER(struct cmd_showqueue_result, type, "rxq#txq");
+cmdline_parse_token_string_t cmd_showqueue_what =
+   TOKEN_STRING_INITIALIZER(struct cmd_showqueue_result, what, "info");
+cmdline_parse_token_num_t cmd_showqueue_portnum =
+   TOKEN_NUM_INITIALIZER(struct cmd_showqueue_result, portnum, UINT8);
+cmdline_parse_token_num_t cmd_showqueue_queuenum =
+   TOKEN_NUM_INITIALIZER(struct cmd_showqueue_result, queuenum, UINT16);
+
+cmdline_parse_inst_t cmd_showqueue = {
+   .f = cmd_showqueue_parsed,
+   .data = NULL,
+   .help_str = "show rxq|txq info  ",
+   .tokens = {
+   (void *)&cmd_showqueue_show,
+   (void *)&cmd_showqueue_type,
+   (void *)&cmd_showqueue_what,
+   (void *)&cmd_showqueue_portnum,
+   (void *)&cmd_showqueue_queuenum,
+   NULL,
+   },
+};
+
 /* *** READ PORT REGISTER *** */
 struct cmd_read_reg_result {
cmdline_fixed_string_t read;
@@ -8910,6 +8957,7 @@ cmdline_parse_ctx_t main_ctx[] = {
(cmdline_parse_inst_t *)&cmd_help_long,
(cmdline_parse_inst_t *)&cmd_quit,
(cmdline_parse_inst_t *)&cmd_showport,
+   (cmdline_parse_inst_t *)&cmd_showqueue,
(cmdline_parse_inst_t *)&cmd_showportall,
(cmdline_parse_inst_t *)&cmd_showcfg,
(cmdline_parse_inst_t *)&cmd_start,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index cf2aa6e..aad2ab6 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -293,6 +293,69 @@ nic_stats_mapping_display(portid_t port_id)
 }

 void
+rx_queue_infos_display(portid_t port_id, uint16_t queue_id)
+{
+   struct rte_eth_rxq_info qinfo;
+   int32_t rc;
+   static const char *info_border = "*";
+
+   rc = rte_eth_rx_queue_info_get(port_id, queue_id, &qinfo);
+   if (rc != 0) {
+   printf("Failed to retrieve information for port: %hhu, "
+   "RX queue: %hu\nerror desc: %s(%d)\n",
+   port_id, queue_id, strerror(-rc), rc);
+   return;
+   }
+
+   printf("\n%s Infos for port %-2u, RX queue %-2u %s",
+  info_border, port_id, queue_id, info_border);
+
+   printf("\nMempool: %s", (qinfo.mp == NULL) ? "NULL" : qinfo.mp->name);
+   printf("\nRX prefetch threshold: %hhu", qinfo.conf.rx_thresh.pthresh);
+   printf("\nRX host threshold: %hhu", qinfo.conf.rx_thresh.hthresh);
+   printf("\nRX writeback threshold: %hhu", qinfo.conf.rx_thresh.wthresh);
+   printf("\nRX free threshold: %hu", qinfo.conf.rx_free_thresh);
+   printf("\nRX drop packets: %s",
+   (qinfo.conf.rx_drop_en != 0) ? "on" : "off");
+   printf("\nRX deferred start: %s",
+   (qinfo.conf.rx_deferred_start != 0) ? "on" : "off");
+   printf("\nRX scattered packets: %s",
+   (qinfo.scattered_rx != 0) ? "on" : "off");
+   printf("\nNumber of RXDs: %hu", qinfo.nb_desc);
+   printf("\n");
+}
+
+void
+tx_queue_infos_display(portid_t port_id, uint16_t queue_id)
+{
+   struct rte_eth_txq_info qinfo;
+   int32_t rc;
+   static const char *info_border = "*";
+
+   rc = rte_eth_tx_queue_info_get(port_id, queue_id, &qinfo);
+   if (rc != 0) {
+   printf("Failed to retrieve information for port: %hhu, "
+   "TX queue: %hu\nerror desc: %s(%d)\n",
+   port_id, queue_id, strerror(-rc), rc);
+   return;
+   }
+
+   printf("\n%s Infos for port %-2u, TX queue %-2u %s",
+  info_border, port_i

[dpdk-dev] [PATCHv6 9/9] doc: release notes update for queue_info_get()

2015-10-22 Thread Konstantin Ananyev
Signed-off-by: Konstantin Ananyev 
---
 doc/guides/rel_notes/release_2_2.rst | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
index 4f75cff..33ea399 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -9,6 +9,11 @@ New Features
   *  Added support for Jumbo Frames.
   *  Optimize forwarding performance for Chelsio T5 40GbE cards.

+* **Add new API into rte_ethdev to retrieve RX/TX queue information.**
+
+  *  Add the ability for the upper layer to query RX/TX queue information.
+  *  Add into rte_eth_dev_info new fields to represent information about
+ RX/TX descriptor min/max/align numbers per queue for the device.

 Resolved Issues
 ---
@@ -94,6 +99,8 @@ API Changes
 * The deprecated ring PMD functions are removed:
   rte_eth_ring_pair_create() and rte_eth_ring_pair_attach().

+* New functions rte_eth_rx_queue_info_get() and rte_eth_tx_queue_info_get()
+  are introduced.

 ABI Changes
 ---
-- 
1.8.5.3



[dpdk-dev] [PATCHv6 2/9] i40e: add support for eth_(rxq|txq)_info_get and (rx|tx)_desc_lim

2015-10-22 Thread Konstantin Ananyev
This patch assumes that the patch:
i40e: fix wrong alignment for the number of HW descriptors
already applied.

Signed-off-by: Konstantin Ananyev 
---
 drivers/net/i40e/i40e_ethdev.c    | 14 ++
 drivers/net/i40e/i40e_ethdev.h    |  5 ++
 drivers/net/i40e/i40e_ethdev_vf.c | 12 
 drivers/net/i40e/i40e_rxtx.c      | 37 +
 4 files changed, 68 insertions(+)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 2dd9fdc..cbc1985 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -283,6 +283,8 @@ static const struct eth_dev_ops i40e_eth_dev_ops = {
.udp_tunnel_add   = i40e_dev_udp_tunnel_add,
.udp_tunnel_del   = i40e_dev_udp_tunnel_del,
.filter_ctrl  = i40e_dev_filter_ctrl,
+   .rxq_info_get = i40e_rxq_info_get,
+   .txq_info_get = i40e_txq_info_get,
.mirror_rule_set  = i40e_mirror_rule_set,
.mirror_rule_reset= i40e_mirror_rule_reset,
.timesync_enable  = i40e_timesync_enable,
@@ -1674,6 +1676,18 @@ i40e_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
ETH_TXQ_FLAGS_NOOFFLOADS,
};

+   dev_info->rx_desc_lim = (struct rte_eth_desc_lim) {
+   .nb_max = I40E_MAX_RING_DESC,
+   .nb_min = I40E_MIN_RING_DESC,
+   .nb_align = I40E_ALIGN_RING_DESC,
+   };
+
+   dev_info->tx_desc_lim = (struct rte_eth_desc_lim) {
+   .nb_max = I40E_MAX_RING_DESC,
+   .nb_min = I40E_MIN_RING_DESC,
+   .nb_align = I40E_ALIGN_RING_DESC,
+   };
+
if (pf->flags & I40E_FLAG_VMDQ) {
dev_info->max_vmdq_pools = pf->max_nb_vmdq_vsi;
dev_info->vmdq_queue_base = dev_info->max_rx_queues;
diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h
index 6185657..4748392 100644
--- a/drivers/net/i40e/i40e_ethdev.h
+++ b/drivers/net/i40e/i40e_ethdev.h
@@ -502,6 +502,11 @@ int i40e_fdir_ctrl_func(struct rte_eth_dev *dev,
  enum rte_filter_op filter_op,
  void *arg);

+void i40e_rxq_info_get(struct rte_eth_dev *dev, uint16_t queue_id,
+   struct rte_eth_rxq_info *qinfo);
+void i40e_txq_info_get(struct rte_eth_dev *dev, uint16_t queue_id,
+   struct rte_eth_txq_info *qinfo);
+
 /* I40E_DEV_PRIVATE_TO */
 #define I40E_DEV_PRIVATE_TO_PF(adapter) \
(&((struct i40e_adapter *)adapter)->pf)
diff --git a/drivers/net/i40e/i40e_ethdev_vf.c b/drivers/net/i40e/i40e_ethdev_vf.c
index b694400..5dad12d 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -1756,6 +1756,18 @@ i40evf_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
.txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
ETH_TXQ_FLAGS_NOOFFLOADS,
};
+
+   dev_info->rx_desc_lim = (struct rte_eth_desc_lim) {
+   .nb_max = I40E_MAX_RING_DESC,
+   .nb_min = I40E_MIN_RING_DESC,
+   .nb_align = I40E_ALIGN_RING_DESC,
+   };
+
+   dev_info->tx_desc_lim = (struct rte_eth_desc_lim) {
+   .nb_max = I40E_MAX_RING_DESC,
+   .nb_min = I40E_MIN_RING_DESC,
+   .nb_align = I40E_ALIGN_RING_DESC,
+   };
 }

 static void
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 260e580..fa1451e 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -3063,3 +3063,40 @@ i40e_fdir_setup_rx_resources(struct i40e_pf *pf)

return I40E_SUCCESS;
 }
+
+void
+i40e_rxq_info_get(struct rte_eth_dev *dev, uint16_t queue_id,
+   struct rte_eth_rxq_info *qinfo)
+{
+   struct i40e_rx_queue *rxq;
+
+   rxq = dev->data->rx_queues[queue_id];
+
+   qinfo->mp = rxq->mp;
+   qinfo->scattered_rx = dev->data->scattered_rx;
+   qinfo->nb_desc = rxq->nb_rx_desc;
+
+   qinfo->conf.rx_free_thresh = rxq->rx_free_thresh;
+   qinfo->conf.rx_drop_en = rxq->drop_en;
+   qinfo->conf.rx_deferred_start = rxq->rx_deferred_start;
+}
+
+void
+i40e_txq_info_get(struct rte_eth_dev *dev, uint16_t queue_id,
+   struct rte_eth_txq_info *qinfo)
+{
+   struct i40e_tx_queue *txq;
+
+   txq = dev->data->tx_queues[queue_id];
+
+   qinfo->nb_desc = txq->nb_tx_desc;
+
+   qinfo->conf.tx_thresh.pthresh = txq->pthresh;
+   qinfo->conf.tx_thresh.hthresh = txq->hthresh;
+   qinfo->conf.tx_thresh.wthresh = txq->wthresh;
+
+   qinfo->conf.tx_free_thresh = txq->tx_free_thresh;
+   qinfo->conf.tx_rs_thresh = txq->tx_rs_thresh;
+   qinfo->conf.txq_flags = txq->txq_flags;
+   qinfo->conf.tx_deferred_start = txq->tx_deferred_start;
+}
-- 
1.8.5.3



[dpdk-dev] [PATCHv6 5/9] fm10k: add HW specific desc_lim data into dev_info

2015-10-22 Thread Konstantin Ananyev
Signed-off-by: Konstantin Ananyev 
---
 drivers/net/fm10k/fm10k_ethdev.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index a69c990..9588dab 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -964,6 +964,17 @@ fm10k_dev_infos_get(struct rte_eth_dev *dev,
ETH_TXQ_FLAGS_NOOFFLOADS,
};

+   dev_info->rx_desc_lim = (struct rte_eth_desc_lim) {
+   .nb_max = FM10K_MAX_RX_DESC,
+   .nb_min = FM10K_MIN_RX_DESC,
+   .nb_align = FM10K_MULT_RX_DESC,
+   };
+
+   dev_info->tx_desc_lim = (struct rte_eth_desc_lim) {
+   .nb_max = FM10K_MAX_TX_DESC,
+   .nb_min = FM10K_MIN_TX_DESC,
+   .nb_align = FM10K_MULT_TX_DESC,
+   };
 }

 static int
-- 
1.8.5.3



[dpdk-dev] [PATCHv6 7/9] vmxnet3: add HW specific desc_lim data into dev_info

2015-10-22 Thread Konstantin Ananyev
Signed-off-by: Konstantin Ananyev 
---
 drivers/net/vmxnet3/vmxnet3_ethdev.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.c 
b/drivers/net/vmxnet3/vmxnet3_ethdev.c
index a70be5c..3745b7d 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethdev.c
+++ b/drivers/net/vmxnet3/vmxnet3_ethdev.c
@@ -677,6 +677,18 @@ vmxnet3_dev_info_get(__attribute__((unused))struct 
rte_eth_dev *dev, struct rte_
dev_info->default_txconf.txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
ETH_TXQ_FLAGS_NOOFFLOADS;
dev_info->flow_type_rss_offloads = VMXNET3_RSS_OFFLOAD_ALL;
+
+   dev_info->rx_desc_lim = (struct rte_eth_desc_lim) {
+   .nb_max = VMXNET3_RX_RING_MAX_SIZE,
+   .nb_min = VMXNET3_DEF_RX_RING_SIZE,
+   .nb_align = 1,
+   };
+
+   dev_info->tx_desc_lim = (struct rte_eth_desc_lim) {
+   .nb_max = VMXNET3_TX_RING_MAX_SIZE,
+   .nb_min = VMXNET3_DEF_TX_RING_SIZE,
+   .nb_align = 1,
+   };
 }

 /* return 0 means link status changed, -1 means not changed */
-- 
1.8.5.3



[dpdk-dev] [PATCHv6 3/9] ixgbe: add support for eth_(rxq|txq)_info_get and (rx|tx)_desc_lim

2015-10-22 Thread Konstantin Ananyev
Signed-off-by: Konstantin Ananyev 
---
 drivers/net/ixgbe/ixgbe_ethdev.c | 23 ++
 drivers/net/ixgbe/ixgbe_ethdev.h |  6 
 drivers/net/ixgbe/ixgbe_rxtx.c   | 68 +---
 drivers/net/ixgbe/ixgbe_rxtx.h   | 21 +
 4 files changed, 93 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index ec2918c..4769bb0 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -386,6 +386,18 @@ static const struct rte_pci_id pci_id_ixgbevf_map[] = {

 };

+static const struct rte_eth_desc_lim rx_desc_lim = {
+   .nb_max = IXGBE_MAX_RING_DESC,
+   .nb_min = IXGBE_MIN_RING_DESC,
+   .nb_align = IXGBE_RXD_ALIGN,
+};
+
+static const struct rte_eth_desc_lim tx_desc_lim = {
+   .nb_max = IXGBE_MAX_RING_DESC,
+   .nb_min = IXGBE_MIN_RING_DESC,
+   .nb_align = IXGBE_TXD_ALIGN,
+};
+
 static const struct eth_dev_ops ixgbe_eth_dev_ops = {
.dev_configure= ixgbe_dev_configure,
.dev_start= ixgbe_dev_start,
@@ -456,6 +468,8 @@ static const struct eth_dev_ops ixgbe_eth_dev_ops = {
.rss_hash_conf_get= ixgbe_dev_rss_hash_conf_get,
.filter_ctrl  = ixgbe_dev_filter_ctrl,
.set_mc_addr_list = ixgbe_dev_set_mc_addr_list,
+   .rxq_info_get = ixgbe_rxq_info_get,
+   .txq_info_get = ixgbe_txq_info_get,
.timesync_enable  = ixgbe_timesync_enable,
.timesync_disable = ixgbe_timesync_disable,
.timesync_read_rx_timestamp = ixgbe_timesync_read_rx_timestamp,
@@ -494,6 +508,8 @@ static const struct eth_dev_ops ixgbevf_eth_dev_ops = {
.mac_addr_add = ixgbevf_add_mac_addr,
.mac_addr_remove  = ixgbevf_remove_mac_addr,
.set_mc_addr_list = ixgbe_dev_set_mc_addr_list,
+   .rxq_info_get = ixgbe_rxq_info_get,
+   .txq_info_get = ixgbe_txq_info_get,
.mac_addr_set = ixgbevf_set_default_mac_addr,
.get_reg_length   = ixgbevf_get_reg_length,
.get_reg  = ixgbevf_get_regs,
@@ -2396,6 +2412,10 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *dev_info)
.txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
ETH_TXQ_FLAGS_NOOFFLOADS,
};
+
+   dev_info->rx_desc_lim = rx_desc_lim;
+   dev_info->tx_desc_lim = tx_desc_lim;
+
dev_info->hash_key_size = IXGBE_HKEY_MAX_INDEX * sizeof(uint32_t);
dev_info->reta_size = ETH_RSS_RETA_SIZE_128;
dev_info->flow_type_rss_offloads = IXGBE_RSS_OFFLOAD_ALL;
@@ -2449,6 +2469,9 @@ ixgbevf_dev_info_get(struct rte_eth_dev *dev,
.txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
ETH_TXQ_FLAGS_NOOFFLOADS,
};
+
+   dev_info->rx_desc_lim = rx_desc_lim;
+   dev_info->tx_desc_lim = tx_desc_lim;
 }

 /* return 0 means link status changed, -1 means not changed */
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h
index c3d4f4f..d16f476 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.h
+++ b/drivers/net/ixgbe/ixgbe_ethdev.h
@@ -351,6 +351,12 @@ int ixgbe_dev_tx_queue_start(struct rte_eth_dev *dev, 
uint16_t tx_queue_id);

 int ixgbe_dev_tx_queue_stop(struct rte_eth_dev *dev, uint16_t tx_queue_id);

+void ixgbe_rxq_info_get(struct rte_eth_dev *dev, uint16_t queue_id,
+   struct rte_eth_rxq_info *qinfo);
+
+void ixgbe_txq_info_get(struct rte_eth_dev *dev, uint16_t queue_id,
+   struct rte_eth_txq_info *qinfo);
+
 int ixgbevf_dev_rx_init(struct rte_eth_dev *dev);

 void ixgbevf_dev_tx_init(struct rte_eth_dev *dev);
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index a598a72..ba08588 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -1821,25 +1821,6 @@ ixgbe_recv_pkts_lro_bulk_alloc(void *rx_queue, struct 
rte_mbuf **rx_pkts,
  **/

 /*
- * Rings setup and release.
- *
- * TDBA/RDBA should be aligned on 16 byte boundary. But TDLEN/RDLEN should be
- * multiple of 128 bytes. So we align TDBA/RDBA on 128 byte boundary. This will
- * also optimize cache line size effect. H/W supports up to cache line size 
128.
- */
-#define IXGBE_ALIGN 128
-
-/*
- * Maximum number of Ring Descriptors.
- *
- * Since RDLEN/TDLEN should be multiple of 128 bytes, the number of ring
- * descriptors should meet the following condition:
- *  (num_ring_desc * sizeof(rx/tx descriptor)) % 128 == 0
- */
-#define IXGBE_MIN_RING_DESC 32
-#define IXGBE_MAX_RING_DESC 4096
-
-/*
  * Create memzone for HW rings. malloc can't be used as the physical address is
  * needed. If the memzone is already created, then this function returns a ptr
  * to the old one.
@@ -2007,9 +1988,9 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
 * It must not exceed hardware maxi

[dpdk-dev] [PATCH v4 0/7] virtio ring layout optimization and simple rx/tx processing

2015-10-22 Thread Huawei Xie
Changes in v2:
- Remove the configure macro
- Enable simple RX/TX processing when the user specifies simple txq flags
- Reword some comments and commit messages

Changes in v3:
- Remove unnecessary NULL test for rte_free
- Remove unnecessary assign of local var after free
- Remove return at the end of void function
- Remove always_inline attribute for virtio_xmit_cleanup
- Reword some commit messages
- Add TODO in the commit message of simple tx patch

Changes in v4:
- Fix the error in virtio tx ring layout ascii chart in the commit message
- move virtio_xmit_cleanup ahead to free descriptors earlier
- Test the merge-able feature when selecting simple rx/tx functions

In a DPDK-based switching environment, vhost mostly runs on a dedicated core
while virtio processing in guest VMs runs on other cores.
Take RX for example: with the generic implementation, for each guest buffer,
a) the virtio driver allocates a descriptor from the free descriptor list
b) it modifies an entry of the avail ring to point to the allocated descriptor
c) after the packet is received, it frees the descriptor

When vhost fetches the avail ring, it needs to fetch the modified L1 cache
line from the virtio core, which is a heavy cost on current CPU
implementations.

The idea of this optimization is:
allocate a fixed descriptor for each entry of the avail ring, so the avail
ring always stays the same during the run.
This removes the L1 cache-line transfer from the virtio core to the vhost core
for the avail ring. (Note we couldn't avoid the cache transfer for the
descriptors themselves.)
Besides, descriptor allocation and free operations are eliminated.
This also makes vector processing possible, to further accelerate the
processing.

This is the layout of the avail ring (take 256 ring entries for example), with
each entry pointing to the descriptor with the same index.

    avail
    idx
     +
     |
+-----+-----+-----+-----+-----+-----+
|  0  |  1  |  2  | ... | 254 | 255 |   avail ring
+--+--+--+--+--+--+-----+--+--+--+--+
   |     |     |           |     |
   |     |     |           |     |
   v     v     v           v     v
+--+--+--+--+--+--+-----+--+--+--+--+
|  0  |  1  |  2  | ... | 254 | 255 |   desc ring
+-----+-----+-----+-----+-----+-----+
     |
     |
+-----+-----+-----+-----+-----+-----+
|  0  |  1  |  2  | ... | 254 | 255 |   used ring
+-----+-----+-----+-----+-----+-----+
     |
     +

This is the ring layout for TX.
As we need one virtio header for each xmit packet, we have 128 slots available.

                         ++
                         ||
                         ||
+-----+-----+-----+------++------+------+------+------+
|  0  |  1  | ... | 127  ||  128 |  129 |  ... |  255 |   avail ring
+--+--+--+--+-----+---+--++--+---+--+---+------+--+---+
   |     |            |  ||  |      |             |
   v     v            v  ||  v      v             v
+--+--+--+--+-----+---+--++--+---+--+---+------+--+---+
| 128 | 129 | ... | 255  ||  128 |  129 |  ... |  255 |   desc ring for virtio_net_hdr
+--+--+--+--+-----+---+--++--+---+--+---+------+--+---+
   |     |            |  ||  |      |             |
   v     v            v  ||  v      v             v
+--+--+--+--+-----+---+--++--+---+--+---+------+--+---+
|  0  |  1  | ... | 127  ||   0  |   1  |  ... |  127 |   desc ring for tx data
+-----+-----+-----+------++------+------+------+------+
                         ||
                         ||
                         ++
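
A minimal sketch of the fixed-descriptor RX setup described above, condensed
from patch 3/7 in this series (so only the names used there are assumed):

    for (i = 0; i < vq->vq_nentries; i++) {
            /* avail entry i permanently points at descriptor i ... */
            vq->vq_ring.avail->ring[i] = i;
            /* ... and each RX descriptor is device-writable */
            vq->vq_ring.desc[i].flags = VRING_DESC_F_WRITE;
    }

With this, the avail ring contents never change after setup; only the avail
index moves, which is what removes the cache-line ping-pong described above.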


A performance boost can be observed only if the virtio backend isn't the
bottleneck, or in the VM2VM case.
There are also several vhost optimization patches to be submitted later.

Huawei Xie (7):
  virtio: add virtio_rxtx.h header file
  virtio: add software rx ring, fake_buf into virtqueue
  virtio: rx/tx ring layout optimization
  virtio: fill RX avail ring with blank mbufs
  virtio: virtio vec rx
  virtio: simple tx routine
  virtio: choose simple rx/tx func

 drivers/net/virtio/Makefile |   2 +-
 drivers/net/virtio/virtio_ethdev.c  |  12 +-
 drivers/net/virtio/virtio_ethdev.h  |   5 +
 drivers/net/virtio/virtio_rxtx.c|  56 -
 drivers/net/virtio/virtio_rxtx.h|  39 
 drivers/net/virtio/virtio_rxtx_simple.c | 401 
 drivers/net/virtio/virtqueue.h  |   5 +
 7 files changed, 516 insertions(+), 4 deletions(-)
 create mode 100644 drivers/net/virtio/virtio_rxtx.h
 create mode 100644 drivers/net/virtio/virtio_rxtx_simple.c

-- 
1.8.1.4



[dpdk-dev] [PATCH v4 1/7] virtio: add virtio_rxtx.h header file

2015-10-22 Thread Huawei Xie
All rx/tx-related declarations will be moved into this header file in the
future.
Add RTE_PMD_VIRTIO_RX_MAX_BURST.

Signed-off-by: Huawei Xie 
---
 drivers/net/virtio/virtio_ethdev.c |  1 +
 drivers/net/virtio/virtio_rxtx.c   |  1 +
 drivers/net/virtio/virtio_rxtx.h   | 34 ++
 3 files changed, 36 insertions(+)
 create mode 100644 drivers/net/virtio/virtio_rxtx.h

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index 465d3cd..79a3640 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -61,6 +61,7 @@
 #include "virtio_pci.h"
 #include "virtio_logs.h"
 #include "virtqueue.h"
+#include "virtio_rxtx.h"


 static int eth_virtio_dev_init(struct rte_eth_dev *eth_dev);
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index c5b53bb..9324f7f 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -54,6 +54,7 @@
 #include "virtio_logs.h"
 #include "virtio_ethdev.h"
 #include "virtqueue.h"
+#include "virtio_rxtx.h"

 #ifdef RTE_LIBRTE_VIRTIO_DEBUG_DUMP
 #define VIRTIO_DUMP_PACKET(m, len) rte_pktmbuf_dump(stdout, m, len)
diff --git a/drivers/net/virtio/virtio_rxtx.h b/drivers/net/virtio/virtio_rxtx.h
new file mode 100644
index 000..a10aa69
--- /dev/null
+++ b/drivers/net/virtio/virtio_rxtx.h
@@ -0,0 +1,34 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#define RTE_PMD_VIRTIO_RX_MAX_BURST 64
-- 
1.8.1.4



[dpdk-dev] [PATCH v4 3/7] virtio: rx/tx ring layout optimization

2015-10-22 Thread Huawei Xie
Changes in V4:
- fix the error in tx ring layout chart in this commit message.

In a DPDK-based switching environment, vhost mostly runs on a dedicated core
while virtio processing in guest VMs runs on different cores.
Take RX for example: with the generic implementation, for each guest buffer,
a) the virtio driver allocates a descriptor from the free descriptor list
b) it modifies an entry of the avail ring to point to the allocated descriptor
c) after the packet is received, it frees the descriptor

When vhost fetches the avail ring, it needs to fetch the modified L1 cache
line from the virtio core, which is a heavy cost on current CPU
implementations.

The idea of this optimization is:
allocate a fixed descriptor for each entry of the avail ring, so the avail
ring always stays the same during the run.
This removes the L1 cache-line transfer from the virtio core to the vhost core
for the avail ring. (Note we couldn't avoid the cache transfer for the
descriptors themselves.)
Besides, descriptor allocation and free operations are eliminated.
This also makes vector processing possible, to further accelerate the
processing.

This is the layout of the avail ring (take 256 ring entries for example), with
each entry pointing to the descriptor with the same index.

    avail
    idx
     +
     |
+-----+-----+-----+-----+-----+-----+
|  0  |  1  |  2  | ... | 254 | 255 |   avail ring
+--+--+--+--+--+--+-----+--+--+--+--+
   |     |     |           |     |
   |     |     |           |     |
   v     v     v           v     v
+--+--+--+--+--+--+-----+--+--+--+--+
|  0  |  1  |  2  | ... | 254 | 255 |   desc ring
+-----+-----+-----+-----+-----+-----+
     |
     |
+-----+-----+-----+-----+-----+-----+
|  0  |  1  |  2  | ... | 254 | 255 |   used ring
+-----+-----+-----+-----+-----+-----+
     |
     +

This is the ring layout for TX.
As we need one virtio header for each xmit packet, we have 128 slots available.

                         ++
                         ||
                         ||
+-----+-----+-----+------++------+------+------+------+
|  0  |  1  | ... | 127  ||  128 |  129 |  ... |  255 |   avail ring
+--+--+--+--+-----+---+--++--+---+--+---+------+--+---+
   |     |            |  ||  |      |             |
   v     v            v  ||  v      v             v
+--+--+--+--+-----+---+--++--+---+--+---+------+--+---+
| 128 | 129 | ... | 255  ||  128 |  129 |  ... |  255 |   desc ring for virtio_net_hdr
+--+--+--+--+-----+---+--++--+---+--+---+------+--+---+
   |     |            |  ||  |      |             |
   v     v            v  ||  v      v             v
+--+--+--+--+-----+---+--++--+---+--+---+------+--+---+
|  0  |  1  | ... | 127  ||   0  |   1  |  ... |  127 |   desc ring for tx data
+-----+-----+-----+------++------+------+------+------+
                         ||
                         ||
                         ++

Signed-off-by: Huawei Xie 
---
 drivers/net/virtio/virtio_rxtx.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 5c00e9d..7c82a6a 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -302,6 +302,12 @@ virtio_dev_vring_start(struct virtqueue *vq, int 
queue_type)
nbufs = 0;
error = ENOSPC;

+   if (use_simple_rxtx)
+   for (i = 0; i < vq->vq_nentries; i++) {
+   vq->vq_ring.avail->ring[i] = i;
+   vq->vq_ring.desc[i].flags = VRING_DESC_F_WRITE;
+   }
+
memset(&vq->fake_mbuf, 0, sizeof(vq->fake_mbuf));
for (i = 0; i < RTE_PMD_VIRTIO_RX_MAX_BURST; i++)
vq->sw_ring[vq->vq_nentries + i] = &vq->fake_mbuf;
@@ -332,6 +338,24 @@ virtio_dev_vring_start(struct virtqueue *vq, int 
queue_type)
VIRTIO_WRITE_REG_4(vq->hw, VIRTIO_PCI_QUEUE_PFN,
vq->mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
} else if (queue_type == VTNET_TQ) {
+   if (use_simple_rxtx) {
+   int mid_idx  = vq->vq_nentries >> 1;
+   for (i = 0; i < mid_idx; i++) {
+   vq->vq_ring.avail->ring[i] = i + mid_idx;
+   vq->vq_ring.desc[i + mid_idx].next = i;
+   vq->vq_ring.desc[i + mid_idx].addr =
+   vq->virtio_net_hdr_mem +
+   mid_idx * vq->hw->vtnet_hdr_size;
+   vq->vq_ring.desc[i + mid_idx].len =
+   vq->hw->vtnet_hdr_size;
+   vq->vq_ring.desc[i + mid_idx].flags =
+   VRING_DESC_F_NEXT;
+   vq->vq_ring.desc[i].flags = 0;
+   }
+

[dpdk-dev] [PATCH v4 2/7] virtio: add software rx ring, fake_buf into virtqueue

2015-10-22 Thread Huawei Xie
Changes in v3:
- Remove unnecessary NULL test for rte_free
- Remove unnecessary assign of local var vq after free

Add software RX ring in virtqueue.
Add fake_mbuf in virtqueue for wraparound processing.
Use global simple_rxtx to indicate whether simple rxtx is enabled

Signed-off-by: Huawei Xie 
---
 drivers/net/virtio/virtio_ethdev.c | 11 ++-
 drivers/net/virtio/virtio_rxtx.c   |  7 +++
 drivers/net/virtio/virtqueue.h |  4 
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index 79a3640..82676d3 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -247,8 +247,8 @@ virtio_dev_queue_release(struct virtqueue *vq) {
VIRTIO_WRITE_REG_2(hw, VIRTIO_PCI_QUEUE_SEL, vq->queue_id);
VIRTIO_WRITE_REG_4(hw, VIRTIO_PCI_QUEUE_PFN, 0);

+   rte_free(vq->sw_ring);
rte_free(vq);
-   vq = NULL;
}
 }

@@ -292,6 +292,9 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
dev->data->port_id, queue_idx);
vq = rte_zmalloc(vq_name, sizeof(struct virtqueue) +
vq_size * sizeof(struct vq_desc_extra), 
RTE_CACHE_LINE_SIZE);
+   vq->sw_ring = rte_zmalloc_socket("rxq->sw_ring",
+   (RTE_PMD_VIRTIO_RX_MAX_BURST + vq_size) *
+   sizeof(vq->sw_ring[0]), RTE_CACHE_LINE_SIZE, socket_id);
} else if (queue_type == VTNET_TQ) {
snprintf(vq_name, sizeof(vq_name), "port%d_tvq%d",
dev->data->port_id, queue_idx);
@@ -308,6 +311,12 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
PMD_INIT_LOG(ERR, "%s: Can not allocate virtqueue", __func__);
return (-ENOMEM);
}
+   if (queue_type == VTNET_RQ && vq->sw_ring == NULL) {
+   PMD_INIT_LOG(ERR, "%s: Can not allocate RX soft ring",
+   __func__);
+   rte_free(vq);
+   return -ENOMEM;
+   }

vq->hw = hw;
vq->port_id = dev->data->port_id;
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 9324f7f..5c00e9d 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -62,6 +62,8 @@
 #define  VIRTIO_DUMP_PACKET(m, len) do { } while (0)
 #endif

+static int use_simple_rxtx;
+
 static void
 vq_ring_free_chain(struct virtqueue *vq, uint16_t desc_idx)
 {
@@ -299,6 +301,11 @@ virtio_dev_vring_start(struct virtqueue *vq, int 
queue_type)
/* Allocate blank mbufs for the each rx descriptor */
nbufs = 0;
error = ENOSPC;
+
+   memset(&vq->fake_mbuf, 0, sizeof(vq->fake_mbuf));
+   for (i = 0; i < RTE_PMD_VIRTIO_RX_MAX_BURST; i++)
+   vq->sw_ring[vq->vq_nentries + i] = &vq->fake_mbuf;
+
while (!virtqueue_full(vq)) {
m = rte_rxmbuf_alloc(vq->mpool);
if (m == NULL)
diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
index 7789411..6a1ec48 100644
--- a/drivers/net/virtio/virtqueue.h
+++ b/drivers/net/virtio/virtqueue.h
@@ -190,6 +190,10 @@ struct virtqueue {
uint16_t vq_avail_idx;
phys_addr_t virtio_net_hdr_mem; /**< hdr for each xmit packet */

+   struct rte_mbuf **sw_ring; /**< RX software ring. */
+   /* dummy mbuf, for wraparound when processing RX ring. */
+   struct rte_mbuf fake_mbuf;
+
/* Statistics */
uint64_tpackets;
uint64_tbytes;
-- 
1.8.1.4
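
Why the sw_ring tail padding helps: a vectorized RX burst may read up to
RTE_PMD_VIRTIO_RX_MAX_BURST entries past the ring end without a wraparound
branch, and those extra slots all point at the dummy fake_mbuf. A minimal
sketch using the names from this patch (gather_burst() itself is a
hypothetical helper for illustration):

    static void
    gather_burst(struct virtqueue *vq, struct rte_mbuf **out,
            uint16_t idx, uint16_t n)
    {
            uint16_t i;

            /* idx + i may exceed vq->vq_nentries by up to
             * RTE_PMD_VIRTIO_RX_MAX_BURST - 1; such reads land on
             * &vq->fake_mbuf instead of stray memory. */
            for (i = 0; i < n; i++)
                    out[i] = vq->sw_ring[idx + i];
    }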



[dpdk-dev] [PATCH v4 4/7] virtio: fill RX avail ring with blank mbufs

2015-10-22 Thread Huawei Xie
Fill the avail ring with blank mbufs in virtio_dev_vring_start.

Signed-off-by: Huawei Xie 
---
 drivers/net/virtio/Makefile |  2 +-
 drivers/net/virtio/virtio_rxtx.c|  6 ++-
 drivers/net/virtio/virtio_rxtx.h|  3 ++
 drivers/net/virtio/virtio_rxtx_simple.c | 84 +
 4 files changed, 92 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/virtio/virtio_rxtx_simple.c

diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
index 930b60f..43835ba 100644
--- a/drivers/net/virtio/Makefile
+++ b/drivers/net/virtio/Makefile
@@ -50,7 +50,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtqueue.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_pci.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_ethdev.c
-
+SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple.c

 # this lib depends upon:
 DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_eal lib/librte_ether
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 7c82a6a..5162ce6 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -320,8 +320,10 @@ virtio_dev_vring_start(struct virtqueue *vq, int 
queue_type)
/**
* Enqueue allocated buffers*
***/
-   error = virtqueue_enqueue_recv_refill(vq, m);
-
+   if (use_simple_rxtx)
+   error = virtqueue_enqueue_recv_refill_simple(vq, m);
+   else
+   error = virtqueue_enqueue_recv_refill(vq, m);
if (error) {
rte_pktmbuf_free(m);
break;
diff --git a/drivers/net/virtio/virtio_rxtx.h b/drivers/net/virtio/virtio_rxtx.h
index a10aa69..7d2d8fe 100644
--- a/drivers/net/virtio/virtio_rxtx.h
+++ b/drivers/net/virtio/virtio_rxtx.h
@@ -32,3 +32,6 @@
  */

 #define RTE_PMD_VIRTIO_RX_MAX_BURST 64
+
+int virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
+   struct rte_mbuf *m);
diff --git a/drivers/net/virtio/virtio_rxtx_simple.c 
b/drivers/net/virtio/virtio_rxtx_simple.c
new file mode 100644
index 000..cac5b9f
--- /dev/null
+++ b/drivers/net/virtio/virtio_rxtx_simple.c
@@ -0,0 +1,84 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "virtio_logs.h"
+#include "virtio_ethdev.h"
+#include "virtqueue.h"
+#include "virtio_rxtx.h"
+
+int __attribute__((cold))
+virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
+   struct rte_mbuf *cookie)
+{
+   struct vq_desc_extra *dxp;
+   struct vring_desc *start_dp;
+   uint16_t desc_idx;
+
+   desc_idx = vq->vq_avail_idx & (vq->vq_nentries - 1);
+   dxp = &vq->vq_descx[desc_idx];
+   dxp->cookie = (void *)cookie;
+   vq->sw_ring[desc_idx] = cookie;
+
+   start_dp = vq->vq_ring.desc;
+   start_dp[desc_idx
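
The "& (vq->vq_nentries - 1)" above relies on virtio ring sizes being powers
of two, so the mask is a cheap modulo. A small standalone sketch of that
equivalence (the helper is illustrative only):

    #include <assert.h>
    #include <stdint.h>

    static uint16_t
    ring_slot(uint16_t avail_idx, uint16_t nentries)
    {
            assert((nentries & (nentries - 1)) == 0); /* power of two */
            return avail_idx & (nentries - 1);        /* == avail_idx % nentries */
    }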

[dpdk-dev] [PATCH v4 5/7] virtio: virtio vec rx

2015-10-22 Thread Huawei Xie
With the fixed avail ring, we don't need to get the desc idx from the avail
ring; the virtio driver only has to deal with the desc ring.
This patch uses vector instructions to accelerate processing of the desc ring.

Signed-off-by: Huawei Xie 
---
 drivers/net/virtio/virtio_ethdev.h  |   2 +
 drivers/net/virtio/virtio_rxtx.c|   3 +
 drivers/net/virtio/virtio_rxtx.h|   2 +
 drivers/net/virtio/virtio_rxtx_simple.c | 224 
 drivers/net/virtio/virtqueue.h  |   1 +
 5 files changed, 232 insertions(+)

diff --git a/drivers/net/virtio/virtio_ethdev.h 
b/drivers/net/virtio/virtio_ethdev.h
index 9026d42..d7797ab 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -108,6 +108,8 @@ uint16_t virtio_recv_mergeable_pkts(void *rx_queue, struct 
rte_mbuf **rx_pkts,
 uint16_t virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
uint16_t nb_pkts);

+uint16_t virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
+   uint16_t nb_pkts);

 /*
  * The VIRTIO_NET_F_GUEST_TSO[46] features permit the host to send us
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 5162ce6..947fc46 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -432,6 +432,9 @@ virtio_dev_rx_queue_setup(struct rte_eth_dev *dev,
vq->mpool = mp;

dev->data->rx_queues[queue_idx] = vq;
+
+   virtio_rxq_vec_setup(vq);
+
return 0;
 }

diff --git a/drivers/net/virtio/virtio_rxtx.h b/drivers/net/virtio/virtio_rxtx.h
index 7d2d8fe..831e492 100644
--- a/drivers/net/virtio/virtio_rxtx.h
+++ b/drivers/net/virtio/virtio_rxtx.h
@@ -33,5 +33,7 @@

 #define RTE_PMD_VIRTIO_RX_MAX_BURST 64

+int virtio_rxq_vec_setup(struct virtqueue *rxq);
+
 int virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
struct rte_mbuf *m);
diff --git a/drivers/net/virtio/virtio_rxtx_simple.c 
b/drivers/net/virtio/virtio_rxtx_simple.c
index cac5b9f..ef17562 100644
--- a/drivers/net/virtio/virtio_rxtx_simple.c
+++ b/drivers/net/virtio/virtio_rxtx_simple.c
@@ -58,6 +58,10 @@
 #include "virtqueue.h"
 #include "virtio_rxtx.h"

+#define RTE_VIRTIO_VPMD_RX_BURST 32
+#define RTE_VIRTIO_DESC_PER_LOOP 8
+#define RTE_VIRTIO_VPMD_RX_REARM_THRESH RTE_VIRTIO_VPMD_RX_BURST
+
 int __attribute__((cold))
 virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
struct rte_mbuf *cookie)
@@ -82,3 +86,223 @@ virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,

return 0;
 }
+
+static inline void
+virtio_rxq_rearm_vec(struct virtqueue *rxvq)
+{
+   int i;
+   uint16_t desc_idx;
+   struct rte_mbuf **sw_ring;
+   struct vring_desc *start_dp;
+   int ret;
+
+   desc_idx = rxvq->vq_avail_idx & (rxvq->vq_nentries - 1);
+   sw_ring = &rxvq->sw_ring[desc_idx];
+   start_dp = &rxvq->vq_ring.desc[desc_idx];
+
+   ret = rte_mempool_get_bulk(rxvq->mpool, (void **)sw_ring,
+   RTE_VIRTIO_VPMD_RX_REARM_THRESH);
+   if (unlikely(ret)) {
+   rte_eth_devices[rxvq->port_id].data->rx_mbuf_alloc_failed +=
+   RTE_VIRTIO_VPMD_RX_REARM_THRESH;
+   return;
+   }
+
+   for (i = 0; i < RTE_VIRTIO_VPMD_RX_REARM_THRESH; i++) {
+   uintptr_t p;
+
+   p = (uintptr_t)&sw_ring[i]->rearm_data;
+   *(uint64_t *)p = rxvq->mbuf_initializer;
+
+   start_dp[i].addr =
+   (uint64_t)((uintptr_t)sw_ring[i]->buf_physaddr +
+   RTE_PKTMBUF_HEADROOM - sizeof(struct virtio_net_hdr));
+   start_dp[i].len = sw_ring[i]->buf_len -
+   RTE_PKTMBUF_HEADROOM + sizeof(struct virtio_net_hdr);
+   }
+
+   rxvq->vq_avail_idx += RTE_VIRTIO_VPMD_RX_REARM_THRESH;
+   rxvq->vq_free_cnt -= RTE_VIRTIO_VPMD_RX_REARM_THRESH;
+   vq_update_avail_idx(rxvq);
+}
+
+/* virtio vPMD receive routine, only accepts nb_pkts >= RTE_VIRTIO_DESC_PER_LOOP
+ *
+ * This routine is for non-mergeable RX, one desc for each guest buffer.
+ * This routine is based on the RX ring layout optimization. Each entry in the
+ * avail ring points to the desc with the same index in the desc ring and this
+ * will never be changed in the driver.
+ *
+ * - nb_pkts < RTE_VIRTIO_DESC_PER_LOOP, just return no packet
+ */
+uint16_t
+virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
+   uint16_t nb_pkts)
+{
+   struct virtqueue *rxvq = rx_queue;
+   uint16_t nb_used;
+   uint16_t desc_idx;
+   struct vring_used_elem *rused;
+   struct rte_mbuf **sw_ring;
+   struct rte_mbuf **sw_ring_end;
+   uint16_t nb_pkts_received;
+   __m128i shuf_msk1, shuf_msk2, len_adjust;
+
+   shuf_msk1 = _mm_set_epi8(
+   0xFF, 0xFF, 0xFF, 0xFF,
+   0xFF, 0xFF, /* vlan tci */
+   5, 4,   /* dat len */
+   0xFF, 0xFF, 5, 4,

[dpdk-dev] [PATCH v4 6/7] virtio: simple tx routine

2015-10-22 Thread Huawei Xie
Changes in v4:
- move virtio_xmit_cleanup ahead to free descriptors earlier

Changes in v3:
- Remove return at the end of void function
- Remove always_inline attribute for virtio_xmit_cleanup
Bulk-free mbufs when cleaning the used ring.
The shift operation on idx could be saved if vq_free_cnt meant
free slots rather than free descriptors.

TODO: rearrange the vq data structure, packing the stats variables together so
that we could use one vector instruction to update all of them.

Signed-off-by: Huawei Xie 
---
 drivers/net/virtio/virtio_ethdev.h  |  3 ++
 drivers/net/virtio/virtio_rxtx_simple.c | 93 +
 2 files changed, 96 insertions(+)

diff --git a/drivers/net/virtio/virtio_ethdev.h 
b/drivers/net/virtio/virtio_ethdev.h
index d7797ab..ae2d47d 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -111,6 +111,9 @@ uint16_t virtio_xmit_pkts(void *tx_queue, struct rte_mbuf 
**tx_pkts,
 uint16_t virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
uint16_t nb_pkts);

+uint16_t virtio_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
+   uint16_t nb_pkts);
+
 /*
  * The VIRTIO_NET_F_GUEST_TSO[46] features permit the host to send us
  * frames larger than 1514 bytes. We do not yet support software LRO
diff --git a/drivers/net/virtio/virtio_rxtx_simple.c 
b/drivers/net/virtio/virtio_rxtx_simple.c
index ef17562..79b4f7f 100644
--- a/drivers/net/virtio/virtio_rxtx_simple.c
+++ b/drivers/net/virtio/virtio_rxtx_simple.c
@@ -288,6 +288,99 @@ virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf 
**rx_pkts,
return nb_pkts_received;
 }

+#define VIRTIO_TX_FREE_THRESH 32
+#define VIRTIO_TX_MAX_FREE_BUF_SZ 32
+#define VIRTIO_TX_FREE_NR 32
+/* TODO: vq->tx_free_cnt could mean num of free slots so we could avoid shift 
*/
+static inline void
+virtio_xmit_cleanup(struct virtqueue *vq)
+{
+   uint16_t i, desc_idx;
+   int nb_free = 0;
+   struct rte_mbuf *m, *free[VIRTIO_TX_MAX_FREE_BUF_SZ];
+
+   desc_idx = (uint16_t)(vq->vq_used_cons_idx &
+   ((vq->vq_nentries >> 1) - 1));
+   free[0] = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
+   nb_free = 1;
+
+   for (i = 1; i < VIRTIO_TX_FREE_NR; i++) {
+   m = (struct rte_mbuf *)vq->vq_descx[desc_idx++].cookie;
+   if (likely(m->pool == free[0]->pool))
+   free[nb_free++] = m;
+   else {
+   rte_mempool_put_bulk(free[0]->pool, (void **)free,
+   nb_free);
+   free[0] = m;
+   nb_free = 1;
+   }
+   }
+
+   rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
+   vq->vq_used_cons_idx += VIRTIO_TX_FREE_NR;
+   vq->vq_free_cnt += (VIRTIO_TX_FREE_NR << 1);
+}
+
+uint16_t
+virtio_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
+   uint16_t nb_pkts)
+{
+   struct virtqueue *txvq = tx_queue;
+   uint16_t nb_used;
+   uint16_t desc_idx;
+   struct vring_desc *start_dp;
+   uint16_t nb_tail, nb_commit;
+   int i;
+   uint16_t desc_idx_max = (txvq->vq_nentries >> 1) - 1;
+
+   nb_used = VIRTQUEUE_NUSED(txvq);
+   rte_compiler_barrier();
+
+   if (nb_used >= VIRTIO_TX_FREE_THRESH)
+   virtio_xmit_cleanup(tx_queue);
+
+   nb_commit = nb_pkts = RTE_MIN((txvq->vq_free_cnt >> 1), nb_pkts);
+   desc_idx = (uint16_t) (txvq->vq_avail_idx & desc_idx_max);
+   start_dp = txvq->vq_ring.desc;
+   nb_tail = (uint16_t) (desc_idx_max + 1 - desc_idx);
+
+   if (nb_commit >= nb_tail) {
+   for (i = 0; i < nb_tail; i++)
+   txvq->vq_descx[desc_idx + i].cookie = tx_pkts[i];
+   for (i = 0; i < nb_tail; i++) {
+   start_dp[desc_idx].addr =
+   RTE_MBUF_DATA_DMA_ADDR(*tx_pkts);
+   start_dp[desc_idx].len = (*tx_pkts)->pkt_len;
+   tx_pkts++;
+   desc_idx++;
+   }
+   nb_commit -= nb_tail;
+   desc_idx = 0;
+   }
+   for (i = 0; i < nb_commit; i++)
+   txvq->vq_descx[desc_idx + i].cookie = tx_pkts[i];
+   for (i = 0; i < nb_commit; i++) {
+   start_dp[desc_idx].addr = RTE_MBUF_DATA_DMA_ADDR(*tx_pkts);
+   start_dp[desc_idx].len = (*tx_pkts)->pkt_len;
+   tx_pkts++;
+   desc_idx++;
+   }
+
+   rte_compiler_barrier();
+
+   txvq->vq_free_cnt -= (uint16_t)(nb_pkts << 1);
+   txvq->vq_avail_idx += nb_pkts;
+   txvq->vq_ring.avail->idx = txvq->vq_avail_idx;
+   txvq->packets += nb_pkts;
+
+   if (likely(nb_pkts)) {
+   if (unlikely(virtqueue_kick_prepare(txvq)))
+   virtqueue_notify(txvq);
+   }
+
+   return nb_pkts;
+}
+
 int __attribute__((cold))
 virtio_rxq_vec_setup(struct vir
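
The ">> 1" arithmetic in the simple TX path above follows from the split ring
layout of patch 3/7: every packet consumes exactly two descriptors, one fixed
virtio_net_hdr slot and one data slot. A one-line illustrative helper:

    static uint16_t
    simple_tx_free_slots(const struct virtqueue *txvq)
    {
            return txvq->vq_free_cnt >> 1; /* descriptors -> packet slots */
    }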

[dpdk-dev] [PATCH v4 7/7] virtio: pick simple rx/tx func

2015-10-22 Thread Huawei Xie
Changes in v4:
Check the merge-able feature when selecting the simple rx/tx functions.

The simple rx/tx functions are chosen when merge-able rx is disabled and the
user specifies single-segment and no-offload support.

Signed-off-by: Huawei Xie 
---
 drivers/net/virtio/virtio_rxtx.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 947fc46..0f1daf2 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -53,6 +53,7 @@

 #include "virtio_logs.h"
 #include "virtio_ethdev.h"
+#include "virtio_pci.h"
 #include "virtqueue.h"
 #include "virtio_rxtx.h"

@@ -62,6 +63,10 @@
 #define  VIRTIO_DUMP_PACKET(m, len) do { } while (0)
 #endif

+
+#define VIRTIO_SIMPLE_FLAGS ((uint32_t)ETH_TXQ_FLAGS_NOMULTSEGS | \
+   ETH_TXQ_FLAGS_NOOFFLOADS)
+
 static int use_simple_rxtx;

 static void
@@ -459,6 +464,7 @@ virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,
const struct rte_eth_txconf *tx_conf)
 {
uint8_t vtpci_queue_idx = 2 * queue_idx + VTNET_SQ_TQ_QUEUE_IDX;
+   struct virtio_hw *hw = dev->data->dev_private;
struct virtqueue *vq;
uint16_t tx_free_thresh;
int ret;
@@ -471,6 +477,15 @@ virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,
return -EINVAL;
}

+   /* Use simple rx/tx func if single segment and no offloads */
+   if ((tx_conf->txq_flags & VIRTIO_SIMPLE_FLAGS) == VIRTIO_SIMPLE_FLAGS &&
+!vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF)) {
+   PMD_INIT_LOG(INFO, "Using simple rx/tx path");
+   dev->tx_pkt_burst = virtio_xmit_pkts_simple;
+   dev->rx_pkt_burst = virtio_recv_pkts_vec;
+   use_simple_rxtx = 1;
+   }
+
ret = virtio_dev_queue_setup(dev, VTNET_TQ, queue_idx, vtpci_queue_idx,
nb_desc, socket_id, &vq);
if (ret < 0) {
-- 
1.8.1.4
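
For reference, an illustrative application-side configuration that lets the
PMD logic above pick the simple path (assuming merge-able RX is not
negotiated; port_id and nb_txd are placeholders):

    struct rte_eth_txconf txconf = {
            .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS | ETH_TXQ_FLAGS_NOOFFLOADS,
    };

    rte_eth_tx_queue_setup(port_id, 0, nb_txd, rte_socket_id(), &txconf);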



[dpdk-dev] [PATCH v2 0/5] virtio: Tx performance improvements

2015-10-22 Thread Xie, Huawei
On 10/22/2015 6:39 PM, Xie, Huawei wrote:
> On 10/21/2015 9:20 PM, Thomas Monjalon wrote:
>> 2015-10-18 22:16, Stephen Hemminger:
>>> This is a tested version of the virtio Tx performance improvements
>>> that I posted earlier on the list, and described at the DPDK Userspace
>>> meeting in Dublin. Together they get a 25% performance improvement for
>>> both small packet and large multi-segment packet case when testing
>>> from DPDK guest application to Linux KVM host.
>>>
>>> Stephen Hemminger (5):
>>>   virtio: clean up space checks on xmit
>>>   virtio: don't use unlikely for normal tx stuff
>>>   virtio: use indirect ring elements
>>>   virtio: use any layout on transmit
>>>   virtio: optimize transmit enqueue
>> Huawei, do you ack this series?
>>
Okay with this patchset, with two remaining questions.
Forgot to cc Stephen.
>
> +/* Region reserved to allow for transmit header and indirect ring */
> +#define VIRTIO_MAX_TX_INDIRECT 8
> +struct virtio_tx_region {
> + struct virtio_net_hdr_mrg_rxbuf tx_hdr;
>
> Why use merge-able rx header here in the tx region?
>
>> +struct vring_desc tx_indir[VIRTIO_MAX_TX_INDIRECT]
>> +   __attribute__((__aligned__(16)));
> WARNING: __aligned(size) is preferred over __attribute__((aligned(size)))
> [...]
>
>
>
>



[dpdk-dev] [PATCH v8 0/8] vhost-user multiple queues enabling

2015-10-22 Thread Yuanhan Liu
This patch set enables vhost-user multiple queue feature.


v8:

   - put the SET_VRING_ENABLE() patch before the patch that actually
 enables mq, since that makes more sense.

   - don't change the kickfd reset behavior for patch 3

   - move virt_queue field to the end of virtio_net struct.

   - comment and type fixes


v7:

   - Removed vhost-user mq examples in this patch set

 Because the example leverages the hardware VMDq feature to
 demonstrate the mq feature, which introduces too many
 limitations, and it turned out to be inelegant.

   - Commit log fixes

   - Dropped the patch to fix RESET_OWNER handling, as I found
 Jerome's solution works as well, and it makes more sense to
 me:

 http://dpdk.org/dev/patchwork/project/dpdk/list/?submitter=354



Overview


It depends on some QEMU patches that have already been merged upstream.
Those qemu patches introduce some new vhost-user messages for vhost-user
mq enabling negotiation. Here are the main negotiation steps (Qemu
as master, and DPDK vhost-user as slave):

- Master queries features by VHOST_USER_GET_FEATURES from slave

- Check if VHOST_USER_F_PROTOCOL_FEATURES exist. If not, mq is not
  supported. (check patch 1 for why VHOST_USER_F_PROTOCOL_FEATURES
  is introduced)

- Master then sends another command, VHOST_USER_GET_QUEUE_NUM, for
  querying how many queues the slave supports.

  Master will compare the result with the requested queue number.
  Qemu exits if the former is smaller.

- Master then tries to initialize all queue pairs by sending some vhost-user
  commands, including VHOST_USER_SET_VRING_CALL, which will trigger the
  slave to do the related vring setup, such as vring allocation.


By now, all necessary initialization and negotiation are done, and the master
can send another message, VHOST_USER_SET_VRING_ENABLE, to enable/disable
a specific queue dynamically later.
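
A sketch of the master-side queue-number check described above (hypothetical
names; vhost_user_get_u64() stands in for qemu's actual message helper):

    uint64_t slave_queues = vhost_user_get_u64(VHOST_USER_GET_QUEUE_NUM);

    if (requested_queue_pairs > slave_queues)
            exit(EXIT_FAILURE); /* qemu exits if the slave supports fewer */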


Patchset


Patches 1-6 are all preparatory work for enabling mq; they are all atomic
changes, made with "do not break anything" borne in mind.

Patch 7 actually enables the mq feature by setting two key feature flags.


Test with OVS
=

Marcel created a simple yet quite clear test guide with OVS at:

   http://wiki.qemu.org/Features/vhost-user-ovs-dpdk




---
Changchun Ouyang (3):
  vhost: rxtx: use queue id instead of constant ring index
  virtio: fix deadloop due to reading virtio_net_config incorrectly
  vhost: add VHOST_USER_SET_VRING_ENABLE message

Yuanhan Liu (5):
  vhost-user: add protocol features support
  vhost-user: add VHOST_USER_GET_QUEUE_NUM message
  vhost: vring queue setup for multiple queue support
  vhost-user: enable vhost-user multiple queue
  doc: update release note for vhost-user mq support

 doc/guides/rel_notes/release_2_2.rst  |   4 +
 drivers/net/virtio/virtio_ethdev.c|  16 ++-
 lib/librte_vhost/rte_virtio_net.h |  13 +-
 lib/librte_vhost/vhost_rxtx.c |  53 +---
 lib/librte_vhost/vhost_user/vhost-net-user.c  |  25 +++-
 lib/librte_vhost/vhost_user/vhost-net-user.h  |   4 +
 lib/librte_vhost/vhost_user/virtio-net-user.c |  86 ++---
 lib/librte_vhost/vhost_user/virtio-net-user.h |  10 ++
 lib/librte_vhost/virtio-net.c | 168 --
 9 files changed, 275 insertions(+), 104 deletions(-)

-- 
1.9.0



[dpdk-dev] [PATCH v8 1/8] vhost-user: add protocol features support

2015-10-22 Thread Yuanhan Liu
The two protocol-features messages were introduced by the qemu vhost
maintainer (Michael) for extending the vhost-user interface. Here is
an excerpt from the vhost-user spec:

Any protocol extensions are gated by protocol feature bits,
which allows full backwards compatibility on both master
and slave.

The vhost-user multiple queue feature will be treated as a vhost-user
extension; hence, we have to implement the two messages first.

VHOST_USER_PROTOCOL_FEATURES is initialized to 0, as we don't support
any yet.

Signed-off-by: Yuanhan Liu 
Acked-by: Flavio Leitner 
---
 lib/librte_vhost/rte_virtio_net.h |  1 +
 lib/librte_vhost/vhost_user/vhost-net-user.c  | 13 -
 lib/librte_vhost/vhost_user/vhost-net-user.h  |  2 ++
 lib/librte_vhost/vhost_user/virtio-net-user.c | 13 +
 lib/librte_vhost/vhost_user/virtio-net-user.h |  5 +
 lib/librte_vhost/virtio-net.c |  5 -
 6 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/lib/librte_vhost/rte_virtio_net.h 
b/lib/librte_vhost/rte_virtio_net.h
index a037c15..e3a21e5 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -99,6 +99,7 @@ struct virtio_net {
struct vhost_virtqueue  *virtqueue[VIRTIO_QNUM];/**< Contains 
all virtqueue information. */
struct virtio_memory*mem;   /**< QEMU memory and memory 
region information. */
uint64_tfeatures;   /**< Negotiated feature set. */
+   uint64_tprotocol_features;  /**< Negotiated 
protocol feature set. */
uint64_tdevice_fh;  /**< device identifier. */
uint32_tflags;  /**< Device flags. Only used to 
check if device is running on data core. */
 #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c 
b/lib/librte_vhost/vhost_user/vhost-net-user.c
index d1f8877..bc2ad24 100644
--- a/lib/librte_vhost/vhost_user/vhost-net-user.c
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
@@ -95,7 +95,9 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
[VHOST_USER_GET_VRING_BASE] = "VHOST_USER_GET_VRING_BASE",
[VHOST_USER_SET_VRING_KICK] = "VHOST_USER_SET_VRING_KICK",
[VHOST_USER_SET_VRING_CALL] = "VHOST_USER_SET_VRING_CALL",
-   [VHOST_USER_SET_VRING_ERR]  = "VHOST_USER_SET_VRING_ERR"
+   [VHOST_USER_SET_VRING_ERR]  = "VHOST_USER_SET_VRING_ERR",
+   [VHOST_USER_GET_PROTOCOL_FEATURES]  = "VHOST_USER_GET_PROTOCOL_FEATURES",
+   [VHOST_USER_SET_PROTOCOL_FEATURES]  = "VHOST_USER_SET_PROTOCOL_FEATURES",
 };

 /**
@@ -363,6 +365,15 @@ vserver_message_handler(int connfd, void *dat, int *remove)
ops->set_features(ctx, &features);
break;

+   case VHOST_USER_GET_PROTOCOL_FEATURES:
+   msg.payload.u64 = VHOST_USER_PROTOCOL_FEATURES;
+   msg.size = sizeof(msg.payload.u64);
+   send_vhost_message(connfd, &msg);
+   break;
+   case VHOST_USER_SET_PROTOCOL_FEATURES:
+   user_set_protocol_features(ctx, msg.payload.u64);
+   break;
+
case VHOST_USER_SET_OWNER:
ops->set_owner(ctx);
break;
diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.h 
b/lib/librte_vhost/vhost_user/vhost-net-user.h
index 2e72f3c..4490d23 100644
--- a/lib/librte_vhost/vhost_user/vhost-net-user.h
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.h
@@ -63,6 +63,8 @@ typedef enum VhostUserRequest {
VHOST_USER_SET_VRING_KICK = 12,
VHOST_USER_SET_VRING_CALL = 13,
VHOST_USER_SET_VRING_ERR = 14,
+   VHOST_USER_GET_PROTOCOL_FEATURES = 15,
+   VHOST_USER_SET_PROTOCOL_FEATURES = 16,
VHOST_USER_MAX
 } VhostUserRequest;

diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c 
b/lib/librte_vhost/vhost_user/virtio-net-user.c
index e0bc2a4..6da729d 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -318,3 +318,16 @@ user_destroy_device(struct vhost_device_ctx ctx)
dev->mem = NULL;
}
 }
+
+void
+user_set_protocol_features(struct vhost_device_ctx ctx,
+  uint64_t protocol_features)
+{
+   struct virtio_net *dev;
+
+   dev = get_device(ctx);
+   if (dev == NULL || protocol_features & ~VHOST_USER_PROTOCOL_FEATURES)
+   return;
+
+   dev->protocol_features = protocol_features;
+}
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.h 
b/lib/librte_vhost/vhost_user/virtio-net-user.h
index df24860..e7a6ff4 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.h
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.h
@@ -37,12 +37,17 @@
 #include "vhost-net.h"
 #include "vhost-net-user.h"

+#define VHOST_USER_PROTOCOL_FEATURES   0ULL
+
 int user_set_mem_table(struct vhost_device_ctx, struct VhostUser
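
A sketch of the gating rule this patch implements: per the vhost-user spec,
the protocol-features messages may only be exchanged once the
VHOST_USER_F_PROTOCOL_FEATURES bit (bit 30 in the spec) has been negotiated.
The helper below is illustrative, not part of the patch:

    #define VHOST_USER_F_PROTOCOL_FEATURES 30

    static int
    protocol_features_negotiated(uint64_t features)
    {
            return (features & (1ULL << VHOST_USER_F_PROTOCOL_FEATURES)) != 0;
    }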

[dpdk-dev] [PATCH v8 2/8] vhost-user: add VHOST_USER_GET_QUEUE_NUM message

2015-10-22 Thread Yuanhan Liu
To tell the frontend (qemu) how many queue pairs we support.

It is initialized to VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX.

Signed-off-by: Yuanhan Liu 
Acked-by: Flavio Leitner 
---
 lib/librte_vhost/vhost_user/vhost-net-user.c | 7 +++
 lib/librte_vhost/vhost_user/vhost-net-user.h | 1 +
 2 files changed, 8 insertions(+)

diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c 
b/lib/librte_vhost/vhost_user/vhost-net-user.c
index bc2ad24..8675cd4 100644
--- a/lib/librte_vhost/vhost_user/vhost-net-user.c
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
@@ -98,6 +98,7 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
[VHOST_USER_SET_VRING_ERR]  = "VHOST_USER_SET_VRING_ERR",
[VHOST_USER_GET_PROTOCOL_FEATURES]  = "VHOST_USER_GET_PROTOCOL_FEATURES",
[VHOST_USER_SET_PROTOCOL_FEATURES]  = "VHOST_USER_SET_PROTOCOL_FEATURES",
+   [VHOST_USER_GET_QUEUE_NUM]  = "VHOST_USER_GET_QUEUE_NUM",
 };

 /**
@@ -421,6 +422,12 @@ vserver_message_handler(int connfd, void *dat, int *remove)
RTE_LOG(INFO, VHOST_CONFIG, "not implemented\n");
break;

+   case VHOST_USER_GET_QUEUE_NUM:
+   msg.payload.u64 = VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX;
+   msg.size = sizeof(msg.payload.u64);
+   send_vhost_message(connfd, &msg);
+   break;
+
default:
break;

diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.h 
b/lib/librte_vhost/vhost_user/vhost-net-user.h
index 4490d23..389d21d 100644
--- a/lib/librte_vhost/vhost_user/vhost-net-user.h
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.h
@@ -65,6 +65,7 @@ typedef enum VhostUserRequest {
VHOST_USER_SET_VRING_ERR = 14,
VHOST_USER_GET_PROTOCOL_FEATURES = 15,
VHOST_USER_SET_PROTOCOL_FEATURES = 16,
+   VHOST_USER_GET_QUEUE_NUM = 17,
VHOST_USER_MAX
 } VhostUserRequest;

-- 
1.9.0



[dpdk-dev] [PATCH v8 4/8] vhost: rxtx: use queue id instead of constant ring index

2015-10-22 Thread Yuanhan Liu
From: Changchun Ouyang 

Do not use VIRTIO_RXQ or VIRTIO_TXQ anymore; use the queue_id
instead, which will be set to a proper value for a specific queue
when we have multiple queue support enabled.

For now, queue_id is still set with VIRTIO_RXQ or VIRTIO_TXQ,
so it should not break anything.

Signed-off-by: Changchun Ouyang 
Signed-off-by: Yuanhan Liu 
Acked-by: Flavio Leitner 

---

v8: simplify is_valid_virt_queue_idx()

v7: commit title fix
---
 lib/librte_vhost/vhost_rxtx.c | 43 +--
 1 file changed, 29 insertions(+), 14 deletions(-)

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 7026bfa..1ec8850 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -32,6 +32,7 @@
  */

 #include 
+#include 
 #include 

 #include 
@@ -42,6 +43,12 @@

 #define MAX_PKT_BURST 32

+static bool
+is_valid_virt_queue_idx(uint32_t idx, int is_tx, uint32_t qp_nb)
+{
+   return (is_tx ^ (idx & 1)) == 0 && idx < qp_nb * VIRTIO_QNUM;
+}
+
 /**
  * This function adds buffers to the virtio devices RX virtqueue. Buffers can
  * be received from the physical port or from another virtio device. A packet
@@ -68,12 +75,14 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
uint8_t success = 0;

LOG_DEBUG(VHOST_DATA, "(%"PRIu64") virtio_dev_rx()\n", dev->device_fh);
-   if (unlikely(queue_id != VIRTIO_RXQ)) {
-   LOG_DEBUG(VHOST_DATA, "mq isn't supported in this version.\n");
+   if (unlikely(!is_valid_virt_queue_idx(queue_id, 0, dev->virt_qp_nb))) {
+   RTE_LOG(ERR, VHOST_DATA,
+   "%s (%"PRIu64"): virtqueue idx:%d invalid.\n",
+   __func__, dev->device_fh, queue_id);
return 0;
}

-   vq = dev->virtqueue[VIRTIO_RXQ];
+   vq = dev->virtqueue[queue_id];
count = (count > MAX_PKT_BURST) ? MAX_PKT_BURST : count;

/*
@@ -235,8 +244,9 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
 }

 static inline uint32_t __attribute__((always_inline))
-copy_from_mbuf_to_vring(struct virtio_net *dev, uint16_t res_base_idx,
-   uint16_t res_end_idx, struct rte_mbuf *pkt)
+copy_from_mbuf_to_vring(struct virtio_net *dev, uint32_t queue_id,
+   uint16_t res_base_idx, uint16_t res_end_idx,
+   struct rte_mbuf *pkt)
 {
uint32_t vec_idx = 0;
uint32_t entry_success = 0;
@@ -264,7 +274,7 @@ copy_from_mbuf_to_vring(struct virtio_net *dev, uint16_t 
res_base_idx,
 * Convert from gpa to vva
 * (guest physical addr -> vhost virtual addr)
 */
-   vq = dev->virtqueue[VIRTIO_RXQ];
+   vq = dev->virtqueue[queue_id];
vb_addr = gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr);
vb_hdr_addr = vb_addr;

@@ -464,11 +474,14 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t 
queue_id,

LOG_DEBUG(VHOST_DATA, "(%"PRIu64") virtio_dev_merge_rx()\n",
dev->device_fh);
-   if (unlikely(queue_id != VIRTIO_RXQ)) {
-   LOG_DEBUG(VHOST_DATA, "mq isn't supported in this version.\n");
+   if (unlikely(!is_valid_virt_queue_idx(queue_id, 0, dev->virt_qp_nb))) {
+   RTE_LOG(ERR, VHOST_DATA,
+   "%s (%"PRIu64"): virtqueue idx:%d invalid.\n",
+   __func__, dev->device_fh, queue_id);
+   return 0;
}

-   vq = dev->virtqueue[VIRTIO_RXQ];
+   vq = dev->virtqueue[queue_id];
count = RTE_MIN((uint32_t)MAX_PKT_BURST, count);

if (count == 0)
@@ -509,8 +522,8 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t 
queue_id,
res_cur_idx);
} while (success == 0);

-   entry_success = copy_from_mbuf_to_vring(dev, res_base_idx,
-   res_cur_idx, pkts[pkt_idx]);
+   entry_success = copy_from_mbuf_to_vring(dev, queue_id,
+   res_base_idx, res_cur_idx, pkts[pkt_idx]);

rte_compiler_barrier();

@@ -562,12 +575,14 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t 
queue_id,
uint16_t free_entries, entry_success = 0;
uint16_t avail_idx;

-   if (unlikely(queue_id != VIRTIO_TXQ)) {
-   LOG_DEBUG(VHOST_DATA, "mq isn't supported in this version.\n");
+   if (unlikely(!is_valid_virt_queue_idx(queue_id, 1, dev->virt_qp_nb))) {
+   RTE_LOG(ERR, VHOST_DATA,
+   "%s (%"PRIu64"): virtqueue idx:%d invalid.\n",
+   __func__, dev->device_fh, queue_id);
return 0;
}

-   vq = dev->virtqueue[VIRTIO_TXQ];
+   vq = dev->virtqueue[queue_id];
avail_idx =  *((volatile uint16_t *)&vq->avail->idx);

/* If there are no available buffers then return. */
-- 
1.9.0
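
The interleaved index convention behind is_valid_virt_queue_idx() above, with
VIRTIO_QNUM == 2, VIRTIO_RXQ == 0 and VIRTIO_TXQ == 1: the RX ring of queue
pair q is queue_id 2*q + VIRTIO_RXQ (even) and the TX ring is 2*q +
VIRTIO_TXQ (odd). A few worked checks:

    is_valid_virt_queue_idx(0, 0, 1); /* pair 0 RX, 1 pair    -> true  */
    is_valid_virt_queue_idx(1, 0, 1); /* TX ring passed as RX -> false */
    is_valid_virt_queue_idx(3, 1, 2); /* pair 1 TX, 2 pairs   -> true  */
    is_valid_virt_queue_idx(4, 0, 2); /* beyond 2 pairs       -> false */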



[dpdk-dev] [PATCH v8 5/8] virtio: fix deadloop due to reading virtio_net_config incorrectly

2015-10-22 Thread Yuanhan Liu
From: Changchun Ouyang 

The old code adjusts the config bytes we want to read depending on
what kind of features we have, but we later cast the entire buf we
read with "struct virtio_net_config", which is obviously wrong.

The wrong config read results in a dead loop at virtio_send_command()
while starting testpmd.

The right way to go is to read the related config bytes when the corresponding
feature is set, which is exactly what this patch does.

Fixes: 823ad647950a ("virtio: support multiple queues")

Signed-off-by: Changchun Ouyang 
Signed-off-by: Yuanhan Liu 
Acked-by: Flavio Leitner 

---

v7: commit log fixes

v6: read mac unconditionally.
---
 drivers/net/virtio/virtio_ethdev.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index 02f698a..12fcc23 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1162,7 +1162,6 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
struct virtio_hw *hw = eth_dev->data->dev_private;
struct virtio_net_config *config;
struct virtio_net_config local_config;
-   uint32_t offset_conf = sizeof(config->mac);
struct rte_pci_device *pci_dev;

RTE_BUILD_BUG_ON(RTE_PKTMBUF_HEADROOM < sizeof(struct virtio_net_hdr));
@@ -1225,8 +1224,14 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
if (vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VQ)) {
config = &local_config;

+   vtpci_read_dev_config(hw,
+   offsetof(struct virtio_net_config, mac),
+   &config->mac, sizeof(config->mac));
+
if (vtpci_with_feature(hw, VIRTIO_NET_F_STATUS)) {
-   offset_conf += sizeof(config->status);
+   vtpci_read_dev_config(hw,
+   offsetof(struct virtio_net_config, status),
+   &config->status, sizeof(config->status));
} else {
PMD_INIT_LOG(DEBUG,
 "VIRTIO_NET_F_STATUS is not supported");
@@ -1234,15 +1239,16 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
}

if (vtpci_with_feature(hw, VIRTIO_NET_F_MQ)) {
-   offset_conf += sizeof(config->max_virtqueue_pairs);
+   vtpci_read_dev_config(hw,
+   offsetof(struct virtio_net_config, max_virtqueue_pairs),
+   &config->max_virtqueue_pairs,
+   sizeof(config->max_virtqueue_pairs));
} else {
PMD_INIT_LOG(DEBUG,
 "VIRTIO_NET_F_MQ is not supported");
config->max_virtqueue_pairs = 1;
}

-   vtpci_read_dev_config(hw, 0, (uint8_t *)config, offset_conf);
-
hw->max_rx_queues =
(VIRTIO_MAX_RX_QUEUES < config->max_virtqueue_pairs) ?
VIRTIO_MAX_RX_QUEUES : config->max_virtqueue_pairs;
-- 
1.9.0



[dpdk-dev] [PATCH v8 3/8] vhost: vring queue setup for multiple queue support

2015-10-22 Thread Yuanhan Liu
All queue pairs, including the default (the first) queue pair,
are allocated dynamically, when a vring_call message is received
for the first time for a specific queue pair.

This is refactoring work for enabling vhost-user multiple queues;
it should not break anything, as it makes no functional changes:
we don't support mq set yet, so there is at most one queue pair.

This patch is based on Changchun's patch.

Signed-off-by: Ouyang Changchun 
Signed-off-by: Yuanhan Liu 
Acked-by: Flavio Leitner 

---

v8: - move virtqueue field to the end of `virtio_net' struct.

- Add a FIXME at set_vring_call() for doing vring queue pair
  allocation.
---
 lib/librte_vhost/rte_virtio_net.h |   3 +-
 lib/librte_vhost/vhost_user/virtio-net-user.c |  46 
 lib/librte_vhost/virtio-net.c | 156 --
 3 files changed, 123 insertions(+), 82 deletions(-)

diff --git a/lib/librte_vhost/rte_virtio_net.h 
b/lib/librte_vhost/rte_virtio_net.h
index e3a21e5..9a32a95 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -96,7 +96,6 @@ struct vhost_virtqueue {
  * Device structure contains all configuration information relating to the 
device.
  */
 struct virtio_net {
-   struct vhost_virtqueue  *virtqueue[VIRTIO_QNUM];/**< Contains 
all virtqueue information. */
struct virtio_memory*mem;   /**< QEMU memory and memory 
region information. */
uint64_tfeatures;   /**< Negotiated feature set. */
uint64_tprotocol_features;  /**< Negotiated 
protocol feature set. */
@@ -104,7 +103,9 @@ struct virtio_net {
uint32_tflags;  /**< Device flags. Only used to 
check if device is running on data core. */
 #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
charifname[IF_NAME_SZ]; /**< Name of the tap 
device or socket path. */
+   uint32_tvirt_qp_nb; /**< number of queue pair we 
have allocated */
void*priv;  /**< private context */
+   struct vhost_virtqueue  *virtqueue[VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX];
/**< Contains all virtqueue information. */
 } __rte_cache_aligned;

 /**
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c 
b/lib/librte_vhost/vhost_user/virtio-net-user.c
index 6da729d..d62f3d7 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -206,25 +206,33 @@ err_mmap:
 }

 static int
+vq_is_ready(struct vhost_virtqueue *vq)
+{
+   return vq && vq->desc   &&
+  vq->kickfd != -1 &&
+  vq->callfd != -1;
+}
+
+static int
 virtio_is_ready(struct virtio_net *dev)
 {
struct vhost_virtqueue *rvq, *tvq;
+   uint32_t i;

-   /* mq support in future.*/
-   rvq = dev->virtqueue[VIRTIO_RXQ];
-   tvq = dev->virtqueue[VIRTIO_TXQ];
-   if (rvq && tvq && rvq->desc && tvq->desc &&
-   (rvq->kickfd != -1) &&
-   (rvq->callfd != -1) &&
-   (tvq->kickfd != -1) &&
-   (tvq->callfd != -1)) {
-   RTE_LOG(INFO, VHOST_CONFIG,
-   "virtio is now ready for processing.\n");
-   return 1;
+   for (i = 0; i < dev->virt_qp_nb; i++) {
+   rvq = dev->virtqueue[i * VIRTIO_QNUM + VIRTIO_RXQ];
+   tvq = dev->virtqueue[i * VIRTIO_QNUM + VIRTIO_TXQ];
+
+   if (!vq_is_ready(rvq) || !vq_is_ready(tvq)) {
+   RTE_LOG(INFO, VHOST_CONFIG,
+   "virtio is not ready for processing.\n");
+   return 0;
+   }
}
+
RTE_LOG(INFO, VHOST_CONFIG,
-   "virtio isn't ready for processing.\n");
-   return 0;
+   "virtio is now ready for processing.\n");
+   return 1;
 }

 void
@@ -292,13 +300,13 @@ user_get_vring_base(struct vhost_device_ctx ctx,
 * sent and only sent in vhost_vring_stop.
 * TODO: cleanup the vring, it isn't usable since here.
 */
-   if ((dev->virtqueue[VIRTIO_RXQ]->kickfd) >= 0) {
-   close(dev->virtqueue[VIRTIO_RXQ]->kickfd);
-   dev->virtqueue[VIRTIO_RXQ]->kickfd = -1;
+   if (dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd >= 0) {
+   close(dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd);
+   dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd = -1;
}
-   if ((dev->virtqueue[VIRTIO_TXQ]->kickfd) >= 0) {
-   close(dev->virtqueue[VIRTIO_TXQ]->kickfd);
-   dev->virtqueue[VIRTIO_TXQ]->kickfd = -1;
+   if (dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd >= 0) {
+   close(dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd);
+   dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd = -1;
}

return 0;
diff --git a/lib/librte_vhost/virtio-net

  1   2   >