date:20160818

[dpdk-dev] ConnectX4 100GbE - Compilation problem

2016-08-18 Thread george....@gmail.com

Hi Adrien,

Thanks for the prompt reply!
You are right, I didn't go via the DPDK route, in the hope that Mellanox
would provide the exact source and configuration.
DPDK 16.07 from dpdk.org works like a charm and my NIC is in PMD mode,
thanks a lot for your guidance!

Best regards,
Georgios

On Thu, Aug 18, 2016 at 6:05 PM, Adrien Mazarguil <
adrien.mazarguil at 6wind.com> wrote:

> Hi George,
>
> On Thu, Aug 18, 2016 at 05:41:38PM +0200, george.dit at gmail.com wrote:
> > Hi,
> >
> > I have a single port Mellanox ConnectX 4 100GbE NIC and I want to test its
> > Rx/Tx capabilities using DPDK.
> > My system runs a Linux kernel 4.4 compiled from sources.
> >
> > I found the PMD driver for this NIC as provided by Mellanox here
> >  209=pmd_for_dpdk>
> > .
> > Following this
> >  MLNX_DPDK_Quick_Start_Guide_v2.2_2.7.pdf>
> > guideline, I put my NIC in Ethernet mode, configured the 3 options
> > regarding mlx5 in the config/common_linuxapp file and applied 'make
> install
> > T=x86_64-native-linuxapp-gcc'.
>
> Please note this is a third party package maintained by Mellanox, therefore
> this mailing list is not the right place to discuss related errors, unless
> they can be reproduced with a version downloaded from dpdk.org.
>
> > Then, I stumbled upon a compilation problem in the mlx4 module.
> > The compiler's output is as follows:
> >
> > == Build drivers/net/mlx4
> >   CC mlx4.o
> > In file included from /usr/include/linux/if.h:31:0,
> >  from /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:57:
> > /usr/include/linux/hdlc/ioctl.h:73:14: error: 'IFNAMSIZ' undeclared here
> > (not in a function)
> >   char master[IFNAMSIZ]; /* Name of master FRAD device */
> >   ^
> > /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:612:53: error: 'struct
> > ifreq' declared inside parameter list [-Werror]
> >  priv_ifreq(const struct priv *priv, int req, struct ifreq *ifr)
> >  ^
> > /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:612:53: error: its scope is
> > only this definition or declaration, which is probably not what you want
> > [-Werror]
> > /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c: In function 'priv_ifreq':
> > /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:619:32: error: dereferencing
> > pointer to incomplete type 'struct ifreq'
> >   if (priv_get_ifname(priv, &ifr->ifr_name) == 0)
> > ^
> > /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c: In function 'rxq_setup':
> > /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:3735:29: error: unused
> > parameter 'inactive' [-Werror=unused-parameter]
> > unsigned int socket, int inactive, const struct rte_eth_rxconf *conf,
> >  ^
> > /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c: In function
> > 'mlx4_link_update_unlocked':
> > /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:4712:15: error: storage size
> > of 'ifr' isn't known
> >   struct ifreq ifr;
> >^
> > /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:4724:43: error: 'IFF_UP'
> > undeclared (first use in this function)
> >   dev_link.link_status = ((ifr.ifr_flags & IFF_UP) &&
> >^
> > /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:4724:43: note: each
> > undeclared identifier is reported only once for each function it appears in
> > /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:4725:22: error:
> > 'IFF_RUNNING' undeclared (first use in this function)
> >  (ifr.ifr_flags & IFF_RUNNING));
> >   ^
> > /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:4712:15: error: unused
> > variable 'ifr' [-Werror=unused-variable]
> >   struct ifreq ifr;
> >^
> > /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c: In function
> > 'mlx4_dev_get_flow_ctrl':
> > /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:4880:15: error: storage size
> > of 'ifr' isn't known
> >   struct ifreq ifr;
> >^
> > /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:4880:15: error: unused
> > variable 'ifr' [-Werror=unused-variable]
> > /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c: In function
> > 'mlx4_dev_set_flow_ctrl':
> > /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:4930:15: error: storage size
> > of 'ifr' isn't known
> >   struct ifreq ifr;
> >^
> > /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:4930:15: error: unused
> > variable 'ifr' [-Werror=unused-variable]
> > /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c: In function 'priv_get_mac':
> > /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:5184:15: error: storage size
> > of 'request' isn't known
> >   struct ifreq request;
> >^
> > /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:5184:15: error: unused
> > variable 'request' [-Werror=unused-variable]
> >

[dpdk-dev] ConnectX4 100GbE - Compilation problem

2016-08-18 Thread Adrien Mazarguil

Hi George,

On Thu, Aug 18, 2016 at 05:41:38PM +0200, george.dit at gmail.com wrote:
> Hi,
> 
> I have a single port Mellanox ConnectX 4 100GbE NIC and I want to test its
> Rx/Tx capabilities using DPDK.
> My system runs a Linux kernel 4.4 compiled from sources.
> 
> I found the PMD driver for this NIC as provided by Mellanox here
> 
> .
> Following this
> 
> guideline, I put my NIC in Ethernet mode, configured the 3 options
> regarding mlx5 in the config/common_linuxapp file and applied 'make install
> T=x86_64-native-linuxapp-gcc'.

Please note this is a third party package maintained by Mellanox, therefore
this mailing list is not the right place to discuss related errors, unless
they can be reproduced with a version downloaded from dpdk.org.

> Then, I stumbled upon a compilation problem in the mlx4 module.
> The compiler's output is as follows:
> 
> == Build drivers/net/mlx4
>   CC mlx4.o
> In file included from /usr/include/linux/if.h:31:0,
>  from /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:57:
> /usr/include/linux/hdlc/ioctl.h:73:14: error: 'IFNAMSIZ' undeclared here
> (not in a function)
>   char master[IFNAMSIZ]; /* Name of master FRAD device */
>   ^
> /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:612:53: error: 'struct
> ifreq' declared inside parameter list [-Werror]
>  priv_ifreq(const struct priv *priv, int req, struct ifreq *ifr)
>  ^
> /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:612:53: error: its scope is
> only this definition or declaration, which is probably not what you want
> [-Werror]
> /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c: In function 'priv_ifreq':
> /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:619:32: error: dereferencing
> pointer to incomplete type 'struct ifreq'
>   if (priv_get_ifname(priv, &ifr->ifr_name) == 0)
> ^
> /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c: In function 'rxq_setup':
> /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:3735:29: error: unused
> parameter 'inactive' [-Werror=unused-parameter]
> unsigned int socket, int inactive, const struct rte_eth_rxconf *conf,
>  ^
> /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c: In function
> 'mlx4_link_update_unlocked':
> /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:4712:15: error: storage size
> of 'ifr' isn't known
>   struct ifreq ifr;
>^
> /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:4724:43: error: 'IFF_UP'
> undeclared (first use in this function)
>   dev_link.link_status = ((ifr.ifr_flags & IFF_UP) &&
>^
> /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:4724:43: note: each
> undeclared identifier is reported only once for each function it appears in
> /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:4725:22: error:
> 'IFF_RUNNING' undeclared (first use in this function)
>  (ifr.ifr_flags & IFF_RUNNING));
>   ^
> /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:4712:15: error: unused
> variable 'ifr' [-Werror=unused-variable]
>   struct ifreq ifr;
>^
> /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c: In function
> 'mlx4_dev_get_flow_ctrl':
> /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:4880:15: error: storage size
> of 'ifr' isn't known
>   struct ifreq ifr;
>^
> /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:4880:15: error: unused
> variable 'ifr' [-Werror=unused-variable]
> /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c: In function
> 'mlx4_dev_set_flow_ctrl':
> /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:4930:15: error: storage size
> of 'ifr' isn't known
>   struct ifreq ifr;
>^
> /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:4930:15: error: unused
> variable 'ifr' [-Werror=unused-variable]
> /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c: In function 'priv_get_mac':
> /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:5184:15: error: storage size
> of 'request' isn't known
>   struct ifreq request;
>^
> /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:5184:15: error: unused
> variable 'request' [-Werror=unused-variable]
> /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c: In function
> 'mlx4_pci_devinit':
> /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:5725:25: error: 'IFF_UP'
> undeclared (first use in this function)
>priv_set_flags(priv, ~IFF_UP, IFF_UP);
>  ^
> cc1: all warnings being treated as errors
> /opt/MLNX_DPDK_2.2_2.7/mk/internal/rte.compile-pre.mk:126: recipe for
> target 'mlx4.o' failed
> 
> I would appreciate any suggestions and guidance.

Well fortunately these errors are also present in v2.2.0 and should have
been addressed since v16.07 by the following commit:

[dpdk-dev] Not able to bring up the VM with dpdk enabled ovs bridge

2016-08-18 Thread harshavardhan Reddy

Hi,

When I try to add the virtio interface to a DPDK-enabled OVS bridge, I get
the error message below.

"Unable to add port vnet1 to OVS bridge ovsbr0"

I have created the OVS bridge and added ports to it as below.
$OVS_DIR/utilities/ovs-vsctl add-port ovsbr0 vhost-user1 -- set Interface
vhost-user1 type=dpdkvhostuser
$OVS_DIR/utilities/ovs-vsctl add-port ovsbr0 vhost-user2 -- set Interface
vhost-user2 type=dpdkvhostuser

In my VM's XML file I have made the "highlighted" changes below as well.

sudo virsh dumpxml AP1

  AP1
  6dd3c551-76a2-65d5-32e7-8daaaf433cf4
  4194304
  4194304
  2
  hvm
  destroy
  restart
  restart
  /usr/bin/qemu-system-x86_64

However, I am still not able to start the VM.
When I try to start it, I get the error message below.

#virsh start AP1
error: Failed to start domain AP1
error: internal error: Unable to add port vnet1 to OVS bridge ovsbr0

My host runs the latest Ubuntu release,
"Ubuntu 16.04.1 LTS", with QEMU 2.5.0.

Kindly suggest if I am missing anything here.

Regards,
Hvr

[dpdk-dev] ConnectX4 100GbE - Compilation problem

2016-08-18 Thread george....@gmail.com

Hi,

I have a single port Mellanox ConnectX 4 100GbE NIC and I want to test its
Rx/Tx capabilities using DPDK.
My system runs a Linux kernel 4.4 compiled from sources.

I found the PMD driver for this NIC as provided by Mellanox here

. 
Following this

guideline, I put my NIC in Ethernet mode, configured the 3 options
regarding mlx5 in the config/common_linuxapp file and applied 'make install
T=x86_64-native-linuxapp-gcc'.

Then, I stumbled upon a compilation problem in the mlx4 module.
The compiler's output is as follows:

== Build drivers/net/mlx4
  CC mlx4.o
In file included from /usr/include/linux/if.h:31:0,
 from /opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:57:
/usr/include/linux/hdlc/ioctl.h:73:14: error: 'IFNAMSIZ' undeclared here
(not in a function)
  char master[IFNAMSIZ]; /* Name of master FRAD device */
  ^
/opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:612:53: error: 'struct
ifreq' declared inside parameter list [-Werror]
 priv_ifreq(const struct priv *priv, int req, struct ifreq *ifr)
 ^
/opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:612:53: error: its scope is
only this definition or declaration, which is probably not what you want
[-Werror]
/opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c: In function 'priv_ifreq':
/opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:619:32: error: dereferencing
pointer to incomplete type 'struct ifreq'
  if (priv_get_ifname(priv, &ifr->ifr_name) == 0)
^
/opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c: In function 'rxq_setup':
/opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:3735:29: error: unused
parameter 'inactive' [-Werror=unused-parameter]
unsigned int socket, int inactive, const struct rte_eth_rxconf *conf,
 ^
/opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c: In function
'mlx4_link_update_unlocked':
/opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:4712:15: error: storage size
of 'ifr' isn't known
  struct ifreq ifr;
   ^
/opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:4724:43: error: 'IFF_UP'
undeclared (first use in this function)
  dev_link.link_status = ((ifr.ifr_flags & IFF_UP) &&
   ^
/opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:4724:43: note: each
undeclared identifier is reported only once for each function it appears in
/opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:4725:22: error:
'IFF_RUNNING' undeclared (first use in this function)
 (ifr.ifr_flags & IFF_RUNNING));
  ^
/opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:4712:15: error: unused
variable 'ifr' [-Werror=unused-variable]
  struct ifreq ifr;
   ^
/opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c: In function
'mlx4_dev_get_flow_ctrl':
/opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:4880:15: error: storage size
of 'ifr' isn't known
  struct ifreq ifr;
   ^
/opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:4880:15: error: unused
variable 'ifr' [-Werror=unused-variable]
/opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c: In function
'mlx4_dev_set_flow_ctrl':
/opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:4930:15: error: storage size
of 'ifr' isn't known
  struct ifreq ifr;
   ^
/opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:4930:15: error: unused
variable 'ifr' [-Werror=unused-variable]
/opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c: In function 'priv_get_mac':
/opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:5184:15: error: storage size
of 'request' isn't known
  struct ifreq request;
   ^
/opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:5184:15: error: unused
variable 'request' [-Werror=unused-variable]
/opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c: In function
'mlx4_pci_devinit':
/opt/MLNX_DPDK_2.2_2.7/drivers/net/mlx4/mlx4.c:5725:25: error: 'IFF_UP'
undeclared (first use in this function)
   priv_set_flags(priv, ~IFF_UP, IFF_UP);
 ^
cc1: all warnings being treated as errors
/opt/MLNX_DPDK_2.2_2.7/mk/internal/rte.compile-pre.mk:126: recipe for
target 'mlx4.o' failed

I would appreciate any suggestions and guidance.

Best regards,
Georgios Katsikas
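
Most of the errors in the log above come down to `struct ifreq`, `IFNAMSIZ`, `IFF_UP` and `IFF_RUNNING` being undeclared when mlx4.c is compiled. As a point of comparison, here is a minimal sketch of a link-status query similar to the one the failing lines perform, assuming a glibc system where `<net/if.h>` supplies those definitions. The helper name `link_is_up` is hypothetical, and this is not presented as the fix applied upstream.

```c
#define _DEFAULT_SOURCE  /* assumption: needed for struct ifreq under strict -std= modes */
#include <assert.h>
#include <net/if.h>      /* struct ifreq, IFNAMSIZ, IFF_UP, IFF_RUNNING */
#include <string.h>
#include <sys/ioctl.h>   /* ioctl(), SIOCGIFFLAGS */
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if the interface is up and running,
 * 0 if it is not, and -1 on any error (bad name, no such interface).
 */
static int
link_is_up(const char *ifname)
{
	struct ifreq ifr;
	int sock;
	int ret;

	if (ifname == NULL || strlen(ifname) >= IFNAMSIZ)
		return -1;
	sock = socket(AF_INET, SOCK_DGRAM, 0);
	if (sock < 0)
		return -1;
	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
	ret = ioctl(sock, SIOCGIFFLAGS, &ifr);
	close(sock);
	if (ret < 0)
		return -1;
	return (ifr.ifr_flags & IFF_UP) && (ifr.ifr_flags & IFF_RUNNING);
}
```

Calling `link_is_up("eth0")` (the interface name is only an example) mirrors the `IFF_UP`/`IFF_RUNNING` test shown in the log; if this sketch compiles on the same machine, the headers themselves provide the missing symbols.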

[dpdk-dev] [PATCH] eal/armv8: high-resolution cycle counter

2016-08-18 Thread Jerin Jacob

The existing cntvct_el0-based rte_rdtsc() provides a portable
means to read a wall clock counter from user space. Typically
it runs at <= 100MHz.

The alternative method to enable rte_rdtsc() for a high-resolution
wall clock counter is through the armv8 PMU subsystem.
The PMU cycle counter runs at CPU frequency. However,
access to the PMU cycle counter from user space is not enabled
by default in the arm64 Linux kernel.
It is possible to enable user space access to the cycle counter
by configuring the PMU from privileged mode (kernel space).

By default, the rte_rdtsc() implementation uses the portable
cntvct_el0 scheme. An application can choose the PMU-based
implementation with CONFIG_RTE_ARM_EAL_RDTSC_USE_PMU.

Signed-off-by: Jerin Jacob 
---

The PMU-based scheme is useful for high-accuracy performance profiling.
Find below example steps to configure the PMU-based cycle counter on an
armv8 machine.

# git clone https://github.com/jerinjacobk/armv8_pmu_cycle_counter_el0
# cd armv8_pmu_cycle_counter_el0
# make
# sudo insmod pmu_el0_cycle_counter.ko
# cd $DPDK_DIR
# make config T=arm64-armv8a-linuxapp-gcc
# echo "CONFIG_RTE_ARM_EAL_RDTSC_USE_PMU=y" >> build/.config
# make -j 4

---
 .../common/include/arch/arm/rte_cycles_64.h| 33 ++
 1 file changed, 33 insertions(+)

diff --git a/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h b/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
index 14f2612..867a946 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
@@ -45,6 +45,11 @@ extern "C" {
  * @return
  *   The time base for this lcore.
  */
+#ifndef RTE_ARM_EAL_RDTSC_USE_PMU
+/**
+ * This call is portable to any ARMv8 architecture, however, typically
+ * cntvct_el0 runs at <= 100MHz and it may be imprecise for some tasks.
+ */
 static inline uint64_t
 rte_rdtsc(void)
 {
@@ -53,6 +58,34 @@ rte_rdtsc(void)
asm volatile("mrs %0, cntvct_el0" : "=r" (tsc));
return tsc;
 }
+#else
+/**
+ * This is an alternative method to enable rte_rdtsc() with high resolution
+ * PMU cycles counter. The cycle counter runs at cpu frequency and this scheme
+ * uses ARMv8 PMU subsystem to get the cycle counter at userspace. However,
+ * access to PMU cycle counter from user space is not enabled by default in
+ * arm64 linux kernel.
+ * It is possible to enable cycle counter at user space access by configuring
+ * the PMU from the privileged mode (kernel space).
+ *
+ * asm volatile("msr pmintenset_el1, %0" : : "r" ((u64)(0 << 31)));
+ * asm volatile("msr pmcntenset_el0, %0" :: "r" BIT(31));
+ * asm volatile("msr pmuserenr_el0, %0" : : "r"(BIT(0) | BIT(2)));
+ * asm volatile("mrs %0, pmcr_el0" : "=r" (val));
+ * val |= (BIT(0) | BIT(2));
+ * isb();
+ * asm volatile("msr pmcr_el0, %0" : : "r" (val));
+ *
+ */
+static inline uint64_t
+rte_rdtsc(void)
+{
+   uint64_t tsc;
+
+   asm volatile("mrs %0, pmccntr_el0" : "=r"(tsc));
+   return tsc;
+}
+#endif

 static inline uint64_t
 rte_rdtsc_precise(void)
-- 
2.5.5
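
The compile-time choice between the two counters can be sketched outside DPDK as follows. This is a minimal sketch, not the patch itself: `read_cycles` and `USE_PMU_CYCLE_COUNTER` are hypothetical names standing in for rte_rdtsc() and the RTE_ARM_EAL_RDTSC_USE_PMU option, and the clock_gettime() fallback is an addition made only so the sketch builds on non-ARM hosts.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime() under strict -std= modes */
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for rte_rdtsc(). */
static inline uint64_t
read_cycles(void)
{
#if defined(__aarch64__) && defined(USE_PMU_CYCLE_COUNTER)
	uint64_t tsc;

	/* PMU cycle counter: traps unless the kernel has enabled EL0 access. */
	__asm__ volatile("mrs %0, pmccntr_el0" : "=r"(tsc));
	return tsc;
#elif defined(__aarch64__)
	uint64_t tsc;

	/* Generic timer virtual count: the portable default scheme. */
	__asm__ volatile("mrs %0, cntvct_el0" : "=r"(tsc));
	return tsc;
#else
	struct timespec ts;

	/* Fallback for non-ARM hosts: monotonic clock in nanoseconds. */
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
#endif
}
```

Timing a region then amounts to taking the difference of two `read_cycles()` values. With the PMU variant that difference is in CPU cycles; with the default variant it is in ticks of the slower generic timer.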

[dpdk-dev] [PATCH 7/7] vhost: simplify features set/get

2016-08-18 Thread Yuanhan Liu

No need to use a pointer to store/retrieve features.

Signed-off-by: Yuanhan Liu 
---
 lib/librte_vhost/vhost_user.c | 20 
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index ef4a0c1..eee99e9 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -155,23 +155,22 @@ vhost_user_reset_owner(struct virtio_net *dev)
 /*
  * The features that we support are requested.
  */
-static int
-vhost_user_get_features(uint64_t *pu)
+static uint64_t
+vhost_user_get_features(void)
 {
-   *pu = VHOST_FEATURES;
-   return 0;
+   return VHOST_FEATURES;
 }

 /*
  * We receive the negotiated features supported by us and the virtio device.
  */
 static int
-vhost_user_set_features(struct virtio_net *dev, uint64_t *pu)
+vhost_user_set_features(struct virtio_net *dev, uint64_t features)
 {
-   if (*pu & ~VHOST_FEATURES)
+   if (features & ~VHOST_FEATURES)
return -1;

-   dev->features = *pu;
+   dev->features = features;
if (dev->features &
((1 << VIRTIO_NET_F_MRG_RXBUF) | (1ULL << VIRTIO_F_VERSION_1))) {
dev->vhost_hlen = sizeof(struct virtio_net_hdr_mrg_rxbuf);
@@ -802,7 +801,6 @@ vhost_user_msg_handler(int vid, int fd)
 {
struct virtio_net *dev;
struct VhostUserMsg msg;
-   uint64_t features = 0;
int ret;

dev = get_device(vid);
@@ -828,14 +826,12 @@ vhost_user_msg_handler(int vid, int fd)
vhost_message_str[msg.request]);
switch (msg.request) {
case VHOST_USER_GET_FEATURES:
-   ret = vhost_user_get_features(&features);
-   msg.payload.u64 = features;
+   msg.payload.u64 = vhost_user_get_features();
msg.size = sizeof(msg.payload.u64);
send_vhost_message(fd, &msg);
break;
case VHOST_USER_SET_FEATURES:
-   features = msg.payload.u64;
-   vhost_user_set_features(dev, &features);
+   vhost_user_set_features(dev, msg.payload.u64);
break;

case VHOST_USER_GET_PROTOCOL_FEATURES:
-- 
1.9.0
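
The by-value interface introduced above can be exercised in isolation. The following is a minimal sketch under stated assumptions: `struct device`, `SUPPORTED_FEATURES`, `get_features` and `set_features` are hypothetical stand-ins for the vhost types and functions, and the mask holds only two bits where the real VHOST_FEATURES holds more.

```c
#include <assert.h>
#include <stdint.h>

/* Feature bit numbers as defined by the virtio specification. */
#define F_MRG_RXBUF 15
#define F_VERSION_1 32

/* Hypothetical feature mask; 1ULL keeps the shift 64 bits wide. */
#define SUPPORTED_FEATURES ((1ULL << F_MRG_RXBUF) | (1ULL << F_VERSION_1))

struct device {
	uint64_t features;
};

/* Return the supported features directly instead of through a pointer. */
static uint64_t
get_features(void)
{
	return SUPPORTED_FEATURES;
}

/* Accept the negotiated features by value; reject unsupported bits. */
static int
set_features(struct device *dev, uint64_t features)
{
	if (features & ~SUPPORTED_FEATURES)
		return -1;
	dev->features = features;
	return 0;
}
```

Passing the 64-bit mask by value removes the temporary variable at the call site, which is the simplification the patch makes in vhost_user_msg_handler().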

[dpdk-dev] [PATCH 6/7] vhost: get device once

2016-08-18 Thread Yuanhan Liu

Invoke get_device() at the beginning of vhost_user_msg_handler, so that
the return value is checked only once. This removes many duplicate
get-and-check sequences from the individual handlers.

Signed-off-by: Yuanhan Liu 
---
 lib/librte_vhost/vhost_user.c | 160 +-
 1 file changed, 47 insertions(+), 113 deletions(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index de3048c..ef4a0c1 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -134,29 +134,17 @@ vhost_backend_cleanup(struct virtio_net *dev)
  * the device hasn't been initialised.
  */
 static int
-vhost_user_set_owner(int vid)
+vhost_user_set_owner(void)
 {
-   struct virtio_net *dev;
-
-   dev = get_device(vid);
-   if (dev == NULL)
-   return -1;
-
return 0;
 }

 static int
-vhost_user_reset_owner(int vid)
+vhost_user_reset_owner(struct virtio_net *dev)
 {
-   struct virtio_net *dev;
-
-   dev = get_device(vid);
-   if (dev == NULL)
-   return -1;
-
if (dev->flags & VIRTIO_DEV_RUNNING) {
dev->flags &= ~VIRTIO_DEV_RUNNING;
-   notify_ops->destroy_device(vid);
+   notify_ops->destroy_device(dev->vid);
}

cleanup_device(dev, 0);
@@ -168,15 +156,8 @@ vhost_user_reset_owner(int vid)
  * The features that we support are requested.
  */
 static int
-vhost_user_get_features(int vid, uint64_t *pu)
+vhost_user_get_features(uint64_t *pu)
 {
-   struct virtio_net *dev;
-
-   dev = get_device(vid);
-   if (dev == NULL)
-   return -1;
-
-   /* Send our supported features. */
*pu = VHOST_FEATURES;
return 0;
 }
@@ -185,13 +166,8 @@ vhost_user_get_features(int vid, uint64_t *pu)
  * We receive the negotiated features supported by us and the virtio device.
  */
 static int
-vhost_user_set_features(int vid, uint64_t *pu)
+vhost_user_set_features(struct virtio_net *dev, uint64_t *pu)
 {
-   struct virtio_net *dev;
-
-   dev = get_device(vid);
-   if (dev == NULL)
-   return -1;
if (*pu & ~VHOST_FEATURES)
return -1;

@@ -215,15 +191,9 @@ vhost_user_set_features(int vid, uint64_t *pu)
  * The virtio device sends us the size of the descriptor ring.
  */
 static int
-vhost_user_set_vring_num(int vid, struct vhost_vring_state *state)
+vhost_user_set_vring_num(struct virtio_net *dev,
+struct vhost_vring_state *state)
 {
-   struct virtio_net *dev;
-
-   dev = get_device(vid);
-   if (dev == NULL)
-   return -1;
-
-   /* State->index refers to the queue index. The txq is 1, rxq is 0. */
dev->virtqueue[state->index]->size = state->num;

return 0;
@@ -343,13 +313,11 @@ qva_to_vva(struct virtio_net *dev, uint64_t qemu_va)
  * This function then converts these to our address space.
  */
 static int
-vhost_user_set_vring_addr(int vid, struct vhost_vring_addr *addr)
+vhost_user_set_vring_addr(struct virtio_net *dev, struct vhost_vring_addr *addr)
 {
-   struct virtio_net *dev;
struct vhost_virtqueue *vq;

-   dev = get_device(vid);
-   if ((dev == NULL) || (dev->mem == NULL))
+   if (dev->mem == NULL)
return -1;

/* addr->index refers to the queue index. The txq 1, rxq is 0. */
@@ -412,40 +380,28 @@ vhost_user_set_vring_addr(int vid, struct vhost_vring_addr *addr)
  * The virtio device sends us the available ring last used index.
  */
 static int
-vhost_user_set_vring_base(int vid, struct vhost_vring_state *state)
+vhost_user_set_vring_base(struct virtio_net *dev,
+ struct vhost_vring_state *state)
 {
-   struct virtio_net *dev;
-
-   dev = get_device(vid);
-   if (dev == NULL)
-   return -1;
-
-   /* State->index refers to the queue index. The txq is 1, rxq is 0. */
dev->virtqueue[state->index]->last_used_idx = state->num;

return 0;
 }

 static int
-vhost_user_set_mem_table(int vid, struct VhostUserMsg *pmsg)
+vhost_user_set_mem_table(struct virtio_net *dev, struct VhostUserMsg *pmsg)
 {
struct VhostUserMemory memory = pmsg->payload.memory;
struct virtio_memory_regions *pregion;
uint64_t mapped_address, mapped_size;
-   struct virtio_net *dev;
unsigned int idx = 0;
struct orig_region_map *pregion_orig;
uint64_t alignment;

-   /* unmap old memory regions one by one*/
-   dev = get_device(vid);
-   if (dev == NULL)
-   return -1;
-
/* Remove from the data plane. */
if (dev->flags & VIRTIO_DEV_RUNNING) {
dev->flags &= ~VIRTIO_DEV_RUNNING;
-   notify_ops->destroy_device(vid);
+   notify_ops->destroy_device(dev->vid);
}

if (dev->mem) {
@@ -587,16 +543,12 @@ virtio_is_ready(struct virtio_net *dev)
 }

 static void
-vhost_user_set_vring_call(int vid, struct VhostUserMsg
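
The pattern this patch applies, one lookup in the dispatcher and a device pointer passed to every handler, can be sketched as follows. All names here (`struct device`, `lookup_device`, `msg_handler`, the two request types) are hypothetical simplifications, not the vhost library's own definitions.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEVICES 4

struct device {
	int vid;
	int ring_size;
};

static struct device *devices[MAX_DEVICES];

/* Hypothetical stand-in for get_device(): NULL when vid is unknown. */
static struct device *
lookup_device(int vid)
{
	if (vid < 0 || vid >= MAX_DEVICES)
		return NULL;
	return devices[vid];
}

enum request { REQ_SET_RING_SIZE, REQ_RESET };

/* Handlers take the device pointer and never repeat the lookup. */
static int
handle_set_ring_size(struct device *dev, int size)
{
	dev->ring_size = size;
	return 0;
}

static int
handle_reset(struct device *dev)
{
	dev->ring_size = 0;
	return 0;
}

/* The dispatcher looks the device up and checks it exactly once. */
static int
msg_handler(int vid, enum request req, int arg)
{
	struct device *dev = lookup_device(vid);

	if (dev == NULL)
		return -1;
	switch (req) {
	case REQ_SET_RING_SIZE:
		return handle_set_ring_size(dev, arg);
	case REQ_RESET:
		return handle_reset(dev);
	}
	return -1;
}
```

Because the NULL check lives in one place, each handler shrinks to its actual work, which matches the 47 insertions against 113 deletions reported in the diffstat.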

[dpdk-dev] [PATCH 5/7] vhost: unify function names

2016-08-18 Thread Yuanhan Liu

Some functions have the prefix "user_", while others have "vhost_".
Make them all start with "vhost_user_" to unify the function names.

Signed-off-by: Yuanhan Liu 
---
 lib/librte_vhost/vhost_user.c | 60 +--
 1 file changed, 30 insertions(+), 30 deletions(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index ada0a63..de3048c 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -134,7 +134,7 @@ vhost_backend_cleanup(struct virtio_net *dev)
  * the device hasn't been initialised.
  */
 static int
-vhost_set_owner(int vid)
+vhost_user_set_owner(int vid)
 {
struct virtio_net *dev;

@@ -146,7 +146,7 @@ vhost_set_owner(int vid)
 }

 static int
-vhost_reset_owner(int vid)
+vhost_user_reset_owner(int vid)
 {
struct virtio_net *dev;

@@ -168,7 +168,7 @@ vhost_reset_owner(int vid)
  * The features that we support are requested.
  */
 static int
-vhost_get_features(int vid, uint64_t *pu)
+vhost_user_get_features(int vid, uint64_t *pu)
 {
struct virtio_net *dev;

@@ -185,7 +185,7 @@ vhost_get_features(int vid, uint64_t *pu)
  * We receive the negotiated features supported by us and the virtio device.
  */
 static int
-vhost_set_features(int vid, uint64_t *pu)
+vhost_user_set_features(int vid, uint64_t *pu)
 {
struct virtio_net *dev;

@@ -215,7 +215,7 @@ vhost_set_features(int vid, uint64_t *pu)
  * The virtio device sends us the size of the descriptor ring.
  */
 static int
-vhost_set_vring_num(int vid, struct vhost_vring_state *state)
+vhost_user_set_vring_num(int vid, struct vhost_vring_state *state)
 {
struct virtio_net *dev;

@@ -343,7 +343,7 @@ qva_to_vva(struct virtio_net *dev, uint64_t qemu_va)
  * This function then converts these to our address space.
  */
 static int
-vhost_set_vring_addr(int vid, struct vhost_vring_addr *addr)
+vhost_user_set_vring_addr(int vid, struct vhost_vring_addr *addr)
 {
struct virtio_net *dev;
struct vhost_virtqueue *vq;
@@ -412,7 +412,7 @@ vhost_set_vring_addr(int vid, struct vhost_vring_addr *addr)
  * The virtio device sends us the available ring last used index.
  */
 static int
-vhost_set_vring_base(int vid, struct vhost_vring_state *state)
+vhost_user_set_vring_base(int vid, struct vhost_vring_state *state)
 {
struct virtio_net *dev;

@@ -427,7 +427,7 @@ vhost_set_vring_base(int vid, struct vhost_vring_state *state)
 }

 static int
-user_set_mem_table(int vid, struct VhostUserMsg *pmsg)
+vhost_user_set_mem_table(int vid, struct VhostUserMsg *pmsg)
 {
struct VhostUserMemory memory = pmsg->payload.memory;
struct virtio_memory_regions *pregion;
@@ -587,7 +587,7 @@ virtio_is_ready(struct virtio_net *dev)
 }

 static void
-user_set_vring_call(int vid, struct VhostUserMsg *pmsg)
+vhost_user_set_vring_call(int vid, struct VhostUserMsg *pmsg)
 {
struct vhost_vring_file file;
struct virtio_net *dev = get_device(vid);
@@ -629,7 +629,7 @@ user_set_vring_call(int vid, struct VhostUserMsg *pmsg)
  *  device is ready for packet processing.
  */
 static void
-user_set_vring_kick(int vid, struct VhostUserMsg *pmsg)
+vhost_user_set_vring_kick(int vid, struct VhostUserMsg *pmsg)
 {
struct vhost_vring_file file;
struct virtio_net *dev = get_device(vid);
@@ -661,7 +661,7 @@ user_set_vring_kick(int vid, struct VhostUserMsg *pmsg)
  * when virtio is stopped, qemu will send us the GET_VRING_BASE message.
  */
 static int
-user_get_vring_base(int vid, struct vhost_vring_state *state)
+vhost_user_get_vring_base(int vid, struct vhost_vring_state *state)
 {
struct virtio_net *dev = get_device(vid);

@@ -696,7 +696,7 @@ user_get_vring_base(int vid, struct vhost_vring_state *state)
  * enable the virtio queue pair.
  */
 static int
-user_set_vring_enable(int vid, struct vhost_vring_state *state)
+vhost_user_set_vring_enable(int vid, struct vhost_vring_state *state)
 {
struct virtio_net *dev;
int enable = (int)state->num;
@@ -718,7 +718,7 @@ user_set_vring_enable(int vid, struct vhost_vring_state *state)
 }

 static void
-user_set_protocol_features(int vid, uint64_t protocol_features)
+vhost_user_set_protocol_features(int vid, uint64_t protocol_features)
 {
struct virtio_net *dev;

@@ -730,7 +730,7 @@ user_set_protocol_features(int vid, uint64_t protocol_features)
 }

 static int
-user_set_log_base(int vid, struct VhostUserMsg *msg)
+vhost_user_set_log_base(int vid, struct VhostUserMsg *msg)
 {
struct virtio_net *dev;
int fd = msg->fds[0];
@@ -793,7 +793,7 @@ user_set_log_base(int vid, struct VhostUserMsg *msg)
  * a flag 'broadcast_rarp' to let rte_vhost_dequeue_burst() inject it.
  */
 static int
-user_send_rarp(int vid, struct VhostUserMsg *msg)
+vhost_user_send_rarp(int vid, struct VhostUserMsg *msg)
 {
struct virtio_net *dev;
uint8_t *mac = (uint8_t *)&msg->payload.u64;
@@ -894,14 +894,14 @@ vhost_user_msg_handler(int vid,

[dpdk-dev] [PATCH 4/7] vhost: fold common message handlers

2016-08-18 Thread Yuanhan Liu

For historical reasons (we used to have two vhost implementations), some
messages are handled in two calls: the vhost-specific implementation
handles a message first and then invokes the common one for further handling.

We have only one implementation now, so we can write one method for
each message. Fold those common handlers into the corresponding vhost-user
handlers.

Signed-off-by: Yuanhan Liu 
---
 lib/librte_vhost/vhost_user.c | 115 --
 1 file changed, 31 insertions(+), 84 deletions(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index c4714b7..ada0a63 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -426,87 +426,6 @@ vhost_set_vring_base(int vid, struct vhost_vring_state *state)
return 0;
 }

-/*
- * We send the virtio device our available ring last used index.
- */
-static int
-vhost_get_vring_base(int vid, uint32_t index,
-   struct vhost_vring_state *state)
-{
-   struct virtio_net *dev;
-
-   dev = get_device(vid);
-   if (dev == NULL)
-   return -1;
-
-   state->index = index;
-   /* State->index refers to the queue index. The txq is 1, rxq is 0. */
-   state->num = dev->virtqueue[state->index]->last_used_idx;
-
-   return 0;
-}
-
-/*
- * The virtio device sends an eventfd to interrupt the guest. This fd gets
- * copied into our process space.
- */
-static int
-vhost_set_vring_call(int vid, struct vhost_vring_file *file)
-{
-   struct virtio_net *dev;
-   struct vhost_virtqueue *vq;
-   uint32_t cur_qp_idx = file->index / VIRTIO_QNUM;
-
-   dev = get_device(vid);
-   if (dev == NULL)
-   return -1;
-
-   /*
-* FIXME: VHOST_SET_VRING_CALL is the first per-vring message
-* we get, so we do vring queue pair allocation here.
-*/
-   if (cur_qp_idx + 1 > dev->virt_qp_nb) {
-   if (alloc_vring_queue_pair(dev, cur_qp_idx) < 0)
-   return -1;
-   }
-
-   /* file->index refers to the queue index. The txq is 1, rxq is 0. */
-   vq = dev->virtqueue[file->index];
-   assert(vq != NULL);
-
-   if (vq->callfd >= 0)
-   close(vq->callfd);
-
-   vq->callfd = file->fd;
-
-   return 0;
-}
-
-/*
- * The virtio device sends an eventfd that it can use to notify us.
- * This fd gets copied into our process space.
- */
-static int
-vhost_set_vring_kick(int vid, struct vhost_vring_file *file)
-{
-   struct virtio_net *dev;
-   struct vhost_virtqueue *vq;
-
-   dev = get_device(vid);
-   if (dev == NULL)
-   return -1;
-
-   /* file->index refers to the queue index. The txq is 1, rxq is 0. */
-   vq = dev->virtqueue[file->index];
-
-   if (vq->kickfd >= 0)
-   close(vq->kickfd);
-
-   vq->kickfd = file->fd;
-
-   return 0;
-}
-
 static int
 user_set_mem_table(int vid, struct VhostUserMsg *pmsg)
 {
@@ -671,6 +590,12 @@ static void
 user_set_vring_call(int vid, struct VhostUserMsg *pmsg)
 {
struct vhost_vring_file file;
+   struct virtio_net *dev = get_device(vid);
+   struct vhost_virtqueue *vq;
+   uint32_t cur_qp_idx;
+
+   if (!dev)
+   return;

file.index = pmsg->payload.u64 & VHOST_USER_VRING_IDX_MASK;
if (pmsg->payload.u64 & VHOST_USER_VRING_NOFD_MASK)
@@ -679,7 +604,24 @@ user_set_vring_call(int vid, struct VhostUserMsg *pmsg)
file.fd = pmsg->fds[0];
RTE_LOG(INFO, VHOST_CONFIG,
"vring call idx:%d file:%d\n", file.index, file.fd);
-   vhost_set_vring_call(vid, );
+
+   /*
+* FIXME: VHOST_SET_VRING_CALL is the first per-vring message
+* we get, so we do vring queue pair allocation here.
+*/
+   cur_qp_idx = file.index / VIRTIO_QNUM;
+   if (cur_qp_idx + 1 > dev->virt_qp_nb) {
+   if (alloc_vring_queue_pair(dev, cur_qp_idx) < 0)
+   return;
+   }
+
+   vq = dev->virtqueue[file.index];
+   assert(vq != NULL);
+
+   if (vq->callfd >= 0)
+   close(vq->callfd);
+
+   vq->callfd = file.fd;
 }

 /*
@@ -691,6 +633,7 @@ user_set_vring_kick(int vid, struct VhostUserMsg *pmsg)
 {
struct vhost_vring_file file;
struct virtio_net *dev = get_device(vid);
+   struct vhost_virtqueue *vq;

if (!dev)
return;
@@ -702,7 +645,11 @@ user_set_vring_kick(int vid, struct VhostUserMsg *pmsg)
file.fd = pmsg->fds[0];
RTE_LOG(INFO, VHOST_CONFIG,
"vring kick idx:%d file:%d\n", file.index, file.fd);
-   vhost_set_vring_kick(vid, );
+
+   vq = dev->virtqueue[file.index];
+   if (vq->kickfd >= 0)
+   close(vq->kickfd);
+   vq->kickfd = file.fd;

if (virtio_is_ready(dev) && !(dev->flags & VIRTIO_DEV_RUNNING)) {
if (notify_ops->new_device(vid) == 0)
@@ -727,7 +674,7 @@ user_get_vring_base(int

[dpdk-dev] [PATCH 3/7] vhost: refactor source code structure

2016-08-18 Thread Yuanhan Liu

The code structure is a bit messy now. For example, vhost-user message
handling is spread across three different files:

vhost-net-user.c  virtio-net.c  virtio-net-user.c

Here, vhost-net-user.c is the entry point that handles all those messages
and then invokes the right method for a specific message. Some of those
methods live in virtio-net.c, while others live in virtio-net-user.c.

The truth is all of them should be in one file, vhost_user.c.

So this patch refactors the source code structure, mainly by renaming
files and moving code from one file to another that is more suitable
for storing it. No functional changes are made.

After the refactor, the code structure becomes:

- socket.c  handles all vhost-user socket file related stuff, such
as, socket file creation for server mode, reconnection
for client mode.

- vhost.c   mainly handles vhost device creation/destroy/reset.
Most of the vhost API implementation is there, too.

- vhost_user.c  everything about vhost-user message handling goes there.

- virtio_net.c  all stuff about virtio-net should go there. It has virtio
net Rx/Tx implementation only so far: it's just a rename
from vhost_rxtx.c

Signed-off-by: Yuanhan Liu 
---
 lib/librte_vhost/Makefile  |6 +-
 lib/librte_vhost/{vhost-net-user.c => socket.c}|  209 +---
 lib/librte_vhost/vhost.c   |  409 
 lib/librte_vhost/{vhost-net.h => vhost.h}  |   24 +-
 lib/librte_vhost/vhost_user.c  | 1040 
 .../{vhost-net-user.h => vhost_user.h} |   17 +-
 lib/librte_vhost/virtio-net-user.c |  470 -
 lib/librte_vhost/virtio-net-user.h |   62 --
 lib/librte_vhost/virtio-net.c  |  847 
 lib/librte_vhost/{vhost_rxtx.c => virtio_net.c}|4 +-
 10 files changed, 1489 insertions(+), 1599 deletions(-)
 rename lib/librte_vhost/{vhost-net-user.c => socket.c} (71%)
 create mode 100644 lib/librte_vhost/vhost.c
 rename lib/librte_vhost/{vhost-net.h => vhost.h} (92%)
 create mode 100644 lib/librte_vhost/vhost_user.c
 rename lib/librte_vhost/{vhost-net-user.h => vhost_user.h} (87%)
 delete mode 100644 lib/librte_vhost/virtio-net-user.c
 delete mode 100644 lib/librte_vhost/virtio-net-user.h
 delete mode 100644 lib/librte_vhost/virtio-net.c
 rename lib/librte_vhost/{vhost_rxtx.c => virtio_net.c} (99%)

diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
index 277390f..415ffc6 100644
--- a/lib/librte_vhost/Makefile
+++ b/lib/librte_vhost/Makefile
@@ -47,10 +47,8 @@ LDLIBS += -lnuma
 endif

 # all source are stored in SRCS-y
-SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := virtio-net.c vhost_rxtx.c
-SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += vhost-net-user.c
-SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += virtio-net-user.c
-SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += fd_man.c
+SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := fd_man.c socket.c vhost.c vhost_user.c \
+  virtio_net.c

 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_virtio_net.h
diff --git a/lib/librte_vhost/vhost-net-user.c b/lib/librte_vhost/socket.c
similarity index 71%
rename from lib/librte_vhost/vhost-net-user.c
rename to lib/librte_vhost/socket.c
index b35594d..bf03f84 100644
--- a/lib/librte_vhost/vhost-net-user.c
+++ b/lib/librte_vhost/socket.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -47,12 +47,10 @@
 #include 

 #include 
-#include 

 #include "fd_man.h"
-#include "vhost-net-user.h"
-#include "vhost-net.h"
-#include "virtio-net-user.h"
+#include "vhost.h"
+#include "vhost_user.h"

 /*
  * Every time rte_vhost_driver_register() is invoked, an associated
@@ -82,7 +80,7 @@ struct vhost_user {
 #define MAX_VIRTIO_BACKLOG 128

 static void vhost_user_server_new_connection(int fd, void *data, int *remove);
-static void vhost_user_msg_handler(int fd, void *dat, int *remove);
+static void vhost_user_read_cb(int fd, void *dat, int *remove);
 static int vhost_user_create_client(struct vhost_user_socket *vsocket);

 static struct vhost_user vhost_user = {
@@ -95,31 +93,8 @@ static struct vhost_user vhost_user = {
.mutex = PTHREAD_MUTEX_INITIALIZER,
 };

-static const char *vhost_message_str[VHOST_USER_MAX] = {
-   [VHOST_USER_NONE] = "VHOST_USER_NONE",
-   [VHOST_USER_GET_FEATURES] = "VHOST_USER_GET_FEATURES",
-   [VHOST_USER_SET_FEATURES] = "VHOST_USER_SET_FEATURES",
-   [VHOST_USER_SET_OWNER] = "VHOST_USER_SET_OWNER",
-   [VHOST_USER_RESET_OWNER] = "VHOST_USER_RESET_OWNER",
-   [VHOST_USER_SET_MEM_TABLE] = "VHOST_USER_SET_MEM_TABLE",
-   [VHOST_USER_SET_LOG_BASE] =

[dpdk-dev] [PATCH 2/7] vhost: remove sub source dir

2016-08-18 Thread Yuanhan Liu

We now have one vhost implementation; no sub source dir is needed.
Remove it by moving its files to the upper dir.

Signed-off-by: Yuanhan Liu 
---
 lib/librte_vhost/Makefile   | 6 +++---
 lib/librte_vhost/{vhost_user => }/fd_man.c  | 0
 lib/librte_vhost/{vhost_user => }/fd_man.h  | 0
 lib/librte_vhost/{vhost_user => }/vhost-net-user.c  | 0
 lib/librte_vhost/{vhost_user => }/vhost-net-user.h  | 0
 lib/librte_vhost/{vhost_user => }/virtio-net-user.c | 0
 lib/librte_vhost/{vhost_user => }/virtio-net-user.h | 0
 7 files changed, 3 insertions(+), 3 deletions(-)
 rename lib/librte_vhost/{vhost_user => }/fd_man.c (100%)
 rename lib/librte_vhost/{vhost_user => }/fd_man.h (100%)
 rename lib/librte_vhost/{vhost_user => }/vhost-net-user.c (100%)
 rename lib/librte_vhost/{vhost_user => }/vhost-net-user.h (100%)
 rename lib/librte_vhost/{vhost_user => }/virtio-net-user.c (100%)
 rename lib/librte_vhost/{vhost_user => }/virtio-net-user.h (100%)

diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
index fb4e7f8..277390f 100644
--- a/lib/librte_vhost/Makefile
+++ b/lib/librte_vhost/Makefile
@@ -48,9 +48,9 @@ endif

 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := virtio-net.c vhost_rxtx.c
-SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += vhost_user/vhost-net-user.c
-SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += vhost_user/virtio-net-user.c
-SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += vhost_user/fd_man.c
+SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += vhost-net-user.c
+SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += virtio-net-user.c
+SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += fd_man.c

 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_virtio_net.h
diff --git a/lib/librte_vhost/vhost_user/fd_man.c b/lib/librte_vhost/fd_man.c
similarity index 100%
rename from lib/librte_vhost/vhost_user/fd_man.c
rename to lib/librte_vhost/fd_man.c
diff --git a/lib/librte_vhost/vhost_user/fd_man.h b/lib/librte_vhost/fd_man.h
similarity index 100%
rename from lib/librte_vhost/vhost_user/fd_man.h
rename to lib/librte_vhost/fd_man.h
diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c 
b/lib/librte_vhost/vhost-net-user.c
similarity index 100%
rename from lib/librte_vhost/vhost_user/vhost-net-user.c
rename to lib/librte_vhost/vhost-net-user.c
diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.h 
b/lib/librte_vhost/vhost-net-user.h
similarity index 100%
rename from lib/librte_vhost/vhost_user/vhost-net-user.h
rename to lib/librte_vhost/vhost-net-user.h
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c 
b/lib/librte_vhost/virtio-net-user.c
similarity index 100%
rename from lib/librte_vhost/vhost_user/virtio-net-user.c
rename to lib/librte_vhost/virtio-net-user.c
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.h 
b/lib/librte_vhost/virtio-net-user.h
similarity index 100%
rename from lib/librte_vhost/vhost_user/virtio-net-user.h
rename to lib/librte_vhost/virtio-net-user.h
-- 
1.9.0

[dpdk-dev] [PATCH 1/7] vhost: remove vhost-cuse

2016-08-18 Thread Yuanhan Liu

Remove vhost-cuse code, including the eventfd_link kernel module that
is for vhost-cuse only.

The libvirt/qemu-wrap.py script is also removed, as it's mainly for
vhost-cuse usage.

As we have one vhost implementation now, only one vhost config option
is needed. Thus, CONFIG_RTE_LIBRTE_VHOST_USER is removed.

Signed-off-by: Yuanhan Liu 
---
 config/common_base|   6 +-
 lib/librte_vhost/Makefile |  13 +-
 lib/librte_vhost/eventfd_link/Makefile|  41 ---
 lib/librte_vhost/eventfd_link/eventfd_link.c  | 277 
 lib/librte_vhost/eventfd_link/eventfd_link.h  |  94 --
 lib/librte_vhost/libvirt/qemu-wrap.py | 387 ---
 lib/librte_vhost/vhost_cuse/eventfd_copy.c| 104 ---
 lib/librte_vhost/vhost_cuse/eventfd_copy.h|  45 ---
 lib/librte_vhost/vhost_cuse/vhost-net-cdev.c  | 431 -
 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c | 433 --
 lib/librte_vhost/vhost_cuse/virtio-net-cdev.h |  56 
 mk/rte.app.mk |   3 -
 12 files changed, 4 insertions(+), 1886 deletions(-)
 delete mode 100644 lib/librte_vhost/eventfd_link/Makefile
 delete mode 100644 lib/librte_vhost/eventfd_link/eventfd_link.c
 delete mode 100644 lib/librte_vhost/eventfd_link/eventfd_link.h
 delete mode 100755 lib/librte_vhost/libvirt/qemu-wrap.py
 delete mode 100644 lib/librte_vhost/vhost_cuse/eventfd_copy.c
 delete mode 100644 lib/librte_vhost/vhost_cuse/eventfd_copy.h
 delete mode 100644 lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
 delete mode 100644 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
 delete mode 100644 lib/librte_vhost/vhost_cuse/virtio-net-cdev.h

diff --git a/config/common_base b/config/common_base
index 7830535..c703908 100644
--- a/config/common_base
+++ b/config/common_base
@@ -546,13 +546,9 @@ CONFIG_RTE_KNI_VHOST_DEBUG_TX=n
 CONFIG_RTE_LIBRTE_PDUMP=y

 #
-# Compile vhost library
-# fuse-devel is needed to run vhost-cuse.
-# fuse-devel enables user space char driver development
-# vhost-user is turned on by default.
+# Compile vhost user library
 #
 CONFIG_RTE_LIBRTE_VHOST=n
-CONFIG_RTE_LIBRTE_VHOST_USER=y
 CONFIG_RTE_LIBRTE_VHOST_NUMA=n
 CONFIG_RTE_LIBRTE_VHOST_DEBUG=n

diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
index 538adb0..fb4e7f8 100644
--- a/lib/librte_vhost/Makefile
+++ b/lib/librte_vhost/Makefile
@@ -39,13 +39,8 @@ EXPORT_MAP := rte_vhost_version.map
 LIBABIVER := 3

 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 -D_FILE_OFFSET_BITS=64
-ifeq ($(CONFIG_RTE_LIBRTE_VHOST_USER),y)
 CFLAGS += -I vhost_user
 LDLIBS += -lpthread
-else
-CFLAGS += -I vhost_cuse
-LDLIBS += -lfuse
-endif

 ifeq ($(CONFIG_RTE_LIBRTE_VHOST_NUMA),y)
 LDLIBS += -lnuma
@@ -53,11 +48,9 @@ endif

 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := virtio-net.c vhost_rxtx.c
-ifeq ($(CONFIG_RTE_LIBRTE_VHOST_USER),y)
-SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += vhost_user/vhost-net-user.c 
vhost_user/virtio-net-user.c vhost_user/fd_man.c
-else
-SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += vhost_cuse/vhost-net-cdev.c 
vhost_cuse/virtio-net-cdev.c vhost_cuse/eventfd_copy.c
-endif
+SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += vhost_user/vhost-net-user.c
+SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += vhost_user/virtio-net-user.c
+SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += vhost_user/fd_man.c

 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_virtio_net.h
diff --git a/lib/librte_vhost/eventfd_link/Makefile 
b/lib/librte_vhost/eventfd_link/Makefile
deleted file mode 100644
index 3140e8b..000
--- a/lib/librte_vhost/eventfd_link/Makefile
+++ /dev/null
@@ -1,41 +0,0 @@
-#   BSD LICENSE
-#
-#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
-#   All rights reserved.
-#
-#   Redistribution and use in source and binary forms, with or without
-#   modification, are permitted provided that the following conditions
-#   are met:
-#
-# * Redistributions of source code must retain the above copyright
-#   notice, this list of conditions and the following disclaimer.
-# * Redistributions in binary form must reproduce the above copyright
-#   notice, this list of conditions and the following disclaimer in
-#   the documentation and/or other materials provided with the
-#   distribution.
-# * Neither the name of Intel Corporation nor the names of its
-#   contributors may be used to endorse or promote products derived
-#   from this software without specific prior written permission.
-#
-#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
-#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
-#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
-#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
-#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
-#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
-#   LIMITED

[dpdk-dev] [PATCH 0/7] vhost: vhost-cuse removal and code path refactoring

2016-08-18 Thread Yuanhan Liu

The first patch removes vhost-cuse (see the following link for the
deprecation notice)

http://dpdk.org/ml/archives/dev/2016-July/044080.html


After the removal, there is no reason to keep the vhost_user sub source
dir any more. This is also a chance to rename all those files in a
more appropriate way (see patch 3 for details).


---
Yuanhan Liu (7):
  vhost: remove vhost-cuse
  vhost: remove sub source dir
  vhost: refactor source code structure
  vhost: fold common message handlers
  vhost: unify function names
  vhost: get device once
  vhost: simplify features set/get

 config/common_base |   6 +-
 lib/librte_vhost/Makefile  |  13 +-
 lib/librte_vhost/eventfd_link/Makefile |  41 -
 lib/librte_vhost/eventfd_link/eventfd_link.c   | 277 ---
 lib/librte_vhost/eventfd_link/eventfd_link.h   |  94 ---
 lib/librte_vhost/{vhost_user => }/fd_man.c |   0
 lib/librte_vhost/{vhost_user => }/fd_man.h |   0
 lib/librte_vhost/libvirt/qemu-wrap.py  | 387 -
 .../{vhost_user/vhost-net-user.c => socket.c}  | 209 +
 lib/librte_vhost/vhost.c   | 409 +
 lib/librte_vhost/{vhost-net.h => vhost.h}  |  24 +-
 lib/librte_vhost/vhost_cuse/eventfd_copy.c | 104 ---
 lib/librte_vhost/vhost_cuse/eventfd_copy.h |  45 -
 lib/librte_vhost/vhost_cuse/vhost-net-cdev.c   | 431 --
 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c  | 433 --
 lib/librte_vhost/vhost_cuse/virtio-net-cdev.h  |  56 --
 lib/librte_vhost/vhost_user.c  | 917 +
 .../{vhost_user/vhost-net-user.h => vhost_user.h}  |  17 +-
 lib/librte_vhost/vhost_user/virtio-net-user.c  | 470 ---
 lib/librte_vhost/vhost_user/virtio-net-user.h  |  62 --
 lib/librte_vhost/virtio-net.c  | 847 ---
 lib/librte_vhost/{vhost_rxtx.c => virtio_net.c}|   4 +-
 mk/rte.app.mk  |   3 -
 23 files changed, 1367 insertions(+), 3482 deletions(-)
 delete mode 100644 lib/librte_vhost/eventfd_link/Makefile
 delete mode 100644 lib/librte_vhost/eventfd_link/eventfd_link.c
 delete mode 100644 lib/librte_vhost/eventfd_link/eventfd_link.h
 rename lib/librte_vhost/{vhost_user => }/fd_man.c (100%)
 rename lib/librte_vhost/{vhost_user => }/fd_man.h (100%)
 delete mode 100755 lib/librte_vhost/libvirt/qemu-wrap.py
 rename lib/librte_vhost/{vhost_user/vhost-net-user.c => socket.c} (71%)
 create mode 100644 lib/librte_vhost/vhost.c
 rename lib/librte_vhost/{vhost-net.h => vhost.h} (92%)
 delete mode 100644 lib/librte_vhost/vhost_cuse/eventfd_copy.c
 delete mode 100644 lib/librte_vhost/vhost_cuse/eventfd_copy.h
 delete mode 100644 lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
 delete mode 100644 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
 delete mode 100644 lib/librte_vhost/vhost_cuse/virtio-net-cdev.h
 create mode 100644 lib/librte_vhost/vhost_user.c
 rename lib/librte_vhost/{vhost_user/vhost-net-user.h => vhost_user.h} (87%)
 delete mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.c
 delete mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.h
 delete mode 100644 lib/librte_vhost/virtio-net.c
 rename lib/librte_vhost/{vhost_rxtx.c => virtio_net.c} (99%)

-- 
1.9.0

[dpdk-dev] [PATCH 2/2] examples/vhost: support multiple socket files

2016-08-18 Thread Yuanhan Liu

On Thu, Aug 18, 2016 at 10:27:55AM +0200, Maxime Coquelin wrote:
> 
> 
> On 08/16/2016 06:14 PM, Jiayu Hu wrote:
> >When examples/vhost runs in client mode, only one QEMU can be connected.
> >This is because that examples/vhost just supports one socket file. This
> >patch is to add multiple sockets support for examples/vhost.
> >
> >Signed-off-by: Jiayu Hu 
> >---
> > examples/vhost/main.c | 50 
> > ++
> > 1 file changed, 38 insertions(+), 12 deletions(-)
> >
> >diff --git a/examples/vhost/main.c b/examples/vhost/main.c
> >index a718577..9974f0b 100644
> >--- a/examples/vhost/main.c
> >+++ b/examples/vhost/main.c
> >@@ -136,8 +136,9 @@ static uint32_t burst_rx_delay_time = BURST_RX_WAIT_US;
> > /* Specify the number of retries on RX. */
> > static uint32_t burst_rx_retry_num = BURST_RX_RETRIES;
> >
> >-/* Socket file path. Can be set by user */
> >-static char socket_file[PATH_MAX] = "vhost-net";
> Default name being removed, you can drop my comment on patch 1. :)
> 
> >+/* Socket file paths. Can be set by user */
> >+static char *socket_files;
> >+int nb_sockets;
> Any reason not to make it static?

Right, it should be "static". Hmm, I missed it in review :(

--yliu

[dpdk-dev] [PATCH 1/2] examples/vhost: rename dev-basename

2016-08-18 Thread Yuanhan Liu

On Thu, Aug 18, 2016 at 10:22:38AM +0200, Maxime Coquelin wrote:
> Hi Jiayu,
> 
> On 08/16/2016 06:14 PM, Jiayu Hu wrote:
> >In examples/vhost, "dev-basename" is a program option, which is to set
> >the vhost-net socket used by vhost-user, or the character device used
> >by vhost-cuse. Since vhost-cuse should be dropped, and "dev-basename"
> >is not a suitable name for the vhost-net socket. Therefore, this patch
> >is to change this option name for examples/vhost.
> >
> >Signed-off-by: Jiayu Hu 
> >---
> > examples/vhost/main.c | 41 +
> > 1 file changed, 21 insertions(+), 20 deletions(-)
> >
> >diff --git a/examples/vhost/main.c b/examples/vhost/main.c
> >index 92a9823..a718577 100644
> >--- a/examples/vhost/main.c
> >+++ b/examples/vhost/main.c
> >@@ -90,9 +90,6 @@
> > /* Size of buffers used for snprintfs. */
> > #define MAX_PRINT_BUFF 6072
> >
> >-/* Maximum character device basename size. */
> >-#define MAX_BASENAME_SZ 10
> >-
> > /* Maximum long option length for option parsing. */
> > #define MAX_LONG_OPT_SZ 64
> >
> >@@ -139,8 +136,8 @@ static uint32_t burst_rx_delay_time = BURST_RX_WAIT_US;
> > /* Specify the number of retries on RX. */
> > static uint32_t burst_rx_retry_num = BURST_RX_RETRIES;
> >
> >-/* Character device basename. Can be set by user. */
> >-static char dev_basename[MAX_BASENAME_SZ] = "vhost-net";
> >+/* Socket file path. Can be set by user */
> >+static char socket_file[PATH_MAX] = "vhost-net";
> 
> Not very important, but now that we only support vhost-user,
> maybe we could default the name to "vhost-user"?
> 
> There is no real convention I think, but this is what OVS is
> used to use in its examples.

I think it doesn't matter now: since the 2nd patch, --socket-path
is mandatory and no longer optional, meaning there is no default
socket file path.

--yliu
> 
> Other than that:
> Reviewed-by: Maxime Coquelin 
> 
> Thanks,
> Maxime
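
To illustrate the point about a mandatory option with no default, here is a
minimal sketch of how such parsing could look with getopt_long(). The option
name "socket-path" is taken from the wording of the reply above, and
parse_socket_path() is a hypothetical helper made up for this example; it is
not the code from examples/vhost.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical option table: one long option that requires an argument. */
static const struct option long_opts[] = {
	{ "socket-path", required_argument, NULL, 's' },
	{ NULL, 0, NULL, 0 },
};

/*
 * Return the socket path given on the command line, or NULL when the
 * option is absent or the arguments are malformed. There is deliberately
 * no built-in default: the caller is expected to treat NULL as an error.
 */
static const char *
parse_socket_path(int argc, char **argv)
{
	const char *path = NULL;
	int opt;

	optind = 1; /* restart scanning so the helper can be called again */
	opterr = 0; /* keep getopt_long() from printing its own messages */

	while ((opt = getopt_long(argc, argv, "", long_opts, NULL)) != -1) {
		if (opt == 's')
			path = optarg;
		else
			return NULL; /* unknown option or missing argument */
	}

	return path;
}
```

A caller would check the result and exit with a usage message when it is
NULL, instead of silently falling back to a name such as "vhost-net".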

[dpdk-dev] vhost [query] : support for multiple ports and non VMDQ devices in vhost switch

2016-08-18 Thread Yuanhan Liu

On Wed, Aug 17, 2016 at 03:54:21PM +0530, Pankaj Chauhan wrote:
> My use case is that my machine/board which is not sitting as end node of
> network but somewhere in between like an router. So the traffic looks
> something like this:
> 
> Physical port 1 -> Enter VM(s) through virtio -> exit from physical port 2

I'm not quite sure whether testpmd supports this, that is, routing the
data from one specific port to another specific port. Zhihong might have
the answer.

--yliu

> For above use case i need a vhost-back-end which supports multiple physical
> ports.
> 
> Thanks for the suggestion of vhost-pmd ( i was not aware of that), i'll
> explore possibility of using it for my use case of multiple physical ports.

[dpdk-dev] [PATCH 2/2] examples/vhost: support multiple socket files

2016-08-18 Thread Yuanhan Liu

On Tue, Aug 16, 2016 at 12:14:39PM -0400, Jiayu Hu wrote:
> +/*
> + * This function is used to unregister drivers.
> + */
> +static void
> +unregister_drivers(int socket_num)
> +{

Redundant comment. The function name already explains it well.

>   /* Register vhost user driver to handle vhost messages. */
> - ret = rte_vhost_driver_register(socket_file, flags);
> - if (ret != 0)
> - rte_exit(EXIT_FAILURE, "vhost driver register failure.\n");
> + for (i = 0; i < nb_sockets; i++) {
> + ret = rte_vhost_driver_register
> + (socket_files + i * PATH_MAX, flags);
> + if (ret != 0) {
> + unregister_drivers(i);
> + rte_exit(EXIT_FAILURE, "vhost driver register 
> failure.\n");

Lines over 80 chars.

Besides, please cc corresponding maintainers while sending patches, say
cc me for virtio/vhost changes. From MAINTAINERS you could find the
names.

So, please make a v2 with the above 2 minor issues fixed. Also, please
follow the guide on http://dpdk.org/dev when sending v2:

If a previous version of the patch has already been sent, a version
number and changelog annotations are helpful:

git send-email -1 -v2 --annotate --in-reply-to 
--to dev at dpdk.org --cc 

--yliu
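
For reference, the register loop with rollback that is quoted above can be
sketched roughly as follows. This is only an illustration under stated
assumptions: stub_register() and stub_unregister() are hypothetical stand-ins
for the vhost driver register/unregister calls, and add_socket_file(),
register_all() and unregister_first() are names made up here, not taken from
the patch. The only parts borrowed from the quoted code are the flat buffer
indexed as socket_files + i * PATH_MAX and the static nb_sockets counter.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/* Hypothetical stand-ins for the real register/unregister calls. */
static int registered_count;
static int fail_at = -1; /* stub fails once this many are registered; -1 = never */

static int
stub_register(const char *path)
{
	if (registered_count == fail_at)
		return -1;
	printf("registered %s\n", path);
	registered_count++;
	return 0;
}

static void
stub_unregister(const char *path)
{
	printf("unregistered %s\n", path);
	registered_count--;
}

/* Socket file paths, stored back to back in PATH_MAX-sized slots. */
static char *socket_files;
static int nb_sockets;

/* Append one path to the flat buffer; returns 0 on success, -1 on error. */
static int
add_socket_file(const char *path)
{
	char *tmp;

	if (strlen(path) >= PATH_MAX)
		return -1;

	tmp = realloc(socket_files, (size_t)(nb_sockets + 1) * PATH_MAX);
	if (tmp == NULL)
		return -1;

	socket_files = tmp;
	snprintf(socket_files + (size_t)nb_sockets * PATH_MAX, PATH_MAX,
		 "%s", path);
	nb_sockets++;
	return 0;
}

/* Undo the registration of the first 'count' socket files. */
static void
unregister_first(int count)
{
	int i;

	for (i = 0; i < count; i++)
		stub_unregister(socket_files + (size_t)i * PATH_MAX);
}

/* Register every socket file; roll back the earlier ones on failure. */
static int
register_all(void)
{
	int i;

	for (i = 0; i < nb_sockets; i++) {
		const char *path = socket_files + (size_t)i * PATH_MAX;

		if (stub_register(path) != 0) {
			unregister_first(i);
			return -1;
		}
	}
	return 0;
}
```

Pulling the path into a local variable is also one way to keep the failure
branch well under 80 columns.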

[dpdk-dev] [PATCH v3 2/4] virtio: move SSE based Rx implementation to separate file

2016-08-18 Thread Yuanhan Liu

On Tue, Jul 05, 2016 at 06:19:24PM +0530, Jerin Jacob wrote:
> Split out SSE instruction based virtio simple Rx
> implementation to a separate file
> 
> Signed-off-by: Jerin Jacob 

Hi,

I was about to apply this set. I then did some build test and found a
weird issue: it breaks the build with clang (ubuntu 16.04).

drivers/net/virtio/virtio_rxtx_simple_sse.c:130:2: error: cast from 'const void *' to 'void *' drops const qualifier [-Werror,-Wcast-qual]
_mm_prefetch((const void *)rused, _MM_HINT_T0);
^
/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include/xmmintrin.h:684:58: note: expanded from macro '_mm_prefetch'
#define _mm_prefetch(a, sel) (__builtin_prefetch((void *)(a), 0, (sel)))
 ^
1 error generated.

Weirdly enough, I didn't see this issue before this commit: the error
line is exactly the same before and after this commit.

Another note is that _mm_prefetch() actually has a different prototype
in gcc and clang. For gcc, we have:

_mm_prefetch (const void *__P, enum _mm_hint __I)

Any thoughts?

--yliu
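
One possible way to sidestep the prototype difference, sketched here only as
an assumption and not as the fix that was applied, is to call
__builtin_prefetch() directly: both gcc and clang declare its first parameter
as const void *, so no qualifier-dropping cast is needed. prefetch_t0() and
sum_ring() are hypothetical names for illustration; locality 3 is meant to
correspond to the _MM_HINT_T0 hint used in the failing line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of DPDK or the patch under review.
 * Takes a const pointer, so callers never have to cast const away.
 */
static inline void
prefetch_t0(const void *p)
{
	/* rw = 0 (prefetch for read), locality = 3 (keep in all cache levels) */
	__builtin_prefetch(p, 0, 3);
}

/* Sum a read-only ring while prefetching a few entries ahead. */
static uint64_t
sum_ring(const uint32_t *ring, size_t n)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (i + 8 < n)
			prefetch_t0(&ring[i + 8]);
		sum += ring[i];
	}
	return sum;
}
```

The prefetch is only a hint, so the result of sum_ring() is the same with or
without it; the point is that the call compiles cleanly with -Wcast-qual.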

[dpdk-dev] [RFC PATCH 5/5] app/test_pmd: add tests for new API's

2016-08-18 Thread Bernard Iremonger

add test for vf vlan anti spoof
add test for vf mac anti spoof
add test for vf ping
add test for vf vlan strip
add test for vf vlan insert
add test for tx loopback
add test for all queues drop enable bit
add test for vf split drop enable bit
add test for vf mac address
add new APIs to the testpmd guide

Signed-off-by: Bernard Iremonger 
---
 app/test-pmd/cmdline.c  | 700 
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  68 ++-
 2 files changed, 766 insertions(+), 2 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index f90befc..12e89c3 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -10585,6 +10585,697 @@ cmdline_parse_inst_t cmd_config_e_tag_filter_del = {
},
 };

+/* vf vlan anti spoof configuration */
+
+/* Common result structure for vf vlan anti spoof */
+struct cmd_vf_vlan_anti_spoof_result {
+   cmdline_fixed_string_t set;
+   cmdline_fixed_string_t vf;
+   cmdline_fixed_string_t vlan;
+   cmdline_fixed_string_t antispoof;
+   uint8_t port_id;
+   uint32_t vf_id;
+   uint8_t on;
+};
+
+/* Common CLI fields for vf vlan anti spoof enable disable */
+cmdline_parse_token_string_t cmd_vf_vlan_anti_spoof_set =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_vf_vlan_anti_spoof_result,
+set, "set");
+cmdline_parse_token_string_t cmd_vf_vlan_anti_spoof_vf =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_vf_vlan_anti_spoof_result,
+vf, "vf");
+cmdline_parse_token_string_t cmd_vf_vlan_anti_spoof_vlan =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_vf_vlan_anti_spoof_result,
+vlan, "vlan");
+cmdline_parse_token_string_t cmd_vf_vlan_anti_spoof_antispoof =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_vf_vlan_anti_spoof_result,
+antispoof, "antispoof");
+cmdline_parse_token_num_t cmd_vf_vlan_anti_spoof_port_id =
+   TOKEN_NUM_INITIALIZER
+   (struct cmd_vf_vlan_anti_spoof_result,
+port_id, UINT8);
+cmdline_parse_token_num_t cmd_vf_vlan_anti_spoof_vf_id =
+   TOKEN_NUM_INITIALIZER
+   (struct cmd_vf_vlan_anti_spoof_result,
+vf_id, UINT32);
+cmdline_parse_token_string_t cmd_vf_vlan_anti_spoof_on =
+   TOKEN_NUM_INITIALIZER
+   (struct cmd_vf_vlan_anti_spoof_result,
+on, UINT8);
+
+static void
+cmd_set_vf_vlan_anti_spoof_parsed(
+   void *parsed_result,
+   __attribute__((unused)) struct cmdline *cl,
+   __attribute__((unused)) void *data)
+{
+   struct cmd_vf_vlan_anti_spoof_result *res = parsed_result;
+   int ret = 0;
+
+   if (port_id_is_invalid(res->port_id, ENABLED_WARN))
+   return;
+
+   if (res->vf_id > 63) {
+   printf("vf_id must be less than 64.\n");
+   return;
+   }
+   ret = rte_eth_dev_set_vf_vlan_anti_spoof(res->port_id, res->vf_id, 
res->on);
+   if (ret < 0)
+   printf("vf vlan anti spoofing programming error: (%s)\n",
+  strerror(-ret));
+}
+
+cmdline_parse_inst_t cmd_set_vf_vlan_anti_spoof = {
+   .f = cmd_set_vf_vlan_anti_spoof_parsed,
+   .data = NULL,
+   .help_str = "enable/disable vf vlan anti spoof",
+   .tokens = {
+   (void *)_vf_vlan_anti_spoof_set,
+   (void *)_vf_vlan_anti_spoof_vf,
+   (void *)_vf_vlan_anti_spoof_vlan,
+   (void *)_vf_vlan_anti_spoof_antispoof,
+   (void *)_vf_vlan_anti_spoof_port_id,
+   (void *)_vf_vlan_anti_spoof_vf_id,
+   (void *)_vf_vlan_anti_spoof_on,
+   NULL,
+   },
+};
+
+/* vf mac anti spoof configuration */
+
+/* Common result structure for vf mac anti spoof */
+struct cmd_vf_mac_anti_spoof_result {
+   cmdline_fixed_string_t set;
+   cmdline_fixed_string_t vf;
+   cmdline_fixed_string_t mac;
+   cmdline_fixed_string_t antispoof;
+   uint8_t port_id;
+   uint32_t vf_id;
+   uint8_t on;
+};
+
+/* Common CLI fields for vf mac anti spoof enable disable */
+cmdline_parse_token_string_t cmd_vf_mac_anti_spoof_set =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_vf_mac_anti_spoof_result,
+set, "set");
+cmdline_parse_token_string_t cmd_vf_mac_anti_spoof_vf =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_vf_mac_anti_spoof_result,
+vf, "vf");
+cmdline_parse_token_string_t cmd_vf_mac_anti_spoof_mac =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_vf_mac_anti_spoof_result,
+mac, "mac");
+cmdline_parse_token_string_t cmd_vf_mac_anti_spoof_antispoof =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_vf_mac_anti_spoof_result,
+antispoof, "antispoof");
+cmdline_parse_token_num_t cmd_vf_mac_anti_spoof_port_id =
+   TOKEN_NUM_INITIALIZER
+   (struct

[dpdk-dev] [RFC PATCH 4/5] net/ixgbe: add functions for VF management

2016-08-18 Thread Bernard Iremonger

Add new functions to configure and manage VFs on a Niantic NIC.

add ixgbe_vf_ping function.
add ixgbe_set_vf_vlan_anti_spoof function.
add ixgbe_set_vf_mac_anti_spoof function.

Signed-off-by: azelezniak 

add ixgbe_set_vf_vlan_strip function.
add ixgbe_set_vf_vlan_insert function.
add ixgbe_set_tx_loopback function.
add ixgbe_set_all_queues_drop_en function.
add ixgbe_set_vf_split_drop_en function.
add ixgbe_set_vf_mac_addr function.

Signed-off-by: Bernard Iremonger 
---
 drivers/net/ixgbe/ixgbe_ethdev.c | 179 +++
 1 file changed, 179 insertions(+)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index d478a15..3b0ee82 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -240,6 +240,8 @@ static void ixgbe_add_rar(struct rte_eth_dev *dev, struct 
ether_addr *mac_addr,
 static void ixgbe_remove_rar(struct rte_eth_dev *dev, uint32_t index);
 static void ixgbe_set_default_mac_addr(struct rte_eth_dev *dev,
   struct ether_addr *mac_addr);
+static int ixgbe_set_vf_mac_addr(struct rte_eth_dev *dev, uint16_t vf,
+   struct ether_addr *mac_addr);
 static void ixgbe_dcb_init(struct ixgbe_hw *hw, struct ixgbe_dcb_config 
*dcb_config);

 /* For Virtual Function support */
@@ -280,6 +282,19 @@ static int ixgbe_set_pool_rx(struct rte_eth_dev *dev, 
uint16_t pool, uint8_t on)
 static int ixgbe_set_pool_tx(struct rte_eth_dev *dev, uint16_t pool, uint8_t 
on);
 static int ixgbe_set_pool_vlan_filter(struct rte_eth_dev *dev, uint16_t vlan,
uint64_t pool_mask, uint8_t vlan_on);
+static void ixgbe_set_vf_vlan_anti_spoof(struct rte_eth_dev *dev,
+   uint16_t vf, uint8_t on);
+static void ixgbe_set_vf_mac_anti_spoof(struct rte_eth_dev *dev,
+   uint16_t vf, uint8_t on);
+static int ixgbe_vf_ping(struct rte_eth_dev *dev, int32_t vf);
+static void ixgbe_set_vf_vlan_strip(struct rte_eth_dev *dev,
+   int on, uint16_t queues_per_pool);
+static void ixgbe_set_vf_vlan_insert(struct rte_eth_dev *dev, uint16_t vf,
+   int vlan);
+static void ixgbe_set_tx_loopback(struct rte_eth_dev *dev, int on);
+static void ixgbe_set_all_queues_drop_en(struct rte_eth_dev *dev, int state);
+static void ixgbe_set_vf_split_drop_en(struct rte_eth_dev *dev, uint16_t vf,
+   int state);
 static int ixgbe_mirror_rule_set(struct rte_eth_dev *dev,
struct rte_eth_mirror_conf *mirror_conf,
uint8_t rule_id, uint8_t on);
@@ -505,6 +520,7 @@ static const struct eth_dev_ops ixgbe_eth_dev_ops = {
.mac_addr_add = ixgbe_add_rar,
.mac_addr_remove  = ixgbe_remove_rar,
.mac_addr_set = ixgbe_set_default_mac_addr,
+   .set_vf_mac_addr  = ixgbe_set_vf_mac_addr,
.uc_hash_table_set= ixgbe_uc_hash_table_set,
.uc_all_hash_table_set  = ixgbe_uc_all_hash_table_set,
.mirror_rule_set  = ixgbe_mirror_rule_set,
@@ -513,6 +529,14 @@ static const struct eth_dev_ops ixgbe_eth_dev_ops = {
.set_vf_rx= ixgbe_set_pool_rx,
.set_vf_tx= ixgbe_set_pool_tx,
.set_vf_vlan_filter   = ixgbe_set_pool_vlan_filter,
+   .set_vf_vlan_anti_spoof  = ixgbe_set_vf_vlan_anti_spoof,
+   .set_vf_mac_anti_spoof   = ixgbe_set_vf_mac_anti_spoof,
+   .vf_ping  = ixgbe_vf_ping,
+   .set_vf_vlan_strip= ixgbe_set_vf_vlan_strip,
+   .set_vf_vlan_insert   = ixgbe_set_vf_vlan_insert,
+   .set_tx_loopback  = ixgbe_set_tx_loopback,
+   .set_all_queues_drop_en = ixgbe_set_all_queues_drop_en,
+   .set_vf_split_drop_en = ixgbe_set_vf_split_drop_en,
.set_queue_rate_limit = ixgbe_set_queue_rate_limit,
.set_vf_rate_limit= ixgbe_set_vf_rate_limit,
.reta_update  = ixgbe_dev_rss_reta_update,
@@ -4012,6 +4036,22 @@ ixgbe_set_default_mac_addr(struct rte_eth_dev *dev, 
struct ether_addr *addr)
 }

 static int
+ixgbe_set_vf_mac_addr(struct rte_eth_dev *dev, uint16_t vf, struct ether_addr 
*mac_addr)
+{
+   struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct ixgbe_vf_info *vfinfo =
+   *(IXGBE_DEV_PRIVATE_TO_P_VFDATA(dev->data->dev_private));
+   int rar_entry = hw->mac.num_rar_entries - (vf + 1);
+   uint8_t *new_mac = (uint8_t *)(mac_addr);
+
+   if (is_valid_assigned_ether_addr((struct ether_addr *)new_mac)) {
+   rte_memcpy(vfinfo[vf].vf_mac_addresses, new_mac, 
ETHER_ADDR_LEN);
+   return hw->mac.ops.set_rar(hw, rar_entry, new_mac, vf, 
IXGBE_RAH_AV);
+   }
+   return -1;
+}
+
+static int
 ixgbe_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
 {
uint32_t hlreg0;
@@ -4315,6 +4355,16 @@ ixgbevf_vlan_strip_queue_set(struct rte_eth_dev *dev, 
uint16_t queue, int on)
 }

 static void
+ixgbe_set_vf_vlan_strip(struct rte_eth_dev *dev, int on,
+

[dpdk-dev] [RFC PATCH 3/5] librte_ether: add API's for VF management

2016-08-18 Thread Bernard Iremonger

Add new API functions to configure and manage VF's on a NIC.

add rte_eth_dev_vf_ping function.
add rte_eth_dev_set_vf_vlan_anti_spoof function.
add rte_eth_dev_set_vf_mac_anti_spoof function.

Signed-off-by: azelezniak 

add rte_eth_dev_set_vf_vlan_strip function.
add rte_eth_dev_set_vf_vlan_insert function.
add rte_eth_dev_set_tx_loopback function.
add rte_eth_dev_set_all_queues_drop function.
add rte_eth_dev_set_vf_split_drop_en function
add rte_eth_dev_set_vf_mac_addr function.
increment LIBABIVER to 5.

Signed-off-by: Bernard Iremonger 
---
 lib/librte_ether/rte_ethdev.c  | 159 +++
 lib/librte_ether/rte_ethdev.h  | 223 +
 lib/librte_ether/rte_ether_version.map |   9 ++
 3 files changed, 391 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 1388ea3..2a3d2ae 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -2306,6 +2306,22 @@ rte_eth_dev_default_mac_addr_set(uint8_t port_id, struct 
ether_addr *addr)
 }

 int
+rte_eth_dev_set_vf_mac_addr(uint8_t port_id, uint16_t vf, struct ether_addr 
*addr)
+{
+   struct rte_eth_dev *dev;
+
+   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+   if (!is_valid_assigned_ether_addr(addr))
+   return -EINVAL;
+
+   dev = &rte_eth_devices[port_id];
+   RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->set_vf_mac_addr, -ENOTSUP);
+
+   return (*dev->dev_ops->set_vf_mac_addr)(dev, vf, addr);
+}
+
+int
 rte_eth_dev_set_vf_rxmode(uint8_t port_id,  uint16_t vf,
uint16_t rx_mode, uint8_t on)
 {
@@ -2490,6 +2506,149 @@ rte_eth_dev_set_vf_vlan_filter(uint8_t port_id, 
uint16_t vlan_id,
   vf_mask, vlan_on);
 }

+int
+rte_eth_dev_set_vf_vlan_anti_spoof(uint8_t port_id,
+  uint16_t vf, uint8_t on)
+{
+   struct rte_eth_dev *dev;
+
+   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+   dev = &rte_eth_devices[port_id];
+   if (vf > 63) {
+   RTE_PMD_DEBUG_TRACE("VF VLAN anti spoof:VF %d > 63\n", vf);
+   return -EINVAL;
+   }
+
+   RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->set_vf_vlan_anti_spoof, 
-ENOTSUP);
+   (*dev->dev_ops->set_vf_vlan_anti_spoof)(dev, vf, on);
+   return 0;
+}
+
+int
+rte_eth_dev_set_vf_mac_anti_spoof(uint8_t port_id,
+  uint16_t vf, uint8_t on)
+{
+   struct rte_eth_dev *dev;
+
+   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+   dev = &rte_eth_devices[port_id];
+   if (vf > 63) {
+   RTE_PMD_DEBUG_TRACE("VF MAC anti spoof:VF %d > 63\n", vf);
+   return -EINVAL;
+   }
+
+   RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->set_vf_mac_anti_spoof, -ENOTSUP);
+   (*dev->dev_ops->set_vf_mac_anti_spoof)(dev, vf, on);
+   return 0;
+}
+
+int
+rte_eth_dev_vf_ping(uint8_t port_id, int32_t vf)
+{
+   struct rte_eth_dev *dev;
+
+   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+   dev = &rte_eth_devices[port_id];
+   if (vf > 63) {
+   RTE_PMD_DEBUG_TRACE("VF ping: VF %d > 63\n", vf);
+   return -EINVAL;
+   }
+
+   RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->vf_ping, -ENOTSUP);
+   return (*dev->dev_ops->vf_ping)(dev, vf);
+}
+
+int
+rte_eth_dev_set_vf_vlan_strip(uint8_t port_id, uint16_t vf, int on)
+{
+   struct rte_eth_dev *dev;
+   struct rte_eth_dev_info dev_info;
+   uint16_t queues_per_pool;
+
+   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+   dev = &rte_eth_devices[port_id];
+   if (vf > 63) {
+   RTE_PMD_DEBUG_TRACE("VF vlan strip set VF %d > 63\n", vf);
+   return -EINVAL;
+   }
+
+   RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->set_vf_vlan_strip, -ENOTSUP);
+
+   rte_eth_dev_info_get(port_id, &dev_info);
+   queues_per_pool = dev_info.vmdq_queue_num/dev_info.max_vmdq_pools;
+
+   (*dev->dev_ops->set_vf_vlan_strip)(dev, on, queues_per_pool);
+   return 0;
+}
+
+int
+rte_eth_dev_set_vf_vlan_insert(uint8_t port_id, uint16_t vf, int on)
+{
+   struct rte_eth_dev *dev;
+
+   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+   dev = &rte_eth_devices[port_id];
+   if (vf > 63) {
+   RTE_PMD_DEBUG_TRACE("VF vlan insert set VF %d > 63\n", vf);
+   return -EINVAL;
+   }
+
+   RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->set_vf_vlan_insert, -ENOTSUP);
+
+   (*dev->dev_ops->set_vf_vlan_insert)(dev, vf, on);
+   return 0;
+}
+
+int
+rte_eth_dev_set_tx_loopback(uint8_t port_id, int on)
+{
+   struct rte_eth_dev *dev;
+
+   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+   dev = &rte_eth_devices[port_id];
+
+   RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->set_tx_loopback, -ENOTSUP);
+
+   (*dev->dev_ops->set_tx_loopback)(dev, on);
+   return 0;
+}
+
+int
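
The wrappers in this patch all follow the same shape: validate the port,
validate the VF index, check that the driver implements the operation, then
dispatch. The sketch below models that shape outside DPDK. Everything
prefixed with example_ or EXAMPLE_ is a hypothetical stand-in (the device
table, the "attached" flag, the ops struct); only the order of checks and the
error codes are taken from the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_MAX_PORTS 4
#define EXAMPLE_MAX_VF    63 /* same upper bound the RFC wrappers check */

struct example_dev;

/* Hypothetical cut-down ops table with a single VF operation. */
struct example_dev_ops {
	void (*set_vf_vlan_insert)(struct example_dev *dev, uint16_t vf, int on);
};

struct example_dev {
	const struct example_dev_ops *dev_ops;
	int attached; /* stand-in for the port validity check */
	int last_vf;  /* recorded by the stub driver below */
	int last_on;
};

static struct example_dev example_devices[EXAMPLE_MAX_PORTS];

/* Stub driver callback: just records what it was asked to do. */
static void
example_drv_vlan_insert(struct example_dev *dev, uint16_t vf, int on)
{
	dev->last_vf = vf;
	dev->last_on = on;
}

static const struct example_dev_ops example_ops = {
	.set_vf_vlan_insert = example_drv_vlan_insert,
};

/* Wrapper: port check, VF range check, ops pointer check, dispatch. */
static int
example_set_vf_vlan_insert(uint8_t port_id, uint16_t vf, int on)
{
	struct example_dev *dev;

	if (port_id >= EXAMPLE_MAX_PORTS || !example_devices[port_id].attached)
		return -ENODEV;

	if (vf > EXAMPLE_MAX_VF)
		return -EINVAL;

	dev = &example_devices[port_id];
	if (dev->dev_ops == NULL || dev->dev_ops->set_vf_vlan_insert == NULL)
		return -ENOTSUP;

	dev->dev_ops->set_vf_vlan_insert(dev, vf, on);
	return 0;
}
```

A caller can tell the three failure modes apart by the return value:
-ENODEV for a bad port, -EINVAL for a VF index above 63, and -ENOTSUP when
the driver leaves the operation unset.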

[dpdk-dev] [RFC PATCH 2/5] net/ixgbe: add callback to user app on VF to PF mbox msg

2016-08-18 Thread Bernard Iremonger

call _rte_eth_dev_callback_process_vf from ixgbe_rcv_msg_from_vf function.

The callback asks the user application if it is allowed to perform
the function.
If the cb_param.retval is RTE_ETH_MB_EVENT_PROCEED then continue,
if RTE_ETH_MB_EVENT_NOOP_ACK (0), do nothing and send ACK to VF,
if RTE_ETH_MB_EVENT_NOOP_NACK (1), do nothing and send NAK to VF.

Signed-off-by: azelezniak 
Signed-off-by: Bernard Iremonger 
---
 drivers/net/ixgbe/ixgbe_pf.c | 39 ++-
 1 file changed, 34 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_pf.c b/drivers/net/ixgbe/ixgbe_pf.c
index 56393ff..bb14106 100644
--- a/drivers/net/ixgbe/ixgbe_pf.c
+++ b/drivers/net/ixgbe/ixgbe_pf.c
@@ -660,6 +660,7 @@ ixgbe_rcv_msg_from_vf(struct rte_eth_dev *dev, uint16_t vf)
struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
struct ixgbe_vf_info *vfinfo =
*IXGBE_DEV_PRIVATE_TO_P_VFDATA(dev->data->dev_private);
+   struct rte_eth_mb_event_param cb_param;

retval = ixgbe_read_mbx(hw, msgbuf, mbx_size, vf);
if (retval) {
@@ -674,27 +675,54 @@ ixgbe_rcv_msg_from_vf(struct rte_eth_dev *dev, uint16_t 
vf)
/* flush the ack before we write any messages back */
IXGBE_WRITE_FLUSH(hw);

+   /**
+* initialise structure to send to user application
+* will return response from user in retval field
+*/
+   cb_param.retval = RTE_ETH_MB_EVENT_PROCEED;
+   cb_param.vfid = vf;
+   cb_param.msg_type = msgbuf[0] & 0xFFFF;
+   cb_param.userdata = (void *)msgbuf;
+
/* perform VF reset */
if (msgbuf[0] == IXGBE_VF_RESET) {
int ret = ixgbe_vf_reset(dev, vf, msgbuf);

vfinfo[vf].clear_to_send = true;
+
+   /* notify application about VF reset */
+   _rte_eth_dev_callback_process_vf(dev, RTE_ETH_EVENT_VF_MBOX, &cb_param);
return ret;
}

+   /**
+* ask user application if we allowed to perform those functions
+* if we get cb_param.retval == RTE_ETH_MB_EVENT_PROCEED then business
+* as usual,
+* if 0, do nothing and send ACK to VF
+* if cb_param.retval > 1, do nothing and send NAK to VF
+*/
+   _rte_eth_dev_callback_process_vf(dev, RTE_ETH_EVENT_VF_MBOX, &cb_param);
+
+   retval = cb_param.retval;
+
/* check & process VF to PF mailbox message */
switch ((msgbuf[0] & 0xFFFF)) {
case IXGBE_VF_SET_MAC_ADDR:
-   retval = ixgbe_vf_set_mac_addr(dev, vf, msgbuf);
+   if (retval == RTE_ETH_MB_EVENT_PROCEED)
+   retval = ixgbe_vf_set_mac_addr(dev, vf, msgbuf);
break;
case IXGBE_VF_SET_MULTICAST:
-   retval = ixgbe_vf_set_multicast(dev, vf, msgbuf);
+   if (retval == RTE_ETH_MB_EVENT_PROCEED)
+   retval = ixgbe_vf_set_multicast(dev, vf, msgbuf);
break;
case IXGBE_VF_SET_LPE:
-   retval = ixgbe_set_vf_lpe(dev, vf, msgbuf);
+   if (retval == RTE_ETH_MB_EVENT_PROCEED)
+   retval = ixgbe_set_vf_lpe(dev, vf, msgbuf);
break;
case IXGBE_VF_SET_VLAN:
-   retval = ixgbe_vf_set_vlan(dev, vf, msgbuf);
+   if (retval == RTE_ETH_MB_EVENT_PROCEED)
+   retval = ixgbe_vf_set_vlan(dev, vf, msgbuf);
break;
case IXGBE_VF_API_NEGOTIATE:
retval = ixgbe_negotiate_vf_api(dev, vf, msgbuf);
@@ -704,7 +732,8 @@ ixgbe_rcv_msg_from_vf(struct rte_eth_dev *dev, uint16_t vf)
msg_size = IXGBE_VF_GET_QUEUE_MSG_SIZE;
break;
case IXGBE_VF_UPDATE_XCAST_MODE:
-   retval = ixgbe_set_vf_mc_promisc(dev, vf, msgbuf);
+   if (retval == RTE_ETH_MB_EVENT_PROCEED)
+   retval = ixgbe_set_vf_mc_promisc(dev, vf, msgbuf);
break;
default:
PMD_DRV_LOG(DEBUG, "Unhandled Msg %8.8x", (unsigned)msgbuf[0]);
-- 
2.9.0
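
To make the handshake described in this patch concrete, here is a minimal
sketch of what an application-side callback might look like. The enum and
struct are local copies of the definitions proposed in RFC patch 1/5 so that
the sketch compiles without DPDK. The message-type codes, the policy, and the
example_ names are assumptions made up for illustration, and the event
argument is a plain int rather than enum rte_eth_event_type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Local stand-ins that mirror the definitions proposed in RFC patch 1/5.
 * A real application would get them from rte_ethdev.h instead.
 */
enum rte_eth_mb_event_rsp {
	RTE_ETH_MB_EVENT_NOOP_ACK,  /* skip mbox request and ACK */
	RTE_ETH_MB_EVENT_NOOP_NACK, /* skip mbox request and NACK */
	RTE_ETH_MB_EVENT_PROCEED,   /* proceed with mbox request */
	RTE_ETH_MB_EVENT_MAX
};

struct rte_eth_mb_event_param {
	uint16_t vfid;
	uint16_t msg_type;
	uint16_t retval;
	void *userdata;
};

/* Hypothetical message-type codes, for illustration only. */
enum {
	EXAMPLE_MSG_SET_MAC_ADDR = 0x02,
	EXAMPLE_MSG_SET_VLAN     = 0x04
};

/*
 * Hypothetical policy: VF 0 may do anything; other VFs get a NACK when
 * they try to change their MAC address and have VLAN requests skipped
 * with an ACK. Everything else proceeds as usual.
 */
static void
example_vf_mbox_cb(uint8_t port_id, int event, void *cb_arg)
{
	struct rte_eth_mb_event_param *p = cb_arg;

	(void)port_id;
	(void)event;
	assert(p != NULL);

	if (p->vfid == 0)
		p->retval = RTE_ETH_MB_EVENT_PROCEED;
	else if (p->msg_type == EXAMPLE_MSG_SET_MAC_ADDR)
		p->retval = RTE_ETH_MB_EVENT_NOOP_NACK;
	else if (p->msg_type == EXAMPLE_MSG_SET_VLAN)
		p->retval = RTE_ETH_MB_EVENT_NOOP_ACK;
	else
		p->retval = RTE_ETH_MB_EVENT_PROCEED;
}
```

In an application built against this series, a function with the real
rte_eth_dev_cb_fn signature would presumably be registered with
rte_eth_dev_callback_register() for RTE_ETH_EVENT_VF_MBOX; the driver then
reads the decision back from the retval field.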

[dpdk-dev] [RFC PATCH 1/5] librte_ether: add internal callback functions

2016-08-18 Thread Bernard Iremonger

add _rte_eth_dev_callback_process_vf function.
add _rte_eth_dev_callback_process_generic function

Adding a callback to the user application on VF to PF mailbox message,
allows passing information to the application controlling the PF
when a VF mailbox event message is received, such as VF reset.

Signed-off-by: azelezniak 
Signed-off-by: Bernard Iremonger 
---
 lib/librte_ether/rte_ethdev.c  | 17 ++
 lib/librte_ether/rte_ethdev.h  | 61 ++
 lib/librte_ether/rte_ether_version.map |  7 
 3 files changed, 85 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index f62a9ec..1388ea3 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -2690,6 +2690,20 @@ void
 _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
enum rte_eth_event_type event)
 {
+   return _rte_eth_dev_callback_process_generic(dev, event, NULL);
+}
+
+void
+_rte_eth_dev_callback_process_vf(struct rte_eth_dev *dev,
+   enum rte_eth_event_type event, void *param)
+{
+   return _rte_eth_dev_callback_process_generic(dev, event, param);
+}
+
+void
+_rte_eth_dev_callback_process_generic(struct rte_eth_dev *dev,
+   enum rte_eth_event_type event, void *param)
+{
struct rte_eth_dev_callback *cb_lst;
struct rte_eth_dev_callback dev_cb;

@@ -2699,6 +2713,9 @@ _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
continue;
dev_cb = *cb_lst;
cb_lst->active = 1;
+   if (param != NULL)
+   dev_cb.cb_arg = (void *) param;
+
rte_spinlock_unlock(&rte_eth_dev_cb_lock);
dev_cb.cb_fn(dev->data->port_id, dev_cb.event,
dev_cb.cb_arg);
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index b0fe033..4fb0b9c 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -3047,9 +3047,27 @@ enum rte_eth_event_type {
/**< queue state event (enabled/disabled) */
RTE_ETH_EVENT_INTR_RESET,
/**< reset interrupt event, sent to VF on PF reset */
+   RTE_ETH_EVENT_VF_MBOX,  /**< PF mailbox processing callback */
RTE_ETH_EVENT_MAX   /**< max value of this enum */
 };

+/**
+ * Response sent back to ixgbe driver from user app after callback
+ */
+enum rte_eth_mb_event_rsp {
+   RTE_ETH_MB_EVENT_NOOP_ACK,  /**< skip mbox request and ACK */
+   RTE_ETH_MB_EVENT_NOOP_NACK, /**< skip mbox request and NACK */
+   RTE_ETH_MB_EVENT_PROCEED,  /**< proceed with mbox request  */
+   RTE_ETH_MB_EVENT_MAX   /**< max value of this enum */
+};
+
+struct rte_eth_mb_event_param {
+   uint16_t vfid;
+   uint16_t msg_type;
+   uint16_t retval;
+   void *userdata;
+};
+
 typedef void (*rte_eth_dev_cb_fn)(uint8_t port_id, \
enum rte_eth_event_type event, void *cb_arg);
 /**< user application callback to be registered for interrupts */
@@ -3114,6 +3132,49 @@ void _rte_eth_dev_callback_process(struct rte_eth_dev 
*dev,
enum rte_eth_event_type event);

 /**
+ * @internal Executes all the user application registered callbacks for
+ * the specific device where parameter have to be passed to user application.
+ * It is for DPDK internal user only. User application should not call it
+ * directly.
+ *
+ * @param dev
+ *  Pointer to struct rte_eth_dev.
+ * @param event
+ *  Eth device interrupt event type.
+ *
+ * @param param
+ *  parameters to pass back to user application.
+ *
+ * @return
+ *  void
+ */
+
+void
+_rte_eth_dev_callback_process_vf(struct rte_eth_dev *dev,
+   enum rte_eth_event_type event, void *param);
+
+/**
+ * @internal Executes all the user application registered callbacks. Used by:
+ * _rte_eth_dev_callback_process and _rte_eth_dev_callback_process_vf
+ * It is for DPDK internal user only. User application should not call it
+ * directly.
+ *
+ * @param dev
+ *  Pointer to struct rte_eth_dev.
+ * @param event
+ *  Eth device interrupt event type.
+ *
+ * @param param
+ *  parameters to pass back to user application.
+ *
+ * @return
+ *  void
+ */
+void
+_rte_eth_dev_callback_process_generic(struct rte_eth_dev *dev,
+   enum rte_eth_event_type event, void *param);
+
+/**
  * When there is no rx packet coming in Rx Queue for a long time, we can
  * sleep lcore related to RX Queue for power saving, and enable rx interrupt
  * to be triggered when rx packect arrives.
diff --git a/lib/librte_ether/rte_ether_version.map 
b/lib/librte_ether/rte_ether_version.map
index 45ddf44..cb7ef15 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -139,3 +139,10 @@ DPDK_16.07 {
rte_eth_dev_get_port_by_name;
rte_eth_xstats_get_names;
 } DPDK_16.04;
+
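
The dispatch change in this patch can be modelled in a few lines. The sketch
below is not the DPDK implementation: it uses a plain array instead of the
callback list, leaves out the spinlock and the "active" flag, and every
example_ name is hypothetical. It only shows the one behaviour the patch adds,
namely that a non-NULL param replaces the registered cb_arg on a local copy of
the callback entry.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*example_cb_fn)(int port_id, int event, void *cb_arg);

/* Hypothetical stand-in for struct rte_eth_dev_callback. */
struct example_callback {
	example_cb_fn cb_fn;
	void *cb_arg; /* argument supplied at registration time */
	int event;    /* event this callback was registered for */
};

/* Records the argument the callback was last invoked with. */
static void *example_last_arg;

static void
example_record_cb(int port_id, int event, void *cb_arg)
{
	(void)port_id;
	(void)event;
	example_last_arg = cb_arg;
}

/*
 * Model of the generic dispatcher: call every callback registered for
 * "event". If "param" is non-NULL it is passed instead of the registered
 * cb_arg. The override happens on a copy, so the registration is untouched.
 */
static void
example_callback_process(const struct example_callback *cbs, size_t n,
			 int port_id, int event, void *param)
{
	size_t i;

	assert(cbs != NULL || n == 0);

	for (i = 0; i < n; i++) {
		struct example_callback cb = cbs[i]; /* like dev_cb in the patch */

		if (cb.cb_fn == NULL || cb.event != event)
			continue;
		if (param != NULL)
			cb.cb_arg = param;
		cb.cb_fn(port_id, cb.event, cb.cb_arg);
	}
}
```

One consequence worth noting: when the driver supplies a param, the
application does not see the cb_arg it registered, so any per-application
context has to be reachable some other way.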

[dpdk-dev] [RFC PATCH 0/5] add API's for VF management

2016-08-18 Thread Bernard Iremonger

This RFC patchset contains new DPDK APIs requested by AT&T for use
with the Virtual Function Daemon (VFD).

The need to configure and manage VFs on a NIC has grown to the
point where AT&T have developed a DPDK based tool, VFD, to do this.

This RFC proposes to add the following API extensions to DPDK:
  mailbox communication callback support
  VF configuration

Nine new functions have been added to the eth_dev_ops structure.
Corresponding functions have been added to the ixgbe PMD for the
Niantic NIC.

Two new callback functions have been added.
Changes have been made to the ixgbe_rcv_msg_from_vf function to
use the callback functions.

Changes have been made to testpmd to facilitate testing of the new API's.
The testpmd documentation has been updated to document the testpmd changes.

Note:
Adding new functions to the eth_dev_ops structure will cause an
ABI breakage.

Bernard Iremonger (5):
  librte_ether: add internal callback functions
  net/ixgbe: add callback to user app on VF to PF mbox msg
  librte_ether: add API's for VF management
  net/ixgbe: add functions for VF management
  app/test_pmd: add tests for new API's

 app/test-pmd/cmdline.c  | 700 
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  68 ++-
 drivers/net/ixgbe/ixgbe_ethdev.c| 179 +++
 drivers/net/ixgbe/ixgbe_pf.c|  39 +-
 lib/librte_ether/rte_ethdev.c   | 176 +++
 lib/librte_ether/rte_ethdev.h   | 284 +++
 lib/librte_ether/rte_ether_version.map  |  16 +
 7 files changed, 1455 insertions(+), 7 deletions(-)

-- 
2.9.0
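
The ABI note in the cover letter can be illustrated with a small sketch. The
two structs below are hypothetical, cut-down stand-ins for eth_dev_ops before
and after the series, under the assumption that a new member is inserted
ahead of an existing one. A binary compiled against the old layout would then
read the wrong slot.

```c
#include <stddef.h>

typedef int (*example_op_t)(void);

/* Hypothetical "before" layout. */
struct example_ops_v1 {
	example_op_t mac_addr_set;
	example_op_t uc_hash_table_set;
};

/* Hypothetical "after" layout with one member inserted mid-struct. */
struct example_ops_v2 {
	example_op_t mac_addr_set;
	example_op_t set_vf_mac_addr; /* new member */
	example_op_t uc_hash_table_set;
};

/* Returns non-zero when an existing member moved between the layouts. */
static int
example_layout_changed(void)
{
	return offsetof(struct example_ops_v1, uc_hash_table_set) !=
	       offsetof(struct example_ops_v2, uc_hash_table_set);
}
```

Appending new members at the end would keep existing offsets stable but still
changes the struct size, which is why a LIBABIVER bump accompanies the series.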

[dpdk-dev] VLAN with ixgbevf pmd

2016-08-18 Thread Dey, Souvik

Hi,



I am trying to get tagged packets to work in SR-IOV mode. I am using DPDK 2.1
with an application on KVM.

The setup is as below. The same configuration works for untagged packets.

Guest VM (VF, with tagged KNI interfaces created) ---> KVM host (PF, no tag
on the VF) ---> Client server

When a packet is tagged (VLAN tag/ID) and sent from the KNI interface to the
application, it is received with the tag and is also sent out to the PMD.
But the packets do not leave the host, and no tagged packets arrive from
outside at the application either. ol_flags is set to 0 in my application.

Can someone let me know what I am missing? Do we need some specific
configuration on the RX or TX ports for this? Does the VLAN ID configured on
the KNI get percolated down to the VF on the host as well, and is that what
causes the issue?



Any help would be highly appreciated!



--

Regards,

Souvik

[dpdk-dev] [PATCH 2/2] app/test: add test cases for NULL for Intel QAT driver

2016-08-18 Thread Deepak Kumar Jain

Added NULL algorithm to test file for Intel(R) QuickAssist
Technology Driver

Signed-off-by: Deepak Kumar Jain 
---
 app/test/test_cryptodev.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/app/test/test_cryptodev.c b/app/test/test_cryptodev.c
index 8553759..67ca912 100644
--- a/app/test/test_cryptodev.c
+++ b/app/test/test_cryptodev.c
@@ -4136,6 +4136,16 @@ static struct unit_test_suite cryptodev_qat_testsuite  = 
{
TEST_CASE_ST(ut_setup, ut_teardown,
test_MD5_HMAC_verify_case_2),

+   /** NULL tests */
+   TEST_CASE_ST(ut_setup, ut_teardown,
+   test_null_auth_only_operation),
+   TEST_CASE_ST(ut_setup, ut_teardown,
+   test_null_cipher_only_operation),
+   TEST_CASE_ST(ut_setup, ut_teardown,
+   test_null_cipher_auth_operation),
+   TEST_CASE_ST(ut_setup, ut_teardown,
+   test_null_auth_cipher_operation),
+
TEST_CASES_END() /**< NULL terminate unit test array */
}
 };
-- 
2.5.5

[dpdk-dev] [PATCH 1/2] crypto/qat: add NULL capability to Intel QAT driver

2016-08-18 Thread Deepak Kumar Jain

enabled NULL crypto for Intel(R) QuickAssist Technology

Signed-off-by: Deepak Kumar Jain 
---
 doc/guides/cryptodevs/qat.rst| 3 ++-
 drivers/crypto/qat/qat_adf/qat_algs_build_desc.c | 2 ++
 drivers/crypto/qat/qat_crypto.c  | 4 
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/doc/guides/cryptodevs/qat.rst b/doc/guides/cryptodevs/qat.rst
index 78a734f..bb62f22 100644
--- a/doc/guides/cryptodevs/qat.rst
+++ b/doc/guides/cryptodevs/qat.rst
@@ -49,6 +49,7 @@ Cipher algorithms:
 * ``RTE_CRYPTO_SYM_CIPHER_AES256_CTR``
 * ``RTE_CRYPTO_SYM_CIPHER_SNOW3G_UEA2``
 * ``RTE_CRYPTO_CIPHER_AES_GCM``
+* ``RTE_CRYPTO_CIPHER_NULL``

 Hash algorithms:

@@ -60,7 +61,7 @@ Hash algorithms:
 * ``RTE_CRYPTO_AUTH_AES_XCBC_MAC``
 * ``RTE_CRYPTO_AUTH_SNOW3G_UIA2``
 * ``RTE_CRYPTO_AUTH_MD5_HMAC``
-
+* ``RTE_CRYPTO_AUTH_NULL``

 Limitations
 ---
diff --git a/drivers/crypto/qat/qat_adf/qat_algs_build_desc.c 
b/drivers/crypto/qat/qat_adf/qat_algs_build_desc.c
index af8c176..d9437bc 100644
--- a/drivers/crypto/qat/qat_adf/qat_algs_build_desc.c
+++ b/drivers/crypto/qat/qat_adf/qat_algs_build_desc.c
@@ -720,6 +720,8 @@ int qat_alg_aead_session_create_content_desc_auth(struct 
qat_session *cdesc,
}
state2_size = ICP_QAT_HW_MD5_STATE2_SZ;
break;
+   case ICP_QAT_HW_AUTH_ALGO_NULL:
+   break;
default:
PMD_DRV_LOG(ERR, "Invalid HASH alg %u", cdesc->qat_hash_alg);
return -EFAULT;
diff --git a/drivers/crypto/qat/qat_crypto.c b/drivers/crypto/qat/qat_crypto.c
index a474512..434ff81 100644
--- a/drivers/crypto/qat/qat_crypto.c
+++ b/drivers/crypto/qat/qat_crypto.c
@@ -427,6 +427,8 @@ qat_crypto_sym_configure_session_cipher(struct 
rte_cryptodev *dev,
session->qat_mode = ICP_QAT_HW_CIPHER_ECB_MODE;
break;
case RTE_CRYPTO_CIPHER_NULL:
+   session->qat_mode = ICP_QAT_HW_CIPHER_ECB_MODE;
+   break;
case RTE_CRYPTO_CIPHER_3DES_ECB:
case RTE_CRYPTO_CIPHER_3DES_CBC:
case RTE_CRYPTO_CIPHER_AES_ECB:
@@ -558,6 +560,8 @@ qat_crypto_sym_configure_session_auth(struct rte_cryptodev 
*dev,
session->qat_hash_alg = ICP_QAT_HW_AUTH_ALGO_MD5;
break;
case RTE_CRYPTO_AUTH_NULL:
+   session->qat_hash_alg = ICP_QAT_HW_AUTH_ALGO_NULL;
+   break;
case RTE_CRYPTO_AUTH_SHA1:
case RTE_CRYPTO_AUTH_SHA256:
case RTE_CRYPTO_AUTH_SHA512:
-- 
2.5.5

[dpdk-dev] [PATCH 0/2] add NULL crypto support in Intel QAT driver

2016-08-18 Thread Deepak Kumar Jain

This patchset adds support of NULL crypto in Intel(R) QuickAssist Technology 
driver.

This patchset depends on following patchset:
"crypto/qat: add aes-sha384-hmac capability to Intel QAT driver"
(http://dpdk.org/dev/patchwork/patch/15228/)

Deepak Kumar Jain (2):
  crypto/qat: add NULL capability to Intel QAT driver
  app/test: add test cases for NULL for Intel QAT driver

 app/test/test_cryptodev.c| 10 ++
 doc/guides/cryptodevs/qat.rst|  3 ++-
 drivers/crypto/qat/qat_adf/qat_algs_build_desc.c |  2 ++
 drivers/crypto/qat/qat_crypto.c  |  4 
 4 files changed, 18 insertions(+), 1 deletion(-)

-- 
2.5.5

[dpdk-dev] [PATCH 2/2] app/test: add test cases for aes-sha384-hmac for Intel QAT driver

2016-08-18 Thread Deepak Kumar Jain

From: "Jain, Deepak K" 

Added aes-sha384-hmac algorithm to test file for Intel(R) QuickAssist
Technology Driver

Signed-off-by: Deepak Kumar Jain 
---
 app/test/test_cryptodev_aes.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/app/test/test_cryptodev_aes.c b/app/test/test_cryptodev_aes.c
index 6ad2674..e19c45b 100644
--- a/app/test/test_cryptodev_aes.c
+++ b/app/test/test_cryptodev_aes.c
@@ -226,14 +226,16 @@ static const struct aes_test_case aes_test_cases[] = {
.test_descr = "AES-128-CBC HMAC-SHA384 Encryption Digest",
.test_data = &aes_test_data_9,
.op_mask = AES_TEST_OP_ENC_AUTH_GEN,
-   .pmd_mask = AES_TEST_TARGET_PMD_MB
+   .pmd_mask = AES_TEST_TARGET_PMD_MB |
+   AES_TEST_TARGET_PMD_QAT
},
{
.test_descr = "AES-128-CBC HMAC-SHA384 Decryption Digest "
"Verify",
.test_data = &aes_test_data_9,
.op_mask = AES_TEST_OP_AUTH_VERIFY_DEC,
-   .pmd_mask = AES_TEST_TARGET_PMD_MB
+   .pmd_mask = AES_TEST_TARGET_PMD_MB |
+   AES_TEST_TARGET_PMD_QAT
},
 };

-- 
2.5.5

[dpdk-dev] [PATCH 1/2] crypto/qat: add aes-sha384-hmac capability to Intel QAT driver

2016-08-18 Thread Deepak Kumar Jain

From: "Jain, Deepak K" 

enabled support of aes-sha384-hmac in Intel(R) QuickAssist driver

Signed-off-by: Deepak Kumar Jain 
---
 doc/guides/cryptodevs/qat.rst|  1 +
 drivers/crypto/qat/qat_adf/qat_algs_build_desc.c | 33 
 drivers/crypto/qat/qat_crypto.c  | 10 ---
 3 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/doc/guides/cryptodevs/qat.rst b/doc/guides/cryptodevs/qat.rst
index 7f630be..78a734f 100644
--- a/doc/guides/cryptodevs/qat.rst
+++ b/doc/guides/cryptodevs/qat.rst
@@ -55,6 +55,7 @@ Hash algorithms:
 * ``RTE_CRYPTO_AUTH_SHA1_HMAC``
 * ``RTE_CRYPTO_AUTH_SHA224_HMAC``
 * ``RTE_CRYPTO_AUTH_SHA256_HMAC``
+* ``RTE_CRYPTO_AUTH_SHA384_HMAC``
 * ``RTE_CRYPTO_AUTH_SHA512_HMAC``
 * ``RTE_CRYPTO_AUTH_AES_XCBC_MAC``
 * ``RTE_CRYPTO_AUTH_SNOW3G_UIA2``
diff --git a/drivers/crypto/qat/qat_adf/qat_algs_build_desc.c 
b/drivers/crypto/qat/qat_adf/qat_algs_build_desc.c
index 77e6548..af8c176 100644
--- a/drivers/crypto/qat/qat_adf/qat_algs_build_desc.c
+++ b/drivers/crypto/qat/qat_adf/qat_algs_build_desc.c
@@ -77,6 +77,9 @@ static int qat_hash_get_state1_size(enum icp_qat_hw_auth_algo 
qat_hash_alg)
case ICP_QAT_HW_AUTH_ALGO_SHA256:
return QAT_HW_ROUND_UP(ICP_QAT_HW_SHA256_STATE1_SZ,
QAT_HW_DEFAULT_ALIGNMENT);
+   case ICP_QAT_HW_AUTH_ALGO_SHA384:
+   return QAT_HW_ROUND_UP(ICP_QAT_HW_SHA384_STATE1_SZ,
+   QAT_HW_DEFAULT_ALIGNMENT);
case ICP_QAT_HW_AUTH_ALGO_SHA512:
return QAT_HW_ROUND_UP(ICP_QAT_HW_SHA512_STATE1_SZ,
QAT_HW_DEFAULT_ALIGNMENT);
@@ -114,6 +117,8 @@ static int qat_hash_get_digest_size(enum 
icp_qat_hw_auth_algo qat_hash_alg)
return ICP_QAT_HW_SHA224_STATE1_SZ;
case ICP_QAT_HW_AUTH_ALGO_SHA256:
return ICP_QAT_HW_SHA256_STATE1_SZ;
+   case ICP_QAT_HW_AUTH_ALGO_SHA384:
+   return ICP_QAT_HW_SHA384_STATE1_SZ;
case ICP_QAT_HW_AUTH_ALGO_SHA512:
return ICP_QAT_HW_SHA512_STATE1_SZ;
case ICP_QAT_HW_AUTH_ALGO_MD5:
@@ -138,6 +143,8 @@ static int qat_hash_get_block_size(enum 
icp_qat_hw_auth_algo qat_hash_alg)
return SHA256_CBLOCK;
case ICP_QAT_HW_AUTH_ALGO_SHA256:
return SHA256_CBLOCK;
+   case ICP_QAT_HW_AUTH_ALGO_SHA384:
+   return SHA512_CBLOCK;
case ICP_QAT_HW_AUTH_ALGO_SHA512:
return SHA512_CBLOCK;
case ICP_QAT_HW_AUTH_ALGO_GALOIS_128:
@@ -187,6 +194,17 @@ static int partial_hash_sha256(uint8_t *data_in, uint8_t 
*data_out)
return 0;
 }

+static int partial_hash_sha384(uint8_t *data_in, uint8_t *data_out)
+{
+   SHA512_CTX ctx;
+
+   if (!SHA384_Init(&ctx))
+   return -EFAULT;
+   SHA512_Transform(&ctx, data_in);
+   rte_memcpy(data_out, &ctx, SHA512_DIGEST_LENGTH);
+   return 0;
+}
+
 static int partial_hash_sha512(uint8_t *data_in, uint8_t *data_out)
 {
SHA512_CTX ctx;
@@ -252,6 +270,13 @@ static int partial_hash_compute(enum icp_qat_hw_auth_algo 
hash_alg,
*hash_state_out_be32 =
rte_bswap32(*(((uint32_t *)digest)+i));
break;
+   case ICP_QAT_HW_AUTH_ALGO_SHA384:
+   if (partial_hash_sha384(data_in, digest))
+   return -EFAULT;
+   for (i = 0; i < digest_size >> 3; i++, hash_state_out_be64++)
+   *hash_state_out_be64 =
+   rte_bswap64(*(((uint64_t *)digest)+i));
+   break;
case ICP_QAT_HW_AUTH_ALGO_SHA512:
if (partial_hash_sha512(data_in, digest))
return -EFAULT;
@@ -616,6 +641,14 @@ int qat_alg_aead_session_create_content_desc_auth(struct 
qat_session *cdesc,
}
state2_size = ICP_QAT_HW_SHA256_STATE2_SZ;
break;
+   case ICP_QAT_HW_AUTH_ALGO_SHA384:
+   if (qat_alg_do_precomputes(ICP_QAT_HW_AUTH_ALGO_SHA384,
+   authkey, authkeylen, cdesc->cd_cur_ptr, &state1_size)) {
+   PMD_DRV_LOG(ERR, "(SHA)precompute failed");
+   return -EFAULT;
+   }
+   state2_size = ICP_QAT_HW_SHA384_STATE2_SZ;
+   break;
case ICP_QAT_HW_AUTH_ALGO_SHA512:
if (qat_alg_do_precomputes(ICP_QAT_HW_AUTH_ALGO_SHA512,
authkey, authkeylen, cdesc->cd_cur_ptr, &state1_size)) {
diff --git a/drivers/crypto/qat/qat_crypto.c b/drivers/crypto/qat/qat_crypto.c
index e872759..a474512 100644
--- a/drivers/crypto/qat/qat_crypto.c
+++ b/drivers/crypto/qat/qat_crypto.c
@@ -533,15 +533,18 @@ qat_crypto_sym_configure_session_auth(struct 
rte_cryptodev *dev,
case RTE_CRYPTO_AUTH_SHA1_HMAC:

[dpdk-dev] [PATCH 0/2] add aes-sha384-hmac support to Intel QAT driver

2016-08-18 Thread Deepak Kumar Jain

This patchset adds support of aes-sha384-hmac
in Intel(R) QuickAssist Technology driver.

This patchset depends on following patchset:
"crypto/qat: add aes-sha224-hmac capability to Intel QAT driver"
(http://dpdk.org/dev/patchwork/patch/15226/)

Jain, Deepak K (2):
  crypto/qat: add aes-sha384-hmac capability to Intel QAT driver
  app/test: add test cases for aes-sha384-hmac for Intel QAT driver

 app/test/test_cryptodev_aes.c|  6 +++--
 doc/guides/cryptodevs/qat.rst|  1 +
 drivers/crypto/qat/qat_adf/qat_algs_build_desc.c | 33 
 drivers/crypto/qat/qat_crypto.c  | 10 ---
 4 files changed, 44 insertions(+), 6 deletions(-)

-- 
2.5.5
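
The SHA-384 case in partial_hash_compute treats the intermediate hash state
as 64-bit words and byte-swaps each one, looping digest_size >> 3 times. The
sketch below shows that conversion in isolation. The example_ helpers are
hypothetical and written in portable C; the driver itself uses rte_bswap64,
and the 64-byte state size used in the usage note is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable 64-bit byte swap, a stand-in for rte_bswap64(). */
static uint64_t
example_bswap64(uint64_t x)
{
	/* swap 32-bit halves, then 16-bit pairs, then bytes */
	x = (x << 32) | (x >> 32);
	x = ((x & 0x0000FFFF0000FFFFULL) << 16) |
	    ((x >> 16) & 0x0000FFFF0000FFFFULL);
	x = ((x & 0x00FF00FF00FF00FFULL) << 8) |
	    ((x >> 8) & 0x00FF00FF00FF00FFULL);
	return x;
}

/*
 * Byte-swap a hash state made of 64-bit words.
 * digest_size is in bytes, so digest_size >> 3 is the word count.
 */
static void
example_swap_state64(const uint64_t *in, uint64_t *out, size_t digest_size)
{
	size_t i;

	assert(in != NULL && out != NULL);

	for (i = 0; i < digest_size >> 3; i++)
		out[i] = example_bswap64(in[i]);
}
```

With a 64-byte state, example_swap_state64(in, out, 64) converts eight words.
The SHA-224 patch follows the same idea with 32-bit words and a shift of 2.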

[dpdk-dev] [PATCH 2/2] app/test: add test cases for aes-sha224-hmac for Intel QAT driver

2016-08-18 Thread Deepak Kumar Jain

From: "Jain, Deepak K" 

Added aes-sha224-hmac algorithm to test file for Intel(R) QuickAssist
Technology Driver

Signed-off-by: Deepak Kumar Jain 
---
 app/test/test_cryptodev_aes.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/app/test/test_cryptodev_aes.c b/app/test/test_cryptodev_aes.c
index bf832b6..6ad2674 100644
--- a/app/test/test_cryptodev_aes.c
+++ b/app/test/test_cryptodev_aes.c
@@ -211,14 +211,16 @@ static const struct aes_test_case aes_test_cases[] = {
.test_descr = "AES-128-CBC HMAC-SHA224 Encryption Digest",
.test_data = &aes_test_data_8,
.op_mask = AES_TEST_OP_ENC_AUTH_GEN,
-   .pmd_mask = AES_TEST_TARGET_PMD_MB
+   .pmd_mask = AES_TEST_TARGET_PMD_MB |
+   AES_TEST_TARGET_PMD_QAT
},
{
.test_descr = "AES-128-CBC HMAC-SHA224 Decryption Digest "
"Verify",
.test_data = &aes_test_data_8,
.op_mask = AES_TEST_OP_AUTH_VERIFY_DEC,
-   .pmd_mask = AES_TEST_TARGET_PMD_MB
+   .pmd_mask = AES_TEST_TARGET_PMD_MB |
+   AES_TEST_TARGET_PMD_QAT
},
{
.test_descr = "AES-128-CBC HMAC-SHA384 Encryption Digest",
-- 
2.5.5

[dpdk-dev] [PATCH 1/2] crypto/qat: add aes-sha224-hmac capability to Intel QAT driver

2016-08-18 Thread Deepak Kumar Jain

From: "Jain, Deepak K" 

Added support of aes-sha224-hmac in Intel(R) QuickAssist driver

Signed-off-by: Deepak Kumar Jain 
---
 doc/guides/cryptodevs/qat.rst|  1 +
 drivers/crypto/qat/qat_adf/qat_algs_build_desc.c | 33 
 drivers/crypto/qat/qat_crypto.c  |  4 ++-
 3 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/doc/guides/cryptodevs/qat.rst b/doc/guides/cryptodevs/qat.rst
index 485abb4..7f630be 100644
--- a/doc/guides/cryptodevs/qat.rst
+++ b/doc/guides/cryptodevs/qat.rst
@@ -53,6 +53,7 @@ Cipher algorithms:
 Hash algorithms:

 * ``RTE_CRYPTO_AUTH_SHA1_HMAC``
+* ``RTE_CRYPTO_AUTH_SHA224_HMAC``
 * ``RTE_CRYPTO_AUTH_SHA256_HMAC``
 * ``RTE_CRYPTO_AUTH_SHA512_HMAC``
 * ``RTE_CRYPTO_AUTH_AES_XCBC_MAC``
diff --git a/drivers/crypto/qat/qat_adf/qat_algs_build_desc.c 
b/drivers/crypto/qat/qat_adf/qat_algs_build_desc.c
index 521a9c4..77e6548 100644
--- a/drivers/crypto/qat/qat_adf/qat_algs_build_desc.c
+++ b/drivers/crypto/qat/qat_adf/qat_algs_build_desc.c
@@ -71,6 +71,9 @@ static int qat_hash_get_state1_size(enum icp_qat_hw_auth_algo 
qat_hash_alg)
case ICP_QAT_HW_AUTH_ALGO_SHA1:
return QAT_HW_ROUND_UP(ICP_QAT_HW_SHA1_STATE1_SZ,
QAT_HW_DEFAULT_ALIGNMENT);
+   case ICP_QAT_HW_AUTH_ALGO_SHA224:
+   return QAT_HW_ROUND_UP(ICP_QAT_HW_SHA224_STATE1_SZ,
+   QAT_HW_DEFAULT_ALIGNMENT);
case ICP_QAT_HW_AUTH_ALGO_SHA256:
return QAT_HW_ROUND_UP(ICP_QAT_HW_SHA256_STATE1_SZ,
QAT_HW_DEFAULT_ALIGNMENT);
@@ -107,6 +110,8 @@ static int qat_hash_get_digest_size(enum 
icp_qat_hw_auth_algo qat_hash_alg)
switch (qat_hash_alg) {
case ICP_QAT_HW_AUTH_ALGO_SHA1:
return ICP_QAT_HW_SHA1_STATE1_SZ;
+   case ICP_QAT_HW_AUTH_ALGO_SHA224:
+   return ICP_QAT_HW_SHA224_STATE1_SZ;
case ICP_QAT_HW_AUTH_ALGO_SHA256:
return ICP_QAT_HW_SHA256_STATE1_SZ;
case ICP_QAT_HW_AUTH_ALGO_SHA512:
@@ -129,6 +134,8 @@ static int qat_hash_get_block_size(enum 
icp_qat_hw_auth_algo qat_hash_alg)
switch (qat_hash_alg) {
case ICP_QAT_HW_AUTH_ALGO_SHA1:
return SHA_CBLOCK;
+   case ICP_QAT_HW_AUTH_ALGO_SHA224:
+   return SHA256_CBLOCK;
case ICP_QAT_HW_AUTH_ALGO_SHA256:
return SHA256_CBLOCK;
case ICP_QAT_HW_AUTH_ALGO_SHA512:
@@ -158,6 +165,17 @@ static int partial_hash_sha1(uint8_t *data_in, uint8_t 
*data_out)
return 0;
 }

+static int partial_hash_sha224(uint8_t *data_in, uint8_t *data_out)
+{
+   SHA256_CTX ctx;
+
+   if (!SHA224_Init(&ctx))
+   return -EFAULT;
+   SHA256_Transform(&ctx, data_in);
+   rte_memcpy(data_out, &ctx, SHA256_DIGEST_LENGTH);
+   return 0;
+}
+
 static int partial_hash_sha256(uint8_t *data_in, uint8_t *data_out)
 {
SHA256_CTX ctx;
@@ -220,6 +238,13 @@ static int partial_hash_compute(enum icp_qat_hw_auth_algo 
hash_alg,
*hash_state_out_be32 =
rte_bswap32(*(((uint32_t *)digest)+i));
break;
+   case ICP_QAT_HW_AUTH_ALGO_SHA224:
+   if (partial_hash_sha224(data_in, digest))
+   return -EFAULT;
+   for (i = 0; i < digest_size >> 2; i++, hash_state_out_be32++)
+   *hash_state_out_be32 =
+   rte_bswap32(*(((uint32_t *)digest)+i));
+   break;
case ICP_QAT_HW_AUTH_ALGO_SHA256:
if (partial_hash_sha256(data_in, digest))
return -EFAULT;
@@ -575,6 +600,14 @@ int qat_alg_aead_session_create_content_desc_auth(struct 
qat_session *cdesc,
}
state2_size = RTE_ALIGN_CEIL(ICP_QAT_HW_SHA1_STATE2_SZ, 8);
break;
+   case ICP_QAT_HW_AUTH_ALGO_SHA224:
+   if (qat_alg_do_precomputes(ICP_QAT_HW_AUTH_ALGO_SHA224,
+   authkey, authkeylen, cdesc->cd_cur_ptr, &state1_size)) {
+   PMD_DRV_LOG(ERR, "(SHA)precompute failed");
+   return -EFAULT;
+   }
+   state2_size = ICP_QAT_HW_SHA224_STATE2_SZ;
+   break;
case ICP_QAT_HW_AUTH_ALGO_SHA256:
if (qat_alg_do_precomputes(ICP_QAT_HW_AUTH_ALGO_SHA256,
authkey, authkeylen, cdesc->cd_cur_ptr, &state1_size)) {
diff --git a/drivers/crypto/qat/qat_crypto.c b/drivers/crypto/qat/qat_crypto.c
index b9558d0..e872759 100644
--- a/drivers/crypto/qat/qat_crypto.c
+++ b/drivers/crypto/qat/qat_crypto.c
@@ -539,6 +539,9 @@ qat_crypto_sym_configure_session_auth(struct rte_cryptodev 
*dev,
case RTE_CRYPTO_AUTH_SHA512_HMAC:
session->qat_hash_alg = ICP_QAT_HW_AUTH_ALGO_SHA512;
break;
+   case

[dpdk-dev] [PATCH 0/2] add aes-sha224-hmac support to Intel QAT driver

2016-08-18 Thread Deepak Kumar Jain

This patchset adds support of aes-sha224-hmac
in Intel(R) QuickAssist Technology driver.

This patchset depends on following patchset:
"crypto/qat: add MD5 HMAC capability to Intel QAT driver"
(http://dpdk.org/dev/patchwork/patch/15165/)

Jain, Deepak K (2):
  crypto/qat: add aes-sha224-hmac capability to Intel QAT driver
  app/test: add test cases for aes-sha224-hmac for Intel QAT driver

 app/test/test_cryptodev_aes.c|  6 +++--
 doc/guides/cryptodevs/qat.rst|  1 +
 drivers/crypto/qat/qat_adf/qat_algs_build_desc.c | 33 
 drivers/crypto/qat/qat_crypto.c  |  4 ++-
 4 files changed, 41 insertions(+), 3 deletions(-)

-- 
2.5.5

[dpdk-dev] [PATCH] ivshmem: remove integration in dpdk

2016-08-18 Thread Burakov, Anatoly

> Following discussions on the mailing list [1] and since nobody stood up to
> implement the necessary cleanups, here is the ivshmem integration removal.
> 
> There is not much to say about this patch, a lot of code is being removed.
> The default configuration file for packet_ordering example is replaced with
> the "native" x86 file.
> The only tricky part is in eal_memory with the memseg index stuff.
> 
> More cleanups can be done after this but will come in subsequent patchsets.
> 
> [1]: http://dpdk.org/ml/archives/dev/2016-June/040844.html
> 
> Signed-off-by: David Marchand 

Acked-by: Anatoly Burakov

[dpdk-dev] [PATCH] optimize vhost enqueue

2016-08-18 Thread Wang, Zhihong

Thanks Maxime and Yuanhan for your review and suggestions!
Please help review the v2 of this patch.


> -----Original Message-----
> From: Yuanhan Liu [mailto:yuanhan.liu at linux.intel.com]
> Sent: Wednesday, August 17, 2016 5:51 PM
> To: Maxime Coquelin 
> Cc: Wang, Zhihong ; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] optimize vhost enqueue
> 
> On Wed, Aug 17, 2016 at 11:17:46AM +0200, Maxime Coquelin wrote:
> > >>>This is something I've thought about while writing the code, the reason I
> > >>>keep it as one function body is that:
> > >>>
> > >>> 1. This function is very performance sensitive, and we need full 
> > >>> control of
> > >>>code ordering (You can compare with the current performance with
> the
> > >>>mrg_rxbuf feature turned on to see the difference).
> > >>
> > >>Will inline functions help?
> > >
> > >
> > >Optimization in this patch actually reorganizes the code from its logic,
> > >so it's not suitable for making separated functions.
> > >
> > >I'll explain this in v2.
> >
> > I agree with Yuanhan.
> > Inline functions should not break the optimizations.
> > IMHO, this is mandatory for the patch to be accepted.
> 
> Yes.
> 
> > It seems you are not the only one facing the issue:
> > https://github.com/YanVugenfirer/kvm-guest-drivers-windows/issues/70
> >
> > So a dedicated fix is really important.
> 
> Yes.
> 
> >
> > >This patch doesn't try to fix this issue, it rewrites the logic totally,
> > >and somehow fixes this issue.
> > >
> > >Do you think integrating this whole patch into the stable branch will work?
> > >Personally I think it makes more sense.
> >
> > No.
> > We don't even know why/how it fixes the Windows issue, which would be
> > the first thing to understand before integrating a fix in stable branch.
> 
> Yes.
> 
> >
> > And the stable branch is not meant for integrating such big reworks,
> > it is only meant to fix bugs.
> 
> Yes.
> 
> > The risk of regressions have to be avoided as much as possible.
> 
> Yes.
> 
>   --yliu

[dpdk-dev] How to get the number of used descriptors for vHost-users

2016-08-18 Thread Ali Volkan Atli


Hi all

Is there a function, like rte_eth_rx_queue_count(), to get the number of used
(or free) descriptors in a vhost RX/TX queue? I used
rte_vhost_avail_entries(), but I'm not sure it is the correct way.

Thanks in advance.

- Volkan

[dpdk-dev] [PATCH] vhost: add back support for concurrent enqueue

2016-08-18 Thread Rich Lane

On Mon, Aug 15, 2016 at 7:37 PM, Yuanhan Liu 
wrote:

> On Mon, Aug 15, 2016 at 01:00:24PM -0700, Rich Lane wrote:
> > Concurrent enqueue is an important performance optimization when the
> number
> > of cores used for switching is different than the number of vhost queues.
> > I've observed a 20% performance improvement compared to a strategy that
> > binds queues to cores.
> >
> > The atomic cmpset is only executed when the application calls
> > rte_vhost_enqueue_burst_mp. Benchmarks show no performance impact
> > when not using concurrent enqueue.
> >
> > Mergeable RX buffers aren't supported by concurrent enqueue to minimize
> > code complexity.
>
> I think that would break things when Mergeable rx is enabled (which is
> actually enabled by default).
>

Would it be reasonable to return -ENOTSUP in this case, and restrict
concurrent enqueue to devices where VIRTIO_NET_F_MRG_RXBUF is disabled?

I could also add back concurrent enqueue support for mergeable RX, but I
was hoping to avoid that since the mergeable codepath is already complex
and wouldn't be used in high performance deployments.

> Besides that, as mentioned in the last week f2f talk, do you think adding
> a new flag RTE_VHOST_USER_CONCURRENT_ENQUEUE (for
> rte_vhost_driver_register())
> __might__ be a better idea? That could save us a API, to which I don't
> object
> though.
>

Sure, I can add a flag instead. That will be similar to how the rte_ring
library picks the enqueue method.

[dpdk-dev] vhost [query] : support for multiple ports and non VMDQ devices in vhost switch

2016-08-18 Thread Tan, Jianfeng

Hi Maxime,

> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin at redhat.com]
> Sent: Thursday, August 18, 2016 3:43 PM
> To: Tan, Jianfeng; Yuanhan Liu; Pankaj Chauhan
> Cc: dev at dpdk.org; hemant.agrawal at nxp.com; shreyansh.jain at nxp.com
> Subject: Re: [dpdk-dev] vhost [query] : support for multiple ports and non
> VMDQ devices in vhost switch
> 
> Hi,
> 
> On 08/18/2016 04:35 AM, Tan, Jianfeng wrote:
> > Hi Maxime,
> >
> > On 8/17/2016 7:18 PM, Maxime Coquelin wrote:
> >> Hi Jianfeng,
> >>
> >> On 08/17/2016 04:33 AM, Tan, Jianfeng wrote:
> >>> Hi,
> >>>
> >>> Please review below proposal of Pankaj and myself after an offline
> >>> discussion. (Pankaj, please correct me if I'm going somewhere wrong).
> >>>
> >>> a. Remove HW dependent option, --strip-vlan, because different kinds
> of
> >>> NICs behave differently. It's a bug fix.
> >>> b. Abstract switching logic into a framework, so that we can develop
> >>> different kinds of switching logics. In this phase, we will have two
> >>> switching logics: (1) a simple software-based mac learning switching;
> >>> (2) VMDQ based switching. Any other advanced switching logics can be
> >>> proposed based on this framework.
> >>> c. Merge tep_termination example vxlan as a switching logic of the
> >>> framework.
> >>
> >> I was also thinking of making physical port optional and add MAC
> >> learning,
> >> so this is all good for me.
> >
> > To make it clear, we are not proposing to eliminate physical port,
> > instead, we just eliminate the binding of VMDQ and virtio ports,
> > superseding it with a MAC learning switching.
> 
> So you confirm we could have setup with only VMs, and no physical
> NIC? That's what I meant when saying "making physical port optional".

Yes, this case would be supported too.

> 
> >
> >>
> >> Let me know if I can help in implementation, I'll be happy to
> >> contribute.
> >
> > Thank you for participating. Currently, I'm working on item a (will be a
> > quick and simple fix). Pankaj is working on item b (which would be a
> > huge change). Item c is depending on item b. So let's wait RFC patch
> > from Pankaj and see what we can help.
> 
> Good, let's wait for Pankaj's RFC.
> 
> >
> >>
> >>> To be decided:
> >>> d. Support multiple physical ports.
> >>> e. Keep the current way to use vhost lib directly or use vhost pmd
> >>> instead.
> >> Do you see advantages of using vhost lib directly vs. pmd?
> >> Wouldn't using vhost pmd make achieving zero-copy harder?
> >> (I'm not sure, I didn't investigate the topic much for now).
> >
> > Yes, by using vhost lib, we can add back the removed feature zero-copy.
> > But my understanding is zero-copy (nic-to-vm or vm-to-nic) or delayed
> > copy (vm-to-vm) would be great and common features, which should be
> > integrated into vhost lib and enabled in vhost pmd, so that all
> > applications can benefit from it. And in fact, Yuanhan is working on the
> > delayed copy now. An exception is rx-side-zero-copy, I don't know if
> > it's common enough to be integrated in a vhost lib, because it'll
> > require hardware queue binding.
> 
> Ok, I'm interested in knowing how vm-to-vm delayed copy will be
> implemented.
> 
> > Besides, vhost pmd would be easier to use than vhost lib (personal
> > opinion). Secondly, vhost pmd would be more clear in logic, 1:1:1
> > mapping among vhost port, unix socket path, and virtio port. Thirdly, by
> > using vhost pmd, we can treat vhost ports the way of physical ports,
> > otherwise, we use different API to receive/transmit packets.
> 
> I'm 100% aligned with you on this, the vhost pmd makes things more
> standard, so more flexible.
> 
> >>
> >> Also, if we use pmd directly, then it would no more be a vhost switch
> >> only, as it could potentially be used with physical NICs also.
> >
> > You mean we are building a switch instead of vhost switch? Yes, a switch
> > can switch packets between virtio-virtio and virtio-physical nic.
> 
> And physical-physical also, as we will be using a standard API with the
> vhost-pmd, nothing will prevent using it with only physical switches,
> no?

Oh yes, I agree.

Thanks,
Jianfeng

> 
> Thanks,
> Maxime
> 
> >
> > Thanks,
> > Jianfeng
> >
> >>
> >> Any thoughts?
> >>
> >> Thanks,
> >> Maxime
> >
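
For readers unfamiliar with item b(1) of the proposal, a software MAC
learning switch can be sketched as below. This is a minimal illustration,
not the design that Pankaj's RFC will use: the table layout, its size and
all sketch_* names are assumptions, and a port number here may stand for
either a vhost port or a physical port.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_TABLE_SIZE 64
#define SKETCH_PORT_FLOOD (-1)	/* unknown destination: send to all ports */

/* One learned address; "used" marks an occupied slot. */
struct sketch_mac_entry {
	uint8_t mac[6];
	int port;
	int used;
};

struct sketch_mac_table {
	struct sketch_mac_entry e[SKETCH_TABLE_SIZE];
};

/* Linear search, good enough for a sketch with a small table. */
static struct sketch_mac_entry *
sketch_find(struct sketch_mac_table *t, const uint8_t mac[6])
{
	int i;

	for (i = 0; i < SKETCH_TABLE_SIZE; i++)
		if (t->e[i].used && memcmp(t->e[i].mac, mac, 6) == 0)
			return &t->e[i];
	return NULL;
}

/* Record (or refresh) the port on which a source address was seen. */
static void
sketch_learn(struct sketch_mac_table *t, const uint8_t src[6], int port)
{
	struct sketch_mac_entry *ent = sketch_find(t, src);
	int i;

	assert(port >= 0);
	if (ent != NULL) {
		ent->port = port;
		return;
	}
	for (i = 0; i < SKETCH_TABLE_SIZE; i++) {
		if (!t->e[i].used) {
			memcpy(t->e[i].mac, src, 6);
			t->e[i].port = port;
			t->e[i].used = 1;
			return;
		}
	}
	/* Table full: the address is simply not learned in this sketch. */
}

/* Return the output port for a destination, or SKETCH_PORT_FLOOD. */
static int
sketch_lookup(struct sketch_mac_table *t, const uint8_t dst[6])
{
	struct sketch_mac_entry *ent = sketch_find(t, dst);

	return ent != NULL ? ent->port : SKETCH_PORT_FLOOD;
}
```

Because the table only maps addresses to port numbers, it does not care
whether a physical NIC is present, which is the "VM-only setup" case
confirmed above.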

[dpdk-dev] vhost [query] : support for multiple ports and non VMDQ devices in vhost switch

2016-08-18 Thread Tan, Jianfeng

Hi Maxime,

On 8/17/2016 7:18 PM, Maxime Coquelin wrote:
> Hi Jianfeng,
>
> On 08/17/2016 04:33 AM, Tan, Jianfeng wrote:
>> Hi,
>>
>> Please review below proposal of Pankaj and myself after an offline
>> discussion. (Pankaj, please correct me if I'm going somewhere wrong).
>>
>> a. Remove HW dependent option, --strip-vlan, because different kinds of
>> NICs behave differently. It's a bug fix.
>> b. Abstract switching logic into a framework, so that we can develop
>> different kinds of switching logics. In this phase, we will have two
>> switching logics: (1) a simple software-based mac learning switching;
>> (2) VMDQ based switching. Any other advanced switching logics can be
>> proposed based on this framework.
>> c. Merge tep_termination example vxlan as a switching logic of the
>> framework.
>
> I was also thinking of making physical port optional and add MAC 
> learning,
> so this is all good for me.

To make it clear, we are not proposing to eliminate physical port, 
instead, we just eliminate the binding of VMDQ and virtio ports, 
superseding it with a MAC learning switching.

>
> Let me know if I can help in implementation, I'll be happy to
> contribute.

Thank you for participating. Currently, I'm working on item a (will be a 
quick and simple fix). Pankaj is working on item b (which would be a 
huge change). Item c is depending on item b. So let's wait RFC patch 
from Pankaj and see what we can help.

>
>> To be decided:
>> d. Support multiple physical ports.
>> e. Keep the current way to use vhost lib directly or use vhost pmd 
>> instead.
> Do you see advantages of using vhost lib directly vs. pmd?
> Wouldn't using vhost pmd make achieving zero-copy harder?
> (I'm not sure, I didn't investigate the topic much for now).

Yes, by using vhost lib, we can add back the removed feature zero-copy. 
But my understanding is zero-copy (nic-to-vm or vm-to-nic) or delayed 
copy (vm-to-vm) would be great and common features, which should be 
integrated into vhost lib and enabled in vhost pmd, so that all 
applications can benefit from it. And in fact, Yuanhan is working on the 
delayed copy now. An exception is rx-side-zero-copy, I don't know if 
it's common enough to be integrated in a vhost lib, because it'll 
require hardware queue binding.

Besides, vhost pmd would be easier to use than vhost lib (personal 
opinion). Secondly, vhost pmd would be more clear in logic, 1:1:1 
mapping among vhost port, unix socket path, and virtio port. Thirdly, by 
using vhost pmd, we can treat vhost ports the way of physical ports, 
otherwise, we use different API to receive/transmit packets.

>
> Also, if we use pmd directly, then it would no more be a vhost switch
> only, as it could potentially be used with physical NICs also.

You mean we are building a switch instead of vhost switch? Yes, a switch 
can switch packets between virtio-virtio and virtio-physical nic.

Thanks,
Jianfeng

>
> Any thoughts?
>
> Thanks,
> Maxime

[dpdk-dev] [PATCH 1/2] examples/vhost: rename dev-basename

2016-08-18 Thread Maxime Coquelin



On 08/18/2016 10:35 AM, Yuanhan Liu wrote:
> On Thu, Aug 18, 2016 at 10:22:38AM +0200, Maxime Coquelin wrote:
>> Hi Jiayu,
>>
>> On 08/16/2016 06:14 PM, Jiayu Hu wrote:
>>> In examples/vhost, "dev-basename" is a program option, which is to set
>>> the vhost-net socket used by vhost-user, or the character device used
>>> by vhost-cuse. Since vhost-cuse should be dropped, and "dev-basename"
>>> is not a suitable name for the vhost-net socket. Therefore, this patch
>>> is to change this option name for examples/vhost.
>>>
>>> Signed-off-by: Jiayu Hu 
>>> ---
>>> examples/vhost/main.c | 41 +
>>> 1 file changed, 21 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/examples/vhost/main.c b/examples/vhost/main.c
>>> index 92a9823..a718577 100644
>>> --- a/examples/vhost/main.c
>>> +++ b/examples/vhost/main.c
>>> @@ -90,9 +90,6 @@
>>> /* Size of buffers used for snprintfs. */
>>> #define MAX_PRINT_BUFF 6072
>>>
>>> -/* Maximum character device basename size. */
>>> -#define MAX_BASENAME_SZ 10
>>> -
>>> /* Maximum long option length for option parsing. */
>>> #define MAX_LONG_OPT_SZ 64
>>>
>>> @@ -139,8 +136,8 @@ static uint32_t burst_rx_delay_time = BURST_RX_WAIT_US;
>>> /* Specify the number of retries on RX. */
>>> static uint32_t burst_rx_retry_num = BURST_RX_RETRIES;
>>>
>>> -/* Character device basename. Can be set by user. */
>>> -static char dev_basename[MAX_BASENAME_SZ] = "vhost-net";
>>> +/* Socket file path. Can be set by user */
>>> +static char socket_file[PATH_MAX] = "vhost-net";
>>
>> Not very important, but now that we only support vhost-user,
>> maybe we could default the name to "vhost-user"?
>>
>> There is no real convention I think, but this is what OVS
>> uses in its examples.
>
> I think it doesn't matter now, since the 2nd patch, --socket-path
> is a must but not optional any more, meaning there is no default
> socket file path.

Yes, just noticed it while reviewing patch 2.
So this is all good to me for this patch.

Thanks,
Maxime

[dpdk-dev] [PATCH 2/2] examples/vhost: support multiple socket files

2016-08-18 Thread Maxime Coquelin



On 08/16/2016 06:14 PM, Jiayu Hu wrote:
> When examples/vhost runs in client mode, only one QEMU can be connected.
> This is because that examples/vhost just supports one socket file. This
> patch is to add multiple sockets support for examples/vhost.
>
> Signed-off-by: Jiayu Hu 
> ---
>  examples/vhost/main.c | 50 ++
>  1 file changed, 38 insertions(+), 12 deletions(-)
>
> diff --git a/examples/vhost/main.c b/examples/vhost/main.c
> index a718577..9974f0b 100644
> --- a/examples/vhost/main.c
> +++ b/examples/vhost/main.c
> @@ -136,8 +136,9 @@ static uint32_t burst_rx_delay_time = BURST_RX_WAIT_US;
>  /* Specify the number of retries on RX. */
>  static uint32_t burst_rx_retry_num = BURST_RX_RETRIES;
>
> -/* Socket file path. Can be set by user */
> -static char socket_file[PATH_MAX] = "vhost-net";
Default name being removed, you can drop my comment on patch 1. :)

> +/* Socket file paths. Can be set by user */
> +static char *socket_files;
> +int nb_sockets;
Any reason not to make it static?

>  /* empty vmdq configuration structure. Filled in programatically */
>  static struct rte_eth_conf vmdq_conf_default = {
> @@ -395,11 +396,12 @@ static int
>  us_vhost_parse_socket_path(const char *q_arg)
>  {
>   /* parse number string */
> -
>   if (strnlen(q_arg, PATH_MAX) > PATH_MAX)
>   return -1;
> - else
> - snprintf((char *)&socket_file, PATH_MAX, "%s", q_arg);
> +
> + socket_files = realloc(socket_files, PATH_MAX * (nb_sockets + 1));
> + snprintf(socket_files + nb_sockets * PATH_MAX, PATH_MAX, "%s", q_arg);
> + nb_sockets++;
>
>   return 0;
>  }
> @@ -1341,14 +1343,30 @@ print_stats(void)
>   }
>  }
>
> +/*
> + * This function is used to unregister drivers.
> + */
> +static void
> +unregister_drivers(int socket_num)
> +{
> + int i, ret;
> +
> + for (i = 0; i < socket_num; i++) {
> + ret = rte_vhost_driver_unregister(socket_files + i * PATH_MAX);
> + if (ret != 0)
> + RTE_LOG(ERR, VHOST_CONFIG,
> + "Fail to unregister vhost driver for %s.\n",
> + socket_files + i * PATH_MAX);
> + }
> +}
> +
>  /* When we receive a INT signal, unregister vhost driver */
>  static void
>  sigint_handler(__rte_unused int signum)
>  {
>   /* Unregister vhost driver. */
> - int ret = rte_vhost_driver_unregister((char *)&socket_file);
> - if (ret != 0)
> - rte_exit(EXIT_FAILURE, "vhost driver unregister failure.\n");
> + unregister_drivers(nb_sockets);
> +
>   exit(0);
>  }
>
> @@ -1412,12 +1430,15 @@ main(int argc, char *argv[])
>  {
>   unsigned lcore_id, core_id = 0;
>   unsigned nb_ports, valid_num_ports;
> - int ret;
> + int ret, i;
>   uint8_t portid;
>   static pthread_t tid;
>   char thread_name[RTE_MAX_THREAD_NAME_LEN];
>   uint64_t flags = 0;
>
> + nb_sockets = 0;
> + socket_files = NULL;
Since socket_files is static, no need to initialize it to NULL.
If you staticize nb_sockets, same remark will apply.
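
A possible shape for the path list with the remarks above applied (both
variables static, no explicit zero initialization) is sketched below. It
is not the submitted patch: the sketch_* names are hypothetical, the
realloc() result is checked before use, and the length test uses ">="
because strnlen(q_arg, PATH_MAX) can never return more than PATH_MAX.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#ifndef PATH_MAX
#define PATH_MAX 4096	/* fallback for platforms that do not define it */
#endif

/* Static storage is zero-initialized, so no assignment in main() is needed. */
static char *sketch_socket_files;
static int sketch_nb_sockets;

/* Append one socket path; returns 0 on success, -1 on error. */
static int
sketch_add_socket_path(const char *path)
{
	char *tmp;

	assert(path != NULL);

	/* Reject paths that would not fit in a PATH_MAX slot with the NUL. */
	if (strlen(path) >= PATH_MAX)
		return -1;

	/* Keep the old pointer valid if realloc() fails. */
	tmp = realloc(sketch_socket_files,
			(size_t)PATH_MAX * (sketch_nb_sockets + 1));
	if (tmp == NULL)
		return -1;
	sketch_socket_files = tmp;

	snprintf(sketch_socket_files + (size_t)sketch_nb_sockets * PATH_MAX,
			PATH_MAX, "%s", path);
	sketch_nb_sockets++;
	return 0;
}

/* Return the i-th stored path; the caller keeps i within range. */
static const char *
sketch_socket_path(int i)
{
	return sketch_socket_files + (size_t)i * PATH_MAX;
}
```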

[dpdk-dev] [PATCH 1/2] examples/vhost: rename dev-basename

2016-08-18 Thread Maxime Coquelin

Hi Jiayu,

On 08/16/2016 06:14 PM, Jiayu Hu wrote:
> In examples/vhost, "dev-basename" is a program option, which is to set
> the vhost-net socket used by vhost-user, or the character device used
> by vhost-cuse. Since vhost-cuse should be dropped, and "dev-basename"
> is not a suitable name for the vhost-net socket. Therefore, this patch
> is to change this option name for examples/vhost.
>
> Signed-off-by: Jiayu Hu 
> ---
>  examples/vhost/main.c | 41 +
>  1 file changed, 21 insertions(+), 20 deletions(-)
>
> diff --git a/examples/vhost/main.c b/examples/vhost/main.c
> index 92a9823..a718577 100644
> --- a/examples/vhost/main.c
> +++ b/examples/vhost/main.c
> @@ -90,9 +90,6 @@
>  /* Size of buffers used for snprintfs. */
>  #define MAX_PRINT_BUFF 6072
>
> -/* Maximum character device basename size. */
> -#define MAX_BASENAME_SZ 10
> -
>  /* Maximum long option length for option parsing. */
>  #define MAX_LONG_OPT_SZ 64
>
> @@ -139,8 +136,8 @@ static uint32_t burst_rx_delay_time = BURST_RX_WAIT_US;
>  /* Specify the number of retries on RX. */
>  static uint32_t burst_rx_retry_num = BURST_RX_RETRIES;
>
> -/* Character device basename. Can be set by user. */
> -static char dev_basename[MAX_BASENAME_SZ] = "vhost-net";
> +/* Socket file path. Can be set by user */
> +static char socket_file[PATH_MAX] = "vhost-net";

Not very important, but now that we only support vhost-user,
maybe we could default the name to "vhost-user"?

There is no real convention I think, but this is what OVS
uses in its examples.

Other than that:
Reviewed-by: Maxime Coquelin 

Thanks,
Maxime

[dpdk-dev] [PATCH] examples/vhost: remove VLAN strip option

2016-08-18 Thread Tan, Jianfeng

Hi Maxime,

> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin at redhat.com]
> Sent: Thursday, August 18, 2016 3:52 PM
> To: Tan, Jianfeng; dev at dpdk.org
> Cc: yuanhan.liu at linux.intel.com
> Subject: Re: [dpdk-dev] [PATCH] examples/vhost: remove VLAN strip option
> 
> 
> 
> On 08/18/2016 07:46 AM, Jianfeng Tan wrote:
> > When VMDQ is enabled, different NICs have different behaviors for
> > disabling VLAN strip. In detail, i40e only enables/disables it of
> > PF's main vsi; fm10k cannot disable VLAN strip, etc. We now remove
> > this option, --vlan-strip, to reduce any confusion. And now, VLAN
> > strip will be enabled and cannot be disabled.
> >
> > Reported-by: Qian Xu 
> > Signed-off-by: Jianfeng Tan 
> > ---
> >  doc/guides/sample_app_ug/vhost.rst | 11 ---
> >  examples/vhost/main.c  | 26 +-
> >  2 files changed, 5 insertions(+), 32 deletions(-)
> 
> Minor comment below. Other than that:
> Reviewed-by: Maxime Coquelin 
> 
> > diff --git a/doc/guides/sample_app_ug/vhost.rst
> b/doc/guides/sample_app_ug/vhost.rst
> > index 2b7defc..a204f78 100644
> > --- a/doc/guides/sample_app_ug/vhost.rst
> > +++ b/doc/guides/sample_app_ug/vhost.rst
> > @@ -496,13 +496,10 @@ due to the large and complex code, it's better to
> redesign it than fixing
> >  it to make it work again. Hence, zero copy may be added back later.
> >
> >  **VLAN strip.**
> > -The VLAN strip option enable/disable the VLAN strip on host, if disabled,
> the guest will receive the packets with VLAN tag.
> > -It is enabled by default.
> > -
> > -.. code-block:: console
> > -
> > -./vhost-switch -c f -n 4 --socket-mem 1024 --huge-dir /mnt/huge \
> > - -- --vlan-strip [0, 1]
> > +VLAN strip option is removed, because different NICs have different
> behaviors
> > +when disabling VLAN strip. Such feature, which heavily depends on
> hardware,
> > +should be removed from this example to deduce confusion. Now, VLAN
> strip is
> I'm not a native English speaker, but I would use "reduce" instead of
> "deduce" here. I might be wrong, so feel free to keep as-is if
> appropriate.

Nice catch! Yes, "reduce" instead of "deduce".

Thanks,
Jianfeng

> 
> Thanks,
> Maxime

[dpdk-dev] [PATCH] examples/vhost: remove VLAN strip option

2016-08-18 Thread Maxime Coquelin



On 08/18/2016 07:46 AM, Jianfeng Tan wrote:
> When VMDQ is enabled, different NICs have different behaviors for
> disabling VLAN strip. In detail, i40e only enables/disables it of
> PF's main vsi; fm10k cannot disable VLAN strip, etc. We now remove
> this option, --vlan-strip, to reduce any confusion. And now, VLAN
> strip will be enabled and cannot be disabled.
>
> Reported-by: Qian Xu 
> Signed-off-by: Jianfeng Tan 
> ---
>  doc/guides/sample_app_ug/vhost.rst | 11 ---
>  examples/vhost/main.c  | 26 +-
>  2 files changed, 5 insertions(+), 32 deletions(-)

Minor comment below. Other than that:
Reviewed-by: Maxime Coquelin 

> diff --git a/doc/guides/sample_app_ug/vhost.rst 
> b/doc/guides/sample_app_ug/vhost.rst
> index 2b7defc..a204f78 100644
> --- a/doc/guides/sample_app_ug/vhost.rst
> +++ b/doc/guides/sample_app_ug/vhost.rst
> @@ -496,13 +496,10 @@ due to the large and complex code, it's better to 
> redesign it than fixing
>  it to make it work again. Hence, zero copy may be added back later.
>
>  **VLAN strip.**
> -The VLAN strip option enable/disable the VLAN strip on host, if disabled, 
> the guest will receive the packets with VLAN tag.
> -It is enabled by default.
> -
> -.. code-block:: console
> -
> -./vhost-switch -c f -n 4 --socket-mem 1024 --huge-dir /mnt/huge \
> - -- --vlan-strip [0, 1]
> +VLAN strip option is removed, because different NICs have different behaviors
> +when disabling VLAN strip. Such feature, which heavily depends on hardware,
> +should be removed from this example to deduce confusion. Now, VLAN strip is
I'm not a native English speaker, but I would use "reduce" instead of
"deduce" here. I might be wrong, so feel free to keep as-is if
appropriate.

Thanks,
Maxime

[dpdk-dev] vhost [query] : support for multiple ports and non VMDQ devices in vhost switch

2016-08-18 Thread Maxime Coquelin

Hi,

On 08/18/2016 04:35 AM, Tan, Jianfeng wrote:
> Hi Maxime,
>
> On 8/17/2016 7:18 PM, Maxime Coquelin wrote:
>> Hi Jianfeng,
>>
>> On 08/17/2016 04:33 AM, Tan, Jianfeng wrote:
>>> Hi,
>>>
>>> Please review below proposal of Pankaj and myself after an offline
>>> discussion. (Pankaj, please correct me if I'm going somewhere wrong).
>>>
>>> a. Remove HW dependent option, --strip-vlan, because different kinds of
>>> NICs behave differently. It's a bug fix.
>>> b. Abstract switching logic into a framework, so that we can develop
>>> different kinds of switching logics. In this phase, we will have two
>>> switching logics: (1) a simple software-based mac learning switching;
>>> (2) VMDQ based switching. Any other advanced switching logics can be
>>> proposed based on this framework.
>>> c. Merge tep_termination example vxlan as a switching logic of the
>>> framework.
>>
>> I was also thinking of making physical port optional and add MAC
>> learning,
>> so this is all good for me.
>
> To make it clear, we are not proposing to eliminate physical port,
> instead, we just eliminate the binding of VMDQ and virtio ports,
> superseding it with a MAC learning switching.

So you confirm we could have setup with only VMs, and no physical
NIC? That's what I meant when saying "making physical port optional".

>
>>
>> Let me know if I can help in implementation, I'll be happy to
>> contribute.
>
> Thank you for participating. Currently, I'm working on item a (will be a
> quick and simple fix). Pankaj is working on item b (which would be a
> huge change). Item c is depending on item b. So let's wait RFC patch
> from Pankaj and see what we can help.

Good, let's wait for Pankaj's RFC.

>
>>
>>> To be decided:
>>> d. Support multiple physical ports.
>>> e. Keep the current way to use vhost lib directly or use vhost pmd
>>> instead.
>> Do you see advantages of using vhost lib directly vs. pmd?
>> Wouldn't using vhost pmd make achieving zero-copy harder?
>> (I'm not sure, I didn't investigate the topic much for now).
>
> Yes, by using vhost lib, we can add back the removed feature zero-copy.
> But my understanding is zero-copy (nic-to-vm or vm-to-nic) or delayed
> copy (vm-to-vm) would be great and common features, which should be
> integrated into vhost lib and enabled in vhost pmd, so that all
> applications can benefit from it. And in fact, Yuanhan is working on the
> delayed copy now. An exception is rx-side-zero-copy, I don't know if
> it's common enough to be integrated in a vhost lib, because it'll
> require hardware queue binding.

Ok, I'm interested in knowing how vm-to-vm delayed copy will be
implemented.

> Besides, vhost pmd would be easier to use than vhost lib (personal
> opinion). Secondly, vhost pmd would be more clear in logic, 1:1:1
> mapping among vhost port, unix socket path, and virtio port. Thirdly, by
> using vhost pmd, we can treat vhost ports the way of physical ports,
> otherwise, we use different API to receive/transmit packets.

I'm 100% aligned with you on this, the vhost pmd makes things more
standard, so more flexible.

>>
>> Also, if we use pmd directly, then it would no more be a vhost switch
>> only, as it could potentially be used with physical NICs also.
>
> You mean we are building a switch instead of vhost switch? Yes, a switch
> can switch packets between virtio-virtio and virtio-physical nic.

And physical-physical also, as we will be using a standard API with the
vhost-pmd, nothing will prevent using it with only physical switches,
no?

Thanks,
Maxime

>
> Thanks,
> Jianfeng
>
>>
>> Any thoughts?
>>
>> Thanks,
>> Maxime
>

[dpdk-dev] [PATCH v2 6/6] vhost: optimize cache access

2016-08-18 Thread Zhihong Wang

This patch reorders the code to delay virtio header write to optimize cache
access efficiency for cases where the mrg_rxbuf feature is turned on. It
reduces CPU pipeline stall cycles significantly.


Signed-off-by: Zhihong Wang 
---
 lib/librte_vhost/vhost_rxtx.c | 23 ---
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 60d63d3..15f7f9c 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -154,6 +154,7 @@ enqueue_packet(struct virtio_net *dev, struct 
vhost_virtqueue *vq,
uint32_t mbuf_len = 0;
uint32_t mbuf_len_left = 0;
uint32_t copy_len = 0;
+   uint32_t copy_virtio_hdr = 0;
uint32_t extra_buffers = 0;

/* start with the first mbuf of the packet */
@@ -168,18 +169,17 @@ enqueue_packet(struct virtio_net *dev, struct 
vhost_virtqueue *vq,
if (unlikely(!desc_host_write_addr))
goto error;

-   /* handle virtio header */
+   /*
+* handle virtio header, the actual write operation
+* is delayed for cache optimization.
+*/
virtio_hdr = (struct virtio_net_hdr_mrg_rxbuf *)
(uintptr_t)desc_host_write_addr;
-   memset((void *)(uintptr_t)&(virtio_hdr->hdr),
-   0, dev->vhost_hlen);
-   virtio_enqueue_offload(mbuf, &(virtio_hdr->hdr));
+   copy_virtio_hdr = 1;
vhost_log_write(dev, desc->addr, dev->vhost_hlen);
desc_write_offset = dev->vhost_hlen;
desc_chain_len = desc_write_offset;
desc_host_write_addr += desc_write_offset;
-   if (is_mrg_rxbuf)
-   virtio_hdr->num_buffers = 1;

/* start copy from mbuf to desc */
while (1) {
@@ -233,9 +233,18 @@ enqueue_packet(struct virtio_net *dev, struct 
vhost_virtqueue *vq,
goto rollback;
}

-   /* copy mbuf data */
+   /* copy virtio header and mbuf data */
copy_len = RTE_MIN(desc->len - desc_write_offset,
mbuf_len_left);
+   if (copy_virtio_hdr) {
+   copy_virtio_hdr = 0;
+   memset((void *)(uintptr_t)&(virtio_hdr->hdr),
+   0, dev->vhost_hlen);
+   virtio_enqueue_offload(mbuf, &(virtio_hdr->hdr));
+   if (is_mrg_rxbuf)
+   virtio_hdr->num_buffers = extra_buffers + 1;
+   }
+
rte_memcpy((void *)(uintptr_t)desc_host_write_addr,
rte_pktmbuf_mtod_offset(mbuf, void *,
mbuf_len - mbuf_len_left),
-- 
2.7.4

[dpdk-dev] [PATCH v2 5/6] vhost: batch update used ring

2016-08-18 Thread Zhihong Wang

This patch enables batch update of the used ring for better efficiency.

Signed-off-by: Zhihong Wang 
---
 lib/librte_vhost/vhost-net.h  |  4 +++
 lib/librte_vhost/vhost_rxtx.c | 68 +--
 lib/librte_vhost/virtio-net.c | 15 --
 3 files changed, 68 insertions(+), 19 deletions(-)

diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h
index 51fdf3d..a15182c 100644
--- a/lib/librte_vhost/vhost-net.h
+++ b/lib/librte_vhost/vhost-net.h
@@ -85,6 +85,10 @@ struct vhost_virtqueue {

/* Physical address of used ring, for logging */
uint64_tlog_guest_addr;
+
+   /* Shadow used ring for performance */
+   struct vring_used_elem  *shadow_used_ring;
+   uint32_tshadow_used_idx;
 } __rte_cache_aligned;

 /* Old kernels have no such macro defined */
diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 7db83d0..60d63d3 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -155,7 +155,6 @@ enqueue_packet(struct virtio_net *dev, struct 
vhost_virtqueue *vq,
uint32_t mbuf_len_left = 0;
uint32_t copy_len = 0;
uint32_t extra_buffers = 0;
-   uint32_t used_idx_round = 0;

/* start with the first mbuf of the packet */
mbuf_len = rte_pktmbuf_data_len(mbuf);
@@ -207,17 +206,11 @@ enqueue_packet(struct virtio_net *dev, struct 
vhost_virtqueue *vq,
goto rollback;
} else if (is_mrg_rxbuf) {
/* start with the next desc chain */
-   used_idx_round = vq->last_used_idx
-   & (vq->size - 1);
-   vq->used->ring[used_idx_round].id =
+   vq->shadow_used_ring[vq->shadow_used_idx].id =
desc_chain_head;
-   vq->used->ring[used_idx_round].len =
+   vq->shadow_used_ring[vq->shadow_used_idx].len =
desc_chain_len;
-   vhost_log_used_vring(dev, vq,
-   offsetof(struct vring_used,
-   ring[used_idx_round]),
-   sizeof(vq->used->ring[
-   used_idx_round]));
+   vq->shadow_used_idx++;
vq->last_used_idx++;
extra_buffers++;
virtio_hdr->num_buffers++;
@@ -255,12 +248,9 @@ enqueue_packet(struct virtio_net *dev, struct 
vhost_virtqueue *vq,
desc_chain_len += copy_len;
}

-   used_idx_round = vq->last_used_idx & (vq->size - 1);
-   vq->used->ring[used_idx_round].id = desc_chain_head;
-   vq->used->ring[used_idx_round].len = desc_chain_len;
-   vhost_log_used_vring(dev, vq,
-   offsetof(struct vring_used, ring[used_idx_round]),
-   sizeof(vq->used->ring[used_idx_round]));
+   vq->shadow_used_ring[vq->shadow_used_idx].id = desc_chain_head;
+   vq->shadow_used_ring[vq->shadow_used_idx].len = desc_chain_len;
+   vq->shadow_used_idx++;
vq->last_used_idx++;

return 0;
@@ -275,6 +265,45 @@ error:
 }

 static inline void __attribute__((always_inline))
+update_used_ring(struct virtio_net *dev, struct vhost_virtqueue *vq,
+   uint32_t used_idx_start)
+{
+   if (used_idx_start + vq->shadow_used_idx < vq->size) {
+   rte_memcpy(&vq->used->ring[used_idx_start],
+   &vq->shadow_used_ring[0],
+   vq->shadow_used_idx *
+   sizeof(struct vring_used_elem));
+   vhost_log_used_vring(dev, vq,
+   offsetof(struct vring_used,
+   ring[used_idx_start]),
+   vq->shadow_used_idx *
+   sizeof(struct vring_used_elem));
+   } else {
+   uint32_t part_1 = vq->size - used_idx_start;
+   uint32_t part_2 = vq->shadow_used_idx - part_1;
+
+   rte_memcpy(&vq->used->ring[used_idx_start],
+   &vq->shadow_used_ring[0],
+   part_1 *
+   sizeof(struct vring_used_elem));
+   vhost_log_used_vring(dev, vq,
+   offsetof(struct vring_used,
+   ring[used_idx_start]),
+   part_1 *
+   sizeof(struct vring_used_elem));
+   rte_memcpy(&vq->used->ring[0],
+   &vq->shadow_used_ring[part_1],
+   part_2 *
+

[dpdk-dev] [PATCH v2 4/6] vhost: add desc prefetch

2016-08-18 Thread Zhihong Wang

This patch adds descriptor prefetch to hide cache access latency.

Signed-off-by: Zhihong Wang 
---
 lib/librte_vhost/vhost_rxtx.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 939957d..7db83d0 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -131,6 +131,11 @@ loop_check(struct vhost_virtqueue *vq, uint16_t avail_idx, 
uint32_t pkt_left)
if (pkt_left == 0 || avail_idx == vq->last_used_idx)
return 1;

+   /* prefetch the next desc */
+   if (pkt_left > 1 && avail_idx != vq->last_used_idx + 1)
+   rte_prefetch0(&vq->desc[vq->avail->ring[
+   (vq->last_used_idx + 1) & (vq->size - 1)]]);
+
return 0;
 }

-- 
2.7.4
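
The prefetch condition in the patch above can be looked at in isolation. The
following is a minimal sketch, not the patch itself: struct desc, next_slot(),
should_prefetch() and prefetch_next_desc() are hypothetical names, and the
GCC/Clang builtin __builtin_prefetch is swapped in for rte_prefetch0. One
deliberate difference, which is an assumption about the intent: the comparison
casts last_used_idx + 1 back to uint16_t so that it wraps at 65535, whereas the
uncast expression in the patch is evaluated as an int.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct vring_desc; only the layout matters here. */
struct desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/* Slot that follows last_used_idx in a ring whose size is a power of two. */
static inline uint16_t
next_slot(uint16_t last_used_idx, uint16_t size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	return (uint16_t)((last_used_idx + 1) & (size - 1));
}

/*
 * Prefetch only if more than one packet is left and the entry after the
 * current one has already been made available by the guest.
 */
static inline int
should_prefetch(uint32_t pkt_left, uint16_t avail_idx, uint16_t last_used_idx)
{
	return pkt_left > 1 && avail_idx != (uint16_t)(last_used_idx + 1);
}

/* Issue a read prefetch for the descriptor the next iteration will touch. */
static inline void
prefetch_next_desc(const struct desc *table, const uint16_t *avail_ring,
		uint16_t size, uint16_t avail_idx, uint16_t last_used_idx,
		uint32_t pkt_left)
{
	if (should_prefetch(pkt_left, avail_idx, last_used_idx))
		__builtin_prefetch(
			&table[avail_ring[next_slot(last_used_idx, size)]],
			0, 3);
}
```

A prefetch is only a hint, so the sketch keeps the decision (should_prefetch)
separate from the hint itself, which makes the index arithmetic easy to check
without measuring cache behaviour.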

[dpdk-dev] [PATCH v2 3/6] vhost: remove useless volatile

2016-08-18 Thread Zhihong Wang

This patch removes useless volatile attribute to allow compiler
optimization.

Signed-off-by: Zhihong Wang 
---
 lib/librte_vhost/vhost-net.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h
index 38593a2..51fdf3d 100644
--- a/lib/librte_vhost/vhost-net.h
+++ b/lib/librte_vhost/vhost-net.h
@@ -71,7 +71,7 @@ struct vhost_virtqueue {
uint32_t size;

/* Last index used on the available ring */
-   volatile uint16_t   last_used_idx;
+   uint16_t last_used_idx;
 #define VIRTIO_INVALID_EVENTFD (-1)
 #define VIRTIO_UNINITIALIZED_EVENTFD   (-2)

-- 
2.7.4

[dpdk-dev] [PATCH v2 2/6] vhost: remove obsolete

2016-08-18 Thread Zhihong Wang

This patch removes obsolete functions.

Signed-off-by: Zhihong Wang 
---
 lib/librte_vhost/vhost_rxtx.c | 408 --
 1 file changed, 408 deletions(-)

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 8e6d782..939957d 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -125,414 +125,6 @@ virtio_enqueue_offload(struct rte_mbuf *m_buf, struct 
virtio_net_hdr *net_hdr)
}
 }

-static inline void
-copy_virtio_net_hdr(struct virtio_net *dev, uint64_t desc_addr,
-   struct virtio_net_hdr_mrg_rxbuf hdr)
-{
-   if (dev->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf))
-   *(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr;
-   else
-   *(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr;
-}
-
-static inline int __attribute__((always_inline))
-copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
- struct rte_mbuf *m, uint16_t desc_idx)
-{
-   uint32_t desc_avail, desc_offset;
-   uint32_t mbuf_avail, mbuf_offset;
-   uint32_t cpy_len;
-   struct vring_desc *desc;
-   uint64_t desc_addr;
-   struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0};
-
-   desc = &vq->desc[desc_idx];
-   desc_addr = gpa_to_vva(dev, desc->addr);
-   /*
-* Checking of 'desc_addr' placed outside of 'unlikely' macro to avoid
-* performance issue with some versions of gcc (4.8.4 and 5.3.0) which
-* otherwise stores offset on the stack instead of in a register.
-*/
-   if (unlikely(desc->len < dev->vhost_hlen) || !desc_addr)
-   return -1;
-
-   rte_prefetch0((void *)(uintptr_t)desc_addr);
-
-   virtio_enqueue_offload(m, &virtio_hdr.hdr);
-   copy_virtio_net_hdr(dev, desc_addr, virtio_hdr);
-   vhost_log_write(dev, desc->addr, dev->vhost_hlen);
-   PRINT_PACKET(dev, (uintptr_t)desc_addr, dev->vhost_hlen, 0);
-
-   desc_offset = dev->vhost_hlen;
-   desc_avail  = desc->len - dev->vhost_hlen;
-
-   mbuf_avail  = rte_pktmbuf_data_len(m);
-   mbuf_offset = 0;
-   while (mbuf_avail != 0 || m->next != NULL) {
-   /* done with current mbuf, fetch next */
-   if (mbuf_avail == 0) {
-   m = m->next;
-
-   mbuf_offset = 0;
-   mbuf_avail  = rte_pktmbuf_data_len(m);
-   }
-
-   /* done with current desc buf, fetch next */
-   if (desc_avail == 0) {
-   if ((desc->flags & VRING_DESC_F_NEXT) == 0) {
-   /* Room in vring buffer is not enough */
-   return -1;
-   }
-   if (unlikely(desc->next >= vq->size))
-   return -1;
-
-   desc = &vq->desc[desc->next];
-   desc_addr = gpa_to_vva(dev, desc->addr);
-   if (unlikely(!desc_addr))
-   return -1;
-
-   desc_offset = 0;
-   desc_avail  = desc->len;
-   }
-
-   cpy_len = RTE_MIN(desc_avail, mbuf_avail);
-   rte_memcpy((void *)((uintptr_t)(desc_addr + desc_offset)),
-   rte_pktmbuf_mtod_offset(m, void *, mbuf_offset),
-   cpy_len);
-   vhost_log_write(dev, desc->addr + desc_offset, cpy_len);
-   PRINT_PACKET(dev, (uintptr_t)(desc_addr + desc_offset),
-cpy_len, 0);
-
-   mbuf_avail  -= cpy_len;
-   mbuf_offset += cpy_len;
-   desc_avail  -= cpy_len;
-   desc_offset += cpy_len;
-   }
-
-   return 0;
-}
-
-/**
- * This function adds buffers to the virtio devices RX virtqueue. Buffers can
- * be received from the physical port or from another virtio device. A packet
- * count is returned to indicate the number of packets that are succesfully
- * added to the RX queue. This function works when the mbuf is scattered, but
- * it doesn't support the mergeable feature.
- */
-static inline uint32_t __attribute__((always_inline))
-virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
- struct rte_mbuf **pkts, uint32_t count)
-{
-   struct vhost_virtqueue *vq;
-   uint16_t avail_idx, free_entries, start_idx;
-   uint16_t desc_indexes[MAX_PKT_BURST];
-   uint16_t used_idx;
-   uint32_t i;
-
-   LOG_DEBUG(VHOST_DATA, "(%d) %s\n", dev->vid, __func__);
-   if (unlikely(!is_valid_virt_queue_idx(queue_id, 0, dev->virt_qp_nb))) {
-   RTE_LOG(ERR, VHOST_DATA, "(%d) %s: invalid virtqueue idx %d.\n",
-   dev->vid, __func__, queue_id);
-   return 0;
-   }
-
-   vq = dev->virtqueue[queue_id];
-   if (unlikely(vq->enabled == 0))
-

[dpdk-dev] [PATCH v2 1/6] vhost: rewrite enqueue

2016-08-18 Thread Zhihong Wang

This patch implements the vhost logic from scratch into a single function
designed for high performance and better maintainability.

Signed-off-by: Zhihong Wang 
---
 lib/librte_vhost/vhost_rxtx.c | 212 --
 1 file changed, 205 insertions(+), 7 deletions(-)

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 08a73fd..8e6d782 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -91,7 +91,7 @@ is_valid_virt_queue_idx(uint32_t idx, int is_tx, uint32_t 
qp_nb)
return (is_tx ^ (idx & 1)) == 0 && idx < qp_nb * VIRTIO_QNUM;
 }

-static void
+static inline void __attribute__((always_inline))
 virtio_enqueue_offload(struct rte_mbuf *m_buf, struct virtio_net_hdr *net_hdr)
 {
if (m_buf->ol_flags & PKT_TX_L4_MASK) {
@@ -533,19 +533,217 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t 
queue_id,
return pkt_idx;
 }

+static inline uint32_t __attribute__((always_inline))
+loop_check(struct vhost_virtqueue *vq, uint16_t avail_idx, uint32_t pkt_left)
+{
+   if (pkt_left == 0 || avail_idx == vq->last_used_idx)
+   return 1;
+
+   return 0;
+}
+
+static inline uint32_t __attribute__((always_inline))
+enqueue_packet(struct virtio_net *dev, struct vhost_virtqueue *vq,
+   uint16_t avail_idx, struct rte_mbuf *mbuf,
+   uint32_t is_mrg_rxbuf)
+{
+   struct virtio_net_hdr_mrg_rxbuf *virtio_hdr;
+   struct vring_desc *desc;
+   uint64_t desc_host_write_addr = 0;
+   uint32_t desc_chain_head = 0;
+   uint32_t desc_chain_len = 0;
+   uint32_t desc_current = 0;
+   uint32_t desc_write_offset = 0;
+   uint32_t mbuf_len = 0;
+   uint32_t mbuf_len_left = 0;
+   uint32_t copy_len = 0;
+   uint32_t extra_buffers = 0;
+   uint32_t used_idx_round = 0;
+
+   /* start with the first mbuf of the packet */
+   mbuf_len = rte_pktmbuf_data_len(mbuf);
+   mbuf_len_left = mbuf_len;
+
+   /* get the current desc */
+   desc_current = vq->avail->ring[(vq->last_used_idx) & (vq->size - 1)];
+   desc_chain_head = desc_current;
+   desc = &vq->desc[desc_current];
+   desc_host_write_addr = gpa_to_vva(dev, desc->addr);
+   if (unlikely(!desc_host_write_addr))
+   goto error;
+
+   /* handle virtio header */
+   virtio_hdr = (struct virtio_net_hdr_mrg_rxbuf *)
+   (uintptr_t)desc_host_write_addr;
+   memset((void *)(uintptr_t)&(virtio_hdr->hdr),
+   0, dev->vhost_hlen);
+   virtio_enqueue_offload(mbuf, &(virtio_hdr->hdr));
+   vhost_log_write(dev, desc->addr, dev->vhost_hlen);
+   desc_write_offset = dev->vhost_hlen;
+   desc_chain_len = desc_write_offset;
+   desc_host_write_addr += desc_write_offset;
+   if (is_mrg_rxbuf)
+   virtio_hdr->num_buffers = 1;
+
+   /* start copy from mbuf to desc */
+   while (1) {
+   /* get the next mbuf if the current done */
+   if (!mbuf_len_left) {
+   if (mbuf->next) {
+   mbuf = mbuf->next;
+   mbuf_len = rte_pktmbuf_data_len(mbuf);
+   mbuf_len_left = mbuf_len;
+   } else
+   break;
+   }
+
+   /* get the next desc if the current done */
+   if (desc->len <= desc_write_offset) {
+   if (desc->flags & VRING_DESC_F_NEXT) {
+   /* go on with the current desc chain */
+   desc_write_offset = 0;
+   desc_current = desc->next;
+   desc = &vq->desc[desc_current];
+   desc_host_write_addr =
+   gpa_to_vva(dev, desc->addr);
+   if (unlikely(!desc_host_write_addr))
+   goto rollback;
+   } else if (is_mrg_rxbuf) {
+   /* start with the next desc chain */
+   used_idx_round = vq->last_used_idx
+   & (vq->size - 1);
+   vq->used->ring[used_idx_round].id =
+   desc_chain_head;
+   vq->used->ring[used_idx_round].len =
+   desc_chain_len;
+   vhost_log_used_vring(dev, vq,
+   offsetof(struct vring_used,
+   ring[used_idx_round]),
+   sizeof(vq->used->ring[
+   used_idx_round]));
+   vq->last_used_idx++;
+   extra_buffers++;
+

[dpdk-dev] [PATCH v2 0/6] vhost: optimize enqueue

2016-08-18 Thread Zhihong Wang

This patch set optimizes the vhost enqueue function.

It implements the vhost logic from scratch into a single function designed
for high performance and good maintainability, and improves CPU efficiency
significantly by optimizing cache access, which means:

 *  For fast frontends (e.g. DPDK virtio pmd), higher performance (maximum
throughput) can be achieved.

 *  For slow frontends (e.g. kernel virtio-net), better scalability can be
achieved: each vhost core can support more connections since it takes
fewer cycles to handle each single frontend.

The main optimization techniques are:

 1. Reorder code to reduce CPU pipeline stall cycles.

 2. Batch update the used ring for better efficiency.

 3. Prefetch descriptor to hide cache latency.

 4. Remove useless volatile attribute to allow compiler optimization.
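
Technique 2 above (batch update of the used ring) can be sketched roughly as
follows. This is an illustration under stated assumptions, not the code from
the series: struct used_elem and flush_shadow() are hypothetical names, plain
memcpy is swapped in for rte_memcpy, and the dirty-page logging done by
vhost_log_used_vring is left out. Entries are first collected in a private
shadow array and then copied to the shared ring in at most two block copies,
the second one only when the batch wraps past the end of the ring.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct vring_used_elem. */
struct used_elem {
	uint32_t id;
	uint32_t len;
};

/*
 * Copy `count` entries from the shadow array into a ring of `size` slots,
 * starting at slot `start` and wrapping around to slot 0 if needed.
 */
static inline void
flush_shadow(struct used_elem *ring, uint32_t size, uint32_t start,
		const struct used_elem *shadow, uint32_t count)
{
	assert(start < size && count <= size);

	if (start + count <= size) {
		/* The whole batch fits before the end of the ring. */
		memcpy(&ring[start], shadow, count * sizeof(*shadow));
	} else {
		/* Split the batch: tail of the ring first, then its head. */
		uint32_t part_1 = size - start;
		uint32_t part_2 = count - part_1;

		memcpy(&ring[start], shadow, part_1 * sizeof(*shadow));
		memcpy(&ring[0], &shadow[part_1], part_2 * sizeof(*shadow));
	}
}
```

For example, flushing 3 shadow entries into a 4-slot ring starting at slot 3
writes the first entry to slot 3 and the remaining two to slots 0 and 1.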

In the existing code there are 2 callbacks for vhost enqueue:

 *  virtio_dev_merge_rx for mrg_rxbuf turned on cases.

 *  virtio_dev_rx for mrg_rxbuf turned off cases.

The performance of the existing code is not optimal, especially when the
mrg_rxbuf feature is turned on. Also, having 2 separate functions increases
maintenance effort.

---
Changes in v2:

 1. Split the big function into several small ones

 2. Use multiple patches to explain each optimization

 3. Add comments

Zhihong Wang (6):
  vhost: rewrite enqueue
  vhost: remove obsolete
  vhost: remove useless volatile
  vhost: add desc prefetch
  vhost: batch update used ring
  vhost: optimize cache access

 lib/librte_vhost/vhost-net.h  |   6 +-
 lib/librte_vhost/vhost_rxtx.c | 582 +++---
 lib/librte_vhost/virtio-net.c |  15 +-
 3 files changed, 228 insertions(+), 375 deletions(-)

-- 
2.7.4
