[dpdk-dev] 答复: [PATCH] eal: map uio resources after hugepages when the base_virtaddr is configured.

2014-11-05 Thread XU Liang
I think the base_virtadd will be carefully selected by user when they need it. 
So maybe it's not a real problem. ?:>The real reason is I can't find a easy way 
to get the end address of hugepages. Can you give me some?suggestions 
?--Burakov, 
Anatoly ?2014?11?5?(???) 23:10?? 
?dev at dpdk.org RE: [dpdk-dev] 
[PATCH] eal: map uio resources after hugepages when the  base_virtaddr is 
configured.
I have a slight problems with this patch.

The base_virtaddr doesn't necessarily correspond to an address that everything 
gets mapped to. It's a "hint" of sorts, that may or may not be taken into 
account by mmap. Therefore we can't simply assume that if we requested a 
base-virtaddr, everything will get mapped at exactly that address. We also 
can't assume that hugepages will be ordered one after the other and occupy 
neatly all the contiguous virtual memory between base_virtaddr and 
base_virtaddr + internal_config.memory - there may be holes, for whatever 
reasons.

Also, 

Thanks,
Anatoly

-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of lxu
Sent: Wednesday, November 5, 2014 1:25 PM
To: dev at dpdk.org
Subject: [dpdk-dev] [PATCH] eal: map uio resources after hugepages when the 
base_virtaddr is configured.

---
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index 7e62266..bc7ed3a 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -289,6 +289,11 @@ pci_uio_map_resource(struct rte_pci_device *dev)
struct rte_pci_addr *loc = &dev->addr;
struct mapped_pci_resource *uio_res;
struct pci_map *maps;
+   static void * requested_addr = NULL;
+   if (internal_config.base_virtaddr && NULL == requested_addr) {
+   requested_addr = (uint8_t *) internal_config.base_virtaddr 
+   + internal_config.memory;
+   }

dev->intr_handle.fd = -1;
dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN; @@ -371,10 +376,12 @@ 
pci_uio_map_resource(struct rte_pci_device *dev)
if (maps[j].addr != NULL)
fail = 1;
else {
-   mapaddr = pci_map_resource(NULL, fd, 
(off_t)offset,
+   mapaddr = pci_map_resource(requested_addr, fd, 
(off_t)offset,
(size_t)maps[j].size);
if (mapaddr == NULL)
fail = 1;
+   else if (NULL != requested_addr)
+   requested_addr = (uint8_t *)mapaddr + 
maps[j].size;
}

if (fail) {
--
1.9.1


[dpdk-dev] [PATCH v4 0/3] Fix packet length issue

2014-11-05 Thread Thomas Monjalon
> > This patch set fix packet length issue in vhost app, and enhance code by
> > extracting a function to replace duplicated codes in one copy and zero copy
> > TX function.
> > 
> > -v4 chang:
> >  Check offset value and extra bytes inside packet buffer cross page 
> > boundary.
> > 
> > -v3 change:
> >  Extract a function to replace duplicated codes in one copy and zero copy 
> > TX function.
> > 
> > -v2 change:
> >  Update data length by plus offset in first segment instead of last segment.
> > 
> > -v1 change:
> >  Update the packet length by plus offset;
> >  Use macro to replace constant.
> > 
> > Changchun Ouyang (3):
> >   Fix packet length issue in vhost.
> >   Extract a function to replace duplicated codes in vhost.
> >   Check offset value in vhost
> > 
> >  examples/vhost/main.c | 142 
> > +++---
> >  1 file changed, 65 insertions(+), 77 deletions(-)
> > 
> > --
> > 1.8.4.2
> 
> Acked-by: Konstantin Ananyev 

Applied

Thanks
-- 
Thomas


[dpdk-dev] [PATCH] lib/librte_vhost: code style fixes

2014-11-05 Thread Thomas Monjalon
2014-11-05 21:55, Xie, Huawei:
> > 2014-11-05 21:21, Xie, Huawei:
> Btw, for alignment in structure,  I see different projects use different way,
> some use tab, and some use space(between type and var), do we have rules for 
> this?
> {
>   Typevar1
>   Typevar2
> }

I prefer using spaces for such alignments. But Linux style is to use a mix of
tab and space with tab being equivalent to 8 spaces.
We have to write clear rules about such things and have our own checkpatch.
For release 1.8, we still don't care about this level of details.
Please let's re-open this topic after the release.

-- 
Thomas


[dpdk-dev] [PATCH v3 00/10] split architecture specific operations

2014-11-05 Thread Thomas Monjalon
> > The set of patches split x86 architecture specific operations from DPDK and 
> > put
> > them to x86 arch directory.
> > This will make the adoption of DPDK much easier on other computer 
> > architecture.
> > For a new architecture, just add an architecture specific directory and
> > necessary building configuration files, then DPDK eal library can support 
> > it.
> >
> >
> > Reviewing patchset from Chao, I ended up modifying it along the way,
> > so here is a new iteration of this patchset.
> >
> > Changes since Chao v2 patchset :
> >
> > - added a preliminary patch for moving rte_atomic.h (for better readability)
> > - fixed documentation generation
> > - implemented a generic header for each arch specific header (cpuflags, 
> > memcpy,
> >prefetch were missing)
> > - removed C++ stuff from generic headers
> > - centralised all doxygen stuff in generic headers (no need to have 
> > duplicates)
> > - refactored rte_cycles functions
> > - moved vmware tsc stuff to arch rte_cycles.h headers
> > - finished x86 factorisation
> >
> >
> > Little summary of current state :
> >
> > - all applications continue to include the eal headers as before, these 
> > headers
> >are the arch-specific ones
> > - the arch specific headers always include the generic ones. The generic 
> > headers
> >contain the doxygen documentation and code common to all architectures
> > - a x86 architecture has been defined which handles both 32bits and 64bits
> >peculiarities
> >
> >
> > It builds fine for 32/64 bits (w and w/o "force intrinsics"), but I really 
> > would
> > like a lot of eyes on this (and I would say, especially, rte_cycles, 
> > rte_memcpy
> > and rte_cpuflags).
> > I still have some concerns about the use of intrinsics for architecture != 
> > x86
> > but I think Chao will be the best to look at this.
> >
> >
> Acked-by: Chao Zhu 

Acked-by: Thomas Monjalon 

Applied

Thanks for the big clean-up.
Now we are ready to accept new CPUs.

-- 
Thomas


[dpdk-dev] [PATCH] eal_pci.c: pci_scan_one: fix inaccurate NUMA node error comment

2014-11-05 Thread Thomas Monjalon
> > Signed-off-by: Matthew Hall 
> 
> Acked-by: Bruce Richardson 

Applied

Thanks
-- 
Thomas


[dpdk-dev] [PATCH] eal/bsd: fix return value when mapping device resources in secondary process

2014-11-05 Thread Thomas Monjalon
> On FreeBSD, when initializing a secondary process,
> EAL was complaining if there were ports not bound
> to nic_uio module, exiting the application, which
> should not happen, as this is expected behaviour,
> and not an error
> 
> Signed-off-by: Pablo de Lara 

Acked-by: Thomas Monjalon 

Applied

Thanks
-- 
Thomas


[dpdk-dev] [PATCH] mk: pass MODULE_CFLAGS to BSD module build system

2014-11-05 Thread Thomas Monjalon
> > When building shared libs (for both GCC and CLANG targets), -fPIC flag
> > has been added to CFLAGS and leaks to BSD module build system causing
> > the following error:
> > 
> > fatal error: error in backend: Cannot select: 0x802ad8010: i64 =
> > X86ISD::WrapperRIP 0x802ade110
> >   [ID=13]
> >   0x802ade110: i64 = TargetGlobalAddress 0
> > [TF=5] [ID=10]
> > 
> > Reset CFLAGS to MODULE_CFLAGS before building BSD module.
> > 
> > Signed-off-by: Sergio Gonzalez Monroy
> > 
> 
> Acked-by: Pablo de Lara 

Applied

Thanks
-- 
Thomas


[dpdk-dev] bifurcated driver

2014-11-05 Thread Zhou, Danny
Hi Thomas,

Thanks for sharing the links to ibverbs, I will take a close look at it and 
compare it to bifurcated driver. My take
after a rough review is that idea is very much similar, but bifurcated driver 
implementation is generic for any 
Ethernet device based on existing af_packet mechanism, with extension of 
exchanging the messages between 
user space and kernel space driver.

I have an internal document to summary the pros and cons of below solutions, 
except for ibvers, but 
will be adding it shortly.

- igb_uio
- uio_pci_generic
- VFIO
- bifurcated driver

Short answers to your questions:
>   - upstream status
Adding IOMMU based memory protection and generic descriptor description support 
now, into version 2 
kernel patches.

>   - usable with kernel netdev
af_packet based, and relevant patchset will be submitted to netdev for sure.

>   - usable in a vm
No, it does no coexist with SRIOV for number of reasons. but if you 
pass-through a PF to a VM, it works perfect.

>   - usable for Ethernet
It could work with all Ethernet NICs, as flow director is available and NIC 
driver support new net_ops to split off 
queue pairs for user space.

>   - hardware requirements
No specific hardware requirements. All mainstream NICs have multiple qpairs and 
flow director support. 

>   - security protection
Leverage IOMMU to provide memory protection on Intel platform. Other archs 
provide similar memory protection
mechanism, so we only use arch-agnostic DMA memory allocation APIs in kernel to 
support memory protection.

>   - performance
DPDK native performance on user space queues, as long as drop_en is enabled to 
avoid head-of-line blocking.

-Danny

> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Wednesday, November 05, 2014 9:01 PM
> To: Zhou, Danny
> Cc: dev at dpdk.org; Fastabend, John R
> Subject: Re: [dpdk-dev] bifurcated driver
> 
> Hi Danny,
> 
> 2014-10-31 17:36, O'driscoll, Tim:
> > Bifurcated Driver (Danny.Zhou at intel.com)
> 
> Thanks for the presentation of bifurcated driver during the community call.
> I asked if you looked at ibverbs and you wanted a link to check.
> The kernel module is here:
>   
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/infiniband/core
> The userspace library:
>   http://git.kernel.org/cgit/libs/infiniband/libibverbs.git
> 
> Extract from Kconfig:
> "
> config INFINIBAND_USER_ACCESS
>   tristate "InfiniBand userspace access (verbs and CM)"
>   select ANON_INODES
>   ---help---
> Userspace InfiniBand access support.  This enables the
> kernel side of userspace verbs and the userspace
> communication manager (CM).  This allows userspace processes
> to set up connections and directly access InfiniBand
> hardware for fast-path operations.  You will also need
> libibverbs, libibcm and a hardware driver library from
> .
> "
> 
> It seems to be close to the bifurcated driver needs.
> Not sure if it can solve the security issues if there is no dedicated MMU
> in the NIC.
> 
> I feel we should sum up pros and cons of
>   - igb_uio
>   - uio_pci_generic
>   - VFIO
>   - ibverbs
>   - bifurcated driver
> I suggest to consider these criterias:
>   - upstream status
>   - usable with kernel netdev
>   - usable in a vm
>   - usable for ethernet
>   - hardware requirements
>   - security protection
>   - performance
> 
> --
> Thomas


[dpdk-dev] [PATCH v2] ixgbe: fix icc issue with mbuf initializer

2014-11-05 Thread Thomas Monjalon
> When using Intel C++ compiler(icc) 14.0.1.106 or the older icc 13.x
> version, the mbuf initializer variable was not getting configured
> correctly, as the mb_def variable was not set correctly. This is due
> to an issue with icc (DPD200249565 which already been fixed in
> icc 14.0.2 and newer compiler release) where it incorrectly calculates
> the field offsets with initializers when zero-sized fields
> are used in a structure.
> To work around this, the code in ixgbe_rxq_vec_setup does not setup the
> fields using an initializer, but instead assigns the values individually
> in code
> NOTE: There is no performance impact to this change as the queue
> setup functions are not data-plane APIs, but are only used at app
> initialization.
> 
> Signed-off-by: Bruce Richardson 

Acked-by: Thomas Monjalon 

Applied

Thanks
-- 
Thomas


[dpdk-dev] [PATCH] i40e: fix build of VXLAN packet identification debug

2014-11-05 Thread Thomas Monjalon
> > The commit 15dbb63ef9e9f108e7dcd837b88234f27a1ec258 didn't compile,
> > if CONFIG_RTE_LIBRTE_I40E_DEBUG_DRIVER is enabled.
> > 
> > Signed-off-by: Choonho Son 
> 
> Acked-by: Pablo de Lara 

Applied

Thanks
-- 
Thomas


[dpdk-dev] [PATCH] app/test: pci_autotest test fails if there is any device bound to igb_uio driver

2014-11-05 Thread Thomas Monjalon
> Since commit a155d430119 ("support link bonding device initialization"),
> rte_eal_pci_probe() is called in rte_eal_init().
> pci_autotest called it to bind devices to the test_driver and test_driver2.
> Therefore, the function is called twice and devices already allocated
> will cause the test fail.
> 
> This patch solves that issue, unregistering all previous drivers before
> calling rte_eal_pci_probe() for the first time, so DPDK does not try
> to allocate data for the devices, binding them to their previous
> drivers again.
> 
> Signed-off-by: Pablo de Lara 

Applied

Thanks
-- 
Thomas


[dpdk-dev] [PATCH] lib/librte_vhost: code style fixes

2014-11-05 Thread Thomas Monjalon
2014-11-05 21:21, Xie, Huawei:
> Thomas:
> I checked before. checkpatch reports  9 warnings, "over 80 characters" and 
> "prefer pr_deubg".
> This code style fixes code style issue only, not the pr_debug/printk issue. 
> Thoughts?

Using pr_debug is a code style fix.

> Besides, I don't understand the MISORDERED_TYPE.

You should replace (long long unsigned) by (unsigned long long).

> > -Original Message-
> > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > Sent: Wednesday, November 05, 2014 1:42 PM
> > To: Xie, Huawei
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH] lib/librte_vhost: code style fixes
> > 
> > Hi Huawei,
> > 
> > checkpatch.pl reports some errors of types PREFER_PR_LEVEL
> > and MISORDERED_TYPE.
> > 
> > --
> > Thomas



[dpdk-dev] bifurcated driver

2014-11-05 Thread Zhou, Danny


From: Alex Markuze [mailto:a...@weka.io]
Sent: Wednesday, November 05, 2014 11:19 PM
To: Thomas Monjalon
Cc: Zhou, Danny; dev at dpdk.org; Fastabend, John R
Subject: Re: [dpdk-dev] bifurcated driver



On Wed, Nov 5, 2014 at 5:14 PM, Alex Markuze mailto:alex at 
weka.io>> wrote:
On Wed, Nov 5, 2014 at 3:00 PM, Thomas Monjalon mailto:thomas.monjalon at 6wind.com>> wrote:
Hi Danny,

2014-10-31 17:36, O'driscoll, Tim:
> Bifurcated Driver (Danny.Zhou at intel.com)

Thanks for the presentation of bifurcated driver during the community call.
I asked if you looked at ibverbs and you wanted a link to check.
The kernel module is here:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/infiniband/core
The userspace library:
http://git.kernel.org/cgit/libs/infiniband/libibverbs.git

Extract from Kconfig:
"
config INFINIBAND_USER_ACCESS
tristate "InfiniBand userspace access (verbs and CM)"
select ANON_INODES
---help---
  Userspace InfiniBand access support.  This enables the
  kernel side of userspace verbs and the userspace
  communication manager (CM).  This allows userspace processes
  to set up connections and directly access InfiniBand
  hardware for fast-path operations.  You will also need
  libibverbs, libibcm and a hardware driver library from
  .
"

It seems to be close to the bifurcated driver needs.
Not sure if it can solve the security issues if there is no dedicated MMU
in the NIC.

Mellanox NIC's and other  RDMA HW (Infiniband/RoCE/iWARP) have MTT units - 
memory translation units - a dedicated MMU. These are filled via an ibv_reg_mr 
sys calls - this creates a Process VM to physical/iova memory mapping in the 
NIC. Thus each process can access only its own memory via the NIC. This is the 
way RNIC*s resolve the security issue I'm not sure how standard intel nics 
could support this scheme.

DZ:  Intel NICs does not provide such a embedded memory translation unit, but 
Intel chipset supports IOMMU with a generic memory protection mechanism to 
provide physical/iova memory mapping for DMA transactions on any PCIe device, 
rather than NIC only.

There is already a 6wind PMD for mellanox Nics. I'm assuming this PMD is verbs 
based and behaves similar to the bifurcated driver proposed.
http://www.mellanox.com/page/press_release_item?id=979

DZ: is it open sourced for community to use? I guess answer is No. Also, that 
PMD should have ported majority of Mellanox kernel driver code to DPDK as lots 
of NIC control related code needed, while the bifurcated driver approach only 
needs to support minimum Mellanox NIC specific packet rx/tx routines to achieve 
the DPDK claimed high performance by using all DPDK performance optimization 
techniques, such as huge page, fixed-size packet buffer, zero-copy, PMD, etc. 
Kernel driver still remains NIC control, without porting it to DPDK.

One, thing that I don't understand (And will be happy if some one could shed 
some light on), is how does the NIC supposed do distinguish between packets 
that need to go to the kernel driver rings and packets going to user space 
rings.

DZ: it depends on user. User should use standard ethtool (see below examples) 
to enable flow director and distribute packets to kernel or user space owned rx 
queue, by specifying 5-tuple as well as destination rxq index. Flow director 
embedded in NIC does flow classification and distribution, rather than the 
software approach like DPDK KNI. If you argue SRIOV has similar rx/tx queue 
pair partition capability, I would say bifurcated driver approach provides much 
more flexibility than SRIOV, (e.g, variable number of qpairs allocation for 
user space, L3 5-tuple based flow classification and distribution rather than 
SRIOV? L2 classification based on MAC or VLAN)

ethtool -K ethX ntuple on   # enable flow director
ethtool -N ethX flow-type udp4 src-ip 0.0.0.0 action 0   # distribute udp 
packet wit source IP 0.0.0.0 to rx queue No.0

I feel we should sum up pros and cons of
- igb_uio
- uio_pci_generic
- VFIO
- ibverbs
- bifurcated driver
I suggest to consider these criterias:
- upstream status
- usable with kernel netdev
- usable in a vm
- usable for ethernet
- hardware requirements
- security protection
- performance
Regarding IBVERBS - I'm not sure how its relevant to future DPDK development , 
but this is the run down as I know It.
 This is a veteran package called OFED , or its counterpart Mellanox OFED.
    The kernel drivers are upstream
    The PCI dev stays in the kernels care trough out its life span
    SRIOV support exists, paravirt support exists only(AFAIK) as an Office 
of the CTO(VMware) project called vRDMA
    Eth/RoCE (RDMA over Converged Ethernet)/IB
   === HW === RDMA capable HW ONLY

[dpdk-dev] [PATCH] lib/librte_vhost: code style fixes

2014-11-05 Thread Xie, Huawei
> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Wednesday, November 05, 2014 2:25 PM
> To: Xie, Huawei
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] lib/librte_vhost: code style fixes
> 
> 2014-11-05 21:21, Xie, Huawei:
> > Thomas:
> > I checked before. checkpatch reports  9 warnings, "over 80 characters" and
> "prefer pr_deubg".
> > This code style fixes code style issue only, not the pr_debug/printk issue.
> Thoughts?
> 
> Using pr_debug is a code style fix.
> 
Ok, will also fix this.
If there are other code style issue you want to be fixed in this patch, please 
let me know.
Btw, for alignment in structure,  I see different projects use different way, 
some use tab, and some use space(between type and var), do we have rules for 
this?
{
Typevar1
Typevar2
}
> > Besides, I don't understand the MISORDERED_TYPE.
> 
> You should replace (long long unsigned) by (unsigned long long).
Check patch at my side doesn't report warnings about the patch. I remember I 
have also check against the source file. Will double check and fix those.
> 
> > > -Original Message-
> > > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > Sent: Wednesday, November 05, 2014 1:42 PM
> > > To: Xie, Huawei
> > > Cc: dev at dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH] lib/librte_vhost: code style fixes
> > >
> > > Hi Huawei,
> > >
> > > checkpatch.pl reports some errors of types PREFER_PR_LEVEL
> > > and MISORDERED_TYPE.
> > >
> > > --
> > > Thomas



[dpdk-dev] [PATCH v4 7/8] ethdev: support of multiple sizes of redirection table

2014-11-05 Thread Thomas Monjalon
2014-10-31 17:03, Helin Zhang:
>  #define ETH_RSS_RETA_SIZE_64  64
>  #define ETH_RSS_RETA_SIZE_128 128
>  #define ETH_RSS_RETA_SIZE_512 512

Are these values still needed?
Why 256 is forbidden?
Maybe that some comments are needed here.

> +#define RTE_RETA_GROUP_SIZE   64



[dpdk-dev] [PATCH] lib/librte_vhost: code style fixes

2014-11-05 Thread Thomas Monjalon
Hi Huawei,

checkpatch.pl reports some errors of types PREFER_PR_LEVEL
and MISORDERED_TYPE.

-- 
Thomas


[dpdk-dev] [PATCH] eal: map uio resources after hugepages when the base_virtaddr is configured.

2014-11-05 Thread lxu
---
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index 7e62266..bc7ed3a 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -289,6 +289,11 @@ pci_uio_map_resource(struct rte_pci_device *dev)
struct rte_pci_addr *loc = &dev->addr;
struct mapped_pci_resource *uio_res;
struct pci_map *maps;
+   static void * requested_addr = NULL;
+   if (internal_config.base_virtaddr && NULL == requested_addr) {
+   requested_addr = (uint8_t *) internal_config.base_virtaddr 
+   + internal_config.memory;
+   }

dev->intr_handle.fd = -1;
dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
@@ -371,10 +376,12 @@ pci_uio_map_resource(struct rte_pci_device *dev)
if (maps[j].addr != NULL)
fail = 1;
else {
-   mapaddr = pci_map_resource(NULL, fd, 
(off_t)offset,
+   mapaddr = pci_map_resource(requested_addr, fd, 
(off_t)offset,
(size_t)maps[j].size);
if (mapaddr == NULL)
fail = 1;
+   else if (NULL != requested_addr)
+   requested_addr = (uint8_t *)mapaddr + 
maps[j].size;
}

if (fail) {
-- 
1.9.1



[dpdk-dev] [PATCH] lib/librte_vhost: code style fixes

2014-11-05 Thread Xie, Huawei
Thomas:
I checked before. checkpatch reports  9 warnings, "over 80 characters" and 
"prefer pr_deubg".
This code style fixes code style issue only, not the pr_debug/printk issue. 
Thoughts?
Besides, I don't understand the MISORDERED_TYPE. 


> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Wednesday, November 05, 2014 1:42 PM
> To: Xie, Huawei
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] lib/librte_vhost: code style fixes
> 
> Hi Huawei,
> 
> checkpatch.pl reports some errors of types PREFER_PR_LEVEL
> and MISORDERED_TYPE.
> 
> --
> Thomas


[dpdk-dev] [PATCH v5 05/21] i40e: implement operations to add/delete flow director

2014-11-05 Thread De Lara Guarch, Pablo


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jingjing Wu
> Sent: Thursday, October 30, 2014 7:27 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v5 05/21] i40e: implement operations to
> add/delete flow director
> 
> Deal with two operations for flow director
>  - RTE_ETH_FILTER_ADD
>  - RTE_ETH_FILTER_DELETE
> Encode the flow inputs to programming packet.
> Sent the packet to filter programming queue and check status
> on the status report queue.
> 
> Signed-off-by: Jingjing Wu 
> ---
>  lib/librte_pmd_i40e/i40e_ethdev.c |   3 +
>  lib/librte_pmd_i40e/i40e_ethdev.h |   3 +
>  lib/librte_pmd_i40e/i40e_fdir.c   | 622
> ++
>  3 files changed, 628 insertions(+)
> 
> diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c
> b/lib/librte_pmd_i40e/i40e_ethdev.c
> index 8195e8a..fb43efb 100644
> --- a/lib/librte_pmd_i40e/i40e_ethdev.c
> +++ b/lib/librte_pmd_i40e/i40e_ethdev.c
> @@ -4577,6 +4577,7 @@ i40e_dev_filter_ctrl(struct rte_eth_dev *dev,
>enum rte_filter_op filter_op,
>void *arg)
>  {
> + struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data-
> >dev_private);
>   int ret = 0;
> 
>   if (dev == NULL)
> @@ -4585,6 +4586,8 @@ i40e_dev_filter_ctrl(struct rte_eth_dev *dev,
>   switch (filter_type) {
>   case RTE_ETH_FILTER_TUNNEL:
>   ret = i40e_tunnel_filter_handle(dev, filter_op, arg);

Missing break here?

> + case RTE_ETH_FILTER_FDIR:
> + ret = i40e_fdir_ctrl_func(pf, filter_op, arg);
>   break;
>   default:
>   PMD_DRV_LOG(WARNING, "Filter type (%d) not
> supported",



[dpdk-dev] [PATCH] Add user defined tag calculation callback to librte_distributor.

2014-11-05 Thread jigsaw
Hi Bruce,

OK understood. Then there's no real need to make any change.
But the question remains about this line:

http://dpdk.org/browse/dpdk/tree/lib/librte_distributor/rte_distributor.c#n285

new_tag = (next_mb->hash.rss | 1);

Why the logical OR is needed?

thx &
rgds,

-qinglai

On Wed, Nov 5, 2014 at 6:36 PM, Bruce Richardson  wrote:

> On Wed, Nov 05, 2014 at 05:11:51PM +0200, jigsaw wrote:
> > Hi Bruce,
> >
> > Thanks for reply.
> > The idea is triggered by real life use case, where the flow id is buried
> in
> > L3 payload. Deep packet inspection is one of the scenarios, tunneled pkts
> > is another.
> > However, only functionality is verified. Performance impact has not been
> > checked yet.
> >
> > To add distributor and another void * as params is nice.
> >
> > Your advice of extract tags in a row inspired me another solution, which
> is
> > to change the union hash inside rte_mbuf:
> >
> > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > index e8f9bfc..5b13c0b 100644
> > --- a/lib/librte_mbuf/rte_mbuf.h
> > +++ b/lib/librte_mbuf/rte_mbuf.h
> > @@ -185,6 +185,7 @@ struct rte_mbuf {
> > uint16_t id;
> > } fdir;   /**< Filter identifier if FDIR enabled
> */
> > uint32_t sched;   /**< Hierarchical scheduler */
> > +   uint32_t user;/**< User defined hash tag */
> > } hash;   /**< hash information */
> >
> > /* second cache line - fields only used in slow path or on TX */
> >
> > The new union field user is actually for documentation purpose only, coz
> > user application can set hash.rss value and have the same result.
> > Therefore, the user application is free to calculate the tag in burst
> mode
> > before calling rte_distributor_process.
> >
> > Then rte_distributor_process needs to read next_mb->hash.user.
> > Does it sounds better?
>
> What you propose is the exact original intent, though I did not try to add
> a new union member purely for documentation purposes. I had planned, but
> perhaps did not explain well enough, that the application would itself set
> up
> the tag as it thought best before passing packets to the distributor. I
> suspect
> that overloading the RSS field for this impeded that idea geting through.
>
> /Bruce
>
> >
> > I have another question: why the logical OR 1 is added to new_tag?
> >
> > thx &
> > rgds,
> > -qinglai
> >
> >
> >
> >
> >
> >
> >
> > On Wed, Nov 5, 2014 at 4:27 PM, Bruce Richardson <
> bruce.richardson at intel.com
> > > wrote:
> >
> > > On Wed, Nov 05, 2014 at 03:30:37PM +0200, Qinglai Xiao wrote:
> > > > User defined tag calculation has access to mbuf.
> > > > Default tag is RSS hash result.
> > > >
> > >
> > > Interesting idea.
> > > Did you investigate was there any performance improvement or regression
> > > comparing
> > > whether the callback was called per-packet as packets were dequeued for
> > > distribution
> > > (i.e. how you have things now in your patch), compared to calling
> > > the callback in a loop to extract the tags for all packets initially? I
> > > suspect
> > > there probably isn't much performance difference either way, but it
> may be
> > > worth
> > > checking.
> > > One other point, is that I think the callback to extract the tag should
> > > have
> > > additional parameters - at least one, if not two. I would suggest that
> the
> > > distributor pointer be passed in, as well as an arbitrary void *
> pointer.
> > >
> > > Regards,
> > > /Bruce
> > >
> > > > Signed-off-by: Qinglai Xiao 
> > > > ---
> > > >  app/test/test_distributor.c  |6 +++---
> > > >  app/test/test_distributor_perf.c |2 +-
> > > >  lib/librte_distributor/rte_distributor.c |   12 ++--
> > > >  lib/librte_distributor/rte_distributor.h |7 ++-
> > > >  4 files changed, 20 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/app/test/test_distributor.c
> b/app/test/test_distributor.c
> > > > index ce06436..6ea4943 100644
> > > > --- a/app/test/test_distributor.c
> > > > +++ b/app/test/test_distributor.c
> > > > @@ -452,7 +452,7 @@ int test_error_distributor_create_name(void)
> > > >   char *name = NULL;
> > > >
> > > >   d = rte_distributor_create(name, rte_socket_id(),
> > > > - rte_lcore_count() - 1);
> > > > + rte_lcore_count() - 1, NULL);
> > > >   if (d != NULL || rte_errno != EINVAL) {
> > > >   printf("ERROR: No error on create() with NULL name
> > > param\n");
> > > >   return -1;
> > > > @@ -467,7 +467,7 @@ int
> test_error_distributor_create_numworkers(void)
> > > >  {
> > > >   struct rte_distributor *d = NULL;
> > > >   d = rte_distributor_create("test_numworkers", rte_socket_id(),
> > > > - RTE_MAX_LCORE + 10);
> > > > + RTE_MAX_LCORE + 10, NULL);
> > > >   if (d != NULL || rte_errno != EINVAL) {
> > > >   printf("ERROR:

[dpdk-dev] eal_flags_autotest fails with tailq fully local

2014-11-05 Thread Thomas Monjalon
> Since release 1.7.1 and patchset "tailq fully local", the test
> eal_flags_autotest doesn't work anymore. It fails because it doesn't
> find any free hugepage (rte_memzone_reserve).
> 
> The interesting commits are:
>   http://dpdk.org/browse/dpdk/commit/?id=e3f3b68c6e42
>   http://dpdk.org/browse/dpdk/commit/?id=dd0024ccbc70
> The test works only when these patches are reverted.
> 
> Here is how to reproduce:
>   git revert e3f3b68c6e42
>   git revert dd0024ccbc70
>   make config T=x86_64-native-linuxapp-gcc
>   make
>   mkdir -p /mnt/huge
>   mount -t hugetlbfs nodev /mnt/huge
>   echo 9 > 
> /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
>   build/app/test -c 1 -n 1 -m 1
>   eal_flags_autotest
> 
> I'd like this regression to be fixed for the coming release.
> If someone has an idea, he's welcome :)

Thanks to Anatoly, it's now fixed: 
http://dpdk.org/browse/dpdk/commit/?id=b600409890

-- 
Thomas


[dpdk-dev] [PATCH] Fix regression for eal_flags_autotest introduced by tailq rework

2014-11-05 Thread Thomas Monjalon
2014-11-05 13:24, De Lara Guarch, Pablo:
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Anatoly Burakov
> > Sent: Wednesday, November 05, 2014 12:11 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH] Fix regression for eal_flags_autotest
> > introduced by tailq rework
> > 
> > As a result of moving tailq's into local memory, some tailq data
> > is now reserved in rte_malloc heaps (because it needs to be
> > shared across DPDK processes). The first thing DPDK initializes
> > is a log mempool, and since it creates a tailq, it reserves
> > space in rte_malloc heap before allocating the mempool itself.
> > By default, rte_malloc allocates way more space than is necessary,
> > so under some conditions (namely, overall memory available is low)
> > this results in malloc heap eating up so much memory that log
> > mempool is not able to allocate its memzone.
> > 
> > This patch fixes the unit tests to account for that change.
> > 
> > Signed-off-by: Anatoly Burakov 
> > ---
> 
> Acked-by: Pablo de Lara 

Tested-by: Thomas Monjalon 

echo 9 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
build/app/test -c 1 -n 1 -m 4

It works now. Thanks Anatoly!

Applied

Reminder for everyone: patches should be posted with --in-reply-to option
to insert the patch in the thread context of the mailing list.
It makes easier to follow discussions.
For a new version of a patchset, it's a good practice to reply to the cover
letter of the previous version.
For a bug fix (this case), it's a good idea to reply to the bug report.
So someone searching in the archives can find the fix he's looking for.

Thanks
-- 
Thomas


[dpdk-dev] [PATCH v4 3/3] vhost: Check offset value

2014-11-05 Thread Thomas Monjalon
2014-11-05 16:52, Xie, Huawei:
> Why don't we merge 1,2,3 patches?

Because it's simpler to understand small patches with a dedicated
explanation in the commit log of each patch.
Why do you want to merge them?

-- 
Thomas


[dpdk-dev] [PATCH v2] eal: add option --master-lcore

2014-11-05 Thread Thomas Monjalon
2014-11-05 11:54, Ananyev, Konstantin:
> From: Thomas Monjalon
> > +   long master_lcore;
> > +   char *parsing_end;
> > +   struct rte_config *cfg = rte_eal_get_configuration();
> > +
> > +   errno = 0;
> > +   master_lcore = strtol(arg, &parsing_end, 0);
> > +   if (errno || parsing_end == arg)
> > +   return -1;
> 
> Why not: "errno || parsing_end[0] != 0"
> ?
> Otherwise something like "1blah" would be considered as valid input.

Good point.

> > +   if (!(master_lcore >= 0 && master_lcore < RTE_MAX_LCORE))
> > +   return -1;
> 
> If negative values are not allowed, then why not:
> 
> unsigned long master_lcore;
> ...
> master_lcore = strtoul(...)
> ...
> if(master_clore > RTE_MAX_LCORE)
> return -1;   

Matter of taste. Your code is less explicit.
But it should be
if(master_clore >= RTE_MAX_LCORE)
Anyone else to vote for 1 solution or the other?

> > +   if (opt == OPT_MASTER_LCORE_NUM && !coremask_ok) {
> > +   RTE_LOG(ERR, EAL, "please specify the master lcore id"
> > +   "after specifying the coremask\n");
> > +   eal_usage(prgname);
> > +   return -1;
> > +   }
> > +
> 
> I don't really like an idea of introducing strict order between -c and  
> "--master-lcore..

Me too. And Aaron too :)

> Can we move check for coremask_ok/ and assignment of cfg->master_lcore out of 
>  
> while (getopt_long(...)) loop?
> 
> > ret = eal_parse_common_option(opt, optarg, &internal_config);
> > /* common parser is not happy */
> > if (ret < 0) {

Yes we should move the check outside of the loop.
First we should migrate all flags check in a common function for BSD and Linux.

Simon made the v1. I made the v2. Any volunteer for the v3?

-- 
Thomas


[dpdk-dev] [PATCH] eal: map uio resources after hugepages when --base_virtaddr is configured

2014-11-05 Thread XU Liang

When start a secondary process, we got error message "EAL: pci_map_resource(): 
cannot mmap(11, 0x77fba000, 0x2, 0x0): Bad file descriptor 
(0x77fb9000)"

The secondary process link difference shared libraries, so the 0x77fba000 
is used. 

We know the --base_virtaddr is designed for this situation for hugepages.

This patch map the device resouce into address that is after hugepages when 
--base_virtaddr is  configured.


[dpdk-dev] [PATCH] Add user defined tag calculation callback to librte_distributor.

2014-11-05 Thread jigsaw
Hi Konstantin,

I agree with you. A callback is not necessary. Pls refer to my previous
mail, which proposed a not-at-all-intrusive patch.
Pls let me know your concern.

thx &
rgds,
-qinglai

On Wed, Nov 5, 2014 at 5:13 PM, Ananyev, Konstantin <
konstantin.ananyev at intel.com> wrote:

>
>
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce Richardson
> > Sent: Wednesday, November 05, 2014 2:28 PM
> > To: Qinglai Xiao
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH] Add user defined tag calculation
> callback to librte_distributor.
> >
> > On Wed, Nov 05, 2014 at 03:30:37PM +0200, Qinglai Xiao wrote:
> > > User defined tag calculation has access to mbuf.
> > > Default tag is RSS hash result.
> > >
> >
> > Interesting idea.
> > Did you investigate was there any performance improvement or regression
> comparing
> > whether the callback was called per-packet as packets were dequeued for
> distribution
> > (i.e. how you have things now in your patch), compared to calling
> > the callback in a loop to extract the tags for all packets initially? I
> suspect
> > there probably isn't much performance difference either way, but it may
> be worth
> > checking.
> > One other point, is that I think the callback to extract the tag should
> have
> > additional parameters - at least one, if not two. I would suggest that
> the
> > distributor pointer be passed in, as well as an arbitrary void * pointer.
>
>
> Just wonder, why do you need a call-back?
> Why not just make rte_distributor_process() to accept an extra parameter:
> array of tags?
>
> rte_distributor_process(struct rte_distributor *d,
> struct rte_mbuf **mbufs, uint32_t *mbuf_tags, unsigned
> num_mbufs)
>
> ?
>
>
> >
> > Regards,
> > /Bruce
> >
> > > Signed-off-by: Qinglai Xiao 
> > > ---
> > >  app/test/test_distributor.c  |6 +++---
> > >  app/test/test_distributor_perf.c |2 +-
> > >  lib/librte_distributor/rte_distributor.c |   12 ++--
> > >  lib/librte_distributor/rte_distributor.h |7 ++-
> > >  4 files changed, 20 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
> > > index ce06436..6ea4943 100644
> > > --- a/app/test/test_distributor.c
> > > +++ b/app/test/test_distributor.c
> > > @@ -452,7 +452,7 @@ int test_error_distributor_create_name(void)
> > > char *name = NULL;
> > >
> > > d = rte_distributor_create(name, rte_socket_id(),
> > > -   rte_lcore_count() - 1);
> > > +   rte_lcore_count() - 1, NULL);
> > > if (d != NULL || rte_errno != EINVAL) {
> > > printf("ERROR: No error on create() with NULL name
> param\n");
> > > return -1;
> > > @@ -467,7 +467,7 @@ int test_error_distributor_create_numworkers(void)
> > >  {
> > > struct rte_distributor *d = NULL;
> > > d = rte_distributor_create("test_numworkers", rte_socket_id(),
> > > -   RTE_MAX_LCORE + 10);
> > > +   RTE_MAX_LCORE + 10, NULL);
> > > if (d != NULL || rte_errno != EINVAL) {
> > > printf("ERROR: No error on create() with num_workers >
> MAX\n");
> > > return -1;
> > > @@ -515,7 +515,7 @@ test_distributor(void)
> > >
> > > if (d == NULL) {
> > > d = rte_distributor_create("Test_distributor",
> rte_socket_id(),
> > > -   rte_lcore_count() - 1);
> > > +   rte_lcore_count() - 1, NULL);
> > > if (d == NULL) {
> > > printf("Error creating distributor\n");
> > > return -1;
> > > diff --git a/app/test/test_distributor_perf.c
> b/app/test/test_distributor_perf.c
> > > index b04864c..507e446 100644
> > > --- a/app/test/test_distributor_perf.c
> > > +++ b/app/test/test_distributor_perf.c
> > > @@ -227,7 +227,7 @@ test_distributor_perf(void)
> > >
> > > if (d == NULL) {
> > > d = rte_distributor_create("Test_perf", rte_socket_id(),
> > > -   rte_lcore_count() - 1);
> > > +   rte_lcore_count() - 1, NULL);
> > > if (d == NULL) {
> > > printf("Error creating distributor\n");
> > > return -1;
> > > diff --git a/lib/librte_distributor/rte_distributor.c
> b/lib/librte_distributor/rte_distributor.c
> > > index 585ff88..78c92bd 100644
> > > --- a/lib/librte_distributor/rte_distributor.c
> > > +++ b/lib/librte_distributor/rte_distributor.c
> > > @@ -97,6 +97,7 @@ struct rte_distributor {
> > > union rte_distributor_buffer bufs[RTE_MAX_LCORE];
> > >
> > > struct rte_distributor_returned_pkts returns;
> > > +   rte_distributor_tag_fn tag_cb;
> > >  };
> > >
> > >  TAILQ_HEAD(rte_distributor_list, rte_distributor);
> > > @@ -267,6 +268,7 @@ rte_distributor_process(struct rte_distributor *d,
> > > struct rte_mbuf *next_mb = NULL;
> > > int64_t next_value = 0;
> > >  

[dpdk-dev] bifurcated driver

2014-11-05 Thread Alex Markuze
On Wed, Nov 5, 2014 at 5:14 PM, Alex Markuze  wrote:

> On Wed, Nov 5, 2014 at 3:00 PM, Thomas Monjalon  > wrote:
>
>> Hi Danny,
>>
>> 2014-10-31 17:36, O'driscoll, Tim:
>> > Bifurcated Driver (Danny.Zhou at intel.com)
>>
>> Thanks for the presentation of bifurcated driver during the community
>> call.
>> I asked if you looked at ibverbs and you wanted a link to check.
>> The kernel module is here:
>>
>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/infiniband/core
>> The userspace library:
>> http://git.kernel.org/cgit/libs/infiniband/libibverbs.git
>>
>> Extract from Kconfig:
>> "
>> config INFINIBAND_USER_ACCESS
>> tristate "InfiniBand userspace access (verbs and CM)"
>> select ANON_INODES
>> ---help---
>>   Userspace InfiniBand access support.  This enables the
>>   kernel side of userspace verbs and the userspace
>>   communication manager (CM).  This allows userspace processes
>>   to set up connections and directly access InfiniBand
>>   hardware for fast-path operations.  You will also need
>>   libibverbs, libibcm and a hardware driver library from
>>   .
>> "
>>
>> It seems to be close to the bifurcated driver needs.
>> Not sure if it can solve the security issues if there is no dedicated MMU
>> in the NIC.
>>
>
> Mellanox NIC's and other  RDMA HW (Infiniband/RoCE/iWARP) have MTT units -
> memory translation units - a dedicated MMU. These are filled via an
> ibv_reg_mr sys calls - this creates a Process VM to physical/iova memory
> mapping in the NIC. Thus each process can access only its own memory via
> the NIC. This is the way RNIC*s resolve the security issue I'm not sure how
> standard intel nics could support this scheme.
>
> There is already a 6wind PMD for mellanox Nics. I'm assuming this PMD is
> verbs based and behaves similar to the bifurcated driver proposed.
> http://www.mellanox.com/page/press_release_item?id=979
>
> One, thing that I don't understand (And will be happy if some one could
> shed some light on), is how does the NIC supposed do distinguish between
> packets that need to go to the kernel driver rings and packets going to
> user space rings.
>
> I feel we should sum up pros and cons of
>> - igb_uio
>> - uio_pci_generic
>> - VFIO
>> - ibverbs
>> - bifurcated driver
>> I suggest to consider these criterias:
>> - upstream status
>> - usable with kernel netdev
>> - usable in a vm
>> - usable for ethernet
>> - hardware requirements
>> - security protection
>> - performance
>>
>> Regarding IBVERBS - I'm not sure how its relevant to future DPDK
> development , but this is the run down as I know It.
>  This is a veteran package called OFED , or its counterpart Mellanox OFED.
> The kernel drivers are upstream
> The PCI dev stays in the kernels care trough out its life span
> SRIOV support exists, paravirt support exists only(AFAIK) as an
> Office of the CTO(VMware) project called vRDMA
> Eth/RoCE (RDMA over Converged Ethernet)/IB
>=== HW === RDMA capable HW ONLY.
> Security is designed into RDMA HW
>
    Stellar performance - Favored by HPC.
>

*RNIC - RDMA (Remote DMA - iWARP/Infinibad/RoCE)capable NICs.

>
>
>> --
>> Thomas
>>
>
>


[dpdk-dev] bifurcated driver

2014-11-05 Thread Alex Markuze
On Wed, Nov 5, 2014 at 3:00 PM, Thomas Monjalon 
wrote:

> Hi Danny,
>
> 2014-10-31 17:36, O'driscoll, Tim:
> > Bifurcated Driver (Danny.Zhou at intel.com)
>
> Thanks for the presentation of bifurcated driver during the community call.
> I asked if you looked at ibverbs and you wanted a link to check.
> The kernel module is here:
>
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/infiniband/core
> The userspace library:
> http://git.kernel.org/cgit/libs/infiniband/libibverbs.git
>
> Extract from Kconfig:
> "
> config INFINIBAND_USER_ACCESS
> tristate "InfiniBand userspace access (verbs and CM)"
> select ANON_INODES
> ---help---
>   Userspace InfiniBand access support.  This enables the
>   kernel side of userspace verbs and the userspace
>   communication manager (CM).  This allows userspace processes
>   to set up connections and directly access InfiniBand
>   hardware for fast-path operations.  You will also need
>   libibverbs, libibcm and a hardware driver library from
>   .
> "
>
> It seems to be close to the bifurcated driver needs.
> Not sure if it can solve the security issues if there is no dedicated MMU
> in the NIC.
>

Mellanox NIC's and other  RDMA HW (Infiniband/RoCE/iWARP) have MTT units -
memory translation units - a dedicated MMU. These are filled via an
ibv_reg_mr sys calls - this creates a Process VM to physical/iova memory
mapping in the NIC. Thus each process can access only its own memory via
the NIC. This is the way RNICs resolve the security issue I'm not sure how
standard intel nics could support this scheme.

There is already a 6wind PMD for mellanox Nics. I'm assuming this PMD is
verbs based and behaves similar to the bifurcated driver proposed.
http://www.mellanox.com/page/press_release_item?id=979

One, thing that I don't understand (And will be happy if some one could
shed some light on), is how does the NIC supposed do distinguish between
packets that need to go to the kernel driver rings and packets going to
user space rings.

I feel we should sum up pros and cons of
> - igb_uio
> - uio_pci_generic
> - VFIO
> - ibverbs
> - bifurcated driver
> I suggest to consider these criterias:
> - upstream status
> - usable with kernel netdev
> - usable in a vm
> - usable for ethernet
> - hardware requirements
> - security protection
> - performance
>
> Regarding obverts - I'm not sure how its relevant to future DPDK
development , but this is the run down as I know It.
 This is a veteran package called OFED , or its counterpart Mellanox OFED.
    The kernel drivers are upstream
    The PCI dev stays in the kernels care trough out its life span
    SRIOV support exists, paravirt support exists only(AFAIK) as an
Office of the CTO(VMware) project called vRDMA
    Eth/RoCE (RDMA over Converged Ethernet)/IB
   === HW === RDMA capable HW ONLY.
    Security is designed into RDMA HW
    Stellar performance - Favored by HPC.



> --
> Thomas
>


[dpdk-dev] [PATCH] Add user defined tag calculation callback to librte_distributor.

2014-11-05 Thread jigsaw
Hi Bruce,

Thanks for reply.
The idea is triggered by real life use case, where the flow id is buried in
L3 payload. Deep packet inspection is one of the scenarios, tunneled pkts
is another.
However, only functionality is verified. Performance impact has not been
checked yet.

To add distributor and another void * as params is nice.

Your advice of extract tags in a row inspired me another solution, which is
to change the union hash inside rte_mbuf:

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index e8f9bfc..5b13c0b 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -185,6 +185,7 @@ struct rte_mbuf {
uint16_t id;
} fdir;   /**< Filter identifier if FDIR enabled */
uint32_t sched;   /**< Hierarchical scheduler */
+   uint32_t user;/**< User defined hash tag */
} hash;   /**< hash information */

/* second cache line - fields only used in slow path or on TX */

The new union field user is actually for documentation purpose only, coz
user application can set hash.rss value and have the same result.
Therefore, the user application is free to calculate the tag in burst mode
before calling rte_distributor_process.

Then rte_distributor_process needs to read next_mb->hash.user.
Does it sounds better?

I have another question: why the logical OR 1 is added to new_tag?

thx &
rgds,
-qinglai







On Wed, Nov 5, 2014 at 4:27 PM, Bruce Richardson  wrote:

> On Wed, Nov 05, 2014 at 03:30:37PM +0200, Qinglai Xiao wrote:
> > User defined tag calculation has access to mbuf.
> > Default tag is RSS hash result.
> >
>
> Interesting idea.
> Did you investigate was there any performance improvement or regression
> comparing
> whether the callback was called per-packet as packets were dequeued for
> distribution
> (i.e. how you have things now in your patch), compared to calling
> the callback in a loop to extract the tags for all packets initially? I
> suspect
> there probably isn't much performance difference either way, but it may be
> worth
> checking.
> One other point, is that I think the callback to extract the tag should
> have
> additional parameters - at least one, if not two. I would suggest that the
> distributor pointer be passed in, as well as an arbitrary void * pointer.
>
> Regards,
> /Bruce
>
> > Signed-off-by: Qinglai Xiao 
> > ---
> >  app/test/test_distributor.c  |6 +++---
> >  app/test/test_distributor_perf.c |2 +-
> >  lib/librte_distributor/rte_distributor.c |   12 ++--
> >  lib/librte_distributor/rte_distributor.h |7 ++-
> >  4 files changed, 20 insertions(+), 7 deletions(-)
> >
> > diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
> > index ce06436..6ea4943 100644
> > --- a/app/test/test_distributor.c
> > +++ b/app/test/test_distributor.c
> > @@ -452,7 +452,7 @@ int test_error_distributor_create_name(void)
> >   char *name = NULL;
> >
> >   d = rte_distributor_create(name, rte_socket_id(),
> > - rte_lcore_count() - 1);
> > + rte_lcore_count() - 1, NULL);
> >   if (d != NULL || rte_errno != EINVAL) {
> >   printf("ERROR: No error on create() with NULL name
> param\n");
> >   return -1;
> > @@ -467,7 +467,7 @@ int test_error_distributor_create_numworkers(void)
> >  {
> >   struct rte_distributor *d = NULL;
> >   d = rte_distributor_create("test_numworkers", rte_socket_id(),
> > - RTE_MAX_LCORE + 10);
> > + RTE_MAX_LCORE + 10, NULL);
> >   if (d != NULL || rte_errno != EINVAL) {
> >   printf("ERROR: No error on create() with num_workers >
> MAX\n");
> >   return -1;
> > @@ -515,7 +515,7 @@ test_distributor(void)
> >
> >   if (d == NULL) {
> >   d = rte_distributor_create("Test_distributor",
> rte_socket_id(),
> > - rte_lcore_count() - 1);
> > + rte_lcore_count() - 1, NULL);
> >   if (d == NULL) {
> >   printf("Error creating distributor\n");
> >   return -1;
> > diff --git a/app/test/test_distributor_perf.c
> b/app/test/test_distributor_perf.c
> > index b04864c..507e446 100644
> > --- a/app/test/test_distributor_perf.c
> > +++ b/app/test/test_distributor_perf.c
> > @@ -227,7 +227,7 @@ test_distributor_perf(void)
> >
> >   if (d == NULL) {
> >   d = rte_distributor_create("Test_perf", rte_socket_id(),
> > - rte_lcore_count() - 1);
> > + rte_lcore_count() - 1, NULL);
> >   if (d == NULL) {
> >   printf("Error creating distributor\n");
> >   return -1;
> > diff --git a/lib/librte_distributor/rte_distributor.c
> b/lib/librte_distributor/rte_distributor.c
> > index 585ff88..78c92bd 100644
> > -

[dpdk-dev] [PATCH] lib/librte_vhost: code style fixes

2014-11-05 Thread Xie, Huawei
:(.
Resent done. Please drop this patch.

> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Wednesday, November 05, 2014 2:10 AM
> To: Xie, Huawei
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] lib/librte_vhost: code style fixes
> 
> Hi Huawei,
> 
> Please set a Signed-off in your patch.
> 
> 2014-11-05 12:42, Huawei Xie:
> > This patch fixes code style issues and refines some comments in vhost 
> > library.
> >
> >
> > ---
> >  lib/librte_vhost/eventfd_link/eventfd_link.c | 244 ++---
> >  lib/librte_vhost/eventfd_link/eventfd_link.h | 127 ++-
> >  lib/librte_vhost/rte_virtio_net.h|   3 +-
> >  lib/librte_vhost/vhost-net-cdev.c| 187 +---
> >  lib/librte_vhost/vhost_rxtx.c|  13 +-
> >  lib/librte_vhost/virtio-net.c| 317 
> > +--
> >  6 files changed, 494 insertions(+), 397 deletions(-)
> >
> > diff --git a/lib/librte_vhost/eventfd_link/eventfd_link.c
> b/lib/librte_vhost/eventfd_link/eventfd_link.c
> > index fc0653a..542ec2c 100644
> > --- a/lib/librte_vhost/eventfd_link/eventfd_link.c
> > +++ b/lib/librte_vhost/eventfd_link/eventfd_link.c
> > @@ -1,26 +1,26 @@
> >  /*-
> > - *  * GPL LICENSE SUMMARY
> > - *  *
> > - *  *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> > - *  *
> > - *  *   This program is free software; you can redistribute it and/or 
> > modify
> > - *  *   it under the terms of version 2 of the GNU General Public License 
> > as
> > - *  *   published by the Free Software Foundation.
> > - *  *
> > - *  *   This program is distributed in the hope that it will be useful, but
> > - *  *   WITHOUT ANY WARRANTY; without even the implied warranty of
> > - *  *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> GNU
> > - *  *   General Public License for more details.
> > - *  *
> > - *  *   You should have received a copy of the GNU General Public License
> > - *  *   along with this program; if not, write to the Free Software
> > - *  *   Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 
> > 02110-1301
> USA.
> > - *  *   The full GNU General Public License is included in this 
> > distribution
> > - *  *   in the file called LICENSE.GPL.
> > - *  *
> > - *  *   Contact Information:
> > - *  *   Intel Corporation
> > - *   */
> > + * GPL LICENSE SUMMARY
> > + *
> > + *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> > + *
> > + *   This program is free software; you can redistribute it and/or modify
> > + *   it under the terms of version 2 of the GNU General Public License as
> > + *   published by the Free Software Foundation.
> > + *
> > + *   This program is distributed in the hope that it will be useful, but
> > + *   WITHOUT ANY WARRANTY; without even the implied warranty of
> > + *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> GNU
> > + *   General Public License for more details.
> > + *
> > + *   You should have received a copy of the GNU General Public License
> > + *   along with this program; if not, write to the Free Software
> > + *   Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 
> > USA.
> > + *   The full GNU General Public License is included in this distribution
> > + *   in the file called LICENSE.GPL.
> > + *
> > + *   Contact Information:
> > + *   Intel Corporation
> > + */
> >
> >  #include 
> >  #include 
> > @@ -42,15 +42,15 @@
> >   * get_files_struct is copied from fs/file.c
> >   */
> >  struct files_struct *
> > -get_files_struct (struct task_struct *task)
> > +get_files_struct(struct task_struct *task)
> >  {
> > struct files_struct *files;
> >
> > -   task_lock (task);
> > +   task_lock(task);
> > files = task->files;
> > if (files)
> > -   atomic_inc (&files->count);
> > -   task_unlock (task);
> > +   atomic_inc(&files->count);
> > +   task_unlock(task);
> >
> > return files;
> >  }
> > @@ -59,17 +59,15 @@ get_files_struct (struct task_struct *task)
> >   * put_files_struct is extracted from fs/file.c
> >   */
> >  void
> > -put_files_struct (struct files_struct *files)
> > +put_files_struct(struct files_struct *files)
> >  {
> > -   if (atomic_dec_and_test (&files->count))
> > -   {
> > -   BUG ();
> > -   }
> > +   if (atomic_dec_and_test(&files->count))
> > +   BUG();
> >  }
> >
> >
> >  static long
> > -eventfd_link_ioctl (struct file *f, unsigned int ioctl, unsigned long arg)
> > +eventfd_link_ioctl(struct file *f, unsigned int ioctl, unsigned long arg)
> >  {
> > void __user *argp = (void __user *) arg;
> > struct task_struct *task_target = NULL;
> > @@ -78,96 +76,88 @@ eventfd_link_ioctl (struct file *f, unsigned int ioctl,
> unsigned long arg)
> > struct fdtable *fdt;
> > struct eventfd_copy eventfd_copy;
> >
> > -   switch (ioctl)
> > -   {
> > -   case EVENTFD_COPY:
> > -   if (copy_fro

[dpdk-dev] [PATCH v4 3/3] vhost: Check offset value

2014-11-05 Thread Xie, Huawei


> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Wednesday, November 05, 2014 10:01 AM
> To: Xie, Huawei
> Cc: dev at dpdk.org; Ouyang, Changchun
> Subject: Re: [dpdk-dev] [PATCH v4 3/3] vhost: Check offset value
> 
> 2014-11-05 16:52, Xie, Huawei:
> > Why don't we merge 1,2,3 patches?
> 
> Because it's simpler to understand small patches with a dedicated
> explanation in the commit log of each patch.
> Why do you want to merge them?
> 
> --
> Thomas
Got it, so we prefer each patch fixes one dedicated thing.


[dpdk-dev] [PATCH v4 3/3] vhost: Check offset value

2014-11-05 Thread Xie, Huawei


> -Original Message-
> From: Ouyang, Changchun
> Sent: Wednesday, November 05, 2014 12:11 AM
> To: dev at dpdk.org
> Cc: Xie, Huawei; Ananyev, Konstantin; Cao, Waterman; Ouyang, Changchun
> Subject: [PATCH v4 3/3] vhost: Check offset value
> 
> This patch checks the packet length offset value, and checks if the extra 
> bytes
> inside buffer
> cross page boundary.
> 
> Signed-off-by: Changchun Ouyang 
> ---
>  examples/vhost/main.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/examples/vhost/main.c b/examples/vhost/main.c
> index 2916313..a93f7a0 100644
> --- a/examples/vhost/main.c
> +++ b/examples/vhost/main.c
> @@ -1110,7 +1110,8 @@ virtio_tx_route(struct vhost_dev *vdev, struct
> rte_mbuf *m, uint16_t vlan_tag)
>   }
> 
>   if (vm2vm_mode == VM2VM_HARDWARE) {
> - if (find_local_dest(dev, m, &offset, &vlan_tag) != 0) {
> + if (find_local_dest(dev, m, &offset, &vlan_tag) != 0 ||
> + offset > rte_pktmbuf_tailroom(m)) {
>   rte_pktmbuf_free(m);
>   return;
>   }
> @@ -1896,7 +1897,9 @@ virtio_dev_tx_zcp(struct virtio_net *dev)
> 
>   /* Buffer address translation. */
>   buff_addr = gpa_to_vva(dev, desc->addr);
> - phys_addr = gpa_to_hpa(vdev, desc->addr, desc->len,
> &addr_type);
> + /* Need check extra VLAN_HLEN size for inserting VLAN tag */
> + phys_addr = gpa_to_hpa(vdev, desc->addr, desc->len +
> VLAN_HLEN,
> + &addr_type);
> 
>   if (likely(packet_success < (free_entries - 1)))
>   /* Prefetch descriptor index. */
> --
> 1.8.4.2
Why don't we merge 1,2,3 patches?


[dpdk-dev] [PATCH v4 1/3] vhost: Fix packet length issue

2014-11-05 Thread Xie, Huawei
> -Original Message-
> From: Ouyang, Changchun
> Sent: Wednesday, November 05, 2014 12:11 AM
> To: dev at dpdk.org
> Cc: Xie, Huawei; Ananyev, Konstantin; Cao, Waterman; Ouyang, Changchun
> Subject: [PATCH v4 1/3] vhost: Fix packet length issue
> 
> As HW vlan strip will reduce the packet length by minus length of vlan tag,
> so it need restore the packet length by plus it.
> 
> Signed-off-by: Changchun Ouyang 
> ---
>  examples/vhost/main.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/examples/vhost/main.c b/examples/vhost/main.c
> index 57ef464..5ca8dce 100644
> --- a/examples/vhost/main.c
> +++ b/examples/vhost/main.c
> @@ -1078,7 +1078,13 @@ virtio_tx_route(struct vhost_dev *vdev, struct
> rte_mbuf *m, uint16_t vlan_tag)
>   rte_pktmbuf_free(m);
>   return;
>   }
> - offset = 4;
> +
> + /*
> +  * HW vlan strip will reduce the packet length
> +  * by minus length of vlan tag, so need restore
> +  * the packet length by plus it.
> +  */
> + offset = VLAN_HLEN;
>   vlan_tag =
>   (uint16_t)
>   vlan_tags[(uint16_t)dev_ll->vdev->dev-
> >device_fh];
> @@ -1102,8 +1108,10 @@ virtio_tx_route(struct vhost_dev *vdev, struct
> rte_mbuf *m, uint16_t vlan_tag)
>   len = tx_q->len;
> 
>   m->ol_flags = PKT_TX_VLAN_PKT;
> - /*FIXME: offset*/
> +
>   m->data_len += offset;
> + m->pkt_len += offset;
> +
>   m->vlan_tci = vlan_tag;
> 
>   tx_q->m_table[len] = m;
> --
> 1.8.4.2
Only one thing, I feel "by minus/plus" has grammar problem. :).
Acked-by: Huawei Xie .


[dpdk-dev] [PATCH] Add user defined tag calculation callback to librte_distributor.

2014-11-05 Thread Bruce Richardson
On Wed, Nov 05, 2014 at 05:11:51PM +0200, jigsaw wrote:
> Hi Bruce,
> 
> Thanks for reply.
> The idea is triggered by real life use case, where the flow id is buried in
> L3 payload. Deep packet inspection is one of the scenarios, tunneled pkts
> is another.
> However, only functionality is verified. Performance impact has not been
> checked yet.
> 
> To add distributor and another void * as params is nice.
> 
> Your advice of extract tags in a row inspired me another solution, which is
> to change the union hash inside rte_mbuf:
> 
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index e8f9bfc..5b13c0b 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -185,6 +185,7 @@ struct rte_mbuf {
> uint16_t id;
> } fdir;   /**< Filter identifier if FDIR enabled */
> uint32_t sched;   /**< Hierarchical scheduler */
> +   uint32_t user;/**< User defined hash tag */
> } hash;   /**< hash information */
> 
> /* second cache line - fields only used in slow path or on TX */
> 
> The new union field user is actually for documentation purpose only, coz
> user application can set hash.rss value and have the same result.
> Therefore, the user application is free to calculate the tag in burst mode
> before calling rte_distributor_process.
> 
> Then rte_distributor_process needs to read next_mb->hash.user.
> Does it sounds better?

What you propose is the exact original intent, though I did not try to add
a new union member purely for documentation purposes. I had planned, but
perhaps did not explain well enough, that the application would itself set up
the tag as it thought best before passing packets to the distributor. I suspect
that overloading the RSS field for this impeded that idea geting through.

/Bruce

> 
> I have another question: why the logical OR 1 is added to new_tag?
> 
> thx &
> rgds,
> -qinglai
> 
> 
> 
> 
> 
> 
> 
> On Wed, Nov 5, 2014 at 4:27 PM, Bruce Richardson  intel.com
> > wrote:
> 
> > On Wed, Nov 05, 2014 at 03:30:37PM +0200, Qinglai Xiao wrote:
> > > User defined tag calculation has access to mbuf.
> > > Default tag is RSS hash result.
> > >
> >
> > Interesting idea.
> > Did you investigate was there any performance improvement or regression
> > comparing
> > whether the callback was called per-packet as packets were dequeued for
> > distribution
> > (i.e. how you have things now in your patch), compared to calling
> > the callback in a loop to extract the tags for all packets initially? I
> > suspect
> > there probably isn't much performance difference either way, but it may be
> > worth
> > checking.
> > One other point, is that I think the callback to extract the tag should
> > have
> > additional parameters - at least one, if not two. I would suggest that the
> > distributor pointer be passed in, as well as an arbitrary void * pointer.
> >
> > Regards,
> > /Bruce
> >
> > > Signed-off-by: Qinglai Xiao 
> > > ---
> > >  app/test/test_distributor.c  |6 +++---
> > >  app/test/test_distributor_perf.c |2 +-
> > >  lib/librte_distributor/rte_distributor.c |   12 ++--
> > >  lib/librte_distributor/rte_distributor.h |7 ++-
> > >  4 files changed, 20 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
> > > index ce06436..6ea4943 100644
> > > --- a/app/test/test_distributor.c
> > > +++ b/app/test/test_distributor.c
> > > @@ -452,7 +452,7 @@ int test_error_distributor_create_name(void)
> > >   char *name = NULL;
> > >
> > >   d = rte_distributor_create(name, rte_socket_id(),
> > > - rte_lcore_count() - 1);
> > > + rte_lcore_count() - 1, NULL);
> > >   if (d != NULL || rte_errno != EINVAL) {
> > >   printf("ERROR: No error on create() with NULL name
> > param\n");
> > >   return -1;
> > > @@ -467,7 +467,7 @@ int test_error_distributor_create_numworkers(void)
> > >  {
> > >   struct rte_distributor *d = NULL;
> > >   d = rte_distributor_create("test_numworkers", rte_socket_id(),
> > > - RTE_MAX_LCORE + 10);
> > > + RTE_MAX_LCORE + 10, NULL);
> > >   if (d != NULL || rte_errno != EINVAL) {
> > >   printf("ERROR: No error on create() with num_workers >
> > MAX\n");
> > >   return -1;
> > > @@ -515,7 +515,7 @@ test_distributor(void)
> > >
> > >   if (d == NULL) {
> > >   d = rte_distributor_create("Test_distributor",
> > rte_socket_id(),
> > > - rte_lcore_count() - 1);
> > > + rte_lcore_count() - 1, NULL);
> > >   if (d == NULL) {
> > >   printf("Error creating distributor\n");
> > >   return -1;
> > > diff --git a/app/test/test_distributor_perf.c
> > b/app/test/test_distr

[dpdk-dev] [PATCH] i40e: fix build of VXLAN packet identification debug

2014-11-05 Thread De Lara Guarch, Pablo


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Choonho Son
> Sent: Wednesday, November 05, 2014 3:16 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] i40e: fix build of VXLAN packet identification
> debug
> 
> The commit 15dbb63ef9e9f108e7dcd837b88234f27a1ec258 didn't compile,
> if CONFIG_RTE_LIBRTE_I40E_DEBUG_DRIVER is enabled.
> 
> Signed-off-by: Choonho Son 
> ---

Acked-by: Pablo de Lara 


[dpdk-dev] 答复:答复: [PATCH] eal: map uio resources after hugepages when the base_virtaddr is configured.

2014-11-05 Thread Burakov, Anatoly
Ah, that makes sense. So you?re actually hitting the very issue I?m concerned 
about when mapping purely with base-virtaddr (a library is mapped into where 
you?re trying to map your UIO resources).

Well, as I said, you can try and walk the memsegs and work out the biggest 
end-address of hugepage memory. That?s the easiest way I can think of.

Thanks,
Anatoly

From: XU Liang [mailto:liang...@cinfotech.cn]
Sent: Wednesday, November 5, 2014 4:10 PM
To: Burakov, Anatoly; dev at dpdk.org
Subject: ??[dpdk-dev] [PATCH] eal: map uio resources after hugepages when 
the base_virtaddr is configured.

I have a multiple processes application. When start a secondary process, we got 
error message "EAL: pci_map_resource(): cannot mmap(11, 0x77fba000, 
0x2, 0x0): Bad file descriptor (0x77fb9000)".

The secondary process link difference shared libraries, so the address 
0x77fba000 is used.

--
Burakov, Anatoly mailto:anatoly.burakov at 
intel.com>>
?2014?11?5?(???) 23:59
?? mailto:liang.xu at cinfotech.cn>>?dev at 
dpdk.org mailto:dev at dpdk.org>>
RE: ???[dpdk-dev] [PATCH] eal: map uio resources after hugepages when the 
base_virtaddr is configured.

Hi Liang

Yes it is a problem. Even if it was carefully selected by user, nothing stops 
the DPDK application from mapping something into where you?re trying to map 
your UIO devices. Plus, this changes the default behavior where a wrong 
base-virtaddr leads to a failure to initialize, rather than simply using a 
different address (remember that pci_map_resource fails if it cannot map the 
resource at the exact address you requested).

A very crude way of finding out where hugepages end would be to walk the 
hugepage memory (walk through memsegs and note the maximum start addr + length 
of that memseg).

Could you perhaps explain what is the problem that you?re trying to solve with 
this? I can?t think of a situation where the location of UIO maps would matter, 
to be honest.

Thanks,
Anatoly

From: XU Liang [mailto:liang...@cinfotech.cn]
Sent: Wednesday, November 5, 2014 3:49 PM
To: Burakov, Anatoly; dev at dpdk.org
Subject: ???[dpdk-dev] [PATCH] eal: map uio resources after hugepages when the 
base_virtaddr is configured.

I think the base_virtadd will be carefully selected by user when they need it. 
So maybe it's not a real problem.  :>

The real reason is I can't find a easy way to get the end address of hugepages. 
Can you give me some suggestions ?
--
Burakov, Anatoly mailto:anatoly.burakov at 
intel.com>>
?2014?11?5?(???) 23:10
?? mailto:liang.xu at cinfotech.cn>>?dev at 
dpdk.org mailto:dev at dpdk.org>>
RE: [dpdk-dev] [PATCH] eal: map uio resources after hugepages when the 
base_virtaddr is configured.

I have a slight problems with this patch.

The base_virtaddr doesn't necessarily correspond to an address that everything 
gets mapped to. It's a "hint" of sorts, that may or may not be taken into 
account by mmap. Therefore we can't simply assume that if we requested a 
base-virtaddr, everything will get mapped at exactly that address. We also 
can't assume that hugepages will be ordered one after the other and occupy 
neatly all the contiguous virtual memory between base_virtaddr and 
base_virtaddr + internal_config.memory - there may be holes, for whatever 
reasons.

Also,

Thanks,
Anatoly

-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of lxu
Sent: Wednesday, November 5, 2014 1:25 PM
To: dev at dpdk.org
Subject: [dpdk-dev] [PATCH] eal: map uio resources after hugepages when the 
base_virtaddr is configured.

---
lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 9 -
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index 7e62266..bc7ed3a 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -289,6 +289,11 @@ pci_uio_map_resource(struct rte_pci_device *dev)
struct rte_pci_addr *loc = &dev->addr;
struct mapped_pci_resource *uio_res;
struct pci_map *maps;
+ static void * requested_addr = NULL;
+ if (internal_config.base_virtaddr && NULL == requested_addr) {
+ requested_addr = (uint8_t *) internal_config.base_virtaddr
+ + internal_config.memory;
+ }

dev->intr_handle.fd = -1;
dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN; @@ -371,10 +376,12 @@ 
pci_uio_map_resource(struct rte_pci_device *dev)
if (maps[j].addr != NULL)
fail = 1;
else {
- mapaddr = pci_map_resource(NULL, fd, (off_t)offset,
+ mapaddr = pci_map_resource(requested_addr, fd, (off_t)offset,
(size_t)maps[j].size);
if (mapaddr == NULL)
fail = 1;
+ else if (NULL != requested_addr)
+ requested_addr = (uint8_t *)mapaddr + maps[j].size;
}

if (fail) {
--
1.9.1

[dpdk-dev] 答复: [PATCH] eal: map uio resources after hugepages when the base_virtaddr is configured.

2014-11-05 Thread Burakov, Anatoly
Hi Liang

Yes it is a problem. Even if it was carefully selected by user, nothing stops 
the DPDK application from mapping something into where you?re trying to map 
your UIO devices. Plus, this changes the default behavior where a wrong 
base-virtaddr leads to a failure to initialize, rather than simply using a 
different address (remember that pci_map_resource fails if it cannot map the 
resource at the exact address you requested).

A very crude way of finding out where hugepages end would be to walk the 
hugepage memory (walk through memsegs and note the maximum start addr + length 
of that memseg).

Could you perhaps explain what is the problem that you?re trying to solve with 
this? I can?t think of a situation where the location of UIO maps would matter, 
to be honest.

Thanks,
Anatoly

From: XU Liang [mailto:liang...@cinfotech.cn]
Sent: Wednesday, November 5, 2014 3:49 PM
To: Burakov, Anatoly; dev at dpdk.org
Subject: ???[dpdk-dev] [PATCH] eal: map uio resources after hugepages when the 
base_virtaddr is configured.

I think the base_virtadd will be carefully selected by user when they need it. 
So maybe it's not a real problem.  :>

The real reason is I can't find a easy way to get the end address of hugepages. 
Can you give me some suggestions ?
--
Burakov, Anatoly mailto:anatoly.burakov at 
intel.com>>
?2014?11?5?(???) 23:10
?? mailto:liang.xu at cinfotech.cn>>?dev at 
dpdk.org mailto:dev at dpdk.org>>
RE: [dpdk-dev] [PATCH] eal: map uio resources after hugepages when the 
base_virtaddr is configured.

I have a slight problems with this patch.

The base_virtaddr doesn't necessarily correspond to an address that everything 
gets mapped to. It's a "hint" of sorts, that may or may not be taken into 
account by mmap. Therefore we can't simply assume that if we requested a 
base-virtaddr, everything will get mapped at exactly that address. We also 
can't assume that hugepages will be ordered one after the other and occupy 
neatly all the contiguous virtual memory between base_virtaddr and 
base_virtaddr + internal_config.memory - there may be holes, for whatever 
reasons.

Also,

Thanks,
Anatoly

-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of lxu
Sent: Wednesday, November 5, 2014 1:25 PM
To: dev at dpdk.org
Subject: [dpdk-dev] [PATCH] eal: map uio resources after hugepages when the 
base_virtaddr is configured.

---
lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 9 -
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index 7e62266..bc7ed3a 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -289,6 +289,11 @@ pci_uio_map_resource(struct rte_pci_device *dev)
struct rte_pci_addr *loc = &dev->addr;
struct mapped_pci_resource *uio_res;
struct pci_map *maps;
+ static void * requested_addr = NULL;
+ if (internal_config.base_virtaddr && NULL == requested_addr) {
+ requested_addr = (uint8_t *) internal_config.base_virtaddr
+ + internal_config.memory;
+ }

dev->intr_handle.fd = -1;
dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN; @@ -371,10 +376,12 @@ 
pci_uio_map_resource(struct rte_pci_device *dev)
if (maps[j].addr != NULL)
fail = 1;
else {
- mapaddr = pci_map_resource(NULL, fd, (off_t)offset,
+ mapaddr = pci_map_resource(requested_addr, fd, (off_t)offset,
(size_t)maps[j].size);
if (mapaddr == NULL)
fail = 1;
+ else if (NULL != requested_addr)
+ requested_addr = (uint8_t *)mapaddr + maps[j].size;
}

if (fail) {
--
1.9.1


[dpdk-dev] [PATCH] Add user defined tag calculation callback to librte_distributor.

2014-11-05 Thread Qinglai Xiao
User defined tag calculation has access to mbuf.
Default tag is RSS hash result.

Signed-off-by: Qinglai Xiao 
---
 app/test/test_distributor.c  |6 +++---
 app/test/test_distributor_perf.c |2 +-
 lib/librte_distributor/rte_distributor.c |   12 ++--
 lib/librte_distributor/rte_distributor.h |7 ++-
 4 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
index ce06436..6ea4943 100644
--- a/app/test/test_distributor.c
+++ b/app/test/test_distributor.c
@@ -452,7 +452,7 @@ int test_error_distributor_create_name(void)
char *name = NULL;

d = rte_distributor_create(name, rte_socket_id(),
-   rte_lcore_count() - 1);
+   rte_lcore_count() - 1, NULL);
if (d != NULL || rte_errno != EINVAL) {
printf("ERROR: No error on create() with NULL name param\n");
return -1;
@@ -467,7 +467,7 @@ int test_error_distributor_create_numworkers(void)
 {
struct rte_distributor *d = NULL;
d = rte_distributor_create("test_numworkers", rte_socket_id(),
-   RTE_MAX_LCORE + 10);
+   RTE_MAX_LCORE + 10, NULL);
if (d != NULL || rte_errno != EINVAL) {
printf("ERROR: No error on create() with num_workers > MAX\n");
return -1;
@@ -515,7 +515,7 @@ test_distributor(void)

if (d == NULL) {
d = rte_distributor_create("Test_distributor", rte_socket_id(),
-   rte_lcore_count() - 1);
+   rte_lcore_count() - 1, NULL);
if (d == NULL) {
printf("Error creating distributor\n");
return -1;
diff --git a/app/test/test_distributor_perf.c b/app/test/test_distributor_perf.c
index b04864c..507e446 100644
--- a/app/test/test_distributor_perf.c
+++ b/app/test/test_distributor_perf.c
@@ -227,7 +227,7 @@ test_distributor_perf(void)

if (d == NULL) {
d = rte_distributor_create("Test_perf", rte_socket_id(),
-   rte_lcore_count() - 1);
+   rte_lcore_count() - 1, NULL);
if (d == NULL) {
printf("Error creating distributor\n");
return -1;
diff --git a/lib/librte_distributor/rte_distributor.c 
b/lib/librte_distributor/rte_distributor.c
index 585ff88..78c92bd 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor.c
@@ -97,6 +97,7 @@ struct rte_distributor {
union rte_distributor_buffer bufs[RTE_MAX_LCORE];

struct rte_distributor_returned_pkts returns;
+   rte_distributor_tag_fn tag_cb;
 };

 TAILQ_HEAD(rte_distributor_list, rte_distributor);
@@ -267,6 +268,7 @@ rte_distributor_process(struct rte_distributor *d,
struct rte_mbuf *next_mb = NULL;
int64_t next_value = 0;
uint32_t new_tag = 0;
+   rte_distributor_tag_fn tag_cb = d->tag_cb;
unsigned ret_start = d->returns.start,
ret_count = d->returns.count;

@@ -282,7 +284,11 @@ rte_distributor_process(struct rte_distributor *d,
next_mb = mbufs[next_idx++];
next_value = (((int64_t)(uintptr_t)next_mb)
<< RTE_DISTRIB_FLAG_BITS);
-   new_tag = (next_mb->hash.rss | 1);
+   if (tag_cb) {
+   new_tag = tag_cb(next_mb);
+   } else {
+   new_tag = (next_mb->hash.rss | 1);
+   }

uint32_t match = 0;
unsigned i;
@@ -401,7 +407,8 @@ rte_distributor_clear_returns(struct rte_distributor *d)
 struct rte_distributor *
 rte_distributor_create(const char *name,
unsigned socket_id,
-   unsigned num_workers)
+   unsigned num_workers,
+   rte_distributor_tag_fn tag_cb)
 {
struct rte_distributor *d;
struct rte_distributor_list *distributor_list;
@@ -435,6 +442,7 @@ rte_distributor_create(const char *name,
d = mz->addr;
snprintf(d->name, sizeof(d->name), "%s", name);
d->num_workers = num_workers;
+   d->tag_cb = tag_cb;

rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
TAILQ_INSERT_TAIL(distributor_list, d, next);
diff --git a/lib/librte_distributor/rte_distributor.h 
b/lib/librte_distributor/rte_distributor.h
index ec0d74a..844d325 100644
--- a/lib/librte_distributor/rte_distributor.h
+++ b/lib/librte_distributor/rte_distributor.h
@@ -52,6 +52,9 @@ extern "C" {

 struct rte_distributor;

+typedef uint32_t (*rte_distributor_tag_fn)(struct rte_mbuf *);
+/**< User defined tag calculation function */
+
 /**
  * Function to create a new distributor instance
  *
@@ -65,12 +68,14 @@ s

[dpdk-dev] Ports not detected by IGB_UIO in DPDK 1.7.1 in QEMU_KVM environment

2014-11-05 Thread Manoj Viswanath
Hi,

I have a DPDK application running on QEMU-KVM environment using DPDK 1.6.0.
I am trying to port the same to DPDK version 1.7.1.

I am using Virt-manager GUI to assign e1000 emulated port to the VM. This
works fine in DPDK 1.6.0. The device is identified by IGB_UIO and
initialized by my application as expected.

However in case of DPDK 1.7.1, the emulated e1000 devices do not seem to be
recognized.
Following is my analysis:

1. The API pci_get_uio_dev() is returning ERROR. This is called from
pci_uio_map_resource() in the flow of PCI PROBE [rte_eal_pci_probe()].

2. Due to this, the PCI device is not getting mapped to the correct driver
(EM
? Driver?
).

3. The reason for the error in [1] appears to be that "uio" sub-directory
doesn't seem to be correctly created for interfaces assigned to this VM.

4. Upon further analysis i found that IGB_UIO probe function
["igbuio_pci_probe()"] is not getting triggered indicating the port has *not
been assigned* to the IGB_UIO.

Kindly refer to the attachments:-
- "Output of sys-bus-pci-devices" - indicating "uio" subdirectory not
created for PCI devices in case of DPDK 1.7.1
- "Output of lspci -v" - indicating device not bound to driver in case of
DPDK 1.7.1
- IGB_UIO init log snippet - indicating PCI devices not detected and
initialized by IGB_UIO in case of DPDK 1.7.1
- CONFIG file used for DPDK compilation

Not sure what has changed between 1.6.0 and 1.7.1 which is impacting this.

Could someone throw light in this regard as to what i may be missing ?

Thanks in advance.

Regards,
Manoj
-- next part --
--
[A] IGB UIO INIT LOGS - VM running DPDK 1.6.0
--

Oct 20 05:14:31 localhost kernel: Use MSIX interrupt by default
Oct 20 05:14:31 localhost kernel: Use MSIX interrupt by default

Oct 20 05:14:31 localhost kernel: igb_uio :00:04.0: setting latency timer to
 64
Oct 20 05:14:31 localhost kernel: fail to enable pci msix, or not enough msix en
tries
Oct 20 05:14:31 localhost kernel: fail to enable pci msix, or not enough msix en
tries
Oct 20 05:14:31 localhost kernel: uio device registered with irq b  
-> PCI device bound to driver
Oct 20 05:14:31 localhost kernel: uio device registered with irq b

Oct 20 05:14:31 localhost kernel: igb_uio :00:08.0: setting latency timer to
 64
Oct 20 05:14:31 localhost kernel: fail to enable pci msix, or not enough msix en
tries
Oct 20 05:14:31 localhost kernel: fail to enable pci msix, or not enough msix en
tries
Oct 20 05:14:31 localhost kernel: uio device registered with irq b  
-> PCI device bound to driver
Oct 20 05:14:31 localhost kernel: uio device registered with irq b

--
[B] IGB UIO INIT LOGS - VM running DPDK 1.7.1
--

Oct 20 05:10:40 localhost kernel: igb_uio: Use MSIX interrupt by default
Oct 20 05:10:40 localhost kernel: igb_uio: Use MSIX interrupt by default
=> No output for PCI initialization by IGB_UIO 
-- next part --
-
[VM-2 running DPDK 1.6.0]
-
00:08.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controll
er (rev 03)
Subsystem: Red Hat, Inc Device 1100
Flags: bus master, fast devsel, latency 0, IRQ 11
Memory at f20a (32-bit, non-prefetchable) [size=128K]
I/O ports at c140 [size=64]
Expansion ROM at f20c [disabled] [size=128K]
Kernel driver in use: igb_uio=> Driver binding succesfull
Kernel modules: igb_uio ==> Driver binding succesfull

-
[VM-1 running DPDK 1.7.1]
-
00:04.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controll
er (rev 03)
Subsystem: Red Hat, Inc Device 1100
Flags: fast devsel, IRQ 11
Memory at f204 (32-bit, non-prefetchable) [size=128K]
I/O ports at c080 [size=64]
Expansion ROM at f206 [disabled] [size=128K]
===> Missing driver binding info

-- next part --

@@
@ On VM-1 running DPDK 1.7.1 @
@@

bash-4.2# lspci | grep 82540EM
00:04.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet 
Controller (rev 03)
00:05.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet 
Controller (rev 03)

##SYS-BUS
bash-4.2# ls /sys/bus/pci/devices/   
:00:00.0  :00:01.2  :00:03.0  :00:06.0
:00:01.0  :00:01.3  :00:04.0  :00:07.0
:00:01.1  :00:02.0  :00:05.0  :00:08.0

bash-4.2# ls /sys/bus/pci/devices/\:00\:04.0/
broken_parity_status  local_cpulist resource0
class local_cpusresource1
configmodalias  rom
consistent_dma_mask

[dpdk-dev] [PATCH] Add user defined tag calculation callback to librte_distributor.

2014-11-05 Thread Ananyev, Konstantin


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce Richardson
> Sent: Wednesday, November 05, 2014 2:28 PM
> To: Qinglai Xiao
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] Add user defined tag calculation callback to 
> librte_distributor.
> 
> On Wed, Nov 05, 2014 at 03:30:37PM +0200, Qinglai Xiao wrote:
> > User defined tag calculation has access to mbuf.
> > Default tag is RSS hash result.
> >
> 
> Interesting idea.
> Did you investigate was there any performance improvement or regression 
> comparing
> whether the callback was called per-packet as packets were dequeued for 
> distribution
> (i.e. how you have things now in your patch), compared to calling
> the callback in a loop to extract the tags for all packets initially? I 
> suspect
> there probably isn't much performance difference either way, but it may be 
> worth
> checking.
> One other point, is that I think the callback to extract the tag should have
> additional parameters - at least one, if not two. I would suggest that the
> distributor pointer be passed in, as well as an arbitrary void * pointer.


Just wonder, why do you need a call-back?
Why not just make rte_distributor_process() to accept an extra parameter: array 
of tags?

rte_distributor_process(struct rte_distributor *d,
struct rte_mbuf **mbufs, uint32_t *mbuf_tags, unsigned 
num_mbufs)

?


> 
> Regards,
> /Bruce
> 
> > Signed-off-by: Qinglai Xiao 
> > ---
> >  app/test/test_distributor.c  |6 +++---
> >  app/test/test_distributor_perf.c |2 +-
> >  lib/librte_distributor/rte_distributor.c |   12 ++--
> >  lib/librte_distributor/rte_distributor.h |7 ++-
> >  4 files changed, 20 insertions(+), 7 deletions(-)
> >
> > diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
> > index ce06436..6ea4943 100644
> > --- a/app/test/test_distributor.c
> > +++ b/app/test/test_distributor.c
> > @@ -452,7 +452,7 @@ int test_error_distributor_create_name(void)
> > char *name = NULL;
> >
> > d = rte_distributor_create(name, rte_socket_id(),
> > -   rte_lcore_count() - 1);
> > +   rte_lcore_count() - 1, NULL);
> > if (d != NULL || rte_errno != EINVAL) {
> > printf("ERROR: No error on create() with NULL name param\n");
> > return -1;
> > @@ -467,7 +467,7 @@ int test_error_distributor_create_numworkers(void)
> >  {
> > struct rte_distributor *d = NULL;
> > d = rte_distributor_create("test_numworkers", rte_socket_id(),
> > -   RTE_MAX_LCORE + 10);
> > +   RTE_MAX_LCORE + 10, NULL);
> > if (d != NULL || rte_errno != EINVAL) {
> > printf("ERROR: No error on create() with num_workers > MAX\n");
> > return -1;
> > @@ -515,7 +515,7 @@ test_distributor(void)
> >
> > if (d == NULL) {
> > d = rte_distributor_create("Test_distributor", rte_socket_id(),
> > -   rte_lcore_count() - 1);
> > +   rte_lcore_count() - 1, NULL);
> > if (d == NULL) {
> > printf("Error creating distributor\n");
> > return -1;
> > diff --git a/app/test/test_distributor_perf.c 
> > b/app/test/test_distributor_perf.c
> > index b04864c..507e446 100644
> > --- a/app/test/test_distributor_perf.c
> > +++ b/app/test/test_distributor_perf.c
> > @@ -227,7 +227,7 @@ test_distributor_perf(void)
> >
> > if (d == NULL) {
> > d = rte_distributor_create("Test_perf", rte_socket_id(),
> > -   rte_lcore_count() - 1);
> > +   rte_lcore_count() - 1, NULL);
> > if (d == NULL) {
> > printf("Error creating distributor\n");
> > return -1;
> > diff --git a/lib/librte_distributor/rte_distributor.c 
> > b/lib/librte_distributor/rte_distributor.c
> > index 585ff88..78c92bd 100644
> > --- a/lib/librte_distributor/rte_distributor.c
> > +++ b/lib/librte_distributor/rte_distributor.c
> > @@ -97,6 +97,7 @@ struct rte_distributor {
> > union rte_distributor_buffer bufs[RTE_MAX_LCORE];
> >
> > struct rte_distributor_returned_pkts returns;
> > +   rte_distributor_tag_fn tag_cb;
> >  };
> >
> >  TAILQ_HEAD(rte_distributor_list, rte_distributor);
> > @@ -267,6 +268,7 @@ rte_distributor_process(struct rte_distributor *d,
> > struct rte_mbuf *next_mb = NULL;
> > int64_t next_value = 0;
> > uint32_t new_tag = 0;
> > +   rte_distributor_tag_fn tag_cb = d->tag_cb;
> > unsigned ret_start = d->returns.start,
> > ret_count = d->returns.count;
> >
> > @@ -282,7 +284,11 @@ rte_distributor_process(struct rte_distributor *d,
> > next_mb = mbufs[next_idx++];
> > next_value = (((int64_t)(uintptr_t)next_mb)
> > << RTE_DISTRIB_FLAG_BITS);
> > -   new_tag = (next_mb->h

[dpdk-dev] [PATCH v4 3/3] vhost: Check offset value

2014-11-05 Thread Ouyang Changchun
This patch checks the packet length offset value, and checks if the extra bytes 
inside buffer
cross page boundary.

Signed-off-by: Changchun Ouyang 
---
 examples/vhost/main.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 2916313..a93f7a0 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -1110,7 +1110,8 @@ virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf 
*m, uint16_t vlan_tag)
}

if (vm2vm_mode == VM2VM_HARDWARE) {
-   if (find_local_dest(dev, m, &offset, &vlan_tag) != 0) {
+   if (find_local_dest(dev, m, &offset, &vlan_tag) != 0 ||
+   offset > rte_pktmbuf_tailroom(m)) {
rte_pktmbuf_free(m);
return;
}
@@ -1896,7 +1897,9 @@ virtio_dev_tx_zcp(struct virtio_net *dev)

/* Buffer address translation. */
buff_addr = gpa_to_vva(dev, desc->addr);
-   phys_addr = gpa_to_hpa(vdev, desc->addr, desc->len, &addr_type);
+   /* Need check extra VLAN_HLEN size for inserting VLAN tag */
+   phys_addr = gpa_to_hpa(vdev, desc->addr, desc->len + VLAN_HLEN,
+   &addr_type);

if (likely(packet_success < (free_entries - 1)))
/* Prefetch descriptor index. */
-- 
1.8.4.2



[dpdk-dev] [PATCH v4 2/3] vhost: Remove duplicated codes

2014-11-05 Thread Ouyang Changchun
Extract a function to replace duplicated codes in one copy and zero copy TX 
function.

Signed-off-by: Changchun Ouyang 
---
 examples/vhost/main.c | 139 +-
 1 file changed, 58 insertions(+), 81 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 5ca8dce..2916313 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -1040,6 +1040,57 @@ virtio_tx_local(struct vhost_dev *vdev, struct rte_mbuf 
*m)
 }

 /*
+ * Check if the destination MAC of a packet is one local VM,
+ * and get its vlan tag, and offset if it is.
+ */
+static inline int __attribute__((always_inline))
+find_local_dest(struct virtio_net *dev, struct rte_mbuf *m,
+   uint32_t *offset, uint16_t *vlan_tag)
+{
+   struct virtio_net_data_ll *dev_ll = ll_root_used;
+   struct ether_hdr *pkt_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
+
+   while (dev_ll != NULL) {
+   if ((dev_ll->vdev->ready == DEVICE_RX)
+   && ether_addr_cmp(&(pkt_hdr->d_addr),
+   &dev_ll->vdev->mac_address)) {
+   /*
+* Drop the packet if the TX packet is
+* destined for the TX device.
+*/
+   if (dev_ll->vdev->dev->device_fh == dev->device_fh) {
+   LOG_DEBUG(VHOST_DATA,
+   "(%"PRIu64") TX: Source and destination"
+   " MAC addresses are the same. Dropping "
+   "packet.\n",
+   dev_ll->vdev->dev->device_fh);
+   return -1;
+   }
+
+   /*
+* HW vlan strip will reduce the packet length
+* by minus length of vlan tag, so need restore
+* the packet length by plus it.
+*/
+   *offset = VLAN_HLEN;
+   *vlan_tag =
+   (uint16_t)
+   vlan_tags[(uint16_t)dev_ll->vdev->dev->device_fh];
+
+   LOG_DEBUG(VHOST_DATA,
+   "(%"PRIu64") TX: pkt to local VM device id:"
+   "(%"PRIu64") vlan tag: %d.\n",
+   dev->device_fh, dev_ll->vdev->dev->device_fh,
+   vlan_tag);
+
+   break;
+   }
+   dev_ll = dev_ll->next;
+   }
+   return 0;
+}
+
+/*
  * This function routes the TX packet to the correct interface. This may be a 
local device
  * or the physical port.
  */
@@ -1050,8 +1101,6 @@ virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf 
*m, uint16_t vlan_tag)
struct rte_mbuf **m_table;
unsigned len, ret, offset = 0;
const uint16_t lcore_id = rte_lcore_id();
-   struct virtio_net_data_ll *dev_ll = ll_root_used;
-   struct ether_hdr *pkt_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
struct virtio_net *dev = vdev->dev;

/*check if destination is local VM*/
@@ -1061,43 +1110,9 @@ virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf 
*m, uint16_t vlan_tag)
}

if (vm2vm_mode == VM2VM_HARDWARE) {
-   while (dev_ll != NULL) {
-   if ((dev_ll->vdev->ready == DEVICE_RX)
-   && ether_addr_cmp(&(pkt_hdr->d_addr),
-   &dev_ll->vdev->mac_address)) {
-   /*
-* Drop the packet if the TX packet is
-* destined for the TX device.
-*/
-   if (dev_ll->vdev->dev->device_fh == 
dev->device_fh) {
-   LOG_DEBUG(VHOST_DATA,
-   "(%"PRIu64") TX: Source and destination"
-   " MAC addresses are the same. Dropping "
-   "packet.\n",
-   dev_ll->vdev->dev->device_fh);
-   rte_pktmbuf_free(m);
-   return;
-   }
-
-   /*
-* HW vlan strip will reduce the packet length
-* by minus length of vlan tag, so need restore
-* the packet length by plus it.
-*/
-   offset = VLAN_HLEN;
-   vlan_tag =
-   (uint16_t)
-   
vlan_tags[(uint16_t)dev_ll->vdev->dev->device_fh];
-
-   LOG_DEBUG(VHOST_DATA,
-   "(%"PRIu64") TX: pkt to local VM device id:"
-   

[dpdk-dev] [PATCH v4 1/3] vhost: Fix packet length issue

2014-11-05 Thread Ouyang Changchun
As HW vlan strip will reduce the packet length by minus length of vlan tag,
so it need restore the packet length by plus it.

Signed-off-by: Changchun Ouyang 
---
 examples/vhost/main.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 57ef464..5ca8dce 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -1078,7 +1078,13 @@ virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf 
*m, uint16_t vlan_tag)
rte_pktmbuf_free(m);
return;
}
-   offset = 4;
+
+   /*
+* HW vlan strip will reduce the packet length
+* by minus length of vlan tag, so need restore
+* the packet length by plus it.
+*/
+   offset = VLAN_HLEN;
vlan_tag =
(uint16_t)

vlan_tags[(uint16_t)dev_ll->vdev->dev->device_fh];
@@ -1102,8 +1108,10 @@ virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf 
*m, uint16_t vlan_tag)
len = tx_q->len;

m->ol_flags = PKT_TX_VLAN_PKT;
-   /*FIXME: offset*/
+
m->data_len += offset;
+   m->pkt_len += offset;
+
m->vlan_tci = vlan_tag;

tx_q->m_table[len] = m;
-- 
1.8.4.2



[dpdk-dev] [PATCH v4 0/3] Fix packet length issue

2014-11-05 Thread Ouyang Changchun
This patch set fix packet length issue in vhost app, and enhance code by
extracting a function to replace duplicated codes in one copy and zero copy
TX function.

-v4 chang:
 Check offset value and extra bytes inside packet buffer cross page boundary.

-v3 change:
 Extract a function to replace duplicated codes in one copy and zero copy TX 
function.

-v2 change:
 Update data length by plus offset in first segment instead of last segment.

-v1 change:
 Update the packet length by plus offset;
 Use macro to replace constant.

Changchun Ouyang (3):
  Fix packet length issue in vhost.
  Extract a function to replace duplicated codes in vhost.
  Check offset value in vhost

 examples/vhost/main.c | 142 +++---
 1 file changed, 65 insertions(+), 77 deletions(-)

-- 
1.8.4.2



[dpdk-dev] [PATCH] eal: map uio resources after hugepages when the base_virtaddr is configured.

2014-11-05 Thread Burakov, Anatoly
I have a slight problems with this patch.

The base_virtaddr doesn't necessarily correspond to an address that everything 
gets mapped to. It's a "hint" of sorts, that may or may not be taken into 
account by mmap. Therefore we can't simply assume that if we requested a 
base-virtaddr, everything will get mapped at exactly that address. We also 
can't assume that hugepages will be ordered one after the other and occupy 
neatly all the contiguous virtual memory between base_virtaddr and 
base_virtaddr + internal_config.memory - there may be holes, for whatever 
reasons.

Also, 

Thanks,
Anatoly

-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of lxu
Sent: Wednesday, November 5, 2014 1:25 PM
To: dev at dpdk.org
Subject: [dpdk-dev] [PATCH] eal: map uio resources after hugepages when the 
base_virtaddr is configured.

---
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index 7e62266..bc7ed3a 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -289,6 +289,11 @@ pci_uio_map_resource(struct rte_pci_device *dev)
struct rte_pci_addr *loc = &dev->addr;
struct mapped_pci_resource *uio_res;
struct pci_map *maps;
+   static void * requested_addr = NULL;
+   if (internal_config.base_virtaddr && NULL == requested_addr) {
+   requested_addr = (uint8_t *) internal_config.base_virtaddr 
+   + internal_config.memory;
+   }

dev->intr_handle.fd = -1;
dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN; @@ -371,10 +376,12 @@ 
pci_uio_map_resource(struct rte_pci_device *dev)
if (maps[j].addr != NULL)
fail = 1;
else {
-   mapaddr = pci_map_resource(NULL, fd, 
(off_t)offset,
+   mapaddr = pci_map_resource(requested_addr, fd, 
(off_t)offset,
(size_t)maps[j].size);
if (mapaddr == NULL)
fail = 1;
+   else if (NULL != requested_addr)
+   requested_addr = (uint8_t *)mapaddr + 
maps[j].size;
}

if (fail) {
--
1.9.1



[dpdk-dev] [PATCH] mk: --no-as-needed by default for linux exec-env

2014-11-05 Thread De Lara Guarch, Pablo


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Sergio Gonzalez
> Monroy
> Sent: Thursday, October 30, 2014 10:58 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] mk: --no-as-needed by default for linux exec-
> env
> 
> Ubuntu/Debian toolchain passes --as-needed flag to the linker by default.
> Add --no-as-needed flag by default in linuxapp exec-env to ensure correct
> linking.
> 
> Signed-off-by: Sergio Gonzalez Monroy
> 

Acked-by: Pablo de Lara 

Anyway, it is worth stating that as Neil and Sergio have pointed out, we should 
probably
change the way we build the libraries, considering all problems that we have 
encountered recently.



[dpdk-dev] [PATCH] mk: pass MODULE_CFLAGS to BSD module build system

2014-11-05 Thread De Lara Guarch, Pablo


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Sergio Gonzalez
> Monroy
> Sent: Thursday, October 30, 2014 4:59 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] mk: pass MODULE_CFLAGS to BSD module
> build system
> 
> When building shared libs (for both GCC and CLANG targets), -fPIC flag
> has been added to CFLAGS and leaks to BSD module build system causing
> the following error:
> 
> fatal error: error in backend: Cannot select: 0x802ad8010: i64 =
> X86ISD::WrapperRIP 0x802ade110
>   [ID=13]
>   0x802ade110: i64 = TargetGlobalAddress 0
> [TF=5] [ID=10]
> 
> Reset CFLAGS to MODULE_CFLAGS before building BSD module.
> 
> Signed-off-by: Sergio Gonzalez Monroy
> 

Acked-by: Pablo de Lara 



[dpdk-dev] [PATCH] Add user defined tag calculation callback to librte_distributor.

2014-11-05 Thread Bruce Richardson
On Wed, Nov 05, 2014 at 03:30:37PM +0200, Qinglai Xiao wrote:
> User defined tag calculation has access to mbuf.
> Default tag is RSS hash result.
> 

Interesting idea.
Did you investigate was there any performance improvement or regression 
comparing 
whether the callback was called per-packet as packets were dequeued for 
distribution
(i.e. how you have things now in your patch), compared to calling
the callback in a loop to extract the tags for all packets initially? I suspect
there probably isn't much performance difference either way, but it may be worth
checking.
One other point, is that I think the callback to extract the tag should have
additional parameters - at least one, if not two. I would suggest that the
distributor pointer be passed in, as well as an arbitrary void * pointer.

Regards,
/Bruce

> Signed-off-by: Qinglai Xiao 
> ---
>  app/test/test_distributor.c  |6 +++---
>  app/test/test_distributor_perf.c |2 +-
>  lib/librte_distributor/rte_distributor.c |   12 ++--
>  lib/librte_distributor/rte_distributor.h |7 ++-
>  4 files changed, 20 insertions(+), 7 deletions(-)
> 
> diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
> index ce06436..6ea4943 100644
> --- a/app/test/test_distributor.c
> +++ b/app/test/test_distributor.c
> @@ -452,7 +452,7 @@ int test_error_distributor_create_name(void)
>   char *name = NULL;
>  
>   d = rte_distributor_create(name, rte_socket_id(),
> - rte_lcore_count() - 1);
> + rte_lcore_count() - 1, NULL);
>   if (d != NULL || rte_errno != EINVAL) {
>   printf("ERROR: No error on create() with NULL name param\n");
>   return -1;
> @@ -467,7 +467,7 @@ int test_error_distributor_create_numworkers(void)
>  {
>   struct rte_distributor *d = NULL;
>   d = rte_distributor_create("test_numworkers", rte_socket_id(),
> - RTE_MAX_LCORE + 10);
> + RTE_MAX_LCORE + 10, NULL);
>   if (d != NULL || rte_errno != EINVAL) {
>   printf("ERROR: No error on create() with num_workers > MAX\n");
>   return -1;
> @@ -515,7 +515,7 @@ test_distributor(void)
>  
>   if (d == NULL) {
>   d = rte_distributor_create("Test_distributor", rte_socket_id(),
> - rte_lcore_count() - 1);
> + rte_lcore_count() - 1, NULL);
>   if (d == NULL) {
>   printf("Error creating distributor\n");
>   return -1;
> diff --git a/app/test/test_distributor_perf.c 
> b/app/test/test_distributor_perf.c
> index b04864c..507e446 100644
> --- a/app/test/test_distributor_perf.c
> +++ b/app/test/test_distributor_perf.c
> @@ -227,7 +227,7 @@ test_distributor_perf(void)
>  
>   if (d == NULL) {
>   d = rte_distributor_create("Test_perf", rte_socket_id(),
> - rte_lcore_count() - 1);
> + rte_lcore_count() - 1, NULL);
>   if (d == NULL) {
>   printf("Error creating distributor\n");
>   return -1;
> diff --git a/lib/librte_distributor/rte_distributor.c 
> b/lib/librte_distributor/rte_distributor.c
> index 585ff88..78c92bd 100644
> --- a/lib/librte_distributor/rte_distributor.c
> +++ b/lib/librte_distributor/rte_distributor.c
> @@ -97,6 +97,7 @@ struct rte_distributor {
>   union rte_distributor_buffer bufs[RTE_MAX_LCORE];
>  
>   struct rte_distributor_returned_pkts returns;
> + rte_distributor_tag_fn tag_cb;
>  };
>  
>  TAILQ_HEAD(rte_distributor_list, rte_distributor);
> @@ -267,6 +268,7 @@ rte_distributor_process(struct rte_distributor *d,
>   struct rte_mbuf *next_mb = NULL;
>   int64_t next_value = 0;
>   uint32_t new_tag = 0;
> + rte_distributor_tag_fn tag_cb = d->tag_cb;
>   unsigned ret_start = d->returns.start,
>   ret_count = d->returns.count;
>  
> @@ -282,7 +284,11 @@ rte_distributor_process(struct rte_distributor *d,
>   next_mb = mbufs[next_idx++];
>   next_value = (((int64_t)(uintptr_t)next_mb)
>   << RTE_DISTRIB_FLAG_BITS);
> - new_tag = (next_mb->hash.rss | 1);
> + if (tag_cb) {
> + new_tag = tag_cb(next_mb);
> + } else {
> + new_tag = (next_mb->hash.rss | 1);
> + }
>  
>   uint32_t match = 0;
>   unsigned i;
> @@ -401,7 +407,8 @@ rte_distributor_clear_returns(struct rte_distributor *d)
>  struct rte_distributor *
>  rte_distributor_create(const char *name,
>   unsigned socket_id,
> - unsigned num_workers)
> + unsigned num_workers,
> + rte_distributor_tag_fn tag_cb)
>  {
>   struct rte_distributor *d;
>   s

[dpdk-dev] bifurcated driver

2014-11-05 Thread Thomas Monjalon
Hi Danny,

2014-10-31 17:36, O'driscoll, Tim:
> Bifurcated Driver (Danny.Zhou at intel.com)

Thanks for the presentation of bifurcated driver during the community call.
I asked if you looked at ibverbs and you wanted a link to check.
The kernel module is here:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/infiniband/core
The userspace library:
http://git.kernel.org/cgit/libs/infiniband/libibverbs.git

Extract from Kconfig:
"
config INFINIBAND_USER_ACCESS
tristate "InfiniBand userspace access (verbs and CM)"
select ANON_INODES
---help---
  Userspace InfiniBand access support.  This enables the
  kernel side of userspace verbs and the userspace
  communication manager (CM).  This allows userspace processes
  to set up connections and directly access InfiniBand
  hardware for fast-path operations.  You will also need
  libibverbs, libibcm and a hardware driver library from
  .
"

It seems to be close to the bifurcated driver needs.
Not sure if it can solve the security issues if there is no dedicated MMU
in the NIC.

I feel we should sum up pros and cons of
- igb_uio
- uio_pci_generic
- VFIO
- ibverbs
- bifurcated driver
I suggest to consider these criterias:
- upstream status
- usable with kernel netdev
- usable in a vm
- usable for ethernet
- hardware requirements
- security protection
- performance

-- 
Thomas


[dpdk-dev] [PATCH] Fix regression for eal_flags_autotest introduced by tailq rework

2014-11-05 Thread De Lara Guarch, Pablo


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Anatoly Burakov
> Sent: Wednesday, November 05, 2014 12:11 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] Fix regression for eal_flags_autotest
> introduced by tailq rework
> 
> As a result of moving tailq's into local memory, some tailq data
> is now reserved in rte_malloc heaps (because it needs to be
> shared across DPDK processes). The first thing DPDK initializes
> is a log mempool, and since it creates a tailq, it reserves
> space in rte_malloc heap before allocating the mempool itself.
> By default, rte_malloc allocates way more space than is necessary,
> so under some conditions (namely, overall memory available is low)
> this results in malloc heap eating up so much memory that log
> mempool is not able to allocate its memzone.
> 
> This patch fixes the unit tests to account for that change.
> 
> Signed-off-by: Anatoly Burakov 
> ---

Acked-by: Pablo de Lara 



[dpdk-dev] [PATCH] lib/librte_vhost: code style fixes

2014-11-05 Thread Huawei Xie
This patch fixes code style issues and refines some comments in vhost library.


---
 lib/librte_vhost/eventfd_link/eventfd_link.c | 244 ++---
 lib/librte_vhost/eventfd_link/eventfd_link.h | 127 ++-
 lib/librte_vhost/rte_virtio_net.h|   3 +-
 lib/librte_vhost/vhost-net-cdev.c| 187 +---
 lib/librte_vhost/vhost_rxtx.c|  13 +-
 lib/librte_vhost/virtio-net.c| 317 +--
 6 files changed, 494 insertions(+), 397 deletions(-)

diff --git a/lib/librte_vhost/eventfd_link/eventfd_link.c 
b/lib/librte_vhost/eventfd_link/eventfd_link.c
index fc0653a..542ec2c 100644
--- a/lib/librte_vhost/eventfd_link/eventfd_link.c
+++ b/lib/librte_vhost/eventfd_link/eventfd_link.c
@@ -1,26 +1,26 @@
 /*-
- *  * GPL LICENSE SUMMARY
- *  *
- *  *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *  *
- *  *   This program is free software; you can redistribute it and/or modify
- *  *   it under the terms of version 2 of the GNU General Public License as
- *  *   published by the Free Software Foundation.
- *  *
- *  *   This program is distributed in the hope that it will be useful, but
- *  *   WITHOUT ANY WARRANTY; without even the implied warranty of
- *  *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- *  *   General Public License for more details.
- *  *
- *  *   You should have received a copy of the GNU General Public License
- *  *   along with this program; if not, write to the Free Software
- *  *   Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 
USA.
- *  *   The full GNU General Public License is included in this distribution
- *  *   in the file called LICENSE.GPL.
- *  *
- *  *   Contact Information:
- *  *   Intel Corporation
- *   */
+ * GPL LICENSE SUMMARY
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *
+ *   This program is free software; you can redistribute it and/or modify
+ *   it under the terms of version 2 of the GNU General Public License as
+ *   published by the Free Software Foundation.
+ *
+ *   This program is distributed in the hope that it will be useful, but
+ *   WITHOUT ANY WARRANTY; without even the implied warranty of
+ *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ *   General Public License for more details.
+ *
+ *   You should have received a copy of the GNU General Public License
+ *   along with this program; if not, write to the Free Software
+ *   Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ *   The full GNU General Public License is included in this distribution
+ *   in the file called LICENSE.GPL.
+ *
+ *   Contact Information:
+ *   Intel Corporation
+ */

 #include 
 #include 
@@ -42,15 +42,15 @@
  * get_files_struct is copied from fs/file.c
  */
 struct files_struct *
-get_files_struct (struct task_struct *task)
+get_files_struct(struct task_struct *task)
 {
struct files_struct *files;

-   task_lock (task);
+   task_lock(task);
files = task->files;
if (files)
-   atomic_inc (&files->count);
-   task_unlock (task);
+   atomic_inc(&files->count);
+   task_unlock(task);

return files;
 }
@@ -59,17 +59,15 @@ get_files_struct (struct task_struct *task)
  * put_files_struct is extracted from fs/file.c
  */
 void
-put_files_struct (struct files_struct *files)
+put_files_struct(struct files_struct *files)
 {
-   if (atomic_dec_and_test (&files->count))
-   {
-   BUG ();
-   }
+   if (atomic_dec_and_test(&files->count))
+   BUG();
 }


 static long
-eventfd_link_ioctl (struct file *f, unsigned int ioctl, unsigned long arg)
+eventfd_link_ioctl(struct file *f, unsigned int ioctl, unsigned long arg)
 {
void __user *argp = (void __user *) arg;
struct task_struct *task_target = NULL;
@@ -78,96 +76,88 @@ eventfd_link_ioctl (struct file *f, unsigned int ioctl, 
unsigned long arg)
struct fdtable *fdt;
struct eventfd_copy eventfd_copy;

-   switch (ioctl)
-   {
-   case EVENTFD_COPY:
-   if (copy_from_user (&eventfd_copy, argp, sizeof (struct 
eventfd_copy)))
-   return -EFAULT;
-
-   /*
-* Find the task struct for the target pid
-*/
-   task_target =
-   pid_task (find_vpid (eventfd_copy.target_pid), 
PIDTYPE_PID);
-   if (task_target == NULL)
-   {
-   printk (KERN_DEBUG "Failed to get mem ctx for 
target pid\n");
-   return -EFAULT;
-   }
-
-   files = get_files_struct (current);
-   if (files == NULL)
-   {
-   printk 

[dpdk-dev] [PATCH] lib/librte_vhost: code style fixes

2014-11-05 Thread Huawei Xie
This patch fixes code style issues and refines some comments in vhost library.


---
 lib/librte_vhost/eventfd_link/eventfd_link.c | 244 ++---
 lib/librte_vhost/eventfd_link/eventfd_link.h | 127 ++-
 lib/librte_vhost/rte_virtio_net.h|   3 +-
 lib/librte_vhost/vhost-net-cdev.c| 187 +---
 lib/librte_vhost/vhost_rxtx.c|  13 +-
 lib/librte_vhost/virtio-net.c| 317 +--
 6 files changed, 494 insertions(+), 397 deletions(-)

diff --git a/lib/librte_vhost/eventfd_link/eventfd_link.c 
b/lib/librte_vhost/eventfd_link/eventfd_link.c
index fc0653a..542ec2c 100644
--- a/lib/librte_vhost/eventfd_link/eventfd_link.c
+++ b/lib/librte_vhost/eventfd_link/eventfd_link.c
@@ -1,26 +1,26 @@
 /*-
- *  * GPL LICENSE SUMMARY
- *  *
- *  *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *  *
- *  *   This program is free software; you can redistribute it and/or modify
- *  *   it under the terms of version 2 of the GNU General Public License as
- *  *   published by the Free Software Foundation.
- *  *
- *  *   This program is distributed in the hope that it will be useful, but
- *  *   WITHOUT ANY WARRANTY; without even the implied warranty of
- *  *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- *  *   General Public License for more details.
- *  *
- *  *   You should have received a copy of the GNU General Public License
- *  *   along with this program; if not, write to the Free Software
- *  *   Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 
USA.
- *  *   The full GNU General Public License is included in this distribution
- *  *   in the file called LICENSE.GPL.
- *  *
- *  *   Contact Information:
- *  *   Intel Corporation
- *   */
+ * GPL LICENSE SUMMARY
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *
+ *   This program is free software; you can redistribute it and/or modify
+ *   it under the terms of version 2 of the GNU General Public License as
+ *   published by the Free Software Foundation.
+ *
+ *   This program is distributed in the hope that it will be useful, but
+ *   WITHOUT ANY WARRANTY; without even the implied warranty of
+ *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ *   General Public License for more details.
+ *
+ *   You should have received a copy of the GNU General Public License
+ *   along with this program; if not, write to the Free Software
+ *   Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ *   The full GNU General Public License is included in this distribution
+ *   in the file called LICENSE.GPL.
+ *
+ *   Contact Information:
+ *   Intel Corporation
+ */

 #include 
 #include 
@@ -42,15 +42,15 @@
  * get_files_struct is copied from fs/file.c
  */
 struct files_struct *
-get_files_struct (struct task_struct *task)
+get_files_struct(struct task_struct *task)
 {
struct files_struct *files;

-   task_lock (task);
+   task_lock(task);
files = task->files;
if (files)
-   atomic_inc (&files->count);
-   task_unlock (task);
+   atomic_inc(&files->count);
+   task_unlock(task);

return files;
 }
@@ -59,17 +59,15 @@ get_files_struct (struct task_struct *task)
  * put_files_struct is extracted from fs/file.c
  */
 void
-put_files_struct (struct files_struct *files)
+put_files_struct(struct files_struct *files)
 {
-   if (atomic_dec_and_test (&files->count))
-   {
-   BUG ();
-   }
+   if (atomic_dec_and_test(&files->count))
+   BUG();
 }


 static long
-eventfd_link_ioctl (struct file *f, unsigned int ioctl, unsigned long arg)
+eventfd_link_ioctl(struct file *f, unsigned int ioctl, unsigned long arg)
 {
void __user *argp = (void __user *) arg;
struct task_struct *task_target = NULL;
@@ -78,96 +76,88 @@ eventfd_link_ioctl (struct file *f, unsigned int ioctl, 
unsigned long arg)
struct fdtable *fdt;
struct eventfd_copy eventfd_copy;

-   switch (ioctl)
-   {
-   case EVENTFD_COPY:
-   if (copy_from_user (&eventfd_copy, argp, sizeof (struct 
eventfd_copy)))
-   return -EFAULT;
-
-   /*
-* Find the task struct for the target pid
-*/
-   task_target =
-   pid_task (find_vpid (eventfd_copy.target_pid), 
PIDTYPE_PID);
-   if (task_target == NULL)
-   {
-   printk (KERN_DEBUG "Failed to get mem ctx for 
target pid\n");
-   return -EFAULT;
-   }
-
-   files = get_files_struct (current);
-   if (files == NULL)
-   {
-   printk 

[dpdk-dev] [PATCH] Fix regression for eal_flags_autotest introduced by tailq rework

2014-11-05 Thread Anatoly Burakov
As a result of moving tailq's into local memory, some tailq data
is now reserved in rte_malloc heaps (because it needs to be
shared across DPDK processes). The first thing DPDK initializes
is a log mempool, and since it creates a tailq, it reserves
space in rte_malloc heap before allocating the mempool itself.
By default, rte_malloc allocates way more space than is necessary,
so under some conditions (namely, overall memory available is low)
this results in malloc heap eating up so much memory that log
mempool is not able to allocate its memzone.

This patch fixes the unit tests to account for that change.

Signed-off-by: Anatoly Burakov 
---
 app/test/test_eal_flags.c | 43 +--
 1 file changed, 21 insertions(+), 22 deletions(-)

diff --git a/app/test/test_eal_flags.c b/app/test/test_eal_flags.c
index 21e6cca..9541619 100644
--- a/app/test/test_eal_flags.c
+++ b/app/test/test_eal_flags.c
@@ -52,6 +52,11 @@

 #include "process.h"

+#ifdef RTE_LIBRTE_XEN_DOM0
+#define DEFAULT_MEM_SIZE "30"
+#else
+#define DEFAULT_MEM_SIZE "8"
+#endif
 #define mp_flag "--proc-type=secondary"
 #define no_hpet "--no-hpet"
 #define no_huge "--no-huge"
@@ -616,14 +621,15 @@ test_no_huge_flag(void)
/* With --no-huge */
const char *argv1[] = {prgname, prefix, no_huge, "-c", "1", "-n", "2"};
/* With --no-huge and -m */
-   const char *argv2[] = {prgname, prefix, no_huge, "-c", "1", "-n", "2", 
"-m", "2"};
+   const char *argv2[] = {prgname, prefix, no_huge, "-c", "1", "-n", "2",
+   "-m", DEFAULT_MEM_SIZE};

/* With --no-huge and --socket-mem */
const char *argv3[] = {prgname, prefix, no_huge, "-c", "1", "-n", "2",
-   "--socket-mem=2"};
+   "--socket-mem=" DEFAULT_MEM_SIZE};
/* With --no-huge, -m and --socket-mem */
const char *argv4[] = {prgname, prefix, no_huge, "-c", "1", "-n", "2",
-   "-m", "2", "--socket-mem=2"};
+   "-m", DEFAULT_MEM_SIZE, "--socket-mem=" 
DEFAULT_MEM_SIZE};
if (launch_proc(argv1) != 0) {
printf("Error - process did not run ok with --no-huge flag\n");
return -1;
@@ -789,20 +795,20 @@ test_misc_flags(void)
/* With invalid --syslog */
const char *argv5[] = {prgname, prefix, mp_flag, "-c", "1", "--syslog", 
"error"};
/* With no-sh-conf */
-   const char *argv6[] = {prgname, "-c", "1", "-n", "2", "-m", "2",
+   const char *argv6[] = {prgname, "-c", "1", "-n", "2", "-m", 
DEFAULT_MEM_SIZE,
no_shconf, nosh_prefix };

 #ifdef RTE_EXEC_ENV_BSDAPP
return 0;
 #endif
/* With --huge-dir */
-   const char *argv7[] = {prgname, "-c", "1", "-n", "2", "-m", "2",
+   const char *argv7[] = {prgname, "-c", "1", "-n", "2", "-m", 
DEFAULT_MEM_SIZE,
"--file-prefix=hugedir", "--huge-dir", hugepath};
/* With empty --huge-dir (should fail) */
-   const char *argv8[] = {prgname, "-c", "1", "-n", "2", "-m", "2",
+   const char *argv8[] = {prgname, "-c", "1", "-n", "2", "-m", 
DEFAULT_MEM_SIZE,
"--file-prefix=hugedir", "--huge-dir"};
/* With invalid --huge-dir */
-   const char *argv9[] = {prgname, "-c", "1", "-n", "2", "-m", "2",
+   const char *argv9[] = {prgname, "-c", "1", "-n", "2", "-m", 
DEFAULT_MEM_SIZE,
"--file-prefix=hugedir", "--huge-dir", "invalid"};
/* Secondary process with invalid --huge-dir (should run as flag has no
 * effect on secondary processes) */
@@ -923,15 +929,15 @@ test_file_prefix(void)
 #endif

/* this should fail unless the test itself is run with "memtest" prefix 
*/
-   const char *argv0[] = {prgname, mp_flag, "-c", "1", "-n", "2", "-m", 
"2",
+   const char *argv0[] = {prgname, mp_flag, "-c", "1", "-n", "2", "-m", 
DEFAULT_MEM_SIZE,
"--file-prefix=" memtest };

/* primary process with memtest1 */
-   const char *argv1[] = {prgname, "-c", "1", "-n", "2", "-m", "2",
+   const char *argv1[] = {prgname, "-c", "1", "-n", "2", "-m", 
DEFAULT_MEM_SIZE,
"--file-prefix=" memtest1 };

/* primary process with memtest2 */
-   const char *argv2[] = {prgname, "-c", "1", "-n", "2", "-m", "2",
+   const char *argv2[] = {prgname, "-c", "1", "-n", "2", "-m", 
DEFAULT_MEM_SIZE,
"--file-prefix=" memtest2 };

char prefix[32];
@@ -1025,7 +1031,6 @@ test_file_prefix(void)
 static int
 test_memory_flags(void)
 {
-   const char* mem_size = NULL;
 #ifdef RTE_EXEC_ENV_BSDAPP
/* BSD target doesn't support prefixes at this point */
const char * prefix = "";
@@ -1037,20 +1042,14 @@ test_memory_flags(void)
}
snprintf(prefix, sizeof(prefix), "--file-prefix=%s", tmp);
 #endif
-#ifdef RTE_LIBRTE_XEN_DOM0
-   mem_size = "30"

[dpdk-dev] [PATCH] Fix regression for eal_flags_autotest introduced by tailq rework

2014-11-05 Thread Anatoly Burakov
As a result of moving tailq's into local memory, some tailq data
is now reserved in rte_malloc heaps (because it needs to be
shared across DPDK processes). The first thing DPDK initializes
is a log mempool, and since it creates a tailq, it reserves
space in rte_malloc heap before allocating the mempool itself.
By default, rte_malloc allocates way more space than is necessary,
so under some conditions (namely, overall memory available is low)
this results in malloc heap eating up so much memory that log
mempool is not able to allocate its memzone.

This patch fixes the unit tests to account for that change.
---
 app/test/test_eal_flags.c | 43 +--
 1 file changed, 21 insertions(+), 22 deletions(-)

diff --git a/app/test/test_eal_flags.c b/app/test/test_eal_flags.c
index 21e6cca..9541619 100644
--- a/app/test/test_eal_flags.c
+++ b/app/test/test_eal_flags.c
@@ -52,6 +52,11 @@

 #include "process.h"

+#ifdef RTE_LIBRTE_XEN_DOM0
+#define DEFAULT_MEM_SIZE "30"
+#else
+#define DEFAULT_MEM_SIZE "8"
+#endif
 #define mp_flag "--proc-type=secondary"
 #define no_hpet "--no-hpet"
 #define no_huge "--no-huge"
@@ -616,14 +621,15 @@ test_no_huge_flag(void)
/* With --no-huge */
const char *argv1[] = {prgname, prefix, no_huge, "-c", "1", "-n", "2"};
/* With --no-huge and -m */
-   const char *argv2[] = {prgname, prefix, no_huge, "-c", "1", "-n", "2", 
"-m", "2"};
+   const char *argv2[] = {prgname, prefix, no_huge, "-c", "1", "-n", "2",
+   "-m", DEFAULT_MEM_SIZE};

/* With --no-huge and --socket-mem */
const char *argv3[] = {prgname, prefix, no_huge, "-c", "1", "-n", "2",
-   "--socket-mem=2"};
+   "--socket-mem=" DEFAULT_MEM_SIZE};
/* With --no-huge, -m and --socket-mem */
const char *argv4[] = {prgname, prefix, no_huge, "-c", "1", "-n", "2",
-   "-m", "2", "--socket-mem=2"};
+   "-m", DEFAULT_MEM_SIZE, "--socket-mem=" 
DEFAULT_MEM_SIZE};
if (launch_proc(argv1) != 0) {
printf("Error - process did not run ok with --no-huge flag\n");
return -1;
@@ -789,20 +795,20 @@ test_misc_flags(void)
/* With invalid --syslog */
const char *argv5[] = {prgname, prefix, mp_flag, "-c", "1", "--syslog", 
"error"};
/* With no-sh-conf */
-   const char *argv6[] = {prgname, "-c", "1", "-n", "2", "-m", "2",
+   const char *argv6[] = {prgname, "-c", "1", "-n", "2", "-m", 
DEFAULT_MEM_SIZE,
no_shconf, nosh_prefix };

 #ifdef RTE_EXEC_ENV_BSDAPP
return 0;
 #endif
/* With --huge-dir */
-   const char *argv7[] = {prgname, "-c", "1", "-n", "2", "-m", "2",
+   const char *argv7[] = {prgname, "-c", "1", "-n", "2", "-m", 
DEFAULT_MEM_SIZE,
"--file-prefix=hugedir", "--huge-dir", hugepath};
/* With empty --huge-dir (should fail) */
-   const char *argv8[] = {prgname, "-c", "1", "-n", "2", "-m", "2",
+   const char *argv8[] = {prgname, "-c", "1", "-n", "2", "-m", 
DEFAULT_MEM_SIZE,
"--file-prefix=hugedir", "--huge-dir"};
/* With invalid --huge-dir */
-   const char *argv9[] = {prgname, "-c", "1", "-n", "2", "-m", "2",
+   const char *argv9[] = {prgname, "-c", "1", "-n", "2", "-m", 
DEFAULT_MEM_SIZE,
"--file-prefix=hugedir", "--huge-dir", "invalid"};
/* Secondary process with invalid --huge-dir (should run as flag has no
 * effect on secondary processes) */
@@ -923,15 +929,15 @@ test_file_prefix(void)
 #endif

/* this should fail unless the test itself is run with "memtest" prefix 
*/
-   const char *argv0[] = {prgname, mp_flag, "-c", "1", "-n", "2", "-m", 
"2",
+   const char *argv0[] = {prgname, mp_flag, "-c", "1", "-n", "2", "-m", 
DEFAULT_MEM_SIZE,
"--file-prefix=" memtest };

/* primary process with memtest1 */
-   const char *argv1[] = {prgname, "-c", "1", "-n", "2", "-m", "2",
+   const char *argv1[] = {prgname, "-c", "1", "-n", "2", "-m", 
DEFAULT_MEM_SIZE,
"--file-prefix=" memtest1 };

/* primary process with memtest2 */
-   const char *argv2[] = {prgname, "-c", "1", "-n", "2", "-m", "2",
+   const char *argv2[] = {prgname, "-c", "1", "-n", "2", "-m", 
DEFAULT_MEM_SIZE,
"--file-prefix=" memtest2 };

char prefix[32];
@@ -1025,7 +1031,6 @@ test_file_prefix(void)
 static int
 test_memory_flags(void)
 {
-   const char* mem_size = NULL;
 #ifdef RTE_EXEC_ENV_BSDAPP
/* BSD target doesn't support prefixes at this point */
const char * prefix = "";
@@ -1037,20 +1042,14 @@ test_memory_flags(void)
}
snprintf(prefix, sizeof(prefix), "--file-prefix=%s", tmp);
 #endif
-#ifdef RTE_LIBRTE_XEN_DOM0
-   mem_size = "30";
-#else
-   mem_size = "2";

[dpdk-dev] [PATCH v2] eal: add option --master-lcore

2014-11-05 Thread Ananyev, Konstantin
Hi Thomas,
Few questions/comments below.
Konstantin

> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas Monjalon
> Sent: Tuesday, November 04, 2014 9:41 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v2] eal: add option --master-lcore
> 
> From: Simon Kuenzer 
> 
> Enable users to specify the lcore id that is used as master lcore.
> 
> Signed-off-by: Simon Kuenzer 
> Signed-off-by: Thomas Monjalon 
> ---
> 
> changes in v2:
> - rebase on HEAD including common options for BSD and Linux
> - use strtol() instead of atoi() to check syntax errors
> - unit tests
> 
>  app/test/test.c |  1 +
>  app/test/test_eal_flags.c   | 49 
> +
>  lib/librte_eal/bsdapp/eal/eal.c |  7 +
>  lib/librte_eal/common/eal_common_options.c  | 31 ++
>  lib/librte_eal/common/include/eal_options.h |  2 ++
>  lib/librte_eal/linuxapp/eal/eal.c   |  7 +
>  6 files changed, 97 insertions(+)
> 
> diff --git a/app/test/test.c b/app/test/test.c
> index 9bee6bb..2fecff5 100644
> --- a/app/test/test.c
> +++ b/app/test/test.c
> @@ -82,6 +82,7 @@ do_recursive_call(void)
>   } actions[] =  {
>   { "run_secondary_instances", test_mp_secondary },
>   { "test_missing_c_flag", no_action },
> + { "test_master_lcore_flag", no_action },
>   { "test_missing_n_flag", no_action },
>   { "test_no_hpet_flag", no_action },
>   { "test_whitelist_flag", no_action },
> diff --git a/app/test/test_eal_flags.c b/app/test/test_eal_flags.c
> index 21e6cca..45020b8 100644
> --- a/app/test/test_eal_flags.c
> +++ b/app/test/test_eal_flags.c
> @@ -520,6 +520,49 @@ test_missing_c_flag(void)
>  }
> 
>  /*
> + * Test --master-lcore option with matching coremask
> + */
> +static int
> +test_master_lcore_flag(void)
> +{
> +#ifdef RTE_EXEC_ENV_BSDAPP
> + /* BSD target doesn't support prefixes at this point */
> + const char * prefix = "";
> +#else
> + char prefix[PATH_MAX], tmp[PATH_MAX];
> + if (get_current_prefix(tmp, sizeof(tmp)) == NULL) {
> + printf("Error - unable to get current prefix!\n");
> + return -1;
> + }
> + snprintf(prefix, sizeof(prefix), "--file-prefix=%s", tmp);
> +#endif
> +
> + /* --master-lcore flag but no value */
> + const char *argv1[] = { prgname, prefix, mp_flag, "-n", "1", "-c", "3", 
> "--master-lcore"};
> + /* --master-lcore flag with invalid value */
> + const char *argv2[] = { prgname, prefix, mp_flag, "-n", "1", "-c", "3", 
> "--master-lcore", "-1"};
> + /* --master-lcore flag with invalid value */
> + const char *argv3[] = { prgname, prefix, mp_flag, "-n", "1", "-c", "3", 
> "--master-lcore", "X"};
> + /* master lcore not in coremask */
> + const char *argv4[] = { prgname, prefix, mp_flag, "-n", "1", "-c", "3", 
> "--master-lcore", "2"};
> + /* valid value */
> + const char *argv5[] = { prgname, prefix, mp_flag, "-n", "1", "-c", "3", 
> "--master-lcore", "1"};
> +
> + if (launch_proc(argv1) == 0
> + || launch_proc(argv2) == 0
> + || launch_proc(argv3) == 0
> + || launch_proc(argv4) == 0) {
> + printf("Error - process ran without error with wrong 
> --master-lcore\n");
> + return -1;
> + }
> + if (launch_proc(argv5) != 0) {
> + printf("Error - process did not run ok with valid 
> --master-lcore\n");
> + return -1;
> + }
> + return 0;
> +}
> +
> +/*
>   * Test that the app doesn't run without the -n flag. In all cases
>   * should give an error and fail to run.
>   * Since -n is not compulsory for MP, we instead use --no-huge and 
> --no-shconf
> @@ -1214,6 +1257,12 @@ test_eal_flags(void)
>   return ret;
>   }
> 
> + ret = test_master_lcore_flag();
> + if (ret < 0) {
> + printf("Error in test_master_lcore_flag()\n");
> + return ret;
> + }
> +
>   ret = test_missing_n_flag();
>   if (ret < 0) {
>   printf("Error in test_missing_n_flag()\n");
> diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
> index ca99cb9..c764fec 100644
> --- a/lib/librte_eal/bsdapp/eal/eal.c
> +++ b/lib/librte_eal/bsdapp/eal/eal.c
> @@ -354,6 +354,13 @@ eal_parse_args(int argc, char **argv)
>   if (opt == '?')
>   return -1;
> 
> + if (opt == OPT_MASTER_LCORE_NUM && !coremask_ok) {
> + RTE_LOG(ERR, EAL, "please specify the master lcore id"
> + "after specifying the coremask\n");
> + eal_usage(prgname);
> + return -1;
> + }
> +
>   ret = eal_parse_common_option(opt, optarg, &internal_config);
>   /* common parser is not happy */
>   if

[dpdk-dev] [PATCH] eal: map uio resources after hugepages when --base_virtaddr is configured

2014-11-05 Thread Thomas Monjalon
The patch was attached and automatically dropped.
Please follow these guidelines:
http://dpdk.org/dev#send

2014-11-05 09:58, De Lara Guarch, Pablo:
> Patch is missing. 
> 
> Thanks,
> Pablo
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of XU Liang
> > Sent: Wednesday, November 05, 2014 9:50 AM
> > To: dev
> > Subject: [dpdk-dev] [PATCH] eal: map uio resources after hugepages when --
> > base_virtaddr is configured
> > 
> > 
> > When start a secondary process, we got error message "EAL:
> > pci_map_resource(): cannot mmap(11, 0x77fba000, 0x2, 0x0): Bad
> > file descriptor (0x77fb9000)"
> > 
> > The secondary process link difference shared libraries, so the 
> > 0x77fba000
> > is used.
> > 
> > We know the --base_virtaddr is designed for this situation for hugepages.
> > 
> > This patch map the device resouce into address that is after hugepages when
> > --base_virtaddr is  configured.



[dpdk-dev] [PATCH v2] eal: add option --master-lcore

2014-11-05 Thread Aaron Campbell
Acked-by: Aaron Campbell mailto:aaron at arbor.net>>

Minor comments inline below, I don?t need to see another patch.

Thanks,
-Aaron

> On Nov 4, 2014, at 5:40 PM, Thomas Monjalon  
> wrote:
> 
> + RTE_LOG(ERR, EAL, "please specify the master lcore id"
> + "after specifying the coremask\n?);

Missing a space between ?id? and ?after?.

> + RTE_LOG(ERR, EAL, "please specify the master lcore id"
> + "after specifying the coremask\n?);

Same here in the Linux version of the code.


[dpdk-dev] [PATCH] eal/linuxapp: Add parameter to specify master lcore id

2014-11-05 Thread Aaron Campbell
> On Nov 4, 2014, at 3:00 PM, Thomas Monjalon  
> wrote:
> 
> 2014-11-03 13:02, Aaron Campbell:
>>> On Jul 8, 2014, at 5:28 AM, Simon Kuenzer  
>>> wrote:
>>> 
>>> +   else if (!strcmp(lgopts[option_index].name, 
>>> OPT_MASTER_LCORE)) {
>>> +   if (!coremask_ok) {
>>> +   RTE_LOG(ERR, EAL, "please specify the 
>>> master "
>>> +   "lcore id after 
>>> specifying "
>>> +   "the coremask\n");
>>> +   eal_usage(prgname);
>>> +   return -1;
>>> +   }
>> 
>> 
>> Hi Simon,
>> 
>> I think that forcing a particular command line order is not that clean.
>> It might be better to remove the cfg->master_lcore setting from
>> eal_parse_coremask(), and defer the selection of the master lcore until
>> all of the command-line options have been parsed.  If ?master-lcore was
>> specified, save the value and use that, otherwise
>> rte_get_next_lcore(-1, 0, 0) can return the first bit set in the coremask.
> 
> It's not sufficient: eal_parse_master_lcore() requires cfg->lcore_role
> to be set. There is a real dependency between these 2 options.
> I'm going to submit a v2. Feel free to improve it with another patch.

I was nit-picking; although it might be nice if the new option is given, to 
verify the specified lcore is in the coremask.  I will ack v2 though and this 
can be improved some other time.

-Aaron


[dpdk-dev] [PATCH v8 10/10] app/testpmd:test VxLAN Tx checksum offload

2014-11-05 Thread Olivier MATZ
Hi Jijiang,

Thank you for your answer. Please find some comments below.

On 11/05/2014 07:02 AM, Liu, Jijiang wrote:
>> First, the code checks if the mbuf has the flag PKT_RX_TUNNEL_IPV4_HDR.
>> What is the meaning of this flag? It was added by [3], but there is no 
>> description
>> in comments or in the commit log explaining in which case this flag is set 
>> by the
>> driver. The name supposes that this flag is set when the received packet is 
>> an IPv4
>> tunnel, but the commit log talks about vxlan.
>
> The flag PKT_RX_TUNNEL_IPV4_HDR can be used for all tunneling packet types 
> with outer IPV4 header.
> For example:
> IPv4 --> GRE/Teredo/VXLAN --> MAC --> IPv4:
> MAC, IPV4, GRENAT, MAC, IPV4, SCTP, PAY4
> MAC, IPV4, GRENAT, MAC, IPV6, UDP, PAY4
> MAC, IPV4, GRENAT, MAC, IPV6, UDP, PAY4
> These tunneling packet formats have a common point that is outer IPv4 header 
> here.
>
> Only VXLAN tunneling packet is supported in DPDK for i40e now, so  the commit 
> log talks about VXLAN .

Is it possible to have a more formal definition? For instance, is the
following definition below correct?

  "the PKT_RX_TUNNEL_IPV4_HDR flag CAN be set by a driver if the packet
   contains a tunneling protocol inside an IPv4 header".

If the definition above is correct, I don't see how this flag can help
an application to run faster. There is already a flag telling if there
is a valid IPv4 header (PKT_RX_IPV4_HDR). As the PKT_RX_TUNNEL_IPV4_HDR
flag does not tell what is ip->proto, the work done by an application
to dissect a packet would be exactly the same with or without this flag.

Please, can you give an example showing in which conditions this flag
can help an application?


>> What is the meaning of this flag?
>> Is it enough to checksum outer L3, inner L3, and inner L4 as specified in 
>> commit
>> log? If yes, why are the other flags PKT_TX_IPV4_CSUM, PKT_TX_UDP_CKSUM,
>> (...) added in the mbuf later?
>> In my comprehension, these flags are needed in addition to
>> PKT_TX_VXLAN_CKSUM to do the checksum of the inner headers.
>
> Yes, these flags(PKT_TX_IPV4_CSUM, PKT_TX_UDP_CKSUM)  are needed by HW 
> offload of non-tunneling and tunneling  packet.

OK, so I understand that when PKT_TX_VXLAN_CKSUM is set, if the
driver supports it, it will process IP and UDP checksum of outer
header, using l2_len and l3_len.


>> In general, I need to understand how to use all the new offloading stuff. In 
>> [5],
>> new fields were added in the mbuf structure (inner_l3_len and inner_l2_len). 
>> I'm
>> not sure I understand which fields have to be filled. Below is my 
>> understanding,
>> can you please check that it is correct?
>>
>> A- To send a Ether/IP/TCP packet and process IP and TCP TX
>>  checksum in the NIC:
>>
>> - set l2_len = 14, l3_len = 20,
>>   ol_flags = PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM,
>>   write all checksums to 0 in the packet
> IP checksum is 0, but tcp checksum is not 0.
> tcp_hdr->cksum = get_ipv4_psd_sum(ipv4_hdr);

OK, right. I forgot it but indeed it's in csumonly.c


>> B- To send a Ether/IP/UDP/vxlan/Ether/IP/TCP packet and
>>  process inner IP and inner TCP TX checksum in the NIC:
>>
>> - set l2_len = 14+20+8+8+14, l3_len = 20,
>>   ol_flags = PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM,
>>   write all checksums to 0 in the packet
>
> No, l2_len is outer L2 length, its value also is 14.

If you set l2_len to 14, how could the driver guess the offset of
the inner IP header? I'm pretty sure it should work with 14+20+8+8+14.

>> C- To send a Ether/IP/UDP/vxlan/Ether/IP/TCP packet and
>>  process outer IP and UDP, plus inner IP and inner TCP TX
>>  checksum in the NIC:
>>
>> - set l2_len = 14, l3_len = 20, inner_l2_len = 14,
>>   inner_l3_len = 20,
> Yes
>>   ol_flags = PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM |
>> PKT_TX_VXLAN_CKSUM,
> Yes
>>   write all checksums to 0 in the packet
>
> Outer and inner IP checksum is 0, but tcp checksum is not 0.
> tcp_hdr->cksum = get_ipv4_psd_sum(ipv4_hdr);

What about outer udp checksum? Why shouldn't we set it to the
phdr checksum too?


>> D- To send a Ether/IP/UDP/vxlan/Ether/IP/TCP packet and
>>  process outer IP and UDP TX checksum in the NIC:
>>
>> - set l2_len = 14, l3_len = 20,
>>   ol_flags = PKT_TX_IP_CKSUM | PKT_TX_UDP_CKSUM,
>
> Yes
>>   write all checksums to 0 in the packet
> Outer  IP checksum is 0, but tcp checksum is not 0.
> tcp_hdr->cksum = get_ipv4_psd_sum(ipv4_hdr);
>
>> First, can you confirm this? I think it is very important to document it, as 
>> this is a
>> public API that can be used by DPDK users.
>
> These should be included in documents.

Another thing is surprising me.

- if PKT_TX_VXLAN_CKSUM is not set (legacy use case), then the
   driver use l2_len and l3_len to offload inner IP/UDP/TCP checksums.

- if PKT_TX_VXLAN_CKSUM is set, then the driver has to use
   inner_l{23}_len instead of l{23}_len for the same operation.

Adding PKT_TX_VXLAN_CKSUM change

[dpdk-dev] [PATCH v3 00/10] split architecture specific operations

2014-11-05 Thread Chao Zhu

> The set of patches split x86 architecture specific operations from DPDK and 
> put
> them to x86 arch directory.
> This will make the adoption of DPDK much easier on other computer 
> architecture.
> For a new architecture, just add an architecture specific directory and
> necessary building configuration files, then DPDK eal library can support it.
>
>
> Reviewing patchset from Chao, I ended up modifying it along the way,
> so here is a new iteration of this patchset.
>
> Changes since Chao v2 patchset :
>
> - added a preliminary patch for moving rte_atomic.h (for better readability)
> - fixed documentation generation
> - implemented a generic header for each arch specific header (cpuflags, 
> memcpy,
>prefetch were missing)
> - removed C++ stuff from generic headers
> - centralised all doxygen stuff in generic headers (no need to have 
> duplicates)
> - refactored rte_cycles functions
> - moved vmware tsc stuff to arch rte_cycles.h headers
> - finished x86 factorisation
>
>
> Little summary of current state :
>
> - all applications continue to include the eal headers as before, these 
> headers
>are the arch-specific ones
> - the arch specific headers always include the generic ones. The generic 
> headers
>contain the doxygen documentation and code common to all architectures
> - a x86 architecture has been defined which handles both 32bits and 64bits
>peculiarities
>
>
> It builds fine for 32/64 bits (w and w/o "force intrinsics"), but I really 
> would
> like a lot of eyes on this (and I would say, especially, rte_cycles, 
> rte_memcpy
> and rte_cpuflags).
> I still have some concerns about the use of intrinsics for architecture != x86
> but I think Chao will be the best to look at this.
>
>
Acked-by: Chao Zhu 



[dpdk-dev] Ports not detected by IGB_UIO in DPDK 1.7.1 in QEMU_KVM environment

2014-11-05 Thread Bruce Richardson
On Wed, Nov 05, 2014 at 03:28:13PM +0530, Manoj Viswanath wrote:
> Hi,
> 
> I have a DPDK application running on QEMU-KVM environment using DPDK 1.6.0.
> I am trying to port the same to DPDK version 1.7.1.
> 
> I am using Virt-manager GUI to assign e1000 emulated port to the VM. This
> works fine in DPDK 1.6.0. The device is identified by IGB_UIO and
> initialized by my application as expected.
> 
> However in case of DPDK 1.7.1, the emulated e1000 devices do not seem to be
> recognized.
> Following is my analysis:
> 
> 1. The API pci_get_uio_dev() is returning ERROR. This is called from
> pci_uio_map_resource() in the flow of PCI PROBE [rte_eal_pci_probe()].
> 
> 2. Due to this, the PCI device is not getting mapped to the correct driver
> (EM
> ? Driver?
> ).
> 
> 3. The reason for the error in [1] appears to be that "uio" sub-directory
> doesn't seem to be correctly created for interfaces assigned to this VM.
> 
> 4. Upon further analysis i found that IGB_UIO probe function
> ["igbuio_pci_probe()"] is not getting triggered indicating the port has *not
> been assigned* to the IGB_UIO.
> 
> Kindly refer to the attachments:-
> - "Output of sys-bus-pci-devices" - indicating "uio" subdirectory not
> created for PCI devices in case of DPDK 1.7.1
> - "Output of lspci -v" - indicating device not bound to driver in case of
> DPDK 1.7.1
> - IGB_UIO init log snippet - indicating PCI devices not detected and
> initialized by IGB_UIO in case of DPDK 1.7.1
> - CONFIG file used for DPDK compilation
> 
> Not sure what has changed between 1.6.0 and 1.7.1 which is impacting this.
> 
> Could someone throw light in this regard as to what i may be missing ?
> 
> Thanks in advance.
> 
> Regards,
> Manoj

Hi Manoj,

can you perhaps give some details on how you were binding the device to the uio
module both for 1.6 and for 1.7?

/Bruce


[dpdk-dev] [PATCH] Add external parser support for unknown commands.

2014-11-05 Thread Neil Horman
On Tue, Nov 04, 2014 at 08:45:53PM +, Wiles, Roger Keith wrote:
> 
> >> 
> > Can you provide a real example here?  usnig vague terms like "huge" really 
> > makes
> > more of an emotional argument than a factual one.  To cite an example the
> > cmdline_test program adds a command line paramter that just lets you parse a
> > number out of the command line.  Silly, granted, but it serves the purpose. 
> >  Its
> > called cmd_num, and with functions and data all told, it looks like it takes
> > about 17 lines of code.  Now thats more than what you're adding with you 
> > patch,
> > I grant you, but I assert that the "potentially huge" argument you're making
> > above is false, especially whan you consider that some reasonably clever 
> > coding
> > can likely allow you to reuse function parsing fairly easily.
> 
> I think I gave you an example below, but here is one I am looking at now.
> 
This was the level I was looking for, thank you.

> I have a number of programs and scripts in a bin directory, one of the 
> reasons for these programs is to be able to extend the application without 
> having to rebuild the application or adding special parsing of the arguments 
> for just the one program. In my case the scripts can parse the arguments as 
> it already has the support builtin and having cmdline do any processing of 
> the commands is redundant.
> 
> The programs being loaded are shared objects via dlopen() and already have 
> its own parsing code and trying to parse an IP address in cmdline to a binary 
> format would just require me to convert it back to a string to pass that 
> string in a argc/argv format. The project I am doing now is building a 
> dpdk-shell like environment, which starts up DPDK as normal then stops at a 
> command prompt.
> 
So, just so that I'm clear, you're taking example applications from the DPDK,
rebuilding them as shared objects (which they are not currently setup to be
built as), and then running them from a program of your authorship, by loading
them (the DPDK example applications), via dlopen  and calling some arbitrary
function(s) within them.  Is that correct?  If so, that seemsodd.  Commonly
programs that extend other programs without modifying them use fork to establish
a parent/child relationship, and then use the parent to properly control the
child.  GDB is a great example here, one which handles your command line
predicament by adding a command to the gdb console to build the child command
line.  Theres no reason you can't do the same thing, and that requires even less
code than what your proposing here.  If you wanted it to be non-interactive, its
just as easy to add a command line option to your application parser that is:

--child-options=""

which passes a complete command line to the dpdk application without having to
convolute the work of two parsers.

> In the command prompt I am able to load and execute application like l2fwd or 
> l3fwd built as a shared object. Plus I still have a command prompt from the 
> dpdk-sh to launch other applications or look at stats or debug the 
> application. I am still refining the details, but being able to launch a 
> ?dpdk-sh? like system then be able to execute/debug/view stats and anything 
> else one can think of is a very reasonable use. Using this method the user 
> commands are simple and easy to remember, plus someone can build a new 
> application to debug without rebuilding DPDK.
> 
This sounds even more like the GDB example above.  Its set args command should
be your guide to a good implementation.

> I can see this type of environment as a cleaner way for new users to 
> understand DPDK and start playing with the system. I want to add support to 
> run multiple applications at the same time or at least be able to grab stats 
> and information from the application and/or DPDK without someone having to 
> add that support to his application. It is possible with some changes to 
> remove the cmdline parsing from the l3fwd application by adding the basic 
> commands into the dpdk-sh environment. ** This does not mean I am forcing 
> cmdline on applications or developers they can still use DPDK without cmdline 
> or any other feature.
> 
Sounding more and more like a GDB type of environment...

> Plus I can now execute any function within the application of loaded modules 
> or DPDK by doing a symbol lookup and call the function.
> 
Like GDB

> I am trying to build the dpdk-sh with very little modifications to DPDK and 
> as another application similar to Pktgen. Today the examples directory has 
> some great example code all developed by Intel and I would like to start 
> seeing other applications (like Pktgen) contributed to the community. It 
> would be even better (with some effort) to rewrite the examples directory to 
> use dpdk-sh instead or as well.

Then you should definately use the GDB model, as that requires zero changes to
anything in the DPDK.

> > 
> >> Lets say you have a directory on the

[dpdk-dev] [PATCH] lib/librte_vhost: code style fixes

2014-11-05 Thread Thomas Monjalon
Hi Huawei,

Please set a Signed-off in your patch.

2014-11-05 12:42, Huawei Xie:
> This patch fixes code style issues and refines some comments in vhost library.
> 
> 
> ---
>  lib/librte_vhost/eventfd_link/eventfd_link.c | 244 ++---
>  lib/librte_vhost/eventfd_link/eventfd_link.h | 127 ++-
>  lib/librte_vhost/rte_virtio_net.h|   3 +-
>  lib/librte_vhost/vhost-net-cdev.c| 187 +---
>  lib/librte_vhost/vhost_rxtx.c|  13 +-
>  lib/librte_vhost/virtio-net.c| 317 
> +--
>  6 files changed, 494 insertions(+), 397 deletions(-)
> 
> diff --git a/lib/librte_vhost/eventfd_link/eventfd_link.c 
> b/lib/librte_vhost/eventfd_link/eventfd_link.c
> index fc0653a..542ec2c 100644
> --- a/lib/librte_vhost/eventfd_link/eventfd_link.c
> +++ b/lib/librte_vhost/eventfd_link/eventfd_link.c
> @@ -1,26 +1,26 @@
>  /*-
> - *  * GPL LICENSE SUMMARY
> - *  *
> - *  *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> - *  *
> - *  *   This program is free software; you can redistribute it and/or modify
> - *  *   it under the terms of version 2 of the GNU General Public License as
> - *  *   published by the Free Software Foundation.
> - *  *
> - *  *   This program is distributed in the hope that it will be useful, but
> - *  *   WITHOUT ANY WARRANTY; without even the implied warranty of
> - *  *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> - *  *   General Public License for more details.
> - *  *
> - *  *   You should have received a copy of the GNU General Public License
> - *  *   along with this program; if not, write to the Free Software
> - *  *   Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 
> 02110-1301 USA.
> - *  *   The full GNU General Public License is included in this distribution
> - *  *   in the file called LICENSE.GPL.
> - *  *
> - *  *   Contact Information:
> - *  *   Intel Corporation
> - *   */
> + * GPL LICENSE SUMMARY
> + *
> + *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> + *
> + *   This program is free software; you can redistribute it and/or modify
> + *   it under the terms of version 2 of the GNU General Public License as
> + *   published by the Free Software Foundation.
> + *
> + *   This program is distributed in the hope that it will be useful, but
> + *   WITHOUT ANY WARRANTY; without even the implied warranty of
> + *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + *   General Public License for more details.
> + *
> + *   You should have received a copy of the GNU General Public License
> + *   along with this program; if not, write to the Free Software
> + *   Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 
> USA.
> + *   The full GNU General Public License is included in this distribution
> + *   in the file called LICENSE.GPL.
> + *
> + *   Contact Information:
> + *   Intel Corporation
> + */
>  
>  #include 
>  #include 
> @@ -42,15 +42,15 @@
>   * get_files_struct is copied from fs/file.c
>   */
>  struct files_struct *
> -get_files_struct (struct task_struct *task)
> +get_files_struct(struct task_struct *task)
>  {
>   struct files_struct *files;
>  
> - task_lock (task);
> + task_lock(task);
>   files = task->files;
>   if (files)
> - atomic_inc (&files->count);
> - task_unlock (task);
> + atomic_inc(&files->count);
> + task_unlock(task);
>  
>   return files;
>  }
> @@ -59,17 +59,15 @@ get_files_struct (struct task_struct *task)
>   * put_files_struct is extracted from fs/file.c
>   */
>  void
> -put_files_struct (struct files_struct *files)
> +put_files_struct(struct files_struct *files)
>  {
> - if (atomic_dec_and_test (&files->count))
> - {
> - BUG ();
> - }
> + if (atomic_dec_and_test(&files->count))
> + BUG();
>  }
>  
>  
>  static long
> -eventfd_link_ioctl (struct file *f, unsigned int ioctl, unsigned long arg)
> +eventfd_link_ioctl(struct file *f, unsigned int ioctl, unsigned long arg)
>  {
>   void __user *argp = (void __user *) arg;
>   struct task_struct *task_target = NULL;
> @@ -78,96 +76,88 @@ eventfd_link_ioctl (struct file *f, unsigned int ioctl, 
> unsigned long arg)
>   struct fdtable *fdt;
>   struct eventfd_copy eventfd_copy;
>  
> - switch (ioctl)
> - {
> - case EVENTFD_COPY:
> - if (copy_from_user (&eventfd_copy, argp, sizeof (struct 
> eventfd_copy)))
> - return -EFAULT;
> -
> - /*
> -  * Find the task struct for the target pid
> -  */
> - task_target =
> - pid_task (find_vpid (eventfd_copy.target_pid), 
> PIDTYPE_PID);
> - if (task_target == NULL)
> - {
> - printk (KERN_DEBUG 

[dpdk-dev] [PATCH] eal: map uio resources after hugepages when --base_virtaddr is configured

2014-11-05 Thread De Lara Guarch, Pablo
Patch is missing. 

Thanks,
Pablo

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of XU Liang
> Sent: Wednesday, November 05, 2014 9:50 AM
> To: dev
> Subject: [dpdk-dev] [PATCH] eal: map uio resources after hugepages when --
> base_virtaddr is configured
> 
> 
> When start a secondary process, we got error message "EAL:
> pci_map_resource(): cannot mmap(11, 0x77fba000, 0x2, 0x0): Bad
> file descriptor (0x77fb9000)"
> 
> The secondary process link difference shared libraries, so the 0x77fba000
> is used.
> 
> We know the --base_virtaddr is designed for this situation for hugepages.
> 
> This patch map the device resouce into address that is after hugepages when
> --base_virtaddr is  configured.



[dpdk-dev] Intel 82599 tx_conf setting

2014-11-05 Thread Gyumin
Hi

I've read the Intel 82599 official manual and I found that optimal 
PTHRESH is the tx descriptor buffer size - N (N is CPU cache line 
divided by 16).
1. I guess the size of the tx descriptor buffer is 128. Isn't it right?
Where is the size of the tx descriptor buffer in the official manual?

2. What it means that the TX_PTHRESH=36 in the testpmd.c?
If the size of tx descriptor buffer is 128 then optimal thresholds 
to minimize latency are pthresh=4(cache line / 16), hthresh=0 and 
wthresh=0. Is there something I missed?


Thanks.


[dpdk-dev] [PATCH v4 0/3] Fix packet length issue

2014-11-05 Thread Ananyev, Konstantin
> From: Ouyang, Changchun
> Sent: Wednesday, November 05, 2014 7:11 AM
> To: dev at dpdk.org
> Cc: Xie, Huawei; Ananyev, Konstantin; Cao, Waterman; Ouyang, Changchun
> Subject: [PATCH v4 0/3] Fix packet length issue
> 
> This patch set fix packet length issue in vhost app, and enhance code by
> extracting a function to replace duplicated codes in one copy and zero copy
> TX function.
> 
> -v4 chang:
>  Check offset value and extra bytes inside packet buffer cross page boundary.
> 
> -v3 change:
>  Extract a function to replace duplicated codes in one copy and zero copy TX 
> function.
> 
> -v2 change:
>  Update data length by plus offset in first segment instead of last segment.
> 
> -v1 change:
>  Update the packet length by plus offset;
>  Use macro to replace constant.
> 
> Changchun Ouyang (3):
>   Fix packet length issue in vhost.
>   Extract a function to replace duplicated codes in vhost.
>   Check offset value in vhost
> 
>  examples/vhost/main.c | 142 
> +++---
>  1 file changed, 65 insertions(+), 77 deletions(-)
> 
> --
> 1.8.4.2

Acked-by: Konstantin Ananyev 



[dpdk-dev] segmented recv ixgbevf

2014-11-05 Thread Matt Laswell
Hey Folks,

I ran into the same issue that Alex is describing here, and I wanted to
expand just a little bit on his comments, as the documentation isn't very
clear.

Per the documentation, the two arguments to rte_pktmbuf_pool_init() are a
pointer to the memory pool that contains the newly-allocated mbufs and an
opaque pointer.  The docs are pretty vague about what the opaque pointer
should point to or what it's contents mean; all of the examples I looked at
just pass a NULL pointer. The docs for this function describe the opaque
pointer this way:

"A pointer that can be used by the user to retrieve useful information for
mbuf initialization. This pointer comes from the init_arg parameter of
rte_mempool_create()

."

This is a little bit misleading.  Under the covers, rte_pktmbuf_pool_init()
doesn't threat the opaque pointer as a pointer at all.  Rather, it just
converts it to a uint16_t which contains the desired mbuf size.   If it
receives 0 (in other words, if you passed in a NULL pointer), it will use
2048 bytes + RTE_PKTMBUF_HEADROOM.  Hence, incoming jumbo frames will be
segmented into 2K chunks.

Any chance we could get an improvement to the documentation about this
parameter?  It seems as though the opaque pointer isn't a pointer and
probably shouldn't be opaque.

Hope this helps the next person who comes across this behavior.

--
Matt Laswell
infinite io, inc.

On Thu, Oct 30, 2014 at 7:48 AM, Alex Markuze  wrote:

> For posterity.
>
> 1.When using MTU larger then 2K its advised to provide the value
> to rte_pktmbuf_pool_init.
> 2.ixgbevf rounds down the ("MBUF size" - RTE_PKTMBUF_HEADROOM) to the
> nearest 1K multiple when deciding on the receiving capabilities [buffer
> size]of the Buffers in the pool.
> The function SRRCTL register,  is considered here for some reason?
>


[dpdk-dev] Intel 82599 tx_conf setting

2014-11-05 Thread Jeff Shaw
On Wed, Nov 05, 2014 at 09:43:43AM +0900, Gyumin wrote:
> Hi
> 
> I've read the Intel 82599 official manual and I found that optimal 
> PTHRESH is the tx descriptor buffer size - N (N is CPU cache line 
> divided by 16).

This is sometimes true, but not always.  I believe you are referring
to section "7.2.3.4.1 Transmit Descriptor Fetch and Write-back Settings"
in the datasheet.  You'll see the PTHRESH, HTHRESH, and WTHRESH parameters
should be tuned to for your workload. You should try a few combinations
of parameters (starting with the defaults) to see which is really optimal
for your application.

> 1. I guess the size of the tx descriptor buffer is 128. Isn't it right?
>Where is the size of the tx descriptor buffer in the official manual?

The wording in the manual may be a bit confusing. You will see the manual
refers to the "on-chip descriptor buffer size".  This is where the NIC
stores descriptors which were fetched from the actual descriptor ring in
host memory.  Section "7.2.3.3 Transmit Descriptor Ring" states that the
size of the on-chip descriptor buffer size per queue is 40.

> 
> 2. What it means that the TX_PTHRESH=36 in the testpmd.c?
>If the size of tx descriptor buffer is 128 then optimal thresholds 
> to minimize latency are pthresh=4(cache line / 16), hthresh=0 and 
> wthresh=0. Is there something I missed?

Since the on-chip descriptor buffer size is 40, it is clear that we have
chosen reasonable defaults since 40 minus 4 is 36. I recommend you test
a few different values to see how these parameters impact the performance
characteristics of your workload.

> 
> 
> Thanks.
You're welcome.

-Jeff


[dpdk-dev] [PATCH v8 10/10] app/testpmd:test VxLAN Tx checksum offload

2014-11-05 Thread Liu, Jijiang
Hi Olivier,


> -Original Message-
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Tuesday, November 4, 2014 4:19 PM
> To: Liu, Jijiang
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v8 10/10] app/testpmd:test VxLAN Tx checksum
> offload
> 
> Hello Jijiang,
> 
> On 10/27/2014 03:13 AM, Jijiang Liu wrote:
> > Add test cases in testpmd to test VxLAN Tx Checksum offload, which include
> >   - IPv4 and IPv6 packet
> >   - outer L3, inner L3 and L4 checksum offload for Tx side.
> >
> > Signed-off-by: Jijiang Liu 
> 
> I'm trying to port the test of TSO in csum_only.c which was originally part 
> of this
> patch [1].
> 
> Meanwhile, the source was modified by the patch provided by the email I'm
> replying to (also available at [2]), and I would like to understand what is 
> the
> purpose of it.
> 
> First, the code checks if the mbuf has the flag PKT_RX_TUNNEL_IPV4_HDR.
> What is the meaning of this flag? It was added by [3], but there is no 
> description
> in comments or in the commit log explaining in which case this flag is set by 
> the
> driver. The name supposes that this flag is set when the received packet is 
> an IPv4
> tunnel, but the commit log talks about vxlan.

The flag PKT_RX_TUNNEL_IPV4_HDR can be used for all tunneling packet types with 
outer IPV4 header.
For example:
IPv4 --> GRE/Teredo/VXLAN --> MAC --> IPv4:
MAC, IPV4, GRENAT, MAC, IPV4, SCTP, PAY4 
MAC, IPV4, GRENAT, MAC, IPV6, UDP, PAY4
MAC, IPV4, GRENAT, MAC, IPV6, UDP, PAY4
These tunneling packet formats have a common point that is outer IPv4 header 
here.

Only VXLAN tunneling packet is supported in DPDK for i40e now, so  the commit 
log talks about VXLAN .


> Then, if this flag was present, the code assumes it's a vxlan packet.
> If one of inner checksum is specified by the user in cmdline, the flag
> PKT_TX_VXLAN_CKSUM is added to the mbuf. This flag definition was added by
> [4] (at the wrong place, I'll fix it in my patchset). 

> What is the meaning of this flag?
> Is it enough to checksum outer L3, inner L3, and inner L4 as specified in 
> commit
> log? If yes, why are the other flags PKT_TX_IPV4_CSUM, PKT_TX_UDP_CKSUM,
> (...) added in the mbuf later?
> In my comprehension, these flags are needed in addition to
> PKT_TX_VXLAN_CKSUM to do the checksum of the inner headers.

Yes, these flags(PKT_TX_IPV4_CSUM, PKT_TX_UDP_CKSUM)  are needed by HW offload 
of non-tunneling and tunneling  packet.

> In general, I need to understand how to use all the new offloading stuff. In 
> [5],
> new fields were added in the mbuf structure (inner_l3_len and inner_l2_len). 
> I'm
> not sure I understand which fields have to be filled. Below is my 
> understanding,
> can you please check that it is correct?
> 
> A- To send a Ether/IP/TCP packet and process IP and TCP TX
> checksum in the NIC:
> 
>- set l2_len = 14, l3_len = 20,
>  ol_flags = PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM,
>  write all checksums to 0 in the packet
   IP checksum is 0, but tcp checksum is not 0.
   tcp_hdr->cksum = get_ipv4_psd_sum(ipv4_hdr);

> B- To send a Ether/IP/UDP/vxlan/Ether/IP/TCP packet and
> process inner IP and inner TCP TX checksum in the NIC:
> 
>- set l2_len = 14+20+8+8+14, l3_len = 20,

No, l2_len is outer L2 length, its value also is 14. 
>  ol_flags = PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM,
>  write all checksums to 0 in the packet

   IP checksum is 0, but tcp checksum is not 0.
   tcp_hdr->cksum = get_ipv4_psd_sum(ipv4_hdr);

> C- To send a Ether/IP/UDP/vxlan/Ether/IP/TCP packet and
> process outer IP and UDP, plus inner IP and inner TCP TX
> checksum in the NIC:
> 
>- set l2_len = 14, l3_len = 20, inner_l2_len = 14,
>  inner_l3_len = 20,
Yes
>  ol_flags = PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM |
> PKT_TX_VXLAN_CKSUM,
Yes
>  write all checksums to 0 in the packet

   Outer and inner IP checksum is 0, but tcp checksum is not 0.
   tcp_hdr->cksum = get_ipv4_psd_sum(ipv4_hdr);


> D- To send a Ether/IP/UDP/vxlan/Ether/IP/TCP packet and
> process outer IP and UDP TX checksum in the NIC:
> 
>- set l2_len = 14, l3_len = 20,
>  ol_flags = PKT_TX_IP_CKSUM | PKT_TX_UDP_CKSUM,

Yes
>  write all checksums to 0 in the packet
Outer  IP checksum is 0, but tcp checksum is not 0.
   tcp_hdr->cksum = get_ipv4_psd_sum(ipv4_hdr);


> First, can you confirm this? I think it is very important to document it, as 
> this is a
> public API that can be used by DPDK users.

These should be included in documents.

> To validate my modifications, I will try to reproduce the test plan described 
> in [6].
> The test report contains useful information but I'm not sure to understand the
> following:
> 
> Enable outer IP,UDP,TCP,SCTP and inner IP,UDP checksum offload
> when inner L4 protocal is UDP.
>testpmd>tx_checksum set 0 0xf3

"tx_checksum set 0 0xf3" should be "tx_checksum set 0 0x33", but the tester use 
0xFX (that is to say, enable all inner TX flag)in order to write tes

[dpdk-dev] [PATCH 0/5] vmxnet3 pmd fixes/improvement

2014-11-05 Thread Cao, Waterman
Hi Thomas,

Yes. Xiaonan just want to confirm if yong's patch doesn't impact 
original functionality and regression test cases under VMware.
Xiaonan will check with yong and see if we can add some test in the 
regression to new changes.

Waterman 

-Original Message-
>From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas Monjalon
>Sent: Wednesday, November 5, 2014 6:50 AM
>To: Zhang, XiaonanX
>Cc: dev at dpdk.org
>Subject: Re: [dpdk-dev] [PATCH 0/5] vmxnet3 pmd fixes/improvement
>
>Hi,
>
>These tests don't seem related to the patchset.
>It would be more interesting to test vlan, stop/restart, Rx checks and Rx 
>performance improvement.
>
>--
>Thomas
>
>
>2014-11-04 05:57, Zhang, XiaonanX:
>> Tested-by: Xiaonan Zhang 
>> 
>> - Tested Commit: Yong Wang
>> - OS: Fedora20 3.15.8-200.fc20.x86_64
>> - GCC: gcc version 4.8.3 20140624
>> - CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
>> - NIC: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection 
>> [8086:10fb]
>> - Default x86_64-native-linuxapp-gcc configuration
>> - Total 6 cases, 6 passed, 0 failed
>> - Test Environment setup
>> 
>> - Topology #1: Create 2VMs (Fedora 20, 64bit);for each VM, pass through one 
>> physical port(Niantic 82599) to VM, and also create one virtual device: 
>> vmxnet3 in VM. Between two VMs, use one vswitch to connect 2 vmxnet3. In 
>> summary, PF1 
>>and vmxnet3A are in VM1; PF2 and vmxnet3B are in VM2.The 
>> traffic flow for l2fwd/l3fwd is as below:
>>Ixia -> PF1 -> vmxnet3A -> vswitch -> vmxnet3B -> PF2 -> 
>> Ixia. 
>> - Topology #2: Create 1VM (Fedora 20, 64bit), on this VM, created 2 vmxnet3, 
>> called vmxnet3A, vmxnet3B; create 2 vswitch, vswitchA connecting PF1 and 
>> vmxnet3A, while vswitchB connecting PF2 and vmxnet3B. The traffic flow is as 
>> below:
>>Ixia -> PF1 -> vswitchA -> vmxnet3A -> vmxnet3B -> vswitchB 
>> -> PF2 -> Ixia.
>> 
>> - Test Case1: L2fwd with Topology#1 
>>   Description: Set up topology#1(in prerequisite session), and bind PF1, 
>> PF2, Vmxnet3A, vmxnet3B to DPDK poll-mode driver (igb_uio).
>>Increase the flow at line rate (uni-directional traffic), 
>> send the flow at different packet size (64bytes, 128bytes, 256bytes, 
>> 512bytes, 1024bytes, 1280bytes and 1518bytes) and check the received 
>> packets/rate to see  
>>if any unexpected behavior, such as no receives after N 
>> packets. 
>>   Command / instruction:
>> To run the l2fwd example in 2VMs:
>> ./build/l2fwd -c f -n 4 -- -p 0x3
>> - Test IXIA Flow prerequisite: Ixia port1 sends 5 packets to PF1, and the 
>> flow should have PF1's MAC as destination MAC. Check if ixia port2 have 
>> received the 5 packets.
>>   Expected test result:
>> Passed
>> 
>> - Test Case2: L3fwd-VF with Topology#1
>>   Description: Set up topology#1(in prerequisite session), and bind PF1, 
>> PF2, Vmxnet3A, vmxnet3B to DPDK poll-mode driver (igb_uio)
>>Increase the flow at line rate (uni-directional traffic), 
>> send the flow at different packet size (64bytes, 128bytes, 256bytes, 
>> 512bytes, 1024bytes, 1280bytes and 1518bytes) and check the received 
>> packets/rate to see  
>>if any unexpected behavior, such as no receives after N 
>> packets.
>>   Command / instruction:
>> To run the l3fwd-vf example in 2VMs:
>> ./build/l3fwd-vf -c 0x6 -n 4 -- -p 0x3 
>> --config "(0,0,1),(1,0,2)"
>> - Test IXIA Flow prerequisite: Ixia port1 sends 5 packets to PF1, and the 
>> flow should have PF1's MAC as destination MAC and have 2.1.1.x as 
>> destination IP. Check if ixia port2 have received the 5 packets.
>>   Expected test result:
>> Passed
>> 
>> - Test Case3: L2fwd with Topology#2
>>   Description: Set up topology#2(in prerequisite session), and bind vmxnet3A 
>> and vmxnet3B to DPDK poll-mode driver (igb_uio).
>>Increase the flow at line rate (uni-directional traffic), 
>> send the flow at different packet size (64bytes, 128bytes, 256bytes, 
>> 512bytes, 1024bytes, 1280bytes and 1518bytes) and check the received 
>> packets/rate to see  
>>if any unexpected behavior, such as no receives after N 
>> packets.
>>   Command / instruction:
>> To run the l2fwd example in VM1:
>> ./build/l2fwd -c f -n 4 -- -p 0x3
>> - Test IXIA Flow prerequisite: Ixia port1 sends 5 packets to port0 
>> (vmxnet3A), and the flow should have port0's MAC as destination MAC. Check 
>> if ixia port2 have received the 5 packets. Similar things need to be done at 
>> ixia port2.
>>   Expected test result:
>> Passed
>> 
>> - Test Case4: L3fwd-VF with Topology#2
>>   Description: Set up topology#2(in prerequisite session), and bind vmxnet3A 
>> and vmxnet3B to DPDK poll-mode driver (igb_uio).

[dpdk-dev] [PATCH v5 0/8] link bonding

2014-11-05 Thread Jiajia, SunX
Tested-by: Jiajia, SunX  
- Tested Commit: f7aaae2fe6f7f9a78eab7313d77e92b934693b5d
- OS: Fedora20 3.11.10-301.fc20.x86_64 and 3.16.6-200.fc20.x86_64
- GCC: gcc version 4.8.2
- CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
- NIC: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection 
[8086:10fb]
- Default x86_64-native-linuxapp-gcc configuration
- Total 18 cases, 18 passed, 0 failed

TOPO:
* Connections ports between tester/ixia and DUT
  - TESTER(Or IXIA)---DUT
  - portA--port0
  - portB--port1
  - portC--port2
  - portD--port3
* Connections ports between tester/ixia and switch and DUT
  - TESTER(Or IXIA)---SWITCH
  -   SWITCH-DUT
  - TESTER(Or IXIA)--DUT
  - portA--switch
  -switch--port0
  -switch--port1
  -switch--port2
  - portB--port3


Test Setup#1 for Functional test


Tester has 4 ports(portA--portD), and DUT has 4 ports(port0--port3), then 
connect portA to port0, portB to port1, portC to port2, portD to port3. 


Test Setup#2 for Functional test


Tester has 2 ports(portA--portB), DUT has 4 ports(port0--port3), and a switch 
that supports IEEE 802.3ad Dynamic link aggregation, then connect port0-port2 
to the switch for dynamic link aggregation, connect portA to aggregated 
interface on the switch, connect portB to port3.


- Case: Basic bonding--Create bonded devices and slaves
  Description: 
Use Setup#1.
Create bonded device and add some ports as salve of bonded 
device,
Then removed slaves or added slaves or change the bonding primary 
slave
Or change bonding mode and so on.
  Expected test result:
Verify the basic functions are normal.

- Case: Basic bonding--MAC Address Test
  Description: 
Use Setup#1.
Create bonded device and add some ports as slaves of bonded 
device,
Check that the changes of  the bonded device and slave MAC 
  Expected test result:
Verify the behavior of bonded device and slave according to the 
mode.

- Case: Basic bonding--Device Promiscuous Mode Test
  Description: 
Use Setup#1.
Create bonded device and add some ports as slaves of bonded 
device,
Set promiscuous mode on or off, then send packets to the bonded 
device
Or slaves.
  Expected test result:
Verify the RX/TX status of bonded device and slaves according to 
the mode.

- Case: Mode 0(Round Robin) TX/RX test
  Description: 
Use Setup#1.
Create bonded device with mode 0 and add 3 ports as slaves of 
bonded device,
Forward packets between bonded device and unbounded device, start 
to forward,
And send packets to unbound device or slaves.
  Expected test result:
Verify the RX/TX status of bonded device and slaves in mode 0.

- Case: Mode 0(Round Robin) Bring one slave link down
  Description: 
Use Setup#1.
Create bonded device with mode 0 and add 3 ports as slaves of 
bonded device,
Forward packets between bonded device and unbounded device, start 
to forward,
Bring the link on either port 0, 1 or 2 down. And send packets to 
unbound 
device or slaves.
  Expected test result:
Verify the RX/TX status of bonded device and slaves in mode 0.

- Case: Mode 0(Round Robin) Bring all slave links down
  Description: 
Use Setup#1.
Create bonded device with mode 0 and add 3 ports as slaves of 
bonded device,
Forward packets between bonded device and unbounded device, start 
to forward,
Bring the links down on all bonded ports. And send packets to 
unbound 
device or slaves.
  Expected test result:
Verify the RX/TX status of bonded device and slaves in mode 0.

- Case: Mode 1(Active Backup) TX/RX Test
  Description: 
Use Setup#1.
Create bonded device with mode 1 and add 3 ports as slaves of 
bonded device,
Forward packets between bonded device and unbounded device, start 
to forward,
And send packets to unbound device or slaves.
  Expected test result:
Verify the RX/TX status of bonded device and slaves in mode 1.

- Case: Mode 1(Active Backup) Change active slave, RX/TX test
  Description: 
Use Setup#1.
Continuing from previous test case.Change the active slave port 
from port0 
to port1.Verify that the bonded device's MAC has changed to 
slave1's MAC.

testpmd> set bonding primary 1 4 

   Repeat the transmission and reception(TX/RX) test verify 

[dpdk-dev] [PATCH 0/5] vmxnet3 pmd fixes/improvement

2014-11-05 Thread Cao, Waterman
Hi Yong,

We tested your patch with VMWare ESX 5.5.
It works fine with R1.8 RC1. 
You can find more details from Xiaonan's reports.

Regards

Waterman 
>-Original Message-
>From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Yong Wang
>Sent: Tuesday, October 14, 2014 5:00 AM
>To: Thomas Monjalon
>Cc: dev at dpdk.org
>Subject: Re: [dpdk-dev] [PATCH 0/5] vmxnet3 pmd fixes/improvement
>
>Only the last one is performance related and it merely tries to give hints to 
>the compiler to hopefully make branch prediction more efficient.  It also 
>moves a constant assignment out of the pkt polling loop.
>
>We did performance evaluation on a Nehalem box with 4cores at 2.8GHz x 2 
>socket:
>On the DPDK-side, it's running some l3 forwarding apps in a VM on ESXi with 
>one core assigned for polling.  The client side is pktgen/dpdk, pumping 64B 
>tcp packets at line rate.  Before the patch, we are seeing ~900K PPS with 65% 
>cpu of a core used for DPDK.  After the patch, we are seeing the same pkt rate 
>with only 45% of a core used.  CPU usage is collected factoring our the idle 
>loop cost.  The packet rate is a result of the mode we used for vmxnet3 (pure 
>emulation mode running default number of hypervisor contexts).  I can add 
>these info in the review request.
>
>Yong
>
>From: Thomas Monjalon 
>Sent: Monday, October 13, 2014 1:29 PM
>To: Yong Wang
>Cc: dev at dpdk.org
>Subject: Re: [dpdk-dev] [PATCH 0/5] vmxnet3 pmd fixes/improvement
>
>Hi,
>
>2014-10-12 23:23, Yong Wang:
>> This patch series include various fixes and improvement to the
>> vmxnet3 pmd driver.
>>
>> Yong Wang (5):
>>   vmxnet3: Fix VLAN Rx stripping
>>   vmxnet3: Add VLAN Tx offload
>>   vmxnet3: Fix dev stop/restart bug
>>   vmxnet3: Add rx pkt check offloads
>>   vmxnet3: Some perf improvement on the rx path
>
>Please, could describe what is the performance gain for these patches?
>Benchmark numbers would be appreciated.
>
>Thanks
>--
>Thomas

-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Yong Wang
Sent: Tuesday, October 14, 2014 5:00 AM
To: Thomas Monjalon
Cc: dev at dpdk.org
Subject: Re: [dpdk-dev] [PATCH 0/5] vmxnet3 pmd fixes/improvement

Only the last one is performance related and it merely tries to give hints to 
the compiler to hopefully make branch prediction more efficient.  It also moves 
a constant assignment out of the pkt polling loop.

We did performance evaluation on a Nehalem box with 4cores at 2.8GHz x 2 socket:
On the DPDK-side, it's running some l3 forwarding apps in a VM on ESXi with one 
core assigned for polling.  The client side is pktgen/dpdk, pumping 64B tcp 
packets at line rate.  Before the patch, we are seeing ~900K PPS with 65% cpu 
of a core used for DPDK.  After the patch, we are seeing the same pkt rate with 
only 45% of a core used.  CPU usage is collected factoring our the idle loop 
cost.  The packet rate is a result of the mode we used for vmxnet3 (pure 
emulation mode running default number of hypervisor contexts).  I can add these 
info in the review request.

Yong

From: Thomas Monjalon 
Sent: Monday, October 13, 2014 1:29 PM
To: Yong Wang
Cc: dev at dpdk.org
Subject: Re: [dpdk-dev] [PATCH 0/5] vmxnet3 pmd fixes/improvement

Hi,

2014-10-12 23:23, Yong Wang:
> This patch series include various fixes and improvement to the
> vmxnet3 pmd driver.
>
> Yong Wang (5):
>   vmxnet3: Fix VLAN Rx stripping
>   vmxnet3: Add VLAN Tx offload
>   vmxnet3: Fix dev stop/restart bug
>   vmxnet3: Add rx pkt check offloads
>   vmxnet3: Some perf improvement on the rx path

Please, could describe what is the performance gain for these patches?
Benchmark numbers would be appreciated.

Thanks
--
Thomas


[dpdk-dev] [RFC PATCH] Adding RTE_KNI_PREEMPT configuration option

2014-11-05 Thread Marc Sune
This patch introduces CONFIG_RTE_KNI_PREEMPT flag. When set to 'no', KNI
kernel thread(s) do not call schedule_timeout_interruptible(), which improves
overall KNI performance at the expense of CPU cycles (polling).

Default values is 'yes', maintaining the same behaviour as of now.

Note: this RFC patch is based on v1.7.1, since I was using a 1.7 application.
It will eventually be rebased to 1.8 upon acceptance.

Signed-off-by: Marc Sune 
---
 config/common_linuxapp |1 +
 lib/librte_eal/linuxapp/kni/kni_misc.c |4 
 2 files changed, 5 insertions(+)

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 9047975..9cebcf1 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -382,6 +382,7 @@ CONFIG_RTE_LIBRTE_PIPELINE=y
 # Compile librte_kni
 #
 CONFIG_RTE_LIBRTE_KNI=y
+CONFIG_RTE_KNI_PREEMPT=y
 CONFIG_RTE_KNI_KO_DEBUG=n
 CONFIG_RTE_KNI_VHOST=n
 CONFIG_RTE_KNI_VHOST_MAX_CACHE_SIZE=1024
diff --git a/lib/librte_eal/linuxapp/kni/kni_misc.c 
b/lib/librte_eal/linuxapp/kni/kni_misc.c
index ba6..e7e6c27 100644
--- a/lib/librte_eal/linuxapp/kni/kni_misc.c
+++ b/lib/librte_eal/linuxapp/kni/kni_misc.c
@@ -229,9 +229,11 @@ kni_thread_single(void *unused)
}
}
up_read(&kni_list_lock);
+#ifdef RTE_KNI_PREEMPT
/* reschedule out for a while */
schedule_timeout_interruptible(usecs_to_jiffies( \
KNI_KTHREAD_RESCHEDULE_INTERVAL));
+#endif
}

return 0;
@@ -252,8 +254,10 @@ kni_thread_multiple(void *param)
 #endif
kni_net_poll_resp(dev);
}
+#ifdef RTE_KNI_PREEMPT
schedule_timeout_interruptible(usecs_to_jiffies( \
KNI_KTHREAD_RESCHEDULE_INTERVAL));
+#endif
}

return 0;
-- 
1.7.10.4



[dpdk-dev] [PATCH 5/5] vmxnet3: Some perf improvement on the rx path

2014-11-05 Thread Thomas Monjalon
2014-10-12 23:23, Yong Wang:
> Signed-off-by: Yong Wang 

Please, could you give some explanations to put in the commit log?

Thanks
-- 
Thomas


[dpdk-dev] [PATCH v3 0/6] i40e VMDQ support

2014-11-05 Thread Thomas Monjalon
> > v3:
> > - Fix comments style.
> > - Simplify words in comments.
> > - Add variable defintion for BSD config file.
> > - Code rebase to latest DPDK repo.
> > 
> > v2:
> > - Fix a few typos.
> > - Add comments for RX mq mode flags.
> > - Remove '\n' from some log messages.
> > - Remove 'Acked-by' in commit log.
> > 
> > v1:
> > Define extra VMDQ arguments to expand VMDQ configuration. This also
> > includes change in igb and ixgbe PMD driver. In the meanwhile, fix 2
> > defects in rte_ether library.
> > 
> > Add full VMDQ support in i40e PMD driver. renamed some functions, setup
> > VMDQ VSI after it's enabled in application. It also make some improvement
> > on macaddr add/delete to support setting multiple macaddr for single or
> > multiple pools.
> > 
> > Finally, change i40e rx/tx_queue_setup and dev_start/stop functions to
> > configure/switch queues belonging to VMDQ pools.
> > 
> > Chen Jing D(Mark) (6):
> >   ether: enhancement for VMDQ support
> >   igb: change for VMDQ arguments expansion
> >   ixgbe: change for VMDQ arguments expansion
> >   i40e: add VMDQ support
> >   i40e: macaddr add/del enhancement
> >   i40e: Add full VMDQ pools support
> 
> Acked-by: Konstantin Ananyev 

Applied

It will need to be well explained in the programmer's guide.

Thanks
-- 
Thomas