[dpdk-dev] [PATCH] eal/linux: allow to map BARs with MSI-X tables, around them

2015-01-29 Thread Dan Aloni
On Wed, Jan 28, 2015 at 03:01:38PM +, Burakov, Anatoly wrote:
> Hi Dan
> 
> Apologies for not looking at it earlier.

No problem, we are all quite busy :)

> > +   if (map_addr != MAP_FAILED
> > +   &&  memreg[1].offset  &&  memreg[1].size) {
> > +   uint8_t *second_addr =
> > +   ((uint8_t *)bar_addr +
> > memreg[1].offset);
> 
> Nitpicking, but probably better to use void* and RTE_PTR_ADD here.

Nitpicking very justified. New patch coming your way.

-- 
Dan Aloni


[dpdk-dev] [PATCH v2] eal/linux: allow to map BARs with MSI-X tables, around them

2015-01-29 Thread Dan Aloni
While VFIO doesn't allow us to map complete BARs with MSI-X tables,
it does allow us to map around them in PAGE_SIZE granularity. There
might be adapters that provide their registers in the same BAR
but on a different page. For example, Intel's NVME adapter, though
not a network adapter, provides only one MMIO BAR that contains
the MSI-X table.

Signed-off-by: Dan Aloni 
Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/eal_pci.c  |  5 +-
 lib/librte_eal/linuxapp/eal/eal_pci_init.h |  2 +-
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c  |  4 +-
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 98 +++---
 lib/librte_eal/linuxapp/eal/eal_vfio.h |  8 ++-
 5 files changed, 100 insertions(+), 17 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c 
b/lib/librte_eal/linuxapp/eal/eal_pci.c
index b5f54101e8aa..4a74a9372a15 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -118,13 +118,14 @@ pci_find_max_end_va(void)

 /* map a particular resource from a file */
 void *
-pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size)
+pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size,
+int additional_flags)
 {
void *mapaddr;

/* Map the PCI memory resource of device */
mapaddr = mmap(requested_addr, size, PROT_READ | PROT_WRITE,
-   MAP_SHARED, fd, offset);
+   MAP_SHARED | additional_flags, fd, offset);
if (mapaddr == MAP_FAILED) {
RTE_LOG(ERR, EAL, "%s(): cannot mmap(%d, %p, 0x%lx, 0x%lx): %s 
(%p)\n",
__func__, fd, requested_addr,
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_init.h 
b/lib/librte_eal/linuxapp/eal/eal_pci_init.h
index 1070eb88fe0a..0a0853d4c4df 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_init.h
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_init.h
@@ -66,7 +66,7 @@ extern void *pci_map_addr;
 void *pci_find_max_end_va(void);

 void *pci_map_resource(void *requested_addr, int fd, off_t offset,
-   size_t size);
+  size_t size, int additional_flags);

 /* map IGB_UIO resource prototype */
 int pci_uio_map_resource(struct rte_pci_device *dev);
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index e53f06b82430..eaa2e36f643e 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -139,7 +139,7 @@ pci_uio_map_secondary(struct rte_pci_device *dev)

if (pci_map_resource(uio_res->maps[i].addr, fd,
 (off_t)uio_res->maps[i].offset,
-(size_t)uio_res->maps[i].size)
+(size_t)uio_res->maps[i].size, 0)
!= uio_res->maps[i].addr) {
RTE_LOG(ERR, EAL,
"Cannot mmap device resource\n");
@@ -379,7 +379,7 @@ pci_uio_map_resource(struct rte_pci_device *dev)
pci_map_addr = pci_find_max_end_va();

mapaddr = pci_map_resource(pci_map_addr, fd, 
(off_t)offset,
-   (size_t)maps[j].size);
+   (size_t)maps[j].size, 0);
if (mapaddr == MAP_FAILED)
fail = 1;

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index 20e097727f80..c8df91c0f800 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -62,6 +62,9 @@

 #ifdef VFIO_PRESENT

+#define PAGE_SIZE   (sysconf(_SC_PAGESIZE))
+#define PAGE_MASK   (~(PAGE_SIZE - 1))
+
 #define VFIO_DIR "/dev/vfio"
 #define VFIO_CONTAINER_PATH "/dev/vfio/vfio"
 #define VFIO_GROUP_FMT "/dev/vfio/%u"
@@ -72,10 +75,12 @@ static struct vfio_config vfio_cfg;

 /* get PCI BAR number where MSI-X interrupts are */
 static int
-pci_vfio_get_msix_bar(int fd, int *msix_bar)
+pci_vfio_get_msix_bar(int fd, int *msix_bar, uint32_t *msix_table_offset,
+ uint32_t *msix_table_size)
 {
int ret;
uint32_t reg;
+   uint16_t flags;
uint8_t cap_id, cap_offset;

/* read PCI capability pointer from config space */
@@ -134,7 +139,18 @@ pci_vfio_get_msix_bar(int fd, int *msix_bar)
return -1;
}

+   ret = pread64(fd, &flags, sizeof(flags),
+   
VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
+   cap_offset + 2);
+   if (ret != sizeof(flags)) {
+   RTE_LOG(ERR, EAL, "Cannot read table flags from 
PCI config "
+   

[dpdk-dev] [PATCH v3 00/16] support multi-pthread per core

2015-01-29 Thread Cunming Liang
v3 changes:
  add sched_yield() in rte_ring to avoid long spin [15/16]

v2 changes:
  add '-' support for EAL option '--lcores'

The patch series contain the enhancements of EAL and fixes for libraries
to run multi-pthreads(either EAL or non-EAL thread) per physical core.
Two major changes list as below:
- Extend the core affinity of each EAL thread to 1:n.
  Each lcore stands for an EAL thread rather than a logical core.
  The change adds a new EAL option to allow static lcore-to-cpuset assignment.
  An lcore (EAL thread) then has affinity to a cpuset; the original 1:1 mapping is the 
special case.
- Fix the libraries to allow running on any non-EAL thread.
  It fixes the gaps in running libraries in a non-EAL thread (dynamically created by the user).
  Each fixed library takes care of the case of rte_lcore_id() >= RTE_MAX_LCORE.

Thanks a million for the comments from Konstantin, Bruce, Mirek and Stephen in 
RFC review.



*** BLURB HERE ***

Cunming Liang (16):
  eal: add cpuset into per EAL thread lcore_config
  eal: new eal option '--lcores' for cpu assignment
  eal: add support parsing socket_id from cpuset
  eal: new TLS definition and API declaration
  eal: add eal_common_thread.c for common thread API
  eal: add rte_gettid() to acquire unique system tid
  eal: apply affinity of EAL thread by assigned cpuset
  enic: fix re-define freebsd compile complain
  malloc: fix the issue of SOCKET_ID_ANY
  log: fix the gap to support non-EAL thread
  eal: set _lcore_id and _socket_id to (-1) by default
  eal: fix recursive spinlock in non-EAL thread
  mempool: add support to non-EAL thread
  ring: add support to non-EAL thread
  ring: add sched_yield to avoid spin forever
  timer: add support to non-EAL thread

 lib/librte_eal/bsdapp/eal/Makefile |   1 +
 lib/librte_eal/bsdapp/eal/eal.c|  13 +-
 lib/librte_eal/bsdapp/eal/eal_lcore.c  |  14 +
 lib/librte_eal/bsdapp/eal/eal_memory.c |   2 +
 lib/librte_eal/bsdapp/eal/eal_thread.c |  76 +++---
 lib/librte_eal/common/eal_common_launch.c  |   1 -
 lib/librte_eal/common/eal_common_log.c |  17 +-
 lib/librte_eal/common/eal_common_options.c | 300 -
 lib/librte_eal/common/eal_common_thread.c  | 142 ++
 lib/librte_eal/common/eal_options.h|   2 +
 lib/librte_eal/common/eal_thread.h |  66 +
 .../common/include/generic/rte_spinlock.h  |   4 +-
 lib/librte_eal/common/include/rte_eal.h|  27 ++
 lib/librte_eal/common/include/rte_lcore.h  |  37 ++-
 lib/librte_eal/common/include/rte_log.h|   5 +
 lib/librte_eal/linuxapp/eal/Makefile   |   4 +
 lib/librte_eal/linuxapp/eal/eal.c  |   7 +-
 lib/librte_eal/linuxapp/eal/eal_lcore.c|  15 ++
 lib/librte_eal/linuxapp/eal/eal_thread.c   |  78 +++---
 lib/librte_malloc/malloc_heap.h|   7 +-
 lib/librte_mempool/rte_mempool.h   |  18 +-
 lib/librte_pmd_enic/enic.h |   1 +
 lib/librte_pmd_enic/enic_compat.h  |   1 +
 lib/librte_ring/rte_ring.h |  35 ++-
 lib/librte_timer/rte_timer.c   |  40 ++-
 lib/librte_timer/rte_timer.h   |   2 +-
 26 files changed, 778 insertions(+), 137 deletions(-)
 create mode 100644 lib/librte_eal/common/eal_common_thread.c

-- 
1.8.1.4



[dpdk-dev] [PATCH v3 01/16] eal: add cpuset into per EAL thread lcore_config

2015-01-29 Thread Cunming Liang
The patch adds 'cpuset' into the per-lcore configuration 'lcore_config[]',
as an lcore is no longer always 1:1 pinned to a physical cpu.
An lcore now stands for an EAL thread rather than a logical cpu.

It doesn't change the default behavior of 1:1 mapping, but allows
affinitizing the EAL thread to multiple cpus.

Signed-off-by: Cunming Liang 
---
 lib/librte_eal/bsdapp/eal/eal_lcore.c | 7 +++
 lib/librte_eal/bsdapp/eal/eal_memory.c| 2 ++
 lib/librte_eal/common/include/rte_lcore.h | 8 
 lib/librte_eal/linuxapp/eal/Makefile  | 1 +
 lib/librte_eal/linuxapp/eal/eal_lcore.c   | 8 
 5 files changed, 26 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal_lcore.c 
b/lib/librte_eal/bsdapp/eal/eal_lcore.c
index 662f024..72f8ac2 100644
--- a/lib/librte_eal/bsdapp/eal/eal_lcore.c
+++ b/lib/librte_eal/bsdapp/eal/eal_lcore.c
@@ -76,11 +76,18 @@ rte_eal_cpu_init(void)
 * ones and enable them by default.
 */
for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+   /* init cpuset for per lcore config */
+   CPU_ZERO(&lcore_config[lcore_id].cpuset);
+
lcore_config[lcore_id].detected = (lcore_id < ncpus);
if (lcore_config[lcore_id].detected == 0) {
config->lcore_role[lcore_id] = ROLE_OFF;
continue;
}
+
+   /* By default, lcore 1:1 map to cpu id */
+   CPU_SET(lcore_id, &lcore_config[lcore_id].cpuset);
+
/* By default, each detected core is enabled */
config->lcore_role[lcore_id] = ROLE_RTE;
lcore_config[lcore_id].core_id = cpu_core_id(lcore_id);
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c 
b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 65ee87d..a34d500 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -45,6 +45,8 @@
 #include "eal_internal_cfg.h"
 #include "eal_filesystem.h"

+/* avoid re-defined against with freebsd header */
+#undef PAGE_SIZE
 #define PAGE_SIZE (sysconf(_SC_PAGESIZE))

 /*
diff --git a/lib/librte_eal/common/include/rte_lcore.h 
b/lib/librte_eal/common/include/rte_lcore.h
index 49b2c03..4c7d6bb 100644
--- a/lib/librte_eal/common/include/rte_lcore.h
+++ b/lib/librte_eal/common/include/rte_lcore.h
@@ -50,6 +50,13 @@ extern "C" {

 #define LCORE_ID_ANY -1/**< Any lcore. */

+#if defined(__linux__)
+   typedef cpu_set_t rte_cpuset_t;
+#elif defined(__FreeBSD__)
+#include 
+   typedef cpuset_t rte_cpuset_t;
+#endif
+
 /**
  * Structure storing internal configuration (per-lcore)
  */
@@ -65,6 +72,7 @@ struct lcore_config {
unsigned socket_id;/**< physical socket id for this lcore */
unsigned core_id;  /**< core number on socket for this lcore */
int core_index;/**< relative index, starting from 0 */
+   rte_cpuset_t cpuset;   /**< cpu set which the lcore affinity to */
 };

 /**
diff --git a/lib/librte_eal/linuxapp/eal/Makefile 
b/lib/librte_eal/linuxapp/eal/Makefile
index 72ecf3a..0e9c447 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -87,6 +87,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_dev.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_options.c

 CFLAGS_eal.o := -D_GNU_SOURCE
+CFLAGS_eal_lcore.o := -D_GNU_SOURCE
 CFLAGS_eal_thread.o := -D_GNU_SOURCE
 CFLAGS_eal_log.o := -D_GNU_SOURCE
 CFLAGS_eal_common_log.o := -D_GNU_SOURCE
diff --git a/lib/librte_eal/linuxapp/eal/eal_lcore.c 
b/lib/librte_eal/linuxapp/eal/eal_lcore.c
index c67e0e6..29615f8 100644
--- a/lib/librte_eal/linuxapp/eal/eal_lcore.c
+++ b/lib/librte_eal/linuxapp/eal/eal_lcore.c
@@ -158,11 +158,19 @@ rte_eal_cpu_init(void)
 * ones and enable them by default.
 */
for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+   /* init cpuset for per lcore config */
+   CPU_ZERO(&lcore_config[lcore_id].cpuset);
+
+   /* in 1:1 mapping, record related cpu detected state */
lcore_config[lcore_id].detected = cpu_detected(lcore_id);
if (lcore_config[lcore_id].detected == 0) {
config->lcore_role[lcore_id] = ROLE_OFF;
continue;
}
+
+   /* By default, lcore 1:1 map to cpu id */
+   CPU_SET(lcore_id, &lcore_config[lcore_id].cpuset);
+
/* By default, each detected core is enabled */
config->lcore_role[lcore_id] = ROLE_RTE;
lcore_config[lcore_id].core_id = cpu_core_id(lcore_id);
-- 
1.8.1.4



[dpdk-dev] [PATCH v3 02/16] eal: new eal option '--lcores' for cpu assignment

2015-01-29 Thread Cunming Liang
It supports one new eal long option '--lcores' for EAL thread cpuset assignment.

The format pattern:
--lcores='lcores[@cpus]<,lcores[@cpus]>'
lcores, cpus could be a single digit/range or a group.
'(' and ')' are necessary if it's a group.
If '@cpus' is not supplied, cpus defaults to the same value as lcores.

e.g. '1,2@(5-7),(3-5)@(0,2),(0,6),7-8' means starting 9 EAL threads as below
  lcore 0 runs on cpuset 0x41 (cpu 0,6)
  lcore 1 runs on cpuset 0x2 (cpu 1)
  lcore 2 runs on cpuset 0xe0 (cpu 5,6,7)
  lcore 3,4,5 runs on cpuset 0x5 (cpu 0,2)
  lcore 6 runs on cpuset 0x41 (cpu 0,6)
  lcore 7 runs on cpuset 0x80 (cpu 7)
  lcore 8 runs on cpuset 0x100 (cpu 8)

Signed-off-by: Cunming Liang 
---
 lib/librte_eal/common/eal_common_launch.c  |   1 -
 lib/librte_eal/common/eal_common_options.c | 300 -
 lib/librte_eal/common/eal_options.h|   2 +
 lib/librte_eal/linuxapp/eal/Makefile   |   1 +
 4 files changed, 299 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_launch.c 
b/lib/librte_eal/common/eal_common_launch.c
index 599f83b..2d732b1 100644
--- a/lib/librte_eal/common/eal_common_launch.c
+++ b/lib/librte_eal/common/eal_common_launch.c
@@ -117,4 +117,3 @@ rte_eal_mp_wait_lcore(void)
rte_eal_wait_lcore(lcore_id);
}
 }
-
diff --git a/lib/librte_eal/common/eal_common_options.c 
b/lib/librte_eal/common/eal_common_options.c
index 67e02dc..29ebb6f 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -45,6 +45,7 @@
 #include 
 #include 
 #include 
+#include 

 #include "eal_internal_cfg.h"
 #include "eal_options.h"
@@ -85,6 +86,7 @@ eal_long_options[] = {
{OPT_XEN_DOM0, 0, 0, OPT_XEN_DOM0_NUM},
{OPT_CREATE_UIO_DEV, 1, NULL, OPT_CREATE_UIO_DEV_NUM},
{OPT_VFIO_INTR, 1, NULL, OPT_VFIO_INTR_NUM},
+   {OPT_LCORES, 1, 0, OPT_LCORES_NUM},
{0, 0, 0, 0}
 };

@@ -255,9 +257,11 @@ eal_parse_corelist(const char *corelist)
if (min == RTE_MAX_LCORE)
min = idx;
for (idx = min; idx <= max; idx++) {
-   cfg->lcore_role[idx] = ROLE_RTE;
-   lcore_config[idx].core_index = count;
-   count++;
+   if (cfg->lcore_role[idx] != ROLE_RTE) {
+   cfg->lcore_role[idx] = ROLE_RTE;
+   lcore_config[idx].core_index = count;
+   count++;
+   }
}
min = RTE_MAX_LCORE;
} else
@@ -292,6 +296,279 @@ eal_parse_master_lcore(const char *arg)
return 0;
 }

+/*
+ * Parse elem, the elem could be single number/range or '(' ')' group
+ * Within group elem, '-' used for a range seperator;
+ *',' used for a single number.
+ */
+static int
+eal_parse_set(const char *input, uint16_t set[], unsigned num)
+{
+   unsigned idx;
+   const char *str = input;
+   char *end = NULL;
+   unsigned min, max;
+
+   memset(set, 0, num * sizeof(uint16_t));
+
+   while (isblank(*str))
+   str++;
+
+   /* only digit or left bracket is qulify for start point */
+   if ((!isdigit(*str) && *str != '(') || *str == '\0')
+   return -1;
+
+   /* process single number or single range of number */
+   if (*str != '(') {
+   errno = 0;
+   idx = strtoul(str, &end, 10);
+   if (errno || end == NULL || idx >= num)
+   return -1;
+   else {
+   while (isblank(*end))
+   end++;
+
+   min = idx;
+   max = idx;
+   if (*end == '-') {
+   /* proccess single - */
+   end++;
+   while (isblank(*end))
+   end++;
+   if (!isdigit(*end))
+   return -1;
+
+   errno = 0;
+   idx = strtoul(end, &end, 10);
+   if (errno || end == NULL || idx >= num)
+   return -1;
+   max = idx;
+   while (isblank(*end))
+   end++;
+   if (*end != ',' && *end != '\0')
+   return -1;
+   }
+
+   if (*end != ',' && *end != '\0' &&
+   *end != '@')
+   return -1;
+
+   for (idx = RTE_MIN(min, max);
+idx <= R

[dpdk-dev] [PATCH v3 03/16] eal: add support parsing socket_id from cpuset

2015-01-29 Thread Cunming Liang
It returns the socket_id if all cpus in the cpuset belong
to the same NUMA node, otherwise it returns SOCKET_ID_ANY.

Signed-off-by: Cunming Liang 
---
 lib/librte_eal/bsdapp/eal/eal_lcore.c   |  7 +
 lib/librte_eal/common/eal_thread.h  | 52 +
 lib/librte_eal/linuxapp/eal/eal_lcore.c |  7 +
 3 files changed, 66 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal_lcore.c 
b/lib/librte_eal/bsdapp/eal/eal_lcore.c
index 72f8ac2..162fb4f 100644
--- a/lib/librte_eal/bsdapp/eal/eal_lcore.c
+++ b/lib/librte_eal/bsdapp/eal/eal_lcore.c
@@ -41,6 +41,7 @@
 #include 

 #include "eal_private.h"
+#include "eal_thread.h"

 /* No topology information available on FreeBSD including NUMA info */
 #define cpu_core_id(X) 0
@@ -112,3 +113,9 @@ rte_eal_cpu_init(void)

return 0;
 }
+
+unsigned
+eal_cpu_socket_id(__rte_unused unsigned cpu_id)
+{
+   return cpu_socket_id(cpu_id);
+}
diff --git a/lib/librte_eal/common/eal_thread.h 
b/lib/librte_eal/common/eal_thread.h
index b53b84d..a25ee86 100644
--- a/lib/librte_eal/common/eal_thread.h
+++ b/lib/librte_eal/common/eal_thread.h
@@ -34,6 +34,10 @@
 #ifndef EAL_THREAD_H
 #define EAL_THREAD_H

+#include 
+
+#include 
+
 /**
  * basic loop of thread, called for each thread by eal_init().
  *
@@ -50,4 +54,52 @@ __attribute__((noreturn)) void *eal_thread_loop(void *arg);
  */
 void eal_thread_init_master(unsigned lcore_id);

+/**
+ * Get the NUMA socket id from cpu id.
+ * This function is private to EAL.
+ *
+ * @param cpu_id
+ *   The logical process id.
+ * @return
+ *   socket_id or SOCKET_ID_ANY
+ */
+unsigned eal_cpu_socket_id(unsigned cpu_id);
+
+/**
+ * Get the NUMA socket id from cpuset.
+ * This function is private to EAL.
+ *
+ * @param cpusetp
+ *   The point to a valid cpu set.
+ * @return
+ *   socket_id or SOCKET_ID_ANY
+ */
+static inline int
+eal_cpuset_socket_id(rte_cpuset_t *cpusetp)
+{
+   unsigned cpu = 0;
+   int socket_id = SOCKET_ID_ANY;
+   int sid;
+
+   if (cpusetp == NULL)
+   return SOCKET_ID_ANY;
+
+   do {
+   if (!CPU_ISSET(cpu, cpusetp))
+   continue;
+
+   if (socket_id == SOCKET_ID_ANY)
+   socket_id = eal_cpu_socket_id(cpu);
+
+   sid = eal_cpu_socket_id(cpu);
+   if (socket_id != sid) {
+   socket_id = SOCKET_ID_ANY;
+   break;
+   }
+
+   } while (++cpu < RTE_MAX_LCORE);
+
+   return socket_id;
+}
+
 #endif /* EAL_THREAD_H */
diff --git a/lib/librte_eal/linuxapp/eal/eal_lcore.c 
b/lib/librte_eal/linuxapp/eal/eal_lcore.c
index 29615f8..922af6d 100644
--- a/lib/librte_eal/linuxapp/eal/eal_lcore.c
+++ b/lib/librte_eal/linuxapp/eal/eal_lcore.c
@@ -45,6 +45,7 @@

 #include "eal_private.h"
 #include "eal_filesystem.h"
+#include "eal_thread.h"

 #define SYS_CPU_DIR "/sys/devices/system/cpu/cpu%u"
 #define CORE_ID_FILE "topology/core_id"
@@ -197,3 +198,9 @@ rte_eal_cpu_init(void)

return 0;
 }
+
+unsigned
+eal_cpu_socket_id(unsigned cpu_id)
+{
+   return cpu_socket_id(cpu_id);
+}
-- 
1.8.1.4



[dpdk-dev] [PATCH v3 04/16] eal: new TLS definition and API declaration

2015-01-29 Thread Cunming Liang
1. add two TLS *_socket_id* and *_cpuset*
2. add two external API rte_thread_set/get_affinity
3. add one internal API eal_thread_dump_affinity

Signed-off-by: Cunming Liang 
---
 lib/librte_eal/bsdapp/eal/eal_thread.c|  2 ++
 lib/librte_eal/common/eal_thread.h| 14 ++
 lib/librte_eal/common/include/rte_lcore.h | 29 +++--
 lib/librte_eal/linuxapp/eal/eal_thread.c  |  2 ++
 4 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_thread.c 
b/lib/librte_eal/bsdapp/eal/eal_thread.c
index ab05368..10220c7 100644
--- a/lib/librte_eal/bsdapp/eal/eal_thread.c
+++ b/lib/librte_eal/bsdapp/eal/eal_thread.c
@@ -56,6 +56,8 @@
 #include "eal_thread.h"

 RTE_DEFINE_PER_LCORE(unsigned, _lcore_id);
+RTE_DEFINE_PER_LCORE(unsigned, _socket_id);
+RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset);

 /*
  * Send a message to a slave lcore identified by slave_id to call a
diff --git a/lib/librte_eal/common/eal_thread.h 
b/lib/librte_eal/common/eal_thread.h
index a25ee86..28edf51 100644
--- a/lib/librte_eal/common/eal_thread.h
+++ b/lib/librte_eal/common/eal_thread.h
@@ -102,4 +102,18 @@ eal_cpuset_socket_id(rte_cpuset_t *cpusetp)
return socket_id;
 }

+/**
+ * Dump the current pthread cpuset.
+ * This function is private to EAL.
+ *
+ * @param str
+ *   The string buffer the cpuset will dump to.
+ * @param size
+ *   The string buffer size.
+ */
+#define CPU_STR_LEN256
+void
+eal_thread_dump_affinity(char str[], unsigned size);
+
+
 #endif /* EAL_THREAD_H */
diff --git a/lib/librte_eal/common/include/rte_lcore.h 
b/lib/librte_eal/common/include/rte_lcore.h
index 4c7d6bb..facdbdc 100644
--- a/lib/librte_eal/common/include/rte_lcore.h
+++ b/lib/librte_eal/common/include/rte_lcore.h
@@ -43,6 +43,7 @@
 #include 
 #include 
 #include 
+#include 

 #ifdef __cplusplus
 extern "C" {
@@ -80,7 +81,9 @@ struct lcore_config {
  */
 extern struct lcore_config lcore_config[RTE_MAX_LCORE];

-RTE_DECLARE_PER_LCORE(unsigned, _lcore_id); /**< Per core "core id". */
+RTE_DECLARE_PER_LCORE(unsigned, _lcore_id);  /**< Per thread "lcore id". */
+RTE_DECLARE_PER_LCORE(unsigned, _socket_id); /**< Per thread "socket id". */
+RTE_DECLARE_PER_LCORE(rte_cpuset_t, _cpuset); /**< Per thread "cpuset". */

 /**
  * Return the ID of the execution unit we are running on.
@@ -146,7 +149,7 @@ rte_lcore_index(int lcore_id)
 static inline unsigned
 rte_socket_id(void)
 {
-   return lcore_config[rte_lcore_id()].socket_id;
+   return RTE_PER_LCORE(_socket_id);
 }

 /**
@@ -229,6 +232,28 @@ rte_get_next_lcore(unsigned i, int skip_master, int wrap)
 i

[dpdk-dev] [PATCH v3 05/16] eal: add eal_common_thread.c for common thread API

2015-01-29 Thread Cunming Liang
The API works for both EAL threads and non-EAL threads.
When calling rte_thread_set_affinity, the *_socket_id* and
*_cpuset* of the calling thread will be updated if the thread
successfully sets the cpu affinity.

Signed-off-by: Cunming Liang 
---
 lib/librte_eal/bsdapp/eal/Makefile|   1 +
 lib/librte_eal/common/eal_common_thread.c | 142 ++
 lib/librte_eal/linuxapp/eal/Makefile  |   2 +
 3 files changed, 145 insertions(+)
 create mode 100644 lib/librte_eal/common/eal_common_thread.c

diff --git a/lib/librte_eal/bsdapp/eal/Makefile 
b/lib/librte_eal/bsdapp/eal/Makefile
index d434882..78406be 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -73,6 +73,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_hexdump.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_devargs.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_dev.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_options.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_thread.c

 CFLAGS_eal.o := -D_GNU_SOURCE
 #CFLAGS_eal_thread.o := -D_GNU_SOURCE
diff --git a/lib/librte_eal/common/eal_common_thread.c 
b/lib/librte_eal/common/eal_common_thread.c
new file mode 100644
index 000..d996690
--- /dev/null
+++ b/lib/librte_eal/common/eal_common_thread.c
@@ -0,0 +1,142 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include "eal_thread.h"
+
+int
+rte_thread_set_affinity(rte_cpuset_t *cpusetp)
+{
+   int s;
+   unsigned lcore_id;
+   pthread_t tid;
+
+   if (!cpusetp)
+   return -1;
+
+   lcore_id = rte_lcore_id();
+   if (lcore_id != (unsigned)LCORE_ID_ANY) {
+   /* EAL thread */
+   tid = lcore_config[lcore_id].thread_id;
+
+   s = pthread_setaffinity_np(tid, sizeof(rte_cpuset_t), cpusetp);
+   if (s != 0) {
+   RTE_LOG(ERR, EAL, "pthread_setaffinity_np failed\n");
+   return -1;
+   }
+
+   /* store socket_id in TLS for quick access */
+   RTE_PER_LCORE(_socket_id) =
+   eal_cpuset_socket_id(cpusetp);
+
+   /* store cpuset in TLS for quick access */
+   rte_memcpy(&RTE_PER_LCORE(_cpuset), cpusetp,
+  sizeof(rte_cpuset_t));
+
+   /* update lcore_config */
+   lcore_config[lcore_id].socket_id = RTE_PER_LCORE(_socket_id);
+   rte_memcpy(&lcore_config[lcore_id].cpuset, cpusetp,
+  sizeof(rte_cpuset_t));
+   } else {
+   /* none EAL thread */
+   tid = pthread_self();
+
+   s = pthread_setaffinity_np(tid, sizeof(rte_cpuset_t), cpusetp);
+   if (s != 0) {
+   RTE_LOG(ERR, EAL, "pthread_setaffinity_np failed\n");
+   return -1;
+   }
+
+   /* store cpuset in TLS for quick access */
+   rte_memcpy(&RTE_PER_LCORE(_cpuset), cpusetp,
+  sizeof(rte_cpuset_t));
+
+   /* store socket_id in TLS for quick access */
+   RTE_PER_LCORE(_socket_id) =
+   

[dpdk-dev] [PATCH v3 06/16] eal: add rte_gettid() to acquire unique system tid

2015-01-29 Thread Cunming Liang
The rte_gettid() function wraps the linux and freebsd syscall gettid().
It provides a persistent, unique thread id for the calling thread.
It saves the unique id in TLS the first time it is called.

Signed-off-by: Cunming Liang 
---
 lib/librte_eal/bsdapp/eal/eal_thread.c   |  9 +
 lib/librte_eal/common/include/rte_eal.h  | 27 +++
 lib/librte_eal/linuxapp/eal/eal_thread.c |  7 +++
 3 files changed, 43 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal_thread.c 
b/lib/librte_eal/bsdapp/eal/eal_thread.c
index 10220c7..d0c077b 100644
--- a/lib/librte_eal/bsdapp/eal/eal_thread.c
+++ b/lib/librte_eal/bsdapp/eal/eal_thread.c
@@ -39,6 +39,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -233,3 +234,11 @@ eal_thread_loop(__attribute__((unused)) void *arg)
/* pthread_exit(NULL); */
/* return NULL; */
 }
+
+/* require calling thread tid by gettid() */
+int rte_sys_gettid(void)
+{
+   long lwpid;
+   thr_self(&lwpid);
+   return (int)lwpid;
+}
diff --git a/lib/librte_eal/common/include/rte_eal.h 
b/lib/librte_eal/common/include/rte_eal.h
index f4ecd2e..8ccdd65 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -41,6 +41,9 @@
  */

 #include 
+#include 
+
+#include 

 #ifdef __cplusplus
 extern "C" {
@@ -262,6 +265,30 @@ rte_set_application_usage_hook( rte_usage_hook_t 
usage_func );
  */
 int rte_eal_has_hugepages(void);

+/**
+ * A wrap API for syscall gettid.
+ *
+ * @return
+ *   On success, returns the thread ID of calling process.
+ *   It always successful.
+ */
+int rte_sys_gettid(void);
+
+/**
+ * Get system unique thread id.
+ *
+ * @return
+ *   On success, returns the thread ID of calling process.
+ *   It always successful.
+ */
+static inline int rte_gettid(void)
+{
+   static RTE_DEFINE_PER_LCORE(int, _thread_id) = -1;
+   if (RTE_PER_LCORE(_thread_id) == -1)
+   RTE_PER_LCORE(_thread_id) = rte_sys_gettid();
+   return RTE_PER_LCORE(_thread_id);
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/linuxapp/eal/eal_thread.c 
b/lib/librte_eal/linuxapp/eal/eal_thread.c
index 748a83a..ed20c93 100644
--- a/lib/librte_eal/linuxapp/eal/eal_thread.c
+++ b/lib/librte_eal/linuxapp/eal/eal_thread.c
@@ -39,6 +39,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -233,3 +234,9 @@ eal_thread_loop(__attribute__((unused)) void *arg)
/* pthread_exit(NULL); */
/* return NULL; */
 }
+
+/* require calling thread tid by gettid() */
+int rte_sys_gettid(void)
+{
+   return (int)syscall(SYS_gettid);
+}
-- 
1.8.1.4



[dpdk-dev] [PATCH v3 08/16] enic: fix re-define freebsd compile complain

2015-01-29 Thread Cunming Liang
Some macros have already been defined by the freebsd 'sys/param.h' header.

Signed-off-by: Cunming Liang 
---
 lib/librte_pmd_enic/enic.h| 1 +
 lib/librte_pmd_enic/enic_compat.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/lib/librte_pmd_enic/enic.h b/lib/librte_pmd_enic/enic.h
index c43417c..189c3b9 100644
--- a/lib/librte_pmd_enic/enic.h
+++ b/lib/librte_pmd_enic/enic.h
@@ -66,6 +66,7 @@
 #define ENIC_CALC_IP_CKSUM  1
 #define ENIC_CALC_TCP_UDP_CKSUM 2
 #define ENIC_MAX_MTU9000
+#undef PAGE_SIZE
 #define PAGE_SIZE   4096
 #define PAGE_ROUND_UP(x) \
unsigned long)(x)) + PAGE_SIZE-1) & (~(PAGE_SIZE-1)))
diff --git a/lib/librte_pmd_enic/enic_compat.h 
b/lib/librte_pmd_enic/enic_compat.h
index b1af838..b84c766 100644
--- a/lib/librte_pmd_enic/enic_compat.h
+++ b/lib/librte_pmd_enic/enic_compat.h
@@ -67,6 +67,7 @@
 #define pr_warn(y, args...) dev_warning(0, y, ##args)
 #define BUG() pr_err("BUG at %s:%d", __func__, __LINE__)

+#undef ALIGN
 #define ALIGN(x, a)  __ALIGN_MASK(x, (typeof(x))(a)-1)
 #define __ALIGN_MASK(x, mask)(((x)+(mask))&~(mask))
 #define udelay usleep
-- 
1.8.1.4



[dpdk-dev] [PATCH v3 07/16] eal: apply affinity of EAL thread by assigned cpuset

2015-01-29 Thread Cunming Liang
EAL threads use the assigned cpuset to set core affinity during startup.
It keeps the 1:1 mapping if no '--lcores' option is used.

Signed-off-by: Cunming Liang 
---
 lib/librte_eal/bsdapp/eal/eal.c  | 13 ---
 lib/librte_eal/bsdapp/eal/eal_thread.c   | 63 +-
 lib/librte_eal/linuxapp/eal/eal.c|  7 +++-
 lib/librte_eal/linuxapp/eal/eal_thread.c | 67 +++-
 4 files changed, 54 insertions(+), 96 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 69f3c03..98c5a83 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -432,6 +432,7 @@ rte_eal_init(int argc, char **argv)
int i, fctret, ret;
pthread_t thread_id;
static rte_atomic32_t run_once = RTE_ATOMIC32_INIT(0);
+   char cpuset[CPU_STR_LEN];

if (!rte_atomic32_test_and_set(&run_once))
return -1;
@@ -502,13 +503,17 @@ rte_eal_init(int argc, char **argv)
if (rte_eal_pci_init() < 0)
rte_panic("Cannot init PCI\n");

-   RTE_LOG(DEBUG, EAL, "Master core %u is ready (tid=%p)\n",
-   rte_config.master_lcore, thread_id);
-
eal_check_mem_on_local_socket();

rte_eal_mcfg_complete();

+   eal_thread_init_master(rte_config.master_lcore);
+
+   eal_thread_dump_affinity(cpuset, CPU_STR_LEN);
+
+   RTE_LOG(DEBUG, EAL, "Master lcore %u is ready (tid=%p;cpuset=[%s])\n",
+   rte_config.master_lcore, thread_id, cpuset);
+
if (rte_eal_dev_init() < 0)
rte_panic("Cannot init pmd devices\n");

@@ -532,8 +537,6 @@ rte_eal_init(int argc, char **argv)
rte_panic("Cannot create thread\n");
}

-   eal_thread_init_master(rte_config.master_lcore);
-
/*
 * Launch a dummy function on all slave lcores, so that master lcore
 * knows they are all ready when this function returns.
diff --git a/lib/librte_eal/bsdapp/eal/eal_thread.c 
b/lib/librte_eal/bsdapp/eal/eal_thread.c
index d0c077b..5b16302 100644
--- a/lib/librte_eal/bsdapp/eal/eal_thread.c
+++ b/lib/librte_eal/bsdapp/eal/eal_thread.c
@@ -103,55 +103,27 @@ eal_thread_set_affinity(void)
 {
int s;
pthread_t thread;
-
-/*
- * According to the section VERSIONS of the CPU_ALLOC man page:
- *
- * The CPU_ZERO(), CPU_SET(), CPU_CLR(), and CPU_ISSET() macros were added
- * in glibc 2.3.3.
- *
- * CPU_COUNT() first appeared in glibc 2.6.
- *
- * CPU_AND(), CPU_OR(), CPU_XOR(),CPU_EQUAL(),CPU_ALLOC(),
- * CPU_ALLOC_SIZE(), CPU_FREE(), CPU_ZERO_S(),  CPU_SET_S(),  CPU_CLR_S(),
- * CPU_ISSET_S(),  CPU_AND_S(), CPU_OR_S(), CPU_XOR_S(), and CPU_EQUAL_S()
- * first appeared in glibc 2.7.
- */
-#if defined(CPU_ALLOC)
-   size_t size;
-   cpu_set_t *cpusetp;
-
-   cpusetp = CPU_ALLOC(RTE_MAX_LCORE);
-   if (cpusetp == NULL) {
-   RTE_LOG(ERR, EAL, "CPU_ALLOC failed\n");
-   return -1;
-   }
-
-   size = CPU_ALLOC_SIZE(RTE_MAX_LCORE);
-   CPU_ZERO_S(size, cpusetp);
-   CPU_SET_S(rte_lcore_id(), size, cpusetp);
+   unsigned lcore_id = rte_lcore_id();

thread = pthread_self();
-   s = pthread_setaffinity_np(thread, size, cpusetp);
+   s = pthread_setaffinity_np(thread, sizeof(cpuset_t),
+  &lcore_config[lcore_id].cpuset);
if (s != 0) {
RTE_LOG(ERR, EAL, "pthread_setaffinity_np failed\n");
-   CPU_FREE(cpusetp);
return -1;
}

-   CPU_FREE(cpusetp);
-#else /* CPU_ALLOC */
-   cpuset_t cpuset;
-   CPU_ZERO( &cpuset );
-   CPU_SET( rte_lcore_id(), &cpuset );
+   /* acquire system unique id  */
+   rte_gettid();
+
+   /* store socket_id in TLS for quick access */
+   RTE_PER_LCORE(_socket_id) =
+   eal_cpuset_socket_id(&lcore_config[lcore_id].cpuset);
+
+   CPU_COPY(&lcore_config[lcore_id].cpuset, &RTE_PER_LCORE(_cpuset));
+
+   lcore_config[lcore_id].socket_id = RTE_PER_LCORE(_socket_id);

-   thread = pthread_self();
-   s = pthread_setaffinity_np(thread, sizeof( cpuset ), &cpuset);
-   if (s != 0) {
-   RTE_LOG(ERR, EAL, "pthread_setaffinity_np failed\n");
-   return -1;
-   }
-#endif
return 0;
 }

@@ -174,6 +146,7 @@ eal_thread_loop(__attribute__((unused)) void *arg)
unsigned lcore_id;
pthread_t thread_id;
int m2s, s2m;
+   char cpuset[CPU_STR_LEN];

thread_id = pthread_self();

@@ -185,9 +158,6 @@ eal_thread_loop(__attribute__((unused)) void *arg)
if (lcore_id == RTE_MAX_LCORE)
rte_panic("cannot retrieve lcore id\n");

-   RTE_LOG(DEBUG, EAL, "Core %u is ready (tid=%p)\n",
-   lcore_id, thread_id);
-
m2s = lcore_config[lcore_id].pipe_master2slave[0];
s2m = lcore_config[lcore_id].pipe_slave2master[1];

@@ -198,6 +168,11 

[dpdk-dev] [PATCH v3 09/16] malloc: fix the issue of SOCKET_ID_ANY

2015-01-29 Thread Cunming Liang
Add a check for rte_socket_id() to avoid getting an unexpected return value like (-1).

Signed-off-by: Cunming Liang 
---
 lib/librte_malloc/malloc_heap.h | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/lib/librte_malloc/malloc_heap.h b/lib/librte_malloc/malloc_heap.h
index b4aec45..a47136d 100644
--- a/lib/librte_malloc/malloc_heap.h
+++ b/lib/librte_malloc/malloc_heap.h
@@ -44,7 +44,12 @@ extern "C" {
 static inline unsigned
 malloc_get_numa_socket(void)
 {
-   return rte_socket_id();
+   unsigned socket_id = rte_socket_id();
+
+   if (socket_id == (unsigned)SOCKET_ID_ANY)
+   return 0;
+
+   return socket_id;
 }

 void *
-- 
1.8.1.4



[dpdk-dev] [PATCH v3 10/16] log: fix the gap to support non-EAL thread

2015-01-29 Thread Cunming Liang
For non-EAL threads, *_lcore_id* is invalid and probably larger than
RTE_MAX_LCORE.
The patch adds a check so that only EAL threads use the EAL per-thread log
level and log type.
Other threads share the global log level.

Signed-off-by: Cunming Liang 
---
 lib/librte_eal/common/eal_common_log.c  | 17 +++--
 lib/librte_eal/common/include/rte_log.h |  5 +
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_log.c 
b/lib/librte_eal/common/eal_common_log.c
index cf57619..e8dc94a 100644
--- a/lib/librte_eal/common/eal_common_log.c
+++ b/lib/librte_eal/common/eal_common_log.c
@@ -193,11 +193,20 @@ rte_set_log_type(uint32_t type, int enable)
rte_logs.type &= (~type);
 }

+/* Get global log type */
+uint32_t
+rte_get_log_type(void)
+{
+   return rte_logs.type;
+}
+
 /* get the current loglevel for the message beeing processed */
 int rte_log_cur_msg_loglevel(void)
 {
unsigned lcore_id;
lcore_id = rte_lcore_id();
+   if (lcore_id >= RTE_MAX_LCORE)
+   return rte_get_log_level();
return log_cur_msg[lcore_id].loglevel;
 }

@@ -206,6 +215,8 @@ int rte_log_cur_msg_logtype(void)
 {
unsigned lcore_id;
lcore_id = rte_lcore_id();
+   if (lcore_id >= RTE_MAX_LCORE)
+   return rte_get_log_type();
return log_cur_msg[lcore_id].logtype;
 }

@@ -265,8 +276,10 @@ rte_vlog(__attribute__((unused)) uint32_t level,

/* save loglevel and logtype in a global per-lcore variable */
lcore_id = rte_lcore_id();
-   log_cur_msg[lcore_id].loglevel = level;
-   log_cur_msg[lcore_id].logtype = logtype;
+   if (lcore_id < RTE_MAX_LCORE) {
+   log_cur_msg[lcore_id].loglevel = level;
+   log_cur_msg[lcore_id].logtype = logtype;
+   }

ret = vfprintf(f, format, ap);
fflush(f);
diff --git a/lib/librte_eal/common/include/rte_log.h 
b/lib/librte_eal/common/include/rte_log.h
index db1ea08..f83a0d9 100644
--- a/lib/librte_eal/common/include/rte_log.h
+++ b/lib/librte_eal/common/include/rte_log.h
@@ -144,6 +144,11 @@ uint32_t rte_get_log_level(void);
 void rte_set_log_type(uint32_t type, int enable);

 /**
+ * Get the global log type.
+ */
+uint32_t rte_get_log_type(void);
+
+/**
  * Get the current loglevel for the message being processed.
  *
  * Before calling the user-defined stream for logging, the log
-- 
1.8.1.4



[dpdk-dev] [PATCH v3 13/16] mempool: add support to non-EAL thread

2015-01-29 Thread Cunming Liang
For a non-EAL thread, bypass the per-lcore cache and use the ring pool directly.
This allows rte_mempool to be used from either an EAL thread or any user pthread.
In a non-EAL thread it relies directly on rte_ring, which is non-preemptive.
It is not recommended to run multiple pthreads/CPUs that compete for the same
rte_mempool: performance will be bad, and there is a critical risk if the
scheduling policy is real-time (RT).

Signed-off-by: Cunming Liang 
---
 lib/librte_mempool/rte_mempool.h | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 3314651..4845f27 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -198,10 +198,12 @@ struct rte_mempool {
  *   Number to add to the object-oriented statistics.
  */
 #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
-#define __MEMPOOL_STAT_ADD(mp, name, n) do {   \
-   unsigned __lcore_id = rte_lcore_id();   \
-   mp->stats[__lcore_id].name##_objs += n; \
-   mp->stats[__lcore_id].name##_bulk += 1; \
+#define __MEMPOOL_STAT_ADD(mp, name, n) do {\
+   unsigned __lcore_id = rte_lcore_id();   \
+   if (__lcore_id < RTE_MAX_LCORE) {   \
+   mp->stats[__lcore_id].name##_objs += n; \
+   mp->stats[__lcore_id].name##_bulk += 1; \
+   }   \
} while(0)
 #else
 #define __MEMPOOL_STAT_ADD(mp, name, n) do {} while(0)
@@ -767,8 +769,9 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const 
*obj_table,
__MEMPOOL_STAT_ADD(mp, put, n);

 #if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
-   /* cache is not enabled or single producer */
-   if (unlikely(cache_size == 0 || is_mp == 0))
+   /* cache is not enabled or single producer or none EAL thread */
+   if (unlikely(cache_size == 0 || is_mp == 0 ||
+lcore_id >= RTE_MAX_LCORE))
goto ring_enqueue;

/* Go straight to ring if put would overflow mem allocated for cache */
@@ -952,7 +955,8 @@ __mempool_get_bulk(struct rte_mempool *mp, void **obj_table,
uint32_t cache_size = mp->cache_size;

/* cache is not enabled or single consumer */
-   if (unlikely(cache_size == 0 || is_mc == 0 || n >= cache_size))
+   if (unlikely(cache_size == 0 || is_mc == 0 ||
+n >= cache_size || lcore_id >= RTE_MAX_LCORE))
goto ring_dequeue;

cache = &mp->local_cache[lcore_id];
-- 
1.8.1.4



[dpdk-dev] [PATCH v3 14/16] ring: add support to non-EAL thread

2015-01-29 Thread Cunming Liang
The ring debug stats do not handle non-EAL threads.

Signed-off-by: Cunming Liang 
---
 lib/librte_ring/rte_ring.h | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 7cd5f2d..39bacdd 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -188,10 +188,12 @@ struct rte_ring {
  *   The number to add to the object-oriented statistics.
  */
 #ifdef RTE_LIBRTE_RING_DEBUG
-#define __RING_STAT_ADD(r, name, n) do {   \
-   unsigned __lcore_id = rte_lcore_id();   \
-   r->stats[__lcore_id].name##_objs += n;  \
-   r->stats[__lcore_id].name##_bulk += 1;  \
+#define __RING_STAT_ADD(r, name, n) do {\
+   unsigned __lcore_id = rte_lcore_id();   \
+   if (__lcore_id < RTE_MAX_LCORE) {   \
+   r->stats[__lcore_id].name##_objs += n;  \
+   r->stats[__lcore_id].name##_bulk += 1;  \
+   }   \
} while(0)
 #else
 #define __RING_STAT_ADD(r, name, n) do {} while(0)
-- 
1.8.1.4



[dpdk-dev] [PATCH v3 15/16] ring: add sched_yield to avoid spin forever

2015-01-29 Thread Cunming Liang
It does a gentle yield after spinning for a while.
This reduces the cycles wasted on spinning when preemption happens.

Signed-off-by: Cunming Liang 
---
 lib/librte_ring/rte_ring.h | 25 +++--
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 39bacdd..c16da6e 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -126,6 +126,7 @@ struct rte_ring_debug_stats {

 #define RTE_RING_NAMESIZE 32 /**< The maximum length of a ring name. */
 #define RTE_RING_MZ_PREFIX "RG_"
+#define RTE_RING_PAUSE_REP 0x100  /**< yield after num of times pause. */

 /**
  * An RTE ring structure.
@@ -410,7 +411,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const 
*obj_table,
uint32_t cons_tail, free_entries;
const unsigned max = n;
int success;
-   unsigned i;
+   unsigned i, rep;
uint32_t mask = r->prod.mask;
int ret;

@@ -468,8 +469,14 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const 
*obj_table,
 * If there are other enqueues in progress that preceded us,
 * we need to wait for them to complete
 */
-   while (unlikely(r->prod.tail != prod_head))
-   rte_pause();
+   do {
+   for (rep = RTE_RING_PAUSE_REP;
+rep != 0 && r->prod.tail != prod_head; rep--)
+   rte_pause();
+
+   if (rep == 0)
+   sched_yield();
+   }while(rep == 0);

r->prod.tail = prod_next;
return ret;
@@ -589,7 +596,7 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void 
**obj_table,
uint32_t cons_next, entries;
const unsigned max = n;
int success;
-   unsigned i;
+   unsigned i, rep;
uint32_t mask = r->prod.mask;

/* move cons.head atomically */
@@ -634,8 +641,14 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void 
**obj_table,
 * If there are other dequeues in progress that preceded us,
 * we need to wait for them to complete
 */
-   while (unlikely(r->cons.tail != cons_head))
-   rte_pause();
+   do {
+   for (rep = RTE_RING_PAUSE_REP;
+rep != 0 && r->cons.tail != cons_head; rep--)
+   rte_pause();
+
+   if (rep == 0)
+   sched_yield();
+   }while(rep == 0);

__RING_STAT_ADD(r, deq_success, n);
r->cons.tail = cons_next;
-- 
1.8.1.4



[dpdk-dev] [PATCH v3 16/16] timer: add support to non-EAL thread

2015-01-29 Thread Cunming Liang
Allow timers to be set up only for EAL (lcore) threads (__lcore_id < MAX_LCORE_ID).
E.g. a dynamically created thread will be able to reset/stop a timer for an lcore
thread, but it will not be allowed to set up a timer for itself or for another
non-lcore thread.
rte_timer_manage() called from a non-lcore thread simply does nothing and returns
straight away.

Signed-off-by: Cunming Liang 
---
 lib/librte_timer/rte_timer.c | 40 +++-
 lib/librte_timer/rte_timer.h |  2 +-
 2 files changed, 32 insertions(+), 10 deletions(-)

diff --git a/lib/librte_timer/rte_timer.c b/lib/librte_timer/rte_timer.c
index 269a992..601c159 100644
--- a/lib/librte_timer/rte_timer.c
+++ b/lib/librte_timer/rte_timer.c
@@ -79,9 +79,10 @@ static struct priv_timer priv_timer[RTE_MAX_LCORE];

 /* when debug is enabled, store some statistics */
 #ifdef RTE_LIBRTE_TIMER_DEBUG
-#define __TIMER_STAT_ADD(name, n) do { \
-   unsigned __lcore_id = rte_lcore_id();   \
-   priv_timer[__lcore_id].stats.name += (n);   \
+#define __TIMER_STAT_ADD(name, n) do { \
+   unsigned __lcore_id = rte_lcore_id();   \
+   if (__lcore_id < RTE_MAX_LCORE) \
+   priv_timer[__lcore_id].stats.name += (n);   \
} while(0)
 #else
 #define __TIMER_STAT_ADD(name, n) do {} while(0)
@@ -127,15 +128,26 @@ timer_set_config_state(struct rte_timer *tim,
unsigned lcore_id;

lcore_id = rte_lcore_id();
+   if (lcore_id >= RTE_MAX_LCORE)
+   lcore_id = LCORE_ID_ANY;

/* wait that the timer is in correct status before update,
 * and mark it as being configured */
while (success == 0) {
prev_status.u32 = tim->status.u32;

+   /*
+* prevent race condition of non-EAL threads
+* to update the timer. When 'owner == LCORE_ID_ANY',
+* it means updated by a non-EAL thread.
+*/
+   if (lcore_id == (unsigned)LCORE_ID_ANY &&
+   (uint16_t)lcore_id == prev_status.owner)
+   return -1;
+
/* timer is running on another core, exit */
if (prev_status.state == RTE_TIMER_RUNNING &&
-   (unsigned)prev_status.owner != lcore_id)
+   prev_status.owner != (uint16_t)lcore_id)
return -1;

/* timer is being configured on another core */
@@ -366,9 +378,13 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,

/* round robin for tim_lcore */
if (tim_lcore == (unsigned)LCORE_ID_ANY) {
-   tim_lcore = rte_get_next_lcore(priv_timer[lcore_id].prev_lcore,
-  0, 1);
-   priv_timer[lcore_id].prev_lcore = tim_lcore;
+   if (lcore_id < RTE_MAX_LCORE) {
+   tim_lcore = rte_get_next_lcore(
+   priv_timer[lcore_id].prev_lcore,
+   0, 1);
+   priv_timer[lcore_id].prev_lcore = tim_lcore;
+   } else
+   tim_lcore = rte_get_next_lcore(LCORE_ID_ANY, 0, 1);
}

/* wait that the timer is in correct status before update,
@@ -378,7 +394,8 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
return -1;

__TIMER_STAT_ADD(reset, 1);
-   if (prev_status.state == RTE_TIMER_RUNNING) {
+   if (prev_status.state == RTE_TIMER_RUNNING &&
+   lcore_id < RTE_MAX_LCORE) {
priv_timer[lcore_id].updated = 1;
}

@@ -455,7 +472,8 @@ rte_timer_stop(struct rte_timer *tim)
return -1;

__TIMER_STAT_ADD(stop, 1);
-   if (prev_status.state == RTE_TIMER_RUNNING) {
+   if (prev_status.state == RTE_TIMER_RUNNING &&
+   lcore_id < RTE_MAX_LCORE) {
priv_timer[lcore_id].updated = 1;
}

@@ -499,6 +517,10 @@ void rte_timer_manage(void)
uint64_t cur_time;
int i, ret;

+   /* timer manager only runs on EAL thread */
+   if (lcore_id >= RTE_MAX_LCORE)
+   return;
+
__TIMER_STAT_ADD(manage, 1);
/* optimize for the case where per-cpu list is empty */
if (priv_timer[lcore_id].pending_head.sl_next[0] == NULL)
diff --git a/lib/librte_timer/rte_timer.h b/lib/librte_timer/rte_timer.h
index 4907cf5..5c5df91 100644
--- a/lib/librte_timer/rte_timer.h
+++ b/lib/librte_timer/rte_timer.h
@@ -76,7 +76,7 @@ extern "C" {
 #define RTE_TIMER_RUNNING 2 /**< State: timer function is running. */
 #define RTE_TIMER_CONFIG  3 /**< State: timer is being configured. */

-#define RTE_TIMER_NO_OWNER -1 /**< Timer has no owner. */
+#define RTE_TIMER_NO_OWNER -2 /**< Timer has no owner. */

 /**
  * Timer type: Periodic or single (one-shot).
-- 
1.8.1.4



[dpdk-dev] [PATCH v3 11/16] eal: set _lcore_id and _socket_id to (-1) by default

2015-01-29 Thread Cunming Liang
For non-EAL threads, *_lcore_id* shall always be LCORE_ID_ANY.
Libraries using *_lcore_id* as an index need to take care.
*_socket_id* is always SOCKET_ID_ANY until the thread changes its affinity
via rte_thread_set_affinity().

Signed-off-by: Cunming Liang 
---
 lib/librte_eal/bsdapp/eal/eal_thread.c   | 4 ++--
 lib/librte_eal/linuxapp/eal/eal_thread.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_thread.c 
b/lib/librte_eal/bsdapp/eal/eal_thread.c
index 5b16302..2b3c9a8 100644
--- a/lib/librte_eal/bsdapp/eal/eal_thread.c
+++ b/lib/librte_eal/bsdapp/eal/eal_thread.c
@@ -56,8 +56,8 @@
 #include "eal_private.h"
 #include "eal_thread.h"

-RTE_DEFINE_PER_LCORE(unsigned, _lcore_id);
-RTE_DEFINE_PER_LCORE(unsigned, _socket_id);
+RTE_DEFINE_PER_LCORE(unsigned, _lcore_id) = (unsigned)LCORE_ID_ANY;
+RTE_DEFINE_PER_LCORE(unsigned, _socket_id) = (unsigned)SOCKET_ID_ANY;
 RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset);

 /*
diff --git a/lib/librte_eal/linuxapp/eal/eal_thread.c 
b/lib/librte_eal/linuxapp/eal/eal_thread.c
index 6eb1525..ab94e20 100644
--- a/lib/librte_eal/linuxapp/eal/eal_thread.c
+++ b/lib/librte_eal/linuxapp/eal/eal_thread.c
@@ -57,8 +57,8 @@
 #include "eal_private.h"
 #include "eal_thread.h"

-RTE_DEFINE_PER_LCORE(unsigned, _lcore_id);
-RTE_DEFINE_PER_LCORE(unsigned, _socket_id);
+RTE_DEFINE_PER_LCORE(unsigned, _lcore_id) = (unsigned)LCORE_ID_ANY;
+RTE_DEFINE_PER_LCORE(unsigned, _socket_id) = (unsigned)SOCKET_ID_ANY;
 RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset);

 /*
-- 
1.8.1.4



[dpdk-dev] [PATCH v3 12/16] eal: fix recursive spinlock in non-EAL thread

2015-01-29 Thread Cunming Liang
In a non-EAL thread, lcore_id is always LCORE_ID_ANY.
It cannot be used as a unique id for the recursive spinlock,
so use rte_gettid() to replace it.

Signed-off-by: Cunming Liang 
---
 lib/librte_eal/common/include/generic/rte_spinlock.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/common/include/generic/rte_spinlock.h 
b/lib/librte_eal/common/include/generic/rte_spinlock.h
index dea885c..c7fb0df 100644
--- a/lib/librte_eal/common/include/generic/rte_spinlock.h
+++ b/lib/librte_eal/common/include/generic/rte_spinlock.h
@@ -179,7 +179,7 @@ static inline void 
rte_spinlock_recursive_init(rte_spinlock_recursive_t *slr)
  */
 static inline void rte_spinlock_recursive_lock(rte_spinlock_recursive_t *slr)
 {
-   int id = rte_lcore_id();
+   int id = rte_gettid();

if (slr->user != id) {
rte_spinlock_lock(&slr->sl);
@@ -212,7 +212,7 @@ static inline void 
rte_spinlock_recursive_unlock(rte_spinlock_recursive_t *slr)
  */
 static inline int rte_spinlock_recursive_trylock(rte_spinlock_recursive_t *slr)
 {
-   int id = rte_lcore_id();
+   int id = rte_gettid();

if (slr->user != id) {
if (rte_spinlock_trylock(&slr->sl) == 0)
-- 
1.8.1.4



[dpdk-dev] [PATCH 0/2] enable SRIOV switch in i40e driver

2015-01-29 Thread Jingjing Wu
Enable SRIOV switch in i40e driver. With this patch set, SRIOV switch
can be done on Fortville NICs.

Jingjing Wu (2):
  i40e: fix the bug when configuring vsi
  i40e: enable internal switch of pf

 lib/librte_pmd_i40e/i40e_ethdev.c | 38 +-
 1 file changed, 37 insertions(+), 1 deletion(-)

-- 
1.9.3



[dpdk-dev] [PATCH 1/2] i40e: fix the bug when configuring vsi

2015-01-29 Thread Jingjing Wu
In i40e_vsi_config_tc_queue_mapping, a flag indicating another valid
setting should be added with an OR operation, not assigned directly to
valid_sections, otherwise it will overwrite the flags set before.

Signed-off-by: Jingjing Wu 
---
 lib/librte_pmd_i40e/i40e_ethdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c 
b/lib/librte_pmd_i40e/i40e_ethdev.c
index b47a3d2..fe758c2 100644
--- a/lib/librte_pmd_i40e/i40e_ethdev.c
+++ b/lib/librte_pmd_i40e/i40e_ethdev.c
@@ -2632,7 +2632,7 @@ i40e_vsi_config_tc_queue_mapping(struct i40e_vsi *vsi,
rte_cpu_to_le_16(I40E_AQ_VSI_QUE_MAP_CONTIG);
info->queue_mapping[0] = rte_cpu_to_le_16(vsi->base_queue);
}
-   info->valid_sections =
+   info->valid_sections |=
rte_cpu_to_le_16(I40E_AQ_VSI_PROP_QUEUE_MAP_VALID);

return I40E_SUCCESS;
-- 
1.9.3



[dpdk-dev] [PATCH 2/2] i40e: enable internal switch of pf

2015-01-29 Thread Jingjing Wu
This patch enables PF's internal switch by setting ALLOWLOOPBACK
flag when VEB is created. With this patch, traffic from PF can be
switched on the VEB.

Signed-off-by: Jingjing Wu 
---
 lib/librte_pmd_i40e/i40e_ethdev.c | 36 
 1 file changed, 36 insertions(+)

diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c 
b/lib/librte_pmd_i40e/i40e_ethdev.c
index fe758c2..94fd36c 100644
--- a/lib/librte_pmd_i40e/i40e_ethdev.c
+++ b/lib/librte_pmd_i40e/i40e_ethdev.c
@@ -2854,6 +2854,40 @@ i40e_vsi_dump_bw_config(struct i40e_vsi *vsi)
return 0;
 }

+/*
+ * i40e_enable_pf_lb
+ * @pf: pointer to the pf structure
+ *
+ * allow loopback on pf
+ */
+static inline void
+i40e_enable_pf_lb(struct i40e_pf *pf)
+{
+   struct i40e_hw *hw = I40E_PF_TO_HW(pf);
+   struct i40e_vsi_context ctxt;
+   int ret;
+
+   memset(&ctxt, 0, sizeof(ctxt));
+   ctxt.seid = pf->main_vsi_seid;
+   ctxt.pf_num = hw->pf_id;
+   ret = i40e_aq_get_vsi_params(hw, &ctxt, NULL);
+   if (ret) {
+   PMD_DRV_LOG(ERR, "couldn't get pf vsi config, err %d, aq_err 
%d",
+   ret, hw->aq.asq_last_status);
+   return;
+   }
+   ctxt.flags = I40E_AQ_VSI_TYPE_PF;
+   ctxt.info.valid_sections =
+   rte_cpu_to_le_16(I40E_AQ_VSI_PROP_SWITCH_VALID);
+   ctxt.info.switch_id |=
+   rte_cpu_to_le_16(I40E_AQ_VSI_SW_ID_FLAG_ALLOW_LB);
+
+   ret = i40e_aq_update_vsi_params(hw, &ctxt, NULL);
+   if (ret)
+   PMD_DRV_LOG(ERR, "update vsi switch failed, aq_err=%d\n",
+   hw->aq.asq_last_status);
+}
+
 /* Setup a VSI */
 struct i40e_vsi *
 i40e_vsi_setup(struct i40e_pf *pf,
@@ -2889,6 +2923,8 @@ i40e_vsi_setup(struct i40e_pf *pf,
PMD_DRV_LOG(ERR, "VEB setup failed");
return NULL;
}
+   /* set ALLOWLOOPBACk on pf, when veb is created */
+   i40e_enable_pf_lb(pf);
}

vsi = rte_zmalloc("i40e_vsi", sizeof(struct i40e_vsi), 0);
-- 
1.9.3



[dpdk-dev] [PATCH 0/4] DPDK memcpy optimization

2015-01-29 Thread Wang, Zhihong


> -Original Message-
> From: EDMISON, Kelvin (Kelvin) [mailto:kelvin.edmison at alcatel-lucent.com]
> Sent: Thursday, January 29, 2015 5:48 AM
> To: Wang, Zhihong; Stephen Hemminger; Neil Horman
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
> 
> 
> On 2015-01-27, 3:22 AM, "Wang, Zhihong"  wrote:
> 
> >
> >
> >> -Original Message-
> >> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of EDMISON,
> Kelvin
> >> (Kelvin)
> >> Sent: Friday, January 23, 2015 2:22 AM
> >> To: dev at dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
> >>
> >>
> >>
> >> On 2015-01-21, 3:54 PM, "Neil Horman" 
> wrote:
> >>
> >> >On Wed, Jan 21, 2015 at 11:49:47AM -0800, Stephen Hemminger wrote:
> >> >> On Wed, 21 Jan 2015 13:26:20 + Bruce Richardson
> >> >>  wrote:
> >> >>
> [..trim...]
> >> >> One issue I have is that as a vendor we need to ship on binary,
> >> >>not different distributions  for each Intel chip variant. There is
> >> >>some support for multi-chip version functions  but only in latest
> >> >>Gcc which isn't in Debian stable. And the
> >>multi-chip
> >> >>version
> >> >> of functions is going to be more expensive than inlining. For some
> >> >>cases, I have  seen that the overhead of fancy instructions looks
> >> >>good but have
> >>nasty
> >> >>side effects
> >> >> like CPU stall and/or increased power consumption which turns of
> >>turbo
> >> >>boost.
> >> >>
> >> >>
> >> >> Distro's in general have the same problem with special case
> >> >>optimizations.
> >> >>
> >> >What we really need is to do something like borrow the alternatives
> >> >mechanism from the kernel so that we can dynamically replace
> >> >instructions at run time based on cpu flags.  That way we could make
> >> >the choice at run time, and wouldn't have to do alot of special case
> >> >jumping about.
> >> >Neil
> >>
> >> +1.
> >>
> >> I think it should be an anti-requirement that the build machine be
> >> the exact same chip as the deployment platform.
> >>
> >> I like the cpu flag inspection approach.  It would help in the case
> >>where  DPDK is in a VM and an odd set of CPU flags have been exposed.
> >>
> >> If that approach doesn't work though, then perhaps DPDK memcpy could
> >>go  through a benchmarking at app startup time and select the most
> >>performant  option out of a set, like mdraid's raid6 implementation
> >>does.  To give an  example, this is what my systems print out at boot
> >>time re: raid6  algorithm selection.
> >> raid6: sse2x13171 MB/s
> >> raid6: sse2x23925 MB/s
> >> raid6: sse2x44523 MB/s
> >> raid6: using algorithm sse2x4 (4523 MB/s)
> >>
> >> Regards,
> >>Kelvin
> >>
> >
> >Thanks for the proposal!
> >
> >For DPDK, performance is always the most important concern. We need to
> >utilize new architecture features to achieve that, so solution per arch
> >is necessary.
> >Even a few extra cycles can lead to bad performance if they're in a hot
> >loop.
> >For instance, let's assume DPDK takes 60 cycles to process a packet on
> >average, then 3 more cycles here means 5% performance drop.
> >
> >The dynamic solution is doable but with performance penalties, even if
> >it could be small. Also it may bring extra complexity, which can lead
> >to unpredictable behaviors and side effects.
> >For example, the dynamic solution won't have inline unrolling, which
> >can bring significant performance benefit for small copies with
> >constant length, like eth_addr.
> >
> >We can investigate the VM scenario more.
> >
> >Zhihong (John)
> 
> John,
> 
>   Thanks for taking the time to answer my newbie question. I deeply
> appreciate the attention paid to performance in DPDK. I have a follow-up
> though.
> 
> I'm trying to figure out what requirements this approach creates for the
> software build environment.  If we want to build optimized versions for
> Haswell, Ivy Bridge, Sandy Bridge, etc, does this mean that we must have one
> of each micro-architecture available for running the builds, or is there a way
> of cross-compiling for all micro-architectures from just one build
> environment?
> 
> Thanks,
>   Kelvin
> 

I'm not an expert in this, just some facts based on my test: the compile
process depends on the compiler and the lib version.
So even on a machine that doesn't support the necessary ISA, it should still
compile as long as gcc & glibc & etc. have the support; you will only get "Illegal
instruction" when trying to launch the compiled binary.

Therefore if there's a way (worst case scenario: change flags manually) to make 
DPDK build process think that it's on a Haswell machine, it will produce 
Haswell binaries.

Zhihong (John)


[dpdk-dev] [PATCH 2/2] i40e: enable internal switch of pf

2015-01-29 Thread Qiu, Michael
On 1/29/2015 9:42 AM, Jingjing Wu wrote:
> This patch enables PF's internal switch by setting ALLOWLOOPBACK
> flag when VEB is created. With this patch, traffic from PF can be
> switched on the VEB.
>
> Signed-off-by: Jingjing Wu 
> ---
>  lib/librte_pmd_i40e/i40e_ethdev.c | 36 
>  1 file changed, 36 insertions(+)
>
> diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c 
> b/lib/librte_pmd_i40e/i40e_ethdev.c
> index fe758c2..94fd36c 100644
> --- a/lib/librte_pmd_i40e/i40e_ethdev.c
> +++ b/lib/librte_pmd_i40e/i40e_ethdev.c
> @@ -2854,6 +2854,40 @@ i40e_vsi_dump_bw_config(struct i40e_vsi *vsi)
>   return 0;
>  }
>  
> +/*
> + * i40e_enable_pf_lb
> + * @pf: pointer to the pf structure
> + *
> + * allow loopback on pf
> + */
> +static inline void
> +i40e_enable_pf_lb(struct i40e_pf *pf)
> +{
> + struct i40e_hw *hw = I40E_PF_TO_HW(pf);
> + struct i40e_vsi_context ctxt;
> + int ret;
> +
> + memset(&ctxt, 0, sizeof(ctxt));
> + ctxt.seid = pf->main_vsi_seid;
> + ctxt.pf_num = hw->pf_id;
> + ret = i40e_aq_get_vsi_params(hw, &ctxt, NULL);
> + if (ret) {
> + PMD_DRV_LOG(ERR, "couldn't get pf vsi config, err %d, aq_err 
> %d",
> + ret, hw->aq.asq_last_status);
> + return;
> + }
> + ctxt.flags = I40E_AQ_VSI_TYPE_PF;
> + ctxt.info.valid_sections =
> + rte_cpu_to_le_16(I40E_AQ_VSI_PROP_SWITCH_VALID);

Does it need to be "|=" here? Since ctxt.info will be filled in by
i40e_aq_get_vsi_params(), I am not sure whether overriding what was
filled in, by using "=", causes any other issue.

Thanks,
Michael
> + ctxt.info.switch_id |=
> + rte_cpu_to_le_16(I40E_AQ_VSI_SW_ID_FLAG_ALLOW_LB);
> +
> + ret = i40e_aq_update_vsi_params(hw, &ctxt, NULL);
> + if (ret)
> + PMD_DRV_LOG(ERR, "update vsi switch failed, aq_err=%d\n",
> + hw->aq.asq_last_status);
> +}
> +
>  /* Setup a VSI */
>  struct i40e_vsi *
>  i40e_vsi_setup(struct i40e_pf *pf,
> @@ -2889,6 +2923,8 @@ i40e_vsi_setup(struct i40e_pf *pf,
>   PMD_DRV_LOG(ERR, "VEB setup failed");
>   return NULL;
>   }
> + /* set ALLOWLOOPBACk on pf, when veb is created */
> + i40e_enable_pf_lb(pf);
>   }
>  
>   vsi = rte_zmalloc("i40e_vsi", sizeof(struct i40e_vsi), 0);



[dpdk-dev] [PATCH v1 1/5] ethdev: add rx interrupt enable/disable functions

2015-01-29 Thread Qiu, Michael
On 1/28/2015 5:51 PM, Danny Zhou wrote:

Commit log is better for others to review the patch I think, it's
helpful for others to understand your patch.

> Signed-off-by: Danny Zhou 
> ---
>  lib/librte_ether/rte_ethdev.c | 45 ++
>  lib/librte_ether/rte_ethdev.h | 57 
> +++
>  2 files changed, 102 insertions(+)
>
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> index ea3a1fb..dd66cd9 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -2825,6 +2825,51 @@ _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
>   }
>   rte_spinlock_unlock(&rte_eth_dev_cb_lock);
>  }
> +
> +int
> +rte_eth_dev_rx_queue_intr_enable(uint8_t port_id,
> + uint16_t queue_id)
> +{
> + struct rte_eth_dev *dev;
> +
> + if (port_id >= nb_ports) {
> + PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
> + return (-ENODEV);
> + }
> +
> + dev = &rte_eth_devices[port_id];
> + if (dev == NULL) {
> + PMD_DEBUG_TRACE("Invalid port device\n");
> + return (-ENODEV);
> + }
> +
> + FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_intr_enable, -ENOTSUP);
> + (*dev->dev_ops->rx_queue_intr_enable)(dev, queue_id);

As callback function rx_queue_intr_enabl() has a return value, I think
it better to do a check, although all your implement always return a
value of zero.

BTW, better to add a blank line before last return.

Thanks,
Michael
> + return 0;
> +}
> +
> +int
> +rte_eth_dev_rx_queue_intr_disable(uint8_t port_id,
> + uint16_t queue_id)
> +{
> + struct rte_eth_dev *dev;
> +
> + if (port_id >= nb_ports) {
> + PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
> + return (-ENODEV);
> + }
> +
> + dev = &rte_eth_devices[port_id];
> + if (dev == NULL) {
> + PMD_DEBUG_TRACE("Invalid port device\n");
> + return (-ENODEV);
> + }
> +
> + FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_intr_disable, -ENOTSUP);
> + (*dev->dev_ops->rx_queue_intr_disable)(dev, queue_id);
> + return 0;
> +}
> +
>  #ifdef RTE_NIC_BYPASS
>  int rte_eth_dev_bypass_init(uint8_t port_id)
>  {
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 1200c1c..c080039 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -848,6 +848,8 @@ struct rte_eth_fdir {
>  struct rte_intr_conf {
>   /** enable/disable lsc interrupt. 0 (default) - disable, 1 enable */
>   uint16_t lsc;
> + /** enable/disable rxq interrupt. 0 (default) - disable, 1 enable */
> + uint16_t rxq;
>  };
>  
>  /**
> @@ -1108,6 +1110,14 @@ typedef int (*eth_tx_queue_setup_t)(struct rte_eth_dev 
> *dev,
>   const struct rte_eth_txconf *tx_conf);
>  /**< @internal Setup a transmit queue of an Ethernet device. */
>  
> +typedef int (*eth_rx_enable_intr_t)(struct rte_eth_dev *dev,
> + uint16_t rx_queue_id);
> +/**< @internal Enable interrupt of a receive queue of an Ethernet device. */
> +
> +typedef int (*eth_rx_disable_intr_t)(struct rte_eth_dev *dev,
> + uint16_t rx_queue_id);
> +/**< @internal Disable interrupt of a receive queue of an Ethernet device. */
> +
>  typedef void (*eth_queue_release_t)(void *queue);
>  /**< @internal Release memory resources allocated by given RX/TX queue. */
>  
> @@ -1444,6 +1454,8 @@ struct eth_dev_ops {
>   eth_queue_start_t  tx_queue_start;/**< Start TX for a queue.*/
>   eth_queue_stop_t   tx_queue_stop;/**< Stop TX for a queue.*/
>   eth_rx_queue_setup_t   rx_queue_setup;/**< Set up device RX queue.*/
> + eth_rx_enable_intr_t   rx_queue_intr_enable; /**< Enable Rx queue 
> interrupt. */
> + eth_rx_disable_intr_t  rx_queue_intr_disable; /**< Disable Rx queue 
> interrupt.*/
>   eth_queue_release_trx_queue_release;/**< Release RX queue.*/
>   eth_rx_queue_count_t   rx_queue_count; /**< Get Rx queue count. */
>   eth_rx_descriptor_done_t   rx_descriptor_done;  /**< Check rxd DD bit */
> @@ -2810,6 +2822,51 @@ void _rte_eth_dev_callback_process(struct rte_eth_dev 
> *dev,
>   enum rte_eth_event_type event);
>  
>  /**
> + * When there is no rx packet coming in Rx Queue for a long time, we can
> + * sleep lcore related to RX Queue for power saving, and enable rx interrupt
> + * to be triggered when rx packect arrives.
> + *
> + * The rte_eth_dev_rx_queue_intr_enable() function enables rx queue
> + * interrupt on specific rx queue of a port.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param queue_id
> + *   The index of the receive queue from which to retrieve input packets.
> + *   The value must be in the range [0, nb_rx_queue

[dpdk-dev] [PATCH v2 0/4] DPDK memcpy optimization

2015-01-29 Thread Zhihong Wang
This patch set optimizes memcpy for DPDK for both SSE and AVX platforms.
It also extends memcpy test coverage with unaligned cases and more test points.

Optimization techniques are summarized below:

1. Utilize full cache bandwidth

2. Enforce aligned stores

3. Apply load address alignment based on architecture features

4. Make load/store address available as early as possible

5. General optimization techniques like inlining, branch reducing, prefetch 
pattern access

--
Changes in v2:

1. Reduced constant test cases in app/test/test_memcpy_perf.c for fast build

2. Modified macro definition for better code readability & safety

Zhihong Wang (4):
  app/test: Disabled VTA for memcpy test in app/test/Makefile
  app/test: Removed unnecessary test cases in app/test/test_memcpy.c
  app/test: Extended test coverage in app/test/test_memcpy_perf.c
  lib/librte_eal: Optimized memcpy in arch/x86/rte_memcpy.h for both SSE
and AVX platforms

 app/test/Makefile  |   6 +
 app/test/test_memcpy.c |  52 +-
 app/test/test_memcpy_perf.c| 220 ---
 .../common/include/arch/x86/rte_memcpy.h   | 680 +++--
 4 files changed, 654 insertions(+), 304 deletions(-)

-- 
1.9.3



[dpdk-dev] [PATCH v2 3/4] app/test: Extended test coverage in app/test/test_memcpy_perf.c

2015-01-29 Thread Zhihong Wang
Main code changes:

1. Added more typical data points for a thorough performance test

2. Added unaligned test cases since it's common in DPDK usage

Signed-off-by: Zhihong Wang 
---
 app/test/test_memcpy_perf.c | 220 +++-
 1 file changed, 138 insertions(+), 82 deletions(-)

diff --git a/app/test/test_memcpy_perf.c b/app/test/test_memcpy_perf.c
index 7809610..754828e 100644
--- a/app/test/test_memcpy_perf.c
+++ b/app/test/test_memcpy_perf.c
@@ -54,9 +54,10 @@
 /* List of buffer sizes to test */
 #if TEST_VALUE_RANGE == 0
 static size_t buf_sizes[] = {
-   0, 1, 7, 8, 9, 15, 16, 17, 31, 32, 33, 63, 64, 65, 127, 128, 129, 255,
-   256, 257, 320, 384, 511, 512, 513, 1023, 1024, 1025, 1518, 1522, 1600,
-   2048, 3072, 4096, 5120, 6144, 7168, 8192
+   1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 15, 16, 17, 31, 32, 33, 63, 64, 65, 127, 
128,
+   129, 191, 192, 193, 255, 256, 257, 319, 320, 321, 383, 384, 385, 447, 
448,
+   449, 511, 512, 513, 767, 768, 769, 1023, 1024, 1025, 1518, 1522, 1536, 
1600,
+   2048, 2560, 3072, 3584, 4096, 4608, 5120, 5632, 6144, 6656, 7168, 7680, 
8192
 };
 /* MUST be as large as largest packet size above */
 #define SMALL_BUFFER_SIZE   8192
@@ -78,7 +79,7 @@ static size_t buf_sizes[TEST_VALUE_RANGE];
 #define TEST_BATCH_SIZE 100

 /* Data is aligned on this many bytes (power of 2) */
-#define ALIGNMENT_UNIT  16
+#define ALIGNMENT_UNIT  32

 /*
  * Pointers used in performance tests. The two large buffers are for uncached
@@ -94,19 +95,19 @@ init_buffers(void)
 {
unsigned i;

-   large_buf_read = rte_malloc("memcpy", LARGE_BUFFER_SIZE, 
ALIGNMENT_UNIT);
+   large_buf_read = rte_malloc("memcpy", LARGE_BUFFER_SIZE + 
ALIGNMENT_UNIT, ALIGNMENT_UNIT);
if (large_buf_read == NULL)
goto error_large_buf_read;

-   large_buf_write = rte_malloc("memcpy", LARGE_BUFFER_SIZE, 
ALIGNMENT_UNIT);
+   large_buf_write = rte_malloc("memcpy", LARGE_BUFFER_SIZE + 
ALIGNMENT_UNIT, ALIGNMENT_UNIT);
if (large_buf_write == NULL)
goto error_large_buf_write;

-   small_buf_read = rte_malloc("memcpy", SMALL_BUFFER_SIZE, 
ALIGNMENT_UNIT);
+   small_buf_read = rte_malloc("memcpy", SMALL_BUFFER_SIZE + 
ALIGNMENT_UNIT, ALIGNMENT_UNIT);
if (small_buf_read == NULL)
goto error_small_buf_read;

-   small_buf_write = rte_malloc("memcpy", SMALL_BUFFER_SIZE, 
ALIGNMENT_UNIT);
+   small_buf_write = rte_malloc("memcpy", SMALL_BUFFER_SIZE + 
ALIGNMENT_UNIT, ALIGNMENT_UNIT);
if (small_buf_write == NULL)
goto error_small_buf_write;

@@ -140,25 +141,25 @@ free_buffers(void)

 /*
  * Get a random offset into large array, with enough space needed to perform
- * max copy size. Offset is aligned.
+ * max copy size. Offset is aligned, uoffset is used for unalignment setting.
  */
 static inline size_t
-get_rand_offset(void)
+get_rand_offset(size_t uoffset)
 {
-   return ((rte_rand() % (LARGE_BUFFER_SIZE - SMALL_BUFFER_SIZE)) &
-   ~(ALIGNMENT_UNIT - 1));
+   return (((rte_rand() % (LARGE_BUFFER_SIZE - SMALL_BUFFER_SIZE)) &
+   ~(ALIGNMENT_UNIT - 1)) + uoffset);
 }

 /* Fill in source and destination addresses. */
 static inline void
-fill_addr_arrays(size_t *dst_addr, int is_dst_cached,
-   size_t *src_addr, int is_src_cached)
+fill_addr_arrays(size_t *dst_addr, int is_dst_cached, size_t dst_uoffset,
+size_t *src_addr, int is_src_cached, size_t 
src_uoffset)
 {
unsigned int i;

for (i = 0; i < TEST_BATCH_SIZE; i++) {
-   dst_addr[i] = (is_dst_cached) ? 0 : get_rand_offset();
-   src_addr[i] = (is_src_cached) ? 0 : get_rand_offset();
+   dst_addr[i] = (is_dst_cached) ? dst_uoffset : 
get_rand_offset(dst_uoffset);
+   src_addr[i] = (is_src_cached) ? src_uoffset : 
get_rand_offset(src_uoffset);
}
 }

@@ -169,16 +170,17 @@ fill_addr_arrays(size_t *dst_addr, int is_dst_cached,
  */
 static void
 do_uncached_write(uint8_t *dst, int is_dst_cached,
-   const uint8_t *src, int is_src_cached, size_t size)
+ const uint8_t *src, int is_src_cached, size_t 
size)
 {
unsigned i, j;
size_t dst_addrs[TEST_BATCH_SIZE], src_addrs[TEST_BATCH_SIZE];

for (i = 0; i < (TEST_ITERATIONS / TEST_BATCH_SIZE); i++) {
-   fill_addr_arrays(dst_addrs, is_dst_cached,
-src_addrs, is_src_cached);
-   for (j = 0; j < TEST_BATCH_SIZE; j++)
+   fill_addr_arrays(dst_addrs, is_dst_cached, 0,
+src_addrs, is_src_cached, 0);
+   for (j = 0; j < TEST_BATCH_SIZE; j++) {
rte_memcpy(dst+dst_addrs[j], src+src_addrs[j], size);
+   }
}
 }

@@ -186,52 +188,111 @@ do_uncached_writ

[dpdk-dev] [PATCH v2 1/4] app/test: Disabled VTA for memcpy test in app/test/Makefile

2015-01-29 Thread Zhihong Wang
VTA is for debugging only; it increases compile time and binary size, 
especially when there are a lot of inlines.
So disable it, since the memcpy test contains a lot of inline calls.

Signed-off-by: Zhihong Wang 
---
 app/test/Makefile | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/app/test/Makefile b/app/test/Makefile
index 4311f96..94dbadf 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -143,6 +143,12 @@ CFLAGS_test_kni.o += -Wno-deprecated-declarations
 endif
 CFLAGS += -D_GNU_SOURCE

+# Disable VTA for memcpy test
+ifeq ($(CC), gcc)
+CFLAGS_test_memcpy.o += -fno-var-tracking-assignments
+CFLAGS_test_memcpy_perf.o += -fno-var-tracking-assignments
+endif
+
 # this application needs libraries first
 DEPDIRS-y += lib

-- 
1.9.3



[dpdk-dev] [PATCH v2 2/4] app/test: Removed unnecessary test cases in app/test/test_memcpy.c

2015-01-29 Thread Zhihong Wang
Removed unnecessary test cases for base move functions since the function 
"func_test" covers them all.

Signed-off-by: Zhihong Wang 
---
 app/test/test_memcpy.c | 52 +-
 1 file changed, 1 insertion(+), 51 deletions(-)

diff --git a/app/test/test_memcpy.c b/app/test/test_memcpy.c
index 56b8e1e..b2bb4e0 100644
--- a/app/test/test_memcpy.c
+++ b/app/test/test_memcpy.c
@@ -78,56 +78,9 @@ static size_t buf_sizes[TEST_VALUE_RANGE];
 #define TEST_BATCH_SIZE 100

 /* Data is aligned on this many bytes (power of 2) */
-#define ALIGNMENT_UNIT  16
+#define ALIGNMENT_UNIT  32


-
-/* Structure with base memcpy func pointer, and number of bytes it copies */
-struct base_memcpy_func {
-   void (*func)(uint8_t *dst, const uint8_t *src);
-   unsigned size;
-};
-
-/* To create base_memcpy_func structure entries */
-#define BASE_FUNC(n) {rte_mov##n, n}
-
-/* Max number of bytes that can be copies with a "base" memcpy functions */
-#define MAX_BASE_FUNC_SIZE 256
-
-/*
- * Test the "base" memcpy functions, that a copy fixed number of bytes.
- */
-static int
-base_func_test(void)
-{
-   const struct base_memcpy_func base_memcpy_funcs[6] = {
-   BASE_FUNC(16),
-   BASE_FUNC(32),
-   BASE_FUNC(48),
-   BASE_FUNC(64),
-   BASE_FUNC(128),
-   BASE_FUNC(256),
-   };
-   unsigned i, j;
-   unsigned num_funcs = sizeof(base_memcpy_funcs) / 
sizeof(base_memcpy_funcs[0]);
-   uint8_t dst[MAX_BASE_FUNC_SIZE];
-   uint8_t src[MAX_BASE_FUNC_SIZE];
-
-   for (i = 0; i < num_funcs; i++) {
-   unsigned size = base_memcpy_funcs[i].size;
-   for (j = 0; j < size; j++) {
-   dst[j] = 0;
-   src[j] = (uint8_t) rte_rand();
-   }
-   base_memcpy_funcs[i].func(dst, src);
-   for (j = 0; j < size; j++)
-   if (dst[j] != src[j])
-   return -1;
-   }
-
-   return 0;
-}
-
 /*
  * Create two buffers, and initialise one with random values. These are copied
  * to the second buffer and then compared to see if the copy was successful.
@@ -218,9 +171,6 @@ test_memcpy(void)
ret = func_test();
if (ret != 0)
return -1;
-   ret = base_func_test();
-   if (ret != 0)
-   return -1;
return 0;
 }

-- 
1.9.3



[dpdk-dev] [PATCH v2 4/4] lib/librte_eal: Optimized memcpy in arch/x86/rte_memcpy.h for both SSE and AVX platforms

2015-01-29 Thread Zhihong Wang
Main code changes:

1. Differentiate architectural features based on CPU flags

a. Implement separated move functions for SSE/AVX/AVX2 to make full 
utilization of cache bandwidth

b. Implement separated copy flow specifically optimized for target 
architecture

2. Rewrite the memcpy function "rte_memcpy"

a. Add store aligning

b. Add load aligning based on architectural features

c. Put block copy loop into inline move functions for better control of 
instruction order

d. Eliminate unnecessary MOVs

3. Rewrite the inline move functions

a. Add move functions for unaligned load cases

b. Change instruction order in copy loops for better pipeline utilization

c. Use intrinsics instead of assembly code

4. Remove slow glibc call for constant copies

Signed-off-by: Zhihong Wang 
---
 .../common/include/arch/x86/rte_memcpy.h   | 680 +++--
 1 file changed, 509 insertions(+), 171 deletions(-)

diff --git a/lib/librte_eal/common/include/arch/x86/rte_memcpy.h 
b/lib/librte_eal/common/include/arch/x86/rte_memcpy.h
index fb9eba8..7b2d382 100644
--- a/lib/librte_eal/common/include/arch/x86/rte_memcpy.h
+++ b/lib/librte_eal/common/include/arch/x86/rte_memcpy.h
@@ -34,166 +34,189 @@
 #ifndef _RTE_MEMCPY_X86_64_H_
 #define _RTE_MEMCPY_X86_64_H_

+/**
+ * @file
+ *
+ * Functions for SSE/AVX/AVX2 implementation of memcpy().
+ */
+
+#include 
 #include 
 #include 
-#include 
+#include 

 #ifdef __cplusplus
 extern "C" {
 #endif

-#include "generic/rte_memcpy.h"
+/**
+ * Copy bytes from one location to another. The locations must not overlap.
+ *
+ * @note This is implemented as a macro, so it's address should not be taken
+ * and care is needed as parameter expressions may be evaluated multiple times.
+ *
+ * @param dst
+ *   Pointer to the destination of the data.
+ * @param src
+ *   Pointer to the source data.
+ * @param n
+ *   Number of bytes to copy.
+ * @return
+ *   Pointer to the destination data.
+ */
+static inline void *
+rte_memcpy(void *dst, const void *src, size_t n) 
__attribute__((always_inline));

-#ifdef __INTEL_COMPILER
-#pragma warning(disable:593) /* Stop unused variable warning (reg_a etc). */
-#endif
+#ifdef RTE_MACHINE_CPUFLAG_AVX2

+/**
+ * AVX2 implementation below
+ */
+
+/**
+ * Copy 16 bytes from one location to another,
+ * locations should not overlap.
+ */
 static inline void
 rte_mov16(uint8_t *dst, const uint8_t *src)
 {
-   __m128i reg_a;
-   asm volatile (
-   "movdqu (%[src]), %[reg_a]\n\t"
-   "movdqu %[reg_a], (%[dst])\n\t"
-   : [reg_a] "=x" (reg_a)
-   : [src] "r" (src),
- [dst] "r"(dst)
-   : "memory"
-   );
+   __m128i xmm0;
+
+   xmm0 = _mm_loadu_si128((const __m128i *)src);
+   _mm_storeu_si128((__m128i *)dst, xmm0);
 }

+/**
+ * Copy 32 bytes from one location to another,
+ * locations should not overlap.
+ */
 static inline void
 rte_mov32(uint8_t *dst, const uint8_t *src)
 {
-   __m128i reg_a, reg_b;
-   asm volatile (
-   "movdqu (%[src]), %[reg_a]\n\t"
-   "movdqu 16(%[src]), %[reg_b]\n\t"
-   "movdqu %[reg_a], (%[dst])\n\t"
-   "movdqu %[reg_b], 16(%[dst])\n\t"
-   : [reg_a] "=x" (reg_a),
- [reg_b] "=x" (reg_b)
-   : [src] "r" (src),
- [dst] "r"(dst)
-   : "memory"
-   );
-}
+   __m256i ymm0;

-static inline void
-rte_mov48(uint8_t *dst, const uint8_t *src)
-{
-   __m128i reg_a, reg_b, reg_c;
-   asm volatile (
-   "movdqu (%[src]), %[reg_a]\n\t"
-   "movdqu 16(%[src]), %[reg_b]\n\t"
-   "movdqu 32(%[src]), %[reg_c]\n\t"
-   "movdqu %[reg_a], (%[dst])\n\t"
-   "movdqu %[reg_b], 16(%[dst])\n\t"
-   "movdqu %[reg_c], 32(%[dst])\n\t"
-   : [reg_a] "=x" (reg_a),
- [reg_b] "=x" (reg_b),
- [reg_c] "=x" (reg_c)
-   : [src] "r" (src),
- [dst] "r"(dst)
-   : "memory"
-   );
+   ymm0 = _mm256_loadu_si256((const __m256i *)src);
+   _mm256_storeu_si256((__m256i *)dst, ymm0);
 }

+/**
+ * Copy 64 bytes from one location to another,
+ * locations should not overlap.
+ */
 static inline void
 rte_mov64(uint8_t *dst, const uint8_t *src)
 {
-   __m128i reg_a, reg_b, reg_c, reg_d;
-   asm volatile (
-   "movdqu (%[src]), %[reg_a]\n\t"
-   "movdqu 16(%[src]), %[reg_b]\n\t"
-   "movdqu 32(%[src]), %[reg_c]\n\t"
-   "movdqu 48(%[src]), %[reg_d]\n\t"
-   "movdqu %[reg_a], (%[dst])\n\t"
-   "movdqu %[reg_b], 16(%[dst])\n\t"
-   "movdqu %[reg_c], 32(%[dst])\n\t"
-   "movdqu %[reg_d], 48(%[dst])\n\t"
-   : [reg_a] "=x" (reg_a),
- [reg_b] "=x" (reg_b),
- [reg_c] "=x" (reg_c),
-  

[dpdk-dev] [PATCH 00/17] unified packet type

2015-01-29 Thread Helin Zhang
Currently only 6 bits which are stored in ol_flags are used to indicate
the packet types. This is not enough, as some NIC hardware can recognize
quite a lot of packet types, e.g i40e hardware can recognize more than 150
packet types. Hiding those packet types hides hardware offload capabilities
which could be quite useful for improving performance and for end users.
So a unified packet type is needed to support all possible PMDs. Recently
a 16 bits packet_type field has been added in mbuf header which can be used
for this purpose. In addition, all packet type flags stored in the ol_flags field
should be removed entirely, and the 6 bits of ol_flags can be saved as a benefit.

Initially, 16 bits of packet_type can be divided into several sub fields to
indicate different packet type information of a packet. The initial design
is to divide those bits into 4 fields for L3 types, tunnel types, inner L3
types and L4 types. All PMDs should translate the offloaded packet types
into this 4 fields of information, for user applications.

Helin Zhang (17):
  mbuf: add definitions of unified packet types
  e1000: support of unified packet type
  ixgbe: support of unified packet type
  ixgbe: support of unified packet type
  i40e: support of unified packet type
  bond: support of unified packet type
  enic: support of unified packet type
  vmxnet3: support of unified packet type
  app/test-pipeline: support of unified packet type
  app/test-pmd: support of unified packet type
  app/test: support of unified packet type
  examples/ip_fragmentation: support of unified packet type
  examples/ip_reassembly: support of unified packet type
  examples/l3fwd-acl: support of unified packet type
  examples/l3fwd-power: support of unified packet type
  examples/l3fwd: support of unified packet type
  mbuf: remove old packet type bit masks for ol_flags

 app/test-pipeline/pipeline_hash.c  |   4 +-
 app/test-pmd/csumonly.c|   6 +-
 app/test-pmd/rxonly.c  |   9 +-
 app/test/packet_burst_generator.c  |  10 +-
 examples/ip_fragmentation/main.c   |   7 +-
 examples/ip_reassembly/main.c  |   7 +-
 examples/l3fwd-acl/main.c  |  19 +-
 examples/l3fwd-power/main.c|   5 +-
 examples/l3fwd/main.c  |  64 +--
 lib/librte_mbuf/rte_mbuf.c |   6 -
 lib/librte_mbuf/rte_mbuf.h |  84 +++-
 lib/librte_pmd_bond/rte_eth_bond_pmd.c |   9 +-
 lib/librte_pmd_e1000/igb_rxtx.c|  95 +++-
 lib/librte_pmd_enic/enic_main.c|  14 +-
 lib/librte_pmd_i40e/i40e_rxtx.c| 778 +
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c  | 141 --
 lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c  |  39 +-
 lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c  |   4 +-
 18 files changed, 865 insertions(+), 436 deletions(-)

-- 
1.8.1.4



[dpdk-dev] [PATCH 01/17] mbuf: add definitions of unified packet types

2015-01-29 Thread Helin Zhang
As there are only 6 bit flags in ol_flags for indicating packet types,
which is not enough to describe all the possible packet types hardware
can recognize. For example, i40e hardware can recognize more than 150
packet types. Unified packet type is composed of tunnel type, L3 type,
L4 type and inner L3 type fields, and can be stored in 16 bits mbuf
field of 'packet_type'.

Signed-off-by: Helin Zhang 
Signed-off-by: Cunming Liang 
Signed-off-by: Jijiang Liu 
---
 lib/librte_mbuf/rte_mbuf.h | 74 ++
 1 file changed, 74 insertions(+)

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 16059c6..94ae344 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -165,6 +165,80 @@ extern "C" {
 /* Use final bit of flags to indicate a control mbuf */
 #define CTRL_MBUF_FLAG   (1ULL << 63) /**< Mbuf contains control data */

+/*
+ * Sixteen bits are divided into several fields to mark packet types. Note that
+ * each field is indexical.
+ * - Bit 3:0 is for tunnel types.
+ * - Bit 7:4 is for L3 or outer L3 (for tunneling case) types.
+ * - Bit 10:8 is for L4 types. It can also be used for inner L4 types for
+ *   tunneling packets.
+ * - Bit 13:11 is for inner L3 types.
+ * - Bit 15:14 is reserved.
+ *
+ * To be compitable with Vector PMD, RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV4_EXT,
+ * RTE_PTYPE_L3_IPV6, RTE_PTYPE_L3_IPV6_EXT, RTE_PTYPE_L4_TCP, RTE_PTYPE_L4_UDP
+ * and RTE_PTYPE_L4_SCTP should be kept as below in a contiguous 7 bits.
+ *
+ * Note that L3 types values are selected for checking IPV4/IPV6 header from
+ * performance point of view. Reading annotations of RTE_ETH_IS_IPV4_HDR and
+ * RTE_ETH_IS_IPV6_HDR is needed for any future changes of L3 type values.
+ */
+#define RTE_PTYPE_UNKNOWN   0x /* 0b */
+/* bit 3:0 for tunnel types */
+#define RTE_PTYPE_TUNNEL_IP 0x0001 /* 0b0001 */
+#define RTE_PTYPE_TUNNEL_TCP0x0002 /* 0b0010 */
+#define RTE_PTYPE_TUNNEL_UDP0x0003 /* 0b0011 */
+#define RTE_PTYPE_TUNNEL_GRE0x0004 /* 0b0100 */
+#define RTE_PTYPE_TUNNEL_VXLAN  0x0005 /* 0b0101 */
+#define RTE_PTYPE_TUNNEL_NVGRE  0x0006 /* 0b0110 */
+#define RTE_PTYPE_TUNNEL_GENEVE 0x0007 /* 0b0111 */
+#define RTE_PTYPE_TUNNEL_GRENAT 0x0008 /* 0b1000 */
+#define RTE_PTYPE_TUNNEL_GRENAT_MAC 0x0009 /* 0b1001 */
+#define RTE_PTYPE_TUNNEL_GRENAT_MACVLAN 0x000a /* 0b1010 */
+#define RTE_PTYPE_TUNNEL_MASK   0x000f /* 0b */
+/* bit 7:4 for L3 types */
+#define RTE_PTYPE_L3_IPV4   0x0010 /* 0b0001 */
+#define RTE_PTYPE_L3_IPV4_EXT   0x0030 /* 0b0011 */
+#define RTE_PTYPE_L3_IPV6   0x0040 /* 0b0100 */
+#define RTE_PTYPE_L3_IPV4_EXT_UNKNOWN   0x0090 /* 0b1001 */
+#define RTE_PTYPE_L3_IPV6_EXT   0x00c0 /* 0b1100 */
+#define RTE_PTYPE_L3_IPV6_EXT_UNKNOWN   0x00e0 /* 0b1110 */
+#define RTE_PTYPE_L3_MASK   0x00f0 /* 0b */
+/* bit 10:8 for L4 types */
+#define RTE_PTYPE_L4_TCP0x0100 /* 0b0001 */
+#define RTE_PTYPE_L4_UDP0x0200 /* 0b0010 */
+#define RTE_PTYPE_L4_FRAG   0x0300 /* 0b0011 */
+#define RTE_PTYPE_L4_SCTP   0x0400 /* 0b0100 */
+#define RTE_PTYPE_L4_ICMP   0x0500 /* 0b0101 */
+#define RTE_PTYPE_L4_NONFRAG0x0600 /* 0b0110 */
+#define RTE_PTYPE_L4_MASK   0x0700 /* 0b0111 */
+/* bit 13:11 for inner L3 types */
+#define RTE_PTYPE_INNER_L3_IPV4 0x0800 /* 0b1000 */
+#define RTE_PTYPE_INNER_L3_IPV4_EXT 0x1000 /* 0b0001 */
+#define RTE_PTYPE_INNER_L3_IPV6 0x1800 /* 0b00011000 */
+#define RTE_PTYPE_INNER_L3_IPV6_EXT 0x2000 /* 0b0010 */
+#define RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN 0x2800 /* 0b00101000 */
+#define RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN 0x3000 /* 0b0011 */
+#define RTE_PTYPE_INNER_L3_MASK 0x3800 /* 0b00111000 */
+/* bit 15:14 reserved */
+
+/**
+ * Check if the (outer) L3 header is IPv4. To avoid comparing IPv4 types one by
+ * one, bit 4 is selected to be used for IPv4 only. Then checking bit 4 can
+ * determin if it is an IPV4 packet.
+ */
+#define  RTE_ETH_IS_IPV4_HDR(ptype) ((ptype) & RTE_PTYPE_L3_IPV4)
+
+/**
+ * Check if the (outer) L3 header is IPv4. To avoid comparing IPv4 types one by
+ * one, bit 6 is selected to be used for IPv4 only. Then checking bit 6 can
+ * determin if it is an IPV4 packet.
+ */
+#define  RTE_ETH_IS_IPV6_HDR(ptype) ((ptype) &

[dpdk-dev] [PATCH 02/17] e1000: support of unified packet type

2015-01-29 Thread Helin Zhang
To unify packet types among all PMDs, bit masks of packet type for
ol_flags are replaced by unified packet type.

Signed-off-by: Helin Zhang 
---
 lib/librte_pmd_e1000/igb_rxtx.c | 95 ++---
 1 file changed, 80 insertions(+), 15 deletions(-)

diff --git a/lib/librte_pmd_e1000/igb_rxtx.c b/lib/librte_pmd_e1000/igb_rxtx.c
index 5c394a9..1ffb39e 100644
--- a/lib/librte_pmd_e1000/igb_rxtx.c
+++ b/lib/librte_pmd_e1000/igb_rxtx.c
@@ -602,17 +602,82 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf 
**tx_pkts,
  *  RX functions
  *
  **/
+#define IGB_PACKET_TYPE_IPV4  0X01
+#define IGB_PACKET_TYPE_IPV4_TCP  0X11
+#define IGB_PACKET_TYPE_IPV4_UDP  0X21
+#define IGB_PACKET_TYPE_IPV4_SCTP 0X41
+#define IGB_PACKET_TYPE_IPV4_EXT  0X03
+#define IGB_PACKET_TYPE_IPV4_EXT_SCTP 0X43
+#define IGB_PACKET_TYPE_IPV6  0X04
+#define IGB_PACKET_TYPE_IPV6_TCP  0X14
+#define IGB_PACKET_TYPE_IPV6_UDP  0X24
+#define IGB_PACKET_TYPE_IPV6_EXT  0X0C
+#define IGB_PACKET_TYPE_IPV6_EXT_TCP  0X1C
+#define IGB_PACKET_TYPE_IPV6_EXT_UDP  0X2C
+#define IGB_PACKET_TYPE_IPV4_IPV6 0X05
+#define IGB_PACKET_TYPE_IPV4_IPV6_TCP 0X15
+#define IGB_PACKET_TYPE_IPV4_IPV6_UDP 0X25
+#define IGB_PACKET_TYPE_IPV4_IPV6_EXT 0X0D
+#define IGB_PACKET_TYPE_IPV4_IPV6_EXT_TCP 0X1D
+#define IGB_PACKET_TYPE_IPV4_IPV6_EXT_UDP 0X2D
+#define IGB_PACKET_TYPE_MAX   0X80
+#define IGB_PACKET_TYPE_MASK  0X7F
+#define IGB_PACKET_TYPE_SHIFT 0X04
+static inline uint16_t
+igb_rxd_pkt_info_to_pkt_type(uint16_t pkt_info)
+{
+   static const uint16_t
+   ptype_table[IGB_PACKET_TYPE_MAX] __rte_cache_aligned = {
+   [IGB_PACKET_TYPE_IPV4] = RTE_PTYPE_L3_IPV4,
+   [IGB_PACKET_TYPE_IPV4_EXT] = RTE_PTYPE_L3_IPV4_EXT,
+   [IGB_PACKET_TYPE_IPV6] = RTE_PTYPE_L3_IPV6,
+   [IGB_PACKET_TYPE_IPV4_IPV6] = RTE_PTYPE_L3_IPV4 |
+   RTE_PTYPE_TUNNEL_IP | RTE_PTYPE_INNER_L3_IPV6,
+   [IGB_PACKET_TYPE_IPV6_EXT] = RTE_PTYPE_L3_IPV6_EXT,
+   [IGB_PACKET_TYPE_IPV4_IPV6_EXT] = RTE_PTYPE_L3_IPV4 |
+   RTE_PTYPE_TUNNEL_IP |
+   RTE_PTYPE_INNER_L3_IPV6_EXT,
+   [IGB_PACKET_TYPE_IPV4_TCP] = RTE_PTYPE_L3_IPV4 |
+   RTE_PTYPE_L4_TCP,
+   [IGB_PACKET_TYPE_IPV6_TCP] = RTE_PTYPE_L3_IPV6 |
+   RTE_PTYPE_L4_TCP,
+   [IGB_PACKET_TYPE_IPV4_IPV6_TCP] = RTE_PTYPE_L3_IPV4 |
+   RTE_PTYPE_TUNNEL_IP | RTE_PTYPE_INNER_L3_IPV6 |
+   RTE_PTYPE_L4_TCP,
+   [IGB_PACKET_TYPE_IPV6_EXT_TCP] = RTE_PTYPE_L3_IPV6_EXT |
+   RTE_PTYPE_L4_TCP,
+   [IGB_PACKET_TYPE_IPV4_IPV6_EXT_TCP] = RTE_PTYPE_L3_IPV4 |
+   RTE_PTYPE_TUNNEL_IP |
+   RTE_PTYPE_INNER_L3_IPV6_EXT | RTE_PTYPE_L4_TCP,
+   [IGB_PACKET_TYPE_IPV4_UDP] = RTE_PTYPE_L3_IPV4 |
+   RTE_PTYPE_L4_UDP,
+   [IGB_PACKET_TYPE_IPV6_UDP] = RTE_PTYPE_L3_IPV6 |
+   RTE_PTYPE_L4_UDP,
+   [IGB_PACKET_TYPE_IPV4_IPV6_UDP] =  RTE_PTYPE_L3_IPV4 |
+   RTE_PTYPE_TUNNEL_IP | RTE_PTYPE_INNER_L3_IPV6 |
+   RTE_PTYPE_L4_UDP,
+   [IGB_PACKET_TYPE_IPV6_EXT_UDP] = RTE_PTYPE_L3_IPV6_EXT |
+   RTE_PTYPE_L4_UDP,
+   [IGB_PACKET_TYPE_IPV4_IPV6_EXT_UDP] = RTE_PTYPE_L3_IPV4 |
+   RTE_PTYPE_TUNNEL_IP |
+   RTE_PTYPE_INNER_L3_IPV6_EXT | RTE_PTYPE_L4_UDP,
+   [IGB_PACKET_TYPE_IPV4_SCTP] = RTE_PTYPE_L3_IPV4 |
+   RTE_PTYPE_L4_SCTP,
+   [IGB_PACKET_TYPE_IPV4_EXT_SCTP] = RTE_PTYPE_L3_IPV4_EXT |
+   RTE_PTYPE_L4_SCTP,
+   };
+   if (unlikely(pkt_info & E1000_RXDADV_PKTTYPE_ETQF))
+   return RTE_PTYPE_UNKNOWN;
+
+   pkt_info = (pkt_info >> IGB_PACKET_TYPE_SHIFT) & IGB_PACKET_TYPE_MASK;
+
+   return ptype_table[pkt_info];
+}
+
 static inline uint64_t
 rx_desc_hlen_type_rss_to_pkt_flags(uint32_t hl_tp_rs)
 {
-   uint64_t pkt_flags;
-
-   static uint64_t ip_pkt_types_map[16] = {
-   0, PKT_RX_IPV4_HDR, PKT_RX_IPV4_HDR_EXT, PKT_RX_IPV4_HDR_EXT,
-   PKT_RX_IPV6_HDR, 0, 0, 0,
-   PKT_RX_IPV6_HDR_EXT, 0, 0, 0,
-   PKT_RX_IPV6_HDR_EXT, 0, 0, 0,
-   };
+   uint64_t pkt_flags = ((hl_tp_rs & 0x0F) == 0) ?  0 : PKT_RX_RSS_HASH;

 #if defined(RTE_LIBRTE_IEEE1588)
static uint32_t ip_pkt_etqf_map[8] = {
@@ -620,14 +685,10 @@ rx_desc_hlen_type_rss_to_pkt_flags(uint32_t hl_tp_rs)
0, 0, 0, 0,
};

-   pkt_flags = (hl_tp_rs & E1000_RXDADV_PKTTYPE_ETQF) ?
- 

[dpdk-dev] [PATCH 03/17] ixgbe: support of unified packet type

2015-01-29 Thread Helin Zhang
To unify packet types among all PMDs, bit masks of packet type for
ol_flags are replaced by unified packet type.

Signed-off-by: Helin Zhang 
---
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 141 +-
 1 file changed, 107 insertions(+), 34 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c 
b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index e6766b3..aefb4e9 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -866,40 +866,102 @@ end_of_tx:
  *  RX functions
  *
  **/
-static inline uint64_t
-rx_desc_hlen_type_rss_to_pkt_flags(uint32_t hl_tp_rs)
+#define IXGBE_PACKET_TYPE_IPV4  0X01
+#define IXGBE_PACKET_TYPE_IPV4_TCP  0X11
+#define IXGBE_PACKET_TYPE_IPV4_UDP  0X21
+#define IXGBE_PACKET_TYPE_IPV4_SCTP 0X41
+#define IXGBE_PACKET_TYPE_IPV4_EXT  0X03
+#define IXGBE_PACKET_TYPE_IPV4_EXT_SCTP 0X43
+#define IXGBE_PACKET_TYPE_IPV6  0X04
+#define IXGBE_PACKET_TYPE_IPV6_TCP  0X14
+#define IXGBE_PACKET_TYPE_IPV6_UDP  0X24
+#define IXGBE_PACKET_TYPE_IPV6_EXT  0X0C
+#define IXGBE_PACKET_TYPE_IPV6_EXT_TCP  0X1C
+#define IXGBE_PACKET_TYPE_IPV6_EXT_UDP  0X2C
+#define IXGBE_PACKET_TYPE_IPV4_IPV6 0X05
+#define IXGBE_PACKET_TYPE_IPV4_IPV6_TCP 0X15
+#define IXGBE_PACKET_TYPE_IPV4_IPV6_UDP 0X25
+#define IXGBE_PACKET_TYPE_IPV4_IPV6_EXT 0X0D
+#define IXGBE_PACKET_TYPE_IPV4_IPV6_EXT_TCP 0X1D
+#define IXGBE_PACKET_TYPE_IPV4_IPV6_EXT_UDP 0X2D
+#define IXGBE_PACKET_TYPE_MAX   0X80
+#define IXGBE_PACKET_TYPE_MASK  0X7F
+#define IXGBE_PACKET_TYPE_SHIFT 0X04
+static inline uint16_t
+ixgbe_rxd_pkt_info_to_pkt_type(uint16_t pkt_info)
 {
-   uint64_t pkt_flags;
-
-   static uint64_t ip_pkt_types_map[16] = {
-   0, PKT_RX_IPV4_HDR, PKT_RX_IPV4_HDR_EXT, PKT_RX_IPV4_HDR_EXT,
-   PKT_RX_IPV6_HDR, 0, 0, 0,
-   PKT_RX_IPV6_HDR_EXT, 0, 0, 0,
-   PKT_RX_IPV6_HDR_EXT, 0, 0, 0,
+   static const uint16_t
+   ptype_table[IXGBE_PACKET_TYPE_MAX] __rte_cache_aligned = {
+   [IXGBE_PACKET_TYPE_IPV4] = RTE_PTYPE_L3_IPV4,
+   [IXGBE_PACKET_TYPE_IPV4_EXT] = RTE_PTYPE_L3_IPV4_EXT,
+   [IXGBE_PACKET_TYPE_IPV6] = RTE_PTYPE_L3_IPV6,
+   [IXGBE_PACKET_TYPE_IPV4_IPV6] = RTE_PTYPE_L3_IPV4 |
+   RTE_PTYPE_TUNNEL_IP | RTE_PTYPE_INNER_L3_IPV6,
+   [IXGBE_PACKET_TYPE_IPV6_EXT] = RTE_PTYPE_L3_IPV6_EXT,
+   [IXGBE_PACKET_TYPE_IPV4_IPV6_EXT] = RTE_PTYPE_L3_IPV4 |
+   RTE_PTYPE_TUNNEL_IP |
+   RTE_PTYPE_INNER_L3_IPV6_EXT,
+   [IXGBE_PACKET_TYPE_IPV4_TCP] = RTE_PTYPE_L3_IPV4 |
+   RTE_PTYPE_L4_TCP,
+   [IXGBE_PACKET_TYPE_IPV6_TCP] = RTE_PTYPE_L3_IPV6 |
+   RTE_PTYPE_L4_TCP,
+   [IXGBE_PACKET_TYPE_IPV4_IPV6_TCP] = RTE_PTYPE_L3_IPV4 |
+   RTE_PTYPE_TUNNEL_IP | RTE_PTYPE_INNER_L3_IPV6 |
+   RTE_PTYPE_L4_TCP,
+   [IXGBE_PACKET_TYPE_IPV6_EXT_TCP] = RTE_PTYPE_L3_IPV6_EXT |
+   RTE_PTYPE_L4_TCP,
+   [IXGBE_PACKET_TYPE_IPV4_IPV6_EXT_TCP] = RTE_PTYPE_L3_IPV4 |
+   RTE_PTYPE_TUNNEL_IP |
+   RTE_PTYPE_INNER_L3_IPV6_EXT | RTE_PTYPE_L4_TCP,
+   [IXGBE_PACKET_TYPE_IPV4_UDP] = RTE_PTYPE_L3_IPV4 |
+   RTE_PTYPE_L4_UDP,
+   [IXGBE_PACKET_TYPE_IPV6_UDP] = RTE_PTYPE_L3_IPV6 |
+   RTE_PTYPE_L4_UDP,
+   [IXGBE_PACKET_TYPE_IPV4_IPV6_UDP] =  RTE_PTYPE_L3_IPV4 |
+   RTE_PTYPE_TUNNEL_IP | RTE_PTYPE_INNER_L3_IPV6 |
+   RTE_PTYPE_L4_UDP,
+   [IXGBE_PACKET_TYPE_IPV6_EXT_UDP] = RTE_PTYPE_L3_IPV6_EXT |
+   RTE_PTYPE_L4_UDP,
+   [IXGBE_PACKET_TYPE_IPV4_IPV6_EXT_UDP] = RTE_PTYPE_L3_IPV4 |
+   RTE_PTYPE_TUNNEL_IP |
+   RTE_PTYPE_INNER_L3_IPV6_EXT | RTE_PTYPE_L4_UDP,
+   [IXGBE_PACKET_TYPE_IPV4_SCTP] = RTE_PTYPE_L3_IPV4 |
+   RTE_PTYPE_L4_SCTP,
+   [IXGBE_PACKET_TYPE_IPV4_EXT_SCTP] = RTE_PTYPE_L3_IPV4_EXT |
+   RTE_PTYPE_L4_SCTP,
};
+   if (unlikely(pkt_info & IXGBE_RXDADV_PKTTYPE_ETQF))
+   return RTE_PTYPE_UNKNOWN;

-   static uint64_t ip_rss_types_map[16] = {
+   pkt_info = (pkt_info >> IXGBE_PACKET_TYPE_SHIFT) &
+   IXGBE_PACKET_TYPE_MASK;
+
+   return ptype_table[pkt_info];
+}
+
+static inline uint64_t
+ixgbe_rxd_pkt_info_to_pkt_flags(uint16_t pkt_info)
+{
+   static uint64_t ip_rss_types_map[16] __rte_cache_aligned = {
0, PKT_RX_RSS_HASH, PKT_RX_RSS_HASH, PKT_RX_RSS_HASH,
0

[dpdk-dev] [PATCH 04/17] ixgbe: support of unified packet type

2015-01-29 Thread Helin Zhang
To unify packet types among all PMDs, bit masks of packet type for
ol_flags are replaced by unified packet type for Vector PMD.

Signed-off-by: Cunming Liang 
Signed-off-by: Helin Zhang 
---
 lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c | 39 +++
 1 file changed, 21 insertions(+), 18 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c 
b/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c
index b54cb19..b3cf7dd 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c
@@ -134,44 +134,35 @@ ixgbe_rxq_rearm(struct igb_rx_queue *rxq)
  */
 #ifdef RTE_IXGBE_RX_OLFLAGS_ENABLE

-#define OLFLAGS_MASK ((uint16_t)(PKT_RX_VLAN_PKT | PKT_RX_IPV4_HDR |\
-PKT_RX_IPV4_HDR_EXT | PKT_RX_IPV6_HDR |\
-PKT_RX_IPV6_HDR_EXT))
-#define OLFLAGS_MASK_V   (((uint64_t)OLFLAGS_MASK << 48) | \
- ((uint64_t)OLFLAGS_MASK << 32) | \
- ((uint64_t)OLFLAGS_MASK << 16) | \
- ((uint64_t)OLFLAGS_MASK))
-#define PTYPE_SHIFT(1)
+#define OLFLAGS_MASK_V   (((uint64_t)PKT_RX_VLAN_PKT << 48) | \
+ ((uint64_t)PKT_RX_VLAN_PKT << 32) | \
+ ((uint64_t)PKT_RX_VLAN_PKT << 16) | \
+ ((uint64_t)PKT_RX_VLAN_PKT))
 #define VTAG_SHIFT (3)

 static inline void
 desc_to_olflags_v(__m128i descs[4], struct rte_mbuf **rx_pkts)
 {
-   __m128i ptype0, ptype1, vtag0, vtag1;
+   __m128i vtag0, vtag1;
union {
uint16_t e[4];
uint64_t dword;
} vol;

-   ptype0 = _mm_unpacklo_epi16(descs[0], descs[1]);
-   ptype1 = _mm_unpacklo_epi16(descs[2], descs[3]);
vtag0 = _mm_unpackhi_epi16(descs[0], descs[1]);
vtag1 = _mm_unpackhi_epi16(descs[2], descs[3]);

-   ptype1 = _mm_unpacklo_epi32(ptype0, ptype1);
vtag1 = _mm_unpacklo_epi32(vtag0, vtag1);
-
-   ptype1 = _mm_slli_epi16(ptype1, PTYPE_SHIFT);
vtag1 = _mm_srli_epi16(vtag1, VTAG_SHIFT);

-   ptype1 = _mm_or_si128(ptype1, vtag1);
-   vol.dword = _mm_cvtsi128_si64(ptype1) & OLFLAGS_MASK_V;
+   vol.dword = _mm_cvtsi128_si64(vtag1) & OLFLAGS_MASK_V;

rx_pkts[0]->ol_flags = vol.e[0];
rx_pkts[1]->ol_flags = vol.e[1];
rx_pkts[2]->ol_flags = vol.e[2];
rx_pkts[3]->ol_flags = vol.e[3];
 }
+
 #else
 #define desc_to_olflags_v(desc, rx_pkts) do {} while (0)
 #endif
@@ -204,6 +195,8 @@ _recv_raw_pkts_vec(struct igb_rx_queue *rxq, struct 
rte_mbuf **rx_pkts,
0/* ignore pkt_type field */
);
__m128i dd_check, eop_check;
+   __m128i desc_mask = _mm_set_epi32(0x, 0x,
+ 0x, 0x07F0);

if (unlikely(nb_pkts < RTE_IXGBE_VPMD_RX_BURST))
return 0;
@@ -239,7 +232,8 @@ _recv_raw_pkts_vec(struct igb_rx_queue *rxq, struct 
rte_mbuf **rx_pkts,
0xFF, 0xFF,  /* skip high 16 bits pkt_len, zero out */
13, 12,  /* octet 12~13, low 16 bits pkt_len */
13, 12,  /* octet 12~13, 16 bits data_len */
-   0xFF, 0xFF   /* skip pkt_type field */
+   1,   /* octet 1, 8 bits pkt_type field */
+   0/* octet 0, 4 bits offset 4 pkt_type field */
);

/* Cache is empty -> need to scan the buffer rings, but first move
@@ -248,6 +242,7 @@ _recv_raw_pkts_vec(struct igb_rx_queue *rxq, struct 
rte_mbuf **rx_pkts,

/*
 * A. load 4 packet in one loop
+* [A*. mask out 4 unused dirty field in desc]
 * B. copy 4 mbuf point from swring to rx_pkts
 * C. calc the number of DD bits among the 4 packets
 * [C*. extract the end-of-packet bit, if requested]
@@ -289,6 +284,14 @@ _recv_raw_pkts_vec(struct igb_rx_queue *rxq, struct 
rte_mbuf **rx_pkts,
/* B.2 copy 2 mbuf point into rx_pkts  */
_mm_storeu_si128((__m128i *)&rx_pkts[pos+2], mbp2);

+   /* A* mask out 0~3 bits RSS type */
+   descs[3] = _mm_and_si128(descs[3], desc_mask);
+   descs[2] = _mm_and_si128(descs[2], desc_mask);
+
+   /* A* mask out 0~3 bits RSS type */
+   descs[1] = _mm_and_si128(descs[1], desc_mask);
+   descs[0] = _mm_and_si128(descs[0], desc_mask);
+
/* avoid compiler reorder optimization */
rte_compiler_barrier();

@@ -301,7 +304,7 @@ _recv_raw_pkts_vec(struct igb_rx_queue *rxq, struct 
rte_mbuf **rx_pkts,
/* C.1 4=>2 filter staterr info only */
sterr_tmp1 = _mm_unpackhi_epi32(descs[1], descs[0]);

-   /* set ol_flags with packet type and vlan tag */
+   /* set ol_flags with vlan packet type */
desc_to_olflags_v(descs, &rx_pkts[pos]);


[dpdk-dev] [PATCH 05/17] i40e: support of unified packet type

2015-01-29 Thread Helin Zhang
To unify packet types among all PMDs, bit masks of packet type for
ol_flags are replaced by unified packet type.

Signed-off-by: Helin Zhang 
Signed-off-by: Jijiang Liu 
---
 lib/librte_pmd_i40e/i40e_rxtx.c | 778 ++--
 1 file changed, 504 insertions(+), 274 deletions(-)

diff --git a/lib/librte_pmd_i40e/i40e_rxtx.c b/lib/librte_pmd_i40e/i40e_rxtx.c
index 2beae3c..68029c3 100644
--- a/lib/librte_pmd_i40e/i40e_rxtx.c
+++ b/lib/librte_pmd_i40e/i40e_rxtx.c
@@ -146,272 +146,503 @@ i40e_rxd_error_to_pkt_flags(uint64_t qword)
return flags;
 }

-/* Translate pkt types to pkt flags */
-static inline uint64_t
-i40e_rxd_ptype_to_pkt_flags(uint64_t qword)
+/* For each value it means, datasheet of hardware can tell more details */
+static inline uint16_t
+i40e_rxd_pkt_type_mapping(uint8_t ptype)
 {
-   uint8_t ptype = (uint8_t)((qword & I40E_RXD_QW1_PTYPE_MASK) >>
-   I40E_RXD_QW1_PTYPE_SHIFT);
-   static const uint64_t ip_ptype_map[I40E_MAX_PKT_TYPE] = {
-   0, /* PTYPE 0 */
-   0, /* PTYPE 1 */
-   0, /* PTYPE 2 */
-   0, /* PTYPE 3 */
-   0, /* PTYPE 4 */
-   0, /* PTYPE 5 */
-   0, /* PTYPE 6 */
-   0, /* PTYPE 7 */
-   0, /* PTYPE 8 */
-   0, /* PTYPE 9 */
-   0, /* PTYPE 10 */
-   0, /* PTYPE 11 */
-   0, /* PTYPE 12 */
-   0, /* PTYPE 13 */
-   0, /* PTYPE 14 */
-   0, /* PTYPE 15 */
-   0, /* PTYPE 16 */
-   0, /* PTYPE 17 */
-   0, /* PTYPE 18 */
-   0, /* PTYPE 19 */
-   0, /* PTYPE 20 */
-   0, /* PTYPE 21 */
-   PKT_RX_IPV4_HDR, /* PTYPE 22 */
-   PKT_RX_IPV4_HDR, /* PTYPE 23 */
-   PKT_RX_IPV4_HDR, /* PTYPE 24 */
-   0, /* PTYPE 25 */
-   PKT_RX_IPV4_HDR, /* PTYPE 26 */
-   PKT_RX_IPV4_HDR, /* PTYPE 27 */
-   PKT_RX_IPV4_HDR, /* PTYPE 28 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 29 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 30 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 31 */
-   0, /* PTYPE 32 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 33 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 34 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 35 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 36 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 37 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 38 */
-   0, /* PTYPE 39 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 40 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 41 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 42 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 43 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 44 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 45 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 46 */
-   0, /* PTYPE 47 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 48 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 49 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 50 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 51 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 52 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 53 */
-   0, /* PTYPE 54 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 55 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 56 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 57 */
-   PKT_RX_IPV4_HDR_EXT, /* PTYPE 58 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 59 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 60 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 61 */
-   0, /* PTYPE 62 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 63 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 64 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 65 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 66 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 67 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 68 */
-   0, /* PTYPE 69 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 70 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 71 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 72 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 73 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 74 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 75 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 76 */
-   0, /* PTYPE 77 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 78 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 79 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 80 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 81 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 82 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 83 */
-   0, /* PTYPE 84 */
-   PKT_RX_TUNNEL_IPV4_HDR, /* PTYPE 85 */
-

[dpdk-dev] [PATCH 06/17] bond: support of unified packet type

2015-01-29 Thread Helin Zhang
To unify packet types among all PMDs, bit masks of packet type for
ol_flags are replaced by unified packet type.

Signed-off-by: Helin Zhang 
---
 lib/librte_pmd_bond/rte_eth_bond_pmd.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/lib/librte_pmd_bond/rte_eth_bond_pmd.c 
b/lib/librte_pmd_bond/rte_eth_bond_pmd.c
index 8b80297..acd8e77 100644
--- a/lib/librte_pmd_bond/rte_eth_bond_pmd.c
+++ b/lib/librte_pmd_bond/rte_eth_bond_pmd.c
@@ -319,12 +319,11 @@ xmit_l23_hash(const struct rte_mbuf *buf, uint8_t 
slave_count)

hash = ether_hash(eth_hdr);

-   if (buf->ol_flags & PKT_RX_IPV4_HDR) {
+   if (RTE_ETH_IS_IPV4_HDR(buf->packet_type)) {
struct ipv4_hdr *ipv4_hdr = (struct ipv4_hdr *)
((char *)(eth_hdr + 1) + vlan_offset);
l3hash = ipv4_hash(ipv4_hdr);
-
-   } else if  (buf->ol_flags & PKT_RX_IPV6_HDR) {
+   } else if  (RTE_ETH_IS_IPV6_HDR(buf->packet_type)) {
struct ipv6_hdr *ipv6_hdr = (struct ipv6_hdr *)
((char *)(eth_hdr + 1) + vlan_offset);
l3hash = ipv6_hash(ipv6_hdr);
@@ -346,7 +345,7 @@ xmit_l34_hash(const struct rte_mbuf *buf, uint8_t 
slave_count)
struct tcp_hdr *tcp_hdr = NULL;
uint32_t hash, l3hash = 0, l4hash = 0;

-   if (buf->ol_flags & PKT_RX_IPV4_HDR) {
+   if (RTE_ETH_IS_IPV4_HDR(buf->packet_type)) {
struct ipv4_hdr *ipv4_hdr = (struct ipv4_hdr *)
((char *)(eth_hdr + 1) + vlan_offset);
size_t ip_hdr_offset;
@@ -365,7 +364,7 @@ xmit_l34_hash(const struct rte_mbuf *buf, uint8_t 
slave_count)
ip_hdr_offset);
l4hash = HASH_L4_PORTS(udp_hdr);
}
-   } else if  (buf->ol_flags & PKT_RX_IPV6_HDR) {
+   } else if  (RTE_ETH_IS_IPV6_HDR(buf->packet_type)) {
struct ipv6_hdr *ipv6_hdr = (struct ipv6_hdr *)
((char *)(eth_hdr + 1) + vlan_offset);
l3hash = ipv6_hash(ipv6_hdr);
-- 
1.8.1.4



[dpdk-dev] [PATCH 07/17] enic: support of unified packet type

2015-01-29 Thread Helin Zhang
To unify packet types among all PMDs, bit masks of packet type for
ol_flags are replaced by unified packet type.

Signed-off-by: Helin Zhang 
---
 lib/librte_pmd_enic/enic_main.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/lib/librte_pmd_enic/enic_main.c b/lib/librte_pmd_enic/enic_main.c
index 48fdca2..9acba9a 100644
--- a/lib/librte_pmd_enic/enic_main.c
+++ b/lib/librte_pmd_enic/enic_main.c
@@ -423,7 +423,7 @@ static int enic_rq_indicate_buf(struct vnic_rq *rq,
rx_pkt->pkt_len = bytes_written;

if (ipv4) {
-   rx_pkt->ol_flags |= PKT_RX_IPV4_HDR;
+   rx_pkt->packet_type = RTE_PTYPE_L3_IPV4;
if (!csum_not_calc) {
if (unlikely(!ipv4_csum_ok))
rx_pkt->ol_flags |= PKT_RX_IP_CKSUM_BAD;
@@ -432,7 +432,7 @@ static int enic_rq_indicate_buf(struct vnic_rq *rq,
rx_pkt->ol_flags |= PKT_RX_L4_CKSUM_BAD;
}
} else if (ipv6)
-   rx_pkt->ol_flags |= PKT_RX_IPV6_HDR;
+   rx_pkt->packet_type = RTE_PTYPE_L3_IPV6;
} else {
/* Header split */
if (sop && !eop) {
@@ -445,7 +445,7 @@ static int enic_rq_indicate_buf(struct vnic_rq *rq,
*rx_pkt_bucket = rx_pkt;
rx_pkt->pkt_len = bytes_written;
if (ipv4) {
-   rx_pkt->ol_flags |= PKT_RX_IPV4_HDR;
+   rx_pkt->packet_type = RTE_PTYPE_L3_IPV4;
if (!csum_not_calc) {
if (unlikely(!ipv4_csum_ok))
rx_pkt->ol_flags |=
@@ -457,13 +457,14 @@ static int enic_rq_indicate_buf(struct vnic_rq *rq,
PKT_RX_L4_CKSUM_BAD;
}
} else if (ipv6)
-   rx_pkt->ol_flags |= PKT_RX_IPV6_HDR;
+   rx_pkt->packet_type = RTE_PTYPE_L3_IPV6;
} else {
/* Payload */
hdr_rx_pkt = *rx_pkt_bucket;
hdr_rx_pkt->pkt_len += bytes_written;
if (ipv4) {
-   hdr_rx_pkt->ol_flags |= PKT_RX_IPV4_HDR;
+   hdr_rx_pkt->packet_type =
+   RTE_PTYPE_L3_IPV4;
if (!csum_not_calc) {
if (unlikely(!ipv4_csum_ok))
hdr_rx_pkt->ol_flags |=
@@ -475,7 +476,8 @@ static int enic_rq_indicate_buf(struct vnic_rq *rq,
PKT_RX_L4_CKSUM_BAD;
}
} else if (ipv6)
-   hdr_rx_pkt->ol_flags |= PKT_RX_IPV6_HDR;
+   hdr_rx_pkt->packet_type =
+   RTE_PTYPE_L3_IPV6;

}
}
-- 
1.8.1.4



[dpdk-dev] [PATCH 08/17] vmxnet3: support of unified packet type

2015-01-29 Thread Helin Zhang
To unify packet types among all PMDs, bit masks of packet type for
ol_flags are replaced by unified packet type.

Signed-off-by: Helin Zhang 
---
 lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c 
b/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c
index 8425f32..c85ebd8 100644
--- a/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c
+++ b/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c
@@ -650,9 +650,9 @@ vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf 
**rx_pkts, uint16_t nb_pkts)
struct ipv4_hdr *ip = (struct ipv4_hdr *)(eth + 1);

if (((ip->version_ihl & 0xf) << 2) > (int)sizeof(struct 
ipv4_hdr))
-   rxm->ol_flags |= PKT_RX_IPV4_HDR_EXT;
+   rxm->packet_type = RTE_PTYPE_L3_IPV4_EXT;
else
-   rxm->ol_flags |= PKT_RX_IPV4_HDR;
+   rxm->packet_type = RTE_PTYPE_L3_IPV4;

if (!rcd->cnc) {
if (!rcd->ipc)
-- 
1.8.1.4



[dpdk-dev] [PATCH 09/17] app/test-pipeline: support of unified packet type

2015-01-29 Thread Helin Zhang
To unify packet types among all PMDs, bit masks and relevant macros
of packet type for ol_flags are replaced by unified packet type and
relevant macros.

Signed-off-by: Helin Zhang 
---
 app/test-pipeline/pipeline_hash.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/app/test-pipeline/pipeline_hash.c 
b/app/test-pipeline/pipeline_hash.c
index 4598ad4..db650c8 100644
--- a/app/test-pipeline/pipeline_hash.c
+++ b/app/test-pipeline/pipeline_hash.c
@@ -459,14 +459,14 @@ app_main_loop_rx_metadata(void) {
signature = RTE_MBUF_METADATA_UINT32_PTR(m, 0);
key = RTE_MBUF_METADATA_UINT8_PTR(m, 32);

-   if (m->ol_flags & PKT_RX_IPV4_HDR) {
+   if (RTE_ETH_IS_IPV4_HDR(m->packet_type)) {
ip_hdr = (struct ipv4_hdr *)
&m_data[sizeof(struct ether_hdr)];
ip_dst = ip_hdr->dst_addr;

k32 = (uint32_t *) key;
k32[0] = ip_dst & 0xFF00;
-   } else {
+   } else if (RTE_ETH_IS_IPV6_HDR(m->packet_type)) {
ipv6_hdr = (struct ipv6_hdr *)
&m_data[sizeof(struct ether_hdr)];
ipv6_dst = ipv6_hdr->dst_addr;
-- 
1.8.1.4



[dpdk-dev] [PATCH 10/17] app/test-pmd: support of unified packet type

2015-01-29 Thread Helin Zhang
To unify packet types among all PMDs, bit masks and relevant macros
of packet type for ol_flags are replaced by unified packet type and
relevant macros.

Signed-off-by: Helin Zhang 
---
 app/test-pmd/csumonly.c | 6 +++---
 app/test-pmd/rxonly.c   | 9 +++--
 2 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 41711fd..5e08272 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -319,7 +319,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
uint16_t nb_tx;
uint16_t i;
uint64_t ol_flags;
-   uint16_t testpmd_ol_flags;
+   uint16_t testpmd_ol_flags, packet_type;
uint8_t l4_proto, l4_tun_len = 0;
uint16_t ethertype = 0, outer_ethertype = 0;
uint16_t l2_len = 0, l3_len = 0, l4_len = 0;
@@ -362,6 +362,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
tunnel = 0;
l4_tun_len = 0;
m = pkts_burst[i];
+   packet_type = m->packet_type;

/* Update the L3/L4 checksum error packet statistics */
rx_bad_ip_csum += ((m->ol_flags & PKT_RX_IP_CKSUM_BAD) != 0);
@@ -387,8 +388,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)

/* currently, this flag is set by i40e only if the
 * packet is vxlan */
-   } else if (m->ol_flags & (PKT_RX_TUNNEL_IPV4_HDR |
-   PKT_RX_TUNNEL_IPV6_HDR))
+   } else if (RTE_ETH_IS_TUNNEL_PKT(packet_type))
tunnel = 1;

if (tunnel == 1) {
diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c
index fdfe990..8eb68c4 100644
--- a/app/test-pmd/rxonly.c
+++ b/app/test-pmd/rxonly.c
@@ -92,7 +92,7 @@ pkt_burst_receive(struct fwd_stream *fs)
uint64_t ol_flags;
uint16_t nb_rx;
uint16_t i, packet_type;
-   uint64_t is_encapsulation;
+   uint16_t is_encapsulation;

 #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
uint64_t start_tsc;
@@ -135,10 +135,7 @@ pkt_burst_receive(struct fwd_stream *fs)
eth_type = RTE_BE_TO_CPU_16(eth_hdr->ether_type);
ol_flags = mb->ol_flags;
packet_type = mb->packet_type;
-
-   is_encapsulation = ol_flags & (PKT_RX_TUNNEL_IPV4_HDR |
-   PKT_RX_TUNNEL_IPV6_HDR);
-
+   is_encapsulation = RTE_ETH_IS_TUNNEL_PKT(packet_type);
print_ether_addr("  src=", &eth_hdr->s_addr);
print_ether_addr(" - dst=", &eth_hdr->d_addr);
printf(" - type=0x%04x - length=%u - nb_segs=%d",
@@ -174,7 +171,7 @@ pkt_burst_receive(struct fwd_stream *fs)
l2_len  = sizeof(struct ether_hdr);

 /* Do not support ipv4 option field */
-   if (ol_flags & PKT_RX_TUNNEL_IPV4_HDR) {
+   if (RTE_ETH_IS_IPV4_HDR(packet_type)) {
l3_len = sizeof(struct ipv4_hdr);
ipv4_hdr = (struct ipv4_hdr *) 
(rte_pktmbuf_mtod(mb,
unsigned char *) + l2_len);
-- 
1.8.1.4



[dpdk-dev] [PATCH 11/17] app/test: support of unified packet type

2015-01-29 Thread Helin Zhang
To unify packet types among all PMDs, bit masks and relevant macros
of packet type for ol_flags are replaced by unified packet type and
relevant macros.

Signed-off-by: Helin Zhang 
---
 app/test/packet_burst_generator.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/app/test/packet_burst_generator.c 
b/app/test/packet_burst_generator.c
index 4a89663..0a936ea 100644
--- a/app/test/packet_burst_generator.c
+++ b/app/test/packet_burst_generator.c
@@ -258,18 +258,16 @@ nomore_mbuf:
pkt->vlan_tci  = ETHER_TYPE_IPv4;
pkt->l3_len = sizeof(struct ipv4_hdr);

+   pkt->packet_type = RTE_PTYPE_L3_IPV4;
if (vlan_enabled)
-   pkt->ol_flags = PKT_RX_IPV4_HDR | 
PKT_RX_VLAN_PKT;
-   else
-   pkt->ol_flags = PKT_RX_IPV4_HDR;
+   pkt->ol_flags = PKT_RX_VLAN_PKT;
} else {
pkt->vlan_tci  = ETHER_TYPE_IPv6;
pkt->l3_len = sizeof(struct ipv6_hdr);

+   pkt->packet_type = RTE_PTYPE_L3_IPV6;
if (vlan_enabled)
-   pkt->ol_flags = PKT_RX_IPV6_HDR | 
PKT_RX_VLAN_PKT;
-   else
-   pkt->ol_flags = PKT_RX_IPV6_HDR;
+   pkt->ol_flags = PKT_RX_VLAN_PKT;
}

pkts_burst[nb_pkt] = pkt;
-- 
1.8.1.4



[dpdk-dev] [PATCH 12/17] examples/ip_fragmentation: support of unified packet type

2015-01-29 Thread Helin Zhang
To unify packet types among all PMDs, bit masks and relevant macros
of packet type for ol_flags are replaced by unified packet type and
relevant macros.

Signed-off-by: Helin Zhang 
---
 examples/ip_fragmentation/main.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/examples/ip_fragmentation/main.c b/examples/ip_fragmentation/main.c
index eac5427..152844e 100644
--- a/examples/ip_fragmentation/main.c
+++ b/examples/ip_fragmentation/main.c
@@ -286,7 +286,7 @@ l3fwd_simple_forward(struct rte_mbuf *m, struct 
lcore_queue_conf *qconf,
len = qconf->tx_mbufs[port_out].len;

/* if this is an IPv4 packet */
-   if (m->ol_flags & PKT_RX_IPV4_HDR) {
+   if (RTE_ETH_IS_IPV4_HDR(m->packet_type)) {
struct ipv4_hdr *ip_hdr;
uint32_t ip_dst;
/* Read the lookup key (i.e. ip_dst) from the input packet */
@@ -320,9 +320,8 @@ l3fwd_simple_forward(struct rte_mbuf *m, struct 
lcore_queue_conf *qconf,
if (unlikely (len2 < 0))
return;
}
-   }
-   /* if this is an IPv6 packet */
-   else if (m->ol_flags & PKT_RX_IPV6_HDR) {
+   } else if (RTE_ETH_IS_IPV6_HDR(m->packet_type)) {
+   /* if this is an IPv6 packet */
struct ipv6_hdr *ip_hdr;

ipv6 = 1;
-- 
1.8.1.4



[dpdk-dev] [PATCH 14/17] examples/l3fwd-acl: support of unified packet type

2015-01-29 Thread Helin Zhang
To unify packet types among all PMDs, bit masks and relevant macros
of packet type for ol_flags are replaced by unified packet type and
relevant macros.

Signed-off-by: Helin Zhang 
---
 examples/l3fwd-acl/main.c | 19 ++-
 1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/examples/l3fwd-acl/main.c b/examples/l3fwd-acl/main.c
index f1f7601..af70ccd 100644
--- a/examples/l3fwd-acl/main.c
+++ b/examples/l3fwd-acl/main.c
@@ -651,9 +651,7 @@ prepare_one_packet(struct rte_mbuf **pkts_in, struct 
acl_search_t *acl,
struct ipv4_hdr *ipv4_hdr;
struct rte_mbuf *pkt = pkts_in[index];

-   int type = pkt->ol_flags & (PKT_RX_IPV4_HDR | PKT_RX_IPV6_HDR);
-
-   if (type == PKT_RX_IPV4_HDR) {
+   if (RTE_ETH_IS_IPV4_HDR(pkt->packet_type)) {

ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
unsigned char *) + sizeof(struct ether_hdr));
@@ -674,8 +672,7 @@ prepare_one_packet(struct rte_mbuf **pkts_in, struct 
acl_search_t *acl,
rte_pktmbuf_free(pkt);
}

-   } else if (type == PKT_RX_IPV6_HDR) {
-
+   } else if (RTE_ETH_IS_IPV6_HDR(pkt->packet_type)) {
/* Fill acl structure */
acl->data_ipv6[acl->num_ipv6] = MBUF_IPV6_2PROTO(pkt);
acl->m_ipv6[(acl->num_ipv6)++] = pkt;
@@ -693,17 +690,13 @@ prepare_one_packet(struct rte_mbuf **pkts_in, struct 
acl_search_t *acl,
 {
struct rte_mbuf *pkt = pkts_in[index];

-   int type = pkt->ol_flags & (PKT_RX_IPV4_HDR | PKT_RX_IPV6_HDR);
-
-   if (type == PKT_RX_IPV4_HDR) {
-
+   if (RTE_ETH_IS_IPV4_HDR(pkt->packet_type)) {
/* Fill acl structure */
acl->data_ipv4[acl->num_ipv4] = MBUF_IPV4_2PROTO(pkt);
acl->m_ipv4[(acl->num_ipv4)++] = pkt;


-   } else if (type == PKT_RX_IPV6_HDR) {
-
+   } else if (RTE_ETH_IS_IPV6_HDR(pkt->packet_type)) {
/* Fill acl structure */
acl->data_ipv6[acl->num_ipv6] = MBUF_IPV6_2PROTO(pkt);
acl->m_ipv6[(acl->num_ipv6)++] = pkt;
@@ -751,9 +744,9 @@ send_one_packet(struct rte_mbuf *m, uint32_t res)
/* in the ACL list, drop it */
 #ifdef L3FWDACL_DEBUG
if ((res & ACL_DENY_SIGNATURE) != 0) {
-   if (m->ol_flags & PKT_RX_IPV4_HDR)
+   if (RTE_ETH_IS_IPV4_HDR(m->packet_type))
dump_acl4_rule(m, res);
-   else
+   else if (RTE_ETH_IS_IPV6_HDR(m->packet_type))
dump_acl6_rule(m, res);
}
 #endif
-- 
1.8.1.4



[dpdk-dev] [PATCH 13/17] examples/ip_reassembly: support of unified packet type

2015-01-29 Thread Helin Zhang
To unify packet types among all PMDs, bit masks and relevant macros
of packet type for ol_flags are replaced by unified packet type and
relevant macros.

Signed-off-by: Helin Zhang 
---
 examples/ip_reassembly/main.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/examples/ip_reassembly/main.c b/examples/ip_reassembly/main.c
index 8492153..5ef2135 100644
--- a/examples/ip_reassembly/main.c
+++ b/examples/ip_reassembly/main.c
@@ -357,7 +357,7 @@ reassemble(struct rte_mbuf *m, uint8_t portid, uint32_t 
queue,
dst_port = portid;

/* if packet is IPv4 */
-   if (m->ol_flags & (PKT_RX_IPV4_HDR)) {
+   if (RTE_ETH_IS_IPV4_HDR(m->packet_type)) {
struct ipv4_hdr *ip_hdr;
uint32_t ip_dst;

@@ -397,9 +397,8 @@ reassemble(struct rte_mbuf *m, uint8_t portid, uint32_t 
queue,
}

eth_hdr->ether_type = rte_be_to_cpu_16(ETHER_TYPE_IPv4);
-   }
-   /* if packet is IPv6 */
-   else if (m->ol_flags & (PKT_RX_IPV6_HDR | PKT_RX_IPV6_HDR_EXT)) {
+   } else if (RTE_ETH_IS_IPV6_HDR(m->packet_type)) {
+   /* if packet is IPv6 */
struct ipv6_extension_fragment *frag_hdr;
struct ipv6_hdr *ip_hdr;

-- 
1.8.1.4



[dpdk-dev] [PATCH 15/17] examples/l3fwd-power: support of unified packet type

2015-01-29 Thread Helin Zhang
To unify packet types among all PMDs, bit masks and relevant macros
of packet type for ol_flags are replaced by unified packet type and
relevant macros.

Signed-off-by: Helin Zhang 
---
 examples/l3fwd-power/main.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c
index f6b55b9..964e5b9 100644
--- a/examples/l3fwd-power/main.c
+++ b/examples/l3fwd-power/main.c
@@ -638,7 +638,7 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t portid,

eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);

-   if (m->ol_flags & PKT_RX_IPV4_HDR) {
+   if (RTE_ETH_IS_IPV4_HDR(m->packet_type)) {
/* Handle IPv4 headers.*/
ipv4_hdr =
(struct ipv4_hdr *)(rte_pktmbuf_mtod(m, unsigned char*)
@@ -673,8 +673,7 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t portid,
ether_addr_copy(&ports_eth_addr[dst_port], &eth_hdr->s_addr);

send_single_packet(m, dst_port);
-   }
-   else {
+   } else if (RTE_ETH_IS_IPV6_HDR(m->packet_type)) {
/* Handle IPv6 headers.*/
 #if (APP_LOOKUP_METHOD == APP_LOOKUP_EXACT_MATCH)
struct ipv6_hdr *ipv6_hdr;
-- 
1.8.1.4



[dpdk-dev] [PATCH 17/17] mbuf: remove old packet type bit masks for ol_flags

2015-01-29 Thread Helin Zhang
To unify packet types among all PMDs, bit masks and relevant macros
of packet type for ol_flags are replaced by unified packet type and
relevant macros.

Signed-off-by: Helin Zhang 
---
 lib/librte_mbuf/rte_mbuf.c |  6 --
 lib/librte_mbuf/rte_mbuf.h | 10 ++
 2 files changed, 2 insertions(+), 14 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 1b14e02..8050ccf 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -215,14 +215,8 @@ const char *rte_get_rx_ol_flag_name(uint64_t mask)
/* case PKT_RX_HBUF_OVERFLOW: return "PKT_RX_HBUF_OVERFLOW"; */
/* case PKT_RX_RECIP_ERR: return "PKT_RX_RECIP_ERR"; */
/* case PKT_RX_MAC_ERR: return "PKT_RX_MAC_ERR"; */
-   case PKT_RX_IPV4_HDR: return "PKT_RX_IPV4_HDR";
-   case PKT_RX_IPV4_HDR_EXT: return "PKT_RX_IPV4_HDR_EXT";
-   case PKT_RX_IPV6_HDR: return "PKT_RX_IPV6_HDR";
-   case PKT_RX_IPV6_HDR_EXT: return "PKT_RX_IPV6_HDR_EXT";
case PKT_RX_IEEE1588_PTP: return "PKT_RX_IEEE1588_PTP";
case PKT_RX_IEEE1588_TMST: return "PKT_RX_IEEE1588_TMST";
-   case PKT_RX_TUNNEL_IPV4_HDR: return "PKT_RX_TUNNEL_IPV4_HDR";
-   case PKT_RX_TUNNEL_IPV6_HDR: return "PKT_RX_TUNNEL_IPV6_HDR";
default: return NULL;
}
 }
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 94ae344..5df0d61 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -90,16 +90,10 @@ extern "C" {
 #define PKT_RX_HBUF_OVERFLOW (0ULL << 0)  /**< Header buffer overflow. */
 #define PKT_RX_RECIP_ERR (0ULL << 0)  /**< Hardware processing error. */
 #define PKT_RX_MAC_ERR   (0ULL << 0)  /**< MAC error. */
-#define PKT_RX_IPV4_HDR  (1ULL << 5)  /**< RX packet with IPv4 header. */
-#define PKT_RX_IPV4_HDR_EXT  (1ULL << 6)  /**< RX packet with extended IPv4 
header. */
-#define PKT_RX_IPV6_HDR  (1ULL << 7)  /**< RX packet with IPv6 header. */
-#define PKT_RX_IPV6_HDR_EXT  (1ULL << 8)  /**< RX packet with extended IPv6 
header. */
 #define PKT_RX_IEEE1588_PTP  (1ULL << 9)  /**< RX IEEE1588 L2 Ethernet PT 
Packet. */
 #define PKT_RX_IEEE1588_TMST (1ULL << 10) /**< RX IEEE1588 L2/L4 timestamped 
packet.*/
-#define PKT_RX_TUNNEL_IPV4_HDR (1ULL << 11) /**< RX tunnel packet with IPv4 
header.*/
-#define PKT_RX_TUNNEL_IPV6_HDR (1ULL << 12) /**< RX tunnel packet with IPv6 
header. */
-#define PKT_RX_FDIR_ID   (1ULL << 13) /**< FD id reported if FDIR match. */
-#define PKT_RX_FDIR_FLX  (1ULL << 14) /**< Flexible bytes reported if FDIR 
match. */
+#define PKT_RX_FDIR_ID   (1ULL << 11) /**< FD id reported if FDIR match. */
+#define PKT_RX_FDIR_FLX  (1ULL << 12) /**< Flexible bytes reported if FDIR 
match. */
 /* add new RX flags here */

 /* add new TX flags here */
-- 
1.8.1.4



[dpdk-dev] [PATCH 16/17] examples/l3fwd: support of unified packet type

2015-01-29 Thread Helin Zhang
To unify packet types among all PMDs, bit masks and relevant macros
of packet type for ol_flags are replaced by unified packet type and
relevant macros.

Signed-off-by: Helin Zhang 
---
 examples/l3fwd/main.c | 64 +--
 1 file changed, 37 insertions(+), 27 deletions(-)

diff --git a/examples/l3fwd/main.c b/examples/l3fwd/main.c
index 6f7d7d4..d02a19c 100644
--- a/examples/l3fwd/main.c
+++ b/examples/l3fwd/main.c
@@ -958,7 +958,7 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t portid, 
struct lcore_conf *qcon

eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);

-   if (m->ol_flags & PKT_RX_IPV4_HDR) {
+   if (RTE_ETH_IS_IPV4_HDR(m->packet_type)) {
/* Handle IPv4 headers.*/
ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(m, unsigned 
char *) +
sizeof(struct ether_hdr));
@@ -993,7 +993,7 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t portid, 
struct lcore_conf *qcon

send_single_packet(m, dst_port);

-   } else {
+   } else if (RTE_ETH_IS_IPV6_HDR(m->packet_type)) {
/* Handle IPv6 headers.*/
struct ipv6_hdr *ipv6_hdr;

@@ -1039,11 +1039,11 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t 
portid, struct lcore_conf *qcon
  * to BAD_PORT value.
  */
 static inline __attribute__((always_inline)) void
-rfc1812_process(struct ipv4_hdr *ipv4_hdr, uint16_t *dp, uint32_t flags)
+rfc1812_process(struct ipv4_hdr *ipv4_hdr, uint16_t *dp, uint16_t ptype)
 {
uint8_t ihl;

-   if ((flags & PKT_RX_IPV4_HDR) != 0) {
+   if (RTE_ETH_IS_IPV4_HDR(ptype)) {

ihl = ipv4_hdr->version_ihl - IPV4_MIN_VER_IHL;

@@ -1074,11 +1074,11 @@ get_dst_port(const struct lcore_conf *qconf, struct 
rte_mbuf *pkt,
struct ipv6_hdr *ipv6_hdr;
struct ether_hdr *eth_hdr;

-   if (pkt->ol_flags & PKT_RX_IPV4_HDR) {
+   if (RTE_ETH_IS_IPV4_HDR(pkt->packet_type)) {
if (rte_lpm_lookup(qconf->ipv4_lookup_struct, dst_ipv4,
&next_hop) != 0)
next_hop = portid;
-   } else if (pkt->ol_flags & PKT_RX_IPV6_HDR) {
+   } else if (RTE_ETH_IS_IPV6_HDR(pkt->packet_type)) {
eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
ipv6_hdr = (struct ipv6_hdr *)(eth_hdr + 1);
if (rte_lpm6_lookup(qconf->ipv6_lookup_struct,
@@ -1112,7 +1112,7 @@ process_packet(struct lcore_conf *qconf, struct rte_mbuf 
*pkt,
ve = val_eth[dp];

dst_port[0] = dp;
-   rfc1812_process(ipv4_hdr, dst_port, pkt->ol_flags);
+   rfc1812_process(ipv4_hdr, dst_port, pkt->packet_type);

te =  _mm_blend_epi16(te, ve, MASK_ETH);
_mm_store_si128((__m128i *)eth_hdr, te);
@@ -1122,7 +1122,7 @@ process_packet(struct lcore_conf *qconf, struct rte_mbuf 
*pkt,
  * Read ol_flags and destination IPV4 addresses from 4 mbufs.
  */
 static inline void
-processx4_step1(struct rte_mbuf *pkt[FWDSTEP], __m128i *dip, uint32_t *flag)
+processx4_step1(struct rte_mbuf *pkt[FWDSTEP], __m128i *dip, int *ipv4_flag)
 {
struct ipv4_hdr *ipv4_hdr;
struct ether_hdr *eth_hdr;
@@ -1131,22 +1131,22 @@ processx4_step1(struct rte_mbuf *pkt[FWDSTEP], __m128i 
*dip, uint32_t *flag)
eth_hdr = rte_pktmbuf_mtod(pkt[0], struct ether_hdr *);
ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
x0 = ipv4_hdr->dst_addr;
-   flag[0] = pkt[0]->ol_flags & PKT_RX_IPV4_HDR;

eth_hdr = rte_pktmbuf_mtod(pkt[1], struct ether_hdr *);
ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
x1 = ipv4_hdr->dst_addr;
-   flag[0] &= pkt[1]->ol_flags;

eth_hdr = rte_pktmbuf_mtod(pkt[2], struct ether_hdr *);
ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
x2 = ipv4_hdr->dst_addr;
-   flag[0] &= pkt[2]->ol_flags;

eth_hdr = rte_pktmbuf_mtod(pkt[3], struct ether_hdr *);
ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
x3 = ipv4_hdr->dst_addr;
-   flag[0] &= pkt[3]->ol_flags;
+   *ipv4_flag = RTE_ETH_IS_IPV4_HDR(pkt[0]->packet_type) &&
+   RTE_ETH_IS_IPV4_HDR(pkt[1]->packet_type) &&
+   RTE_ETH_IS_IPV4_HDR(pkt[2]->packet_type) &&
+   RTE_ETH_IS_IPV4_HDR(pkt[3]->packet_type);

dip[0] = _mm_set_epi32(x3, x2, x1, x0);
 }
@@ -1156,7 +1156,7 @@ processx4_step1(struct rte_mbuf *pkt[FWDSTEP], __m128i 
*dip, uint32_t *flag)
  * If lookup fails, use incoming port (portid) as destination port.
  */
 static inline void
-processx4_step2(const struct lcore_conf *qconf, __m128i dip, uint32_t flag,
+processx4_step2(const struct lcore_conf *qconf, __m128i dip, int ipv4_flag,
uint8_t portid, struct rte_mbuf *pkt[FWDSTEP], uint16_t dprt[FWDSTEP])
 {
rte_xmm_t dst;
@@ -1167,7 +1167,7 @@ processx4_step2(const struct lcore_conf *qconf, __m128i 
dip, uint32_t flag,
dip = _mm_shuffl

[dpdk-dev] [PATCH v1 2/5] ixgbe: enable rx queue interrupts for both PF and VF

2015-01-29 Thread Qiu, Michael
On 1/28/2015 5:52 PM, Danny Zhou wrote:
> Signed-off-by: Danny Zhou 
> ---
>  lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 371 
> 
>  lib/librte_pmd_ixgbe/ixgbe_ethdev.h |   9 +
>  2 files changed, 380 insertions(+)
>
> diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c 
> b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
> index b341dd0..39f883a 100644
> --- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
> +++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
> @@ -60,6 +60,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #include "ixgbe_logs.h"
> @@ -173,6 +174,7 @@ static int ixgbe_dev_rss_reta_query(struct rte_eth_dev 
> *dev,
>   uint16_t reta_size);
>  static void ixgbe_dev_link_status_print(struct rte_eth_dev *dev);
>  static int ixgbe_dev_lsc_interrupt_setup(struct rte_eth_dev *dev);
> +static int ixgbe_dev_rxq_interrupt_setup(struct rte_eth_dev *dev);
>  static int ixgbe_dev_interrupt_get_status(struct rte_eth_dev *dev);
>  static int ixgbe_dev_interrupt_action(struct rte_eth_dev *dev);
>  static void ixgbe_dev_interrupt_handler(struct rte_intr_handle *handle,
> @@ -186,11 +188,14 @@ static void ixgbe_dcb_init(struct ixgbe_hw *hw,struct 
> ixgbe_dcb_config *dcb_conf
>  /* For Virtual Function support */
>  static int eth_ixgbevf_dev_init(struct eth_driver *eth_drv,
>   struct rte_eth_dev *eth_dev);
> +static int ixgbevf_dev_interrupt_get_status(struct rte_eth_dev *dev);
> +static int ixgbevf_dev_interrupt_action(struct rte_eth_dev *dev);
>  static int  ixgbevf_dev_configure(struct rte_eth_dev *dev);
>  static int  ixgbevf_dev_start(struct rte_eth_dev *dev);
>  static void ixgbevf_dev_stop(struct rte_eth_dev *dev);
>  static void ixgbevf_dev_close(struct rte_eth_dev *dev);
>  static void ixgbevf_intr_disable(struct ixgbe_hw *hw);
> +static void ixgbevf_intr_enable(struct ixgbe_hw *hw);
>  static void ixgbevf_dev_stats_get(struct rte_eth_dev *dev,
>   struct rte_eth_stats *stats);
>  static void ixgbevf_dev_stats_reset(struct rte_eth_dev *dev);
> @@ -198,8 +203,15 @@ static int ixgbevf_vlan_filter_set(struct rte_eth_dev 
> *dev,
>   uint16_t vlan_id, int on);
>  static void ixgbevf_vlan_strip_queue_set(struct rte_eth_dev *dev,
>   uint16_t queue, int on);
> +static void ixgbevf_set_ivar(struct ixgbe_hw *hw, s8 direction, u8 queue, u8 
> msix_vector);

^^^
>  static void ixgbevf_vlan_offload_set(struct rte_eth_dev *dev, int mask);
>  static void ixgbevf_set_vfta_all(struct rte_eth_dev *dev, bool on);
> +static void ixgbevf_dev_interrupt_handler(struct rte_intr_handle *handle,
> + void *param);
> +static int ixgbevf_dev_rx_queue_intr_enable(struct rte_eth_dev *dev, 
> uint16_t queue_id);
> +static int ixgbevf_dev_rx_queue_intr_disable(struct rte_eth_dev *dev, 
> uint16_t queue_id);
> +static void ixgbevf_set_ivar(struct ixgbe_hw *hw, s8 direction, u8 queue, u8 
> msix_vector);

Why is static void ixgbevf_set_ivar() declared twice? Or are they
different?

> +static void ixgbevf_configure_msix(struct  ixgbe_hw *hw);
>  
>  /* For Eth VMDQ APIs support */
>  static int ixgbe_uc_hash_table_set(struct rte_eth_dev *dev, struct
> @@ -217,6 +229,11 @@ static int ixgbe_mirror_rule_set(struct rte_eth_dev *dev,
>  static int ixgbe_mirror_rule_reset(struct rte_eth_dev *dev,
>   uint8_t rule_id);

[...]
> +static void
> +ixgbe_configure_msix(struct ixgbe_hw *hw)
> +{
> + int queue_id;
> + u32 mask;
> + u32 gpie;
> +
> + /* set GPIE for in MSI-x mode */
> + gpie = IXGBE_READ_REG(hw, IXGBE_GPIE);
> + gpie = IXGBE_GPIE_MSIX_MODE | IXGBE_GPIE_PBA_SUPPORT |
> +IXGBE_GPIE_OCD;
> + gpie |= IXGBE_GPIE_EIAME;

As you will override gpie with other flags, why do you need to read the reg and
save it to gpie first?

Maybe read the reg to reset?

I guess should be:

+   gpie = IXGBE_READ_REG(hw, IXGBE_GPIE);
+   gpie |= IXGBE_GPIE_MSIX_MODE | IXGBE_GPIE_PBA_SUPPORT |
+   IXGBE_GPIE_OCD | IXGBE_GPIE_EIAME;

Maybe not correct, as I am not familiar with IXGBE.


> + /*
> +  * use EIAM to auto-mask when MSI-X interrupt is asserted
> +  * this saves a register write for every interrupt
> +  */
> + switch (hw->mac.type) {
> + case ixgbe_mac_82598EB:
> + IXGBE_WRITE_REG(hw, IXGBE_EIAM, IXGBE_EICS_RTX_QUEUE);
> + break;
> + case ixgbe_mac_82599EB:
> + case ixgbe_mac_X540:
> + default:
> + IXGBE_WRITE_REG(hw, IXGBE_EIAM_EX(0), 0x);
> + IXGBE_WRITE_REG(hw, IXGBE_EIAM_EX(1), 0x);
> + break;
> + }
> + IXGBE_WRITE_REG(hw, IXGBE_GPIE, gpie);
> +
> + /*
> + * Populate the IVAR table and set the ITR values to the
> + * corresponding register.
> + */
> + for (queue_id = 0; queue_id < VFIO_MAX_QUEUE_ID; queue_id++)
> + ixgbe_set_ivar(hw, 0, queue_id, q

[dpdk-dev] [PATCH 0/4] DPDK memcpy optimization

2015-01-29 Thread Fu, JingguoX
Basic Information

Patch name: DPDK memcpy optimization
Brief description about test purpose: Verify memory copy and memory 
copy performance cases on a variety of OSes
Test Flag: Tested-by
Tester name: jingguox.fu at intel.com

Test Tool Chain information N/A
  Commit ID 88fa98a60b34812bfed92e5b2706fcf7e1cbcbc8
Test Result Summary Total 6 cases, 6 passed, 0 failed

Test environment

-   Environment 1:
OS: Ubuntu12.04 3.2.0-23-generic X86_64
GCC: gcc version 4.6.3
CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
NIC: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ [8086:10fb] (rev 
01)

-   Environment 2: 
OS: Ubuntu14.04 3.13.0-24-generic
GCC: gcc version 4.8.2
CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
NIC: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ [8086:10fb] (rev 
01)

Environment 3:
OS: Fedora18 3.6.10-4.fc18.x86_64
GCC: gcc version 4.7.2 20121109
CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
NIC: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ [8086:10fb] (rev 
01)


Detailed Testing information

  Test Case - nametest_memcpy
Test Case - Description 
  Create two buffers, and initialise one with random values. 
These are copied 
  to the second buffer and then compared to see if the copy was 
successful. The 
  bytes outside the copied area are also checked to make sure 
they were not changed.
Test Case -test sample/application
  test application in app/test
Test Case -command / instruction
  # ./app/test/test -n 1 -c 
  #RTE>> memcpy_autotest
Test Case - expected
  #RTE>> Test   OK
Test Result- PASSED

Test Case - nametest_memcpy_perf
Test Case - Description
  a number of different sizes and cached/uncached permutations
Test Case -test sample/application
  test application in app/test
Test Case -command / instruction
  # ./app/test/test -n 1 -c 
  #RTE>> memcpy_perf_autotest
Test Case - expected
  #RTE>> Test   OK
Test Result- PASSED


-Original Message-
From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of zhihong.w...@intel.com
Sent: Monday, January 19, 2015 09:54
To: dev at dpdk.org
Subject: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization

This patch set optimizes memcpy for DPDK for both SSE and AVX platforms.
It also extends memcpy test coverage with unaligned cases and more test points.

Optimization techniques are summarized below:

1. Utilize full cache bandwidth

2. Enforce aligned stores

3. Apply load address alignment based on architecture features

4. Make load/store address available as early as possible

5. General optimization techniques like inlining, branch reducing, prefetch 
pattern access

Zhihong Wang (4):
  Disabled VTA for memcpy test in app/test/Makefile
  Removed unnecessary test cases in test_memcpy.c
  Extended test coverage in test_memcpy_perf.c
  Optimized memcpy in arch/x86/rte_memcpy.h for both SSE and AVX
platforms

 app/test/Makefile  |   6 +
 app/test/test_memcpy.c |  52 +-
 app/test/test_memcpy_perf.c| 238 +---
 .../common/include/arch/x86/rte_memcpy.h   | 664 +++--
 4 files changed, 656 insertions(+), 304 deletions(-)

-- 
1.9.3



[dpdk-dev] [PATCH v1 4/5] eal: add per rx queue interrupt handling based on VFIO

2015-01-29 Thread Zhou, Danny
Thanks for the review, Steve. Comments inline.

> -Original Message-
> From: Liang, Cunming
> Sent: Thursday, January 29, 2015 2:10 AM
> To: Zhou, Danny; dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v1 4/5] eal: add per rx queue interrupt 
> handling based on VFIO
> 
> 
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Danny Zhou
> > Sent: Wednesday, January 28, 2015 2:51 AM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH v1 4/5] eal: add per rx queue interrupt handling
> > based on VFIO
> >
> > Signed-off-by: Danny Zhou 
> > Signed-off-by: Yong Liu 
> > ---
> >  lib/librte_eal/common/include/rte_eal.h|   9 +
> >  lib/librte_eal/linuxapp/eal/eal_interrupts.c   | 186
> > -
> >  lib/librte_eal/linuxapp/eal/eal_pci_vfio.c |  11 +-
> >  .../linuxapp/eal/include/exec-env/rte_interrupts.h |   4 +
> >  4 files changed, 168 insertions(+), 42 deletions(-)
> >
> > diff --git a/lib/librte_eal/common/include/rte_eal.h
> > b/lib/librte_eal/common/include/rte_eal.h
> > index f4ecd2e..5f31aa5 100644
> > --- a/lib/librte_eal/common/include/rte_eal.h
> > +++ b/lib/librte_eal/common/include/rte_eal.h
> > @@ -150,6 +150,15 @@ int rte_eal_iopl_init(void);
> >   *   - On failure, a negative error value.
> >   */
> >  int rte_eal_init(int argc, char **argv);
> > +
> > +/**
> > + * @param port_id
> > + *   the port id
> > + * @return
> > + *   - On success, return 0
> [LCM] It has changes to return -1.
> > + */
> > +int rte_eal_wait_rx_intr(uint8_t port_id, uint8_t queue_id);
> > +
> >  /**
> >   * Usage function typedef used by the application usage function.
> >   *
> > diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
> > b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
> > index dc2668a..b120303 100644
> > --- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
> > +++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
> > @@ -64,6 +64,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >
> >  #include "eal_private.h"
> >  #include "eal_vfio.h"
> > @@ -127,6 +128,7 @@ static pthread_t intr_thread;
> >  #ifdef VFIO_PRESENT
> >
> >  #define IRQ_SET_BUF_LEN  (sizeof(struct vfio_irq_set) + sizeof(int))
> > +#define MSIX_IRQ_SET_BUF_LEN (sizeof(struct vfio_irq_set) + sizeof(int) *
> > (VFIO_MAX_QUEUE_ID + 1))
> >
> >  /* enable legacy (INTx) interrupts */
> >  static int
> > @@ -221,7 +223,7 @@ vfio_disable_intx(struct rte_intr_handle *intr_handle) {
> >  /* enable MSI-X interrupts */
> >  static int
> >  vfio_enable_msi(struct rte_intr_handle *intr_handle) {
> > -   int len, ret;
> > +   int len, ret, max_intr;
> > char irq_set_buf[IRQ_SET_BUF_LEN];
> > struct vfio_irq_set *irq_set;
> > int *fd_ptr;
> > @@ -230,12 +232,19 @@ vfio_enable_msi(struct rte_intr_handle *intr_handle)
> > {
> >
> > irq_set = (struct vfio_irq_set *) irq_set_buf;
> > irq_set->argsz = len;
> > -   irq_set->count = 1;
> > +   if ((!intr_handle->max_intr) ||
> > +   (intr_handle->max_intr > VFIO_MAX_QUEUE_ID))
> > +   max_intr = VFIO_MAX_QUEUE_ID + 1;
> > +   else
> > +   max_intr = intr_handle->max_intr;
> > +
> > +   irq_set->count = max_intr;
> > irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
> > VFIO_IRQ_SET_ACTION_TRIGGER;
> > irq_set->index = VFIO_PCI_MSI_IRQ_INDEX;
> > irq_set->start = 0;
> > fd_ptr = (int *) &irq_set->data;
> > -   *fd_ptr = intr_handle->fd;
> > +   memcpy(fd_ptr, intr_handle->queue_fd, sizeof(intr_handle->queue_fd));
> > +   fd_ptr[max_intr - 1] = intr_handle->fd;
> >
> > ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
> >
> > @@ -244,23 +253,6 @@ vfio_enable_msi(struct rte_intr_handle *intr_handle) {
> > intr_handle->fd);
> > return -1;
> > }
> > -
> > -   /* manually trigger interrupt to enable it */
> > -   memset(irq_set, 0, len);
> > -   len = sizeof(struct vfio_irq_set);
> > -   irq_set->argsz = len;
> > -   irq_set->count = 1;
> > -   irq_set->flags = VFIO_IRQ_SET_DATA_NONE |
> > VFIO_IRQ_SET_ACTION_TRIGGER;
> > -   irq_set->index = VFIO_PCI_MSI_IRQ_INDEX;
> > -   irq_set->start = 0;
> > -
> > -   ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
> > -
> > -   if (ret) {
> > -   RTE_LOG(ERR, EAL, "Error triggering MSI interrupts for fd %d\n",
> > -   intr_handle->fd);
> > -   return -1;
> > -   }
> > return 0;
> >  }
> >
> > @@ -292,8 +284,8 @@ vfio_disable_msi(struct rte_intr_handle *intr_handle) {
> >  /* enable MSI-X interrupts */
> >  static int
> >  vfio_enable_msix(struct rte_intr_handle *intr_handle) {
> > -   int len, ret;
> > -   char irq_set_buf[IRQ_SET_BUF_LEN];
> > +   int len, ret, max_intr;
> > +   char irq_set_buf[MSIX_IRQ_SET_BUF_LEN];
> > struct vfio_irq_set *irq_set;
> > int *fd_ptr;
> >
> > @@ -301,12 +293,19 @@ vfio_enable_msix(struct rte_intr_handle *intr_handle)
> 

[dpdk-dev] [PATCH v1 5/5] L3fwd-power: enable one-shot rx interrupt and polling/interrupt mode switch

2015-01-29 Thread Zhou, Danny


> -Original Message-
> From: Liang, Cunming
> Sent: Thursday, January 29, 2015 2:34 AM
> To: Zhou, Danny; dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v1 5/5] L3fwd-power: enable one-shot rx 
> interrupt and polling/interrupt mode switch
> 
> 
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Danny Zhou
> > Sent: Wednesday, January 28, 2015 2:51 AM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH v1 5/5] L3fwd-power: enable one-shot rx interrupt
> > and polling/interrupt mode switch
> >
> > Signed-off-by: Danny Zhou 
> > ---
> >  examples/l3fwd-power/main.c | 170
> > +---
> >  1 file changed, 129 insertions(+), 41 deletions(-)
> >
> > diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c
> > index f6b55b9..e6e4f55 100644
> > --- a/examples/l3fwd-power/main.c
> > +++ b/examples/l3fwd-power/main.c
> > @@ -75,12 +75,13 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >
> >  #define RTE_LOGTYPE_L3FWD_POWER RTE_LOGTYPE_USER1
> >
> >  #define MAX_PKT_BURST 32
> >
> > -#define MIN_ZERO_POLL_COUNT 5
> > +#define MIN_ZERO_POLL_COUNT 10
> >
> >  /* around 100ms at 2 Ghz */
> >  #define TIMER_RESOLUTION_CYCLES   2ULL
> > @@ -188,6 +189,9 @@ struct lcore_rx_queue {
> >  #define MAX_TX_QUEUE_PER_PORT RTE_MAX_ETHPORTS
> >  #define MAX_RX_QUEUE_PER_PORT 128
> >
> > +#define MAX_RX_QUEUE_INTERRUPT_PER_PORT 16
> > +
> > +
> >  #define MAX_LCORE_PARAMS 1024
> >  struct lcore_params {
> > uint8_t port_id;
> > @@ -214,7 +218,7 @@ static uint16_t nb_lcore_params =
> > sizeof(lcore_params_array_default) /
> >
> >  static struct rte_eth_conf port_conf = {
> > .rxmode = {
> > -   .mq_mode= ETH_MQ_RX_RSS,
> > +   .mq_mode = ETH_MQ_RX_RSS,
> > .max_rx_pkt_len = ETHER_MAX_LEN,
> > .split_hdr_size = 0,
> > .header_split   = 0, /**< Header Split disabled */
> > @@ -226,11 +230,14 @@ static struct rte_eth_conf port_conf = {
> > .rx_adv_conf = {
> > .rss_conf = {
> > .rss_key = NULL,
> > -   .rss_hf = ETH_RSS_IP,
> > +   .rss_hf = ETH_RSS_UDP,
> > },
> > },
> > .txmode = {
> > -   .mq_mode = ETH_DCB_NONE,
> > +   .mq_mode = ETH_MQ_TX_NONE,
> > +   },
> > +   .intr_conf = {
> > +   .rxq = 1, /**< rxq interrupt feature enabled */
> > },
> >  };
> >
> > @@ -402,19 +409,22 @@ power_timer_cb(__attribute__((unused)) struct
> > rte_timer *tim,
> > /* accumulate total execution time in us when callback is invoked */
> > sleep_time_ratio = (float)(stats[lcore_id].sleep_time) /
> > (float)SCALING_PERIOD;
> > -
> > /**
> >  * check whether need to scale down frequency a step if it sleep a lot.
> >  */
> > -   if (sleep_time_ratio >= SCALING_DOWN_TIME_RATIO_THRESHOLD)
> > -   rte_power_freq_down(lcore_id);
> > +   if (sleep_time_ratio >= SCALING_DOWN_TIME_RATIO_THRESHOLD) {
> > +   if (rte_power_freq_down)
> > +   rte_power_freq_down(lcore_id);
> > +   }
> > else if ( (unsigned)(stats[lcore_id].nb_rx_processed /
> > -   stats[lcore_id].nb_iteration_looped) < MAX_PKT_BURST)
> > +   stats[lcore_id].nb_iteration_looped) < MAX_PKT_BURST) {
> > /**
> >  * scale down a step if average packet per iteration less
> >  * than expectation.
> >  */
> > -   rte_power_freq_down(lcore_id);
> > +   if (rte_power_freq_down)
> > +   rte_power_freq_down(lcore_id);
> > +   }
> >
> > /**
> >  * initialize another timer according to current frequency to ensure
> > @@ -707,22 +717,20 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t
> > portid,
> >
> >  }
> >
> > -#define SLEEP_GEAR1_THRESHOLD100
> > -#define SLEEP_GEAR2_THRESHOLD1000
> > +#define MINIMUM_SLEEP_TIME 1
> > +#define SUSPEND_THRESHOLD  300
> >
> >  static inline uint32_t
> >  power_idle_heuristic(uint32_t zero_rx_packet_count)
> >  {
> > -   /* If zero count is less than 100, use it as the sleep time in us */
> > -   if (zero_rx_packet_count < SLEEP_GEAR1_THRESHOLD)
> > -   return zero_rx_packet_count;
> > -   /* If zero count is less than 1000, sleep time should be 100 us */
> > -   else if ((zero_rx_packet_count >= SLEEP_GEAR1_THRESHOLD) &&
> > -   (zero_rx_packet_count < SLEEP_GEAR2_THRESHOLD))
> > -   return SLEEP_GEAR1_THRESHOLD;
> > -   /* If zero count is greater than 1000, sleep time should be 1000 us */
> > -   else if (zero_rx_packet_count >= SLEEP_GEAR2_THRESHOLD)
> > -   return SLEEP_GEAR2_THRESHOLD;
> > +   /* If zero count is less than 100,  sleep 1us */
> > +   if (zero_rx_packet_count < SUSPEND_THRESHOLD)
> > +   return MINIMUM_SLEEP_TIME;
> > +   /* If zero count is less than 100

[dpdk-dev] [PATCH 2/2] i40e: enable internal switch of pf

2015-01-29 Thread Wu, Jingjing
Hi, Michael

> -Original Message-
> From: Qiu, Michael
> Sent: Thursday, January 29, 2015 9:56 AM
> To: Wu, Jingjing; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 2/2] i40e: enable internal switch of pf
> 
> On 1/29/2015 9:42 AM, Jingjing Wu wrote:
> > This patch enables PF's internal switch by setting ALLOWLOOPBACK flag
> > when VEB is created. With this patch, traffic from PF can be switched
> > on the VEB.
> >
> > Signed-off-by: Jingjing Wu 
> > ---
> >  lib/librte_pmd_i40e/i40e_ethdev.c | 36
> > 
> >  1 file changed, 36 insertions(+)
> >
> > diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c
> > b/lib/librte_pmd_i40e/i40e_ethdev.c
> > index fe758c2..94fd36c 100644
> > --- a/lib/librte_pmd_i40e/i40e_ethdev.c
> > +++ b/lib/librte_pmd_i40e/i40e_ethdev.c
> > @@ -2854,6 +2854,40 @@ i40e_vsi_dump_bw_config(struct i40e_vsi *vsi)
> > return 0;
> >  }
> >
> > +/*
> > + * i40e_enable_pf_lb
> > + * @pf: pointer to the pf structure
> > + *
> > + * allow loopback on pf
> > + */
> > +static inline void
> > +i40e_enable_pf_lb(struct i40e_pf *pf) {
> > +   struct i40e_hw *hw = I40E_PF_TO_HW(pf);
> > +   struct i40e_vsi_context ctxt;
> > +   int ret;
> > +
> > +   memset(&ctxt, 0, sizeof(ctxt));
> > +   ctxt.seid = pf->main_vsi_seid;
> > +   ctxt.pf_num = hw->pf_id;
> > +   ret = i40e_aq_get_vsi_params(hw, &ctxt, NULL);
> > +   if (ret) {
> > +   PMD_DRV_LOG(ERR, "couldn't get pf vsi config, err %d,
> aq_err %d",
> > +   ret, hw->aq.asq_last_status);
> > +   return;
> > +   }
> > +   ctxt.flags = I40E_AQ_VSI_TYPE_PF;
> > +   ctxt.info.valid_sections =
> > +   rte_cpu_to_le_16(I40E_AQ_VSI_PROP_SWITCH_VALID);
> 
> Here, does it need to be "|="? As ctxt.info will be filled in
> i40e_aq_get_vsi_params(), I don't know if there is another issue with overriding
> this field by "=".
> 
> Thanks,
> Michael

You can look at the following lines. What we called is 
i40e_aq_update_vsi_params.
So we need only set the flag we want to update.

Thanks
Jingjing

> > +   ctxt.info.switch_id |=
> > +   rte_cpu_to_le_16(I40E_AQ_VSI_SW_ID_FLAG_ALLOW_LB);
> > +
> > +   ret = i40e_aq_update_vsi_params(hw, &ctxt, NULL);
> > +   if (ret)
> > +   PMD_DRV_LOG(ERR, "update vsi switch failed,
> aq_err=%d\n",
> > +   hw->aq.asq_last_status);
> > +}
> > +
> >  /* Setup a VSI */
> >  struct i40e_vsi *
> >  i40e_vsi_setup(struct i40e_pf *pf,
> > @@ -2889,6 +2923,8 @@ i40e_vsi_setup(struct i40e_pf *pf,
> > PMD_DRV_LOG(ERR, "VEB setup failed");
> > return NULL;
> > }
> > +   /* set ALLOWLOOPBACk on pf, when veb is created */
> > +   i40e_enable_pf_lb(pf);
> > }
> >
> > vsi = rte_zmalloc("i40e_vsi", sizeof(struct i40e_vsi), 0);



[dpdk-dev] [PATCH v1 4/5] eal: add per rx queue interrupt handling based on VFIO

2015-01-29 Thread Qiu, Michael
On 1/28/2015 5:52 PM, Danny Zhou wrote:
> Signed-off-by: Danny Zhou 
> Signed-off-by: Yong Liu 
> ---

[...]

> +static void
> +eal_intr_process_rx_interrupts(uint8_t port_id, struct epoll_event *events, 
> int nfds)
> +{
> + int n, bytes_read;
> + union rte_intr_read_buffer buf;
> + struct rte_intr_handle intr_handle = 
> rte_eth_devices[port_id].pci_dev->intr_handle;
> +
> + for (n = 0; n < nfds; n++) {
> + /* set the length to be read dor different handle type */
> + switch (intr_handle.type) {
> + case RTE_INTR_HANDLE_UIO:
> + bytes_read = sizeof(buf.uio_intr_count);
> + break;
> + case RTE_INTR_HANDLE_ALARM:
> + bytes_read = sizeof(buf.timerfd_num);
> + break;
> +#ifdef VFIO_PRESENT
> + case RTE_INTR_HANDLE_VFIO_MSIX:
> + case RTE_INTR_HANDLE_VFIO_MSI:
> + case RTE_INTR_HANDLE_VFIO_LEGACY:
> + bytes_read = sizeof(buf.vfio_intr_count);
> + break;
> +#endif
> + default:
> + bytes_read = 1;
> + break;
> + }
> +
> + /**
> + * read out to clear the ready-to-be-read flag
> + * for epoll_wait.
> + */
> + bytes_read = read(events[n].data.fd, &buf, bytes_read);
> + if (bytes_read < 0)
> + RTE_LOG(ERR, EAL, "Error reading from file "
> + "descriptor %d: %s\n", events[n].data.fd,
> + strerror(errno));
> + else if (bytes_read == 0)
> + RTE_LOG(ERR, EAL, "Read nothing from file "
> + "descriptor %d\n", events[n].data.fd);

Is there any issue if bytes_read is not equal to the count that needs to be
read?

> + }
> +}
> +
> +static void
> +eal_intr_handle_rx_interrupts(uint8_t port_id, int pfd, unsigned totalfds)
> +{
> + struct epoll_event events[totalfds];
> + int nfds = 0;
> +
> +m_wait:
> + nfds = epoll_wait(pfd, events, totalfds,
> + EAL_INTR_EPOLL_WAIT_FOREVER);
> + /* epoll_wait fail */
> + if (nfds < 0) {
> + RTE_LOG(ERR, EAL,
> + "epoll_wait returns with fail\n");
> + return;
> + }
> + /* epoll_wait timeout, will never happens here */
> + else if (nfds == 0)
> + goto m_wait;
> + /* epoll_wait has at least one fd ready to read */
> + eal_intr_process_rx_interrupts(port_id, events, nfds);
> +}
> +
> +int
> +rte_eal_wait_rx_intr(uint8_t port_id, uint8_t queue_id)
> +{
> + struct rte_intr_handle intr_handle = 
> rte_eth_devices[port_id].pci_dev->intr_handle;
> + struct epoll_event ev;
> + unsigned numfds = 0;
> +
> + /* create epoll fd */
> + int pfd = epoll_create(1);
> + if (pfd < 0) {
> + RTE_LOG(ERR, EAL, "Cannot create epoll instance\n");
> + return -1;
> + }
> +
> + rte_spinlock_lock(&intr_lock);
> +
> + ev.events = EPOLLIN | EPOLLPRI;
> + switch (intr_handle.type) {
> + case RTE_INTR_HANDLE_UIO:
> + ev.data.fd = intr_handle.fd;
> + break;
> +#ifdef VFIO_PRESENT
> + case RTE_INTR_HANDLE_VFIO_MSIX:
> + ev.data.fd = intr_handle.queue_fd[queue_id];
> + break;
> + case RTE_INTR_HANDLE_VFIO_MSI:
> + ev.data.fd = intr_handle.queue_fd[queue_id];
> + break;
> + case RTE_INTR_HANDLE_VFIO_LEGACY:
> + ev.data.fd = intr_handle.queue_fd[queue_id];
> + break;

As those three branches are all the same, why not combine them into one?

+   case RTE_INTR_HANDLE_VFIO_MSIX:
+   case RTE_INTR_HANDLE_VFIO_MSI:
+   case RTE_INTR_HANDLE_VFIO_LEGACY:
+   ev.data.fd = intr_handle.queue_fd[queue_id];
+   break;


> +#endif
> + default:
> + break;
> + close(pfd);
> + return -1;

Steve has already pointed this out here, but the indentation appears to be an issue.

> + }
> +
> + if (epoll_ctl(pfd, EPOLL_CTL_ADD, ev.data.fd, &ev) < 0) {
> + RTE_LOG(ERR, EAL, "Error adding fd %d epoll_ctl, %s\n",
> + intr_handle.queue_fd[queue_id], 
> strerror(errno));
> + } else
> + numfds++;
> +
> + rte_spinlock_unlock(&intr_lock);
> + /* serve the interrupt */
> + eal_intr_handle_rx_interrupts(port_id, pfd, numfds);
> +
> + /**
> + * when we return, we need to rebuild the
> + * list of fds to monitor.
> + */
> + close(pfd);
> +
> + return 0;
> +}
> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c 
> b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> index 20e0977..63d0ae8 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> @@ -283,7 +283

[dpdk-dev] [PATCH v1 4/5] eal: add per rx queue interrupt handling based on VFIO

2015-01-29 Thread Zhou, Danny


> -Original Message-
> From: Qiu, Michael
> Sent: Thursday, January 29, 2015 1:07 PM
> To: Zhou, Danny; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v1 4/5] eal: add per rx queue interrupt 
> handling based on VFIO
> 
> On 1/28/2015 5:52 PM, Danny Zhou wrote:
> > Signed-off-by: Danny Zhou 
> > Signed-off-by: Yong Liu 
> > ---
> 
> [...]
> 
> > +static void
> > +eal_intr_process_rx_interrupts(uint8_t port_id, struct epoll_event 
> > *events, int nfds)
> > +{
> > +   int n, bytes_read;
> > +   union rte_intr_read_buffer buf;
> > +   struct rte_intr_handle intr_handle = 
> > rte_eth_devices[port_id].pci_dev->intr_handle;
> > +
> > +   for (n = 0; n < nfds; n++) {
> > +   /* set the length to be read dor different handle type */
> > +   switch (intr_handle.type) {
> > +   case RTE_INTR_HANDLE_UIO:
> > +   bytes_read = sizeof(buf.uio_intr_count);
> > +   break;
> > +   case RTE_INTR_HANDLE_ALARM:
> > +   bytes_read = sizeof(buf.timerfd_num);
> > +   break;
> > +#ifdef VFIO_PRESENT
> > +   case RTE_INTR_HANDLE_VFIO_MSIX:
> > +   case RTE_INTR_HANDLE_VFIO_MSI:
> > +   case RTE_INTR_HANDLE_VFIO_LEGACY:
> > +   bytes_read = sizeof(buf.vfio_intr_count);
> > +   break;
> > +#endif
> > +   default:
> > +   bytes_read = 1;
> > +   break;
> > +   }
> > +
> > +   /**
> > +   * read out to clear the ready-to-be-read flag
> > +   * for epoll_wait.
> > +   */
> > +   bytes_read = read(events[n].data.fd, &buf, bytes_read);
> > +   if (bytes_read < 0)
> > +   RTE_LOG(ERR, EAL, "Error reading from file "
> > +   "descriptor %d: %s\n", events[n].data.fd,
> > +   strerror(errno));
> > +   else if (bytes_read == 0)
> > +   RTE_LOG(ERR, EAL, "Read nothing from file "
> > +   "descriptor %d\n", events[n].data.fd);
> 
> Is there any issue if bytes_read is not equal to the count that needs to be
> read?
> 

Not a problem as long as bytes_read > 0.

> > +   }
> > +}
> > +
> > +static void
> > +eal_intr_handle_rx_interrupts(uint8_t port_id, int pfd, unsigned totalfds)
> > +{
> > +   struct epoll_event events[totalfds];
> > +   int nfds = 0;
> > +
> > +m_wait:
> > +   nfds = epoll_wait(pfd, events, totalfds,
> > +   EAL_INTR_EPOLL_WAIT_FOREVER);
> > +   /* epoll_wait fail */
> > +   if (nfds < 0) {
> > +   RTE_LOG(ERR, EAL,
> > +   "epoll_wait returns with fail\n");
> > +   return;
> > +   }
> > +   /* epoll_wait timeout, will never happens here */
> > +   else if (nfds == 0)
> > +   goto m_wait;
> > +   /* epoll_wait has at least one fd ready to read */
> > +   eal_intr_process_rx_interrupts(port_id, events, nfds);
> > +}
> > +
> > +int
> > +rte_eal_wait_rx_intr(uint8_t port_id, uint8_t queue_id)
> > +{
> > +   struct rte_intr_handle intr_handle = 
> > rte_eth_devices[port_id].pci_dev->intr_handle;
> > +   struct epoll_event ev;
> > +   unsigned numfds = 0;
> > +
> > +   /* create epoll fd */
> > +   int pfd = epoll_create(1);
> > +   if (pfd < 0) {
> > +   RTE_LOG(ERR, EAL, "Cannot create epoll instance\n");
> > +   return -1;
> > +   }
> > +
> > +   rte_spinlock_lock(&intr_lock);
> > +
> > +   ev.events = EPOLLIN | EPOLLPRI;
> > +   switch (intr_handle.type) {
> > +   case RTE_INTR_HANDLE_UIO:
> > +   ev.data.fd = intr_handle.fd;
> > +   break;
> > +#ifdef VFIO_PRESENT
> > +   case RTE_INTR_HANDLE_VFIO_MSIX:
> > +   ev.data.fd = intr_handle.queue_fd[queue_id];
> > +   break;
> > +   case RTE_INTR_HANDLE_VFIO_MSI:
> > +   ev.data.fd = intr_handle.queue_fd[queue_id];
> > +   break;
> > +   case RTE_INTR_HANDLE_VFIO_LEGACY:
> > +   ev.data.fd = intr_handle.queue_fd[queue_id];
> > +   break;
> 
> As those three branches are all the same, why not combine to one?
> 
> + case RTE_INTR_HANDLE_VFIO_MSIX:
> + case RTE_INTR_HANDLE_VFIO_MSI:
> + case RTE_INTR_HANDLE_VFIO_LEGACY:
> + ev.data.fd = intr_handle.queue_fd[queue_id];
> + break;
> 

Accepted.

> 
> > +#endif
> > +   default:
> > +   break;
> > +   close(pfd);
> > +   return -1;
> 
> Steve has already pointed this out here, but the indentation appears to be an issue.
> 

To be fixed in V2.

> > +   }
> > +
> > +   if (epoll_ctl(pfd, EPOLL_CTL_ADD, ev.data.fd, &ev) < 0) {
> > +   RTE_LOG(ERR, EAL, "Error adding fd %d epoll_ctl, %s\n",
> > +   intr_handle.queue_fd[queue_id], 
> > strerror(errno));
> > +   } else
> > +   numfds++;
> > +
> > +   rte_spinlock_unlock(&intr_lock);
> > +   /* serve the interrupt */
> > +   eal_intr_handle_rx_interrupts(port_id, pfd, numfds);
> > +
>

[dpdk-dev] [PATCH 00/15] migrate flow director in ixgbe driver to new API

2015-01-29 Thread Jingjing Wu
The patch set uses new filter_ctrl API to replace old flow director filter APIs.
It uses new functions and structure to replace old ones in ixgbe driver,
updates commands to replace old ones in testpmd, and removes the old APIs

Jingjing Wu (15):
  ixgbe: migrate flow director filter operations (add/delete/update) to
new API
  ethdev: extend flow type and flexible payload type definition for flow
director
  ixgbe: implement the flexpayload configuration of flow director filter
  app/test: remove the flexbytes_offset setting in test_link_bonding
  testpmd: remove the flexbytes_offset setting
  ethdev: remove flexbytes_offset from rte_fdir_conf
  ethdev: structures definition for flow director masks
  ixgbe: implement the mask configuration of flow director filter
  ixgbe: implement the get info and statistic operations of flow
director
  ixgbe: implement the flush operation of flow director
  testpmd: add and update commands for flow director
  testpmd: update function to show flow director information
  testpmd: set the default value of flow director's mask
  testpmd: remove old commands for flow director
  doc: commands changed in testpmd_funcs.rst for flow director

 app/test-pmd/cmdline.c  |  755 -
 app/test-pmd/config.c   |  197 +
 app/test-pmd/parameters.c   |   16 -
 app/test-pmd/testpmd.c  |   14 +-
 app/test-pmd/testpmd.h  |   16 -
 app/test/test_link_bonding.c|1 -
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  287 +++
 lib/librte_ether/rte_eth_ctrl.h |   15 +
 lib/librte_ether/rte_ethdev.h   |3 +-
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c |   11 +-
 lib/librte_pmd_ixgbe/ixgbe_ethdev.h |   43 +-
 lib/librte_pmd_ixgbe/ixgbe_fdir.c   | 1169 ---
 12 files changed, 1054 insertions(+), 1473 deletions(-)

-- 
1.9.3



[dpdk-dev] [PATCH 01/15] ixgbe: migrate flow director filter operations (add/delete/update) to new API

2015-01-29 Thread Jingjing Wu
This patch changes the add/delete/update operations to be implemented through
filter_ctrl API and 
RTE_ETH_FILTER_ADD/RTE_ETH_FILTER_DELETE/RTE_ETH_FILTER_UPDATE ops.
It also removes the callback functions:
 - ixgbe_eth_dev_ops.fdir_add_signature_filter
 - ixgbe_eth_dev_ops.fdir_update_signature_filter
 - ixgbe_eth_dev_ops.fdir_remove_signature_filter
 - ixgbe_eth_dev_ops.fdir_add_perfect_filter
 - ixgbe_eth_dev_ops.fdir_update_perfect_filter
 - ixgbe_eth_dev_ops.fdir_remove_perfect_filter

Signed-off-by: Jingjing Wu 
---
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c |   9 +-
 lib/librte_pmd_ixgbe/ixgbe_ethdev.h |  23 +-
 lib/librte_pmd_ixgbe/ixgbe_fdir.c   | 927 ++--
 3 files changed, 477 insertions(+), 482 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c 
b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
index b341dd0..c3a76e6 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
@@ -360,13 +360,7 @@ static struct eth_dev_ops ixgbe_eth_dev_ops = {
.set_vf_vlan_filter   = ixgbe_set_pool_vlan_filter,
.set_queue_rate_limit = ixgbe_set_queue_rate_limit,
.set_vf_rate_limit= ixgbe_set_vf_rate_limit,
-   .fdir_add_signature_filter= ixgbe_fdir_add_signature_filter,
-   .fdir_update_signature_filter = ixgbe_fdir_update_signature_filter,
-   .fdir_remove_signature_filter = ixgbe_fdir_remove_signature_filter,
.fdir_infos_get   = ixgbe_fdir_info_get,
-   .fdir_add_perfect_filter  = ixgbe_fdir_add_perfect_filter,
-   .fdir_update_perfect_filter   = ixgbe_fdir_update_perfect_filter,
-   .fdir_remove_perfect_filter   = ixgbe_fdir_remove_perfect_filter,
.fdir_set_masks   = ixgbe_fdir_set_masks,
.reta_update  = ixgbe_dev_rss_reta_update,
.reta_query   = ixgbe_dev_rss_reta_query,
@@ -4213,6 +4207,9 @@ ixgbe_dev_filter_ctrl(struct rte_eth_dev *dev,
case RTE_ETH_FILTER_ETHERTYPE:
ret = ixgbe_ethertype_filter_handle(dev, filter_op, arg);
break;
+   case RTE_ETH_FILTER_FDIR:
+   ret = ixgbe_fdir_ctrl_func(dev, filter_op, arg);
+   break;
default:
PMD_DRV_LOG(WARNING, "Filter type (%d) not supported",
filter_type);
diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h 
b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
index 1383194..d92e54a 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
@@ -309,29 +309,9 @@ int ixgbe_dev_rss_hash_conf_get(struct rte_eth_dev *dev,
  */
 int ixgbe_fdir_configure(struct rte_eth_dev *dev);

-int ixgbe_fdir_add_signature_filter(struct rte_eth_dev *dev,
-   struct rte_fdir_filter *fdir_filter, uint8_t queue);
-
-int ixgbe_fdir_update_signature_filter(struct rte_eth_dev *dev,
-   struct rte_fdir_filter *fdir_filter, uint8_t queue);
-
-int ixgbe_fdir_remove_signature_filter(struct rte_eth_dev *dev,
-   struct rte_fdir_filter *fdir_filter);
-
 void ixgbe_fdir_info_get(struct rte_eth_dev *dev,
struct rte_eth_fdir *fdir);

-int ixgbe_fdir_add_perfect_filter(struct rte_eth_dev *dev,
-   struct rte_fdir_filter *fdir_filter, uint16_t soft_id,
-   uint8_t queue, uint8_t drop);
-
-int ixgbe_fdir_update_perfect_filter(struct rte_eth_dev *dev,
-   struct rte_fdir_filter *fdir_filter,uint16_t soft_id,
-   uint8_t queue, uint8_t drop);
-
-int ixgbe_fdir_remove_perfect_filter(struct rte_eth_dev *dev,
-   struct rte_fdir_filter *fdir_filter, uint16_t soft_id);
-
 int ixgbe_fdir_set_masks(struct rte_eth_dev *dev,
struct rte_fdir_masks *fdir_masks);

@@ -355,4 +335,7 @@ void ixgbe_pf_mbx_process(struct rte_eth_dev *eth_dev);
 int ixgbe_pf_host_configure(struct rte_eth_dev *eth_dev);

 uint32_t ixgbe_convert_vm_rx_mask_to_val(uint16_t rx_mask, uint32_t orig_val);
+
+int ixgbe_fdir_ctrl_func(struct rte_eth_dev *dev,
+   enum rte_filter_op filter_op, void *arg);
 #endif /* _IXGBE_ETHDEV_H_ */
diff --git a/lib/librte_pmd_ixgbe/ixgbe_fdir.c 
b/lib/librte_pmd_ixgbe/ixgbe_fdir.c
index cfcb515..e682d14 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_fdir.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_fdir.c
@@ -62,7 +62,28 @@
 #define SIG_BUCKET_64KB_HASH_MASK   0x1FFF  /* 13 bits */
 #define SIG_BUCKET_128KB_HASH_MASK  0x3FFF  /* 14 bits */
 #define SIG_BUCKET_256KB_HASH_MASK  0x7FFF  /* 15 bits */
-
+#define IXGBE_FDIRCMD_CMD_INTERVAL_US   10
+
+static int fdir_erase_filter_82599(struct ixgbe_hw *hw, uint32_t fdirhash);
+static int ixgbe_fdir_filter_to_atr_input(
+   const struct rte_eth_fdir_filter *fdir_filter,
+   union ixgbe_atr_input *input);
+static uint32_t ixgbe_atr_compute_hash_82599(union ixgbe_atr_input *atr_input,
+uint32_t key);
+static uint32_t atr_comput

[dpdk-dev] [PATCH 02/15] ethdev: extend flow type and flexible payload type definition for flow director

2015-01-29 Thread Jingjing Wu
This patch adds RTE_ETH_FLOW_TYPE_RAW and RTE_ETH_RAW_PAYLOAD to support
flexible payloads that start from the beginning of the packet.

Signed-off-by: Jingjing Wu 
---
 lib/librte_ether/rte_eth_ctrl.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h
index 5d9c387..74403b7 100644
--- a/lib/librte_ether/rte_eth_ctrl.h
+++ b/lib/librte_ether/rte_eth_ctrl.h
@@ -183,6 +183,7 @@ struct rte_eth_tunnel_filter_conf {
  */
 enum rte_eth_flow_type {
RTE_ETH_FLOW_TYPE_NONE = 0,
+   RTE_ETH_FLOW_TYPE_RAW,
RTE_ETH_FLOW_TYPE_UDPV4,
RTE_ETH_FLOW_TYPE_TCPV4,
RTE_ETH_FLOW_TYPE_SCTPV4,
@@ -347,6 +348,7 @@ struct rte_eth_fdir_filter {
  */
 enum rte_eth_payload_type {
RTE_ETH_PAYLOAD_UNKNOWN = 0,
+   RTE_ETH_RAW_PAYLOAD,
RTE_ETH_L2_PAYLOAD,
RTE_ETH_L3_PAYLOAD,
RTE_ETH_L4_PAYLOAD,
-- 
1.9.3



[dpdk-dev] [PATCH 03/15] ixgbe: implement the flexpayload configuration of flow director filter

2015-01-29 Thread Jingjing Wu
This patch implements the flexible payload configuration of the flow director filter.

Signed-off-by: Jingjing Wu 
---
 lib/librte_pmd_ixgbe/ixgbe_ethdev.h |   1 +
 lib/librte_pmd_ixgbe/ixgbe_fdir.c   | 114 +---
 2 files changed, 108 insertions(+), 7 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h 
b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
index d92e54a..bea28c5 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
@@ -102,6 +102,7 @@
  * Information about the fdir mode.
  */
 struct ixgbe_hw_fdir_info {
+   uint8_t flex_bytes_offset;
uint16_tcollision;
uint16_tfree;
uint16_tmaxhash;
diff --git a/lib/librte_pmd_ixgbe/ixgbe_fdir.c 
b/lib/librte_pmd_ixgbe/ixgbe_fdir.c
index e682d14..ccc9ad3 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_fdir.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_fdir.c
@@ -62,9 +62,15 @@
 #define SIG_BUCKET_64KB_HASH_MASK   0x1FFF  /* 13 bits */
 #define SIG_BUCKET_128KB_HASH_MASK  0x3FFF  /* 14 bits */
 #define SIG_BUCKET_256KB_HASH_MASK  0x7FFF  /* 15 bits */
+#define IXGBE_DEFAULT_FLEXBYTES_OFFSET  12 /* default flexbytes offset in 
bytes */
+#define IXGBE_MAX_FLX_SOURCE_OFF62
+#define IXGBE_FDIRCTRL_FLEX_MASK(0x1F << IXGBE_FDIRCTRL_FLEX_SHIFT)
 #define IXGBE_FDIRCMD_CMD_INTERVAL_US   10

+static int ixgbe_set_fdir_flex_conf(struct rte_eth_dev *dev,
+   const struct rte_eth_fdir_flex_conf *conf);
 static int fdir_erase_filter_82599(struct ixgbe_hw *hw, uint32_t fdirhash);
+static int fdir_enable_82599(struct ixgbe_hw *hw, uint32_t fdirctrl);
 static int ixgbe_fdir_filter_to_atr_input(
const struct rte_eth_fdir_filter *fdir_filter,
union ixgbe_atr_input *input);
@@ -92,7 +98,8 @@ static int ixgbe_add_del_fdir_filter(struct rte_eth_dev *dev,
  *  @hw: pointer to hardware structure
  *  @fdirctrl: value to write to flow director control register
  **/
-static void fdir_enable_82599(struct ixgbe_hw *hw, u32 fdirctrl)
+static int
+fdir_enable_82599(struct ixgbe_hw *hw, uint32_t fdirctrl)
 {
int i;

@@ -132,16 +139,20 @@ static void fdir_enable_82599(struct ixgbe_hw *hw, u32 
fdirctrl)
msec_delay(1);
}

-   if (i >= IXGBE_FDIR_INIT_DONE_POLL)
-   PMD_INIT_LOG(WARNING, "Flow Director poll time exceeded!");
+   if (i >= IXGBE_FDIR_INIT_DONE_POLL) {
+   PMD_INIT_LOG(ERR, "Flow Director poll time exceeded"
+   "during enabling!");
+   return -ETIMEDOUT;
+   }
+   return 0;
 }

 /*
  * Set appropriate bits in fdirctrl for: variable reporting levels, moving
  * flexbytes matching field, and drop queue (only for perfect matching mode).
  */
-static int
-configure_fdir_flags(struct rte_fdir_conf *conf, uint32_t *fdirctrl)
+static inline int
+configure_fdir_flags(const struct rte_fdir_conf *conf, uint32_t *fdirctrl)
 {
*fdirctrl = 0;

@@ -183,13 +194,88 @@ configure_fdir_flags(struct rte_fdir_conf *conf, uint32_t 
*fdirctrl)
return -EINVAL;
};

-   *fdirctrl |= (conf->flexbytes_offset << IXGBE_FDIRCTRL_FLEX_SHIFT);
+   *fdirctrl |= (IXGBE_DEFAULT_FLEXBYTES_OFFSET / sizeof(uint16_t)) <<
+IXGBE_FDIRCTRL_FLEX_SHIFT;

if (conf->mode == RTE_FDIR_MODE_PERFECT) {
*fdirctrl |= IXGBE_FDIRCTRL_PERFECT_MATCH;
*fdirctrl |= (conf->drop_queue << IXGBE_FDIRCTRL_DROP_Q_SHIFT);
}
+   /*
+* Continue setup of fdirctrl register bits:
+*  Set the maximum length per hash bucket to 0xA filters
+*  Send interrupt when 64 filters are left
+*/
+   *fdirctrl |= (0xA << IXGBE_FDIRCTRL_MAX_LENGTH_SHIFT) |
+   (4 << IXGBE_FDIRCTRL_FULL_THRESH_SHIFT);
+
+   return 0;
+}
+
+/*
+ * ixgbe_check_fdir_flex_conf -check if the flex payload and mask configuration
+ * arguments are valid
+ */
+static int
+ixgbe_set_fdir_flex_conf(struct rte_eth_dev *dev,
+   const struct rte_eth_fdir_flex_conf *conf)
+{
+   struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct ixgbe_hw_fdir_info *info =
+   IXGBE_DEV_PRIVATE_TO_FDIR_INFO(dev->data->dev_private);
+   const struct rte_eth_flex_payload_cfg *flex_cfg;
+   const struct rte_eth_fdir_flex_mask *flex_mask;
+   uint32_t fdirctrl, fdirm;
+   uint16_t flexbytes = 0;
+   uint16_t i;
+
+   fdirctrl = IXGBE_READ_REG(hw, IXGBE_FDIRCTRL);
+   fdirm = IXGBE_READ_REG(hw, IXGBE_FDIRM);
+
+   if (conf == NULL) {
+   PMD_DRV_LOG(INFO, "NULL pointer.");
+   return -EINVAL;
+   }

+   for (i = 0; i < conf->nb_payloads; i++) {
+   flex_cfg = &conf->flex_set[i];
+   if (flex_cfg->type != RTE_ETH_RAW_PAYLOAD) {
+   PMD_DRV_LOG(ERR, "unsupported payload type.");
+   return -EINVAL;
+

[dpdk-dev] [PATCH 04/15] app/test: remove the flexbytes_offset setting in test_link_bonding

2015-01-29 Thread Jingjing Wu
This patch removes the flexbytes_offset setting, because the flexible payload
setting is done by flex_conf instead of flexbytes_offset.

Signed-off-by: Jingjing Wu 
---
 app/test/test_link_bonding.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/app/test/test_link_bonding.c b/app/test/test_link_bonding.c
index 4523de6..630c047 100644
--- a/app/test/test_link_bonding.c
+++ b/app/test/test_link_bonding.c
@@ -182,7 +182,6 @@ struct rte_fdir_conf fdir_conf = {
.mode = RTE_FDIR_MODE_NONE,
.pballoc = RTE_FDIR_PBALLOC_64K,
.status = RTE_FDIR_REPORT_STATUS,
-   .flexbytes_offset = 0x6,
.drop_queue = 127,
 };

-- 
1.9.3



[dpdk-dev] [PATCH 05/15] testpmd: remove the flexbytes_offset setting

2015-01-29 Thread Jingjing Wu
This patch removes the flexbytes_offset setting, because the flexible payload
setting is done by flex_conf instead of flexbytes_offset.

Signed-off-by: Jingjing Wu 
---
 app/test-pmd/parameters.c | 16 
 app/test-pmd/testpmd.c|  1 -
 2 files changed, 17 deletions(-)

diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index adf3203..db124db 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -145,10 +145,6 @@ usage(char* progname)
   "(N: none  or match (default) or always).\n");
printf("  --pkt-filter-size=N: set Flow Director mode "
   "(N: 64K (default mode) or 128K or 256K).\n");
-   printf("  --pkt-filter-flexbytes-offset=N: set flexbytes-offset. "
-  "The offset is defined in word units counted from the "
-  "first byte of the destination Ethernet MAC address. "
-  "0 <= N <= 32.\n");
printf("  --pkt-filter-drop-queue=N: set drop-queue. "
   "In perfect mode, when you add a rule with queue = -1 "
   "the packet will be enqueued into the rx drop-queue. "
@@ -523,7 +519,6 @@ launch_args_parse(int argc, char** argv)
{ "pkt-filter-mode",1, 0, 0 },
{ "pkt-filter-report-hash", 1, 0, 0 },
{ "pkt-filter-size",1, 0, 0 },
-   { "pkt-filter-flexbytes-offset",1, 0, 0 },
{ "pkt-filter-drop-queue",  1, 0, 0 },
{ "crc-strip",  0, 0, 0 },
{ "enable-rx-cksum",0, 0, 0 },
@@ -747,17 +742,6 @@ launch_args_parse(int argc, char** argv)
 optarg);
}
if (!strcmp(lgopts[opt_idx].name,
-   "pkt-filter-flexbytes-offset")) {
-   n = atoi(optarg);
-   if ( n >= 0 && n <= (int) 32)
-   fdir_conf.flexbytes_offset =
-   (uint8_t) n;
-   else
-   rte_exit(EXIT_FAILURE,
-"flexbytes %d invalid - must"
-"be  >= 0 && <= 32\n", n);
-   }
-   if (!strcmp(lgopts[opt_idx].name,
"pkt-filter-drop-queue")) {
n = atoi(optarg);
if (n >= 0)
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 773b8af..2773c10 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -298,7 +298,6 @@ struct rte_fdir_conf fdir_conf = {
.mode = RTE_FDIR_MODE_NONE,
.pballoc = RTE_FDIR_PBALLOC_64K,
.status = RTE_FDIR_REPORT_STATUS,
-   .flexbytes_offset = 0x6,
.drop_queue = 127,
 };

-- 
1.9.3



[dpdk-dev] [PATCH 07/15] ethdev: structures definition for flow director masks

2015-01-29 Thread Jingjing Wu
This patch defines structure rte_eth_fdir_masks.
It extends rte_fdir_conf and rte_eth_fdir_info to contain mask's configuration.

Signed-off-by: Jingjing Wu 
---
 lib/librte_ether/rte_eth_ctrl.h | 13 +
 lib/librte_ether/rte_ethdev.h   |  1 +
 2 files changed, 14 insertions(+)

diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h
index 74403b7..a5db310 100644
--- a/lib/librte_ether/rte_eth_ctrl.h
+++ b/lib/librte_ether/rte_eth_ctrl.h
@@ -344,6 +344,18 @@ struct rte_eth_fdir_filter {
 };

 /**
+ *  A structure used to configure FDIR masks that are used by the device
+ *  to match the various fields of RX packet headers.
+ */
+struct rte_eth_fdir_masks {
+   uint16_t vlan_tci_mask;
+   struct rte_eth_ipv4_flow   ipv4_mask;
+   struct rte_eth_ipv6_flow   ipv6_mask;
+   uint16_t src_port_mask;
+   uint16_t dst_port_mask;
+};
+
+/**
  * Payload type
  */
 enum rte_eth_payload_type {
@@ -409,6 +421,7 @@ enum rte_fdir_mode {
  */
 struct rte_eth_fdir_info {
enum rte_fdir_mode mode; /**< Flow director mode */
+   struct rte_eth_fdir_masks mask;
struct rte_eth_fdir_flex_conf flex_conf;
/**< Flex payload configuration information */
uint32_t guarant_spc;  /**< Guaranteed spaces.*/
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index cbe05b1..6c8ad08 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -720,6 +720,7 @@ struct rte_fdir_conf {
enum rte_fdir_status_mode status;  /**< How to report FDIR hash. */
/** RX queue of packets matching a "drop" filter in perfect mode. */
uint8_t drop_queue;
+   struct rte_eth_fdir_masks mask;
struct rte_eth_fdir_flex_conf flex_conf;
/**< Flex payload configuration. */
 };
-- 
1.9.3



[dpdk-dev] [PATCH 08/15] ixgbe: implement the mask configuration of flow director filter

2015-01-29 Thread Jingjing Wu
This patch implements the mask configuration of the flow director filter,
by using the mask defined in rte_fdir_conf instead of the callback function
fdir_set_masks.

Signed-off-by: Jingjing Wu 
---
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c |   1 -
 lib/librte_pmd_ixgbe/ixgbe_ethdev.h |  16 +-
 lib/librte_pmd_ixgbe/ixgbe_fdir.c   | 282 +++-
 3 files changed, 165 insertions(+), 134 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c 
b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
index c3a76e6..8bc7009 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
@@ -361,7 +361,6 @@ static struct eth_dev_ops ixgbe_eth_dev_ops = {
.set_queue_rate_limit = ixgbe_set_queue_rate_limit,
.set_vf_rate_limit= ixgbe_set_vf_rate_limit,
.fdir_infos_get   = ixgbe_fdir_info_get,
-   .fdir_set_masks   = ixgbe_fdir_set_masks,
.reta_update  = ixgbe_dev_rss_reta_update,
.reta_query   = ixgbe_dev_rss_reta_query,
 #ifdef RTE_NIC_BYPASS
diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h 
b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
index bea28c5..7961167 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
@@ -101,7 +101,20 @@
 /*
  * Information about the fdir mode.
  */
+
+struct ixgbe_hw_fdir_mask {
+   uint16_t vlan_tci_mask;
+   uint32_t src_ipv4_mask;
+   uint32_t dst_ipv4_mask;
+   uint16_t src_ipv6_mask;
+   uint16_t dst_ipv6_mask;
+   uint16_t src_port_mask;
+   uint16_t dst_port_mask;
+   uint16_t flex_bytes_mask;
+};
+
 struct ixgbe_hw_fdir_info {
+   struct ixgbe_hw_fdir_mask mask;
uint8_t flex_bytes_offset;
uint16_tcollision;
uint16_tfree;
@@ -313,9 +326,6 @@ int ixgbe_fdir_configure(struct rte_eth_dev *dev);
 void ixgbe_fdir_info_get(struct rte_eth_dev *dev,
struct rte_eth_fdir *fdir);

-int ixgbe_fdir_set_masks(struct rte_eth_dev *dev,
-   struct rte_fdir_masks *fdir_masks);
-
 void ixgbe_configure_dcb(struct rte_eth_dev *dev);

 /*
diff --git a/lib/librte_pmd_ixgbe/ixgbe_fdir.c 
b/lib/librte_pmd_ixgbe/ixgbe_fdir.c
index ccc9ad3..16b0ba8 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_fdir.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_fdir.c
@@ -63,13 +63,53 @@
 #define SIG_BUCKET_128KB_HASH_MASK  0x3FFF  /* 14 bits */
 #define SIG_BUCKET_256KB_HASH_MASK  0x7FFF  /* 15 bits */
 #define IXGBE_DEFAULT_FLEXBYTES_OFFSET  12 /* default flexbytes offset in 
bytes */
+#define IXGBE_FDIR_MAX_FLEX_LEN 2 /* len in bytes of flexbytes */
 #define IXGBE_MAX_FLX_SOURCE_OFF62
 #define IXGBE_FDIRCTRL_FLEX_MASK(0x1F << IXGBE_FDIRCTRL_FLEX_SHIFT)
 #define IXGBE_FDIRCMD_CMD_INTERVAL_US   10

+#define IXGBE_FDIR_FLOW_TYPES ( \
+   (1 << RTE_ETH_FLOW_TYPE_UDPV4) | \
+   (1 << RTE_ETH_FLOW_TYPE_TCPV4) | \
+   (1 << RTE_ETH_FLOW_TYPE_SCTPV4) | \
+   (1 << RTE_ETH_FLOW_TYPE_IPV4_OTHER) | \
+   (1 << RTE_ETH_FLOW_TYPE_UDPV6) | \
+   (1 << RTE_ETH_FLOW_TYPE_TCPV6) | \
+   (1 << RTE_ETH_FLOW_TYPE_SCTPV6) | \
+   (1 << RTE_ETH_FLOW_TYPE_IPV6_OTHER))
+
+#define IPV6_ADDR_TO_MASK(ipaddr, ipv6m) do { \
+   uint8_t ipv6_addr[16]; \
+   uint8_t i; \
+   rte_memcpy(ipv6_addr, (ipaddr), sizeof(ipv6_addr));\
+   (ipv6m) = 0; \
+   for (i = 0; i < sizeof(ipv6_addr); i++) { \
+   if (ipv6_addr[i] == UINT8_MAX) \
+   (ipv6m) |= 1 << i; \
+   else if (ipv6_addr[i] != 0) { \
+   PMD_DRV_LOG(ERR, " invalid IPv6 address mask."); \
+   return -EINVAL; \
+   } \
+   } \
+} while (0)
+
+#define IPV6_MASK_TO_ADDR(ipv6m, ipaddr) do { \
+   uint8_t ipv6_addr[16]; \
+   uint8_t i; \
+   for (i = 0; i < sizeof(ipv6_addr); i++) { \
+   if ((ipv6m) & (1 << i)) \
+   ipv6_addr[i] = UINT8_MAX; \
+   else \
+   ipv6_addr[i] = 0; \
+   } \
+   rte_memcpy((ipaddr), ipv6_addr, sizeof(ipv6_addr));\
+} while (0)
+
+static int fdir_erase_filter_82599(struct ixgbe_hw *hw, uint32_t fdirhash);
+static int fdir_set_input_mask_82599(struct rte_eth_dev *dev,
+   const struct rte_eth_fdir_masks *input_mask);
 static int ixgbe_set_fdir_flex_conf(struct rte_eth_dev *dev,
const struct rte_eth_fdir_flex_conf *conf);
-static int fdir_erase_filter_82599(struct ixgbe_hw *hw, uint32_t fdirhash);
 static int fdir_enable_82599(struct ixgbe_hw *hw, uint32_t fdirctrl);
 static int ixgbe_fdir_filter_to_atr_input(
const struct rte_eth_fdir_filter *fdir_filter,
@@ -212,6 +252,111 @@ configure_fdir_flags(const struct rte_fdir_conf *conf, 
uint32_t *fdirctrl)
return 0;
 }

+/**
+ * Reverse the bits in FDIR registers that store 2 x 16 bit masks.
+ *
+ *  @hi_dword: Bits 31:16 mask to be bit swapped.
+ *  @lo_dword: Bits 15:0  mask to be bit s

[dpdk-dev] [PATCH 06/15] ethdev: remove flexbytes_offset from rte_fdir_conf

2015-01-29 Thread Jingjing Wu
This patch removes the flexbytes_offset from rte_fdir_conf, because
the flexible payload setting is done by flex_conf instead of flexbytes_offset.

Signed-off-by: Jingjing Wu 
---
 lib/librte_ether/rte_ethdev.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 1200c1c..cbe05b1 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -718,8 +718,6 @@ struct rte_fdir_conf {
enum rte_fdir_mode mode; /**< Flow Director mode. */
enum rte_fdir_pballoc_type pballoc; /**< Space for FDIR filters. */
enum rte_fdir_status_mode status;  /**< How to report FDIR hash. */
-   /** Offset of flexbytes field in RX packets (in 16-bit word units). */
-   uint8_t flexbytes_offset;
/** RX queue of packets matching a "drop" filter in perfect mode. */
uint8_t drop_queue;
struct rte_eth_fdir_flex_conf flex_conf;
-- 
1.9.3



[dpdk-dev] [PATCH 09/15] ixgbe: implement the get info and statistic operations of flow director

2015-01-29 Thread Jingjing Wu
This patch changes the get info operation to be implemented through
the filter_ctrl API and the RTE_ETH_FILTER_INFO/RTE_ETH_FILTER_STATS ops.

Signed-off-by: Jingjing Wu 
---
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c |  1 -
 lib/librte_pmd_ixgbe/ixgbe_ethdev.h |  3 --
 lib/librte_pmd_ixgbe/ixgbe_fdir.c   | 94 ++---
 3 files changed, 78 insertions(+), 20 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c 
b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
index 8bc7009..a2cdef7 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
@@ -360,7 +360,6 @@ static struct eth_dev_ops ixgbe_eth_dev_ops = {
.set_vf_vlan_filter   = ixgbe_set_pool_vlan_filter,
.set_queue_rate_limit = ixgbe_set_queue_rate_limit,
.set_vf_rate_limit= ixgbe_set_vf_rate_limit,
-   .fdir_infos_get   = ixgbe_fdir_info_get,
.reta_update  = ixgbe_dev_rss_reta_update,
.reta_query   = ixgbe_dev_rss_reta_query,
 #ifdef RTE_NIC_BYPASS
diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h 
b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
index 7961167..e985199 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
@@ -323,9 +323,6 @@ int ixgbe_dev_rss_hash_conf_get(struct rte_eth_dev *dev,
  */
 int ixgbe_fdir_configure(struct rte_eth_dev *dev);

-void ixgbe_fdir_info_get(struct rte_eth_dev *dev,
-   struct rte_eth_fdir *fdir);
-
 void ixgbe_configure_dcb(struct rte_eth_dev *dev);

 /*
diff --git a/lib/librte_pmd_ixgbe/ixgbe_fdir.c 
b/lib/librte_pmd_ixgbe/ixgbe_fdir.c
index 16b0ba8..40ab342 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_fdir.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_fdir.c
@@ -130,6 +130,11 @@ static int ixgbe_add_del_fdir_filter(struct rte_eth_dev 
*dev,
  const struct rte_eth_fdir_filter *fdir_filter,
  bool del,
  bool update);
+static void ixgbe_fdir_info_get(struct rte_eth_dev *dev,
+   struct rte_eth_fdir_info *fdir_info);
+static void ixgbe_fdir_stats_get(struct rte_eth_dev *dev,
+   struct rte_eth_fdir_stats *fdir_stats);
+
 /**
  * This function is based on ixgbe_fdir_enable_82599() in ixgbe/ixgbe_82599.c.
  * It adds extra configuration of fdirctrl that is common for all filter types.
@@ -959,19 +964,61 @@ ixgbe_add_del_fdir_filter(struct rte_eth_dev *dev,
return err;
 }

-void
-ixgbe_fdir_info_get(struct rte_eth_dev *dev, struct rte_eth_fdir *fdir)
+#define FDIRENTRIES_NUM_SHIFT 10
+static void
+ixgbe_fdir_info_get(struct rte_eth_dev *dev, struct rte_eth_fdir_info 
*fdir_info)
 {
struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
struct ixgbe_hw_fdir_info *info =
IXGBE_DEV_PRIVATE_TO_FDIR_INFO(dev->data->dev_private);
-   uint32_t reg;
+   uint32_t fdirctrl, max_num;
+   uint8_t offset;

-   if (hw->mac.type != ixgbe_mac_82599EB &&
-   hw->mac.type != ixgbe_mac_X540 &&
-   hw->mac.type != ixgbe_mac_X550 &&
-   hw->mac.type != ixgbe_mac_X550EM_x)
-   return;
+   fdirctrl = IXGBE_READ_REG(hw, IXGBE_FDIRCTRL);
+   offset = ((fdirctrl & IXGBE_FDIRCTRL_FLEX_MASK) >>
+   IXGBE_FDIRCTRL_FLEX_SHIFT) * sizeof(uint16_t);
+
+   fdir_info->mode = dev->data->dev_conf.fdir_conf.mode;
+   max_num = (1 << (FDIRENTRIES_NUM_SHIFT +
+   (fdirctrl & FDIRCTRL_PBALLOC_MASK)));
+   if (fdir_info->mode == RTE_FDIR_MODE_PERFECT)
+   fdir_info->guarant_spc = max_num;
+   else if (fdir_info->mode == RTE_FDIR_MODE_SIGNATURE)
+   fdir_info->guarant_spc = max_num * 4;
+
+   fdir_info->mask.vlan_tci_mask = info->mask.vlan_tci_mask;
+   fdir_info->mask.ipv4_mask.src_ip = info->mask.src_ipv4_mask;
+   fdir_info->mask.ipv4_mask.dst_ip = info->mask.dst_ipv4_mask;
+   IPV6_MASK_TO_ADDR(info->mask.src_ipv6_mask,
+   fdir_info->mask.ipv6_mask.src_ip);
+   IPV6_MASK_TO_ADDR(info->mask.dst_ipv6_mask,
+   fdir_info->mask.ipv6_mask.dst_ip);
+   fdir_info->mask.src_port_mask = info->mask.src_port_mask;
+   fdir_info->mask.dst_port_mask = info->mask.dst_port_mask;
+   fdir_info->max_flexpayload = IXGBE_FDIR_MAX_FLEX_LEN;
+   fdir_info->flow_types_mask[0] = IXGBE_FDIR_FLOW_TYPES;
+   fdir_info->flex_payload_unit = sizeof(uint16_t);
+   fdir_info->max_flex_payload_segment_num = 1;
+   fdir_info->flex_payload_limit = 62;
+   fdir_info->flex_conf.nb_payloads = 1;
+   fdir_info->flex_conf.flex_set[0].type = RTE_ETH_RAW_PAYLOAD;
+   fdir_info->flex_conf.flex_set[0].src_offset[0] = offset;
+   fdir_info->flex_conf.flex_set[0].src_offset[1] = offset + 1;
+   fdir_info->flex_conf.nb_flexmasks = 1;
+   fdir_info->flex_conf.flex_mask[0].flow_type = RTE_ETH_FLOW_TYPE_RAW;
+   fdi

[dpdk-dev] [PATCH 10/15] ixgbe: implement the flush operation of flow director

2015-01-29 Thread Jingjing Wu
This patch implements the RTE_ETH_FILTER_FLUSH operation to delete all
flow director filters in the ixgbe driver.

Signed-off-by: Jingjing Wu 
---
 lib/librte_pmd_ixgbe/ixgbe_fdir.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_fdir.c 
b/lib/librte_pmd_ixgbe/ixgbe_fdir.c
index 40ab342..4f843bb 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_fdir.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_fdir.c
@@ -130,6 +130,7 @@ static int ixgbe_add_del_fdir_filter(struct rte_eth_dev 
*dev,
  const struct rte_eth_fdir_filter *fdir_filter,
  bool del,
  bool update);
+static int ixgbe_fdir_flush(struct rte_eth_dev *dev);
 static void ixgbe_fdir_info_get(struct rte_eth_dev *dev,
struct rte_eth_fdir_info *fdir_info);
 static void ixgbe_fdir_stats_get(struct rte_eth_dev *dev,
@@ -964,6 +965,28 @@ ixgbe_add_del_fdir_filter(struct rte_eth_dev *dev,
return err;
 }

+static int
+ixgbe_fdir_flush(struct rte_eth_dev *dev)
+{
+   struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct ixgbe_hw_fdir_info *info =
+   IXGBE_DEV_PRIVATE_TO_FDIR_INFO(dev->data->dev_private);
+   int ret;
+
+   ret = ixgbe_reinit_fdir_tables_82599(hw);
+   if (ret < 0) {
+   PMD_INIT_LOG(ERR, "Failed to re-initialize FD table.");
+   return ret;
+   }
+
+   info->f_add = 0;
+   info->f_remove = 0;
+   info->add = 0;
+   info->remove = 0;
+
+   return ret;
+}
+
 #define FDIRENTRIES_NUM_SHIFT 10
 static void
 ixgbe_fdir_info_get(struct rte_eth_dev *dev, struct rte_eth_fdir_info 
*fdir_info)
@@ -1103,6 +1126,9 @@ ixgbe_fdir_ctrl_func(struct rte_eth_dev *dev,
ret = ixgbe_add_del_fdir_filter(dev,
(struct rte_eth_fdir_filter *)arg, TRUE, FALSE);
break;
+   case RTE_ETH_FILTER_FLUSH:
+   ret = ixgbe_fdir_flush(dev);
+   break;
case RTE_ETH_FILTER_INFO:
ixgbe_fdir_info_get(dev, (struct rte_eth_fdir_info *)arg);
break;
-- 
1.9.3



[dpdk-dev] [PATCH 12/15] testpmd: update function to show flow director information

2015-01-29 Thread Jingjing Wu
Update the function to print information that includes:
 - capability
 - mask
 - flex configuration

Signed-off-by: Jingjing Wu 
---
 app/test-pmd/config.c | 77 ++-
 1 file changed, 45 insertions(+), 32 deletions(-)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index c40f819..4db9b9a 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -98,6 +98,7 @@

 static const char *flowtype_str[RTE_ETH_FLOW_TYPE_MAX] = {
NULL,
+   "raw",
"udp4",
"tcp4",
"sctp4",
@@ -1822,14 +1823,34 @@ fdir_remove_signature_filter(portid_t port_id,
 }

 static inline void
-print_fdir_flex_payload(struct rte_eth_fdir_flex_conf *flex_conf)
+print_fdir_mask(struct rte_eth_fdir_masks *mask)
+{
+   printf("\nvlan_tci: 0x%04x, src_ipv4: 0x%08x, dst_ipv4: 0x%08x,"
+ " src_port: 0x%04x, dst_port: 0x%04x",
+   mask->vlan_tci_mask, mask->ipv4_mask.src_ip,
+   mask->ipv4_mask.dst_ip,
+   mask->src_port_mask, mask->dst_port_mask);
+
+   printf("\nsrc_ipv6: 0x%08x,0x%08x,0x%08x,0x%08x,"
+" dst_ipv6: 0x%08x,0x%08x,0x%08x,0x%08x",
+   mask->ipv6_mask.src_ip[0], mask->ipv6_mask.src_ip[1],
+   mask->ipv6_mask.src_ip[2], mask->ipv6_mask.src_ip[3],
+   mask->ipv6_mask.dst_ip[0], mask->ipv6_mask.dst_ip[1],
+   mask->ipv6_mask.dst_ip[2], mask->ipv6_mask.dst_ip[3]);
+   printf("\n");
+}
+
+static inline void
+print_fdir_flex_payload(struct rte_eth_fdir_flex_conf *flex_conf, uint32_t num)
 {
struct rte_eth_flex_payload_cfg *cfg;
-   int i, j;
+   uint32_t i, j;

for (i = 0; i < flex_conf->nb_payloads; i++) {
cfg = &flex_conf->flex_set[i];
-   if (cfg->type == RTE_ETH_L2_PAYLOAD)
+   if (cfg->type == RTE_ETH_RAW_PAYLOAD)
+   printf("\nRAW:  ");
+   else if (cfg->type == RTE_ETH_L2_PAYLOAD)
printf("\nL2_PAYLOAD:  ");
else if (cfg->type == RTE_ETH_L3_PAYLOAD)
printf("\nL3_PAYLOAD:  ");
@@ -1837,22 +1858,22 @@ print_fdir_flex_payload(struct rte_eth_fdir_flex_conf 
*flex_conf)
printf("\nL4_PAYLOAD:  ");
else
printf("\nUNKNOWN PAYLOAD(%u):  ", cfg->type);
-   for (j = 0; j < RTE_ETH_FDIR_MAX_FLEXLEN; j++)
+   for (j = 0; j < num; j++)
printf("  %-5u", cfg->src_offset[j]);
}
printf("\n");
 }

 static inline void
-print_fdir_flex_mask(struct rte_eth_fdir_flex_conf *flex_conf)
+print_fdir_flex_mask(struct rte_eth_fdir_flex_conf *flex_conf, uint32_t num)
 {
struct rte_eth_fdir_flex_mask *mask;
-   int i, j;
+   uint32_t i, j;

for (i = 0; i < flex_conf->nb_flexmasks; i++) {
mask = &flex_conf->flex_mask[i];
printf("\n%s:\t", flowtype_str[mask->flow_type]);
-   for (j = 0; j < RTE_ETH_FDIR_MAX_FLEXLEN; j++)
+   for (j = 0; j < num; j++)
printf(" %02x", mask->mask[j]);
}
printf("\n");
@@ -1885,26 +1906,8 @@ fdir_get_infos(portid_t port_id)
return;
ret = rte_eth_dev_filter_supported(port_id, RTE_ETH_FILTER_FDIR);
if (ret < 0) {
-   /* use the old fdir APIs to get info */
-   struct rte_eth_fdir fdir;
-   memset(&fdir, 0, sizeof(fdir));
-   ret = rte_eth_dev_fdir_get_infos(port_id, &fdir);
-   if (ret < 0) {
-   printf("\n getting fdir info fails on port %-2d\n",
-   port_id);
-   return;
-   }
-   printf("\n  %s FDIR infos for port %-2d %s\n",
-   fdir_stats_border, port_id, fdir_stats_border);
-   printf("  collision: %-10"PRIu64"  free: %"PRIu64"\n"
-  "  maxhash:   %-10"PRIu64"  maxlen:   %"PRIu64"\n"
-  "  add:   %-10"PRIu64"  remove:   %"PRIu64"\n"
-  "  f_add: %-10"PRIu64"  f_remove: %"PRIu64"\n",
-  (uint64_t)(fdir.collision), (uint64_t)(fdir.free),
-  (uint64_t)(fdir.maxhash), (uint64_t)(fdir.maxlen),
-  fdir.add, fdir.remove, fdir.f_add, fdir.f_remove);
-   printf("  %s%s\n",
-  fdir_stats_border, fdir_stats_border);
+   printf("\n FDIR is not supported on port %-2d\n",
+   port_id);
return;
}

@@ -1933,18 +1936,28 @@ fdir_get_infos(portid_t port_id)
fdir_info.flex_payload_unit,
fdir_info.max_flex_payload_segment_num,
fdir_info.flex_bitmask_unit, fdir_info.max_flex_bitmask_num);
+  

[dpdk-dev] [PATCH 13/15] testpmd: set the default value of flow director's mask

2015-01-29 Thread Jingjing Wu
This patch sets the default values of the flow director's masks.

Signed-off-by: Jingjing Wu 
---
 app/test-pmd/testpmd.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 2773c10..8b41fe5 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -298,6 +298,19 @@ struct rte_fdir_conf fdir_conf = {
.mode = RTE_FDIR_MODE_NONE,
.pballoc = RTE_FDIR_PBALLOC_64K,
.status = RTE_FDIR_REPORT_STATUS,
+   .mask = {
+   .vlan_tci_mask = 0x0,
+   .ipv4_mask = {
+   .src_ip = 0x,
+   .dst_ip = 0x,
+   },
+   .ipv6_mask = {
+   .src_ip = {0x, 0x, 0x, 
0x},
+   .dst_ip = {0x, 0x, 0x, 
0x},
+   },
+   .src_port_mask = 0x,
+   .dst_port_mask = 0x,
+   },
.drop_queue = 127,
 };

-- 
1.9.3



[dpdk-dev] [PATCH 14/15] testpmd: remove old commands for flow director

2015-01-29 Thread Jingjing Wu
The following flow director filter commands are removed:
  - add_signature_filter
  - upd_signature_filter
  - rm_signature_filter
  - add_perfect_filter
  - upd_perfect_filter
  - rm_perfect_filter
  - set_masks_filter
  - set_ipv6_masks_filter

Signed-off-by: Jingjing Wu 
---
 app/test-pmd/cmdline.c | 573 +
 app/test-pmd/config.c  | 124 ---
 app/test-pmd/testpmd.h |  16 --
 3 files changed, 1 insertion(+), 712 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 3671d25..fb7eff5 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -115,7 +115,6 @@ static void cmd_help_brief_parsed(__attribute__((unused)) 
void *parsed_result,
"information.\n"
"help config : Configuration information.\n"
"help ports  : Configuring ports.\n"
-   "help flowdir: Flow Director filter help.\n"
"help registers  : Reading and setting port registers.\n"
"help filters: Filters configuration help.\n"
"help all: All of the above sections.\n\n"
@@ -490,71 +489,6 @@ static void cmd_help_long_parsed(void *parsed_result,
);
}

-
-   if (show_all || !strcmp(res->section, "flowdir")) {
-
-   cmdline_printf(
-   cl,
-   "\n"
-   "Flow director mode:\n"
-   "---\n\n"
-
-   "add_signature_filter (port_id) (ip|udp|tcp|sctp)"
-   " src (src_ip_address) (src_port)"
-   " dst (dst_ip_address) (dst_port)"
-   " flexbytes (flexbytes_values) vlan (vlan_id)"
-   " queue (queue_id)\n"
-   "Add a signature filter.\n\n"
-
-   "upd_signature_filter (port_id) (ip|udp|tcp|sctp)"
-   " src (src_ip_address) (src_port)"
-   " dst (dst_ip_address) (dst_port)"
-   " flexbytes (flexbytes_values) vlan (vlan_id)"
-   " queue (queue_id)\n"
-   "Update a signature filter.\n\n"
-
-   "rm_signature_filter (port_id) (ip|udp|tcp|sctp)"
-   " src (src_ip_address) (src_port)"
-   " dst (dst_ip_address) (dst_port)"
-   " flexbytes (flexbytes_values) vlan (vlan_id)\n"
-   "Remove a signature filter.\n\n"
-
-   "add_perfect_filter (port_id) (ip|udp|tcp|sctp)"
-   " src (src_ip_address) (src_port)"
-   " dst (dst_ip_address) (dst_port)"
-   " flexbytes (flexbytes_values) vlan (vlan_id)"
-   " queue (queue_id) soft (soft_id)\n"
-   "Add a perfect filter.\n\n"
-
-   "upd_perfect_filter (port_id) (ip|udp|tcp|sctp)"
-   " src (src_ip_address) (src_port)"
-   " dst (dst_ip_address) (dst_port)"
-   " flexbytes (flexbytes_values) vlan (vlan_id)"
-   " queue (queue_id)\n"
-   "Update a perfect filter.\n\n"
-
-   "rm_perfect_filter (port_id) (ip|udp|tcp|sctp)"
-   " src (src_ip_address) (src_port)"
-   " dst (dst_ip_address) (dst_port)"
-   " flexbytes (flexbytes_values) vlan (vlan_id)"
-   " soft (soft_id)\n"
-   "Remove a perfect filter.\n\n"
-
-   "set_masks_filter (port_id) only_ip_flow (0|1)"
-   " src_mask (ip_src_mask) (src_port_mask)"
-   " dst_mask (ip_dst_mask) (dst_port_mask)"
-   " flexbytes (0|1) vlan_id (0|1) vlan_prio (0|1)\n"
-   "Set IPv4 filter masks.\n\n"
-
-   "set_ipv6_masks_filter (port_id) only_ip_flow (0|1)"
-   " src_mask (ip_src_mask) (src_port_mask)"
-   " dst_mask (ip_dst_mask) (dst_port_mask)"
-   " flexbytes (0|1) vlan_id (0|1) vlan_prio (0|1)"
-   " compare_dst (0|1)\n"
-   "Set IPv6 filter masks.\n\n"
-   );
-   }
-
if (show_all || !strcmp(res->section, "ports")) {

cmdline_printf(
@@ -750,7 +684,7 @@ cmdline_parse_token_string_t cmd_help_long_help =

 cmdline_parse_token_string_t cmd_help_long_section =
TOKEN_STRING_INITIALIZER(struct cmd_help_long_result, section,
-   "all#control#display#config#flowdir#"
+   "all#control#display#config#"
"ports#registers#filters");

 cmdline_parse_inst_t cmd_help_long = {
@@ -4361,503 +4295

[dpdk-dev] [PATCH 11/15] testpmd: add and update commands for flow director

2015-01-29 Thread Jingjing Wu
Add new command to set flow director's mask:
  - flow_director_mask
Update arguments of commands:
  - flow_director_filter
  - flow_director_flex_mask
  - flow_director_flex_payload

Signed-off-by: Jingjing Wu 
---
 app/test-pmd/cmdline.c | 182 -
 1 file changed, 164 insertions(+), 18 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 4beb404..3671d25 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -701,25 +701,26 @@ static void cmd_help_long_parsed(void *parsed_result,
"get_flex_filter (port_id) index (idx)\n"
"get info of a flex filter.\n\n"

-   "flow_director_filter (port_id) (add|del)"
+   "flow_director_filter (port_id) (add|del|update)"
" flow (ip4|ip4-frag|ip6|ip6-frag)"
" src (src_ip_address) dst (dst_ip_address)"
-   " flexbytes (flexbytes_value)"
+   " vlan (vlan_value) flexbytes (flexbytes_value)"
" (drop|fwd) queue (queue_id) fd_id (fd_id_value)\n"
"Add/Del an IP type flow director filter.\n\n"

-   "flow_director_filter (port_id) (add|del)"
+   "flow_director_filter (port_id) (add|del|update)"
" flow (udp4|tcp4|udp6|tcp6)"
" src (src_ip_address) (src_port)"
" dst (dst_ip_address) (dst_port)"
-   " flexbytes (flexbytes_value)"
+   " vlan (vlan_value) flexbytes (flexbytes_value)"
" (drop|fwd) queue (queue_id) fd_id (fd_id_value)\n"
"Add/Del an UDP/TCP type flow director filter.\n\n"

-   "flow_director_filter (port_id) (add|del)"
+   "flow_director_filter (port_id) (add|del|update)"
" flow (sctp4|sctp6)"
-   " src (src_ip_address) dst (dst_ip_address)"
-   " tag (verification_tag)"
+   " src (src_ip_address) (src_port)"
+   " dst (dst_ip_address) (dst_port)"
+   " tag (verification_tag) vlan (vlan_value)"
" flexbytes (flexbytes_value) (drop|fwd)"
" queue (queue_id) fd_id (fd_id_value)\n"
"Add/Del a SCTP type flow director filter.\n\n"
@@ -727,13 +728,18 @@ static void cmd_help_long_parsed(void *parsed_result,
"flush_flow_director (port_id)\n"
"Flush all flow director entries of a device.\n\n"

+   "flow_director_mask (port_id) vlan (vlan_value)"
+   " src_mask (ipv4_src) (ipv6_src) (src_port)"
+   " dst_mask (ipv4_dst) (ipv6_dst) (dst_port)\n"
+   "Set flow director mask.\n\n"
+
"flow_director_flex_mask (port_id)"
-   " flow 
(ip4|ip4-frag|tcp4|udp4|sctp4|ip6|ip6-frag|tcp6|udp6|sctp6|all)"
+   " flow 
(raw|ip4|ip4-frag|tcp4|udp4|sctp4|ip6|ip6-frag|tcp6|udp6|sctp6|all)"
" (mask)\n"
"Configure mask of flex payload.\n\n"

"flow_director_flex_payload (port_id)"
-   " (l2|l3|l4) (config)\n"
+   " (raw|l2|l3|l4) (config)\n"
"Configure flex payload selection.\n\n"
);
}
@@ -8084,6 +8090,8 @@ struct cmd_flow_director_result {
uint16_t port_dst;
cmdline_fixed_string_t verify_tag;
uint32_t verify_tag_value;
+   cmdline_fixed_string_t vlan;
+   uint16_t vlan_value;
cmdline_fixed_string_t flexbytes;
cmdline_fixed_string_t flexbytes_value;
cmdline_fixed_string_t drop;
@@ -8139,6 +8147,7 @@ str2flowtype(char *string)
char str[32];
enum rte_eth_flow_type type;
} flowtype_str[] = {
+   {"raw", RTE_ETH_FLOW_TYPE_RAW},
{"ip4", RTE_ETH_FLOW_TYPE_IPV4_OTHER},
{"ip4-frag", RTE_ETH_FLOW_TYPE_FRAG_IPV4},
{"udp4", RTE_ETH_FLOW_TYPE_UDPV4},
@@ -8209,6 +8218,7 @@ cmd_flow_director_filter_parsed(void *parsed_result,
entry.input.flow_type = str2flowtype(res->flow_type);
switch (entry.input.flow_type) {
case RTE_ETH_FLOW_TYPE_IPV4_OTHER:
+   case RTE_ETH_FLOW_TYPE_FRAG_IPV4:
case RTE_ETH_FLOW_TYPE_UDPV4:
case RTE_ETH_FLOW_TYPE_TCPV4:
IPV4_ADDR_TO_UINT(res->ip_dst,
@@ -8231,6 +8241,7 @@ cmd_flow_director_filter_parsed(void *parsed_result,
rte_cpu_to_be_32(res->verify_tag_value);
break;
case RTE_ETH_FLOW_TYPE_IPV6_OTHER:
+   case RTE_E

[dpdk-dev] [PATCH 15/15] doc: commands changed in testpmd_funcs.rst for flow director

2015-01-29 Thread Jingjing Wu
Following commands of flow director filter are removed:
  - add_signature_filter
  - upd_signature_filter
  - rm_signature_filter
  - add_perfect_filter
  - upd_perfect_filter
  - rm_perfect_filter
  - set_masks_filter
  - set_ipv6_masks_filter
New command is added to set flow director's mask:
  - flow_director_mask
Update arguments of commands:
  - flow_director_filter
  - flow_director_flex_mask
  - flow_director_flex_payload

Signed-off-by: Jingjing Wu 
---
 doc/guides/testpmd_app_ug/testpmd_funcs.rst | 287 ++--
 1 file changed, 100 insertions(+), 187 deletions(-)

diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst 
b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 218835a..70fe9ea 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -71,7 +71,6 @@ These are divided into sections and can be accessed using 
help, help section or
 help display: Displaying port, stats and config information.
 help config : Configuration information.
 help ports  : Configuring ports.
-help flowdir: Flow Director filter help.
 help registers  : Reading and setting port registers.
 help filters: Filters configuration help.
 help all: All of the above sections.
@@ -971,192 +970,6 @@ Where the threshold type can be:
 *   txrst: Set the transmit RS bit threshold of TX rings, 0 <= value <= txd.
 These threshold options are also available from the command-line.

-Flow Director Functions

-
-The Flow Director works in receive mode to identify specific flows or sets of 
flows and route them to specific queues.
-
-Two types of filtering are supported which are referred to as Perfect Match 
and Signature filters:
-
-*   Perfect match filters.
-The hardware checks a match between the masked fields of the received 
packets and the programmed filters.
-
-*   Signature filters.
-The hardware checks a match between a hash-based signature of the masked 
fields of the received packet.
-
-The Flow Director filters can match the following fields in a packet:
-
-*   Source IP and destination IP addresses.
-
-*   Source port and destination port numbers (for UDP and TCP packets).
-
-*   IPv4/IPv6 and UDP/ TCP/SCTP protocol match.
-
-*   VLAN header.
-
-*   Flexible 2-byte tuple match anywhere in the first 64 bytes of the packet.
-
-The Flow Director can also mask out parts of all of these fields so that 
filters are only applied to certain fields
-or parts of the fields.
-For example it is possible to mask out sub-nets of IP addresses or to ignore 
VLAN headers.
-
-In the following sections, several common parameters are used in the Flow 
Director filters.
-These are explained below:
-
-*   src: A pair of source address values. The source IP, in IPv4 or IPv6 
format, and the source port:
-
-src 192.168.0.1 1024
-
-src 2001:DB8:85A3:0:0:8A2E:370:7000 1024
-
-*   dst: A pair of destination address values. The destination IP, in IPv4 or 
IPv6 format, and the destination port.
-
-*   flexbytes: A 2-byte tuple to be matched within the first 64 bytes of a 
packet.
-
-The offset where the match occurs is set by the --pkt-filter-flexbytes-offset 
command-line parameter
-and is counted from the first byte of the destination Ethernet MAC address.
-The default offset is 0xC bytes, which is the "Type" word in the MAC header.
-Typically, the flexbyte value is set to 0x0800 to match the IPv4 MAC type or 
0x86DD to match IPv6.
-These values change when a VLAN tag is added.
-
-*   vlan: The VLAN header to match in the packet.
-
-*   queue: The index of the RX queue to route matched packets to.
-
-*   soft: The 16-bit value in the MBUF flow director ID field for RX packets 
matching the filter.
-
-add_signature_filter
-
-
-Add a signature filter:
-
-# Command is displayed on several lines for clarity.
-
-add_signature_filter (port_id) (ip|udp|tcp|sctp)
-
-src (src_ip_address) (src_port)
-
-dst (dst_ip_address) (dst_port)
-
-flexbytes (flexbytes_values)
-
-vlan (vlan_id) queue (queue_id)
-
-upd_signature_filter
-
-
-Update a signature filter:
-
-# Command is displayed on several lines for clarity.
-
-upd_signature_filter (port_id) (ip|udp|tcp|sctp)
-
-src (src_ip_address) (src_port)
-
-dst (dst_ip_address) (dst_port)
-
-flexbytes (flexbytes_values)
-
-vlan (vlan_id) queue (queue_id)
-
-rm_signature_filter
-~~~
-
-Remove a signature filter:
-
-# Command is displayed on several lines for clarity.
-
-rm_signature_filter (port_id) (ip|udp|tcp|sctp)
-
-src (src_ip_address) (src_port)
-
-dst (dst_ip_address) (dst_port)
-
-flexbytes (flexbytes_values)
-
-vlan (vlan_id)
-
-add_perfect_filter
-~~
-
-Add a perfect filter:
-
-# Command is displayed on several lines for clarity.
-
-add_perfect_filter (port_id) (ip|udp|tcp|sctp)
-
-src (src_

[dpdk-dev] [PATCH v1 2/5] ixgbe: enable rx queue interrupts for both PF and VF

2015-01-29 Thread Zhou, Danny


> -Original Message-
> From: Qiu, Michael
> Sent: Thursday, January 29, 2015 11:40 AM
> To: Zhou, Danny; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v1 2/5] ixgbe: enable rx queue interrupts for 
> both PF and VF
> 
> On 1/28/2015 5:52 PM, Danny Zhou wrote:
> > Signed-off-by: Danny Zhou 
> > ---
> >  lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 371 
> > 
> >  lib/librte_pmd_ixgbe/ixgbe_ethdev.h |   9 +
> >  2 files changed, 380 insertions(+)
> >
> > diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c 
> > b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
> > index b341dd0..39f883a 100644
> > --- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
> > +++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
> > @@ -60,6 +60,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >
> >  #include "ixgbe_logs.h"
> > @@ -173,6 +174,7 @@ static int ixgbe_dev_rss_reta_query(struct rte_eth_dev 
> > *dev,
> > uint16_t reta_size);
> >  static void ixgbe_dev_link_status_print(struct rte_eth_dev *dev);
> >  static int ixgbe_dev_lsc_interrupt_setup(struct rte_eth_dev *dev);
> > +static int ixgbe_dev_rxq_interrupt_setup(struct rte_eth_dev *dev);
> >  static int ixgbe_dev_interrupt_get_status(struct rte_eth_dev *dev);
> >  static int ixgbe_dev_interrupt_action(struct rte_eth_dev *dev);
> >  static void ixgbe_dev_interrupt_handler(struct rte_intr_handle *handle,
> > @@ -186,11 +188,14 @@ static void ixgbe_dcb_init(struct ixgbe_hw *hw,struct 
> > ixgbe_dcb_config *dcb_conf
> >  /* For Virtual Function support */
> >  static int eth_ixgbevf_dev_init(struct eth_driver *eth_drv,
> > struct rte_eth_dev *eth_dev);
> > +static int ixgbevf_dev_interrupt_get_status(struct rte_eth_dev *dev);
> > +static int ixgbevf_dev_interrupt_action(struct rte_eth_dev *dev);
> >  static int  ixgbevf_dev_configure(struct rte_eth_dev *dev);
> >  static int  ixgbevf_dev_start(struct rte_eth_dev *dev);
> >  static void ixgbevf_dev_stop(struct rte_eth_dev *dev);
> >  static void ixgbevf_dev_close(struct rte_eth_dev *dev);
> >  static void ixgbevf_intr_disable(struct ixgbe_hw *hw);
> > +static void ixgbevf_intr_enable(struct ixgbe_hw *hw);
> >  static void ixgbevf_dev_stats_get(struct rte_eth_dev *dev,
> > struct rte_eth_stats *stats);
> >  static void ixgbevf_dev_stats_reset(struct rte_eth_dev *dev);
> > @@ -198,8 +203,15 @@ static int ixgbevf_vlan_filter_set(struct rte_eth_dev 
> > *dev,
> > uint16_t vlan_id, int on);
> >  static void ixgbevf_vlan_strip_queue_set(struct rte_eth_dev *dev,
> > uint16_t queue, int on);
> > +static void ixgbevf_set_ivar(struct ixgbe_hw *hw, s8 direction, u8 queue, 
> > u8 msix_vector);
> 
> ^^^
> >  static void ixgbevf_vlan_offload_set(struct rte_eth_dev *dev, int mask);
> >  static void ixgbevf_set_vfta_all(struct rte_eth_dev *dev, bool on);
> > +static void ixgbevf_dev_interrupt_handler(struct rte_intr_handle *handle,
> > +   void *param);
> > +static int ixgbevf_dev_rx_queue_intr_enable(struct rte_eth_dev *dev, 
> > uint16_t queue_id);
> > +static int ixgbevf_dev_rx_queue_intr_disable(struct rte_eth_dev *dev, 
> > uint16_t queue_id);
> > +static void ixgbevf_set_ivar(struct ixgbe_hw *hw, s8 direction, u8 queue, 
> > u8 msix_vector);
> 
> Yes re-claim static void ixgbevf_set_ivar()  for twice? Or are they
> different?
> 

Good catch.

> > +static void ixgbevf_configure_msix(struct  ixgbe_hw *hw);
> >
> >  /* For Eth VMDQ APIs support */
> >  static int ixgbe_uc_hash_table_set(struct rte_eth_dev *dev, struct
> > @@ -217,6 +229,11 @@ static int ixgbe_mirror_rule_set(struct rte_eth_dev 
> > *dev,
> >  static int ixgbe_mirror_rule_reset(struct rte_eth_dev *dev,
> > uint8_t rule_id);
> 
> [...]
> > +static void
> > +ixgbe_configure_msix(struct ixgbe_hw *hw)
> > +{
> > +   int queue_id;
> > +   u32 mask;
> > +   u32 gpie;
> > +
> > +   /* set GPIE for in MSI-x mode */
> > +   gpie = IXGBE_READ_REG(hw, IXGBE_GPIE);
> > +   gpie = IXGBE_GPIE_MSIX_MODE | IXGBE_GPIE_PBA_SUPPORT |
> > +  IXGBE_GPIE_OCD;
> > +   gpie |= IXGBE_GPIE_EIAME;
> 
> As you will override gpie with other flags why need to read the reg and
> save to gpie first?
> 
> Maybe read the reg to reset?
> 
> I guess should be:
> 
> + gpie = IXGBE_READ_REG(hw, IXGBE_GPIE);
> + gpie |= IXGBE_GPIE_MSIX_MODE | IXGBE_GPIE_PBA_SUPPORT |
> + IXGBE_GPIE_OCD | IXGBE_GPIE_EIAME;
> 
> Maybe not correct as I not familiar with IXGBE.
> 
> 
Accepted. 

> > +   /*
> > +* use EIAM to auto-mask when MSI-X interrupt is asserted
> > +* this saves a register write for every interrupt
> > +*/
> > +   switch (hw->mac.type) {
> > +   case ixgbe_mac_82598EB:
> > +   IXGBE_WRITE_REG(hw, IXGBE_EIAM, IXGBE_EICS_RTX_QUEUE);
> > +   break;
> > +   case ixgbe_mac_82599EB:
> > +   case ixgbe_mac_X540:
> > +   default:
> > +   IXGBE_WR

[dpdk-dev] [PATCH 2/2] i40e: enable internal switch of pf

2015-01-29 Thread Qiu, Michael
On 1/29/2015 12:57 PM, Wu, Jingjing wrote:
> Hi, Michael
>
>> -Original Message-
>> From: Qiu, Michael
>> Sent: Thursday, January 29, 2015 9:56 AM
>> To: Wu, Jingjing; dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH 2/2] i40e: enable internal switch of pf
>>
>> On 1/29/2015 9:42 AM, Jingjing Wu wrote:
>>> This patch enables PF's internal switch by setting ALLOWLOOPBACK flag
>>> when VEB is created. With this patch, traffic from PF can be switched
>>> on the VEB.
>>>
>>> Signed-off-by: Jingjing Wu 
>>> ---
>>>  lib/librte_pmd_i40e/i40e_ethdev.c | 36
>>> 
>>>  1 file changed, 36 insertions(+)
>>>
>>> diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c
>>> b/lib/librte_pmd_i40e/i40e_ethdev.c
>>> index fe758c2..94fd36c 100644
>>> --- a/lib/librte_pmd_i40e/i40e_ethdev.c
>>> +++ b/lib/librte_pmd_i40e/i40e_ethdev.c
>>> @@ -2854,6 +2854,40 @@ i40e_vsi_dump_bw_config(struct i40e_vsi *vsi)
>>> return 0;
>>>  }
>>>
>>> +/*
>>> + * i40e_enable_pf_lb
>>> + * @pf: pointer to the pf structure
>>> + *
>>> + * allow loopback on pf
>>> + */
>>> +static inline void
>>> +i40e_enable_pf_lb(struct i40e_pf *pf) {
>>> +   struct i40e_hw *hw = I40E_PF_TO_HW(pf);
>>> +   struct i40e_vsi_context ctxt;
>>> +   int ret;
>>> +
>>> +   memset(&ctxt, 0, sizeof(ctxt));
>>> +   ctxt.seid = pf->main_vsi_seid;
>>> +   ctxt.pf_num = hw->pf_id;
>>> +   ret = i40e_aq_get_vsi_params(hw, &ctxt, NULL);
>>> +   if (ret) {
>>> +   PMD_DRV_LOG(ERR, "couldn't get pf vsi config, err %d,
>> aq_err %d",
>>> +   ret, hw->aq.asq_last_status);
>>> +   return;
>>> +   }
>>> +   ctxt.flags = I40E_AQ_VSI_TYPE_PF;
>>> +   ctxt.info.valid_sections =
>>> +   rte_cpu_to_le_16(I40E_AQ_VSI_PROP_SWITCH_VALID);
>> Here does it need to be "|=" ? As ctxt.infowill be filled in
>> i40e_aq_get_vsi_params(), I don't know if it has other issue for override 
>> this
>> filled by "=".
>>
>> Thanks,
>> Michael
> You can look at the following lines. What we called is 
> i40e_aq_update_vsi_params.
> So we need only set the flag we want to update.

Sorry, I made a mistake; what I mean is:

1. ret = i40e_aq_get_vsi_params(hw, &ctxt, NULL);
here will fill the field ctxt.info of struct i40e_vsi_context
ctxt, right?
So ctxt.info is obtained from elsewhere.

2. Then:

+   ctxt.info.valid_sections =
+   rte_cpu_to_le_16(I40E_AQ_VSI_PROP_SWITCH_VALID);

It has then been overridden by assigning a value, so I am just unsure whether
this causes an issue.

If I'm wrong, please ignore.


Thanks,
Michael

> Thanks
> Jingjing
>
>>> +   ctxt.info.switch_id |=
>>> +   rte_cpu_to_le_16(I40E_AQ_VSI_SW_ID_FLAG_ALLOW_LB);
>>> +
>>> +   ret = i40e_aq_update_vsi_params(hw, &ctxt, NULL);
>>> +   if (ret)
>>> +   PMD_DRV_LOG(ERR, "update vsi switch failed,
>> aq_err=%d\n",
>>> +   hw->aq.asq_last_status);
>>> +}
>>> +
>>>  /* Setup a VSI */
>>>  struct i40e_vsi *
>>>  i40e_vsi_setup(struct i40e_pf *pf,
>>> @@ -2889,6 +2923,8 @@ i40e_vsi_setup(struct i40e_pf *pf,
>>> PMD_DRV_LOG(ERR, "VEB setup failed");
>>> return NULL;
>>> }
>>> +   /* set ALLOWLOOPBACk on pf, when veb is created */
>>> +   i40e_enable_pf_lb(pf);
>>> }
>>>
>>> vsi = rte_zmalloc("i40e_vsi", sizeof(struct i40e_vsi), 0);
>



[dpdk-dev] [PATCH v2 0/4] DPDK memcpy optimization

2015-01-29 Thread Fu, JingguoX
Basic Information

Patch nameDPDK memcpy optimization v2
Brief description about test purposeVerify memory copy and memory 
copy performance cases on variety OS
Test Flag Tested-by
Tester name   jingguox.fu at intel.com

Test Tool Chain information N/A
  Commit ID 88fa98a60b34812bfed92e5b2706fcf7e1cbcbc8
Test Result Summary Total 6 cases, 6 passed, 0 failed

Test environment

-   Environment 1:
OS: Ubuntu12.04 3.2.0-23-generic X86_64
GCC: gcc version 4.6.3
CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
NIC: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ [8086:10fb] (rev 
01)

-   Environment 2: 
OS: Ubuntu14.04 3.13.0-24-generic
GCC: gcc version 4.8.2
CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
NIC: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ [8086:10fb] (rev 
01)

Environment 3:
OS: Fedora18 3.6.10-4.fc18.x86_64
GCC: gcc version 4.7.2 20121109
CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
NIC: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ [8086:10fb] (rev 
01)

Detailed Testing information

Test Case - name  test_memcpy
Test Case - Description 
  Create two buffers, and initialise one with random values. 
These are copied 
  to the second buffer and then compared to see if the copy was 
successful. The 
  bytes outside the copied area are also checked to make sure 
they were not changed.
Test Case -test sample/application
  test application in app/test
Test Case -command / instruction
  # ./app/test/test -n 1 -c 
  #RTE>> memcpy_autotest
Test Case - expected
  #RTE>> Test   OK
Test Result- PASSED

Test Case - name  test_memcpy_perf
Test Case - Description
  a number of different sizes and cached/uncached permutations
Test Case -test sample/application
  test application in app/test
Test Case -command / instruction
  # ./app/test/test -n 1 -c 
  #RTE>> memcpy_perf_autotest
Test Case - expected
  #RTE>> Test   OK
Test Result- PASSED


-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Zhihong Wang
Sent: Thursday, January 29, 2015 10:39
To: dev at dpdk.org
Subject: [dpdk-dev] [PATCH v2 0/4] DPDK memcpy optimization

This patch set optimizes memcpy for DPDK for both SSE and AVX platforms.
It also extends memcpy test coverage with unaligned cases and more test points.

Optimization techniques are summarized below:

1. Utilize full cache bandwidth

2. Enforce aligned stores

3. Apply load address alignment based on architecture features

4. Make load/store address available as early as possible

5. General optimization techniques like inlining, branch reducing, prefetch 
pattern access

--
Changes in v2:

1. Reduced constant test cases in app/test/test_memcpy_perf.c for fast build

2. Modified macro definition for better code readability & safety

Zhihong Wang (4):
  app/test: Disabled VTA for memcpy test in app/test/Makefile
  app/test: Removed unnecessary test cases in app/test/test_memcpy.c
  app/test: Extended test coverage in app/test/test_memcpy_perf.c
  lib/librte_eal: Optimized memcpy in arch/x86/rte_memcpy.h for both SSE
and AVX platforms

 app/test/Makefile  |   6 +
 app/test/test_memcpy.c |  52 +-
 app/test/test_memcpy_perf.c| 220 ---
 .../common/include/arch/x86/rte_memcpy.h   | 680 +++--
 4 files changed, 654 insertions(+), 304 deletions(-)

-- 
1.9.3



[dpdk-dev] [PATCH 2/2] i40e: enable internal switch of pf

2015-01-29 Thread Wu, Jingjing
Hi, Michael

> -Original Message-
> From: Qiu, Michael
> Sent: Thursday, January 29, 2015 2:06 PM
> To: Wu, Jingjing; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 2/2] i40e: enable internal switch of pf
> 
> On 1/29/2015 12:57 PM, Wu, Jingjing wrote:
> > Hi, Michael
> >
> >> -Original Message-
> >> From: Qiu, Michael
> >> Sent: Thursday, January 29, 2015 9:56 AM
> >> To: Wu, Jingjing; dev at dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH 2/2] i40e: enable internal switch of
> >> pf
> >>
> >> On 1/29/2015 9:42 AM, Jingjing Wu wrote:
> >>> This patch enables PF's internal switch by setting ALLOWLOOPBACK
> >>> flag when VEB is created. With this patch, traffic from PF can be
> >>> switched on the VEB.
> >>>
> >>> Signed-off-by: Jingjing Wu 
> >>> ---
> >>>  lib/librte_pmd_i40e/i40e_ethdev.c | 36
> >>> 
> >>>  1 file changed, 36 insertions(+)
> >>>
> >>> diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c
> >>> b/lib/librte_pmd_i40e/i40e_ethdev.c
> >>> index fe758c2..94fd36c 100644
> >>> --- a/lib/librte_pmd_i40e/i40e_ethdev.c
> >>> +++ b/lib/librte_pmd_i40e/i40e_ethdev.c
> >>> @@ -2854,6 +2854,40 @@ i40e_vsi_dump_bw_config(struct i40e_vsi
> *vsi)
> >>>   return 0;
> >>>  }
> >>>
> >>> +/*
> >>> + * i40e_enable_pf_lb
> >>> + * @pf: pointer to the pf structure
> >>> + *
> >>> + * allow loopback on pf
> >>> + */
> >>> +static inline void
> >>> +i40e_enable_pf_lb(struct i40e_pf *pf) {
> >>> + struct i40e_hw *hw = I40E_PF_TO_HW(pf);
> >>> + struct i40e_vsi_context ctxt;
> >>> + int ret;
> >>> +
> >>> + memset(&ctxt, 0, sizeof(ctxt));
> >>> + ctxt.seid = pf->main_vsi_seid;
> >>> + ctxt.pf_num = hw->pf_id;
> >>> + ret = i40e_aq_get_vsi_params(hw, &ctxt, NULL);
> >>> + if (ret) {
> >>> + PMD_DRV_LOG(ERR, "couldn't get pf vsi config, err %d,
> >> aq_err %d",
> >>> + ret, hw->aq.asq_last_status);
> >>> + return;
> >>> + }
> >>> + ctxt.flags = I40E_AQ_VSI_TYPE_PF;
> >>> + ctxt.info.valid_sections =
> >>> + rte_cpu_to_le_16(I40E_AQ_VSI_PROP_SWITCH_VALID);
> >> Here does it need to be "|=" ? As ctxt.infowill be filled in
> >> i40e_aq_get_vsi_params(), I don't know if it has other issue for
> >> override this filled by "=".
> >>
> >> Thanks,
> >> Michael
> > You can look at the following lines. What we called is
> i40e_aq_update_vsi_params.
> > So we need only set the flag we want to update.
> 
> Sorry, I make a mistake, what I mean is:
> 
> 1. ret = i40e_aq_get_vsi_params(hw, &ctxt, NULL);
> here will fill the the field  ctxt.info of struct i40e_vsi_context ctxt 
> right?
> So ctxt.info is get from other place.
> 
> 2. Then:
> 
> + ctxt.info.valid_sections =
> + rte_cpu_to_le_16(I40E_AQ_VSI_PROP_SWITCH_VALID);
> 
> Has been override by assignment a value, so I just confuse whether it has
> some issue.
> 
> If I'm wrong, please ignore. 
> 
> 
> Thanks,
> Michael
> 
I get your idea now. Some elements in ctxt are meaningless and not set when
getting, while others are meaningful when
updating. The valid_sections field is only meaningful when setting. If a flag in
valid_sections is set, it means the
hw needs to process the corresponding section.

> > Thanks
> > Jingjing
> >
> >>> + ctxt.info.switch_id |=
> >>> + rte_cpu_to_le_16(I40E_AQ_VSI_SW_ID_FLAG_ALLOW_LB);
> >>> +
> >>> + ret = i40e_aq_update_vsi_params(hw, &ctxt, NULL);
> >>> + if (ret)
> >>> + PMD_DRV_LOG(ERR, "update vsi switch failed,
> >> aq_err=%d\n",
> >>> + hw->aq.asq_last_status);
> >>> +}
> >>> +
> >>>  /* Setup a VSI */
> >>>  struct i40e_vsi *
> >>>  i40e_vsi_setup(struct i40e_pf *pf,
> >>> @@ -2889,6 +2923,8 @@ i40e_vsi_setup(struct i40e_pf *pf,
> >>>   PMD_DRV_LOG(ERR, "VEB setup failed");
> >>>   return NULL;
> >>>   }
> >>> + /* set ALLOWLOOPBACk on pf, when veb is created */
> >>> + i40e_enable_pf_lb(pf);
> >>>   }
> >>>
> >>>   vsi = rte_zmalloc("i40e_vsi", sizeof(struct i40e_vsi), 0);
> >



[dpdk-dev] [PATCH 2/2] i40e: enable internal switch of pf

2015-01-29 Thread Qiu, Michael
On 1/29/2015 2:27 PM, Wu, Jingjing wrote:
> Hi, Michael
>
>> -Original Message-
>> From: Qiu, Michael
>> Sent: Thursday, January 29, 2015 2:06 PM
>> To: Wu, Jingjing; dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH 2/2] i40e: enable internal switch of pf
>>
>> On 1/29/2015 12:57 PM, Wu, Jingjing wrote:
>>> Hi, Michael
>>>
 -Original Message-
 From: Qiu, Michael
 Sent: Thursday, January 29, 2015 9:56 AM
 To: Wu, Jingjing; dev at dpdk.org
 Subject: Re: [dpdk-dev] [PATCH 2/2] i40e: enable internal switch of
 pf

 On 1/29/2015 9:42 AM, Jingjing Wu wrote:
> This patch enables PF's internal switch by setting ALLOWLOOPBACK
> flag when VEB is created. With this patch, traffic from PF can be
> switched on the VEB.
>
> Signed-off-by: Jingjing Wu 
> ---
>  lib/librte_pmd_i40e/i40e_ethdev.c | 36
> 
>  1 file changed, 36 insertions(+)
>
> diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c
> b/lib/librte_pmd_i40e/i40e_ethdev.c
> index fe758c2..94fd36c 100644
> --- a/lib/librte_pmd_i40e/i40e_ethdev.c
> +++ b/lib/librte_pmd_i40e/i40e_ethdev.c
> @@ -2854,6 +2854,40 @@ i40e_vsi_dump_bw_config(struct i40e_vsi
>> *vsi)
>   return 0;
>  }
>
> +/*
> + * i40e_enable_pf_lb
> + * @pf: pointer to the pf structure
> + *
> + * allow loopback on pf
> + */
> +static inline void
> +i40e_enable_pf_lb(struct i40e_pf *pf) {
> + struct i40e_hw *hw = I40E_PF_TO_HW(pf);
> + struct i40e_vsi_context ctxt;
> + int ret;
> +
> + memset(&ctxt, 0, sizeof(ctxt));
> + ctxt.seid = pf->main_vsi_seid;
> + ctxt.pf_num = hw->pf_id;
> + ret = i40e_aq_get_vsi_params(hw, &ctxt, NULL);
> + if (ret) {
> + PMD_DRV_LOG(ERR, "couldn't get pf vsi config, err %d,
 aq_err %d",
> + ret, hw->aq.asq_last_status);
> + return;
> + }
> + ctxt.flags = I40E_AQ_VSI_TYPE_PF;
> + ctxt.info.valid_sections =
> + rte_cpu_to_le_16(I40E_AQ_VSI_PROP_SWITCH_VALID);
 Here does it need to be "|=" ? As ctxt.infowill be filled in
 i40e_aq_get_vsi_params(), I don't know if it has other issue for
 override this filled by "=".

 Thanks,
 Michael
>>> You can look at the following lines. What we called is
>> i40e_aq_update_vsi_params.
>>> So we need only set the flag we want to update.
>> Sorry, I make a mistake, what I mean is:
>>
>> 1. ret = i40e_aq_get_vsi_params(hw, &ctxt, NULL);
>> here will fill the the field  ctxt.info of struct i40e_vsi_context ctxt 
>> right?
>> So ctxt.info is get from other place.
>>
>> 2. Then:
>>
>> +ctxt.info.valid_sections =
>> +rte_cpu_to_le_16(I40E_AQ_VSI_PROP_SWITCH_VALID);
>>
>> Has been override by assignment a value, so I just confuse whether it has
>> some issue.
>>
>> If I'm wrong, please ignore. 
>>
>>
>> Thanks,
>> Michael
>>
> I get your idea now. Some elements in ctxt is meaningless and not set when 
> getting, and others are meaningful when
> updating. The valid_sections is only meaningful when setting. If one flag in 
> valid_section is set, it means the
> hw need to process corresponding section.

OK, as it meaningless, I agree with you.

Thanks,
Michael
>>> Thanks
>>> Jingjing
>>>
> + ctxt.info.switch_id |=
> + rte_cpu_to_le_16(I40E_AQ_VSI_SW_ID_FLAG_ALLOW_LB);
> +
> + ret = i40e_aq_update_vsi_params(hw, &ctxt, NULL);
> + if (ret)
> + PMD_DRV_LOG(ERR, "update vsi switch failed,
 aq_err=%d\n",
> + hw->aq.asq_last_status);
> +}
> +
>  /* Setup a VSI */
>  struct i40e_vsi *
>  i40e_vsi_setup(struct i40e_pf *pf,
> @@ -2889,6 +2923,8 @@ i40e_vsi_setup(struct i40e_pf *pf,
>   PMD_DRV_LOG(ERR, "VEB setup failed");
>   return NULL;
>   }
> + /* set ALLOWLOOPBACk on pf, when veb is created */
> + i40e_enable_pf_lb(pf);
>   }
>
>   vsi = rte_zmalloc("i40e_vsi", sizeof(struct i40e_vsi), 0);
>



[dpdk-dev] [PATCH v3 01/25] virtio: Rearrange resource initialization

2015-01-29 Thread Ouyang Changchun
For clarity make the setup of PCI resources for Linux into a function rather
than block of code #ifdef'd in middle of dev_init.

Signed-off-by: Stephen Hemminger 
Signed-off-by: Changchun Ouyang 
---
 lib/librte_pmd_virtio/virtio_ethdev.c | 76 ---
 1 file changed, 43 insertions(+), 33 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index b3b5bb6..662a49c 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -794,6 +794,41 @@ virtio_has_msix(const struct rte_pci_addr *loc)

return (d != NULL);
 }
+
+/* Extract I/O port numbers from sysfs */
+static int virtio_resource_init(struct rte_pci_device *pci_dev)
+{
+   char dirname[PATH_MAX];
+   char filename[PATH_MAX];
+   unsigned long start, size;
+
+   if (get_uio_dev(&pci_dev->addr, dirname, sizeof(dirname)) < 0)
+   return -1;
+
+   /* get portio size */
+   snprintf(filename, sizeof(filename),
+"%s/portio/port0/size", dirname);
+   if (parse_sysfs_value(filename, &size) < 0) {
+   PMD_INIT_LOG(ERR, "%s(): cannot parse size",
+__func__);
+   return -1;
+   }
+
+   /* get portio start */
+   snprintf(filename, sizeof(filename),
+"%s/portio/port0/start", dirname);
+   if (parse_sysfs_value(filename, &start) < 0) {
+   PMD_INIT_LOG(ERR, "%s(): cannot parse portio start",
+__func__);
+   return -1;
+   }
+   pci_dev->mem_resource[0].addr = (void *)(uintptr_t)start;
+   pci_dev->mem_resource[0].len =  (uint64_t)size;
+   PMD_INIT_LOG(DEBUG,
+"PCI Port IO found start=0x%lx with size=0x%lx",
+start, size);
+   return 0;
+}
 #else
 static int
 virtio_has_msix(const struct rte_pci_addr *loc __rte_unused)
@@ -801,6 +836,12 @@ virtio_has_msix(const struct rte_pci_addr *loc 
__rte_unused)
/* nic_uio does not enable interrupts, return 0 (false). */
return 0;
 }
+
+static int virtio_resource_init(struct rte_pci_device *pci_dev __rte_unused)
+{
+   /* no setup required */
+   return 0;
+}
 #endif

 /*
@@ -831,40 +872,9 @@ eth_virtio_dev_init(__rte_unused struct eth_driver 
*eth_drv,
return 0;

pci_dev = eth_dev->pci_dev;
+   if (virtio_resource_init(pci_dev) < 0)
+   return -1;

-#ifdef RTE_EXEC_ENV_LINUXAPP
-   {
-   char dirname[PATH_MAX];
-   char filename[PATH_MAX];
-   unsigned long start, size;
-
-   if (get_uio_dev(&pci_dev->addr, dirname, sizeof(dirname)) < 0)
-   return -1;
-
-   /* get portio size */
-   snprintf(filename, sizeof(filename),
-"%s/portio/port0/size", dirname);
-   if (parse_sysfs_value(filename, &size) < 0) {
-   PMD_INIT_LOG(ERR, "%s(): cannot parse size",
-__func__);
-   return -1;
-   }
-
-   /* get portio start */
-   snprintf(filename, sizeof(filename),
-"%s/portio/port0/start", dirname);
-   if (parse_sysfs_value(filename, &start) < 0) {
-   PMD_INIT_LOG(ERR, "%s(): cannot parse portio start",
-__func__);
-   return -1;
-   }
-   pci_dev->mem_resource[0].addr = (void *)(uintptr_t)start;
-   pci_dev->mem_resource[0].len =  (uint64_t)size;
-   PMD_INIT_LOG(DEBUG,
-"PCI Port IO found start=0x%lx with size=0x%lx",
-start, size);
-   }
-#endif
hw->use_msix = virtio_has_msix(&pci_dev->addr);
hw->io_base = (uint32_t)(uintptr_t)pci_dev->mem_resource[0].addr;

-- 
1.8.4.2



[dpdk-dev] [PATCH v3 04/25] virtio: Add support for Link State interrupt

2015-01-29 Thread Ouyang Changchun
Virtio has link state interrupt which can be used.

Signed-off-by: Stephen Hemminger 
Signed-off-by: Changchun Ouyang 
---
 lib/librte_pmd_virtio/virtio_ethdev.c | 78 +++
 lib/librte_pmd_virtio/virtio_pci.c| 22 ++
 lib/librte_pmd_virtio/virtio_pci.h|  4 ++
 3 files changed, 86 insertions(+), 18 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index 5df3b54..ef87ff8 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -845,6 +845,34 @@ static int virtio_resource_init(struct rte_pci_device 
*pci_dev __rte_unused)
 #endif

 /*
+ * Process Virtio Config changed interrupt and call the callback
+ * if link state changed.
+ */
+static void
+virtio_interrupt_handler(__rte_unused struct rte_intr_handle *handle,
+void *param)
+{
+   struct rte_eth_dev *dev = param;
+   struct virtio_hw *hw =
+   VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   uint8_t isr;
+
+   /* Read interrupt status which clears interrupt */
+   isr = vtpci_isr(hw);
+   PMD_DRV_LOG(INFO, "interrupt status = %#x", isr);
+
+   if (rte_intr_enable(&dev->pci_dev->intr_handle) < 0)
+   PMD_DRV_LOG(ERR, "interrupt enable failed");
+
+   if (isr & VIRTIO_PCI_ISR_CONFIG) {
+   if (virtio_dev_link_update(dev, 0) == 0)
+   _rte_eth_dev_callback_process(dev,
+ RTE_ETH_EVENT_INTR_LSC);
+   }
+
+}
+
+/*
  * This function is based on probe() function in virtio_pci.c
  * It returns 0 on success.
  */
@@ -968,6 +996,10 @@ eth_virtio_dev_init(__rte_unused struct eth_driver 
*eth_drv,
PMD_INIT_LOG(DEBUG, "port %d vendorID=0x%x deviceID=0x%x",
eth_dev->data->port_id, pci_dev->id.vendor_id,
pci_dev->id.device_id);
+
+   /* Setup interrupt callback  */
+   rte_intr_callback_register(&pci_dev->intr_handle,
+  virtio_interrupt_handler, eth_dev);
return 0;
 }

@@ -975,7 +1007,7 @@ static struct eth_driver rte_virtio_pmd = {
{
.name = "rte_virtio_pmd",
.id_table = pci_id_virtio_map,
-   .drv_flags = RTE_PCI_DRV_NEED_MAPPING,
+   .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
},
.eth_dev_init = eth_virtio_dev_init,
.dev_private_size = sizeof(struct virtio_adapter),
@@ -1021,6 +1053,9 @@ static int
 virtio_dev_configure(struct rte_eth_dev *dev)
 {
const struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
+   struct virtio_hw *hw =
+   VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   int ret;

PMD_INIT_LOG(DEBUG, "configure");

@@ -1029,7 +1064,11 @@ virtio_dev_configure(struct rte_eth_dev *dev)
return (-EINVAL);
}

-   return 0;
+   ret = vtpci_irq_config(hw, 0);
+   if (ret != 0)
+   PMD_DRV_LOG(ERR, "failed to set config vector");
+
+   return ret;
 }


@@ -1037,7 +1076,6 @@ static int
 virtio_dev_start(struct rte_eth_dev *dev)
 {
uint16_t nb_queues, i;
-   uint16_t status;
struct virtio_hw *hw =
VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);

@@ -1052,18 +1090,22 @@ virtio_dev_start(struct rte_eth_dev *dev)
/* Do final configuration before rx/tx engine starts */
virtio_dev_rxtx_start(dev);

-   /* Check VIRTIO_NET_F_STATUS for link status*/
-   if (vtpci_with_feature(hw, VIRTIO_NET_F_STATUS)) {
-   vtpci_read_dev_config(hw,
-   offsetof(struct virtio_net_config, status),
-   &status, sizeof(status));
-   if ((status & VIRTIO_NET_S_LINK_UP) == 0)
-   PMD_INIT_LOG(ERR, "Port: %d Link is DOWN",
-dev->data->port_id);
-   else
-   PMD_INIT_LOG(DEBUG, "Port: %d Link is UP",
-dev->data->port_id);
+   /* check if lsc interrupt feature is enabled */
+   if (dev->data->dev_conf.intr_conf.lsc) {
+   if (!vtpci_with_feature(hw, VIRTIO_NET_F_STATUS)) {
+   PMD_DRV_LOG(ERR, "link status not supported by host");
+   return -ENOTSUP;
+   }
+
+   if (rte_intr_enable(&dev->pci_dev->intr_handle) < 0) {
+   PMD_DRV_LOG(ERR, "interrupt enable failed");
+   return -EIO;
+   }
}
+
+   /* Initialize Link state */
+   virtio_dev_link_update(dev, 0);
+
vtpci_reinit_complete(hw);

/*Notify the backend
@@ -1145,6 +1187,7 @@ virtio_dev_stop(struct rte_eth_dev *dev)
VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);

/* reset the

[dpdk-dev] [PATCH v3 03/25] virtio: Allow starting with link down

2015-01-29 Thread Ouyang Changchun
Starting driver with link down should be ok, it is with every
other driver. So just allow it.

Signed-off-by: Stephen Hemminger 
Signed-off-by: Changchun Ouyang 
---
 lib/librte_pmd_virtio/virtio_ethdev.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index dc47e72..5df3b54 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -1057,14 +1057,12 @@ virtio_dev_start(struct rte_eth_dev *dev)
vtpci_read_dev_config(hw,
offsetof(struct virtio_net_config, status),
&status, sizeof(status));
-   if ((status & VIRTIO_NET_S_LINK_UP) == 0) {
+   if ((status & VIRTIO_NET_S_LINK_UP) == 0)
PMD_INIT_LOG(ERR, "Port: %d Link is DOWN",
 dev->data->port_id);
-   return -EIO;
-   } else {
+   else
PMD_INIT_LOG(DEBUG, "Port: %d Link is UP",
 dev->data->port_id);
-   }
}
vtpci_reinit_complete(hw);

-- 
1.8.4.2



[dpdk-dev] [PATCH v3 05/25] ether: Add soft vlan encap/decap functions

2015-01-29 Thread Ouyang Changchun
It is helpful to allow device drivers that don't support hardware
VLAN stripping to emulate this in software. This allows applications
to be device independent.

Avoid discarding shared mbufs. Make a copy in rte_vlan_insert() of any
packet to be tagged that has a reference count > 1.

Signed-off-by: Stephen Hemminger 
Signed-off-by: Changchun Ouyang 
---
 lib/librte_ether/rte_ether.h | 76 
 1 file changed, 76 insertions(+)

diff --git a/lib/librte_ether/rte_ether.h b/lib/librte_ether/rte_ether.h
index 7e7d22c..74f71c2 100644
--- a/lib/librte_ether/rte_ether.h
+++ b/lib/librte_ether/rte_ether.h
@@ -49,6 +49,8 @@ extern "C" {

 #include 
 #include 
+#include 
+#include 

 #define ETHER_ADDR_LEN  6 /**< Length of Ethernet address. */
 #define ETHER_TYPE_LEN  2 /**< Length of Ethernet type field. */
@@ -333,6 +335,80 @@ struct vxlan_hdr {
 #define ETHER_VXLAN_HLEN (sizeof(struct udp_hdr) + sizeof(struct vxlan_hdr))
 /**< VXLAN tunnel header length. */

+/**
+ * Extract VLAN tag information into mbuf
+ *
+ * Software version of VLAN stripping
+ *
+ * @param m
+ *   The packet mbuf.
+ * @return
+ *   - 0: Success
+ *   - 1: not a vlan packet
+ */
+static inline int rte_vlan_strip(struct rte_mbuf *m)
+{
+   struct ether_hdr *eh
+= rte_pktmbuf_mtod(m, struct ether_hdr *);
+
+   if (eh->ether_type != ETHER_TYPE_VLAN)
+   return -1;
+
+   struct vlan_hdr *vh = (struct vlan_hdr *)(eh + 1);
+   m->ol_flags |= PKT_RX_VLAN_PKT;
+   m->vlan_tci = rte_be_to_cpu_16(vh->vlan_tci);
+
+   /* Copy ether header over rather than moving whole packet */
+   memmove(rte_pktmbuf_adj(m, sizeof(struct vlan_hdr)),
+   eh, 2 * ETHER_ADDR_LEN);
+
+   return 0;
+}
+
+/**
+ * Insert VLAN tag into mbuf.
+ *
+ * Software version of VLAN unstripping
+ *
+ * @param m
+ *   The packet mbuf.
+ * @return
+ *   - 0: On success
+ *   -EPERM: mbuf is is shared overwriting would be unsafe
+ *   -ENOSPC: not enough headroom in mbuf
+ */
+static inline int rte_vlan_insert(struct rte_mbuf **m)
+{
+   struct ether_hdr *oh, *nh;
+   struct vlan_hdr *vh;
+
+#ifdef RTE_MBUF_REFCNT
+   /* Can't insert header if mbuf is shared */
+   if (rte_mbuf_refcnt_read(*m) > 1) {
+   struct rte_mbuf *copy;
+
+   copy = rte_pktmbuf_clone(*m, (*m)->pool);
+   if (unlikely(copy == NULL))
+   return -ENOMEM;
+   rte_pktmbuf_free(*m);
+   *m = copy;
+   }
+#endif
+   oh = rte_pktmbuf_mtod(*m, struct ether_hdr *);
+   nh = (struct ether_hdr *)
+   rte_pktmbuf_prepend(*m, sizeof(struct vlan_hdr));
+   if (nh == NULL)
+   return -ENOSPC;
+
+   memmove(nh, oh, 2 * ETHER_ADDR_LEN);
+   nh->ether_type = ETHER_TYPE_VLAN;
+
+   vh = (struct vlan_hdr *) (nh + 1);
+   vh->vlan_tci = rte_cpu_to_be_16((*m)->vlan_tci);
+
+   return 0;
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
1.8.4.2



[dpdk-dev] [PATCH v3 02/25] virtio: Use weaker barriers

2015-01-29 Thread Ouyang Changchun
The DPDK driver only has to deal with the case of running on PCI
and with SMP. In this case, the code can use the weaker barriers
instead of using hard (fence) barriers. This will help performance.
The rationale is explained in Linux kernel virtio_ring.h.

To make it clearer that this is a virtio thing and not some generic
barrier, prefix the barrier calls with virtio_.

Add missing (and needed) barrier between updating ring data
structure and notifying host.

Signed-off-by: Stephen Hemminger 
Signed-off-by: Changchun Ouyang 
---
 lib/librte_pmd_virtio/virtio_ethdev.c |  2 +-
 lib/librte_pmd_virtio/virtio_rxtx.c   |  8 +---
 lib/librte_pmd_virtio/virtqueue.h | 19 ++-
 3 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index 662a49c..dc47e72 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -175,7 +175,7 @@ virtio_send_command(struct virtqueue *vq, struct 
virtio_pmd_ctrl *ctrl,
uint32_t idx, desc_idx, used_idx;
struct vring_used_elem *uep;

-   rmb();
+   virtio_rmb();

used_idx = (uint32_t)(vq->vq_used_cons_idx
& (vq->vq_nentries - 1));
diff --git a/lib/librte_pmd_virtio/virtio_rxtx.c 
b/lib/librte_pmd_virtio/virtio_rxtx.c
index c013f97..78af334 100644
--- a/lib/librte_pmd_virtio/virtio_rxtx.c
+++ b/lib/librte_pmd_virtio/virtio_rxtx.c
@@ -456,7 +456,7 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, 
uint16_t nb_pkts)

nb_used = VIRTQUEUE_NUSED(rxvq);

-   rmb();
+   virtio_rmb();

num = (uint16_t)(likely(nb_used <= nb_pkts) ? nb_used : nb_pkts);
num = (uint16_t)(likely(num <= VIRTIO_MBUF_BURST_SZ) ? num : 
VIRTIO_MBUF_BURST_SZ);
@@ -516,6 +516,7 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, 
uint16_t nb_pkts)
}

if (likely(nb_enqueued)) {
+   virtio_wmb();
if (unlikely(virtqueue_kick_prepare(rxvq))) {
virtqueue_notify(rxvq);
PMD_RX_LOG(DEBUG, "Notified\n");
@@ -547,7 +548,7 @@ virtio_recv_mergeable_pkts(void *rx_queue,

nb_used = VIRTQUEUE_NUSED(rxvq);

-   rmb();
+   virtio_rmb();

if (nb_used == 0)
return 0;
@@ -694,7 +695,7 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, 
uint16_t nb_pkts)
PMD_TX_LOG(DEBUG, "%d packets to xmit", nb_pkts);
nb_used = VIRTQUEUE_NUSED(txvq);

-   rmb();
+   virtio_rmb();

num = (uint16_t)(likely(nb_used < VIRTIO_MBUF_BURST_SZ) ? nb_used : 
VIRTIO_MBUF_BURST_SZ);

@@ -735,6 +736,7 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, 
uint16_t nb_pkts)
}
}
vq_update_avail_idx(txvq);
+   virtio_wmb();

txvq->packets += nb_tx;

diff --git a/lib/librte_pmd_virtio/virtqueue.h 
b/lib/librte_pmd_virtio/virtqueue.h
index fdee054..f6ad98d 100644
--- a/lib/librte_pmd_virtio/virtqueue.h
+++ b/lib/librte_pmd_virtio/virtqueue.h
@@ -46,9 +46,18 @@
 #include "virtio_ring.h"
 #include "virtio_logs.h"

-#define mb()  rte_mb()
-#define wmb() rte_wmb()
-#define rmb() rte_rmb()
+/*
+ * Per virtio_config.h in Linux.
+ * For virtio_pci on SMP, we don't need to order with respect to MMIO
+ * accesses through relaxed memory I/O windows, so smp_mb() et al are
+ * sufficient.
+ *
+ * This driver is for virtio_pci on SMP and therefore can assume
+ * weaker (compiler barriers)
+ */
+#define virtio_mb()rte_mb()
+#define virtio_rmb()   rte_compiler_barrier()
+#define virtio_wmb()   rte_compiler_barrier()

 #ifdef RTE_PMD_PACKET_PREFETCH
 #define rte_packet_prefetch(p)  rte_prefetch1(p)
@@ -225,7 +234,7 @@ virtqueue_full(const struct virtqueue *vq)
 static inline void
 vq_update_avail_idx(struct virtqueue *vq)
 {
-   rte_compiler_barrier();
+   virtio_rmb();
vq->vq_ring.avail->idx = vq->vq_avail_idx;
 }

@@ -255,7 +264,7 @@ static inline void
 virtqueue_notify(struct virtqueue *vq)
 {
/*
-* Ensure updated avail->idx is visible to host. mb() necessary?
+* Ensure updated avail->idx is visible to host.
 * For virtio on IA, the notificaiton is through io port operation
 * which is a serialization instruction itself.
 */
-- 
1.8.4.2



[dpdk-dev] [PATCH v3 06/25] virtio: Use software vlan stripping

2015-01-29 Thread Ouyang Changchun
Implement VLAN stripping in software. This allows application
to be device independent.

Signed-off-by: Stephen Hemminger 
Signed-off-by: Changchun Ouyang 
---
 lib/librte_ether/rte_ethdev.h |  3 +++
 lib/librte_pmd_virtio/virtio_ethdev.c |  2 ++
 lib/librte_pmd_virtio/virtio_pci.h|  1 +
 lib/librte_pmd_virtio/virtio_rxtx.c   | 20 ++--
 4 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 1200c1c..94d6b2b 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -643,6 +643,9 @@ struct rte_eth_rxconf {
 #define ETH_TXQ_FLAGS_NOOFFLOADS \
(ETH_TXQ_FLAGS_NOVLANOFFL | ETH_TXQ_FLAGS_NOXSUMSCTP | \
 ETH_TXQ_FLAGS_NOXSUMUDP  | ETH_TXQ_FLAGS_NOXSUMTCP)
+#define ETH_TXQ_FLAGS_NOXSUMS \
+   (ETH_TXQ_FLAGS_NOXSUMSCTP | ETH_TXQ_FLAGS_NOXSUMUDP | \
+ETH_TXQ_FLAGS_NOXSUMTCP)
 /**
  * A structure used to configure a TX ring of an Ethernet port.
  */
diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index ef87ff8..da74659 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -1064,6 +1064,8 @@ virtio_dev_configure(struct rte_eth_dev *dev)
return (-EINVAL);
}

+   hw->vlan_strip = rxmode->hw_vlan_strip;
+
ret = vtpci_irq_config(hw, 0);
if (ret != 0)
PMD_DRV_LOG(ERR, "failed to set config vector");
diff --git a/lib/librte_pmd_virtio/virtio_pci.h 
b/lib/librte_pmd_virtio/virtio_pci.h
index 6998737..6d93fac 100644
--- a/lib/librte_pmd_virtio/virtio_pci.h
+++ b/lib/librte_pmd_virtio/virtio_pci.h
@@ -168,6 +168,7 @@ struct virtio_hw {
uint32_tmax_tx_queues;
uint32_tmax_rx_queues;
uint16_tvtnet_hdr_size;
+   uint8_t vlan_strip;
uint8_t use_msix;
uint8_t mac_addr[ETHER_ADDR_LEN];
 };
diff --git a/lib/librte_pmd_virtio/virtio_rxtx.c 
b/lib/librte_pmd_virtio/virtio_rxtx.c
index 78af334..e0216ec 100644
--- a/lib/librte_pmd_virtio/virtio_rxtx.c
+++ b/lib/librte_pmd_virtio/virtio_rxtx.c
@@ -49,6 +49,7 @@
 #include 
 #include 
 #include 
+#include 

 #include "virtio_logs.h"
 #include "virtio_ethdev.h"
@@ -408,8 +409,8 @@ virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,

PMD_INIT_FUNC_TRACE();

-   if ((tx_conf->txq_flags & ETH_TXQ_FLAGS_NOOFFLOADS)
-   != ETH_TXQ_FLAGS_NOOFFLOADS) {
+   if ((tx_conf->txq_flags & ETH_TXQ_FLAGS_NOXSUMS)
+   != ETH_TXQ_FLAGS_NOXSUMS) {
PMD_INIT_LOG(ERR, "TX checksum offload not supported\n");
return -EINVAL;
}
@@ -446,6 +447,7 @@ uint16_t
 virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 {
struct virtqueue *rxvq = rx_queue;
+   struct virtio_hw *hw = rxvq->hw;
struct rte_mbuf *rxm, *new_mbuf;
uint16_t nb_used, num, nb_rx = 0;
uint32_t len[VIRTIO_MBUF_BURST_SZ];
@@ -489,6 +491,9 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, 
uint16_t nb_pkts)
rxm->pkt_len = (uint32_t)(len[i] - hdr_size);
rxm->data_len = (uint16_t)(len[i] - hdr_size);

+   if (hw->vlan_strip)
+   rte_vlan_strip(rxm);
+
VIRTIO_DUMP_PACKET(rxm, rxm->data_len);

rx_pkts[nb_rx++] = rxm;
@@ -717,6 +722,17 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf 
**tx_pkts, uint16_t nb_pkts)
 */
if (likely(need <= 0)) {
txm = tx_pkts[nb_tx];
+
+   /* Do VLAN tag insertion */
+   if (txm->ol_flags & PKT_TX_VLAN_PKT) {
+   error = rte_vlan_insert(&txm);
+   if (unlikely(error)) {
+   rte_pktmbuf_free(txm);
+   ++nb_tx;
+   continue;
+   }
+   }
+
/* Enqueue Packet buffers */
error = virtqueue_enqueue_xmit(txvq, txm);
if (unlikely(error)) {
-- 
1.8.4.2



[dpdk-dev] [PATCH v3 07/25] virtio: Remove unnecessary adapter structure

2015-01-29 Thread Ouyang Changchun
Cleanup virtio code by eliminating unnecessary nesting of
virtio hardware structure inside adapter structure.
Also allows removing unneeded macro, making code clearer.

Signed-off-by: Stephen Hemminger 
Signed-off-by: Changchun Ouyang 
---
 lib/librte_pmd_virtio/virtio_ethdev.c | 43 ---
 lib/librte_pmd_virtio/virtio_ethdev.h |  9 
 lib/librte_pmd_virtio/virtio_rxtx.c   |  3 +--
 3 files changed, 16 insertions(+), 39 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index da74659..59b74b7 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -207,8 +207,7 @@ virtio_send_command(struct virtqueue *vq, struct 
virtio_pmd_ctrl *ctrl,
 static int
 virtio_set_multiple_queues(struct rte_eth_dev *dev, uint16_t nb_queues)
 {
-   struct virtio_hw *hw
-   = VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct virtio_hw *hw = dev->data->dev_private;
struct virtio_pmd_ctrl ctrl;
int dlen[1];
int ret;
@@ -242,8 +241,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
const struct rte_memzone *mz;
uint16_t vq_size;
int size;
-   struct virtio_hw *hw =
-   VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct virtio_hw *hw = dev->data->dev_private;
struct virtqueue  *vq = NULL;

/* Write the virtqueue index to the Queue Select Field */
@@ -383,8 +381,7 @@ virtio_dev_cq_queue_setup(struct rte_eth_dev *dev, uint16_t 
vtpci_queue_idx,
struct virtqueue *vq;
uint16_t nb_desc = 0;
int ret;
-   struct virtio_hw *hw =
-   VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct virtio_hw *hw = dev->data->dev_private;

PMD_INIT_FUNC_TRACE();
ret = virtio_dev_queue_setup(dev, VTNET_CQ, VTNET_SQ_CQ_QUEUE_IDX,
@@ -410,8 +407,7 @@ virtio_dev_close(struct rte_eth_dev *dev)
 static void
 virtio_dev_promiscuous_enable(struct rte_eth_dev *dev)
 {
-   struct virtio_hw *hw
-   = VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct virtio_hw *hw = dev->data->dev_private;
struct virtio_pmd_ctrl ctrl;
int dlen[1];
int ret;
@@ -430,8 +426,7 @@ virtio_dev_promiscuous_enable(struct rte_eth_dev *dev)
 static void
 virtio_dev_promiscuous_disable(struct rte_eth_dev *dev)
 {
-   struct virtio_hw *hw
-   = VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct virtio_hw *hw = dev->data->dev_private;
struct virtio_pmd_ctrl ctrl;
int dlen[1];
int ret;
@@ -450,8 +445,7 @@ virtio_dev_promiscuous_disable(struct rte_eth_dev *dev)
 static void
 virtio_dev_allmulticast_enable(struct rte_eth_dev *dev)
 {
-   struct virtio_hw *hw
-   = VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct virtio_hw *hw = dev->data->dev_private;
struct virtio_pmd_ctrl ctrl;
int dlen[1];
int ret;
@@ -470,8 +464,7 @@ virtio_dev_allmulticast_enable(struct rte_eth_dev *dev)
 static void
 virtio_dev_allmulticast_disable(struct rte_eth_dev *dev)
 {
-   struct virtio_hw *hw
-   = VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct virtio_hw *hw = dev->data->dev_private;
struct virtio_pmd_ctrl ctrl;
int dlen[1];
int ret;
@@ -853,8 +846,7 @@ virtio_interrupt_handler(__rte_unused struct 
rte_intr_handle *handle,
 void *param)
 {
struct rte_eth_dev *dev = param;
-   struct virtio_hw *hw =
-   VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct virtio_hw *hw = dev->data->dev_private;
uint8_t isr;

/* Read interrupt status which clears interrupt */
@@ -880,12 +872,11 @@ static int
 eth_virtio_dev_init(__rte_unused struct eth_driver *eth_drv,
struct rte_eth_dev *eth_dev)
 {
+   struct virtio_hw *hw = eth_dev->data->dev_private;
struct virtio_net_config *config;
struct virtio_net_config local_config;
uint32_t offset_conf = sizeof(config->mac);
struct rte_pci_device *pci_dev;
-   struct virtio_hw *hw =
-   VIRTIO_DEV_PRIVATE_TO_HW(eth_dev->data->dev_private);

if (RTE_PKTMBUF_HEADROOM < sizeof(struct virtio_net_hdr)) {
PMD_INIT_LOG(ERR,
@@ -1010,7 +1001,7 @@ static struct eth_driver rte_virtio_pmd = {
.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
},
.eth_dev_init = eth_virtio_dev_init,
-   .dev_private_size = sizeof(struct virtio_adapter),
+   .dev_private_size = sizeof(struct virtio_hw),
 };

 /*
@@ -1053,8 +1044,7 @@ static int
 virtio_dev_configure(struct rte_eth_dev *dev)
 {
const struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
-   struct virtio_hw *hw =
-   VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+  

[dpdk-dev] [PATCH v3 09/25] virtio: Fix how states are handled during initialization

2015-01-29 Thread Ouyang Changchun
Change order of initialization to match Linux kernel.
Don't blow away control queue by doing reset when stopped.

Calling dev_stop then dev_start would not work.
Dev_stop was calling virtio reset and that would clear all queues
and clear all feature negotiation.
Resolved by only doing reset on device removal.

Signed-off-by: Stephen Hemminger 
Signed-off-by: Changchun Ouyang 
---
 lib/librte_pmd_virtio/virtio_ethdev.c | 58 ---
 lib/librte_pmd_virtio/virtio_pci.c| 10 ++
 lib/librte_pmd_virtio/virtio_pci.h|  3 +-
 3 files changed, 37 insertions(+), 34 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index 0d41e7f..47dd33d 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -398,9 +398,14 @@ virtio_dev_cq_queue_setup(struct rte_eth_dev *dev, 
uint16_t vtpci_queue_idx,
 static void
 virtio_dev_close(struct rte_eth_dev *dev)
 {
+   struct virtio_hw *hw = dev->data->dev_private;
+
PMD_INIT_LOG(DEBUG, "virtio_dev_close");

-   virtio_dev_stop(dev);
+   /* reset the NIC */
+   vtpci_irq_config(hw, VIRTIO_MSI_NO_VECTOR);
+   vtpci_reset(hw);
+   virtio_dev_free_mbufs(dev);
 }

 static void
@@ -889,6 +894,9 @@ eth_virtio_dev_init(__rte_unused struct eth_driver *eth_drv,
if (rte_eal_process_type() == RTE_PROC_SECONDARY)
return 0;

+   /* Tell the host we've noticed this device. */
+   vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_ACK);
+
pci_dev = eth_dev->pci_dev;
if (virtio_resource_init(pci_dev) < 0)
return -1;
@@ -899,9 +907,6 @@ eth_virtio_dev_init(__rte_unused struct eth_driver *eth_drv,
/* Reset the device although not necessary at startup */
vtpci_reset(hw);

-   /* Tell the host we've noticed this device. */
-   vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_ACK);
-
/* Tell the host we've known how to drive the device. */
vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER);
virtio_negotiate_features(hw);
@@ -990,6 +995,9 @@ eth_virtio_dev_init(__rte_unused struct eth_driver *eth_drv,
/* Setup interrupt callback  */
rte_intr_callback_register(&pci_dev->intr_handle,
   virtio_interrupt_handler, eth_dev);
+
+   virtio_dev_cq_start(eth_dev);
+
return 0;
 }

@@ -1044,7 +1052,6 @@ virtio_dev_configure(struct rte_eth_dev *dev)
 {
const struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
struct virtio_hw *hw = dev->data->dev_private;
-   int ret;

PMD_INIT_LOG(DEBUG, "configure");

@@ -1055,11 +1062,12 @@ virtio_dev_configure(struct rte_eth_dev *dev)

hw->vlan_strip = rxmode->hw_vlan_strip;

-   ret = vtpci_irq_config(hw, 0);
-   if (ret != 0)
+   if (vtpci_irq_config(hw, 0) == VIRTIO_MSI_NO_VECTOR) {
PMD_DRV_LOG(ERR, "failed to set config vector");
+   return -EBUSY;
+   }

-   return ret;
+   return 0;
 }


@@ -1069,17 +1077,6 @@ virtio_dev_start(struct rte_eth_dev *dev)
uint16_t nb_queues, i;
struct virtio_hw *hw = dev->data->dev_private;

-   /* Tell the host we've noticed this device. */
-   vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_ACK);
-
-   /* Tell the host we've known how to drive the device. */
-   vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER);
-
-   virtio_dev_cq_start(dev);
-
-   /* Do final configuration before rx/tx engine starts */
-   virtio_dev_rxtx_start(dev);
-
/* check if lsc interrupt feature is enabled */
if (dev->data->dev_conf.intr_conf.lsc) {
if (!vtpci_with_feature(hw, VIRTIO_NET_F_STATUS)) {
@@ -1096,8 +1093,16 @@ virtio_dev_start(struct rte_eth_dev *dev)
/* Initialize Link state */
virtio_dev_link_update(dev, 0);

+   /* On restart after stop do not touch queues */
+   if (hw->started)
+   return 0;
+
vtpci_reinit_complete(hw);

+   /* Do final configuration before rx/tx engine starts */
+   virtio_dev_rxtx_start(dev);
+   hw->started = 1;
+
/*Notify the backend
 *Otherwise the tap backend might already stop its queue due to 
fullness.
 *vhost backend will have no chance to be waked up
@@ -1168,17 +1173,20 @@ static void virtio_dev_free_mbufs(struct rte_eth_dev 
*dev)
 }

 /*
- * Stop device: disable rx and tx functions to allow for reconfiguring.
+ * Stop device: disable interrupt and mark link down
  */
 static void
 virtio_dev_stop(struct rte_eth_dev *dev)
 {
-   struct virtio_hw *hw = dev->data->dev_private;
+   struct rte_eth_link link;

-   /* reset the NIC */
-   vtpci_irq_config(hw, 0);
-   vtpci_reset(hw);
-   virtio_dev_free_mbufs(dev);
+   PMD_INIT_LOG(DEBUG, "stop");
+
+   if (dev->data->dev_conf.intr_conf.lsc)
+   rte_intr_disable(&dev->pci_dev-

[dpdk-dev] [PATCH v3 00/25] Single virtio implementation

2015-01-29 Thread Ouyang Changchun
This is the patch set for single virtio implementation.

Why we need single virtio?

As we know currently there are at least 3 virtio PMD driver implementations:
A) lib/librte_pmd_virtio(refer as virtio A);
B) virtio_net_pmd by 6wind(refer as virtio B);
C) virtio by Brocade/vyatta(refer as virtio C);

Integrating the 3 implementations into one could reduce the maintenance cost and 
time;
on the other hand, users don't need to try their application on the 3 variants one by 
one to see
which one is the best for them;

What's the status?

Currently virtio A covers most of the features of virtio B except for using port 
IO to get the PCI resource,
so there is a patch (17/22) to resolve it. But on the other hand there are a few 
differences between
virtio A and virtio C, it needs integrate features/codes of virtio C into 
virtio A.
This patch set bases on two original RFC patch sets from Stephen 
Hemminger[stephen at networkplumber.org]
Refer to [http://dpdk.org/ml/archives/dev/2014-August/004845.html ] for the 
original one.
This patch set also resolves some conflict with latest codes, removed 
duplicated codes, fix some
issues in original codes.

What this patch set contains:
===
  1) virtio: Rearrange resource initialization, it extracts a function to setup 
PCI resources;
  2) virtio: Use weaker barriers, as DPDK driver only has to deal with the case 
of running on PCI
 and with SMP, In this case, the code can use the weaker barriers instead 
of using hard (fence)
 barriers. This may help performance a bit;
  3) virtio: Allow starting with link down, other driver has similar behavior;
  4) virtio: Add support for Link State interrupt;
  5) ether: Add soft vlan encap/decap functions, it helps if HW doesn't support 
vlan strip;
  6) virtio: Use software vlan stripping;
  7) virtio: Remove unnecessary adapter structure;
  8) virtio: Remove redundant vq_alignment, as vq alignment is always 4K, so 
use constant when needed;
  9) virtio: Fix how states are handled during initialization, this is to match 
Linux kernel;
  10) virtio: Make vtpci_get_status a local function as it is used in one file;
  11) virtio: Check for packet headroom at compile time;
  12) virtio: Move allocation before initialization to avoid being stuck in 
middle of virtio init;
  13) virtio: Add support for vlan filtering;
  14) virtio: Add support for multiple mac addresses;
  15) virtio: Add ability to set MAC address;
  16) virtio: Free mbuf's with threshold, this makes its behavior more like 
ixgbe;
  17) virtio: Use port IO to get PCI resource for security reasons and match 
virtio-net-pmd;
  18) virtio: Fix descriptor index issue;
  19) ether: Fix vlan strip/insert issue;
  20) example/vhost: Avoid inserting vlan twice and guest and host;
  21) example/vhost: Add vlan-strip cmd line option to turn on/off vlan strip 
on host;
  22) virtio: Use soft vlan strip in mergeable Rx path, this makes it have 
consistent logic
  with the normal Rx path.

Changes in v2:
  23) virtio: Fix zero copy break issue, the vring should be ready before 
virtio PMD set
  the status of DRIVER_OK;
  24) virtio: Remove unnecessary hotspots in data path.

Changes in v3:
  25) virtio: Fix wmb issue;
  26) Fix one minor issue in patch 20, also fix its idention.


Changchun Ouyang (9):
  virtio: Use port IO to get PCI resource.
  virtio: Fix descriptor index issue
  ether: Fix vlan strip/insert issue
  example/vhost: Avoid inserting vlan twice
  example/vhost: Add vlan-strip cmd line option
  virtio: Use soft vlan strip in mergeable Rx path
  virtio: Fix zero copy break issue
  virtio: Remove hotspots
  virtio: Fix wmb issue

Stephen Hemminger (16):
  virtio: Rearrange resource initialization
  virtio: Use weaker barriers
  virtio: Allow starting with link down
  virtio: Add support for Link State interrupt
  ether: Add soft vlan encap/decap functions
  virtio: Use software vlan stripping
  virtio: Remove unnecessary adapter structure
  virtio: Remove redundant vq_alignment
  virtio: Fix how states are handled during initialization
  virtio: Make vtpci_get_status local
  virtio: Check for packet headroom at compile time
  virtio: Move allocation before initialization
  virtio: Add support for vlan filtering
  virtio: Add suport for multiple mac addresses
  virtio: Add ability to set MAC address
  virtio: Free mbuf's with threshold

 config/common_linuxapp  |   2 +
 examples/vhost/main.c   |  70 +++--
 lib/librte_eal/common/include/rte_pci.h |   4 +
 lib/librte_eal/linuxapp/eal/eal_pci.c   |   5 +-
 lib/librte_ether/rte_ethdev.h   |   8 +
 lib/librte_ether/rte_ether.h|  76 +
 lib/librte_pmd_virtio/virtio_ethdev.c   | 492 +---
 lib/librte_pmd_virtio/virtio_ethdev.h   |  12 +-
 lib/librte_pmd_virtio/virtio_pci.c  |  20 +-
 lib/librte_pmd_virtio/virtio_pci.h  |   8 +-
 lib/librte_pmd_virtio/virtio_rxtx.c 

[dpdk-dev] [PATCH v3 11/25] virtio: Check for packet headroom at compile time

2015-01-29 Thread Ouyang Changchun
Better to check at compile time than fail at runtime.

Signed-off-by: Stephen Hemminger 
Signed-off-by: Changchun Ouyang 
---
 lib/librte_pmd_virtio/virtio_ethdev.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index 47dd33d..9679c2f 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -882,11 +882,7 @@ eth_virtio_dev_init(__rte_unused struct eth_driver 
*eth_drv,
uint32_t offset_conf = sizeof(config->mac);
struct rte_pci_device *pci_dev;

-   if (RTE_PKTMBUF_HEADROOM < sizeof(struct virtio_net_hdr)) {
-   PMD_INIT_LOG(ERR,
-   "MBUF HEADROOM should be enough to hold virtio net 
hdr\n");
-   return -1;
-   }
+   RTE_BUILD_BUG_ON(RTE_PKTMBUF_HEADROOM < sizeof(struct virtio_net_hdr));

eth_dev->dev_ops = &virtio_eth_dev_ops;
eth_dev->tx_pkt_burst = &virtio_xmit_pkts;
-- 
1.8.4.2



[dpdk-dev] [PATCH v3 08/25] virtio: Remove redundant vq_alignment

2015-01-29 Thread Ouyang Changchun
Since vq_alignment is constant (always 4K), it does not
need to be part of the vring struct.

Signed-off-by: Stephen Hemminger 
Signed-off-by: Changchun Ouyang 
---
 lib/librte_pmd_virtio/virtio_ethdev.c | 1 -
 lib/librte_pmd_virtio/virtio_rxtx.c   | 2 +-
 lib/librte_pmd_virtio/virtqueue.h | 3 +--
 3 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index 59b74b7..0d41e7f 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -294,7 +294,6 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
vq->port_id = dev->data->port_id;
vq->queue_id = queue_idx;
vq->vq_queue_index = vtpci_queue_idx;
-   vq->vq_alignment = VIRTIO_PCI_VRING_ALIGN;
vq->vq_nentries = vq_size;
vq->vq_free_cnt = vq_size;

diff --git a/lib/librte_pmd_virtio/virtio_rxtx.c 
b/lib/librte_pmd_virtio/virtio_rxtx.c
index a82d5ff..b6d6832 100644
--- a/lib/librte_pmd_virtio/virtio_rxtx.c
+++ b/lib/librte_pmd_virtio/virtio_rxtx.c
@@ -258,7 +258,7 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
 * Reinitialise since virtio port might have been stopped and restarted
 */
memset(vq->vq_ring_virt_mem, 0, vq->vq_ring_size);
-   vring_init(vr, size, ring_mem, vq->vq_alignment);
+   vring_init(vr, size, ring_mem, VIRTIO_PCI_VRING_ALIGN);
vq->vq_used_cons_idx = 0;
vq->vq_desc_head_idx = 0;
vq->vq_avail_idx = 0;
diff --git a/lib/librte_pmd_virtio/virtqueue.h 
b/lib/librte_pmd_virtio/virtqueue.h
index f6ad98d..5b8a255 100644
--- a/lib/librte_pmd_virtio/virtqueue.h
+++ b/lib/librte_pmd_virtio/virtqueue.h
@@ -138,8 +138,7 @@ struct virtqueue {
uint8_t port_id;  /**< Device port identifier. */

void*vq_ring_virt_mem;/**< linear address of vring*/
-   int vq_alignment;
-   int vq_ring_size;
+   unsigned int vq_ring_size;
phys_addr_t vq_ring_mem;  /**< physical address of vring */

struct vring vq_ring;/**< vring keeping desc, used and avail */
-- 
1.8.4.2



[dpdk-dev] [PATCH v3 12/25] virtio: Move allocation before initialization

2015-01-29 Thread Ouyang Changchun
If allocation fails, we don't want to leave the virtio device stuck
in the middle of the initialization sequence.

Signed-off-by: Stephen Hemminger 
Signed-off-by: Changchun Ouyang 
---
 lib/librte_pmd_virtio/virtio_ethdev.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index 9679c2f..39b1fb4 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -890,6 +890,15 @@ eth_virtio_dev_init(__rte_unused struct eth_driver 
*eth_drv,
if (rte_eal_process_type() == RTE_PROC_SECONDARY)
return 0;

+   /* Allocate memory for storing MAC addresses */
+   eth_dev->data->mac_addrs = rte_zmalloc("virtio", ETHER_ADDR_LEN, 0);
+   if (eth_dev->data->mac_addrs == NULL) {
+   PMD_INIT_LOG(ERR,
+   "Failed to allocate %d bytes needed to store MAC 
addresses",
+   ETHER_ADDR_LEN);
+   return -ENOMEM;
+   }
+
/* Tell the host we've noticed this device. */
vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_ACK);

@@ -916,15 +925,6 @@ eth_virtio_dev_init(__rte_unused struct eth_driver 
*eth_drv,
hw->vtnet_hdr_size = sizeof(struct virtio_net_hdr);
}

-   /* Allocate memory for storing MAC addresses */
-   eth_dev->data->mac_addrs = rte_zmalloc("virtio", ETHER_ADDR_LEN, 0);
-   if (eth_dev->data->mac_addrs == NULL) {
-   PMD_INIT_LOG(ERR,
-   "Failed to allocate %d bytes needed to store MAC 
addresses",
-   ETHER_ADDR_LEN);
-   return -ENOMEM;
-   }
-
/* Copy the permanent MAC address to: virtio_hw */
virtio_get_hwaddr(hw);
ether_addr_copy((struct ether_addr *) hw->mac_addr,
-- 
1.8.4.2



[dpdk-dev] [PATCH v3 13/25] virtio: Add support for vlan filtering

2015-01-29 Thread Ouyang Changchun
Virtio supports vlan filtering.

Signed-off-by: Stephen Hemminger 
Signed-off-by: Changchun Ouyang 
---
 lib/librte_pmd_virtio/virtio_ethdev.c | 31 +--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index 39b1fb4..591d692 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -84,6 +84,8 @@ static void virtio_dev_tx_queue_release(__rte_unused void 
*txq);
 static void virtio_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats 
*stats);
 static void virtio_dev_stats_reset(struct rte_eth_dev *dev);
 static void virtio_dev_free_mbufs(struct rte_eth_dev *dev);
+static int virtio_vlan_filter_set(struct rte_eth_dev *dev,
+   uint16_t vlan_id, int on);

 static int virtio_dev_queue_stats_mapping_set(
__rte_unused struct rte_eth_dev *eth_dev,
@@ -511,6 +513,7 @@ static struct eth_dev_ops virtio_eth_dev_ops = {
.tx_queue_release= virtio_dev_tx_queue_release,
/* collect stats per queue */
.queue_stats_mapping_set = virtio_dev_queue_stats_mapping_set,
+   .vlan_filter_set = virtio_vlan_filter_set,
 };

 static inline int
@@ -640,14 +643,31 @@ virtio_get_hwaddr(struct virtio_hw *hw)
}
 }

+static int
+virtio_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
+{
+   struct virtio_hw *hw = dev->data->dev_private;
+   struct virtio_pmd_ctrl ctrl;
+   int len;
+
+   if (!vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VLAN))
+   return -ENOTSUP;
+
+   ctrl.hdr.class = VIRTIO_NET_CTRL_VLAN;
+   ctrl.hdr.cmd = on ? VIRTIO_NET_CTRL_VLAN_ADD : VIRTIO_NET_CTRL_VLAN_DEL;
+   memcpy(ctrl.data, &vlan_id, sizeof(vlan_id));
+   len = sizeof(vlan_id);
+
+   return virtio_send_command(hw->cvq, &ctrl, &len, 1);
+}

 static void
 virtio_negotiate_features(struct virtio_hw *hw)
 {
uint32_t host_features, mask;

-   mask = VIRTIO_NET_F_CTRL_VLAN;
-   mask |= VIRTIO_NET_F_CSUM | VIRTIO_NET_F_GUEST_CSUM;
+   /* checksum offload not implemented */
+   mask = VIRTIO_NET_F_CSUM | VIRTIO_NET_F_GUEST_CSUM;

/* TSO and LRO are only available when their corresponding
 * checksum offload feature is also negotiated.
@@ -1058,6 +1078,13 @@ virtio_dev_configure(struct rte_eth_dev *dev)

hw->vlan_strip = rxmode->hw_vlan_strip;

+   if (rxmode->hw_vlan_filter
+   && !vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VLAN)) {
+   PMD_DRV_LOG(NOTICE,
+   "vlan filtering not available on this host");
+   return -ENOTSUP;
+   }
+
if (vtpci_irq_config(hw, 0) == VIRTIO_MSI_NO_VECTOR) {
PMD_DRV_LOG(ERR, "failed to set config vector");
return -EBUSY;
-- 
1.8.4.2



[dpdk-dev] [PATCH v3 15/25] virtio: Add ability to set MAC address

2015-01-29 Thread Ouyang Changchun
Special handling is needed to set the default MAC address.

Signed-off-by: Stephen Hemminger 
Signed-off-by: Changchun Ouyang 
---
 lib/librte_ether/rte_ethdev.h |  5 +
 lib/librte_pmd_virtio/virtio_ethdev.c | 24 
 2 files changed, 29 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 94d6b2b..5a54276 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1240,6 +1240,10 @@ typedef void (*eth_mac_addr_add_t)(struct rte_eth_dev 
*dev,
  uint32_t vmdq);
 /**< @internal Set a MAC address into Receive Address Address Register */

+typedef void (*eth_mac_addr_set_t)(struct rte_eth_dev *dev,
+ struct ether_addr *mac_addr);
+/**< @internal Set a MAC address into Receive Address Address Register */
+
 typedef int (*eth_uc_hash_table_set_t)(struct rte_eth_dev *dev,
  struct ether_addr *mac_addr,
  uint8_t on);
@@ -1459,6 +1463,7 @@ struct eth_dev_ops {
priority_flow_ctrl_set_t   priority_flow_ctrl_set; /**< Setup priority 
flow control.*/
eth_mac_addr_remove_t  mac_addr_remove; /**< Remove MAC address */
eth_mac_addr_add_t mac_addr_add;  /**< Add a MAC address */
+   eth_mac_addr_set_t mac_addr_set;  /**< Set a MAC address */
eth_uc_hash_table_set_tuc_hash_table_set;  /**< Set Unicast Table 
Array */
eth_uc_all_hash_table_set_t uc_all_hash_table_set;  /**< Set Unicast 
hash bitmap */
eth_mirror_rule_set_t  mirror_rule_set;  /**< Add a traffic mirror 
rule.*/
diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index 0e74eea..b30ab2a 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -90,6 +90,8 @@ static void virtio_mac_addr_add(struct rte_eth_dev *dev,
struct ether_addr *mac_addr,
uint32_t index, uint32_t vmdq __rte_unused);
 static void virtio_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index);
+static void virtio_mac_addr_set(struct rte_eth_dev *dev,
+   struct ether_addr *mac_addr);

 static int virtio_dev_queue_stats_mapping_set(
__rte_unused struct rte_eth_dev *eth_dev,
@@ -518,6 +520,7 @@ static struct eth_dev_ops virtio_eth_dev_ops = {
.vlan_filter_set = virtio_vlan_filter_set,
.mac_addr_add= virtio_mac_addr_add,
.mac_addr_remove = virtio_mac_addr_remove,
+   .mac_addr_set= virtio_mac_addr_set,
 };

 static inline int
@@ -733,6 +736,27 @@ virtio_mac_addr_remove(struct rte_eth_dev *dev, uint32_t 
index)
virtio_mac_table_set(hw, uc, mc);
 }

+static void
+virtio_mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr)
+{
+   struct virtio_hw *hw = dev->data->dev_private;
+
+   memcpy(hw->mac_addr, mac_addr, ETHER_ADDR_LEN);
+
+   /* Use atomic update if available */
+   if (vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_MAC_ADDR)) {
+   struct virtio_pmd_ctrl ctrl;
+   int len = ETHER_ADDR_LEN;
+
+   ctrl.hdr.class = VIRTIO_NET_CTRL_MAC;
+   ctrl.hdr.cmd = VIRTIO_NET_CTRL_MAC_ADDR_SET;
+
+   memcpy(ctrl.data, mac_addr, ETHER_ADDR_LEN);
+   virtio_send_command(hw->cvq, &ctrl, &len, 1);
+   } else if (vtpci_with_feature(hw, VIRTIO_NET_F_MAC))
+   virtio_set_hwaddr(hw);
+}
+
 static int
 virtio_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
 {
-- 
1.8.4.2



[dpdk-dev] [PATCH v3 10/25] virtio: Make vtpci_get_status local

2015-01-29 Thread Ouyang Changchun
Make vtpci_get_status a local function as it is used in one file.

Signed-off-by: Stephen Hemminger 
Signed-off-by: Changchun Ouyang 
---
 lib/librte_pmd_virtio/virtio_pci.c | 4 +++-
 lib/librte_pmd_virtio/virtio_pci.h | 2 --
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_pci.c 
b/lib/librte_pmd_virtio/virtio_pci.c
index b099e4f..2245bec 100644
--- a/lib/librte_pmd_virtio/virtio_pci.c
+++ b/lib/librte_pmd_virtio/virtio_pci.c
@@ -35,6 +35,8 @@
 #include "virtio_pci.h"
 #include "virtio_logs.h"

+static uint8_t vtpci_get_status(struct virtio_hw *);
+
 void
 vtpci_read_dev_config(struct virtio_hw *hw, uint64_t offset,
void *dst, int length)
@@ -113,7 +115,7 @@ vtpci_reinit_complete(struct virtio_hw *hw)
vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER_OK);
 }

-uint8_t
+static uint8_t
 vtpci_get_status(struct virtio_hw *hw)
 {
return VIRTIO_READ_REG_1(hw, VIRTIO_PCI_STATUS);
diff --git a/lib/librte_pmd_virtio/virtio_pci.h 
b/lib/librte_pmd_virtio/virtio_pci.h
index 0a4b578..64d9c34 100644
--- a/lib/librte_pmd_virtio/virtio_pci.h
+++ b/lib/librte_pmd_virtio/virtio_pci.h
@@ -255,8 +255,6 @@ void vtpci_reset(struct virtio_hw *);

 void vtpci_reinit_complete(struct virtio_hw *);

-uint8_t vtpci_get_status(struct virtio_hw *);
-
 void vtpci_set_status(struct virtio_hw *, uint8_t);

 uint32_t vtpci_negotiate_features(struct virtio_hw *, uint32_t);
-- 
1.8.4.2



[dpdk-dev] [PATCH v3 16/25] virtio: Free mbuf's with threshold

2015-01-29 Thread Ouyang Changchun
This makes virtio driver work like ixgbe. Transmit buffers are
held until a transmit threshold is reached. The previous behavior
was to hold mbuf's until the ring entry was reused which caused
more memory usage than needed.

Signed-off-by: Stephen Hemminger 
Signed-off-by: Changchun Ouyang 
---
 lib/librte_pmd_virtio/virtio_ethdev.c |  7 ++--
 lib/librte_pmd_virtio/virtio_rxtx.c   | 75 +--
 lib/librte_pmd_virtio/virtqueue.h |  3 +-
 3 files changed, 60 insertions(+), 25 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index b30ab2a..8cd2d51 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -176,15 +176,16 @@ virtio_send_command(struct virtqueue *vq, struct 
virtio_pmd_ctrl *ctrl,

virtqueue_notify(vq);

-   while (vq->vq_used_cons_idx == vq->vq_ring.used->idx)
+   rte_rmb();
+   while (vq->vq_used_cons_idx == vq->vq_ring.used->idx) {
+   rte_rmb();
usleep(100);
+   }

while (vq->vq_used_cons_idx != vq->vq_ring.used->idx) {
uint32_t idx, desc_idx, used_idx;
struct vring_used_elem *uep;

-   virtio_rmb();
-
used_idx = (uint32_t)(vq->vq_used_cons_idx
& (vq->vq_nentries - 1));
uep = &vq->vq_ring.used->ring[used_idx];
diff --git a/lib/librte_pmd_virtio/virtio_rxtx.c 
b/lib/librte_pmd_virtio/virtio_rxtx.c
index b6d6832..580701a 100644
--- a/lib/librte_pmd_virtio/virtio_rxtx.c
+++ b/lib/librte_pmd_virtio/virtio_rxtx.c
@@ -129,17 +129,32 @@ virtqueue_dequeue_burst_rx(struct virtqueue *vq, struct 
rte_mbuf **rx_pkts,
return i;
 }

+#ifndef DEFAULT_TX_FREE_THRESH
+#define DEFAULT_TX_FREE_THRESH 32
+#endif
+
+/* Cleanup from completed transmits. */
 static void
-virtqueue_dequeue_pkt_tx(struct virtqueue *vq)
+virtio_xmit_cleanup(struct virtqueue *vq, uint16_t num)
 {
-   struct vring_used_elem *uep;
-   uint16_t used_idx, desc_idx;
+   uint16_t i, used_idx, desc_idx;
+   for (i = 0; i < num; i++) {
+   struct vring_used_elem *uep;
+   struct vq_desc_extra *dxp;
+
+   used_idx = (uint16_t)(vq->vq_used_cons_idx & (vq->vq_nentries - 
1));
+   uep = &vq->vq_ring.used->ring[used_idx];
+   dxp = &vq->vq_descx[used_idx];
+
+   desc_idx = (uint16_t) uep->id;
+   vq->vq_used_cons_idx++;
+   vq_ring_free_chain(vq, desc_idx);

-   used_idx = (uint16_t)(vq->vq_used_cons_idx & (vq->vq_nentries - 1));
-   uep = &vq->vq_ring.used->ring[used_idx];
-   desc_idx = (uint16_t) uep->id;
-   vq->vq_used_cons_idx++;
-   vq_ring_free_chain(vq, desc_idx);
+   if (dxp->cookie != NULL) {
+   rte_pktmbuf_free(dxp->cookie);
+   dxp->cookie = NULL;
+   }
+   }
 }


@@ -203,8 +218,6 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct 
rte_mbuf *cookie)

idx = head_idx;
dxp = &txvq->vq_descx[idx];
-   if (dxp->cookie != NULL)
-   rte_pktmbuf_free(dxp->cookie);
dxp->cookie = (void *)cookie;
dxp->ndescs = needed;

@@ -404,6 +417,7 @@ virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,
 {
uint8_t vtpci_queue_idx = 2 * queue_idx + VTNET_SQ_TQ_QUEUE_IDX;
struct virtqueue *vq;
+   uint16_t tx_free_thresh;
int ret;

PMD_INIT_FUNC_TRACE();
@@ -421,6 +435,22 @@ virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,
return ret;
}

+   tx_free_thresh = tx_conf->tx_free_thresh;
+   if (tx_free_thresh == 0)
+   tx_free_thresh =
+   RTE_MIN(vq->vq_nentries / 4, DEFAULT_TX_FREE_THRESH);
+
+   if (tx_free_thresh >= (vq->vq_nentries - 3)) {
+   RTE_LOG(ERR, PMD, "tx_free_thresh must be less than the "
+   "number of TX entries minus 3 (%u)."
+   " (tx_free_thresh=%u port=%u queue=%u)\n",
+   vq->vq_nentries - 3,
+   tx_free_thresh, dev->data->port_id, queue_idx);
+   return -EINVAL;
+   }
+
+   vq->vq_free_thresh = tx_free_thresh;
+
dev->data->tx_queues[queue_idx] = vq;
return 0;
 }
@@ -688,11 +718,9 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf 
**tx_pkts, uint16_t nb_pkts)
 {
struct virtqueue *txvq = tx_queue;
struct rte_mbuf *txm;
-   uint16_t nb_used, nb_tx, num;
+   uint16_t nb_used, nb_tx;
int error;

-   nb_tx = 0;
-
if (unlikely(nb_pkts < 1))
return nb_pkts;

@@ -700,21 +728,26 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf 
**tx_pkts, uint16_t nb_pkts)
nb_used = VIRTQUEUE_NUSED(txvq);

virtio_rmb();
+   if (likely(nb_used > txvq->vq_free_thresh))
+   virtio_xmit_cleanup(txv

[dpdk-dev] [PATCH v3 14/25] virtio: Add support for multiple mac addresses

2015-01-29 Thread Ouyang Changchun
Virtio supports multiple MAC addresses.

Signed-off-by: Stephen Hemminger 
Signed-off-by: Changchun Ouyang 
---
 lib/librte_pmd_virtio/virtio_ethdev.c | 94 ++-
 lib/librte_pmd_virtio/virtio_ethdev.h |  3 +-
 lib/librte_pmd_virtio/virtqueue.h | 34 -
 3 files changed, 127 insertions(+), 4 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index 591d692..0e74eea 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -86,6 +86,10 @@ static void virtio_dev_stats_reset(struct rte_eth_dev *dev);
 static void virtio_dev_free_mbufs(struct rte_eth_dev *dev);
 static int virtio_vlan_filter_set(struct rte_eth_dev *dev,
uint16_t vlan_id, int on);
+static void virtio_mac_addr_add(struct rte_eth_dev *dev,
+   struct ether_addr *mac_addr,
+   uint32_t index, uint32_t vmdq __rte_unused);
+static void virtio_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index);

 static int virtio_dev_queue_stats_mapping_set(
__rte_unused struct rte_eth_dev *eth_dev,
@@ -503,8 +507,6 @@ static struct eth_dev_ops virtio_eth_dev_ops = {
.stats_get   = virtio_dev_stats_get,
.stats_reset = virtio_dev_stats_reset,
.link_update = virtio_dev_link_update,
-   .mac_addr_add= NULL,
-   .mac_addr_remove = NULL,
.rx_queue_setup  = virtio_dev_rx_queue_setup,
/* meaningfull only to multiple queue */
.rx_queue_release= virtio_dev_rx_queue_release,
@@ -514,6 +516,8 @@ static struct eth_dev_ops virtio_eth_dev_ops = {
/* collect stats per queue */
.queue_stats_mapping_set = virtio_dev_queue_stats_mapping_set,
.vlan_filter_set = virtio_vlan_filter_set,
+   .mac_addr_add= virtio_mac_addr_add,
+   .mac_addr_remove = virtio_mac_addr_remove,
 };

 static inline int
@@ -644,6 +648,92 @@ virtio_get_hwaddr(struct virtio_hw *hw)
 }

 static int
+virtio_mac_table_set(struct virtio_hw *hw,
+const struct virtio_net_ctrl_mac *uc,
+const struct virtio_net_ctrl_mac *mc)
+{
+   struct virtio_pmd_ctrl ctrl;
+   int err, len[2];
+
+   ctrl.hdr.class = VIRTIO_NET_CTRL_MAC;
+   ctrl.hdr.cmd = VIRTIO_NET_CTRL_MAC_TABLE_SET;
+
+   len[0] = uc->entries * ETHER_ADDR_LEN + sizeof(uc->entries);
+   memcpy(ctrl.data, uc, len[0]);
+
+   len[1] = mc->entries * ETHER_ADDR_LEN + sizeof(mc->entries);
+   memcpy(ctrl.data + len[0], mc, len[1]);
+
+   err = virtio_send_command(hw->cvq, &ctrl, len, 2);
+   if (err != 0)
+   PMD_DRV_LOG(NOTICE, "mac table set failed: %d", err);
+
+   return err;
+}
+
+static void
+virtio_mac_addr_add(struct rte_eth_dev *dev, struct ether_addr *mac_addr,
+   uint32_t index, uint32_t vmdq __rte_unused)
+{
+   struct virtio_hw *hw = dev->data->dev_private;
+   const struct ether_addr *addrs = dev->data->mac_addrs;
+   unsigned int i;
+   struct virtio_net_ctrl_mac *uc, *mc;
+
+   if (index >= VIRTIO_MAX_MAC_ADDRS) {
+   PMD_DRV_LOG(ERR, "mac address index %u out of range", index);
+   return;
+   }
+
+   uc = alloca(VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN + 
sizeof(uc->entries));
+   uc->entries = 0;
+   mc = alloca(VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN + 
sizeof(mc->entries));
+   mc->entries = 0;
+
+   for (i = 0; i < VIRTIO_MAX_MAC_ADDRS; i++) {
+   const struct ether_addr *addr
+   = (i == index) ? mac_addr : addrs + i;
+   struct virtio_net_ctrl_mac *tbl
+   = is_multicast_ether_addr(addr) ? mc : uc;
+
+   memcpy(&tbl->macs[tbl->entries++], addr, ETHER_ADDR_LEN);
+   }
+
+   virtio_mac_table_set(hw, uc, mc);
+}
+
+static void
+virtio_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index)
+{
+   struct virtio_hw *hw = dev->data->dev_private;
+   struct ether_addr *addrs = dev->data->mac_addrs;
+   struct virtio_net_ctrl_mac *uc, *mc;
+   unsigned int i;
+
+   if (index >= VIRTIO_MAX_MAC_ADDRS) {
+   PMD_DRV_LOG(ERR, "mac address index %u out of range", index);
+   return;
+   }
+
+   uc = alloca(VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN + 
sizeof(uc->entries));
+   uc->entries = 0;
+   mc = alloca(VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN + 
sizeof(mc->entries));
+   mc->entries = 0;
+
+   for (i = 0; i < VIRTIO_MAX_MAC_ADDRS; i++) {
+   struct virtio_net_ctrl_mac *tbl;
+
+   if (i == index || is_zero_ether_addr(addrs + i))
+   continue;
+
+   tbl = is_multicast_ether_addr(addrs + i) ? mc : uc;
+   memcpy(&tbl->macs[tbl->entries++], addrs + i, ETHER_

[dpdk-dev] [PATCH v3 18/25] virtio: Fix descriptor index issue

2015-01-29 Thread Ouyang Changchun
It should use vring descriptor index instead of used_ring index to index 
vq_descx.

Signed-off-by: Changchun Ouyang 
---
 lib/librte_pmd_virtio/virtio_rxtx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_pmd_virtio/virtio_rxtx.c 
b/lib/librte_pmd_virtio/virtio_rxtx.c
index 580701a..a82e8eb 100644
--- a/lib/librte_pmd_virtio/virtio_rxtx.c
+++ b/lib/librte_pmd_virtio/virtio_rxtx.c
@@ -144,9 +144,9 @@ virtio_xmit_cleanup(struct virtqueue *vq, uint16_t num)

used_idx = (uint16_t)(vq->vq_used_cons_idx & (vq->vq_nentries - 
1));
uep = &vq->vq_ring.used->ring[used_idx];
-   dxp = &vq->vq_descx[used_idx];

desc_idx = (uint16_t) uep->id;
+   dxp = &vq->vq_descx[desc_idx];
vq->vq_used_cons_idx++;
vq_ring_free_chain(vq, desc_idx);

-- 
1.8.4.2



[dpdk-dev] [PATCH v3 19/25] ether: Fix vlan strip/insert issue

2015-01-29 Thread Ouyang Changchun
Need to swap the vlan-type field from CPU byte order to BE (big endian).

Signed-off-by: Changchun Ouyang 
---
 lib/librte_ether/rte_ether.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_ether/rte_ether.h b/lib/librte_ether/rte_ether.h
index 74f71c2..0797908 100644
--- a/lib/librte_ether/rte_ether.h
+++ b/lib/librte_ether/rte_ether.h
@@ -351,7 +351,7 @@ static inline int rte_vlan_strip(struct rte_mbuf *m)
struct ether_hdr *eh
 = rte_pktmbuf_mtod(m, struct ether_hdr *);

-   if (eh->ether_type != ETHER_TYPE_VLAN)
+   if (eh->ether_type != rte_cpu_to_be_16(ETHER_TYPE_VLAN))
return -1;

struct vlan_hdr *vh = (struct vlan_hdr *)(eh + 1);
@@ -401,7 +401,7 @@ static inline int rte_vlan_insert(struct rte_mbuf **m)
return -ENOSPC;

memmove(nh, oh, 2 * ETHER_ADDR_LEN);
-   nh->ether_type = ETHER_TYPE_VLAN;
+   nh->ether_type = rte_cpu_to_be_16(ETHER_TYPE_VLAN);

vh = (struct vlan_hdr *) (nh + 1);
vh->vlan_tci = rte_cpu_to_be_16((*m)->vlan_tci);
-- 
1.8.4.2



[dpdk-dev] [PATCH v3 17/25] virtio: Use port IO to get PCI resource.

2015-01-29 Thread Ouyang Changchun
Make virtio not require UIO, for security reasons; this matches 6Wind's
virtio-net-pmd.

Signed-off-by: Changchun Ouyang 
---
 config/common_linuxapp  |  2 +
 lib/librte_eal/common/include/rte_pci.h |  4 ++
 lib/librte_eal/linuxapp/eal/eal_pci.c   |  5 +-
 lib/librte_pmd_virtio/virtio_ethdev.c   | 91 -
 4 files changed, 100 insertions(+), 2 deletions(-)

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 2f9643b..a412457 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -100,6 +100,8 @@ CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
 CONFIG_RTE_EAL_IGB_UIO=y
 CONFIG_RTE_EAL_VFIO=y
+# Only for VIRTIO PMD currently
+CONFIG_RTE_EAL_PORT_IO=n

 #
 # Special configurations in PCI Config Space for high performance
diff --git a/lib/librte_eal/common/include/rte_pci.h 
b/lib/librte_eal/common/include/rte_pci.h
index 66ed793..19abc1f 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -193,6 +193,10 @@ struct rte_pci_driver {

 /** Device needs PCI BAR mapping (done with either IGB_UIO or VFIO) */
 #define RTE_PCI_DRV_NEED_MAPPING 0x0001
+/** Device needs port IO(done with /proc/ioports) */
+#ifdef RTE_EAL_PORT_IO
+#define RTE_PCI_DRV_PORT_IO 0x0002
+#endif
 /** Device driver must be registered several times until failure - deprecated 
*/
 #pragma GCC poison RTE_PCI_DRV_MULTIPLE
 /** Device needs to be unbound even if no module is provided */
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c 
b/lib/librte_eal/linuxapp/eal/eal_pci.c
index b5f5410..5db0059 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -574,7 +574,10 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, 
struct rte_pci_device *d
/* map resources for devices that use igb_uio */
ret = pci_map_device(dev);
if (ret != 0)
-   return ret;
+#ifdef RTE_EAL_PORT_IO
+   if ((dr->drv_flags & RTE_PCI_DRV_PORT_IO) == 0)
+#endif
+   return ret;
} else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND &&
   rte_eal_process_type() == RTE_PROC_PRIMARY) {
/* unbind current driver */
diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index 8cd2d51..b905532 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -961,6 +961,71 @@ static int virtio_resource_init(struct rte_pci_device 
*pci_dev)
 start, size);
return 0;
 }
+
+#ifdef RTE_EAL_PORT_IO
+/* Extract port I/O numbers from proc/ioports */
+static int virtio_resource_init_by_portio(struct rte_pci_device *pci_dev)
+{
+   uint16_t start, end;
+   int size;
+   FILE *fp;
+   char *line = NULL;
+   char pci_id[16];
+   int found = 0;
+   size_t linesz;
+
+   snprintf(pci_id, sizeof(pci_id), PCI_PRI_FMT,
+pci_dev->addr.domain,
+pci_dev->addr.bus,
+pci_dev->addr.devid,
+pci_dev->addr.function);
+
+   fp = fopen("/proc/ioports", "r");
+   if (fp == NULL) {
+   PMD_INIT_LOG(ERR, "%s(): can't open ioports", __func__);
+   return -1;
+   }
+
+   while (getdelim(&line, &linesz, '\n', fp) > 0) {
+   char *ptr = line;
+   char *left;
+   int n;
+
+   n = strcspn(ptr, ":");
+   ptr[n] = 0;
+   left = &ptr[n+1];
+
+   while (*left && isspace(*left))
+   left++;
+
+   if (!strncmp(left, pci_id, strlen(pci_id))) {
+   found = 1;
+
+   while (*ptr && isspace(*ptr))
+   ptr++;
+
+   sscanf(ptr, "%04hx-%04hx", &start, &end);
+   size = end - start + 1;
+
+   break;
+   }
+   }
+
+   free(line);
+   fclose(fp);
+
+   if (!found)
+   return -1;
+
+   pci_dev->mem_resource[0].addr = (void *)(uintptr_t)(uint32_t)start;
+   pci_dev->mem_resource[0].len =  (uint64_t)size;
+   PMD_INIT_LOG(DEBUG,
+"PCI Port IO found start=0x%lx with size=0x%lx",
+start, size);
+   return 0;
+}
+#endif
+
 #else
 static int
 virtio_has_msix(const struct rte_pci_addr *loc __rte_unused)
@@ -974,6 +1039,14 @@ static int virtio_resource_init(struct rte_pci_device 
*pci_dev __rte_unused)
/* no setup required */
return 0;
 }
+
+#ifdef RTE_EAL_PORT_IO
+static int virtio_resource_init_by_portio(struct rte_pci_device *pci_dev)
+{
+   /* no setup required */
+   return 0;
+}
+#endif
 #endif

 /*
@@ -1039,7 +1112,10 @@ eth_virti

[dpdk-dev] [PATCH v3 20/25] example/vhost: Avoid inserting vlan twice

2015-01-29 Thread Ouyang Changchun
Check whether the packet is already vlan-tagged; if so, avoid inserting a
duplicate vlan tag into it.

This is a possible case when guest has the capability of inserting vlan tag.

Signed-off-by: Changchun Ouyang 
---
 examples/vhost/main.c | 45 -
 1 file changed, 28 insertions(+), 17 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 04f0118..6af7874 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -1115,6 +1115,7 @@ virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf 
*m, uint16_t vlan_tag)
unsigned len, ret, offset = 0;
const uint16_t lcore_id = rte_lcore_id();
struct virtio_net *dev = vdev->dev;
+   struct ether_hdr *nh;

/*check if destination is local VM*/
if ((vm2vm_mode == VM2VM_SOFTWARE) && (virtio_tx_local(vdev, m) == 0)) {
@@ -1135,28 +1136,38 @@ virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf 
*m, uint16_t vlan_tag)
tx_q = &lcore_tx_queue[lcore_id];
len = tx_q->len;

-   m->ol_flags = PKT_TX_VLAN_PKT;
+   nh = rte_pktmbuf_mtod(m, struct ether_hdr *);
+   if (unlikely(nh->ether_type == rte_cpu_to_be_16(ETHER_TYPE_VLAN))) {
+   /* Guest has inserted the vlan tag. */
+   struct vlan_hdr *vh = (struct vlan_hdr *) (nh + 1);
+   uint16_t vlan_tag_be = rte_cpu_to_be_16(vlan_tag);
+   if ((vm2vm_mode == VM2VM_HARDWARE) &&
+   (vh->vlan_tci != vlan_tag_be))
+   vh->vlan_tci = vlan_tag_be;
+   } else {
+   m->ol_flags = PKT_TX_VLAN_PKT;

-   /*
-* Find the right seg to adjust the data len when offset is
-* bigger than tail room size.
-*/
-   if (unlikely(vm2vm_mode == VM2VM_HARDWARE)) {
-   if (likely(offset <= rte_pktmbuf_tailroom(m)))
-   m->data_len += offset;
-   else {
-   struct rte_mbuf *seg = m;
+   /*
+* Find the right seg to adjust the data len when offset is
+* bigger than tail room size.
+*/
+   if (unlikely(vm2vm_mode == VM2VM_HARDWARE)) {
+   if (likely(offset <= rte_pktmbuf_tailroom(m)))
+   m->data_len += offset;
+   else {
+   struct rte_mbuf *seg = m;

-   while ((seg->next != NULL) &&
-   (offset > rte_pktmbuf_tailroom(seg)))
-   seg = seg->next;
+   while ((seg->next != NULL) &&
+   (offset > rte_pktmbuf_tailroom(seg)))
+   seg = seg->next;

-   seg->data_len += offset;
+   seg->data_len += offset;
+   }
+   m->pkt_len += offset;
}
-   m->pkt_len += offset;
-   }

-   m->vlan_tci = vlan_tag;
+   m->vlan_tci = vlan_tag;
+   }

tx_q->m_table[len] = m;
len++;
-- 
1.8.4.2



[dpdk-dev] [PATCH v3 21/25] example/vhost: Add vlan-strip cmd line option

2015-01-29 Thread Ouyang Changchun
Support turning RX VLAN strip on/off on the host; this lets the guest get
the chance of using its software VLAN strip functionality.

Signed-off-by: Changchun Ouyang 
---
 examples/vhost/main.c | 25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 6af7874..1876c8e 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -159,6 +159,9 @@ static uint32_t num_devices;
 static uint32_t zero_copy;
 static int mergeable;

+/* Do vlan strip on host, enabled on default */
+static uint32_t vlan_strip = 1;
+
 /* number of descriptors to apply*/
 static uint32_t num_rx_descriptor = RTE_TEST_RX_DESC_DEFAULT_ZCP;
 static uint32_t num_tx_descriptor = RTE_TEST_TX_DESC_DEFAULT_ZCP;
@@ -564,6 +567,7 @@ us_vhost_usage(const char *prgname)
"   --rx-retry-delay [0-N]: timeout(in usecond) between 
retries on RX. This makes effect only if retries on rx enabled\n"
"   --rx-retry-num [0-N]: the number of retries on rx. This 
makes effect only if retries on rx enabled\n"
"   --mergeable [0|1]: disable(default)/enable RX mergeable 
buffers\n"
+   "   --vlan-strip [0|1]: disable/enable(default) RX VLAN 
strip on host\n"
"   --stats [0-N]: 0: Disable stats, N: Time in seconds to 
print stats\n"
"   --dev-basename: The basename to be used for the 
character device.\n"
"   --zero-copy [0|1]: disable(default)/enable rx/tx "
@@ -591,6 +595,7 @@ us_vhost_parse_args(int argc, char **argv)
{"rx-retry-delay", required_argument, NULL, 0},
{"rx-retry-num", required_argument, NULL, 0},
{"mergeable", required_argument, NULL, 0},
+   {"vlan-strip", required_argument, NULL, 0},
{"stats", required_argument, NULL, 0},
{"dev-basename", required_argument, NULL, 0},
{"zero-copy", required_argument, NULL, 0},
@@ -691,6 +696,22 @@ us_vhost_parse_args(int argc, char **argv)
}
}

+   /* Enable/disable RX VLAN strip on host. */
+   if (!strncmp(long_option[option_index].name,
+   "vlan-strip", MAX_LONG_OPT_SZ)) {
+   ret = parse_num_opt(optarg, 1);
+   if (ret == -1) {
+   RTE_LOG(INFO, VHOST_CONFIG,
+   "Invalid argument for VLAN 
strip [0|1]\n");
+   us_vhost_usage(prgname);
+   return -1;
+   } else {
+   vlan_strip = !!ret;
+   vmdq_conf_default.rxmode.hw_vlan_strip =
+   vlan_strip;
+   }
+   }
+
/* Enable/disable stats. */
if (!strncmp(long_option[option_index].name, "stats", 
MAX_LONG_OPT_SZ)) {
ret = parse_num_opt(optarg, INT32_MAX);
@@ -950,7 +971,9 @@ link_vmdq(struct vhost_dev *vdev, struct rte_mbuf *m)
dev->device_fh);

/* Enable stripping of the vlan tag as we handle routing. */
-   rte_eth_dev_set_vlan_strip_on_queue(ports[0], 
(uint16_t)vdev->vmdq_rx_q, 1);
+   if (vlan_strip)
+   rte_eth_dev_set_vlan_strip_on_queue(ports[0],
+   (uint16_t)vdev->vmdq_rx_q, 1);

/* Set device as ready for RX. */
vdev->ready = DEVICE_RX;
-- 
1.8.4.2



[dpdk-dev] [PATCH v3 22/25] virtio: Use soft vlan strip in mergeable Rx path

2015-01-29 Thread Ouyang Changchun
To keep logic consistent with the normal Rx path, the mergeable
Rx path also needs software vlan strip/decap if it is enabled.
Signed-off-by: Changchun Ouyang 
---
 lib/librte_pmd_virtio/virtio_rxtx.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/lib/librte_pmd_virtio/virtio_rxtx.c 
b/lib/librte_pmd_virtio/virtio_rxtx.c
index a82e8eb..c6d9ae7 100644
--- a/lib/librte_pmd_virtio/virtio_rxtx.c
+++ b/lib/librte_pmd_virtio/virtio_rxtx.c
@@ -568,6 +568,7 @@ virtio_recv_mergeable_pkts(void *rx_queue,
uint16_t nb_pkts)
 {
struct virtqueue *rxvq = rx_queue;
+   struct virtio_hw *hw = rxvq->hw;
struct rte_mbuf *rxm, *new_mbuf;
uint16_t nb_used, num, nb_rx = 0;
uint32_t len[VIRTIO_MBUF_BURST_SZ];
@@ -674,6 +675,9 @@ virtio_recv_mergeable_pkts(void *rx_queue,
seg_res -= rcv_cnt;
}

+   if (hw->vlan_strip)
+   rte_vlan_strip(rx_pkts[nb_rx]);
+
VIRTIO_DUMP_PACKET(rx_pkts[nb_rx],
rx_pkts[nb_rx]->data_len);

-- 
1.8.4.2



[dpdk-dev] [PATCH v3 23/25] virtio: Fix zero copy break issue

2015-01-29 Thread Ouyang Changchun
vHost zero copy needs to get the vring descriptor and its buffer address to
set the DMA address of the HW ring; this is done in new_device when the ioctl
set_backend is called. This requires virtio_dev_rxtx_start to be called before
vtpci_reinit_complete,
which makes sure the vring descriptor and its buffer are ready before use.

This patch also fixes a set-status issue: according to the virtio spec,
VIRTIO_CONFIG_STATUS_ACK should be set after the virtio hw reset.

Signed-off-by: Changchun Ouyang 
---
 lib/librte_pmd_virtio/virtio_ethdev.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index b905532..648c761 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -414,6 +414,7 @@ virtio_dev_close(struct rte_eth_dev *dev)
/* reset the NIC */
vtpci_irq_config(hw, VIRTIO_MSI_NO_VECTOR);
vtpci_reset(hw);
+   hw->started = 0;
virtio_dev_free_mbufs(dev);
 }

@@ -1107,9 +1108,6 @@ eth_virtio_dev_init(__rte_unused struct eth_driver 
*eth_drv,
return -ENOMEM;
}

-   /* Tell the host we've noticed this device. */
-   vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_ACK);
-
pci_dev = eth_dev->pci_dev;
if (virtio_resource_init(pci_dev) < 0)
 #ifdef RTE_EAL_PORT_IO
@@ -1123,6 +1121,9 @@ eth_virtio_dev_init(__rte_unused struct eth_driver 
*eth_drv,
/* Reset the device although not necessary at startup */
vtpci_reset(hw);

+   /* Tell the host we've noticed this device. */
+   vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_ACK);
+
/* Tell the host we've known how to drive the device. */
vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER);
virtio_negotiate_features(hw);
@@ -1324,10 +1325,10 @@ virtio_dev_start(struct rte_eth_dev *dev)
if (hw->started)
return 0;

-   vtpci_reinit_complete(hw);
-
/* Do final configuration before rx/tx engine starts */
virtio_dev_rxtx_start(dev);
+   vtpci_reinit_complete(hw);
+
hw->started = 1;

/*Notify the backend
-- 
1.8.4.2



  1   2   >