[dpdk-dev] 4 Traffic classes per Pipe limitation

2015-06-06 Thread Michael Sardo
Oops, I should have searched a bit more before asking. I see that they've
already been made available:
http://dpdk.org/ml/archives/dev/attachments/20150423/17a4d8de/attachment-0001.pdf

Thanks.

-Mike

On Sat, Jun 6, 2015 at 5:05 PM, Michael Sardo  wrote:

> Hello Cristian,
>
> Are the slides shown in that video available? They're very helpful.
>
> -Mike
>
> On Fri, Jun 5, 2015 at 4:50 PM, Dumitrescu, Cristian <
> cristian.dumitrescu at intel.com> wrote:
>
>> Hi Avinash,
>>
>> > -Original Message-
>> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Yeddula, Avinash
>> > Sent: Friday, June 5, 2015 6:06 PM
>> > To: dev at dpdk.org
>> > Subject: [dpdk-dev] 4 Traffic classes per Pipe limitation
>> >
>> > Hi,
>> > This is related to the QOS scheduler functionality provided by dpdk.
>> >
>> > I see a limit on the number of traffic classes to be 4.  I'm exploring the
>> > available options to increase that limit to 8.
>>
>> Yes, there are 4x traffic classes (scheduled in strict priority), but
>> each traffic class has 4x queues (scheduled using WFQ); for big weight
>> ratios between queues (e.g. 1:4 or 1:8, etc), WFQ becomes very similar to
>> strict priority, a kind of strict priority without starvation. So the 16x
>> queues per pipe can be considered 16x sub-traffic-classes.
>>
>> You might want to watch this video on DPDK QoS:
>> https://youtu.be/_PPklkWGugs
>>
>> >
>> > This is what I found when I researched on this topic.
>> > The limitation on the number of TCs (and pipes) comes from the number of
>> > bits available. Since the QoS code overloads the 32-bit RSS field in
>> > the mbuf, there aren't enough bits to allot. But then again, if you add lots
>> > of pipes or subports the memory footprint gets huge.
>>
>> It is not that simple. The number of 4x traffic classes is deeply built
>> into the implementation for performance reasons. Increasing the number of
>> bits allocated to traffic class in mbuf->sched would not help.
>>
>> >
>> > Any more info or suggestions on increasing the limit to 8 ?
>>
>> Yes, look at the 16x pipe queues as 16x (sub)traffic classes.
>> >
>> > Thanks
>> > -Avinash
>>
>
>
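
Cristian's suggestion of treating the 16 pipe queues as sub-traffic-classes can be sketched as a simple mapping. This is only an illustration: the struct, the function names, and the choice of two queues per traffic class are assumptions made for the example and are not part of the DPDK API.

```c
#include <assert.h>

/* Hypothetical constants mirroring the 4 x 4 layout described in the thread. */
#define TCS_PER_PIPE   4
#define QUEUES_PER_TC  4

struct pipe_queue {
    unsigned tc;     /* traffic class, 0 = highest strict priority */
    unsigned queue;  /* WFQ queue inside that traffic class */
};

/*
 * Map one of 8 logical classes (0 = most important) onto the 16 pipe queues.
 * Classes 2k and 2k+1 share traffic class k and are separated only by WFQ,
 * so their weights would need a large ratio to approximate strict priority.
 */
static struct pipe_queue
logical_class_to_queue(unsigned logical_class)
{
    struct pipe_queue pq;

    assert(logical_class < 8);
    pq.tc = logical_class / 2;
    pq.queue = logical_class % 2;
    return pq;
}

/* Flat queue index inside the pipe, in the range 0..15. */
static unsigned
pipe_queue_index(struct pipe_queue pq)
{
    return pq.tc * QUEUES_PER_TC + pq.queue;
}
```

With this layout, logical classes in different traffic classes are separated by true strict priority, while the two classes sharing a traffic class rely on the WFQ weight ratio.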


[dpdk-dev] [PATCH] log:Change magic number on RTE_LOG_LEVEL to a define

2015-06-06 Thread Keith Wiles
Config files used RTE_LOG_LEVEL=8 to set the log level to DEBUG. Using
the RTE_LOG_XXX name instead is easier to maintain.

Convert the RTE_LOG_XXX defines into an enum of values with the same
names, to reduce the effort of maintaining the values and to allow
debuggers to print the name of the value.

Signed-off-by: Keith Wiles 
---
 config/common_bsdapp|  8 +++-
 config/common_linuxapp  |  8 +++-
 lib/librte_eal/common/eal_common_log.c  |  4 ++--
 lib/librte_eal/common/include/rte_log.h | 19 +++
 4 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/config/common_bsdapp b/config/common_bsdapp
index 0b169c8..97bbcbd 100644
--- a/config/common_bsdapp
+++ b/config/common_bsdapp
@@ -93,12 +93,18 @@ CONFIG_RTE_MAX_NUMA_NODES=8
 CONFIG_RTE_MAX_MEMSEG=256
 CONFIG_RTE_MAX_MEMZONE=2560
 CONFIG_RTE_MAX_TAILQ=32
-CONFIG_RTE_LOG_LEVEL=8
 CONFIG_RTE_LOG_HISTORY=256
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n

 #
+# Log level use: RTE_LOG_XXX
+#   XXX = NOOP, EMERG, ALERT, CRIT, ERR, WARNING, NOTICE, INFO or DEBUG
+#   Look in rte_log.h for others if any.
+#
+CONFIG_RTE_LOG_LEVEL=RTE_LOG_DEBUG
+
+#
 # FreeBSD contiguous memory driver settings
 #
 CONFIG_RTE_CONTIGMEM_MAX_NUM_BUFS=64
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 5deb55a..886fc66 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -93,7 +93,6 @@ CONFIG_RTE_MAX_NUMA_NODES=8
 CONFIG_RTE_MAX_MEMSEG=256
 CONFIG_RTE_MAX_MEMZONE=2560
 CONFIG_RTE_MAX_TAILQ=32
-CONFIG_RTE_LOG_LEVEL=8
 CONFIG_RTE_LOG_HISTORY=256
 CONFIG_RTE_LIBEAL_USE_HPET=n
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
@@ -102,6 +101,13 @@ CONFIG_RTE_EAL_IGB_UIO=y
 CONFIG_RTE_EAL_VFIO=y

 #
+# Log level use: RTE_LOG_XXX
+#   XXX = NOOP, EMERG, ALERT, CRIT, ERR, WARNING, NOTICE, INFO or DEBUG
+#   Look in rte_log.h for others if any.
+#
+CONFIG_RTE_LOG_LEVEL=RTE_LOG_DEBUG
+
+#
 # Special configurations in PCI Config Space for high performance
 #
 CONFIG_RTE_PCI_CONFIG=n
diff --git a/lib/librte_eal/common/eal_common_log.c 
b/lib/librte_eal/common/eal_common_log.c
index fe3d7d5..3dcceab 100644
--- a/lib/librte_eal/common/eal_common_log.c
+++ b/lib/librte_eal/common/eal_common_log.c
@@ -82,7 +82,7 @@ static struct log_history_list log_history;
 /* global log structure */
 struct rte_logs rte_logs = {
.type = ~0,
-   .level = RTE_LOG_DEBUG,
+   .level = RTE_LOG_LEVEL,
.file = NULL,
 };

@@ -93,7 +93,7 @@ static int history_enabled = 1;

 /**
  * This global structure stores some informations about the message
- * that is currently beeing processed by one lcore
+ * that is currently being processed by one lcore
  */
 struct log_cur_msg {
uint32_t loglevel; /**< log level - see rte_log.h */
diff --git a/lib/librte_eal/common/include/rte_log.h 
b/lib/librte_eal/common/include/rte_log.h
index 3b467c1..e7e893e 100644
--- a/lib/librte_eal/common/include/rte_log.h
+++ b/lib/librte_eal/common/include/rte_log.h
@@ -89,14 +89,17 @@ extern struct rte_logs rte_logs;
 #define RTE_LOGTYPE_USER8   0x8000 /**< User-defined log type 8. */

 /* Can't use 0, as it gives compiler warnings */
-#define RTE_LOG_EMERG1U  /**< System is unusable.   */
-#define RTE_LOG_ALERT2U  /**< Action must be taken immediately. */
-#define RTE_LOG_CRIT 3U  /**< Critical conditions.  */
-#define RTE_LOG_ERR  4U  /**< Error conditions. */
-#define RTE_LOG_WARNING  5U  /**< Warning conditions.   */
-#define RTE_LOG_NOTICE   6U  /**< Normal but significant condition. */
-#define RTE_LOG_INFO 7U  /**< Informational.*/
-#define RTE_LOG_DEBUG8U  /**< Debug-level messages. */
+enum {
+   RTE_LOG_NOOP = 0,   /**< Noop not used (zero entry)*/
+   RTE_LOG_EMERG,  /**< System is unusable.   */
+   RTE_LOG_ALERT,  /**< Action must be taken immediately. */
+   RTE_LOG_CRIT,   /**< Critical conditions.  */
+   RTE_LOG_ERR,/**< Error conditions. */
+   RTE_LOG_WARNING,/**< Warning conditions.   */
+   RTE_LOG_NOTICE, /**< Normal but significant condition. */
+   RTE_LOG_INFO,   /**< Informational.*/
+   RTE_LOG_DEBUG   /**< Debug-level messages. */
+};

 /** The default log stream. */
 extern FILE *eal_default_log_stream;
-- 
2.3.0
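
The effect of the enum conversion can be tried outside DPDK with a small stand-alone sketch. The names below (`log_level`, `log_level_name`, `log_enabled`) are hypothetical stand-ins; only the ordering and values mirror the enum in the patch.

```c
#include <assert.h>

/* Same ordering as the enum in the patch; the LOG_ names are stand-ins. */
enum log_level {
    LOG_NOOP = 0,   /* zero entry, not used */
    LOG_EMERG,
    LOG_ALERT,
    LOG_CRIT,
    LOG_ERR,
    LOG_WARNING,
    LOG_NOTICE,
    LOG_INFO,
    LOG_DEBUG
};

/* Return a printable name for a level. */
static const char *
log_level_name(enum log_level level)
{
    static const char *const names[] = {
        "NOOP", "EMERG", "ALERT", "CRIT", "ERR",
        "WARNING", "NOTICE", "INFO", "DEBUG"
    };

    assert((unsigned)level <= LOG_DEBUG);
    return names[level];
}

/* A message is emitted when its level is at or below the configured one. */
static int
log_enabled(enum log_level msg, enum log_level configured)
{
    return msg != LOG_NOOP && msg <= configured;
}
```

One point to keep in mind with this design: enum constants are not visible to the C preprocessor, so an `#if` that compares against one of these names would evaluate the name as 0. Any compile-time check of the configured level would need to stay on a `#define`.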



[dpdk-dev] [PATCH] eal:Fix log messages always being printed from rte_eal_cpu_init

2015-06-06 Thread Keith Wiles
The RTE_LOG(DEBUG, ...) messages in rte_eal_cpu_init() are printed
even when the log level on the command line was set to INFO or lower.

The problem is that rte_eal_cpu_init() was called before the command
line arguments were scanned. With this change, setting --log-level=7
correctly suppresses the DEBUG messages from rte_eal_cpu_init().

Signed-off-by: Keith Wiles 
---
 lib/librte_eal/bsdapp/eal/eal.c   | 43 ++-
 lib/librte_eal/linuxapp/eal/eal.c | 43 ++-
 2 files changed, 76 insertions(+), 10 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 43e8a47..ca10f2c 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -306,6 +306,38 @@ eal_get_hugepage_mem_size(void)
return (size < SIZE_MAX) ? (size_t)(size) : SIZE_MAX;
 }

+/* Parse the arguments for --log-level only */
+static void
+eal_log_level_parse(int argc, char **argv)
+{
+   int opt;
+   char **argvopt;
+   int option_index;
+
+   argvopt = argv;
+
+   eal_reset_internal_config(&internal_config);
+
+   while ((opt = getopt_long(argc, argvopt, eal_short_options,
+ eal_long_options, &option_index)) != EOF) {
+
+   int ret;
+
+   /* getopt is not happy, stop right now */
+   if (opt == '?')
+   break;
+
+   ret = (opt == OPT_LOG_LEVEL_NUM)?
+   eal_parse_common_option(opt, optarg, &internal_config) : 0;
+
+   /* common parser is not happy */
+   if (ret < 0)
+   break;
+   }
+
+   optind = 0; /* reset getopt lib */
+}
+
 /* Parse the argument given in the command line of the application */
 static int
 eal_parse_args(int argc, char **argv)
@@ -317,8 +349,6 @@ eal_parse_args(int argc, char **argv)

argvopt = argv;

-   eal_reset_internal_config(&internal_config);
-
while ((opt = getopt_long(argc, argvopt, eal_short_options,
  eal_long_options, &option_index)) != EOF) {

@@ -447,6 +477,12 @@ rte_eal_init(int argc, char **argv)
if (rte_eal_log_early_init() < 0)
rte_panic("Cannot init early logs\n");

+   eal_log_level_parse(argc, argv);
+
+   /* set log level as early as possible */
+   rte_set_log_level(internal_config.log_level);
+
+   RTE_LOG(INFO, EAL,  "DPDK Version %s\n", rte_version());
if (rte_eal_cpu_init() < 0)
rte_panic("Cannot detect lcores\n");

@@ -454,9 +490,6 @@ rte_eal_init(int argc, char **argv)
if (fctret < 0)
exit(1);

-   /* set log level as early as possible */
-   rte_set_log_level(internal_config.log_level);
-
if (internal_config.no_hugetlbfs == 0 &&
internal_config.process_type != RTE_PROC_SECONDARY &&
eal_hugepage_info_init() < 0)
diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index bd770cf..090ec99 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -499,6 +499,38 @@ eal_get_hugepage_mem_size(void)
return (size < SIZE_MAX) ? (size_t)(size) : SIZE_MAX;
 }

+/* Parse the arguments for --log-level only */
+static void
+eal_log_level_parse(int argc, char **argv)
+{
+   int opt;
+   char **argvopt;
+   int option_index;
+
+   argvopt = argv;
+
+   eal_reset_internal_config(&internal_config);
+
+   while ((opt = getopt_long(argc, argvopt, eal_short_options,
+ eal_long_options, &option_index)) != EOF) {
+
+   int ret;
+
+   /* getopt is not happy, stop right now */
+   if (opt == '?')
+   break;
+
+   ret = (opt == OPT_LOG_LEVEL_NUM)?
+   eal_parse_common_option(opt, optarg, &internal_config) : 0;
+
+   /* common parser is not happy */
+   if (ret < 0)
+   break;
+   }
+
+   optind = 0; /* reset getopt lib */
+}
+
 /* Parse the argument given in the command line of the application */
 static int
 eal_parse_args(int argc, char **argv)
@@ -511,8 +543,6 @@ eal_parse_args(int argc, char **argv)

argvopt = argv;

-   eal_reset_internal_config(&internal_config);
-
while ((opt = getopt_long(argc, argvopt, eal_short_options,
  eal_long_options, &option_index)) != EOF) {

@@ -717,6 +747,12 @@ rte_eal_init(int argc, char **argv)
if (rte_eal_log_early_init() < 0)
rte_panic("Cannot init early logs\n");

+   eal_log_level_parse(argc, argv);
+
+   /* set log level as early as possible */
+   rte_set_log_level(internal_config.log_level);
+
+   RTE_LOG(INFO, EAL, "DPDK Version %s\n", rte_version());
if (rte_eal_cpu_init() < 0)
rte_panic("Cannot detect lcores\n");

@@ -724,9 +760,6 @@ 
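
The two-pass idea in this patch (scan the arguments once just for --log-level, then reset getopt so the full parse can run later) can be sketched in isolation. The option table, the short-option string, and the function name below are hypothetical stand-ins for the EAL tables. Resetting with `optind = 0` follows the patch; it relies on the getopt implementation treating 0 as a request to reinitialize, as glibc does.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the EAL option tables. */
enum { OPT_LOG_LEVEL_NUM = 256, OPT_NO_HUGE_NUM };

static const struct option long_options[] = {
    { "log-level", required_argument, NULL, OPT_LOG_LEVEL_NUM },
    { "no-huge",   no_argument,       NULL, OPT_NO_HUGE_NUM },
    { NULL, 0, NULL, 0 }
};

/*
 * First pass: pick out --log-level only and ignore every other option.
 * Returns default_level when the option is absent.
 */
static int
early_log_level(int argc, char **argv, int default_level)
{
    int opt;
    int level = default_level;

    assert(argc >= 1 && argv != NULL);

    optind = 0;   /* start from a clean getopt state */
    opterr = 0;   /* stay quiet; the full parse reports errors later */

    while ((opt = getopt_long(argc, argv, "c:n:", long_options, NULL)) != -1) {
        if (opt == '?')
            break;                      /* getopt is not happy, stop */
        if (opt == OPT_LOG_LEVEL_NUM)
            level = (int)strtol(optarg, NULL, 10);
    }

    optind = 0;   /* reset so the later full parse starts over */
    return level;
}
```

The caller would apply the returned level before any code that logs at DEBUG runs, and then perform the normal full argument parse.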

[dpdk-dev] [PATCH v2 7/7] app/test: update unit test with rte_memzone_free

2015-06-06 Thread Sergio Gonzalez Monroy
Update memzone unit test for the new rte_memzone_free API.

Signed-off-by: Sergio Gonzalez Monroy 
---
 app/test/test_memzone.c | 49 +
 1 file changed, 49 insertions(+)

diff --git a/app/test/test_memzone.c b/app/test/test_memzone.c
index c5e4872..7667d30 100644
--- a/app/test/test_memzone.c
+++ b/app/test/test_memzone.c
@@ -683,6 +683,51 @@ test_memzone_bounded(void)
return (0);
 }

+static int
+test_memzone_free(void)
+{
+   const struct rte_memzone *mz[4];
+
+   mz[0] = rte_memzone_reserve("tempzone0", 2000, SOCKET_ID_ANY, 0);
+   mz[1] = rte_memzone_reserve("tempzone1", 4000, SOCKET_ID_ANY, 0);
+
+   if (mz[0] > mz[1])
+   return -1;
+   if (!rte_memzone_lookup("tempzone0"))
+   return -1;
+   if (!rte_memzone_lookup("tempzone1"))
+   return -1;
+
+   if (rte_memzone_free(mz[0])) {
+   printf("Fail memzone free - tempzone0\n");
+   return -1;
+   }
+   if (rte_memzone_lookup("tempzone0")) {
+   printf("Found previously free memzone - tempzone0\n");
+   return -1;
+   }
+   mz[2] = rte_memzone_reserve("tempzone2", 2000, SOCKET_ID_ANY, 0);
+
+   if (mz[2] > mz[1]) {
+   printf("tempzone2 should have gotten the free entry from 
tempzone0\n");
+   return -1;
+   }
+   if (rte_memzone_free(mz[2])) {
+   printf("Fail memzone free - tempzone2\n");
+   return -1;
+   }
+   if (rte_memzone_lookup("tempzone2")) {
+   printf("Found previously free memzone - tempzone2\n");
+   return -1;
+   }
+   if (rte_memzone_free(mz[1])) {
+   printf("Fail memzone free - tempzone1\n");
+   return -1;
+   }
+   if (rte_memzone_lookup("tempzone1")) {
+   printf("Found previously free memzone - tempzone1\n");
+   return -1;
+   }

return 0;
 }
@@ -795,6 +840,10 @@ test_memzone(void)
if (test_memzone_reserve_max_aligned() < 0)
return -1;

+   printf("test free memzone\n");
+   if (test_memzone_free() < 0)
+   return -1;
+
return 0;
 }

-- 
1.9.3



[dpdk-dev] [PATCH v2 6/7] eal: new rte_memzone_free

2015-06-06 Thread Sergio Gonzalez Monroy
Implement rte_memzone_free which, as its name implies, frees a
memzone.

Currently memzones are tracked in an array and cannot be freed.
To be able to reuse the same array to track memzones, we have to
change how we keep track of reserved memzones.

With this patch, any memzone with addr NULL is not used, so we also need
to change how we look for the next free memzone entry.

Signed-off-by: Sergio Gonzalez Monroy 
---
 lib/librte_eal/bsdapp/eal/rte_eal_version.map |  6 +++
 lib/librte_eal/common/eal_common_memzone.c| 50 +--
 lib/librte_eal/common/include/rte_eal_memconfig.h |  2 +-
 lib/librte_eal/common/include/rte_memzone.h   | 11 +
 lib/librte_eal/linuxapp/eal/eal_ivshmem.c | 28 +++--
 lib/librte_eal/linuxapp/eal/rte_eal_version.map   |  6 +++
 6 files changed, 95 insertions(+), 8 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map 
b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 0401be2..7110816 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -105,3 +105,9 @@ DPDK_2.0 {

local: *;
 };
+
+DPDK_2.1 {
+   global:
+
+   rte_memzone_free;
+} DPDK_2.0;
diff --git a/lib/librte_eal/common/eal_common_memzone.c 
b/lib/librte_eal/common/eal_common_memzone.c
index 742f6c9..0b458ec 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -76,6 +76,23 @@ memzone_lookup_thread_unsafe(const char *name)
return NULL;
 }

+static inline struct rte_memzone *
+get_next_free_memzone(void)
+{
+   struct rte_mem_config *mcfg;
+   unsigned i = 0;
+
+   /* get pointer to global configuration */
+   mcfg = rte_eal_get_configuration()->mem_config;
+
+   for (i = 0; i < RTE_MAX_MEMZONE && mcfg->memzone[i].addr != NULL; i++);
+
+   if (i < RTE_MAX_MEMZONE)
+   return &mcfg->memzone[i];
+
+   return NULL;
+}
+
 /*
  * Return a pointer to a correctly filled memzone descriptor. If the
  * allocation cannot be done, return NULL.
@@ -140,7 +157,7 @@ memzone_reserve_aligned_thread_unsafe(const char *name, 
size_t len,
mcfg = rte_eal_get_configuration()->mem_config;

/* no more room in config */
-   if (mcfg->memzone_idx >= RTE_MAX_MEMZONE) {
+   if (mcfg->memzone_cnt >= RTE_MAX_MEMZONE) {
RTE_LOG(ERR, EAL, "%s(): No more room in config\n", __func__);
rte_errno = ENOSPC;
return NULL;
@@ -214,7 +231,8 @@ memzone_reserve_aligned_thread_unsafe(const char *name, 
size_t len,
const struct malloc_elem *elem = malloc_elem_from_data(mz_addr);

/* fill the zone in config */
-   struct rte_memzone *mz = &mcfg->memzone[mcfg->memzone_idx++];
+   struct rte_memzone *mz = get_next_free_memzone();
+   mcfg->memzone_cnt++;
snprintf(mz->name, sizeof(mz->name), "%s", name);
mz->phys_addr = rte_malloc_virt2phy(mz_addr);
mz->addr = mz_addr;
@@ -290,6 +308,32 @@ rte_memzone_reserve_bounded(const char *name, size_t len,
return mz;
 }

+int
+rte_memzone_free(const struct rte_memzone *mz)
+{
+   struct rte_mem_config *mcfg;
+   int ret = 0;
+   void *addr;
+   unsigned idx;
+
+   if (mz == NULL)
+   return -EINVAL;
+
+   mcfg = rte_eal_get_configuration()->mem_config;
+
+   rte_rwlock_read_lock(&mcfg->mlock);
+
+   idx = ((uintptr_t)mz - (uintptr_t)mcfg->memzone);
+   idx = idx / sizeof(struct rte_memzone);
+
+   addr = mcfg->memzone[idx].addr;
+   mcfg->memzone[idx].addr = NULL;
+   rte_free(addr);
+
+   rte_rwlock_read_unlock(&mcfg->mlock);
+
+   return ret;
+}

 /*
  * Lookup for the memzone identified by the given name
@@ -363,7 +407,7 @@ rte_eal_memzone_init(void)
rte_rwlock_write_lock(&mcfg->mlock);

/* delete all zones */
-   mcfg->memzone_idx = 0;
+   mcfg->memzone_cnt = 0;
memset(mcfg->memzone, 0, sizeof(mcfg->memzone));

rte_rwlock_write_unlock(&mcfg->mlock);
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h 
b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 055212a..2015074 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -67,7 +67,7 @@ struct rte_mem_config {
rte_rwlock_t qlock;   /**< used for tailq operation for thread safe. */
rte_rwlock_t mplock;  /**< only used by mempool LIB for thread-safe. */

-   uint32_t memzone_idx; /**< Index of memzone */
+   uint32_t memzone_cnt; /**< Number of allocated memzones */

/* memory segments and zones */
struct rte_memseg memseg[RTE_MAX_MEMSEG];/**< Physmem descriptors. 
*/
diff --git a/lib/librte_eal/common/include/rte_memzone.h 
b/lib/librte_eal/common/include/rte_memzone.h
index 81b6ad4..3f54bde 100644
--- a/lib/librte_eal/common/include/rte_memzone.h
+++ b/lib/librte_eal/common/include/rte_memzone.h
@@ 
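
The bookkeeping change described in this patch (a fixed array where a NULL addr marks a free slot, plus a count instead of a running index) can be sketched on its own. Everything below is a simplified stand-in: `struct zone`, `MAX_ZONES`, and the function names are hypothetical, there is no locking, and the sketch decrements the count on free as one possible design, which the hunks shown above do not do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_ZONES 4   /* hypothetical, stands in for RTE_MAX_MEMZONE */

struct zone {
    char name[32];
    void *addr;       /* NULL means the slot is free */
};

static struct zone zones[MAX_ZONES];
static unsigned zone_cnt;   /* number of slots currently in use */

/* Linear scan for the first slot whose addr is NULL. */
static struct zone *
next_free_zone(void)
{
    unsigned i;

    for (i = 0; i < MAX_ZONES; i++)
        if (zones[i].addr == NULL)
            return &zones[i];
    return NULL;
}

/* Claim a slot for an already-allocated region. */
static struct zone *
zone_reserve(const char *name, void *addr)
{
    struct zone *z;

    assert(name != NULL && addr != NULL);
    if (zone_cnt >= MAX_ZONES)
        return NULL;            /* no more room in the table */
    z = next_free_zone();
    if (z == NULL)
        return NULL;
    snprintf(z->name, sizeof(z->name), "%s", name);
    z->addr = addr;
    zone_cnt++;
    return z;
}

/* Release a slot so a later reserve can reuse it. */
static int
zone_free(struct zone *z)
{
    if (z == NULL || z->addr == NULL)
        return -1;
    z->addr = NULL;
    zone_cnt--;
    return 0;
}
```

Because the scan always returns the lowest free slot, a zone reserved right after a free lands in the slot that was just released, which is the property the tempzone0/tempzone2 unit test in this series checks.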

[dpdk-dev] [PATCH v2 5/7] eal: remove setup of free_memseg in ivshmem

2015-06-06 Thread Sergio Gonzalez Monroy
Remove code setting up free_memseg as it is not used/relevant anymore.

Signed-off-by: Sergio Gonzalez Monroy 
---
 lib/librte_eal/linuxapp/eal/eal_ivshmem.c | 9 -
 1 file changed, 9 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_ivshmem.c 
b/lib/librte_eal/linuxapp/eal/eal_ivshmem.c
index 2deaeb7..facfb80 100644
--- a/lib/librte_eal/linuxapp/eal/eal_ivshmem.c
+++ b/lib/librte_eal/linuxapp/eal/eal_ivshmem.c
@@ -725,15 +725,6 @@ map_all_segments(void)
 * expect memsegs to be empty */
memcpy(>memseg[i], ,
sizeof(struct rte_memseg));
-   memcpy(>free_memseg[i], ,
-   sizeof(struct rte_memseg));
-
-
-   /* adjust the free_memseg so that there's no free space left */
-   mcfg->free_memseg[i].ioremap_addr += mcfg->free_memseg[i].len;
-   mcfg->free_memseg[i].phys_addr += mcfg->free_memseg[i].len;
-   mcfg->free_memseg[i].addr_64 += mcfg->free_memseg[i].len;
-   mcfg->free_memseg[i].len = 0;

close(fd);

-- 
1.9.3



[dpdk-dev] [PATCH v2 3/7] app/test: update malloc/memzone unit tests

2015-06-06 Thread Sergio Gonzalez Monroy
Some unit tests are not relevant anymore. This is the case for those malloc
UTs that checked corner cases when allocating MALLOC_MEMZONE_SIZE
chunks, and for those memzone UTs relying on specific free
memsegs of the reserved memzone.

Other UTs just need to be updated, for example, to calculate the maximum free
block size available.

Signed-off-by: Sergio Gonzalez Monroy 
---
 app/test/test_malloc.c  |  86 --
 app/test/test_memzone.c | 436 
 2 files changed, 35 insertions(+), 487 deletions(-)

diff --git a/app/test/test_malloc.c b/app/test/test_malloc.c
index ea6f651..a04a751 100644
--- a/app/test/test_malloc.c
+++ b/app/test/test_malloc.c
@@ -56,10 +56,6 @@

 #define N 1

-#define QUOTE_(x) #x
-#define QUOTE(x) QUOTE_(x)
-#define MALLOC_MEMZONE_SIZE QUOTE(RTE_MALLOC_MEMZONE_SIZE)
-
 /*
  * Malloc
  * ==
@@ -292,60 +288,6 @@ test_str_to_size(void)
 }

 static int
-test_big_alloc(void)
-{
-   int socket = 0;
-   struct rte_malloc_socket_stats pre_stats, post_stats;
-   size_t size =rte_str_to_size(MALLOC_MEMZONE_SIZE)*2;
-   int align = 0;
-#ifndef RTE_LIBRTE_MALLOC_DEBUG
-   int overhead = RTE_CACHE_LINE_SIZE + RTE_CACHE_LINE_SIZE;
-#else
-   int overhead = RTE_CACHE_LINE_SIZE + RTE_CACHE_LINE_SIZE + 
RTE_CACHE_LINE_SIZE;
-#endif
-
-   rte_malloc_get_socket_stats(socket, &pre_stats);
-
-   void *p1 = rte_malloc_socket("BIG", size , align, socket);
-   if (!p1)
-   return -1;
-   rte_malloc_get_socket_stats(socket,&post_stats);
-
-   /* Check statistics reported are correct */
-   /* Allocation may increase, or may be the same as before big allocation 
*/
-   if (post_stats.heap_totalsz_bytes < pre_stats.heap_totalsz_bytes) {
-   printf("Malloc statistics are incorrect - 
heap_totalsz_bytes\n");
-   return -1;
-   }
-   /* Check that allocated size adds up correctly */
-   if (post_stats.heap_allocsz_bytes !=
-   pre_stats.heap_allocsz_bytes + size + align + overhead) 
{
-   printf("Malloc statistics are incorrect - alloc_size\n");
-   return -1;
-   }
-   /* Check free size against tested allocated size */
-   if (post_stats.heap_freesz_bytes !=
-   post_stats.heap_totalsz_bytes - 
post_stats.heap_allocsz_bytes) {
-   printf("Malloc statistics are incorrect - heap_freesz_bytes\n");
-   return -1;
-   }
-   /* Number of allocated blocks must increase after allocation */
-   if (post_stats.alloc_count != pre_stats.alloc_count + 1) {
-   printf("Malloc statistics are incorrect - alloc_count\n");
-   return -1;
-   }
-   /* New blocks now available - just allocated 1 but also 1 new free */
-   if (post_stats.free_count != pre_stats.free_count &&
-   post_stats.free_count != pre_stats.free_count - 1) {
-   printf("Malloc statistics are incorrect - free_count\n");
-   return -1;
-   }
-
-   rte_free(p1);
-   return 0;
-}
-
-static int
 test_multi_alloc_statistics(void)
 {
int socket = 0;
@@ -399,10 +341,6 @@ test_multi_alloc_statistics(void)
/* After freeing both allocations check stats return to original */
rte_malloc_get_socket_stats(socket, _stats);

-   /*
-* Check that no new blocks added after small allocations
-* i.e. < RTE_MALLOC_MEMZONE_SIZE
-*/
if(second_stats.heap_totalsz_bytes != first_stats.heap_totalsz_bytes) {
printf("Incorrect heap statistics: Total size \n");
return -1;
@@ -447,18 +385,6 @@ test_multi_alloc_statistics(void)
 }

 static int
-test_memzone_size_alloc(void)
-{
-   void *p1 = rte_malloc("BIG", 
(size_t)(rte_str_to_size(MALLOC_MEMZONE_SIZE) - 128), 64);
-   if (!p1)
-   return -1;
-   rte_free(p1);
-   /* one extra check - check no crashes if free(NULL) */
-   rte_free(NULL);
-   return 0;
-}
-
-static int
 test_rte_malloc_type_limits(void)
 {
/* The type-limits functionality is not yet implemented,
@@ -935,18 +861,6 @@ test_malloc(void)
}
else printf("test_str_to_size() passed\n");

-   if (test_memzone_size_alloc() < 0){
-   printf("test_memzone_size_alloc() failed\n");
-   return -1;
-   }
-   else printf("test_memzone_size_alloc() passed\n");
-
-   if (test_big_alloc() < 0){
-   printf("test_big_alloc() failed\n");
-   return -1;
-   }
-   else printf("test_big_alloc() passed\n");
-
if (test_zero_aligned_alloc() < 0){
printf("test_zero_aligned_alloc() failed\n");
return -1;
diff --git a/app/test/test_memzone.c b/app/test/test_memzone.c
index 9c7a1cb..c5e4872 100644
--- a/app/test/test_memzone.c
+++ b/app/test/test_memzone.c
@@ -44,6 +44,9 @@
 #include 
 #include 
 

[dpdk-dev] [PATCH v2 1/7] eal: move librte_malloc to eal/common

2015-06-06 Thread Sergio Gonzalez Monroy
This patch moves the malloc library inside the eal.

This is the first step towards using malloc to allocate memory directly
from memsegs. Thus, memzones would allocate memory through malloc,
allowing memzones to be unreserved/freed.

Signed-off-by: Sergio Gonzalez Monroy 
---
 config/common_bsdapp|   9 +-
 config/common_linuxapp  |   9 +-
 drivers/net/af_packet/Makefile  |   1 -
 drivers/net/bonding/Makefile|   1 -
 drivers/net/e1000/Makefile  |   2 +-
 drivers/net/enic/Makefile   |   2 +-
 drivers/net/fm10k/Makefile  |   2 +-
 drivers/net/i40e/Makefile   |   2 +-
 drivers/net/ixgbe/Makefile  |   2 +-
 drivers/net/mlx4/Makefile   |   1 -
 drivers/net/null/Makefile   |   1 -
 drivers/net/pcap/Makefile   |   1 -
 drivers/net/virtio/Makefile |   2 +-
 drivers/net/vmxnet3/Makefile|   2 +-
 drivers/net/xenvirt/Makefile|   2 +-
 lib/Makefile|   1 -
 lib/librte_acl/Makefile |   2 +-
 lib/librte_eal/bsdapp/eal/Makefile  |   4 +-
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  13 +
 lib/librte_eal/common/Makefile  |   1 +
 lib/librte_eal/common/include/rte_malloc.h  | 342 
 lib/librte_eal/common/malloc_elem.c | 320 ++
 lib/librte_eal/common/malloc_elem.h | 190 +
 lib/librte_eal/common/malloc_heap.c | 209 +++
 lib/librte_eal/common/malloc_heap.h |  70 +
 lib/librte_eal/common/rte_malloc.c  | 260 ++
 lib/librte_eal/linuxapp/eal/Makefile|   4 +-
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  13 +
 lib/librte_hash/Makefile|   2 +-
 lib/librte_lpm/Makefile |   2 +-
 lib/librte_malloc/Makefile  |  52 
 lib/librte_malloc/malloc_elem.c | 320 --
 lib/librte_malloc/malloc_elem.h | 190 -
 lib/librte_malloc/malloc_heap.c | 209 ---
 lib/librte_malloc/malloc_heap.h |  70 -
 lib/librte_malloc/rte_malloc.c  | 260 --
 lib/librte_malloc/rte_malloc.h  | 342 
 lib/librte_malloc/rte_malloc_version.map|  19 --
 lib/librte_mempool/Makefile |   2 -
 lib/librte_port/Makefile|   1 -
 lib/librte_ring/Makefile|   3 +-
 lib/librte_table/Makefile   |   1 -
 42 files changed, 1440 insertions(+), 1501 deletions(-)
 create mode 100644 lib/librte_eal/common/include/rte_malloc.h
 create mode 100644 lib/librte_eal/common/malloc_elem.c
 create mode 100644 lib/librte_eal/common/malloc_elem.h
 create mode 100644 lib/librte_eal/common/malloc_heap.c
 create mode 100644 lib/librte_eal/common/malloc_heap.h
 create mode 100644 lib/librte_eal/common/rte_malloc.c
 delete mode 100644 lib/librte_malloc/Makefile
 delete mode 100644 lib/librte_malloc/malloc_elem.c
 delete mode 100644 lib/librte_malloc/malloc_elem.h
 delete mode 100644 lib/librte_malloc/malloc_heap.c
 delete mode 100644 lib/librte_malloc/malloc_heap.h
 delete mode 100644 lib/librte_malloc/rte_malloc.c
 delete mode 100644 lib/librte_malloc/rte_malloc.h
 delete mode 100644 lib/librte_malloc/rte_malloc_version.map

diff --git a/config/common_bsdapp b/config/common_bsdapp
index 0b169c8..5d3cc39 100644
--- a/config/common_bsdapp
+++ b/config/common_bsdapp
@@ -97,6 +97,8 @@ CONFIG_RTE_LOG_LEVEL=8
 CONFIG_RTE_LOG_HISTORY=256
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
+CONFIG_RTE_MALLOC_DEBUG=n
+CONFIG_RTE_MALLOC_MEMZONE_SIZE=11M

 #
 # FreeBSD contiguous memory driver settings
@@ -295,13 +297,6 @@ CONFIG_RTE_LIBRTE_TIMER=y
 CONFIG_RTE_LIBRTE_TIMER_DEBUG=n

 #
-# Compile librte_malloc
-#
-CONFIG_RTE_LIBRTE_MALLOC=y
-CONFIG_RTE_LIBRTE_MALLOC_DEBUG=n
-CONFIG_RTE_MALLOC_MEMZONE_SIZE=11M
-
-#
 # Compile librte_cfgfile
 #
 CONFIG_RTE_LIBRTE_CFGFILE=y
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 5deb55a..810168f 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -100,6 +100,8 @@ CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
 CONFIG_RTE_EAL_IGB_UIO=y
 CONFIG_RTE_EAL_VFIO=y
+CONFIG_RTE_MALLOC_DEBUG=n
+CONFIG_RTE_MALLOC_MEMZONE_SIZE=11M

 #
 # Special configurations in PCI Config Space for high performance
@@ -302,13 +304,6 @@ CONFIG_RTE_LIBRTE_TIMER=y
 CONFIG_RTE_LIBRTE_TIMER_DEBUG=n

 #
-# Compile librte_malloc
-#
-CONFIG_RTE_LIBRTE_MALLOC=y
-CONFIG_RTE_LIBRTE_MALLOC_DEBUG=n
-CONFIG_RTE_MALLOC_MEMZONE_SIZE=11M
-

[dpdk-dev] [PATCH v2 0/7] dynamic memzone

2015-06-06 Thread Sergio Gonzalez Monroy
The current implementation allows reserving/creating memzones but not the
opposite (unreserve/free). This affects mempools and other memzone based objects.

From my point of view, implementing free functionality for memzones would look
like malloc over memsegs.
Thus, this approach moves malloc inside eal (which in turn removes a circular
dependency), where malloc heaps are composed of memsegs.
We keep both malloc and memzone APIs as they are, but memzones allocate their
memory by calling malloc_heap_alloc.
Some extra functionality is required in malloc to allow for boundary constrained
memory requests.
In summary, currently malloc is based on memzones, and with this approach
memzones are based on malloc.
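
As an illustration of what a boundary constrained request means, here is a minimal sketch of the check an allocator could apply to a candidate block. The function name and signature are hypothetical and are not taken from the malloc code in this series; the boundary is assumed to be a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return 1 if the region [start, start + len) crosses a multiple of
 * `bound`, 0 if it fits entirely inside one bound-sized window.
 * Assumes bound is a non-zero power of two and len is non-zero.
 */
static int
crosses_boundary(uintptr_t start, size_t len, size_t bound)
{
    uintptr_t mask;

    assert(len > 0);
    assert(bound != 0 && (bound & (bound - 1)) == 0);

    /* Clearing the low bits gives the base of the window an address is in. */
    mask = ~((uintptr_t)bound - 1);
    return (start & mask) != ((start + len - 1) & mask);
}
```

An allocator honouring such a constraint would skip or realign any candidate block for which this check returns 1.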

v2:
 - New rte_memzone_free
 - Support memzone len = 0
 - Add all available memsegs to malloc heap at init
 - Update memzone/malloc unit tests

TODOs:
 - checkpatch: current malloc code gives plenty of errors

Sergio Gonzalez Monroy (7):
  eal: move librte_malloc to eal/common
  eal: memzone allocated by malloc
  app/test: update malloc/memzone unit tests
  config: remove CONFIG_RTE_MALLOC_MEMZONE_SIZE
  eal: remove setup of free_memseg in ivshmem
  eal: new rte_memzone_free
  app/test: update unit test with rte_memzone_free

 app/test/test_malloc.c|  86 -
 app/test/test_memzone.c   | 439 +++---
 config/common_bsdapp  |   8 +-
 config/common_linuxapp|   8 +-
 drivers/net/af_packet/Makefile|   1 -
 drivers/net/bonding/Makefile  |   1 -
 drivers/net/e1000/Makefile|   2 +-
 drivers/net/enic/Makefile |   2 +-
 drivers/net/fm10k/Makefile|   2 +-
 drivers/net/i40e/Makefile |   2 +-
 drivers/net/ixgbe/Makefile|   2 +-
 drivers/net/mlx4/Makefile |   1 -
 drivers/net/null/Makefile |   1 -
 drivers/net/pcap/Makefile |   1 -
 drivers/net/virtio/Makefile   |   2 +-
 drivers/net/vmxnet3/Makefile  |   2 +-
 drivers/net/xenvirt/Makefile  |   2 +-
 lib/Makefile  |   1 -
 lib/librte_acl/Makefile   |   2 +-
 lib/librte_eal/bsdapp/eal/Makefile|   4 +-
 lib/librte_eal/bsdapp/eal/rte_eal_version.map |  19 +
 lib/librte_eal/common/Makefile|   1 +
 lib/librte_eal/common/eal_common_memzone.c| 323 ++--
 lib/librte_eal/common/include/rte_eal_memconfig.h |   4 +-
 lib/librte_eal/common/include/rte_malloc.h| 342 +
 lib/librte_eal/common/include/rte_malloc_heap.h   |   3 +-
 lib/librte_eal/common/include/rte_memory.h|   1 +
 lib/librte_eal/common/include/rte_memzone.h   |  11 +
 lib/librte_eal/common/malloc_elem.c   | 344 +
 lib/librte_eal/common/malloc_elem.h   | 192 ++
 lib/librte_eal/common/malloc_heap.c   | 207 ++
 lib/librte_eal/common/malloc_heap.h   |  70 
 lib/librte_eal/common/rte_malloc.c| 259 +
 lib/librte_eal/linuxapp/eal/Makefile  |   4 +-
 lib/librte_eal/linuxapp/eal/eal_ivshmem.c |  37 +-
 lib/librte_eal/linuxapp/eal/rte_eal_version.map   |  19 +
 lib/librte_hash/Makefile  |   2 +-
 lib/librte_lpm/Makefile   |   2 +-
 lib/librte_malloc/Makefile|  52 ---
 lib/librte_malloc/malloc_elem.c   | 320 
 lib/librte_malloc/malloc_elem.h   | 190 --
 lib/librte_malloc/malloc_heap.c   | 209 --
 lib/librte_malloc/malloc_heap.h   |  70 
 lib/librte_malloc/rte_malloc.c| 260 -
 lib/librte_malloc/rte_malloc.h| 342 -
 lib/librte_malloc/rte_malloc_version.map  |  19 -
 lib/librte_mempool/Makefile   |   2 -
 lib/librte_port/Makefile  |   1 -
 lib/librte_ring/Makefile  |   3 +-
 lib/librte_table/Makefile |   1 -
 50 files changed, 1685 insertions(+), 2193 deletions(-)
 create mode 100644 lib/librte_eal/common/include/rte_malloc.h
 create mode 100644 lib/librte_eal/common/malloc_elem.c
 create mode 100644 lib/librte_eal/common/malloc_elem.h
 create mode 100644 lib/librte_eal/common/malloc_heap.c
 create mode 100644 lib/librte_eal/common/malloc_heap.h
 create mode 100644 lib/librte_eal/common/rte_malloc.c
 delete mode 100644 lib/librte_malloc/Makefile
 delete mode 100644 lib/librte_malloc/malloc_elem.c
 delete mode 100644 lib/librte_malloc/malloc_elem.h
 delete mode 100644 lib/librte_malloc/malloc_heap.c
 delete mode 

[dpdk-dev] [PATCH 16/16] mlx4: query netdevice to get initial MAC address

2015-06-06 Thread Adrien Mazarguil
From: Or Ami 

Querying the netdevice instead of deriving the port's MAC address from its
GID is less prone to errors. There is no guarantee that the GID will always
contain it nor that the algorithm won't change.

Signed-off-by: Or Ami 
Signed-off-by: Olga Shern 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/mlx4.c | 34 ++
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 4c4f693..04cc5e1 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -4427,22 +4427,25 @@ mlx4_ibv_device_to_pci_addr(const struct ibv_device 
*device,
 }

 /**
- * Derive MAC address from port GID.
+ * Get MAC address by querying netdevice.
  *
+ * @param[in] priv
+ *   struct priv for the requested device.
  * @param[out] mac
  *   MAC address output buffer.
- * @param port
- *   Physical port number.
- * @param[in] gid
- *   Port GID.
+ *
+ * @return
+ *   0 on success, -1 on failure and errno is set.
  */
-static void
-mac_from_gid(uint8_t (*mac)[ETHER_ADDR_LEN], uint32_t port, uint8_t *gid)
+static int
+priv_get_mac(struct priv *priv, uint8_t (*mac)[ETHER_ADDR_LEN])
 {
-   memcpy(&(*mac)[0], gid + 8, 3);
-   memcpy(&(*mac)[3], gid + 13, 3);
-   if (port == 1)
-   (*mac)[0] ^= 2;
+   struct ifreq request;
+
+   if (priv_ifreq(priv, SIOCGIFHWADDR, &request))
+   return -1;
+   memcpy(mac, request.ifr_hwaddr.sa_data, ETHER_ADDR_LEN);
+   return 0;
 }

 /* Support up to 32 adapters. */
@@ -4604,7 +4607,6 @@ mlx4_pci_devinit(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
struct ibv_exp_device_attr exp_device_attr;
 #endif /* HAVE_EXP_QUERY_DEVICE */
struct ether_addr mac;
-   union ibv_gid temp_gid;

 #ifdef HAVE_EXP_QUERY_DEVICE
exp_device_attr.comp_mask = IBV_EXP_DEVICE_ATTR_EXP_CAP_FLAGS;
@@ -4729,12 +4731,12 @@ mlx4_pci_devinit(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)

(void)mlx4_getenv_int;
priv->vf = vf;
-   if (ibv_query_gid(ctx, port, 0, &temp_gid)) {
-   ERROR("ibv_query_gid() failure");
+   /* Configure the first MAC address by default. */
+   if (priv_get_mac(priv, &mac.addr_bytes)) {
+   ERROR("cannot get MAC address, is mlx4_en loaded?"
+ " (errno: %s)", strerror(errno));
goto port_error;
}
-   /* Configure the first MAC address by default. */
-   mac_from_gid(&mac.addr_bytes, port, temp_gid.raw);
INFO("port %u MAC address is %02x:%02x:%02x:%02x:%02x:%02x",
 priv->port,
 mac.addr_bytes[0], mac.addr_bytes[1],
-- 
2.1.0
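
A minimal sketch of the netdevice query this patch switches to, assuming a
Linux host. netdev_get_mac() and its interface-name parameter are
hypothetical stand-ins for the driver's priv_get_mac() and priv_ifreq(),
whose bodies are not shown in this thread; only the SIOCGIFHWADDR ioctl
itself is taken from the patch.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>

#define MAC_LEN 6 /* same size as ETHER_ADDR_LEN */

/*
 * Hypothetical helper: ask the kernel for the hardware address of the
 * network interface called ifname.
 * Returns 0 on success, -1 on failure with errno set.
 */
int
netdev_get_mac(const char *ifname, uint8_t (*mac)[MAC_LEN])
{
	struct ifreq request;
	int fd;
	int ret;
	int err;

	assert(ifname != NULL);
	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd == -1)
		return -1;
	memset(&request, 0, sizeof(request));
	/* The memset() above guarantees NUL termination. */
	strncpy(request.ifr_name, ifname, IFNAMSIZ - 1);
	ret = ioctl(fd, SIOCGIFHWADDR, &request);
	if (ret == 0)
		memcpy(mac, request.ifr_hwaddr.sa_data, MAC_LEN);
	/* Preserve the ioctl() errno across close(). */
	err = errno;
	close(fd);
	errno = err;
	return (ret == 0) ? 0 : -1;
}
```

With a declaration such as uint8_t mac[MAC_LEN], calling
netdev_get_mac("eth0", &mac) should fill mac with the address the kernel
reports for that interface ("eth0" is only an example name). A missing
interface yields -1, which corresponds to the "is mlx4_en loaded?" error
path in the patch.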



[dpdk-dev] [PATCH 15/16] mlx4: fix support for multiple VLAN filters

2015-06-06 Thread Adrien Mazarguil
From: Olga Shern 

This commit fixes the "Multiple RX VLAN filters can be configured, but only
the first one works" bug. Since a single flow specification cannot contain
several VLAN definitions, the flows table is extended with MLX4_MAX_VLAN_IDS
possible specifications per configured MAC address.
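
The deletion side of this layout can be sketched with a toy model. The
toy_* names, the array sizes and the int "flow" cells are illustrative
assumptions, not driver code; the loop mirrors rxq_mac_addr_del() as it
appears in the diff below: one rule per enabled VLAN filter, or the rule in
slot 0 when no VLAN filter is enabled.

```c
#include <assert.h>

#define MAX_MACS 4  /* stand-in for MLX4_MAX_MAC_ADDRESSES */
#define MAX_VLANS 3 /* stand-in for MLX4_MAX_VLAN_IDS */

struct toy_rxq {
	int vlan_enabled[MAX_VLANS];   /* active VLAN filter slots */
	int flow[MAX_MACS][MAX_VLANS]; /* 1 when a steering rule exists */
};

/* Remove a single rule; it must exist. */
void
toy_del_flow(struct toy_rxq *rxq, unsigned int mac_index,
	     unsigned int vlan_index)
{
	assert(rxq->flow[mac_index][vlan_index] != 0);
	rxq->flow[mac_index][vlan_index] = 0;
}

/* Remove every rule attached to a MAC address, return how many. */
unsigned int
toy_mac_del(struct toy_rxq *rxq, unsigned int mac_index)
{
	unsigned int i;
	unsigned int vlans = 0;

	assert(mac_index < MAX_MACS);
	for (i = 0; i != MAX_VLANS; ++i) {
		if (!rxq->vlan_enabled[i])
			continue;
		toy_del_flow(rxq, mac_index, i);
		vlans++;
	}
	if (!vlans) {
		/* No VLAN filter: the single rule lives in slot 0. */
		toy_del_flow(rxq, mac_index, 0);
		return 1;
	}
	return vlans;
}
```

The point of the two-dimensional table is that each (MAC, VLAN) pair owns
its own rule handle, so rules can be created and destroyed independently.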

Signed-off-by: Olga Shern 
Signed-off-by: Or Ami 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/mlx4.c | 174 
 1 file changed, 115 insertions(+), 59 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 4c0294a..4c4f693 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -33,8 +33,6 @@

 /*
  * Known limitations:
- * - Multiple RX VLAN filters can be configured, but only the first one
- *   works properly.
  * - RSS hash key and options cannot be modified.
  * - Hardware counters aren't implemented.
  */
@@ -227,11 +225,10 @@ struct rxq {
/* Faster callbacks that bypass Verbs. */
drv_exp_poll_cq_func ibv_exp_poll_cq;
/*
-* There is exactly one flow configured per MAC address. Each flow
-* may contain several specifications, one per configured VLAN ID.
+* Each VLAN ID requires a separate flow steering rule.
 */
BITFIELD_DECLARE(mac_configured, uint32_t, MLX4_MAX_MAC_ADDRESSES);
-   struct mlx_flow *mac_flow[MLX4_MAX_MAC_ADDRESSES];
+   struct mlx_flow *mac_flow[MLX4_MAX_MAC_ADDRESSES][MLX4_MAX_VLAN_IDS];
struct mlx_flow *promisc_flow; /* Promiscuous flow. */
struct mlx_flow *allmulti_flow; /* Multicast flow. */
unsigned int port_id; /* Port ID for incoming packets. */
@@ -1880,15 +1877,17 @@ rxq_free_elts(struct rxq *rxq)
 }

 /**
- * Unregister a MAC address from a RX queue.
+ * Delete flow steering rule.
  *
  * @param rxq
  *   Pointer to RX queue structure.
  * @param mac_index
  *   MAC address index.
+ * @param vlan_index
+ *   VLAN index.
  */
 static void
-rxq_mac_addr_del(struct rxq *rxq, unsigned int mac_index)
+rxq_del_flow(struct rxq *rxq, unsigned int mac_index, unsigned int vlan_index)
 {
 #ifndef NDEBUG
struct priv *priv = rxq->priv;
@@ -1896,20 +1895,43 @@ rxq_mac_addr_del(struct rxq *rxq, unsigned int 
mac_index)
(const uint8_t (*)[ETHER_ADDR_LEN])
priv->mac[mac_index].addr_bytes;
 #endif
+   assert(rxq->mac_flow[mac_index][vlan_index] != NULL);
+   DEBUG("%p: removing MAC address %02x:%02x:%02x:%02x:%02x:%02x index %u"
+ " (VLAN ID %" PRIu16 ")",
+ (void *)rxq,
+ (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5],
+ mac_index, priv->vlan_filter[vlan_index].id);
+   claim_zero(mlx_destroy_flow(rxq->mac_flow[mac_index][vlan_index]));
+   rxq->mac_flow[mac_index][vlan_index] = NULL;
+}
+
+/**
+ * Unregister a MAC address from a RX queue.
+ *
+ * @param rxq
+ *   Pointer to RX queue structure.
+ * @param mac_index
+ *   MAC address index.
+ */
+static void
+rxq_mac_addr_del(struct rxq *rxq, unsigned int mac_index)
+{
+   struct priv *priv = rxq->priv;
+   unsigned int i;
+   unsigned int vlans = 0;

assert(mac_index < elemof(priv->mac));
-   if (!BITFIELD_ISSET(rxq->mac_configured, mac_index)) {
-   assert(rxq->mac_flow[mac_index] == NULL);
+   if (!BITFIELD_ISSET(rxq->mac_configured, mac_index))
return;
+   for (i = 0; (i != elemof(priv->vlan_filter)); ++i) {
+   if (!priv->vlan_filter[i].enabled)
+   continue;
+   rxq_del_flow(rxq, mac_index, i);
+   vlans++;
+   }
+   if (!vlans) {
+   rxq_del_flow(rxq, mac_index, 0);
}
-   DEBUG("%p: removing MAC address %02x:%02x:%02x:%02x:%02x:%02x"
- " index %u",
- (void *)rxq,
- (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5],
- mac_index);
-   assert(rxq->mac_flow[mac_index] != NULL);
-   claim_zero(mlx_destroy_flow(rxq->mac_flow[mac_index]));
-   rxq->mac_flow[mac_index] = NULL;
BITFIELD_RESET(rxq->mac_configured, mac_index);
 }

@@ -1933,47 +1955,37 @@ static int rxq_promiscuous_enable(struct rxq *);
 static void rxq_promiscuous_disable(struct rxq *);

 /**
- * Register a MAC address in a RX queue.
+ * Add single flow steering rule.
  *
  * @param rxq
  *   Pointer to RX queue structure.
  * @param mac_index
  *   MAC address index to register.
+ * @param vlan_index
+ *   VLAN index. Use -1 for a flow without VLAN.
  *
  * @return
  *   0 on success, errno value on failure.
  */
 static int
-rxq_mac_addr_add(struct rxq *rxq, unsigned int mac_index)
+rxq_add_flow(struct rxq *rxq, unsigned int mac_index, unsigned int vlan_index)
 {
+   struct mlx_flow *flow;
struct priv *priv = rxq->priv;
const uint8_t (*mac)[ETHER_ADDR_LEN] =
-   (const uint8_t 

[dpdk-dev] [PATCH 14/16] mlx4: remove provision for flow creation failure in DMFS A0 mode

2015-06-06 Thread Adrien Mazarguil
From: Or Ami 

Starting from MLNX_OFED 3.0 with FW 2.34.5000, QPs can be attached to the
port's MAC address when working in optimized steering mode (-7), so the
check is no longer needed.

Signed-off-by: Or Ami 
Signed-off-by: Olga Shern 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/mlx4.c | 20 
 1 file changed, 20 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index ab53c19..4c0294a 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -310,7 +310,6 @@ struct priv {
uint8_t port; /* Physical port number. */
unsigned int started:1; /* Device started, flows enabled. */
unsigned int promisc:1; /* Device in promiscuous mode. */
-   unsigned int promisc_ok:1; /* Promiscuous flow is supported. */
unsigned int allmulti:1; /* Device receives all multicast packets. */
unsigned int hw_qpg:1; /* QP groups are supported. */
unsigned int hw_tss:1; /* TSS is supported. */
@@ -2020,25 +2019,6 @@ rxq_mac_addr_add(struct rxq *rxq, unsigned int mac_index)
errno = 0;
flow = mlx_create_flow(rxq->qp, attr);
if (flow == NULL) {
-   int err = errno;
-
-   /* Flow creation failure is not fatal when in DMFS A0 mode.
-* Ignore error if promiscuity is already enabled or can be
-* enabled. */
-   if (priv->promisc_ok)
-   return 0;
-   if ((rxq->promisc_flow != NULL) ||
-   (rxq_promiscuous_enable(rxq) == 0)) {
-   if (rxq->promisc_flow != NULL)
-   rxq_promiscuous_disable(rxq);
-   WARN("cannot configure normal flow;"
-" if optimized steering is enabled"
-" (options mlx4_core log_num_mgm_entry_size=-7), "
-" please check RN and QSG for more information.");
-   priv->promisc_ok = 1;
-   return 0;
-   }
-   errno = err;
/* It's not clear whether errno is always set in this case. */
ERROR("%p: flow configuration failed, errno=%d: %s",
  (void *)rxq, errno,
-- 
2.1.0



[dpdk-dev] [PATCH 13/16] mlx4: fix error message for invalid number of descriptors

2015-06-06 Thread Adrien Mazarguil
From: Or Ami 

The number of descriptors must be a multiple of MLX4_PMD_SGE_WR_N.

Signed-off-by: Or Ami 
Signed-off-by: Olga Shern 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/mlx4.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 73663d2..ab53c19 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -1391,7 +1391,7 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, 
uint16_t desc,
(void)conf; /* Thresholds configuration (ignored). */
if ((desc == 0) || (desc % MLX4_PMD_SGE_WR_N)) {
ERROR("%p: invalid number of TX descriptors (must be a"
- " multiple of %d)", (void *)dev, desc);
+ " multiple of %d)", (void *)dev, MLX4_PMD_SGE_WR_N);
return EINVAL;
}
desc /= MLX4_PMD_SGE_WR_N;
@@ -3103,7 +3103,7 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, 
uint16_t desc,
}
if ((desc == 0) || (desc % MLX4_PMD_SGE_WR_N)) {
ERROR("%p: invalid number of RX descriptors (must be a"
- " multiple of %d)", (void *)dev, desc);
+ " multiple of %d)", (void *)dev, MLX4_PMD_SGE_WR_N);
return EINVAL;
}
/* Get mbuf length. */
-- 
2.1.0
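
The bug fixed above is easy to reproduce in isolation: the "%d" in the
message must receive the required multiple, not the rejected value. The
sketch below assumes a value of 4 for the multiple and uses the hypothetical
names SGE_WR_N and check_desc(); it is not the driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SGE_WR_N 4 /* assumed value, stand-in for MLX4_PMD_SGE_WR_N */

/*
 * Validate a descriptor count. On failure, write the error message to
 * msg and return EINVAL; on success, leave msg empty and return 0.
 */
int
check_desc(uint16_t desc, char *msg, size_t len)
{
	assert(msg != NULL && len > 0);
	msg[0] = '\0';
	if ((desc == 0) || (desc % SGE_WR_N)) {
		/* Print the required multiple, not the offending value. */
		snprintf(msg, len,
			 "invalid number of descriptors (must be a"
			 " multiple of %d)", SGE_WR_N);
		return EINVAL;
	}
	return 0;
}
```

Before the fix, a request for 6 descriptors would have been reported as
"must be a multiple of 6", which is misleading since 6 is the value being
rejected.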



[dpdk-dev] [PATCH 12/16] mlx4: add support for upstream flow steering API

2015-06-06 Thread Adrien Mazarguil
From: Alex Rosenbaum 

This commit makes librte_pmd_mlx4 support both the extended Verbs API from
upstream and the original experimental Verbs API.
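
The aliasing technique can be shown with a toy pair of back ends. Everything
named new_*, old_*, my_* and HAVE_NEW_FLOW_API below is hypothetical; in the
patch the same role is played by the mlx_* macros and by HAVE_STRUCT_IBV_FLOW,
which is generated at build time.

```c
#include <assert.h>

/* Two hypothetical APIs with the same shape but different names. */
struct new_flow { int id; };
struct old_flow { int id; };

int new_create_flow(struct new_flow *f, int id) { f->id = id; return 0; }
int old_create_flow(struct old_flow *f, int id) { f->id = -id; return 0; }

#ifdef HAVE_NEW_FLOW_API /* would come from a build-time probe */
#define my_flow new_flow
#define my_create_flow new_create_flow
#else
#define my_flow old_flow
#define my_create_flow old_create_flow
#endif

/* Driver code only ever spells the neutral my_* names. */
int
setup_flow(int id)
{
	struct my_flow flow;

	assert(id > 0);
	if (my_create_flow(&flow, id))
		return 0;
	return flow.id;
}
```

Because the selection happens in the preprocessor, the rest of the file
needs no further #ifdef blocks, which is what keeps the remainder of the
diff a mechanical rename.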

Signed-off-by: Olga Shern 
Signed-off-by: Alex Rosenbaum 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/Makefile |  4 +++
 drivers/net/mlx4/mlx4.c   | 82 +++
 2 files changed, 59 insertions(+), 27 deletions(-)

diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index ce1f2b0..2b3a1b6 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -116,6 +116,10 @@ mlx4_autoconf.h: $(RTE_SDK)/scripts/auto-config-h.sh
HAVE_EXP_QUERY_DEVICE \
infiniband/verbs.h \
type 'struct ibv_exp_device_attr' $(AUTOCONF_OUTPUT)
+   $Q sh -- '$<' '$@' \
+   HAVE_STRUCT_IBV_FLOW  \
+   infiniband/verbs.h \
+   type 'struct ibv_flow' $(AUTOCONF_OUTPUT)

 mlx4.o: mlx4_autoconf.h

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index f9faeb0..73663d2 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -97,6 +97,34 @@
 /* PMD header. */
 #include "mlx4.h"

+#ifdef HAVE_STRUCT_IBV_FLOW
+
+/* Use extended flow steering Verbs API from upstream. */
+#define MLX_FLOW_ATTR_NORMAL IBV_FLOW_ATTR_NORMAL
+#define MLX_FLOW_SPEC_ETH IBV_FLOW_SPEC_ETH
+#define MLX_FLOW_ATTR_MC_DEFAULT IBV_FLOW_ATTR_MC_DEFAULT
+#define MLX_FLOW_ATTR_ALL_DEFAULT IBV_FLOW_ATTR_ALL_DEFAULT
+#define mlx_flow ibv_flow
+#define mlx_flow_attr ibv_flow_attr
+#define mlx_flow_spec_eth ibv_flow_spec_eth
+#define mlx_create_flow ibv_create_flow
+#define mlx_destroy_flow ibv_destroy_flow
+
+#else /* HAVE_STRUCT_IBV_FLOW */
+
+/* Use experimental flow steering Verbs API. */
+#define MLX_FLOW_ATTR_NORMAL IBV_EXP_FLOW_ATTR_NORMAL
+#define MLX_FLOW_SPEC_ETH IBV_EXP_FLOW_SPEC_ETH
+#define MLX_FLOW_ATTR_MC_DEFAULT IBV_EXP_FLOW_ATTR_MC_DEFAULT
+#define MLX_FLOW_ATTR_ALL_DEFAULT IBV_EXP_FLOW_ATTR_ALL_DEFAULT
+#define mlx_flow ibv_exp_flow
+#define mlx_flow_attr ibv_exp_flow_attr
+#define mlx_flow_spec_eth ibv_exp_flow_spec_eth
+#define mlx_create_flow ibv_exp_create_flow
+#define mlx_destroy_flow ibv_exp_destroy_flow
+
+#endif /* HAVE_STRUCT_IBV_FLOW */
+
 /* Runtime logging through RTE_LOG() is enabled when not in debugging mode.
  * Intermediate LOG_*() macros add the required end-of-line characters. */
 #ifndef NDEBUG
@@ -203,9 +231,9 @@ struct rxq {
 * may contain several specifications, one per configured VLAN ID.
 */
BITFIELD_DECLARE(mac_configured, uint32_t, MLX4_MAX_MAC_ADDRESSES);
-   struct ibv_exp_flow *mac_flow[MLX4_MAX_MAC_ADDRESSES];
-   struct ibv_exp_flow *promisc_flow; /* Promiscuous flow. */
-   struct ibv_exp_flow *allmulti_flow; /* Multicast flow. */
+   struct mlx_flow *mac_flow[MLX4_MAX_MAC_ADDRESSES];
+   struct mlx_flow *promisc_flow; /* Promiscuous flow. */
+   struct mlx_flow *allmulti_flow; /* Multicast flow. */
unsigned int port_id; /* Port ID for incoming packets. */
unsigned int elts_n; /* (*elts)[] length. */
unsigned int elts_head; /* Current index in (*elts)[]. */
@@ -1881,7 +1909,7 @@ rxq_mac_addr_del(struct rxq *rxq, unsigned int mac_index)
  (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5],
  mac_index);
assert(rxq->mac_flow[mac_index] != NULL);
-   claim_zero(ibv_exp_destroy_flow(rxq->mac_flow[mac_index]));
+   claim_zero(mlx_destroy_flow(rxq->mac_flow[mac_index]));
rxq->mac_flow[mac_index] = NULL;
BITFIELD_RESET(rxq->mac_configured, mac_index);
 }
@@ -1926,7 +1954,7 @@ rxq_mac_addr_add(struct rxq *rxq, unsigned int mac_index)
unsigned int vlans = 0;
unsigned int specs = 0;
unsigned int i, j;
-   struct ibv_exp_flow *flow;
+   struct mlx_flow *flow;

assert(mac_index < elemof(priv->mac));
if (BITFIELD_ISSET(rxq->mac_configured, mac_index))
@@ -1938,28 +1966,28 @@ rxq_mac_addr_add(struct rxq *rxq, unsigned int 
mac_index)
specs = (vlans ? vlans : 1);

/* Allocate flow specification on the stack. */
-   struct ibv_exp_flow_attr data
+   struct mlx_flow_attr data
[1 +
-(sizeof(struct ibv_exp_flow_spec_eth[specs]) /
- sizeof(struct ibv_exp_flow_attr)) +
-!!(sizeof(struct ibv_exp_flow_spec_eth[specs]) %
-   sizeof(struct ibv_exp_flow_attr))];
-   struct ibv_exp_flow_attr *attr = (void *)&data[0];
-   struct ibv_exp_flow_spec_eth *spec = (void *)&data[1];
+(sizeof(struct mlx_flow_spec_eth[specs]) /
+ sizeof(struct mlx_flow_attr)) +
+!!(sizeof(struct mlx_flow_spec_eth[specs]) %
+   sizeof(struct mlx_flow_attr))];
+   struct mlx_flow_attr *attr = (void *)&data[0];
+   struct mlx_flow_spec_eth *spec = (void *)&data[1];

/*

[dpdk-dev] [PATCH 11/16] mlx4: improve accuracy of link status information

2015-06-06 Thread Adrien Mazarguil
From: Olga Shern 

Query interface properties using the ethtool API instead of Verbs
through ibv_query_port(). The returned information is more accurate for
Ethernet links since several link speeds cannot be mapped to Verbs
semantics.

Signed-off-by: Olga Shern 
Signed-off-by: Alex Rosenbaum 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/mlx4.c | 44 +---
 1 file changed, 25 insertions(+), 19 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index cc16e8c..f9faeb0 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -264,7 +264,6 @@ struct priv {
struct rte_eth_dev *dev; /* Ethernet device. */
struct ibv_context *ctx; /* Verbs context. */
struct ibv_device_attr device_attr; /* Device properties. */
-   struct ibv_port_attr port_attr; /* Physical port properties. */
struct ibv_pd *pd; /* Protection Domain. */
/*
 * MAC addresses array and configuration bit-field.
@@ -3912,29 +3911,37 @@ static int
 mlx4_link_update_unlocked(struct rte_eth_dev *dev, int wait_to_complete)
 {
struct priv *priv = dev->data->dev_private;
-   struct ibv_port_attr port_attr;
-   static const uint8_t width_mult[] = {
-   /* Multiplier values taken from devinfo.c in libibverbs. */
-   0, 1, 4, 0, 8, 0, 0, 0, 12, 0
+   struct ethtool_cmd edata = {
+   .cmd = ETHTOOL_GSET
};
+   struct ifreq ifr;
+   struct rte_eth_link dev_link;
+   int link_speed = 0;

(void)wait_to_complete;
-   errno = ibv_query_port(priv->ctx, priv->port, &port_attr);
-   if (errno) {
-   WARN("port query failed: %s", strerror(errno));
+   if (priv_ifreq(priv, SIOCGIFFLAGS, &ifr)) {
+   WARN("ioctl(SIOCGIFFLAGS) failed: %s", strerror(errno));
return -1;
}
-   dev->data->dev_link = (struct rte_eth_link){
-   .link_speed = (ibv_rate_to_mbps(mult_to_ibv_rate
-   (port_attr.active_speed)) *
-  width_mult[(port_attr.active_width %
-  sizeof(width_mult))]),
-   .link_duplex = ETH_LINK_FULL_DUPLEX,
-   .link_status = (port_attr.state == IBV_PORT_ACTIVE)
-   };
-   if (memcmp(&port_attr, &priv->port_attr, sizeof(port_attr))) {
+   memset(&dev_link, 0, sizeof(dev_link));
+   dev_link.link_status = ((ifr.ifr_flags & IFF_UP) &&
+   (ifr.ifr_flags & IFF_RUNNING));
+   ifr.ifr_data = &edata;
+   if (priv_ifreq(priv, SIOCETHTOOL, &ifr)) {
+   WARN("ioctl(SIOCETHTOOL, ETHTOOL_GSET) failed: %s",
+strerror(errno));
+   return -1;
+   }
+   link_speed = ethtool_cmd_speed(&edata);
+   if (link_speed == -1)
+   dev_link.link_speed = 0;
+   else
+   dev_link.link_speed = link_speed;
+   dev_link.link_duplex = ((edata.duplex == DUPLEX_HALF) ?
+   ETH_LINK_HALF_DUPLEX : ETH_LINK_FULL_DUPLEX);
+   if (memcmp(&dev_link, &dev->data->dev_link, sizeof(dev_link))) {
/* Link status changed. */
-   priv->port_attr = port_attr;
+   dev->data->dev_link = dev_link;
return 0;
}
/* Link status is still the same. */
@@ -4581,7 +4588,6 @@ mlx4_pci_devinit(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)

priv->ctx = ctx;
priv->device_attr = device_attr;
-   priv->port_attr = port_attr;
priv->port = port;
priv->pd = pd;
priv->mtu = ETHER_MTU;
-- 
2.1.0
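
A rough, self-contained sketch of the ethtool-based query, assuming a Linux
host with the kernel uapi headers installed. struct link_info,
link_from_ethtool() and query_link() are hypothetical names; the ioctl
requests (SIOCGIFFLAGS, SIOCETHTOOL with ETHTOOL_GSET) and the way status,
speed and duplex are derived follow the patch above.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

struct link_info {
	uint32_t speed_mbps; /* 0 when the speed is unknown */
	int full_duplex;
	int up;
};

/* Translate raw interface flags and ethtool fields. */
void
link_from_ethtool(short flags, uint32_t speed, uint8_t duplex,
		  struct link_info *out)
{
	out->up = ((flags & IFF_UP) && (flags & IFF_RUNNING));
	/* ethtool reports an unknown speed as (uint32_t)-1. */
	out->speed_mbps = (speed == (uint32_t)-1) ? 0 : speed;
	out->full_duplex = (duplex != DUPLEX_HALF);
}

/* Returns 0 on success, -1 on failure with errno set by the ioctl. */
int
query_link(const char *ifname, struct link_info *out)
{
	struct ethtool_cmd edata = { .cmd = ETHTOOL_GSET };
	struct ifreq ifr;
	short flags;
	int ret = -1;
	int fd;

	assert(ifname != NULL && out != NULL);
	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd == -1)
		return -1;
	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
	if (ioctl(fd, SIOCGIFFLAGS, &ifr) == -1)
		goto done;
	/* ifr_flags and ifr_data share a union: save the flags first. */
	flags = ifr.ifr_flags;
	ifr.ifr_data = (void *)&edata;
	if (ioctl(fd, SIOCETHTOOL, &ifr) == -1)
		goto done;
	link_from_ethtool(flags, ethtool_cmd_speed(&edata), edata.duplex,
			  out);
	ret = 0;
done:
	close(fd);
	return ret;
}
```

Splitting the translation from the ioctl calls is only a convenience for
testing; the patch does both inline in mlx4_link_update_unlocked().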



[dpdk-dev] [PATCH 10/16] mlx4: allow applications to use fork() safely

2015-06-06 Thread Adrien Mazarguil
From: Olga Shern 

Signed-off-by: Olga Shern 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/mlx4.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index f7186fa..cc16e8c 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -4793,6 +4793,13 @@ rte_mlx4_pmd_init(const char *name, const char *args)
 {
(void)name;
(void)args;
+   /*
+* RDMAV_HUGEPAGES_SAFE tells ibv_fork_init() we intend to use
+* huge pages. Calling ibv_fork_init() during init allows
+* applications to use fork() safely.
+*/
+   setenv("RDMAV_HUGEPAGES_SAFE", "1", 1);
+   ibv_fork_init();
rte_eal_pci_register(&mlx4_driver.pci_drv);
return 0;
 }
-- 
2.1.0



[dpdk-dev] [PATCH 09/16] mlx4: merge RX queue setup functions

2015-06-06 Thread Adrien Mazarguil
From: Alex Rosenbaum 

Make rxq_setup_qp() handle inline support like rxq_setup_qp_rss() instead of
having two separate functions.

Signed-off-by: Alex Rosenbaum 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/mlx4.c | 62 -
 1 file changed, 15 insertions(+), 47 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 080602e..f7186fa 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -2708,9 +2708,16 @@ repost:
 }

 #ifdef INLINE_RECV
+typedef struct ibv_exp_qp_init_attr mlx4_qp_init_attr_t;
+#define mlx4_create_qp ibv_exp_create_qp
+#else /* INLINE_RECV */
+typedef struct ibv_qp_init_attr mlx4_qp_init_attr_t;
+#define mlx4_create_qp ibv_create_qp
+#endif /* INLINE_RECV */

 /**
- * Allocate a Queue Pair in case inline receive is supported.
+ * Allocate a Queue Pair.
+ * Optionally setup inline receive if supported.
  *
  * @param priv
  *   Pointer to private structure.
@@ -2725,12 +2732,11 @@ repost:
 static struct ibv_qp *
 rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc)
 {
-   struct ibv_exp_qp_init_attr attr = {
+   mlx4_qp_init_attr_t attr = {
/* CQ to be associated with the send queue. */
.send_cq = cq,
/* CQ to be associated with the receive queue. */
.recv_cq = cq,
-   .max_inl_recv = priv->inl_recv_size,
.cap = {
/* Max number of outstanding WRs. */
.max_recv_wr = ((priv->device_attr.max_qp_wr < desc) ?
@@ -2743,61 +2749,23 @@ rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, 
uint16_t desc)
 MLX4_PMD_SGE_WR_N),
},
.qp_type = IBV_QPT_RAW_PACKET,
-   .pd = priv->pd
};

+#ifdef INLINE_RECV
+   attr.max_inl_recv = priv->inl_recv_size;
+   attr.pd = priv->pd;
attr.comp_mask = IBV_EXP_QP_INIT_ATTR_PD;
attr.comp_mask |= IBV_EXP_QP_INIT_ATTR_INL_RECV;
+#endif

-   return ibv_exp_create_qp(priv->ctx, &attr);
-}
-
-#else /* INLINE_RECV */
-
-/**
- * Allocate a Queue Pair.
- *
- * @param priv
- *   Pointer to private structure.
- * @param cq
- *   Completion queue to associate with QP.
- * @param desc
- *   Number of descriptors in QP (hint only).
- *
- * @return
- *   QP pointer or NULL in case of error.
- */
-static struct ibv_qp *
-rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc)
-{
-   struct ibv_qp_init_attr attr = {
-   /* CQ to be associated with the send queue. */
-   .send_cq = cq,
-   /* CQ to be associated with the receive queue. */
-   .recv_cq = cq,
-   .cap = {
-   /* Max number of outstanding WRs. */
-   .max_recv_wr = ((priv->device_attr.max_qp_wr < desc) ?
-   priv->device_attr.max_qp_wr :
-   desc),
-   /* Max number of scatter/gather elements in a WR. */
-   .max_recv_sge = ((priv->device_attr.max_sge <
- MLX4_PMD_SGE_WR_N) ?
-priv->device_attr.max_sge :
-MLX4_PMD_SGE_WR_N),
-   },
-   .qp_type = IBV_QPT_RAW_PACKET
-   };
-
-   return ibv_create_qp(priv->pd, &attr);
+   return mlx4_create_qp(priv->ctx, &attr);
 }

-#endif /* INLINE_RECV */
-
 #ifdef RSS_SUPPORT

 /**
  * Allocate a RSS Queue Pair.
+ * Optionally setup inline receive if supported.
  *
  * @param priv
  *   Pointer to private structure.
-- 
2.1.0



[dpdk-dev] [PATCH 08/16] mlx4: avoid looking up WR ID to improve RX performance

2015-06-06 Thread Adrien Mazarguil
From: Alex Rosenbaum 

This is done by storing the current index in the RX queue structure.
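
A toy model of that index, under the assumption (which the patch's asserts
also rely on) that completions arrive in the order the work requests were
posted. struct toy_rxq and toy_rx_burst() are illustrative names only; the
wrap-around step is the one added at the "repost" label in the diff below.

```c
#include <assert.h>

struct toy_rxq {
	unsigned int elts_n;    /* number of RX elements in the ring */
	unsigned int elts_head; /* next element expected to complete */
};

/*
 * Consume a number of in-order completions.
 * Returns the index of the first element that was processed.
 */
unsigned int
toy_rx_burst(struct toy_rxq *rxq, unsigned int completions)
{
	const unsigned int elts_n = rxq->elts_n;
	unsigned int elts_head = rxq->elts_head;
	const unsigned int first = elts_head;
	unsigned int i;

	assert(elts_head < elts_n);
	for (i = 0; i != completions; ++i) {
		/* The element is elts[elts_head]; no wr_id decoding. */
		if (++elts_head >= elts_n)
			elts_head = 0;
	}
	/* Written back once per burst, not once per packet. */
	rxq->elts_head = elts_head;
	return first;
}
```

Keeping the index in a local variable during the burst and storing it once
at the end matches the structure of the patch.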

Signed-off-by: Alex Rosenbaum 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/mlx4.c | 24 ++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 061f5e6..080602e 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -208,6 +208,7 @@ struct rxq {
struct ibv_exp_flow *allmulti_flow; /* Multicast flow. */
unsigned int port_id; /* Port ID for incoming packets. */
unsigned int elts_n; /* (*elts)[] length. */
+   unsigned int elts_head; /* Current index in (*elts)[]. */
union {
struct rxq_elt_sp (*sp)[]; /* Scattered RX elements. */
struct rxq_elt (*no_sp)[]; /* RX elements. */
@@ -1651,6 +1652,7 @@ rxq_alloc_elts_sp(struct rxq *rxq, unsigned int elts_n,
DEBUG("%p: allocated and configured %u WRs (%zu segments)",
  (void *)rxq, elts_n, (elts_n * elemof((*elts)[0].sges)));
rxq->elts_n = elts_n;
+   rxq->elts_head = 0;
rxq->elts.sp = elts;
assert(ret == 0);
return 0;
@@ -1795,6 +1797,7 @@ rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n, 
struct rte_mbuf **pool)
DEBUG("%p: allocated and configured %u single-segment WRs",
  (void *)rxq, elts_n);
rxq->elts_n = elts_n;
+   rxq->elts_head = 0;
rxq->elts.no_sp = elts;
assert(ret == 0);
return 0;
@@ -2376,6 +2379,8 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
 {
struct rxq *rxq = (struct rxq *)dpdk_rxq;
struct rxq_elt_sp (*elts)[rxq->elts_n] = rxq->elts.sp;
+   const unsigned int elts_n = rxq->elts_n;
+   unsigned int elts_head = rxq->elts_head;
struct ibv_exp_wc wcs[pkts_n];
struct ibv_recv_wr head;
struct ibv_recv_wr **next = &head.next;
@@ -2402,7 +2407,7 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
struct ibv_exp_wc *wc = &wcs[i];
uint64_t wr_id = wc->wr_id;
uint32_t len = wc->byte_len;
-   struct rxq_elt_sp *elt = &(*elts)[wr_id];
+   struct rxq_elt_sp *elt = &(*elts)[elts_head];
struct ibv_recv_wr *wr = &elt->wr;
struct rte_mbuf *pkt_buf = NULL; /* Buffer returned in pkts. */
struct rte_mbuf **pkt_buf_next = &pkt_buf;
@@ -2410,10 +2415,15 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf 
**pkts, uint16_t pkts_n)
unsigned int j = 0;

/* Sanity checks. */
+#ifdef NDEBUG
+   (void)wr_id;
+#endif
assert(wr_id < rxq->elts_n);
assert(wr_id == wr->wr_id);
assert(wr->sg_list == elt->sges);
assert(wr->num_sge == elemof(elt->sges));
+   assert(elts_head < rxq->elts_n);
+   assert(rxq->elts_head < rxq->elts_n);
/* Link completed WRs together for repost. */
*next = wr;
next = &wr->next;
@@ -2522,6 +2532,8 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
rxq->stats.ibytes += wc->byte_len;
 #endif
 repost:
+   if (++elts_head >= elts_n)
+   elts_head = 0;
continue;
}
*next = NULL;
@@ -2539,6 +2551,7 @@ repost:
  strerror(i));
abort();
}
+   rxq->elts_head = elts_head;
 #ifdef MLX4_PMD_SOFT_COUNTERS
/* Increase packets counter. */
rxq->stats.ipackets += ret;
@@ -2568,6 +2581,8 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
 {
struct rxq *rxq = (struct rxq *)dpdk_rxq;
struct rxq_elt (*elts)[rxq->elts_n] = rxq->elts.no_sp;
+   const unsigned int elts_n = rxq->elts_n;
+   unsigned int elts_head = rxq->elts_head;
struct ibv_exp_wc wcs[pkts_n];
struct ibv_recv_wr head;
struct ibv_recv_wr **next = &head.next;
@@ -2592,7 +2607,7 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
struct ibv_exp_wc *wc = &wcs[i];
uint64_t wr_id = wc->wr_id;
uint32_t len = wc->byte_len;
-   struct rxq_elt *elt = &(*elts)[WR_ID(wr_id).id];
+   struct rxq_elt *elt = &(*elts)[elts_head];
struct ibv_recv_wr *wr = &elt->wr;
struct rte_mbuf *seg =
(void *)(elt->sge.addr - WR_ID(wr_id).offset);
@@ -2603,6 +2618,8 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
assert(wr_id == wr->wr_id);
assert(wr->sg_list == &elt->sge);
assert(wr->num_sge == 1);
+   assert(elts_head < rxq->elts_n);
+   assert(rxq->elts_head < rxq->elts_n);
/* Link completed WRs 

[dpdk-dev] [PATCH 07/16] mlx4: update optimized steering warning message

2015-06-06 Thread Adrien Mazarguil
From: Olga Shern 

This feature is now also supported in VMs.

Signed-off-by: Olga Shern 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/mlx4.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 1b86e58..061f5e6 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -2001,10 +2001,10 @@ rxq_mac_addr_add(struct rxq *rxq, unsigned int 
mac_index)
(rxq_promiscuous_enable(rxq) == 0)) {
if (rxq->promisc_flow != NULL)
rxq_promiscuous_disable(rxq);
-   WARN("cannot configure normal flow but promiscuous"
-" mode is fine, assuming promiscuous optimization"
-" is enabled"
-" (options mlx4_core log_num_mgm_entry_size=-7)");
+   WARN("cannot configure normal flow;"
+" if optimized steering is enabled"
+" (options mlx4_core log_num_mgm_entry_size=-7), "
+" please check RN and QSG for more information.");
priv->promisc_ok = 1;
return 0;
}
-- 
2.1.0



[dpdk-dev] [PATCH 06/16] mlx4: use faster CQ polling function

2015-06-06 Thread Adrien Mazarguil
From: Alex Rosenbaum 

Replace ibv_exp_poll_cq() with direct function call to improve performance.

Signed-off-by: Alex Rosenbaum 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/mlx4.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 3210120..1b86e58 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -196,6 +196,8 @@ struct rxq {
struct ibv_mr *mr; /* Memory Region (for mp). */
struct ibv_cq *cq; /* Completion Queue. */
struct ibv_qp *qp; /* Queue Pair. */
+   /* Faster callbacks that bypass Verbs. */
+   drv_exp_poll_cq_func ibv_exp_poll_cq;
/*
 * There is exactly one flow configured per MAC address. Each flow
 * may contain several specifications, one per configured VLAN ID.
@@ -2386,7 +2388,7 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
return mlx4_rx_burst(dpdk_rxq, pkts, pkts_n);
if (unlikely(elts == NULL)) /* See RTE_DEV_CMD_SET_MTU. */
return 0;
-   wcs_n = ibv_exp_poll_cq(rxq->cq, pkts_n, wcs, sizeof(wcs[0]));
+   wcs_n = rxq->ibv_exp_poll_cq(rxq->cq, pkts_n, wcs, sizeof(wcs[0]));
if (unlikely(wcs_n == 0))
return 0;
if (unlikely(wcs_n < 0)) {
@@ -2576,7 +2578,7 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, 
uint16_t pkts_n)

if (unlikely(rxq->sp))
return mlx4_rx_burst_sp(dpdk_rxq, pkts, pkts_n);
-   wcs_n = ibv_exp_poll_cq(rxq->cq, pkts_n, wcs, sizeof(wcs[0]));
+   wcs_n = rxq->ibv_exp_poll_cq(rxq->cq, pkts_n, wcs, sizeof(wcs[0]));
if (unlikely(wcs_n == 0))
return 0;
if (unlikely(wcs_n < 0)) {
@@ -3213,6 +3215,13 @@ skip_alloc:
/* Save port ID. */
tmpl.port_id = dev->data->port_id;
DEBUG("%p: RTE port ID: %u", (void *)rxq, tmpl.port_id);
+   tmpl.ibv_exp_poll_cq = (drv_exp_poll_cq_func)(uintptr_t)
+   ibv_exp_get_provider_func(tmpl.cq->context,
+ IBV_EXP_POLL_CQ_FUNC);
+   if (tmpl.ibv_exp_poll_cq == NULL) {
+   ERROR("%p: cannot retrieve IBV_EXP_POLL_CQ_FUNC", (void *)dev);
+   goto error;
+   }
/* Clean up rxq in case we're reinitializing it. */
DEBUG("%p: cleaning-up old rxq just in case", (void *)rxq);
rxq_cleanup(rxq);
-- 
2.1.0
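
The pattern used above (resolve a provider function once at queue setup,
then call it through a cached pointer on the fast path) can be sketched
without Verbs. All names below are hypothetical; get_provider_func() merely
stands in for ibv_exp_get_provider_func(), whose real signature is not
reproduced here.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*generic_func)(void);    /* opaque function pointer */
typedef int (*poll_func)(int budget);  /* hypothetical fast-path type */

enum provider_func { POLL_CQ_FUNC, UNKNOWN_FUNC };

/* Pretend provider implementation: "polls" budget completions. */
int fast_poll(int budget) { return budget; }

/* Hypothetical lookup: returns NULL when the function is missing. */
generic_func
get_provider_func(enum provider_func which)
{
	return (which == POLL_CQ_FUNC) ? (generic_func)fast_poll : NULL;
}

struct toy_rxq {
	poll_func poll_cq; /* cached at setup time */
};

/* Returns 0 on success, -1 when the provider lacks the function. */
int
toy_rxq_setup(struct toy_rxq *rxq, enum provider_func which)
{
	rxq->poll_cq = (poll_func)get_provider_func(which);
	return (rxq->poll_cq == NULL) ? -1 : 0;
}

/* Fast path: one indirect call, no lookup. */
int
toy_rx_burst(struct toy_rxq *rxq, int budget)
{
	assert(rxq->poll_cq != NULL);
	return rxq->poll_cq(budget);
}
```

Failing setup when the lookup returns NULL, as the patch does, keeps the
NULL check out of the per-burst path.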



[dpdk-dev] [PATCH 05/16] mlx4: add L2 tunnel (VXLAN) RX checksum offload support

2015-06-06 Thread Adrien Mazarguil
Depending on adapter features and VXLAN support in the kernel, VXLAN frames
can be automatically recognized, in which case checksum validation occurs on
inner and outer L3 and L4.

Signed-off-by: Adrien Mazarguil 
Acked-by: Guillaume Gaudonville 
---
 drivers/net/mlx4/mlx4.c | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index cec894f..3210120 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -212,6 +212,7 @@ struct rxq {
} elts;
unsigned int sp:1; /* Use scattered RX elements. */
unsigned int csum:1; /* Enable checksum offloading. */
+   unsigned int csum_l2tun:1; /* Same for L2 tunnels. */
uint32_t mb_len; /* Length of a mp-issued mbuf. */
struct mlx4_rxq_stats stats; /* RX queue counters. */
unsigned int socket; /* CPU socket ID for allocations. */
@@ -285,6 +286,7 @@ struct priv {
unsigned int hw_tss:1; /* TSS is supported. */
unsigned int hw_rss:1; /* RSS is supported. */
unsigned int hw_csum:1; /* Checksum offload is supported. */
+   unsigned int hw_csum_l2tun:1; /* Same for L2 tunnels. */
unsigned int rss:1; /* RSS is enabled. */
unsigned int vf:1; /* This is a VF device. */
 #ifdef INLINE_RECV
@@ -2329,6 +2331,25 @@ rxq_wc_to_ol_flags(const struct rxq *rxq, uint64_t 
exp_wc_flags)
  IBV_EXP_L3_RX_CSUM_OK, PKT_RX_IP_CKSUM_BAD) |
TRANSPOSE(~exp_wc_flags,
  IBV_EXP_L4_RX_CSUM_OK, PKT_RX_L4_CKSUM_BAD);
+   /*
+* PKT_RX_IP_CKSUM_BAD and PKT_RX_L4_CKSUM_BAD are used in place
+* of PKT_RX_EIP_CKSUM_BAD because the latter is not functional
+* (its value is 0).
+*/
+   if ((exp_wc_flags & IBV_EXP_L2_TUNNEL_PACKET) && (rxq->csum_l2tun))
+   ol_flags |=
+   TRANSPOSE(exp_wc_flags,
+ IBV_EXP_L2_TUNNEL_IPV4_PACKET,
+ PKT_RX_TUNNEL_IPV4_HDR) |
+   TRANSPOSE(exp_wc_flags,
+ IBV_EXP_L2_TUNNEL_IPV6_PACKET,
+ PKT_RX_TUNNEL_IPV6_HDR) |
+   TRANSPOSE(~exp_wc_flags,
+ IBV_EXP_L2_TUNNEL_L3_RX_CSUM_OK,
+ PKT_RX_IP_CKSUM_BAD) |
+   TRANSPOSE(~exp_wc_flags,
+ IBV_EXP_L2_TUNNEL_L4_RX_CSUM_OK,
+ PKT_RX_L4_CKSUM_BAD);
return ol_flags;
 }

@@ -2859,6 +2880,10 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
tmpl.csum = !!dev->data->dev_conf.rxmode.hw_ip_checksum;
rxq->csum = tmpl.csum;
}
+   if (priv->hw_csum_l2tun) {
+   tmpl.csum_l2tun = !!dev->data->dev_conf.rxmode.hw_ip_checksum;
+   rxq->csum_l2tun = tmpl.csum_l2tun;
+   }
/* Enable scattered packets support for this queue if necessary. */
if ((dev->data->dev_conf.rxmode.jumbo_frame) &&
(dev->data->dev_conf.rxmode.max_rx_pkt_len >
@@ -3078,6 +3103,8 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, 
uint16_t desc,
/* Toggle RX checksum offload if hardware supports it. */
if (priv->hw_csum)
tmpl.csum = !!dev->data->dev_conf.rxmode.hw_ip_checksum;
+   if (priv->hw_csum_l2tun)
+   tmpl.csum_l2tun = !!dev->data->dev_conf.rxmode.hw_ip_checksum;
/* Enable scattered packets support for this queue if necessary. */
if ((dev->data->dev_conf.rxmode.jumbo_frame) &&
(dev->data->dev_conf.rxmode.max_rx_pkt_len >
@@ -4270,6 +4297,8 @@ static const struct eth_dev_ops mlx4_dev_ops = {
.mac_addr_remove = mlx4_mac_addr_remove,
.mac_addr_add = mlx4_mac_addr_add,
.mtu_set = mlx4_dev_set_mtu,
+   .udp_tunnel_add = NULL,
+   .udp_tunnel_del = NULL,
.fdir_add_signature_filter = NULL,
.fdir_update_signature_filter = NULL,
.fdir_remove_signature_filter = NULL,
@@ -4599,6 +4628,11 @@ mlx4_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
DEBUG("checksum offloading is %ssupported",
  (priv->hw_csum ? "" : "not "));

+   priv->hw_csum_l2tun = !!(exp_device_attr.exp_device_cap_flags &
+IBV_EXP_DEVICE_L2_TUNNEL_OFFLOADS);
+   DEBUG("L2 tunnel checksum offloads are %ssupported",
+ (priv->hw_csum_l2tun ? "" : "not "));
+
 #ifdef INLINE_RECV
priv->inl_recv_size = mlx4_getenv_int("MLX4_INLINE_RECV_SIZE");

-- 
2.1.0



[dpdk-dev] [PATCH 04/16] mlx4: add L3 and L4 RX checksum offload support

2015-06-06 Thread Adrien Mazarguil
From: Gilad Berman 

Mellanox ConnectX-3 adapters can handle L3 (IPv4) and L4 (TCP, UDP, TCP6,
UDP6) RX checksum validation, with and without 802.1Q (VLAN) headers.

Signed-off-by: Gilad Berman 
Signed-off-by: Adrien Mazarguil 
Acked-by: Guillaume Gaudonville 
---
 drivers/net/mlx4/mlx4.c | 63 +++--
 1 file changed, 61 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index e32e433..cec894f 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -141,6 +141,12 @@ static inline void wr_id_t_check(void)
(void)wr_id_t_check;
 }

+/* Transpose flags. Useful to convert IBV to DPDK flags. */
+#define TRANSPOSE(val, from, to) \
+   (((from) >= (to)) ? \
+(((val) & (from)) / ((from) / (to))) : \
+(((val) & (from)) * ((to) / (from))))
+
 /* If raw send operations are available, use them since they are faster. */
 #ifdef SEND_RAW_WR_SUPPORT
 typedef struct ibv_send_wr_raw mlx4_send_wr_t;
@@ -205,6 +211,7 @@ struct rxq {
struct rxq_elt (*no_sp)[]; /* RX elements. */
} elts;
unsigned int sp:1; /* Use scattered RX elements. */
+   unsigned int csum:1; /* Enable checksum offloading. */
uint32_t mb_len; /* Length of a mp-issued mbuf. */
struct mlx4_rxq_stats stats; /* RX queue counters. */
unsigned int socket; /* CPU socket ID for allocations. */
@@ -277,6 +284,7 @@ struct priv {
unsigned int hw_qpg:1; /* QP groups are supported. */
unsigned int hw_tss:1; /* TSS is supported. */
unsigned int hw_rss:1; /* RSS is supported. */
+   unsigned int hw_csum:1; /* Checksum offload is supported. */
unsigned int rss:1; /* RSS is enabled. */
unsigned int vf:1; /* This is a VF device. */
 #ifdef INLINE_RECV
@@ -2296,6 +2304,34 @@ rxq_cleanup(struct rxq *rxq)
memset(rxq, 0, sizeof(*rxq));
 }

+/**
+ * Translate RX work completion flags to offload flags.
+ *
+ * @param[in] rxq
+ *   Pointer to RX queue structure.
+ * @param exp_wc_flags
+ *   RX flags from struct ibv_exp_wc.
+ *
+ * @return
+ *   Offload flags (ol_flags) for struct rte_mbuf.
+ */
+static inline uint32_t
+rxq_wc_to_ol_flags(const struct rxq *rxq, uint64_t exp_wc_flags)
+{
+   uint32_t ol_flags;
+
+   ol_flags =
+   TRANSPOSE(exp_wc_flags, IBV_EXP_IPV4_PACKET, PKT_RX_IPV4_HDR) |
+   TRANSPOSE(exp_wc_flags, IBV_EXP_IPV6_PACKET, PKT_RX_IPV6_HDR);
+   if (rxq->csum)
+   ol_flags |=
+   TRANSPOSE(~exp_wc_flags,
+ IBV_EXP_L3_RX_CSUM_OK, PKT_RX_IP_CKSUM_BAD) |
+   TRANSPOSE(~exp_wc_flags,
+ IBV_EXP_L4_RX_CSUM_OK, PKT_RX_L4_CKSUM_BAD);
+   return ol_flags;
+}
+
 static uint16_t
 mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n);

@@ -2453,7 +2489,7 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
NB_SEGS(pkt_buf) = j;
PORT(pkt_buf) = rxq->port_id;
PKT_LEN(pkt_buf) = wc->byte_len;
-   pkt_buf->ol_flags = 0;
+   pkt_buf->ol_flags = rxq_wc_to_ol_flags(rxq, wc->exp_wc_flags);

/* Return packet. */
*(pkts++) = pkt_buf;
@@ -2594,7 +2630,7 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
NEXT(seg) = NULL;
PKT_LEN(seg) = len;
DATA_LEN(seg) = len;
-   seg->ol_flags = 0;
+   seg->ol_flags = rxq_wc_to_ol_flags(rxq, wc->exp_wc_flags);

/* Return packet. */
*(pkts++) = seg;
@@ -2818,6 +2854,11 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
/* Number of descriptors and mbufs currently allocated. */
desc_n = (tmpl.elts_n * (tmpl.sp ? MLX4_PMD_SGE_WR_N : 1));
mbuf_n = desc_n;
+   /* Toggle RX checksum offload if hardware supports it. */
+   if (priv->hw_csum) {
+   tmpl.csum = !!dev->data->dev_conf.rxmode.hw_ip_checksum;
+   rxq->csum = tmpl.csum;
+   }
/* Enable scattered packets support for this queue if necessary. */
if ((dev->data->dev_conf.rxmode.jumbo_frame) &&
(dev->data->dev_conf.rxmode.max_rx_pkt_len >
@@ -3034,6 +3075,9 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
rte_pktmbuf_tailroom(buf)) == tmpl.mb_len);
assert(rte_pktmbuf_headroom(buf) == RTE_PKTMBUF_HEADROOM);
rte_pktmbuf_free(buf);
+   /* Toggle RX checksum offload if hardware supports it. */
+   if (priv->hw_csum)
+   tmpl.csum = !!dev->data->dev_conf.rxmode.hw_ip_checksum;
/* Enable scattered packets support for this queue if necessary. */
if ((dev->data->dev_conf.rxmode.jumbo_frame) &&
(dev->data->dev_conf.rxmode.max_rx_pkt_len >
@@ 
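
The TRANSPOSE() macro introduced by this patch can be tried in isolation. The following is only a sketch: the DEMO_* flag values and the demo_wc_to_ol_flags() name are hypothetical and chosen for illustration, they are not the real IBV_EXP_* or PKT_RX_* constants. It mirrors the shape of rxq_wc_to_ol_flags(): a "packet type" bit is moved directly, while the "checksum OK" bits are inverted first so that a missing OK bit becomes a "checksum bad" flag.

```c
#include <stdint.h>

/*
 * Transpose flags, as in the patch: move the single bit selected by
 * `from` to the position of the single bit `to`. Both must be powers
 * of two for the division/multiplication to act as a shift.
 */
#define TRANSPOSE(val, from, to) \
	(((from) >= (to)) ? \
	 (((val) & (from)) / ((from) / (to))) : \
	 (((val) & (from)) * ((to) / (from))))

/* Hypothetical work completion flags (illustration only). */
#define DEMO_WC_L3_CSUM_OK   (UINT64_C(1) << 2)
#define DEMO_WC_L4_CSUM_OK   (UINT64_C(1) << 3)
#define DEMO_WC_IPV4_PACKET  (UINT64_C(1) << 6)

/* Hypothetical mbuf offload flags (illustration only). */
#define DEMO_OL_IPV4_HDR     (UINT64_C(1) << 1)
#define DEMO_OL_L4_CKSUM_BAD (UINT64_C(1) << 4)
#define DEMO_OL_IP_CKSUM_BAD (UINT64_C(1) << 8)

/* Translate demo completion flags to demo offload flags. */
uint32_t
demo_wc_to_ol_flags(uint64_t wc_flags, int csum_enabled)
{
	uint32_t ol_flags;

	/* Packet type: bit 6 is moved down to bit 1. */
	ol_flags = (uint32_t)TRANSPOSE(wc_flags, DEMO_WC_IPV4_PACKET,
				       DEMO_OL_IPV4_HDR);
	/* Checksums: an absent "OK" bit becomes a "BAD" bit. */
	if (csum_enabled)
		ol_flags |= (uint32_t)
			(TRANSPOSE(~wc_flags, DEMO_WC_L3_CSUM_OK,
				   DEMO_OL_IP_CKSUM_BAD) |
			 TRANSPOSE(~wc_flags, DEMO_WC_L4_CSUM_OK,
				   DEMO_OL_L4_CKSUM_BAD));
	return ol_flags;
}
```

With these demo values, a completion carrying the IPv4 bit and both OK bits yields only DEMO_OL_IPV4_HDR, and clearing an OK bit sets the matching BAD flag. When checksum offloading is disabled for the queue, no BAD flag is ever reported.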

[dpdk-dev] [PATCH 03/16] mlx4: make sure experimental device query function is implemented

2015-06-06 Thread Adrien Mazarguil
From: Olga Shern 

HAVE_EXP_QUERY_DEVICE is used to check whether ibv_exp_query_device() can be
used. RSS and inline receive features depend on it.

Signed-off-by: Olga Shern 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/Makefile |  4 
 drivers/net/mlx4/mlx4.c   | 17 ++---
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index 97b364a..ce1f2b0 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -112,6 +112,10 @@ mlx4_autoconf.h: $(RTE_SDK)/scripts/auto-config-h.sh
SEND_RAW_WR_SUPPORT \
infiniband/verbs.h \
type 'struct ibv_send_wr_raw' $(AUTOCONF_OUTPUT)
+   $Q sh -- '$<' '$@' \
+   HAVE_EXP_QUERY_DEVICE \
+   infiniband/verbs.h \
+   type 'struct ibv_exp_device_attr' $(AUTOCONF_OUTPUT)

 mlx4.o: mlx4_autoconf.h

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index b77fb22..e32e433 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -4452,17 +4452,18 @@ mlx4_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
struct ibv_pd *pd = NULL;
struct priv *priv = NULL;
struct rte_eth_dev *eth_dev;
-#if defined(INLINE_RECV) || defined(RSS_SUPPORT)
+#ifdef HAVE_EXP_QUERY_DEVICE
struct ibv_exp_device_attr exp_device_attr;
-#endif
+#endif /* HAVE_EXP_QUERY_DEVICE */
struct ether_addr mac;
union ibv_gid temp_gid;

+#ifdef HAVE_EXP_QUERY_DEVICE
+   exp_device_attr.comp_mask = IBV_EXP_DEVICE_ATTR_EXP_CAP_FLAGS;
 #ifdef RSS_SUPPORT
-   exp_device_attr.comp_mask =
-   (IBV_EXP_DEVICE_ATTR_EXP_CAP_FLAGS |
-IBV_EXP_DEVICE_ATTR_RSS_TBL_SZ);
+   exp_device_attr.comp_mask |= IBV_EXP_DEVICE_ATTR_RSS_TBL_SZ;
 #endif /* RSS_SUPPORT */
+#endif /* HAVE_EXP_QUERY_DEVICE */

DEBUG("using port %u (%08" PRIx32 ")", port, test);

@@ -4507,11 +4508,12 @@ mlx4_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
priv->port = port;
priv->pd = pd;
priv->mtu = ETHER_MTU;
-#ifdef RSS_SUPPORT
+#ifdef HAVE_EXP_QUERY_DEVICE
if (ibv_exp_query_device(ctx, &exp_device_attr)) {
-   INFO("experimental ibv_exp_query_device");
+   ERROR("ibv_exp_query_device() failed");
goto port_error;
}
+#ifdef RSS_SUPPORT
if ((exp_device_attr.exp_device_cap_flags &
 IBV_EXP_DEVICE_QPG) &&
(exp_device_attr.exp_device_cap_flags &
@@ -4563,6 +4565,7 @@ mlx4_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 priv->inl_recv_size);
}
 #endif /* INLINE_RECV */
+#endif /* HAVE_EXP_QUERY_DEVICE */

(void)mlx4_getenv_int;
priv->vf = vf;
-- 
2.1.0



[dpdk-dev] [PATCH 02/16] mlx4: use experimental verbs for polling and completions

2015-06-06 Thread Adrien Mazarguil
This API implements additional flags in work completions that are required
to support checksum offloads.

Signed-off-by: Gilad Berman 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/mlx4.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 648b210..b77fb22 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -914,7 +914,7 @@ txq_complete(struct txq *txq)
unsigned int elts_comp = txq->elts_comp;
unsigned int elts_tail;
const unsigned int elts_n = txq->elts_n;
-   struct ibv_wc wcs[elts_comp];
+   struct ibv_exp_wc wcs[elts_comp];
int wcs_n;

if (unlikely(elts_comp == 0))
@@ -923,11 +923,11 @@ txq_complete(struct txq *txq)
DEBUG("%p: processing %u work requests completions",
  (void *)txq, elts_comp);
 #endif
-   wcs_n = ibv_poll_cq(txq->cq, elts_comp, wcs);
+   wcs_n = ibv_exp_poll_cq(txq->cq, elts_comp, wcs, sizeof(wcs[0]));
if (unlikely(wcs_n == 0))
return 0;
if (unlikely(wcs_n < 0)) {
-   DEBUG("%p: ibv_poll_cq() failed (wcs_n=%d)",
+   DEBUG("%p: ibv_exp_poll_cq() failed (wcs_n=%d)",
  (void *)txq, wcs_n);
return -1;
}
@@ -2317,7 +2317,7 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 {
struct rxq *rxq = (struct rxq *)dpdk_rxq;
struct rxq_elt_sp (*elts)[rxq->elts_n] = rxq->elts.sp;
-   struct ibv_wc wcs[pkts_n];
+   struct ibv_exp_wc wcs[pkts_n];
struct ibv_recv_wr head;
struct ibv_recv_wr **next = &head.next;
struct ibv_recv_wr *bad_wr;
@@ -2329,18 +2329,18 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
return mlx4_rx_burst(dpdk_rxq, pkts, pkts_n);
if (unlikely(elts == NULL)) /* See RTE_DEV_CMD_SET_MTU. */
return 0;
-   wcs_n = ibv_poll_cq(rxq->cq, pkts_n, wcs);
+   wcs_n = ibv_exp_poll_cq(rxq->cq, pkts_n, wcs, sizeof(wcs[0]));
if (unlikely(wcs_n == 0))
return 0;
if (unlikely(wcs_n < 0)) {
-   DEBUG("rxq=%p, ibv_poll_cq() failed (wc_n=%d)",
+   DEBUG("rxq=%p, ibv_exp_poll_cq() failed (wc_n=%d)",
  (void *)rxq, wcs_n);
return 0;
}
assert(wcs_n <= (int)pkts_n);
/* For each work completion. */
for (i = 0; (i != wcs_n); ++i) {
-   struct ibv_wc *wc = &wcs[i];
+   struct ibv_exp_wc *wc = &wcs[i];
uint64_t wr_id = wc->wr_id;
uint32_t len = wc->byte_len;
struct rxq_elt_sp *elt = &(*elts)[wr_id];
@@ -2509,7 +2509,7 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 {
struct rxq *rxq = (struct rxq *)dpdk_rxq;
struct rxq_elt (*elts)[rxq->elts_n] = rxq->elts.no_sp;
-   struct ibv_wc wcs[pkts_n];
+   struct ibv_exp_wc wcs[pkts_n];
struct ibv_recv_wr head;
struct ibv_recv_wr **next = &head.next;
struct ibv_recv_wr *bad_wr;
@@ -2519,18 +2519,18 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)

if (unlikely(rxq->sp))
return mlx4_rx_burst_sp(dpdk_rxq, pkts, pkts_n);
-   wcs_n = ibv_poll_cq(rxq->cq, pkts_n, wcs);
+   wcs_n = ibv_exp_poll_cq(rxq->cq, pkts_n, wcs, sizeof(wcs[0]));
if (unlikely(wcs_n == 0))
return 0;
if (unlikely(wcs_n < 0)) {
-   DEBUG("rxq=%p, ibv_poll_cq() failed (wc_n=%d)",
+   DEBUG("rxq=%p, ibv_exp_poll_cq() failed (wc_n=%d)",
  (void *)rxq, wcs_n);
return 0;
}
assert(wcs_n <= (int)pkts_n);
/* For each work completion. */
for (i = 0; (i != wcs_n); ++i) {
-   struct ibv_wc *wc = &wcs[i];
+   struct ibv_exp_wc *wc = &wcs[i];
uint64_t wr_id = wc->wr_id;
uint32_t len = wc->byte_len;
struct rxq_elt *elt = &(*elts)[WR_ID(wr_id).id];
-- 
2.1.0



[dpdk-dev] [PATCH 01/16] mlx4: add MOFED 3.0 compatibility to interfaces names retrieval

2015-06-06 Thread Adrien Mazarguil
Since Mellanox OFED 3.0 and Linux 3.15, interface port numbers are stored
in dev_port instead of dev_id sysfs files.

Signed-off-by: Or Ami 
Signed-off-by: Nitzan Weller 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/mlx4.c | 51 +
 1 file changed, 39 insertions(+), 12 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index f915bc1..648b210 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -337,9 +337,11 @@ priv_unlock(struct priv *priv)
 static int
 priv_get_ifname(const struct priv *priv, char (*ifname)[IF_NAMESIZE])
 {
-   int ret = -1;
DIR *dir;
struct dirent *dent;
+   unsigned int dev_type = 0;
+   unsigned int dev_port_prev = ~0u;
+   char match[IF_NAMESIZE] = "";

{
MKSTR(path, "%s/device/net", priv->ctx->device->ibdev_path);
@@ -351,7 +353,7 @@ priv_get_ifname(const struct priv *priv, char (*ifname)[IF_NAMESIZE])
while ((dent = readdir(dir)) != NULL) {
char *name = dent->d_name;
FILE *file;
-   unsigned int dev_id;
+   unsigned int dev_port;
int r;

if ((name[0] == '.') &&
@@ -359,22 +361,47 @@ priv_get_ifname(const struct priv *priv, char (*ifname)[IF_NAMESIZE])
 ((name[1] == '.') && (name[2] == '\0'))))
continue;

-   MKSTR(path, "%s/device/net/%s/dev_id",
- priv->ctx->device->ibdev_path, name);
+   MKSTR(path, "%s/device/net/%s/%s",
+ priv->ctx->device->ibdev_path, name,
+ (dev_type ? "dev_id" : "dev_port"));

file = fopen(path, "rb");
-   if (file == NULL)
+   if (file == NULL) {
+   if (errno != ENOENT)
+   continue;
+   /*
+* Switch to dev_id when dev_port does not exist as
+* is the case with Linux kernel versions < 3.15.
+*/
+try_dev_id:
+   match[0] = '\0';
+   if (dev_type)
+   break;
+   dev_type = 1;
+   dev_port_prev = ~0u;
+   rewinddir(dir);
continue;
-   r = fscanf(file, "%x", &dev_id);
-   fclose(file);
-   if ((r == 1) && (dev_id == (priv->port - 1u))) {
-   snprintf(*ifname, sizeof(*ifname), "%s", name);
-   ret = 0;
-   break;
}
+   r = fscanf(file, (dev_type ? "%x" : "%u"), &dev_port);
+   fclose(file);
+   if (r != 1)
+   continue;
+   /*
+* Switch to dev_id when dev_port returns the same value for
+* all ports. May happen when using a MOFED release older than
+* 3.0 with a Linux kernel >= 3.15.
+*/
+   if (dev_port == dev_port_prev)
+   goto try_dev_id;
+   dev_port_prev = dev_port;
+   if (dev_port == (priv->port - 1u))
+   snprintf(match, sizeof(match), "%s", name);
}
closedir(dir);
-   return ret;
+   if (match[0] == '\0')
+   return -1;
+   strncpy(*ifname, match, sizeof(*ifname));
+   return 0;
 }

 /**
-- 
2.1.0
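
The dev_port/dev_id fallback described in this patch can be sketched on its own. This is a simplified illustration and not the driver's code: demo_read_port_index() and its directory argument are hypothetical names, and it only covers the case where dev_port is missing or unreadable. It does not reproduce the second heuristic of the patch, which switches to dev_id when dev_port reports the same value for every port.

```c
#include <stdio.h>

/*
 * Read the port index of one network interface from a sysfs-style
 * directory such as ".../device/net/<ifname>" (path is an assumption
 * of this sketch).
 *
 * dev_port is tried first and parsed as decimal; if it cannot be
 * opened or parsed, dev_id is used instead and parsed as hexadecimal,
 * matching the formats used by the patch.
 *
 * Returns 0 on success and stores the index in *port, -1 otherwise.
 */
int
demo_read_port_index(const char *dir, unsigned int *port)
{
	char path[4096];
	FILE *file;
	int r;

	/* Preferred source: dev_port (decimal). */
	snprintf(path, sizeof(path), "%s/dev_port", dir);
	file = fopen(path, "r");
	if (file != NULL) {
		r = fscanf(file, "%u", port);
		fclose(file);
		if (r == 1)
			return 0;
	}
	/* Fallback for older kernels: dev_id (hexadecimal). */
	snprintf(path, sizeof(path), "%s/dev_id", dir);
	file = fopen(path, "r");
	if (file == NULL)
		return -1;
	r = fscanf(file, "%x", port);
	fclose(file);
	return (r == 1) ? 0 : -1;
}
```

A caller would compare the returned index with (priv->port - 1u), as priv_get_ifname() does, to decide whether the interface belongs to the port being probed.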



[dpdk-dev] [PATCH 00/16] mlx4: MOFED 3.0 support, bugfixes and enhancements

2015-06-06 Thread Adrien Mazarguil
This patchset adds compatibility with the upcoming Mellanox OFED 3.0
release (new kernel drivers and userland support libraries), which supports
new features such as L3/L4 checksum validation offloads and addresses
several bugs and limitations at the same time.

Adrien Mazarguil (3):
  mlx4: add MOFED 3.0 compatibility to interfaces names retrieval
  mlx4: use experimental verbs for polling and completions
  mlx4: add L2 tunnel (VXLAN) RX checksum offload support

Alex Rosenbaum (4):
  mlx4: use faster CQ polling function
  mlx4: avoid looking up WR ID to improve RX performance
  mlx4: merge RX queue setup functions
  mlx4: add support for upstream flow steering API

Gilad Berman (1):
  mlx4: add L3 and L4 RX checksum offload support

Olga Shern (5):
  mlx4: make sure experimental device query function is implemented
  mlx4: update optimized steering warning message
  mlx4: allow applications to use fork() safely
  mlx4: improve accuracy of link status information
  mlx4: fix support for multiple VLAN filters

Or Ami (3):
  mlx4: fix error message for invalid number of descriptors
  mlx4: remove provision for flow creation failure in DMFS A0 mode
  mlx4: query netdevice to get initial MAC address

 drivers/net/mlx4/Makefile |   8 +
 drivers/net/mlx4/mlx4.c   | 627 ++
 2 files changed, 421 insertions(+), 214 deletions(-)

-- 
2.1.0



[dpdk-dev] [PATCH 4/4] app: replace dump_cfg with proc_info

2015-06-06 Thread Thomas Monjalon
2015-06-05 18:35, Maryam Tahhan:
> Extend dump_cfg to also display statistics information for given DPDK
> ports and rename the application to proc_info as it's now a utility
> doing a little more than just dumping the memory information for DPDK.
> 
> Signed-off-by: Maryam Tahhan 
> ---
>  app/Makefile   |   2 +-
>  app/dump_cfg/Makefile  |  45 -
>  app/dump_cfg/main.c|  92 -
>  app/proc_info/Makefile |  45 +
>  app/proc_info/main.c   | 525 
> +
>  mk/rte.sdktest.mk  |   4 +-

It looks promising, thanks.
Would you consider adding yourself as a maintainer of this app?



[dpdk-dev] [PATCH 2/2] ethtool: add new library to provide ethtool-alike APIs

2015-06-06 Thread Thomas Monjalon
2015-06-05 17:24, Andrew Harvey:
> On 6/5/15, 5:47 AM, "Bruce Richardson"  wrote:
> >> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> >> > That's why we need to understand what we (or you) are missing.
> >> > Maybe that it would be clearer with some code examples (which would
> >> > go in the lib documentation if any).
> >> > 
> >> > Thanks
> >
> >How about doing this work as a sample application initially, to
> >demonstrate how
> >an application written using ethtool APIs could be shimmed to use DPDK
> >underneath.
> >The ethtool to dpdk mapping could be contained in a single header file
> >(or header
> >and c file) inside the sample app. This would allow easy re-use of the
> >shim
> >layer, while at the same time not making it part of the core DPDK
> >libraries.
> >
> >Regards,
> >/Bruce
> 
> This would appear to be the most pragmatic way forward.  It would allow
> others to see more of the code and judge its value for themselves. I have
> no issues with this approach if others agree.

Since the beginning of this thread, a doc has been requested (many times) to
show the benefit of integrating such a layer.
If you prefer coding a full example, it would probably also be fine.