date:20160622

[dpdk-dev] [PATCH] ethdev: fix formatting of doxygen comments

2016-06-22 Thread Thomas Monjalon

2016-06-18 02:27, Hiroyuki Mikita:
> This commit fixes some functions missing in API documentation.
> 
> Signed-off-by: Hiroyuki Mikita 

Applied, thanks

[dpdk-dev] [PATCH 0/3] ethdev: add helper functions to get eth_dev and dev private data

2016-06-22 Thread Thomas Monjalon

2016-02-17 14:20, Ferruh Yigit:
> This is to provide abstraction and reduce global variable access.
> 
> Global variable rte_eth_devices kept exported to not break ABI.
> 
> Bonding driver not selected on purpose, just it seems it is using 
> rte_eth_devices heavily.

The struct rte_eth_dev is marked internal.
It is a good goal to remove access to the global array rte_eth_devices,
but the fix must be in the code accessing it only (bonding).

This patchset is rejected.

[dpdk-dev] [PATCH v2] ethdev: make struct rte_eth_dev cache aligned

2016-06-22 Thread Thomas Monjalon

2016-05-03 18:12, Jerin Jacob:
> Elements of struct rte_eth_dev are used in the fast path.
> Make struct rte_eth_dev cache aligned to avoid the cases where
> rte_eth_dev elements share the same cache line with other structures.
> 
> Signed-off-by: Jerin Jacob 

Let's try it in real tests.
Applied, thanks.
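
Note: the cache-line argument in the commit message can be illustrated
outside DPDK. The following is only a minimal sketch, assuming a 64-byte
cache line and C11 alignment support. The names CACHE_LINE, struct
port_state and ports_on_distinct_lines() are hypothetical and are not part
of ethdev.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines; real targets may differ. */
#define CACHE_LINE 64

/* Hypothetical per-port structure holding fast-path fields. */
struct port_state {
	alignas(CACHE_LINE) uint64_t rx_count; /* forces struct alignment */
	uint64_t tx_count;
	void *queues;
};

/* Aligning the first member pads the whole struct to a full line. */
static_assert(alignof(struct port_state) == CACHE_LINE,
	      "struct must start on a cache line");
static_assert(sizeof(struct port_state) % CACHE_LINE == 0,
	      "struct must occupy whole cache lines");

/* Return nonzero when the two objects start on different cache lines. */
static int
ports_on_distinct_lines(const struct port_state *a,
			const struct port_state *b)
{
	uintptr_t line_a = (uintptr_t)a / CACHE_LINE;
	uintptr_t line_b = (uintptr_t)b / CACHE_LINE;

	return line_a != line_b;
}
```

With this layout, neighbouring elements of an array of struct port_state
never share a line, so a write to one port's counters does not invalidate
the line holding another port's fields.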

[dpdk-dev] [PATCH v4 1/2] ethdev: add tunnel and port RSS offload types

2016-06-22 Thread Jerin Jacob

On Wed, Jun 22, 2016 at 05:06:30PM +0200, Thomas Monjalon wrote:
> 2016-06-22 18:33, Jerin Jacob:
> > - added VXLAN, GENEVE and NVGRE tunnel flow types
> > - added PORT flow type for accounting physical/virtual
> > port or channel number in flow creation
> [...]
> > +#define RTE_ETH_FLOW_PORT   18
> > +   /**< Physical/virtual port number based flow */
> 
> What about
> "Consider device port number as a flow differentiator" ?
> 
> I can make the change if you (Jerin and/or Bruce) agree.

Looks OK to me. No strong opinion on this.

[dpdk-dev] [PATCH v4 0/2] New RSS offload flags

2016-06-22 Thread Thomas Monjalon

> Jerin Jacob (2):
>   ethdev: add tunnel and port RSS offload types
>   ethdev: add ETH_RSS_RETA_SIZE_256

Applied with suggested rewording, thanks

[dpdk-dev] [PATCH v5 10/17] ethdev: get rid of eth driver register callback

2016-06-22 Thread Shreyansh jain

On Wednesday 22 June 2016 06:58 PM, Neil Horman wrote:
> On Wed, Jun 22, 2016 at 02:36:29PM +0530, Shreyansh Jain wrote:
>> Now that all pdev are pci drivers, we don't need to register ethdev drivers
>> through a dedicated channel.
>>
>> Signed-off-by: David Marchand 
>> Signed-off-by: Shreyansh Jain 
>> ---
>>  lib/librte_ether/rte_ethdev.c  | 22 --
>>  lib/librte_ether/rte_ethdev.h  | 12 
>>  lib/librte_ether/rte_ether_version.map |  1 -
>>  3 files changed, 35 deletions(-)
>>
>> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
>> index 312c42c..06065fe 100644
>> --- a/lib/librte_ether/rte_ethdev.c
>> +++ b/lib/librte_ether/rte_ethdev.c
>> @@ -340,28 +340,6 @@ rte_eth_dev_pci_remove(struct rte_pci_device *pci_dev)
>>  return 0;
>>  }
>>  
>> -/**
>> - * Register an Ethernet [Poll Mode] driver.
>> - *
>> - * Function invoked by the initialization function of an Ethernet driver
>> - * to simultaneously register itself as a PCI driver and as an Ethernet
>> - * Poll Mode Driver.
>> - * Invokes the rte_eal_pci_register() function to register the *pci_drv*
>> - * structure embedded in the *eth_drv* structure, after having stored the
>> - * address of the rte_eth_dev_init() function in the *devinit* field of
>> - * the *pci_drv* structure.
>> - * During the PCI probing phase, the rte_eth_dev_init() function is
>> - * invoked for each PCI [Ethernet device] matching the embedded PCI
>> - * identifiers provided by the driver.
>> - */
>> -void
>> -rte_eth_driver_register(struct eth_driver *eth_drv)
>> -{
>> -eth_drv->pci_drv.devinit = rte_eth_dev_pci_probe;
>> -eth_drv->pci_drv.devuninit = rte_eth_dev_pci_remove;
>> -rte_eal_pci_register(&eth_drv->pci_drv);
>> -}
>> -
>>  int
>>  rte_eth_dev_is_valid_port(uint8_t port_id)
>>  {
>> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
>> index 2249466..ffd24e4 100644
>> --- a/lib/librte_ether/rte_ethdev.h
>> +++ b/lib/librte_ether/rte_ethdev.h
>> @@ -1862,18 +1862,6 @@ struct eth_driver {
>>  };
>>  
>>  /**
>> - * @internal
>> - * A function invoked by the initialization function of an Ethernet driver
>> - * to simultaneously register itself as a PCI driver and as an Ethernet
>> - * Poll Mode Driver (PMD).
>> - *
>> - * @param eth_drv
>> - *   The pointer to the *eth_driver* structure associated with
>> - *   the Ethernet driver.
>> - */
>> -void rte_eth_driver_register(struct eth_driver *eth_drv);
>> -
>> -/**
>>   * Convert a numerical speed in Mbps to a bitmap flag that can be used in
>>   * the bitmap link_speeds of the struct rte_eth_conf
>>   *
>> diff --git a/lib/librte_ether/rte_ether_version.map 
>> b/lib/librte_ether/rte_ether_version.map
>> index cf4581c..8151007 100644
>> --- a/lib/librte_ether/rte_ether_version.map
>> +++ b/lib/librte_ether/rte_ether_version.map
>> @@ -80,7 +80,6 @@ DPDK_2.2 {
>>  rte_eth_dev_vlan_filter;
>>  rte_eth_dev_wd_timeout_store;
>>  rte_eth_dma_zone_reserve;
>> -rte_eth_driver_register;
>>  rte_eth_led_off;
>>  rte_eth_led_on;
>>  rte_eth_link;
> Nak, Same issue as the crypto registration

Yes, I agree. I will fix this in next version.

> 
>> -- 
>> 2.7.4
>>
>>
>

[dpdk-dev] [PATCH v5 09/17] crypto: get rid of crypto driver register callback

2016-06-22 Thread Shreyansh jain

On Wednesday 22 June 2016 06:57 PM, Neil Horman wrote:
> On Wed, Jun 22, 2016 at 02:36:28PM +0530, Shreyansh Jain wrote:
>> Now that all pdev are pci drivers, we don't need to register crypto drivers
>> through a dedicated channel.
>>
>> Signed-off-by: David Marchand 
>> Signed-off-by: Shreyansh Jain 
>> ---
>>  lib/librte_cryptodev/rte_cryptodev.c   | 22 ---
>>  lib/librte_cryptodev/rte_cryptodev_pmd.h   | 30 --
>>  lib/librte_cryptodev/rte_cryptodev_version.map |  1 -
>>  3 files changed, 53 deletions(-)
>>
>> diff --git a/lib/librte_cryptodev/rte_cryptodev.c 
>> b/lib/librte_cryptodev/rte_cryptodev.c
>> index 65a2e29..a7cb33a 100644
>> --- a/lib/librte_cryptodev/rte_cryptodev.c
>> +++ b/lib/librte_cryptodev/rte_cryptodev.c
>> @@ -444,28 +444,6 @@ rte_cryptodev_pci_remove(struct rte_pci_device *pci_dev)
>>  return 0;
>>  }
>>  
>> -int
>> -rte_cryptodev_pmd_driver_register(struct rte_cryptodev_driver *cryptodrv,
>> -enum pmd_type type)
>> -{
>> -/* Call crypto device initialization directly if device is virtual */
>> -if (type == PMD_VDEV)
>> -return rte_cryptodev_pci_probe((struct rte_pci_driver 
>> *)cryptodrv,
>> -NULL);
>> -
>> -/*
>> - * Register PCI driver for physical device intialisation during
>> - * PCI probing
>> - */
>> -cryptodrv->pci_drv.devinit = rte_cryptodev_pci_probe;
>> -cryptodrv->pci_drv.devuninit = rte_cryptodev_pci_remove;
>> -
>> -rte_eal_pci_register(&cryptodrv->pci_drv);
>> -
>> -return 0;
>> -}
>> -
>> -
>>  uint16_t
>>  rte_cryptodev_queue_pair_count(uint8_t dev_id)
>>  {
>> diff --git a/lib/librte_cryptodev/rte_cryptodev_pmd.h 
>> b/lib/librte_cryptodev/rte_cryptodev_pmd.h
>> index 3fb7c7c..99fd69e 100644
>> --- a/lib/librte_cryptodev/rte_cryptodev_pmd.h
>> +++ b/lib/librte_cryptodev/rte_cryptodev_pmd.h
>> @@ -491,36 +491,6 @@ rte_cryptodev_pmd_virtual_dev_init(const char *name, 
>> size_t dev_private_size,
>>  extern int
>>  rte_cryptodev_pmd_release_device(struct rte_cryptodev *cryptodev);
>>  
>> -
>> -/**
>> - * Register a Crypto [Poll Mode] driver.
>> - *
>> - * Function invoked by the initialization function of a Crypto driver
>> - * to simultaneously register itself as Crypto Poll Mode Driver and to 
>> either:
>> - *
>> - *  a - register itself as PCI driver if the crypto device is a physical
>> - *  device, by invoking the rte_eal_pci_register() function to
>> - *  register the *pci_drv* structure embedded in the *crypto_drv*
>> - *  structure, after having stored the address of the
>> - *  rte_cryptodev_init() function in the *devinit* field of the
>> - *  *pci_drv* structure.
>> - *
>> - *  During the PCI probing phase, the rte_cryptodev_init()
>> - *  function is invoked for each PCI [device] matching the
>> - *  embedded PCI identifiers provided by the driver.
>> - *
>> - *  b, complete the initialization sequence if the device is a virtual
>> - *  device by calling the rte_cryptodev_init() directly passing a
>> - *  NULL parameter for the rte_pci_device structure.
>> - *
>> - *   @param crypto_drv  crypto_driver structure associated with the 
>> crypto
>> - *  driver.
>> - *   @param typepmd type
>> - */
>> -extern int
>> -rte_cryptodev_pmd_driver_register(struct rte_cryptodev_driver *crypto_drv,
>> -enum pmd_type type);
>> -
>>  /**
>>   * Executes all the user application registered callbacks for the specific
>>   * device.
>> diff --git a/lib/librte_cryptodev/rte_cryptodev_version.map 
>> b/lib/librte_cryptodev/rte_cryptodev_version.map
>> index 8d0edfb..e0a9620 100644
>> --- a/lib/librte_cryptodev/rte_cryptodev_version.map
>> +++ b/lib/librte_cryptodev/rte_cryptodev_version.map
>> @@ -14,7 +14,6 @@ DPDK_16.04 {
>>  rte_cryptodev_info_get;
>>  rte_cryptodev_pmd_allocate;
>>  rte_cryptodev_pmd_callback_process;
>> -rte_cryptodev_pmd_driver_register;
>>  rte_cryptodev_pmd_release_device;
>>  rte_cryptodev_pmd_virtual_dev_init;
>>  rte_cryptodev_sym_session_create;
> NAK, you can't just remove exported symbols without going through the
> deprecation process.  Better still would be to only expose it for DPDK_16.04 
> and
> hide it in the next release

Agree. I will fix it.

> 
> Neil
> 
>> -- 
>> 2.7.4
>>
>>
>

[dpdk-dev] [PATCH v3] ethdev: fix DCB config issue on ixgbe

2016-06-22 Thread Thomas Monjalon

2016-05-06 05:33, Wenzhuo Lu:
> An issue is found that DCB cannot be configured on ixgbe
> NICs. It's said the TX queue number is not right.
> On ixgbe the max TX queue number is not fixed, it depends
> on the multi-queue mode. The API rte_eth_dev_configure
> should be used to configure this mode. But the input of
> this API includes TX queue number. The problem is before
> the mode is configured, we cannot decide the TX queue
> number.
> 
> This patch adds an API to configure RX & TX multi-queue mode
> separately. After the mode is configured, the max RX & TX
> queue number is decided. Then we can set the appropriate
> RX & TX queue number.
[...]
> +/**
> + * Set RX & TX multi_queue mode.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param rx_mq_mode
> + *   RX multi_queue mode.
> + * @param tx_mq_mode
> + *   TX multi_queue mode.
> + *
> + * @return
> + *   - (0) if successful.
> + *   - (-ENODEV) if port identifier is invalid.
> + */
> +int
> +rte_eth_dev_mq_mode_set(uint8_t port_id,
> + enum rte_eth_rx_mq_mode rx_mq_mode,
> + enum rte_eth_tx_mq_mode tx_mq_mode);

I've really tried to think about it and I think it is more or less
a hack.
First, it is not explained in the doc when we should use
rte_eth_dev_mq_mode_set() instead of a simple call to
rte_eth_dev_configure().
Second, I don't understand why we would have a function which configures
the "multiqueue modes" without properly configuring RSS/VMDq/DCB.
Last, it is said that rte_eth_dev_configure() "must be invoked first
before any other function in the Ethernet API".

My opinion is that the primary goal of rte_eth_dev_configure() was
"Embedding all configuration information in a single data structure"
but it is currently configuring only speed and some flow steering
(only RSS, VMDq, DCB and flow director).
This bug and the state of the ethdev API clearly show that we must
have one function per feature (or group of features) and drop
rte_eth_dev_configure().

You can argue it is just a personal feeling and this comment comes
late, but I promise it is not easy to give a negative opinion from
a design perspective.
I strongly feel we must stop working around the ethdev API issues
and start really fixing them.

Hope you understand and agree to work on a new API.

[dpdk-dev] [PATCH v3] eal/linuxapp: fix resource leak

2016-06-22 Thread Daniel Mrzyglod

This patch makes every error path call munmap when the hugepage
pointer is not NULL, which prevents a resource leak.

Coverity issue: 97920
Fixes: b6a468ad41d5 ("memory: add --socket-mem option")

Signed-off-by: Daniel Mrzyglod 
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c 
b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 9251a5b..9b0d39a 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -1051,7 +1051,7 @@ int
 rte_eal_hugepage_init(void)
 {
struct rte_mem_config *mcfg;
-   struct hugepage_file *hugepage, *tmp_hp = NULL;
+   struct hugepage_file *hugepage = NULL, *tmp_hp = NULL;
struct hugepage_info used_hp[MAX_HUGEPAGE_SIZES];

uint64_t memory[RTE_MAX_NUMA_NODES];
@@ -1367,13 +1367,15 @@ rte_eal_hugepage_init(void)
"of memory.\n",
i, nr_hugefiles, RTE_STR(CONFIG_RTE_MAX_MEMSEG),
RTE_MAX_MEMSEG);
-   return -ENOMEM;
+   goto fail;
}

return 0;

 fail:
free(tmp_hp);
+   if (hugepage != NULL)
+   munmap(hugepage, nr_hugefiles * sizeof(struct hugepage_file));
return -1;
 }

-- 
2.7.4
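
Note: the pattern the patch relies on (one "fail" label that releases
everything acquired so far, reached with goto instead of an early return)
can be sketched in isolation. This is a hedged illustration only, not the
EAL code: map_with_cleanup() and its force_fail parameter are hypothetical,
and the sketch tracks the mapping with MAP_FAILED, the value mmap() returns
on error, rather than NULL as in the patch.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>

/*
 * Hypothetical init routine: allocates a temporary buffer, maps an
 * anonymous area, and hands the mapping to the caller through *out.
 * force_fail simulates an error detected after the mapping exists.
 */
static int
map_with_cleanup(size_t map_len, int force_fail, void **out)
{
	void *map = MAP_FAILED;
	char *tmp = NULL;

	assert(out != NULL && map_len > 0);
	*out = NULL;

	tmp = malloc(64);
	if (tmp == NULL)
		goto fail;

	map = mmap(NULL, map_len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (map == MAP_FAILED)
		goto fail;

	/* A late error must go through the label, not return directly. */
	if (force_fail)
		goto fail;

	free(tmp);
	*out = map; /* ownership passes to the caller */
	return 0;

fail:
	free(tmp);
	if (map != MAP_FAILED)
		munmap(map, map_len);
	return -1;
}
```

A direct "return -ENOMEM" in place of the late "goto fail" would skip both
releases, which is the kind of leak Coverity reported here.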

[dpdk-dev] [PATCH v4 2/2] ethdev: add ETH_RSS_RETA_SIZE_256

2016-06-22 Thread Jerin Jacob

Signed-off-by: Jerin Jacob 
---
 lib/librte_ether/rte_ethdev.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 1579cb0..6cf4c58 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -477,6 +477,7 @@ struct rte_eth_rss_conf {
  */
 #define ETH_RSS_RETA_SIZE_64  64
 #define ETH_RSS_RETA_SIZE_128 128
+#define ETH_RSS_RETA_SIZE_256 256
 #define ETH_RSS_RETA_SIZE_512 512
 #define RTE_RETA_GROUP_SIZE   64

-- 
2.5.5

[dpdk-dev] [PATCH v4 1/2] ethdev: add tunnel and port RSS offload types

2016-06-22 Thread Jerin Jacob

- added VXLAN, GENEVE and NVGRE tunnel flow types
- added PORT flow type for accounting physical/virtual
port or channel number in flow creation

Signed-off-by: Jerin Jacob 
---
 app/test-pmd/cmdline.c  | 18 +++---
 app/test-pmd/config.c   |  9 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  6 +-
 lib/librte_ether/rte_eth_ctrl.h |  7 ++-
 lib/librte_ether/rte_ethdev.h   | 16 +++-
 5 files changed, 50 insertions(+), 6 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 9d3e4e8..b6b61ad 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -565,7 +565,7 @@ static void cmd_help_long_parsed(void *parsed_result,
"Set 
crc-strip/scatter/rx-checksum/hardware-vlan/drop_en"
" for ports.\n\n"

-   "port config all rss (all|ip|tcp|udp|sctp|ether|none)\n"
+   "port config all rss 
(all|ip|tcp|udp|sctp|ether|port|vxlan|geneve|nvgre|none)\n"
"Set the RSS mode.\n\n"

"port config port-id rss reta 
(hash,queue)[,(hash,queue)]\n"
@@ -1552,6 +1552,14 @@ cmd_config_rss_parsed(void *parsed_result,
rss_conf.rss_hf = ETH_RSS_SCTP;
else if (!strcmp(res->value, "ether"))
rss_conf.rss_hf = ETH_RSS_L2_PAYLOAD;
+   else if (!strcmp(res->value, "port"))
+   rss_conf.rss_hf = ETH_RSS_PORT;
+   else if (!strcmp(res->value, "vxlan"))
+   rss_conf.rss_hf = ETH_RSS_VXLAN;
+   else if (!strcmp(res->value, "geneve"))
+   rss_conf.rss_hf = ETH_RSS_GENEVE;
+   else if (!strcmp(res->value, "nvgre"))
+   rss_conf.rss_hf = ETH_RSS_NVGRE;
else if (!strcmp(res->value, "none"))
rss_conf.rss_hf = 0;
else {
@@ -1578,12 +1586,12 @@ cmdline_parse_token_string_t cmd_config_rss_name =
TOKEN_STRING_INITIALIZER(struct cmd_config_rss, name, "rss");
 cmdline_parse_token_string_t cmd_config_rss_value =
TOKEN_STRING_INITIALIZER(struct cmd_config_rss, value,
-   "all#ip#tcp#udp#sctp#ether#none");
+   "all#ip#tcp#udp#sctp#ether#port#vxlan#geneve#nvgre#none");

 cmdline_parse_inst_t cmd_config_rss = {
.f = cmd_config_rss_parsed,
.data = NULL,
-   .help_str = "port config all rss all|ip|tcp|udp|sctp|ether|none",
+   .help_str = "port config all rss 
all|ip|tcp|udp|sctp|ether|port|vxlan|geneve|nvgre|none",
.tokens = {
(void *)&cmd_config_rss_port,
(void *)&cmd_config_rss_keyword,
@@ -9499,6 +9507,10 @@ flowtype_to_str(uint16_t ftype)
{"ipv6-sctp", RTE_ETH_FLOW_NONFRAG_IPV6_SCTP},
{"ipv6-other", RTE_ETH_FLOW_NONFRAG_IPV6_OTHER},
{"l2_payload", RTE_ETH_FLOW_L2_PAYLOAD},
+   {"port", RTE_ETH_FLOW_PORT},
+   {"vxlan", RTE_ETH_FLOW_VXLAN},
+   {"geneve", RTE_ETH_FLOW_GENEVE},
+   {"nvgre", RTE_ETH_FLOW_NVGRE},
};

for (i = 0; i < RTE_DIM(ftype_table); i++) {
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index cb71c09..9ccabf9 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -138,6 +138,11 @@ static const struct rss_type_info rss_type_table[] = {
{ "ipv6-ex", ETH_RSS_IPV6_EX },
{ "ipv6-tcp-ex", ETH_RSS_IPV6_TCP_EX },
{ "ipv6-udp-ex", ETH_RSS_IPV6_UDP_EX },
+   { "port", ETH_RSS_PORT },
+   { "vxlan", ETH_RSS_VXLAN },
+   { "geneve", ETH_RSS_GENEVE },
+   { "nvgre", ETH_RSS_NVGRE },
+
 };

 static void
@@ -2119,6 +2124,10 @@ flowtype_to_str(uint16_t flow_type)
{"ipv6-sctp", RTE_ETH_FLOW_NONFRAG_IPV6_SCTP},
{"ipv6-other", RTE_ETH_FLOW_NONFRAG_IPV6_OTHER},
{"l2_payload", RTE_ETH_FLOW_L2_PAYLOAD},
+   {"port", RTE_ETH_FLOW_PORT},
+   {"vxlan", RTE_ETH_FLOW_VXLAN},
+   {"geneve", RTE_ETH_FLOW_GENEVE},
+   {"nvgre", RTE_ETH_FLOW_NVGRE},
};

for (i = 0; i < RTE_DIM(flowtype_str_table); i++) {
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst 
b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 4e19229..30e410d 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -179,6 +179,10 @@ For example:
  ipv6-sctp
  ipv6-other
  l2_payload
+ port
+ vxlan
+ geneve
+ nvgre

 show port rss reta
 ~~
@@ -1286,7 +1290,7 @@ port config - RSS

 Set the RSS (Receive Side Scaling) mode on or off::

-   testpmd> port config all rss (all|ip|tcp|udp|sctp|ether|none)
+   testpmd> port config all rss 
(all|ip|tcp|udp|sctp|ether|port|vxlan|geneve|nvgre|none)

 RSS is on by default.

diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h
index b8c7be9..c04a488 100

[dpdk-dev] [PATCH v4 0/2] New RSS offload flags

2016-06-22 Thread Jerin Jacob

v1..v2
- Added cover letter
- Corrected typo in RET_ETH_FLOW_VXLAN name
- Updated test-pmd application to access newly defined RSS offload flags

v2..v3
- testpmd document update (Suggested by John and Pablo)

v3..v4
- Added doxygen comments for new FLOW types (Suggested by Thomas)

Jerin Jacob (2):
  ethdev: add tunnel and port RSS offload types
  ethdev: add ETH_RSS_RETA_SIZE_256

 app/test-pmd/cmdline.c  | 18 +++---
 app/test-pmd/config.c   |  9 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  6 +-
 lib/librte_ether/rte_eth_ctrl.h |  7 ++-
 lib/librte_ether/rte_ethdev.h   | 17 -
 5 files changed, 51 insertions(+), 6 deletions(-)

-- 
2.5.5

[dpdk-dev] [PATCH] port: fix build when KNI support is not enabled

2016-06-22 Thread Olivier Matz

On 06/22/2016 02:20 PM, Thomas Monjalon wrote:
> 2016-06-22 13:57, Olivier Matz:
>> Hi Thomas,
>>
>> On 06/22/2016 01:49 PM, Thomas Monjalon wrote:
>>> 2016-06-22 14:34, Panu Matilainen:
>>>> --- a/lib/librte_port/Makefile
>>>> +++ b/lib/librte_port/Makefile
>>>> @@ -82,6 +82,8 @@ DEPDIRS-$(CONFIG_RTE_LIBRTE_PORT) += lib/librte_mempool
>>>>  DEPDIRS-$(CONFIG_RTE_LIBRTE_PORT) += lib/librte_ether
>>>>  DEPDIRS-$(CONFIG_RTE_LIBRTE_PORT) += lib/librte_ip_frag
>>>>  DEPDIRS-$(CONFIG_RTE_LIBRTE_PORT) += lib/librte_sched
>>>> +ifeq ($(CONFIG_RTE_LIBRTE_KNI),y)
>>>>  DEPDIRS-$(CONFIG_RTE_LIBRTE_PORT) += lib/librte_kni
>>>> +endif
>>>
>>> I do not remember why $(CONFIG_RTE_LIBRTE_PORT) is needed in its Makefile.
>>> I think we can do
>>> DEPDIRS-$(CONFIG_RTE_LIBRTE_KNI) += lib/librte_kni
>>> and set DEPDIRS-y everywhere else.
>>>
>>
>> It's probably not much used, but the build framework allows to do
>> the following to build only one directory:
>>
>>   make lib/librte_port_sub
>>
>> This directly jumps to the librte_port Makefile, bypassing parent
>> directories. I think that's why the config check is duplicated in the
>> Makefile.
> 
> If we want to specifically build this directory, why prevent us from
> doing so with CONFIG_RTE_LIBRTE_PORT?

If we call foo_sub with CONFIG_FOO=n, it will generate a library and
install headers in the build directory, however the config is unset.
Some propositions if we want to replace
DEPDIRS-$(CONFIG_RTE_LIBRTE_PORT) by DEPDIRS-y:

1/ say that "make foo_sub" should be used with care, only if CONFIG_FOO
   is set (else it is not supported) -> nothing to do
2/ fix the make %_sub feature to browse parent directories, checking
   the SUBDIRS-${CONFIG_FOO}
3/ remove the make %_sub feature, maybe nobody cares...

I think 1/ is acceptable.

[dpdk-dev] [PATCH] app/test: fix for icc compilation error

2016-06-22 Thread John Griffin

On 22/06/16 17:13, Deepak Kumar Jain wrote:
> Icc complains that a variable may be used without being set.
>
> Fixes: 97fe6461c7cbfb ("app/test: add SNOW 3G performance test")
>
> Signed-off-by: Deepak Kumar Jain 
Acked-by: John Griffin

[dpdk-dev] [PATCH] app/test: fix for icc compilation error

2016-06-22 Thread Deepak Kumar Jain

Icc complains that a variable may be used without being set.

Fixes: 97fe6461c7cbfb ("app/test: add SNOW 3G performance test")

Signed-off-by: Deepak Kumar Jain 
---
 app/test/test_cryptodev_perf.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/app/test/test_cryptodev_perf.c b/app/test/test_cryptodev_perf.c
index e1adc99..e484cbb 100644
--- a/app/test/test_cryptodev_perf.c
+++ b/app/test/test_cryptodev_perf.c
@@ -2458,18 +2458,17 @@ test_perf_aes_sha(uint8_t dev_id, uint16_t queue_id,

/* Generate a burst of crypto operations */
for (i = 0; i < (pparams->burst_size * NUM_MBUF_SETS); i++) {
-   struct rte_mbuf *m = test_perf_create_pktmbuf(
+   mbufs[i] = test_perf_create_pktmbuf(
ts_params->mbuf_mp,
pparams->buf_size);

-   if (m == NULL) {
+   if (mbufs[i] == NULL) {
printf("\nFailed to get mbuf - freeing the rest.\n");
for (k = 0; k < i; k++)
rte_pktmbuf_free(mbufs[k]);
return -1;
}

-   mbufs[i] = m;
}


@@ -2587,18 +2586,17 @@ test_perf_snow3g(uint8_t dev_id, uint16_t queue_id,

/* Generate a burst of crypto operations */
for (i = 0; i < (pparams->burst_size * NUM_MBUF_SETS); i++) {
-   struct rte_mbuf *m = test_perf_create_pktmbuf(
+   mbufs[i] = test_perf_create_pktmbuf(
ts_params->mbuf_mp,
pparams->buf_size);

-   if (m == NULL) {
+   if (mbufs[i] == NULL) {
printf("\nFailed to get mbuf - freeing the rest.\n");
for (k = 0; k < i; k++)
rte_pktmbuf_free(mbufs[k]);
return -1;
}

-   mbufs[i] = m;
}

tsc_start = rte_rdtsc_precise();
-- 
2.5.5

[dpdk-dev] [PATCH v4 1/2] ethdev: add tunnel and port RSS offload types

2016-06-22 Thread Thomas Monjalon

2016-06-22 18:33, Jerin Jacob:
> - added VXLAN, GENEVE and NVGRE tunnel flow types
> - added PORT flow type for accounting physical/virtual
> port or channel number in flow creation
[...]
> +#define RTE_ETH_FLOW_PORT   18
> + /**< Physical/virtual port number based flow */

What about
"Consider device port number as a flow differentiator" ?

I can make the change if you (Jerin and/or Bruce) agree.

[dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset

2016-06-22 Thread Jerin Jacob

On Wed, Jun 22, 2016 at 11:18:21AM +0200, Thomas Monjalon wrote:
> 2016-06-22 08:25, Lu, Wenzhuo:
> > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > 2016-06-22 13:29, Jerin Jacob:
> > > > Thomas,
> > > > As a librte_ether maintainer any comments on this?
> > > 
> > > +1 for adding details and make sure naming is good.
> > > I don't really need to comment here because I have already done this
> > > comment earlier:
> > >   http://dpdk.org/ml/archives/dev/2016-June/041845.html
> > > Thank you for insisting.
> > I've added some details in this patch set. If it's not enough, please
> > let me know.
> > And I think this discussion is about what the API name should be.
> > Actually I think all the existing names describe what is done by the
> > API, not when and where it should be used, like dev_start/stop.
> 
> You're right, I overlooked it:
> 
> + * The API will stop the port, clear the rx/tx queues, re-setup the rx/tx
> + * queues, restart the port.
> 
> Jerin, which detail do you think is needed?

When should each be used? In what scenarios does an application need to
use the generic stop/start vs this new API?

How about calling it rte_eth_dev_restart()?

If the existing stop followed by start is functionally the same as the new
API, how about having a generic implementation of rte_eth_dev_restart()
for when PMD-specific restart handlers are NOT found?

That way the application needs to call only rte_eth_dev_restart() for a
port restart. It can internally decide between an optimized stop/start and
a generic restart.

Jerin

> 
> Wenzhuo, why this function is needed?
> All these actions are already possible independently.
> When looking at ixgbe implementation, I see:
>   ixgbevf_dev_stats_reset() which is not documented in the API
>   rte_delay_ms(1000);
>   do {} while
> It looks to be some hacks.
> If you really need some workarounds to handle some tricky situations,
> maybe that the API is not detailed enough.
> 
> > But anyway I'm open to changing the name. Is process_reset_intr the
> > name you prefer? Thanks.
> 
> Not sure.
> If you really intend to add a generic reset, maybe rte_eth_dev_reset()
> is a good name. We just need more justification.
> After reading the doc, the user can understand it is just a wrapper of
> existing functions. But it appears in the code that it does more and can
> help in some situations.
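
Note: the fallback suggested in this reply (use a driver-specific restart
handler when one exists, otherwise fall back to a plain stop followed by
start) can be sketched as below. Everything here is hypothetical and
simplified for illustration: struct demo_dev, struct demo_ops and
demo_dev_restart() are not ethdev types or functions, and no such API is
confirmed by this thread.

```c
#include <assert.h>
#include <stddef.h>

struct demo_dev;

/* Hypothetical driver callbacks; restart is optional and may be NULL. */
struct demo_ops {
	int  (*start)(struct demo_dev *dev);
	void (*stop)(struct demo_dev *dev);
	int  (*restart)(struct demo_dev *dev);
};

struct demo_dev {
	const struct demo_ops *ops;
	int started;
	int stop_calls;
	int fast_restarts;
};

static int demo_start(struct demo_dev *dev)
{
	dev->started = 1;
	return 0;
}

static void demo_stop(struct demo_dev *dev)
{
	dev->started = 0;
	dev->stop_calls++;
}

static int demo_fast_restart(struct demo_dev *dev)
{
	dev->fast_restarts++;
	dev->started = 1;
	return 0;
}

/* Driver without a dedicated handler, and one that provides it. */
static const struct demo_ops demo_generic_ops = {
	demo_start, demo_stop, NULL
};
static const struct demo_ops demo_optimized_ops = {
	demo_start, demo_stop, demo_fast_restart
};

/* Single entry point: prefer the driver handler, else stop then start. */
static int
demo_dev_restart(struct demo_dev *dev)
{
	assert(dev != NULL && dev->ops != NULL);

	if (dev->ops->restart != NULL)
		return dev->ops->restart(dev);

	dev->ops->stop(dev);
	return dev->ops->start(dev);
}
```

The application calls one function in both cases; which path runs is
decided by what the driver registered.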

[dpdk-dev] segfault in vt_pci_init

2016-06-22 Thread Thomas F Herbert

Thomas,

Thanks. That fixes it of course.

--TFH

On 6/22/16 3:39 PM, Thomas Monjalon wrote:
> 2016-06-22 14:27, Thomas F Herbert:
>> Program received signal SIGSEGV, Segmentation fault.
>> 0x00616172 in vtpci_init (dev=0xbfb0f0, hw=0x7fffd57d0700,
>>   dev_flags=0x7fffdf9c)
>>   at /home/heat-admin/dpdk/drivers/net/virtio/virtio_pci.c:722
>> 722dev->devargs->type != RTE_DEVTYPE_WHITELISTED_PCI) {
> Please could you check with the current HEAD?
> Maybe this commit can fix your issue:
>   http://dpdk.org/commit/7e40200c56

-- 
*Thomas F Herbert*
SDN Group
Office of Technology
*Red Hat*

[dpdk-dev] [PATCH] arm64: change rte_memcpy to inline function

2016-06-22 Thread Jianbo Liu

On 17 June 2016 at 18:30, Thomas Monjalon  wrote:
> 2016-05-19 17:56, Thomas Monjalon:
>> 2016-05-19 21:48, Jianbo Liu:
>> > On 13 May 2016 at 23:49, Thomas Monjalon wrote:
>> > > 2016-05-10 14:01, Jianbo Liu:
>> > >> Other APP may call rte_memcpy by function pointer,
>> > >> so change it to an inline function.
>> > >
>> > > Any example in mind?
>> > >
>> > It's for ODP-DPDK.
>>
>> Given that ODP is open (dataplane), you should also consider ppc64 and tile.
>>
>> > >> --- a/lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h
>> > >> +++ b/lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h
>> > >> -#define rte_memcpy(d, s, n)  memcpy((d), (s), (n))
>> > >> +static inline void *
>> > >> +rte_memcpy(void *dst, const void *src, size_t n)
>> > >> +{
>> > >> + return memcpy(dst, src, n);
>> > >> +}
>> > >
>> > > It makes no sense if other archs (arm32, ppc64, tile) are not updated.
>> > >
>> > But it is also an inline function on x86.
>>
>> In x86, it was implemented as a function because there is some code.
>> If you want to make sure it is always a function, even in the case
>> of just calling memcpy from libc, you should put a doxygen comment in
>> the generic part and adapt every archs.
>
> no news?
> a v2 would be welcome

Hi Thomas,
Please close it, since there is already a solution to this issue in odp-dpdk.

Thanks!
Jianbo
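
Note: the motivation given at the start of this thread (calling the copy
routine through a function pointer) comes down to a C rule: a function-like
macro has no address, while a static inline function does. A minimal
sketch, with hypothetical names copy_macro, copy_inline, copy_fn and
copy_via_pointer() standing in for the real rte_memcpy definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Function-like macro form: fine for direct calls, but has no address. */
#define copy_macro(d, s, n) memcpy((d), (s), (n))

/* Inline-function form, mirroring the shape of the proposed patch. */
static inline void *
copy_inline(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}

/* Hypothetical callback type an application might store. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Copy through a stored function pointer; returns dst. */
static void *
copy_via_pointer(copy_fn fn, void *dst, const void *src, size_t n)
{
	assert(fn != NULL);
	return fn(dst, src, n);
}
```

Passing copy_inline as the copy_fn argument compiles. Writing
"copy_fn f = copy_macro;" does not, because the macro name without an
argument list is not expanded and no function of that name is declared.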

[dpdk-dev] [PATCH v5 09/17] crypto: get rid of crypto driver register callback

2016-06-22 Thread Thomas Monjalon

2016-06-22 09:27, Neil Horman:
> > +++ b/lib/librte_cryptodev/rte_cryptodev_version.map
> > -   rte_cryptodev_pmd_driver_register;
> NAK, you can't just remove exported symbols without going through the
> deprecation process.  Better still would be to only expose it for DPDK_16.04
> and hide it in the next release

This function is not called by the application.
Thus there is no ABI break.

[dpdk-dev] segfault in vt_pci_init

2016-06-22 Thread Thomas Monjalon

2016-06-22 14:27, Thomas F Herbert:
> Program received signal SIGSEGV, Segmentation fault.
> 0x00616172 in vtpci_init (dev=0xbfb0f0, hw=0x7fffd57d0700,
>  dev_flags=0x7fffdf9c)
>  at /home/heat-admin/dpdk/drivers/net/virtio/virtio_pci.c:722
> 722dev->devargs->type != RTE_DEVTYPE_WHITELISTED_PCI) {

Please could you check with the current HEAD?
Maybe this commit can fix your issue:
http://dpdk.org/commit/7e40200c56

[dpdk-dev] [PATCH v2 3/3] app/pdump: fix string overflow

2016-06-22 Thread Reshma Pattan

Replaced strncpy with snprintf to copy the strings safely.

Coverity issue 127351: string overflow

Fixes: caa7028276b8 ("app/pdump: add tool for packet capturing")

Signed-off-by: Reshma Pattan 
---
 app/pdump/main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/app/pdump/main.c b/app/pdump/main.c
index f8923b9..fe4d38a 100644
--- a/app/pdump/main.c
+++ b/app/pdump/main.c
@@ -217,12 +217,12 @@ parse_rxtxdev(const char *key, const char *value, void 
*extra_args)
struct pdump_tuples *pt = extra_args;

if (!strcmp(key, PDUMP_RX_DEV_ARG)) {
-   strncpy(pt->rx_dev, value, strlen(value));
+   snprintf(pt->rx_dev, sizeof(pt->rx_dev), "%s", value);
/* identify the tx stream type for pcap vdev */
if (if_nametoindex(pt->rx_dev))
pt->rx_vdev_stream_type = IFACE;
} else if (!strcmp(key, PDUMP_TX_DEV_ARG)) {
-   strncpy(pt->tx_dev, value, strlen(value));
+   snprintf(pt->tx_dev, sizeof(pt->tx_dev), "%s", value);
/* identify the tx stream type for pcap vdev */
if (if_nametoindex(pt->tx_dev))
pt->tx_vdev_stream_type = IFACE;
-- 
2.5.0
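
Note: strncpy(dst, src, strlen(src)) takes its bound from the source, not
from the destination, and never writes the terminating NUL, which is why it
is reported as a string overflow. The snprintf form used in the patch is
bounded by the destination size and always terminates. A minimal sketch of
that idiom follows; copy_name() is a hypothetical helper, and reporting
truncation to the caller is an addition of this example that the patch does
not make.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Copy src into dst (capacity dst_size), always NUL-terminating.
 * Returns 0 if the whole string fit, -1 if it was truncated.
 */
static int
copy_name(char *dst, size_t dst_size, const char *src)
{
	int n;

	assert(dst != NULL && src != NULL && dst_size > 0);

	/* snprintf writes at most dst_size bytes including the NUL. */
	n = snprintf(dst, dst_size, "%s", src);
	if (n < 0 || (size_t)n >= dst_size)
		return -1;
	return 0;
}
```

Using sizeof on the destination array, as the patch does, only works where
the destination is a true array in scope and not a pointer.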

[dpdk-dev] [PATCH v2 2/3] pdump: fix string overflow

2016-06-22 Thread Reshma Pattan

Replaced strncpy with snprintf to copy the strings safely.

Coverity issue 127350: string overflow

Fixes: 278f945402c5 ("pdump: add new library for packet capture")

Signed-off-by: Reshma Pattan 
---
 lib/librte_pdump/rte_pdump.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/librte_pdump/rte_pdump.c b/lib/librte_pdump/rte_pdump.c
index de20b9e..b499d09 100644
--- a/lib/librte_pdump/rte_pdump.c
+++ b/lib/librte_pdump/rte_pdump.c
@@ -799,13 +799,15 @@ pdump_prepare_client_request(char *device, uint16_t queue,
req.flags = flags;
req.op =  operation;
if ((operation & ENABLE) != 0) {
-   strncpy(req.data.en_v1.device, device, strlen(device));
+   snprintf(req.data.en_v1.device, sizeof(req.data.en_v1.device),
+   "%s", device);
req.data.en_v1.queue = queue;
req.data.en_v1.ring = ring;
req.data.en_v1.mp = mp;
req.data.en_v1.filter = filter;
} else {
-   strncpy(req.data.dis_v1.device, device, strlen(device));
+   snprintf(req.data.dis_v1.device, sizeof(req.data.dis_v1.device),
+   "%s", device);
req.data.dis_v1.queue = queue;
req.data.dis_v1.ring = NULL;
req.data.dis_v1.mp = NULL;
-- 
2.5.0

[dpdk-dev] [PATCH v2 1/3] pdump: check getenv return value

2016-06-22 Thread Reshma Pattan

Inside pdump_get_socket_path(), getenv can return
a NULL pointer if no match for SOCKET_PATH_HOME is
found in the environment. A NULL check is added to
return -1 immediately without calling mkdir.
Since pdump_get_socket_path() can now return -1,
the return value is checked wherever this function
is called and an error message is logged.

Coverity issue 127344:  return value check
Coverity issue 127347:  null pointer dereference

Fixes: 278f945402c5 ("pdump: add new library for packet capture")

Signed-off-by: Reshma Pattan 
---
 lib/librte_pdump/rte_pdump.c | 47 +---
 1 file changed, 40 insertions(+), 7 deletions(-)

diff --git a/lib/librte_pdump/rte_pdump.c b/lib/librte_pdump/rte_pdump.c
index c921f51..de20b9e 100644
--- a/lib/librte_pdump/rte_pdump.c
+++ b/lib/librte_pdump/rte_pdump.c
@@ -441,7 +441,7 @@ set_pdump_rxtx_cbs(struct pdump_request *p)
 }

 /* get socket path (/var/run if root, $HOME otherwise) */
-static void
+static int
 pdump_get_socket_path(char *buffer, int bufsz, enum rte_pdump_socktype type)
 {
const char *dir = NULL;
@@ -451,9 +451,16 @@ pdump_get_socket_path(char *buffer, int bufsz, enum rte_pdump_socktype type)
else if (type == RTE_PDUMP_SOCKET_CLIENT && client_socket_dir[0] != 0)
dir = client_socket_dir;
else {
-   if (getuid() != 0)
+   if (getuid() != 0) {
dir = getenv(SOCKET_PATH_HOME);
-   else
+   if (!dir) {
+   RTE_LOG(ERR, PDUMP,
+   "Failed to get environment variable"
+   " value for %s, %s:%d\n",
+   SOCKET_PATH_HOME, __func__, __LINE__);
+   return -1;
+   }
+   } else
dir = SOCKET_PATH_VAR_RUN;
}

@@ -463,6 +470,8 @@ pdump_get_socket_path(char *buffer, int bufsz, enum rte_pdump_socktype type)
else
snprintf(buffer, bufsz, CLIENT_SOCKET, dir, getpid(),
rte_sys_gettid());
+
+   return 0;
 }

 static int
@@ -472,8 +481,14 @@ pdump_create_server_socket(void)
struct sockaddr_un addr;
socklen_t addr_len;

-   pdump_get_socket_path(addr.sun_path, sizeof(addr.sun_path),
+   ret = pdump_get_socket_path(addr.sun_path, sizeof(addr.sun_path),
RTE_PDUMP_SOCKET_SERVER);
+   if (ret != 0) {
+   RTE_LOG(ERR, PDUMP,
+   "Failed to get server socket path: %s:%d\n",
+   __func__, __LINE__);
+   return -1;
+   }
addr.sun_family = AF_UNIX;

/* remove if file already exists */
@@ -604,8 +619,14 @@ rte_pdump_uninit(void)

struct sockaddr_un addr;

-   pdump_get_socket_path(addr.sun_path, sizeof(addr.sun_path),
+   ret = pdump_get_socket_path(addr.sun_path, sizeof(addr.sun_path),
RTE_PDUMP_SOCKET_SERVER);
+   if (ret != 0) {
+   RTE_LOG(ERR, PDUMP,
+   "Failed to get server socket path: %s:%d\n",
+   __func__, __LINE__);
+   return -1;
+   }
ret = unlink(addr.sun_path);
if (ret != 0) {
RTE_LOG(ERR, PDUMP,
@@ -639,8 +660,14 @@ pdump_create_client_socket(struct pdump_request *p)
return ret;
}

-   pdump_get_socket_path(addr.sun_path, sizeof(addr.sun_path),
+   ret = pdump_get_socket_path(addr.sun_path, sizeof(addr.sun_path),
RTE_PDUMP_SOCKET_CLIENT);
+   if (ret != 0) {
+   RTE_LOG(ERR, PDUMP,
+   "Failed to get client socket path: %s:%d\n",
+   __func__, __LINE__);
+   return -1;
+   }
addr.sun_family = AF_UNIX;
addr_len = sizeof(struct sockaddr_un);

@@ -656,9 +683,15 @@ pdump_create_client_socket(struct pdump_request *p)

serv_len = sizeof(struct sockaddr_un);
memset(&serv_addr, 0, sizeof(serv_addr));
-   pdump_get_socket_path(serv_addr.sun_path,
+   ret = pdump_get_socket_path(serv_addr.sun_path,
sizeof(serv_addr.sun_path),
RTE_PDUMP_SOCKET_SERVER);
+   if (ret != 0) {
+   RTE_LOG(ERR, PDUMP,
+   "Failed to get server socket path: %s:%d\n",
+   __func__, __LINE__);
+   break;
+   }
serv_addr.sun_family = AF_UNIX;

n =  sendto(socket_fd, p, sizeof(struct pdump_request), 0,
-- 
2.5.0
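
The pattern in this patch (treat a NULL from getenv as an error and
propagate it to the caller) can be sketched in isolation as follows. This is
only an illustration under simplified assumptions: demo_join_path and
demo_path_from_env are hypothetical helpers, and the path layout is not the
one pdump uses.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Build "<dir>/<file>" into buf. dir may be NULL (for example, the result
 * of getenv() for an unset variable); that case is reported as an error
 * instead of being passed on to snprintf.
 * Returns 0 on success, -1 on a NULL dir or if the path does not fit.
 */
static int
demo_join_path(char *buf, size_t bufsz, const char *dir, const char *file)
{
	int n;

	assert(buf != NULL && bufsz > 0 && file != NULL);

	if (dir == NULL)
		return -1;

	n = snprintf(buf, bufsz, "%s/%s", dir, file);
	if (n < 0 || (size_t)n >= bufsz)
		return -1;
	return 0;
}

/* Look the directory up in the environment and forward any failure. */
static int
demo_path_from_env(char *buf, size_t bufsz, const char *env_name,
		const char *file)
{
	return demo_join_path(buf, bufsz, getenv(env_name), file);
}
```

A caller would then check the result before using the buffer, in the same
way the callers of pdump_get_socket_path() do after this patch.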

[dpdk-dev] [PATCH v2 0/3] fix coverity issues in packet capture framework

2016-06-22 Thread Reshma Pattan

This patchset fixes coverity issues in pdump library and pdump tool.

v2:
Addressed code review comment: use snprintf instead of strncpy.

Reshma Pattan (3):
  pdump: check getenv return value
  pdump: fix string overflow
  app/pdump: fix string overflow

 app/pdump/main.c |  4 ++--
 lib/librte_pdump/rte_pdump.c | 53 
 2 files changed, 46 insertions(+), 11 deletions(-)

-- 
2.5.0

[dpdk-dev] [PATCH v5 17/17] ethdev: get rid of device type

2016-06-22 Thread Shreyansh Jain

Now that hotplug has been moved to eal, there is no reason to keep the device
type in this layer.

Signed-off-by: David Marchand 
Signed-off-by: Shreyansh Jain 
---
 app/test/virtual_pmd.c|  2 +-
 drivers/net/af_packet/rte_eth_af_packet.c |  2 +-
 drivers/net/bonding/rte_eth_bond_api.c|  2 +-
 drivers/net/cxgbe/cxgbe_main.c|  2 +-
 drivers/net/mlx4/mlx4.c   |  2 +-
 drivers/net/mlx5/mlx5.c   |  2 +-
 drivers/net/mpipe/mpipe_tilegx.c  |  2 +-
 drivers/net/null/rte_eth_null.c   |  2 +-
 drivers/net/pcap/rte_eth_pcap.c   |  2 +-
 drivers/net/ring/rte_eth_ring.c   |  2 +-
 drivers/net/vhost/rte_eth_vhost.c |  2 +-
 drivers/net/xenvirt/rte_eth_xenvirt.c |  2 +-
 examples/ip_pipeline/init.c   | 22 --
 lib/librte_ether/rte_ethdev.c |  5 ++---
 lib/librte_ether/rte_ethdev.h | 15 +--
 15 files changed, 15 insertions(+), 51 deletions(-)

diff --git a/app/test/virtual_pmd.c b/app/test/virtual_pmd.c
index b4bd2f2..8a1f0d0 100644
--- a/app/test/virtual_pmd.c
+++ b/app/test/virtual_pmd.c
@@ -581,7 +581,7 @@ virtual_ethdev_create(const char *name, struct ether_addr *mac_addr,
goto err;

/* reserve an ethdev entry */
-   eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_PCI);
+   eth_dev = rte_eth_dev_allocate(name);
if (eth_dev == NULL)
goto err;

diff --git a/drivers/net/af_packet/rte_eth_af_packet.c b/drivers/net/af_packet/rte_eth_af_packet.c
index f17bd7e..36ac102 100644
--- a/drivers/net/af_packet/rte_eth_af_packet.c
+++ b/drivers/net/af_packet/rte_eth_af_packet.c
@@ -648,7 +648,7 @@ rte_pmd_init_internals(const char *name,
}

/* reserve an ethdev entry */
-   *eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
+   *eth_dev = rte_eth_dev_allocate(name);
if (*eth_dev == NULL)
goto error;

diff --git a/drivers/net/bonding/rte_eth_bond_api.c b/drivers/net/bonding/rte_eth_bond_api.c
index 53df9fe..b858ee1 100644
--- a/drivers/net/bonding/rte_eth_bond_api.c
+++ b/drivers/net/bonding/rte_eth_bond_api.c
@@ -189,7 +189,7 @@ rte_eth_bond_create(const char *name, uint8_t mode, uint8_t socket_id)
}

/* reserve an ethdev entry */
-   eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
+   eth_dev = rte_eth_dev_allocate(name);
if (eth_dev == NULL) {
RTE_BOND_LOG(ERR, "Unable to allocate rte_eth_dev");
goto err;
diff --git a/drivers/net/cxgbe/cxgbe_main.c b/drivers/net/cxgbe/cxgbe_main.c
index ceaf5ab..922155b 100644
--- a/drivers/net/cxgbe/cxgbe_main.c
+++ b/drivers/net/cxgbe/cxgbe_main.c
@@ -1150,7 +1150,7 @@ int cxgbe_probe(struct adapter *adapter)
 */

/* reserve an ethdev entry */
-   pi->eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_PCI);
+   pi->eth_dev = rte_eth_dev_allocate(name);
if (!pi->eth_dev)
goto out_free;

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index b594433..ba42c33 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -5715,7 +5715,7 @@ mlx4_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)

snprintf(name, sizeof(name), "%s port %u",
 ibv_get_device_name(ibv_dev), port);
-   eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_PCI);
+   eth_dev = rte_eth_dev_allocate(name);
}
if (eth_dev == NULL) {
ERROR("can not allocate rte ethdev");
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 1989a37..f6399fc 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -519,7 +519,7 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)

snprintf(name, sizeof(name), "%s port %u",
 ibv_get_device_name(ibv_dev), port);
-   eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_PCI);
+   eth_dev = rte_eth_dev_allocate(name);
}
if (eth_dev == NULL) {
ERROR("can not allocate rte ethdev");
diff --git a/drivers/net/mpipe/mpipe_tilegx.c b/drivers/net/mpipe/mpipe_tilegx.c
index 26e1424..9de556e 100644
--- a/drivers/net/mpipe/mpipe_tilegx.c
+++ b/drivers/net/mpipe/mpipe_tilegx.c
@@ -1587,7 +1587,7 @@ rte_pmd_mpipe_devinit(const char *ifname,
return -ENODEV;
}

-   eth_dev = rte_eth_dev_allocate(ifname, RTE_ETH_DEV_VIRTUAL);
+   eth_dev = rte_eth_dev_allocate(ifname);
if (!eth_dev) {
RTE_LOG(ERR, PMD, "%s: Failed to allocate device.\n", ifname);
rte_free(priv);
diff --git a/drivers/net/null/rt

[dpdk-dev] [PATCH v5 16/17] ethdev: convert to eal hotplug

2016-06-22 Thread Shreyansh Jain

Remove bus logic from ethdev hotplug by using eal for this.

Current api is preserved:
- the last port that has been created is tracked to return it to the
  application when attaching,
- the internal device name is reused when detaching.

We can not get rid of ethdev hotplug yet since we still need some mechanism
to inform applications of port creation/removal to substitute for ethdev
hotplug api.

dev_type field in struct rte_eth_dev and rte_eth_dev_allocate are kept as
is, but this information is not needed anymore and is removed in the following
commit.

Signed-off-by: David Marchand 
Signed-off-by: Shreyansh Jain 
---
 lib/librte_ether/rte_ethdev.c | 207 +++---
 1 file changed, 33 insertions(+), 174 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 6f2b169..10e81e1 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -72,6 +72,7 @@
 static const char *MZ_RTE_ETH_DEV_DATA = "rte_eth_dev_data";
 struct rte_eth_dev rte_eth_devices[RTE_MAX_ETHPORTS];
 static struct rte_eth_dev_data *rte_eth_dev_data;
+static uint8_t eth_dev_last_created_port;
 static uint8_t nb_ports;

 /* spinlock for eth device callbacks */
@@ -216,6 +217,7 @@ rte_eth_dev_allocate(const char *name, enum rte_eth_dev_type type)
eth_dev->data->port_id = port_id;
eth_dev->attached = DEV_ATTACHED;
eth_dev->dev_type = type;
+   eth_dev_last_created_port = port_id;
nb_ports++;
return eth_dev;
 }
@@ -347,27 +349,6 @@ rte_eth_dev_count(void)
return nb_ports;
 }

-static enum rte_eth_dev_type
-rte_eth_dev_get_device_type(uint8_t port_id)
-{
-   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, RTE_ETH_DEV_UNKNOWN);
-   return rte_eth_devices[port_id].dev_type;
-}
-
-static int
-rte_eth_dev_get_addr_by_port(uint8_t port_id, struct rte_pci_addr *addr)
-{
-   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
-
-   if (addr == NULL) {
-   RTE_PMD_DEBUG_TRACE("Null pointer is specified\n");
-   return -EINVAL;
-   }
-
-   *addr = rte_eth_devices[port_id].pci_dev->addr;
-   return 0;
-}
-
 int
 rte_eth_dev_get_name_by_port(uint8_t port_id, char *name)
 {
@@ -413,34 +394,6 @@ rte_eth_dev_get_port_by_name(const char *name, uint8_t *port_id)
 }

 static int
-rte_eth_dev_get_port_by_addr(const struct rte_pci_addr *addr, uint8_t *port_id)
-{
-   int i;
-   struct rte_pci_device *pci_dev = NULL;
-
-   if (addr == NULL) {
-   RTE_PMD_DEBUG_TRACE("Null pointer is specified\n");
-   return -EINVAL;
-   }
-
-   *port_id = RTE_MAX_ETHPORTS;
-
-   for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
-
-   pci_dev = rte_eth_devices[i].pci_dev;
-
-   if (pci_dev &&
-   !rte_eal_compare_pci_addr(&pci_dev->addr, addr)) {
-
-   *port_id = i;
-
-   return 0;
-   }
-   }
-   return -ENODEV;
-}
-
-static int
 rte_eth_dev_is_detachable(uint8_t port_id)
 {
uint32_t dev_flags;
@@ -465,124 +418,45 @@ rte_eth_dev_is_detachable(uint8_t port_id)
return 1;
 }

-/* attach the new physical device, then store port_id of the device */
-static int
-rte_eth_dev_attach_pdev(struct rte_pci_addr *addr, uint8_t *port_id)
-{
-   /* Invoke probe func of the driver can handle the new device. */
-   if (rte_eal_pci_probe_one(addr))
-   goto err;
-
-   if (rte_eth_dev_get_port_by_addr(addr, port_id))
-   goto err;
-
-   return 0;
-err:
-   return -1;
-}
-
-/* detach the new physical device, then store pci_addr of the device */
-static int
-rte_eth_dev_detach_pdev(uint8_t port_id, struct rte_pci_addr *addr)
-{
-   struct rte_pci_addr freed_addr;
-   struct rte_pci_addr vp;
-
-   /* get pci address by port id */
-   if (rte_eth_dev_get_addr_by_port(port_id, &freed_addr))
-   goto err;
-
-   /* Zeroed pci addr means the port comes from virtual device */
-   vp.domain = vp.bus = vp.devid = vp.function = 0;
-   if (rte_eal_compare_pci_addr(&vp, &freed_addr) == 0)
-   goto err;
-
-   /* invoke devuninit func of the pci driver,
-* also remove the device from pci_device_list */
-   if (rte_eal_pci_detach(&freed_addr))
-   goto err;
-
-   *addr = freed_addr;
-   return 0;
-err:
-   return -1;
-}
-
-/* attach the new virtual device, then store port_id of the device */
-static int
-rte_eth_dev_attach_vdev(const char *vdevargs, uint8_t *port_id)
-{
-   char *name = NULL, *args = NULL;
-   int ret = -1;
-
-   /* parse vdevargs, then retrieve device name and args */
-   if (rte_eal_parse_devargs_str(vdevargs, &name, &args))
-   goto end;
-
-   /* walk around dev_driver_list to find the driver of the device,
-* then invoke probe function of the driver.
-* rte_eal_

[dpdk-dev] [PATCH v5 15/17] eal: add hotplug operations for pci and vdev

2016-06-22 Thread Shreyansh Jain

Hotplug, which deals with resources, should come from the layer that already
handles them, i.e. eal.

For both attach and detach operations, 'name' is used to select the bus
that will handle the request.

Signed-off-by: David Marchand 
Signed-off-by: Shreyansh Jain 
---
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  2 ++
 lib/librte_eal/common/eal_common_dev.c  | 47 +
 lib/librte_eal/common/include/rte_dev.h | 25 +
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  2 ++
 4 files changed, 76 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 1852c4a..6f9324f 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -159,5 +159,7 @@ DPDK_16.07 {
rte_keepalive_mark_sleep;
rte_keepalive_register_relay_callback;
rte_thread_setname;
+   rte_eal_dev_attach;
+   rte_eal_dev_detach;

 } DPDK_16.04;
diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index a8a4146..14c6cf1 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -150,3 +150,50 @@ rte_eal_vdev_uninit(const char *name)
RTE_LOG(ERR, EAL, "no driver found for %s\n", name);
return -EINVAL;
 }
+
+int rte_eal_dev_attach(const char *name, const char *devargs)
+{
+   struct rte_pci_addr addr;
+   int ret = -1;
+
+   if (name == NULL || devargs == NULL) {
+   RTE_LOG(ERR, EAL, "Invalid device arguments provided\n");
+   return ret;
+   }
+
+   if (eal_parse_pci_DomBDF(name, &addr) == 0) {
+   if (rte_eal_pci_probe_one(&addr) < 0)
+   goto err;
+
+   } else {
+   if (rte_eal_vdev_init(name, devargs))
+   goto err;
+   }
+
+   return 0;
+
+err:
+   RTE_LOG(ERR, EAL, "Driver cannot attach the device\n");
+   return ret;
+}
+
+int rte_eal_dev_detach(const char *name)
+{
+   struct rte_pci_addr addr;
+
+   if (name == NULL)
+   goto err;
+
+   if (eal_parse_pci_DomBDF(name, &addr) == 0) {
+   if (rte_eal_pci_detach(&addr) < 0)
+   goto err;
+   } else {
+   if (rte_eal_vdev_uninit(name))
+   goto err;
+   }
+   return 0;
+
+err:
+   RTE_LOG(ERR, EAL, "Driver cannot detach the device\n");
+   return -1;
+}
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index 85e48f2..b1c0520 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -178,6 +178,31 @@ int rte_eal_vdev_init(const char *name, const char *args);
  */
 int rte_eal_vdev_uninit(const char *name);

+/**
+ * Attach a resource to a registered driver.
+ *
+ * @param name
+ *   The resource name, that refers to a pci resource or some private
+ *   way of designating a resource for vdev drivers. Based on this
+ *   resource name, eal will identify a driver capable of handling
+ *   this resource and pass this resource to the driver probing
+ *   function.
+ * @param devargs
+ *   Device arguments to be passed to the driver.
+ * @return
+ *   0 on success, negative on error.
+ */
+int rte_eal_dev_attach(const char *name, const char *devargs);
+
+/**
+ * Detach a resource from its driver.
+ *
+ * @param name
+ *   Same description as for rte_eal_dev_attach().
+ *   Here, eal will call the driver detaching function.
+ */
+int rte_eal_dev_detach(const char *name);
+
 #define PMD_REGISTER_DRIVER(d)\
 RTE_INIT(devinitfn_ ##d);\
 static void devinitfn_ ##d(void)\
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index 0513467..6c6163e 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -162,5 +162,7 @@ DPDK_16.07 {
rte_keepalive_mark_sleep;
rte_keepalive_register_relay_callback;
rte_thread_setname;
+   rte_eal_dev_attach;
+   rte_eal_dev_detach;

 } DPDK_16.04;
-- 
2.7.4
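
To illustrate the name-based bus selection described in this commit
message: the name is first tried as a PCI address, and anything that does
not parse as one is handed to the vdev path. The sketch below assumes a
simplified sscanf-based check. demo_parse_pci_name, demo_select_bus and the
demo_ types are hypothetical and are not the eal implementation, which uses
eal_parse_pci_DomBDF() and its own field validation.

```c
#include <assert.h>
#include <stdio.h>

enum demo_bus {
	DEMO_BUS_PCI,
	DEMO_BUS_VDEV,
};

/* Hypothetical, simplified PCI address (fields are not range-checked). */
struct demo_pci_addr {
	unsigned int domain;
	unsigned int bus;
	unsigned int devid;
	unsigned int function;
};

/*
 * Return 0 if name looks like "domain:bus:devid.function" in hex with
 * nothing after it, -1 otherwise. This is a loose check for illustration.
 */
static int
demo_parse_pci_name(const char *name, struct demo_pci_addr *addr)
{
	char trailing;
	int n;

	n = sscanf(name, "%4x:%2x:%2x.%1x%c", &addr->domain, &addr->bus,
			&addr->devid, &addr->function, &trailing);

	/* exactly four fields converted, and no trailing character */
	return (n == 4) ? 0 : -1;
}

/* Pick the bus the same way the attach/detach helpers pick a code path. */
static enum demo_bus
demo_select_bus(const char *name)
{
	struct demo_pci_addr addr;

	assert(name != NULL);

	if (demo_parse_pci_name(name, &addr) == 0)
		return DEMO_BUS_PCI;
	return DEMO_BUS_VDEV;
}
```

One consequence of this design is that a vdev name must never parse as a
PCI address, otherwise the request is routed to the PCI path.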

[dpdk-dev] [PATCH v5 14/17] ethdev: do not scan all pci devices on attach

2016-06-22 Thread Shreyansh Jain

No need to scan all devices, we only need to update the device being
attached.

Signed-off-by: David Marchand 
Signed-off-by: Shreyansh Jain 
---
 lib/librte_eal/common/eal_common_pci.c | 11 ---
 lib/librte_ether/rte_ethdev.c  |  3 ---
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index dfd0a8c..d05dda4 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -339,6 +339,11 @@ rte_eal_pci_probe_one(const struct rte_pci_addr *addr)
if (addr == NULL)
return -1;

+   /* update current pci device in global list, kernel bindings might have
+* changed since last time we looked at it */
+   if (pci_update_device(addr) < 0)
+   goto err_return;
+
TAILQ_FOREACH(dev, &pci_device_list, next) {
if (rte_eal_compare_pci_addr(&dev->addr, addr))
continue;
@@ -351,9 +356,9 @@ rte_eal_pci_probe_one(const struct rte_pci_addr *addr)
return -1;

 err_return:
-   RTE_LOG(WARNING, EAL, "Requested device " PCI_PRI_FMT
-   " cannot be used\n", dev->addr.domain, dev->addr.bus,
-   dev->addr.devid, dev->addr.function);
+   RTE_LOG(WARNING, EAL,
+   "Requested device " PCI_PRI_FMT " cannot be used\n",
+   addr->domain, addr->bus, addr->devid, addr->function);
return -1;
 }

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index ace8353..6f2b169 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -469,9 +469,6 @@ rte_eth_dev_is_detachable(uint8_t port_id)
 static int
 rte_eth_dev_attach_pdev(struct rte_pci_addr *addr, uint8_t *port_id)
 {
-   /* re-construct pci_device_list */
-   if (rte_eal_pci_scan())
-   goto err;
/* Invoke probe func of the driver can handle the new device. */
if (rte_eal_pci_probe_one(addr))
goto err;
-- 
2.7.4

[dpdk-dev] [PATCH v5 13/17] pci: add a helper to update a device

2016-06-22 Thread Shreyansh Jain

This helper updates a pci device object with the latest information it can
find.
It will be used mainly by the hotplug code.

Signed-off-by: David Marchand 
Signed-off-by: Shreyansh Jain 
---
 lib/librte_eal/bsdapp/eal/eal_pci.c | 49 +
 lib/librte_eal/common/eal_common_pci.c  |  2 --
 lib/librte_eal/common/eal_private.h | 13 +
 lib/librte_eal/common/include/rte_pci.h |  3 ++
 lib/librte_eal/linuxapp/eal/eal_pci.c   | 13 +
 5 files changed, 78 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 880483d..013c953 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -406,6 +406,55 @@ error:
return -1;
 }

+int
+pci_update_device(const struct rte_pci_addr *addr)
+{
+   int fd;
+   struct pci_conf matches[2];
+   struct pci_match_conf match = {
+   .pc_sel = {
+   .pc_domain = addr->domain,
+   .pc_bus = addr->bus,
+   .pc_dev = addr->devid,
+   .pc_func = addr->function,
+   },
+   };
+   struct pci_conf_io conf_io = {
+   .pat_buf_len = 0,
+   .num_patterns = 1,
+   .patterns = &match,
+   .match_buf_len = sizeof(matches),
+   .matches = &matches[0],
+   };
+
+   fd = open("/dev/pci", O_RDONLY);
+   if (fd < 0) {
+   RTE_LOG(ERR, EAL, "%s(): error opening /dev/pci\n", __func__);
+   goto error;
+   }
+
+   if (ioctl(fd, PCIOCGETCONF, &conf_io) < 0) {
+   RTE_LOG(ERR, EAL, "%s(): error with ioctl on /dev/pci: %s\n",
+   __func__, strerror(errno));
+   goto error;
+   }
+
+   if (conf_io.num_matches != 1)
+   goto error;
+
+   if (pci_scan_one(fd, &matches[0]) < 0)
+   goto error;
+
+   close(fd);
+
+   return 0;
+
+error:
+   if (fd >= 0)
+   close(fd);
+   return -1;
+}
+
 /* Read PCI config space. */
 int rte_eal_pci_read_config(const struct rte_pci_device *dev,
void *buf, size_t len, off_t offset)
diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index fee4aa5..dfd0a8c 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -87,8 +87,6 @@ struct pci_driver_list pci_driver_list =
 struct pci_device_list pci_device_list =
TAILQ_HEAD_INITIALIZER(pci_device_list);

-#define SYSFS_PCI_DEVICES "/sys/bus/pci/devices"
-
 const char *pci_get_sysfs_path(void)
 {
const char *path = NULL;
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 06a68f6..b8ff9c5 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -152,6 +152,19 @@ struct rte_pci_driver;
 struct rte_pci_device;

 /**
+ * Update a pci device object by asking the kernel for the latest information.
+ *
+ * This function is private to EAL.
+ *
+ * @param addr
+ * The PCI Bus-Device-Function address to look for
+ * @return
+ *   - 0 on success.
+ *   - negative on error.
+ */
+int pci_update_device(const struct rte_pci_addr *addr);
+
+/**
  * Unbind kernel driver for this device
  *
  * This function is private to EAL.
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 1666a55..eed6b56 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -107,6 +107,9 @@ const char *pci_get_sysfs_path(void);
 /** Nb. of values in PCI resource format. */
 #define PCI_RESOURCE_FMT_NVAL 3

+/** Default sysfs path for PCI device search. */
+#define SYSFS_PCI_DEVICES "/sys/bus/pci/devices"
+
 /**
  * A structure describing a PCI resource.
  */
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index bfc410f..0a368c5 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -417,6 +417,19 @@ pci_scan_one(const char *dirname, uint16_t domain, uint8_t bus,
return 0;
 }

+int
+pci_update_device(const struct rte_pci_addr *addr)
+{
+   char filename[PATH_MAX];
+
+   snprintf(filename, sizeof(filename), "%s/" PCI_PRI_FMT,
+SYSFS_PCI_DEVICES, addr->domain, addr->bus, addr->devid,
+addr->function);
+
+   return pci_scan_one(filename, addr->domain, addr->bus, addr->devid,
+   addr->function);
+}
+
 /*
  * split up a pci address into its constituent parts.
  */
-- 
2.7.4
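
As an aside on the Linux variant above: the helper only has to rebuild the
sysfs directory name of one device from its address and rescan that single
entry. A minimal sketch of the path construction follows, assuming plain
unsigned fields and a format equivalent to PCI_PRI_FMT. The demo_ names are
hypothetical, and the truncation check is an addition that the patch does
not have.

```c
#include <assert.h>
#include <stdio.h>

#define DEMO_SYSFS_PCI_DEVICES "/sys/bus/pci/devices"
/* Same layout as PCI_PRI_FMT: 4-digit domain, 2-digit bus and devid. */
#define DEMO_PCI_FMT "%.4x:%.2x:%.2x.%x"

/* Hypothetical, simplified PCI address. */
struct demo_pci_addr {
	unsigned int domain;
	unsigned int bus;
	unsigned int devid;
	unsigned int function;
};

/*
 * Write the sysfs directory of the device at addr into buf.
 * Returns 0 on success, -1 if the path does not fit in size bytes.
 */
static int
demo_pci_sysfs_path(const struct demo_pci_addr *addr, char *buf, size_t size)
{
	int n;

	assert(addr != NULL && buf != NULL);

	n = snprintf(buf, size, "%s/" DEMO_PCI_FMT, DEMO_SYSFS_PCI_DEVICES,
			addr->domain, addr->bus, addr->devid, addr->function);
	if (n < 0 || (size_t)n >= size)
		return -1;
	return 0;
}
```

For an address with domain 0, bus 1, devid 0 and function 0, this yields
/sys/bus/pci/devices/0000:01:00.0.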

[dpdk-dev] [PATCH v5 12/17] pci: add a helper for device name

2016-06-22 Thread Shreyansh Jain

eal is a better place than crypto / ethdev for naming resources.
Add a helper in eal and make use of it in crypto / ethdev.

Signed-off-by: David Marchand 
Signed-off-by: Shreyansh Jain 
---
 lib/librte_cryptodev/rte_cryptodev.c| 27 ---
 lib/librte_eal/common/include/rte_pci.h | 25 +
 lib/librte_ether/rte_ethdev.c   | 24 
 3 files changed, 33 insertions(+), 43 deletions(-)

diff --git a/lib/librte_cryptodev/rte_cryptodev.c b/lib/librte_cryptodev/rte_cryptodev.c
index a7cb33a..3b587e4 100644
--- a/lib/librte_cryptodev/rte_cryptodev.c
+++ b/lib/librte_cryptodev/rte_cryptodev.c
@@ -276,23 +276,6 @@ rte_cryptodev_pmd_allocate(const char *name, int socket_id)
return cryptodev;
 }

-static inline int
-rte_cryptodev_create_unique_device_name(char *name, size_t size,
-   struct rte_pci_device *pci_dev)
-{
-   int ret;
-
-   if ((name == NULL) || (pci_dev == NULL))
-   return -EINVAL;
-
-   ret = snprintf(name, size, "%d:%d.%d",
-   pci_dev->addr.bus, pci_dev->addr.devid,
-   pci_dev->addr.function);
-   if (ret < 0)
-   return ret;
-   return 0;
-}
-
 int
 rte_cryptodev_pmd_release_device(struct rte_cryptodev *cryptodev)
 {
@@ -355,9 +338,8 @@ rte_cryptodev_pci_probe(struct rte_pci_driver *pci_drv,
if (cryptodrv == NULL)
return -ENODEV;

-   /* Create unique Crypto device name using PCI address */
-   rte_cryptodev_create_unique_device_name(cryptodev_name,
-   sizeof(cryptodev_name), pci_dev);
+   rte_eal_pci_device_name(&pci_dev->addr, cryptodev_name,
+   sizeof(cryptodev_name));

cryptodev = rte_cryptodev_pmd_allocate(cryptodev_name, rte_socket_id());
if (cryptodev == NULL)
@@ -412,9 +394,8 @@ rte_cryptodev_pci_remove(struct rte_pci_device *pci_dev)
if (pci_dev == NULL)
return -EINVAL;

-   /* Create unique device name using PCI address */
-   rte_cryptodev_create_unique_device_name(cryptodev_name,
-   sizeof(cryptodev_name), pci_dev);
+   rte_eal_pci_device_name(&pci_dev->addr, cryptodev_name,
+   sizeof(cryptodev_name));

cryptodev = rte_cryptodev_pmd_get_named_dev(cryptodev_name);
if (cryptodev == NULL)
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index ac890fc..1666a55 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -82,6 +82,7 @@ extern "C" {
 #include 
 #include 

+#include 
 #include 

 TAILQ_HEAD(pci_device_list, rte_pci_device); /**< PCI devices in D-linked Q. */
@@ -95,6 +96,7 @@ const char *pci_get_sysfs_path(void);

 /** Formatting string for PCI device identifier: Ex: 0000:00:01.0 */
 #define PCI_PRI_FMT "%.4" PRIx16 ":%.2" PRIx8 ":%.2" PRIx8 ".%" PRIx8
+#define PCI_PRI_STR_SIZE sizeof("XXXX:XX:XX.X")

 /** Short formatting string, without domain, for PCI device: Ex: 00:01.0 */
 #define PCI_SHORT_PRI_FMT "%.2" PRIx8 ":%.2" PRIx8 ".%" PRIx8
@@ -308,6 +310,29 @@ eal_parse_pci_DomBDF(const char *input, struct rte_pci_addr *dev_addr)
 }
 #undef GET_PCIADDR_FIELD

+/**
+ * Utility function to write a pci device name, this device name can later be
+ * used to retrieve the corresponding rte_pci_addr using above functions.
+ *
+ * @param addr
+ * The PCI Bus-Device-Function address
+ * @param output
+ * The output buffer string
+ * @param size
+ * The output buffer size
+ * @return
+ *  0 on success, negative on error.
+ */
+static inline void
+rte_eal_pci_device_name(const struct rte_pci_addr *addr,
+   char *output, size_t size)
+{
+   RTE_VERIFY(size >= PCI_PRI_STR_SIZE);
+   RTE_VERIFY(snprintf(output, size, PCI_PRI_FMT,
+   addr->domain, addr->bus,
+   addr->devid, addr->function) >= 0);
+}
+
 /* Compare two PCI device addresses. */
 /**
  * Utility function to compare two PCI device addresses.
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 06065fe..ace8353 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -220,20 +220,6 @@ rte_eth_dev_allocate(const char *name, enum rte_eth_dev_type type)
return eth_dev;
 }

-static int
-rte_eth_dev_create_unique_device_name(char *name, size_t size,
-   struct rte_pci_device *pci_dev)
-{
-   int ret;
-
-   ret = snprintf(name, size, "%d:%d.%d",
-   pci_dev->addr.bus, pci_dev->addr.devid,
-   pci_dev->addr.function);
-   if (ret < 0)
-   return ret;
-   return 0;
-}
-
 int
 rte_eth_dev_release_port(struct rte_eth_dev *eth_dev)
 {
@@ -257,9 +243,8 @@ rte_eth_dev_pci_probe(struct rte_pci_driver *pci_drv,

eth_drv = (struct eth_driver *)pci_drv;

-   /* Cr

[dpdk-dev] [PATCH v5 11/17] eal/linux: move back interrupt thread init before setting affinity

2016-06-22 Thread Shreyansh Jain

Now that virtio pci driver is initialized in a constructor, iopl() stuff
happens early enough so that interrupt thread can be created right after
plugin loading.
This way, chelsio driver should be happy again [1].

[1] http://dpdk.org/ml/archives/dev/2015-November/028289.html

Signed-off-by: David Marchand 
Tested-by: Rahul Lakkireddy 
Signed-off-by: Shreyansh Jain 
---
 lib/librte_eal/linuxapp/eal/eal.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 29fba52..748daca 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -821,6 +821,9 @@ rte_eal_init(int argc, char **argv)
if (eal_plugins_init() < 0)
rte_panic("Cannot init plugins\n");

+   if (rte_eal_intr_init() < 0)
+   rte_panic("Cannot init interrupt-handling thread\n");
+
eal_thread_init_master(rte_config.master_lcore);

ret = eal_thread_dump_affinity(cpuset, RTE_CPU_AFFINITY_STR_LEN);
@@ -832,9 +835,6 @@ rte_eal_init(int argc, char **argv)
if (rte_eal_dev_init() < 0)
rte_panic("Cannot init pmd devices\n");

-   if (rte_eal_intr_init() < 0)
-   rte_panic("Cannot init interrupt-handling thread\n");
-
RTE_LCORE_FOREACH_SLAVE(i) {

/*
-- 
2.7.4

[dpdk-dev] [PATCH v5 10/17] ethdev: get rid of eth driver register callback

2016-06-22 Thread Shreyansh Jain

Now that all pdev are pci drivers, we don't need to register ethdev drivers
through a dedicated channel.

Signed-off-by: David Marchand 
Signed-off-by: Shreyansh Jain 
---
 lib/librte_ether/rte_ethdev.c  | 22 --
 lib/librte_ether/rte_ethdev.h  | 12 
 lib/librte_ether/rte_ether_version.map |  1 -
 3 files changed, 35 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 312c42c..06065fe 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -340,28 +340,6 @@ rte_eth_dev_pci_remove(struct rte_pci_device *pci_dev)
return 0;
 }

-/**
- * Register an Ethernet [Poll Mode] driver.
- *
- * Function invoked by the initialization function of an Ethernet driver
- * to simultaneously register itself as a PCI driver and as an Ethernet
- * Poll Mode Driver.
- * Invokes the rte_eal_pci_register() function to register the *pci_drv*
- * structure embedded in the *eth_drv* structure, after having stored the
- * address of the rte_eth_dev_init() function in the *devinit* field of
- * the *pci_drv* structure.
- * During the PCI probing phase, the rte_eth_dev_init() function is
- * invoked for each PCI [Ethernet device] matching the embedded PCI
- * identifiers provided by the driver.
- */
-void
-rte_eth_driver_register(struct eth_driver *eth_drv)
-{
-   eth_drv->pci_drv.devinit = rte_eth_dev_pci_probe;
-   eth_drv->pci_drv.devuninit = rte_eth_dev_pci_remove;
-   rte_eal_pci_register(&eth_drv->pci_drv);
-}
-
 int
 rte_eth_dev_is_valid_port(uint8_t port_id)
 {
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 2249466..ffd24e4 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1862,18 +1862,6 @@ struct eth_driver {
 };

 /**
- * @internal
- * A function invoked by the initialization function of an Ethernet driver
- * to simultaneously register itself as a PCI driver and as an Ethernet
- * Poll Mode Driver (PMD).
- *
- * @param eth_drv
- *   The pointer to the *eth_driver* structure associated with
- *   the Ethernet driver.
- */
-void rte_eth_driver_register(struct eth_driver *eth_drv);
-
-/**
  * Convert a numerical speed in Mbps to a bitmap flag that can be used in
  * the bitmap link_speeds of the struct rte_eth_conf
  *
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index cf4581c..8151007 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -80,7 +80,6 @@ DPDK_2.2 {
rte_eth_dev_vlan_filter;
rte_eth_dev_wd_timeout_store;
rte_eth_dma_zone_reserve;
-   rte_eth_driver_register;
rte_eth_led_off;
rte_eth_led_on;
rte_eth_link;
-- 
2.7.4

[dpdk-dev] [PATCH v5 09/17] crypto: get rid of crypto driver register callback

2016-06-22 Thread Shreyansh Jain

Now that all pdev are pci drivers, we don't need to register crypto drivers
through a dedicated channel.

Signed-off-by: David Marchand 
Signed-off-by: Shreyansh Jain 
---
 lib/librte_cryptodev/rte_cryptodev.c   | 22 ---
 lib/librte_cryptodev/rte_cryptodev_pmd.h   | 30 --
 lib/librte_cryptodev/rte_cryptodev_version.map |  1 -
 3 files changed, 53 deletions(-)

diff --git a/lib/librte_cryptodev/rte_cryptodev.c b/lib/librte_cryptodev/rte_cryptodev.c
index 65a2e29..a7cb33a 100644
--- a/lib/librte_cryptodev/rte_cryptodev.c
+++ b/lib/librte_cryptodev/rte_cryptodev.c
@@ -444,28 +444,6 @@ rte_cryptodev_pci_remove(struct rte_pci_device *pci_dev)
return 0;
 }

-int
-rte_cryptodev_pmd_driver_register(struct rte_cryptodev_driver *cryptodrv,
-   enum pmd_type type)
-{
-   /* Call crypto device initialization directly if device is virtual */
-   if (type == PMD_VDEV)
-   return rte_cryptodev_pci_probe((struct rte_pci_driver *)cryptodrv,
-   NULL);
-
-   /*
-* Register PCI driver for physical device intialisation during
-* PCI probing
-*/
-   cryptodrv->pci_drv.devinit = rte_cryptodev_pci_probe;
-   cryptodrv->pci_drv.devuninit = rte_cryptodev_pci_remove;
-
-   rte_eal_pci_register(&cryptodrv->pci_drv);
-
-   return 0;
-}
-
-
 uint16_t
 rte_cryptodev_queue_pair_count(uint8_t dev_id)
 {
diff --git a/lib/librte_cryptodev/rte_cryptodev_pmd.h b/lib/librte_cryptodev/rte_cryptodev_pmd.h
index 3fb7c7c..99fd69e 100644
--- a/lib/librte_cryptodev/rte_cryptodev_pmd.h
+++ b/lib/librte_cryptodev/rte_cryptodev_pmd.h
@@ -491,36 +491,6 @@ rte_cryptodev_pmd_virtual_dev_init(const char *name, size_t dev_private_size,
 extern int
 rte_cryptodev_pmd_release_device(struct rte_cryptodev *cryptodev);

-
-/**
- * Register a Crypto [Poll Mode] driver.
- *
- * Function invoked by the initialization function of a Crypto driver
- * to simultaneously register itself as Crypto Poll Mode Driver and to either:
- *
- * a - register itself as PCI driver if the crypto device is a physical
- * device, by invoking the rte_eal_pci_register() function to
- * register the *pci_drv* structure embedded in the *crypto_drv*
- * structure, after having stored the address of the
- * rte_cryptodev_init() function in the *devinit* field of the
- * *pci_drv* structure.
- *
- * During the PCI probing phase, the rte_cryptodev_init()
- * function is invoked for each PCI [device] matching the
- * embedded PCI identifiers provided by the driver.
- *
- * b, complete the initialization sequence if the device is a virtual
- * device by calling the rte_cryptodev_init() directly passing a
- * NULL parameter for the rte_pci_device structure.
- *
- *   @param crypto_drv crypto_driver structure associated with the crypto
- * driver.
- *   @param type   pmd type
- */
-extern int
-rte_cryptodev_pmd_driver_register(struct rte_cryptodev_driver *crypto_drv,
-   enum pmd_type type);
-
 /**
  * Executes all the user application registered callbacks for the specific
  * device.
diff --git a/lib/librte_cryptodev/rte_cryptodev_version.map b/lib/librte_cryptodev/rte_cryptodev_version.map
index 8d0edfb..e0a9620 100644
--- a/lib/librte_cryptodev/rte_cryptodev_version.map
+++ b/lib/librte_cryptodev/rte_cryptodev_version.map
@@ -14,7 +14,6 @@ DPDK_16.04 {
rte_cryptodev_info_get;
rte_cryptodev_pmd_allocate;
rte_cryptodev_pmd_callback_process;
-   rte_cryptodev_pmd_driver_register;
rte_cryptodev_pmd_release_device;
rte_cryptodev_pmd_virtual_dev_init;
rte_cryptodev_sym_session_create;
-- 
2.7.4

[dpdk-dev] [PATCH v5 08/17] drivers: convert all pdev drivers as pci drivers

2016-06-22 Thread Shreyansh Jain

Simplify crypto and ethdev pci drivers init by using newly introduced
init macros and helpers.
Those drivers then don't need to register as "rte_driver"s anymore.

virtio and mlx* drivers use the general purpose RTE_INIT macro, as they both
need some special stuff to be done before registering a pci driver.

Signed-off-by: David Marchand 
Signed-off-by: Shreyansh Jain 
---
 drivers/crypto/qat/rte_qat_cryptodev.c  | 16 +++
 drivers/net/bnx2x/bnx2x_ethdev.c| 35 +---
 drivers/net/cxgbe/cxgbe_ethdev.c| 24 +++--
 drivers/net/e1000/em_ethdev.c   | 16 +++
 drivers/net/e1000/igb_ethdev.c  | 40 +---
 drivers/net/ena/ena_ethdev.c| 18 +++--
 drivers/net/enic/enic_ethdev.c  | 23 +++-
 drivers/net/fm10k/fm10k_ethdev.c| 23 +++-
 drivers/net/i40e/i40e_ethdev.c  | 26 +++---
 drivers/net/i40e/i40e_ethdev_vf.c   | 25 +++---
 drivers/net/ixgbe/ixgbe_ethdev.c| 47 +
 drivers/net/mlx4/mlx4.c | 20 +++---
 drivers/net/mlx5/mlx5.c | 19 +++--
 drivers/net/nfp/nfp_net.c   | 21 +++
 drivers/net/qede/qede_ethdev.c  | 40 ++--
 drivers/net/szedata2/rte_eth_szedata2.c | 25 +++---
 drivers/net/virtio/virtio_ethdev.c  | 26 +-
 drivers/net/vmxnet3/vmxnet3_ethdev.c| 23 +++-
 18 files changed, 76 insertions(+), 391 deletions(-)

diff --git a/drivers/crypto/qat/rte_qat_cryptodev.c b/drivers/crypto/qat/rte_qat_cryptodev.c
index 0ff3944..970970a 100644
--- a/drivers/crypto/qat/rte_qat_cryptodev.c
+++ b/drivers/crypto/qat/rte_qat_cryptodev.c
@@ -117,21 +117,11 @@ static struct rte_cryptodev_driver rte_qat_pmd = {
.name = "rte_qat_pmd",
.id_table = pci_id_qat_map,
.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
+   .devinit = rte_cryptodev_pci_probe,
+   .devuninit = rte_cryptodev_pci_remove,
},
.cryptodev_init = crypto_qat_dev_init,
.dev_private_size = sizeof(struct qat_pmd_private),
 };

-static int
-rte_qat_pmd_init(const char *name __rte_unused, const char *params __rte_unused)
-{
-   PMD_INIT_FUNC_TRACE();
-   return rte_cryptodev_pmd_driver_register(&rte_qat_pmd, PMD_PDEV);
-}
-
-static struct rte_driver pmd_qat_drv = {
-   .type = PMD_PDEV,
-   .init = rte_qat_pmd_init,
-};
-
-PMD_REGISTER_DRIVER(pmd_qat_drv);
+RTE_EAL_PCI_REGISTER(rte_qat_pmd);
diff --git a/drivers/net/bnx2x/bnx2x_ethdev.c b/drivers/net/bnx2x/bnx2x_ethdev.c
index 071b44f..5ab3c75 100644
--- a/drivers/net/bnx2x/bnx2x_ethdev.c
+++ b/drivers/net/bnx2x/bnx2x_ethdev.c
@@ -506,11 +506,15 @@ static struct eth_driver rte_bnx2x_pmd = {
.name = "rte_bnx2x_pmd",
.id_table = pci_id_bnx2x_map,
.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
+   .devinit = rte_eth_dev_pci_probe,
+   .devuninit = rte_eth_dev_pci_remove,
},
.eth_dev_init = eth_bnx2x_dev_init,
.dev_private_size = sizeof(struct bnx2x_softc),
 };

+RTE_EAL_PCI_REGISTER(rte_bnx2x_pmd);
+
 /*
  * virtual function driver struct
  */
@@ -519,36 +523,11 @@ static struct eth_driver rte_bnx2xvf_pmd = {
.name = "rte_bnx2xvf_pmd",
.id_table = pci_id_bnx2xvf_map,
.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
+   .devinit = rte_eth_dev_pci_probe,
+   .devuninit = rte_eth_dev_pci_remove,
},
.eth_dev_init = eth_bnx2xvf_dev_init,
.dev_private_size = sizeof(struct bnx2x_softc),
 };

-static int rte_bnx2x_pmd_init(const char *name __rte_unused, const char *params __rte_unused)
-{
-   PMD_INIT_FUNC_TRACE();
-   rte_eth_driver_register(&rte_bnx2x_pmd);
-
-   return 0;
-}
-
-static int rte_bnx2xvf_pmd_init(const char *name __rte_unused, const char *params __rte_unused)
-{
-   PMD_INIT_FUNC_TRACE();
-   rte_eth_driver_register(&rte_bnx2xvf_pmd);
-
-   return 0;
-}
-
-static struct rte_driver rte_bnx2x_driver = {
-   .type = PMD_PDEV,
-   .init = rte_bnx2x_pmd_init,
-};
-
-static struct rte_driver rte_bnx2xvf_driver = {
-   .type = PMD_PDEV,
-   .init = rte_bnx2xvf_pmd_init,
-};
-
-PMD_REGISTER_DRIVER(rte_bnx2x_driver);
-PMD_REGISTER_DRIVER(rte_bnx2xvf_driver);
+RTE_EAL_PCI_REGISTER(rte_bnx2xvf_pmd);
diff --git a/drivers/net/cxgbe/cxgbe_ethdev.c b/drivers/net/cxgbe/cxgbe_ethdev.c
index 04eddaf..1389371 100644
--- a/drivers/net/cxgbe/cxgbe_ethdev.c
+++ b/drivers/net/cxgbe/cxgbe_ethdev.c
@@ -869,29 +869,11 @@ static struct eth_driver rte_cxgbe_pmd = {
.name = "rte_cxgbe_pmd",
.id_table = cxgb4_pci_tbl,
.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
+

[dpdk-dev] [PATCH v5 07/17] ethdev: export init/uninit common wrappers for pci drivers

2016-06-22 Thread Shreyansh Jain

Preparing for getting rid of eth_drv, here are two wrappers that can be
used by pci drivers that assume a 1 to 1 association between pci resource and
upper interface.

Signed-off-by: David Marchand 
Signed-off-by: Shreyansh Jain 
---
 lib/librte_ether/rte_ethdev.c  | 14 +++---
 lib/librte_ether/rte_ethdev.h  | 13 +
 lib/librte_ether/rte_ether_version.map |  3 +++
 3 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 42aaef7..312c42c 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -245,9 +245,9 @@ rte_eth_dev_release_port(struct rte_eth_dev *eth_dev)
return 0;
 }

-static int
-rte_eth_dev_init(struct rte_pci_driver *pci_drv,
-struct rte_pci_device *pci_dev)
+int
+rte_eth_dev_pci_probe(struct rte_pci_driver *pci_drv,
+ struct rte_pci_device *pci_dev)
 {
struct eth_driver*eth_drv;
struct rte_eth_dev *eth_dev;
@@ -299,8 +299,8 @@ rte_eth_dev_init(struct rte_pci_driver *pci_drv,
return diag;
 }

-static int
-rte_eth_dev_uninit(struct rte_pci_device *pci_dev)
+int
+rte_eth_dev_pci_remove(struct rte_pci_device *pci_dev)
 {
const struct eth_driver *eth_drv;
struct rte_eth_dev *eth_dev;
@@ -357,8 +357,8 @@ rte_eth_dev_uninit(struct rte_pci_device *pci_dev)
 void
 rte_eth_driver_register(struct eth_driver *eth_drv)
 {
-   eth_drv->pci_drv.devinit = rte_eth_dev_init;
-   eth_drv->pci_drv.devuninit = rte_eth_dev_uninit;
+   eth_drv->pci_drv.devinit = rte_eth_dev_pci_probe;
+   eth_drv->pci_drv.devuninit = rte_eth_dev_pci_remove;
rte_eal_pci_register(&eth_drv->pci_drv);
 }

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index bd93bf6..2249466 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -4354,6 +4354,19 @@ rte_eth_dev_get_port_by_name(const char *name, uint8_t *port_id);
 int
 rte_eth_dev_get_name_by_port(uint8_t port_id, char *name);

+/**
+ * Wrapper for use by pci drivers as a .devinit function to attach to a ethdev
+ * interface.
+ */
+int rte_eth_dev_pci_probe(struct rte_pci_driver *pci_drv,
+ struct rte_pci_device *pci_dev);
+
+/**
+ * Wrapper for use by pci drivers as a .devuninit function to detach a ethdev
+ * interface.
+ */
+int rte_eth_dev_pci_remove(struct rte_pci_device *pci_dev);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index 97ed0b0..cf4581c 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -140,4 +140,7 @@ DPDK_16.07 {
rte_eth_dev_get_name_by_port;
rte_eth_dev_get_port_by_name;
rte_eth_xstats_get_names;
+   rte_eth_dev_pci_probe;
+   rte_eth_dev_pci_remove;
+
 } DPDK_16.04;
-- 
2.7.4
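
The "1 to 1 association between pci resource and upper interface" that these wrappers assume can be sketched as follows. This is only an illustration under stated assumptions: every `demo_*` name is hypothetical, the structures are heavily reduced stand-ins for `rte_pci_driver`, `eth_driver` and `rte_eth_dev`, and the real `rte_eth_dev_pci_probe()` does considerably more (port allocation, private data, error handling). The sketch assumes the generic wrapper recovers the outer driver structure from the embedded PCI driver, which only works while `pci_drv` remains the first member.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-ins for the DPDK structures. */
struct demo_pci_device { int id; };
struct demo_pci_driver;
typedef int (*demo_devinit_t)(struct demo_pci_driver *,
			      struct demo_pci_device *);

struct demo_pci_driver {
	const char *name;
	demo_devinit_t devinit;
};

struct demo_eth_dev { struct demo_pci_device *pci_dev; };

struct demo_eth_driver {
	struct demo_pci_driver pci_drv;	/* must stay the first member */
	int (*eth_dev_init)(struct demo_eth_dev *dev);
};

/* Exactly one upper-layer interface per PCI resource in this sketch. */
static struct demo_eth_dev demo_port;
static int demo_init_calls;

/* Generic wrapper: a driver only has to point .devinit at it. */
int demo_eth_dev_pci_probe(struct demo_pci_driver *pci_drv,
			   struct demo_pci_device *pci_dev)
{
	struct demo_eth_driver *eth_drv;

	assert(pci_drv != NULL && pci_dev != NULL);
	/* Valid because pci_drv is the first member of demo_eth_driver. */
	eth_drv = (struct demo_eth_driver *)pci_drv;
	demo_port.pci_dev = pci_dev;
	return eth_drv->eth_dev_init(&demo_port);
}

/* Driver-specific init hook, called through the generic wrapper. */
static int demo_dev_init(struct demo_eth_dev *dev)
{
	assert(dev->pci_dev != NULL);
	demo_init_calls++;
	return 0;
}

static struct demo_eth_driver demo_pmd = {
	.pci_drv = {
		.name = "demo_pmd",
		.devinit = demo_eth_dev_pci_probe,
	},
	.eth_dev_init = demo_dev_init,
};

/* Stand-in for the PCI bus scan invoking the driver's devinit hook. */
int demo_bus_probe(struct demo_pci_device *pci_dev)
{
	return demo_pmd.pci_drv.devinit(&demo_pmd.pci_drv, pci_dev);
}
```

Calling `demo_bus_probe()` with a device reaches `demo_dev_init()` without the driver writing any probe code of its own, which is the point of exporting the wrappers.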

[dpdk-dev] [PATCH v5 06/17] crypto: export init/uninit common wrappers for pci drivers

2016-06-22 Thread Shreyansh Jain

Preparing for getting rid of rte_cryptodev_driver, here are two wrappers
that can be used by pci drivers that assume a 1 to 1 association between
pci resource and upper interface.

Signed-off-by: David Marchand 
Signed-off-by: Shreyansh Jain 
---
 lib/librte_cryptodev/rte_cryptodev.c   | 16 
 lib/librte_cryptodev/rte_cryptodev_pmd.h   | 12 
 lib/librte_cryptodev/rte_cryptodev_version.map |  8 
 3 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/lib/librte_cryptodev/rte_cryptodev.c b/lib/librte_cryptodev/rte_cryptodev.c
index b0d806c..65a2e29 100644
--- a/lib/librte_cryptodev/rte_cryptodev.c
+++ b/lib/librte_cryptodev/rte_cryptodev.c
@@ -340,9 +340,9 @@ rte_cryptodev_pmd_virtual_dev_init(const char *name, size_t dev_private_size,
return cryptodev;
 }

-static int
-rte_cryptodev_init(struct rte_pci_driver *pci_drv,
-   struct rte_pci_device *pci_dev)
+int
+rte_cryptodev_pci_probe(struct rte_pci_driver *pci_drv,
+   struct rte_pci_device *pci_dev)
 {
struct rte_cryptodev_driver *cryptodrv;
struct rte_cryptodev *cryptodev;
@@ -401,8 +401,8 @@ rte_cryptodev_init(struct rte_pci_driver *pci_drv,
return -ENXIO;
 }

-static int
-rte_cryptodev_uninit(struct rte_pci_device *pci_dev)
+int
+rte_cryptodev_pci_remove(struct rte_pci_device *pci_dev)
 {
const struct rte_cryptodev_driver *cryptodrv;
struct rte_cryptodev *cryptodev;
@@ -450,15 +450,15 @@ rte_cryptodev_pmd_driver_register(struct rte_cryptodev_driver *cryptodrv,
 {
/* Call crypto device initialization directly if device is virtual */
if (type == PMD_VDEV)
-   return rte_cryptodev_init((struct rte_pci_driver *)cryptodrv,
+   return rte_cryptodev_pci_probe((struct rte_pci_driver *)cryptodrv,
NULL);

/*
 * Register PCI driver for physical device intialisation during
 * PCI probing
 */
-   cryptodrv->pci_drv.devinit = rte_cryptodev_init;
-   cryptodrv->pci_drv.devuninit = rte_cryptodev_uninit;
+   cryptodrv->pci_drv.devinit = rte_cryptodev_pci_probe;
+   cryptodrv->pci_drv.devuninit = rte_cryptodev_pci_remove;

rte_eal_pci_register(&cryptodrv->pci_drv);

diff --git a/lib/librte_cryptodev/rte_cryptodev_pmd.h b/lib/librte_cryptodev/rte_cryptodev_pmd.h
index c977c61..3fb7c7c 100644
--- a/lib/librte_cryptodev/rte_cryptodev_pmd.h
+++ b/lib/librte_cryptodev/rte_cryptodev_pmd.h
@@ -534,6 +534,18 @@ rte_cryptodev_pmd_driver_register(struct rte_cryptodev_driver *crypto_drv,
 void rte_cryptodev_pmd_callback_process(struct rte_cryptodev *dev,
enum rte_cryptodev_event_type event);

+/**
+ * Wrapper for use by pci drivers as a .devinit function to attach to a crypto
+ * interface.
+ */
+int rte_cryptodev_pci_probe(struct rte_pci_driver *pci_drv,
+   struct rte_pci_device *pci_dev);
+
+/**
+ * Wrapper for use by pci drivers as a .devuninit function to detach a crypto
+ * interface.
+ */
+int rte_cryptodev_pci_remove(struct rte_pci_device *pci_dev);

 #ifdef __cplusplus
 }
diff --git a/lib/librte_cryptodev/rte_cryptodev_version.map b/lib/librte_cryptodev/rte_cryptodev_version.map
index 41004e1..8d0edfb 100644
--- a/lib/librte_cryptodev/rte_cryptodev_version.map
+++ b/lib/librte_cryptodev/rte_cryptodev_version.map
@@ -32,3 +32,11 @@ DPDK_16.04 {

local: *;
 };
+
+DPDK_16.07 {
+   global:
+
+   rte_cryptodev_pci_probe;
+   rte_cryptodev_pci_remove;
+
+} DPDK_16.04;
-- 
2.7.4

[dpdk-dev] [PATCH v5 05/17] eal: introduce init macros

2016-06-22 Thread Shreyansh Jain

Introduce a RTE_INIT macro used to mark an init function as a constructor.
Current eal macros have been converted to use this (no functional impact).
RTE_EAL_PCI_REGISTER is added as a helper for pci drivers.

Suggested-by: Jan Viktorin 
Signed-off-by: David Marchand 
Signed-off-by: Shreyansh Jain 
---
 lib/librte_eal/common/include/rte_dev.h   | 4 ++--
 lib/librte_eal/common/include/rte_eal.h   | 3 +++
 lib/librte_eal/common/include/rte_pci.h   | 8 
 lib/librte_eal/common/include/rte_tailq.h | 4 ++--
 4 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index f1b5507..85e48f2 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -179,8 +179,8 @@ int rte_eal_vdev_init(const char *name, const char *args);
 int rte_eal_vdev_uninit(const char *name);

 #define PMD_REGISTER_DRIVER(d)\
-void devinitfn_ ##d(void);\
-void __attribute__((constructor, used)) devinitfn_ ##d(void)\
+RTE_INIT(devinitfn_ ##d);\
+static void devinitfn_ ##d(void)\
 {\
rte_eal_driver_register(&d);\
 }
diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
index a71d6f5..186f3c6 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -252,6 +252,9 @@ static inline int rte_gettid(void)
return RTE_PER_LCORE(_thread_id);
 }

+#define RTE_INIT(func) \
+static void __attribute__((constructor, used)) func(void)
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index fa74962..ac890fc 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -470,6 +470,14 @@ void rte_eal_pci_dump(FILE *f);
  */
 void rte_eal_pci_register(struct rte_pci_driver *driver);

+/** Helper for PCI device registeration from driver (eth, crypto) instance */
+#define RTE_EAL_PCI_REGISTER(name) \
+RTE_INIT(pciinitfn_ ##name); \
+static void pciinitfn_ ##name(void) \
+{ \
+   rte_eal_pci_register(&(name).pci_drv); \
+}
+
 /**
  * Unregister a PCI driver.
  *
diff --git a/lib/librte_eal/common/include/rte_tailq.h b/lib/librte_eal/common/include/rte_tailq.h
index 4a686e6..71ed3bb 100644
--- a/lib/librte_eal/common/include/rte_tailq.h
+++ b/lib/librte_eal/common/include/rte_tailq.h
@@ -148,8 +148,8 @@ struct rte_tailq_head *rte_eal_tailq_lookup(const char *name);
 int rte_eal_tailq_register(struct rte_tailq_elem *t);

 #define EAL_REGISTER_TAILQ(t) \
-void tailqinitfn_ ##t(void); \
-void __attribute__((constructor, used)) tailqinitfn_ ##t(void) \
+RTE_INIT(tailqinitfn_ ##t); \
+static void tailqinitfn_ ##t(void) \
 { \
if (rte_eal_tailq_register(&t) < 0) \
rte_panic("Cannot initialize tailq: %s\n", t.name); \
-- 
2.7.4
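
For readers unfamiliar with constructor-based registration, the two macros introduced above can be sketched outside DPDK as follows. This is a minimal sketch under stated assumptions: `DEMO_INIT` copies the `RTE_INIT` definition from the patch under a different name, while `demo_pci_register()`, the fixed-size registry and the `demo_*` structures are hypothetical stand-ins for `rte_eal_pci_register()` and the real driver types. It relies on the GCC/Clang `constructor` attribute.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as RTE_INIT in the patch, under a demo name. */
#define DEMO_INIT(func) \
static void __attribute__((constructor, used)) func(void)

/* Hypothetical, reduced stand-ins for the driver structures. */
struct demo_pci_driver { const char *name; };
struct demo_eth_driver { struct demo_pci_driver pci_drv; };

#define DEMO_MAX_DRIVERS 8
static struct demo_pci_driver *demo_registry[DEMO_MAX_DRIVERS];
static unsigned int demo_nb_drivers;

/* Stand-in for rte_eal_pci_register(): remember the driver. */
static void demo_pci_register(struct demo_pci_driver *drv)
{
	assert(drv != NULL && demo_nb_drivers < DEMO_MAX_DRIVERS);
	demo_registry[demo_nb_drivers++] = drv;
}

/* Same shape as RTE_EAL_PCI_REGISTER: declare, then define, a constructor. */
#define DEMO_PCI_REGISTER(name) \
DEMO_INIT(pciinitfn_ ##name); \
static void pciinitfn_ ##name(void) \
{ \
	demo_pci_register(&(name).pci_drv); \
}

static struct demo_eth_driver demo_pmd = {
	.pci_drv = { .name = "demo_pmd" },
};

/* Expands to a constructor that registers demo_pmd before main() runs. */
DEMO_PCI_REGISTER(demo_pmd)
```

The registry is zero-initialized static storage, so it is already usable when the constructor fires; by the time `main()` starts, `demo_nb_drivers` is 1 without any explicit init call from the application.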

[dpdk-dev] [PATCH v5 04/17] eal: remove duplicate function declaration

2016-06-22 Thread Shreyansh Jain

rte_eal_dev_init is declared in both eal_private.h and rte_dev.h since its
introduction.
This function has been exported in ABI, so remove it from eal_private.h

Fixes: e57f20e05177 ("eal: make vdev init path generic for both virtual and pci devices")
Signed-off-by: David Marchand 
Signed-off-by: Shreyansh Jain 
---
 lib/librte_eal/common/eal_private.h | 7 ---
 lib/librte_eal/linuxapp/eal/eal.c   | 1 +
 2 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 857dc3e..06a68f6 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -259,13 +259,6 @@ int rte_eal_intr_init(void);
 int rte_eal_alarm_init(void);

 /**
- * This function initialises any virtual devices
- *
- * This function is private to the EAL.
- */
-int rte_eal_dev_init(void);
-
-/**
  * Function is to check if the kernel module(like, vfio, vfio_iommu_type1,
  * etc.) loaded.
  *
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 4f22c18..29fba52 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -70,6 +70,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
-- 
2.7.4

[dpdk-dev] [PATCH v5 03/17] drivers: align pci driver definitions

2016-06-22 Thread Shreyansh Jain

Pure coding style, but it might make it easier later if we want to move
fields in rte_cryptodev_driver and eth_driver structures.

Signed-off-by: David Marchand 
Signed-off-by: Shreyansh Jain 
---
 drivers/crypto/qat/rte_qat_cryptodev.c | 2 +-
 drivers/net/ena/ena_ethdev.c   | 2 +-
 drivers/net/nfp/nfp_net.c  | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/crypto/qat/rte_qat_cryptodev.c b/drivers/crypto/qat/rte_qat_cryptodev.c
index f46ec85..0ff3944 100644
--- a/drivers/crypto/qat/rte_qat_cryptodev.c
+++ b/drivers/crypto/qat/rte_qat_cryptodev.c
@@ -113,7 +113,7 @@ crypto_qat_dev_init(__attribute__((unused)) struct rte_cryptodev_driver *crypto_
 }

 static struct rte_cryptodev_driver rte_qat_pmd = {
-   {
+   .pci_drv = {
.name = "rte_qat_pmd",
.id_table = pci_id_qat_map,
.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index e157587..8d01e9a 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -1427,7 +1427,7 @@ static uint16_t eth_ena_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 }

 static struct eth_driver rte_ena_pmd = {
-   {
+   .pci_drv = {
.name = "rte_ena_pmd",
.id_table = pci_id_ena_map,
.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
diff --git a/drivers/net/nfp/nfp_net.c b/drivers/net/nfp/nfp_net.c
index 5c9f350..ef7011e 100644
--- a/drivers/net/nfp/nfp_net.c
+++ b/drivers/net/nfp/nfp_net.c
@@ -2463,7 +2463,7 @@ static struct rte_pci_id pci_id_nfp_net_map[] = {
 };

 static struct eth_driver rte_nfp_net_pmd = {
-   {
+   .pci_drv = {
.name = "rte_nfp_net_pmd",
.id_table = pci_id_nfp_net_map,
.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
-- 
2.7.4
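
Why the `.pci_drv = {` form helps if fields are moved later can be shown with a small sketch. The `demo_*` structures below are hypothetical, reduced stand-ins for `rte_pci_driver` and `eth_driver`; only the initializer style is the point.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced stand-ins for the driver structures. */
struct demo_pci_driver {
	const char *name;
	unsigned int drv_flags;
};

struct demo_eth_driver {
	struct demo_pci_driver pci_drv;
	size_t dev_private_size;
};

/*
 * Designated initializers name each member, so this definition keeps
 * working if members of demo_eth_driver are reordered or new ones are
 * added. A positional "{ { ... }, 64 }" would silently depend on
 * pci_drv being the first member.
 */
static struct demo_eth_driver demo_pmd = {
	.pci_drv = {
		.name = "demo_pmd",
		.drv_flags = 0x1,
	},
	.dev_private_size = 64,
};

/* Small accessor so the sketch exposes something to check. */
size_t demo_private_size(void)
{
	return demo_pmd.dev_private_size;
}
```

Any member left out of a designated initializer is zero-initialized, which is also why later patches in this series can add `.devinit` and `.devuninit` to each driver without touching the other fields.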

[dpdk-dev] [PATCH v5 02/17] crypto: no need for a crypto pmd type

2016-06-22 Thread Shreyansh Jain

This information is not used and just adds noise.

Signed-off-by: David Marchand 
Signed-off-by: Shreyansh Jain 
---
 lib/librte_cryptodev/rte_cryptodev.c | 8 +++-
 lib/librte_cryptodev/rte_cryptodev.h | 2 --
 lib/librte_cryptodev/rte_cryptodev_pmd.h | 3 +--
 3 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/lib/librte_cryptodev/rte_cryptodev.c b/lib/librte_cryptodev/rte_cryptodev.c
index 960e2d5..b0d806c 100644
--- a/lib/librte_cryptodev/rte_cryptodev.c
+++ b/lib/librte_cryptodev/rte_cryptodev.c
@@ -230,7 +230,7 @@ rte_cryptodev_find_free_device_index(void)
 }

 struct rte_cryptodev *
-rte_cryptodev_pmd_allocate(const char *name, enum pmd_type type, int socket_id)
+rte_cryptodev_pmd_allocate(const char *name, int socket_id)
 {
struct rte_cryptodev *cryptodev;
uint8_t dev_id;
@@ -269,7 +269,6 @@ rte_cryptodev_pmd_allocate(const char *name, enum pmd_type type, int socket_id)
cryptodev->data->dev_started = 0;

cryptodev->attached = RTE_CRYPTODEV_ATTACHED;
-   cryptodev->pmd_type = type;

cryptodev_globals.nb_devs++;
}
@@ -318,7 +317,7 @@ rte_cryptodev_pmd_virtual_dev_init(const char *name, size_t dev_private_size,
struct rte_cryptodev *cryptodev;

/* allocate device structure */
-   cryptodev = rte_cryptodev_pmd_allocate(name, PMD_VDEV, socket_id);
+   cryptodev = rte_cryptodev_pmd_allocate(name, socket_id);
if (cryptodev == NULL)
return NULL;

@@ -360,8 +359,7 @@ rte_cryptodev_init(struct rte_pci_driver *pci_drv,
rte_cryptodev_create_unique_device_name(cryptodev_name,
sizeof(cryptodev_name), pci_dev);

-   cryptodev = rte_cryptodev_pmd_allocate(cryptodev_name, PMD_PDEV,
-   rte_socket_id());
+   cryptodev = rte_cryptodev_pmd_allocate(cryptodev_name, rte_socket_id());
if (cryptodev == NULL)
return -ENOMEM;

diff --git a/lib/librte_cryptodev/rte_cryptodev.h b/lib/librte_cryptodev/rte_cryptodev.h
index 27cf8ef..f22eb43 100644
--- a/lib/librte_cryptodev/rte_cryptodev.h
+++ b/lib/librte_cryptodev/rte_cryptodev.h
@@ -700,8 +700,6 @@ struct rte_cryptodev {

enum rte_cryptodev_type dev_type;
/**< Crypto device type */
-   enum pmd_type pmd_type;
-   /**< PMD type - PDEV / VDEV */

struct rte_cryptodev_cb_list link_intr_cbs;
/**< User application callback for interrupts if present */
diff --git a/lib/librte_cryptodev/rte_cryptodev_pmd.h b/lib/librte_cryptodev/rte_cryptodev_pmd.h
index 7d049ea..c977c61 100644
--- a/lib/librte_cryptodev/rte_cryptodev_pmd.h
+++ b/lib/librte_cryptodev/rte_cryptodev_pmd.h
@@ -454,13 +454,12 @@ struct rte_cryptodev_ops {
  * to that slot for the driver to use.
  *
  * @param  nameUnique identifier name for each device
- * @param  typeDevice type of this Crypto device
  * @param  socket_id   Socket to allocate resources on.
  * @return
  *   - Slot in the rte_dev_devices array for a new device;
  */
 struct rte_cryptodev *
-rte_cryptodev_pmd_allocate(const char *name, enum pmd_type type, int socket_id);
+rte_cryptodev_pmd_allocate(const char *name, int socket_id);

 /**
  * Creates a new virtual crypto device and returns the pointer
-- 
2.7.4

[dpdk-dev] [PATCH v5 01/17] pci: no need for dynamic tailq init

2016-06-22 Thread Shreyansh Jain

These lists can be initialized once and for all at build time.
With this, those lists are only manipulated in a common place
(and we could even make them private).

A nice side effect is that pci drivers can now register in constructors.

Signed-off-by: David Marchand 
Reviewed-by: Jan Viktorin 
Signed-off-by: Shreyansh Jain 
---
 lib/librte_eal/bsdapp/eal/eal_pci.c| 3 ---
 lib/librte_eal/common/eal_common_pci.c | 6 --
 lib/librte_eal/linuxapp/eal/eal_pci.c  | 3 ---
 3 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 7fdd6f1..880483d 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -623,9 +623,6 @@ rte_eal_pci_ioport_unmap(struct rte_pci_ioport *p)
 int
 rte_eal_pci_init(void)
 {
-   TAILQ_INIT(&pci_driver_list);
-   TAILQ_INIT(&pci_device_list);
-
/* for debug purposes, PCI can be disabled */
if (internal_config.no_pci)
return 0;
diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index ba5283d..fee4aa5 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -82,8 +82,10 @@

 #include "eal_private.h"

-struct pci_driver_list pci_driver_list;
-struct pci_device_list pci_device_list;
+struct pci_driver_list pci_driver_list =
+   TAILQ_HEAD_INITIALIZER(pci_driver_list);
+struct pci_device_list pci_device_list =
+   TAILQ_HEAD_INITIALIZER(pci_device_list);

 #define SYSFS_PCI_DEVICES "/sys/bus/pci/devices"

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index f9c3efd..bfc410f 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -743,9 +743,6 @@ rte_eal_pci_ioport_unmap(struct rte_pci_ioport *p)
 int
 rte_eal_pci_init(void)
 {
-   TAILQ_INIT(&pci_driver_list);
-   TAILQ_INIT(&pci_device_list);
-
/* for debug purposes, PCI can be disabled */
if (internal_config.no_pci)
return 0;
-- 
2.7.4
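
The "nice side effect" mentioned in the commit message can be illustrated with plain `<sys/queue.h>`. The sketch below is an assumption-laden stand-in, not DPDK code: `demo_driver`, `demo_register()` and the list name are hypothetical. It shows that a list head initialized with `TAILQ_HEAD_INITIALIZER` is valid before `main()`, so a constructor may insert into it. With the old scheme, a zeroed head has a NULL `tqh_last`, so an early `TAILQ_INSERT_TAIL` would dereference NULL, and a later `TAILQ_INIT()` would discard anything already registered.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct rte_pci_driver. */
struct demo_driver {
	TAILQ_ENTRY(demo_driver) next;
	const char *name;
};

TAILQ_HEAD(demo_driver_list, demo_driver);

/* Initialized at build time: usable before main(), no runtime init call. */
static struct demo_driver_list demo_driver_list =
	TAILQ_HEAD_INITIALIZER(demo_driver_list);

/* Stand-in for rte_eal_pci_register(). */
static void demo_register(struct demo_driver *drv)
{
	assert(drv != NULL);
	TAILQ_INSERT_TAIL(&demo_driver_list, drv, next);
}

static struct demo_driver demo_a = { .name = "demo_a" };

/* Runs before main(); safe only because the head needs no TAILQ_INIT(). */
static void __attribute__((constructor, used)) demo_a_init(void)
{
	demo_register(&demo_a);
}

/* Count the registered drivers by walking the list. */
unsigned int demo_driver_count(void)
{
	struct demo_driver *drv;
	unsigned int n = 0;

	TAILQ_FOREACH(drv, &demo_driver_list, next)
		n++;
	return n;
}
```

By the time `main()` runs, `demo_driver_count()` reports the driver registered from the constructor, with no ordering dependency on an init function.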

[dpdk-dev] [PATCH v5 00/17] Prepare for rte_device / rte_driver

2016-06-22 Thread Shreyansh Jain

* Original patch series is from David Marchand [1], [2].
* Cover letter text has been modified to make it author agnostic

David created the original patchset based on the discussions on list [3].
Being a large piece of work, this patchset introduces first level of changes
for generalizing the driver-device relationship for supporting hotplug.

Pending work, as per discussions in thread [3]:
- Hierarchical relationship between rte_driver/device, pci_*, crypto_*
- Cleaner device init/deinit methods (probably from rte_driver onwards)
- Moving generic flags/fields from pci_* structure to rte_* structure
- Removing dependency on devargs for pdev/vdev distinction
- Device/Driver lists: discussion and decision on separate or unified lists

This patchset is based on master (34d279)

Changes since v4:
- Fix compilation issue after rebase on HEAD (913154e) in previous series
- Retain rte_eth_dev_get_port_by_name and rte_eth_dev_get_name_by_port which
  were removed by previous patchset. These are being used by pdump library

Changes since v3:
- rebase over HEAD (913154e)
- Update arguments to RTE_EAL_PCI_REGISTER macro as per Jan's suggestion
- modify qede driver to use RTE_EAL_PCI_REGISTER
- Argument check in hotplug functions

Changes since v2:
- rebase over HEAD (d76c193)
- Move SYSFS_PCI_DRIVERS macro to rte_pci.h to avoid compilation issue

Changes since v1:
- rebased on HEAD, new drivers should be okay
- patches have been split into smaller pieces
- RTE_INIT macro has been added, but in the end, I am not sure it is useful
- device type has been removed from ethdev, as it was used only by hotplug
- getting rid of pmd type in eal patch (patch 5 of initial series) has been
  dropped for now, we can do this once vdev drivers have been converted

[1] http://dpdk.org/ml/archives/dev/2016-January/032387.html
[2] http://dpdk.org/ml/archives/dev/2016-April/037686.html
[3] http://dpdk.org/ml/archives/dev/2016-January/031390.html

David Marchand, Shreyansh Jain (17):
  pci: no need for dynamic tailq init
  crypto: no need for a crypto pmd type
  drivers: align pci driver definitions
  eal: remove duplicate function declaration
  eal: introduce init macros
  crypto: export init/uninit common wrappers for pci drivers
  ethdev: export init/uninit common wrappers for pci drivers
  drivers: convert all pdev drivers as pci drivers
  crypto: get rid of crypto driver register callback
  ethdev: get rid of eth driver register callback
  eal/linux: move back interrupt thread init before setting affinity
  pci: add a helper for device name
  pci: add a helper to update a device
  ethdev: do not scan all pci devices on attach
  eal: add hotplug operations for pci and vdev
  ethdev: convert to eal hotplug
  ethdev: get rid of device type

 app/test/virtual_pmd.c  |   2 +-
 drivers/crypto/qat/rte_qat_cryptodev.c  |  18 +-
 drivers/net/af_packet/rte_eth_af_packet.c   |   2 +-
 drivers/net/bnx2x/bnx2x_ethdev.c|  35 +--
 drivers/net/bonding/rte_eth_bond_api.c  |   2 +-
 drivers/net/cxgbe/cxgbe_ethdev.c|  24 +--
 drivers/net/cxgbe/cxgbe_main.c  |   2 +-
 drivers/net/e1000/em_ethdev.c   |  16 +-
 drivers/net/e1000/igb_ethdev.c  |  40 +---
 drivers/net/ena/ena_ethdev.c|  20 +-
 drivers/net/enic/enic_ethdev.c  |  23 +-
 drivers/net/fm10k/fm10k_ethdev.c|  23 +-
 drivers/net/i40e/i40e_ethdev.c  |  26 +--
 drivers/net/i40e/i40e_ethdev_vf.c   |  25 +--
 drivers/net/ixgbe/ixgbe_ethdev.c|  47 +---
 drivers/net/mlx4/mlx4.c |  22 +-
 drivers/net/mlx5/mlx5.c |  21 +-
 drivers/net/mpipe/mpipe_tilegx.c|   2 +-
 drivers/net/nfp/nfp_net.c   |  23 +-
 drivers/net/null/rte_eth_null.c |   2 +-
 drivers/net/pcap/rte_eth_pcap.c |   2 +-
 drivers/net/qede/qede_ethdev.c  |  40 +---
 drivers/net/ring/rte_eth_ring.c |   2 +-
 drivers/net/szedata2/rte_eth_szedata2.c |  25 +--
 drivers/net/vhost/rte_eth_vhost.c   |   2 +-
 drivers/net/virtio/virtio_ethdev.c  |  26 +--
 drivers/net/vmxnet3/vmxnet3_ethdev.c|  23 +-
 drivers/net/xenvirt/rte_eth_xenvirt.c   |   2 +-
 examples/ip_pipeline/init.c |  22 --
 lib/librte_cryptodev/rte_cryptodev.c|  67 ++
 lib/librte_cryptodev/rte_cryptodev.h|   2 -
 lib/librte_cryptodev/rte_cryptodev_pmd.h|  45 ++--
 lib/librte_cryptodev/rte_cryptodev_version.map  |   9 +-
 lib/librte_eal/bsdapp/eal/eal_pci.c |  52 -
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |   2 +
 lib/librte_eal/common/eal_common_dev.c  |  47 
 lib/librte_eal/common/eal_common_pci.c  |  19 +-
 lib/librte_eal/common/eal_private.h |  20 +-
 lib/librte_

[dpdk-dev] [PATCH] port: fix build when KNI support is not enabled

2016-06-22 Thread Panu Matilainen

Commit 9fc37d1c071c is missing a conditional in the dependencies,
causing builds to fail when KNI is not enabled:
== Build lib/librte_port
  LD librte_port.so.3
/usr/bin/ld: cannot find -lrte_kni
collect2: error: ld returned 1 exit status

Fixes: 9fc37d1c071c ("port: support KNI")

Signed-off-by: Panu Matilainen 
---
 lib/librte_port/Makefile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_port/Makefile b/lib/librte_port/Makefile
index 0fc929b..3d84a0e 100644
--- a/lib/librte_port/Makefile
+++ b/lib/librte_port/Makefile
@@ -82,6 +82,8 @@ DEPDIRS-$(CONFIG_RTE_LIBRTE_PORT) += lib/librte_mempool
 DEPDIRS-$(CONFIG_RTE_LIBRTE_PORT) += lib/librte_ether
 DEPDIRS-$(CONFIG_RTE_LIBRTE_PORT) += lib/librte_ip_frag
 DEPDIRS-$(CONFIG_RTE_LIBRTE_PORT) += lib/librte_sched
+ifeq ($(CONFIG_RTE_LIBRTE_KNI),y)
 DEPDIRS-$(CONFIG_RTE_LIBRTE_PORT) += lib/librte_kni
+endif

 include $(RTE_SDK)/mk/rte.lib.mk
-- 
2.5.5

[dpdk-dev] [dpdk-users] Get interface speed through IOCTL

2016-06-22 Thread Thomas Monjalon

2016-06-22 17:49, Gadre Nayan:
> I need to get the interface speed through standard IOCTL call:
[...]
> 
> However, for 10G interface I do not read a correct speed, since it may
> not be supported.

Yes

> Our 1G cards are I350 and 10G card is I40.
[...]
> There is no support for 1base. So to get the ioctl working for 10G
> card, is it a trivial change of adding few more support options and
> adding another case SPEED_1 clause, or is it more involved ?

You are talking about KNI which has some old ethtool support for
some igb and ixgbe NICs.
My opinion is that we should drop this ethtool support, which does not
compile everywhere and is very limited.

If you want to get the speed information, you have several other ways
to explore:
- communicating with the DPDK application
- make a secondary application process
- add the feature in the secondary process app/proc_info
- use examples/ethtool (and check if it can be promoted in lib/)
- implement a kernel module upstream

If you just need to get the speed info, I feel app/proc_info would be
a good idea.

[dpdk-dev] librte_meter compilation fails on IBM Power8

2016-06-22 Thread Nélio Laranjeiro

Hi Cristian, Chao,

I have encountered a compilation failure on IBM Power8 when compiling
master branch with EXTRA_CFLAGS='-O0 -g':

  /root/nl/dpdk.org/build/lib/librte_meter.a(rte_meter.o): In function `rte_meter_get_tb_params':
  /root/nl/dpdk.org/lib/librte_meter/rte_meter.c:57: undefined reference to `ceil'

Seems related to commit 43f4364d.

I don't have the time to search more deeply, I hope it can help.

Regards,

-- 
Nélio Laranjeiro
6WIND

[dpdk-dev] segfault in vt_pci_init

2016-06-22 Thread Thomas F Herbert

All,

I am trying to debug a problem:

I am running testpmd in a VM.

The stack trace is below.

Running uio_pci_generic. kvm-qemu 2.5.3, Centos 7
CentOS Linux release 7.2.1511 (Core)


Problem can be reproduced using either igb_uio or virtio_pci kernel drivers.

Currently running 3.18.35 kernel but problem can be reproduced with 3.10 
kernel shipped with Centos.

Problem originally was seen with 16.04. Now reproduced from top of 
master commit, 34d279906d588e349f3e69020a94d7f932bdf099

Currently running qemu/kvm 2.3: qemu-kvm-ev-2.3.0-31.el7_2.10.1.x86_64

Thanks for helping with more eyes on this problem. I am at OPNFV and 
this problem is currently holding up nfv deployment.

(gdb) run -c 0x1 -n 4 -- -i -- portmask 0x1 --ncores=1
Starting program: /home/heat-admin/dpdk/mybuild/app/testpmd -c 0x1 -n 4 
-- -i -- portmask 0x1 --ncores=1
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
EAL: Detected 4 lcore(s)
EAL: Probing VFIO support...
EAL: VFIO support initialized
[New Thread 0x773f0700 (LWP 30542)]
EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using 
unreliable clock cycles !
[New Thread 0x76aec700 (LWP 30543)]
EAL: PCI device :00:03.0 on NUMA socket -1
EAL:   probe driver: 1af4:1000 rte_virtio_pmd

Program received signal SIGSEGV, Segmentation fault.
0x00616172 in vtpci_init (dev=0xbfb0f0, hw=0x7fffd57d0700,
 dev_flags=0x7fffdf9c)
 at /home/heat-admin/dpdk/drivers/net/virtio/virtio_pci.c:722
722         dev->devargs->type != RTE_DEVTYPE_WHITELISTED_PCI) {
Missing separate debuginfos, use: debuginfo-install 
glibc-2.17-106.el7_2.6.x86_64
(gdb) bt
#0  0x00616172 in vtpci_init (dev=0xbfb0f0, hw=0x7fffd57d0700,
 dev_flags=0x7fffdf9c)
 at /home/heat-admin/dpdk/drivers/net/virtio/virtio_pci.c:722
#1  0x006209fb in eth_virtio_dev_init (
 eth_dev=0xb659a0 )
 at /home/heat-admin/dpdk/drivers/net/virtio/virtio_ethdev.c:1108
#2  0x0048339c in rte_eth_dev_init (pci_drv=0x8d52a0 
,
 pci_dev=0xbfb0f0)
 at /home/heat-admin/dpdk/lib/librte_ether/rte_ethdev.c:288
#3  0x004a0b7d in rte_eal_pci_probe_one_driver (
 dr=0x8d52a0 , dev=0xbfb0f0)
 at /home/heat-admin/dpdk/lib/librte_eal/common/eal_common_pci.c:214
#4  0x004a0d3d in pci_probe_all_drivers (dev=0xbfb0f0)
 at /home/heat-admin/dpdk/lib/librte_eal/common/eal_common_pci.c:290
#5  0x004a108a in rte_eal_pci_probe ()
 at /home/heat-admin/dpdk/lib/librte_eal/common/eal_common_pci.c:417
#6  0x00492fd2 in rte_eal_init (argc=11, argv=0x7fffe368)
 at /home/heat-admin/dpdk/lib/librte_eal/linuxapp/eal/eal.c:874
#7  0x0043592d in main (argc=11, argv=0x7fffe368)
 at /home/heat-admin/dpdk/app/test-pmd/testpmd.c:2059
(gdb)

-- 
*Thomas F Herbert*
SDN Group
Office of Technology
*Red Hat*

[dpdk-dev] [PATCH] port: fix build when KNI support is not enabled

2016-06-22 Thread Thomas Monjalon

2016-06-22 13:57, Olivier Matz:
> Hi Thomas,
> 
> On 06/22/2016 01:49 PM, Thomas Monjalon wrote:
> > 2016-06-22 14:34, Panu Matilainen:
> >> --- a/lib/librte_port/Makefile
> >> +++ b/lib/librte_port/Makefile
> >> @@ -82,6 +82,8 @@ DEPDIRS-$(CONFIG_RTE_LIBRTE_PORT) += lib/librte_mempool
> >>  DEPDIRS-$(CONFIG_RTE_LIBRTE_PORT) += lib/librte_ether
> >>  DEPDIRS-$(CONFIG_RTE_LIBRTE_PORT) += lib/librte_ip_frag
> >>  DEPDIRS-$(CONFIG_RTE_LIBRTE_PORT) += lib/librte_sched
> >> +ifeq ($(CONFIG_RTE_LIBRTE_KNI),y)
> >>  DEPDIRS-$(CONFIG_RTE_LIBRTE_PORT) += lib/librte_kni
> >> +endif
> > 
> > I do not remember why $(CONFIG_RTE_LIBRTE_PORT) is needed in its Makefile.
> > I think we can do
> > DEPDIRS-$(CONFIG_RTE_LIBRTE_KNI) += lib/librte_kni
> > and set DEPDIRS-y everywhere else.
> > 
> 
> It's probably not much used, but the build framework allows you to do
> the following to build only one directory:
> 
>   make lib/librte_port_sub
> 
> This directly jumps to the librte_port Makefile, bypassing parent
> directories. I think that's why the config check is duplicated in the
> Makefile.

If we want to specifically build this directory, why prevent us from doing
so with CONFIG_RTE_LIBRTE_PORT?

[dpdk-dev] [PATCH] port: fix build when KNI support is not enabled

2016-06-22 Thread Olivier Matz

Hi Thomas,

On 06/22/2016 01:49 PM, Thomas Monjalon wrote:
> 2016-06-22 14:34, Panu Matilainen:
>> --- a/lib/librte_port/Makefile
>> +++ b/lib/librte_port/Makefile
>> @@ -82,6 +82,8 @@ DEPDIRS-$(CONFIG_RTE_LIBRTE_PORT) += lib/librte_mempool
>>  DEPDIRS-$(CONFIG_RTE_LIBRTE_PORT) += lib/librte_ether
>>  DEPDIRS-$(CONFIG_RTE_LIBRTE_PORT) += lib/librte_ip_frag
>>  DEPDIRS-$(CONFIG_RTE_LIBRTE_PORT) += lib/librte_sched
>> +ifeq ($(CONFIG_RTE_LIBRTE_KNI),y)
>>  DEPDIRS-$(CONFIG_RTE_LIBRTE_PORT) += lib/librte_kni
>> +endif
> 
> I do not remember why $(CONFIG_RTE_LIBRTE_PORT) is needed in its Makefile.
> I think we can do
>   DEPDIRS-$(CONFIG_RTE_LIBRTE_KNI) += lib/librte_kni
> and set DEPDIRS-y everywhere else.
> 

It's probably not much used, but the build framework allows you to do
the following to build only one directory:

  make lib/librte_port_sub

This directly jumps to the librte_port Makefile, bypassing parent
directories. I think that's why the config check is duplicated in the
Makefile.


Olivier

[dpdk-dev] [PATCH] port: fix build when KNI support is not enabled

2016-06-22 Thread Thomas Monjalon

2016-06-22 14:34, Panu Matilainen:
> --- a/lib/librte_port/Makefile
> +++ b/lib/librte_port/Makefile
> @@ -82,6 +82,8 @@ DEPDIRS-$(CONFIG_RTE_LIBRTE_PORT) += lib/librte_mempool
>  DEPDIRS-$(CONFIG_RTE_LIBRTE_PORT) += lib/librte_ether
>  DEPDIRS-$(CONFIG_RTE_LIBRTE_PORT) += lib/librte_ip_frag
>  DEPDIRS-$(CONFIG_RTE_LIBRTE_PORT) += lib/librte_sched
> +ifeq ($(CONFIG_RTE_LIBRTE_KNI),y)
>  DEPDIRS-$(CONFIG_RTE_LIBRTE_PORT) += lib/librte_kni
> +endif

I do not remember why $(CONFIG_RTE_LIBRTE_PORT) is needed in its Makefile.
I think we can do
DEPDIRS-$(CONFIG_RTE_LIBRTE_KNI) += lib/librte_kni
and set DEPDIRS-y everywhere else.

[dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset

2016-06-22 Thread Jerin Jacob

On Wed, Jun 22, 2016 at 06:42:43AM +, Lu, Wenzhuo wrote:
> Hi Jerin,
> 
> 
> > -Original Message-
> > From: Jerin Jacob [mailto:jerin.jacob at caviumnetworks.com]
> > Sent: Wednesday, June 22, 2016 2:10 PM
> > To: Lu, Wenzhuo
> > Cc: Ananyev, Konstantin; Stephen Hemminger; dev at dpdk.org; Richardson,
> > Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing; Zhang, Helin;
> > thomas.monjalon at 6wind.com
> > Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device 
> > reset
> > 
> > On Wed, Jun 22, 2016 at 05:05:14AM +, Lu, Wenzhuo wrote:
> > >
> > >
> > > > -Original Message-
> > > > From: Jerin Jacob [mailto:jerin.jacob at caviumnetworks.com]
> > > > Sent: Wednesday, June 22, 2016 12:15 PM
> > > > To: Lu, Wenzhuo
> > > > Cc: Ananyev, Konstantin; Stephen Hemminger; dev at dpdk.org;
> > > > Richardson, Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing;
> > > > Zhang, Helin; thomas.monjalon at 6wind.com
> > > > Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support
> > > > device reset
> > > >
> > > > On Wed, Jun 22, 2016 at 03:32:16AM +, Lu, Wenzhuo wrote:
> > > > > Lost here. I think these RTE_ETH_EVENTs are used to connect the
> > > > > APP call
> > > > back functions with the events.
> > > > > Actually I want the APP to register a callback function
> > > > > reset_event_callback for
> > > > the reset event. Like this,
> > > > >   /* register reset interrupt callback */
> > > > >   rte_eth_dev_callback_register(portid,
> > > > >   RTE_ETH_EVENT_INTR_RESET, reset_event_callback,
> > > > NULL); And when the
> > > > > VF driver finds PF link down/up, it  should  use
> > > > _rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_RESET) to run
> > > > into the callback which is provided by APP. Means reset_event_callback 
> > > > here.
> > > >
> > > > me too. There is an existing RTE_ETH_EVENT_INTR_RESET event to notify
> > > > the PF reset. I guess it is not for the PF link change, or it is for a
> > > > generic VF reset request initiated by PF for everything.
> > > I think this event is for device reset not only for PF but also can for 
> > > VF. I think
> > we can use this event when the driver want the APP to reset the device. The 
> > PF
> > link down/up caused VF reset is one of the cases.
> > 
> > Then please correct description for the RTE_ETH_EVENT_INTR_RESET in
> > lib/librte_ether/rte_ethdev.h "/**< reset interrupt event, sent to VF on PF 
> > reset
> > */"
> > 
> > >
> > > >
> > > > file: lib/librte_ether/rte_ethdev.h
> > > > RTE_ETH_EVENT_INTR_RESET,
> > > > /**< reset interrupt event, sent to VF on PF reset */
> > > >
> > > > ^^^
> > > >
> > > > if application need to call rte_ethdev_reset() on
> > > > RTE_ETH_EVENT_INTR_RESET event then please mention it commit log or
> > API description.
> > > Good suggestion. I'll try to find where's the good place to add more
> > explanation.
> > 
> > I guess then reset API can be changed to rte_ethdev_process_reset_intr() or
> > similar to reflect the use case(API called by application on reset event 
> > from PF)
> > 
> > For PMDs where the PF does not generate the RTE_ETH_EVENT_INTR_RESET to the VF,
> > the VF's reset PMD callback shall be a 'nop'
> > 
> > Jerin
> But I don't think it's appropriate to bind the RTE_ETH_EVENTs to the APIs.
> This patch set provides a reset API to reset the device. That doesn't mean this
> reset API can only be used when the APP hits the RTE_ETH_EVENT_INTR_RESET event.
> I can add some comments suggesting that the user call the reset API at that time.
> But I think the APP can call the reset API anytime it thinks it's necessary.
> So I don't like the name *process_reset_intr*; it hints that this API is only
> for the INTR_RESET event.

That's where the scope of the API and of the PMD implementation is not clear.
Can you tell me any other use case where we need to call this API from the
application?
The name rte_ethdev_reset() is too generic. If you are going with that
generic name, then you may need to add a lot of detail to the API description.

Thomas,
As a librte_ether maintainer any comments on this?


> 
> > 
> > > >
> > >

[dpdk-dev] [PATCH v3 1/2] ethdev: add tunnel and port RSS offload types

2016-06-22 Thread Jerin Jacob

On Wed, Jun 22, 2016 at 08:43:52AM +0200, Thomas Monjalon wrote:
> 2016-06-22 09:00, Jerin Jacob:
> > On Tue, Jun 21, 2016 at 11:02:59PM +0200, Thomas Monjalon wrote:
> > > Hi Jerin,
> > 
> > Hi Thomas,
> > 
> > > 
> > > I wanted to push this patch which is now a dependency of ThunderX
> > > but I do not fully understand it.
> > > 
> > > 2016-03-31 02:21, Jerin Jacob:
> > > > - added VXLAN, GENEVE and NVGRE tunnel flow types
> > > > - added PORT flow type for accounting physical/virtual
> > > > port or channel number in flow creation
> > > [...]
> > > > --- a/lib/librte_ether/rte_eth_ctrl.h
> > > > +++ b/lib/librte_ether/rte_eth_ctrl.h
> > > > @@ -74,7 +74,11 @@ extern "C" {
> > > >  #define RTE_ETH_FLOW_IPV6_EX15
> > > >  #define RTE_ETH_FLOW_IPV6_TCP_EX16
> > > >  #define RTE_ETH_FLOW_IPV6_UDP_EX17
> > > > -#define RTE_ETH_FLOW_MAX18
> > > > +#define RTE_ETH_FLOW_PORT   18
> > > > +#define RTE_ETH_FLOW_VXLAN  19
> > > > +#define RTE_ETH_FLOW_GENEVE 20
> > > > +#define RTE_ETH_FLOW_NVGRE  21
> > > > +#define RTE_ETH_FLOW_MAX22
> > > 
> > > Please could you explain more what is PORT flow?
> > 
> > For example, consider a NIC with two physical ports where the application
> > configures RTE_ETH_FLOW_IPV4 for both. In that case the
> > HW generates the same RSS value for a similar IPV4 packet. However, if the
> > application wants to generate a flow that also accounts for the physical port,
> > it can configure RTE_ETH_FLOW_IPV4 | RTE_ETH_FLOW_PORT.
> > 
> > RTE_ETH_FLOW_PORT is useful for the case where one physical port is assigned to
> > INBOUND traffic and the other one to OUTBOUND traffic, etc.
> 
> OK
> 
> > > Does it need a comment in the code?
> > Not sure, commit log has description.
> 
> How do you expect the user to understand this new value in the API?
> Users do not check in the git history.
> They use doxygen, headers comments and/or examples.

The reason I said that is that none of the flow types in the list has a
comment. If you think RTE_ETH_FLOW_PORT needs a doxygen comment then I can
add it.

It would be nice if someone else could add comments for the following:
RTE_ETH_FLOW_RAW,
RTE_ETH_FLOW_L2_PAYLOAD
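
The effect of the PORT flow type described in the quoted explanation can be illustrated with a toy model. This is only a sketch: the flag names, the FNV-1a mixing and the `toy_rss_hash()` helper are assumptions made for illustration and have nothing to do with the hash a real NIC computes or with the actual ethdev definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow-type flags; not the RTE_ETH_FLOW_* values. */
#define TOY_FLOW_IPV4 (1u << 0)
#define TOY_FLOW_PORT (1u << 1)

/* One FNV-1a step over a single byte. */
static uint32_t
fnv1a_byte(uint32_t h, uint8_t b)
{
	return (uint32_t)((h ^ b) * 16777619u);
}

/* Feed a 32-bit value into the hash, least significant byte first. */
static uint32_t
fnv1a_u32(uint32_t h, uint32_t v)
{
	unsigned int i;

	for (i = 0; i != 4; ++i)
		h = fnv1a_byte(h, (uint8_t)(v >> (8 * i)));
	return h;
}

/*
 * Toy hash: the IPv4 addresses are hashed when TOY_FLOW_IPV4 is set,
 * and the receiving port number is mixed in only when TOY_FLOW_PORT
 * is also set.
 */
static uint32_t
toy_rss_hash(uint32_t flow_mask, uint32_t src_ip, uint32_t dst_ip,
	     uint8_t port)
{
	uint32_t h = 2166136261u; /* FNV-1a offset basis */

	assert(flow_mask != 0);
	if (flow_mask & TOY_FLOW_IPV4) {
		h = fnv1a_u32(h, src_ip);
		h = fnv1a_u32(h, dst_ip);
	}
	if (flow_mask & TOY_FLOW_PORT)
		h = fnv1a_byte(h, port);
	return h;
}
```

With only `TOY_FLOW_IPV4`, the same packet received on port 0 and on port 1 yields the same value; adding `TOY_FLOW_PORT` makes the two values differ, which is the behaviour described for the inbound/outbound use case.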

[dpdk-dev] [PATCH 3/3] app/pdump: fix string overflow

2016-06-22 Thread Anupam Kapoor

>   if (!strcmp(key, PDUMP_RX_DEV_ARG)) {
> - strncpy(pt->rx_dev, value, strlen(value));
> + strncpy(pt->rx_dev, value, sizeof(pt->rx_dev)-1);

I guess size-1 is to give room for terminating null byte, but for this
case is it guaranteed that pt->rx_dev last byte is NULL?

why not just use snprintf(...) here, since it has better error behavior?
compared to str*cpy it might be a bit slower, but hopefully that
should be ok.
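
A minimal sketch of the snprintf() approach suggested above. The helper name `copy_dev_name()` is hypothetical and is not part of app/pdump; it only shows how truncation can be detected while the destination stays terminated. With strncpy(dst, src, size - 1), by contrast, the last byte is never written, so the result is terminated only if the buffer was zeroed beforehand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy src into dst, whose capacity is dst_size bytes including the
 * terminator. Returns 0 on success and -1 if src did not fit.
 * Whenever dst_size > 0, dst is left NUL-terminated.
 */
static int
copy_dev_name(char *dst, size_t dst_size, const char *src)
{
	int n;

	assert(dst != NULL && src != NULL);
	n = snprintf(dst, dst_size, "%s", src);
	/* snprintf() returns the length it would have written, so a
	 * value >= dst_size means the output was truncated. */
	if (n < 0 || (size_t)n >= dst_size)
		return -1;
	return 0;
}
```

A caller could use it as `copy_dev_name(pt->rx_dev, sizeof(pt->rx_dev), value)` and reject the argument when it returns -1, instead of silently keeping a truncated device name.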

--
thanks
anupam


On Tue, Jun 21, 2016 at 10:51 PM, Ferruh Yigit 
wrote:

> On 6/21/2016 4:18 PM, Reshma Pattan wrote:
> > using source length in strncpy can cause destination
> > overflow if destination length is not big enough to
> > handle the source string. Changes are made to use destination
> > size instead of source length in strncpy.
> >
> > Coverity issue 127351: string overflow
> >
> > Fixes: caa7028276b8 ("app/pdump: add tool for packet capturing")
> >
> > Signed-off-by: Reshma Pattan 
> > ---
> >  app/pdump/main.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/app/pdump/main.c b/app/pdump/main.c
> > index f8923b9..af92ef3 100644
> > --- a/app/pdump/main.c
> > +++ b/app/pdump/main.c
> > @@ -217,12 +217,12 @@ parse_rxtxdev(const char *key, const char *value,
> void *extra_args)
> >   struct pdump_tuples *pt = extra_args;
> >
> >   if (!strcmp(key, PDUMP_RX_DEV_ARG)) {
> > - strncpy(pt->rx_dev, value, strlen(value));
> > + strncpy(pt->rx_dev, value, sizeof(pt->rx_dev)-1);
>
> I guess size-1 is to give room for terminating null byte, but for this
> case is it guaranteed that pt->rx_dev last byte is NULL?
>
>


-- 
In the beginning was the lambda, and the lambda was with Emacs, and Emacs
was the lambda.

[dpdk-dev] regarding coverity-scan-id : 127350

2016-06-22 Thread Anupam Kapoor

hi,

i just looked at the CID: 127350 which raises a STRING_OVERFLOW bug
for pdump_prepare_client_request(...)

callers of pdump_prepare_client_request(...) pass a 'device' parameter which
is always DEVICE_ID_SIZE (64) bytes long. hence it is not possible to
overflow the pdump_request.data.en_v1.device buffer, which is also
DEVICE_ID_SIZE bytes long.


--
thanks
anupam

[dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset

2016-06-22 Thread Jerin Jacob

On Wed, Jun 22, 2016 at 05:05:14AM +, Lu, Wenzhuo wrote:
> 
> 
> > -Original Message-
> > From: Jerin Jacob [mailto:jerin.jacob at caviumnetworks.com]
> > Sent: Wednesday, June 22, 2016 12:15 PM
> > To: Lu, Wenzhuo
> > Cc: Ananyev, Konstantin; Stephen Hemminger; dev at dpdk.org; Richardson,
> > Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing; Zhang, Helin;
> > thomas.monjalon at 6wind.com
> > Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device 
> > reset
> > 
> > On Wed, Jun 22, 2016 at 03:32:16AM +, Lu, Wenzhuo wrote:
> > > Lost here. I think these RTE_ETH_EVENTs are used to connect the APP call
> > back functions with the events.
> > > Actually I want the APP to register a callback function 
> > > reset_event_callback for
> > the reset event. Like this,
> > >   /* register reset interrupt callback */
> > >   rte_eth_dev_callback_register(portid,
> > >   RTE_ETH_EVENT_INTR_RESET, reset_event_callback,
> > NULL); And when the
> > > VF driver finds PF link down/up, it  should  use
> > _rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_RESET) to run into
> > the callback which is provided by APP. Means reset_event_callback here.
> > 
> > me too. There is an existing RTE_ETH_EVENT_INTR_RESET event to notify the PF
> > reset. I guess it is not for the PF link change, or it is for a generic VF
> > reset request initiated by PF for everything.
> I think this event is for device reset not only for PF but also can for VF. I 
> think we can use this event when the driver want the APP to reset the device. 
> The PF link down/up caused VF reset is one of the cases.

Then please correct description for the RTE_ETH_EVENT_INTR_RESET
in lib/librte_ether/rte_ethdev.h
"/**< reset interrupt event, sent to VF on PF reset */"

> 
> > 
> > file: lib/librte_ether/rte_ethdev.h
> > RTE_ETH_EVENT_INTR_RESET,
> > /**< reset interrupt event, sent to VF on PF reset */
> >^^^
> > 
> > if application need to call rte_ethdev_reset() on  RTE_ETH_EVENT_INTR_RESET
> > event then please mention it commit log or API description.
> Good suggestion. I'll try to find where's the good place to add more 
> explanation.

I guess then reset API can be changed to rte_ethdev_process_reset_intr() or
similar to reflect the use case(API called by application on reset event from 
PF)

For PMDs where the PF does not generate the RTE_ETH_EVENT_INTR_RESET to the VF,
the VF's reset PMD callback shall be a 'nop'

Jerin

> > 
>

[dpdk-dev] [PATCH v3 00/25] Refactor mlx5 to improve performance

2016-06-22 Thread Adrien Mazarguil

On Wed, Jun 22, 2016 at 10:19:14AM +0100, Bruce Richardson wrote:
> On Wed, Jun 22, 2016 at 10:20:54AM +0200, Adrien Mazarguil wrote:
> > On Tue, Jun 21, 2016 at 05:42:29PM +0100, Ferruh Yigit wrote:
> > > On 6/21/2016 8:23 AM, Nelio Laranjeiro wrote:
> > > > Enhance mlx5 with a data path that bypasses Verbs.
> > > > 
> > > > The first half of this patchset removes support for functionality
> > > > completely rewritten in the second half (scatter/gather, inline send),
> > > > while the data path is refactored without Verbs.
> > > > 
> > > > The PMD remains usable during the transition.
> > > > 
> > > > This patchset must be applied after "Miscellaneous fixes for mlx4 and
> > > > mlx5".
> > > > 
> > > > Changes in v3:
> > > > - Rebased patchset on top of next-net/rel_16_07.
> > > > 
> > > > Changes in v2:
> > > > - Rebased patchset on top of dpdk/master.
> > > > - Fixed CQE size on Power8.
> > > > - Fixed mbuf assertion failure in debug mode.
> > > > - Fixed missing class_id field in rte_pci_id by using RTE_PCI_DEVICE.
> > > > 
> > > > Adrien Mazarguil (8):
> > > >   mlx5: replace countdown with threshold for Tx completions
> > > >   mlx5: add debugging information about Tx queues capabilities
> > > >   mlx5: check remaining space while processing Tx burst
> > > >   mlx5: resurrect Tx gather support
> > > >   mlx5: work around spurious compilation errors
> > > >   mlx5: remove redundant Rx queue initialization code
> > > >   mlx5: make Rx queue reinitialization safer
> > > >   mlx5: resurrect Rx scatter support
> > > > 
> > > > Nelio Laranjeiro (16):
> > > >   drivers: fix PCI class id support
> > > >   mlx5: split memory registration function
> > > >   mlx5: remove Tx gather support
> > > >   mlx5: remove Rx scatter support
> > > >   mlx5: remove configuration variable
> > > >   mlx5: remove inline Tx support
> > > >   mlx5: split Tx queue structure
> > > >   mlx5: split Rx queue structure
> > > >   mlx5: update prerequisites for upcoming enhancements
> > > >   mlx5: add definitions for data path without Verbs
> > > >   mlx5: add support for configuration through kvargs
> > > >   mlx5: add Tx/Rx burst function selection wrapper
> > > >   mlx5: refactor Rx data path
> > > >   mlx5: refactor Tx data path
> > > >   mlx5: handle Rx CQE compression
> > > >   mlx5: add support for multi-packet send
> > > > 
> > > > Yaacov Hazan (1):
> > > >   mlx5: add support for inline send
> > > > 
> > > 
> > > Patchset applies and compiles fine, thanks.
> > > 
> > > But still has some checkpatch warnings, -btw, I am using the checkpatch
> > > script from latest master branch of Linux repo.
> > > 
> > > Following is the sample type of warnings (not instances, there are more
> > > than one instance per type):
> > 
> > While Nelio is preparing a v4 to address the kvargs issue, the remaining
> > warnings can be safely ignored.
> > 
> > A few of them are in relocated but unmodified code as this patchset
> > refactors the entire PMD, others are documented. We settled on positive
> > errno values internally because mlx5 uses syscalls and switching the sign
> > bit all over the place quickly became unmanageable. They are made negative
> > when returning from public callbacks (except for kvargs by mistake).
> > 
> > In short, we did run checkpatch, fixed a million warnings and other errors
> > and left those on purpose, nothing to worry about.
> > 
> 
> Yes, they are nothing to worry about, but at the same time, I fail to see why
> most of them should not be fixed. Even if you are moving code, unless it's a 
> whole
> file it's not going to show up as a move in the diff, so some small changes
> during the move can be ok.
>  
> > > WARNING:UNSPECIFIED_INT: Prefer 'unsigned int' to bare use of 'unsigned'
> > > #112: FILE: drivers/net/mlx5/mlx5_mr.c:65:
> > > +   unsigned mem_idx)
> > > 
> This looks easily fixable.
> 
> > > WARNING:BLOCK_COMMENT_STYLE: Block comments use a trailing */ on a
> > > separate line
> > > #288: FILE: drivers/net/mlx5/mlx5_mr.c:241:
> > > +* pointer is valid. */
> > > 
> I also think this should be fixed.
> 
> > > WARNING:USE_NEGATIVE_ERRNO: return of an errno should typically be
> > > negative (ie: return -EINVAL)
> > > #524: FILE: drivers/net/mlx5/mlx5_txq.c:265:
> > > +   return EINVAL;
> > > 
> > > WARNING:LONG_LINE: line over 80 characters
> > > #108: FILE: drivers/net/mlx5/mlx5_ethdev.c:1250:
> > > +   txq_ctrl->txq.stats.idx =
> > > primary_txq->stats.idx;
> > > 
> > > WARNING:STATIC_CONST_CHAR_ARRAY: static const char * array should
> > > probably be static const char * const
> > > #88: FILE: drivers/net/mlx5/mlx5.c:281:
> > > +   static const char *params[] = {
> > > 
> > > ERROR:ASSIGN_IN_IF: do not use assignment in if condition
> > > #218: FILE: drivers/net/mlx5/mlx5_rxtx.c:92:
> > > +   if (!ret || !(ret = ((*buf)[i] == magic[i])))
> > > 
> > > CHECK:SPACING: spaces preferred around that '&' (ctx:VxV)
> > > #414

[dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset

2016-06-22 Thread Thomas Monjalon

2016-06-22 08:25, Lu, Wenzhuo:
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > 2016-06-22 13:29, Jerin Jacob:
> > > Thomas,
> > > As a librte_ether maintainer any comments on this?
> > 
> > +1 for adding details and make sure naming is good.
> > I don't really need to comment here because I have already done this comment
> > earlier:
> > http://dpdk.org/ml/archives/dev/2016-June/041845.html
> > Thank you for insisting.
> I've added some details in this patch set. If it's not enough, please let me
> know.
> And I think this discussion is about what the API name should be.
> Actually I think all the existing names describe what is done by the API,
> not when and where it should be used, like dev_start/stop.

You're right, I overlooked it:

+ * The API will stop the port, clear the rx/tx queues, re-setup the rx/tx
+ * queues, restart the port.

Jerin, which detail do you think is needed?

Wenzhuo, why is this function needed?
All these actions are already possible independently.
When looking at ixgbe implementation, I see:
ixgbevf_dev_stats_reset() which is not documented in the API
rte_delay_ms(1000);
do {} while
It looks to be some hacks.
If you really need some workarounds to handle some tricky situations,
maybe the API is not detailed enough.

> But anyway I'm open for changing the name. Is the name process_reset_intr you 
> prefer? Thanks.

Not sure.
If you really intend to add a generic reset, maybe rte_eth_dev_reset()
is a good name. We just need more justification.
After reading the doc, the user can understand it is just a wrapper of
existing functions. But it appears in the code that it does more and can
help in some situations.
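
The flow discussed in this thread (the application registers a callback for the reset event, the driver fires it, and the application responds by calling a generic reset) can be sketched with a small self-contained mock. Every `toy_*` name below is hypothetical; this is not the ethdev API, and the reset body only stands in for the stop / queue re-setup / restart sequence quoted from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PORTS 4

/* Hypothetical event list; stands in for the RTE_ETH_EVENT_* enum. */
enum toy_event { TOY_EVENT_INTR_RESET, TOY_EVENT_MAX };

typedef void (*toy_event_cb)(unsigned int port_id, enum toy_event ev,
			     void *arg);

struct toy_port {
	toy_event_cb cb[TOY_EVENT_MAX];
	void *cb_arg[TOY_EVENT_MAX];
	int started;
	unsigned int resets;
};

static struct toy_port toy_ports[TOY_MAX_PORTS];

/* Application side: attach a callback to an event on a port. */
static int
toy_callback_register(unsigned int port_id, enum toy_event ev,
		      toy_event_cb cb, void *arg)
{
	if (port_id >= TOY_MAX_PORTS || ev >= TOY_EVENT_MAX || cb == NULL)
		return -1;
	toy_ports[port_id].cb[ev] = cb;
	toy_ports[port_id].cb_arg[ev] = arg;
	return 0;
}

/* Driver side: run the registered callback, if any. */
static void
toy_callback_process(unsigned int port_id, enum toy_event ev)
{
	assert(port_id < TOY_MAX_PORTS && ev < TOY_EVENT_MAX);
	if (toy_ports[port_id].cb[ev] != NULL)
		toy_ports[port_id].cb[ev](port_id, ev,
					  toy_ports[port_id].cb_arg[ev]);
}

/* Generic reset: not tied to any event, callable at any time. */
static int
toy_dev_reset(unsigned int port_id)
{
	if (port_id >= TOY_MAX_PORTS)
		return -1;
	toy_ports[port_id].started = 0; /* stop the port */
	/* queue teardown and re-setup would happen here */
	toy_ports[port_id].started = 1; /* restart the port */
	toy_ports[port_id].resets++;
	return 0;
}

/* Application callback: react to the reset event by resetting. */
static void
toy_reset_event_callback(unsigned int port_id, enum toy_event ev, void *arg)
{
	(void)arg;
	if (ev == TOY_EVENT_INTR_RESET)
		(void)toy_dev_reset(port_id);
}
```

In this mock the reset function and the event stay decoupled, which matches the argument for a generic name: the callback is one caller of the reset, not the only one. A real application might prefer to defer the reset out of the callback context rather than call it directly as done here.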

[dpdk-dev] [PATCH v4 25/25] mlx5: resurrect Rx scatter support

2016-06-22 Thread Nelio Laranjeiro

From: Adrien Mazarguil 

This commit brings back Rx scatter and related support by the MTU update
function. The maximum number of segments per packet is not a fixed value
anymore (previously MLX5_PMD_SGE_WR_N, set to 4 by default) as it caused
performance issues when fewer segments were actually needed as well as
limitations on the maximum packet size that could be received with the
default mbuf size (supporting at most 8576 bytes).

These limitations are now lifted as the number of SGEs is derived from the
MTU (which implies MRU) at queue initialization and during MTU update.
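
As a rough illustration of how a segment count can be derived from the MTU, the sketch below mirrors the arithmetic in the patch (ceiling division of the required size by the mbuf data room, then rounding up to a power of two). `log2_ceil()` is a hypothetical stand-in written under the assumption that the driver's log2above() returns the base-2 logarithm rounded up; the sizes mentioned afterwards are example values, not taken from the patch.

```c
#include <assert.h>

/* Smallest n such that (1 << n) >= v; assumed equivalent of log2above(). */
static unsigned int
log2_ceil(unsigned int v)
{
	unsigned int n = 0;

	while (n < 31 && (1u << n) < v)
		++n;
	return n;
}

/*
 * Base-2 logarithm of the number of buffers of mb_len bytes needed to
 * hold size bytes, with the count rounded up to a power of two.
 */
static unsigned int
sges_log2(unsigned int size, unsigned int mb_len)
{
	assert(mb_len != 0);
	/* Ceiling division: one extra buffer for any remainder. */
	return log2_ceil((size / mb_len) + !!(size % mb_len));
}
```

For example, assuming a 128-byte headroom, a 9000-byte MTU (14-byte header and 4-byte CRC added) and a 2176-byte data room, size is 9146 bytes, which needs 5 buffers and therefore rounds up to 8 segments (`sges_log2()` returns 3). A 1500-byte MTU with the same buffers fits in a single segment.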

Signed-off-by: Adrien Mazarguil 
Signed-off-by: Vasily Philipov 
Signed-off-by: Nelio Laranjeiro 
---
 drivers/net/mlx5/mlx5_ethdev.c |  84 +
 drivers/net/mlx5/mlx5_rxq.c|  73 +-
 drivers/net/mlx5/mlx5_rxtx.c   | 139 -
 drivers/net/mlx5/mlx5_rxtx.h   |   1 +
 4 files changed, 215 insertions(+), 82 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 698a50e..3f61be2 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -725,6 +725,9 @@ mlx5_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
unsigned int i;
uint16_t (*rx_func)(void *, struct rte_mbuf **, uint16_t) =
mlx5_rx_burst;
+   unsigned int max_frame_len;
+   int rehash;
+   int restart = priv->started;

if (mlx5_is_secondary())
return -E_RTE_SECONDARY;
@@ -738,7 +741,6 @@ mlx5_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
goto out;
} else
DEBUG("adapter port %u MTU set to %u", priv->port, mtu);
-   priv->mtu = mtu;
/* Temporarily replace RX handler with a fake one, assuming it has not
 * been copied elsewhere. */
dev->rx_pkt_burst = removed_rx_burst;
@@ -746,28 +748,88 @@ mlx5_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
 * removed_rx_burst() instead. */
rte_wmb();
usleep(1000);
+   /* MTU does not include header and CRC. */
+   max_frame_len = ETHER_HDR_LEN + mtu + ETHER_CRC_LEN;
+   /* Check if at least one queue is going to need a SGE update. */
+   for (i = 0; i != priv->rxqs_n; ++i) {
+   struct rxq *rxq = (*priv->rxqs)[i];
+   unsigned int mb_len;
+   unsigned int size = RTE_PKTMBUF_HEADROOM + max_frame_len;
+   unsigned int sges_n;
+
+   if (rxq == NULL)
+   continue;
+   mb_len = rte_pktmbuf_data_room_size(rxq->mp);
+   assert(mb_len >= RTE_PKTMBUF_HEADROOM);
+   /* Determine the number of SGEs needed for a full packet
+* and round it to the next power of two. */
+   sges_n = log2above((size / mb_len) + !!(size % mb_len));
+   if (sges_n != rxq->sges_n)
+   break;
+   }
+   /* If all queues have the right number of SGEs, a simple rehash
+* of their buffers is enough, otherwise SGE information can only
+* be updated in a queue by recreating it. All resources that depend
+* on queues (flows, indirection tables) must be recreated as well in
+* that case. */
+   rehash = (i == priv->rxqs_n);
+   if (!rehash) {
+   /* Clean up everything as with mlx5_dev_stop(). */
+   priv_special_flow_disable_all(priv);
+   priv_mac_addrs_disable(priv);
+   priv_destroy_hash_rxqs(priv);
+   priv_fdir_disable(priv);
+   priv_dev_interrupt_handler_uninstall(priv, dev);
+   }
+recover:
/* Reconfigure each RX queue. */
for (i = 0; (i != priv->rxqs_n); ++i) {
struct rxq *rxq = (*priv->rxqs)[i];
-   unsigned int mb_len;
-   unsigned int max_frame_len;
+   struct rxq_ctrl *rxq_ctrl =
+   container_of(rxq, struct rxq_ctrl, rxq);
int sp;
+   unsigned int mb_len;
+   unsigned int tmp;

if (rxq == NULL)
continue;
-   /* Calculate new maximum frame length according to MTU and
-* toggle scattered support (sp) if necessary. */
-   max_frame_len = (priv->mtu + ETHER_HDR_LEN +
-(ETHER_MAX_VLAN_FRAME_LEN - ETHER_MAX_LEN));
mb_len = rte_pktmbuf_data_room_size(rxq->mp);
assert(mb_len >= RTE_PKTMBUF_HEADROOM);
+   /* Toggle scattered support (sp) if necessary. */
sp = (max_frame_len > (mb_len - RTE_PKTMBUF_HEADROOM));
-   if (sp) {
-   ERROR("%p: RX scatter is not supported", (void *)dev);
-   ret = ENOTSUP;
-   goto out;
+   /* Provide new values to rxq_setup(). */
+   dev->data->d

[dpdk-dev] [PATCH v4 24/25] mlx5: make Rx queue reinitialization safer

2016-06-22 Thread Nelio Laranjeiro

From: Adrien Mazarguil 

The primary purpose of rxq_rehash() function is to stop and restart
reception on a queue after re-posting buffers. This may fail if the array
that temporarily stores existing buffers for reuse cannot be allocated.

Update rxq_rehash() to work on the target queue directly (not through a
template copy) and avoid this allocation.

rxq_alloc_elts() is modified accordingly to take buffers from an existing
queue directly and update their refcount.

Unlike rxq_rehash(), rxq_setup() must work on a temporary structure but
should not allocate new mbufs from the pool while reinitializing an
existing queue. This is achieved by using the refcount-aware
rxq_alloc_elts() before overwriting queue data.

Signed-off-by: Adrien Mazarguil 
Signed-off-by: Vasily Philipov 
---
 drivers/net/mlx5/mlx5_rxq.c | 83 ++---
 1 file changed, 41 insertions(+), 42 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 707296c..0a3225e 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -642,7 +642,7 @@ priv_rehash_flows(struct priv *priv)
  */
 static int
 rxq_alloc_elts(struct rxq_ctrl *rxq_ctrl, unsigned int elts_n,
-  struct rte_mbuf **pool)
+  struct rte_mbuf *(*pool)[])
 {
unsigned int i;
int ret = 0;
@@ -654,9 +654,10 @@ rxq_alloc_elts(struct rxq_ctrl *rxq_ctrl, unsigned int 
elts_n,
&(*rxq_ctrl->rxq.wqes)[i];

if (pool != NULL) {
-   buf = *(pool++);
+   buf = (*pool)[i];
assert(buf != NULL);
rte_pktmbuf_reset(buf);
+   rte_pktmbuf_refcnt_update(buf, 1);
} else
buf = rte_pktmbuf_alloc(rxq_ctrl->rxq.mp);
if (buf == NULL) {
@@ -781,7 +782,7 @@ rxq_cleanup(struct rxq_ctrl *rxq_ctrl)
 }

 /**
- * Reconfigure a RX queue with new parameters.
+ * Reconfigure RX queue buffers.
  *
  * rxq_rehash() does not allocate mbufs, which, if not done from the right
  * thread (such as a control thread), may corrupt the pool.
@@ -798,67 +799,48 @@ rxq_cleanup(struct rxq_ctrl *rxq_ctrl)
 int
 rxq_rehash(struct rte_eth_dev *dev, struct rxq_ctrl *rxq_ctrl)
 {
-   struct rxq_ctrl tmpl = *rxq_ctrl;
-   unsigned int mbuf_n;
-   unsigned int desc_n;
-   struct rte_mbuf **pool;
-   unsigned int i, k;
+   unsigned int elts_n = rxq_ctrl->rxq.elts_n;
+   unsigned int i;
struct ibv_exp_wq_attr mod;
int err;

DEBUG("%p: rehashing queue %p", (void *)dev, (void *)rxq_ctrl);
-   /* Number of descriptors and mbufs currently allocated. */
-   desc_n = tmpl.rxq.elts_n;
-   mbuf_n = desc_n;
/* From now on, any failure will render the queue unusable.
 * Reinitialize WQ. */
mod = (struct ibv_exp_wq_attr){
.attr_mask = IBV_EXP_WQ_ATTR_STATE,
.wq_state = IBV_EXP_WQS_RESET,
};
-   err = ibv_exp_modify_wq(tmpl.wq, &mod);
+   err = ibv_exp_modify_wq(rxq_ctrl->wq, &mod);
if (err) {
ERROR("%p: cannot reset WQ: %s", (void *)dev, strerror(err));
assert(err > 0);
return err;
}
-   /* Allocate pool. */
-   pool = rte_malloc(__func__, (mbuf_n * sizeof(*pool)), 0);
-   if (pool == NULL) {
-   ERROR("%p: cannot allocate memory", (void *)dev);
-   return ENOBUFS;
-   }
/* Snatch mbufs from original queue. */
-   k = 0;
-   for (i = 0; (i != desc_n); ++i)
-   pool[k++] = (*rxq_ctrl->rxq.elts)[i];
-   assert(k == mbuf_n);
-   rte_free(pool);
+   claim_zero(rxq_alloc_elts(rxq_ctrl, elts_n, rxq_ctrl->rxq.elts));
+   for (i = 0; i != elts_n; ++i) {
+   struct rte_mbuf *buf = (*rxq_ctrl->rxq.elts)[i];
+
+   assert(rte_mbuf_refcnt_read(buf) == 2);
+   rte_pktmbuf_free_seg(buf);
+   }
/* Change queue state to ready. */
mod = (struct ibv_exp_wq_attr){
.attr_mask = IBV_EXP_WQ_ATTR_STATE,
.wq_state = IBV_EXP_WQS_RDY,
};
-   err = ibv_exp_modify_wq(tmpl.wq, &mod);
+   err = ibv_exp_modify_wq(rxq_ctrl->wq, &mod);
if (err) {
ERROR("%p: WQ state to IBV_EXP_WQS_RDY failed: %s",
  (void *)dev, strerror(err));
goto error;
}
-   /* Post SGEs. */
-   err = rxq_alloc_elts(&tmpl, desc_n, pool);
-   if (err) {
-   ERROR("%p: cannot reallocate WRs, aborting", (void *)dev);
-   rte_free(pool);
-   assert(err > 0);
-   return err;
-   }
/* Update doorbell counter. */
-   rxq_ctrl->rxq.rq_ci = desc_n;
+   rxq_ctrl->rxq.rq_ci = elts_n;
rte_wmb();
*rxq_ctrl->rxq.rq_db = htonl(rxq_ctrl->rxq.rq_ci);
 error:
-

[dpdk-dev] [PATCH v4 23/25] mlx5: remove redundant Rx queue initialization code

2016-06-22 Thread Nelio Laranjeiro

From: Adrien Mazarguil 

Toggling RX checksum offloads is already done at initialization time. This
code does not belong in rxq_rehash().

Signed-off-by: Adrien Mazarguil 
Signed-off-by: Nelio Laranjeiro 
---
 drivers/net/mlx5/mlx5_rxq.c | 10 --
 1 file changed, 10 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 6881cdd..707296c 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -798,7 +798,6 @@ rxq_cleanup(struct rxq_ctrl *rxq_ctrl)
 int
 rxq_rehash(struct rte_eth_dev *dev, struct rxq_ctrl *rxq_ctrl)
 {
-   struct priv *priv = rxq_ctrl->priv;
struct rxq_ctrl tmpl = *rxq_ctrl;
unsigned int mbuf_n;
unsigned int desc_n;
@@ -811,15 +810,6 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq_ctrl *rxq_ctrl)
/* Number of descriptors and mbufs currently allocated. */
desc_n = tmpl.rxq.elts_n;
mbuf_n = desc_n;
-   /* Toggle RX checksum offload if hardware supports it. */
-   if (priv->hw_csum) {
-   tmpl.rxq.csum = !!dev->data->dev_conf.rxmode.hw_ip_checksum;
-   rxq_ctrl->rxq.csum = tmpl.rxq.csum;
-   }
-   if (priv->hw_csum_l2tun) {
-   tmpl.rxq.csum_l2tun = !!dev->data->dev_conf.rxmode.hw_ip_checksum;
-   rxq_ctrl->rxq.csum_l2tun = tmpl.rxq.csum_l2tun;
-   }
/* From now on, any failure will render the queue unusable.
 * Reinitialize WQ. */
mod = (struct ibv_exp_wq_attr){
-- 
2.1.4

[dpdk-dev] [PATCH v4 22/25] mlx5: work around spurious compilation errors

2016-06-22 Thread Nelio Laranjeiro

From: Adrien Mazarguil 

Since commit "mlx5: resurrect Tx gather support", older GCC versions (such
as 4.8.5) may complain about the following:

 mlx5_rxtx.c: In function `mlx5_tx_burst':
 mlx5_rxtx.c:705:25: error: `wqe' may be used uninitialized in this
 function [-Werror=maybe-uninitialized]

 mlx5_rxtx.c: In function `mlx5_tx_burst_inline':
 mlx5_rxtx.c:864:25: error: `wqe' may be used uninitialized in this
 function [-Werror=maybe-uninitialized]

In both cases, this code cannot be reached when wqe is not initialized.

Considering older GCC versions are still widely used, work around this
issue by initializing wqe preemptively, even if it should not be necessary.
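
The pattern can be sketched outside the driver as follows. This is only an
illustration under stated assumptions: struct entry and mark_last() are
hypothetical names, not mlx5 code. The pointer is assigned only inside a
loop that is guaranteed to run when the dereference is reached, which some
compilers cannot prove.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a ring entry; not the mlx5 WQE layout. */
struct entry {
	unsigned int flags;
};

/*
 * Clear n ring entries and flag the last one. "last" is dereferenced only
 * after the loop has assigned it (n > 0), but a compiler that cannot prove
 * this may emit -Wmaybe-uninitialized, hence the NULL initialization.
 */
static unsigned int
mark_last(struct entry *ring, unsigned int n)
{
	struct entry *last = NULL; /* Preemptive initialization. */
	unsigned int i;

	if (n == 0)
		return 0;
	for (i = 0; i != n; ++i) {
		ring[i].flags = 0;
		last = &ring[i];
	}
	assert(last != NULL);
	last->flags = 1;
	return i;
}
```

The initialization costs one store per call and does not change behavior,
since the early return already excludes the empty case.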

Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5_rxtx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index db784c0..2fc57dc 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -591,7 +591,7 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
unsigned int j = 0;
unsigned int max;
unsigned int comp;
-   volatile union mlx5_wqe *wqe;
+   volatile union mlx5_wqe *wqe = NULL;

if (unlikely(!pkts_n))
return 0;
@@ -733,7 +733,7 @@ mlx5_tx_burst_inline(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
unsigned int j = 0;
unsigned int max;
unsigned int comp;
-   volatile union mlx5_wqe *wqe;
+   volatile union mlx5_wqe *wqe = NULL;
unsigned int max_inline = txq->max_inline;

if (unlikely(!pkts_n))
-- 
2.1.4

[dpdk-dev] [PATCH v4 21/25] mlx5: resurrect Tx gather support

2016-06-22 Thread Nelio Laranjeiro

From: Adrien Mazarguil 

Compared to its previous incarnation, there is no longer a software limit
on the number of mbuf segments (previously MLX5_PMD_SGE_WR_N, set to 4 by
default). Linearization code and the related buffers, which permanently
consumed a non-negligible amount of memory to handle oversized mbufs, are
therefore no longer needed.

The resulting code is both lighter and faster.
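
As a rough sketch of what gathering means here (hypothetical, simplified
types; struct seg and struct dseg are stand-ins and not the rte_mbuf or
mlx5 layouts), one descriptor is emitted per segment instead of copying
the packet into a bounce buffer:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a chained packet buffer. */
struct seg {
	struct seg *next;  /* Next segment of the same packet, or NULL. */
	uintptr_t addr;    /* Address of this segment's data. */
	uint32_t data_len; /* Bytes in this segment. */
};

/* Hypothetical, simplified stand-in for a data segment descriptor. */
struct dseg {
	uintptr_t addr;
	uint32_t byte_count;
};

/*
 * Emit one descriptor per segment. Returns the number of descriptors
 * written, or 0 when the packet does not fit in the "room" entries left.
 */
static unsigned int
gather(const struct seg *pkt, unsigned int segs_n,
       struct dseg *out, unsigned int room)
{
	unsigned int n;

	assert(segs_n != 0);
	if (room < segs_n)
		return 0;
	for (n = 0; n != segs_n; ++n) {
		assert(pkt != NULL);
		out[n].addr = pkt->addr;
		out[n].byte_count = pkt->data_len;
		pkt = pkt->next;
	}
	return n;
}
```

No copy is made and no per-queue scratch memory is needed; the only bound
is the room left in the ring.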

Signed-off-by: Adrien Mazarguil 
Signed-off-by: Nelio Laranjeiro 
---
 drivers/net/mlx5/mlx5_rxtx.c | 231 +--
 drivers/net/mlx5/mlx5_txq.c  |   6 +-
 2 files changed, 182 insertions(+), 55 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 7097713..db784c0 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -301,6 +301,7 @@ mlx5_wqe_write(struct txq *txq, volatile union mlx5_wqe *wqe,
 {
wqe->wqe.ctrl.data[0] = htonl((txq->wqe_ci << 8) | MLX5_OPCODE_SEND);
wqe->wqe.ctrl.data[1] = htonl((txq->qp_num_8s) | 4);
+   wqe->wqe.ctrl.data[2] = 0;
wqe->wqe.ctrl.data[3] = 0;
wqe->inl.eseg.rsvd0 = 0;
wqe->inl.eseg.rsvd1 = 0;
@@ -346,6 +347,7 @@ mlx5_wqe_write_vlan(struct txq *txq, volatile union mlx5_wqe *wqe,

wqe->wqe.ctrl.data[0] = htonl((txq->wqe_ci << 8) | MLX5_OPCODE_SEND);
wqe->wqe.ctrl.data[1] = htonl((txq->qp_num_8s) | 4);
+   wqe->wqe.ctrl.data[2] = 0;
wqe->wqe.ctrl.data[3] = 0;
wqe->inl.eseg.rsvd0 = 0;
wqe->inl.eseg.rsvd1 = 0;
@@ -423,6 +425,7 @@ mlx5_wqe_write_inline(struct txq *txq, volatile union mlx5_wqe *wqe,
assert(size < 64);
wqe->inl.ctrl.data[0] = htonl((txq->wqe_ci << 8) | MLX5_OPCODE_SEND);
wqe->inl.ctrl.data[1] = htonl(txq->qp_num_8s | size);
+   wqe->inl.ctrl.data[2] = 0;
wqe->inl.ctrl.data[3] = 0;
wqe->inl.eseg.rsvd0 = 0;
wqe->inl.eseg.rsvd1 = 0;
@@ -496,6 +499,7 @@ mlx5_wqe_write_inline_vlan(struct txq *txq, volatile union mlx5_wqe *wqe,
assert(size < 64);
wqe->inl.ctrl.data[0] = htonl((txq->wqe_ci << 8) | MLX5_OPCODE_SEND);
wqe->inl.ctrl.data[1] = htonl(txq->qp_num_8s | size);
+   wqe->inl.ctrl.data[2] = 0;
wqe->inl.ctrl.data[3] = 0;
wqe->inl.eseg.rsvd0 = 0;
wqe->inl.eseg.rsvd1 = 0;
@@ -584,6 +588,7 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
uint16_t elts_head = txq->elts_head;
const unsigned int elts_n = txq->elts_n;
unsigned int i = 0;
+   unsigned int j = 0;
unsigned int max;
unsigned int comp;
volatile union mlx5_wqe *wqe;
@@ -600,21 +605,25 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
if (max > elts_n)
max -= elts_n;
do {
-   struct rte_mbuf *buf;
+   struct rte_mbuf *buf = *(pkts++);
unsigned int elts_head_next;
uintptr_t addr;
uint32_t length;
uint32_t lkey;
+   unsigned int segs_n = buf->nb_segs;
+   volatile struct mlx5_wqe_data_seg *dseg;
+   unsigned int ds = sizeof(*wqe) / 16;

/* Make sure there is enough room to store this packet and
 * that one ring entry remains unused. */
-   if (max < 1 + 1)
+   assert(segs_n);
+   if (max < segs_n + 1)
break;
-   --max;
+   max -= segs_n;
--pkts_n;
-   buf = *(pkts++);
elts_head_next = (elts_head + 1) & (elts_n - 1);
wqe = &(*txq->wqes)[txq->wqe_ci & (txq->wqe_n - 1)];
+   dseg = &wqe->wqe.dseg;
rte_prefetch0(wqe);
if (pkts_n)
rte_prefetch0(*pkts);
@@ -634,7 +643,6 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
buf->vlan_tci);
else
mlx5_wqe_write(txq, wqe, addr, length, lkey);
-   wqe->wqe.ctrl.data[2] = 0;
/* Should we enable HW CKSUM offload */
if (buf->ol_flags &
(PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM | PKT_TX_UDP_CKSUM)) {
@@ -643,6 +651,35 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
MLX5_ETH_WQE_L4_CSUM;
} else
wqe->wqe.eseg.cs_flags = 0;
+   while (--segs_n) {
+   /* Spill on next WQE when the current one does not have
+* enough room left. Size of WQE must be a multiple
+* of data segment size. */
+   assert(!(sizeof(*wqe) % sizeof(*dseg)));
+   if (!(ds % (sizeof(*wqe) / 16)))
+   dseg = (volatile void *)
+   &(*

[dpdk-dev] [PATCH v4 20/25] mlx5: check remaining space while processing Tx burst

2016-06-22 Thread Nelio Laranjeiro

From: Adrien Mazarguil 

The space necessary to store segmented packets cannot be known in advance
and must be verified for each of them.
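
A minimal sketch of the per-packet check, assuming each packet needs a
known number of ring entries (the segs array is a hypothetical input; the
driver would read this from each mbuf) and that one ring entry must always
remain unused:

```c
#include <assert.h>

/*
 * Return how many packets of the burst can be queued given "max" free ring
 * entries. segs[i] is the number of entries packet i needs.
 */
static unsigned int
burst_fit(const unsigned int *segs, unsigned int pkts_n, unsigned int max)
{
	unsigned int i = 0;

	while (i != pkts_n) {
		assert(segs[i] != 0);
		/* Room for this packet plus the entry that must stay free? */
		if (max < segs[i] + 1)
			break;
		max -= segs[i];
		++i;
	}
	return i;
}
```

With single-segment packets the test reduces to "max < 1 + 1", which is
the form used in this patch.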

Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5_rxtx.c | 136 ++-
 1 file changed, 70 insertions(+), 66 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 2ee504d..7097713 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -583,50 +583,49 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
struct txq *txq = (struct txq *)dpdk_txq;
uint16_t elts_head = txq->elts_head;
const unsigned int elts_n = txq->elts_n;
-   unsigned int i;
+   unsigned int i = 0;
unsigned int max;
unsigned int comp;
volatile union mlx5_wqe *wqe;
-   struct rte_mbuf *buf;

if (unlikely(!pkts_n))
return 0;
-   buf = pkts[0];
/* Prefetch first packet cacheline. */
tx_prefetch_cqe(txq, txq->cq_ci);
tx_prefetch_cqe(txq, txq->cq_ci + 1);
-   rte_prefetch0(buf);
+   rte_prefetch0(*pkts);
/* Start processing. */
txq_complete(txq);
max = (elts_n - (elts_head - txq->elts_tail));
if (max > elts_n)
max -= elts_n;
-   assert(max >= 1);
-   assert(max <= elts_n);
-   /* Always leave one free entry in the ring. */
-   --max;
-   if (max == 0)
-   return 0;
-   if (max > pkts_n)
-   max = pkts_n;
-   for (i = 0; (i != max); ++i) {
-   unsigned int elts_head_next = (elts_head + 1) & (elts_n - 1);
+   do {
+   struct rte_mbuf *buf;
+   unsigned int elts_head_next;
uintptr_t addr;
uint32_t length;
uint32_t lkey;

+   /* Make sure there is enough room to store this packet and
+* that one ring entry remains unused. */
+   if (max < 1 + 1)
+   break;
+   --max;
+   --pkts_n;
+   buf = *(pkts++);
+   elts_head_next = (elts_head + 1) & (elts_n - 1);
wqe = &(*txq->wqes)[txq->wqe_ci & (txq->wqe_n - 1)];
rte_prefetch0(wqe);
-   if (i + 1 < max)
-   rte_prefetch0(pkts[i + 1]);
+   if (pkts_n)
+   rte_prefetch0(*pkts);
/* Retrieve buffer information. */
addr = rte_pktmbuf_mtod(buf, uintptr_t);
length = DATA_LEN(buf);
/* Update element. */
(*txq->elts)[elts_head] = buf;
/* Prefetch next buffer data. */
-   if (i + 1 < max)
-   rte_prefetch0(rte_pktmbuf_mtod(pkts[i + 1],
+   if (pkts_n)
+   rte_prefetch0(rte_pktmbuf_mtod(*pkts,
   volatile void *));
/* Retrieve Memory Region key for this memory pool. */
lkey = txq_mp2mr(txq, txq_mb2mp(buf));
@@ -649,8 +648,8 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
txq->stats.obytes += length;
 #endif
elts_head = elts_head_next;
-   buf = pkts[i + 1];
-   }
+   ++i;
+   } while (pkts_n);
/* Take a shortcut if nothing must be sent. */
if (unlikely(i == 0))
return 0;
@@ -693,44 +692,43 @@ mlx5_tx_burst_inline(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
struct txq *txq = (struct txq *)dpdk_txq;
uint16_t elts_head = txq->elts_head;
const unsigned int elts_n = txq->elts_n;
-   unsigned int i;
+   unsigned int i = 0;
unsigned int max;
unsigned int comp;
volatile union mlx5_wqe *wqe;
-   struct rte_mbuf *buf;
unsigned int max_inline = txq->max_inline;

if (unlikely(!pkts_n))
return 0;
-   buf = pkts[0];
/* Prefetch first packet cacheline. */
tx_prefetch_cqe(txq, txq->cq_ci);
tx_prefetch_cqe(txq, txq->cq_ci + 1);
-   rte_prefetch0(buf);
+   rte_prefetch0(*pkts);
/* Start processing. */
txq_complete(txq);
max = (elts_n - (elts_head - txq->elts_tail));
if (max > elts_n)
max -= elts_n;
-   assert(max >= 1);
-   assert(max <= elts_n);
-   /* Always leave one free entry in the ring. */
-   --max;
-   if (max == 0)
-   return 0;
-   if (max > pkts_n)
-   max = pkts_n;
-   for (i = 0; (i != max); ++i) {
-   unsigned int elts_head_next = (elts_head + 1) & (elts_n - 1);
+   do {
+   struct rte_mbuf *buf;
+   unsigned int elts_head_next;
uintptr_t addr;
uint32_t length;
uint32_t

[dpdk-dev] [PATCH v4 19/25] mlx5: add debugging information about Tx queues capabilities

2016-06-22 Thread Nelio Laranjeiro

From: Adrien Mazarguil 

Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5_txq.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 4f17fb0..bae9f3d 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -343,6 +343,11 @@ txq_ctrl_setup(struct rte_eth_dev *dev, struct txq_ctrl *txq_ctrl,
  (void *)dev, strerror(ret));
goto error;
}
+   DEBUG("TX queue capabilities: max_send_wr=%u, max_send_sge=%u,"
+ " max_inline_data=%u",
+ attr.init.cap.max_send_wr,
+ attr.init.cap.max_send_sge,
+ attr.init.cap.max_inline_data);
attr.mod = (struct ibv_exp_qp_attr){
/* Move the QP to this state. */
.qp_state = IBV_QPS_INIT,
-- 
2.1.4

[dpdk-dev] [PATCH v4 18/25] mlx5: add support for multi-packet send

2016-06-22 Thread Nelio Laranjeiro

This feature enables the TX burst function to emit up to 5 packets using
only two WQEs on devices that support it. This saves PCI bandwidth and
improves performance.
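
How the burst function is chosen once multi-packet send (MPW) is taken
into account can be sketched as below. The first two branches follow the
priv_select_tx_function() hunk of this patch; the fallback to plain inline
send is an assumption, since the hunk is cut off in this archive. struct
tx_cfg and enum tx_func are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

enum tx_func { TX_DEFAULT, TX_INLINE, TX_MPW, TX_MPW_INLINE };

/* Hypothetical subset of the driver's private data. */
struct tx_cfg {
	bool sriov;               /* VF, or PF with VF devices. */
	bool mps;                 /* Multi-packet send enabled. */
	unsigned int txq_inline;  /* Maximum packet size for inlining. */
	unsigned int txqs_inline; /* Queue number threshold for inlining. */
	unsigned int txqs_n;      /* Number of TX queues. */
};

static enum tx_func
select_tx(const struct tx_cfg *c)
{
	/* MPW is not used on SR-IOV setups. */
	if (!c->sriov && c->mps && c->txq_inline)
		return TX_MPW_INLINE;
	if (!c->sriov && c->mps)
		return TX_MPW;
	/* Assumed fallback: the pre-existing inline send condition. */
	if (c->txq_inline && c->txqs_n >= c->txqs_inline)
		return TX_INLINE;
	return TX_DEFAULT;
}
```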

Signed-off-by: Nelio Laranjeiro 
Signed-off-by: Adrien Mazarguil 
Signed-off-by: Olga Shern 
---
 doc/guides/nics/mlx5.rst   |  10 ++
 drivers/net/mlx5/mlx5.c|  14 +-
 drivers/net/mlx5/mlx5_ethdev.c |  15 +-
 drivers/net/mlx5/mlx5_rxtx.c   | 400 +
 drivers/net/mlx5/mlx5_rxtx.h   |   2 +
 drivers/net/mlx5/mlx5_txq.c|   2 +-
 6 files changed, 439 insertions(+), 4 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 9ada221..063c4a5 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -171,6 +171,16 @@ Run-time configuration

   This option should be used in combination with ``txq_inline`` above.

+- ``txq_mpw_en`` parameter [int]
+
+  A nonzero value enables multi-packet send. This feature allows the TX
+  burst function to pack up to five packets in two descriptors in order to
+  save PCI bandwidth and improve performance at the cost of a slightly
+  higher CPU usage.
+
+  It is currently only supported on the ConnectX-4 Lx family of adapters.
+  Enabled by default.
+
 Prerequisites
 -

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 49c7ae8..884f824 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -79,6 +79,9 @@
  * enabling inline send. */
 #define MLX5_TXQS_MIN_INLINE "txqs_min_inline"

+/* Device parameter to enable multi-packet send WQEs. */
+#define MLX5_TXQ_MPW_EN "txq_mpw_en"
+
 /**
  * Retrieve integer value from environment variable.
  *
@@ -280,6 +283,8 @@ mlx5_args_check(const char *key, const char *val, void *opaque)
priv->txq_inline = tmp;
else if (strcmp(MLX5_TXQS_MIN_INLINE, key) == 0)
priv->txqs_inline = tmp;
+   else if (strcmp(MLX5_TXQ_MPW_EN, key) == 0)
+   priv->mps = !!tmp;
else {
WARN("%s: unknown parameter", key);
return -EINVAL;
@@ -305,6 +310,7 @@ mlx5_args(struct priv *priv, struct rte_devargs *devargs)
MLX5_RXQ_CQE_COMP_EN,
MLX5_TXQ_INLINE,
MLX5_TXQS_MIN_INLINE,
+   MLX5_TXQ_MPW_EN,
};
struct rte_kvargs *kvlist;
int ret = 0;
@@ -499,6 +505,7 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
priv->port = port;
priv->pd = pd;
priv->mtu = ETHER_MTU;
+   priv->mps = mps; /* Enable MPW by default if supported. */
priv->cqe_comp = 1; /* Enable compression by default. */
err = mlx5_args(priv, pci_dev->devargs);
if (err) {
@@ -547,7 +554,12 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)

priv_get_num_vfs(priv, &num_vfs);
priv->sriov = (num_vfs || sriov);
-   priv->mps = mps;
+   if (priv->mps && !mps) {
+   ERROR("multi-packet send not supported on this device"
+ " (" MLX5_TXQ_MPW_EN ")");
+   err = ENOTSUP;
+   goto port_error;
+   }
/* Allocate and register default RSS hash keys. */
priv->rss_conf = rte_calloc(__func__, hash_rxq_init_n,
sizeof((*priv->rss_conf)[0]), 0);
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index aeea4ff..698a50e 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -584,7 +584,8 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
  DEV_RX_OFFLOAD_UDP_CKSUM |
  DEV_RX_OFFLOAD_TCP_CKSUM) :
 0);
-   info->tx_offload_capa = DEV_TX_OFFLOAD_VLAN_INSERT;
+   if (!priv->mps)
+   info->tx_offload_capa = DEV_TX_OFFLOAD_VLAN_INSERT;
if (priv->hw_csum)
info->tx_offload_capa |=
(DEV_TX_OFFLOAD_IPV4_CKSUM |
@@ -1318,7 +1319,17 @@ void
 priv_select_tx_function(struct priv *priv)
 {
priv->dev->tx_pkt_burst = mlx5_tx_burst;
-   if (priv->txq_inline && (priv->txqs_n >= priv->txqs_inline)) {
+   /* Display warning for unsupported configurations. */
+   if (priv->sriov && priv->mps)
+   WARN("multi-packet send WQE cannot be used on a SR-IOV setup");
+   /* Select appropriate TX function. */
+   if ((priv->sriov == 0) && priv->mps && priv->txq_inline) {
+   priv->dev->tx_pkt_burst = mlx5_tx_burst_mpw_inline;
+   DEBUG("selected MPW inline TX function");
+   } else if ((priv->sriov == 0) && priv->mps) {
+   priv->dev->tx_pkt_burst = mlx5_tx_burst_mpw;
+   DEBUG("selected MPW TX function");
+

[dpdk-dev] [PATCH v4 17/25] mlx5: add support for inline send

2016-06-22 Thread Nelio Laranjeiro

From: Yaacov Hazan 

Implement the inline send feature, which copies packet data directly into
WQEs for improved latency. The maximum packet size and the minimum number
of Tx queues to qualify for inline send are user-configurable.

This feature is effective when HW causes a performance bottleneck.
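
The queue-count condition can be sketched as a small predicate. It mirrors
the test added to priv_select_tx_function() in this patch; struct
inline_cfg and use_inline_tx() are hypothetical names used only for
illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the driver's private data. */
struct inline_cfg {
	unsigned int txq_inline;  /* Maximum packet size for inlining. */
	unsigned int txqs_inline; /* Queue number threshold for inlining. */
	unsigned int txqs_n;      /* Number of TX queues. */
};

/* Inline send is selected only when enabled and enough queues exist. */
static bool
use_inline_tx(const struct inline_cfg *cfg)
{
	return cfg->txq_inline != 0 && cfg->txqs_n >= cfg->txqs_inline;
}
```

Leaving txq_inline at its default of 0 therefore keeps the regular burst
function regardless of the number of queues.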

Signed-off-by: Yaacov Hazan 
Signed-off-by: Adrien Mazarguil 
Signed-off-by: Nelio Laranjeiro 
---
 doc/guides/nics/mlx5.rst   |  17 +++
 drivers/net/mlx5/mlx5.c|  13 ++
 drivers/net/mlx5/mlx5.h|   2 +
 drivers/net/mlx5/mlx5_ethdev.c |   5 +
 drivers/net/mlx5/mlx5_rxtx.c   | 271 +
 drivers/net/mlx5/mlx5_rxtx.h   |   2 +
 drivers/net/mlx5/mlx5_txq.c|   4 +
 7 files changed, 314 insertions(+)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 756153b..9ada221 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -154,6 +154,23 @@ Run-time configuration
   allows to save PCI bandwidth and improve performance at the cost of a
   slightly higher CPU usage.  Enabled by default.

+- ``txq_inline`` parameter [int]
+
+  Amount of data to be inlined during TX operations. Improves latency.
+  Can improve PPS performance when PCI back pressure is detected and may be
+  useful for scenarios involving heavy traffic on many queues.
+
+  It is not enabled by default (set to 0) since the additional software
+  logic necessary to handle this mode can lower performance when back
+  pressure is not expected.
+
+- ``txqs_min_inline`` parameter [int]
+
+  Enable inline send only when the number of TX queues is greater or equal
+  to this value.
+
+  This option should be used in combination with ``txq_inline`` above.
+
 Prerequisites
 -

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index ec4e0b6..49c7ae8 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -72,6 +72,13 @@
 /* Device parameter to enable RX completion queue compression. */
 #define MLX5_RXQ_CQE_COMP_EN "rxq_cqe_comp_en"

+/* Device parameter to configure inline send. */
+#define MLX5_TXQ_INLINE "txq_inline"
+
+/* Device parameter to configure the number of TX queues threshold for
+ * enabling inline send. */
+#define MLX5_TXQS_MIN_INLINE "txqs_min_inline"
+
 /**
  * Retrieve integer value from environment variable.
  *
@@ -269,6 +276,10 @@ mlx5_args_check(const char *key, const char *val, void *opaque)
}
if (strcmp(MLX5_RXQ_CQE_COMP_EN, key) == 0)
priv->cqe_comp = !!tmp;
+   else if (strcmp(MLX5_TXQ_INLINE, key) == 0)
+   priv->txq_inline = tmp;
+   else if (strcmp(MLX5_TXQS_MIN_INLINE, key) == 0)
+   priv->txqs_inline = tmp;
else {
WARN("%s: unknown parameter", key);
return -EINVAL;
@@ -292,6 +303,8 @@ mlx5_args(struct priv *priv, struct rte_devargs *devargs)
 {
static const char *params[] = {
MLX5_RXQ_CQE_COMP_EN,
+   MLX5_TXQ_INLINE,
+   MLX5_TXQS_MIN_INLINE,
};
struct rte_kvargs *kvlist;
int ret = 0;
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 8f5a6df..3a86609 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -113,6 +113,8 @@ struct priv {
unsigned int mps:1; /* Whether multi-packet send is supported. */
unsigned int cqe_comp:1; /* Whether CQE compression is enabled. */
unsigned int pending_alarm:1; /* An alarm is pending. */
+   unsigned int txq_inline; /* Maximum packet size for inlining. */
+   unsigned int txqs_inline; /* Queue number threshold for inlining. */
/* RX/TX queues. */
unsigned int rxqs_n; /* RX queues array size. */
unsigned int txqs_n; /* TX queues array size. */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 47e64b2..aeea4ff 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1318,6 +1318,11 @@ void
 priv_select_tx_function(struct priv *priv)
 {
priv->dev->tx_pkt_burst = mlx5_tx_burst;
+   if (priv->txq_inline && (priv->txqs_n >= priv->txqs_inline)) {
+   priv->dev->tx_pkt_burst = mlx5_tx_burst_inline;
+   DEBUG("selected inline TX function (%u >= %u queues)",
+ priv->txqs_n, priv->txqs_inline);
+   }
 }

 /**
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index d56c9e9..43fe532 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -374,6 +374,139 @@ mlx5_wqe_write_vlan(struct txq *txq, volatile union mlx5_wqe *wqe,
 }

 /**
+ * Write an inline WQE.
+ *
+ * @param txq
+ *   Pointer to TX queue structure.
+ * @param wqe
+ *   Pointer to the WQE to fill.
+ * @param addr
+ *   Buffer data address.
+ * @param length
+ *   Packet length.
+ * @param lkey
+ *   Memory region lkey.
+ */
+static inline void
+mlx5_wqe_write_inline(struct txq *txq, volatile union

[dpdk-dev] [PATCH v4 16/25] mlx5: replace countdown with threshold for Tx completions

2016-06-22 Thread Nelio Laranjeiro

From: Adrien Mazarguil 

Replacing the variable countdown (which depends on the number of
descriptors) with a fixed relative threshold known at compile time improves
performance by reducing the TX queue structure footprint and the amount of
code to manage completions during a burst.

Completions are now requested at most once per burst, after the threshold
is reached.
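
A minimal sketch of the accounting, assuming a per-queue counter of
descriptors posted since the last completion request (struct txq_state and
burst_done() are hypothetical; TX_COMP_THRESH mirrors MLX5_TX_COMP_THRESH
from this patch):

```c
#include <assert.h>
#include <stdbool.h>

#define TX_COMP_THRESH 32

/* Hypothetical per-queue state. */
struct txq_state {
	unsigned int elts_comp; /* Descriptors since the last request. */
};

/*
 * Account for "sent" descriptors posted by one burst. Returns true when
 * the last descriptor of the burst should request a completion.
 */
static bool
burst_done(struct txq_state *txq, unsigned int sent)
{
	unsigned int comp = txq->elts_comp + sent;

	if (comp >= TX_COMP_THRESH) {
		txq->elts_comp = 0;
		return true;
	}
	txq->elts_comp = comp;
	return false;
}
```

The check runs once per burst instead of once per packet, so the
per-packet path no longer has to decrement and test a countdown.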

Signed-off-by: Adrien Mazarguil 
Signed-off-by: Nelio Laranjeiro 
Signed-off-by: Vasily Philipov 
---
 drivers/net/mlx5/mlx5_defs.h |  7 +--
 drivers/net/mlx5/mlx5_rxtx.c | 42 --
 drivers/net/mlx5/mlx5_rxtx.h |  5 ++---
 drivers/net/mlx5/mlx5_txq.c  | 19 ---
 4 files changed, 43 insertions(+), 30 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index 8d2ec7a..cc2a6f3 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -48,8 +48,11 @@
 /* Maximum number of special flows. */
 #define MLX5_MAX_SPECIAL_FLOWS 4

-/* Request send completion once in every 64 sends, might be less. */
-#define MLX5_PMD_TX_PER_COMP_REQ 64
+/*
+ * Request TX completion every time descriptors reach this threshold since
+ * the previous request. Must be a power of two for performance reasons.
+ */
+#define MLX5_TX_COMP_THRESH 32

 /* RSS Indirection table size. */
 #define RSS_INDIRECTION_TABLE_SIZE 256
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 30d413c..d56c9e9 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -154,9 +154,6 @@ check_cqe64(volatile struct mlx5_cqe64 *cqe,
  * Manage TX completions.
  *
  * When sending a burst, mlx5_tx_burst() posts several WRs.
- * To improve performance, a completion event is only required once every
- * MLX5_PMD_TX_PER_COMP_REQ sends. Doing so discards completion information
- * for other WRs, but this information would not be used anyway.
  *
  * @param txq
  *   Pointer to TX queue structure.
@@ -170,14 +167,16 @@ txq_complete(struct txq *txq)
uint16_t elts_free = txq->elts_tail;
uint16_t elts_tail;
uint16_t cq_ci = txq->cq_ci;
-   unsigned int wqe_ci = (unsigned int)-1;
+   volatile struct mlx5_cqe64 *cqe = NULL;
+   volatile union mlx5_wqe *wqe;

do {
-   unsigned int idx = cq_ci & cqe_cnt;
-   volatile struct mlx5_cqe64 *cqe = &(*txq->cqes)[idx].cqe64;
+   volatile struct mlx5_cqe64 *tmp;

-   if (check_cqe64(cqe, cqe_n, cq_ci) == 1)
+   tmp = &(*txq->cqes)[cq_ci & cqe_cnt].cqe64;
+   if (check_cqe64(tmp, cqe_n, cq_ci))
break;
+   cqe = tmp;
 #ifndef NDEBUG
if (MLX5_CQE_FORMAT(cqe->op_own) == MLX5_COMPRESSED) {
if (!check_cqe64_seen(cqe))
@@ -191,14 +190,15 @@ txq_complete(struct txq *txq)
return;
}
 #endif /* NDEBUG */
-   wqe_ci = ntohs(cqe->wqe_counter);
++cq_ci;
} while (1);
-   if (unlikely(wqe_ci == (unsigned int)-1))
+   if (unlikely(cqe == NULL))
return;
+   wqe = &(*txq->wqes)[htons(cqe->wqe_counter) & (txq->wqe_n - 1)];
+   elts_tail = wqe->wqe.ctrl.data[3];
+   assert(elts_tail < txq->wqe_n);
/* Free buffers. */
-   elts_tail = (wqe_ci + 1) & (elts_n - 1);
-   do {
+   while (elts_free != elts_tail) {
struct rte_mbuf *elt = (*txq->elts)[elts_free];
unsigned int elts_free_next =
(elts_free + 1) & (elts_n - 1);
@@ -214,7 +214,7 @@ txq_complete(struct txq *txq)
/* Only one segment needs to be freed. */
rte_pktmbuf_free_seg(elt);
elts_free = elts_free_next;
-   } while (elts_free != elts_tail);
+   }
txq->cq_ci = cq_ci;
txq->elts_tail = elts_tail;
/* Update the consumer index. */
@@ -435,6 +435,7 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
const unsigned int elts_n = txq->elts_n;
unsigned int i;
unsigned int max;
+   unsigned int comp;
volatile union mlx5_wqe *wqe;
struct rte_mbuf *buf;

@@ -484,12 +485,7 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
buf->vlan_tci);
else
mlx5_wqe_write(txq, wqe, addr, length, lkey);
-   /* Request completion if needed. */
-   if (unlikely(--txq->elts_comp == 0)) {
-   wqe->wqe.ctrl.data[2] = htonl(8);
-   txq->elts_comp = txq->elts_comp_cd_init;
-   } else
-   wqe->wqe.ctrl.data[2] = 0;
+   wqe->wqe.ctrl.data[2] = 0;
/* Should we enable HW CKSUM offload */
if (buf->ol_flags &
(PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM |

[dpdk-dev] [PATCH v4 15/25] mlx5: handle Rx CQE compression

2016-06-22 Thread Nelio Laranjeiro

Mini (compressed) CQEs are returned by the NIC when PCI back pressure is
detected, in which case the first CQE64 contains common packet information
followed by a number of CQE8 providing the rest, followed by a matching
number of empty CQE64 entries to be used by software for decompression.

Before decompression:

      0           1          2           6         7         8
  +-------+  +---------+ +-------+   +-------+ +-------+ +-------+
  | CQE64 |  |  CQE64  | | CQE64 |   | CQE64 | | CQE64 | | CQE64 |
  |-------|  |---------| |-------|   |-------| |-------| |-------|
  | ..... |  | cqe8[0] | |       | . |       | |       | | ..... |
  | ..... |  | cqe8[1] | |       | . |       | |       | | ..... |
  | ..... |  | ....... | |       | . |       | |       | | ..... |
  | ..... |  | cqe8[7] | |       |   |       | |       | | ..... |
  +-------+  +---------+ +-------+   +-------+ +-------+ +-------+

After decompression:

      0          1     ...     8
  +-------+  +-------+     +-------+
  | CQE64 |  | CQE64 |     | CQE64 |
  |-------|  |-------|     |-------|
  | ..... |  | ..... |  .  | ..... |
  | ..... |  | ..... |  .  | ..... |
  | ..... |  | ..... |  .  | ..... |
  | ..... |  | ..... |     | ..... |
  +-------+  +-------+     +-------+

This patch does not perform the entire decompression step, as it would be
really expensive. Instead, the first CQE64 is consumed and an internal
context is maintained to interpret the following CQE8 entries directly.

Intermediate empty CQE64 entries are handed back to HW without further
processing.
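
The idea of keeping a context instead of rebuilding full entries can be
sketched as follows. All types and names here (struct mini_cqe, struct
zip_ctx, zip_open(), zip_next()) are hypothetical and much simpler than
the hardware layout; only a packet length is carried per mini entry.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mini completion entry: only the per-packet part. */
struct mini_cqe {
	uint32_t byte_cnt;
};

/* Hypothetical context kept between calls while a session is open. */
struct zip_ctx {
	const struct mini_cqe *minis; /* Array of mini entries. */
	unsigned int cqe_cnt;         /* Entries in the session. */
	unsigned int ai;              /* Next entry to read. */
};

/* Open a session once the first CQE64 announced n compressed entries. */
static void
zip_open(struct zip_ctx *zip, const struct mini_cqe *minis, unsigned int n)
{
	zip->minis = minis;
	zip->cqe_cnt = n;
	zip->ai = 0;
}

/*
 * Return the length of the next packet, read straight from the mini
 * entry, or 0 when the session is exhausted. No CQE64 is rebuilt.
 */
static uint32_t
zip_next(struct zip_ctx *zip)
{
	if (zip->ai == zip->cqe_cnt)
		return 0;
	return zip->minis[zip->ai++].byte_cnt;
}
```

Once zip_next() reports exhaustion, the caller would hand the unused
CQE64 slots back to hardware and resume normal processing.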

Signed-off-by: Nelio Laranjeiro 
Signed-off-by: Adrien Mazarguil 
Signed-off-by: Olga Shern 
Signed-off-by: Vasily Philipov 
---
 doc/guides/nics/mlx5.rst |   6 +
 drivers/net/mlx5/mlx5.c  |  25 -
 drivers/net/mlx5/mlx5.h  |   1 +
 drivers/net/mlx5/mlx5_rxq.c  |   9 +-
 drivers/net/mlx5/mlx5_rxtx.c | 260 ---
 drivers/net/mlx5/mlx5_rxtx.h |  11 ++
 drivers/net/mlx5/mlx5_txq.c  |   5 +
 7 files changed, 248 insertions(+), 69 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 3a07928..756153b 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -148,6 +148,12 @@ Run-time configuration

 - **ethtool** operations on related kernel interfaces also affect the PMD.

+- ``rxq_cqe_comp_en`` parameter [int]
+
+  A nonzero value enables the compression of CQE on RX side. This feature
+  allows to save PCI bandwidth and improve performance at the cost of a
+  slightly higher CPU usage.  Enabled by default.
+
 Prerequisites
 -

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 98884f7..ec4e0b6 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -69,6 +69,9 @@
 #include "mlx5_autoconf.h"
 #include "mlx5_defs.h"

+/* Device parameter to enable RX completion queue compression. */
+#define MLX5_RXQ_CQE_COMP_EN "rxq_cqe_comp_en"
+
 /**
  * Retrieve integer value from environment variable.
  *
@@ -256,12 +259,21 @@ static int
 mlx5_args_check(const char *key, const char *val, void *opaque)
 {
struct priv *priv = opaque;
+   unsigned long tmp;

-   /* No parameters are expected at the moment. */
-   (void)priv;
-   (void)val;
-   WARN("%s: unknown parameter", key);
-   return -EINVAL;
+   errno = 0;
+   tmp = strtoul(val, NULL, 0);
+   if (errno) {
+   WARN("%s: \"%s\" is not a valid integer", key, val);
+   return errno;
+   }
+   if (strcmp(MLX5_RXQ_CQE_COMP_EN, key) == 0)
+   priv->cqe_comp = !!tmp;
+   else {
+   WARN("%s: unknown parameter", key);
+   return -EINVAL;
+   }
+   return 0;
 }

 /**
@@ -279,7 +291,7 @@ static int
 mlx5_args(struct priv *priv, struct rte_devargs *devargs)
 {
static const char *params[] = {
-   NULL,
+   MLX5_RXQ_CQE_COMP_EN,
};
struct rte_kvargs *kvlist;
int ret = 0;
@@ -474,6 +486,7 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
priv->port = port;
priv->pd = pd;
priv->mtu = ETHER_MTU;
+   priv->cqe_comp = 1; /* Enable compression by default. */
err = mlx5_args(priv, pci_dev->devargs);
if (err) {
ERROR("failed to process device arguments: %s",
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 3dca03d..8f5a6df 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -111,6 +111,7 @@ struct priv {
unsigned int hw_padding:1; /* End alignment padding is supported. */
unsigned int sriov:1; /* This is a VF or PF with VF devices. */
unsigned int mps:1; /* Whether multi-packet send is supported. */
+   unsigned int cqe_comp:1; /* Whether CQE compression is enabled. */
unsigned int pending_alarm:1; /* An alarm is pe

[dpdk-dev] [PATCH v4 14/25] mlx5: refactor Tx data path

2016-06-22 Thread Nelio Laranjeiro

Bypass Verbs to improve Tx performance.

Signed-off-by: Nelio Laranjeiro 
Signed-off-by: Yaacov Hazan 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/Makefile  |   5 -
 drivers/net/mlx5/mlx5_ethdev.c |  10 +-
 drivers/net/mlx5/mlx5_mr.c |   4 +-
 drivers/net/mlx5/mlx5_rxtx.c   | 359 ++---
 drivers/net/mlx5/mlx5_rxtx.h   |  52 +++---
 drivers/net/mlx5/mlx5_txq.c| 216 +
 6 files changed, 343 insertions(+), 303 deletions(-)

diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index dc99797..66687e8 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -106,11 +106,6 @@ mlx5_autoconf.h.new: FORCE
 mlx5_autoconf.h.new: $(RTE_SDK)/scripts/auto-config-h.sh
$Q $(RM) -f -- '$@'
$Q sh -- '$<' '$@' \
-   HAVE_VERBS_VLAN_INSERTION \
-   infiniband/verbs.h \
-   enum IBV_EXP_RECEIVE_WQ_CVLAN_INSERTION \
-   $(AUTOCONF_OUTPUT)
-   $Q sh -- '$<' '$@' \
HAVE_VERBS_IBV_EXP_CQ_COMPRESSED_CQE \
infiniband/verbs_exp.h \
enum IBV_EXP_CQ_COMPRESSED_CQE \
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 16b05d3..47e64b2 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1242,11 +1242,11 @@ mlx5_secondary_data_setup(struct priv *priv)
txq_ctrl = rte_calloc_socket("TXQ", 1, sizeof(*txq_ctrl), 0,
 primary_txq_ctrl->socket);
if (txq_ctrl != NULL) {
-   if (txq_setup(priv->dev,
- primary_txq_ctrl,
- primary_txq->elts_n,
- primary_txq_ctrl->socket,
- NULL) == 0) {
+   if (txq_ctrl_setup(priv->dev,
+  primary_txq_ctrl,
+  primary_txq->elts_n,
+  primary_txq_ctrl->socket,
+  NULL) == 0) {
txq_ctrl->txq.stats.idx =
primary_txq->stats.idx;
tx_queues[i] = &txq_ctrl->txq;
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 79d5568..e5e8a04 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -189,7 +189,7 @@ txq_mp2mr_reg(struct txq *txq, struct rte_mempool *mp, unsigned int idx)
/* Add a new entry, register MR first. */
DEBUG("%p: discovered new memory pool \"%s\" (%p)",
  (void *)txq_ctrl, mp->name, (void *)mp);
-   mr = mlx5_mp2mr(txq_ctrl->txq.priv->pd, mp);
+   mr = mlx5_mp2mr(txq_ctrl->priv->pd, mp);
if (unlikely(mr == NULL)) {
DEBUG("%p: unable to configure MR, ibv_reg_mr() failed.",
  (void *)txq_ctrl);
@@ -208,7 +208,7 @@ txq_mp2mr_reg(struct txq *txq, struct rte_mempool *mp, unsigned int idx)
/* Store the new entry. */
txq_ctrl->txq.mp2mr[idx].mp = mp;
txq_ctrl->txq.mp2mr[idx].mr = mr;
-   txq_ctrl->txq.mp2mr[idx].lkey = mr->lkey;
+   txq_ctrl->txq.mp2mr[idx].lkey = htonl(mr->lkey);
DEBUG("%p: new MR lkey for MP \"%s\" (%p): 0x%08" PRIu32,
  (void *)txq_ctrl, mp->name, (void *)mp,
  txq_ctrl->txq.mp2mr[idx].lkey);
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 27d8852..95bf981 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -119,68 +119,52 @@ get_cqe64(volatile struct mlx5_cqe cqes[],
  *
  * @param txq
  *   Pointer to TX queue structure.
- *
- * @return
- *   0 on success, -1 on failure.
  */
-static int
+static void
 txq_complete(struct txq *txq)
 {
-   unsigned int elts_comp = txq->elts_comp;
-   unsigned int elts_tail = txq->elts_tail;
-   unsigned int elts_free = txq->elts_tail;
const unsigned int elts_n = txq->elts_n;
-   int wcs_n;
-
-   if (unlikely(elts_comp == 0))
-   return 0;
-#ifdef DEBUG_SEND
-   DEBUG("%p: processing %u work requests completions",
- (void *)txq, elts_comp);
-#endif
-   wcs_n = txq->poll_cnt(txq->cq, elts_comp);
-   if (unlikely(wcs_n == 0))
-   return 0;
-   if (unlikely(wcs_n < 0)) {
-   DEBUG("%p: ibv_poll_cq() failed (wcs_n=%d)",
- (void *)txq, wcs_n);
-   return -1;
+   const unsigned int cqe_n = txq->cqe_n;
+   uint16_t elts_free = txq->elts_tail;
+   uint16_t elts_tail;
+   uint16_t cq_ci = txq->cq_ci;
+   unsigned int wqe_ci = (unsigned int)-1;
+   int ret = 0;
+
+   while (ret == 0) {
+   volatile struct mlx5_cqe64 *cqe;
+
+   cqe = get_cqe64(*txq->cqes, cqe_n

[dpdk-dev] [PATCH v4 13/25] mlx5: refactor Rx data path

2016-06-22 Thread Nelio Laranjeiro

Bypass Verbs to improve RX performance.

Signed-off-by: Nelio Laranjeiro 
Signed-off-by: Yaacov Hazan 
Signed-off-by: Adrien Mazarguil 
Signed-off-by: Vasily Philipov 
---
 drivers/net/mlx5/mlx5_ethdev.c |   4 +-
 drivers/net/mlx5/mlx5_fdir.c   |   2 +-
 drivers/net/mlx5/mlx5_rxq.c| 303 -
 drivers/net/mlx5/mlx5_rxtx.c   | 289 ---
 drivers/net/mlx5/mlx5_rxtx.h   |  38 +++---
 drivers/net/mlx5/mlx5_vlan.c   |   3 +-
 6 files changed, 325 insertions(+), 314 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 759434e..16b05d3 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1263,7 +1263,9 @@ mlx5_secondary_data_setup(struct priv *priv)
}
/* RX queues. */
for (i = 0; i != nb_rx_queues; ++i) {
-   struct rxq *primary_rxq = (*sd->primary_priv->rxqs)[i];
+   struct rxq_ctrl *primary_rxq =
+   container_of((*sd->primary_priv->rxqs)[i],
+struct rxq_ctrl, rxq);

if (primary_rxq == NULL)
continue;
diff --git a/drivers/net/mlx5/mlx5_fdir.c b/drivers/net/mlx5/mlx5_fdir.c
index 1850218..73eb00e 100644
--- a/drivers/net/mlx5/mlx5_fdir.c
+++ b/drivers/net/mlx5/mlx5_fdir.c
@@ -431,7 +431,7 @@ priv_get_fdir_queue(struct priv *priv, uint16_t idx)
ind_init_attr = (struct ibv_exp_rwq_ind_table_init_attr){
.pd = priv->pd,
.log_ind_tbl_size = 0,
-   .ind_tbl = &((*priv->rxqs)[idx]->wq),
+   .ind_tbl = &rxq_ctrl->wq,
.comp_mask = 0,
};

diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 7db4ce7..a8f68a3 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -43,6 +43,8 @@
 #pragma GCC diagnostic ignored "-pedantic"
 #endif
 #include 
+#include 
+#include 
 #ifdef PEDANTIC
 #pragma GCC diagnostic error "-pedantic"
 #endif
@@ -373,8 +375,13 @@ priv_create_hash_rxqs(struct priv *priv)
DEBUG("indirection table extended to assume %u WQs",
  priv->reta_idx_n);
}
-   for (i = 0; (i != priv->reta_idx_n); ++i)
-   wqs[i] = (*priv->rxqs)[(*priv->reta_idx)[i]]->wq;
+   for (i = 0; (i != priv->reta_idx_n); ++i) {
+   struct rxq_ctrl *rxq_ctrl;
+
+   rxq_ctrl = container_of((*priv->rxqs)[(*priv->reta_idx)[i]],
+   struct rxq_ctrl, rxq);
+   wqs[i] = rxq_ctrl->wq;
+   }
/* Get number of hash RX queues to configure. */
for (i = 0, hash_rxqs_n = 0; (i != ind_tables_n); ++i)
hash_rxqs_n += ind_table_init[i].hash_types_n;
@@ -638,21 +645,13 @@ rxq_alloc_elts(struct rxq_ctrl *rxq_ctrl, unsigned int elts_n,
   struct rte_mbuf **pool)
 {
unsigned int i;
-   struct rxq_elt (*elts)[elts_n] =
-   rte_calloc_socket("RXQ elements", 1, sizeof(*elts), 0,
- rxq_ctrl->socket);
int ret = 0;

-   if (elts == NULL) {
-   ERROR("%p: can't allocate packets array", (void *)rxq_ctrl);
-   ret = ENOMEM;
-   goto error;
-   }
/* For each WR (packet). */
for (i = 0; (i != elts_n); ++i) {
-   struct rxq_elt *elt = &(*elts)[i];
-   struct ibv_sge *sge = &(*elts)[i].sge;
struct rte_mbuf *buf;
+   volatile struct mlx5_wqe_data_seg *scat =
+   &(*rxq_ctrl->rxq.wqes)[i];

if (pool != NULL) {
buf = *(pool++);
@@ -666,40 +665,36 @@ rxq_alloc_elts(struct rxq_ctrl *rxq_ctrl, unsigned int elts_n,
ret = ENOMEM;
goto error;
}
-   elt->buf = buf;
/* Headroom is reserved by rte_pktmbuf_alloc(). */
assert(DATA_OFF(buf) == RTE_PKTMBUF_HEADROOM);
/* Buffer is supposed to be empty. */
assert(rte_pktmbuf_data_len(buf) == 0);
assert(rte_pktmbuf_pkt_len(buf) == 0);
-   /* sge->addr must be able to store a pointer. */
-   assert(sizeof(sge->addr) >= sizeof(uintptr_t));
-   /* SGE keeps its headroom. */
-   sge->addr = (uintptr_t)
-   ((uint8_t *)buf->buf_addr + RTE_PKTMBUF_HEADROOM);
-   sge->length = (buf->buf_len - RTE_PKTMBUF_HEADROOM);
-   sge->lkey = rxq_ctrl->mr->lkey;
-   /* Redundant check for tailroom. */
-   assert(sge->length == rte_pktmbuf_tailroom(buf));
+   assert(!buf->next);
+   PORT(buf) = rxq_ctrl->rxq.port_id;
+   DATA_LEN(buf) = rte_pktmbuf_tailroom(buf);
+   PKT_LEN(buf) = DATA_LEN(buf);
+   NB_SEGS

[dpdk-dev] [PATCH v4 12/25] mlx5: add Tx/Rx burst function selection wrapper

2016-06-22 Thread Nelio Laranjeiro

These wrappers are meant to prevent code duplication later.

Signed-off-by: Nelio Laranjeiro 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5.h|  2 ++
 drivers/net/mlx5/mlx5_ethdev.c | 34 --
 drivers/net/mlx5/mlx5_txq.c|  2 +-
 3 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 935e1b0..3dca03d 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -196,6 +196,8 @@ void priv_dev_interrupt_handler_install(struct priv *, struct rte_eth_dev *);
 int mlx5_set_link_down(struct rte_eth_dev *dev);
 int mlx5_set_link_up(struct rte_eth_dev *dev);
 struct priv *mlx5_secondary_data_setup(struct priv *priv);
+void priv_select_tx_function(struct priv *);
+void priv_select_rx_function(struct priv *);

 /* mlx5_mac.c */

diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 4095a06..759434e 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1099,8 +1099,8 @@ priv_set_link(struct priv *priv, int up)
err = priv_set_flags(priv, ~IFF_UP, IFF_UP);
if (err)
return err;
-   dev->rx_pkt_burst = mlx5_rx_burst;
-   dev->tx_pkt_burst = mlx5_tx_burst;
+   priv_select_tx_function(priv);
+   priv_select_rx_function(priv);
} else {
err = priv_set_flags(priv, ~IFF_UP, ~IFF_UP);
if (err)
@@ -1290,13 +1290,11 @@ mlx5_secondary_data_setup(struct priv *priv)
rte_mb();
priv->dev->data = &sd->data;
rte_mb();
-   priv->dev->tx_pkt_burst = mlx5_tx_burst;
-   priv->dev->rx_pkt_burst = removed_rx_burst;
+   priv_select_tx_function(priv);
+   priv_select_rx_function(priv);
priv_unlock(priv);
 end:
/* More sanity checks. */
-   assert(priv->dev->tx_pkt_burst == mlx5_tx_burst);
-   assert(priv->dev->rx_pkt_burst == removed_rx_burst);
assert(priv->dev->data == &sd->data);
rte_spinlock_unlock(&sd->lock);
return priv;
@@ -1307,3 +1305,27 @@ error:
rte_spinlock_unlock(&sd->lock);
return NULL;
 }
+
+/**
+ * Configure the TX function to use.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ */
+void
+priv_select_tx_function(struct priv *priv)
+{
+   priv->dev->tx_pkt_burst = mlx5_tx_burst;
+}
+
+/**
+ * Configure the RX function to use.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ */
+void
+priv_select_rx_function(struct priv *priv)
+{
+   priv->dev->rx_pkt_burst = mlx5_rx_burst;
+}
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 9f3a33b..d7cc39d 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -477,7 +477,7 @@ mlx5_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
  (void *)dev, (void *)txq_ctrl);
(*priv->txqs)[idx] = &txq_ctrl->txq;
/* Update send callback. */
-   dev->tx_pkt_burst = mlx5_tx_burst;
+   priv_select_tx_function(priv);
}
priv_unlock(priv);
return -ret;
-- 
2.1.4
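
The wrapper idea in this patch can be illustrated outside the driver. What follows is only a minimal sketch, assuming simplified stand-in types: every `demo_*` name and the `alt_tx` flag are hypothetical and do not appear in the mlx5 PMD. It shows why routing every assignment of the burst callbacks through one function helps: once a second Tx variant exists, only the selector has to change, not each call site.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the ethdev burst callback and device types. */
typedef uint16_t (*demo_burst_t)(void *queue, void **pkts, uint16_t pkts_n);

struct demo_dev {
	demo_burst_t tx_pkt_burst;
	demo_burst_t rx_pkt_burst;
};

struct demo_priv {
	struct demo_dev *dev;
	unsigned int alt_tx:1; /* Hypothetical flag selecting another Tx variant. */
};

/* Default Tx burst: pretends that every packet was sent. */
static uint16_t
demo_tx_burst(void *queue, void **pkts, uint16_t pkts_n)
{
	(void)queue;
	(void)pkts;
	return pkts_n;
}

/* Alternative Tx burst: pretends that nothing could be sent. */
static uint16_t
demo_tx_burst_alt(void *queue, void **pkts, uint16_t pkts_n)
{
	(void)queue;
	(void)pkts;
	(void)pkts_n;
	return 0;
}

/* Rx burst: pretends that nothing was received. */
static uint16_t
demo_rx_burst(void *queue, void **pkts, uint16_t pkts_n)
{
	(void)queue;
	(void)pkts;
	(void)pkts_n;
	return 0;
}

/* Single place where the Tx callback is chosen. */
static void
demo_select_tx_function(struct demo_priv *priv)
{
	assert(priv != NULL && priv->dev != NULL);
	priv->dev->tx_pkt_burst =
		priv->alt_tx ? demo_tx_burst_alt : demo_tx_burst;
}

/* Single place where the Rx callback is chosen. */
static void
demo_select_rx_function(struct demo_priv *priv)
{
	assert(priv != NULL && priv->dev != NULL);
	priv->dev->rx_pkt_burst = demo_rx_burst;
}
```

Callers such as a link-up handler or a queue setup routine would then call the two selectors instead of assigning the function pointers themselves.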

[dpdk-dev] [PATCH v4 11/25] mlx5: add support for configuration through kvargs

2016-06-22 Thread Nelio Laranjeiro

The intent is to replace the remaining compile-time options and environment
variables with a common means of runtime configuration. This commit only
adds the kvargs handling code; subsequent commits will update the rest.

Signed-off-by: Nelio Laranjeiro 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5.c | 72 +
 1 file changed, 72 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 3f45d84..98884f7 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 

 /* Verbs header. */
@@ -57,6 +58,7 @@
 #include 
 #include 
 #include 
+#include 
 #ifdef PEDANTIC
 #pragma GCC diagnostic error "-pedantic"
 #endif
@@ -237,6 +239,70 @@ mlx5_dev_idx(struct rte_pci_addr *pci_addr)
return ret;
 }

+/**
+ * Verify and store value for device argument.
+ *
+ * @param[in] key
+ *   Key argument to verify.
+ * @param[in] val
+ *   Value associated with key.
+ * @param opaque
+ *   User data.
+ *
+ * @return
+ *   0 on success, negative errno value on failure.
+ */
+static int
+mlx5_args_check(const char *key, const char *val, void *opaque)
+{
+   struct priv *priv = opaque;
+
+   /* No parameters are expected at the moment. */
+   (void)priv;
+   (void)val;
+   WARN("%s: unknown parameter", key);
+   return -EINVAL;
+}
+
+/**
+ * Parse device parameters.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ * @param devargs
+ *   Device arguments structure.
+ *
+ * @return
+ *   0 on success, errno value on failure.
+ */
+static int
+mlx5_args(struct priv *priv, struct rte_devargs *devargs)
+{
+   static const char *params[] = {
+   NULL,
+   };
+   struct rte_kvargs *kvlist;
+   int ret = 0;
+   int i;
+
+   if (devargs == NULL)
+   return 0;
+   kvlist = rte_kvargs_parse(devargs->args, params);
+   if (kvlist == NULL)
+   return 0;
+   /* Process parameters. */
+   for (i = 0; (i != RTE_DIM(params)); ++i) {
+   if (rte_kvargs_count(kvlist, params[i])) {
+   ret = rte_kvargs_process(kvlist, params[i],
+mlx5_args_check, priv);
+   if (ret != 0)
+   return ret;
+   }
+   }
+   rte_kvargs_free(kvlist);
+   return 0;
+}
+
 static struct eth_driver mlx5_driver;

 /**
@@ -408,6 +474,12 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
priv->port = port;
priv->pd = pd;
priv->mtu = ETHER_MTU;
+   err = mlx5_args(priv, pci_dev->devargs);
+   if (err) {
+   ERROR("failed to process device arguments: %s",
+ strerror(err));
+   goto port_error;
+   }
if (ibv_exp_query_device(ctx, &exp_device_attr)) {
ERROR("ibv_exp_query_device() failed");
goto port_error;
-- 
2.1.4
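
For readers unfamiliar with the key/value handler pattern, here is a self-contained sketch that does not use librte_kvargs at all. It assumes a plain `key=value,key=value` argument string, and all `demo_*` names, including the `demo_flag` parameter, are hypothetical. It mirrors the shape of the patch: a per-pair check callback receiving an opaque pointer, with unknown keys rejected as `-EINVAL`. In this sketch the working copy is released on every exit path, including the early-error one.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Callback invoked for each key/value pair (hypothetical signature). */
typedef int (*demo_kv_handler)(const char *key, const char *val, void *opaque);

/* Hypothetical configuration filled in by the handler. */
struct demo_conf {
	int demo_flag;
};

/* Verify and store one pair; unknown keys are rejected. */
static int
demo_args_check(const char *key, const char *val, void *opaque)
{
	struct demo_conf *conf = opaque;

	if (strcmp(key, "demo_flag") == 0) {
		conf->demo_flag = (int)strtol(val, NULL, 0);
		return 0;
	}
	return -EINVAL;
}

/*
 * Split "k1=v1,k2=v2" and call the handler on each pair.
 * Returns 0 on success, a negative errno value on failure.
 */
static int
demo_kv_process(const char *args, demo_kv_handler handler, void *opaque)
{
	size_t len;
	char *copy;
	char *cur;
	int ret = 0;

	if (args == NULL)
		return 0;
	len = strlen(args);
	copy = malloc(len + 1);
	if (copy == NULL)
		return -ENOMEM;
	memcpy(copy, args, len + 1);
	cur = copy;
	while (cur != NULL && *cur != '\0') {
		char *next = strchr(cur, ',');
		char *eq;

		if (next != NULL)
			*(next++) = '\0';
		eq = strchr(cur, '=');
		if (eq == NULL) {
			/* A key without a value is treated as invalid. */
			ret = -EINVAL;
			break;
		}
		*(eq++) = '\0';
		ret = handler(cur, eq, opaque);
		if (ret != 0)
			break;
		cur = next;
	}
	/* Single exit point, so the copy is freed on errors too. */
	free(copy);
	return ret;
}
```

Breaking out of the loop and falling through to one cleanup statement is a design choice of this sketch; it keeps resource release independent of how many parameters are added later.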

[dpdk-dev] [PATCH v4 10/25] mlx5: add definitions for data path without Verbs

2016-06-22 Thread Nelio Laranjeiro

These structures and macros extend those exposed by libmlx5 (in mlx5_hw.h)
to let the PMD manage work queue and completion queue elements directly.

Signed-off-by: Nelio Laranjeiro 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5_prm.h | 163 
 1 file changed, 163 insertions(+)
 create mode 100644 drivers/net/mlx5/mlx5_prm.h

diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
new file mode 100644
index 0000000..5db219b
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_prm.h
@@ -0,0 +1,163 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2016 6WIND S.A.
+ *   Copyright 2016 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of 6WIND S.A. nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef RTE_PMD_MLX5_PRM_H_
+#define RTE_PMD_MLX5_PRM_H_
+
+/* Verbs header. */
+/* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include 
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+/* Get CQE owner bit. */
+#define MLX5_CQE_OWNER(op_own) ((op_own) & MLX5_CQE_OWNER_MASK)
+
+/* Get CQE format. */
+#define MLX5_CQE_FORMAT(op_own) (((op_own) & MLX5E_CQE_FORMAT_MASK) >> 2)
+
+/* Get CQE opcode. */
+#define MLX5_CQE_OPCODE(op_own) (((op_own) & 0xf0) >> 4)
+
+/* Get CQE solicited event. */
+#define MLX5_CQE_SE(op_own) (((op_own) >> 1) & 1)
+
+/* Invalidate a CQE. */
+#define MLX5_CQE_INVALIDATE (MLX5_CQE_INVALID << 4)
+
+/* CQE value to inform that VLAN is stripped. */
+#define MLX5_CQE_VLAN_STRIPPED 0x1
+
+/* Maximum number of packets a multi-packet WQE can handle. */
+#define MLX5_MPW_DSEG_MAX 5
+
+/* Room for inline data in regular work queue element. */
+#define MLX5_WQE64_INL_DATA 12
+
+/* Room for inline data in multi-packet WQE. */
+#define MLX5_MWQE64_INL_DATA 28
+
+/* Subset of struct mlx5_wqe_eth_seg. */
+struct mlx5_wqe_eth_seg_small {
+   uint32_t rsvd0;
+   uint8_t cs_flags;
+   uint8_t rsvd1;
+   uint16_t mss;
+   uint32_t rsvd2;
+   uint16_t inline_hdr_sz;
+};
+
+/* Regular WQE. */
+struct mlx5_wqe_regular {
+   union {
+   struct mlx5_wqe_ctrl_seg ctrl;
+   uint32_t data[4];
+   } ctrl;
+   struct mlx5_wqe_eth_seg eseg;
+   struct mlx5_wqe_data_seg dseg;
+} __rte_aligned(64);
+
+/* Inline WQE. */
+struct mlx5_wqe_inl {
+   union {
+   struct mlx5_wqe_ctrl_seg ctrl;
+   uint32_t data[4];
+   } ctrl;
+   struct mlx5_wqe_eth_seg eseg;
+   uint32_t byte_cnt;
+   uint8_t data[MLX5_WQE64_INL_DATA];
+} __rte_aligned(64);
+
+/* Multi-packet WQE. */
+struct mlx5_wqe_mpw {
+   union {
+   struct mlx5_wqe_ctrl_seg ctrl;
+   uint32_t data[4];
+   } ctrl;
+   struct mlx5_wqe_eth_seg_small eseg;
+   struct mlx5_wqe_data_seg dseg[2];
+} __rte_aligned(64);
+
+/* Multi-packet WQE with inline. */
+struct mlx5_wqe_mpw_inl {
+   union {
+   struct mlx5_wqe_ctrl_seg ctrl;
+   uint32_t data[4];
+   } ctrl;
+   struct mlx5_wqe_eth_seg_small eseg;
+   uint32_t byte_cnt;
+   uint8_t data[MLX5_MWQE64_INL_DATA];
+} __rte_aligned(64);
+
+/* Union of all WQE types. */
+union mlx5_wqe {
+   struct mlx5_wqe_regular wqe;
+   struct mlx5_wqe_inl inl;
+   struct mlx5_wqe_mpw mpw;
+   struct mlx5_wqe_mpw_inl mpw_inl;
+   uint8_t data[64];
+};
+
+/* MPW session status. */
+enum mlx5_mpw_state {
+
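
The CQE accessor macros in this header all pick bit fields out of a single `op_own` byte. The sketch below restates them with `DEMO_` prefixes so that it compiles without libmlx5. The two mask values are assumptions: the patch takes `MLX5_CQE_OWNER_MASK` and `MLX5E_CQE_FORMAT_MASK` from the Verbs headers, and the values used here are only inferred from the shifts in the macros. The implied layout is bit 0 for the owner, bit 1 for the solicited event, bits 2-3 for the format and bits 4-7 for the opcode.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed mask values; the real ones come from the libmlx5 headers. */
#define DEMO_CQE_OWNER_MASK 0x01
#define DEMO_CQE_FORMAT_MASK 0x0c

/* Same expressions as the MLX5_CQE_* macros, under the assumed masks. */
#define DEMO_CQE_OWNER(op_own) ((op_own) & DEMO_CQE_OWNER_MASK)
#define DEMO_CQE_FORMAT(op_own) (((op_own) & DEMO_CQE_FORMAT_MASK) >> 2)
#define DEMO_CQE_OPCODE(op_own) (((op_own) & 0xf0) >> 4)
#define DEMO_CQE_SE(op_own) (((op_own) >> 1) & 1)

/* Decoded view of one op_own byte (hypothetical helper type). */
struct demo_cqe_fields {
	uint8_t owner;
	uint8_t se;
	uint8_t format;
	uint8_t opcode;
};

/* Split an op_own byte into its four fields. */
static struct demo_cqe_fields
demo_cqe_decode(uint8_t op_own)
{
	struct demo_cqe_fields f;

	f.owner = (uint8_t)DEMO_CQE_OWNER(op_own);
	f.se = (uint8_t)DEMO_CQE_SE(op_own);
	f.format = (uint8_t)DEMO_CQE_FORMAT(op_own);
	f.opcode = (uint8_t)DEMO_CQE_OPCODE(op_own);
	return f;
}
```

For instance, under these assumptions the byte 0xd7 (binary 1101 0111) decodes to opcode 0xd, format 1, solicited event 1 and owner 1.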

[dpdk-dev] [PATCH v4 09/25] mlx5: update prerequisites for upcoming enhancements

2016-06-22 Thread Nelio Laranjeiro

The latest version of Mellanox OFED exposes hardware definitions necessary
to implement data path operation bypassing Verbs. Update the minimum
version requirement to MLNX_OFED >= 3.3 and clean up compatibility checks
for previous releases.

Signed-off-by: Nelio Laranjeiro 
Signed-off-by: Adrien Mazarguil 
---
 doc/guides/nics/mlx5.rst   | 44 +++---
 drivers/net/mlx5/Makefile  | 39 -
 drivers/net/mlx5/mlx5.c| 23 --
 drivers/net/mlx5/mlx5.h|  5 +
 drivers/net/mlx5/mlx5_defs.h   |  9 -
 drivers/net/mlx5/mlx5_fdir.c   | 10 --
 drivers/net/mlx5/mlx5_rxmode.c |  8 
 drivers/net/mlx5/mlx5_rxq.c| 30 
 drivers/net/mlx5/mlx5_rxtx.c   |  4 
 drivers/net/mlx5/mlx5_rxtx.h   |  8 
 drivers/net/mlx5/mlx5_txq.c|  2 --
 drivers/net/mlx5/mlx5_vlan.c   |  3 ---
 12 files changed, 16 insertions(+), 169 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 77fa957..3a07928 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -125,16 +125,6 @@ These options can be modified in the ``.config`` file.
 Environment variables
 ~~~~~~~~~~~~~~~~~~~~~

-- ``MLX5_ENABLE_CQE_COMPRESSION``
-
-  A nonzero value lets ConnectX-4 return smaller completion entries to
-  improve performance when PCI backpressure is detected. It is most useful
-  for scenarios involving heavy traffic on many queues.
-
-  Since the additional software logic necessary to handle this mode can
-  lower performance when there is no backpressure, it is not enabled by
-  default.
-
 - ``MLX5_PMD_ENABLE_PADDING``

   Enables HW packet padding in PCI bus transactions.
@@ -211,40 +201,12 @@ DPDK and must be installed separately:

 Currently supported by DPDK:

-- Mellanox OFED **3.1-1.0.3**, **3.1-1.5.7.1** or **3.2-2.0.0.0** depending
-  on usage.
-
-The following features are supported with version **3.1-1.5.7.1** and
-above only:
-
-- IPv6, UPDv6, TCPv6 RSS.
-- RX checksum offloads.
-- IBM POWER8.
-
-The following features are supported with version **3.2-2.0.0.0** and
-above only:
-
-- Flow director.
-- RX VLAN stripping.
-- TX VLAN insertion.
-- RX CRC stripping configuration.
+- Mellanox OFED **3.3-1.0.0.0**.

 - Minimum firmware version:

-  With MLNX_OFED **3.1-1.0.3**:
-
-  - ConnectX-4: **12.12.1240**
-  - ConnectX-4 Lx: **14.12.1100**
-
-  With MLNX_OFED **3.1-1.5.7.1**:
-
-  - ConnectX-4: **12.13.0144**
-  - ConnectX-4 Lx: **14.13.0144**
-
-  With MLNX_OFED **3.2-2.0.0.0**:
-
-  - ConnectX-4: **12.14.2036**
-  - ConnectX-4 Lx: **14.14.2036**
+  - ConnectX-4: **12.16.1006**
+  - ConnectX-4 Lx: **14.16.1006**

 Getting Mellanox OFED
 ~~~~~~~~~~~~~~~~~~~~~
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 289c85e..dc99797 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -106,42 +106,19 @@ mlx5_autoconf.h.new: FORCE
 mlx5_autoconf.h.new: $(RTE_SDK)/scripts/auto-config-h.sh
$Q $(RM) -f -- '$@'
$Q sh -- '$<' '$@' \
-   HAVE_EXP_QUERY_DEVICE \
-   infiniband/verbs.h \
-   type 'struct ibv_exp_device_attr' $(AUTOCONF_OUTPUT)
-   $Q sh -- '$<' '$@' \
-   HAVE_FLOW_SPEC_IPV6 \
-   infiniband/verbs.h \
-   type 'struct ibv_exp_flow_spec_ipv6' $(AUTOCONF_OUTPUT)
-   $Q sh -- '$<' '$@' \
-   HAVE_EXP_QP_BURST_CREATE_ENABLE_MULTI_PACKET_SEND_WR \
-   infiniband/verbs.h \
-   enum IBV_EXP_QP_BURST_CREATE_ENABLE_MULTI_PACKET_SEND_WR \
-   $(AUTOCONF_OUTPUT)
-   $Q sh -- '$<' '$@' \
-   HAVE_EXP_DEVICE_ATTR_VLAN_OFFLOADS \
-   infiniband/verbs.h \
-   enum IBV_EXP_DEVICE_ATTR_VLAN_OFFLOADS \
-   $(AUTOCONF_OUTPUT)
-   $Q sh -- '$<' '$@' \
-   HAVE_EXP_CQ_RX_TCP_PACKET \
+   HAVE_VERBS_VLAN_INSERTION \
infiniband/verbs.h \
-   enum IBV_EXP_CQ_RX_TCP_PACKET \
+   enum IBV_EXP_RECEIVE_WQ_CVLAN_INSERTION \
$(AUTOCONF_OUTPUT)
$Q sh -- '$<' '$@' \
-   HAVE_VERBS_FCS \
-   infiniband/verbs.h \
-   enum IBV_EXP_CREATE_WQ_FLAG_SCATTER_FCS \
+   HAVE_VERBS_IBV_EXP_CQ_COMPRESSED_CQE \
+   infiniband/verbs_exp.h \
+   enum IBV_EXP_CQ_COMPRESSED_CQE \
$(AUTOCONF_OUTPUT)
$Q sh -- '$<' '$@' \
-   HAVE_VERBS_RX_END_PADDING \
-   infiniband/verbs.h \
-   enum IBV_EXP_CREATE_WQ_FLAG_RX_END_PADDING \
-   $(AUTOCONF_OUTPUT)
-   $Q sh -- '$<' '$@' \
-   HAVE_VERBS_VLAN_INSERTION \
-   infiniband/verbs.h \
-   enum IBV_EXP_RECEIVE_WQ_CVLAN_INSERTION \
+   HAVE_VERBS_MLX5_ETH_VLAN_INLINE_HEADER_SI

[dpdk-dev] [PATCH v4 08/25] mlx5: split Rx queue structure

2016-06-22 Thread Nelio Laranjeiro

To keep the data path as efficient as possible, move fields only useful to
the control path into new structure rxq_ctrl.

Signed-off-by: Nelio Laranjeiro 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5.c  |   6 +-
 drivers/net/mlx5/mlx5_fdir.c |   8 +-
 drivers/net/mlx5/mlx5_rxq.c  | 250 ++-
 drivers/net/mlx5/mlx5_rxtx.c |   1 -
 drivers/net/mlx5/mlx5_rxtx.h |  13 ++-
 5 files changed, 148 insertions(+), 130 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 3d30e00..27a7a30 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -122,12 +122,14 @@ mlx5_dev_close(struct rte_eth_dev *dev)
usleep(1000);
for (i = 0; (i != priv->rxqs_n); ++i) {
struct rxq *rxq = (*priv->rxqs)[i];
+   struct rxq_ctrl *rxq_ctrl;

if (rxq == NULL)
continue;
+   rxq_ctrl = container_of(rxq, struct rxq_ctrl, rxq);
(*priv->rxqs)[i] = NULL;
-   rxq_cleanup(rxq);
-   rte_free(rxq);
+   rxq_cleanup(rxq_ctrl);
+   rte_free(rxq_ctrl);
}
priv->rxqs_n = 0;
priv->rxqs = NULL;
diff --git a/drivers/net/mlx5/mlx5_fdir.c b/drivers/net/mlx5/mlx5_fdir.c
index 63e43ad..e3b97ba 100644
--- a/drivers/net/mlx5/mlx5_fdir.c
+++ b/drivers/net/mlx5/mlx5_fdir.c
@@ -424,7 +424,9 @@ create_flow:
 static struct fdir_queue *
 priv_get_fdir_queue(struct priv *priv, uint16_t idx)
 {
-   struct fdir_queue *fdir_queue = &(*priv->rxqs)[idx]->fdir_queue;
+   struct rxq_ctrl *rxq_ctrl =
+   container_of((*priv->rxqs)[idx], struct rxq_ctrl, rxq);
+   struct fdir_queue *fdir_queue = &rxq_ctrl->fdir_queue;
struct ibv_exp_rwq_ind_table *ind_table = NULL;
struct ibv_qp *qp = NULL;
struct ibv_exp_rwq_ind_table_init_attr ind_init_attr;
@@ -629,8 +631,10 @@ priv_fdir_disable(struct priv *priv)
/* Run on every RX queue to destroy related flow director QP and
 * indirection table. */
for (i = 0; (i != priv->rxqs_n); i++) {
-   fdir_queue = &(*priv->rxqs)[i]->fdir_queue;
+   struct rxq_ctrl *rxq_ctrl =
+   container_of((*priv->rxqs)[i], struct rxq_ctrl, rxq);

+   fdir_queue = &rxq_ctrl->fdir_queue;
if (fdir_queue->qp != NULL) {
claim_zero(ibv_destroy_qp(fdir_queue->qp));
fdir_queue->qp = NULL;
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 4000624..8d32e74 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -636,7 +636,7 @@ priv_rehash_flows(struct priv *priv)
 /**
  * Allocate RX queue elements.
  *
- * @param rxq
+ * @param rxq_ctrl
  *   Pointer to RX queue structure.
  * @param elts_n
  *   Number of elements to allocate.
@@ -648,16 +648,17 @@ priv_rehash_flows(struct priv *priv)
  *   0 on success, errno value on failure.
  */
 static int
-rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n, struct rte_mbuf **pool)
+rxq_alloc_elts(struct rxq_ctrl *rxq_ctrl, unsigned int elts_n,
+  struct rte_mbuf **pool)
 {
unsigned int i;
struct rxq_elt (*elts)[elts_n] =
rte_calloc_socket("RXQ elements", 1, sizeof(*elts), 0,
- rxq->socket);
+ rxq_ctrl->socket);
int ret = 0;

if (elts == NULL) {
-   ERROR("%p: can't allocate packets array", (void *)rxq);
+   ERROR("%p: can't allocate packets array", (void *)rxq_ctrl);
ret = ENOMEM;
goto error;
}
@@ -672,10 +673,10 @@ rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n, struct rte_mbuf **pool)
assert(buf != NULL);
rte_pktmbuf_reset(buf);
} else
-   buf = rte_pktmbuf_alloc(rxq->mp);
+   buf = rte_pktmbuf_alloc(rxq_ctrl->rxq.mp);
if (buf == NULL) {
assert(pool == NULL);
-   ERROR("%p: empty mbuf pool", (void *)rxq);
+   ERROR("%p: empty mbuf pool", (void *)rxq_ctrl);
ret = ENOMEM;
goto error;
}
@@ -691,15 +692,15 @@ rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n, struct rte_mbuf **pool)
sge->addr = (uintptr_t)
((uint8_t *)buf->buf_addr + RTE_PKTMBUF_HEADROOM);
sge->length = (buf->buf_len - RTE_PKTMBUF_HEADROOM);
-   sge->lkey = rxq->mr->lkey;
+   sge->lkey = rxq_ctrl->mr->lkey;
/* Redundant check for tailroom. */
assert(sge->length == rte_pktmbuf_tailroom

[dpdk-dev] [PATCH v4 07/25] mlx5: split Tx queue structure

2016-06-22 Thread Nelio Laranjeiro

To keep the data path as efficient as possible, move fields only useful to
the control path into new structure txq_ctrl.

Signed-off-by: Nelio Laranjeiro 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5.c|  21 +++--
 drivers/net/mlx5/mlx5_ethdev.c |  28 +++---
 drivers/net/mlx5/mlx5_mr.c |  39 
 drivers/net/mlx5/mlx5_rxtx.h   |   9 +-
 drivers/net/mlx5/mlx5_txq.c| 198 +
 5 files changed, 159 insertions(+), 136 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 350028b..3d30e00 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -98,7 +98,6 @@ static void
 mlx5_dev_close(struct rte_eth_dev *dev)
 {
struct priv *priv = mlx5_get_priv(dev);
-   void *tmp;
unsigned int i;

priv_lock(priv);
@@ -122,12 +121,13 @@ mlx5_dev_close(struct rte_eth_dev *dev)
/* XXX race condition if mlx5_rx_burst() is still running. */
usleep(1000);
for (i = 0; (i != priv->rxqs_n); ++i) {
-   tmp = (*priv->rxqs)[i];
-   if (tmp == NULL)
+   struct rxq *rxq = (*priv->rxqs)[i];
+
+   if (rxq == NULL)
continue;
(*priv->rxqs)[i] = NULL;
-   rxq_cleanup(tmp);
-   rte_free(tmp);
+   rxq_cleanup(rxq);
+   rte_free(rxq);
}
priv->rxqs_n = 0;
priv->rxqs = NULL;
@@ -136,12 +136,15 @@ mlx5_dev_close(struct rte_eth_dev *dev)
/* XXX race condition if mlx5_tx_burst() is still running. */
usleep(1000);
for (i = 0; (i != priv->txqs_n); ++i) {
-   tmp = (*priv->txqs)[i];
-   if (tmp == NULL)
+   struct txq *txq = (*priv->txqs)[i];
+   struct txq_ctrl *txq_ctrl;
+
+   if (txq == NULL)
continue;
+   txq_ctrl = container_of(txq, struct txq_ctrl, txq);
(*priv->txqs)[i] = NULL;
-   txq_cleanup(tmp);
-   rte_free(tmp);
+   txq_cleanup(txq_ctrl);
+   rte_free(txq_ctrl);
}
priv->txqs_n = 0;
priv->txqs = NULL;
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index ca57021..4095a06 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1232,28 +1232,32 @@ mlx5_secondary_data_setup(struct priv *priv)
/* TX queues. */
for (i = 0; i != nb_tx_queues; ++i) {
struct txq *primary_txq = (*sd->primary_priv->txqs)[i];
-   struct txq *txq;
+   struct txq_ctrl *primary_txq_ctrl;
+   struct txq_ctrl *txq_ctrl;

if (primary_txq == NULL)
continue;
-   txq = rte_calloc_socket("TXQ", 1, sizeof(*txq), 0,
-   primary_txq->socket);
-   if (txq != NULL) {
+   primary_txq_ctrl = container_of(primary_txq,
+   struct txq_ctrl, txq);
+   txq_ctrl = rte_calloc_socket("TXQ", 1, sizeof(*txq_ctrl), 0,
+primary_txq_ctrl->socket);
+   if (txq_ctrl != NULL) {
if (txq_setup(priv->dev,
- txq,
+ primary_txq_ctrl,
  primary_txq->elts_n,
- primary_txq->socket,
+ primary_txq_ctrl->socket,
  NULL) == 0) {
-   txq->stats.idx = primary_txq->stats.idx;
-   tx_queues[i] = txq;
+   txq_ctrl->txq.stats.idx =
+   primary_txq->stats.idx;
+   tx_queues[i] = &txq_ctrl->txq;
continue;
}
-   rte_free(txq);
+   rte_free(txq_ctrl);
}
while (i) {
-   txq = tx_queues[--i];
-   txq_cleanup(txq);
-   rte_free(txq);
+   txq_ctrl = tx_queues[--i];
+   txq_cleanup(txq_ctrl);
+   rte_free(txq_ctrl);
}
goto error;
}
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 7c3e87f..79d5568 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -183,33 +183,36 @@ mlx5_mp2mr(struc

[dpdk-dev] [PATCH v4 06/25] mlx5: remove inline Tx support

2016-06-22 Thread Nelio Laranjeiro

Inline TX will be fully managed by the PMD after Verbs is bypassed in the
data path. Remove the current code until then.

Signed-off-by: Nelio Laranjeiro 
Signed-off-by: Adrien Mazarguil 
---
 config/common_base   |  1 -
 doc/guides/nics/mlx5.rst | 10 --
 drivers/net/mlx5/Makefile|  4 ---
 drivers/net/mlx5/mlx5_defs.h |  5 ---
 drivers/net/mlx5/mlx5_rxtx.c | 73 +++-
 drivers/net/mlx5/mlx5_rxtx.h |  9 --
 drivers/net/mlx5/mlx5_txq.c  | 16 --
 7 files changed, 25 insertions(+), 93 deletions(-)

diff --git a/config/common_base b/config/common_base
index 39e6333..5fbac47 100644
--- a/config/common_base
+++ b/config/common_base
@@ -207,7 +207,6 @@ CONFIG_RTE_LIBRTE_MLX4_SOFT_COUNTERS=1
 #
 CONFIG_RTE_LIBRTE_MLX5_PMD=n
 CONFIG_RTE_LIBRTE_MLX5_DEBUG=n
-CONFIG_RTE_LIBRTE_MLX5_MAX_INLINE=0
 CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE=8

 #
diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 84c35a0..77fa957 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -114,16 +114,6 @@ These options can be modified in the ``.config`` file.
   adds additional run-time checks and debugging messages at the cost of
   lower performance.

-- ``CONFIG_RTE_LIBRTE_MLX5_MAX_INLINE`` (default **0**)
-
-  Amount of data to be inlined during TX operations. Improves latency.
-  Can improve PPS performance when PCI backpressure is detected and may be
-  useful for scenarios involving heavy traffic on many queues.
-
-  Since the additional software logic necessary to handle this mode can
-  lower performance when there is no backpressure, it is not enabled by
-  default.
-
 - ``CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE`` (default **8**)

   Maximum number of cached memory pools (MPs) per TX queue. Each MP from
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 656a6e1..289c85e 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -86,10 +86,6 @@ else
 CFLAGS += -DNDEBUG -UPEDANTIC
 endif

-ifdef CONFIG_RTE_LIBRTE_MLX5_MAX_INLINE
-CFLAGS += -DMLX5_PMD_MAX_INLINE=$(CONFIG_RTE_LIBRTE_MLX5_MAX_INLINE)
-endif
-
 ifdef CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE
 CFLAGS += -DMLX5_PMD_TX_MP_CACHE=$(CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE)
 endif
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index da1c90e..9a19835 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -54,11 +54,6 @@
 /* RSS Indirection table size. */
 #define RSS_INDIRECTION_TABLE_SIZE 256

-/* Maximum size for inline data. */
-#ifndef MLX5_PMD_MAX_INLINE
-#define MLX5_PMD_MAX_INLINE 0
-#endif
-
 /*
  * Maximum number of cached Memory Pools (MPs) per TX queue. Each RTE MP
  * from which buffers are to be transmitted will have to be mapped by this
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 07d95eb..4ba88ea 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -329,56 +329,33 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
rte_prefetch0((volatile void *)
  (uintptr_t)buf_next_addr);
}
-   /* Put packet into send queue. */
-#if MLX5_PMD_MAX_INLINE > 0
-   if (length <= txq->max_inline) {
-#ifdef HAVE_VERBS_VLAN_INSERTION
-   if (insert_vlan)
-   err = txq->send_pending_inline_vlan
-   (txq->qp,
-(void *)addr,
-length,
-send_flags,
-&buf->vlan_tci);
-   else
-#endif /* HAVE_VERBS_VLAN_INSERTION */
-   err = txq->send_pending_inline
-   (txq->qp,
-(void *)addr,
-length,
-send_flags);
-   } else
-#endif
-   {
-   /* Retrieve Memory Region key for this
-* memory pool. */
-   lkey = txq_mp2mr(txq, txq_mb2mp(buf));
-   if (unlikely(lkey == (uint32_t)-1)) {
-   /* MR does not exist. */
-   DEBUG("%p: unable to get MP <-> MR"
- " association", (void *)txq);
-   /* Clean up TX element. */
-   elt->buf = NULL;
-   goto stop;
-   }
+   /* Retrieve Memory Region key for this memory pool. */
+   lkey = txq_mp2mr(txq, txq_mb2mp(buf));
+   if (unlikely(lkey == (uint32_t)-1)) {
+   /* MR does not exist. */
+   DEBUG("%p: unable t

[dpdk-dev] [PATCH v4 05/25] mlx5: remove configuration variable

2016-06-22 Thread Nelio Laranjeiro

Since scatter/gather is no longer supported, CONFIG_RTE_LIBRTE_MLX5_SGE_WR_N
serves no purpose and can be removed.

Signed-off-by: Nelio Laranjeiro 
Signed-off-by: Adrien Mazarguil 
---
 config/common_base   | 1 -
 doc/guides/nics/mlx5.rst | 7 ---
 drivers/net/mlx5/Makefile| 4 
 drivers/net/mlx5/mlx5_defs.h | 5 -
 drivers/net/mlx5/mlx5_rxq.c  | 4 
 drivers/net/mlx5/mlx5_txq.c  | 4 
 6 files changed, 25 deletions(-)

diff --git a/config/common_base b/config/common_base
index ead5984..39e6333 100644
--- a/config/common_base
+++ b/config/common_base
@@ -207,7 +207,6 @@ CONFIG_RTE_LIBRTE_MLX4_SOFT_COUNTERS=1
 #
 CONFIG_RTE_LIBRTE_MLX5_PMD=n
 CONFIG_RTE_LIBRTE_MLX5_DEBUG=n
-CONFIG_RTE_LIBRTE_MLX5_SGE_WR_N=4
 CONFIG_RTE_LIBRTE_MLX5_MAX_INLINE=0
 CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE=8

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index d9196d1..84c35a0 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -114,13 +114,6 @@ These options can be modified in the ``.config`` file.
   adds additional run-time checks and debugging messages at the cost of
   lower performance.

-- ``CONFIG_RTE_LIBRTE_MLX5_SGE_WR_N`` (default **4**)
-
-  Number of scatter/gather elements (SGEs) per work request (WR). Lowering
-  this number improves performance but also limits the ability to receive
-  scattered packets (packets that do not fit a single mbuf). The default
-  value is a safe tradeoff.
-
 - ``CONFIG_RTE_LIBRTE_MLX5_MAX_INLINE`` (default **0**)

   Amount of data to be inlined during TX operations. Improves latency.
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 999ada5..656a6e1 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -86,10 +86,6 @@ else
 CFLAGS += -DNDEBUG -UPEDANTIC
 endif

-ifdef CONFIG_RTE_LIBRTE_MLX5_SGE_WR_N
-CFLAGS += -DMLX5_PMD_SGE_WR_N=$(CONFIG_RTE_LIBRTE_MLX5_SGE_WR_N)
-endif
-
 ifdef CONFIG_RTE_LIBRTE_MLX5_MAX_INLINE
 CFLAGS += -DMLX5_PMD_MAX_INLINE=$(CONFIG_RTE_LIBRTE_MLX5_MAX_INLINE)
 endif
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index 09207d9..da1c90e 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -54,11 +54,6 @@
 /* RSS Indirection table size. */
 #define RSS_INDIRECTION_TABLE_SIZE 256

-/* Maximum number of Scatter/Gather Elements per Work Request. */
-#ifndef MLX5_PMD_SGE_WR_N
-#define MLX5_PMD_SGE_WR_N 4
-#endif
-
 /* Maximum size for inline data. */
 #ifndef MLX5_PMD_MAX_INLINE
 #define MLX5_PMD_MAX_INLINE 0
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 38ff9fd..4000624 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -976,10 +976,6 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
ERROR("%p: invalid number of RX descriptors", (void *)dev);
return EINVAL;
}
-   if (MLX5_PMD_SGE_WR_N > 1) {
-   ERROR("%p: RX scatter is not supported", (void *)dev);
-   return ENOTSUP;
-   }
/* Toggle RX checksum offload if hardware supports it. */
if (priv->hw_csum)
tmpl.csum = !!dev->data->dev_conf.rxmode.hw_ip_checksum;
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 5a248c9..59974c5 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -264,10 +264,6 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
ERROR("%p: invalid number of TX descriptors", (void *)dev);
return EINVAL;
}
-   if (MLX5_PMD_SGE_WR_N > 1) {
-   ERROR("%p: TX gather is not supported", (void *)dev);
-   return EINVAL;
-   }
/* MRs will be registered in mp2mr[] later. */
attr.rd = (struct ibv_exp_res_domain_init_attr){
.comp_mask = (IBV_EXP_RES_DOMAIN_THREAD_MODEL |
-- 
2.1.4

[dpdk-dev] [PATCH v4 04/25] mlx5: remove Rx scatter support

2016-06-22 Thread Nelio Laranjeiro

This is done in preparation for bypassing Verbs entirely in the data path
as a performance improvement. RX scatter cannot be maintained during the
transition and will be reimplemented later.

Signed-off-by: Nelio Laranjeiro 
Signed-off-by: Adrien Mazarguil 
---
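With RX scatter gone, every received frame has to fit in a single mbuf.
The check added to mlx5_dev_set_mtu() in this patch can be sketched as a
standalone function. This is only an illustration: check_frame_fits() and
EX_HEADROOM are made-up names, and the headroom value is assumed to be the
128 bytes that common_base sets for CONFIG_RTE_PKTMBUF_HEADROOM.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed headroom reserved at the start of each mbuf data buffer. */
#define EX_HEADROOM 128u

/*
 * Return 0 when a frame of max_frame_len bytes fits in one buffer of
 * mb_len bytes (headroom included), or ENOTSUP when it would have to be
 * scattered across several buffers.
 */
static int
check_frame_fits(uint32_t max_frame_len, uint32_t mb_len)
{
	/* The buffer must at least be able to hold the headroom. */
	assert(mb_len >= EX_HEADROOM);
	if (max_frame_len > (mb_len - EX_HEADROOM))
		return ENOTSUP;
	return 0;
}
```

A caller would reject the new MTU as soon as one queue returns ENOTSUP,
which is what the "goto out" path in the hunk below does.
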
 drivers/net/mlx5/mlx5_ethdev.c |  31 +---
 drivers/net/mlx5/mlx5_rxq.c| 314 ++---
 drivers/net/mlx5/mlx5_rxtx.c   | 211 +--
 drivers/net/mlx5/mlx5_rxtx.h   |  13 +-
 4 files changed, 53 insertions(+), 516 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 280a90a..ca57021 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -623,8 +623,7 @@ mlx5_dev_supported_ptypes_get(struct rte_eth_dev *dev)

};

-   if (dev->rx_pkt_burst == mlx5_rx_burst ||
-   dev->rx_pkt_burst == mlx5_rx_burst_sp)
+   if (dev->rx_pkt_burst == mlx5_rx_burst)
return ptypes;
return NULL;
 }
@@ -762,19 +761,11 @@ mlx5_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
mb_len = rte_pktmbuf_data_room_size(rxq->mp);
assert(mb_len >= RTE_PKTMBUF_HEADROOM);
sp = (max_frame_len > (mb_len - RTE_PKTMBUF_HEADROOM));
-   /* Provide new values to rxq_setup(). */
-   dev->data->dev_conf.rxmode.jumbo_frame = sp;
-   dev->data->dev_conf.rxmode.max_rx_pkt_len = max_frame_len;
-   ret = rxq_rehash(dev, rxq);
-   if (ret) {
-   /* Force SP RX if that queue requires it and abort. */
-   if (rxq->sp)
-   rx_func = mlx5_rx_burst_sp;
-   break;
+   if (sp) {
+   ERROR("%p: RX scatter is not supported", (void *)dev);
+   ret = ENOTSUP;
+   goto out;
}
-   /* Scattered burst function takes priority. */
-   if (rxq->sp)
-   rx_func = mlx5_rx_burst_sp;
}
/* Burst functions can now be called again. */
rte_wmb();
@@ -1103,22 +1094,12 @@ priv_set_link(struct priv *priv, int up)
 {
struct rte_eth_dev *dev = priv->dev;
int err;
-   unsigned int i;

if (up) {
err = priv_set_flags(priv, ~IFF_UP, IFF_UP);
if (err)
return err;
-   for (i = 0; i < priv->rxqs_n; i++)
-   if ((*priv->rxqs)[i]->sp)
-   break;
-   /* Check if an sp queue exists.
-* Note: Some old frames might be received.
-*/
-   if (i == priv->rxqs_n)
-   dev->rx_pkt_burst = mlx5_rx_burst;
-   else
-   dev->rx_pkt_burst = mlx5_rx_burst_sp;
+   dev->rx_pkt_burst = mlx5_rx_burst;
dev->tx_pkt_burst = mlx5_tx_burst;
} else {
err = priv_set_flags(priv, ~IFF_UP, ~IFF_UP);
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 0bcf55b..38ff9fd 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -634,145 +634,6 @@ priv_rehash_flows(struct priv *priv)
 }

 /**
- * Allocate RX queue elements with scattered packets support.
- *
- * @param rxq
- *   Pointer to RX queue structure.
- * @param elts_n
- *   Number of elements to allocate.
- * @param[in] pool
- *   If not NULL, fetch buffers from this array instead of allocating them
- *   with rte_pktmbuf_alloc().
- *
- * @return
- *   0 on success, errno value on failure.
- */
-static int
-rxq_alloc_elts_sp(struct rxq *rxq, unsigned int elts_n,
- struct rte_mbuf **pool)
-{
-   unsigned int i;
-   struct rxq_elt_sp (*elts)[elts_n] =
-   rte_calloc_socket("RXQ elements", 1, sizeof(*elts), 0,
- rxq->socket);
-   int ret = 0;
-
-   if (elts == NULL) {
-   ERROR("%p: can't allocate packets array", (void *)rxq);
-   ret = ENOMEM;
-   goto error;
-   }
-   /* For each WR (packet). */
-   for (i = 0; (i != elts_n); ++i) {
-   unsigned int j;
-   struct rxq_elt_sp *elt = &(*elts)[i];
-   struct ibv_sge (*sges)[RTE_DIM(elt->sges)] = &elt->sges;
-
-   /* These two arrays must have the same size. */
-   assert(RTE_DIM(elt->sges) == RTE_DIM(elt->bufs));
-   /* For each SGE (segment). */
-   for (j = 0; (j != RTE_DIM(elt->bufs)); ++j) {
-   struct ibv_sge *sge = &(*sges)[j];
-   struct rte_mbuf *buf;
-
-   if (pool != NULL) {
-   buf = *(pool++);
-   assert(buf != NULL);
-   rte_pktmbuf_reset(buf);
-

[dpdk-dev] [PATCH v4 03/25] mlx5: remove Tx gather support

2016-06-22 Thread Nelio Laranjeiro

This is done in preparation for bypassing Verbs entirely in the data path
as a performance improvement. TX gather cannot be maintained during the
transition and will be reimplemented later.

Signed-off-by: Nelio Laranjeiro 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5_ethdev.c |   2 +-
 drivers/net/mlx5/mlx5_rxtx.c   | 315 -
 drivers/net/mlx5/mlx5_rxtx.h   |  17 ---
 drivers/net/mlx5/mlx5_txq.c|  49 ++-
 4 files changed, 69 insertions(+), 314 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 0a881b6..280a90a 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1260,7 +1260,7 @@ mlx5_secondary_data_setup(struct priv *priv)
if (txq != NULL) {
if (txq_setup(priv->dev,
  txq,
- primary_txq->elts_n * MLX5_PMD_SGE_WR_N,
+ primary_txq->elts_n,
  primary_txq->socket,
  NULL) == 0) {
txq->stats.idx = primary_txq->stats.idx;
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 616cf7a..6e184c3 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -228,156 +228,6 @@ insert_vlan_sw(struct rte_mbuf *buf)
return 0;
 }

-#if MLX5_PMD_SGE_WR_N > 1
-
-/**
- * Copy scattered mbuf contents to a single linear buffer.
- *
- * @param[out] linear
- *   Linear output buffer.
- * @param[in] buf
- *   Scattered input buffer.
- *
- * @return
- *   Number of bytes copied to the output buffer or 0 if not large enough.
- */
-static unsigned int
-linearize_mbuf(linear_t *linear, struct rte_mbuf *buf)
-{
-   unsigned int size = 0;
-   unsigned int offset;
-
-   do {
-   unsigned int len = DATA_LEN(buf);
-
-   offset = size;
-   size += len;
-   if (unlikely(size > sizeof(*linear)))
-   return 0;
-   memcpy(&(*linear)[offset],
-  rte_pktmbuf_mtod(buf, uint8_t *),
-  len);
-   buf = NEXT(buf);
-   } while (buf != NULL);
-   return size;
-}
-
-/**
- * Handle scattered buffers for mlx5_tx_burst().
- *
- * @param txq
- *   TX queue structure.
- * @param segs
- *   Number of segments in buf.
- * @param elt
- *   TX queue element to fill.
- * @param[in] buf
- *   Buffer to process.
- * @param elts_head
- *   Index of the linear buffer to use if necessary (normally txq->elts_head).
- * @param[out] sges
- *   Array filled with SGEs on success.
- *
- * @return
- *   A structure containing the processed packet size in bytes and the
- *   number of SGEs. Both fields are set to (unsigned int)-1 in case of
- *   failure.
- */
-static struct tx_burst_sg_ret {
-   unsigned int length;
-   unsigned int num;
-}
-tx_burst_sg(struct txq *txq, unsigned int segs, struct txq_elt *elt,
-   struct rte_mbuf *buf, unsigned int elts_head,
-   struct ibv_sge (*sges)[MLX5_PMD_SGE_WR_N])
-{
-   unsigned int sent_size = 0;
-   unsigned int j;
-   int linearize = 0;
-
-   /* When there are too many segments, extra segments are
-* linearized in the last SGE. */
-   if (unlikely(segs > RTE_DIM(*sges))) {
-   segs = (RTE_DIM(*sges) - 1);
-   linearize = 1;
-   }
-   /* Update element. */
-   elt->buf = buf;
-   /* Register segments as SGEs. */
-   for (j = 0; (j != segs); ++j) {
-   struct ibv_sge *sge = &(*sges)[j];
-   uint32_t lkey;
-
-   /* Retrieve Memory Region key for this memory pool. */
-   lkey = txq_mp2mr(txq, txq_mb2mp(buf));
-   if (unlikely(lkey == (uint32_t)-1)) {
-   /* MR does not exist. */
-   DEBUG("%p: unable to get MP <-> MR association",
- (void *)txq);
-   /* Clean up TX element. */
-   elt->buf = NULL;
-   goto stop;
-   }
-   /* Update SGE. */
-   sge->addr = rte_pktmbuf_mtod(buf, uintptr_t);
-   if (txq->priv->sriov)
-   rte_prefetch0((volatile void *)
- (uintptr_t)sge->addr);
-   sge->length = DATA_LEN(buf);
-   sge->lkey = lkey;
-   sent_size += sge->length;
-   buf = NEXT(buf);
-   }
-   /* If buf is not NULL here and is not going to be linearized,
-* nb_segs is not valid. */
-   assert(j == segs);
-   assert((buf == NULL) || (linearize));
-   /* Linearize extra segments. */
-   if (linearize) {
-   struct ibv_sge *sge = &(*sges)[segs];
-   linear_t *linear = &(*txq->

[dpdk-dev] [PATCH v4 02/25] mlx5: split memory registration function

2016-06-22 Thread Nelio Laranjeiro

Except for the first time when memory registration occurs, the lkey is
always cached. Since memory registration is slow and performs system calls,
performance can be improved by moving that code to its own function outside
of the data path so only the lookup code is left in the original inlined
function.

Signed-off-by: Nelio Laranjeiro 
Signed-off-by: Adrien Mazarguil 
---
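The split described above (inlined lookup, out-of-line registration) can
be sketched as follows. This is not the driver code: the mr_cache types,
MR_CACHE_SIZE and the fake lkey values are assumptions made for the
example, and the noinline attribute assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MR_CACHE_SIZE 8               /* hypothetical cache depth */
#define LKEY_INVALID ((uint32_t)-1)   /* same "no key" convention as the diff */

struct mr_cache_entry {
	const void *mp;   /* pool this entry describes, NULL when unused */
	uint32_t lkey;    /* key obtained when the pool was registered */
};

struct mr_cache {
	struct mr_cache_entry entry[MR_CACHE_SIZE];
	unsigned int slow_calls;  /* counts slow-path runs, for illustration */
};

/* Slow path: kept out of line so the hot loop stays small. */
static __attribute__((noinline)) uint32_t
mr_cache_register(struct mr_cache *cache, unsigned int idx, const void *mp)
{
	++cache->slow_calls;
	if (idx == MR_CACHE_SIZE)
		return LKEY_INVALID;  /* cache full; a real driver may evict */
	/* Stand-in for the expensive registration step. */
	cache->entry[idx].mp = mp;
	cache->entry[idx].lkey = 0x1000u + idx;
	return cache->entry[idx].lkey;
}

/* Fast path: only the lookup is inlined into the caller. */
static inline uint32_t
mr_cache_lookup(struct mr_cache *cache, const void *mp)
{
	unsigned int i;

	assert(mp != NULL);
	for (i = 0; i != MR_CACHE_SIZE; ++i) {
		if (cache->entry[i].mp == NULL)
			break;  /* first unused slot: pool not seen yet */
		if (cache->entry[i].mp == mp)
			return cache->entry[i].lkey;
	}
	return mr_cache_register(cache, i, mp);
}
```

After the first call for a given pool, later calls return from the loop
and never reach mr_cache_register(), which is the behaviour the commit
message relies on.
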
 drivers/net/mlx5/Makefile|   1 +
 drivers/net/mlx5/mlx5_mr.c   | 277 +++
 drivers/net/mlx5/mlx5_rxtx.c | 209 ++--
 drivers/net/mlx5/mlx5_rxtx.h |   8 +-
 4 files changed, 295 insertions(+), 200 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_mr.c

diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 82558aa..999ada5 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -47,6 +47,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_vlan.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_stats.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_rss.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_fdir.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_mr.c

 # Dependencies.
 DEPDIRS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += lib/librte_ether
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
new file mode 100644
index 0000000..7c3e87f
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -0,0 +1,277 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2016 6WIND S.A.
+ *   Copyright 2016 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of 6WIND S.A. nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/* Verbs header. */
+/* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include 
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+/* DPDK headers don't like -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include 
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+#include "mlx5.h"
+#include "mlx5_rxtx.h"
+
+struct mlx5_check_mempool_data {
+   int ret;
+   char *start;
+   char *end;
+};
+
+/* Called by mlx5_check_mempool() when iterating the memory chunks. */
+static void mlx5_check_mempool_cb(struct rte_mempool *mp,
+   void *opaque, struct rte_mempool_memhdr *memhdr,
+   unsigned mem_idx)
+{
+   struct mlx5_check_mempool_data *data = opaque;
+
+   (void)mp;
+   (void)mem_idx;
+
+   /* It already failed, skip the next chunks. */
+   if (data->ret != 0)
+   return;
+   /* It is the first chunk. */
+   if (data->start == NULL && data->end == NULL) {
+   data->start = memhdr->addr;
+   data->end = data->start + memhdr->len;
+   return;
+   }
+   if (data->end == memhdr->addr) {
+   data->end += memhdr->len;
+   return;
+   }
+   if (data->start == (char *)memhdr->addr + memhdr->len) {
+   data->start -= memhdr->len;
+   return;
+   }
+   /* Error, mempool is not virtually contiguous. */
+   data->ret = -1;
+}
+
+/**
+ * Check if a mempool can be used: it must be virtually contiguous.
+ *
+ * @param[in] mp
+ *   Pointer to memory pool.
+ * @param[out] start
+ *   Pointer to the start address of the mempool virtual memory area
+ * @param[out] end
+ *   Pointer to the end address of the mempool virtual memory area
+ *
+ * @return
+ *   0 on success (mempo

[dpdk-dev] [PATCH v4 01/25] drivers: fix PCI class id support

2016-06-22 Thread Nelio Laranjeiro

Fixes: 701c8d80c820 ("pci: support class id probing")

Signed-off-by: Nelio Laranjeiro 
---
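A sketch of why the open-coded tables stop matching once an ID structure
grows a class field: with designated initializers, any field that is not
named is zero, and zero is not the wildcard. All EX_/ex_ names below are
stand-ins invented for this example, not the EAL definitions, and the
numeric IDs are example values only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_PCI_ANY_ID   0xffffu     /* wildcard for 16-bit ID fields */
#define EX_CLASS_ANY_ID 0xffffffu   /* wildcard for the class field */

struct ex_pci_id {
	uint32_t class_id;
	uint16_t vendor_id;
	uint16_t device_id;
	uint16_t subsystem_vendor_id;
	uint16_t subsystem_device_id;
};

/* Helper macro: fills every field, including the class wildcard. */
#define EX_PCI_DEVICE(vend, dev) \
	.class_id = EX_CLASS_ANY_ID, \
	.vendor_id = (vend), \
	.device_id = (dev), \
	.subsystem_vendor_id = EX_PCI_ANY_ID, \
	.subsystem_device_id = EX_PCI_ANY_ID

/* Old style: class_id is left out and therefore implicitly 0. */
static const struct ex_pci_id ex_id_old_style = {
	.vendor_id = 0x15b3,
	.device_id = 0x1013,
	.subsystem_vendor_id = EX_PCI_ANY_ID,
	.subsystem_device_id = EX_PCI_ANY_ID
};

/* New style: the macro supplies the class wildcard. */
static const struct ex_pci_id ex_id_with_macro = {
	EX_PCI_DEVICE(0x15b3, 0x1013)
};

static int
field_matches(uint32_t want, uint32_t have, uint32_t any)
{
	return want == any || want == have;
}

/* Return nonzero when table entry id accepts the probed device dev. */
static int
ex_pci_match(const struct ex_pci_id *id, const struct ex_pci_id *dev)
{
	assert(id != NULL && dev != NULL);
	return field_matches(id->class_id, dev->class_id, EX_CLASS_ANY_ID) &&
	       field_matches(id->vendor_id, dev->vendor_id, EX_PCI_ANY_ID) &&
	       field_matches(id->device_id, dev->device_id, EX_PCI_ANY_ID) &&
	       field_matches(id->subsystem_vendor_id,
			     dev->subsystem_vendor_id, EX_PCI_ANY_ID) &&
	       field_matches(id->subsystem_device_id,
			     dev->subsystem_device_id, EX_PCI_ANY_ID);
}
```

Under these assumptions the old-style entry rejects any device whose
class is nonzero, while the macro-based entry accepts it.
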
 drivers/crypto/qat/rte_qat_cryptodev.c |  5 +
 drivers/net/mlx4/mlx4.c| 18 ++
 drivers/net/mlx5/mlx5.c| 24 
 drivers/net/nfp/nfp_net.c  | 12 
 4 files changed, 19 insertions(+), 40 deletions(-)

diff --git a/drivers/crypto/qat/rte_qat_cryptodev.c b/drivers/crypto/qat/rte_qat_cryptodev.c
index a7912f5..f46ec85 100644
--- a/drivers/crypto/qat/rte_qat_cryptodev.c
+++ b/drivers/crypto/qat/rte_qat_cryptodev.c
@@ -69,10 +69,7 @@ static struct rte_cryptodev_ops crypto_qat_ops = {

 static struct rte_pci_id pci_id_qat_map[] = {
{
-   .vendor_id = 0x8086,
-   .device_id = 0x0443,
-   .subsystem_vendor_id = PCI_ANY_ID,
-   .subsystem_device_id = PCI_ANY_ID
+   RTE_PCI_DEVICE(0x8086, 0x0443),
},
{.device_id = 0},
 };
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 9e94630..6228688 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -5807,22 +5807,16 @@ error:

 static const struct rte_pci_id mlx4_pci_id_map[] = {
{
-   .vendor_id = PCI_VENDOR_ID_MELLANOX,
-   .device_id = PCI_DEVICE_ID_MELLANOX_CONNECTX3,
-   .subsystem_vendor_id = PCI_ANY_ID,
-   .subsystem_device_id = PCI_ANY_ID
+   RTE_PCI_DEVICE(PCI_VENDOR_ID_MELLANOX,
+  PCI_DEVICE_ID_MELLANOX_CONNECTX3)
},
{
-   .vendor_id = PCI_VENDOR_ID_MELLANOX,
-   .device_id = PCI_DEVICE_ID_MELLANOX_CONNECTX3PRO,
-   .subsystem_vendor_id = PCI_ANY_ID,
-   .subsystem_device_id = PCI_ANY_ID
+   RTE_PCI_DEVICE(PCI_VENDOR_ID_MELLANOX,
+  PCI_DEVICE_ID_MELLANOX_CONNECTX3PRO)
},
{
-   .vendor_id = PCI_VENDOR_ID_MELLANOX,
-   .device_id = PCI_DEVICE_ID_MELLANOX_CONNECTX3VF,
-   .subsystem_vendor_id = PCI_ANY_ID,
-   .subsystem_device_id = PCI_ANY_ID
+   RTE_PCI_DEVICE(PCI_VENDOR_ID_MELLANOX,
+  PCI_DEVICE_ID_MELLANOX_CONNECTX3VF)
},
{
.vendor_id = 0
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 67a541c..350028b 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -610,28 +610,20 @@ error:

 static const struct rte_pci_id mlx5_pci_id_map[] = {
{
-   .vendor_id = PCI_VENDOR_ID_MELLANOX,
-   .device_id = PCI_DEVICE_ID_MELLANOX_CONNECTX4,
-   .subsystem_vendor_id = PCI_ANY_ID,
-   .subsystem_device_id = PCI_ANY_ID
+   RTE_PCI_DEVICE(PCI_VENDOR_ID_MELLANOX,
+  PCI_DEVICE_ID_MELLANOX_CONNECTX4)
},
{
-   .vendor_id = PCI_VENDOR_ID_MELLANOX,
-   .device_id = PCI_DEVICE_ID_MELLANOX_CONNECTX4VF,
-   .subsystem_vendor_id = PCI_ANY_ID,
-   .subsystem_device_id = PCI_ANY_ID
+   RTE_PCI_DEVICE(PCI_VENDOR_ID_MELLANOX,
+  PCI_DEVICE_ID_MELLANOX_CONNECTX4VF)
},
{
-   .vendor_id = PCI_VENDOR_ID_MELLANOX,
-   .device_id = PCI_DEVICE_ID_MELLANOX_CONNECTX4LX,
-   .subsystem_vendor_id = PCI_ANY_ID,
-   .subsystem_device_id = PCI_ANY_ID
+   RTE_PCI_DEVICE(PCI_VENDOR_ID_MELLANOX,
+  PCI_DEVICE_ID_MELLANOX_CONNECTX4LX)
},
{
-   .vendor_id = PCI_VENDOR_ID_MELLANOX,
-   .device_id = PCI_DEVICE_ID_MELLANOX_CONNECTX4LXVF,
-   .subsystem_vendor_id = PCI_ANY_ID,
-   .subsystem_device_id = PCI_ANY_ID
+   RTE_PCI_DEVICE(PCI_VENDOR_ID_MELLANOX,
+  PCI_DEVICE_ID_MELLANOX_CONNECTX4LXVF)
},
{
.vendor_id = 0
diff --git a/drivers/net/nfp/nfp_net.c b/drivers/net/nfp/nfp_net.c
index ea5a2a3..dd0c559 100644
--- a/drivers/net/nfp/nfp_net.c
+++ b/drivers/net/nfp/nfp_net.c
@@ -2446,16 +2446,12 @@ nfp_net_init(struct rte_eth_dev *eth_dev)

 static struct rte_pci_id pci_id_nfp_net_map[] = {
{
-   .vendor_id = PCI_VENDOR_ID_NETRONOME,
-   .device_id = PCI_DEVICE_ID_NFP6000_PF_NIC,
-   .subsystem_vendor_id = PCI_ANY_ID,
-   .subsystem_device_id = PCI_ANY_ID,
+   RTE_PCI_DEVICE(PCI_VENDOR_ID_NETRONOME,
+  PCI_DEVICE_ID_NFP6000_PF_NIC)
},
{
-   .vendor_id = PCI_VENDOR_ID_NETRONOME,
-   .device_id = PCI_DEVICE_ID_NFP6000_VF_NIC,
-   .subsystem_vendor_id = PCI_ANY_ID,
-   .subsystem_device_id = PCI_AN

[dpdk-dev] [PATCH 00/25] Refactor mlx5 to improve performance

2016-06-22 Thread Nelio Laranjeiro

Enhance mlx5 with a data path that bypasses Verbs.

The first half of this patchset removes support for functionality completely
rewritten in the second half (scatter/gather, inline send), while the data
path is refactored without Verbs.

The PMD remains usable during the transition.

This patchset must be applied after "Miscellaneous fixes for mlx4 and mlx5".

Changes in v4:
- Fixed errno return value of mlx5_args().
- Fixed long line above 80 characters.

Changes in v3:
- Rebased patchset on top of next-net/rel_16_07.

Changes in v2:
- Rebased patchset on top of dpdk/master.
- Fixed CQE size on Power8.
- Fixed mbuf assertion failure in debug mode.
- Fixed missing class_id field in rte_pci_id by using RTE_PCI_DEVICE.

Adrien Mazarguil (8):
  mlx5: replace countdown with threshold for Tx completions
  mlx5: add debugging information about Tx queues capabilities
  mlx5: check remaining space while processing Tx burst
  mlx5: resurrect Tx gather support
  mlx5: work around spurious compilation errors
  mlx5: remove redundant Rx queue initialization code
  mlx5: make Rx queue reinitialization safer
  mlx5: resurrect Rx scatter support

Nelio Laranjeiro (16):
  drivers: fix PCI class id support
  mlx5: split memory registration function
  mlx5: remove Tx gather support
  mlx5: remove Rx scatter support
  mlx5: remove configuration variable
  mlx5: remove inline Tx support
  mlx5: split Tx queue structure
  mlx5: split Rx queue structure
  mlx5: update prerequisites for upcoming enhancements
  mlx5: add definitions for data path without Verbs
  mlx5: add support for configuration through kvargs
  mlx5: add Tx/Rx burst function selection wrapper
  mlx5: refactor Rx data path
  mlx5: refactor Tx data path
  mlx5: handle Rx CQE compression
  mlx5: add support for multi-packet send

Yaacov Hazan (1):
  mlx5: add support for inline send

 config/common_base |2 -
 doc/guides/nics/mlx5.rst   |   94 +-
 drivers/crypto/qat/rte_qat_cryptodev.c |5 +-
 drivers/net/mlx4/mlx4.c|   18 +-
 drivers/net/mlx5/Makefile  |   49 +-
 drivers/net/mlx5/mlx5.c|  182 ++-
 drivers/net/mlx5/mlx5.h|   10 +
 drivers/net/mlx5/mlx5_defs.h   |   26 +-
 drivers/net/mlx5/mlx5_ethdev.c |  189 ++-
 drivers/net/mlx5/mlx5_fdir.c   |   20 +-
 drivers/net/mlx5/mlx5_mr.c |  280 
 drivers/net/mlx5/mlx5_prm.h|  163 +++
 drivers/net/mlx5/mlx5_rxmode.c |8 -
 drivers/net/mlx5/mlx5_rxq.c|  762 ---
 drivers/net/mlx5/mlx5_rxtx.c   | 2210 +++-
 drivers/net/mlx5/mlx5_rxtx.h   |  176 ++-
 drivers/net/mlx5/mlx5_txq.c|  368 +++---
 drivers/net/mlx5/mlx5_vlan.c   |6 +-
 drivers/net/nfp/nfp_net.c  |   12 +-
 19 files changed, 2625 insertions(+), 1955 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_mr.c
 create mode 100644 drivers/net/mlx5/mlx5_prm.h

-- 
2.1.4

[dpdk-dev] [PATCH v3] i40e: fix the type issue of a single VLAN type

2016-06-22 Thread Beilei Xing

In the current i40e code base, a single VLAN header in a packet is
treated as the inner VLAN. In general, however, a single VLAN header is
treated as the outer VLAN header, so change the register used for the
single VLAN case accordingly.

Fixes: 19b16e2f6442 ("ethdev: add vlan type when setting ether type")

Signed-off-by: Beilei Xing 
---
v3 changes:
 Note it as a "fixed issue" in the i40e driver.
 Reword the title.
v2 changes:
 Combine corresponding i40e driver changes into this patch.
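
The register selection introduced in i40e_vlan_tpid_set() can be read as
a small decision table. The sketch below restates it outside the driver;
the ex_ names are made up for the example, and the register indexes 2
and 3 are taken from the hunk further down.

```c
#include <assert.h>
#include <errno.h>

enum ex_vlan_type {
	EX_VLAN_TYPE_UNKNOWN = 0,
	EX_VLAN_TYPE_INNER,   /* inner VLAN (QinQ only) */
	EX_VLAN_TYPE_OUTER,   /* single VLAN, or outer VLAN */
	EX_VLAN_TYPE_MAX,
};

/*
 * Return the register index for the given VLAN type, or -EINVAL when the
 * combination is not supported. qinq is nonzero when double VLAN is on.
 */
static int
ex_tpid_reg_id(enum ex_vlan_type type, int qinq)
{
	switch (type) {
	case EX_VLAN_TYPE_OUTER:
		/* A single VLAN uses the register that holds the inner
		 * TPID in QinQ mode. */
		return qinq ? 2 : 3;
	case EX_VLAN_TYPE_INNER:
		/* There is no inner VLAN without QinQ. */
		return qinq ? 3 : -EINVAL;
	default:
		return -EINVAL;
	}
}

/* Small usage example covering the four meaningful combinations. */
static void
ex_tpid_selfcheck(void)
{
	assert(ex_tpid_reg_id(EX_VLAN_TYPE_OUTER, 1) == 2);
	assert(ex_tpid_reg_id(EX_VLAN_TYPE_OUTER, 0) == 3);
	assert(ex_tpid_reg_id(EX_VLAN_TYPE_INNER, 1) == 3);
	assert(ex_tpid_reg_id(EX_VLAN_TYPE_INNER, 0) == -EINVAL);
}
```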

 doc/guides/rel_notes/release_16_07.rst |  7 +++
 drivers/net/i40e/i40e_ethdev.c | 29 -
 lib/librte_ether/rte_ethdev.h  |  4 ++--
 3 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/doc/guides/rel_notes/release_16_07.rst b/doc/guides/rel_notes/release_16_07.rst
index 7aeacb2..f6912a7 100644
--- a/doc/guides/rel_notes/release_16_07.rst
+++ b/doc/guides/rel_notes/release_16_07.rst
@@ -113,6 +113,13 @@ Drivers
   info to descriptor.
   Now this issue is fixed by disabling vlan stripping from inner header.

+* **i40e: Fixed the type issue of a single VLAN type.**
+
+  Currently, if a single VLAN header is added in a packet, it's treated
+  as inner VLAN. But generally, a single VLAN header is treated as the
+  outer VLAN header.
+  This issue is fixed by changing corresponding register for single VLAN.
+

 Libraries
 ~~~~~~~~~
diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index f94ad87..838889e 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -924,12 +924,6 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
 "VLAN ether type");
goto err_setup_pf_switch;
}
-   ret = i40e_vlan_tpid_set(dev, ETH_VLAN_TYPE_INNER, ETHER_TYPE_VLAN);
-   if (ret != I40E_SUCCESS) {
-   PMD_INIT_LOG(ERR, "Failed to set the default outer "
-"VLAN ether type");
-   goto err_setup_pf_switch;
-   }

/* PF setup, which includes VSI setup */
ret = i40e_pf_setup(pf);
@@ -2442,13 +2436,24 @@ i40e_vlan_tpid_set(struct rte_eth_dev *dev,
uint64_t reg_r = 0, reg_w = 0;
uint16_t reg_id = 0;
int ret = 0;
+   int qinq = dev->data->dev_conf.rxmode.hw_vlan_extend;

switch (vlan_type) {
case ETH_VLAN_TYPE_OUTER:
-   reg_id = 2;
+   if (qinq)
+   reg_id = 2;
+   else
+   reg_id = 3;
break;
case ETH_VLAN_TYPE_INNER:
-   reg_id = 3;
+   if (qinq)
+   reg_id = 3;
+   else {
+   ret = -EINVAL;
+   PMD_DRV_LOG(ERR, "Unsupported vlan type "
+   "in single vlan.\n");
+   return ret;
+   }
break;
default:
ret = -EINVAL;
@@ -2510,8 +2515,14 @@ i40e_vlan_offload_set(struct rte_eth_dev *dev, int mask)
}

if (mask & ETH_VLAN_EXTEND_MASK) {
-   if (dev->data->dev_conf.rxmode.hw_vlan_extend)
+   if (dev->data->dev_conf.rxmode.hw_vlan_extend) {
i40e_vsi_config_double_vlan(vsi, TRUE);
+   /* Set global registers with default ether type value */
+   i40e_vlan_tpid_set(dev, ETH_VLAN_TYPE_OUTER,
+  ETHER_TYPE_VLAN);
+   i40e_vlan_tpid_set(dev, ETH_VLAN_TYPE_INNER,
+  ETHER_TYPE_VLAN);
+   }
else
i40e_vsi_config_double_vlan(vsi, FALSE);
}
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index bd93bf6..6804a86 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -363,8 +363,8 @@ struct rte_eth_rxmode {
  */
 enum rte_vlan_type {
ETH_VLAN_TYPE_UNKNOWN = 0,
-   ETH_VLAN_TYPE_INNER, /**< Single VLAN, or inner VLAN. */
-   ETH_VLAN_TYPE_OUTER, /**< Outer VLAN. */
+   ETH_VLAN_TYPE_INNER, /**< Inner VLAN. */
+   ETH_VLAN_TYPE_OUTER, /**< Single VLAN, or outer VLAN. */
ETH_VLAN_TYPE_MAX,
 };

-- 
2.5.0

[dpdk-dev] [PATCH v16 3/3] mbuf: make default mempool ops configurable at build

2016-06-22 Thread David Hunt

By default, the mempool ops used for mbuf allocations are a multi-producer,
multi-consumer ring. One can imagine a target (perhaps some network
processors) that provides a hardware-assisted pool mechanism. In that
case, the default configuration for this architecture would contain a
different value for RTE_MBUF_DEFAULT_MEMPOOL_OPS.

Signed-off-by: Olivier Matz 
Signed-off-by: David Hunt 
Reviewed-by: Jan Viktorin 
Acked-by: Shreyansh Jain 
Acked-by: Olivier Matz 
---
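The mechanism is a build-time string that selects a set of ops by name
at pool creation. A minimal sketch of that pattern outside DPDK follows;
EX_DEFAULT_POOL_OPS, struct ex_pool_ops and the "hw_pool" entry are all
hypothetical and only stand in for the real option and handler table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A target configuration may override this with -DEX_DEFAULT_POOL_OPS=... */
#ifndef EX_DEFAULT_POOL_OPS
#define EX_DEFAULT_POOL_OPS "ring_mp_mc"
#endif

struct ex_pool_ops {
	const char *name;
	int hw_assisted;  /* nonzero for a hardware-backed pool */
};

/* Registered handlers; "hw_pool" is an invented example entry. */
static const struct ex_pool_ops ex_ops_table[] = {
	{ "ring_mp_mc", 0 },
	{ "hw_pool", 1 },
};

/* Look a handler up by name; return NULL when it is not registered. */
static const struct ex_pool_ops *
ex_ops_by_name(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i != sizeof(ex_ops_table) / sizeof(ex_ops_table[0]); ++i)
		if (strcmp(ex_ops_table[i].name, name) == 0)
			return &ex_ops_table[i];
	return NULL;
}

/* Resolve whichever handler the build configuration selected. */
static const struct ex_pool_ops *
ex_default_ops(void)
{
	return ex_ops_by_name(EX_DEFAULT_POOL_OPS);
}
```

Resolving the name at run time means a misspelled build option shows up
as a NULL handler at pool creation rather than as a link error.
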
 config/common_base |  1 +
 lib/librte_mbuf/rte_mbuf.c | 26 ++
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/config/common_base b/config/common_base
index 11ac81e..5f230db 100644
--- a/config/common_base
+++ b/config/common_base
@@ -394,6 +394,7 @@ CONFIG_RTE_LIBRTE_MEMPOOL_DEBUG=n
 #
 CONFIG_RTE_LIBRTE_MBUF=y
 CONFIG_RTE_LIBRTE_MBUF_DEBUG=n
+CONFIG_RTE_MBUF_DEFAULT_MEMPOOL_OPS="ring_mp_mc"
 CONFIG_RTE_MBUF_REFCNT_ATOMIC=y
 CONFIG_RTE_PKTMBUF_HEADROOM=128

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 2ece742..8cf5436 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -153,6 +153,7 @@ rte_pktmbuf_pool_create(const char *name, unsigned n,
unsigned cache_size, uint16_t priv_size, uint16_t data_room_size,
int socket_id)
 {
+   struct rte_mempool *mp;
struct rte_pktmbuf_pool_private mbp_priv;
unsigned elt_size;

@@ -167,10 +168,27 @@ rte_pktmbuf_pool_create(const char *name, unsigned n,
mbp_priv.mbuf_data_room_size = data_room_size;
mbp_priv.mbuf_priv_size = priv_size;

-   return rte_mempool_create(name, n, elt_size,
-   cache_size, sizeof(struct rte_pktmbuf_pool_private),
-   rte_pktmbuf_pool_init, &mbp_priv, rte_pktmbuf_init, NULL,
-   socket_id, 0);
+   mp = rte_mempool_create_empty(name, n, elt_size, cache_size,
+sizeof(struct rte_pktmbuf_pool_private), socket_id, 0);
+   if (mp == NULL)
+   return NULL;
+
+   rte_errno = rte_mempool_set_ops_byname(mp,
+   RTE_MBUF_DEFAULT_MEMPOOL_OPS, NULL);
+   if (rte_errno != 0) {
+   RTE_LOG(ERR, MBUF, "error setting mempool handler\n");
+   return NULL;
+   }
+   rte_pktmbuf_pool_init(mp, &mbp_priv);
+
+   if (rte_mempool_populate_default(mp) < 0) {
+   rte_mempool_free(mp);
+   return NULL;
+   }
+
+   rte_mempool_obj_iter(mp, rte_pktmbuf_init, NULL);
+
+   return mp;
 }

 /* do some sanity checks on a mbuf: panic if it fails */
-- 
2.5.5

[dpdk-dev] [PATCH v16 2/3] app/test: test mempool handler

2016-06-22 Thread David Hunt

Create a minimal custom mempool handler and check that it
passes basic mempool autotests.

Signed-off-by: Olivier Matz 
Signed-off-by: David Hunt 
Reviewed-by: Jan Viktorin 
Acked-by: Shreyansh Jain 
Acked-by: Olivier Matz 
---
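The custom handler in this test is essentially a bounded LIFO array of
object pointers. The sketch below shows the same enqueue/dequeue logic
with plain libc instead of the DPDK allocator, and without the spinlock,
so it is only suitable for a single thread; the ex_pool names are
invented for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct ex_pool {
	unsigned int count;  /* pointers currently stored */
	unsigned int size;   /* capacity of elts[] */
	void *elts[];        /* flexible array of object pointers */
};

/* Allocate an empty pool able to hold size pointers. */
static struct ex_pool *
ex_pool_alloc(unsigned int size)
{
	struct ex_pool *p = calloc(1, sizeof(*p) + size * sizeof(void *));

	if (p == NULL)
		return NULL;
	p->size = size;
	return p;
}

/* Push n pointers; all or nothing. */
static int
ex_pool_enqueue(struct ex_pool *p, void * const *objs, unsigned int n)
{
	assert(p != NULL);
	if (p->count + n > p->size)
		return -ENOBUFS;
	memcpy(&p->elts[p->count], objs, sizeof(void *) * n);
	p->count += n;
	return 0;
}

/* Pop the n most recently pushed pointers; all or nothing. */
static int
ex_pool_dequeue(struct ex_pool *p, void **objs, unsigned int n)
{
	assert(p != NULL);
	if (n > p->count)
		return -ENOENT;
	p->count -= n;
	memcpy(objs, &p->elts[p->count], sizeof(void *) * n);
	return 0;
}
```

Both operations fail without touching the array when the request cannot
be satisfied in full, matching the -ENOBUFS and -ENOENT returns in the
diff below.
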
 app/test/test_mempool.c | 122 +++-
 1 file changed, 120 insertions(+), 2 deletions(-)

diff --git a/app/test/test_mempool.c b/app/test/test_mempool.c
index b586249..31582d8 100644
--- a/app/test/test_mempool.c
+++ b/app/test/test_mempool.c
@@ -83,6 +83,99 @@
 static rte_atomic32_t synchro;

 /*
+ * Simple example of custom mempool structure. Holds pointers to all the
+ * elements which are simply malloc'd in this example.
+ */
+struct custom_mempool {
+   rte_spinlock_t lock;
+   unsigned count;
+   unsigned size;
+   void *elts[];
+};
+
+/*
+ * Loop through all the element pointers and allocate a chunk of memory, then
+ * insert that memory into the ring.
+ */
+static int
+custom_mempool_alloc(struct rte_mempool *mp)
+{
+   struct custom_mempool *cm;
+
+   cm = rte_zmalloc("custom_mempool",
+   sizeof(struct custom_mempool) + mp->size * sizeof(void *), 0);
+   if (cm == NULL)
+   return -ENOMEM;
+
+   rte_spinlock_init(&cm->lock);
+   cm->count = 0;
+   cm->size = mp->size;
+   mp->pool_data = cm;
+   return 0;
+}
+
+static void
+custom_mempool_free(struct rte_mempool *mp)
+{
+   rte_free((void *)(mp->pool_data));
+}
+
+static int
+custom_mempool_enqueue(struct rte_mempool *mp, void * const *obj_table,
+   unsigned n)
+{
+   struct custom_mempool *cm = (struct custom_mempool *)(mp->pool_data);
+   int ret = 0;
+
+   rte_spinlock_lock(&cm->lock);
+   if (cm->count + n > cm->size) {
+   ret = -ENOBUFS;
+   } else {
+   memcpy(&cm->elts[cm->count], obj_table, sizeof(void *) * n);
+   cm->count += n;
+   }
+   rte_spinlock_unlock(&cm->lock);
+   return ret;
+}
+
+
+static int
+custom_mempool_dequeue(struct rte_mempool *mp, void **obj_table, unsigned n)
+{
+   struct custom_mempool *cm = (struct custom_mempool *)(mp->pool_data);
+   int ret = 0;
+
+   rte_spinlock_lock(&cm->lock);
+   if (n > cm->count) {
+   ret = -ENOENT;
+   } else {
+   cm->count -= n;
+   memcpy(obj_table, &cm->elts[cm->count], sizeof(void *) * n);
+   }
+   rte_spinlock_unlock(&cm->lock);
+   return ret;
+}
+
+static unsigned
+custom_mempool_get_count(const struct rte_mempool *mp)
+{
+   struct custom_mempool *cm = (struct custom_mempool *)(mp->pool_data);
+
+   return cm->count;
+}
+
+static struct rte_mempool_ops mempool_ops_custom = {
+   .name = "custom_handler",
+   .alloc = custom_mempool_alloc,
+   .free = custom_mempool_free,
+   .enqueue = custom_mempool_enqueue,
+   .dequeue = custom_mempool_dequeue,
+   .get_count = custom_mempool_get_count,
+};
+
+MEMPOOL_REGISTER_OPS(mempool_ops_custom);
+
+/*
  * save the object number in the first 4 bytes of object data. All
  * other bytes are set to 0.
  */
@@ -292,12 +385,14 @@ static int test_mempool_single_consumer(void)
  * test function for mempool test based on singple consumer and single producer,
  * can run on one lcore only
  */
-static int test_mempool_launch_single_consumer(__attribute__((unused)) void *arg)
+static int
+test_mempool_launch_single_consumer(__attribute__((unused)) void *arg)
 {
return test_mempool_single_consumer();
 }

-static void my_mp_init(struct rte_mempool * mp, __attribute__((unused)) void * arg)
+static void
+my_mp_init(struct rte_mempool *mp, __attribute__((unused)) void *arg)
 {
printf("mempool name is %s\n", mp->name);
/* nothing to be implemented here*/
@@ -477,6 +572,7 @@ test_mempool(void)
 {
struct rte_mempool *mp_cache = NULL;
struct rte_mempool *mp_nocache = NULL;
+   struct rte_mempool *mp_ext = NULL;

rte_atomic32_init(&synchro);

@@ -505,6 +601,27 @@ test_mempool(void)
goto err;
}

+   /* create a mempool with an external handler */
+   mp_ext = rte_mempool_create_empty("test_ext",
+   MEMPOOL_SIZE,
+   MEMPOOL_ELT_SIZE,
+   RTE_MEMPOOL_CACHE_MAX_SIZE, 0,
+   SOCKET_ID_ANY, 0);
+
+   if (mp_ext == NULL) {
+   printf("cannot allocate mp_ext mempool\n");
+   goto err;
+   }
+   if (rte_mempool_set_ops_byname(mp_ext, "custom_handler", NULL) < 0) {
+   printf("cannot set custom handler\n");
+   goto err;
+   }
+   if (rte_mempool_populate_default(mp_ext) < 0) {
+   printf("cannot populate mp_ext mempool\n");
+   goto err;
+   }
+   rte_mempool_obj_iter(mp_ext, my_obj_init, NULL);
+
/* retrieve the mempool from its name */
if (rte_mempool_lookup("test_nocache") != mp_nocache) {

[dpdk-dev] [PATCH v16 1/3] mempool: support mempool handler operations

2016-06-22 Thread David Hunt

Until now, the objects stored in a mempool were internally stored in a
ring. This patch introduces the possibility to register external handlers
replacing the ring.

The default behavior remains unchanged, but calling the new function
rte_mempool_set_ops_byname() right after rte_mempool_create_empty() allows
the user to change the handler that will be used when populating
the mempool.

This patch also adds a set of default ops (function callbacks) based
on rte_ring.
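
The registration and lookup-by-name idea can be modelled in a few lines of plain C. This is a minimal sketch under assumptions, not the code from this patch: `sketch_ops`, `sketch_register_ops()` and `sketch_set_ops_byname()` are hypothetical names, the callback set is reduced to `alloc`, and the error codes are illustrative. The one detail taken from the series is that a pool records an index into the ops table.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_OPS 4

/* Hypothetical, reduced callback set; the real ops structure has more. */
struct sketch_ops {
	const char *name;
	int (*alloc)(void);
};

static struct sketch_ops sketch_table[SKETCH_MAX_OPS];
static unsigned int sketch_num_ops;

/* Add an ops entry; returns its table index or a negative errno value. */
static int
sketch_register_ops(const struct sketch_ops *ops)
{
	unsigned int i;

	if (ops->alloc == NULL)
		return -EINVAL;
	if (sketch_num_ops >= SKETCH_MAX_OPS)
		return -ENOSPC;
	for (i = 0; i < sketch_num_ops; i++)
		if (strcmp(sketch_table[i].name, ops->name) == 0)
			return -EEXIST;
	sketch_table[sketch_num_ops] = *ops;
	return (int)sketch_num_ops++;
}

/* A pool only records which table entry it uses. */
struct sketch_mempool {
	int ops_index;
};

/* Point the pool at the ops registered under the given name. */
static int
sketch_set_ops_byname(struct sketch_mempool *mp, const char *name)
{
	unsigned int i;

	for (i = 0; i < sketch_num_ops; i++) {
		if (strcmp(sketch_table[i].name, name) == 0) {
			mp->ops_index = (int)i;
			return 0;
		}
	}
	return -EINVAL;
}

static int sketch_ring_alloc(void) { return 0; }
static int sketch_custom_alloc(void) { return 0; }

/* Register two handlers, select one by name, then call through the table. */
static int
sketch_ops_selfcheck(void)
{
	const struct sketch_ops ring = { "ring", sketch_ring_alloc };
	const struct sketch_ops custom = { "custom_handler", sketch_custom_alloc };
	struct sketch_mempool mp = { .ops_index = 0 };
	int rc;

	rc = sketch_register_ops(&ring);
	assert(rc == 0);
	rc = sketch_register_ops(&custom);
	assert(rc == 1);
	rc = sketch_register_ops(&custom); /* duplicate name is refused */
	assert(rc == -EEXIST);
	rc = sketch_set_ops_byname(&mp, "custom_handler");
	assert(rc == 0 && mp.ops_index == 1);
	rc = sketch_set_ops_byname(&mp, "missing");
	assert(rc == -EINVAL);
	(void)rc;
	return sketch_table[mp.ops_index].alloc();
}
```

Keeping an index rather than a pointer in the pool is one plausible way to stay valid across processes that map the table at different addresses; treat that rationale as an assumption rather than a statement from this patch.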

Signed-off-by: Olivier Matz 
Signed-off-by: David Hunt 
Acked-by: Shreyansh Jain 
Acked-by: Olivier Matz 
---
 app/test/test_mempool_perf.c               |   1 -
 doc/guides/prog_guide/mempool_lib.rst      |  32 +++-
 doc/guides/rel_notes/deprecation.rst       |   9 -
 lib/librte_mempool/Makefile                |   2 +
 lib/librte_mempool/rte_mempool.c           |  67 +++-
 lib/librte_mempool/rte_mempool.h           | 255 ++---
 lib/librte_mempool/rte_mempool_ops.c       | 151 +
 lib/librte_mempool/rte_mempool_ring.c      | 161 ++
 lib/librte_mempool/rte_mempool_version.map |  13 +-
 9 files changed, 610 insertions(+), 81 deletions(-)
 create mode 100644 lib/librte_mempool/rte_mempool_ops.c
 create mode 100644 lib/librte_mempool/rte_mempool_ring.c

diff --git a/app/test/test_mempool_perf.c b/app/test/test_mempool_perf.c
index c5e3576..c5f8455 100644
--- a/app/test/test_mempool_perf.c
+++ b/app/test/test_mempool_perf.c
@@ -161,7 +161,6 @@ per_lcore_mempool_test(__attribute__((unused)) void *arg)
   n_get_bulk);
if (unlikely(ret < 0)) {
rte_mempool_dump(stdout, mp);
-   rte_ring_dump(stdout, mp->ring);
/* in this case, objects are lost... */
return -1;
}
diff --git a/doc/guides/prog_guide/mempool_lib.rst b/doc/guides/prog_guide/mempool_lib.rst
index c3afc2e..1943fc4 100644
--- a/doc/guides/prog_guide/mempool_lib.rst
+++ b/doc/guides/prog_guide/mempool_lib.rst
@@ -34,7 +34,8 @@ Mempool Library
 ===============

 A memory pool is an allocator of a fixed-sized object.
-In the DPDK, it is identified by name and uses a ring to store free objects.
+In the DPDK, it is identified by name and uses a mempool handler to store free objects.
+The default mempool handler is ring based.
 It provides some other optional services such as a per-core object cache and
 an alignment helper to ensure that objects are padded to spread them equally on all DRAM or DDR3 channels.

@@ -127,6 +128,35 @@ The maximum size of the cache is static and is defined at compilation time (CONF
A mempool in Memory with its Associated Ring


+Mempool Handlers
+----------------
+
+This allows external memory subsystems, such as external hardware memory
+management systems and software based memory allocators, to be used with DPDK.
+
+There are two aspects to a mempool handler.
+
+* Adding the code for your new mempool operations (ops). This is achieved by
+  adding a new mempool ops code, and using the ``MEMPOOL_REGISTER_OPS`` macro.
+
+* Using the new API to call ``rte_mempool_create_empty()`` and
+  ``rte_mempool_set_ops_byname()`` to create a new mempool and specifying which
+  ops to use.
+
+Several different mempool handlers may be used in the same application. A new
+mempool can be created by using the ``rte_mempool_create_empty()`` function,
+then using ``rte_mempool_set_ops_byname()`` to point the mempool to the
+relevant mempool handler callback (ops) structure.
+
+Legacy applications may continue to use the old ``rte_mempool_create()`` API
+call, which uses a ring based mempool handler by default. These applications
+will need to be modified to use a new mempool handler.
+
+For applications that use ``rte_pktmbuf_pool_create()``, there is a config setting
+(``RTE_MBUF_DEFAULT_MEMPOOL_OPS``) that allows the application to make use of
+an alternative mempool handler.
+
+
 Use Cases
 ---------

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index f75183f..3cbc19e 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -34,15 +34,6 @@ Deprecation Notices
   compact API. The ones that remain are backwards compatible and use the
   per-lcore default cache if available. This change targets release 16.07.

-* The rte_mempool struct will be changed in 16.07 to facilitate the new
-  external mempool manager functionality.
-  The ring element will be replaced with a more generic 'pool' opaque pointer
-  to allow new mempool handlers to use their own user-defined mempool
-  layout. Also newly added to rte_mempool is a handler index.
-  The existing API will be backward compatible, but there will be new API
-  functions added to facilitate the creation of mempools usi

[dpdk-dev] [PATCH v16 0/3] mempool: add mempool handler feature

2016-06-22 Thread David Hunt

Here's the latest version of the Mempool Handler patch set.
It's re-based on top of the latest head as of 20/6/2016, including
Olivier's 35-part patch series on mempool re-org [1]

[1] http://dpdk.org/ml/archives/dev/2016-May/039229.html

v16 changes:

 * Changed rte_mempool_ops_get() to rte_mempool_get_ops()
 * Changed rte_mempool_ops_register() to rte_mempool_register_ops()
 * Applied missing changes that should have been in v15

v15 changes:

 * Changed rte_mempool_ops_get() to rte_mempool_get_ops()
 * Did some minor tweaks to comments after the previous change of function
   names from put/get to enqueue/dequeue
 * Added missing spinlock_unlock in rte_mempool_ops_register()
 * Added check for null in ops_free
 * removed un-needed return statement

v14 changes:

 * set MEMPOOL_F_RING_CREATED flag after rte_mempool_ring_create() is called.
 * Changed name of feature from "external mempool manager" to "mempool handler"
   and updated comments and release notes accordingly.
 * Added a comment for newly added pool_config param in
   rte_mempool_set_ops_byname.

v13 changes:

 * Added in extra opaque data (pool_config) to mempool struct for mempool
   configuration by the ops functions. For example, this can be used to pass
   device names or device flags to the underlying alloc function.
 * Added mempool_config param to rte_mempool_set_ops_byname()

v12 changes:

 * Fixed a comment (function param h -> ops)
 * fixed a typo (callbacki)

v11 changes:

 * Fixed comments (added '.' where needed for consistency)
 * removed ABI breakage notice for mempool manager in deprecation.rst
 * Added description of the external mempool manager functionality to
   doc/guides/prog_guide/mempool_lib.rst (John Mc reviewed)
 * renamed rte_mempool_default.c to rte_mempool_ring.c

v10 changes:

 * changed the _put/_get op names to _enqueue/_dequeue to be consistent
   with the function names
 * some rte_errno cleanup
 * comment tweaks about when to set pool_data
 * removed an un-needed check for ops->alloc == NULL

v9 changes:

 * added a check for NULL alloc in rte_mempool_ops_register
 * rte_mempool_alloc_t now returns int instead of void*
 * fixed some comment typos
 * removed some unneeded typecasts
 * changed a return NULL to return -EEXIST in rte_mempool_ops_register
 * fixed rte_mempool_version.map file so builds ok as shared libs
 * moved flags check from rte_mempool_create_empty to rte_mempool_create

v8 changes:

 * merged first three patches in the series into one.
 * changed parameters to ops callback to all be rte_mempool pointer
   rather than pointer to opaque data or uint64.
 * comment fixes.
 * fixed parameter to _free function (was inconsistent).
 * changed MEMPOOL_F_RING_CREATED to MEMPOOL_F_POOL_CREATED

v7 changes:

 * Changed rte_mempool_handler_table to rte_mempool_ops_table
 * Changed handler_idx to ops_index in rte_mempool struct
 * Reworked comments in rte_mempool.h around ops functions
 * Changed rte_mempool_handler.c to rte_mempool_ops.c
 * Changed all functions containing _handler_ to _ops_
 * Now there is no mention of 'handler' left
 * Other small changes out of review of mailing list

v6 changes:

 * Moved the flags handling from rte_mempool_create_empty to
   rte_mempool_create, as it's only there for backward compatibility
 * Various comment additions and cleanup
 * Renamed rte_mempool_handler to rte_mempool_ops
 * Added a union for *pool and u64 pool_id in struct rte_mempool
 * split the original patch into a few parts for easier review.
 * rename functions with _ext_ to _ops_.
 * addressed review comments
 * renamed put and get functions to enqueue and dequeue
 * changed occurrences of rte_mempool_ops to const, as they
   contain function pointers (security)
 * split out the default external mempool handler into a separate
   patch for easier review

v5 changes:
 * rebasing, as it is dependent on another patch series [1]

v4 changes (Olivier Matz):
 * remove the rte_mempool_create_ext() function. To change the handler, the
   user has to do the following:
   - mp = rte_mempool_create_empty()
   - rte_mempool_set_handler(mp, "my_handler")
   - rte_mempool_populate_default(mp)
   This avoids adding another function with more than 10 arguments and
   duplicating the doxygen comments
 * change the api of rte_mempool_alloc_t: only the mempool pointer is required
   as all information is available in it
 * change the api of rte_mempool_free_t: remove return value
 * move inline wrapper functions from the .c to the .h (else they won't be
   inlined). This implies having one header file (rte_mempool.h), or it
   would have generated cross-dependency issues.
 * remove now unused MEMPOOL_F_INT_HANDLER (note: it was misused anyway due
   to the use of && instead of &)
 * fix build in debug mode (__MEMPOOL_STAT_ADD(mp, put_pool, n) remaining)
 * fix build with shared libraries (global handler has to be declared in
   the .map file)
 * rationalize #include order
 * remove unused function rte_mempool_get_han

[dpdk-dev] [PATCH v3 1/2] ethdev: add callback to get register size in bytes

2016-06-22 Thread Thomas Monjalon

2016-06-22 10:19, Zyta Szpak:
> Note that if we remove rte_eth_dev_get_reg_length() then it will break
> all of the drivers that implement it. Shall I remove it, or leave it and
> modify only ethtool to use rte_eth_dev_get_regs() to get reg size? In
> the end the drivers will have to implement the part of setting the size
> in reg_info struct. rte_eth_dev_get_regs() itself wouldn't change at all.
> Or do you have a different opinion?

igb, ixgbe and i40e must be updated in the same patch to comply with the
new behaviour of rte_eth_dev_get_regs.
rte_eth_dev_get_reg_length can be deprecated and removed in the next release.
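
To make the proposed two-call convention concrete, here is a rough standalone sketch. Everything in it is an assumption made for illustration: `struct sketch_reg_info`, `sketch_get_regs()` and the register values are hypothetical and do not reproduce the ethdev structures. The first call passes a NULL data pointer to learn the length and width, the second call fills the buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical register-dump descriptor, loosely modelled on this thread. */
struct sketch_reg_info {
	void *data;      /* caller buffer, or NULL to query the sizes only */
	uint32_t length; /* number of registers */
	uint32_t width;  /* size of one register in bytes */
};

#define SKETCH_NB_REGS 8

/* Hypothetical driver callback following the proposed convention. */
static int
sketch_get_regs(struct sketch_reg_info *info)
{
	uint32_t *out;
	uint32_t i;

	if (info->data == NULL) {
		/* First call: report how much space the caller must provide. */
		info->length = SKETCH_NB_REGS;
		info->width = sizeof(uint32_t);
		return 0;
	}
	/* Second call: fill the caller's buffer. */
	out = info->data;
	for (i = 0; i < SKETCH_NB_REGS; i++)
		out[i] = 0x1000 + i; /* made-up register contents */
	return 0;
}

/* Query, allocate, then fetch; returns the last register value read. */
static uint32_t
sketch_dump_last_reg(void)
{
	struct sketch_reg_info info = { .data = NULL };
	uint32_t last;
	int rc;

	rc = sketch_get_regs(&info);
	assert(rc == 0);
	info.data = calloc(info.length, info.width);
	if (info.data == NULL)
		return 0;
	rc = sketch_get_regs(&info);
	assert(rc == 0);
	(void)rc;
	last = ((uint32_t *)info.data)[info.length - 1];
	free(info.data);
	return last;
}
```

With this shape the caller never needs a separate length query, which is why the dedicated function could be deprecated.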


> On 21.06.2016 11:55, Zyta Szpak wrote:
> > OK, I will do the v4.
> >
> > On 17.06.2016 12:20, Thomas Monjalon wrote:
> >> 2016-06-13 16:51, Remy Horton:
> >>> On 12/06/2016 15:51, Zyta Szpak wrote:
>   I would prefer having only one function rte_eth_dev_get_regs()
>   which returns length and width if data is NULL.
>   The first call is a parameter request before buffer allocation,
>   and the second call fills the buffer.
> 
>   We can deprecate the old API and introduce this new one.
> 
>   Opinions?
> 
>  In my opinion as it is now it works fine. Gathering all parameters in
>  one callback might be a good idea if the maintainer also agrees to 
>  that
>  because as I mentioned, it interferes.
> >>>   From my perspective changing rte_eth_dev_get_regs() isn't a 
> >>> problem, as
> >>> it isn't used directly rather than through rte_ethtool_get_regs()..
> >> Zyta, would you like to make a v4?
> >
>

[dpdk-dev] [PATCH 3/3] app/pdump: fix string overflow

2016-06-22 Thread Bruce Richardson

On Wed, Jun 22, 2016 at 12:16:27PM +0530, Anupam Kapoor wrote:
> >   if (!strcmp(key, PDUMP_RX_DEV_ARG)) {
> > - strncpy(pt->rx_dev, value, strlen(value));
> > + strncpy(pt->rx_dev, value, sizeof(pt->rx_dev)-1);
> 
> I guess size-1 is to give room for terminating null byte, but for this
> case is it guaranteed that the last byte of pt->rx_dev is NUL?
> 
> Why not just use snprintf(...) here, since it has better error behavior?
> It might be a bit slow compared to str*cpy, but hopefully that should be
> ok.
> 

Definite +1. For safely copying strings I think snprintf is often the easiest
API to use.

/Bruce
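
As an illustration of the snprintf suggestion, a minimal sketch follows. The helper name `sketch_copy_devname()` and the buffer size `SKETCH_DEV_LEN` are hypothetical and not taken from app/pdump. The point is that snprintf always terminates the destination when its size is non-zero, and its return value tells the caller whether the source was truncated, whereas strncpy with size-1 leaves the last byte untouched.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_DEV_LEN 8 /* hypothetical size of a field such as pt->rx_dev */

/*
 * Copy src into dst (capacity dst_len) with snprintf.
 * Returns 0 on success, -1 if the value had to be truncated.
 */
static int
sketch_copy_devname(char *dst, size_t dst_len, const char *src)
{
	int n = snprintf(dst, dst_len, "%s", src);

	/* snprintf returns the length it wanted to write, excluding the NUL. */
	if (n < 0 || (size_t)n >= dst_len)
		return -1;
	return 0;
}

/* Check a value that fits and one that does not; returns 0 when done. */
static int
sketch_copy_selfcheck(void)
{
	char dev[SKETCH_DEV_LEN];
	int rc;

	memset(dev, 'X', sizeof(dev)); /* no terminator anywhere yet */
	rc = sketch_copy_devname(dev, sizeof(dev), "eth0");
	assert(rc == 0 && strcmp(dev, "eth0") == 0);

	rc = sketch_copy_devname(dev, sizeof(dev), "a_very_long_device_name");
	assert(rc == -1);                    /* truncation is reported */
	assert(strcmp(dev, "a_very_") == 0); /* result is still terminated */
	(void)rc;
	return 0;
}
```

Whether truncation should be an error for a device name argument is a design choice for the caller; the sketch only makes the condition visible.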

> --
> thanks
> anupam
> 
> 
> On Tue, Jun 21, 2016 at 10:51 PM, Ferruh Yigit 
> wrote:
> 
> > On 6/21/2016 4:18 PM, Reshma Pattan wrote:
> > > using source length in strncpy can cause destination
> > > overflow if destination length is not big enough to
> > > handle the source string. Changes are made to use destination
> > > size instead of source length in strncpy.
> > >
> > > Coverity issue 127351: string overflow
> > >
> > > Fixes: caa7028276b8 ("app/pdump: add tool for packet capturing")
> > >
> > > Signed-off-by: Reshma Pattan 
> > > ---
> > >  app/pdump/main.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/app/pdump/main.c b/app/pdump/main.c
> > > index f8923b9..af92ef3 100644
> > > --- a/app/pdump/main.c
> > > +++ b/app/pdump/main.c
> > > @@ -217,12 +217,12 @@ parse_rxtxdev(const char *key, const char *value,
> > void *extra_args)
> > >   struct pdump_tuples *pt = extra_args;
> > >
> > >   if (!strcmp(key, PDUMP_RX_DEV_ARG)) {
> > > - strncpy(pt->rx_dev, value, strlen(value));
> > > + strncpy(pt->rx_dev, value, sizeof(pt->rx_dev)-1);
> >
> > I guess size-1 is to give room for terminating null byte, but for this
> > case is it guaranteed that the last byte of pt->rx_dev is NUL?
> >
> >
> 
> 
> -- 
> In the beginning was the lambda, and the lambda was with Emacs, and Emacs
> was the lambda.

[dpdk-dev] [PATCH v3 00/25] Refactor mlx5 to improve performance

2016-06-22 Thread Adrien Mazarguil

On Tue, Jun 21, 2016 at 05:42:29PM +0100, Ferruh Yigit wrote:
> On 6/21/2016 8:23 AM, Nelio Laranjeiro wrote:
> > Enhance mlx5 with a data path that bypasses Verbs.
> > 
> > The first half of this patchset removes support for functionality completely
> > rewritten in the second half (scatter/gather, inline send), while the data
> > path is refactored without Verbs.
> > 
> > The PMD remains usable during the transition.
> > 
> > This patchset must be applied after "Miscellaneous fixes for mlx4 and mlx5".
> > 
> > Changes in v3:
> > - Rebased patchset on top of next-net/rel_16_07.
> > 
> > Changes in v2:
> > - Rebased patchset on top of dpdk/master.
> > - Fixed CQE size on Power8.
> > - Fixed mbuf assertion failure in debug mode.
> > - Fixed missing class_id field in rte_pci_id by using RTE_PCI_DEVICE.
> > 
> > Adrien Mazarguil (8):
> >   mlx5: replace countdown with threshold for Tx completions
> >   mlx5: add debugging information about Tx queues capabilities
> >   mlx5: check remaining space while processing Tx burst
> >   mlx5: resurrect Tx gather support
> >   mlx5: work around spurious compilation errors
> >   mlx5: remove redundant Rx queue initialization code
> >   mlx5: make Rx queue reinitialization safer
> >   mlx5: resurrect Rx scatter support
> > 
> > Nelio Laranjeiro (16):
> >   drivers: fix PCI class id support
> >   mlx5: split memory registration function
> >   mlx5: remove Tx gather support
> >   mlx5: remove Rx scatter support
> >   mlx5: remove configuration variable
> >   mlx5: remove inline Tx support
> >   mlx5: split Tx queue structure
> >   mlx5: split Rx queue structure
> >   mlx5: update prerequisites for upcoming enhancements
> >   mlx5: add definitions for data path without Verbs
> >   mlx5: add support for configuration through kvargs
> >   mlx5: add Tx/Rx burst function selection wrapper
> >   mlx5: refactor Rx data path
> >   mlx5: refactor Tx data path
> >   mlx5: handle Rx CQE compression
> >   mlx5: add support for multi-packet send
> > 
> > Yaacov Hazan (1):
> >   mlx5: add support for inline send
> > 
> 
> Patchset applies and compiles fine, thanks.
> 
> But still has some checkpatch warnings, -btw, I am using the checkpatch
> script from latest master branch of Linux repo.
> 
> Following is the sample type of warnings (not instances, there are more
> than one instance per type):

While Nelio is preparing a v4 to address the kvargs issue, the remaining
warnings can be safely ignored.

A few of them are in relocated but unmodified code as this patchset
refactors the entire PMD, others are documented. We settled on positive
errno values internally because mlx5 uses syscalls and switching the sign
bit all over the place quickly became unmanageable. They are made negative
when returning from public callbacks (except for kvargs by mistake).

In short, we did run checkpatch, fixed a million warnings and other errors
and left those on purpose, nothing to worry about.

> WARNING:UNSPECIFIED_INT: Prefer 'unsigned int' to bare use of 'unsigned'
> #112: FILE: drivers/net/mlx5/mlx5_mr.c:65:
> +   unsigned mem_idx)
> 
> WARNING:BLOCK_COMMENT_STYLE: Block comments use a trailing */ on a
> separate line
> #288: FILE: drivers/net/mlx5/mlx5_mr.c:241:
> +* pointer is valid. */
> 
> WARNING:USE_NEGATIVE_ERRNO: return of an errno should typically be
> negative (ie: return -EINVAL)
> #524: FILE: drivers/net/mlx5/mlx5_txq.c:265:
> +   return EINVAL;
> 
> WARNING:LONG_LINE: line over 80 characters
> #108: FILE: drivers/net/mlx5/mlx5_ethdev.c:1250:
> +   txq_ctrl->txq.stats.idx =
> primary_txq->stats.idx;
> 
> WARNING:STATIC_CONST_CHAR_ARRAY: static const char * array should
> probably be static const char * const
> #88: FILE: drivers/net/mlx5/mlx5.c:281:
> +   static const char *params[] = {
> 
> ERROR:ASSIGN_IN_IF: do not use assignment in if condition
> #218: FILE: drivers/net/mlx5/mlx5_rxtx.c:92:
> +   if (!ret || !(ret = ((*buf)[i] == magic[i])))
> 
> CHECK:SPACING: spaces preferred around that '&' (ctx:VxV)
> #414: FILE: drivers/net/mlx5/mlx5_rxtx.c:625:
> +   (uintptr_t)&(*rxq->cqes)[rxq->cq_ci &
>^
> 
> WARNING:INDENTED_LABEL: labels should not be indented
> #520: FILE: drivers/net/mlx5/mlx5_rxtx.c:789:
> +   skip:

-- 
Adrien Mazarguil
6WIND

[dpdk-dev] [PATCH v3 00/25] Refactor mlx5 to improve performance

2016-06-22 Thread Bruce Richardson

On Wed, Jun 22, 2016 at 10:20:54AM +0200, Adrien Mazarguil wrote:
> On Tue, Jun 21, 2016 at 05:42:29PM +0100, Ferruh Yigit wrote:
> > On 6/21/2016 8:23 AM, Nelio Laranjeiro wrote:
> > > Enhance mlx5 with a data path that bypasses Verbs.
> > > 
> > > > The first half of this patchset removes support for functionality completely
> > > rewritten in the second half (scatter/gather, inline send), while the data
> > > path is refactored without Verbs.
> > > 
> > > The PMD remains usable during the transition.
> > > 
> > > > This patchset must be applied after "Miscellaneous fixes for mlx4 and mlx5".
> > > 
> > > Changes in v3:
> > > - Rebased patchset on top of next-net/rel_16_07.
> > > 
> > > Changes in v2:
> > > - Rebased patchset on top of dpdk/master.
> > > - Fixed CQE size on Power8.
> > > - Fixed mbuf assertion failure in debug mode.
> > > - Fixed missing class_id field in rte_pci_id by using RTE_PCI_DEVICE.
> > > 
> > > Adrien Mazarguil (8):
> > >   mlx5: replace countdown with threshold for Tx completions
> > >   mlx5: add debugging information about Tx queues capabilities
> > >   mlx5: check remaining space while processing Tx burst
> > >   mlx5: resurrect Tx gather support
> > >   mlx5: work around spurious compilation errors
> > >   mlx5: remove redundant Rx queue initialization code
> > >   mlx5: make Rx queue reinitialization safer
> > >   mlx5: resurrect Rx scatter support
> > > 
> > > Nelio Laranjeiro (16):
> > >   drivers: fix PCI class id support
> > >   mlx5: split memory registration function
> > >   mlx5: remove Tx gather support
> > >   mlx5: remove Rx scatter support
> > >   mlx5: remove configuration variable
> > >   mlx5: remove inline Tx support
> > >   mlx5: split Tx queue structure
> > >   mlx5: split Rx queue structure
> > >   mlx5: update prerequisites for upcoming enhancements
> > >   mlx5: add definitions for data path without Verbs
> > >   mlx5: add support for configuration through kvargs
> > >   mlx5: add Tx/Rx burst function selection wrapper
> > >   mlx5: refactor Rx data path
> > >   mlx5: refactor Tx data path
> > >   mlx5: handle Rx CQE compression
> > >   mlx5: add support for multi-packet send
> > > 
> > > Yaacov Hazan (1):
> > >   mlx5: add support for inline send
> > > 
> > 
> > Patchset applies and compiles fine, thanks.
> > 
> > But still has some checkpatch warnings, -btw, I am using the checkpatch
> > script from latest master branch of Linux repo.
> > 
> > Following is the sample type of warnings (not instances, there are more
> > than one instance per type):
> 
> While Nelio is preparing a v4 to address the kvargs issue, the remaining
> warnings can be safely ignored.
> 
> A few of them are in relocated but unmodified code as this patchset
> refactors the entire PMD, others are documented. We settled on positive
> errno values internally because mlx5 uses syscalls and switching the sign
> bit all over the place quickly became unmanageable. They are made negative
> when returning from public callbacks (except for kvargs by mistake).
> 
> In short, we did run checkpatch, fixed a million warnings and other errors
> and left those on purpose, nothing to worry about.
> 

Yes, they are nothing to worry about, but at the same time, I fail to see why
most of them should not be fixed. Even if you are moving code, unless it's a
whole file it's not going to show up as a move in the diff, so some small
changes during the move can be ok.

> > WARNING:UNSPECIFIED_INT: Prefer 'unsigned int' to bare use of 'unsigned'
> > #112: FILE: drivers/net/mlx5/mlx5_mr.c:65:
> > +   unsigned mem_idx)
> > 
This looks easily fixable.

> > WARNING:BLOCK_COMMENT_STYLE: Block comments use a trailing */ on a
> > separate line
> > #288: FILE: drivers/net/mlx5/mlx5_mr.c:241:
> > +* pointer is valid. */
> > 
I also think this should be fixed.

> > WARNING:USE_NEGATIVE_ERRNO: return of an errno should typically be
> > negative (ie: return -EINVAL)
> > #524: FILE: drivers/net/mlx5/mlx5_txq.c:265:
> > +   return EINVAL;
> > 
> > WARNING:LONG_LINE: line over 80 characters
> > #108: FILE: drivers/net/mlx5/mlx5_ethdev.c:1250:
> > +   txq_ctrl->txq.stats.idx =
> > primary_txq->stats.idx;
> > 
> > WARNING:STATIC_CONST_CHAR_ARRAY: static const char * array should
> > probably be static const char * const
> > #88: FILE: drivers/net/mlx5/mlx5.c:281:
> > +   static const char *params[] = {
> > 
> > ERROR:ASSIGN_IN_IF: do not use assignment in if condition
> > #218: FILE: drivers/net/mlx5/mlx5_rxtx.c:92:
> > +   if (!ret || !(ret = ((*buf)[i] == magic[i])))
> > 
> > CHECK:SPACING: spaces preferred around that '&' (ctx:VxV)
> > #414: FILE: drivers/net/mlx5/mlx5_rxtx.c:625:
> > +   (uintptr_t)&(*rxq->cqes)[rxq->cq_ci &
> >^
> > 
> > WARNING:INDENTED_LABEL: labels should not be indented
> > #520: FILE: drivers/net/mlx5/mlx5_rxtx.c:789:
> > +   skip:
This I also fe

[dpdk-dev] [PATCH v3 1/2] ethdev: add callback to get register size in bytes

2016-06-22 Thread Zyta Szpak

Note that if we remove rte_eth_dev_get_reg_length() then it will break
all of the drivers that implement it. Shall I remove it, or leave it and
modify only ethtool to use rte_eth_dev_get_regs() to get reg size? In
the end the drivers will have to implement the part of setting the size
in reg_info struct. rte_eth_dev_get_regs() itself wouldn't change at all.
Or do you have a different opinion?

On 21.06.2016 11:55, Zyta Szpak wrote:
> OK, I will do the v4.
>
> On 17.06.2016 12:20, Thomas Monjalon wrote:
>> 2016-06-13 16:51, Remy Horton:
>>> On 12/06/2016 15:51, Zyta Szpak wrote:
  I would prefer having only one function rte_eth_dev_get_regs()
  which returns length and width if data is NULL.
  The first call is a parameter request before buffer allocation,
  and the second call fills the buffer.

  We can deprecate the old API and introduce this new one.

  Opinions?

 In my opinion as it is now it works fine. Gathering all parameters in
 one callback might be a good idea if the maintainer also agrees to 
 that
 because as I mentioned, it interferes.
>>>   From my perspective changing rte_eth_dev_get_regs() isn't a 
>>> problem, as
>>> it isn't used directly rather than through rte_ethtool_get_regs()..
>> Zyta, would you like to make a v4?
>

[dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset

2016-06-22 Thread Thomas Monjalon

2016-06-22 13:29, Jerin Jacob:
> Thomas,
> As a librte_ether maintainer any comments on this?

+1 for adding details and making sure the naming is good.
I don't really need to comment here because I already made this
comment earlier:
http://dpdk.org/ml/archives/dev/2016-June/041845.html
Thank you for insisting.

[dpdk-dev] [PATCH v15 0/3] mempool: add mempool handler feature

2016-06-22 Thread Thomas Monjalon

2016-06-22 09:56, Thomas Monjalon:
> 2016-06-19 13:05, David Hunt:
> > v15 changes:
> > 
> >  * Changed rte_mempool_ops_get() to rte_mempool_get_ops()
> 
> I don't find this change in the patch.
> But I wonder whether it is really needed.

If we assume that rte_mempool_ops_* are wrappers on top of handlers,
rte_mempool_ops_get and rte_mempool_ops_register should be renamed to
rte_mempool_get_ops and rte_mempool_register_ops.

[dpdk-dev] [PATCH v15 0/3] mempool: add mempool handler feature

2016-06-22 Thread Thomas Monjalon

2016-06-19 13:05, David Hunt:
> v15 changes:
> 
>  * Changed rte_mempool_ops_get() to rte_mempool_get_ops()

I don't find this change in the patch.
But I wonder whether it is really needed.

[dpdk-dev] [PATCH v3 1/2] ethdev: add tunnel and port RSS offload types

2016-06-22 Thread Thomas Monjalon

2016-06-22 12:45, Jerin Jacob:
> On Wed, Jun 22, 2016 at 08:43:52AM +0200, Thomas Monjalon wrote:
> > 2016-06-22 09:00, Jerin Jacob:
> > > On Tue, Jun 21, 2016 at 11:02:59PM +0200, Thomas Monjalon wrote:
> > > > 2016-03-31 02:21, Jerin Jacob:
> > > > > +#define RTE_ETH_FLOW_PORT   18
> > > > > +#define RTE_ETH_FLOW_VXLAN  19
> > > > > +#define RTE_ETH_FLOW_GENEVE 20
> > > > > +#define RTE_ETH_FLOW_NVGRE  21
> > > > > +#define RTE_ETH_FLOW_MAX    22
> > > > 
> > > > Please could you explain more what is PORT flow?
> > > 
> > > For example, take a NIC with two physical ports where the application
> > > configures RTE_ETH_FLOW_IPV4 for both. In that case the HW generates
> > > the same RSS value for a similar IPv4 packet on either port. However, if
> > > the application wants the flow to account for the physical port as well,
> > > it can configure RTE_ETH_FLOW_IPV4 | RTE_ETH_FLOW_PORT.
> > > 
> > > RTE_ETH_FLOW_PORT is useful for cases where one physical port is assigned
> > > to INBOUND traffic and the other to OUTBOUND traffic, etc.
> > 
> > OK
> > 
> > > > Does it need a comment in the code?
> > > Not sure, commit log has description.
> > 
> > How do you expect the user to understand this new value in the API?
> > Users do not check in the git history.
> > They use doxygen, headers comments and/or examples.
> 
> I said that because none of the flow types in the list has a comment.
> If you think RTE_ETH_FLOW_PORT needs a doxygen comment then I can
> add it.
> 
> It would be nice if someone else could add the comments for the following:
> RTE_ETH_FLOW_RAW,
> RTE_ETH_FLOW_L2_PAYLOAD

These values were accepted without a proper check.
That's why we must not accept any line in the API without a good comment.

Please go ahead with what you can do and we'll fix or remove
the remaining later.
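
To illustrate what a commented flow type and its use as a combinable flag might look like, here is a small sketch. The `SKETCH_*` names are hypothetical. The value 18 comes from the quoted patch, the value 2 for IPv4 is an assumption, and mapping a flow type index to a single bit is shown as one plausible scheme rather than as the definitive ethdev definition.

```c
#include <assert.h>
#include <stdint.h>

/* Values for this sketch only: 18 is from the quoted patch, 2 is assumed. */
#define SKETCH_FLOW_IPV4 2  /**< Hash on the IPv4 header fields. */
#define SKETCH_FLOW_PORT 18 /**< Also account for the receiving port. */

/* Turn a flow type index into a one-bit flag so that types can be OR-ed. */
#define SKETCH_RSS_FLAG(flow_type) (UINT64_C(1) << (flow_type))

/* Build the mask for "IPv4, differentiated by port". */
static uint64_t
sketch_rss_ipv4_with_port(void)
{
	return SKETCH_RSS_FLAG(SKETCH_FLOW_IPV4) |
	       SKETCH_RSS_FLAG(SKETCH_FLOW_PORT);
}

/* Tell whether a mask includes the given flow type. */
static int
sketch_rss_has(uint64_t mask, int flow_type)
{
	assert(flow_type >= 0 && flow_type < 64);
	return (mask & SKETCH_RSS_FLAG(flow_type)) != 0;
}
```

A trailing `/**< ... */` comment on each define is what doxygen picks up, so a user reading the generated API pages sees the explanation without looking at the git history.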

[dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset

2016-06-22 Thread Jerin Jacob

On Wed, Jun 22, 2016 at 03:32:16AM +, Lu, Wenzhuo wrote:
> Hi Jerin,
> 
> > -Original Message-
> > From: Jerin Jacob [mailto:jerin.jacob at caviumnetworks.com]
> > Sent: Wednesday, June 22, 2016 10:38 AM
> > To: Lu, Wenzhuo
> > Cc: Ananyev, Konstantin; Stephen Hemminger; dev at dpdk.org; Richardson,
> > Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing; Zhang, Helin;
> > thomas.monjalon at 6wind.com
> > Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device 
> > reset
> > 
> > On Wed, Jun 22, 2016 at 01:35:37AM +, Lu, Wenzhuo wrote:
> > > Hi Jerin,
> > >
> > > > -Original Message-
> > > > From: Jerin Jacob [mailto:jerin.jacob at caviumnetworks.com]
> > > > Sent: Tuesday, June 21, 2016 10:29 PM
> > > > To: Ananyev, Konstantin
> > > > Cc: Lu, Wenzhuo; Stephen Hemminger; dev at dpdk.org; Richardson, Bruce;
> > > > Chen, Jing D; Liang, Cunming; Wu, Jingjing; Zhang, Helin;
> > > > thomas.monjalon at 6wind.com
> > > > Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support
> > > > device reset
> > > >
> > > > On Tue, Jun 21, 2016 at 02:03:15PM +, Ananyev, Konstantin wrote:
> > > > >
> > > > >
> > > > > > > > > > Hi Wenzhuo,
> > > > > > > > > >
> > > > > > > > > > > > > > > > On Mon, Jun 20, 2016 at 02:24:27PM +0800,
> > > > > > > > > > > > > > > > Wenzhuo Lu
> > > > wrote:
> > > > > > > > > > > > > > > > > Add an API to reset the device.
> > > > > > > > > > > > > > > > > It's for VF device in this scenario, kernel 
> > > > > > > > > > > > > > > > > PF + DPDK VF.
> > > > > > > > > > > > > > > > > When the PF port down->up, APP should call
> > > > > > > > > > > > > > > > > this API to reset VF port. Most likely,
> > > > > > > > > > > > > > > > > APP should call it in its management
> > > > > > > > > > > > > > > > > thread and guarantee the thread safe. It
> > > > > > > > > > > > > > > > > means APP should stop the rx/tx and the
> > > > > > > > > > > > > > > > > device, then reset the device, then
> > > > recover the device and rx/tx.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > The following is _a_ use case for device reset,
> > > > > > > > > > > > > > > > > but it may not be _the_ use case. IMO, we
> > > > > > > > > > > > > > > > > need to first state the expected behavior of
> > > > > > > > > > > > > > > > > this API and add a use case later.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Another use case would be a PCIe VF with
> > > > > > > > > > > > > > > > > function-level reset for SR-IOV migration.
> > > > > > > > > > > > > > > > > Are we on the same page?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > In my experience with Linux devices, this is
> > > > > > > > > > > > > > > > normally handled by the device driver in the
> > > > > > > > > > > > > > > > start routine. Since any use case which needs
> > > > > > > > > > > > > > > > this is going to do a stop/reset/start
> > > > > > > > > > > > > > > > sequence, why not just have the VF device
> > > > > > > > > > > > > > > > driver do this in the start routine?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Adding yet another API and state transition,
> > > > > > > > > > > > > > > > if not necessary, increases the complexity and
> > > > > > > > > > > > > > > > required test cases for all devices.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I agree with Stephen here. I think if the
> > > > > > > > > > > > > > > application needs to call start after the device
> > > > > > > > > > > > > > > reset, then we could add this logic in start
> > > > > > > > > > > > > > > itself rather than exposing yet another API.
> > > > > > > > > > > > > > Do you mean changing the device_start to include
> > > > > > > > > > > > > > all these actions: stop device -> stop queue ->
> > > > > > > > > > > > > > re-setup queue -> start queue -> start device?
> > > > > > > > > > > >
> > > > > > > > > > > > > What was the expected API call sequence when you
> > > > > > > > > > > > > introduced this API?
> > > > > > > > > > > > >
> > > > > > > > > > > > > The point was to have an implicit device reset in
> > > > > > > > > > > > > the API call sequence (wherever it makes sense for
> > > > > > > > > > > > > the specific PMD).
> > > > > > > > > > > > I think the API call sequence depends on the
> > > > > > > > > > > > implementation of the APP. Let's say there is no
> > > > > > > > > > > > reset API: the APP can use this API call sequence
> > > > > > > > > > > > to handle the PF link down/up event:
> > > > > > > > > > > > rte_eth_dev_close -> rte_eth_rx_queue_setup ->
> > > > > > > > > > > > rte_eth_tx_queue_setup -> rte_eth_dev_start.
> > > > > > > > > > > > Actually our purpose is to use this reset API instead
> > > > > > > > > > > > of the API call sequence. You can see the reset API is
> > > > > > > > > > > > not necessary. The benefit is to save code in the APP.
> > > > > > > > > >
> > > > > > > > > > Then I am bit confused with origina
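
For readers following the thread, here is a rough sketch of the idea Stephen and Jerin describe above: the start routine performs a pending reset implicitly, so no separate reset call is exposed. This is only a toy model under stated assumptions. The names port_stop(), port_hw_reset(), port_start() and the needs_reset flag are hypothetical and are not DPDK APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical port model; none of these names are DPDK APIs. */
enum port_state { PORT_STOPPED, PORT_STARTED };

struct port {
	enum port_state state;
	bool needs_reset;    /* assumed to be set when the PF goes down->up */
	unsigned int resets; /* number of resets performed so far */
};

/* Stop rx/tx and the device. */
static void
port_stop(struct port *p)
{
	assert(p != NULL);
	p->state = PORT_STOPPED;
}

/* Stand-in for whatever the driver does to reset the VF. */
static void
port_hw_reset(struct port *p)
{
	assert(p != NULL);
	p->needs_reset = false;
	p->resets++;
}

/*
 * Start the port. A pending reset is handled here, so the application
 * only ever runs a stop/start sequence. Returns -1 if the port is
 * already started, 0 on success.
 */
static int
port_start(struct port *p)
{
	assert(p != NULL);
	if (p->state == PORT_STARTED)
		return -1;
	if (p->needs_reset)
		port_hw_reset(p);
	p->state = PORT_STARTED;
	return 0;
}
```

With this shape, an application that sees the PF come back up would call port_stop() followed by port_start(), and the reset happens as a side effect of the start. Whether that fits every PMD is exactly the open question in the discussion.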

[dpdk-dev] [PATCH v3 11/25] mlx5: add support for configuration through kvargs

2016-06-22 Thread Nélio Laranjeiro

On Tue, Jun 21, 2016 at 05:42:42PM +0100, Ferruh Yigit wrote:
> On 6/21/2016 8:23 AM, Nelio Laranjeiro wrote:
> > The intent is to replace the remaining compile-time options and environment
> > variables with a common means of runtime configuration. This commit only
> > adds the kvargs handling code; subsequent commits will update the rest.
> > 
> > Signed-off-by: Nelio Laranjeiro 
> > Signed-off-by: Adrien Mazarguil 
> > ---
> 
> ...
> 
> > +static int
> > +mlx5_args_check(const char *key, const char *val, void *opaque)
> > +{
> > +   struct priv *priv = opaque;
> > +
> > +   /* No parameters are expected at the moment. */
> > +   (void)priv;
> > +   (void)val;
> > +   WARN("%s: unknown parameter", key);
> > +   return EINVAL;
> Returning a positive value here will prevent rte_kvargs_process() from
> failing. I guess that is the intention, but returning EINVAL is misleading.
> 
> It also generates a checkpatch warning:
> WARNING:USE_NEGATIVE_ERRNO: return of an errno should typically be
> negative (ie: return -EINVAL)
> #71: FILE: drivers/net/mlx5/mlx5.c:264:
> +   return EINVAL;
> 

Good catch. In fact, since the return value is not processed by the PMD
itself, it must comply with what rte_kvargs_process() expects.

I will fix it in the v4.

-- 
Nélio Laranjeiro
6WIND
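
To illustrate the point raised in the review above, here is a minimal sketch of why a positive errno goes unnoticed. It assumes, as Ferruh's comment implies, that the processing loop treats only a negative handler return as a failure. process_pairs(), struct kv_pair, check_positive() and check_negative() are hypothetical stand-ins written for this example; this is not the rte_kvargs_process() implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical handler type, shaped like the callback in the patch. */
typedef int (*arg_handler_t)(const char *key, const char *val, void *opaque);

/* Hypothetical key/value pair, for illustration only. */
struct kv_pair {
	const char *key;
	const char *val;
};

/*
 * Stand-in for the processing loop (NOT the DPDK code): a pair counts
 * as failed only when the handler returns a negative value.
 */
static int
process_pairs(const struct kv_pair *pairs, size_t n,
	      arg_handler_t handler, void *opaque)
{
	size_t i;

	assert(handler != NULL);
	for (i = 0; i < n; i++) {
		if (handler(pairs[i].key, pairs[i].val, opaque) < 0)
			return -1;
	}
	return 0;
}

/* Positive errno: the loop above does not treat this as a failure. */
static int
check_positive(const char *key, const char *val, void *opaque)
{
	(void)key;
	(void)val;
	(void)opaque;
	return EINVAL;
}

/* Negative errno: the loop above stops and reports the failure. */
static int
check_negative(const char *key, const char *val, void *opaque)
{
	(void)key;
	(void)val;
	(void)opaque;
	return -EINVAL;
}
```

Under that assumption, passing check_positive to process_pairs() yields 0 even for an unknown key, while check_negative yields -1. Returning -EINVAL also matches the convention that the checkpatch warning asks for.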

[dpdk-dev] [PATCH v5 10/17] ethdev: get rid of eth driver register callback

2016-06-22 Thread Neil Horman

On Wed, Jun 22, 2016 at 02:36:29PM +0530, Shreyansh Jain wrote:
> Now that all pdev are pci drivers, we don't need to register ethdev drivers
> through a dedicated channel.
> 
> Signed-off-by: David Marchand 
> Signed-off-by: Shreyansh Jain 
> ---
>  lib/librte_ether/rte_ethdev.c  | 22 --
>  lib/librte_ether/rte_ethdev.h  | 12 
>  lib/librte_ether/rte_ether_version.map |  1 -
>  3 files changed, 35 deletions(-)
> 
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> index 312c42c..06065fe 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -340,28 +340,6 @@ rte_eth_dev_pci_remove(struct rte_pci_device *pci_dev)
>   return 0;
>  }
>  
> -/**
> - * Register an Ethernet [Poll Mode] driver.
> - *
> - * Function invoked by the initialization function of an Ethernet driver
> - * to simultaneously register itself as a PCI driver and as an Ethernet
> - * Poll Mode Driver.
> - * Invokes the rte_eal_pci_register() function to register the *pci_drv*
> - * structure embedded in the *eth_drv* structure, after having stored the
> - * address of the rte_eth_dev_init() function in the *devinit* field of
> - * the *pci_drv* structure.
> - * During the PCI probing phase, the rte_eth_dev_init() function is
> - * invoked for each PCI [Ethernet device] matching the embedded PCI
> - * identifiers provided by the driver.
> - */
> -void
> -rte_eth_driver_register(struct eth_driver *eth_drv)
> -{
> - eth_drv->pci_drv.devinit = rte_eth_dev_pci_probe;
> - eth_drv->pci_drv.devuninit = rte_eth_dev_pci_remove;
> - rte_eal_pci_register(&eth_drv->pci_drv);
> -}
> -
>  int
>  rte_eth_dev_is_valid_port(uint8_t port_id)
>  {
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 2249466..ffd24e4 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -1862,18 +1862,6 @@ struct eth_driver {
>  };
>  
>  /**
> - * @internal
> - * A function invoked by the initialization function of an Ethernet driver
> - * to simultaneously register itself as a PCI driver and as an Ethernet
> - * Poll Mode Driver (PMD).
> - *
> - * @param eth_drv
> - *   The pointer to the *eth_driver* structure associated with
> - *   the Ethernet driver.
> - */
> -void rte_eth_driver_register(struct eth_driver *eth_drv);
> -
> -/**
>   * Convert a numerical speed in Mbps to a bitmap flag that can be used in
>   * the bitmap link_speeds of the struct rte_eth_conf
>   *
> diff --git a/lib/librte_ether/rte_ether_version.map 
> b/lib/librte_ether/rte_ether_version.map
> index cf4581c..8151007 100644
> --- a/lib/librte_ether/rte_ether_version.map
> +++ b/lib/librte_ether/rte_ether_version.map
> @@ -80,7 +80,6 @@ DPDK_2.2 {
>   rte_eth_dev_vlan_filter;
>   rte_eth_dev_wd_timeout_store;
>   rte_eth_dma_zone_reserve;
> - rte_eth_driver_register;
>   rte_eth_led_off;
>   rte_eth_led_on;
>   rte_eth_link;

Nak, same issue as the crypto registration.

> -- 
> 2.7.4
> 
>
