[dpdk-dev] [PATCH] vhost: remove unnecessary memset for virtio net hdr

2016-03-17 Thread Thomas Monjalon
2016-03-17 01:19, Xie, Huawei:
> On 3/16/2016 2:44 PM, Yuanhan Liu wrote:
> > We had to reset the virtio net hdr in virtio_enqueue_offload()
> > before, because all mbufs shared a single virtio_hdr structure:
> >
> > struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, }, 0};
> >
> > foreach (mbuf) {
> > virtio_enqueue_offload(mbuf, &virtio_hdr.hdr);
> >
> > copy net hdr and mbuf to desc buf
> > }
> >
> > However, after the vhost rxtx refactor, the code looks like:
> >
> > copy_mbuf_to_desc(mbuf)
> > {
> > struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, }, 0}
> >
> > virtio_enqueue_offload(mbuf, &virtio_hdr.hdr);
> >
> > copy net hdr and mbuf to desc buf
> > }
> >
> > foreach (mbuf) {
> > copy_mbuf_to_desc(mbuf);
> > }
> >
> > Therefore, the memset at virtio_enqueue_offload() is not necessary
> > any more; remove it.
> >
> > Signed-off-by: Yuanhan Liu 
> 
> Acked-by: Huawei Xie 

Applied, thanks
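
For readers following the archive, the reasoning in the commit message can be sketched in a few lines of C. This is only an illustration under stated assumptions: struct sketch_net_hdr and the sketch_* functions are hypothetical stand-ins, not the vhost code, and the field values are arbitrary.

```c
#include <stdint.h>

/* Hypothetical stand-in for the virtio net header; not the real layout. */
struct sketch_net_hdr {
    uint8_t  flags;
    uint8_t  gso_type;
    uint16_t csum_start;
};

/* Sets fields only when an offload is requested; never clears anything. */
static void sketch_enqueue_offload(int want_csum, struct sketch_net_hdr *hdr)
{
    if (want_csum) {
        hdr->flags = 1;
        hdr->csum_start = 34; /* arbitrary illustrative offset */
    }
}

/* Post-refactor shape: a fresh zero-initialized header for every packet. */
static struct sketch_net_hdr sketch_copy_one(int want_csum)
{
    struct sketch_net_hdr hdr = {0};

    sketch_enqueue_offload(want_csum, &hdr);
    return hdr;
}

/* Pre-refactor shape: one header shared across the loop, so fields set
 * for the first packet are still present when the second is processed. */
static struct sketch_net_hdr sketch_shared_second_packet(void)
{
    struct sketch_net_hdr shared = {0};

    sketch_enqueue_offload(1, &shared); /* first packet sets fields */
    sketch_enqueue_offload(0, &shared); /* second packet clears nothing */
    return shared;
}
```

With the per-call header, a packet without offload always sees zeroed fields, which is why the reset inside the offload helper becomes redundant; with the shared header, the stale flag would survive unless something clears it.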


[dpdk-dev] [PATCH] mk: fix linker script when re-building

2016-03-17 Thread Thomas Monjalon
2016-03-17 11:37, Panu Matilainen:
> On 03/17/2016 01:22 AM, Sergio Gonzalez Monroy wrote:
> > The linker script is generated by simply finding all libraries in
> > RTE_OUTPUT/lib.
> >
> > The issue shows up when re-building the DPDK, hence already having a
> > linker script in that directory, resulting in the linker script
> > including itself.
> >
> > That does not play well with the linker.
> >
> > Simply filtering the linker script from all the found libraries solves
> > the problem.
> >
> > Fixes: 948fd64befc3 ("mk: replace the combined library with a linker 
> > script")
> >
> > Signed-off-by: Sergio Gonzalez Monroy 
> 
> Oops, thanks for spotting.
> 
> Acked-by: Panu Matilainen 

Applied, thanks


[dpdk-dev] [PATCH v11 8/8] ethdev: add 100G link speed

2016-03-17 Thread Thomas Monjalon
The link speed configuration is now done with bitmaps so 100G speed
requires only a new bit flag.
The actual link speed is a number so its size must be increased from
16-bit to 32-bit.
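
A small C sketch of why the wider type is needed, assuming the numeric speed is expressed in Mbps (so 100G is 100000). The sketch_* names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

#define SKETCH_SPEED_NUM_100G 100000u /* 100 Gbps expressed in Mbps */

/* Storing the speed in 16 bits silently wraps modulo 65536. */
static uint16_t sketch_store_u16(uint32_t mbps)
{
    return (uint16_t)mbps;
}

/* A 32-bit field holds the value unchanged. */
static uint32_t sketch_store_u32(uint32_t mbps)
{
    return mbps;
}
```

Since 100000 exceeds the 16-bit maximum of 65535, the 16-bit variant would report a wrapped value instead of the real speed.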

Signed-off-by: Marc Sune 
Tested-by: Nelio Laranjeiro 
Signed-off-by: Thomas Monjalon 
---
 app/test-pmd/cmdline.c  | 12 +++-
 doc/guides/nics/szedata2.rst|  6 --
 doc/guides/rel_notes/release_16_04.rst  |  5 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  2 +-
 drivers/net/fm10k/fm10k_ethdev.c|  2 +-
 drivers/net/mlx5/mlx5_ethdev.c  |  2 +-
 drivers/net/nfp/nfp_net.c   |  2 +-
 drivers/net/szedata2/rte_eth_szedata2.c |  9 ++---
 lib/librte_ether/rte_ethdev.c   |  2 ++
 lib/librte_ether/rte_ethdev.h   |  4 +++-
 10 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 3bc7bb4..3337b7b 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -549,7 +549,7 @@ static void cmd_help_long_parsed(void *parsed_result,
"Detach physical or virtual dev by port_id\n\n"

"port config (port_id|all)"
-   " speed (10|100|1000|10000|40000|auto)"
+   " speed (10|100|1000|10000|40000|100000|auto)"
" duplex (half|full|auto)\n"
"Set speed and duplex for all ports or port_id\n\n"

@@ -1017,6 +1017,8 @@ parse_and_check_speed_duplex(char *speedstr, char 
*duplexstr, uint32_t *speed)
*speed = ETH_LINK_SPEED_10G;
} else if (!strcmp(speedstr, "40000")) {
*speed = ETH_LINK_SPEED_40G;
+   } else if (!strcmp(speedstr, "100000")) {
+   *speed = ETH_LINK_SPEED_100G;
} else if (!strcmp(speedstr, "auto")) {
*speed = ETH_LINK_SPEED_AUTONEG;
} else {
@@ -1064,7 +1066,7 @@ cmdline_parse_token_string_t cmd_config_speed_all_item1 =
TOKEN_STRING_INITIALIZER(struct cmd_config_speed_all, item1, "speed");
 cmdline_parse_token_string_t cmd_config_speed_all_value1 =
TOKEN_STRING_INITIALIZER(struct cmd_config_speed_all, value1,
-   "10#100#1000#10000#40000#auto");
+   "10#100#1000#10000#40000#100000#auto");
 cmdline_parse_token_string_t cmd_config_speed_all_item2 =
TOKEN_STRING_INITIALIZER(struct cmd_config_speed_all, item2, "duplex");
 cmdline_parse_token_string_t cmd_config_speed_all_value2 =
@@ -1074,7 +1076,7 @@ cmdline_parse_token_string_t cmd_config_speed_all_value2 =
 cmdline_parse_inst_t cmd_config_speed_all = {
.f = cmd_config_speed_all_parsed,
.data = NULL,
-   .help_str = "port config all speed 10|100|1000|10000|40000|auto duplex "
+   .help_str = "port config all speed 10|100|1000|10000|40000|100000|auto duplex "
"half|full|auto",
.tokens = {
(void *)&cmd_config_speed_all_port,
@@ -1138,7 +1140,7 @@ cmdline_parse_token_string_t 
cmd_config_speed_specific_item1 =
"speed");
 cmdline_parse_token_string_t cmd_config_speed_specific_value1 =
TOKEN_STRING_INITIALIZER(struct cmd_config_speed_specific, value1,
-   "10#100#1000#10000#40000#auto");
+   "10#100#1000#10000#40000#100000#auto");
 cmdline_parse_token_string_t cmd_config_speed_specific_item2 =
TOKEN_STRING_INITIALIZER(struct cmd_config_speed_specific, item2,
"duplex");
@@ -1149,7 +1151,7 @@ cmdline_parse_token_string_t 
cmd_config_speed_specific_value2 =
 cmdline_parse_inst_t cmd_config_speed_specific = {
.f = cmd_config_speed_specific_parsed,
.data = NULL,
-   .help_str = "port config X speed 10|100|1000|10000|40000|auto duplex "
+   .help_str = "port config X speed 10|100|1000|10000|40000|100000|auto duplex "
"half|full|auto",
.tokens = {
(void *)&cmd_config_speed_specific_port,
diff --git a/doc/guides/nics/szedata2.rst b/doc/guides/nics/szedata2.rst
index 77c15b3..741b400 100644
--- a/doc/guides/nics/szedata2.rst
+++ b/doc/guides/nics/szedata2.rst
@@ -148,9 +148,3 @@ Example output:
  TX threshold registers: pthresh=0 hthresh=0 wthresh=0
  TX RS bit threshold=0 - TXQ flags=0x0
testpmd>
-
-.. note::
-
-   Link speed API currently supports speeds up to 40 Gbps.
-   Therefore there is used 10G constant for 100 Gbps cards until the link speed
-   API is not changed.
diff --git a/doc/guides/rel_notes/release_16_04.rst 
b/doc/guides/rel_notes/release_16_04.rst
index 

[dpdk-dev] [PATCH v11 7/8] ethdev: convert speed number to bitmap flag

2016-03-17 Thread Thomas Monjalon
From: Marc Sune 

It is a helper for the bitmap configuration.

Signed-off-by: Marc Sune 
Signed-off-by: Thomas Monjalon 
---
 lib/librte_ether/rte_ethdev.c  | 31 +++
 lib/librte_ether/rte_ethdev.h  | 13 +
 lib/librte_ether/rte_ether_version.map |  1 +
 3 files changed, 45 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index db35102..4dbea4e 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -866,6 +866,37 @@ rte_eth_dev_tx_queue_config(struct rte_eth_dev *dev, 
uint16_t nb_queues)
return 0;
 }

+uint32_t
+rte_eth_speed_bitflag(uint32_t speed, int duplex)
+{
+   switch (speed) {
+   case ETH_SPEED_NUM_10M:
+   return duplex ? ETH_LINK_SPEED_10M : ETH_LINK_SPEED_10M_HD;
+   case ETH_SPEED_NUM_100M:
+   return duplex ? ETH_LINK_SPEED_100M : ETH_LINK_SPEED_100M_HD;
+   case ETH_SPEED_NUM_1G:
+   return ETH_LINK_SPEED_1G;
+   case ETH_SPEED_NUM_2_5G:
+   return ETH_LINK_SPEED_2_5G;
+   case ETH_SPEED_NUM_5G:
+   return ETH_LINK_SPEED_5G;
+   case ETH_SPEED_NUM_10G:
+   return ETH_LINK_SPEED_10G;
+   case ETH_SPEED_NUM_20G:
+   return ETH_LINK_SPEED_20G;
+   case ETH_SPEED_NUM_25G:
+   return ETH_LINK_SPEED_25G;
+   case ETH_SPEED_NUM_40G:
+   return ETH_LINK_SPEED_40G;
+   case ETH_SPEED_NUM_50G:
+   return ETH_LINK_SPEED_50G;
+   case ETH_SPEED_NUM_56G:
+   return ETH_LINK_SPEED_56G;
+   default:
+   return 0;
+   }
+}
+
 int
 rte_eth_dev_configure(uint8_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
  const struct rte_eth_conf *dev_conf)
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 470e434..4b37cd0 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1871,6 +1871,19 @@ struct eth_driver {
 void rte_eth_driver_register(struct eth_driver *eth_drv);

 /**
+ * Convert a numerical speed in Mbps to a bitmap flag that can be used in
+ * the bitmap link_speeds of the struct rte_eth_conf
+ *
+ * @param speed
+ *   Numerical speed value in Mbps
+ * @param duplex
+ *   ETH_LINK_[HALF/FULL]_DUPLEX (only for 10/100M speeds)
+ * @return
+ *   0 if the speed cannot be mapped
+ */
+uint32_t rte_eth_speed_bitflag(uint32_t speed, int duplex);
+
+/**
  * Configure an Ethernet device.
  * This function must be invoked first before any other function in the
  * Ethernet API. This function can also be re-invoked when a device is in the
diff --git a/lib/librte_ether/rte_ether_version.map 
b/lib/librte_ether/rte_ether_version.map
index 5cb4d79..13d9f2e 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -124,6 +124,7 @@ DPDK_16.04 {
rte_eth_dev_set_vlan_ether_type;
rte_eth_dev_udp_tunnel_port_add;
rte_eth_dev_udp_tunnel_port_delete;
+   rte_eth_speed_bitflag;
rte_eth_tx_buffer_count_callback;
rte_eth_tx_buffer_drop_callback;
rte_eth_tx_buffer_init;
-- 
2.7.0
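
As an illustration of how such a helper might be used, here is a minimal C sketch that converts a list of numeric speeds into one bitmap and rejects any speed that cannot be mapped (the helper returns 0 in that case). Everything prefixed with sketch_ or SKETCH_ is a hypothetical stand-in, and the bit positions are chosen for the example only; the real flag values are defined in rte_ethdev.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit positions only; not the values from rte_ethdev.h. */
#define SKETCH_FLAG_1G  (1u << 0)
#define SKETCH_FLAG_10G (1u << 1)
#define SKETCH_FLAG_40G (1u << 2)

/* Stand-in for the helper: numeric Mbps in, one bit flag out, 0 if unknown. */
static uint32_t sketch_speed_bitflag(uint32_t mbps)
{
    switch (mbps) {
    case 1000:
        return SKETCH_FLAG_1G;
    case 10000:
        return SKETCH_FLAG_10G;
    case 40000:
        return SKETCH_FLAG_40G;
    default:
        return 0;
    }
}

/* OR the flags of all requested speeds; fail if any speed is unmapped.
 * On failure *mask is left untouched. */
static int sketch_build_mask(const uint32_t *mbps, size_t n, uint32_t *mask)
{
    uint32_t acc = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        uint32_t flag = sketch_speed_bitflag(mbps[i]);

        if (flag == 0)
            return -1;
        acc |= flag;
    }
    *mask = acc;
    return 0;
}
```

Checking the return value for 0 matters because a silently dropped speed would otherwise produce a bitmap that advertises less than the caller asked for.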



[dpdk-dev] [PATCH v11 6/8] ethdev: redesign link speed config

2016-03-17 Thread Thomas Monjalon
From: Marc Sune 

This patch redesigns the API to set the link speed/s configuration
of an ethernet port. Specifically:

- it allows defining a set of advertised speeds for
  auto-negotiation.
- it allows disabling link auto-negotiation (single fixed speed).
- default: auto-negotiate all supported speeds.

A flag autoneg in struct rte_eth_link indicates if link speed was a
result of auto-negotiation or was fixed by configuration.
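
The three cases listed above can be sketched as a validity check on a speed bitmap. This is a rough illustration, not the ethdev implementation: the SKETCH_* flags and their bit positions are hypothetical, and treating an empty bitmap as "advertise everything supported" is an assumption made for the example.

```c
#include <stdint.h>

/* Hypothetical flag layout for illustration; not the rte_ethdev.h values. */
#define SKETCH_SPEED_FIXED (1u << 0) /* autonegotiation disabled */
#define SKETCH_SPEED_1G    (1u << 1)
#define SKETCH_SPEED_10G   (1u << 2)
#define SKETCH_SPEED_40G   (1u << 3)

/* Returns 1 if the bitmap is a coherent request, 0 otherwise. */
static int sketch_link_speeds_valid(uint32_t link_speeds)
{
    uint32_t speeds = link_speeds & ~SKETCH_SPEED_FIXED;

    if (link_speeds & SKETCH_SPEED_FIXED) {
        /* A fixed link needs exactly one speed bit set. */
        return speeds != 0 && (speeds & (speeds - 1)) == 0;
    }
    /* Autonegotiation: any subset is acceptable; an empty set is
     * assumed here to mean "all supported speeds". */
    return 1;
}
```

The design point is that one 32-bit field carries both the choice between autonegotiation and a fixed speed and the set of speeds, replacing the separate speed and duplex fields.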

Signed-off-by: Marc Sune 
Tested-by: Nelio Laranjeiro 
Signed-off-by: Thomas Monjalon 
---
 app/test-pmd/cmdline.c| 26 
 doc/guides/rel_notes/release_16_04.rst|  9 +++
 drivers/net/af_packet/rte_eth_af_packet.c |  1 +
 drivers/net/bnx2x/bnx2x_ethdev.c  |  4 +-
 drivers/net/bonding/rte_eth_bond_8023ad.c |  2 +-
 drivers/net/e1000/em_ethdev.c | 99 +++
 drivers/net/e1000/igb_ethdev.c| 94 +++--
 drivers/net/i40e/i40e_ethdev.c| 48 +++
 drivers/net/i40e/i40e_ethdev_vf.c |  7 ++-
 drivers/net/ixgbe/ixgbe_ethdev.c  | 46 ++
 drivers/net/mlx4/mlx4.c   |  2 +
 drivers/net/mpipe/mpipe_tilegx.c  |  2 +
 drivers/net/null/rte_eth_null.c   |  1 +
 drivers/net/pcap/rte_eth_pcap.c   |  1 +
 drivers/net/ring/rte_eth_ring.c   |  1 +
 drivers/net/szedata2/rte_eth_szedata2.c   |  2 +
 drivers/net/vmxnet3/vmxnet3_ethdev.c  |  1 +
 drivers/net/xenvirt/rte_eth_xenvirt.c |  1 +
 examples/ip_pipeline/config_parse.c   |  3 +-
 lib/librte_ether/rte_ethdev.h | 29 +
 20 files changed, 196 insertions(+), 183 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 874129a..3bc7bb4 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -984,7 +984,7 @@ struct cmd_config_speed_all {
 };

 static int
-parse_and_check_speed_duplex(char *speedstr, char *duplexstr, uint16_t *speed)
+parse_and_check_speed_duplex(char *speedstr, char *duplexstr, uint32_t *speed)
 {

int duplex;
@@ -1001,20 +1001,22 @@ parse_and_check_speed_duplex(char *speedstr, char 
*duplexstr, uint16_t *speed)
}

if (!strcmp(speedstr, "10")) {
-   *speed = ETH_SPEED_NUM_10M;
+   *speed = (duplex == ETH_LINK_HALF_DUPLEX) ?
+   ETH_LINK_SPEED_10M_HD : ETH_LINK_SPEED_10M;
} else if (!strcmp(speedstr, "100")) {
-   *speed = ETH_SPEED_NUM_100M;
+   *speed = (duplex == ETH_LINK_HALF_DUPLEX) ?
+   ETH_LINK_SPEED_100M_HD : ETH_LINK_SPEED_100M;
} else {
if (duplex != ETH_LINK_FULL_DUPLEX) {
printf("Invalid speed/duplex parameters\n");
return -1;
}
if (!strcmp(speedstr, "1000")) {
-   *speed = ETH_SPEED_NUM_1G;
+   *speed = ETH_LINK_SPEED_1G;
} else if (!strcmp(speedstr, "10000")) {
-   *speed = ETH_SPEED_NUM_10G;
+   *speed = ETH_LINK_SPEED_10G;
} else if (!strcmp(speedstr, "40000")) {
-   *speed = ETH_SPEED_NUM_40G;
+   *speed = ETH_LINK_SPEED_40G;
} else if (!strcmp(speedstr, "auto")) {
*speed = ETH_LINK_SPEED_AUTONEG;
} else {
@@ -1032,8 +1034,7 @@ cmd_config_speed_all_parsed(void *parsed_result,
__attribute__((unused)) void *data)
 {
struct cmd_config_speed_all *res = parsed_result;
-   uint16_t link_speed = ETH_LINK_SPEED_AUTONEG;
-   uint16_t link_duplex = 0;
+   uint32_t link_speed;
portid_t pid;

if (!all_ports_stopped()) {
@@ -1046,8 +1047,7 @@ cmd_config_speed_all_parsed(void *parsed_result,
return;

FOREACH_PORT(pid, ports) {
-   ports[pid].dev_conf.link_speed = link_speed;
-   ports[pid].dev_conf.link_duplex = link_duplex;
+   ports[pid].dev_conf.link_speeds = link_speed;
}

cmd_reconfig_device_queue(RTE_PORT_ALL, 1, 1);
@@ -1105,8 +1105,7 @@ cmd_config_speed_specific_parsed(void *parsed_result,
__attribute__((unused)) void *data)
 {
struct cmd_config_speed_specific *res = parsed_result;
-   uint16_t link_speed = ETH_LINK_SPEED_AUTONEG;
-   uint16_t link_duplex = 0;
+   uint32_t link_speed;

if (!all_ports_stopped()) {
printf("Please stop all ports first\n");
@@ -1120,8 +1119,7 @@ cmd_config_speed_specific_parsed(void *parsed_result,
&link_speed) < 0)
return;

-   ports[res->id].dev_conf.link_speed = link_speed;
-   ports[res->id].dev_conf.link_duplex = link_duplex;
+   ports[res->id].dev_conf.link_speeds = link_speed;


[dpdk-dev] [PATCH v11 5/8] ethdev: add speed capabilities

2016-03-17 Thread Thomas Monjalon
From: Marc Sune 

The speed capabilities of a device can be retrieved with
rte_eth_dev_info_get().

The new field speed_capa is initialized in the drivers without
taking care of device characteristics in this patch.
When the capabilities of a driver are accurate, the table in
overview.rst must be filled.
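
One way an application might use such a capability bitmap is to check that every requested speed is supported before configuring the port. The following C sketch assumes hypothetical SKETCH_CAPA_* flags with illustrative bit positions; it does not call the ethdev API.

```c
#include <stdint.h>

/* Illustrative capability bits only; not the values from rte_ethdev.h. */
#define SKETCH_CAPA_1G  (1u << 0)
#define SKETCH_CAPA_10G (1u << 1)
#define SKETCH_CAPA_40G (1u << 2)

/* Returns 1 when every requested speed bit is present in speed_capa. */
static int sketch_speeds_supported(uint32_t speed_capa, uint32_t requested)
{
    return (requested & ~speed_capa) == 0;
}
```

Because the capabilities in this patch are filled per driver rather than per device, a check like this can be optimistic until the drivers report accurate values.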

Signed-off-by: Marc Sune 
---
 doc/guides/nics/overview.rst   |  1 +
 doc/guides/rel_notes/release_16_04.rst |  8 
 drivers/net/bnx2x/bnx2x_ethdev.c   |  1 +
 drivers/net/cxgbe/cxgbe_ethdev.c   |  1 +
 drivers/net/e1000/em_ethdev.c  |  4 
 drivers/net/e1000/igb_ethdev.c |  4 
 drivers/net/fm10k/fm10k_ethdev.c   |  4 
 drivers/net/i40e/i40e_ethdev.c |  8 
 drivers/net/ixgbe/ixgbe_ethdev.c   |  8 
 drivers/net/mlx4/mlx4.c|  2 ++
 drivers/net/mlx5/mlx5_ethdev.c |  3 +++
 drivers/net/nfp/nfp_net.c  |  2 ++
 lib/librte_ether/rte_ethdev.h  | 21 +
 13 files changed, 67 insertions(+)

diff --git a/doc/guides/nics/overview.rst b/doc/guides/nics/overview.rst
index 2d4f014..893da5f 100644
--- a/doc/guides/nics/overview.rst
+++ b/doc/guides/nics/overview.rst
@@ -88,6 +88,7 @@ Most of these differences are summarized below.
 = = = = = = = = = = = = = = = = = = = = = = = = = = = 
= = = =
link status  X X X   X
link status event  X X
+   speed capabilities
Rx interrupt   X X X X
queue start/stop X X X X X   X
MTU update   X
diff --git a/doc/guides/rel_notes/release_16_04.rst 
b/doc/guides/rel_notes/release_16_04.rst
index 2785b29..6ecd304 100644
--- a/doc/guides/rel_notes/release_16_04.rst
+++ b/doc/guides/rel_notes/release_16_04.rst
@@ -47,6 +47,11 @@ This section should contain new features added in this 
release. Sample format:
   A new function ``rte_pktmbuf_alloc_bulk()`` has been added to allow the user
   to allocate a bulk of mbufs.

+* **Added device link speed capabilities.**
+
+  The structure ``rte_eth_dev_info`` has now a ``speed_capa`` bitmap, which
+  allows the application to know the supported speeds of each device.
+
 * **Restored vmxnet3 Tx data ring.**

   Tx data ring has been shown to improve small pkt forwarding performance
@@ -394,6 +399,9 @@ This section should contain API changes. Sample format:
 * Add a short 1-2 sentence description of the API change. Use fixed width
   quotes for ``rte_function_names`` or ``rte_struct_names``. Use the past 
tense.

+* The ethdev structure ``rte_eth_dev_info`` was changed to support device
+  speed capabilities.
+
 * The functions ``rte_eth_dev_udp_tunnel_add`` and 
``rte_eth_dev_udp_tunnel_delete``
   have been renamed into ``rte_eth_dev_udp_tunnel_port_add`` and
   ``rte_eth_dev_udp_tunnel_port_delete``.
diff --git a/drivers/net/bnx2x/bnx2x_ethdev.c b/drivers/net/bnx2x/bnx2x_ethdev.c
index dc9ce84..607d2f4 100644
--- a/drivers/net/bnx2x/bnx2x_ethdev.c
+++ b/drivers/net/bnx2x/bnx2x_ethdev.c
@@ -327,6 +327,7 @@ bnx2x_dev_infos_get(struct rte_eth_dev *dev, __rte_unused 
struct rte_eth_dev_inf
dev_info->min_rx_bufsize = BNX2X_MIN_RX_BUF_SIZE;
dev_info->max_rx_pktlen  = BNX2X_MAX_RX_PKT_LEN;
dev_info->max_mac_addrs  = BNX2X_MAX_MAC_ADDRS;
+   dev_info->speed_capa = ETH_LINK_SPEED_10G | ETH_LINK_SPEED_20G;
 }

 static void
diff --git a/drivers/net/cxgbe/cxgbe_ethdev.c b/drivers/net/cxgbe/cxgbe_ethdev.c
index 8c6dd59..bccdca0 100644
--- a/drivers/net/cxgbe/cxgbe_ethdev.c
+++ b/drivers/net/cxgbe/cxgbe_ethdev.c
@@ -171,6 +171,7 @@ static void cxgbe_dev_info_get(struct rte_eth_dev *eth_dev,

device_info->rx_desc_lim = cxgbe_desc_lim;
device_info->tx_desc_lim = cxgbe_desc_lim;
+   device_info->speed_capa = ETH_LINK_SPEED_10G | ETH_LINK_SPEED_40G;
 }

 static void cxgbe_dev_promiscuous_enable(struct rte_eth_dev *eth_dev)
diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index b9dbc0f..2a50857 100644
--- a/drivers/net/e1000/em_ethdev.c
+++ b/drivers/net/e1000/em_ethdev.c
@@ -1055,6 +1055,10 @@ eth_em_infos_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *dev_info)
.nb_min = E1000_MIN_RING_DESC,
.nb_align = EM_TXD_ALIGN,
};
+
+   dev_info->speed_capa = ETH_LINK_SPEED_10M_HD | ETH_LINK_SPEED_10M |
+   ETH_LINK_SPEED_100M_HD | ETH_LINK_SPEED_100M |
+   ETH_LINK_SPEED_1G;
 }

 /* return 0 means link status changed, -1 means not changed */
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index 11786ef..b7e706a 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -1919,6 +1919,10 @@ eth_igb_infos_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *dev_info)

dev_info->rx_desc_lim = 

[dpdk-dev] [PATCH v11 4/8] ethdev: rename link speed constants

2016-03-17 Thread Thomas Monjalon
From: Marc Sune 

The speed numbers ETH_LINK_SPEED_ are renamed ETH_SPEED_NUM_.
The prefix ETH_LINK_SPEED_ is kept for AUTONEG and will be used
for bit flags in next patch.

Signed-off-by: Marc Sune 
---
 app/test-pmd/cmdline.c| 10 +-
 app/test/virtual_pmd.c|  2 +-
 drivers/net/af_packet/rte_eth_af_packet.c |  2 +-
 drivers/net/bonding/rte_eth_bond_8023ad.c | 12 ++--
 drivers/net/cxgbe/base/t4_hw.c|  8 
 drivers/net/e1000/em_ethdev.c |  8 
 drivers/net/e1000/igb_ethdev.c|  8 
 drivers/net/i40e/i40e_ethdev.c| 30 +++---
 drivers/net/i40e/i40e_ethdev_vf.c |  2 +-
 drivers/net/ixgbe/ixgbe_ethdev.c  | 22 +++---
 drivers/net/mpipe/mpipe_tilegx.c  |  4 ++--
 drivers/net/nfp/nfp_net.c |  2 +-
 drivers/net/null/rte_eth_null.c   |  2 +-
 drivers/net/pcap/rte_eth_pcap.c   |  2 +-
 drivers/net/ring/rte_eth_ring.c   |  2 +-
 drivers/net/szedata2/rte_eth_szedata2.c   |  8 
 drivers/net/vmxnet3/vmxnet3_ethdev.c  |  2 +-
 drivers/net/xenvirt/rte_eth_xenvirt.c |  2 +-
 lib/librte_ether/rte_ethdev.h | 29 ++---
 19 files changed, 82 insertions(+), 75 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 37be5cd..874129a 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -1001,20 +1001,20 @@ parse_and_check_speed_duplex(char *speedstr, char 
*duplexstr, uint16_t *speed)
}

if (!strcmp(speedstr, "10")) {
-   *speed = ETH_LINK_SPEED_10;
+   *speed = ETH_SPEED_NUM_10M;
} else if (!strcmp(speedstr, "100")) {
-   *speed = ETH_LINK_SPEED_100;
+   *speed = ETH_SPEED_NUM_100M;
} else {
if (duplex != ETH_LINK_FULL_DUPLEX) {
printf("Invalid speed/duplex parameters\n");
return -1;
}
if (!strcmp(speedstr, "1000")) {
-   *speed = ETH_LINK_SPEED_1000;
+   *speed = ETH_SPEED_NUM_1G;
} else if (!strcmp(speedstr, "10000")) {
-   *speed = ETH_LINK_SPEED_10G;
+   *speed = ETH_SPEED_NUM_10G;
} else if (!strcmp(speedstr, "40000")) {
-   *speed = ETH_LINK_SPEED_40G;
+   *speed = ETH_SPEED_NUM_40G;
} else if (!strcmp(speedstr, "auto")) {
*speed = ETH_LINK_SPEED_AUTONEG;
} else {
diff --git a/app/test/virtual_pmd.c b/app/test/virtual_pmd.c
index b1d40d7..b4bd2f2 100644
--- a/app/test/virtual_pmd.c
+++ b/app/test/virtual_pmd.c
@@ -604,7 +604,7 @@ virtual_ethdev_create(const char *name, struct ether_addr 
*mac_addr,
TAILQ_INIT(&(eth_dev->link_intr_cbs));

eth_dev->data->dev_link.link_status = ETH_LINK_DOWN;
-   eth_dev->data->dev_link.link_speed = ETH_LINK_SPEED_10000;
+   eth_dev->data->dev_link.link_speed = ETH_SPEED_NUM_10G;
eth_dev->data->dev_link.link_duplex = ETH_LINK_FULL_DUPLEX;

eth_dev->data->mac_addrs = rte_zmalloc(name, ETHER_ADDR_LEN, 0);
diff --git a/drivers/net/af_packet/rte_eth_af_packet.c 
b/drivers/net/af_packet/rte_eth_af_packet.c
index dee7b59..641f849 100644
--- a/drivers/net/af_packet/rte_eth_af_packet.c
+++ b/drivers/net/af_packet/rte_eth_af_packet.c
@@ -116,7 +116,7 @@ static const char *valid_arguments[] = {
 static const char *drivername = "AF_PACKET PMD";

 static struct rte_eth_link pmd_link = {
-   .link_speed = 10000,
+   .link_speed = ETH_SPEED_NUM_10G,
.link_duplex = ETH_LINK_FULL_DUPLEX,
.link_status = ETH_LINK_DOWN,
 };
diff --git a/drivers/net/bonding/rte_eth_bond_8023ad.c 
b/drivers/net/bonding/rte_eth_bond_8023ad.c
index 1b7e93a..ac8306f 100644
--- a/drivers/net/bonding/rte_eth_bond_8023ad.c
+++ b/drivers/net/bonding/rte_eth_bond_8023ad.c
@@ -711,22 +711,22 @@ link_speed_key(uint16_t speed) {
case ETH_LINK_SPEED_AUTONEG:
key_speed = 0x00;
break;
-   case ETH_LINK_SPEED_10:
+   case ETH_SPEED_NUM_10M:
key_speed = BOND_LINK_SPEED_KEY_10M;
break;
-   case ETH_LINK_SPEED_100:
+   case ETH_SPEED_NUM_100M:
key_speed = BOND_LINK_SPEED_KEY_100M;
break;
-   case ETH_LINK_SPEED_1000:
+   case ETH_SPEED_NUM_1G:
key_speed = BOND_LINK_SPEED_KEY_1000M;
break;
-   case ETH_LINK_SPEED_10G:
+   case ETH_SPEED_NUM_10G:
key_speed = BOND_LINK_SPEED_KEY_10G;
break;
-   case ETH_LINK_SPEED_20G:
+   case ETH_SPEED_NUM_20G:
key_speed = BOND_LINK_SPEED_KEY_20G;
break;
-   case ETH_LINK_SPEED_40G:
+   case 

[dpdk-dev] [PATCH v11 3/8] app/testpmd: move speed and duplex parsing in a function

2016-03-17 Thread Thomas Monjalon
From: Marc Sune 

The code for checking and parsing speed/duplex was duplicated.
The new function is also checking the speed/duplex combination.

Signed-off-by: Marc Sune 
---
 app/test-pmd/cmdline.c | 99 --
 1 file changed, 47 insertions(+), 52 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 9d52b8c..37be5cd 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -983,6 +983,49 @@ struct cmd_config_speed_all {
cmdline_fixed_string_t value2;
 };

+static int
+parse_and_check_speed_duplex(char *speedstr, char *duplexstr, uint16_t *speed)
+{
+
+   int duplex;
+
+   if (!strcmp(duplexstr, "half")) {
+   duplex = ETH_LINK_HALF_DUPLEX;
+   } else if (!strcmp(duplexstr, "full")) {
+   duplex = ETH_LINK_FULL_DUPLEX;
+   } else if (!strcmp(duplexstr, "auto")) {
+   duplex = ETH_LINK_FULL_DUPLEX;
+   } else {
+   printf("Unknown duplex parameter\n");
+   return -1;
+   }
+
+   if (!strcmp(speedstr, "10")) {
+   *speed = ETH_LINK_SPEED_10;
+   } else if (!strcmp(speedstr, "100")) {
+   *speed = ETH_LINK_SPEED_100;
+   } else {
+   if (duplex != ETH_LINK_FULL_DUPLEX) {
+   printf("Invalid speed/duplex parameters\n");
+   return -1;
+   }
+   if (!strcmp(speedstr, "1000")) {
+   *speed = ETH_LINK_SPEED_1000;
+   } else if (!strcmp(speedstr, "10000")) {
+   *speed = ETH_LINK_SPEED_10G;
+   } else if (!strcmp(speedstr, "40000")) {
+   *speed = ETH_LINK_SPEED_40G;
+   } else if (!strcmp(speedstr, "auto")) {
+   *speed = ETH_LINK_SPEED_AUTONEG;
+   } else {
+   printf("Unknown speed parameter\n");
+   return -1;
+   }
+   }
+
+   return 0;
+}
+
 static void
 cmd_config_speed_all_parsed(void *parsed_result,
__attribute__((unused)) struct cmdline *cl,
@@ -998,33 +1041,9 @@ cmd_config_speed_all_parsed(void *parsed_result,
return;
}

-   if (!strcmp(res->value1, "10"))
-   link_speed = ETH_LINK_SPEED_10;
-   else if (!strcmp(res->value1, "100"))
-   link_speed = ETH_LINK_SPEED_100;
-   else if (!strcmp(res->value1, "1000"))
-   link_speed = ETH_LINK_SPEED_1000;
-   else if (!strcmp(res->value1, "10000"))
-   link_speed = ETH_LINK_SPEED_10G;
-   else if (!strcmp(res->value1, "40000"))
-   link_speed = ETH_LINK_SPEED_40G;
-   else if (!strcmp(res->value1, "auto"))
-   link_speed = ETH_LINK_SPEED_AUTONEG;
-   else {
-   printf("Unknown parameter\n");
+   if (parse_and_check_speed_duplex(res->value1, res->value2,
+   &link_speed) < 0)
return;
-   }
-
-   if (!strcmp(res->value2, "half"))
-   link_duplex = ETH_LINK_HALF_DUPLEX;
-   else if (!strcmp(res->value2, "full"))
-   link_duplex = ETH_LINK_FULL_DUPLEX;
-   else if (!strcmp(res->value2, "auto"))
-   link_duplex = ETH_LINK_AUTONEG_DUPLEX;
-   else {
-   printf("Unknown parameter\n");
-   return;
-   }

FOREACH_PORT(pid, ports) {
ports[pid].dev_conf.link_speed = link_speed;
@@ -1097,33 +1116,9 @@ cmd_config_speed_specific_parsed(void *parsed_result,
if (port_id_is_invalid(res->id, ENABLED_WARN))
return;

-   if (!strcmp(res->value1, "10"))
-   link_speed = ETH_LINK_SPEED_10;
-   else if (!strcmp(res->value1, "100"))
-   link_speed = ETH_LINK_SPEED_100;
-   else if (!strcmp(res->value1, "1000"))
-   link_speed = ETH_LINK_SPEED_1000;
-   else if (!strcmp(res->value1, "10000"))
-   link_speed = ETH_LINK_SPEED_10000;
-   else if (!strcmp(res->value1, "40000"))
-   link_speed = ETH_LINK_SPEED_40G;
-   else if (!strcmp(res->value1, "auto"))
-   link_speed = ETH_LINK_SPEED_AUTONEG;
-   else {
-   printf("Unknown parameter\n");
-   return;
-   }
-
-   if (!strcmp(res->value2, "half"))
-   link_duplex = ETH_LINK_HALF_DUPLEX;
-   else if (!strcmp(res->value2, "full"))
-   link_duplex = ETH_LINK_FULL_DUPLEX;
-   else if (!strcmp(res->value2, "auto"))
-   link_duplex = ETH_LINK_AUTONEG_DUPLEX;
-   else {
-   printf("Unknown parameter\n");
+   if (parse_and_check_speed_duplex(res->value1, res->value2,
+   &link_speed) < 0)
return;
-   }

ports[res->id].dev_conf.link_speed = link_speed;
ports[res->id].dev_conf.link_duplex = 

[dpdk-dev] [PATCH v11 2/8] ethdev: use constants for link duplex

2016-03-17 Thread Thomas Monjalon
From: Marc Sune 

Some duplex values are changed from 0 to half-duplex when the link is down.

Some drivers are still using their own constants for duplex modes.

Signed-off-by: Marc Sune 
---
 drivers/net/e1000/em_ethdev.c  | 2 +-
 drivers/net/e1000/igb_ethdev.c | 2 +-
 drivers/net/ixgbe/ixgbe_ethdev.c   | 2 +-
 drivers/net/virtio/virtio_ethdev.c | 2 +-
 drivers/net/virtio/virtio_ethdev.h | 2 --
 lib/librte_ether/rte_ethdev.h  | 2 +-
 6 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index 58093c6..943a270 100644
--- a/drivers/net/e1000/em_ethdev.c
+++ b/drivers/net/e1000/em_ethdev.c
@@ -1108,7 +1108,7 @@ eth_em_link_update(struct rte_eth_dev *dev, int 
wait_to_complete)
link.link_status = ETH_LINK_UP;
} else if (!link_check && (link.link_status == ETH_LINK_UP)) {
link.link_speed = 0;
-   link.link_duplex = 0;
+   link.link_duplex = ETH_LINK_HALF_DUPLEX;
link.link_status = ETH_LINK_DOWN;
}
rte_em_dev_atomic_write_link_status(dev, &link);
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index 311f866..ea156ce 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -2033,7 +2033,7 @@ eth_igb_link_update(struct rte_eth_dev *dev, int 
wait_to_complete)
link.link_status = ETH_LINK_UP;
} else if (!link_check) {
link.link_speed = 0;
-   link.link_duplex = 0;
+   link.link_duplex = ETH_LINK_HALF_DUPLEX;
link.link_status = ETH_LINK_DOWN;
}
rte_igb_dev_atomic_write_link_status(dev, &link);
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index be28f7e..35dac49 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -2997,7 +2997,7 @@ ixgbe_dev_link_update(struct rte_eth_dev *dev, int 
wait_to_complete)

link.link_status = ETH_LINK_DOWN;
link.link_speed = 0;
-   link.link_duplex = 0;
+   link.link_duplex = ETH_LINK_HALF_DUPLEX;
memset(&old, 0, sizeof(old));
rte_ixgbe_dev_atomic_read_link_status(dev, &old);

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index 3ebc221..63a368a 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1401,7 +1401,7 @@ virtio_dev_link_update(struct rte_eth_dev *dev, 
__rte_unused int wait_to_complet
memset(&link, 0, sizeof(link));
virtio_dev_atomic_read_link_status(dev, &link);
old = link;
-   link.link_duplex = FULL_DUPLEX;
+   link.link_duplex = ETH_LINK_FULL_DUPLEX;
link.link_speed  = SPEED_10G;

if (vtpci_with_feature(hw, VIRTIO_NET_F_STATUS)) {
diff --git a/drivers/net/virtio/virtio_ethdev.h 
b/drivers/net/virtio/virtio_ethdev.h
index fed9571..66423a0 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -42,8 +42,6 @@
 #define SPEED_100  100
 #define SPEED_1000 1000
 #define SPEED_10G  10000
-#define HALF_DUPLEX1
-#define FULL_DUPLEX2

 #ifndef PAGE_SIZE
 #define PAGE_SIZE 4096
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index ec8d6b1..5379bee 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -246,7 +246,7 @@ struct rte_eth_stats {
  */
 struct rte_eth_link {
uint16_t link_speed;  /**< ETH_LINK_SPEED_[10, 100, 1000, 10000] */
-   uint16_t link_duplex; /**< ETH_LINK_[HALF_DUPLEX, FULL_DUPLEX] */
+   uint16_t link_duplex; /**< ETH_LINK_[HALF/FULL]_DUPLEX */
uint8_t  link_status : 1; /**< ETH_LINK_[DOWN/UP] */
 }__attribute__((aligned(8))); /**< aligned for atomic64 read/write */

-- 
2.7.0



[dpdk-dev] [PATCH v11 1/8] ethdev: use constants for link state

2016-03-17 Thread Thomas Monjalon
Define and use ETH_LINK_UP and ETH_LINK_DOWN where appropriate.

Signed-off-by: Marc Sune 
Signed-off-by: Thomas Monjalon 
---
 app/test-pipeline/init.c |  2 +-
 app/test-pmd/testpmd.c   |  2 +-
 app/test/test_pmd_perf.c |  2 +-
 app/test/virtual_pmd.c   |  6 +++---
 drivers/net/af_packet/rte_eth_af_packet.c|  6 +++---
 drivers/net/bnx2x/bnx2x_ethdev.c |  2 +-
 drivers/net/bnx2x/elink.c|  2 +-
 drivers/net/bonding/rte_eth_bond_api.c   |  4 ++--
 drivers/net/bonding/rte_eth_bond_pmd.c   | 12 ++--
 drivers/net/e1000/em_ethdev.c|  8 
 drivers/net/e1000/igb_ethdev.c   |  4 ++--
 drivers/net/fm10k/fm10k_ethdev.c |  2 +-
 drivers/net/i40e/i40e_ethdev_vf.c|  2 +-
 drivers/net/ixgbe/ixgbe_ethdev.c |  4 ++--
 drivers/net/mpipe/mpipe_tilegx.c | 12 ++--
 drivers/net/nfp/nfp_net.c|  2 +-
 drivers/net/null/rte_eth_null.c  |  6 +++---
 drivers/net/pcap/rte_eth_pcap.c  |  6 +++---
 drivers/net/ring/rte_eth_ring.c  | 10 +-
 drivers/net/szedata2/rte_eth_szedata2.c  |  2 +-
 drivers/net/virtio/virtio_ethdev.c   |  6 +++---
 drivers/net/vmxnet3/vmxnet3_ethdev.c |  2 +-
 drivers/net/xenvirt/rte_eth_xenvirt.c|  6 +++---
 examples/exception_path/main.c   |  2 +-
 examples/ip_fragmentation/main.c |  2 +-
 examples/ip_pipeline/init.c  |  2 +-
 examples/ip_reassembly/main.c|  2 +-
 examples/ipsec-secgw/ipsec-secgw.c   |  2 +-
 examples/ipv4_multicast/main.c   |  2 +-
 examples/kni/main.c  |  2 +-
 examples/l2fwd-crypto/main.c |  2 +-
 examples/l2fwd-ivshmem/host/host.c   |  2 +-
 examples/l2fwd-jobstats/main.c   |  2 +-
 examples/l2fwd-keepalive/main.c  |  2 +-
 examples/l2fwd/main.c|  2 +-
 examples/l3fwd-acl/main.c|  2 +-
 examples/l3fwd-power/main.c  |  2 +-
 examples/l3fwd/main.c|  2 +-
 examples/link_status_interrupt/main.c|  2 +-
 examples/load_balancer/init.c|  2 +-
 examples/multi_process/client_server_mp/mp_server/init.c |  2 +-
 examples/multi_process/l2fwd_fork/main.c |  2 +-
 examples/multi_process/symmetric_mp/main.c   |  2 +-
 examples/performance-thread/l3fwd-thread/main.c  |  2 +-
 lib/librte_ether/rte_ethdev.h|  5 -
 45 files changed, 80 insertions(+), 77 deletions(-)

diff --git a/app/test-pipeline/init.c b/app/test-pipeline/init.c
index db2196b..aef082f 100644
--- a/app/test-pipeline/init.c
+++ b/app/test-pipeline/init.c
@@ -205,7 +205,7 @@ app_ports_check_link(void)
link.link_speed / 1000,
link.link_status ? "UP" : "DOWN");

-   if (link.link_status == 0)
+   if (link.link_status == ETH_LINK_DOWN)
all_ports_up = 0;
}

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 38b9051..f713f39 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -1641,7 +1641,7 @@ check_all_ports_link_status(uint32_t port_mask)
continue;
}
/* clear all_ports_up flag if any link down */
-   if (link.link_status == 0) {
+   if (link.link_status == ETH_LINK_DOWN) {
all_ports_up = 0;
break;
}
diff --git a/app/test/test_pmd_perf.c b/app/test/test_pmd_perf.c
index 48e16c9..59803f7 100644
--- a/app/test/test_pmd_perf.c
+++ b/app/test/test_pmd_perf.c
@@ -192,7 +192,7 @@ check_all_ports_link_status(uint8_t port_num, uint32_t 
port_mask)
continue;
}
/* clear all_ports_up flag if any link down */
-   if (link.link_status == 0) {
+   if (link.link_status == ETH_LINK_DOWN) {
all_ports_up = 0;
break;
}
diff --git a/app/test/virtual_pmd.c b/app/test/virtual_pmd.c
index a538c8a..b1d40d7 100644
--- 

[dpdk-dev] [PATCH v11 0/8] ethdev: 100G and link speed API refactoring

2016-03-17 Thread Thomas Monjalon
There are still too few tests and reviews, especially for
autonegotiation with Intel devices (patch #6).
I would not be surprised to see some bugs in this rework.

The capabilities must be adapted per device. It can be
improved in a separate patch.

It will be integrated in 16.04-rc2.
Please test and review shortly, thanks!



This series of patches adds the following capabilities:

* speed_capa bitmap in rte_eth_dev_info, which is filled by the PMDs
  according to the physical device capabilities.
* a refactored link API in ethdev, which allows defining the advertised
  link speeds, fixing the speed (no auto-negotiation), or advertising all
  supported speeds (default).
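
To illustrate the bitmap idea described above, here is a minimal sketch, assuming one bit per speed. The flag names and bit positions are hypothetical and are not the definitions from rte_ethdev.h; the point is only that a capability set or an advertised set becomes an OR of flags, so a new speed such as 100G costs one more bit.

```c
#include <stdint.h>

/* Hypothetical flag names and bit positions, for illustration only;
 * the real constants live in rte_ethdev.h and may differ. */
#define SPEED_FLAG_1G    (UINT32_C(1) << 0)
#define SPEED_FLAG_10G   (UINT32_C(1) << 1)
#define SPEED_FLAG_40G   (UINT32_C(1) << 2)
#define SPEED_FLAG_100G  (UINT32_C(1) << 3)

/* Map a numeric speed in Mbps to its bit flag; 0 means "unknown speed". */
uint32_t
speed_to_flag(uint32_t mbps)
{
	switch (mbps) {
	case 1000:   return SPEED_FLAG_1G;
	case 10000:  return SPEED_FLAG_10G;
	case 40000:  return SPEED_FLAG_40G;
	case 100000: return SPEED_FLAG_100G;
	default:     return 0;
	}
}

/* A requested set is usable only if every requested bit is in capa. */
int
speeds_supported(uint32_t capa, uint32_t requested)
{
	return (requested & ~capa) == 0;
}
```

A PMD would fill the capability bitmap once, and an application could check its requested set against it before configuring the port.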



v11:
- rebase on 16.04-rc1
- replace one more link status value in e1000 driver
- merge szedata2 patches
- remove szedata2 temporary comments in code and doc

v10:
- rebase
- rework release notes
- rearrange patch splitting
- fix doxygen comments
- fix typos
- removed log format of link.link_speed as %d (keep %u)
- complete ETH_LINK_[DOWN/UP] replacement from 0/1
- change ETH_LINK_SPEED_AUTONEG to 1
- replace ETH_LINK_SPEED_NEG by ETH_LINK_SPEED_AUTONEG (1)
- replace ETH_LINK_SPEED_NO_AUTONEG by ETH_LINK_SPEED_FIXED (0)
- rework rte_eth_speed_to_bm_flag to rte_eth_speed_bitflag
- complete 100G support in testpmd

v9: rebased to current HEAD. Reverted numeric speed to 32 bit in struct
rte_eth_link (no atomic link get > 64bit). Fixed mlx5 driver compilation
and link speeds. Moved documentation to release_16_04.rst and fixed several
issues. Upgrade NIC notes with speed capabilities.

v8: Rebased to current HEAD. Modified em driver impl. to not touch base files.
Merged patch 5 into 3 (map file). Changed numeric speed to a 64 bit value.
Filled-in speed capabilities for drivers bnx2x, cxgbe, mlx5 and nfp in
addition to the ones of previous patch sets.

v7: Rebased to current HEAD. Moved documentation to v2.3. Still needs testing
from PMD maintainers.

v6: Move link_duplex to be part of bitfield. Fixed i40 autoneg flag link
update code. Added rte_eth_speed_to_bm_flag() to .map file. Fixed other
spelling issues. Rebased to current HEAD.

v5: revert to v2 speed capabilities patch. Fixed MLX4 speed capabilities
(thanks N. Laranjeiro). Refactored link speed API to allow setting
advertised speeds (3/4). Added NO_AUTONEG option to explicitly disable
auto-negotiation. Updated 2.2 rel. notes (4/4). Rebased to current HEAD.

v4: fixed errata in the documentation of field speeds of rte_eth_conf, and
commit 1/2 message. rebased to v2.1.0. v3 was incorrectly based on
~2.1.0-rc1.

v3: rebase to v2.1. unified ETH_LINK_SPEED and ETH_SPEED_CAP into ETH_SPEED.
Converted field speed in struct rte_eth_conf to speed, to allow a bitmap
for defining the announced speeds, as suggested by M. Brorup. Fixed spelling
issues.

v2: rebase, converted speed_capa into 32 bits bitmap, fixed alignment
(checkpatch).



Marc Sune (6):
  ethdev: use constants for link duplex
  app/testpmd: move speed and duplex parsing in a function
  ethdev: rename link speed constants
  ethdev: add speed capabilities
  ethdev: redesign link speed config
  ethdev: convert speed number to bitmap flag

Thomas Monjalon (2):
  ethdev: use constants for link state
  ethdev: add 100G link speed

 app/test-pipeline/init.c   |   2 +-
 app/test-pmd/cmdline.c | 125 ++---
 app/test-pmd/testpmd.c |   2 +-
 app/test/test_pmd_perf.c   |   2 +-
 app/test/virtual_pmd.c |   8 +-
 doc/guides/nics/overview.rst   |   1 +
 doc/guides/nics/szedata2.rst   |   6 -
 doc/guides/rel_notes/release_16_04.rst |  22 
 doc/guides/testpmd_app_ug/testpmd_funcs.rst|   2 +-
 drivers/net/af_packet/rte_eth_af_packet.c  |   9 +-
 drivers/net/bnx2x/bnx2x_ethdev.c   |   7 +-
 drivers/net/bnx2x/elink.c  |   2 +-
 drivers/net/bonding/rte_eth_bond_8023ad.c  |  14 +--
 drivers/net/bonding/rte_eth_bond_api.c |   4 +-
 drivers/net/bonding/rte_eth_bond_pmd.c |  12 +-
 drivers/net/cxgbe/base/t4_hw.c |   8 +-
 drivers/net/cxgbe/cxgbe_ethdev.c   |   1 +
 drivers/net/e1000/em_ethdev.c  | 113 +--
 drivers/net/e1000/igb_ethdev.c | 104 +
 drivers/net/fm10k/fm10k_ethdev.c   |   6 +-
 drivers/net/i40e/i40e_ethdev.c |  76 +++--
 drivers/net/i40e/i40e_ethdev_vf.c  |  11 +-
 drivers/net/ixgbe/ixgbe_ethdev.c   |  76 ++---
 drivers/net/mlx4/mlx4.c|   4 +
 drivers/net/mlx5/mlx5_ethdev.c |  

[dpdk-dev] [PATCH v2] ring: check for zero objects mc dequeue / mp enqueue

2016-03-17 Thread Lazaros Koromilas
Issuing a zero-object dequeue with a single consumer has no effect.
Doing so with multiple consumers can let more than one thread succeed in
the compare-and-set operation and then observe starvation or even deadlock
in the while loop that checks for preceding dequeues.  The problematic piece
of code when n = 0:

cons_next = cons_head + n;
success = rte_atomic32_cmpset(&r->cons.head, cons_head, cons_next);

The same is possible on the enqueue path.

Signed-off-by: Lazaros Koromilas 
---
 lib/librte_ring/rte_ring.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 943c97c..eb45e41 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -431,6 +431,11 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const 
*obj_table,
uint32_t mask = r->prod.mask;
int ret;

+   /* Avoid the unnecessary cmpset operation below, which is also
+* potentially harmful when n equals 0. */
+   if (n == 0)
+   return 0;
+
/* move prod.head atomically */
do {
/* Reset n to the initial burst count */
@@ -618,6 +623,11 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void 
**obj_table,
unsigned i, rep = 0;
uint32_t mask = r->prod.mask;

+   /* Avoid the unnecessary cmpset operation below, which is also
+* potentially harmful when n equals 0. */
+   if (n == 0)
+   return 0;
+
/* move cons.head atomically */
do {
/* Restore n as it may change every loop */
-- 
1.9.1
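
For readers following the thread, the effect described in the commit message can be reproduced outside DPDK. The following is a minimal sketch using C11 atomics in place of rte_atomic32_cmpset(); the struct and function names are hypothetical stand-ins, not the rte_ring implementation. With n == 0 the compare-and-set replaces the head with the same value, so every caller that observed that head "wins", whereas with n > 0 only one caller can.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Simplified stand-in for the consumer head of a ring; not the DPDK struct. */
struct toy_ring {
	_Atomic uint32_t cons_head;
};

/*
 * Try to claim n entries starting at the head value the caller observed.
 * Returns 1 if the compare-and-set succeeded, 0 otherwise.
 */
int
toy_claim(struct toy_ring *r, uint32_t observed_head, uint32_t n)
{
	uint32_t expected = observed_head;

	return atomic_compare_exchange_strong(&r->cons_head, &expected,
					      observed_head + n);
}

/* Guarded variant: a zero-object request never touches the head. */
int
toy_claim_checked(struct toy_ring *r, uint32_t observed_head, uint32_t n)
{
	if (n == 0)
		return 0;
	return toy_claim(r, observed_head, n);
}
```

Two callers that both read the same head and call toy_claim() with n == 0 both succeed, which is the situation the early return in the patch avoids.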



[dpdk-dev] [PATCH] ring: assert on zero objects dequeue/enqueue

2016-03-17 Thread Lazaros Koromilas
Sure, I'm sending it again with your suggestions.

Lazaros.

On Wed, Mar 16, 2016 at 1:21 PM, Bruce Richardson
 wrote:
> On Tue, Mar 15, 2016 at 06:58:45PM +0200, Lazaros Koromilas wrote:
>> Issuing a zero objects dequeue with a single consumer has no effect.
>> Doing so with multiple consumers, can get more than one thread to succeed
>> the compare-and-set operation and observe starvation or even deadlock in
>> the while loop that checks for preceding dequeues.  The problematic piece
>> of code when n = 0:
>>
>> cons_next = cons_head + n;
>> success = rte_atomic32_cmpset(&r->cons.head, cons_head, cons_next);
>>
>> The same is possible on the enqueue path.
>>
>> Signed-off-by: Lazaros Koromilas 
>
> I'm not sure how serious a problem this really is, and I really suspect that
> just calling rte_panic is rather an overreaction here. At worst, this should
> be a check only when RTE_RING_DEBUG is on.
>
> However, probably my preferred solution to this issue would be to just add
>if (n == 0)
>return 0
>
> to the mp and mc enqueue/dequeue functions. That way there is no performance
> penalty for the higher-performing sp/sc paths, and you avoid any unnecessary
> cmpset operations for the mp/mc cases.
>
> /Bruce
>
>> ---
>>  lib/librte_ring/rte_ring.h | 26 ++
>>  1 file changed, 26 insertions(+)
>>
>> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
>> index 943c97c..2bf9ce3 100644
>> --- a/lib/librte_ring/rte_ring.h
>> +++ b/lib/librte_ring/rte_ring.h
>> @@ -100,6 +100,7 @@ extern "C" {
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>
>>  #define RTE_TAILQ_RING_NAME "RTE_RING"
>>
>> @@ -211,6 +212,19 @@ struct rte_ring {
>>  #endif
>>
>>  /**
>> + * @internal Assert macro.
>> + * @param exp
>> + *   The expression to evaluate.
>> + */
>> +#define RTE_RING_ASSERT(exp) do { \
>> + if (!(exp)) { \
>> + rte_panic("line%d\t"  \
>> +   "assert \"" #exp "\" failed\n", \
>> +   __LINE__);  \
>> + } \
>> + } while (0)
>> +
>> +/**
>>   * Calculate the memory size needed for a ring
>>   *
>>   * This function returns the number of bytes needed for a ring, given
>> @@ -406,6 +420,7 @@ void rte_ring_dump(FILE *f, const struct rte_ring *r);
>>   *   A pointer to a table of void * pointers (objects).
>>   * @param n
>>   *   The number of objects to add in the ring from the obj_table.
>> + *   Must be greater than zero.
>>   * @param behavior
>>   *   RTE_RING_QUEUE_FIXED:Enqueue a fixed number of items from a ring
>>   *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items a possible from ring
>> @@ -431,6 +446,8 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * 
>> const *obj_table,
>>   uint32_t mask = r->prod.mask;
>>   int ret;
>>
>> + RTE_RING_ASSERT(n > 0);
>> +
>>   /* move prod.head atomically */
>>   do {
>>   /* Reset n to the initial burst count */
>> @@ -510,6 +527,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * 
>> const *obj_table,
>>   *   A pointer to a table of void * pointers (objects).
>>   * @param n
>>   *   The number of objects to add in the ring from the obj_table.
>> + *   Must be greater than zero.
>>   * @param behavior
>>   *   RTE_RING_QUEUE_FIXED:Enqueue a fixed number of items from a ring
>>   *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items a possible from ring
>> @@ -533,6 +551,8 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * 
>> const *obj_table,
>>   uint32_t mask = r->prod.mask;
>>   int ret;
>>
>> + RTE_RING_ASSERT(n > 0);
>> +
>>   prod_head = r->prod.head;
>>   cons_tail = r->cons.tail;
>>   /* The subtraction is done between two unsigned 32bits value
>> @@ -594,6 +614,7 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * 
>> const *obj_table,
>>   *   A pointer to a table of void * pointers (objects) that will be filled.
>>   * @param n
>>   *   The number of objects to dequeue from the ring to the obj_table.
>> + *   Must be greater than zero.
>>   * @param behavior
>>   *   RTE_RING_QUEUE_FIXED:Dequeue a fixed number of items from a ring
>>   *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items a possible from ring
>> @@ -618,6 +639,8 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void 
>> **obj_table,
>>   unsigned i, rep = 0;
>>   uint32_t mask = r->prod.mask;
>>
>> + RTE_RING_ASSERT(n > 0);
>> +
>>   /* move cons.head atomically */
>>   do {
>>   /* Restore n as it may change every loop */
>> @@ -689,6 +712,7 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void 
>> **obj_table,
>>   *   A pointer to a table of void * pointers (objects) that will be filled.
>>   * @param n
>>   *   The number of objects to dequeue from the ring to 

[dpdk-dev] [PATCH] ethdev: don't count missed packets in erroneous packets counter

2016-03-17 Thread Thomas Monjalon
CC Maryam and Olivier who had discussions about imissed and other stats:
http://dpdk.org/ml/archives/dev/2015-August/022905.html
http://dpdk.org/ml/archives/dev/2015-September/023351.html
http://dpdk.org/ml/archives/dev/2015-September/023612.html

2016-03-10 16:03, Igor Ryzhov:
> Comment for "ierrors" counter says that it counts erroneous received packets. 
> But for some reason "imissed" counter is added to "ierrors" counter in most 
> drivers. It is a mistake, because missed packets are obviously not received. 
> This patch fixes it.

According to this patch
http://dpdk.org/browse/dpdk/commit/?id=70bdb186
imissed was kept in ierrors because of backward compatibility.
I'm OK to remove imissed from ierrors.

Fixes: 70bdb18657da ("ethdev: add Rx error counters for missed, badcrc and badlen packets")
Fixes: 6bfe648406b5 ("i40e: add Rx error statistics")
Fixes: 856505d303f4 ("cxgbe: add port statistics")

Acked-by: Thomas Monjalon 
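
The intended accounting can be sketched as follows. This is only an illustration, assuming a driver that reads three hardware counters; the struct and field names are hypothetical and do not come from any particular PMD. Missed packets are reported in imissed alone and are no longer added to ierrors.

```c
#include <stdint.h>

/* Hypothetical per-port hardware counters; names are illustrative. */
struct toy_hw_counters {
	uint64_t crc_errors;   /* frames received with a bad CRC */
	uint64_t len_errors;   /* frames received with a bad length */
	uint64_t rx_missed;    /* frames dropped before reception */
};

struct toy_stats {
	uint64_t ierrors;  /* erroneous received packets */
	uint64_t imissed;  /* packets that were never received */
};

/* Report missed packets only in imissed, never folded into ierrors. */
struct toy_stats
toy_fill_stats(const struct toy_hw_counters *hw)
{
	struct toy_stats s;

	s.ierrors = hw->crc_errors + hw->len_errors;
	s.imissed = hw->rx_missed;
	return s;
}
```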


[dpdk-dev] [PATCH] ethdev: don't count missed packets in erroneous packets counter

2016-03-17 Thread Rahul Lakkireddy
On Thursday, March 03/10/16, 2016 at 16:03:30 +0300, Igor Ryzhov wrote:
> Comment for "ierrors" counter says that it counts erroneous received packets. 
> But for some reason "imissed" counter is added to "ierrors" counter in most 
> drivers. It is a mistake, because missed packets are obviously not received. 
> This patch fixes it.
> 
> Signed-off-by: Igor Ryzhov 
> ---

For the cxgbe part,
Acked-by: Rahul Lakkireddy 


[dpdk-dev] Huge pages to be allocated based on number of mbufs

2016-03-17 Thread Zoltan Kiss


On 14/03/16 17:54, Saurabh Mishra wrote:
> Hi,
>
> We are planning to support virtio, vmxnet3, ixgbe, i40e, bnx2x and SR-IOV
> on some of them with DPDK.
>
> We have seen that even if we give the correct number of mbufs for the number
> of hugepages reserved, rte_eth_tx_queue_setup() may still fail with not enough
> memory (I saw this on i40evf, but it worked on virtio and vmxnet3).
>
> We would like to know the recommended way to determine how many hugepages
> we should allocate for a given number of mbufs such that the queue setup APIs
> also don't fail.

I think you ran into a fragmentation problem. If you allocate the 
hugepages later on after startup, chances are they are fragmented in 
memory. When you allocate a pool, DPDK needs a contiguous area of memory 
on the hugepages.
You should allocate them through the kernel boot params so they'll be as 
contiguous as possible.
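
As a starting point for the sizing question, here is a rough sketch of the arithmetic, assuming the caller already knows the full per-mbuf footprint. The function name is hypothetical and the result is only a lower bound: it ignores fragmentation, and queue setup needs extra memory (descriptor rings and software rings) on top of the pool.

```c
#include <stdint.h>

/*
 * Rough lower bound on the hugepages needed to hold an mbuf pool.
 * obj_size is the full per-mbuf footprint in bytes (mbuf header,
 * headroom, data room and any per-object overhead); the caller
 * must supply it, it is not derived here.
 */
uint64_t
toy_min_hugepages(uint64_t n_mbufs, uint64_t obj_size, uint64_t page_size)
{
	uint64_t total = n_mbufs * obj_size;

	if (page_size == 0)
		return 0; /* invalid input in this sketch */
	return (total + page_size - 1) / page_size; /* round up */
}
```

Any real budget should add headroom beyond this figure, since the pool has to fit in one contiguous area and the per-queue allocations come out of the same hugepages.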


>
> Since we will be running on low-end systems too we need to be careful about
> reserving hugepages.
>
> Thanks,
> /Saurabh
>


[dpdk-dev] ixgbe TX function selection

2016-03-17 Thread Zoltan Kiss


On 10/03/16 07:51, Wu, Jingjing wrote:
> Hi, Zoltan
>
>> -Original Message-
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Zoltan Kiss
>> Sent: Wednesday, March 2, 2016 3:19 AM
>> To: dev at dpdk.org
>> Subject: [dpdk-dev] ixgbe TX function selection
>>
>> Hi,
>>
>> I've noticed that ixgbe_set_tx_function() selects the non-SG function
>> even if (dev->data->scattered_rx == 1). That seems a bit dangerous, as
>> you can turn that on inadvertently when you don't set max_rx_pkt_len and
>> buffer size in certain ways. I've learnt it the hard way, as my
>> segmented packets were leaking memory on the TX path, which doesn't
>> complain if you send out segmented packets.
>> How should this case be treated? Assert on the non-SG TX side for the
>> 'next' pointer? Or turn on SG if RX has it? That doesn't seem to be a
>> solid way, as other interfaces can still have SG turned on.
>>
>
> If you look into ixgbe_set_tx_function, you will find that the tx function
> selection is decided by the tx_flags at queue configuration, which are
> passed by rte_eth_txconf. So even if you set dev->data->scattered_rx to 1,
> if the tx_flags is ETH_TXQ_FLAGS_NOMULTSEGS, ixgbe_xmit_pkts_simple is
> still selected as the tx function. So you'd better set tx_flags=0 and have a
> try.

You mean getting default_txconf from rte_eth_dev_info_get() and 
explicitly clearing ETH_TXQ_FLAGS_NOMULTSEGS? (Filling tx_flags with 
zeros doesn't work very well.) That would solve it for me, but I'm 
rather talking about having defaults which don't cause memory leaks 
so easily.
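
The workaround discussed here can be sketched as below. This is a minimal illustration that compiles without DPDK headers, so the struct and the flag are local stand-ins with assumed values; in a real application the defaults would come from rte_eth_dev_info_get() and the flag from rte_ethdev.h, and the result would be passed to rte_eth_tx_queue_setup().

```c
#include <stdint.h>

/* Stand-ins so the sketch compiles without DPDK headers. In a real
 * application these come from rte_ethdev.h; the value is assumed. */
#define TOY_TXQ_FLAGS_NOMULTSEGS UINT32_C(0x0001)

struct toy_txconf {
	uint32_t txq_flags;
};

/*
 * Start from the driver's default TX configuration and clear only the
 * "no multi-segment" bit, leaving every other default untouched.
 */
struct toy_txconf
toy_txconf_allow_multiseg(struct toy_txconf defaults)
{
	defaults.txq_flags &= ~TOY_TXQ_FLAGS_NOMULTSEGS;
	return defaults;
}
```

Clearing a single bit keeps the other driver defaults, which is the difference from zeroing the whole flags field.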

>
>> Regards,
>>
>> Zoltan


[dpdk-dev] [PATCH v2] ring: check for zero objects mc dequeue / mp enqueue

2016-03-17 Thread Mauricio Vásquez
Hi Lazaros,

On Thu, Mar 17, 2016 at 4:49 PM, Lazaros Koromilas 
wrote:

> Issuing a zero objects dequeue with a single consumer has no effect.
> Doing so with multiple consumers, can get more than one thread to succeed
> the compare-and-set operation and observe starvation or even deadlock in
> the while loop that checks for preceding dequeues.  The problematic piece
> of code when n = 0:
>
> cons_next = cons_head + n;
> success = rte_atomic32_cmpset(&r->cons.head, cons_head, cons_next);
>
> The same is possible on the enqueue path.
>
> Signed-off-by: Lazaros Koromilas 
> ---
>  lib/librte_ring/rte_ring.h | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> index 943c97c..eb45e41 100644
> --- a/lib/librte_ring/rte_ring.h
> +++ b/lib/librte_ring/rte_ring.h
> @@ -431,6 +431,11 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void *
> const *obj_table,
> uint32_t mask = r->prod.mask;
> int ret;
>
> +   /* Avoid the unnecessary cmpset operation below, which is also
> +* potentially harmful when n equals 0. */
> +   if (n == 0)
>

What about using unlikely here?


> +   return 0;
> +
> /* move prod.head atomically */
> do {
> /* Reset n to the initial burst count */
> @@ -618,6 +623,11 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void
> **obj_table,
> unsigned i, rep = 0;
> uint32_t mask = r->prod.mask;
>
> +   /* Avoid the unnecessary cmpset operation below, which is also
> +* potentially harmful when n equals 0. */
> +   if (n == 0)
>

Also here.


> +   return 0;
> +
> /* move cons.head atomically */
> do {
> /* Restore n as it may change every loop */
> --
> 1.9.1
>
>


[dpdk-dev] [PATCH 0/8] Various fixes to compile with gcc6

2016-03-17 Thread Thomas Monjalon
RE fixing email addresses...

2016-03-17 16:39, Thomas Monjalon:
> 2016-02-25 13:48, Aaron Conole:
> > This series brings a number of code cleanups to allow building using gcc6,
> > with various legitimate warnings being fixed.
> > 
> > In particular, patch 3 ("drivers/net/e1000: Fix missing brackets") should be
> > checked for correctness (it does not alter any behavior from a functional
> > standpoint, but it may be required to do so for a correct fix).
> 
> Wenzhuo, Helin, Konstantin, Bruce, we need your opinion for some
> of these patches.
> Thanks




[dpdk-dev] [PATCH 0/8] Various fixes to compile with gcc6

2016-03-17 Thread Thomas Monjalon
2016-02-25 13:48, Aaron Conole:
> This series brings a number of code cleanups to allow building using gcc6,
> with various legitimate warnings being fixed.
> 
> In particular, patch 3 ("drivers/net/e1000: Fix missing brackets") should be
> checked for correctness (it does not alter any behavior from a functional
> standpoint, but it may be required to do so for a correct fix).

Wenzhuo, Helin, Konstantin, Bruce, we need your opinion for some
of these patches.
Thanks


[dpdk-dev] [PATCH v3 5/5] mlx5: add VLAN insertion offload

2016-03-17 Thread Adrien Mazarguil
From: Yaacov Hazan 

VLAN insertion can be done in hardware when supported in Verbs. A software
fallback is provided otherwise. The software implementation is also used
when multi-packet send is enabled on a queue, as both features are mutually
exclusive.

Signed-off-by: Yaacov Hazan 
Signed-off-by: Adrien Mazarguil 
---
 doc/guides/nics/mlx5.rst   |   2 +
 doc/guides/rel_notes/release_16_04.rst |   6 ++
 drivers/net/mlx5/Makefile  |   5 ++
 drivers/net/mlx5/mlx5.c|  12 +++-
 drivers/net/mlx5/mlx5.h|   1 +
 drivers/net/mlx5/mlx5_ethdev.c |  12 ++--
 drivers/net/mlx5/mlx5_rxtx.c   | 112 +++--
 drivers/net/mlx5/mlx5_rxtx.h   |  13 
 drivers/net/mlx5/mlx5_txq.c|  16 -
 9 files changed, 151 insertions(+), 28 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 9df30be..925cb9e 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -84,6 +84,7 @@ Features
 - Support for multiple MAC addresses.
 - VLAN filtering.
 - RX VLAN stripping.
+- TX VLAN insertion.
 - RX CRC stripping configuration.
 - Promiscuous mode.
 - Multicast promiscuous mode.
@@ -247,6 +248,7 @@ Currently supported by DPDK:

 - Flow director.
 - RX VLAN stripping.
+- TX VLAN insertion.
 - RX CRC stripping configuration.

 - Minimum firmware version:
diff --git a/doc/guides/rel_notes/release_16_04.rst 
b/doc/guides/rel_notes/release_16_04.rst
index 8eb423f..087eb25 100644
--- a/doc/guides/rel_notes/release_16_04.rst
+++ b/doc/guides/rel_notes/release_16_04.rst
@@ -151,6 +151,12 @@ This section should contain new features added in this 
release. Sample format:

   Only available with Mellanox OFED >= 3.2.

+* **Added mlx5 TX VLAN insertion support.**
+
+  Added support for TX VLAN insertion.
+
+  Only available with Mellanox OFED >= 3.2.
+
 * **Added af_packet dynamic removal function.**

   Af_packet device can now be detached using API, like other PMD devices.
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index a6a3cab..7e6d589 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -147,6 +147,11 @@ mlx5_autoconf.h: $(RTE_SDK)/scripts/auto-config-h.sh
infiniband/verbs.h \
enum IBV_EXP_CREATE_WQ_FLAG_RX_END_PADDING \
$(AUTOCONF_OUTPUT)
+   $Q sh -- '$<' '$@' \
+   HAVE_VERBS_VLAN_INSERTION \
+   infiniband/verbs.h \
+   enum IBV_EXP_RECEIVE_WQ_CVLAN_INSERTION \
+   $(AUTOCONF_OUTPUT)

 $(SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD):.c=.o): mlx5_autoconf.h

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 94eefb9..ea4b6e3 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -260,6 +260,7 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
struct ibv_context *attr_ctx = NULL;
struct ibv_device_attr device_attr;
unsigned int vf;
+   unsigned int mps;
int idx;
int i;

@@ -305,8 +306,14 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
   PCI_DEVICE_ID_MELLANOX_CONNECTX4VF) ||
  (pci_dev->id.device_id ==
   PCI_DEVICE_ID_MELLANOX_CONNECTX4LXVF));
-   INFO("PCI information matches, using device \"%s\" (VF: %s)",
-list[i]->name, (vf ? "true" : "false"));
+   /* Multi-packet send is only supported by ConnectX-4 Lx PF. */
+   mps = (pci_dev->id.device_id ==
+  PCI_DEVICE_ID_MELLANOX_CONNECTX4LX);
+   INFO("PCI information matches, using device \"%s\" (VF: %s,"
+" MPS: %s)",
+list[i]->name,
+vf ? "true" : "false",
+mps ? "true" : "false");
attr_ctx = ibv_open_device(list[i]);
err = errno;
break;
@@ -457,6 +464,7 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
 #endif /* HAVE_EXP_QUERY_DEVICE */

priv->vf = vf;
+   priv->mps = mps;
/* Allocate and register default RSS hash keys. */
priv->rss_conf = rte_calloc(__func__, hash_rxq_init_n,
sizeof((*priv->rss_conf)[0]), 0);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 1904d54..d012f50 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -106,6 +106,7 @@ struct priv {
unsigned int hw_fcs_strip:1; /* FCS stripping is supported. */
unsigned int hw_padding:1; /* End alignment padding is supported. */
unsigned int vf:1; /* This is a VF device. */
+   unsigned int mps:1; /* Whether multi-packet send is supported. */
unsigned int pending_alarm:1; /* 

[dpdk-dev] [PATCH v3 4/5] mlx5: add support for HW packet padding

2016-03-17 Thread Adrien Mazarguil
From: Olga Shern 

Environment variable MLX5_PMD_ENABLE_PADDING enables HW packet padding
in PCI bus transactions.

When packet size is cache aligned and CRC stripping is enabled, 4 fewer
bytes are written to the PCI bus. Enabling padding makes such packets
aligned again.

In cases where PCI bandwidth is the bottleneck, padding can improve
performance by 10%.

This is disabled by default since this can also decrease performance for
unaligned packet sizes.

Signed-off-by: Olga Shern 
---
 doc/guides/nics/mlx5.rst   | 14 ++
 doc/guides/rel_notes/release_16_04.rst |  7 +++
 drivers/net/mlx5/Makefile  |  5 +
 drivers/net/mlx5/mlx5.c| 28 
 drivers/net/mlx5/mlx5.h|  5 +
 drivers/net/mlx5/mlx5_rxq.c| 15 +++
 6 files changed, 74 insertions(+)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 8b63f3f..9df30be 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -156,6 +156,20 @@ Environment variables
   lower performance when there is no backpressure, it is not enabled by
   default.

+- ``MLX5_PMD_ENABLE_PADDING``
+
+  Enables HW packet padding in PCI bus transactions.
+
+  When packet size is cache aligned and CRC stripping is enabled, 4 fewer
+  bytes are written to the PCI bus. Enabling padding makes such packets
+  aligned again.
+
+  In cases where PCI bandwidth is the bottleneck, padding can improve
+  performance by 10%.
+
+  This is disabled by default since this can also decrease performance for
+  unaligned packet sizes.
+
 Run-time configuration
 ~~

diff --git a/doc/guides/rel_notes/release_16_04.rst 
b/doc/guides/rel_notes/release_16_04.rst
index a498ef7..8eb423f 100644
--- a/doc/guides/rel_notes/release_16_04.rst
+++ b/doc/guides/rel_notes/release_16_04.rst
@@ -144,6 +144,13 @@ This section should contain new features added in this 
release. Sample format:

   Only available with Mellanox OFED >= 3.2.

+* **Added mlx5 optional packet padding by HW.**
+
+  Added an option to make PCI bus transactions rounded to multiple of a
+  cache line size for better alignment.
+
+  Only available with Mellanox OFED >= 3.2.
+
 * **Added af_packet dynamic removal function.**

   Af_packet device can now be detached using API, like other PMD devices.
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index cc6de2d..a6a3cab 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -142,6 +142,11 @@ mlx5_autoconf.h: $(RTE_SDK)/scripts/auto-config-h.sh
infiniband/verbs.h \
enum IBV_EXP_CREATE_WQ_FLAG_SCATTER_FCS \
$(AUTOCONF_OUTPUT)
+   $Q sh -- '$<' '$@' \
+   HAVE_VERBS_RX_END_PADDING \
+   infiniband/verbs.h \
+   enum IBV_EXP_CREATE_WQ_FLAG_RX_END_PADDING \
+   $(AUTOCONF_OUTPUT)

 $(SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD):.c=.o): mlx5_autoconf.h

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index acfb365..94eefb9 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -68,6 +68,25 @@
 #include "mlx5_defs.h"

 /**
+ * Retrieve integer value from environment variable.
+ *
+ * @param[in] name
+ *   Environment variable name.
+ *
+ * @return
+ *   Integer value, 0 if the variable is not set.
+ */
+int
+mlx5_getenv_int(const char *name)
+{
+   const char *val = getenv(name);
+
+   if (val == NULL)
+   return 0;
+   return atoi(val);
+}
+
+/**
  * DPDK callback to close the device.
  *
  * Destroy all queues and objects, free memory.
@@ -332,6 +351,9 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
 #ifdef HAVE_EXP_DEVICE_ATTR_VLAN_OFFLOADS
IBV_EXP_DEVICE_ATTR_VLAN_OFFLOADS |
 #endif /* HAVE_EXP_DEVICE_ATTR_VLAN_OFFLOADS */
+#ifdef HAVE_EXP_CREATE_WQ_FLAG_RX_END_PADDING
+   IBV_EXP_DEVICE_ATTR_RX_PAD_END_ALIGN |
+#endif /* HAVE_EXP_CREATE_WQ_FLAG_RX_END_PADDING */
0;
 #endif /* HAVE_EXP_QUERY_DEVICE */

@@ -424,6 +446,12 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
DEBUG("FCS stripping configuration is %ssupported",
  (priv->hw_fcs_strip ? "" : "not "));

+#ifdef HAVE_VERBS_RX_END_PADDING
+   priv->hw_padding = !!exp_device_attr.rx_pad_end_addr_align;
+#endif /* HAVE_VERBS_RX_END_PADDING */
+   DEBUG("hardware RX end alignment padding is %ssupported",
+ (priv->hw_padding ? "" : "not "));
+
 #else /* HAVE_EXP_QUERY_DEVICE */
priv->ind_table_max_size = RSS_INDIRECTION_TABLE_SIZE;
 #endif /* HAVE_EXP_QUERY_DEVICE */
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 9690827..1904d54 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -104,6 +104,7 @@ struct priv 

[dpdk-dev] [PATCH v3 3/5] mlx5: add RX CRC stripping configuration

2016-03-17 Thread Adrien Mazarguil
From: Olga Shern 

Until now, CRC was always stripped by hardware. Stripping can be
configured with MLNX_OFED >= 3.2.

Signed-off-by: Olga Shern 
---
 doc/guides/nics/mlx5.rst   |  2 ++
 doc/guides/rel_notes/release_16_04.rst |  6 ++
 drivers/net/mlx5/Makefile  |  5 +
 drivers/net/mlx5/mlx5.c|  7 +++
 drivers/net/mlx5/mlx5.h|  1 +
 drivers/net/mlx5/mlx5_rxq.c| 24 
 drivers/net/mlx5/mlx5_rxtx.c   |  6 --
 drivers/net/mlx5/mlx5_rxtx.h   |  1 +
 8 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index f0d8a7e..8b63f3f 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -84,6 +84,7 @@ Features
 - Support for multiple MAC addresses.
 - VLAN filtering.
 - RX VLAN stripping.
+- RX CRC stripping configuration.
 - Promiscuous mode.
 - Multicast promiscuous mode.
 - Hardware checksum offloads.
@@ -232,6 +233,7 @@ Currently supported by DPDK:

 - Flow director.
 - RX VLAN stripping.
+- RX CRC stripping configuration.

 - Minimum firmware version:

diff --git a/doc/guides/rel_notes/release_16_04.rst 
b/doc/guides/rel_notes/release_16_04.rst
index ceef9b7..a498ef7 100644
--- a/doc/guides/rel_notes/release_16_04.rst
+++ b/doc/guides/rel_notes/release_16_04.rst
@@ -138,6 +138,12 @@ This section should contain new features added in this 
release. Sample format:

   Implemented TX support in secondary processes (like mlx4).

+* **Added mlx5 RX CRC stripping configuration.**
+
+  Until now, CRC was always stripped. It can now be configured.
+
+  Only available with Mellanox OFED >= 3.2.
+
 * **Added af_packet dynamic removal function.**

   Af_packet device can now be detached using API, like other PMD devices.
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 7076ae3..cc6de2d 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -137,6 +137,11 @@ mlx5_autoconf.h: $(RTE_SDK)/scripts/auto-config-h.sh
infiniband/verbs.h \
enum IBV_EXP_CQ_RX_TCP_PACKET \
$(AUTOCONF_OUTPUT)
+   $Q sh -- '$<' '$@' \
+   HAVE_VERBS_FCS \
+   infiniband/verbs.h \
+   enum IBV_EXP_CREATE_WQ_FLAG_SCATTER_FCS \
+   $(AUTOCONF_OUTPUT)

 $(SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD):.c=.o): mlx5_autoconf.h

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 998e6f0..acfb365 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -417,6 +417,13 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
DEBUG("VLAN stripping is %ssupported",
  (priv->hw_vlan_strip ? "" : "not "));

+#ifdef HAVE_VERBS_FCS
+   priv->hw_fcs_strip = !!(exp_device_attr.exp_device_cap_flags &
+   IBV_EXP_DEVICE_SCATTER_FCS);
+#endif /* HAVE_VERBS_FCS */
+   DEBUG("FCS stripping configuration is %ssupported",
+ (priv->hw_fcs_strip ? "" : "not "));
+
 #else /* HAVE_EXP_QUERY_DEVICE */
priv->ind_table_max_size = RSS_INDIRECTION_TABLE_SIZE;
 #endif /* HAVE_EXP_QUERY_DEVICE */
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index bad9283..9690827 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -103,6 +103,7 @@ struct priv {
unsigned int hw_csum:1; /* Checksum offload is supported. */
unsigned int hw_csum_l2tun:1; /* Same for L2 tunnels. */
unsigned int hw_vlan_strip:1; /* VLAN stripping is supported. */
+   unsigned int hw_fcs_strip:1; /* FCS stripping is supported. */
unsigned int vf:1; /* This is a VF device. */
unsigned int pending_alarm:1; /* An alarm is pending. */
/* RX/TX queues. */
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 3d84f41..19a1119 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -1258,6 +1258,30 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, 
uint16_t desc,
  0),
 #endif /* HAVE_EXP_DEVICE_ATTR_VLAN_OFFLOADS */
};
+
+#ifdef HAVE_VERBS_FCS
+   /* By default, FCS (CRC) is stripped by hardware. */
+   if (dev->data->dev_conf.rxmode.hw_strip_crc) {
+   tmpl.crc_present = 0;
+   } else if (priv->hw_fcs_strip) {
+   /* Ask HW/Verbs to leave CRC in place when supported. */
+   attr.wq.flags |= IBV_EXP_CREATE_WQ_FLAG_SCATTER_FCS;
+   attr.wq.comp_mask |= IBV_EXP_CREATE_WQ_FLAGS;
+   tmpl.crc_present = 1;
+   } else {
+   WARN("%p: CRC stripping has been disabled but will still"
+" be performed by hardware, make sure MLNX_OFED and"
+" firmware are up to date",
+(void 

[dpdk-dev] [PATCH v3 2/5] mlx5: allow operation in secondary processes

2016-03-17 Thread Adrien Mazarguil
From: Or Ami 

Secondary processes are expected to use queues and other resources
allocated by the primary, however Verbs resources can only be shared
between processes when inherited through fork().

This limitation can be worked around for TX by configuring separate queues
from secondary processes.

Signed-off-by: Or Ami 
---
 doc/guides/nics/mlx5.rst   |   3 +-
 doc/guides/rel_notes/release_16_04.rst |   4 +
 drivers/net/mlx5/mlx5.c|  42 +--
 drivers/net/mlx5/mlx5.h|  12 ++
 drivers/net/mlx5/mlx5_ethdev.c | 202 -
 drivers/net/mlx5/mlx5_mac.c|   6 +
 drivers/net/mlx5/mlx5_rxmode.c |  12 ++
 drivers/net/mlx5/mlx5_rxq.c|  46 
 drivers/net/mlx5/mlx5_rxtx.h   |   8 ++
 drivers/net/mlx5/mlx5_stats.c  |   2 +-
 drivers/net/mlx5/mlx5_trigger.c|   6 +
 drivers/net/mlx5/mlx5_txq.c|  50 +++-
 12 files changed, 378 insertions(+), 15 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index edfbf1f..f0d8a7e 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -88,6 +88,7 @@ Features
 - Multicast promiscuous mode.
 - Hardware checksum offloads.
 - Flow director (RTE_FDIR_MODE_PERFECT and RTE_FDIR_MODE_PERFECT_MAC_VLAN).
+- Secondary process TX is supported.

 Limitations
 ---
@@ -96,7 +97,7 @@ Limitations
 - Inner RSS for VXLAN frames is not supported yet.
 - Port statistics through software counters only.
 - Hardware checksum offloads for VXLAN inner header are not supported yet.
-- Secondary processes are not supported yet.
+- Secondary process RX is not supported.

 Configuration
 -
diff --git a/doc/guides/rel_notes/release_16_04.rst 
b/doc/guides/rel_notes/release_16_04.rst
index a011f0b..ceef9b7 100644
--- a/doc/guides/rel_notes/release_16_04.rst
+++ b/doc/guides/rel_notes/release_16_04.rst
@@ -134,6 +134,10 @@ This section should contain new features added in this 
release. Sample format:

   Implemented callbacks to bring link up and down.

+* **Added mlx5 support for operation in secondary processes.**
+
+  Implemented TX support in secondary processes (like mlx4).
+
 * **Added af_packet dynamic removal function.**

   Af_packet device can now be detached using API, like other PMD devices.
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 14ac4ba..998e6f0 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -78,7 +78,7 @@
 static void
 mlx5_dev_close(struct rte_eth_dev *dev)
 {
-   struct priv *priv = dev->data->dev_private;
+   struct priv *priv = mlx5_get_priv(dev);
void *tmp;
unsigned int i;

@@ -483,18 +483,44 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
goto port_error;
}

-   eth_dev->data->dev_private = priv;
-   eth_dev->pci_dev = pci_dev;
+   /* Secondary processes have to use local storage for their
+* private data as well as a copy of eth_dev->data, but this
+* pointer must not be modified before burst functions are
+* actually called. */
+   if (mlx5_is_secondary()) {
+   struct mlx5_secondary_data *sd =
+   &mlx5_secondary_data[eth_dev->data->port_id];
+   sd->primary_priv = eth_dev->data->dev_private;
+   if (sd->primary_priv == NULL) {
+   ERROR("no private data for port %u",
+   eth_dev->data->port_id);
+   err = EINVAL;
+   goto port_error;
+   }
+   sd->shared_dev_data = eth_dev->data;
+   rte_spinlock_init(&sd->lock);
+   memcpy(sd->data.name, sd->shared_dev_data->name,
+  sizeof(sd->data.name));
+   sd->data.dev_private = priv;
+   sd->data.rx_mbuf_alloc_failed = 0;
+   sd->data.mtu = ETHER_MTU;
+   sd->data.port_id = sd->shared_dev_data->port_id;
+   sd->data.mac_addrs = priv->mac;
+   eth_dev->tx_pkt_burst = mlx5_tx_burst_secondary_setup;
+   eth_dev->rx_pkt_burst = mlx5_rx_burst_secondary_setup;
+   } else {
+   eth_dev->data->dev_private = priv;
+   eth_dev->data->rx_mbuf_alloc_failed = 0;
+   eth_dev->data->mtu = ETHER_MTU;
+   eth_dev->data->mac_addrs = priv->mac;
+   }

+   eth_dev->pci_dev = pci_dev;
rte_eth_copy_pci_info(eth_dev, pci_dev);
-
eth_dev->driver = &mlx5_driver;
-   

[dpdk-dev] [PATCH v3 1/5] mlx5: add callbacks to support link (up / down) changes

2016-03-17 Thread Adrien Mazarguil
From: Or Ami 

Burst functions are updated to make sure applications cannot attempt to
send/receive after link is brought down.

Signed-off-by: Or Ami 
---
 doc/guides/rel_notes/release_16_04.rst |  4 ++
 drivers/net/mlx5/mlx5.c|  2 +
 drivers/net/mlx5/mlx5.h|  2 +
 drivers/net/mlx5/mlx5_ethdev.c | 85 ++
 4 files changed, 93 insertions(+)

diff --git a/doc/guides/rel_notes/release_16_04.rst 
b/doc/guides/rel_notes/release_16_04.rst
index 5f9eb3e..a011f0b 100644
--- a/doc/guides/rel_notes/release_16_04.rst
+++ b/doc/guides/rel_notes/release_16_04.rst
@@ -130,6 +130,10 @@ This section should contain new features added in this 
release. Sample format:

   Only available with Mellanox OFED >= 3.2.

+* **Added mlx5 link up/down callbacks.**
+
+  Implemented callbacks to bring link up and down.
+
 * **Added af_packet dynamic removal function.**

   Af_packet device can now be detached using API, like other PMD devices.
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index ad69ec2..14ac4ba 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -148,6 +148,8 @@ static const struct eth_dev_ops mlx5_dev_ops = {
.dev_configure = mlx5_dev_configure,
.dev_start = mlx5_dev_start,
.dev_stop = mlx5_dev_stop,
+   .dev_set_link_down = mlx5_set_link_down,
+   .dev_set_link_up = mlx5_set_link_up,
.dev_close = mlx5_dev_close,
.promiscuous_enable = mlx5_promiscuous_enable,
.promiscuous_disable = mlx5_promiscuous_disable,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 43b24fb..9a3f240 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -168,6 +168,8 @@ void mlx5_dev_link_status_handler(void *);
 void mlx5_dev_interrupt_handler(struct rte_intr_handle *, void *);
 void priv_dev_interrupt_handler_uninstall(struct priv *, struct rte_eth_dev *);
 void priv_dev_interrupt_handler_install(struct priv *, struct rte_eth_dev *);
+int mlx5_set_link_down(struct rte_eth_dev *dev);
+int mlx5_set_link_up(struct rte_eth_dev *dev);

 /* mlx5_mac.c */

diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 6704382..f609e0f 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -968,3 +968,88 @@ priv_dev_interrupt_handler_install(struct priv *priv, 
struct rte_eth_dev *dev)
   dev);
}
 }
+
+/**
+ * Change the link state (UP / DOWN).
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param up
+ *   Nonzero for link up, otherwise link down.
+ *
+ * @return
+ *   0 on success, errno value on failure.
+ */
+static int
+priv_set_link(struct priv *priv, int up)
+{
+   struct rte_eth_dev *dev = priv->dev;
+   int err;
+   unsigned int i;
+
+   if (up) {
+   err = priv_set_flags(priv, ~IFF_UP, IFF_UP);
+   if (err)
+   return err;
+   for (i = 0; i < priv->rxqs_n; i++)
+   if ((*priv->rxqs)[i]->sp)
+   break;
+   /* Check if an sp queue exists.
+* Note: Some old frames might be received.
+*/
+   if (i == priv->rxqs_n)
+   dev->rx_pkt_burst = mlx5_rx_burst;
+   else
+   dev->rx_pkt_burst = mlx5_rx_burst_sp;
+   dev->tx_pkt_burst = mlx5_tx_burst;
+   } else {
+   err = priv_set_flags(priv, ~IFF_UP, ~IFF_UP);
+   if (err)
+   return err;
+   dev->rx_pkt_burst = removed_rx_burst;
+   dev->tx_pkt_burst = removed_tx_burst;
+   }
+   return 0;
+}
+
+/**
+ * DPDK callback to bring the link DOWN.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   0 on success, errno value on failure.
+ */
+int
+mlx5_set_link_down(struct rte_eth_dev *dev)
+{
+   struct priv *priv = dev->data->dev_private;
+   int err;
+
+   priv_lock(priv);
+   err = priv_set_link(priv, 0);
+   priv_unlock(priv);
+   return err;
+}
+
+/**
+ * DPDK callback to bring the link UP.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   0 on success, errno value on failure.
+ */
+int
+mlx5_set_link_up(struct rte_eth_dev *dev)
+{
+   struct priv *priv = dev->data->dev_private;
+   int err;
+
+   priv_lock(priv);
+   err = priv_set_link(priv, 1);
+   priv_unlock(priv);
+   return err;
+}
-- 
2.1.4



[dpdk-dev] [PATCH v3 0/5] Implement missing features in mlx5

2016-03-17 Thread Adrien Mazarguil
This patchset adds to mlx5 a few features available in mlx4 (TX from
secondary processes) or provided by Verbs (support for HW packet padding,
TX VLAN insertion).

Release notes and documentation are updated accordingly.

Changes in v3:
- Removed compilation option for TX VLAN insertion, the method to use is now
  determined at runtime.
- Modified release notes slightly.

Changes in v2:
- Added support for CRC stripping configuration.
- Updated packet padding feature macro and made cosmetic changes to its
  implementation to match CRC stripping's.
- Updated release notes about packet padding.
- Updated TX VLAN insertion documentation.

Olga Shern (2):
  mlx5: add RX CRC stripping configuration
  mlx5: add support for HW packet padding

Or Ami (2):
  mlx5: add callbacks to support link (up / down) changes
  mlx5: allow operation in secondary processes

Yaacov Hazan (1):
  mlx5: add VLAN insertion offload

 doc/guides/nics/mlx5.rst   |  21 ++-
 doc/guides/rel_notes/release_16_04.rst |  27 +++
 drivers/net/mlx5/Makefile  |  15 ++
 drivers/net/mlx5/mlx5.c|  91 --
 drivers/net/mlx5/mlx5.h|  21 +++
 drivers/net/mlx5/mlx5_ethdev.c | 299 -
 drivers/net/mlx5/mlx5_mac.c|   6 +
 drivers/net/mlx5/mlx5_rxmode.c |  12 ++
 drivers/net/mlx5/mlx5_rxq.c|  85 ++
 drivers/net/mlx5/mlx5_rxtx.c   | 118 ++---
 drivers/net/mlx5/mlx5_rxtx.h   |  22 +++
 drivers/net/mlx5/mlx5_stats.c  |   2 +-
 drivers/net/mlx5/mlx5_trigger.c|   6 +
 drivers/net/mlx5/mlx5_txq.c|  66 +++-
 14 files changed, 746 insertions(+), 45 deletions(-)

-- 
2.1.4



[dpdk-dev] [PATCH RFC v3 0/3] Thread safe rte_vhost_enqueue_burst().

2016-03-17 Thread Thomas Monjalon
2016-02-24 14:47, Ilya Maximets:
> Implementation of rte_vhost_enqueue_burst() based on lockless ring-buffer
> algorithm and contains almost all to be thread-safe, but it's not.
> 
> This set adds required changes.
> 
> First patch in set is a standalone patch that fixes many times discussed
> issue with barriers on different architectures.
> 
> Second and third adds fixes to make rte_vhost_enqueue_burst thread safe.

My understanding is that we do not want to pollute Rx/Tx with locks.

Huawei, Yuanhan, Bruce, do you confirm?


[dpdk-dev] [PATCH] kni: set kni mac on ioctl_create

2016-03-17 Thread Thomas Monjalon
Helin,
You have probably missed this (old) patch / bug report.

2015-08-28 16:08, Sergey Balabanov:
> Hi,
> 
> Probably I missed something in understanding why the mac is not set on kni 
> creation. Any comments would be highly appreciated.
> 
> Thanks,
> Sergey
> 
> On Friday 28 August 2015 16:06:27 Sergey Balabanov wrote:
> > There is a situation when ioctl returns zero mac address (00:00:00:00:00:00)
> > for just created kni. The situation happens because kni mac is set on
> > 'ifconfig up' event (kni_net_open callback) not on kni creation
> > (kni_ioctl_create).
> > 
> > Signed-off-by: Sergey Balabanov 
> > ---
> >  lib/librte_eal/linuxapp/kni/kni_misc.c | 10 ++
> >  lib/librte_eal/linuxapp/kni/kni_net.c  |  9 -
> >  2 files changed, 10 insertions(+), 9 deletions(-)
> > 
> > diff --git a/lib/librte_eal/linuxapp/kni/kni_misc.c
> > b/lib/librte_eal/linuxapp/kni/kni_misc.c index 2e9fa89..61f83a0 100644
> > --- a/lib/librte_eal/linuxapp/kni/kni_misc.c
> > +++ b/lib/librte_eal/linuxapp/kni/kni_misc.c
> > @@ -28,6 +28,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include  /* eth_type_trans */
> > 
> >  #include 
> >  #include "kni_dev.h"
> > @@ -465,6 +466,15 @@ kni_ioctl_create(unsigned int ioctl_num, unsigned long
> > ioctl_param) if (pci)
> > pci_dev_put(pci);
> > 
> > +   if (kni->lad_dev)
> > +   memcpy(net_dev->dev_addr, kni->lad_dev->dev_addr, ETH_ALEN);
> > +   else
> > +   /*
> > +* Generate random mac address. eth_random_addr() is the newer
> > +* version of generating mac address in linux kernel.
> > +*/
> > +   random_ether_addr(net_dev->dev_addr);
> > +
> > ret = register_netdev(net_dev);
> > if (ret) {
> > KNI_ERR("error %i registering device \"%s\"\n",
> > diff --git a/lib/librte_eal/linuxapp/kni/kni_net.c
> > b/lib/librte_eal/linuxapp/kni/kni_net.c index ab5add4..b50b4cf 100644
> > --- a/lib/librte_eal/linuxapp/kni/kni_net.c
> > +++ b/lib/librte_eal/linuxapp/kni/kni_net.c
> > @@ -70,15 +70,6 @@ kni_net_open(struct net_device *dev)
> > struct rte_kni_request req;
> > struct kni_dev *kni = netdev_priv(dev);
> > 
> > -   if (kni->lad_dev)
> > -   memcpy(dev->dev_addr, kni->lad_dev->dev_addr, ETH_ALEN);
> > -   else
> > -   /*
> > -* Generate random mac address. eth_random_addr() is the newer
> > -* version of generating mac address in linux kernel.
> > -*/
> > -   random_ether_addr(dev->dev_addr);
> > -
> > netif_start_queue(dev);
> > 
> > memset(&req, 0, sizeof(req));




[dpdk-dev] [PATCH] eal_interrupts.c: properly init struct epoll_event (valgrind)

2016-03-17 Thread Matthew Hall
On Thu, Mar 17, 2016 at 10:19:24AM -0700, Stephen Hemminger wrote:
> > > A better patch would be to move the data structure into the
> > > code block used, and get rid of the useless else (rte_panic never 
> > > returns);
> > > and fix the indentation, and use C99 initialization which should make 
> > > valgrind
> > > happier.
> > > 
> > > The moral is don't just slap memsets around

Hi guys,

As you probably read before, I am a security developer, not a low-level / 
kernel guy, and my DPDK work is spare time only. So I try to keep the scope 
of my DPDK patches to the minimum required, to avoid creating headaches for 
the full-time team. I can try to redo or refactor all this unrelated 
stuff in the code, but I wouldn't be as fast or accurate as you would.

Matthew.


[dpdk-dev] [PATCH 3/3] enic: small cleanup- remove a packet_error conditional

2016-03-17 Thread John Daley
Signed-off-by: John Daley 
---
 drivers/net/enic/enic_rx.c | 10 +++---
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/net/enic/enic_rx.c b/drivers/net/enic/enic_rx.c
index 817a891..232987a 100644
--- a/drivers/net/enic/enic_rx.c
+++ b/drivers/net/enic/enic_rx.c
@@ -266,7 +266,6 @@ enic_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
nb_hold = rq->rx_nb_hold;   /* mbufs held by software */

while (nb_rx < nb_pkts) {
-   uint16_t rx_pkt_len;
volatile struct rq_enet_desc *rqd_ptr;
dma_addr_t dma_addr;
struct cq_desc cqd;
@@ -295,10 +294,6 @@ enic_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,

/* A packet error means descriptor and data are untrusted */
packet_error = enic_cq_rx_to_pkt_err_flags(&cqd, &ol_err_flags);
-   if (!packet_error)
-   rx_pkt_len = enic_cq_rx_desc_n_bytes(&cqd);
-   else
-   rx_pkt_len = 0;

/* Get the mbuf to return and replace with one just allocated */
rxmb = rq->mbuf_ring[rx_id];
@@ -327,16 +322,17 @@ enic_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
rxmb->data_off = RTE_PKTMBUF_HEADROOM;
rxmb->nb_segs = 1;
rxmb->next = NULL;
-   rxmb->pkt_len = rx_pkt_len;
-   rxmb->data_len = rx_pkt_len;
rxmb->port = enic->port_id;
if (!packet_error) {
+   rxmb->pkt_len = enic_cq_rx_desc_n_bytes(&cqd);
rxmb->packet_type = enic_cq_rx_flags_to_pkt_type(&cqd);
enic_cq_rx_to_pkt_flags(&cqd, rxmb);
} else {
+   rxmb->pkt_len = 0;
rxmb->packet_type = 0;
rxmb->ol_flags = 0;
}
+   rxmb->data_len = rxmb->pkt_len;

/* prefetch mbuf data for caller */
rte_packet_prefetch(RTE_PTR_ADD(rxmb->buf_addr,
-- 
2.7.0



[dpdk-dev] [PATCH 2/3] enic: handle error packets properly

2016-03-17 Thread John Daley
If the packet_error bit in the completion descriptor is set, the
remainder of the descriptor and data are invalid. PKT_RX_MAC_ERR
was set in the mbuf->ol_flags if packet_error was set and used
later to indicate an error packet. But since PKT_RX_MAC_ERR is
defined as 0, mbuf flags and packet types and length were being
misinterpreted.

Make the function enic_cq_rx_to_pkt_err_flags() return true for error
packets and use the return value instead of mbuf->ol_flags to indicate
error packets. Also remove warning for error packets and rely on
rx_error stats.

Fixes: 947d860c821f ("enic: improve Rx performance")

Signed-off-by: John Daley 
---
 drivers/net/enic/enic_rx.c | 43 ++-
 1 file changed, 18 insertions(+), 25 deletions(-)

diff --git a/drivers/net/enic/enic_rx.c b/drivers/net/enic/enic_rx.c
index 59ebaa4..817a891 100644
--- a/drivers/net/enic/enic_rx.c
+++ b/drivers/net/enic/enic_rx.c
@@ -129,13 +129,6 @@ enic_cq_rx_desc_rss_hash(struct cq_enet_rq_desc *cqrd)
return le32_to_cpu(cqrd->rss_hash);
 }

-static inline uint8_t
-enic_cq_rx_desc_fcs_ok(struct cq_enet_rq_desc *cqrd)
-{
-   return ((cqrd->flags & CQ_ENET_RQ_DESC_FLAGS_FCS_OK) ==
-   CQ_ENET_RQ_DESC_FLAGS_FCS_OK);
-}
-
 static inline uint16_t
 enic_cq_rx_desc_vlan(struct cq_enet_rq_desc *cqrd)
 {
@@ -150,25 +143,21 @@ enic_cq_rx_desc_n_bytes(struct cq_desc *cqd)
CQ_ENET_RQ_DESC_BYTES_WRITTEN_MASK;
 }

-static inline uint64_t
-enic_cq_rx_to_pkt_err_flags(struct cq_desc *cqd)
+static inline uint8_t
+enic_cq_rx_to_pkt_err_flags(struct cq_desc *cqd, uint64_t *pkt_err_flags_out)
 {
struct cq_enet_rq_desc *cqrd = (struct cq_enet_rq_desc *)cqd;
uint16_t bwflags;
+   int ret = 0;
uint64_t pkt_err_flags = 0;

bwflags = enic_cq_rx_desc_bwflags(cqrd);
-
-   /* Check for packet error. Can't be more specific than MAC error */
-   if (enic_cq_rx_desc_packet_error(bwflags)) {
-   pkt_err_flags |= PKT_RX_MAC_ERR;
-   }
-
-   /* Check for bad FCS. MAC error isn't quite, but no other choice */
-   if (!enic_cq_rx_desc_fcs_ok(cqrd)) {
-   pkt_err_flags |= PKT_RX_MAC_ERR;
+   if (unlikely(enic_cq_rx_desc_packet_error(bwflags))) {
+   pkt_err_flags = PKT_RX_MAC_ERR;
+   ret = 1;
}
-   return pkt_err_flags;
+   *pkt_err_flags_out = pkt_err_flags;
+   return ret;
 }

 /*
@@ -282,6 +271,7 @@ enic_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
dma_addr_t dma_addr;
struct cq_desc cqd;
uint64_t ol_err_flags;
+   uint8_t packet_error;

/* Check for pkts available */
color = (cqd_ptr->type_color >> CQ_DESC_COLOR_SHIFT)
@@ -303,9 +293,9 @@ enic_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
break;
}

-   /* Check for FCS or packet errors */
-   ol_err_flags = enic_cq_rx_to_pkt_err_flags(&cqd);
-   if (ol_err_flags == 0)
+   /* A packet error means descriptor and data are untrusted */
+   packet_error = enic_cq_rx_to_pkt_err_flags(&cqd, &ol_err_flags);
+   if (!packet_error)
rx_pkt_len = enic_cq_rx_desc_n_bytes(&cqd);
else
rx_pkt_len = 0;
@@ -340,10 +330,13 @@ enic_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
rxmb->pkt_len = rx_pkt_len;
rxmb->data_len = rx_pkt_len;
rxmb->port = enic->port_id;
-   rxmb->packet_type = enic_cq_rx_flags_to_pkt_type(&cqd);
-   rxmb->ol_flags = ol_err_flags;
-   if (!ol_err_flags)
+   if (!packet_error) {
+   rxmb->packet_type = enic_cq_rx_flags_to_pkt_type(&cqd);
enic_cq_rx_to_pkt_flags(&cqd, rxmb);
+   } else {
+   rxmb->packet_type = 0;
+   rxmb->ol_flags = 0;
+   }

/* prefetch mbuf data for caller */
rte_packet_prefetch(RTE_PTR_ADD(rxmb->buf_addr,
-- 
2.7.0



[dpdk-dev] [PATCH 1/3] enic: mbuf->ol_flags could be set incorrectly

2016-03-17 Thread John Daley
In the receive path, the function to set mbuf ol_flags used the
mbuf packet_type before it was set.

Fixes: 947d860c821f ("enic: improve Rx performance")

Signed-off-by: John Daley 
---
 drivers/net/enic/enic_rx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/enic/enic_rx.c b/drivers/net/enic/enic_rx.c
index 945a60f..59ebaa4 100644
--- a/drivers/net/enic/enic_rx.c
+++ b/drivers/net/enic/enic_rx.c
@@ -210,7 +210,7 @@ enic_cq_rx_to_pkt_flags(struct cq_desc *cqd, struct 
rte_mbuf *mbuf)
ciflags = enic_cq_rx_desc_ciflags(cqrd);
bwflags = enic_cq_rx_desc_bwflags(cqrd);

-   ASSERT(mbuf->ol_flags == 0);
+   mbuf->ol_flags = 0;

/* flags are meaningless if !EOP */
if (unlikely(!enic_cq_rx_desc_eop(ciflags)))
@@ -340,10 +340,10 @@ enic_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
rxmb->pkt_len = rx_pkt_len;
rxmb->data_len = rx_pkt_len;
rxmb->port = enic->port_id;
+   rxmb->packet_type = enic_cq_rx_flags_to_pkt_type(&cqd);
rxmb->ol_flags = ol_err_flags;
if (!ol_err_flags)
enic_cq_rx_to_pkt_flags(&cqd, rxmb);
-   rxmb->packet_type = enic_cq_rx_flags_to_pkt_type(&cqd);

/* prefetch mbuf data for caller */
rte_packet_prefetch(RTE_PTR_ADD(rxmb->buf_addr,
-- 
2.7.0



[dpdk-dev] [PATCH 0/3] enic PMD receive path fixes

2016-03-17 Thread John Daley
These patches fix up some bugs in the enic receive path.

John Daley (3):
  enic: mbuf->ol_flags could be set incorrectly
  enic: handle error packets properly
  enic: small cleanup- remove a packet_error conditional

 drivers/net/enic/enic_rx.c | 53 ++
 1 file changed, 21 insertions(+), 32 deletions(-)

-- 
2.7.0



[dpdk-dev] [dpdk-dev, 1/3] rte_interrupts: add rte_eal_intr_exit to shut down IRQ thread

2016-03-17 Thread Matthew Hall
From Cunming:
> I'm trying to understand the motivation.
> 
> I don't think you're going to gracefully exit intr thread but leave all 
> other eal threads live. We don't have API to new launch intr thread again.

The doc comment added for rte_eal_intr_exit already explains this. According 
to the doc I wrote, use of the function is limited to shutting everything 
down.

> So I guess your app is using own pthread(none EAL thread), you're trying to 
> safely shutdown the whole application by your signal handler.

No, the app is using DPDK pthreads, and trying to shutdown everything safely 
and cleanly w/ its signal handler, across DPDK and many other services in the 
app.

Unfortunately, right now from my experience it is impossible to get everything 
to shut down cleanly once an interrupt thread is activated, because interrupt 
threads violate POSIX semantics:

1) It ignores EINTR and immediately forcibly restarts a poll() syscall. If the 
signal is delivered to the interrupt thread of the process by the kernel, this 
makes the thread uninterruptible to process the signal. Stuck running forever.

2) It does not properly set PTHREAD_CREATE_DETACHED for a background thread. 
So it holds the process open for its infinite loop of poll(). Stuck running 
forever.

3) There is no way to access the thread_id from intr_thread. So then you can't 
call pthread_cancel on it to shut it down. Stuck running forever.

> For this purpose, the device shall close safely(turn off intr) during the 
> time, intr thread still wait but no event will be raised.

In theory yes. In practice no. Because the intr thread violated POSIX rules 
for background processing threads per above.

> In this view, it seems not necessary to have this new. Can you explain more 
> detail for the purpose?

Based on my testing, I disagree. I could not get reliable shutdowns without 
this, or I wouldn't have coded it. (:

Matthew.


[dpdk-dev] [PATCH] vchost: Notify application of ownership change

2016-03-17 Thread Jan Kiszka
On 2016-03-17 15:42, Thomas Monjalon wrote:
> 2015-08-07 19:20, Jan Kiszka:
>> On VHOST_*_RESET_OWNER, we reinitialize the device but without telling
>> the application. That will cause crashes when it continues to invoke
>> vhost services on the device. Fix it by calling the destruction hook if
>> the device is still in use.
>>
>> Signed-off-by: Jan Kiszka 
> 
> For an unknown reason, this patch has been missed and
> another one replaced it in DPDK 2.2:
> http://dpdk.org/browse/dpdk/commit/?id=d243ecf0
> 

But the bug is fixed now - that is what matters :)

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux


[dpdk-dev] [PATCH] enic: prevent segfaults when allocating too many TX or RX queues

2016-03-17 Thread John Daley
From: Nelson Escobar 

Add checks to make sure we don't try to allocate more tx or rx queues
than we support.

Signed-off-by: Nelson Escobar 
Reviewed-by: John Daley 
---
 drivers/net/enic/enic_ethdev.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/net/enic/enic_ethdev.c b/drivers/net/enic/enic_ethdev.c
index 6f2ada5..6c3c734 100644
--- a/drivers/net/enic/enic_ethdev.c
+++ b/drivers/net/enic/enic_ethdev.c
@@ -174,6 +174,13 @@ static int enicpmd_dev_tx_queue_setup(struct rte_eth_dev 
*eth_dev,
struct enic *enic = pmd_priv(eth_dev);

ENICPMD_FUNC_TRACE();
+   if (queue_idx >= ENIC_WQ_MAX) {
+   dev_err(enic,
+   "Max number of TX queues exceeded.  Max is %d\n",
+   ENIC_WQ_MAX);
+   return -EINVAL;
+   }
+
eth_dev->data->tx_queues[queue_idx] = (void *)&enic->wq[queue_idx];

ret = enic_alloc_wq(enic, queue_idx, socket_id, nb_desc);
@@ -262,6 +269,13 @@ static int enicpmd_dev_rx_queue_setup(struct rte_eth_dev 
*eth_dev,
struct enic *enic = pmd_priv(eth_dev);

ENICPMD_FUNC_TRACE();
+   if (queue_idx >= ENIC_RQ_MAX) {
+   dev_err(enic,
+   "Max number of RX queues exceeded.  Max is %d\n",
+   ENIC_RQ_MAX);
+   return -EINVAL;
+   }
+
eth_dev->data->rx_queues[queue_idx] = (void *)&enic->rq[queue_idx];

ret = enic_alloc_rq(enic, queue_idx, socket_id, mp, nb_desc);
-- 
2.7.0



[dpdk-dev] [PATCH] enic: add missing \n to a few print statements

2016-03-17 Thread John Daley
From: Nelson Escobar 

Signed-off-by: Nelson Escobar 
Acked-by: John Daley 
---
 drivers/net/enic/enic_main.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/enic/enic_main.c b/drivers/net/enic/enic_main.c
index cd7857f..2f79cf0 100644
--- a/drivers/net/enic/enic_main.c
+++ b/drivers/net/enic/enic_main.c
@@ -342,13 +342,13 @@ enic_alloc_rx_queue_mbufs(struct enic *enic, struct 
vnic_rq *rq)
unsigned i;
dma_addr_t dma_addr;

-   dev_debug(enic, "queue %u, allocating %u rx queue mbufs", rq->index,
+   dev_debug(enic, "queue %u, allocating %u rx queue mbufs\n", rq->index,
  rq->ring.desc_count);

for (i = 0; i < rq->ring.desc_count; i++, rqd++) {
mb = rte_rxmbuf_alloc(rq->mp);
if (mb == NULL) {
-   dev_err(enic, "RX mbuf alloc failed queue_id=%u",
+   dev_err(enic, "RX mbuf alloc failed queue_id=%u\n",
(unsigned)rq->index);
return -ENOMEM;
}
@@ -388,7 +388,7 @@ enic_alloc_consistent(__rte_unused void *priv, size_t size,
rz = rte_memzone_reserve_aligned((const char *)name,
 size, SOCKET_ID_ANY, 0, ENIC_ALIGN);
if (!rz) {
-   pr_err("%s : Failed to allocate memory requested for %s",
+   pr_err("%s : Failed to allocate memory requested for %s\n",
__func__, name);
return NULL;
}
-- 
2.7.0



[dpdk-dev] [PATCH v2] testpmd: avoid only working in XEN when LIBRTE_PMD_XENVIRT is configured

2016-03-17 Thread Christian Ehrhardt
With LIBRTE_PMD_XENVIRT enabled testpmd is built in a way to ONLY work
in XEN environments.
It will surface as:
   PMD: gntalloc: ioctl error
   EAL: Error - exiting with code: 1
 Cause: Creation of mbuf pool for socket 0 failed

With LIBRTE_PMD_XENVIRT enabled this now tries the xen style grant
table allocation, but falls back gracefully to the normal allocation.

The only thing left in the log will be the
   PMD: gntalloc: ioctl error

Updates in v2
- adding missing Signed-off-by and set Pablo on --to with the patch directly

Signed-off-by: Christian Ehrhardt 
---
 app/test-pmd/testpmd.c | 33 -
 1 file changed, 16 insertions(+), 17 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 1319917..b008df3 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -410,7 +410,7 @@ mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf,
 unsigned int socket_id)
 {
char pool_name[RTE_MEMPOOL_NAMESIZE];
-   struct rte_mempool *rte_mp;
+   struct rte_mempool *rte_mp = NULL;
uint32_t mb_size;

mb_size = sizeof(struct rte_mbuf) + mbuf_seg_size;
@@ -423,24 +423,23 @@ mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf,
rte_pktmbuf_pool_init, NULL,
rte_pktmbuf_init, NULL,
socket_id, 0);
-
-
-
-#else
-   if (mp_anon != 0)
-   rte_mp = mempool_anon_create(pool_name, nb_mbuf, mb_size,
-   (unsigned) mb_mempool_cache,
-   sizeof(struct rte_pktmbuf_pool_private),
-   rte_pktmbuf_pool_init, NULL,
-   rte_pktmbuf_init, NULL,
-   socket_id, 0);
-   else
-   /* wrapper to rte_mempool_create() */
-   rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf,
-   mb_mempool_cache, 0, mbuf_seg_size, socket_id);
-
 #endif

+   /* if the former XEN allocation failed fall back to normal allocation */
+   if (rte_mp == NULL) {
+   if (mp_anon != 0)
+   rte_mp = mempool_anon_create(pool_name, nb_mbuf,
+   mb_size, (unsigned) mb_mempool_cache,
+   sizeof(struct rte_pktmbuf_pool_private),
+   rte_pktmbuf_pool_init, NULL,
+   rte_pktmbuf_init, NULL,
+   socket_id, 0);
+   else
+   /* wrapper to rte_mempool_create() */
+   rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf,
+   mb_mempool_cache, 0, mbuf_seg_size, socket_id);
+   }
+
if (rte_mp == NULL) {
rte_exit(EXIT_FAILURE, "Creation of mbuf pool for socket %u "
"failed\n", socket_id);
-- 
2.7.3



[dpdk-dev] [PATCH] enic: don't set enic->config.rq_desc_count in enic_alloc_rq()

2016-03-17 Thread John Daley
From: Nelson Escobar 

When the requested number of rx descriptors was less than the amount
configured on the vic, enic_alloc_rq() was incorrectly setting
enic->config.rq_desc_count to the lower value.  This screwed up later
calls to enic_alloc_rq().

Signed-off-by: Nelson Escobar 
Reviewed-by: John Daley 
---
 drivers/net/enic/enic_main.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/net/enic/enic_main.c b/drivers/net/enic/enic_main.c
index 9fff020..cd7857f 100644
--- a/drivers/net/enic/enic_main.c
+++ b/drivers/net/enic/enic_main.c
@@ -524,24 +524,22 @@ int enic_alloc_rq(struct enic *enic, uint16_t queue_idx,
"policy.  Applying the value in the adapter "\
"policy (%d).\n",
queue_idx, nb_desc, enic->config.rq_desc_count);
-   } else if (nb_desc != enic->config.rq_desc_count) {
-   enic->config.rq_desc_count = nb_desc;
-   dev_info(enic,
-   "RX Queues - effective number of descs:%d\n",
-   nb_desc);
+   nb_desc = enic->config.rq_desc_count;
}
+   dev_info(enic, "RX Queues - effective number of descs:%d\n",
+nb_desc);
}

/* Allocate queue resources */
rc = vnic_rq_alloc(enic->vdev, rq, queue_idx,
-   enic->config.rq_desc_count, sizeof(struct rq_enet_desc));
+   nb_desc, sizeof(struct rq_enet_desc));
if (rc) {
dev_err(enic, "error in allocation of rq\n");
goto err_exit;
}

rc = vnic_cq_alloc(enic->vdev, &enic->cq[queue_idx], queue_idx,
-   socket_id, enic->config.rq_desc_count,
+   socket_id, nb_desc,
sizeof(struct cq_enet_rq_desc));
if (rc) {
dev_err(enic, "error in allocation of cq for rq\n");
@@ -550,7 +548,7 @@ int enic_alloc_rq(struct enic *enic, uint16_t queue_idx,

/* Allocate the mbuf ring */
rq->mbuf_ring = (struct rte_mbuf **)rte_zmalloc_socket("rq->mbuf_ring",
-   sizeof(struct rte_mbuf *) * enic->config.rq_desc_count,
+   sizeof(struct rte_mbuf *) * nb_desc,
RTE_CACHE_LINE_SIZE, rq->socket_id);

if (rq->mbuf_ring != NULL)
-- 
2.7.0



[dpdk-dev] [PATCH] enic: change maintainers

2016-03-17 Thread John Daley
Change maintainers for ENIC PMD.

Signed-off-by: John Daley 
---
 doc/guides/nics/enic.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/guides/nics/enic.rst b/doc/guides/nics/enic.rst
index 2a228fd..e67c3db 100644
--- a/doc/guides/nics/enic.rst
+++ b/doc/guides/nics/enic.rst
@@ -218,4 +218,4 @@ Any questions or bugs should be reported to DPDK community 
and to the ENIC PMD
 maintainers:

 - John Daley 
-- Sujith Sankar 
+- Nelson Escobar 
-- 
2.7.0



[dpdk-dev] [PATCH] vchost: Notify application of ownership change

2016-03-17 Thread Thomas Monjalon
2015-08-07 19:20, Jan Kiszka:
> On VHOST_*_RESET_OWNER, we reinitialize the device but without telling
> the application. That will cause crashes when it continues to invoke
> vhost services on the device. Fix it by calling the destruction hook if
> the device is still in use.
> 
> Signed-off-by: Jan Kiszka 

For an unknown reason, this patch has been missed and
another one replaced it in DPDK 2.2:
http://dpdk.org/browse/dpdk/commit/?id=d243ecf0
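
For readers of the archive, a minimal sketch of the idea in the quoted
description: notify the application through its destruction hook before
the device is reinitialized on an owner reset. All names here
(demo_dev, demo_ops, demo_reset_owner) are hypothetical and are not the
librte_vhost API; the real fix is the commit linked above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the librte_vhost API. */
struct demo_dev {
    bool running;       /* set while the application is using the device */
    int destroy_calls;  /* counts notifications, for illustration only */
};

struct demo_ops {
    void (*destroy_device)(struct demo_dev *dev);
};

/* Example application hook: records that it was told about the teardown. */
static void demo_destroy(struct demo_dev *dev)
{
    dev->destroy_calls++;
}

/* On an owner reset, tell the application first, then reinitialize. */
static void demo_reset_owner(struct demo_dev *dev, const struct demo_ops *ops)
{
    if (dev->running && ops != NULL && ops->destroy_device != NULL) {
        ops->destroy_device(dev);
        dev->running = false;
    }
    /* ... reinitialize device state here ... */
}
```

Calling demo_reset_owner() twice on a running device invokes the hook
only once, because the first call clears the running flag.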


[dpdk-dev] Fwd: EAL: map_all_hugepages(): mmap failed: Cannot allocate memory

2016-03-17 Thread John Wei
I am setting up OVS inside a Linux container. This OVS is built using the
DPDK library.
During the startup of ovs-vswitchd, it dumped core because an mmap call
failed in eal_memory.c:
   virtaddr = mmap(vma_addr, hugepage_sz, PROT_READ | PROT_WRITE,
MAP_SHARED, fd, 0);

This call is made inside a for loop that iterates over all the pages and
maps them.
My server has two cores, and I allocated 8192 2MB pages.
The mmap calls for the first 4096 pages were successful. It failed when
trying to map the 4096th page.

Can someone help me understand why the mmap calls for the first 4096 pages
were successful and why it failed on the 4096th page?


John



ovs-vswitchd --dpdk -c 0x1 -n 4 -l 1 --file-prefix ct- --socket-mem
128,128 -- unix:$DB_SOCK --pidfile --detach --log-file=ct.log


EAL: Detected lcore 23 as core 5 on socket 1
EAL: Support maximum 128 logical core(s) by configuration.
EAL: Detected 24 lcore(s)
EAL: No free hugepages reported in hugepages-1048576kB
EAL: VFIO modules not all loaded, skip VFIO support...
EAL: Setting up physically contiguous memory...
EAL: map_all_hugepages(): mmap failed: Cannot allocate memory
EAL: Failed to mmap 2 MB hugepages
PANIC in rte_eal_init():
Cannot init memory
7: [ovs-vswitchd() [0x411f15]]
6: [/lib64/libc.so.6(__libc_start_main+0xf5) [0x7ff5f6133b15]]
5: [ovs-vswitchd() [0x4106f9]]
4: [ovs-vswitchd() [0x66917d]]
3: [ovs-vswitchd() [0x42b6f5]]
2: [ovs-vswitchd() [0x40dd8c]]
1: [ovs-vswitchd() [0x56b3ba]]
Aborted (core dumped)
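
For reference, the loop described in this report has roughly the shape
below. This is a sketch only: it maps anonymous memory instead of
hugetlbfs files (an assumption made so that it runs anywhere), and the
helper names map_regions and unmap_regions are hypothetical. What it
shows is how to report the index and errno of the first failing mmap
call, which is the information needed to narrow down such a failure.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std modes */
#endif
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Map `count` regions of `len` bytes each, one mmap() call per region,
 * and return how many succeeded. Anonymous memory stands in for the
 * hugepage file descriptors used by the EAL.
 */
static size_t map_regions(void **addrs, size_t count, size_t len)
{
    size_t i;

    for (i = 0; i < count; i++) {
        void *va = mmap(NULL, len, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (va == MAP_FAILED) {
            fprintf(stderr, "mmap failed at index %zu: %s\n",
                    i, strerror(errno));
            break;
        }
        addrs[i] = va;
    }
    return i;
}

/* Release the regions obtained from map_regions(). */
static void unmap_regions(void **addrs, size_t count, size_t len)
{
    for (size_t i = 0; i < count; i++)
        munmap(addrs[i], len);
}
```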


[dpdk-dev] [PATCH v8 4/4] ena: DPDK polling-mode driver for Amazon Elastic Network Adapters (ENA)

2016-03-17 Thread Jan Medala
This is a PMD for the Amazon ethernet ENA family.
The driver operates a variety of ENA adapters through feature negotiation
with the adapter and an upgradable command set.
ENA driver handles PCI Physical and Virtual ENA functions.

Signed-off-by: Evgeny Schemeilin 
Signed-off-by: Jan Medala 
Signed-off-by: Jakub Palider 
---
 config/common_base  |   10 +
 drivers/net/Makefile|1 +
 drivers/net/ena/Makefile|   61 ++
 drivers/net/ena/ena_ethdev.c| 1445 +++
 drivers/net/ena/ena_ethdev.h|  160 
 drivers/net/ena/ena_logs.h  |   70 ++
 drivers/net/ena/ena_platform.h  |   59 ++
 drivers/net/ena/rte_pmd_ena_version.map |4 +
 mk/rte.app.mk   |1 +
 9 files changed, 1811 insertions(+)
 create mode 100644 drivers/net/ena/Makefile
 create mode 100644 drivers/net/ena/ena_ethdev.c
 create mode 100644 drivers/net/ena/ena_ethdev.h
 create mode 100644 drivers/net/ena/ena_logs.h
 create mode 100644 drivers/net/ena/ena_platform.h
 create mode 100644 drivers/net/ena/rte_pmd_ena_version.map

diff --git a/config/common_base b/config/common_base
index dbd405b..7d5e956 100644
--- a/config/common_base
+++ b/config/common_base
@@ -135,6 +135,16 @@ CONFIG_RTE_ETHDEV_RXTX_CALLBACKS=y
 CONFIG_RTE_NIC_BYPASS=n

 #
+# Compile burst-oriented Amazon ENA PMD driver
+#
+CONFIG_RTE_LIBRTE_ENA_PMD=y
+CONFIG_RTE_LIBRTE_ENA_DEBUG_RX=n
+CONFIG_RTE_LIBRTE_ENA_DEBUG_TX=n
+CONFIG_RTE_LIBRTE_ENA_DEBUG_TX_FREE=n
+CONFIG_RTE_LIBRTE_ENA_DEBUG_DRIVER=n
+CONFIG_RTE_LIBRTE_ENA_COM_DEBUG=n
+
+#
 # Compile burst-oriented IGB & EM PMD drivers
 #
 CONFIG_RTE_LIBRTE_EM_PMD=y
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 0c3393f..612e85e 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -36,6 +36,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_BNX2X_PMD) += bnx2x
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += bonding
 DIRS-$(CONFIG_RTE_LIBRTE_CXGBE_PMD) += cxgbe
 DIRS-$(CONFIG_RTE_LIBRTE_E1000_PMD) += e1000
+DIRS-$(CONFIG_RTE_LIBRTE_ENA_PMD) += ena
 DIRS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic
 DIRS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k
 DIRS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e
diff --git a/drivers/net/ena/Makefile b/drivers/net/ena/Makefile
new file mode 100644
index 0000000..ac2b55d
--- /dev/null
+++ b/drivers/net/ena/Makefile
@@ -0,0 +1,61 @@
+#
+# BSD LICENSE
+#
+# Copyright (c) 2015-2016 Amazon.com, Inc. or its affiliates.
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+#
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in
+# the documentation and/or other materials provided with the
+# distribution.
+# * Neither the name of copyright holder nor the names of its
+# contributors may be used to endorse or promote products derived
+# from this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+#
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_ena.a
+CFLAGS += $(WERROR_FLAGS) -O2
+INCLUDES :=-I$(SRCDIR) -I$(SRCDIR)/base/ena_defs -I$(SRCDIR)/base
+
+EXPORT_MAP := rte_pmd_ena_version.map
+LIBABIVER := 1
+
+VPATH += $(SRCDIR)/base
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_ENA_PMD) += ena_ethdev.c
+SRCS-$(CONFIG_RTE_LIBRTE_ENA_PMD) += ena_com.c
+SRCS-$(CONFIG_RTE_LIBRTE_ENA_PMD) += ena_eth_com.c
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_ENA_PMD) += lib/librte_eal lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_ENA_PMD) += lib/librte_mempool lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_ENA_PMD) += lib/librte_net lib/librte_malloc
+
+CFLAGS += $(INCLUDES)
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
new file mode 100644
index 0000000..325c513
--- /dev/null
+++ b/drivers/net/ena/ena_ethdev.c
@@ -0,0 +1,1445 @@
+/*-
+* BSD LICENSE
+*
+* Copyright 

[dpdk-dev] [PATCH v8 3/4] ena: Amazon ENA communication layer for DPDK platform

2016-03-17 Thread Jan Medala
Implementation of platform specific code for ENA communication layer.

Signed-off-by: Evgeny Schemeilin 
Signed-off-by: Jan Medala 
Signed-off-by: Jakub Palider 
---
 drivers/net/ena/base/ena_plat_dpdk.h | 217 +++
 1 file changed, 217 insertions(+)
 create mode 100644 drivers/net/ena/base/ena_plat_dpdk.h

diff --git a/drivers/net/ena/base/ena_plat_dpdk.h b/drivers/net/ena/base/ena_plat_dpdk.h
new file mode 100644
index 0000000..3ddc5c2
--- /dev/null
+++ b/drivers/net/ena/base/ena_plat_dpdk.h
@@ -0,0 +1,217 @@
+/*-
+* BSD LICENSE
+*
+* Copyright (c) 2015-2016 Amazon.com, Inc. or its affiliates.
+* All rights reserved.
+*
+* Redistribution and use in source and binary forms, with or without
+* modification, are permitted provided that the following conditions
+* are met:
+*
+* * Redistributions of source code must retain the above copyright
+* notice, this list of conditions and the following disclaimer.
+* * Redistributions in binary form must reproduce the above copyright
+* notice, this list of conditions and the following disclaimer in
+* the documentation and/or other materials provided with the
+* distribution.
+* * Neither the name of copyright holder nor the names of its
+* contributors may be used to endorse or promote products derived
+* from this software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+*/
+
+#ifndef DPDK_ENA_COM_ENA_PLAT_DPDK_H_
+#define DPDK_ENA_COM_ENA_PLAT_DPDK_H_
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+typedef uint64_t u64;
+typedef uint32_t u32;
+typedef uint16_t u16;
+typedef uint8_t u8;
+
+typedef uint64_t dma_addr_t;
+
+#define ena_atomic32_t rte_atomic32_t
+#define ena_mem_handle_t void *
+
+#define SZ_256 (256)
+#define SZ_4K (4096)
+
+#define ENA_COM_OK 0
+#define ENA_COM_NO_MEM -ENOMEM
+#define ENA_COM_INVAL  -EINVAL
+#define ENA_COM_NO_SPACE   -ENOSPC
+#define ENA_COM_NO_DEVICE  -ENODEV
+#define ENA_COM_PERMISSION -EPERM
+#define ENA_COM_TIMER_EXPIRED  -ETIME
+#define ENA_COM_FAULT  -EFAULT
+
+#define cacheline_aligned __rte_cache_aligned
+
+#define ENA_ABORT() abort()
+
+#define ENA_MSLEEP(x) rte_delay_ms(x)
+#define ENA_UDELAY(x) rte_delay_us(x)
+
+#define memcpy_toio memcpy
+#define wmb rte_wmb
+#define rmb rte_rmb
+#define mb rte_mb
+#define __iomem
+
+#define US_PER_S 1000000
+#define ENA_GET_SYSTEM_USECS() \
+   (rte_get_timer_cycles() * US_PER_S / rte_get_timer_hz())
+
+#define ENA_ASSERT(cond, format, arg...)   \
+   do {\
+   if (unlikely(!(cond))) {\
+   printf("Assertion failed on %s:%s:%d: " format, \
+   __FILE__, __func__, __LINE__, ##arg);   \
+   rte_exit(EXIT_FAILURE, "ASSERTION FAILED\n");   \
+   }   \
+   } while (0)
+
+#define ENA_MAX32(x, y) RTE_MAX((x), (y))
+#define ENA_MAX16(x, y) RTE_MAX((x), (y))
+#define ENA_MAX8(x, y) RTE_MAX((x), (y))
+#define ENA_MIN32(x, y) RTE_MIN((x), (y))
+#define ENA_MIN16(x, y) RTE_MIN((x), (y))
+#define ENA_MIN8(x, y) RTE_MIN((x), (y))
+
+#define U64_C(x) x ## ULL
+#define BIT(nr) (1UL << (nr))
+#define BITS_PER_LONG  (__SIZEOF_LONG__ * 8)
+#define GENMASK(h, l)  (((~0UL) << (l)) & (~0UL >> (BITS_PER_LONG - 1 - (h))))
+#define GENMASK_ULL(h, l) (((U64_C(1) << ((h) - (l) + 1)) - 1) << (l))
+
+#ifdef RTE_LIBRTE_ENA_COM_DEBUG
+#define ena_trc_dbg(format, arg...)\
+   RTE_LOG(DEBUG, PMD, "[ENA_COM: %s] " format, __func__, ##arg)
+#define ena_trc_info(format, arg...)   \
+   RTE_LOG(INFO, PMD, "[ENA_COM: %s] " format, __func__, ##arg)
+#define ena_trc_warn(format, arg...)   \
+   RTE_LOG(ERR, PMD, "[ENA_COM: %s] " format, __func__, ##arg)
+#define ena_trc_err(format, arg...)\
+   RTE_LOG(ERR, PMD, 

[dpdk-dev] [PATCH v8 2/4] ena: Amazon ENA communication layer

2016-03-17 Thread Jan Medala
Low level common abstraction for ENA device communication.

Signed-off-by: Netanel Belgazal 
Signed-off-by: Jan Medala 
Signed-off-by: Jakub Palider 
---
 drivers/net/ena/base/ena_com.c  | 2809 +++
 drivers/net/ena/base/ena_com.h  | 1052 +
 drivers/net/ena/base/ena_defs/ena_admin_defs.h  | 1979 
 drivers/net/ena/base/ena_defs/ena_common_defs.h |   54 +
 drivers/net/ena/base/ena_defs/ena_eth_io_defs.h | 1488 
 drivers/net/ena/base/ena_defs/ena_gen_info.h|   35 +
 drivers/net/ena/base/ena_defs/ena_includes.h|   39 +
 drivers/net/ena/base/ena_defs/ena_regs_defs.h   |  135 ++
 drivers/net/ena/base/ena_eth_com.c  |  508 
 drivers/net/ena/base/ena_eth_com.h  |  153 ++
 drivers/net/ena/base/ena_plat.h |   51 +
 11 files changed, 8303 insertions(+)
 create mode 100644 drivers/net/ena/base/ena_com.c
 create mode 100644 drivers/net/ena/base/ena_com.h
 create mode 100644 drivers/net/ena/base/ena_defs/ena_admin_defs.h
 create mode 100644 drivers/net/ena/base/ena_defs/ena_common_defs.h
 create mode 100644 drivers/net/ena/base/ena_defs/ena_eth_io_defs.h
 create mode 100644 drivers/net/ena/base/ena_defs/ena_gen_info.h
 create mode 100644 drivers/net/ena/base/ena_defs/ena_includes.h
 create mode 100644 drivers/net/ena/base/ena_defs/ena_regs_defs.h
 create mode 100644 drivers/net/ena/base/ena_eth_com.c
 create mode 100644 drivers/net/ena/base/ena_eth_com.h
 create mode 100644 drivers/net/ena/base/ena_plat.h

diff --git a/drivers/net/ena/base/ena_com.c b/drivers/net/ena/base/ena_com.c
new file mode 100644
index 0000000..c7355eb
--- /dev/null
+++ b/drivers/net/ena/base/ena_com.c
@@ -0,0 +1,2809 @@
+/*-
+* BSD LICENSE
+*
+* Copyright (c) 2015-2016 Amazon.com, Inc. or its affiliates.
+* All rights reserved.
+*
+* Redistribution and use in source and binary forms, with or without
+* modification, are permitted provided that the following conditions
+* are met:
+*
+* * Redistributions of source code must retain the above copyright
+* notice, this list of conditions and the following disclaimer.
+* * Redistributions in binary form must reproduce the above copyright
+* notice, this list of conditions and the following disclaimer in
+* the documentation and/or other materials provided with the
+* distribution.
+* * Neither the name of copyright holder nor the names of its
+* contributors may be used to endorse or promote products derived
+* from this software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+*/
+
+#include "ena_com.h"
+
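+/*****************************************************************************/
+/*****************************************************************************/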
+
+/* Timeout in micro-sec */
+#define ADMIN_CMD_TIMEOUT_US (100)
+
+#define ENA_ASYNC_QUEUE_DEPTH 4
+#define ENA_ADMIN_QUEUE_DEPTH 32
+
+#define ENA_EXTENDED_STAT_GET_FUNCT(_funct_queue) (_funct_queue & 0xFFFF)
+#define ENA_EXTENDED_STAT_GET_QUEUE(_funct_queue) (_funct_queue >> 16)
+
+#define MIN_ENA_VER (((ENA_COMMON_SPEC_VERSION_MAJOR) << \
+   ENA_REGS_VERSION_MAJOR_VERSION_SHIFT) \
+   | (ENA_COMMON_SPEC_VERSION_MINOR))
+
+#define ENA_CTRL_MAJOR 0
+#define ENA_CTRL_MINOR 0
+#define ENA_CTRL_SUB_MINOR 1
+
+#define MIN_ENA_CTRL_VER \
+   (((ENA_CTRL_MAJOR) << \
+   (ENA_REGS_CONTROLLER_VERSION_MAJOR_VERSION_SHIFT)) | \
+   ((ENA_CTRL_MINOR) << \
+   (ENA_REGS_CONTROLLER_VERSION_MINOR_VERSION_SHIFT)) | \
+   (ENA_CTRL_SUB_MINOR))
+
+#define ENA_DMA_ADDR_TO_UINT32_LOW(x)  ((u32)((u64)(x)))
+#define ENA_DMA_ADDR_TO_UINT32_HIGH(x) ((u32)(((u64)(x)) >> 32))
+
+#define ENA_MMIO_READ_TIMEOUT 0x
+
+static int ena_alloc_cnt;
+
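+/*****************************************************************************/
+/*****************************************************************************/
+/*****************************************************************************/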
+
+enum ena_cmd_status {
+   ENA_CMD_SUBMITTED,
+   ENA_CMD_COMPLETED,
+   /* Abort - canceled by the driver */
+   ENA_CMD_ABORTED,
+};
+
+struct ena_comp_ctx {
+   ena_wait_event_t 

[dpdk-dev] [PATCH v8 1/4] ena: Amazon ENA documentation

2016-03-17 Thread Jan Medala
Signed-off-by: Alexander Matushevsky 
Signed-off-by: Jan Medala 
Signed-off-by: Jakub Palider 
---
 MAINTAINERS  |   8 ++
 doc/guides/nics/ena.rst  | 251 +++
 doc/guides/nics/index.rst|   1 +
 doc/guides/nics/overview.rst | 116 ++--
 4 files changed, 318 insertions(+), 58 deletions(-)
 create mode 100644 doc/guides/nics/ena.rst

diff --git a/MAINTAINERS b/MAINTAINERS
index 8b21979..5052456 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -261,6 +261,14 @@ Linux AF_PACKET
 M: John W. Linville 
 F: drivers/net/af_packet/

+Amazon ena
+M: Jan Medala 
+M: Jakub Palider 
+M: Netanel Belgazal 
+M: Evgeny Schemeilin 
+F: drivers/net/ena/
+F: doc/guides/nics/ena.rst
+
 Chelsio cxgbe
 M: Rahul Lakkireddy 
 F: drivers/net/cxgbe/
diff --git a/doc/guides/nics/ena.rst b/doc/guides/nics/ena.rst
new file mode 100644
index 0000000..9f93848
--- /dev/null
+++ b/doc/guides/nics/ena.rst
@@ -0,0 +1,251 @@
+.. BSD LICENSE
+
+Copyright (c) 2015-2016 Amazon.com, Inc. or its affiliates.
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+
+* Redistributions of source code must retain the above copyright
+notice, this list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in
+the documentation and/or other materials provided with the
+distribution.
+* Neither the name of Amazon.com, Inc. nor the names of its
+contributors may be used to endorse or promote products derived
+from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ENA Poll Mode Driver
+====================
+
+The ENA PMD is a DPDK poll-mode driver for the Amazon Elastic
+Network Adapter (ENA) family.
+
+Overview
+--------
+
+The ENA driver exposes a lightweight management interface with a
+minimal set of memory mapped registers and an extendable command set
+through an Admin Queue.
+
+The driver supports a wide range of ENA adapters, is link-speed
+independent (i.e., the same driver is used for 10GbE, 25GbE, 40GbE,
+etc.), and it negotiates and supports an extendable feature set.
+
+ENA adapters allow high speed and low overhead Ethernet traffic
+processing by providing a dedicated Tx/Rx queue pair per CPU core.
+
+The ENA driver supports industry standard TCP/IP offload features such
+as checksum offload and TCP transmit segmentation offload (TSO).
+
+Receive-side scaling (RSS) is supported for multi-core scaling.
+
+Some of the ENA devices support a working mode called Low-latency
+Queue (LLQ), which saves several more microseconds.
+
+Management Interface
+--------------------
+
+ENA management interface is exposed by means of:
+
+* Device Registers
+* Admin Queue (AQ) and Admin Completion Queue (ACQ)
+
+ENA device memory-mapped PCIe space for registers (MMIO registers)
+are accessed only during driver initialization and are not involved
+in further normal device operation.
+
+AQ is used for submitting management commands, and the
+results/responses are reported asynchronously through ACQ.
+
+ENA introduces a very small set of management commands with room for
+vendor-specific extensions. Most of the management operations are
+framed in a generic Get/Set feature command.
+
+The following admin queue commands are supported:
+
+* Create I/O submission queue
+* Create I/O completion queue
+* Destroy I/O submission queue
+* Destroy I/O completion queue
+* Get feature
+* Set feature
+* Get statistics
+
+Refer to ``ena_admin_defs.h`` for the list of supported Get/Set Feature
+properties.
+
+Data Path Interface
+-------------------
+
+I/O operations are based on Tx and Rx Submission Queues (Tx SQ and Rx
+SQ correspondingly). Each SQ has a completion queue (CQ) associated
+with it.
+
+The SQs and CQs are implemented as descriptor rings in contiguous
+physical memory.
+
+Refer to ``ena_eth_io_defs.h`` for the detailed structure of the descriptor
+
+The driver supports multi-queue for both Tx and 

[dpdk-dev] [PATCH v8 0/4] DPDK polling-mode driver for Amazon Elastic Network Adapters (ENA)

2016-03-17 Thread Jan Medala
v3:
Additional features for Amazon ENA:
* Low Latency Queue (LLQ) for Tx
* RSS
v4:
* Improved doc
* Improved style according to checkpatch script
* Fixed build problems on: i686, clang, +shared, +debug
v5:
* Removed 'cvos' environment code from ena Makefile
* Driver symbol version fixed to DPDK_16.04
* Max MTU is read from device attributes
v6:
* Updated ENA communication layer
* Added check if DPDK queue size is supported by device
* Checkpatch results: 6 warns >80, 0 warns >90, no whitespace issues
* defined likely/unlikely (can compile with ARM toolchain)
* Updated doc/guides/nics/overview.rst w/ ENA
* Removed mentioned #pragma for "-Wcast-qual"
v7:
* Resolved Thomas's comments:
  - included  instead of own definition of
likely/unlikely
  - used RTE_MIN/RTE_MAX macros
v8:
* Fixed init (error) logging to be always available

Jan Medala (4):
  ena: Amazon ENA documentation
  ena: Amazon ENA communication layer
  ena: Amazon ENA communication layer for DPDK platform
  ena: DPDK polling-mode driver for Amazon Elastic Network Adapters
(ENA)

 MAINTAINERS |8 +
 config/common_base  |   10 +
 doc/guides/nics/ena.rst |  251 ++
 doc/guides/nics/index.rst   |1 +
 doc/guides/nics/overview.rst|  116 +-
 drivers/net/Makefile|1 +
 drivers/net/ena/Makefile|   61 +
 drivers/net/ena/base/ena_com.c  | 2809 +++
 drivers/net/ena/base/ena_com.h  | 1052 +
 drivers/net/ena/base/ena_defs/ena_admin_defs.h  | 1979 
 drivers/net/ena/base/ena_defs/ena_common_defs.h |   54 +
 drivers/net/ena/base/ena_defs/ena_eth_io_defs.h | 1488 
 drivers/net/ena/base/ena_defs/ena_gen_info.h|   35 +
 drivers/net/ena/base/ena_defs/ena_includes.h|   39 +
 drivers/net/ena/base/ena_defs/ena_regs_defs.h   |  135 ++
 drivers/net/ena/base/ena_eth_com.c  |  508 
 drivers/net/ena/base/ena_eth_com.h  |  153 ++
 drivers/net/ena/base/ena_plat.h |   51 +
 drivers/net/ena/base/ena_plat_dpdk.h|  217 ++
 drivers/net/ena/ena_ethdev.c| 1445 
 drivers/net/ena/ena_ethdev.h|  160 ++
 drivers/net/ena/ena_logs.h  |   70 +
 drivers/net/ena/ena_platform.h  |   59 +
 drivers/net/ena/rte_pmd_ena_version.map |4 +
 mk/rte.app.mk   |1 +
 25 files changed, 10649 insertions(+), 58 deletions(-)
 create mode 100644 doc/guides/nics/ena.rst
 create mode 100644 drivers/net/ena/Makefile
 create mode 100644 drivers/net/ena/base/ena_com.c
 create mode 100644 drivers/net/ena/base/ena_com.h
 create mode 100644 drivers/net/ena/base/ena_defs/ena_admin_defs.h
 create mode 100644 drivers/net/ena/base/ena_defs/ena_common_defs.h
 create mode 100644 drivers/net/ena/base/ena_defs/ena_eth_io_defs.h
 create mode 100644 drivers/net/ena/base/ena_defs/ena_gen_info.h
 create mode 100644 drivers/net/ena/base/ena_defs/ena_includes.h
 create mode 100644 drivers/net/ena/base/ena_defs/ena_regs_defs.h
 create mode 100644 drivers/net/ena/base/ena_eth_com.c
 create mode 100644 drivers/net/ena/base/ena_eth_com.h
 create mode 100644 drivers/net/ena/base/ena_plat.h
 create mode 100644 drivers/net/ena/base/ena_plat_dpdk.h
 create mode 100644 drivers/net/ena/ena_ethdev.c
 create mode 100644 drivers/net/ena/ena_ethdev.h
 create mode 100644 drivers/net/ena/ena_logs.h
 create mode 100644 drivers/net/ena/ena_platform.h
 create mode 100644 drivers/net/ena/rte_pmd_ena_version.map

-- 
2.7.3



[dpdk-dev] [PATCH v7 4/4] ena: DPDK polling-mode driver for Amazon Elastic Network Adapters (ENA)

2016-03-17 Thread Jan Mędala
2016-03-17 14:57 GMT+01:00 Thomas Monjalon :

> 2016-03-17 14:48, Jan Mędala:
> > >
> > > >  #
> > > > +# Compile burst-oriented Amazon ENA PMD driver
> > > > +#
> > > > +CONFIG_RTE_LIBRTE_ENA_PMD=y
> > > > +CONFIG_RTE_LIBRTE_ENA_DEBUG_INIT=y
> > >
> > > Do you really want initialization debugging to be on by default?
> > > Normally, we keep all debug options disabled.
> >
> > This is actually error logging, so it's silent for the user until
> > something goes wrong with initialization.
> > Do you want me to rename it to reflect its role more accurately?
>
> There should not be any option at all to disable error logging.
>
OK, I'm going to fix that.


[dpdk-dev] [PATCH] eal_interrupts.c: properly init struct epoll_event (valgrind)

2016-03-17 Thread Thomas Monjalon
Hi Stephen,

Please, could you turn it into a real patch with your sign-off?
Thanks

2016-02-14 12:22, Stephen Hemminger:
> A better patch would be to move the data structure into the
> code block used, and get rid of the useless else (rte_panic never returns);
> and fix the indentation, and use C99 initialization which should make valgrind
> happier.
> 
> The moral is don't just slap memsets around
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
> index 06b26a9..d53826e 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
> @@ -799,8 +799,6 @@ eal_intr_handle_interrupts(int pfd, unsigned totalfds)
>  static __attribute__((noreturn)) void *
>  eal_intr_thread_main(__rte_unused void *arg)
>  {
> - struct epoll_event ev;
> -
>   /* host thread, never break out */
>   for (;;) {
>   /* build up the epoll fd with all descriptors we are to
> @@ -834,20 +832,22 @@ eal_intr_thread_main(__rte_unused void *arg)
>   TAILQ_FOREACH(src, &intr_sources, next) {
>   if (src->callbacks.tqh_first == NULL)
>   continue; /* skip those with no callbacks */
> - ev.events = EPOLLIN | EPOLLPRI;
> - ev.data.fd = src->intr_handle.fd;
> +
> + struct epoll_event ev = {
> + .events = EPOLLIN | EPOLLPRI,
> + .data.fd = src->intr_handle.fd,
> + };
>  
>   /**
>* add all the uio device file descriptor
>* into wait list.
>*/
>   if (epoll_ctl(pfd, EPOLL_CTL_ADD,
> - src->intr_handle.fd, &ev) < 0){
> + src->intr_handle.fd, &ev) < 0)
>   rte_panic("Error adding fd %d epoll_ctl, %s\n",
>   src->intr_handle.fd, strerror(errno));
> - }
> - else
> - numfds++;
> +
> + numfds++;
>   }
>   rte_spinlock_unlock(&intr_lock);
>   /* serve the interrupt */
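
As a standalone illustration of the C99 initialization suggested in the
quoted diff (a sketch, not the EAL code; add_fd is a hypothetical
helper): struct members that are not named in a designated initializer
are zero-initialized by the language, so no separate memset is needed
for them. Whether the bytes of the data union beyond the fd member are
also cleared is not something this sketch relies on.

```c
#include <sys/epoll.h>

/*
 * Register fd with an epoll instance. The event is declared in the
 * block where it is used and filled with a designated initializer;
 * struct members not named here are implicitly zero-initialized.
 */
static int add_fd(int epfd, int fd)
{
    struct epoll_event ev = {
        .events = EPOLLIN | EPOLLPRI,
        .data.fd = fd,
    };

    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}
```

add_fd() returns the epoll_ctl() result unchanged, so the caller decides
how to handle a failure (the quoted diff calls rte_panic).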




[dpdk-dev] [PATCH v4 1/2] eal/tile: add rte_vect.h and enable CONFIG_RTE_LIBRTE_LPM

2016-03-17 Thread Thomas Monjalon
Any news? a v5 could be part of the RC2.

2016-03-08 20:59, Thomas Monjalon:
> 2016-02-09 23:04, Liming Sun:
> > rte_vect.h was missing earlier, so LPM was disabled and l3fwd could
> > not compile. This commit implements the vector API and
> > enables LPM in the tilegx configuration by default.
> > 
> > Signed-off-by: Liming Sun 
> > Acked-by: Zhigang Lu 
> [...]
> >  # This following libraries are not available on the tile architecture.
> >  # So they're turned off.
> > -CONFIG_RTE_LIBRTE_LPM=n
> > +CONFIG_RTE_LIBRTE_LPM=y
> 
> You just have to remove the disabling line.
> 
> > +typedef union rte_xmm {
> > +   __m128i x;
> > +   uint32_t u32[XMM_SIZE / sizeof(uint32_t)];
> > +   uint64_t u64[XMM_SIZE / sizeof(uint64_t)];
> > +} rte_xmm_t;
> 
> Why do you mimic SSE?
> 
> > +/* Shifts right the 4 32-bit integers by count bits with zeros. */
> > +#define _mm_srli_epi32(v, cnt) ({  \
> > +   rte_xmm_t m; \
> > +   m.u64[0] = __insn_v4shru(((rte_xmm_t*)&(v))->u64[0], cnt); \
> > +   m.u64[1] = __insn_v4shru(((rte_xmm_t*)&(v))->u64[1], cnt); \
> > +   (m.x);   \
> > +})
> 
> Please check the work in progress to have arch-specific implementation
> of rte_lpm_lookupx4():
>   http://dpdk.org/dev/patchwork/patch/10478/
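
For comparison, a portable C reference for what the quoted
_mm_srli_epi32 emulation computes: a logical right shift applied
independently to each of four 32-bit lanes. The type and function names
(demo_xmm_t, demo_srli_epi32) are hypothetical; the quoted macro does
the same work two lanes at a time with the tile __insn_v4shru intrinsic.

```c
#include <stdint.h>

/* Hypothetical 128-bit container: four 32-bit lanes over two 64-bit words. */
typedef union demo_xmm {
    uint32_t u32[4];
    uint64_t u64[2];
} demo_xmm_t;

/* Shift each of the four 32-bit lanes right by cnt bits, filling with zeros. */
static demo_xmm_t demo_srli_epi32(demo_xmm_t v, unsigned int cnt)
{
    demo_xmm_t r;

    for (int i = 0; i < 4; i++)
        r.u32[i] = (cnt < 32) ? (v.u32[i] >> cnt) : 0; /* avoid UB for cnt >= 32 */
    return r;
}
```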




[dpdk-dev] [PATCH v7 4/4] ena: DPDK polling-mode driver for Amazon Elastic Network Adapters (ENA)

2016-03-17 Thread Thomas Monjalon
2016-03-17 14:48, Jan Mędala:
> >
> > >  #
> > > +# Compile burst-oriented Amazon ENA PMD driver
> > > +#
> > > +CONFIG_RTE_LIBRTE_ENA_PMD=y
> > > +CONFIG_RTE_LIBRTE_ENA_DEBUG_INIT=y
> >
> > Do you really want initialization debugging to be on by default? Normally,
> > we keep all debug options disabled.
> 
> This is actually error logging, so it's silent for the user until
> something goes wrong with initialization.
> Do you want me to rename it to reflect its role more accurately?

There should not be any option at all to disable error logging.
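
A minimal sketch of the split being asked for, with hypothetical names
(demo_log, DEMO_LOG_ERR, DEMO_LOG_DBG, DEMO_DEBUG_INIT are not the
macros from the ENA patch): error logging is always compiled in, and
only debug logging depends on a build-time option.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical logger: prints a level prefix, then the formatted message. */
static int demo_log(const char *level, const char *fmt, ...)
{
    va_list ap;
    int n;

    fprintf(stderr, "PMD: %s: ", level);
    va_start(ap, fmt);
    n = vfprintf(stderr, fmt, ap);
    va_end(ap);
    return n; /* number of characters written for the message body */
}

/* Error logging is unconditional: no build option can compile it out. */
#define DEMO_LOG_ERR(...) demo_log("ERR", __VA_ARGS__)

/* Debug logging is opt-in via a build-time flag and otherwise a no-op. */
#ifdef DEMO_DEBUG_INIT
#define DEMO_LOG_DBG(...) ((void)demo_log("DEBUG", __VA_ARGS__))
#else
#define DEMO_LOG_DBG(...) ((void)0)
#endif
```

With this layout a default build stays quiet during normal
initialization but still reports failures.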


[dpdk-dev] [PATCH v7 4/4] ena: DPDK polling-mode driver for Amazon Elastic Network Adapters (ENA)

2016-03-17 Thread Jan Mędala
>
> >  #
> > +# Compile burst-oriented Amazon ENA PMD driver
> > +#
> > +CONFIG_RTE_LIBRTE_ENA_PMD=y
> > +CONFIG_RTE_LIBRTE_ENA_DEBUG_INIT=y
>
> Do you really want initialization debugging to be on by default? Normally,
> we keep all debug options disabled.

This is actually error logging, so it's silent for the user until
something goes wrong with initialization.
Do you want me to rename it to reflect its role more accurately?

  Jan


[dpdk-dev] Reg: promiscuous mode on VF

2016-03-17 Thread bharath paulraj
Hi Lu, Helin, Greg,

  Many thanks for your really quick response. Now, if I want to
implement L2 bridging with Intel virtualization technologies using the 82599
controller, then Michael is my only hope, as getting the new kernel
versions and upstream support will take a considerable amount of time.

   Michael, could you please share your experience with L2 bridging using
Intel virtualization technologies?

Thanks,
Bharath

On Wed, Mar 16, 2016 at 9:40 PM, Rose, Gregory V 
wrote:

> Intel has not supported promiscuous mode for virtual functions due to the
> security concerns mentioned below.
>
> There will be upstream support in an upcoming Linux kernel for setting
> virtual functions as "trusted" and when that is available then Intel will
> allow virtual functions to enter unicast promiscuous mode on those Ethernet
> controllers that support promiscuous mode for virtual functions in the
> HW/FW.  Be aware that not all Intel Ethernet controllers have support for
> unicast promiscuous mode for virtual functions.  The only currently
> released product that does is the X710/XL710.
>
> The key take away is that unicast promiscuous mode for X710/XL710 virtual
> functions requires Linux kernel support, iproute2 package support and
> driver support.  Only when all three of these are in place will the feature
> work.
>
> Thanks,
>
> - Greg
>
> -Original Message-
> From: Zhang, Helin
> Sent: Wednesday, March 16, 2016 9:04 AM
> To: bharath paulraj ; Lu, Wenzhuo <
> wenzhuo.lu at intel.com>; Rowden, Aaron F ; Rose,
> Gregory V 
> Cc: dev at dpdk.org; Qiu, Michael ; Jayakumar,
> Muthurajan 
> Subject: RE: [dpdk-dev] Reg: promiscuous mode on VF
>
> Hi Bharath
>
> For your question of "why Intel does not support unicast promiscuous
> mode?", I'd ask Aaron or Greg to give answers.
> Thank you very much!
>
> Regards,
> Helin
>
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of bharath paulraj
> > Sent: Wednesday, March 16, 2016 11:29 PM
> > To: Lu, Wenzhuo
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] Reg: promiscuous mode on VF
> >
> > Hi Lu,
> >
> > Many thanks for your response. Again, I have a few more queries.
> > If VF unicast promiscuous mode is not supported, then can't we
> > implement Layer 2 bridging functionality using Intel virtualization
> > technologies? Or is there any other way, say tweaking some hardware
> > registers or drivers, which may help us in implementing Layer 2 bridging?
> > Also, I would like to know why Intel does not support unicast promiscuous
> > mode.
> > It could have been an optional register setting, and the user should have
> > had the privilege to set or unset it. Besides security reasons, is there
> > any other big reason why Intel does not support this?
> >
> > Thanks,
> > Bharath Paulraj
> >
> > On Wed, Mar 16, 2016 at 6:15 AM, Lu, Wenzhuo 
> > wrote:
> >
> > > Hi Bharath,
> > >
> > > > 2) Is the above supported for 82599 controller? If it is
> > > > supported
> > > in the NIC,
> > > > please provide the steps to enable.
> > > Talking about 82599, VF unicast promiscuous mode is not supported.
> > > Only broadcast and multicast can be supported.
> > >
> > > >
> > > > Thanks,
> > > > Bharath Paulraj
> > >
> >
> >
> >
> > --
> > Regards,
> > Bharath
>



-- 
Regards,
Bharath


[dpdk-dev] [PATCH] mk: fix linker script when re-building

2016-03-17 Thread Panu Matilainen
On 03/17/2016 01:22 AM, Sergio Gonzalez Monroy wrote:
> The linker script is generated by simply finding all libraries in
> RTE_OUTPUT/lib.
>
> The issue shows up when re-building the DPDK, hence already having a
> linker script in that directory, resulting in the linker script
> including itself.
>
> That does not play well with the linker.
>
> Simply filtering the linker script from all the found libraries solves
> the problem.
>
> Fixes: 948fd64befc3 ("mk: replace the combined library with a linker script")
>
> Signed-off-by: Sergio Gonzalez Monroy 
> ---
>   mk/rte.combinedlib.mk | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mk/rte.combinedlib.mk b/mk/rte.combinedlib.mk
> index fe4817b..449358b 100644
> --- a/mk/rte.combinedlib.mk
> +++ b/mk/rte.combinedlib.mk
> @@ -42,7 +42,7 @@ endif
>   RTE_LIBNAME := dpdk
>   COMBINEDLIB := lib$(RTE_LIBNAME)$(EXT)
>
> -LIBS := $(notdir $(wildcard $(RTE_OUTPUT)/lib/*$(EXT)))
> +LIBS := $(filter-out $(COMBINEDLIB), $(notdir $(wildcard $(RTE_OUTPUT)/lib/*$(EXT))))
>
>   all: FORCE
>   $(Q)echo "GROUP ( $(LIBS) )" > $(RTE_OUTPUT)/lib/$(COMBINEDLIB)
>

Oops, thanks for spotting.

Acked-by: Panu Matilainen 

- Panu -


[dpdk-dev] Patch "Increased number of next hops for LPM IPv4" break IP Pipeline application

2016-03-17 Thread Jastrzebski, MichalX K
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Zhang, Roy Fan
> Sent: Monday, March 14, 2016 5:57 PM
> To: dev at dpdk.org; Kobylinski, MichalX 
> Cc: Dumitrescu, Cristian ; Singh, Jasvinder
> ; thomas.monjalon at 6wind.com; Glynn,
> Michael J 
> Subject: [dpdk-dev] Patch "Increased number of next hops for LPM IPv4"
> break IP Pipeline application
> 
> Hi Michal,
> 
> Your patch "Increased number of next hops for LPM IPv4"
> (http://dpdk.org/ml/archives/dev/2016-March/035269.html) is breaking the
> IP Pipeline application. Without this patch, the application runs
> successfully.
> 
> The IP Pipeline failed on executing the following command:
> ./build/ip_pipeline -f ./config/edge_router_downstream.cfg -s
> ./config/edge_router_downstream.sh -p 0xf
> 
> The error messages:
> 
> [PIPELINE1] Routing
> TABLE: rte_table_lpm_create: Invalid number_tbl8s
> PIPELINE: rte_pipeline_table_create: Table creation failed
> PANIC in app_init_pipelines():
> Pipeline instance "PIPELINE1" back-end init error
> 
> Regards,
> Fan

Hi all,
A fix for this issue is here: 
http://dpdk.org/dev/patchwork/patch/11552/

Michal


[dpdk-dev] [PATCH] eal/ppc: fix secondary process to map hugepages in correct order

2016-03-17 Thread gowrishankar
Could this patch be reviewed please.

Thanks,
Gowrishankar

On Monday 07 March 2016 07:43 PM, Gowrishankar wrote:
> From: Gowri Shankar 
>
> For a secondary process address space to map hugepages from every segment of
> the primary process, hugepage_file entries have to be mapped in reverse order
> from the list that the primary process updated for every segment. This is
> because, in ppc64, hugepages are sorted by decrementing addresses.
>
> Signed-off-by: Gowrishankar 
> ---
>   lib/librte_eal/linuxapp/eal/eal_memory.c |   26 --
>   1 file changed, 16 insertions(+), 10 deletions(-)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c 
> b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index 5b9132c..6aea5d0 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -1400,7 +1400,7 @@ rte_eal_hugepage_attach(void)
>   {
>   const struct rte_mem_config *mcfg = 
> rte_eal_get_configuration()->mem_config;
>   const struct hugepage_file *hp = NULL;
> - unsigned num_hp = 0;
> + unsigned num_hp = 0, mapped_hp = 0;
>   unsigned i, s = 0; /* s used to track the segment number */
>   off_t size;
>   int fd, fd_zero = -1, fd_hugepage = -1;
> @@ -1486,14 +1486,12 @@ rte_eal_hugepage_attach(void)
>   goto error;
>   }
>
> - num_hp = size / sizeof(struct hugepage_file);
> - RTE_LOG(DEBUG, EAL, "Analysing %u files\n", num_hp);
> -
>   s = 0;
>   while (s < RTE_MAX_MEMSEG && mcfg->memseg[s].len > 0){
>   void *addr, *base_addr;
>   uintptr_t offset = 0;
>   size_t mapping_size;
> + unsigned int index;
>   #ifdef RTE_LIBRTE_IVSHMEM
>   /*
>* if segment has ioremap address set, it's an IVSHMEM segment 
> and
> @@ -1504,6 +1502,8 @@ rte_eal_hugepage_attach(void)
>   continue;
>   }
>   #endif
> + num_hp = mcfg->memseg[s].len / mcfg->memseg[s].hugepage_sz;
> + RTE_LOG(DEBUG, EAL, "Analysing %u files in segment %u\n", 
> num_hp, s);
>   /*
>* free previously mapped memory so we can map the
>* hugepages into the space
> @@ -1514,18 +1514,23 @@ rte_eal_hugepage_attach(void)
>   /* find the hugepages for this segment and map them
>* we don't need to worry about order, as the server sorted the
>* entries before it did the second mmap of them */
> +#ifdef RTE_ARCH_PPC_64
> + for (i = num_hp-1; i < num_hp && offset < mcfg->memseg[s].len; 
> i--){
> +#else
>   for (i = 0; i < num_hp && offset < mcfg->memseg[s].len; i++){
> - if (hp[i].memseg_id == (int)s){
> - fd = open(hp[i].filepath, O_RDWR);
> +#endif
> + index = i + mapped_hp;
> + if (hp[index].memseg_id == (int)s){
> + fd = open(hp[index].filepath, O_RDWR);
>   if (fd < 0) {
>   RTE_LOG(ERR, EAL, "Could not open %s\n",
> - hp[i].filepath);
> + hp[index].filepath);
>   goto error;
>   }
>   #ifdef RTE_EAL_SINGLE_FILE_SEGMENTS
> - mapping_size = hp[i].size * hp[i].repeated;
> + mapping_size = hp[index].size * 
> hp[index].repeated;
>   #else
> - mapping_size = hp[i].size;
> + mapping_size = hp[index].size;
>   #endif
>   addr = mmap(RTE_PTR_ADD(base_addr, offset),
>   mapping_size, PROT_READ | 
> PROT_WRITE,
> @@ -1534,7 +1539,7 @@ rte_eal_hugepage_attach(void)
>   if (addr == MAP_FAILED ||
>   addr != RTE_PTR_ADD(base_addr, 
> offset)) {
>   RTE_LOG(ERR, EAL, "Could not mmap %s\n",
> - hp[i].filepath);
> + hp[index].filepath);
>   goto error;
>   }
>   offset+=mapping_size;
> @@ -1543,6 +1548,7 @@ rte_eal_hugepage_attach(void)
>   RTE_LOG(DEBUG, EAL, "Mapped segment %u of size 0x%llx\n", s,
>   (unsigned long long)mcfg->memseg[s].len);
>   s++;
> + mapped_hp += num_hp;
>   }
>   /* unmap the hugepage config file, since we are done using it */
>   munmap((void *)(uintptr_t)hp, size);
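
The PPC64 branch of the loop in the patch above counts down with an unsigned index and relies on wraparound to terminate: once i is decremented past 0 it becomes UINT_MAX, so "i < num_hp" is false. A minimal standalone sketch of that idiom follows. The names visit_reverse() and order are hypothetical and are not part of the patch or of DPDK.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Walk indices num-1 .. 0 with an unsigned counter and record the visit
 * order. Illustrative only; visit_reverse() is not a DPDK function.
 * Returns the number of indices visited.
 */
static unsigned
visit_reverse(unsigned num, unsigned *order, size_t cap)
{
	unsigned i, n = 0;

	assert(order != NULL);

	/*
	 * When i wraps from 0 to UINT_MAX, "i < num" fails and the loop ends.
	 * With num == 0, num - 1 is already UINT_MAX, so the body never runs.
	 */
	for (i = num - 1; i < num && n < cap; i--)
		order[n++] = i;

	return n;
}
```

Calling visit_reverse(3, order, 4) fills order with 2, 1, 0. The same termination condition works for the forward loop, which is presumably why the patch can share the "i < num_hp" test between both branches.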




[dpdk-dev] [PATCH] ixgbe: avoid unnessary break when checking at the tail of rx hwring

2016-03-17 Thread Jianbo Liu
On 16 March 2016 at 19:14, Bruce Richardson  
wrote:
> On Wed, Mar 16, 2016 at 03:51:53PM +0800, Jianbo Liu wrote:
>> Hi Wenzhuo,
>>
>> On 16 March 2016 at 14:06, Lu, Wenzhuo  wrote:
>> > HI Jianbo,
>> >
>> >
>> >> -Original Message-
>> >> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jianbo Liu
>> >> Sent: Monday, March 14, 2016 10:26 PM
>> >> To: Zhang, Helin; Ananyev, Konstantin; dev at dpdk.org
>> >> Cc: Jianbo Liu
>> >> Subject: [dpdk-dev] [PATCH] ixgbe: avoid unnessary break when checking at 
>> >> the
>> >> tail of rx hwring
>> >>
>> >> When checking rx ring queue, it's possible that loop will break at the 
>> >> tail while
>> >> there are packets still in the queue header.
>> > Would you like to give more details about in what scenario this issue will 
>> > be hit? Thanks.
>> >
>>
>> vPMD places an extra RTE_IXGBE_DESCS_PER_LOOP - 1 empty descriptors
>> at the end of the hwring to avoid overflow when checking on the rx
>> side.
>>
>> For the loop in _recv_raw_pkts_vec(), we check 4 descriptors each
>> time. If all 4 DD bits are set, all 4 packets are received. That's OK
>> in the middle of the ring.
>> But at the end of the hwring, with fewer than 4 descriptors left, we
>> still need to check 4 descriptors at the same time, so the extra empty
>> descriptors are checked with them.
>> This time, the number of received packets is necessarily less than 4,
>> and we break out of the loop because of the condition "var !=
>> RTE_IXGBE_DESCS_PER_LOOP".
>> So the problem arises: there could be more packets at the beginning
>> of the hwring that are still waiting to be received.
>> I think this fix can avoid this situation, and at least reduce the
>> latency for the packets at the head of the ring.
>>
> Packets are always received in order from the NIC, so no packets ever get left
> behind or skipped on an RX burst call.
>
> /Bruce
>

I know packets are received in order and that no packets will be skipped,
but some will be left behind, as I explained above.
vPMD will not receive the nb_pkts requested by one RX burst call, and
those at the beginning of the hwring are still waiting to be received until
the next call.

Thanks!
Jianbo
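
The scenario described above can be reproduced with a toy model. The sketch below is not the ixgbe vPMD code: toy_recv_burst() and the dd[] array are hypothetical stand-ins, and DESCS_PER_LOOP only mirrors the role of RTE_IXGBE_DESCS_PER_LOOP. It assumes, as the thread states, that a burst scans four descriptors at a time, stops on a partial group, and that the ring is followed by three always-empty padding descriptors.

```c
#include <assert.h>

#define DESCS_PER_LOOP 4	/* stands in for RTE_IXGBE_DESCS_PER_LOOP */

/*
 * Toy model of one vector RX burst. dd[] holds the "descriptor done" flag
 * for ring_size real descriptors followed by DESCS_PER_LOOP - 1 padding
 * entries that are always 0. Returns the number of packets picked up.
 */
static unsigned
toy_recv_burst(const unsigned char *dd, unsigned ring_size,
	       unsigned tail, unsigned nb_pkts)
{
	unsigned pos, nb_recv = 0;

	assert(tail < ring_size);

	for (pos = 0; pos < nb_pkts; pos += DESCS_PER_LOOP) {
		unsigned k, done = 0;

		/*
		 * Count completed descriptors at the front of this group.
		 * Near the ring end the group reaches into the padding.
		 */
		for (k = 0; k < DESCS_PER_LOOP; k++) {
			if (!dd[tail + pos + k])
				break;
			done++;
		}
		nb_recv += done;
		if (done != DESCS_PER_LOOP)
			break;	/* partial group: stop, even if slot 0 is ready */
	}
	return nb_recv;
}
```

With an 8-entry ring where slots 6, 7, 0 and 1 are done and the tail is at 6, a burst asking for 4 packets returns 2. The packets in slots 0 and 1 are not lost, which matches Bruce's point, but they are only picked up by the next call, which matches Jianbo's.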


[dpdk-dev] [PATCH] eal_interrupts.c: properly init struct epoll_event (valgrind)

2016-03-17 Thread Stephen Hemminger
On Thu, 17 Mar 2016 15:18:15 +0100
Thomas Monjalon  wrote:

> Hi Stephen,
> 
> Please, could you turn it into a real patch with your sign-off?
> Thanks
> 
> 2016-02-14 12:22, Stephen Hemminger:
> > A better patch would be to move the data structure into the
> > code block used, and get rid of the useless else (rte_panic never returns);
> > and fix the indentation, and use C99 initialization which should make 
> > valgrind
> > happier.
> > 
> > The moral is don't just slap memsets around
> > 
> > diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c 
> > b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
> > index 06b26a9..d53826e 100644
> > --- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
> > +++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
> > @@ -799,8 +799,6 @@ eal_intr_handle_interrupts(int pfd, unsigned totalfds)
> >  static __attribute__((noreturn)) void *
> >  eal_intr_thread_main(__rte_unused void *arg)
> >  {
> > -   struct epoll_event ev;
> > -
> > /* host thread, never break out */
> > for (;;) {
> > /* build up the epoll fd with all descriptors we are to
> > @@ -834,20 +832,22 @@ eal_intr_thread_main(__rte_unused void *arg)
> > > TAILQ_FOREACH(src, &intr_sources, next) {
> > if (src->callbacks.tqh_first == NULL)
> > continue; /* skip those with no callbacks */
> > -   ev.events = EPOLLIN | EPOLLPRI;
> > -   ev.data.fd = src->intr_handle.fd;
> > +
> > +   struct epoll_event ev = {
> > +   .events = EPOLLIN | EPOLLPRI,
> > +   .data.fd = src->intr_handle.fd,
> > +   };
> >  
> > /**
> >  * add all the uio device file descriptor
> >  * into wait list.
> >  */
> > if (epoll_ctl(pfd, EPOLL_CTL_ADD,
> > > -   src->intr_handle.fd, &ev) < 0){
> > > +   src->intr_handle.fd, &ev) < 0)
> > rte_panic("Error adding fd %d epoll_ctl, %s\n",
> > src->intr_handle.fd, strerror(errno));
> > -   }
> > -   else
> > -   numfds++;
> > +
> > +   numfds++;
> > }
> > > rte_spinlock_unlock(&intr_lock);
> > /* serve the interrupt */
> 
> 

Sure I thought Matthew would since he reported the issue and had the ability
to test it.
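
For reference, the initialization style suggested in the quoted diff can be tried outside DPDK. This is a minimal sketch assuming a Linux system with <sys/epoll.h>; add_fd_to_epoll() is a hypothetical helper, not an EAL function.

```c
#include <assert.h>
#include <sys/epoll.h>

/*
 * Register fd with an epoll instance for read and priority events.
 * ev is declared at the point of use and set up by a single C99
 * designated initializer, so no member is assigned after the fact.
 * Returns the result of epoll_ctl(): 0 on success, -1 on error.
 */
static int
add_fd_to_epoll(int epfd, int fd)
{
	struct epoll_event ev = {
		.events = EPOLLIN | EPOLLPRI,
		.data.fd = fd,
	};

	assert(epfd >= 0 && fd >= 0);
	return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}
```

A caller would create the instance with epoll_create1(0) and pass any pollable descriptor, for example the read end of a pipe. Adding the same descriptor twice fails with EEXIST, which is the kind of error the rte_panic() branch in the patch would report.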


[dpdk-dev] [PATCH] doc: mempool ABI deprecation notice for 16.07

2016-03-17 Thread Olivier Matz
Add a deprecation notice for coming changes in mempool for 16.07.

Signed-off-by: Olivier Matz 
---
 doc/guides/rel_notes/deprecation.rst | 8 
 1 file changed, 8 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst 
b/doc/guides/rel_notes/deprecation.rst
index 252a096..3e8e327 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -33,3 +33,11 @@ Deprecation Notices
 * ABI changes are planned for adding four new flow types. This impacts
   RTE_ETH_FLOW_MAX. The release 2.2 does not contain these ABI changes,
   but release 2.3 will.
+
+* librte_mempool: new fixes and features will be added in 16.07:
+  allocation of large mempool in several virtual memory chunks, new API
+  to populate a mempool, new API to free a mempool, allocation in
+  anonymous mapping, drop of specific dom0 code. These changes will
+  induce a modification of the rte_mempool structure, plus a
+  modification of the API of rte_mempool_obj_iter(), implying a breakage
+  of the ABI.
-- 
2.1.4



[dpdk-dev] [PATCH] app: fix for lpm in ip_pipeline

2016-03-17 Thread Michal Jastrzebski
From: Michal Kobylinski 

Update the ip_pipeline app to use the new changes from the LPM library
(increased number of next hops and a new config structure
for LPM IPv4).

Fixes: 7164439d017d ("lpm: add a new config structure for IPv4")

Signed-off-by: Michal Kobylinski 
Acked-by: Cristian Dumitrescu 
---
 app/test-pipeline/pipeline_lpm.c| 6 ++
 examples/ip_pipeline/pipeline/pipeline_routing_be.c | 6 ++
 2 files changed, 12 insertions(+)

diff --git a/app/test-pipeline/pipeline_lpm.c b/app/test-pipeline/pipeline_lpm.c
index 916abd4..ecea6b3 100644
--- a/app/test-pipeline/pipeline_lpm.c
+++ b/app/test-pipeline/pipeline_lpm.c
@@ -47,6 +47,10 @@

 #include "main.h"

+#ifndef PIPELINE_LPM_TABLE_NUMBER_TABLE8s
+#define PIPELINE_LPM_TABLE_NUMBER_TABLE8s 256
+#endif
+
 void
 app_main_loop_worker_pipeline_lpm(void) {
struct rte_pipeline_params pipeline_params = {
@@ -113,6 +117,8 @@ app_main_loop_worker_pipeline_lpm(void) {
struct rte_table_lpm_params table_lpm_params = {
.name = "LPM",
.n_rules = 1 << 24,
+   .number_tbl8s = PIPELINE_LPM_TABLE_NUMBER_TABLE8s,
+   .flags = 0,
.entry_unique_size =
sizeof(struct rte_pipeline_table_entry),
.offset = APP_METADATA_OFFSET(32),
diff --git a/examples/ip_pipeline/pipeline/pipeline_routing_be.c 
b/examples/ip_pipeline/pipeline/pipeline_routing_be.c
index 8342b7b..431c636 100644
--- a/examples/ip_pipeline/pipeline/pipeline_routing_be.c
+++ b/examples/ip_pipeline/pipeline/pipeline_routing_be.c
@@ -67,6 +67,10 @@

 #define MAC_SRC_DEFAULT 0x112233445566

+#ifndef PIPELINE_ROUTING_LPM_TABLE_NUMBER_TABLE8s
+#define PIPELINE_ROUTING_LPM_TABLE_NUMBER_TABLE8s 256
+#endif
+
 struct pipeline_routing {
struct pipeline p;
struct pipeline_routing_params params;
@@ -1284,6 +1288,8 @@ pipeline_routing_init(struct pipeline_params *params,
struct rte_table_lpm_params table_lpm_params = {
.name = p->name,
.n_rules = p_rt->params.n_routes,
+   .number_tbl8s = 
PIPELINE_ROUTING_LPM_TABLE_NUMBER_TABLE8s,
+   .flags = 0,
.entry_unique_size = sizeof(struct routing_table_entry),
.offset = p_rt->params.ip_hdr_offset +
__builtin_offsetof(struct ipv4_hdr, dst_addr),
-- 
1.9.1



[dpdk-dev] vhost: no protection against malformed queue descriptors in rte_vhost_dequeue_burst()

2016-03-17 Thread Patrik Andersson R
Hi Huawei,

thank you for the quick response and for the pointer to the 16.04-rc1
version. Nice!

I think it would be great also to have a sanity check on the gpa_to_vva().
Although nothing recent has hit it we had some problems in that area
in the past.

Regards,

Patrik

On 03/17/2016 02:35 AM, Xie, Huawei wrote:
> On 3/16/2016 8:53 PM, Patrik Andersson R wrote:
>> Hello,
>>
>> When taking a snapshot of a running VM instance, using OpenStack
>> "nova image-create", I noticed that one OVS pmd-thread eventually
>> failed in DPDK rte_vhost_dequeue_burst() with repeating log entries:
>>
>> compute-0-6 ovs-vswitchd[38172]: VHOST_DATA: Failed to allocate
>> memory for mbuf.
>>
>>
>> Debugging (data included further down) this issue lead to the
>> observation that there is no protection against malformed vhost
>> queue descriptors, thus tenant separation might be violated as a
>> single faulty VM might bring down the connectivity of all VMs
>> connected to the same virtual switch.
>>
>> To avoid this, validation would be needed at some points in the
>> rte_vhost_dequeue_burst() code:
>>
>>1) when the queue descriptor is picked up for processing,
>>desc->flags and desc->len might both be 0
>>
>> ...
>> desc = &vq->desc[head[entry_success]];
>> ...
>> /* Discard first buffer as it is the virtio header */
>> if (desc->flags & VRING_DESC_F_NEXT) {
>>  desc = &vq->desc[desc->next];
>>  vb_offset = 0;
>>  vb_avail = desc->len;
>> } else {
>>  vb_offset = vq->vhost_hlen;
>>  vb_avail = desc->len - vb_offset;
>> }
>>  
>>
>>2) at buffer address translation gpa_to_vva(), might fail
>>returning NULL as indication
>>
>> vb_addr = gpa_to_vva(dev, desc->addr);
>> ...
>> while (cpy_len != 0) {
>>  rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *, seg_offset),
>>  (void *)((uintptr_t)(vb_addr + vb_offset)),
>>  cpy_len);
>> ...
>> }
>> ...
>>
>>
>> Wondering if there are any plans of adding any kind of validation in
>> DPDK, or if it would be useful to suggest specific implementation of
>> such validations in the DPDK code?
>>
>> Or is there some mechanism that gives us the confidence to trust
>> the vhost queue content absolutely?
>>
>>
>>
>> Debugging data:
>>
>> For my scenario the problem occurs in DPDK rte_vhost_dequeue_burst()
>> due to use of a vhost queue descriptor that has all fields 0:
>>
>>(gdb) print *desc
>> {addr = 0, len = 0, flags = 0, next = 0}
>>
>>
>> Subsequent use of desc->len to compute vb_avail = desc->len - vb_offset,
>> leads to the problem observed. What happens is that the packet needs to
>> be segmented -- on my system it fails roughly at segment 122000 when
>> memory available for mbufs run out.
>>
>> The relevant local variables for rte_vhost_dequeue_burst() when breaking
>> on the condition desc->len == 0:
>>
>> vb_avail = 4294967284  (0xfffffff4)
>> seg_avail = 2608
>> vb_offset = 12
>> cpy_len = 2608
>> seg_num = 1
>> desc = 0x2aadb6e5c000
>> vb_addr = 46928960159744
>> entry_success = 0
>>
>> Note also that there is no crash despite to the desc->addr being zero,
>> it is a valid address in the regions mapped to the device. Although, the
>> 3 regions mapped does not seem to be correct either at this stage.
>>
>>
>> The versions that I'm running are OVS 2.4.0, with corrections from the
>> 2.4 branch, and DPDK 2.1.0. QEMU emulator version 2.2.0 and
>> libvirt version 1.2.12.
>>
>>
>> Regards,
>>
>> Patrik
> Thanks Patrik. You are right. We had planned to enhance the robustness
> of vhost so that neither malicious nor buggy guest virtio driver could
> corrupt vhost. Actually the 16.04 RC1 has fixed some issues (the return
> of gpa_to_vva isn't checked).
>



[dpdk-dev] vhost: no protection against malformed queue descriptors in rte_vhost_dequeue_burst()

2016-03-17 Thread Xie, Huawei
On 3/16/2016 8:53 PM, Patrik Andersson R wrote:
> Hello,
>
> When taking a snapshot of a running VM instance, using OpenStack
> "nova image-create", I noticed that one OVS pmd-thread eventually
> failed in DPDK rte_vhost_dequeue_burst() with repeating log entries:
>
>compute-0-6 ovs-vswitchd[38172]: VHOST_DATA: Failed to allocate
> memory for mbuf.
>
>
> Debugging (data included further down) this issue lead to the
> observation that there is no protection against malformed vhost
> queue descriptors, thus tenant separation might be violated as a
> single faulty VM might bring down the connectivity of all VMs
> connected to the same virtual switch.
>
> To avoid this, validation would be needed at some points in the
> rte_vhost_dequeue_burst() code:
>
>   1) when the queue descriptor is picked up for processing,
>   desc->flags and desc->len might both be 0
>
>...
>desc = &vq->desc[head[entry_success]];
>...
>/* Discard first buffer as it is the virtio header */
>if (desc->flags & VRING_DESC_F_NEXT) {
> desc = &vq->desc[desc->next];
> vb_offset = 0;
> vb_avail = desc->len;
>} else {
> vb_offset = vq->vhost_hlen;
> vb_avail = desc->len - vb_offset;
>}
> 
>
>   2) at buffer address translation gpa_to_vva(), might fail
>   returning NULL as indication
>
>vb_addr = gpa_to_vva(dev, desc->addr);
>...
>while (cpy_len != 0) {
> rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *, seg_offset),
> (void *)((uintptr_t)(vb_addr + vb_offset)),
> cpy_len);
>...
>}
>...
>
>
> Wondering if there are any plans of adding any kind of validation in
> DPDK, or if it would be useful to suggest specific implementation of
> such validations in the DPDK code?
>
> Or is there some mechanism that gives us the confidence to trust
> the vhost queue content absolutely?
>
>
>
> Debugging data:
>
> For my scenario the problem occurs in DPDK rte_vhost_dequeue_burst()
> due to use of a vhost queue descriptor that has all fields 0:
>
>   (gdb) print *desc
>{addr = 0, len = 0, flags = 0, next = 0}
>
>
> Subsequent use of desc->len to compute vb_avail = desc->len - vb_offset,
> leads to the problem observed. What happens is that the packet needs to
> be segmented -- on my system it fails roughly at segment 122000 when
> memory available for mbufs run out.
>
> The relevant local variables for rte_vhost_dequeue_burst() when breaking
> on the condition desc->len == 0:
>
>vb_avail = 4294967284  (0xfffffff4)
>seg_avail = 2608
>vb_offset = 12
>cpy_len = 2608
>seg_num = 1
>desc = 0x2aadb6e5c000
>vb_addr = 46928960159744
>entry_success = 0
>
> Note also that there is no crash despite to the desc->addr being zero,
> it is a valid address in the regions mapped to the device. Although, the
> 3 regions mapped does not seem to be correct either at this stage.
>
>
> The versions that I'm running are OVS 2.4.0, with corrections from the
> 2.4 branch, and DPDK 2.1.0. QEMU emulator version 2.2.0 and
> libvirt version 1.2.12.
>
>
> Regards,
>
> Patrik

Thanks Patrik. You are right. We had planned to enhance the robustness
of vhost so that neither malicious nor buggy guest virtio driver could
corrupt vhost. Actually the 16.04 RC1 has fixed some issues (the return
of gpa_to_vva isn't checked).
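
The vb_avail value in the debugging data is what an unsigned 32-bit subtraction yields when desc->len is 0 and the header length is 12. A minimal sketch of that wraparound and of one possible guard is below. It assumes nothing about how DPDK fixed the issue: struct toy_desc, avail_unchecked() and avail_checked() are hypothetical names, and the struct only mimics the four fields printed by gdb in the report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a vring descriptor; not the real vring_desc. */
struct toy_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/* Unchecked computation from the report: wraps when len < hdr_len. */
static uint32_t
avail_unchecked(const struct toy_desc *d, uint32_t hdr_len)
{
	return d->len - hdr_len;
}

/*
 * Checked variant (hypothetical helper). Returns 0 and stores the payload
 * length on success, -1 if the descriptor is too short for the header.
 */
static int
avail_checked(const struct toy_desc *d, uint32_t hdr_len, uint32_t *avail)
{
	assert(d != NULL && avail != NULL);

	if (d->len < hdr_len)
		return -1;
	*avail = d->len - hdr_len;
	return 0;
}
```

With an all-zero descriptor and a 12-byte header, avail_unchecked() returns 4294967284, the value seen in the report, while avail_checked() rejects the descriptor. A NULL check on the address returned by the guest-to-host translation would follow the same shape.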

>



[dpdk-dev] [PATCH] vhost: remove unnecessary memset for virtio net hdr

2016-03-17 Thread Xie, Huawei
On 3/16/2016 2:44 PM, Yuanhan Liu wrote:
> We have to reset the virtio net hdr at virtio_enqueue_offload()
> before, due to all mbufs share a single virtio_hdr structure:
>
>   struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, }, 0};
>
>   foreach (mbuf) {
>   virtio_enqueue_offload(mbuf, &virtio_hdr.hdr);
>
>   copy net hdr and mbuf to desc buf
>   }
>
> However, after the vhost rxtx refactor, the code looks like:
>
>   copy_mbuf_to_desc(mbuf)
>   {
>   struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, }, 0}
>
>   virtio_enqueue_offload(mbuf, &virtio_hdr.hdr);
>
>   copy net hdr and mbuf to desc buf
>   }
>
>   foreach (mbuf) {
>   copy_mbuf_to_desc(mbuf);
>   }
>
> Therefore, the memset at virtio_enqueue_offload() is not necessary
> any more; remove it.
>
> Signed-off-by: Yuanhan Liu 
> ---
>  lib/librte_vhost/vhost_rxtx.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
> index a6330f8..b4da665 100644
> --- a/lib/librte_vhost/vhost_rxtx.c
> +++ b/lib/librte_vhost/vhost_rxtx.c
> @@ -94,8 +94,6 @@ is_valid_virt_queue_idx(uint32_t idx, int is_tx, uint32_t 
> qp_nb)
>  static void
>  virtio_enqueue_offload(struct rte_mbuf *m_buf, struct virtio_net_hdr 
> *net_hdr)
>  {
> - memset(net_hdr, 0, sizeof(struct virtio_net_hdr));
> -
>   if (m_buf->ol_flags & PKT_TX_L4_MASK) {
>   net_hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
>   net_hdr->csum_start = m_buf->l2_len + m_buf->l3_len;

Acked-by: Huawei Xie 
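
The reasoning in the commit message can be checked with a small standalone model. This is only a sketch: struct toy_hdr and the toy_* functions are hypothetical and do not reproduce the virtio header layout or the vhost code. It shows that when the header is a zero-initialized local of the per-packet function, the offload helper needs no reset of its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy header; the field names are illustrative, not the virtio layout. */
struct toy_hdr {
	uint8_t flags;
	uint16_t csum_start;
};

/* Sets fields only when offload is requested; performs no reset itself. */
static void
toy_enqueue_offload(int want_csum, struct toy_hdr *hdr)
{
	assert(hdr != NULL);

	if (want_csum) {
		hdr->flags = 1;
		hdr->csum_start = 34;
	}
}

/* Per-packet helper: the header is a fresh zeroed local on every call. */
static struct toy_hdr
toy_copy_one(int want_csum)
{
	struct toy_hdr hdr = {0};

	toy_enqueue_offload(want_csum, &hdr);
	return hdr;
}
```

Calling toy_copy_one(1) and then toy_copy_one(0) gives a populated header followed by an all-zero one, so nothing leaks from one packet to the next. In the older layout, where a single header was shared across the loop, the second call would have seen the first packet's values unless the helper reset them.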


[dpdk-dev] Performance issue with uio_pci_generic driver

2016-03-17 Thread Simon Jouet
Hi everyone,

First off, I would like to thank tmonjalo, Harry Van Harren and Bruce
Richardson for the input they gave while I was trying to figure out the issue,
and for pushing me to report the problem here.

Okay, so I was trying out some basic sanity benchmarks with DPDK before doing
anything more complicated and, surprisingly, I was getting lower than gigabit
speed for minimum packet size running l2fwd (or l3fwd for that matter).

The setup is very simple: I've got two machines with Intel x710 quad port NICs;
one is running DPDK l2fwd and the other is running MoonGen for the performance
benchmark.

After much debugging, modifying parameters one by one, giving up when nothing
worked, and then setting up ovs-dpdk, I noticed from the ovs documentation
that the kernel modules to load were uio and igb_uio, while I was previously
loading uio_pci_generic as mentioned in the DPDK getting started guide. I
simply changed the kernel module and l2fwd went from 700Mbps to 10G
line-rate. Bruce said that shouldn't be the case and that the performance
should be similar regardless of the driver loaded ...

Here is the full log of the experiment, if you're interested:
https://gist.github.com/simon-jouet/178e1d302afef5c6a642

Best regards,
Simon



[dpdk-dev] [PATCH] mk: fix linker script when re-building

2016-03-17 Thread Sergio Gonzalez Monroy
The linker script is generated by simply finding all libraries in
RTE_OUTPUT/lib.

The issue shows up when re-building the DPDK, hence already having a
linker script in that directory, resulting in the linker script
including itself.

That does not play well with the linker.

Simply filtering the linker script from all the found libraries solves
the problem.

Fixes: 948fd64befc3 ("mk: replace the combined library with a linker script")

Signed-off-by: Sergio Gonzalez Monroy 
---
 mk/rte.combinedlib.mk | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mk/rte.combinedlib.mk b/mk/rte.combinedlib.mk
index fe4817b..449358b 100644
--- a/mk/rte.combinedlib.mk
+++ b/mk/rte.combinedlib.mk
@@ -42,7 +42,7 @@ endif
 RTE_LIBNAME := dpdk
 COMBINEDLIB := lib$(RTE_LIBNAME)$(EXT)

-LIBS := $(notdir $(wildcard $(RTE_OUTPUT)/lib/*$(EXT)))
+LIBS := $(filter-out $(COMBINEDLIB), $(notdir $(wildcard $(RTE_OUTPUT)/lib/*$(EXT))))

 all: FORCE
$(Q)echo "GROUP ( $(LIBS) )" > $(RTE_OUTPUT)/lib/$(COMBINEDLIB)
-- 
2.4.3