date:20150225

[dpdk-dev] [PATCH v14 12/13] eal/pci: Add rte_eal_dev_attach/detach() functions

2015-02-25 Thread Tetsuya Mukawa

On 2015/02/25 23:00, Thomas Monjalon wrote:
> 2015-02-25 21:32, Tetsuya Mukawa:
>> 2015-02-25 20:21 GMT+09:00 Thomas Monjalon :
>>> 2015-02-25 13:04, Tetsuya Mukawa:
 --- a/lib/librte_eal/common/eal_common_dev.c
 +++ b/lib/librte_eal/common/eal_common_dev.c
 @@ -32,10 +32,13 @@
   *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
   */

 +#include 
 +#include 
  #include 
  #include 
  #include 

 +#include 
  #include 
  #include 
>>> No, you must not include ethdev in EAL.
>>> The ethdev layer is by design on top of EAL.
>>> Maxime already asked why you did it. He was implicitly asking to remove it.
>>> You said that you are calling ethdev_is_detachable() but you should
>>> call a function eal_is_detachable() or something like that.
>>> The detachable state must be only device-related, i.e. in EAL.
>>> The ethdev API is only a wrapper (with port id) in such case.
>>>
>> Hi Thomas,
>>
>> If the ethdev library is on top of EAL, hotplug functions like
>> rte_eal_dev_attach/detach should be implemented in the ethdev library.
>> Is that right?
> Yes you're right.
>
>> If so, I will move rte_eal_dev_attach/detach to the ethdev library
>> and rename them to rte_eth_dev_attach/detach.
> It seems to be the right thing to do.
>
>> Also, I will include "rte_dev.h" and "rte_pci.h" in rte_ethdev.h, and call
>> the EAL functions below from the ethdev library.
>>
>> - For virtual device initialization and finalization
>> -- rte_eth_vdev_init()
>> -- rte_eth_vdev_uninit()
>> - For physical NIC initialization and finalization
>> -- rte_eal_pci_probe_one()
>> -- rte_eal_pci_close_one()
>>
>> I guess this will fix this design violation.
>> Is this ok?
> I think yes.
> If needed, we could do some cleanup after RC1.
> I'm just waiting for you to fix this, to avoid introducing
> a layering violation.
> Would you be able to do it today?

Hi Thomas,

I appreciate your reply.
I'll start working on it now.

Thanks,
Tetsuya

> Thanks
>
 --- a/lib/librte_eal/linuxapp/eal/Makefile
 +++ b/lib/librte_eal/linuxapp/eal/Makefile
 @@ -45,6 +45,7 @@ CFLAGS += -I$(RTE_SDK)/lib/librte_eal/common/include
  CFLAGS += -I$(RTE_SDK)/lib/librte_ring
  CFLAGS += -I$(RTE_SDK)/lib/librte_mempool
  CFLAGS += -I$(RTE_SDK)/lib/librte_malloc
 +CFLAGS += -I$(RTE_SDK)/lib/librte_mbuf
>>> By removing ethdev dependency, you can remove this ugly mbuf dependency.
>>>
>>> Thanks Tetsuya
>>>
>
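
The layering agreed in this exchange can be sketched as follows. This is a minimal, self-contained illustration and not DPDK code: every name in it (`sketch_device`, `sketch_eal_dev_is_detachable`, `sketch_eth_dev_is_detachable`, `sketch_ports`) is hypothetical. It only shows the shape of the design, assuming the detachable state belongs to the device-level layer and the port-id API is a thin wrapper on top of it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lower ("EAL-like") layer: knows devices, not Ethernet ports. */
struct sketch_device {
    const char *name;
    int detachable;   /* device-related state, owned by the lower layer */
};

static int
sketch_eal_dev_is_detachable(const struct sketch_device *dev)
{
    assert(dev != NULL);
    return dev->detachable;
}

/* Hypothetical upper ("ethdev-like") layer: maps a port id to a device. */
#define SKETCH_MAX_PORTS 4

static struct sketch_device *sketch_ports[SKETCH_MAX_PORTS];

/* Thin wrapper: resolves the port id, then defers to the lower layer. */
static int
sketch_eth_dev_is_detachable(unsigned port_id)
{
    if (port_id >= SKETCH_MAX_PORTS || sketch_ports[port_id] == NULL)
        return 0;
    return sketch_eal_dev_is_detachable(sketch_ports[port_id]);
}
```

The dependency runs in one direction only: the upper layer calls into the lower one, and the lower layer never needs the upper layer's headers.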

[dpdk-dev] [PATCH] eal: Clean up export of per_lcore__socket_id

2015-02-25 Thread Liang, Cunming

Hi Neil,

Thanks for the cleanup.
Would it be better to move rte_socket_id() to eal_common_thread.c?
As it simply returns _socket_id, there is no need to keep two copies, one each
for Linux and BSD.

-Cunming

> -Original Message-
> From: Neil Horman [mailto:nhorman at tuxdriver.com]
> Sent: Wednesday, February 25, 2015 10:34 PM
> To: dev at dpdk.org
> Cc: thomas.monjalon at 6wind.com; Liang, Cunming; Neil Horman
> Subject: [PATCH] eal: Clean up export of per_lcore__socket_id
> 
> There's no need to export this variable.  It's set and queried from an API call
> that doesn't exist in the hot path.  Instead just export the rte_socket_id
> symbol and make the variable private to protect it from type changes.  We
> should do this with the other exported variables too, but I think it's too late
> in the release cycle to do that.
> 
> Tested successfully using distributor_autotest (which uses rte_socket_id).
> Only tested on Linux, as I don't currently have a BSD system spun up, but the
> changes are symmetric and should be fine.
> 
> Signed-off-by: Neil Horman 
> ---
>  lib/librte_eal/bsdapp/eal/eal_thread.c  | 5 +
>  lib/librte_eal/bsdapp/eal/rte_eal_version.map   | 2 +-
>  lib/librte_eal/common/eal_common_thread.c   | 2 ++
>  lib/librte_eal/common/include/rte_lcore.h   | 7 +--
>  lib/librte_eal/linuxapp/eal/eal_thread.c| 5 +
>  lib/librte_eal/linuxapp/eal/rte_eal_version.map | 2 +-
>  6 files changed, 15 insertions(+), 8 deletions(-)
> 
> diff --git a/lib/librte_eal/bsdapp/eal/eal_thread.c
> b/lib/librte_eal/bsdapp/eal/eal_thread.c
> index ca95c72..5e6eea9 100644
> --- a/lib/librte_eal/bsdapp/eal/eal_thread.c
> +++ b/lib/librte_eal/bsdapp/eal/eal_thread.c
> @@ -60,6 +60,11 @@ RTE_DEFINE_PER_LCORE(unsigned, _lcore_id) =
> LCORE_ID_ANY;
>  RTE_DEFINE_PER_LCORE(unsigned, _socket_id) = (unsigned)SOCKET_ID_ANY;
>  RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset);
> 
> +unsigned rte_socket_id(void)
> +{
> + return RTE_PER_LCORE(_socket_id);
> +}
> +
>  /*
>   * Send a message to a slave lcore identified by slave_id to call a
>   * function f with argument arg. Once the execution is done, the
> diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> index 17515a9..d83524d 100644
> --- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> +++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> @@ -10,7 +10,6 @@ DPDK_2.0 {
>   pci_driver_list;
>   per_lcore__lcore_id;
>   per_lcore__rte_errno;
> - per_lcore__socket_id;
>   rte_cpu_check_supported;
>   rte_cpu_get_flag_enabled;
>   rte_cycles_vmware_tsc_map;
> @@ -82,6 +81,7 @@ DPDK_2.0 {
>   rte_set_log_level;
>   rte_set_log_type;
>   rte_snprintf;
> + rte_socket_id;
>   rte_strerror;
>   rte_strsplit;
>   rte_sys_gettid;
> diff --git a/lib/librte_eal/common/eal_common_thread.c
> b/lib/librte_eal/common/eal_common_thread.c
> index f4d9892..4010eab 100644
> --- a/lib/librte_eal/common/eal_common_thread.c
> +++ b/lib/librte_eal/common/eal_common_thread.c
> @@ -46,6 +46,8 @@
> 
>  #include "eal_thread.h"
> 
> +RTE_DECLARE_PER_LCORE(unsigned , _socket_id);
> +
>  int eal_cpuset_socket_id(rte_cpuset_t *cpusetp)
>  {
>   unsigned cpu = 0;
> diff --git a/lib/librte_eal/common/include/rte_lcore.h
> b/lib/librte_eal/common/include/rte_lcore.h
> index 20a58eb..e03264e 100644
> --- a/lib/librte_eal/common/include/rte_lcore.h
> +++ b/lib/librte_eal/common/include/rte_lcore.h
> @@ -81,7 +81,6 @@ struct lcore_config {
>  extern struct lcore_config lcore_config[RTE_MAX_LCORE];
> 
>  RTE_DECLARE_PER_LCORE(unsigned, _lcore_id);  /**< Per thread "lcore id". */
> -RTE_DECLARE_PER_LCORE(unsigned, _socket_id); /**< Per thread "socket id".
> */
>  RTE_DECLARE_PER_LCORE(rte_cpuset_t, _cpuset); /**< Per thread "cpuset". */
> 
>  /**
> @@ -145,11 +144,7 @@ rte_lcore_index(int lcore_id)
>   * @return
>   *   the ID of current lcoreid's physical socket
>   */
> -static inline unsigned
> -rte_socket_id(void)
> -{
> - return RTE_PER_LCORE(_socket_id);
> -}
> +unsigned rte_socket_id(void);
> 
>  /**
>   * Get the ID of the physical socket of the specified lcore
> diff --git a/lib/librte_eal/linuxapp/eal/eal_thread.c
> b/lib/librte_eal/linuxapp/eal/eal_thread.c
> index 5635c7d..9cacd86 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_thread.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_thread.c
> @@ -60,6 +60,11 @@ RTE_DEFINE_PER_LCORE(unsigned, _lcore_id) =
> LCORE_ID_ANY;
>  RTE_DEFINE_PER_LCORE(unsigned, _socket_id) = (unsigned)SOCKET_ID_ANY;
>  RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset);
> 
> +unsigned rte_socket_id(void)
> +{
> + return RTE_PER_LCORE(_socket_id);
> +}
> +
>  /*
>   * Send a message to a slave lcore identified by slave_id to call a
>   * function f with argument arg. Once the execution is done, the
> diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> index 17515a9..d835

[dpdk-dev] [PATCH 1/4] xen: allow choosing dom0 support at runtime

2015-02-25 Thread Stephen Hemminger

On Thu, 26 Feb 2015 06:53:24 +
"Liu, Jijiang"  wrote:

> Ok, thanks for the explanation.
> 
> Could you replace 'internal_config.xen_dom0_support' with
> 'is_xen_dom0_supported()' in the function rte_eal_hugepage_init()?

Ok, but then as a function it would have to be exported
in the shared library map and become part of the API.

[dpdk-dev] [PATCH v14 12/13] eal/pci: Add rte_eal_dev_attach/detach() functions

2015-02-25 Thread Tetsuya Mukawa

2015-02-25 20:21 GMT+09:00 Thomas Monjalon :
> 2015-02-25 13:04, Tetsuya Mukawa:
>> --- a/lib/librte_eal/common/eal_common_dev.c
>> +++ b/lib/librte_eal/common/eal_common_dev.c
>> @@ -32,10 +32,13 @@
>>   *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>>   */
>>
>> +#include 
>> +#include 
>>  #include 
>>  #include 
>>  #include 
>>
>> +#include 
>>  #include 
>>  #include 
>
> No, you must not include ethdev in EAL.
> The ethdev layer is by design on top of EAL.
> Maxime already asked why you did it. He was implicitly asking to remove it.
> You said that you are calling ethdev_is_detachable() but you should
> call a function eal_is_detachable() or something like that.
> The detachable state must be only device-related, i.e. in EAL.
> The ethdev API is only a wrapper (with port id) in such case.
>

Hi Thomas,

If the ethdev library is on top of EAL, hotplug functions like
rte_eal_dev_attach/detach should be implemented in the ethdev library.
Is that right?

If so, I will move rte_eal_dev_attach/detach to the ethdev library
and rename them to rte_eth_dev_attach/detach.
Also, I will include "rte_dev.h" and "rte_pci.h" in rte_ethdev.h, and call
the EAL functions below from the ethdev library.

- For virtual device initialization and finalization
-- rte_eth_vdev_init()
-- rte_eth_vdev_uninit()
- For physical NIC initialization and finalization
-- rte_eal_pci_probe_one()
-- rte_eal_pci_close_one()

I think this will fix the layering violation.
Is this OK?

Thanks,
Tetsuya

>> --- a/lib/librte_eal/linuxapp/eal/Makefile
>> +++ b/lib/librte_eal/linuxapp/eal/Makefile
>> @@ -45,6 +45,7 @@ CFLAGS += -I$(RTE_SDK)/lib/librte_eal/common/include
>>  CFLAGS += -I$(RTE_SDK)/lib/librte_ring
>>  CFLAGS += -I$(RTE_SDK)/lib/librte_mempool
>>  CFLAGS += -I$(RTE_SDK)/lib/librte_malloc
>> +CFLAGS += -I$(RTE_SDK)/lib/librte_mbuf
>
> By removing ethdev dependency, you can remove this ugly mbuf dependency.
>
> Thanks Tetsuya
>

[dpdk-dev] [PATCH 3/3] doc: add docs for the rxtx_callbacks sample app

2015-02-25 Thread John McNamara

Added a sample application guide for the rxtx_callbacks app.

Signed-off-by: John McNamara 
---
 MAINTAINERS |1 +
 doc/guides/sample_app_ug/index.rst  |1 +
 doc/guides/sample_app_ug/rxtx_callbacks.rst |  251 +++
 3 files changed, 253 insertions(+), 0 deletions(-)
 create mode 100644 doc/guides/sample_app_ug/rxtx_callbacks.rst

diff --git a/MAINTAINERS b/MAINTAINERS
index 86c1c6b..2ddb312 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -443,6 +443,7 @@ F: doc/guides/sample_app_ug/quota_watermark.rst
 M: Bruce Richardson 
 M: John McNamara 
 F: examples/rxtx_callbacks/
+F: doc/guides/sample_app_ug/rxtx_callbacks.rst

 M: Bruce Richardson 
 M: John McNamara 
diff --git a/doc/guides/sample_app_ug/index.rst 
b/doc/guides/sample_app_ug/index.rst
index 4e9d59b..4a86459 100644
--- a/doc/guides/sample_app_ug/index.rst
+++ b/doc/guides/sample_app_ug/index.rst
@@ -44,6 +44,7 @@ Sample Applications User Guide
 exception_path
 hello_world
 skeleton
+rxtx_callbacks
 ip_frag
 ipv4_multicast
 ip_reassembly
diff --git a/doc/guides/sample_app_ug/rxtx_callbacks.rst 
b/doc/guides/sample_app_ug/rxtx_callbacks.rst
new file mode 100644
index 000..9df57ed
--- /dev/null
+++ b/doc/guides/sample_app_ug/rxtx_callbacks.rst
@@ -0,0 +1,251 @@
+..  BSD LICENSE
+Copyright(c) 2015 Intel Corporation. All rights reserved.
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+
+* Redistributions of source code must retain the above copyright
+notice, this list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in
+the documentation and/or other materials provided with the
+distribution.
+* Neither the name of Intel Corporation nor the names of its
+contributors may be used to endorse or promote products derived
+from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+
+RX/TX Callbacks Sample Application
+==================================
+
+The RX/TX Callbacks sample application is a packet forwarding application that
+demonstrates the use of user defined callbacks on received and transmitted
+packets. The application performs a simple latency check, using callbacks, to
+determine the time packets spend within the application.
+
+In the sample application a user defined callback is applied to all received
+packets to add a timestamp. A separate callback is applied to all packets
+prior to transmission to calculate the elapsed time, in CPU cycles.
+
+
+Compiling the Application
+-------------------------
+
+To compile the application export the path to the DPDK source tree and go to
+the example directory:
+
+.. code-block:: console
+
+    export RTE_SDK=/path/to/rte_sdk
+
+    cd ${RTE_SDK}/examples/rxtx_callbacks
+
+
+Set the target, for example:
+
+.. code-block:: console
+
+    export RTE_TARGET=x86_64-native-linuxapp-gcc
+
+See the *DPDK Getting Started* Guide for possible ``RTE_TARGET`` values.
+
+The callbacks feature requires that the ``CONFIG_RTE_ETHDEV_RXTX_CALLBACKS``
+setting is on in the ``config/common_`` config file that applies to the
+target. This is generally on by default:
+
+.. code-block:: console
+
+    CONFIG_RTE_ETHDEV_RXTX_CALLBACKS=y
+
+Build the application as follows:
+
+.. code-block:: console
+
+    make
+
+
+Running the Application
+-----------------------
+
+To run the example in a ``linuxapp`` environment:
+
+.. code-block:: console
+
+    ./build/rxtx_callbacks -c 2 -n 4
+
+Refer to *DPDK Getting Started Guide* for general information on running
+applications and the Environment Abstraction Layer (EAL) options.
+
+
+
+Explanation
+-----------
+
+The ``rxtx_callbacks`` application is mainly a simple forwarding application
+based on the :doc:`skeleton`. See that section of the documentation for more
+details of the forwarding part of the application.
+
+The sections below explain the additional RX/TX callback code.

[dpdk-dev] [PATCH 2/3] doc: add docs for basic forwarding skeleton app

2015-02-25 Thread John McNamara

Added a sample application guide for the basic forwarding/skeleton app.

Signed-off-by: John McNamara 
---
 MAINTAINERS   |3 +
 doc/guides/sample_app_ug/index.rst|3 +-
 doc/guides/sample_app_ug/skeleton.rst |  338 +
 3 files changed, 343 insertions(+), 1 deletions(-)
 create mode 100644 doc/guides/sample_app_ug/skeleton.rst

diff --git a/MAINTAINERS b/MAINTAINERS
index 349ad2b..86c1c6b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -444,7 +444,10 @@ M: Bruce Richardson 
 M: John McNamara 
 F: examples/rxtx_callbacks/

+M: Bruce Richardson 
+M: John McNamara 
 F: examples/skeleton/
+F: doc/guides/sample_app_ug/skeleton.rst

 F: examples/vmdq/
 F: examples/vmdq_dcb/
diff --git a/doc/guides/sample_app_ug/index.rst 
b/doc/guides/sample_app_ug/index.rst
index 5720181..4e9d59b 100644
--- a/doc/guides/sample_app_ug/index.rst
+++ b/doc/guides/sample_app_ug/index.rst
@@ -1,5 +1,5 @@
 ..  BSD LICENSE
-Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
 All rights reserved.

 Redistribution and use in source and binary forms, with or without
@@ -43,6 +43,7 @@ Sample Applications User Guide
 cmd_line
 exception_path
 hello_world
+skeleton
 ip_frag
 ipv4_multicast
 ip_reassembly
diff --git a/doc/guides/sample_app_ug/skeleton.rst 
b/doc/guides/sample_app_ug/skeleton.rst
new file mode 100644
index 000..e832c13
--- /dev/null
+++ b/doc/guides/sample_app_ug/skeleton.rst
@@ -0,0 +1,338 @@
+..  BSD LICENSE
+Copyright(c) 2015 Intel Corporation. All rights reserved.
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+
+* Redistributions of source code must retain the above copyright
+notice, this list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in
+the documentation and/or other materials provided with the
+distribution.
+* Neither the name of Intel Corporation nor the names of its
+contributors may be used to endorse or promote products derived
+from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+
+Basic Forwarding Sample Application
+===================================
+
+The Basic Forwarding sample application is a simple *skeleton* example of a
+forwarding application.
+
+It is intended as a demonstration of the basic components of a DPDK forwarding
+application. For more detailed implementations see the L2 and L3 forwarding
+sample applications.
+
+
+Compiling the Application
+-------------------------
+
+To compile the application export the path to the DPDK source tree and go to
+the example directory:
+
+.. code-block:: console
+
+    export RTE_SDK=/path/to/rte_sdk
+
+    cd ${RTE_SDK}/examples/skeleton
+
+Set the target, for example:
+
+.. code-block:: console
+
+    export RTE_TARGET=x86_64-native-linuxapp-gcc
+
+See the *DPDK Getting Started* Guide for possible ``RTE_TARGET`` values.
+
+Build the application as follows:
+
+.. code-block:: console
+
+    make
+
+
+Running the Application
+-----------------------
+
+To run the example in a ``linuxapp`` environment:
+
+.. code-block:: console
+
+    ./build/basicfwd -c 2 -n 4
+
+Refer to *DPDK Getting Started Guide* for general information on running
+applications and the Environment Abstraction Layer (EAL) options.
+
+
+Explanation
+-----------
+
+The following sections provide an explanation of the main components of the
+code.
+
+All DPDK library functions used in the sample code are prefixed with ``rte_``
+and are explained in detail in the *DPDK API Documentation*.
+
+
+The Main Function
+~~~~~~~~~~~~~~~~~
+
+The ``main()`` function performs the initialization and calls the execution
+threads for each lcore.
+
+The first task is to initialize the Environment Abstraction Layer (EAL).  The
+``argc`` and ``argv`` arguments are provided to the ``rte_eal_init()``
+functio

[dpdk-dev] [PATCH 1/3] examples/skeleton: minor refactoring to help documentation

2015-02-25 Thread John McNamara

Minor refactoring and comments to make the sample app and
code examples clearer for the sample app guide.

Signed-off-by: John McNamara 
---
 examples/skeleton/basicfwd.c |   77 +++---
 1 files changed, 57 insertions(+), 20 deletions(-)

diff --git a/examples/skeleton/basicfwd.c b/examples/skeleton/basicfwd.c
index 6aa931e..1bce6e7 100644
--- a/examples/skeleton/basicfwd.c
+++ b/examples/skeleton/basicfwd.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -48,12 +48,14 @@
 #define BURST_SIZE 32

 static const struct rte_eth_conf port_conf_default = {
-   .rxmode = { .max_rx_pkt_len = ETHER_MAX_LEN, },
+   .rxmode = { .max_rx_pkt_len = ETHER_MAX_LEN }
 };

+/* basicfwd.c: Basic DPDK skeleton forwarding example. */
+
 /*
- * Initialises a given port using global settings and with the rx buffers
- * coming from the mbuf_pool passed as parameter
+ * Initializes a given port using global settings and with the RX buffers
+ * coming from the mbuf_pool passed as a parameter.
  */
 static inline int
 port_init(uint8_t port, struct rte_mempool *mbuf_pool)
@@ -66,10 +68,12 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
if (port >= rte_eth_dev_count())
return -1;

+   /* Configure the Ethernet device. */
retval = rte_eth_dev_configure(port, rx_rings, tx_rings, &port_conf);
if (retval != 0)
return retval;

+   /* Allocate and set up 1 RX queue per Ethernet port. */
for (q = 0; q < rx_rings; q++) {
retval = rte_eth_rx_queue_setup(port, q, RX_RING_SIZE,
rte_eth_dev_socket_id(port), NULL, mbuf_pool);
@@ -77,6 +81,7 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
return retval;
}

+   /* Allocate and set up 1 TX queue per Ethernet port. */
for (q = 0; q < tx_rings; q++) {
retval = rte_eth_tx_queue_setup(port, q, TX_RING_SIZE,
rte_eth_dev_socket_id(port), NULL);
@@ -84,33 +89,41 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
return retval;
}

-   retval  = rte_eth_dev_start(port);
+   /* Start the Ethernet port. */
+   retval = rte_eth_dev_start(port);
if (retval < 0)
return retval;

+   /* Display the port MAC address. */
struct ether_addr addr;
rte_eth_macaddr_get(port, &addr);
-   printf("Port %u MAC: %02"PRIx8" %02"PRIx8" %02"PRIx8
-   " %02"PRIx8" %02"PRIx8" %02"PRIx8"\n",
+   printf("Port %u MAC: %02" PRIx8 " %02" PRIx8 " %02" PRIx8
+  " %02" PRIx8 " %02" PRIx8 " %02" PRIx8 "\n",
(unsigned)port,
addr.addr_bytes[0], addr.addr_bytes[1],
addr.addr_bytes[2], addr.addr_bytes[3],
addr.addr_bytes[4], addr.addr_bytes[5]);

+   /* Enable RX in promiscuous mode for the Ethernet device. */
rte_eth_promiscuous_enable(port);

return 0;
 }

 /*
- * Main thread that does the work, reading from INPUT_PORT
- * and writing to OUTPUT_PORT
+ * The lcore main. This is the main thread that does the work, reading from
+ * an input port and writing to an output port.
  */
-static  __attribute__((noreturn)) void
+static __attribute__((noreturn)) void
 lcore_main(void)
 {
const uint8_t nb_ports = rte_eth_dev_count();
uint8_t port;
+
+   /*
+* Check that the port is on the same NUMA node as the polling thread
+* for best performance.
+*/
for (port = 0; port < nb_ports; port++)
if (rte_eth_dev_socket_id(port) > 0 &&
rte_eth_dev_socket_id(port) !=
@@ -121,15 +134,28 @@ lcore_main(void)

printf("\nCore %u forwarding packets. [Ctrl+C to quit]\n",
rte_lcore_id());
+
+   /* Run until the application is quit or killed. */
for (;;) {
+   /*
+* Receive packets on a port and forward them on the paired
+* port. The mapping is 0 -> 1, 1 -> 0, 2 -> 3, 3 -> 2, etc.
+*/
for (port = 0; port < nb_ports; port++) {
+
+   /* Get burst of RX packets, from first port of pair. */
struct rte_mbuf *bufs[BURST_SIZE];
const uint16_t nb_rx = rte_eth_rx_burst(port, 0,
bufs, BURST_SIZE);
+
if (unlikely(nb_rx == 0))
continue;
+
+   /* Send burst of TX packets, to second port of pair. */
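
The port pairing described in the comment above (0 -> 1, 1 -> 0, 2 -> 3, 3 -> 2) amounts to flipping the lowest bit of the port number. A minimal sketch, assuming an even number of ports; `sketch_paired_port` is a hypothetical helper name used only for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Paired output port for the mapping 0 -> 1, 1 -> 0, 2 -> 3, 3 -> 2, ... */
static uint8_t
sketch_paired_port(uint8_t port, uint8_t nb_ports)
{
    /* Pairing only makes sense for an even port count and a valid port. */
    assert((nb_ports & 1) == 0 && port < nb_ports);
    return (uint8_t)(port ^ 1);   /* flip the lowest bit */
}
```

With an odd port count the last port would have no partner, which is why the sketch checks for an even count first.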

[dpdk-dev] [PATCH 0/3] additional sample app guides

2015-02-25 Thread John McNamara

This patchset includes two new sample app guides. 

The first is for the existing basic forwarding/skeleton application. 

The second is for the recently added rxtx_callbacks sample application.



John McNamara (3):
  examples/skeleton: minor refactoring to help documentation
  doc: add docs for basic forwarding skeleton app
  doc: add docs for the rxtx_callbacks sample app

 MAINTAINERS |4 +
 doc/guides/sample_app_ug/index.rst  |4 +-
 doc/guides/sample_app_ug/rxtx_callbacks.rst |  251 
 doc/guides/sample_app_ug/skeleton.rst   |  338 +++
 examples/skeleton/basicfwd.c|   77 +--
 5 files changed, 653 insertions(+), 21 deletions(-)
 create mode 100644 doc/guides/sample_app_ug/rxtx_callbacks.rst
 create mode 100644 doc/guides/sample_app_ug/skeleton.rst

-- 
1.7.4.1

[dpdk-dev] [PATCH] eal: Clean up export of per_lcore__socket_id

2015-02-25 Thread Neil Horman

On Wed, Feb 25, 2015 at 11:54:51PM +, Liang, Cunming wrote:
> Hi Neil,
> 
> Thanks for the cleanup.
> Would it be better to move rte_socket_id() to eal_common_thread.c?
> As it simply returns _socket_id, there is no need to keep two copies, one each
> for Linux and BSD.
> 
> -Cunming
> 
Sure, I can respin this in the AM.
neil
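
The approach discussed here, a private per-thread variable read through one exported function kept in a single common file, can be sketched as below. This is an illustration under stated assumptions, not the patch itself: the `sketch_` names are hypothetical, `SKETCH_SOCKET_ID_ANY` is an invented "not set" marker, and the `__thread` storage class is a GCC/Clang extension.

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_SOCKET_ID_ANY UINT_MAX   /* hypothetical "not set" marker */

/* Per-thread variable, private to this translation unit (GCC/Clang __thread). */
static __thread unsigned sketch_socket_id = SKETCH_SOCKET_ID_ANY;

/* Called by (hypothetical) thread setup code once the socket is known. */
void
sketch_set_socket_id(unsigned id)
{
    assert(id != SKETCH_SOCKET_ID_ANY);
    sketch_socket_id = id;
}

/* The only exported read path; callers never see the variable's type. */
unsigned
sketch_get_socket_id(void)
{
    return sketch_socket_id;
}
```

Because callers only link against the function, the type or storage of the variable can change later without changing the exported symbol.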

> > -Original Message-
> > From: Neil Horman [mailto:nhorman at tuxdriver.com]
> > Sent: Wednesday, February 25, 2015 10:34 PM
> > To: dev at dpdk.org
> > Cc: thomas.monjalon at 6wind.com; Liang, Cunming; Neil Horman
> > Subject: [PATCH] eal: Clean up export of per_lcore__socket_id
> > 
> > There's no need to export this variable.  It's set and queried from an API
> > call that doesn't exist in the hot path.  Instead just export the rte_socket_id
> > symbol and make the variable private to protect it from type changes.  We
> > should do this with the other exported variables too, but I think it's too late
> > in the release cycle to do that.
> > 
> > Tested successfully using distributor_autotest (which uses rte_socket_id).
> > Only tested on Linux, as I don't currently have a BSD system spun up, but
> > the changes are symmetric and should be fine.
> > 
> > Signed-off-by: Neil Horman 
> > ---
> >  lib/librte_eal/bsdapp/eal/eal_thread.c  | 5 +
> >  lib/librte_eal/bsdapp/eal/rte_eal_version.map   | 2 +-
> >  lib/librte_eal/common/eal_common_thread.c   | 2 ++
> >  lib/librte_eal/common/include/rte_lcore.h   | 7 +--
> >  lib/librte_eal/linuxapp/eal/eal_thread.c| 5 +
> >  lib/librte_eal/linuxapp/eal/rte_eal_version.map | 2 +-
> >  6 files changed, 15 insertions(+), 8 deletions(-)
> > 
> > diff --git a/lib/librte_eal/bsdapp/eal/eal_thread.c
> > b/lib/librte_eal/bsdapp/eal/eal_thread.c
> > index ca95c72..5e6eea9 100644
> > --- a/lib/librte_eal/bsdapp/eal/eal_thread.c
> > +++ b/lib/librte_eal/bsdapp/eal/eal_thread.c
> > @@ -60,6 +60,11 @@ RTE_DEFINE_PER_LCORE(unsigned, _lcore_id) =
> > LCORE_ID_ANY;
> >  RTE_DEFINE_PER_LCORE(unsigned, _socket_id) = (unsigned)SOCKET_ID_ANY;
> >  RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset);
> > 
> > +unsigned rte_socket_id(void)
> > +{
> > +   return RTE_PER_LCORE(_socket_id);
> > +}
> > +
> >  /*
> >   * Send a message to a slave lcore identified by slave_id to call a
> >   * function f with argument arg. Once the execution is done, the
> > diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> > b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> > index 17515a9..d83524d 100644
> > --- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> > +++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
> > @@ -10,7 +10,6 @@ DPDK_2.0 {
> > pci_driver_list;
> > per_lcore__lcore_id;
> > per_lcore__rte_errno;
> > -   per_lcore__socket_id;
> > rte_cpu_check_supported;
> > rte_cpu_get_flag_enabled;
> > rte_cycles_vmware_tsc_map;
> > @@ -82,6 +81,7 @@ DPDK_2.0 {
> > rte_set_log_level;
> > rte_set_log_type;
> > rte_snprintf;
> > +   rte_socket_id;
> > rte_strerror;
> > rte_strsplit;
> > rte_sys_gettid;
> > diff --git a/lib/librte_eal/common/eal_common_thread.c
> > b/lib/librte_eal/common/eal_common_thread.c
> > index f4d9892..4010eab 100644
> > --- a/lib/librte_eal/common/eal_common_thread.c
> > +++ b/lib/librte_eal/common/eal_common_thread.c
> > @@ -46,6 +46,8 @@
> > 
> >  #include "eal_thread.h"
> > 
> > +RTE_DECLARE_PER_LCORE(unsigned , _socket_id);
> > +
> >  int eal_cpuset_socket_id(rte_cpuset_t *cpusetp)
> >  {
> > unsigned cpu = 0;
> > diff --git a/lib/librte_eal/common/include/rte_lcore.h
> > b/lib/librte_eal/common/include/rte_lcore.h
> > index 20a58eb..e03264e 100644
> > --- a/lib/librte_eal/common/include/rte_lcore.h
> > +++ b/lib/librte_eal/common/include/rte_lcore.h
> > @@ -81,7 +81,6 @@ struct lcore_config {
> >  extern struct lcore_config lcore_config[RTE_MAX_LCORE];
> > 
> >  RTE_DECLARE_PER_LCORE(unsigned, _lcore_id);  /**< Per thread "lcore id". */
> > -RTE_DECLARE_PER_LCORE(unsigned, _socket_id); /**< Per thread "socket id".
> > */
> >  RTE_DECLARE_PER_LCORE(rte_cpuset_t, _cpuset); /**< Per thread "cpuset". */
> > 
> >  /**
> > @@ -145,11 +144,7 @@ rte_lcore_index(int lcore_id)
> >   * @return
> >   *   the ID of current lcoreid's physical socket
> >   */
> > -static inline unsigned
> > -rte_socket_id(void)
> > -{
> > -   return RTE_PER_LCORE(_socket_id);
> > -}
> > +unsigned rte_socket_id(void);
> > 
> >  /**
> >   * Get the ID of the physical socket of the specified lcore
> > diff --git a/lib/librte_eal/linuxapp/eal/eal_thread.c
> > b/lib/librte_eal/linuxapp/eal/eal_thread.c
> > index 5635c7d..9cacd86 100644
> > --- a/lib/librte_eal/linuxapp/eal/eal_thread.c
> > +++ b/lib/librte_eal/linuxapp/eal/eal_thread.c
> > @@ -60,6 +60,11 @@ RTE_DEFINE_PER_LCORE(unsigned, _lcore_id) =
> > LCORE_ID_ANY;
> >  RTE_DEFINE_PER_LCORE(unsigned, _socket_id) = (unsigned)SOCKET_ID_ANY;
> >  RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset);
> > 
> > +unsigned rte_socket_id(v

[dpdk-dev] : ixgbe: why bulk allocation is not used for a scattered Rx flow?

2015-02-25 Thread Vlad Zolotarov


On 02/25/15 13:02, Bruce Richardson wrote:
> On Wed, Feb 25, 2015 at 11:40:36AM +0200, Vlad Zolotarov wrote:
>> Hi, I have a question about the "scattered Rx" feature: why does enabling it
>> disable the "bulk allocation" feature?
> The "bulk-allocation" feature is one where a more optimized RX code path is
> used. For the sake of performance, certain assumptions were made in that code
> path, one of which was that packets would fit inside a single mbuf. Not
> having this assumption makes the receiving of packets much more complicated
> and therefore slower. [For similar reasons, the optimized TX routines, e.g.
> vector TX, are only used if it is guaranteed that no hardware offload features
> are going to be used].
>
> Now, it is possible, though challenging, to write optimized code for these
> more complicated cases, such as scattered RX, or TX with offloads or scattered
> packets. In general, we will always want separate routines for the simple case
> and the complicated cases, as the cost of checking for the offloads or
> multi-mbuf packets is significant enough to hurt performance badly when
> they are not needed. In the case of the vector PMD for ixgbe - our highest
> performance path right now - we have indeed two receive routines, for simple
> and scattered cases. For TX, we only have an optimized path for the simple
> case, but that is not to say that someone may not provide one for the
> offload case too at some point.
>
> A final note on scattered packets in particular: if packets are too big to fit
> in a single mbuf, then they are not small packets, and the processing time per
> packet available is, by definition, larger than for packets that fit in a
> single mbuf. For 64-byte packets, the packet inter-arrival time is 67ns @ 10G,
> or approx 200 cycles at 3GHz. If we assume a standard 2k mbuf, then a packet
> which spans two mbufs takes at least 1654ns to arrive, and therefore a 3GHz CPU
> has nearly 5000 cycles to process that same packet. Since the processing budget
> is so much bigger, the need to optimize is much less. Therefore it's more
> important to focus on the small packet case, which is what we have done.
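
The cycle budgets quoted above can be reproduced with a short calculation. The sketch below assumes 20 bytes of per-frame wire overhead (8 bytes of preamble and start-of-frame delimiter plus a 12-byte inter-frame gap), which is the assumption under which the quoted 67ns and 1654ns figures come out; the function names are hypothetical.

```c
#include <assert.h>

#define SKETCH_WIRE_OVERHEAD_BYTES 20.0 /* preamble + SFD (8) + inter-frame gap (12) */

/* Time one frame occupies the wire, in nanoseconds. */
static double
sketch_wire_time_ns(double frame_bytes, double link_gbps)
{
    assert(link_gbps > 0.0);
    /* bits divided by Gbit/s gives nanoseconds directly. */
    return (frame_bytes + SKETCH_WIRE_OVERHEAD_BYTES) * 8.0 / link_gbps;
}

/* CPU cycles available in that time at a given clock rate in GHz. */
static double
sketch_cycle_budget(double frame_bytes, double link_gbps, double cpu_ghz)
{
    return sketch_wire_time_ns(frame_bytes, link_gbps) * cpu_ghz;
}
```

For a 64-byte frame at 10G this gives (64 + 20) * 8 / 10 = 67.2ns, or about 202 cycles at 3GHz. For a frame of just over 2048 bytes it gives at least (2048 + 20) * 8 / 10 = 1654.4ns, or about 4963 cycles.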

Sure. I'm doing my best not to harm the existing code paths: the RSC
handler is a separate function (I first patched the scalar scattered
function but now I'm rewriting it as a standalone routine), I don't
change the igb_rx_entry (it stays a pointer) and I keep the
additional info in separate descriptors in a separate ring that is not
accessed in a non-RSC flow.

>
>> There is some unclear comment in the ixgbe_recv_scattered_pkts():
>>
>>  /*
>>   * Descriptor done.
>>   *
>>   * Allocate a new mbuf to replenish the RX ring descriptor.
>>   * If the allocation fails:
>>   *- arrange for that RX descriptor to be the first one
>>   *  being parsed the next time the receive function is
>>   *  invoked [on the same queue].
>>   *
>>   *- Stop parsing the RX ring and return immediately.
>>   *
>>   * This policy does not drop the packet received in the RX
>>   * descriptor for which the allocation of a new mbuf failed.
>>   * Thus, it allows that packet to be later retrieved if
>>   * mbuf have been freed in the mean time.
>>   * As a side effect, holding RX descriptors instead of
>>   * systematically giving them back to the NIC may lead to
>>   * RX ring exhaustion situations.
>>   * However, the NIC can gracefully prevent such situations
>>   * to happen by sending specific "back-pressure" flow control
>>   * frames to its peer(s).
>>   */
>>
>> Why can't the same "policy" be applied in the bulk-allocation context? Don't
>> advance the RDT until you've refilled the ring. What am I missing here?
> A lot of the optimizations done in other code paths, such as bulk alloc, may
> well be applicable here, it's just that the work has not been done yet, as the
> focus is elsewhere. For vector PMD RX, we have now routines that work on both
> regular and scattered packets, and both perform much better than the scalar
> equivalents. Also to note that in every RX (and TX) routine, the NIC tail
> pointer update is always done just once at the end of the function.
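
To make the policy from the quoted comment concrete, here is a toy model of
it. Nothing in it is ixgbe code: the ring, the buffer "pool" and every name
are hypothetical. It only shows the two properties discussed: a descriptor
whose replacement buffer cannot be allocated is left in place and retried on
the next call, and the tail is written once per call.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_RX_RING_SIZE 8u /* hypothetical, power of two */

struct toy_desc {
	bool done; /* set by the (simulated) NIC when a packet landed */
	int buf;   /* id of the buffer attached to this descriptor */
};

struct toy_rxq {
	struct toy_desc ring[TOY_RX_RING_SIZE];
	uint16_t next;        /* next descriptor to parse */
	uint16_t tail;        /* last value written to the simulated tail */
	unsigned tail_writes; /* how many times the tail was written */
	int free_bufs;        /* buffers left in the simulated pool */
	int next_buf_id;
};

/* Set up a queue with n_done completed descriptors holding buffers 100.. */
static void
toy_rxq_init(struct toy_rxq *q, unsigned n_done, int free_bufs)
{
	for (unsigned i = 0; i < TOY_RX_RING_SIZE; i++) {
		q->ring[i].done = i < n_done;
		q->ring[i].buf = 100 + (int)i;
	}
	q->next = 0;
	q->tail = TOY_RX_RING_SIZE - 1;
	q->tail_writes = 0;
	q->free_bufs = free_bufs;
	q->next_buf_id = 200;
}

/* Return a fresh buffer id, or -1 when the pool is empty. */
static int
toy_alloc(struct toy_rxq *q)
{
	if (q->free_bufs == 0)
		return -1;
	q->free_bufs--;
	return q->next_buf_id++;
}

static uint16_t
toy_recv(struct toy_rxq *q, int *pkts, uint16_t nb_pkts)
{
	uint16_t nb_rx = 0;

	while (nb_rx < nb_pkts) {
		struct toy_desc *d = &q->ring[q->next];
		if (!d->done)
			break;
		int fresh = toy_alloc(q);
		if (fresh < 0)
			break; /* keep the packet in the ring; retry next call */
		pkts[nb_rx++] = d->buf; /* hand the filled buffer to the caller */
		d->buf = fresh;         /* replenish the descriptor */
		d->done = false;
		q->next = (uint16_t)((q->next + 1) & (TOY_RX_RING_SIZE - 1));
	}

	if (nb_rx > 0) { /* single tail update at the end of the call */
		q->tail = (uint16_t)((q->next + TOY_RX_RING_SIZE - 1) &
				     (TOY_RX_RING_SIZE - 1));
		q->tail_writes++;
	}
	return nb_rx;
}
```

With five completed descriptors and only three free buffers, the first call
returns three packets and leaves the fourth in the ring; once buffers are
freed, a later call picks up the remaining two.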

I see. Thanks for the informed clarification. Although I've spent some
time with DPDK, I still sometimes feel that I don't fully understand
the original author's idea, and clarifications like yours really help.
I looked at the vectored receive function (_recv_raw_pkts_vec()) and it
is one cryptic piece of code! ;) Since you've brought it up - could you
direct me to the measurements comparing the vectored and scalar DPDK
data paths please? I wonder how working without CSUM offload for
instance may be fa

[dpdk-dev] [PATCH v2] app/test: add crc32 algorithms equivalence check

2015-02-25 Thread Yerden Zhumabekov

All notes taken into account. v3 posted.

25.02.2015 17:34, Bruce Richardson wrote:
> On Wed, Feb 25, 2015 at 10:08:32AM +0600, Yerden Zhumabekov wrote:
>> New function test_crc32_hash_alg_equiv() checks whether software,
>> 4-byte operand and 8-byte operand versions of CRC32 hash function
>> implementations return the same result value.
>>
>> Signed-off-by: Yerden Zhumabekov 
> Two small notes below for improving output on error.
>
> Acked-by: Bruce Richardson 
>
>> ---
>>  app/test/test_hash.c |   63 ++
>>  1 file changed, 63 insertions(+)
>>
>> diff --git a/app/test/test_hash.c b/app/test/test_hash.c
>> index 76b1b8f..3e94af1 100644
>> --- a/app/test/test_hash.c
>> +++ b/app/test/test_hash.c
>> @@ -177,6 +177,66 @@ static struct rte_hash_parameters ut_params = {
>>  .socket_id = 0,
>>  };
>>  
>> +#define CRC32_ITERATIONS (1U << 20)
>> +#define CRC32_DWORDS (1U << 6)
>> +/*
>> + * Test if all CRC32 implementations yield the same hash value
>> + */
>> +static int
>> +test_crc32_hash_alg_equiv(void)
>> +{
>> +uint32_t hash_val;
>> +uint32_t init_val;
>> +uint64_t data64[CRC32_DWORDS];
>> +unsigned i, j;
>> +size_t data_len;
>> +
>> +printf("# CRC32 implementations equivalence test\n");
>> +for (i = 0; i < CRC32_ITERATIONS; i++) {
>> +/* Randomizing data_len of data set */
>> +data_len = (size_t) ((rte_rand() % sizeof(data64)) + 1);
>> +init_val = (uint32_t) rte_rand();
>> +
>> +/* Fill the data set */
>> +for (j = 0; j < CRC32_DWORDS; j++)
>> +data64[j] = rte_rand();
>> +
>> +/* Calculate software CRC32 */
>> +rte_hash_crc_set_alg(CRC32_SW);
>> +hash_val = rte_hash_crc(data64, data_len, init_val);
>> +
>> +/* Check against 4-byte-operand sse4.2 CRC32 if available */
>> +rte_hash_crc_set_alg(CRC32_SSE42);
>> +if (hash_val != rte_hash_crc(data64, data_len, init_val)) {
>> +printf("Failed checking CRC32_SW against CRC32_SSE42\n");
>> +break;
>> +}
>> +
>> +/* Check against 8-byte-operand sse4.2 CRC32 if available */
>> +rte_hash_crc_set_alg(CRC32_SSE42_x64);
>> +if (hash_val != rte_hash_crc(data64, data_len, init_val)) {
>> +printf("Failed checking CRC32_SW against CRC32_SSE42_x64\n");
>> +break;
>> +}
>> +}
>> +
>> +/* Resetting to best available algorithm */
>> +rte_hash_crc_set_alg(CRC32_SSE42_x64);
>> +
>> +if (i == CRC32_ITERATIONS)
>> +return 0;
>> +
>> +printf("Failed test data (hex):\n");
>> +
>> +for (j = 0; j < data_len; j++) {
>> +printf("%02X", ((uint8_t *)data64)[j]);
> Put in a space after each hex character, otherwise it comes out like:
>
> Failed test data (hex):
> AAD292776348010C7A18D3080DB3A300
> FD
> Test Failed
>
> [I forced a failure by changing a != to == to test it, don't worry, the
> hash calculations are fine! :-)]
>
>> +if ((j+1) % 16 == 0 || j == data_len - 1)
>> +printf("\n");
>> +}
> Maybe also print out here, or before the hex digits, the length of the data
> that was tested. e.g. "printf("%u bytes total\n", data_len);" or similar.
>> +
>> +return -1;
>> +}
>> +
>>  /*
>>   * Test a hash function.
>>   */
>> @@ -1356,6 +1416,9 @@ test_hash(void)
>>  
>>  run_hash_func_tests();
>>  
>> +if (test_crc32_hash_alg_equiv() < 0)
>> +return -1;
>> +
>>  return 0;
>>  }
>>  
>> -- 
>> 1.7.9.5
>>

-- 
Sincerely,

Yerden Zhumabekov
State Technical Service
Astana, KZ

[dpdk-dev] [PATCH v3] app/test: add crc32 algorithms equivalence check

2015-02-25 Thread Yerden Zhumabekov

New function test_crc32_hash_alg_equiv() checks whether software,
4-byte operand and 8-byte operand versions of CRC32 hash function
implementations return the same result value.

Signed-off-by: Yerden Zhumabekov 
---
 app/test/test_hash.c |   60 ++
 1 file changed, 60 insertions(+)

diff --git a/app/test/test_hash.c b/app/test/test_hash.c
index 76b1b8f..653dd86 100644
--- a/app/test/test_hash.c
+++ b/app/test/test_hash.c
@@ -177,6 +177,63 @@ static struct rte_hash_parameters ut_params = {
.socket_id = 0,
 };

+#define CRC32_ITERATIONS (1U << 20)
+#define CRC32_DWORDS (1U << 6)
+/*
+ * Test if all CRC32 implementations yield the same hash value
+ */
+static int
+test_crc32_hash_alg_equiv(void)
+{
+   uint32_t hash_val;
+   uint32_t init_val;
+   uint64_t data64[CRC32_DWORDS];
+   unsigned i, j;
+   size_t data_len;
+
+   printf("# CRC32 implementations equivalence test\n");
+   for (i = 0; i < CRC32_ITERATIONS; i++) {
+   /* Randomizing data_len of data set */
+   data_len = (size_t) ((rte_rand() % sizeof(data64)) + 1);
+   init_val = (uint32_t) rte_rand();
+
+   /* Fill the data set */
+   for (j = 0; j < CRC32_DWORDS; j++)
+   data64[j] = rte_rand();
+
+   /* Calculate software CRC32 */
+   rte_hash_crc_set_alg(CRC32_SW);
+   hash_val = rte_hash_crc(data64, data_len, init_val);
+
+   /* Check against 4-byte-operand sse4.2 CRC32 if available */
+   rte_hash_crc_set_alg(CRC32_SSE42);
+   if (hash_val != rte_hash_crc(data64, data_len, init_val)) {
+   printf("Failed checking CRC32_SW against CRC32_SSE42\n");
+   break;
+   }
+
+   /* Check against 8-byte-operand sse4.2 CRC32 if available */
+   rte_hash_crc_set_alg(CRC32_SSE42_x64);
+   if (hash_val != rte_hash_crc(data64, data_len, init_val)) {
+   printf("Failed checking CRC32_SW against CRC32_SSE42_x64\n");
+   break;
+   }
+   }
+
+   /* Resetting to best available algorithm */
+   rte_hash_crc_set_alg(CRC32_SSE42_x64);
+
+   if (i == CRC32_ITERATIONS)
+   return 0;
+
+   printf("Failed test data (hex, %zu bytes total):\n", data_len);
+   for (j = 0; j < data_len; j++)
+   printf("%02X%c", ((uint8_t *)data64)[j],
+   ((j+1) % 16 == 0 || j == data_len - 1) ? '\n' : ' ');
+
+   return -1;
+}
+
 /*
  * Test a hash function.
  */
@@ -1356,6 +1413,9 @@ test_hash(void)

run_hash_func_tests();

+   if (test_crc32_hash_alg_equiv() < 0)
+   return -1;
+
return 0;
 }

-- 
1.7.9.5
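
The same equivalence idea can be tried outside DPDK. The sketch below is not
the rte_hash_crc code: it compares a bit-at-a-time CRC32C (Castagnoli
polynomial, reflected form) against a table-driven version over pseudo-random
data of random length, the way the test above compares the software and SSE4.2
paths. All function names and the small LCG used for test data are my own
assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC32C_POLY_REFLECTED 0x82F63B78u

/* Reference implementation: one bit at a time, no pre/post inversion. */
static uint32_t
crc32c_bitwise(const void *data, size_t len, uint32_t crc)
{
	const uint8_t *p = data;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^
			      (CRC32C_POLY_REFLECTED & (0u - (crc & 1u)));
	}
	return crc;
}

static uint32_t crc32c_table[256];

/* Build the byte-wise lookup table from the reference implementation. */
static void
crc32c_table_init(void)
{
	for (unsigned i = 0; i < 256; i++) {
		uint8_t b = (uint8_t)i;
		crc32c_table[i] = crc32c_bitwise(&b, 1, 0);
	}
}

/* Faster implementation: one table lookup per input byte. */
static uint32_t
crc32c_table_driven(const void *data, size_t len, uint32_t crc)
{
	const uint8_t *p = data;

	while (len--)
		crc = crc32c_table[(crc ^ *p++) & 0xffu] ^ (crc >> 8);
	return crc;
}

/* Return 0 if both implementations agree on every iteration, -1 otherwise. */
static int
crc32c_equiv_check(unsigned iterations)
{
	uint8_t buf[512];
	uint32_t seed = 12345u; /* fixed seed keeps the check reproducible */

	crc32c_table_init();
	for (unsigned i = 0; i < iterations; i++) {
		seed = seed * 1664525u + 1013904223u;
		size_t len = (seed >> 16) % sizeof(buf) + 1;
		seed = seed * 1664525u + 1013904223u;
		uint32_t init = seed;

		for (size_t j = 0; j < sizeof(buf); j++) {
			seed = seed * 1664525u + 1013904223u;
			buf[j] = (uint8_t)(seed >> 24);
		}
		if (crc32c_bitwise(buf, len, init) !=
		    crc32c_table_driven(buf, len, init))
			return -1;
	}
	return 0;
}
```

As in the patch, the length is drawn from 1 to sizeof(buf) so that lengths
that are not multiples of 4 or 8 get exercised as well.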

[dpdk-dev] [PATCH 7/7] vmxnet3: support jumbo frames

2015-02-25 Thread Stephen Hemminger

Add support for linking multi-segment buffers together to
handle Jumbo packets.

Signed-off-by: Stephen Hemminger 
---
v2 -- add missing pieces from last version

 lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c |  3 +-
 lib/librte_pmd_vmxnet3/vmxnet3_ring.h   |  2 +
 lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c   | 76 -
 3 files changed, 50 insertions(+), 31 deletions(-)

diff --git a/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c b/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c
index 35bb561..4f1bc4f 100644
--- a/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c
+++ b/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c
@@ -401,6 +401,7 @@ vmxnet3_setup_driver_shared(struct rte_eth_dev *dev)
 {
struct rte_eth_conf port_conf = dev->data->dev_conf;
struct vmxnet3_hw *hw = dev->data->dev_private;
+   uint32_t mtu = dev->data->mtu;
Vmxnet3_DriverShared *shared = hw->shared;
Vmxnet3_DSDevRead *devRead = &shared->devRead;
uint32_t *mac_ptr;
@@ -418,7 +419,7 @@ vmxnet3_setup_driver_shared(struct rte_eth_dev *dev)
devRead->misc.driverInfo.vmxnet3RevSpt = 1;
devRead->misc.driverInfo.uptVerSpt = 1;

-   devRead->misc.mtu = rte_le_to_cpu_32(dev->data->mtu);
+   devRead->misc.mtu = rte_le_to_cpu_32(mtu);
devRead->misc.queueDescPA  = hw->queueDescPA;
devRead->misc.queueDescLen = hw->queue_desc_len;
devRead->misc.numTxQueues  = hw->num_tx_queues;
diff --git a/lib/librte_pmd_vmxnet3/vmxnet3_ring.h b/lib/librte_pmd_vmxnet3/vmxnet3_ring.h
index 612487e..55ceadf 100644
--- a/lib/librte_pmd_vmxnet3/vmxnet3_ring.h
+++ b/lib/librte_pmd_vmxnet3/vmxnet3_ring.h
@@ -171,6 +171,8 @@ typedef struct vmxnet3_rx_queue {
uint32_t qid1;
uint32_t qid2;
Vmxnet3_RxQueueDesc *shared;
+   struct rte_mbuf *start_seg;
+   struct rte_mbuf *last_seg;
struct vmxnet3_rxq_stats stats;
bool stopped;
uint16_t queue_id;  /**< Device RX queue index. */
diff --git a/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c b/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c
index 82bcae6..b7babea 100644
--- a/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c
+++ b/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c
@@ -555,7 +555,6 @@ vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
vmxnet3_rx_queue_t *rxq;
Vmxnet3_RxCompDesc *rcd;
vmxnet3_buf_info_t *rbi;
-   Vmxnet3_RxDesc *rxd;
struct rte_mbuf *rxm = NULL;
struct vmxnet3_hw *hw;

@@ -580,42 +579,18 @@ vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)

idx = rcd->rxdIdx;
ring_idx = (uint8_t)((rcd->rqID == rxq->qid1) ? 0 : 1);
-   rxd = (Vmxnet3_RxDesc *)rxq->cmd_ring[ring_idx].base + idx;
rbi = rxq->cmd_ring[ring_idx].buf_info + idx;

-   if (unlikely(rcd->sop != 1 || rcd->eop != 1)) {
-   rte_pktmbuf_free_seg(rbi->m);
-   PMD_RX_LOG(DEBUG, "Packet spread across multiple buffers\n)");
-   goto rcd_done;
-   }

PMD_RX_LOG(DEBUG, "rxd idx: %d ring idx: %d.", idx, ring_idx);

 #ifdef RTE_LIBRTE_VMXNET3_DEBUG_DRIVER
+   Vmxnet3_RxDesc *rxd
+   = (Vmxnet3_RxDesc *)rxq->cmd_ring[ring_idx].base + idx;
VMXNET3_ASSERT(rcd->len <= rxd->len);
VMXNET3_ASSERT(rbi->m);
 #endif
-   if (unlikely(rcd->len == 0)) {
-   PMD_RX_LOG(DEBUG, "Rx buf was skipped. rxring[%d][%d]\n)",
-  ring_idx, idx);
-#ifdef RTE_LIBRTE_VMXNET3_DEBUG_DRIVER
-   VMXNET3_ASSERT(rcd->sop && rcd->eop);
-#endif
-   rte_pktmbuf_free_seg(rbi->m);
-   goto rcd_done;
-   }

-   /* Assuming a packet is coming in a single packet buffer */
-   if (unlikely(rxd->btype != VMXNET3_RXD_BTYPE_HEAD)) {
-   PMD_RX_LOG(DEBUG,
-  "Alert : Misbehaving device, incorrect "
-  " buffer type used. iPacket dropped.");
-   rte_pktmbuf_free_seg(rbi->m);
-   goto rcd_done;
-   }
-#ifdef RTE_LIBRTE_VMXNET3_DEBUG_DRIVER
-   VMXNET3_ASSERT(rxd->btype == VMXNET3_RXD_BTYPE_HEAD);
-#endif
/* Get the packet buffer pointer from buf_info */
rxm = rbi->m;

@@ -627,7 +602,7 @@ vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
rxq->cmd_ring[ring_idx].next2comp = idx;

/* For RCD with EOP set, check if there is frame error */
-   if (unlikely(rcd->err)) {
+   if (unlikely(rcd->eop && rcd->err)) {
rxq->stats.drop_total++;

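The rest of this hunk is cut off in the archive, so the following is not the
patch's code. It is only a guess at how start_seg and last_seg could be used
to link receive buffers into one multi-segment packet, driven by the
start-of-packet and end-of-packet bits of each completion. The toy_mbuf type
and the function name are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for the mbuf fields that matter for chaining. */
struct toy_mbuf {
	struct toy_mbuf *next;
	uint32_t pkt_len;  /* total length, valid in the first segment */
	uint16_t data_len; /* length of this segment */
	uint8_t nb_segs;   /* segment count, valid in the first segment */
};

struct toy_rxq {
	struct toy_mbuf *start_seg; /* head of the packet being assembled */
	struct toy_mbuf *last_seg;  /* most recently appended segment */
};

/*
 * Feed one completed buffer into the queue. Returns the head of a finished
 * packet when eop is set, NULL while the packet is still incomplete.
 * A continuation that arrives with no packet in progress is ignored here;
 * a real driver would have to free it and count the drop.
 */
static struct toy_mbuf *
toy_rx_add_seg(struct toy_rxq *q, struct toy_mbuf *seg, uint16_t len,
	       bool sop, bool eop)
{
	seg->data_len = len;
	seg->next = NULL;

	if (sop) {
		seg->pkt_len = len;
		seg->nb_segs = 1;
		q->start_seg = seg;
	} else if (q->start_seg == NULL) {
		return NULL;
	} else {
		q->start_seg->pkt_len += len;
		q->start_seg->nb_segs++;
		q->last_seg->next = seg;
	}
	q->last_seg = seg;

	if (!eop)
		return NULL;

	struct toy_mbuf *head = q->start_seg;
	q->start_seg = NULL;
	q->last_seg = NULL;
	return head;
}
```

Keeping the two pointers in the queue structure lets a packet span several
calls of the receive function, since a burst may end in the middle of a frame.
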
[dpdk-dev] [PATCH 6/7] vmxnet3: support RSS and refactor offload

2015-02-25 Thread Stephen Hemminger

Refactor the logic that computes receive offload flags into a simpler
function, and add support for putting the RSS flow hash into the packet.

Signed-off-by: Stephen Hemminger 
Signed-off-by: Bill Hong 
---
 lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c | 69 ---
 1 file changed, 40 insertions(+), 29 deletions(-)

diff --git a/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c b/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c
index 884b57f..82bcae6 100644
--- a/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c
+++ b/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c
@@ -505,6 +505,43 @@ vmxnet3_post_rx_bufs(vmxnet3_rx_queue_t *rxq, uint8_t ring_id)
return i;
 }

+
+/* Receive side checksum and other offloads */
+static void
+vmxnet3_rx_offload(const Vmxnet3_RxCompDesc *rcd, struct rte_mbuf *rxm)
+{
+   /* Check for hardware stripped VLAN tag */
+   if (rcd->ts) {
+   rxm->ol_flags |= PKT_RX_VLAN_PKT;
+   rxm->vlan_tci = rte_le_to_cpu_16((uint16_t)rcd->tci);
+   }
+
+   /* Check for RSS */
+   if (rcd->rssType != VMXNET3_RCD_RSS_TYPE_NONE) {
+   rxm->ol_flags |= PKT_RX_RSS_HASH;
+   rxm->hash.rss = rcd->rssHash;
+   }
+
+   /* Check packet type, checksum errors, etc. Only support IPv4 for now. */
+   if (rcd->v4) {
+   struct ether_hdr *eth = rte_pktmbuf_mtod(rxm, struct ether_hdr *);
+   struct ipv4_hdr *ip = (struct ipv4_hdr *)(eth + 1);
+
+   if (((ip->version_ihl & 0xf) << 2) > (int)sizeof(struct ipv4_hdr))
+   rxm->ol_flags |= PKT_RX_IPV4_HDR_EXT;
+   else
+   rxm->ol_flags |= PKT_RX_IPV4_HDR;
+
+   if (!rcd->cnc) {
+   if (!rcd->ipc)
+   rxm->ol_flags |= PKT_RX_IP_CKSUM_BAD;
+
+   if ((rcd->tcp || rcd->udp) && !rcd->tuc)
+   rxm->ol_flags |= PKT_RX_L4_CKSUM_BAD;
+   }
+   }
+}
+
 /*
  * Process the Rx Completion Ring of given vmxnet3_rx_queue
  * for nb_pkts burst and return the number of packets received
@@ -605,17 +642,6 @@ vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
goto rcd_done;
}

-   /* Check for hardware stripped VLAN tag */
-   if (rcd->ts) {
-   PMD_RX_LOG(DEBUG, "Received packet with vlan ID: %d.",
-  rcd->tci);
-   rxm->ol_flags = PKT_RX_VLAN_PKT;
-   /* Copy vlan tag in packet buffer */
-   rxm->vlan_tci = rte_le_to_cpu_16((uint16_t)rcd->tci);
-   } else {
-   rxm->ol_flags = 0;
-   rxm->vlan_tci = 0;
-   }

/* Initialize newly received packet buffer */
rxm->port = rxq->port_id;
@@ -624,25 +650,10 @@ vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
rxm->pkt_len = (uint16_t)rcd->len;
rxm->data_len = (uint16_t)rcd->len;
rxm->data_off = RTE_PKTMBUF_HEADROOM;
+   rxm->ol_flags = 0;
+   rxm->vlan_tci = 0;

-   /* Check packet type, checksum errors, etc. Only support IPv4 for now. */
-   if (rcd->v4) {
-   struct ether_hdr *eth = rte_pktmbuf_mtod(rxm, struct ether_hdr *);
-   struct ipv4_hdr *ip = (struct ipv4_hdr *)(eth + 1);
-
-   if (((ip->version_ihl & 0xf) << 2) > (int)sizeof(struct ipv4_hdr))
-   rxm->ol_flags |= PKT_RX_IPV4_HDR_EXT;
-   else
-   rxm->ol_flags |= PKT_RX_IPV4_HDR;
-
-   if (!rcd->cnc) {
-   if (!rcd->ipc)
-   rxm->ol_flags |= PKT_RX_IP_CKSUM_BAD;
-
-   if ((rcd->tcp || rcd->udp) && !rcd->tuc)
-   rxm->ol_flags |= PKT_RX_L4_CKSUM_BAD;
-   }
-   }
+   vmxnet3_rx_offload(rcd, rxm);

rx_pkts[nb_rx++] = rxm;
 rcd_done:
-- 
2.1.4
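
Two pieces of the offload logic in this patch are easy to check in isolation:
the IPv4 header-length test on version_ihl and the way the cnc/ipc/tuc
completion bits map to checksum-error flags. The sketch below restates both
with plain integers. The TOY_ flag values and the function names are
placeholders of mine, not the DPDK PKT_RX_* definitions.

```c
#include <stdbool.h>
#include <stdint.h>

#define IPV4_MIN_HDR_LEN 20u /* sizeof(struct ipv4_hdr) without options */

/* Placeholder flag bits, standing in for PKT_RX_*_CKSUM_BAD. */
#define TOY_RX_IP_CKSUM_BAD (1u << 0)
#define TOY_RX_L4_CKSUM_BAD (1u << 1)

/* The low nibble of version_ihl is the header length in 32-bit words. */
static unsigned
ipv4_hdr_len(uint8_t version_ihl)
{
	return (unsigned)(version_ihl & 0x0fu) << 2;
}

/* A header longer than 20 bytes carries IPv4 options. */
static bool
ipv4_has_options(uint8_t version_ihl)
{
	return ipv4_hdr_len(version_ihl) > IPV4_MIN_HDR_LEN;
}

/*
 * cnc: checksum not calculated by the device
 * ipc: IP checksum correct
 * tuc: TCP/UDP checksum correct
 * Errors are only reported when the device actually computed checksums.
 */
static uint32_t
toy_rx_cksum_flags(bool cnc, bool ipc, bool is_tcp_or_udp, bool tuc)
{
	uint32_t flags = 0;

	if (!cnc) {
		if (!ipc)
			flags |= TOY_RX_IP_CKSUM_BAD;
		if (is_tcp_or_udp && !tuc)
			flags |= TOY_RX_L4_CKSUM_BAD;
	}
	return flags;
}
```

For the common 0x45 first byte, ipv4_hdr_len() yields 20, so the packet is
classified as a plain IPv4 header rather than one with extensions.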

[dpdk-dev] [PATCH 5/7] vmxnet3: fix link state handling

2015-02-25 Thread Stephen Hemminger

The Intel version of the VMXNET3 driver does not handle link state properly.
The VMXNET3 API returns 1 if connected and 0 if disconnected.
The link update function also needs to return the correct value to
indicate a state change.

Signed-off-by: Stephen Hemminger 
Acked-by: Yong Wang 
---
 lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c | 54 -
 1 file changed, 39 insertions(+), 15 deletions(-)

diff --git a/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c b/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c
index 570565a..35bb561 100644
--- a/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c
+++ b/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c
@@ -157,9 +157,36 @@ gpa_zone_reserve(struct rte_eth_dev *dev, uint32_t size,
  *   - On success, zero.
  *   - On failure, negative value.
  */
-static inline int
-rte_vmxnet3_dev_atomic_write_link_status(struct rte_eth_dev *dev,
-   struct rte_eth_link *link)
+
+static int
+vmxnet3_dev_atomic_read_link_status(struct rte_eth_dev *dev,
+   struct rte_eth_link *link)
+{
+   struct rte_eth_link *dst = link;
+   struct rte_eth_link *src = &(dev->data->dev_link);
+
+   if (rte_atomic64_cmpset((uint64_t *)dst, *(uint64_t *)dst,
+   *(uint64_t *)src) == 0)
+   return -1;
+
+   return 0;
+}
+
+/**
+ * Atomically writes the link status information into global
+ * structure rte_eth_dev.
+ *
+ * @param dev
+ *   - Pointer to the structure rte_eth_dev to write to.
+ *   - Pointer to the buffer to be saved with the link status.
+ *
+ * @return
+ *   - On success, zero.
+ *   - On failure, negative value.
+ */
+static int
+vmxnet3_dev_atomic_write_link_status(struct rte_eth_dev *dev,
+struct rte_eth_link *link)
 {
struct rte_eth_link *dst = &(dev->data->dev_link);
struct rte_eth_link *src = link;
@@ -391,6 +418,7 @@ vmxnet3_setup_driver_shared(struct rte_eth_dev *dev)
devRead->misc.driverInfo.vmxnet3RevSpt = 1;
devRead->misc.driverInfo.uptVerSpt = 1;

+   devRead->misc.mtu = rte_le_to_cpu_32(dev->data->mtu);
devRead->misc.queueDescPA  = hw->queueDescPA;
devRead->misc.queueDescLen = hw->queue_desc_len;
devRead->misc.numTxQueues  = hw->num_tx_queues;
@@ -576,7 +604,7 @@ vmxnet3_dev_stop(struct rte_eth_dev *dev)

/* Clear recorded link status */
memset(&link, 0, sizeof(link));
-   rte_vmxnet3_dev_atomic_write_link_status(dev, &link);
+   vmxnet3_dev_atomic_write_link_status(dev, &link);
 }

 /*
@@ -659,28 +687,24 @@ static int
 vmxnet3_dev_link_update(struct rte_eth_dev *dev, __attribute__((unused)) int wait_to_complete)
 {
struct vmxnet3_hw *hw = dev->data->dev_private;
-   struct rte_eth_link link;
+   struct rte_eth_link old, link;
uint32_t ret;

+   memset(&link, 0, sizeof(link));
+   vmxnet3_dev_atomic_read_link_status(dev, &old);
+
VMXNET3_WRITE_BAR1_REG(hw, VMXNET3_REG_CMD, VMXNET3_CMD_GET_LINK);
ret = VMXNET3_READ_BAR1_REG(hw, VMXNET3_REG_CMD);

-   if (!ret) {
-   PMD_INIT_LOG(ERR, "Link Status Negative : %s()", __func__);
-   return -1;
-   }
-
if (ret & 0x1) {
link.link_status = 1;
link.link_duplex = ETH_LINK_FULL_DUPLEX;
link.link_speed = ETH_LINK_SPEED_10000;
-
-   rte_vmxnet3_dev_atomic_write_link_status(dev, &link);
-
-   return 0;
}

-   return -1;
+   vmxnet3_dev_atomic_write_link_status(dev, &link);
+
+   return (old.link_status == link.link_status) ? -1 : 0;
 }

 /* Updating rxmode through Vmxnet3_DriverShared structure in adapter */
-- 
2.1.4
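
The return convention introduced here (0 when the link state changed, -1 when
it did not) can be shown without any hardware. In the sketch below the
register read is replaced by a plain argument, and the structure, its field
values and the function name are hypothetical stand-ins rather than the
rte_eth_link definitions.

```c
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct rte_eth_link. */
struct toy_link {
	uint16_t speed;
	uint8_t duplex;
	uint8_t status;
};

/*
 * Record the new link state and report whether it changed.
 * reg stands for the value read back after the GET_LINK command;
 * bit 0 set means the link is up.
 * Returns 0 if the status changed, -1 if it stayed the same.
 */
static int
toy_link_update(struct toy_link *stored, uint32_t reg)
{
	struct toy_link old = *stored;
	struct toy_link link;

	memset(&link, 0, sizeof(link));
	if (reg & 0x1) {
		link.status = 1;
		link.duplex = 1;    /* placeholder for full duplex */
		link.speed = 10000; /* placeholder speed value */
	}

	*stored = link; /* always store, even when the link is down */
	return (old.status == link.status) ? -1 : 0;
}
```

Unlike the old code, a link that goes down is now written back as all zeroes
instead of being treated as an error.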

[dpdk-dev] [PATCH 4/7] vmxnet3: add support for multi-segment transmit

2015-02-25 Thread Stephen Hemminger

Change sending loop to support multi-segment mbufs.
The VMXNET3 API has start-of-packet and end-of-packet flags, so it
is not hard to send multi-segment mbufs.

Also, update the descriptor as a single 32-bit value rather than toggling
bitfields, which is slower and error prone.
Based on code in the earlier driver and the Linux kernel driver.

Add a compiler barrier to make sure that updates to earlier descriptors
are completed prior to the update of the generation bit on start of packet.

Signed-off-by: Stephen Hemminger 
---
v2 -- incorporate # of segments check

 lib/librte_pmd_vmxnet3/vmxnet3_ring.h |   1 +
 lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c | 137 --
 2 files changed, 65 insertions(+), 73 deletions(-)

diff --git a/lib/librte_pmd_vmxnet3/vmxnet3_ring.h b/lib/librte_pmd_vmxnet3/vmxnet3_ring.h
index ebe6268..612487e 100644
--- a/lib/librte_pmd_vmxnet3/vmxnet3_ring.h
+++ b/lib/librte_pmd_vmxnet3/vmxnet3_ring.h
@@ -125,6 +125,7 @@ struct vmxnet3_txq_stats {
 * the counters below track droppings due to
 * different reasons
 */
+   uint64_t drop_too_many_segs;
uint64_t drop_tso;
uint64_t tx_ring_full;
 };
diff --git a/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c b/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c
index 38ac811..884b57f 100644
--- a/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c
+++ b/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c
@@ -312,20 +312,22 @@ vmxnet3_tq_tx_complete(vmxnet3_tx_queue_t *txq)
VMXNET3_ASSERT(txq->cmd_ring.base[tcd->txdIdx].txd.eop == 1);
 #endif
mbuf = txq->cmd_ring.buf_info[tcd->txdIdx].m;
-   if (unlikely(mbuf == NULL))
-   rte_panic("EOP desc does not point to a valid mbuf");
-   else
-   rte_pktmbuf_free(mbuf);
+   rte_pktmbuf_free_seg(mbuf);
+   txq->cmd_ring.buf_info[tcd->txdIdx].m = NULL;

+   while (txq->cmd_ring.next2comp != tcd->txdIdx) {
+   mbuf = txq->cmd_ring.buf_info[txq->cmd_ring.next2comp].m;
+   txq->cmd_ring.buf_info[txq->cmd_ring.next2comp].m = NULL;
+   rte_pktmbuf_free_seg(mbuf);

-   txq->cmd_ring.buf_info[tcd->txdIdx].m = NULL;
-   /* Mark the txd for which tcd was generated as completed */
-   vmxnet3_cmd_ring_adv_next2comp(&txq->cmd_ring);
+   /* Mark the txd for which tcd was generated as completed */
+   vmxnet3_cmd_ring_adv_next2comp(&txq->cmd_ring);
+   completed++;
+   }

vmxnet3_comp_ring_adv_next2proc(comp_ring);
tcd = (struct Vmxnet3_TxCompDesc *)(comp_ring->base +
comp_ring->next2proc);
-   completed++;
}

PMD_TX_LOG(DEBUG, "Processed %d tx comps & command descs.", completed);
@@ -336,13 +338,8 @@ vmxnet3_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
  uint16_t nb_pkts)
 {
uint16_t nb_tx;
-   Vmxnet3_TxDesc *txd = NULL;
-   vmxnet3_buf_info_t *tbi = NULL;
-   struct vmxnet3_hw *hw;
-   struct rte_mbuf *txm;
vmxnet3_tx_queue_t *txq = tx_queue;
-
-   hw = txq->hw;
+   struct vmxnet3_hw *hw = txq->hw;

if (unlikely(txq->stopped)) {
PMD_TX_LOG(DEBUG, "Tx queue is stopped.");
@@ -354,75 +351,69 @@ vmxnet3_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,

nb_tx = 0;
while (nb_tx < nb_pkts) {
+   Vmxnet3_GenericDesc *gdesc;
+   vmxnet3_buf_info_t *tbi;
+   uint32_t first2fill, avail, dw2;
+   struct rte_mbuf *txm = tx_pkts[nb_tx];
+   struct rte_mbuf *m_seg = txm;
+
+   /* Is this packet excessively fragmented? If so, drop it. */
+   if (unlikely(txm->nb_segs > VMXNET3_MAX_TXD_PER_PKT)) {
+   ++txq->stats.drop_too_many_segs;
+   ++txq->stats.drop_total;
+   rte_pktmbuf_free(txm);
+   ++nb_tx;
+   continue;
+   }

-   if (vmxnet3_cmd_ring_desc_avail(&txq->cmd_ring)) {
-   int copy_size = 0;
-
-   txm = tx_pkts[nb_tx];
-   /* Don't support scatter packets yet, free them if met */
-   if (txm->nb_segs != 1) {
-   PMD_TX_LOG(DEBUG, "Don't support scatter packets yet, drop!");
-   rte_pktmbuf_free(tx_pkts[nb_tx]);
-   txq->stats.drop_total++;
-
-   nb_tx++;
-   continue;
-   }
-
-   txd = (Vmxnet3_TxDesc *)(txq->cmd_ring.base + txq->cmd_ring.next2fill);
-

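The patch text is cut off above, so here is a rough, self-contained model of
the scheme the commit message describes, not the driver code itself. One
descriptor is filled per segment, the last one gets an end-of-packet flag, and
the first descriptor keeps a stale generation bit until everything else is
written. The descriptor layout, flag bits and names are hypothetical, and
atomic_signal_fence() from C11 is used here as a portable compiler barrier.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_RING_SIZE 16u
#define TOY_MAX_SEGS  8u         /* stand-in for VMXNET3_MAX_TXD_PER_PKT */
#define TOY_TXD_EOP   (1u << 0)  /* end of packet */
#define TOY_TXD_GEN   (1u << 1)  /* generation bit */

struct toy_seg {
	struct toy_seg *next;
	uint16_t data_len;
};

struct toy_txd {
	uint32_t len;
	uint32_t flags;
};

struct toy_txq {
	struct toy_txd ring[TOY_RING_SIZE];
	uint32_t next2fill;
	uint32_t gen;   /* current generation: 0 or TOY_TXD_GEN */
	uint32_t avail; /* free descriptors */
};

/* Returns 0 on success, -1 if over-fragmented, -2 if the ring is full. */
static int
toy_xmit_one(struct toy_txq *q, const struct toy_seg *pkt)
{
	uint32_t nb_segs = 0;

	for (const struct toy_seg *s = pkt; s != NULL; s = s->next)
		nb_segs++;
	if (nb_segs == 0 || nb_segs > TOY_MAX_SEGS)
		return -1;
	if (nb_segs > q->avail)
		return -2;

	uint32_t first = q->next2fill;
	/* The first descriptor starts with the wrong generation so the
	 * device does not pick the packet up while it is half written. */
	uint32_t flags = q->gen ^ TOY_TXD_GEN;
	struct toy_txd *d = NULL;

	for (const struct toy_seg *s = pkt; s != NULL; s = s->next) {
		d = &q->ring[q->next2fill];
		d->len = s->data_len;
		d->flags = flags; /* one 32-bit store, no bitfield toggling */
		if (++q->next2fill == TOY_RING_SIZE) {
			q->next2fill = 0;
			q->gen ^= TOY_TXD_GEN; /* new lap, new generation */
		}
		flags = q->gen; /* later segments use the live generation */
	}
	d->flags |= TOY_TXD_EOP;
	q->avail -= nb_segs;

	/* Keep the compiler from moving the stores above past this point. */
	atomic_signal_fence(memory_order_release);
	q->ring[first].flags ^= TOY_TXD_GEN; /* hand the packet over */
	return 0;
}
```

Flipping the generation bit of the first descriptor last means the device
sees either none of the packet or all of it.
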
[dpdk-dev] [PATCH 3/7] vmxnet3: cleanup txq stats

2015-02-25 Thread Stephen Hemminger

There are several stats here which are never set, and have no way
to be displayed.  Assume in future xstats could be used.

Signed-off-by: Stephen Hemminger 
---
 lib/librte_pmd_vmxnet3/vmxnet3_ring.h | 16 ++--
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/lib/librte_pmd_vmxnet3/vmxnet3_ring.h b/lib/librte_pmd_vmxnet3/vmxnet3_ring.h
index c5abdb6..ebe6268 100644
--- a/lib/librte_pmd_vmxnet3/vmxnet3_ring.h
+++ b/lib/librte_pmd_vmxnet3/vmxnet3_ring.h
@@ -121,16 +121,12 @@ vmxnet3_comp_ring_adv_next2proc(struct vmxnet3_comp_ring *ring)
 }

 struct vmxnet3_txq_stats {
-   uint64_t   drop_total; /* # of pkts dropped by the driver, the
-  * counters below track droppings due to
-  * different reasons
-  */
-   uint64_t   drop_oversized;
-   uint64_t   drop_hdr_inspect_err;
-   uint64_t   drop_tso;
-   uint64_t   deferred;
-   uint64_t   tx_ring_full;
-   uint64_t   linearized;  /* # of pkts linearized */
+   uint64_t drop_total; /* # of pkts dropped by the driver,
+* the counters below track droppings due to
+* different reasons
+*/
+   uint64_t drop_tso;
+   uint64_t tx_ring_full;
 };

 typedef struct vmxnet3_tx_ctx {
-- 
2.1.4

[dpdk-dev] [PATCH 2/7] vmxnet3: remove mtu check

2015-02-25 Thread Stephen Hemminger

Remove check for packets greater than MTU. No other driver does
this; it should be handled at a higher layer.

Signed-off-by: Stephen Hemminger 
Acked-by: Yong Wang 
---
 lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c |  2 --
 lib/librte_pmd_vmxnet3/vmxnet3_ethdev.h |  1 -
 lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c   | 10 --
 3 files changed, 13 deletions(-)

diff --git a/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c b/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c
index 23b4558..570565a 100644
--- a/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c
+++ b/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c
@@ -219,7 +219,6 @@ eth_vmxnet3_dev_init(__attribute__((unused)) struct eth_driver *eth_drv,

hw->num_rx_queues = 1;
hw->num_tx_queues = 1;
-   hw->cur_mtu = ETHER_MTU;
hw->bufs_per_pkt = 1;

/* Check h/w version compatibility with driver. */
@@ -394,7 +393,6 @@ vmxnet3_setup_driver_shared(struct rte_eth_dev *dev)

devRead->misc.queueDescPA  = hw->queueDescPA;
devRead->misc.queueDescLen = hw->queue_desc_len;
-   devRead->misc.mtu  = hw->cur_mtu;
devRead->misc.numTxQueues  = hw->num_tx_queues;
devRead->misc.numRxQueues  = hw->num_rx_queues;

diff --git a/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.h b/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.h
index e97e3ca..b392061 100644
--- a/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.h
+++ b/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.h
@@ -107,7 +107,6 @@ struct vmxnet3_hw {
uint8_t num_tx_queues;
uint8_t num_rx_queues;
uint8_t bufs_per_pkt;
-   uint16_t cur_mtu;

Vmxnet3_TxQueueDesc   *tqd_start;   /* start address of all tx queue desc */
Vmxnet3_RxQueueDesc   *rqd_start;   /* start address of all rx queue desc */
diff --git a/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c b/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c
index 5fe3de5..38ac811 100644
--- a/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c
+++ b/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c
@@ -369,16 +369,6 @@ vmxnet3_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
continue;
}

-   /* Needs to minus ether header len */
-   if (txm->data_len > (hw->cur_mtu + ETHER_HDR_LEN)) {
-   PMD_TX_LOG(DEBUG, "Packet data_len higher than MTU");
-   rte_pktmbuf_free(tx_pkts[nb_tx]);
-   txq->stats.drop_total++;
-
-   nb_tx++;
-   continue;
-   }
-
txd = (Vmxnet3_TxDesc *)(txq->cmd_ring.base + txq->cmd_ring.next2fill);
if (rte_pktmbuf_pkt_len(txm) <= VMXNET3_HDR_COPY_SIZE) {
struct Vmxnet3_TxDataDesc *tdd;
-- 
2.1.4

[dpdk-dev] [PATCH 1/7] vmxnet3: enable VLAN filtering

2015-02-25 Thread Stephen Hemminger

Support the VLAN filter functionality of the VMXNET3 interface.

Signed-off-by: Stephen Hemminger 
---
v2 -- incorporate comments from Yong Wang

 lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c | 105 +---
 lib/librte_pmd_vmxnet3/vmxnet3_ethdev.h |   3 +-
 lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c   |  31 +-
 3 files changed, 101 insertions(+), 38 deletions(-)

diff --git a/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c b/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c
index 6068c60..23b4558 100644
--- a/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c
+++ b/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c
@@ -87,6 +87,12 @@ static void vmxnet3_dev_stats_get(struct rte_eth_dev *dev,
struct rte_eth_stats *stats);
 static void vmxnet3_dev_info_get(struct rte_eth_dev *dev,
struct rte_eth_dev_info *dev_info);
+static int vmxnet3_dev_vlan_filter_set(struct rte_eth_dev *dev,
+  uint16_t vid, int on);
+static void vmxnet3_dev_vlan_offload_set(struct rte_eth_dev *dev, int mask);
+static void vmxnet3_dev_vlan_offload_set_clear(struct rte_eth_dev *dev,
+   int mask, int clear);
+
 #if PROCESS_SYS_EVENTS == 1
 static void vmxnet3_process_events(struct vmxnet3_hw *);
 #endif
@@ -113,6 +119,8 @@ static struct eth_dev_ops vmxnet3_eth_dev_ops = {
.link_update  = vmxnet3_dev_link_update,
.stats_get= vmxnet3_dev_stats_get,
.dev_infos_get= vmxnet3_dev_info_get,
+   .vlan_filter_set  = vmxnet3_dev_vlan_filter_set,
+   .vlan_offload_set = vmxnet3_dev_vlan_offload_set,
.rx_queue_setup   = vmxnet3_dev_rx_queue_setup,
.rx_queue_release = vmxnet3_dev_rx_queue_release,
.tx_queue_setup   = vmxnet3_dev_tx_queue_setup,
@@ -371,7 +379,7 @@ vmxnet3_setup_driver_shared(struct rte_eth_dev *dev)
Vmxnet3_DSDevRead *devRead = &shared->devRead;
uint32_t *mac_ptr;
uint32_t val, i;
-   int ret;
+   int ret, mask;

shared->magic = VMXNET3_REV1_MAGIC;
devRead->misc.driverInfo.version = VMXNET3_DRIVER_VERSION_NUM;
@@ -442,9 +450,6 @@ vmxnet3_setup_driver_shared(struct rte_eth_dev *dev)
if (dev->data->dev_conf.rxmode.hw_ip_checksum)
devRead->misc.uptFeatures |= VMXNET3_F_RXCSUM;

-   if (dev->data->dev_conf.rxmode.hw_vlan_strip)
-   devRead->misc.uptFeatures |= VMXNET3_F_RXVLAN;
-
if (port_conf.rxmode.mq_mode == ETH_MQ_RX_RSS) {
ret = vmxnet3_rss_configure(dev);
if (ret != VMXNET3_SUCCESS)
@@ -456,11 +461,14 @@ vmxnet3_setup_driver_shared(struct rte_eth_dev *dev)
devRead->rssConfDesc.confPA  = hw->rss_confPA;
}

-   if (dev->data->dev_conf.rxmode.hw_vlan_filter) {
-   ret = vmxnet3_vlan_configure(dev);
-   if (ret != VMXNET3_SUCCESS)
-   return ret;
-   }
+   mask = 0;
+   if (dev->data->dev_conf.rxmode.hw_vlan_strip)
+   mask |= ETH_VLAN_STRIP_MASK;
+
+   if (dev->data->dev_conf.rxmode.hw_vlan_filter)
+   mask |= ETH_VLAN_FILTER_MASK;
+
+   vmxnet3_dev_vlan_offload_set_clear(dev, mask, 1);

PMD_INIT_LOG(DEBUG,
 "Writing MAC Address : %02x:%02x:%02x:%02x:%02x:%02x",
@@ -696,8 +704,13 @@ static void
 vmxnet3_dev_promiscuous_enable(struct rte_eth_dev *dev)
 {
struct vmxnet3_hw *hw = dev->data->dev_private;
+   uint32_t *vf_table = hw->shared->devRead.rxFilterConf.vfTable;

+   memset(vf_table, 0, VMXNET3_VFT_TABLE_SIZE);
vmxnet3_dev_set_rxmode(hw, VMXNET3_RXM_PROMISC, 1);
+
+   VMXNET3_WRITE_BAR1_REG(hw, VMXNET3_REG_CMD,
+  VMXNET3_CMD_UPDATE_VLAN_FILTERS);
 }

 /* Promiscuous supported only if Vmxnet3_DriverShared is initialized in adapter */
@@ -705,8 +718,12 @@ static void
 vmxnet3_dev_promiscuous_disable(struct rte_eth_dev *dev)
 {
struct vmxnet3_hw *hw = dev->data->dev_private;
+   uint32_t *vf_table = hw->shared->devRead.rxFilterConf.vfTable;

+   memcpy(vf_table, hw->shadow_vfta, VMXNET3_VFT_TABLE_SIZE);
vmxnet3_dev_set_rxmode(hw, VMXNET3_RXM_PROMISC, 0);
+   VMXNET3_WRITE_BAR1_REG(hw, VMXNET3_REG_CMD,
+  VMXNET3_CMD_UPDATE_VLAN_FILTERS);
 }

 /* Allmulticast supported only if Vmxnet3_DriverShared is initialized in adapter */
@@ -727,6 +744,76 @@ vmxnet3_dev_allmulticast_disable(struct rte_eth_dev *dev)
vmxnet3_dev_set_rxmode(hw, VMXNET3_RXM_ALL_MULTI, 0);
 }

+/* Enable/disable filter on vlan */
+static int
+vmxnet3_dev_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vid, int on)
+{
+   struct vmxnet3_hw *hw = dev->data->dev_private;
+   struct Vmxnet3_RxFilterConf *rxConf = &hw->shared->devRead.rxFilterConf;
+   uint32_t *vf_table = rxConf->vfTable;
+
+   /* save state for restore */
+   if (on)
+

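The body of vmxnet3_dev_vlan_filter_set() is cut off in the archive, so what
follows is only a sketch of the shadow-table idea visible in the promiscuous
enable/disable hunks above: the live filter table is cleared while promiscuous
mode is on and restored from a saved copy when it is turned off. The bitmap
helpers, the structure and the rule of not touching the live table while
promiscuous are my assumptions, not code from the patch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_VFT_WORDS 128u /* 4096 VLAN IDs / 32 bits per word */

struct toy_hw {
	uint32_t shadow_vfta[TOY_VFT_WORDS]; /* saved state for restore */
	uint32_t vf_table[TOY_VFT_WORDS];    /* table the device reads */
	bool promisc;
};

/* Set or clear the bit for one VLAN ID in a filter bitmap. */
static void
vft_set(uint32_t *tbl, uint16_t vid, bool on)
{
	if (on)
		tbl[vid >> 5] |= 1u << (vid & 31u);
	else
		tbl[vid >> 5] &= ~(1u << (vid & 31u));
}

static bool
vft_test(const uint32_t *tbl, uint16_t vid)
{
	return (tbl[vid >> 5] >> (vid & 31u)) & 1u;
}

/* Returns 0 on success, -1 for an out-of-range VLAN ID. */
static int
toy_vlan_filter_set(struct toy_hw *hw, uint16_t vid, bool on)
{
	if (vid > 4095)
		return -1;
	vft_set(hw->shadow_vfta, vid, on); /* always remember the setting */
	if (!hw->promisc)
		vft_set(hw->vf_table, vid, on);
	return 0;
}

static void
toy_promisc_set(struct toy_hw *hw, bool enable)
{
	hw->promisc = enable;
	if (enable)
		memset(hw->vf_table, 0, sizeof(hw->vf_table));
	else
		memcpy(hw->vf_table, hw->shadow_vfta, sizeof(hw->vf_table));
}
```

In the driver each of these changes is followed by the
VMXNET3_CMD_UPDATE_VLAN_FILTERS command so the device re-reads the table; the
sketch leaves that step out.
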
[dpdk-dev] Manage DPDK port capability via KNI

2015-02-25 Thread Tim Deng

Thanks Danny,
That means DPDK ports need a dedicated control path other than KNI.


I was originally confused by the statement at
http://dpdk.org/doc/guides/prog_guide/kernel_nic_interface.html:
"...Allows management of DPDK ports using standard Linux net tools such as
ethtool, ifconfig and tcpdump."
Thanks,
Tim


At 2015-02-25 13:05:08, "Zhou, Danny"  wrote:
>You can do it, but it will not be synced with DPDK. In the current KNI
>implementation, the devices' I/O address spaces are mapped to both user-space
>DPDK and kernel-space KNI, so one can control the NIC device independently
>(using ethtool for KNI and ethdev APIs for DPDK) without synchronization.
>
>In theory, KNI should route all device control requests from ethtool to DPDK.
>Unfortunately, a shortcut is adopted at the moment because DPDK reuses a lot
>of legacy kernel code under a BSD license.
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Tim Deng
>> Sent: Wednesday, February 25, 2015 9:57 AM
>> To: dev at dpdk.org
>> Subject: [dpdk-dev] Manage DPDK port capability via KNI
>> 
>> Hi,
>> 
>> 
>> I am wondering how could we manage a DPDK port offload capabilities,
>> e.g. if we want to disable TSO capability on a DPDK port, is it feasible
>> that we use ethtool to configure a KNI then the config will be sync to a 
>> DPDK port?
>> 
>> 
>> Thanks,
>> Tim

[dpdk-dev] [PATCH v2 4/7] rte_sched: don't clear statistics when read

2015-02-25 Thread Dumitrescu, Cristian



> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas Monjalon
> Sent: Tuesday, February 24, 2015 8:07 PM
> To: Stephen Hemminger
> Cc: dev at dpdk.org; Stephen Hemminger
> Subject: Re: [dpdk-dev] [PATCH v2 4/7] rte_sched: don't clear statistics when
> read
> 
> 2015-02-24 11:18, Stephen Hemminger:
> > On Mon, 23 Feb 2015 23:51:31 +
> > Thomas Monjalon  wrote:
> >
> > > 2015-02-05 07:43, Neil Horman:
> > > > On Wed, Feb 04, 2015 at 10:13:58PM -0800, Stephen Hemminger wrote:
> > > > > +
> > > > > +/**
> > > > > + * Hierarchical scheduler subport statistics reset
> > > > > + *
> > > > > + * @param port
> > > > > + *   Handle to port scheduler instance
> > > > > + * @param subport_id
> > > > > + *   Subport ID
> > > > > + * @return
> > > > > + *   0 upon success, error code otherwise
> > > > > + */
> > > > > +int
> > > > > +rte_sched_subport_stats_reset(struct rte_sched_port *port,
> > > > > +   uint32_t subport_id);
> > > > > +
> > > > >  /**
> > > > >   * Hierarchical scheduler queue statistics read
> > > > >   *
> > > > > @@ -338,6 +353,20 @@ rte_sched_queue_read_stats(struct
> rte_sched_port *port,
> > > > >   struct rte_sched_queue_stats *stats,
> > > > >   uint16_t *qlen);
> > > > >
> > > > > +/**
> > > > > + * Hierarchical scheduler queue statistics reset
> > > > > + *
> > > > > + * @param port
> > > > > + *   Handle to port scheduler instance
> > > > > + * @param queue_id
> > > > > + *   Queue ID within port scheduler
> > > > > + * @return
> > > > > + *   0 upon success, error code otherwise
> > > > > + */
> > > > > +int
> > > > > +rte_sched_queue_stats_reset(struct rte_sched_port *port,
> > > > > + uint32_t queue_id);
> > > > > +
> > > > Both need to be added to the version map to expose them properly.
> > > > Neil
> > >
> > > Stephen, this patchset is partially acked and could go into 2.0.0-rc1.
> > > Could you send a v3 addressing the comments? Or should I break up the
> > > series by applying only some of the patches? Or postpone the series to 2.1?
> >
> > I can resend v3. Wasn't clear that a conclusion was reached.
> > IMHO read should not clear.
> 
> Me too. I'm just saying that I cannot apply anything.
> So you have to decide the strategy to adopt for your patches.

How about my latest proposal to have the stats read functions either reset the 
counters or not, based on init-time user configuration? I did not see any reply 
on this.

Maybe you guys missed my reply, I am pasting it below:

"Personally, I think we should avoid proliferating the number of stats 
functions, I would keep a single set of stats read functions, which can clear 
the stats or not, depending on behaviour configured per rte_sched object at 
creation time. Basically, based on the value of configuration parameter struct 
rte_sched_params::clear_stats_on_reset, the stats read functions do clear the 
counters or not. In my opinion, this allows a clean init-time selection of the 
required behaviour, and it also provides backward compatibility. Any issues 
with this approach?"
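
The proposal quoted above (one set of read functions whose clearing behaviour is fixed when the object is created) can be sketched roughly as follows. This is only an illustration under stated assumptions: every `demo_*` name and the `clear_stats_on_read` field are hypothetical and are not part of the librte_sched API.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical types for illustration only; not the librte_sched API. */
struct demo_stats {
	uint64_t n_pkts;
	uint64_t n_bytes;
};

struct demo_sched_params {
	int clear_stats_on_read; /* chosen once, at creation time */
};

struct demo_sched {
	struct demo_sched_params params;
	struct demo_stats stats;
};

static void
demo_sched_init(struct demo_sched *s, const struct demo_sched_params *p)
{
	memset(s, 0, sizeof(*s));
	s->params = *p;
}

/* Account one packet of pkt_len bytes. */
static void
demo_sched_account(struct demo_sched *s, uint32_t pkt_len)
{
	s->stats.n_pkts++;
	s->stats.n_bytes += pkt_len;
}

/* Single read function; whether it clears depends on init-time config. */
static int
demo_sched_read_stats(struct demo_sched *s, struct demo_stats *out)
{
	if (s == NULL || out == NULL)
		return -1;
	*out = s->stats;
	if (s->params.clear_stats_on_read)
		memset(&s->stats, 0, sizeof(s->stats));
	return 0;
}
```

With the flag set to 0, two consecutive reads return the same counters; with it set to 1, the second read returns zeros. Existing callers that rely on clear-on-read would keep their behaviour by setting the flag at creation.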

[dpdk-dev] [PATCH] kni:optimization of rte_kni_rx_burst

2015-02-25 Thread Hemant Agrawal

From: Hemant Agrawal 

If any buffer is read from the tx_q, MAX_BURST buffers are allocated and an
attempt is made to add them all to the alloc_q.
This seems terribly inefficient, and it also looks like the alloc_q will quickly
fill to its maximum capacity. If the system is low on buffers, it will
reach an "out of memory" situation.

This patch allocates only as many buffers as were dequeued from tx_q.

Signed-off-by: Hemant Agrawal 
---
 lib/librte_kni/rte_kni.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c
index 4e70fa0..4cf8e30 100644
--- a/lib/librte_kni/rte_kni.c
+++ b/lib/librte_kni/rte_kni.c
@@ -128,7 +128,7 @@ struct rte_kni_memzone_pool {


 static void kni_free_mbufs(struct rte_kni *kni);
-static void kni_allocate_mbufs(struct rte_kni *kni);
+static void kni_allocate_mbufs(struct rte_kni *kni, int num);

 static volatile int kni_fd = -1;
 static struct rte_kni_memzone_pool kni_memzone_pool = {
@@ -575,7 +575,7 @@ rte_kni_rx_burst(struct rte_kni *kni, struct rte_mbuf 
**mbufs, unsigned num)

/* If buffers removed, allocate mbufs and then put them into alloc_q */
if (ret)
-   kni_allocate_mbufs(kni);
+   kni_allocate_mbufs(kni, ret);

return ret;
 }
@@ -594,7 +594,7 @@ kni_free_mbufs(struct rte_kni *kni)
 }

 static void
-kni_allocate_mbufs(struct rte_kni *kni)
+kni_allocate_mbufs(struct rte_kni *kni, int num)
 {
int i, ret;
struct rte_mbuf *pkts[MAX_MBUF_BURST_NUM];
@@ -620,7 +620,10 @@ kni_allocate_mbufs(struct rte_kni *kni)
return;
}

-   for (i = 0; i < MAX_MBUF_BURST_NUM; i++) {
+   if (num == 0 || num > MAX_MBUF_BURST_NUM)
+   num = MAX_MBUF_BURST_NUM;
+
+   for (i = 0; i < num; i++) {
pkts[i] = rte_pktmbuf_alloc(kni->pktmbuf_pool);
if (unlikely(pkts[i] == NULL)) {
/* Out of memory */
@@ -636,7 +639,7 @@ kni_allocate_mbufs(struct rte_kni *kni)
ret = kni_fifo_put(kni->alloc_q, (void **)pkts, i);

/* Check if any mbufs not put into alloc_q, and then free them */
-   if (ret >= 0 && ret < i && ret < MAX_MBUF_BURST_NUM) {
+   if (ret >= 0 && ret < i && ret < num) {
int j;

for (j = ret; j < i; j++)
-- 
1.9.1
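
The clamping rule introduced by the patch above can be tried in isolation with a small stand-alone sketch. It assumes plain `malloc()` in place of `rte_pktmbuf_alloc()`, and `DEMO_MAX_BURST` is a stand-in for `MAX_MBUF_BURST_NUM` whose value here is an assumption; the `demo_*` names are hypothetical. Unlike the patch, the sketch also treats negative requests as "full burst".

```c
#include <stddef.h>
#include <stdlib.h>

#define DEMO_MAX_BURST 32 /* stand-in for MAX_MBUF_BURST_NUM; value assumed */

/*
 * Allocate up to `num` buffers into bufs[], which must have room for
 * DEMO_MAX_BURST entries. A request of 0 (or anything out of range)
 * falls back to a full burst. Returns the number actually allocated.
 */
static int
demo_alloc_burst(void *bufs[], int num, size_t buf_size)
{
	int i;

	if (num <= 0 || num > DEMO_MAX_BURST)
		num = DEMO_MAX_BURST;

	for (i = 0; i < num; i++) {
		bufs[i] = malloc(buf_size);
		if (bufs[i] == NULL)
			break; /* out of memory: keep what was allocated */
	}
	return i;
}

/* Release the first n buffers of bufs[]. */
static void
demo_free_burst(void *bufs[], int n)
{
	int i;

	for (i = 0; i < n; i++)
		free(bufs[i]);
}
```

Calling `demo_alloc_burst(bufs, ret, size)` with the number of dequeued packets keeps the refill proportional to what was consumed, which is the point of the patch.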

[dpdk-dev] ixgbe vector mode not working.

2015-02-25 Thread Stephen Hemminger

On Wed, 25 Feb 2015 08:49:48 +
"Liang, Cunming"  wrote:

> Hi Stephen,
> 
> Thanks for the info, with rxd=4000, I can reproduce it.
> On that time, it runs out of mbuf.
> I'll follow up this issue.

The first time I ran it, the code was configuring an rx/tx conf
that was left over from older versions.

The second time I ran it, the same hang happened.
Looking at mbuf pool statistics, I see that it gets exhausted,
even when extra mbufs are added to the pool.

Looks like a memory leak.

[dpdk-dev] [PATCH v5 5/6] eal: add per rx queue interrupt handling based on VFIO

2015-02-25 Thread David Marchand

On Wed, Feb 25, 2015 at 4:29 PM, Zhou, Danny  wrote:

>
> *From:* David Marchand [mailto:david.marchand at 6wind.com]
> *Sent:* Wednesday, February 25, 2015 6:22 PM
> *To:* Zhou, Danny
> *Cc:* dev at dpdk.org; Liang, Cunming
> *Subject:* Re: [dpdk-dev] [PATCH v5 5/6] eal: add per rx queue interrupt
> handling based on VFIO
>
> DZ: To avoid recreating the epoll instance for each queue, the struct
> rte_intr_handle(or a new structure added to ethdev) should be extended by
> adding fields storing per-queue pfd. This way, it could reduce user/kernel
> context switch overhead when calling epoll_create() each time.
>
> Sounds good?
>
> You don't need a epfd per queue. And hardcoding epfd == eventfd will give
> a not very usable api.
>
> Plus, epoll is something linux-specific, so you can't move it out of
> eal/linux.
> I suppose you need an abstraction here (and in the future we could add
> something for bsd ?).
>
> DZ: libev provides abstraction layer which is a good candidate to
> integrate, rather than reinventing one I think. The BSD support can be
> implemented in the files under lib\librte_eal\bsdapp\eal folder by calling
> BSD specific APIs. Maybe it is a good idea to introduce a separated
> component like OS Adaption Layer into EAL in the future once DPDK is widely
> adopted in BSD as well as Windows, then all DPDK components invoke Linux
> specific APIs could instead calling abstraction APIs.
>
> Adding an abstraction here specifically for epoll does not resolve all the
> porting/migration problem in my mind.
>

Yes, reusing this kind of library (or libevent) looks like a good idea.

Hum, I would say eal/common is there for the common part and for the
different abstractions.
Do you see anything that would not fit in ?



> eventfds creation can not be handled by ethdev, since it needs
> infrastructure and information from within the eal/linux.
> Again, do we need an abstraction ?
>
> ethdev must be the one that does the mappings between port/queue and
> eventfds (or any object that represents a way to wake up for a given
> port/queue).
>
> DZ: agreed after revisiting code. Let us follow your direction to create a
> ethdev API, similar to
> rte_eth_dev_rx_queue_intr_enable()/rte_eth_dev_rx_queue_intr_disable(), to
> use portid and queueid as arguments. Then this ethdev API uses the mapped
> eventfds to invoke corresponding EAL API, waiting for interrupt event
> notification from kernel.  A V6 patchset will be created for this.
>

Ok, I will look at it when available.


-- 
David Marchand

[dpdk-dev] [PATCH v11 2/2] librte_pmd_null: Support port hotplug function

2015-02-25 Thread Stephen Hemminger

Build fails if HOTPLUG is disabled

== Build lib/librte_ether
  CC rte_ethdev.o
/var/src/dpdk/lib/librte_ether/rte_ethdev.c:430:1: error: 
'rte_eth_dev_get_device_type' defined but not used [-Werror=unused-function]
 rte_eth_dev_get_device_type(uint8_t port_id)
 ^
/var/src/dpdk/lib/librte_ether/rte_ethdev.c:438:1: error: 'rte_eth_dev_save' 
defined but not used [-Werror=unused-function]
 rte_eth_dev_save(struct rte_eth_dev *devs, size_t size)
 ^
/var/src/dpdk/lib/librte_ether/rte_ethdev.c:450:1: error: 
'rte_eth_dev_get_changed_port' defined but not used [-Werror=unused-function]
 rte_eth_dev_get_changed_port(struct rte_eth_dev *devs, uint8_t *port_id)
 ^
/var/src/dpdk/lib/librte_ether/rte_ethdev.c:464:1: error: 
'rte_eth_dev_get_addr_by_port' defined but not used [-Werror=unused-function]
 rte_eth_dev_get_addr_by_port(uint8_t port_id, struct rte_pci_addr *addr)
 ^
/var/src/dpdk/lib/librte_ether/rte_ethdev.c:481:1: error: 
'rte_eth_dev_get_name_by_port' defined but not used [-Werror=unused-function]
 rte_eth_dev_get_name_by_port(uint8_t port_id, char *name)
 ^
/var/src/dpdk/lib/librte_ether/rte_ethdev.c:503:1: error: 
'rte_eth_dev_is_detachable' defined but not used [-Werror=unused-function]
 rte_eth_dev_is_detachable(uint8_t port_id)
 ^
cc1: all warnings being treated as errors
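
GCC reports "defined but not used" for static functions that nothing in the translation unit references, which is what happens when their only callers are compiled out. One common remedy, sketched below, is to place such helpers under the same preprocessor guard as their callers. `DEMO_HOTPLUG` and the `demo_*` functions are hypothetical stand-ins; the actual guard macro and the fix eventually applied are not shown in this message.

```c
#include <stdint.h>

/* DEMO_HOTPLUG is a hypothetical stand-in for the real build option. */
#ifdef DEMO_HOTPLUG
/* Only the hotplug path calls this helper, so it shares the guard. */
static int
demo_port_is_detachable(uint8_t port_id)
{
	return (port_id % 2) == 0; /* placeholder policy for the sketch */
}
#endif

/* Returns 0 if the port may be detached, -1 otherwise. */
int
demo_port_detach(uint8_t port_id)
{
#ifdef DEMO_HOTPLUG
	if (!demo_port_is_detachable(port_id))
		return -1;
	return 0;
#else
	(void)port_id; /* silence unused-parameter warnings */
	return -1;     /* hotplug support not compiled in */
#endif
}
```

Built without `-DDEMO_HOTPLUG`, the helper is never emitted, so `-Werror=unused-function` has nothing to complain about; built with it, the helper has a caller.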

[dpdk-dev] [PATCH v5 5/6] eal: add per rx queue interrupt handling based on VFIO

2015-02-25 Thread Thomas Monjalon

Please Danny, click on the button "uninstall Outlook"
or configure it to have quote marks.
This email is really hard to read.

2015-02-25 15:29, Zhou, Danny:
> From: David Marchand [mailto:david.marchand at 6wind.com]
> Sent: Wednesday, February 25, 2015 6:22 PM
> To: Zhou, Danny
> Cc: dev at dpdk.org; Liang, Cunming
> Subject: Re: [dpdk-dev] [PATCH v5 5/6] eal: add per rx queue interrupt 
> handling based on VFIO
> 
> Hello Danny,
> 
> On Wed, Feb 25, 2015 at 7:58 AM, Zhou, Danny <danny.zhou at intel.com> wrote:
> 
> +int
> +rte_intr_wait_rx_pkt(struct rte_intr_handle *intr_handle, uint8_t queue_id)
> +{
> +   struct epoll_event ev;
> +   unsigned numfds = 0;
> +
> +   if (!intr_handle || intr_handle->fd < 0 || intr_handle->uio_cfg_fd < 
> 0)
> +   return -1;
> +   if (queue_id >= VFIO_MAX_QUEUE_ID)
> +   return -1;
> +
> +   /* create epoll fd */
> +   int pfd = epoll_create(1);
> +   if (pfd < 0) {
> +   RTE_LOG(ERR, EAL, "Cannot create epoll instance\n");
> +   return -1;
> +   }
> 
> Why recreate the epoll instance at each call to this function ?
> 
> DZ: To avoid recreating the epoll instance for each queue, the struct 
> rte_intr_handle(or a new structure added to ethdev)
> should be extended by adding fields storing per-queue pfd. This way, it could 
> reduce user/kernel context  switch overhead
> when calling epoll_create() each time.
> 
> Sounds good?
> 
> You don't need a epfd per queue. And hardcoding epfd == eventfd will give a 
> not very usable api.
> 
> Plus, epoll is something linux-specific, so you can't move it out of 
> eal/linux.
> I suppose you need an abstraction here (and in the future we could add 
> something for bsd ?).
> 
> DZ: libev provides abstraction layer which is a good candidate to integrate, 
> rather than
> reinventing one I think. The BSD support can be implemented in the files under
> lib\librte_eal\bsdapp\eal folder by calling BSD specific APIs. Maybe it is a 
> good idea to introduce
> a separated component like OS Adaption Layer into EAL in the future once DPDK 
> is widely adopted in
> BSD as well as Windows, then all DPDK components invoke Linux specific APIs 
> could instead calling abstraction APIs.

EAL means Environment Abstraction Layer.
In my mind, OS is part of the environment.
DPDK components don't invoke Linux specific APIs, they use EAL!
What are you thinking about?

> Adding an abstraction here specifically for epoll does not resolve all the 
> porting/migration problem in my mind.

[dpdk-dev] [PATCH v2] ixgbe: fix build with gcc 5

2015-02-25 Thread Thomas Monjalon

> > gcc 5 supports a new logical-not-parentheses warning which
> > ixgbe_common.c triggers, causing build failure with -Werror.
> > Since this source must not be modified, silence the warning instead.
> > 
> > Signed-off-by: Panu Matilainen 
> 
> Acked-by: Konstantin Ananyev 

Applied, thanks
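
For readers unfamiliar with the warning: `-Wlogical-not-parentheses` flags comparisons whose left operand is a logical not, such as `!a == b`, because the negation binds before the comparison. The ixgbe expression itself is not quoted in this thread, so the sketch below is a generic illustration with hypothetical function names; both variants are written with explicit parentheses so that they compile without the warning.

```c
#include <stdbool.h>

/* What `!flags == want` actually computes: the negation binds first. */
static bool
demo_not_then_compare(int flags, int want)
{
	return (!flags) == want;
}

/* What the author of such an expression often intended. */
static bool
demo_compare_then_not(int flags, int want)
{
	return !(flags == want);
}
```

For `flags = 2, want = 3` the first function returns false (0 is compared with 3) while the second returns true, which is why the compiler asks for explicit parentheses. When the source cannot be edited, as here, the warning is usually disabled for that file with `-Wno-logical-not-parentheses`; the exact Makefile change is not shown in this message.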

[dpdk-dev] [PATCH v3] app/test: add crc32 algorithms equivalence check

2015-02-25 Thread Thomas Monjalon

> > New function test_crc32_hash_alg_equiv() checks whether software,
> > 4-byte operand and 8-byte operand versions of CRC32 hash function
> > implementations return the same result value.
> > 
> > Signed-off-by: Yerden Zhumabekov 
> 
> Acked-by: Bruce Richardson 

Applied, thanks
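
The idea behind such an equivalence check can be shown with a portable sketch: a bitwise CRC32C (Castagnoli) update fed one, four, or eight bytes at a time must produce the same value. This is a software-only emulation under stated assumptions; it does not use the SSE4.2 instructions or the rte_hash_crc API, and all `demo_*` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_CRC32C_POLY 0x82F63B78u /* reflected Castagnoli polynomial */

/* Fold the low `nbits` of `data` (LSB first) into the CRC state. */
static uint32_t
demo_crc32c_fold(uint32_t crc, uint64_t data, unsigned nbits)
{
	uint64_t state = (uint64_t)crc ^ data;
	unsigned i;

	for (i = 0; i < nbits; i++)
		state = (state >> 1) ^ ((state & 1) ? DEMO_CRC32C_POLY : 0);
	return (uint32_t)state;
}

/* Assemble nbytes (at most 8) into a little-endian integer. */
static uint64_t
demo_load_le(const uint8_t *p, unsigned nbytes)
{
	uint64_t v = 0;
	unsigned i;

	for (i = 0; i < nbytes; i++)
		v |= (uint64_t)p[i] << (8 * i);
	return v;
}

/* Hash buf using operands of `width` bytes (1, 4 or 8), then a byte tail. */
static uint32_t
demo_crc32c(const uint8_t *buf, size_t len, unsigned width, uint32_t init)
{
	uint32_t crc = init;
	size_t i = 0;

	for (; i + width <= len; i += width)
		crc = demo_crc32c_fold(crc, demo_load_le(buf + i, width),
				       8 * width);
	for (; i < len; i++)
		crc = demo_crc32c_fold(crc, buf[i], 8);
	return crc;
}
```

A test in the spirit of test_crc32_hash_alg_equiv() would fill a buffer with data, call `demo_crc32c()` with widths 1, 4 and 8, and fail if any two results differ.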

[dpdk-dev] [PATCH v3 0/3] Mellanox ConnectX-3 PMD

2015-02-25 Thread Thomas Monjalon

> This PMD adds support for Mellanox ConnectX-3-based adapters through the
> verbs framework. It relies on external libraries (libibverbs and user space
> driver libmlx4) and kernel support to do so.
> 
> While these libraries and kernel modules are available on OpenFabrics
> Alliance's website [1] and provided by package managers on most
> distributions, this PMD requires Ethernet extensions that may not be
> supported at the moment (this is a work in progress).
> 
> Mellanox OFED [2] includes the necessary support and should be used in the
> meantime. For DPDK, only libibverbs, libmlx4 and mlnx-ofed-kernel packages
> are required from that distribution.
> 
> The following kernel modules must be loaded before using this PMD:
> 
> - mlx4_core (hardware driver, does global initialization)
> - mlx4_en (Ethernet device driver)
> - mlx4_ib (InfiniBand device driver)
> - ib_uverbs (user space driver for verbs)
> 
> [1] https://www.openfabrics.org/
> [2] 
> http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers
> 
> v2:
>  - Include minor bugfix for VLAN filtering.
>  - Add maintainers entry.
>  - Add documentation.
> 
> v3:
>  - Add script and documentation to MAINTAINERS.
>  - Make cosmetic changes to copyright notices.
>  - Remove unwanted executable bits.
>  - Fix coding style and typos found by checkpatch.
>  - Add shared library compilation support.
> 
> Adrien Mazarguil (3):
>   scripts: check features to generate configuration header
>   mlx4: new poll mode driver
>   doc: add librte_pmd_mlx4 documentation

Applied, thanks

Documentation should be moved in a NICs guide (work in progress).

[dpdk-dev] Looking forward to DPDK 2.1

2015-02-25 Thread Thomas Monjalon

Hi Siobhan,

Thanks for contributing to the roadmap, the page http://dpdk.org/dev/roadmap
will be updated soon.

2015-02-25 13:39, Butler, Siobhan A:
> Hi all,
> 
> The progress on DPDK 2.0 has been really positive and thanks to everyone for 
> contributing and helping to grow our community. We now look onwards to DPDK 
> 2.1 planning which is due to release at the end of July, and we'd like to 
> inform the community of the features that we hope to submit to that release. 
> The current list of features, along with brief descriptions, is included 
> below.
> 
> This list is provisional and will naturally change over the lifecycle of the 
> release, and should be taken as guidance on what we hope to submit, not a 
> commitment.
> 
> Our aim in providing this information now is to solicit input from the 
> community. We'd like to make sure we avoid duplication or conflicts with work 
> that others are planning, so please feel free to let the community know of 
> any plans that you have for contributions to DPDK in this timeframe. This 
> will allow us to build a complete picture and ensure we avoid duplication of 
> effort. We have seen great community collaboration in DPDK 2.0 and hope that 
> this will increase in 2.1.
> 
> I'm sure people will have questions, and will be looking for more information 
> on these features. Further details will be provided by the individual 
> developers over the next few months. We aim to provide early outlines of the 
> features so that we can obtain community feedback as soon as possible. In 
> addition, community calls can be arranged to discuss features as required.
> 
> 
> 2.1 (Q2 2015) DPDK Features:
> 
> * Cuckoo hash - Provide a new hash library based on the cuckoo hashing scheme 
> (see http://www.cs.cmu.edu/~dongz/papers/cuckooswitch.pdf), which shall 
> guarantee worst-case constant lookup time with a better memory utilization, 
> compared to the current implementation.
> 
> * IEEE1588 Support for i40e - Support IEEE1588 Standard (PTP) for i40e   
> Ethernet Controller
> 
> * Continued development of PCI Hot Plug - Add support for PCI Hotplug 
> Framework in librte_pmd drivers (librte_pmd_ixgbe, librte_pmd_bond, 
> librte_pmd_e1000, librte_pmd_i40e, librte_pmd_virtio, librte_pmd_vmxnet3)
> 
> * Packet Framework Enhancements - Enhancements to Packet Framework Port and 
> Table Libraries, as well as IP Pipeline  Application, to include additional 
> statistics, better pipeline encapsulation and CLI simplification.
> 
> * i40e DCB (ETS only) - Support DCB Enhanced Transmission Selection algorithm 
> with i40e Ethernet controller.
> 
> * i40e Mirroring Rule - Add support for port mirroring using i40e Ethernet 
> Controller.
> 
> * Additional FM10K Features - Add support for additional usage models for 
> FM10K including: promiscuous mode, mac vlan filter, statistics, vlan offload 
> (strip, insertion, dual), flow control, Tx offload (checksum).
> 
> * Dynamic Configuration of RSS on Bonded Slave devices - Support dynamic 
> queues assignment for RX packets. Implementation for a bonding device will 
> require multiple RX queues support on a bonding slave and its dynamic
> reconfiguration.
> 
> * VXLAN Offload Sample Application - Provide a sample application to 
> demonstrate the usage of VXLAN overlay encapsulation protocol in DPDK.
> 
> * Dynamic Memory Management - Add DPDK API's (Rte_free_unused_pages, 
> rte_attach_pages, rte_detach_pages, rte_lazy_allocation) for Dynamic Memory 
> Management for NFV use cases.
> 
> 
> Thanks,
> Siobhan Butler
>

[dpdk-dev] [PATCH v5 5/6] eal: add per rx queue interrupt handling based on VFIO

2015-02-25 Thread Zhou, Danny

From: David Marchand [mailto:david.march...@6wind.com]
Sent: Wednesday, February 25, 2015 6:22 PM
To: Zhou, Danny
Cc: dev at dpdk.org; Liang, Cunming
Subject: Re: [dpdk-dev] [PATCH v5 5/6] eal: add per rx queue interrupt handling 
based on VFIO

Hello Danny,

On Wed, Feb 25, 2015 at 7:58 AM, Zhou, Danny <danny.zhou at intel.com> wrote:

+int
+rte_intr_wait_rx_pkt(struct rte_intr_handle *intr_handle, uint8_t queue_id)
+{
+   struct epoll_event ev;
+   unsigned numfds = 0;
+
+   if (!intr_handle || intr_handle->fd < 0 || intr_handle->uio_cfg_fd < 0)
+   return -1;
+   if (queue_id >= VFIO_MAX_QUEUE_ID)
+   return -1;
+
+   /* create epoll fd */
+   int pfd = epoll_create(1);
+   if (pfd < 0) {
+   RTE_LOG(ERR, EAL, "Cannot create epoll instance\n");
+   return -1;
+   }

Why recreate the epoll instance at each call to this function ?

DZ: To avoid recreating the epoll instance for each queue, the struct 
rte_intr_handle(or a new structure added to ethdev)
should be extended by adding fields storing per-queue pfd. This way, it could 
reduce user/kernel context  switch overhead
when calling epoll_create() each time.

Sounds good?

You don't need a epfd per queue. And hardcoding epfd == eventfd will give a not 
very usable api.

Plus, epoll is something linux-specific, so you can't move it out of eal/linux.
I suppose you need an abstraction here (and in the future we could add 
something for bsd ?).

DZ: libev provides an abstraction layer which is a good candidate to integrate, 
rather than reinventing one, I think. The BSD support can be implemented in the 
files under the lib\librte_eal\bsdapp\eal folder by calling BSD specific APIs. 
Maybe it is a good idea to introduce a separate component like an OS Adaptation 
Layer into EAL in the future, once DPDK is widely adopted on BSD as well as 
Windows; then all DPDK components that invoke Linux specific APIs could call 
abstraction APIs instead.

Adding an abstraction here specifically for epoll does not resolve all the 
porting/migration problem in my mind.

Looking at this patchset, I think there is a design issue.
eal does not need to know about portid or queueid.

eal can provide an api to retrieve the interrupt fds, configure an epoll 
instance, wait on an epoll instance etc...
ethdev is then responsible to setup the mapping between port id / queue id and 
interrupt fds by asking the eal about those fds.

This would result in an eal api even simpler and we could add other fds in a 
single epoll fd for other uses.

DZ: The queueid is just an index to the queue related eventfd array stored in 
EAL. If this array is still in the EAL and ethdev can apply for it and setup 
mapping for certain queue, there
might be issue for multiple-process use case where the fd resources allocated 
for secondary process are not freed if the secondary process exits unexpectedly.

Not sure I follow you.
If a secondary process exits, the eventfds created in primary process should 
still be valid and reusable.
Why would you need to free them ? Something to do with vfio ?

DZ: See below.

Probably we can set up the eventfd array inside ethdev, and we just need an EAL 
API to wait on ethdev's fd. So the application invokes the ethdev API with 
portid and queueid, and ethdev calls the EAL API to wait on an ethdev fd which 
correlates with the specified portid and queueid.

Sounds ok to you?

eventfds creation can not be handled by ethdev, since it needs infrastructure 
and informations from within the eal/linux.
Again, do we need an abstraction ?

ethdev must be the one that does the mappings between port/queue and eventfds 
(or any object that represents a way to wake up for a given port/queue).

DZ: Agreed after revisiting the code. Let us follow your direction and create 
an ethdev API, similar to 
rte_eth_dev_rx_queue_intr_enable()/rte_eth_dev_rx_queue_intr_disable(), that 
takes portid and queueid as arguments. This ethdev API then uses the mapped 
eventfds to invoke the corresponding EAL API, waiting for interrupt event 
notification from the kernel. A V6 patchset will be created for this.

--
David Marchand
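
The point made in this thread, that one epoll instance can serve every queue instead of creating one per call or per queue, can be sketched on Linux as follows. This is a minimal illustration under stated assumptions: the eventfds are plain eventfds signalled by hand rather than VFIO interrupt fds, and the `demo_*` names and `DEMO_MAX_QUEUES` are hypothetical, not the API of the patchset.

```c
#include <stdint.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>

#define DEMO_MAX_QUEUES 8

struct demo_intr_set {
	int epfd;                 /* one epoll instance for all queues */
	int efd[DEMO_MAX_QUEUES]; /* one eventfd per queue */
	unsigned nb_queues;       /* number of valid entries in efd[] */
};

/* Close every fd owned by the set. */
static void
demo_intr_set_close(struct demo_intr_set *s)
{
	unsigned q;

	for (q = 0; q < s->nb_queues; q++)
		close(s->efd[q]);
	close(s->epfd);
}

/* Create the epoll instance once and register one eventfd per queue. */
static int
demo_intr_set_open(struct demo_intr_set *s, unsigned nb_queues)
{
	unsigned q;

	if (nb_queues == 0 || nb_queues > DEMO_MAX_QUEUES)
		return -1;
	s->nb_queues = 0;
	s->epfd = epoll_create1(0);
	if (s->epfd < 0)
		return -1;
	for (q = 0; q < nb_queues; q++) {
		struct epoll_event ev = { .events = EPOLLIN };

		ev.data.u32 = q; /* remember which queue this fd belongs to */
		s->efd[q] = eventfd(0, EFD_NONBLOCK);
		if (s->efd[q] < 0)
			goto fail;
		s->nb_queues = q + 1; /* efd[q] is now owned by the set */
		if (epoll_ctl(s->epfd, EPOLL_CTL_ADD, s->efd[q], &ev) < 0)
			goto fail;
	}
	return 0;
fail:
	demo_intr_set_close(s);
	return -1;
}

/* Wait up to timeout_ms; return the ready queue id, or -1 on timeout/error. */
static int
demo_intr_wait(struct demo_intr_set *s, int timeout_ms)
{
	struct epoll_event ev;
	uint64_t counter;

	if (epoll_wait(s->epfd, &ev, 1, timeout_ms) <= 0)
		return -1;
	/* Drain the eventfd counter so the fd stops being readable. */
	if (read(s->efd[ev.data.u32], &counter, sizeof(counter)) !=
	    (ssize_t)sizeof(counter))
		return -1;
	return (int)ev.data.u32;
}
```

In the layering discussed above, something like `demo_intr_set` would live on the EAL side, while the mapping from (port id, queue id) to an entry in `efd[]` would be owned by ethdev.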

[dpdk-dev] [PATCH v14 12/13] eal/pci: Add rte_eal_dev_attach/detach() functions

2015-02-25 Thread Thomas Monjalon

2015-02-25 21:32, Tetsuya Mukawa:
> 2015-02-25 20:21 GMT+09:00 Thomas Monjalon :
> > 2015-02-25 13:04, Tetsuya Mukawa:
> >> --- a/lib/librte_eal/common/eal_common_dev.c
> >> +++ b/lib/librte_eal/common/eal_common_dev.c
> >> @@ -32,10 +32,13 @@
> >>   *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> >>   */
> >>
> >> +#include 
> >> +#include 
> >>  #include 
> >>  #include 
> >>  #include 
> >>
> >> +#include 
> >>  #include 
> >>  #include 
> >
> > No, you must not include ethdev in EAL.
> > The ethdev layer is by design on top of EAL.
> > Maxime already asked why you did it. He was implicitly asking to remove it.
> > You said that you are calling ethdev_is_detachable() but you should
> > call a function eal_is_detachable() or something like that.
> > The detachable state must be only device-related, i.e. in EAL.
> > The ethdev API is only a wrapper (with port id) in such case.
> >
> 
> Hi Thomas,
> 
> If ethdev library is on top of EAL, hotplug functions like
> rte_eal_dev_attach/detach should be implemented in ethdev library.
> Is it right?

Yes you're right.

> If so, I will move rte_eal_dev_attach/detach to ethdev library.
> And I will change names like rte_eth_dev_attach/detach.

It seems to be the right thing to do.

> Also, I will add "rte_dev.h" and "rte_pci.h" in rte_ethdev.h, and call
> below EAL functions from ethdev library.
> 
> - For virtual device initialization and finalization
> -- rte_eth_vdev_init
> -- rte_eth_vdev_uninit()
> - For physical NIC initialization and finalization
> -- rte_eal_pci_probe_one()
> -- rte_eal_pci_close_one()
> 
> I guess this will fix this design violation.
> Is this ok?

I think yes.
If needed, we could do some cleanup after RC1.
I'm just waiting for you to fix this, to avoid introducing
a layering violation.
Would you be able to do it today?

Thanks

> >> --- a/lib/librte_eal/linuxapp/eal/Makefile
> >> +++ b/lib/librte_eal/linuxapp/eal/Makefile
> >> @@ -45,6 +45,7 @@ CFLAGS += -I$(RTE_SDK)/lib/librte_eal/common/include
> >>  CFLAGS += -I$(RTE_SDK)/lib/librte_ring
> >>  CFLAGS += -I$(RTE_SDK)/lib/librte_mempool
> >>  CFLAGS += -I$(RTE_SDK)/lib/librte_malloc
> >> +CFLAGS += -I$(RTE_SDK)/lib/librte_mbuf
> >
> > By removing ethdev dependency, you can remove this ugly mbuf dependency.
> >
> > Thanks Tetsuya
> >
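
The layering rule discussed in this message (device state belongs to the lower layer, and the port-based API is only a wrapper on top) can be pictured with a small sketch. All names are hypothetical: `demo_device` stands in for an EAL-level device, and `demo_eth_*` for an ethdev-style wrapper keyed by port id. Note that only the upper layer refers to the lower one, never the reverse.

```c
#include <stddef.h>
#include <stdint.h>

/* ---- lower layer ("EAL" in the thread): knows devices, not ports ---- */
struct demo_device {
	const char *name;
	int detachable;
};

static int
demo_dev_is_detachable(const struct demo_device *dev)
{
	return dev != NULL && dev->detachable;
}

/* ---- upper layer ("ethdev"): maps port ids onto devices ---- */
#define DEMO_MAX_PORTS 4

static struct demo_device *demo_port_table[DEMO_MAX_PORTS];

/* Associate a port id with a device; returns 0 on success. */
static int
demo_eth_port_bind(uint8_t port_id, struct demo_device *dev)
{
	if (port_id >= DEMO_MAX_PORTS)
		return -1;
	demo_port_table[port_id] = dev;
	return 0;
}

/* Thin wrapper: resolve the port id, then ask the lower layer. */
static int
demo_eth_dev_is_detachable(uint8_t port_id)
{
	if (port_id >= DEMO_MAX_PORTS)
		return 0;
	return demo_dev_is_detachable(demo_port_table[port_id]);
}
```

Because the lower half never mentions ports, it needs no include from the upper half, which is the property the review comments above ask for.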

[dpdk-dev] [PATCH v3 3/3] doc: add librte_pmd_mlx4 documentation

2015-02-25 Thread Adrien Mazarguil

This documentation covers implementation details, features and limitations,
configuration, prerequisites and provides a usage example.

Signed-off-by: Adrien Mazarguil 
---
 MAINTAINERS  |   1 +
 doc/guides/prog_guide/index.rst  |   1 +
 doc/guides/prog_guide/mlx4_poll_mode_drv.rst | 326 +++
 doc/guides/prog_guide/source_org.rst |   1 +
 4 files changed, 329 insertions(+)
 create mode 100644 doc/guides/prog_guide/mlx4_poll_mode_drv.rst

diff --git a/MAINTAINERS b/MAINTAINERS
index d8b0fbc..ac61825 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -223,6 +223,7 @@ F: lib/librte_pmd_fm10k/
 Mellanox mlx4
 M: Adrien Mazarguil 
 F: lib/librte_pmd_mlx4/
+F: doc/guides/prog_guide/mlx4_poll_mode_drv.rst

 RedHat virtio
 M: Changchun Ouyang 
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index de69682..87f6b35 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -56,6 +56,7 @@ Programmer's Guide
 intel_dpdk_xen_based_packet_switch_sol
 libpcap_ring_based_poll_mode_drv
 link_bonding_poll_mode_drv_lib
+mlx4_poll_mode_drv
 timer_lib
 hash_lib
 lpm_lib
diff --git a/doc/guides/prog_guide/mlx4_poll_mode_drv.rst 
b/doc/guides/prog_guide/mlx4_poll_mode_drv.rst
new file mode 100644
index 0000000..35570c3
--- /dev/null
+++ b/doc/guides/prog_guide/mlx4_poll_mode_drv.rst
@@ -0,0 +1,326 @@
+..  BSD LICENSE
+Copyright 2012-2015 6WIND S.A.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+
+* Redistributions of source code must retain the above copyright
+notice, this list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in
+the documentation and/or other materials provided with the
+distribution.
+* Neither the name of 6WIND S.A. nor the names of its
+contributors may be used to endorse or promote products derived
+from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+MLX4 poll mode driver library
+=
+
+The MLX4 poll mode driver library (**librte_pmd_mlx4**) implements support
+for **Mellanox ConnectX-3** 10/40 Gbps adapters (EN 40, EN 10, Pro EN 40) as
+well as their virtual functions (VF) in SR-IOV context.
+
+.. note::
+
+   Due to external dependencies, this driver is disabled by default. It must
+   be enabled manually by setting ``CONFIG_RTE_LIBRTE_MLX4_PMD=y`` and
+   recompiling DPDK.
+
+Implementation details
+----------------------
+
+Most Mellanox ConnectX-3 devices provide two ports but expose a single PCI
+bus address, thus unlike most drivers, librte_pmd_mlx4 registers itself as a
+PCI driver that allocates one Ethernet device per detected port.
+
+For this reason, one cannot white/blacklist a single port without also
+white/blacklisting the others on the same device.
+
+Besides its dependency on libibverbs (that implies libmlx4 and associated
+kernel support), librte_pmd_mlx4 relies heavily on system calls for control
+operations such as querying/updating the MTU and flow control parameters.
+
+For security reasons and robustness, this driver only deals with virtual
+memory addresses. The way resources allocations are handled by the kernel
+combined with hardware specifications that allow it to handle virtual memory
+addresses directly ensure that DPDK applications cannot access random
+physical memory (or memory that does not belong to the current process).
+
+This capability allows the PMD to coexist with kernel network interfaces
+which remain functional, although they stop receiving unicast packets as
+long as they share the same MAC address.
+
+Compiling librte_pmd_mlx4 causes DPDK to be linked against libibverbs.
+
+Features and limitations
+------------------------
+
+- RSS, also known as RCA, is supported. In this mode the number of
+  configured RX queues must be a power of two.
+- VLAN filtering is supported.
+- Link state information

[dpdk-dev] [PATCH v3 2/3] mlx4: new poll mode driver

2015-02-25 Thread Adrien Mazarguil

This PMD manages all variants of Mellanox ConnectX-3 (EN 40, EN 10, Pro EN
40) as well as their virtual functions in SR-IOV context through IB Verbs
(libibverbs) and the dedicated user-space driver (libmlx4).

It is disabled by default due to dependencies on these libraries and only
supports Linux userland at the moment partly because /sys (sysfs) support is
required.

Also claim responsibility in the MAINTAINERS file.

Signed-off-by: Adrien Mazarguil 
Signed-off-by: Olga Shern 
---
 MAINTAINERS  |4 +
 config/common_bsdapp |   11 +
 config/common_linuxapp   |   11 +
 lib/Makefile |1 +
 lib/librte_pmd_mlx4/Makefile |  121 +
 lib/librte_pmd_mlx4/mlx4.c   | 4749 ++
 lib/librte_pmd_mlx4/mlx4.h   |  165 +
 lib/librte_pmd_mlx4/rte_pmd_mlx4_version.map |4 +
 mk/rte.app.mk|8 +
 9 files changed, 5074 insertions(+)
 create mode 100644 lib/librte_pmd_mlx4/Makefile
 create mode 100644 lib/librte_pmd_mlx4/mlx4.c
 create mode 100644 lib/librte_pmd_mlx4/mlx4.h
 create mode 100644 lib/librte_pmd_mlx4/rte_pmd_mlx4_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 631e8ea..d8b0fbc 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -220,6 +220,10 @@ Intel fm10k
 M: Jing Chen 
 F: lib/librte_pmd_fm10k/

+Mellanox mlx4
+M: Adrien Mazarguil 
+F: lib/librte_pmd_mlx4/
+
 RedHat virtio
 M: Changchun Ouyang 
 F: lib/librte_pmd_virtio/
diff --git a/config/common_bsdapp b/config/common_bsdapp
index 83a62a6..4bbacaf 100644
--- a/config/common_bsdapp
+++ b/config/common_bsdapp
@@ -194,6 +194,17 @@ CONFIG_RTE_LIBRTE_FM10K_DEBUG_DRIVER=n
 CONFIG_RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE=y

 #
+# Compile burst-oriented Mellanox ConnectX-3 (MLX4) PMD
+#
+CONFIG_RTE_LIBRTE_MLX4_PMD=n
+CONFIG_RTE_LIBRTE_MLX4_DEBUG=n
+CONFIG_RTE_LIBRTE_MLX4_SGE_WR_N=4
+CONFIG_RTE_LIBRTE_MLX4_MAX_INLINE=0
+CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE=8
+CONFIG_RTE_LIBRTE_MLX4_SOFT_COUNTERS=1
+CONFIG_RTE_LIBRTE_MLX4_COMPAT_VMWARE=1
+
+#
 # Compile burst-oriented Cisco ENIC PMD driver
 #
 CONFIG_RTE_LIBRTE_ENIC_PMD=y
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 2716381..2ea6711 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -192,6 +192,17 @@ CONFIG_RTE_LIBRTE_FM10K_DEBUG_DRIVER=n
 CONFIG_RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE=y

 #
+# Compile burst-oriented Mellanox ConnectX-3 (MLX4) PMD
+#
+CONFIG_RTE_LIBRTE_MLX4_PMD=n
+CONFIG_RTE_LIBRTE_MLX4_DEBUG=n
+CONFIG_RTE_LIBRTE_MLX4_SGE_WR_N=4
+CONFIG_RTE_LIBRTE_MLX4_MAX_INLINE=0
+CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE=8
+CONFIG_RTE_LIBRTE_MLX4_SOFT_COUNTERS=1
+CONFIG_RTE_LIBRTE_MLX4_COMPAT_VMWARE=1
+
+#
 # Compile burst-oriented Cisco ENIC PMD driver
 #
 CONFIG_RTE_LIBRTE_ENIC_PMD=y
diff --git a/lib/Makefile b/lib/Makefile
index 7dc12af..3ebd394 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -45,6 +45,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_E1000_PMD) += librte_pmd_e1000
 DIRS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += librte_pmd_ixgbe
 DIRS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += librte_pmd_i40e
 DIRS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += librte_pmd_fm10k
+DIRS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += librte_pmd_mlx4
 DIRS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += librte_pmd_enic
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += librte_pmd_bond
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_RING) += librte_pmd_ring
diff --git a/lib/librte_pmd_mlx4/Makefile b/lib/librte_pmd_mlx4/Makefile
new file mode 100644
index 000..de50a5a
--- /dev/null
+++ b/lib/librte_pmd_mlx4/Makefile
@@ -0,0 +1,121 @@
+#   BSD LICENSE
+#
+#   Copyright 2012-2015 6WIND S.A.
+#   Copyright 2012 Mellanox.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of 6WIND S.A. nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; O

[dpdk-dev] [PATCH v3 1/3] scripts: check features to generate configuration header

2015-02-25 Thread Adrien Mazarguil

This script looks for types, macros and functions in header files using
compilation options found in the environment (CC, CFLAGS, CPPFLAGS) to
define feature macros in a generated header.

Useful in combination with external headers that do not provide such macros.

Signed-off-by: Adrien Mazarguil 
---
 MAINTAINERS  |   1 +
 scripts/auto-config-h.sh | 136 +++
 2 files changed, 137 insertions(+)
 create mode 100755 scripts/auto-config-h.sh

diff --git a/MAINTAINERS b/MAINTAINERS
index 349ad2b..631e8ea 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -45,6 +45,7 @@ F: Makefile
 F: config/
 F: mk/
 F: pkg/
+F: scripts/auto-config-h.sh
 F: scripts/depdirs-rule.sh
 F: scripts/gen-build-mk.sh
 F: scripts/gen-config-h.sh
diff --git a/scripts/auto-config-h.sh b/scripts/auto-config-h.sh
new file mode 100755
index 000..4356d7e
--- /dev/null
+++ b/scripts/auto-config-h.sh
@@ -0,0 +1,136 @@
+#!/bin/sh
+#
+#   BSD LICENSE
+#
+#   Copyright 2014-2015 6WIND S.A.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of 6WIND S.A. nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+#
+# Crude script to detect whether particular types, macros and functions are
+# defined by trying to compile a file with a given header. Can be used to
+# perform cross-platform checks since the resulting object file is not
+# executed.
+#
+# Set VERBOSE=1 in the environment to display compiler output and errors.
+#
+# CC, CPPFLAGS, CFLAGS, EXTRA_CPPFLAGS and EXTRA_CFLAGS are taken from the
+# environment.
+#
+# AUTO_CONFIG_CFLAGS may append additional CFLAGS without modifying the
+# above variables.
+
+file=${1:?output file name required (config.h)}
+macro=${2:?output macro name required (HAVE_*)}
+include=${3:?include name required (foo.h)}
+type=${4:?object type required (define, enum, type, field, func)}
+name=${5:?define/type/function name required}
+
+: ${CC:=cc}
+
+temp=/tmp/${0##*/}.$$.c
+
+case $type in
+define)
+   code="\
+#ifndef $name
+#error $name not defined
+#endif
+"
+   ;;
+enum)
+   code="\
+long test = $name;
+"
+   ;;
+type)
+   code="\
+$name test;
+"
+   ;;
+field)
+   code="\
+void test(void)
+{
+   ${name%%.*} test_;
+
+   (void)test_.${name#*.};
+}
+"
+   ;;
+func)
+   code="\
+void (*test)() = (void (*)())$name;
+"
+   ;;
+*)
+   unset error
+   : ${error:?unknown object type \"$type\"}
+   exit
+esac
+
+if [ "${VERBOSE}" = 1 ]
+then
+   err=2
+   out=1
+   eol='
+'
+else
+   exec 3> /dev/null ||
+   exit
+   err=3
+   out=3
+   eol=' '
+fi &&
+printf 'Looking for %s %s in %s.%s' \
+   "${name}" "${type}" "${include}" "${eol}" &&
+printf "\
+#include <%s>
+
+%s
+" "$include" "$code" > "${temp}" &&
+if ${CC} ${CPPFLAGS} ${EXTRA_CPPFLAGS} ${CFLAGS} ${EXTRA_CFLAGS} \
+   ${AUTO_CONFIG_CFLAGS} \
+   -c -o /dev/null "${temp}" 1>&${out} 2>&${err}
+then
+   rm -f "${temp}"
+   printf "\
+#ifndef %s
+#define %s 1
+#endif /* %s */
+
+" "${macro}" "${macro}" "${macro}" >> "${file}" &&
+   printf 'Defining %s.\n' "${macro}"
+else
+   rm -f "${temp}"
+   printf "\
+/* %s is not defined. */
+
+" "${macro}" >> "${file}" &&
+   printf 'Not defining %s.\n' "${macro}"
+fi
+
+exit
-- 
2.1.0
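
For illustration only, here is a minimal sketch of how code might consume a
macro emitted by the script above. The macro name HAVE_EXAMPLE_FEATURE, the
header example.h, the function example_function and the output file config.h
are all hypothetical; only the argument order (file, macro, include, type,
name) comes from the script itself.

```c
/*
 * Assumed invocation (all names hypothetical):
 *   auto-config-h.sh config.h HAVE_EXAMPLE_FEATURE example.h func example_function
 *
 * A real build would then do:
 *   #include "config.h"
 * which either defines HAVE_EXAMPLE_FEATURE to 1 or only contains a comment
 * saying it is not defined.
 */

/* Report at run time whether the feature was detected at build time. */
int
example_feature_available(void)
{
#ifdef HAVE_EXAMPLE_FEATURE
	return 1;	/* header provides the feature, use it directly */
#else
	return 0;	/* feature missing, caller should take a fallback path */
#endif
}
```

Because the generated header wraps each definition in #ifndef, the same macro
can presumably also be forced from CFLAGS without causing a redefinition
warning.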

[dpdk-dev] [PATCH v3 0/3] Mellanox ConnectX-3 PMD

2015-02-25 Thread Adrien Mazarguil

This PMD adds support for Mellanox ConnectX-3-based adapters through the
verbs framework. It relies on external libraries (libibverbs and user space
driver libmlx4) and kernel support to do so.

While these libraries and kernel modules are available on OpenFabrics
Alliance's website [1] and provided by package managers on most
distributions, this PMD requires Ethernet extensions that may not be
supported at the moment (this is a work in progress).

Mellanox OFED [2] includes the necessary support and should be used in the
meantime. For DPDK, only libibverbs, libmlx4 and mlnx-ofed-kernel packages
are required from that distribution.

The following kernel modules must be loaded before using this PMD:

- mlx4_core (hardware driver, does global initialization)
- mlx4_en (Ethernet device driver)
- mlx4_ib (InfiniBand device driver)
- ib_uverbs (user space driver for verbs)

[1] https://www.openfabrics.org/
[2] 
http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers

v2:
 - Include minor bugfix for VLAN filtering.
 - Add maintainers entry.
 - Add documentation.

v3:
 - Add script and documentation to MAINTAINERS.
 - Make cosmetic changes to copyright notices.
 - Remove unwanted executable bits.
 - Fix coding style and typos found by checkpatch.
 - Add shared library compilation support.

Adrien Mazarguil (3):
  scripts: check features to generate configuration header
  mlx4: new poll mode driver
  doc: add librte_pmd_mlx4 documentation

 MAINTAINERS  |6 +
 config/common_bsdapp |   11 +
 config/common_linuxapp   |   11 +
 doc/guides/prog_guide/index.rst  |1 +
 doc/guides/prog_guide/mlx4_poll_mode_drv.rst |  326 ++
 doc/guides/prog_guide/source_org.rst |1 +
 lib/Makefile |1 +
 lib/librte_pmd_mlx4/Makefile |  121 +
 lib/librte_pmd_mlx4/mlx4.c   | 4749 ++
 lib/librte_pmd_mlx4/mlx4.h   |  165 +
 lib/librte_pmd_mlx4/rte_pmd_mlx4_version.map |4 +
 mk/rte.app.mk|8 +
 scripts/auto-config-h.sh |  136 +
 13 files changed, 5540 insertions(+)
 create mode 100644 doc/guides/prog_guide/mlx4_poll_mode_drv.rst
 create mode 100644 lib/librte_pmd_mlx4/Makefile
 create mode 100644 lib/librte_pmd_mlx4/mlx4.c
 create mode 100644 lib/librte_pmd_mlx4/mlx4.h
 create mode 100644 lib/librte_pmd_mlx4/rte_pmd_mlx4_version.map
 create mode 100755 scripts/auto-config-h.sh

-- 
2.1.0

[dpdk-dev] [PATCH v1 0/2] eal: fix symbol missing in version map

2015-02-25 Thread Thomas Monjalon

2015-02-25 07:30, Neil Horman:
> On Wed, Feb 25, 2015 at 11:39:47AM +0800, Cunming Liang wrote:
> > These two patches fix the compilation error when
> > CONFIG_RTE_BUILD_SHARED_LIB=y.
> > The root cause is that *per_lcore__socket_id* and *rte_sys_gettid* are missing
> > in the version map.
> > Thanks to Tetsuya Mukawa for the notification.
> > 
> > Cunming Liang (2):
> >   eal/linux: fix symbol missing in version map
> >   eal/bsd: fix symbol missing in version map
> > 
> >  lib/librte_eal/bsdapp/eal/rte_eal_version.map   | 2 ++
> >  lib/librte_eal/linuxapp/eal/rte_eal_version.map | 2 ++
> >  2 files changed, 4 insertions(+)
> > 
> 
> NAK
> 
> This is the wrong way to fix this problem. Exporting global variables is
> never a good solution when it can be helped.  Instead, rte_socket_id() should be
> made a non-inline function and exported.  Then the definition of
> per_lcore_socket_id can be made private, protecting it from type changes.

Neil, I applied the patches to fix compilation on HEAD.
In case your comment makes sense, a cleanup would be appreciated.

Thanks
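
A rough sketch of the cleanup Neil describes, assuming nothing about the
actual EAL code: the per-thread variable stays private to one file and only
an accessor function would appear in the version map. All names prefixed
with example_ are hypothetical.

```c
/*
 * Private per-thread storage: not exported, so its type can change
 * without affecting the ABI. __thread is the GCC/clang TLS keyword.
 */
static __thread unsigned int example_socket_id;

/* Exported, non-inline accessor: the only symbol applications link to. */
unsigned int
example_get_socket_id(void)
{
	return example_socket_id;
}

/* Internal setter, called once per thread during initialization. */
void
example_set_socket_id(unsigned int id)
{
	example_socket_id = id;
}
```

The trade-off is one function call per lookup instead of an inlined TLS
read, in exchange for a smaller exported surface.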

[dpdk-dev] [PATCH] headers: typeof -> __typeof__ to unbreak C++11 code

2015-02-25 Thread Simon Kagstrom

When compiling C++11-code or above (--std=c++11), the build fails with
lots of

  rte_eth_ctrl.h:517:3: note: in expansion of macro RTE_ALIGN
(RTE_ALIGN(RTE_ETH_FLOW_MAX, UINT32_BIT)/UINT32_BIT)
^

When reading the GCC info pages, I get the feeling that __typeof__ is
a better choice, and that indeed works when including the headers in
C++ files (--std=c++11).

There are some typeof()s left in C files, the patch only touches the
public API.
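
As a small illustration (not part of the patch), the macro below mirrors the
shape of RTE_ALIGN_FLOOR with the __typeof__ spelling, which GCC and clang
accept regardless of the selected language standard. EXAMPLE_ALIGN_FLOOR and
example_align_u32 are hypothetical names.

```c
#include <stdint.h>

/*
 * Round val down to a multiple of align (align must be a power of two).
 * __typeof__ keeps the result in the same type as val.
 */
#define EXAMPLE_ALIGN_FLOOR(val, align) \
	((__typeof__(val))((val) & (~((__typeof__(val))((align) - 1)))))

/* Align a 32-bit value down to a 64-byte boundary. */
uint32_t
example_align_u32(uint32_t v)
{
	return EXAMPLE_ALIGN_FLOOR(v, 64u);
}
```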

Signed-off-by: Simon Kagstrom 
---
 lib/librte_acl/acl_vect.h  |  8 
 lib/librte_eal/common/include/rte_common.h | 17 +
 lib/librte_eal/common/include/rte_pci.h|  2 +-
 3 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/lib/librte_acl/acl_vect.h b/lib/librte_acl/acl_vect.h
index 6cc1999..de47071 100644
--- a/lib/librte_acl/acl_vect.h
+++ b/lib/librte_acl/acl_vect.h
@@ -52,8 +52,8 @@ extern "C" {
  * hi - contains high 32 bits of given N transitions.
  */
 #define ACL_TR_HILO(P, TC, tr0, tr1, lo, hi) do { \
-   lo = (typeof(lo))_##P##_shuffle_ps((TC)(tr0), (TC)(tr1), 0x88);  \
-   hi = (typeof(hi))_##P##_shuffle_ps((TC)(tr0), (TC)(tr1), 0xdd);  \
+   lo = (__typeof__(lo))_##P##_shuffle_ps((TC)(tr0), (TC)(tr1), 0x88);  \
+   hi = (__typeof__(hi))_##P##_shuffle_ps((TC)(tr0), (TC)(tr1), 0xdd);  \
 } while (0)


@@ -74,8 +74,8 @@ extern "C" {
addr, index_mask, next_input, shuffle_input,\
ones_16, range_base, tr_lo, tr_hi)   do {   \
\
-   typeof(addr) in, node_type, r, t;   \
-   typeof(addr) dfa_msk, dfa_ofs, quad_ofs;\
+   __typeof__(addr) in, node_type, r, t;   \
+   __typeof__(addr) dfa_msk, dfa_ofs, quad_ofs;\
\
t = _##P##_xor_si##S(index_mask, index_mask);   \
in = _##P##_shuffle_epi8(next_input, shuffle_input);\
diff --git a/lib/librte_eal/common/include/rte_common.h 
b/lib/librte_eal/common/include/rte_common.h
index 8ac940c..40c2603 100644
--- a/lib/librte_eal/common/include/rte_common.h
+++ b/lib/librte_eal/common/include/rte_common.h
@@ -43,6 +43,7 @@

 #ifdef __cplusplus
 extern "C" {
+
 #endif

 #include 
@@ -112,7 +113,7 @@ rte_align_floor_int(uintptr_t ptr, uintptr_t align)
  * must be a power-of-two value.
  */
 #define RTE_PTR_ALIGN_FLOOR(ptr, align) \
-   (typeof(ptr))rte_align_floor_int((uintptr_t)ptr, align)
+   (__typeof__(ptr))rte_align_floor_int((uintptr_t)ptr, align)

 /**
  * Macro to align a value to a given power-of-two. The resultant value
@@ -121,7 +122,7 @@ rte_align_floor_int(uintptr_t ptr, uintptr_t align)
  * power-of-two value.
  */
 #define RTE_ALIGN_FLOOR(val, align) \
-   (typeof(val))((val) & (~((typeof(val))((align) - 1))))
+   (__typeof__(val))((val) & (~((__typeof__(val))((align) - 1))))

 /**
  * Macro to align a pointer to a given power-of-two. The resultant
@@ -130,7 +131,7 @@ rte_align_floor_int(uintptr_t ptr, uintptr_t align)
  * must be a power-of-two value.
  */
 #define RTE_PTR_ALIGN_CEIL(ptr, align) \
-   RTE_PTR_ALIGN_FLOOR((typeof(ptr))RTE_PTR_ADD(ptr, (align) - 1), align)
+   RTE_PTR_ALIGN_FLOOR((__typeof__(ptr))RTE_PTR_ADD(ptr, (align) - 1), align)

 /**
  * Macro to align a value to a given power-of-two. The resultant value
@@ -139,7 +140,7 @@ rte_align_floor_int(uintptr_t ptr, uintptr_t align)
  * value.
  */
 #define RTE_ALIGN_CEIL(val, align) \
-   RTE_ALIGN_FLOOR(((val) + ((typeof(val)) (align) - 1)), align)
+   RTE_ALIGN_FLOOR(((val) + ((__typeof__(val)) (align) - 1)), align)

 /**
  * Macro to align a pointer to a given power-of-two. The resultant
@@ -257,8 +258,8 @@ rte_align64pow2(uint64_t v)
  * Macro to return the minimum of two numbers
  */
 #define RTE_MIN(a, b) ({ \
-   typeof (a) _a = (a); \
-   typeof (b) _b = (b); \
+   __typeof__ (a) _a = (a); \
+   __typeof__ (b) _b = (b); \
_a < _b ? _a : _b; \
})

@@ -266,8 +267,8 @@ rte_align64pow2(uint64_t v)
  * Macro to return the maximum of two numbers
  */
 #define RTE_MAX(a, b) ({ \
-   typeof (a) _a = (a); \
-   typeof (b) _b = (b); \
+   __typeof__ (a) _a = (a); \
+   __typeof__ (b) _b = (b); \
_a > _b ? _a : _b; \
})

diff --git a/lib/librte_eal/common/include/rte_pci.h 
b/lib/librte_eal/common/include/rte_pci.h
index 3df07e8..bc065d4 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -212,7 +212,7 @@ do {
   \
val = strtoul((in), &end, 16);  \
if (errno != 0 || end[0] != (dlm) || val > (lim))   \
return (-EINVAL);

[dpdk-dev] [PATCH] maintainers: claim responsibility for timers

2015-02-25 Thread Sanford, Robert

Signed-off-by: Robert Sanford 

---
MAINTAINERS |1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 2eb7761..a2b53b3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -371,6 +371,7 @@ F: examples/vm_power_manager/
F: doc/guides/sample_app_ug/vm_power_management.rst
Timers
+M: Robert Sanford 
F: lib/librte_timer/
F: doc/guides/prog_guide/timer_lib.rst
F: app/test/test_timer*
--
1.7.1

[dpdk-dev] [PATCH v2 4/4] cmdline: make parse_set_list() use size_t instead of int for low/high parameter

2015-02-25 Thread Pawel Wodkowski

Fix a warning reported by static analysis about a size_t to int cast
when passing parameters to parse_set_list().

This patch also fixes code formatting errors that checkpatch.pl reports
after generating the patch.
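
A hedged sketch of the idea, outside of librte_cmdline: a helper that sets
bits low..high in a 32-bit map while taking size_t bounds. The unsigned
shift and the bounds check are additions for the sake of a well-defined
example, they are not claimed to be in the patch. example_set_range is a
hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Set bits [low, high] in map and return the result.
 * Bits at position 32 or above are ignored so the shift stays defined.
 */
uint32_t
example_set_range(uint32_t map, size_t low, size_t high)
{
	while (low <= high && low < 32) {
		map |= (uint32_t)1 << low;	/* unsigned shift, no int overflow */
		low++;
	}
	return map;
}
```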

Signed-off-by: Pawel Wodkowski 
---
 lib/librte_cmdline/cmdline_parse_portlist.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_cmdline/cmdline_parse_portlist.c 
b/lib/librte_cmdline/cmdline_parse_portlist.c
index fc6c14e..9c1fe3e 100644
--- a/lib/librte_cmdline/cmdline_parse_portlist.c
+++ b/lib/librte_cmdline/cmdline_parse_portlist.c
@@ -78,7 +78,7 @@ struct cmdline_token_ops cmdline_token_portlist_ops = {
 };

 static void
-parse_set_list(cmdline_portlist_t * pl, int low, int high)
+parse_set_list(cmdline_portlist_t *pl, size_t low, size_t high)
 {
do {
pl->map |= (1 << low++);
@@ -86,7 +86,7 @@ parse_set_list(cmdline_portlist_t * pl, int low, int high)
 }

 static int
-parse_ports(cmdline_portlist_t * pl, const char * str)
+parse_ports(cmdline_portlist_t *pl, const char *str)
 {
size_t ps, pe;
const char *first, *last;
-- 
1.9.1

[dpdk-dev] [PATCH v2 3/4] pmd ring: fix possible memory leak during devinit

2015-02-25 Thread Pawel Wodkowski

Free kvlist on function exit to avoid memory leak.
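
The general pattern can be sketched as follows, using plain malloc()/free()
rather than the DPDK allocators: every resource pointer starts as NULL and
all exit paths go through one cleanup label, which only works because the
free functions tolerate NULL. example_devinit is a hypothetical function,
not the ring PMD code.

```c
#include <stdlib.h>

/* Return 0 on success, -1 on failure; never leaks on either path. */
int
example_devinit(size_t n)
{
	char *args = NULL;	/* NULL so cleanup is safe before allocation */
	int *table = NULL;
	int ret = -1;

	args = malloc(16);
	if (args == NULL)
		goto out_free;

	if (n == 0)		/* error after the first allocation */
		goto out_free;

	table = calloc(n, sizeof(*table));
	if (table == NULL)
		goto out_free;

	ret = 0;
out_free:
	free(table);		/* free(NULL) is a no-op */
	free(args);
	return ret;
}
```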

Signed-off-by: Pawel Wodkowski 
---
 lib/librte_pmd_ring/rte_eth_ring.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/librte_pmd_ring/rte_eth_ring.c 
b/lib/librte_pmd_ring/rte_eth_ring.c
index a5dc71e..f049bb3 100644
--- a/lib/librte_pmd_ring/rte_eth_ring.c
+++ b/lib/librte_pmd_ring/rte_eth_ring.c
@@ -527,7 +527,7 @@ out:
 static int
 rte_pmd_ring_devinit(const char *name, const char *params)
 {
-   struct rte_kvargs *kvlist;
+   struct rte_kvargs *kvlist = NULL;
int ret = 0;
struct node_action_list *info = NULL;

@@ -548,7 +548,7 @@ rte_pmd_ring_devinit(const char *name, const char *params)
info = rte_zmalloc("struct node_action_list", 
sizeof(struct node_action_list) +
   (sizeof(struct node_action_pair) * 
ret), 0);
if (!info)
-   goto out;
+   goto out_free;

info->total = ret;
info->list = (struct node_action_pair*)(info + 1);
@@ -567,8 +567,8 @@ rte_pmd_ring_devinit(const char *name, const char *params)
}

 out_free:
+   rte_kvargs_free(kvlist);
rte_free(info);
-out:
return ret;
 }

-- 
1.9.1

[dpdk-dev] [PATCH v2 2/4] librte_kvargs: make rte_kvargs_free() be consistent with other "free()" functions

2015-02-25 Thread Pawel Wodkowski

By convention, free() functions should ignore a NULL parameter. This patch
adds this behaviour to rte_kvargs_free().

Signed-off-by: Pawel Wodkowski 
---
 lib/librte_kvargs/rte_kvargs.c | 4 
 lib/librte_kvargs/rte_kvargs.h | 3 ++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/lib/librte_kvargs/rte_kvargs.c b/lib/librte_kvargs/rte_kvargs.c
index 8bc1e46..c2dd051 100644
--- a/lib/librte_kvargs/rte_kvargs.c
+++ b/lib/librte_kvargs/rte_kvargs.c
@@ -174,8 +174,12 @@ rte_kvargs_process(const struct rte_kvargs *kvlist,
 void
 rte_kvargs_free(struct rte_kvargs *kvlist)
 {
+   if (!kvlist)
+   return;
+
if (kvlist->str != NULL)
free(kvlist->str);
+
free(kvlist);
 }

diff --git a/lib/librte_kvargs/rte_kvargs.h b/lib/librte_kvargs/rte_kvargs.h
index ef4efab..ae9ae79 100644
--- a/lib/librte_kvargs/rte_kvargs.h
+++ b/lib/librte_kvargs/rte_kvargs.h
@@ -115,7 +115,8 @@ void rte_kvargs_free(struct rte_kvargs *kvlist);
  *
  * For each key/value association that matches the given key, calls the
  * handler function with the for a given arg_name passing the value on the
- * dictionary for that key and a given extra argument.
+ * dictionary for that key and a given extra argument. If *kvlist* is NULL
+ * function does nothing.
  *
  * @param kvlist
  *   The rte_kvargs structure
-- 
1.9.1

[dpdk-dev] [PATCH v2 1/4] rte_timer: change declaration of rte_timer_cb_t

2015-02-25 Thread Pawel Wodkowski

This patch removes the inconsistency between the declaration of type
rte_timer_cb_t, field f in struct rte_timer and function
__rte_timer_reset().

Although the compiler treats both of them the same, static analysis tools
complain about it.
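
To make the distinction concrete, here is a stand-alone sketch with
hypothetical names (example_fn_t, example_cb_t, example_timer): a typedef of
a function type must be used through an explicit pointer, while a typedef of
a pointer-to-function type can be stored directly. Both fields end up with
the same type and are called the same way.

```c
struct example_timer;

/* Function type: a member must be declared as "example_fn_t *". */
typedef void (example_fn_t)(struct example_timer *, void *);

/* Pointer-to-function type: a member is declared as "example_cb_t". */
typedef void (*example_cb_t)(struct example_timer *, void *);

struct example_timer {
	example_fn_t *f_old;	/* style before the change */
	example_cb_t f_new;	/* style after the change */
	void *arg;
};

/* Callback that increments the int counter passed as arg. */
static void
example_count(struct example_timer *tim, void *arg)
{
	(void)tim;
	++*(int *)arg;
}

/* Invoke the callback through both fields and return the hit count. */
int
example_run(void)
{
	int hits = 0;
	struct example_timer tim = {
		.f_old = example_count,
		.f_new = example_count,
		.arg = &hits,
	};

	tim.f_old(&tim, tim.arg);
	tim.f_new(&tim, tim.arg);
	return hits;
}
```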

Signed-off-by: Pawel Wodkowski 
---
 lib/librte_timer/rte_timer.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_timer/rte_timer.h b/lib/librte_timer/rte_timer.h
index 35b8719..77547c6 100644
--- a/lib/librte_timer/rte_timer.h
+++ b/lib/librte_timer/rte_timer.h
@@ -115,7 +115,7 @@ struct rte_timer;
 /**
  * Callback function type for timer expiry.
  */
-typedef void (rte_timer_cb_t)(struct rte_timer *, void *);
+typedef void (*rte_timer_cb_t)(struct rte_timer *, void *);

 #define MAX_SKIPLIST_DEPTH 10

@@ -128,7 +128,7 @@ struct rte_timer
struct rte_timer *sl_next[MAX_SKIPLIST_DEPTH];
volatile union rte_timer_status status; /**< Status of timer. */
uint64_t period;   /**< Period of timer (0 if not periodic). */
-   rte_timer_cb_t *f; /**< Callback function. */
+   rte_timer_cb_t f;  /**< Callback function. */
void *arg; /**< Argument to callback function. */
 };

-- 
1.9.1

[dpdk-dev] [PATCH v2 0/4] Fix issues reported by static analysis tool

2015-02-25 Thread Pawel Wodkowski

Static analysis reports some issues against the current DPDK version. Most of
them need only cosmetic code changes (changing the type of a variable).

One issue, in the ring PMD, is a real memory leak.

PATCH v2 changes:
 - remove patch 5/5 as it was NACKed
 - reword commit logs according to mailing list suggestions.

Pawel Wodkowski (4):
  rte_timer: change declaration of rte_timer_cb_t
  librte_kvargs: make rte_kvargs_free() be consistent with other
"free()" functions
  pmd ring: fix possible memory leak during devinit
  cmdline: make parse_set_list() use size_t instead of int for low/high 
   parameter

 lib/librte_cmdline/cmdline_parse_portlist.c | 4 ++--
 lib/librte_kvargs/rte_kvargs.c  | 4 
 lib/librte_kvargs/rte_kvargs.h  | 3 ++-
 lib/librte_pmd_ring/rte_eth_ring.c  | 6 +++---
 lib/librte_timer/rte_timer.h| 4 ++--
 5 files changed, 13 insertions(+), 8 deletions(-)

-- 
1.9.1

[dpdk-dev] Looking forward to DPDK 2.1

2015-02-25 Thread Butler, Siobhan A

Hi all,

The progress on DPDK 2.0 has been really positive and thanks to everyone for 
contributing and helping to grow our community. We now look onwards to DPDK 2.1 
planning which is due to release at the end of July, and we'd like to inform 
the community of the features that we hope to submit to that release. The 
current list of features, along with brief descriptions, is included below.

This list is provisional and will naturally change over the lifecycle of the 
release, and should be taken as guidance on what we hope to submit, not a 
commitment.

Our aim in providing this information now is to solicit input from the 
community. We'd like to make sure we avoid duplication or conflicts with work 
that others are planning, so please feel free to let the community know of any 
plans that you have for contributions to DPDK in this timeframe. This will 
allow us to build a complete picture and ensure we avoid duplication of effort. 
We have seen great community collaboration in DPDK 2.0 and hope that this will 
increase in 2.1.

I'm sure people will have questions, and will be looking for more information 
on these features. Further details will be provided by the individual 
developers over the next few months. We aim to provide early outlines of the 
features so that we can obtain community feedback as soon as possible. In 
addition, community calls can be arranged to discuss features as required.


2.1 (Q2 2015) DPDK Features:

* Cuckoo hash - Provide a new hash library based on the cuckoo hashing scheme 
(see http://www.cs.cmu.edu/~dongz/papers/cuckooswitch.pdf), which shall 
guarantee worst-case constant lookup time with a better memory utilization, 
compared to the current implementation.

* IEEE1588 Support for i40e - Support IEEE1588 Standard (PTP) for i40e   
Ethernet Controller

* Continued development of PCI Hot Plug - Add support for PCI Hotplug Framework 
in librte_pmd drivers (librte_pmd_ixgbe, librte_pmd_bond, librte_pmd_e1000, 
librte_pmd_i40e, librte_pmd_virtio, librte_pmd_vmxnet3)

* Packet Framework Enhancements - Enhancements to Packet Framework Port and 
Table Libraries, as well as IP Pipeline  Application, to include additional 
statistics, better pipeline encapsulation and CLI simplification.

* i40e DCB (ETS only) - Support DCB Enhanced Transmission Selection algorithm 
with i40e Ethernet controller.

* i40e Mirroring Rule - Add support for port mirroring using i40e Ethernet 
Controller.

* Additional FM10K Features - Add support for additional usage models for FM10K 
including: promiscuous mode, mac vlan filter, statistics, vlan offload (strip, 
insertion, dual), flow control, Tx offload (checksum).

* Dynamic Configuration of RSS on Bonded Slave devices - Support dynamic queues 
assignment for RX packets. Implementation for a bonding device will require 
multiple RX queues support on a bonding slave and its dynamic
reconfiguration.

* VXLAN Offload Sample Application - Provide a sample application to 
demonstrate the usage of VXLAN overlay encapsulation protocol in DPDK.

* Dynamic Memory Management - Add DPDK APIs (Rte_free_unused_pages, 
rte_attach_pages, rte_detach_pages, rte_lazy_allocation) for Dynamic Memory 
Management for NFV use cases.


Thanks,
Siobhan Butler

[dpdk-dev] [PATCH] kni:optimization of rte_kni_rx_burst

2015-02-25 Thread Marc Sune


On 25/02/15 13:24, Hemant at freescale.com wrote:
> Hi Olivier
>Comments inline.
> Regards,
> Hemant
>
>> -Original Message-
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier Deme
>> Sent: 25/Feb/2015 5:44 PM
>> To: dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH] kni:optimization of rte_kni_rx_burst
>>
>> Thank you Hemant, I think there might be one issue left with the patch 
>> though.
>> The alloc_q must initially be filled with mbufs before getting mbuf back on 
>> the
>> tx_q.
>>
>> So the patch should allow rte_kni_rx_burst to check if alloc_q is empty.
>> If so, it should invoke kni_allocate_mbufs(kni, 0) (to fill the alloc_q with
>> MAX_MBUF_BURST_NUM mbufs)
>>
>> The patch for rte_kni_rx_burst would then look like:
>>
>> @@ -575,7 +575,7 @@ rte_kni_rx_burst(struct rte_kni *kni, struct rte_mbuf
>> **mbufs, unsigned num)
>>
>>/* If buffers removed, allocate mbufs and then put them into alloc_q 
>> */
>>if (ret)
>> -kni_allocate_mbufs(kni);
>> +  kni_allocate_mbufs(kni, ret);
>> +  else if (unlikely(kni->alloc_q->write == kni->alloc_q->read))
>> +  kni_allocate_mbufs(kni, 0);
>>
> [hemant]  This will introduce a run-time check.
>
> I missed to include the other change in the patch.
>   I am doing it in kni_alloc i.e. initiate the alloc_q with default burst 
> size.
>   kni_allocate_mbufs(ctx, 0);
>
> In a way, we are now suggesting to reduce the size of alloc_q to only default 
> burst size.

As an aside comment here, I think that we should allow to tweak the 
userspace <-> kernel queue sizes (rx_q, tx_q, free_q and alloc_q) . 
Whether this should be a build configuration option or a parameter to 
rte_kni_init(), it is not completely clear to me, but I guess 
rte_kni_init() is a better option.

Having said that, the original mail from Hemant was describing that KNI 
was giving an out-of-memory. This to me indicates that the pool is 
incorrectly dimensioned. Even if KNI will not pre-allocate in the 
alloc_q, or not completely, in the event of high load, you will get this 
same "out of memory".

We can reduce the usage of buffers by the KNI subsystem in kernel space 
and in userspace, but the kernel will always need a small cache of 
pre-allocated buffers (coming from user-space), since the KNI kernel 
module does not know where to grab the packets from (which pool). So my 
guess is that the dimensioning problem experienced by Hemant would be 
the same, even with the proposed changes.

>
> Can we reach this situation, when the kernel is adding packets faster in tx_q 
> than the application is able to dequeue?

I think so. We cannot control much how the kernel will schedule the KNI 
thread(s), especially if the # of threads in relation to the cores is 
incorrect (not enough), hence we need at least a reasonable amount of 
buffering to prevent early dropping due to those "internal" burst side effects.

Marc
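
For reference, the refill policy being discussed could be sketched like
this. The quoted patch is cut off in this thread, so the clamp for oversized
requests is an assumption; the "0 means a full default burst" case follows
Olivier's description. EXAMPLE_MAX_BURST and its value are hypothetical
stand-ins for MAX_MBUF_BURST_NUM.

```c
/* Hypothetical stand-in for MAX_MBUF_BURST_NUM. */
#define EXAMPLE_MAX_BURST 32u

/*
 * Number of mbufs to push to alloc_q after dequeuing "dequeued" packets
 * from tx_q: as many as were consumed, or a full burst when asked for 0
 * (initial fill) or for more than one burst (assumed clamp).
 */
unsigned int
example_refill_count(unsigned int dequeued)
{
	if (dequeued == 0 || dequeued > EXAMPLE_MAX_BURST)
		return EXAMPLE_MAX_BURST;
	return dequeued;
}
```

This only bounds how many buffers KNI holds at once, it does not remove the
need to dimension the pool for the worst case.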

>   alloc_q can be empty in this case and the kernel will be starving.
>
>> Olivier.
>>
>> On 25/02/15 11:48, Hemant Agrawal wrote:
>>> From: Hemant Agrawal 
>>>
>>> If any buffer is read from the tx_q, MAX_BURST buffers will be allocated and
>>> attempted to be added to the alloc_q.
>>> This seems terribly inefficient and it also looks like the alloc_q will
>>> quickly fill to its maximum capacity. If the system buffers are low in
>>> number, it will reach an "out of memory" situation.
>>> This patch allocates only as many buffers as were dequeued from tx_q.
>>>
>>> Signed-off-by: Hemant Agrawal 
>>> ---
>>>lib/librte_kni/rte_kni.c | 13 -
>>>1 file changed, 8 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c index
>>> 4e70fa0..4cf8e30 100644
>>> --- a/lib/librte_kni/rte_kni.c
>>> +++ b/lib/librte_kni/rte_kni.c
>>> @@ -128,7 +128,7 @@ struct rte_kni_memzone_pool {
>>>
>>>
>>>static void kni_free_mbufs(struct rte_kni *kni); -static void
>>> kni_allocate_mbufs(struct rte_kni *kni);
>>> +static void kni_allocate_mbufs(struct rte_kni *kni, int num);
>>>
>>>static volatile int kni_fd = -1;
>>>static struct rte_kni_memzone_pool kni_memzone_pool = { @@ -575,7
>>> +575,7 @@ rte_kni_rx_burst(struct rte_kni *kni, struct rte_mbuf
>>> **mbufs, unsigned num)
>>>
>>> /* If buffers removed, allocate mbufs and then put them into alloc_q
>> */
>>> if (ret)
>>> -   kni_allocate_mbufs(kni);
>>> +   kni_allocate_mbufs(kni, ret);
>>>
>>> return ret;
>>>}
>>> @@ -594,7 +594,7 @@ kni_free_mbufs(struct rte_kni *kni)
>>>}
>>>
>>>static void
>>> -kni_allocate_mbufs(struct rte_kni *kni)
>>> +kni_allocate_mbufs(struct rte_kni *kni, int num)
>>>{
>>> int i, ret;
>>> struct rte_mbuf *pkts[MAX_MBUF_BURST_NUM]; @@ -620,7 +620,10
>> @@
>>> kni_allocate_mbufs(struct rte_kni *kni)
>>> return;
>>> }
>>>
>>> -   for (i = 0; i < MAX_MBUF_BURST_NUM; i++) {
>>> +   if (num == 0 || num > MAX_MBUF_BURST_NUM)
>>> +

[dpdk-dev] [PATCH 2/2] doc: update programmers guide for uio_pci_generic

2015-02-25 Thread Bruce Richardson

On Wed, Feb 25, 2015 at 01:12:43PM +, Iremonger, Bernard wrote:
> 
> 
> > -Original Message-
> > From: Richardson, Bruce
> > Sent: Wednesday, February 25, 2015 12:28 PM
> > To: Iremonger, Bernard
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH 2/2] doc: update programmers guide for 
> > uio_pci_generic
> > 
> > On Wed, Feb 25, 2015 at 12:19:10PM +, Iremonger, Bernard wrote:
> > >
> > >
> > > > -Original Message-
> > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce
> > > > Richardson
> > > > Sent: Tuesday, February 24, 2015 4:28 PM
> > > > To: dev at dpdk.org
> > > > Subject: [dpdk-dev] [PATCH 2/2] doc: update programmers guide for
> > > > uio_pci_generic
> > > >
> > > > Since DPDK now has support for the in-tree uio_pci_generic driver,
> > > > update the programmers guide document to reference this module, and
> > > > to use it in preference to the igb_uio driver, which is DPDK- specific.
> > > >
> > > > Signed-off-by: Bruce Richardson 
> > > > ---
> > > >  doc/guides/prog_guide/env_abstraction_layer.rst  | 8 
> > > > 
> > > >  doc/guides/prog_guide/intel_dpdk_xen_based_packet_switch_sol.rst | 6 
> > > > +++---
> > > >  doc/guides/prog_guide/kernel_nic_interface.rst   | 2 +-
> > > >  doc/guides/prog_guide/poll_mode_drv_emulated_virtio_nic.rst  | 8 
> > > > 
> > > >  doc/guides/prog_guide/poll_mode_drv_paravirtual_vmxnets_nic.rst  |
> > > > 2 +-
> > > >  5 files changed, 13 insertions(+), 13 deletions(-)
> > > >
> > > > diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst
> > > > b/doc/guides/prog_guide/env_abstraction_layer.rst
> > > > index 231e266..b5321c3 100644
> > > > --- a/doc/guides/prog_guide/env_abstraction_layer.rst
> > > > +++ b/doc/guides/prog_guide/env_abstraction_layer.rst
> > > > @@ -66,7 +66,7 @@ EAL in a Linux-userland Execution Environment
> > > >  -
> > > >
> > > >  In a Linux user space environment, the DPDK application runs as a
> > > > user-space application using the pthread library.
> > > > -PCI information about devices and address space is discovered
> > > > through the /sys kernel interface and through a module called igb_uio.
> > > > +PCI information about devices and address space is discovered
> > > > +through the /sys kernel interface and
> > > > through kernel modules such as uio_pci_generic, or igb_uio.
> > > >  Refer to the UIO: User-space drivers documentation in the Linux
> > > > kernel. This memory is mmap'd in the application.
> > > >
> > > >  The EAL performs physical memory allocation using mmap() in
> > > > hugetlbfs (using huge page sizes to increase performance).
> > > > @@ -134,10 +134,10 @@ PCI Access
> > > >  ~~
> > > >
> > > >  The EAL uses the /sys/bus/pci utilities provided by the kernel to scan 
> > > > the content on the PCI bus.
> > > > -
> > > > -To access PCI memory, a kernel module called igb_uio provides a
> > > > /dev/uioX device file
> > > > +To access PCI memory, a kernel module called uio_pci_generic
> > > > +provides a /dev/uioX device file and resource files in /sys
> > > >  that can be mmap'd to obtain access to PCI address space from the 
> > > > application.
> > > > -It uses the uio kernel feature (userland driver).
> > > > +The DPDK-specific igb_uio module can also be used for this. Both
> > > > +drivers use the uio kernel feature
> > > > (userland driver).
> > > >
> > > >  Per-lcore and Shared Variables
> > > >  ~~
> > > > diff --git
> > > > a/doc/guides/prog_guide/intel_dpdk_xen_based_packet_switch_sol.rst
> > > > b/doc/guides/prog_guide/intel_dpdk_xen_based_packet_switch_sol.rst
> > > > index 1f1e04f..a0dd959 100644
> > > > ---
> > > > a/doc/guides/prog_guide/intel_dpdk_xen_based_packet_switch_sol.rst
> > > > +++ b/doc/guides/prog_guide/intel_dpdk_xen_based_packet_switch_sol.r
> > > > +++ st
> > > > @@ -306,12 +306,12 @@ Building and Running the Switching Backend
> > > >  Refer to the *DPDK Getting Started Guide* for more
> > > > information on memory management in the DPDK.
> > > >  In the above command, 4 GB memory is reserved (2048 of 2 MB 
> > > > pages) for DPDK.
> > > >
> > > > -#.  Load igb_uio and bind one Intel NIC controller to igb_uio:
> > > > +#.  Load uio_pci_generic and bind one Intel NIC controller to it:
> > > >
> > > >  .. code-block:: console
> > > >
> > > > -insmod x86_64-native-linuxapp-gcc/kmod/igb_uio.ko
> > > > -python tools/dpdk_nic_bind.py -b igb_uio :09:00:00.0
> > >
> > >
> > > Hi Bruce,
> > >
> > > Should the information about igb_uio be retained alongside the new 
> > > information about
> > uio_pci_generic?
> > >
> > While the answer may not be as clear-cut as with the GSG, why would we
> > bother covering both here?
> > We already ignore VFIO in these examples.
> > 
> > /Bruce
> 
> Hi Bruce,
> 
> The method of loading is different for both modules, igb_uio uses insmod and 
> uio_

[dpdk-dev] [PATCH v3] app/test: add crc32 algorithms equivalence check

2015-02-25 Thread Bruce Richardson

On Wed, Feb 25, 2015 at 06:34:06PM +0600, Yerden Zhumabekov wrote:
> New function test_crc32_hash_alg_equiv() checks whether software,
> 4-byte operand and 8-byte operand versions of CRC32 hash function
> implementations return the same result value.
> 
> Signed-off-by: Yerden Zhumabekov 

Acked-by: Bruce Richardson 
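
For illustration only, the kind of equivalence check described above can be sketched in a self-contained way. This does not use the DPDK API: crc32c_bitwise(), crc32c_bytewise() and crc32c_check_equiv() are hypothetical names, and the two variants merely stand in for the software and SSE4.2 implementations that the patch compares through rte_hash_crc(). The sketch assumes the reflected CRC32C (Castagnoli) polynomial, which is the one the SSE4.2 crc32 instruction computes.

```c
#include <stddef.h>
#include <stdint.h>

/* Reflected CRC32C (Castagnoli) polynomial. */
#define CRC32C_POLY 0x82F63B78U

/* Variant 1 (hypothetical name): process the input one bit at a time. */
static uint32_t
crc32c_bitwise(const void *data, size_t len, uint32_t init_val)
{
	const uint8_t *p = data;
	uint32_t crc = init_val;
	size_t i;
	int bit;

	for (i = 0; i < len; i++) {
		crc ^= p[i];
		for (bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (CRC32C_POLY & (0U - (crc & 1U)));
	}
	return crc;
}

/* Variant 2 (hypothetical name): one byte at a time via a lookup table. */
static uint32_t
crc32c_bytewise(const void *data, size_t len, uint32_t init_val)
{
	static uint32_t table[256];
	static int table_ready;
	const uint8_t *p = data;
	uint32_t crc = init_val;
	size_t i;

	if (!table_ready) {
		/* Entry i is the CRC state after feeding the single byte i. */
		for (i = 0; i < 256; i++) {
			uint8_t b = (uint8_t)i;

			table[i] = crc32c_bitwise(&b, 1, 0);
		}
		table_ready = 1;
	}
	for (i = 0; i < len; i++)
		crc = (crc >> 8) ^ table[(crc ^ p[i]) & 0xFFU];
	return crc;
}

/* Return 0 if both variants agree for every length in 1..max_len, else -1. */
static int
crc32c_check_equiv(const void *data, size_t max_len, uint32_t init_val)
{
	size_t len;

	for (len = 1; len <= max_len; len++) {
		if (crc32c_bitwise(data, len, init_val) !=
		    crc32c_bytewise(data, len, init_val))
			return -1;
	}
	return 0;
}
```

Checking every prefix length, rather than only the full buffer, is what catches mistakes in the handling of trailing bytes, which is also why the patch randomizes data_len.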

> ---
>  app/test/test_hash.c |   60 
> ++
>  1 file changed, 60 insertions(+)
> 
> diff --git a/app/test/test_hash.c b/app/test/test_hash.c
> index 76b1b8f..653dd86 100644
> --- a/app/test/test_hash.c
> +++ b/app/test/test_hash.c
> @@ -177,6 +177,63 @@ static struct rte_hash_parameters ut_params = {
>   .socket_id = 0,
>  };
>  
> +#define CRC32_ITERATIONS (1U << 20)
> +#define CRC32_DWORDS (1U << 6)
> +/*
> + * Test if all CRC32 implementations yield the same hash value
> + */
> +static int
> +test_crc32_hash_alg_equiv(void)
> +{
> + uint32_t hash_val;
> + uint32_t init_val;
> + uint64_t data64[CRC32_DWORDS];
> + unsigned i, j;
> + size_t data_len;
> +
> + printf("# CRC32 implementations equivalence test\n");
> + for (i = 0; i < CRC32_ITERATIONS; i++) {
> + /* Randomizing data_len of data set */
> + data_len = (size_t) ((rte_rand() % sizeof(data64)) + 1);
> + init_val = (uint32_t) rte_rand();
> +
> + /* Fill the data set */
> + for (j = 0; j < CRC32_DWORDS; j++)
> + data64[j] = rte_rand();
> +
> + /* Calculate software CRC32 */
> + rte_hash_crc_set_alg(CRC32_SW);
> + hash_val = rte_hash_crc(data64, data_len, init_val);
> +
> + /* Check against 4-byte-operand sse4.2 CRC32 if available */
> + rte_hash_crc_set_alg(CRC32_SSE42);
> + if (hash_val != rte_hash_crc(data64, data_len, init_val)) {
> + printf("Failed checking CRC32_SW against CRC32_SSE42\n");
> + break;
> + }
> +
> + /* Check against 8-byte-operand sse4.2 CRC32 if available */
> + rte_hash_crc_set_alg(CRC32_SSE42_x64);
> + if (hash_val != rte_hash_crc(data64, data_len, init_val)) {
> + printf("Failed checking CRC32_SW against CRC32_SSE42_x64\n");
> + break;
> + }
> + }
> +
> + /* Resetting to best available algorithm */
> + rte_hash_crc_set_alg(CRC32_SSE42_x64);
> +
> + if (i == CRC32_ITERATIONS)
> + return 0;
> +
> + printf("Failed test data (hex, %lu bytes total):\n", data_len);
> + for (j = 0; j < data_len; j++)
> + printf("%02X%c", ((uint8_t *)data64)[j],
> + ((j+1) % 16 == 0 || j == data_len - 1) ? '\n' : ' ');
> +
> + return -1;
> +}
> +
>  /*
>   * Test a hash function.
>   */
> @@ -1356,6 +1413,9 @@ test_hash(void)
>  
>   run_hash_func_tests();
>  
> + if (test_crc32_hash_alg_equiv() < 0)
> + return -1;
> +
>   return 0;
>  }
>  
> -- 
> 1.7.9.5
>

[dpdk-dev] [PATCH 2/2] doc: update programmers guide for uio_pci_generic

2015-02-25 Thread Iremonger, Bernard



> -Original Message-
> From: Richardson, Bruce
> Sent: Wednesday, February 25, 2015 12:28 PM
> To: Iremonger, Bernard
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 2/2] doc: update programmers guide for 
> uio_pci_generic
> 
> On Wed, Feb 25, 2015 at 12:19:10PM +, Iremonger, Bernard wrote:
> >
> >
> > > -Original Message-
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce
> > > Richardson
> > > Sent: Tuesday, February 24, 2015 4:28 PM
> > > To: dev at dpdk.org
> > > Subject: [dpdk-dev] [PATCH 2/2] doc: update programmers guide for
> > > uio_pci_generic
> > >
> > > Since DPDK now has support for the in-tree uio_pci_generic driver,
> > > update the programmers guide document to reference this module, and
> > > to use it in preference to the igb_uio driver, which is DPDK- specific.
> > >
> > > Signed-off-by: Bruce Richardson 
> > > ---
> > >  doc/guides/prog_guide/env_abstraction_layer.rst  | 8 
> > > 
> > >  doc/guides/prog_guide/intel_dpdk_xen_based_packet_switch_sol.rst | 6 
> > > +++---
> > >  doc/guides/prog_guide/kernel_nic_interface.rst   | 2 +-
> > >  doc/guides/prog_guide/poll_mode_drv_emulated_virtio_nic.rst  | 8 
> > > 
> > >  doc/guides/prog_guide/poll_mode_drv_paravirtual_vmxnets_nic.rst  |
> > > 2 +-
> > >  5 files changed, 13 insertions(+), 13 deletions(-)
> > >
> > > diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst
> > > b/doc/guides/prog_guide/env_abstraction_layer.rst
> > > index 231e266..b5321c3 100644
> > > --- a/doc/guides/prog_guide/env_abstraction_layer.rst
> > > +++ b/doc/guides/prog_guide/env_abstraction_layer.rst
> > > @@ -66,7 +66,7 @@ EAL in a Linux-userland Execution Environment
> > >  -
> > >
> > >  In a Linux user space environment, the DPDK application runs as a
> > > user-space application using the pthread library.
> > > -PCI information about devices and address space is discovered
> > > through the /sys kernel interface and through a module called igb_uio.
> > > +PCI information about devices and address space is discovered
> > > +through the /sys kernel interface and
> > > through kernel modules such as uio_pci_generic, or igb_uio.
> > >  Refer to the UIO: User-space drivers documentation in the Linux
> > > kernel. This memory is mmap'd in the application.
> > >
> > >  The EAL performs physical memory allocation using mmap() in
> > > hugetlbfs (using huge page sizes to increase performance).
> > > @@ -134,10 +134,10 @@ PCI Access
> > >  ~~
> > >
> > >  The EAL uses the /sys/bus/pci utilities provided by the kernel to scan 
> > > the content on the PCI bus.
> > > -
> > > -To access PCI memory, a kernel module called igb_uio provides a
> > > /dev/uioX device file
> > > +To access PCI memory, a kernel module called uio_pci_generic
> > > +provides a /dev/uioX device file and resource files in /sys
> > >  that can be mmap'd to obtain access to PCI address space from the 
> > > application.
> > > -It uses the uio kernel feature (userland driver).
> > > +The DPDK-specific igb_uio module can also be used for this. Both
> > > +drivers use the uio kernel feature
> > > (userland driver).
> > >
> > >  Per-lcore and Shared Variables
> > >  ~~
> > > diff --git
> > > a/doc/guides/prog_guide/intel_dpdk_xen_based_packet_switch_sol.rst
> > > b/doc/guides/prog_guide/intel_dpdk_xen_based_packet_switch_sol.rst
> > > index 1f1e04f..a0dd959 100644
> > > ---
> > > a/doc/guides/prog_guide/intel_dpdk_xen_based_packet_switch_sol.rst
> > > +++ b/doc/guides/prog_guide/intel_dpdk_xen_based_packet_switch_sol.r
> > > +++ st
> > > @@ -306,12 +306,12 @@ Building and Running the Switching Backend
> > >  Refer to the *DPDK Getting Started Guide* for more
> > > information on memory management in the DPDK.
> > >  In the above command, 4 GB memory is reserved (2048 of 2 MB 
> > > pages) for DPDK.
> > >
> > > -#.  Load igb_uio and bind one Intel NIC controller to igb_uio:
> > > +#.  Load uio_pci_generic and bind one Intel NIC controller to it:
> > >
> > >  .. code-block:: console
> > >
> > > -insmod x86_64-native-linuxapp-gcc/kmod/igb_uio.ko
> > > -python tools/dpdk_nic_bind.py -b igb_uio :09:00:00.0
> >
> >
> > Hi Bruce,
> >
> > Should the information about igb_uio be retained alongside the new 
> > information about
> uio_pci_generic?
> >
> While the answer may not be as clear-cut as with the GSG, why would we bother
> covering both here?
> We already ignore VFIO in these examples.
> 
> /Bruce

Hi Bruce,

The loading method differs between the two modules: igb_uio uses insmod and
uio_pci_generic uses modprobe.
It would be useful to retain this igb_uio information. Maybe vfio information
should be added too.
This comment also applies to the GSG; the differences need to be documented.

Regards,

Bernard.

[dpdk-dev] [PATCH v1 0/2] eal: fix symbol missing in version map

2015-02-25 Thread Tetsuya Mukawa

On 2015/02/25 12:39, Cunming Liang wrote:
> These two patches fix the compile error that occurs when
> CONFIG_RTE_BUILD_SHARED_LIB=y.
> The root cause is that *per_lcore__socket_id* and *rte_sys_gettid* are
> missing from the version map.
> Thanks to Tetsuya Mukawa for the notification.
>
> Cunming Liang (2):
>   eal/linux: fix symbol missing in version map
>   eal/bsd: fix symbol missing in version map
>
>  lib/librte_eal/bsdapp/eal/rte_eal_version.map   | 2 ++
>  lib/librte_eal/linuxapp/eal/rte_eal_version.map | 2 ++
>  2 files changed, 4 insertions(+)
>
Hi Liang,

I've confirmed it works in my Linux environment.

Thanks,
Tetsuya

[dpdk-dev] [PATCH v14] testpmd: Add port hotplug support

2015-02-25 Thread Tetsuya Mukawa

This patch introduces the following commands:
- port attach [ident]
- port detach [port_id]
 - attach: attach a port
 - detach: detach a port
 - ident: PCI address of a physical device,
  or device name and parameters of a virtual device.
 (e.g. :02:00.0, eth_pcap0,iface=eth0)
 - port_id: port identifier

v7:
- Fix doc.
  (Thanks to Iremonger, Bernard)
- Fix port checking implementation of start_port().
  (Thanks to Qiu, Michael)
v5:
- Add testpmd documentation.
  (Thanks to Iremonger, Bernard)
v4:
 - Fix strings of command help.

Signed-off-by: Tetsuya Mukawa 
---
 app/test-pmd/cmdline.c  | 137 +++
 app/test-pmd/config.c   | 102 --
 app/test-pmd/parameters.c   |  22 ++-
 app/test-pmd/testpmd.c  | 199 +---
 app/test-pmd/testpmd.h  |  18 ++-
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  57 
 6 files changed, 409 insertions(+), 126 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 4c9f423..c8312be 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -513,6 +513,12 @@ static void cmd_help_long_parsed(void *parsed_result,
"port close (port_id|all)\n"
"Close all ports or port_id.\n\n"

+   "port attach (ident)\n"
+   "Attach physical or virtual dev by pci address or virtual device name\n\n"
+
+   "port detach (port_id)\n"
+   "Detach physical or virtual dev by port_id\n\n"
+
"port config (port_id|all)"
" speed (10|100|1000|1|4|auto)"
" duplex (half|full|auto)\n"
@@ -793,6 +799,89 @@ cmdline_parse_inst_t cmd_operate_specific_port = {
},
 };

+/* *** attach a specified port *** */
+struct cmd_operate_attach_port_result {
+   cmdline_fixed_string_t port;
+   cmdline_fixed_string_t keyword;
+   cmdline_fixed_string_t identifier;
+};
+
+static void cmd_operate_attach_port_parsed(void *parsed_result,
+   __attribute__((unused)) struct cmdline *cl,
+   __attribute__((unused)) void *data)
+{
+   struct cmd_operate_attach_port_result *res = parsed_result;
+
+   if (!strcmp(res->keyword, "attach"))
+   attach_port(res->identifier);
+   else
+   printf("Unknown parameter\n");
+}
+
+cmdline_parse_token_string_t cmd_operate_attach_port_port =
+   TOKEN_STRING_INITIALIZER(struct cmd_operate_attach_port_result,
+   port, "port");
+cmdline_parse_token_string_t cmd_operate_attach_port_keyword =
+   TOKEN_STRING_INITIALIZER(struct cmd_operate_attach_port_result,
+   keyword, "attach");
+cmdline_parse_token_string_t cmd_operate_attach_port_identifier =
+   TOKEN_STRING_INITIALIZER(struct cmd_operate_attach_port_result,
+   identifier, NULL);
+
+cmdline_parse_inst_t cmd_operate_attach_port = {
+   .f = cmd_operate_attach_port_parsed,
+   .data = NULL,
+   .help_str = "port attach identifier, "
+   "identifier: pci address or virtual dev name",
+   .tokens = {
+   (void *)&cmd_operate_attach_port_port,
+   (void *)&cmd_operate_attach_port_keyword,
+   (void *)&cmd_operate_attach_port_identifier,
+   NULL,
+   },
+};
+
+/* *** detach a specified port *** */
+struct cmd_operate_detach_port_result {
+   cmdline_fixed_string_t port;
+   cmdline_fixed_string_t keyword;
+   uint8_t port_id;
+};
+
+static void cmd_operate_detach_port_parsed(void *parsed_result,
+   __attribute__((unused)) struct cmdline *cl,
+   __attribute__((unused)) void *data)
+{
+   struct cmd_operate_detach_port_result *res = parsed_result;
+
+   if (!strcmp(res->keyword, "detach"))
+   detach_port(res->port_id);
+   else
+   printf("Unknown parameter\n");
+}
+
+cmdline_parse_token_string_t cmd_operate_detach_port_port =
+   TOKEN_STRING_INITIALIZER(struct cmd_operate_detach_port_result,
+   port, "port");
+cmdline_parse_token_string_t cmd_operate_detach_port_keyword =
+   TOKEN_STRING_INITIALIZER(struct cmd_operate_detach_port_result,
+   keyword, "detach");
+cmdline_parse_token_num_t cmd_operate_detach_port_port_id =
+   TOKEN_NUM_INITIALIZER(struct cmd_operate_detach_port_result,
+   port_id, UINT8);
+
+cmdline_parse_inst_t cmd_operate_detach_port = {
+   .f = cmd_operate_detach_port_parsed,
+   .data = NULL,
+   .help_str = "port detach port_id",
+   .tokens = {
+   (void *)&cmd_operate_detach_port_port,
+   (void *)&cmd_operate_detach_port_keyword,
+

[dpdk-dev] [PATCH v14] librte_pmd_pcap: Add port hotplug support

2015-02-25 Thread Tetsuya Mukawa

This patch adds finalization code to free resources allocated by the
PMD.

v6:
 - Fix a parameter of rte_eth_dev_free().
v4:
 - Change function name.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_pmd_pcap/rte_eth_pcap.c | 40 ++
 1 file changed, 40 insertions(+)

diff --git a/lib/librte_pmd_pcap/rte_eth_pcap.c b/lib/librte_pmd_pcap/rte_eth_pcap.c
index af7fae8..5e94930 100644
--- a/lib/librte_pmd_pcap/rte_eth_pcap.c
+++ b/lib/librte_pmd_pcap/rte_eth_pcap.c
@@ -498,6 +498,13 @@ static struct eth_dev_ops ops = {
.stats_reset = eth_stats_reset,
 };

+static struct eth_driver rte_pcap_pmd = {
+   .pci_drv = {
+   .name = "rte_pcap_pmd",
+   .drv_flags = RTE_PCI_DRV_DETACHABLE,
+   },
+};
+
 /*
  * Function handler that opens the pcap file for reading a stores a
  * reference of it for use it later on.
@@ -713,6 +720,10 @@ rte_pmd_init_internals(const char *name, const unsigned nb_rx_queues,
if (*eth_dev == NULL)
goto error;

+   /* check length of device name */
+   if ((strlen((*eth_dev)->data->name) + 1) > sizeof(data->name))
+   goto error;
+
/* now put it all together
 * - store queue data in internals,
 * - store numa_node info in pci_driver
@@ -739,10 +750,13 @@ rte_pmd_init_internals(const char *name, const unsigned nb_rx_queues,
data->nb_tx_queues = (uint16_t)nb_tx_queues;
data->dev_link = pmd_link;
data->mac_addrs = &eth_addr;
+   strncpy(data->name,
+   (*eth_dev)->data->name, strlen((*eth_dev)->data->name));

(*eth_dev)->data = data;
(*eth_dev)->dev_ops = &ops;
(*eth_dev)->pci_dev = pci_dev;
+   (*eth_dev)->driver = &rte_pcap_pmd;

return 0;

@@ -927,10 +941,36 @@ rte_pmd_pcap_devinit(const char *name, const char *params)

 }

+static int
+rte_pmd_pcap_devuninit(const char *name)
+{
+   struct rte_eth_dev *eth_dev = NULL;
+
+   RTE_LOG(INFO, PMD, "Closing pcap ethdev on numa socket %u\n",
+   rte_socket_id());
+
+   if (name == NULL)
+   return -1;
+
+   /* find the ethdev entry */
+   eth_dev = rte_eth_dev_allocated(name);
+   if (eth_dev == NULL)
+   return -1;
+
+   rte_free(eth_dev->data->dev_private);
+   rte_free(eth_dev->data);
+   rte_free(eth_dev->pci_dev);
+
+   rte_eth_dev_release_port(eth_dev);
+
+   return 0;
+}
+
 static struct rte_driver pmd_pcap_drv = {
.name = "eth_pcap",
.type = PMD_VDEV,
.init = rte_pmd_pcap_devinit,
+   .uninit = rte_pmd_pcap_devuninit,
 };

 PMD_REGISTER_DRIVER(pmd_pcap_drv);
-- 
1.9.1

[dpdk-dev] [PATCH v14 13/13] doc: Add port hotplug framework section to programmers guide

2015-02-25 Thread Tetsuya Mukawa

This patch adds a new section describing the port hotplug framework.

Signed-off-by: Tetsuya Mukawa 
---
 doc/guides/prog_guide/index.rst  |   1 +
 doc/guides/prog_guide/port_hotplug_framework.rst | 110 +++
 2 files changed, 111 insertions(+)
 create mode 100644 doc/guides/prog_guide/port_hotplug_framework.rst

diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index de69682..60a6ac5 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -71,6 +71,7 @@ Programmer's Guide
 packet_classif_access_ctrl
 packet_framework
 vhost_lib
+port_hotplug_framework
 source_org
 dev_kit_build_system
 dev_kit_root_make_help
diff --git a/doc/guides/prog_guide/port_hotplug_framework.rst b/doc/guides/prog_guide/port_hotplug_framework.rst
new file mode 100644
index 000..355ae28
--- /dev/null
+++ b/doc/guides/prog_guide/port_hotplug_framework.rst
@@ -0,0 +1,110 @@
+..  BSD LICENSE
+Copyright(c) 2015 IGEL Co.,Ltd. All rights reserved.
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+
+* Redistributions of source code must retain the above copyright
+notice, this list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in
+the documentation and/or other materials provided with the
+distribution.
+* Neither the name of IGEL Co.,Ltd. nor the names of its
+contributors may be used to endorse or promote products derived
+from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+Port Hotplug Framework
+==
+
+The Port Hotplug Framework provides DPDK applications with the ability to
+attach and detach ports at runtime. Because the framework depends on PMD
+implementation, the ports that PMDs cannot handle are out of scope of this
+framework. Furthermore, after detaching a port from a DPDK application, the
+framework does not provide a way to remove the device from the system.
+For ports backed by a physical NIC, the kernel must support the PCI
+Hotplug feature.
+
+Overview
+
+
+The basic requirements of the Port Hotplug Framework are:
+
+*   DPDK applications that use the Port Hotplug Framework must manage their
+own ports.
+
+The Port Hotplug Framework is implemented to allow DPDK applications to
+manage ports. For example, when DPDK applications call the port attach
+function, the attached port number is returned. DPDK applications can
+also detach the port by port number.
+
+*   Kernel support is needed for attaching or detaching physical device
+ports.
+
+To attach new physical device ports, the device must first be recognized
+by the userspace driver I/O framework in the kernel. DPDK applications can
+then call the Port Hotplug functions to attach the ports.
+For detaching, the steps are performed in reverse order.
+
+*   Before detaching, ports must be stopped and closed.
+
+DPDK applications must call "rte_eth_dev_stop()" and
+"rte_eth_dev_close()" APIs before detaching ports. These functions will
+start the finalization sequence of the PMDs.
+
+*   The framework doesn't affect the behavior of legacy DPDK applications.
+
+If the Port Hotplug functions aren't called, all legacy DPDK apps can
+still work without modifications.
+
+Port Hotplug API overview
+-
+
+*   Attaching a port
+
+"rte_eal_dev_attach()" API attaches a port to a DPDK application, and
+returns the attached port number. Before calling the API, the device
+should be recognized by a userspace driver I/O framework. The API
+receives a PCI address like ":01:00.0" or a virtual device name
+like "eth_pcap0,iface=eth0". In the case of a virtual device name, the
+format is the same as the general "--vdev" option of DPDK.
+
+*   Detac

[dpdk-dev] [PATCH v14 12/13] eal/pci: Add rte_eal_dev_attach/detach() functions

2015-02-25 Thread Tetsuya Mukawa

These functions are used for attaching or detaching a port.
When rte_eal_dev_attach() is called, the function tries to interpret the
device name as a PCI address. If this succeeds,
rte_eal_dev_attach() attaches a physical device port. If not, it attaches
a virtual device port.
When rte_eal_dev_detach() is called, the function gets the device type
of the port to determine whether it is backed by a physical or a virtual
device, and then calls the corresponding detach function.

v14:
- Remove needless if statement.
  (Thanks to Maxime Leroy)
v13:
- Change log level when error occurs in rte_eal_vdev_init() and
  rte_eal_dev_init().
- Return value of driver init and uninit functions.
- Replace rte_panic by RTE_LOG in rte_eal_dev_init()
- Fix return value of rte_eal_vdev_uninit().
- Fix rte_eal_dev_attach_vdev to set port_id correctly.
  (Thanks to Maxime Leroy)
v11:
- Remove needless devargs handling codes.
- Replace get_vdev_name() by rte_eal_parse_devargs_str().
- Replace rte_eal_vdev_find_and_init by rte_eal_vdev_init()
- Replace rte_eal_vdev_find_and_uninit by rte_eal_vdev_uninit()
- Fix rte_eal_dev_init() to use rte_eal_vdev_init().
  (Thanks to Maxime Leroy)
v10:
- Add comments.
- Change order of version.map.
  (Thanks to Thomas Monjalon)
v9:
- Fix comments.
- Use strcmp() instead of strncmp().
- Remove RTE_EAL_INVOKE_TYPE_PROBE/CLOSE.
- Change definition of rte_dev_uninit_t.
  (Thanks to Thomas Monjalon and Maxime Leroy)
v8:
- Add missing symbol in version map.
  (Thanks to Qiu, Michael and Iremonger, Bernard)
v7:
- Fix typo of warning messages.
  (Thanks to Qiu, Michael)
v5:
- Change function names like below.
  rte_eal_dev_find_and_invoke() to rte_eal_vdev_find_and_invoke().
  rte_eal_dev_invoke() to rte_eal_vdev_invoke().
- Add code to handle a return value of rte_eal_devargs_remove().
- Fix pci address format in rte_eal_dev_detach().
v4:
- Fix comment.
- Add error checking.
- Fix indent of 'if' statement.
- Change function name.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/eal_common_dev.c  | 278 ++--
 lib/librte_eal/common/eal_common_devargs.c  |  54 +++--
 lib/librte_eal/common/eal_private.h |  11 +
 lib/librte_eal/common/include/rte_dev.h |  33 +++
 lib/librte_eal/common/include/rte_devargs.h |  28 +++
 lib/librte_eal/linuxapp/eal/Makefile|   1 +
 lib/librte_eal/linuxapp/eal/eal_pci.c   |   6 +-
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |   2 +
 8 files changed, 375 insertions(+), 38 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index eae5656..6d805aa 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -32,10 +32,13 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */

+#include 
+#include 
 #include 
 #include 
 #include 

+#include 
 #include 
 #include 
 #include 
@@ -61,6 +64,32 @@ rte_eal_driver_unregister(struct rte_driver *driver)
TAILQ_REMOVE(&dev_driver_list, driver, next);
 }

+static int
+rte_eal_vdev_init(const char *name, const char *args)
+{
+   struct rte_driver *driver;
+
+   if (name == NULL)
+   return -EINVAL;
+
+   TAILQ_FOREACH(driver, &dev_driver_list, next) {
+   if (driver->type != PMD_VDEV)
+   continue;
+
+   /*
+* search a driver prefix in virtual device name.
+* For example, if the driver is pcap PMD, driver->name
+* will be "eth_pcap", but "name" will be "eth_pcapN".
+* So use strncmp to compare.
+*/
+   if (!strncmp(driver->name, name, strlen(driver->name)))
+   return driver->init(name, args);
+   }
+
+   RTE_LOG(ERR, EAL, "no driver found for %s\n", name);
+   return -EINVAL;
+}
+
 int
 rte_eal_dev_init(void)
 {
@@ -79,22 +108,11 @@ rte_eal_dev_init(void)
if (devargs->type != RTE_DEVTYPE_VIRTUAL)
continue;

-   TAILQ_FOREACH(driver, &dev_driver_list, next) {
-   if (driver->type != PMD_VDEV)
-   continue;
-
-   /* search a driver prefix in virtual device name */
-   if (!strncmp(driver->name, devargs->virtual.drv_name,
-   strlen(driver->name))) {
-   driver->init(devargs->virtual.drv_name,
-   devargs->args);
-   break;
-   }
-   }
-
-   if (driver == NULL) {
-   rte_panic("no driver found for %s\n",
- devargs->virtual.drv_name);
+   if (rte_eal_vdev_init(devargs->virtual.drv_name,
+   devargs->args)) {
+   RTE_LOG

[dpdk-dev] [PATCH v14 11/13] ethdev: Add one dev_type parameter to rte_eth_dev_allocate

2015-02-25 Thread Tetsuya Mukawa

This new parameter records the device type, such as PCI or virtual.
The port detaching process differs between PCI devices and virtual
devices.
RTE_ETH_DEV_PCI indicates that the device is PCI. RTE_ETH_DEV_VIRTUAL
indicates that the device is virtual.
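
To illustrate why the type is stored at allocation time, here is a minimal sketch, independent of the ethdev code, in which the detach path dispatches on the recorded type. The names (port_sketch, port_allocate(), port_detach()) and the call counters are hypothetical and only stand in for the PCI and virtual detach routines.

```c
#include <stddef.h>

enum port_type_sketch {
	PORT_TYPE_UNKNOWN,
	PORT_TYPE_PCI,
	PORT_TYPE_VIRTUAL
};

struct port_sketch {
	int attached;
	enum port_type_sketch type;
};

#define MAX_PORTS_SKETCH 4
static struct port_sketch ports[MAX_PORTS_SKETCH];

/* Counters standing in for the real PCI and vdev detach routines. */
static int pci_detach_calls;
static int vdev_detach_calls;

/* Reserve the first free slot and remember the device type; -1 if full. */
static int
port_allocate(enum port_type_sketch type)
{
	int id;

	for (id = 0; id < MAX_PORTS_SKETCH; id++) {
		if (!ports[id].attached) {
			ports[id].attached = 1;
			ports[id].type = type;
			return id;
		}
	}
	return -1;
}

/* Pick the detach path from the type recorded at allocation time. */
static int
port_detach(int id)
{
	if (id < 0 || id >= MAX_PORTS_SKETCH || !ports[id].attached)
		return -1;

	switch (ports[id].type) {
	case PORT_TYPE_PCI:
		pci_detach_calls++;
		break;
	case PORT_TYPE_VIRTUAL:
		vdev_detach_calls++;
		break;
	default:
		return -1;
	}

	/* Release the slot and reset the type, as the release path would. */
	ports[id].attached = 0;
	ports[id].type = PORT_TYPE_UNKNOWN;
	return 0;
}
```

Resetting the type on release mirrors the idea of returning the slot to an "unknown" state so that a stale type is never used for a later port.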

v12:
- Add missing symbol in version map.
  (Thanks to Iremonger, Bernard)
v10:
- Change order of version.map.
  (Thanks to Thomas Monjalon)
- Fix comment of "rte_ethdev.h".
  (Thanks to Thomas Monjalon)
v9:
- Fix commit log.
- RTE_ETH_DEV_PHYSICAL is replaced by RTE_ETH_DEV_PCI.
  (Thanks to Thomas Monjalon)
v8:
- NONE_TRACE is replaced by NO_TRACE.
- Add missing symbol in version map.
  (Thanks to Qiu, Michael and Iremonger, Bernard)
v4:
- Fix comments of rte_eth_dev_type.

Signed-off-by: Tetsuya Mukawa 
---
 app/test/virtual_pmd.c   |  2 +-
 lib/librte_ether/rte_ethdev.c| 25 +++--
 lib/librte_ether/rte_ethdev.h| 25 -
 lib/librte_ether/rte_ether_version.map   |  1 +
 lib/librte_pmd_af_packet/rte_eth_af_packet.c |  2 +-
 lib/librte_pmd_bond/rte_eth_bond_api.c   |  2 +-
 lib/librte_pmd_pcap/rte_eth_pcap.c   |  2 +-
 lib/librte_pmd_ring/rte_eth_ring.c   |  2 +-
 lib/librte_pmd_xenvirt/rte_eth_xenvirt.c |  2 +-
 9 files changed, 54 insertions(+), 9 deletions(-)

diff --git a/app/test/virtual_pmd.c b/app/test/virtual_pmd.c
index 785bccc..9b07ab1 100644
--- a/app/test/virtual_pmd.c
+++ b/app/test/virtual_pmd.c
@@ -580,7 +580,7 @@ virtual_ethdev_create(const char *name, struct ether_addr *mac_addr,
goto err;

/* reserve an ethdev entry */
-   eth_dev = rte_eth_dev_allocate(name);
+   eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_PCI);
if (eth_dev == NULL)
goto err;

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 1f6a066..4ebdd9f 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -227,7 +227,7 @@ rte_eth_dev_find_free_port(void)
 }

 struct rte_eth_dev *
-rte_eth_dev_allocate(const char *name)
+rte_eth_dev_allocate(const char *name, enum rte_eth_dev_type type)
 {
uint8_t port_id;
struct rte_eth_dev *eth_dev;
@@ -251,6 +251,7 @@ rte_eth_dev_allocate(const char *name)
snprintf(eth_dev->data->name, sizeof(eth_dev->data->name), "%s", name);
eth_dev->data->port_id = port_id;
eth_dev->attached = DEV_ATTACHED;
+   eth_dev->dev_type = type;
nb_ports++;
return eth_dev;
 }
@@ -262,6 +263,7 @@ rte_eth_dev_release_port(struct rte_eth_dev *eth_dev)
return -EINVAL;

eth_dev->attached = 0;
+   eth_dev->dev_type = RTE_ETH_DEV_UNKNOWN;
nb_ports--;
return 0;
 }
@@ -300,7 +302,7 @@ rte_eth_dev_init(struct rte_pci_driver *pci_drv,
rte_eth_dev_create_unique_device_name(ethdev_name,
sizeof(ethdev_name), pci_dev);

-   eth_dev = rte_eth_dev_allocate(ethdev_name);
+   eth_dev = rte_eth_dev_allocate(ethdev_name, RTE_ETH_DEV_PCI);
if (eth_dev == NULL)
return -ENOMEM;

@@ -426,6 +428,14 @@ rte_eth_dev_count(void)
return (nb_ports);
 }

+enum rte_eth_dev_type
+rte_eth_dev_get_device_type(uint8_t port_id)
+{
+   if (!rte_eth_dev_is_valid_port(port_id))
+   return RTE_ETH_DEV_UNKNOWN;
+   return rte_eth_devices[port_id].dev_type;
+}
+
 int
 rte_eth_dev_save(struct rte_eth_dev *devs, size_t size)
 {
@@ -523,6 +533,17 @@ rte_eth_dev_is_detachable(uint8_t port_id)
return -EINVAL;
}

+   if (rte_eth_devices[port_id].dev_type == RTE_ETH_DEV_PCI) {
+   switch (rte_eth_devices[port_id].pci_dev->pt_driver) {
+   case RTE_PT_IGB_UIO:
+   case RTE_PT_UIO_GENERIC:
+   break;
+   case RTE_PT_VFIO:
+   default:
+   return -ENOTSUP;
+   }
+   }
+
drv_flags = rte_eth_devices[port_id].driver->pci_drv.drv_flags;
return !(drv_flags & RTE_PCI_DRV_DETACHABLE);
 }
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 5519ce0..d8e5543 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1422,6 +1422,17 @@ struct rte_eth_rxtx_callback {
void *param;
 };

+/*
+ * The eth device type
+ */
+enum rte_eth_dev_type {
+   RTE_ETH_DEV_UNKNOWN,/**< unknown device type */
+   RTE_ETH_DEV_PCI,
+   /**< Physical function and Virtual function of PCI devices */
+   RTE_ETH_DEV_VIRTUAL,/**< non hardware device */
+   RTE_ETH_DEV_MAX /**< max value of this enum */
+};
+
 /**
  * @internal
  * The generic data structure associated with each ethernet device.
@@ -1452,6 +1463,7 @@ struct rte_eth_dev {
 */
struct rte_eth_rxtx_callback **pre_tx_burst_cbs;
uint8_t attached; /**< Flag indicating the port is attached */
+   enum rte_eth_

[dpdk-dev] [PATCH v14 10/13] eal/pci: Add probe and close functions of pci driver

2015-02-25 Thread Tetsuya Mukawa

- Add pci_close_all_drivers()
  The function tries to find a driver for the specified device, and
  then closes the device with that driver.
- Add rte_eal_pci_probe_one() and rte_eal_pci_close_one()
  These functions probe and close a single device.
  Each one first looks up the device that has the specified PCI
  address, then probes or closes it.
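
The lookup-then-act flow described above can be sketched in isolation as
follows. This is only an illustration, not the EAL code: `hp_dev`,
`hp_probe_one()` and `hp_close_one()` are hypothetical names, and a plain
integer stands in for the PCI address.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical device record; 'addr' stands in for a PCI address. */
struct hp_dev {
	TAILQ_ENTRY(hp_dev) next;
	int addr;
	int probed;
};
TAILQ_HEAD(hp_dev_list, hp_dev);

/* Find the device with the given address and mark it as probed. */
int
hp_probe_one(struct hp_dev_list *list, int addr)
{
	struct hp_dev *d;

	assert(list != NULL);
	TAILQ_FOREACH(d, list, next) {
		if (d->addr != addr)
			continue;
		d->probed = 1;
		return 0;
	}
	return -1; /* no device with that address */
}

/* Find the device, close it, and unlink it from the device list. */
int
hp_close_one(struct hp_dev_list *list, int addr)
{
	struct hp_dev *d;

	assert(list != NULL);
	TAILQ_FOREACH(d, list, next) {
		if (d->addr != addr)
			continue;
		d->probed = 0;
		/* safe: we return right after removing the element */
		TAILQ_REMOVE(list, d, next);
		return 0;
	}
	return -1;
}
```

Closing the same address twice fails the second time, because the first
call already removed the entry from the list.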

v9:
- Fix commit title.
- Remove RTE_EAL_INVOKE_TYPE_PROBE/CLOSE.
  (Thanks to Thomas Monjalon)
- Implement pci_unmap_device() in this patch.
v5:
- Remove RTE_EAL_INVOKE_TYPE_UNKNOWN, because it's unused.
v4:
- Fix parameter checking.
- Fix indent of 'if' statement.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/eal_common_pci.c  | 98 -
 lib/librte_eal/common/eal_private.h | 15 +
 lib/librte_eal/common/include/rte_pci.h | 32 +++
 lib/librte_eal/linuxapp/eal/eal_pci.c   | 94 +++
 4 files changed, 238 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index bf2793f..5b6b55d 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -108,7 +108,10 @@ static int
 pci_probe_all_drivers(struct rte_pci_device *dev)
 {
struct rte_pci_driver *dr = NULL;
-   int rc;
+   int rc = 0;
+
+   if (dev == NULL)
+   return -1;

TAILQ_FOREACH(dr, &pci_driver_list, next) {
rc = rte_eal_pci_probe_one_driver(dr, dev);
@@ -123,6 +126,99 @@ pci_probe_all_drivers(struct rte_pci_device *dev)
return 1;
 }

+#ifdef RTE_LIBRTE_EAL_HOTPLUG
+/*
+ * If vendor/device ID match, call the devuninit() function of all
+ * registered drivers for the given device. Return -1 if uninitialization
+ * failed, return 1 if no driver is found for this device.
+ */
+static int
+pci_close_all_drivers(struct rte_pci_device *dev)
+{
+   struct rte_pci_driver *dr = NULL;
+   int rc = 0;
+
+   if (dev == NULL)
+   return -1;
+
+   TAILQ_FOREACH(dr, &pci_driver_list, next) {
+   rc = rte_eal_pci_close_one_driver(dr, dev);
+   if (rc < 0)
+   /* negative value is an error */
+   return -1;
+   if (rc > 0)
+   /* positive value means driver not found */
+   continue;
+   return 0;
+   }
+   return 1;
+}
+
+/*
+ * Find the pci device specified by pci address, then invoke probe function of
+ * the driver of the device.
+ */
+int
+rte_eal_pci_probe_one(struct rte_pci_addr *addr)
+{
+   struct rte_pci_device *dev = NULL;
+   int ret = 0;
+
+   if (addr == NULL)
+   return -1;
+
+   TAILQ_FOREACH(dev, &pci_device_list, next) {
+   if (rte_eal_compare_pci_addr(&dev->addr, addr))
+   continue;
+
+   ret = pci_probe_all_drivers(dev);
+   if (ret < 0)
+   goto err_return;
+   return 0;
+   }
+   return -1;
+
+err_return:
+   RTE_LOG(WARNING, EAL, "Requested device " PCI_PRI_FMT
+   " cannot be used\n", dev->addr.domain, dev->addr.bus,
+   dev->addr.devid, dev->addr.function);
+   return -1;
+}
+
+/*
+ * Find the pci device specified by pci address, then invoke close function of
+ * the driver of the device.
+ */
+int
+rte_eal_pci_close_one(struct rte_pci_addr *addr)
+{
+   struct rte_pci_device *dev = NULL;
+   int ret = 0;
+
+   if (addr == NULL)
+   return -1;
+
+   TAILQ_FOREACH(dev, &pci_device_list, next) {
+   if (rte_eal_compare_pci_addr(&dev->addr, addr))
+   continue;
+
+   ret = pci_close_all_drivers(dev);
+   if (ret < 0)
+   goto err_return;
+
+   TAILQ_REMOVE(&pci_device_list, dev, next);
+   return 0;
+   }
+   return -1;
+
+err_return:
+   RTE_LOG(WARNING, EAL, "Requested device " PCI_PRI_FMT
+   " cannot be used\n", dev->addr.domain, dev->addr.bus,
+   dev->addr.devid, dev->addr.function);
+   return -1;
+}
+#endif /* RTE_LIBRTE_EAL_HOTPLUG */
+
 /*
  * Scan the content of the PCI bus, and call the devinit() function for
  * all registered drivers that have a matching entry in its id_table
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 159cd66..4acf5a0 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -165,6 +165,21 @@ int rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr,
struct rte_pci_device *dev);

 /**
+ * Munmap memory for single PCI device
+ *
+ * This function is private to EAL.
+ *
+ * @param  dr
+ *  The pointer to the pci driver structure
+ * @param  dev
+ *  The pointer to the pci device structure
+ * @return
+

[dpdk-dev] [PATCH v14 09/13] eal/linux/pci: Add functions for unmapping igb_uio resources

2015-02-25 Thread Tetsuya Mukawa

The patch adds functions for unmapping igb_uio resources. It applies only to
Linux with igb_uio; VFIO and BSD are not supported.
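
As a rough sketch of the unmapping step, assuming a table of
address/size pairs similar to the maps kept per uio resource, the loop
below releases every mapped region with munmap(). The names `res_map`
and `res_unmap_all()` are hypothetical and not part of the patch.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical record of one mapped region. */
struct res_map {
	void *addr;
	size_t size;
};

/*
 * Unmap every non-NULL entry and clear its address.
 * Returns the number of munmap() calls that failed.
 */
int
res_unmap_all(struct res_map *maps, int nb_maps)
{
	int i, failed = 0;

	assert(maps != NULL || nb_maps == 0);
	for (i = 0; i < nb_maps; i++) {
		if (maps[i].addr == NULL)
			continue;
		if (munmap(maps[i].addr, maps[i].size) != 0)
			failed++;
		maps[i].addr = NULL;
	}
	return failed;
}
```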

v9:
- Remove "rte_dev_hotplug.h".
- Remove needless "#ifdef".
  (Thanks to Thomas Monjalon and Neil Horman)
- Remove pci_unmap_device(). It will be implemented in later patch.
v8:
- Fix typo.
  (Thanks to Iremonger, Bernard)
v5:
- Fix pci_unmap_device() to check pt_driver.
v4:
- Add parameter checking.
- Add header file to determine if hotplug can be enabled.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/linuxapp/eal/eal_pci.c  | 17 
 lib/librte_eal/linuxapp/eal/eal_pci_init.h |  7 
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c  | 65 ++
 3 files changed, 89 insertions(+)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 06bfc1a..d03429c 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -168,6 +168,23 @@ pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size,
return mapaddr;
 }

+/* unmap a particular resource */
+void
+pci_unmap_resource(void *requested_addr, size_t size)
+{
+   if (requested_addr == NULL)
+   return;
+
+   /* Unmap the PCI memory resource of device */
+   if (munmap(requested_addr, size)) {
+   RTE_LOG(ERR, EAL, "%s(): cannot munmap(%p, 0x%lx): %s\n",
+   __func__, requested_addr, (unsigned long)size,
+   strerror(errno));
+   } else
+   RTE_LOG(DEBUG, EAL, "  PCI memory unmapped at %p\n",
+   requested_addr);
+}
+
 /* parse the "resource" sysfs file */
 static int
 pci_parse_sysfs_resource(const char *filename, struct rte_pci_device *dev)
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_init.h b/lib/librte_eal/linuxapp/eal/eal_pci_init.h
index 03d2b52..6af84d1 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_init.h
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_init.h
@@ -72,6 +72,13 @@ void *pci_map_resource(void *requested_addr, int fd, off_t offset,
 /* map IGB_UIO resource prototype */
 int pci_uio_map_resource(struct rte_pci_device *dev);

+void pci_unmap_resource(void *requested_addr, size_t size);
+
+#ifdef RTE_LIBRTE_EAL_HOTPLUG
+/* unmap IGB_UIO resource prototype */
+void pci_uio_unmap_resource(struct rte_pci_device *dev);
+#endif /* RTE_LIBRTE_EAL_HOTPLUG */
+
 #ifdef VFIO_PRESENT

 #define VFIO_MAX_GROUPS 64
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index c5e0cf3..35d31c5 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -386,3 +386,68 @@ pci_uio_map_resource(struct rte_pci_device *dev)

return 0;
 }
+
+#ifdef RTE_LIBRTE_EAL_HOTPLUG
+static void
+pci_uio_unmap(struct mapped_pci_resource *uio_res)
+{
+   int i;
+
+   if (uio_res == NULL)
+   return;
+
+   for (i = 0; i != uio_res->nb_maps; i++)
+   pci_unmap_resource(uio_res->maps[i].addr,
+   (size_t)uio_res->maps[i].size);
+}
+
+static struct mapped_pci_resource *
+pci_uio_find_resource(struct rte_pci_device *dev)
+{
+   struct mapped_pci_resource *uio_res;
+
+   if (dev == NULL)
+   return NULL;
+
+   TAILQ_FOREACH(uio_res, pci_res_list, next) {
+
+   /* skip this element if it doesn't match our PCI address */
+   if (!rte_eal_compare_pci_addr(&uio_res->pci_addr, &dev->addr))
+   return uio_res;
+   }
+   return NULL;
+}
+
+/* unmap the PCI resource of a PCI device in virtual memory */
+void
+pci_uio_unmap_resource(struct rte_pci_device *dev)
+{
+   struct mapped_pci_resource *uio_res;
+
+   if (dev == NULL)
+   return;
+
+   /* find an entry for the device */
+   uio_res = pci_uio_find_resource(dev);
+   if (uio_res == NULL)
+   return;
+
+   /* secondary processes - just free maps */
+   if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+   return pci_uio_unmap(uio_res);
+
+   TAILQ_REMOVE(pci_res_list, uio_res, next);
+
+   /* unmap all resources */
+   pci_uio_unmap(uio_res);
+
+   /* free uio resource */
+   rte_free(uio_res);
+
+   /* close fd if in primary process */
+   close(dev->intr_handle.fd);
+
+   dev->intr_handle.fd = -1;
+   dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
+}
+#endif /* RTE_LIBRTE_EAL_HOTPLUG */
-- 
1.9.1

[dpdk-dev] [PATCH v14 08/13] ethdev: Add functions that will be used by port hotplug functions

2015-02-25 Thread Tetsuya Mukawa

The patch adds the following functions.

- rte_eth_dev_save()
  The function is used for saving current rte_eth_dev structures.
- rte_eth_dev_get_changed_port()
  The function receives saved rte_eth_dev structures, then compares
  them with the current values to find which port was actually
  attached or detached.
- rte_eth_dev_get_addr_by_port()
  The function returns a pci address of an ethdev specified by port
  identifier.
- rte_eth_dev_get_port_by_addr()
  The function returns a port identifier of an ethdev specified by
  pci address.
- rte_eth_dev_get_name_by_port()
  The function returns a unique identifier name of an ethdev
  specified by port identifier.
- rte_eth_dev_is_detachable()
  The function returns whether a PMD supports the detach function.

Also, the patch changes the scope of rte_eth_dev_allocated() to global,
because virtual PMDs will call it to support port hotplug.
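
The save-then-compare idea behind rte_eth_dev_save() and
rte_eth_dev_get_changed_port() can be illustrated with a reduced sketch.
The names below (`snap_port`, `SNAP_MAX_PORTS`, `snap_save()`,
`snap_changed()`) are hypothetical, and the structure carries only the
attached flag rather than the full rte_eth_dev contents.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SNAP_MAX_PORTS 32 /* hypothetical stand-in for RTE_MAX_ETHPORTS */

struct snap_port {
	uint8_t attached;
};

/* Copy the live table into a caller-provided snapshot buffer. */
int
snap_save(struct snap_port *snap, const struct snap_port *live, size_t size)
{
	if (snap == NULL || live == NULL ||
	    size != sizeof(struct snap_port) * SNAP_MAX_PORTS)
		return -1;
	memcpy(snap, live, size);
	return 0;
}

/* Return the first port whose attached flag differs, or -1 if none. */
int
snap_changed(const struct snap_port *snap, const struct snap_port *live)
{
	int i;

	assert(snap != NULL && live != NULL);
	for (i = 0; i < SNAP_MAX_PORTS; i++) {
		if (snap[i].attached != live[i].attached)
			return i;
	}
	return -1;
}
```

A caller would take the snapshot before attaching or detaching a device
and compare afterwards to learn which port id changed.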

v10:
- Change order of version.map.
  (Thanks to Thomas Monjalon)
v9:
- rte_eth_dev_check_detachable() is replaced by
  rte_eth_dev_is_detachable().
- strncpy() is replaced by strcpy().
  (Thanks to Thomas Monjalon)
- Add missing symbol in version map.
  (Thanks to Neil Horman)
v8:
- Add size parameter to rte_eth_dev_save().
- Add missing symbol in version map.
  (Thanks to Qiu, Michael and Iremonger, Bernard)
v7:
- Add pt_driver checking to rte_eth_dev_check_detachable().
  (Thanks to Qiu, Michael)
v5:
- Fix return value of below functions.
  rte_eth_dev_get_changed_port().
  rte_eth_dev_get_port_by_addr().
v4:
- Add parameter checking.
v3:
- Fix if-condition bug while comparing pci addresses.
- Add error checking codes.
Reported-by: Mark Enright 

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_ether/rte_ethdev.c  | 103 -
 lib/librte_ether/rte_ethdev.h  |  83 ++
 lib/librte_ether/rte_ether_version.map |   7 +++
 3 files changed, 192 insertions(+), 1 deletion(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 165ec74..1f6a066 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -201,7 +201,7 @@ rte_eth_dev_data_alloc(void)
RTE_MAX_ETHPORTS * sizeof(*rte_eth_dev_data));
 }

-static struct rte_eth_dev *
+struct rte_eth_dev *
 rte_eth_dev_allocated(const char *name)
 {
unsigned i;
@@ -426,6 +426,107 @@ rte_eth_dev_count(void)
return (nb_ports);
 }

+int
+rte_eth_dev_save(struct rte_eth_dev *devs, size_t size)
+{
+   if ((devs == NULL) ||
+   (size != sizeof(struct rte_eth_dev) * RTE_MAX_ETHPORTS))
+   return -EINVAL;
+
+   /* save current rte_eth_devices */
+   memcpy(devs, rte_eth_devices, size);
+   return 0;
+}
+
+int
+rte_eth_dev_get_changed_port(struct rte_eth_dev *devs, uint8_t *port_id)
+{
+   if ((devs == NULL) || (port_id == NULL))
+   return -EINVAL;
+
+   /* check which port was attached or detached */
+   for (*port_id = 0; *port_id < RTE_MAX_ETHPORTS; (*port_id)++, devs++) {
+   if (rte_eth_devices[*port_id].attached ^ devs->attached)
+   return 0;
+   }
+   return -ENODEV;
+}
+
+int
+rte_eth_dev_get_addr_by_port(uint8_t port_id, struct rte_pci_addr *addr)
+{
+   if (!rte_eth_dev_is_valid_port(port_id)) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return -EINVAL;
+   }
+
+   if (addr == NULL) {
+   PMD_DEBUG_TRACE("Null pointer is specified\n");
+   return -EINVAL;
+   }
+
+   *addr = rte_eth_devices[port_id].pci_dev->addr;
+   return 0;
+}
+
+int
+rte_eth_dev_get_port_by_addr(struct rte_pci_addr *addr, uint8_t *port_id)
+{
+   struct rte_pci_addr *tmp;
+
+   if ((addr == NULL) || (port_id == NULL)) {
+   PMD_DEBUG_TRACE("Null pointer is specified\n");
+   return -EINVAL;
+   }
+
+   for (*port_id = 0; *port_id < RTE_MAX_ETHPORTS; (*port_id)++) {
+   if (!rte_eth_devices[*port_id].attached)
+   continue;
+   if (!rte_eth_devices[*port_id].pci_dev)
+   continue;
+   tmp = &rte_eth_devices[*port_id].pci_dev->addr;
+   if (rte_eal_compare_pci_addr(tmp, addr) == 0)
+   return 0;
+   }
+   return -ENODEV;
+}
+
+int
+rte_eth_dev_get_name_by_port(uint8_t port_id, char *name)
+{
+   char *tmp;
+
+   if (!rte_eth_dev_is_valid_port(port_id)) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return -EINVAL;
+   }
+
+   if (name == NULL) {
+   PMD_DEBUG_TRACE("Null pointer is specified\n");
+   return -EINVAL;
+   }
+
+   /* shouldn't check 'rte_eth_devices[i].data',
+* because it might be overwritten by VDEV PMD */
+   tmp = rte_eth_dev_data[port_id].name;
+   strcpy(name,

[dpdk-dev] [PATCH v14 07/13] eal, ethdev: Add a function and function pointers to close ether device

2015-02-25 Thread Tetsuya Mukawa

The patch adds a function pointer to the rte_pci_driver and eth_driver
structures. These function pointers are used when ports are detached.
The patch also adds rte_eth_dev_uninit(). Nothing calls it yet, but
it will be called once the port hotplug function is implemented.
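
A minimal sketch of the uninit-callback pattern follows. It assumes a
driver structure that carries an optional uninit pointer next to the
init pointer; all names (`cb_driver`, `cb_device`, `cb_detach()`,
`cb_demo_uninit()`) are hypothetical and much simpler than the real
rte_pci_driver and eth_driver structures.

```c
#include <assert.h>
#include <stddef.h>

struct cb_device;

typedef int (cb_init_t)(struct cb_device *);
typedef int (cb_uninit_t)(struct cb_device *);

/* Hypothetical driver: 'uninit' is NULL when detach is unsupported. */
struct cb_driver {
	const char *name;
	cb_init_t *init;
	cb_uninit_t *uninit;
};

struct cb_device {
	const struct cb_driver *drv;
	int active;
};

/* Example uninit callback: just mark the device inactive. */
int
cb_demo_uninit(struct cb_device *dev)
{
	assert(dev != NULL);
	dev->active = 0;
	return 0;
}

/* Call the driver's uninit callback, then drop the driver reference. */
int
cb_detach(struct cb_device *dev)
{
	int ret;

	if (dev == NULL || dev->drv == NULL)
		return -1;
	if (dev->drv->uninit == NULL)
		return -1; /* driver cannot detach */
	ret = dev->drv->uninit(dev);
	if (ret != 0)
		return ret; /* propagate the driver's error */
	dev->drv = NULL;
	return 0;
}
```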

v10:
- Add size parameter to rte_eth_dev_create_unique_device_name().
  (Thanks to Iremonger, Bernard)
v9:
- Change parameter of pci_devuninit_t and rte_eth_dev_uninit.
- Remove code that initializes callback of ethdev from
  rte_eth_dev_uninit().
- Add a function to create a unique device name.
  (Thanks to Thomas Monjalon)
v6:
- Fix rte_eth_dev_uninit() to handle a return value of uninit
  function of PMD.
v4:
- Add parameter checking.
- Change function names.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/include/rte_pci.h |  6 
 lib/librte_ether/rte_ethdev.c   | 64 +++--
 lib/librte_ether/rte_ethdev.h   | 24 +
 3 files changed, 92 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index dcf9c81..ecde36f 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -192,12 +192,18 @@ struct rte_pci_driver;
 typedef int (pci_devinit_t)(struct rte_pci_driver *, struct rte_pci_device *);

 /**
+ * Uninitialisation function for the driver called during hotplugging.
+ */
+typedef int (pci_devuninit_t)(struct rte_pci_device *);
+
+/**
  * A structure describing a PCI driver.
  */
 struct rte_pci_driver {
TAILQ_ENTRY(rte_pci_driver) next;   /**< Next in list. */
const char *name;   /**< Driver name. */
pci_devinit_t *devinit; /**< Device init. function. */
+   pci_devuninit_t *devuninit; /**< Device uninit function. */
struct rte_pci_id *id_table;/**< ID table, NULL terminated. */
uint32_t drv_flags; /**< Flags contolling handling of device. */
 };
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index a089557..165ec74 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -266,6 +266,24 @@ rte_eth_dev_release_port(struct rte_eth_dev *eth_dev)
return 0;
 }

+static inline int
+rte_eth_dev_create_unique_device_name(char *name, size_t size,
+   struct rte_pci_device *pci_dev)
+{
+   int ret;
+
+   if ((name == NULL) || (pci_dev == NULL))
+   return -EINVAL;
+
+   ret = snprintf(name, size, "%d:%d.%d",
+   pci_dev->addr.bus, pci_dev->addr.devid,
+   pci_dev->addr.function);
+   if (ret < 0)
+   return ret;
+
+   return 0;
+}
+
 static int
 rte_eth_dev_init(struct rte_pci_driver *pci_drv,
 struct rte_pci_device *pci_dev)
@@ -279,8 +297,8 @@ rte_eth_dev_init(struct rte_pci_driver *pci_drv,
eth_drv = (struct eth_driver *)pci_drv;

/* Create unique Ethernet device name using PCI address */
-   snprintf(ethdev_name, RTE_ETH_NAME_MAX_LEN, "%d:%d.%d",
-   pci_dev->addr.bus, pci_dev->addr.devid, pci_dev->addr.function);
+   rte_eth_dev_create_unique_device_name(ethdev_name,
+   sizeof(ethdev_name), pci_dev);

eth_dev = rte_eth_dev_allocate(ethdev_name);
if (eth_dev == NULL)
@@ -321,6 +339,47 @@ rte_eth_dev_init(struct rte_pci_driver *pci_drv,
return diag;
 }

+static int
+rte_eth_dev_uninit(struct rte_pci_device *pci_dev)
+{
+   const struct eth_driver *eth_drv;
+   struct rte_eth_dev *eth_dev;
+   char ethdev_name[RTE_ETH_NAME_MAX_LEN];
+   int ret;
+
+   if (pci_dev == NULL)
+   return -EINVAL;
+
+   /* Create unique Ethernet device name using PCI address */
+   rte_eth_dev_create_unique_device_name(ethdev_name,
+   sizeof(ethdev_name), pci_dev);
+
+   eth_dev = rte_eth_dev_allocated(ethdev_name);
+   if (eth_dev == NULL)
+   return -ENODEV;
+
+   eth_drv = (const struct eth_driver *)pci_dev->driver;
+
+   /* Invoke PMD device uninit function */
+   if (*eth_drv->eth_dev_uninit) {
+   ret = (*eth_drv->eth_dev_uninit)(eth_drv, eth_dev);
+   if (ret)
+   return ret;
+   }
+
+   /* free ether device */
+   rte_eth_dev_release_port(eth_dev);
+
+   if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+   rte_free(eth_dev->data->dev_private);
+
+   eth_dev->pci_dev = NULL;
+   eth_dev->driver = NULL;
+   eth_dev->data = NULL;
+
+   return 0;
+}
+
 /**
  * Register an Ethernet [Poll Mode] driver.
  *
@@ -339,6 +398,7 @@ void
 rte_eth_driver_register(struct eth_driver *eth_drv)
 {
eth_drv->pci_drv.devinit = rte_eth_dev_init;
+   eth_drv->pci_drv.devuninit = rte_eth_dev_uninit;
rte_eal_pci_regis

[dpdk-dev] [PATCH v14 06/13] ethdev: Add rte_eth_dev_release_port to release specified port

2015-02-25 Thread Tetsuya Mukawa

This patch adds rte_eth_dev_release_port(). The function changes the
attached status of the specified device.

v9:
- rte_eth_dev_free() is replaced by rte_eth_dev_release_port().
  (Thanks to Thomas Monjalon)
v6:
- Use rte_eth_dev structure as the parameter of rte_eth_dev_free().
v4:
- Add parameter checking.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_ether/rte_ethdev.c | 11 +++
 lib/librte_ether/rte_ethdev.h | 12 
 2 files changed, 23 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index b702039..a089557 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -255,6 +255,17 @@ rte_eth_dev_allocate(const char *name)
return eth_dev;
 }

+int
+rte_eth_dev_release_port(struct rte_eth_dev *eth_dev)
+{
+   if (eth_dev == NULL)
+   return -EINVAL;
+
+   eth_dev->attached = 0;
+   nb_ports--;
+   return 0;
+}
+
 static int
 rte_eth_dev_init(struct rte_pci_driver *pci_drv,
 struct rte_pci_device *pci_dev)
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 110ddba..7963e56 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1539,6 +1539,18 @@ extern uint8_t rte_eth_dev_count(void);
  */
 struct rte_eth_dev *rte_eth_dev_allocate(const char *name);

+/**
+ * Function for internal use by dummy drivers primarily, e.g. ring-based
+ * driver.
+ * Release the specified ethdev port.
+ *
+ * @param eth_dev
+ * The *eth_dev* pointer is the address of the *rte_eth_dev* structure.
+ * @return
+ *   - 0 on success, negative on error
+ */
+int rte_eth_dev_release_port(struct rte_eth_dev *eth_dev);
+
 struct eth_driver;
 /**
  * @internal
-- 
1.9.1

[dpdk-dev] [PATCH v14 05/13] eal/pci: Consolidate pci address comparison APIs

2015-02-25 Thread Tetsuya Mukawa

This patch replaces pci_addr_comparison() and memcmp() of pci addresses by
rte_eal_compare_pci_addr().

To compare PCI addresses, rte_eal_compare_pci_addr() doesn't use memcmp().
This is because sizeof(struct rte_pci_addr) returns 6, although the fields
of the structure, shown below, occupy only 5 bytes.

struct rte_pci_addr {
uint16_t domain;/**< Device domain */
uint8_t bus;/**< Device bus */
uint8_t devid;  /**< Device ID */
uint8_t function;   /**< Device function. */
};

If the structure is dynamically allocated in a function without bzero(),
the last (padding) byte may hold an arbitrary value, so memcmp() may report
a difference between equal addresses.
To avoid this, rte_eal_compare_pci_addr() compares the following value,
computed for each address.

dev_addr = (addr->domain << 24) | (addr->bus << 16) |
(addr->devid << 8) | addr->function;
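
The field-by-field comparison described above can be sketched as a
standalone example. The names (`cmp_pci_addr`, `cmp_key()`,
`cmp_addr()`, `cmp_padding_demo()`) are hypothetical stand-ins rather
than the DPDK definitions; only the field layout is taken from the
commit log.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in with the same fields as struct rte_pci_addr. */
struct cmp_pci_addr {
	uint16_t domain;
	uint8_t bus;
	uint8_t devid;
	uint8_t function;
};

/* Pack the four fields into one integer; padding bytes are never read. */
static uint64_t
cmp_key(const struct cmp_pci_addr *a)
{
	return ((uint64_t)a->domain << 24) | ((uint64_t)a->bus << 16) |
		((uint64_t)a->devid << 8) | (uint64_t)a->function;
}

/* Return 0 if equal, positive if a > b, negative if a < b or on NULL. */
int
cmp_addr(const struct cmp_pci_addr *a, const struct cmp_pci_addr *b)
{
	uint64_t ka, kb;

	if (a == NULL || b == NULL)
		return -1;
	ka = cmp_key(a);
	kb = cmp_key(b);
	if (ka > kb)
		return 1;
	if (ka < kb)
		return -1;
	return 0;
}

/*
 * Fill two objects with different byte patterns, then give them the
 * same field values. Whatever the padding holds, the keys still match.
 */
int
cmp_padding_demo(void)
{
	struct cmp_pci_addr x, y;

	memset(&x, 0x00, sizeof(x));
	memset(&y, 0xff, sizeof(y));
	x.domain = y.domain = 0;
	x.bus = y.bus = 3;
	x.devid = y.devid = 0;
	x.function = y.function = 1;
	assert(cmp_key(&x) == cmp_key(&y));
	return cmp_addr(&x, &y);
}
```

The casts to uint64_t keep the shifted domain from overflowing a 32-bit
int, which is a choice made for this sketch.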

v9:
- eal_compare_pci_addr() is replaced by rte_eal_compare_pci_addr().
- Fix commit log.
  (Thanks to Thomas Monjalon)
v8:
- Fix pci_scan_one() to update sysfs values.
  (Thanks to Qiu, Michael and Iremonger, Bernard)
v5:
- Fix pci_scan_one to handle pt_driver correctly.
v4:
- Fix calculation method of eal_compare_pci_addr().
- Add parameter checking.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/bsdapp/eal/eal_pci.c   | 29 --
 lib/librte_eal/common/eal_common_pci.c|  2 +-
 lib/librte_eal/common/include/rte_pci.h   | 34 +++
 lib/librte_eal/linuxapp/eal/eal_pci.c | 30 +--
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c |  2 +-
 5 files changed, 63 insertions(+), 34 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 74ecce7..9193f80 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -270,20 +270,6 @@ pci_uio_map_resource(struct rte_pci_device *dev)
return (0);
 }

-/* Compare two PCI device addresses. */
-static int
-pci_addr_comparison(struct rte_pci_addr *addr, struct rte_pci_addr *addr2)
-{
-   uint64_t dev_addr = (addr->domain << 24) + (addr->bus << 16) + (addr->devid << 8) + addr->function;
-   uint64_t dev_addr2 = (addr2->domain << 24) + (addr2->bus << 16) + (addr2->devid << 8) + addr2->function;
-
-   if (dev_addr > dev_addr2)
-   return 1;
-   else
-   return 0;
-}
-
-
 /* Scan one pci sysfs entry, and fill the devices list from it. */
 static int
 pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
@@ -356,13 +342,24 @@ pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
}
else {
struct rte_pci_device *dev2 = NULL;
+   int ret;

TAILQ_FOREACH(dev2, &pci_device_list, next) {
-   if (pci_addr_comparison(&dev->addr, &dev2->addr))
+   ret = rte_eal_compare_pci_addr(&dev->addr, &dev2->addr);
+   if (ret > 0)
continue;
-   else {
+   else if (ret < 0) {
TAILQ_INSERT_BEFORE(dev2, dev, next);
return 0;
+   } else { /* already registered */
+   /* update pt_driver */
+   dev2->pt_driver = dev->pt_driver;
+   dev2->max_vfs = dev->max_vfs;
+   memmove(dev2->mem_resource,
+   dev->mem_resource,
+   sizeof(dev->mem_resource));
+   free(dev);
+   return 0;
}
}
TAILQ_INSERT_TAIL(&pci_device_list, dev, next);
diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index f3c7f71..bf2793f 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -93,7 +93,7 @@ static struct rte_devargs *pci_devargs_lookup(struct rte_pci_device *dev)
if (devargs->type != RTE_DEVTYPE_BLACKLISTED_PCI &&
devargs->type != RTE_DEVTYPE_WHITELISTED_PCI)
continue;
-   if (!memcmp(&dev->addr, &devargs->pci.addr, sizeof(dev->addr)))
+   if (!rte_eal_compare_pci_addr(&dev->addr, &devargs->pci.addr))
return devargs;
}
return NULL;
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 255a77b..dcf9c81 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -272,6 +272,40 @@ eal_parse_pci_DomBDF(const char *input, struct rte_pci_addr *dev_addr)
 }
 #undef GET_PCIADDR_FIELD

+/* Compare two PCI device addresses. */
+/**
+ * Utility function to compare two PCI device addresses.
+ *
+

[dpdk-dev] [PATCH v14 04/13] eal/pci, ethdev: Remove assumption that port will not be detached

2015-02-25 Thread Tetsuya Mukawa

This patch adds "RTE_PCI_DRV_DETACHABLE" to drv_flags of the rte_pci_driver
structure. The flag indicates that the driver can detach devices at runtime.
The patch also removes the assumption that a port will not be detached.

To remove the assumption:
- Add 'attached' member to rte_eth_dev structure.
  This member indicates whether the port is attached or not.
  DEV_ATTACHED indicates a port is attached.
  DEV_DETACHED indicates a port is detached.
- Add rte_eth_dev_find_free_port().
  This function is used for finding a free port when allocating a new one.
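
To illustrate why an explicit attached flag is needed once ports can be
detached, the sketch below allocates the first free slot instead of
using a running port counter as the next index. The names (`slot_port`,
`SLOT_MAX_PORTS`, `slot_alloc()`, `slot_release()`) are hypothetical and
the table is far simpler than rte_eth_devices.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_MAX_PORTS 4 /* hypothetical, small for illustration */

enum { SLOT_DETACHED = 0, SLOT_ATTACHED };

struct slot_port {
	uint8_t attached;
};

static struct slot_port slot_ports[SLOT_MAX_PORTS];

/* Return the first detached slot, or SLOT_MAX_PORTS if the table is full. */
int
slot_find_free(void)
{
	int i;

	for (i = 0; i < SLOT_MAX_PORTS; i++) {
		if (slot_ports[i].attached == SLOT_DETACHED)
			return i;
	}
	return SLOT_MAX_PORTS;
}

/* Attach a new port; return its id, or -1 if no slot is free. */
int
slot_alloc(void)
{
	int id = slot_find_free();

	if (id == SLOT_MAX_PORTS)
		return -1;
	assert(slot_ports[id].attached == SLOT_DETACHED);
	slot_ports[id].attached = SLOT_ATTACHED;
	return id;
}

/* Detach a port; return -1 if the id is invalid or already detached. */
int
slot_release(int id)
{
	if (id < 0 || id >= SLOT_MAX_PORTS ||
	    slot_ports[id].attached != SLOT_ATTACHED)
		return -1;
	slot_ports[id].attached = SLOT_DETACHED;
	return 0;
}
```

After a release in the middle of the table, the next allocation reuses
that slot, so valid port ids are no longer a contiguous range starting
at 0.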

v9:
- DEV_INVALID/VALID are removed.
- DEV_DISCONNECTED is replaced by DEV_DETACHED.
- DEV_CONNECTED is replaced by DEV_ATTACHED.
- rte_eth_dev_allocate_new_port() is renamed to
  rte_eth_dev_find_free_port().
- rte_eth_dev_validate_port() is renamed to rte_eth_dev_is_valid_port().
- rte_eth_dev_is_valid_port() is changed not to handle log toggle.
- Fix commit log to describe DEV_ATTACHED and DEV_DETACHED.
  (Thanks to Thomas Monjalon)
v8:
- NONE_TRACE is changed to NO_TRACE.
  (Thanks to Iremonger, Bernard)
v5:
- Change parameters of rte_eth_dev_validate_port() to cleanup code.
v4:
- Use braces with 'for' loop.
- Fix indent of 'if' statement.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/include/rte_pci.h |   2 +
 lib/librte_ether/rte_ethdev.c   | 248 
 lib/librte_ether/rte_ethdev.h   |   5 +
 3 files changed, 164 insertions(+), 91 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index a87b4b3..255a77b 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -210,6 +210,8 @@ struct rte_pci_driver {
 #define RTE_PCI_DRV_FORCE_UNBIND 0x0004
 /** Device driver supports link state interrupt */
 #define RTE_PCI_DRV_INTR_LSC   0x0008
+/** Device driver supports detaching capability */
+#define RTE_PCI_DRV_DETACHABLE 0x0010

 /**< Internal use only - Macro used by pci addr parsing functions **/
 #define GET_PCIADDR_FIELD(in, fd, lim, dlm)   \
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index ecbe93c..b702039 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -175,6 +175,11 @@ enum {
STAT_QMAP_RX
 };

+enum {
+   DEV_DETACHED = 0,
+   DEV_ATTACHED
+};
+
 static inline void
 rte_eth_dev_data_alloc(void)
 {
@@ -201,19 +206,34 @@ rte_eth_dev_allocated(const char *name)
 {
unsigned i;

-   for (i = 0; i < nb_ports; i++) {
-   if (strcmp(rte_eth_devices[i].data->name, name) == 0)
+   for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
+   if ((rte_eth_devices[i].attached == DEV_ATTACHED) &&
+   strcmp(rte_eth_devices[i].data->name, name) == 0)
return &rte_eth_devices[i];
}
return NULL;
 }

+static uint8_t
+rte_eth_dev_find_free_port(void)
+{
+   unsigned i;
+
+   for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
+   if (rte_eth_devices[i].attached == DEV_DETACHED)
+   return i;
+   }
+   return RTE_MAX_ETHPORTS;
+}
+
 struct rte_eth_dev *
 rte_eth_dev_allocate(const char *name)
 {
+   uint8_t port_id;
struct rte_eth_dev *eth_dev;

-   if (nb_ports == RTE_MAX_ETHPORTS) {
+   port_id = rte_eth_dev_find_free_port();
+   if (port_id == RTE_MAX_ETHPORTS) {
PMD_DEBUG_TRACE("Reached maximum number of Ethernet ports\n");
return NULL;
}
@@ -226,10 +246,12 @@ rte_eth_dev_allocate(const char *name)
return NULL;
}

-   eth_dev = &rte_eth_devices[nb_ports];
-   eth_dev->data = &rte_eth_dev_data[nb_ports];
+   eth_dev = &rte_eth_devices[port_id];
+   eth_dev->data = &rte_eth_dev_data[port_id];
snprintf(eth_dev->data->name, sizeof(eth_dev->data->name), "%s", name);
-   eth_dev->data->port_id = nb_ports++;
+   eth_dev->data->port_id = port_id;
+   eth_dev->attached = DEV_ATTACHED;
+   nb_ports++;
return eth_dev;
 }

@@ -283,6 +305,7 @@ rte_eth_dev_init(struct rte_pci_driver *pci_drv,
(unsigned) pci_dev->id.device_id);
if (rte_eal_process_type() == RTE_PROC_PRIMARY)
rte_free(eth_dev->data->dev_private);
+   eth_dev->attached = DEV_DETACHED;
nb_ports--;
return diag;
 }
@@ -308,10 +331,20 @@ rte_eth_driver_register(struct eth_driver *eth_drv)
rte_eal_pci_register(&eth_drv->pci_drv);
 }

+static int
+rte_eth_dev_is_valid_port(uint8_t port_id)
+{
+   if (port_id >= RTE_MAX_ETHPORTS ||
+   rte_eth_devices[port_id].attached != DEV_ATTACHED)
+   return 0;
+   else
+   return 1;
+}
+
 int
 rte_eth_dev_socket_id(uint8_t port_id)
 {
-   if (port_id >= nb_ports)
+   if (!rte_eth_dev_is_valid_port(port_id))
return -1;
return rte_eth_device

[dpdk-dev] [PATCH v14 03/13] eal_pci: pci memory map work with driver type

2015-02-25 Thread Tetsuya Mukawa

From: Michael Qiu 

With the driver type flag in struct rte_pci_device, we no longer need
to map uio devices with the vfio related function whenever
vfio is enabled.
Signed-off-by: Michael Qiu 
Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/linuxapp/eal/eal_pci.c | 30 +-
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 4615756..3291c68 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -555,25 +555,29 @@ pci_config_space_set(struct rte_pci_device *dev)
 static int
 pci_map_device(struct rte_pci_device *dev)
 {
-   int ret, mapped = 0;
+   int ret = -1;

/* try mapping the NIC resources using VFIO if it exists */
+   switch (dev->pt_driver) {
+   case RTE_PT_VFIO:
 #ifdef VFIO_PRESENT
-   if (pci_vfio_is_enabled()) {
-   ret = pci_vfio_map_resource(dev);
-   if (ret == 0)
-   mapped = 1;
-   else if (ret < 0)
-   return ret;
-   }
+   if (pci_vfio_is_enabled())
+   ret = pci_vfio_map_resource(dev);
 #endif
-   /* map resources for devices that use uio_pci_generic or igb_uio */
-   if (!mapped) {
+   break;
+   case RTE_PT_IGB_UIO:
+   case RTE_PT_UIO_GENERIC:
+   /* map resources for devices that use uio */
ret = pci_uio_map_resource(dev);
-   if (ret != 0)
-   return ret;
+   break;
+   default:
+   RTE_LOG(DEBUG, EAL, "  Not managed by known pt driver,"
+   " skipped\n");
+   ret = 1;
+   break;
}
-   return 0;
+
+   return ret;
 }

 /*
-- 
1.9.1

[dpdk-dev] [PATCH v14 02/13] eal_pci: Add flag to hold kernel driver type

2015-02-25 Thread Tetsuya Mukawa

From: Michael Qiu 

Currently, dpdk has no way to know which type of driver
(vfio-pci/igb_uio/uio_pci_generic) a device uses. It can only
check statically whether vfio is enabled or not.

It is really useful to have the flag, because different types need to
be handled differently at runtime, for example in pci memory map,
port hotplug, and so on.

This patch adds a flag field to the pci device to solve the above issue.
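
One way to detect the bound kernel driver, sketched here under the
assumption that the device directory contains a "driver" symlink whose
last path component is the driver name, is to resolve the link with
readlink() and classify the name. The names `kd_type`,
`kd_from_name()` and `kd_from_link()` are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <unistd.h>

enum kd_type { KD_UNKNOWN, KD_IGB_UIO, KD_VFIO, KD_UIO_GENERIC };

/* Map a kernel driver name to a driver type. */
enum kd_type
kd_from_name(const char *name)
{
	assert(name != NULL);
	if (strcmp(name, "vfio-pci") == 0)
		return KD_VFIO;
	if (strcmp(name, "igb_uio") == 0)
		return KD_IGB_UIO;
	if (strcmp(name, "uio_pci_generic") == 0)
		return KD_UIO_GENERIC;
	return KD_UNKNOWN;
}

/*
 * Resolve a "driver" symlink and classify its last path component.
 * Returns 0 on success, 1 if the link cannot be read (for example,
 * no driver is bound), and -1 on invalid arguments.
 */
int
kd_from_link(const char *link_path, enum kd_type *out)
{
	char target[PATH_MAX];
	const char *base;
	ssize_t n;

	if (link_path == NULL || out == NULL)
		return -1;

	/* leave room for the terminator; readlink() does not add one */
	n = readlink(link_path, target, sizeof(target) - 1);
	if (n < 0) {
		*out = KD_UNKNOWN;
		return 1;
	}
	target[n] = '\0';

	base = strrchr(target, '/');
	base = (base != NULL) ? base + 1 : target;
	*out = kd_from_name(base);
	return 0;
}
```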

Signed-off-by: Michael Qiu 
Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_eal/common/include/rte_pci.h |  8 +
 lib/librte_eal/linuxapp/eal/eal_pci.c   | 53 +++--
 2 files changed, 59 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 3df07e8..a87b4b3 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -142,6 +142,13 @@ struct rte_pci_addr {

 struct rte_devargs;

+enum rte_pt_driver {
+   RTE_PT_UNKNOWN  = 0,
+   RTE_PT_IGB_UIO  = 1,
+   RTE_PT_VFIO = 2,
+   RTE_PT_UIO_GENERIC  = 3,
+};
+
 /**
  * A structure describing a PCI device.
  */
@@ -155,6 +162,7 @@ struct rte_pci_device {
uint16_t max_vfs;   /**< sriov enable if not zero */
int numa_node;  /**< NUMA node connection */
struct rte_devargs *devargs;/**< Device user arguments */
+   enum rte_pt_driver pt_driver;   /**< Driver of passthrough */
 };

 /** Any PCI device identifier (vendor, device, ...) */
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c 
b/lib/librte_eal/linuxapp/eal/eal_pci.c
index a4fd5f5..4615756 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -97,6 +97,35 @@ error:
return -1;
 }

+static int
+pci_get_kernel_driver_by_path(const char *filename, char *dri_name)
+{
+   int count;
+   char path[PATH_MAX];
+   char *name;
+
+   if (!filename || !dri_name)
+   return -1;
+
+   count = readlink(filename, path, PATH_MAX);
+   if (count >= PATH_MAX)
+   return -1;
+
+   /* For device does not have a driver */
+   if (count < 0)
+   return 1;
+
+   path[count] = '\0';
+
+   name = strrchr(path, '/');
+   if (name) {
+   strncpy(dri_name, name + 1, strlen(name + 1) + 1);
+   return 0;
+   }
+
+   return -1;
+}
+
 void *
 pci_find_max_end_va(void)
 {
@@ -221,11 +250,12 @@ pci_scan_one(const char *dirname, uint16_t domain, 
uint8_t bus,
char filename[PATH_MAX];
unsigned long tmp;
struct rte_pci_device *dev;
+   char driver[PATH_MAX];
+   int ret;

dev = malloc(sizeof(*dev));
-   if (dev == NULL) {
+   if (dev == NULL)
return -1;
-   }

memset(dev, 0, sizeof(*dev));
dev->addr.domain = domain;
@@ -304,6 +334,25 @@ pci_scan_one(const char *dirname, uint16_t domain, uint8_t 
bus,
return -1;
}

+   /* parse driver */
+   snprintf(filename, sizeof(filename), "%s/driver", dirname);
+   ret = pci_get_kernel_driver_by_path(filename, driver);
+   if (!ret) {
+   if (!strcmp(driver, "vfio-pci"))
+   dev->pt_driver = RTE_PT_VFIO;
+   else if (!strcmp(driver, "igb_uio"))
+   dev->pt_driver = RTE_PT_IGB_UIO;
+   else if (!strcmp(driver, "uio_pci_generic"))
+   dev->pt_driver = RTE_PT_UIO_GENERIC;
+   else
+   dev->pt_driver = RTE_PT_UNKNOWN;
+   } else if (ret < 0) {
+   RTE_LOG(ERR, EAL, "Fail to get kernel driver\n");
+   free(dev);
+   return -1;
+   } else
+   dev->pt_driver = RTE_PT_UNKNOWN;
+
/* device is valid, add in list (sorted) */
if (TAILQ_EMPTY(&pci_device_list)) {
TAILQ_INSERT_TAIL(&pci_device_list, dev, next);
-- 
1.9.1
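
The driver detection in this patch can be tried outside of DPDK with a small sketch. It deliberately differs from the patch in two assumed ways: the helper takes the size of the destination buffer (the patch derives the copy length from the source string), and `readlink()` is asked for at most `sizeof(path) - 1` bytes so that the terminator always fits. `driver_basename()` and `classify_driver()` are illustrative names, not EAL functions.

```c
/* Ask for POSIX declarations (readlink, ssize_t) before any include. */
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <limits.h>
#include <string.h>
#include <unistd.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

enum rte_pt_driver {
	RTE_PT_UNKNOWN     = 0,
	RTE_PT_IGB_UIO     = 1,
	RTE_PT_VFIO        = 2,
	RTE_PT_UIO_GENERIC = 3,
};

/*
 * Resolve a sysfs "driver" symlink and copy its last path component to out.
 * Returns 0 on success, 1 when the link cannot be read (for example when no
 * driver is bound), and -1 on any other error.
 */
int
driver_basename(const char *link, char *out, size_t out_len)
{
	char path[PATH_MAX];
	const char *name;
	ssize_t count;

	if (link == NULL || out == NULL || out_len == 0)
		return -1;

	count = readlink(link, path, sizeof(path) - 1);
	if (count < 0)
		return 1;
	if ((size_t)count >= sizeof(path) - 1)
		return -1;	/* target may have been truncated */
	path[count] = '\0';	/* readlink() does not terminate the buffer */

	name = strrchr(path, '/');
	if (name == NULL)
		return -1;
	name++;
	if (strlen(name) >= out_len)
		return -1;	/* would not fit in the caller's buffer */
	memcpy(out, name, strlen(name) + 1);
	return 0;
}

/* Map a kernel driver name to the pass-through driver type. */
enum rte_pt_driver
classify_driver(const char *name)
{
	if (strcmp(name, "vfio-pci") == 0)
		return RTE_PT_VFIO;
	if (strcmp(name, "igb_uio") == 0)
		return RTE_PT_IGB_UIO;
	if (strcmp(name, "uio_pci_generic") == 0)
		return RTE_PT_UIO_GENERIC;
	return RTE_PT_UNKNOWN;
}
```

Keeping the string classification separate from the symlink lookup makes the mapping testable without a sysfs tree.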

[dpdk-dev] [PATCH v14 01/13] eal: Enable port Hotplug framework in Linux

2015-02-25 Thread Tetsuya Mukawa

The patch adds CONFIG_RTE_LIBRTE_EAL_HOTPLUG to the Linux and BSD
configurations. So far, the hotplug functions only support Linux.

v9:
- Move this patch at the top of this patch series.
  (Thanks to Thomas Monjalon)

Signed-off-by: Tetsuya Mukawa 
---
 config/common_bsdapp   | 6 ++
 config/common_linuxapp | 5 +
 2 files changed, 11 insertions(+)

diff --git a/config/common_bsdapp b/config/common_bsdapp
index 83a62a6..4108c01 100644
--- a/config/common_bsdapp
+++ b/config/common_bsdapp
@@ -116,6 +116,12 @@ CONFIG_RTE_LIBRTE_EAL_BSDAPP=y
 CONFIG_RTE_LIBRTE_EAL_LINUXAPP=n

 #
+# Compile Environment Abstraction Layer to support hotplug
+# So far, Hotplug functions only support linux
+#
+CONFIG_RTE_LIBRTE_EAL_HOTPLUG=n
+
+#
 # Compile Environment Abstraction Layer to support Vmware TSC map
 #
 CONFIG_RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT=y
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 2716381..8ba0258 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -114,6 +114,11 @@ CONFIG_RTE_PCI_MAX_READ_REQUEST_SIZE=0
 CONFIG_RTE_LIBRTE_EAL_LINUXAPP=y

 #
+# Compile Environment Abstraction Layer to support hotplug
+#
+CONFIG_RTE_LIBRTE_EAL_HOTPLUG=y
+
+#
 # Compile Environment Abstraction Layer to support Vmware TSC map
 #
 CONFIG_RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT=y
-- 
1.9.1

[dpdk-dev] [PATCH v14 00/13] Port Hotplug Framework

2015-02-25 Thread Tetsuya Mukawa

This patch series adds a dynamic port hotplug framework to DPDK.
With the patches, DPDK apps can attach or detach ports at runtime.

The basic concepts of the port hotplug are as follows.
- DPDK apps are responsible for managing ports.
  DPDK apps only know which ports are attached or detached at the moment.
  The port hotplug framework is implemented to allow DPDK apps to manage ports.
  For example, when a DPDK app calls the port attach function, the attached
  port number is returned. A DPDK app can also detach a port by port number.
- Kernel support is needed for attaching or detaching physical device ports.
  To attach a new physical device port, the device must first be recognized
  by a userspace direct I/O framework in the kernel. Then DPDK apps can
  call the port hotplug functions to attach ports.
  For detaching, the steps are reversed.
- Before ports are detached, they must be stopped and closed.
  A DPDK application must call rte_eth_dev_stop() and rte_eth_dev_close() before
  detaching ports. These functions call the finalization code of the PMDs.
  But so far, no PMD frees all resources allocated by initialization.
  This means PMDs need to be fixed to support the port hotplug.
  'RTE_PCI_DRV_DETACHABLE' is a new flag indicating that a PMD supports detaching.
  Without this flag, detaching will fail.
- Legacy DPDK apps must not be affected.
  No DPDK EAL behavior is changed if the port hotplug functions aren't called.
  So all legacy DPDK apps can still work without modifications.

And a few limitations.
- The port hotplug functions are not thread safe.
  DPDK apps should handle it.
- Only Linux and igb_uio are supported so far.
  BSD and VFIO are not supported. I will send VFIO patches at least, but I don't
  have a plan to submit a BSD patch so far.


Here are the port hotplug APIs.
---
/**
 * Attach a new device.
 *
 * @param devargs
 *   A pointer to a strings array describing the new device
 *   to be attached. The strings should be a pci address like
 *   ':01:00.0' or virtual device name like 'eth_pcap0'.
 * @param port_id
 *  A pointer to a port identifier actually attached.
 * @return
 *  0 on success and port_id is filled, negative on error
 */
int rte_eal_dev_attach(const char *devargs, uint8_t *port_id);

/**
 * Detach a device.
 *
 * @param port_id
 *   The port identifier of the device to detach.
 * @param devname
 *  A pointer to a device name actually detached.
 * @return
 *  0 on success and devname is filled, negative on error
 */
int rte_eal_dev_detach(uint8_t port_id, char *devname);
---

This patch series is for the DPDK EAL. For DPDK apps to use the port hotplug
functions, each PMD must be fixed to support the 'RTE_PCI_DRV_DETACHABLE' flag.
Please check the patch for the pcap PMD.

Also, please check testpmd patch. It will show you how to fix your legacy
applications to support port hotplug feature.
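
The attach/detach contract described above can be modelled with a toy port table. This is only a sketch under stated assumptions and does not call DPDK: every `toy_*` name is hypothetical, the table size is arbitrary, and the detach function takes an extra buffer-length argument that the proposed `rte_eal_dev_detach()` does not have. It shows three points from the cover letter: attach returns the port number, detach works by port number and reports the device name, and a port that has not been stopped cannot be detached.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_PORTS 4
#define TOY_NAME_LEN  32

enum toy_state { TOY_DETACHED = 0, TOY_ATTACHED };

struct toy_port {
	enum toy_state state;
	int started;			/* set by "start", cleared by "stop" */
	char name[TOY_NAME_LEN];
};

static struct toy_port toy_ports[TOY_MAX_PORTS];

/* Attach: claim the first free slot and report its index as the port id. */
int
toy_dev_attach(const char *devargs, uint8_t *port_id)
{
	uint8_t i;

	if (devargs == NULL || port_id == NULL)
		return -1;

	for (i = 0; i < TOY_MAX_PORTS; i++) {
		if (toy_ports[i].state == TOY_DETACHED) {
			toy_ports[i].state = TOY_ATTACHED;
			toy_ports[i].started = 0;
			snprintf(toy_ports[i].name, TOY_NAME_LEN, "%s", devargs);
			*port_id = i;
			return 0;
		}
	}
	return -1;	/* no free port */
}

/* Detach: refuse while the port is still started; report the device name. */
int
toy_dev_detach(uint8_t port_id, char *devname, size_t len)
{
	struct toy_port *p;

	if (port_id >= TOY_MAX_PORTS || devname == NULL || len == 0)
		return -1;

	p = &toy_ports[port_id];
	if (p->state != TOY_ATTACHED || p->started)
		return -1;

	snprintf(devname, len, "%s", p->name);
	p->state = TOY_DETACHED;	/* the slot can be reused by a later attach */
	return 0;
}
```

Because nothing in the model is locked, it shares the limitation listed above: callers would have to serialize attach and detach themselves.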

PATCH v14 changes
 - Remove needless if statement.
   (Thanks to Maxime Leroy)

PATCH v13 changes
 - Change log level when error occurs in rte_eal_vdev_init() and
   rte_eal_dev_init().
 - Return value of driver init and uninit functions.
 - Replace rte_panic by RTE_LOG in rte_eal_dev_init()
 - Fix return value of rte_eal_vdev_uninit().
 - Fix rte_eal_dev_attach_vdev to set port_id correctly.
   (Thanks to Maxime Leroy)

PATCH v12 changes
 - Add missing symbol in version map.
   (Thanks to Iremonger, Bernard)

PATCH v11 changes
 - Remove needless devargs handling codes.
 - Replace get_vdev_name() by rte_eal_parse_devargs_str().
 - Replace rte_eal_vdev_find_and_init by rte_eal_vdev_init()
 - Replace rte_eal_vdev_find_and_uninit by rte_eal_vdev_uninit()
 - Fix rte_eal_dev_init() to use rte_eal_vdev_init().
 - Remove needless patch.
   (Thanks to Maxime Leroy)

PATCH v10 changes
 - Add comments.
 - Change order of version.map.
 - Fix comment of "rte_ethdev.h".
   (Thanks to Thomas Monjalon)
 - Add size parameter to rte_eth_dev_create_unique_device_name().
   (Thanks to Iremonger, Bernard)

PATCH v9 changes
 - Fix commit title.
 - Fix commit log.
 - Fix comments.
 - Define CONFIG_RTE_LIBRTE_EAL_HOTPLUG at the top of this patch series.
 - DEV_INVALID/VALID are removed.
 - DEV_DISCONNECTED is replaced by DEV_DETACHED.
 - DEV_CONNECTED is replaced by DEV_ATTACHED.
 - rte_eth_dev_allocate_new_port() is renamed to
   rte_eth_dev_find_free_port().
 - rte_eth_dev_validate_port() is renamed to rte_eth_dev_is_valid_port().
 - rte_eth_dev_is_valid_port() is changed not to handle log toggle.
 - eal_compare_pci_addr() is replaced by rte_eal_compare_pci_addr().
 - rte_eth_dev_free() is replaced by rte_eth_dev_release_port().
 - Add a function to create a unique device name.
 - Change parameter of pci_devuninit_t and rte_eth_dev_uninit.
 - Remove code that initializes callback of ethdev from
   rte_eth_dev_uninit().
 - Remove pci_unmap_device(). It will be implemented in later patch.
 -

[dpdk-dev] [PATCH] kni:optimization of rte_kni_rx_burst

2015-02-25 Thread Olivier Deme

Hi Marc,

I think one of the observations is that currently the alloc_q grows very 
quickly to the maximum fifo size (1024).
The patch suggests fixing the alloc_q to a fixed size and maybe making that 
size configurable in rte_kni_alloc or rte_kni_init.

It should then be up to the application to provision the mempool 
accordingly.
Currently the out of memory problem shows up if the mempool doesn't have 
1024 buffers per KNI.
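
As a rough worked example of that provisioning, the lower bound on the mempool size can be written down directly. This is a back-of-the-envelope sketch, not a DPDK formula: `kni_pool_lower_bound()` and the `app_in_flight` term are assumptions, and only the 1024-entry fifo size comes from the discussion.

```c
#include <assert.h>
#include <stddef.h>

#define KNI_FIFO_LEN 1024u	/* fifo size quoted in the message above */

/*
 * Lower bound on the number of mbufs a pool needs: every KNI instance may
 * hold a full alloc_q, plus whatever the application itself keeps in flight.
 */
size_t
kni_pool_lower_bound(size_t n_kni, size_t alloc_q_len, size_t app_in_flight)
{
	return n_kni * alloc_q_len + app_in_flight;
}
```

With two KNI instances and 512 mbufs in flight in the application, a full 1024-entry alloc_q gives 2 * 1024 + 512 = 2560 mbufs, while an alloc_q limited to a 32-entry burst gives 2 * 32 + 512 = 576.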

Olivier.

On 25/02/15 12:38, Marc Sune wrote:
>
> On 25/02/15 13:24, Hemant at freescale.com wrote:
>> Hi Olivier
>>  Comments inline.
>> Regards,
>> Hemant
>>
>>> -Original Message-
>>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier Deme
>>> Sent: 25/Feb/2015 5:44 PM
>>> To: dev at dpdk.org
>>> Subject: Re: [dpdk-dev] [PATCH] kni:optimization of rte_kni_rx_burst
>>>
>>> Thank you Hemant, I think there might be one issue left with the 
>>> patch though.
>>> The alloc_q must initially be filled with mbufs before getting mbuf 
>>> back on the
>>> tx_q.
>>>
>>> So the patch should allow rte_kni_rx_burst to check if alloc_q is 
>>> empty.
>>> If so, it should invoke kni_allocate_mbufs(kni, 0) (to fill the 
>>> alloc_q with
>>> MAX_MBUF_BURST_NUM mbufs)
>>>
>>> The patch for rte_kni_rx_burst would then look like:
>>>
>>> @@ -575,7 +575,7 @@ rte_kni_rx_burst(struct rte_kni *kni, struct 
>>> rte_mbuf
>>> **mbufs, unsigned num)
>>>
>>>/* If buffers removed, allocate mbufs and then put them into 
>>> alloc_q */
>>>if (ret)
>>> -kni_allocate_mbufs(kni);
>>> +  kni_allocate_mbufs(kni, ret);
>>> +  else if (unlikely(kni->alloc_q->write == kni->alloc_q->read))
>>> +  kni_allocate_mbufs(kni, 0);
>>>
>> [hemant]  This will introduce a run-time check.
>>
>> I forgot to include the other change in the patch.
>> I am doing it in kni_alloc, i.e. initializing the alloc_q with the
>> default burst size:
>> kni_allocate_mbufs(ctx, 0);
>>
>> In a way, we are now suggesting reducing the size of alloc_q to only
>> the default burst size.
>
> As an aside, I think that we should allow tweaking the 
> userspace <-> kernel queue sizes (rx_q, tx_q, free_q and alloc_q). 
> Whether this should be a build configuration option or a parameter to 
> rte_kni_init() is not completely clear to me, but I guess 
> rte_kni_init() is the better option.
>
> Having said that, the original mail from Hemant was describing that 
> KNI was giving an out-of-memory. This to me indicates that the pool is 
> incorrectly dimensioned. Even if KNI will not pre-allocate in the 
> alloc_q, or not completely, in the event of high load, you will get 
> this same "out of memory".
>
> We can reduce the usage of buffers by the KNI subsystem in kernel 
> space and in userspace, but the kernel will always need a small cache 
> of pre-allocated buffers (coming from user-space), since the KNI 
> kernel module does not know where to grab the packets from (which 
> pool). So my guess is that the dimensioning problem experienced by 
> Hemant would be the same, even with the proposed changes.
>
>>
>> Can we reach this situation, where the kernel is adding packets to tx_q
>> faster than the application is able to dequeue?
>
> I think so. We cannot control much how the kernel will schedule the 
> KNI thread(s), especially if the number of threads in relation to the cores 
> is incorrect (not enough), hence we need at least a reasonable amount 
> of buffering to prevent early dropping due to those "internal" burst side 
> effects.
>
> Marc
>
>>   alloc_q can be empty in this case and the kernel will be starving.
>>
>>> Olivier.
>>>
>>> On 25/02/15 11:48, Hemant Agrawal wrote:
>>>> From: Hemant Agrawal
>>>>
>>>> if any buffer is read from the tx_q, MAX_BURST buffers will be
>>>> allocated and
>>> attempted to be added to to the alloc_q.
>>>> This seems terribly inefficient and it also looks like the alloc_q
>>>> will quickly fill
>>> to its maximum capacity. If the system buffers are low in number, it
>>> will reach
>>> "out of memory" situation.
>>>> This patch allocates the number of buffers as many dequeued from tx_q.
>>>>
>>>> Signed-off-by: Hemant Agrawal
>>>> ---
>>>>   lib/librte_kni/rte_kni.c | 13 -
>>>>   1 file changed, 8 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c index
>>>> 4e70fa0..4cf8e30 100644
>>>> --- a/lib/librte_kni/rte_kni.c
>>>> +++ b/lib/librte_kni/rte_kni.c
>>>> @@ -128,7 +128,7 @@ struct rte_kni_memzone_pool {
>>>>
>>>>
>>>>   static void kni_free_mbufs(struct rte_kni *kni); -static void
>>>> kni_allocate_mbufs(struct rte_kni *kni);
>>>> +static void kni_allocate_mbufs(struct rte_kni *kni, int num);
>>>>
>>>>   static volatile int kni_fd = -1;
>>>>   static struct rte_kni_memzone_pool kni_memzone_pool = { @@ -575,7
>>>> +575,7 @@ rte_kni_rx_burst(struct rte_kni *kni, struct rte_mbuf
>>>> **mbufs, unsigned num)
>>>>
>>>>   /* If buffers removed, allocate mbufs and then put them into
>>>> alloc_

[dpdk-dev] Vhost-user early adopter feedback

2015-02-25 Thread Benoît Canet

On Wednesday 25 Feb 2015 at 07:46:56 (+), Xie, Huawei wrote:
> On 2/18/2015 3:59 PM, Benoît Canet wrote:
> > Hello Xie,
> >
> > As promised, I integrated your vhost-user patchset from January in my 
> > vswitch.
> >
> > I just tried it, it works pretty well.
> >
> > I just had a minor bug with rte_vhost_driver_register taking ownership of 
> > the
> > path string pointer too late. I freed it out of habit just after 
> > registering in the
> > caller and when ifname[IFNAMESIZ] was written the pointer was used for a 
> > new string I
> > allocated later. Maybe an early strdup() would fix this.
> Thanks.
> Do you mean we duplicate a string from the first parameter path, like
> vserver->path = strdup(path) ?

Yes I was thinking about this.
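
The early-copy idea can be sketched with a stand-in for the registration call. This is an illustration only: `struct toy_vserver`, `toy_driver_register()` and `toy_driver_unregister()` are hypothetical and do not reproduce the vhost library's data structures. The point is that once the callee holds its own copy, the caller is free to release or reuse its string straight after registering.

```c
/* Ask for the POSIX declaration of strdup() before any include. */
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the vhost server state; not the DPDK struct. */
struct toy_vserver {
	char *path;
};

/* Copy the caller's string up front so the caller may free or reuse it. */
int
toy_driver_register(struct toy_vserver *vs, const char *path)
{
	if (vs == NULL || path == NULL)
		return -1;

	vs->path = strdup(path);
	return vs->path == NULL ? -1 : 0;
}

/* Release the private copy made at registration time. */
void
toy_driver_unregister(struct toy_vserver *vs)
{
	free(vs->path);
	vs->path = NULL;
}
```

The cost is one small allocation per registration, in exchange for an ownership rule that does not depend on what the caller does afterwards.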

Best regards

Benoît

> If this was the case, it was ever in my mind. We would do this if
> necessary.

> >
> > The last patch of your new version is really a great idea since it will
> > simplify a lot the socket creation and management code.
> >
> > Best regards
> >
> > Benoît
> >
> >
>

[dpdk-dev] [PATCH v4 5/7] pmd ixgbe: enable DCB in SRIOV

2015-02-25 Thread Pawel Wodkowski

On 2015-02-25 04:36, Ouyang, Changchun wrote:
>> @@ -652,7 +655,9 @@ ixgbe_get_vf_queues(struct rte_eth_dev *dev,
>> >uint32_t vf, uint32_t *msgbuf)  {
>> >struct ixgbe_vf_info *vfinfo =
>> >*IXGBE_DEV_PRIVATE_TO_P_VFDATA(dev->data-
>>> > >dev_private);
>> >-   uint32_t default_q = vf *
>> >RTE_ETH_DEV_SRIOV(dev).nb_tx_q_per_pool;
>> >+   struct ixgbe_dcb_config *dcbinfo =
>> >+   IXGBE_DEV_PRIVATE_TO_DCB_CFG(dev->data-
>>> > >dev_private);
>> >+   uint32_t default_q = RTE_ETH_DEV_SRIOV(dev).def_pool_q_idx;
> Why do we need to change the default_q here?
>

Because this field holds the default queue index.

-- 
Pawel

[dpdk-dev] [PATCH] kni:optimization of rte_kni_rx_burst

2015-02-25 Thread Olivier Deme

I guess it would be unusual but possible for the kernel to enqueue 
faster to tx_q than the application dequeues.
But that would also be possible with a real NIC, so I think it is 
acceptable for the kernel to have to drop egress packets in that case.


On 25/02/15 12:24, Hemant at freescale.com wrote:
> Hi Olivier
>Comments inline.
> Regards,
> Hemant
>
>> -Original Message-
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier Deme
>> Sent: 25/Feb/2015 5:44 PM
>> To: dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH] kni:optimization of rte_kni_rx_burst
>>
>> Thank you Hemant, I think there might be one issue left with the patch 
>> though.
>> The alloc_q must initially be filled with mbufs before getting mbuf back on 
>> the
>> tx_q.
>>
>> So the patch should allow rte_kni_rx_burst to check if alloc_q is empty.
>> If so, it should invoke kni_allocate_mbufs(kni, 0) (to fill the alloc_q with
>> MAX_MBUF_BURST_NUM mbufs)
>>
>> The patch for rte_kni_rx_burst would then look like:
>>
>> @@ -575,7 +575,7 @@ rte_kni_rx_burst(struct rte_kni *kni, struct rte_mbuf
>> **mbufs, unsigned num)
>>
>>/* If buffers removed, allocate mbufs and then put them into alloc_q 
>> */
>>if (ret)
>> -kni_allocate_mbufs(kni);
>> +  kni_allocate_mbufs(kni, ret);
>> +  else if (unlikely(kni->alloc_q->write == kni->alloc_q->read))
>> +  kni_allocate_mbufs(kni, 0);
>>
> [hemant]  This will introduce a run-time check.
>
> I forgot to include the other change in the patch.
> I am doing it in kni_alloc, i.e. initializing the alloc_q with the default
> burst size:
>   kni_allocate_mbufs(ctx, 0);
>
> In a way, we are now suggesting reducing the size of alloc_q to only the
> default burst size.
>
> Can we reach this situation, where the kernel is adding packets to tx_q
> faster than the application is able to dequeue?
>   alloc_q can be empty in this case and the kernel will be starving.
>
>> Olivier.
>>
>> On 25/02/15 11:48, Hemant Agrawal wrote:
>>> From: Hemant Agrawal 
>>>
>>> if any buffer is read from the tx_q, MAX_BURST buffers will be allocated and
>> attempted to be added to to the alloc_q.
>>> This seems terribly inefficient and it also looks like the alloc_q will 
>>> quickly fill
>> to its maximum capacity. If the system buffers are low in number, it will 
>> reach
>> "out of memory" situation.
>>> This patch allocates the number of buffers as many dequeued from tx_q.
>>>
>>> Signed-off-by: Hemant Agrawal 
>>> ---
>>>lib/librte_kni/rte_kni.c | 13 -
>>>1 file changed, 8 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c index
>>> 4e70fa0..4cf8e30 100644
>>> --- a/lib/librte_kni/rte_kni.c
>>> +++ b/lib/librte_kni/rte_kni.c
>>> @@ -128,7 +128,7 @@ struct rte_kni_memzone_pool {
>>>
>>>
>>>static void kni_free_mbufs(struct rte_kni *kni); -static void
>>> kni_allocate_mbufs(struct rte_kni *kni);
>>> +static void kni_allocate_mbufs(struct rte_kni *kni, int num);
>>>
>>>static volatile int kni_fd = -1;
>>>static struct rte_kni_memzone_pool kni_memzone_pool = { @@ -575,7
>>> +575,7 @@ rte_kni_rx_burst(struct rte_kni *kni, struct rte_mbuf
>>> **mbufs, unsigned num)
>>>
>>> /* If buffers removed, allocate mbufs and then put them into alloc_q
>> */
>>> if (ret)
>>> -   kni_allocate_mbufs(kni);
>>> +   kni_allocate_mbufs(kni, ret);
>>>
>>> return ret;
>>>}
>>> @@ -594,7 +594,7 @@ kni_free_mbufs(struct rte_kni *kni)
>>>}
>>>
>>>static void
>>> -kni_allocate_mbufs(struct rte_kni *kni)
>>> +kni_allocate_mbufs(struct rte_kni *kni, int num)
>>>{
>>> int i, ret;
>>> struct rte_mbuf *pkts[MAX_MBUF_BURST_NUM]; @@ -620,7 +620,10
>> @@
>>> kni_allocate_mbufs(struct rte_kni *kni)
>>> return;
>>> }
>>>
>>> -   for (i = 0; i < MAX_MBUF_BURST_NUM; i++) {
>>> +   if (num == 0 || num > MAX_MBUF_BURST_NUM)
>>> +   num = MAX_MBUF_BURST_NUM;
>>> +
>>> +   for (i = 0; i < num; i++) {
>>> pkts[i] = rte_pktmbuf_alloc(kni->pktmbuf_pool);
>>> if (unlikely(pkts[i] == NULL)) {
>>> /* Out of memory */
>>> @@ -636,7 +639,7 @@ kni_allocate_mbufs(struct rte_kni *kni)
>>> ret = kni_fifo_put(kni->alloc_q, (void **)pkts, i);
>>>
>>> /* Check if any mbufs not put into alloc_q, and then free them */
>>> -   if (ret >= 0 && ret < i && ret < MAX_MBUF_BURST_NUM) {
>>> +   if (ret >= 0 && ret < i && ret < num) {
>>> int j;
>>>
>>> for (j = ret; j < i; j++)
>> --
>>  *Olivier Deme*
>> *Druid Software Ltd.*
>> *Tel: +353 1 202 1831*
>> *Email: odeme at druidsoftware.com *
>> *URL: http://www.druidsoftware.com*
>>  *Hall 7, stand 7F70.*
>> Druid Software: Monetising enterprise small cells solutions.

-- 
*Olivier Deme*
*Druid Software Ltd.*
*Tel: +353 1 202 1831*
*Email: odeme at druidsoftware.com

[dpdk-dev] [PATCH 2/2] doc: update programmers guide for uio_pci_generic

2015-02-25 Thread Bruce Richardson

On Wed, Feb 25, 2015 at 12:19:10PM +, Iremonger, Bernard wrote:
> 
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce Richardson
> > Sent: Tuesday, February 24, 2015 4:28 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH 2/2] doc: update programmers guide for 
> > uio_pci_generic
> > 
> > Since DPDK now has support for the in-tree uio_pci_generic driver, update 
> > the programmers guide
> > document to reference this module, and to use it in preference to the 
> > igb_uio driver, which is DPDK-
> > specific.
> > 
> > Signed-off-by: Bruce Richardson 
> > ---
> >  doc/guides/prog_guide/env_abstraction_layer.rst  | 8 
> > 
> >  doc/guides/prog_guide/intel_dpdk_xen_based_packet_switch_sol.rst | 6 +++---
> >  doc/guides/prog_guide/kernel_nic_interface.rst   | 2 +-
> >  doc/guides/prog_guide/poll_mode_drv_emulated_virtio_nic.rst  | 8 
> > 
> >  doc/guides/prog_guide/poll_mode_drv_paravirtual_vmxnets_nic.rst  | 2 +-
> >  5 files changed, 13 insertions(+), 13 deletions(-)
> > 
> > diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst
> > b/doc/guides/prog_guide/env_abstraction_layer.rst
> > index 231e266..b5321c3 100644
> > --- a/doc/guides/prog_guide/env_abstraction_layer.rst
> > +++ b/doc/guides/prog_guide/env_abstraction_layer.rst
> > @@ -66,7 +66,7 @@ EAL in a Linux-userland Execution Environment
> >  -
> > 
> >  In a Linux user space environment, the DPDK application runs as a 
> > user-space application using the
> > pthread library.
> > -PCI information about devices and address space is discovered through the 
> > /sys kernel interface and
> > through a module called igb_uio.
> > +PCI information about devices and address space is discovered through the 
> > /sys kernel interface and
> > through kernel modules such as uio_pci_generic, or igb_uio.
> >  Refer to the UIO: User-space drivers documentation in the Linux kernel. 
> > This memory is mmap'd in
> > the application.
> > 
> >  The EAL performs physical memory allocation using mmap() in hugetlbfs 
> > (using huge page sizes to
> > increase performance).
> > @@ -134,10 +134,10 @@ PCI Access
> >  ~~
> > 
> >  The EAL uses the /sys/bus/pci utilities provided by the kernel to scan the 
> > content on the PCI bus.
> > -
> > -To access PCI memory, a kernel module called igb_uio provides a /dev/uioX 
> > device file
> > +To access PCI memory, a kernel module called uio_pci_generic provides a
> > +/dev/uioX device file and resource files in /sys
> >  that can be mmap'd to obtain access to PCI address space from the 
> > application.
> > -It uses the uio kernel feature (userland driver).
> > +The DPDK-specific igb_uio module can also be used for this. Both drivers 
> > use the uio kernel feature
> > (userland driver).
> > 
> >  Per-lcore and Shared Variables
> >  ~~
> > diff --git 
> > a/doc/guides/prog_guide/intel_dpdk_xen_based_packet_switch_sol.rst
> > b/doc/guides/prog_guide/intel_dpdk_xen_based_packet_switch_sol.rst
> > index 1f1e04f..a0dd959 100644
> > --- a/doc/guides/prog_guide/intel_dpdk_xen_based_packet_switch_sol.rst
> > +++ b/doc/guides/prog_guide/intel_dpdk_xen_based_packet_switch_sol.rst
> > @@ -306,12 +306,12 @@ Building and Running the Switching Backend
> >  Refer to the *DPDK Getting Started Guide* for more information on 
> > memory management in the
> > DPDK.
> >  In the above command, 4 GB memory is reserved (2048 of 2 MB pages) 
> > for DPDK.
> > 
> > -#.  Load igb_uio and bind one Intel NIC controller to igb_uio:
> > +#.  Load uio_pci_generic and bind one Intel NIC controller to it:
> > 
> >  .. code-block:: console
> > 
> > -insmod x86_64-native-linuxapp-gcc/kmod/igb_uio.ko
> > -python tools/dpdk_nic_bind.py -b igb_uio :09:00:00.0
> 
> 
> Hi Bruce,
> 
> Should the information about igb_uio be retained alongside the new 
> information about uio_pci_generic?
>
While the answer may not be as clear-cut as with the GSG, why would we bother
covering both here? We already ignore VFIO in these examples.

/Bruce

[dpdk-dev] [PATCH 1/2] doc: Update GSG for uio_pci_generic use

2015-02-25 Thread Bruce Richardson

On Wed, Feb 25, 2015 at 12:14:15PM +, Iremonger, Bernard wrote:
> 
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce Richardson
> > Sent: Tuesday, February 24, 2015 4:28 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH 1/2] doc: Update GSG for uio_pci_generic use
> > 
> > Since DPDK now has support for the in-tree uio_pci_generic driver, update 
> > the GSG document to
> > reference this module, and to use it in preference to the igb_uio driver, 
> > which is DPDK-specific.
> > 
> > Signed-off-by: Bruce Richardson 
> > ---
> >  doc/guides/linux_gsg/build_dpdk.rst| 63 
> > +-
> >  doc/guides/linux_gsg/build_sample_apps.rst |  5 ++-
> >  doc/guides/linux_gsg/enable_func.rst   |  2 +
> >  3 files changed, 40 insertions(+), 30 deletions(-)
> > 
> > diff --git a/doc/guides/linux_gsg/build_dpdk.rst 
> > b/doc/guides/linux_gsg/build_dpdk.rst
> > index d09c69d..255d6dc 100644
> > --- a/doc/guides/linux_gsg/build_dpdk.rst
> > +++ b/doc/guides/linux_gsg/build_dpdk.rst
> > @@ -133,7 +133,8 @@ use the make config T= command:
> > 
> >  .. warning::
> > 
> > -The igb_uio module must be compiled with the same kernel as the one 
> > running on the target.
> > +Any kernel modules to be used, e.g. igb_uio, kni, must be compiled 
> > with the
> > +same kernel as the one running on the target.
> >  If the DPDK is not being built on the target machine,
> >  the RTE_KERNELDIR environment variable should be used to point the 
> > compilation at a copy of the
> > kernel version to be used on the target machine.
> > 
> > @@ -154,28 +155,29 @@ Browsing the Installed DPDK Environment Target
> > 
> >  Once a target is created it contains all libraries and header files for 
> > the DPDK environment that are
> > required to build customer applications.
> >  In addition, the test and testpmd applications are built under the 
> > build/app directory, which may be
> > used for testing.
> > -In the case of Linux, a kmod  directory is also present that contains a 
> > module to install:
> > +A kmod  directory is also present that contains kernel modules which may 
> > be loaded if needed:
> > 
> >  .. code-block:: console
> > 
> >  $ ls x86_64-native-linuxapp-gcc
> >  app build hostapp include kmod lib Makefile
> > 
> > -Loading the DPDK igb_uio Module
> > 
> > +Loading Modules to Enable Userspace IO for DPDK
> > +---
> > 
> > -To run any DPDK application, the igb_uio module can be loaded into the 
> > running kernel.
> > -The module is found in the kmod sub-directory of the DPDK target directory.
> > -This module should be loaded using the insmod command as shown below 
> > (assuming that the
> > current directory is the DPDK target directory).
> > -In many cases, the uio support in the Linux* kernel is compiled as a 
> > module rather than as part of the
> > kernel, -so it is often necessary to load the uio module first:
> 
> 
> Hi Bruce,
> 
> Should the information about igb_uio be retained alongside the new 
> information about uio_pci_generic?
>  

This is obviously a matter of opinion, but: "no".
This doc is a Getting Started Guide, and therefore meant to cover just the 
minimum
needed to get up and running and ignoring advanced details. 
"uio_pci_generic" is the simplest path to getting up and running quickly, and
maintaining mention of igb_uio just adds to the complexity of the documentation.

Since uio_pci_generic also works on most Linux distros, I'd also be tempted to
move the vfio details out of the main GSG body - perhaps to the extra chapter
covering KNI and running as non-root, again with the objective of simplifying
things for the beginner. VFIO and igb_uio are provided for those who want 
something
extra above what uio_pci_generic provides, e.g. security, or ability to create
VF devices on all kernels while having the PF in use by DPDK.

Regards,
/Bruce

[dpdk-dev] [PATCH] kni:optimization of rte_kni_rx_burst

2015-02-25 Thread hem...@freescale.com

Hi Olivier
 Comments inline.
Regards,
Hemant

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier Deme
> Sent: 25/Feb/2015 5:44 PM
> To: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] kni:optimization of rte_kni_rx_burst
> 
> Thank you Hemant, I think there might be one issue left with the patch though.
> The alloc_q must initially be filled with mbufs before getting mbuf back on 
> the
> tx_q.
> 
> So the patch should allow rte_kni_rx_burst to check if alloc_q is empty.
> If so, it should invoke kni_allocate_mbufs(kni, 0) (to fill the alloc_q with
> MAX_MBUF_BURST_NUM mbufs)
> 
> The patch for rte_kni_rx_burst would then look like:
> 
> @@ -575,7 +575,7 @@ rte_kni_rx_burst(struct rte_kni *kni, struct rte_mbuf
> **mbufs, unsigned num)
> 
>   /* If buffers removed, allocate mbufs and then put them into alloc_q */
>   if (ret)
> -kni_allocate_mbufs(kni);
> +  kni_allocate_mbufs(kni, ret);
> +  else if (unlikely(kni->alloc_q->write == kni->alloc_q->read))
> +  kni_allocate_mbufs(kni, 0);
> 
[hemant]  This will introduce a run-time check.

I forgot to include the other change in the patch.
I am doing it in kni_alloc, i.e. initializing the alloc_q with the default burst size:
kni_allocate_mbufs(ctx, 0);

In a way, we are now suggesting reducing the size of alloc_q to only the default
burst size.

Can we reach this situation, where the kernel is adding packets to tx_q faster
than the application is able to dequeue?
alloc_q can be empty in this case and the kernel will be starving.
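
The refill policy under discussion (prime the queue when it is empty, otherwise give back only as many buffers as were dequeued) can be exercised on a toy ring. This is a sketch under stated assumptions, not the KNI code: `struct toy_fifo` and all function names are hypothetical, the ring stores placeholder pointers rather than mbufs, and `BURST_MAX` stands in for MAX_MBUF_BURST_NUM with an assumed value of 32.

```c
#include <assert.h>
#include <stddef.h>

#define BURST_MAX 32u	/* stands in for MAX_MBUF_BURST_NUM; value assumed */
#define RING_LEN  1024u	/* power of two, like the 1024-entry fifo discussed */

struct toy_fifo {
	unsigned write;		/* next slot to fill  */
	unsigned read;		/* next slot to drain */
	void *slot[RING_LEN];
};

/* Empty when both indexes meet, the same test as proposed in the thread. */
int
toy_fifo_empty(const struct toy_fifo *f)
{
	return f->write == f->read;
}

/* Number of entries currently queued. */
unsigned
toy_fifo_count(const struct toy_fifo *f)
{
	return (f->write - f->read) & (RING_LEN - 1);
}

/* Clamp a refill request: 0 or more than one burst means one full burst. */
unsigned
refill_count(unsigned num)
{
	if (num == 0 || num > BURST_MAX)
		return BURST_MAX;
	return num;
}

/* Store up to n placeholders; one slot stays free to tell full from empty. */
unsigned
toy_fifo_put(struct toy_fifo *f, unsigned n)
{
	unsigned i;

	for (i = 0; i < n; i++) {
		unsigned next = (f->write + 1) & (RING_LEN - 1);

		if (next == f->read)
			break;			/* ring is full */
		f->slot[f->write] = f;		/* placeholder, not a real mbuf */
		f->write = next;
	}
	return i;
}

/* Replace what was dequeued, or prime the queue if it is empty. */
unsigned
refill_after_rx(struct toy_fifo *alloc_q, unsigned dequeued)
{
	if (dequeued)
		return toy_fifo_put(alloc_q, refill_count(dequeued));
	if (toy_fifo_empty(alloc_q))
		return toy_fifo_put(alloc_q, refill_count(0));
	return 0;
}
```

Priming once at allocation time, as suggested above, would remove the need for the empty check on every receive call; the sketch keeps the check only to show both variants side by side.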

> 
> Olivier.
> 
> On 25/02/15 11:48, Hemant Agrawal wrote:
> > From: Hemant Agrawal 
> >
> > if any buffer is read from the tx_q, MAX_BURST buffers will be allocated and
> attempted to be added to to the alloc_q.
> > This seems terribly inefficient and it also looks like the alloc_q will 
> > quickly fill
> to its maximum capacity. If the system buffers are low in number, it will 
> reach
> "out of memory" situation.
> >
> > This patch allocates the number of buffers as many dequeued from tx_q.
> >
> > Signed-off-by: Hemant Agrawal 
> > ---
> >   lib/librte_kni/rte_kni.c | 13 -
> >   1 file changed, 8 insertions(+), 5 deletions(-)
> >
> > diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c index
> > 4e70fa0..4cf8e30 100644
> > --- a/lib/librte_kni/rte_kni.c
> > +++ b/lib/librte_kni/rte_kni.c
> > @@ -128,7 +128,7 @@ struct rte_kni_memzone_pool {
> >
> >
> >   static void kni_free_mbufs(struct rte_kni *kni); -static void
> > kni_allocate_mbufs(struct rte_kni *kni);
> > +static void kni_allocate_mbufs(struct rte_kni *kni, int num);
> >
> >   static volatile int kni_fd = -1;
> >   static struct rte_kni_memzone_pool kni_memzone_pool = { @@ -575,7
> > +575,7 @@ rte_kni_rx_burst(struct rte_kni *kni, struct rte_mbuf
> > **mbufs, unsigned num)
> >
> > /* If buffers removed, allocate mbufs and then put them into alloc_q
> */
> > if (ret)
> > -   kni_allocate_mbufs(kni);
> > +   kni_allocate_mbufs(kni, ret);
> >
> > return ret;
> >   }
> > @@ -594,7 +594,7 @@ kni_free_mbufs(struct rte_kni *kni)
> >   }
> >
> >   static void
> > -kni_allocate_mbufs(struct rte_kni *kni)
> > +kni_allocate_mbufs(struct rte_kni *kni, int num)
> >   {
> > int i, ret;
> > struct rte_mbuf *pkts[MAX_MBUF_BURST_NUM]; @@ -620,7 +620,10
> @@
> > kni_allocate_mbufs(struct rte_kni *kni)
> > return;
> > }
> >
> > -   for (i = 0; i < MAX_MBUF_BURST_NUM; i++) {
> > +   if (num == 0 || num > MAX_MBUF_BURST_NUM)
> > +   num = MAX_MBUF_BURST_NUM;
> > +
> > +   for (i = 0; i < num; i++) {
> > pkts[i] = rte_pktmbuf_alloc(kni->pktmbuf_pool);
> > if (unlikely(pkts[i] == NULL)) {
> > /* Out of memory */
> > @@ -636,7 +639,7 @@ kni_allocate_mbufs(struct rte_kni *kni)
> > ret = kni_fifo_put(kni->alloc_q, (void **)pkts, i);
> >
> > /* Check if any mbufs not put into alloc_q, and then free them */
> > -   if (ret >= 0 && ret < i && ret < MAX_MBUF_BURST_NUM) {
> > +   if (ret >= 0 && ret < i && ret < num) {
> > int j;
> >
> > for (j = ret; j < i; j++)
> 
> --
>   *Olivier Deme*
> *Druid Software Ltd.*
> *Tel: +353 1 202 1831*
> *Email: odeme at druidsoftware.com *
> *URL: http://www.druidsoftware.com*
>   *Hall 7, stand 7F70.*
> Druid Software: Monetising enterprise small cells solutions.

[dpdk-dev] [PATCH v14 12/13] eal/pci: Add rte_eal_dev_attach/detach() functions

2015-02-25 Thread Thomas Monjalon

2015-02-25 13:04, Tetsuya Mukawa:
> --- a/lib/librte_eal/common/eal_common_dev.c
> +++ b/lib/librte_eal/common/eal_common_dev.c
> @@ -32,10 +32,13 @@
>   *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>   */
>  
> +#include 
> +#include 
>  #include 
>  #include 
>  #include 
>  
> +#include 
>  #include 
>  #include 

No, you must not include ethdev in EAL.
The ethdev layer is by design on top of EAL.
Maxime already asked why you did it. He was implicitly asking to remove it.
You said that you are calling ethdev_is_detachable() but you should
call a function eal_is_detachable() or something like that.
The detachable state must be only device-related, i.e. in EAL.
The ethdev API is only a wrapper (with port id) in such case.

> --- a/lib/librte_eal/linuxapp/eal/Makefile
> +++ b/lib/librte_eal/linuxapp/eal/Makefile
> @@ -45,6 +45,7 @@ CFLAGS += -I$(RTE_SDK)/lib/librte_eal/common/include
>  CFLAGS += -I$(RTE_SDK)/lib/librte_ring
>  CFLAGS += -I$(RTE_SDK)/lib/librte_mempool
>  CFLAGS += -I$(RTE_SDK)/lib/librte_malloc
> +CFLAGS += -I$(RTE_SDK)/lib/librte_mbuf

By removing ethdev dependency, you can remove this ugly mbuf dependency.

Thanks Tetsuya

[dpdk-dev] dpdk - poll mode - context switches

2015-02-25 Thread Jog Lie

Hi Bruce,

Ok. understood. 

Thanks !


--
Jog

[dpdk-dev] [PATCH 2/2] doc: update programmers guide for uio_pci_generic

2015-02-25 Thread Iremonger, Bernard



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce Richardson
> Sent: Tuesday, February 24, 2015 4:28 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH 2/2] doc: update programmers guide for 
> uio_pci_generic
> 
> Since DPDK now has support for the in-tree uio_pci_generic driver, update the 
> programmers guide
> document to reference this module, and to use it in preference to the igb_uio 
> driver, which is DPDK-
> specific.
> 
> Signed-off-by: Bruce Richardson 
> ---
>  doc/guides/prog_guide/env_abstraction_layer.rst  | 8 
>  doc/guides/prog_guide/intel_dpdk_xen_based_packet_switch_sol.rst | 6 +++---
>  doc/guides/prog_guide/kernel_nic_interface.rst   | 2 +-
>  doc/guides/prog_guide/poll_mode_drv_emulated_virtio_nic.rst  | 8 
>  doc/guides/prog_guide/poll_mode_drv_paravirtual_vmxnets_nic.rst  | 2 +-
>  5 files changed, 13 insertions(+), 13 deletions(-)
> 
> diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst
> b/doc/guides/prog_guide/env_abstraction_layer.rst
> index 231e266..b5321c3 100644
> --- a/doc/guides/prog_guide/env_abstraction_layer.rst
> +++ b/doc/guides/prog_guide/env_abstraction_layer.rst
> @@ -66,7 +66,7 @@ EAL in a Linux-userland Execution Environment
>  -
> 
>  In a Linux user space environment, the DPDK application runs as a user-space 
> application using the
> pthread library.
> -PCI information about devices and address space is discovered through the 
> /sys kernel interface and
> through a module called igb_uio.
> +PCI information about devices and address space is discovered through the 
> /sys kernel interface and
> through kernel modules such as uio_pci_generic, or igb_uio.
>  Refer to the UIO: User-space drivers documentation in the Linux kernel. This 
> memory is mmap'd in
> the application.
> 
>  The EAL performs physical memory allocation using mmap() in hugetlbfs (using 
> huge page sizes to
> increase performance).
> @@ -134,10 +134,10 @@ PCI Access
>  ~~
> 
>  The EAL uses the /sys/bus/pci utilities provided by the kernel to scan the 
> content on the PCI bus.
> -
> -To access PCI memory, a kernel module called igb_uio provides a /dev/uioX 
> device file
> +To access PCI memory, a kernel module called uio_pci_generic provides a
> +/dev/uioX device file and resource files in /sys
>  that can be mmap'd to obtain access to PCI address space from the 
> application.
> -It uses the uio kernel feature (userland driver).
> +The DPDK-specific igb_uio module can also be used for this. Both drivers use 
> the uio kernel feature
> (userland driver).
> 
>  Per-lcore and Shared Variables
>  ~~
> diff --git a/doc/guides/prog_guide/intel_dpdk_xen_based_packet_switch_sol.rst
> b/doc/guides/prog_guide/intel_dpdk_xen_based_packet_switch_sol.rst
> index 1f1e04f..a0dd959 100644
> --- a/doc/guides/prog_guide/intel_dpdk_xen_based_packet_switch_sol.rst
> +++ b/doc/guides/prog_guide/intel_dpdk_xen_based_packet_switch_sol.rst
> @@ -306,12 +306,12 @@ Building and Running the Switching Backend
>  Refer to the *DPDK Getting Started Guide* for more information on 
> memory management in the
> DPDK.
>  In the above command, 4 GB memory is reserved (2048 of 2 MB pages) 
> for DPDK.
> 
> -#.  Load igb_uio and bind one Intel NIC controller to igb_uio:
> +#.  Load uio_pci_generic and bind one Intel NIC controller to it:
> 
>  .. code-block:: console
> 
> -insmod x86_64-native-linuxapp-gcc/kmod/igb_uio.ko
> -python tools/dpdk_nic_bind.py -b igb_uio :09:00:00.0


Hi Bruce,

Should the information about igb_uio be retained alongside the new information 
about uio_pci_generic?

> +modprobe uio_pci_generic
> +python tools/dpdk_nic_bind.py -b uio_pci_generic
> + :09:00:00.0
> 
>  In this case, :09:00.0 is the PCI address for the NIC controller.
> 
> diff --git a/doc/guides/prog_guide/kernel_nic_interface.rst
> b/doc/guides/prog_guide/kernel_nic_interface.rst
> index 0276019..9ed7688 100644
> --- a/doc/guides/prog_guide/kernel_nic_interface.rst
> +++ b/doc/guides/prog_guide/kernel_nic_interface.rst
> @@ -224,7 +224,7 @@ Otherwise, by default, KNI will not enable its backend 
> support capability.
> 
>  Of course, as a prerequisite, the vhost/vhost-net kernel CONFIG should be 
> chosen before compiling
> the kernel.
> 
> -#.  Compile the DPDK and insert igb_uio as normal.
> +#.  Compile the DPDK and insert uio_pci_generic/igb_uio kernel modules as 
> normal.
> 
>  #.  Insert the KNI kernel module:
> 
> diff --git a/doc/guides/prog_guide/poll_mode_drv_emulated_virtio_nic.rst
> b/doc/guides/prog_guide/poll_mode_drv_emulated_virtio_nic.rst
> index 86f4f60..b0a6250 100644
> --- a/doc/guides/prog_guide/poll_mode_drv_emulated_virtio_nic.rst
> +++ b/doc/guides/prog_guide/poll_mode_drv_emulated_virtio_nic.rst
> @@ -113,7 +113,7 @@ Host2VM communication example
> 
>

[dpdk-dev] [PATCH 1/2] doc: Update GSG for uio_pci_generic use

2015-02-25 Thread Iremonger, Bernard



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce Richardson
> Sent: Tuesday, February 24, 2015 4:28 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH 1/2] doc: Update GSG for uio_pci_generic use
> 
> Since DPDK now has support for the in-tree uio_pci_generic driver, update the 
> GSG document to
> reference this module, and to use it in preference to the igb_uio driver, 
> which is DPDK-specific.
> 
> Signed-off-by: Bruce Richardson 
> ---
>  doc/guides/linux_gsg/build_dpdk.rst| 63 
> +-
>  doc/guides/linux_gsg/build_sample_apps.rst |  5 ++-
>  doc/guides/linux_gsg/enable_func.rst   |  2 +
>  3 files changed, 40 insertions(+), 30 deletions(-)
> 
> diff --git a/doc/guides/linux_gsg/build_dpdk.rst 
> b/doc/guides/linux_gsg/build_dpdk.rst
> index d09c69d..255d6dc 100644
> --- a/doc/guides/linux_gsg/build_dpdk.rst
> +++ b/doc/guides/linux_gsg/build_dpdk.rst
> @@ -133,7 +133,8 @@ use the make config T= command:
> 
>  .. warning::
> 
> -The igb_uio module must be compiled with the same kernel as the one 
> running on the target.
> +Any kernel modules to be used, e.g. igb_uio, kni, must be compiled with 
> the
> +same kernel as the one running on the target.
>  If the DPDK is not being built on the target machine,
>  the RTE_KERNELDIR environment variable should be used to point the 
> compilation at a copy of the
> kernel version to be used on the target machine.
> 
> @@ -154,28 +155,29 @@ Browsing the Installed DPDK Environment Target
> 
>  Once a target is created it contains all libraries and header files for the 
> DPDK environment that are
> required to build customer applications.
>  In addition, the test and testpmd applications are built under the build/app 
> directory, which may be
> used for testing.
> -In the case of Linux, a kmod  directory is also present that contains a 
> module to install:
> +A kmod  directory is also present that contains kernel modules which may be 
> loaded if needed:
> 
>  .. code-block:: console
> 
>  $ ls x86_64-native-linuxapp-gcc
>  app build hostapp include kmod lib Makefile
> 
> -Loading the DPDK igb_uio Module
> 
> +Loading Modules to Enable Userspace IO for DPDK
> +---
> 
> -To run any DPDK application, the igb_uio module can be loaded into the 
> running kernel.
> -The module is found in the kmod sub-directory of the DPDK target directory.
> -This module should be loaded using the insmod command as shown below 
> (assuming that the
> current directory is the DPDK target directory).
> -In many cases, the uio support in the Linux* kernel is compiled as a module 
> rather than as part of the
> kernel, -so it is often necessary to load the uio module first:


Hi Bruce,

Should the information about igb_uio be retained alongside the new information 
about uio_pci_generic?

> +To run any DPDK application, a suitable uio module can be loaded into the 
> running kernel.
> +In most cases, the standard uio_pci_generic module included in the
> +linux kernel can provide the uio capability. This module can be loaded
> +using the command
> 
>  .. code-block:: console
> 
> -sudo modprobe uio
> -sudo insmod kmod/igb_uio.ko

Should the information about igb_uio be retained alongside the new information 
about uio_pci_generic?

> +sudo modprobe uio_pci_generic
> 
> -Since DPDK release 1.7 provides VFIO support, compilation and use of igb_uio 
> module has become
> optional for platforms that support using VFIO.
> +As an alternative to the uio_pci_generic, the DPDK also includes the
> +igb_uio module which can be found in the kmod subdirectory referred to above.
> +
> +Since DPDK release 1.7 onward provides VFIO support, use of UIO is
> +optional for platforms that support using VFIO.
> 
>  Loading VFIO Module
>  ---
> @@ -195,24 +197,29 @@ Also, to use VFIO, both kernel and BIOS must support 
> and be configured to
> use IO  For proper operation of VFIO when running DPDK applications as a 
> non-privileged user, correct
> permissions should also be set up.
>  This can be done by using the DPDK setup script (called setup.sh and located 
> in the tools directory).
> 
> -Binding and Unbinding Network Ports to/from the igb_uioor VFIO Modules
> +Binding and Unbinding Network Ports to/from the Kernel Modules
>  --
> 
>  As of release 1.4, DPDK applications no longer automatically unbind all 
> supported network ports from
> the kernel driver in use.
> -Instead, all ports that are to be used by an DPDK application must be bound 
> to the igb_uio or vfio-pci
> module before the application is run.
> +Instead, all ports that are to be used by an DPDK application must be
> +bound to the uio_pci_generic, igb_uio or vfio-pci module before the 
> application is run.
>  Any network ports under Linux* control will

[dpdk-dev] [PATCH] kni:optimization of rte_kni_rx_burst

2015-02-25 Thread Olivier Deme

Thank you Hemant, I think there might be one issue left with the patch 
though.
The alloc_q must initially be filled with mbufs before getting mbuf back 
on the tx_q.

So the patch should allow rte_kni_rx_burst to check if alloc_q is empty.
If so, it should invoke kni_allocate_mbufs(kni, 0)
(to fill the alloc_q with MAX_MBUF_BURST_NUM mbufs)

The patch for rte_kni_rx_burst would then look like:

@@ -575,7 +575,7 @@ rte_kni_rx_burst(struct rte_kni *kni, struct 
rte_mbuf **mbufs, unsigned num)

  /* If buffers removed, allocate mbufs and then put them into 
alloc_q */
  if (ret)
-kni_allocate_mbufs(kni);
+  kni_allocate_mbufs(kni, ret);
+  else if (unlikely(kni->alloc_q->write == kni->alloc_q->read))
+  kni_allocate_mbufs(kni, 0);
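
The refill policy proposed above can be sketched in isolation as follows. This is only a minimal illustration, not librte_kni code: `toy_fifo` and `refill_count` are hypothetical names, the FIFO is a toy stand-in for the real alloc_q, and the value of MAX_MBUF_BURST_NUM is assumed.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MBUF_BURST_NUM 32u  /* assumed burst size for this sketch */

/* Toy stand-in for the KNI alloc_q ring: it is empty when write == read. */
struct toy_fifo {
	unsigned write;
	unsigned read;
};

static int toy_fifo_empty(const struct toy_fifo *q)
{
	return q->write == q->read;
}

/*
 * Number of mbufs to allocate after a receive burst that dequeued
 * `ret` packets from tx_q (hypothetical helper):
 *  - packets were dequeued: replace exactly that many, capped at one burst
 *  - nothing dequeued but alloc_q is empty: prime it with a full burst
 *  - otherwise: allocate nothing
 */
static unsigned refill_count(const struct toy_fifo *alloc_q, unsigned ret)
{
	assert(alloc_q != NULL);

	if (ret > MAX_MBUF_BURST_NUM)
		return MAX_MBUF_BURST_NUM;
	if (ret > 0)
		return ret;
	if (toy_fifo_empty(alloc_q))
		return MAX_MBUF_BURST_NUM;
	return 0;
}
```

With an empty alloc_q and nothing dequeued, the helper asks for a full burst, which corresponds to the kni_allocate_mbufs(kni, 0) call in the diff above; once the queue is primed it only replaces what was consumed.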


Olivier.

On 25/02/15 11:48, Hemant Agrawal wrote:
> From: Hemant Agrawal 
>
> If any buffer is read from the tx_q, MAX_BURST buffers will be allocated and 
> attempted to be added to the alloc_q.
> This seems terribly inefficient and it also looks like the alloc_q will 
> quickly fill to its maximum capacity. If the system buffers are low in 
> number, it will reach an "out of memory" situation.
>
> This patch allocates only as many buffers as were dequeued from tx_q.
>
> Signed-off-by: Hemant Agrawal 
> ---
>   lib/librte_kni/rte_kni.c | 13 -
>   1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c
> index 4e70fa0..4cf8e30 100644
> --- a/lib/librte_kni/rte_kni.c
> +++ b/lib/librte_kni/rte_kni.c
> @@ -128,7 +128,7 @@ struct rte_kni_memzone_pool {
>   
>   
>   static void kni_free_mbufs(struct rte_kni *kni);
> -static void kni_allocate_mbufs(struct rte_kni *kni);
> +static void kni_allocate_mbufs(struct rte_kni *kni, int num);
>   
>   static volatile int kni_fd = -1;
>   static struct rte_kni_memzone_pool kni_memzone_pool = {
> @@ -575,7 +575,7 @@ rte_kni_rx_burst(struct rte_kni *kni, struct rte_mbuf 
> **mbufs, unsigned num)
>   
>   /* If buffers removed, allocate mbufs and then put them into alloc_q */
>   if (ret)
> - kni_allocate_mbufs(kni);
> + kni_allocate_mbufs(kni, ret);
>   
>   return ret;
>   }
> @@ -594,7 +594,7 @@ kni_free_mbufs(struct rte_kni *kni)
>   }
>   
>   static void
> -kni_allocate_mbufs(struct rte_kni *kni)
> +kni_allocate_mbufs(struct rte_kni *kni, int num)
>   {
>   int i, ret;
>   struct rte_mbuf *pkts[MAX_MBUF_BURST_NUM];
> @@ -620,7 +620,10 @@ kni_allocate_mbufs(struct rte_kni *kni)
>   return;
>   }
>   
> - for (i = 0; i < MAX_MBUF_BURST_NUM; i++) {
> + if (num == 0 || num > MAX_MBUF_BURST_NUM)
> + num = MAX_MBUF_BURST_NUM;
> +
> + for (i = 0; i < num; i++) {
>   pkts[i] = rte_pktmbuf_alloc(kni->pktmbuf_pool);
>   if (unlikely(pkts[i] == NULL)) {
>   /* Out of memory */
> @@ -636,7 +639,7 @@ kni_allocate_mbufs(struct rte_kni *kni)
>   ret = kni_fifo_put(kni->alloc_q, (void **)pkts, i);
>   
>   /* Check if any mbufs not put into alloc_q, and then free them */
> - if (ret >= 0 && ret < i && ret < MAX_MBUF_BURST_NUM) {
> + if (ret >= 0 && ret < i && ret < num) {
>   int j;
>   
>   for (j = ret; j < i; j++)

-- 
*Olivier Deme*
*Druid Software Ltd.*
*Tel: +353 1 202 1831*
*Email: odeme at druidsoftware.com *
*URL: http://www.druidsoftware.com*
*Hall 7, stand 7F70.*
Druid Software: Monetising enterprise small cells solutions.

[dpdk-dev] [PATCH v2 0/3] timer: fix rte_timer_reset

2015-02-25 Thread Thomas Monjalon

2015-02-25 06:02, Robert Sanford:
> Hi Thomas,
> 
> Yes, I'm interested in becoming a maintainer of rte_timer. What are the
> responsibilities?

It means we know someone who can answer our questions about rte_timer.
Having your email in the MAINTAINERS file helps us to CC you.
And we expect the maintainer to try to review patches for his part.
But reviews may be done by someone else.
In general, a technical review from the maintainer is more trusted.

If you are still interested, please drop a patch like this one:
http://dpdk.org/browse/dpdk/commit/?id=a7d7ece480093

[dpdk-dev] Missing symbol error

2015-02-25 Thread Tetsuya Mukawa

Hi,

I got the following error when I enabled CONFIG_RTE_BUILD_SHARED_LIB.

dpdk/x86_64-native-linuxapp-gcc/lib/libethdev.so: undefined reference to
`per_lcore__socket_id'
collect2: error: ld returned 1 exit status
make[5]: *** [dump_cfg] Error 1
make[4]: *** [dump_cfg] Error 2
make[4]: *** Waiting for unfinished jobs


It seems this issue occurs after applying the commit below:
8baacdd... eal: apply thread affinity by assigned cpuset

Thanks,
Tetsuya

[dpdk-dev] [PATCH v2 0/4] DPDK memcpy optimization

2015-02-25 Thread Thomas Monjalon

> > This patch set optimizes memcpy for DPDK for both SSE and AVX platforms.
> > It also extends memcpy test coverage with unaligned cases and more test
> > points.
> > 
> > Optimization techniques are summarized below:
> > 
> > 1. Utilize full cache bandwidth
> > 
> > 2. Enforce aligned stores
> > 
> > 3. Apply load address alignment based on architecture features
> > 
> > 4. Make load/store address available as early as possible
> > 
> > 5. General optimization techniques like inlining, branch reducing, prefetch
> > pattern access
> > 
> > --
> > Changes in v2:
> > 
> > 1. Reduced constant test cases in app/test/test_memcpy_perf.c for fast
> > build
> > 
> > 2. Modified macro definition for better code readability & safety
> > 
> > Zhihong Wang (4):
> >   app/test: Disabled VTA for memcpy test in app/test/Makefile
> >   app/test: Removed unnecessary test cases in app/test/test_memcpy.c
> >   app/test: Extended test coverage in app/test/test_memcpy_perf.c
> >   lib/librte_eal: Optimized memcpy in arch/x86/rte_memcpy.h for both SSE
> > and AVX platforms
> 
> Acked-by: Pablo de Lara 

Applied, thanks for the great work!

Note: we are still looking for a maintainer of x86 EAL.

[dpdk-dev] : ixgbe: why bulk allocation is not used for a scattered Rx flow?

2015-02-25 Thread Vlad Zolotarov

Hi, I have a question about the "scattered Rx" feature: why does enabling it 
disable the "bulk allocation" feature?
There is a somewhat unclear comment in ixgbe_recv_scattered_pkts():

/*
 * Descriptor done.
 *
 * Allocate a new mbuf to replenish the RX ring descriptor.
 * If the allocation fails:
 *- arrange for that RX descriptor to be the first one
 *  being parsed the next time the receive function is
 *  invoked [on the same queue].
 *
 *- Stop parsing the RX ring and return immediately.
 *
 * This policy does not drop the packet received in the RX
 * descriptor for which the allocation of a new mbuf failed.
 * Thus, it allows that packet to be later retrieved if
 * mbuf have been freed in the mean time.
 * As a side effect, holding RX descriptors instead of
 * systematically giving them back to the NIC may lead to
 * RX ring exhaustion situations.
 * However, the NIC can gracefully prevent such situations
 * to happen by sending specific "back-pressure" flow control
 * frames to its peer(s).
 */

Why can't the same "policy" be applied in the bulk allocation context? 
Don't advance the RDT until you've refilled the ring. What am I missing here?

Another question is about the LRO feature - is there a reason why it's 
not implemented? I've implemented LRO support in the ixgbe PMD to begin 
with - I used "scattered Rx" as a template and now I'm tuning it 
(things like the stuff above).

Is there any philosophical reason why it hasn't been implemented in 
*any* PMD so far? ;)

thanks,
vlad

[dpdk-dev] [PATCH v1 2/2] eal/bsd: fix symbol missing in version map

2015-02-25 Thread Cunming Liang

As per_lcore__socket_id and rte_sys_gettid are missing in version map,
it causes compiling error when CONFIG_RTE_BUILD_SHARED_LIB is enabled.

Signed-off-by: Cunming Liang 
---
 lib/librte_eal/bsdapp/eal/rte_eal_version.map | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map 
b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index c207cee..17515a9 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -10,6 +10,7 @@ DPDK_2.0 {
pci_driver_list;
per_lcore__lcore_id;
per_lcore__rte_errno;
+   per_lcore__socket_id;
rte_cpu_check_supported;
rte_cpu_get_flag_enabled;
rte_cycles_vmware_tsc_map;
@@ -83,6 +84,7 @@ DPDK_2.0 {
rte_snprintf;
rte_strerror;
rte_strsplit;
+   rte_sys_gettid;
rte_thread_get_affinity;
rte_thread_set_affinity;
rte_vlog;
-- 
1.8.1.4

[dpdk-dev] [PATCH v1 1/2] eal/linux: fix symbol missing in version map

2015-02-25 Thread Cunming Liang

As per_lcore__socket_id and rte_sys_gettid are missing in version map,
it causes compiling error when CONFIG_RTE_BUILD_SHARED_LIB is enabled.

Signed-off-by: Cunming Liang 
---
 lib/librte_eal/linuxapp/eal/rte_eal_version.map | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map 
b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index c207cee..17515a9 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -10,6 +10,7 @@ DPDK_2.0 {
pci_driver_list;
per_lcore__lcore_id;
per_lcore__rte_errno;
+   per_lcore__socket_id;
rte_cpu_check_supported;
rte_cpu_get_flag_enabled;
rte_cycles_vmware_tsc_map;
@@ -83,6 +84,7 @@ DPDK_2.0 {
rte_snprintf;
rte_strerror;
rte_strsplit;
+   rte_sys_gettid;
rte_thread_get_affinity;
rte_thread_set_affinity;
rte_vlog;
-- 
1.8.1.4

[dpdk-dev] [PATCH v1 0/2] eal: fix symbol missing in version map

2015-02-25 Thread Cunming Liang

These two patches fix the compile error seen when 
CONFIG_RTE_BUILD_SHARED_LIB=y.
The root cause is that *per_lcore__socket_id* and *rte_sys_gettid* are missing 
from the version map.
Thanks to Tetsuya Mukawa for the notification.

Cunming Liang (2):
  eal/linux: fix symbol missing in version map
  eal/bsd: fix symbol missing in version map

 lib/librte_eal/bsdapp/eal/rte_eal_version.map   | 2 ++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map | 2 ++
 2 files changed, 4 insertions(+)

-- 
1.8.1.4

[dpdk-dev] [PATCH v2] app/test: add crc32 algorithms equivalence check

2015-02-25 Thread Bruce Richardson

On Wed, Feb 25, 2015 at 10:08:32AM +0600, Yerden Zhumabekov wrote:
> New function test_crc32_hash_alg_equiv() checks whether software,
> 4-byte operand and 8-byte operand versions of CRC32 hash function
> implementations return the same result value.
> 
> Signed-off-by: Yerden Zhumabekov 

Two small notes below for improving output on error.

Acked-by: Bruce Richardson 

> ---
>  app/test/test_hash.c |   63 
> ++
>  1 file changed, 63 insertions(+)
> 
> diff --git a/app/test/test_hash.c b/app/test/test_hash.c
> index 76b1b8f..3e94af1 100644
> --- a/app/test/test_hash.c
> +++ b/app/test/test_hash.c
> @@ -177,6 +177,66 @@ static struct rte_hash_parameters ut_params = {
>   .socket_id = 0,
>  };
>  
> +#define CRC32_ITERATIONS (1U << 20)
> +#define CRC32_DWORDS (1U << 6)
> +/*
> + * Test if all CRC32 implementations yield the same hash value
> + */
> +static int
> +test_crc32_hash_alg_equiv(void)
> +{
> + uint32_t hash_val;
> + uint32_t init_val;
> + uint64_t data64[CRC32_DWORDS];
> + unsigned i, j;
> + size_t data_len;
> +
> + printf("# CRC32 implementations equivalence test\n");
> + for (i = 0; i < CRC32_ITERATIONS; i++) {
> + /* Randomizing data_len of data set */
> + data_len = (size_t) ((rte_rand() % sizeof(data64)) + 1);
> + init_val = (uint32_t) rte_rand();
> +
> + /* Fill the data set */
> + for (j = 0; j < CRC32_DWORDS; j++)
> + data64[j] = rte_rand();
> +
> + /* Calculate software CRC32 */
> + rte_hash_crc_set_alg(CRC32_SW);
> + hash_val = rte_hash_crc(data64, data_len, init_val);
> +
> + /* Check against 4-byte-operand sse4.2 CRC32 if available */
> + rte_hash_crc_set_alg(CRC32_SSE42);
> + if (hash_val != rte_hash_crc(data64, data_len, init_val)) {
> + printf("Failed checking CRC32_SW against 
> CRC32_SSE42\n");
> + break;
> + }
> +
> + /* Check against 8-byte-operand sse4.2 CRC32 if available */
> + rte_hash_crc_set_alg(CRC32_SSE42_x64);
> + if (hash_val != rte_hash_crc(data64, data_len, init_val)) {
> + printf("Failed checking CRC32_SW against 
> CRC32_SSE42_x64\n");
> + break;
> + }
> + }
> +
> + /* Resetting to best available algorithm */
> + rte_hash_crc_set_alg(CRC32_SSE42_x64);
> +
> + if (i == CRC32_ITERATIONS)
> + return 0;
> +
> + printf("Failed test data (hex):\n");
> +
> + for (j = 0; j < data_len; j++) {
> + printf("%02X", ((uint8_t *)data64)[j]);
Put in a space after each hex character, otherwise it comes out like:

Failed test data (hex):
AAD292776348010C7A18D3080DB3A300
FD
Test Failed

[I forced a failure by changing a != to == to test it, don't worry, the
hash calculations are fine! :-)]

> + if ((j+1) % 16 == 0 || j == data_len - 1)
> + printf("\n");
> + }
Maybe also print out here, or before the hex digits, the length of the data
that was tested. e.g. "printf("%zu bytes total\n", data_len);" or similar.
> +
> + return -1;
> +}
> +
>  /*
>   * Test a hash function.
>   */
> @@ -1356,6 +1416,9 @@ test_hash(void)
>  
>   run_hash_func_tests();
>  
> + if (test_crc32_hash_alg_equiv() < 0)
> + return -1;
> +
>   return 0;
>  }
>  
> -- 
> 1.7.9.5
>
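
The two review notes above (a space after each hex byte, plus a line stating the data length) could be combined roughly as in the sketch below. It is only an illustration under stated assumptions: `dump_failed_data` is a hypothetical helper, not part of app/test, and it writes to a caller-supplied stream rather than calling printf() directly.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Write `len` bytes of `data` to `out` as space-separated hex, 16 bytes
 * per line, preceded by one line stating the total length.
 * Returns the number of lines written (hypothetical helper).
 */
static unsigned dump_failed_data(FILE *out, const void *data, size_t len)
{
	const uint8_t *bytes = data;
	unsigned lines = 1;
	size_t j;

	assert(out != NULL && (data != NULL || len == 0));

	/* size_t needs %zu, not %u */
	fprintf(out, "%zu bytes total\n", len);

	for (j = 0; j < len; j++) {
		/* Break after every 16th byte and after the last byte. */
		int end_of_line = ((j + 1) % 16 == 0) || (j == len - 1);

		fprintf(out, "%02X%s", (unsigned)bytes[j],
			end_of_line ? "\n" : " ");
		if (end_of_line)
			lines++;
	}
	return lines;
}
```

Calling it as dump_failed_data(stdout, data64, data_len) in the failure path would replace the hex loop in the patch.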

[dpdk-dev] Cannot compile l2fwd-jobstats example

2015-02-25 Thread Tetsuya Mukawa

Hi,

I cannot compile l2fwd-jobstats using the master branch.
Here is the log:

$ T=x86_64-native-linuxapp-gcc make examples
== Build examples for x86_64-native-linuxapp-gcc
== bond
== cmdline
== distributor
== exception_path
== helloworld
== ip_pipeline
== ip_reassembly
== ipv4_multicast
== kni
== l2fwd
== l2fwd-jobstats
make: *** l2fwd-jobstats: No such file or directory.  Stop.
make[2]: *** [l2fwd-jobstats] Error 2
make[1]: *** [x86_64-native-linuxapp-gcc_examples] Error 2
make: *** [examples] Error 2


Bisecting shows that this error appears after applying the commit
below.

commit 2caeb8c0141dcf488f2d68aa8e8c44d1f85ed28b
Author: Pawel Wodkowski 
Date:   Tue Feb 24 17:33:24 2015 +0100

examples/l2fwd-jobstats: new example


Thanks,
Tetsuya

[dpdk-dev] [PATCH v5 5/6] eal: add per rx queue interrupt handling based on VFIO

2015-02-25 Thread David Marchand

Hello Danny,

On Wed, Feb 25, 2015 at 7:58 AM, Zhou, Danny  wrote:

>
> +int
> +rte_intr_wait_rx_pkt(struct rte_intr_handle *intr_handle, uint8_t
> queue_id)
> +{
> +   struct epoll_event ev;
> +   unsigned numfds = 0;
> +
> +   if (!intr_handle || intr_handle->fd < 0 || intr_handle->uio_cfg_fd
> < 0)
> +   return -1;
> +   if (queue_id >= VFIO_MAX_QUEUE_ID)
> +   return -1;
> +
> +   /* create epoll fd */
> +   int pfd = epoll_create(1);
> +   if (pfd < 0) {
> +   RTE_LOG(ERR, EAL, "Cannot create epoll instance\n");
> +   return -1;
> +   }
>
>
>
> Why recreate the epoll instance at each call to this function ?
>
>
>
> DZ: To avoid recreating the epoll instance for each queue, the struct
> rte_intr_handle(or a new structure added to ethdev)
>
> should be extended by adding fields storing per-queue pfd. This way, it
> could reduce user/kernel context  switch overhead
>
> when calling epoll_create() each time.
>
>
>
> Sounds good?
>

You don't need an epfd per queue. And hardcoding epfd == eventfd will give a
not very usable API.

Plus, epoll is Linux-specific, so you can't move it out of
eal/linux.
I suppose you need an abstraction here (and in the future we could add
something for BSD?).



>
> Looking at this patchset, I think there is a design issue.
>
> eal does not need to know about portid neither queueid.
>
>
>
> eal can provide an api to retrieve the interrupt fds, configure an epoll
> instance, wait on an epoll instance etc...
>
> ethdev is then responsible to setup the mapping between port id / queue id
> and interrupt fds by asking the eal about those fds.
>
>
>
> This would result in an eal api even simpler and we could add other fds in
> a single epoll fd for other uses.
>
>
>
> DZ: The queueid is just an index to the queue related eventfd array stored
> in EAL. If this array is still in the EAL and ethdev can apply for it and
> setup mapping for certain queue, there
>
> might be issue for multiple-process use case where the fd resources
> allocated for secondary process are not freed if the secondary process
> exits unexpectedly.
>

Not sure I follow you.
If a secondary process exits, the eventfds created in primary process
should still be valid and reusable.
Why would you need to free them ? Something to do with vfio ?



>
> Probably we can setup the eventfd array inside ethdev, and we just need
> EAL API to wait for ethdev fd. So application invokes ethdev API with
> portid and queueid, and ethdev calls eal
>
> API to wait on a ethdev fd which correlates with the specified portid and
> queueid.
>
>
>
> Sounds ok to you?
>

Eventfd creation cannot be handled by ethdev, since it needs
infrastructure and information from within eal/linux.
Again, do we need an abstraction?

ethdev must be the one that does the mappings between port/queue and
eventfds (or any object that represents a way to wake up for a given
port/queue).
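
A rough, Linux-only sketch of the split described above: a lower layer that only knows about file descriptors and a single epoll instance created once, and an upper layer that owns the port/queue to fd mapping. All function names here (`evt_epoll_create`, `evt_epoll_add`, `ethdev_map_queue`, `ethdev_wait_any`) and the MAX_PORTS/MAX_QUEUES limits are hypothetical; none of them is an existing DPDK API, and in the real design the eventfds would come from the EAL/VFIO side.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>

#define MAX_PORTS  4u   /* hypothetical limits for this sketch */
#define MAX_QUEUES 4u

/* "EAL side": deals only with fds, knows nothing about ports or queues. */
static int evt_epoll_create(void)
{
	return epoll_create1(0);	/* created once, reused for every wait */
}

static int evt_epoll_add(int epfd, int fd, uint32_t cookie)
{
	struct epoll_event ev = { .events = EPOLLIN };

	ev.data.u32 = cookie;	/* opaque value handed back by epoll_wait() */
	return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* "ethdev side": owns the mapping between (port, queue) and fds. */
static int queue_fd[MAX_PORTS][MAX_QUEUES];

static int ethdev_map_queue(int epfd, unsigned port, unsigned queue, int fd)
{
	assert(port < MAX_PORTS && queue < MAX_QUEUES);

	queue_fd[port][queue] = fd;
	return evt_epoll_add(epfd, fd, port * MAX_QUEUES + queue);
}

/*
 * Wait until any mapped queue is signalled. On success, store the port and
 * queue, drain the eventfd counter and return 0; return -1 on timeout/error.
 */
static int ethdev_wait_any(int epfd, int timeout_ms,
			   unsigned *port, unsigned *queue)
{
	struct epoll_event ev;
	uint64_t counter;

	assert(port != NULL && queue != NULL);

	if (epoll_wait(epfd, &ev, 1, timeout_ms) != 1)
		return -1;

	*port = ev.data.u32 / MAX_QUEUES;
	*queue = ev.data.u32 % MAX_QUEUES;

	/* Reading resets the eventfd so the next wait blocks again. */
	if (read(queue_fd[*port][*queue], &counter, sizeof(counter)) !=
	    (ssize_t)sizeof(counter))
		return -1;
	return 0;
}
```

Because the epoll instance is shared, other fds could be added to it for other uses, and no epoll_create() call is needed on the wait path.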


-- 
David Marchand

[dpdk-dev] [PATCH v2 0/3] timer: fix rte_timer_reset

2015-02-25 Thread Bruce Richardson

On Wed, Feb 25, 2015 at 06:02:24AM -0500, Robert Sanford wrote:
> Hi Thomas,
> 
> Yes, I'm interested in becoming a maintainer of rte_timer. What are the
> responsibilities?
> 
> 
> One question about lib rte_timer that's been troubling me for a while: How
> are skip lists better than BSD-style timer wheels?
> 
> --

The skip list may not be any better than a timer wheel - it's just what is used
now, and it does give pretty good performance (insert O(log n) [up to a few 
million timers per core], expiry O(1)).
Originally in DPDK, the timers were maintained in a regular sorted linked list,
but that suffered from scalability issues when timers were started, or stopped
before expiry. The skip-list was therefore a big improvement on that, and gave us
much greater scalability in timers, without any regressions in performance. I
don't know if anyone has tried to implement and benchmark a timer-wheel based
rte_timer library replacement. I'd be interested to see a performance comparison
between the two implementations! :-)
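
For readers unfamiliar with the alternative being asked about, here is a minimal single-level hashed timer wheel. It is purely illustrative and says nothing about how rte_timer works or how the two approaches would compare in a benchmark; `timer_wheel`, `wheel_timer`, `wheel_add`, `wheel_tick` and WHEEL_SLOTS are all hypothetical names chosen for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define WHEEL_SLOTS 8u	/* hypothetical wheel size */

struct wheel_timer {
	struct wheel_timer *next;
	unsigned rounds;	/* full wheel turns left before expiry */
	int fired;		/* set to 1 once the timer has expired */
};

struct timer_wheel {
	struct wheel_timer *slot[WHEEL_SLOTS];
	unsigned now;		/* current tick */
};

/* O(1) insert: hash the expiry tick to a slot and record remaining turns. */
static void wheel_add(struct timer_wheel *w, struct wheel_timer *t,
		      unsigned delay_ticks)
{
	unsigned idx;

	assert(w != NULL && t != NULL && delay_ticks > 0);

	idx = (w->now + delay_ticks) % WHEEL_SLOTS;
	t->rounds = (delay_ticks - 1) / WHEEL_SLOTS;
	t->fired = 0;
	t->next = w->slot[idx];
	w->slot[idx] = t;
}

/* Advance one tick; expire due timers in that slot. Returns count fired. */
static unsigned wheel_tick(struct timer_wheel *w)
{
	struct wheel_timer **pp;
	unsigned fired = 0;

	assert(w != NULL);

	w->now++;
	pp = &w->slot[w->now % WHEEL_SLOTS];
	while (*pp != NULL) {
		struct wheel_timer *t = *pp;

		if (t->rounds == 0) {
			*pp = t->next;	/* unlink the expired timer */
			t->fired = 1;
			fired++;
		} else {
			t->rounds--;	/* not due yet, wait another turn */
			pp = &t->next;
		}
	}
	return fired;
}
```

The trade-off in this form is that insert is O(1), but each tick walks every timer hashed to the current slot, including those still several turns away from expiry.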

Regards,
/Bruce

> Regards,
> Robert
> 
> 
> On Wed, Feb 25, 2015 at 4:46 AM, Thomas Monjalon  6wind.com>
> wrote:
> 
> > > > Changes in v2:
> > > > - split into multiple patches
> > > > - minor coding-style changes
> > > >
> > > > Robert Sanford (3):
> > > >timer: fix return value of rte_timer_reset(),
> > > >  insert rte_pause() into rte_timer_reset_sync() wait-loop
> > > >app/test: fix timer stress test to succeed on multiple runs,
> > > >  display number of times that rte_timer_reset() fails
> > > >  (expected) due to races with other cores
> > >
> > > Series:
> > > Acked-by: Olivier Matz 
> >
> > Applied, thanks
> >
> > Robert, as you well know rte_timer and you work on it,
> > maybe you are interested in becoming maintainer?
> >

[dpdk-dev] dpdk - poll mode - context switches

2015-02-25 Thread Bruce Richardson

On Wed, Feb 25, 2015 at 10:54:51AM +0100, Jog Lie wrote:
> Hello,
> 
> I am not sure to understand the mechanism behind dpdk concerning the context 
> switches.
> I have two user space applications that need access to the NIC according to 
> incoming port rules (port 80 and port 443).
> 
> How to be sure that DPDK spreads the load to the right application ? 
> 
> Will 2 dpdk instances be needed (one per app) -> two incoming packets 
> analysis to "know" if the packet should be forwarded to 
> the user space process ? Which would basically be the same thing as 
> inefficient promiscuous mode.
> 
> i don't understand that "filtering" point.
> 
> Could you please clarify ?
> 
> Thanks
> 
> --
> Jog

Hi Jog,

The missing link in connecting applications which receive packets from port
80/443 and DPDK itself is the TCP/IP stack in use. DPDK itself does not include
any stack, so you'll need to select a stack to use with your applications. The
mechanics of how apps talk to ports and how traffic gets filtered to them is
largely the stack's responsibility.

/Bruce

[dpdk-dev] : ixgbe: why bulk allocation is not used for a scattered Rx flow?

2015-02-25 Thread Bruce Richardson

On Wed, Feb 25, 2015 at 11:40:36AM +0200, Vlad Zolotarov wrote:
> Hi, I have a question about the "scattered Rx" feature: why enabling it
> disabled "bulk allocation" feature?

The "bulk-allocation" feature is one where a more optimized RX code path is
used. For the sake of performance, that code path makes certain assumptions,
one of which is that each packet fits inside a single mbuf. Dropping this
assumption makes receiving packets much more complicated and therefore slower.
[For similar reasons, the optimized TX routines, e.g. vector TX, are only used
if it is guaranteed that no hardware offload features are going to be used.]

Now, it is possible, though challenging, to write optimized code for these more
complicated cases, such as scattered RX, or TX with offloads or scattered
packets. In general, we will always want separate routines for the simple case
and the complicated cases, as the cost of checking for offloads or multi-mbuf
packets is significant enough to hurt performance badly when they are not
needed. In the case of the vector PMD for ixgbe - our highest performance path
right now - we do indeed have two receive routines, for the simple and
scattered cases. For TX, we only have an optimized path for the simple case,
but that is not to say that someone may not provide one for the offload case
too at some point.

A final note on scattered packets in particular: if packets are too big to fit
in a single mbuf, then they are not small packets, and the processing time
available per packet is, by definition, larger than for packets that fit in a
single mbuf. For 64-byte packets, the packet inter-arrival time is 67ns @ 10G,
or approx 200 cycles at 3GHz. If we assume a standard 2k mbuf, then a packet
which spans two mbufs takes at least 1654ns on the wire, and therefore a 3GHz
CPU has nearly 5000 cycles to process that same packet. Since the processing
budget is so much bigger, the need to optimize is much less. Therefore it's more
important to focus on the small packet case, which is what we have done.
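
The budget arithmetic above can be sketched in a few lines of C. This is only
an illustration under stated assumptions: standard Ethernet framing, where each
frame costs an extra 20 bytes on the wire (preamble, start-of-frame delimiter
and inter-frame gap), and a single core that has to keep up with line rate. The
helper names wire_time_ns() and cycle_budget() are made up for this sketch and
are not DPDK APIs.

```c
/* Per-frame wire overhead that is not part of the frame itself (assumption:
 * standard Ethernet): 7-byte preamble + 1-byte start-of-frame delimiter +
 * 12-byte inter-frame gap. */
#define ETH_WIRE_OVERHEAD_BYTES 20

/* Time one frame occupies the link, in nanoseconds.
 * frame_bytes includes the 4-byte CRC; link_gbps is the line rate in Gbit/s
 * (1 Gbit/s is exactly one bit per nanosecond). */
static double
wire_time_ns(unsigned frame_bytes, double link_gbps)
{
    return (frame_bytes + ETH_WIRE_OVERHEAD_BYTES) * 8.0 / link_gbps;
}

/* CPU cycles available per frame if one core must keep up with line rate. */
static double
cycle_budget(unsigned frame_bytes, double link_gbps, double cpu_ghz)
{
    return wire_time_ns(frame_bytes, link_gbps) * cpu_ghz;
}
```

With these assumptions, wire_time_ns(64, 10.0) gives 67.2ns and
cycle_budget(64, 10.0, 3.0) gives about 202 cycles. For a 2048-byte frame,
taken here as the point beyond which a second 2k buffer is needed, the same
helpers give 1654.4ns and about 4963 cycles, in line with the figures above.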

> There is some unclear comment in the ixgbe_recv_scattered_pkts():
> 
>   /*
>    * Descriptor done.
>    *
>    * Allocate a new mbuf to replenish the RX ring descriptor.
>    * If the allocation fails:
>    *    - arrange for that RX descriptor to be the first one
>    *      being parsed the next time the receive function is
>    *      invoked [on the same queue].
>    *
>    *    - Stop parsing the RX ring and return immediately.
>    *
>    * This policy does not drop the packet received in the RX
>    * descriptor for which the allocation of a new mbuf failed.
>    * Thus, it allows that packet to be later retrieved if
>    * mbuf have been freed in the mean time.
>    * As a side effect, holding RX descriptors instead of
>    * systematically giving them back to the NIC may lead to
>    * RX ring exhaustion situations.
>    * However, the NIC can gracefully prevent such situations
>    * to happen by sending specific "back-pressure" flow control
>    * frames to its peer(s).
>    */
> 
> Why can't the same "policy" be applied in the bulk-allocation context? - Don't
> advance the RDT until you've refilled the ring. What am I missing here?

A lot of the optimizations done in other code paths, such as bulk alloc, may
well be applicable here; it's just that the work has not been done yet, as the
focus is elsewhere. For vector PMD RX, we now have routines that work on both
regular and scattered packets, and both perform much better than the scalar
equivalents. Also note that in every RX (and TX) routine, the NIC tail pointer
update is always done just once at the end of the function.
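
To illustrate the policy from the quoted comment together with the single
tail-pointer update, here is a toy model in C. It is a sketch only, not the
ixgbe code: the ring, the pool and all names (toy_rx_ring, toy_pool,
toy_rx_burst) are hypothetical, and the "NIC" is just an array of done flags.
The loop stops at the first descriptor it cannot replenish, leaves that
descriptor untouched so it is parsed first on the next call, and writes the
tail once after the loop.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_RING_SIZE 8   /* toy ring size; must be a power of two */

struct toy_pool {
    char     slots[16];   /* stand-ins for packet buffers */
    unsigned used;        /* slots already handed out */
    unsigned limit;       /* allocations beyond this count fail */
};

struct toy_rx_ring {
    void    *buf[TOY_RING_SIZE];   /* buffer attached to each descriptor */
    uint8_t  done[TOY_RING_SIZE];  /* 1 once the "NIC" filled the descriptor */
    unsigned next;                 /* next descriptor software will parse */
    unsigned tail;                 /* last descriptor given back to the "NIC" */
    unsigned tail_writes;          /* number of tail "register" writes */
};

/* Returns NULL when the pool is exhausted, like a failed mbuf allocation. */
static void *
toy_pool_alloc(struct toy_pool *p)
{
    if (p->used >= p->limit || p->used >= sizeof(p->slots))
        return NULL;
    return &p->slots[p->used++];
}

static unsigned
toy_rx_burst(struct toy_rx_ring *r, struct toy_pool *pool,
             void **pkts, unsigned max_pkts)
{
    unsigned nb_rx = 0;

    while (nb_rx < max_pkts && r->done[r->next]) {
        void *fresh = toy_pool_alloc(pool);

        /* Allocation failed: keep this descriptor (and its packet) as is,
         * so it is the first one parsed on the next call, and stop. */
        if (fresh == NULL)
            break;

        pkts[nb_rx++] = r->buf[r->next];   /* hand the packet to the caller */
        r->buf[r->next] = fresh;           /* replenish the descriptor */
        r->done[r->next] = 0;
        r->next = (r->next + 1) & (TOY_RING_SIZE - 1);
    }

    /* Single tail update at the end, covering everything replenished above. */
    if (nb_rx > 0) {
        r->tail = (r->next - 1) & (TOY_RING_SIZE - 1);
        r->tail_writes++;
    }
    return nb_rx;
}
```

In this model a packet whose descriptor could not be replenished is never
dropped; it simply stays in the ring until a later call finds a free buffer.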

> 
> Another question is about the LRO feature - is there a reason why it's not
> implemented? I've implemented the LRO support in ixgbe PMD to begin with - I
> used a "scattered Rx" as a template and now I'm tuning it (things like the
> stuff above).
> 
> Is there any philosophical reason why it hasn't been implemented in *any*
> PMD so far? ;)

I'm not aware of any philosophical reasons why it hasn't been done. Patches
are welcome, as always. :-)

/Bruce

[dpdk-dev] [PATCH v4 4/7] move rte_eth_dev_check_mq_mode() logic to driver

2015-02-25 Thread Pawel Wodkowski

On 2015-02-25 07:14, Ouyang, Changchun wrote:
>
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Pawel Wodkowski
>> Sent: Thursday, February 19, 2015 11:55 PM
>> To: dev at dpdk.org
>> Subject: [dpdk-dev] [PATCH v4 4/7] move rte_eth_dev_check_mq_mode()
>> logic to driver
>>
>> Function rte_eth_dev_check_mq_mode() is driver specific. It should be
>> done in PF configuration phase. This patch moves the igb/ixgbe driver
>> specific mq check and SRIOV configuration code to the driver part. It also
>> rewrites log messages to be shorter and more descriptive.
>>
>> Signed-off-by: Pawel Wodkowski 
>> ---
>>   lib/librte_ether/rte_ethdev.c       | 197 ---
>>   lib/librte_pmd_e1000/igb_ethdev.c   |  43 
>>   lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 105 ++-
>>   lib/librte_pmd_ixgbe/ixgbe_ethdev.h |   5 +-
>>   lib/librte_pmd_ixgbe/ixgbe_pf.c     | 202 +++-
>>   5 files changed, 327 insertions(+), 225 deletions(-)
>>
>> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
>> index 4007054..aa27e39 100644
>> --- a/lib/librte_ether/rte_ethdev.c
>> +++ b/lib/librte_ether/rte_ethdev.c
>> @@ -502,195 +502,6 @@ rte_eth_dev_tx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues)
>>      return (0);
>>  }
>>
>> -static int
>> -rte_eth_dev_check_vf_rss_rxq_num(uint8_t port_id, uint16_t nb_rx_q)
>> -{
>> -    struct rte_eth_dev *dev = &rte_eth_devices[port_id];
>> -    switch (nb_rx_q) {
>> -    case 1:
>> -    case 2:
>> -        RTE_ETH_DEV_SRIOV(dev).active =
>> -            ETH_64_POOLS;
>> -        break;
>> -    case 4:
>> -        RTE_ETH_DEV_SRIOV(dev).active =
>> -            ETH_32_POOLS;
>> -        break;
>> -    default:
>> -        return -EINVAL;
>> -    }
>> -
>> -    RTE_ETH_DEV_SRIOV(dev).nb_rx_q_per_pool = nb_rx_q;
>> -    RTE_ETH_DEV_SRIOV(dev).def_pool_q_idx =
>> -        dev->pci_dev->max_vfs * nb_rx_q;
>> -
>> -    return 0;
>> -}
>> -
>> -static int
>> -rte_eth_dev_check_mq_mode(uint8_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
>> -                          const struct rte_eth_conf *dev_conf)
>> -{
>> -    struct rte_eth_dev *dev = &rte_eth_devices[port_id];
>> -
>> -    if (RTE_ETH_DEV_SRIOV(dev).active != 0) {
>> -        /* check multi-queue mode */
>> -        if ((dev_conf->rxmode.mq_mode == ETH_MQ_RX_DCB) ||
>> -            (dev_conf->rxmode.mq_mode == ETH_MQ_RX_DCB_RSS) ||
>> -            (dev_conf->txmode.mq_mode == ETH_MQ_TX_DCB)) {
>> -            /* SRIOV only works in VMDq enable mode */
>> -            PMD_DEBUG_TRACE("ethdev port_id=%" PRIu8
>> -                    " SRIOV active, "
>> -                    "wrong VMDQ mq_mode rx %u tx %u\n",
>> -                    port_id,
>> -                    dev_conf->rxmode.mq_mode,
>> -                    dev_conf->txmode.mq_mode);
>> -            return (-EINVAL);
>> -        }
>> -
>> -        switch (dev_conf->rxmode.mq_mode) {
>> -        case ETH_MQ_RX_VMDQ_DCB:
>> -        case ETH_MQ_RX_VMDQ_DCB_RSS:
>> -            /* DCB/RSS VMDQ in SRIOV mode, not implement yet */
>> -            PMD_DEBUG_TRACE("ethdev port_id=%" PRIu8
>> -                    " SRIOV active, "
>> -                    "unsupported VMDQ mq_mode rx %u\n",
>> -                    port_id, dev_conf->rxmode.mq_mode);
>> -            return (-EINVAL);
>> -        case ETH_MQ_RX_RSS:
>> -            PMD_DEBUG_TRACE("ethdev port_id=%" PRIu8
>> -                    " SRIOV active, "
>> -                    "Rx mq mode is changed from:"
>> -                    "mq_mode %u into VMDQ mq_mode %u\n",
>> -                    port_id,
>> -                    dev_conf->rxmode.mq_mode,
>> -                    dev->data->dev_conf.rxmode.mq_mode);
>> -        case ETH_MQ_RX_VMDQ_RSS:
>> -            dev->data->dev_conf.rxmode.mq_mode = ETH_MQ_RX_VMDQ_RSS;
>> -            if (nb_rx_q <= RTE_ETH_DEV_SRIOV(dev).nb_rx_q_per_pool)
>> -                if (rte_eth_dev_check_vf_rss_rxq_num(port_id, nb_rx_q) != 0) {
>> -                    PMD_DEBUG_TRACE("ethdev port_id=%d"
>> -                        " SRIOV active, invalid queue"
>> -                        " number for VMDQ RSS, allowed"
>> -                        " value are 1, 2 or 4\n",
>> -                        port_id);
>> -                    return -EINVAL;
>> -                }
>> -            break;
>> -        default: /* ETH_MQ_RX_VM

[dpdk-dev] [PATCH v1] afpacket: fix critical issue reported by klocwork

2015-02-25 Thread Thomas Monjalon

2015-02-25 09:52, Liang, Cunming:
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > 2015-02-25 00:57, Liang, Cunming:
> > > From: John W. Linville [mailto:linville at tuxdriver.com]
> > > > On Fri, Feb 20, 2015 at 11:19:59AM +0100, Thomas Monjalon wrote:
> > > > > 2015-02-12 17:08, Cunming Liang:
> > > > > > --- a/lib/librte_pmd_af_packet/rte_eth_af_packet.c
> > > > > > +++ b/lib/librte_pmd_af_packet/rte_eth_af_packet.c
> > > > > > @@ -439,13 +439,15 @@ rte_pmd_init_internals(const char *name,
> > > > > > size_t ifnamelen;
> > > > > > unsigned k_idx;
> > > > > > struct sockaddr_ll sockaddr;
> > > > > > -   struct tpacket_req *req;
> > > > > > +   struct tpacket_req *req = NULL;
> > > > >
> > > > > If *internals is set to NULL, there should be no case where req used
> > > > > and undefined.
> > >
> > > [LCM] Agree, so that's why I add '*internals = NULL' below as well.
> > > >
> > > > I agree -- it looks to me like req is protected by checking for
> > > > *internals == NULL.  I don't think this patch is necessary.
> > >
> > > [LCM] The major piece of the patch is add setting for '*internals=NULL;'.
> > 
> > Yes understood, but it is already initialized to NULL before calling
> > rte_pmd_init_internals():
> > http://dpdk.org/browse/dpdk/tree/lib/librte_pmd_af_packet/rte_eth_af_packet.c#n706
> [LCM] I see; it is klocwork that complains about it.
> So either adding 'internals=NULL' or adding a comment would help to avoid
> checking this again on the next scan.
> What do you think?

No, we don't have to pollute the code for a tool.
You should check how to disable this false positive in your tool.

[dpdk-dev] dpdk - poll mode - context switches

2015-02-25 Thread Jog Lie

Hello,

I am not sure I understand the mechanism behind DPDK concerning context
switches.
I have two user space applications that need access to the NIC according to
incoming port rules (port 80 and port 443).

How can I be sure that DPDK spreads the load to the right application?

Will 2 DPDK instances be needed (one per app), and therefore two analyses of
incoming packets to "know" whether a packet should be forwarded to the user
space process? That would basically be the same thing as an inefficient
promiscuous mode.

I don't understand that "filtering" point.

Could you please clarify?

Thanks

--
Jog

[dpdk-dev] [PATCH v2 0/3] timer: fix rte_timer_reset

2015-02-25 Thread Thomas Monjalon

> > Changes in v2:
> > - split into multiple patches
> > - minor coding-style changes
> >
> > Robert Sanford (3):
> >timer: fix return value of rte_timer_reset(),
> >  insert rte_pause() into rte_timer_reset_sync() wait-loop
> >app/test: fix timer stress test to succeed on multiple runs,
> >  display number of times that rte_timer_reset() fails
> >  (expected) due to races with other cores
> 
> Series:
> Acked-by: Olivier Matz 

Applied, thanks

Robert, as you know rte_timer well and work on it,
maybe you are interested in becoming its maintainer?

[dpdk-dev] KNI as kernel vHost backend failing

2015-02-25 Thread Xie, Huawei

On 1/1/2015 5:02 PM, sai kiran wrote:
> Hi,
>
>
>
> We are trying to experiment with DPDK's KNI application, with KNI working
> as Kernel vHost backend.
>
>
> 1.  After starting the KNI application, it detected link up.
>
>
> *[root at localhost kni]# ./build/app/kni -c 0xf0 -n 4 -- -p 0x3 -P
> --config="(0,4,6),(1,5,7)"*
>
>
> APP: Initialising port 0 ...
>
> KNI: pci: 10:00:01   8086:10fb
>
> APP: Initialising port 1 ...
>
> PMD: To improve 1G driver performance, consider setting the TX WTHRESH
> value to 4, 8, or 16.
>
> KNI: pci: 16:00:01   8086:10e7
>
> Checking link status
>
> .done
>
> Port 0 Link Up - speed 1 Mbps - full-duplex
>
> Port 1 Link Up - speed 1000 Mbps - full-duplex
>
> APP: Lcore 5 is reading from port 1
>
> APP: Lcore 7 is writing to port 1
>
> APP: Lcore 6 is writing to port 0
>
> APP: Lcore 4 is reading from port 0
>
>
> 2. As mentioned in Programming guide, *sock_en* variable in sysfs is
> enabled and a fd is generated
>
> [root at localhost dpdk-1.7.1]# cat /sys/class/net/vEth0/sock_en
> 1
> [root at localhost dpdk-1.7.1]# cat /sys/class/net/vEth1/sock_en
> 1
> [root at localhost dpdk-1.7.1]# cat /sys/class/net/vEth0/sock_fd
> 11
> [root at localhost dpdk-1.7.1]# cat /sys/class/net/vEth1/sock_fd
> 12
>
> 3. But when a VM is launched with this file-descriptor as the
> vhost-backend, qemu-kvm throws an ioctl-failure error. This ioctl failure
> makes the vhost-backend fall back to virtio-userspace.
>
>
>
> [root at localhost qemu-kvm-1.2.0]# /usr/bin/qemu-kvm -m 2048 -enable-kvm
> -cpu host -smp 2 -name VSK1 -drive file=/root/SAI/NSVPX-KVM-11.0-28.1_nc.raw
> -netdev tap,fd=12,id=mynet_kni,vhost=on -device
> virtio-net-pci,netdev=mynet_kni,bus=pci.0,addr=0x4,ioeventfd=on
>
> qemu-kvm: -netdev tap,fd=12,id=mynet_kni,vhost=on: TUNGETIFF ioctl()
> failed: Bad file descriptor
>
> TUNSETOFFLOAD ioctl() failed: Bad file descriptor
>
> qemu-kvm: unable to start vhost net: 88: falling back on userspace virtio
>
> qemu-kvm: unable to start vhost net: 88: falling back on userspace virtio
>
> qemu-kvm: unable to start vhost net: 88: falling back on userspace virtio
>
> With this failure, the traffic from the VM is not flowing through the KNI
> interface.
>
>
>
> The above mentioned ioctl failure does NOT happen consistently. During the
> instances when the failure is not seen, traffic flows successfully through
> the KNI interfaces.
>
>
>
> Can someone please shed some light on what is happening in this case?
> Are we missing something here? Is there a known issue?
>
>
Hi Kiran:
Is it possible for you to switch to user space vhost?
>
> Thanks,
>
> Kiran
>
>
>

[dpdk-dev] Cannot compile l2fwd-jobstats example

2015-02-25 Thread Thomas Monjalon

2015-02-25 08:38, Pawel Wodkowski:
> On 2015-02-25 03:26, Tetsuya Mukawa wrote:
> > Hi,
> >
> > I cannot compile l2fwd-jobstats using master branch.
> > Here is log
> >
> > $ T=x86_64-native-linuxapp-gcc make examples
> > == Build examples for x86_64-native-linuxapp-gcc
> > == bond
> > == cmdline
> > == distributor
> > == exception_path
> > == helloworld
> > == ip_pipeline
> > == ip_reassembly
> > == ipv4_multicast
> > == kni
> > == l2fwd
> > == l2fwd-jobstats
> > make: *** l2fwd-jobstats: No such file or directory.  Stop.
> > make[2]: *** [l2fwd-jobstats] Error 2
> > make[1]: *** [x86_64-native-linuxapp-gcc_examples] Error 2
> > make: *** [examples] Error 2
> >
> >
> > As a result of bisecting, it seems after applying below commit, this
> > error can be seen.
> >
> > commit 2caeb8c0141dcf488f2d68aa8e8c44d1f85ed28b
> > Author: Pawel Wodkowski 
> > Date:   Tue Feb 24 17:33:24 2015 +0100
> >
> >  examples/l2fwd-jobstats: new example
> >
> >
> > Thanks,
> > Tetsuya
> >
> 
> Looking at the git log, two files are missing there:
> 
>   examples/l2fwd-jobstats/Makefile
>   examples/l2fwd-jobstats/main.c
> 
> from patch http://dpdk.org/ml/archives/dev/2015-February/014107.html

Yes, it explains why it works on my machine...
I forgot to add them after fixing the merge.
It's fixed now. Sorry for the inconvenience.

[dpdk-dev] ixgbe vector mode not working.

2015-02-25 Thread Thomas Monjalon

2015-02-24 23:36, Stephen Hemminger:
> On Wed, 25 Feb 2015 04:55:09 +
> "Liang, Cunming"  wrote:
> 
> > Hi Stephen,
> > 
> > I tried on the latest master branch with testpmd.
> > 2 rxq and 2 txq as below, vector pmd on both rx and tx. I can't reproduce it.
> > I checked your log: on the tx side, it looks like the tx vector path hasn't
> > been enabled (it shows vpmd on rx, spmd on tx).
> > Would you help to share the below params in your app ?
> > RX desc=128 - RX free threshold=32
> > TX desc=512 - TX free threshold=32
> > TX RS bit threshold=32 - TXQ flags=0xf01
> > As your case uses 2 rxq and 1 txq, would you explain the traffic flow
> > between them?
> > Is one thread polling packets from each rxq and sending to the specified txq ?
> 
> Basic thread model of application is same as examples/qos_sched.
> 
> On ixgbe:
>   RX desc = 4000 - RX free threshold=32
>   TX desc = 512  - TX free threshold=0 so driver sets default of 32
> 
> I was setting rx/tx conf, but since the examples don't, I went away from that.
> 
> The whole RX/TX tuning parameters are a very poor programming model only
> a hardware engineer could love. Requiring the application to look at the
> driver string and choose the magic parameter settings is, in my opinion,
> an indication of using an incorrect abstraction.

Yes, improvements are welcome.

[dpdk-dev] [PATCH v2] app/test: add crc32 algorithms equivalence check

2015-02-25 Thread Yerden Zhumabekov

New function test_crc32_hash_alg_equiv() checks whether software,
4-byte operand and 8-byte operand versions of CRC32 hash function
implementations return the same result value.

Signed-off-by: Yerden Zhumabekov 
---
 app/test/test_hash.c |   63 ++
 1 file changed, 63 insertions(+)

diff --git a/app/test/test_hash.c b/app/test/test_hash.c
index 76b1b8f..3e94af1 100644
--- a/app/test/test_hash.c
+++ b/app/test/test_hash.c
@@ -177,6 +177,66 @@ static struct rte_hash_parameters ut_params = {
.socket_id = 0,
 };

+#define CRC32_ITERATIONS (1U << 20)
+#define CRC32_DWORDS (1U << 6)
+/*
+ * Test if all CRC32 implementations yield the same hash value
+ */
+static int
+test_crc32_hash_alg_equiv(void)
+{
+   uint32_t hash_val;
+   uint32_t init_val;
+   uint64_t data64[CRC32_DWORDS];
+   unsigned i, j;
+   size_t data_len;
+
+   printf("# CRC32 implementations equivalence test\n");
+   for (i = 0; i < CRC32_ITERATIONS; i++) {
+       /* Randomizing data_len of data set */
+       data_len = (size_t) ((rte_rand() % sizeof(data64)) + 1);
+       init_val = (uint32_t) rte_rand();
+
+       /* Fill the data set */
+       for (j = 0; j < CRC32_DWORDS; j++)
+           data64[j] = rte_rand();
+
+       /* Calculate software CRC32 */
+       rte_hash_crc_set_alg(CRC32_SW);
+       hash_val = rte_hash_crc(data64, data_len, init_val);
+
+       /* Check against 4-byte-operand sse4.2 CRC32 if available */
+       rte_hash_crc_set_alg(CRC32_SSE42);
+       if (hash_val != rte_hash_crc(data64, data_len, init_val)) {
+           printf("Failed checking CRC32_SW against CRC32_SSE42\n");
+           break;
+       }
+
+       /* Check against 8-byte-operand sse4.2 CRC32 if available */
+       rte_hash_crc_set_alg(CRC32_SSE42_x64);
+       if (hash_val != rte_hash_crc(data64, data_len, init_val)) {
+           printf("Failed checking CRC32_SW against CRC32_SSE42_x64\n");
+           break;
+       }
+   }
+
+   /* Resetting to best available algorithm */
+   rte_hash_crc_set_alg(CRC32_SSE42_x64);
+
+   if (i == CRC32_ITERATIONS)
+       return 0;
+
+   printf("Failed test data (hex):\n");
+
+   for (j = 0; j < data_len; j++) {
+       printf("%02X", ((uint8_t *)data64)[j]);
+       if ((j+1) % 16 == 0 || j == data_len - 1)
+           printf("\n");
+   }
+
+   return -1;
+}
+
 /*
  * Test a hash function.
  */
@@ -1356,6 +1416,9 @@ test_hash(void)

run_hash_func_tests();

+   if (test_crc32_hash_alg_equiv() < 0)
+   return -1;
+
return 0;
 }

-- 
1.7.9.5
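
The idea behind the equivalence check in this patch can be tried outside DPDK
with a small standalone sketch. It does not use rte_hash_crc() or the SSE4.2
instructions; instead it compares two portable implementations of CRC32C (the
Castagnoli polynomial, reflected form 0x82F63B78): a bit-at-a-time reference
and a table-driven variant. Random lengths, random initial values and random
data are fed to both, as in the test above. All function names here are
hypothetical, and the xorshift generator only stands in for rte_rand().

```c
#include <stddef.h>
#include <stdint.h>

#define CRC32C_POLY_REFLECTED 0x82F63B78u

/* Reference implementation: one bit at a time. The initial value is used
 * as given; no inversion is applied before or after. */
static uint32_t
crc32c_bitwise(const uint8_t *data, size_t len, uint32_t init)
{
    uint32_t crc = init;

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED & (0u - (crc & 1u)));
    }
    return crc;
}

/* Second implementation: one byte at a time through a 256-entry table,
 * built lazily on first use (not thread-safe; fine for a sketch). */
static uint32_t
crc32c_table(const uint8_t *data, size_t len, uint32_t init)
{
    static uint32_t table[256];
    static int ready;
    uint32_t crc = init;

    if (!ready) {
        for (uint32_t n = 0; n < 256; n++) {
            uint32_t c = n;
            for (int b = 0; b < 8; b++)
                c = (c >> 1) ^ (CRC32C_POLY_REFLECTED & (0u - (c & 1u)));
            table[n] = c;
        }
        ready = 1;
    }
    for (size_t i = 0; i < len; i++)
        crc = table[(crc ^ data[i]) & 0xFFu] ^ (crc >> 8);
    return crc;
}

/* Small deterministic PRNG (xorshift32); state must be non-zero. */
static uint32_t
xorshift32(uint32_t *state)
{
    uint32_t x = *state;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Returns 0 if both implementations agree on every iteration, -1 otherwise. */
static int
crc32c_equiv_check(unsigned iterations)
{
    uint8_t buf[512];
    uint32_t seed = 0x12345678u;

    for (unsigned i = 0; i < iterations; i++) {
        size_t len = (xorshift32(&seed) % sizeof(buf)) + 1;
        uint32_t init = xorshift32(&seed);

        for (size_t j = 0; j < sizeof(buf); j++)
            buf[j] = (uint8_t)xorshift32(&seed);

        if (crc32c_bitwise(buf, len, init) != crc32c_table(buf, len, init))
            return -1;
    }
    return 0;
}
```

As a sanity check on the polynomial, running either function over the ASCII
string "123456789" with an initial value of 0xFFFFFFFF and inverting the result
yields the well-known CRC-32C check value 0xE3069283.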

[dpdk-dev] [PATCH v1 0/2] eal: fix symbol missing in version map

2015-02-25 Thread Thomas Monjalon

> > These two patches fix the compile error when
> > CONFIG_RTE_BUILD_SHARED_LIB=y.
> > The root cause is that *per_lcore__socket_id* and *rte_sys_gettid* are
> > missing in the version map.
> > Thanks for the notification from Tetsuya Mukawa.

Please use Reported-by: in such case.

Fixes: ef76436c6834 ("eal: get unique thread id")
Fixes: 9e29251b2afa ("eal: thread affinity API")

> > Cunming Liang (2):
> >   eal/linux: fix symbol missing in version map
> >   eal/bsd: fix symbol missing in version map

Merged together

> Series Acked-by: John McNamara 

Applied, thanks
