date:20151002

[dpdk-dev] [PATCH] lib: added support for armv7 architecture

2015-10-02 Thread David Hunt

From: Amruta Zende 

Signed-off-by: Amruta Zende 
Signed-off-by: David Hunt 
---
 MAINTAINERS|5 +
 config/defconfig_arm-native-linuxapp-gcc   |   56 
 .../common/include/arch/arm/rte_atomic.h   |  269 
 .../common/include/arch/arm/rte_byteorder.h|  146 +++
 .../common/include/arch/arm/rte_cpuflags.h |  138 ++
 .../common/include/arch/arm/rte_cycles.h   |   77 ++
 .../common/include/arch/arm/rte_memcpy.h   |  101 
 .../common/include/arch/arm/rte_prefetch.h |   64 +
 .../common/include/arch/arm/rte_rwlock.h   |   70 +
 .../common/include/arch/arm/rte_spinlock.h |  116 +
 lib/librte_eal/common/include/arch/arm/rte_vect.h  |   37 +++
 lib/librte_eal/linuxapp/Makefile   |3 +
 lib/librte_eal/linuxapp/arm_pmu/Makefile   |   52 
 lib/librte_eal/linuxapp/arm_pmu/rte_enable_pmu.c   |   83 ++
 mk/arch/arm/rte.vars.mk|   58 +
 mk/machine/armv7-a/rte.vars.mk |   63 +
 mk/toolchain/gcc/rte.vars.mk   |8 +-
 17 files changed, 1343 insertions(+), 3 deletions(-)
 create mode 100644 config/defconfig_arm-native-linuxapp-gcc
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_atomic.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_byteorder.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_cpuflags.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_cycles.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_memcpy.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_prefetch.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_rwlock.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_spinlock.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_vect.h
 create mode 100755 lib/librte_eal/linuxapp/arm_pmu/Makefile
 create mode 100644 lib/librte_eal/linuxapp/arm_pmu/rte_enable_pmu.c
 create mode 100644 mk/arch/arm/rte.vars.mk
 create mode 100644 mk/machine/armv7-a/rte.vars.mk

diff --git a/MAINTAINERS b/MAINTAINERS
index 080a8e8..9d99d53 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -124,6 +124,11 @@ IBM POWER
 M: Chao Zhu 
 F: lib/librte_eal/common/include/arch/ppc_64/

+Arm V7
+M: Amruta Zende 
+M: David Hunt 
+F: lib/librte_eal/common/include/arch/arm/
+
 Intel x86
 M: Bruce Richardson 
 M: Konstantin Ananyev 
diff --git a/config/defconfig_arm-native-linuxapp-gcc 
b/config/defconfig_arm-native-linuxapp-gcc
new file mode 100644
index 000..159aa36
--- /dev/null
+++ b/config/defconfig_arm-native-linuxapp-gcc
@@ -0,0 +1,56 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2015 Intel Corporation. All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+#
+
+#include "common_linuxapp"
+
+CONFIG_RTE_MACHINE="armv7-a"
+
+CONFIG_RTE_ARCH="arm"
+CONFIG_RTE_ARCH_ARM32=y
+CONFIG_RTE_ARCH_32=y
+
+CONFIG_RTE_TOOLCHAIN="gcc"
+CONFIG_RTE_TOOLCHAIN_GCC=y
+
+CONFIG_RTE_FORCE_INTRINSICS=y
+CONFIG_RTE_LIBRTE_VHOST=n
+CONFIG_RTE_LIBRTE_KNI=n
+CONFIG_RTE_KNI_KMOD=n
+CONFIG_RTE_LIBRTE_LPM=n
+CONFIG_RTE_LIBRTE_ACL=n
+CONFIG_RTE_LIBRTE_SCHED=n
+CONFIG_RTE_LIBRTE_PORT=n
+CONFIG_RTE_LIBRTE_PIPELINE=n
+CONFIG_RTE_LIBRTE_TABLE=n
+CONFIG_RTE_IXGBE_INC_VECTOR=n
+CONFIG_RTE_LIBRTE_VIRTIO_PMD=n

[dpdk-dev] [PATCH] add armv7 architecture support

2015-10-02 Thread David Hunt

This patch provides EAL support for the ARMv7 architecture. We hope that this 
will encourage the ARM community to contribute PMDs for their SoCs to DPDK.

For now, we've added Intel engineers to the MAINTAINERS file. We would like to 
encourage the ARM community to take over maintenance of this area in future, 
and to further improve it.

This patch was tested on AXM5500 and Raspberry Pi 2 Model B+

Amruta Zende (1):
  lib: added support for armv7 architecture

 MAINTAINERS|5 +
 config/defconfig_arm-native-linuxapp-gcc   |   56 
 .../common/include/arch/arm/rte_atomic.h   |  269 
 .../common/include/arch/arm/rte_byteorder.h|  146 +++
 .../common/include/arch/arm/rte_cpuflags.h |  138 ++
 .../common/include/arch/arm/rte_cycles.h   |   77 ++
 .../common/include/arch/arm/rte_memcpy.h   |  101 
 .../common/include/arch/arm/rte_prefetch.h |   64 +
 .../common/include/arch/arm/rte_rwlock.h   |   70 +
 .../common/include/arch/arm/rte_spinlock.h |  116 +
 lib/librte_eal/common/include/arch/arm/rte_vect.h  |   37 +++
 lib/librte_eal/linuxapp/Makefile   |3 +
 lib/librte_eal/linuxapp/arm_pmu/Makefile   |   52 
 lib/librte_eal/linuxapp/arm_pmu/rte_enable_pmu.c   |   83 ++
 mk/arch/arm/rte.vars.mk|   58 +
 mk/machine/armv7-a/rte.vars.mk |   63 +
 mk/toolchain/gcc/rte.vars.mk   |8 +-
 17 files changed, 1343 insertions(+), 3 deletions(-)
 create mode 100644 config/defconfig_arm-native-linuxapp-gcc
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_atomic.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_byteorder.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_cpuflags.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_cycles.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_memcpy.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_prefetch.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_rwlock.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_spinlock.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_vect.h
 create mode 100755 lib/librte_eal/linuxapp/arm_pmu/Makefile
 create mode 100644 lib/librte_eal/linuxapp/arm_pmu/rte_enable_pmu.c
 create mode 100644 mk/arch/arm/rte.vars.mk
 create mode 100644 mk/machine/armv7-a/rte.vars.mk

-- 
1.7.4.1

[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-10-02 Thread Gleb Natapov

On Fri, Oct 02, 2015 at 05:00:14PM +0300, Michael S. Tsirkin wrote:
> On Thu, Oct 01, 2015 at 02:02:24PM -0700, Alexander Duyck wrote:
> > validation and translation would add 10s if not 100s of nanoseconds to the
> > time needed to process each packet.  In addition we are talking about doing
> > this in kernel space which means we wouldn't really be able to take
> > advantage of things like SSE or AVX instructions.
> 
> Yes. But the nice thing is that it's rearming so it can happen on
> a separate core, in parallel with packet processing.
> It does not need to add to latency.
> 
Modern NICs have no fewer queues than most machines have cores. There is
no such thing as a free core to offload your processing to; otherwise you
designed your application wrong and are wasting CPU cycles.

> You will burn up more CPU, but again, all this for boxes/hypervisors
> without an IOMMU.
> 
> I'm sure people can come up with even better approaches, once enough
> people get it that kernel absolutely needs to be protected from
> userspace.
> 
People should not "get" things which are, let's be polite here, untrue.
The kernel never tried to protect itself from userspace running on
behalf of root. Secure boot, which is quite recent, is maybe the only
instance where the kernel tries to do so (unfortunately), and it does so by
disabling things if boot is secure. Linux was always a "jack of all
trades" and was suitable to run on a machine with secure boot and a VM
that acts as an application container or an embedded device running packet
forwarding.

The only valid point is that nobody should have to debug crashes that may be
caused by buggy userspace, and tainting the kernel solves that.

> Long term, the right thing to do is to focus on IOMMU support. This
> gives you hardware-based memory protection without need to burn up CPU
> cycles.
> 
> -- 
> MST

--
Gleb.

[dpdk-dev] [PATCH 1/6] cxgbe: Optimize forwarding performance for 40G

2015-10-02 Thread Aaron Conole

Hi Rahul,

Rahul Lakkireddy  writes:

> Update sge initialization with respect to free-list manager configuration
> and ingress arbiter. Also update refill logic to refill mbufs only after
> a certain threshold for rx.  Optimize tx packet prefetch and free.
<>
>   for (i = 0; i < sd->coalesce.idx; i++) {
> - rte_pktmbuf_free(sd->coalesce.mbuf[i]);
> + struct rte_mbuf *tmp = sd->coalesce.mbuf[i];
> +
> + do {
> + struct rte_mbuf *next = tmp->next;
> +
> + rte_pktmbuf_free_seg(tmp);
> + tmp = next;
> + } while (tmp);
>   sd->coalesce.mbuf[i] = NULL;
Pardon my ignorance here, but rte_pktmbuf_free does this work. I can't
actually see much difference between your rewrite of this block, and
the implementation of rte_pktmbuf_free() (apart from moving your branch
to the end of the function). Did your microbenchmarking really show this
as an improvement? 

Thanks for your time,
Aaron
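
To make the comparison in this review concrete, here is a minimal, self-contained sketch of the two loop shapes being discussed. The `toy_mbuf` type and all `toy_*` helpers are hypothetical stand-ins, not the DPDK definitions; they only model the idea that freeing a packet walks the `next` pointers and releases one segment at a time. Under that assumption, the only difference between the two variants is where the NULL test sits.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a packet segment; not the real struct rte_mbuf. */
struct toy_mbuf {
	struct toy_mbuf *next;
};

/* Counts released segments so both variants can be compared. */
static unsigned toy_freed_segs;

/* Release a single segment (models a per-segment free). */
static void
toy_free_seg(struct toy_mbuf *m)
{
	toy_freed_segs++;
	free(m);
}

/* Whole-packet free: test the pointer first, then walk the chain. */
static void
toy_free_chain(struct toy_mbuf *m)
{
	while (m != NULL) {
		struct toy_mbuf *next = m->next;

		toy_free_seg(m);
		m = next;
	}
}

/*
 * Open-coded variant in the shape used by the patch: same walk, but the
 * test is at the end, so the head pointer must not be NULL.
 */
static void
toy_free_chain_do_while(struct toy_mbuf *m)
{
	do {
		struct toy_mbuf *next = m->next;

		toy_free_seg(m);
		m = next;
	} while (m != NULL);
}

/* Build a chain of n segments; returns NULL if n is 0 or malloc fails. */
static struct toy_mbuf *
toy_alloc_chain(unsigned n)
{
	struct toy_mbuf *head = NULL;

	while (n-- > 0) {
		struct toy_mbuf *m = malloc(sizeof(*m));

		if (m == NULL) {
			toy_free_chain(head);
			return NULL;
		}
		m->next = head;
		head = m;
	}
	return head;
}
```

Both functions release the same number of segments for any non-empty chain, so any measured gain would have to come from the moved branch or from inlining effects rather than from doing less work.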

[dpdk-dev] [PATCH 3/3] example: PTP client slave minimal implementation

2015-10-02 Thread Daniel Mrzyglod

Add a sample application that acts as a PTP slave using the
DPDK ieee1588 functions.

Signed-off-by: Daniel Mrzyglod 
---
 MAINTAINERS|   3 +
 examples/Makefile  |   1 +
 examples/ptpclient/Makefile|  59 +
 examples/ptpclient/ptpclient.c | 525 +
 4 files changed, 588 insertions(+)
 create mode 100644 examples/ptpclient/Makefile
 create mode 100644 examples/ptpclient/ptpclient.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 080a8e8..a80ce96 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -514,3 +514,6 @@ F: examples/tep_termination/
 F: examples/vmdq/
 F: examples/vmdq_dcb/
 F: doc/guides/sample_app_ug/vmdq_dcb_forwarding.rst
+
+M: Daniel Mrzyglod 
+F: examples/ptpclient
\ No newline at end of file
diff --git a/examples/Makefile b/examples/Makefile
index b4eddbd..4672534 100644
--- a/examples/Makefile
+++ b/examples/Makefile
@@ -74,5 +74,6 @@ DIRS-$(CONFIG_RTE_LIBRTE_XEN_DOM0) += vhost_xen
 DIRS-y += vmdq
 DIRS-y += vmdq_dcb
 DIRS-$(CONFIG_RTE_LIBRTE_POWER) += vm_power_manager
+DIRS-$(CONFIG_RTE_LIBRTE_IEEE1588) += ptpclient

 include $(RTE_SDK)/mk/rte.extsubdir.mk
diff --git a/examples/ptpclient/Makefile b/examples/ptpclient/Makefile
new file mode 100644
index 000..503339f
--- /dev/null
+++ b/examples/ptpclient/Makefile
@@ -0,0 +1,59 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ifeq ($(RTE_SDK),)
+$(error "Please define RTE_SDK environment variable")
+endif
+
+# Default target, can be overridden by command line or environment
+RTE_TARGET ?= x86_64-native-linuxapp-gcc
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# binary name
+APP = ptpclient
+
+# all source are stored in SRCS-y
+SRCS-y := ptpclient.c
+#SRCS-$(CONFIG_RTE_LIBRTE_IEEE1588) := ptpclient.c
+
+
+CFLAGS += $(WERROR_FLAGS)
+
+# workaround for a gcc bug with noreturn attribute
+# http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603
+ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
+CFLAGS_main.o += -Wno-return-type
+endif
+
+EXTRA_CFLAGS += -O3
+
+include $(RTE_SDK)/mk/rte.extapp.mk
diff --git a/examples/ptpclient/ptpclient.c b/examples/ptpclient/ptpclient.c
new file mode 100644
index 000..1fe8e6d
--- /dev/null
+++ b/examples/ptpclient/ptpclient.c
@@ -0,0 +1,525 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+

[dpdk-dev] [PATCH 2/3] ixgbe: add additional ieee1588 support functions

2015-10-02 Thread Daniel Mrzyglod

Add additional functions to support the existing IEEE1588
functionality and to enable getting, setting and adjusting
the device time.

Signed-off-by: Daniel Mrzyglod 
---
 drivers/net/ixgbe/ixgbe_ethdev.c | 250 +--
 drivers/net/ixgbe/ixgbe_ethdev.h |  24 
 2 files changed, 263 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index ec2918c..d0c575f 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -126,10 +126,12 @@
 #define IXGBE_HKEY_MAX_INDEX 10

 /* Additional timesync values. */
-#define IXGBE_TIMINCA_16NS_SHIFT 24
-#define IXGBE_TIMINCA_INCVALUE   1600
-#define IXGBE_TIMINCA_INIT   ((0x02 << IXGBE_TIMINCA_16NS_SHIFT) \
- | IXGBE_TIMINCA_INCVALUE)
+#define NSEC_PER_SEC 10L
+#define IXGBE_INCVAL_10GB0x
+#define IXGBE_INCVAL_SHIFT_10GB  28
+#define IXGBE_INCVAL_SHIFT_82599 7
+#define IXGBE_INCPER_SHIFT_82599 24
+#define IXGBE_CYCLECOUTER_MASK   0x

 static int eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev);
 static int eth_ixgbe_dev_uninit(struct rte_eth_dev *eth_dev);
@@ -325,6 +327,11 @@ static int ixgbe_timesync_read_rx_timestamp(struct 
rte_eth_dev *dev,
uint32_t flags);
 static int ixgbe_timesync_read_tx_timestamp(struct rte_eth_dev *dev,
struct timespec *timestamp);
+static int ixgbe_timesync_adjust(struct rte_eth_dev *dev, int64_t delta);
+static int ixgbe_timesync_gettime(struct rte_eth_dev *dev,
+   struct timespec *timestamp);
+static int ixgbe_timesync_settime(struct rte_eth_dev *dev,
+   struct timespec *timestamp);

 /*
  * Define VF Stats MACRO for Non "cleared on read" register
@@ -465,6 +472,9 @@ static const struct eth_dev_ops ixgbe_eth_dev_ops = {
.get_eeprom_length= ixgbe_get_eeprom_length,
.get_eeprom   = ixgbe_get_eeprom,
.set_eeprom   = ixgbe_set_eeprom,
+   .timesync_adjust = ixgbe_timesync_adjust,
+   .timesync_gettime = ixgbe_timesync_gettime,
+   .timesync_settime = ixgbe_timesync_settime,
 };

 /*
@@ -5241,20 +5251,223 @@ ixgbe_dev_set_mc_addr_list(struct rte_eth_dev *dev,
 ixgbe_dev_addr_list_itr, TRUE);
 }

+static inline uint64_t
+timespec_to_ns(const struct timespec *ts)
+{
+   return ((uint64_t) ts->tv_sec * NSEC_PER_SEC) + ts->tv_nsec;
+}
+
+static struct timespec
+ns_to_timespec(int64_t nsec)
+{
+struct timespec ts = {0, 0};
+int32_t rem;
+
+if (nsec == 0)
+return ts;
+rem = nsec % NSEC_PER_SEC;
+ts.tv_sec = nsec / NSEC_PER_SEC;
+
+if (unlikely(rem < 0)) {
+ts.tv_sec--;
+rem += NSEC_PER_SEC;
+}
+
+ts.tv_nsec = rem;
+
+return ts;
+}
+
+static inline uint64_t
+cyclecounter_cycles_to_ns(const struct cyclecounter *cc,
+ uint64_t cycles, uint64_t mask, uint64_t 
*frac)
+{
+   uint64_t ns = cycles;
+
+   ns = (ns * cc->mult) + *frac;
+   *frac = ns & mask;
+   return ns >> cc->shift;
+}
+
+static uint64_t
+cyclecounter_cycles_to_ns_backwards(const struct cyclecounter *cc,
+  uint64_t cycles, uint64_t mask __rte_unused, 
uint64_t frac)
+{
+   uint64_t ns = (uint64_t) cycles;
+
+   ns = ((ns * cc->mult) - frac) >> cc->shift;
+
+   return ns;
+}
+
+static uint64_t
+timecounter_cycles_to_ns_time(struct timecounter *tc, uint64_t cycle_tstamp)
+{
+   uint64_t delta = (cycle_tstamp - tc->cycle_last) & tc->cc->mask;
+   uint64_t nsec = tc->nsec, frac = tc->frac;
+
+
+   /* Cycle counts that are correctly converted as they
+* are between -1/2 max cycle count and +1/2max cycle count
+* */
+   if (delta > tc->cc->mask / 2) {
+   delta = (tc->cycle_last - cycle_tstamp) & tc->cc->mask;
+   nsec -= cyclecounter_cycles_to_ns_backwards(tc->cc, delta, 
tc->mask, frac);
+   } else {
+   nsec += cyclecounter_cycles_to_ns(tc->cc, delta, tc->mask, &frac);
+   }
+
+   return nsec;
+}
+
+static uint64_t
+ixgbe_read_timesync_cyclecounter(struct rte_eth_dev *dev)
+{
+   struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   uint64_t systim_cycles = 0;
+
+   systim_cycles |= (uint64_t)IXGBE_READ_REG(hw, IXGBE_SYSTIML);
+   systim_cycles |= (uint64_t)IXGBE_READ_REG(hw, IXGBE_SYSTIMH) << 32;
+
+   return systim_cycles;
+}
+
+static uint64_t
+timecounter_read_ns_delta(struct rte_eth_dev *dev)
+{
+   uint64_t cycle_now, cycle_delta;
+   uint64_t ns_offset;
+   struct ixgbe_adapter *adapter =
+   (struct ixgbe_adapter *)dev->data->dev_private;
+
+   /* read cycle counter: */
+
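
The conversion helpers in the (truncated) patch above can be illustrated with a small standalone sketch. Everything prefixed `toy_` is hypothetical and is not the driver code. The sketch assumes the nanosecond value is obtained as `(cycles * mult) >> shift`, with the bits shifted out carried over in `frac`, and that the fraction mask is `(1 << shift) - 1`; the patch passes its mask in as a parameter instead.

```c
#include <stdint.h>
#include <time.h>

#define TOY_NSEC_PER_SEC 1000000000LL

/* Flatten a timespec into a nanosecond count. */
static uint64_t
toy_timespec_to_ns(const struct timespec *ts)
{
	return (uint64_t)ts->tv_sec * TOY_NSEC_PER_SEC + (uint64_t)ts->tv_nsec;
}

/* Split a signed nanosecond count; tv_nsec always ends up in [0, 1e9). */
static struct timespec
toy_ns_to_timespec(int64_t nsec)
{
	struct timespec ts;
	int64_t sec = nsec / TOY_NSEC_PER_SEC;
	int64_t rem = nsec % TOY_NSEC_PER_SEC;

	/* C division truncates toward zero, so fix up negative remainders. */
	if (rem < 0) {
		sec--;
		rem += TOY_NSEC_PER_SEC;
	}
	ts.tv_sec = (time_t)sec;
	ts.tv_nsec = (long)rem;
	return ts;
}

/* Hypothetical fixed-point scale: ns = (cycles * mult) >> shift. */
struct toy_cyclecounter {
	uint32_t mult;
	uint32_t shift; /* assumed to be smaller than 64 */
};

/*
 * Convert a cycle delta to nanoseconds. The bits shifted out are kept in
 * *frac and added back on the next call, so rounding errors do not pile up.
 */
static uint64_t
toy_cycles_to_ns(const struct toy_cyclecounter *cc, uint64_t cycles,
		 uint64_t *frac)
{
	uint64_t scaled = cycles * cc->mult + *frac;
	uint64_t mask = (UINT64_C(1) << cc->shift) - 1;

	*frac = scaled & mask;
	return scaled >> cc->shift;
}
```

With `mult = 3` and `shift = 1` (1.5 ns per cycle), two successive 5-cycle deltas convert to 7 ns and then 8 ns, for an exact total of 15 ns.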

[dpdk-dev] [PATCH 1/3] ethdev: add additional ieee1588 support functions

2015-10-02 Thread Daniel Mrzyglod

Add additional functions to support the existing IEEE1588
functionality.

* rte_eth_timesync_settime(), function to set the device clock time.
* rte_eth_timesync_gettime(), function to get the device clock time.
* rte_eth_timesync_adjust(), function to adjust the device clock time.

Signed-off-by: Daniel Mrzyglod 
---
 lib/librte_ether/rte_ethdev.c  | 36 +++
 lib/librte_ether/rte_ethdev.h  | 64 ++
 lib/librte_ether/rte_ether_version.map |  9 +
 3 files changed, 109 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index f593f6e..6f26f3a 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3272,6 +3272,42 @@ rte_eth_timesync_read_rx_timestamp(uint8_t port_id, 
struct timespec *timestamp,
 }

 int
+rte_eth_timesync_adjust(uint8_t port_id, int64_t delta)
+{
+   struct rte_eth_dev *dev;
+
+   VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+   dev = &rte_eth_devices[port_id];
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->timesync_adjust, -ENOTSUP);
+   return (*dev->dev_ops->timesync_adjust)(dev, delta);
+}
+
+int
+rte_eth_timesync_gettime(uint8_t port_id, struct timespec *timestamp)
+{
+   struct rte_eth_dev *dev;
+
+   VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+   dev = &rte_eth_devices[port_id];
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->timesync_gettime, -ENOTSUP);
+   return (*dev->dev_ops->timesync_gettime)(dev, timestamp);
+}
+
+int
+rte_eth_timesync_settime(uint8_t port_id, struct timespec *timestamp)
+{
+   struct rte_eth_dev *dev;
+
+   VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+   dev = &rte_eth_devices[port_id];
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->timesync_settime, -ENOTSUP);
+   return (*dev->dev_ops->timesync_settime)(dev, timestamp);
+}
+
+int
 rte_eth_timesync_read_tx_timestamp(uint8_t port_id, struct timespec *timestamp)
 {
struct rte_eth_dev *dev;
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 8a8c82b..6fdaacd 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1129,6 +1129,17 @@ typedef int (*eth_timesync_read_tx_timestamp_t)(struct 
rte_eth_dev *dev,
struct timespec *timestamp);
 /**< @internal Function used to read a TX IEEE1588/802.1AS timestamp. */

+typedef int (*eth_timesync_adjust)(struct rte_eth_dev *dev, int64_t);
+/**< @internal Function used to adjust device clock */
+
+typedef int (*eth_timesync_gettime)(struct rte_eth_dev *dev,
+   struct timespec *timestamp);
+/**< @internal Function used to get time from device clock. */
+
+typedef int (*eth_timesync_settime)(struct rte_eth_dev *dev,
+   struct timespec *timestamp);
+/**< @internal Function used to set time on device clock */
+
 typedef int (*eth_get_reg_length_t)(struct rte_eth_dev *dev);
 /**< @internal Retrieve device register count  */

@@ -1312,6 +1323,12 @@ struct eth_dev_ops {
eth_timesync_read_rx_timestamp_t timesync_read_rx_timestamp;
/** Read the IEEE1588/802.1AS TX timestamp. */
eth_timesync_read_tx_timestamp_t timesync_read_tx_timestamp;
+   /** Adjust the device clock */
+   eth_timesync_adjust timesync_adjust;
+   /** Get the device clock timespec */
+   eth_timesync_gettime timesync_gettime;
+   /** Set the device clock timespec */
+   eth_timesync_settime timesync_settime;
 };

 /**
@@ -3598,6 +3615,53 @@ extern int rte_eth_timesync_read_rx_timestamp(uint8_t 
port_id,
 extern int rte_eth_timesync_read_tx_timestamp(uint8_t port_id,
  struct timespec *timestamp);

+/**
+ * Adjust the timesync clock on an Ethernet device.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param delta
+ *   The adjustment in nanoseconds
+ *
+ * @return
+ *   - 0: Success.
+ *   - -ENODEV: The port ID is invalid.
+ *   - -ENOTSUP: The function is not supported by the Ethernet driver.
+ */
+extern int rte_eth_timesync_adjust(uint8_t port_id, int64_t delta);
+
+/**
+ * Read the time from the timesync clock on an Ethernet device.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param time
+ *   Pointer to the timespec struct.
+ *
+ * @return
+ *   - 0: Success.
+ */
+extern int rte_eth_timesync_gettime(uint8_t port_id,
+ struct timespec *time);
+
+
+/**
+ * Set the time of the timesync clock on an Ethernet device.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param time
+ *   Pointer to the timespec struct.
+ *
+ * @return
+ *   - 0: Success.
+ *   - -EINVAL: No timestamp is available.
+ *   - -ENODEV: The port ID is invalid.
+ *   - -ENOTSUP: The function is not supported by the Ethernet driver.
+ */
+extern int rte_eth_timesync_settime(uint8_t port_id,
+

[dpdk-dev] [PATCH 0/3] add sample ptp slave application

2015-10-02 Thread Daniel Mrzyglod

Add a sample application that acts as a PTP slave using the DPDK IEEE1588
functions.

Also add some additional IEEE1588 support functions to enable getting,
setting and adjusting the device time.

Some V1 limitations of the app:

* The master clock sequence id and clock id are not verified fully.
* Only one master clock is supported/assumed.

To be added:

* Support for igb and i40e.
* Multiple checks on clock source.
* Some additional protocol values may be required to be parsed for more
  complex PTP environments.
* Add frequency adjustment as well as absolute time adjustment.
* Make the implementation NIC speed independent.
* Check for linkup/down.




Daniel Mrzyglod (3):
  ethdev: add additional ieee1588 support functions
  ixgbe: add additional ieee1588 support functions
  example: PTP client slave minimal implementation

 MAINTAINERS|   3 +
 drivers/net/ixgbe/ixgbe_ethdev.c   | 250 +++-
 drivers/net/ixgbe/ixgbe_ethdev.h   |  24 ++
 examples/Makefile  |   1 +
 examples/ptpclient/Makefile|  59 
 examples/ptpclient/ptpclient.c | 525 +
 lib/librte_ether/rte_ethdev.c  |  36 +++
 lib/librte_ether/rte_ethdev.h  |  64 
 lib/librte_ether/rte_ether_version.map |   9 +
 9 files changed, 960 insertions(+), 11 deletions(-)
 create mode 100644 examples/ptpclient/Makefile
 create mode 100644 examples/ptpclient/ptpclient.c

-- 
2.1.0
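
For readers new to PTP, the absolute time adjustment mentioned in this cover letter can be sketched as follows. This is a minimal illustration of the standard delay request-response arithmetic, not code taken from the sample application. The `toy_ptp_exchange` struct and the `toy_*` functions are hypothetical names, and the sketch assumes a symmetric network path and timestamps already converted to nanoseconds.

```c
#include <stdint.h>

/* The four timestamps of one delay request-response exchange, in ns. */
struct toy_ptp_exchange {
	int64_t t1; /* master clock: Sync sent */
	int64_t t2; /* slave clock:  Sync received */
	int64_t t3; /* slave clock:  Delay_Req sent */
	int64_t t4; /* master clock: Delay_Req received */
};

/* Slave clock minus master clock, assuming equal delay in both directions. */
static int64_t
toy_ptp_offset_ns(const struct toy_ptp_exchange *x)
{
	return ((x->t2 - x->t1) - (x->t4 - x->t3)) / 2;
}

/* One-way path delay under the same symmetry assumption. */
static int64_t
toy_ptp_delay_ns(const struct toy_ptp_exchange *x)
{
	return ((x->t2 - x->t1) + (x->t4 - x->t3)) / 2;
}

/* Signed value to add to the slave clock to cancel the offset. */
static int64_t
toy_ptp_adjustment_ns(const struct toy_ptp_exchange *x)
{
	return -toy_ptp_offset_ns(x);
}
```

As a worked case, a slave running 100 ns ahead of the master over a 10 ns path could see t1 = 1000, t2 = 1110, t3 = 1200, t4 = 1110, which gives an offset of 100 ns, a delay of 10 ns and an adjustment of -100 ns. A value like this is what an application would presumably hand to the rte_eth_timesync_adjust() call proposed in patch 1/3.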

[dpdk-dev] [PATCH] hash: free internal ring when freeing hash

2015-10-02 Thread Pablo de Lara

Now that a ring can be freed, the internal ring of a hash table
can be freed together with the table.
As a result, creating a new table with the same name as a previously
freed one no longer needs to look up and reuse
the already allocated ring.

Signed-off-by: Pablo de Lara 
---

This patch depends on patch "ring: add function to free a ring"
(http://dpdk.org/dev/patchwork/patch/7376/)

 lib/librte_hash/rte_cuckoo_hash.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c 
b/lib/librte_hash/rte_cuckoo_hash.c
index 7019763..409fc2e 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -180,7 +180,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
struct rte_hash_list *hash_list;
struct rte_ring *r = NULL;
char hash_name[RTE_HASH_NAMESIZE];
-   void *ptr, *k = NULL;
+   void *k = NULL;
void *buckets = NULL;
char ring_name[RTE_RING_NAMESIZE];
unsigned i;
@@ -288,13 +288,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
 #endif

snprintf(ring_name, sizeof(ring_name), "HT_%s", params->name);
-   r = rte_ring_lookup(ring_name);
-   if (r != NULL) {
-   /* clear the free ring */
-   while (rte_ring_dequeue(r, &ptr) == 0)
-   rte_pause();
-   } else
-   r = rte_ring_create(ring_name, rte_align32pow2(params->entries 
+ 1),
+   r = rte_ring_create(ring_name, rte_align32pow2(params->entries + 1),
params->socket_id, 0);
if (r == NULL) {
RTE_LOG(ERR, HASH, "memory allocation failed\n");
@@ -363,6 +357,7 @@ rte_hash_free(struct rte_hash *h)

rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);

+   rte_ring_free(h->free_slots);
rte_free(h->key_store);
rte_free(h->buckets);
rte_free(h);
-- 
2.4.3
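
The reasoning behind this change can be shown with a toy model of a named-object registry. It is a sketch under stated assumptions, not DPDK code: `toy_ring` and the `toy_ring_*` functions are hypothetical, and the model assumes that creating an object fails while another object with the same name is still registered. Once free removes the name, a later create with the same name succeeds on its own, so the lookup-and-drain fallback is no longer needed.

```c
#include <stddef.h>
#include <string.h>

#define TOY_MAX_RINGS 4
#define TOY_NAMESIZE 32

/* Hypothetical named object; stands in for a ring owned by a hash table. */
struct toy_ring {
	char name[TOY_NAMESIZE];
	int in_use;
};

static struct toy_ring toy_rings[TOY_MAX_RINGS];

/* Return the registered object with this name, or NULL. */
static struct toy_ring *
toy_ring_lookup(const char *name)
{
	size_t i;

	for (i = 0; i < TOY_MAX_RINGS; i++) {
		if (toy_rings[i].in_use &&
		    strcmp(toy_rings[i].name, name) == 0)
			return &toy_rings[i];
	}
	return NULL;
}

/* Register a new object; fails if the name is taken or no slot is left. */
static struct toy_ring *
toy_ring_create(const char *name)
{
	size_t i;

	if (strlen(name) >= TOY_NAMESIZE || toy_ring_lookup(name) != NULL)
		return NULL;

	for (i = 0; i < TOY_MAX_RINGS; i++) {
		if (!toy_rings[i].in_use) {
			strcpy(toy_rings[i].name, name);
			toy_rings[i].in_use = 1;
			return &toy_rings[i];
		}
	}
	return NULL;
}

/* Unregister the object so that its name can be reused. */
static void
toy_ring_free(struct toy_ring *r)
{
	if (r != NULL)
		r->in_use = 0;
}
```

In this model, a second `toy_ring_create("HT_demo")` fails while the first object is alive and succeeds again after `toy_ring_free()`, which mirrors why the patch can drop the `rte_ring_lookup()` branch.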

[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-10-02 Thread Michael S. Tsirkin

On Thu, Oct 01, 2015 at 02:02:24PM -0700, Alexander Duyck wrote:
> validation and translation would add 10s if not 100s of nanoseconds to the
> time needed to process each packet.  In addition we are talking about doing
> this in kernel space which means we wouldn't really be able to take
> advantage of things like SSE or AVX instructions.

Yes. But the nice thing is that it's rearming so it can happen on
a separate core, in parallel with packet processing.
It does not need to add to latency.

You will burn up more CPU, but again, all this for boxes/hypervisors
without an IOMMU.

I'm sure people can come up with even better approaches, once enough
people get it that kernel absolutely needs to be protected from
userspace.

Long term, the right thing to do is to focus on IOMMU support. This
gives you hardware-based memory protection without need to burn up CPU
cycles.

-- 
MST

[dpdk-dev] [PATCH v4] ring: add function to free a ring

2015-10-02 Thread Pablo de Lara

From: "De Lara Guarch, Pablo" 

Creating a ring reserves a memzone to hold it. Until now the ring
could not be freed, because memzones could not be freed.

Now that memzones can be freed, rings can be freed as well. rte_ring_free()
releases the memzone with rte_memzone_free() when the ring was created with
rte_ring_create() (which uses rte_memzone_reserve). A ring initialized on
pre-allocated memory has no memzone to release, and its memory
should be freed externally.

Signed-off-by: Pablo de Lara 
---
Changes in v4:
 - Include below missing patch ID which this patch depends on

Changes in v3:
 - Simplify patch using stored memzone address in ring structure
 - Change copyright date

Changes in v2:
 - Include note in release notes
 - Add error log when ring cannot be freed

This patch depends on patch "rte_ring: store memzone pointer inside ring"
(http://dpdk.org/dev/patchwork/patch/7308)

 doc/guides/rel_notes/release_2_2.rst |  4 +++
 lib/librte_ring/rte_ring.c   | 47 +++-
 lib/librte_ring/rte_ring.h   |  7 ++
 lib/librte_ring/rte_ring_version.map |  7 ++
 4 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_2_2.rst 
b/doc/guides/rel_notes/release_2_2.rst
index 5687676..24937ac 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -4,6 +4,10 @@ DPDK Release 2.2
 New Features
 

+* **Enabled freeing of rte_ring.**
+
+  New function rte_ring_free() allows the user to free a ring
+  if it was created with rte_ring_create().

 Resolved Issues
 ---
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 4e78e14..d80faf3 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -209,6 +209,51 @@ rte_ring_create(const char *name, unsigned count, int 
socket_id,
return r;
 }

+/* free the ring */
+void
+rte_ring_free(struct rte_ring *r)
+{
+   struct rte_ring_list *ring_list = NULL;
+   struct rte_tailq_entry *te;
+
+   if (r == NULL)
+   return;
+
+   /*
+* Ring was not created with rte_ring_create,
+* therefore, there is no memzone to free.
+*/
+   if (r->memzone == NULL) {
+   RTE_LOG(ERR, RING, "Cannot free ring (not created with rte_ring_create())\n");
+   return;
+   }
+
+   if (rte_memzone_free(r->memzone) != 0) {
+   RTE_LOG(ERR, RING, "Cannot free memory\n");
+   return;
+   }
+
+   ring_list = RTE_TAILQ_CAST(rte_ring_tailq.head, rte_ring_list);
+   rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+
+   /* find out tailq entry */
+   TAILQ_FOREACH(te, ring_list, next) {
+   if (te->data == (void *) r)
+   break;
+   }
+
+   if (te == NULL) {
+   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+   return;
+   }
+
+   TAILQ_REMOVE(ring_list, te, next);
+
+   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+
+   rte_free(te);
+}
+
 /*
  * change the high water mark. If *count* is 0, water marking is
  * disabled
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index df45f3f..fb5a626 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -304,6 +304,13 @@ int rte_ring_init(struct rte_ring *r, const char *name, 
unsigned count,
  */
 struct rte_ring *rte_ring_create(const char *name, unsigned count,
 int socket_id, unsigned flags);
+/**
+ * De-allocate all memory used by the ring.
+ *
+ * @param r
+ *   Ring to free
+ */
+void rte_ring_free(struct rte_ring *r);

 /**
  * Change the high water mark.
diff --git a/lib/librte_ring/rte_ring_version.map b/lib/librte_ring/rte_ring_version.map
index 982fdd1..5474b98 100644
--- a/lib/librte_ring/rte_ring_version.map
+++ b/lib/librte_ring/rte_ring_version.map
@@ -11,3 +11,10 @@ DPDK_2.0 {

local: *;
 };
+
+DPDK_2.2 {
+   global:
+
+   rte_ring_free;
+
+} DPDK_2.0;
-- 
2.4.3
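
The lookup-and-unlink step in rte_ring_free() above can be sketched outside DPDK with plain <sys/queue.h>. This is only an illustration under stated assumptions: struct entry, add_entry() and remove_by_data() are hypothetical names that stand in for the tailq entry handling in the patch, and the write lock taken in the patch is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a tailq entry: a list node carrying a pointer. */
struct entry {
	TAILQ_ENTRY(entry) next;
	void *data;
};

TAILQ_HEAD(entry_list, entry);

/* Append a node pointing at `data`; returns 0 on success, -1 if malloc fails. */
int
add_entry(struct entry_list *list, void *data)
{
	struct entry *e = malloc(sizeof(*e));

	assert(list != NULL);
	if (e == NULL)
		return -1;
	e->data = data;
	TAILQ_INSERT_TAIL(list, e, next);
	return 0;
}

/*
 * Unlink and free the node whose data pointer equals `data`.
 * Returns 0 if a node was removed, -1 if none matched.
 * A real implementation would hold a write lock around the walk.
 */
int
remove_by_data(struct entry_list *list, void *data)
{
	struct entry *e;

	assert(list != NULL);
	TAILQ_FOREACH(e, list, next) {
		if (e->data == data)
			break;
	}

	/* TAILQ_FOREACH leaves e == NULL when the walk reaches the end. */
	if (e == NULL)
		return -1;

	TAILQ_REMOVE(list, e, next);
	free(e);
	return 0;
}
```

The NULL check after the loop relies on the iterator being NULL once the list is exhausted, which is the same property the patch depends on when it tests te == NULL.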

[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-10-02 Thread Michael S. Tsirkin

On Thu, Oct 01, 2015 at 02:17:49PM -0700, Alexander Duyck wrote:
> On 10/01/2015 02:42 AM, Michael S. Tsirkin wrote:
> >On Thu, Oct 01, 2015 at 12:22:46PM +0300, Avi Kivity wrote:
> >>even when they are some users
> >>prefer to avoid the performance penalty.
> >I don't think there's a measurable penalty from passing through the
> >IOMMU, as long as mappings are mostly static (i.e. iommu=pt).  I sure
> >never saw any numbers that show such.
> 
> It depends on the IOMMU.  I believe Intel had a performance penalty on all
> CPUs prior to Ivy Bridge.  Since then things have improved to where they are
> comparable to bare metal.
> 
> The graph on page 5 of
> https://networkbuilders.intel.com/docs/Network_Builders_RA_vBRAS_Final.pdf
> shows the penalty clear as day.  Pretty much anything before Ivy Bridge w/
> small packets is slowed to a crawl with an IOMMU enabled.
> 
> - Alex

VMs are running with IOMMU enabled anyway.
Avi here tells us no one uses SRIOV on bare metal so ...
we don't need to argue about that.

-- 
MST

[dpdk-dev] [PATCH 6/6] doc: Update cxgbe documentation and release notes

2015-10-02 Thread Rahul Lakkireddy

- Add a missed step to mount huge pages in Linux.
- Re-structure Sample Application Notes.
- Add Jumbo Frame support to list of supported features and instructions
  on how to enable it via testpmd.
- Update release notes.

Signed-off-by: Rahul Lakkireddy 
Signed-off-by: Kumar Sanghvi 
---
 doc/guides/nics/cxgbe.rst| 81 +---
 doc/guides/rel_notes/release_2_2.rst |  5 +++
 2 files changed, 61 insertions(+), 25 deletions(-)

diff --git a/doc/guides/nics/cxgbe.rst b/doc/guides/nics/cxgbe.rst
index 148cd25..d718f19 100644
--- a/doc/guides/nics/cxgbe.rst
+++ b/doc/guides/nics/cxgbe.rst
@@ -50,6 +50,7 @@ CXGBE PMD has support for:
 - Promiscuous mode
 - All multicast mode
 - Port hardware statistics
+- Jumbo frames

 Limitations
 -----------
@@ -211,8 +212,8 @@ Unified Wire package for Linux operating system are as follows:

   firmware-version: 1.13.32.0, TP 0.1.4.8

-Sample Application Notes
-
+Running testpmd
+~~~~~~~~~~~~~~~

 This section demonstrates how to launch **testpmd** with Chelsio T5
 devices managed by librte_pmd_cxgbe in Linux operating system.
@@ -260,6 +261,13 @@ devices managed by librte_pmd_cxgbe in Linux operating system.

   echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages/nr_hugepages

+#. Mount huge pages:
+
+   .. code-block:: console
+
+  mkdir /mnt/huge
+  mount -t hugetlbfs nodev /mnt/huge
+
 #. Load igb_uio or vfio-pci driver:

.. code-block:: console
@@ -329,19 +337,7 @@ devices managed by librte_pmd_cxgbe in Linux operating system.
 .. note::

Flow control pause TX/RX is disabled by default and can be enabled via
-   testpmd as follows:
-
-   .. code-block:: console
-
-  testpmd> set flow_ctrl rx on tx on 0 0 0 0 mac_ctrl_frame_fwd off autoneg on 0
-  testpmd> set flow_ctrl rx on tx on 0 0 0 0 mac_ctrl_frame_fwd off autoneg on 1
-
-   To disable again, use:
-
-   .. code-block:: console
-
-  testpmd> set flow_ctrl rx off tx off 0 0 0 0 mac_ctrl_frame_fwd off autoneg off 0
-  testpmd> set flow_ctrl rx off tx off 0 0 0 0 mac_ctrl_frame_fwd off autoneg off 1
+   testpmd. Refer section :ref:`flow-control` for more details.

 FreeBSD
 -------
@@ -409,8 +405,8 @@ Unified Wire package for FreeBSD operating system are as follows:

   dev.t5nex.0.firmware_version: 1.13.32.0

-Sample Application Notes
-
+Running testpmd
+~~~~~~~~~~~~~~~

 This section demonstrates how to launch **testpmd** with Chelsio T5
 devices managed by librte_pmd_cxgbe in FreeBSD operating system.
@@ -543,16 +539,51 @@ devices managed by librte_pmd_cxgbe in FreeBSD operating system.
 .. note::

Flow control pause TX/RX is disabled by default and can be enabled via
-   testpmd as follows:
+   testpmd. Refer section :ref:`flow-control` for more details.

-   .. code-block:: console
+Sample Application Notes
+

-  testpmd> set flow_ctrl rx on tx on 0 0 0 0 mac_ctrl_frame_fwd off autoneg on 0
-  testpmd> set flow_ctrl rx on tx on 0 0 0 0 mac_ctrl_frame_fwd off autoneg on 1
+.. _flow-control:

-   To disable again, use:
+Enable/Disable Flow Control
+~~~~~~~~~~~~~~~~~~~~~~~~~~~

-   .. code-block:: console
+Flow control pause TX/RX is disabled by default and can be enabled via
+testpmd as follows:
+
+.. code-block:: console
+
+   testpmd> set flow_ctrl rx on tx on 0 0 0 0 mac_ctrl_frame_fwd off autoneg on 0
+   testpmd> set flow_ctrl rx on tx on 0 0 0 0 mac_ctrl_frame_fwd off autoneg on 1
+
+To disable again, run:
+
+.. code-block:: console
+
+   testpmd> set flow_ctrl rx off tx off 0 0 0 0 mac_ctrl_frame_fwd off autoneg off 0
+   testpmd> set flow_ctrl rx off tx off 0 0 0 0 mac_ctrl_frame_fwd off autoneg off 1
+
+Jumbo Mode
+~~~~~~~~~~
+
+There are two ways to enable sending and receiving of jumbo frames via testpmd.
+One method involves using the **mtu** command, which changes the mtu of an
+individual port without having to stop the selected port. Another method
+involves stopping all the ports first and then running **max-pkt-len** command
+to configure the mtu of all the ports with a single command.
+
+- To configure each port individually, run the mtu command as follows:
+
+  .. code-block:: console
+
+ testpmd> port config mtu 0 9000
+ testpmd> port config mtu 1 9000
+
+- To configure all the ports at once, stop all the ports first and run the
+  max-pkt-len command as follows:
+
+  .. code-block:: console

-  testpmd> set flow_ctrl rx off tx off 0 0 0 0 mac_ctrl_frame_fwd off autoneg off 0
-  testpmd> set flow_ctrl rx off tx off 0 0 0 0 mac_ctrl_frame_fwd off autoneg off 1
+ testpmd> port stop all
+ testpmd> port config all max-pkt-len 9000
diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
index 5687676..a3f4f77 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -4,6 +4,11 @@ DPDK Release 2.2
 New

[dpdk-dev] [PATCH 5/6] cxgbe: Allow apps to change mtu

2015-10-02 Thread Rahul Lakkireddy

Add a mtu_set() eth_dev_ops to allow DPDK apps to modify device mtu.

Signed-off-by: Rahul Lakkireddy 
Signed-off-by: Kumar Sanghvi 
---
 drivers/net/cxgbe/cxgbe_ethdev.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/drivers/net/cxgbe/cxgbe_ethdev.c b/drivers/net/cxgbe/cxgbe_ethdev.c
index 6d7b29c..a8e057b 100644
--- a/drivers/net/cxgbe/cxgbe_ethdev.c
+++ b/drivers/net/cxgbe/cxgbe_ethdev.c
@@ -225,6 +225,34 @@ static int cxgbe_dev_link_update(struct rte_eth_dev *eth_dev,
return 0;
 }

+static int cxgbe_dev_mtu_set(struct rte_eth_dev *eth_dev, uint16_t mtu)
+{
+   struct port_info *pi = (struct port_info *)(eth_dev->data->dev_private);
+   struct adapter *adapter = pi->adapter;
+   struct rte_eth_dev_info dev_info;
+   int err;
+   uint16_t new_mtu = mtu + ETHER_HDR_LEN + ETHER_CRC_LEN;
+
+   cxgbe_dev_info_get(eth_dev, &dev_info);
+
+   /* Must accommodate at least ETHER_MIN_MTU */
+   if ((new_mtu < ETHER_MIN_MTU) || (new_mtu > dev_info.max_rx_pktlen))
+   return -EINVAL;
+
+   /* set to jumbo mode if needed */
+   if (new_mtu > ETHER_MAX_LEN)
+   eth_dev->data->dev_conf.rxmode.jumbo_frame = 1;
+   else
+   eth_dev->data->dev_conf.rxmode.jumbo_frame = 0;
+
+   err = t4_set_rxmode(adapter, adapter->mbox, pi->viid, new_mtu, -1, -1,
+   -1, -1, true);
+   if (!err)
+   eth_dev->data->dev_conf.rxmode.max_rx_pkt_len = new_mtu;
+
+   return err;
+}
+
 static int cxgbe_dev_tx_queue_start(struct rte_eth_dev *eth_dev,
uint16_t tx_queue_id);
 static int cxgbe_dev_rx_queue_start(struct rte_eth_dev *eth_dev,
@@ -724,6 +752,7 @@ static struct eth_dev_ops cxgbe_eth_dev_ops = {
.dev_configure  = cxgbe_dev_configure,
.dev_infos_get  = cxgbe_dev_info_get,
.link_update= cxgbe_dev_link_update,
+   .mtu_set= cxgbe_dev_mtu_set,
.tx_queue_setup = cxgbe_dev_tx_queue_setup,
.tx_queue_start = cxgbe_dev_tx_queue_start,
.tx_queue_stop  = cxgbe_dev_tx_queue_stop,
-- 
2.5.3

[dpdk-dev] [PATCH 4/6] cxgbe: Update rx path to receive jumbo frames

2015-10-02 Thread Rahul Lakkireddy

Ensure jumbo mode is enabled and that the mbuf data room size can
accommodate jumbo size.  If the mbuf data room size can't accommodate
jumbo size, chain mbufs to jumbo size.

Signed-off-by: Rahul Lakkireddy 
Signed-off-by: Kumar Sanghvi 
---
 drivers/net/cxgbe/sge.c | 58 -
 1 file changed, 53 insertions(+), 5 deletions(-)

diff --git a/drivers/net/cxgbe/sge.c b/drivers/net/cxgbe/sge.c
index 921173a..91ef363 100644
--- a/drivers/net/cxgbe/sge.c
+++ b/drivers/net/cxgbe/sge.c
@@ -247,6 +247,29 @@ static inline bool fl_starving(const struct adapter *adapter,
return fl->avail - fl->pend_cred <= s->fl_starve_thres;
 }

+static inline unsigned int get_buf_size(struct adapter *adapter,
+   const struct rx_sw_desc *d)
+{
+   unsigned int rx_buf_size_idx = d->dma_addr & RX_BUF_SIZE;
+   unsigned int buf_size = 0;
+
+   switch (rx_buf_size_idx) {
+   case RX_SMALL_MTU_BUF:
+   buf_size = FL_MTU_SMALL_BUFSIZE(adapter);
+   break;
+
+   case RX_LARGE_MTU_BUF:
+   buf_size = FL_MTU_LARGE_BUFSIZE(adapter);
+   break;
+
+   default:
+   BUG_ON(1);
+   /* NOT REACHED */
+   }
+
+   return buf_size;
+}
+
 /**
  * free_rx_bufs - free the Rx buffers on an SGE free list
  * @q: the SGE free list to free buffers from
@@ -362,6 +385,14 @@ static unsigned int refill_fl_usembufs(struct adapter *adap, struct sge_fl *q,
unsigned int buf_size_idx = RX_SMALL_MTU_BUF;
struct rte_mbuf *buf_bulk[n];
int ret, i;
+   struct rte_pktmbuf_pool_private *mbp_priv;
+   u8 jumbo_en = rxq->rspq.eth_dev->data->dev_conf.rxmode.jumbo_frame;
+
+   /* Use jumbo mtu buffers iff mbuf data room size can fit jumbo data. */
+   mbp_priv = rte_mempool_get_priv(rxq->rspq.mb_pool);
+   if (jumbo_en &&
+   ((mbp_priv->mbuf_data_room_size - RTE_PKTMBUF_HEADROOM) >= 9000))
+   buf_size_idx = RX_LARGE_MTU_BUF;

ret = rte_mempool_get_bulk(rxq->rspq.mb_pool, (void *)buf_bulk, n);
if (unlikely(ret != 0)) {
@@ -1439,14 +1470,31 @@ static int process_responses(struct sge_rspq *q, int budget,
const struct cpl_rx_pkt *cpl =
(const void *)&q->cur_desc[1];
bool csum_ok = cpl->csum_calc && !cpl->err_vec;
-   struct rte_mbuf *pkt;
-   u32 len = ntohl(rc->pldbuflen_qid);
+   struct rte_mbuf *pkt, *npkt;
+   u32 len, bufsz;

+   len = ntohl(rc->pldbuflen_qid);
BUG_ON(!(len & F_RSPD_NEWBUF));
pkt = rsd->buf;
-   pkt->data_len = G_RSPD_LEN(len);
-   pkt->pkt_len = pkt->data_len;
-   unmap_rx_buf(&rxq->fl);
+   npkt = pkt;
+   len = G_RSPD_LEN(len);
+   pkt->pkt_len = len;
+
+   /* Chain mbufs into len if necessary */
+   while (len) {
+   struct rte_mbuf *new_pkt = rsd->buf;
+
+   bufsz = min(get_buf_size(q->adapter, rsd), len);
+   new_pkt->data_len = bufsz;
+   unmap_rx_buf(&rxq->fl);
+   len -= bufsz;
+   npkt->next = new_pkt;
+   npkt = new_pkt;
+   pkt->nb_segs++;
+   rsd = &rxq->fl.sdesc[rxq->fl.cidx];
+   }
+   npkt->next = NULL;
+   pkt->nb_segs--;

if (cpl->l2info & htonl(F_RXF_IP)) {
pkt->packet_type = RTE_PTYPE_L3_IPV4;
-- 
2.5.3
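
The chaining loop in process_responses() above spreads one received packet over as many free-list buffers as it needs. The length bookkeeping can be sketched on its own, without mbufs. split_into_segments() is a hypothetical helper written for this illustration, and it assumes every buffer has the same size, which the driver does not require.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Split `pkt_len` bytes across buffers of `buf_size` bytes each, writing the
 * per-segment data length into seg_len[]. Returns the number of segments
 * used, or 0 if the packet is empty, buf_size is 0, or max_segs is too small.
 */
size_t
split_into_segments(uint32_t pkt_len, uint32_t buf_size,
		    uint32_t *seg_len, size_t max_segs)
{
	size_t nb_segs = 0;
	uint32_t left = pkt_len;

	assert(seg_len != NULL);

	if (buf_size == 0)
		return 0;

	while (left > 0) {
		/* Each segment takes a full buffer, except possibly the last. */
		uint32_t chunk = left < buf_size ? left : buf_size;

		if (nb_segs == max_segs)
			return 0;
		seg_len[nb_segs++] = chunk;
		left -= chunk;
	}
	return nb_segs;
}
```

For a 9018-byte frame and 2048-byte buffers this yields four full segments and a final segment of 826 bytes.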

[dpdk-dev] [PATCH 3/6] cxgbe: Update tx path to transmit jumbo frames

2015-10-02 Thread Rahul Lakkireddy

Add a non-coalesce path.  Skip coalescing for Jumbo Frames, and send the
packet through non-coalesced path if there are enough credits.  Also,
free these non-coalesced packets while reclaiming credits.

Signed-off-by: Rahul Lakkireddy 
Signed-off-by: Kumar Sanghvi 
---
 drivers/net/cxgbe/sge.c | 96 -
 1 file changed, 64 insertions(+), 32 deletions(-)

diff --git a/drivers/net/cxgbe/sge.c b/drivers/net/cxgbe/sge.c
index e540881..921173a 100644
--- a/drivers/net/cxgbe/sge.c
+++ b/drivers/net/cxgbe/sge.c
@@ -199,11 +199,20 @@ static void free_tx_desc(struct sge_txq *q, unsigned int n)

 static void reclaim_tx_desc(struct sge_txq *q, unsigned int n)
 {
+   struct tx_sw_desc *d;
unsigned int cidx = q->cidx;

+   d = &q->sdesc[cidx];
while (n--) {
-   if (++cidx == q->size)
+   if (d->mbuf) {   /* an SGL is present */
+   rte_pktmbuf_free(d->mbuf);
+   d->mbuf = NULL;
+   }
+   ++d;
+   if (++cidx == q->size) {
cidx = 0;
+   d = q->sdesc;
+   }
}
q->cidx = cidx;
 }
@@ -1045,6 +1054,7 @@ int t4_eth_xmit(struct sge_eth_txq *txq, struct rte_mbuf *mbuf)
u32 wr_mid;
u64 cntrl, *end;
bool v6;
+   u32 max_pkt_len = txq->eth_dev->data->dev_conf.rxmode.max_rx_pkt_len;

/* Reject xmit if queue is stopped */
if (unlikely(txq->flags & EQ_STOPPED))
@@ -1060,6 +1070,10 @@ out_free:
return 0;
}

+   if ((!(m->ol_flags & PKT_TX_TCP_SEG)) &&
+   (unlikely(m->pkt_len > max_pkt_len)))
+   goto out_free;
+
pi = (struct port_info *)txq->eth_dev->data->dev_private;
adap = pi->adapter;

@@ -1067,7 +1081,7 @@ out_free:
/* align the end of coalesce WR to a 512 byte boundary */
txq->q.coalesce.max = (8 - (txq->q.pidx & 7)) * 8;

-   if (!(m->ol_flags & PKT_TX_TCP_SEG)) {
+   if (!((m->ol_flags & PKT_TX_TCP_SEG) || (m->pkt_len > ETHER_MAX_LEN))) {
if (should_tx_packet_coalesce(txq, mbuf, &cflits, adap)) {
if (unlikely(map_mbuf(mbuf, addr) < 0)) {
dev_warn(adap, "%s: mapping err for coalesce\n",
@@ -1114,33 +1128,46 @@ out_free:

len = 0;
len += sizeof(*cpl);
-   lso = (void *)(wr + 1);
-   v6 = (m->ol_flags & PKT_TX_IPV6) != 0;
-   l3hdr_len = m->l3_len;
-   l4hdr_len = m->l4_len;
-   eth_xtra_len = m->l2_len - ETHER_HDR_LEN;
-   len += sizeof(*lso);
-   wr->op_immdlen = htonl(V_FW_WR_OP(FW_ETH_TX_PKT_WR) |
-  V_FW_WR_IMMDLEN(len));
-   lso->lso_ctrl = htonl(V_LSO_OPCODE(CPL_TX_PKT_LSO) |
- F_LSO_FIRST_SLICE | F_LSO_LAST_SLICE |
- V_LSO_IPV6(v6) |
- V_LSO_ETHHDR_LEN(eth_xtra_len / 4) |
- V_LSO_IPHDR_LEN(l3hdr_len / 4) |
- V_LSO_TCPHDR_LEN(l4hdr_len / 4));
-   lso->ipid_ofst = htons(0);
-   lso->mss = htons(m->tso_segsz);
-   lso->seqno_offset = htonl(0);
-   if (is_t4(adap->params.chip))
-   lso->len = htonl(m->pkt_len);
-   else
-   lso->len = htonl(V_LSO_T5_XFER_SIZE(m->pkt_len));
-   cpl = (void *)(lso + 1);
-   cntrl = V_TXPKT_CSUM_TYPE(v6 ? TX_CSUM_TCPIP6 : TX_CSUM_TCPIP) |
- V_TXPKT_IPHDR_LEN(l3hdr_len) |
- V_TXPKT_ETHHDR_LEN(eth_xtra_len);
-   txq->stats.tso++;
-   txq->stats.tx_cso += m->tso_segsz;
+
+   /* Coalescing skipped and we send through normal path */
+   if (!(m->ol_flags & PKT_TX_TCP_SEG)) {
+   wr->op_immdlen = htonl(V_FW_WR_OP(FW_ETH_TX_PKT_WR) |
+  V_FW_WR_IMMDLEN(len));
+   cpl = (void *)(wr + 1);
+   if (m->ol_flags & PKT_TX_IP_CKSUM) {
+   cntrl = hwcsum(adap->params.chip, m) |
+   F_TXPKT_IPCSUM_DIS;
+   txq->stats.tx_cso++;
+   }
+   } else {
+   lso = (void *)(wr + 1);
+   v6 = (m->ol_flags & PKT_TX_IPV6) != 0;
+   l3hdr_len = m->l3_len;
+   l4hdr_len = m->l4_len;
+   eth_xtra_len = m->l2_len - ETHER_HDR_LEN;
+   len += sizeof(*lso);
+   wr->op_immdlen = htonl(V_FW_WR_OP(FW_ETH_TX_PKT_WR) |
+  V_FW_WR_IMMDLEN(len));
+   lso->lso_ctrl = htonl(V_LSO_OPCODE(CPL_TX_PKT_LSO) |
+ F_LSO_FIRST_SLICE | F_LSO_LAST_SLICE |
+ V_LSO_IPV6(v6) |
+ V_LSO_ETHHDR_LEN(eth_xtra_len / 4) |
+
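
The reclaim_tx_desc() change earlier in this patch walks a circular descriptor ring, frees any buffer still attached to a descriptor, and wraps the consumer index at the end of the ring. A minimal standalone sketch of that walk follows. struct sw_desc, struct ring and ring_reclaim() are hypothetical names, and free() stands in for rte_pktmbuf_free().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical software descriptor: owns at most one heap buffer. */
struct sw_desc {
	void *buf;
};

/* Hypothetical ring: `size` descriptors, `cidx` is the consumer index. */
struct ring {
	struct sw_desc *desc;
	unsigned int size;
	unsigned int cidx;
};

/*
 * Reclaim `n` descriptors starting at the consumer index, freeing any
 * buffer still attached, and wrap the index at the end of the ring.
 * Returns the number of buffers freed.
 */
unsigned int
ring_reclaim(struct ring *r, unsigned int n)
{
	unsigned int freed = 0;
	unsigned int cidx = r->cidx;

	assert(r->size > 0 && cidx < r->size);

	while (n--) {
		struct sw_desc *d = &r->desc[cidx];

		if (d->buf != NULL) {
			free(d->buf);
			d->buf = NULL;
			freed++;
		}
		if (++cidx == r->size)
			cidx = 0;
	}
	r->cidx = cidx;
	return freed;
}
```

The sketch indexes the array on every iteration instead of carrying a second pointer alongside cidx as the patch does; the visiting order is the same.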

[dpdk-dev] [PATCH 2/6] cxgbe: Update device info and perform sanity checks to enable jumbo frames

2015-10-02 Thread Rahul Lakkireddy

Increase max_rx_pktlen to accommodate jumbo frame size. Perform sanity
checks and enable jumbo mode in rx queue setup. Set link mtu based on
max_rx_pktlen.

Signed-off-by: Rahul Lakkireddy 
Signed-off-by: Kumar Sanghvi 
---
 drivers/net/cxgbe/cxgbe.h|  3 +++
 drivers/net/cxgbe/cxgbe_ethdev.c | 23 +--
 drivers/net/cxgbe/cxgbe_main.c   |  3 ++-
 3 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/drivers/net/cxgbe/cxgbe.h b/drivers/net/cxgbe/cxgbe.h
index 97c37d2..adc0d92 100644
--- a/drivers/net/cxgbe/cxgbe.h
+++ b/drivers/net/cxgbe/cxgbe.h
@@ -43,6 +43,9 @@
 #define CXGBE_DEFAULT_TX_DESC_SIZE 1024 /* Default TX ring size */
 #define CXGBE_DEFAULT_RX_DESC_SIZE 1024 /* Default RX ring size */

+#define CXGBE_MIN_RX_BUFSIZE ETHER_MIN_MTU /* min buf size */
+#define CXGBE_MAX_RX_PKTLEN (9000 + ETHER_HDR_LEN + ETHER_CRC_LEN) /* max pkt */
+
 int cxgbe_probe(struct adapter *adapter);
 int cxgbe_up(struct adapter *adap);
 int cxgbe_down(struct port_info *pi);
diff --git a/drivers/net/cxgbe/cxgbe_ethdev.c b/drivers/net/cxgbe/cxgbe_ethdev.c
index 478051a..6d7b29c 100644
--- a/drivers/net/cxgbe/cxgbe_ethdev.c
+++ b/drivers/net/cxgbe/cxgbe_ethdev.c
@@ -141,8 +141,8 @@ static void cxgbe_dev_info_get(struct rte_eth_dev *eth_dev,
struct adapter *adapter = pi->adapter;
int max_queues = adapter->sge.max_ethqsets / adapter->params.nports;

-   device_info->min_rx_bufsize = 68; /* XXX: Smallest pkt size */
-   device_info->max_rx_pktlen = 1500; /* XXX: For now we support mtu */
+   device_info->min_rx_bufsize = CXGBE_MIN_RX_BUFSIZE;
+   device_info->max_rx_pktlen = CXGBE_MAX_RX_PKTLEN;
device_info->max_rx_queues = max_queues;
device_info->max_tx_queues = max_queues;
device_info->max_mac_addrs = 1;
@@ -498,6 +498,8 @@ static int cxgbe_dev_rx_queue_setup(struct rte_eth_dev *eth_dev,
int err = 0;
int msi_idx = 0;
unsigned int temp_nb_desc;
+   struct rte_eth_dev_info dev_info;
+   unsigned int pkt_len = eth_dev->data->dev_conf.rxmode.max_rx_pkt_len;

RTE_SET_USED(rx_conf);

@@ -505,6 +507,17 @@ static int cxgbe_dev_rx_queue_setup(struct rte_eth_dev *eth_dev,
  __func__, eth_dev->data->nb_rx_queues, queue_idx, nb_desc,
  socket_id, mp);

+   cxgbe_dev_info_get(eth_dev, &dev_info);
+
+   /* Must accommodate at least ETHER_MIN_MTU */
+   if ((pkt_len < dev_info.min_rx_bufsize) ||
+   (pkt_len > dev_info.max_rx_pktlen)) {
+   dev_err(adap, "%s: max pkt len must be > %d and <= %d\n",
+   __func__, dev_info.min_rx_bufsize,
+   dev_info.max_rx_pktlen);
+   return -EINVAL;
+   }
+
/*  Free up the existing queue  */
if (eth_dev->data->rx_queues[queue_idx]) {
cxgbe_dev_rx_queue_release(eth_dev->data->rx_queues[queue_idx]);
@@ -534,6 +547,12 @@ static int cxgbe_dev_rx_queue_setup(struct rte_eth_dev *eth_dev,
if ((&rxq->fl) != NULL)
rxq->fl.size = temp_nb_desc;

+   /* Set to jumbo mode if necessary */
+   if (pkt_len > ETHER_MAX_LEN)
+   eth_dev->data->dev_conf.rxmode.jumbo_frame = 1;
+   else
+   eth_dev->data->dev_conf.rxmode.jumbo_frame = 0;
+
err = t4_sge_alloc_rxq(adapter, &rxq->rspq, false, eth_dev, msi_idx,
   &rxq->fl, t4_ethrx_handler,
   t4_get_mps_bg_map(adapter, pi->tx_chan), mp,
diff --git a/drivers/net/cxgbe/cxgbe_main.c b/drivers/net/cxgbe/cxgbe_main.c
index 316b87d..aff23d0 100644
--- a/drivers/net/cxgbe/cxgbe_main.c
+++ b/drivers/net/cxgbe/cxgbe_main.c
@@ -855,12 +855,13 @@ int link_start(struct port_info *pi)
 {
struct adapter *adapter = pi->adapter;
int ret;
+   unsigned int mtu = pi->eth_dev->data->dev_conf.rxmode.max_rx_pkt_len;

/*
 * We do not set address filters and promiscuity here, the stack does
 * that step explicitly.
 */
-   ret = t4_set_rxmode(adapter, adapter->mbox, pi->viid, 1500, -1, -1,
+   ret = t4_set_rxmode(adapter, adapter->mbox, pi->viid, mtu, -1, -1,
-1, 1, true);
if (ret == 0) {
ret = t4_change_mac(adapter, adapter->mbox, pi->viid,
-- 
2.5.3

[dpdk-dev] [PATCH 1/6] cxgbe: Optimize forwarding performance for 40G

2015-10-02 Thread Rahul Lakkireddy

Update sge initialization with respect to free-list manager configuration
and ingress arbiter. Also update refill logic to refill mbufs only after
a certain threshold for rx.  Optimize tx packet prefetch and free.

Approx. 4 MPPS improvement seen in forwarding performance after the
optimization.

Signed-off-by: Rahul Lakkireddy 
Signed-off-by: Kumar Sanghvi 
---
 drivers/net/cxgbe/base/t4_regs.h | 16 
 drivers/net/cxgbe/cxgbe_main.c   |  7 +++
 drivers/net/cxgbe/sge.c  | 17 -
 3 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/drivers/net/cxgbe/base/t4_regs.h b/drivers/net/cxgbe/base/t4_regs.h
index cd28b59..9057e40 100644
--- a/drivers/net/cxgbe/base/t4_regs.h
+++ b/drivers/net/cxgbe/base/t4_regs.h
@@ -266,6 +266,18 @@
 #define A_SGE_FL_BUFFER_SIZE2 0x104c
 #define A_SGE_FL_BUFFER_SIZE3 0x1050

+#define A_SGE_FLM_CFG 0x1090
+
+#define S_CREDITCNT    4
+#define M_CREDITCNT    0x3U
+#define V_CREDITCNT(x) ((x) << S_CREDITCNT)
+#define G_CREDITCNT(x) (((x) >> S_CREDITCNT) & M_CREDITCNT)
+
+#define S_CREDITCNTPACKING    2
+#define M_CREDITCNTPACKING    0x3U
+#define V_CREDITCNTPACKING(x) ((x) << S_CREDITCNTPACKING)
+#define G_CREDITCNTPACKING(x) (((x) >> S_CREDITCNTPACKING) & M_CREDITCNTPACKING)
+
 #define A_SGE_CONM_CTRL 0x1094

 #define S_EGRTHRESHOLD    8
@@ -361,6 +373,10 @@

 #define A_SGE_CONTROL2 0x1124

+#define S_IDMAARBROUNDROBIN    19
+#define V_IDMAARBROUNDROBIN(x) ((x) << S_IDMAARBROUNDROBIN)
+#define F_IDMAARBROUNDROBIN    V_IDMAARBROUNDROBIN(1U)
+
 #define S_INGPACKBOUNDARY    16
 #define M_INGPACKBOUNDARY    0x7U
 #define V_INGPACKBOUNDARY(x) ((x) << S_INGPACKBOUNDARY)
diff --git a/drivers/net/cxgbe/cxgbe_main.c b/drivers/net/cxgbe/cxgbe_main.c
index 3755444..316b87d 100644
--- a/drivers/net/cxgbe/cxgbe_main.c
+++ b/drivers/net/cxgbe/cxgbe_main.c
@@ -422,6 +422,13 @@ static int adap_init0_tweaks(struct adapter *adapter)
t4_set_reg_field(adapter, A_SGE_CONTROL, V_PKTSHIFT(M_PKTSHIFT),
 V_PKTSHIFT(rx_dma_offset));

+   t4_set_reg_field(adapter, A_SGE_FLM_CFG,
+V_CREDITCNT(M_CREDITCNT) | M_CREDITCNTPACKING,
+V_CREDITCNT(3) | V_CREDITCNTPACKING(1));
+
+   t4_set_reg_field(adapter, A_SGE_CONTROL2, V_IDMAARBROUNDROBIN(1U),
+V_IDMAARBROUNDROBIN(1U));
+
/*
 * Don't include the "IP Pseudo Header" in CPL_RX_PKT checksums: Linux
 * adds the pseudo header itself.
diff --git a/drivers/net/cxgbe/sge.c b/drivers/net/cxgbe/sge.c
index 6eb1244..e540881 100644
--- a/drivers/net/cxgbe/sge.c
+++ b/drivers/net/cxgbe/sge.c
@@ -286,8 +286,7 @@ static void unmap_rx_buf(struct sge_fl *q)

 static inline void ring_fl_db(struct adapter *adap, struct sge_fl *q)
 {
-   /* see if we have exceeded q->size / 4 */
-   if (q->pend_cred >= (q->size / 4)) {
+   if (q->pend_cred >= 64) {
u32 val = adap->params.arch.sge_fl_db;

if (is_t4(adap->params.chip))
@@ -995,7 +994,14 @@ static inline int tx_do_packet_coalesce(struct sge_eth_txq *txq,
int i;

for (i = 0; i < sd->coalesce.idx; i++) {
-   rte_pktmbuf_free(sd->coalesce.mbuf[i]);
+   struct rte_mbuf *tmp = sd->coalesce.mbuf[i];
+
+   do {
+   struct rte_mbuf *next = tmp->next;
+
+   rte_pktmbuf_free_seg(tmp);
+   tmp = next;
+   } while (tmp);
sd->coalesce.mbuf[i] = NULL;
}
}
@@ -1054,7 +1060,6 @@ out_free:
return 0;
}

-   rte_prefetch0(&((&txq->q)->sdesc->mbuf->pool));
pi = (struct port_info *)txq->eth_dev->data->dev_private;
adap = pi->adapter;

@@ -1070,6 +1075,7 @@ out_free:
txq->stats.mapping_err++;
goto out_free;
}
+   rte_prefetch0((volatile void *)addr);
return tx_do_packet_coalesce(txq, mbuf, cflits, adap,
 pi, addr);
} else {
@@ -1454,7 +1460,8 @@ static int process_responses(struct sge_rspq *q, int budget,
unsigned int params;
u32 val;

-   __refill_fl(q->adapter, &rxq->fl);
+   if (fl_cap(&rxq->fl) - rxq->fl.avail >= 64)
+   __refill_fl(q->adapter, &rxq->fl);
params = V_QINTR_TIMER_IDX(X_TIMERREG_UPDATE_CIDX);
q->next_intr_params = params;
val = V_CIDXINC(cidx_inc) | V_SEINTARM(params);
-- 
2.5.3
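
The S_/M_/V_/G_ macros added to t4_regs.h above follow one scheme: S_ is the bit offset of a field, M_ is its mask before shifting, V_ places a value into the field, and G_ extracts it. A minimal sketch of how such macros combine with a read-modify-write update follows. The FIELD macros and set_reg_field() are hypothetical and operate on a plain integer instead of a device register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 2-bit field at bit offset 4, named after the S_/M_/V_/G_ scheme. */
#define S_FIELD    4
#define M_FIELD    0x3U
#define V_FIELD(x) ((x) << S_FIELD)
#define G_FIELD(x) (((x) >> S_FIELD) & M_FIELD)

/*
 * Read-modify-write helper: clear the bits selected by `mask`, then OR in
 * `val`. Callers are expected to pass a `val` that lies within `mask`.
 */
uint32_t
set_reg_field(uint32_t reg, uint32_t mask, uint32_t val)
{
	assert((val & ~mask) == 0);
	return (reg & ~mask) | val;
}
```

The mask passed to such a helper is normally the shifted form, V_FIELD(M_FIELD), so that it covers exactly the bits the value can occupy.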

[dpdk-dev] [PATCH 0/6] cxgbe: Optimize tx/rx for 40GbE and add Jumbo Frame support for CXGBE PMD

2015-10-02 Thread Rahul Lakkireddy

This series of patches improve forwarding performance for Chelsio T5 40GbE
cards and add Jumbo Frame support for cxgbe pmd. Also update documentation
and release notes.

Rahul Lakkireddy (6):
  cxgbe: Optimize forwarding performance for 40G
  cxgbe: Update device info and perform sanity checks to enable jumbo
frames
  cxgbe: Update tx path to transmit jumbo frames
  cxgbe: Update rx path to receive jumbo frames
  cxgbe: Allow apps to change mtu
  doc: Update cxgbe documentation and release notes

 doc/guides/nics/cxgbe.rst|  81 -
 doc/guides/rel_notes/release_2_2.rst |   5 +
 drivers/net/cxgbe/base/t4_regs.h |  16 
 drivers/net/cxgbe/cxgbe.h|   3 +
 drivers/net/cxgbe/cxgbe_ethdev.c |  52 ++-
 drivers/net/cxgbe/cxgbe_main.c   |  10 +-
 drivers/net/cxgbe/sge.c  | 171 ++-
 7 files changed, 268 insertions(+), 70 deletions(-)

-- 
2.5.3

[dpdk-dev] [PATCH] devargs: add blacklisting by linux interface name

2015-10-02 Thread Richardson, Bruce

> -Original Message-
> From: Charles (Chas) Williams [mailto:3chas3 at gmail.com]
> 
> On Fri, 2015-10-02 at 16:18 +0100, Bruce Richardson wrote:
> > On Fri, Oct 02, 2015 at 11:00:07AM -0400, Chas Williams wrote:
> > > If a system is using deterministic interface names, it may be easier
> > > in some cases to use the interface name to blacklist an interface.
> > >
> >
> > Is it possible to do this using the existing arguments, i.e. have the
> > -b flag detect if it's a pci address or name automatically, rather
> > than having to use a separate command-line arg for it?
> 
> You might be able to distinguish names by context.  I doubt interface
> names ever look like PCI addresses.  But that's going to be a bigger
> change since -b will need to be updated to 'blacklist' instead of
> 'pci-blacklist' to prevent confusion.  Or do you just want to overload
> '-b' and keep both long options?
>
I'm not sure about that, to be honest. However, I'd rather not have
too many cmd line options to be maintained in the code. 

Does your proposed blacklisting patch work with non-pci devices as well
as with PCI ones as now?

/Bruce
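
The idea of letting -b tell a PCI address from an interface name by its shape could look roughly like the sketch below. This is only an illustration, not the proposed patch: looks_like_pci_addr() is a hypothetical helper, it checks the format only, and it recognizes just the fully qualified DDDD:BB:DD.F form, not the shorter BB:DD.F form.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Return 1 if `s` has the shape DDDD:BB:DD.F, where every D, B and F is a
 * hex digit, and 0 otherwise. Anything that fails the check would be
 * treated as an interface name by the caller.
 */
int
looks_like_pci_addr(const char *s)
{
	/* 'x' marks a hex digit; any other character must match literally. */
	static const char pattern[] = "xxxx:xx:xx.x";
	size_t i;

	assert(s != NULL);

	if (strlen(s) != sizeof(pattern) - 1)
		return 0;

	for (i = 0; pattern[i] != '\0'; i++) {
		if (pattern[i] == 'x') {
			if (!isxdigit((unsigned char)s[i]))
				return 0;
		} else if (s[i] != pattern[i]) {
			return 0;
		}
	}
	return 1;
}
```

Names such as eth0 or enp2s0f0 fail the length or punctuation check, so they would fall through to the interface-name path.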

[dpdk-dev] i40e SRIOV and dpdk

2015-10-02 Thread Serguei Bezverkhi (sbezverk)

Hello,

I was wondering if anybody has tested and managed to get the following
scenario working:

Host with 2 NICs: 1 Niantic 82599 and 1 Fortville i40e, both NICs configured
with 8 VFs. There are two VMs running on this host; each VM has two SRIOV
interfaces, one of each type (Niantic based VF and Fortville based VF). An
application inside the VM uses dpdk for packet forwarding, and the SRIOV
interfaces are bound to dpdk 2.0. Unicast connectivity between VMs via the
Niantic based SRIOV works, but fails over the Fortville based SRIOV. The
application uses the same DPDK API to initialize both SRIOV interfaces. When
the interfaces in the VMs are converted to use the ixgbevf and i40evf kernel
drivers, connectivity works for both types of VF. Clearly there is something
particular to the i40e dpdk driver.

I would greatly appreciate some feedback.

Thank you

Serguei

[dpdk-dev] DPDK install behavior Question

2015-10-02 Thread Arevalo, Mario Alfredo C

Hi,

While working on the patchset that adds new features to make install,
I missed asking some questions earlier:

For example, if you use only "make install":

the "T" variable gets the value "*" and the
makefiles build all TARGETS.

Is this the expected behavior?

And if you use "make install T=TARGET"
(e.g. make install T=x86_64-native-linuxapp-gcc),
the makefiles config, build and install dpdk;
however, the target just says "install", yet it does config
and build again.

Is this the expected behavior too?

Thank you so much.
Mario.

[dpdk-dev] [PATCH v3 8/8] mk: Add rule for installing runtime files

2015-10-02 Thread Arevalo, Mario Alfredo C

Answer for [PATCH v3 8/8] mk: Add rule for installing runtime files

Hi Panu,

Thank you for taking the time for this review :). In this patchset
I've tried to keep the current behavior (make install) untouched; I mean
the patches don't affect the current makefile rules, and they work like
"new features". For that reason, they were created as new rules. Now you
can do the following:

1) make config T=TARGET (creates a build directory with config files
according to TARGET and the directory environment)
2) make (builds the dpdk binaries)
At this point, if you choose one of the new rules from the patch series
(install-sdk, install-doc, install-bin, etc.),
the files that were built in the previous step will be installed in paths
according to this site:
http://www.freedesktop.org/software/systemd/man/file-hierarchy.html
This is only possible if build/.config exists.
3) However, if you use the old rules, they should keep the behavior they
had before the patches.

Example:
make install T=x86_64-native-linuxapp-gcc
Then the makefiles are going to config, build and install dpdk in a
directory using TARGET as its name.
thanks.
Mario.

From: Panu Matilainen [pmati...@redhat.com]
Sent: Friday, October 02, 2015 4:15 AM
To: Arevalo, Mario Alfredo C; dev at dpdk.org
Subject: Re: [dpdk-dev] [PATCH v3 8/8] mk: Add rule for installing runtime files

On 10/01/2015 03:11 AM, Mario Carrillo wrote:
> Add hierarchy-file support to the DPDK libraries, modules,
> binary files, nic bind files and documentation,
> when invoking "make install-fhs" (filesystem hierarchy standard)
> runtime files will be by default installed in:
> $(DESTDIR)/$(BIN_DIR) where BIN_DIR=/usr/bin (binary files)
> $(DESTDIR)/$(SBIN_DIR) where SBIN_DIR=/usr/sbin/dpdk_nic_bind (nic bind
> files)
> $(DESTDIR)/$(DOC_DIR) where DOC_DIR=/usr/share/doc/dpdk (documentation)
> $(DESTDIR)/$(LIB_DIR)  (libraries)
> if the architecture is 64 bits then LIB_DIR=/usr/lib64
> else LIB_DIR=/usr/lib
> $(DESTDIR)/$(KERNEL_DIR) (modules)
> if RTE_EXEC_ENV=linuxapp then
> KERNEL_DIR=/lib/modules/$(uname -r)/build
> else KERNEL_DIR=/boot/modules
> All directory variables mentioned above can be overridden.
> This hierarchy is based on:
> http://www.freedesktop.org/software/systemd/man/file-hierarchy.html
>

Hmm, I think there's a slight misunderstanding here.

What I meant earlier by install-sdk and install-fhs is to preserve the
current behavior of "make install" as "make install-sdk" and have "make
install-fhs" behave like normal OSS app on "make install", which
installs everything (both devel and runtime parts)

This patch series eliminates the current behavior of "make install"
entirely. I personally would not miss it at all, but there likely are
people relying on it since it's quite visibly documented and all. So I
think the idea was to introduce a separate FHS-installation target and
then deal with the notion of default behaviors etc separately.

I guess it was already this way in v2 of the series, apologies for
missing it there.

- Panu -

[dpdk-dev] [PATCH v3 6/8] mk: Add rule for installing nic bind files

2015-10-02 Thread Arevalo, Mario Alfredo C

Hi Panu and Bruce, your suggestion sounds good :). I'm going
to change the patch to install dpdk_nic_bind.py and cpu_layout.py in
/usr/bin.

Thanks.
Mario.

From: Richardson, Bruce
Sent: Friday, October 02, 2015 3:54 AM
To: Panu Matilainen; Arevalo, Mario Alfredo C; dev at dpdk.org
Subject: RE: [dpdk-dev] [PATCH v3 6/8] mk: Add rule for installing nic bind files

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Panu Matilainen
> Sent: Friday, October 2, 2015 11:50 AM
> To: Arevalo, Mario Alfredo C; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 6/8] mk: Add rule for installing nic
> bind files
>
> On 10/01/2015 03:11 AM, Mario Carrillo wrote:
> > Add hierarchy-file support to the DPDK nic bind files, when invoking
> > "make install-sbin" nic bind files will be installed by default in:
> > $(DESTDIR)/$(SBIN_DIR) where SBIN_DIR=/usr/sbin/dpdk_nic_bind by
> > default, you can override SBIN_DIR var.
> > This hierarchy is based on:
> > http://www.freedesktop.org/software/systemd/man/file-hierarchy.html
> > and dpdk spec file.
> >
> > Signed-off-by: Mario Carrillo 
> > ---
> >   mk/rte.sdkinstall.mk | 14 ++
> >   mk/rte.sdkroot.mk|  4 ++--
> >   2 files changed, 16 insertions(+), 2 deletions(-)
> >
> > diff --git a/mk/rte.sdkinstall.mk b/mk/rte.sdkinstall.mk index
> > 5a2fd40..4eecf31 100644
> > --- a/mk/rte.sdkinstall.mk
> > +++ b/mk/rte.sdkinstall.mk
> > @@ -46,11 +46,13 @@ else
> >   INCLUDE_DIR ?= /usr/include/dpdk
> >   BIN_DIR ?= /usr/bin
> >   DOC_DIR ?= /usr/share/doc/dpdk
> > +SBIN_DIR ?= /usr/sbin/dpdk_nic_bind
> >   HSLINKS := $(wildcard $(RTE_OUTPUT)/include/*)
> >   BINARY_FILES := $(patsubst %.map,,$(wildcard $(RTE_OUTPUT)/app/*))
> >   LIBS := $(wildcard $(RTE_OUTPUT)/lib/*)
> >   MODULES := $(wildcard $(RTE_OUTPUT)/kmod/*)
> >   DOCS := $(wildcard $(BUILD_DIR)/doc/*)
> > +NIC_BIND_FILES := $(wildcard $(BUILD_DIR)/tools/*nic_bind.py)
> >   include $(BUILD_DIR)/build/.config
> >   RTE_ARCH := $(CONFIG_RTE_ARCH:"%"=%)
> >   RTE_EXEC_ENV := $(CONFIG_RTE_EXEC_ENV:"%"=%) @@ -161,6 +163,18 @@
> > install-doc:
> > echo installing: $$DOC; \
> > done
> >   #
> > +# install nic bind files in /usr/sbin/dpdk_nic_bind # by default
> > +SBIN_DIR can be overridden.
> > +#
>
> This creates an out-of-path directory /usr/sbin/dpdk_nic_bind/ in which
> the dpdk_nic_bind.py is installed. Besides not being a very accessible
> location, the FHS explicitly forbids creation of subdirectories below
> /usr/[s]bin.
>
> SBIN_DIR should be /usr/sbin unless overridden, but OTOH I think this
> could go into /usr/bin just as well, the split is fairly ambiguous anyway
> (I mean, testpmd is not something a regular user is going to run
> either)
>
> In addition, if dpdk_nic_bind.py is installed then perhaps the
> cpu_layout.py utility should be installed too?
>
>   - Panu -

I think there are better utilities available for determining the core layout
than cpu_layout.py. "lstopo", for one, is much more powerful. Do we want/need
to keep our own script around for that?

/Bruce

[dpdk-dev] [PATCH v3 4/8] mk: Add rule for installing modules

2015-10-02 Thread Arevalo, Mario Alfredo C

 Hi Panu, perfect, thank you for your feedback. I am going to change that
path to /lib/modules/$(uname -r)/extra/ in my patch.

Thanks.
Mario.

From: Panu Matilainen [pmati...@redhat.com]
Sent: Friday, October 02, 2015 3:38 AM
To: Arevalo, Mario Alfredo C; dev at dpdk.org
Subject: Re: [dpdk-dev] [PATCH v3 4/8] mk: Add rule for installing modules

On 10/01/2015 03:11 AM, Mario Carrillo wrote:
> Add hierarchy-file support to the DPDK modules,
> when invoking "make install-mod" modules will be
> installed in: $(DESTDIR)/$(KERNEL_DIR)
> if RTE_EXEC_ENV=linuxapp then
> KERNEL_DIR=/lib/modules/$(uname -r)/build

/lib/modules/$(uname -r)/build is the path you need when *building*
external kernel modules, you don't want to install anything there.

The default install path for the kernel modules should be somewhere
within /lib/modules/$(uname -r)/extra/, but dunno what the recommended
naming/placing within that is.

Sorry for not catching this earlier,

- Panu -

[dpdk-dev] [PATCH] devargs: add blacklisting by linux interface name

2015-10-02 Thread Bruce Richardson

On Fri, Oct 02, 2015 at 11:00:07AM -0400, Chas Williams wrote:
> If a system is using deterministic interface names, it may be easier in
> some cases to use the interface name to blacklist an interface.
>

Is it possible to do this using the existing arguments, i.e. have the -b flag
detect if it's a pci address or name automatically, rather than having to use
a separate command-line arg for it?

/Bruce
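
A minimal sketch of the auto-detection idea discussed above, assuming the
argument to -b is either a PCI address in "DDDD:BB:DD.F" or "BB:DD.F" form
or a Linux interface name. The helper name looks_like_pci_addr() is
hypothetical and not part of the devargs API; it only shows that the two
kinds of argument can be told apart by syntax.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Consume exactly n hex digits from *s; advance *s only on success. */
static bool eat_hex(const char **s, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		/* A NUL terminator is not a hex digit, so this never over-reads. */
		if (!isxdigit((unsigned char)(*s)[i]))
			return false;
	}
	*s += n;
	return true;
}

/* Consume one literal character from *s. */
static bool eat_char(const char **s, char c)
{
	if (**s != c)
		return false;
	(*s)++;
	return true;
}

/*
 * Hypothetical helper: true if arg is syntactically a PCI address
 * ("DDDD:BB:DD.F" or "BB:DD.F", all fields hex), false otherwise.
 * Anything that fails this test could be treated as an interface name.
 */
bool looks_like_pci_addr(const char *arg)
{
	const char *p;

	assert(arg != NULL);

	/* Long form with a 4-digit domain. */
	p = arg;
	if (eat_hex(&p, 4) && eat_char(&p, ':') &&
	    eat_hex(&p, 2) && eat_char(&p, ':') &&
	    eat_hex(&p, 2) && eat_char(&p, '.') &&
	    eat_hex(&p, 1) && *p == '\0')
		return true;

	/* Short form without a domain. */
	p = arg;
	return eat_hex(&p, 2) && eat_char(&p, ':') &&
	       eat_hex(&p, 2) && eat_char(&p, '.') &&
	       eat_hex(&p, 1) && *p == '\0';
}
```

With a check like this, "0000:01:00.0" and "01:00.0" would be handled as PCI
addresses, while "eth0" or "enp1s0f0" would fall through to the name path.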

[dpdk-dev] [PATCH v3] ring: add function to free a ring

2015-10-02 Thread Bruce Richardson

On Fri, Oct 02, 2015 at 03:01:25PM +0100, Pablo de Lara wrote:
> From: "Pablo de Lara" 
> 
> When creating a ring, a memzone is created to allocate it in memory,
> but the ring could not be freed, as memzones could not be.
> 
> Since memzones can be freed now, then rings can be as well,
> taking into account if they were initialized using pre-allocated memory
> (in which case, memory should be freed externally) or using rte_memzone_reserve
> (with rte_ring_create), freeing the memory with rte_memzone_free.
> 
> Signed-off-by: Pablo de Lara 
> ---
> Changes in v3:
>  - Simplify patch using stored memzone address in ring structure
>  - Change copyright date

I think you need to call out that this patch depends upon
http://dpdk.org/dev/patchwork/patch/7308/

> 
> Changes in v2:
>  - Include note in release notes
>  - Add error log when ring cannot be freed
> 
> This patch depends on patch "rte_ring: store memzone pointer inside ring"
> 
>  doc/guides/rel_notes/release_2_2.rst |  4 +++
>  lib/librte_ring/rte_ring.c   | 47 +++-
>  lib/librte_ring/rte_ring.h   |  7 ++
>  lib/librte_ring/rte_ring_version.map |  7 ++
>  4 files changed, 64 insertions(+), 1 deletion(-)
> 
> diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
> index 5687676..24937ac 100644
> --- a/doc/guides/rel_notes/release_2_2.rst
> +++ b/doc/guides/rel_notes/release_2_2.rst
> @@ -4,6 +4,10 @@ DPDK Release 2.2
>  New Features
>  
>  
> +* **Enabled freeing of rte_ring.**
> +
> +  New function rte_ring_free() allows the user to free a ring
> +  if it was created with rte_ring_create().
>  
>  Resolved Issues
>  ---
> diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
> index 4e78e14..d80faf3 100644
> --- a/lib/librte_ring/rte_ring.c
> +++ b/lib/librte_ring/rte_ring.c
> @@ -1,7 +1,7 @@
>  /*-
>   *   BSD LICENSE
>   *
> - *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> + *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
>   *   All rights reserved.
>   *
>   *   Redistribution and use in source and binary forms, with or without
> @@ -209,6 +209,51 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
>   return r;
>  }
>  
> +/* free the ring */
> +void
> +rte_ring_free(struct rte_ring *r)
> +{
> + struct rte_ring_list *ring_list = NULL;
> + struct rte_tailq_entry *te;
> +
> + if (r == NULL)
> + return;
> +
> + /*
> +  * Ring was not created with rte_ring_create,
> +  * therefore, there is no memzone to free.
> +  */
> + if (r->memzone == NULL) {
> + RTE_LOG(ERR, RING, "Cannot free ring (not created with rte_ring_create()");
> + return;
> + }
> +
> + if (rte_memzone_free(r->memzone) != 0) {
> + RTE_LOG(ERR, RING, "Cannot free memory\n");
> + return;
> + }
> +
> + ring_list = RTE_TAILQ_CAST(rte_ring_tailq.head, rte_ring_list);
> + rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
> +
> + /* find out tailq entry */
> + TAILQ_FOREACH(te, ring_list, next) {
> + if (te->data == (void *) r)
> + break;
> + }
> +
> + if (te == NULL) {
> + rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
> + return;
> + }
> +
> + TAILQ_REMOVE(ring_list, te, next);
> +
> + rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
> +
> + rte_free(te);
> +}
> +
>  /*
>   * change the high water mark. If *count* is 0, water marking is
>   * disabled
> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> index df45f3f..fb5a626 100644
> --- a/lib/librte_ring/rte_ring.h
> +++ b/lib/librte_ring/rte_ring.h
> @@ -304,6 +304,13 @@ int rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
>   */
>  struct rte_ring *rte_ring_create(const char *name, unsigned count,
>int socket_id, unsigned flags);
> +/**
> + * De-allocate all memory used by the ring.
> + *
> + * @param r
> + *   Ring to free
> + */
> +void rte_ring_free(struct rte_ring *r);
>  
>  /**
>   * Change the high water mark.
> diff --git a/lib/librte_ring/rte_ring_version.map b/lib/librte_ring/rte_ring_version.map
> index 982fdd1..5474b98 100644
> --- a/lib/librte_ring/rte_ring_version.map
> +++ b/lib/librte_ring/rte_ring_version.map
> @@ -11,3 +11,10 @@ DPDK_2.0 {
>  
>   local: *;
>  };
> +
> +DPDK_2.2 {
> + global:
> +
> + rte_ring_free;
> +
> +} DPDK_2.0;
> -- 
> 2.4.3
>

[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-10-02 Thread Bruce Richardson

On Fri, Oct 02, 2015 at 05:00:14PM +0300, Michael S. Tsirkin wrote:
> On Thu, Oct 01, 2015 at 02:02:24PM -0700, Alexander Duyck wrote:
> > validation and translation would add 10s if not 100s of nanoseconds to the
> > time needed to process each packet.  In addition we are talking about doing
> > this in kernel space which means we wouldn't really be able to take
> > advantage of things like SSE or AVX instructions.
> 
> Yes. But the nice thing is that it's rearming so it can happen on
> a separate core, in parallel with packet processing.
> It does not need to add to latency.
> 
> You will burn up more CPU, but again, all this for boxes/hypervisors
> without an IOMMU.
> 
> I'm sure people can come up with even better approaches, once enough
> people get it that kernel absolutely needs to be protected from
> userspace.
> 
> Long term, the right thing to do is to focus on IOMMU support. This
> gives you hardware-based memory protection without need to burn up CPU
> cycles.
> 
> -- 
> MST

Running it on another core will have its own problems. The main one that springs to
mind for me is the performance impact of having all those cache lines shared
between the two cores.

/Bruce

[dpdk-dev] [PATCH v3] ring: add function to free a ring

2015-10-02 Thread Pablo de Lara

From: "Pablo de Lara" 

When creating a ring, a memzone is created to allocate it in memory,
but the ring could not be freed, as memzones could not be.

Since memzones can be freed now, then rings can be as well,
taking into account if they were initialized using pre-allocated memory
(in which case, memory should be freed externally) or using rte_memzone_reserve
(with rte_ring_create), freeing the memory with rte_memzone_free.

Signed-off-by: Pablo de Lara 
---
Changes in v3:
 - Simplify patch using stored memzone address in ring structure
 - Change copyright date

Changes in v2:
 - Include note in release notes
 - Add error log when ring cannot be freed

This patch depends on patch "rte_ring: store memzone pointer inside ring"

 doc/guides/rel_notes/release_2_2.rst |  4 +++
 lib/librte_ring/rte_ring.c   | 47 +++-
 lib/librte_ring/rte_ring.h   |  7 ++
 lib/librte_ring/rte_ring_version.map |  7 ++
 4 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
index 5687676..24937ac 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -4,6 +4,10 @@ DPDK Release 2.2
 New Features
 

+* **Enabled freeing of rte_ring.**
+
+  New function rte_ring_free() allows the user to free a ring
+  if it was created with rte_ring_create().

 Resolved Issues
 ---
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 4e78e14..d80faf3 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -209,6 +209,51 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
return r;
 }

+/* free the ring */
+void
+rte_ring_free(struct rte_ring *r)
+{
+   struct rte_ring_list *ring_list = NULL;
+   struct rte_tailq_entry *te;
+
+   if (r == NULL)
+   return;
+
+   /*
+* Ring was not created with rte_ring_create,
+* therefore, there is no memzone to free.
+*/
+   if (r->memzone == NULL) {
+   RTE_LOG(ERR, RING, "Cannot free ring (not created with rte_ring_create()");
+   return;
+   }
+
+   if (rte_memzone_free(r->memzone) != 0) {
+   RTE_LOG(ERR, RING, "Cannot free memory\n");
+   return;
+   }
+
+   ring_list = RTE_TAILQ_CAST(rte_ring_tailq.head, rte_ring_list);
+   rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+
+   /* find out tailq entry */
+   TAILQ_FOREACH(te, ring_list, next) {
+   if (te->data == (void *) r)
+   break;
+   }
+
+   if (te == NULL) {
+   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+   return;
+   }
+
+   TAILQ_REMOVE(ring_list, te, next);
+
+   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+
+   rte_free(te);
+}
+
 /*
  * change the high water mark. If *count* is 0, water marking is
  * disabled
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index df45f3f..fb5a626 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -304,6 +304,13 @@ int rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
  */
 struct rte_ring *rte_ring_create(const char *name, unsigned count,
 int socket_id, unsigned flags);
+/**
+ * De-allocate all memory used by the ring.
+ *
+ * @param r
+ *   Ring to free
+ */
+void rte_ring_free(struct rte_ring *r);

 /**
  * Change the high water mark.
diff --git a/lib/librte_ring/rte_ring_version.map b/lib/librte_ring/rte_ring_version.map
index 982fdd1..5474b98 100644
--- a/lib/librte_ring/rte_ring_version.map
+++ b/lib/librte_ring/rte_ring_version.map
@@ -11,3 +11,10 @@ DPDK_2.0 {

local: *;
 };
+
+DPDK_2.2 {
+   global:
+
+   rte_ring_free;
+
+} DPDK_2.0;
-- 
2.4.3
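
The ownership rule described in the commit message (a ring can only be freed
if the create path allocated its memory) can be sketched without DPDK. This
is a toy model under stated assumptions: struct toy_ring and the toy_*
functions are hypothetical stand-ins, plain malloc()/free() replace the
memzone calls, and toy_ring_free() returns a status code purely so the
behaviour can be checked, whereas rte_ring_free() in the patch returns void.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct rte_ring: only the ownership field. */
struct toy_ring {
	void *memzone; /* non-NULL only when the create path allocated it */
};

/* Create path: the library allocates the backing memory and records it. */
struct toy_ring *toy_ring_create(void)
{
	void *zone = malloc(sizeof(struct toy_ring));
	struct toy_ring *r = zone;

	if (r == NULL)
		return NULL;
	r->memzone = zone;
	return r;
}

/* Init path: the caller supplies the memory, so nothing is recorded. */
void toy_ring_init(struct toy_ring *r)
{
	assert(r != NULL);
	r->memzone = NULL;
}

/*
 * Free path: NULL is a no-op, caller-owned rings are refused,
 * library-owned rings release their backing memory.
 * Returns 0 when there is nothing left to do, -1 when refused.
 */
int toy_ring_free(struct toy_ring *r)
{
	if (r == NULL)
		return 0;
	if (r->memzone == NULL)
		return -1; /* not created by toy_ring_create() */
	free(r->memzone);
	return 0;
}
```

A ring set up with toy_ring_init() on caller memory is rejected by
toy_ring_free(), mirroring the "memory should be freed externally" case.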

[dpdk-dev] [PATCH] devargs: add blacklisting by linux interface name

2015-10-02 Thread Charles (Chas) Williams

On Fri, 2015-10-02 at 16:44 +, Richardson, Bruce wrote:
> > -----Original Message-----
> > From: Charles (Chas) Williams [mailto:3chas3 at gmail.com]
> > 
> > On Fri, 2015-10-02 at 16:18 +0100, Bruce Richardson wrote:
> > > On Fri, Oct 02, 2015 at 11:00:07AM -0400, Chas Williams wrote:
> > > > If a system is using deterministic interface names, it may be easier
> > > > in some cases to use the interface name to blacklist an interface.
> > > >
> > >
> > > Is it possible to do this using the existing arguments, i.e. have the
> > > -b flag detect if it's a pci address or name automatically, rather
> > > than having to use a separate command-line arg for it?
> > 
> > You might be able to distinguish names by context.  I doubt interface
> > names ever look like PCI addresses.  But that's going to be a bigger
> > change since -b will need to be updated to 'blacklist' instead of
> > 'pci-blacklist' to prevent confusion.  Or do you just want to overload
> > '-b' and keep both long options?
> >
> I'm not sure about that, to be honest. However, I'd rather not have
> too many cmd line options to be maintained in the code. 
> 
> Does your proposed blacklisting patch work with non-pci devices as well
> as with PCI ones as now?

Unfortunately, the devargs API is rather PCI specific -- it takes a PCI
device.  Nothing prevents you from writing a device specific version of
the devargs API though for your device class since the devargs list isn't
static but checking for certain devargs wouldn't make sense in some cases.
Checking to see if a USB device matched a blacklisted PCI device would
be pointless.

Other devices (like Xen or hyperv) have a net/ directory/link in their /sys
entry that lets you determine an interface name.  I think it's the same
for USB ethernet devices -- I don't happen to have one to check.
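
A rough sketch of the net/ lookup mentioned above, assuming the device's
sysfs directory contains a "net" subdirectory whose entries are interface
names. The function name ifname_from_sysfs() is hypothetical, and the code
simply takes the first entry it finds; a device exposing several interfaces
would need more care.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy the first entry found under "<sysfs_dev>/net"
 * into ifname. Returns 0 on success, -1 if the directory is missing,
 * empty, or the name does not fit in the buffer.
 */
int ifname_from_sysfs(const char *sysfs_dev, char *ifname, size_t len)
{
	char path[4096];
	struct dirent *e;
	DIR *d;
	int n;
	int ret = -1;

	assert(sysfs_dev != NULL && ifname != NULL && len > 0);

	n = snprintf(path, sizeof(path), "%s/net", sysfs_dev);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	d = opendir(path);
	if (d == NULL)
		return -1; /* no net/ directory: not a network device */

	while ((e = readdir(d)) != NULL) {
		if (e->d_name[0] == '.')
			continue; /* skip ".", ".." and hidden entries */
		if (strlen(e->d_name) >= len)
			break; /* name would be truncated, report failure */
		snprintf(ifname, len, "%s", e->d_name);
		ret = 0;
		break;
	}

	closedir(d);
	return ret;
}
```

The resulting name could then be compared against a blacklisted interface
name, independent of which bus the device sits on.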

[dpdk-dev] [PATCH v3] ring: add function to free a ring

2015-10-02 Thread De Lara Guarch, Pablo

Hi Bruce,

> -----Original Message-----
> From: Richardson, Bruce
> Sent: Friday, October 02, 2015 3:10 PM
> To: De Lara Guarch, Pablo
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3] ring: add function to free a ring
> 
> On Fri, Oct 02, 2015 at 03:01:25PM +0100, Pablo de Lara wrote:
> > From: "Pablo de Lara" 
> >
> > When creating a ring, a memzone is created to allocate it in memory,
> > but the ring could not be freed, as memzones could not be.
> >
> > Since memzones can be freed now, then rings can be as well,
> > taking into account if they were initialized using pre-allocated memory
> > (in which case, memory should be freed externally) or using rte_memzone_reserve
> > (with rte_ring_create), freeing the memory with rte_memzone_free.
> >
> > Signed-off-by: Pablo de Lara 
> > ---
> > Changes in v3:
> >  - Simplify patch using stored memzone address in ring structure
> >  - Change copyright date
> 
> I think you need to call out that this patch depends upon
> http://dpdk.org/dev/patchwork/patch/7308/

I did below, probably I should have included the patch ID :S

> 
> >
> > Changes in v2:
> >  - Include note in release notes
> >  - Add error log when ring cannot be freed
> >
> > This patch depends on patch "rte_ring: store memzone pointer inside ring"
> >
> >  doc/guides/rel_notes/release_2_2.rst |  4 +++
> >  lib/librte_ring/rte_ring.c   | 47 +++-
> >  lib/librte_ring/rte_ring.h   |  7 ++
> >  lib/librte_ring/rte_ring_version.map |  7 ++
> >  4 files changed, 64 insertions(+), 1 deletion(-)
> >
> > diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
> > index 5687676..24937ac 100644
> > --- a/doc/guides/rel_notes/release_2_2.rst
> > +++ b/doc/guides/rel_notes/release_2_2.rst
> > @@ -4,6 +4,10 @@ DPDK Release 2.2
> >  New Features
> >  
> >
> > +* **Enabled freeing of rte_ring.**
> > +
> > +  New function rte_ring_free() allows the user to free a ring
> > +  if it was created with rte_ring_create().
> >
> >  Resolved Issues
> >  ---
> > diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
> > index 4e78e14..d80faf3 100644
> > --- a/lib/librte_ring/rte_ring.c
> > +++ b/lib/librte_ring/rte_ring.c
> > @@ -1,7 +1,7 @@
> >  /*-
> >   *   BSD LICENSE
> >   *
> > - *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> > + *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
> >   *   All rights reserved.
> >   *
> >   *   Redistribution and use in source and binary forms, with or without
> > @@ -209,6 +209,51 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
> > return r;
> >  }
> >
> > +/* free the ring */
> > +void
> > +rte_ring_free(struct rte_ring *r)
> > +{
> > +   struct rte_ring_list *ring_list = NULL;
> > +   struct rte_tailq_entry *te;
> > +
> > +   if (r == NULL)
> > +   return;
> > +
> > +   /*
> > +* Ring was not created with rte_ring_create,
> > +* therefore, there is no memzone to free.
> > +*/
> > +   if (r->memzone == NULL) {
> > +   RTE_LOG(ERR, RING, "Cannot free ring (not created with rte_ring_create()");
> > +   return;
> > +   }
> > +
> > +   if (rte_memzone_free(r->memzone) != 0) {
> > +   RTE_LOG(ERR, RING, "Cannot free memory\n");
> > +   return;
> > +   }
> > +
> > +   ring_list = RTE_TAILQ_CAST(rte_ring_tailq.head, rte_ring_list);
> > +   rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
> > +
> > +   /* find out tailq entry */
> > +   TAILQ_FOREACH(te, ring_list, next) {
> > +   if (te->data == (void *) r)
> > +   break;
> > +   }
> > +
> > +   if (te == NULL) {
> > +   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
> > +   return;
> > +   }
> > +
> > +   TAILQ_REMOVE(ring_list, te, next);
> > +
> > +   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
> > +
> > +   rte_free(te);
> > +}
> > +
> >  /*
> >   * change the high water mark. If *count* is 0, water marking is
> >   * disabled
> > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > index df45f3f..fb5a626 100644
> > --- a/lib/librte_ring/rte_ring.h
> > +++ b/lib/librte_ring/rte_ring.h
> > @@ -304,6 +304,13 @@ int rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
> >   */
> >  struct rte_ring *rte_ring_create(const char *name, unsigned count,
> >  int socket_id, unsigned flags);
> > +/**
> > + * De-allocate all memory used by the ring.
> > + *
> > + * @param r
> > + *   Ring to free
> > + */
> > +void rte_ring_free(struct rte_ring *r);
> >
> >  /**
> >   * Change the high water mark.
> > diff --git a/lib/librte_ring/rte_ring_version.map b/lib/librte_ring/rte_ring_version.map
> > index 982fdd1..5474b98 100644
> > --- a/lib/librte_ring/rte_ring_version.map
> > +++ b/lib/librte_ring/rte_ring_version.map
> > @@ -11,3 +11,10 @@ DPDK_2.0 {
> >
> > local: *;
> >  };
> > +
> >

[dpdk-dev] [PATCH] ethdev: distinguish between drop and error stats

2015-10-02 Thread Maryam Tahhan

Make a distinction between dropped packets and error statistics to allow
a higher level fault management entity to interact with DPDK and take
appropriate measures when errors are detected. It will also provide
valuable information for any applications that collect/extract DPDK
stats; such applications include Open vSwitch.
After this patch the distinction is:
ierrors = Total number of packets dropped by hardware (malformed
packets, ...) Where the # of drops can ONLY be <=  the packets received
(without overlap between registers).
Rx_pkt_errors = Total number of erroneous received packets. Where the #
of errors can be >= the packets received (without overlap between
registers), this is because there may be multiple errors associated with
a packet.

Signed-off-by: Maryam Tahhan 
---
 lib/librte_ether/rte_ethdev.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 8a8c82b..53dd55d 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -200,8 +200,9 @@ struct rte_eth_stats {
/**< Deprecated; Total of RX packets with CRC error. */
uint64_t ibadlen;
/**< Deprecated; Total of RX packets with bad length. */
-   uint64_t ierrors;   /**< Total number of erroneous received packets. */
+   uint64_t ierrors;   /**< Total number of dropped received packets. */
uint64_t oerrors;   /**< Total number of failed transmitted packets. */
+   uint64_t ipkterrors;   /**< Total number of erroneous received packets. */
uint64_t imcasts;
/**< Deprecated; Total number of multicast received packets. */
uint64_t rx_nombuf; /**< Total number of RX mbuf allocation failures. */
-- 
2.4.3

[dpdk-dev] [PATCH v3 4/8] mk: Add rule for installing modules

2015-10-02 Thread Panu Matilainen

On 10/01/2015 03:11 AM, Mario Carrillo wrote:
> Add hierarchy-file support to the DPDK modules,
> when invoking "make install-mod" modules will be
> installed in: $(DESTDIR)/$(KERNEL_DIR)
> if RTE_EXEC_ENV=linuxapp then
> KERNEL_DIR=/lib/modules/$(uname -r)/build

/lib/modules/$(uname -r)/build is the path you need when *building* 
external kernel modules, you don't want to install anything there.

The default install path for the kernel modules should be somewhere 
within /lib/modules/$(uname -r)/extra/, but dunno what the recommended 
naming/placing within that is.

Sorry for not catching this earlier,

- Panu -
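
As a small illustration of the path layout discussed above, the sketch below
formats an install directory under /lib/modules/<release>/extra/. The
function names and the "dpdk" subdirectory used in the usage note are
assumptions for illustration only; the thread leaves the exact placement
within extra/ open.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>

/*
 * Hypothetical helper: format "/lib/modules/<release>/extra/<subdir>".
 * Returns 0 on success, -1 if the result does not fit in out.
 */
int module_install_dir(const char *release, const char *subdir,
		       char *out, size_t len)
{
	int n;

	assert(release != NULL && subdir != NULL && out != NULL);

	n = snprintf(out, len, "/lib/modules/%s/extra/%s", release, subdir);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Same, using the running kernel's release, as `uname -r` would print it. */
int module_install_dir_running(const char *subdir, char *out, size_t len)
{
	struct utsname u;

	if (uname(&u) != 0)
		return -1;
	return module_install_dir(u.release, subdir, out, len);
}
```

Calling module_install_dir("4.2.0", "dpdk", buf, sizeof(buf)) would yield
"/lib/modules/4.2.0/extra/dpdk", as opposed to the build/ symlink, which
points at the kernel build tree rather than an install location.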

[dpdk-dev] [PATCH] devargs: add blacklisting by linux interface name

2015-10-02 Thread Charles (Chas) Williams

On Fri, 2015-10-02 at 16:18 +0100, Bruce Richardson wrote:
> On Fri, Oct 02, 2015 at 11:00:07AM -0400, Chas Williams wrote:
> > If a system is using deterministic interface names, it may be easier in
> > some cases to use the interface name to blacklist an interface.
> >
> 
> Is it possible to do this using the existing arguments, i.e. have the -b flag
> detect if it's a pci address or name automatically, rather than having to use
> a separate command-line arg for it?

You might be able to distinguish names by context.  I doubt interface
names ever look like PCI addresses.  But that's going to be a bigger
change since -b will need to be updated to 'blacklist' instead of
'pci-blacklist' to prevent confusion.  Or do you just want to overload
'-b' and keep both long options?

[dpdk-dev] [PATCH v3 8/8] mk: Add rule for installing runtime files

2015-10-02 Thread Bruce Richardson

On Fri, Oct 02, 2015 at 02:15:29PM +0300, Panu Matilainen wrote:
> On 10/01/2015 03:11 AM, Mario Carrillo wrote:
> >Add hierarchy-file support to the DPDK libraries, modules,
> >binary files, nic bind files and documentation,
> >when invoking "make install-fhs" (filesystem hierarchy standard)
> >runtime files will be by default installed in:
> >$(DESTDIR)/$(BIN_DIR) where BIN_DIR=/usr/bin (binary files)
> >$(DESTDIR)/$(SBIN_DIR) where SBIN_DIR=/usr/sbin/dpdk_nic_bind (nic bind
> >files)
> >$(DESTDIR)/$(DOC_DIR) where DOC_DIR=/usr/share/doc/dpdk (documentation)
> >$(DESTDIR)/$(LIB_DIR)  (libraries)
> >if the architecture is 64 bits then LIB_DIR=/usr/lib64
> >else LIB_DIR=/usr/lib
> >$(DESTDIR)/$(KERNEL_DIR) (modules)
> >if RTE_EXEC_ENV=linuxapp then
> >KERNEL_DIR=/lib/modules/$(uname -r)/build
> >else KERNEL_DIR=/boot/modules
> >All directory variables mentioned above can be overridden.
> >This hierarchy is based on:
> >http://www.freedesktop.org/software/systemd/man/file-hierarchy.html
> >
> 
> Hmm, I think there's a slight misunderstanding here.
> 
> What I meant earlier by install-sdk and install-fhs is to preserve the
> current behavior of "make install" as "make install-sdk" and have "make
> install-fhs" behave like normal OSS app on "make install", which installs
> everything (both devel and runtime parts)
> 
> This patch series eliminates the current behavior of "make install"
> entirely. I personally would not miss it at all, but there likely are people
> relying on it since its quite visibly documented and all. So I think the
> idea was to introduce a separate FHS-installation target and then deal with
> the notion of default behaviors etc separately.
> 
> I guess it was already this way in v2 of the series, apologies for missing
> it there.
> 
>   - Panu -

I also think that having some way to get the old behaviour for those relying on
it would be good. Even though it's not ABI affecting, for those compiling from
source it would be nice to follow some sort of gradual deprecation process
rather than just changing everything in one go.

/Bruce

[dpdk-dev] [PATCH 3/3] Modifying configuration scripts for Netronome's nfp_uio driver.

2015-10-02 Thread Alejandro.Lucero

From: "Alejandro.Lucero" 

Signed-off-by: Alejandro.Lucero 
Signed-off-by: Rolf.Neugebauer 
---
 tools/dpdk_nic_bind.py |8 ++--
 tools/setup.sh |  122 ++--
 2 files changed, 101 insertions(+), 29 deletions(-)

diff --git a/tools/dpdk_nic_bind.py b/tools/dpdk_nic_bind.py
index b7bd877..f7f8a39 100755
--- a/tools/dpdk_nic_bind.py
+++ b/tools/dpdk_nic_bind.py
@@ -43,7 +43,7 @@ ETHERNET_CLASS = "0200"
 # Each device within this is itself a dictionary of device properties
 devices = {}
 # list of supported DPDK drivers
-dpdk_drivers = [ "igb_uio", "vfio-pci", "uio_pci_generic" ]
+dpdk_drivers = [ "igb_uio", "vfio-pci", "uio_pci_generic", "nfp_uio" ]

 # command-line arg flags
 b_flag = None
@@ -153,7 +153,7 @@ def find_module(mod):
 return path

 def check_modules():
-'''Checks that igb_uio is loaded'''
+'''Checks that at least one dpdk module is loaded'''
 global dpdk_drivers

 fd = file("/proc/modules")
@@ -261,7 +261,7 @@ def get_nic_details():
 devices[d]["Active"] = "*Active*"
 break;

-# add igb_uio to list of supporting modules if needed
+# add module to list of supporting modules if needed
 if "Module_str" in devices[d]:
 for driver in dpdk_drivers:
 if driver not in devices[d]["Module_str"]:
@@ -440,7 +440,7 @@ def display_devices(title, dev_list, extra_params = None):

 def show_status():
 '''Function called when the script is passed the "--status" option. Displays
-to the user what devices are bound to the igb_uio driver, the kernel driver
+to the user what devices are bound to a dpdk driver, the kernel driver
 or to no driver'''
 global dpdk_drivers
 kernel_drv = []
diff --git a/tools/setup.sh b/tools/setup.sh
index 5a8b2f3..e434ddb 100755
--- a/tools/setup.sh
+++ b/tools/setup.sh
@@ -236,6 +236,52 @@ load_vfio_module()
 }

 #
+# Unloads nfp_uio.ko.
+#
+remove_nfp_uio_module()
+{
+   echo "Unloading any existing DPDK UIO module"
+   /sbin/lsmod | grep -s nfp_uio > /dev/null
+   if [ $? -eq 0 ] ; then
+   sudo /sbin/rmmod nfp_uio
+   fi
+}
+
+#
+# Loads new nfp_uio.ko (and uio module if needed).
+#
+load_nfp_uio_module()
+{
+   echo "Using RTE_SDK=$RTE_SDK and RTE_TARGET=$RTE_TARGET"
+   if [ ! -f $RTE_SDK/$RTE_TARGET/kmod/nfp_uio.ko ];then
+   echo "## ERROR: Target does not have the DPDK UIO Kernel Module."
+   echo "   To fix, please try to rebuild target."
+   return
+   fi
+
+   remove_nfp_uio_module
+
+   /sbin/lsmod | grep -s uio > /dev/null
+   if [ $? -ne 0 ] ; then
+   modinfo uio > /dev/null
+   if [ $? -eq 0 ]; then
+   echo "Loading uio module"
+   sudo /sbin/modprobe uio
+   fi
+   fi
+
+   # UIO may be compiled into kernel, so it may not be an error if it can't
+   # be loaded.
+
+   echo "Loading DPDK UIO module"
+   sudo /sbin/insmod $RTE_SDK/$RTE_TARGET/kmod/nfp_uio.ko
+   if [ $? -ne 0 ] ; then
+   echo "## ERROR: Could not load kmod/nfp_uio.ko."
+   quit
+   fi
+}
+
+#
 # Unloads the rte_kni.ko module.
 #
 remove_kni_module()
@@ -427,10 +473,10 @@ grep_meminfo()
 #
 show_nics()
 {
-   if  /sbin/lsmod | grep -q -e igb_uio -e vfio_pci; then
+   if  /sbin/lsmod | grep -q -e igb_uio -e vfio_pci -e nfp_uio; then
${RTE_SDK}/tools/dpdk_nic_bind.py --status
else
-   echo "# Please load the 'igb_uio' or 'vfio-pci' kernel module before "
+   echo "# Please load the 'igb_uio', 'vfio-pci' or 'nfp_uio' kernel module before "
echo "# querying or adjusting NIC device bindings"
fi
 }
@@ -471,6 +517,23 @@ bind_nics_to_igb_uio()
 }

 #
+# Uses dpdk_nic_bind.py to move devices to work with nfp_uio
+#
+bind_nics_to_nfp_uio()
+{
+   if  /sbin/lsmod  | grep -q nfp_uio ; then
+   ${RTE_SDK}/tools/dpdk_nic_bind.py --status
+   echo ""
+   echo -n "Enter PCI address of device to bind to NFP UIO driver: "
+   read PCI_PATH
+   sudo ${RTE_SDK}/tools/dpdk_nic_bind.py -b nfp_uio $PCI_PATH && echo "OK"
+   else
+   echo "# Please load the 'nfp_uio' kernel module before querying or "
+   echo "# adjusting NIC device bindings"
+   fi
+}
+
+#
 # Uses dpdk_nic_bind.py to move devices to work with kernel drivers again
 #
 unbind_nics()
@@ -513,29 +576,35 @@ step2_func()
TEXT[1]="Insert IGB UIO module"
FUNC[1]="load_igb_uio_module"

-   TEXT[2]="Insert VFIO module"
-   FUNC[2]="load_vfio_module"
+   TEXT[2]="Insert NFP UIO module"
+   FUNC[2]="load_nfp_uio_module"

-   TEXT[3]="Insert KNI module"
-   FUNC[3]="load_kni_module"
+   TEXT[3]="Insert VFIO

[dpdk-dev] [PATCH 2/3] This patch adds a new UIO driver for Netronome NFP PCI cards.

2015-10-02 Thread Alejandro.Lucero

From: "Alejandro.Lucero" 

Netronome's current PMD supports only Virtual Functions. Future Physical
Function support will require specific Netronome code here.

Signed-off-by: Alejandro.Lucero 
Signed-off-by: Rolf.Neugebauer 
---
 lib/librte_eal/common/include/rte_pci.h   |1 +
 lib/librte_eal/linuxapp/eal/eal_pci.c |4 +
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c |2 +-
 lib/librte_eal/linuxapp/nfp_uio/Makefile  |   53 +++
 lib/librte_eal/linuxapp/nfp_uio/nfp_uio.c |  497 +
 lib/librte_ether/rte_ethdev.c |1 +
 6 files changed, 557 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_eal/linuxapp/nfp_uio/Makefile
 create mode 100644 lib/librte_eal/linuxapp/nfp_uio/nfp_uio.c

diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 83e3c28..89baaf6 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -146,6 +146,7 @@ struct rte_devargs;
 enum rte_kernel_driver {
RTE_KDRV_UNKNOWN = 0,
RTE_KDRV_IGB_UIO,
+   RTE_KDRV_NFP_UIO,
RTE_KDRV_VFIO,
RTE_KDRV_UIO_GENERIC,
RTE_KDRV_NIC_UIO,
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index bc5b5be..19a93fe 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -137,6 +137,7 @@ pci_map_device(struct rte_pci_device *dev)
 #endif
break;
case RTE_KDRV_IGB_UIO:
+   case RTE_KDRV_NFP_UIO:
case RTE_KDRV_UIO_GENERIC:
/* map resources for devices that use uio */
ret = pci_uio_map_resource(dev);
@@ -161,6 +162,7 @@ pci_unmap_device(struct rte_pci_device *dev)
RTE_LOG(ERR, EAL, "Hotplug doesn't support vfio yet\n");
break;
case RTE_KDRV_IGB_UIO:
+   case RTE_KDRV_NFP_UIO:
case RTE_KDRV_UIO_GENERIC:
/* unmap resources for devices that use uio */
pci_uio_unmap_resource(dev);
@@ -357,6 +359,8 @@ pci_scan_one(const char *dirname, uint16_t domain, uint8_t bus,
dev->kdrv = RTE_KDRV_VFIO;
else if (!strcmp(driver, "igb_uio"))
dev->kdrv = RTE_KDRV_IGB_UIO;
+   else if (!strcmp(driver, "nfp_uio"))
+   dev->kdrv = RTE_KDRV_NFP_UIO;
else if (!strcmp(driver, "uio_pci_generic"))
dev->kdrv = RTE_KDRV_UIO_GENERIC;
else
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index ac50e13..29ec9cb 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -270,7 +270,7 @@ pci_uio_alloc_resource(struct rte_pci_device *dev,
goto error;
}

-   if (dev->kdrv == RTE_KDRV_IGB_UIO)
+   if (dev->kdrv == RTE_KDRV_IGB_UIO || dev->kdrv == RTE_KDRV_NFP_UIO)
dev->intr_handle.type = RTE_INTR_HANDLE_UIO;
else {
dev->intr_handle.type = RTE_INTR_HANDLE_UIO_INTX;
diff --git a/lib/librte_eal/linuxapp/nfp_uio/Makefile 
b/lib/librte_eal/linuxapp/nfp_uio/Makefile
new file mode 100644
index 000..b9e2f0a
--- /dev/null
+++ b/lib/librte_eal/linuxapp/nfp_uio/Makefile
@@ -0,0 +1,53 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2014-2015 Netronome. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY

[dpdk-dev] [PATCH 1/3] This patch adds a PMD driver for Netronome NFP PCI cards.

2015-10-02 Thread Alejandro.Lucero

From: "Alejandro.Lucero" 

Signed-off-by: Alejandro.Lucero 
Signed-off-by: Rolf.Neugebauer 
---
 config/common_linuxapp   |6 +
 doc/guides/nics/nfp.rst  |  248 
 drivers/net/Makefile |1 +
 drivers/net/nfp/Makefile |   88 ++
 drivers/net/nfp/nfp_net.c| 2480 ++
 drivers/net/nfp/nfp_net_ctrl.h   |  294 +
 drivers/net/nfp/nfp_net_logs.h   |   76 ++
 drivers/net/nfp/nfp_net_pmd.h|  415 +++
 lib/librte_eal/linuxapp/Makefile |3 +
 mk/rte.app.mk|1 +
 10 files changed, 3612 insertions(+)
 create mode 100644 doc/guides/nics/nfp.rst
 create mode 100644 drivers/net/nfp/Makefile
 create mode 100644 drivers/net/nfp/nfp_net.c
 create mode 100644 drivers/net/nfp/nfp_net_ctrl.h
 create mode 100644 drivers/net/nfp/nfp_net_logs.h
 create mode 100644 drivers/net/nfp/nfp_net_pmd.h

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 0de43d5..d8d6384 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -108,6 +108,7 @@ CONFIG_RTE_LIBEAL_USE_HPET=n
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
 CONFIG_RTE_EAL_IGB_UIO=y
+CONFIG_RTE_EAL_NFP_UIO=y
 CONFIG_RTE_EAL_VFIO=y
 CONFIG_RTE_MALLOC_DEBUG=n

@@ -238,6 +239,11 @@ CONFIG_RTE_LIBRTE_ENIC_PMD=y
 CONFIG_RTE_LIBRTE_ENIC_DEBUG=n

 #
+# Compile burst-oriented Netronome PMD driver
+#
+CONFIG_RTE_LIBRTE_NFP_PMD=y
+
+#
 # Compile burst-oriented VIRTIO PMD driver
 #
 CONFIG_RTE_LIBRTE_VIRTIO_PMD=y
diff --git a/doc/guides/nics/nfp.rst b/doc/guides/nics/nfp.rst
new file mode 100644
index 000..df5a746
--- /dev/null
+++ b/doc/guides/nics/nfp.rst
@@ -0,0 +1,248 @@
+..  BSD LICENSE
+Copyright(c) 2015 Netronome Systems, Inc. All rights reserved.
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+
+* Redistributions of source code must retain the above copyright
+notice, this list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in
+the documentation and/or other materials provided with the
+distribution.
+* Neither the name of Intel Corporation nor the names of its
+contributors may be used to endorse or promote products derived
+from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+1. Intro
+
+
+Netronome's sixth generation of flow processors pack 216 programmable
+cores and over 100 hardware accelerators that uniquely combine packet,
+flow, security and content processing in a single device that scales
+up to 400 Gbps.
+
+This document explains how to use DPDK with the Netronome Poll Mode
+Driver (PMD) supporting Netronome's Network Flow Processor 6xxx
+(NFP-6xxx).
+
+Currently the driver supports virtual functions (VFs) only.
+
+2. Dependencies
+===
+
+Before using the Netronome's DPDK PMD some NFP-6xxx configuration,
+which is not related to DPDK, is required. The system requires
+installation of Netronome's BSP (Board Support Package) which includes
+Linux drivers, programs and libraries.
+
+If you have a NFP-6xxx device you should already have the code and
+documentation for doing this configuration. Contact
+support at netronome.com to obtain the latest available firmware.
+
+The NFP Linux kernel drivers (including the required PF driver for the
+NFP) are available on Github at
+https://github.com/Netronome/nfp-drv-kmods along with build
+instructions.
+
+DPDK runs in userspace and PMDs use the Linux kernel UIO interface to
+allow access to physical devices from userspace. The NFP PMD requires
+a separate UIO driver, nfp_uio, to perform correct
+initialization. This driver is part of the DPDK source tree and is
+equivalent to Intel's igb_uio driver.
+
+3. Building the software
+
+
+Netronome's PMD code is provided in the drivers/net/nfp directory and
+nfp_uio is present in the lib/librte_eal/linuxapp/nfp_uio

[dpdk-dev] [Pktgen] [PATCH] pktgen_setup_packets: fix race for packet header

2015-10-02 Thread Ilya Maximets

Ping.

On 17.09.2015 08:55, Ilya Maximets wrote:
> Ok. Thank you. I'll wait.
> 
> On 16.09.2015 18:37, Wiles, Keith wrote:
>> Thanks, the patch looks fine, but I have not had a lot of time to review it
>> in detail. I hope to get to it next week after I return home.
>>
>> On 9/16/15, 2:09 AM, "Ilya Maximets"  wrote:
>>
>>> Ping.
>>>
>>> On 09.09.2015 17:22, Ilya Maximets wrote:
 During pktgen_setup_packets(), all threads of one port use the same
 info->seq_pkt. This leads to constructing packets in the same memory
 region
 (>hdr). As a result, pktgen_setup_packets() generates random headers.

 Fix that by making a local copy of info->seq_pkt and using it for
 constructing packets.

 Signed-off-by: Ilya Maximets 
 ---
  app/pktgen-arp.c  |  2 +-
  app/pktgen-cmds.c | 40 
  app/pktgen-ipv4.c |  2 +-
  app/pktgen.c  | 39 +++
  app/pktgen.h  |  4 ++--
  app/t/pktgen.t.c  |  6 +++---
  6 files changed, 54 insertions(+), 39 deletions(-)

 diff --git a/app/pktgen-arp.c b/app/pktgen-arp.c
 index c378880..b7040d7 100644
 --- a/app/pktgen-arp.c
 +++ b/app/pktgen-arp.c
 @@ -190,7 +190,7 @@ pktgen_process_arp( struct rte_mbuf * m, uint32_t
 pid, uint32_t vlan )
  
rte_memcpy(>eth_dst_addr, >sha, 6);
for (i = 0; i < info->seqCnt; i++)
 -  pktgen_packet_ctor(info, i, -1);
 +  pktgen_packet_ctor(info, i, -1, NULL);
}
  
// Swap the two MAC addresses
 diff --git a/app/pktgen-cmds.c b/app/pktgen-cmds.c
 index da040e5..a6abb41 100644
 --- a/app/pktgen-cmds.c
 +++ b/app/pktgen-cmds.c
 @@ -931,7 +931,7 @@ pktgen_set_proto(port_info_t * info, char type)
if ( type == 'i' )
info->seq_pkt[SINGLE_PKT].ethType = ETHER_TYPE_IPv4;
  
 -  pktgen_packet_ctor(info, SINGLE_PKT, -1);
 +  pktgen_packet_ctor(info, SINGLE_PKT, -1, NULL);
  }
  
  
 /
 **//**
 @@ -1067,7 +1067,7 @@ pktgen_set_pkt_type(port_info_t * info, const
 char * type)

 (type[3] == '6') ? ETHER_TYPE_IPv6 :

 /* TODO print error: unknown type */ ETHER_TYPE_IPv4;
  
 -  pktgen_packet_ctor(info, SINGLE_PKT, -1);
 +  pktgen_packet_ctor(info, SINGLE_PKT, -1, NULL);
  }
  
  
 /
 **//**
 @@ -1092,7 +1092,7 @@ pktgen_set_vlan(port_info_t * info, uint32_t
 onOff)
}
else
pktgen_clr_port_flags(info, SEND_VLAN_ID);
 -  pktgen_packet_ctor(info, SINGLE_PKT, -1);
 +  pktgen_packet_ctor(info, SINGLE_PKT, -1, NULL);
  }
  
  
 /
 **//**
 @@ -1112,7 +1112,7 @@ pktgen_set_vlanid(port_info_t * info, uint16_t
 vlanid)
  {
info->vlanid = vlanid;
info->seq_pkt[SINGLE_PKT].vlanid = info->vlanid;
 -  pktgen_packet_ctor(info, SINGLE_PKT, -1);
 +  pktgen_packet_ctor(info, SINGLE_PKT, -1, NULL);
  }
  
  
 /
 **//**
 @@ -1137,7 +1137,7 @@ pktgen_set_mpls(port_info_t * info, uint32_t
 onOff)
}
else
pktgen_clr_port_flags(info, SEND_MPLS_LABEL);
 -  pktgen_packet_ctor(info, SINGLE_PKT, -1);
 +  pktgen_packet_ctor(info, SINGLE_PKT, -1, NULL);
  }
  
  
 /
 **//**
 @@ -1157,7 +1157,7 @@ pktgen_set_mpls_entry(port_info_t * info,
 uint32_t mpls_entry)
  {
info->mpls_entry = mpls_entry;
info->seq_pkt[SINGLE_PKT].mpls_entry = info->mpls_entry;
 -  pktgen_packet_ctor(info, SINGLE_PKT, -1);
 +  pktgen_packet_ctor(info, SINGLE_PKT, -1, NULL);
  }
  
  
 /
 **//**
 @@ -1182,7 +1182,7 @@ pktgen_set_qinq(port_info_t * info, uint32_t
 onOff)
}
else
pktgen_clr_port_flags(info, SEND_Q_IN_Q_IDS);
 -  pktgen_packet_ctor(info, SINGLE_PKT, -1);
 +  pktgen_packet_ctor(info, SINGLE_PKT, -1, NULL);
  }
  
  
 /
 **//**
 @@ -1204,7 +1204,7 @@ pktgen_set_qinqids(port_info_t * info, uint16_t
 outerid, uint16_t innerid)
info->seq_pkt[SINGLE_PKT].qinq_outerid =

[dpdk-dev] [Pktgen] [PATCH] pktgen_setup_packets: fix race for packet header

2015-10-02 Thread Wiles, Keith

I looked at the code and everything looks good. I will try to merge the
code next week as I am traveling again :-(

Thanks for the patch, I am glad you found this problem as I believe
someone else reported something odd in that area, but was not able to give
me many details.

Regards,
++Keith Wiles

Intel Corporation





[dpdk-dev] [PATCH] devargs: add blacklisting by linux interface name

2015-10-02 Thread Chas Williams

If a system is using deterministic interface names, it may be easier in
some cases to use the interface name to blacklist an interface.

Signed-off-by: Chas Williams <3chas3 at gmail.com>
---
 app/test/test_devargs.c |  2 ++
 lib/librte_eal/common/eal_common_devargs.c  |  8 
 lib/librte_eal/common/eal_common_options.c  | 10 ++
 lib/librte_eal/common/eal_common_pci.c  | 17 +++--
 lib/librte_eal/common/eal_options.h |  2 ++
 lib/librte_eal/common/include/rte_devargs.h |  5 +
 lib/librte_eal/common/include/rte_pci.h |  1 +
 lib/librte_eal/linuxapp/eal/eal_pci.c   | 15 +++
 8 files changed, 54 insertions(+), 6 deletions(-)

diff --git a/app/test/test_devargs.c b/app/test/test_devargs.c
index f7fc59c..c204c49 100644
--- a/app/test/test_devargs.c
+++ b/app/test/test_devargs.c
@@ -85,6 +85,8 @@ test_devargs(void)
goto fail;
if (rte_eal_devargs_type_count(RTE_DEVTYPE_VIRTUAL) != 2)
goto fail;
+   if (rte_eal_devargs_add(RTE_DEVTYPE_BLACKLISTED_NAME, "eth0") < 0)
+   goto fail;
free_devargs_list();

/* check virtual device with argument parsing */
diff --git a/lib/librte_eal/common/eal_common_devargs.c 
b/lib/librte_eal/common/eal_common_devargs.c
index ec56165..cac651b 100644
--- a/lib/librte_eal/common/eal_common_devargs.c
+++ b/lib/librte_eal/common/eal_common_devargs.c
@@ -113,6 +113,14 @@ rte_eal_devargs_add(enum rte_devtype devtype, const char 
*devargs_str)
goto fail;

break;
+   case RTE_DEVTYPE_BLACKLISTED_NAME:
+   /* save interface name */
+   ret = snprintf(devargs->name.name,
+  sizeof(devargs->name.name), "%s", buf);
+   if (ret < 0 || ret >= (int)sizeof(devargs->name.name))
+   goto fail;
+
+   break;
}

free(buf);
diff --git a/lib/librte_eal/common/eal_common_options.c 
b/lib/librte_eal/common/eal_common_options.c
index 1f459ac..c08126d 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -90,6 +90,7 @@ eal_long_options[] = {
{OPT_VFIO_INTR, 1, NULL, OPT_VFIO_INTR_NUM},
{OPT_VMWARE_TSC_MAP,0, NULL, OPT_VMWARE_TSC_MAP_NUM   },
{OPT_XEN_DOM0,  0, NULL, OPT_XEN_DOM0_NUM },
+   {OPT_BLACKLISTED_NAME,  1, NULL, OPT_BLACKLISTED_NAME_NUM },
{0, 0, NULL, 0}
 };

@@ -785,6 +786,13 @@ eal_parse_common_option(int opt, const char *optarg,
}
break;

+   case OPT_BLACKLISTED_NAME_NUM:
+   if (rte_eal_devargs_add(RTE_DEVTYPE_BLACKLISTED_NAME,
+   optarg) < 0) {
+   return -1;
+   }
+   break;
+
/* don't know what to do, leave this to caller */
default:
return 1;
@@ -898,6 +906,8 @@ eal_common_usage(void)
   "  --"OPT_VDEV"  Add a virtual device.\n"
   "  The argument format is 
[,key=val,...]\n"
   "  (ex: --vdev=eth_pcap0,iface=eth2).\n"
+  "  --"OPT_BLACKLISTED_NAME" Add a device name to the black 
list.\n"
+  "  Prevent EAL from using this named 
interface.\n"
   "  --"OPT_VMWARE_TSC_MAP"Use VMware TSC map instead of 
native RDTSC\n"
   "  --"OPT_PROC_TYPE" Type of this process 
(primary|secondary|auto)\n"
   "  --"OPT_SYSLOG"Set syslog facility\n"
diff --git a/lib/librte_eal/common/eal_common_pci.c 
b/lib/librte_eal/common/eal_common_pci.c
index dcfe947..41a7690 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -90,11 +90,15 @@ static struct rte_devargs *pci_devargs_lookup(struct 
rte_pci_device *dev)
struct rte_devargs *devargs;

TAILQ_FOREACH(devargs, _list, next) {
-   if (devargs->type != RTE_DEVTYPE_BLACKLISTED_PCI &&
-   devargs->type != RTE_DEVTYPE_WHITELISTED_PCI)
-   continue;
-   if (!rte_eal_compare_pci_addr(>addr, >pci.addr))
-   return devargs;
+   if (devargs->type == RTE_DEVTYPE_BLACKLISTED_PCI ||
+   devargs->type == RTE_DEVTYPE_WHITELISTED_PCI) {
+   if (!rte_eal_compare_pci_addr(>addr, 
>pci.addr))
+   return devargs;
+   }
+   if (devargs->type == RTE_DEVTYPE_BLACKLISTED_NAME) {
+   if (strcmp(dev->name, devargs->name.name) == 0)
+   return devargs;
+   }
}
return NULL;
 }
@@ -174,7 +178,8 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, 
struct
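
The lookup change in the devargs patch above can be sketched in isolation. This is a minimal illustration, not the EAL code: every `sketch_` type, field and function below is a hypothetical stand-in, and the list is a plain array rather than the TAILQ that the patch walks. It only shows the matching rule the patch introduces, where PCI entries match on address and name entries match on the interface name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the EAL types; names are illustrative only. */
enum sketch_devtype {
	SKETCH_WHITELISTED_PCI,
	SKETCH_BLACKLISTED_PCI,
	SKETCH_BLACKLISTED_NAME,
	SKETCH_VIRTUAL
};

struct sketch_pci_addr {
	unsigned domain, bus, devid, function;
};

struct sketch_devargs {
	enum sketch_devtype type;
	struct sketch_pci_addr addr; /* used by the two PCI entry types */
	char name[32];               /* used by SKETCH_BLACKLISTED_NAME */
};

struct sketch_device {
	struct sketch_pci_addr addr;
	char name[32];               /* kernel interface name, e.g. "eth0" */
};

static int
sketch_same_addr(const struct sketch_pci_addr *a,
		 const struct sketch_pci_addr *b)
{
	return a->domain == b->domain && a->bus == b->bus &&
	       a->devid == b->devid && a->function == b->function;
}

/* Return the first entry matching the device by address or by name. */
static const struct sketch_devargs *
sketch_devargs_lookup(const struct sketch_devargs *list, size_t n,
		      const struct sketch_device *dev)
{
	size_t i;

	assert(list != NULL || n == 0);
	for (i = 0; i < n; i++) {
		const struct sketch_devargs *da = &list[i];

		if (da->type == SKETCH_BLACKLISTED_PCI ||
		    da->type == SKETCH_WHITELISTED_PCI) {
			if (sketch_same_addr(&dev->addr, &da->addr))
				return da;
		} else if (da->type == SKETCH_BLACKLISTED_NAME) {
			if (strcmp(dev->name, da->name) == 0)
				return da;
		}
	}
	return NULL; /* other entry types (e.g. virtual) never match */
}
```

Keeping the two comparisons in separate branches means a name entry is never compared by address, so an unset address in a name entry cannot accidentally match a device.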

[dpdk-dev] [PATCH v3 6/8] mk: Add rule for installing nic bind files

2015-10-02 Thread Richardson, Bruce

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Panu Matilainen
> Sent: Friday, October 2, 2015 11:50 AM
> To: Arevalo, Mario Alfredo C; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 6/8] mk: Add rule for installing nic
> bind files
> 
> On 10/01/2015 03:11 AM, Mario Carrillo wrote:
> > Add hierarchy-file support to the DPDK nic bind files, when invoking
> > "make install-sbin" nic bind files will be installed by default in:
> > $(DESTDIR)/$(SBIN_DIR) where SBIN_DIR=/usr/sbin/dpdk_nic_bind by
> > default, you can override SBIN_DIR var.
> > This hierarchy is based on:
> > http://www.freedesktop.org/software/systemd/man/file-hierarchy.html
> > and dpdk spec file.
> >
> > Signed-off-by: Mario Carrillo 
> > ---
> >   mk/rte.sdkinstall.mk | 14 ++
> >   mk/rte.sdkroot.mk|  4 ++--
> >   2 files changed, 16 insertions(+), 2 deletions(-)
> >
> > diff --git a/mk/rte.sdkinstall.mk b/mk/rte.sdkinstall.mk index
> > 5a2fd40..4eecf31 100644
> > --- a/mk/rte.sdkinstall.mk
> > +++ b/mk/rte.sdkinstall.mk
> > @@ -46,11 +46,13 @@ else
> >   INCLUDE_DIR ?= /usr/include/dpdk
> >   BIN_DIR ?= /usr/bin
> >   DOC_DIR ?= /usr/share/doc/dpdk
> > +SBIN_DIR ?= /usr/sbin/dpdk_nic_bind
> >   HSLINKS := $(wildcard $(RTE_OUTPUT)/include/*)
> >   BINARY_FILES := $(patsubst %.map,,$(wildcard $(RTE_OUTPUT)/app/*))
> >   LIBS := $(wildcard $(RTE_OUTPUT)/lib/*)
> >   MODULES := $(wildcard $(RTE_OUTPUT)/kmod/*)
> >   DOCS := $(wildcard $(BUILD_DIR)/doc/*)
> > +NIC_BIND_FILES := $(wildcard $(BUILD_DIR)/tools/*nic_bind.py)
> >   include $(BUILD_DIR)/build/.config
> >   RTE_ARCH := $(CONFIG_RTE_ARCH:"%"=%)
> >   RTE_EXEC_ENV := $(CONFIG_RTE_EXEC_ENV:"%"=%) @@ -161,6 +163,18 @@
> > install-doc:
> > echo installing: $$DOC; \
> > done
> >   #
> > +# install nic bind files in /usr/sbin/dpdk_nic_bind # by default
> > +SBIN_DIR can be overridden.
> > +#
> 
> This creates an out-of-path directory /usr/sbin/dpdk_nic_bind/ in which
> the dpdk_nic_bind.py is installed. Besides not being a very accessible
> location, the FHS explicitly forbids creation of subdirectories below
> /usr/[s]bin.
> 
> SBIN_DIR should be /usr/sbin unless overridden, but OTOH I think this
> could go into /usr/bin just as well, the split is fairly ambiguous anyway
> (I mean, testpmd is not something a regular user is going to run
> either)
> 
> In addition, if dpdk_nic_bind.py is installed then perhaps the
> cpu_layout.py utility should be installed too?
> 
>   - Panu -

I think there are better utilities available for determining the core layout
than cpu_layout.py. "lstopo", for one, is much more powerful. Do we want/need
to keep our own script around for that?

/Bruce

[dpdk-dev] DPDK Logo Release

2015-10-02 Thread Thomas Monjalon

2015-10-01 15:28, St Leger, Jim:
> When can we expect the main website (including home page http://dpdk.org/) to 
> be updated?

As there was no comment about the demo page, it is now applied to every page.

> Have we opened up the website to allow the community to edit it? (I think 
> this has been discussed in the past...)

Yes you're right, it has been discussed and planned.
The git tree will be published. Should it include the server config parts?
Should we open a new mailing-list to receive patches and discussions?

> Who owns website changes today?

As you know, it was created at 6WIND but there is no special reason
to not allow others to contribute.
The opening effort hasn't been done yet because there was no specific request
until recently and the email box admin at dpdk.org (advertised in the "about"
section) was almost empty.

Glad to see people interested in improving such things :)
Thanks Intel for the nice logo!
Do you have some insights about its meaning to share?

[dpdk-dev] [PATCH 0/2] xenvirt hotplug support

2015-10-02 Thread Bernard Iremonger

add PCI Port Hotplug support to the xenvirt PMD


This patch depends on 4 patches from the following patch set:

-remove-pci-driver-from-vdevs.patch 

0001-librte_eal-add-RTE_KDRV_NONE-for-vdevs.patch
0002-librte_ether-add-fields-from-rte_pci_driver-to-rte_e.patch
0003-librte_ether-add-function-rte_eth_copy_dev_info.patch
0009-xenvirt-copy-pci-device-info-to-eth_dev-data.patch

Bernard Iremonger (2):
  xenvirt: add support for PCI Port Hotplug
  xenvirt: free queues in dev_close

 drivers/net/xenvirt/rte_eth_xenvirt.c | 87 +++
 drivers/net/xenvirt/rte_xen_lib.c | 26 +--
 drivers/net/xenvirt/rte_xen_lib.h |  5 +-
 3 files changed, 105 insertions(+), 13 deletions(-)

-- 
1.9.1

[dpdk-dev] [PATCH] vhost_xen: fix compile error in main.c

2015-10-02 Thread Bernard Iremonger

Signed-off-by: Bernard Iremonger 
---
 examples/vhost_xen/main.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/examples/vhost_xen/main.c b/examples/vhost_xen/main.c
index 5d20700..d124be1 100644
--- a/examples/vhost_xen/main.c
+++ b/examples/vhost_xen/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -579,6 +579,7 @@ virtio_dev_rx(struct virtio_net *dev, struct rte_mbuf 
**pkts, uint32_t count)
uint16_t res_base_idx, res_end_idx;
uint16_t free_entries;
uint8_t success = 0;
+   void *userdata;

LOG_DEBUG(VHOST_DATA, "(%"PRIu64") virtio_dev_rx()\n", dev->device_fh);
vq = dev->virtqueue_rx;
@@ -656,13 +657,14 @@ virtio_dev_rx(struct virtio_net *dev, struct rte_mbuf 
**pkts, uint32_t count)
vq->used->ring[res_cur_idx & (vq->size - 1)].len = packet_len;

/* Copy mbuf data to buffer */
-   rte_memcpy((void *)(uintptr_t)buff_addr, (const 
void*)buff->data, rte_pktmbuf_data_len(buff));
+   userdata = rte_pktmbuf_mtod(buff, void *);
+   rte_memcpy((void *)(uintptr_t)buff_addr, userdata, 
rte_pktmbuf_data_len(buff));

res_cur_idx++;
packet_success++;

/* mergeable is disabled then a header is required per buffer. 
*/
-   rte_memcpy((void *)(uintptr_t)buff_hdr_addr, (const 
void*)_hdr, vq->vhost_hlen);
+   rte_memcpy((void *)(uintptr_t)buff_hdr_addr, (const void 
*)_hdr, vq->vhost_hlen);
if (res_cur_idx < res_end_idx) {
/* Prefetch descriptor index. */
rte_prefetch0(>desc[head[packet_success]]);
-- 
1.9.1
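
The vhost_xen patch above carries no description, so here is a hedged sketch of what the switch to rte_pktmbuf_mtod() amounts to: the packet data pointer is derived from the buffer start plus a data offset rather than read from a `data` field. The struct and macro below are hypothetical, trimmed-down stand-ins written for this illustration; they are not the DPDK definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down mbuf: only the fields this sketch needs. */
struct sketch_mbuf {
	void *buf_addr;    /* start of the underlying buffer */
	uint16_t data_off; /* offset of the packet data (headroom) */
	uint16_t data_len; /* number of bytes of packet data */
};

/* Same idea as rte_pktmbuf_mtod(): buffer start plus data offset. */
#define SKETCH_MTOD(m, type) ((type)((char *)(m)->buf_addr + (m)->data_off))

/* Copy the packet payload into dst and return the number of bytes copied. */
static uint16_t
sketch_copy_payload(void *dst, const struct sketch_mbuf *m)
{
	const void *src = SKETCH_MTOD(m, const void *);

	assert(dst != NULL);
	memcpy(dst, src, m->data_len);
	return m->data_len;
}
```

The caller is assumed to provide a destination of at least `data_len` bytes, as the descriptor buffer does in the patched function.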

[dpdk-dev] [PATCH 2/2] xenvirt: free queues in dev_close

2015-10-02 Thread Bernard Iremonger

Signed-off-by: Bernard Iremonger 
---
 drivers/net/xenvirt/rte_eth_xenvirt.c | 24 ++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/net/xenvirt/rte_eth_xenvirt.c 
b/drivers/net/xenvirt/rte_eth_xenvirt.c
index 8923826..1bf35b7 100644
--- a/drivers/net/xenvirt/rte_eth_xenvirt.c
+++ b/drivers/net/xenvirt/rte_eth_xenvirt.c
@@ -75,6 +75,9 @@ static struct rte_eth_link pmd_link = {
.link_status = 0
 };

+static void
+eth_xenvirt_free_queues(struct rte_eth_dev *dev);
+
 static inline struct rte_mbuf *
 rte_rxmbuf_alloc(struct rte_mempool *mp)
 {
@@ -326,7 +329,7 @@ eth_dev_stop(struct rte_eth_dev *dev)
 static void
 eth_dev_close(struct rte_eth_dev *dev)
 {
-   RTE_SET_USED(dev);
+   eth_xenvirt_free_queues(dev);
 }

 static void
@@ -362,8 +365,9 @@ eth_stats_reset(struct rte_eth_dev *dev)
 }

 static void
-eth_queue_release(void *q __rte_unused)
+eth_queue_release(void *q)
 {
+   rte_free(q);
 }

 static int
@@ -524,7 +528,23 @@ eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t 
tx_queue_id,
return 0;
 }

+static void
+eth_xenvirt_free_queues(struct rte_eth_dev *dev)
+{
+   int i;

+   for (i = 0; i < dev->data->nb_rx_queues; i++) {
+   eth_queue_release(dev->data->rx_queues[i]);
+   dev->data->rx_queues[i] = NULL;
+   }
+   dev->data->nb_rx_queues = 0;
+
+   for (i = 0; i < dev->data->nb_tx_queues; i++) {
+   eth_queue_release(dev->data->tx_queues[i]);
+   dev->data->tx_queues[i] = NULL;
+   }
+   dev->data->nb_tx_queues = 0;
+}

 static const struct eth_dev_ops ops = {
.dev_start = eth_dev_start,
-- 
1.9.1
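
The teardown pattern in the patch above (release each queue, clear its slot, then zero the count) can be sketched on its own. This is an illustration under stated assumptions: `struct sketch_dev_data`, the fixed array size and both functions are hypothetical, and plain `free()` stands in for rte_free().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX_QUEUES 4

/* Hypothetical stand-in for the per-device data that owns the queues. */
struct sketch_dev_data {
	void *rx_queues[SKETCH_MAX_QUEUES];
	void *tx_queues[SKETCH_MAX_QUEUES];
	uint16_t nb_rx_queues;
	uint16_t nb_tx_queues;
};

/* Free every queue in one array, clear the slots, and zero the count. */
static void
sketch_free_queue_array(void **queues, uint16_t *count)
{
	uint16_t i;

	for (i = 0; i < *count; i++) {
		free(queues[i]); /* free(NULL) is a no-op */
		queues[i] = NULL;
	}
	*count = 0;
}

/* Release all RX and TX queues owned by the device data. */
static void
sketch_free_queues(struct sketch_dev_data *data)
{
	assert(data != NULL);
	sketch_free_queue_array(data->rx_queues, &data->nb_rx_queues);
	sketch_free_queue_array(data->tx_queues, &data->nb_tx_queues);
}
```

Clearing each slot and resetting the count makes a second call harmless, which matters if close can be reached more than once during hotplug detach.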

[dpdk-dev] [PATCH 1/2] xenvirt: add support for PCI Port Hotplug

2015-10-02 Thread Bernard Iremonger

Signed-off-by: Bernard Iremonger 
---
 drivers/net/xenvirt/rte_eth_xenvirt.c | 63 +++
 drivers/net/xenvirt/rte_xen_lib.c | 26 ---
 drivers/net/xenvirt/rte_xen_lib.h |  5 ++-
 3 files changed, 83 insertions(+), 11 deletions(-)

diff --git a/drivers/net/xenvirt/rte_eth_xenvirt.c 
b/drivers/net/xenvirt/rte_eth_xenvirt.c
index b3383af..8923826 100644
--- a/drivers/net/xenvirt/rte_eth_xenvirt.c
+++ b/drivers/net/xenvirt/rte_eth_xenvirt.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -642,10 +642,14 @@ eth_dev_xenvirt_create(const char *name, const char 
*params,
if (internals == NULL)
goto err;

-   /* reserve an ethdev entry */
-   eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
-   if (eth_dev == NULL)
-   goto err;
+   /* find an ethdev entry */
+   eth_dev = rte_eth_dev_allocated(name);
+   if (eth_dev == NULL) {
+   /* reserve an ethdev entry */
+   eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
+   if (eth_dev == NULL)
+   goto err;
+   }

data->dev_private = internals;
data->port_id = eth_dev->data->port_id;
@@ -661,7 +665,7 @@ eth_dev_xenvirt_create(const char *name, const char *params,

eth_dev->data = data;
eth_dev->dev_ops = 
-   eth_dev->data->dev_flags = 0;
+   eth_dev->data->dev_flags = RTE_PCI_DRV_DETACHABLE;
eth_dev->data->kdrv = RTE_KDRV_NONE;
eth_dev->data->drv_name = NULL;
eth_dev->driver = NULL;
@@ -683,6 +687,38 @@ err:
 }


+static int
+eth_dev_xenvirt_free(const char *name, const unsigned numa_node)
+{
+   struct rte_eth_dev *eth_dev = NULL;
+
+   RTE_LOG(DEBUG, PMD,
+   "Free virtio rings backed ethdev on numa socket %u\n",
+   numa_node);
+
+   /* find an ethdev entry */
+   eth_dev = rte_eth_dev_allocated(name);
+   if (eth_dev == NULL)
+   return -1;
+
+   if (eth_dev->data->dev_started == 1) {
+   eth_dev_stop(eth_dev);
+   eth_dev_close(eth_dev);
+   }
+
+   eth_dev->rx_pkt_burst = NULL;
+   eth_dev->tx_pkt_burst = NULL;
+   eth_dev->dev_ops = NULL;
+
+   rte_free(eth_dev->data->dev_private);
+   rte_free(eth_dev->data->mac_addrs);
+   rte_free(eth_dev->data);
+
+   virtio_idx--;
+
+   return 0;
+}
+
 /*TODO: Support multiple process model */
 static int
 rte_pmd_xenvirt_devinit(const char *name, const char *params)
@@ -701,10 +737,25 @@ rte_pmd_xenvirt_devinit(const char *name, const char 
*params)
return 0;
 }

+static int
+rte_pmd_xenvirt_devuninit(const char *name)
+{
+   eth_dev_xenvirt_free(name, rte_socket_id());
+
+   if (virtio_idx == 0) {
+   if (xenstore_uninit() != 0)
+   RTE_LOG(ERR, PMD, "%s: xenstore uninit failed\n", 
__func__);
+
+   gntalloc_close();
+   }
+   return 0;
+}
+
 static struct rte_driver pmd_xenvirt_drv = {
.name = "eth_xenvirt",
.type = PMD_VDEV,
.init = rte_pmd_xenvirt_devinit,
+   .uninit = rte_pmd_xenvirt_devuninit,
 };

 PMD_REGISTER_DRIVER(pmd_xenvirt_drv);
diff --git a/drivers/net/xenvirt/rte_xen_lib.c 
b/drivers/net/xenvirt/rte_xen_lib.c
index b3932f0..5900b53 100644
--- a/drivers/net/xenvirt/rte_xen_lib.c
+++ b/drivers/net/xenvirt/rte_xen_lib.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -50,6 +50,7 @@

 #include 
 #include 
+#include 

 #include "rte_xen_lib.h"

@@ -72,6 +73,8 @@ int gntalloc_fd = -1;
 static char *dompath = NULL;
 /* handle to xenstore read/write operations */
 static struct xs_handle *xs = NULL;
+/* flag to indicate if xenstore cleanup is required */
+static bool is_xenstore_cleaned_up;

 /*
  * Reserve a virtual address space.
@@ -275,7 +278,6 @@ xenstore_init(void)
 {
unsigned int len, domid;
char *buf;
-   static int cleanup = 0;
char *end;

xs = xs_domain_open();
@@ -301,16 +303,32 @@ xenstore_init(void)

xs_transaction_start(xs); /* When to stop transaction */

-   if (cleanup == 0) {
+   if (is_xenstore_cleaned_up == 0) {
if (xenstore_cleanup())
return -1;
-   cleanup = 1;
+   is_xenstore_cleaned_up = 1;
}

return 0;
 }

 int
+xenstore_uninit(void)
+{
+   xs_close(xs);
+
+   if (is_xenstore_cleaned_up == 0) {
+   if

[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

2015-10-02 Thread Alexander Duyck

On 10/02/2015 07:00 AM, Michael S. Tsirkin wrote:
> On Thu, Oct 01, 2015 at 02:02:24PM -0700, Alexander Duyck wrote:
>> validation and translation would add 10s if not 100s of nanoseconds to the
>> time needed to process each packet.  In addition we are talking about doing
>> this in kernel space which means we wouldn't really be able to take
>> advantage of things like SSE or AVX instructions.
> Yes. But the nice thing is that it's rearming so it can happen on
> a separate core, in parallel with packet processing.
> It does not need to add to latency.

Moving it to another core is automatically going to add extra latency.  
You will have to evict the data out of the L1 cache for one core and 
into the L1 cache for another when you update it, and then reading it 
will force it to have to transition back out.  If you are lucky it is 
only evicted to L2, if not then to L3, or possibly even back to memory.  
Odds are that alone will add tens of nanoseconds to the process, and you 
would need three or more cores to do the same workload, since running the 
process over multiple threads means adding synchronization 
primitives to the whole mess. Then there is the NUMA factor on top of that.

> You will burn up more CPU, but again, all this for boxes/hypervisors
> without an IOMMU.

There are use cases this will make completely useless.  If, for example, 
you are running a workload that needs three cores with DPDK, bumping it 
to nine or more will likely push you out of being able to do the 
workload on some systems.

> I'm sure people can come up with even better approaches, once enough
> people get it that kernel absolutely needs to be protected from
> userspace.

I don't see that happening.  Many people don't care about kernel 
security that much.  If they did, something like DPDK wouldn't have 
gotten off the ground.  Once someone has the ability to load kernel 
modules any protection of the kernel from userspace pretty much goes 
right out the window.  You are just as much at risk from a buggy driver 
in userspace as you are from one that can be added to the kernel.

> Long term, the right thing to do is to focus on IOMMU support. This
> gives you hardware-based memory protection without need to burn up CPU
> cycles.

We have a solution that makes use of IOMMU support with vfio.  The 
problem is there are multiple cases where that support is either not 
available, or where using the IOMMU adds excessive overhead.

- Alex

[dpdk-dev] How kernel can share the mem from dpdk hugepage?

2015-10-02 Thread Kyle Larose

All of this information is in shared memory, is it not? For example,
you could patch the ring library to give a programmable interface to
the following function:
http://dpdk.org/doc/api/rte__ring_8h.html#a7bfcef0ad324fcc4c03bcb59cd7e867f.
This would allow you to see the full set of rings in a process that
has attached as a secondary to DPDK. Write a process that does this,
and then interfaces with whatever you have running in the kernel.

Ultimately, the architecture is pulling from userspace and pushing
into the kernel, rather than pulling directly from the kernel.
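
A minimal sketch of that pull-then-push flow, under stated assumptions: the
ring here is a toy single-threaded stand-in rather than a DPDK rte_ring, and
struct pkt_meta, toy_ring_* and relay_drain() are hypothetical names invented
for illustration. kernel_fd stands for whatever interface the kernel side
exposes (for example a character device); any writable descriptor works for
trying it out. In a real secondary process, rte_ring_lookup() and
rte_ring_dequeue() would take the place of the toy ring calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

#define TOY_RING_SIZE 8u /* power of two, so masking the index wraps it */

/* Hypothetical per-packet metadata record carried by the shared ring. */
struct pkt_meta {
	uint32_t port;
	uint32_t len;
};

/* Toy ring: single producer, single consumer, no locking or barriers. */
struct toy_ring {
	uint32_t head; /* count of records written so far */
	uint32_t tail; /* count of records read so far */
	struct pkt_meta slots[TOY_RING_SIZE];
};

/* Returns 0 on success, -1 if the ring is full. */
int toy_ring_enqueue(struct toy_ring *r, struct pkt_meta m)
{
	if (r->head - r->tail == TOY_RING_SIZE)
		return -1;
	r->slots[r->head & (TOY_RING_SIZE - 1)] = m;
	r->head++;
	return 0;
}

/* Returns 0 on success, -1 if the ring is empty. */
int toy_ring_dequeue(struct toy_ring *r, struct pkt_meta *out)
{
	if (r->head == r->tail)
		return -1;
	*out = r->slots[r->tail & (TOY_RING_SIZE - 1)];
	r->tail++;
	return 0;
}

/*
 * Pull every pending record from the ring in userspace and push it
 * into the kernel through kernel_fd. Returns the number of records
 * forwarded, or -1 if a write fails or is short.
 */
int relay_drain(struct toy_ring *r, int kernel_fd)
{
	struct pkt_meta m;
	int forwarded = 0;

	assert(r != NULL);
	while (toy_ring_dequeue(r, &m) == 0) {
		if (write(kernel_fd, &m, sizeof(m)) != (ssize_t)sizeof(m))
			return -1;
		forwarded++;
	}
	return forwarded;
}
```

The relay process would call relay_drain() in its polling loop. The kernel
never touches the hugepage memory itself; it only sees the records that the
relay writes to it.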

Does that help?

Thanks,

Kyle

On Thu, Oct 1, 2015 at 11:53 AM, 张伟 wrote:
> Hi all,
>
>
> Does anybody know how the kernel can share information stored in DPDK
> hugepages? My project requires the kernel to get some information from a
> DPDK application. For example, in the multi-process example, every client
> has a ring buffer shared with the server. The shared ring contains packet
> metadata. Is it possible for DPDK to share this information with the kernel
> so that the kernel can access it? What are the key points that would help
> achieve this goal?

[dpdk-dev] [PATCH] ethdev: distinguish between drop and error stats

2015-10-02 Thread Jay Rolette

Can you improve the comments on these counters? If you didn't happen to
follow this thread, there's no way to reasonably figure out what the
difference is from looking at the code without chasing it all the way down
and cross-referencing the NIC datasheet.

Thanks,
Jay

On Fri, Oct 2, 2015 at 7:47 AM, Maryam Tahhan 
wrote:

> Make a distinction between dropped packets and error statistics to allow
> a higher level fault management entity to interact with DPDK and take
> appropriate measures when errors are detected. It will also provide
> valuable information for any application that collects/extracts DPDK
> stats; such applications include Open vSwitch.
> After this patch the distinction is:
> ierrors = Total number of packets dropped by hardware (malformed
> packets, ...) Where the # of drops can ONLY be <=  the packets received
> (without overlap between registers).
> Rx_pkt_errors = Total number of erroneous received packets. Where the #
> of errors can be >= the packets received (without overlap between
> registers), this is because there may be multiple errors associated with
> a packet.
>
> Signed-off-by: Maryam Tahhan 
> ---
>  lib/librte_ether/rte_ethdev.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 8a8c82b..53dd55d 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -200,8 +200,9 @@ struct rte_eth_stats {
> /**< Deprecated; Total of RX packets with CRC error. */
> uint64_t ibadlen;
> /**< Deprecated; Total of RX packets with bad length. */
> -   uint64_t ierrors;   /**< Total number of erroneous received packets. */
> +   uint64_t ierrors;   /**< Total number of dropped received packets. */
> uint64_t oerrors;   /**< Total number of failed transmitted packets. */
> +   uint64_t ipkterrors;   /**< Total number of erroneous received packets. */
> uint64_t imcasts;
> /**< Deprecated; Total number of multicast received packets. */
> uint64_t rx_nombuf; /**< Total number of RX mbuf allocation failures. */
> --
> 2.4.3
>
>
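
One way to picture the distinction described in the quoted commit message is
the sketch below. It is illustrative only: struct rx_stats_sketch is a
hypothetical stand-in, not the real struct rte_eth_stats, and record_rx() and
drops_consistent() are invented helpers. It also assumes that "packets
received" counts every packet the hardware saw, including ones it later
dropped, which the thread does not confirm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the counters discussed in the patch. */
struct rx_stats_sketch {
	uint64_t ipackets;   /* packets seen by hardware (assumed to include drops) */
	uint64_t ierrors;    /* packets dropped by hardware; at most one per packet */
	uint64_t ipkterrors; /* error events on received packets; several per packet possible */
};

/* Account for one received packet and the error events it raised. */
void record_rx(struct rx_stats_sketch *s, bool dropped, unsigned int error_events)
{
	assert(s != NULL);
	s->ipackets++;
	if (dropped)
		s->ierrors++;
	s->ipkterrors += error_events;
}

/* Drops can never exceed packets received; error events carry no such bound. */
bool drops_consistent(const struct rx_stats_sketch *s)
{
	return s->ierrors <= s->ipackets;
}
```

With one clean packet and one dropped packet that raised three error events,
the sketch ends up with ipackets = 2, ierrors = 1 and ipkterrors = 3: the drop
count stays bounded by the packet count while the error count exceeds it.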

[dpdk-dev] [PATCH 00/52] update i40e base driver

2015-10-02 Thread Mcnamara, John

> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas Monjalon
> Sent: Friday, October 2, 2015 12:39 AM
> To: Wu, Jingjing
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 00/52] update i40e base driver
> 
> 2015-09-09 01:53, Zhang, Helin:
> > Acked-by: Helin Zhang 
> 
> Applied, thanks.
> Some titles were fixed and some patches are squashed or reordered.
> 
> Maybe it deserves an entry in the release notes.

Hi Jingjing,

See the "Updated the i40e base driver" entry in the 2.1 Release note:

http://dpdk.org/doc/guides/rel_notes/release_2_1.html

It doesn't need to include every single change in the base driver. Just a 
summary of the main ones.

John
--

[dpdk-dev] [PATCH 00/52] update i40e base driver

2015-10-02 Thread Thomas Monjalon

2015-09-09 01:53, Zhang, Helin:
> Acked-by: Helin Zhang 

Applied, thanks.
Some titles were fixed and some patches are squashed or reordered.

Maybe it deserves an entry in the release notes.

[dpdk-dev] How kernel can share the mem from dpdk hugepage?

2015-10-02 Thread 张伟

Hi all, 


Does anybody know how the kernel can share information stored in DPDK
hugepages? My project requires the kernel to get some information from a
DPDK application. For example, in the multi-process example, every client
has a ring buffer shared with the server. The shared ring contains packet
metadata. Is it possible for DPDK to share this information with the kernel
so that the kernel can access it? What are the key points that would help
achieve this goal?
