[dpdk-dev] [PATCH] lib: added support for armv7 architecture
From: Amruta ZendeSigned-off-by: Amruta Zende Signed-off-by: David Hunt --- MAINTAINERS|5 + config/defconfig_arm-native-linuxapp-gcc | 56 .../common/include/arch/arm/rte_atomic.h | 269 .../common/include/arch/arm/rte_byteorder.h| 146 +++ .../common/include/arch/arm/rte_cpuflags.h | 138 ++ .../common/include/arch/arm/rte_cycles.h | 77 ++ .../common/include/arch/arm/rte_memcpy.h | 101 .../common/include/arch/arm/rte_prefetch.h | 64 + .../common/include/arch/arm/rte_rwlock.h | 70 + .../common/include/arch/arm/rte_spinlock.h | 116 + lib/librte_eal/common/include/arch/arm/rte_vect.h | 37 +++ lib/librte_eal/linuxapp/Makefile |3 + lib/librte_eal/linuxapp/arm_pmu/Makefile | 52 lib/librte_eal/linuxapp/arm_pmu/rte_enable_pmu.c | 83 ++ mk/arch/arm/rte.vars.mk| 58 + mk/machine/armv7-a/rte.vars.mk | 63 + mk/toolchain/gcc/rte.vars.mk |8 +- 17 files changed, 1343 insertions(+), 3 deletions(-) create mode 100644 config/defconfig_arm-native-linuxapp-gcc create mode 100644 lib/librte_eal/common/include/arch/arm/rte_atomic.h create mode 100644 lib/librte_eal/common/include/arch/arm/rte_byteorder.h create mode 100644 lib/librte_eal/common/include/arch/arm/rte_cpuflags.h create mode 100644 lib/librte_eal/common/include/arch/arm/rte_cycles.h create mode 100644 lib/librte_eal/common/include/arch/arm/rte_memcpy.h create mode 100644 lib/librte_eal/common/include/arch/arm/rte_prefetch.h create mode 100644 lib/librte_eal/common/include/arch/arm/rte_rwlock.h create mode 100644 lib/librte_eal/common/include/arch/arm/rte_spinlock.h create mode 100644 lib/librte_eal/common/include/arch/arm/rte_vect.h create mode 100755 lib/librte_eal/linuxapp/arm_pmu/Makefile create mode 100644 lib/librte_eal/linuxapp/arm_pmu/rte_enable_pmu.c create mode 100644 mk/arch/arm/rte.vars.mk create mode 100644 mk/machine/armv7-a/rte.vars.mk diff --git a/MAINTAINERS b/MAINTAINERS index 080a8e8..9d99d53 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -124,6 +124,11 @@ IBM POWER M: Chao Zhu F: 
lib/librte_eal/common/include/arch/ppc_64/ +Arm V7 +M: Amrute Zende +M: David Hunt +F: lib/librte_eal/common/include/arch/arm/ + Intel x86 M: Bruce Richardson M: Konstantin Ananyev diff --git a/config/defconfig_arm-native-linuxapp-gcc b/config/defconfig_arm-native-linuxapp-gcc new file mode 100644 index 000..159aa36 --- /dev/null +++ b/config/defconfig_arm-native-linuxapp-gcc @@ -0,0 +1,56 @@ +# BSD LICENSE +# +# Copyright(c) 2015 Intel Corporation. All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in +# the documentation and/or other materials provided with the +# distribution. +# * Neither the name of Intel Corporation nor the names of its +# contributors may be used to endorse or promote products derived +# from this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+# + +#include "common_linuxapp" + +CONFIG_RTE_MACHINE="armv7-a" + +CONFIG_RTE_ARCH="arm" +CONFIG_RTE_ARCH_ARM32=y +CONFIG_RTE_ARCH_32=y + +CONFIG_RTE_TOOLCHAIN="gcc" +CONFIG_RTE_TOOLCHAIN_GCC=y + +CONFIG_RTE_FORCE_INTRINSICS=y +CONFIG_RTE_LIBRTE_VHOST=n +CONFIG_RTE_LIBRTE_KNI=n +CONFIG_RTE_KNI_KMOD=n +CONFIG_RTE_LIBRTE_LPM=n +CONFIG_RTE_LIBRTE_ACL=n +CONFIG_RTE_LIBRTE_SCHED=n +CONFIG_RTE_LIBRTE_PORT=n +CONFIG_RTE_LIBRTE_PIPELINE=n +CONFIG_RTE_LIBRTE_TABLE=n +CONFIG_RTE_IXGBE_INC_VECTOR=n +CONFIG_RTE_LIBRTE_VIRTIO_PMD=n
[dpdk-dev] [PATCH] add armv7 architecture support
This patch provides EAL support for the ARMv7 architecture. We hope that this
will encourage the ARM community to contribute PMDs for their SoCs to DPDK.

For now, we've added Intel engineers to the MAINTAINERS file. We would like to
encourage the ARM community to take over maintenance of this area in future,
and to further improve it.

This patch was tested on AXM5500 and Raspberry Pi 2 Model B+

Amruta Zende (1):
  lib: added support for armv7 architecture

 MAINTAINERS | 5 +
 config/defconfig_arm-native-linuxapp-gcc | 56
 .../common/include/arch/arm/rte_atomic.h | 269
 .../common/include/arch/arm/rte_byteorder.h | 146 +++
 .../common/include/arch/arm/rte_cpuflags.h | 138 ++
 .../common/include/arch/arm/rte_cycles.h | 77 ++
 .../common/include/arch/arm/rte_memcpy.h | 101
 .../common/include/arch/arm/rte_prefetch.h | 64 +
 .../common/include/arch/arm/rte_rwlock.h | 70 +
 .../common/include/arch/arm/rte_spinlock.h | 116 +
 lib/librte_eal/common/include/arch/arm/rte_vect.h | 37 +++
 lib/librte_eal/linuxapp/Makefile | 3 +
 lib/librte_eal/linuxapp/arm_pmu/Makefile | 52
 lib/librte_eal/linuxapp/arm_pmu/rte_enable_pmu.c | 83 ++
 mk/arch/arm/rte.vars.mk | 58 +
 mk/machine/armv7-a/rte.vars.mk | 63 +
 mk/toolchain/gcc/rte.vars.mk | 8 +-
 17 files changed, 1343 insertions(+), 3 deletions(-)
 create mode 100644 config/defconfig_arm-native-linuxapp-gcc
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_atomic.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_byteorder.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_cpuflags.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_cycles.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_memcpy.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_prefetch.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_rwlock.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_spinlock.h
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_vect.h
 create mode 100755 lib/librte_eal/linuxapp/arm_pmu/Makefile
 create mode 100644 lib/librte_eal/linuxapp/arm_pmu/rte_enable_pmu.c
 create mode 100644 mk/arch/arm/rte.vars.mk
 create mode 100644 mk/machine/armv7-a/rte.vars.mk

--
1.7.4.1
[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance
On Fri, Oct 02, 2015 at 05:00:14PM +0300, Michael S. Tsirkin wrote:
> On Thu, Oct 01, 2015 at 02:02:24PM -0700, Alexander Duyck wrote:
> > validation and translation would add 10s if not 100s of nanoseconds to the
> > time needed to process each packet. In addition we are talking about doing
> > this in kernel space which means we wouldn't really be able to take
> > advantage of things like SSE or AVX instructions.
>
> Yes. But the nice thing is that it's rearming so it can happen on
> a separate core, in parallel with packet processing.
> It does not need to add to latency.
>
Modern NICs have no fewer queues than most machines have cores. There is no
such thing as a free core to offload your processing to; otherwise you
designed your application wrong and are wasting CPU cycles.

> You will burn up more CPU, but again, all this for boxes/hypervisors
> without an IOMMU.
>
> I'm sure people can come up with even better approaches, once enough
> people get it that kernel absolutely needs to be protected from
> userspace.
>
People should not "get" things which are, let's be polite here, untrue. The
kernel never tried to protect itself from userspace running on behalf of
root. Secure boot, which is quite recent, is maybe the only instance where
the kernel tries to do so (unfortunately), and it does so by disabling things
if boot is secure. Linux was always a "jack of all trades" and was suitable
to run on a machine with secure boot and on a VM that acts as an application
container or an embedded device running packet forwarding. The only valid
point is that nobody should debug crashes that may be caused by buggy
userspace, and tainting the kernel solves that.

> Long term, the right thing to do is to focus on IOMMU support. This
> gives you hardware-based memory protection without need to burn up CPU
> cycles.
>
> --
> MST

--
Gleb.
[dpdk-dev] [PATCH 1/6] cxgbe: Optimize forwarding performance for 40G
Hi Rahul,

Rahul Lakkireddy writes:
> Update sge initialization with respect to free-list manager configuration
> and ingress arbiter. Also update refill logic to refill mbufs only after
> a certain threshold for rx. Optimize tx packet prefetch and free.

<>

>  for (i = 0; i < sd->coalesce.idx; i++) {
> -        rte_pktmbuf_free(sd->coalesce.mbuf[i]);
> +        struct rte_mbuf *tmp = sd->coalesce.mbuf[i];
> +
> +        do {
> +                struct rte_mbuf *next = tmp->next;
> +
> +                rte_pktmbuf_free_seg(tmp);
> +                tmp = next;
> +        } while (tmp);
>          sd->coalesce.mbuf[i] = NULL;

Pardon my ignorance here, but rte_pktmbuf_free does this work. I can't
actually see much difference between your rewrite of this block, and
the implementation of rte_pktmbuf_free() (apart from moving your branch
to the end of the function). Did your microbenchmarking really show this
as an improvement?

Thanks for your time,
Aaron
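To make the comparison in this reply concrete, here is a minimal sketch of
the two loop shapes being discussed. It does not use DPDK: struct sketch_seg
and the sketch_* functions are hypothetical stand-ins for an mbuf segment, a
per-segment free and a whole-chain free. The only point is that both
variants visit and release the same segments; they differ in where the NULL
test sits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one buffer segment; not the DPDK rte_mbuf. */
struct sketch_seg {
    struct sketch_seg *next; /* next segment in the chain, or NULL */
    int freed;               /* set once the segment has been released */
};

/* Counts how many segments have been released so far. */
static int sketch_free_calls;

/* Release a single segment (stand-in for a per-segment free). */
static void sketch_free_seg(struct sketch_seg *s)
{
    assert(s != NULL);
    s->freed = 1;
    sketch_free_calls++;
}

/* Whole-chain free with the test at the top: a NULL head is tolerated. */
static void sketch_free_chain(struct sketch_seg *s)
{
    while (s != NULL) {
        struct sketch_seg *next = s->next; /* read before releasing */

        sketch_free_seg(s);
        s = next;
    }
}

/* Shape of the quoted hunk: test at the bottom, so the head must be non-NULL. */
static void sketch_free_chain_do_while(struct sketch_seg *s)
{
    assert(s != NULL);
    do {
        struct sketch_seg *next = s->next;

        sketch_free_seg(s);
        s = next;
    } while (s != NULL);
}
```

For any non-empty chain both functions make the same number of
sketch_free_seg() calls, which is why a measurable difference would have to
come from the branch placement alone.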
[dpdk-dev] [PATCH 3/3] example: PTP client slave minimal implementation
Add a sample application that acts as a PTP slave using the DPDK ieee1588 functions. Signed-off-by: Daniel Mrzyglod --- MAINTAINERS| 3 + examples/Makefile | 1 + examples/ptpclient/Makefile| 59 + examples/ptpclient/ptpclient.c | 525 + 4 files changed, 588 insertions(+) create mode 100644 examples/ptpclient/Makefile create mode 100644 examples/ptpclient/ptpclient.c diff --git a/MAINTAINERS b/MAINTAINERS index 080a8e8..a80ce96 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -514,3 +514,6 @@ F: examples/tep_termination/ F: examples/vmdq/ F: examples/vmdq_dcb/ F: doc/guides/sample_app_ug/vmdq_dcb_forwarding.rst + +M: Daniel Mrzyglod +F: examples/ptpclient \ No newline at end of file diff --git a/examples/Makefile b/examples/Makefile index b4eddbd..4672534 100644 --- a/examples/Makefile +++ b/examples/Makefile @@ -74,5 +74,6 @@ DIRS-$(CONFIG_RTE_LIBRTE_XEN_DOM0) += vhost_xen DIRS-y += vmdq DIRS-y += vmdq_dcb DIRS-$(CONFIG_RTE_LIBRTE_POWER) += vm_power_manager +DIRS-$(CONFIG_RTE_LIBRTE_IEEE1588) += ptpclient include $(RTE_SDK)/mk/rte.extsubdir.mk diff --git a/examples/ptpclient/Makefile b/examples/ptpclient/Makefile new file mode 100644 index 000..503339f --- /dev/null +++ b/examples/ptpclient/Makefile @@ -0,0 +1,59 @@ +# BSD LICENSE +# +# Copyright(c) 2010-2015 Intel Corporation. All rights reserved. +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in +# the documentation and/or other materials provided with the +# distribution. 
+# * Neither the name of Intel Corporation nor the names of its +# contributors may be used to endorse or promote products derived +# from this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +ifeq ($(RTE_SDK),) +$(error "Please define RTE_SDK environment variable") +endif + +# Default target, can be overriddegitn by command line or environment +RTE_TARGET ?= x86_64-native-linuxapp-gcc + +include $(RTE_SDK)/mk/rte.vars.mk + +# binary name +APP = ptpclient + +# all source are stored in SRCS-y +SRCS-y := ptpclient.c +#SRCS-$(CONFIG_RTE_LIBRTE_IEEE1588) := ptpclient.c + + +CFLAGS += $(WERROR_FLAGS) + +# workaround for a gcc bug with noreturn attribute +# http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603 +ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y) +CFLAGS_main.o += -Wno-return-type +endif + +EXTRA_CFLAGS += -O3 + +include $(RTE_SDK)/mk/rte.extapp.mk diff --git a/examples/ptpclient/ptpclient.c b/examples/ptpclient/ptpclient.c new file mode 100644 index 000..1fe8e6d --- /dev/null +++ b/examples/ptpclient/ptpclient.c @@ -0,0 +1,525 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2010-2015 Intel Corporation. All rights reserved. + * All rights reserved. 
+ * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +
[dpdk-dev] [PATCH 2/3] ixgbe: add additional ieee1588 support functions
Add additional functions to support the existing IEEE1588 functionality and to enable getting, setting and adjusting the device time. Signed-off-by: Daniel Mrzyglod --- drivers/net/ixgbe/ixgbe_ethdev.c | 250 +-- drivers/net/ixgbe/ixgbe_ethdev.h | 24 2 files changed, 263 insertions(+), 11 deletions(-) diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c index ec2918c..d0c575f 100644 --- a/drivers/net/ixgbe/ixgbe_ethdev.c +++ b/drivers/net/ixgbe/ixgbe_ethdev.c @@ -126,10 +126,12 @@ #define IXGBE_HKEY_MAX_INDEX 10 /* Additional timesync values. */ -#define IXGBE_TIMINCA_16NS_SHIFT 24 -#define IXGBE_TIMINCA_INCVALUE 1600 -#define IXGBE_TIMINCA_INIT ((0x02 << IXGBE_TIMINCA_16NS_SHIFT) \ - | IXGBE_TIMINCA_INCVALUE) +#define NSEC_PER_SEC 10L +#define IXGBE_INCVAL_10GB0x +#define IXGBE_INCVAL_SHIFT_10GB 28 +#define IXGBE_INCVAL_SHIFT_82599 7 +#define IXGBE_INCPER_SHIFT_82599 24 +#define IXGBE_CYCLECOUTER_MASK 0x static int eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev); static int eth_ixgbe_dev_uninit(struct rte_eth_dev *eth_dev); @@ -325,6 +327,11 @@ static int ixgbe_timesync_read_rx_timestamp(struct rte_eth_dev *dev, uint32_t flags); static int ixgbe_timesync_read_tx_timestamp(struct rte_eth_dev *dev, struct timespec *timestamp); +static int ixgbe_timesync_adjust(struct rte_eth_dev *dev, int64_t delta); +static int ixgbe_timesync_gettime(struct rte_eth_dev *dev, + struct timespec *timestamp); +static int ixgbe_timesync_settime(struct rte_eth_dev *dev, + struct timespec *timestamp); /* * Define VF Stats MACRO for Non "cleared on read" register @@ -465,6 +472,9 @@ static const struct eth_dev_ops ixgbe_eth_dev_ops = { .get_eeprom_length= ixgbe_get_eeprom_length, .get_eeprom = ixgbe_get_eeprom, .set_eeprom = ixgbe_set_eeprom, + .timesync_adjust = ixgbe_timesync_adjust, + .timesync_gettime = ixgbe_timesync_gettime, + .timesync_settime = ixgbe_timesync_settime, }; /* @@ -5241,20 +5251,223 @@ ixgbe_dev_set_mc_addr_list(struct rte_eth_dev *dev, 
ixgbe_dev_addr_list_itr, TRUE); } +static inline uint64_t +timespec_to_ns(const struct timespec *ts) +{ + return ((uint64_t) ts->tv_sec * NSEC_PER_SEC) + ts->tv_nsec; +} + +static struct timespec +ns_to_timespec(int64_t nsec) +{ +struct timespec ts = {0, 0}; +int32_t rem; + +if (nsec == 0) +return ts; +rem = nsec % NSEC_PER_SEC; +ts.tv_sec = nsec / NSEC_PER_SEC; + +if (unlikely(rem < 0)) { +ts.tv_sec--; +rem += NSEC_PER_SEC; +} + +ts.tv_nsec = rem; + +return ts; +} + +static inline uint64_t +cyclecounter_cycles_to_ns(const struct cyclecounter *cc, + uint64_t cycles, uint64_t mask, uint64_t *frac) +{ + uint64_t ns = cycles; + + ns = (ns * cc->mult) + *frac; + *frac = ns & mask; + return ns >> cc->shift; +} + +static uint64_t +cyclecounter_cycles_to_ns_backwards(const struct cyclecounter *cc, + uint64_t cycles, uint64_t mask __rte_unused, uint64_t frac) +{ + uint64_t ns = (uint64_t) cycles; + + ns = ((ns * cc->mult) - frac) >> cc->shift; + + return ns; +} + +static uint64_t +timecounter_cycles_to_ns_time(struct timecounter *tc, uint64_t cycle_tstamp) +{ + uint64_t delta = (cycle_tstamp - tc->cycle_last) & tc->cc->mask; + uint64_t nsec = tc->nsec, frac = tc->frac; + + + /* Cycle counts that are corectly converted as they +* are between -1/2 max cycle count and +1/2max cycle count +* */ + if (delta > tc->cc->mask / 2) { + delta = (tc->cycle_last - cycle_tstamp) & tc->cc->mask; + nsec -= cyclecounter_cycles_to_ns_backwards(tc->cc, delta, tc->mask, frac); + } else { + nsec += cyclecounter_cycles_to_ns(tc->cc, delta, tc->mask, ); + } + + return nsec; +} + +static uint64_t +ixgbe_read_timesync_cyclecounter(struct rte_eth_dev *dev) +{ + struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private); + uint64_t systim_cycles = 0; + + systim_cycles |= (uint64_t)IXGBE_READ_REG(hw, IXGBE_SYSTIML); + systim_cycles |= (uint64_t)IXGBE_READ_REG(hw, IXGBE_SYSTIMH) << 32; + + return systim_cycles; +} + +static uint64_t +timecounter_read_ns_delta(struct rte_eth_dev *dev) +{ + 
uint64_t cycle_now, cycle_delta; + uint64_t ns_offset; + struct ixgbe_adapter *adapter = + (struct ixgbe_adapter *)dev->data->dev_private; + + /* read cycle counter: */ +
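The conversion helpers in this patch can be illustrated with a small
standalone sketch. It is not the driver code: the sketch_* names are
hypothetical, and SKETCH_NSEC_PER_SEC is simply the usual 10^9 nanoseconds
per second, chosen here as an assumption because the corresponding define
does not survive intact in the archived text. The first two functions show
the borrow needed when a negative nanosecond count is split into a
timespec; the third shows the multiply/shift conversion that carries the
fractional remainder between calls.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Assumption for this sketch: the standard 10^9 nanoseconds per second. */
#define SKETCH_NSEC_PER_SEC 1000000000LL

/* Flatten a timespec into a signed nanosecond count. */
static inline int64_t sketch_timespec_to_ns(const struct timespec *ts)
{
    assert(ts != NULL);
    return (int64_t)ts->tv_sec * SKETCH_NSEC_PER_SEC + ts->tv_nsec;
}

/*
 * Split a signed nanosecond count into seconds and nanoseconds.
 * C division truncates toward zero, so a negative input leaves a negative
 * remainder; borrow one second so that tv_nsec ends up in [0, 10^9).
 */
static inline struct timespec sketch_ns_to_timespec(int64_t nsec)
{
    struct timespec ts;
    int64_t sec = nsec / SKETCH_NSEC_PER_SEC;
    int64_t rem = nsec % SKETCH_NSEC_PER_SEC;

    if (rem < 0) {
        sec--;
        rem += SKETCH_NSEC_PER_SEC;
    }
    ts.tv_sec = (time_t)sec;
    ts.tv_nsec = (long)rem;
    return ts;
}

/*
 * Convert a cycle delta to nanoseconds as (cycles * mult) >> shift.
 * The low 'shift' bits that the shift would discard are kept in *frac and
 * added back on the next call, so rounding error does not accumulate.
 */
static inline uint64_t sketch_cycles_to_ns(uint64_t cycles, uint32_t mult,
                                           uint32_t shift, uint64_t *frac)
{
    uint64_t mask, scaled;

    assert(frac != NULL && shift < 64);
    mask = (UINT64_C(1) << shift) - 1;
    scaled = cycles * mult + *frac;
    *frac = scaled & mask;
    return scaled >> shift;
}
```

With mult = 3 and shift = 1 (1.5 ns per cycle), two successive one-cycle
conversions return 1 and then 2, so the total of 3 ns is exact even though
each individual result is truncated.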
[dpdk-dev] [PATCH 1/3] ethdev: add additional ieee1588 support functions
Add additional functions to support the existing IEEE1588 functionality. * rte_eth_timesync_settime(), function to set the device clock time. * rte_eth_timesync_gettime, function to get the device clock time. * rte_eth_timesync_adjust, function to adjust the device clock time. Signed-off-by: Daniel Mrzyglod --- lib/librte_ether/rte_ethdev.c | 36 +++ lib/librte_ether/rte_ethdev.h | 64 ++ lib/librte_ether/rte_ether_version.map | 9 + 3 files changed, 109 insertions(+) diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c index f593f6e..6f26f3a 100644 --- a/lib/librte_ether/rte_ethdev.c +++ b/lib/librte_ether/rte_ethdev.c @@ -3272,6 +3272,42 @@ rte_eth_timesync_read_rx_timestamp(uint8_t port_id, struct timespec *timestamp, } int +rte_eth_timesync_adjust(uint8_t port_id, int64_t delta) +{ + struct rte_eth_dev *dev; + + VALID_PORTID_OR_ERR_RET(port_id, -ENODEV); + dev = _eth_devices[port_id]; + + FUNC_PTR_OR_ERR_RET(*dev->dev_ops->timesync_adjust, -ENOTSUP); + return (*dev->dev_ops->timesync_adjust)(dev, delta); +} + +int +rte_eth_timesync_gettime(uint8_t port_id, struct timespec *timestamp) +{ + struct rte_eth_dev *dev; + + VALID_PORTID_OR_ERR_RET(port_id, -ENODEV); + dev = _eth_devices[port_id]; + + FUNC_PTR_OR_ERR_RET(*dev->dev_ops->timesync_gettime, -ENOTSUP); + return (*dev->dev_ops->timesync_gettime)(dev, timestamp); +} + +int +rte_eth_timesync_settime(uint8_t port_id, struct timespec *timestamp) +{ + struct rte_eth_dev *dev; + + VALID_PORTID_OR_ERR_RET(port_id, -ENODEV); + dev = _eth_devices[port_id]; + + FUNC_PTR_OR_ERR_RET(*dev->dev_ops->timesync_settime, -ENOTSUP); + return (*dev->dev_ops->timesync_settime)(dev, timestamp); +} + +int rte_eth_timesync_read_tx_timestamp(uint8_t port_id, struct timespec *timestamp) { struct rte_eth_dev *dev; diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h index 8a8c82b..6fdaacd 100644 --- a/lib/librte_ether/rte_ethdev.h +++ b/lib/librte_ether/rte_ethdev.h @@ -1129,6 +1129,17 @@ 
typedef int (*eth_timesync_read_tx_timestamp_t)(struct rte_eth_dev *dev, struct timespec *timestamp); /**< @internal Function used to read a TX IEEE1588/802.1AS timestamp. */ +typedef int (*eth_timesync_adjust)(struct rte_eth_dev *dev, int64_t); +/**< @internal Function used to adjust device clock */ + +typedef int (*eth_timesync_gettime)(struct rte_eth_dev *dev, + struct timespec *timestamp); +/**< @internal Function used to get time from device clock. */ + +typedef int (*eth_timesync_settime)(struct rte_eth_dev *dev, + struct timespec *timestamp); +/**< @internal Function used to get time from device clock */ + typedef int (*eth_get_reg_length_t)(struct rte_eth_dev *dev); /**< @internal Retrieve device register count */ @@ -1312,6 +1323,12 @@ struct eth_dev_ops { eth_timesync_read_rx_timestamp_t timesync_read_rx_timestamp; /** Read the IEEE1588/802.1AS TX timestamp. */ eth_timesync_read_tx_timestamp_t timesync_read_tx_timestamp; + /** Adjust the device clock */ + eth_timesync_adjust timesync_adjust; + /** Get the device clock timespec */ + eth_timesync_gettime timesync_gettime; + /** Set the device clock timespec */ + eth_timesync_settime timesync_settime; }; /** @@ -3598,6 +3615,53 @@ extern int rte_eth_timesync_read_rx_timestamp(uint8_t port_id, extern int rte_eth_timesync_read_tx_timestamp(uint8_t port_id, struct timespec *timestamp); +/** + * Adjust the timesync clock on an Ethernet device.. + * + * @param port_id + * The port identifier of the Ethernet device. + * @param delta + * The adjustment in nanoseconds + * + * @return + * - 0: Success. + * - -ENODEV: The port ID is invalid. + * - -ENOTSUP: The function is not supported by the Ethernet driver. + */ +extern int rte_eth_timesync_adjust(uint8_t port_id, int64_t delta); + +/** + * Read the time from the timesync clock on an Ethernet device. + * + * @param port_id + * The port identifier of the Ethernet device. + * @param time + * Pointer to the timespec struct. + * + * @return + * - 0: Success. 
+ */ +extern int rte_eth_timesync_gettime(uint8_t port_id, + struct timespec *time); + + +/** + * Set the time of the timesync clock on an Ethernet device. + * + * @param port_id + * The port identifier of the Ethernet device. + * @param time + * Pointer to the timespec struct. + * + * @return + * - 0: Success. + * - -EINVAL: No timestamp is available. + * - -ENODEV: The port ID is invalid. + * - -ENOTSUP: The function is not supported by the Ethernet driver. + */ +extern int rte_eth_timesync_settime(uint8_t port_id, +
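The three wrappers added to rte_ethdev.c share one pattern: validate the
port, check that the driver filled in the callback, then dispatch. A minimal
standalone sketch of that pattern follows. Everything in it is hypothetical
(struct sketch_dev, sketch_devices, sketch_timesync_adjust and the software
"driver"); it only mirrors the -ENODEV / -ENOTSUP return convention
documented in the header comments above, not the real ethdev structures.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_PORTS 4

struct sketch_dev;

/* Per-driver callback table; a driver leaves unsupported entries NULL. */
struct sketch_dev_ops {
    int (*timesync_adjust)(struct sketch_dev *dev, int64_t delta);
};

/* Hypothetical device record, standing in for an ethdev entry. */
struct sketch_dev {
    const struct sketch_dev_ops *ops;
    int64_t clock_ns; /* software model of the device clock */
    int attached;     /* non-zero when the port is in use */
};

static struct sketch_dev sketch_devices[SKETCH_MAX_PORTS];

/* Generic wrapper: port check, callback check, then dispatch. */
static int sketch_timesync_adjust(uint8_t port_id, int64_t delta)
{
    struct sketch_dev *dev;

    if (port_id >= SKETCH_MAX_PORTS || !sketch_devices[port_id].attached)
        return -ENODEV;
    dev = &sketch_devices[port_id];

    if (dev->ops == NULL || dev->ops->timesync_adjust == NULL)
        return -ENOTSUP;
    return dev->ops->timesync_adjust(dev, delta);
}

/* Example "driver" callback: shift the modelled clock by delta ns. */
static int sketch_sw_adjust(struct sketch_dev *dev, int64_t delta)
{
    assert(dev != NULL);
    dev->clock_ns += delta;
    return 0;
}

/* A driver that supports adjustment, and one that does not. */
static const struct sketch_dev_ops sketch_sw_ops = {
    .timesync_adjust = sketch_sw_adjust,
};
static const struct sketch_dev_ops sketch_empty_ops = {
    .timesync_adjust = NULL,
};
```

Keeping the checks in the generic wrapper means a driver only has to supply
the callback; callers get a uniform error code when it is missing.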
[dpdk-dev] [PATCH 0/3] add sample ptp slave application
Add a sample application that acts as a PTP slave using the DPDK IEEE1588
functions. Also add some additional IEEE1588 support functions to enable
getting, setting and adjusting the device time.

Some V1 limitations of the app:
* The master clock sequence id and clock id are not verified fully.
* Only one master clock is supported/assumed.

To be added:
* Support for igb and i40e.
* Multiple checks on clock source.
* Some additional protocol values may be required to be parsed for more
  complex PTP environments.
* Add frequency adjustment as well as absolute time adjustment.
* Make the implementation NIC speed independent.
* Check for linkup/down.

Daniel Mrzyglod (3):
  ethdev: add additional ieee1588 support functions
  ixgbe: add additional ieee1588 support functions
  example: PTP client slave minimal implementation

 MAINTAINERS | 3 +
 drivers/net/ixgbe/ixgbe_ethdev.c | 250 +++-
 drivers/net/ixgbe/ixgbe_ethdev.h | 24 ++
 examples/Makefile | 1 +
 examples/ptpclient/Makefile | 59
 examples/ptpclient/ptpclient.c | 525 +
 lib/librte_ether/rte_ethdev.c | 36 +++
 lib/librte_ether/rte_ethdev.h | 64
 lib/librte_ether/rte_ether_version.map | 9 +
 9 files changed, 960 insertions(+), 11 deletions(-)
 create mode 100644 examples/ptpclient/Makefile
 create mode 100644 examples/ptpclient/ptpclient.c

--
2.1.0
[dpdk-dev] [PATCH] hash: free internal ring when freeing hash
Since freeing a ring is now possible, then when freeing a hash table,
its internal ring can be freed as well.
Therefore when a new table, with the same name as a previously freed
table, is created, there is no need to look up the already allocated ring.

Signed-off-by: Pablo de Lara
---
This patch depends on patch "ring: add function to free a ring"
(http://dpdk.org/dev/patchwork/patch/7376/)

 lib/librte_hash/rte_cuckoo_hash.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index 7019763..409fc2e 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -180,7 +180,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
         struct rte_hash_list *hash_list;
         struct rte_ring *r = NULL;
         char hash_name[RTE_HASH_NAMESIZE];
-        void *ptr, *k = NULL;
+        void *k = NULL;
         void *buckets = NULL;
         char ring_name[RTE_RING_NAMESIZE];
         unsigned i;
@@ -288,13 +288,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
 #endif

         snprintf(ring_name, sizeof(ring_name), "HT_%s", params->name);
-        r = rte_ring_lookup(ring_name);
-        if (r != NULL) {
-                /* clear the free ring */
-                while (rte_ring_dequeue(r, &ptr) == 0)
-                        rte_pause();
-        } else
-                r = rte_ring_create(ring_name, rte_align32pow2(params->entries + 1),
+        r = rte_ring_create(ring_name, rte_align32pow2(params->entries + 1),
                                 params->socket_id, 0);
         if (r == NULL) {
                 RTE_LOG(ERR, HASH, "memory allocation failed\n");
@@ -363,6 +357,7 @@ rte_hash_free(struct rte_hash *h)

         rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);

+        rte_ring_free(h->free_slots);
         rte_free(h->key_store);
         rte_free(h->buckets);
         rte_free(h);
--
2.4.3
[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance
On Thu, Oct 01, 2015 at 02:02:24PM -0700, Alexander Duyck wrote:
> validation and translation would add 10s if not 100s of nanoseconds to the
> time needed to process each packet. In addition we are talking about doing
> this in kernel space which means we wouldn't really be able to take
> advantage of things like SSE or AVX instructions.

Yes. But the nice thing is that it's rearming so it can happen on
a separate core, in parallel with packet processing.
It does not need to add to latency.

You will burn up more CPU, but again, all this for boxes/hypervisors
without an IOMMU.

I'm sure people can come up with even better approaches, once enough
people get it that kernel absolutely needs to be protected from
userspace.

Long term, the right thing to do is to focus on IOMMU support. This
gives you hardware-based memory protection without need to burn up CPU
cycles.

--
MST
[dpdk-dev] [PATCH v4] ring: add function to free a ring
From: "De Lara Guarch, Pablo"

When creating a ring, a memzone is created to allocate it in memory,
but the ring could not be freed, as memzones could not be.

Since memzones can be freed now, then rings can be as well,
taking into account if they were initialized using pre-allocated memory
(in which case, memory should be freed externally) or using
rte_memzone_reserve (with rte_ring_create), freeing the memory
with rte_memzone_free.

Signed-off-by: Pablo de Lara
---

Changes in v4:
- Include below missing patch ID which this patch depends on

Changes in v3:
- Simplify patch using stored memzone address in ring structure
- Change copyright date

Changes in v2:
- Include note in release notes
- Add error log when ring cannot be freed

This patch depends on patch "rte_ring: store memzone pointer inside ring"
(http://dpdk.org/dev/patchwork/patch/7308)

 doc/guides/rel_notes/release_2_2.rst | 4 +++
 lib/librte_ring/rte_ring.c | 47 +++-
 lib/librte_ring/rte_ring.h | 7 ++
 lib/librte_ring/rte_ring_version.map | 7 ++
 4 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
index 5687676..24937ac 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -4,6 +4,10 @@ DPDK Release 2.2
 New Features

+* **Enabled freeing of rte_ring.**
+
+  New function rte_ring_free() allows the user to free a ring
+  if it was created with rte_ring_create().

 Resolved Issues
 ---
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 4e78e14..d80faf3 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -209,6 +209,51 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
         return r;
 }

+/* free the ring */
+void
+rte_ring_free(struct rte_ring *r)
+{
+        struct rte_ring_list *ring_list = NULL;
+        struct rte_tailq_entry *te;
+
+        if (r == NULL)
+                return;
+
+        /*
+         * Ring was not created with rte_ring_create,
+         * therefore, there is no memzone to free.
+         */
+        if (r->memzone == NULL) {
+                RTE_LOG(ERR, RING, "Cannot free ring (not created with rte_ring_create())\n");
+                return;
+        }
+
+        if (rte_memzone_free(r->memzone) != 0) {
+                RTE_LOG(ERR, RING, "Cannot free memory\n");
+                return;
+        }
+
+        ring_list = RTE_TAILQ_CAST(rte_ring_tailq.head, rte_ring_list);
+        rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+
+        /* find out tailq entry */
+        TAILQ_FOREACH(te, ring_list, next) {
+                if (te->data == (void *) r)
+                        break;
+        }
+
+        if (te == NULL) {
+                rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+                return;
+        }
+
+        TAILQ_REMOVE(ring_list, te, next);
+
+        rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+
+        rte_free(te);
+}
+
 /*
  * change the high water mark. If *count* is 0, water marking is
  * disabled
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index df45f3f..fb5a626 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -304,6 +304,13 @@ int rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
  */
 struct rte_ring *rte_ring_create(const char *name, unsigned count,
                                  int socket_id, unsigned flags);
+/**
+ * De-allocate all memory used by the ring.
+ *
+ * @param r
+ *   Ring to free
+ */
+void rte_ring_free(struct rte_ring *r);

 /**
  * Change the high water mark.
diff --git a/lib/librte_ring/rte_ring_version.map b/lib/librte_ring/rte_ring_version.map
index 982fdd1..5474b98 100644
--- a/lib/librte_ring/rte_ring_version.map
+++ b/lib/librte_ring/rte_ring_version.map
@@ -11,3 +11,10 @@ DPDK_2.0 {

         local: *;
 };
+
+DPDK_2.2 {
+        global:
+
+        rte_ring_free;
+
+} DPDK_2.0;
--
2.4.3
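The ownership rule described in the commit message (free only what the
create path allocated, leave caller-supplied memory alone) can be sketched
without DPDK as follows. The names are hypothetical, malloc-style allocation
stands in for the memzone, and unlike the patch's void function this sketch
returns a status code purely so the two cases are easy to observe.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical ring header; 'backing' plays the role of the memzone pointer. */
struct sketch_ring {
    void *backing;  /* non-NULL only if the create path allocated the ring */
    unsigned count; /* number of slots */
};

/* Allocate a ring and record that the library owns its memory. */
static struct sketch_ring *sketch_ring_create(unsigned count)
{
    struct sketch_ring *r = calloc(1, sizeof(*r));

    if (r == NULL)
        return NULL;
    r->backing = r;
    r->count = count;
    return r;
}

/* Initialise a ring in caller-supplied memory: nothing for the library to free. */
static void sketch_ring_init(struct sketch_ring *r, unsigned count)
{
    assert(r != NULL);
    r->backing = NULL;
    r->count = count;
}

/*
 * Free a ring. Returns 0 if the ring was released (or r was NULL) and
 * -1 if the memory belongs to the caller and was left untouched.
 */
static int sketch_ring_free(struct sketch_ring *r)
{
    if (r == NULL)
        return 0;
    if (r->backing == NULL)
        return -1; /* not created by sketch_ring_create() */
    free(r->backing);
    return 0;
}
```

A ring set up with sketch_ring_init() on the stack is rejected by
sketch_ring_free(), while one obtained from sketch_ring_create() is
released, which is the distinction the r->memzone == NULL check draws in
the patch.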
[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance
On Thu, Oct 01, 2015 at 02:17:49PM -0700, Alexander Duyck wrote:
> On 10/01/2015 02:42 AM, Michael S. Tsirkin wrote:
> >On Thu, Oct 01, 2015 at 12:22:46PM +0300, Avi Kivity wrote:
> >>even when they are some users
> >>prefer to avoid the performance penalty.
> >I don't think there's a measureable penalty from passing through the
> >IOMMU, as long as mappings are mostly static (i.e. iommu=pt). I sure
> >never saw any numbers that show such.
>
> It depends on the IOMMU. I believe Intel had a performance penalty on all
> CPUs prior to Ivy Bridge. Since then things have improved to where they are
> comparable to bare metal.
>
> The graph on page 5 of
> https://networkbuilders.intel.com/docs/Network_Builders_RA_vBRAS_Final.pdf
> shows the penalty clear as day. Pretty much anything before Ivy Bridge w/
> small packets is slowed to a crawl with an IOMMU enabled.
>
> - Alex

VMs are running with IOMMU enabled anyway.
Avi here tells us no one uses SRIOV on bare metal so ...
we don't need to argue about that.

--
MST
[dpdk-dev] [PATCH 6/6] doc: Update cxgbe documentation and release notes
- Add a missed step to mount huge pages in Linux. - Re-structure Sample Application Notes. - Add Jumbo Frame support to list of supported features and instructions on how to enable it via testpmd. - Update release notes. Signed-off-by: Rahul Lakkireddy Signed-off-by: Kumar Sanghvi --- doc/guides/nics/cxgbe.rst| 81 +--- doc/guides/rel_notes/release_2_2.rst | 5 +++ 2 files changed, 61 insertions(+), 25 deletions(-) diff --git a/doc/guides/nics/cxgbe.rst b/doc/guides/nics/cxgbe.rst index 148cd25..d718f19 100644 --- a/doc/guides/nics/cxgbe.rst +++ b/doc/guides/nics/cxgbe.rst @@ -50,6 +50,7 @@ CXGBE PMD has support for: - Promiscuous mode - All multicast mode - Port hardware statistics +- Jumbo frames Limitations --- @@ -211,8 +212,8 @@ Unified Wire package for Linux operating system are as follows: firmware-version: 1.13.32.0, TP 0.1.4.8 -Sample Application Notes - +Running testpmd +~~~ This section demonstrates how to launch **testpmd** with Chelsio T5 devices managed by librte_pmd_cxgbe in Linux operating system. @@ -260,6 +261,13 @@ devices managed by librte_pmd_cxgbe in Linux operating system. echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages/nr_hugepages +#. Mount huge pages: + + .. code-block:: console + + mkdir /mnt/huge + mount -t hugetlbfs nodev /mnt/huge + #. Load igb_uio or vfio-pci driver: .. code-block:: console @@ -329,19 +337,7 @@ devices managed by librte_pmd_cxgbe in Linux operating system. .. note:: Flow control pause TX/RX is disabled by default and can be enabled via - testpmd as follows: - - .. code-block:: console - - testpmd> set flow_ctrl rx on tx on 0 0 0 0 mac_ctrl_frame_fwd off autoneg on 0 - testpmd> set flow_ctrl rx on tx on 0 0 0 0 mac_ctrl_frame_fwd off autoneg on 1 - - To disable again, use: - - .. code-block:: console - - testpmd> set flow_ctrl rx off tx off 0 0 0 0 mac_ctrl_frame_fwd off autoneg off 0 - testpmd> set flow_ctrl rx off tx off 0 0 0 0 mac_ctrl_frame_fwd off autoneg off 1 + testpmd. 
Refer section :ref:`flow-control` for more details. FreeBSD --- @@ -409,8 +405,8 @@ Unified Wire package for FreeBSD operating system are as follows: dev.t5nex.0.firmware_version: 1.13.32.0 -Sample Application Notes - +Running testpmd +~~~ This section demonstrates how to launch **testpmd** with Chelsio T5 devices managed by librte_pmd_cxgbe in FreeBSD operating system. @@ -543,16 +539,51 @@ devices managed by librte_pmd_cxgbe in FreeBSD operating system. .. note:: Flow control pause TX/RX is disabled by default and can be enabled via - testpmd as follows: + testpmd. Refer section :ref:`flow-control` for more details. - .. code-block:: console +Sample Application Notes + - testpmd> set flow_ctrl rx on tx on 0 0 0 0 mac_ctrl_frame_fwd off autoneg on 0 - testpmd> set flow_ctrl rx on tx on 0 0 0 0 mac_ctrl_frame_fwd off autoneg on 1 +.. _flow-control: - To disable again, use: +Enable/Disable Flow Control +~~~ - .. code-block:: console +Flow control pause TX/RX is disabled by default and can be enabled via +testpmd as follows: + +.. code-block:: console + + testpmd> set flow_ctrl rx on tx on 0 0 0 0 mac_ctrl_frame_fwd off autoneg on 0 + testpmd> set flow_ctrl rx on tx on 0 0 0 0 mac_ctrl_frame_fwd off autoneg on 1 + +To disable again, run: + +.. code-block:: console + + testpmd> set flow_ctrl rx off tx off 0 0 0 0 mac_ctrl_frame_fwd off autoneg off 0 + testpmd> set flow_ctrl rx off tx off 0 0 0 0 mac_ctrl_frame_fwd off autoneg off 1 + +Jumbo Mode +~~ + +There are two ways to enable sending and receiving of jumbo frames via testpmd. +One method involves using the **mtu** command, which changes the mtu of an +individual port without having to stop the selected port. Another method +involves stopping all the ports first and then running **max-pkt-len** command +to configure the mtu of all the ports with a single command. + +- To configure each port individually, run the mtu command as follows: + + .. 
code-block:: console + + testpmd> port config mtu 0 9000 + testpmd> port config mtu 1 9000 + +- To configure all the ports at once, stop all the ports first and run the + max-pkt-len command as follows: + + .. code-block:: console - testpmd> set flow_ctrl rx off tx off 0 0 0 0 mac_ctrl_frame_fwd off autoneg off 0 - testpmd> set flow_ctrl rx off tx off 0 0 0 0 mac_ctrl_frame_fwd off autoneg off 1 + testpmd> port stop all + testpmd> port config all max-pkt-len 9000 diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst index 5687676..a3f4f77 100644 --- a/doc/guides/rel_notes/release_2_2.rst +++ b/doc/guides/rel_notes/release_2_2.rst @@ -4,6 +4,11 @@ DPDK Release 2.2 New
[dpdk-dev] [PATCH 5/6] cxgbe: Allow apps to change mtu
Add a mtu_set() eth_dev_ops to allow DPDK apps to modify device mtu. Signed-off-by: Rahul Lakkireddy Signed-off-by: Kumar Sanghvi --- drivers/net/cxgbe/cxgbe_ethdev.c | 29 + 1 file changed, 29 insertions(+) diff --git a/drivers/net/cxgbe/cxgbe_ethdev.c b/drivers/net/cxgbe/cxgbe_ethdev.c index 6d7b29c..a8e057b 100644 --- a/drivers/net/cxgbe/cxgbe_ethdev.c +++ b/drivers/net/cxgbe/cxgbe_ethdev.c @@ -225,6 +225,34 @@ static int cxgbe_dev_link_update(struct rte_eth_dev *eth_dev, return 0; } +static int cxgbe_dev_mtu_set(struct rte_eth_dev *eth_dev, uint16_t mtu) +{ + struct port_info *pi = (struct port_info *)(eth_dev->data->dev_private); + struct adapter *adapter = pi->adapter; + struct rte_eth_dev_info dev_info; + int err; + uint16_t new_mtu = mtu + ETHER_HDR_LEN + ETHER_CRC_LEN; + + cxgbe_dev_info_get(eth_dev, &dev_info); + + /* Must accommodate at least ETHER_MIN_MTU */ + if ((new_mtu < ETHER_MIN_MTU) || (new_mtu > dev_info.max_rx_pktlen)) + return -EINVAL; + + /* set to jumbo mode if needed */ + if (new_mtu > ETHER_MAX_LEN) + eth_dev->data->dev_conf.rxmode.jumbo_frame = 1; + else + eth_dev->data->dev_conf.rxmode.jumbo_frame = 0; + + err = t4_set_rxmode(adapter, adapter->mbox, pi->viid, new_mtu, -1, -1, + -1, -1, true); + if (!err) + eth_dev->data->dev_conf.rxmode.max_rx_pkt_len = new_mtu; + + return err; +} + static int cxgbe_dev_tx_queue_start(struct rte_eth_dev *eth_dev, uint16_t tx_queue_id); static int cxgbe_dev_rx_queue_start(struct rte_eth_dev *eth_dev, @@ -724,6 +752,7 @@ static struct eth_dev_ops cxgbe_eth_dev_ops = { .dev_configure = cxgbe_dev_configure, .dev_infos_get = cxgbe_dev_info_get, .link_update= cxgbe_dev_link_update, + .mtu_set= cxgbe_dev_mtu_set, .tx_queue_setup = cxgbe_dev_tx_queue_setup, .tx_queue_start = cxgbe_dev_tx_queue_start, .tx_queue_stop = cxgbe_dev_tx_queue_stop, -- 2.5.3
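The arithmetic in cxgbe_dev_mtu_set() can be tried in isolation with a small stand-alone sketch. The constants below are local copies of the usual Ethernet values (14-byte header, 4-byte CRC, 68-byte minimum, 1518-byte standard maximum) and the 9000-byte jumbo limit from this series; `mtu_to_frame_len` and the `SK_` names are hypothetical. Like the patch, it compares the computed frame length (not the MTU itself) against the minimum; unlike the patch, it uses a 32-bit intermediate so a very large MTU cannot wrap around.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the usual Ethernet constants (assumed values for this sketch). */
#define SK_ETHER_HDR_LEN 14
#define SK_ETHER_CRC_LEN 4
#define SK_ETHER_MIN_MTU 68
#define SK_ETHER_MAX_LEN 1518
#define SK_MAX_RX_PKTLEN (9000 + SK_ETHER_HDR_LEN + SK_ETHER_CRC_LEN)

/*
 * Convert an MTU to an on-wire frame length and report whether jumbo mode
 * would be needed. Returns 0 on success, -EINVAL if the result is out of range.
 */
static int
mtu_to_frame_len(uint16_t mtu, uint32_t *frame_len, bool *jumbo)
{
	uint32_t len = (uint32_t)mtu + SK_ETHER_HDR_LEN + SK_ETHER_CRC_LEN;

	if (len < SK_ETHER_MIN_MTU || len > SK_MAX_RX_PKTLEN)
		return -EINVAL;

	*frame_len = len;
	*jumbo = len > SK_ETHER_MAX_LEN;
	return 0;
}
```

With these values an MTU of 1500 maps to a 1518-byte frame without jumbo mode, 9000 maps to 9018 bytes with jumbo mode, and 9001 is rejected.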
[dpdk-dev] [PATCH 4/6] cxgbe: Update rx path to receive jumbo frames
Ensure jumbo mode is enabled and that the mbuf data room size can accommodate jumbo size. If the mbuf data room size can't accommodate jumbo size, chain mbufs to jumbo size. Signed-off-by: Rahul Lakkireddy Signed-off-by: Kumar Sanghvi --- drivers/net/cxgbe/sge.c | 58 - 1 file changed, 53 insertions(+), 5 deletions(-) diff --git a/drivers/net/cxgbe/sge.c b/drivers/net/cxgbe/sge.c index 921173a..91ef363 100644 --- a/drivers/net/cxgbe/sge.c +++ b/drivers/net/cxgbe/sge.c @@ -247,6 +247,29 @@ static inline bool fl_starving(const struct adapter *adapter, return fl->avail - fl->pend_cred <= s->fl_starve_thres; } +static inline unsigned int get_buf_size(struct adapter *adapter, + const struct rx_sw_desc *d) +{ + unsigned int rx_buf_size_idx = d->dma_addr & RX_BUF_SIZE; + unsigned int buf_size = 0; + + switch (rx_buf_size_idx) { + case RX_SMALL_MTU_BUF: + buf_size = FL_MTU_SMALL_BUFSIZE(adapter); + break; + + case RX_LARGE_MTU_BUF: + buf_size = FL_MTU_LARGE_BUFSIZE(adapter); + break; + + default: + BUG_ON(1); + /* NOT REACHED */ + } + + return buf_size; +} + /** * free_rx_bufs - free the Rx buffers on an SGE free list * @q: the SGE free list to free buffers from @@ -362,6 +385,14 @@ static unsigned int refill_fl_usembufs(struct adapter *adap, struct sge_fl *q, unsigned int buf_size_idx = RX_SMALL_MTU_BUF; struct rte_mbuf *buf_bulk[n]; int ret, i; + struct rte_pktmbuf_pool_private *mbp_priv; + u8 jumbo_en = rxq->rspq.eth_dev->data->dev_conf.rxmode.jumbo_frame; + + /* Use jumbo mtu buffers iff mbuf data room size can fit jumbo data. 
*/ + mbp_priv = rte_mempool_get_priv(rxq->rspq.mb_pool); + if (jumbo_en && + ((mbp_priv->mbuf_data_room_size - RTE_PKTMBUF_HEADROOM) >= 9000)) + buf_size_idx = RX_LARGE_MTU_BUF; ret = rte_mempool_get_bulk(rxq->rspq.mb_pool, (void *)buf_bulk, n); if (unlikely(ret != 0)) { @@ -1439,14 +1470,31 @@ static int process_responses(struct sge_rspq *q, int budget, const struct cpl_rx_pkt *cpl = (const void *)&q->cur_desc[1]; bool csum_ok = cpl->csum_calc && !cpl->err_vec; - struct rte_mbuf *pkt; - u32 len = ntohl(rc->pldbuflen_qid); + struct rte_mbuf *pkt, *npkt; + u32 len, bufsz; + len = ntohl(rc->pldbuflen_qid); BUG_ON(!(len & F_RSPD_NEWBUF)); pkt = rsd->buf; - pkt->data_len = G_RSPD_LEN(len); - pkt->pkt_len = pkt->data_len; - unmap_rx_buf(&rxq->fl); + npkt = pkt; + len = G_RSPD_LEN(len); + pkt->pkt_len = len; + + /* Chain mbufs into len if necessary */ + while (len) { + struct rte_mbuf *new_pkt = rsd->buf; + + bufsz = min(get_buf_size(q->adapter, rsd), len); + new_pkt->data_len = bufsz; + unmap_rx_buf(&rxq->fl); + len -= bufsz; + npkt->next = new_pkt; + npkt = new_pkt; + pkt->nb_segs++; + rsd = &rxq->fl.sdesc[rxq->fl.cidx]; + } + npkt->next = NULL; + pkt->nb_segs--; if (cpl->l2info & htonl(F_RXF_IP)) { pkt->packet_type = RTE_PTYPE_L3_IPV4; -- 2.5.3
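The chaining loop in process_responses() splits one received packet across as many free-list buffers as needed, each segment taking min(buffer size, remaining length). A stand-alone sketch of just that arithmetic, with no mbufs involved, might look as follows; `split_into_segments` and its parameters are hypothetical names used only for illustration.

```c
#include <stdint.h>

/*
 * Split pkt_len bytes across buffers of buf_size bytes each, writing the
 * per-segment data lengths to seg_len[]. Returns the number of segments,
 * or -1 if buf_size is zero or more than max_segs segments are needed.
 */
static int
split_into_segments(uint32_t pkt_len, uint32_t buf_size,
		    uint32_t *seg_len, unsigned int max_segs)
{
	unsigned int nb_segs = 0;

	if (buf_size == 0)
		return -1;

	while (pkt_len > 0) {
		/* Each segment holds at most one buffer's worth of data. */
		uint32_t chunk = pkt_len < buf_size ? pkt_len : buf_size;

		if (nb_segs == max_segs)
			return -1;
		seg_len[nb_segs++] = chunk;
		pkt_len -= chunk;
	}

	return (int)nb_segs;
}
```

For example, a 9000-byte packet with 2048-byte buffers yields five segments: four of 2048 bytes and a last one of 808 bytes.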
[dpdk-dev] [PATCH 3/6] cxgbe: Update tx path to transmit jumbo frames
Add a non-coalesce path. Skip coalescing for Jumbo Frames, and send the packet through non-coalesced path if there are enough credits. Also, free these non-coalesced packets while reclaiming credits. Signed-off-by: Rahul Lakkireddy Signed-off-by: Kumar Sanghvi --- drivers/net/cxgbe/sge.c | 96 - 1 file changed, 64 insertions(+), 32 deletions(-) diff --git a/drivers/net/cxgbe/sge.c b/drivers/net/cxgbe/sge.c index e540881..921173a 100644 --- a/drivers/net/cxgbe/sge.c +++ b/drivers/net/cxgbe/sge.c @@ -199,11 +199,20 @@ static void free_tx_desc(struct sge_txq *q, unsigned int n) static void reclaim_tx_desc(struct sge_txq *q, unsigned int n) { + struct tx_sw_desc *d; unsigned int cidx = q->cidx; + d = &q->sdesc[cidx]; while (n--) { - if (++cidx == q->size) + if (d->mbuf) { /* an SGL is present */ + rte_pktmbuf_free(d->mbuf); + d->mbuf = NULL; + } + ++d; + if (++cidx == q->size) { cidx = 0; + d = q->sdesc; + } } q->cidx = cidx; } @@ -1045,6 +1054,7 @@ int t4_eth_xmit(struct sge_eth_txq *txq, struct rte_mbuf *mbuf) u32 wr_mid; u64 cntrl, *end; bool v6; + u32 max_pkt_len = txq->eth_dev->data->dev_conf.rxmode.max_rx_pkt_len; /* Reject xmit if queue is stopped */ if (unlikely(txq->flags & EQ_STOPPED)) @@ -1060,6 +1070,10 @@ out_free: return 0; } + if ((!(m->ol_flags & PKT_TX_TCP_SEG)) && + (unlikely(m->pkt_len > max_pkt_len))) + goto out_free; + pi = (struct port_info *)txq->eth_dev->data->dev_private; adap = pi->adapter; @@ -1067,7 +1081,7 @@ out_free: /* align the end of coalesce WR to a 512 byte boundary */ txq->q.coalesce.max = (8 - (txq->q.pidx & 7)) * 8; - if (!(m->ol_flags & PKT_TX_TCP_SEG)) { + if (!((m->ol_flags & PKT_TX_TCP_SEG) || (m->pkt_len > ETHER_MAX_LEN))) { if (should_tx_packet_coalesce(txq, mbuf, &cflits, adap)) { if (unlikely(map_mbuf(mbuf, addr) < 0)) { dev_warn(adap, "%s: mapping err for coalesce\n", @@ -1114,33 +1128,46 @@ out_free: len = 0; len += sizeof(*cpl); - lso = (void *)(wr + 1); - v6 = (m->ol_flags & PKT_TX_IPV6) != 0; - l3hdr_len = m->l3_len; - l4hdr_len 
= m->l4_len; - eth_xtra_len = m->l2_len - ETHER_HDR_LEN; - len += sizeof(*lso); - wr->op_immdlen = htonl(V_FW_WR_OP(FW_ETH_TX_PKT_WR) | - V_FW_WR_IMMDLEN(len)); - lso->lso_ctrl = htonl(V_LSO_OPCODE(CPL_TX_PKT_LSO) | - F_LSO_FIRST_SLICE | F_LSO_LAST_SLICE | - V_LSO_IPV6(v6) | - V_LSO_ETHHDR_LEN(eth_xtra_len / 4) | - V_LSO_IPHDR_LEN(l3hdr_len / 4) | - V_LSO_TCPHDR_LEN(l4hdr_len / 4)); - lso->ipid_ofst = htons(0); - lso->mss = htons(m->tso_segsz); - lso->seqno_offset = htonl(0); - if (is_t4(adap->params.chip)) - lso->len = htonl(m->pkt_len); - else - lso->len = htonl(V_LSO_T5_XFER_SIZE(m->pkt_len)); - cpl = (void *)(lso + 1); - cntrl = V_TXPKT_CSUM_TYPE(v6 ? TX_CSUM_TCPIP6 : TX_CSUM_TCPIP) | - V_TXPKT_IPHDR_LEN(l3hdr_len) | - V_TXPKT_ETHHDR_LEN(eth_xtra_len); - txq->stats.tso++; - txq->stats.tx_cso += m->tso_segsz; + + /* Coalescing skipped and we send through normal path */ + if (!(m->ol_flags & PKT_TX_TCP_SEG)) { + wr->op_immdlen = htonl(V_FW_WR_OP(FW_ETH_TX_PKT_WR) | + V_FW_WR_IMMDLEN(len)); + cpl = (void *)(wr + 1); + if (m->ol_flags & PKT_TX_IP_CKSUM) { + cntrl = hwcsum(adap->params.chip, m) | + F_TXPKT_IPCSUM_DIS; + txq->stats.tx_cso++; + } + } else { + lso = (void *)(wr + 1); + v6 = (m->ol_flags & PKT_TX_IPV6) != 0; + l3hdr_len = m->l3_len; + l4hdr_len = m->l4_len; + eth_xtra_len = m->l2_len - ETHER_HDR_LEN; + len += sizeof(*lso); + wr->op_immdlen = htonl(V_FW_WR_OP(FW_ETH_TX_PKT_WR) | + V_FW_WR_IMMDLEN(len)); + lso->lso_ctrl = htonl(V_LSO_OPCODE(CPL_TX_PKT_LSO) | + F_LSO_FIRST_SLICE | F_LSO_LAST_SLICE | + V_LSO_IPV6(v6) | + V_LSO_ETHHDR_LEN(eth_xtra_len / 4) | +
[dpdk-dev] [PATCH 2/6] cxgbe: Update device info and perform sanity checks to enable jumbo frames
Increase max_rx_pktlen to accommodate jumbo frame size. Perform sanity checks and enable jumbo mode in rx queue setup. Set link mtu based on max_rx_pktlen. Signed-off-by: Rahul Lakkireddy Signed-off-by: Kumar Sanghvi --- drivers/net/cxgbe/cxgbe.h | 3 +++ drivers/net/cxgbe/cxgbe_ethdev.c | 23 +-- drivers/net/cxgbe/cxgbe_main.c | 3 ++- 3 files changed, 26 insertions(+), 3 deletions(-) diff --git a/drivers/net/cxgbe/cxgbe.h b/drivers/net/cxgbe/cxgbe.h index 97c37d2..adc0d92 100644 --- a/drivers/net/cxgbe/cxgbe.h +++ b/drivers/net/cxgbe/cxgbe.h @@ -43,6 +43,9 @@ #define CXGBE_DEFAULT_TX_DESC_SIZE 1024 /* Default TX ring size */ #define CXGBE_DEFAULT_RX_DESC_SIZE 1024 /* Default RX ring size */ +#define CXGBE_MIN_RX_BUFSIZE ETHER_MIN_MTU /* min buf size */ +#define CXGBE_MAX_RX_PKTLEN (9000 + ETHER_HDR_LEN + ETHER_CRC_LEN) /* max pkt */ + int cxgbe_probe(struct adapter *adapter); int cxgbe_up(struct adapter *adap); int cxgbe_down(struct port_info *pi); diff --git a/drivers/net/cxgbe/cxgbe_ethdev.c b/drivers/net/cxgbe/cxgbe_ethdev.c index 478051a..6d7b29c 100644 --- a/drivers/net/cxgbe/cxgbe_ethdev.c +++ b/drivers/net/cxgbe/cxgbe_ethdev.c @@ -141,8 +141,8 @@ static void cxgbe_dev_info_get(struct rte_eth_dev *eth_dev, struct adapter *adapter = pi->adapter; int max_queues = adapter->sge.max_ethqsets / adapter->params.nports; - device_info->min_rx_bufsize = 68; /* XXX: Smallest pkt size */ - device_info->max_rx_pktlen = 1500; /* XXX: For now we support mtu */ + device_info->min_rx_bufsize = CXGBE_MIN_RX_BUFSIZE; + device_info->max_rx_pktlen = CXGBE_MAX_RX_PKTLEN; device_info->max_rx_queues = max_queues; device_info->max_tx_queues = max_queues; device_info->max_mac_addrs = 1; @@ -498,6 +498,8 @@ static int cxgbe_dev_rx_queue_setup(struct rte_eth_dev *eth_dev, int err = 0; int msi_idx = 0; unsigned int temp_nb_desc; + struct rte_eth_dev_info dev_info; + unsigned int pkt_len = eth_dev->data->dev_conf.rxmode.max_rx_pkt_len; RTE_SET_USED(rx_conf); @@ -505,6 +507,17 @@ static int 
cxgbe_dev_rx_queue_setup(struct rte_eth_dev *eth_dev, __func__, eth_dev->data->nb_rx_queues, queue_idx, nb_desc, socket_id, mp); + cxgbe_dev_info_get(eth_dev, &dev_info); + + /* Must accommodate at least ETHER_MIN_MTU */ + if ((pkt_len < dev_info.min_rx_bufsize) || + (pkt_len > dev_info.max_rx_pktlen)) { + dev_err(adap, "%s: max pkt len must be > %d and <= %d\n", + __func__, dev_info.min_rx_bufsize, + dev_info.max_rx_pktlen); + return -EINVAL; + } + /* Free up the existing queue */ if (eth_dev->data->rx_queues[queue_idx]) { cxgbe_dev_rx_queue_release(eth_dev->data->rx_queues[queue_idx]); @@ -534,6 +547,12 @@ static int cxgbe_dev_rx_queue_setup(struct rte_eth_dev *eth_dev, if ((&rxq->fl) != NULL) rxq->fl.size = temp_nb_desc; + /* Set to jumbo mode if necessary */ + if (pkt_len > ETHER_MAX_LEN) + eth_dev->data->dev_conf.rxmode.jumbo_frame = 1; + else + eth_dev->data->dev_conf.rxmode.jumbo_frame = 0; + err = t4_sge_alloc_rxq(adapter, &rxq->rspq, false, eth_dev, msi_idx, &rxq->fl, t4_ethrx_handler, t4_get_mps_bg_map(adapter, pi->tx_chan), mp, diff --git a/drivers/net/cxgbe/cxgbe_main.c b/drivers/net/cxgbe/cxgbe_main.c index 316b87d..aff23d0 100644 --- a/drivers/net/cxgbe/cxgbe_main.c +++ b/drivers/net/cxgbe/cxgbe_main.c @@ -855,12 +855,13 @@ int link_start(struct port_info *pi) { struct adapter *adapter = pi->adapter; int ret; + unsigned int mtu = pi->eth_dev->data->dev_conf.rxmode.max_rx_pkt_len; /* * We do not set address filters and promiscuity here, the stack does * that step explicitly. */ - ret = t4_set_rxmode(adapter, adapter->mbox, pi->viid, 1500, -1, -1, + ret = t4_set_rxmode(adapter, adapter->mbox, pi->viid, mtu, -1, -1, -1, 1, true); if (ret == 0) { ret = t4_change_mac(adapter, adapter->mbox, pi->viid, -- 2.5.3
[dpdk-dev] [PATCH 1/6] cxgbe: Optimize forwarding performance for 40G
Update sge initialization with respect to free-list manager configuration and ingress arbiter. Also update refill logic to refill mbufs only after a certain threshold for rx. Optimize tx packet prefetch and free. Approx. 4 MPPS improvement seen in forwarding performance after the optimization. Signed-off-by: Rahul Lakkireddy Signed-off-by: Kumar Sanghvi --- drivers/net/cxgbe/base/t4_regs.h | 16 drivers/net/cxgbe/cxgbe_main.c | 7 +++ drivers/net/cxgbe/sge.c | 17 - 3 files changed, 35 insertions(+), 5 deletions(-) diff --git a/drivers/net/cxgbe/base/t4_regs.h b/drivers/net/cxgbe/base/t4_regs.h index cd28b59..9057e40 100644 --- a/drivers/net/cxgbe/base/t4_regs.h +++ b/drivers/net/cxgbe/base/t4_regs.h @@ -266,6 +266,18 @@ #define A_SGE_FL_BUFFER_SIZE2 0x104c #define A_SGE_FL_BUFFER_SIZE3 0x1050 +#define A_SGE_FLM_CFG 0x1090 + +#define S_CREDITCNT 4 +#define M_CREDITCNT 0x3U +#define V_CREDITCNT(x) ((x) << S_CREDITCNT) +#define G_CREDITCNT(x) (((x) >> S_CREDITCNT) & M_CREDITCNT) + +#define S_CREDITCNTPACKING 2 +#define M_CREDITCNTPACKING 0x3U +#define V_CREDITCNTPACKING(x) ((x) << S_CREDITCNTPACKING) +#define G_CREDITCNTPACKING(x) (((x) >> S_CREDITCNTPACKING) & M_CREDITCNTPACKING) + #define A_SGE_CONM_CTRL 0x1094 #define S_EGRTHRESHOLD 8 @@ -361,6 +373,10 @@ #define A_SGE_CONTROL2 0x1124 +#define S_IDMAARBROUNDROBIN 19 +#define V_IDMAARBROUNDROBIN(x) ((x) << S_IDMAARBROUNDROBIN) +#define F_IDMAARBROUNDROBIN V_IDMAARBROUNDROBIN(1U) + #define S_INGPACKBOUNDARY 16 #define M_INGPACKBOUNDARY 0x7U #define V_INGPACKBOUNDARY(x) ((x) << S_INGPACKBOUNDARY) diff --git a/drivers/net/cxgbe/cxgbe_main.c b/drivers/net/cxgbe/cxgbe_main.c index 3755444..316b87d 100644 --- a/drivers/net/cxgbe/cxgbe_main.c +++ b/drivers/net/cxgbe/cxgbe_main.c @@ -422,6 +422,13 @@ static int adap_init0_tweaks(struct adapter *adapter) t4_set_reg_field(adapter, A_SGE_CONTROL, V_PKTSHIFT(M_PKTSHIFT), V_PKTSHIFT(rx_dma_offset)); + t4_set_reg_field(adapter, A_SGE_FLM_CFG, +V_CREDITCNT(M_CREDITCNT) | M_CREDITCNTPACKING, 
+V_CREDITCNT(3) | V_CREDITCNTPACKING(1)); + + t4_set_reg_field(adapter, A_SGE_CONTROL2, V_IDMAARBROUNDROBIN(1U), +V_IDMAARBROUNDROBIN(1U)); + /* * Don't include the "IP Pseudo Header" in CPL_RX_PKT checksums: Linux * adds the pseudo header itself. diff --git a/drivers/net/cxgbe/sge.c b/drivers/net/cxgbe/sge.c index 6eb1244..e540881 100644 --- a/drivers/net/cxgbe/sge.c +++ b/drivers/net/cxgbe/sge.c @@ -286,8 +286,7 @@ static void unmap_rx_buf(struct sge_fl *q) static inline void ring_fl_db(struct adapter *adap, struct sge_fl *q) { - /* see if we have exceeded q->size / 4 */ - if (q->pend_cred >= (q->size / 4)) { + if (q->pend_cred >= 64) { u32 val = adap->params.arch.sge_fl_db; if (is_t4(adap->params.chip)) @@ -995,7 +994,14 @@ static inline int tx_do_packet_coalesce(struct sge_eth_txq *txq, int i; for (i = 0; i < sd->coalesce.idx; i++) { - rte_pktmbuf_free(sd->coalesce.mbuf[i]); + struct rte_mbuf *tmp = sd->coalesce.mbuf[i]; + + do { + struct rte_mbuf *next = tmp->next; + + rte_pktmbuf_free_seg(tmp); + tmp = next; + } while (tmp); sd->coalesce.mbuf[i] = NULL; } } @@ -1054,7 +1060,6 @@ out_free: return 0; } - rte_prefetch0(&((&txq->q)->sdesc->mbuf->pool)); pi = (struct port_info *)txq->eth_dev->data->dev_private; adap = pi->adapter; @@ -1070,6 +1075,7 @@ out_free: txq->stats.mapping_err++; goto out_free; } + rte_prefetch0((volatile void *)addr); return tx_do_packet_coalesce(txq, mbuf, cflits, adap, pi, addr); } else { @@ -1454,7 +1460,8 @@ static int process_responses(struct sge_rspq *q, int budget, unsigned int params; u32 val; - __refill_fl(q->adapter, &rxq->fl); + if (fl_cap(&rxq->fl) - rxq->fl.avail >= 64) + __refill_fl(q->adapter, &rxq->fl); params = V_QINTR_TIMER_IDX(X_TIMERREG_UPDATE_CIDX); q->next_intr_params = params; val = V_CIDXINC(cidx_inc) | V_SEINTARM(params); -- 2.5.3
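The S_/M_/V_/G_ macros added to t4_regs.h follow a common register-field convention: S_ is the bit offset, M_ the unshifted mask, V_() places a value into the field and G_() extracts it. The sketch below shows the pattern on a made-up 2-bit field together with a pure-function stand-in for a read-modify-write update. `S_FIELD`, `set_reg_field` and the decision to mask `val` are assumptions of this sketch, not the driver's t4_set_reg_field().

```c
#include <stdint.h>

/* Hypothetical 2-bit field at bit offset 4, named after the S_/M_/V_/G_ scheme. */
#define S_FIELD    4
#define M_FIELD    0x3U
#define V_FIELD(x) ((x) << S_FIELD)
#define G_FIELD(x) (((x) >> S_FIELD) & M_FIELD)

/*
 * Read-modify-write on a plain value: clear the bits selected by mask,
 * then OR in val (restricted to mask so other fields are never touched).
 */
static uint32_t
set_reg_field(uint32_t reg, uint32_t mask, uint32_t val)
{
	return (reg & ~mask) | (val & mask);
}
```

A call such as set_reg_field(reg, V_FIELD(M_FIELD), V_FIELD(1U)) then changes only bits 5:4 of reg, and G_FIELD() on the result reads the new field value back.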
[dpdk-dev] [PATCH 0/6] cxgbe: Optimize tx/rx for 40GbE and add Jumbo Frame support for CXGBE PMD
This series of patches improve forwarding performance for Chelsio T5 40GbE cards and add Jumbo Frame support for cxgbe pmd. Also update documentation and release notes. Rahul Lakkireddy (6): cxgbe: Optimize forwarding performance for 40G cxgbe: Update device info and perform sanity checks to enable jumbo frames cxgbe: Update tx path to transmit jumbo frames cxgbe: Update rx path to receive jumbo frames cxgbe: Allow apps to change mtu doc: Update cxgbe documentation and release notes doc/guides/nics/cxgbe.rst| 81 - doc/guides/rel_notes/release_2_2.rst | 5 + drivers/net/cxgbe/base/t4_regs.h | 16 drivers/net/cxgbe/cxgbe.h| 3 + drivers/net/cxgbe/cxgbe_ethdev.c | 52 ++- drivers/net/cxgbe/cxgbe_main.c | 10 +- drivers/net/cxgbe/sge.c | 171 ++- 7 files changed, 268 insertions(+), 70 deletions(-) -- 2.5.3
[dpdk-dev] [PATCH] devargs: add blacklisting by linux interface name
> -Original Message-
> From: Charles (Chas) Williams [mailto:3chas3 at gmail.com]
>
> On Fri, 2015-10-02 at 16:18 +0100, Bruce Richardson wrote:
> > On Fri, Oct 02, 2015 at 11:00:07AM -0400, Chas Williams wrote:
> > > If a system is using deterministic interface names, it may be easier
> > > in some cases to use the interface name to blacklist an interface.
> > >
> >
> > Is it possible to do this using the existing arguments, i.e. have the
> > -b flag detect if it's a pci address or name automatically, rather
> > than having to use a separate command-line arg for it?
>
> You might be able to distinguish names by context. I doubt interface
> names ever look like PCI addresses. But that's going to be a bigger
> change since -b will need to be updated to 'blacklist' instead of
> 'pci-blacklist' to prevent confusion. Or do you just want to overload
> '-b' and keep both long options?
>

I'm not sure about that, to be honest. However, I'd rather not have too many
cmd line options to be maintained in the code. Does your proposed blacklisting
patch work with non-pci devices as well as with PCI ones as now?

/Bruce
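Distinguishing the two argument kinds by shape, as discussed above, could be sketched like this. The function is hypothetical and is not part of the EAL option parser; it only accepts the full domain:bus:device.function form (for example 0000:02:00.0) and would treat a shortened bus:device.function form as a name, which a real implementation would need to decide on.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if s has the shape DDDD:BB:DD.F, where D and B are hex digits
 * and F is a function number 0-7. Anything else (e.g. "eth0") is treated as
 * a candidate interface name by the caller.
 */
static bool
looks_like_pci_addr(const char *s)
{
	/* 'x' stands for one hex digit; every other character must match literally. */
	static const char pattern[] = "xxxx:xx:xx.x";
	size_t i;

	if (strlen(s) != sizeof(pattern) - 1)
		return false;

	for (i = 0; pattern[i] != '\0'; i++) {
		if (pattern[i] == 'x') {
			if (!isxdigit((unsigned char)s[i]))
				return false;
		} else if (s[i] != pattern[i]) {
			return false;
		}
	}

	/* PCI function numbers only go up to 7. */
	return s[i - 1] >= '0' && s[i - 1] <= '7';
}
```

A -b handler built on this idea would try looks_like_pci_addr() first and fall back to an interface-name lookup when it returns false.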
[dpdk-dev] i40e SRIOV and dpdk
Hello,

I was wondering if anybody has tested and managed to get the following scenario working:

Host with 2 NICs: 1 Niantic 82599 and 1 Fortville i40e, both NICs configured with 8 VFs. There are two VMs running on this host; each VM has two SRIOV interfaces, one of each type (Niantic based VF and Fortville based VF). An application inside the VM uses dpdk for packet forwarding, and the SRIOV interfaces are bound to dpdk 2.0.

Unicast connectivity between the VMs via the Niantic based SRIOV works, but fails over the Fortville based SRIOV. The application uses the same DPDK API to initialize the SRIOV interfaces. When the interfaces in the VM are converted to use the ixgbevf and i40evf kernel drivers, connectivity works for both types of VF. Clearly there is something particular to the i40e dpdk driver.

I would greatly appreciate some feedback.

Thank you
Serguei
[dpdk-dev] DPDK install behavior Question
Hi,

While working on the patchset that adds new features to make install, I came across some questions I missed asking before. For example, if you run only "make install", the "T" variable gets the value "*" and the makefiles build all TARGETS. Is this expected behavior? And if you run "make install T=TARGET" (e.g. make install T=x86_64-native-linuxapp-gcc), the makefiles config, build and install dpdk; however, the target only says "install", yet it does config and build again. Is this expected behavior too?

Thank you so much.
Mario.
[dpdk-dev] [PATCH v3 8/8] mk: Add rule for installing runtime files
Answer for [PATCH v3 8/8] mk: Add rule for installing runtime files

Hi Panu,

Thank you for taking the time to review this :). In this patchset I've tried to keep the current behavior (make install) untouched; I mean the patches don't affect the current makefile rules, and they work like "new features". For that reason, they were created as new rules. Now you can do the following:

1) make config T=TARGET (creates a build directory with config files according to TARGET and the directory environment)
2) make (builds the dpdk binaries). At this point, if you choose one of the new rules from the patch series (install-sdk, install-doc, install-bin, etc.), the files that were built in the last step will be installed in paths according to this site: http://www.freedesktop.org/software/systemd/man/file-hierarchy.html. This is only possible if build/.config exists.
3) However, if you use the old rules, they should keep the behavior they had before the patches. Example: make install T=x86_64-native-linuxapp-gcc; the makefiles then config, build and install dpdk in a directory using TARGET as a name.

Thanks.
Mario.
From: Panu Matilainen [pmati...@redhat.com] Sent: Friday, October 02, 2015 4:15 AM To: Arevalo, Mario Alfredo C; dev at dpdk.org Subject: Re: [dpdk-dev] [PATCH v3 8/8] mk: Add rule for installing runtime files On 10/01/2015 03:11 AM, Mario Carrillo wrote: > Add hierarchy-file support to the DPDK libraries, modules, > binary files, nic bind files and documentation, > when invoking "make install-fhs" (filesystem hierarchy standard) > runtime files will be by default installed in: > $(DESTDIR)/$(BIN_DIR) where BIN_DIR=/usr/bin (binary files) > $(DESTDIR)/$(SBIN_DIR) where SBIN_DIR=/usr/sbin/dpdk_nic_bind (nic bind > files) > $(DESTDIR)/$(DOC_DIR) where DOC_DIR=/usr/share/doc/dpdk (documentation) > $(DESTDIR)/$(LIB_DIR) (libraries) > if the architecture is 64 bits then LIB_DIR=/usr/lib64 > else LIB_DIR=/usr/lib > $(DESTDIR)/$(KERNEL_DIR) (modules) > if RTE_EXEC_ENV=linuxapp then > KERNEL_DIR=/lib/modules/$(uname -r)/build > else KERNEL_DIR=/boot/modules > All directory variables mentioned above can be overridden. > This hierarchy is based on: > http://www.freedesktop.org/software/systemd/man/file-hierarchy.html > Hmm, I think there's a slight misunderstanding here. What I meant earlier by install-sdk and install-fhs is to preserve the current behavior of "make install" as "make install-sdk" and have "make install-fhs" behave like normal OSS app on "make install", which installs everything (both devel and runtime parts) This patch series eliminates the current behavior of "make install" entirely. I personally would not miss it at all, but there likely are people relying on it since its quite visibly documented and all. So I think the idea was to introduce a separate FHS-installation target and then deal with the notion of default behaviors etc separately. I guess it was already this way in v2 of the series, apologies for missing it there. - Panu -
[dpdk-dev] [PATCH v3 6/8] mk: Add rule for installing nic bind files
Hi Panu and Bruce, your suggestion sounds good :) I'm going to change the patch to install dpdk_nic_bind.py and cpu_layout.py in /usr/bin. Thanks. Mario. From: Richardson, Bruce Sent: Friday, October 02, 2015 3:54 AM To: Panu Matilainen; Arevalo, Mario Alfredo C; dev at dpdk.org Subject: RE: [dpdk-dev] [PATCH v3 6/8] mk: Add rule for installing nic bind files > -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Panu Matilainen > Sent: Friday, October 2, 2015 11:50 AM > To: Arevalo, Mario Alfredo C; dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v3 6/8] mk: Add rule for installing nic > bind files > > On 10/01/2015 03:11 AM, Mario Carrillo wrote: > > Add hierarchy-file support to the DPDK nic bind files, when invoking > > "make install-sbin" nic bind files will be installed by default in: > > $(DESTDIR)/$(SBIN_DIR) where SBIN_DIR=/usr/sbin/dpdk_nic_bind by > > default, you can override SBIN_DIR var. > > This hierarchy is based on: > > http://www.freedesktop.org/software/systemd/man/file-hierarchy.html > > and dpdk spec file. 
> > > > Signed-off-by: Mario Carrillo > > --- > > mk/rte.sdkinstall.mk | 14 ++ > > mk/rte.sdkroot.mk| 4 ++-- > > 2 files changed, 16 insertions(+), 2 deletions(-) > > > > diff --git a/mk/rte.sdkinstall.mk b/mk/rte.sdkinstall.mk index > > 5a2fd40..4eecf31 100644 > > --- a/mk/rte.sdkinstall.mk > > +++ b/mk/rte.sdkinstall.mk > > @@ -46,11 +46,13 @@ else > > INCLUDE_DIR ?= /usr/include/dpdk > > BIN_DIR ?= /usr/bin > > DOC_DIR ?= /usr/share/doc/dpdk > > +SBIN_DIR ?= /usr/sbin/dpdk_nic_bind > > HSLINKS := $(wildcard $(RTE_OUTPUT)/include/*) > > BINARY_FILES := $(patsubst %.map,,$(wildcard $(RTE_OUTPUT)/app/*)) > > LIBS := $(wildcard $(RTE_OUTPUT)/lib/*) > > MODULES := $(wildcard $(RTE_OUTPUT)/kmod/*) > > DOCS := $(wildcard $(BUILD_DIR)/doc/*) > > +NIC_BIND_FILES := $(wildcard $(BUILD_DIR)/tools/*nic_bind.py) > > include $(BUILD_DIR)/build/.config > > RTE_ARCH := $(CONFIG_RTE_ARCH:"%"=%) > > RTE_EXEC_ENV := $(CONFIG_RTE_EXEC_ENV:"%"=%) @@ -161,6 +163,18 @@ > > install-doc: > > echo installing: $$DOC; \ > > done > > # > > +# install nic bind files in /usr/sbin/dpdk_nic_bind # by default > > +SBIN_DIR can be overridden. > > +# > > This creates an out-of-path directory /usr/sbin/dpdk_nic_bind/ in which > the dpdk_nic_bind.py is installed. Besides not being a very accessible > location, the FHS explicitly forbids creation of subdirectories below > /usr/[s]bin. > > SBIN_DIR should be /usr/sbin unless overridden, but OTOH I think this > could go into /usr/bin just as well, the split is fairly ambiguous anyway > (I mean, testpmd is not something a regular user is going to run > either) > > In addition, if dpdk_nic_bind.py is installed then perhaps the > cpu_layout.py utility should be installed too? > > - Panu - I think there are better utilities available for determining the core layout than cpu_layout.py. "lstopo", for one, is much more powerful. Do we want/need to keep our own script around for that? /Bruce
[dpdk-dev] [PATCH v3 4/8] mk: Add rule for installing modules
Hi Panu,

Perfect, thank you for your feedback. I am going to change that path to
/lib/modules/$(uname -r)/extra/ in my patch.

Thanks.
Mario.

From: Panu Matilainen [pmati...@redhat.com]
Sent: Friday, October 02, 2015 3:38 AM
To: Arevalo, Mario Alfredo C; dev at dpdk.org
Subject: Re: [dpdk-dev] [PATCH v3 4/8] mk: Add rule for installing modules

On 10/01/2015 03:11 AM, Mario Carrillo wrote:
> Add hierarchy-file support to the DPDK modules,
> when invoking "make install-mod" modules will be
> installed in: $(DESTDIR)/$(KERNEL_DIR)
> if RTE_EXEC_ENV=linuxapp then
> KERNEL_DIR=/lib/modules/$(uname -r)/build

/lib/modules/$(uname -r)/build is the path you need when *building*
external kernel modules, you don't want to install anything there.

The default install path for the kernel modules should be somewhere
within /lib/modules/$(uname -r)/extra/, but dunno what the recommended
naming/placing within that is.

Sorry for not catching this earlier,

- Panu -
[dpdk-dev] [PATCH] devargs: add blacklisting by linux interface name
On Fri, Oct 02, 2015 at 11:00:07AM -0400, Chas Williams wrote:
> If a system is using deterministic interface names, it may be easier in
> some cases to use the interface name to blacklist an interface.
>

Is it possible to do this using the existing arguments, i.e. have the -b flag
detect if it's a pci address or name automatically, rather than having to use
a separate command-line arg for it?

/Bruce
[dpdk-dev] [PATCH v3] ring: add function to free a ring
On Fri, Oct 02, 2015 at 03:01:25PM +0100, Pablo de Lara wrote:
> From: "Pablo de Lara"
>
> When creating a ring, a memzone is created to allocate it in memory,
> but the ring could not be freed, as memzones could not be.
>
> Since memzones can be freed now, then rings can be as well,
> taking into account if they were initialized using pre-allocated memory
> (in which case, memory should be freed externally) or using
> rte_memzone_reserve
> (with rte_ring_create), freeing the memory with rte_memzone_free.
>
> Signed-off-by: Pablo de Lara
> ---
> Changes in v3:
> - Simplify patch using stored memzone address in ring structure
> - Change copyright date

I think you need to call out that this patch depends upon
http://dpdk.org/dev/patchwork/patch/7308/

>
> Changes in v2:
> - Include note in release notes
> - Add error log when ring cannot be freed
>
> This patch depends on patch "rte_ring: store memzone pointer inside ring"
>
>  doc/guides/rel_notes/release_2_2.rst | 4 +++
>  lib/librte_ring/rte_ring.c | 47
> +++-
>  lib/librte_ring/rte_ring.h | 7 ++
>  lib/librte_ring/rte_ring_version.map | 7 ++
>  4 files changed, 64 insertions(+), 1 deletion(-)
>
> diff --git a/doc/guides/rel_notes/release_2_2.rst
> b/doc/guides/rel_notes/release_2_2.rst
> index 5687676..24937ac 100644
> --- a/doc/guides/rel_notes/release_2_2.rst
> +++ b/doc/guides/rel_notes/release_2_2.rst
> @@ -4,6 +4,10 @@ DPDK Release 2.2
>  New Features
>
>
> +* **Enabled freeing of rte_ring.**
> +
> +  New function rte_ring_free() allows the user to free a ring
> +  if it was created with rte_ring_create().
>
>  Resolved Issues
>  ---
> diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
> index 4e78e14..d80faf3 100644
> --- a/lib/librte_ring/rte_ring.c
> +++ b/lib/librte_ring/rte_ring.c
> @@ -1,7 +1,7 @@
>  /*-
>   * BSD LICENSE
>   *
> - * Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> + * Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
>   * All rights reserved.
>   *
>   * Redistribution and use in source and binary forms, with or without
> @@ -209,6 +209,51 @@ rte_ring_create(const char *name, unsigned count, int
> socket_id,
>  	return r;
>  }
>
> +/* free the ring */
> +void
> +rte_ring_free(struct rte_ring *r)
> +{
> +	struct rte_ring_list *ring_list = NULL;
> +	struct rte_tailq_entry *te;
> +
> +	if (r == NULL)
> +		return;
> +
> +	/*
> +	 * Ring was not created with rte_ring_create,
> +	 * therefore, there is no memzone to free.
> +	 */
> +	if (r->memzone == NULL) {
> +		RTE_LOG(ERR, RING, "Cannot free ring (not created with
> rte_ring_create()");
> +		return;
> +	}
> +
> +	if (rte_memzone_free(r->memzone) != 0) {
> +		RTE_LOG(ERR, RING, "Cannot free memory\n");
> +		return;
> +	}
> +
> +	ring_list = RTE_TAILQ_CAST(rte_ring_tailq.head, rte_ring_list);
> +	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
> +
> +	/* find out tailq entry */
> +	TAILQ_FOREACH(te, ring_list, next) {
> +		if (te->data == (void *) r)
> +			break;
> +	}
> +
> +	if (te == NULL) {
> +		rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
> +		return;
> +	}
> +
> +	TAILQ_REMOVE(ring_list, te, next);
> +
> +	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
> +
> +	rte_free(te);
> +}
> +
>  /*
>   * change the high water mark. If *count* is 0, water marking is
>   * disabled
> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> index df45f3f..fb5a626 100644
> --- a/lib/librte_ring/rte_ring.h
> +++ b/lib/librte_ring/rte_ring.h
> @@ -304,6 +304,13 @@ int rte_ring_init(struct rte_ring *r, const char *name,
> unsigned count,
>   */
>  struct rte_ring *rte_ring_create(const char *name, unsigned count,
>  				 int socket_id, unsigned flags);
> +/**
> + * De-allocate all memory used by the ring.
> + *
> + * @param r
> + *   Ring to free
> + */
> +void rte_ring_free(struct rte_ring *r);
>
>  /**
>   * Change the high water mark.
> diff --git a/lib/librte_ring/rte_ring_version.map
> b/lib/librte_ring/rte_ring_version.map
> index 982fdd1..5474b98 100644
> --- a/lib/librte_ring/rte_ring_version.map
> +++ b/lib/librte_ring/rte_ring_version.map
> @@ -11,3 +11,10 @@ DPDK_2.0 {
>
>  	local: *;
>  };
> +
> +DPDK_2.2 {
> +	global:
> +
> +	rte_ring_free;
> +
> +} DPDK_2.0;
> --
> 2.4.3
>
[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance
On Fri, Oct 02, 2015 at 05:00:14PM +0300, Michael S. Tsirkin wrote:
> On Thu, Oct 01, 2015 at 02:02:24PM -0700, Alexander Duyck wrote:
> > validation and translation would add 10s if not 100s of nanoseconds to the
> > time needed to process each packet. In addition we are talking about doing
> > this in kernel space which means we wouldn't really be able to take
> > advantage of things like SSE or AVX instructions.
>
> Yes. But the nice thing is that it's rearming so it can happen on
> a separate core, in parallel with packet processing.
> It does not need to add to latency.
>
> You will burn up more CPU, but again, all this for boxes/hypervisors
> without an IOMMU.
>
> I'm sure people can come up with even better approaches, once enough
> people get it that kernel absolutely needs to be protected from
> userspace.
>
> Long term, the right thing to do is to focus on IOMMU support. This
> gives you hardware-based memory protection without need to burn up CPU
> cycles.
>
> --
> MST

Running it on another core will have its own problems. The main one that
springs to mind for me is the performance impact of having all those cache
lines shared between the two cores.

/Bruce
[dpdk-dev] [PATCH v3] ring: add function to free a ring
From: "Pablo de Lara"

When creating a ring, a memzone is created to allocate it in memory,
but the ring could not be freed, as memzones could not be.

Since memzones can be freed now, then rings can be as well,
taking into account if they were initialized using pre-allocated memory
(in which case, memory should be freed externally) or using rte_memzone_reserve
(with rte_ring_create), freeing the memory with rte_memzone_free.

Signed-off-by: Pablo de Lara
---
Changes in v3:
- Simplify patch using stored memzone address in ring structure
- Change copyright date

Changes in v2:
- Include note in release notes
- Add error log when ring cannot be freed

This patch depends on patch "rte_ring: store memzone pointer inside ring"

 doc/guides/rel_notes/release_2_2.rst |  4 +++
 lib/librte_ring/rte_ring.c           | 47 +++-
 lib/librte_ring/rte_ring.h           |  7 ++
 lib/librte_ring/rte_ring_version.map |  7 ++
 4 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
index 5687676..24937ac 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -4,6 +4,10 @@ DPDK Release 2.2
 New Features
 
 
+* **Enabled freeing of rte_ring.**
+
+  New function rte_ring_free() allows the user to free a ring
+  if it was created with rte_ring_create().
 
 Resolved Issues
 ---
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 4e78e14..d80faf3 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -1,7 +1,7 @@
 /*-
  * BSD LICENSE
  *
- * Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ * Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -209,6 +209,51 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
 	return r;
 }
 
+/* free the ring */
+void
+rte_ring_free(struct rte_ring *r)
+{
+	struct rte_ring_list *ring_list = NULL;
+	struct rte_tailq_entry *te;
+
+	if (r == NULL)
+		return;
+
+	/*
+	 * Ring was not created with rte_ring_create,
+	 * therefore, there is no memzone to free.
+	 */
+	if (r->memzone == NULL) {
+		RTE_LOG(ERR, RING, "Cannot free ring (not created with rte_ring_create()");
+		return;
+	}
+
+	if (rte_memzone_free(r->memzone) != 0) {
+		RTE_LOG(ERR, RING, "Cannot free memory\n");
+		return;
+	}
+
+	ring_list = RTE_TAILQ_CAST(rte_ring_tailq.head, rte_ring_list);
+	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+
+	/* find out tailq entry */
+	TAILQ_FOREACH(te, ring_list, next) {
+		if (te->data == (void *) r)
+			break;
+	}
+
+	if (te == NULL) {
+		rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+		return;
+	}
+
+	TAILQ_REMOVE(ring_list, te, next);
+
+	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+
+	rte_free(te);
+}
+
 /*
  * change the high water mark. If *count* is 0, water marking is
  * disabled
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index df45f3f..fb5a626 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -304,6 +304,13 @@ int rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
  */
 struct rte_ring *rte_ring_create(const char *name, unsigned count,
 				 int socket_id, unsigned flags);
+/**
+ * De-allocate all memory used by the ring.
+ *
+ * @param r
+ *   Ring to free
+ */
+void rte_ring_free(struct rte_ring *r);
 
 /**
  * Change the high water mark.
diff --git a/lib/librte_ring/rte_ring_version.map b/lib/librte_ring/rte_ring_version.map
index 982fdd1..5474b98 100644
--- a/lib/librte_ring/rte_ring_version.map
+++ b/lib/librte_ring/rte_ring_version.map
@@ -11,3 +11,10 @@ DPDK_2.0 {
 
 	local: *;
 };
+
+DPDK_2.2 {
+	global:
+
+	rte_ring_free;
+
+} DPDK_2.0;
-- 
2.4.3
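
Editor's illustration (not part of the patch): the ownership rule in the commit message, where only rings whose memory came from rte_ring_create() are released by the library and rings initialised over caller-provided memory are left to the caller, can be sketched with a small standalone model. Everything here is a hypothetical stand-in: `toy_ring`, `toy_ring_init()`, `toy_ring_create()` and `toy_ring_free()` are invented names, malloc()/free() take the place of memzones, and unlike the void rte_ring_free() in the patch the toy free function returns a status so the refusal is observable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy ring; NOT the DPDK struct rte_ring. */
struct toy_ring {
	void *backing;	/* set only by toy_ring_create(); plays the role of r->memzone */
	unsigned count;
};

/* Initialise a ring in caller-owned memory: the library owns nothing. */
void toy_ring_init(struct toy_ring *r, unsigned count)
{
	assert(r != NULL);
	r->backing = NULL;
	r->count = count;
}

/* Allocate and initialise a ring; the allocation is remembered for free. */
struct toy_ring *toy_ring_create(unsigned count)
{
	struct toy_ring *r = malloc(sizeof(*r));

	if (r == NULL)
		return NULL;
	toy_ring_init(r, count);
	r->backing = r;
	return r;
}

/* Returns 0 if the ring was released (or r was NULL), -1 if refused. */
int toy_ring_free(struct toy_ring *r)
{
	if (r == NULL)
		return 0;
	if (r->backing == NULL)
		return -1;	/* not made by toy_ring_create(): caller must free */
	free(r->backing);
	return 0;
}
```

In this model a ring set up with toy_ring_init() on the stack is refused by toy_ring_free(), which corresponds to the "Cannot free ring" log path in the patch, while one obtained from toy_ring_create() is released.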
[dpdk-dev] [PATCH] devargs: add blacklisting by linux interface name
On Fri, 2015-10-02 at 16:44 +, Richardson, Bruce wrote:
> > -----Original Message-----
> > From: Charles (Chas) Williams [mailto:3chas3 at gmail.com]
> >
> > On Fri, 2015-10-02 at 16:18 +0100, Bruce Richardson wrote:
> > > On Fri, Oct 02, 2015 at 11:00:07AM -0400, Chas Williams wrote:
> > > > If a system is using deterministic interface names, it may be easier
> > > > in some cases to use the interface name to blacklist an interface.
> > > >
> > >
> > > Is it possible to do this using the existing arguments, i.e. have the
> > > -b flag detect if it's a pci address or name automatically, rather
> > > than having to use a separate command-line arg for it?
> >
> > You might be able to distinguish names by context. I doubt interface
> > names ever look like PCI addresses. But that's going to be a bigger
> > change since -b will need to be updated to 'blacklist' instead of 'pci-
> > blacklist' to prevent confusion. Or do you just want to overload '-b' and
> > keep both long options?
> >
> I'm not sure about that, to be honest. However, I'd rather not have
> too many cmd line options to be maintained in the code.
>
> Does your proposed blacklisting patch work with non-pci devices as well
> as with PCI ones as now?

Unfortunately, the devargs API is rather PCI specific -- it takes a PCI
device. Nothing prevents you from writing a device-specific version of
the devargs API for your device class, though, since the devargs list
isn't static. But checking for certain devargs wouldn't make sense in
some cases: checking to see if a USB device matched a blacklisted PCI
device would be pointless.

Other devices (like Xen or hyperv) have a net/ directory/link in their
/sys entry that lets you determine an interface name. I think it's the
same for USB ethernet devices -- I don't happen to have one to check.
[dpdk-dev] [PATCH v3] ring: add function to free a ring
Hi Bruce,

> -----Original Message-----
> From: Richardson, Bruce
> Sent: Friday, October 02, 2015 3:10 PM
> To: De Lara Guarch, Pablo
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3] ring: add function to free a ring
>
> On Fri, Oct 02, 2015 at 03:01:25PM +0100, Pablo de Lara wrote:
> > From: "Pablo de Lara"
> >
> > When creating a ring, a memzone is created to allocate it in memory,
> > but the ring could not be freed, as memzones could not be.
> >
> > Since memzones can be freed now, then rings can be as well,
> > taking into account if they were initialized using pre-allocated memory
> > (in which case, memory should be freed externally) or using
> rte_memzone_reserve
> > (with rte_ring_create), freeing the memory with rte_memzone_free.
> >
> > Signed-off-by: Pablo de Lara
> > ---
> > Changes in v3:
> > - Simplify patch using stored memzone address in ring structure
> > - Change copyright date
>
> I think you need to call out that this patch depends upon
> http://dpdk.org/dev/patchwork/patch/7308/

I did below, probably I should have included the patch ID :S

>
> >
> > Changes in v2:
> > - Include note in release notes
> > - Add error log when ring cannot be freed
> >
> > This patch depends on patch "rte_ring: store memzone pointer inside ring"
> >
> >  doc/guides/rel_notes/release_2_2.rst | 4 +++
> >  lib/librte_ring/rte_ring.c | 47
> +++-
> >  lib/librte_ring/rte_ring.h | 7 ++
> >  lib/librte_ring/rte_ring_version.map | 7 ++
> >  4 files changed, 64 insertions(+), 1 deletion(-)
> >
> > diff --git a/doc/guides/rel_notes/release_2_2.rst
> b/doc/guides/rel_notes/release_2_2.rst
> > index 5687676..24937ac 100644
> > --- a/doc/guides/rel_notes/release_2_2.rst
> > +++ b/doc/guides/rel_notes/release_2_2.rst
> > @@ -4,6 +4,10 @@ DPDK Release 2.2
> >  New Features
> >
> >
> > +* **Enabled freeing of rte_ring.**
> > +
> > +  New function rte_ring_free() allows the user to free a ring
> > +  if it was created with rte_ring_create().
> >
> >  Resolved Issues
> >  ---
> > diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
> > index 4e78e14..d80faf3 100644
> > --- a/lib/librte_ring/rte_ring.c
> > +++ b/lib/librte_ring/rte_ring.c
> > @@ -1,7 +1,7 @@
> >  /*-
> >   * BSD LICENSE
> >   *
> > - * Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> > + * Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
> >   * All rights reserved.
> >   *
> >   * Redistribution and use in source and binary forms, with or without
> > @@ -209,6 +209,51 @@ rte_ring_create(const char *name, unsigned
> count, int socket_id,
> >  	return r;
> >  }
> >
> > +/* free the ring */
> > +void
> > +rte_ring_free(struct rte_ring *r)
> > +{
> > +	struct rte_ring_list *ring_list = NULL;
> > +	struct rte_tailq_entry *te;
> > +
> > +	if (r == NULL)
> > +		return;
> > +
> > +	/*
> > +	 * Ring was not created with rte_ring_create,
> > +	 * therefore, there is no memzone to free.
> > +	 */
> > +	if (r->memzone == NULL) {
> > +		RTE_LOG(ERR, RING, "Cannot free ring (not created with
> rte_ring_create()");
> > +		return;
> > +	}
> > +
> > +	if (rte_memzone_free(r->memzone) != 0) {
> > +		RTE_LOG(ERR, RING, "Cannot free memory\n");
> > +		return;
> > +	}
> > +
> > +	ring_list = RTE_TAILQ_CAST(rte_ring_tailq.head, rte_ring_list);
> > +	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
> > +
> > +	/* find out tailq entry */
> > +	TAILQ_FOREACH(te, ring_list, next) {
> > +		if (te->data == (void *) r)
> > +			break;
> > +	}
> > +
> > +	if (te == NULL) {
> > +		rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
> > +		return;
> > +	}
> > +
> > +	TAILQ_REMOVE(ring_list, te, next);
> > +
> > +	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
> > +
> > +	rte_free(te);
> > +}
> > +
> >  /*
> >   * change the high water mark. If *count* is 0, water marking is
> >   * disabled
> > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > index df45f3f..fb5a626 100644
> > --- a/lib/librte_ring/rte_ring.h
> > +++ b/lib/librte_ring/rte_ring.h
> > @@ -304,6 +304,13 @@ int rte_ring_init(struct rte_ring *r, const char
> *name, unsigned count,
> >   */
> >  struct rte_ring *rte_ring_create(const char *name, unsigned count,
> >  				 int socket_id, unsigned flags);
> > +/**
> > + * De-allocate all memory used by the ring.
> > + *
> > + * @param r
> > + *   Ring to free
> > + */
> > +void rte_ring_free(struct rte_ring *r);
> >
> >  /**
> >   * Change the high water mark.
> > diff --git a/lib/librte_ring/rte_ring_version.map
> b/lib/librte_ring/rte_ring_version.map
> > index 982fdd1..5474b98 100644
> > --- a/lib/librte_ring/rte_ring_version.map
> > +++ b/lib/librte_ring/rte_ring_version.map
> > @@ -11,3 +11,10 @@ DPDK_2.0 {
> >
> >  	local: *;
> >  };
> > +
> >
[dpdk-dev] [PATCH] ethdev: distinguish between drop and error stats
Make a distinction between dropped packets and error statistics to allow a
higher level fault management entity to interact with DPDK and take
appropriate measures when errors are detected. It will also provide valuable
information for any applications that collect/extract DPDK stats; such
applications include Open vSwitch.

After this patch the distinction is:

ierrors = Total number of packets dropped by hardware (malformed packets, ...)
Where the # of drops can ONLY be <= the packets received (without overlap
between registers).

Rx_pkt_errors = Total number of erroneous received packets.
Where the # of errors can be >= the packets received (without overlap
between registers), this is because there may be multiple errors associated
with a packet.

Signed-off-by: Maryam Tahhan
---
 lib/librte_ether/rte_ethdev.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 8a8c82b..53dd55d 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -200,8 +200,9 @@ struct rte_eth_stats {
 	/**< Deprecated; Total of RX packets with CRC error. */
 	uint64_t ibadlen;
 	/**< Deprecated; Total of RX packets with bad length. */
-	uint64_t ierrors; /**< Total number of erroneous received packets. */
+	uint64_t ierrors; /**< Total number of dropped received packets. */
 	uint64_t oerrors; /**< Total number of failed transmitted packets. */
+	uint64_t ipkterrors; /**< Total number of erroneous received packets. */
 	uint64_t imcasts;
 	/**< Deprecated; Total number of multicast received packets. */
 	uint64_t rx_nombuf; /**< Total number of RX mbuf allocation failures. */
-- 
2.4.3
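
Editor's illustration (not part of the patch): the two bounds stated in the commit message, drops never exceeding the number of packets and error events possibly exceeding it, can be seen in a small counting model. This is a sketch under assumptions: `toy_rx_stats` and `toy_rx_account()` are hypothetical names, `arrived` is an invented counter for packets reaching the port, and how a real driver fills `ierrors` and `ipkterrors` from hardware registers is not shown in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical counters modelled on the commit message; this is NOT the
 * real struct rte_eth_stats.
 */
struct toy_rx_stats {
	uint64_t arrived;	/* packets that reached the port */
	uint64_t ierrors;	/* packets dropped, at most one count per packet */
	uint64_t ipkterrors;	/* error events, possibly several per packet */
};

/* Account for one arriving packet that raised n_errors error events. */
void toy_rx_account(struct toy_rx_stats *s, unsigned n_errors, int dropped)
{
	assert(s != NULL);
	s->arrived++;
	s->ipkterrors += n_errors;
	if (dropped)
		s->ierrors++;
}
```

A single dropped packet that trips three checks moves `ierrors` by one but `ipkterrors` by three, which is why only the drop counter is bounded by the packet count.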
[dpdk-dev] [PATCH v3 4/8] mk: Add rule for installing modules
On 10/01/2015 03:11 AM, Mario Carrillo wrote:
> Add hierarchy-file support to the DPDK modules,
> when invoking "make install-mod" modules will be
> installed in: $(DESTDIR)/$(KERNEL_DIR)
> if RTE_EXEC_ENV=linuxapp then
> KERNEL_DIR=/lib/modules/$(uname -r)/build

/lib/modules/$(uname -r)/build is the path you need when *building*
external kernel modules; you don't want to install anything there.

The default install path for the kernel modules should be somewhere
within /lib/modules/$(uname -r)/extra/, but dunno what the recommended
naming/placing within that is.

Sorry for not catching this earlier,

	- Panu -
[dpdk-dev] [PATCH] devargs: add blacklisting by linux interface name
On Fri, 2015-10-02 at 16:18 +0100, Bruce Richardson wrote:
> On Fri, Oct 02, 2015 at 11:00:07AM -0400, Chas Williams wrote:
> > If a system is using deterministic interface names, it may be easier in
> > some cases to use the interface name to blacklist an interface.
> >
>
> Is it possible to do this using the existing arguments, i.e. have the -b flag
> detect if it's a pci address or name automatically, rather than having to use
> a separate command-line arg for it?

You might be able to distinguish names by context. I doubt interface
names ever look like PCI addresses. But that's going to be a bigger
change since -b will need to be updated to 'blacklist' instead of
'pci-blacklist' to prevent confusion. Or do you just want to overload
'-b' and keep both long options?
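
Editor's illustration (not from the thread): distinguishing the two kinds of argument "by context" could look roughly like the check below, which treats a string as a PCI address only if it has the shape [DDDD:]BB:DD.F in hex digits and otherwise assumes an interface name. This is a sketch under assumptions: `looks_like_pci_addr()`, `classify_blacklist_arg()` and the `toy_bl_kind` enum are hypothetical names, the accepted format is a simplification, and the actual EAL option parsing is not shown here.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type for a -b argument. */
enum toy_bl_kind { TOY_BL_PCI_ADDR, TOY_BL_IFNAME };

/* Length of the leading run of hex digits in s. */
static size_t hex_run(const char *s)
{
	size_t n = 0;

	while (isxdigit((unsigned char)s[n]))
		n++;
	return n;
}

/* True if s has the shape [DDDD:]BB:DD.F, all fields in hex digits. */
bool looks_like_pci_addr(const char *s)
{
	size_t n;

	if (s == NULL)
		return false;
	n = hex_run(s);
	if (n == 4 && s[4] == ':') {	/* optional 4-digit domain */
		s += 5;
		n = hex_run(s);
	}
	if (n != 2 || s[2] != ':')	/* bus */
		return false;
	s += 3;
	if (hex_run(s) != 2 || s[2] != '.')	/* device */
		return false;
	s += 3;
	return hex_run(s) == 1 && s[1] == '\0';	/* function, then end */
}

/* Anything that is not shaped like a PCI address is taken as a name. */
enum toy_bl_kind classify_blacklist_arg(const char *arg)
{
	assert(arg != NULL);
	return looks_like_pci_addr(arg) ? TOY_BL_PCI_ADDR : TOY_BL_IFNAME;
}
```

Typical names such as eth0 or enp2s0f0 fall through to the interface-name branch. A name that happens to be spelled like ab:cd.e would be misclassified, so the overlap is unlikely rather than impossible.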
[dpdk-dev] [PATCH v3 8/8] mk: Add rule for installing runtime files
On Fri, Oct 02, 2015 at 02:15:29PM +0300, Panu Matilainen wrote:
> On 10/01/2015 03:11 AM, Mario Carrillo wrote:
> >Add hierarchy-file support to the DPDK libraries, modules,
> >binary files, nic bind files and documentation,
> >when invoking "make install-fhs" (filesystem hierarchy standard)
> >runtime files will be by default installed in:
> >$(DESTDIR)/$(BIN_DIR) where BIN_DIR=/usr/bin (binary files)
> >$(DESTDIR)/$(SBIN_DIR) where SBIN_DIR=/usr/sbin/dpdk_nic_bind (nic bind
> >files)
> >$(DESTDIR)/$(DOC_DIR) where DOC_DIR=/usr/share/doc/dpdk (documentation)
> >$(DESTDIR)/$(LIB_DIR) (libraries)
> >if the architecture is 64 bits then LIB_DIR=/usr/lib64
> >else LIB_DIR=/usr/lib
> >$(DESTDIR)/$(KERNEL_DIR) (modules)
> >if RTE_EXEC_ENV=linuxapp then
> >KERNEL_DIR=/lib/modules/$(uname -r)/build
> >else KERNEL_DIR=/boot/modules
> >All directory variables mentioned above can be overridden.
> >This hierarchy is based on:
> >http://www.freedesktop.org/software/systemd/man/file-hierarchy.html
> >
>
> Hmm, I think there's a slight misunderstanding here.
>
> What I meant earlier by install-sdk and install-fhs is to preserve the
> current behavior of "make install" as "make install-sdk" and have "make
> install-fhs" behave like normal OSS app on "make install", which installs
> everything (both devel and runtime parts)
>
> This patch series eliminates the current behavior of "make install"
> entirely. I personally would not miss it at all, but there likely are people
> relying on it since its quite visibly documented and all. So I think the
> idea was to introduce a separate FHS-installation target and then deal with
> the notion of default behaviors etc separately.
>
> I guess it was already this way in v2 of the series, apologies for missing
> it there.
>
> - Panu -

I also think that having some way to get the old behaviour for those relying
on it would be good. Even though it's not ABI affecting, for those compiling
from source it would be nice to follow some sort of gradual deprecation
process rather than just changing everything in one go.

/Bruce
[dpdk-dev] [PATCH 3/3] Modifying configuration scripts for Netronome's nfp_uio driver.
From: "Alejandro.Lucero"

Signed-off-by: Alejandro.Lucero
Signed-off-by: Rolf.Neugebauer
---
 tools/dpdk_nic_bind.py |   8 ++--
 tools/setup.sh         | 122 ++--
 2 files changed, 101 insertions(+), 29 deletions(-)

diff --git a/tools/dpdk_nic_bind.py b/tools/dpdk_nic_bind.py
index b7bd877..f7f8a39 100755
--- a/tools/dpdk_nic_bind.py
+++ b/tools/dpdk_nic_bind.py
@@ -43,7 +43,7 @@ ETHERNET_CLASS = "0200"
 # Each device within this is itself a dictionary of device properties
 devices = {}
 # list of supported DPDK drivers
-dpdk_drivers = [ "igb_uio", "vfio-pci", "uio_pci_generic" ]
+dpdk_drivers = [ "igb_uio", "vfio-pci", "uio_pci_generic", "nfp_uio" ]
 
 # command-line arg flags
 b_flag = None
@@ -153,7 +153,7 @@ def find_module(mod):
     return path
 
 def check_modules():
-    '''Checks that igb_uio is loaded'''
+    '''Checks that at least one dpdk module is loaded'''
     global dpdk_drivers
 
     fd = file("/proc/modules")
@@ -261,7 +261,7 @@ def get_nic_details():
                 devices[d]["Active"] = "*Active*"
                 break;
 
-        # add igb_uio to list of supporting modules if needed
+        # add module to list of supporting modules if needed
         if "Module_str" in devices[d]:
             for driver in dpdk_drivers:
                 if driver not in devices[d]["Module_str"]:
@@ -440,7 +440,7 @@ def display_devices(title, dev_list, extra_params = None):
 
 def show_status():
     '''Function called when the script is passed the "--status" option. Displays
-    to the user what devices are bound to the igb_uio driver, the kernel driver
+    to the user what devices are bound to a dpdk driver, the kernel driver
     or to no driver'''
     global dpdk_drivers
     kernel_drv = []
diff --git a/tools/setup.sh b/tools/setup.sh
index 5a8b2f3..e434ddb 100755
--- a/tools/setup.sh
+++ b/tools/setup.sh
@@ -236,6 +236,52 @@ load_vfio_module()
 }
 
 #
+# Unloads nfp_uio.ko.
+#
+remove_nfp_uio_module()
+{
+	echo "Unloading any existing DPDK UIO module"
+	/sbin/lsmod | grep -s nfp_uio > /dev/null
+	if [ $? -eq 0 ] ; then
+		sudo /sbin/rmmod nfp_uio
+	fi
+}
+
+#
+# Loads new nfp_uio.ko (and uio module if needed).
+#
+load_nfp_uio_module()
+{
+	echo "Using RTE_SDK=$RTE_SDK and RTE_TARGET=$RTE_TARGET"
+	if [ ! -f $RTE_SDK/$RTE_TARGET/kmod/nfp_uio.ko ];then
+		echo "## ERROR: Target does not have the DPDK UIO Kernel Module."
+		echo " To fix, please try to rebuild target."
+		return
+	fi
+
+	remove_nfp_uio_module
+
+	/sbin/lsmod | grep -s uio > /dev/null
+	if [ $? -ne 0 ] ; then
+		modinfo uio > /dev/null
+		if [ $? -eq 0 ]; then
+			echo "Loading uio module"
+			sudo /sbin/modprobe uio
+		fi
+	fi
+
+	# UIO may be compiled into kernel, so it may not be an error if it can't
+	# be loaded.
+
+	echo "Loading DPDK UIO module"
+	sudo /sbin/insmod $RTE_SDK/$RTE_TARGET/kmod/nfp_uio.ko
+	if [ $? -ne 0 ] ; then
+		echo "## ERROR: Could not load kmod/nfp_uio.ko."
+		quit
+	fi
+}
+
+#
 # Unloads the rte_kni.ko module.
 #
 remove_kni_module()
@@ -427,10 +473,10 @@ grep_meminfo()
 #
 show_nics()
 {
-	if /sbin/lsmod | grep -q -e igb_uio -e vfio_pci; then
+	if /sbin/lsmod | grep -q -e igb_uio -e vfio_pci -e nfp_uio; then
 		${RTE_SDK}/tools/dpdk_nic_bind.py --status
 	else
-		echo "# Please load the 'igb_uio' or 'vfio-pci' kernel module before "
+		echo "# Please load the 'igb_uio', 'vfio-pci' or 'nfp_uio' kernel module before "
 		echo "# querying or adjusting NIC device bindings"
 	fi
 }
@@ -471,6 +517,23 @@ bind_nics_to_igb_uio()
 }
 
 #
+# Uses dpdk_nic_bind.py to move devices to work with nfp_uio
+#
+bind_nics_to_nfp_uio()
+{
+	if /sbin/lsmod | grep -q nfp_uio ; then
+		${RTE_SDK}/tools/dpdk_nic_bind.py --status
+		echo ""
+		echo -n "Enter PCI address of device to bind to NFP UIO driver: "
+		read PCI_PATH
+		sudo ${RTE_SDK}/tools/dpdk_nic_bind.py -b nfp_uio $PCI_PATH && echo "OK"
+	else
+		echo "# Please load the 'nfp_uio' kernel module before querying or "
+		echo "# adjusting NIC device bindings"
+	fi
+}
+
+#
 # Uses dpdk_nic_bind.py to move devices to work with kernel drivers again
 #
 unbind_nics()
@@ -513,29 +576,35 @@ step2_func()
 	TEXT[1]="Insert IGB UIO module"
 	FUNC[1]="load_igb_uio_module"
 
-	TEXT[2]="Insert VFIO module"
-	FUNC[2]="load_vfio_module"
+	TEXT[2]="Insert NFP UIO module"
+	FUNC[2]="load_nfp_uio_module"
 
-	TEXT[3]="Insert KNI module"
-	FUNC[3]="load_kni_module"
+	TEXT[3]="Insert VFIO
[dpdk-dev] [PATCH 2/3] This patch adds a new UIO driver for Netronome NFP PCI cards.
From: "Alejandro.Lucero"

Current Netronome's PMD just supports Virtual Functions. Future Physical
Function support will require specific Netronome code here.

Signed-off-by: Alejandro.Lucero
Signed-off-by: Rolf.Neugebauer
---
 lib/librte_eal/common/include/rte_pci.h   |   1 +
 lib/librte_eal/linuxapp/eal/eal_pci.c     |   4 +
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c |   2 +-
 lib/librte_eal/linuxapp/nfp_uio/Makefile  |  53 +++
 lib/librte_eal/linuxapp/nfp_uio/nfp_uio.c | 497 +
 lib/librte_ether/rte_ethdev.c             |   1 +
 6 files changed, 557 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_eal/linuxapp/nfp_uio/Makefile
 create mode 100644 lib/librte_eal/linuxapp/nfp_uio/nfp_uio.c

diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 83e3c28..89baaf6 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -146,6 +146,7 @@ struct rte_devargs;
 enum rte_kernel_driver {
 	RTE_KDRV_UNKNOWN = 0,
 	RTE_KDRV_IGB_UIO,
+	RTE_KDRV_NFP_UIO,
 	RTE_KDRV_VFIO,
 	RTE_KDRV_UIO_GENERIC,
 	RTE_KDRV_NIC_UIO,
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index bc5b5be..19a93fe 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -137,6 +137,7 @@ pci_map_device(struct rte_pci_device *dev)
 #endif
 		break;
 	case RTE_KDRV_IGB_UIO:
+	case RTE_KDRV_NFP_UIO:
 	case RTE_KDRV_UIO_GENERIC:
 		/* map resources for devices that use uio */
 		ret = pci_uio_map_resource(dev);
@@ -161,6 +162,7 @@ pci_unmap_device(struct rte_pci_device *dev)
 		RTE_LOG(ERR, EAL, "Hotplug doesn't support vfio yet\n");
 		break;
 	case RTE_KDRV_IGB_UIO:
+	case RTE_KDRV_NFP_UIO:
 	case RTE_KDRV_UIO_GENERIC:
 		/* unmap resources for devices that use uio */
 		pci_uio_unmap_resource(dev);
@@ -357,6 +359,8 @@ pci_scan_one(const char *dirname, uint16_t domain, uint8_t bus,
 			dev->kdrv = RTE_KDRV_VFIO;
 		else if (!strcmp(driver, "igb_uio"))
 			dev->kdrv = RTE_KDRV_IGB_UIO;
+		else if (!strcmp(driver, "nfp_uio"))
+			dev->kdrv = RTE_KDRV_NFP_UIO;
 		else if (!strcmp(driver, "uio_pci_generic"))
 			dev->kdrv = RTE_KDRV_UIO_GENERIC;
 		else
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index ac50e13..29ec9cb 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -270,7 +270,7 @@ pci_uio_alloc_resource(struct rte_pci_device *dev,
 		goto error;
 	}
 
-	if (dev->kdrv == RTE_KDRV_IGB_UIO)
+	if (dev->kdrv == RTE_KDRV_IGB_UIO || dev->kdrv == RTE_KDRV_NFP_UIO)
 		dev->intr_handle.type = RTE_INTR_HANDLE_UIO;
 	else {
 		dev->intr_handle.type = RTE_INTR_HANDLE_UIO_INTX;
diff --git a/lib/librte_eal/linuxapp/nfp_uio/Makefile b/lib/librte_eal/linuxapp/nfp_uio/Makefile
new file mode 100644
index 000..b9e2f0a
--- /dev/null
+++ b/lib/librte_eal/linuxapp/nfp_uio/Makefile
@@ -0,0 +1,53 @@
+# BSD LICENSE
+#
+# Copyright(c) 2014-2015 Netronome. All rights reserved.
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+#
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in
+# the documentation and/or other materials provided with the
+# distribution.
+# * Neither the name of Intel Corporation nor the names of its
+# contributors may be used to endorse or promote products derived
+# from this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
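
Editor's illustration (not part of the patch): the EAL hunks in this patch add nfp_uio in three places, namely the driver-name lookup, the UIO resource-mapping switch, and the interrupt handle type selection. A table-driven standalone model of those three decisions is sketched below. Assumptions are labeled: `toy_kdrv` and the `toy_kdrv_*` functions are hypothetical names, the "vfio-pci" string and the fall-through to an unknown value are not visible in the quoted hunk, and only the drivers named in the hunks are modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of enum rte_kernel_driver, reduced for the sketch. */
enum toy_kdrv {
	TOY_KDRV_UNKNOWN = 0,
	TOY_KDRV_IGB_UIO,
	TOY_KDRV_NFP_UIO,
	TOY_KDRV_VFIO,
	TOY_KDRV_UIO_GENERIC
};

struct toy_kdrv_name {
	const char *name;
	enum toy_kdrv kdrv;
};

static const struct toy_kdrv_name toy_kdrv_names[] = {
	{ "vfio-pci", TOY_KDRV_VFIO },	/* assumed; string not shown in the hunk */
	{ "igb_uio", TOY_KDRV_IGB_UIO },
	{ "nfp_uio", TOY_KDRV_NFP_UIO },
	{ "uio_pci_generic", TOY_KDRV_UIO_GENERIC },
};

/* Map a kernel driver name to its enum value, as pci_scan_one() does. */
enum toy_kdrv toy_kdrv_from_name(const char *driver)
{
	size_t i;

	assert(driver != NULL);
	for (i = 0; i < sizeof(toy_kdrv_names) / sizeof(toy_kdrv_names[0]); i++)
		if (strcmp(driver, toy_kdrv_names[i].name) == 0)
			return toy_kdrv_names[i].kdrv;
	return TOY_KDRV_UNKNOWN;	/* assumed fallback */
}

/* 1 if resources are mapped through the UIO path (pci_map_device()). */
int toy_kdrv_maps_via_uio(enum toy_kdrv k)
{
	return k == TOY_KDRV_IGB_UIO || k == TOY_KDRV_NFP_UIO ||
	       k == TOY_KDRV_UIO_GENERIC;
}

/* 1 if the handle type is UIO rather than UIO_INTX (pci_uio_alloc_resource()). */
int toy_kdrv_uses_uio_intr(enum toy_kdrv k)
{
	return k == TOY_KDRV_IGB_UIO || k == TOY_KDRV_NFP_UIO;
}
```

In this model nfp_uio behaves exactly like igb_uio on all three questions, which matches the commit message's note that Netronome-specific code is only expected once Physical Function support is added.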
[dpdk-dev] [PATCH 1/3] This patch adds a PMD driver for Netronome NFP PCI cards.
From: "Alejandro.Lucero"

Signed-off-by: Alejandro.Lucero
Signed-off-by: Rolf.Neugebauer
---
 config/common_linuxapp           |    6 +
 doc/guides/nics/nfp.rst          |  248
 drivers/net/Makefile             |    1 +
 drivers/net/nfp/Makefile         |   88 ++
 drivers/net/nfp/nfp_net.c        | 2480 ++
 drivers/net/nfp/nfp_net_ctrl.h   |  294 +
 drivers/net/nfp/nfp_net_logs.h   |   76 ++
 drivers/net/nfp/nfp_net_pmd.h    |  415 +++
 lib/librte_eal/linuxapp/Makefile |    3 +
 mk/rte.app.mk                    |    1 +
 10 files changed, 3612 insertions(+)
 create mode 100644 doc/guides/nics/nfp.rst
 create mode 100644 drivers/net/nfp/Makefile
 create mode 100644 drivers/net/nfp/nfp_net.c
 create mode 100644 drivers/net/nfp/nfp_net_ctrl.h
 create mode 100644 drivers/net/nfp/nfp_net_logs.h
 create mode 100644 drivers/net/nfp/nfp_net_pmd.h

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 0de43d5..d8d6384 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -108,6 +108,7 @@ CONFIG_RTE_LIBEAL_USE_HPET=n
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
 CONFIG_RTE_EAL_IGB_UIO=y
+CONFIG_RTE_EAL_NFP_UIO=y
 CONFIG_RTE_EAL_VFIO=y
 CONFIG_RTE_MALLOC_DEBUG=n
 
@@ -238,6 +239,11 @@ CONFIG_RTE_LIBRTE_ENIC_PMD=y
 CONFIG_RTE_LIBRTE_ENIC_DEBUG=n
 
 #
+# Compile burst-oriented Netronome PMD driver
+#
+CONFIG_RTE_LIBRTE_NFP_PMD=y
+
+#
 # Compile burst-oriented VIRTIO PMD driver
 #
 CONFIG_RTE_LIBRTE_VIRTIO_PMD=y
diff --git a/doc/guides/nics/nfp.rst b/doc/guides/nics/nfp.rst
new file mode 100644
index 000..df5a746
--- /dev/null
+++ b/doc/guides/nics/nfp.rst
@@ -0,0 +1,248 @@
+..  BSD LICENSE
+    Copyright(c) 2015 Netronome Systems, Inc. All rights reserved.
+    All rights reserved.
+
+    Redistribution and use in source and binary forms, with or without
+    modification, are permitted provided that the following conditions
+    are met:
+
+    * Redistributions of source code must retain the above copyright
+    notice, this list of conditions and the following disclaimer.
+    * Redistributions in binary form must reproduce the above copyright
+    notice, this list of conditions and the following disclaimer in
+    the documentation and/or other materials provided with the
+    distribution.
+    * Neither the name of Intel Corporation nor the names of its
+    contributors may be used to endorse or promote products derived
+    from this software without specific prior written permission.
+
+    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+1. Intro
+
+
+Netronome's sixth generation of flow processors pack 216 programmable
+cores and over 100 hardware accelerators that uniquely combine packet,
+flow, security and content processing in a single device that scales
+up to 400 Gbps.
+
+This document explains how to use DPDK with the Netronome Poll Mode
+Driver (PMD) supporting Netronome's Network Flow Processor 6xxx
+(NFP-6xxx).
+
+Currently the driver supports virtual functions (VFs) only.
+
+2. Dependencies
+===
+
+Before using the Netronome's DPDK PMD some NFP-6xxx configuration,
+which is not related to DPDK, is required. The system requires
+installation of Netronome's BSP (Board Support Package) which includes
+Linux drivers, programs and libraries.
+
+If you have a NFP-6xxx device you should already have the code and
+documentation for doing this configuration. Contact
+support at netronome.com to obtain the latest available firmware.
+
+The NFP Linux kernel drivers (including the required PF driver for the
+NFP) are available on Github at
+https://github.com/Netronome/nfp-drv-kmods along with build
+instructions.
+
+DPDK runs in userspace and PMDs uses the Linux kernel UIO interface to
+allow access to physical devices from userspace. The NFP PMD requires
+a separate UIO driver, nfp_uio, to perform correct
+initialization. This driver is part of the DPDK source tree and is
+equivalent to Intel's igb_uio driver.
+
+3. Building the software
+
+
+Netronome's PMD code is provided in the drivers/net/nfp directory and
+nfp_uio is present in the lib/librte_eal/linuxapp/nfp_uio
[dpdk-dev] [Pktgen] [PATCH] pktgen_setup_packets: fix race for packet header
Ping. On 17.09.2015 08:55, Ilya Maximets wrote: > Ok. Thank you. I'll wait. > > On 16.09.2015 18:37, Wiles, Keith wrote: >> Thanks the patch looks fine, but I have not had a lot of time to review it >> detail. I hope to get to it next week after I return back home. >> >> On 9/16/15, 2:09 AM, "Ilya Maximets" wrote: >> >>> Ping. >>> >>> On 09.09.2015 17:22, Ilya Maximets wrote: While pktgen_setup_packets() all threads of one port uses same info->seq_pkt. This leads to constructing packets in the same memory region (>hdr). As a result, pktgen_setup_packets generates random headers. Fix that by making a local copy of info->seq_pkt and using it for constructing of packets. Signed-off-by: Ilya Maximets --- app/pktgen-arp.c | 2 +- app/pktgen-cmds.c | 40 app/pktgen-ipv4.c | 2 +- app/pktgen.c | 39 +++ app/pktgen.h | 4 ++-- app/t/pktgen.t.c | 6 +++--- 6 files changed, 54 insertions(+), 39 deletions(-) diff --git a/app/pktgen-arp.c b/app/pktgen-arp.c index c378880..b7040d7 100644 --- a/app/pktgen-arp.c +++ b/app/pktgen-arp.c @@ -190,7 +190,7 @@ pktgen_process_arp( struct rte_mbuf * m, uint32_t pid, uint32_t vlan ) rte_memcpy(>eth_dst_addr, >sha, 6); for (i = 0; i < info->seqCnt; i++) - pktgen_packet_ctor(info, i, -1); + pktgen_packet_ctor(info, i, -1, NULL); } // Swap the two MAC addresses diff --git a/app/pktgen-cmds.c b/app/pktgen-cmds.c index da040e5..a6abb41 100644 --- a/app/pktgen-cmds.c +++ b/app/pktgen-cmds.c @@ -931,7 +931,7 @@ pktgen_set_proto(port_info_t * info, char type) if ( type == 'i' ) info->seq_pkt[SINGLE_PKT].ethType = ETHER_TYPE_IPv4; - pktgen_packet_ctor(info, SINGLE_PKT, -1); + pktgen_packet_ctor(info, SINGLE_PKT, -1, NULL); } / **//** @@ -1067,7 +1067,7 @@ pktgen_set_pkt_type(port_info_t * info, const char * type) (type[3] == '6') ? 
ETHER_TYPE_IPv6 : /* TODO print error: unknown type */ ETHER_TYPE_IPv4; - pktgen_packet_ctor(info, SINGLE_PKT, -1); + pktgen_packet_ctor(info, SINGLE_PKT, -1, NULL); } / **//** @@ -1092,7 +1092,7 @@ pktgen_set_vlan(port_info_t * info, uint32_t onOff) } else pktgen_clr_port_flags(info, SEND_VLAN_ID); - pktgen_packet_ctor(info, SINGLE_PKT, -1); + pktgen_packet_ctor(info, SINGLE_PKT, -1, NULL); } / **//** @@ -1112,7 +1112,7 @@ pktgen_set_vlanid(port_info_t * info, uint16_t vlanid) { info->vlanid = vlanid; info->seq_pkt[SINGLE_PKT].vlanid = info->vlanid; - pktgen_packet_ctor(info, SINGLE_PKT, -1); + pktgen_packet_ctor(info, SINGLE_PKT, -1, NULL); } / **//** @@ -1137,7 +1137,7 @@ pktgen_set_mpls(port_info_t * info, uint32_t onOff) } else pktgen_clr_port_flags(info, SEND_MPLS_LABEL); - pktgen_packet_ctor(info, SINGLE_PKT, -1); + pktgen_packet_ctor(info, SINGLE_PKT, -1, NULL); } / **//** @@ -1157,7 +1157,7 @@ pktgen_set_mpls_entry(port_info_t * info, uint32_t mpls_entry) { info->mpls_entry = mpls_entry; info->seq_pkt[SINGLE_PKT].mpls_entry = info->mpls_entry; - pktgen_packet_ctor(info, SINGLE_PKT, -1); + pktgen_packet_ctor(info, SINGLE_PKT, -1, NULL); } / **//** @@ -1182,7 +1182,7 @@ pktgen_set_qinq(port_info_t * info, uint32_t onOff) } else pktgen_clr_port_flags(info, SEND_Q_IN_Q_IDS); - pktgen_packet_ctor(info, SINGLE_PKT, -1); + pktgen_packet_ctor(info, SINGLE_PKT, -1, NULL); } / **//** @@ -1204,7 +1204,7 @@ pktgen_set_qinqids(port_info_t * info, uint16_t outerid, uint16_t innerid) info->seq_pkt[SINGLE_PKT].qinq_outerid =
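The fix described in the commit message, building each packet from a private copy of the shared template instead of in place, can be sketched in plain C. This is only an illustration under stated assumptions, not the pktgen code: `pkt_template`, `shared_port`, and `build_packet()` are hypothetical names, and the header layout is made up. The patch itself adds a fourth argument to `pktgen_packet_ctor()`, which the callers shown above pass as NULL.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one entry of a shared per-port template. */
struct pkt_template {
    uint8_t  hdr[64];   /* scratch area where the header is assembled */
    uint16_t vlanid;
    uint32_t seq;
};

/* Hypothetical shared state: one template read by every TX thread. */
struct shared_port {
    struct pkt_template seq_pkt;
};

/*
 * Build a header for sequence number `seq` into `out`.
 * The shared template is only read; every write goes to a local copy,
 * so concurrent callers cannot overwrite each other's header bytes.
 */
static void
build_packet(const struct shared_port *port, uint32_t seq, uint8_t out[64])
{
    /* Private copy on the calling thread's stack. */
    struct pkt_template local = port->seq_pkt;

    local.seq = seq;
    memset(local.hdr, 0, sizeof(local.hdr));
    memcpy(&local.hdr[0], &local.vlanid, sizeof(local.vlanid));
    memcpy(&local.hdr[4], &local.seq, sizeof(local.seq));

    memcpy(out, local.hdr, sizeof(local.hdr));
}
```

The sketch assumes nobody modifies the shared template while it is being copied; if configuration commands can run concurrently with transmit threads, that copy would need its own protection.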
[dpdk-dev] [Pktgen] [PATCH] pktgen_setup_packets: fix race for packet header
I looked at the code and everything looks good. I will try to merge the code
next week as I am traveling again :-(

Thanks for the patch, I am glad you found this problem as I believe someone
else reported something odd in that area, but was not able to give me many
details.

Regards,
++Keith Wiles
Intel Corporation

On 10/2/15, 10:23 AM, "Ilya Maximets" wrote:

>Ping.
>
>On 17.09.2015 08:55, Ilya Maximets wrote:
>> Ok. Thank you. I'll wait.
>>
>> On 16.09.2015 18:37, Wiles, Keith wrote:
>>> Thanks the patch looks fine, but I have not had a lot of time to
>>> review it detail. I hope to get to it next week after I return back home.
>>>
>>> On 9/16/15, 2:09 AM, "Ilya Maximets" wrote:
>>>
>>>> Ping.
>>>>
>>>> On 09.09.2015 17:22, Ilya Maximets wrote:
>>>>> While pktgen_setup_packets() all threads of one port uses same
>>>>> info->seq_pkt. This leads to constructing packets in the same memory
>>>>> region (>hdr). As a result, pktgen_setup_packets generates random
>>>>> headers.
>>>>>
>>>>> Fix that by making a local copy of info->seq_pkt and using it for
>>>>> constructing of packets.
[dpdk-dev] [PATCH] devargs: add blacklisting by linux interface name
If a system is using deterministic interface names, it may be easier in some cases to use the interface name to blacklist an interface. Signed-off-by: Chas Williams <3chas3 at gmail.com> --- app/test/test_devargs.c | 2 ++ lib/librte_eal/common/eal_common_devargs.c | 8 lib/librte_eal/common/eal_common_options.c | 10 ++ lib/librte_eal/common/eal_common_pci.c | 17 +++-- lib/librte_eal/common/eal_options.h | 2 ++ lib/librte_eal/common/include/rte_devargs.h | 5 + lib/librte_eal/common/include/rte_pci.h | 1 + lib/librte_eal/linuxapp/eal/eal_pci.c | 15 +++ 8 files changed, 54 insertions(+), 6 deletions(-) diff --git a/app/test/test_devargs.c b/app/test/test_devargs.c index f7fc59c..c204c49 100644 --- a/app/test/test_devargs.c +++ b/app/test/test_devargs.c @@ -85,6 +85,8 @@ test_devargs(void) goto fail; if (rte_eal_devargs_type_count(RTE_DEVTYPE_VIRTUAL) != 2) goto fail; + if (rte_eal_devargs_add(RTE_DEVTYPE_BLACKLISTED_NAME, "eth0") < 0) + goto fail; free_devargs_list(); /* check virtual device with argument parsing */ diff --git a/lib/librte_eal/common/eal_common_devargs.c b/lib/librte_eal/common/eal_common_devargs.c index ec56165..cac651b 100644 --- a/lib/librte_eal/common/eal_common_devargs.c +++ b/lib/librte_eal/common/eal_common_devargs.c @@ -113,6 +113,14 @@ rte_eal_devargs_add(enum rte_devtype devtype, const char *devargs_str) goto fail; break; + case RTE_DEVTYPE_BLACKLISTED_NAME: + /* save interface name */ + ret = snprintf(devargs->name.name, + sizeof(devargs->name.name), "%s", buf); + if (ret < 0 || ret >= (int)sizeof(devargs->name.name)) + goto fail; + + break; } free(buf); diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c index 1f459ac..c08126d 100644 --- a/lib/librte_eal/common/eal_common_options.c +++ b/lib/librte_eal/common/eal_common_options.c @@ -90,6 +90,7 @@ eal_long_options[] = { {OPT_VFIO_INTR, 1, NULL, OPT_VFIO_INTR_NUM}, {OPT_VMWARE_TSC_MAP,0, NULL, OPT_VMWARE_TSC_MAP_NUM }, {OPT_XEN_DOM0, 0, NULL, 
OPT_XEN_DOM0_NUM }, + {OPT_BLACKLISTED_NAME, 1, NULL, OPT_BLACKLISTED_NAME_NUM }, {0, 0, NULL, 0} }; @@ -785,6 +786,13 @@ eal_parse_common_option(int opt, const char *optarg, } break; + case OPT_BLACKLISTED_NAME_NUM: + if (rte_eal_devargs_add(RTE_DEVTYPE_BLACKLISTED_NAME, + optarg) < 0) { + return -1; + } + break; + /* don't know what to do, leave this to caller */ default: return 1; @@ -898,6 +906,8 @@ eal_common_usage(void) " --"OPT_VDEV" Add a virtual device.\n" " The argument format is [,key=val,...]\n" " (ex: --vdev=eth_pcap0,iface=eth2).\n" + " --"OPT_BLACKLISTED_NAME" Add a device name to the black list.\n" + " Prevent EAL from using this named interface.\n" " --"OPT_VMWARE_TSC_MAP"Use VMware TSC map instead of native RDTSC\n" " --"OPT_PROC_TYPE" Type of this process (primary|secondary|auto)\n" " --"OPT_SYSLOG"Set syslog facility\n" diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c index dcfe947..41a7690 100644 --- a/lib/librte_eal/common/eal_common_pci.c +++ b/lib/librte_eal/common/eal_common_pci.c @@ -90,11 +90,15 @@ static struct rte_devargs *pci_devargs_lookup(struct rte_pci_device *dev) struct rte_devargs *devargs; TAILQ_FOREACH(devargs, _list, next) { - if (devargs->type != RTE_DEVTYPE_BLACKLISTED_PCI && - devargs->type != RTE_DEVTYPE_WHITELISTED_PCI) - continue; - if (!rte_eal_compare_pci_addr(>addr, >pci.addr)) - return devargs; + if (devargs->type == RTE_DEVTYPE_BLACKLISTED_PCI || + devargs->type == RTE_DEVTYPE_WHITELISTED_PCI) { + if (!rte_eal_compare_pci_addr(>addr, >pci.addr)) + return devargs; + } + if (devargs->type == RTE_DEVTYPE_BLACKLISTED_NAME) { + if (strcmp(dev->name, devargs->name.name) == 0) + return devargs; + } } return NULL; } @@ -174,7 +178,8 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct
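The two pieces of logic this patch adds, saving an interface name with a truncation check and later matching a device against the saved names, can be illustrated with a standalone sketch. The types and names below (`name_entry`, `NAME_MAX_LEN`, `save_name()`, `is_blacklisted()`) are hypothetical and do not mirror the real `rte_devargs` layout; only the `snprintf()` and `strcmp()` pattern is taken from the patch.

```c
#include <stdio.h>
#include <string.h>

#define NAME_MAX_LEN 32  /* assumed buffer size, for illustration only */

struct name_entry {
    char name[NAME_MAX_LEN];
};

/* Copy `src` into the entry; return -1 on error or if it would be truncated. */
static int
save_name(struct name_entry *e, const char *src)
{
    int ret = snprintf(e->name, sizeof(e->name), "%s", src);

    /* snprintf returns the length it wanted to write, so >= size means truncation. */
    if (ret < 0 || ret >= (int)sizeof(e->name))
        return -1;
    return 0;
}

/* Return 1 if `ifname` matches any saved entry, 0 otherwise. */
static int
is_blacklisted(const struct name_entry *list, size_t n, const char *ifname)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (strcmp(list[i].name, ifname) == 0)
            return 1;
    }
    return 0;
}
```

Rejecting a truncated name, rather than storing the shortened string, avoids blacklisting a different interface whose name happens to share the same prefix.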
[dpdk-dev] [PATCH v3 6/8] mk: Add rule for installing nic bind files
> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Panu Matilainen
> Sent: Friday, October 2, 2015 11:50 AM
> To: Arevalo, Mario Alfredo C; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 6/8] mk: Add rule for installing nic
> bind files
>
> On 10/01/2015 03:11 AM, Mario Carrillo wrote:
> > Add hierarchy-file support to the DPDK nic bind files, when invoking
> > "make install-sbin" nic bind files will be installed by default in:
> > $(DESTDIR)/$(SBIN_DIR) where SBIN_DIR=/usr/sbin/dpdk_nic_bind by
> > default, you can override SBIN_DIR var.
> > This hierarchy is based on:
> > http://www.freedesktop.org/software/systemd/man/file-hierarchy.html
> > and dpdk spec file.
> >
> > Signed-off-by: Mario Carrillo
> > ---
> > mk/rte.sdkinstall.mk | 14 ++
> > mk/rte.sdkroot.mk| 4 ++--
> > 2 files changed, 16 insertions(+), 2 deletions(-)
> >
> > diff --git a/mk/rte.sdkinstall.mk b/mk/rte.sdkinstall.mk index
> > 5a2fd40..4eecf31 100644
> > --- a/mk/rte.sdkinstall.mk
> > +++ b/mk/rte.sdkinstall.mk
> > @@ -46,11 +46,13 @@ else
> > INCLUDE_DIR ?= /usr/include/dpdk
> > BIN_DIR ?= /usr/bin
> > DOC_DIR ?= /usr/share/doc/dpdk
> > +SBIN_DIR ?= /usr/sbin/dpdk_nic_bind
> > HSLINKS := $(wildcard $(RTE_OUTPUT)/include/*)
> > BINARY_FILES := $(patsubst %.map,,$(wildcard $(RTE_OUTPUT)/app/*))
> > LIBS := $(wildcard $(RTE_OUTPUT)/lib/*)
> > MODULES := $(wildcard $(RTE_OUTPUT)/kmod/*)
> > DOCS := $(wildcard $(BUILD_DIR)/doc/*)
> > +NIC_BIND_FILES := $(wildcard $(BUILD_DIR)/tools/*nic_bind.py)
> > include $(BUILD_DIR)/build/.config
> > RTE_ARCH := $(CONFIG_RTE_ARCH:"%"=%)
> > RTE_EXEC_ENV := $(CONFIG_RTE_EXEC_ENV:"%"=%) @@ -161,6 +163,18 @@
> > install-doc:
> > echo installing: $$DOC; \
> > done
> > #
> > +# install nic bind files in /usr/sbin/dpdk_nic_bind # by default
> > +SBIN_DIR can be overridden.
> > +#
>
> This creates an out-of-path directory /usr/sbin/dpdk_nic_bind/ in which
> the dpdk_nic_bind.py is installed. Besides not being a very accessible
> location, the FHS explicitly forbids creation of subdirectories below
> /usr/[s]bin.
>
> SBIN_DIR should be /usr/sbin unless overridden, but OTOH I think this
> could go into /usr/bin just as well, the split is fairly ambiguous anyway
> (I mean, testpmd is not something a regular user is going to run
> either)
>
> In addition, if dpdk_nic_bind.py is installed then perhaps the
> cpu_layout.py utility should be installed too?
>
> - Panu -

I think there are better utilities available for determining the core layout
than cpu_layout.py. "lstopo", for one, is much more powerful. Do we want/need
to keep our own script around for that?

/Bruce
[dpdk-dev] DPDK Logo Release
2015-10-01 15:28, St Leger, Jim:
> When can we expect the main website (including home page http://dpdk.org/) to
> be updated?

As there was no comment about the demo page, it is now applied to every page.

> Have we opened up the website to allow the community to edit it? (I think
> this has been discussed in the past...)

Yes you're right, it has been discussed and planned.
The git tree will be published. Should it include the server config parts?
Should we open a new mailing-list to receive patches and discussions?

> Who owns website changes today?

As you know, it was created at 6WIND but there is no special reason to not
allow others to contribute. The opening effort hasn't been done yet because
there was no specific request until recently and the email box
admin at dpdk.org (advertised in the "about" section) was almost empty.
Glad to see people interested in improving such things :)

Thanks Intel for the nice logo!
Do you have some insights about its meaning to share?
[dpdk-dev] [PATCH 0/2] xenvirt hotplug support
add PCI Port Hotplug support to the xenvirt PMD

This patch depends on 4 patches from the following patch set:
-remove-pci-driver-from-vdevs.patch

0001-librte_eal-add-RTE_KDRV_NONE-for-vdevs.patch
0002-librte_ether-add-fields-from-rte_pci_driver-to-rte_e.patch
0003-librte_ether-add-function-rte_eth_copy_dev_info.patch
0009-xenvirt-copy-pci-device-info-to-eth_dev-data.patch

Bernard Iremonger (2):
  xenvirt: add support for PCI Port Hotplug
  xenvirt: free queues in dev_close

 drivers/net/xenvirt/rte_eth_xenvirt.c | 87 +++
 drivers/net/xenvirt/rte_xen_lib.c | 26 +--
 drivers/net/xenvirt/rte_xen_lib.h | 5 +-
 3 files changed, 105 insertions(+), 13 deletions(-)

--
1.9.1
[dpdk-dev] [PATCH] vhost_xen: fix compile error in main.c
Signed-off-by: Bernard Iremonger
---
 examples/vhost_xen/main.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/examples/vhost_xen/main.c b/examples/vhost_xen/main.c
index 5d20700..d124be1 100644
--- a/examples/vhost_xen/main.c
+++ b/examples/vhost_xen/main.c
@@ -1,7 +1,7 @@
 /*-
  * BSD LICENSE
  *
- * Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ * Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -579,6 +579,7 @@ virtio_dev_rx(struct virtio_net *dev, struct rte_mbuf **pkts, uint32_t count)
 	uint16_t res_base_idx, res_end_idx;
 	uint16_t free_entries;
 	uint8_t success = 0;
+	void *userdata;
 
 	LOG_DEBUG(VHOST_DATA, "(%"PRIu64") virtio_dev_rx()\n", dev->device_fh);
 	vq = dev->virtqueue_rx;
@@ -656,13 +657,14 @@ virtio_dev_rx(struct virtio_net *dev, struct rte_mbuf **pkts, uint32_t count)
 		vq->used->ring[res_cur_idx & (vq->size - 1)].len = packet_len;
 
 		/* Copy mbuf data to buffer */
-		rte_memcpy((void *)(uintptr_t)buff_addr, (const void*)buff->data, rte_pktmbuf_data_len(buff));
+		userdata = rte_pktmbuf_mtod(buff, void *);
+		rte_memcpy((void *)(uintptr_t)buff_addr, userdata, rte_pktmbuf_data_len(buff));
 
 		res_cur_idx++;
 		packet_success++;
 
 		/* mergeable is disabled then a header is required per buffer. */
-		rte_memcpy((void *)(uintptr_t)buff_hdr_addr, (const void*)_hdr, vq->vhost_hlen);
+		rte_memcpy((void *)(uintptr_t)buff_hdr_addr, (const void *)_hdr, vq->vhost_hlen);
 		if (res_cur_idx < res_end_idx) {
 			/* Prefetch descriptor index. */
 			rte_prefetch0(>desc[head[packet_success]]);
--
1.9.1
[dpdk-dev] [PATCH 2/2] xenvirt: free queues in dev_close
Signed-off-by: Bernard Iremonger
---
 drivers/net/xenvirt/rte_eth_xenvirt.c | 24 ++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/net/xenvirt/rte_eth_xenvirt.c b/drivers/net/xenvirt/rte_eth_xenvirt.c
index 8923826..1bf35b7 100644
--- a/drivers/net/xenvirt/rte_eth_xenvirt.c
+++ b/drivers/net/xenvirt/rte_eth_xenvirt.c
@@ -75,6 +75,9 @@ static struct rte_eth_link pmd_link = {
 	.link_status = 0
 };
 
+static void
+eth_xenvirt_free_queues(struct rte_eth_dev *dev);
+
 static inline struct rte_mbuf *
 rte_rxmbuf_alloc(struct rte_mempool *mp)
 {
@@ -326,7 +329,7 @@ eth_dev_stop(struct rte_eth_dev *dev)
 static void
 eth_dev_close(struct rte_eth_dev *dev)
 {
-	RTE_SET_USED(dev);
+	eth_xenvirt_free_queues(dev);
 }
 
 static void
@@ -362,8 +365,9 @@ eth_stats_reset(struct rte_eth_dev *dev)
 }
 
 static void
-eth_queue_release(void *q __rte_unused)
+eth_queue_release(void *q)
 {
+	rte_free(q);
 }
 
 static int
@@ -524,7 +528,23 @@ eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
 	return 0;
 }
 
+static void
+eth_xenvirt_free_queues(struct rte_eth_dev *dev)
+{
+	int i;
 
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		eth_queue_release(dev->data->rx_queues[i]);
+		dev->data->rx_queues[i] = NULL;
+	}
+	dev->data->nb_rx_queues = 0;
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		eth_queue_release(dev->data->tx_queues[i]);
+		dev->data->tx_queues[i] = NULL;
+	}
+	dev->data->nb_tx_queues = 0;
+}
 
 static const struct eth_dev_ops ops = {
 	.dev_start = eth_dev_start,
--
1.9.1
[dpdk-dev] [PATCH 1/2] xenvirt: add support for PCI Port Hotplug
Signed-off-by: Bernard Iremonger --- drivers/net/xenvirt/rte_eth_xenvirt.c | 63 +++ drivers/net/xenvirt/rte_xen_lib.c | 26 --- drivers/net/xenvirt/rte_xen_lib.h | 5 ++- 3 files changed, 83 insertions(+), 11 deletions(-) diff --git a/drivers/net/xenvirt/rte_eth_xenvirt.c b/drivers/net/xenvirt/rte_eth_xenvirt.c index b3383af..8923826 100644 --- a/drivers/net/xenvirt/rte_eth_xenvirt.c +++ b/drivers/net/xenvirt/rte_eth_xenvirt.c @@ -1,7 +1,7 @@ /*- * BSD LICENSE * - * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + * Copyright(c) 2010-2015 Intel Corporation. All rights reserved. * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -642,10 +642,14 @@ eth_dev_xenvirt_create(const char *name, const char *params, if (internals == NULL) goto err; - /* reserve an ethdev entry */ - eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL); - if (eth_dev == NULL) - goto err; + /* find an ethdev entry */ + eth_dev = rte_eth_dev_allocated(name); + if (eth_dev == NULL) { + /* reserve an ethdev entry */ + eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL); + if (eth_dev == NULL) + goto err; + } data->dev_private = internals; data->port_id = eth_dev->data->port_id; @@ -661,7 +665,7 @@ eth_dev_xenvirt_create(const char *name, const char *params, eth_dev->data = data; eth_dev->dev_ops = - eth_dev->data->dev_flags = 0; + eth_dev->data->dev_flags = RTE_PCI_DRV_DETACHABLE; eth_dev->data->kdrv = RTE_KDRV_NONE; eth_dev->data->drv_name = NULL; eth_dev->driver = NULL; @@ -683,6 +687,38 @@ err: } +static int +eth_dev_xenvirt_free(const char *name, const unsigned numa_node) +{ + struct rte_eth_dev *eth_dev = NULL; + + RTE_LOG(DEBUG, PMD, + "Free virtio rings backed ethdev on numa socket %u\n", + numa_node); + + /* find an ethdev entry */ + eth_dev = rte_eth_dev_allocated(name); + if (eth_dev == NULL) + return -1; + + if (eth_dev->data->dev_started == 1) { + eth_dev_stop(eth_dev); + eth_dev_close(eth_dev); + } + + 
eth_dev->rx_pkt_burst = NULL; + eth_dev->tx_pkt_burst = NULL; + eth_dev->dev_ops = NULL; + + rte_free(eth_dev->data); + rte_free(eth_dev->data->dev_private); + rte_free(eth_dev->data->mac_addrs); + + virtio_idx--; + + return 0; +} + /*TODO: Support multiple process model */ static int rte_pmd_xenvirt_devinit(const char *name, const char *params) @@ -701,10 +737,25 @@ rte_pmd_xenvirt_devinit(const char *name, const char *params) return 0; } +static int +rte_pmd_xenvirt_devuninit(const char *name) +{ + eth_dev_xenvirt_free(name, rte_socket_id()); + + if (virtio_idx == 0) { + if (xenstore_uninit() != 0) + RTE_LOG(ERR, PMD, "%s: xenstore uninit failed\n", __func__); + + gntalloc_close(); + } + return 0; +} + static struct rte_driver pmd_xenvirt_drv = { .name = "eth_xenvirt", .type = PMD_VDEV, .init = rte_pmd_xenvirt_devinit, + .uninit = rte_pmd_xenvirt_devuninit, }; PMD_REGISTER_DRIVER(pmd_xenvirt_drv); diff --git a/drivers/net/xenvirt/rte_xen_lib.c b/drivers/net/xenvirt/rte_xen_lib.c index b3932f0..5900b53 100644 --- a/drivers/net/xenvirt/rte_xen_lib.c +++ b/drivers/net/xenvirt/rte_xen_lib.c @@ -1,7 +1,7 @@ /*- * BSD LICENSE * - * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + * Copyright(c) 2010-2015 Intel Corporation. All rights reserved. * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -50,6 +50,7 @@ #include #include +#include #include "rte_xen_lib.h" @@ -72,6 +73,8 @@ int gntalloc_fd = -1; static char *dompath = NULL; /* handle to xenstore read/write operations */ static struct xs_handle *xs = NULL; +/* flag to indicate if xenstore cleanup is required */ +static bool is_xenstore_cleaned_up; /* * Reserve a virtual address space. 
@@ -275,7 +278,6 @@ xenstore_init(void) { unsigned int len, domid; char *buf; - static int cleanup = 0; char *end; xs = xs_domain_open(); @@ -301,16 +303,32 @@ xenstore_init(void) xs_transaction_start(xs); /* When to stop transaction */ - if (cleanup == 0) { + if (is_xenstore_cleaned_up == 0) { if (xenstore_cleanup()) return -1; - cleanup = 1; + is_xenstore_cleaned_up = 1; } return 0; } int +xenstore_uninit(void) +{ + xs_close(xs); + + if (is_xenstore_cleaned_up == 0) { + if
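One detail in the `eth_dev_xenvirt_free()` hunk of this patch seems worth a second look: as posted, it calls `rte_free(eth_dev->data)` before reading `eth_dev->data->dev_private` and `eth_dev->data->mac_addrs`, which would dereference memory that has already been released. A safer ordering frees the members first and the containing structure last. The sketch below shows that ordering with plain `malloc()`/`free()`; `dev_data`, `dev`, and `dev_release()` are hypothetical stand-ins, not the DPDK types or the author's eventual fix.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the per-device data block. */
struct dev_data {
    void *dev_private;
    void *mac_addrs;
};

/* Hypothetical stand-in for the device handle that owns the data block. */
struct dev {
    struct dev_data *data;
};

static void
dev_release(struct dev *d)
{
    if (d == NULL || d->data == NULL)
        return;

    /* Free the members while the parent structure is still valid. */
    free(d->data->mac_addrs);
    free(d->data->dev_private);

    /* Free the parent last and clear the pointer so reuse is detectable. */
    free(d->data);
    d->data = NULL;
}
```

Clearing `d->data` also makes a second call harmless, which matters when both a stop path and a detach path may try to release the same device.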
[dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance
On 10/02/2015 07:00 AM, Michael S. Tsirkin wrote:
> On Thu, Oct 01, 2015 at 02:02:24PM -0700, Alexander Duyck wrote:
>> validation and translation would add 10s if not 100s of nanoseconds to the
>> time needed to process each packet. In addition we are talking about doing
>> this in kernel space which means we wouldn't really be able to take
>> advantage of things like SSE or AVX instructions.
> Yes. But the nice thing is that it's rearming so it can happen on
> a separate core, in parallel with packet processing.
> It does not need to add to latency.

Moving it to another core is automatically going to add extra latency.
You will have to evict the data out of the L1 cache for one core and
into the L1 cache for another when you update it, and then reading it
will force it to have to transition back out. If you are lucky it is
only evicted to L2, if not then to L3, or possibly even back to memory.
Odds are that alone will add tens of nanoseconds to the process, and you
would need three or more cores to do the same workload as running the
process over multiple threads means having to add synchronization
primitives to the whole mess. Then there is the NUMA factor on top of that.

> You will burn up more CPU, but again, all this for boxes/hypervisors
> without an IOMMU.

There are use cases this will completely make useless. If for example
you are running a workload that needs three cores with DPDK bumping it
to nine or more will likely push you out of being able to do the
workload on some systems.

> I'm sure people can come up with even better approaches, once enough
> people get it that kernel absolutely needs to be protected from
> userspace.

I don't see that happening. Many people don't care about kernel
security that much. If they did something like DPDK wouldn't have
gotten off of the ground. Once someone has the ability to load kernel
modules any protection of the kernel from userspace pretty much goes
right out the window. You are just as much at risk from a buggy driver
in userspace as you are from one that can be added to the kernel.

> Long term, the right thing to do is to focus on IOMMU support. This
> gives you hardware-based memory protection without need to burn up CPU
> cycles.

We have a solution that makes use of IOMMU support with vfio. The
problem is there are multiple cases where that support is either not
available, or using the IOMMU provides excess overhead.

- Alex
[dpdk-dev] How kernel can share the mem from dpdk hugepage?
All of this information is in shared memory, is it not? For example, you
could patch the ring library to give a programmable interface to the
following function:
http://dpdk.org/doc/api/rte__ring_8h.html#a7bfcef0ad324fcc4c03bcb59cd7e867f.
This would allow you to see the full set of rings in a process that has
attached as a secondary to DPDK.

Write a process that does this, and then interfaces with whatever you have
running in the kernel. Ultimately, the architecture is pulling from
userspace and pushing into the kernel, rather than pulling directly from
the kernel.

Does that help?

Thanks,
Kyle

On Thu, Oct 1, 2015 at 11:53 AM, ?? wrote:

> Hi all,
>
>
> I want to ask does anybody know how kernel can share the info from dpdk
> hugepage. My project has a requirement which kernel needs to get some info
> from dpdk application. Eg, in multi-process example, every client has a
> shared ring buffer with server. The shared ring contains some meta data of
> packets. Is it possible that dpdk share this info to kernel, then kernel can
> access it? What are the key points that can help to achieve the goal?
[dpdk-dev] [PATCH] ethdev: distinguish between drop and error stats
Can you improve the comments on these counters? If you didn't happen to
follow this thread, there's no way to reasonably figure out what the
difference is from looking at the code without chasing it all the way down
and cross-referencing the NIC datasheet.

Thanks,
Jay

On Fri, Oct 2, 2015 at 7:47 AM, Maryam Tahhan wrote:

> Make a distinction between dropped packets and error statistics to allow
> a higher level fault management entity to interact with DPDK and take
> appropriate measures when errors are detected. It will also provide
> valuable information for any applications that collects/extracts DPDK
> stats, such applications include Open vSwitch.
> After this patch the distinction is:
> ierrors = Total number of packets dropped by hardware (malformed
> packets, ...) Where the # of drops can ONLY be <= the packets received
> (without overlap between registers).
> Rx_pkt_errors = Total number of erroneous received packets. Where the #
> of errors can be >= the packets received (without overlap between
> registers), this is because there may be multiple errors associated with
> a packet.
>
> Signed-off-by: Maryam Tahhan
> ---
> lib/librte_ether/rte_ethdev.h | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 8a8c82b..53dd55d 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -200,8 +200,9 @@ struct rte_eth_stats {
> /**< Deprecated; Total of RX packets with CRC error. */
> uint64_t ibadlen;
> /**< Deprecated; Total of RX packets with bad length. */
> - uint64_t ierrors; /**< Total number of erroneous received
> packets. */
> + uint64_t ierrors; /**< Total number of dropped received packets.
> */
> uint64_t oerrors; /**< Total number of failed transmitted
> packets. */
> + uint64_t ipkterrors; /**< Total number of erroneous received
> packets. */
> uint64_t imcasts;
> /**< Deprecated; Total number of multicast received packets. */
> uint64_t rx_nombuf; /**< Total number of RX mbuf allocation
> failures. */
> --
> 2.4.3
>
>
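To make the proposed split concrete: as the commit message describes it, `ierrors` counts packets the hardware dropped, so it can never exceed the number of packets that arrived, whereas the new error counter counts individual error events and may exceed it because one packet can carry several errors. The sketch below models that with a hypothetical `rx_stats` struct and `account_rx()` helper. Neither is part of the ethdev API; the field names are borrowed from the patch only for readability.

```c
#include <stdint.h>

/* Hypothetical model of the counters discussed in this thread. */
struct rx_stats {
    uint64_t received;    /* packets that reached the port */
    uint64_t ierrors;     /* packets dropped: at most one count per packet */
    uint64_t ipkterrors;  /* error events: possibly several per packet */
};

/*
 * Account for one arriving packet.
 * `n_errors` is the number of distinct errors detected on it;
 * `dropped` is non-zero if the hardware discarded it.
 */
static void
account_rx(struct rx_stats *s, unsigned int n_errors, int dropped)
{
    s->received++;
    s->ipkterrors += n_errors;
    if (dropped)
        s->ierrors++;
}
```

With this model, a single malformed packet flagged for three separate errors and then dropped adds 1 to `ierrors` and 3 to `ipkterrors`, which is the kind of wording the counter comments could spell out.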
[dpdk-dev] [PATCH 00/52] update i40e base driver
> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas Monjalon
> Sent: Friday, October 2, 2015 12:39 AM
> To: Wu, Jingjing
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 00/52] update i40e base driver
>
> 2015-09-09 01:53, Zhang, Helin:
> > Acked-by: Helin Zhang
>
> Applied, thanks.
> Some titles were fixed and some patches are squashed or reordered.
>
> Maybe it deserves an entry in the release notes.

Hi Jingjing,

See the "Updated the i40e base driver" entry in the 2.1 Release note:

http://dpdk.org/doc/guides/rel_notes/release_2_1.html

It doesn't need to include every single change in the base driver. Just a
summary of the main ones.

John
[dpdk-dev] [PATCH 00/52] update i40e base driver
2015-09-09 01:53, Zhang, Helin:
> Acked-by: Helin Zhang

Applied, thanks.
Some titles were fixed and some patches are squashed or reordered.

Maybe it deserves an entry in the release notes.
[dpdk-dev] How kernel can share the mem from dpdk hugepage?
Hi all,

I want to ask does anybody know how kernel can share the info from dpdk
hugepage. My project has a requirement which kernel needs to get some info
from dpdk application. Eg, in multi-process example, every client has a
shared ring buffer with server. The shared ring contains some meta data of
packets. Is it possible that dpdk share this info to kernel, then kernel can
access it? What are the key points that can help to achieve the goal?