date:20140926

[dpdk-dev] [PATCH 0/4] Add DSO symbol versioning to support backwards compatibility

2014-09-26 Thread Neil Horman

On Fri, Sep 26, 2014 at 03:02:55PM -0700, Stephen Hemminger wrote:
> On Fri, 26 Sep 2014 10:45:49 -0400
> Neil Horman wrote:
> 
> > On Fri, Sep 26, 2014 at 12:41:33PM +0200, Thomas Monjalon wrote:
> > > Hi Neil,
> > > 
> > > 2014-09-24 14:19, Neil Horman:
> > > > Ping Thomas. I know you're busy, but I would like this to not fall off
> > > > anyone's radar.  You alluded to concerns regarding what I'll call, for
> > > > lack of a better term, ABI/API lock-in.  I had asked you to
> > > > enumerate/elaborate on specifics, but never heard back.  Are there
> > > > further specifics you wish to discuss, or are you satisfied with the
> > > > above answers?
> > > 
> > > Sorry for not being very reactive on this thread.
> > > All this discussion is very interesting but it's really not the proper
> > > time to apply it. As you said, it requires an extra effort. I'm not saying
> > > it will never be integrated. I'm just saying that we cannot change
> > > everything at the same time.
> > > 
> > > Let me sum up the situation. This community project has been very active
> > > for a few months now. First, we learnt how to make some releases together
> > > and we are improving the process to be able to deliver a new major release
> > > every 4 months while maintaining a good quality process.
> > > But these releases are still not complete because documentation is not
> > > integrated yet. Then developers should have a role in documentation
> > > updates.
> > > We also need to integrate and learn how to use more tools to be more
> > > efficient and improve quality.
> > > 
> > > So the question is "when should we care about API compatibility"?
> > > And the answer is: ASAP, but not now. I feel next year is a better target.
> > > Because the most important priority is to move together at a pace which
> > > allows most of us to stay in the race.
> > > 
> > 
> > 
> > I'm sorry Thomas, I don't accept this.  I asked you for details as to your
> > concerns regarding this patch series, and you've provided more vague
> > comments.  I need details to address them.
> > 
> > You say it requires extra effort; you're right, it does.  Any feature that
> > you integrate requires some additional effort.  How is this patch any
> > different from adding the acl library or any other new API?  Everything
> > requires maintenance, that's how software works.  What specifically about
> > this patch series makes the effort insurmountable to you?
> > 
> > You say you're improving your process.  Great, this patch aids in that
> > process by ensuring backwards compatibility for a period of time.  Given
> > that the API and ABI can still evolve within this framework, as I've
> > described, how is this patch series not a significant step forward toward
> > your goal of a quality process?
> > 
> > You say documentation isn't integrated.  So, what does getting documentation
> > integrated have to do with this patch set, or any other?  I don't see you
> > holding any other patches based on documentation.  Again, nothing in this
> > series prevents evolution of the API or ABI.  If your hope is to wait until
> > everything is perfect, then apply some control to the public facing API,
> > and get it all documented, none of those things will ever happen, I promise
> > you.
> > 
> > You say you also need to learn to use more tools to be more efficient and
> > improve quality.  Great!  That's exactly what this is. If we mandate even a
> > short term commitment to ABI stability (1 single release worth of time), we
> > will quickly identify which APIs change quickly and where we need to be
> > cautious with our API design.  If you just assume that developers will get
> > better of their own volition, it will never happen.
> > 
> > You say this should go in next year, but not now.  When exactly?  What
> > event do you foresee occurring in the next 12-18 months that will change
> > everything such that we can start supporting an ABI for more than just a
> > few weeks at the head of the tree?
> > 
> > To this end, I just did a quick search through the git history for dpdk to
> > look at the histories of all the header files that are exposed via the
> > makefile SYMLINK command (given that that provides a list of header files
> > that applications can include, and embodies all the function symbols and
> > data types applications have access to).
> > 
> > There are 179 total commits in that list.
> > Of those, a bit of spot checking suggests that about 10-15% of them actually
> > change ABI, and many of those came from Bruce's rework of the mbuf
> > structure.  That's about 17-20 instances over the last 2 years where an ABI
> > update would have been needed.  That seems pretty reasonable to me.  Where
> > exactly is your concern here?
> > 
> > Neil
> 
> Isn't ABI stability a distro responsibility, not a project responsibility?
> I have lots more API/ABI changes, just been too busy trying to

[dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:

2014-09-26 Thread Ananyev, Konstantin



> > As I remember, the purpose of the patch was to fix the race condition inside
> > the rte_alarm library.
> > I believe that the patch provided by Michal & Pawel fixes the issues you
> > discovered.
> > If you think that is not the case, could you please provide a list of
> > remaining issues?
> > Excluding the ones that amount to you just not liking it, or not being
> > happy with the rte_alarm API as a whole?


> Gladly.  As Pawel explained the race, it's possible that, after calling
> rte_eal_alarm_cancel, an in-flight execution of an alarm callback may still be
> running.  The problem with that ostensibly is that data which is being
> accessed by the callback might be then accessed in parallel with another
> process, leading to data corruption or some other problem. The issue I have
> with his patch is that it doesn't completely close the race.  While it does
> close the race for the condition in which thread B is running the alarm
> callback while thread A is executing the cancel operation, it does not close
> the case for when a single thread B is running the cancel operation, as the
> in-flight execution itself is still active.

A bit puzzled here:
Are you saying that calling alarm_cancel() for itself inside 
eal_alarm_callback() might cause a problem?
I still don't see how.

>  If such a cancellation occurs via an intermediary function (i.e.
> one which is not aware that it is explicitly running an alarm callback, and
> which signals another thread to execute via some other method such as IPC
> communication), the same data corruption may occur, because the
> canceled-and-complete guarantee has been violated.
>
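
For readers following the race being discussed, here is a minimal sketch of a
cancel routine that distinguishes the two cases: waiting for a callback that
another thread is running, versus being called from inside the callback
itself, where waiting would deadlock. It is only an illustration under stated
assumptions. The sketch_alarm structure, the sketch_* function names and the
three return codes are hypothetical and are not the rte_alarm API or the patch
under review; the model also holds a single alarm rather than a list.

```c
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-alarm model; not the rte_alarm data structure. */
struct sketch_alarm {
	pthread_mutex_t lock;
	bool armed;             /* scheduled, callback has not started yet */
	bool executing;         /* callback is currently running */
	pthread_t executing_id; /* thread running the callback, valid if executing */
	void (*cb)(void *);
	void *arg;
};

enum sketch_cancel {
	SKETCH_CANCEL_IDLE,       /* nothing was armed or running */
	SKETCH_CANCEL_REMOVED,    /* armed alarm removed before it started */
	SKETCH_CANCEL_IN_CALLBACK /* caller is the callback: it is still running */
};

static void
sketch_alarm_init(struct sketch_alarm *a, void (*cb)(void *), void *arg)
{
	pthread_mutex_init(&a->lock, NULL);
	a->armed = true;
	a->executing = false;
	a->executing_id = pthread_self();
	a->cb = cb;
	a->arg = arg;
}

/* Run the callback once if the alarm is still armed. */
static void
sketch_alarm_fire(struct sketch_alarm *a)
{
	pthread_mutex_lock(&a->lock);
	if (!a->armed) {
		pthread_mutex_unlock(&a->lock);
		return;
	}
	a->armed = false;
	a->executing = true;
	a->executing_id = pthread_self();
	pthread_mutex_unlock(&a->lock);

	a->cb(a->arg); /* lock is dropped so the callback may call cancel */

	pthread_mutex_lock(&a->lock);
	a->executing = false;
	pthread_mutex_unlock(&a->lock);
}

static enum sketch_cancel
sketch_alarm_cancel(struct sketch_alarm *a)
{
	for (;;) {
		pthread_mutex_lock(&a->lock);
		if (!a->executing) {
			bool was_armed = a->armed;

			a->armed = false;
			pthread_mutex_unlock(&a->lock);
			return was_armed ? SKETCH_CANCEL_REMOVED : SKETCH_CANCEL_IDLE;
		}
		if (pthread_equal(a->executing_id, pthread_self())) {
			/* Waiting for ourselves would never finish: report it. */
			pthread_mutex_unlock(&a->lock);
			return SKETCH_CANCEL_IN_CALLBACK;
		}
		pthread_mutex_unlock(&a->lock);
		sched_yield(); /* another thread runs the callback: wait for it */
	}
}

/* Illustrative callback that cancels its own alarm and records the result. */
struct sketch_self_cancel {
	struct sketch_alarm *alarm;
	enum sketch_cancel result;
};

static void
sketch_self_cancel_cb(void *arg)
{
	struct sketch_self_cancel *ctx = arg;

	ctx->result = sketch_alarm_cancel(ctx->alarm);
}
```

In this model a self-cancel cannot promise "canceled and complete", so it
returns a distinct code and leaves it to the caller not to hand the callback's
data to another thread until the callback has returned.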

[dpdk-dev] [PATCH v5 11/11] examples/vhost: add vhost example Makefile

2014-09-26 Thread Huawei Xie

Signed-off-by: Huawei Xie 
---
 examples/vhost/Makefile | 52 +
 1 file changed, 52 insertions(+)
 create mode 100644 examples/vhost/Makefile

diff --git a/examples/vhost/Makefile b/examples/vhost/Makefile
new file mode 100644
index 000..a4d4fb0
--- /dev/null
+++ b/examples/vhost/Makefile
@@ -0,0 +1,52 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ifeq ($(RTE_SDK),)
+$(error "Please define RTE_SDK environment variable")
+endif
+
+# Default target, can be overriden by command line or environment
+RTE_TARGET ?= x86_64-native-linuxapp-gcc
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# binary name
+APP = vhost-switch
+
+# all source are stored in SRCS-y
+#SRCS-y := cusedrv.c loopback-userspace.c
+SRCS-y := main.c
+
+CFLAGS += -O2 -I/usr/local/include -D_FILE_OFFSET_BITS=64 -Wno-unused-parameter
+CFLAGS += $(WERROR_FLAGS)
+LDFLAGS += -lfuse
+
+include $(RTE_SDK)/mk/rte.extapp.mk
-- 
1.8.1.4

[dpdk-dev] [PATCH v5 10/11] examples/vhost: merge oliver's mbuf changes to vhost example

2014-09-26 Thread Huawei Xie

The mbuf changes include:
1. flattened structure vlan_macip
2. removed rte_pktmbuf structure.
3. mbuf data pointer replaced by an offset

Other changes include:
1. fix sg mbuf xmit in virtio_tx_route
2. rename RTE_MBUF_SCATTER_GATHER to RTE_MBUF_REFCNT
3. add one TODO and FIXME 
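
As a rough illustration of mbuf change 3 (data pointer replaced by an offset),
the sketch below uses a simplified stand-in structure, not the real rte_mbuf
layout. The names off_mbuf, off_mbuf_mtod, off_mbuf_reset and
OFF_MBUF_HEADROOM are hypothetical; they only mirror the idea behind
rte_pktmbuf_mtod and the data_off handling visible in the diff below.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for RTE_PKTMBUF_HEADROOM. */
#define OFF_MBUF_HEADROOM 128

/* Simplified buffer descriptor: the data start is stored as an offset. */
struct off_mbuf {
	void *buf_addr;    /* start of the underlying buffer */
	uint16_t buf_len;  /* size of the underlying buffer */
	uint16_t data_off; /* offset of packet data from buf_addr */
	uint16_t data_len; /* bytes of packet data in this buffer */
};

/* Compute a typed pointer to the packet data from base address + offset. */
#define off_mbuf_mtod(m, type) \
	((type)((char *)(m)->buf_addr + (m)->data_off))

/* Reset to an empty buffer, clamping the headroom to the buffer size. */
static void
off_mbuf_reset(struct off_mbuf *m)
{
	m->data_off = (OFF_MBUF_HEADROOM <= m->buf_len) ?
			OFF_MBUF_HEADROOM : m->buf_len;
	m->data_len = 0;
}
```

Storing an offset instead of a pointer means that code which previously wrote
the data pointer now writes data_off, and every reader derives the address on
demand.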

Signed-off-by: Huawei Xie 
---
 examples/vhost/main.c | 57 ++-
 1 file changed, 29 insertions(+), 28 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 3834af4..6569188 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -716,10 +716,10 @@ us_vhost_parse_args(int argc, char **argv)
zero_copy = ret;

if (zero_copy) {
-#ifdef RTE_MBUF_SCATTER_GATHER
+#ifdef RTE_MBUF_REFCNT
RTE_LOG(ERR, VHOST_CONFIG, "Before running "
"zero copy vhost APP, please "
-   "disable RTE_MBUF_SCATTER_GATHER\n"
+   "disable RTE_MBUF_REFCNT\n"
"in config file and then rebuild DPDK "
"core lib!\n"
"Otherwise please disable zero copy "
@@ -906,7 +906,7 @@ link_vmdq(struct vhost_dev *vdev, struct rte_mbuf *m)
int i, ret;

/* Learn MAC address of guest device from packet */
-   pkt_hdr = (struct ether_hdr *)m->pkt.data;
+   pkt_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);

dev_ll = ll_root_used;

@@ -995,7 +995,7 @@ virtio_tx_local(struct vhost_dev *vdev, struct rte_mbuf *m)
struct virtio_net *dev = vdev->dev;
struct virtio_net *tdev; /* destination virito device */

-   pkt_hdr = (struct ether_hdr *)m->pkt.data;
+   pkt_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);

/*get the used devices list*/
dev_ll = ll_root_used;
@@ -1052,7 +1052,7 @@ virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf *m, struct rte_mempool *
unsigned len, ret, offset = 0;
const uint16_t lcore_id = rte_lcore_id();
struct virtio_net_data_ll *dev_ll = ll_root_used;
-   struct ether_hdr *pkt_hdr = (struct ether_hdr *)m->pkt.data;
+   struct ether_hdr *pkt_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
struct virtio_net *dev = vdev->dev;

/*heck if destination is local VM*/
@@ -1104,8 +1104,8 @@ virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf *m, struct rte_mempool *

m->ol_flags = PKT_TX_VLAN_PKT;
/*FIXME: offset*/
-   m->pkt.data_len += offset;
-   m->pkt.vlan_macip.f.vlan_tci = vlan_tag;
+   m->data_len += offset;
+   m->vlan_tci = vlan_tag;

tx_q->m_table[len] = m;
len++;
@@ -1449,9 +1449,9 @@ attach_rxmbuf_zcp(struct virtio_net *dev)
}

mbuf->buf_addr = (void *)(uintptr_t)(buff_addr - RTE_PKTMBUF_HEADROOM);
-   mbuf->pkt.data = (void *)(uintptr_t)(buff_addr);
+   mbuf->data_off = RTE_PKTMBUF_HEADROOM;
mbuf->buf_physaddr = phys_addr - RTE_PKTMBUF_HEADROOM;
-   mbuf->pkt.data_len = desc->len;
+   mbuf->data_len = desc->len;
MBUF_HEADROOM_UINT32(mbuf) = (uint32_t)desc_idx;

LOG_DEBUG(VHOST_DATA,
@@ -1486,9 +1486,9 @@ static inline void pktmbuf_detach_zcp(struct rte_mbuf *m)

buf_ofs = (RTE_PKTMBUF_HEADROOM <= m->buf_len) ?
RTE_PKTMBUF_HEADROOM : m->buf_len;
-   m->pkt.data = (char *) m->buf_addr + buf_ofs;
+   m->data_off = buf_ofs;

-   m->pkt.data_len = 0;
+   m->data_len = 0;
 }

 /*
@@ -1720,7 +1720,7 @@ virtio_tx_route_zcp(struct virtio_net *dev, struct rte_mbuf *m,
unsigned len, ret, offset = 0;
struct vpool *vpool;
struct virtio_net_data_ll *dev_ll = ll_root_used;
-   struct ether_hdr *pkt_hdr = (struct ether_hdr *)m->pkt.data;
+   struct ether_hdr *pkt_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
uint16_t vlan_tag = (uint16_t)vlan_tags[(uint16_t)dev->device_fh];
uint16_t vmdq_rx_q = ((struct vhost_dev *)dev->priv)->vmdq_rx_q;

@@ -1792,24 +1792,25 @@ virtio_tx_route_zcp(struct virtio_net *dev, struct rte_mbuf *m,
}
}

-   mbuf->pkt.nb_segs = m->pkt.nb_segs;
-   mbuf->pkt.next = m->pkt.next;
-   mbuf->pkt.data_len = m->pkt.data_len + offset;
-   mbuf->pkt.pkt_len = mbuf->pkt.data_len;
+   mbuf->nb_segs = m->nb_segs;
+   mbuf->next = m->next;
+   mbuf->data_len = m->data_len + offset;
+   mbuf->pkt_len = mbuf->data_len;
if (unlikely(need_copy)) {
/* Copy the packet contents to the mbuf. */
-   rte_memcpy((void *)((uint8_t *)mbuf->pkt.data),
-   (const void *) ((uint8_t *)m->pkt.data),
-   m->pkt.data_len);
+   rte_memcpy(rte_pktmbuf_mtod(mbuf, void *),
+

[dpdk-dev] [PATCH v5 09/11] examples/vhost: vhost example based on vhost lib API

2014-09-26 Thread Huawei Xie

This vhost example demonstrates how to integrate user space vhost
with a DPDK-accelerated Ethernet vSwitch.

Signed-off-by: Huawei Xie 
---
 examples/vhost/main.c | 1455 +
 examples/vhost/main.h |   47 +-
 2 files changed, 431 insertions(+), 1071 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 7d9e6a2..3834af4 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -49,10 +49,9 @@
 #include 
 #include 
 #include 
+#include 

 #include "main.h"
-#include "virtio-net.h"
-#include "vhost-net-cdev.h"

 #define MAX_QUEUES 128

@@ -100,7 +99,6 @@
 #define TX_WTHRESH 0  /* Default values of TX write-back threshold reg. */

 #define MAX_PKT_BURST 32   /* Max burst size for RX/TX */
-#define MAX_MRG_PKT_BURST 16   /* Max burst for merge buffers. Set to 1 due to performance issue. */
 #define BURST_TX_DRAIN_US 100  /* TX drain every ~100us */

 #define BURST_RX_WAIT_US 15 /* Defines how long we wait between retries on RX */
@@ -168,13 +166,14 @@ static uint32_t num_switching_cores = 0;

 /* number of devices/queues to support*/
 static uint32_t num_queues = 0;
-uint32_t num_devices = 0;
+static uint32_t num_devices;

 /*
  * Enable zero copy, pkts buffer will directly dma to hw descriptor,
  * disabled on default.
  */
 static uint32_t zero_copy;
+static int mergeable;

 /* number of descriptors to apply*/
 static uint32_t num_rx_descriptor = RTE_TEST_RX_DESC_DEFAULT_ZCP;
@@ -218,12 +217,6 @@ static uint32_t burst_rx_retry_num = BURST_RX_RETRIES;
 /* Character device basename. Can be set by user. */
 static char dev_basename[MAX_BASENAME_SZ] = "vhost-net";

-/* Charater device index. Can be set by user. */
-static uint32_t dev_index = 0;
-
-/* This can be set by the user so it is made available here. */
-extern uint64_t VHOST_FEATURES;
-
 /* Default configuration for rx and tx thresholds etc. */
 static struct rte_eth_rxconf rx_conf_default = {
.rx_thresh = {
@@ -678,11 +671,12 @@ us_vhost_parse_args(int argc, char **argv)
us_vhost_usage(prgname);
return -1;
} else {
+   mergeable = !!ret;
if (ret) {

vmdq_conf_default.rxmode.jumbo_frame = 1;

vmdq_conf_default.rxmode.max_rx_pkt_len
= JUMBO_FRAME_MAX_SIZE;
-   VHOST_FEATURES = (1ULL << VIRTIO_NET_F_MRG_RXBUF);
+
}
}
}
@@ -708,17 +702,6 @@ us_vhost_parse_args(int argc, char **argv)
}
}

-   /* Set character device index. */
-   if (!strncmp(long_option[option_index].name, "dev-index", MAX_LONG_OPT_SZ)) {
-   ret = parse_num_opt(optarg, INT32_MAX);
-   if (ret == -1) {
-   RTE_LOG(INFO, VHOST_CONFIG, "Invalid argument for character device index [0..N]\n");
-   us_vhost_usage(prgname);
-   return -1;
-   } else
-   dev_index = ret;
-   }
-
/* Enable/disable rx/tx zero copy. */
if (!strncmp(long_option[option_index].name,
"zero-copy", MAX_LONG_OPT_SZ)) {
@@ -867,36 +850,11 @@ static unsigned check_ports_num(unsigned nb_ports)
 #endif

 /*
- * Function to convert guest physical addresses to vhost virtual addresses. This
- * is used to convert virtio buffer addresses.
- */
-static inline uint64_t __attribute__((always_inline))
-gpa_to_vva(struct virtio_net *dev, uint64_t guest_pa)
-{
-   struct virtio_memory_regions *region;
-   uint32_t regionidx;
-   uint64_t vhost_va = 0;
-
-   for (regionidx = 0; regionidx < dev->mem->nregions; regionidx++) {
-   region = &dev->mem->regions[regionidx];
-   if ((guest_pa >= region->guest_phys_address) &&
-   (guest_pa <= region->guest_phys_address_end)) {
-   vhost_va = region->address_offset + guest_pa;
-   break;
-   }
-   }
-   LOG_DEBUG(VHOST_DATA, "(%"PRIu64") GPA %p| VVA %p\n",
-   dev->device_fh, (void*)(uintptr_t)guest_pa, (void*)(uintptr_t)vhost_va);
-
-   return vhost_va;
-}
-
-/*
  * Function to convert guest physical addresses to vhost physical addresses.
  * This is used to convert virtio buffer addresses.
  */
 static inline uint64_t __attribute__((always_inline))
-gpa_to_hpa(struct

[dpdk-dev] [PATCH v5 08/11] examples/vhost: copy old vhost example src file

2014-09-26 Thread Huawei Xie

Copy the old vhost example source files without any modification.
The subsequent patch modifies them to use the new vhost lib API.

Signed-off-by: Huawei Xie 
---
 examples/vhost/main.c | 3722 +
 examples/vhost/main.h |   86 ++
 2 files changed, 3808 insertions(+)
 create mode 100644 examples/vhost/main.c
 create mode 100644 examples/vhost/main.h

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
new file mode 100644
index 000..7d9e6a2
--- /dev/null
+++ b/examples/vhost/main.c
@@ -0,0 +1,3722 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "main.h"
+#include "virtio-net.h"
+#include "vhost-net-cdev.h"
+
+#define MAX_QUEUES 128
+
+/* the maximum number of external ports supported */
+#define MAX_SUP_PORTS 1
+
+/*
+ * Calculate the number of buffers needed per port
+ */
+#define NUM_MBUFS_PER_PORT ((MAX_QUEUES*RTE_TEST_RX_DESC_DEFAULT) +    \
+   (num_switching_cores*MAX_PKT_BURST) +   \
+   (num_switching_cores*RTE_TEST_TX_DESC_DEFAULT) +\
+   (num_switching_cores*MBUF_CACHE_SIZE))
+
+#define MBUF_CACHE_SIZE 128
+#define MBUF_SIZE (2048 + sizeof(struct rte_mbuf) + RTE_PKTMBUF_HEADROOM)
+
+/*
+ * No frame data buffer allocated from host are required for zero copy
+ * implementation, guest will allocate the frame data buffer, and vhost
+ * directly use it.
+ */
+#define VIRTIO_DESCRIPTOR_LEN_ZCP 1518
+#define MBUF_SIZE_ZCP (VIRTIO_DESCRIPTOR_LEN_ZCP + sizeof(struct rte_mbuf) \
+   + RTE_PKTMBUF_HEADROOM)
+#define MBUF_CACHE_SIZE_ZCP 0
+
+/*
+ * RX and TX Prefetch, Host, and Write-back threshold values should be
+ * carefully set for optimal performance. Consult the network
+ * controller's datasheet and supporting DPDK documentation for guidance
+ * on how these parameters should be set.
+ */
+#define RX_PTHRESH 8 /* Default values of RX prefetch threshold reg. */
+#define RX_HTHRESH 8 /* Default values of RX host threshold reg. */
+#define RX_WTHRESH 4 /* Default values of RX write-back threshold reg. */
+
+/*
+ * These default values are optimized for use with the Intel(R) 82599 10 GbE
+ * Controller and the DPDK ixgbe PMD. Consider using other values for other
+ * network controllers and/or network drivers.
+ */
+#define TX_PTHRESH 36 /* Default values of TX prefetch threshold reg. */
+#define TX_HTHRESH 0  /* Default values of TX host threshold reg. */
+#define TX_WTHRESH 0  /* Default values of TX write-back threshold reg. */
+
+#define MAX_PKT_BURST 32   /* Max burst size for RX/TX */
+#define MAX_MRG_PKT_BURST 16   /* Max burst for merge buffers. Set to 1 due to performance issue. */
+#define BURST_TX_DRAIN_US 100  /* TX drain every ~100us */
+
+#define BURST_RX_WAIT_US 15 /* Defines how long we wait between retries on RX */
+#define BURST_RX_RETRIES 4 /* Number of retries on RX. */
+
+#define JUMBO_FRAME_MAX_SIZE 0x2600
+
+/* State of virtio device. */
+#define DEVICE_MAC_LEARNING 0

[dpdk-dev] [PATCH v5 07/11] lib/librte_vhost: add vhost support in DPDK makefile

2014-09-26 Thread Huawei Xie

The vhost lib is turned off by default as it requires the fuse-devel package.
fuse-devel isn't installed in every Linux distribution.
fuse-devel enables user space filesystem/char driver development.
The vhost lib contains a user space char driver, which relies on this package.

Signed-off-by: Huawei Xie 
---
 config/common_linuxapp | 8 
 lib/Makefile   | 1 +
 mk/rte.app.mk  | 5 +
 3 files changed, 14 insertions(+)

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 5bee910..6ac6f35 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -390,6 +390,14 @@ CONFIG_RTE_KNI_VHOST_DEBUG_RX=n
 CONFIG_RTE_KNI_VHOST_DEBUG_TX=n

 #
+# Compile vhost library
+# fuse-devel is needed to run vhost.
+# fuse-devel enables user space char driver development
+#
+CONFIG_RTE_LIBRTE_VHOST=n
+CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
+
+#
 #Compile Xen domain0 support
 #
 CONFIG_RTE_LIBRTE_XEN_DOM0=n
diff --git a/lib/Makefile b/lib/Makefile
index 10c5bb3..007c174 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -60,6 +60,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_METER) += librte_meter
 DIRS-$(CONFIG_RTE_LIBRTE_SCHED) += librte_sched
 DIRS-$(CONFIG_RTE_LIBRTE_KVARGS) += librte_kvargs
 DIRS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += librte_distributor
+DIRS-$(CONFIG_RTE_LIBRTE_VHOST) += librte_vhost
 DIRS-$(CONFIG_RTE_LIBRTE_PORT) += librte_port
 DIRS-$(CONFIG_RTE_LIBRTE_TABLE) += librte_table
 DIRS-$(CONFIG_RTE_LIBRTE_PIPELINE) += librte_pipeline
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 34dff2a..285b65c 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -190,6 +190,11 @@ ifeq ($(CONFIG_RTE_LIBRTE_VIRTIO_PMD),y)
 LDLIBS += -lrte_pmd_virtio_uio
 endif

+ifeq ($(CONFIG_RTE_LIBRTE_VHOST), y)
+LDLIBS += -lrte_vhost
+LDLIBS += -lfuse
+endif
+
 ifeq ($(CONFIG_RTE_LIBRTE_I40E_PMD),y)
 LDLIBS += -lrte_pmd_i40e
 endif
-- 
1.8.1.4

[dpdk-dev] [PATCH v5 06/11] lib/librte_vhost: fixes serious coding style issues

2014-09-26 Thread Huawei Xie

Fix the issues reported by checkpatch.

Signed-off-by: Huawei Xie 
---
 lib/librte_vhost/rte_virtio_net.h |  55 +
 lib/librte_vhost/vhost-net-cdev.c | 236 +++---
 lib/librte_vhost/vhost-net-cdev.h |  33 +++---
 lib/librte_vhost/vhost_rxtx.c |  57 +
 lib/librte_vhost/virtio-net.c | 149 ++--
 5 files changed, 281 insertions(+), 249 deletions(-)

diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index 99ddfc1..06cbdf7 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -66,19 +66,18 @@ struct buf_vector {
 /**
  * Structure contains variables relevant to RX/TX virtqueues.
  */
-struct vhost_virtqueue
-{
-   struct vring_desc   *desc;  /* Virtqueue descriptor ring. */
-   struct vring_avail  *avail; /* Virtqueue available ring. */
-   struct vring_used   *used;  /* Virtqueue used ring. */
-   uint32_t    size;   /* Size of descriptor ring. */
-   uint32_t    backend;    /* Backend value to determine if device should started/stopped. */
-   uint16_t    vhost_hlen; /* Vhost header length (varies depending on RX merge buffers. */
+struct vhost_virtqueue {
+   struct vring_desc   *desc;  /* Virtqueue descriptor ring. */
+   struct vring_avail  *avail; /* Virtqueue available ring. */
+   struct vring_used   *used;  /* Virtqueue used ring. */
+   uint32_t    size;   /* Size of descriptor ring. */
+   uint32_t    backend;    /* Backend value to determine if device should started/stopped. */
+   uint16_t    vhost_hlen; /* Vhost header length (varies depending on RX merge buffers. */
volatile uint16_t   last_used_idx;  /* Last index used on the available ring */
volatile uint16_t   last_used_idx_res;  /* Used for multiple devices reserving buffers. */
-   eventfd_t   callfd; /* Currently unused as polling mode is enabled. */
-   eventfd_t   kickfd; /* Used to notify the guest (trigger interrupt). */
-   struct buf_vector   buf_vec[BUF_VECTOR_MAX]; /**< for scatter RX. */
+   eventfd_t   callfd; /* Currently unused as polling mode is enabled. */
+   eventfd_t   kickfd; /* Used to notify the guest (trigger interrupt). */
+   struct buf_vector   buf_vec[BUF_VECTOR_MAX]; /**< for scatter RX. */
 } __rte_cache_aligned;


@@ -86,11 +85,11 @@ struct vhost_virtqueue
  * Information relating to memory regions including offsets to addresses in QEMUs memory file.
  */
 struct virtio_memory_regions {
-   uint64_t    guest_phys_address; /* Base guest physical address of region. */
+   uint64_t    guest_phys_address; /* Base guest physical address of region. */
uint64_t    guest_phys_address_end; /* End guest physical address of region. */
-   uint64_t    memory_size;    /* Size of region. */
-   uint64_t    userspace_address;  /* Base userspace address of region. */
-   uint64_t    address_offset; /* Offset of region for address translation. */
+   uint64_t    memory_size;    /* Size of region. */
+   uint64_t    userspace_address;  /* Base userspace address of region. */
+   uint64_t    address_offset; /* Offset of region for address translation. */
 };


@@ -98,31 +97,31 @@ struct virtio_memory_regions {
  * Memory structure includes region and mapping information.
  */
 struct virtio_memory {
-   uint64_t    base_address;    /**< Base QEMU userspace address of the memory file. */
-   uint64_t    mapped_address;  /**< Mapped address of memory file base in our applications memory space. */
-   uint64_t    mapped_size; /**< Total size of memory file. */
-   uint32_t    nregions;    /**< Number of memory regions. */
-   struct virtio_memory_regions  regions[0]; /**< Memory region information. */
+   uint64_t    base_address;   /**< Base QEMU userspace address of the memory file. */
+   uint64_t    mapped_address; /**< Mapped address of memory file base in our applications memory space. */
+   uint64_t    mapped_size;    /**< Total size of memory file. */
+   uint32_t    nregions;   /**< Number of memory regions. */
+   struct  virtio_memory_regions  regions[0];  /**< Memory region information. */
 };

 /**
  * Device structure contains

[dpdk-dev] [PATCH v5 05/11] lib/librte_vhost: merge Oliver's mbuf change

2014-09-26 Thread Huawei Xie

There is no rte_pktmbuf structure in mbuf now. Its fields are merged into the
rte_mbuf structure.
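
To make the effect of the flattening concrete, here is a small sketch of
building and walking a segment chain when next, data_len and pkt_len are
direct members of the buffer structure (m->next rather than m->pkt.next). The
flat_mbuf type and the flat_mbuf_* helpers are hypothetical simplifications for
illustration only; they do not reproduce the real rte_mbuf layout or API.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, flattened buffer: chain fields are direct members. */
struct flat_mbuf {
	struct flat_mbuf *next; /* next segment, NULL on the last one */
	uint32_t pkt_len;       /* total packet length, kept on the first segment */
	uint16_t data_len;      /* bytes held by this segment */
	uint8_t nb_segs;        /* segment count, kept on the first segment */
};

/* Initialise the first segment of a packet. */
static void
flat_mbuf_start(struct flat_mbuf *head, uint16_t len)
{
	head->next = NULL;
	head->data_len = len;
	head->pkt_len = len;
	head->nb_segs = 1;
}

/* Link cur after prev and account for its bytes on the head segment. */
static void
flat_mbuf_append(struct flat_mbuf *head, struct flat_mbuf *prev,
		struct flat_mbuf *cur, uint16_t len)
{
	cur->next = NULL;
	cur->data_len = len;
	cur->pkt_len = 0;
	cur->nb_segs = 0;
	prev->next = cur;
	head->pkt_len += len;
	head->nb_segs++;
}

/* Walk the chain and sum the per-segment lengths. */
static uint32_t
flat_mbuf_chain_bytes(const struct flat_mbuf *m)
{
	uint32_t total = 0;

	for (; m != NULL; m = m->next)
		total += m->data_len;
	return total;
}
```

The sum returned by the walk should always equal pkt_len on the first segment,
which is the invariant the dequeue path in the diff below maintains when it
updates data_len and pkt_len together.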

Signed-off-by: Huawei Xie 
---
 lib/librte_vhost/vhost_rxtx.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 81368e6..688e661 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -145,7 +145,7 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id, struct rte_mbuf **pkts,
/* Copy mbuf data to buffer */
/* TODO fixme for sg mbuf and the case that desc couldn't hold the mbuf data */
rte_memcpy((void *)(uintptr_t)buff_addr,
-   (const void *)buff->pkt.data,
+   rte_pktmbuf_mtod(buff, const void *),
rte_pktmbuf_data_len(buff));
VHOST_PRINT_PACKET(dev, (uintptr_t)buff_addr,
rte_pktmbuf_data_len(buff), 0);
@@ -307,7 +307,7 @@ copy_from_mbuf_to_vring(struct virtio_net *dev, uint16_t res_base_idx,
 * This current segment complete, need continue to
 * check if the whole packet complete or not.
 */
-   pkt = pkt->pkt.next;
+   pkt = pkt->next;
if (pkt != NULL) {
/*
 * There are more segments.
@@ -411,7 +411,7 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id, struct rte_mbuf *
uint32_t secure_len = 0;
uint16_t need_cnt;
uint32_t vec_idx = 0;
-   uint32_t pkt_len = pkts[pkt_idx]->pkt.pkt_len + vq->vhost_hlen;
+   uint32_t pkt_len = pkts[pkt_idx]->pkt_len + vq->vhost_hlen;
uint16_t i, id;

do {
@@ -631,8 +631,8 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, struct rte_me
 * while the virtio buffer in TX vring has
 * more data to be copied.
 */
-   cur->pkt.data_len = seg_offset;
-   m->pkt.pkt_len += seg_offset;
+   cur->data_len = seg_offset;
+   m->pkt_len += seg_offset;
/* Allocate mbuf and populate the structure. */
cur = rte_pktmbuf_alloc(mbuf_pool);
if (unlikely(cur == NULL)) {
@@ -644,7 +644,7 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, struct rte_me
}

seg_num++;
-   prev->pkt.next = cur;
+   prev->next = cur;
prev = cur;
seg_offset = 0;
seg_avail = cur->buf_len - RTE_PKTMBUF_HEADROOM;
@@ -660,8 +660,8 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, struct rte_me
 * room to accomodate more
 * data.
 */
-   cur->pkt.data_len = seg_offset;
-   m->pkt.pkt_len += seg_offset;
+   cur->data_len = seg_offset;
+   m->pkt_len += seg_offset;
/*
 * Allocate an mbuf and
 * populate the structure.
@@ -678,7 +678,7 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, struct rte_me
break;
}
seg_num++;
-   prev->pkt.next = cur;
+   prev->next = cur;
prev = cur;
seg_offset = 0;
seg_avail = cur->buf_len - RTE_PKTMBUF_HEADROOM;
@@ -697,8 +697,8 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, struct rte_me
desc->len, 0);
} else {
/* The whole packet completes. */
-   cur->pkt.data_len = seg_offset;
-   m->pkt.pkt_len += seg_offset;
+

[dpdk-dev] [PATCH v5 04/11] lib/librte_vhost: merge vhost merge-able rx. merge vhost tx fix.

2014-09-26 Thread Huawei Xie

Merge vhost merge-able rx.
For vhost tx, the previous vhost merge-able feature introduced
virtio_dev_merge_tx, and called virtio_dev_tx or virtio_dev_merge_tx depending
on whether the vhost device supports the merge-able feature.
There is no so-called merge-tx; it is actually a fix for the memcpy from a
chained vring desc to a chained mbuf.
Use virtio_dev_merge_tx as the base for vhost tx.
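
The copy from a chained vring desc to a chained mbuf can be pictured with the
sketch below. It is a simplified model under stated assumptions, not the code
in this patch: sketch_desc, sketch_seg, SEG_CAP and sketch_copy_chain are
hypothetical names, destination segments come from a caller-provided array
instead of a mempool, and the segment capacity is deliberately tiny so that
chaining is easy to trigger.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SEG_CAP 8 /* hypothetical, tiny capacity of one destination segment */

struct sketch_desc {          /* one element of a source descriptor chain */
	const uint8_t *addr;
	uint32_t len;
	struct sketch_desc *next; /* NULL terminates the chain */
};

struct sketch_seg {           /* one segment of the destination packet */
	uint8_t data[SEG_CAP];
	uint32_t data_len;
	struct sketch_seg *next;
};

/*
 * Copy a descriptor chain into segments taken from pool[0..pool_len).
 * A new segment is linked whenever the current one is full, independent of
 * descriptor boundaries. Returns the bytes copied, or -1 if the pool runs
 * out. On success *nb_segs receives the number of segments used.
 */
static int64_t
sketch_copy_chain(const struct sketch_desc *desc, struct sketch_seg *pool,
		size_t pool_len, uint32_t *nb_segs)
{
	struct sketch_seg *cur;
	size_t used = 0;
	int64_t total = 0;

	if (pool_len == 0)
		return -1;
	cur = &pool[used++];
	cur->data_len = 0;
	cur->next = NULL;

	for (; desc != NULL; desc = desc->next) {
		uint32_t off = 0;

		while (off < desc->len) {
			uint32_t avail = SEG_CAP - cur->data_len;
			uint32_t n;

			if (avail == 0) {
				/* Current segment is full: chain a fresh one. */
				if (used == pool_len)
					return -1;
				cur->next = &pool[used++];
				cur = cur->next;
				cur->data_len = 0;
				cur->next = NULL;
				avail = SEG_CAP;
			}
			n = desc->len - off;
			if (n > avail)
				n = avail;
			memcpy(cur->data + cur->data_len, desc->addr + off, n);
			cur->data_len += n;
			off += n;
			total += n;
		}
	}
	*nb_segs = (uint32_t)used;
	return total;
}
```

The point of the sketch is that source and destination chains advance
independently, so the same loop handles both single-descriptor and chained
cases without a separate "merge" path.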

Signed-off-by: Huawei Xie 
---
 lib/librte_vhost/rte_virtio_net.h |  16 +-
 lib/librte_vhost/vhost_rxtx.c | 568 +-
 2 files changed, 511 insertions(+), 73 deletions(-)

diff --git a/lib/librte_vhost/rte_virtio_net.h 
b/lib/librte_vhost/rte_virtio_net.h
index 08dc6f4..99ddfc1 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -53,9 +53,18 @@
 /* Enum for virtqueue management. */
 enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};

-
-/*
- * Structure contains variables relevant to TX/RX virtqueues.
+#define BUF_VECTOR_MAX 256
+/**
+ * Structure contains buffer address, length and descriptor index
+ * from vring to do scatter RX.
+ */
+struct buf_vector {
+   uint64_t buf_addr;
+   uint32_t buf_len;
+   uint32_t desc_idx;
+};
+/**
+ * Structure contains variables relevant to RX/TX virtqueues.
  */
 struct vhost_virtqueue
 {
@@ -69,6 +78,7 @@ struct vhost_virtqueue
volatile uint16_t   last_used_idx_res;  /* Used for multiple 
devices reserving buffers. */
eventfd_t   callfd; /* 
Currently unused as polling mode is enabled. */
eventfd_t   kickfd; /* Used 
to notify the guest (trigger interrupt). */
+   struct buf_vectorbuf_vec[BUF_VECTOR_MAX]; /**< for scatter RX. */
 } __rte_cache_aligned;


diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 0d96c43..81368e6 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -49,8 +49,8 @@
  * count is returned to indicate the number of packets that were succesfully
  * added to the RX queue. This function works when mergeable is disabled.
  */
-uint32_t
-rte_vhost_enqueue_burst(struct virtio_net *dev, uint16_t queue_id, struct 
rte_mbuf **pkts, uint32_t count)
+static inline uint32_t __attribute__((always_inline))
+virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id, struct rte_mbuf 
**pkts, uint32_t count)
 {
struct vhost_virtqueue *vq;
struct vring_desc *desc;
@@ -61,7 +61,6 @@ rte_vhost_enqueue_burst(struct virtio_net *dev, uint16_t 
queue_id, struct rte_mb
uint64_t buff_hdr_addr = 0;
uint32_t head[VHOST_MAX_PKT_BURST], packet_len = 0;
uint32_t head_idx, packet_success = 0;
-   uint32_t mergeable, mrg_count = 0;
uint16_t avail_idx, res_cur_idx;
uint16_t res_base_idx, res_end_idx;
uint16_t free_entries;
@@ -101,9 +100,6 @@ rte_vhost_enqueue_burst(struct virtio_net *dev, uint16_t 
queue_id, struct rte_mb
/* Prefetch available ring to retrieve indexes. */
rte_prefetch0(&vq->avail->ring[res_cur_idx & (vq->size - 1)]);

-   /* Check if the VIRTIO_NET_F_MRG_RXBUF feature is enabled. */
-   mergeable = dev->features & (1 << VIRTIO_NET_F_MRG_RXBUF);
-
/* Retrieve all of the head indexes first to avoid caching issues. */
for (head_idx = 0; head_idx < count; head_idx++)
head[head_idx] = vq->avail->ring[(res_cur_idx + head_idx) & 
(vq->size - 1)];
@@ -122,27 +118,23 @@ rte_vhost_enqueue_burst(struct virtio_net *dev, uint16_t 
queue_id, struct rte_mb
/* Prefetch buffer address. */
rte_prefetch0((void*)(uintptr_t)buff_addr);

-   if (mergeable && (mrg_count != 0)) {
-   desc->len = packet_len = rte_pktmbuf_data_len(buff);
+   /* Copy virtio_hdr to packet and increment buffer address */
+   buff_hdr_addr = buff_addr;
+   packet_len = rte_pktmbuf_data_len(buff) + vq->vhost_hlen;
+
+   /*
+* If the descriptors are chained the header and data are 
placed in
+* separate buffers.
+*/
+   if (desc->flags & VRING_DESC_F_NEXT) {
+   desc->len = vq->vhost_hlen;
+   desc = &vq->desc[desc->next];
+   /* Buffer address translation. */
+   buff_addr = gpa_to_vva(dev, desc->addr);
+   desc->len = rte_pktmbuf_data_len(buff);
} else {
-   /* Copy virtio_hdr to packet and increment buffer 
address */
-   buff_hdr_addr = buff_addr;
-   packet_len = rte_pktmbuf_data_len(buff) + 
vq->vhost_hlen;
-
-   /*
-* If the descriptors are chained the header and data 
are placed in
-* separate buffers.
-*/
-

[dpdk-dev] [PATCH v5 03/11] lib/librte_vhost: vhost lib transform

2014-09-26 Thread Huawei Xie

This vhost lib consists of five APIs plus several other helper routines
for feature disable/enable.
1) rte_vhost_driver_register initialises vhost driver.
2) rte_vhost_driver_callback_register registers the callbacks.
Callbacks are called from vhost driver when virtio device is ready
for polling or is de-activated by guest.
3) rte_vhost_driver_session_start, a blocking API to start vhost
message handler session.
4) rte_vhost_enqueue_burst and rte_vhost_dequeue_burst for
enqueue/dequeue to/from virtio ring.

Modifications include:
1) in vhost_rxtx.c
   virtio_dev_rx -> rte_vhost_enqueue_burst
   virtio_dev_tx -> rte_vhost_dequeue_burst
2) VMDQ, MAC learning and other switch related logic is removed.
3) zero copy feature isn't generic at this stage, and is removed.
4) retry logic is removed from vhost rx functions.
The above three pieces of logic will be implemented in the example as a
reference.
5) Add several TODO/FIXME:
   -allow application to disable cmpset reserve in rte_vhost_enqueue_burst
in case there is no contention.
   -fix memcpy from mbuf to vring desc when mbuf is chained and the
desc couldn't hold all the data
   -fix possible vhost_set_mem_table race condition: two vqs concurrently
call set_mem_table, which causes the saved mem_temp to be overridden.
6) merge-able feature is removed, and will be merged in a subsequent patch.
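
As background for the first TODO, a compare-and-set reservation of ring entries might look roughly like the sketch below. It uses C11 atomics rather than DPDK's own atomic helpers, and struct ring_state and reserve_entries are hypothetical names introduced for illustration only, not part of the library.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified ring state; not the library's virtqueue. */
struct ring_state {
	_Atomic uint16_t reserved_idx;	/* next index not yet reserved */
	uint16_t avail_idx;		/* one past the last offered entry */
};

/*
 * Reserve up to count entries for one producer.
 * Stores the first reserved index in *base and returns the number of
 * entries reserved: possibly fewer than requested, 0 if none are free.
 * Indexes are free-running 16-bit counters, so subtraction wraps.
 */
static uint16_t
reserve_entries(struct ring_state *r, uint16_t count, uint16_t *base)
{
	uint16_t start, end, free_entries, n;

	assert(r != NULL && base != NULL);
	start = atomic_load(&r->reserved_idx);
	do {
		free_entries = (uint16_t)(r->avail_idx - start);
		n = count < free_entries ? count : free_entries;
		if (n == 0)
			return 0;
		end = (uint16_t)(start + n);
		/* On failure, start is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(&r->reserved_idx, &start, end));

	*base = start;
	return n;
}
```

When only one core ever enqueues to a given queue, the loop never retries, which is why letting the application skip the atomic step is listed as a possible optimisation.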


Signed-off-by: Huawei Xie 
---
 lib/librte_vhost/Makefile |  48 
 lib/librte_vhost/rte_virtio_net.h | 179 ---
 lib/librte_vhost/vhost-net-cdev.c |  35 +++---
 lib/librte_vhost/vhost-net-cdev.h |  45 +--
 lib/librte_vhost/vhost_rxtx.c | 157 +---
 lib/librte_vhost/virtio-net.c | 249 +++---
 6 files changed, 341 insertions(+), 372 deletions(-)
 create mode 100644 lib/librte_vhost/Makefile

diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
new file mode 100644
index 000..6ad706d
--- /dev/null
+++ b/lib/librte_vhost/Makefile
@@ -0,0 +1,48 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_vhost.a
+
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 -D_FILE_OFFSET_BITS=64 -lfuse
+LDFLAGS += -lfuse
+# all source are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost-net-cdev.c virtio-net.c vhost_rxtx.c
+
+# install includes
+SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_virtio_net.h
+
+# this lib needs eal
+DEPDIRS-$(CONFIG_RTE_LIBRTE_VHOST) += lib/librte_eal lib/librte_mbuf
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_vhost/rte_virtio_net.h 
b/lib/librte_vhost/rte_virtio_net.h
index 1a2f0dc..08dc6f4 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -34,28 +34,25 @@
 #ifndef _VIRTIO_NET_H_
 #define _VIRTIO_NET_H_

+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
 /* Used to indicate that the device is running on a data core */
 #define VIRTIO_DEV_RUNNING 1

 /* Backend value set by guest. */
 #define VIRTIO_DEV_STOPPED -1

-#define PAGE_SIZE   4096

 /* Enum for virtqueue management. */
 enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};

-#define BUF_VECTOR_MAX 256
-
-/*
- * Structure contains buffer address, length and descriptor index
- * from vring to do scatter RX.
-*/
-struct buf_vector {
-uint64_t buf_addr;

[dpdk-dev] [PATCH v5 02/11] lib/librte_vhost: refactor vhost lib for subsequent transform

2014-09-26 Thread Huawei Xie

This patch does a simple split of the original vhost example source
files in the vhost lib directory.
The vhost rx/tx functions virtio_dev_rx/tx are copied from main.c to the
new file vhost_rxtx.c and a license header is added.
main.c and main.h are removed and will be copied to the new vhost
example in a subsequent patch.
virtio-net.h is renamed to rte_virtio_net.h as the API header file.

Signed-off-by: Huawei Xie 
---
 lib/librte_vhost/main.c   | 3725 -
 lib/librte_vhost/main.h   |   86 -
 lib/librte_vhost/rte_virtio_net.h |  161 ++
 lib/librte_vhost/vhost_rxtx.c |  281 +++
 lib/librte_vhost/virtio-net.h |  161 --
 5 files changed, 442 insertions(+), 3972 deletions(-)
 delete mode 100644 lib/librte_vhost/main.c
 delete mode 100644 lib/librte_vhost/main.h
 create mode 100644 lib/librte_vhost/rte_virtio_net.h
 create mode 100644 lib/librte_vhost/vhost_rxtx.c
 delete mode 100644 lib/librte_vhost/virtio-net.h

diff --git a/lib/librte_vhost/main.c b/lib/librte_vhost/main.c
deleted file mode 100644
index 85ee8b8..000
--- a/lib/librte_vhost/main.c
+++ /dev/null
@@ -1,3725 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- * * Redistributions of source code must retain the above copyright
- *   notice, this list of conditions and the following disclaimer.
- * * Redistributions in binary form must reproduce the above copyright
- *   notice, this list of conditions and the following disclaimer in
- *   the documentation and/or other materials provided with the
- *   distribution.
- * * Neither the name of Intel Corporation nor the names of its
- *   contributors may be used to endorse or promote products derived
- *   from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include "main.h"
-#include "virtio-net.h"
-#include "vhost-net-cdev.h"
-
-#define MAX_QUEUES 128
-
-/* the maximum number of external ports supported */
-#define MAX_SUP_PORTS 1
-
-/*
- * Calculate the number of buffers needed per port
- */
-#define NUM_MBUFS_PER_PORT ((MAX_QUEUES*RTE_TEST_RX_DESC_DEFAULT) +
\
-   
(num_switching_cores*MAX_PKT_BURST) +   \
-   
(num_switching_cores*RTE_TEST_TX_DESC_DEFAULT) +\
-   
(num_switching_cores*MBUF_CACHE_SIZE))
-
-#define MBUF_CACHE_SIZE 128
-#define MBUF_SIZE (2048 + sizeof(struct rte_mbuf) + RTE_PKTMBUF_HEADROOM)
-
-/*
- * No frame data buffer allocated from host are required for zero copy
- * implementation, guest will allocate the frame data buffer, and vhost
- * directly use it.
- */
-#define VIRTIO_DESCRIPTOR_LEN_ZCP 1518
-#define MBUF_SIZE_ZCP (VIRTIO_DESCRIPTOR_LEN_ZCP + sizeof(struct rte_mbuf) \
-   + RTE_PKTMBUF_HEADROOM)
-#define MBUF_CACHE_SIZE_ZCP 0
-
-/*
- * RX and TX Prefetch, Host, and Write-back threshold values should be
- * carefully set for optimal performance. Consult the network
- * controller's datasheet and supporting DPDK documentation for guidance
- * on how these parameters should be set.
- */
-#define RX_PTHRESH 8 /* Default values of RX prefetch threshold reg. */
-#define RX_HTHRESH 8 /* Default values of RX host threshold reg. */
-#define RX_WTHRESH 4 /* Default values of RX write-back threshold reg. */
-
-/*
- * These default values are optimized for use with the Intel(R) 82599 10 GbE
- * Controller and the DPDK ixgbe PMD. Consider using other values for other
- * network controllers and/or network drivers.
- */
-#define TX_PTHRESH 36 /* Default values of TX prefetch threshold reg. */
-#define TX_HTHRESH 0  /* Default values of TX host threshold reg. */

[dpdk-dev] [PATCH v5 01/11] lib/librte_vhost: move src files in vhost example to vhost lib directory

2014-09-26 Thread Huawei Xie

"git mv examples/vhost lib/librte_vhost"
This is purely a source file move, without any modification.
A subsequent patch will transform those source files into a vhost library.

Signed-off-by: Huawei Xie 
---
 examples/vhost/Makefile  |   60 -
 examples/vhost/eventfd_link/Makefile |   39 -
 examples/vhost/eventfd_link/eventfd_link.c   |  205 --
 examples/vhost/eventfd_link/eventfd_link.h   |   79 -
 examples/vhost/libvirt/qemu-wrap.py  |  367 ---
 examples/vhost/main.c| 3725 --
 examples/vhost/main.h|   86 -
 examples/vhost/vhost-net-cdev.c  |  367 ---
 examples/vhost/vhost-net-cdev.h  |   83 -
 examples/vhost/virtio-net.c  | 1165 
 examples/vhost/virtio-net.h  |  161 --
 lib/librte_vhost/eventfd_link/Makefile   |   39 +
 lib/librte_vhost/eventfd_link/eventfd_link.c |  205 ++
 lib/librte_vhost/eventfd_link/eventfd_link.h |   79 +
 lib/librte_vhost/libvirt/qemu-wrap.py|  367 +++
 lib/librte_vhost/main.c  | 3725 ++
 lib/librte_vhost/main.h  |   86 +
 lib/librte_vhost/vhost-net-cdev.c|  367 +++
 lib/librte_vhost/vhost-net-cdev.h|   83 +
 lib/librte_vhost/virtio-net.c| 1165 
 lib/librte_vhost/virtio-net.h|  161 ++
 21 files changed, 6277 insertions(+), 6337 deletions(-)
 delete mode 100644 examples/vhost/Makefile
 delete mode 100644 examples/vhost/eventfd_link/Makefile
 delete mode 100644 examples/vhost/eventfd_link/eventfd_link.c
 delete mode 100644 examples/vhost/eventfd_link/eventfd_link.h
 delete mode 100755 examples/vhost/libvirt/qemu-wrap.py
 delete mode 100644 examples/vhost/main.c
 delete mode 100644 examples/vhost/main.h
 delete mode 100644 examples/vhost/vhost-net-cdev.c
 delete mode 100644 examples/vhost/vhost-net-cdev.h
 delete mode 100644 examples/vhost/virtio-net.c
 delete mode 100644 examples/vhost/virtio-net.h
 create mode 100644 lib/librte_vhost/eventfd_link/Makefile
 create mode 100644 lib/librte_vhost/eventfd_link/eventfd_link.c
 create mode 100644 lib/librte_vhost/eventfd_link/eventfd_link.h
 create mode 100755 lib/librte_vhost/libvirt/qemu-wrap.py
 create mode 100644 lib/librte_vhost/main.c
 create mode 100644 lib/librte_vhost/main.h
 create mode 100644 lib/librte_vhost/vhost-net-cdev.c
 create mode 100644 lib/librte_vhost/vhost-net-cdev.h
 create mode 100644 lib/librte_vhost/virtio-net.c
 create mode 100644 lib/librte_vhost/virtio-net.h

diff --git a/examples/vhost/Makefile b/examples/vhost/Makefile
deleted file mode 100644
index f45f83f..000
--- a/examples/vhost/Makefile
+++ /dev/null
@@ -1,60 +0,0 @@
-#   BSD LICENSE
-#
-#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
-#   All rights reserved.
-#
-#   Redistribution and use in source and binary forms, with or without
-#   modification, are permitted provided that the following conditions
-#   are met:
-#
-# * Redistributions of source code must retain the above copyright
-#   notice, this list of conditions and the following disclaimer.
-# * Redistributions in binary form must reproduce the above copyright
-#   notice, this list of conditions and the following disclaimer in
-#   the documentation and/or other materials provided with the
-#   distribution.
-# * Neither the name of Intel Corporation nor the names of its
-#   contributors may be used to endorse or promote products derived
-#   from this software without specific prior written permission.
-#
-#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
-#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
-#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
-#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
-#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
-#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
-#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
-#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
-#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-
-ifeq ($(RTE_SDK),)
-$(error "Please define RTE_SDK environment variable")
-endif
-
-# Default target, can be overriden by command line or environment
-RTE_TARGET ?= x86_64-native-linuxapp-gcc
-
-include $(RTE_SDK)/mk/rte.vars.mk
-
-ifneq ($(CONFIG_RTE_EXEC_ENV),"linuxapp")
-$(info This application can only operate in a linuxapp environment, \
-please change the definition of the RTE_TARGET environment variable)
-all:
-else
-
-# binary name
-APP = vhost-switch
-
-# all source are stored in SRCS-y
-#SRCS-y := cusedrv.c

[dpdk-dev] [PATCH v5 00/11] user space vhost library and vhost example

2014-09-26 Thread Huawei Xie

This set of patches transforms and refactors vhost example to a user
space vhost library and a new vhost example based on this library.
This library implements a user space vhost cuse driver, and provides
generic APIs for user space ethernet vSwitch to integrate us-vhost for
fast packet switching with guest virtio.

The vhost lib consists of five APIs plus several other helper routines.
1) rte_vhost_driver_register initialises vhost driver.
2) rte_vhost_driver_callback_register registers new_device/destroy_device
callbacks. Those callbacks should be implemented in ethernet switch application.
new_device is called when a virtio_device is ready for processing.
destroy_device is called when a virtio_device is de-activated by guest.
3) rte_vhost_driver_session_start starts vhost driver
4) rte_vhost_enqueue_burst and rte_vhost_dequeue_burst for enqueue/dequeue
to/from virtio ring.
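
The call order described above can be sketched with stand-in functions. Everything prefixed sketch_ or app_ below is hypothetical: the names, signatures and struct layouts are assumptions made for illustration and are not the library's prototypes. Only the ordering (register, register callbacks, start session, callbacks fire) follows the description.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-device structure. */
struct sketch_dev { int ready; };

/* Hypothetical callback table holding the two callbacks named above. */
struct sketch_dev_ops {
	int  (*new_device)(struct sketch_dev *dev);
	void (*destroy_device)(struct sketch_dev *dev);
};

static const char *sketch_name;
static const struct sketch_dev_ops *sketch_ops;
static int app_new_calls, app_destroy_calls;

/* Stand-in for step 1 (driver registration). */
static int
sketch_driver_register(const char *dev_name)
{
	if (dev_name == NULL)
		return -1;
	sketch_name = dev_name;
	return 0;
}

/* Stand-in for step 2 (callback registration). */
static int
sketch_callback_register(const struct sketch_dev_ops *ops)
{
	if (ops == NULL || ops->new_device == NULL ||
	    ops->destroy_device == NULL)
		return -1;
	sketch_ops = ops;
	return 0;
}

/*
 * Stand-in for step 3. A real session would block and handle messages;
 * this one simulates a single device attaching and then detaching.
 */
static int
sketch_session_start(struct sketch_dev *dev)
{
	if (sketch_name == NULL || sketch_ops == NULL)
		return -1;	/* steps 1 and 2 must come first */
	if (sketch_ops->new_device(dev) != 0)
		return -1;
	/* Step 4: enqueue/dequeue bursts would run here while dev->ready. */
	sketch_ops->destroy_device(dev);
	return 0;
}

/* Application side: the switch implements the two callbacks. */
static int
app_new_device(struct sketch_dev *dev)
{
	assert(dev != NULL);
	dev->ready = 1;
	app_new_calls++;
	return 0;
}

static void
app_destroy_device(struct sketch_dev *dev)
{
	assert(dev != NULL);
	dev->ready = 0;
	app_destroy_calls++;
}
```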

Change notes:
  v2) Turn off vhost lib by default
  v3) Fixed checkpatch issues
  v4) Split the monolithic patch
  v5) Merge merge-able rx/tx and mbuf change. Lots of coding style fixes.

Huawei Xie (11):
  1) move src files in vhost example to vhost lib directory.
  2) copy vhost rx/tx functions from main.c to a new file vhost_rxtx.c.
  3) remove main.c and main.h in vhost lib.
  4) rename virtio-net.h to rte_virtio_net.h as API header file.
  5) VMDQ, MAC learning related switching logic is removed from library.
  6) zero copy logic isn't generic enough at this stage, and is moved to
     example.
  7) retry logic is moved from vhost rx functions in vhost lib to switch_worker
     switching function in example.
  8) add TODOs/FIXME
     -allow application to disable cmpset reserve in rte_vhost_enqueue_burst
      in case there is no contention.
     -fix memcpy from mbuf to vring desc when mbuf is chained and the
      desc couldn't hold all the data
     -fix possible vhost_set_mem_table race condition: two vqs concurrently
      call set_mem_table, which causes the saved mem_temp to be overridden.
  9) merge vhost merge-able rx
  10) for vhost tx, the previous vhost merge-able feature introduced another
      version, virtio_dev_merge_tx, and called either virtio_dev_tx or
      virtio_dev_merge_tx depending on whether the vhost device supports the
      merge-able feature.
      Actually "merge-able" tx is the fix for memcpy from chained vring desc
      to mbuf. Will use virtio_dev_merge_tx as the base for vhost tx.
  11) merge mbuf patch in vhost lib.
  12) fixes serious coding style issues.
  13) add vhost lib Makefile and vhost lib support in DPDK makefile. vhost lib
      is turned off by default as it requires fuse-devel package.
  14) copy old vhost example files main.c and main.h as the base for new vhost
      example
  15) modify vhost example to use vhost lib API, and merge Oliver's mbuf patch.


 config/common_linuxapp   |8 +
 examples/vhost/Makefile  |   10 +-
 examples/vhost/eventfd_link/Makefile |   39 -
 examples/vhost/eventfd_link/eventfd_link.c   |  205 
 examples/vhost/eventfd_link/eventfd_link.h   |   79 --
 examples/vhost/libvirt/qemu-wrap.py  |  367 ---
 examples/vhost/main.c| 1465 +++---
 examples/vhost/main.h|   47 +-
 examples/vhost/vhost-net-cdev.c  |  367 ---
 examples/vhost/vhost-net-cdev.h  |   83 --
 examples/vhost/virtio-net.c  | 1165 
 examples/vhost/virtio-net.h  |  161 ---
 lib/Makefile |1 +
 lib/librte_vhost/Makefile|   48 +
 lib/librte_vhost/eventfd_link/Makefile   |   39 +
 lib/librte_vhost/eventfd_link/eventfd_link.c |  205 
 lib/librte_vhost/eventfd_link/eventfd_link.h |   79 ++
 lib/librte_vhost/libvirt/qemu-wrap.py|  367 +++
 lib/librte_vhost/rte_virtio_net.h|  207 
 lib/librte_vhost/vhost-net-cdev.c|  362 +++
 lib/librte_vhost/vhost-net-cdev.h|  113 ++
 lib/librte_vhost/vhost_rxtx.c|  737 +
 lib/librte_vhost/virtio-net.c| 1029 ++
 mk/rte.app.mk|5 +
 24 files changed, 3636 insertions(+), 3552 deletions(-)
 delete mode 100644 examples/vhost/eventfd_link/Makefile
 delete mode 100644 examples/vhost/eventfd_link/eventfd_link.c
 delete mode 100644 examples/vhost/eventfd_link/eventfd_link.h
 delete mode 100755 examples/vhost/libvirt/qemu-wrap.py
 delete mode 100644 examples/vhost/vhost-net-cdev.c
 delete mode 100644 examples/vhost/vhost-net-cdev.h
 delete mode 100644 examples/vhost/virtio-net.c
 delete mode 100644 examples/vhost/virtio-net.h
 create mode 100644 lib/librte_vhost/Makefile
 create mode 100644 lib/librte_vhost/eventfd_link/Makefile
 create mode 100644 lib/librte_vhost/eventfd_link/eventfd_link.c
 create mode 100644 lib/librte_vhost/eventfd_link/eventfd_link.h
 create mode 100755

[dpdk-dev] [PATCH v2] ADD mode 5(tlb) to link bonding pmd

2014-09-26 Thread Daniel Mrzyglod


Signed-off-by: Daniel Mrzyglod 
---
 app/test/test_link_bonding.c   |  501 +++-
 app/test/virtual_pmd.c |6 +-
 app/test/virtual_pmd.h |7 +
 lib/librte_pmd_bond/rte_eth_bond.h |   23 ++
 lib/librte_pmd_bond/rte_eth_bond_args.c|1 +
 lib/librte_pmd_bond/rte_eth_bond_pmd.c |  161 -
 lib/librte_pmd_bond/rte_eth_bond_private.h |3 +-
 7 files changed, 696 insertions(+), 6 deletions(-)

diff --git a/app/test/test_link_bonding.c b/app/test/test_link_bonding.c
index c4fcaf7..77f791f 100644
--- a/app/test/test_link_bonding.c
+++ b/app/test/test_link_bonding.c
@@ -41,7 +41,7 @@
 #include 
 #include 
 #include 
-
+#include 
 #include 
 #include 
 #include 
@@ -3845,6 +3845,500 @@ testsuite_teardown(void)
return remove_slaves_and_stop_bonded_device();
 }

+#define NINETY_PERCENT_NUMERAL 90
+#define ONE_HUNDRED_PERCENT_DENOMINATOR 100
+#define ONE_HUNDRED_PERCENT_AND_TEN_NUMERAL 110
+static int
+test_tlb_tx_burst(void)
+{
+   int i, burst_size, nb_tx;
+   uint64_t nb_tx2 = 0;
+   struct rte_mbuf *pkt_burst[MAX_PKT_BURST];
+   struct rte_eth_stats port_stats[32];
+   uint64_t sum_ports_opackets = 0, all_bond_opackets = 0, all_bond_obytes 
= 0;
+   uint16_t pktlen;
+
+   TEST_ASSERT_SUCCESS(initialize_bonded_device_with_slaves
+   (BONDING_MODE_ADAPTIVE_TRANSMIT_LOAD_BALANCING, 1, 3, 
1),
+   "Failed to initialise bonded device");
+
+   burst_size = 20 * test_params->bonded_slave_count;
+
+   TEST_ASSERT(burst_size < MAX_PKT_BURST,
+   "Burst size specified is greater than supported.\n");
+
+
+   /* Generate 40 test bursts in 2s of packets to transmit  */
+   for (i = 0; i < 40; i++) {
+   /*test two types of mac src own(bonding) and others */
+   if (i % 2 == 0) {
+   initialize_eth_header(test_params->pkt_eth_hdr,
+   (struct ether_addr *)src_mac, (struct 
ether_addr *)dst_mac_0, 0, 0);
+   } else {
+   initialize_eth_header(test_params->pkt_eth_hdr,
+   (struct ether_addr 
*)test_params->default_slave_mac,
+   (struct ether_addr *)dst_mac_0, 0, 0);
+   }
+   pktlen = initialize_udp_header(test_params->pkt_udp_hdr, 
src_port,
+   dst_port_0, 16);
+   pktlen = initialize_ipv4_header(test_params->pkt_ipv4_hdr, 
src_addr,
+   dst_addr_0, pktlen);
+   generate_packet_burst(test_params->mbuf_pool, pkt_burst,
+   test_params->pkt_eth_hdr, 0, 
test_params->pkt_ipv4_hdr,
+   1, test_params->pkt_udp_hdr, burst_size, 60, 1);
+   /* Send burst on bonded port */
+   nb_tx = rte_eth_tx_burst(test_params->bonded_port_id, 0, 
pkt_burst,
+   burst_size);
+   nb_tx2 += nb_tx;
+
+   TEST_ASSERT_EQUAL(nb_tx, burst_size,
+   "number of packet not equal burst size");
+
+   rte_delay_us(5);
+   }
+
+
+   /* Verify bonded port tx stats */
+   rte_eth_stats_get(test_params->bonded_port_id, &port_stats[0]);
+
+   all_bond_opackets = port_stats[0].opackets;
+   all_bond_obytes = port_stats[0].obytes;
+
+   TEST_ASSERT_EQUAL(port_stats[0].opackets, (uint64_t)nb_tx2,
+   "Bonded Port (%d) opackets value (%u) not as expected 
(%d)\n",
+   test_params->bonded_port_id, (unsigned 
int)port_stats[0].opackets,
+   burst_size);
+
+
+   /* Verify slave ports tx stats */
+   for (i = 0; i < test_params->bonded_slave_count; i++) {
+   rte_eth_stats_get(test_params->slave_port_ids[i], 
&port_stats[i]);
+   sum_ports_opackets += port_stats[i].opackets;
+   }
+
+   TEST_ASSERT_EQUAL(sum_ports_opackets, (uint64_t)all_bond_opackets,
+   "Total packets sent by slaves is not equalto packets 
sent by bond interface");
+
+   for (i = 0; i < test_params->bonded_slave_count; i++) {
+   printf("port stats:%"PRIu64"\n", port_stats[i].opackets);
+   /* distribution of packets on each slave within +/- 10% of the 
expected value. */
+   TEST_ASSERT(port_stats[i].obytes >= 
((all_bond_obytes*NINETY_PERCENT_NUMERAL)/
+   
(test_params->bonded_slave_count*ONE_HUNDRED_PERCENT_DENOMINATOR)) &&
+   port_stats[i].obytes <= 
((all_bond_obytes*ONE_HUNDRED_PERCENT_AND_TEN_NUMERAL) /
+   
(test_params->bonded_slave_count*ONE_HUNDRED_PERCENT_DENOMINATOR)),
+   "Distribution is not even");

[dpdk-dev] Hi all, does Amazon VMs supported DPDK or not?

2014-09-26 Thread Patel, Rashmin N

It really depends on the devices offered in the VM. If direct device assignment 
is not provided to a VM or if the node hypervisor doesn't have an optimized 
para-virtual interface to a VM, I don't see any benefit using DPDK in VMs.

-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Dong, Binghua
Sent: Friday, September 26, 2014 5:47 AM
To: dev at dpdk.org
Subject: [dpdk-dev] Hi all, does Amazon VMs supported DPDK or not?

A customer plans to buy some Amazon VMs to run their DPDK 1.3 based VPN
applications (to be upgraded to DPDK 1.6 or 1.7) at global sites.

Thanks a lot.

[dpdk-dev] [PATCH] eal: remove rte_snprintf

2014-09-26 Thread Thomas Monjalon

The function rte_snprintf() was deprecated in version 1.7.0
(commit 6f41fe75e2dd).
It's now totally removed.
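
For callers that still need to migrate, and assuming they switch to the standard C99 snprintf, the behaviours the removed unit test used to check (length returned when the string fits, NUL termination on truncation, sizing with a zero-length buffer) can be sketched as follows. copy_checked is a hypothetical helper name, not part of the EAL.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format src into buf with the standard snprintf.
 * Returns 1 if the whole string fit, 0 if it was truncated
 * (buf is still NUL-terminated), -1 on an output error.
 */
static int
copy_checked(char *buf, size_t size, const char *src)
{
	int needed;

	assert(buf != NULL && size > 0 && src != NULL);
	/* snprintf returns the length the full output would have had. */
	needed = snprintf(buf, size, "%s", src);
	if (needed < 0)
		return -1;
	return (size_t)needed < size;
}
```

Calling snprintf(NULL, 0, ...) returns the same would-be length without writing anything, which is the usual way to size a buffer before formatting.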

Signed-off-by: Thomas Monjalon 
---
 app/test/Makefile  |   7 --
 app/test/test_string_fns.c | 136 +
 lib/librte_eal/common/eal_common_string_fns.c  |  28 -
 lib/librte_eal/common/include/rte_string_fns.h |  24 -
 lib/librte_eal/common/include/rte_warnings.h   |   4 -
 5 files changed, 1 insertion(+), 198 deletions(-)

diff --git a/app/test/Makefile b/app/test/Makefile
index 210a7f6..822bbd4 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -133,13 +133,6 @@ SRCS-$(CONFIG_RTE_LIBRTE_KVARGS) += test_kvargs.c
 CFLAGS += -O3
 CFLAGS += $(WERROR_FLAGS)

-# Allow use of deprecated rte_snprintf in test_string_fns.c
-ifeq ($(CC), icc)
-CFLAGS_test_string_fns.o += -Wd1478
-else
-CFLAGS_test_string_fns.o += -Wno-deprecated-declarations
-endif
-
 # Disable warnings of deprecated-declarations in test_kni.c
 ifeq ($(CC), icc)
 CFLAGS_test_kni.o += -wd1478
diff --git a/app/test/test_string_fns.c b/app/test/test_string_fns.c
index 29bfe5b..39e6a9d 100644
--- a/app/test/test_string_fns.c
+++ b/app/test/test_string_fns.c
@@ -49,139 +49,6 @@
 #define DATA_BYTE 'a'

 static int
-test_rte_snprintf(void)
-{
-   /* =
-* First test with a string that will fit in buffer
-* =*/
-   do {
-   int retval;
-   const char source[] = "This is a string that will fit in 
buffer";
-   char buf[sizeof(source)+2]; /* make buffer big enough to fit 
string */
-
-   /* initialise buffer with characters so it can contain no nulls 
*/
-   memset(buf, DATA_BYTE, sizeof(buf));
-
-   /* run rte_snprintf and check results */
-   retval = rte_snprintf(buf, sizeof(buf), "%s", source);
-   if (retval != sizeof(source) - 1) {
-   LOG("Error, retval = %d, expected = %u\n",
-   retval, (unsigned)sizeof(source));
-   return -1;
-   }
-   if (buf[retval] != '\0') {
-   LOG("Error, resultant is not null-terminated\n");
-   return -1;
-   }
-   if (memcmp(source, buf, sizeof(source)-1) != 0){
-   LOG("Error, corrupt data in buffer\n");
-   return -1;
-   }
-   } while (0);
-
-   do {
-   /* =
-* Test with a string that will get truncated
-* =*/
-   int retval;
-   const char source[] = "This is a long string that won't fit in 
buffer";
-   char buf[sizeof(source)/2]; /* make buffer half the size */
-
-   /* initialise buffer with characters so it can contain no nulls 
*/
-   memset(buf, DATA_BYTE, sizeof(buf));
-
-   /* run rte_snprintf and check results */
-   retval = rte_snprintf(buf, sizeof(buf), "%s", source);
-   if (retval != sizeof(source) - 1) {
-   LOG("Error, retval = %d, expected = %u\n",
-   retval, (unsigned)sizeof(source));
-   return -1;
-   }
-   if (buf[sizeof(buf)-1] != '\0') {
-   LOG("Error, buffer is not null-terminated\n");
-   return -1;
-   }
-   if (memcmp(source, buf, sizeof(buf)-1) != 0){
-   LOG("Error, corrupt data in buffer\n");
-   return -1;
-   }
-   } while (0);
-
-   do {
-   /* ===
-* Test using zero-size buf to check how long a buffer we need
-* ===*/
-   int retval;
-   const char source[] = "This is a string";
-   char buf[10];
-
-   /* call with a zero-sized non-NULL buffer, should tell how big 
a buffer
-* we need */
-   retval = rte_snprintf(buf, 0, "%s", source);
-   if (retval != sizeof(source) - 1) {
-   LOG("Call with 0-length buffer does not return correct 
size."
-   "Expected: %zu, got: %d\n", 
sizeof(source), retval);
-   return -1;
-   }
-
-   /* call with a zero-sized NULL buffer, should tell how big a 
buffer
-* we need */
-   retval = rte_snprintf(NULL, 0, "%s", source);
-   if (retval != sizeof(source) - 1) {
-   LOG("Call with

[dpdk-dev] [PATCH 1/4 v2] compat: Add infrastructure to support symbol versioning

2014-09-26 Thread Sergio Gonzalez Monroy

On Fri, Sep 26, 2014 at 11:16:30AM -0400, Neil Horman wrote:
> On Fri, Sep 26, 2014 at 03:16:08PM +0100, Sergio Gonzalez Monroy wrote:
> > On Thu, Sep 25, 2014 at 02:52:32PM -0400, Neil Horman wrote:
> > > Add initial pass header files to support symbol versioning.
> > > 
> > > ---
> > > Change notes
> > > v2)
> > > > * Fixed ifdef in rte_compat.h to test for RTE_BUILD_SHARED_LIB instead of the
> > > > non-existent RTE_SYMBOL_VERSIONING
> > > 
> > > > * Fixed VERSION_SYMBOL macro to add the needed extra @ to make versioning work
> > > > properly
> > > 
> > > * Improved/Clarified documentation
> > > 
> > > Signed-off-by: Neil Horman 
> > > CC: Thomas Monjalon 
> > > CC: "Richardson, Bruce" 
> > > CC: "Gonzalez Monroy, Sergio" 
> > > ---
> > >  lib/Makefile   |  1 +
> > >  lib/librte_compat/Makefile | 38 ++
> > > >  lib/librte_compat/rte_compat.h | 87 ++
> > >  mk/rte.lib.mk  |  6 +++
> > >  4 files changed, 132 insertions(+)
> > >  create mode 100644 lib/librte_compat/Makefile
> > >  create mode 100644 lib/librte_compat/rte_compat.h
> > > 
> > > diff --git a/lib/Makefile b/lib/Makefile
> > > index 10c5bb3..a85b55b 100644
> > > --- a/lib/Makefile
> > > +++ b/lib/Makefile
> > > @@ -32,6 +32,7 @@
> > >  include $(RTE_SDK)/mk/rte.vars.mk
> > >  
> > >  DIRS-$(CONFIG_RTE_LIBC) += libc
> > > +DIRS-y += librte_compat
> > >  DIRS-$(CONFIG_RTE_LIBRTE_EAL) += librte_eal
> > >  DIRS-$(CONFIG_RTE_LIBRTE_MALLOC) += librte_malloc
> > >  DIRS-$(CONFIG_RTE_LIBRTE_RING) += librte_ring
> > > diff --git a/lib/librte_compat/Makefile b/lib/librte_compat/Makefile
> > > new file mode 100644
> > > index 000..3415c7b
> > > --- /dev/null
> > > +++ b/lib/librte_compat/Makefile
> > > @@ -0,0 +1,38 @@
> > > +#   BSD LICENSE
> > > +#
> > > +#   Copyright(c) 2010-2014 Neil Horman 
> > > +#   All rights reserved.
> > > +#
> > > +#   Redistribution and use in source and binary forms, with or without
> > > +#   modification, are permitted provided that the following conditions
> > > +#   are met:
> > > +#
> > > +# * Redistributions of source code must retain the above copyright
> > > +#   notice, this list of conditions and the following disclaimer.
> > > +# * Redistributions in binary form must reproduce the above copyright
> > > +#   notice, this list of conditions and the following disclaimer in
> > > +#   the documentation and/or other materials provided with the
> > > +#   distribution.
> > > +# * Neither the name of Intel Corporation nor the names of its
> > > +#   contributors may be used to endorse or promote products derived
> > > +#   from this software without specific prior written permission.
> > > +#
> > > +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > > +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > > +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > > +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > > +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > > +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > > +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > > +#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > > +#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > > +#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > > +#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > > +
> > > +include $(RTE_SDK)/mk/rte.vars.mk
> > > +
> > > +
> > > +# install includes
> > > +SYMLINK-y-include := rte_compat.h
> > > +
> > > +include $(RTE_SDK)/mk/rte.lib.mk
> > > > diff --git a/lib/librte_compat/rte_compat.h b/lib/librte_compat/rte_compat.h
> > > new file mode 100644
> > > index 000..cff9aea
> > > --- /dev/null
> > > +++ b/lib/librte_compat/rte_compat.h
> > > @@ -0,0 +1,87 @@
> > > +/*-
> > > + *   BSD LICENSE
> > > + *
> > > + *   Copyright(c) 2010-2014 Neil Horman .
> > > + *   All rights reserved.
> > > + *
> > > + *   Redistribution and use in source and binary forms, with or without
> > > + *   modification, are permitted provided that the following conditions
> > > + *   are met:
> > > + *
> > > + * * Redistributions of source code must retain the above copyright
> > > + *   notice, this list of conditions and the following disclaimer.
> > > > + * * Redistributions in binary form must reproduce the above copyright
> > > + *   notice, this list of conditions and the following disclaimer in
> > > + *   the documentation and/or other materials provided with the
> > > + *   distribution.
> > > + * * Neither the name of Intel Corporation nor the names of its
> > > + *   contributors may be used to endorse or promote products derived
> > > + *   from this

[dpdk-dev] [PATCH 5/5] examples: no more bare metal environment

2014-09-26 Thread Thomas Monjalon

From: David Marchand 

Signed-off-by: David Marchand 
---
 examples/cmdline/main.c|  3 +-
 examples/cmdline/main.h| 45 -
 examples/dpdk_qat/main.c   |  3 +-
 examples/dpdk_qat/main.h   | 45 -
 examples/helloworld/main.c |  4 +-
 examples/helloworld/main.h | 45 -
 examples/ip_fragmentation/main.c   |  4 +-
 examples/ip_fragmentation/main.h   | 46 --
 examples/ip_pipeline/main.c|  2 +-
 examples/ip_pipeline/main.h|  8 
 examples/ip_reassembly/main.c  |  4 +-
 examples/ip_reassembly/main.h  | 46 --
 examples/ipv4_multicast/main.c |  4 +-
 examples/ipv4_multicast/main.h | 46 --
 examples/l2fwd/main.c  |  4 +-
 examples/l2fwd/main.h  | 45 -
 examples/l3fwd-acl/main.c  |  4 +-
 examples/l3fwd-acl/main.h  | 45 -
 examples/l3fwd-power/main.c|  4 +-
 examples/l3fwd-power/main.h| 45 -
 examples/l3fwd-vf/main.c   |  4 +-
 examples/l3fwd-vf/main.h   | 45 -
 examples/l3fwd/main.c  |  4 +-
 examples/l3fwd/main.h  | 41 ---
 examples/link_status_interrupt/main.c  |  4 +-
 examples/link_status_interrupt/main.h  | 45 -
 examples/load_balancer/main.c  |  2 +-
 examples/load_balancer/main.h  |  8 
 .../client_server_mp/mp_server/init.c  |  1 -
 .../client_server_mp/mp_server/main.c  |  3 +-
 .../client_server_mp/mp_server/main.h  | 45 -
 examples/multi_process/l2fwd_fork/main.c   |  3 +-
 examples/multi_process/l2fwd_fork/main.h   | 45 -
 examples/qos_meter/main.c  |  2 +-
 examples/qos_meter/main.h  |  9 -
 examples/qos_sched/main.c  |  2 +-
 examples/qos_sched/main.h  |  7 
 examples/quota_watermark/qw/main.c |  2 +-
 examples/quota_watermark/qw/main.h |  9 -
 examples/quota_watermark/qwctl/qwctl.c |  2 +-
 examples/quota_watermark/qwctl/qwctl.h |  8 
 examples/timer/main.c  |  4 +-
 examples/timer/main.h  | 45 -
 examples/vhost/main.c  |  6 +--
 examples/vhost/main.h  |  7 
 examples/vhost_xen/main.c  |  2 +-
 examples/vhost_xen/main.h  |  8 
 examples/vmdq/main.c   |  8 +---
 examples/vmdq/main.h   | 46 --
 examples/vmdq_dcb/main.c   |  8 +---
 examples/vmdq_dcb/main.h   | 46 --
 51 files changed, 27 insertions(+), 896 deletions(-)
 delete mode 100644 examples/cmdline/main.h
 delete mode 100644 examples/dpdk_qat/main.h
 delete mode 100644 examples/helloworld/main.h
 delete mode 100644 examples/ip_fragmentation/main.h
 delete mode 100644 examples/ip_reassembly/main.h
 delete mode 100644 examples/ipv4_multicast/main.h
 delete mode 100644 examples/l2fwd/main.h
 delete mode 100644 examples/l3fwd-acl/main.h
 delete mode 100644 examples/l3fwd-power/main.h
 delete mode 100644 examples/l3fwd-vf/main.h
 delete mode 100644 examples/l3fwd/main.h
 delete mode 100644 examples/link_status_interrupt/main.h
 delete mode 100644 examples/multi_process/client_server_mp/mp_server/main.h
 delete mode 100644 examples/multi_process/l2fwd_fork/main.h
 delete mode 100644 examples/timer/main.h
 delete mode 100644 examples/vmdq/main.h
 delete mode 100644 examples/vmdq_dcb/main.h

diff --git a/examples/cmdline/main.c b/examples/cmdline/main.c
index 668f152..f8ee0a5 100644
--- a/examples/cmdline/main.c
+++ b/examples/cmdline/main.c
@@ -77,9 +77,8 @@
 #include 

 #include "commands.h"
-#include "main.h"

-int MAIN(int argc, char **argv)
+int main(int argc, char **argv)
 {
int ret;
struct cmdline *cl;
diff --git a/examples/cmdline/main.h b/examples/cmdline/main.h
deleted file mode 100644
index f54938b..000
--- a/examples/cmdline/main.h
+++ /dev/null
@@ -1,45 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *

[dpdk-dev] [PATCH 4/5] app: no more bare metal environment

2014-09-26 Thread Thomas Monjalon

From: David Marchand 

Signed-off-by: David Marchand 
Signed-off-by: Thomas Monjalon 
---
 app/cmdline_test/cmdline_test.h |  7 --
 app/dump_cfg/main.c |  4 +---
 app/dump_cfg/main.h | 45 -
 app/test-acl/main.c |  6 +++--
 app/test-acl/main.h | 50 -
 app/test-pipeline/main.c|  2 +-
 app/test-pipeline/main.h|  8 ---
 app/test-pmd/cmdline.c  |  2 +-
 app/test-pmd/testpmd.c  |  4 
 app/test-pmd/testpmd.h  |  6 -
 app/test/Makefile   |  4 
 app/test/autotest.py|  8 +--
 app/test/autotest_runner.py | 33 ++-
 app/test/process.h  |  4 
 app/test/test.c |  4 
 app/test/test.h |  7 --
 app/test/test_debug.c   | 28 ++-
 app/test/test_interrupts.c  |  2 +-
 app/test/test_mbuf.c| 14 +---
 19 files changed, 24 insertions(+), 214 deletions(-)
 delete mode 100644 app/dump_cfg/main.h
 delete mode 100644 app/test-acl/main.h

diff --git a/app/cmdline_test/cmdline_test.h b/app/cmdline_test/cmdline_test.h
index 796fe20..1c9af12 100644
--- a/app/cmdline_test/cmdline_test.h
+++ b/app/cmdline_test/cmdline_test.h
@@ -34,13 +34,6 @@
 #ifndef _CMDLINE_TEST_H_
 #define _CMDLINE_TEST_H_

-/* icc on baremetal gives us troubles with function named 'main' */
-#ifdef RTE_EXEC_ENV_BAREMETAL
-#define main _main
-#endif
-
 extern cmdline_parse_ctx_t main_ctx[];

-int main(int argc, char **argv);
-
 #endif
diff --git a/app/dump_cfg/main.c b/app/dump_cfg/main.c
index c9b40d1..127dbb1 100644
--- a/app/dump_cfg/main.c
+++ b/app/dump_cfg/main.c
@@ -53,10 +53,8 @@
 #include 
 #include 

-#include "main.h"
-
 int
-MAIN(int argc, char **argv)
+main(int argc, char **argv)
 {
int ret;
int i;
diff --git a/app/dump_cfg/main.h b/app/dump_cfg/main.h
deleted file mode 100644
index f54938b..000
--- a/app/dump_cfg/main.h
+++ /dev/null
@@ -1,45 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- * * Redistributions of source code must retain the above copyright
- *   notice, this list of conditions and the following disclaimer.
- * * Redistributions in binary form must reproduce the above copyright
- *   notice, this list of conditions and the following disclaimer in
- *   the documentation and/or other materials provided with the
- *   distribution.
- * * Neither the name of Intel Corporation nor the names of its
- *   contributors may be used to endorse or promote products derived
- *   from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#ifndef _MAIN_H_
-#define _MAIN_H_
-
-#ifdef RTE_EXEC_ENV_BAREMETAL
-#define MAIN _main
-#else
-#define MAIN main
-#endif
-
-int MAIN(int argc, char **argv);
-
-#endif /* _MAIN_H_ */
diff --git a/app/test-acl/main.c b/app/test-acl/main.c
index 44add10..9d4dce6 100644
--- a/app/test-acl/main.c
+++ b/app/test-acl/main.c
@@ -62,7 +62,9 @@

 #endif /*RTE_LIBRTE_ACL_STANDALONE */

-#include "main.h"
+#define RTE_LOGTYPE_TESTACL RTE_LOGTYPE_USER1
+
+#define APP_NAME "TESTACL"

 #define GET_CB_FIELD(in, fd, base, lim, dlm)   do {\
unsigned long val;  \
@@ -1012,7 +1014,7 @@ get_input_opts(int argc, char **argv)
 }

 int
-MAIN(int argc, char **argv)
+main(int argc, char **argv)
 {
int ret;
uint32_t lcore;
diff --git a/app/test-acl/main.h b/app/test-acl/main.h
deleted file mode 100644
index cec0408..000
--- a/app/test-acl/main.h
+++ /dev/null
@@ -1,50 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary

[dpdk-dev] [PATCH 3/5] eal: no more bare metal environment

2014-09-26 Thread Thomas Monjalon

From: David Marchand 

Signed-off-by: David Marchand 
Signed-off-by: Thomas Monjalon 
---
 lib/Makefile| 1 -
 lib/librte_eal/Makefile | 2 --
 lib/librte_eal/common/Makefile  | 3 ---
 lib/librte_eal/common/include/rte_eal.h | 5 ++---
 lib/librte_eal/common/include/rte_log.h | 3 +--
 5 files changed, 3 insertions(+), 11 deletions(-)

diff --git a/lib/Makefile b/lib/Makefile
index 10c5bb3..8af6bd7 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -31,7 +31,6 @@

 include $(RTE_SDK)/mk/rte.vars.mk

-DIRS-$(CONFIG_RTE_LIBC) += libc
 DIRS-$(CONFIG_RTE_LIBRTE_EAL) += librte_eal
 DIRS-$(CONFIG_RTE_LIBRTE_MALLOC) += librte_malloc
 DIRS-$(CONFIG_RTE_LIBRTE_RING) += librte_ring
diff --git a/lib/librte_eal/Makefile b/lib/librte_eal/Makefile
index 3e1441b..69003cf 100644
--- a/lib/librte_eal/Makefile
+++ b/lib/librte_eal/Makefile
@@ -35,7 +35,5 @@ DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += common
 DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += linuxapp
 DIRS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += common
 DIRS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += bsdapp
-DIRS-$(CONFIG_RTE_LIBRTE_EAL_BAREMETAL) += baremetal
-DIRS-$(CONFIG_RTE_LIBRTE_EAL_BAREMETAL) += common

 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
index 7f27966..40986a7 100644
--- a/lib/librte_eal/common/Makefile
+++ b/lib/librte_eal/common/Makefile
@@ -52,7 +52,4 @@ SYMLINK-$(CONFIG_RTE_LIBRTE_EAL)-include := $(addprefix include/,$(INC))
 SYMLINK-$(CONFIG_RTE_LIBRTE_EAL)-include/arch := \
$(addprefix include/$(RTE_ARCH)/arch/,$(ARCH_INC))

-# add libc if configured
-DEPDIRS-$(CONFIG_RTE_LIBC) += lib/libc
-
 include $(RTE_SDK)/mk/rte.install.mk
diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
index 273da9a..3c2d357 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -124,9 +124,8 @@ enum rte_proc_type_t rte_eal_process_type(void);
  * This function is to be executed on the MASTER lcore only, as soon
  * as possible in the application's main() function.
  *
- * The function finishes the initialization process that was started
- * during boot (in case of baremetal) or before main() is called (in
- * case of linuxapp). It puts the SLAVE lcores in the WAIT state.
+ * The function finishes the initialization process before main() is called.
+ * It puts the SLAVE lcores in the WAIT state.
  *
  * When the multi-partition feature is supported, depending on the
  * configuration (if CONFIG_RTE_EAL_MAIN_PARTITION is disabled), this
diff --git a/lib/librte_eal/common/include/rte_log.h b/lib/librte_eal/common/include/rte_log.h
index 02cbb14..db1ea08 100644
--- a/lib/librte_eal/common/include/rte_log.h
+++ b/lib/librte_eal/common/include/rte_log.h
@@ -106,8 +106,7 @@ extern FILE *eal_default_log_stream;
  *
  * This can be done at any time. The f argument represents the stream
  * to be used to send the logs. If f is NULL, the default output is
- * used, which is the serial line in case of bare metal, or directly
- * sent to syslog in case of linux application.
+ * used (stderr).
  *
  * @param f
  *   Pointer to the stream.
-- 
2.0.4

[dpdk-dev] [PATCH 2/5] mk: no more bare metal environment

2014-09-26 Thread Thomas Monjalon

From: David Marchand 

Signed-off-by: David Marchand 
Signed-off-by: Thomas Monjalon 
---
 mk/exec-env/bsdapp/rte.vars.mk   |  2 +-
 mk/exec-env/linuxapp/rte.vars.mk |  2 +-
 mk/rte.app.mk|  9 -
 mk/rte.sdkroot.mk|  2 +-
 mk/target/generic/rte.vars.mk|  2 +-
 mk/toolchain/gcc/rte.vars.mk |  4 
 mk/toolchain/icc/rte.vars.mk | 15 ---
 7 files changed, 4 insertions(+), 32 deletions(-)

diff --git a/mk/exec-env/bsdapp/rte.vars.mk b/mk/exec-env/bsdapp/rte.vars.mk
index fef9579..aed0e18 100644
--- a/mk/exec-env/bsdapp/rte.vars.mk
+++ b/mk/exec-env/bsdapp/rte.vars.mk
@@ -37,7 +37,7 @@
 #   - define EXECENV_ASFLAGS variable (overriden by cmdline)
 #   - may override any previously defined variable
 #
-# examples for RTE_EXEC_ENV: linuxapp, baremetal
+# examples for RTE_EXEC_ENV: linuxapp, bsdapp
 #
 ifeq ($(RTE_BUILD_SHARED_LIB),y)
 EXECENV_CFLAGS  = -pthread -fPIC
diff --git a/mk/exec-env/linuxapp/rte.vars.mk b/mk/exec-env/linuxapp/rte.vars.mk
index d4808c2..afcefa6 100644
--- a/mk/exec-env/linuxapp/rte.vars.mk
+++ b/mk/exec-env/linuxapp/rte.vars.mk
@@ -37,7 +37,7 @@
 #   - define EXECENV_ASFLAGS variable (overriden by cmdline)
 #   - may override any previously defined variable
 #
-# examples for RTE_EXEC_ENV: linuxapp, baremetal
+# examples for RTE_EXEC_ENV: linuxapp, bsdapp
 #
 ifeq ($(RTE_BUILD_SHARED_LIB),y)
 EXECENV_CFLAGS  = -pthread -fPIC
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 34dff2a..76f2fd3 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -149,15 +149,6 @@ ifeq ($(CONFIG_RTE_LIBRTE_RING),y)
 LDLIBS += -lrte_ring
 endif

-ifeq ($(CONFIG_RTE_LIBC),y)
-LDLIBS += -lc
-LDLIBS += -lm
-endif
-
-ifeq ($(CONFIG_RTE_LIBGLOSS),y)
-LDLIBS += -lgloss
-endif
-
 ifeq ($(CONFIG_RTE_LIBRTE_EAL),y)
 LDLIBS += -lrte_eal
 endif
diff --git a/mk/rte.sdkroot.mk b/mk/rte.sdkroot.mk
index f7eab1d..e8423b0 100644
--- a/mk/rte.sdkroot.mk
+++ b/mk/rte.sdkroot.mk
@@ -53,7 +53,7 @@ export BUILDING_RTE_SDK

 #
 # We can specify the configuration template when doing the "make
-# config". For instance: make config T=i686-native-baremetal-gcc
+# config". For instance: make config T=x86_64-native-linuxapp-gcc
 #
 RTE_CONFIG_TEMPLATE :=
 ifdef T
diff --git a/mk/target/generic/rte.vars.mk b/mk/target/generic/rte.vars.mk
index 6020f20..53650c3 100644
--- a/mk/target/generic/rte.vars.mk
+++ b/mk/target/generic/rte.vars.mk
@@ -94,7 +94,7 @@ include $(RTE_SDK)/mk/toolchain/$(RTE_TOOLCHAIN)/rte.vars.mk
 #   - define EXECENV_ASFLAGS variable (overriden by cmdline)
 #   - may override any previously defined variable
 #
-# examples for RTE_EXEC_ENV: linuxapp, baremetal
+# examples for RTE_EXEC_ENV: linuxapp, bsdapp
 #
 include $(RTE_SDK)/mk/exec-env/$(RTE_EXEC_ENV)/rte.vars.mk

diff --git a/mk/toolchain/gcc/rte.vars.mk b/mk/toolchain/gcc/rte.vars.mk
index 262ebdf..fac5697 100644
--- a/mk/toolchain/gcc/rte.vars.mk
+++ b/mk/toolchain/gcc/rte.vars.mk
@@ -74,11 +74,7 @@ WERROR_FLAGS := -W -Wall -Werror -Wstrict-prototypes -Wmissing-prototypes
 WERROR_FLAGS += -Wmissing-declarations -Wold-style-definition -Wpointer-arith
 WERROR_FLAGS += -Wcast-align -Wnested-externs -Wcast-qual
 WERROR_FLAGS += -Wformat-nonliteral -Wformat-security
-
-ifeq ($(CONFIG_RTE_EXEC_ENV),"linuxapp")
-# These trigger warnings in newlib, so can't be used for baremetal
 WERROR_FLAGS += -Wundef -Wwrite-strings
-endif

 # process cpu flags
 include $(RTE_SDK)/mk/toolchain/$(RTE_TOOLCHAIN)/rte.toolchain-compat.mk
diff --git a/mk/toolchain/icc/rte.vars.mk b/mk/toolchain/icc/rte.vars.mk
index 612370d..807134a 100644
--- a/mk/toolchain/icc/rte.vars.mk
+++ b/mk/toolchain/icc/rte.vars.mk
@@ -70,22 +70,7 @@ TOOLCHAIN_ASFLAGS =
 #   Remark #271   : trailing comma is nonstandard
 #   Warning #1478 : function "" (declared at line N of "")
 #   was declared "deprecated"
-ifeq ($(CONFIG_RTE_EXEC_ENV),"linuxapp")
 WERROR_FLAGS := -Wall -Werror-all -w2 -diag-disable 271 -diag-warning 1478
-else
-
-# Turn off some ICC warnings -
-#   Remark #193   : zero used for undefined preprocessing identifier
-#  (needed for newlib)
-#   Remark #271   : trailing comma is nonstandard
-#   Remark #1292  : attribute "warning" ignored ((warning ("the use of
-#   `mktemp' is dangerous; use `mkstemp' instead";
-#   (needed for newlib)
-#   Warning #1478 : function "" (declared at line N of "")
-#   was declared "deprecated"
-WERROR_FLAGS := -Wall -Werror-all -w2 -diag-disable 193,271,1292 \
-   -diag-warning 1478
-endif

 # process cpu flags
 include $(RTE_SDK)/mk/toolchain/$(RTE_TOOLCHAIN)/rte.toolchain-compat.mk
-- 
2.0.4

[dpdk-dev] [PATCH 1/5] config: no more bare metal environment

2014-09-26 Thread Thomas Monjalon

From: David Marchand 

Signed-off-by: David Marchand 
Signed-off-by: Thomas Monjalon 
---
 config/common_bsdapp   |  7 +--
 config/common_linuxapp | 32 +---
 2 files changed, 2 insertions(+), 37 deletions(-)

diff --git a/config/common_bsdapp b/config/common_bsdapp
index eebd05b..c3cee6e 100644
--- a/config/common_bsdapp
+++ b/config/common_bsdapp
@@ -33,7 +33,7 @@
 #
 # define executive environment
 #
-# CONFIG_RTE_EXEC_ENV can be linuxapp, baremetal, bsdapp
+# CONFIG_RTE_EXEC_ENV can be linuxapp, bsdapp
 #
 CONFIG_RTE_EXEC_ENV="bsdapp"
 CONFIG_RTE_EXEC_ENV_BSDAPP=y
@@ -116,11 +116,6 @@ CONFIG_RTE_LIBRTE_EAL_BSDAPP=y
 CONFIG_RTE_LIBRTE_EAL_LINUXAPP=n

 #
-# Compile Environment Abstraction Layer for Bare metal
-#
-CONFIG_RTE_LIBRTE_EAL_BAREMETAL=n
-
-#
 # Compile Environment Abstraction Layer to support Vmware TSC map
 #
 CONFIG_RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT=y
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 4713eb4..3acb8cb 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -33,7 +33,7 @@
 #
 # define executive environment
 #
-# CONFIG_RTE_EXEC_ENV can be linuxapp, baremetal, bsdapp
+# CONFIG_RTE_EXEC_ENV can be linuxapp, bsdapp
 #
 CONFIG_RTE_EXEC_ENV="linuxapp"
 CONFIG_RTE_EXEC_ENV_LINUXAPP=y
@@ -85,31 +85,6 @@ CONFIG_RTE_BUILD_COMBINE_LIBS=n
 CONFIG_RTE_LIBNAME="intel_dpdk"

 #
-# Compile libc directory
-#
-CONFIG_RTE_LIBC=n
-
-#
-# Compile newlib as libc from source
-#
-CONFIG_RTE_LIBC_NEWLIB_SRC=n
-
-#
-# Use binary newlib
-#
-CONFIG_RTE_LIBC_NEWLIB_BIN=n
-
-#
-# Use binary newlib
-#
-CONFIG_RTE_LIBC_NETINCS=n
-
-#
-# Compile libgloss (newlib-stubs)
-#
-CONFIG_RTE_LIBGLOSS=n
-
-#
 # Compile Environment Abstraction Layer
 #
 CONFIG_RTE_LIBRTE_EAL=y
@@ -139,11 +114,6 @@ CONFIG_RTE_PCI_MAX_READ_REQUEST_SIZE=0
 CONFIG_RTE_LIBRTE_EAL_LINUXAPP=y

 #
-# Compile Environment Abstraction Layer for Bare metal
-#
-CONFIG_RTE_LIBRTE_EAL_BAREMETAL=n
-
-#
 # Compile Environment Abstraction Layer to support Vmware TSC map
 #
 CONFIG_RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT=y
-- 
2.0.4

[dpdk-dev] [PATCH 0/5] remove traces of bare metal support

2014-09-26 Thread Thomas Monjalon

There are some references to bare metal (i.e. without OS) support,
especially some options to build a libc with DPDK.
As there is currently no such support, they can be removed.
Some comments are cleaned up at the same time.

Thanks to David for having done most of this effort:
  config: no more bare metal environment
  mk: no more bare metal environment
  eal: no more bare metal environment
  app: no more bare metal environment
  examples: no more bare metal environment

-- 
Thomas

[dpdk-dev] [PATCH v2] distributor_app: new sample app

2014-09-26 Thread Ananyev, Konstantin



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of De Lara Guarch, Pablo
> Sent: Friday, September 26, 2014 4:12 PM
> To: Pattan, Reshma; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] distributor_app: new sample app
> 
> Hi,
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of reshmapa
> > Sent: Wednesday, September 24, 2014 3:17 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH v2] distributor_app: new sample app
> >
> > From: Reshma Pattan 
> >
> > A new sample app that shows the usage of the distributor library. This
> > app works as follows:
> >
> > * An RX thread runs which pulls packets from each ethernet port in turn
> >   and passes those packets to worker using a distributor component.
> > * The workers take the packets in turn, and determine the output port
> >   for those packets using basic l2forwarding doing an xor on the source
> >   port id.
> > * The RX thread takes the returned packets from the workers and enqueue
> >   those packets into an rte_ring structure.
> > * A TX thread pulls the packets off the rte_ring structure and then
> >   sends each packet out the output port specified previously by the worker
> > * Command-line option support provided only for portmask.
> >
> > Signed-off-by: Bruce Richardson 
> > Signed-off-by: Reshma Pattan 
> > ---
> >  examples/Makefile |   1 +
> >  examples/distributor_app/Makefile |  57 
> > >  examples/distributor_app/main.c   | 585 ++
> >  examples/distributor_app/main.h   |  46 +++
> >  4 files changed, 689 insertions(+)
> >  create mode 100644 examples/distributor_app/Makefile
> >  create mode 100644 examples/distributor_app/main.c
> >  create mode 100644 examples/distributor_app/main.h
> >
> > diff --git a/examples/Makefile b/examples/Makefile
> > index 6245f83..2ba82b0 100644
> > --- a/examples/Makefile
> > +++ b/examples/Makefile
> > @@ -66,5 +66,6 @@ DIRS-y += vhost
> >  DIRS-$(CONFIG_RTE_LIBRTE_XEN_DOM0) += vhost_xen
> >  DIRS-y += vmdq
> >  DIRS-y += vmdq_dcb
> > +DIRS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += distributor_app
> >
> >  include $(RTE_SDK)/mk/rte.extsubdir.mk
> > > diff --git a/examples/distributor_app/Makefile b/examples/distributor_app/Makefile
> > new file mode 100644
> > index 000..394785d
> > --- /dev/null
> > +++ b/examples/distributor_app/Makefile
> > @@ -0,0 +1,57 @@
> > +#   BSD LICENSE
> > +#
> > +#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> > +#   All rights reserved.
> > +#
> > +#   Redistribution and use in source and binary forms, with or without
> > +#   modification, are permitted provided that the following conditions
> > +#   are met:
> > +#
> > +# * Redistributions of source code must retain the above copyright
> > +#   notice, this list of conditions and the following disclaimer.
> > +# * Redistributions in binary form must reproduce the above copyright
> > +#   notice, this list of conditions and the following disclaimer in
> > +#   the documentation and/or other materials provided with the
> > +#   distribution.
> > +# * Neither the name of Intel Corporation nor the names of its
> > +#   contributors may be used to endorse or promote products derived
> > +#   from this software without specific prior written permission.
> > +#
> > > +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > > +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > > +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > > +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > > +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > > +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > > +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > > +#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > > +#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > > +#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > > +#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > +
> > +ifeq ($(RTE_SDK),)
> > +$(error "Please define RTE_SDK environment variable")
> > +endif
> > +
> > +# Default target, can be overriden by command line or environment
> > +RTE_TARGET ?= x86_64-default-linuxapp-gcc
> 
> This target is not present anymore. Change it to x86_64-native-linuxapp-gcc.
> 
> > +
> > +include $(RTE_SDK)/mk/rte.vars.mk
> > +
> > +# binary name
> > +APP = distributor_app
> > +
> > +# all source are stored in SRCS-y
> > +SRCS-y := main.c
> > +
> > +CFLAGS += $(WERROR_FLAGS)
> > +
> > +# workaround for a gcc bug with noreturn attribute
> > +# http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603
> > +ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
> > +CFLAGS_main.o +=

[dpdk-dev] [PATCH] examples: do not probe pci twice

2014-09-26 Thread David Marchand

On Fri, Sep 26, 2014 at 2:31 PM, Thomas Monjalon wrote:

> Since commit a155d430119 ("support link bonding device initialization"),
> rte_eal_pci_probe() is called in rte_eal_init().
> So it doesn't have to be called by application anymore.
> It has been fixed for testpmd in commit 2950a769315,
> and this patch removes it from other applications.
>
> Signed-off-by: Thomas Monjalon 
> ---
>  app/test-pipeline/init.c   |  5 ---
>  app/test/test_kni.c|  5 ---
>  examples/dpdk_qat/main.c   |  3 --
>  examples/exception_path/main.c |  5 ---
>  examples/ip_fragmentation/main.c   |  3 --
>  examples/ip_pipeline/init.c|  5 ---
>  examples/ip_reassembly/main.c  |  3 --
>  examples/ipv4_multicast/main.c |  3 --
>  examples/kni/main.c|  5 ---
>  examples/l2fwd-ivshmem/host/host.c |  3 --
>  examples/l2fwd/main.c  |  3 --
>  examples/l3fwd-acl/main.c  |  3 --
>  examples/l3fwd-power/main.c|  3 --
>  examples/l3fwd-vf/main.c   |  3 --
>  examples/l3fwd/main.c  |  4 --
>  examples/link_status_interrupt/main.c  |  3 --
>  examples/load_balancer/init.c  |  4 --
>  .../client_server_mp/mp_client/client.c|  3 --
>  .../client_server_mp/mp_server/init.c  |  6 ---
>  .../client_server_mp/shared/init_drivers.h | 49 --
>  examples/multi_process/l2fwd_fork/main.c   |  3 --
>  examples/multi_process/symmetric_mp/main.c |  6 +--
>  examples/netmap_compat/bridge/bridge.c |  4 --
>  examples/qos_meter/main.c  |  3 --
>  examples/qos_sched/init.c  |  3 --
>  examples/quota_watermark/qw/init.c |  7 
>  examples/vhost/main.c  |  3 --
>  examples/vhost_xen/main.c  |  3 --
>  examples/vmdq/main.c   |  3 --
>  29 files changed, 2 insertions(+), 154 deletions(-)
>  delete mode 100644 examples/multi_process/client_server_mp/shared/init_drivers.h
>
> diff --git a/app/test-pipeline/init.c b/app/test-pipeline/init.c
> index a4337d0..17b6d23 100644
> --- a/app/test-pipeline/init.c
> +++ b/app/test-pipeline/init.c
> @@ -228,11 +228,6 @@ app_init_ports(void)
>  {
> uint32_t i;
>
> -   /* Init driver */
> -   RTE_LOG(INFO, USER1, "Initializing the PMD driver ...\n");
> -   if (rte_eal_pci_probe() < 0)
> -   rte_panic("Cannot probe PCI\n");
> -
> /* Init NIC ports, then start the ports */
> for (i = 0; i < app.n_ports; i++) {
> uint8_t port;
> diff --git a/app/test/test_kni.c b/app/test/test_kni.c
> index 2860bf3..1081131 100644
> --- a/app/test/test_kni.c
> +++ b/app/test/test_kni.c
> @@ -508,11 +508,6 @@ test_kni(void)
> printf("fail to create mempool for kni\n");
> return -1;
> }
> -   ret = rte_eal_pci_probe();
> -   if (ret < 0) {
> -   printf("fail to probe PCI devices\n");
> -   return -1;
> -   }
>
> nb_ports = rte_eth_dev_count();
> if (nb_ports == 0) {
> diff --git a/examples/dpdk_qat/main.c b/examples/dpdk_qat/main.c
> index 1599a0a..c130ea3 100644
> --- a/examples/dpdk_qat/main.c
> +++ b/examples/dpdk_qat/main.c
> @@ -696,9 +696,6 @@ MAIN(int argc, char **argv)
> if (ret < 0)
> return -1;
>
> -   if (rte_eal_pci_probe() < 0)
> -   rte_panic("Cannot probe PCI\n");
> -
> if (check_lcore_params() < 0)
> rte_panic("check_lcore_params failed\n");
>
> diff --git a/examples/exception_path/main.c b/examples/exception_path/main.c
> index f286bf2..b485976 100644
> --- a/examples/exception_path/main.c
> +++ b/examples/exception_path/main.c
> @@ -567,11 +567,6 @@ main(int argc, char** argv)
> return -1;
> }
>
> -   /* Scan PCI bus for recognised devices */
> -   ret = rte_eal_pci_probe();
> -   if (ret < 0)
> -   FATAL_ERROR("Could not probe PCI (%d)", ret);
> -
> /* Get number of ports found in scan */
> nb_sys_ports = rte_eth_dev_count();
> if (nb_sys_ports == 0)
> diff --git a/examples/ip_fragmentation/main.c b/examples/ip_fragmentation/main.c
> index 6d309b5..75028ac 100644
> --- a/examples/ip_fragmentation/main.c
> +++ b/examples/ip_fragmentation/main.c
> @@ -871,9 +871,6 @@ MAIN(int argc, char **argv)
> if (ret < 0)
> rte_exit(EXIT_FAILURE, "Invalid arguments");
>
> -   if (rte_eal_pci_probe() < 0)
> -   rte_panic("Cannot probe PCI\n");
> -
> nb_ports = rte_eth_dev_count();
> if (nb_ports

[dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:

2014-09-26 Thread Neil Horman

On Fri, Sep 26, 2014 at 06:07:14PM +, Ananyev, Konstantin wrote:
> 
> 
> > > As I remember the purpose of the patch was to fix the race condition 
> > > inside rte_alarm library.
> > > I believe that the patch provided by Michal & Pawel fixes the issues you 
> > > discovered.
> > > If you think, that is not the case, could you please provide a list of 
> > > remaining issues?
> > > Excluding ones that you just don't like it, and you are not happy with 
> > > rte_alarm API in total?
> 
> 
> > Gladly.  As Pawel explained the race, it's possible that, after calling
> > rte_eal_alarm_cancel, an in-flight execution of an alarm callback may still
> > be running.  The problem with that ostensibly is that data which is being
> > accessed by the callback might be then accessed in parallel with another
> > process leading to data corruption or some other problem. The issue I have
> > with his patch is that it doesn't completely close the race.  While it does
> > close the race for the condition in which thread B is running the alarm
> > callback while thread A is executing the cancel operation, it does not close
> > the case for when a single thread B is running the cancel operation, as the
> > in-flight execution itself is still active.
> 
> A bit puzzled here:
> Are you saying that calling alarm_cancel() for itself inside 
> eal_alarm_callback() might cause a problem?
> I still don't see how.
> 
Potentially yes, by the same race condition that exists when using a secondary
thread to do the cancel call.  As I understand it the race that Pawel described
is as follows:

Thread A                        Thread B
alarm_cancel()                  eal_alarm_callback
block on alarm spinlock         drop spinlock
run cancel operation            execute callback function
return from cancel
rte_eal_alarm_set

As Pawel described the problem, there is a desire to not set the new alarm while
the old alarm is still executing.  And his patch accomplishes that for the two
thread case above just fine.

The problem with Pawel's patch is that it's non-functional in the case where the
cancel happens within Thread B.  Let's change the scenario just a little bit:

Thread B                                Thread C
eal_alarm_callback
 callback_function
  some_other_common_func
   rte_eal_alarm_cancel(this)
  pthread_signal(Thread C)              wake up
  operate on alarm data                 rte_eal_alarm_set


In this scenario the problem is not fixed because when called from within the
alarm thread, the executing alarm is skipped (as it must be), but that fact is
invisible to the caller, and because of that it's still possible for the same
original problem to occur.

Neil
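
To make the second scenario concrete, here is a minimal single-threaded sketch
of a cancel routine that skips the executing alarm but reports that it did so.
Everything in it (toy_alarm, toy_alarm_cancel, the in_progress flag) is
hypothetical and invented for illustration; it is neither Pawel's patch nor the
rte_alarm implementation, and the spinlock is left out on purpose.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of an alarm entry; not the rte_alarm structures. */
struct toy_alarm {
	void (*cb)(void *);
	void *arg;
	int pending;    /* 1 while queued and not yet cancelled */
	int executing;  /* 1 while its callback is running */
};

/*
 * Cancel every pending alarm that uses callback cb. An alarm whose callback
 * is currently running cannot be removed, so it is skipped; *in_progress
 * tells the caller that this happened instead of hiding it.
 */
static int toy_alarm_cancel(struct toy_alarm *alarms, size_t n,
			    void (*cb)(void *), int *in_progress)
{
	int cancelled = 0;

	*in_progress = 0;
	for (size_t i = 0; i < n; i++) {
		if (alarms[i].cb != cb || !alarms[i].pending)
			continue;
		if (alarms[i].executing) {
			*in_progress = 1;  /* skipped: still running */
			continue;
		}
		alarms[i].pending = 0;
		cancelled++;
	}
	return cancelled;
}

static struct toy_alarm table[2];
static int seen_cancelled = -1;
static int seen_in_progress = -1;

/* A callback that cancels its own alarms, as in the Thread B column above. */
static void self_cancelling_cb(void *arg)
{
	(void)arg;
	seen_cancelled = toy_alarm_cancel(table, 2, self_cancelling_cb,
					  &seen_in_progress);
}

/* Run alarm i the way an alarm thread would: mark, call, clear. */
static void toy_alarm_fire(size_t i)
{
	table[i].executing = 1;
	table[i].cb(table[i].arg);
	table[i].executing = 0;
	table[i].pending = 0;
}

/* Queue two alarms with the same callback and fire the first one. */
static void toy_demo(void)
{
	table[0] = (struct toy_alarm){ self_cancelling_cb, NULL, 1, 0 };
	table[1] = (struct toy_alarm){ self_cancelling_cb, NULL, 1, 0 };
	toy_alarm_fire(0);
}
```

With a flag like in_progress the caller can at least tell that one alarm was
left running and hold off on signalling another thread or re-arming; without
it, the skipped alarm is indistinguishable from a fully successful cancel.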

[dpdk-dev] [PATCH 1/4 v2] compat: Add infrastructure to support symbol versioning

2014-09-26 Thread Neil Horman

On Fri, Sep 26, 2014 at 12:22:56PM -0400, Neil Horman wrote:
> On Fri, Sep 26, 2014 at 04:33:04PM +0100, Sergio Gonzalez Monroy wrote:
> > On Fri, Sep 26, 2014 at 11:16:30AM -0400, Neil Horman wrote:
> > > On Fri, Sep 26, 2014 at 03:16:08PM +0100, Sergio Gonzalez Monroy wrote:
> > > > On Thu, Sep 25, 2014 at 02:52:32PM -0400, Neil Horman wrote:
> > > > > Add initial pass header files to support symbol versioning.
> > > > > 
> > > > > ---
> > > > > Change notes
> > > > > v2)
> > > > > * Fixed ifdef in rte_compat.h to test for RTE_BUILD_SHARED_LIB 
> > > > > instead of the
> > > > > non-existant RTE_SYMBOL_VERSIONING
> > > > > 
> > > > > * Fixed VERSION_SYMBOL macro to add the needed extra @ to make 
> > > > > versioning work
> > > > > properly
> > > > > 
> > > > > * Improved/Clarified documentation
> > > > > 
> > > > > Signed-off-by: Neil Horman 
> > > > > CC: Thomas Monjalon 
> > > > > CC: "Richardson, Bruce" 
> > > > > CC: "Gonzalez Monroy, Sergio" 
> > > > > ---
> > > > >  lib/Makefile   |  1 +
> > > > >  lib/librte_compat/Makefile | 38 ++
> > > > >  lib/librte_compat/rte_compat.h | 87 
> > > > > ++
> > > > >  mk/rte.lib.mk  |  6 +++
> > > > >  4 files changed, 132 insertions(+)
> > > > >  create mode 100644 lib/librte_compat/Makefile
> > > > >  create mode 100644 lib/librte_compat/rte_compat.h
> > > > > 
> > > > > diff --git a/lib/Makefile b/lib/Makefile
> > > > > index 10c5bb3..a85b55b 100644
> > > > > --- a/lib/Makefile
> > > > > +++ b/lib/Makefile
> > > > > @@ -32,6 +32,7 @@
> > > > >  include $(RTE_SDK)/mk/rte.vars.mk
> > > > >  
> > > > >  DIRS-$(CONFIG_RTE_LIBC) += libc
> > > > > +DIRS-y += librte_compat
> > > > >  DIRS-$(CONFIG_RTE_LIBRTE_EAL) += librte_eal
> > > > >  DIRS-$(CONFIG_RTE_LIBRTE_MALLOC) += librte_malloc
> > > > >  DIRS-$(CONFIG_RTE_LIBRTE_RING) += librte_ring
> > > > > diff --git a/lib/librte_compat/Makefile b/lib/librte_compat/Makefile
> > > > > new file mode 100644
> > > > > index 000..3415c7b
> > > > > --- /dev/null
> > > > > +++ b/lib/librte_compat/Makefile
> > > > > @@ -0,0 +1,38 @@
> > > > > +#   BSD LICENSE
> > > > > +#
> > > > > +#   Copyright(c) 2010-2014 Neil Horman 
> > > > > +#   All rights reserved.
> > > > > +#
> > > > > +#   Redistribution and use in source and binary forms, with or 
> > > > > without
> > > > > +#   modification, are permitted provided that the following 
> > > > > conditions
> > > > > +#   are met:
> > > > > +#
> > > > > +# * Redistributions of source code must retain the above 
> > > > > copyright
> > > > > +#   notice, this list of conditions and the following disclaimer.
> > > > > +# * Redistributions in binary form must reproduce the above 
> > > > > copyright
> > > > > +#   notice, this list of conditions and the following disclaimer 
> > > > > in
> > > > > +#   the documentation and/or other materials provided with the
> > > > > +#   distribution.
> > > > > +# * Neither the name of Intel Corporation nor the names of its
> > > > > +#   contributors may be used to endorse or promote products 
> > > > > derived
> > > > > +#   from this software without specific prior written permission.
> > > > > +#
> > > > > +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND 
> > > > > CONTRIBUTORS
> > > > > +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > > > > +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND 
> > > > > FITNESS FOR
> > > > > +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
> > > > > COPYRIGHT
> > > > > +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, 
> > > > > INCIDENTAL,
> > > > > +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > > > > +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF 
> > > > > USE,
> > > > > +#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND 
> > > > > ON ANY
> > > > > +#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR 
> > > > > TORT
> > > > > +#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF 
> > > > > THE USE
> > > > > +#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH 
> > > > > DAMAGE.
> > > > > +
> > > > > +include $(RTE_SDK)/mk/rte.vars.mk
> > > > > +
> > > > > +
> > > > > +# install includes
> > > > > +SYMLINK-y-include := rte_compat.h
> > > > > +
> > > > > +include $(RTE_SDK)/mk/rte.lib.mk
> > > > > diff --git a/lib/librte_compat/rte_compat.h 
> > > > > b/lib/librte_compat/rte_compat.h
> > > > > new file mode 100644
> > > > > index 000..cff9aea
> > > > > --- /dev/null
> > > > > +++ b/lib/librte_compat/rte_compat.h
> > > > > @@ -0,0 +1,87 @@
> > > > > +/*-
> > > > > + *   BSD LICENSE
> > > > > + *
> > > > > + *   Copyright(c) 2010-2014 Neil Horman .
> > > > > + *   All rights reserved.
> > > > > + *
> > > > > + *   Redistribution and use in source and binary forms, with or 
> > > > > without
> > >

[dpdk-dev] [PATCH 2/2] app: Used rte_eth_rxconf_defaults and rte_eth_txconf_defaults in apps

2014-09-26 Thread Pablo de Lara

For apps that were using default rte_eth_rxconf and rte_eth_txconf
structures, these have been removed and now they are obtained by
calling rte_eth_rxconf_defaults and rte_eth_txconf_defaults, just
before setting up RX/TX queues.

Signed-off-by: Pablo de Lara 
---
 examples/dpdk_qat/main.c   |   44 ++-
 examples/exception_path/main.c |   30 +-
 examples/ip_fragmentation/main.c   |   42 ++
 examples/ip_reassembly/main.c  |   44 ++-
 examples/ipv4_multicast/main.c |   44 ++
 examples/kni/main.c|   34 +--
 examples/l2fwd-ivshmem/host/host.c |   43 +-
 examples/l2fwd/main.c  |   48 +--
 examples/l3fwd-acl/main.c  |   46 ++-
 examples/l3fwd-power/main.c|   46 ++-
 examples/l3fwd-vf/main.c   |   31 ++
 examples/l3fwd/main.c  |   54 +++---
 examples/link_status_interrupt/main.c  |   43 +-
 examples/load_balancer/init.c  |   24 +---
 .../client_server_mp/mp_server/init.c  |   41 +
 examples/multi_process/l2fwd_fork/main.c   |   44 +-
 examples/multi_process/symmetric_mp/main.c |   36 +--
 examples/netmap_compat/bridge/bridge.c |   25 
 examples/netmap_compat/lib/compat_netmap.c |6 +-
 examples/netmap_compat/lib/compat_netmap.h |2 -
 examples/qos_meter/main.c  |   36 
 examples/quota_watermark/qw/init.c |   26 ++---
 examples/vhost_xen/main.c  |   31 ++
 examples/vmdq/main.c   |   60 +++-
 examples/vmdq_dcb/main.c   |   36 +--
 25 files changed, 118 insertions(+), 798 deletions(-)

diff --git a/examples/dpdk_qat/main.c b/examples/dpdk_qat/main.c
index d61db4c..69b8e6a 100644
--- a/examples/dpdk_qat/main.c
+++ b/examples/dpdk_qat/main.c
@@ -75,25 +75,6 @@
 #define MBUF_SIZE (2048 + sizeof(struct rte_mbuf) + RTE_PKTMBUF_HEADROOM)
 #define NB_MBUF   (32 * 1024)

-/*
- * RX and TX Prefetch, Host, and Write-back threshold values should be
- * carefully set for optimal performance. Consult the network
- * controller's datasheet and supporting DPDK documentation for guidance
- * on how these parameters should be set.
- */
-#define RX_PTHRESH 8 /**< Default values of RX prefetch threshold reg. */
-#define RX_HTHRESH 8 /**< Default values of RX host threshold reg. */
-#define RX_WTHRESH 4 /**< Default values of RX write-back threshold reg. */
-
-/*
- * These default values are optimized for use with the Intel(R) 82599 10 GbE
- * Controller and the DPDK ixgbe PMD. Consider using other values for other
- * network controllers and/or network drivers.
- */
-#define TX_PTHRESH 36 /**< Default values of TX prefetch threshold reg. */
-#define TX_HTHRESH 0  /**< Default values of TX host threshold reg. */
-#define TX_WTHRESH 0  /**< Default values of TX write-back threshold reg. */
-
 #define MAX_PKT_BURST 32
 #define BURST_TX_DRAIN_US 100 /* TX drain every ~100us */

@@ -178,24 +159,6 @@ static struct rte_eth_conf port_conf = {
},
 };

-static const struct rte_eth_rxconf rx_conf = {
-   .rx_thresh = {
-   .pthresh = RX_PTHRESH,
-   .hthresh = RX_HTHRESH,
-   .wthresh = RX_WTHRESH,
-   },
-};
-
-static const struct rte_eth_txconf tx_conf = {
-   .tx_thresh = {
-   .pthresh = TX_PTHRESH,
-   .hthresh = TX_HTHRESH,
-   .wthresh = TX_WTHRESH,
-   },
-   .tx_free_thresh = 0, /* Use PMD default values */
-   .tx_rs_thresh = 0, /* Use PMD default values */
-};
-
 static struct rte_mempool * pktmbuf_pool[RTE_MAX_NUMA_NODES];

 struct lcore_conf {
@@ -785,7 +748,8 @@ MAIN(int argc, char **argv)
printf("txq=%u,%d,%d ", lcoreid, queueid, socketid);
fflush(stdout);
ret = rte_eth_tx_queue_setup(portid, queueid, nb_txd,
-socketid, &tx_conf);
+   socketid,
+   NULL);
if (ret < 0)
rte_panic("rte_eth_tx_queue_setup: err=%d, "
"port=%d\n", ret, portid);
@@ -810,7 +774,9 @@ MAIN(int argc, char **argv)
fflush(stdout);

ret = rte_eth_rx_queue_setup(portid, queueid, nb_rxd,
-   socketid, &rx_conf, pktmbuf_pool[socketid]);
+   socketid,
+

[dpdk-dev] [PATCH 1/2] pmd: Added rte_eth_rxconf_defaults and rte_eth_txconf defaults functions

2014-09-26 Thread Pablo de Lara

Many sample apps use duplicated code to set rte_eth_txconf and rte_eth_rxconf
structures. This patch allows the user to get a default optimal RX/TX
configuration through these two functions, and any parameter may still be
tweaked as desired before setting up queues.

Signed-off-by: Pablo de Lara 
---
 lib/librte_ether/rte_ethdev.c   |   68 +++
 lib/librte_ether/rte_ethdev.h   |   29 +++
 lib/librte_pmd_e1000/igb_ethdev.c   |   56 -
 lib/librte_pmd_i40e/i40e_ethdev.c   |   56 
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c |   59 ++
 5 files changed, 267 insertions(+), 1 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index fd1010a..3c24040 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -928,6 +928,7 @@ rte_eth_rx_queue_setup(uint8_t port_id, uint16_t 
rx_queue_id,
struct rte_eth_dev *dev;
struct rte_pktmbuf_pool_private *mbp_priv;
struct rte_eth_dev_info dev_info;
+   const struct rte_eth_rxconf *conf;

/* This function is only safe when called from the primary process
 * in a multi-process setup*/
@@ -937,6 +938,16 @@ rte_eth_rx_queue_setup(uint8_t port_id, uint16_t 
rx_queue_id,
PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
return (-EINVAL);
}
+
+   conf = rx_conf;
+   if (conf == NULL) {
+   conf = rte_eth_rxconf_defaults(port_id, NULL);
+   if (conf == NULL) {
+   PMD_DEBUG_TRACE("Invalid RX port configuration\n");
+   return (-EINVAL);
+   }
+   }
+
dev = &rte_eth_devices[port_id];
if (rx_queue_id >= dev->data->nb_rx_queues) {
PMD_DEBUG_TRACE("Invalid RX queue_id=%d\n", rx_queue_id);
@@ -997,6 +1008,7 @@ rte_eth_tx_queue_setup(uint8_t port_id, uint16_t 
tx_queue_id,
   const struct rte_eth_txconf *tx_conf)
 {
struct rte_eth_dev *dev;
+   const struct rte_eth_txconf *conf;

/* This function is only safe when called from the primary process
 * in a multi-process setup*/
@@ -1006,6 +1018,16 @@ rte_eth_tx_queue_setup(uint8_t port_id, uint16_t 
tx_queue_id,
PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
return (-EINVAL);
}
+
+   conf = tx_conf;
+   if (conf == NULL) {
+   conf = rte_eth_txconf_defaults(port_id, NULL);
+   if (conf == NULL) {
+   PMD_DEBUG_TRACE("Invalid TX port configuration\n");
+   return (-EINVAL);
+   }
+   }
+
dev = &rte_eth_devices[port_id];
if (tx_queue_id >= dev->data->nb_tx_queues) {
PMD_DEBUG_TRACE("Invalid TX queue_id=%d\n", tx_queue_id);
@@ -3002,3 +3024,49 @@ rte_eth_dev_get_flex_filter(uint8_t port_id, uint16_t 
index,
return (*dev->dev_ops->get_flex_filter)(dev, index, filter,
rx_queue);
 }
+
+const struct rte_eth_rxconf *
+rte_eth_rxconf_defaults(uint8_t port_id, struct rte_eth_rxconf *conf)
+{
+   struct rte_eth_dev *dev;
+   static const struct rte_eth_rxconf defaults;
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return NULL;
+   }
+
+   dev = &rte_eth_devices[port_id];
+
+   if (dev->dev_ops->rxconf_defaults) {
+   return (*dev->dev_ops->rxconf_defaults)(conf);
+   } else {
+   if (conf == NULL)
+   return &defaults;
+   *conf = defaults;
+   }
+   return conf;
+}
+
+const struct rte_eth_txconf *
+rte_eth_txconf_defaults(uint8_t port_id, struct rte_eth_txconf *conf)
+{
+   struct rte_eth_dev *dev;
+   static const struct rte_eth_txconf defaults;
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return NULL;
+   }
+
+   dev = &rte_eth_devices[port_id];
+
+   if (dev->dev_ops->txconf_defaults) {
+   return (*dev->dev_ops->txconf_defaults)(conf);
+   } else {
+   if (conf == NULL)
+   return &defaults;
+   *conf = defaults;
+   }
+   return conf;
+}
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 50df654..70026fd 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1361,6 +1361,12 @@ typedef int (*eth_get_flex_filter_t)(struct rte_eth_dev 
*dev,
uint16_t *rx_queue);
 /**< @internal Get a flex filter rule on an Ethernet device */

+typedef const struct rte_eth_rxconf * (*eth_rxconf_defaults_t) (struct 
rte_eth_rxconf *conf);
+/**< @internal Get the default RX port configuration on an Ethernet device */
+
+typedef const struct rte_eth_txconf *

[dpdk-dev] [PATCH 0/2] Added functions to get RX/TX default configuration

2014-09-26 Thread Pablo de Lara

These patches add two new API functions to get optimal values
for the RX/TX configuration structures (rte_eth_rxconf and rte_eth_txconf),
so users can get these configurations and modify or use them directly
to set up RX/TX queues. In addition, most of the apps that modified few
or none of the default values of the structures have been changed to use
these functions, to simplify the code and avoid duplication.
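
The pattern described above (passing NULL to queue setup means "use the
driver's defaults", while a caller may also fetch the defaults, tweak them and
pass them in) can be sketched in isolation as follows. This is only an
illustration under assumptions: the toy_* names and the reduced structure are
hypothetical stand-ins, not the rte_ethdev API, and the threshold values are
simply the ones from the macros the sample apps used to define.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for a TX queue configuration structure. */
struct toy_txconf {
	uint8_t pthresh;
	uint8_t hthresh;
	uint8_t wthresh;
	uint16_t free_thresh;
};

/* Hypothetical per-driver hook; NULL when a driver offers no defaults. */
typedef const struct toy_txconf *(*toy_defaults_fn)(struct toy_txconf *conf);

/* Example driver hook: copy into conf if given, else return a static copy. */
static const struct toy_txconf *toy_driver_defaults(struct toy_txconf *conf)
{
	static const struct toy_txconf drv = { 36, 0, 0, 0 };

	if (conf == NULL)
		return &drv;
	*conf = drv;
	return conf;
}

/* Generic layer: use the driver hook if present, else a zeroed structure. */
static const struct toy_txconf *
toy_txconf_defaults(toy_defaults_fn hook, struct toy_txconf *conf)
{
	static const struct toy_txconf zeroed = { 0 };

	if (hook != NULL)
		return hook(conf);
	if (conf == NULL)
		return &zeroed;
	*conf = zeroed;
	return conf;
}

/* Queue setup: a NULL conf is replaced by the defaults before use. */
static int toy_tx_queue_setup(toy_defaults_fn hook,
			      const struct toy_txconf *conf,
			      struct toy_txconf *applied)
{
	if (conf == NULL) {
		conf = toy_txconf_defaults(hook, NULL);
		if (conf == NULL)
			return -1;
	}
	*applied = *conf;
	return 0;
}
```

An application that is happy with the defaults passes NULL; one that needs a
different write-back threshold calls toy_txconf_defaults() with its own
structure, changes the one field, and passes that structure to the setup call.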

Pablo de Lara (2):
  pmd: Added rte_eth_rxconf_defaults and rte_eth_txconf defaults
functions
  app: Used rte_eth_rxconf_defaults and rte_eth_txconf_defaults in apps

 examples/dpdk_qat/main.c   |   44 ++---
 examples/exception_path/main.c |   30 +
 examples/ip_fragmentation/main.c   |   42 ++---
 examples/ip_reassembly/main.c  |   44 ++---
 examples/ipv4_multicast/main.c |   44 ++---
 examples/kni/main.c|   34 +-
 examples/l2fwd-ivshmem/host/host.c |   43 +---
 examples/l2fwd/main.c  |   48 +-
 examples/l3fwd-acl/main.c  |   46 ++
 examples/l3fwd-power/main.c|   46 ++---
 examples/l3fwd-vf/main.c   |   31 ++---
 examples/l3fwd/main.c  |   54 +++-
 examples/link_status_interrupt/main.c  |   43 +---
 examples/load_balancer/init.c  |   24 +--
 .../client_server_mp/mp_server/init.c  |   41 +---
 examples/multi_process/l2fwd_fork/main.c   |   44 +
 examples/multi_process/symmetric_mp/main.c |   36 +-
 examples/netmap_compat/bridge/bridge.c |   25 ---
 examples/netmap_compat/lib/compat_netmap.c |6 +-
 examples/netmap_compat/lib/compat_netmap.h |2 -
 examples/qos_meter/main.c  |   36 ---
 examples/quota_watermark/qw/init.c |   26 ++--
 examples/vhost_xen/main.c  |   31 ++---
 examples/vmdq/main.c   |   60 ++---
 examples/vmdq_dcb/main.c   |   36 +-
 lib/librte_ether/rte_ethdev.c  |   68 
 lib/librte_ether/rte_ethdev.h  |   29 
 lib/librte_pmd_e1000/igb_ethdev.c  |   56 -
 lib/librte_pmd_i40e/i40e_ethdev.c  |   56 
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c|   59 +
 30 files changed, 385 insertions(+), 799 deletions(-)

-- 
1.7.7.6

[dpdk-dev] [PATCH 1/4 v2] compat: Add infrastructure to support symbol versioning

2014-09-26 Thread Sergio Gonzalez Monroy

On Thu, Sep 25, 2014 at 02:52:32PM -0400, Neil Horman wrote:
> Add initial pass header files to support symbol versioning.
> 
> ---
> Change notes
> v2)
> * Fixed ifdef in rte_compat.h to test for RTE_BUILD_SHARED_LIB instead of the
> non-existant RTE_SYMBOL_VERSIONING
> 
> * Fixed VERSION_SYMBOL macro to add the needed extra @ to make versioning work
> properly
> 
> * Improved/Clarified documentation
> 
> Signed-off-by: Neil Horman 
> CC: Thomas Monjalon 
> CC: "Richardson, Bruce" 
> CC: "Gonzalez Monroy, Sergio" 
> ---
>  lib/Makefile   |  1 +
>  lib/librte_compat/Makefile | 38 ++
>  lib/librte_compat/rte_compat.h | 87 
> ++
>  mk/rte.lib.mk  |  6 +++
>  4 files changed, 132 insertions(+)
>  create mode 100644 lib/librte_compat/Makefile
>  create mode 100644 lib/librte_compat/rte_compat.h
> 
> diff --git a/lib/Makefile b/lib/Makefile
> index 10c5bb3..a85b55b 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -32,6 +32,7 @@
>  include $(RTE_SDK)/mk/rte.vars.mk
>  
>  DIRS-$(CONFIG_RTE_LIBC) += libc
> +DIRS-y += librte_compat
>  DIRS-$(CONFIG_RTE_LIBRTE_EAL) += librte_eal
>  DIRS-$(CONFIG_RTE_LIBRTE_MALLOC) += librte_malloc
>  DIRS-$(CONFIG_RTE_LIBRTE_RING) += librte_ring
> diff --git a/lib/librte_compat/Makefile b/lib/librte_compat/Makefile
> new file mode 100644
> index 000..3415c7b
> --- /dev/null
> +++ b/lib/librte_compat/Makefile
> @@ -0,0 +1,38 @@
> +#   BSD LICENSE
> +#
> +#   Copyright(c) 2010-2014 Neil Horman 
> +#   All rights reserved.
> +#
> +#   Redistribution and use in source and binary forms, with or without
> +#   modification, are permitted provided that the following conditions
> +#   are met:
> +#
> +# * Redistributions of source code must retain the above copyright
> +#   notice, this list of conditions and the following disclaimer.
> +# * Redistributions in binary form must reproduce the above copyright
> +#   notice, this list of conditions and the following disclaimer in
> +#   the documentation and/or other materials provided with the
> +#   distribution.
> +# * Neither the name of Intel Corporation nor the names of its
> +#   contributors may be used to endorse or promote products derived
> +#   from this software without specific prior written permission.
> +#
> +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> +#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> +#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> +#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> +#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> +
> +include $(RTE_SDK)/mk/rte.vars.mk
> +
> +
> +# install includes
> +SYMLINK-y-include := rte_compat.h
> +
> +include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/lib/librte_compat/rte_compat.h b/lib/librte_compat/rte_compat.h
> new file mode 100644
> index 000..cff9aea
> --- /dev/null
> +++ b/lib/librte_compat/rte_compat.h
> @@ -0,0 +1,87 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2010-2014 Neil Horman .
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + * * Redistributions of source code must retain the above copyright
> + *   notice, this list of conditions and the following disclaimer.
> + * * Redistributions in binary form must reproduce the above copyright
> + *   notice, this list of conditions and the following disclaimer in
> + *   the documentation and/or other materials provided with the
> + *   distribution.
> + * * Neither the name of Intel Corporation nor the names of its
> + *   contributors may be used to endorse or promote products derived
> + *   from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *

[dpdk-dev] [PATCH v2] distributor_app: new sample app

2014-09-26 Thread De Lara Guarch, Pablo

Hi,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of reshmapa
> Sent: Wednesday, September 24, 2014 3:17 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v2] distributor_app: new sample app
> 
> From: Reshma Pattan 
> 
> A new sample app that shows the usage of the distributor library. This
> app works as follows:
> 
> * An RX thread runs which pulls packets from each ethernet port in turn
>   and passes those packets to worker using a distributor component.
> * The workers take the packets in turn, and determine the output port
>   for those packets using basic l2forwarding doing an xor on the source
>   port id.
> * The RX thread takes the returned packets from the workers and enqueue
>   those packets into an rte_ring structure.
> * A TX thread pulls the packets off the rte_ring structure and then
>   sends each packet out the output port specified previously by the worker
> * Command-line option support provided only for portmask.
> 
> Signed-off-by: Bruce Richardson 
> Signed-off-by: Reshma Pattan 
> ---
>  examples/Makefile |   1 +
>  examples/distributor_app/Makefile |  57 
>  examples/distributor_app/main.c   | 585
> ++
>  examples/distributor_app/main.h   |  46 +++
>  4 files changed, 689 insertions(+)
>  create mode 100644 examples/distributor_app/Makefile
>  create mode 100644 examples/distributor_app/main.c
>  create mode 100644 examples/distributor_app/main.h
> 
> diff --git a/examples/Makefile b/examples/Makefile
> index 6245f83..2ba82b0 100644
> --- a/examples/Makefile
> +++ b/examples/Makefile
> @@ -66,5 +66,6 @@ DIRS-y += vhost
>  DIRS-$(CONFIG_RTE_LIBRTE_XEN_DOM0) += vhost_xen
>  DIRS-y += vmdq
>  DIRS-y += vmdq_dcb
> +DIRS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += distributor_app
> 
>  include $(RTE_SDK)/mk/rte.extsubdir.mk
> diff --git a/examples/distributor_app/Makefile
> b/examples/distributor_app/Makefile
> new file mode 100644
> index 000..394785d
> --- /dev/null
> +++ b/examples/distributor_app/Makefile
> @@ -0,0 +1,57 @@
> +#   BSD LICENSE
> +#
> +#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> +#   All rights reserved.
> +#
> +#   Redistribution and use in source and binary forms, with or without
> +#   modification, are permitted provided that the following conditions
> +#   are met:
> +#
> +# * Redistributions of source code must retain the above copyright
> +#   notice, this list of conditions and the following disclaimer.
> +# * Redistributions in binary form must reproduce the above copyright
> +#   notice, this list of conditions and the following disclaimer in
> +#   the documentation and/or other materials provided with the
> +#   distribution.
> +# * Neither the name of Intel Corporation nor the names of its
> +#   contributors may be used to endorse or promote products derived
> +#   from this software without specific prior written permission.
> +#
> +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
> CONTRIBUTORS
> +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT
> NOT
> +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
> FITNESS FOR
> +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
> COPYRIGHT
> +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
> INCIDENTAL,
> +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
> NOT
> +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
> OF USE,
> +#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
> AND ON ANY
> +#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
> TORT
> +#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
> THE USE
> +#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
> DAMAGE.
> +
> +ifeq ($(RTE_SDK),)
> +$(error "Please define RTE_SDK environment variable")
> +endif
> +
> +# Default target, can be overriden by command line or environment
> +RTE_TARGET ?= x86_64-default-linuxapp-gcc

This target is not present anymore. Change it to x86_64-native-linuxapp-gcc.

> +
> +include $(RTE_SDK)/mk/rte.vars.mk
> +
> +# binary name
> +APP = distributor_app
> +
> +# all source are stored in SRCS-y
> +SRCS-y := main.c
> +
> +CFLAGS += $(WERROR_FLAGS)
> +
> +# workaround for a gcc bug with noreturn attribute
> +# http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603
> +ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
> +CFLAGS_main.o += -Wno-return-type
> +endif
> +
> +EXTRA_CFLAGS += -O3 -Wfatal-errors
> +
> +include $(RTE_SDK)/mk/rte.extapp.mk
> diff --git a/examples/distributor_app/main.c
> b/examples/distributor_app/main.c
> new file mode 100644
> index 000..628810a
> --- /dev/null
> +++ b/examples/distributor_app/main.c
> @@ -0,0 +1,585 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary

[dpdk-dev] [PATCH 0/4] Add DSO symbol versioning to support backwards compatibility

2014-09-26 Thread Stephen Hemminger

On Fri, 26 Sep 2014 10:45:49 -0400
Neil Horman  wrote:

> On Fri, Sep 26, 2014 at 12:41:33PM +0200, Thomas Monjalon wrote:
> > Hi Neil,
> > 
> > 2014-09-24 14:19, Neil Horman:
> > > > Ping Thomas. I know you're busy, but I would like this to not fall off
> > > > anyone's radar.  You alluded to concerns regarding what, for lack of a
> > > > better term, ABI/API lockin.  I had asked you to enumerate/elaborate on
> > > > specifics, but never heard back.  Are there further specifics you wish
> > > > to discuss, or are you satisfied with the above answers?
> > 
> > Sorry for not being very reactive on this thread.
> > All this discussion is very interesting but it's really not the proper
> > time to apply it. As you said, it requires an extra effort. I'm not saying
> > it will never be integrated. I'm just saying that we cannot change
> > everything at the same time.
> > 
> > Let me sum up the situation. This community project has been very active
> > for few months now. First, we learnt how to make some releases together
> > and we are improving the process to be able to deliver a new major release
> > every 4 months while having some good quality process.
> > But these releases are still not complete because documentation is not
> > integrated yet. Then developers should have a role in documentation updates.
> > We also need to integrate and learn how to use more tools to be more
> > efficient and improve quality.
> > 
> > So the question is "when should we care about API compatibility"?
> > And the answer is: ASAP, but not now. I feel next year is a better target.
> > Because the most important priority is to move together at a pace which
> > allow most of us to stay in the race.
> > 
> 
> 
> I'm sorry Thomas, I don't accept this.  I asked you for details as to your
> concerns regarding this patch series, and you've provided more vague comments.
> I need details to address
> 
> You say it requires extra effort, you're right it does.  Any feature that you
> integreate requires some additional effort.  How is this patch any different
> from adding the acl library or any other new API?  Everything requires
> maintenence, thats how software works.  What specfically about this patch 
> series
> makes the effort insurmountable to you?
> 
> You say you're improving your process.  Great, this patch aids in that process
> by ensuring backwards compatibility for a period of time.  Given that the API
> and ABI can still evolve within this framework, as I've described, how is this
> patch series not a significant step forward toward your goal of quality 
> process.
> 
> You say documentation isn't integrated.  So, what does getting documentation
> integrated have to do with this patch set, or any other?  I don't see you
> holding any other patches based on documentation.  Again, nothing in this 
> series
> prevents evolution of the API or ABI.  If you're hope is to wait until
> everything is perfect, then apply some control to the public facing API, and 
> get
> it all documented, none of thosse things will ever happen, I promise you.
> 
> You say you also need to learn to use more tools to be more efficient and
> improve quality.  Great!  Thats exactly what this is. If we mandate even a 
> short
> term commitment to ABI stability (1 single relese worth of time), we will
> quickly identify what API's change quickly and where we need to be cautious 
> with
> our API design.  If you just assume that developers will get better of their 
> own
> volition, it will never happen.
> 
> You say this should go in next year, but not now.  When exactly?  What event 
> do
> you forsee occuring in the next 12-18 months that will change everything such
> that we can start supporing an ABI for more than just a few weeks at the head 
> of
> the tree?  
> 
> To this end, I just did a quick search through the git history for dpdk to look
> at the histories of all the header files that are exposed via the makefile
> SYMLINK command (given that that provides a list of header files that
> applications can include, and embodies all the function symbols and data types
> applications have access to).
> 
> There are 179 total commits in that list.
> Of those, a bit of spot checking suggests that about 10-15% of them actually
> change ABI, and many of those came from Bruce's rework of the mbuf structure.
> That's about 17-20 instances over the last 2 years where an ABI update would have
> been needed.  That seems pretty reasonable to me.  Where exactly is your concern
> here?
> 
> Neil

Isn't ABI stability a distro responsibility, not a project responsibility?
I have lots more API/ABI changes; I've just been too busy trying to release a real
product using DPDK to upstream all the changes.
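
To make "changes ABI" concrete, here is a minimal sketch of the kind of header edit the thread is counting. The struct names are hypothetical and are not DPDK's mbuf; the point is only that inserting a member moves the offsets that already-compiled applications have baked in.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "version 1" of a structure exposed in a public header. */
struct pkt_v1 {
    uint32_t len;
    uint16_t port;
    void    *data;
};

/* Hypothetical "version 2": a member inserted ahead of existing ones. */
struct pkt_v2 {
    uint32_t len;
    uint64_t timestamp;   /* new member */
    uint16_t port;
    void    *data;
};

/*
 * Returns 1 when a binary built against v1 would read `data` from the
 * wrong place, or allocate the wrong size, when run against v2.
 */
int abi_changed(void)
{
    return offsetof(struct pkt_v1, data) != offsetof(struct pkt_v2, data) ||
           sizeof(struct pkt_v1) != sizeof(struct pkt_v2);
}
```

A source-level rebuild hides this kind of change, which is why it only shows up for applications linked against a shared library that is later upgraded.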

[dpdk-dev] DPDK doesn't work with iommu=pt

2014-09-26 Thread Choi, Sy Jong

Hi Shimamoto-san,

There are many sightings related to "DMAR:[fault reason 06] PTE Read access is not set":
https://www.mail-archive.com/kvm at vger.kernel.org/msg106573.html

This might be related to the IOMMU and kernel code.

Here is what we know:
1) Disabling VT-d in the BIOS removed the symptom.
2) Switching to another OS distribution also removed the symptom.
3) On different hardware we do not see the symptom. In my case, I switched from an
Engineering board to an EPSD board.

Regards,
Choi, Sy Jong
Platform Application Engineer


-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Hiroshi Shimamoto
Sent: Friday, September 26, 2014 5:14 PM
To: dev at dpdk.org
Cc: Hayato Momma
Subject: [dpdk-dev] DPDK doesn't work with iommu=pt

I encountered an issue where DPDK doesn't work with "iommu=pt intel_iommu=on"
on an HP ProLiant DL380p Gen8 server. I'm using the following environment:

  HW: ProLiant DL380p Gen8
  CPU: E5-2697 v2
  OS: RHEL7
  kernel: kernel-3.10.0-123 and the latest kernel 3.17-rc6+
  DPDK: v1.7.1-53-gce5abac
  NIC: 82599ES

When booting with "iommu=pt intel_iommu=on", I got the message below and no
packets are handled.

  [  120.809611] dmar: DRHD: handling fault status reg 2
  [  120.809635] dmar: DMAR:[DMA Read] Request device [21:00.0] fault addr aa01
  DMAR:[fault reason 02] Present bit in context entry is clear

How to reproduce;
just run testpmd
# ./testpmd -c 0xf -n 4 -- -i

Configuring Port 0 (socket 0)
PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x754eafc0 hw_ring=0x7420 dma_addr=0xaa00
PMD: ixgbe_dev_tx_queue_setup(): Using full-featured tx code path
PMD: ixgbe_dev_tx_queue_setup():  - txq_flags = 0 [IXGBE_SIMPLE_FLAGS=f01]
PMD: ixgbe_dev_tx_queue_setup():  - tx_rs_thresh = 32 [RTE_PMD_IXGBE_TX_MAX_BURST=32]
PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x754ea740 hw_ring=0x7421 dma_addr=0xaa01
PMD: check_rx_burst_bulk_alloc_preconditions(): Rx Burst Bulk Alloc Preconditions: rxq->rx_free_thresh=0, RTE_PMD_IXGBE_RX_MAX_BURST=32
PMD: ixgbe_dev_rx_queue_setup(): Rx Burst Bulk Alloc Preconditions are not satisfied, Scattered Rx is requested, or RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC is not enabled (port=0, queue=0).
PMD: check_rx_burst_bulk_alloc_preconditions(): Rx Burst Bulk Alloc Preconditions: rxq->rx_free_thresh=0, RTE_PMD_IXGBE_RX_MAX_BURST=32

testpmd> start
  io packet forwarding - CRC stripping disabled - packets/burst=32
  nb forwarding cores=1 - nb forwarding ports=2
  RX queues=1 - RX desc=128 - RX free threshold=0
  RX threshold registers: pthresh=8 hthresh=8 wthresh=0
  TX queues=1 - TX desc=512 - TX free threshold=0
  TX threshold registers: pthresh=32 hthresh=0 wthresh=0
  TX RS bit threshold=0 - TXQ flags=0x0


Then ping from another box to this server.
# ping6 -I eth2 ff02::1

I got the error message below and no packets were received.
I couldn't see any increase in the RX/TX counts in the testpmd statistics.

testpmd> show port stats 0

   NIC statistics for port 0  
  RX-packets: 6  RX-missed: 0  RX-bytes:  732
  RX-badcrc:  0  RX-badlen: 0  RX-errors: 0
  RX-nombuf:  0
  TX-packets: 0  TX-errors: 0  TX-bytes:  0
  
testpmd> show port stats 0

   NIC statistics for port 0  
  RX-packets: 6  RX-missed: 0  RX-bytes:  732
  RX-badcrc:  0  RX-badlen: 0  RX-errors: 0
  RX-nombuf:  0
  TX-packets: 0  TX-errors: 0  TX-bytes:  0
  


The fault addr in the error message must be the RX DMA descriptor address.

error message
  [  120.809635] dmar: DMAR:[DMA Read] Request device [21:00.0] fault addr aa01

log in testpmd
  PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x754ea740 hw_ring=0x7421 dma_addr=0xaa01

I think the NIC received a packet in its FIFO and tried to put it into memory with DMA.
Before starting the DMA, the NIC gets the target address from the RX descriptors
at the address held in the RDBA register.
But the access to the RX descriptors failed in the IOMMU unit, which reported it to the kernel.

  DMAR:[fault reason 02] Present bit in context entry is clear

The error message suggests there is no valid entry in the IOMMU.
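
The matching done by eye above (fault address against the ring's dma_addr) can be sketched as a small range check. This is only an illustration: struct dma_region and find_region() are hypothetical helpers, not part of DPDK or the kernel, and the addresses in the usage note are made up rather than taken from the logs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one DMA region handed to the NIC. */
struct dma_region {
    const char *name;
    uint64_t    base;   /* bus address programmed into the device */
    uint64_t    len;    /* size in bytes */
};

/* Return the region that contains fault_addr, or NULL if none does. */
const struct dma_region *
find_region(const struct dma_region *r, size_t n, uint64_t fault_addr)
{
    size_t i;

    for (i = 0; i < n; i++) {
        /* Subtract first so that base + len cannot overflow. */
        if (fault_addr >= r[i].base && fault_addr - r[i].base < r[i].len)
            return &r[i];
    }
    return NULL;
}
```

Feeding it the dma_addr and ring size printed by each queue setup call, plus the fault addr from the DMAR line, tells you whether the faulting access was a descriptor fetch or a packet buffer access.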

I think the following issue is very similar, but using Ubuntu 14.04 didn't fix it
in my case.
http://thread.gmane.org/gmane.comp.networking.dpdk.devel/2281

I tried Ubuntu 14.04.1 and got the error below.

  [  199.710191] dmar: DRHD: handling fault status reg 2
  [  199.710896] dmar: DMAR:[DMA Read] Request device [21:00.0] fault addr 7c24df000
  [  199.710896] DMAR:[fault reason 06] PTE Read access is not set

Currently I can see this issue only on the HP ProLiant DL380p Gen8.
Any ideas?
Has anyone else noticed this issue?

Note: we're thinking of using SR-IOV and a DPDK app in the same box.
The box has 2 NICs, one for SR-IOV and pass

[dpdk-dev] [PATCH] ixgbe: allow unsupported SFP

2014-09-26 Thread Thomas Monjalon

There is no need to restrict the use of non-Intel SFP modules.
If (hw->phy.type == ixgbe_phy_sfp_intel) is false,
a warning will be logged.
The option was disabled for ixgbe, and enabled but unused for i40e.

Signed-off-by: Thomas Monjalon 
---
 config/common_bsdapp| 2 --
 config/common_linuxapp  | 2 --
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 2 --
 3 files changed, 6 deletions(-)
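
For readers unfamiliar with the flag, a rough sketch of the behaviour the commit message describes: with allow_unsupported_sfp set, a non-Intel module produces a warning instead of a failure. The types and the check_sfp() function are stand-ins invented for this note; the real check lives in the ixgbe base driver and is not reproduced here.

```c
#include <stdio.h>

/* Stand-in PHY types; the real driver has many more. */
enum sk_phy_type { SK_PHY_SFP_INTEL, SK_PHY_SFP_OTHER };

struct sk_hw {
    enum sk_phy_type phy_type;
    int allow_unsupported_sfp;
};

/* Returns 0 if the module is accepted, -1 if it is rejected. */
int check_sfp(const struct sk_hw *hw)
{
    if (hw->phy_type == SK_PHY_SFP_INTEL)
        return 0;

    if (hw->allow_unsupported_sfp) {
        /* Accepted, but the user is told the module is not validated. */
        fprintf(stderr, "warning: unsupported SFP module detected\n");
        return 0;
    }
    return -1;
}
```

With the #ifdef removed, the flag is always 1, so only the first two branches remain reachable.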

diff --git a/config/common_bsdapp b/config/common_bsdapp
index 645949f..eebd05b 100644
--- a/config/common_bsdapp
+++ b/config/common_bsdapp
@@ -167,7 +167,6 @@ CONFIG_RTE_LIBRTE_IXGBE_DEBUG_TX_FREE=n
 CONFIG_RTE_LIBRTE_IXGBE_DEBUG_DRIVER=n
 CONFIG_RTE_LIBRTE_IXGBE_PF_DISABLE_STRIP_CRC=n
 CONFIG_RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC=y
-CONFIG_RTE_LIBRTE_IXGBE_ALLOW_UNSUPPORTED_SFP=n
 CONFIG_RTE_IXGBE_INC_VECTOR=n
 CONFIG_RTE_IXGBE_RX_OLFLAGS_DISABLE=n

@@ -182,7 +181,6 @@ CONFIG_RTE_LIBRTE_I40E_DEBUG_TX_FREE=n
 CONFIG_RTE_LIBRTE_I40E_DEBUG_DRIVER=n
 CONFIG_RTE_LIBRTE_I40E_PF_DISABLE_STRIP_CRC=y
 CONFIG_RTE_LIBRTE_I40E_RX_ALLOW_BULK_ALLOC=n
-CONFIG_RTE_LIBRTE_I40E_ALLOW_UNSUPPORTED_SFP=y
 CONFIG_RTE_LIBRTE_I40E_16BYTE_RX_DESC=n
 CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_VF=4
 # interval up to 8160 us, aligned to 2 (or default value)
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 5bee910..4713eb4 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -190,7 +190,6 @@ CONFIG_RTE_LIBRTE_IXGBE_DEBUG_TX_FREE=n
 CONFIG_RTE_LIBRTE_IXGBE_DEBUG_DRIVER=n
 CONFIG_RTE_LIBRTE_IXGBE_PF_DISABLE_STRIP_CRC=n
 CONFIG_RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC=y
-CONFIG_RTE_LIBRTE_IXGBE_ALLOW_UNSUPPORTED_SFP=n
 CONFIG_RTE_IXGBE_INC_VECTOR=y
 CONFIG_RTE_IXGBE_RX_OLFLAGS_ENABLE=y

@@ -205,7 +204,6 @@ CONFIG_RTE_LIBRTE_I40E_DEBUG_TX_FREE=n
 CONFIG_RTE_LIBRTE_I40E_DEBUG_DRIVER=n
 CONFIG_RTE_LIBRTE_I40E_PF_DISABLE_STRIP_CRC=n
 CONFIG_RTE_LIBRTE_I40E_RX_ALLOW_BULK_ALLOC=y
-CONFIG_RTE_LIBRTE_I40E_ALLOW_UNSUPPORTED_SFP=n
 CONFIG_RTE_LIBRTE_I40E_16BYTE_RX_DESC=n
 CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_VF=4
 # interval up to 8160 us, aligned to 2 (or default value)
diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
index f4b590b..a147d46 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
@@ -727,9 +727,7 @@ eth_ixgbe_dev_init(__attribute__((unused)) struct eth_driver *eth_drv,
hw->device_id = pci_dev->id.device_id;
hw->vendor_id = pci_dev->id.vendor_id;
hw->hw_addr = (void *)pci_dev->mem_resource[0].addr;
-#ifdef RTE_LIBRTE_IXGBE_ALLOW_UNSUPPORTED_SFP
hw->allow_unsupported_sfp = 1;
-#endif

/* Initialize the shared code (base driver) */
 #ifdef RTE_NIC_BYPASS
-- 
2.0.4

[dpdk-dev] [PATCH] examples: do not probe pci twice

2014-09-26 Thread Thomas Monjalon

Since commit a155d430119 ("support link bonding device initialization"),
rte_eal_pci_probe() is called in rte_eal_init().
So it no longer has to be called by the application.
It has been fixed for testpmd in commit 2950a769315,
and this patch removes it from the other applications.

Signed-off-by: Thomas Monjalon 
---
 app/test-pipeline/init.c   |  5 ---
 app/test/test_kni.c|  5 ---
 examples/dpdk_qat/main.c   |  3 --
 examples/exception_path/main.c |  5 ---
 examples/ip_fragmentation/main.c   |  3 --
 examples/ip_pipeline/init.c|  5 ---
 examples/ip_reassembly/main.c  |  3 --
 examples/ipv4_multicast/main.c |  3 --
 examples/kni/main.c|  5 ---
 examples/l2fwd-ivshmem/host/host.c |  3 --
 examples/l2fwd/main.c  |  3 --
 examples/l3fwd-acl/main.c  |  3 --
 examples/l3fwd-power/main.c|  3 --
 examples/l3fwd-vf/main.c   |  3 --
 examples/l3fwd/main.c  |  4 --
 examples/link_status_interrupt/main.c  |  3 --
 examples/load_balancer/init.c  |  4 --
 .../client_server_mp/mp_client/client.c|  3 --
 .../client_server_mp/mp_server/init.c  |  6 ---
 .../client_server_mp/shared/init_drivers.h | 49 --
 examples/multi_process/l2fwd_fork/main.c   |  3 --
 examples/multi_process/symmetric_mp/main.c |  6 +--
 examples/netmap_compat/bridge/bridge.c |  4 --
 examples/qos_meter/main.c  |  3 --
 examples/qos_sched/init.c  |  3 --
 examples/quota_watermark/qw/init.c |  7 
 examples/vhost/main.c  |  3 --
 examples/vhost_xen/main.c  |  3 --
 examples/vmdq/main.c   |  3 --
 29 files changed, 2 insertions(+), 154 deletions(-)
 delete mode 100644 examples/multi_process/client_server_mp/shared/init_drivers.h
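
The effect of the change can be sketched with stubs. The sketch_* functions below are stand-ins written for this note, not the EAL functions themselves; they only model the call structure described in the commit message, where init performs the probe and the application no longer does.

```c
static int probe_calls;   /* how many times the bus scan ran */

/* Stand-in for the PCI probe. */
static int sketch_pci_probe(void)
{
    probe_calls++;
    return 0;
}

/* Stand-in for EAL init, which per the commit message now probes PCI itself. */
int sketch_eal_init(void)
{
    return sketch_pci_probe();
}

/*
 * Application start-up after the patch: only init is called.
 * Returns the number of probes performed, or -1 on init failure.
 */
int app_start(void)
{
    if (sketch_eal_init() < 0)
        return -1;
    /* No explicit probe call here any more. */
    return probe_calls;
}
```

Before the patch the applications called the probe a second time at the marked spot, which is the duplicate work being removed.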

diff --git a/app/test-pipeline/init.c b/app/test-pipeline/init.c
index a4337d0..17b6d23 100644
--- a/app/test-pipeline/init.c
+++ b/app/test-pipeline/init.c
@@ -228,11 +228,6 @@ app_init_ports(void)
 {
uint32_t i;

-   /* Init driver */
-   RTE_LOG(INFO, USER1, "Initializing the PMD driver ...\n");
-   if (rte_eal_pci_probe() < 0)
-   rte_panic("Cannot probe PCI\n");
-
/* Init NIC ports, then start the ports */
for (i = 0; i < app.n_ports; i++) {
uint8_t port;
diff --git a/app/test/test_kni.c b/app/test/test_kni.c
index 2860bf3..1081131 100644
--- a/app/test/test_kni.c
+++ b/app/test/test_kni.c
@@ -508,11 +508,6 @@ test_kni(void)
printf("fail to create mempool for kni\n");
return -1;
}
-   ret = rte_eal_pci_probe();
-   if (ret < 0) {
-   printf("fail to probe PCI devices\n");
-   return -1;
-   }

nb_ports = rte_eth_dev_count();
if (nb_ports == 0) {
diff --git a/examples/dpdk_qat/main.c b/examples/dpdk_qat/main.c
index 1599a0a..c130ea3 100644
--- a/examples/dpdk_qat/main.c
+++ b/examples/dpdk_qat/main.c
@@ -696,9 +696,6 @@ MAIN(int argc, char **argv)
if (ret < 0)
return -1;

-   if (rte_eal_pci_probe() < 0)
-   rte_panic("Cannot probe PCI\n");
-
if (check_lcore_params() < 0)
rte_panic("check_lcore_params failed\n");

diff --git a/examples/exception_path/main.c b/examples/exception_path/main.c
index f286bf2..b485976 100644
--- a/examples/exception_path/main.c
+++ b/examples/exception_path/main.c
@@ -567,11 +567,6 @@ main(int argc, char** argv)
return -1;
}

-   /* Scan PCI bus for recognised devices */
-   ret = rte_eal_pci_probe();
-   if (ret < 0)
-   FATAL_ERROR("Could not probe PCI (%d)", ret);
-
/* Get number of ports found in scan */
nb_sys_ports = rte_eth_dev_count();
if (nb_sys_ports == 0)
diff --git a/examples/ip_fragmentation/main.c b/examples/ip_fragmentation/main.c
index 6d309b5..75028ac 100644
--- a/examples/ip_fragmentation/main.c
+++ b/examples/ip_fragmentation/main.c
@@ -871,9 +871,6 @@ MAIN(int argc, char **argv)
if (ret < 0)
rte_exit(EXIT_FAILURE, "Invalid arguments");

-   if (rte_eal_pci_probe() < 0)
-   rte_panic("Cannot probe PCI\n");
-
nb_ports = rte_eth_dev_count();
if (nb_ports > RTE_MAX_ETHPORTS)
nb_ports = RTE_MAX_ETHPORTS;
diff --git a/examples/ip_pipeline/init.c b/examples/ip_pipeline/init.c
index e3ebd46..cb7568b 100644
--- a/examples/ip_pipeline/init.c
+++ b/examples/ip_pipeline/init.c
@@ -474,11 +474,6 @@ app_init_ports(void)
 {

[dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:

2014-09-26 Thread Ananyev, Konstantin

> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Neil Horman
> Sent: Friday, September 26, 2014 2:40 PM
> To: Wodkowski, PawelX
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:
> 
> On Fri, Sep 26, 2014 at 12:37:54PM +, Wodkowski, PawelX wrote:
> > > So basically cancel() just set ALARM_CANCELLED and leaves actual alarm
> > > deletion to the callback()?
> > > That was the thought, yes.
> > >
> > > > I think it is doable - but I don't see any real advantage with that approach.
> > > > Yes, code will become a bit simpler, as we'll have one point when we remove
> > > > alarm from the list.
> > > Yes, that would be the advantage, that the code would be much simpler.
> > >
> > > > But from other side, imagine such simple test-case:
> > > >
> > > > for (i = 0; i < 0x10; i++) {
> > > >rte_eal_alarm_set(ONE_MIN, cb_func, (void *)i);
> > > >rte_eal_alarm_cancel(cb_func, (void *)i);
> > > > }
> > > >
> > > > We'll end up with 1M of cancelled, but still not removed entries in the
> > > > alarm_list.
> > > > With current implementation that means - few MBs of wasted memory,
> > > That's correct, and the tradeoff to choose between.  Do you want simpler code
> > > that is easier to maintain, or do you want a high speed cancel and set
> > > operation?  I'm not aware of all the use cases, but I have a hard time seeing
> > > a use case in which the in-flight alarm list grows unboundedly large, which in
> > > my mind mitigates the risk of deferred removal, but I'm perfectly willing to
> > > believe that there are use cases which I'm not aware of.

After executing the example above, from the user's perspective there are no active
alarms in the system at all.
Yet in fact alarm_list contains 1M entries.
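
The growth is easy to reproduce in a standalone sketch of the deferred-removal idea. Everything below is a simplified model written for this note (the sk_* names are hypothetical, there is no locking and no timer); it is not the rte_alarm implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef void (*sk_cb_t)(void *);

struct sk_alarm {
    struct sk_alarm *next;
    sk_cb_t cb;
    void *arg;
    int cancelled;          /* set by cancel; the entry stays linked */
};

static struct sk_alarm *sk_alarm_list;

int sk_alarm_set(sk_cb_t cb, void *arg)
{
    struct sk_alarm *e = calloc(1, sizeof(*e));

    if (e == NULL)
        return -1;
    e->cb = cb;
    e->arg = arg;
    e->next = sk_alarm_list;
    sk_alarm_list = e;
    return 0;
}

/* Deferred variant: flag matching entries, leave freeing to the callback path. */
int sk_alarm_cancel(sk_cb_t cb, void *arg)
{
    struct sk_alarm *e;
    int n = 0;

    for (e = sk_alarm_list; e != NULL; e = e->next) {
        if (e->cb == cb && e->arg == arg && !e->cancelled) {
            e->cancelled = 1;
            n++;
        }
    }
    return n;
}

size_t sk_alarm_list_len(void)
{
    const struct sk_alarm *e;
    size_t n = 0;

    for (e = sk_alarm_list; e != NULL; e = e->next)
        n++;
    return n;
}

void sk_noop_cb(void *arg)
{
    (void)arg;
}

/* Repeat set + cancel and report how many entries are still linked. */
size_t sk_set_cancel_loop(size_t iterations)
{
    size_t i;

    for (i = 0; i < iterations; i++) {
        void *arg = (void *)(uintptr_t)i;

        if (sk_alarm_set(sk_noop_cb, arg) != 0)
            break;
        sk_alarm_cancel(sk_noop_cb, arg);
    }
    return sk_alarm_list_len();
}
```

Every cancel also walks all earlier cancelled entries, so the loop is quadratic in the number of iterations, which is the "very slow set() and cancel()" concern raised in the quoted text.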

> > >
> > > > plus very slow set() and cancel(), as they'll have to traverse all entries in the
> > > > list.
> > > > And all that for an alarm_list that is empty from the user's perspective.
> > > > So I still prefer Michal's way.
> > > > After all, it doesn't look that complicated to me.
> > > Except that the need for Michal's fix arose from the fact that we have two free
> > > locations that might both get called depending on the situation.  That's what I'm
> > > trying to address here, the complexity itself, rather than the fix (which I
> > > agree is perfectly valid).

Well, I believe his fix addresses all the open issues, no?

Another thing: as Pawel pointed out in another mail, your fix doesn't properly address
the situation where we have a racing alarm_cancel(cb_func) and an alarm cb_func
rearming itself.
The original patch does.

> > >
> > > > BTW, any particular reason you are so negative about pthread_self()?
> > > >
> > > Nothing specifically against it (save for its inverted return code sense, which
> > > made it difficult for me to parse when reviewing).  It's more the complexity
> > > itself in the alarm cancel and callback routine that I was looking at.  Given
> > > that the original bug happened because a cancel in a callback might produce a
> > > double free, I wanted to fix it by simplifying the code, not adding conditions
> > > which make it more complex.
> > >
> > > You know, looking at it, something else just occurred to me.  I think this could
> > > all be fixed without the cancel flag or the pthread_self assignments.  What if
> > > we simply removed the alarm from the list before we called the callback in
> > > rte_eal_alarm_callback()?  That way any cancel operation called from within the
> > > callback would fail, as it wouldn't appear on the list, and the callback
> > > operation would be the only freeing entity?  That would let you still have a
> > > fast set and cancel, and avoid the race.  Thoughts?  Untested sample patch
> > > below


Hmm, and how would it address either of the problems I mentioned above:

1. The alarm_list would still grow with repetition of: {alarm_set(x); alarm_cancel(x);}
2. Race between alarm_cancel(cb_func) and alarm cb_func rearming itself.

?
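
Point 2 can be replayed deterministically in one thread. The sketch below models the "unlink before calling the callback" proposal with hypothetical rm_* helpers (no locks, no timers) and then lets a cancel land between the unlink and the callback: the cancel reports nothing removed, yet the alarm is armed again afterwards.

```c
#include <stddef.h>
#include <stdlib.h>

typedef void (*rm_cb_t)(void *);

struct rm_alarm {
    struct rm_alarm *next;
    rm_cb_t cb;
    void *arg;
};

static struct rm_alarm *rm_list;

static int rm_set(rm_cb_t cb, void *arg)
{
    struct rm_alarm *e = calloc(1, sizeof(*e));

    if (e == NULL)
        return -1;
    e->cb = cb;
    e->arg = arg;
    e->next = rm_list;
    rm_list = e;
    return 0;
}

/* Unlink and free every entry matching (cb, arg); return how many. */
static int rm_cancel(rm_cb_t cb, void *arg)
{
    struct rm_alarm **pp = &rm_list;
    int n = 0;

    while (*pp != NULL) {
        struct rm_alarm *e = *pp;

        if (e->cb == cb && e->arg == arg) {
            *pp = e->next;
            free(e);
            n++;
        } else {
            pp = &e->next;
        }
    }
    return n;
}

static size_t rm_len(void)
{
    const struct rm_alarm *e;
    size_t n = 0;

    for (e = rm_list; e != NULL; e = e->next)
        n++;
    return n;
}

/* A callback that re-arms itself, as in the racing case. */
static void rm_rearming_cb(void *arg)
{
    (void)rm_set(rm_rearming_cb, arg);
}

/*
 * Replays one interleaving: dispatcher unlinks the entry, a cancel from
 * "another thread" runs, then the callback executes and re-arms.
 * Returns the number of alarms still armed; *cancelled gets cancel's result.
 */
int rm_replay_race(int *cancelled)
{
    struct rm_alarm *e;

    if (rm_set(rm_rearming_cb, NULL) != 0)
        return -1;

    e = rm_list;                /* dispatcher: unlink first */
    rm_list = e->next;

    *cancelled = rm_cancel(rm_rearming_cb, NULL);   /* finds nothing */

    e->cb(e->arg);              /* callback re-arms the alarm */
    free(e);

    return (int)rm_len();
}
```

The caller of cancel has no way to learn that the callback was in flight, so it cannot know the alarm will come back.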

> > >
> > >
> > > > >
> > > > > It also seems like the alarm api as a whole could use some improvement.
> > > > > The way it's written right now, there's no way to refer to a specific alarm (i.e.
> > > > > cancellation relies on the specification of a function and data pointer, which
> > > > > may refer to multiple timers).  Shouldn't rte_eal_alarm_set return an opaque
> > > > > handle to a unique timer instance that can be stored by a caller and used to
> > > > > specifically cancel that timer?  That's how both the bsd and linux timer
> > > > > subsystems model timers.
> > > >
> > > > Yeh,  alarm API looks a bit unusual.
> > > > Though, I suppose that's subject for another patch/discussion :)
> > > >
> > > Yes, agreed :)
> > >
> >
> > Please read the quoted message below:
> >
> > > >
> > >
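
On the side question raised in the quoted text, whether set should return an opaque handle: a minimal sketch of what such an interface could look like follows. This is purely hypothetical (the h_* names and signatures are invented here and no such DPDK API is implied); it only shows that a handle identifies one timer even when several share the same callback and argument.

```c
#include <stddef.h>
#include <stdlib.h>

typedef void (*h_cb_t)(void *);

/* Would be opaque to callers in a real API; shown in full for the sketch. */
struct h_alarm {
    struct h_alarm *next;
    h_cb_t cb;
    void *arg;
};

static struct h_alarm *h_list;

/* Arm an alarm and return a handle that names exactly this instance. */
struct h_alarm *h_alarm_set(h_cb_t cb, void *arg)
{
    struct h_alarm *e = calloc(1, sizeof(*e));

    if (e == NULL)
        return NULL;
    e->cb = cb;
    e->arg = arg;
    e->next = h_list;
    h_list = e;
    return e;
}

/* Cancel one specific alarm. Returns 0 on success, -1 if not found. */
int h_alarm_cancel(struct h_alarm *h)
{
    struct h_alarm **pp;

    for (pp = &h_list; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == h) {
            *pp = h->next;
            free(h);
            return 0;
        }
    }
    return -1;
}

size_t h_alarm_count(void)
{
    const struct h_alarm *e;
    size_t n = 0;

    for (e = h_list; e != NULL; e = e->next)
        n++;
    return n;
}

void h_noop_cb(void *arg)
{
    (void)arg;
}
```

Two alarms armed with the same (cb, arg) pair get distinct handles, so cancelling one leaves the other armed, which the function-plus-pointer lookup cannot express.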

[dpdk-dev] [PATCH v3 20/20] app/test-pmd: add test command to configure flexible masks

2014-09-26 Thread Jingjing Wu

add test command to configure flexible masks for each flow type

Signed-off-by: Jingjing Wu 
Acked-by: Chen Jing D(Mark) 
Acked-by: Helin Zhang 
---
 app/test-pmd/cmdline.c | 173 +
 1 file changed, 173 insertions(+)
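
The "(offset,mask)" list format handled by parse_word_masks_cfg() in this patch can be tried outside testpmd with a small standalone parser. The sketch below is an assumption-laden rewrite, not the patch's code: it uses strtoul() directly instead of rte_strsplit(), stores the mask as given rather than inverted, and checks the field limit before writing. All sk_* names are hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SK_MAX_FIELDS 2   /* same limit the patch enforces */

struct sk_mask {
    uint16_t offset;
    uint16_t bitmask;
};

/*
 * Parse "(off,mask),(off,mask)" into out[], which must have room for
 * SK_MAX_FIELDS entries. Returns the number of pairs, or -1 on error.
 */
int sk_parse_masks(const char *s, struct sk_mask *out)
{
    int n = 0;
    const char *p = strchr(s, '(');

    while (p != NULL) {
        char *end;
        unsigned long off, mask;

        if (n == SK_MAX_FIELDS)
            return -1;              /* reject before writing out[n] */

        p++;                        /* skip '(' */
        errno = 0;
        off = strtoul(p, &end, 0);
        if (errno != 0 || end == p || *end != ',' || off > UINT16_MAX)
            return -1;

        p = end + 1;                /* skip ',' */
        errno = 0;
        mask = strtoul(p, &end, 0);
        if (errno != 0 || end == p || *end != ')' || mask > UINT16_MAX)
            return -1;

        out[n].offset = (uint16_t)off;
        out[n].bitmask = (uint16_t)mask;
        n++;
        p = strchr(end, '(');       /* next pair, if any */
    }
    return n;
}
```

For example, "(0,0xff00),(1,0x00ff)" yields two entries, while a third pair or a malformed separator is rejected.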

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index da77752..073b929 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -690,6 +690,11 @@ static void cmd_help_long_parsed(void *parsed_result,
"flow_director_flex_payload (port_id)"
" (l2|l3|l4) (config)\n"
"configure flexible payload selection.\n\n"
+
+   "flow_director_flex_mask (port_id)"
+   " flow (ether|ip4|tcp4|udp4|sctp4|ip6|tcp6|udp6|sctp6)"
+   " words_mask (words) (word_mask_list)\n"
+   "configure masks of flexible payload.\n\n"
);
}
 }
@@ -8046,6 +8051,173 @@ cmdline_parse_inst_t cmd_set_flow_director_flex_payload = {
},
 };

+/* *** deal with flow director mask on flexible payload *** */
+struct cmd_flow_director_flex_mask_result {
+   cmdline_fixed_string_t flow_director_flexmask;
+   uint8_t port_id;
+   cmdline_fixed_string_t flow;
+   cmdline_fixed_string_t flow_type;
+   cmdline_fixed_string_t words_mask;
+   uint8_t words;
+   cmdline_fixed_string_t word_mask_list;
+};
+
+static inline int
+parse_word_masks_cfg(const char *q_arg,
+struct rte_eth_fdir_flex_masks *masks)
+{
+   char s[256];
+   const char *p, *p0 = q_arg;
+   char *end;
+   enum fieldnames {
+   FLD_OFFSET = 0,
+   FLD_MASK,
+   _NUM_FLD
+   };
+   unsigned long int_fld[_NUM_FLD];
+   char *str_fld[_NUM_FLD];
+   int i;
+   unsigned size;
+
+   masks->nb_field = 0;
+   p = strchr(p0, '(');
+   while (p != NULL) {
+   ++p;
+   p0 = strchr(p, ')');
+   if (p0 == NULL)
+   return -1;
+
+   size = p0 - p;
+   if (size >= sizeof(s))
+   return -1;
+
+   snprintf(s, sizeof(s), "%.*s", size, p);
+   if (rte_strsplit(s, sizeof(s), str_fld, _NUM_FLD, ',') != _NUM_FLD)
+   return -1;
+   for (i = 0; i < _NUM_FLD; i++) {
+   errno = 0;
+   int_fld[i] = strtoul(str_fld[i], &end, 0);
+   if (errno != 0 || end == str_fld[i] || int_fld[i] > UINT16_MAX)
+   return -1;
+   }
+   masks->field[masks->nb_field].offset =
+   (uint16_t)int_fld[FLD_OFFSET];
+   masks->field[masks->nb_field].bitmask =
+   ~(uint16_t)int_fld[FLD_MASK];
+   masks->nb_field++;
+   if (masks->nb_field > 2) {
+   printf("exceeded max number of fields: %hu\n",
+   masks->nb_field);
+   return -1;
+   }
+   p = strchr(p0, '(');
+   }
+   return 0;
+}
+
+static void
+cmd_flow_director_flex_mask_parsed(void *parsed_result,
+ __attribute__((unused)) struct cmdline *cl,
+ __attribute__((unused)) void *data)
+{
+   struct cmd_flow_director_flex_mask_result *res = parsed_result;
+   struct rte_eth_fdir_flex_masks *flex_masks;
+   struct rte_eth_fdir_cfg fdir_cfg;
+   int ret = 0;
+   int cfg_size = 2 * sizeof(struct rte_eth_flex_mask) +
+ offsetof(struct rte_eth_fdir_flex_masks, field);
+
+   ret = rte_eth_dev_filter_supported(res->port_id, RTE_ETH_FILTER_FDIR);
+   if (ret < 0) {
+   printf("flow director is not supported on port %u.\n",
+   res->port_id);
+   return;
+   }
+
+   memset(&fdir_cfg, 0, sizeof(struct rte_eth_fdir_cfg));
+
+   flex_masks = (struct rte_eth_fdir_flex_masks *)rte_zmalloc_socket("CLI",
+   cfg_size, CACHE_LINE_SIZE, rte_socket_id());
+
+   if (flex_masks == NULL) {
+   printf("fail to malloc memory to configure flexi masks\n");
+   return;
+   }
+
+   if (!strcmp(res->flow_type, "ip4"))
+   flex_masks->flow_type = RTE_ETH_FLOW_TYPE_IPV4_OTHER;
+   else if (!strcmp(res->flow_type, "udp4"))
+   flex_masks->flow_type = RTE_ETH_FLOW_TYPE_UDPV4;
+   else if (!strcmp(res->flow_type, "tcp4"))
+   flex_masks->flow_type = RTE_ETH_FLOW_TYPE_TCPV4;
+   else if (!strcmp(res->flow_type, "sctp4"))
+   flex_masks->flow_type = RTE_ETH_FLOW_TYPE_SCTPV4;
+   else if (!strcmp(res->flow_type, "ip6"))
+   flex_masks->flow_type = RTE_ETH_FLOW_TYPE_IPV6_OTHER;
+   else if (!strcmp(res->flow_type, "udp6"))
+

[dpdk-dev] [PATCH v3 19/20] i40e: implement operations to configure flexible masks

2014-09-26 Thread Jingjing Wu

implement the operation to configure flexible masks for each flow type in the i40e PMD

Signed-off-by: Jingjing Wu 
Acked-by: Chen Jing D(Mark) 
Acked-by: Helin Zhang 
---
 lib/librte_pmd_i40e/i40e_fdir.c | 121 
 1 file changed, 121 insertions(+)
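
The register writes in i40e_set_flex_mask_on_pctype() follow the usual shift-then-mask packing pattern. A minimal standalone sketch of that pattern is below; the SK_* field positions and widths are assumptions chosen for illustration (the real values come from the i40e register definitions), and the I40E_FLX_OFFSET_IN_FIELD_VECTOR bias applied in the patch is left out.

```c
#include <stdint.h>

/* Assumed field layout, for illustration only. */
#define SK_MSK_MASK_SHIFT    0
#define SK_MSK_MASK_MASK     (0xFFFFu << SK_MSK_MASK_SHIFT)
#define SK_MSK_OFFSET_SHIFT  16
#define SK_MSK_OFFSET_MASK   (0x3Fu << SK_MSK_OFFSET_SHIFT)

/* Pack a 16-bit word mask and a word offset into one 32-bit register value. */
uint32_t sk_pack_mask_reg(uint16_t bitmask, uint8_t word_offset)
{
    uint32_t v;

    v  = ((uint32_t)bitmask << SK_MSK_MASK_SHIFT) & SK_MSK_MASK_MASK;
    v |= ((uint32_t)word_offset << SK_MSK_OFFSET_SHIFT) & SK_MSK_OFFSET_MASK;
    return v;
}
```

The trailing AND with the field mask is what silently drops offset bits that do not fit the field, so an out-of-range offset is truncated rather than rejected.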

diff --git a/lib/librte_pmd_i40e/i40e_fdir.c b/lib/librte_pmd_i40e/i40e_fdir.c
index 01693a2..ddb436e 100644
--- a/lib/librte_pmd_i40e/i40e_fdir.c
+++ b/lib/librte_pmd_i40e/i40e_fdir.c
@@ -85,6 +85,8 @@
 static int i40e_fdir_rx_queue_init(struct i40e_rx_queue *rxq);
 static int i40e_set_flx_pld_cfg(struct i40e_pf *pf,
 struct rte_eth_flex_payload_cfg *cfg);
+static int i40e_set_fdir_flx_mask(struct i40e_pf *pf,
+   struct rte_eth_fdir_flex_masks *flex_masks);
 static int i40e_fdir_construct_pkt(struct i40e_pf *pf,
 struct rte_eth_fdir_input *fdir_input,
 unsigned char *raw_pkt);
@@ -419,6 +421,122 @@ i40e_set_flx_pld_cfg(struct i40e_pf *pf,

return 0;
 }
+
+static inline void
+i40e_set_flex_mask_on_pctype(
+   struct i40e_hw *hw,
+   enum i40e_filter_pctype pctype,
+   struct rte_eth_fdir_flex_masks *flex_masks)
+{
+   uint32_t flxinset, mask;
+   int i;
+
+   flxinset = (flex_masks->words_mask <<
+   I40E_PRTQF_FD_FLXINSET_INSET_SHIFT) &
+   I40E_PRTQF_FD_FLXINSET_INSET_MASK;
+   I40E_WRITE_REG(hw, I40E_PRTQF_FD_FLXINSET(pctype), flxinset);
+
+   for (i = 0; i < flex_masks->nb_field; i++) {
+   mask = (flex_masks->field[i].bitmask <<
+   I40E_PRTQF_FD_MSK_MASK_SHIFT) &
+   I40E_PRTQF_FD_MSK_MASK_MASK;
+   mask |= ((flex_masks->field[i].offset +
+   I40E_FLX_OFFSET_IN_FIELD_VECTOR) <<
+   I40E_PRTQF_FD_MSK_OFFSET_SHIFT) &
+   I40E_PRTQF_FD_MSK_OFFSET_MASK;
+   I40E_WRITE_REG(hw, I40E_PRTQF_FD_MSK(pctype, i), mask);
+   }
+}
+
+/*
+ * i40e_set_fdir_flx_mask - configure the mask on flexible payload
+ */
+static int
+i40e_set_fdir_flx_mask(struct i40e_pf *pf,
+   struct rte_eth_fdir_flex_masks *flex_masks)
+{
+   struct i40e_hw *hw = I40E_PF_TO_HW(pf);
+   struct rte_eth_fdir_info fdir;
+   int ret = 0;
+
+   if (flex_masks == NULL)
+   return -EINVAL;
+
+   if (flex_masks->nb_field > 2) {
+   PMD_DRV_LOG(ERR, "bit masks cannot support more than 2 words.");
+   return -EINVAL;
+   }
+   /*
+* flexible payload masks need to be configured before
+* flow director filters are added
+* If filters exist, flush them.
+*/
+   memset(&fdir, 0, sizeof(fdir));
+   i40e_fdir_info_get(pf, &fdir);
+   if (fdir.info_ext.best_cnt + fdir.info_ext.guarant_cnt > 0) {
+   ret = i40e_fdir_flush(pf);
+   if (ret) {
+   PMD_DRV_LOG(ERR, " failed to flush fdir table.");
+   return ret;
+   }
+   }
+
+   switch (flex_masks->flow_type) {
+   case RTE_ETH_FLOW_TYPE_UDPV4:
+   i40e_set_flex_mask_on_pctype(hw,
+   I40E_FILTER_PCTYPE_NONF_IPV4_UDP,
+   flex_masks);
+   break;
+   case RTE_ETH_FLOW_TYPE_TCPV4:
+   i40e_set_flex_mask_on_pctype(hw,
+   I40E_FILTER_PCTYPE_NONF_IPV4_TCP,
+   flex_masks);
+   break;
+   case RTE_ETH_FLOW_TYPE_SCTPV4:
+   i40e_set_flex_mask_on_pctype(hw,
+   I40E_FILTER_PCTYPE_NONF_IPV4_SCTP,
+   flex_masks);
+   break;
+   case RTE_ETH_FLOW_TYPE_IPV4_OTHER:
+   /* set mask for both NONF_IPV4 and FRAG_IPV4 PCTYPE*/
+   i40e_set_flex_mask_on_pctype(hw,
+   I40E_FILTER_PCTYPE_NONF_IPV4_OTHER,
+   flex_masks);
+   i40e_set_flex_mask_on_pctype(hw,
+   I40E_FILTER_PCTYPE_FRAG_IPV4,
+   flex_masks);
+   break;
+   case RTE_ETH_FLOW_TYPE_UDPV6:
+   i40e_set_flex_mask_on_pctype(hw,
+   I40E_FILTER_PCTYPE_NONF_IPV6_UDP,
+   flex_masks);
+   break;
+   case RTE_ETH_FLOW_TYPE_TCPV6:
+   i40e_set_flex_mask_on_pctype(hw,
+   I40E_FILTER_PCTYPE_NONF_IPV6_TCP,
+   flex_masks);
+   break;
+   case RTE_ETH_FLOW_TYPE_SCTPV6:
+   i40e_set_flex_mask_on_pctype(hw,
+   I40E_FILTER_PCTYPE_NONF_IPV6_SCTP,
+   flex_masks);
+   break;
+   case RTE_ETH_FLOW_TYPE_IPV6_OTHER:
+   /* set mask for both NONF_IPV6 and FRAG_IPV6 PCTYPE*/
+   i40e_set_flex_mask_on_pctype(hw,
+

[dpdk-dev] [PATCH v3 18/20] lib/librte_ether: define structures for configuring flex masks

2014-09-26 Thread Jingjing Wu

define structures for configuring flexible masks

Signed-off-by: Jingjing Wu 
Acked-by: Chen Jing D(Mark) 
Acked-by: Helin Zhang 
---
 lib/librte_ether/rte_eth_ctrl.h | 24 
 1 file changed, 24 insertions(+)
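
Because struct rte_eth_fdir_flex_masks ends in a variable-length field array, callers size the allocation as header plus n entries, as the testpmd patch in this series does with offsetof(). A minimal sketch of that sizing follows, using hypothetical sk_* mirrors of the structures and a C99 flexible array member (field[]) where the patch uses the GNU zero-length form (field[0]).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct sk_flex_mask {
    uint8_t  offset;
    uint16_t bitmask;
};

struct sk_flex_masks {
    int      flow_type;
    uint8_t  words_mask;
    uint8_t  nb_field;
    struct sk_flex_mask field[];   /* C99 flexible array member */
};

/* Bytes needed for a header followed by n field entries. */
size_t sk_flex_masks_size(unsigned int n)
{
    return offsetof(struct sk_flex_masks, field) +
           (size_t)n * sizeof(struct sk_flex_mask);
}

/* Zeroed allocation with room for n entries; nb_field starts at 0. */
struct sk_flex_masks *sk_flex_masks_alloc(unsigned int n)
{
    return calloc(1, sk_flex_masks_size(n));
}
```

Using offsetof() rather than sizeof() for the header avoids counting trailing padding twice.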

diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h
index e412471..13861c8 100644
--- a/lib/librte_ether/rte_eth_ctrl.h
+++ b/lib/librte_ether/rte_eth_ctrl.h
@@ -119,7 +119,28 @@ struct rte_eth_flex_payload_cfg {
struct rte_eth_field_vector field[0];
 };

+/**
+ * A structure defined to specify each word's bit mask
+ */
+struct rte_eth_flex_mask {
+   uint8_t offset;  /**< word offset of word in flexible payload */
+   uint16_t bitmask;/**< bit mask for word defined by offset */
+};
+
+/**
+ * A structure used to configure FDIR masks for flexible payload
+ * for each flow type
+ */
+struct rte_eth_fdir_flex_masks {
+   enum rte_eth_flow_type flow_type;  /**< flow type */
+   uint8_t words_mask;  /**< bit i enables word i of 8 words flexible payload */
+   uint8_t nb_field;   /**< the number of following fields */
+   struct rte_eth_flex_mask field[0];
+};
+
 #define RTE_ETH_FDIR_CFG_FLX  0x0001
+#define RTE_ETH_FDIR_CFG_MASK 0x0002
+#define RTE_ETH_FDIR_CFG_FLX_MASK 0x0003
 /**
  * A structure used to config FDIR filter global set
  * to support RTE_ETH_FILTER_FDIR with RTE_ETH_FILTER_OP_SET operation.
@@ -129,9 +150,12 @@ struct rte_eth_fdir_cfg {
/**
 * A pointer to structure for the configuration e.g.
 * struct rte_eth_flex_payload_cfg for FDIR_CFG_FLX
+* struct rte_fdir_masks mask for FDIR_MASK
+* struct rte_eth_fdir_flex_masks for FDIR_FLX_MASK
*/
void *cfg;
 };
+
 /**
  * A structure used to define the input for IPV4 UDP flow
  */
-- 
1.8.1.4

[dpdk-dev] [PATCH v3 17/20] app/test-pmd: add test command to configure flexible payload

2014-09-26 Thread Jingjing Wu

add test command to configure flexible payload

Signed-off-by: Jingjing Wu 
Acked-by: Chen Jing D(Mark) 
Acked-by: Helin Zhang 
---
 app/test-pmd/cmdline.c | 144 +
 1 file changed, 144 insertions(+)
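
The l2/l3/l4 keyword handling in this patch is an if/else chain over strcmp(). One alternative, sketched here with hypothetical sk_* names, is a small lookup table that also makes the "no match" result explicit instead of relying on the zeroed allocation.

```c
#include <stddef.h>
#include <string.h>

enum sk_payload_type {
    SK_PAYLOAD_UNKNOWN = 0,
    SK_L2_PAYLOAD,
    SK_L3_PAYLOAD,
    SK_L4_PAYLOAD
};

/* Map a command-line keyword to a payload type; unknown keywords map to 0. */
enum sk_payload_type sk_payload_from_str(const char *s)
{
    static const struct {
        const char *name;
        enum sk_payload_type type;
    } map[] = {
        { "l2", SK_L2_PAYLOAD },
        { "l3", SK_L3_PAYLOAD },
        { "l4", SK_L4_PAYLOAD },
    };
    size_t i;

    for (i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
        if (strcmp(s, map[i].name) == 0)
            return map[i].type;
    }
    return SK_PAYLOAD_UNKNOWN;
}
```

Adding a new layer keyword then means adding one table row rather than another else-if branch.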

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 240a464..da77752 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -58,6 +58,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -685,6 +686,10 @@ static void cmd_help_long_parsed(void *parsed_result,

"flush_flow_diretor (port_id)\n"
"Flush all flow director entries of a device.\n\n"
+
+   "flow_director_flex_payload (port_id)"
+   " (l2|l3|l4) (config)\n"
+   "configure flexible payload selection.\n\n"
);
}
 }
@@ -7903,6 +7908,144 @@ cmdline_parse_inst_t cmd_flush_flow_director = {
},
 };

+/* *** deal with flow director flexible payload configuration *** */
+struct cmd_flow_director_flexpayload_result {
+   cmdline_fixed_string_t flow_director_flexpayload;
+   uint8_t port_id;
+   cmdline_fixed_string_t payload_layer;
+   cmdline_fixed_string_t payload_cfg;
+};
+
+static inline int
+parse_flex_payload_cfg(const char *q_arg,
+struct rte_eth_flex_payload_cfg *cfg)
+{
+   char s[256];
+   const char *p, *p0 = q_arg;
+   char *end;
+   enum fieldnames {
+   FLD_OFFSET = 0,
+   FLD_SIZE,
+   _NUM_FLD
+   };
+   unsigned long int_fld[_NUM_FLD];
+   char *str_fld[_NUM_FLD];
+   int i;
+   unsigned size;
+
+   cfg->nb_field = 0;
+   p = strchr(p0, '(');
+   while (p != NULL) {
+   ++p;
+   p0 = strchr(p, ')');
+   if (p0 == NULL)
+   return -1;
+
+   size = p0 - p;
+   if (size >= sizeof(s))
+   return -1;
+
+   snprintf(s, sizeof(s), "%.*s", size, p);
+   if (rte_strsplit(s, sizeof(s), str_fld, _NUM_FLD, ',') != _NUM_FLD)
+   return -1;
+   for (i = 0; i < _NUM_FLD; i++) {
+   errno = 0;
+   int_fld[i] = strtoul(str_fld[i], &end, 0);
+   if (errno != 0 || end == str_fld[i] || int_fld[i] > 255)
+   return -1;
+   }
+   cfg->field[cfg->nb_field].offset = (uint8_t)int_fld[FLD_OFFSET];
+   cfg->field[cfg->nb_field].size = (uint8_t)int_fld[FLD_SIZE];
+   cfg->nb_field++;
+   if (cfg->nb_field > 3) {
+   printf("exceeded max number of fields\n");
+   return -1;
+   }
+   p = strchr(p0, '(');
+   }
+   return 0;
+}
+
+static void
+cmd_flow_director_flxpld_parsed(void *parsed_result,
+ __attribute__((unused)) struct cmdline *cl,
+ __attribute__((unused)) void *data)
+{
+   struct cmd_flow_director_flexpayload_result *res = parsed_result;
+   struct rte_eth_fdir_cfg fdir_cfg;
+   struct rte_eth_flex_payload_cfg *flxpld_cfg;
+   int ret = 0;
+   int cfg_size = 3 * sizeof(struct rte_eth_field_vector) +
+ offsetof(struct rte_eth_flex_payload_cfg, field);
+
+   ret = rte_eth_dev_filter_supported(res->port_id, RTE_ETH_FILTER_FDIR);
+   if (ret < 0) {
+   printf("flow director is not supported on port %u.\n",
+   res->port_id);
+   return;
+   }
+
+   memset(&fdir_cfg, 0, sizeof(struct rte_eth_fdir_cfg));
+
+   flxpld_cfg = (struct rte_eth_flex_payload_cfg *)rte_zmalloc_socket("CLI",
+   cfg_size, CACHE_LINE_SIZE, rte_socket_id());
+
+   if (flxpld_cfg == NULL) {
+   printf("fail to malloc memory to configure flexible payload\n");
+   return;
+   }
+
+   if (!strcmp(res->payload_layer, "l2"))
+   flxpld_cfg->type = RTE_ETH_L2_PAYLOAD;
+   else if (!strcmp(res->payload_layer, "l3"))
+   flxpld_cfg->type = RTE_ETH_L3_PAYLOAD;
+   else if (!strcmp(res->payload_layer, "l4"))
+   flxpld_cfg->type = RTE_ETH_L4_PAYLOAD;
+
+   ret = parse_flex_payload_cfg(res->payload_cfg, flxpld_cfg);
+   if (ret < 0) {
+   printf("fdir flexpayload parsing error: (%s)\n",
+   strerror(-ret));
+   rte_free(flxpld_cfg);
+   return;
+   }
+   fdir_cfg.cmd = RTE_ETH_FDIR_CFG_FLX;
+   fdir_cfg.cfg = flxpld_cfg;
+   ret = rte_eth_dev_filter_ctrl(res->port_id, RTE_ETH_FILTER_FDIR,
+RTE_ETH_FILTER_OP_SET, &fdir_cfg);
+   if (ret < 0)
+   printf("fdir flexpayload setting error: (%s)\n",
+

[dpdk-dev] [PATCH v3 15/20] lib/librte_ether: define structures for configuring flexible payload

2014-09-26 Thread Jingjing Wu

define structures for configuring flexible payload

Signed-off-by: Jingjing Wu 
Acked-by: Chen Jing D(Mark) 
Acked-by: Helin Zhang 
---
 lib/librte_ether/rte_eth_ctrl.h | 42 +
 1 file changed, 42 insertions(+)
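
struct rte_eth_fdir_cfg pairs a sub-command code with an untyped pointer, so the receiving side has to switch on cmd before interpreting cfg. A minimal sketch of that dispatch is below, with hypothetical sk_* mirrors of the structures (the payload struct is trimmed to two members) and arbitrary negative return codes; it is not the driver's handler.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_FDIR_CFG_FLX 0x0001   /* mirrors RTE_ETH_FDIR_CFG_FLX in the patch */

struct sk_flex_payload_cfg {
    int     type;
    uint8_t nb_field;
};

struct sk_fdir_cfg {
    uint16_t cmd;    /* selects how cfg is interpreted */
    void    *cfg;
};

/*
 * Returns the number of fields in the payload config on success,
 * -1 for bad arguments, -2 for an unknown sub-command.
 */
int sk_fdir_set(const struct sk_fdir_cfg *c)
{
    if (c == NULL || c->cfg == NULL)
        return -1;

    switch (c->cmd) {
    case SK_FDIR_CFG_FLX: {
        /* Only after checking cmd is it safe to give cfg a type. */
        const struct sk_flex_payload_cfg *p = c->cfg;

        return p->nb_field;
    }
    default:
        return -2;
    }
}
```

The compiler cannot check that cmd and cfg agree, so a mismatched pair is only caught if the handler validates the contents it reads.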

diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h
index 208082e..e412471 100644
--- a/lib/librte_ether/rte_eth_ctrl.h
+++ b/lib/librte_ether/rte_eth_ctrl.h
@@ -75,6 +75,25 @@ enum rte_filter_op {
  * Define all structures for Flow Director Filter type corresponding with specific operations.
  */

+
+/**
+ * A structure defining a field vector to specify each field.
+ */
+struct rte_eth_field_vector {
+   uint8_t offset;   /**< Source word offset */
+   uint8_t size; /**< Field Size defined in word units */
+};
+
+/**
+ * payload type
+ */
+enum rte_eth_payload_type {
+   RTE_ETH_PAYLOAD_UNKNOWN = 0,
+   RTE_ETH_L2_PAYLOAD,
+   RTE_ETH_L3_PAYLOAD,
+   RTE_ETH_L4_PAYLOAD,
+};
+
 /**
  * flow type
  */
@@ -91,6 +110,29 @@ enum rte_eth_flow_type {
 };

 /**
+ * A structure used to select fields extracted from the protocol layers to
+ * the Field Vector as flexible payload for filter
+ */
+struct rte_eth_flex_payload_cfg {
+   enum rte_eth_payload_type type;  /**< payload type */
+   uint8_t nb_field;/**< the number of following fields */
+   struct rte_eth_field_vector field[0];
+};
+
+#define RTE_ETH_FDIR_CFG_FLX  0x0001
+/**
+ * A structure used to configure FDIR filter global settings
+ * to support RTE_ETH_FILTER_FDIR with RTE_ETH_FILTER_OP_SET operation.
+ */
+struct rte_eth_fdir_cfg {
+   uint16_t cmd;  /**< define sub command in the generic OP_SET  */
+   /**
+* A pointer to structure for the configuration e.g.
+* struct rte_eth_flex_payload_cfg for FDIR_CFG_FLX
+   */
+   void *cfg;
+};
+/**
  * A structure used to define the input for IPV4 UDP flow
  */
 struct rte_eth_udpv4_flow {
-- 
1.8.1.4
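
The variable-length layout of rte_eth_flex_payload_cfg above (a fixed header followed by nb_field entries) can be sketched as follows. This is a minimal standalone sketch, not the patch's code: the struct names are local copies so that it builds without DPDK headers, field[0] is written as a C99 flexible array member, and calloc() stands in for rte_zmalloc_socket(). The helper names flex_cfg_size() and flex_cfg_alloc() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Local stand-ins for the structures in the patch (assumed layout). */
struct field_vector {
	uint8_t offset; /* source word offset */
	uint8_t size;   /* field size in word units */
};

enum payload_type {
	PAYLOAD_UNKNOWN = 0,
	L2_PAYLOAD,
	L3_PAYLOAD,
	L4_PAYLOAD,
};

struct flex_payload_cfg {
	enum payload_type type;
	uint8_t nb_field;            /* number of entries in field[] */
	struct field_vector field[]; /* C99 flexible array member */
};

/* Bytes needed for a configuration that carries nb_field entries. */
static size_t
flex_cfg_size(uint8_t nb_field)
{
	return offsetof(struct flex_payload_cfg, field) +
	       (size_t)nb_field * sizeof(struct field_vector);
}

/*
 * Hypothetical helper: allocate a zeroed configuration.
 * Returns NULL if the allocation fails; the caller frees with free().
 */
static struct flex_payload_cfg *
flex_cfg_alloc(enum payload_type type, uint8_t nb_field)
{
	struct flex_payload_cfg *cfg = calloc(1, flex_cfg_size(nb_field));

	if (cfg == NULL)
		return NULL;
	cfg->type = type;
	cfg->nb_field = nb_field;
	return cfg;
}
```

Computing the size from offsetof() rather than sizeof() keeps trailing struct padding out of the calculation, which matches the offsetof() term visible in the testpmd code of this series.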

[dpdk-dev] [PATCH v3 14/20] app/test-pmd: add test command to flush flow director table

2014-09-26 Thread Jingjing Wu

add test command to flush flow director table

Signed-off-by: Jingjing Wu 
Acked-by: Chen Jing D(Mark) 
Acked-by: Helin Zhang 
---
 app/test-pmd/cmdline.c | 49 +
 1 file changed, 49 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 173e863..240a464 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -682,6 +682,9 @@ static void cmd_help_long_parsed(void *parsed_result,
" flexwords (flexwords_value) (drop|fwd)"
" queue (queue_id) fd_id (fd_id_value)\n"
"Add/Del a SCTP type flow director filter.\n\n"
+
+   "flush_flow_director (port_id)\n"
+   "Flush all flow director entries of a device.\n\n"
);
}
 }
@@ -7855,6 +7858,51 @@ cmdline_parse_inst_t cmd_add_del_sctp_flow_director = {
},
 };

+struct cmd_flush_flow_director_result {
+   cmdline_fixed_string_t flush_flow_director;
+   uint8_t port_id;
+};
+
+cmdline_parse_token_string_t cmd_flush_flow_director_flush =
+   TOKEN_STRING_INITIALIZER(struct cmd_flush_flow_director_result,
+flush_flow_director, "flush_flow_director");
+cmdline_parse_token_num_t cmd_flush_flow_director_port_id =
+   TOKEN_NUM_INITIALIZER(struct cmd_flush_flow_director_result,
+ port_id, UINT8);
+
+static void
+cmd_flush_flow_director_parsed(void *parsed_result,
+ __attribute__((unused)) struct cmdline *cl,
+ __attribute__((unused)) void *data)
+{
+   struct cmd_flush_flow_director_result *res = parsed_result;
+   int ret = 0;
+
+   ret = rte_eth_dev_filter_supported(res->port_id, RTE_ETH_FILTER_FDIR);
+   if (ret < 0) {
+   printf("flow director is not supported on port %u.\n",
+   res->port_id);
+   return;
+   }
+
+   ret = rte_eth_dev_filter_ctrl(res->port_id, RTE_ETH_FILTER_FDIR,
+   RTE_ETH_FILTER_OP_FLUSH, NULL);
+   if (ret < 0)
+   printf("flow director table flushing error: (%s)\n",
+   strerror(-ret));
+}
+
+cmdline_parse_inst_t cmd_flush_flow_director = {
+   .f = cmd_flush_flow_director_parsed,
+   .data = NULL,
+   .help_str = "flush all flow director entries of a device on NIC",
+   .tokens = {
+   (void *)&cmd_flush_flow_director_flush,
+   (void *)&cmd_flush_flow_director_port_id,
+   NULL,
+   },
+};
+
 /* 

 */

 /* list of instructions */
@@ -7984,6 +8032,7 @@ cmdline_parse_ctx_t main_ctx[] = {
(cmdline_parse_inst_t *)&cmd_add_del_ip_flow_director,
(cmdline_parse_inst_t *)&cmd_add_del_udp_flow_director,
(cmdline_parse_inst_t *)&cmd_add_del_sctp_flow_director,
+   (cmdline_parse_inst_t *)&cmd_flush_flow_director,
NULL,
 };

-- 
1.8.1.4

[dpdk-dev] [PATCH v3 13/20] i40e: implement operation to flush flow director table

2014-09-26 Thread Jingjing Wu

implement operation to flush flow director table

Signed-off-by: Jingjing Wu 
Acked-by: Chen Jing D(Mark) 
Acked-by: Helin Zhang 
---
 lib/librte_pmd_i40e/i40e_fdir.c | 43 +
 1 file changed, 43 insertions(+)

diff --git a/lib/librte_pmd_i40e/i40e_fdir.c b/lib/librte_pmd_i40e/i40e_fdir.c
index d6c1793..973c8e0 100644
--- a/lib/librte_pmd_i40e/i40e_fdir.c
+++ b/lib/librte_pmd_i40e/i40e_fdir.c
@@ -92,6 +92,7 @@ static int i40e_fdir_filter_programming(struct i40e_pf *pf,
enum i40e_filter_pctype pctype,
struct rte_eth_fdir_filter *filter,
bool add);
+static int i40e_fdir_flush(struct i40e_pf *pf);
 static void i40e_fdir_info_get(struct i40e_pf *pf,
   struct rte_eth_fdir_info *fdir);

@@ -867,6 +868,45 @@ i40e_fdir_filter_programming(struct i40e_pf *pf,
 }

 /*
+ * i40e_fdir_flush - clear all filters of Flow Director
+ * @pf: board private structure
+ */
+static int
+i40e_fdir_flush(struct i40e_pf *pf)
+{
+   struct i40e_hw *hw = I40E_PF_TO_HW(pf);
+   uint32_t reg;
+   uint16_t guarant_cnt, best_cnt;
+   int i;
+
+   I40E_WRITE_REG(hw, I40E_PFQF_CTL_1, I40E_PFQF_CTL_1_CLEARFDTABLE_MASK);
+   I40E_WRITE_FLUSH(hw);
+
+   for (i = 0; i < I40E_FDIR_FLUSH_RETRY; i++) {
+   rte_delay_ms(I40E_FDIR_FLUSH_INTERVAL_MS);
+   reg = I40E_READ_REG(hw, I40E_PFQF_CTL_1);
+   if (!(reg & I40E_PFQF_CTL_1_CLEARFDTABLE_MASK))
+   break;
+   }
+   if (i >= I40E_FDIR_FLUSH_RETRY) {
+   PMD_DRV_LOG(ERR, "FD table did not flush, may need more time.");
+   return -ETIMEDOUT;
+   }
+   guarant_cnt = (uint16_t)((I40E_READ_REG(hw, I40E_PFQF_FDSTAT) &
+   I40E_PFQF_FDSTAT_GUARANT_CNT_MASK) >>
+   I40E_PFQF_FDSTAT_GUARANT_CNT_SHIFT);
+   best_cnt = (uint16_t)((I40E_READ_REG(hw, I40E_PFQF_FDSTAT) &
+   I40E_PFQF_FDSTAT_BEST_CNT_MASK) >>
+   I40E_PFQF_FDSTAT_BEST_CNT_SHIFT);
+   if (guarant_cnt != 0 || best_cnt != 0) {
+   PMD_DRV_LOG(ERR, "Failed to flush FD table.");
+   return -ENOSYS;
+   } else
+   PMD_DRV_LOG(INFO, "FD table Flush success.");
+   return 0;
+}
+
+/*
  * i40e_fdir_info_get - get information of Flow Director
  * @pf: ethernet device to get info from
  * @fdir: a pointer to a structure of type *rte_eth_fdir_info* to be filled with
@@ -925,6 +965,9 @@ i40e_fdir_ctrl_func(struct i40e_pf *pf, enum rte_filter_op filter_op, void *arg)
(struct rte_eth_fdir_filter *)arg,
FALSE);
break;
+   case RTE_ETH_FILTER_OP_FLUSH:
+   ret = i40e_fdir_flush(pf);
+   break;
case RTE_ETH_FILTER_OP_GET_INFO:
i40e_fdir_info_get(pf, (struct rte_eth_fdir_info *)arg);
break;
-- 
1.8.1.4
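
The flush routine above writes a "clear table" bit and then polls until the hardware clears it or a retry budget runs out. A minimal sketch of that bounded polling loop follows, assuming callbacks in place of I40E_READ_REG() and rte_delay_ms(); CLEAR_TABLE_BIT, poll_until_clear() and fake_reg_read() are hypothetical names used only for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define CLEAR_TABLE_BIT 0x1u /* hypothetical bit; not the real register mask */

/* Callbacks standing in for the register read and the millisecond delay. */
typedef uint32_t (*reg_read_fn)(void *ctx);
typedef void (*delay_fn)(unsigned int ms);

/*
 * Poll until CLEAR_TABLE_BIT reads back as zero.
 * Returns 0 on success or -ETIMEDOUT after `retries` unsuccessful reads.
 */
static int
poll_until_clear(reg_read_fn read_reg, void *ctx, delay_fn delay,
		 unsigned int retries, unsigned int interval_ms)
{
	unsigned int i;

	for (i = 0; i < retries; i++) {
		if (delay != NULL)
			delay(interval_ms);
		if (!(read_reg(ctx) & CLEAR_TABLE_BIT))
			return 0;
	}
	return -ETIMEDOUT;
}

/* Fake register for demonstration: the bit stays set for *ctx more reads. */
static uint32_t
fake_reg_read(void *ctx)
{
	unsigned int *busy_reads = ctx;

	if (*busy_reads == 0)
		return 0;
	(*busy_reads)--;
	return CLEAR_TABLE_BIT;
}
```

With the values defined in this series (50 retries at 5 ms), such a loop would give up after roughly 250 ms.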

[dpdk-dev] [PATCH v3 12/20] app/test-pmd: display fdir statistics

2014-09-26 Thread Jingjing Wu

display flow director's statistics information

Signed-off-by: Jingjing Wu 
Acked-by: Chen Jing D(Mark) 
Acked-by: Helin Zhang 
---
 app/test-pmd/config.c | 40 +++-
 1 file changed, 31 insertions(+), 9 deletions(-)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 2a1b93f..f28e338 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1815,26 +1815,48 @@ fdir_remove_signature_filter(portid_t port_id,
 void
 fdir_get_infos(portid_t port_id)
 {
-   struct rte_eth_fdir fdir_infos;
+   struct rte_eth_fdir_info fdir_infos;
+   int ret;

static const char *fdir_stats_border = "";

if (port_id_is_invalid(port_id))
return;

-   rte_eth_dev_fdir_get_infos(port_id, &fdir_infos);
-
+   memset(&fdir_infos, 0, sizeof(fdir_infos));
+   ret = rte_eth_dev_filter_ctrl(port_id, RTE_ETH_FILTER_FDIR,
+  RTE_ETH_FILTER_OP_GET_INFO, &fdir_infos);
+   if (ret < 0) {
+   ret = rte_eth_dev_fdir_get_infos(port_id, &fdir_infos.info);
+   if (ret < 0) {
+   printf("\n getting fdir info fails on port %-2d\n",
+   port_id);
+   return;
+   }
+   fdir_infos.mode = (fdir_conf.mode == RTE_FDIR_MODE_NONE) ? 0 : 1;
+   }
printf("\n  %s FDIR infos for port %-2d %s\n",
   fdir_stats_border, port_id, fdir_stats_border);
-
+   if (fdir_infos.mode)
+   printf("  FDIR is enabled\n");
+   else
+   printf("  FDIR is disabled\n");
printf("  collision: %-10"PRIu64"  free: %"PRIu64"\n"
   "  maxhash:   %-10"PRIu64"  maxlen:   %"PRIu64"\n"
-  "  add:   %-10"PRIu64"  remove:   %"PRIu64"\n"
+  "  add:   %-10"PRIu64"  remove:   %"PRIu64"\n"
   "  f_add: %-10"PRIu64"  f_remove: %"PRIu64"\n",
-  (uint64_t)(fdir_infos.collision), (uint64_t)(fdir_infos.free),
-  (uint64_t)(fdir_infos.maxhash), (uint64_t)(fdir_infos.maxlen),
-  fdir_infos.add, fdir_infos.remove,
-  fdir_infos.f_add, fdir_infos.f_remove);
+  (uint64_t)(fdir_infos.info.collision), (uint64_t)(fdir_infos.info.free),
+  (uint64_t)(fdir_infos.info.maxhash), (uint64_t)(fdir_infos.info.maxlen),
+  fdir_infos.info.add, fdir_infos.info.remove,
+  fdir_infos.info.f_add, fdir_infos.info.f_remove);
+   printf("  guarant_space: %-10"PRIu16
+  "  best_space:%-10"PRIu16"\n",
+  fdir_infos.info_ext.guarant_spc,
+  fdir_infos.info_ext.best_spc);
+   printf("  guarant_count: %-10"PRIu16
+  "  best_count:%-10"PRIu16"\n",
+  fdir_infos.info_ext.guarant_cnt,
+  fdir_infos.info_ext.best_cnt);
printf("  %s%s\n",
   fdir_stats_border, fdir_stats_border);
 }
-- 
1.8.1.4
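
fdir_get_infos() above first tries the generic filter control call and only falls back to the legacy per-feature call when that fails. A minimal sketch of this fallback pattern, with local stand-in types and stub back ends rather than the real ethdev functions (all names here are hypothetical):

```c
#include <errno.h>
#include <string.h>

/* Local stand-ins for rte_eth_fdir and rte_eth_fdir_info. */
struct legacy_stats {
	unsigned long add;
	unsigned long remove;
};

struct full_info {
	int mode; /* 0 = disabled, 1 = enabled */
	struct legacy_stats info;
};

typedef int (*get_full_fn)(struct full_info *out);
typedef int (*get_legacy_fn)(struct legacy_stats *out);

/*
 * Try the newer query first; if it fails, fill only the legacy part and
 * take the mode from the caller's own configuration (legacy_mode).
 */
static int
get_info_with_fallback(get_full_fn get_full, get_legacy_fn get_legacy,
		       int legacy_mode, struct full_info *out)
{
	int ret;

	memset(out, 0, sizeof(*out));
	ret = get_full(out);
	if (ret >= 0)
		return 0;

	ret = get_legacy(&out->info);
	if (ret < 0)
		return ret;
	out->mode = legacy_mode;
	return 0;
}

/* Stub back ends used only to exercise the sketch. */
static int
stub_full_unsupported(struct full_info *out)
{
	(void)out;
	return -ENOTSUP;
}

static int
stub_full_ok(struct full_info *out)
{
	out->mode = 1;
	out->info.add = 7;
	return 0;
}

static int
stub_legacy_ok(struct legacy_stats *out)
{
	out->add = 3;
	return 0;
}
```

Zeroing the output first means the extended fields read as 0 when only the legacy path succeeds, which is the same effect the memset() in the patch has.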

[dpdk-dev] [PATCH v3 11/20] i40e: implement operations to get fdir info

2014-09-26 Thread Jingjing Wu

implement operation to get flow director information in i40e pmd driver

Signed-off-by: Jingjing Wu 
Acked-by: Chen Jing D(Mark) 
Acked-by: Helin Zhang 
---
 lib/librte_pmd_i40e/i40e_fdir.c | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/lib/librte_pmd_i40e/i40e_fdir.c b/lib/librte_pmd_i40e/i40e_fdir.c
index 82645df..d6c1793 100644
--- a/lib/librte_pmd_i40e/i40e_fdir.c
+++ b/lib/librte_pmd_i40e/i40e_fdir.c
@@ -92,6 +92,8 @@ static int i40e_fdir_filter_programming(struct i40e_pf *pf,
enum i40e_filter_pctype pctype,
struct rte_eth_fdir_filter *filter,
bool add);
+static void i40e_fdir_info_get(struct i40e_pf *pf,
+  struct rte_eth_fdir_info *fdir);

 static int
 i40e_fdir_rx_queue_init(struct i40e_rx_queue *rxq)
@@ -865,6 +867,35 @@ i40e_fdir_filter_programming(struct i40e_pf *pf,
 }

 /*
+ * i40e_fdir_info_get - get information of Flow Director
+ * @pf: ethernet device to get info from
+ * @fdir: a pointer to a structure of type *rte_eth_fdir_info* to be filled with
+ *the flow director information.
+ */
+static void
+i40e_fdir_info_get(struct i40e_pf *pf, struct rte_eth_fdir_info *fdir)
+{
+   struct i40e_hw *hw = I40E_PF_TO_HW(pf);
+   uint32_t pfqf_ctl;
+
+   pfqf_ctl = I40E_READ_REG(hw, I40E_PFQF_CTL_0);
+   fdir->mode = pfqf_ctl & I40E_PFQF_CTL_0_FD_ENA_MASK ? 1 : 0;
+   fdir->info_ext.guarant_spc =
+   (uint16_t)hw->func_caps.fd_filters_guaranteed;
+   fdir->info_ext.guarant_cnt =
+   (uint16_t)((I40E_READ_REG(hw, I40E_PFQF_FDSTAT) &
+   I40E_PFQF_FDSTAT_GUARANT_CNT_MASK) >>
+   I40E_PFQF_FDSTAT_GUARANT_CNT_SHIFT);
+   fdir->info_ext.best_spc =
+   (uint16_t)hw->func_caps.fd_filters_best_effort;
+   fdir->info_ext.best_cnt =
+   (uint16_t)((I40E_READ_REG(hw, I40E_PFQF_FDSTAT) &
+   I40E_PFQF_FDSTAT_BEST_CNT_MASK) >>
+   I40E_PFQF_FDSTAT_BEST_CNT_SHIFT);
+   return;
+}
+
+/*
  * i40e_fdir_ctrl_func - deal with all operations on flow director.
  * @pf: board private structure
  * @filter_op:operation will be taken.
@@ -894,6 +925,9 @@ i40e_fdir_ctrl_func(struct i40e_pf *pf, enum rte_filter_op filter_op, void *arg)
(struct rte_eth_fdir_filter *)arg,
FALSE);
break;
+   case RTE_ETH_FILTER_OP_GET_INFO:
+   i40e_fdir_info_get(pf, (struct rte_eth_fdir_info *)arg);
+   break;
default:
PMD_DRV_LOG(ERR, "unknown operation %u.", filter_op);
ret = -EINVAL;
-- 
1.8.1.4
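
i40e_fdir_info_get() above extracts the guaranteed and best-effort filter counts from one status register with a mask and a shift per field. A minimal sketch of that decoding is given below; the bit layout in the macros is an assumption made for illustration and should not be read as the real I40E_PFQF_FDSTAT definition.

```c
#include <stdint.h>

/* Hypothetical layout: bits 0..12 guaranteed count, bits 16..28 best effort. */
#define FDSTAT_GUARANT_CNT_SHIFT 0
#define FDSTAT_GUARANT_CNT_MASK  (0x1FFFu << FDSTAT_GUARANT_CNT_SHIFT)
#define FDSTAT_BEST_CNT_SHIFT    16
#define FDSTAT_BEST_CNT_MASK     (0x1FFFu << FDSTAT_BEST_CNT_SHIFT)

struct fd_counts {
	uint16_t guarant_cnt; /* filters in guaranteed space */
	uint16_t best_cnt;    /* filters in best-effort space */
};

/* Decode both counters from a single snapshot of the status register. */
static struct fd_counts
decode_fdstat(uint32_t reg)
{
	struct fd_counts c;

	c.guarant_cnt = (uint16_t)((reg & FDSTAT_GUARANT_CNT_MASK) >>
				   FDSTAT_GUARANT_CNT_SHIFT);
	c.best_cnt = (uint16_t)((reg & FDSTAT_BEST_CNT_MASK) >>
				FDSTAT_BEST_CNT_SHIFT);
	return c;
}
```

Decoding from one snapshot keeps the two counters mutually consistent; the patch reads the register once per field instead, which is harmless unless the table changes between the two reads.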

[dpdk-dev] [PATCH v3 10/20] lib/librte_ether: define structures for getting flow director information

2014-09-26 Thread Jingjing Wu

define structures for getting flow director information

Signed-off-by: Jingjing Wu 
Acked-by: Chen Jing D(Mark) 
Acked-by: Helin Zhang 
---
 lib/librte_ether/rte_eth_ctrl.h | 40 
 lib/librte_ether/rte_ethdev.h   | 23 ---
 2 files changed, 40 insertions(+), 23 deletions(-)

diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h
index df1ce4b..208082e 100644
--- a/lib/librte_ether/rte_eth_ctrl.h
+++ b/lib/librte_ether/rte_eth_ctrl.h
@@ -230,6 +230,46 @@ struct rte_eth_fdir_filter {
struct rte_eth_fdir_action action;  /**< action taken when match */
 };

+/**
+ * A structure used to report the status of the flow director filters in use.
+ */
+struct rte_eth_fdir {
+   /** Number of filters with collision indication. */
+   uint16_t collision;
+   /** Number of free (non programmed) filters. */
+   uint16_t free;
+   /** The Lookup hash value of the added filter that updated the value
+  of the MAXLEN field */
+   uint16_t maxhash;
+   /** Longest linked list of filters in the table. */
+   uint8_t maxlen;
+   /** Number of added filters. */
+   uint64_t add;
+   /** Number of removed filters. */
+   uint64_t remove;
+   /** Number of failed added filters (no more space in device). */
+   uint64_t f_add;
+   /** Number of failed removed filters. */
+   uint64_t f_remove;
+};
+
+struct rte_eth_fdir_ext {
+   uint16_t guarant_spc;  /**< guaranteed spaces.*/
+   uint16_t guarant_cnt;  /**< Number of filters in guaranteed spaces. */
+   uint16_t best_spc; /**< best effort spaces.*/
+   uint16_t best_cnt; /**< Number of filters in best effort spaces. */
+};
+
+/**
+ * A structure used to get the status information of the flow director filter
+ * to support RTE_ETH_FILTER_FDIR with RTE_ETH_FILTER_OP_GET_INFO operation.
+ */
+struct rte_eth_fdir_info {
+   int mode; /**< 0 if disabled, 1 if enabled */
+   struct rte_eth_fdir info;
+   struct rte_eth_fdir_ext info_ext;
+};
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index e2ea84a..6407e5d 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -794,29 +794,6 @@ struct rte_fdir_masks {
 };

 /**
- *  A structure used to report the status of the flow director filters in use.
- */
-struct rte_eth_fdir {
-   /** Number of filters with collision indication. */
-   uint16_t collision;
-   /** Number of free (non programmed) filters. */
-   uint16_t free;
-   /** The Lookup hash value of the added filter that updated the value
-  of the MAXLEN field */
-   uint16_t maxhash;
-   /** Longest linked list of filters in the table. */
-   uint8_t maxlen;
-   /** Number of added filters. */
-   uint64_t add;
-   /** Number of removed filters. */
-   uint64_t remove;
-   /** Number of failed added filters (no more space in device). */
-   uint64_t f_add;
-   /** Number of failed removed filters. */
-   uint64_t f_remove;
-};
-
-/**
  * A structure used to enable/disable specific device interrupts.
  */
 struct rte_intr_conf {
-- 
1.8.1.4

[dpdk-dev] [PATCH v3 09/20] i40e: report flow director match info to mbuf

2014-09-26 Thread Jingjing Wu

support to set the FDIR flag and report FD_ID in mbuf if match

Signed-off-by: Jingjing Wu 
Acked-by: Chen Jing D(Mark) 
Acked-by: Helin Zhang 
---
 lib/librte_pmd_i40e/i40e_rxtx.c | 48 +
 1 file changed, 48 insertions(+)

diff --git a/lib/librte_pmd_i40e/i40e_rxtx.c b/lib/librte_pmd_i40e/i40e_rxtx.c
index 4435367..c067cdd 100644
--- a/lib/librte_pmd_i40e/i40e_rxtx.c
+++ b/lib/librte_pmd_i40e/i40e_rxtx.c
@@ -105,6 +105,10 @@ i40e_rxd_status_to_pkt_flags(uint64_t qword)
I40E_RX_DESC_FLTSTAT_RSS_HASH) ==
I40E_RX_DESC_FLTSTAT_RSS_HASH) ? PKT_RX_RSS_HASH : 0;

+   /* Check if FDIR Match */
+   flags |= (uint16_t)(qword & (1 << I40E_RX_DESC_STATUS_FLM_SHIFT) ?
+   PKT_RX_FDIR : 0);
+
return flags;
 }

@@ -641,6 +645,22 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
if (pkt_flags & PKT_RX_RSS_HASH)
mb->hash.rss = rte_le_to_cpu_32(\
rxdp->wb.qword0.hi_dword.rss);
+
+   if (pkt_flags & PKT_RX_FDIR) {
+#ifdef RTE_LIBRTE_I40E_16BYTE_RX_DESC
+   if (((qword1 >> I40E_RX_DESC_STATUS_FLTSTAT_SHIFT) &
+   I40E_RX_DESC_FLTSTAT_RSS_HASH) ==
+   I40E_RX_DESC_FLTSTAT_RSV_FD_ID)
+   mb->hash.fdir.id = (uint16_t)
+   rte_le_to_cpu_32(rxdp[j].wb.qword0.hi_dword.fd);
+#else
+   if (((rxdp[j].wb.qword2.ext_status >>
+   I40E_RX_DESC_EXT_STATUS_FLEXBH_SHIFT) &
+   0x03) == 0x01)
+   mb->hash.fdir.id = (uint16_t)
+   rte_le_to_cpu_32(rxdp[j].wb.qword3.hi_dword.fd_id);
+#endif
+   }
}

for (j = 0; j < I40E_LOOK_AHEAD; j++)
@@ -877,6 +897,20 @@ i40e_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
if (pkt_flags & PKT_RX_RSS_HASH)
rxm->hash.rss =
rte_le_to_cpu_32(rxd.wb.qword0.hi_dword.rss);
+   if (pkt_flags & PKT_RX_FDIR) {
+#ifdef RTE_LIBRTE_I40E_16BYTE_RX_DESC
+   if (((qword1 >> I40E_RX_DESC_STATUS_FLTSTAT_SHIFT) &
+   I40E_RX_DESC_FLTSTAT_RSS_HASH) ==
+   I40E_RX_DESC_FLTSTAT_RSV_FD_ID)
+   rxm->hash.fdir.id = (uint16_t)
+   rte_le_to_cpu_32(rxd.wb.qword0.hi_dword.fd);
+#else
+   if (((rxd.wb.qword2.ext_status >> I40E_RX_DESC_EXT_STATUS_FLEXBH_SHIFT) &
+   0x03) == 0x01)
+   rxm->hash.fdir.id = (uint16_t)
+   rte_le_to_cpu_32(rxd.wb.qword3.hi_dword.fd_id);
+#endif
+   }

rx_pkts[nb_rx++] = rxm;
}
@@ -1031,6 +1065,20 @@ i40e_recv_scattered_pkts(void *rx_queue,
if (pkt_flags & PKT_RX_RSS_HASH)
rxm->hash.rss =
rte_le_to_cpu_32(rxd.wb.qword0.hi_dword.rss);
+   if (pkt_flags & PKT_RX_FDIR) {
+#ifdef RTE_LIBRTE_I40E_16BYTE_RX_DESC
+   if (((qword1 >> I40E_RX_DESC_STATUS_FLTSTAT_SHIFT) &
+   I40E_RX_DESC_FLTSTAT_RSS_HASH) ==
+   I40E_RX_DESC_FLTSTAT_RSV_FD_ID)
+   rxm->hash.fdir.id = (uint16_t)
+   rte_le_to_cpu_32(rxd.wb.qword0.hi_dword.fd);
+#else
+   if (((rxd.wb.qword2.ext_status >> I40E_RX_DESC_EXT_STATUS_FLEXBH_SHIFT) &
+   0x03) == 0x01)
+   rxm->hash.fdir.id = (uint16_t)
+   rte_le_to_cpu_32(rxd.wb.qword3.hi_dword.fd_id);
+#endif
+   }

/* Prefetch data of first segment, if configured to do so. */
rte_prefetch0(RTE_PTR_ADD(first_seg->buf_addr,
-- 
1.8.1.4

[dpdk-dev] [PATCH v3 08/20] i40e: match counter for flow director

2014-09-26 Thread Jingjing Wu

support to get the fdir_match counter

Signed-off-by: Jingjing Wu 
Acked-by: Chen Jing D(Mark) 
Acked-by: Helin Zhang 
---
 lib/librte_pmd_i40e/i40e_ethdev.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c b/lib/librte_pmd_i40e/i40e_ethdev.c
index 9791519..565ee00 100644
--- a/lib/librte_pmd_i40e/i40e_ethdev.c
+++ b/lib/librte_pmd_i40e/i40e_ethdev.c
@@ -1303,6 +1303,9 @@ i40e_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
I40E_GLPRT_PTC9522L(hw->port),
pf->offset_loaded, &os->tx_size_big,
&ns->tx_size_big);
+   i40e_stat_update_32(hw, I40E_GLQF_PCNT(pf->fdir.match_counter_index),
+  pf->offset_loaded,
+  &os->fd_sb_match, &ns->fd_sb_match);
/* GLPRT_MSPDC not supported */
/* GLPRT_XEC not supported */

@@ -1316,6 +1319,7 @@ i40e_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
stats->obytes   = ns->eth.tx_bytes;
stats->oerrors  = ns->eth.tx_errors;
stats->imcasts  = ns->eth.rx_multicast;
+   stats->fdirmatch = ns->fd_sb_match;

if (pf->main_vsi)
i40e_update_vsi_stats(pf->main_vsi);
@@ -1387,6 +1391,7 @@ i40e_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
ns->mac_short_packet_dropped);
PMD_DRV_LOG(DEBUG, "checksum_error:   %lu",
ns->checksum_error);
+   PMD_DRV_LOG(DEBUG, "fdir_match:   %lu", ns->fd_sb_match);
PMD_DRV_LOG(DEBUG, "* PF stats end 
");
 }

-- 
1.8.1.4
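
The match counter above is read through i40e_stat_update_32(), whose body is not part of this patch. The sketch below shows one plausible way such an offset-based helper can work, assuming a free-running 32-bit hardware counter that is never reset: the first reading is stored as a baseline, and later readings are reported relative to it, with a correction for a single wraparound. The function name stat_update_32() and its signature are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * raw:           current value of the 32-bit hardware counter
 * offset_loaded: false on the first call, true once a baseline exists
 * offset:        baseline captured on the first call
 * stat:          counter value relative to the baseline
 */
static void
stat_update_32(uint32_t raw, bool offset_loaded,
	       uint64_t *offset, uint64_t *stat)
{
	if (!offset_loaded)
		*offset = raw;

	if (raw >= *offset)
		*stat = raw - *offset;
	else /* the hardware counter wrapped past 2^32 once */
		*stat = ((uint64_t)raw + ((uint64_t)1 << 32)) - *offset;
}
```

A helper of this shape cannot detect more than one wrap between two calls, so it relies on statistics being polled often enough.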

[dpdk-dev] [PATCH v3 07/20] app/test-pmd: add test commands to add/delete flow director filter

2014-09-26 Thread Jingjing Wu

add commands which can be used to test adding or deleting 8 flow types of the 
flow director filters: ipv4, tcpv4, udpv4, sctpv4, ipv6, tcpv6, udpv6, sctpv6

Signed-off-by: Jingjing Wu 
Acked-by: Chen Jing D(Mark) 
Acked-by: Helin Zhang 
---
 app/test-pmd/cmdline.c | 447 +
 app/test-pmd/testpmd.h |   3 +
 2 files changed, 450 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 225f669..173e863 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -660,6 +660,28 @@ static void cmd_help_long_parsed(void *parsed_result,

"get_flex_filter (port_id) index (idx)\n"
"get info of a flex filter.\n\n"
+
+   "flow_director_filter (port_id) (add|del)"
+   " flow (ip4|ip6) src (src_ip_address) dst (dst_ip_address)"
+   " flexwords (flexwords_value) (drop|fwd)"
+   " queue (queue_id) fd_id (fd_id_value)\n"
+   "Add/Del an IP type flow director filter.\n\n"
+
+   "flow_director_filter (port_id) (add|del)"
+   " flow (udp4|tcp4|udp6|tcp6)"
+   " src (src_ip_address) (src_port)"
+   " dst (dst_ip_address) (dst_port)"
+   " flexwords (flexwords_value) (drop|fwd)"
+   " queue (queue_id) fd_id (fd_id_value)\n"
+   "Add/Del a UDP/TCP type flow director filter.\n\n"
+
+   "flow_director_filter (port_id) (add|del)"
+   " flow (sctp4|sctp6)"
+   " src (src_ip_address) dst (dst_ip_address)"
+   " tag (verification_tag)"
+   " flexwords (flexwords_value) (drop|fwd)"
+   " queue (queue_id) fd_id (fd_id_value)\n"
+   "Add/Del a SCTP type flow director filter.\n\n"
);
}
 }
@@ -7411,6 +7433,428 @@ cmdline_parse_inst_t cmd_get_flex_filter = {
},
 };

+/* *** Filters Control *** */
+
+/* *** deal with flow director filter *** */
+struct cmd_flow_director_result {
+   cmdline_fixed_string_t flow_director_filter;
+   uint8_t port_id;
+   cmdline_fixed_string_t ops;
+   cmdline_fixed_string_t flow;
+   cmdline_fixed_string_t flow_type;
+   cmdline_fixed_string_t src;
+   cmdline_ipaddr_t ip_src;
+   uint16_t port_src;
+   cmdline_fixed_string_t dst;
+   cmdline_ipaddr_t ip_dst;
+   uint16_t port_dst;
+   cmdline_fixed_string_t verify_tag;
+   uint32_t verify_tag_value;
+   cmdline_fixed_string_t flexwords;
+   cmdline_fixed_string_t flexwords_value;
+   cmdline_fixed_string_t drop;
+   cmdline_fixed_string_t queue;
+   uint16_t  queue_id;
+   cmdline_fixed_string_t fd_id;
+   uint32_t  fd_id_value;
+};
+
+static inline int
+parse_flexwords(const char *q_arg, uint16_t *flexwords)
+{
+#define MAX_NUM_WORD 8
+   char s[256];
+   const char *p, *p0 = q_arg;
+   char *end;
+   unsigned long int_fld[MAX_NUM_WORD];
+   char *str_fld[MAX_NUM_WORD];
+   int i;
+   unsigned size;
+   int num_words = -1;
+
+   p = strchr(p0, '(');
+   if (p == NULL)
+   return -1;
+   ++p;
+   p0 = strchr(p, ')');
+   if (p0 == NULL)
+   return -1;
+
+   size = p0 - p;
+   if (size >= sizeof(s))
+   return -1;
+
+   snprintf(s, sizeof(s), "%.*s", size, p);
+   num_words = rte_strsplit(s, sizeof(s), str_fld, MAX_NUM_WORD, ',');
+   if (num_words < 0 || num_words > MAX_NUM_WORD)
+   return -1;
+   for (i = 0; i < num_words; i++) {
+   errno = 0;
+   int_fld[i] = strtoul(str_fld[i], &end, 0);
+   if (errno != 0 || end == str_fld[i] || int_fld[i] > UINT16_MAX)
+   return -1;
+   flexwords[i] = rte_cpu_to_be_16((uint16_t)int_fld[i]);
+   }
+   return num_words;
+}
+
+static void
+cmd_flow_director_filter_parsed(void *parsed_result,
+ __attribute__((unused)) struct cmdline *cl,
+ __attribute__((unused)) void *data)
+{
+   struct cmd_flow_director_result *res = parsed_result;
+   struct rte_eth_fdir_filter entry;
+   uint16_t flexwords[8];
+   int num_flexwords;
+   int ret = 0;
+
+   ret = rte_eth_dev_filter_supported(res->port_id, RTE_ETH_FILTER_FDIR);
+   if (ret < 0) {
+   printf("flow director is not supported on port %u.\n",
+   res->port_id);
+   return;
+   }
+   memset(flexwords, 0, sizeof(flexwords));
+   memset(&entry, 0, sizeof(struct rte_eth_fdir_filter));
+   num_flexwords = parse_flexwords(res->flexwords_value, flexwords);
+   if (num_flexwords < 0) {
+   printf("error: Cannot pase

[dpdk-dev] [PATCH v3 06/20] i40e: implement operations to add/delete flow director

2014-09-26 Thread Jingjing Wu

deal with two operations for flow director
 - RTE_ETH_FILTER_OP_ADD
 - RTE_ETH_FILTER_OP_DELETE
encode the flow inputs into a programming packet,
send the packet to the filter programming queue, and check the status on the
status report queue

Signed-off-by: Jingjing Wu 
Acked-by: Chen Jing D(Mark) 
Acked-by: Helin Zhang 
---
 lib/librte_pmd_i40e/i40e_ethdev.c |  29 ++
 lib/librte_pmd_i40e/i40e_ethdev.h |   3 +
 lib/librte_pmd_i40e/i40e_fdir.c   | 617 +-
 3 files changed, 648 insertions(+), 1 deletion(-)

diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c b/lib/librte_pmd_i40e/i40e_ethdev.c
index a3f25e6..9791519 100644
--- a/lib/librte_pmd_i40e/i40e_ethdev.c
+++ b/lib/librte_pmd_i40e/i40e_ethdev.c
@@ -205,6 +205,9 @@ static int i40e_dev_rss_hash_update(struct rte_eth_dev *dev,
struct rte_eth_rss_conf *rss_conf);
 static int i40e_dev_rss_hash_conf_get(struct rte_eth_dev *dev,
  struct rte_eth_rss_conf *rss_conf);
+static int i40e_dev_filter_ctrl(struct rte_eth_dev *dev,
+   enum rte_filter_type filter_type,
+   enum rte_filter_op filter_op, void *arg);

 /* Default hash key buffer for RSS */
 static uint32_t rss_key_default[I40E_PFQF_HKEY_MAX_INDEX + 1];
@@ -256,6 +259,7 @@ static struct eth_dev_ops i40e_eth_dev_ops = {
.reta_query   = i40e_dev_rss_reta_query,
.rss_hash_update  = i40e_dev_rss_hash_update,
.rss_hash_conf_get= i40e_dev_rss_hash_conf_get,
+   .filter_ctrl  = i40e_dev_filter_ctrl,
 };

 static struct eth_driver rte_i40e_pmd = {
@@ -4221,3 +4225,28 @@ i40e_pf_config_mq_rx(struct i40e_pf *pf)

return 0;
 }
+
+/*
+ * Perform an operation on the given filter type of the NIC.
+ */
+static int
+i40e_dev_filter_ctrl(struct rte_eth_dev *dev, enum rte_filter_type filter_type,
+enum rte_filter_op filter_op, void *arg)
+{
+   struct i40e_pf *pf;
+   int ret = I40E_SUCCESS;
+
+   if (dev == NULL)
+   return -EINVAL;
+   pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+   switch (filter_type) {
+   case RTE_ETH_FILTER_FDIR:
+   ret = i40e_fdir_ctrl_func(pf, filter_op, arg);
+   break;
+   default:
+   PMD_DRV_LOG(ERR, "unsupported filter type %u.", filter_type);
+   ret = -EINVAL;
+   break;
+   }
+   return ret;
+}
diff --git a/lib/librte_pmd_i40e/i40e_ethdev.h b/lib/librte_pmd_i40e/i40e_ethdev.h
index 2460635..af149df 100644
--- a/lib/librte_pmd_i40e/i40e_ethdev.h
+++ b/lib/librte_pmd_i40e/i40e_ethdev.h
@@ -341,6 +341,9 @@ enum i40e_status_code i40e_fdir_setup_rx_resources(struct i40e_pf *pf,
unsigned int socket_id);
 int i40e_fdir_setup(struct i40e_pf *pf);
 void i40e_fdir_teardown(struct i40e_pf *pf);
+int i40e_fdir_ctrl_func(struct i40e_pf *pf,
+ enum rte_filter_op filter_op,
+ void *arg);

 /* I40E_DEV_PRIVATE_TO */
 #define I40E_DEV_PRIVATE_TO_PF(adapter) \
diff --git a/lib/librte_pmd_i40e/i40e_fdir.c b/lib/librte_pmd_i40e/i40e_fdir.c
index a3e6bd7..82645df 100644
--- a/lib/librte_pmd_i40e/i40e_fdir.c
+++ b/lib/librte_pmd_i40e/i40e_fdir.c
@@ -44,6 +44,10 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 

 #include "i40e_logs.h"
 #include "i40e/i40e_type.h"
@@ -51,12 +55,43 @@
 #include "i40e_rxtx.h"

 #define I40E_FDIR_MZ_NAME  "FDIR_MEMZONE"
+#ifndef IPV6_ADDR_LEN
+#define IPV6_ADDR_LEN  16
+#endif
+
 #define I40E_FDIR_PKT_LEN   512
+#define I40E_FDIR_IP_DEFAULT_LEN420
+#define I40E_FDIR_IP_DEFAULT_TTL0x40
+#define I40E_FDIR_IP_DEFAULT_VERSION_IHL0x45
+#define I40E_FDIR_TCP_DEFAULT_DATAOFF   0x50
+#define I40E_FDIR_IPv6_DEFAULT_VTC_FLOW 0x6030
+#define I40E_FDIR_IPv6_DEFAULT_HOP_LIMITS   0xFF
+#define I40E_FDIR_IPv6_PAYLOAD_LEN  380
+#define I40E_FDIR_UDP_DEFAULT_LEN   400
+
+/* Wait count and interval for fdir filter programming */
+#define I40E_FDIR_WAIT_COUNT   10
+#define I40E_FDIR_WAIT_INTERVAL_US 1000
+
+/* Wait count and interval for fdir filter flush */
+#define I40E_FDIR_FLUSH_RETRY   50
+#define I40E_FDIR_FLUSH_INTERVAL_MS 5
+
 #define I40E_COUNTER_PF   2
 /* Statistic counter index for one pf */
 #define I40E_COUNTER_INDEX_FDIR(pf_id)   (0 + (pf_id) * I40E_COUNTER_PF)

 static int i40e_fdir_rx_queue_init(struct i40e_rx_queue *rxq);
+static int i40e_fdir_construct_pkt(struct i40e_pf *pf,
+struct rte_eth_fdir_input *fdir_input,
+unsigned char *raw_pkt);
+static int i40e_add_del_fdir_filter(struct i40e_pf *pf,
+   struct rte_eth_fdir_filter *filter,
+   bool add);
+static int

[dpdk-dev] [PATCH v3 05/20] lib/librte_ether: define structures for adding/deleting flow director

2014-09-26 Thread Jingjing Wu

define structures to add or delete flow director filter

Signed-off-by: Jingjing Wu 
Acked-by: Chen Jing D(Mark) 
Acked-by: Helin Zhang 
---
 lib/librte_ether/rte_eth_ctrl.h | 159 
 1 file changed, 159 insertions(+)

diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h
index 34ab278..df1ce4b 100644
--- a/lib/librte_ether/rte_eth_ctrl.h
+++ b/lib/librte_ether/rte_eth_ctrl.h
@@ -71,6 +71,165 @@ enum rte_filter_op {
RTE_ETH_FILTER_OP_MAX,
 };

+/**
+ * Define all structures for Flow Director Filter type corresponding with specific operations.
+ */
+
+/**
+ * flow type
+ */
+enum rte_eth_flow_type {
+   RTE_ETH_FLOW_TYPE_NONE = 0x0,
+   RTE_ETH_FLOW_TYPE_UDPV4,
+   RTE_ETH_FLOW_TYPE_TCPV4,
+   RTE_ETH_FLOW_TYPE_SCTPV4,
+   RTE_ETH_FLOW_TYPE_IPV4_OTHER,
+   RTE_ETH_FLOW_TYPE_UDPV6,
+   RTE_ETH_FLOW_TYPE_TCPV6,
+   RTE_ETH_FLOW_TYPE_SCTPV6,
+   RTE_ETH_FLOW_TYPE_IPV6_OTHER,
+};
+
+/**
+ * A structure used to define the input for IPV4 UDP flow
+ */
+struct rte_eth_udpv4_flow {
+   uint32_t src_ip;  /**< IPv4 source address to match. */
+   uint32_t dst_ip;  /**< IPv4 destination address to match. */
+   uint16_t src_port;/**< UDP Source port to match. */
+   uint16_t dst_port;/**< UDP Destination port to match. */
+};
+
+/**
+ * A structure used to define the input for IPV4 TCP flow
+ */
+struct rte_eth_tcpv4_flow {
+   uint32_t src_ip;  /**< IPv4 source address to match. */
+   uint32_t dst_ip;  /**< IPv4 destination address to match. */
+   uint16_t src_port;/**< TCP Source port to match. */
+   uint16_t dst_port;/**< TCP Destination port to match. */
+};
+
+/**
+ * A structure used to define the input for IPV4 SCTP flow
+ */
+struct rte_eth_sctpv4_flow {
+   uint32_t src_ip;  /**< IPv4 source address to match. */
+   uint32_t dst_ip;  /**< IPv4 destination address to match. */
+   uint32_t verify_tag;  /**< verify tag to match */
+};
+
+/**
+ * A structure used to define the input for IPV4 flow
+ */
+struct rte_eth_ipv4_flow {
+   uint32_t src_ip;  /**< IPv4 source address to match. */
+   uint32_t dst_ip;  /**< IPv4 destination address to match. */
+};
+
+/**
+ * A structure used to define the input for IPV6 UDP flow
+ */
+struct rte_eth_udpv6_flow {
+   uint32_t src_ip[4];  /**< IPv6 source address to match. */
+   uint32_t dst_ip[4];  /**< IPv6 destination address to match. */
+   uint16_t src_port;   /**< UDP Source port to match. */
+   uint16_t dst_port;   /**< UDP Destination port to match. */
+};
+
+/**
+ * A structure used to define the input for IPV6 TCP flow
+ */
+struct rte_eth_tcpv6_flow {
+   uint32_t src_ip[4];  /**< IPv6 source address to match. */
+   uint32_t dst_ip[4];  /**< IPv6 destination address to match. */
+   uint16_t src_port;   /**< TCP Source port to match. */
+   uint16_t dst_port;   /**< TCP Destination port to match. */
+};
+
+/**
+ * A structure used to define the input for IPV6 SCTP flow
+ */
+struct rte_eth_sctpv6_flow {
+   uint32_t src_ip[4];  /**< IPv6 source address to match. */
+   uint32_t dst_ip[4];  /**< IPv6 destination address to match. */
+   uint32_t verify_tag; /**< verify tag to match */
+};
+
+/**
+ * A structure used to define the input for IPV6 flow
+ */
+struct rte_eth_ipv6_flow {
+   uint32_t src_ip[4];  /**< IPv6 source address to match. */
+   uint32_t dst_ip[4];  /**< IPv6 destination address to match. */
+};
+
+/**
+ * A union that contains the inputs for all types of flow
+ */
+union rte_eth_fdir_flow {
+   struct rte_eth_udpv4_flow  udp4_flow;
+   struct rte_eth_tcpv4_flow  tcp4_flow;
+   struct rte_eth_sctpv4_flow sctp4_flow;
+   struct rte_eth_ipv4_flow   ip4_flow;
+   struct rte_eth_udpv6_flow  udp6_flow;
+   struct rte_eth_tcpv6_flow  tcp6_flow;
+   struct rte_eth_sctpv6_flow sctp6_flow;
+   struct rte_eth_ipv6_flow   ip6_flow;
+};
+
+#define RTE_ETH_FDIR_MAX_FLEXWORD_LEN  8
+/**
+ * A structure used to contain the extended input of a flow
+ */
+struct rte_eth_fdir_flow_ext {
+   uint16_t vlan_tci;
+   uint8_t num_flexwords; /**< number of flexwords */
+   uint16_t flexwords[RTE_ETH_FDIR_MAX_FLEXWORD_LEN];
+   uint16_t dest_id;  /**< destination vsi or pool id*/
+};
+
+/**
+ * A structure used to define the input for a flow director filter entry
+ */
+struct rte_eth_fdir_input {
+   enum rte_eth_flow_type flow_type;  /**< type of flow */
+   union rte_eth_fdir_flow flow;  /**< specific flow structure */
+   struct rte_eth_fdir_flow_ext flow_ext; /**< specific flow info */
+};
+
+/**
+ * Flow director report status
+ */
+enum rte_eth_fdir_status {
+   RTE_ETH_FDIR_NO_REPORT_STATUS = 0, /**< no report FDIR. */
+   RTE_ETH_FDIR_REPORT_FD_ID,

[dpdk-dev] [PATCH v3 04/20] lib/librte_ether: new filter APIs definition

2014-09-26 Thread Jingjing Wu

Define new APIs so that multiple kinds of filters can be configured through the
same entry points:
 - rte_eth_dev_filter_supported
 - rte_eth_dev_filter_ctrl

Signed-off-by: Jingjing Wu 
Acked-by: Chen Jing D(Mark) 
Acked-by: Helin Zhang 
---
 lib/librte_ether/Makefile   |  1 +
 lib/librte_ether/rte_eth_ctrl.h | 78 +
 lib/librte_ether/rte_ethdev.c   | 32 +
 lib/librte_ether/rte_ethdev.h   | 44 +++
 4 files changed, 155 insertions(+)
 create mode 100644 lib/librte_ether/rte_eth_ctrl.h
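
For readers following the series, here is a minimal, self-contained sketch of
how the two entry points relate: the "supported" query is simply a control call
with RTE_ETH_FILTER_OP_NONE. This is not DPDK code. The two enums are copied
from the rte_eth_ctrl.h hunk in this patch; struct mock_dev, mock_filter_ctrl(),
filter_ctrl() and filter_supported() are hypothetical stand-ins for the ethdev
layer and a driver, and the choice of which operations the mock driver accepts
is an assumption made only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Copied from the rte_eth_ctrl.h hunk in this patch. */
enum rte_filter_type {
	RTE_ETH_FILTER_NONE = 0,
	RTE_ETH_FILTER_RSS,
	RTE_ETH_FILTER_FDIR,
	RTE_ETH_FILTER_MAX,
};

enum rte_filter_op {
	RTE_ETH_FILTER_OP_NONE = 0,
	RTE_ETH_FILTER_OP_ADD,
	RTE_ETH_FILTER_OP_UPDATE,
	RTE_ETH_FILTER_OP_DELETE,
	RTE_ETH_FILTER_OP_FLUSH,
	RTE_ETH_FILTER_OP_GET,
	RTE_ETH_FILTER_OP_SET,
	RTE_ETH_FILTER_OP_GET_INFO,
	RTE_ETH_FILTER_OP_MAX,
};

/* Hypothetical stand-in for a device with a single driver callback. */
struct mock_dev {
	int (*filter_ctrl)(struct mock_dev *dev, enum rte_filter_type type,
			   enum rte_filter_op op, void *arg);
	unsigned int nb_fdir_entries;
};

/* Hypothetical driver callback that only knows flow director filters. */
static int
mock_filter_ctrl(struct mock_dev *dev, enum rte_filter_type type,
		 enum rte_filter_op op, void *arg)
{
	(void)arg; /* a real driver would read the filter entry from here */

	if (type != RTE_ETH_FILTER_FDIR)
		return -ENOTSUP;

	switch (op) {
	case RTE_ETH_FILTER_OP_NONE:  /* pure capability probe */
		return 0;
	case RTE_ETH_FILTER_OP_ADD:
		dev->nb_fdir_entries++;
		return 0;
	case RTE_ETH_FILTER_OP_FLUSH:
		dev->nb_fdir_entries = 0;
		return 0;
	default:
		return -ENOTSUP;
	}
}

/* Generic control entry point: forward to the driver if it has a callback. */
static int
filter_ctrl(struct mock_dev *dev, enum rte_filter_type type,
	    enum rte_filter_op op, void *arg)
{
	assert(dev != NULL);
	if (dev->filter_ctrl == NULL)
		return -ENOTSUP;
	return dev->filter_ctrl(dev, type, op, arg);
}

/* Capability query: the same call with the no-op operation and no argument. */
static int
filter_supported(struct mock_dev *dev, enum rte_filter_type type)
{
	return filter_ctrl(dev, type, RTE_ETH_FILTER_OP_NONE, NULL);
}
```

An application would probe with filter_supported() first and only then issue
ADD or FLUSH through filter_ctrl(), so a driver without the callback, or without
the requested filter type, is reported as -ENOTSUP in both cases.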

diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
index b310f8b..a461c31 100644
--- a/lib/librte_ether/Makefile
+++ b/lib/librte_ether/Makefile
@@ -46,6 +46,7 @@ SRCS-y += rte_ethdev.c
 #
 SYMLINK-y-include += rte_ether.h
 SYMLINK-y-include += rte_ethdev.h
+SYMLINK-y-include += rte_eth_ctrl.h

 # this lib depends upon:
 DEPDIRS-y += lib/librte_eal lib/librte_mempool lib/librte_ring lib/librte_mbuf
diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h
new file mode 100644
index 0000000..34ab278
--- /dev/null
+++ b/lib/librte_ether/rte_eth_ctrl.h
@@ -0,0 +1,78 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_ETH_CTRL_H_
+#define _RTE_ETH_CTRL_H_
+
+/**
+ * @file
+ *
+ * Ethernet device features and related data structures used
+ * by control APIs should be defined in this file.
+ *
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Feature filter types
+ */
+enum rte_filter_type {
+   RTE_ETH_FILTER_NONE = 0,
+   RTE_ETH_FILTER_RSS,
+   RTE_ETH_FILTER_FDIR,
+   RTE_ETH_FILTER_MAX,
+};
+
+/**
+ * All generic operations to filters
+ */
+enum rte_filter_op {
+   RTE_ETH_FILTER_OP_NONE = 0, /**< used to check whether the type filter is supported */
+   RTE_ETH_FILTER_OP_ADD,      /**< add filter entry */
+   RTE_ETH_FILTER_OP_UPDATE,   /**< update filter entry */
+   RTE_ETH_FILTER_OP_DELETE,   /**< delete filter entry */
+   RTE_ETH_FILTER_OP_FLUSH,    /**< flush all entries */
+   RTE_ETH_FILTER_OP_GET,      /**< get filter entry */
+   RTE_ETH_FILTER_OP_SET,      /**< configurations */
+   RTE_ETH_FILTER_OP_GET_INFO, /**< get information of filter, such as status or statistics */
+   RTE_ETH_FILTER_OP_MAX,
+};
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_ETH_CTRL_H_ */
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index b71b679..fdafb15 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3139,3 +3139,35 @@ rte_eth_dev_get_flex_filter(uint8_t port_id, uint16_t index,
return (*dev->dev_ops->get_flex_filter)(dev, index, filter,
rx_queue);
 }
+
+int
+rte_eth_dev_filter_supported(uint8_t port_id, enum rte_filter_type filter_type)
+{
+   struct rte_eth_dev *dev;
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return -ENODEV;
+   }
+
+   dev = _eth_devices[port_id];
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->filter_ctrl, -ENOTSUP);
+   return (*dev->dev_ops->filter_ctrl)(dev, filter_type,
+   RTE_ETH_FILTER_OP_NONE, NULL);
+}
+
+int
+rte_eth_dev_filter_ctrl(uint8_t port_id, enum

[dpdk-dev] [PATCH v3 03/20] i40e: initialize flexible payload setting

2014-09-26 Thread Jingjing Wu

Set the flexible payload related registers to their default values at
initialization time.

Signed-off-by: Jingjing Wu 
Acked-by: Chen Jing D(Mark) 
Acked-by: Helin Zhang 
---
 lib/librte_pmd_i40e/i40e_ethdev.c | 33 ++
 lib/librte_pmd_i40e/i40e_fdir.c   | 49 +++
 2 files changed, 82 insertions(+)

diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c b/lib/librte_pmd_i40e/i40e_ethdev.c
index da131a8..a3f25e6 100644
--- a/lib/librte_pmd_i40e/i40e_ethdev.c
+++ b/lib/librte_pmd_i40e/i40e_ethdev.c
@@ -336,6 +336,32 @@ static struct rte_driver rte_i40e_driver = {

 PMD_REGISTER_DRIVER(rte_i40e_driver);

+/*
+ * Initialize registers for flexible payload, which should be set by NVM.
+ * This should be removed from the code once it is fixed in NVM.
+ */
+static inline void i40e_flex_payload_reg_init(struct i40e_hw *hw)
+{
+   /* GLQF_ORT Registers */
+   I40E_WRITE_REG(hw, I40E_GLQF_ORT(18), 0x0030);
+   I40E_WRITE_REG(hw, I40E_GLQF_ORT(19), 0x0030);
+   I40E_WRITE_REG(hw, I40E_GLQF_ORT(26), 0x002B);
+   I40E_WRITE_REG(hw, I40E_GLQF_ORT(30), 0x002B);
+   I40E_WRITE_REG(hw, I40E_GLQF_ORT(33), 0x00E0);
+   I40E_WRITE_REG(hw, I40E_GLQF_ORT(34), 0x00E3);
+   I40E_WRITE_REG(hw, I40E_GLQF_ORT(35), 0x00E6);
+   I40E_WRITE_REG(hw, I40E_GLQF_ORT(20), 0x0031);
+   I40E_WRITE_REG(hw, I40E_GLQF_ORT(23), 0x0031);
+   I40E_WRITE_REG(hw, I40E_GLQF_ORT(63), 0x002D);
+
+   /* GLQF_PIT Registers */
+   I40E_WRITE_REG(hw, I40E_GLQF_PIT(16), 0x7480);
+   I40E_WRITE_REG(hw, I40E_GLQF_PIT(17), 0x7440);
+
+   /* GL_PRS_FVBM Registers */
+   I40E_WRITE_REG(hw, I40E_GL_PRS_FVBM(1), 0x835B);
+}
+
 static int
 eth_i40e_dev_init(__rte_unused struct eth_driver *eth_drv,
   struct rte_eth_dev *dev)
@@ -399,6 +425,13 @@ eth_i40e_dev_init(__rte_unused struct eth_driver *eth_drv,
return ret;
}

+   /*
+* To work around the NVM issue, initialize registers
+* for flexible payload by software.
+* It should be removed once issues are fixed in NVM.
+*/
+   i40e_flex_payload_reg_init(hw);
+
/* Initialize the parameters for adminq */
i40e_init_adminq_parameter(hw);
ret = i40e_init_adminq(hw);
diff --git a/lib/librte_pmd_i40e/i40e_fdir.c b/lib/librte_pmd_i40e/i40e_fdir.c
index 3d8faa0..a3e6bd7 100644
--- a/lib/librte_pmd_i40e/i40e_fdir.c
+++ b/lib/librte_pmd_i40e/i40e_fdir.c
@@ -109,6 +109,53 @@ i40e_fdir_rx_queue_init(struct i40e_rx_queue *rxq)
 }

 /*
+ * Initialize the configuration of the byte stream extracted as flexible payload
+ * and the mask settings
+ */
+static inline void
+i40e_init_flx_pld(struct i40e_pf *pf)
+{
+   uint8_t pctype;
+   struct i40e_hw *hw = I40E_PF_TO_HW(pf);
+
+   /*
+* Define the bytes stream extracted as flexible payload in
+* field vector. By default, select 8 words from the beginning
+* of payload as flexible payload.
+*/
+   memset(pf->fdir.flex_set, 0, sizeof(pf->fdir.flex_set));
+
+   /* initialize the flexible payload for L2 payload*/
+   pf->fdir.flex_set[0][0].offset = 0;
+   pf->fdir.flex_set[0][0].size = 8;
+   I40E_WRITE_REG(hw, I40E_PRTQF_FLX_PIT(0), 0xC900);
+   I40E_WRITE_REG(hw, I40E_PRTQF_FLX_PIT(1), 0xFC29);/*non-used*/
+   I40E_WRITE_REG(hw, I40E_PRTQF_FLX_PIT(2), 0xFC2A);/*non-used*/
+
+   /* initialize the flexible payload for L3 payload*/
+   pf->fdir.flex_set[1][0].offset = 0;
+   pf->fdir.flex_set[1][0].size = 8;
+   I40E_WRITE_REG(hw, I40E_PRTQF_FLX_PIT(3), 0xC900);
+   I40E_WRITE_REG(hw, I40E_PRTQF_FLX_PIT(4), 0xFC29);/*non-used*/
+   I40E_WRITE_REG(hw, I40E_PRTQF_FLX_PIT(5), 0xFC2A);/*non-used*/
+
+   /* initialize the flexible payload for L4 payload*/
+   pf->fdir.flex_set[2][0].offset = 0;
+   pf->fdir.flex_set[2][0].size = 8;
+   I40E_WRITE_REG(hw, I40E_PRTQF_FLX_PIT(6), 0xC900);
+   I40E_WRITE_REG(hw, I40E_PRTQF_FLX_PIT(7), 0xFC29);/*non-used*/
+   I40E_WRITE_REG(hw, I40E_PRTQF_FLX_PIT(8), 0xFC2A);/*non-used*/
+
+   /* initialize the masks */
+   for (pctype = I40E_FILTER_PCTYPE_NONF_IPV4_UDP;
+pctype <= I40E_FILTER_PCTYPE_FRAG_IPV6; pctype++) {
+   I40E_WRITE_REG(hw, I40E_PRTQF_FD_FLXINSET(pctype), 0);
+   I40E_WRITE_REG(hw, I40E_PRTQF_FD_MSK(pctype, 0), 0);
+   I40E_WRITE_REG(hw, I40E_PRTQF_FD_MSK(pctype, 1), 0);
+   }
+}
+
+/*
  * i40e_fdir_setup - reserve and initialize the Flow Director resources
  * @pf: board private structure
  */
@@ -182,6 +229,8 @@ i40e_fdir_setup(struct i40e_pf *pf)
goto fail_mem;
}

+   i40e_init_flx_pld(pf);
+
/* reserve memory for the fdir programming packet */
snprintf(z_name, sizeof(z_name), "%s_%s_%d",
eth_dev->driver->pci_drv.name,

[dpdk-dev] [PATCH v3 02/20] i40e: tear down flow director

2014-09-26 Thread Jingjing Wu

Release the Fortville resources used by flow director, including:
 - queue 0 pair release
 - release vsi

Signed-off-by: Jingjing Wu 
Acked-by: Chen Jing D(Mark) 
Acked-by: Helin Zhang 
---
 lib/librte_pmd_i40e/i40e_ethdev.c |  4 +++-
 lib/librte_pmd_i40e/i40e_ethdev.h |  1 +
 lib/librte_pmd_i40e/i40e_fdir.c   | 19 +++
 3 files changed, 23 insertions(+), 1 deletion(-)
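
Since the teardown is called both from the init error path and from
i40e_dev_close(), it helps if it can be called more than once. A minimal
sketch of that pattern follows, using the same release order as the patch
(rx queue, tx queue, then vsi). It is a self-contained mock, not driver code:
every mock_* name is hypothetical, and the NULL-tolerant release helpers are an
assumption of the sketch rather than a statement about the i40e functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the queue and VSI objects. */
struct mock_queue { int id; };
struct mock_vsi { int seid; };

struct mock_fdir {
	struct mock_queue *rxq;
	struct mock_queue *txq;
	struct mock_vsi *fdir_vsi;
};

/* Counts releases of non-NULL objects so the behavior can be observed. */
static int mock_release_calls;

/* Assumption of this sketch: releasing NULL is a harmless no-op. */
static void
mock_queue_release(struct mock_queue *q)
{
	if (q == NULL)
		return;
	mock_release_calls++;
	free(q);
}

static void
mock_vsi_release(struct mock_vsi *vsi)
{
	if (vsi == NULL)
		return;
	mock_release_calls++;
	free(vsi);
}

/* Release in the same order as the patch and clear each pointer afterwards. */
static void
mock_fdir_teardown(struct mock_fdir *fdir)
{
	assert(fdir != NULL);
	mock_queue_release(fdir->rxq);
	fdir->rxq = NULL;
	mock_queue_release(fdir->txq);
	fdir->txq = NULL;
	mock_vsi_release(fdir->fdir_vsi);
	fdir->fdir_vsi = NULL;
}

/* Allocate all three resources; undo everything if any allocation fails. */
static int
mock_fdir_setup(struct mock_fdir *fdir)
{
	assert(fdir != NULL);
	fdir->rxq = calloc(1, sizeof(*fdir->rxq));
	fdir->txq = calloc(1, sizeof(*fdir->txq));
	fdir->fdir_vsi = calloc(1, sizeof(*fdir->fdir_vsi));
	if (fdir->rxq == NULL || fdir->txq == NULL || fdir->fdir_vsi == NULL) {
		mock_fdir_teardown(fdir);
		return -1;
	}
	return 0;
}
```

Because each pointer is cleared right after its release, a second teardown call
sees only NULL pointers and releases nothing.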

diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c b/lib/librte_pmd_i40e/i40e_ethdev.c
index d0413cb..da131a8 100644
--- a/lib/librte_pmd_i40e/i40e_ethdev.c
+++ b/lib/librte_pmd_i40e/i40e_ethdev.c
@@ -522,7 +522,8 @@ eth_i40e_dev_init(__rte_unused struct eth_driver *eth_drv,
return 0;

 err_setup_pf_switch:
-   rte_free(pf->main_vsi);
+   i40e_fdir_teardown(pf);
+   i40e_vsi_release(pf->main_vsi);
 err_get_mac_addr:
 err_configure_lan_hmc:
(void)i40e_shutdown_lan_hmc(hw);
@@ -850,6 +851,7 @@ i40e_dev_close(struct rte_eth_dev *dev)
i40e_shutdown_lan_hmc(hw);

/* release all the existing VSIs and VEBs */
+   i40e_fdir_teardown(pf);
i40e_vsi_release(pf->main_vsi);

/* shutdown the adminq */
diff --git a/lib/librte_pmd_i40e/i40e_ethdev.h b/lib/librte_pmd_i40e/i40e_ethdev.h
index a2b1578..2460635 100644
--- a/lib/librte_pmd_i40e/i40e_ethdev.h
+++ b/lib/librte_pmd_i40e/i40e_ethdev.h
@@ -340,6 +340,7 @@ enum i40e_status_code i40e_fdir_setup_tx_resources(struct i40e_pf *pf,
 enum i40e_status_code i40e_fdir_setup_rx_resources(struct i40e_pf *pf,
unsigned int socket_id);
 int i40e_fdir_setup(struct i40e_pf *pf);
+void i40e_fdir_teardown(struct i40e_pf *pf);

 /* I40E_DEV_PRIVATE_TO */
 #define I40E_DEV_PRIVATE_TO_PF(adapter) \
diff --git a/lib/librte_pmd_i40e/i40e_fdir.c b/lib/librte_pmd_i40e/i40e_fdir.c
index 5872494..3d8faa0 100644
--- a/lib/librte_pmd_i40e/i40e_fdir.c
+++ b/lib/librte_pmd_i40e/i40e_fdir.c
@@ -218,3 +218,22 @@ fail_setup_tx:
pf->fdir.fdir_vsi = NULL;
return err;
 }
+
+/*
+ * i40e_fdir_teardown - release the Flow Director resources
+ * @pf: board private structure
+ */
+void
+i40e_fdir_teardown(struct i40e_pf *pf)
+{
+   struct i40e_vsi *vsi;
+
+   vsi = pf->fdir.fdir_vsi;
+   i40e_dev_rx_queue_release(pf->fdir.rxq);
+   pf->fdir.rxq = NULL;
+   i40e_dev_tx_queue_release(pf->fdir.txq);
+   pf->fdir.txq = NULL;
+   i40e_vsi_release(vsi);
+   pf->fdir.fdir_vsi = NULL;
+   return;
+}
\ No newline at end of file
-- 
1.8.1.4

[dpdk-dev] [PATCH v3 01/20] i40e: set up and initialize flow director

2014-09-26 Thread Jingjing Wu

Set up the Fortville resources needed to support flow director, including:
 - queue 0 pair allocated and set up for flow director
 - create vsi
 - reserve memzone for flow director programming packet

Signed-off-by: Jingjing Wu 
Acked-by: Chen Jing D(Mark) 
Acked-by: Helin Zhang 
---
 lib/librte_pmd_i40e/Makefile  |   2 +
 lib/librte_pmd_i40e/i40e_ethdev.c |  77 +++--
 lib/librte_pmd_i40e/i40e_ethdev.h |  30 +-
 lib/librte_pmd_i40e/i40e_fdir.c   | 220 ++
 lib/librte_pmd_i40e/i40e_rxtx.c   | 127 ++
 5 files changed, 444 insertions(+), 12 deletions(-)
 create mode 100644 lib/librte_pmd_i40e/i40e_fdir.c

diff --git a/lib/librte_pmd_i40e/Makefile b/lib/librte_pmd_i40e/Makefile
index 4b31675..fdb9e26 100644
--- a/lib/librte_pmd_i40e/Makefile
+++ b/lib/librte_pmd_i40e/Makefile
@@ -87,6 +87,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_ethdev.c
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_rxtx.c
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_ethdev_vf.c
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_pf.c
+SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_fdir.c
+
 # this lib depends upon:
 DEPDIRS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += lib/librte_eal lib/librte_ether
 DEPDIRS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += lib/librte_mempool lib/librte_mbuf
diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c b/lib/librte_pmd_i40e/i40e_ethdev.c
index a00d6ca..d0413cb 100644
--- a/lib/librte_pmd_i40e/i40e_ethdev.c
+++ b/lib/librte_pmd_i40e/i40e_ethdev.c
@@ -779,6 +779,12 @@ i40e_dev_start(struct rte_eth_dev *dev)
i40e_vsi_queues_bind_intr(vsi);
i40e_vsi_enable_queues_intr(vsi);

+   /* enable FDIR MSIX interrupt */
+   if (pf->flags & I40E_FLAG_FDIR) {
+   i40e_vsi_queues_bind_intr(pf->fdir.fdir_vsi);
+   i40e_vsi_enable_queues_intr(pf->fdir.fdir_vsi);
+   }
+
/* Enable all queues which have been configured */
ret = i40e_vsi_switch_queues(vsi, TRUE);
if (ret != I40E_SUCCESS) {
@@ -2587,16 +2593,30 @@ i40e_vsi_setup(struct i40e_pf *pf,
case I40E_VSI_SRIOV :
vsi->nb_qps = pf->vf_nb_qps;
break;
+   case I40E_VSI_FDIR:
+   vsi->nb_qps = pf->fdir_nb_qps;
+   break;
default:
goto fail_mem;
}
-   ret = i40e_res_pool_alloc(&pf->qp_pool, vsi->nb_qps);
-   if (ret < 0) {
-   PMD_DRV_LOG(ERR, "VSI %d allocate queue failed %d",
-   vsi->seid, ret);
-   goto fail_mem;
-   }
-   vsi->base_queue = ret;
+   /*
+* The filter status descriptor is reported in rx queue 0,
+* while the tx queue used for fdir filter programming has no
+* such constraint and can be any queue.
+* To keep it simple, the FDIR vsi uses the queue 0 pair.
+* To make sure it gets the queue 0 pair, queue allocation
+* needs to be done before this function is called.
+*/
+   if (type != I40E_VSI_FDIR) {
+   ret = i40e_res_pool_alloc(&pf->qp_pool, vsi->nb_qps);
+   if (ret < 0) {
+   PMD_DRV_LOG(ERR, "VSI %d allocate queue failed %d",
+   vsi->seid, ret);
+   goto fail_mem;
+   }
+   vsi->base_queue = ret;
+   } else
+   vsi->base_queue = I40E_FDIR_QUEUE_ID;

/* VF has MSIX interrupt in VF range, don't allocate here */
if (type != I40E_VSI_SRIOV) {
@@ -2728,8 +2748,24 @@ i40e_vsi_setup(struct i40e_pf *pf,
 * Since VSI is not created yet, only configure parameter,
 * will add vsi below.
 */
-   }
-   else {
+   } else if (type == I40E_VSI_FDIR) {
+   vsi->uplink_seid = uplink_vsi->uplink_seid;
+   ctxt.pf_num = hw->pf_id;
+   ctxt.vf_num = 0;
+   ctxt.uplink_seid = vsi->uplink_seid;
+   ctxt.connection_type = 0x1; /* regular data port */
+   ctxt.flags = I40E_AQ_VSI_TYPE_PF;
+   ret = i40e_vsi_config_tc_queue_mapping(vsi, &ctxt.info,
+   I40E_DEFAULT_TCMAP);
+   if (ret != I40E_SUCCESS) {
+   PMD_DRV_LOG(ERR, "Failed to configure "
+   "TC queue mapping\n");
+   goto fail_msix_alloc;
+   }
+   ctxt.info.up_enable_bits = I40E_DEFAULT_TCMAP;
+   ctxt.info.valid_sections |=
+   rte_cpu_to_le_16(I40E_AQ_VSI_PROP_SCHED_VALID);
+   } else {
PMD_DRV_LOG(ERR, "VSI: Not support other type VSI yet");
goto fail_msix_alloc;
}
@@ -2915,8 +2951,16 @@ i40e_pf_setup(struct i40e_pf *pf)
PMD_DRV_LOG(ERR, "Could not get switch config, err %d", ret);
return ret;
}
-
-

[dpdk-dev] [PATCH v3 00/20] Support flow director programming on Fortville

2014-09-26 Thread Jingjing Wu

This patch set adds flow director support on Fortville.
It includes:
 - set up/tear down Fortville resources to support flow director, such as queue
   and vsi.
 - define new APIs to support multi-kind filters and their operations.
 - support operations to add or delete 8 flow types of flow director filters:
   ipv4, tcpv4, udpv4, sctpv4, ipv6, tcpv6, udpv6, sctpv6.
 - support flushing the flow director table (all filters).
 - support operation to get flow director information.
 - match status statistics and FD_ID report.
 - support operation to configure flexible payload and its mask.
 - support flexible payload involved in comparison.

v2 changes:
 - create real fdir vsi and assign queue 0 pair to it.
 - check filter status report on the rx queue 0

v3 changes:
 - redefine filter APIs to support multi-kind filters
 - support sctpv4 and sctpv6 type flows
 - support flexible payload involved in comparison

Jingjing Wu (20):
  i40e: set up and initialize flow director
  i40e: tear down flow director
  i40e: initialize flexible payload setting
  lib/librte_ether: new filter APIs definition
  lib/librte_ether: define structures for adding/deleting flow director
  i40e: implement operations to add/delete flow director
  app/test-pmd: add test commands to add/delete flow director filter
  i40e: match counter for flow director
  i40e: report flow director match info to mbuf
  lib/librte_ether: define structures for getting flow director information
  i40e: implement operations to get fdir info
  app/test-pmd: display fdir statistics
  i40e: implement operation to flush flow director table
  app/test-pmd: add test command to flush flow director table
  lib/librte_ether: define structures for configuring flexible payload
  i40e: implement operations to configure flexible payload
  app/test-pmd: add test command to configure flexible payload
  lib/librte_ether: define structures for configuring flex masks
  i40e: implement operations to configure flexible masks
  app/test-pmd: add test command to configure flexible masks

 app/test-pmd/cmdline.c|  813 +
 app/test-pmd/config.c |   40 +-
 app/test-pmd/testpmd.h|3 +
 lib/librte_ether/Makefile |1 +
 lib/librte_ether/rte_eth_ctrl.h   |  343 +++
 lib/librte_ether/rte_ethdev.c |   32 +
 lib/librte_ether/rte_ethdev.h |   67 ++-
 lib/librte_pmd_i40e/Makefile  |2 +
 lib/librte_pmd_i40e/i40e_ethdev.c |  148 -
 lib/librte_pmd_i40e/i40e_ethdev.h |   34 +-
 lib/librte_pmd_i40e/i40e_fdir.c   | 1202 +
 lib/librte_pmd_i40e/i40e_rxtx.c   |  175 ++
 12 files changed, 2815 insertions(+), 45 deletions(-)
 create mode 100644 lib/librte_ether/rte_eth_ctrl.h
 create mode 100644 lib/librte_pmd_i40e/i40e_fdir.c

-- 
1.8.1.4

[dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:

2014-09-26 Thread Wodkowski, PawelX

> > > Maybe I don't see something obvious? :)
> 
> I think you're missing the fact that your patch doesn't do what you assert
> above either :)

The issue is not in setting alarms but in canceling them. If you look closer at
my patch, you will see that it addresses this issue (look at the added
*do { lock(); ; unlock(); } while( )* part).

> 
> First, let's address rte_alarm_set.  There is no notion of "re-arming" in this
> alarm implementation, because there's no ability to refer to a specific alarm
> from the caller's perspective.  When you call rte_eal_alarm_set you get a new
> alarm every time.  So I don't really see a race there.  It might not be exactly
> the behavior you want, but it's not a race, because you're not modifying an
> alarm in the middle of execution, you're just creating a new alarm, which is
> safe.

OK, it is safe, but that is not the case in question.

> 
> There is a race in what you describe above, insofar as it's possible that you
> might call rte_eal_alarm_cancel and return without having canceled all the
> matching alarms.  I don't see any clear documentation on what the behavior is
> supposed to be, but if you want to ensure that all matching alarms are
> cancelled or complete on return from rte_eal_alarm_cancel, that's perfectly
> fine (in Linux API parlance, that's usually denoted as a cancel_sync
> operation).

Again, look at the patch. I changed the documentation to describe this behavior.

> 
> For that race condition, you're correct, my patch doesn't address it, I see
> that now.  Though your patch doesn't either.  If you call rte_eal_alarm_cancel
> from within a callback function, then, by definition, you can't wait on the
> completion of the active alarm, because that's a deadlock.  It's a necessary
> evil, I grant you, but it means that you can't be guaranteed the cancelled and
> complete (cancel_sync) behavior that you want, at least not with the current
> API.  If you want that behavior, you need to do one of two things:

This patch does not break any API. It only removes undefined behavior.

> 
> 1) Modify the API to allow callers to individually reference timer instances,
> so that when cancelling, we can return an appropriate return code to indicate
> to the caller that this alarm is in progress.  That way you can guarantee the
> caller that the specific alarm that you cancelled is either complete and
> cancelled or currently executing.  Add an API to explicitly wait on a
> referenced alarm as well.  This allows developers to know that, when executing
> an alarm callback, an -ECURRENTLYEXECUTING return code is OK, because they are
> in the currently executing context.

This would break the API for sure.

[dpdk-dev] GSO support by PMD drivers

2014-09-26 Thread Vadim Suraev

Hi, all,
I found that ixgbe together with rte_mbuf (and probably other PMD drivers)
does not support GSO. I reverse engineered the Linux kernel ixgbe driver's GSO
support and got it working in 1.6. Would it be useful to provide the patch?
Regards,
 Vadim.

[dpdk-dev] [PATCH v3] ethdev: Rename RX/TX enable queue field for queue start and stop

2014-09-26 Thread Ouyang Changchun

V3 change:
 - Rename field name to rx_deferred_start/tx_deferred_start in
   both ixgbe and i40e PMD. 
 - Move the doxygen comments for rx_deferred_start after it is declared.
 - Simplify/split the long description and move some to doxygen comments of
   rte_eth_dev_rx_queue_start and rte_eth_dev_tx_queue_start.

V2 and V1 change:
 - Update comments for the field start_rx_per_q for better readability.
 - Rename the field name to rx_enable_queue for better readability too.
 - Update its references in the vhost sample accordingly.

Signed-off-by: Changchun Ouyang 
---
 examples/vhost/main.c|  9 +++--
 lib/librte_ether/rte_ethdev.h| 11 +++
 lib/librte_pmd_i40e/i40e_ethdev.c|  4 ++--
 lib/librte_pmd_i40e/i40e_ethdev_vf.c |  4 ++--
 lib/librte_pmd_i40e/i40e_rxtx.c  |  4 ++--
 lib/librte_pmd_i40e/i40e_rxtx.h  |  4 ++--
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c|  8 
 lib/librte_pmd_ixgbe/ixgbe_rxtx.h|  6 --
 8 files changed, 30 insertions(+), 20 deletions(-)
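
To illustrate the semantics the new field names are meant to convey, here is a
minimal, self-contained sketch. It is a mock, not PMD code: struct mock_rxq,
struct mock_port, mock_dev_start() and mock_rx_queue_start() are hypothetical,
and only the field name rx_deferred_start and the skip condition (a queue that
is not configured, or that is marked deferred, is left alone when the device
starts) are taken from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_MAX_QUEUES 4

/* Hypothetical per-queue state; rx_deferred_start is the name from the patch. */
struct mock_rxq {
	bool q_set;                /* queue has been configured */
	uint8_t rx_deferred_start; /* do not start in the device-wide start */
	bool started;
};

struct mock_port {
	struct mock_rxq rxq[MOCK_MAX_QUEUES];
};

/* Device-wide start: skips unconfigured queues and deferred queues. */
static void
mock_dev_start(struct mock_port *port)
{
	size_t i;

	assert(port != NULL);
	for (i = 0; i < MOCK_MAX_QUEUES; i++) {
		struct mock_rxq *q = &port->rxq[i];

		if (!q->q_set || q->rx_deferred_start)
			continue;
		q->started = true;
	}
}

/* Explicit per-queue start, used later for queues that were deferred. */
static int
mock_rx_queue_start(struct mock_port *port, uint16_t queue_id)
{
	assert(port != NULL);
	if (queue_id >= MOCK_MAX_QUEUES || !port->rxq[queue_id].q_set)
		return -1;
	port->rxq[queue_id].started = true;
	return 0;
}
```

In the vhost zero copy case described in this patch, the application would set
the deferred flag at queue setup, call the device start, and invoke the
per-queue start only once the guest has made its buffers available.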

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 85ee8b8..4fd164d 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -3607,9 +3607,14 @@ MAIN(int argc, char *argv[])
char pool_name[RTE_MEMPOOL_NAMESIZE];
char ring_name[RTE_MEMPOOL_NAMESIZE];

-   rx_conf_default.start_rx_per_q = (uint8_t)zero_copy;
+   /*
+* Zero copy defers queue RX/TX start to the time when guest
+* finishes its startup and packet buffers from that guest are
+* available.
+*/
+   rx_conf_default.rx_deferred_start = (uint8_t)zero_copy;
rx_conf_default.rx_drop_en = 0;
-   tx_conf_default.start_tx_per_q = (uint8_t)zero_copy;
+   tx_conf_default.tx_deferred_start = (uint8_t)zero_copy;
nb_mbuf = num_rx_descriptor
+ num_switching_cores * MBUF_CACHE_SIZE_ZCP
+ num_switching_cores * MAX_PKT_BURST;
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 60b24c5..bef8962 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -604,7 +604,7 @@ struct rte_eth_rxconf {
struct rte_eth_thresh rx_thresh; /**< RX ring threshold registers. */
uint16_t rx_free_thresh; /**< Drives the freeing of RX descriptors. */
uint8_t rx_drop_en; /**< Drop packets if no descriptors are available. */
-   uint8_t start_rx_per_q; /**< start rx per queue. */
+   uint8_t rx_deferred_start; /**< Do not start RX in rte_eth_dev_start(). */
 };

 #define ETH_TXQ_FLAGS_NOMULTSEGS 0x0001 /**< nb_segs=1 for all mbufs */
@@ -625,7 +625,7 @@ struct rte_eth_txconf {
uint16_t tx_rs_thresh; /**< Drives the setting of RS bit on TXDs. */
uint16_t tx_free_thresh; /**< Drives the freeing of TX buffers. */
uint32_t txq_flags; /**< Set flags for the Tx queue */
-   uint8_t start_tx_per_q; /**< start tx per queue. */
+   uint8_t tx_deferred_start; /**< Do not start TX in rte_eth_dev_start(). */
 };

 /**
@@ -1795,7 +1795,9 @@ extern int rte_eth_tx_queue_setup(uint8_t port_id, uint16_t tx_queue_id,
 extern int rte_eth_dev_socket_id(uint8_t port_id);

 /*
- * Start specified RX queue of a port
+ * Allocate mbufs from the mempool, set up the DMA physical addresses
+ * and then start RX for the specified queue of a port. It is used
+ * when the rx_deferred_start flag of the specified queue is true.
  *
  * @param port_id
  *   The port identifier of the Ethernet device
@@ -1827,7 +1829,8 @@ extern int rte_eth_dev_rx_queue_start(uint8_t port_id, uint16_t rx_queue_id);
 extern int rte_eth_dev_rx_queue_stop(uint8_t port_id, uint16_t rx_queue_id);

 /*
- * Start specified TX queue of a port
+ * Start TX for specified queue of a port. It is used when tx_deferred_start
+ * flag of the specified queue is true.
  *
  * @param port_id
  *   The port identifier of the Ethernet device
diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c b/lib/librte_pmd_i40e/i40e_ethdev.c
index a00d6ca..26f1799 100644
--- a/lib/librte_pmd_i40e/i40e_ethdev.c
+++ b/lib/librte_pmd_i40e/i40e_ethdev.c
@@ -3017,7 +3017,7 @@ i40e_vsi_switch_tx_queues(struct i40e_vsi *vsi, bool on)
txq = dev_data->tx_queues[i];
/* Don't operate the queue if not configured or
 * if starting only per queue */
-   if (!txq->q_set || (on && txq->start_tx_per_q))
+   if (!txq->q_set || (on && txq->tx_deferred_start))
continue;
if (on)
ret = i40e_dev_tx_queue_start(dev, i);
@@ -3095,7 +3095,7 @@ i40e_vsi_switch_rx_queues(struct i40e_vsi *vsi, bool on)
rxq = dev_data->rx_queues[i];
/* Don't operate the queue if not configured or
 * if starting only per queue */
-   if (!rxq->q_set || (on &&

[dpdk-dev] Hi all, does Amazon VMs supported DPDK or not?

2014-09-26 Thread Dong, Binghua

A customer plans to buy Amazon VMs worldwide to run their VPN applications,
which are based on DPDK 1.3 (to be upgraded to DPDK 1.6 or 1.7), at global sites.

Thanks a lot;

[dpdk-dev] [PATCH 0/4] Add DSO symbol versioning to support backwards compatibility

2014-09-26 Thread Thomas Monjalon

Hi Neil,

2014-09-24 14:19, Neil Horman:
> Ping Thomas. I know you're busy, but I would like this to not fall off anyone's
> radar.  You alluded to concerns regarding what is, for lack of a better term,
> ABI/API lock-in.  I had asked you to enumerate/elaborate on specifics, but
> never heard back.  Are there further specifics you wish to discuss, or are you
> satisfied with the above answers?

Sorry for not being very responsive on this thread.
All this discussion is very interesting but it's really not the proper
time to apply it. As you said, it requires an extra effort. I'm not saying
it will never be integrated. I'm just saying that we cannot change
everything at the same time.

Let me sum up the situation. This community project has been very active
for a few months now. First, we learnt how to make some releases together
and we are improving the process to be able to deliver a new major release
every 4 months while having some good quality process.
But these releases are still not complete because documentation is not
integrated yet. Then developers should have a role in documentation updates.
We also need to integrate and learn how to use more tools to be more
efficient and improve quality.

So the question is "when should we care about API compatibility"?
And the answer is: ASAP, but not now. I feel next year is a better target.
Because the most important priority is to move together at a pace which
allows most of us to stay in the race.

-- 
Thomas

[dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:

2014-09-26 Thread Wodkowski, PawelX

> So basically cancel() just set ALARM_CANCELLED and leaves actual alarm
> deletion to the callback()?
> That was the thought, yes.
> 
> > I think it is doable - but I don't see any real advantage with that
> > approach.
> > Yes, code will become a bit simpler, as we'll have one point when we remove
> > alarm from the list.
> Yes, that would be the advantage, that the code would be much simpler.
> 
> > But from the other side, imagine such a simple test-case:
> >
> > for (i = 0; i < 0x100000; i++) {
> >    rte_eal_alarm_set(ONE_MIN, cb_func, (void *)i);
> >    rte_eal_alarm_cancel(cb_func, (void *)i);
> > }
> >
> > We'll end up with 1M of cancelled, but still not removed entries in the
> > alarm_list.
> > With current implementation that means - few MBs of wasted memory,
> That's correct, and that's the tradeoff to choose between.  Do you want simpler
> code that is easier to maintain, or do you want a high speed cancel and set
> operation?  I'm not aware of all the use cases, but I have a hard time seeing
> a use case in which the in-flight alarm list grows unboundedly large, which in
> my mind mitigates the risk of deferred removal, but I'm perfectly willing to
> believe that there are use cases which I'm not aware of.
> 
> > plus very slow set() and cancel(), as they'll have to traverse all entries
> > in the list.
> > And all that - for an alarm_list that is empty from the user's perspective.
> > So I still prefer Michal's way.
> > After all, it doesn't look that complicated to me.
> Except that the need for Michal's fix arose from the fact that we have two free
> locations that might both get called depending on the situation.  That's what
> I'm trying to address here, the complexity itself, rather than the fix (which I
> agree is perfectly valid).
> 
> > BTW, any particular reason you are so negative about pthread_self()?
> >
> Nothing specifically against it (save for its inverted return code sense,
> which made it difficult for me to parse when reviewing).  It's more the
> complexity itself in the alarm cancel and callback routine that I was looking
> at.  Given that the original bug happened because a cancel in a callback might
> produce a double free, I wanted to fix it by simplifying the code, not adding
> conditions which make it more complex.
> 
> You know, looking at it, something else just occurred to me.  I think this
> could all be fixed without the cancel flag or the pthread_self assignments.
> What if we simply removed the alarm from the list before we called the callback
> in rte_eal_alarm_callback()?  That way any cancel operation called from within
> the callback would fail, as it wouldn't appear on the list, and the callback
> operation would be the only freeing entity?  That would let you still have a
> fast set and cancel, and avoid the race.  Thoughts?  Untested sample patch
> below
> 
> 
> > >
> > > It also seems like the alarm API as a whole could use some improvement.
> > > The way it's written right now, there's no way to refer to a specific alarm
> > > (i.e. cancellation relies on the specification of a function and data
> > > pointer, which may refer to multiple timers).  Shouldn't rte_eal_alarm_set
> > > return an opaque handle to a unique timer instance that can be stored by a
> > > caller and used to specifically cancel that timer?  That's how both the BSD
> > > and Linux timer subsystems model timers.
> >
> > Yeh,  alarm API looks a bit unusual.
> > Though, I suppose that's subject for another patch/discussion :)
> >
> Yes, agreed :)
> 

Please read the quoted message below:

> >
> >
> > His solution *does* eliminate race condition too.
> >
> I applied his patch. And here is the problem
>  1 rte_spinlock_lock(&alarm_list_lk);
>  2 while ((ap = LIST_FIRST(&alarm_list)) != NULL &&
>  3         gettimeofday(&now, NULL) == 0 &&
>  4         (ap->time.tv_sec < now.tv_sec || (ap->time.tv_sec == now.tv_sec &&
>  5         ap->time.tv_usec <= now.tv_usec))) {
>  6     ap->executing |= ALARM_EXECUTING;
>  7     if (likely(!(ap->executing & ALARM_CANCELLED))) {
>  8         rte_spinlock_unlock(&alarm_list_lk);
>  9         // another thread: rte_alarm_cancel called, marks this timer canceled and exits (THE RACE)
> 10         ap->cb_fn(ap->cb_arg); // rte_alarm_set called (THE RACE)
> 11
> 12         rte_spinlock_lock(&alarm_list_lk);
> 13     }
> 14
> 15     rte_spinlock_lock(&alarm_list_lk);
> 16     LIST_REMOVE(ap, next);
> 17     rte_free(ap);
> 18 }
> 
> Imagine
> 
> Thread 1:                        Thread 2:
> Execute eal_alarm_callback
> Lock list at line 1              rte_alarm_cancel -> blocks on spinlock
> 
> Release lock at line 8           rte_alarm_cancel -> resumes execution -> it marks this timer canceled
> ap->cb_fn is called at line 10   rte_alarm_cancel
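
The remove-before-callback idea quoted earlier in this message can be sketched
roughly as follows. This is a minimal, self-contained illustration and not the
DPDK implementation: the names alarm_set, alarm_cancel, alarm_run_all and the
atomic_flag spinlock are assumptions made for the example, and expiry-time
checks are left out. It shows that a cancel issued from inside the callback
finds nothing on the list, so the callback path performs the only free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

typedef void (*alarm_cb)(void *arg);

struct alarm_entry {
    struct alarm_entry *next;
    alarm_cb cb_fn;
    void *cb_arg;
};

/* Hypothetical stand-ins for the alarm list and its spinlock. */
static struct alarm_entry *alarm_list;
static atomic_flag alarm_list_lk = ATOMIC_FLAG_INIT;

static void list_lock(void)   { while (atomic_flag_test_and_set(&alarm_list_lk)) { } }
static void list_unlock(void) { atomic_flag_clear(&alarm_list_lk); }

/* Queue a new alarm; every call allocates a fresh entry. */
int alarm_set(alarm_cb cb_fn, void *cb_arg)
{
    assert(cb_fn != NULL);
    struct alarm_entry *ap = malloc(sizeof(*ap));
    if (ap == NULL)
        return -1;
    ap->cb_fn = cb_fn;
    ap->cb_arg = cb_arg;
    list_lock();
    ap->next = alarm_list;
    alarm_list = ap;
    list_unlock();
    return 0;
}

/* Remove and free every queued alarm matching (cb_fn, cb_arg). */
int alarm_cancel(alarm_cb cb_fn, void *cb_arg)
{
    int count = 0;
    list_lock();
    struct alarm_entry **pp = &alarm_list;
    while (*pp != NULL) {
        struct alarm_entry *ap = *pp;
        if (ap->cb_fn == cb_fn && ap->cb_arg == cb_arg) {
            *pp = ap->next;
            free(ap);
            count++;
        } else {
            pp = &ap->next;
        }
    }
    list_unlock();
    return count;
}

/* Run every queued alarm, unlinking each entry BEFORE its callback runs. */
int alarm_run_all(void)
{
    int ran = 0;
    list_lock();
    while (alarm_list != NULL) {
        struct alarm_entry *ap = alarm_list;
        alarm_list = ap->next;     /* entry is no longer visible to cancel */
        list_unlock();
        ap->cb_fn(ap->cb_arg);     /* a cancel in here cannot find ap */
        free(ap);                  /* the only place ap is freed */
        ran++;
        list_lock();
    }
    list_unlock();
    return ran;
}

/* Demo callback: tries to cancel itself and records how many it removed. */
void self_cancelling_cb(void *arg)
{
    *(int *)arg = alarm_cancel(self_cancelling_cb, arg);
}
```

Note that this only removes the double free. A cancel from a second thread
still returns while the callback may be running, so it does not by itself give
the "cancelled and complete" guarantee discussed elsewhere in this thread.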

[dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:

2014-09-26 Thread Neil Horman

On Fri, Sep 26, 2014 at 03:41:58PM +, Ananyev, Konstantin wrote:
> 
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Neil Horman
> > Sent: Friday, September 26, 2014 4:02 PM
> > To: Wodkowski, PawelX
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:
> > 
> > On Fri, Sep 26, 2014 at 02:01:05PM +, Wodkowski, PawelX wrote:
> > > > > > Maybe I don't see something obvious? :)
> > > >
> > > > I think you're missing the fact that your patch doesn't do what you
> > > > assert above either :)
> > >
> > > The issue is not in setting alarms but in canceling them. If you look closer
> > > at my patch you will see that it addresses this issue (look at the added
> > > *do { lock(); ; unlock(); } while( )* part).
> > >
> > I get where the issue is, and I'm looking at your patch.  I see that you did
> > some locking there.  The issue I'm pointing out is that, if you call
> > rte_eal_alarm_cancel from an alarm callback, you will exit the alarm_cancel
> > function with, by definition, one alarm executing (the one you are currently
> > running).  Your patch works perfectly for the case where another thread
> > calls cancel, in that it waits until the executing alarm is complete, but it
> > doesn't work in the case where you are calling it from within the alarm
> > callback.
> 
> Hm, and why do we need it from alarm callback?
Because you might not know if you're in an alarm callback or not. Pawel
explained that the point of the patch was to ensure that alarms are canceled and
complete when you call rte_eal_alarm_cancel, and that's not always going to be
the case, even with this patch.

> After cb_func() is finished given alarm entry will be removed anyway.
> 
Yes, but that's true with or without this patch.

> > If your goal is to guarantee that all the matching alarms are cancelled
> > and complete, you haven't done that, because the recursive state is still
> > unhandled.
> > 
> > > >
> > > > First, let's address rte_alarm_set.  There is no notion of "re-arming"
> > > > in this alarm implementation, because there's no ability to refer to a
> > > > specific alarm from the caller's perspective.  When you call
> > > > rte_eal_alarm_set you get a new alarm every time.  So I don't really see a
> > > > race there.  It might not be exactly the behavior you want, but it's not a
> > > > race, because you're not modifying an alarm in the middle of execution,
> > > > you're just creating a new alarm, which is safe.
> > >
> > > OK, it is safe, but this is not the case.
> > >
> > I don't know what you mean by this.  We agree it's safe, great.  But it is
> > the case as I've described it, you can see it from the implementation, every
> > call to rte_eal_alarm_set starts with a malloc of a new alarm structure.
> > 
> > > >
> > > > There is a race in what you describe above, insofar as it's possible
> > > > that you might call rte_eal_alarm_cancel and return without having
> > > > canceled all the matching alarms.  I don't see any clear documentation on
> > > > what the behavior is supposed to be, but if you want to ensure that all
> > > > matching alarms are cancelled or complete on return from
> > > > rte_eal_alarm_cancel, that's perfectly fine (in Linux API parlance, that's
> > > > usually denoted as a cancel_sync operation).
> > >
> > > Again, look at the patch. I changed the documentation to inform about this
> > > behavior.
> > >
> > 
> > This is the documentation included in the patch:
> > Change alarm cancel function to thread-safe.
> > It eliminates a race between threads using rte_alarm_cancel and
> > rte_alarm_set.
> > 
> > Neither have you completely described the race condition (though you now
> > have previously in this thread), nor have you completely addressed it (calling
> > rte_eal_alarm_cancel and rte_eal_alarm_set still behaves exactly as it did
> > previously with a 2nd thread).
> > 
> > > >
> > > > For that race condition, you're correct, my patch doesn't address it, I
> > > > see that now.  Though your patch doesn't either.  If you call
> > > > rte_eal_alarm_cancel from within a callback function, then, by definition,
> > > > you can't wait on the completion of the active alarm, because that's a
> > > > deadlock.  It's a necessary evil, I grant you, but it means that you
> > > > can't be guaranteed the cancelled and complete (cancel_sync) behavior that
> > > > you want, at least not with the current API.  If you want that behavior,
> > > > you need to do one of two things:
> > >
> > > This patch does not break any API. It only removes undefined behavior.
> > >
> > I never said it did break ABI.  I said that to completely fix it you would
> > have to break ABI.  And it doesn't really remove undefined behavior, because
> > you still

[dpdk-dev] [PATCH v2] librte_pmd_packet: add PMD for AF_PACKET-based virtual devices

2014-09-26 Thread Thomas Monjalon

2014-09-16 16:16, Neil Horman:
> On Fri, Sep 12, 2014 at 02:05:23PM -0400, John W. Linville wrote:
> > Ping?  Are there objections to this patch from mid-July?
> 
> Thomas, where are you on this?  It seems like if you don't have any objections
> to this patch, it should go in, in light of the lack of further commentary.

1) It doesn't appear as a top priority.
2) It's competing with pcap PMD and bifurcated PMD to come
   (http://dpdk.org/ml/archives/dev/2014-September/005379.html)
3) There is no test associated with this PMD.
If one of these items becomes wrong, it should go in.

Currently, 2 projects are being initiated for validation (dcts) and
documentation. Keeping new things outside of the DPDK core makes it
clear that they do not have to be supported by dcts and doc yet.
So, it is better to have an external PMD, like memnic, acting as a
staging area.

During this time, keeping this PMD separately will allow you to update it
with a maintainer account in dpdk.org. I just need your SSH public key.

Thank you
-- 
Thomas

[dpdk-dev] patches validation

2014-09-26 Thread Thomas Monjalon

2014-09-25 23:29, Ananyev, Konstantin:
> From: Thomas Monjalon
> > 2014-09-25 13:07, Cao, Waterman:
> > >  I will work with the team to see if we can improve the test report.
> > >  Because the Intel validation team will continue to upgrade test cases to
> > >  verify features, I think that it's still worth verifying patches or
> > >  features even after they have already been integrated into the branch.
> > 
> > Of course, it's important to continue validation after integration.
> > But please do not send test report on the list for patches which are
> > already integrated, except for 2 cases:
> > 1) there is an error
> > 2) this is a new feature and you want to explain how to test it
> > (btw, how do you test "zero copy" and "one copy" for virtio?)
> > 
> > About report content, please add this information:
> > - commit id or tag used as a base to apply the patch
> > - tools used for the test (testpmd, sample, qemu, etc)
> > - command parameters if relevant
> > - test topology if relevant
> > 
> > If someone thinks of any useful information I missed, please share it.
> 
> Maybe it is just me, but what's wrong with a mail for every tested patch?
> At least it makes it easy to check whether the patch was formally validated
> or not - all you have to do is grep through the mail archives.

The right place to check something about a patch is the git history.
So it's important to send test reports before having it integrated in git.
Doing so, without any reference to a commit id, implies that the patch is pending.
If you think it's really important to send a test report about an integrated
patch, the commit id must be clearly visible to quickly understand its status.
There is something else wrong about these test reports: there is no useful
information about how to reproduce the test.
So it's not forbidden to send any email you want but please try to be more
informative and easy to understand. We are getting huge email traffic so
everyone must be concerned about how to make it effective.

Thanks
-- 
Thomas

[dpdk-dev] [PATCH 1/4 v2] compat: Add infrastructure to support symbol versioning

2014-09-26 Thread Neil Horman

On Fri, Sep 26, 2014 at 03:16:08PM +0100, Sergio Gonzalez Monroy wrote:
> On Thu, Sep 25, 2014 at 02:52:32PM -0400, Neil Horman wrote:
> > Add initial pass header files to support symbol versioning.
> > 
> > ---
> > Change notes
> > v2)
> > > * Fixed ifdef in rte_compat.h to test for RTE_BUILD_SHARED_LIB instead of
> > > the non-existent RTE_SYMBOL_VERSIONING
> > > 
> > > * Fixed VERSION_SYMBOL macro to add the needed extra @ to make versioning
> > > work properly
> > 
> > * Improved/Clarified documentation
> > 
> > Signed-off-by: Neil Horman 
> > CC: Thomas Monjalon 
> > CC: "Richardson, Bruce" 
> > CC: "Gonzalez Monroy, Sergio" 
> > ---
> >  lib/Makefile   |  1 +
> >  lib/librte_compat/Makefile | 38 ++
> > >  lib/librte_compat/rte_compat.h | 87 ++
> >  mk/rte.lib.mk  |  6 +++
> >  4 files changed, 132 insertions(+)
> >  create mode 100644 lib/librte_compat/Makefile
> >  create mode 100644 lib/librte_compat/rte_compat.h
> > 
> > diff --git a/lib/Makefile b/lib/Makefile
> > index 10c5bb3..a85b55b 100644
> > --- a/lib/Makefile
> > +++ b/lib/Makefile
> > @@ -32,6 +32,7 @@
> >  include $(RTE_SDK)/mk/rte.vars.mk
> >  
> >  DIRS-$(CONFIG_RTE_LIBC) += libc
> > +DIRS-y += librte_compat
> >  DIRS-$(CONFIG_RTE_LIBRTE_EAL) += librte_eal
> >  DIRS-$(CONFIG_RTE_LIBRTE_MALLOC) += librte_malloc
> >  DIRS-$(CONFIG_RTE_LIBRTE_RING) += librte_ring
> > diff --git a/lib/librte_compat/Makefile b/lib/librte_compat/Makefile
> > new file mode 100644
> > index 000..3415c7b
> > --- /dev/null
> > +++ b/lib/librte_compat/Makefile
> > @@ -0,0 +1,38 @@
> > +#   BSD LICENSE
> > +#
> > +#   Copyright(c) 2010-2014 Neil Horman 
> > +#   All rights reserved.
> > +#
> > +#   Redistribution and use in source and binary forms, with or without
> > +#   modification, are permitted provided that the following conditions
> > +#   are met:
> > +#
> > +# * Redistributions of source code must retain the above copyright
> > +#   notice, this list of conditions and the following disclaimer.
> > +# * Redistributions in binary form must reproduce the above copyright
> > +#   notice, this list of conditions and the following disclaimer in
> > +#   the documentation and/or other materials provided with the
> > +#   distribution.
> > +# * Neither the name of Intel Corporation nor the names of its
> > +#   contributors may be used to endorse or promote products derived
> > +#   from this software without specific prior written permission.
> > +#
> > +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > +#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > +#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > +#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > +#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > +
> > +include $(RTE_SDK)/mk/rte.vars.mk
> > +
> > +
> > +# install includes
> > +SYMLINK-y-include := rte_compat.h
> > +
> > +include $(RTE_SDK)/mk/rte.lib.mk
> > diff --git a/lib/librte_compat/rte_compat.h b/lib/librte_compat/rte_compat.h
> > new file mode 100644
> > index 000..cff9aea
> > --- /dev/null
> > +++ b/lib/librte_compat/rte_compat.h
> > @@ -0,0 +1,87 @@
> > +/*-
> > + *   BSD LICENSE
> > + *
> > + *   Copyright(c) 2010-2014 Neil Horman .
> > + *   All rights reserved.
> > + *
> > + *   Redistribution and use in source and binary forms, with or without
> > + *   modification, are permitted provided that the following conditions
> > + *   are met:
> > + *
> > + * * Redistributions of source code must retain the above copyright
> > + *   notice, this list of conditions and the following disclaimer.
> > + * * Redistributions in binary form must reproduce the above copyright
> > + *   notice, this list of conditions and the following disclaimer in
> > + *   the documentation and/or other materials provided with the
> > + *   distribution.
> > + * * Neither the name of Intel Corporation nor the names of its
> > + *   contributors may be used to endorse or promote products derived
> > + *   from this software without specific prior written permission.
> > + *
> > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > + *

[dpdk-dev] [PATCH] eal: remove rte_snprintf

2014-09-26 Thread Neil Horman

On Fri, Sep 26, 2014 at 04:39:44PM +0200, Thomas Monjalon wrote:
> The function rte_snprintf() was deprecated in version 1.7.0
> (commit 6f41fe75e2dd).
> It's now totally removed.
> 
> Signed-off-by: Thomas Monjalon 
Acked-by: Neil Horman 

>

[dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:

2014-09-26 Thread Neil Horman

On Fri, Sep 26, 2014 at 02:01:05PM +, Wodkowski, PawelX wrote:
> > > > Maybe I don't see something obvious? :)
> > 
> > I think you're missing the fact that your patch doesn't do what you assert
> > above either :)
> 
> The issue is not in setting alarms but in canceling them. If you look closer
> at my patch you will see that it addresses this issue (look at the added
> *do { lock(); ; unlock(); } while( )* part).
> 
I get where the issue is, and I'm looking at your patch.  I see that you did
some locking there.  The issue I'm pointing out is that, if you call
rte_eal_alarm_cancel from an alarm callback, you will exit the alarm_cancel
function with, by definition, one alarm executing (the one you are currently
running).  Your patch works perfectly for the case where another thread calls
cancel, in that it waits until the executing alarm is complete, but it doesn't
work in the case where you are calling it from within the alarm callback. If
your goal is to guarantee that all the matching alarms are cancelled and
complete, you haven't done that, because the recursive state is still unhandled.

> > 
> > First, let's address rte_alarm_set.  There is no notion of "re-arming" in
> > this alarm implementation, because there's no ability to refer to a specific
> > alarm from the caller's perspective.  When you call rte_eal_alarm_set you get
> > a new alarm every time.  So I don't really see a race there.  It might not be
> > exactly the behavior you want, but it's not a race, because you're not
> > modifying an alarm in the middle of execution, you're just creating a new
> > alarm, which is safe.
> 
> OK, it is safe, but this is not the case.
> 
I don't know what you mean by this.  We agree it's safe, great.  But it is the
case as I've described it, you can see it from the implementation, every call to
rte_eal_alarm_set starts with a malloc of a new alarm structure.

> > 
> > There is a race in what you describe above, insofar as it's possible that you
> > might call rte_eal_alarm_cancel and return without having canceled all the
> > matching alarms.  I don't see any clear documentation on what the behavior
> > is supposed to be, but if you want to ensure that all matching alarms are
> > cancelled or complete on return from rte_eal_alarm_cancel, that's perfectly
> > fine (in Linux API parlance, that's usually denoted as a cancel_sync
> > operation).
> 
> Again, look at the patch. I changed the documentation to inform about this
> behavior.
> 

This is the documentation included in the patch:
Change alarm cancel function to thread-safe.
It eliminates a race between threads using rte_alarm_cancel and
rte_alarm_set.

Neither have you completely described the race condition (though you now have
previously in this thread), nor have you completely addressed it (calling
rte_eal_alarm_cancel and rte_eal_alarm_set still behaves exactly as it did
previously with a 2nd thread).

> > 
> > For that race condition, you're correct, my patch doesn't address it, I see
> > that now.  Though your patch doesn't either.  If you call
> > rte_eal_alarm_cancel from within a callback function, then, by definition,
> > you can't wait on the completion of the active alarm, because that's a
> > deadlock.  It's a necessary evil, I grant you, but it means that you can't be
> > guaranteed the cancelled and complete (cancel_sync) behavior that you want,
> > at least not with the current API.  If you want that behavior, you need to do
> > one of two things:
> 
> This patch does not break any API. It only removes undefined behavior.
> 
I never said it did break ABI.  I said that to completely fix it you would have
to break ABI.  And it doesn't really remove undefined behavior, because you
still have the old behavior in the recursive case (which you may be OK with, I
don't know, but if you really want to address the behavior, you should address
this aspect of it).

> > 
> > 1) Modify the API to allow callers to individually reference timer
> > instances, so that when cancelling, we can return an appropriate return code
> > to indicate to the caller that this alarm is in progress.  That way you can
> > guarantee the caller that the specific alarm that you cancelled is either
> > complete and cancelled or currently executing.  Add an API to explicitly wait
> > on a referenced alarm as well.  This allows developers to know that, when
> > executing an alarm callback, an -ECURRENTLYEXECUTING return code is OK,
> > because they are in the currently executing context.
> 
> This would break the API for sure.
Yes, it would.  Bruce Richardson just made a major ABI break with his mbuf
cleanup set.  If there was a time to change ABI here, now would be the time I
think.

Neil

> 
>
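
To make option 1 above more concrete, here is a small single-threaded sketch of
a handle-based alarm API. Nothing in it exists in DPDK: the struct alarm
handle, the alarm_set_handle/alarm_cancel_handle/alarm_fire names, the
fixed-size table and the ALARM_ERR_EXECUTING code (standing in for the
-ECURRENTLYEXECUTING mentioned above) are all assumptions for illustration,
and locking is omitted. It shows how a cancel on a specific instance can report
"in progress" when issued from inside that alarm's own callback.

```c
#include <assert.h>
#include <stddef.h>

enum alarm_state { ALARM_FREE = 0, ALARM_PENDING, ALARM_EXECUTING };

/* Hypothetical return codes for the cancel call. */
enum { ALARM_OK = 0, ALARM_ERR_INVALID = -1, ALARM_ERR_EXECUTING = -2 };

typedef void (*alarm_cb)(void *arg);

/* One alarm instance; a pointer to it serves as the caller's handle. */
struct alarm {
    enum alarm_state state;
    alarm_cb cb_fn;
    void *cb_arg;
};

#define MAX_ALARMS 8
static struct alarm alarms[MAX_ALARMS];

/* Arm a new alarm and return a handle to that instance (NULL if full). */
struct alarm *alarm_set_handle(alarm_cb cb_fn, void *cb_arg)
{
    for (size_t i = 0; i < MAX_ALARMS; i++) {
        if (alarms[i].state == ALARM_FREE) {
            alarms[i] = (struct alarm){ ALARM_PENDING, cb_fn, cb_arg };
            return &alarms[i];
        }
    }
    return NULL;
}

/* Cancel one specific alarm; tells the caller if it is currently running. */
int alarm_cancel_handle(struct alarm *h)
{
    if (h == NULL || h->state == ALARM_FREE)
        return ALARM_ERR_INVALID;
    if (h->state == ALARM_EXECUTING)
        return ALARM_ERR_EXECUTING;
    h->state = ALARM_FREE;
    return ALARM_OK;
}

/* Run a pending alarm's callback, marking it as executing meanwhile. */
void alarm_fire(struct alarm *h)
{
    assert(h != NULL && h->state == ALARM_PENDING);
    h->state = ALARM_EXECUTING;
    h->cb_fn(h->cb_arg);
    h->state = ALARM_FREE;
}

/* Demo: a callback that tries to cancel its own alarm. */
struct demo_ctx { struct alarm *self; int cancel_rc; };

void demo_cb(void *arg)
{
    struct demo_ctx *c = arg;
    c->cancel_rc = alarm_cancel_handle(c->self);
}
```

A real design would also need locking, a wait-for-completion call for other
threads, and protection against stale handles once a slot is reused.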

[dpdk-dev] [PATCH 0/4] Add DSO symbol versioning to support backwards compatibility

2014-09-26 Thread Neil Horman

On Fri, Sep 26, 2014 at 12:41:33PM +0200, Thomas Monjalon wrote:
> Hi Neil,
> 
> 2014-09-24 14:19, Neil Horman:
> > Ping Thomas. I know you're busy, but I would like this to not fall off
> > anyone's radar.  You alluded to concerns regarding what is, for lack of a
> > better term, ABI/API lock-in.  I had asked you to enumerate/elaborate on
> > specifics, but never heard back.  Are there further specifics you wish to
> > discuss, or are you satisfied with the above answers?
> 
> Sorry for not being very reactive on this thread.
> All this discussion is very interesting but it's really not the proper
> time to apply it. As you said, it requires an extra effort. I'm not saying
> it will never be integrated. I'm just saying that we cannot change
> everything at the same time.
> 
> Let me sum up the situation. This community project has been very active
> for a few months now. First, we learnt how to make some releases together
> and we are improving the process to be able to deliver a new major release
> every 4 months while having some good quality process.
> But these releases are still not complete because documentation is not
> integrated yet. Then developers should have a role in documentation updates.
> We also need to integrate and learn how to use more tools to be more
> efficient and improve quality.
> 
> So the question is "when should we care about API compatibility"?
> And the answer is: ASAP, but not now. I feel next year is a better target.
> Because the most important priority is to move together at a pace which
> allows most of us to stay in the race.
> 

I'm sorry Thomas, I don't accept this.  I asked you for details as to your
concerns regarding this patch series, and you've provided more vague comments.
I need details to address.

You say it requires extra effort, you're right it does.  Any feature that you
integrate requires some additional effort.  How is this patch any different
from adding the acl library or any other new API?  Everything requires
maintenance, that's how software works.  What specifically about this patch
series makes the effort insurmountable to you?

You say you're improving your process.  Great, this patch aids in that process
by ensuring backwards compatibility for a period of time.  Given that the API
and ABI can still evolve within this framework, as I've described, how is this
patch series not a significant step forward toward your goal of a quality
process?

You say documentation isn't integrated.  So, what does getting documentation
integrated have to do with this patch set, or any other?  I don't see you
holding any other patches based on documentation.  Again, nothing in this series
prevents evolution of the API or ABI.  If your hope is to wait until
everything is perfect, then apply some control to the public facing API, and get
it all documented, none of those things will ever happen, I promise you.

You say you also need to learn to use more tools to be more efficient and
improve quality.  Great!  That's exactly what this is. If we mandate even a short
term commitment to ABI stability (1 single release worth of time), we will
quickly identify what APIs change quickly and where we need to be cautious with
our API design.  If you just assume that developers will get better of their own
volition, it will never happen.

You say this should go in next year, but not now.  When exactly?  What event do
you foresee occurring in the next 12-18 months that will change everything such
that we can start supporting an ABI for more than just a few weeks at the head of
the tree?

To this end, I just did a quick search through the git history for dpdk to look
at the histories of all the header files that are exposed via the makefile
SYMLINK command (given that that provides a list of header files that
applications can include, and embodies all the function symbols and data types
applications have access to).

There are 179 total commits in that list.
Of those, a bit of spot checking suggests that about 10-15% of them actually
change ABI, and many of those came from Bruce's rework of the mbuf structure.
That's about 17-20 instances over the last 2 years where an ABI update would have
been needed.  That seems pretty reasonable to me.  Where exactly is your concern
here?

Neil

> -- 
> Thomas
>

[dpdk-dev] [PATCH 0/5] remove traces of bare metal support

2014-09-26 Thread Neil Horman

On Fri, Sep 26, 2014 at 04:03:57PM +0200, Thomas Monjalon wrote:
> There are some references to bare metal (i.e. without OS) support,
> especially some options to build a libc with DPDK.
> As there is currently no such support, it can be removed.
> Some comments are cleaned up at the same time.
> 
> Thanks to David for having done most of this effort:
>   config: no more bare metal environment
>   mk: no more bare metal environment
>   eal: no more bare metal environment
>   app: no more bare metal environment
>   examples: no more bare metal environment
> 
> -- 
> Thomas
> 
Acked-by: Neil Horman

[dpdk-dev] [PATCH] examples: do not probe pci twice

2014-09-26 Thread Neil Horman

On Fri, Sep 26, 2014 at 02:31:24PM +0200, Thomas Monjalon wrote:
> Since commit a155d430119 ("support link bonding device initialization"),
> rte_eal_pci_probe() is called in rte_eal_init().
> So it doesn't have to be called by application anymore.
> It has been fixed for testpmd in commit 2950a769315,
> and this patch removes it from other applications.
> 
> Signed-off-by: Thomas Monjalon 

Acked-by: Neil Horman

[dpdk-dev] [PATCH v2] librte_pmd_packet: add PMD for AF_PACKET-based virtual devices

2014-09-26 Thread Neil Horman

On Fri, Sep 26, 2014 at 11:28:05AM +0200, Thomas Monjalon wrote:
> 2014-09-16 16:16, Neil Horman:
> > On Fri, Sep 12, 2014 at 02:05:23PM -0400, John W. Linville wrote:
> > > Ping?  Are there objections to this patch from mid-July?
> > 
> > Thomas, where are you on this?  It seems like if you don't have any
> > objections to this patch, it should go in, in light of the lack of further
> > commentary.
> 
> 1) It doesn't appear as a top priority.
That's your responsibility.  Patches can't languish and rot on a list forever
just because others aren't willing to test them.  If there's further testing that
you feel it needs, ask. But from my read, it's been tested for functionality and
performance (though high performance is never expected from an AF_PACKET PMD).
Given that any one PMD will not affect the performance of another in isolation,
I'm not sure what more you're waiting for here.

> 2) It's competing with pcap PMD and bifurcated PMD to come
>    (http://dpdk.org/ml/archives/dev/2014-September/005379.html)
Regarding the pcap PMD, so?  It's an alternate implementation that provides
different features with different limitations.  The fact that they are similar
is irrelevant.  If similarity was the test, then we wouldn't bother with the
bifurcated driver either, because the pcap PMD already exists.

Regarding the bifurcated driver, you can't hold existing patches on the promise
of another PMD that's coming at an indeterminate time in the future.  There's no
reason not to take this now and deprecate it in the future if there is
sufficient overlap with the bifurcated driver, though to my point above, they
still address different needs with different limitations, so I don't see doing
so as necessary.

> 3) There is no test associated with this PMD.
That would have been a great comment to make a few months back, though what's
wrong with testpmd here?  That seems to be the same test that every other PMD
uses. What exactly are you looking for?


> If one of these items becomes wrong, it should go in.
> 

> Currently, 2 projects are being initiated for validation (dcts) and
> documentation. Keeping new things outside of the DPDK core makes it
> clear that they do not have to be supported by dcts and doc yet.
> So, it is better to have an external PMD, like memnic, acting as a
> staging area.
> 
So, this brings up an excellent point - validation and support.  Commonly, open
source projects don't provide support at the upstream HEAD. Those items are
applied and enforced by distributors.  There's no need to ensure that the
upstream head is always the most performant and stable point of the tree.  It's
that need that keeps the development pace slow, and creates frustrations like
this one, where a patch sits unaddressed for long periods of time.  Commonly the
workflow for most open source projects is for there to be a window of time where
visual review and basic functional testing are sufficient for acceptance into
the head of the tree.  After the development window closes there is a
stabilization period where testing/validation is done to ensure that no
regressions have been encountered, optionally with a -next branch temporarily
being created to accept patches for upcoming future releases.  If regressions
are found, it's a simple matter in git to bisect back to the offending patch,
allow the contributing developer an opportunity to fix the issue, or to drop the
patch.  Using a workflow like this we can have a reasonable balance of needs
(good patch turnaround time, as well as reasonable testing).  We've discussed
this when I posted the PMD_REGISTER_DRIVER patch months ago, and I thought you
were going to move in the direction of this workflow.  What happened?

> During this time, keeping this PMD separately will allow you to update it
> with a maintainer account in dpdk.org. I just need your SSH public key.
> 
We've discussed this too, keeping PMDs maintained separately is a very bad idea.
Doing so means developers have to constantly be aware of changes to the core
tree and try to keep up individually.  Integrating them all means that API
changes can be easily propagated to all PMDs when needed without making work
for many people.  It's exactly the reason we encourage driver writers to open
source drivers in Linux, because not doing so closes developers off from the
free maintenance they get when optimizations are made to APIs.  And if you
follow the development model above, you don't need to worry about implied
support, as that correctly becomes a distributor issue.


Neil

[dpdk-dev] [PATCH v4 8/8]app/testpmd:test VxLAN Tx checksum offload

2014-09-26 Thread Jijiang Liu

Add test cases in testpmd to test VxLAN Tx Checksum offload, which include
 - IPv4 tunnel and IPv6 tunnel
 - outer L3, inner L3 and L4 checksum offload for Tx side.

Signed-off-by: Jijiang Liu 
Acked-by: Helin Zhang 
Acked-by: Jingjing Wu 
Acked-by: Jing Chen 

---
 app/test-pmd/config.c   |6 +-
 app/test-pmd/csumonly.c |  200 +++
 2 files changed, 188 insertions(+), 18 deletions(-)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 2a1b93f..9bc08f4 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1753,9 +1753,9 @@ tx_cksum_set(portid_t port_id, uint64_t ol_flags)
uint64_t tx_ol_flags;
if (port_id_is_invalid(port_id))
return;
-   /* Clear last 4 bits and then set L3/4 checksum mask again */
-   tx_ol_flags = ports[port_id].tx_ol_flags & (~0x0Full);
-   ports[port_id].tx_ol_flags = ((ol_flags & 0xf) | tx_ol_flags);
+   /* Clear last 8 bits and then set L3/4 checksum mask again */
+   tx_ol_flags = ports[port_id].tx_ol_flags & (~0x0FFull);
+   ports[port_id].tx_ol_flags = ((ol_flags & 0xff) | tx_ol_flags);
 }

 void
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index fcc4876..4c53042 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -196,7 +196,6 @@ get_ipv6_udptcp_checksum(struct ipv6_hdr *ipv6_hdr, uint16_t *l4_hdr)
return (uint16_t)cksum;
 }

-
 /*
  * Forwarding of packets. Change the checksum field with HW or SW methods
  * The HW/SW method selection depends on the ol_flags on every packet
@@ -209,10 +208,16 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
struct rte_mbuf  *mb;
struct ether_hdr *eth_hdr;
struct ipv4_hdr  *ipv4_hdr;
+   struct ether_hdr *inner_eth_hdr;
+   struct ipv4_hdr  *inner_ipv4_hdr = NULL;
struct ipv6_hdr  *ipv6_hdr;
+   struct ipv6_hdr  *inner_ipv6_hdr = NULL;
struct udp_hdr   *udp_hdr;
+   struct udp_hdr   *inner_udp_hdr;
struct tcp_hdr   *tcp_hdr;
+   struct tcp_hdr   *inner_tcp_hdr;
struct sctp_hdr  *sctp_hdr;
+   struct sctp_hdr  *inner_sctp_hdr;

uint16_t nb_rx;
uint16_t nb_tx;
@@ -221,12 +226,18 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
uint64_t pkt_ol_flags;
uint64_t tx_ol_flags;
uint16_t l4_proto;
+   uint16_t inner_l4_proto = 0;
uint16_t eth_type;
uint8_t  l2_len;
uint8_t  l3_len;
+   uint8_t  inner_l2_len;
+   uint8_t  inner_l3_len = 0;

uint32_t rx_bad_ip_csum;
uint32_t rx_bad_l4_csum;
+   uint8_t  ipv4_tunnel;
+   uint8_t  ipv6_tunnel;
+   uint16_t ptype;

 #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
uint64_t start_tsc;
@@ -255,6 +266,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)

txp = &ports[fs->tx_port];
tx_ol_flags = txp->tx_ol_flags;
+   ptype = mb->reserved;

for (i = 0; i < nb_rx; i++) {

@@ -262,7 +274,9 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
l2_len  = sizeof(struct ether_hdr);
pkt_ol_flags = mb->ol_flags;
ol_flags = (pkt_ol_flags & (~PKT_TX_L4_MASK));
-
+   ptype = mb->reserved;
+   ipv4_tunnel = IS_ETH_IPV4_TUNNEL(ptype);
+   ipv6_tunnel = IS_ETH_IPV6_TUNNEL(ptype);
eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
eth_type = rte_be_to_cpu_16(eth_hdr->ether_type);
if (eth_type == ETHER_TYPE_VLAN) {
@@ -295,7 +309,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
 *  + ipv4 or ipv6
 *  + udp or tcp or sctp or others
 */
-   if (pkt_ol_flags & PKT_RX_IPV4_HDR) {
+   if (pkt_ol_flags & (PKT_RX_IPV4_HDR | PKT_RX_IPV4_HDR_EXT)) {

/* Do not support ipv4 option field */
l3_len = sizeof(struct ipv4_hdr) ;
@@ -325,17 +339,95 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
if (tx_ol_flags & 0x2) {
/* HW Offload */
ol_flags |= PKT_TX_UDP_CKSUM;
-   /* Pseudo header sum need be set properly */
-   udp_hdr->dgram_cksum = get_ipv4_psd_sum(ipv4_hdr);
+   if (ipv4_tunnel)
+   udp_hdr->dgram_cksum = 0;
+   else
+   /* Pseudo header sum need be set properly */
+   udp_hdr->dgram_cksum =
+   get_ipv4_psd_sum(ipv4_hdr);
}
else {
/* SW Implementation,

[dpdk-dev] [PATCH v4 7/8]i40e:support VxLAN Tx checksum offload

2014-09-26 Thread Jijiang Liu

Support VxLAN Tx checksum offload, which includes:
  - outer L3(IP) checksum offload
  - inner L3(IP) checksum offload
  - inner L4(UDP, TCP and SCTP) checksum offload
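
As a rough illustration of the length bookkeeping this offload needs, the sketch below computes the tunnel header length (UDP + VxLAN + inner Ethernet, plus an optional inner VLAN tag) and packs it with the outer IP header length, in 4-byte and 2-byte units respectively, as the i40e_rxtx.c change in this patch does. The shift values and the helper names (vxlan_l4tun_len, vxlan_tunnel_lengths) are hypothetical and are not the i40e register definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Header sizes in bytes (standard sizes for these protocols). */
#define ETHER_HDR_LEN 14u
#define VLAN_HDR_LEN   4u
#define UDP_HDR_LEN    8u
#define VXLAN_HDR_LEN  8u

/* Hypothetical bit positions, for illustration only. */
#define EXT_IPLEN_SHIFT  2u
#define NATLEN_SHIFT    12u

/* Bytes between the end of the outer IP header and the inner IP header. */
static inline uint8_t
vxlan_l4tun_len(int has_inner_vlan)
{
	uint8_t len = UDP_HDR_LEN + VXLAN_HDR_LEN + ETHER_HDR_LEN;

	if (has_inner_vlan)
		len += VLAN_HDR_LEN;
	return len;
}

/* Pack the outer IP length (4-byte words) and tunnel length (2-byte words). */
static inline uint32_t
vxlan_tunnel_lengths(uint8_t outer_l3_len, int has_inner_vlan)
{
	uint32_t iplen_words;
	uint32_t natlen_words;

	/* An IP header length is always a multiple of 4 bytes. */
	assert((outer_l3_len & 3u) == 0);

	iplen_words = (uint32_t)outer_l3_len >> 2;
	natlen_words = (uint32_t)vxlan_l4tun_len(has_inner_vlan) >> 1;
	return (iplen_words << EXT_IPLEN_SHIFT) |
	       (natlen_words << NATLEN_SHIFT);
}
```

For a 20-byte outer IPv4 header without an inner VLAN tag, the tunnel length is 30 bytes, i.e. 15 two-byte words.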

Signed-off-by: Jijiang Liu 
Acked-by: Helin Zhang 
Acked-by: Jingjing Wu 
Acked-by: Jing Chen 

---
 lib/librte_mbuf/rte_mbuf.h|2 +
 lib/librte_pmd_i40e/i40e_ethdev.c |4 +-
 lib/librte_pmd_i40e/i40e_rxtx.c   |   47 ++--
 3 files changed, 48 insertions(+), 5 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 4955684..1f3f4eb 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -86,6 +86,8 @@ extern "C" {
 #define PKT_RX_IEEE1588_PTP  0x0200 /**< RX IEEE1588 L2 Ethernet PT Packet. */
 #define PKT_RX_IEEE1588_TMST 0x0400 /**< RX IEEE1588 L2/L4 timestamped packet.*/

+#define PKT_TX_VXLAN_CKSUM   0x0001 /**< Checksum of TX VxLAN pkt. computed by NIC. */
+#define PKT_TX_IVLAN_PKT 0x0002 /**< TX packet is VxLAN packet with an inner VLAN. */
 #define PKT_TX_VLAN_PKT  0x0800 /**< TX packet is a 802.1q VLAN packet. */
 #define PKT_TX_IP_CKSUM  0x1000 /**< IP cksum of TX pkt. computed by NIC. */
 #define PKT_TX_IPV4_CSUM 0x1000 /**< Alias of PKT_TX_IP_CKSUM. */
diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c 
b/lib/librte_pmd_i40e/i40e_ethdev.c
index a2d9111..10f15c9 100644
--- a/lib/librte_pmd_i40e/i40e_ethdev.c
+++ b/lib/librte_pmd_i40e/i40e_ethdev.c
@@ -2566,13 +2566,13 @@ i40e_vxlan_filters_init(struct i40e_pf *pf)
&filter_index, NULL);
if (ret < 0) {
PMD_DRV_LOG(ERR, "Failed to add UDP tunnel port %d "
-   "with index=%d\n", RTE_VXLAN_UDP_PORT,
+   "with index=%d\n", RTE_LIBRTE_TUNNEL_UDP_PORT,
 filter_index);
} else {
pf->vxlan_bitmap |= 1;
pf->vxlan_ports[0] = RTE_LIBRTE_TUNNEL_UDP_PORT;
PMD_DRV_LOG(INFO, "Added UDP tunnel port %d with "
-   "index=%d\n", RTE_VXLAN_UDP_PORT, filter_index);
+   "index=%d\n", RTE_LIBRTE_TUNNEL_UDP_PORT, filter_index);
}

return ret;
diff --git a/lib/librte_pmd_i40e/i40e_rxtx.c b/lib/librte_pmd_i40e/i40e_rxtx.c
index abdf406..821457c 100644
--- a/lib/librte_pmd_i40e/i40e_rxtx.c
+++ b/lib/librte_pmd_i40e/i40e_rxtx.c
@@ -410,12 +410,16 @@ i40e_rxd_ptype_to_pkt_flags(uint64_t qword)
return ip_ptype_map[ptype];
 }

+#define L4TUN_LEN (sizeof(struct udp_hdr) + sizeof(struct vxlan_hdr)\
++ sizeof(struct ether_hdr))
 static inline void
 i40e_txd_enable_checksum(uint32_t ol_flags,
uint32_t *td_cmd,
uint32_t *td_offset,
uint8_t l2_len,
-   uint8_t l3_len)
+   uint8_t l3_len,
+   uint8_t inner_l3_len,
+   uint32_t *cd_tunneling)
 {
if (!l2_len) {
PMD_DRV_LOG(DEBUG, "L2 length set to 0");
@@ -428,6 +432,31 @@ i40e_txd_enable_checksum(uint32_t ol_flags,
return;
}

+   /* VxLAN packet TX checksum offload */
+   if (unlikely(ol_flags & PKT_TX_VXLAN_CKSUM)) {
+   uint8_t l4tun_len;
+
+   /* packet with inner VLAN */
+   if (ol_flags  & PKT_TX_IVLAN_PKT)
+   l4tun_len = L4TUN_LEN + sizeof(struct vlan_hdr);
+   else
+   l4tun_len = L4TUN_LEN;
+
+   if (ol_flags & PKT_TX_IPV4_CSUM)
+   *cd_tunneling |= I40E_TX_CTX_EXT_IP_IPV4;
+   else if (ol_flags & PKT_TX_IPV6)
+   *cd_tunneling |= I40E_TX_CTX_EXT_IP_IPV6;
+
+   /* Now set the ctx descriptor fields */
+   *cd_tunneling |= (l3_len >> 2) <<
+   I40E_TXD_CTX_QW0_EXT_IPLEN_SHIFT |
+   I40E_TXD_CTX_UDP_TUNNELING |
+   (l4tun_len >> 1) <<
+   I40E_TXD_CTX_QW0_NATLEN_SHIFT;
+
+   l3_len = inner_l3_len;
+   }
+
/* Enable L3 checksum offloads */
if (ol_flags & PKT_TX_IPV4_CSUM) {
*td_cmd |= I40E_TX_DESC_CMD_IIPT_IPV4_CSUM;
@@ -1080,6 +1109,9 @@ i40e_calc_context_desc(uint64_t flags)
 {
uint16_t mask = 0;

+   if (flags & PKT_TX_VXLAN_CKSUM)
+   mask |= PKT_TX_VXLAN_CKSUM;
+
 #ifdef RTE_LIBRTE_IEEE1588
mask |= PKT_TX_IEEE1588_TMST;
 #endif
@@ -1099,6 +1131,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
volatile struct i40e_tx_desc *txr;
struct rte_mbuf *tx_pkt;
struct rte_mbuf *m_seg;
+   uint32_t cd_tunneling_params;
uint16_t tx_id;
uint16_t nb_tx;
uint32_t td_cmd;
@@ -1108,6 +1141,7 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, 
uint16_t

[dpdk-dev] [PATCH v4 6/8]app/testpmd:test VxLAN packet filter API

2014-09-26 Thread Jijiang Liu

Add tunnel_filter command in testpmd to test VxLAN packet filter API.

Signed-off-by: Jijiang Liu 
Acked-by: Helin Zhang 
Acked-by: Jingjing Wu 
Acked-by: Jing Chen 

---
 app/test-pmd/cmdline.c |  152 
 1 files changed, 152 insertions(+), 0 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index c0b7293..a74e9dc 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -285,6 +285,14 @@ static void cmd_help_long_parsed(void *parsed_result,
"Set the outer VLAN TPID for Packet Filtering on"
" a port\n\n"

+   "tunnel_filter add (port_id) (outer_mac) (inner_mac) (ip_addr) "
+   "(inner_vlan) (tunnel_type) (filter_type) (tenant_id) (queue_id)\n"
+   "   add a tunnel filter of a port.\n\n"
+
+   "tunnel_filter rm (port_id) (outer_mac) (inner_mac) (ip_addr) "
+   "(inner_vlan) (tunnel_type) (filter_type) (tenant_id) (queue_id)\n"
+   "   remove a tunnel filter of a port.\n\n"
+
"rx_vxlan_port add (udp_port) (port_id)\n"
"Add an UDP port for VxLAN packet filter on a port\n\n"

@@ -6232,6 +6240,149 @@ cmdline_parse_inst_t cmd_vf_rate_limit = {
},
 };

+/* *** ADD TUNNEL FILTER OF A PORT *** */
+struct cmd_tunnel_filter_result {
+   cmdline_fixed_string_t cmd;
+   cmdline_fixed_string_t what;
+   uint8_t port_id;
+   struct ether_addr outer_mac;
+   struct ether_addr inner_mac;
+   cmdline_ipaddr_t ip_value;
+   uint16_t inner_vlan;
+   cmdline_fixed_string_t tunnel_type;
+   cmdline_fixed_string_t filter_type;
+   uint32_t tenant_id;
+   uint16_t queue_num;
+};
+
+static void
+cmd_tunnel_filter_parsed(void *parsed_result,
+ __attribute__((unused)) struct cmdline *cl,
+ __attribute__((unused)) void *data)
+{
+   struct cmd_tunnel_filter_result *res = parsed_result;
+   struct rte_eth_tunnel_filter_conf tunnel_filter_conf;
+   int ret = 0;
+
+   tunnel_filter_conf.outer_mac = &res->outer_mac;
+   tunnel_filter_conf.inner_mac = &res->inner_mac;
+   tunnel_filter_conf.inner_vlan = res->inner_vlan;
+
+   if (res->ip_value.family == AF_INET) {
+   tunnel_filter_conf.ip_addr.ipv4_addr =
+   res->ip_value.addr.ipv4.s_addr;
+   tunnel_filter_conf.ip_type = RTE_TUNNEL_IPTYPE_IPV4;
+   } else {
+   memcpy(&(tunnel_filter_conf.ip_addr.ipv6_addr),
+   &(res->ip_value.addr.ipv6),
+   sizeof(struct in6_addr));
+   tunnel_filter_conf.ip_type = RTE_TUNNEL_IPTYPE_IPV6;
+   }
+
+   if (!strcmp(res->filter_type, "imac-ivlan"))
+   tunnel_filter_conf.filter_type = RTE_TUNNEL_FILTER_IMAC_IVLAN;
+   else if (!strcmp(res->filter_type, "imac-ivlan-tenid"))
+   tunnel_filter_conf.filter_type =
+   RTE_TUNNEL_FILTER_IMAC_IVLAN_TENID;
+   else if (!strcmp(res->filter_type, "imac-tenid"))
+   tunnel_filter_conf.filter_type = RTE_TUNNEL_FILTER_IMAC_TENID;
+   else if (!strcmp(res->filter_type, "imac"))
+   tunnel_filter_conf.filter_type = RTE_TUNNEL_FILTER_IMAC;
+   else if (!strcmp(res->filter_type, "omac-imac-tenid"))
+   tunnel_filter_conf.filter_type =
+   RTE_TUNNEL_FILTER_OMAC_TENID_IMAC;
+   else {
+   printf("The filter type is not supported");
+   return;
+   }
+
+   tunnel_filter_conf.to_queue = RTE_TUNNEL_FLAGS_TO_QUEUE;
+
+   if (!strcmp(res->tunnel_type, "vxlan"))
+   tunnel_filter_conf.tunnel_type = RTE_TUNNEL_TYPE_VXLAN;
+   else {
+   printf("Only VxLAN is supported now.\n");
+   return;
+   }
+
+   tunnel_filter_conf.tenant_id = res->tenant_id;
+   tunnel_filter_conf.queue_id = res->queue_num;
+   if (!strcmp(res->what, "add"))
+   ret = rte_eth_dev_filter_ctrl(res->port_id,
+   RTE_ETH_FILTER_TUNNEL,
+   RTE_ETH_FILTER_OP_ADD,
+   &tunnel_filter_conf);
+   else
+   ret = rte_eth_dev_filter_ctrl(res->port_id,
+   RTE_ETH_FILTER_TUNNEL,
+   RTE_ETH_FILTER_OP_DELETE,
+   &tunnel_filter_conf);
+   if (ret < 0)
+   printf("cmd_tunnel_filter_parsed error: (%s)\n",
+   strerror(-ret));
+
+}
+cmdline_parse_token_string_t cmd_tunnel_filter_cmd =
+   TOKEN_STRING_INITIALIZER(struct cmd_tunnel_filter_result,
+   cmd, "tunnel_filter");
+cmdline_parse_token_string_t cmd_tunnel_filter_what =
+

[dpdk-dev] [PATCH v4 5/8]i40e:implement API of VxLAN packet filter in librte_pmd_i40e

2014-09-26 Thread Jijiang Liu

Implement the VxLAN tunnel filter in librte_pmd_i40e, which includes:
 - add the i40e_dev_filter_ctrl() function.
 - add the i40e_dev_tunnel_filter_set() function.

Signed-off-by: Jijiang Liu 
Acked-by: Helin Zhang 
Acked-by: Jingjing Wu 
Acked-by: Jing Chen 

---
 lib/librte_pmd_i40e/i40e_ethdev.c |  202 +
 1 files changed, 202 insertions(+), 0 deletions(-)

diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c 
b/lib/librte_pmd_i40e/i40e_ethdev.c
index ddc7ea0..c9c881a 100644
--- a/lib/librte_pmd_i40e/i40e_ethdev.c
+++ b/lib/librte_pmd_i40e/i40e_ethdev.c
@@ -48,6 +48,7 @@
 #include 
 #include 
 #include 
+#include 

 #include "i40e_logs.h"
 #include "i40e/i40e_register_x710_int.h"
@@ -211,7 +212,13 @@ static int i40e_dev_udp_tunnel_add(struct rte_eth_dev *dev,
 static int i40e_dev_udp_tunnel_del(struct rte_eth_dev *dev,
   struct rte_eth_udp_tunnel *udp_tunnel,
   uint8_t count);
+static int i40e_dev_tunnel_filter_set(struct i40e_pf *pf,
+struct rte_eth_tunnel_filter_conf *tunnel_filter,
+uint8_t add);
 static int i40e_pf_config_vxlan(struct i40e_pf *pf);
+static int i40e_dev_filter_ctrl(struct rte_eth_dev *dev,
+  enum rte_filter_type filter_type,
+  enum rte_filter_op filter_op, void *arg);


 /* Default hash key buffer for RSS */
@@ -266,6 +273,7 @@ static struct eth_dev_ops i40e_eth_dev_ops = {
.rss_hash_conf_get= i40e_dev_rss_hash_conf_get,
.udp_tunnel_add   = i40e_dev_udp_tunnel_add,
.udp_tunnel_del   = i40e_dev_udp_tunnel_del,
+   .filter_ctrl  = i40e_dev_filter_ctrl,
 };

 static struct eth_driver rte_i40e_pmd = {
@@ -4124,6 +4132,110 @@ i40e_dev_rss_hash_conf_get(struct rte_eth_dev *dev,
 }

 static int
+i40e_dev_get_filter_type(enum rte_tunnel_filter_type filter_type,
+   uint16_t *flag)
+{
+   switch (filter_type) {
+   case RTE_TUNNEL_FILTER_IMAC_IVLAN:
+   *flag = I40E_AQC_ADD_CLOUD_FILTER_IMAC_IVLAN;
+   break;
+   case RTE_TUNNEL_FILTER_IMAC_IVLAN_TENID:
+   *flag = I40E_AQC_ADD_CLOUD_FILTER_IMAC_IVLAN_TEN_ID;
+   break;
+   case RTE_TUNNEL_FILTER_IMAC_TENID:
+   *flag = I40E_AQC_ADD_CLOUD_FILTER_IMAC_TEN_ID;
+   break;
+   case RTE_TUNNEL_FILTER_OMAC_TENID_IMAC:
+   *flag = I40E_AQC_ADD_CLOUD_FILTER_OMAC_TEN_ID_IMAC;
+   break;
+   case RTE_TUNNEL_FILTER_IMAC:
+   *flag = I40E_AQC_ADD_CLOUD_FILTER_IMAC;
+   break;
+   default:
+   PMD_DRV_LOG(ERR, "invalid tunnel filter type\n");
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
+static int
+i40e_dev_tunnel_filter_set(struct i40e_pf *pf,
+   struct rte_eth_tunnel_filter_conf *tunnel_filter,
+   uint8_t add)
+{
+   uint16_t ip_type;
+   uint8_t tun_type = 0;
+   int ret = 0;
+   int val;
+   struct i40e_hw *hw = I40E_PF_TO_HW(pf);
+   struct i40e_vsi *vsi = pf->main_vsi;
+   struct i40e_aqc_add_remove_cloud_filters_element_data  *cld_filter;
+   struct i40e_aqc_add_remove_cloud_filters_element_data  *pfilter;
+
+   cld_filter = rte_zmalloc("tunnel_filter",
+   sizeof(struct i40e_aqc_add_remove_cloud_filters_element_data),
+   0);
+
+   if (NULL == cld_filter) {
+   PMD_DRV_LOG(ERR, "Failed to alloc memory.\n");
+   return -EINVAL;
+   }
+   pfilter = cld_filter;
+
+   (void)rte_memcpy(&pfilter->outer_mac, tunnel_filter->outer_mac,
+   sizeof(struct ether_addr));
+   (void)rte_memcpy(&pfilter->inner_mac, tunnel_filter->inner_mac,
+   sizeof(struct ether_addr));
+
+   pfilter->inner_vlan = tunnel_filter->inner_vlan;
+   if (tunnel_filter->ip_type == RTE_TUNNEL_IPTYPE_IPV4) {
+   ip_type = I40E_AQC_ADD_CLOUD_FLAGS_IPV4;
+   (void)rte_memcpy(&pfilter->ipaddr.v4.data,
+   &tunnel_filter->ip_addr,
+   sizeof(pfilter->ipaddr.v4.data));
+   } else {
+   ip_type = I40E_AQC_ADD_CLOUD_FLAGS_IPV6;
+   (void)rte_memcpy(&pfilter->ipaddr.v6.data,
+   &tunnel_filter->ip_addr,
+   sizeof(pfilter->ipaddr.v6.data));
+   }
+
+   /* check tunnel type */
+   switch (tunnel_filter->tunnel_type) {
+   case RTE_TUNNEL_TYPE_VXLAN:
+   tun_type = I40E_AQC_ADD_CLOUD_TNL_TYPE_XVLAN;
+   break;
+   default:
+   /* Other tunnel types is not supported. */
+   PMD_DRV_LOG(ERR, "tunnel type is not supported.\n");
+   rte_free(cld_filter);
+   return -EINVAL;
+   }
+
+   val =

[dpdk-dev] [PATCH v4 4/8]librte_ether:add a common filter API

2014-09-26 Thread Jijiang Liu

Introduce a new filter framework in librte_ether. For the implementation
discussion, please refer to
http://dpdk.org/ml/archives/dev/2014-September/005179.html. The VxLAN tunnel
filter implementation is based on it.
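
A minimal sketch of how such a single-entry-point framework can be used is given below, assuming the filter type and operation enums from rte_eth_ctrl.h in this patch. The struct mock_dev, the tunnel_only_filter_ctrl() handler and dev_filter_supported() are hypothetical stand-ins, not the librte_ether implementation. It relies on the documented meaning of RTE_ETH_FILTER_OP_NONE as a probe for whether a filter type is supported.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Enums as introduced by rte_eth_ctrl.h in this patch. */
enum rte_filter_type {
	RTE_ETH_FILTER_NONE = 0,
	RTE_ETH_FILTER_RSS,
	RTE_ETH_FILTER_FDIR,
	RTE_ETH_FILTER_TUNNEL,
	RTE_ETH_FILTER_MAX,
};

enum rte_filter_op {
	RTE_ETH_FILTER_OP_NONE = 0,
	RTE_ETH_FILTER_OP_ADD,
	RTE_ETH_FILTER_OP_UPDATE,
	RTE_ETH_FILTER_OP_DELETE,
	RTE_ETH_FILTER_OP_GET,
	RTE_ETH_FILTER_OP_SET,
	RTE_ETH_FILTER_OP_GET_INFO,
	RTE_ETH_FILTER_OP_MAX,
};

/* Hypothetical per-device callback; the real one also takes the device. */
typedef int (*filter_ctrl_t)(enum rte_filter_type type,
			     enum rte_filter_op op, void *arg);

/* Hypothetical stand-in for a device with one filter entry point. */
struct mock_dev {
	filter_ctrl_t filter_ctrl;
};

/* Hypothetical driver handler that only knows tunnel filters. */
static int
tunnel_only_filter_ctrl(enum rte_filter_type type,
			enum rte_filter_op op, void *arg)
{
	(void)arg;
	if (type != RTE_ETH_FILTER_TUNNEL)
		return -ENOTSUP;

	switch (op) {
	case RTE_ETH_FILTER_OP_NONE:   /* support probe */
	case RTE_ETH_FILTER_OP_ADD:
	case RTE_ETH_FILTER_OP_DELETE:
		return 0;
	default:
		return -ENOTSUP;
	}
}

/* Probe a filter type with OP_NONE; 0 means the type is supported. */
static int
dev_filter_supported(const struct mock_dev *dev, enum rte_filter_type type)
{
	assert(dev != NULL);
	if (dev->filter_ctrl == NULL)
		return -ENOTSUP;
	return dev->filter_ctrl(type, RTE_ETH_FILTER_OP_NONE, NULL);
}
```

The point of the design is that new filter kinds only need a new enum value and an argument structure, rather than a new device operation per feature.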

Signed-off-by: Jijiang Liu 
Acked-by: Helin Zhang 
Acked-by: Jingjing Wu 

---
 lib/librte_ether/Makefile   |1 +
 lib/librte_ether/rte_eth_ctrl.h |  150 +++
 lib/librte_ether/rte_ethdev.c   |   32 
 lib/librte_ether/rte_ethdev.h   |   56 +++---
 4 files changed, 227 insertions(+), 12 deletions(-)
 create mode 100644 lib/librte_ether/rte_eth_ctrl.h

diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
index b310f8b..a461c31 100644
--- a/lib/librte_ether/Makefile
+++ b/lib/librte_ether/Makefile
@@ -46,6 +46,7 @@ SRCS-y += rte_ethdev.c
 #
 SYMLINK-y-include += rte_ether.h
 SYMLINK-y-include += rte_ethdev.h
+SYMLINK-y-include += rte_eth_ctrl.h

 # this lib depends upon:
 DEPDIRS-y += lib/librte_eal lib/librte_mempool lib/librte_ring lib/librte_mbuf
diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h
new file mode 100644
index 000..47186e5
--- /dev/null
+++ b/lib/librte_ether/rte_eth_ctrl.h
@@ -0,0 +1,150 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_ETH_CTRL_H_
+#define _RTE_ETH_CTRL_H_
+
+/**
+ * @file
+ *
+ * Ethernet device features and related data structures used
+ * by control APIs should be defined in this file.
+ *
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Feature filter types
+ */
+enum rte_filter_type {
+   RTE_ETH_FILTER_NONE = 0,
+   RTE_ETH_FILTER_RSS,
+   RTE_ETH_FILTER_FDIR,
+   RTE_ETH_FILTER_TUNNEL,
+   RTE_ETH_FILTER_MAX,
+};
+
+/**
+ * all generic operations to filters
+ */
+enum rte_filter_op {
+   RTE_ETH_FILTER_OP_NONE = 0, /**< used to check whether the type filter is supported */
+   RTE_ETH_FILTER_OP_ADD,  /**< add filter entry */
+   RTE_ETH_FILTER_OP_UPDATE,   /**< update filter entry */
+   RTE_ETH_FILTER_OP_DELETE,   /**< delete filter entry */
+   RTE_ETH_FILTER_OP_GET,  /**< get filter entry */
+   RTE_ETH_FILTER_OP_SET,  /**< configurations */
+   RTE_ETH_FILTER_OP_GET_INFO, /**< get information of filter, such as status or statistics */
+   RTE_ETH_FILTER_OP_MAX,
+};
+
+/* *** TUNNEL FILTER DATA DEFINITION *** */
+
+#define ETH_TUNNEL_FILTER_OMAC  0x01
+#define ETH_TUNNEL_FILTER_OIP   0x02
+#define ETH_TUNNEL_FILTER_TENID 0x04
+#define ETH_TUNNEL_FILTER_IMAC  0x08
+#define ETH_TUNNEL_FILTER_IVLAN 0x10
+#define ETH_TUNNEL_FILTER_IIP   0x20
+
+#define RTE_TUNNEL_FLAGS_TO_QUEUE 1
+
+/*
+ * Tunneled filter type
+ */
+enum rte_tunnel_filter_type {
+   RTE_TUNNEL_FILTER_TYPE_NONE = 0,
+   RTE_TUNNEL_FILTER_OIP = ETH_TUNNEL_FILTER_OIP,
+   RTE_TUNNEL_FILTER_IMAC_IVLAN =
+   ETH_TUNNEL_FILTER_IMAC | ETH_TUNNEL_FILTER_IVLAN,
+   RTE_TUNNEL_FILTER_IMAC_IVLAN_TENID =
+   ETH_TUNNEL_FILTER_IMAC | ETH_TUNNEL_FILTER_IVLAN |
+   ETH_TUNNEL_FILTER_TENID,
+   RTE_TUNNEL_FILTER_IMAC_TENID =
+   ETH_TUNNEL_FILTER_IMAC | ETH_TUNNEL_FILTER_TENID,
+   RTE_TUNNEL_FILTER_IMAC = ETH_TUNNEL_FILTER_IMAC,
+   RTE_TUNNEL_FILTER_OMAC_TENID_IMAC =
+

[dpdk-dev] [PATCH v4 3/8]app/test-pmd:test VxLAN packet identification

2014-09-26 Thread Jijiang Liu

Add commands to test VxLAN packet identification, which includes:
 - use commands to add/delete VxLAN UDP port.
 - use rxonly mode to receive VxLAN packet.

Signed-off-by: Jijiang Liu 
Acked-by: Helin Zhang 
Acked-by: Jingjing Wu 
Acked-by: Jing Chen 

---
 app/test-pmd/cmdline.c|   78 ++--
 app/test-pmd/parameters.c |   13 +++
 app/test-pmd/rxonly.c |   49 
 app/test-pmd/testpmd.c|8 +
 app/test-pmd/testpmd.h|4 ++
 5 files changed, 148 insertions(+), 4 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 225f669..c0b7293 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -285,6 +285,12 @@ static void cmd_help_long_parsed(void *parsed_result,
"Set the outer VLAN TPID for Packet Filtering on"
" a port\n\n"

+   "rx_vxlan_port add (udp_port) (port_id)\n"
+   "Add an UDP port for VxLAN packet filter on a port\n\n"
+
+   "rx_vxlan_port rm (udp_port) (port_id)\n"
+   "Remove an UDP port for VxLAN packet filter on a port\n\n"
+
"tx_vlan set vlan_id (port_id)\n"
"Set hardware insertion of VLAN ID in packets sent"
" on a port.\n\n"
@@ -296,13 +302,17 @@ static void cmd_help_long_parsed(void *parsed_result,
"Disable hardware insertion of a VLAN header in"
" packets sent on a port.\n\n"

-   "tx_checksum set mask (port_id)\n"
+   "tx_checksum set (mask) (port_id)\n"
"Enable hardware insertion of checksum offload with"
-   " the 4-bit mask, 0~0xf, in packets sent on a port.\n"
+   " the 8-bit mask, 0~0xff, in packets sent on a port.\n"
"bit 0 - insert ip   checksum offload if set\n"
"bit 1 - insert udp  checksum offload if set\n"
"bit 2 - insert tcp  checksum offload if set\n"
"bit 3 - insert sctp checksum offload if set\n"
+   "bit 4 - insert inner ip  checksum offload if set\n"
+   "bit 5 - insert inner udp checksum offload if set\n"
+   "bit 6 - insert inner tcp checksum offload if set\n"
+   "bit 7 - insert inner sctp checksum offload if set\n"
"Please check the NIC datasheet for HW limits.\n\n"

"set fwd (%s)\n"
@@ -2745,8 +2755,9 @@ cmdline_parse_inst_t cmd_tx_cksum_set = {
.f = cmd_tx_cksum_set_parsed,
.data = NULL,
.help_str = "enable hardware insertion of L3/L4checksum with a given "
-   "mask in packets sent on a port, the bit mapping is given as, Bit 0 for ip"
-   "Bit 1 for UDP, Bit 2 for TCP, Bit 3 for SCTP",
+   "mask in packets sent on a port, the bit mapping is given as, Bit 0 for ip "
+   "Bit 1 for UDP, Bit 2 for TCP, Bit 3 for SCTP, Bit 4 for inner ip "
+   "Bit 5 for inner UDP, Bit 6 for inner TCP, Bit 7 for inner SCTP",
.tokens = {
(void *)&cmd_tx_cksum_set_tx_cksum,
(void *)&cmd_tx_cksum_set_set,
@@ -6221,6 +6232,64 @@ cmdline_parse_inst_t cmd_vf_rate_limit = {
},
 };

+/* *** CONFIGURE TUNNEL UDP PORT *** */
+struct cmd_tunnel_udp_config {
+   cmdline_fixed_string_t cmd;
+   cmdline_fixed_string_t what;
+   uint16_t udp_port;
+   uint8_t port_id;
+};
+
+static void
+cmd_tunnel_udp_config_parsed(void *parsed_result,
+ __attribute__((unused)) struct cmdline *cl,
+ __attribute__((unused)) void *data)
+{
+   struct cmd_tunnel_udp_config *res = parsed_result;
+   struct rte_eth_udp_tunnel tunnel_udp;
+   int ret;
+
+   tunnel_udp.udp_port = res->udp_port;
+
+   if (!strcmp(res->cmd, "rx_vxlan_port"))
+   tunnel_udp.prot_type = RTE_TUNNEL_TYPE_VXLAN;
+
+   if (!strcmp(res->what, "add"))
+   ret = rte_eth_dev_udp_tunnel_add(res->port_id, &tunnel_udp, 1);
+   else
+   ret = rte_eth_dev_udp_tunnel_delete(res->port_id, &tunnel_udp, 1);
+
+   if (ret < 0)
+   printf("udp tunneling add error: (%s)\n", strerror(-ret));
+}
+
+cmdline_parse_token_string_t cmd_tunnel_udp_config_cmd =
+   TOKEN_STRING_INITIALIZER(struct cmd_tunnel_udp_config,
+   cmd, "rx_vxlan_port");
+cmdline_parse_token_string_t cmd_tunnel_udp_config_what =
+   TOKEN_STRING_INITIALIZER(struct cmd_tunnel_udp_config,
+   what, "add#rm");
+cmdline_parse_token_num_t cmd_tunnel_udp_config_udp_port =
+   TOKEN_NUM_INITIALIZER(struct cmd_tunnel_udp_config,
+

[dpdk-dev] [PATCH v4 2/8]i40e:support VxLAN packet identification in librte_pmd_i40e

2014-09-26 Thread Jijiang Liu

Support tunneling UDP port configuration on i40e in librte_pmd_i40e.
Currently, only VxLAN is implemented, which includes:
 -  VxLAN UDP port initialization
 -  Implement the APIs to configure VxLAN UDP port in librte_pmd_i40e.
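
One way to picture the UDP port bookkeeping is a small table of ports with a bitmap of occupied slots, in the spirit of the vxlan_bitmap and vxlan_ports fields touched by this patch. The sketch below is an assumption-laden illustration: the table size, the struct tunnel_port_table type and the tunnel_port_* helpers are hypothetical, and the real driver additionally programs the hardware through the admin queue.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TUNNEL_PORTS 16 /* hypothetical table size */

/* Hypothetical software copy of the tunneling UDP port table. */
struct tunnel_port_table {
	uint16_t bitmap;                  /* bit i set: slot i is in use */
	uint16_t ports[MAX_TUNNEL_PORTS]; /* UDP port stored in each slot */
};

/* Return the slot holding 'port', or -1 if it is not configured. */
static int
tunnel_port_find(const struct tunnel_port_table *t, uint16_t port)
{
	int i;

	assert(t != NULL);
	for (i = 0; i < MAX_TUNNEL_PORTS; i++)
		if ((t->bitmap & (1u << i)) && t->ports[i] == port)
			return i;
	return -1;
}

/* Store 'port' in the first free slot; return the slot or a negative errno. */
static int
tunnel_port_add(struct tunnel_port_table *t, uint16_t port)
{
	int i;

	if (tunnel_port_find(t, port) >= 0)
		return -EEXIST;
	for (i = 0; i < MAX_TUNNEL_PORTS; i++) {
		if (!(t->bitmap & (1u << i))) {
			t->ports[i] = port;
			t->bitmap |= (uint16_t)(1u << i);
			return i;
		}
	}
	return -ENOSPC;
}

/* Release the slot holding 'port'; return 0 or a negative errno. */
static int
tunnel_port_del(struct tunnel_port_table *t, uint16_t port)
{
	int idx = tunnel_port_find(t, port);

	if (idx < 0)
		return -ENOENT;
	t->bitmap &= (uint16_t)~(1u << idx);
	t->ports[idx] = 0;
	return 0;
}
```

Adding the default port 4789 to an empty table takes slot 0, which matches the "first entry" initialization done in i40e_vxlan_filters_init().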

Signed-off-by: Jijiang Liu 
Acked-by: Helin Zhang 
Acked-by: Jingjing Wu 
Acked-by: Jing Chen 

---
 config/common_linuxapp|5 +
 lib/librte_mbuf/rte_mbuf.h|2 +
 lib/librte_pmd_i40e/i40e_ethdev.c |  200 -
 lib/librte_pmd_i40e/i40e_ethdev.h |5 +
 lib/librte_pmd_i40e/i40e_rxtx.c   |   10 ++
 5 files changed, 221 insertions(+), 1 deletions(-)

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 5bee910..75a4cd7 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -212,6 +212,11 @@ CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_VF=4
 CONFIG_RTE_LIBRTE_I40E_ITR_INTERVAL=-1

 #
+# Compile tunneling UDP port support
+#
+CONFIG_RTE_LIBRTE_TUNNEL_UDP_PORT=4789
+
+#
 # Compile burst-oriented VIRTIO PMD driver
 #
 CONFIG_RTE_LIBRTE_VIRTIO_PMD=y
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 1c6e115..4955684 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -538,6 +538,7 @@ static inline void rte_pktmbuf_reset(struct rte_mbuf *m)
m->port = 0xff;

m->ol_flags = 0;
+   m->reserved = 0;
m->data_off = (RTE_PKTMBUF_HEADROOM <= m->buf_len) ?
RTE_PKTMBUF_HEADROOM : m->buf_len;

@@ -607,6 +608,7 @@ static inline void rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *md)
mi->pkt_len = mi->data_len;
mi->nb_segs = 1;
mi->ol_flags = md->ol_flags;
+   mi->reserved = md->reserved;

__rte_mbuf_sanity_check(mi, 1);
__rte_mbuf_sanity_check(md, 0);
diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c 
b/lib/librte_pmd_i40e/i40e_ethdev.c
index a00d6ca..ddc7ea0 100644
--- a/lib/librte_pmd_i40e/i40e_ethdev.c
+++ b/lib/librte_pmd_i40e/i40e_ethdev.c
@@ -189,7 +189,7 @@ static int i40e_res_pool_alloc(struct i40e_res_pool_info *pool,
 static int i40e_dev_init_vlan(struct rte_eth_dev *dev);
 static int i40e_veb_release(struct i40e_veb *veb);
 static struct i40e_veb *i40e_veb_setup(struct i40e_pf *pf,
-   struct i40e_vsi *vsi);
+   struct i40e_vsi *vsi);
 static int i40e_pf_config_mq_rx(struct i40e_pf *pf);
 static int i40e_vsi_config_double_vlan(struct i40e_vsi *vsi, int on);
 static inline int i40e_find_all_vlan_for_mac(struct i40e_vsi *vsi,
@@ -205,6 +205,14 @@ static int i40e_dev_rss_hash_update(struct rte_eth_dev *dev,
struct rte_eth_rss_conf *rss_conf);
 static int i40e_dev_rss_hash_conf_get(struct rte_eth_dev *dev,
  struct rte_eth_rss_conf *rss_conf);
+static int i40e_dev_udp_tunnel_add(struct rte_eth_dev *dev,
+  struct rte_eth_udp_tunnel *udp_tunnel,
+  uint8_t count);
+static int i40e_dev_udp_tunnel_del(struct rte_eth_dev *dev,
+  struct rte_eth_udp_tunnel *udp_tunnel,
+  uint8_t count);
+static int i40e_pf_config_vxlan(struct i40e_pf *pf);
+

 /* Default hash key buffer for RSS */
 static uint32_t rss_key_default[I40E_PFQF_HKEY_MAX_INDEX + 1];
@@ -256,6 +264,8 @@ static struct eth_dev_ops i40e_eth_dev_ops = {
.reta_query   = i40e_dev_rss_reta_query,
.rss_hash_update  = i40e_dev_rss_hash_update,
.rss_hash_conf_get= i40e_dev_rss_hash_conf_get,
+   .udp_tunnel_add   = i40e_dev_udp_tunnel_add,
+   .udp_tunnel_del   = i40e_dev_udp_tunnel_del,
 };

 static struct eth_driver rte_i40e_pmd = {
@@ -2532,6 +2542,34 @@ i40e_vsi_dump_bw_config(struct i40e_vsi *vsi)
return 0;
 }

+static int
+i40e_vxlan_filters_init(struct i40e_pf *pf)
+{
+   uint8_t filter_index;
+   int ret = 0;
+   struct i40e_hw *hw = I40E_PF_TO_HW(pf);
+
+   if (!(pf->flags & I40E_FLAG_VXLAN))
+   return 0;
+
+   /* Init first entry in tunneling UDP table */
+   ret = i40e_aq_add_udp_tunnel(hw, RTE_LIBRTE_TUNNEL_UDP_PORT,
+   I40E_AQC_TUNNEL_TYPE_VXLAN,
+   &filter_index, NULL);
+   if (ret < 0) {
+   PMD_DRV_LOG(ERR, "Failed to add UDP tunnel port %d "
+   "with index=%d\n", RTE_VXLAN_UDP_PORT,
+filter_index);
+   } else {
+   pf->vxlan_bitmap |= 1;
+   pf->vxlan_ports[0] = RTE_LIBRTE_TUNNEL_UDP_PORT;
+   PMD_DRV_LOG(INFO, "Added UDP tunnel port %d with "
+   "index=%d\n", RTE_VXLAN_UDP_PORT, filter_index);
+   }
+
+   return ret;
+}
+
 /* Setup a VSI */
 struct i40e_vsi *
 i40e_vsi_setup(struct i40e_pf *pf,
@@ -3163,6 +3201,12

[dpdk-dev] [PATCH v4 1/8]i40e:support VxLAN packet identification in librte_ether

2014-09-26 Thread Jijiang Liu

Add data structures and APIs in librte_ether for supporting tunneling UDP port
configuration on i40e.
Currently, only VxLAN is implemented, which includes:
 -  VxLAN UDP port initialization
 -  Add APIs to configure VxLAN UDP port
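
The argument checks these APIs perform before calling into the driver can be sketched in isolation as follows. The struct and enum mirror the definitions added to rte_ethdev.h by this patch; check_udp_tunnels() is a hypothetical helper used only for illustration, and the port_id and dev_ops checks done by the real functions are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Definitions mirroring those added to rte_ethdev.h by this patch. */
struct rte_eth_udp_tunnel {
	uint16_t udp_port;
	uint8_t prot_type;
};

enum rte_eth_tunnel_type {
	RTE_TUNNEL_TYPE_NONE = 0,
	RTE_TUNNEL_TYPE_VXLAN,
	RTE_TUNNEL_TYPE_GENEVE,
	RTE_TUNNEL_TYPE_TEREDO,
	RTE_TUNNEL_TYPE_NVGRE,
	RTE_TUNNEL_TYPE_MAX,
};

/* Hypothetical helper: validate an array of 'count' tunnel entries. */
static int
check_udp_tunnels(const struct rte_eth_udp_tunnel *udp_tunnel, uint8_t count)
{
	uint8_t i;

	/* prot_type is a uint8_t, so every tunnel type must fit in it. */
	assert(RTE_TUNNEL_TYPE_MAX <= UINT8_MAX);

	if (udp_tunnel == NULL)
		return -EINVAL;

	for (i = 0; i < count; i++)
		if (udp_tunnel[i].prot_type >= RTE_TUNNEL_TYPE_MAX)
			return -EINVAL;

	return 0;
}
```

Only after these checks pass would the request be forwarded to the udp_tunnel_add or udp_tunnel_del device operation.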

Signed-off-by: Jijiang Liu 
Acked-by: Helin Zhang 
Acked-by: Jingjing Wu 
Acked-by: Jing Chen 

---
 lib/librte_ether/rte_ethdev.c |   63 ++
 lib/librte_ether/rte_ethdev.h |   76 +
 lib/librte_ether/rte_ether.h  |8 
 3 files changed, 147 insertions(+), 0 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index b71b679..642d312 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -2029,6 +2029,69 @@ rte_eth_dev_rss_hash_conf_get(uint8_t port_id,
 }

 int
+rte_eth_dev_udp_tunnel_add(uint8_t port_id,
+  struct rte_eth_udp_tunnel *udp_tunnel,
+  uint8_t count)
+{
+   uint8_t i;
+   struct rte_eth_dev *dev;
+   struct rte_eth_udp_tunnel *tunnel;
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return -ENODEV;
+   }
+
+   if (udp_tunnel == NULL) {
+   PMD_DEBUG_TRACE("Invalid udp_tunnel parameter\n");
+   return -EINVAL;
+   }
+   tunnel = udp_tunnel;
+
+   for (i = 0; i < count; i++, tunnel++) {
+   if (tunnel->prot_type >= RTE_TUNNEL_TYPE_MAX) {
+   PMD_DEBUG_TRACE("Invalid tunnel type\n");
+   return -EINVAL;
+   }
+   }
+
+   dev = &rte_eth_devices[port_id];
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->udp_tunnel_add, -ENOTSUP);
+   return (*dev->dev_ops->udp_tunnel_add)(dev, udp_tunnel, count);
+}
+
+int
+rte_eth_dev_udp_tunnel_delete(uint8_t port_id,
+ struct rte_eth_udp_tunnel *udp_tunnel,
+ uint8_t count)
+{
+   uint8_t i;
+   struct rte_eth_dev *dev;
+   struct rte_eth_udp_tunnel *tunnel;
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return -ENODEV;
+   }
+   dev = &rte_eth_devices[port_id];
+
+   if (udp_tunnel == NULL) {
+   PMD_DEBUG_TRACE("Invalid udp_tunnel parameter\n");
+   return -EINVAL;
+   }
+   tunnel = udp_tunnel;
+   for (i = 0; i < count; i++, tunnel++) {
+   if (tunnel->prot_type >= RTE_TUNNEL_TYPE_MAX) {
+   PMD_DEBUG_TRACE("Invalid tunnel type\n");
+   return -EINVAL;
+   }
+   }
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->udp_tunnel_del, -ENOTSUP);
+   return (*dev->dev_ops->udp_tunnel_del)(dev, udp_tunnel, count);
+}
+
+int
 rte_eth_led_on(uint8_t port_id)
 {
struct rte_eth_dev *dev;
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 60b24c5..615fec0 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -708,6 +708,26 @@ struct rte_fdir_conf {
 };

 /**
+ * UDP tunneling configuration.
+ */
+struct rte_eth_udp_tunnel {
+   uint16_t udp_port;
+   uint8_t prot_type;
+};
+
+/**
+ * Tunneled type.
+ */
+enum rte_eth_tunnel_type {
+   RTE_TUNNEL_TYPE_NONE = 0,
+   RTE_TUNNEL_TYPE_VXLAN,
+   RTE_TUNNEL_TYPE_GENEVE,
+   RTE_TUNNEL_TYPE_TEREDO,
+   RTE_TUNNEL_TYPE_NVGRE,
+   RTE_TUNNEL_TYPE_MAX,
+};
+
+/**
  *  Possible l4type of FDIR filters.
  */
 enum rte_l4type {
@@ -829,6 +849,7 @@ struct rte_intr_conf {
  * configuration settings may be needed.
  */
 struct rte_eth_conf {
+   enum rte_eth_tunnel_type tunnel_type;
uint16_t link_speed;
/**< ETH_LINK_SPEED_10[0|00|000], or 0 for autonegotation */
uint16_t link_duplex;
@@ -1262,6 +1283,17 @@ typedef int (*eth_mirror_rule_reset_t)(struct rte_eth_dev *dev,
  uint8_t rule_id);
 /**< @internal Remove a traffic mirroring rule on an Ethernet device */

+typedef int (*eth_udp_tunnel_add_t)(struct rte_eth_dev *dev,
+   struct rte_eth_udp_tunnel *tunnel_udp,
+   uint8_t count);
+/**< @internal Add tunneling UDP info */
+
+typedef int (*eth_udp_tunnel_del_t)(struct rte_eth_dev *dev,
+   struct rte_eth_udp_tunnel *tunnel_udp,
+   uint8_t count);
+/**< @internal Delete tunneling UDP info */
+
+
 #ifdef RTE_NIC_BYPASS

 enum {
@@ -1436,6 +1468,8 @@ struct eth_dev_ops {
eth_set_vf_rx_t            set_vf_rx;  /**< enable/disable a VF receive */
eth_set_vf_tx_t            set_vf_tx;  /**< enable/disable a VF transmit */
eth_set_vf_vlan_filter_t   set_vf_vlan_filter;  /**< Set VF VLAN filter */
+   eth_udp_tunnel_add_t   udp_tunnel_add;
+   eth_udp_tunnel_del_t

[dpdk-dev] [PATCH v4 0/8]Support VxLAN on Fortville

2014-09-26 Thread Jijiang Liu

The patch set supports VxLAN on Fortville based on the current mbuf structure.
When Bruce's Mbuf Structure Rework (part 3) is applied, minor changes will be
needed later.

It includes:
 - Support VxLAN packet identification by configuring tunneling UDP port.
 - Support VxLAN packet filters. It uses MAC and VLAN to point
   to a queue. The filter types supported include below:
   1. Inner MAC and Inner VLAN ID
   2. Inner MAC address, inner VLAN ID and tenant ID.
   3. Inner MAC and tenant ID
   4. Inner MAC address
   5. Outer MAC address, tenant ID and inner MAC
 - Support VxLAN TX checksum offload, which includes outer L3 (IP), inner L3 (IP)
   and inner L4 (UDP, TCP and SCTP).

Change notes:

 v4) Merge the mbuf structure changes done by Bruce, and fix merge conflicts in
 the app/test-pmd/config.c file.


jijiangl (8):
  support VxLAN packet identification in librte_ether
  support VxLAN packet identification in librte_pmd_i40e
  test vxlan packet identification
  Add new filter framework
  implement API of VxLAN packet filter in librte_pmd_i40e
  test VxLAN packet filter API
  support VxLAN Tx checksum offload
  test VxLAN Tx checksum offload

 app/test-pmd/cmdline.c|  233 +-
 app/test-pmd/config.c |6 +-
 app/test-pmd/csumonly.c   |  200 +--
 app/test-pmd/parameters.c |   13 ++
 app/test-pmd/rxonly.c |   49 +
 app/test-pmd/testpmd.c|8 +
 app/test-pmd/testpmd.h|4 +
 config/common_linuxapp|5 +
 lib/librte_ether/Makefile |1 +
 lib/librte_ether/rte_eth_ctrl.h   |  150 ++
 lib/librte_ether/rte_ethdev.c |   95 +
 lib/librte_ether/rte_ethdev.h |  108 ++
 lib/librte_ether/rte_ether.h  |8 +
 lib/librte_mbuf/rte_mbuf.h|4 +
 lib/librte_pmd_i40e/i40e_ethdev.c |  405 -
 lib/librte_pmd_i40e/i40e_ethdev.h |5 +
 lib/librte_pmd_i40e/i40e_rxtx.c   |   57 +-
 17 files changed, 1325 insertions(+), 26 deletions(-)
 create mode 100644 lib/librte_ether/rte_eth_ctrl.h

-- 
1.7.7.6

[dpdk-dev] [PATCH] ixgbe: fix crash caused by bulk allocation failure in vector pmd

2014-09-26 Thread Balazs Nemeth

Since the introduction of vector PMD, a bug in ixgbe_rxq_rearm could
cause a crash. As long as the memory pool allocated to the RX queue
has mbufs available, there is no problem. After allocation of _all_
mbufs from the memory pool, previously returned mbufs by
rte_eth_rx_burst could be accessed by subsequent calls to the PMD and
could be returned by subsequent calls to rte_eth_rx_burst. From the
perspective of the application, this means that fields within the mbuf
could change and that previously allocated mbufs could appear multiple
times.

After failure of mbuf allocation, the dd bits should indicate that the
packets are not ready. For this, this patch adds code to reset the dd
bits in the first RTE_IXGBE_DESCS_PER_LOOP packets of the next
RTE_IXGBE_RXQ_REARM_THRESH packets only if the next
RTE_IXGBE_RXQ_REARM_THRESH packets that will be accessed contain
previously allocated packets.

Setting the bits is not enough. The bits are checked _after_ setting
the mbuf fields, thus a mechanism is needed to prevent the previously
used mbuf pointers from being accessed during the speculative load of
the mbuf fields. For this reason, not only the dd bits are reset, but
also the mbufs associated to those descriptors are set to point to a
"fake" mbuf.

Signed-off-by: Balazs Nemeth 
---
 lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c 
b/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c
index 203ddf7..457f267 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c
@@ -54,17 +54,28 @@ ixgbe_rxq_rearm(struct igb_rx_queue *rxq)
struct rte_mbuf *mb0, *mb1;
__m128i hdr_room = _mm_set_epi64x(RTE_PKTMBUF_HEADROOM,
RTE_PKTMBUF_HEADROOM);
+   __m128i dma_addr0, dma_addr1;
+
+   rxdp = rxq->rx_ring + rxq->rxrearm_start;

/* Pull 'n' more MBUFs into the software ring */
if (rte_mempool_get_bulk(rxq->mb_pool,
-(void *)rxep, RTE_IXGBE_RXQ_REARM_THRESH) < 0)
+(void *)rxep,
+RTE_IXGBE_RXQ_REARM_THRESH) < 0) {
+   if (rxq->rxrearm_nb + RTE_IXGBE_RXQ_REARM_THRESH >=
+   rxq->nb_rx_desc) {
+   dma_addr0 = _mm_xor_si128(dma_addr0, dma_addr0);
+   for (i = 0; i < RTE_IXGBE_DESCS_PER_LOOP; i++) {
+   rxep[i].mbuf = &rxq->fake_mbuf;
+   _mm_store_si128((__m128i *)&rxdp[i].read,
+   dma_addr0);
+   }
+   }
return;
-
-   rxdp = rxq->rx_ring + rxq->rxrearm_start;
+   }

/* Initialize the mbufs in vector, process 2 mbufs in one loop */
for (i = 0; i < RTE_IXGBE_RXQ_REARM_THRESH; i += 2, rxep += 2) {
-   __m128i dma_addr0, dma_addr1;
__m128i vaddr0, vaddr1;

mb0 = rxep[0].mbuf;
-- 
2.1.0
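
The failure path described in the commit message above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: the ring size, the constants and the `rx_desc` layout are hypothetical simplifications, not the ixgbe definitions, and a plain `memset` stands in for the vector store used in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NB_RX_DESC      8    /* hypothetical ring size */
#define REARM_THRESH    4    /* stands in for RTE_IXGBE_RXQ_REARM_THRESH */
#define DESCS_PER_LOOP  2    /* stands in for RTE_IXGBE_DESCS_PER_LOOP */
#define DD_BIT          0x1u /* "descriptor done" flag in this toy layout */

struct mbuf { uint32_t pkt_len; };

struct rx_desc {             /* simplified descriptor, not the ixgbe layout */
    uint64_t addr;
    uint32_t status;
};

struct rx_queue {
    struct rx_desc ring[NB_RX_DESC];
    struct mbuf *sw_ring[NB_RX_DESC];
    struct mbuf fake_mbuf;   /* harmless target for speculative loads */
    unsigned rearm_start;    /* first descriptor of the rearm window */
    unsigned rearm_nb;       /* descriptors waiting to be rearmed */
};

/* Called when the bulk allocation for the rearm window has failed. */
static void rearm_alloc_failed(struct rx_queue *q)
{
    unsigned i;

    /* Only needed once the receive path would reach stale entries. */
    if (q->rearm_nb + REARM_THRESH < NB_RX_DESC)
        return;

    for (i = 0; i < DESCS_PER_LOOP; i++) {
        unsigned idx = (q->rearm_start + i) % NB_RX_DESC;

        /* No stale mbuf pointer may be dereferenced before the DD check. */
        q->sw_ring[idx] = &q->fake_mbuf;
        /* Zeroing the descriptor clears the DD bit: "not ready". */
        memset(&q->ring[idx], 0, sizeof(q->ring[idx]));
    }
}
```

The point of the model is that both steps happen together: clearing the DD bit alone would still leave a stale pointer for the speculative field loads.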

[dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:

2014-09-26 Thread Wodkowski, PawelX

> >
> > diff --git a/lib/librte_eal/linuxapp/eal/eal_alarm.c
> > b/lib/librte_eal/linuxapp/eal/eal_alarm.c
> > index 480f0cb..73b6dc5 100644
> > --- a/lib/librte_eal/linuxapp/eal/eal_alarm.c
> > +++ b/lib/librte_eal/linuxapp/eal/eal_alarm.c
> > @@ -64,6 +64,9 @@
> >  #define MS_PER_S 1000
> >  #define US_PER_S (US_PER_MS * MS_PER_S)
> >
> > +#define ALARM_EXECUTING (1 << 0)
> > +#define ALARM_CANCELLED (1 << 1)
> > +
> >  struct alarm_entry {
> > LIST_ENTRY(alarm_entry) next;
> > struct timeval time;
> > @@ -107,12 +110,14 @@ eal_alarm_callback(struct rte_intr_handle *hdl
> > __rte_unused,
> > > gettimeofday(&now, NULL) == 0 &&
> > (ap->time.tv_sec < now.tv_sec || (ap->time.tv_sec ==
> > now.tv_sec &&
> > ap->time.tv_usec <=
> > now.tv_usec))){
> > -   ap->executing = 1;
> > > -   rte_spinlock_unlock(&alarm_list_lk);
> 
> Removing unlock here introduce deadlock.

I did not spot the unlock below, so my comment above is invalid.

> 
> > +   ap->executing |= ALARM_EXECUTING;
> > > +   if (likely(!(ap->executing & ALARM_CANCELLED))) {
> > > +   rte_spinlock_unlock(&alarm_list_lk);
> >
> > -   ap->cb_fn(ap->cb_arg);
> > +   ap->cb_fn(ap->cb_arg);
> >
> > > -   rte_spinlock_lock(&alarm_list_lk);
> > > +   rte_spinlock_lock(&alarm_list_lk);
> > +   }
> > LIST_REMOVE(ap, next);
> > rte_free(ap);
> > }
> > @@ -209,10 +214,9 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn,
> > void *cb_arg)
> > > rte_spinlock_lock(&alarm_list_lk);
> > > /* remove any matches at the start of the list */
> > > while ((ap = LIST_FIRST(&alarm_list)) != NULL &&
> > -   cb_fn == ap->cb_fn && ap->executing == 0 &&
> > +   cb_fn == ap->cb_fn &&
> > (cb_arg == (void *)-1 || cb_arg == ap->cb_arg)) {
> > -   LIST_REMOVE(ap, next);
> > -   rte_free(ap);
> > +   ap->executing |= ALARM_CANCELLED;
> > count++;
> > }
> > ap_prev = ap;
> > @@ -220,10 +224,9 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn,
> > void *cb_arg)
> > /* now go through list, removing entries not at start */
> > > LIST_FOREACH(ap, &alarm_list, next) {
> > /* this won't be true first time through */
> > -   if (cb_fn == ap->cb_fn &&  ap->executing == 0 &&
> > +   if (cb_fn == ap->cb_fn &&
> > (cb_arg == (void *)-1 || cb_arg == ap->cb_arg))
> > {
> > -   LIST_REMOVE(ap,next);
> > -   rte_free(ap);
> > +   ap->executing |= ALARM_CANCELLED;
> > count++;
> > ap = ap_prev;
> > }
> 
> Pawel

[dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:

2014-09-26 Thread Neil Horman

On Fri, Sep 26, 2014 at 06:33:12AM +, Wodkowski, PawelX wrote:
> > Given what you said above, I agree, at least in the current implementation. 
> >  It
> > still seems like theres a simpler solution that doesn't require all the
> > comparative gymnastics.
> 
> Yes, there is simpler solution, but this solution involve recursive locking.
> DPDK recursive spinlocks are no an option in here, so only option is posix 
> recursive
> mutex, which I think is even worst option than this gymnastics.
> 
I agree, let's avoid more locking if we can.

> > 
> > What if, instead of testing if you're the callback thread, we turn the 
> > executing
> > field of alarm_entry into a bitfield, where bit 0 represents the former
> > "executing" state, and bit 1 is defined as a "cancelled" bit.  Then
> > rte_eal_alarm_cancel becomes a search that, when an alarm is found simply 
> > or's
> > in the cancelled bit to the executing bit field.  When the callback thread 
> > runs,
> > it skips executing any alarm that is marked as cancelled, but frees all 
> > alarm
> > entries that are executed or cancelled.  That gives us a single point at 
> > which
> > frees of alarm entires happen?  Something like the patch below (completely
> > untested)?
> > 
> > It also seems like the alarm api as a whole could use some improvement.  The
> > way its written right now, theres no way to refer to a specific alarm (i.e.
> > cancelation relies on the specification of a function and data pointer, 
> > which
> > may refer to multiple timers).  Shouldn't rte_eal_alarm_set return an opaque
> > handle to a unique timer instance that can be store by a caller and used to
> > specfically cancel that timer?  Thats how both the bsd and linux timer
> > subsystems model timers.
> > 
> 
> Goal was to not break user applications that use this library.
> 
You break API all the time, why are you worried about it here?  I'm all for
maintaining API, definitely, but once my ABI versioning code gets into place we
can manage this a lot better.

> > 
> > 
> > diff --git a/lib/librte_eal/linuxapp/eal/eal_alarm.c
> > b/lib/librte_eal/linuxapp/eal/eal_alarm.c
> > index 480f0cb..73b6dc5 100644
> > --- a/lib/librte_eal/linuxapp/eal/eal_alarm.c
> > +++ b/lib/librte_eal/linuxapp/eal/eal_alarm.c
> > @@ -64,6 +64,9 @@
> >  #define MS_PER_S 1000
> >  #define US_PER_S (US_PER_MS * MS_PER_S)
> > 
> > +#define ALARM_EXECUTING (1 << 0)
> > +#define ALARM_CANCELLED (1 << 1)
> > +
> >  struct alarm_entry {
> > LIST_ENTRY(alarm_entry) next;
> > struct timeval time;
> > @@ -107,12 +110,14 @@ eal_alarm_callback(struct rte_intr_handle *hdl
> > __rte_unused,
> > > gettimeofday(&now, NULL) == 0 &&
> > (ap->time.tv_sec < now.tv_sec || (ap->time.tv_sec ==
> > now.tv_sec &&
> > ap->time.tv_usec <=
> > now.tv_usec))){
> > -   ap->executing = 1;
> > > -   rte_spinlock_unlock(&alarm_list_lk);
> 
> Removing unlock here introduce deadlock.
> 
Please look more closely, I've not removed anything, only moved where the lock
occurs.

> > +   ap->executing |= ALARM_EXECUTING;
> > > +   if (likely(!(ap->executing & ALARM_CANCELLED))) {
> > > +   rte_spinlock_unlock(&alarm_list_lk);
The unlock is now here, conditional on needing to call the callback.

> > 
> > -   ap->cb_fn(ap->cb_arg);
> > +   ap->cb_fn(ap->cb_arg);
> > 
> > > -   rte_spinlock_lock(&alarm_list_lk);
> > > +   rte_spinlock_lock(&alarm_list_lk);
> > +   }
> > LIST_REMOVE(ap, next);
> > rte_free(ap);
> > }
> > @@ -209,10 +214,9 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn,
> > void *cb_arg)
> > > rte_spinlock_lock(&alarm_list_lk);
> > > /* remove any matches at the start of the list */
> > > while ((ap = LIST_FIRST(&alarm_list)) != NULL &&
> > -   cb_fn == ap->cb_fn && ap->executing == 0 &&
> > +   cb_fn == ap->cb_fn &&
> > (cb_arg == (void *)-1 || cb_arg == ap->cb_arg)) {
> > -   LIST_REMOVE(ap, next);
> > -   rte_free(ap);
> > +   ap->executing |= ALARM_CANCELLED;
> > count++;
> > }
> > ap_prev = ap;
> > @@ -220,10 +224,9 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn,
> > void *cb_arg)
> > /* now go through list, removing entries not at start */
> > > LIST_FOREACH(ap, &alarm_list, next) {
> > /* this won't be true first time through */
> > -   if (cb_fn == ap->cb_fn &&  ap->executing == 0 &&
> > +   if (cb_fn == ap->cb_fn &&
> > (cb_arg == (void *)-1 || cb_arg == ap->cb_arg))
> > {
> > -   LIST_REMOVE(ap,next);
> > -   rte_free(ap);
> > +   ap->executing |= ALARM_CANCELLED;
> > count++;
> > ap = ap_prev;
> > }
> 
> Pawel
>
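
The "cancelled bit" idea discussed in this message can be sketched outside of DPDK as follows. This is a minimal single-threaded model, not the EAL code: locking and expiry times are left out on purpose, and `alarm_set`, `alarm_cancel`, `alarm_run_all`, `count_cb` and `cb_hits` are hypothetical names chosen for illustration. Only the two flag values come from the patch quoted above.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

#define ALARM_EXECUTING (1 << 0)
#define ALARM_CANCELLED (1 << 1)

typedef void (*alarm_cb)(void *arg);

struct alarm_entry {
    LIST_ENTRY(alarm_entry) next;
    alarm_cb cb_fn;
    void *cb_arg;
    unsigned flags;
};

static LIST_HEAD(alarm_list, alarm_entry) alarms = LIST_HEAD_INITIALIZER(alarms);

static int alarm_set(alarm_cb fn, void *arg)
{
    struct alarm_entry *ap = calloc(1, sizeof(*ap));

    if (ap == NULL)
        return -1;
    ap->cb_fn = fn;
    ap->cb_arg = arg;
    LIST_INSERT_HEAD(&alarms, ap, next);
    return 0;
}

/* Cancel never frees: it only marks matching entries. Returns the count. */
static int alarm_cancel(alarm_cb fn, void *arg)
{
    struct alarm_entry *ap;
    int count = 0;

    LIST_FOREACH(ap, &alarms, next) {
        if (ap->cb_fn == fn && ap->cb_arg == arg &&
            !(ap->flags & ALARM_CANCELLED)) {
            ap->flags |= ALARM_CANCELLED;
            count++;
        }
    }
    return count;
}

/* The only place entries are freed. Returns how many callbacks ran. */
static int alarm_run_all(void)
{
    struct alarm_entry *ap;
    int fired = 0;

    while ((ap = LIST_FIRST(&alarms)) != NULL) {
        ap->flags |= ALARM_EXECUTING;
        if (!(ap->flags & ALARM_CANCELLED)) {
            ap->cb_fn(ap->cb_arg);
            fired++;
        }
        LIST_REMOVE(ap, next);
        free(ap);
    }
    return fired;
}

/* Demo callback: counts how often it was invoked. */
static int cb_hits;
static void count_cb(void *arg)
{
    (void)arg;
    cb_hits++;
}
```

Because cancel only sets a flag, a cancel issued from inside a callback cannot free the entry the callback pass is still holding. The cost is that cancelled entries stay on the list until the next callback pass.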

[dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:

2014-09-26 Thread Neil Horman

On Fri, Sep 26, 2014 at 12:37:54PM +, Wodkowski, PawelX wrote:
> > So basically cancel() just set ALARM_CANCELLED and leaves actual alarm
> > deletion to the callback()?
> > That was the thought, yes.
> > 
> > > I think it is doable - but I don't see any real advantage with that 
> > > approach.
> > > Yes, code will become a bit simpler, as  we'll have one point when we 
> > > remove
> > alarm from the list.
> > Yes, that would be the advantage, that the code would be much simpler.
> > 
> > > But from other side, imagine such simple test-case:
> > >
> > > > for (i = 0; i < 0x100000; i++) {
> > >rte_eal_alarm_set(ONE_MIN, cb_func, (void *)i);
> > >rte_eal_alarm_cancel(cb_func, (void *)i);
> > > }
> > >
> > > We'll endup with 1M of cancelled, but still not removed entries in the
> > alarm_list.
> > > With current implementation that means - few MBs of wasted memory,
> > Thats correct, and the tradeoff to choose between.  Do you want simpler code
> > that is easier to maintain, or do you want a high speed cancel and set
> > operation.  I'm not aware of all the use cases, but I have a hard time 
> > seeing
> > a use case in which the in-flight alarm list grows unboundedly large, which 
> > in
> > my mind mitigates the risk of deferred removal, but I'm perfectly willing to
> > believe that there are use cases which I'm not aware of.
> > 
> > > plus very slow set() and cancel(), as they'll  have to traverse all 
> > > entries in the
> > list.
> > > And all that - for empty from user perspective alarm_list
> > > So I still prefer Michal's way.
> > > After all, it doesn't look that complicated to me.
> > Except that the need for Michals fix arose from the fact that we have two 
> > free
> > locations that might both get called depending on the situation.  Thats 
> > what I'm
> > trying to address here, the complexity itself, rather than the fix (which I
> > agree is perfectly valid).
> > 
> > > BTW, any particular reason you are so negative about pthread_self()?
> > >
> > Nothing specifically against it (save for its inverted return code sense, 
> > which
> > made it difficult for me to parse when reviewing).  Its more the complexity
> > itself in the alarm cancel and callback routine that I was looking at.  
> > Given
> > that the origional bug happened because an cancel in a callback might 
> > produce a
> > double free, I wanted to fix it by simpifying the code, not adding 
> > conditions
> > which make it more complex.
> > 
> > You know, looking at it, something else just occured to me.  I think this 
> > could
> > all be fixed without the cancel flag or the pthread_self assignments.  What 
> > if
> > we simply removed the alarm from the list before we called the callback in
> > rte_eal_alarm_callback()?  That way any cancel operation called from within 
> > the
> > callback would fail, as it wouldn't appear on the list, and the callback
> > operation would be the only freeing entity?  That would let you still have a
> > fast set and cancel, and avoid the race.  Thoughts?  Untested sample patch
> > below
> > 
> > 
> > > >
> > > > It also seems like the alarm api as a whole could use some improvement.
> > The
> > > > way its written right now, theres no way to refer to a specific alarm 
> > > > (i.e.
> > > > cancelation relies on the specification of a function and data pointer, 
> > > > which
> > > > may refer to multiple timers).  Shouldn't rte_eal_alarm_set return an 
> > > > opaque
> > > > handle to a unique timer instance that can be store by a caller and 
> > > > used to
> > > > specfically cancel that timer?  Thats how both the bsd and linux timer
> > > > subsystems model timers.
> > >
> > > Yeh,  alarm API looks a bit unusual.
> > > Though, I suppose that's subject for another patch/discussion :)
> > >
> > Yes, agreed :)
> > 
> 
> Please read quoted message bellow:
> 
> > >
> > >
> > > His solution *does* eliminate race condition too.
> > >
> > I applied his patch. And here is the problem
> > 1   rte_spinlock_lock(_list_lk);
> > 2   while ((ap = LIST_FIRST(_list)) !=NULL &&
> > 3   gettimeofday(, NULL) == 0 &&
> > 4   (ap->time.tv_sec < now.tv_sec || (ap->time.tv_sec ==
> > now.tv_sec &&
> > 5   ap->time.tv_usec <=
> > now.tv_usec))){
> > 6   ap->executing |= ALARM_EXECUTING;
> > 7   if (likely(!(ap->executing & ALARM_CANCELLED))) {
> > 8   rte_spinlock_unlock(_list_lk);
> > 9   //another thread: rte_alarm_cancel called, mark 
> > this timer
> > canceled and exit ( THE RACE)
> > 10  ap->cb_fn(ap->cb_arg); // rte_alarm_set called
> > (THE RACE)
> > 11
> > 12  rte_spinlock_lock(_list_lk);
> > 13  }
> > 14
> > 15  rte_spinlock_lock(_list_lk);
> > 16  LIST_REMOVE(ap, next);
> > 17  rte_free(ap);
> > 18  }
> > 
> > Imagine
> > 
> > Thread 1:
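
The alternative raised earlier in this message, taking the alarm off the list before its callback runs, can be sketched roughly as below. It is a single-threaded illustration only: the spinlock and the expiry check are omitted, and every name in it (`timer_set`, `timer_cancel`, `timer_run_all`, `self_cancel_cb`) is hypothetical rather than part of the EAL alarm API.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

typedef void (*timer_cb)(void *arg);

struct timer_entry {
    LIST_ENTRY(timer_entry) next;
    timer_cb cb_fn;
    void *cb_arg;
};

static LIST_HEAD(timer_list, timer_entry) timers = LIST_HEAD_INITIALIZER(timers);

static int timer_set(timer_cb fn, void *arg)
{
    struct timer_entry *tp = calloc(1, sizeof(*tp));

    if (tp == NULL)
        return -1;
    tp->cb_fn = fn;
    tp->cb_arg = arg;
    LIST_INSERT_HEAD(&timers, tp, next);
    return 0;
}

/* Cancel removes and frees whatever is still on the list. */
static int timer_cancel(timer_cb fn, void *arg)
{
    struct timer_entry *tp = LIST_FIRST(&timers);
    struct timer_entry *nxt;
    int count = 0;

    while (tp != NULL) {
        nxt = LIST_NEXT(tp, next);  /* fetch before tp may be freed */
        if (tp->cb_fn == fn && tp->cb_arg == arg) {
            LIST_REMOVE(tp, next);
            free(tp);
            count++;
        }
        tp = nxt;
    }
    return count;
}

static int timer_run_all(void)
{
    struct timer_entry *tp;
    int fired = 0;

    while ((tp = LIST_FIRST(&timers)) != NULL) {
        LIST_REMOVE(tp, next);   /* off the list before the callback runs */
        tp->cb_fn(tp->cb_arg);   /* a cancel from here cannot find tp */
        free(tp);                /* so this is the only free of tp */
        fired++;
    }
    return fired;
}

/* Demo callback that tries to cancel itself and records the result. */
static int self_cancel_result = -1;
static void self_cancel_cb(void *arg)
{
    self_cancel_result = timer_cancel(self_cancel_cb, arg);
}
```

In this model a self-cancel from the callback simply returns 0, so the double free cannot occur, while set and cancel keep operating on a list that holds only pending entries.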

[dpdk-dev] [PATCH 4/4] table: fix pointer calculations at initialization

2014-09-26 Thread Balazs Nemeth

During initialization of rte_table_hash_ext and rte_table_hash_lru, a
contiguous region of memory is allocated to store meta data, buckets,
extended buckets, keys, stack of keys, stack of extended buckets and
data entries. The size of each region depends on the hash table
configuration.

The address of each region is calculated using offsets relative to the
beginning of the memory region. Without this patch, the offsets
contain the size of the table meta data (sizeof(struct
rte_table_hash)). These addresses are stored in pointers which are
used when entries are added or deleted and lookups are performed.

Instead of adding these offsets to the address of the beginning of the
memory region, they are added to the address of the end of the meta
data (= address of the beginning of the memory region + sizeof(struct
rte_table_hash)). The resulting addresses are off by sizeof(struct
rte_table_hash) bytes. As a consequence, memory past the allocated
region can be accessed by the add, delete and lookup operations.

This patch corrects the address calculation by not including the size
of the meta data in the offsets.

Acked-by: Cristian Dumitrescu 
Signed-off-by: Balazs Nemeth 
---
 lib/librte_table/rte_table_hash_ext.c | 5 ++---
 lib/librte_table/rte_table_hash_lru.c | 5 ++---
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/lib/librte_table/rte_table_hash_ext.c 
b/lib/librte_table/rte_table_hash_ext.c
index fb3e6d2..467f48a 100644
--- a/lib/librte_table/rte_table_hash_ext.c
+++ b/lib/librte_table/rte_table_hash_ext.c
@@ -170,7 +170,7 @@ rte_table_hash_ext_create(void *params, int socket_id, 
uint32_t entry_size)
struct rte_table_hash_ext_params *p =
(struct rte_table_hash_ext_params *) params;
struct rte_table_hash *t;
-   uint32_t total_size, table_meta_sz, table_meta_offset;
+   uint32_t total_size, table_meta_sz;
uint32_t bucket_sz, bucket_ext_sz, key_sz;
uint32_t key_stack_sz, bkt_ext_stack_sz, data_sz;
uint32_t bucket_offset, bucket_ext_offset, key_offset;
@@ -224,8 +224,7 @@ rte_table_hash_ext_create(void *params, int socket_id, 
uint32_t entry_size)
t->data_size_shl = __builtin_ctzl(entry_size);

/* Tables */
-   table_meta_offset = 0;
-   bucket_offset = table_meta_offset + table_meta_sz;
+   bucket_offset = 0;
bucket_ext_offset = bucket_offset + bucket_sz;
key_offset = bucket_ext_offset + bucket_ext_sz;
key_stack_offset = key_offset + key_sz;
diff --git a/lib/librte_table/rte_table_hash_lru.c 
b/lib/librte_table/rte_table_hash_lru.c
index bf92e81..f94c0a2 100644
--- a/lib/librte_table/rte_table_hash_lru.c
+++ b/lib/librte_table/rte_table_hash_lru.c
@@ -147,7 +147,7 @@ rte_table_hash_lru_create(void *params, int socket_id, 
uint32_t entry_size)
struct rte_table_hash_lru_params *p =
(struct rte_table_hash_lru_params *) params;
struct rte_table_hash *t;
-   uint32_t total_size, table_meta_sz, table_meta_offset;
+   uint32_t total_size, table_meta_sz;
uint32_t bucket_sz, key_sz, key_stack_sz, data_sz;
uint32_t bucket_offset, key_offset, key_stack_offset, data_offset;
uint32_t i;
@@ -195,8 +195,7 @@ rte_table_hash_lru_create(void *params, int socket_id, 
uint32_t entry_size)
t->data_size_shl = __builtin_ctzl(entry_size);

/* Tables */
-   table_meta_offset = 0;
-   bucket_offset = table_meta_offset + table_meta_sz;
+   bucket_offset = 0;
key_offset = bucket_offset + bucket_sz;
key_stack_offset = key_offset + key_sz;
data_offset = key_stack_offset + key_stack_sz;
--
2.1.0
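
The offset arithmetic fixed by this patch can be shown with a reduced model. The struct below is a hypothetical stand-in for `struct rte_table_hash` with only three regions; the names `table`, `table_create` and `total_size` are assumptions made for the sketch. The key point is that the offsets index into `memory[]`, which already starts after the meta data, so the first region begins at offset 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct table {
    size_t total_size;       /* bytes allocated, header included */
    uint8_t *buckets;
    uint8_t *keys;
    uint8_t *data;
    uint8_t memory[];        /* regions are laid out back to back here */
};

static struct table *table_create(uint32_t bucket_sz, uint32_t key_sz,
                                  uint32_t data_sz)
{
    size_t total = sizeof(struct table) + bucket_sz + key_sz + data_sz;
    uint32_t bucket_offset, key_offset, data_offset;
    struct table *t = calloc(1, total);

    if (t == NULL)
        return NULL;
    t->total_size = total;

    /*
     * Offsets are relative to t->memory, not to t. Starting at
     * sizeof(struct table) here would count the header twice and push
     * the last region past the end of the allocation.
     */
    bucket_offset = 0;
    key_offset = bucket_offset + bucket_sz;
    data_offset = key_offset + key_sz;

    t->buckets = &t->memory[bucket_offset];
    t->keys = &t->memory[key_offset];
    t->data = &t->memory[data_offset];
    return t;
}
```

With `bucket_offset = 0` the end of the data region is never beyond `(uint8_t *)t + total_size`, which is the property the add, delete and lookup paths rely on.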

[dpdk-dev] [PATCH 3/4] table: fix incorrect t->data_size_shl initialization

2014-09-26 Thread Balazs Nemeth

During initialization of rte_table_hash_ext and rte_table_hash_lru,
t->data_size_shl is calculated.  This member contains the number of
bits to shift left during calculation of the location of entries in
the hash table.  To determine the number of bits to shift left, the
size of the entry (as provided to rte_table_hash_ext_create and
rte_table_hash_lru_create) has to be used instead of the size of the
key.

Acked-by: Cristian Dumitrescu 
Signed-off-by: Balazs Nemeth 
---
 lib/librte_table/rte_table_hash_ext.c | 2 +-
 lib/librte_table/rte_table_hash_lru.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_table/rte_table_hash_ext.c 
b/lib/librte_table/rte_table_hash_ext.c
index 17c16cd..fb3e6d2 100644
--- a/lib/librte_table/rte_table_hash_ext.c
+++ b/lib/librte_table/rte_table_hash_ext.c
@@ -221,7 +221,7 @@ rte_table_hash_ext_create(void *params, int socket_id, 
uint32_t entry_size)
/* Internal */
t->bucket_mask = t->n_buckets - 1;
t->key_size_shl = __builtin_ctzl(p->key_size);
-   t->data_size_shl = __builtin_ctzl(p->key_size);
+   t->data_size_shl = __builtin_ctzl(entry_size);

/* Tables */
table_meta_offset = 0;
diff --git a/lib/librte_table/rte_table_hash_lru.c 
b/lib/librte_table/rte_table_hash_lru.c
index d1a4984..bf92e81 100644
--- a/lib/librte_table/rte_table_hash_lru.c
+++ b/lib/librte_table/rte_table_hash_lru.c
@@ -192,7 +192,7 @@ rte_table_hash_lru_create(void *params, int socket_id, 
uint32_t entry_size)
/* Internal */
t->bucket_mask = t->n_buckets - 1;
t->key_size_shl = __builtin_ctzl(p->key_size);
-   t->data_size_shl = __builtin_ctzl(p->key_size);
+   t->data_size_shl = __builtin_ctzl(entry_size);

/* Tables */
table_meta_offset = 0;
--
2.1.0
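
A short sketch of why the shift must be derived from the entry size: the shift turns an entry index into a byte offset, so a shift taken from the key size lands on the wrong address whenever the two sizes differ. The helper names `size_to_shift` and `entry_offset` are hypothetical, and the sketch assumes sizes are powers of two, as the use of `__builtin_ctzl` in the patch implies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* log2 of a power-of-two size, via the GCC/Clang count-trailing-zeros builtin. */
static uint32_t size_to_shift(uint32_t size)
{
    return (uint32_t)__builtin_ctzl(size);
}

/* Byte offset of entry `index` in an array of (1 << shl)-byte entries. */
static size_t entry_offset(uint32_t index, uint32_t shl)
{
    return (size_t)index << shl;
}
```

For example, with 8-byte keys and 64-byte entries, entry 3 starts at byte 192; using the key-size shift would compute 24 instead.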

[dpdk-dev] [PATCH 2/4] table: fix checking extended buckets in unoptimized case

2014-09-26 Thread Balazs Nemeth

If a key is not found in a bucket and the bucket has been extended,
the extended buckets also have to be checked for potentially matching
keys. The extended buckets are checked at the end of the lookup. In
most cases, this logic is skipped as it is uncommon to have buckets in
an extended state.

In case the lookup is performed with less than 5 packets, an
unoptimized version is run instead (the optimized version requires at
least 5 packets). The extended buckets should also be checked in this
case instead of simply ignoring the extended buckets.

Acked-by: Cristian Dumitrescu 
Signed-off-by: Balazs Nemeth 
---
 lib/librte_table/rte_table_hash_key16.c | 4 ++--
 lib/librte_table/rte_table_hash_key32.c | 4 ++--
 lib/librte_table/rte_table_hash_key8.c  | 8 
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/lib/librte_table/rte_table_hash_key16.c 
b/lib/librte_table/rte_table_hash_key16.c
index f5ec87d..f78db77 100644
--- a/lib/librte_table/rte_table_hash_key16.c
+++ b/lib/librte_table/rte_table_hash_key16.c
@@ -968,8 +968,7 @@ rte_table_hash_lookup_key16_ext(
buckets, keys, f);
}

-   *lookup_hit_mask = pkts_mask_out;
-   return 0;
+   goto grind_next_buckets;
}

/*
@@ -1060,6 +1059,7 @@ rte_table_hash_lookup_key16_ext(
bucket20, bucket21, pkts_mask_out, entries,
buckets_mask, buckets, keys, f);

+grind_next_buckets:
/* Grind next buckets */
for ( ; buckets_mask; ) {
uint64_t buckets_mask_next = 0;
diff --git a/lib/librte_table/rte_table_hash_key32.c 
b/lib/librte_table/rte_table_hash_key32.c
index e8f4812..10e281d 100644
--- a/lib/librte_table/rte_table_hash_key32.c
+++ b/lib/librte_table/rte_table_hash_key32.c
@@ -988,8 +988,7 @@ rte_table_hash_lookup_key32_ext(
keys, f);
}

-   *lookup_hit_mask = pkts_mask_out;
-   return 0;
+   goto grind_next_buckets;
}

/*
@@ -1080,6 +1079,7 @@ rte_table_hash_lookup_key32_ext(
bucket20, bucket21, pkts_mask_out, entries,
buckets_mask, buckets, keys, f);

+grind_next_buckets:
/* Grind next buckets */
for ( ; buckets_mask; ) {
uint64_t buckets_mask_next = 0;
diff --git a/lib/librte_table/rte_table_hash_key8.c 
b/lib/librte_table/rte_table_hash_key8.c
index d60c96e..606805d 100644
--- a/lib/librte_table/rte_table_hash_key8.c
+++ b/lib/librte_table/rte_table_hash_key8.c
@@ -1104,8 +1104,7 @@ rte_table_hash_lookup_key8_ext(
keys, f);
}

-   *lookup_hit_mask = pkts_mask_out;
-   return 0;
+   goto grind_next_buckets;
}

/*
@@ -1196,6 +1195,7 @@ rte_table_hash_lookup_key8_ext(
bucket20, bucket21, pkts_mask_out, entries,
buckets_mask, buckets, keys, f);

+grind_next_buckets:
/* Grind next buckets */
for ( ; buckets_mask; ) {
uint64_t buckets_mask_next = 0;
@@ -1250,8 +1250,7 @@ rte_table_hash_lookup_key8_ext_dosig(
buckets, keys, f);
}

-   *lookup_hit_mask = pkts_mask_out;
-   return 0;
+   goto grind_next_buckets;
}

/*
@@ -1342,6 +1341,7 @@ rte_table_hash_lookup_key8_ext_dosig(
bucket20, bucket21, pkts_mask_out, entries,
buckets_mask, buckets, keys, f);

+grind_next_buckets:
/* Grind next buckets */
for ( ; buckets_mask; ) {
uint64_t buckets_mask_next = 0;
--
2.1.0
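
The behaviour this patch restores for small bursts can be modelled with a plain chained lookup. The layout below is a hypothetical simplification (four keys per bucket, a `next` pointer for extended buckets) and does not reproduce the bucket format or the pipelined lookup in librte_table; it only shows that a miss in the first bucket must continue into the chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KEYS_PER_BUCKET 4

struct bucket {
    uint64_t key[KEYS_PER_BUCKET];
    int val[KEYS_PER_BUCKET];
    uint8_t used[KEYS_PER_BUCKET];
    struct bucket *next;     /* extended bucket, or NULL */
};

/* Search a single bucket only. Returns 1 on hit. */
static int bucket_find(const struct bucket *b, uint64_t key, int *val)
{
    int i;

    for (i = 0; i < KEYS_PER_BUCKET; i++) {
        if (b->used[i] && b->key[i] == key) {
            *val = b->val[i];
            return 1;
        }
    }
    return 0;
}

/* Search the first bucket, then every extended bucket chained to it. */
static int lookup(const struct bucket *head, uint64_t key, int *val)
{
    const struct bucket *b;

    for (b = head; b != NULL; b = b->next)
        if (bucket_find(b, key, val))
            return 1;
    return 0;
}
```

Returning right after the first bucket, as the unoptimized path did before the patch, corresponds to calling `bucket_find` alone and reporting a miss for keys stored in an extension.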

[dpdk-dev] [PATCH 1/4] table: fix empty bucket removal during entry deletion in rte_table_hash_ext

2014-09-26 Thread Balazs Nemeth

When an entry is deleted from an extensible rte_table_hash, the bucket
that stored the entry can become empty. If this is the case, the
bucket needs to be removed from the chain of buckets.

During removal of the bucket, the chain should be updated first. If
the bucket that will be removed is cleared first, the chain is broken
and the information to update the chain is lost.

Acked-by: Cristian Dumitrescu 
Signed-off-by: Balazs Nemeth 
---
 lib/librte_table/rte_table_hash_ext.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/librte_table/rte_table_hash_ext.c 
b/lib/librte_table/rte_table_hash_ext.c
index 6e26d98..17c16cd 100644
--- a/lib/librte_table/rte_table_hash_ext.c
+++ b/lib/librte_table/rte_table_hash_ext.c
@@ -408,12 +408,12 @@ void *entry)
if ((bkt_prev != NULL) &&
(bkt->sig[0] == 0) && (bkt->sig[1] == 0) &&
(bkt->sig[2] == 0) && (bkt->sig[3] == 0)) {
-   /* Clear bucket */
-   memset(bkt, 0, sizeof(struct bucket));
-
/* Unchain bucket */
BUCKET_NEXT_COPY(bkt_prev, bkt);

+   /* Clear bucket */
+   memset(bkt, 0, sizeof(struct bucket));
+
/* Free bucket back to buckets ext */
bkt_index = bkt - t->buckets_ext;
t->bkt_ext_stack[t->bkt_ext_stack_tos++]
--
2.1.0
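
The ordering issue fixed here can be reduced to a few lines. The sketch assumes a bucket that stores its chain link inside the same struct that gets cleared, which is what makes the order matter; `struct bucket` and `bucket_unchain` are hypothetical simplifications, with a plain pointer assignment standing in for BUCKET_NEXT_COPY.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct bucket {
    uint64_t sig[4];
    struct bucket *next;     /* link to the next bucket in the chain */
};

/* Remove an empty bucket `bkt` that follows `prev` in the chain. */
static void bucket_unchain(struct bucket *prev, struct bucket *bkt)
{
    /* Unchain first: this step still needs bkt->next to be intact. */
    prev->next = bkt->next;

    /* Only now is it safe to wipe the bucket, link included. */
    memset(bkt, 0, sizeof(*bkt));
}
```

Swapping the two statements would copy an already zeroed link into `prev`, cutting off every bucket that followed `bkt`.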

[dpdk-dev] [PATCH 0/4] table: fix bugs occuring in corner cases

2014-09-26 Thread Balazs Nemeth

This set of patches fixes bugs in the packet framework. Some of the
bugs occur in corner cases (i.e. when a lookup is performed on a few
packets or when buckets are in extended states) while others can cause
memory to be accessed beyond what is reserved during initialization
time.

Balazs Nemeth (4):
  table: fix empty bucket removal during entry deletion in
rte_table_hash_ext
  table: fix checking extended buckets in unoptimized case
  table: fix incorrect t->data_size_shl initialization
  table: fix pointer calculations at initialization

 lib/librte_table/rte_table_hash_ext.c   | 13 ++---
 lib/librte_table/rte_table_hash_key16.c |  4 ++--
 lib/librte_table/rte_table_hash_key32.c |  4 ++--
 lib/librte_table/rte_table_hash_key8.c  |  8 
 lib/librte_table/rte_table_hash_lru.c   |  7 +++
 5 files changed, 17 insertions(+), 19 deletions(-)

-- 
2.1.0

[dpdk-dev] DPDK doesn't work with iommu=pt

2014-09-26 Thread Hiroshi Shimamoto

I encountered an issue where DPDK doesn't work with "iommu=pt intel_iommu=on"
on an HP ProLiant DL380p Gen8 server. I'm using the following environment:

  HW: ProLiant DL380p Gen8
  CPU: E5-2697 v2
  OS: RHEL7 
  kernel: kernel-3.10.0-123 and the latest kernel 3.17-rc6+
  DPDK: v1.7.1-53-gce5abac
  NIC: 82599ES

When booting with "iommu=pt intel_iommu=on", I get the message below and
no packets are handled.

  [  120.809611] dmar: DRHD: handling fault status reg 2
  [  120.809635] dmar: DMAR:[DMA Read] Request device [21:00.0] fault addr 
aa01
  DMAR:[fault reason 02] Present bit in context entry is clear

How to reproduce:
just run testpmd
# ./testpmd -c 0xf -n 4 -- -i

Configuring Port 0 (socket 0)
PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x754eafc0 hw_ring=0x7420 
dma_addr=0xaa00
PMD: ixgbe_dev_tx_queue_setup(): Using full-featured tx code path
PMD: ixgbe_dev_tx_queue_setup():  - txq_flags = 0 [IXGBE_SIMPLE_FLAGS=f01]
PMD: ixgbe_dev_tx_queue_setup():  - tx_rs_thresh = 32 
[RTE_PMD_IXGBE_TX_MAX_BURST=32]
PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x754ea740 hw_ring=0x7421 
dma_addr=0xaa01
PMD: check_rx_burst_bulk_alloc_preconditions(): Rx Burst Bulk Alloc 
Preconditions: rxq->rx_free_thresh=0, RTE_PMD_IXGBE_RX_MAX_BURST=32
PMD: ixgbe_dev_rx_queue_setup(): Rx Burst Bulk Alloc Preconditions are not 
satisfied, Scattered Rx is requested, or RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC 
is not enabled (port=0, queue=0).
PMD: check_rx_burst_bulk_alloc_preconditions(): Rx Burst Bulk Alloc 
Preconditions: rxq->rx_free_thresh=0, RTE_PMD_IXGBE_RX_MAX_BURST=32

testpmd> start
  io packet forwarding - CRC stripping disabled - packets/burst=32
  nb forwarding cores=1 - nb forwarding ports=2
  RX queues=1 - RX desc=128 - RX free threshold=0
  RX threshold registers: pthresh=8 hthresh=8 wthresh=0
  TX queues=1 - TX desc=512 - TX free threshold=0
  TX threshold registers: pthresh=32 hthresh=0 wthresh=0
  TX RS bit threshold=0 - TXQ flags=0x0


and ping from another box to this server.
# ping6 -I eth2 ff02::1

I got the error message below and no packets were received.
I couldn't see any increase in the RX/TX counts in the testpmd statistics.

testpmd> show port stats 0

   NIC statistics for port 0  
  RX-packets: 6  RX-missed: 0  RX-bytes:  732
  RX-badcrc:  0  RX-badlen: 0  RX-errors: 0
  RX-nombuf:  0
  TX-packets: 0  TX-errors: 0  TX-bytes:  0
  
testpmd> show port stats 0

   NIC statistics for port 0  
  RX-packets: 6  RX-missed: 0  RX-bytes:  732
  RX-badcrc:  0  RX-badlen: 0  RX-errors: 0
  RX-nombuf:  0
  TX-packets: 0  TX-errors: 0  TX-bytes:  0
  


The fault address in the error message must be that of the RX DMA descriptor ring.

error message
  [  120.809635] dmar: DMAR:[DMA Read] Request device [21:00.0] fault addr 
aa01

log in testpmd
  PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x754ea740 
hw_ring=0x7421 dma_addr=0xaa01

I think the NIC received a packet into its FIFO and tried to write it to memory with DMA.
Before starting the DMA, the NIC gets the target address from the RX descriptors
pointed to by the RDBA register.
But the access to the RX descriptors failed in the IOMMU, which reported it to the kernel.

  DMAR:[fault reason 02] Present bit in context entry is clear

The error message suggests there is no valid entry in the IOMMU.

I think the following issue is very similar, but switching to Ubuntu 14.04 didn't
fix it in my case.
http://thread.gmane.org/gmane.comp.networking.dpdk.devel/2281

I tried Ubuntu 14.04.1 and got the error below.

  [  199.710191] dmar: DRHD: handling fault status reg 2
  [  199.710896] dmar: DMAR:[DMA Read] Request device [21:00.0] fault addr 
7c24df000
  [  199.710896] DMAR:[fault reason 06] PTE Read access is not set

So far I have seen this issue only on the HP ProLiant DL380p Gen8.
Does anyone have an idea?
Has anyone else noticed this issue?

Note: we're planning to use SR-IOV and a DPDK app on the same box.
The box has 2 NICs: one for SR-IOV, passed through to a VM, and one (no SR-IOV)
for the DPDK app on the host.

thanks,
Hiroshi

[dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:

2014-09-26 Thread Neil Horman

On Thu, Sep 25, 2014 at 11:24:30PM +, Ananyev, Konstantin wrote:
> > From: Neil Horman [mailto:nhorman at tuxdriver.com]
> > Sent: Thursday, September 25, 2014 6:24 PM
> > To: Ananyev, Konstantin
> > Cc: Jastrzebski, MichalX K; dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v2] Change alarm cancel function to 
> > thread-safe:
> > 
> > On Thu, Sep 25, 2014 at 04:03:48PM +, Ananyev, Konstantin wrote:
> > >
> > >
> > > > -Original Message-
> > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Neil Horman
> > > > Sent: Thursday, September 25, 2014 4:08 PM
> > > > To: Jastrzebski, MichalX K
> > > > Cc: dev at dpdk.org
> > > > Subject: Re: [dpdk-dev] [PATCH v2] Change alarm cancel function to 
> > > > thread-safe:
> > > >
> > > > On Thu, Sep 25, 2014 at 01:56:08PM +0100, Michal Jastrzebski wrote:
> > > > > Change alarm cancel function to thread-safe.
> > > > > It eliminates a race between threads using rte_alarm_cancel and
> > > > > rte_alarm_set.
> > > > >
> > > > > Signed-off-by: Pawel Wodkowski 
> > > > > Reviewed-by: Michal Jastrzebski 
> > > > >
> > > > > ---
> > > > >  lib/librte_eal/common/include/rte_alarm.h |3 +-
> > > > >  lib/librte_eal/linuxapp/eal/eal_alarm.c   |   68 
> > > > > ++---
> > > > >  2 files changed, 45 insertions(+), 26 deletions(-)
> > > > >
> > > >
> > > > > diff --git a/lib/librte_eal/common/include/rte_alarm.h 
> > > > > b/lib/librte_eal/common/include/rte_alarm.h
> > > > > index d451522..e7cbaef 100644
> > > > > --- a/lib/librte_eal/common/include/rte_alarm.h
> > > > > +++ b/lib/librte_eal/common/include/rte_alarm.h
> > > > > @@ -76,7 +76,8 @@ typedef void (*rte_eal_alarm_callback)(void *arg);
> > > > >  int rte_eal_alarm_set(uint64_t us, rte_eal_alarm_callback cb, void 
> > > > > *cb_arg);
> > > > >
> > > > >  /**
> > > > > - * Function to cancel an alarm callback which has been registered 
> > > > > before.
> > > > > + * Function to cancel an alarm callback which has been registered 
> > > > > before. If
> > > > > + * used outside alarm callback it wait for all callbacks to finish 
> > > > > its execution.
> > > > >   *
> > > > >   * @param cb_fn
> > > > >   *  alarm callback
> > > > > diff --git a/lib/librte_eal/linuxapp/eal/eal_alarm.c 
> > > > > b/lib/librte_eal/linuxapp/eal/eal_alarm.c
> > > > > index 480f0cb..ea8dfb4 100644
> > > > > --- a/lib/librte_eal/linuxapp/eal/eal_alarm.c
> > > > > +++ b/lib/librte_eal/linuxapp/eal/eal_alarm.c
> > > > > @@ -69,7 +69,8 @@ struct alarm_entry {
> > > > >   struct timeval time;
> > > > >   rte_eal_alarm_callback cb_fn;
> > > > >   void *cb_arg;
> > > > > - volatile int executing;
> > > > > + volatile uint8_t executing;
> > > > > + volatile pthread_t executing_id;
> > > > >  };
> > > > >
> > > > >  static LIST_HEAD(alarm_list, alarm_entry) alarm_list = 
> > > > > LIST_HEAD_INITIALIZER();
> > > > > @@ -108,11 +109,13 @@ eal_alarm_callback(struct rte_intr_handle *hdl 
> > > > > __rte_unused,
> > > > >   (ap->time.tv_sec < now.tv_sec || 
> > > > > (ap->time.tv_sec == now.tv_sec &&
> > > > >   ap->time.tv_usec <= 
> > > > > now.tv_usec))){
> > > > >   ap->executing = 1;
> > > > > + ap->executing_id = pthread_self();
> > > > How exactly does this work?  From my read all alarm callbacks are 
> > > > handled by the
> > > > thread created in rte_eal_intr_init (which runs forever in
> > > > eal_intr_thread_main()).
> > >
> > > In the current implementation, yes.
> > >
> > >  So every assignment to the above executing_id value
> > > > will be from that thread.  As such, anytime rte_eal_alarm_cancel is 
> > > > called from
> > > > within a callback we are guaranteed that:
> > > > a) the ap->executing flag is set to 1
> > > > b) the ap->executing_id value should equal whatever is returned from
> > > > pthread_self()
> > >
> > > Yes
> > >
> > > >
> > > > That will cause the executing counter local to the cancel function to
> > > > get incremented, meaning we will deadlock within that
> > > > do { ... } while (executing != 0) loop, no?
> > >
> > > No. When cancel is called from the callback,
> > > pthread_equal(ap->executing_id, pthread_self())
> > > returns a non-zero value (meaning the thread IDs are equal), so
> > > executing will not be incremented.
> > >
> > Ah, pthread_equal is one of those backwards functions that returns zero for
> > inequality.  Maybe then rewrite that as:
> > if (!pthread_equal(...))
> > 
> > So it's clear that we're looking for inequality there to increment?
> > 
> > > >
> > > > >   rte_spinlock_unlock(&alarm_list_lk);
> > > > >
> > > > >   ap->cb_fn(ap->cb_arg);
> > > > >
> > > > >   rte_spinlock_lock(&alarm_list_lk);
> > > > > +
> > > > >   LIST_REMOVE(ap, next);
> > > > >   rte_free(ap);
> > > > >   }
> > > > > @@ -145,7 +148,7 @@ rte_eal_alarm_set(uint64_t us,
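
The pthread_equal() point in the exchange above can be sketched in isolation. This is only an illustration, not the patch's code: struct entry_state and must_wait_for() are hypothetical names that keep just the two fields under discussion. pthread_equal() returns non-zero when the two IDs refer to the same thread, so "running on some other thread" is the negated form.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the patch's alarm_entry; only the two
 * fields discussed in the thread are kept. */
struct entry_state {
	int executing;          /* non-zero while the callback runs */
	pthread_t executing_id; /* thread running the callback */
};

/*
 * Returns 1 if a canceller has to wait for this entry, i.e. the callback
 * is running on a thread other than the caller. pthread_equal() returns
 * non-zero when the IDs match, so the test for "another thread" is the
 * negated form suggested in the review.
 */
static int
must_wait_for(const struct entry_state *e)
{
	assert(e != NULL);
	return e->executing && !pthread_equal(e->executing_id, pthread_self());
}
```

When the cancel call comes from inside the callback itself, executing_id equals pthread_self(), must_wait_for() returns 0, and the caller does not spin waiting for its own callback to finish.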

[dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:

2014-09-26 Thread Wodkowski, PawelX

> Given what you said above, I agree, at least in the current implementation.
> It still seems like there's a simpler solution that doesn't require all the
> comparative gymnastics.

Yes, there is a simpler solution, but it involves recursive locking.
DPDK recursive spinlocks are not an option here, so the only option is a POSIX
recursive mutex, which I think is an even worse option than these gymnastics.

> 
> What if, instead of testing whether you're the callback thread, we turn the
> executing field of alarm_entry into a bitfield, where bit 0 represents the
> former "executing" state, and bit 1 is defined as a "cancelled" bit.  Then
> rte_eal_alarm_cancel becomes a search that, when an alarm is found, simply ORs
> the cancelled bit into the executing bitfield.  When the callback thread runs,
> it skips executing any alarm that is marked as cancelled, but frees all alarm
> entries that are executed or cancelled.  That gives us a single point at which
> frees of alarm entries happen.  Something like the patch below (completely
> untested)?
> 
> It also seems like the alarm API as a whole could use some improvement.  The
> way it's written right now, there's no way to refer to a specific alarm (i.e.
> cancellation relies on the specification of a function and data pointer, which
> may refer to multiple timers).  Shouldn't rte_eal_alarm_set return an opaque
> handle to a unique timer instance that can be stored by a caller and used to
> specifically cancel that timer?  That's how both the BSD and Linux timer
> subsystems model timers.
> 

The goal was not to break user applications that use this library.

> 
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal_alarm.c
> b/lib/librte_eal/linuxapp/eal/eal_alarm.c
> index 480f0cb..73b6dc5 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_alarm.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_alarm.c
> @@ -64,6 +64,9 @@
>  #define MS_PER_S 1000
>  #define US_PER_S (US_PER_MS * MS_PER_S)
> 
> +#define ALARM_EXECUTING (1 << 0)
> +#define ALARM_CANCELLED (1 << 1)
> +
>  struct alarm_entry {
>   LIST_ENTRY(alarm_entry) next;
>   struct timeval time;
> @@ -107,12 +110,14 @@ eal_alarm_callback(struct rte_intr_handle *hdl
> __rte_unused,
>   gettimeofday(&now, NULL) == 0 &&
>   (ap->time.tv_sec < now.tv_sec || (ap->time.tv_sec ==
> now.tv_sec &&
>   ap->time.tv_usec <=
> now.tv_usec))){
> - ap->executing = 1;
> - rte_spinlock_unlock(&alarm_list_lk);

Removing the unlock here introduces a deadlock.

> + ap->executing |= ALARM_EXECUTING;
> + if (likely(!(ap->executing & ALARM_CANCELLED))) {
> + rte_spinlock_unlock(&alarm_list_lk);
> 
> - ap->cb_fn(ap->cb_arg);
> + ap->cb_fn(ap->cb_arg);
> 
> - rte_spinlock_lock(&alarm_list_lk);
> + rte_spinlock_lock(&alarm_list_lk);
> + }
>   LIST_REMOVE(ap, next);
>   rte_free(ap);
>   }
> @@ -209,10 +214,9 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn,
> void *cb_arg)
>   rte_spinlock_lock(&alarm_list_lk);
>   /* remove any matches at the start of the list */
>   while ((ap = LIST_FIRST(&alarm_list)) != NULL &&
> - cb_fn == ap->cb_fn && ap->executing == 0 &&
> + cb_fn == ap->cb_fn &&
>   (cb_arg == (void *)-1 || cb_arg == ap->cb_arg)) {
> - LIST_REMOVE(ap, next);
> - rte_free(ap);
> + ap->executing |= ALARM_CANCELLED;
>   count++;
>   }
>   ap_prev = ap;
> @@ -220,10 +224,9 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn,
> void *cb_arg)
>   /* now go through list, removing entries not at start */
>   LIST_FOREACH(ap, &alarm_list, next) {
>   /* this won't be true first time through */
> - if (cb_fn == ap->cb_fn &&  ap->executing == 0 &&
> + if (cb_fn == ap->cb_fn &&
>   (cb_arg == (void *)-1 || cb_arg == ap->cb_arg))
> {
> - LIST_REMOVE(ap,next);
> - rte_free(ap);
> + ap->executing |= ALARM_CANCELLED;
>   count++;
>   ap = ap_prev;
>   }

Pawel
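
For readers following the two proposals, here is a minimal, self-contained sketch of the "cancelled bit" idea from the quoted message, combined with the point that the list lock has to be released around the callback. It is not the DPDK code: it assumes a single callback thread, uses a pthread mutex and a hand-rolled singly linked list instead of rte_spinlock and LIST_*, and leaves out expiry times. All names (struct alarm, alarm_set, alarm_cancel, alarm_run_all, count_cb) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define ALARM_EXECUTING (1 << 0)
#define ALARM_CANCELLED (1 << 1)

typedef void (*alarm_cb)(void *arg);

/* Hypothetical, simplified alarm entry: no expiry time, singly linked. */
struct alarm {
	struct alarm *next;
	alarm_cb cb_fn;
	void *cb_arg;
	unsigned int state;	/* ALARM_* bits, protected by list_lk */
};

static struct alarm *alarm_head;
static pthread_mutex_t list_lk = PTHREAD_MUTEX_INITIALIZER;

/* Example callback: increments the int that arg points to. */
static void
count_cb(void *arg)
{
	(*(int *)arg)++;
}

/* Queue an alarm at the head of the list. Returns 0 on success. */
static int
alarm_set(alarm_cb cb_fn, void *cb_arg)
{
	struct alarm *ap = calloc(1, sizeof(*ap));

	if (ap == NULL)
		return -1;
	ap->cb_fn = cb_fn;
	ap->cb_arg = cb_arg;
	pthread_mutex_lock(&list_lk);
	ap->next = alarm_head;
	alarm_head = ap;
	pthread_mutex_unlock(&list_lk);
	return 0;
}

/* Mark pending matches as cancelled without freeing them.
 * Entries already running or already cancelled are not counted. */
static int
alarm_cancel(alarm_cb cb_fn, void *cb_arg)
{
	struct alarm *ap;
	int count = 0;

	pthread_mutex_lock(&list_lk);
	for (ap = alarm_head; ap != NULL; ap = ap->next) {
		if (ap->cb_fn == cb_fn && ap->cb_arg == cb_arg &&
		    ap->state == 0) {
			ap->state |= ALARM_CANCELLED;
			count++;
		}
	}
	pthread_mutex_unlock(&list_lk);
	return count;
}

/* Callback-thread side: run every entry that is not cancelled, then free
 * it. This is the only place entries are freed. The lock is dropped
 * around the callback so the callback may call alarm_set() or
 * alarm_cancel() itself. Returns the number of callbacks run. */
static int
alarm_run_all(void)
{
	struct alarm *ap, **pp;
	int ran = 0;

	pthread_mutex_lock(&list_lk);
	while ((ap = alarm_head) != NULL) {
		if (!(ap->state & ALARM_CANCELLED)) {
			ap->state |= ALARM_EXECUTING;
			pthread_mutex_unlock(&list_lk);
			ap->cb_fn(ap->cb_arg);
			pthread_mutex_lock(&list_lk);
			ran++;
		}
		/* ap may no longer be the head if the callback queued more */
		for (pp = &alarm_head; *pp != ap; pp = &(*pp)->next)
			assert(*pp != NULL);
		*pp = ap->next;
		free(ap);
	}
	pthread_mutex_unlock(&list_lk);
	return ran;
}
```

Because alarm_cancel() only sets a bit and alarm_run_all() is the single place that frees entries, a cancel issued from inside a callback never touches memory the callback thread is still using. The trade-off, as noted in the reply, is that a caller on another thread gets no guarantee that a running callback has finished when cancel returns.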

[dpdk-dev] [PATCH 12/12] Add memory support for IBM Power Architecture

2014-09-26 Thread Chao Zhu

The IBM Power architecture has a different memory layout from x86. When
physical memory addresses are in ascending order, the mmaped virtual
addresses are in descending order. This patch modifies the memory segment
detection code to make it work on Power.

Signed-off-by: Chao Zhu 
---
 config/defconfig_ppc_64-native-linuxapp-gcc   |1 +
 config/defconfig_x86_64-native-linuxapp-clang |1 +
 config/defconfig_x86_64-native-linuxapp-gcc   |1 +
 config/defconfig_x86_64-native-linuxapp-icc   |1 +
 lib/librte_eal/linuxapp/eal/eal_memory.c  |   19 +--
 5 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/config/defconfig_ppc_64-native-linuxapp-gcc 
b/config/defconfig_ppc_64-native-linuxapp-gcc
index cc11cfc..c29888c 100644
--- a/config/defconfig_ppc_64-native-linuxapp-gcc
+++ b/config/defconfig_ppc_64-native-linuxapp-gcc
@@ -34,6 +34,7 @@ CONFIG_RTE_MACHINE="powerpc"

 CONFIG_RTE_ARCH="powerpc"
 CONFIG_RTE_ARCH_PPC_64=y
+CONFIG_RTE_ARCH_64=y
 CONFIG_RTE_ARCH_BIG_ENDIAN=y

 CONFIG_RTE_TOOLCHAIN="gcc"
diff --git a/config/defconfig_x86_64-native-linuxapp-clang 
b/config/defconfig_x86_64-native-linuxapp-clang
index bbda080..5f3074e 100644
--- a/config/defconfig_x86_64-native-linuxapp-clang
+++ b/config/defconfig_x86_64-native-linuxapp-clang
@@ -36,6 +36,7 @@ CONFIG_RTE_MACHINE="native"

 CONFIG_RTE_ARCH="x86_64"
 CONFIG_RTE_ARCH_X86_64=y
+CONFIG_RTE_ARCH_64=y

 CONFIG_RTE_TOOLCHAIN="clang"
 CONFIG_RTE_TOOLCHAIN_CLANG=y
diff --git a/config/defconfig_x86_64-native-linuxapp-gcc 
b/config/defconfig_x86_64-native-linuxapp-gcc
index 3de818a..60baf5b 100644
--- a/config/defconfig_x86_64-native-linuxapp-gcc
+++ b/config/defconfig_x86_64-native-linuxapp-gcc
@@ -36,6 +36,7 @@ CONFIG_RTE_MACHINE="native"

 CONFIG_RTE_ARCH="x86_64"
 CONFIG_RTE_ARCH_X86_64=y
+CONFIG_RTE_ARCH_64=y

 CONFIG_RTE_TOOLCHAIN="gcc"
 CONFIG_RTE_TOOLCHAIN_GCC=y
diff --git a/config/defconfig_x86_64-native-linuxapp-icc 
b/config/defconfig_x86_64-native-linuxapp-icc
index 795333b..71d1e28 100644
--- a/config/defconfig_x86_64-native-linuxapp-icc
+++ b/config/defconfig_x86_64-native-linuxapp-icc
@@ -36,6 +36,7 @@ CONFIG_RTE_MACHINE="native"

 CONFIG_RTE_ARCH="x86_64"
 CONFIG_RTE_ARCH_X86_64=y
+CONFIG_RTE_ARCH_64=y

 CONFIG_RTE_TOOLCHAIN="icc"
 CONFIG_RTE_TOOLCHAIN_ICC=y
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c 
b/lib/librte_eal/linuxapp/eal/eal_memory.c
index f2454f4..6694e08 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -316,7 +316,7 @@ map_all_hugepages(struct hugepage_file *hugepg_tbl,
 #endif
hugepg_tbl[i].filepath[sizeof(hugepg_tbl[i].filepath) - 
1] = '\0';
}
-#ifndef RTE_ARCH_X86_64
+#ifndef RTE_ARCH_64
/* for 32-bit systems, don't remap 1G pages, just reuse original
 * map address as final map address.
 */
@@ -412,7 +412,7 @@ remap_all_hugepages(struct hugepage_file *hugepg_tbl, 
struct hugepage_info *hpi)

while (i < hpi->num_pages[0]) {

-#ifndef RTE_ARCH_X86_64
+#ifndef RTE_ARCH_64
/* for 32-bit systems, don't remap 1G pages, just reuse original
 * map address as final map address.
 */
@@ -1263,9 +1263,18 @@ rte_eal_hugepage_init(void)
else if ((hugepage[i].physaddr - hugepage[i-1].physaddr) !=
hugepage[i].size)
new_memseg = 1;
+#ifdef RTE_ARCH_PPC_64
+   /* IBM Power architecture has different memory layout. 
+* If the physical address is lower address first, the mmaped 
virtual
+* address will be higher address first */
+   else if (((unsigned long)hugepage[i-1].final_va -
+   (unsigned long)hugepage[i].final_va) != hugepage[i].size)
+   new_memseg = 1;
+#else
else if (((unsigned long)hugepage[i].final_va -
(unsigned long)hugepage[i-1].final_va) != hugepage[i].size)
new_memseg = 1;
+#endif

if (new_memseg) {
j += 1;
@@ -1284,6 +1293,12 @@ rte_eal_hugepage_init(void)
}
/* continuation of previous memseg */
else {
+#ifdef RTE_ARCH_PPC_64
+   /* Use the phy and virt address of the last page as segment 
address 
+* for IBM Power architecture */ 
+   mcfg->memseg[j].phys_addr = hugepage[i].physaddr;
+   mcfg->memseg[j].addr = hugepage[i].final_va;
+#endif
mcfg->memseg[j].len += mcfg->memseg[j].hugepage_sz;
}
hugepage[i].memseg_id = j;
-- 
1.7.1
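
The segment-detection change in this patch can be illustrated outside of EAL with a small sketch. It assumes a trimmed-down page record and a runtime va_descending switch in place of the patch's #ifdef RTE_ARCH_PPC_64; struct page, pages_contiguous() and count_segments() are hypothetical names, and the socket and memseg bookkeeping of the real function is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct hugepage_file. */
struct page {
	uint64_t physaddr;
	uintptr_t final_va;
	size_t size;
};

/*
 * Returns 1 if page cur continues the segment that ends at prev.
 * va_descending selects the Power-style layout, where ascending physical
 * addresses map to descending virtual addresses.
 */
static int
pages_contiguous(const struct page *prev, const struct page *cur,
		 int va_descending)
{
	assert(prev != NULL && cur != NULL);
	if (prev->size != cur->size)
		return 0;
	if (cur->physaddr - prev->physaddr != cur->size)
		return 0;
	if (va_descending)
		return prev->final_va - cur->final_va == cur->size;
	return cur->final_va - prev->final_va == cur->size;
}

/* Count the segments in n pages sorted by ascending physical address. */
static int
count_segments(const struct page *pg, int n, int va_descending)
{
	int i, segs = (n > 0);

	for (i = 1; i < n; i++)
		if (!pages_contiguous(&pg[i - 1], &pg[i], va_descending))
			segs++;
	return segs;
}
```

With three 16MB pages whose virtual addresses step downwards, the descending check reports one segment, while the x86-style ascending check would split every page into its own segment.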

[dpdk-dev] [PATCH 11/12] Add huge page sizes for IBM Power architecture

2014-09-26 Thread Chao Zhu

The IBM Power architecture has different huge page sizes (16MB, 16GB) than
x86. This patch adds RTE_PGSIZE_16M and RTE_PGSIZE_16G to the
rte_page_sizes enum and adds DPDK support for these huge page sizes on the
IBM Power architecture.

Signed-off-by: Chao Zhu 
---
 app/test/test_memzone.c |  119 ++-
 lib/librte_eal/common/eal_common_memzone.c  |   15 +++-
 lib/librte_eal/common/include/rte_memory.h  |9 ++-
 lib/librte_eal/common/include/rte_memzone.h |8 ++
 lib/librte_eal/linuxapp/eal/eal.c   |5 +-
 5 files changed, 147 insertions(+), 9 deletions(-)

diff --git a/app/test/test_memzone.c b/app/test/test_memzone.c
index 381f643..8668103 100644
--- a/app/test/test_memzone.c
+++ b/app/test/test_memzone.c
@@ -133,6 +133,8 @@ test_memzone_reserve_flags(void)
const struct rte_memseg *ms;
int hugepage_2MB_avail = 0;
int hugepage_1GB_avail = 0;
+   int hugepage_16MB_avail = 0;
+   int hugepage_16GB_avail = 0;
const size_t size = 100;
int i = 0;
ms = rte_eal_get_physmem_layout();
@@ -141,12 +143,20 @@ test_memzone_reserve_flags(void)
hugepage_2MB_avail = 1;
if (ms[i].hugepage_sz == RTE_PGSIZE_1G)
hugepage_1GB_avail = 1;
+   if (ms[i].hugepage_sz == RTE_PGSIZE_16M)
+   hugepage_16MB_avail = 1;
+   if (ms[i].hugepage_sz == RTE_PGSIZE_16G)
+   hugepage_16GB_avail = 1;
}
-   /* Display the availability of 2MB and 1GB pages */
+   /* Display the availability of 2MB ,1GB, 16MB, 16GB pages */
if (hugepage_2MB_avail)
printf("2MB Huge pages available\n");
if (hugepage_1GB_avail)
printf("1GB Huge pages available\n");
+   if (hugepage_16MB_avail)
+   printf("16MB Huge pages available\n");
+   if (hugepage_16GB_avail)
+   printf("16GB Huge pages available\n");
/*
 * If 2MB pages available, check that a small memzone is correctly
 * reserved from 2MB huge pages when requested by the RTE_MEMZONE_2MB 
flag.
@@ -255,6 +265,113 @@ test_memzone_reserve_flags(void)
}
}
}
+   /*
+* This option is for IBM Power. If 16MB pages available, check that a 
small memzone is correctly
+* reserved from 16MB huge pages when requested by the RTE_MEMZONE_16MB 
flag.
+* Also check that RTE_MEMZONE_SIZE_HINT_ONLY flag only defaults to an
+* available page size (i.e 16GB ) when 16MB pages are unavailable.
+*/
+   if (hugepage_16MB_avail){
+   mz = rte_memzone_reserve("flag_zone_16M", size, SOCKET_ID_ANY,
+   RTE_MEMZONE_16MB);
+   if (mz == NULL) {
+   printf("MEMZONE FLAG 16MB\n");
+   return -1;
+   }
+   if (mz->hugepage_sz != RTE_PGSIZE_16M) {
+   printf("hugepage_sz not equal 16M\n");
+   return -1;
+   }
+
+   mz = rte_memzone_reserve("flag_zone_16M_HINT", size, 
SOCKET_ID_ANY,
+   RTE_MEMZONE_16MB|RTE_MEMZONE_SIZE_HINT_ONLY);
+   if (mz == NULL) {
+   printf("MEMZONE FLAG 16MB & HINT\n");
+   return -1;
+   }
+   if (mz->hugepage_sz != RTE_PGSIZE_16M) {
+   printf("hugepage_sz not equal 16M\n");
+   return -1;
+   }
+
+   /* Check if 16GB huge pages are unavailable, that function fails unless
+* HINT flag is indicated
+*/
+   if (!hugepage_16GB_avail) {
+   mz = rte_memzone_reserve("flag_zone_16G_HINT", size, 
SOCKET_ID_ANY,
+   
RTE_MEMZONE_16GB|RTE_MEMZONE_SIZE_HINT_ONLY);
+   if (mz == NULL) {
+   printf("MEMZONE FLAG 16GB & HINT\n");
+   return -1;
+   }
+   if (mz->hugepage_sz != RTE_PGSIZE_16M) {
+   printf("hugepage_sz not equal 16M\n");
+   return -1;
+   }
+
+   mz = rte_memzone_reserve("flag_zone_16G", size, 
SOCKET_ID_ANY,
+   RTE_MEMZONE_16GB);
+   if (mz != NULL) {
+   printf("MEMZONE FLAG 16GB\n");
+   return -1;
+   }
+   }
+   }
+   /*As with 16MB tests above for 16GB huge page requests*/
+   if (hugepage_16GB_avail){
+   mz = rte_memzone_reserve("flag_zone_16G", size, SOCKET_ID_ANY,
+   RTE_MEMZONE_16GB);
+   if (mz == NULL) {
+

[dpdk-dev] [PATCH 10/12] Add cache size define for IBM Power Architecture

2014-09-26 Thread Chao Zhu

The IBM Power architecture has a different cache line size (128 bytes) than
x86 (64 bytes). This patch defines CACHE_LINE_SIZE as 128 bytes, overriding
the default value of 64 bytes, to support the IBM Power architecture.

Signed-off-by: Chao Zhu 
---
 app/test/test_malloc.c  |8 
 app/test/test_mbuf.c|2 +-
 mk/arch/powerpc/rte.vars.mk |2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/app/test/test_malloc.c b/app/test/test_malloc.c
index ee34ca3..63e6b32 100644
--- a/app/test/test_malloc.c
+++ b/app/test/test_malloc.c
@@ -300,9 +300,9 @@ test_big_alloc(void)
size_t size =rte_str_to_size(MALLOC_MEMZONE_SIZE)*2;
int align = 0;
 #ifndef RTE_LIBRTE_MALLOC_DEBUG
-   int overhead = 64 + 64;
+   int overhead = CACHE_LINE_SIZE + CACHE_LINE_SIZE;
 #else
-   int overhead = 64 + 64 + 64;
+   int overhead = CACHE_LINE_SIZE + CACHE_LINE_SIZE + CACHE_LINE_SIZE;
 #endif

rte_malloc_get_socket_stats(socket, _stats);
@@ -356,9 +356,9 @@ test_multi_alloc_statistics(void)
 #ifndef RTE_LIBRTE_MALLOC_DEBUG
int trailer_size = 0;
 #else
-   int trailer_size = 64;
+   int trailer_size = CACHE_LINE_SIZE;
 #endif
-   int overhead = 64 + trailer_size;
+   int overhead = CACHE_LINE_SIZE + trailer_size;

rte_malloc_get_socket_stats(socket, _stats);

diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
index 21024e7..03da329 100644
--- a/app/test/test_mbuf.c
+++ b/app/test/test_mbuf.c
@@ -832,7 +832,7 @@ test_failing_mbuf_sanity_check(void)
 static int
 test_mbuf(void)
 {
-   RTE_BUILD_BUG_ON(sizeof(struct rte_mbuf) != 64);
+   RTE_BUILD_BUG_ON(sizeof(struct rte_mbuf) != CACHE_LINE_SIZE);

/* create pktmbuf pool if it does not exist */
if (pktmbuf_pool == NULL) {
diff --git a/mk/arch/powerpc/rte.vars.mk b/mk/arch/powerpc/rte.vars.mk
index 363fcd1..dfdeaea 100644
--- a/mk/arch/powerpc/rte.vars.mk
+++ b/mk/arch/powerpc/rte.vars.mk
@@ -32,7 +32,7 @@
 ARCH  ?= powerpc
 CROSS ?=

-CPU_CFLAGS  ?= -m64
+CPU_CFLAGS  ?= -m64 -DCACHE_LINE_SIZE=128
 CPU_LDFLAGS ?=
 CPU_ASFLAGS ?= -felf64

-- 
1.7.1
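
The idea behind replacing the literal 64 with CACHE_LINE_SIZE can be shown with a short sketch. It assumes the macro is supplied on the compiler command line, as the rte.vars.mk change does with -DCACHE_LINE_SIZE=128, and falls back to 64 otherwise. cache_line_roundup() and alloc_overhead() are hypothetical helpers, loosely modelled on the header-plus-trailer arithmetic in test_multi_alloc_statistics(), not DPDK functions.

```c
#include <assert.h>
#include <stddef.h>

/* Fallback mirrors the 64-byte default mentioned in the patch; a Power
 * build would pass -DCACHE_LINE_SIZE=128 on the compiler command line. */
#ifndef CACHE_LINE_SIZE
#define CACHE_LINE_SIZE 64
#endif

/* Round size up to the next multiple of the cache line size. */
static size_t
cache_line_roundup(size_t size)
{
	/* guard against wrap-around in the addition below */
	assert(size <= (size_t)-1 - CACHE_LINE_SIZE);
	return (size + CACHE_LINE_SIZE - 1) / CACHE_LINE_SIZE * CACHE_LINE_SIZE;
}

/* Hypothetical per-allocation overhead: one header line, plus one
 * trailer line when a debug trailer is enabled. */
static size_t
alloc_overhead(int debug_trailer)
{
	return CACHE_LINE_SIZE + (debug_trailer ? CACHE_LINE_SIZE : 0);
}
```

Code written this way gives the expected sizes on both architectures, whereas a hard-coded 64 would undercount the overhead on a machine with 128-byte cache lines.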

[dpdk-dev] [PATCH 09/12] Remove iopl operation for IBM Power architecture

2014-09-26 Thread Chao Zhu

The iopl() call is mostly for the i386 architecture. On the Power architecture
it doesn't exist. This patch modifies rte_eal_iopl_init() to make it
return -1 on Power. This means rte_config.flags will not contain the
EAL_FLG_HIGH_IOPL flag on the IBM Power architecture.

Signed-off-by: Chao Zhu 
---
 lib/librte_eal/linuxapp/eal/eal.c |   11 +++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index 4869e7c..8cc1f21 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -50,7 +50,10 @@
 #include 
 #include 
 #include 
+/* Power architecture doesn't have this header file */
+#ifndef RTE_ARCH_PPC_64
 #include 
+#endif

 #include 
 #include 
@@ -1019,11 +1022,19 @@ rte_eal_mcfg_complete(void)

 /*
  * Request iopl privilege for all RPL, returns 0 on success
+ *
+ * Power architecture doesn't have iopl function, so this function
+ * return -1 on Power architecture, because this function is only used
+ * in rte_eal_init to add EAL_FLG_HIGH_IOPL to rte_config.flags.
  */
 static int
 rte_eal_iopl_init(void)
 {
+#ifndef RTE_ARCH_PPC_64
return iopl(HIGHEST_RPL);
+#else
+   return -1;
+#endif
 }

 /* Launch threads, called at application init(). */
-- 
1.7.1

[dpdk-dev] [PATCH 08/12] Add CPU flag checking for IBM Power architecture

2014-09-26 Thread Chao Zhu

Intel processors contain registers that identify CPU flags, and DPDK reads
these registers to get the CPU flags. IBM Power processors don't have such
registers. This patch uses the aux vector software registers to get the CPU
flags and adds CPU flag checking support for the IBM Power architecture.

Signed-off-by: Chao Zhu 
---
 app/test/test_cpuflags.c   |   35 
 .../include/powerpc/arch/rte_cpuflags_arch.h   |  199 
 mk/rte.cpuflags.mk |   17 ++
 3 files changed, 251 insertions(+), 0 deletions(-)
 create mode 100644 
lib/librte_eal/common/include/powerpc/arch/rte_cpuflags_arch.h

diff --git a/app/test/test_cpuflags.c b/app/test/test_cpuflags.c
index 82c0197..5aeba5d 100644
--- a/app/test/test_cpuflags.c
+++ b/app/test/test_cpuflags.c
@@ -80,6 +80,40 @@ test_cpuflags(void)
int result;
printf("\nChecking for flags from different registers...\n");

+#ifdef RTE_ARCH_PPC_64
+   printf("Check for PPC64:\t\t");
+   CHECK_FOR_FLAG(RTE_CPUFLAG_PPC64);
+
+   printf("Check for PPC32:\t\t");
+   CHECK_FOR_FLAG(RTE_CPUFLAG_PPC32);
+
+   printf("Check for VSX:\t\t");
+   CHECK_FOR_FLAG(RTE_CPUFLAG_VSX);
+
+   printf("Check for DFP:\t\t");
+   CHECK_FOR_FLAG(RTE_CPUFLAG_DFP);
+
+   printf("Check for FPU:\t\t");
+   CHECK_FOR_FLAG(RTE_CPUFLAG_FPU);
+
+   printf("Check for SMT:\t\t");
+   CHECK_FOR_FLAG(RTE_CPUFLAG_SMT);
+
+   printf("Check for MMU:\t\t");
+   CHECK_FOR_FLAG(RTE_CPUFLAG_MMU);
+
+   printf("Check for ALTIVEC:\t\t");
+   CHECK_FOR_FLAG(RTE_CPUFLAG_ALTIVEC);
+
+   printf("Check for ARCH_2_06:\t\t");
+   CHECK_FOR_FLAG(RTE_CPUFLAG_ARCH_2_06);
+
+   printf("Check for ARCH_2_07:\t\t");
+   CHECK_FOR_FLAG(RTE_CPUFLAG_ARCH_2_07);
+
+   printf("Check for ICACHE_SNOOP:\t\t");
+   CHECK_FOR_FLAG(RTE_CPUFLAG_ICACHE_SNOOP);
+#else
printf("Check for SSE:\t\t");
CHECK_FOR_FLAG(RTE_CPUFLAG_SSE);

@@ -117,6 +151,7 @@ test_cpuflags(void)
CHECK_FOR_FLAG(RTE_CPUFLAG_INVTSC);


+#endif

/*
 * Check if invalid data is handled properly
diff --git a/lib/librte_eal/common/include/powerpc/arch/rte_cpuflags_arch.h 
b/lib/librte_eal/common/include/powerpc/arch/rte_cpuflags_arch.h
new file mode 100644
index 000..4d0ba6b
--- /dev/null
+++ b/lib/librte_eal/common/include/powerpc/arch/rte_cpuflags_arch.h
@@ -0,0 +1,199 @@
+/*
+ *   BSD LICENSE
+ *
+ *   Copyright (C) IBM Corporation 2014.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of IBM Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+*/
+
+#ifndef _RTE_CPUFLAGS_ARCH_H_
+#define _RTE_CPUFLAGS_ARCH_H_
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* Symbolic values for the entries in the auxiliary table */
+#define AT_HWCAP  16
+#define AT_HWCAP2 26
+
+/* software based registers */
+enum cpu_register_t {
+   REG_HWCAP = 0,
+   REG_HWCAP2,
+};
+
+/**
+ * Enumeration of all CPU features supported
+ */
+enum rte_cpu_flag_t {
+   RTE_CPUFLAG_PPC_LE = 0,
+   RTE_CPUFLAG_TRUE_LE,
+   RTE_CPUFLAG_PSERIES_PERFMON_COMPAT,
+   RTE_CPUFLAG_VSX,
+   RTE_CPUFLAG_ARCH_2_06,
+   RTE_CPUFLAG_POWER6_EXT,
+   RTE_CPUFLAG_DFP,
+   RTE_CPUFLAG_PA6T,
+   RTE_CPUFLAG_ARCH_2_05,
+   RTE_CPUFLAG_ICACHE_SNOOP,
+   RTE_CPUFLAG_SMT,
+   RTE_CPUFLAG_BOOKE,
+   RTE_CPUFLAG_CELLBE,
+   RTE_CPUFLAG_POWER5_PLUS,
+
