date:20160224

[dpdk-dev] [PATCH 01/10] ethdev: add a generic flow and new behavior switch to fdir

2016-02-24 Thread Thomas Monjalon

Caution: I truly respect the work done by Chelsio on DPDK.
And I'm sure you can help to build a good filtering API, which
was mainly designed with Intel needs in mind because it was
difficult to get the opinions of other vendors some time ago.
That's why having new needs is an opportunity, and it would be a shame
to let it go through a vendor-specific backdoor.

2016-02-25 00:10, Rahul Lakkireddy:
> Hi Thomas,
> 
> On Wednesday, February 02/24/16, 2016 at 07:02:42 -0800, Thomas Monjalon 
> wrote:
> > 2016-02-24 14:43, Bruce Richardson:
> > > On Wed, Feb 03, 2016 at 02:02:22PM +0530, Rahul Lakkireddy wrote:
> > > > Add a new raw packet flow that allows specifying generic flow input.
> > > > 
> > > > Add the ability to provide masks for fields in flow to allow range of
> > > > values.
> > > > 
> > > > Add a new behavior switch.
> > > > 
> > > > Add the ability to provide behavior arguments to allow rewriting matched
> > > > fields with new values. Ex: allows to provide new ip and port addresses
> > > > to rewrite the fields of packets matching a filter rule before NAT'ing.
> > > > 
> > > Thomas, any comments as ethdev maintainer?
> > 
> > Yes, some comments.
> > First, there are several different changes in the same patch. It must be 
> > split.
> 
> Should each structure change be split into a separate patch?

A patch = a feature.
The switch action and the flow rule are different things.

> > Then I don't understand at all the raw flow filter. What is a raw flow?
> > How behavior_arg must be used?
> 
> This was discussed with Jingjing at
> 
> http://permalink.gmane.org/gmane.comp.networking.dpdk.devel/31471

Thanks, I missed it.

> A raw flow provides a generic way for vendors to add their vendor
> specific input flow.

Please, "generic" and "vendor specific" in the same sentence.
It's obviously wrong.

> In our case, it is possible to match several flows
> in a single rule.  For example, it's possible to set an ethernet, vlan,
> ip and tcp/udp flows all in a single rule.  We can specify all of these
> flows in a single raw input flow, which can then be passed to cxgbe flow
> director to set the corresponding filter.

I feel we need to define what an API is.
If the application wants to call something specific to the NIC, why use
the ethdev API? You just have to include cxgbe.h.

> On similar lines, behavior_arg provides a generic way to pass extra
> action arguments for matched flows.  For example, in our case, to
> perform NAT, the new src/dst ip and src/dst port addresses to be
> re-written for a matched rule can be passed in behavior_arg.

Yes, a kind of void* that lets you pass whatever you want to the driver,
without the convenience of a documented function.

I know the support of filters among NICs is really heterogeneous.
And the DPDK APIs are not yet generic enough. But please do not give up!
If the filtering API can be improved to support your cases, please do it.

[dpdk-dev] [PATCH v3 0/3] fix RTE_PROC_PRIMARY_OR_ERR_RET RTE_PROC_PRIMARY_OR_RET

2016-02-24 Thread Thomas Monjalon

> > From: reshmapa 
> > 
> > Patches 1 and 2 removes RTE_PROC_PRIMARY_OR_ERR_RET and
> > RTE_PROC_PRIMARY_OR_RET macro usage from rte_ether and rte_cryptodev 
> > libraries to allow API
> > access to secondary process.
> > 
> > Patch 3 allows users to configure ethdev with zero rx/tx queues, but both 
> > should not be zero.
> > Fix rte_eth_dev_tx_queue_config, rte_eth_dev_rx_queue_config to allocate 
> > memory for rx/tx queues
> > only when number of rx/tx queues are nonzero.
> > 
> > v3:
> > * Removed checkpatch fixes of lib/librte_ether/rte_ethdev.h from patch 
> > number 1.
> > 
> > Reshma Pattan (3):
> >   librte_ether: remove RTE_PROC_PRIMARY_OR_ERR_RET and
> > RTE_PROC_PRIMARY_OR_RET
> >   librte_cryptodev: remove RTE_PROC_PRIMARY_OR_RET
> >   librte_ether: fix rte_eth_dev_configure
> 
> Acked-by: Konstantin Ananyev 

Applied with these titles:
- ethdev: allow full control from secondary process
- cryptodev: allow full control from secondary process
- ethdev: support unidirectional configuration
Please see how it is more informative without using the macros or function
names. The title must reflect the intent, not the details.
Thanks
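
For readers following patch 3, the rule it describes (either direction may
have zero queues, but not both, and the queue array is only allocated for a
nonzero count) can be sketched as below. This is a standalone illustration,
not the ethdev code: struct queue_table, check_queue_counts() and
queue_table_config() are hypothetical names.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-direction queue table, not the real rte_eth_dev_data. */
struct queue_table {
	void **queues;
	uint16_t nb_queues;
};

/* Reject only the case where both directions are empty. */
int check_queue_counts(uint16_t nb_rx_q, uint16_t nb_tx_q)
{
	if (nb_rx_q == 0 && nb_tx_q == 0)
		return -EINVAL;
	return 0;
}

/* Allocate the pointer array only when at least one queue is requested. */
int queue_table_config(struct queue_table *t, uint16_t nb_queues)
{
	free(t->queues);
	t->queues = NULL;
	t->nb_queues = 0;

	if (nb_queues == 0)
		return 0; /* unidirectional setup: nothing to allocate */

	t->queues = calloc(nb_queues, sizeof(*t->queues));
	if (t->queues == NULL)
		return -ENOMEM;
	t->nb_queues = nb_queues;
	return 0;
}
```

A TX-only port would then pass check_queue_counts(0, n) and leave the RX
table with a NULL array.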

[dpdk-dev] [PATCH v2 2/2] examples: rework to use buffered tx api

2016-02-24 Thread Tomasz Kulasek

The internal buffering of packets for TX in sample apps is no longer
needed, so this patchset also replaces this code with calls to the new
rte_eth_tx_buffer* APIs in:

* l2fwd-jobstats
* l2fwd-keepalive
* l2fwd
* l3fwd-acl
* l3fwd-power
* link_status_interrupt
* client_server_mp
* l2fwd_fork
* packet_ordering
* qos_meter

v2 changes
 - rework synced with tx buffer API changes

Signed-off-by: Tomasz Kulasek 
---
 examples/l2fwd-jobstats/main.c           |  104 +++--
 examples/l2fwd-keepalive/main.c          |  100 ++--
 examples/l2fwd/main.c                    |  104 +++--
 examples/l3fwd-acl/main.c                |   92 ++-
 examples/l3fwd-power/main.c              |   89 ++
 examples/link_status_interrupt/main.c    |  107 +++--
 .../client_server_mp/mp_client/client.c  |  101 +---
 examples/multi_process/l2fwd_fork/main.c |   97 +++-
 examples/packet_ordering/main.c          |  122 ++--
 examples/qos_meter/main.c                |   61 +++---
 10 files changed, 436 insertions(+), 541 deletions(-)

diff --git a/examples/l2fwd-jobstats/main.c b/examples/l2fwd-jobstats/main.c
index 7b59f4e..f159168 100644
--- a/examples/l2fwd-jobstats/main.c
+++ b/examples/l2fwd-jobstats/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -41,6 +41,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -97,18 +98,12 @@ static uint32_t l2fwd_dst_ports[RTE_MAX_ETHPORTS];

 static unsigned int l2fwd_rx_queue_per_lcore = 1;

-struct mbuf_table {
-   uint64_t next_flush_time;
-   unsigned len;
-   struct rte_mbuf *mbufs[MAX_PKT_BURST];
-};
-
 #define MAX_RX_QUEUE_PER_LCORE 16
 #define MAX_TX_QUEUE_PER_PORT 16
 struct lcore_queue_conf {
unsigned n_rx_port;
unsigned rx_port_list[MAX_RX_QUEUE_PER_LCORE];
-   struct mbuf_table tx_mbufs[RTE_MAX_ETHPORTS];
+   uint64_t next_flush_time[RTE_MAX_ETHPORTS];

struct rte_timer rx_timers[MAX_RX_QUEUE_PER_LCORE];
struct rte_jobstats port_fwd_jobs[MAX_RX_QUEUE_PER_LCORE];
@@ -123,6 +118,8 @@ struct lcore_queue_conf {
 } __rte_cache_aligned;
 struct lcore_queue_conf lcore_queue_conf[RTE_MAX_LCORE];

+struct rte_eth_dev_tx_buffer *tx_buffer[RTE_MAX_ETHPORTS];
+
 static const struct rte_eth_conf port_conf = {
.rxmode = {
.split_hdr_size = 0,
@@ -373,59 +370,14 @@ show_stats_cb(__rte_unused void *param)
rte_eal_alarm_set(timer_period * US_PER_S, show_stats_cb, NULL);
 }

-/* Send the burst of packets on an output interface */
-static void
-l2fwd_send_burst(struct lcore_queue_conf *qconf, uint8_t port)
-{
-   struct mbuf_table *m_table;
-   uint16_t ret;
-   uint16_t queueid = 0;
-   uint16_t n;
-
-   m_table = &qconf->tx_mbufs[port];
-   n = m_table->len;
-
-   m_table->next_flush_time = rte_get_timer_cycles() + drain_tsc;
-   m_table->len = 0;
-
-   ret = rte_eth_tx_burst(port, queueid, m_table->mbufs, n);
-
-   port_statistics[port].tx += ret;
-   if (unlikely(ret < n)) {
-   port_statistics[port].dropped += (n - ret);
-   do {
-   rte_pktmbuf_free(m_table->mbufs[ret]);
-   } while (++ret < n);
-   }
-}
-
-/* Enqueue packets for TX and prepare them to be sent */
-static int
-l2fwd_send_packet(struct rte_mbuf *m, uint8_t port)
-{
-   const unsigned lcore_id = rte_lcore_id();
-   struct lcore_queue_conf *qconf = &lcore_queue_conf[lcore_id];
-   struct mbuf_table *m_table = &qconf->tx_mbufs[port];
-   uint16_t len = qconf->tx_mbufs[port].len;
-
-   m_table->mbufs[len] = m;
-
-   len++;
-   m_table->len = len;
-
-   /* Enough pkts to be sent. */
-   if (unlikely(len == MAX_PKT_BURST))
-   l2fwd_send_burst(qconf, port);
-
-   return 0;
-}
-
 static void
 l2fwd_simple_forward(struct rte_mbuf *m, unsigned portid)
 {
struct ether_hdr *eth;
void *tmp;
+   int sent;
unsigned dst_port;
+   struct rte_eth_dev_tx_buffer *buffer;

dst_port = l2fwd_dst_ports[portid];
eth = rte_pktmbuf_mtod(m, struct ether_hdr *);
@@ -437,7 +389,10 @@ l2fwd_simple_forward(struct rte_mbuf *m, unsigned portid)
/* src addr */
ether_addr_copy(&l2fwd_ports_eth_addr[dst_port], &eth->s_addr);

-   l2fwd_send_packet(m, (uint8_t) dst_port);
+   buffer = tx_buffer[dst_port];
+   sent = rte_eth_tx_buffer(dst_port, 0, buffer, m);
+   if (sent)
+   port_statistics[dst_port].tx += sent;
 }

 static void
@@ -511,8 +466,10 @@ l2fwd_flush_job(__rte_unused struct

[dpdk-dev] [PATCH v2 1/2] ethdev: add buffered tx api

2016-02-24 Thread Tomasz Kulasek

Many sample apps include internal buffering for single-packet-at-a-time
operation. Since this is such a common paradigm, this functionality is
better suited to being implemented in the ethdev API.

The new APIs in the ethdev library are:
* rte_eth_tx_buffer_init - initialize buffer
* rte_eth_tx_buffer - buffer up a single packet for future transmission
* rte_eth_tx_buffer_flush - flush any unsent buffered packets
* rte_eth_tx_buffer_set_err_callback - set up a callback to be called in
  case transmitting a buffered burst fails. By default, we just free the
  unsent packets.

As well as these, an additional reference callback is provided, which
frees the packets (as the default callback does), as well as updating a
user-provided counter, so that the number of dropped packets can be
tracked.

v2 changes:
 - reworked to use new buffer model
 - buffer data and callbacks are removed from rte_eth_dev/rte_eth_dev_data,
   so this patch doesn't break the ABI anymore
 - introduced RTE_ETH_TX_BUFFER macro and rte_eth_tx_buffer_init
 - buffers are not attached to the port-queue
 - buffers can be allocated dynamically during application work
 - size of buffer can be changed without port restart

Signed-off-by: Tomasz Kulasek 
---
 lib/librte_ether/rte_ethdev.c          |   36 +++
 lib/librte_ether/rte_ethdev.h          |  182 +++-
 lib/librte_ether/rte_ether_version.map |    9 ++
 3 files changed, 226 insertions(+), 1 deletion(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 756b234..b8ab747 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1307,6 +1307,42 @@ rte_eth_tx_queue_setup(uint8_t port_id, uint16_t tx_queue_id,
 }

 void
+rte_eth_count_unsent_packet_callback(struct rte_mbuf **pkts, uint16_t unsent,
+   void *userdata)
+{
+   uint64_t *count = userdata;
+   unsigned i;
+
+   for (i = 0; i < unsent; i++)
+   rte_pktmbuf_free(pkts[i]);
+
+   *count += unsent;
+}
+
+int
+rte_eth_tx_buffer_set_err_callback(struct rte_eth_dev_tx_buffer *buffer,
+   buffer_tx_error_fn cbfn, void *userdata)
+{
+   buffer->cbfn = cbfn;
+   buffer->userdata = userdata;
+   return 0;
+}
+
+int
+rte_eth_tx_buffer_init(struct rte_eth_dev_tx_buffer *buffer, uint16_t size)
+{
+   if (buffer == NULL)
+   return -EINVAL;
+
+   buffer->size = size;
+   if (buffer->cbfn == NULL)
+   rte_eth_tx_buffer_set_err_callback(buffer,
+   rte_eth_count_unsent_packet_callback, (void *)&buffer->errors);
+
+   return 0;
+}
+
+void
 rte_eth_promiscuous_enable(uint8_t port_id)
 {
struct rte_eth_dev *dev;
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 16da821..b0d4932 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -2655,6 +2655,186 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id], tx_pkts, nb_pkts);
 }

+typedef void (*buffer_tx_error_fn)(struct rte_mbuf **unsent, uint16_t count,
+   void *userdata);
+
+/**
+ * Structure used to buffer packets for future TX
+ * Used by APIs rte_eth_tx_buffer and rte_eth_tx_buffer_flush
+ */
+struct rte_eth_dev_tx_buffer {
+   unsigned nb_pkts;
+   uint64_t errors;
+   /**< Total number of packets queued for sending that were dropped. */
+   buffer_tx_error_fn cbfn;
+   void *userdata;
+   uint16_t size;   /**< Size of buffer for buffered tx */
+   struct rte_mbuf *pkts[];
+};
+
+/**
+ * Calculate the size of the tx buffer.
+ *
+ * @param sz
+ *   Number of stored packets.
+ */
+#define RTE_ETH_TX_BUFFER_SIZE(sz) \
+   (sizeof(struct rte_eth_dev_tx_buffer) + (sz) * sizeof(struct rte_mbuf *))
+
+/**
+ * Initialize default values for buffered transmitting
+ *
+ * @param buffer
+ *   Tx buffer to be initialized.
+ * @param size
+ *   Buffer size
+ * @return
+ *   0 if no error
+ */
+int
+rte_eth_tx_buffer_init(struct rte_eth_dev_tx_buffer *buffer, uint16_t size);
+
+/**
+ * Send any packets queued up for transmission on a port and HW queue
+ *
+ * This causes an explicit flush of packets previously buffered via the
+ * rte_eth_tx_buffer() function. It returns the number of packets successfully
+ * sent to the NIC, and calls the error callback for any unsent packets. Unless
+ * explicitly set up otherwise, the default callback simply frees the unsent
+ * packets back to the owning mempool.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_id
+ *   The index of the transmit queue through which

[dpdk-dev] [PATCH v2 0/2] add support for buffered tx to ethdev

2016-02-24 Thread Tomasz Kulasek

Many sample apps include internal buffering for single-packet-at-a-time
operation. Since this is such a common paradigm, this functionality is
better suited to being implemented in the ethdev API.

The new APIs in the ethdev library are:
* rte_eth_tx_buffer_init - initialize buffer
* rte_eth_tx_buffer - buffer up a single packet for future transmission
* rte_eth_tx_buffer_flush - flush any unsent buffered packets
* rte_eth_tx_buffer_set_err_callback - set up a callback to be called in
  case transmitting a buffered burst fails. By default, we just free the
  unsent packets.

As well as these, an additional reference callback is provided, which
frees the packets (as the default callback does), as well as updating a
user-provided counter, so that the number of dropped packets can be
tracked.

Following feedback from the mailing list that buffer management in the
user application is preferable to API simplicity, we decided to move the
internal buffer table, as well as the callback functions and user data,
from rte_eth_dev/rte_eth_dev_data to the application space.
This prevents ABI breakage and gives more flexibility in buffer
management: allocation, dynamic size changes, reuse of buffers across
ports or after a failure, and so on.


The following steps illustrate how tx buffers can be used in application:

1) Initialization

a) Allocate memory for a buffer

   struct rte_eth_dev_tx_buffer *buffer = rte_zmalloc_socket("tx_buffer",
   RTE_ETH_TX_BUFFER_SIZE(MAX_PKT_BURST), 0, socket_id);

   RTE_ETH_TX_BUFFER_SIZE(size) macro computes memory required to store
   "size" packets in buffer.

b) Initialize allocated memory and set up default values. Threshold level
   must be lower than or equal to the MAX_PKT_BURST from 1a)

   rte_eth_tx_buffer_init(buffer, threshold);


c) Set error callback (optional)

   rte_eth_tx_buffer_set_err_callback(buffer, callback_fn, userdata);


2) Store packet "pkt" in buffer and send them all to the queue_id on
   port_id when number of packets reaches threshold level set up in 1b)

   rte_eth_tx_buffer(port_id, queue_id, buffer, pkt);


3) Send all stored packets to the queue_id on port_id

   rte_eth_tx_buffer_flush(port_id, queue_id, buffer);


4) Flush buffer and free memory

   rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
   ...
   rte_free(buffer);
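
The steps above can also be read as a small standalone model, which may help
when reasoning about the threshold and the error callback without a DPDK
build. This is only a sketch under stated assumptions: struct pkt,
struct tx_buffer, fake_tx_burst() and the tx_buffer_*() helpers are
hypothetical stand-ins, not the functions added by this patchset, and the
"device" simply accepts at most `room` packets per burst.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for struct rte_mbuf; only an id is needed for the sketch. */
struct pkt { int id; };

typedef void (*tx_error_fn)(struct pkt **unsent, uint16_t count,
		void *userdata);

/* Same layout idea as the proposed buffer: header + flexible array. */
struct tx_buffer {
	tx_error_fn cbfn;
	void *userdata;
	uint16_t size;    /* flush threshold */
	uint16_t nb_pkts; /* packets currently stored */
	struct pkt *pkts[];
};

/* Same arithmetic as RTE_ETH_TX_BUFFER_SIZE(sz) in step 1a. */
#define TX_BUFFER_SIZE(sz) \
	(sizeof(struct tx_buffer) + (sz) * sizeof(struct pkt *))

/* Hypothetical device: accepts at most `room` packets per burst. */
static uint16_t fake_tx_burst(struct pkt **pkts, uint16_t n, uint16_t room)
{
	(void)pkts;
	return n < room ? n : room;
}

/* Error callback that counts drops in a user-provided counter. */
void count_unsent(struct pkt **unsent, uint16_t count, void *userdata)
{
	(void)unsent; /* a real callback would also free the packets */
	*(uint64_t *)userdata += count;
}

/* Steps 1a-1c: allocate, set the threshold, register the callback. */
struct tx_buffer *tx_buffer_create(uint16_t size, tx_error_fn cbfn,
		void *userdata)
{
	struct tx_buffer *b = calloc(1, TX_BUFFER_SIZE(size));

	assert(b != NULL); /* sketch only: real code should report the error */
	b->size = size;
	b->cbfn = cbfn;
	b->userdata = userdata;
	return b;
}

/* Step 3: send everything stored; hand unsent packets to the callback. */
uint16_t tx_buffer_flush(struct tx_buffer *b, uint16_t room)
{
	uint16_t to_send = b->nb_pkts;
	uint16_t sent;

	if (to_send == 0)
		return 0;
	b->nb_pkts = 0;
	sent = fake_tx_burst(b->pkts, to_send, room);
	if (sent < to_send && b->cbfn != NULL)
		b->cbfn(&b->pkts[sent], (uint16_t)(to_send - sent),
			b->userdata);
	return sent;
}

/* Step 2: store one packet; flush once the threshold is reached. */
uint16_t tx_buffer_add(struct tx_buffer *b, struct pkt *p, uint16_t room)
{
	b->pkts[b->nb_pkts++] = p;
	if (b->nb_pkts < b->size)
		return 0;
	return tx_buffer_flush(b, room);
}
```

With a threshold of 4 and a device that only takes 3 packets, the fourth
tx_buffer_add() call returns 3 and the callback counts one drop, which is
the behavior the reference counting callback is meant to expose.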


v2 changes:
 - reworked to use new buffer model
 - buffer data and callbacks are removed from rte_eth_dev/rte_eth_dev_data,
   so this patch doesn't break the ABI anymore
 - introduced RTE_ETH_TX_BUFFER macro and rte_eth_tx_buffer_init
 - buffers are not attached to the port-queue
 - buffers can be allocated dynamically during application work
 - size of buffer can be changed without port restart


Tomasz Kulasek (2):
  ethdev: add buffered tx api
  examples: rework to use buffered tx

 examples/l2fwd-jobstats/main.c           |  104 +--
 examples/l2fwd-keepalive/main.c          |  100 ---
 examples/l2fwd/main.c                    |  104 +--
 examples/l3fwd-acl/main.c                |   92 --
 examples/l3fwd-power/main.c              |   89 --
 examples/link_status_interrupt/main.c    |  107 +---
 .../client_server_mp/mp_client/client.c  |  101 ++-
 examples/multi_process/l2fwd_fork/main.c |   97 +--
 examples/packet_ordering/main.c          |  122 +
 examples/qos_meter/main.c                |   61 ++-
 lib/librte_ether/rte_ethdev.c            |   36 
 lib/librte_ether/rte_ethdev.h            |  182 +++-
 lib/librte_ether/rte_ether_version.map   |    9 +
 13 files changed, 662 insertions(+), 542 deletions(-)

-- 
1.7.9.5

[dpdk-dev] [PATCH v3] i40e: fix vlan filtering

2016-02-24 Thread Bruce Richardson

On Fri, Feb 05, 2016 at 12:20:15AM +, Zhang, Helin wrote:
> 
> 
> > -Original Message-
> > From: Julien Meunier [mailto:julien.meunier at 6wind.com]
> > Sent: Thursday, February 4, 2016 7:02 PM
> > To: Zhang, Helin 
> > Cc: dev at dpdk.org
> > Subject: [PATCH v3] i40e: fix vlan filtering
> > 
> > VLAN filtering was always performed, even if hw_vlan_filter was disabled.
> > During device initialization, default filter RTE_MACVLAN_PERFECT_MATCH
> > was applied. In this situation, all incoming VLAN frames were dropped by the
> > card (increase of the register RUPP - Rx Unsupported Protocol).
> > 
> > In order to restore default behavior, if HW VLAN filtering is activated, 
> > set a
> > filter to match MAC and VLAN. If not, set a filter to only match MAC.
> > 
> > Signed-off-by: Julien Meunier 
> Acked-by: Helin Zhang 
> 
Applied to dpdk-next-net/rel_16_04

Thanks,
/Bruce
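
For context, the decision described in the commit message reduces to picking
the default filter from the hw_vlan_filter setting. The sketch below is a
simplified model, not the i40e code: the enum, pick_default_filter() and
tagged_frame_accepted() are hypothetical names, and the acceptance rule for
tagged frames is an assumption made for illustration.

```c
#include <stdbool.h>

/* Hypothetical filter kinds; the patch text names RTE_MACVLAN_PERFECT_MATCH. */
enum mac_filter_kind {
	FILTER_MAC_ONLY,    /* match destination MAC, ignore VLAN */
	FILTER_MAC_AND_VLAN /* match destination MAC and VLAN ID */
};

/* Choose the default filter from the port's hw_vlan_filter setting. */
enum mac_filter_kind pick_default_filter(bool hw_vlan_filter)
{
	return hw_vlan_filter ? FILTER_MAC_AND_VLAN : FILTER_MAC_ONLY;
}

/*
 * Simplified model of what happens to a VLAN-tagged frame whose MAC matches:
 * with a MAC-only filter it passes, with a MAC+VLAN filter it passes only if
 * its VLAN ID was programmed into the filter table.
 */
bool tagged_frame_accepted(enum mac_filter_kind kind, bool mac_match,
		bool vlan_in_table)
{
	if (!mac_match)
		return false;
	return kind == FILTER_MAC_ONLY || vlan_in_table;
}
```

Under this model, the bug corresponds to always using FILTER_MAC_AND_VLAN
with an empty VLAN table, so every tagged frame is rejected.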

[dpdk-dev] [PATCH v3 1/3] fm10k: enable FTAG based forwarding

2016-02-24 Thread Thomas Monjalon

2016-02-24 15:42, Bruce Richardson:
> On Thu, Feb 04, 2016 at 11:38:47AM +0800, Wang Xiao W wrote:
> > This patch enables reading sglort info into mbuf for RX and inserting
> > an FTAG at the beginning of the packet for TX. The vlan_tci_outer field
> > selected from rte_mbuf structure for sglort is not used in fm10k now.
> > In FTAG based forwarding mode, the switch will forward packets according
> > to glort info in FTAG rather than mac and vlan table.
> > 
> > To activate this feature, user needs to turn 
> > ``CONFIG_RTE_LIBRTE_FM10K_FTAG_FWD``
> > to y in common_linuxapp or common_bsdapp. Currently this feature is 
> > supported
> > only on PF, because FM10K_PFVTCTL register is read-only for VF.
> > 
> > Signed-off-by: Wang Xiao W 
> 
> Any comments on this patch?
> 
> My thoughts: is there a way in which this could be done without adding in a 
> new
> build time config option?

Bruce, it's simpler to explain that build time options are forbidden for
enabling such features.
Or the scare-the-kids approach: one day, the Big Build-Option Eater will come
and eat every undecided feature! ;)

[dpdk-dev] [PATCH v3 1/3] fm10k: enable FTAG based forwarding

2016-02-24 Thread Bruce Richardson

On Wed, Feb 24, 2016 at 05:37:45PM +0100, Thomas Monjalon wrote:
> 2016-02-24 15:42, Bruce Richardson:
> > On Thu, Feb 04, 2016 at 11:38:47AM +0800, Wang Xiao W wrote:
> > > This patch enables reading sglort info into mbuf for RX and inserting
> > > an FTAG at the beginning of the packet for TX. The vlan_tci_outer field
> > > selected from rte_mbuf structure for sglort is not used in fm10k now.
> > > In FTAG based forwarding mode, the switch will forward packets according
> > > to glort info in FTAG rather than mac and vlan table.
> > > 
> > > To activate this feature, user needs to turn 
> > > ``CONFIG_RTE_LIBRTE_FM10K_FTAG_FWD``
> > > to y in common_linuxapp or common_bsdapp. Currently this feature is 
> > > supported
> > > only on PF, because FM10K_PFVTCTL register is read-only for VF.
> > > 
> > > Signed-off-by: Wang Xiao W 
> > 
> > Any comments on this patch?
> > 
> > My thoughts: is there a way in which this could be done without adding in a 
> > new
> > build time config option?
> 
> Bruce, it's simpler to explain that build time options are forbidden for
> enabling such features.
> Or the scare-the-kids approach: one day, the Big Build-Option Eater will come
> and eat every undecided feature! ;)

Good-cop, bad-cop, guess who I'm playing? :-)

/Bruce

[dpdk-dev] [PATCH] vhost: broadcast RARP pkt by injecting it to receiving mbuf array

2016-02-24 Thread Yuanhan Liu

On Wed, Feb 24, 2016 at 08:15:36AM +, Qiu, Michael wrote:
> On 2/22/2016 10:35 PM, Yuanhan Liu wrote:
> > Broadcast RARP packet by injecting it to receiving mbuf array at
> > rte_vhost_dequeue_burst().
> >
> > Commit 33226236a35e ("vhost: handle request to send RARP") iterates
> > all host interfaces and then broadcast it by all of them.  It did
> > notify the switches about the new location of the migrated VM, however,
> > the mac learning table in the target host is wrong (at least in my
> > test with OVS):
> >
> > $ ovs-appctl fdb/show ovsbr0
> >  port  VLAN  MAC                Age
> >     1     0  b6:3c:72:71:cd:4d   10
> > LOCAL     0  b6:3c:72:71:cd:4e   10
> > LOCAL     0  52:54:00:12:34:68    9
> >     1     0  56:f6:64:2c:bc:c0    1
> >
> > Where 52:54:00:12:34:68 is the mac of the VM. As you can see from the
> > above, the port learned is "LOCAL", which is the "ovsbr0" port. That
> > is reasonable, since we indeed send the pkt by the "ovsbr0" interface.
> >
> > The wrong mac table leads all the packets destined to the VM to the
> > "ovsbr0" port in the end, so all packets are lost until the guest
> > sends an ARP request (or reply) to refresh the mac learning table.
> >
> > Jianfeng then came up with a solution I have thought of firstly but NAKed
> 
> Is it suitable to mention someone in the commit log?

Why not? It's not a secret name or anything like that, after all :)

On the other hand, it's a way of acknowledging Jianfeng's contribution to
this patch.

--yliu
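
As background for the quoted commit log, the frame being injected is a plain
RARP request carrying the VM's MAC. A minimal sketch of building such a frame
into a byte buffer follows; build_rarp_frame() and put_be16() are
hypothetical helpers rather than the vhost library code, and using the VM's
MAC as both sender and target hardware address (with zero protocol
addresses) is an assumption based on common migration announcements.

```c
#include <stdint.h>
#include <string.h>

#define SK_MAC_LEN        6
#define SK_ETHERTYPE_RARP 0x8035 /* RFC 903 */
#define SK_ETHERTYPE_IPV4 0x0800
#define SK_ARP_HRD_ETHER  1
#define SK_RARP_OP_REQ    3      /* "request reverse" opcode */
#define SK_RARP_FRAME_LEN 42     /* 14-byte Ethernet + 28-byte ARP body */

/* Store a 16-bit value in network byte order. */
static void put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)(v & 0xff);
}

/* Returns the frame length, or 0 if the buffer is too small. */
size_t build_rarp_frame(uint8_t *buf, size_t buf_len,
		const uint8_t mac[SK_MAC_LEN])
{
	uint8_t *p = buf;

	if (buf_len < SK_RARP_FRAME_LEN)
		return 0;

	/* Ethernet header: broadcast destination, VM MAC as source. */
	memset(p, 0xff, SK_MAC_LEN);        p += SK_MAC_LEN;
	memcpy(p, mac, SK_MAC_LEN);         p += SK_MAC_LEN;
	put_be16(p, SK_ETHERTYPE_RARP);     p += 2;

	/* RARP body (same layout as ARP). */
	put_be16(p, SK_ARP_HRD_ETHER);      p += 2;
	put_be16(p, SK_ETHERTYPE_IPV4);     p += 2;
	*p++ = SK_MAC_LEN;                  /* hardware address length */
	*p++ = 4;                           /* protocol address length */
	put_be16(p, SK_RARP_OP_REQ);        p += 2;
	memcpy(p, mac, SK_MAC_LEN);         p += SK_MAC_LEN; /* sender MAC */
	memset(p, 0, 4);                    p += 4;          /* sender IP */
	memcpy(p, mac, SK_MAC_LEN);         p += SK_MAC_LEN; /* target MAC */
	memset(p, 0, 4);                    p += 4;          /* target IP */

	return (size_t)(p - buf);
}
```

Because the source MAC of this frame is the VM's own address, a switch that
sees it arrive on the vhost port learns that port, rather than "LOCAL", for
the VM.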

[dpdk-dev] [PATCH 3/3] ixgbe: allow use of zero MAC address with VF

2016-02-24 Thread Bernard Iremonger

Reprogram the RAR[0] with a zero MAC address,
to ensure that the VF traffic goes to the PF
after stop, close and detach of the VF.

Fixes: af75078fece3 ("first public release")
Fixes: 00e30184daa0 ("ixgbe: add PF support")
Signed-off-by: Bernard Iremonger 
---
 drivers/net/ixgbe/ixgbe_ethdev.c | 10 +-
 drivers/net/ixgbe/ixgbe_pf.c     |  4 ++--
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 759177a..5608f67 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -3902,6 +3902,7 @@ static void
 ixgbevf_dev_close(struct rte_eth_dev *dev)
 {
struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct ether_addr *addr = (struct ether_addr *)hw->mac.addr;

PMD_INIT_FUNC_TRACE();

@@ -3911,7 +3912,14 @@ ixgbevf_dev_close(struct rte_eth_dev *dev)

ixgbe_dev_free_queues(dev);

-   /* reprogram the RAR[0] in case user changed it. */
+   memset(addr->addr_bytes, 0, ETHER_ADDR_LEN);
+
+   /**
+* reprogram the RAR[0] with a zero mac address.
+* to ensure that the VF traffic goes to the PF
+* after stop, close and detach of the VF
+**/
+
ixgbe_set_rar(hw, 0, hw->mac.addr, 0, IXGBE_RAH_AV);
 }

diff --git a/drivers/net/ixgbe/ixgbe_pf.c b/drivers/net/ixgbe/ixgbe_pf.c
index 2ffbd1f..db2dba4 100644
--- a/drivers/net/ixgbe/ixgbe_pf.c
+++ b/drivers/net/ixgbe/ixgbe_pf.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -445,7 +445,7 @@ ixgbe_vf_set_mac_addr(struct rte_eth_dev *dev, uint32_t vf, uint32_t *msgbuf)
int rar_entry = hw->mac.num_rar_entries - (vf + 1);
uint8_t *new_mac = (uint8_t *)(&msgbuf[1]);

-   if (is_valid_assigned_ether_addr((struct ether_addr*)new_mac)) {
+   if (is_unicast_ether_addr((struct ether_addr *)new_mac)) {
rte_memcpy(vfinfo[vf].vf_mac_addresses, new_mac, 6);
return hw->mac.ops.set_rar(hw, rar_entry, new_mac, vf, IXGBE_RAH_AV);
}
-- 
2.6.3
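
The change in ixgbe_vf_set_mac_addr() above hinges on the difference between
the two address checks: an all-zero MAC is unicast (its group bit is clear)
but is not a "valid assigned" address. The standalone sketch below mirrors
that distinction as I understand the rte_ether.h helpers; struct mac_addr and
the mac_is_*() functions are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

struct mac_addr { uint8_t bytes[6]; };

/* True when every byte of the address is zero. */
bool mac_is_zero(const struct mac_addr *m)
{
	int i;

	for (i = 0; i < 6; i++)
		if (m->bytes[i] != 0)
			return false;
	return true;
}

/* Unicast means the group (multicast) bit of the first byte is clear. */
bool mac_is_unicast(const struct mac_addr *m)
{
	return (m->bytes[0] & 0x01) == 0;
}

/* Stricter check: unicast and not all zeros. */
bool mac_is_valid_assigned(const struct mac_addr *m)
{
	return mac_is_unicast(m) && !mac_is_zero(m);
}
```

Switching the PF-side check from the stricter test to the unicast test is
what lets a VF program a zero address on close.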

[dpdk-dev] [PATCH 2/3] ixgbe: add more information to the error message

2016-02-24 Thread Bernard Iremonger

Add the nb_rx_q and nb_tx_q values to the error message
to give details about the error.

Fixes: 27b609cbd1c6 ("ethdev: move the multi-queue mode check to specific drivers")
Signed-off-by: Bernard Iremonger 
---
 drivers/net/ixgbe/ixgbe_ethdev.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 0db7f51..759177a 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -1835,7 +1835,9 @@ ixgbe_check_mq_mode(struct rte_eth_dev *dev)
if ((nb_rx_q > RTE_ETH_DEV_SRIOV(dev).nb_q_per_pool) ||
(nb_tx_q > RTE_ETH_DEV_SRIOV(dev).nb_q_per_pool)) {
PMD_INIT_LOG(ERR, "SRIOV is active,"
-   " queue number must less equal to %d.",
+   " nb_rx_q=%d nb_tx_q=%d queue number"
+   " must be less than or equal to %d.",
+   nb_rx_q, nb_tx_q,
RTE_ETH_DEV_SRIOV(dev).nb_q_per_pool);
return -EINVAL;
}
-- 
2.6.3

[dpdk-dev] [PATCH 1/3] ixgbe: cleanup eth_ixgbevf_dev_uninit

2016-02-24 Thread Bernard Iremonger

Releasing the rx and tx queues is already done in ixgbe_dev_close()
so it does not need to be done in eth_ixgbevf_dev_uninit().

Fixes: 2866c5f1b87e ("ixgbe: support port hotplug")
Signed-off-by: Bernard Iremonger 
---
 drivers/net/ixgbe/ixgbe_ethdev.c | 15 +--
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 3e6fe86..0db7f51 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -1390,7 +1390,6 @@ static int
 eth_ixgbevf_dev_uninit(struct rte_eth_dev *eth_dev)
 {
struct ixgbe_hw *hw;
-   unsigned i;

PMD_INIT_FUNC_TRACE();

@@ -1409,18 +1408,6 @@ eth_ixgbevf_dev_uninit(struct rte_eth_dev *eth_dev)
/* Disable the interrupts for VF */
ixgbevf_intr_disable(hw);

-   for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
-   ixgbe_dev_rx_queue_release(eth_dev->data->rx_queues[i]);
-   eth_dev->data->rx_queues[i] = NULL;
-   }
-   eth_dev->data->nb_rx_queues = 0;
-
-   for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
-   ixgbe_dev_tx_queue_release(eth_dev->data->tx_queues[i]);
-   eth_dev->data->tx_queues[i] = NULL;
-   }
-   eth_dev->data->nb_tx_queues = 0;
-
rte_free(eth_dev->data->mac_addrs);
eth_dev->data->mac_addrs = NULL;

-- 
2.6.3

[dpdk-dev] [PATCH 0/3] ixgbe fixes

2016-02-24 Thread Bernard Iremonger

This patch set implements the following:
Removes code which was duplicated in eth_ixgbevf_dev_uninit().
Adds more information to the error message in ixgbe_check_mq_mode().
Allows the MAC address of the VF to be set to zero.

Bernard Iremonger (3):
  ixgbe: cleanup eth_ixgbevf_dev_uninit
  ixgbe: add more information to the error message
  ixgbe: allow use of zero MAC address with VF

 drivers/net/ixgbe/ixgbe_ethdev.c | 29 +
 drivers/net/ixgbe/ixgbe_pf.c     |  4 ++--
 2 files changed, 15 insertions(+), 18 deletions(-)

-- 
2.6.3

[dpdk-dev] [PATCH 01/10] ethdev: add a generic flow and new behavior switch to fdir

2016-02-24 Thread Thomas Monjalon

2016-02-24 14:43, Bruce Richardson:
> On Wed, Feb 03, 2016 at 02:02:22PM +0530, Rahul Lakkireddy wrote:
> > Add a new raw packet flow that allows specifying generic flow input.
> > 
> > Add the ability to provide masks for fields in flow to allow range of
> > values.
> > 
> > Add a new behavior switch.
> > 
> > Add the ability to provide behavior arguments to allow rewriting matched
> > fields with new values. Ex: allows to provide new ip and port addresses
> > to rewrite the fields of packets matching a filter rule before NAT'ing.
> > 
> Thomas, any comments as ethdev maintainer?

Yes, some comments.
First, there are several different changes in the same patch. It must be split.
Then I don't understand at all the raw flow filter. What is a raw flow?
How behavior_arg must be used?

[dpdk-dev] [PATCH v3 1/3] fm10k: enable FTAG based forwarding

2016-02-24 Thread Bruce Richardson

On Thu, Feb 04, 2016 at 11:38:47AM +0800, Wang Xiao W wrote:
> This patch enables reading sglort info into mbuf for RX and inserting
> an FTAG at the beginning of the packet for TX. The vlan_tci_outer field
> selected from rte_mbuf structure for sglort is not used in fm10k now.
> In FTAG based forwarding mode, the switch will forward packets according
> to glort info in FTAG rather than mac and vlan table.
> 
> To activate this feature, user needs to turn 
> ``CONFIG_RTE_LIBRTE_FM10K_FTAG_FWD``
> to y in common_linuxapp or common_bsdapp. Currently this feature is supported
> only on PF, because FM10K_PFVTCTL register is read-only for VF.
> 
> Signed-off-by: Wang Xiao W 

Any comments on this patch?

My thoughts: is there a way in which this could be done without adding in a new
build time config option?

/Bruce

> ---
>  config/common_bsdapp   |  1 +
>  config/common_linuxapp |  1 +
>  drivers/net/fm10k/fm10k_ethdev.c   | 12 
>  drivers/net/fm10k/fm10k_rxtx.c | 17 +
>  drivers/net/fm10k/fm10k_rxtx_vec.c |  9 +
>  5 files changed, 40 insertions(+)
> 
> diff --git a/config/common_bsdapp b/config/common_bsdapp
> index ed7c31c..451f81a 100644
> --- a/config/common_bsdapp
> +++ b/config/common_bsdapp
> @@ -208,6 +208,7 @@ CONFIG_RTE_LIBRTE_FM10K_DEBUG_TX=n
>  CONFIG_RTE_LIBRTE_FM10K_DEBUG_TX_FREE=n
>  CONFIG_RTE_LIBRTE_FM10K_DEBUG_DRIVER=n
>  CONFIG_RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE=y
> +CONFIG_RTE_LIBRTE_FM10K_FTAG_FWD=n
>  
>  #
>  # Compile burst-oriented Mellanox ConnectX-3 (MLX4) PMD
> diff --git a/config/common_linuxapp b/config/common_linuxapp
> index 74bc515..c928bce 100644
> --- a/config/common_linuxapp
> +++ b/config/common_linuxapp
> @@ -207,6 +207,7 @@ CONFIG_RTE_LIBRTE_FM10K_DEBUG_TX_FREE=n
>  CONFIG_RTE_LIBRTE_FM10K_DEBUG_DRIVER=n
>  CONFIG_RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE=y
>  CONFIG_RTE_LIBRTE_FM10K_INC_VECTOR=y
> +CONFIG_RTE_LIBRTE_FM10K_FTAG_FWD=n
>  
>  #
>  # Compile burst-oriented Mellanox ConnectX-3 (MLX4) PMD
> diff --git a/drivers/net/fm10k/fm10k_ethdev.c 
> b/drivers/net/fm10k/fm10k_ethdev.c
> index e4aed94..65d355e 100644
> --- a/drivers/net/fm10k/fm10k_ethdev.c
> +++ b/drivers/net/fm10k/fm10k_ethdev.c
> @@ -668,6 +668,18 @@ fm10k_dev_tx_init(struct rte_eth_dev *dev)
>   PMD_INIT_LOG(ERR, "failed to disable queue %d", i);
>   return -1;
>   }
> +#ifdef RTE_LIBRTE_FM10K_FTAG_FWD
> + /* Enable use of FTAG bit in TX descriptor, PFVTCTL
> +  * register is read-only for VF.
> +  */
> + if (hw->mac.type == fm10k_mac_pf)
> + FM10K_WRITE_REG(hw, FM10K_PFVTCTL(i),
> + FM10K_PFVTCTL_FTAG_DESC_ENABLE);
> + else {
> + PMD_INIT_LOG(ERR, "FTAG is not supported in VF.");
> + return -ENOTSUP;
> + }
> +#endif
>  
>   /* set location and size for descriptor ring */
>   FM10K_WRITE_REG(hw, FM10K_TDBAL(i),
> diff --git a/drivers/net/fm10k/fm10k_rxtx.c b/drivers/net/fm10k/fm10k_rxtx.c
> index e958865..f87987d 100644
> --- a/drivers/net/fm10k/fm10k_rxtx.c
> +++ b/drivers/net/fm10k/fm10k_rxtx.c
> @@ -152,6 +152,13 @@ fm10k_recv_pkts(void *rx_queue, struct rte_mbuf 
> **rx_pkts,
>*/
>   mbuf->ol_flags |= PKT_RX_VLAN_PKT;
>   mbuf->vlan_tci = desc.w.vlan;
> +#ifdef RTE_LIBRTE_FM10K_FTAG_FWD
> + /**
> +  * mbuf->vlan_tci_outer is an idle field in fm10k driver,
> +  * so it can be selected to store sglort value.
> +  */
> + mbuf->vlan_tci_outer = rte_le_to_cpu_16(desc.w.sglort);
> +#endif
>  
>   rx_pkts[count] = mbuf;
>   if (++next_dd == q->nb_desc) {
> @@ -307,6 +314,13 @@ fm10k_recv_scattered_pkts(void *rx_queue, struct 
> rte_mbuf **rx_pkts,
>*/
>   mbuf->ol_flags |= PKT_RX_VLAN_PKT;
>   first_seg->vlan_tci = desc.w.vlan;
> +#ifdef RTE_LIBRTE_FM10K_FTAG_FWD
> + /**
> +  * mbuf->vlan_tci_outer is an idle field in fm10k driver,
> +  * so it can be selected to store sglort value.
> +  */
> + first_seg->vlan_tci_outer = rte_le_to_cpu_16(desc.w.sglort);
> +#endif
>  
>   /* Prefetch data of first segment, if configured to do so. */
>   rte_packet_prefetch((char *)first_seg->buf_addr +
> @@ -432,6 +446,9 @@ static inline void tx_xmit_pkt(struct fm10k_tx_queue *q, 
> struct rte_mbuf *mb)
>   q->nb_free -= mb->nb_segs;
>  
>   q->hw_ring[q->next_free].flags = 0;
> +#ifdef RTE_LIBRTE_FM10K_FTAG_FWD
> + q->hw_ring[q->next_free].flags |= FM10K_TXD_FLAG_FTAG;
> +#endif
>   /* set checksum flags on first descriptor of packet. SCTP checksum
>* offload is not supported, but we do not explicitly check for this
>* case in favor of greatly

[dpdk-dev] [PATCH] Correcting upstream kernel version in driver

2016-02-24 Thread Thomas Monjalon

2016-02-19 16:43, Declan Doherty:
> On 10/02/16 23:28, John Griffin wrote:
> > Fixing the version of the kernel required in the QAT documentation.
> >
> > Signed-off-by: John Griffin 
> 
> Acked-by: Declan Doherty 

Applied, thanks

[dpdk-dev] [PATCH v2] aesni_mb: strict-aliasing rule compilation fix

2016-02-24 Thread Thomas Monjalon

> > Fixes: 924e84f87306 ("aesni_mb: add driver for multi buffer based crypto")
> > 
> > When compiling the AESNI_MB PMD with GCC 4.4.7 on Centos 6.7 a
> > "dereferencing
> > pointer 'obj_p' does break strict-aliasing rules" warning occurs in the
> > get_session() function.
> > 
> > Signed-off-by: Declan Doherty 
> 
> Acked-by: Pablo de Lara 

Applied, thanks

[dpdk-dev] [PATCH] aesni_mb: fix wrong return value

2016-02-24 Thread Thomas Monjalon

2016-02-18 15:39, Declan Doherty:
> On 15/02/16 16:45, Pablo de Lara wrote:
> > cryptodev_aesni_mb_init was returning the device id of
> > the device just created, but rte_eal_vdev_init
> > (the function that calls the first one), was expecting 0 or
> > negative value.
> > This made it impossible to create more than one aesni_mb device
> > from the command line.
> >
> > Fixes: 924e84f87306 ("aesni_mb: add driver for multi buffer based crypto")
> >
> > Signed-off-by: Pablo de Lara 
> 
> Acked-by: Declan Doherty 

Applied, thanks
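
A minimal sketch of the return-value mismatch described in the commit message above. This is not DPDK code: `demo_vdev_init`, `demo_create_device_buggy` and `demo_create_device_fixed` are hypothetical names, and the caller is assumed to treat any non-zero return as a failure. Under that assumption, returning the new device id only works for the first device (id 0).

```c
#include <stdio.h>

/* Hypothetical stand-ins, not DPDK API: each call creates one device
 * and assigns it the next free id. */
static int next_dev_id;

/* Buggy variant: returns the id of the device just created. */
static int demo_create_device_buggy(void)
{
	return next_dev_id++;
}

/* Fixed variant: returns 0 on success (a negative value would mean error). */
static int demo_create_device_fixed(void)
{
	next_dev_id++;
	return 0;
}

/* Caller modeled on the convention in the commit message (assumption):
 * 0 means success, anything else is treated as a failure. */
static int demo_vdev_init(int (*create)(void))
{
	int ret = create();

	if (ret != 0) {
		fprintf(stderr, "init failed: %d\n", ret);
		return -1;
	}
	return 0;
}
```

With the buggy variant the second call reports a failure even though the device was created; with the fixed variant both calls succeed.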

[dpdk-dev] [PATCH v3 0/4] Various fixes for L2fwd-crypto

2016-02-24 Thread Thomas Monjalon

2016-02-15 16:44, Declan Doherty:
> On 12/02/16 09:17, Pablo de Lara wrote:
> > Pablo de Lara (4):
> >l2fwd-crypto: fix total stats
> >l2fwd-crypto: fix incorrect params in command line help
> >l2fwd-crypto: fix auth params setting
> >l2fwd-crypto: fix typos
> >
> >   examples/l2fwd-crypto/main.c | 26 --
> >   1 file changed, 16 insertions(+), 10 deletions(-)
> >
> 
> Series Acked-by: Declan Doherty 

Applied, thanks

[dpdk-dev] [PATCH] example/ipsec-secgw: ipsec security gateway

2016-02-24 Thread Sergio Gonzalez Monroy

On 24/02/2016 13:32, Thomas Monjalon wrote:
> Hi,
>
> 2016-01-29 20:29, Sergio Gonzalez Monroy:
>> Sample app implementing an IPsec Security Gateway.
>> The main goal of this app is to show the use of cryptodev framework
>> in a real world application.
>>
>> Currently only static IPv4 IPsec tunnels using AES-CBC
>> and HMAC-SHA1 are supported.
>>
>> Also, currently not supported:
>> - SA auto negotiation (No IKE support)
>> - chained mbufs
> Is 32-bit arch supported?

It's meant to.

I'll fix it in next version.

Sergio
> I see this error:
> error: left shift count >= width of type [-Werror=shift-count-overflow]
> (ethhdr[1] & (0xUL << 48));
>
>

[dpdk-dev] [PATCH RFC v3 3/3] vhost: avoid reordering of used->idx and last_used_idx updating.

2016-02-24 Thread Ilya Maximets

Calling rte_vhost_enqueue_burst() simultaneously from different threads
for the same queue_id requires an additional SMP memory barrier to avoid
reordering of the used->idx and last_used_idx updates.

In virtio_dev_rx(), the existing rte_mb() memory barrier is simply moved
one instruction higher.

Signed-off-by: Ilya Maximets 
---
 lib/librte_vhost/vhost_rxtx.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index a8e2582..4d37aa3 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -323,13 +323,16 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
rte_pause();

*(volatile uint16_t *)&vq->used->idx += count;
-   vq->last_used_idx = res_end_idx;
vhost_log_used_vring(dev, vq,
offsetof(struct vring_used, idx),
sizeof(vq->used->idx));

-   /* flush used->idx update before we read avail->flags. */
+   /*
+* Flush used->idx update to make it visible to virtio and all other
+* threads before allowing to modify it.
+*/
rte_mb();
+   vq->last_used_idx = res_end_idx;

/* Kick the guest if necessary. */
if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
@@ -645,19 +648,24 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t 
queue_id,
rte_pause();

*(volatile uint16_t *)&vq->used->idx += entry_success;
+   /*
+* Flush used->idx update to make it visible to all
+* other threads before allowing to modify it.
+*/
+   rte_smp_wmb();
+
vq->last_used_idx = res_cur_idx;
}

 merge_rx_exit:
if (likely(pkt_idx)) {
-   /* flush used->idx update before we read avail->flags. */
+   /* Flush used->idx update to make it visible to virtio. */
rte_mb();

/* Kick the guest if necessary. */
if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
eventfd_write(vq->callfd, (eventfd_t)1);
}
-
return pkt_idx;
 }

-- 
2.5.0
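
The ordering this patch is after can be sketched with C11 atomics. This is only an illustration under stated assumptions, not the vhost code: `struct demo_vq`, `demo_reserve` and `demo_publish` are hypothetical, and a release store stands in for the rte_mb()/rte_smp_wmb() calls used in the patch. The point is that the update of the consumer-visible index must not be reordered after the update that lets the next producer proceed.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, simplified queue state; not the real struct vhost_virtqueue. */
struct demo_vq {
	_Atomic uint16_t last_used_res_idx; /* next slot to reserve */
	_Atomic uint16_t used_idx;          /* index visible to the consumer */
	_Atomic uint16_t last_used_idx;     /* whose turn it is to publish */
};

/* Reserve count entries; returns the base index of the reservation. */
static uint16_t demo_reserve(struct demo_vq *vq, uint16_t count)
{
	return atomic_fetch_add_explicit(&vq->last_used_res_idx, count,
					 memory_order_relaxed);
}

/* Publish a reservation previously made by demo_reserve(). */
static void demo_publish(struct demo_vq *vq, uint16_t base, uint16_t count)
{
	/* Wait until all earlier reservations have been published. */
	while (atomic_load_explicit(&vq->last_used_idx,
				    memory_order_acquire) != base)
		;

	/* Make the new entries visible to the consumer first ... */
	atomic_fetch_add_explicit(&vq->used_idx, count, memory_order_relaxed);

	/* ... and only then let the next producer proceed. The release
	 * store keeps the used_idx update from being reordered after it. */
	atomic_store_explicit(&vq->last_used_idx, (uint16_t)(base + count),
			      memory_order_release);
}
```

A producer that sees its turn through the acquire load is then guaranteed to also see the previous producer's used_idx update.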

[dpdk-dev] [PATCH RFC v3 2/3] vhost: make buf vector for scatter RX local.

2016-02-24 Thread Ilya Maximets

The array of buf_vector's is only temporary storage for information
about available descriptors. It is used only locally in virtio_dev_merge_rx(),
and there is no reason for that array to be shared.

Fix that by allocating a local buf_vec inside virtio_dev_merge_rx().
The buf_vec field of struct vhost_virtqueue is marked as deprecated.

Signed-off-by: Ilya Maximets 
---
 doc/guides/rel_notes/deprecation.rst |  1 +
 lib/librte_vhost/rte_virtio_net.h|  2 +-
 lib/librte_vhost/vhost_rxtx.c| 49 ++--
 3 files changed, 27 insertions(+), 25 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst 
b/doc/guides/rel_notes/deprecation.rst
index e94d4a2..40f350d 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -7,6 +7,7 @@ API and ABI deprecation notices are to be posted here.

 Deprecation Notices
 ---
+* Field buf_vec of struct vhost_virtqueue have been deprecated.

 * The following fields have been deprecated in rte_eth_stats:
   ibadcrc, ibadlen, imcasts, fdirmatch, fdirmiss,
diff --git a/lib/librte_vhost/rte_virtio_net.h 
b/lib/librte_vhost/rte_virtio_net.h
index 4a2303a..e6e5cf3 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -93,7 +93,7 @@ struct vhost_virtqueue {
int enabled;
uint64_tlog_guest_addr; /**< Physical address 
of used ring, for logging */
uint64_treserved[15];   /**< Reserve some 
spaces for future extension. */
-   struct buf_vector   buf_vec[BUF_VECTOR_MAX];/**< for 
scatter RX. */
+   struct buf_vector   buf_vec[BUF_VECTOR_MAX] __rte_deprecated;   
 /**< @deprecated Buffer for scatter RX. */
 } __rte_cache_aligned;


diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 14c2159..a8e2582 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -340,7 +340,7 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
 static inline uint32_t __attribute__((always_inline))
 copy_from_mbuf_to_vring(struct virtio_net *dev, uint32_t queue_id,
uint16_t res_base_idx, uint16_t res_end_idx,
-   struct rte_mbuf *pkt)
+   struct rte_mbuf *pkt, struct buf_vector *buf_vec)
 {
uint32_t vec_idx = 0;
uint32_t entry_success = 0;
@@ -371,7 +371,7 @@ copy_from_mbuf_to_vring(struct virtio_net *dev, uint32_t 
queue_id,
 */
vq = dev->virtqueue[queue_id];

-   vb_addr = gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr);
+   vb_addr = gpa_to_vva(dev, buf_vec[vec_idx].buf_addr);
vb_hdr_addr = vb_addr;

/* Prefetch buffer address. */
@@ -386,24 +386,24 @@ copy_from_mbuf_to_vring(struct virtio_net *dev, uint32_t 
queue_id,

rte_memcpy((void *)(uintptr_t)vb_hdr_addr,
(const void *)&virtio_hdr, vq->vhost_hlen);
-   vhost_log_write(dev, vq->buf_vec[vec_idx].buf_addr, vq->vhost_hlen);
+   vhost_log_write(dev, buf_vec[vec_idx].buf_addr, vq->vhost_hlen);

PRINT_PACKET(dev, (uintptr_t)vb_hdr_addr, vq->vhost_hlen, 1);

seg_avail = rte_pktmbuf_data_len(pkt);
vb_offset = vq->vhost_hlen;
-   vb_avail = vq->buf_vec[vec_idx].buf_len - vq->vhost_hlen;
+   vb_avail = buf_vec[vec_idx].buf_len - vq->vhost_hlen;

entry_len = vq->vhost_hlen;

if (vb_avail == 0) {
-   uint32_t desc_idx = vq->buf_vec[vec_idx].desc_idx;
+   uint32_t desc_idx = buf_vec[vec_idx].desc_idx;

if ((vq->desc[desc_idx].flags & VRING_DESC_F_NEXT) == 0) {
idx = cur_idx & (vq->size - 1);

/* Update used ring with desc information */
-   vq->used->ring[idx].id = vq->buf_vec[vec_idx].desc_idx;
+   vq->used->ring[idx].id = buf_vec[vec_idx].desc_idx;
vq->used->ring[idx].len = entry_len;

vhost_log_used_vring(dev, vq,
@@ -416,12 +416,12 @@ copy_from_mbuf_to_vring(struct virtio_net *dev, uint32_t 
queue_id,
}

vec_idx++;
-   vb_addr = gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr);
+   vb_addr = gpa_to_vva(dev, buf_vec[vec_idx].buf_addr);

/* Prefetch buffer address. */
rte_prefetch0((void *)(uintptr_t)vb_addr);
vb_offset = 0;
-   vb_avail = vq->buf_vec[vec_idx].buf_len;
+   vb_avail = buf_vec[vec_idx].buf_len;
}

cpy_len = RTE_MIN(vb_avail, seg_avail);
@@ -431,7 +431,7 @@ copy_from_mbuf_to_vring(struct virtio_net *dev, uint32_t 
queue_id,
rte_memcpy((void *)(uintptr_t)(vb_addr + vb_offset),
rte_pktmbuf_mtod_offset(pkt, const void *, seg_offset),
cpy_len);
-   vhost_log_write(dev,

[dpdk-dev] [PATCH RFC v3 1/3] vhost: use SMP barriers instead of compiler ones.

2016-02-24 Thread Ilya Maximets

Since commit 4c02e453cc62 ("eal: introduce SMP memory barriers") virtio
uses architecture dependent SMP barriers. vHost should use them too.

Fixes: 4c02e453cc62 ("eal: introduce SMP memory barriers")

Signed-off-by: Ilya Maximets 
---
 lib/librte_vhost/vhost_rxtx.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 12ce0cc..14c2159 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -316,7 +316,7 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
}
}

-   rte_compiler_barrier();
+   rte_smp_wmb();

/* Wait until it's our turn to add our buffer to the used ring. */
while (unlikely(vq->last_used_idx != res_base_idx))
@@ -634,7 +634,7 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t 
queue_id,
entry_success = copy_from_mbuf_to_vring(dev, queue_id,
res_base_idx, res_cur_idx, pkts[pkt_idx]);

-   rte_compiler_barrier();
+   rte_smp_wmb();

/*
 * Wait until it's our turn to add our buffer
@@ -979,7 +979,7 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t 
queue_id,
entry_success++;
}

-   rte_compiler_barrier();
+   rte_smp_rmb();
vq->used->idx += entry_success;
vhost_log_used_vring(dev, vq, offsetof(struct vring_used, idx),
sizeof(vq->used->idx));
-- 
2.5.0
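
The difference between a compiler barrier and an SMP barrier can be sketched in portable C11. This is an illustration only: the `demo_*` names are hypothetical, and the C11 fences below stand in for rte_compiler_barrier(), rte_smp_wmb() and rte_smp_rmb(); they are not their definitions. A compiler barrier only stops the compiler from reordering accesses, while an SMP barrier also constrains the order in which other cores observe them, which matters on weakly ordered architectures.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Prevents compiler reordering only; emits no CPU fence instruction. */
#define demo_compiler_barrier() atomic_signal_fence(memory_order_seq_cst)

/* Also order the accesses as observed by other cores. */
#define demo_smp_wmb() atomic_thread_fence(memory_order_release)
#define demo_smp_rmb() atomic_thread_fence(memory_order_acquire)

/* Hypothetical one-slot ring: idx != 0 means data is valid. */
struct demo_ring {
	uint32_t data;
	_Atomic uint16_t idx;
};

static void demo_produce(struct demo_ring *r, uint32_t value)
{
	r->data = value;
	demo_smp_wmb(); /* data must be visible before idx */
	atomic_store_explicit(&r->idx, 1, memory_order_relaxed);
}

/* Returns 1 and fills *out if an entry is available, 0 otherwise. */
static int demo_consume(struct demo_ring *r, uint32_t *out)
{
	if (atomic_load_explicit(&r->idx, memory_order_relaxed) == 0)
		return 0;
	demo_smp_rmb(); /* read idx before reading data */
	*out = r->data;
	return 1;
}
```

Replacing demo_smp_wmb() with demo_compiler_barrier() would still behave correctly on strongly ordered CPUs, but it gives no such guarantee on architectures that may reorder stores.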

[dpdk-dev] [PATCH RFC v3 0/3] Thread safe rte_vhost_enqueue_burst().

2016-02-24 Thread Ilya Maximets

The implementation of rte_vhost_enqueue_burst() is based on a lockless
ring-buffer algorithm and contains almost everything needed to be
thread-safe, but it is not.

This set adds the required changes.

The first patch in the set is a standalone patch that fixes the frequently
discussed issue with barriers on different architectures.

The second and third patches add fixes to make rte_vhost_enqueue_burst()
thread-safe.

version 3:
* Rebased on top of current master.

version 2:
* Documentation patch dropped. Other patches of series still
  may be merged to fix existing issues and keep code in
  consistent state for the future.
* buf_vec field of struct vhost_virtqueue marked as deprecated.

 Ilya Maximets (3):
  vhost: use SMP barriers instead of compiler ones.
  vhost: make buf vector for scatter RX local.
  vhost: avoid reordering of used->idx and last_used_idx updating.

 doc/guides/rel_notes/deprecation.rst |  1 +
 lib/librte_vhost/rte_virtio_net.h|  2 +-
 lib/librte_vhost/vhost_rxtx.c| 71 
 3 files changed, 42 insertions(+), 32 deletions(-)

-- 
2.5.0

[dpdk-dev] [PATCH 01/10] ethdev: add a generic flow and new behavior switch to fdir

2016-02-24 Thread Bruce Richardson

On Wed, Feb 03, 2016 at 02:02:22PM +0530, Rahul Lakkireddy wrote:
> Add a new raw packet flow that allows specifying generic flow input.
> 
> Add the ability to provide masks for fields in flow to allow range of
> values.
> 
> Add a new behavior switch.
> 
> Add the ability to provide behavior arguments to allow rewriting matched
> fields with new values. Ex: allows to provide new ip and port addresses
> to rewrite the fields of packets matching a filter rule before NAT'ing.
> 
> Signed-off-by: Rahul Lakkireddy 
> Signed-off-by: Kumar Sanghvi 
> ---
>  doc/guides/rel_notes/release_2_3.rst |  3 +++
>  lib/librte_ether/rte_eth_ctrl.h  | 15 ++-
>  2 files changed, 17 insertions(+), 1 deletion(-)
> 
Thomas, any comments as ethdev maintainer?

Jingjing, you have been doing some work on flow director for other NICs. Can
you perhaps review this patch as well.

Regards,
/Bruce

[dpdk-dev] [PATCH 02/10] examples/test-cxgbe-filters: add example to test cxgbe fdir support

2016-02-24 Thread Bruce Richardson

On Wed, Feb 03, 2016 at 02:02:23PM +0530, Rahul Lakkireddy wrote:
> Add a new test_cxgbe_filters command line example to test support for
> Chelsio T5 hardware filtering. Shows how to pass the Chelsio input flow
> and input masks. Also, shows how to pass extra behavior arguments to
> rewrite fields in matched filter rules.
> 
> Also add documentation and update MAINTAINERS.
> 
> Signed-off-by: Rahul Lakkireddy 
> Signed-off-by: Kumar Sanghvi 

Hi,

for testing NIC functionality, the "testpmd" app is what is used, and it already
contains support for existing flow director functionality. Should the testing
functionality not be included there?

Note: that's not to say we don't need a simple example app as well, for 
demonstrating how to use flow director, but at minimum for nic features we
generally need to have testpmd support.

Can this patchset perhaps be changed to include some testpmd support, and maybe
have any example apps as a separate set?

Regards,
/Bruce

[dpdk-dev] [PATCH] example/ipsec-secgw: ipsec security gateway

2016-02-24 Thread Thomas Monjalon

Hi,

2016-01-29 20:29, Sergio Gonzalez Monroy:
> Sample app implementing an IPsec Security Gateway.
> The main goal of this app is to show the use of cryptodev framework
> in a real world application.
> 
> Currently only static IPv4 IPsec tunnels using AES-CBC
> and HMAC-SHA1 are supported.
> 
> Also, currently not supported:
> - SA auto negotiation (No IKE support)
> - chained mbufs

Is 32-bit arch supported?

I see this error:
error: left shift count >= width of type [-Werror=shift-count-overflow]
   (ethhdr[1] & (0xUL << 48));
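
For context on the error above: on 32-bit targets `unsigned long` is typically 32 bits wide, so shifting a `UL` constant left by 48 exceeds the width of the type. A minimal sketch of one way to avoid that is to build the mask with a 64-bit constant. The mask value `0xFFFF` and the names `DEMO_MASK_HI16` and `demo_top16` are hypothetical, since the constant in the quoted line is not legible here.

```c
#include <stdint.h>

/* Hypothetical mask; the constant in the quoted error line is not
 * shown in full. UINT64_C makes the operand 64 bits on every target. */
#define DEMO_MASK_HI16 (UINT64_C(0xFFFF) << 48)

/* Keep only the top 16 bits of a 64-bit word. */
static uint64_t demo_top16(uint64_t word)
{
	return word & DEMO_MASK_HI16;
}
```

A `ULL` suffix would work as well; `UINT64_C` just ties the width to `uint64_t` explicitly.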

[dpdk-dev] [PATCH v2] ixgbe: fix link down issue on x550em_x

2016-02-24 Thread Bruce Richardson

On Thu, Feb 04, 2016 at 06:21:04AM +, He, Shaopeng wrote:
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Wenzhuo Lu
> > Sent: Monday, February 01, 2016 4:43 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH v2] ixgbe: fix link down issue on x550em_x
> > 
> > Normally auto-negotiation is supported by FW, but on
> > X550EM_X_10G_T it is not. As the port of X550EM_X_10G_T
> > is 10G, if we connect the port to a peer which is 1G,
> > the link is always down.
> > We have to support auto-neg in SW to avoid this link-down
> > issue.
> > 
> > Signed-off-by: Wenzhuo Lu 
> Acked-by: Shaopeng He 
> 
I'm a bit confused regarding the commit message and the code in the patch.
The commit message talks about enabling speed auto-negotiation, while the code
never refers to any such thing. Instead all we have are settings for
manipulating interrupt masks to enable PHY interrupts. I think some additional
information is needed to connect A and B together here.

A second, more minor nit is that the commit title never refers to link
auto-negotiation, but refers to this as a bug fix - which is also correct. If
this is primarily a bug fix, please include a fixes line if possible, but please
also refer to auto-neg in the title if possible too.

/Bruce

[dpdk-dev] [PATCH] config: remove duplicate configuration information

2016-02-24 Thread Trahe, Fiona



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Wiles, Keith
> Sent: Wednesday, February 24, 2016 1:58 PM
> To: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] config: remove duplicate configuration
> information
> 
> >In order to clean up the configuration files and reduce the amount
> >of duplicate configuration information, add a new file called
> >common_base which contains just about all of the configuration lines in
> >one place. Then have the common_bsdapp and common_linuxapp files include
> >this one file, and add the delta configuration lines to those
> >OS-specific files.
> >
> >Signed-off-by: Keith Wiles 
> 
> Ping, Does this patch have any more comments, do we want to use this patch?

I'd prefer to leave it as is, but I don't feel strongly about it; I can work
with either way.
Fiona

[dpdk-dev] [PATCH v3 1/1] examples/l3fwd: modify and modularize l3fwd code

2016-02-24 Thread Thomas Monjalon

Hi,

When compiling for i686, there are some errors:

2016-02-18 10:08, Piotr Azarewicz:
> +   printf("Hash: Adding 0x%" PRIx64 " keys\n", IPV4_L3FWD_EM_NUM_ROUTES);
[...]
> +   printf("Hash: Adding 0x%" PRIx64 "keys\n", IPV6_L3FWD_EM_NUM_ROUTES);

examples/l3fwd/l3fwd_em.c:437:9: error:
format '%llx' expects argument of type 'long long unsigned int', but argument 2
has type 'unsigned int'
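
A minimal sketch of one way to make such a format string portable: `PRIx64` expands to the conversion for a 64-bit value, so the argument has to be 64 bits wide on every target. `DEMO_NUM_ROUTES` and `demo_format` are hypothetical stand-ins, not the l3fwd code.

```c
#include <inttypes.h>
#include <stdio.h>

#define DEMO_NUM_ROUTES 16 /* hypothetical; an int-typed constant */

/* Format the message into buf; returns what snprintf() returns. */
static int demo_format(char *buf, size_t len)
{
	/* Cast so the argument matches PRIx64 on 32-bit and 64-bit builds. */
	return snprintf(buf, len, "Hash: Adding 0x%" PRIx64 " keys\n",
			(uint64_t)DEMO_NUM_ROUTES);
}
```

The alternative is to keep the argument as it is and pick a conversion that matches its actual type, for example `%x` for an `unsigned int`.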

[dpdk-dev] [PATCH v3 1/1] examples/l3fwd: modify and modularize l3fwd code

2016-02-24 Thread Azarewicz, PiotrX T

> 
> Hi,
> 
> When compiling for i686, there are some errors:
> 
> 2016-02-18 10:08, Piotr Azarewicz:
> > +   printf("Hash: Adding 0x%" PRIx64 " keys\n",
> IPV4_L3FWD_EM_NUM_ROUTES);
> [...]
> > +   printf("Hash: Adding 0x%" PRIx64 "keys\n",
> IPV6_L3FWD_EM_NUM_ROUTES);
> 
> examples/l3fwd/l3fwd_em.c:437:9: error:
> format '%llx' expects argument of type 'long long unsigned int', but
> argument 2 has type 'unsigned int'

Thanks Thomas, v4 will be done.
Piotr

[dpdk-dev] [PATCH v6 1/2] mbuf: provide rte_pktmbuf_alloc_bulk API

2016-02-24 Thread Panu Matilainen

On 02/23/2016 07:35 AM, Xie, Huawei wrote:
> On 2/22/2016 10:52 PM, Xie, Huawei wrote:
>> On 2/4/2016 1:24 AM, Olivier MATZ wrote:
>>> Hi,
>>>
>>> On 01/27/2016 02:56 PM, Panu Matilainen wrote:
>>>> Since rte_pktmbuf_alloc_bulk() is an inline function, it is not part of
>>>> the library ABI and should not be listed in the version map.
>>>>
>>>> I assume it's inline for performance reasons, but then you lose the
>>>> benefits of dynamic linking such as ability to fix bugs and/or improve
>>>> it by just updating the library. Since the point of having a bulk API is
>>>> to improve performance by reducing the number of calls required, does it
>>>> really have to be inline? As in, have you actually measured the
>>>> difference between inline and non-inline and decided it's worth all the
>>>> downsides?
>>> Agree with Panu. It would be interesting to compare the performance
>>> between inline and non inline to decide whether inlining it or not.
>> Will update after I have gathered more data. Inline could show an obvious
>> performance difference in some cases.
>
> Panu and Olivier:
> I wrote a simple benchmark. It runs 10M rounds; in each round
> 8 mbufs are allocated through the bulk API and then freed.
> These are the CPU cycles measured (Intel(R) Xeon(R) CPU E5-2680 0 @
> 2.70GHz, CPU isolated, timer interrupt disabled, rcu offloaded).
> Btw, I have removed some exceptional data, the frequency of which is
> like 1/10. Sometimes observed user usage suddenly disappeared, no clue
> what happened.
>
> With 8 mbufs allocated, there is about 6% performance increase using inline.
[...]
>
> With 16 mbufs allocated, we could still observe obvious performance
> difference, though only 1%-2%
>
[...]
>
> With 32/64 mbufs allocated, the deviation of the data itself would hide
> the performance difference.
> So we prefer using inline for performance.

At least I was more after real-world performance in a real-world
use-case rather than CPU cycles in a microbenchmark. We know function
calls have a cost, but the benefits tend to outweigh the cons.

Inline functions have their place and they're far less evil in project 
internal use, but in library public API they are BAD and should be ... 
well, not banned because there are exceptions to every rule, but highly 
discouraged.

- Panu -

[dpdk-dev] [PATCH v9 0/2] Add VHOST PMD

2016-02-24 Thread Tetsuya Mukawa

On 2016/02/24 11:45, Qiu, Michael wrote:
> Hi,  Tetsuya
>
> When I applied your v6 patch, I could reach 9.5Mpps with 64B packet.
>
> But when I apply v9 I get only 8.4 Mpps. Could you figure out why
> the performance drops?

Hi Michael,

Thanks for checking it.
I tried to reproduce it, but I don't see the drop in my environment.
(My CPU is a Xeon E5-2697-v2, and the performance of both the v6 and v9
patches is almost 5.9Mpps.)
Did you use exactly the same code except for the vhost PMD?

Thanks,
Tetsuya

> Thanks,
> Michael
> On 2/9/2016 5:38 PM, Tetsuya Mukawa wrote:
>> The patch introduces a new PMD. This PMD is implemented as thin wrapper
>> of librte_vhost.
>>
>>
>> PATCH v9 changes:
>>  - Fix a null pointer access issue implemented in v8 patch.
>>
>> PATCH v8 changes:
>>  - Manage ether devices list instead of internal structures list.
>>  - Remove needless NULL checking.
>>  - Replace "pthread_exit" to "return NULL".
>>  - Replace rte_panic to RTE_LOG, also add error handling.
>>  - Remove duplicated lines.
>>  - Remove needless casting.
>>  - Follow coding style.
>>  - Remove needless parenthesis.
>>
>> PATCH v7 changes:
>>  - Remove needless parenthesis.
>>  - Add release note.
>>  - Remove needless line wraps.
>>  - Add null pointer check in vring_state_changed().
>>  - Free queue memory in eth_queue_release().
>>  - Fix wrong variable name.
>>  - Fix error handling code of eth_dev_vhost_create() and
>>rte_pmd_vhost_devuninit().
>>  - Remove needless null checking from rte_pmd_vhost_devinit/devuninit().
>>  - Use port id to create mac address.
>>  - Add doxygen style comments in "rte_eth_vhost.h".
>>  - Fix wrong comment in "mk/rte.app.mk".
>>
>> PATCH v6 changes:
>>  - Remove rte_vhost_driver_pmd_callback_registe().
>>  - Support link status interrupt.
>>  - Support queue state changed interrupt.
>>  - Add rte_eth_vhost_get_queue_event().
>>  - Support numa node detection when new device is connected.
>>
>> PATCH v5 changes:
>>  - Rebase on latest master.
>>  - Fix RX/TX routine to count RX/TX bytes.
>>  - Fix RX/TX routine not to count as error packets if enqueue/dequeue
>>cannot send all packets.
>>  - Fix if-condition checking for multiqueues.
>>  - Add "static" to pthread variable.
>>  - Fix format.
>>  - Change default behavior not to receive queueing event from driver.
>>  - Split the patch to separate rte_eth_vhost_portid2vdev().
>>
>> PATCH v4 changes:
>>  - Rebase on latest DPDK tree.
>>  - Fix coding style.
>>  - Fix code not to invoke multiple messaging handling threads.
>>  - Fix code to handle vdev parameters correctly.
>>  - Remove needless cast.
>>  - Remove needless if-condition before rt_free().
>>
>> PATCH v3 changes:
>>  - Rebase on latest master.
>>  - Specify correct queue_id in RX/TX function.
>>
>> PATCH v2 changes:
>>  - Remove a below patch that fixes vhost library.
>>The patch was applied as a separate patch.
>>- vhost: fix crash with multiqueue enabled
>>  - Fix typos.
>>(Thanks to Thomas, Monjalon)
>>  - Rebase on latest tree with above bernard's patches.
>>
>> PATCH v1 changes:
>>  - Support vhost multiple queues.
>>  - Rebase on "remove pci driver from vdevs".
>>  - Optimize RX/TX functions.
>>  - Fix resource leaks.
>>  - Fix compile issue.
>>  - Add patch to fix vhost library.
>>
>> RFC PATCH v3 changes:
>>  - Optimize performance.
>>In RX/TX functions, change code to access only per core data.
>>  - Add below API to allow user to use vhost library APIs for a port managed
>>by vhost PMD. There are a few limitations. See "rte_eth_vhost.h".
>> - rte_eth_vhost_portid2vdev()
>>To support this functionality, vhost library is also changed.
>>Anyway, if users don't use vhost PMD, they can fully use vhost library APIs.
>>  - Add code to support vhost multiple queues.
>>Actually, multiple queues functionality is not enabled so far.
>>
>> RFC PATCH v2 changes:
>>  - Fix issues reported by checkpatch.pl
>>(Thanks to Stephen Hemminger)
>>
>>
>> Tetsuya Mukawa (2):
>>   ethdev: Add a new event type to notify a queue state changed event
>>   vhost: Add VHOST PMD
>>
>>  MAINTAINERS |   4 +
>>  config/common_linuxapp  |   6 +
>>  doc/guides/nics/index.rst   |   1 +
>>  doc/guides/rel_notes/release_2_3.rst|   4 +
>>  drivers/net/Makefile|   4 +
>>  drivers/net/vhost/Makefile  |  62 ++
>>  drivers/net/vhost/rte_eth_vhost.c   | 911 
>> 
>>  drivers/net/vhost/rte_eth_vhost.h   | 109 
>>  drivers/net/vhost/rte_pmd_vhost_version.map |  11 +
>>  lib/librte_ether/rte_ethdev.h   |   2 +
>>  mk/rte.app.mk   |   6 +
>>  11 files changed, 1120 insertions(+)
>>  create mode 100644 drivers/net/vhost/Makefile
>>  create mode 100644 drivers/net/vhost/rte_eth_vhost.c
>>  create mode 100644 drivers/net/vhost/rte_eth_vhost.h
>>  create mode 100644

[dpdk-dev] [PATCH 1/2] cxgbe: fix to copy pci info to other ports under same PF

2016-02-24 Thread Bruce Richardson

On Sun, Jan 31, 2016 at 04:52:49PM +0530, Rahul Lakkireddy wrote:
> Chelsio NIC ports share a single PF. Move rte_eth_copy_pci_info()
> to copy the pci device information to the remaining ports as well.
> 
> Fixes: eeefe73f0af1 ("drivers: copy PCI device info to ethdev data")
> 
> Signed-off-by: Rahul Lakkireddy 
> Signed-off-by: Kumar Sanghvi 
> ---

Hi,

can you perhaps submit this fix as a patch alone, without the copyright update
patch attached. [Feel free to update the copyright year on the two files affected
here by this change, if you like.]

FYI: Also, the commit title is slightly too long. It should be around 50
characters long (vim highlights the correct length for me). It could be
shortened by dropping the word "other". :-)

/Bruce

>  drivers/net/cxgbe/cxgbe_ethdev.c | 2 --
>  drivers/net/cxgbe/cxgbe_main.c   | 3 +++
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/cxgbe/cxgbe_ethdev.c 
> b/drivers/net/cxgbe/cxgbe_ethdev.c
> index 97ef152..fd0eb1c 100644
> --- a/drivers/net/cxgbe/cxgbe_ethdev.c
> +++ b/drivers/net/cxgbe/cxgbe_ethdev.c
> @@ -819,8 +819,6 @@ static int eth_cxgbe_dev_init(struct rte_eth_dev *eth_dev)
>  
>   pci_dev = eth_dev->pci_dev;
>  
> - rte_eth_copy_pci_info(eth_dev, pci_dev);
> -
>   snprintf(name, sizeof(name), "cxgbeadapter%d", eth_dev->data->port_id);
>   adapter = rte_zmalloc(name, sizeof(*adapter), 0);
>   if (!adapter)
> diff --git a/drivers/net/cxgbe/cxgbe_main.c b/drivers/net/cxgbe/cxgbe_main.c
> index aff23d0..6c7eb7f 100644
> --- a/drivers/net/cxgbe/cxgbe_main.c
> +++ b/drivers/net/cxgbe/cxgbe_main.c
> @@ -1166,6 +1166,9 @@ allocate_mac:
>   pi->eth_dev->dev_ops = adapter->eth_dev->dev_ops;
>   pi->eth_dev->tx_pkt_burst = adapter->eth_dev->tx_pkt_burst;
>   pi->eth_dev->rx_pkt_burst = adapter->eth_dev->rx_pkt_burst;
> +
> + rte_eth_copy_pci_info(pi->eth_dev, pi->eth_dev->pci_dev);
> +
>   TAILQ_INIT(&pi->eth_dev->link_intr_cbs);
>  
>   pi->eth_dev->data->mac_addrs = rte_zmalloc(name,
> -- 
> 2.5.3
>

[dpdk-dev] [PATCH v3] af_packet: make the device detachable

2016-02-24 Thread Iremonger, Bernard

Hi Wojciech,


> Subject: [PATCH v3] af_packet: make the device detachable
> 
> Allow dynamic deallocation of af_packet device through proper API
> functions. To achieve this:
> * set device flag to RTE_ETH_DEV_DETACHABLE
> * implement rte_pmd_af_packet_devuninit() and expose it
>   through rte_driver.uninit()
> * copy device name to ethdev->data to make discoverable with
>   rte_eth_dev_allocated()
> Moreover, make af_packet init function static, as there is no reason to keep
> it public.
> 
> Signed-off-by: Wojciech Zmuda 
> ---
> v3:
> * Rephrased feature note in release notes.
> * Rephrased commit log.
> * Added API change note in release notes.
> * Made init function static.
> * Removed af_packet header file, as it is not needed
>   after init function is not public anymore.
> 
> v2:
> * Fixed typo and a comment.
> * Added feature to the 2.3 release notes.
> * Free memory allocated for rx and tx queues.
> 
>  doc/guides/rel_notes/release_2_3.rst   |  6 +++
>  drivers/net/af_packet/Makefile |  5 --
>  drivers/net/af_packet/rte_eth_af_packet.c  | 43 --
>  drivers/net/af_packet/rte_eth_af_packet.h  | 53 
> --
>  .../net/af_packet/rte_pmd_af_packet_version.map|  3 --
>  5 files changed, 45 insertions(+), 65 deletions(-)  delete mode 100644
> drivers/net/af_packet/rte_eth_af_packet.h
> 
> diff --git a/doc/guides/rel_notes/release_2_3.rst
> b/doc/guides/rel_notes/release_2_3.rst
> index 7945694..da4abc3 100644
> --- a/doc/guides/rel_notes/release_2_3.rst
> +++ b/doc/guides/rel_notes/release_2_3.rst

The release_2_3.rst file has been renamed to release_16_04.rst so this patch
no longer applies.

> @@ -39,6 +39,9 @@ This section should contain new features added in this
> release. Sample format:
> 
>Enabled virtio 1.0 support for virtio pmd driver.
> 
> +* **Added af_packet dynamic removal function.**
> +
> +  Af_packet device can now be detached using API, like other PMD devices.
> 
>  Resolved Issues
>  ---
> @@ -91,6 +94,9 @@ This section should contain API changes. Sample format:
>  * Add a short 1-2 sentence description of the API change. Use fixed width
>quotes for ``rte_function_names`` or ``rte_struct_names``. Use the past
> tense.
> 
> +* Af_packet device init function is no longer public. Device should be
> +attached
> +  with API.
> +
> 
>  ABI Changes
>  ---
> diff --git a/drivers/net/af_packet/Makefile
> b/drivers/net/af_packet/Makefile index ce5d239..cb1a7ae 100644
> --- a/drivers/net/af_packet/Makefile
> +++ b/drivers/net/af_packet/Makefile
> @@ -50,11 +50,6 @@ CFLAGS += $(WERROR_FLAGS)  #
>  SRCS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += rte_eth_af_packet.c
> 
> -#
> -# Export include files
> -#
> -SYMLINK-y-include += rte_eth_af_packet.h
> -
>  # this lib depends upon:
>  DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += lib/librte_mbuf
>  DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += lib/librte_ether diff -
> -git a/drivers/net/af_packet/rte_eth_af_packet.c
> b/drivers/net/af_packet/rte_eth_af_packet.c
> index 767f36b..5544528 100644
> --- a/drivers/net/af_packet/rte_eth_af_packet.c
> +++ b/drivers/net/af_packet/rte_eth_af_packet.c
> @@ -53,8 +53,6 @@
>  #include 
>  #include 
> 
> -#include "rte_eth_af_packet.h"
> -
>  #define ETH_AF_PACKET_IFACE_ARG  "iface"
>  #define ETH_AF_PACKET_NUM_Q_ARG  "qpairs"
>  #define ETH_AF_PACKET_BLOCKSIZE_ARG  "blocksz"
> @@ -65,6 +63,8 @@
>  #define DFLT_FRAME_SIZE  (1 << 11)
>  #define DFLT_FRAME_COUNT (1 << 9)
> 
> +#define RTE_PMD_AF_PACKET_MAX_RINGS 16
> +
>  struct pkt_rx_queue {
>   int sockfd;
> 
> @@ -667,11 +667,13 @@ rte_pmd_init_internals(const char *name,
>   data->nb_tx_queues = (uint16_t)nb_queues;
>   data->dev_link = pmd_link;
>   data->mac_addrs = &(*internals)->eth_addr;
> + strncpy(data->name,
> + (*eth_dev)->data->name, strlen((*eth_dev)->data->name));
> 
>   (*eth_dev)->data = data;
>   (*eth_dev)->dev_ops = 
>   (*eth_dev)->driver = NULL;
> - (*eth_dev)->data->dev_flags = 0;
> + (*eth_dev)->data->dev_flags = RTE_ETH_DEV_DETACHABLE;
>   (*eth_dev)->data->drv_name = drivername;
>   (*eth_dev)->data->kdrv = RTE_KDRV_NONE;
>   (*eth_dev)->data->numa_node = numa_node;
> @@ -798,7 +800,7 @@ rte_eth_from_packet(const char *name,
>   return 0;
>  }
> 
> -int
> +static int
>  rte_pmd_af_packet_devinit(const char *name, const char *params)
>  {
>   unsigned numa_node;
> @@ -836,10 +838,43 @@ exit:
>   return ret;
>  }
> 
> +static int
> +rte_pmd_af_packet_devuninit(const char *name)
> +{
> + struct rte_eth_dev *eth_dev = NULL;
> + struct pmd_internals *internals;
> + unsigned q;
> +
> + RTE_LOG(INFO, PMD, "Closing AF_PACKET ethdev on numa socket %u\n",
> + rte_socket_id());
> +
> + if (name == NULL)
> + return -1;
> +
> + /* find the ethdev entry */
> + eth_dev =

[dpdk-dev] [PATCH 2/2] cxgbe: update license year to 2016

2016-02-24 Thread Bruce Richardson

On Sun, Jan 31, 2016 at 04:52:50PM +0530, Rahul Lakkireddy wrote:
> Update CXGBE PMD license year to 2016.
> 
> Signed-off-by: Rahul Lakkireddy 
> Signed-off-by: Kumar Sanghvi 
> ---

Although I don't think it's officially documented, in DPDK - as in many open
source projects - the license year is only updated once a modification is made
to the file in question.

/Bruce

[dpdk-dev] [PATCH v4] eal: add function to check if primary proc alive

2016-02-24 Thread Tahhan, Maryam

> From: Van Haaren, Harry
> Sent: Tuesday, February 23, 2016 2:10 PM
> To: david.marchand at 6wind.com
> Cc: Tahhan, Maryam ; dev at dpdk.org; Van
> Haaren, Harry 
> Subject: [PATCH v4] eal: add function to check if primary proc alive
> 
> This patch adds a new function to the EAL API:
> int rte_eal_primary_proc_alive(const char *path);
> 
> The function indicates if a primary process is alive right now.
> This functionality is implemented by testing for a write-lock on the
> config file.
> 
> The use case for this functionality is that a secondary process can wait
> until a primary process starts by polling the function and waiting. Once
> the primary is running, the secondary can continue to poll to detect
> whether the primary process has quit unexpectedly.
> 
> The RTE_MAGIC number is written to the shared config by the primary
> process; this is the signal to the secondary process that the EAL is set up
> and ready to be used. The function
> rte_eal_mcfg_complete() writes RTE_MAGIC. This has been delayed in
> the EAL init procedure, as the PCI probing in the primary process can
> interfere with the secondary running.
> 
> Signed-off-by: Harry van Haaren 

Acked-by: Maryam Tahhan
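
The lock test described in the commit message can be sketched in plain POSIX C. This is only a minimal sketch, not the code from the patch: it assumes the primary holds a POSIX write lock on its config file for as long as it runs, and the helper name primary_lock_held() is hypothetical (it is not the rte_eal_primary_proc_alive() implementation).

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a DPDK API): returns 1 if another process
 * holds a write lock on `path`, 0 if no such lock exists, and -1 if the
 * file cannot be opened or queried.
 */
static int
primary_lock_held(const char *path)
{
	struct flock fl = {0};
	int fd, ret;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	/*
	 * Ask whether a read lock on the whole file could be taken.
	 * Only a write lock held by another process conflicts with it.
	 */
	fl.l_type = F_RDLCK;
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;
	fl.l_len = 0; /* 0 means "up to the end of the file" */

	ret = fcntl(fd, F_GETLK, &fl);
	close(fd);
	if (ret < 0)
		return -1;

	/* F_GETLK sets l_type to F_UNLCK when nothing would block the lock. */
	return fl.l_type != F_UNLCK;
}
```

A secondary process could call such a helper in a loop with a short sleep, first to wait for the primary to appear and later to notice that it has gone away. Locks held by the calling process itself are never reported, so the check only makes sense from a different process than the one holding the lock.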

[dpdk-dev] [PATCH] eal: Initial implementation of PQoS EAL extension

2016-02-24 Thread Ananyev, Konstantin



> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Wednesday, February 24, 2016 11:46 AM
> To: Ananyev, Konstantin
> Cc: Richardson, Bruce; dev at dpdk.org; Kantecki, Tomasz
> Subject: Re: [dpdk-dev] [PATCH] eal: Initial implementation of PQoS EAL 
> extension
> 
> 2016-02-24 11:21, Ananyev, Konstantin:
> >
> > > -Original Message-
> > > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > Sent: Wednesday, February 24, 2016 10:35 AM
> > > To: Ananyev, Konstantin
> > > Cc: Richardson, Bruce; dev at dpdk.org; Kantecki, Tomasz
> > > Subject: Re: [dpdk-dev] [PATCH] eal: Initial implementation of PQoS EAL 
> > > extension
> > >
> > > 2016-02-24 10:22, Ananyev, Konstantin:
> > > >
> > > > > -Original Message-
> > > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce 
> > > > > Richardson
> > > > > Sent: Wednesday, February 24, 2016 10:10 AM
> > > > > To: Thomas Monjalon
> > > > > Cc: dev at dpdk.org; Kantecki, Tomasz
> > > > > Subject: Re: [dpdk-dev] [PATCH] eal: Initial implementation of PQoS 
> > > > > EAL extension
> > > > >
> > > > > On Wed, Feb 24, 2016 at 09:24:33AM +0100, Thomas Monjalon wrote:
> > > > > > 2016-02-23 23:03, Kantecki, Tomasz:
> > > > > > > > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > > > > > > If there is nothing specific in DPDK for PQos, why writing an 
> > > > > > > > example in
> > > > > > > > DPDK?
> > > > > > > The example makes it much easier to use the technology with DPDK.
> > > > > > >
> > > > > > > > Maybe the example should be better in the library itself.
> > > > > > > The library in question (https://github.com/01org/intel-cmt-cat) 
> > > > > > > > has a couple of examples but none of them refers to DPDK.
> > > > > > >
> > > > > > > > I suggest to mention the library in
> > > > > > > > doc/guides/linux_gsg/nic_perf_intel_platform.rst
> > > > > > > Ok it can be added to this document. Does it imply -1 for the 
> > > > > > > sample code idea?
> > > > > >
> > > > > > I may be wrong but I have the feeling the example is more about 
> > > > > > PQoS than DPDK.
> > > > > > So yes, I would vote -1.
> > > > > >
> > > > > Well, the intersection of DPDK and PQoS is what the example is really 
> > > > > all about,
> > > > > and as such it is relevant to both DPDK and the library itself. 
> > > > > Platform QoS
> > > > > can be of great use to packet processing applications for helping to 
> > > > > ensure that
> > > > > the app gets the resources it needed - especially in a virtualised 
> > > > > world - and
> > > > > so we believe that having an example in DPDK showing how to use PQoS 
> > > > > with DPDK
> > > > > is well worthwhile having. It's more effective than a simple doc 
> > > > > update in
> > > > > raising awareness of the existence of the feature, and also provides 
> > > > > for DPDK
> > > > > users a readily available app for the user to start playing with to 
> > > > > evaluate
> > > > > PQoS for their own use-cases.
> > > >
> > > > +1
> > > > I also think it is a good thing to have.
> > > > Again user don't have to trust the whitepapers - instead he can run the 
> > > > app
> > > > and measure performance gain on his particular platform.
> > >
> > > I totally agree the example is good to have.
> > > Konstantin, are you thinking it must be hosted in the PQoS lib repository?
> >
> > Personally I prefer it to be part of dpdk samples.
> > DPDK IO code path is a bit different from what the 'classical' user app 
> > usually does -
> > a lot of polling, avoid system calls, etc.
> > Also it would probably have much better visibility here.
> > Again, as Bruce already mentioned,  we have QAT & TAP samples, why we can't 
> > have PQoS too.
> 
> Indeed the DPDK policies are really flexible.
> How would you suggest to decide which examples can enter in DPDK?

That's a good question, for which I probably don't have an exact answer.
Probably a good opportunity for the TB to show itself :)
My input would be: to justify a new sample for dpdk+third-party-lib, it has to
demonstrate one of:
a) A clear performance gain for an existing dpdk application,
i.e. under scenario X with library Y, dpdk app Z shows N% better performance
(PQoS example).
b) How to integrate a dpdk-based app with some well-known and widely used
technology
(tap example, using fuse to implement vhost example).
c) How to expand packet processing with functionality that is not part of
the dpdk project.
So yes, if tomorrow someone comes up with an example that does packet
compression, or encryption, or DPI using some third-party library, I think we
at least have to consider including it inside dpdk.org/examples.

As a restriction, I would say that the example has to be relatively small and
simple and demonstrate the usage of a particular feature.
Plus I think that the third-party library has to be freely available and
open-sourced.

Konstantin

> Examples: what about a zip compression in the forwarding plane?

[dpdk-dev] [PATCH v3 0/4] fix the issue that DPDK takes over virtio device blindly

2016-02-24 Thread Thomas Monjalon

> Huawei Xie (4):
>   eal: make the comment more accurate
>   eal: set kdrv to RTE_KDRV_NONE if kernel driver isn't manipulating the 
> device.
>   virtio: return 1 to tell the kernel we don't take over this device
>   virtio: check if kernel driver is manipulating the virtio device

The virtio PCI code has been refactored.
Please Huawei, would it be possible to rebase on master?

[dpdk-dev] [PATCH] doc: Malicious Driver Detection not supported by ixgbe

2016-02-24 Thread Wenzhuo Lu

Signed-off-by: Wenzhuo Lu 
---
 doc/guides/nics/ixgbe.rst  | 21 +
 doc/guides/rel_notes/release_16_04.rst | 24 
 2 files changed, 45 insertions(+)

diff --git a/doc/guides/nics/ixgbe.rst b/doc/guides/nics/ixgbe.rst
index 8cae299..aac5586 100644
--- a/doc/guides/nics/ixgbe.rst
+++ b/doc/guides/nics/ixgbe.rst
@@ -147,6 +147,27 @@ The following MACROs are used for these three features:

 *   ETH_TXQ_FLAGS_NOXSUMTCP

+Malicious Driver Detection not Supported by ixgbe
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+On Intel x550 series NICs, HW supports a feature called MDD (Malicious
+Driver Detection).
+MDD is used to check the behavior of the VF driver. It means that, when transmitting
+packets, the VF must use the advanced context descriptor and set it correctly.
+The VF must also set the CC (Check Context) bit.
+DPDK PF doesn't support MDD. We may hit a problem in the scenario of kernel PF +
+DPDK VF. If the user enables MDD in the kernel PF, the DPDK VF will not work, because
+the kernel PF thinks the VF is malicious. But actually it's not. The only reason
+is that the VF doesn't act as MDD requires.
+There's a significant performance impact to support MDD. DPDK would have to check if
+the advanced context descriptor should be set and set it. And DPDK would have to ask
+the upper layer for the header length, because parsing the
+packet itself is not acceptable. So, it's too expensive to support MDD.
+When using kernel PF + DPDK VF on x550, please make sure to use a kernel
+driver that disables MDD or can disable MDD. (Some kernel drivers can use
+this CLI 'insmod ixgbe.ko MDD=0,0' to disable MDD. Some kernel drivers disable
+it by default.)
+

 Sample Application Notes
 
diff --git a/doc/guides/rel_notes/release_16_04.rst b/doc/guides/rel_notes/release_16_04.rst
index 5786f74..df81c54 100644
--- a/doc/guides/rel_notes/release_16_04.rst
+++ b/doc/guides/rel_notes/release_16_04.rst
@@ -90,6 +90,30 @@ This section should contain new known issues in this release. Sample format:
   tense. Add information on any known workarounds.


+Restriction
+-----------
+
+* **Malicious Driver Detection is not supported by ixgbe**
+
+  On Intel x550 series NICs, HW supports a feature called MDD (Malicious
+  Driver Detection).
+  MDD is used to check the behavior of the VF driver. It means that, when transmitting
+  packets, the VF must use the advanced context descriptor and set it correctly.
+  The VF must also set the CC (Check Context) bit.
+  DPDK PF doesn't support MDD. We may hit a problem in the scenario of kernel PF +
+  DPDK VF. If the user enables MDD in the kernel PF, the DPDK VF will not work, because
+  the kernel PF thinks the VF is malicious. But actually it's not. The only reason
+  is that the VF doesn't act as MDD requires.
+  There's a significant performance impact to support MDD. DPDK would have to check if
+  the advanced context descriptor should be set and set it. And DPDK would have to ask
+  the upper layer for the header length, because parsing the
+  packet itself is not acceptable. So, it's too expensive to support MDD.
+  When using kernel PF + DPDK VF on x550, please make sure to use a kernel
+  driver that disables MDD or can disable MDD. (Some kernel drivers can use
+  this CLI 'insmod ixgbe.ko MDD=0,0' to disable MDD. Some kernel drivers disable
+  it by default.)
+
+
 API Changes
 ---

-- 
1.9.3

[dpdk-dev] [PATCH] examples/skeleton-cat: PQoS CAT and CDP, example of libpqos usage

2016-02-24 Thread Wojciech Andralojc

Because of the feedback that we have received off the mailing list,
that extending EAL commands is not an option due to the
Intel Architecture nature of CAT,
we have changed the design of PQoS patch.

The current V2 patch implements a sample code, based on the DPDK skeleton
example app, that links against the existing 01.org PQoS library
(https://github.com/01org/intel-cmt-cat).
This eliminates the need for librte_pqos and EAL extensions introduced in
the V1 patch. The sample code implements a C module that parses
the application specific part of the command line with CAT configuration
options (--l3ca, same format as the V1 patch EAL command, but expects
CPU ids rather than lcores).
The module is easy to re-use in other applications as needed.

Signed-off-by: Wojciech Andralojc 
Signed-off-by: Tomasz Kantecki 
Signed-off-by: Marcel D Cornu 
---
Details of "--l3ca" app parameter to configure Intel CAT and CDP features:
--l3ca=bitmask@
--l3ca=(code_bitmask,data_bitmask)@
- makes the selected CPUs use the specified CAT bitmasks

CAT and CDP features allow management of the CPU's last level cache.
CAT introduces classes of service (COS) that are essentially bitmasks.
In current CAT implementations, a bit in a COS bitmask corresponds to
one cache way in the last level cache.
A CPU core is always assigned to one of the CAT classes.
By programming CPU core assignment and COS bitmasks, applications can be
given exclusive, shared, or mixed access to the CPU's last level cache.
CDP extends CAT so that there are two bitmasks per COS,
one for data and one for code.
The number of classes and number of valid bits in a COS bitmask is CPU
model specific and COS bitmasks need to be contiguous. Sample code calls
this bitmask a cbm or a capacity bitmask.
By default, after reset, all CPU cores are assigned to COS 0 and all
classes are programmed to allow fill into all cache ways.
CDP is off by default.
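
The bitmask rules above (contiguous bits, a CPU-model-specific number of valid bits, exclusive versus shared ways) can be checked with a few bit operations. The following is a minimal sketch under those stated rules only; the function names are hypothetical and are not taken from the sample code or from libpqos.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the set bits of cbm form a single contiguous run, else 0. */
static int
cbm_is_contiguous(uint64_t cbm)
{
	uint64_t lowest;

	if (cbm == 0)
		return 0; /* an empty mask selects no cache way */

	/*
	 * Adding the lowest set bit carries through the lowest run of ones.
	 * If the result shares no bit with cbm, that run was the only one.
	 */
	lowest = cbm & (~cbm + 1);
	return ((cbm + lowest) & cbm) == 0;
}

/* Returns 1 if cbm is contiguous and fits in num_ways bits, else 0. */
static int
cbm_is_valid(uint64_t cbm, unsigned int num_ways)
{
	assert(num_ways >= 1 && num_ways <= 64);

	/* Reject bits above the number of ways this CPU model exposes. */
	if (num_ways < 64 && (cbm >> num_ways) != 0)
		return 0;
	return cbm_is_contiguous(cbm);
}

/* Returns 1 if two classes share at least one cache way, else 0. */
static int
cbm_shares_ways(uint64_t a, uint64_t b)
{
	return (a & b) != 0;
}
```

Two classes whose masks do not overlap give their cores exclusive cache ways; any overlap makes the common ways shared.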

For more information about CAT please see
https://github.com/01org/intel-cmt-cat

Known issues and limitations:
- --l3ca must be the first app parameter
---
 MAINTAINERS   |   4 +
 doc/guides/sample_app_ug/index.rst|   1 +
 doc/guides/sample_app_ug/skeleton-cat.rst | 461 ++
 examples/Makefile |   1 +
 examples/skeleton-cat/Makefile|  68 +++
 examples/skeleton-cat/basicfwd-cat.c  | 220 +++
 examples/skeleton-cat/cat.c   | 957 ++
 examples/skeleton-cat/cat.h   |  70 +++
 8 files changed, 1782 insertions(+)
 create mode 100644 doc/guides/sample_app_ug/skeleton-cat.rst
 create mode 100644 examples/skeleton-cat/Makefile
 create mode 100644 examples/skeleton-cat/basicfwd-cat.c
 create mode 100644 examples/skeleton-cat/cat.c
 create mode 100644 examples/skeleton-cat/cat.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 628bc05..7a6702b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -600,3 +600,7 @@ F: doc/guides/sample_app_ug/vmdq_dcb_forwarding.rst
 M: Pablo de Lara 
 M: Daniel Mrzyglod 
 F: examples/ptpclient/
+
+M: Tomasz Kantecki 
+F: examples/skeleton-cat/
+F: doc/guides/sample_app_ug/skeleton-cat.rst
\ No newline at end of file
diff --git a/doc/guides/sample_app_ug/index.rst b/doc/guides/sample_app_ug/index.rst
index 8a646dd..f065e54 100644
--- a/doc/guides/sample_app_ug/index.rst
+++ b/doc/guides/sample_app_ug/index.rst
@@ -41,6 +41,7 @@ Sample Applications User Guide
 exception_path
 hello_world
 skeleton
+skeleton-cat
 rxtx_callbacks
 ip_frag
 ipv4_multicast
diff --git a/doc/guides/sample_app_ug/skeleton-cat.rst b/doc/guides/sample_app_ug/skeleton-cat.rst
new file mode 100644
index 000..6684f61
--- /dev/null
+++ b/doc/guides/sample_app_ug/skeleton-cat.rst
@@ -0,0 +1,461 @@
+..  BSD LICENSE
+Copyright(c) 2016 Intel Corporation. All rights reserved.
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+
+* Redistributions of source code must retain the above copyright
+notice, this list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in
+the documentation and/or other materials provided with the
+distribution.
+* Neither the name of Intel Corporation nor the names of its
+contributors may be used to endorse or promote products derived
+from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+

[dpdk-dev] [PATCH v6 1/2] mbuf: provide rte_pktmbuf_alloc_bulk API

2016-02-24 Thread Ananyev, Konstantin

Hi Panu,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Panu Matilainen
> Sent: Wednesday, February 24, 2016 12:12 PM
> To: Xie, Huawei; Olivier MATZ; dev at dpdk.org
> Cc: dprovan at bivio.net
> Subject: Re: [dpdk-dev] [PATCH v6 1/2] mbuf: provide rte_pktmbuf_alloc_bulk 
> API
> 
> On 02/23/2016 07:35 AM, Xie, Huawei wrote:
> > On 2/22/2016 10:52 PM, Xie, Huawei wrote:
> >> On 2/4/2016 1:24 AM, Olivier MATZ wrote:
> >>> Hi,
> >>>
> >>> On 01/27/2016 02:56 PM, Panu Matilainen wrote:
> >>>> Since rte_pktmbuf_alloc_bulk() is an inline function, it is not part of
> >>>> the library ABI and should not be listed in the version map.
> >>>>
> >>>> I assume it's inline for performance reasons, but then you lose the
> >>>> benefits of dynamic linking such as the ability to fix bugs and/or improve
> >>>> it by just updating the library. Since the point of having a bulk API is
> >>>> to improve performance by reducing the number of calls required, does it
> >>>> really have to be inline? As in, have you actually measured the
> >>>> difference between inline and non-inline and decided it's worth all the
> >>>> downsides?
> >>> Agree with Panu. It would be interesting to compare the performance
> >>> between inline and non inline to decide whether inlining it or not.
> >> Will update after i gathered more data. inline could show obvious
> >> performance difference in some cases.
> >
> > Panu and Olivier:
> > I wrote a simple benchmark. It runs 10M rounds; in each round,
> > 8 mbufs are allocated through the bulk API and then freed.
> > These are the CPU cycles measured (Intel(R) Xeon(R) CPU E5-2680 0 @
> > 2.70GHz, CPU isolated, timer interrupt disabled, rcu offloaded).
> > Btw, I have removed some outlier data, the frequency of which is
> > about 1/10. Sometimes the observed user usage suddenly disappeared; no clue
> > what happened.
> >
> > With 8 mbufs allocated, there is about 6% performance increase using inline.
> [...]
> >
> > With 16 mbufs allocated, we could still observe obvious performance
> > difference, though only 1%-2%
> >
> [...]
> >
> > With 32/64 mbufs allocated, the deviation of the data itself would hide
> > the performance difference.
> > So we prefer using inline for performance.
> 
> At least I was more after real-world performance in a real-world
> use-case rather than CPU cycles in a microbenchmark; we know function
> calls have a cost, but the benefits tend to outweigh the cons.
> 
> Inline functions have their place and they're far less evil in project
> internal use, but in library public API they are BAD and should be ...
> well, not banned because there are exceptions to every rule, but highly
> discouraged.

Why is that?
As you can see, right now we have all mbuf alloc/free routines as static inline.
And I think we would like to keep it like that.
So why should that particular function be different?
After all, that function is nothing more than a wrapper
around rte_mempool_get_bulk() followed by a loop, unrolled by 4, of
rte_pktmbuf_reset().
So unless the mempool get/put API changes, I can hardly see how there could be
any ABI breakages in future.
About 'real world' performance gain - it was a 'real world' performance problem
that we tried to solve by introducing that function:
http://dpdk.org/ml/archives/dev/2015-May/017633.html

And according to the user feedback, it does help:  
http://dpdk.org/ml/archives/dev/2016-February/033203.html
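
To make the "wrapper around a bulk get plus an unrolled reset loop" shape concrete, here is a minimal sketch with toy types. Everything named toy_* is hypothetical and stands in for the mempool and mbuf structures; it is not the DPDK code, and the unrolling is written as a plain four-way loop with a tail loop rather than whatever form the real function uses.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins (hypothetical, not the DPDK mbuf or mempool types). */
struct toy_buf {
	unsigned int data_len;
	unsigned int refcnt;
};

struct toy_pool {
	struct toy_buf **free_list; /* stack of free buffers */
	unsigned int nb_free;
};

/* All-or-nothing bulk get: returns 0 on success, -1 if the pool is too empty. */
static int
toy_pool_get_bulk(struct toy_pool *p, struct toy_buf **bufs, unsigned int n)
{
	unsigned int i;

	if (p->nb_free < n)
		return -1;
	for (i = 0; i < n; i++)
		bufs[i] = p->free_list[--p->nb_free];
	return 0;
}

/* Bring a buffer back to its freshly allocated state. */
static inline void
toy_buf_reset(struct toy_buf *b)
{
	b->data_len = 0;
	b->refcnt = 1;
}

/* The wrapper: one bulk get, then a reset loop unrolled by 4. */
static inline int
toy_buf_alloc_bulk(struct toy_pool *p, struct toy_buf **bufs,
		unsigned int count)
{
	unsigned int idx = 0;

	assert(p != NULL && bufs != NULL);

	if (toy_pool_get_bulk(p, bufs, count) != 0)
		return -1;

	/* Main loop: four resets per iteration. */
	for (; idx + 4 <= count; idx += 4) {
		toy_buf_reset(bufs[idx]);
		toy_buf_reset(bufs[idx + 1]);
		toy_buf_reset(bufs[idx + 2]);
		toy_buf_reset(bufs[idx + 3]);
	}
	/* Tail: the remaining zero to three buffers. */
	for (; idx < count; idx++)
		toy_buf_reset(bufs[idx]);
	return 0;
}
```

The only entry points the wrapper depends on are the bulk get and the per-buffer reset, which is the basis of the argument that it adds little ABI surface of its own.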

Konstantin

> 
>   - Panu -
>

[dpdk-dev] [PATCH v2 10/10] pci: place all uio pci device ids in a dedicated section

2016-02-24 Thread Thomas Monjalon

2016-02-24 11:37, Bruce Richardson:
> On Wed, Jan 20, 2016 at 10:40:00AM -0500, Neil Horman wrote:
> > On Tue, Jan 19, 2016 at 01:35:14PM -0800, Stephen Hemminger wrote:
> > > On Tue, 19 Jan 2016 15:56:14 -0500
> > > Neil Horman  wrote:
> > > 
> > > > On Tue, Jan 19, 2016 at 08:10:19AM -0800, Stephen Hemminger wrote:
> > > > > On Tue, 19 Jan 2016 09:29:31 -0500
> > > > > Neil Horman  wrote:
> > > > > 
> > > > > > On Tue, Jan 19, 2016 at 08:30:40AM +0100, Thomas Monjalon wrote:
> > > > > > > 2016-01-18 13:30, David Marchand:
> > > > > > > > We could do something à la modinfo, but let's keep it simple 
> > > > > > > > for now.
> > > > > > > > 
> > > > > > > > With this, you can extract the devices that need to be bound to 
> > > > > > > > uio / vfio
> > > > > > > > with tools like objdump :
> > > > > > > > 
> > > > > > > > $ objdump -j rte_pci_id_uio -s build/lib/librte_pmd_fm10k.so
> > > > > > > > 
> > > > > > > > Contents of section rte_pci_id_uio:
> > > > > > > >  15760 8680a415  8680d015   
> > > > > > > >  15770 8680a515     
> > > > > > > 
> > > > > > > Yes we need a modinfo-like tool.
> > > > > > > Currently, the UIO/VFIO binding can be done after parsing the PCI 
> > > > > > > device list.
> > > > > > > It is better to define the device ids locally to their drivers 
> > > > > > > but it must
> > > > > > > be integrated with an appropriate parsing tool at the same time.
> > > > > > > And more importantly than any tool, the format of these ELF data 
> > > > > > > must be
> > > > > > > properly defined, documented and extensible.
> > > > > > > 
> > > > > > > Is there someone experimented with such format definition?
> > > > > > > Stephen, you were asking for this change, what is your opinion?
> > > > > > > I remember that Neil was also interested in this change:
> > > > > > >   http://dpdk.org/ml/archives/dev/2015-January/012115.html
> > > > > > > Panu, Christian, this change could be related to distribution 
> > > > > > > packaging.
> > > > > > > Thanks for helping to move this change forward.
> > > > > > 
> > > > > > Yes, I would be interested in seeing this.  Is the ask here that 
> > > > > > someone do it?
> > > > > > As I recall from the last thread that you reference, I thought 
> > > > > > David M was
> > > > > > interested in writing it and soliciting for ideas.  If thats no 
> > > > > > longer the case,
> > > > > > I can take a stab at writing it.
> > > > > > 
> > > > > > Neil
> > > > > > 
> > > > > 
> > > > > If these are libraries is there a way to have a real entry point
> > > > > to dump PCI id's. 
> > > > > 
> > > > Sure, you could write a method that could be dlsym-ed easily enough to
> > > > fetch an array of pci ids, or just print stuff to the console.  Not sure
> > > > that's the best way, but definitely an option
> > > > Neil
> > > 
> > > It is just that reading data with objdump is a kludge likely to get 
> > > broken.
> > > 
> > Not suggesting that we rely on objdump in perpetuity, only that we export
> > the data, rather than a method to access it, so that it can be reached via
> > libelf.
> > Using a function to return the information has implicit issues at the moment
> > (specifically if you dlopen a dpdk driver, its constructor will attempt to
> > register it with the core libraries).  While that's not catastrophic, it
> > means more stuff than you expect gets loaded, which might have weird side
> > effects.
> > Adding a separate section that you could reach via libelf would be nice I
> > think
> > 
> > Neil
> > 
> Hi,
> 
> while there is interesting discussion on tools, are there any objections to
> taking and merging this patchset as-is to at least do the cleanup of the
> existing pci ids list? I would assume that any tools for querying the 
> patchlist
> can be done as additional work once this is applied. 

Today we can parse the global PCI list to bind devices to DPDK.
If we remove this list, we must replace it with another convenient method.
And more importantly, the information in the ELF files must be extensible
and use a stable syntax.
The problem here is that it is poorly specified.
Please, let's describe a syntax for these ELF data first.
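
For readers who have not used dedicated ELF sections, here is a minimal sketch of the mechanism under discussion. It assumes GCC or Clang with a GNU ld compatible linker on an ELF target. The record layout (struct pci_id_rec) and the accessor names are hypothetical, since no format has been agreed on in this thread; the section name and the three vendor/device pairs are the ones visible in the objdump output quoted above, read as little-endian 16-bit values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record layout; the thread has not settled on a format. */
struct pci_id_rec {
	uint16_t vendor_id;
	uint16_t device_id;
	uint32_t reserved; /* placeholder for future fields */
};

/* One table per driver, emitted into the dedicated section. */
static const struct pci_id_rec fm10k_ids[]
__attribute__((used, section("rte_pci_id_uio"))) = {
	{ 0x8086, 0x15a4, 0 },
	{ 0x8086, 0x15d0, 0 },
	{ 0x8086, 0x15a5, 0 },
};

/*
 * GNU ld defines __start_<name> and __stop_<name> for any section whose
 * name is a valid C identifier, so the table can be walked in-process.
 */
extern const struct pci_id_rec __start_rte_pci_id_uio[];
extern const struct pci_id_rec __stop_rte_pci_id_uio[];

/* Number of records the linker gathered into the section. */
static size_t
pci_id_count(void)
{
	return (size_t)(__stop_rte_pci_id_uio - __start_rte_pci_id_uio);
}

/* Access record i; the caller must stay below pci_id_count(). */
static const struct pci_id_rec *
pci_id_at(size_t i)
{
	assert(i < pci_id_count());
	return &__start_rte_pci_id_uio[i];
}
```

An external tool would read the same bytes from the file with libelf or objdump instead of the start/stop symbols, which is why a documented and versioned record layout matters more than the accessors shown here.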

[dpdk-dev] [PATCH] eal: Initial implementation of PQoS EAL extension

2016-02-24 Thread Thomas Monjalon

2016-02-24 11:21, Ananyev, Konstantin:
> 
> > -Original Message-
> > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > Sent: Wednesday, February 24, 2016 10:35 AM
> > To: Ananyev, Konstantin
> > Cc: Richardson, Bruce; dev at dpdk.org; Kantecki, Tomasz
> > Subject: Re: [dpdk-dev] [PATCH] eal: Initial implementation of PQoS EAL 
> > extension
> > 
> > 2016-02-24 10:22, Ananyev, Konstantin:
> > >
> > > > -Original Message-
> > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce Richardson
> > > > Sent: Wednesday, February 24, 2016 10:10 AM
> > > > To: Thomas Monjalon
> > > > Cc: dev at dpdk.org; Kantecki, Tomasz
> > > > Subject: Re: [dpdk-dev] [PATCH] eal: Initial implementation of PQoS EAL 
> > > > extension
> > > >
> > > > On Wed, Feb 24, 2016 at 09:24:33AM +0100, Thomas Monjalon wrote:
> > > > > 2016-02-23 23:03, Kantecki, Tomasz:
> > > > > > > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > > > > > If there is nothing specific in DPDK for PQos, why writing an 
> > > > > > > example in
> > > > > > > DPDK?
> > > > > > The example makes it much easier to use the technology with DPDK.
> > > > > >
> > > > > > > Maybe the example should be better in the library itself.
> > > > > > The library in question (https://github.com/01org/intel-cmt-cat) 
> > > > > > has a couple of examples but none of them refers to DPDK.
> > > > > >
> > > > > > > I suggest to mention the library in
> > > > > > > doc/guides/linux_gsg/nic_perf_intel_platform.rst
> > > > > > Ok it can be added to this document. Does it imply -1 for the 
> > > > > > sample code idea?
> > > > >
> > > > > I may be wrong but I have the feeling the example is more about PQoS 
> > > > > than DPDK.
> > > > > So yes, I would vote -1.
> > > > >
> > > > Well, the intersection of DPDK and PQoS is what the example is really 
> > > > all about,
> > > > and as such it is relevant to both DPDK and the library itself. 
> > > > Platform QoS
> > > > can be of great use to packet processing applications for helping to 
> > > > ensure that
> > > > the app gets the resources it needed - especially in a virtualised 
> > > > world - and
> > > > so we believe that having an example in DPDK showing how to use PQoS 
> > > > with DPDK
> > > > is well worthwhile having. It's more effective than a simple doc update 
> > > > in
> > > > raising awareness of the existence of the feature, and also provides 
> > > > for DPDK
> > > > users a readily available app for the user to start playing with to 
> > > > evaluate
> > > > PQoS for their own use-cases.
> > >
> > > +1
> > > I also think it is a good thing to have.
> > > Again user don't have to trust the whitepapers - instead he can run the 
> > > app
> > > and measure performance gain on his particular platform.
> > 
> > I totally agree the example is good to have.
> > Konstantin, are you thinking it must be hosted in the PQoS lib repository?
> 
> Personally I prefer it to be part of dpdk samples.
> DPDK IO code path is a bit different from what the 'classical' user app 
> usually does -
> a lot of polling, avoid system calls, etc.
> Also it would probably have much better visibility here.
> Again, as Bruce already mentioned,  we have QAT & TAP samples, why we can't 
> have PQoS too.

Indeed the DPDK policies are really flexible.
How would you suggest to decide which examples can enter in DPDK?
Examples: what about a zip compression in the forwarding plane?
What about a VM2VM failover synchronization?

[dpdk-dev] [PATCH] fm10k: optimize legacy TX func

2016-02-24 Thread Bruce Richardson

On Thu, Feb 18, 2016 at 09:20:18AM +, He, Shaopeng wrote:
> Hi,
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Chen Jing D(Mark)
> > Sent: Thursday, January 28, 2016 5:46 PM
> > To: Qiu, Michael; Ananyev, Konstantin
> > Cc: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH] fm10k: optimize legacy TX func
> > 
> > From: "Chen Jing D(Mark)" 
> > 
> > When the legacy TX func tries to free a bunch of mbufs, it frees
> > them one by one. This change scans the free list and merges the
> > requests when they belong to the same pool, then frees them at once,
> > which reduces the cycles spent on freeing mbufs.
> > 
> > Signed-off-by: Chen Jing D(Mark) 
> Acked-by: Shaopeng He 

Applied to dpdk-next-net/rel_16_04

/Bruce
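
The merging idea in the commit message can be sketched as follows. This is a minimal illustration under assumed types: the demo_* names are hypothetical and are not the fm10k or mempool code, and the sketch only merges consecutive buffers that come from the same pool.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a pool and a buffer that remembers its pool. */
struct demo_pool {
	unsigned int nb_put_calls; /* how many bulk puts were issued */
	unsigned int nb_freed;     /* how many buffers came back in total */
};

struct demo_buf {
	struct demo_pool *pool;
};

/* One call returns n buffers to the pool (bookkeeping only in this sketch). */
static void
demo_pool_put_bulk(struct demo_pool *p, struct demo_buf **bufs, unsigned int n)
{
	(void)bufs;
	p->nb_put_calls++;
	p->nb_freed += n;
}

/*
 * Free bufs[0..n-1], issuing one bulk put per run of consecutive buffers
 * that belong to the same pool instead of one put per buffer.
 */
static void
demo_free_merged(struct demo_buf **bufs, unsigned int n)
{
	unsigned int start = 0, i;

	assert(bufs != NULL || n == 0);

	for (i = 1; i <= n; i++) {
		/* Flush the current run at the end or when the pool changes. */
		if (i == n || bufs[i]->pool != bufs[start]->pool) {
			demo_pool_put_bulk(bufs[start]->pool, &bufs[start],
					i - start);
			start = i;
		}
	}
}
```

When all buffers on a TX ring come from a single pool, which is the common case, the whole batch goes back with one call.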

[dpdk-dev] [PATCH v9 0/3] Add virtio support for arm/arm64

2016-02-24 Thread Thomas Monjalon

2016-02-22 13:41, Yuanhan Liu:
> On Sun, Feb 21, 2016 at 07:47:58PM +0530, Santosh Shukla wrote:
> > v9 patchset to support vfio infrastructure for ioport, required for archs
> > such as arm64/arm and x86.
> > 
> > 
> > For virtio inc_vector patch which is not part of v9..its under review, 
> > refer [2].
> > 
> > Follow on patch history summary, refer[1]
> > [1] http://comments.gmane.org/gmane.comp.networking.dpdk.devel/32821
> > [2] http://dpdk.org/dev/patchwork/patch/10429/
> > 
> > Santosh Shukla (3):
> >   eal/linux: never check iopl for arm
> >   eal/linux: vfio: ignore mapping for ioport region
> >   eal/linux: vfio: add pci ioport support
> > 
> >  lib/librte_eal/linuxapp/eal/eal.c  |2 +
> >  lib/librte_eal/linuxapp/eal/eal_pci_vfio.c |   56 
> > ++--
> >  2 files changed, 46 insertions(+), 12 deletions(-)
> 
> Series looks good to me:
> 
> Reviewed-by: Yuanhan Liu 

Applied, thanks

[dpdk-dev] [PATCH 00/12] extend flow director's fields in i40e driver

2016-02-24 Thread Bruce Richardson

On Tue, Jan 26, 2016 at 02:26:03PM +0800, Jingjing Wu wrote:
> This patch set extends flow director to support filtering by
> additional fields below in i40e driver:
>  - TOS, Protocol and TTL in IP header
>  - Tunnel id if NVGRE/GRE/VxLAN packets
>  - single vlan or inner vlan
> 
> Jingjing Wu (12):
>   ethdev: extend flow director to support input set selection
>   i40e: split function for input set change of hash and fdir
>   i40e: remove flex payload from INPUT_SET_SELECT operation
>   i40e: restore default setting on input set of filters
>   i40e: extend flow director to filter by more IP Header fields
>   testpmd: extend commands for filter's input set changing
>   librte_ether: extend rte_eth_fdir_flow to support tunnel format
>   i40e: extend flow director to filter by tunnel ID
>   testpmd: extend commands for fdir's tunnel id input set
>   i40e: fix VLAN bitmasks for hash/fdir input sets for tunnels
>   i40e: extend flow director to filter by vlan id
>   testpmd: extend commands for fdir's vlan input set
> 
>  app/test-pmd/cmdline.c  | 121 +++--
>  doc/guides/rel_notes/release_2_3.rst|   5 +
>  doc/guides/testpmd_app_ug/testpmd_funcs.rst |  56 ++--
>  drivers/net/i40e/i40e_ethdev.c  | 401 
> +---
>  drivers/net/i40e/i40e_ethdev.h  |  11 +-
>  drivers/net/i40e/i40e_fdir.c| 163 ---
>  lib/librte_ether/rte_eth_ctrl.h |  35 ++-
>  7 files changed, 529 insertions(+), 263 deletions(-)
> 
> -- 
> 2.4.0
> 

Any review or comments on this patchset?

/Bruce

[dpdk-dev] [PATCH v2 10/10] pci: place all uio pci device ids in a dedicated section

2016-02-24 Thread Bruce Richardson

On Wed, Jan 20, 2016 at 10:40:00AM -0500, Neil Horman wrote:
> On Tue, Jan 19, 2016 at 01:35:14PM -0800, Stephen Hemminger wrote:
> > On Tue, 19 Jan 2016 15:56:14 -0500
> > Neil Horman  wrote:
> > 
> > > On Tue, Jan 19, 2016 at 08:10:19AM -0800, Stephen Hemminger wrote:
> > > > On Tue, 19 Jan 2016 09:29:31 -0500
> > > > Neil Horman  wrote:
> > > > 
> > > > > On Tue, Jan 19, 2016 at 08:30:40AM +0100, Thomas Monjalon wrote:
> > > > > > 2016-01-18 13:30, David Marchand:
> > > > > > > > We could do something à la modinfo, but let's keep it simple for 
> > > > > > > now.
> > > > > > > 
> > > > > > > With this, you can extract the devices that need to be bound to 
> > > > > > > uio / vfio
> > > > > > > with tools like objdump :
> > > > > > > 
> > > > > > > $ objdump -j rte_pci_id_uio -s build/lib/librte_pmd_fm10k.so
> > > > > > > 
> > > > > > > Contents of section rte_pci_id_uio:
> > > > > > >  15760 8680a415  8680d015   
> > > > > > >  15770 8680a515     
> > > > > > 
> > > > > > Yes we need a modinfo-like tool.
> > > > > > Currently, the UIO/VFIO binding can be done after parsing the PCI 
> > > > > > device list.
> > > > > > It is better to define the device ids locally to their drivers but 
> > > > > > it must
> > > > > > be integrated with an appropriate parsing tool at the same time.
> > > > > > And more importantly than any tool, the format of these ELF data 
> > > > > > must be
> > > > > > properly defined, documented and extensible.
> > > > > > 
> > > > > > > Is there someone experienced with such format definition?
> > > > > > Stephen, you were asking for this change, what is your opinion?
> > > > > > I remember that Neil was also interested in this change:
> > > > > > http://dpdk.org/ml/archives/dev/2015-January/012115.html
> > > > > > Panu, Christian, this change could be related to distribution 
> > > > > > packaging.
> > > > > > Thanks for helping to move this change forward.
> > > > > 
> > > > > Yes, I would be interested in seeing this.  Is the ask here that 
> > > > > someone do it?
> > > > > As I recall from the last thread that you reference, I thought David 
> > > > > M was
> > > > > > interested in writing it and soliciting for ideas.  If that's no 
> > > > > longer the case,
> > > > > I can take a stab at writing it.
> > > > > 
> > > > > Neil
> > > > > 
> > > > 
> > > > > If these are libraries, is there a way to have a real entry point
> > > > > to dump PCI ids?
> > > > 
> > > Sure, you could write a method that could be dlsym-ed easily enough to 
> > > fetch an
> > > > array of pci ids, or just print stuff to the console.  Not sure that's the 
> > > > best way,
> > > > but definitely an option
> > > Neil
> > 
> > It is just that reading data with objdump is a kludge likely to get broken.
> > 
> Not suggesting that we rely on objdump in perpetuity, only that we export the
> data, rather than a method to access it so that it can be reached via libelf.
> Using a function to return the information has implicit issues at the moment
> (specifically if you dlopen a dpdk driver, its constructor will attempt to
> register it with the core libraries).  While that's not catastrophic, it means
> more stuff than you expect gets loaded, which might have weird side effects.
> Adding a separate section that you could reach via libelf would be nice I 
> think
> 
> Neil
> 
Hi,

while there is interesting discussion on tools, are there any objections to
taking and merging this patchset as-is to at least do the cleanup of the
existing pci ids list? I would assume that any tools for querying the patchlist
can be done as additional work once this is applied. 

Regards,
/Bruce
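
As a reading aid for the section dump quoted earlier in this thread, here is a minimal sketch of how such bytes could be decoded. It assumes each entry starts with a little-endian 16-bit vendor id followed by a little-endian 16-bit device id; that layout, and the names pci_id_pair, read_le16 and decode_pci_id, are illustrative assumptions and not part of the patchset. Under that assumption the first word of the dump, 8680a415, would read as vendor 0x8086, device 0x15a4.

```c
#include <stdint.h>

/* Hypothetical record: only the first two fields of an entry are
 * modelled here; anything that follows them is ignored. */
struct pci_id_pair {
	uint16_t vendor_id;
	uint16_t device_id;
};

/* Read a little-endian 16-bit value from a byte buffer. */
static uint16_t
read_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Decode one vendor/device pair from the first 4 bytes of buf. */
static struct pci_id_pair
decode_pci_id(const uint8_t *buf)
{
	struct pci_id_pair id;

	id.vendor_id = read_le16(buf);
	id.device_id = read_le16(buf + 2);
	return id;
}
```

A tool built on libelf would hand the raw section bytes to a decoder of this kind instead of parsing objdump's text output, which is the fragility raised above.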

[dpdk-dev] [PATCH] mk: fix the combined library problems by replacing it with a linker script

2016-02-24 Thread Panu Matilainen

On 02/23/2016 10:07 PM, Thomas Monjalon wrote:
> Hi,
>
> I'm reviving this old thread.

Thanks.

> My understanding is that everybody prefers the linker script
> to the current combined library, which had neither symbol versioning
> nor library dependency information.

Yeah it seemed to me most (if not everybody) had converged on the side 
of the linker script approach.

>
> Comments below:
>
> 2015-11-24 16:31, Panu Matilainen:
>> The physically linked-together combined library has been an increasing
>> source of problems, as was predicted when library and symbol versioning
>> was introduced. Replace the complex and fragile construction with a
>> simple linker script which achieves the same without all the problems,
>> remove the related kludges from eg mlx drivers.
>>
>> Since creating the linker script is practically zero cost, remove the
>> config option and just create it always.
> [...]
>> --- /dev/null
>> +++ b/mk/rte.combinedlib.mk
>> @@ -0,0 +1,57 @@
>> +#   BSD LICENSE
>> +#
>> +#   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
>> +#   All rights reserved.
>> +#
>> +#   Redistribution and use in source and binary forms, with or without
>> +#   modification, are permitted provided that the following conditions
>> +#   are met:
>> +#
>> +# * Redistributions of source code must retain the above copyright
>> +#   notice, this list of conditions and the following disclaimer.
>> +# * Redistributions in binary form must reproduce the above copyright
>> +#   notice, this list of conditions and the following disclaimer in
>> +#   the documentation and/or other materials provided with the
>> +#   distribution.
>> +# * Neither the name of Intel Corporation nor the names of its
>> +#   contributors may be used to endorse or promote products derived
>> +#   from this software without specific prior written permission.
>
> Why this header, Panu?
> I think you should write your own copyright, and assume the linker script ;)

It's just inherited from the original patch by Sergio. As he's the actual 
author here, it didn't seem appropriate for me to remove it.

>
> It needs to be rebased and some docs comments must be removed or updated.
> I'll send a v2.
>

Thanks,

- Panu -

[dpdk-dev] [PATCH] eal: Initial implementation of PQoS EAL extension

2016-02-24 Thread Thomas Monjalon

2016-02-24 10:22, Ananyev, Konstantin:
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce Richardson
> > Sent: Wednesday, February 24, 2016 10:10 AM
> > To: Thomas Monjalon
> > Cc: dev at dpdk.org; Kantecki, Tomasz
> > Subject: Re: [dpdk-dev] [PATCH] eal: Initial implementation of PQoS EAL 
> > extension
> > 
> > On Wed, Feb 24, 2016 at 09:24:33AM +0100, Thomas Monjalon wrote:
> > > 2016-02-23 23:03, Kantecki, Tomasz:
> > > > > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > > > If there is nothing specific in DPDK for PQos, why writing an example 
> > > > > in
> > > > > DPDK?
> > > > The example makes it much easier to use the technology with DPDK.
> > > >
> > > > > Maybe the example should be better in the library itself.
> > > > The library in question (https://github.com/01org/intel-cmt-cat) has a 
> > > > couple of examples but none of them refers to DPDK.
> > > >
> > > > > I suggest to mention the library in
> > > > > doc/guides/linux_gsg/nic_perf_intel_platform.rst
> > > > Ok it can be added to this document. Does it imply -1 for the sample 
> > > > code idea?
> > >
> > > I may be wrong but I have the feeling the example is more about PQoS than 
> > > DPDK.
> > > So yes, I would vote -1.
> > >
> > Well, the intersection of DPDK and PQoS is what the example is really all 
> > about,
> > and as such it is relevant to both DPDK and the library itself. Platform QoS
> > can be of great use to packet processing applications for helping to ensure 
> > that
> > the app gets the resources it needed - especially in a virtualised world - 
> > and
> > so we believe that having an example in DPDK showing how to use PQoS with 
> > DPDK
> > is well worthwhile having. It's more effective than a simple doc update in
> > raising awareness of the existence of the feature, and also provides for 
> > DPDK
> > users a readily available app for the user to start playing with to evaluate
> > PQoS for their own use-cases.
> 
> +1 
> I also think it is a good thing to have.
> Again, users don't have to trust the whitepapers - instead they can run the app 
> and measure the performance gain on their particular platform.

I totally agree the example is good to have.
Konstantin, are you thinking it must be hosted in the PQoS lib repository?

[dpdk-dev] [RFC PATCH] ivshmem ring aliases

2016-02-24 Thread David Verbeiren

The goal of this patch is to allow VMs to use standard ring names regardless of
the names given to the rings by the host environment. It applies to
configurations using ivshmem.

With shared memory rings, all VMs share a single namespace for the rings.
However, a VM will typically expect to find its rings with a pre-determined
name (e.g. p1_rx, p1_tx) regardless of how it's deployed, inserted in a service
chain, or of which other VMs are deployed alongside it. Hence, it is desirable
to introduce a level of indirection where the host can set a mapping between
the actual ring names (e.g. dpdkr0_rx|tx with OVS) and the names that will be
visible in the VM. This patch provides a simple implementation of such a
mapping scheme.

Since the mapping must be VM specific, the aliases are inserted into the
IVSHMEM metadata area by the host, and the guest side uses those aliases when
doing rte_ring_lookup().

A new function, rte_ivshmem_add_ring_alias(), is provided in librte_ivshmem to
populate alias entries in the host environment when creating the per-VM
metadata.

Signed-off-by: David Verbeiren 
---
 config/defconfig_x86_64-ivshmem-linuxapp-gcc |  1 +
 lib/librte_eal/common/include/rte_eal.h  | 12 
 lib/librte_eal/linuxapp/eal/eal_ivshmem.c| 38 +
 lib/librte_ivshmem/rte_ivshmem.c | 42 
 lib/librte_ivshmem/rte_ivshmem.h | 22 +++
 lib/librte_ring/rte_ring.c   |  6 
 6 files changed, 121 insertions(+)

diff --git a/config/defconfig_x86_64-ivshmem-linuxapp-gcc 
b/config/defconfig_x86_64-ivshmem-linuxapp-gcc
index 41ac5c3..2dc7674 100644
--- a/config/defconfig_x86_64-ivshmem-linuxapp-gcc
+++ b/config/defconfig_x86_64-ivshmem-linuxapp-gcc
@@ -44,6 +44,7 @@ CONFIG_RTE_LIBRTE_IVSHMEM_DEBUG=n
 CONFIG_RTE_LIBRTE_IVSHMEM_MAX_PCI_DEVS=4
 CONFIG_RTE_LIBRTE_IVSHMEM_MAX_ENTRIES=128
 CONFIG_RTE_LIBRTE_IVSHMEM_MAX_METADATA_FILES=32
+CONFIG_RTE_LIBRTE_IVSHMEM_MAX_RING_ALIAS=64

 # Set EAL to single file segments
 CONFIG_RTE_EAL_SINGLE_FILE_SEGMENTS=y
\ No newline at end of file
diff --git a/lib/librte_eal/common/include/rte_eal.h 
b/lib/librte_eal/common/include/rte_eal.h
index 0e99c31..02aea8e 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -234,6 +234,18 @@ static inline int rte_gettid(void)
return RTE_PER_LCORE(_thread_id);
 }

+/**
+ * Perform a lookup in the IVSHMEM ring aliases
+ *
+ * @param alias
+ *   Ring alias name to search for.
+ * @return
+ *   On success, returns the actual ring name corresponding to the
+ *   provided alias.
+ *   Returns NULL if the alias is not known.
+ */
+const char * rte_eal_ivshmem_alias_get(const char * alias);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/linuxapp/eal/eal_ivshmem.c 
b/lib/librte_eal/linuxapp/eal/eal_ivshmem.c
index 28ddf09..4989c5c 100644
--- a/lib/librte_eal/linuxapp/eal/eal_ivshmem.c
+++ b/lib/librte_eal/linuxapp/eal/eal_ivshmem.c
@@ -94,6 +94,7 @@ struct ivshmem_shared_config {
uint32_t segment_idx;
struct ivshmem_pci_device pci_devs[RTE_LIBRTE_IVSHMEM_MAX_PCI_DEVS];
uint32_t pci_devs_idx;
+   struct rte_ivshmem_ring_alias 
ring_alias[RTE_LIBRTE_IVSHMEM_MAX_RING_ALIAS];
 };
 static struct ivshmem_shared_config * ivshmem_config;
 static int memseg_idx;
@@ -369,6 +370,26 @@ read_metadata(char * path, int path_len, int fd, uint64_t 
flen)
}
ivshmem_config->segment_idx = idx;

+   int j = 0;  /* supports aliases from multiple IVSHMEM devices */ 
+   for (i = 0; i < RTE_LIBRTE_IVSHMEM_MAX_RING_ALIAS; i++) {
+   if (metadata.ring_alias[i].ring_alias[0] == '\0')
+   break;
+
+   RTE_LOG(DEBUG, EAL, "Alias[%d]: %s -> %s\n", i, 
metadata.ring_alias[i].ring_alias,
+   metadata.ring_alias[i].ring_name);
+   for (; j < RTE_LIBRTE_IVSHMEM_MAX_RING_ALIAS; j++) {
+   if (ivshmem_config->ring_alias[j].ring_alias[0] == '\0')
+   break;
+   }
+   if (j >= RTE_LIBRTE_IVSHMEM_MAX_RING_ALIAS) {
+   RTE_LOG(ERR, EAL, "Not enough space for alias (max 
%d)!\n", RTE_LIBRTE_IVSHMEM_MAX_RING_ALIAS);
+   return -1;
+   }
+
+   strncpy(ivshmem_config->ring_alias[j].ring_alias, 
metadata.ring_alias[i].ring_alias, RTE_RING_NAMESIZE);
+   strncpy(ivshmem_config->ring_alias[j].ring_name, 
metadata.ring_alias[i].ring_name, RTE_RING_NAMESIZE);
+   }
+
return 0;
 }

@@ -735,6 +756,23 @@ map_all_segments(void)
return 0;
 }

+const char * rte_eal_ivshmem_alias_get(const char* alias)
+{
+   if (ivshmem_config == NULL)
+   return NULL;
+
+   unsigned i;
+   for (i = 0; i < RTE_LIBRTE_IVSHMEM_MAX_RING_ALIAS; i++) {
+   if (strncmp(ivshmem_config->ring_alias[i].ring_alias, alias,

[dpdk-dev] [PATCH] eal: Initial implementation of PQoS EAL extension

2016-02-24 Thread Thomas Monjalon

2016-02-24 10:10, Bruce Richardson:
> On Wed, Feb 24, 2016 at 09:24:33AM +0100, Thomas Monjalon wrote:
> > 2016-02-23 23:03, Kantecki, Tomasz:
> > > > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > > If there is nothing specific in DPDK for PQos, why writing an example in
> > > > DPDK?
> > > The example makes it much easier to use the technology with DPDK.
> > > 
> > > > Maybe the example should be better in the library itself.
> > > The library in question (https://github.com/01org/intel-cmt-cat) has a 
> > > couple of examples but none of them refers to DPDK.
> > > 
> > > > I suggest to mention the library in
> > > > doc/guides/linux_gsg/nic_perf_intel_platform.rst
> > > Ok it can be added to this document. Does it imply -1 for the sample code 
> > > idea?
> > 
> > I may be wrong but I have the feeling the example is more about PQoS than 
> > DPDK.
> > So yes, I would vote -1.
> > 
> Well, the intersection of DPDK and PQoS is what the example is really all 
> about,
> and as such it is relevant to both DPDK and the library itself. Platform QoS
> can be of great use to packet processing applications for helping to ensure 
> that
> the app gets the resources it needed - especially in a virtualised world - and
> so we believe that having an example in DPDK showing how to use PQoS with DPDK
> is well worthwhile having. It's more effective than a simple doc update in
> raising awareness of the existence of the feature, and also provides for DPDK
> users a readily available app for the user to start playing with to evaluate
> PQoS for their own use-cases.
> I also fail to see what the downside of having the sample app is - it won't 
> add
> significantly to the project maintenance overhead.

We need to draw a line in the sand to decide what can go in DPDK examples,
because any code using some DPDK functions could ask to be integrated.
Until now, the examples were used to demonstrate some DPDK API.
Here you are advocating that examples are "marketing tools" to bring awareness
of a library.
I have no strong opinion, except that we cannot host and maintain 100 examples.
Other opinions are welcome.

[dpdk-dev] including rte.app.mk from a Makefile.am

2016-02-24 Thread Panu Matilainen

On 02/24/2016 04:24 AM, Stefan Puiu wrote:
> Hi,
>
> I'm working on a Linux project that uses the DPDK and (unfortunately,
> IMO) automake; so we have a Makefile.am where we include rte.extapp.mk
> and rte.vars.mk from the DPDK, add LDLIBS to the linker
>
> However, I've tried building against DPDK 2.2 and I'm getting linker
> errors about options like '--no-as-needed', '--whole-archive' etc not
> being recognized. Basically, we use libtool to link the binary, which
> behind the scenes ends up calling gcc to link the binary, and gcc
> doesn't know how to read linker options - they need to be prefixed
> with '-Wl,..'.  I've traced this to this part of rte.app.mk:
>
> === DPDK 1.7.1
> ifeq ($(LINK_USING_CC),1)
> LDLIBS := $(call linkerprefix,$(LDLIBS))
> LDFLAGS := $(call linkerprefix,$(LDFLAGS))
>
> === DPDK 2.2 (since DPDK 1.8, AFAICT)
> ifeq ($(LINK_USING_CC),1)
> O_TO_EXE = $(CC) $(CFLAGS) $(LDFLAGS_$(@)) \
>  -Wl,-Map=$(@).map,--cref -o $@ $(OBJS-y) $(call
> linkerprefix,$(LDFLAGS)) \
>  $(EXTRA_LDFLAGS) $(call linkerprefix,$(LDLIBS))
>
> Notice on 1.7.1 LDFLAGS gets the -Wl, prefix if linking with gcc; for
> 2.2, that doesn't happen anymore - note O_TO_EXE calls linkerprefix
> explicitly for LDLIBS and LDFLAGS.
>
> The change that removed the LDLIBS/LDFLAGS setting is 3c6a14f6, which
> ironically says "mk: fix link with CC" in the title.
>
> I've tried working around this, but apparently automake doesn't give
> you too much control of what you can do; overriding LDFLAGS with
> $(call linkerprefix,$(LDFLAGS)) in Makefile.am doesn't work. Since
> LDFLAGS is treated as a user variable by automake, it's tricky to
> override it.
>
> Now my question is: is this supposed to work? Is there any point in
> trying to use the mk files from my outside project?

I would say no, especially when the rest of your buildsystem is around 
automake (or cmake or...). Pktgen relies on the dpdk make infrastructure 
but even that gets into all sorts of trouble with it.

> I noticed dpdk-ovs
> doesn't seem to bother with that, and just builds one library to link
> against. I guess it's useful to pick up the defines that the DPDK was
> built against, so inline functions in headers are properly picked up.
> Are there people using the DPDK from projects using automake?
>
> IMO, It would be nice if you could extract the CPPFLAGS/LDFLAGS etc
> from the DPDK without including the mk files - maybe by running
> something like 'make showvars' or something like that in the DPDK dir.
> Then external projects could integrate those in their build system
> without too much extra baggage.

It would be nice yes, but instead of some custom make-thing, I'd prefer 
a pkg-config file for the purpose. Adding a pkg-config file has been 
suggested in the past a few times but nobody has stepped up to do it.

- Panu -

> Thanks,
> Stefan.
>

[dpdk-dev] [PATCH] eal: Initial implementation of PQoS EAL extension

2016-02-24 Thread Ananyev, Konstantin



> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Wednesday, February 24, 2016 10:35 AM
> To: Ananyev, Konstantin
> Cc: Richardson, Bruce; dev at dpdk.org; Kantecki, Tomasz
> Subject: Re: [dpdk-dev] [PATCH] eal: Initial implementation of PQoS EAL 
> extension
> 
> 2016-02-24 10:22, Ananyev, Konstantin:
> >
> > > -Original Message-
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce Richardson
> > > Sent: Wednesday, February 24, 2016 10:10 AM
> > > To: Thomas Monjalon
> > > Cc: dev at dpdk.org; Kantecki, Tomasz
> > > Subject: Re: [dpdk-dev] [PATCH] eal: Initial implementation of PQoS EAL 
> > > extension
> > >
> > > On Wed, Feb 24, 2016 at 09:24:33AM +0100, Thomas Monjalon wrote:
> > > > 2016-02-23 23:03, Kantecki, Tomasz:
> > > > > > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > > > > If there is nothing specific in DPDK for PQos, why writing an 
> > > > > > example in
> > > > > > DPDK?
> > > > > The example makes it much easier to use the technology with DPDK.
> > > > >
> > > > > > Maybe the example should be better in the library itself.
> > > > > The library in question (https://github.com/01org/intel-cmt-cat) has 
> > > > > a couple of examples but none of them refers to DPDK.
> > > > >
> > > > > > I suggest to mention the library in
> > > > > > doc/guides/linux_gsg/nic_perf_intel_platform.rst
> > > > > Ok it can be added to this document. Does it imply -1 for the sample 
> > > > > code idea?
> > > >
> > > > I may be wrong but I have the feeling the example is more about PQoS 
> > > > than DPDK.
> > > > So yes, I would vote -1.
> > > >
> > > Well, the intersection of DPDK and PQoS is what the example is really all 
> > > about,
> > > and as such it is relevant to both DPDK and the library itself. Platform 
> > > QoS
> > > can be of great use to packet processing applications for helping to 
> > > ensure that
> > > the app gets the resources it needed - especially in a virtualised world 
> > > - and
> > > so we believe that having an example in DPDK showing how to use PQoS with 
> > > DPDK
> > > is well worthwhile having. It's more effective than a simple doc update in
> > > raising awareness of the existence of the feature, and also provides for 
> > > DPDK
> > > users a readily available app for the user to start playing with to 
> > > evaluate
> > > PQoS for their own use-cases.
> >
> > +1
> > I also think it is a good thing to have.
> > Again, users don't have to trust the whitepapers - instead they can run the app
> > and measure the performance gain on their particular platform.
> 
> I totally agree the example is good to have.
> Konstantin, are you thinking it must be hosted in the PQoS lib repository?

Personally I prefer it to be part of the DPDK samples.
The DPDK IO code path is a bit different from what a 'classical' user app
usually does - a lot of polling, avoiding system calls, etc.
Also it would probably have much better visibility here.
Again, as Bruce already mentioned, we have QAT & TAP samples, so why can't we
have PQoS too?
Konstantin

[dpdk-dev] [PATCH] eal: Initial implementation of PQoS EAL extension

2016-02-24 Thread Bruce Richardson

On Wed, Feb 24, 2016 at 11:31:47AM +0100, Thomas Monjalon wrote:
> 2016-02-24 10:10, Bruce Richardson:
> > On Wed, Feb 24, 2016 at 09:24:33AM +0100, Thomas Monjalon wrote:
> > > 2016-02-23 23:03, Kantecki, Tomasz:
> > > > > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > > > If there is nothing specific in DPDK for PQos, why writing an example 
> > > > > in
> > > > > DPDK?
> > > > The example makes it much easier to use the technology with DPDK.
> > > > 
> > > > > Maybe the example should be better in the library itself.
> > > > The library in question (https://github.com/01org/intel-cmt-cat) has a 
> > > > couple of examples but none of them refers to DPDK.
> > > > 
> > > > > I suggest to mention the library in
> > > > > doc/guides/linux_gsg/nic_perf_intel_platform.rst
> > > > Ok it can be added to this document. Does it imply -1 for the sample 
> > > > code idea?
> > > 
> > > I may be wrong but I have the feeling the example is more about PQoS than 
> > > DPDK.
> > > So yes, I would vote -1.
> > > 
> > Well, the intersection of DPDK and PQoS is what the example is really all 
> > about,
> > and as such it is relevant to both DPDK and the library itself. Platform QoS
> > can be of great use to packet processing applications for helping to ensure 
> > that
> > the app gets the resources it needed - especially in a virtualised world - 
> > and
> > so we believe that having an example in DPDK showing how to use PQoS with 
> > DPDK
> > is well worthwhile having. It's more effective than a simple doc update in
> > raising awareness of the existence of the feature, and also provides for 
> > DPDK
> > users a readily available app for the user to start playing with to evaluate
> > PQoS for their own use-cases.
> > I also fail to see what the downside of having the sample app is - it won't 
> > add
> > significantly to the project maintenance overhead.
> 
> We need to draw a line in the sand to decide what can go in DPDK examples 
> because
> any code using some DPDK functions could require to be integrated.
> Until now, the examples were used to demonstrate some DPDK API.
> Here you are advocating that examples are "marketing tools" to bring 
> awareness on
> a library.
> I have no strong opinion, except that we cannot host and maintain 100 
> examples.
> Other opinions are welcome.

I think it comes down to whether it contributes something that would be useful
and appreciated by the users. 

[While the majority of sample apps have been for showing off DPDK APIs, that's
not exclusively the case. There is the DPDK-QAT example, and also the
exception_path example showing how to use DPDK with TUN/TAP. I believe both
these add value for DPDK users and these examples, and others like them,
are worth having.]

So, in summary, I honestly think that having an example that shows platform 
QoS functionality being used with DPDK is worth having as it is something that
would be useful to DPDK users.

/Bruce

[dpdk-dev] [PATCH v2] examples/l3fwd: exact-match rework

2016-02-24 Thread Tomasz Kulasek

The current implementation of Exact-Match uses a different execution path than
LPM. Unifying them allows reusing a big part of the LPM code and slightly
increases the performance of Exact-Match.

Main changes:
-------------
* The packet classification stage is separated from the rest of the path for
  both LPM and EM.
* The packet processing, modifying and transmit part is the same for LPM and EM
  and is mostly based on the current LPM implementation.
* Shared code is moved to the common file "l3fwd_sse.h".
* Since sequential packet classification in the EM path seems to be faster
  than the multi hash lookup used before, it is used by default. The old
  implementation is moved to the file l3fwd_em_hlm_sse.h and can be enabled
  with the HASH_MULTI_LOOKUP global define at compilation time.


This patch depends on Ravi Kerur's "Modify and modularize l3fwd code" and
should be applied after it.


Changes in v2:
 - patch rebase to be applicable on top of "Modify and modularize l3fwd
   code" v3

Signed-off-by: Tomasz Kulasek 
Acked-by: Konstantin Ananyev 
---
 examples/l3fwd/l3fwd.h|8 +
 examples/l3fwd/l3fwd_em.c |   12 +-
 examples/l3fwd/l3fwd_em_hlm_sse.h |  341 +
 examples/l3fwd/l3fwd_em_sse.h |  447 +++-
 examples/l3fwd/l3fwd_lpm.c|   15 +-
 examples/l3fwd/l3fwd_lpm_sse.h|  507 -
 examples/l3fwd/l3fwd_sse.h|  501 
 7 files changed, 943 insertions(+), 888 deletions(-)
 create mode 100644 examples/l3fwd/l3fwd_em_hlm_sse.h
 create mode 100644 examples/l3fwd/l3fwd_sse.h

diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h
index f450269..da6d369 100644
--- a/examples/l3fwd/l3fwd.h
+++ b/examples/l3fwd/l3fwd.h
@@ -53,6 +53,14 @@
 /* Configure how many packets ahead to prefetch, when reading packets */
 #define PREFETCH_OFFSET  3

+/* Used to mark destination port as 'invalid'. */
+#define BAD_PORT ((uint16_t)-1)
+
+#define FWDSTEP 4
+
+/* replace first 12B of the ethernet header. */
+#define MASK_ETH 0x3f
+
 /* Hash parameters. */
 #ifdef RTE_ARCH_X86_64
 /* default to 4 million hash entries (approx) */
diff --git a/examples/l3fwd/l3fwd_em.c b/examples/l3fwd/l3fwd_em.c
index d87c3f2..f09fc96 100644
--- a/examples/l3fwd/l3fwd_em.c
+++ b/examples/l3fwd/l3fwd_em.c
@@ -373,8 +373,12 @@ l3fwd_em_simple_forward(struct rte_mbuf *m, uint8_t portid,
  * buffer optimization i.e. ENABLE_MULTI_BUFFER_OPTIMIZE=1.
  */
 #if defined(__SSE4_1__)
+#ifndef HASH_MULTI_LOOKUP
 #include "l3fwd_em_sse.h"
 #else
+#include "l3fwd_em_hlm_sse.h"
+#endif
+#else
 #include "l3fwd_em.h"
 #endif

@@ -570,6 +574,7 @@ populate_ipv6_many_flow_into_table(const struct rte_hash *h,
printf("Hash: Adding 0x%x keys\n", nr_flow);
 }

+/* main processing loop */
 int
 em_main_loop(__attribute__((unused)) void *dummy)
 {
@@ -613,11 +618,8 @@ em_main_loop(__attribute__((unused)) void *dummy)
diff_tsc = cur_tsc - prev_tsc;
if (unlikely(diff_tsc > drain_tsc)) {

-   /*
-* This could be optimized (use queueid instead of
-* portid), but it is not called so often
-*/
-   for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++) {
+   for (i = 0; i < qconf->n_rx_queue; i++) {
+   portid = qconf->rx_queue_list[i].port_id;
if (qconf->tx_mbufs[portid].len == 0)
continue;
send_burst(qconf,
diff --git a/examples/l3fwd/l3fwd_em_hlm_sse.h 
b/examples/l3fwd/l3fwd_em_hlm_sse.h
new file mode 100644
index 000..d3388da
--- /dev/null
+++ b/examples/l3fwd/l3fwd_em_hlm_sse.h
@@ -0,0 +1,341 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR

[dpdk-dev] [PATCH v1 1/3] drivers/net/i40e: Add ethdev functions

2016-02-24 Thread Ananyev, Konstantin



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Remy Horton
> Sent: Wednesday, February 24, 2016 10:32 AM
> To: Zhang, Helin; Xie, Huawei
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v1 1/3] drivers/net/i40e: Add ethdev functions
> 
> Comments inline.
> 
> ..Remy
> 
> On 23/02/2016 02:06, Zhang, Helin wrote:
>  >
>  >> +static inline int
>  >> +i40e_read_regs(struct i40e_hw *hw, const struct reg_info *reg,
>  >> +uint32_t *reg_buf)
>  >> +{
>  >> + unsigned int i;
>  >> +
>  >> + for (i = 0; i < reg->count; i++)
>  >> + reg_buf[i] = I40E_READ_REG(hw,
>  >> + reg->base_addr + i * reg->stride);
>  >> + return reg->count;
>  >> +}
>  >  From FVL5, some registers should be read by AQ commands, otherwise
> it may fail to
>  > read without any warning.
>  > Please see my patches of which registers should be read by AQ commands.
>  > Please check i40e_osdep.h from below link. Thanks!
>  > http://www.dpdk.org/dev/patchwork/patch/10654/
> 
> Ok - will change for v2.
> 
> I noticed that other patches in the same patchset expose extra registers
> - are these new or were they simply not exposed previously?
> 
> 
>  >> + /* Only support doing full dump */
>  >> + if (regs->offset != 0 && 0)
>  > '&& 0' means it will never be true, right?
>  > Anything wrong here?
> 
> Oops - some dead code that slipped through.. :)
> 
> 
>  >> + return -ENOTSUP;
>  > A message before this return to tell the users what happened would be
> better.
> 
> Will add these into v2.
> 
> 
>  >> +static int i40e_get_eeprom_length(__rte_unused struct rte_eth_dev *dev)
>  > Why is __rte_unused needed?
> 
> Good point - surprised the compiler did not complain about them, as they
> are not supposed to be there..
> 
> 
>  >> +static void i40e_set_default_mac_addr(struct rte_eth_dev *dev,
>  >> +   struct ether_addr *mac_addr)
>  >> +{
>  >> + struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data-
>  >>> dev_private);
>  >> +
>  >> + /* Flags: 0x3 updates port address */
>  >> + i40e_aq_mac_address_write(hw, 0x3, mac_addr->addr_bytes,
>  >> NULL); }
>  > Checks are needed before writing the MAC address.
> 
> Will look into this.
> 
> 
>  >> +struct reg_info {
>  >> + uint32_t base_addr;
>  >> + uint32_t count;
>  >> + uint32_t stride;
>  >> + const char *name;
>  >> +} reg_info;
>  > I think array definition shouldn't be added into a header file,
> otherwise any .c source
>  > file which includes that header file will define that.
> 
> Since it is quite a large table I think this approach, which is also
> used in ixgbe, is the lesser of evils. i40e_ethdev.c itself is already
> pretty big, and would prefer to avoid giving a driver-specific table
> non-static visibility until it actually has to be used from other
> compilation units.

Why not have a separate .h file specifically for the register table definition?
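
To make the base/count/stride scheme under review concrete, here is a minimal standalone sketch of the same loop with the hardware access replaced by a callback. The names reg_range, reg_read_fn, read_reg_range and fake_read are hypothetical and chosen only for this illustration; the descriptor type and reader could live in a dedicated header as suggested above, with the table itself defined in a single .c file.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor, shaped like the reg_info struct quoted above. */
struct reg_range {
	uint32_t base_addr; /* offset of the first register */
	uint32_t count;     /* number of registers in the range */
	uint32_t stride;    /* distance between consecutive registers */
};

/* Callback standing in for a hardware register read (an assumption). */
typedef uint32_t (*reg_read_fn)(void *ctx, uint32_t addr);

/* Read every register of one range into buf; returns the number read.
 * buf must have room for r->count entries. */
static uint32_t
read_reg_range(void *ctx, reg_read_fn rd, const struct reg_range *r,
	       uint32_t *buf)
{
	uint32_t i;

	for (i = 0; i < r->count; i++)
		buf[i] = rd(ctx, r->base_addr + i * r->stride);
	return r->count;
}

/* Fake device for demonstration: each register holds its own address. */
static uint32_t
fake_read(void *ctx, uint32_t addr)
{
	(void)ctx;
	return addr;
}
```

Routing reads through a callback is one way to accommodate the earlier review comment that some registers need a different access method, since a range could then be given a different reader without changing the loop.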

[dpdk-dev] including rte.app.mk from a Makefile.am

2016-02-24 Thread Stefan Puiu

On Wed, Feb 24, 2016 at 3:26 AM, Thomas Monjalon
 wrote:

> Yes it is prefixed when used instead of assignment.
> In 2.2, it is better fixed:
>
> ifeq ($(LINK_USING_CC),1)
> override EXTRA_LDFLAGS := $(call linkerprefix,$(EXTRA_LDFLAGS))
> O_TO_EXE = $(CC) $(CFLAGS) $(LDFLAGS_$(@)) \
> -Wl,-Map=$(@).map,--cref -o $@ $(OBJS-y) $(call linkerprefix,$(LDFLAGS)) \
> $(EXTRA_LDFLAGS) $(call linkerprefix,$(LDLIBS))
>
> So everything is properly prefixed.
>
> Could you please better describe your issue?
> Is $(LINK_USING_CC) true in your case?

Yes, everything would be perfect if the O_TO_EXE part was used. As far
as I can tell, that's not how automake works.

You just set bin_programs, _SOURCES, _CFLAGS and then it
generates the Makefile code to build your stuff, which is different
from the above thing; it also uses LDFLAGS, which it treats as a user
variable. In 1.7.1, LDFLAGS is prefixed properly. In 2.2, it's not
prefixed, so the build system ends up calling gcc with linker options.
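The prefixing problem described above can be shown in isolation. The following is a minimal sketch in C, not DPDK code: `prefix_linker_flag` is a hypothetical helper, and it assumes the usual GCC convention that `-Wl,<opt>` hands `<opt>` to the linker, while `-L` and `-l` are understood by the compiler driver itself and need no prefix.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of DPDK: rewrite one linker option so that
 * it can be passed to the gcc driver. Options gcc understands itself
 * (-L<dir>, -l<lib>) and options that are already prefixed are copied
 * unchanged; everything else gets the "-Wl," prefix.
 * Returns the number of characters snprintf wanted to write.
 */
static int
prefix_linker_flag(const char *flag, char *out, size_t out_len)
{
	assert(flag != NULL && out != NULL);

	if (strncmp(flag, "-Wl,", 4) == 0 ||
	    strncmp(flag, "-L", 2) == 0 ||
	    strncmp(flag, "-l", 2) == 0)
		return snprintf(out, out_len, "%s", flag);

	return snprintf(out, out_len, "-Wl,%s", flag);
}
```

With this kind of rewrite, `--whole-archive` becomes `-Wl,--whole-archive`, which is the form gcc accepts when it is used as the link driver.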

I'm not implying this is something that should be fixed in the DPDK; I
was just trying to see how others handle this.

>
> Yes, mk/rte.app.mk is primarily used by internal apps.
> If an external app doesn't want to use the DPDK makefiles, it should be
> possible to use pkgconfig on DPDK. I haven't had time yet to write a patch for
> pkgconfig support, so any contribution is welcome.

I'm not very familiar with pkgconfig, but if it can supply some *FLAGS
to build against, then it sounds good. I've worked on a project where
we built with scons, so including mk files wasn't much of an option.

Thanks,
Stefan.

[dpdk-dev] [PATCH v1 1/3] drivers/net/i40e: Add ethdev functions

2016-02-24 Thread Remy Horton

Comments inline.

..Remy

On 23/02/2016 02:06, Zhang, Helin wrote:
 >
 >> +static inline int
 >> +i40e_read_regs(struct i40e_hw *hw, const struct reg_info *reg,
 >> +  uint32_t *reg_buf)
 >> +{
 >> +   unsigned int i;
 >> +
 >> +   for (i = 0; i < reg->count; i++)
 >> +   reg_buf[i] = I40E_READ_REG(hw,
 >> +   reg->base_addr + i * reg->stride);
 >> +   return reg->count;
 >> +}
 >  From FVL5, some registers should be read by AQ commands, otherwise it may
 > fail to read without any warning.
 > Please see my patches of which registers should be read by AQ commands.
 > Please check i40e_osdep.h from below link. Thanks!
 > http://www.dpdk.org/dev/patchwork/patch/10654/

Ok - will change for v2.

I noticed that other patches in the same patchset expose extra registers 
- are these new or were they simply not exposed previously?
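For readers following the register-dump discussion, here is a minimal sketch of the base/count/stride read loop from the quoted patch, run against a mock register array. `mock_bar` and `mock_read_reg` are stand-ins invented for this example (a real driver would use `I40E_READ_REG` or, per the review comment, AQ commands for some registers).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same fields as the reg_info struct quoted in the patch. */
struct reg_info {
	uint32_t base_addr;
	uint32_t count;
	uint32_t stride;
	const char *name;
};

/* Stand-in for the device register space; a real driver would use MMIO. */
static uint32_t mock_bar[64];

/* Hypothetical replacement for I40E_READ_REG(): read one 32-bit register. */
static uint32_t
mock_read_reg(uint32_t byte_offset)
{
	assert(byte_offset % 4 == 0 && byte_offset / 4 < 64);
	return mock_bar[byte_offset / 4];
}

/* Copy reg->count registers, spaced reg->stride bytes apart, into reg_buf. */
static int
read_regs(const struct reg_info *reg, uint32_t *reg_buf)
{
	uint32_t i;

	for (i = 0; i < reg->count; i++)
		reg_buf[i] = mock_read_reg(reg->base_addr + i * reg->stride);
	return (int)reg->count;
}
```

A table of such entries can then be walked entry by entry to produce a full dump.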


 >> +   /* Only support doing full dump */
 >> +   if (regs->offset != 0 && 0)
 > '&& 0' means it will never be true, right?
 > Anything wrong here?

Oops - some dead code that slipped through.. :)
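To make the dead-code point concrete: any condition ANDed with a constant 0 is always false, so the `-ENOTSUP` branch can never run. The sketch below contrasts the condition as written with what was presumably intended; both function names are invented for this illustration and the intended behaviour is an assumption based on the "Only support doing full dump" comment.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Condition as written in the v1 patch: "&& 0" makes it always false. */
static int
check_offset_v1(uint32_t offset)
{
	if (offset != 0 && 0)
		return -ENOTSUP;
	return 0;
}

/* Presumed intent: reject any request that is not a full dump. */
static int
check_offset_fixed(uint32_t offset)
{
	if (offset != 0)
		return -ENOTSUP;
	return 0;
}

/* Quick self-check that contrasts the two versions. */
static void
compare_offset_checks(void)
{
	assert(check_offset_v1(0) == 0);
	assert(check_offset_v1(4) == 0); /* dead branch: still accepted */
	assert(check_offset_fixed(0) == 0);
	assert(check_offset_fixed(4) == -ENOTSUP);
}
```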


 >> +   return -ENOTSUP;
 > A message before this return to tell the users what happened would be
 > better.

Will add these into v2.


 >> +static int i40e_get_eeprom_length(__rte_unused struct rte_eth_dev *dev)
 > Why is __rte_unused needed?

Good point - surprised the compiler did not complain about them, as they 
are not supposed to be there..


 >> +static void i40e_set_default_mac_addr(struct rte_eth_dev *dev,
 >> + struct ether_addr *mac_addr)
 >> +{
 >> +   struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data-
 >>> dev_private);
 >> +
 >> +   /* Flags: 0x3 updates port address */
 >> +   i40e_aq_mac_address_write(hw, 0x3, mac_addr->addr_bytes,
 >> NULL); }
 > Checks are needed before writing the MAC address.

Will look into this.
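One plausible form of the check requested above is to refuse all-zero and group (multicast or broadcast) addresses before writing. This is only a sketch of that idea in plain C, not the i40e implementation; `mac_is_valid_unicast` is a name made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAC_LEN 6

/*
 * Hypothetical pre-write check: accept only a non-zero unicast address.
 * Bit 0 of the first byte is the group (multicast/broadcast) bit.
 */
static bool
mac_is_valid_unicast(const uint8_t mac[MAC_LEN])
{
	uint8_t any = 0;
	size_t i;

	assert(mac != NULL);
	for (i = 0; i < MAC_LEN; i++)
		any |= mac[i];

	return any != 0 && (mac[0] & 0x01) == 0;
}
```

A setter could call this first and skip the admin-queue write when it returns false.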


 >> +struct reg_info {
 >> +   uint32_t base_addr;
 >> +   uint32_t count;
 >> +   uint32_t stride;
 >> +   const char *name;
 >> +} reg_info;
 > I think an array definition shouldn't be added to a header file,
 > otherwise any .c source file which includes that header will define it.

Since it is quite a large table I think this approach, which is also 
used in ixgbe, is the lesser of evils. i40e_ethdev.c itself is already 
pretty big, and would prefer to avoid giving a driver-specific table 
non-static visibility until it actually has to be used from other 
compilation units.

[dpdk-dev] including rte.app.mk from a Makefile.am

2016-02-24 Thread Thomas Monjalon

Hi,

2016-02-23 20:24, Stefan Puiu:
> Hi,
> 
> I'm working on a Linux project that uses the DPDK and (unfortunately,
> IMO) automake; so we have a Makefile.am where we include rte.extapp.mk
> and rte.vars.mk from the DPDK, add LDLIBS to the linker
> 
> However, I've tried building against DPDK 2.2 and I'm getting linker
> errors about options like '--no-as-needed', '--whole-archive' etc not
> being recognized. Basically, we use libtool to link the binary, which
> behind the scenes ends up calling gcc to link the binary, and gcc
> doesn't know how to read linker options - they need to be prefixed
> with '-Wl,..'.  I've traced this to this part of rte.app.mk:
> 
> === DPDK 1.7.1
> ifeq ($(LINK_USING_CC),1)
> LDLIBS := $(call linkerprefix,$(LDLIBS))
> LDFLAGS := $(call linkerprefix,$(LDFLAGS))
> 
> === DPDK 2.2 (since DPDK 1.8, AFAICT)
> ifeq ($(LINK_USING_CC),1)
> O_TO_EXE = $(CC) $(CFLAGS) $(LDFLAGS_$(@)) \
> -Wl,-Map=$(@).map,--cref -o $@ $(OBJS-y) $(call
> linkerprefix,$(LDFLAGS)) \
> $(EXTRA_LDFLAGS) $(call linkerprefix,$(LDLIBS))
> 
> Notice on 1.7.1 LDFLAGS gets the -Wl, prefix if linking with gcc; for
> 2.2, that doesn't happen anymore - note O_TO_EXE calls linkerprefix
> explicitly for LDLIBS and LDFLAGS.
> 
> The change that removed the LDLIBS/LDFLAGS setting is 3c6a14f6, which
> ironically says "mk: fix link with CC" in the title.

Yes, it is now prefixed at the point of use instead of at assignment.
In 2.2, the fix is more complete:

ifeq ($(LINK_USING_CC),1)
override EXTRA_LDFLAGS := $(call linkerprefix,$(EXTRA_LDFLAGS))
O_TO_EXE = $(CC) $(CFLAGS) $(LDFLAGS_$(@)) \
-Wl,-Map=$(@).map,--cref -o $@ $(OBJS-y) $(call linkerprefix,$(LDFLAGS)) \
$(EXTRA_LDFLAGS) $(call linkerprefix,$(LDLIBS))

So everything is properly prefixed.

Could you please better describe your issue?
Is $(LINK_USING_CC) true in your case?

> I've tried working around this, but apparently automake doesn't give
> you too much control of what you can do; overriding LDFLAGS with
> $(call linkerprefix,$(LDFLAGS)) in Makefile.am doesn't work. Since
> LDFLAGS is treated as a user variable by automake, it's tricky to
> override it.
> 
> Now my question is: is this supposed to work?

Yes, and I still don't understand what your issue is. Are you really using 2.2?

> Is there any point in
> trying to use the mk files from my outside project? I noticed dpdk-ovs
> doesn't seem to bother with that, and just builds one library to link
> against. I guess it's useful to pick up the defines that the DPDK was
> built against, so inline functions in headers are properly picked up.
> Are there people using the DPDK from projects using automake?
> 
> IMO, It would be nice if you could extract the CPPFLAGS/LDFLAGS etc
> from the DPDK without including the mk files - maybe by running
> something like 'make showvars' or something like that in the DPDK dir.
> Then external projects could integrate those in their build system
> without too much extra baggage.

Yes, mk/rte.app.mk is primarily used by internal apps.
If an external app doesn't want to use the DPDK makefiles, it should be
possible to use pkgconfig on DPDK. I haven't had time yet to write a patch for
pkgconfig support, so any contribution is welcome.

[dpdk-dev] [PATCH v3 6/6] docs: add release note for qtest virtio container support

2016-02-24 Thread Tetsuya Mukawa

On 2016/02/23 19:28, Mcnamara, John wrote:
>> -Original Message-
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Mcnamara, John
>> Sent: Monday, February 22, 2016 3:41 PM
>> To: Tetsuya Mukawa ; dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v3 6/6] docs: add release note for qtest
>> virtio container support
>>
> Also, could you move the v2 patchset to "Superseded".
>

Thanks for your reviewing.
I will change v2 patches status to "Superseded".

Thanks,
Tetsuya

[dpdk-dev] [PATCH] eal: Initial implementation of PQoS EAL extension

2016-02-24 Thread Bruce Richardson

On Wed, Feb 24, 2016 at 09:24:33AM +0100, Thomas Monjalon wrote:
> 2016-02-23 23:03, Kantecki, Tomasz:
> > > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > If there is nothing specific in DPDK for PQos, why writing an example in
> > > DPDK?
> > The example makes it much easier to use the technology with DPDK.
> > 
> > > Maybe the example should be better in the library itself.
> > The library in question (https://github.com/01org/intel-cmt-cat) has a 
> > couple of examples but none of them refers to DPDK.
> > 
> > > I suggest to mention the library in
> > > doc/guides/linux_gsg/nic_perf_intel_platform.rst
> > Ok it can be added to this document. Does it imply -1 for the sample code 
> > idea?
> 
> I may be wrong but I have the feeling the example is more about PQoS than 
> DPDK.
> So yes, I would vote -1.
> 
Well, the intersection of DPDK and PQoS is what the example is really all about,
and as such it is relevant to both DPDK and the library itself. Platform QoS
can be of great use to packet processing applications in helping to ensure that
the app gets the resources it needs - especially in a virtualised world - and
so we believe that an example in DPDK showing how to use PQoS with DPDK
is well worth having. It's more effective than a simple doc update in
raising awareness of the feature, and it also gives DPDK users a readily
available app to start playing with to evaluate PQoS for their own use-cases.
I also fail to see what the downside of having the sample app is - it won't add
significantly to the project maintenance overhead.

Regards,
/Bruce

[dpdk-dev] [PATCH 2/6] qede: add documentation

2016-02-24 Thread Mcnamara, John



> -Original Message-
> From: Harish Patil [mailto:harish.patil at qlogic.com]
> Sent: Wednesday, February 24, 2016 7:18 AM
> To: Mcnamara, John ; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 2/6] qede: add documentation
> 
> >>
> >
> >Again, try wrap the code/console section at 80 chars in some way that
> >still maintains the meaning. Maybe something like the following (with a
> >note to say that the text has been wrapped for clarity):
> >
> > ...
> 
> These are the actual output from the console that was pasted, that's why
> they are more than 80 chars.
> Sure, will take care of that as you mentioned.

Hi,

I understand that the text is actual output but if you generate the pdf
documentation you will see that the fixed width text goes off the page.
So some sort of compromise is required and wrapping the text is what
is usually done in the DPDK docs.

Since it may be confusing to the end user if they see different output
in the docs and on the console it is probably worth adding a line in the
preceding paragraph to say something like "(text wrapped for clarity)".

John.
--

[dpdk-dev] [PATCH] eal: Initial implementation of PQoS EAL extension

2016-02-24 Thread Thomas Monjalon

2016-02-23 23:03, Kantecki, Tomasz:
> > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > If there is nothing specific in DPDK for PQos, why writing an example in
> > DPDK?
> The example makes it much easier to use the technology with DPDK.
> 
> > Maybe the example should be better in the library itself.
> The library in question (https://github.com/01org/intel-cmt-cat) has a couple 
> of examples but none of them refers to DPDK.
> 
> > I suggest to mention the library in
> > doc/guides/linux_gsg/nic_perf_intel_platform.rst
> Ok it can be added to this document. Does it imply -1 for the sample code 
> idea?

I may be wrong but I have the feeling the example is more about PQoS than DPDK.
So yes, I would vote -1.

[dpdk-dev] [PATCH v2 10/10] pci: place all uio pci device ids in a dedicated section

2016-02-24 Thread Neil Horman

On Wed, Feb 24, 2016 at 12:50:40PM +0100, Thomas Monjalon wrote:
> 2016-02-24 11:37, Bruce Richardson:
> > On Wed, Jan 20, 2016 at 10:40:00AM -0500, Neil Horman wrote:
> > > On Tue, Jan 19, 2016 at 01:35:14PM -0800, Stephen Hemminger wrote:
> > > > On Tue, 19 Jan 2016 15:56:14 -0500
> > > > Neil Horman  wrote:
> > > > 
> > > > > On Tue, Jan 19, 2016 at 08:10:19AM -0800, Stephen Hemminger wrote:
> > > > > > On Tue, 19 Jan 2016 09:29:31 -0500
> > > > > > Neil Horman  wrote:
> > > > > > 
> > > > > > > On Tue, Jan 19, 2016 at 08:30:40AM +0100, Thomas Monjalon wrote:
> > > > > > > > 2016-01-18 13:30, David Marchand:
> > > > > > > > > We could do something à la modinfo, but let's keep it simple 
> > > > > > > > > for now.
> > > > > > > > > 
> > > > > > > > > With this, you can extract the devices that need to be bound 
> > > > > > > > > to uio / vfio
> > > > > > > > > with tools like objdump :
> > > > > > > > > 
> > > > > > > > > $ objdump -j rte_pci_id_uio -s build/lib/librte_pmd_fm10k.so
> > > > > > > > > 
> > > > > > > > > Contents of section rte_pci_id_uio:
> > > > > > > > >  15760 8680a415  8680d015   
> > > > > > > > >  15770 8680a515     
> > > > > > > > 
> > > > > > > > Yes we need a modinfo-like tool.
> > > > > > > > Currently, the UIO/VFIO binding can be done after parsing the 
> > > > > > > > PCI device list.
> > > > > > > > It is better to define the device ids locally to their drivers 
> > > > > > > > but it must
> > > > > > > > be integrated with an appropriate parsing tool at the same time.
> > > > > > > > And more importantly than any tool, the format of these ELF 
> > > > > > > > data must be
> > > > > > > > properly defined, documented and extensible.
> > > > > > > > 
> > > > > > > > Does anyone have experience with such a format definition?
> > > > > > > > Stephen, you were asking for this change, what is your opinion?
> > > > > > > > I remember that Neil was also interested in this change:
> > > > > > > > http://dpdk.org/ml/archives/dev/2015-January/012115.html
> > > > > > > > Panu, Christian, this change could be related to distribution 
> > > > > > > > packaging.
> > > > > > > > Thanks for helping to move this change forward.
> > > > > > > 
> > > > > > > Yes, I would be interested in seeing this.  Is the ask here that 
> > > > > > > someone do it?
> > > > > > > As I recall from the last thread that you reference, I thought 
> > > > > > > David M was
> > > > > > > interested in writing it and soliciting for ideas.  If that's no 
> > > > > > > longer the case,
> > > > > > > I can take a stab at writing it.
> > > > > > > 
> > > > > > > Neil
> > > > > > > 
> > > > > > 
> > > > > > If these are libraries is there a way to have a real entry point
> > > > > > to dump PCI id's. 
> > > > > > 
> > > > > Sure, you could write a method that could be dlsym-ed easily enough 
> > > > > to fetch an
> > > > > array of pci ids, or just print stuff to the console.  Not sure that's 
> > > > > the best way,
> > > > > but definitely an option
> > > > > Neil
> > > > 
> > > > It is just that reading data with objdump is a kludge likely to get 
> > > > broken.
> > > > 
> > > Not suggesting that we rely on objdump in perpetuity, only that we export 
> > > the
> > > data, rather than a method to access it so that it can be reached via 
> > > libelf.
> > > Using a function to return the information has implicit issues at the 
> > > moment
> > > (specifically if you dlopen a dpdk driver, its constructor will attempt to
> > > register it with the core libraries).  While that's not catastrophic, it 
> > > means
> > > more stuff than you expect gets loaded, which might have weird side 
> > > effects.
> > > Adding a separate section that you could reach via libelf would be nice I 
> > > think
> > > 
> > > Neil
> > > 
> > Hi,
> > 
> > while there is interesting discussion on tools, are there any objections to
> > taking and merging this patchset as-is to at least do the cleanup of the
> > existing pci ids list? I would assume that any tools for querying the 
> > patchlist
> > can be done as additional work once this is applied. 
> 
> Today we can parse the global PCI list to bind devices to DPDK.
> If we remove this list, we must replace it by another convenient method.
> And more importantly, the information in the ELF files must be extensible
> and in a stable syntax.
> The problem here is that it is poorly specified.
> Please let's describe a syntax for these ELF data, first.

Agreed, I'd be fine with taking the patch if it didn't preclude admins from
being able to identify which drivers match which devices without loading the
modules first.
Neil
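As a rough illustration of what a modinfo-like tool would have to do with such a section, the sketch below decodes the raw bytes shown in the objdump output earlier in the thread. The record layout is an assumption made for this example (a 16-bit little-endian vendor ID followed by a 16-bit little-endian device ID); the dump quoted above appears to have lost columns, and the thread itself notes that the real format still needs to be specified. `struct pci_id` and `decode_pci_ids` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct pci_id {
	uint16_t vendor;
	uint16_t device;
};

/*
 * Decode 4-byte records from a raw section dump. Assumed layout per record:
 * little-endian 16-bit vendor ID, then little-endian 16-bit device ID.
 * Returns the number of IDs written to ids[].
 */
static size_t
decode_pci_ids(const uint8_t *raw, size_t raw_len,
	       struct pci_id *ids, size_t max_ids)
{
	size_t n = 0;

	assert(raw != NULL && ids != NULL);
	while (raw_len >= 4 && n < max_ids) {
		ids[n].vendor = (uint16_t)(raw[0] | (raw[1] << 8));
		ids[n].device = (uint16_t)(raw[2] | (raw[3] << 8));
		raw += 4;
		raw_len -= 4;
		n++;
	}
	return n;
}
```

Under that assumption, the bytes `86 80 a4 15` decode to vendor 0x8086, device 0x15a4. A tool built this way breaks as soon as the struct changes, which is the reason a documented and extensible format was asked for.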

[dpdk-dev] [PATCH] doc: Malicious Driver Detection not supported by ixgbe

2016-02-24 Thread Stephen Hemminger

On Wed, 24 Feb 2016 13:33:04 +0800
Wenzhuo Lu  wrote:

> +  On Intel x550 series NICs, HW supports a feature called MDD (Malicious
> +  Driver Detection).
> +  MDD is used to check the behavior of the VF driver. It means when 
> transmitting
> +  packets, the VF must use the advanced context descriptor and set it 
> correctly.
> +  And VF must set the CC (Check Context) bit either.

This is a hard sentence to read, why not reword it as:

The Intel x550 series NICs support a feature called MDD (Malicious
Driver Detection) which checks the behavior of the VF driver.
If this feature is enabled, the VF must use the advanced context descriptor
correctly and set the CC (Check Context) bit.


> +  DPDK PF doesn't support MDD. We may hit problem in this scenario kernel PF 
> +
> +  DPDK VF. If user enables MDD in kernel PF, DPDK VF will not work. Because
> +  kernel PF thinks the VF is malicious. But actually it's not. The only 
> reason
> +  is the VF doesn't act as MDD required.
> +  There's significant performance impact to support MDD. DPDK should check if
> +  the advanced context descriptor should be set and set it. And DPDK has to 
> ask
> +  the info about the header length from the upper layer, because parsing the
> +  packet itself is not acceptable. So, it's too expensive to support MDD.
> +  When using kernel PF + DPDK VF on x550, please make sure using the kernel
> +  driver that disables MDD or can disable MDD. (Some kernel driver can use
> +  this CLI 'insmod ixgbe.ko MDD=0,0' to disable MDD. Some kernel driver 
> disable
> +  it by default.)
> +

[dpdk-dev] [PATCH] vhost: broadcast RARP pkt by injecting it to receiving mbuf array

2016-02-24 Thread Qiu, Michael

On 2/22/2016 10:35 PM, Yuanhan Liu wrote:
> Broadcast RARP packet by injecting it to receiving mbuf array at
> rte_vhost_dequeue_burst().
>
> Commit 33226236a35e ("vhost: handle request to send RARP") iterates
> all host interfaces and then broadcast it by all of them.  It did
> notify the switches about the new location of the migrated VM, however,
> the mac learning table in the target host is wrong (at least in my
> test with OVS):
>
> $ ovs-appctl fdb/show ovsbr0
>  port  VLAN  MAC                Age
>     1     0  b6:3c:72:71:cd:4d   10
> LOCAL     0  b6:3c:72:71:cd:4e   10
> LOCAL     0  52:54:00:12:34:68    9
>     1     0  56:f6:64:2c:bc:c0    1
>
> Where 52:54:00:12:34:68 is the mac of the VM. As you can see from the
> above, the port learned is "LOCAL", which is the "ovsbr0" port. That
> is reasonable, since we indeed send the pkt by the "ovsbr0" interface.
>
> The wrong mac table leads all the packets destined for the VM to go to "ovsbr0"
> in the end, which ends up with all packets being lost, until the guest
> sends an ARP request (or reply) to refresh the mac learning table.
>
> Jianfeng then came up with a solution I have thought of firstly but NAKed

Is it suitable to mention someone in the commit log?

Thanks,
Michael
> by myself, concerning it has potential issues [0]. The solution is as title
> stated: broadcast the RARP packet by injecting it to the receiving mbuf
> arrays at rte_vhost_dequeue_burst(). The re-bring of that idea made me
> think it twice; it looked like a false concern to me then. And I had done
> a rough verification: it worked as expected.
>
> [0]: http://dpdk.org/ml/archives/dev/2016-February/033527.html
>
> Another note is that while preparing this version, I found that DPDK has
> some ARP related structures and macros defined. So, use them instead of
> the one from standard header files here.
>
> Cc: Thibaut Collet 
> Suggested-by: Jianfeng Tan 
> Signed-off-by: Yuanhan Liu 
> ---
>  lib/librte_vhost/rte_virtio_net.h |   5 +-
>  lib/librte_vhost/vhost_rxtx.c |  80 +++-
>  lib/librte_vhost/vhost_user/vhost-net-user.c  |   2 +-
>  lib/librte_vhost/vhost_user/virtio-net-user.c | 128 
> --
>  lib/librte_vhost/vhost_user/virtio-net-user.h |   2 +-
>  5 files changed, 104 insertions(+), 113 deletions(-)
>
> diff --git a/lib/librte_vhost/rte_virtio_net.h 
> b/lib/librte_vhost/rte_virtio_net.h
> index 4a2303a..7d1fde2 100644
> --- a/lib/librte_vhost/rte_virtio_net.h
> +++ b/lib/librte_vhost/rte_virtio_net.h
> @@ -49,6 +49,7 @@
>  
>  #include 
>  #include 
> +#include 
>  
>  struct rte_mbuf;
>  
> @@ -133,7 +134,9 @@ struct virtio_net {
>   void*priv;  /**< private context */
>   uint64_tlog_size;   /**< Size of log area */
>   uint64_tlog_base;   /**< Where dirty pages are 
> logged */
> - uint64_treserved[62];   /**< Reserve some spaces for 
> future extension. */
> + struct ether_addr   mac;/**< MAC address */
> + rte_atomic16_t  broadcast_rarp; /**< A flag to tell if we need 
> broadcast rarp packet */
> + uint64_treserved[61];   /**< Reserve some spaces for 
> future extension. */
>   struct vhost_virtqueue  *virtqueue[VHOST_MAX_QUEUE_PAIRS * 2];  /**< 
> Contains all virtqueue information. */
>  } __rte_cache_aligned;
>  
> diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
> index 12ce0cc..9d23eb1 100644
> --- a/lib/librte_vhost/vhost_rxtx.c
> +++ b/lib/librte_vhost/vhost_rxtx.c
> @@ -43,6 +43,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "vhost-net.h"
>  
> @@ -761,11 +762,50 @@ vhost_dequeue_offload(struct virtio_net_hdr *hdr, 
> struct rte_mbuf *m)
>   }
>  }
>  
> +#define RARP_PKT_SIZE64
> +
> +static int
> +make_rarp_packet(struct rte_mbuf *rarp_mbuf, const struct ether_addr *mac)
> +{
> + struct ether_hdr *eth_hdr;
> + struct arp_hdr  *rarp;
> +
> + if (rarp_mbuf->buf_len < 64) {
> + RTE_LOG(WARNING, VHOST_DATA,
> + "failed to make RARP; mbuf size too small %u (< %d)\n",
> + rarp_mbuf->buf_len, RARP_PKT_SIZE);
> + return -1;
> + }
> +
> + /* Ethernet header. */
> + eth_hdr = rte_pktmbuf_mtod_offset(rarp_mbuf, struct ether_hdr *, 0);
> + memset(eth_hdr->d_addr.addr_bytes, 0xff, ETHER_ADDR_LEN);
> + ether_addr_copy(mac, &eth_hdr->s_addr);
> + eth_hdr->ether_type = htons(ETHER_TYPE_RARP);
> +
> + /* RARP header. */
> + rarp = (struct arp_hdr *)(eth_hdr + 1);
> + rarp->arp_hrd = htons(ARP_HRD_ETHER);
> + rarp->arp_pro = htons(ETHER_TYPE_IPv4);
> + rarp->arp_hln = ETHER_ADDR_LEN;
> + rarp->arp_pln = 4;
> + rarp->arp_op  = htons(ARP_OP_REVREQUEST);
> +
> + ether_addr_copy(mac, &rarp->arp_data.arp_sha);
> + ether_addr_copy(mac, &rarp->arp_data.arp_tha);
> +
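The quoted patch is cut off at this point. As a stand-alone illustration of the frame it builds, here is a sketch in plain C without DPDK types. The field offsets follow the standard Ethernet and ARP layouts (EtherType 0x8035 for RARP, opcode 3 for a reverse request); leaving both IP fields zero is an assumption made for this example, since that part of the patch is not visible above. `build_rarp_frame` and `put_be16` are names invented here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN         6
#define ETH_HDR_LEN     14
#define ARP_BODY_LEN    28
#define ETHERTYPE_RARP  0x8035
#define ETHERTYPE_IPV4  0x0800
#define ARP_HW_ETHER    1
#define RARP_OP_REQUEST 3

/* Store a 16-bit value in network (big-endian) byte order. */
static void
put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)(v & 0xff);
}

/*
 * Build a broadcast RARP request announcing `mac` into buf.
 * Returns the frame length (without padding), or -1 if buf is too small.
 */
static int
build_rarp_frame(uint8_t *buf, size_t buf_len, const uint8_t mac[MAC_LEN])
{
	uint8_t *arp;

	assert(buf != NULL && mac != NULL);
	if (buf_len < ETH_HDR_LEN + ARP_BODY_LEN)
		return -1;
	memset(buf, 0, ETH_HDR_LEN + ARP_BODY_LEN);

	/* Ethernet header: broadcast destination, our MAC as source. */
	memset(buf, 0xff, MAC_LEN);
	memcpy(buf + MAC_LEN, mac, MAC_LEN);
	put_be16(buf + 12, ETHERTYPE_RARP);

	/* ARP/RARP body. */
	arp = buf + ETH_HDR_LEN;
	put_be16(arp + 0, ARP_HW_ETHER);
	put_be16(arp + 2, ETHERTYPE_IPV4);
	arp[4] = MAC_LEN;                /* hardware address length */
	arp[5] = 4;                      /* protocol address length */
	put_be16(arp + 6, RARP_OP_REQUEST);
	memcpy(arp + 8, mac, MAC_LEN);   /* sender hardware address */
	memcpy(arp + 18, mac, MAC_LEN);  /* target hardware address */
	/* Sender and target IP (arp + 14, arp + 24) stay zero: assumption. */

	return ETH_HDR_LEN + ARP_BODY_LEN;
}
```

The patch uses a 64-byte packet size; this sketch returns the 42 bytes of header and body and leaves any padding to the caller.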

[dpdk-dev] [PATCH RFC 4/4] doc: add note about rte_vhost_enqueue_burst thread safety.

2016-02-24 Thread Ilya Maximets

On 23.02.2016 08:56, Xie, Huawei wrote:
> On 2/22/2016 6:16 PM, Thomas Monjalon wrote:
>> 2016-02-22 02:07, Xie, Huawei:
>>> On 2/19/2016 5:05 PM, Ilya Maximets wrote:
 On 19.02.2016 11:36, Xie, Huawei wrote:
> On 2/19/2016 3:10 PM, Yuanhan Liu wrote:
>> On Fri, Feb 19, 2016 at 09:32:43AM +0300, Ilya Maximets wrote:
>>> Signed-off-by: Ilya Maximets 
>>> ---
>>>  doc/guides/prog_guide/thread_safety_dpdk_functions.rst | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/doc/guides/prog_guide/thread_safety_dpdk_functions.rst 
>>> b/doc/guides/prog_guide/thread_safety_dpdk_functions.rst
>>> index 403e5fc..13a6c89 100644
>>> --- a/doc/guides/prog_guide/thread_safety_dpdk_functions.rst
>>> +++ b/doc/guides/prog_guide/thread_safety_dpdk_functions.rst
>>>  The mempool library is based on the DPDK lockless ring library and 
>>> therefore is also multi-thread safe.
>>> +rte_vhost_enqueue_burst() is also thread safe because based on 
>>> lockless ring-buffer algorithm like the ring library.
>> FYI, Huawei meant to make rte_vhost_enqueue_burst() not be thread-safe,
>> to align with the usage of rte_eth_tx_burst().
>>
>>  --yliu
> I have a patch to remove the lockless enqueue. Unless there is strong
> reason, i prefer vhost PMD to behave like other PMDs, with no internal
> lockless algorithm. In future, for people who really need it, we could
> have dynamic/static switch to enable it.
>>> Thomas, what is your opinion on this and my patch removing lockless enqueue?
>> The thread safety behaviour is part of the API specification.
>> If we want to enable/disable such behaviour, it must be done with an API
>> function. But it would introduce a conditional statement in the fast path.
>> That's why the priority must be to keep a simple and consistent behaviour
>> and try to build around. An API complexity may be considered only if there
>> is a real (measured) gain.
> 
> Let us put the gain aside temporarily. I would do the measurement.
> Vhost is wrapped as a PMD in Tetsuya's patch. And also in DPDK OVS's
> case, it is wrapped as a vport like all other physical ports. The DPDK
> app/OVS will treat all ports equally.

That is not true. Currently vhost in Open vSwitch is implemented as a separate
netdev class. So, to use the concurrency of vhost we just need to remove
2 lines (rte_spinlock_lock and rte_spinlock_unlock) from the function
__netdev_dpdk_vhost_send(). This will not change the behaviour of other types
of ports.

> It will add complexity if the app
> needs to know that some ports support concurrency while others do not. Since all
> other PMDs don't support thread safety, it doesn't make sense for
> vhost PMD to support that. I believe the APP will not use that behavior.
> From the API's point of view, if we previously implemented it wrongly,
> we need to fix it as early as possible.

[dpdk-dev] [PATCH 2/6] qede: add documentation

2016-02-24 Thread Harish Patil

>>
>>-Original Message-
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Harish Patil
>> Sent: Saturday, February 20, 2016 3:40 PM
>> To: dev at dpdk.org
>> Cc: Sony Chacko 
>> Subject: [dpdk-dev] [PATCH 2/6] qede: add documentation
>> 
>> Signed-off-by: Harish Patil 
>> Signed-off-by: Rasesh Mody 
>> Signed-off-by: Sony Chacko 
>> ---
>>  doc/guides/nics/index.rst |   1 +
>>  doc/guides/nics/qede.rst  | 344
>
>Hi,
>
>Thanks for the docs. The overall format and content seem good. A few
>comments below.
>
>
>> +#. Bind the QLogic 579xx adapters to ``igb_uio`` or ``vfio-pci`` loaded
>> in the
>> +   previous step::
>> +
>> +  ./tools/dpdk_nic_bind.py --bind igb_uio :84:00.0 :84:00.1
>> + :84:00.2 :84:00.3
>
>Fixed width lines should be constrained to 80 characters or else they will
>go off the page in the PDF docs. The usual workaround is to use a command-
>line continuation (or text wrap). For example:
>
>
> ./tools/dpdk_nic_bind.py --bind igb_uio \
>  :84:00.0 :84:00.1 \
>  :84:00.2 :84:00.3
>
>Or similar. This also applies to the command-lines in other sections.
>> +
>> +#. Start ``testpmd`` with basic parameters:
>> +
>> +   .. code-block:: console
>> +
>> +  testpmd -c 0xf8000 -n 4 -- -i --nb-cores=4 --portmask=0xf
>> + --rxd=4096 --txd=4096 --txfreet=4068 --rxq=2 --txq=2 --rss-ip
>> + --rss-udp
>
>Same comment about using continuations.
>
>
>> +
>> +  [...]
>> +
>> +EAL: PCI device :84:00.0 on NUMA socket 1
>> +EAL:   probe driver: 1077:1634 rte_qede_pmd
>
>Align the text in this section to the same level of indentation.
>
>
>> +EAL:   Not managed by a supported kernel driver, skipped
>> +EAL: PCI device :84:00.1 on NUMA socket 1
>> +EAL:   probe driver: 1077:1634 rte_qede_pmd
>> +EAL:   Not managed by a supported kernel driver, skipped
>> +EAL: PCI device :88:00.0 on NUMA socket 1
>> +EAL:   probe driver: 1077:1656 rte_qede_pmd
>> +EAL:   PCI memory mapped at 0x7f738b20
>> +EAL:   PCI memory mapped at 0x7f738b28
>> +EAL:   PCI memory mapped at 0x7f738b30
>> +[QEDE PMD: (88:00.0:dpdk-port-0)]qed_load_firmware_data: Loading
>>the
>> firmware file /lib/firmware/qed/qed_init_values_zipped.bin...
>
>Again, try wrap the code/console section at 80 chars in some way that
>still
>maintains the meaning. Maybe something like the following (with a note to
>say that the text has been wrapped for clarity):
>
>[QEDE PMD: (88:00.0:dpdk-port-0)]qed_load_firmware_data:
>   Loading the firmware file
>   /lib/firmware/qed/qed_init_values_zipped.bin...
>[QEDE PMD: (88:00.0:dpdk-port-0)]
>   qede_print_adapter_info:Chip details - BB1
>[QEDE PMD: (88:00.0:dpdk-port-0)]
>   qede_print_adapter_info:Driver version:QEDE PMD
>8.7.9.0_1.0.0
>[QEDE PMD: (88:00.0:dpdk-port-0)]
>   qede_print_adapter_info:Firmware version:8.7.7.0
>
>
>You can test the PDF output as follows:
>
>make -j doc-guides-pdf
>pdf_viewer build/doc/pdf/guides/nics.pdf
>
>John
>

These are the actual output from the console that was pasted, that's why
they are more than 80 chars.
Sure, will take care of that as you mentioned.


Thanks,
Harish

[dpdk-dev] [PATCH v2 1/3] i40e: enable DCB in VMDQ VSIs

2016-02-24 Thread Zhang, Helin



> -Original Message-
> From: Wu, Jingjing
> Sent: Wednesday, February 17, 2016 2:58 PM
> To: Richardson, Bruce
> Cc: dev at dpdk.org; Wu, Jingjing; Zhang, Helin
> Subject: [PATCH v2 1/3] i40e: enable DCB in VMDQ VSIs
> 
> Previously, DCB(Data Center Bridging) is only enabled on PF, queue mapping
> and BW configuration is only done on PF.
> This patch enabled DCB for VMDQ VSIs(Virtual Station Interfaces) by
> following steps:
>   1. Take BW and ETS(Enhanced Transmission Selection)
>  configuration on VEB(Virtual Ethernet Bridge).
>   2. Take BW and ETS configuration on VMDQ VSIs.
>   3. Update TC(Traffic Class) and queues mapping on VMDQ VSIs.
> To enable DCB on VMDQ, the number of TCs should not be larger than the
> number of queues in VMDQ pools, and the number of queues per VMDQ
> pool is specified by CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_VM
> in config/common_* file.
> 
> Signed-off-by: Jingjing Wu 
> ---
>  doc/guides/rel_notes/release_16_04.rst |   3 +
>  drivers/net/i40e/i40e_ethdev.c | 153
> +
>  drivers/net/i40e/i40e_ethdev.h |  28 +++---
>  3 files changed, 152 insertions(+), 32 deletions(-)
> 
> diff --git a/doc/guides/rel_notes/release_16_04.rst
> b/doc/guides/rel_notes/release_16_04.rst
> index 81f62f1..d3b035c 100644
> --- a/doc/guides/rel_notes/release_16_04.rst
> +++ b/doc/guides/rel_notes/release_16_04.rst
> @@ -56,6 +56,9 @@ This section should contain new features added in this
> release. Sample format:
>Added support for sw-firmware sync for resource sharing.
>Use the PHY token, shared between sw-fw for PHY access on X550EM_a.
> 
> +* **VMDQ DCB mode in i40e.**
> +
> +  Added support for DCB in VMDQ mode to i40e driver.
> 
>  Resolved Issues
>  ---
> diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
> index ef24122..fc06612 100644
> --- a/drivers/net/i40e/i40e_ethdev.c
> +++ b/drivers/net/i40e/i40e_ethdev.c
> @@ -8087,6 +8087,8 @@ i40e_vsi_update_queue_mapping(struct i40e_vsi
> *vsi,
>   int i, total_tc = 0;
>   uint16_t qpnum_per_tc, bsf, qp_idx;
>   struct rte_eth_dev_data *dev_data = I40E_VSI_TO_DEV_DATA(vsi);
> + struct i40e_pf *pf = I40E_VSI_TO_PF(vsi);
> + uint16_t used_queues;
> 
>   ret = validate_tcmap_parameter(vsi, enabled_tcmap);
>   if (ret != I40E_SUCCESS)
> @@ -8100,7 +8102,18 @@ i40e_vsi_update_queue_mapping(struct i40e_vsi
> *vsi,
>   total_tc = 1;
>   vsi->enabled_tc = enabled_tcmap;
> 
> - qpnum_per_tc = dev_data->nb_rx_queues / total_tc;
> + /* different VSI has different queues assigned */
> + if (vsi->type == I40E_VSI_MAIN)
> + used_queues = dev_data->nb_rx_queues -
> + pf->nb_cfg_vmdq_vsi *
> RTE_LIBRTE_I40E_QUEUE_NUM_PER_VM;
> + else if (vsi->type == I40E_VSI_VMDQ2)
> + used_queues = RTE_LIBRTE_I40E_QUEUE_NUM_PER_VM;
> + else {
> + PMD_INIT_LOG(ERR, "unsupported VSI type.");
> + return I40E_ERR_NO_AVAILABLE_VSI;
> + }
> +
> + qpnum_per_tc = used_queues / total_tc;
>   /* Number of queues per enabled TC */
>   if (qpnum_per_tc == 0) {
>   PMD_INIT_LOG(ERR, " number of queues is less that tcs.");
> @@ -8145,6 +8158,93 @@ i40e_vsi_update_queue_mapping(struct i40e_vsi
> *vsi,  }
> 
>  /*
> + * i40e_config_switch_comp_tc - Configure VEB tc setting for given TC map
> + * @veb: VEB to be configured
> + * @tc_map: enabled TC bitmap
> + *
> + * Returns 0 on success, negative value on failure
> + */
> +static enum i40e_status_code
> +i40e_config_switch_comp_tc(struct i40e_veb *veb, uint8_t tc_map)
> +{
> + struct i40e_aqc_configure_switching_comp_bw_config_data veb_bw;
> + struct i40e_aqc_query_switching_comp_bw_config_resp bw_query;
> + struct i40e_aqc_query_switching_comp_ets_config_resp ets_query;
> + struct i40e_hw *hw = I40E_VSI_TO_HW(veb->associate_vsi);
> + enum i40e_status_code ret = I40E_SUCCESS;
> + int i;
> + uint32_t bw_max;
> +
> + /* Check if enabled_tc is same as existing or new TCs */
> + if (veb->enabled_tc == tc_map)
> + return ret;
> +
> + /* configure tc bandwidth */
> + memset(&veb_bw, 0, sizeof(veb_bw));
> + veb_bw.tc_valid_bits = tc_map;
> + /* Enable ETS TCs with equal BW Share for now across all VSIs */
> + for (i = 0; i < I40E_MAX_TRAFFIC_CLASS; i++) {
> + if (tc_map & BIT_ULL(i))
> + veb_bw.tc_bw_share_credits[i] = 1;
> + }
> + ret = i40e_aq_config_switch_comp_bw_config(hw, veb->seid,
> +   &veb_bw, NULL);
> + if (ret) {
> + PMD_INIT_LOG(ERR, "AQ command Config switch_comp BW allocation"
> +   " per TC failed = %d",
> +   hw->aq.asq_last_status);
> + return ret;
> + }
> +
> + memset(&ets_query, 0, sizeof(ets_query));
> + ret =
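
[Editor's sketch] The queue split in the i40e_vsi_update_queue_mapping() hunk quoted above can be worked through in isolation. This is only a rough illustration, not the driver code: the enum, QUEUE_NUM_PER_VM (assumed to be 4 here) and queues_per_tc() are hypothetical stand-ins for the VSI types, RTE_LIBRTE_I40E_QUEUE_NUM_PER_VM and the in-driver logic, and the negative-count check is an addition the patch does not have.

```c
#include <stdint.h>

/* Hypothetical stand-ins for I40E_VSI_MAIN / I40E_VSI_VMDQ2 and for
 * RTE_LIBRTE_I40E_QUEUE_NUM_PER_VM; the value 4 is an assumption. */
enum vsi_type { VSI_MAIN, VSI_VMDQ2, VSI_OTHER };
#define QUEUE_NUM_PER_VM  4
#define MAX_TRAFFIC_CLASS 8

/*
 * Return the number of queues each enabled TC gets on one VSI,
 * or -1 if the VSI type is unsupported or there are fewer queues than TCs.
 */
static int
queues_per_tc(enum vsi_type type, uint16_t nb_rx_queues,
	      uint16_t nb_vmdq_vsi, uint8_t tc_map)
{
	int used_queues, total_tc = 0, i;

	/* Count enabled TCs; an empty map is treated as a single TC. */
	for (i = 0; i < MAX_TRAFFIC_CLASS; i++)
		if (tc_map & (1u << i))
			total_tc++;
	if (total_tc == 0)
		total_tc = 1;

	/* Different VSIs have different queues assigned. */
	if (type == VSI_MAIN)
		used_queues = nb_rx_queues - nb_vmdq_vsi * QUEUE_NUM_PER_VM;
	else if (type == VSI_VMDQ2)
		used_queues = QUEUE_NUM_PER_VM;
	else
		return -1;

	/* Also rejects a negative count (editor's addition, not in the patch). */
	if (used_queues / total_tc <= 0)
		return -1;
	return used_queues / total_tc;
}
```

With 16 RX queues, 2 VMDQ VSIs and TCs 0 and 1 enabled, the main VSI keeps 16 - 2 * 4 = 8 queues (4 per TC) and each VMDQ VSI has 4 queues (2 per TC). Enabling all 8 TCs on a VMDQ VSI gives 4 / 8 = 0, which is the "number of queues is less than TCs" error case.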

[dpdk-dev] [PATCH v9 0/2] Add VHOST PMD

2016-02-24 Thread Qiu, Michael

Hi Tetsuya,

When I applied your v6 patch, I could reach 9.5 Mpps with 64B packets.

But with v9 applied I only get 8.4 Mpps. Could you figure out why there
is a performance drop?

Thanks,
Michael
On 2/9/2016 5:38 PM, Tetsuya Mukawa wrote:
> The patch introduces a new PMD. This PMD is implemented as thin wrapper
> of librte_vhost.
>
>
> PATCH v9 changes:
>  - Fix a null pointer access issue introduced in v8 patch.
>
> PATCH v8 changes:
>  - Manage ether devices list instead of internal structures list.
>  - Remove needless NULL checking.
>  - Replace "pthread_exit" with "return NULL".
>  - Replace rte_panic to RTE_LOG, also add error handling.
>  - Remove duplicated lines.
>  - Remove needless casting.
>  - Follow coding style.
>  - Remove needless parenthesis.
>
> PATCH v7 changes:
>  - Remove needless parenthesis.
>  - Add release note.
>  - Remove needless line wraps.
>  - Add null pointer check in vring_state_changed().
>  - Free queue memory in eth_queue_release().
>  - Fix wrong variable name.
>  - Fix error handling code of eth_dev_vhost_create() and
>rte_pmd_vhost_devuninit().
>  - Remove needless null checking from rte_pmd_vhost_devinit/devuninit().
>  - Use port id to create mac address.
>  - Add doxygen style comments in "rte_eth_vhost.h".
>  - Fix wrong comment in "mk/rte.app.mk".
>
> PATCH v6 changes:
>  - Remove rte_vhost_driver_pmd_callback_registe().
>  - Support link status interrupt.
>  - Support queue state changed interrupt.
>  - Add rte_eth_vhost_get_queue_event().
>  - Support numa node detection when new device is connected.
>
> PATCH v5 changes:
>  - Rebase on latest master.
>  - Fix RX/TX routine to count RX/TX bytes.
>  - Fix RX/TX routine not to count as error packets if enqueue/dequeue
>cannot send all packets.
>  - Fix if-condition checking for multiqueues.
>  - Add "static" to pthread variable.
>  - Fix format.
>  - Change default behavior not to receive queueing event from driver.
>  - Split the patch to separate rte_eth_vhost_portid2vdev().
>
> PATCH v4 changes:
>  - Rebase on latest DPDK tree.
>  - Fix coding style.
>  - Fix code not to invoke multiple messaging handling threads.
>  - Fix code to handle vdev parameters correctly.
>  - Remove needless cast.
>  - Remove needless if-condition before rte_free().
>
> PATCH v3 changes:
>  - Rebase on latest master.
>  - Specify correct queue_id in RX/TX function.
>
> PATCH v2 changes:
>  - Remove the patch below, which fixes the vhost library.
>    It was applied as a separate patch.
>    - vhost: fix crash with multiqueue enabled
>  - Fix typos.
>    (Thanks to Thomas Monjalon)
>  - Rebase on latest tree with Bernard's patches above.
>
> PATCH v1 changes:
>  - Support vhost multiple queues.
>  - Rebase on "remove pci driver from vdevs".
>  - Optimize RX/TX functions.
>  - Fix resource leaks.
>  - Fix compile issue.
>  - Add patch to fix vhost library.
>
> RFC PATCH v3 changes:
>  - Optimize performance.
>    In RX/TX functions, change code to access only per core data.
>  - Add below API to allow user to use vhost library APIs for a port managed
>    by vhost PMD. There are a few limitations. See "rte_eth_vhost.h".
>    - rte_eth_vhost_portid2vdev()
>    To support this functionality, vhost library is also changed.
>    Anyway, users who don't use vhost PMD can still fully use vhost library APIs.
>  - Add code to support vhost multiple queues.
>    Actually, multiple queues functionality is not enabled so far.
>
> RFC PATCH v2 changes:
>  - Fix issues reported by checkpatch.pl
>(Thanks to Stephen Hemminger)
>
>
> Tetsuya Mukawa (2):
>   ethdev: Add a new event type to notify a queue state changed event
>   vhost: Add VHOST PMD
>
>  MAINTAINERS                                 |   4 +
>  config/common_linuxapp                      |   6 +
>  doc/guides/nics/index.rst                   |   1 +
>  doc/guides/rel_notes/release_2_3.rst        |   4 +
>  drivers/net/Makefile                        |   4 +
>  drivers/net/vhost/Makefile                  |  62 ++
>  drivers/net/vhost/rte_eth_vhost.c           | 911
>  drivers/net/vhost/rte_eth_vhost.h           | 109
>  drivers/net/vhost/rte_pmd_vhost_version.map |  11 +
>  lib/librte_ether/rte_ethdev.h               |   2 +
>  mk/rte.app.mk                               |   6 +
>  11 files changed, 1120 insertions(+)
>  create mode 100644 drivers/net/vhost/Makefile
>  create mode 100644 drivers/net/vhost/rte_eth_vhost.c
>  create mode 100644 drivers/net/vhost/rte_eth_vhost.h
>  create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map
>

[dpdk-dev] [PATCH v2 1/3] i40e: enable extended tag

2016-02-24 Thread Zhang, Helin



> -Original Message-
> From: Richardson, Bruce
> Sent: Tuesday, February 23, 2016 6:45 PM
> To: Zhang, Helin 
> Cc: dev at dpdk.org; zhe.tag at intel.com
> Subject: Re: [dpdk-dev] [PATCH v2 1/3] i40e: enable extended tag
> 
> On Mon, Feb 22, 2016 at 11:59:43AM +0800, Helin Zhang wrote:
> > PCIe feature of 'Extended Tag' is important for 40G performance.
> > It adds its enabling during each port initialization, to ensure the
> > high performance.
> >
> > Signed-off-by: Helin Zhang 
> > ---
> >  doc/guides/rel_notes/release_16_04.rst |  6 
> >  drivers/net/i40e/i40e_ethdev.c | 65
> --
> >  2 files changed, 68 insertions(+), 3 deletions(-)
> >
> > v2:
> >  - Changed the type of return value of i40e_enable_extended_tag() to 'void'.
> >
> > diff --git a/doc/guides/rel_notes/release_16_04.rst
> > b/doc/guides/rel_notes/release_16_04.rst
> > index 5786f74..bed5779 100644
> > --- a/doc/guides/rel_notes/release_16_04.rst
> > +++ b/doc/guides/rel_notes/release_16_04.rst
> > @@ -46,6 +46,12 @@ This section should contain new features added in this
> release. Sample format:
> >
> >  * **Added vhost-user live migration support.**
> >
> > +* **i40e: Enabled extended tag.**
> > +
> > +  It enabled extended tag by checking and writing corresponding PCI config
> > +  space bytes, to boost the performance. In the meanwhile, it deprecated the
> > +  legacy way via reading/writing sysfile supported by kernel module of igb_uio.
> > +
> 
> Hi Helin,
> 
> does this really need to go into the release notes? Is it a user-visible
> change that affects the user experience in any way?
Previously we enabled it in EAL via the igb_uio sysfs file (it is disabled
by default). That way is now deprecated and will be removed in the next
release. The enabling is now done in the i40e PMD init only.
Yes, users might see the performance boost without doing anything, which is
different from previous versions of DPDK.

I think it should be mentioned somewhere. Any better idea?
Thanks,
Helin

> 
> /Bruce
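
[Editor's sketch] For readers unfamiliar with the feature, "checking and writing corresponding PCI config space bytes" amounts to testing the Extended Tag Field Supported bit in the PCIe Device Capabilities register and setting the Extended Tag Field Enable bit in Device Control. The following is a rough sketch under assumptions, not the patch's i40e_enable_extended_tag(): the function name and the in-memory buffer are hypothetical, and a real driver would go through its PCI config read/write calls. It returns a status only to make the behaviour easy to check, whereas v2 of the patch made the real function return void.

```c
#include <stdint.h>

/* Offsets relative to the start of the PCI Express capability structure. */
#define PCIE_DEVCAP_OFFSET  0x04    /* Device Capabilities (32 bit) */
#define PCIE_DEVCTL_OFFSET  0x08    /* Device Control (16 bit) */
#define PCIE_DEVCAP_EXT_TAG 0x20u   /* bit 5: Extended Tag Field Supported */
#define PCIE_DEVCTL_EXT_TAG 0x0100u /* bit 8: Extended Tag Field Enable */

/* PCI config space is little-endian. */
static uint32_t rd32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint16_t rd16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

static void wr16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)(v >> 8);
}

/*
 * Hypothetical helper working on an in-memory copy of the PCIe capability.
 * Returns 1 if extended tag is enabled on exit, 0 if the device does not
 * advertise support for it.
 */
static int enable_extended_tag(uint8_t *pcie_cap)
{
	uint16_t devctl;

	if (!(rd32(pcie_cap + PCIE_DEVCAP_OFFSET) & PCIE_DEVCAP_EXT_TAG))
		return 0;

	devctl = rd16(pcie_cap + PCIE_DEVCTL_OFFSET);
	if (!(devctl & PCIE_DEVCTL_EXT_TAG))
		wr16(pcie_cap + PCIE_DEVCTL_OFFSET,
		     (uint16_t)(devctl | PCIE_DEVCTL_EXT_TAG));
	return 1;
}
```

The capability check comes first so that the enable bit is never written on a device that does not support extended tags, and the write is skipped when the bit is already set.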
